[Mesa-dev] [PATCH v2 065/103] i965/vec4: Fix UBO loads for 64-bit data

2016-10-11 Thread Iago Toral Quiroga
We need to emit 2 32-bit load messages to load a full dvec4. If only 1 or 2 double components are needed dead-code-elimination will remove the second one. We also need to shuffle the result of the 32-bit messages to form valid 64-bit SIMD4x2 data. --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp |

[Mesa-dev] [PATCH v2 085/103] i965/vec4: fix store output for 64-bit types

2016-10-11 Thread Iago Toral Quiroga
We need to shuffle the data before it is written to the URB. Also, dvec3/4 need two vec4 slots. --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 29 ++--- 1 file changed, 26 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp

[Mesa-dev] [PATCH v2 055/103] i965/vec4: implement access to DF source components Z/W

2016-10-11 Thread Iago Toral Quiroga
The general idea is that with 32-bit swizzles we cannot address DF components Z/W directly, so instead we select the region that starts at the 16B offset into the register and use X/Y swizzles. The above, however, has the caveat that we can't do that without violating register region

[Mesa-dev] [PATCH v2 094/103] i965/vec4/scalarize_df: do not scalarize swizzles that we can support natively

2016-10-11 Thread Iago Toral Quiroga
Certain swizzles like XYZW can be supported by translating only the first two 64-bit swizzle channels to 32-bit channels. This happens with swizzles such that the first two logical components, when translated to 32-bit channels and replicated across the second dvec2 row, select the same channels

[Mesa-dev] [PATCH v2 092/103] i965/vec4: dump subnr for FIXED_GRF

2016-10-11 Thread Iago Toral Quiroga
This came in handy when debugging the payload setup for Tess Eval, since it prints correct subnr for attributes that can be loaded in the second half of a register. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git

[Mesa-dev] [PATCH v2 030/103] i965/vec4: add helpers for conversions to/from doubles

2016-10-11 Thread Iago Toral Quiroga
Use these helpers to implement d2f and f2d. We will reuse these helpers when we implement things like d2i or i2d as well. --- src/mesa/drivers/dri/i965/brw_vec4.h | 5 +++ src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 54 +++--- 2 files changed, 39 insertions(+), 20

[Mesa-dev] [PATCH v2 064/103] i965/vec4: Add a shuffle_64bit_data helper

2016-10-11 Thread Iago Toral Quiroga
SIMD4x2 64bit data is stored in register space like this: r0.0:DF x0 y0 z0 w0 r0.1:DF x1 y1 z1 w1 When we need to write data such as this to memory using 32-bit write messages we need to shuffle it in this fashion: r0.0:DF x0 y0 x1 y1 r0.1:DF z0 w0 z1 w1 and emit two 32-bit write messages,
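The shuffle described above can be sketched on plain arrays. This is only an illustrative model, not the helper from the patch: the `Reg` and `RegPair` types and the function name `shuffle_for_32bit_write` are hypothetical, and each register is modeled as four 64-bit slots in the order quoted in the description.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model: one register holds four 64-bit slots.
using Reg = std::array<uint64_t, 4>;

struct RegPair {
    Reg r0;  // "r0.0:DF" in the description
    Reg r1;  // "r0.1:DF" in the description
};

// Rearrange {x0 y0 z0 w0 | x1 y1 z1 w1} into {x0 y0 x1 y1 | z0 w0 z1 w1},
// so that each register can be handed to one 32-bit write message.
RegPair shuffle_for_32bit_write(const RegPair &in)
{
    RegPair out;
    out.r0 = {in.r0[0], in.r0[1], in.r1[0], in.r1[1]};  // XY of both rows
    out.r1 = {in.r0[2], in.r0[3], in.r1[2], in.r1[3]};  // ZW of both rows
    return out;
}
```

After the shuffle, the first register carries the X/Y components of both rows and the second carries Z/W, which matches the two-message split the description mentions.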

[Mesa-dev] [PATCH v2 059/103] i965/vec4: fix indentation in pack_uniform_registers

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index b79fd5e..45d49e9 100644 ---

[Mesa-dev] [PATCH v2 095/103] i965/vec4/scalarize_df: support more swizzles via vstride=0

2016-10-11 Thread Iago Toral Quiroga
By exploiting gen7's hardware decompression bug with vstride=0 we gain the capacity to support additional swizzle combinations. This also fixes ZW writes from X/Y channels like in: mov r2.z:df r0.:df Because DF regions use 2-wide rows with a vstride of 2, the region generated for the source

[Mesa-dev] [PATCH v2 027/103] i965/vec4: make opt_vector_float ignore doubles

2016-10-11 Thread Iago Toral Quiroga
The pass does not support doubles in its current form. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 06fa38f..675b7fc 100644 ---

[Mesa-dev] [PATCH v2 048/103] i965/vec4: dump NibCtrl for instructions with execsize != 8

2016-10-11 Thread Iago Toral Quiroga
v2: do it in the same fashion as the FS backend for consistency (Curro) --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 490cbae..69fdb1e 100644 ---

[Mesa-dev] [PATCH v2 093/103] i965/vec4: split instructions that read 64-bit interleaved attributes

2016-10-11 Thread Iago Toral Quiroga
Stages that use interleaved attributes generate regions with a vstride=0 that can hit the gen7 hardware decompression bug. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 28 ++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git

[Mesa-dev] [PATCH v2 075/103] i965/vec4: do not split scratch read/write opcodes

2016-10-11 Thread Iago Toral Quiroga
64-bit scratch reads/writes require shuffling data around so we need to have access to the full 64-bit data. We will do the right thing for these when we emit the messages. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 9 + 1 file changed, 9 insertions(+) diff --git

[Mesa-dev] [PATCH v2 083/103] i965/vec4: fix indentation in lower_attributes_to_hw_regs()

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index e732bf4..426faf0 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++

[Mesa-dev] [PATCH v2 043/103] i965/vec4: handle 32 and 64 bit channels in liveness analysis

2016-10-11 Thread Iago Toral Quiroga
From: "Juan A. Suarez Romero" Our current data flow analysis does not take into account that channels on 64-bit operands are 64-bit. This is a problem when the same register is accessed using both 64-bit and 32-bit channels. This is very common in operations where we need to

[Mesa-dev] [PATCH v2 084/103] i965/vec4: fix attribute setup for doubles

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 426faf0..56a46ad 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++

[Mesa-dev] [PATCH v2 037/103] i965/vec4: use the new helper function to create double immediates

2016-10-11 Thread Iago Toral Quiroga
From: Samuel Iglesias Gonsálvez Signed-off-by: Samuel Iglesias Gonsálvez --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp

[Mesa-dev] [PATCH v2 047/103] i965/vec4: make the generator set correct NibCtrl for SIMD4 DF instructions

2016-10-11 Thread Iago Toral Quiroga
From the HSW PRM, Command Reference, QtrCtrl: "NibCtrl is only allowed for SIMD4 instructions with a DF (Double Float) source or destination type." v2: Assert that the type is DF (Samuel) v3: Don't set the default group to 0 and then set it only for 4-wide instructions. Instead,

[Mesa-dev] [PATCH v2 079/103] i965/vec4: fix move_uniform_array_access_to_pull_constant() for 64-bit data

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index b0b5f39..f12a114 100644 ---

[Mesa-dev] [PATCH v2 042/103] i965/vec4: dump the instruction execution size

2016-10-11 Thread Iago Toral Quiroga
Reviewed-by: Francisco Jerez --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 2bde628..3191eab 100644 ---

[Mesa-dev] [PATCH v2 009/103] i965/vec4: add double/float conversion pseudo-opcodes

2016-10-11 Thread Iago Toral Quiroga
the weirdness of the double register allocation... Signed-off-by: Connor Abbott <connor.w.abb...@intel.com> Signed-off-by: Iago Toral Quiroga <ito...@igalia.com> --- src/mesa/drivers/dri/i965/brw_defines.h | 2 ++ src/mesa/drivers/dri/i965/brw_shader.cpp | 4 +++ src/

[Mesa-dev] [PATCH v2 052/103] i965/vec4: split double-precision bcsel

2016-10-11 Thread Iago Toral Quiroga
There is a hardware bug affecting compressed double-precision bcsel instructions in Align16 mode whereby they won't read the predication mask properly. The bug does not affect other predicated instructions and it does not affect bcsel in Align1 mode either. This was found empirically and verified by

[Mesa-dev] [PATCH v2 016/103] i965/vec4: add dst_null_df()

2016-10-11 Thread Iago Toral Quiroga
Reviewed-by: Francisco Jerez --- src/mesa/drivers/dri/i965/brw_vec4.h | 5 + 1 file changed, 5 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 1505ba6..86e58f3 100644 ---

[Mesa-dev] [PATCH v2 077/103] i965/vec4: fix scratch reads for 64bit data

2016-10-11 Thread Iago Toral Quiroga
v2: Setup for a 64-bit scratch read by checking the type size of the correct register (Iago) --- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp

[Mesa-dev] [PATCH v2 033/103] i965/vec4: implement d2b

2016-10-11 Thread Iago Toral Quiroga
v2 (Curro): - Generate the flag register with a predicated MOV instead of a CMP instruction, which has the benefit that we can skip loading a DF 0.0 constant. - Avoid the PICK_LOW_32BIT + MOV by using the flag result and a SEL to set the boolean result. ---

[Mesa-dev] [PATCH v2 053/103] i965/vec4: add a scalarization pass for double-precision instructions

2016-10-11 Thread Iago Toral Quiroga
The hardware only supports 32-bit swizzles, which means that we can only directly access channels XY of a DF, making access to channels ZW more difficult, especially considering the various regioning restrictions imposed by the hardware. The combination of both things makes handling random swizzles

[Mesa-dev] [PATCH v2 024/103] i965/vec4: fix base offset for nir_registers with doubles

2016-10-11 Thread Iago Toral Quiroga
v2: do this inside dst_reg_for_nir_reg() instead of its callers --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 815082e..860ec51 100644 ---

[Mesa-dev] [PATCH v2 063/103] i965/vec4: support multiple dispatch widths and groups in the IR builder.

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_builder.h | 39 ++-- 1 file changed, 37 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_builder.h b/src/mesa/drivers/dri/i965/brw_vec4_builder.h index dab6e03..8352542 100644 ---

[Mesa-dev] [PATCH v2 039/103] i965/vec4: fix size_written for doubles

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index 619e010..4e7515c 100644 ---

[Mesa-dev] [PATCH v2 074/103] i965/vec4: Do not use DepCtrl with 64-bit instructions

2016-10-11 Thread Iago Toral Quiroga
The BDW PRM says that it is not supported, but it seems that gen7 is also affected, since doing DepCtrl on double-float instructions leads to GPU hangs in some cases, which is probably not surprising knowing that this is not supported in new hardware iterations. The SKL PRMs do not mention this

[Mesa-dev] [PATCH v2 019/103] i965/vec4: Fix DCE for VEC4_OPCODE_SET_{LOW, HIGH}_32BIT

2016-10-11 Thread Iago Toral Quiroga
These align1 opcodes do partial writes of 64-bit data. The problem is that we want to use them to write on the same register to implement packDouble2x32 and from the point of view of DCE, since both opcodes write to the same register, only the last one stands and decides to eliminate the first,

[Mesa-dev] [PATCH v2 097/103] i965/vec4: run scalarize_df() after spilling

2016-10-11 Thread Iago Toral Quiroga
Spilling of 64-bit data requires data shuffling for the corresponding scratch read/write messages. This produces unsupported swizzle regions and writemasks that we need to scalarize. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 18 ++ 1 file changed, 18 insertions(+) diff --git

[Mesa-dev] [PATCH v2 067/103] i965/vec4: Fix SSBO stores for 64-bit data

2016-10-11 Thread Iago Toral Quiroga
In this case we need to shuffle the 64-bit data before we write it to memory, source from reg_offset + 1 to write components Z and W and consider that each DF channel is twice as big. --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 40 -- 1 file changed, 32

[Mesa-dev] [PATCH v2 044/103] i965/vec4: add a horiz_offset() helper

2016-10-11 Thread Iago Toral Quiroga
This will come in handy when we implement a simd lowering pass in a follow-up patch. --- src/mesa/drivers/dri/i965/brw_ir_vec4.h | 41 + 1 file changed, 41 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h

[Mesa-dev] [PATCH v2 041/103] i965/vec4: use the IR's execution size

2016-10-11 Thread Iago Toral Quiroga
In the vec4 backend the generator sets the execution size to 8 for all instructions by default. However, to implement 64-bit floating-point we will need to split certain instructions into smaller sizes, so we need the IR to convey this information like we do in the scalar backend. This patch uses

[Mesa-dev] [PATCH v2 051/103] i965/vec4: teach cmod propagation about different execution sizes

2016-10-11 Thread Iago Toral Quiroga
We can't propagate the conditional modifier from one instruction to another of a different execution size / group, since that would change the channels affected by the conditional. --- src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
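The restriction above amounts to a guard that compares the channels two instructions cover. A minimal sketch, assuming a simplified stand-in for the IR instruction: the `Inst` struct and `covers_same_channels` are hypothetical names, and only the `exec_size` and `group` notions come from the series.

```cpp
// Hypothetical, simplified stand-in for a backend IR instruction.
struct Inst {
    unsigned exec_size;  // number of channels the instruction executes
    unsigned group;      // index of the first channel it covers
};

// A conditional modifier may only move between instructions that act on
// exactly the same channels; otherwise the flag bits written would change.
bool covers_same_channels(const Inst &a, const Inst &b)
{
    return a.exec_size == b.exec_size && a.group == b.group;
}
```

For instance, two 4-wide instructions with groups 0 and 4 write different halves of the flag, so propagation between them would be rejected by this check.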

[Mesa-dev] [PATCH v2 018/103] i965/vec4: add VEC4_OPCODE_SET_{LOW, HIGH}_32BIT opcodes

2016-10-11 Thread Iago Toral Quiroga
These opcodes will set the low/high 32-bit in each 64-bit data element using Align1 mode. We will use this to implement packDouble2x32. We use Align1 mode because in order to implement this in Align16 mode we would need to use 32-bit logical swizzles (XZ for low, YW for high), but the IR works in
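As a scalar illustration of what setting the low or high 32 bits of a 64-bit element means, here is a sketch on a single `double`. It assumes IEEE-754 doubles and is not the generator code from the patch; `to_bits`, `from_bits`, `set_low_32bit` and `set_high_32bit` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a double as its 64-bit pattern and back (assumes IEEE-754).
static uint64_t to_bits(double d)
{
    uint64_t b;
    std::memcpy(&b, &d, sizeof b);
    return b;
}

static double from_bits(uint64_t b)
{
    double d;
    std::memcpy(&d, &b, sizeof d);
    return d;
}

// Replace bits 0..31, leaving the high half untouched (a partial write).
double set_low_32bit(double d, uint32_t low)
{
    return from_bits((to_bits(d) & 0xffffffff00000000ull) | low);
}

// Replace bits 32..63, leaving the low half untouched (a partial write).
double set_high_32bit(double d, uint32_t high)
{
    return from_bits((to_bits(d) & 0x00000000ffffffffull) |
                     (uint64_t(high) << 32));
}
```

Applying both operations to the same value, in either order, builds a full double from two 32-bit words, which is the packDouble2x32 use the description refers to. Because each one preserves the other half, neither write makes the other redundant.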

[Mesa-dev] [PATCH v2 004/103] i965/vec4/nir: Add bit-size information to types

2016-10-11 Thread Iago Toral Quiroga
Reviewed-by: Francisco Jerez --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index af76730..5048c4e 100644

[Mesa-dev] [PATCH v2 011/103] i965: fix subnr overflow in suboffset()

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_reg.h | 13 + 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/i965/brw_reg.h index 3b46d27..8907c9c 100644 --- a/src/mesa/drivers/dri/i965/brw_reg.h +++

[Mesa-dev] [PATCH v2 005/103] i965/vec4/nir: support doubles in ALU operations

2016-10-11 Thread Iago Toral Quiroga
Basically, this involves considering the bit-size information to set the appropriate type on both operands and destination. v2 (Curro) - Don't use two temporaries (and write one of them twice) to obtain the nir_alu_type. Reviewed-by: Francisco Jerez ---

[Mesa-dev] [PATCH v2 021/103] i965/vec4: implement double unpacking

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 12 1 file changed, 12 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 04f70ef..2631bf3 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++

[Mesa-dev] [PATCH v2 006/103] i965/vec4/nir: set the right type for 64-bit registers

2016-10-11 Thread Iago Toral Quiroga
From: Connor Abbott --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 0d4c8f5..05e7f29 100644 ---

[Mesa-dev] [PATCH v2 035/103] i965/vec4: fix optimize predicate for doubles

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index c0cb141..088ed13 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++

[Mesa-dev] [PATCH v2 028/103] i965/vec4: fix register allocation for 64-bit undef sources

2016-10-11 Thread Iago Toral Quiroga
Reviewed-by: Francisco Jerez --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index fdd3cba..4dffd76 100644 ---

[Mesa-dev] [PATCH v2 007/103] i965/vec4/nir: fix emitting 64-bit immediates

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 22 ++ 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 05e7f29..ce95c8d 100644 ---

[Mesa-dev] [PATCH v2 010/103] i965/vec4: translate d2f/f2d

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 24 1 file changed, 24 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index ce95c8d..b75337c 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++

[Mesa-dev] [PATCH v2 002/103] i965/vec4/nir: simplify glsl_type_for_nir_alu_type()

2016-10-11 Thread Iago Toral Quiroga
From: Connor Abbott <connor.w.abb...@intel.com> Less duplication, one less case to handle for doubles and support for sized NIR types. v2: Fix call to get_instance by swapping rows and columns params (Iago) Signed-off-by: Iago Toral Quiroga <ito...@igalia.com> Reviewed-by: Fra

[Mesa-dev] [PATCH v2 026/103] i965/vec4: fix get_nir_dest() to use DF type for 64-bit destinations

2016-10-11 Thread Iago Toral Quiroga
v2: Make dst_reg_for_nir_reg() handle this for nir_register since we want to have the correct type set before we call offset(). --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp

[Mesa-dev] [PATCH v2 003/103] i965/vec4/nir: allocate two registers for dvec3/dvec4

2016-10-11 Thread Iago Toral Quiroga
From: Connor Abbott v2 (Curro): - Do not special-case for a bit-size of 64, divide the bit_size by 32 instead. - Use DIV_ROUND_UP so we can handle sub-32-bit types. --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 7 --- 1 file changed, 4 insertions(+), 3
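The v2 note can be illustrated with a small calculation. This is a sketch under the assumption that the register count per element is the bit size divided by 32, rounded up; `div_round_up` and `vec4_slots_for_bit_size` are hypothetical names, not the functions touched by the patch.

```cpp
#include <cassert>

// Integer division that rounds up, in the spirit of a DIV_ROUND_UP macro.
constexpr unsigned div_round_up(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

// Hypothetical helper: vec4 register slots needed per element of a given
// bit size. No special case for 64; sub-32-bit sizes still get one slot.
unsigned vec4_slots_for_bit_size(unsigned bit_size)
{
    assert(bit_size > 0);
    return div_round_up(bit_size, 32);
}
```

With this formulation a 64-bit type yields two slots, a 32-bit type yields one, and a 16-bit type also yields one instead of truncating to zero.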

[Mesa-dev] [PATCH v2 012/103] i965: add brw_vecn_grf()

2016-10-11 Thread Iago Toral Quiroga
From: Connor Abbott Reviewed-by: Francisco Jerez --- src/mesa/drivers/dri/i965/brw_reg.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/i965/brw_reg.h index 8907c9c..1fa2595

[Mesa-dev] [PATCH v2 034/103] i965/vec4: implement fsign() for doubles

2016-10-11 Thread Iago Toral Quiroga
v2: use a predicated MOV instead of a CMP, like we do in d2b, to skip loading a double immediate. --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 64 +++--- 1 file changed, 49 insertions(+), 15 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp

[Mesa-dev] [PATCH v2 029/103] i965/vec4: Rename DF to/from F generator opcodes

2016-10-11 Thread Iago Toral Quiroga
The opcodes are not specific for conversions to/from float since we need the same for conversions to/from other 32-bit types. Rename the opcodes accordingly and change the asserts to check the size of the types involved instead. --- src/mesa/drivers/dri/i965/brw_defines.h | 4

[Mesa-dev] [PATCH v2 008/103] i965/vec4: add support for printing DF immediates

2016-10-11 Thread Iago Toral Quiroga
From: Connor Abbott Reviewed-by: Francisco Jerez --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index

[Mesa-dev] [PATCH v2 022/103] i965/vec4: implement double packing

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 2631bf3..37c3d7c 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++

[Mesa-dev] [PATCH v2 032/103] i965/vec4: implement d2i, d2u, i2d and u2d

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 14 ++ 1 file changed, 14 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 0170d21..cc10247 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++

[Mesa-dev] [PATCH v2 013/103] i965/vec4: set correct register regions for 32-bit and 64-bit

2016-10-11 Thread Iago Toral Quiroga
For 32-bit instructions we want to use <4,4,1> regions for VGRF sources so we should really set a width of 4 (we were setting 8). For 64-bit instructions we want to use a width of 2 because the hardware uses 32-bit swizzles, meaning that we can only address 2 consecutive 64-bit components in a

[Mesa-dev] [PATCH v2 017/103] i965/vec4: add VEC4_OPCODE_PICK_{LOW, HIGH}_32BIT opcodes

2016-10-11 Thread Iago Toral Quiroga
These opcodes will pick the low/high 32-bit in each 64-bit data element using Align1 mode. We will use this, for example, to do things like unpackDouble2x32. We use Align1 mode because in order to implement this in Align16 mode we would need to use 32-bit logical swizzles (XZ for low, YW for
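On a single value, picking the low or high 32 bits of a 64-bit element looks like the sketch below. It assumes IEEE-754 doubles and only models the data movement, not the Align1 regioning; `double_bits`, `pick_low_32bit` and `pick_high_32bit` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a double as its 64-bit pattern (assumes IEEE-754).
static uint64_t double_bits(double d)
{
    uint64_t b;
    std::memcpy(&b, &d, sizeof b);
    return b;
}

// Bits 0..31 of the 64-bit element.
uint32_t pick_low_32bit(double d)
{
    return static_cast<uint32_t>(double_bits(d) & 0xffffffffull);
}

// Bits 32..63 of the 64-bit element.
uint32_t pick_high_32bit(double d)
{
    return static_cast<uint32_t>(double_bits(d) >> 32);
}
```

Taken together, the two results form the pair of 32-bit words that an unpackDouble2x32-style operation returns for one double.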

[Mesa-dev] [PATCH v2 014/103] i965/disasm: align16 DF source regions have a width of 2

2016-10-11 Thread Iago Toral Quiroga
Reviewed-by: Francisco Jerez --- src/mesa/drivers/dri/i965/brw_disasm.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c index 5e51be7..1d2a4d2 100644 ---

[Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-10-11 Thread Iago Toral Quiroga
immediates Iago Toral Quiroga (92): i965/vec4/nir: Add bit-size information to types i965/vec4/nir: support doubles in ALU operations i965/vec4/nir: fix emitting 64-bit immediates i965/vec4: add double/float conversion pseudo-opcodes i965/vec4: translate d2f/f2d i965: fix subnr overflow

[Mesa-dev] [PATCH 3/3] i965/vec4: make offset() work in terms of a simd width and scalar components

2016-10-04 Thread Iago Toral Quiroga
So that it has the same semantics as the scalar backend implementation. The helper will now take a simd width (which is always 8 in vec4 mode) and step as many scalar components as specified by that width, respecting the size of the scalar channels. --- src/mesa/drivers/dri/i965/brw_ir_vec4.h

[Mesa-dev] [PATCH 1/3] i965/vec4: add a byte_offset helper

2016-10-04 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_ir_vec4.h | 46 + 1 file changed, 46 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h b/src/mesa/drivers/dri/i965/brw_ir_vec4.h index a8e5f4a..ef79e33 100644 --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h +++

[Mesa-dev] [PATCH 2/3] i965/vec4: use byte_offset() instead of offset()

2016-10-04 Thread Iago Toral Quiroga
In a later patch we want to change the semantics of offset() to be in terms of SIMD width and scalar channels so it is consistent with the definition of the same helper in the scalar backend. However, some uses of offset() in the vec4 backend do not operate naturally in terms of these semantics.

[Mesa-dev] [PATCH 0/3] i965/vec4: Make offset() work in terms of width and scalar channels

2016-10-04 Thread Iago Toral Quiroga
-August/125057.html Iago Toral Quiroga (3): i965/vec4: add a byte_offset helper i965/vec4: use byte_offset() instead of offset() i965/vec4: make offset() work in terms of a simd width and scalar components src/mesa/drivers/dri/i965/brw_ir_vec4.h| 58 -- src

[Mesa-dev] [PATCH v2] i965/vec4: add a SIMD lowering pass

2016-08-29 Thread Iago Toral Quiroga
Generally, instructions in Align16 mode only ever write to a single register and don't need any form of SIMD splitting, that's why we have never had a SIMD splitting pass in the vec4 backend. However, double-precision instructions typically write 2 registers and in some cases they run into certain

[Mesa-dev] [PATCH] i965/vec4: remove the generator hack for dual instanced GS

2016-08-26 Thread Iago Toral Quiroga
This hack was introduced in commit 03ac2c7223f7645e3: i965/gs: Fix up gl_PointSize input swizzling for DUAL_INSTANCED gs Specifically to fixup the code we emitted to deal with gl_PointSize inputs in dual instance mode, where we were emitting a MOV to copy the point size from .w (where the

[Mesa-dev] [PATCH 1/4] i965: move subreg_offset to backend_reg

2016-08-23 Thread Iago Toral Quiroga
So we can access it in the vec4 backend to handle byte offsets into registers. --- src/mesa/drivers/dri/i965/brw_ir_fs.h | 6 -- src/mesa/drivers/dri/i965/brw_shader.h | 6 ++ 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h

[Mesa-dev] [PATCH v3] i965/vec4: make offset() operate in terms of channels instead of full registers

2016-08-23 Thread Iago Toral Quiroga
This will make it more consistent with the FS implementation of the same helper and will provide more flexibility that will come in handy, for example, when we add a SIMD lowering pass in the vec4 backend. v2: - Move the switch statement to an add_byte_offset helper that takes a pointer to a

[Mesa-dev] [PATCH v2] i965/vec4: make offset() operate in terms of channels instead of full registers

2016-08-23 Thread Iago Toral Quiroga
This will make it more consistent with the FS implementation of the same helper and will provide more flexibility that will come in handy, for example, when we add a SIMD lowering pass in the vec4 backend. v2: - Move the switch statement to add_byte_offset (Iago) - Remove the assert on the

[Mesa-dev] [PATCH 0/4] i965/vec4: Changes to the offset() helper

2016-08-22 Thread Iago Toral Quiroga
semantics of the offset() helper. The align16/fp64 series should be rebased on top of these changes too. Iago Toral Quiroga (4): i965: move subreg_offset to backend_reg i965/vec4: make the offset() operate in terms of width and type i965/vec4/cse: adapt to changes in offset() helper i965/vec4

[Mesa-dev] [PATCH 3/4] i965/vec4/cse: adapt to changes in offset() helper

2016-08-22 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp index 0c1f0c3..eaf95c8 100644 ---

[Mesa-dev] [PATCH 4/4] i965/vec4: adapt to changes in the offset() helper

2016-08-22 Thread Iago Toral Quiroga
This commit should be squashed with the previous one. --- src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp | 12 +++- src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp | 8 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 2 +-

[Mesa-dev] [PATCH 2/4] i965/vec4: make the offset() operate in terms of width and type

2016-08-22 Thread Iago Toral Quiroga
This will make it more consistent with the FS implementation of the same helper and will provide more flexibility that will come in handy, for example, when we add a SIMD lowering pass in the vec4 backend. --- src/mesa/drivers/dri/i965/brw_ir_vec4.h | 47 ++--- 1 file

[Mesa-dev] [PATCH 56/95] i965/vec4: fix regs_written for doubles

2016-07-19 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index 265bb17..ae8704a 100644 ---

[Mesa-dev] [PATCH 53/95] i965/disasm: fix subreg for dst in Align16 mode

2016-07-19 Thread Iago Toral Quiroga
There is a single bit for this, so it is a binary 0 or 1 meaning offset 0B or 16B respectively. --- src/mesa/drivers/dri/i965/brw_disasm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c index

[Mesa-dev] [PATCH 44/95] i965/vec4: teach CSE about exec_size, group and doubles

2016-07-19 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 30 -- 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp index 0c1f0c3..d1bd9fa 100644 ---

[Mesa-dev] [PATCH 51/95] i965/vec4: add a sanity check for force_vstride0

2016-07-19 Thread Iago Toral Quiroga
We only set this to true when fixing up 64bit regions and for one specific purpose only, so check that nothing else sets this to true. This helped me find a bug where the field was incorrectly initialized to true in some cases. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++ 1 file changed, 3

[Mesa-dev] [PATCH 58/95] i965/vec4: fix indentation in pack_uniform_registers

2016-07-19 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 95b408e..68efea6 100644 ---

[Mesa-dev] [PATCH 45/95] i965/vec4: split double-precision bcsel

2016-07-19 Thread Iago Toral Quiroga
There is a hardware bug affecting compressed double-precision bcsel instructions in Align16 mode whereby they won't read the predication mask properly, leading to incorrect behavior at least in non-uniform control flow scenarios. The bug does not affect other predicated instructions and it does not

[Mesa-dev] [PATCH 35/95] i965/vec4: fix optimize predicate for doubles

2016-07-19 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index c9b8edf..d7c6bf4 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++

[Mesa-dev] [PATCH 49/95] i965/vec4: implement access to DF source components Z/W

2016-07-19 Thread Iago Toral Quiroga
The general idea is that with 32-bit swizzles we cannot address DF components Z/W directly, so instead we select the region that starts at the middle of the SIMD register and use X/Y swizzles. The above, however, has the caveat that we can't do that without violating register region restrictions

[Mesa-dev] [PATCH 55/95] i965/vec4: teach register coalescing about 64-bit

2016-07-19 Thread Iago Toral Quiroga
Specifically, at least for now, we don't want to deal with the fact that channel sizes for fp64 instructions are twice the size, so prevent coalescing from instructions with a different type size. Also, we should check that if we are coalescing a register from another MOV we should be reading the

[Mesa-dev] [PATCH 52/95] i965/vec4: print subnr in dump_instruction()

2016-07-19 Thread Iago Toral Quiroga
Also, we use reg_offset=1 with DF uniforms when we try to access components Z/W, so print reg_offset for them too. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp

[Mesa-dev] [PATCH 59/95] i965/vec4: Skip swizzle to subnr in 3src instructions with DF operands

2016-07-19 Thread Iago Toral Quiroga
We make scalar sources in 3src instructions use subnr instead of swizzles because they don't really use swizzles. With doubles it is more complicated because we use vstride=0 in more scenarios in which they don't produce scalar regions. Also RepCtrl is not allowed with 64-bit operands, so we

[Mesa-dev] [PATCH 43/95] i965/disasm: print NibCtrl for instructions with execsize 4

2016-07-19 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_disasm.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c index c8bdeab..d5e9916 100644 --- a/src/mesa/drivers/dri/i965/brw_disasm.c +++

[Mesa-dev] [PATCH 54/95] i965/vec4: fix regs_read() for doubles

2016-07-19 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 9400baa..a366548 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++

[Mesa-dev] [PATCH 39/95] i965/vec4: dump the instruction execution size

2016-07-19 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index c55d594..8316691 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++

[Mesa-dev] [PATCH 57/95] i965/vec4: fix pack_uniform_registers for doubles

2016-07-19 Thread Iago Toral Quiroga
We need to consider the fact that dvec3/4 require two vec4 slots. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 1b190ab..95b408e
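
As a rough illustration of the slot arithmetic mentioned above, the sketch below counts how many 16-byte vec4 slots a vector needs. The function name `vec4_slots_for` is hypothetical and is not taken from the patch or from Mesa; it only models the size calculation, assuming one vec4 slot holds four 32-bit channels.

```cpp
#include <cassert>

// Hypothetical helper (not Mesa's API): number of 16-byte vec4 slots
// needed by a vector of `components` elements of `bit_size` bits each.
unsigned vec4_slots_for(unsigned components, unsigned bit_size)
{
   assert(components >= 1 && components <= 4);
   assert(bit_size == 32 || bit_size == 64);

   const unsigned bytes = components * (bit_size / 8);
   return (bytes + 15) / 16;   // round up to whole 16-byte slots
}
```

Under this model a double or dvec2 still fits in one slot, while a dvec3 (24 bytes) and a dvec4 (32 bytes) both round up to two.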

[Mesa-dev] [PATCH 11/95] i965/vec4: add double/float conversion pseudo-opcodes

2016-07-19 Thread Iago Toral Quiroga
the weirdness of the double register allocation... Signed-off-by: Connor Abbott <connor.w.abb...@intel.com> Signed-off-by: Iago Toral Quiroga <ito...@igalia.com> --- src/mesa/drivers/dri/i965/brw_defines.h | 2 ++ src/mesa/drivers/dri/i965/brw_shader.cpp | 4 +++ src/

[Mesa-dev] [PATCH 27/95] i965/vec4: make opt_vector_float ignore doubles

2016-07-19 Thread Iago Toral Quiroga
The pass does not support doubles in its current form. I'm not even sure that it should, since it would basically change the type of the operation and that could have implications for things like SSBO writes, etc. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 + 1 file changed, 1 insertion(+)

[Mesa-dev] [PATCH 18/95] i965/vec4: add VEC4_OPCODE_PICK_{LOW, HIGH}_32BIT opcodes

2016-07-19 Thread Iago Toral Quiroga
These opcodes will pick the low/high 32-bit in each 64-bit data element using Align1 mode. We will use this, for example, to implement things like unpackDouble2x32. We can't do this in Align16 because we would need data to cross the vec4 boundary. --- src/mesa/drivers/dri/i965/brw_defines.h | 2
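
A scalar model of what picking the low and high halves means for a single 64-bit element may help here. This is only a CPU-side sketch: the function names are made up for illustration, it assumes IEEE 754 doubles, and it says nothing about how the Align1 generator code is actually written.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>

static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes 64-bit doubles");

// Scalar stand-ins for the PICK_LOW/PICK_HIGH idea: select one 32-bit
// half of a 64-bit element.
uint32_t pick_low_32bit(uint64_t bits)  { return static_cast<uint32_t>(bits & 0xFFFFFFFFu); }
uint32_t pick_high_32bit(uint64_t bits) { return static_cast<uint32_t>(bits >> 32); }

// GLSL-style unpackDouble2x32: first = least significant 32 bits,
// second = most significant 32 bits.
std::pair<uint32_t, uint32_t> unpack_double_2x32(double d)
{
   uint64_t bits;
   std::memcpy(&bits, &d, sizeof bits);   // reinterpret without aliasing issues
   return { pick_low_32bit(bits), pick_high_32bit(bits) };
}

// Usage: 1.0 is encoded as 0x3FF0000000000000.
void check_unpack_example()
{
   const std::pair<uint32_t, uint32_t> halves = unpack_double_2x32(1.0);
   assert(halves.first == 0u);
   assert(halves.second == 0x3FF00000u);
}
```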

[Mesa-dev] [PATCH 33/95] i965/vec4: implement d2b

2016-07-19 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 18 ++ 1 file changed, 18 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 1525a3d..4014020 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++

[Mesa-dev] [PATCH 38/95] i965/vec4: allow the vec4 IR to indicate the execution size of instructions

2016-07-19 Thread Iago Toral Quiroga
In the vec4 backend the generator sets the execution size for all instructions to 8. However, we will have to split certain DF instructions to have an execution size of 4, so we need to indicate this explicitly in the IR for the generator to set the right execution size for them. We will use this

[Mesa-dev] [PATCH 07/95] i965/vec4/nir: set the right type for 64-bit registers

2016-07-19 Thread Iago Toral Quiroga
From: Connor Abbott --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index df927e7..095a27d 100644 ---

[Mesa-dev] [PATCH 25/95] i965/vec4: fix base offset for nir_registers with doubles

2016-07-19 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index cf35f2e..fde7b60 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++

[Mesa-dev] [PATCH 46/95] i965/vec4: add a scalarization pass for double-precision instructions

2016-07-19 Thread Iago Toral Quiroga
The hardware only supports 32-bit swizzles, which means that a swizzle like XYZW only selects channels XY of a DF, making access to channels ZW more difficult, especially considering the various regioning restrictions imposed by the hardware. The combination of both things makes handling random
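
To make the channel arithmetic concrete, here is a deliberately simplified model that treats one dvec4 as eight consecutive 32-bit channels split into two 16-byte halves. The struct and function names are hypothetical, and the real SIMD4x2 register layout and regioning restrictions are not modeled; the sketch only shows why a four-channel 32-bit swizzle can reach DF components X and Y but not Z and W within the same half.

```cpp
#include <cassert>

// Where a 64-bit (DF) channel lives in the simplified model.
struct DfChannelLocation {
   unsigned half;   // 0: first 16 bytes, 1: second 16 bytes
   unsigned lo32;   // 32-bit channel holding the low dword (0 = X ... 3 = W)
   unsigned hi32;   // 32-bit channel holding the high dword
};

// Hypothetical helper: map DF channel 0..3 (X, Y, Z, W) to the pair of
// 32-bit channels that a 32-bit swizzle would have to select.
DfChannelLocation locate_df_channel(unsigned df_channel)
{
   assert(df_channel < 4);
   const unsigned within_half = df_channel % 2;
   return { df_channel / 2, within_half * 2, within_half * 2 + 1 };
}
```

In this model DF X maps to 32-bit XY and DF Y maps to 32-bit ZW of the first half, while DF Z and W map to the same 32-bit channels of the second half, which is why they cannot be named by a 32-bit swizzle alone.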

[Mesa-dev] [PATCH 23/95] i965/vec4: implement double packing

2016-07-19 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 82bf927..dd06a32 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++

[Mesa-dev] [PATCH 30/95] i965/vec4: add helpers for conversions to/from doubles

2016-07-19 Thread Iago Toral Quiroga
Use these helpers to implement d2f and f2d. We will reuse these helpers when we implement things like d2i or i2d as well. --- src/mesa/drivers/dri/i965/brw_vec4.h | 5 +++ src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 53 +++--- 2 files changed, 38 insertions(+), 20

[Mesa-dev] [PATCH 31/95] i965/vec4: implement hardware workaround for align16 double to float conversion

2016-07-19 Thread Iago Toral Quiroga
From the BDW PRM, Workarounds chapter: "DF->f format conversion for Align16 has wrong emask calculation when source is immediate." So detect the case and move the immediate source to a VGRF before we attempt the conversion. Notice that Broadwell and later are strictly scalar at the

[Mesa-dev] [PATCH 29/95] i965/vec4: Rename DF to/from F generator opcodes

2016-07-19 Thread Iago Toral Quiroga
The opcodes are not specific to conversions to/from float since we need the same for conversions to/from other 32-bit types. Rename the opcodes accordingly and change the asserts to check the size of the types involved instead. --- src/mesa/drivers/dri/i965/brw_defines.h | 4

[Mesa-dev] [PATCH 20/95] i965/vec4: Fix DCE for VEC4_OPCODE_SET_{LOW, HIGH}_32BIT

2016-07-19 Thread Iago Toral Quiroga
These opcodes do partial writes of 64-bit data. The problem is that we intend to use them to write on the same register to implement packDouble2x32 and, from the point of view of DCE, since both opcodes write to the same register, only the last one stands, so it decides to eliminate the first, which
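
The snippet is cut off, but a scalar model shows why neither partial write is dead: each one preserves the other half of the destination, so the final value depends on both. The names below are illustrative stand-ins, not the generator's implementation, and the sketch assumes IEEE 754 doubles.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes 64-bit doubles");

// Partial writes: each replaces one 32-bit half and keeps the other.
uint64_t set_low_32bit(uint64_t dst, uint32_t v)
{
   return (dst & 0xFFFFFFFF00000000ull) | v;
}

uint64_t set_high_32bit(uint64_t dst, uint32_t v)
{
   return (dst & 0x00000000FFFFFFFFull) | (static_cast<uint64_t>(v) << 32);
}

// GLSL-style packDouble2x32 built from the two partial writes.
double pack_double_2x32(uint32_t lo, uint32_t hi)
{
   uint64_t bits = 0;
   bits = set_low_32bit(bits, lo);    // dropping this write would lose `lo`
   bits = set_high_32bit(bits, hi);
   double d;
   std::memcpy(&d, &bits, sizeof d);
   return d;
}

// Usage: low = 0, high = 0x3FF00000 encodes 1.0.
void check_pack_example()
{
   assert(pack_double_2x32(0u, 0x3FF00000u) == 1.0);
}
```

A dead-code pass that only tracks "last write to this register" would treat the first write as overwritten, which is the situation the patch title says it fixes.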
