Hi Nicolai,
On 13/10/16 01:50 AM, Nicolai Hähnle wrote:
> Module: Mesa
> Branch: master
> Commit: f5f3cadca3809952288e3726ed5fde22090dc61d
> URL:
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=f5f3cadca3809952288e3726ed5fde22090dc61d
>
> Author: Nicolai Hähnle
> Date: Fri Oct 7 12:49:36 2016 +0200
>
> st/glsl_to_tgsi: simpler fixup of empty writemasks

This change broke the piglit tests
spec@glsl-110@execution@variable-indexing@vs-temp-array-mat2-index(-col)-wr
on my Kaveri. The output with R600_DEBUG=ps,vs is attached as
vs-temp-array-mat2-index-wr.txt.
P.S. The newly enabled tests
spec@arb_enhanced_layouts@execution@component-layout@vs-tcs-load-output(-indirect)
also fail; their output is attached as vs-tcs-load-output.stderr.
--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
VERT
PROPERTY NEXT_SHADER FRAG
DCL IN[0]
DCL OUT[0], POSITION
DCL OUT[1], COLOR
DCL CONST[0..9]
DCL TEMP[0], LOCAL
DCL TEMP[1..2], ARRAY(1), LOCAL
DCL TEMP[3..8], ARRAY(2), LOCAL
DCL TEMP[9..10], ARRAY(3), LOCAL
DCL TEMP[11..12], ARRAY(4), LOCAL
DCL TEMP[13..14], LOCAL
DCL ADDR[0]
IMM[0] FLT32 {0., 0., 1., 0.}
IMM[1] INT32 {2, 0, 0, 0}
0: MUL TEMP[0], CONST[6], IN[0].
1: MAD TEMP[0], CONST[7], IN[0]., TEMP[0]
2: MAD TEMP[0], CONST[8], IN[0]., TEMP[0]
3: MAD TEMP[0], CONST[9], IN[0]., TEMP[0]
4: MOV TEMP[1], IMM[0].
5: MOV TEMP[2], IMM[0].
6: MOV TEMP[3].xy, TEMP[1].xyxx
7: MOV TEMP[4].xy, TEMP[2].xyxx
8: MOV TEMP[9], IMM[0].
9: MOV TEMP[10], IMM[0].
10: MOV TEMP[5].xy, TEMP[9].xyxx
11: MOV TEMP[6].xy, TEMP[10].xyxx
12: MOV TEMP[11], IMM[0].
13: MOV TEMP[12], IMM[0].
14: MOV TEMP[7].xy, TEMP[11].xyxx
15: MOV TEMP[8].xy, TEMP[12].xyxx
16: UMUL TEMP[13].x, CONST[4]., IMM[1].
17: UARL ADDR[0].x, TEMP[13].
18: MOV TEMP[ADDR[0].x+3](2).xy, CONST[0].xyxx
19: UARL ADDR[0].x, TEMP[13].
20: MOV TEMP[ADDR[0].x+4](2).xy, CONST[1].xyxx
21: UMUL TEMP[13].x, CONST[4]., IMM[1].
22: UARL ADDR[0].x, TEMP[13].
23: MOV TEMP[ADDR[0].x+4](2).xy, CONST[5].xyxx
24: UMUL TEMP[13].x, CONST[4]., IMM[1].
25: UMUL TEMP[14].x, CONST[4]., IMM[1].
26: UARL ADDR[0].x, TEMP[14].
27: MUL TEMP[14].xy, TEMP[ADDR[0].x+3](2).xyyy, CONST[2].
28: UARL ADDR[0].x, TEMP[13].
29: MAD TEMP[13].xy, TEMP[ADDR[0].x+4](2).xyyy, CONST[2]., TEMP[14].xyyy
30: ADD TEMP[13].xy, TEMP[13].xyyy, -CONST[3].xyyy
31: DP2 TEMP[13].x, TEMP[13].xyyy, TEMP[13].xyyy
32: FSLT TEMP[13].x, TEMP[13]., IMM[0].
33: UIF TEMP[13]. :0
34: MOV TEMP[13], IMM[0].xzxz
35: ELSE :0
36: MOV TEMP[13], IMM[0].zxxz
37: ENDIF
38: MOV OUT[0], TEMP[0]
39: MOV OUT[1], TEMP[13]
40: END
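
For readers following the dump above, here is a minimal sketch of how the indirect stores in instructions 16-23 appear to pick their destination registers. It is not Mesa code: the function name and constants are hypothetical, and it assumes that the stripped swizzles on CONST[4] and IMM[1] select the .x components (the array index and the value 2), so that each mat2 element of ARRAY(2) occupies two TEMP registers.

```python
# Hypothetical model of the address computation in instructions 16-23 above.
# Assumptions: CONST[4].x holds the array index and IMM[1].x == 2.

ARRAY2_FIRST, ARRAY2_LAST = 3, 8  # from "DCL TEMP[3..8], ARRAY(2), LOCAL"
REGS_PER_MAT2 = 2                 # one TEMP register per mat2 column


def temp_register(index, column):
    """Return the TEMP register selected by TEMP[ADDR[0].x + 3 + column](2)."""
    addr = index * REGS_PER_MAT2        # UMUL TEMP[13].x, then UARL ADDR[0].x
    reg = addr + ARRAY2_FIRST + column  # +3 addresses column 0, +4 column 1
    if not ARRAY2_FIRST <= reg <= ARRAY2_LAST:
        raise IndexError(f"TEMP[{reg}] is outside ARRAY(2)")
    return reg


# Print the register pair for each of the three mat2 elements in the array.
for i in range(3):
    print(i, temp_register(i, 0), temp_register(i, 1))
```

Under these assumptions, array indices 0 to 2 cover exactly TEMP[3..8], and any larger index would address a register outside the declared array.
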
radeonsi: Compiling shader 1
TGSI shader LLVM IR:
; ModuleID = 'tgsi'
source_filename = "tgsi"
target triple = "amdgcn--"
define amdgpu_vs <{ float, float, float }> @main([17 x <16 x i8>] addrspace(2)*
byval dereferenceable(18446744073709551615), [16 x <16 x i8>] addrspace(2)*
byval dereferenceable(18446744073709551615), [32 x <8 x i32>] addrspace(2)*
byval dereferenceable(18446744073709551615), [16 x <8 x i32>] addrspace(2)*
byval dereferenceable(18446744073709551615), [16 x <4 x i32>] addrspace(2)*
byval dereferenceable(18446744073709551615), [16 x <16 x i8>] addrspace(2)*
byval dereferenceable(18446744073709551615), i32 inreg, i32 inreg, i32 inreg,
i32 inreg, i32, i32, i32, i32, i32) {
main_body:
%15 = getelementptr [16 x <16 x i8>], [16 x <16 x i8>] addrspace(2)* %5, i64
0, i64 0, !amdgpu.uniform !0
%16 = load <16 x i8>, <16 x i8> addrspace(2)* %15, align 16, !invariant.load
!0
%17 = call <4 x float> @llvm.SI.vs.load.input(<16 x i8> %16, i32 0, i32 %14)
%18 = extractelement <4 x float> %17, i32 0
%19 = extractelement <4 x float> %17, i32 1
%20 = extractelement <4 x float> %17, i32 2
%21 = extractelement <4 x float> %17, i32 3
%22 = getelementptr [16 x <16 x i8>], [16 x <16 x i8>] addrspace(2)* %1, i64
0, i64 0, !amdgpu.uniform !0
%23 = load <16 x i8>, <16 x i8> addrspace(2)* %22, align 16, !invariant.load
!0
%24 = call float @llvm.SI.load.const(<16 x i8> %23, i32 96)
%25 = fmul float %24, %18
%26 = call float @llvm.SI.load.const(<16 x i8> %23, i32 100)
%27 = fmul float %26, %18
%28 = call float @llvm.SI.load.const(<16 x i8> %23, i32 104)
%29 = fmul float %28, %18
%30 = getelementptr [16 x <16 x i8>], [16 x <16 x i8>] addrspace(2)* %1, i64
0, i64 0, !amdgpu.uniform !0
%31 = load <16 x i8>, <16 x i8> addrspace(2)* %30, align 16, !invariant.load
!0
%32 = call float @llvm.SI.load.const(<16 x i8> %31, i32 108)
%33 = fmul float %32, %18
%34 = call float @llvm.SI.load.const(<16 x i8> %31, i32 112)
%35 = fmul float %34, %19
%36 = fadd float %35, %25
%37 = getelementptr [16 x <16 x i8>], [16 x <16 x i8>] addrspace(2)* %1, i64
0, i64 0, !amdgpu.uniform !0