Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute
On 16.04.2016 19:20, Marek Olšák wrote:
> On Sat, Apr 16, 2016 at 8:04 AM, Michel Dänzer wrote:
>> On 16.04.2016 14:51, Michel Dänzer wrote:
>>> On 16.04.2016 11:39, Tom Stellard wrote:
>>>> The ds_bpermute instruction allows threads to transfer data directly
>>>> to or from the vgprs of other threads. These instructions use the lds
>>>> hardware to transfer data, but do not read or write lds memory.
>>>>
>>>> DDX BEFORE:                        | DDX AFTER:
>>>>                                    |
>>>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   | v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>>>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  | v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>>>> v_lshlrev_b32_e32 v4, 2, v2        | v_and_b32_e32 v2, 0x3ffc, v2
>>>> v_and_b32_e32 v2, -4, v2           | v_lshlrev_b32_e32 v2, 2, v2
>>>> v_lshlrev_b32_e32 v3, 2, v2        | ds_bpermute_b32 v3, v2, v0
>>>> s_mov_b32 m0, -1                   | ds_bpermute_b32 v0, v2, v0 offset:4
>>>> ds_write_b32 v4, v0                | s_waitcnt lgkmcnt(0)
>>>> s_waitcnt lgkmcnt(0)               |
>>>> v_or_b32_e32 v0, 1, v2             |
>>>> v_lshlrev_b32_e32 v0, 2, v0        |
>>>> ds_read_b32 v1, v3                 |
>>>> ds_read_b32 v0, v0                 |
>>>> s_waitcnt lgkmcnt(0)               |
>>>>                                    |
>>>> LDS: 1 blocks                      | LDS: 0 blocks
>>>
>>> Nice.
>>>
>>> Were these intrinsics already available in LLVM 3.6? If not, the old
>>> code needs to be kept for backwards compatibility.
>>
>> I can see now that you're taking care of this for the bpermute
>> intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.
>
> How do you feel about increasing the requirement to LLVM 3.8 for Mesa git?

It's too early for that. IMO we should always support at least two major
releases of LLVM.

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute
On Sat, Apr 16, 2016 at 8:17 PM, Nicolai Hähnle wrote:
> On 16.04.2016 05:20, Marek Olšák wrote:
>> On Sat, Apr 16, 2016 at 8:04 AM, Michel Dänzer wrote:
>>> On 16.04.2016 14:51, Michel Dänzer wrote:
>>>> On 16.04.2016 11:39, Tom Stellard wrote:
>>>>> The ds_bpermute instruction allows threads to transfer data directly
>>>>> to or from the vgprs of other threads. These instructions use the lds
>>>>> hardware to transfer data, but do not read or write lds memory.
>>>>>
>>>>> DDX BEFORE:                        | DDX AFTER:
>>>>>                                    |
>>>>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   | v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>>>>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  | v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>>>>> v_lshlrev_b32_e32 v4, 2, v2        | v_and_b32_e32 v2, 0x3ffc, v2
>>>>> v_and_b32_e32 v2, -4, v2           | v_lshlrev_b32_e32 v2, 2, v2
>>>>> v_lshlrev_b32_e32 v3, 2, v2        | ds_bpermute_b32 v3, v2, v0
>>>>> s_mov_b32 m0, -1                   | ds_bpermute_b32 v0, v2, v0 offset:4
>>>>> ds_write_b32 v4, v0                | s_waitcnt lgkmcnt(0)
>>>>> s_waitcnt lgkmcnt(0)               |
>>>>> v_or_b32_e32 v0, 1, v2             |
>>>>> v_lshlrev_b32_e32 v0, 2, v0        |
>>>>> ds_read_b32 v1, v3                 |
>>>>> ds_read_b32 v0, v0                 |
>>>>> s_waitcnt lgkmcnt(0)               |
>>>>>                                    |
>>>>> LDS: 1 blocks                      | LDS: 0 blocks
>>>>
>>>> Nice.
>>>>
>>>> Were these intrinsics already available in LLVM 3.6? If not, the old
>>>> code needs to be kept for backwards compatibility.
>>>
>>> I can see now that you're taking care of this for the bpermute
>>> intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.
>>
>> How do you feel about increasing the requirement to LLVM 3.8 for Mesa git?
>
> Please no.
>
> In addition to Gentoo and Arch mentioned by Ilia, Ubuntu also still ships
> 3.7. This will change soon enough, but even then, we should give people a
> few months to update.
>
> Let's not scare people away too much.

OK. Sounds good.

Marek
Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute
On 16.04.2016 05:20, Marek Olšák wrote:
> On Sat, Apr 16, 2016 at 8:04 AM, Michel Dänzer wrote:
>> On 16.04.2016 14:51, Michel Dänzer wrote:
>>> On 16.04.2016 11:39, Tom Stellard wrote:
>>>> The ds_bpermute instruction allows threads to transfer data directly
>>>> to or from the vgprs of other threads. These instructions use the lds
>>>> hardware to transfer data, but do not read or write lds memory.
>>>>
>>>> DDX BEFORE:                        | DDX AFTER:
>>>>                                    |
>>>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   | v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>>>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  | v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>>>> v_lshlrev_b32_e32 v4, 2, v2        | v_and_b32_e32 v2, 0x3ffc, v2
>>>> v_and_b32_e32 v2, -4, v2           | v_lshlrev_b32_e32 v2, 2, v2
>>>> v_lshlrev_b32_e32 v3, 2, v2        | ds_bpermute_b32 v3, v2, v0
>>>> s_mov_b32 m0, -1                   | ds_bpermute_b32 v0, v2, v0 offset:4
>>>> ds_write_b32 v4, v0                | s_waitcnt lgkmcnt(0)
>>>> s_waitcnt lgkmcnt(0)               |
>>>> v_or_b32_e32 v0, 1, v2             |
>>>> v_lshlrev_b32_e32 v0, 2, v0        |
>>>> ds_read_b32 v1, v3                 |
>>>> ds_read_b32 v0, v0                 |
>>>> s_waitcnt lgkmcnt(0)               |
>>>>                                    |
>>>> LDS: 1 blocks                      | LDS: 0 blocks
>>>
>>> Nice.
>>>
>>> Were these intrinsics already available in LLVM 3.6? If not, the old
>>> code needs to be kept for backwards compatibility.
>>
>> I can see now that you're taking care of this for the bpermute
>> intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.
>
> How do you feel about increasing the requirement to LLVM 3.8 for Mesa git?

Please no.

In addition to Gentoo and Arch mentioned by Ilia, Ubuntu also still ships
3.7. This will change soon enough, but even then, we should give people a
few months to update.

Let's not scare people away too much.

Nicolai
Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute
On Sat, Apr 16, 2016 at 10:36 AM, Marek Olšák wrote:
> On Sat, Apr 16, 2016 at 3:28 PM, Roland Scheidegger wrote:
>> On 16.04.2016 at 15:19, eocallag...@alterapraxis.com wrote:
>>> On 2016-04-16 20:20, Marek Olšák wrote:
>>>> On Sat, Apr 16, 2016 at 8:04 AM, Michel Dänzer wrote:
>>>>> On 16.04.2016 14:51, Michel Dänzer wrote:
>>>>>> On 16.04.2016 11:39, Tom Stellard wrote:
>>>>>>> The ds_bpermute instruction allows threads to transfer data directly
>>>>>>> to or from the vgprs of other threads. These instructions use the lds
>>>>>>> hardware to transfer data, but do not read or write lds memory.
>>>>>>>
>>>>>>> DDX BEFORE:                        | DDX AFTER:
>>>>>>>                                    |
>>>>>>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   | v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>>>>>>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  | v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>>>>>>> v_lshlrev_b32_e32 v4, 2, v2        | v_and_b32_e32 v2, 0x3ffc, v2
>>>>>>> v_and_b32_e32 v2, -4, v2           | v_lshlrev_b32_e32 v2, 2, v2
>>>>>>> v_lshlrev_b32_e32 v3, 2, v2        | ds_bpermute_b32 v3, v2, v0
>>>>>>> s_mov_b32 m0, -1                   | ds_bpermute_b32 v0, v2, v0 offset:4
>>>>>>> ds_write_b32 v4, v0                | s_waitcnt lgkmcnt(0)
>>>>>>> s_waitcnt lgkmcnt(0)               |
>>>>>>> v_or_b32_e32 v0, 1, v2             |
>>>>>>> v_lshlrev_b32_e32 v0, 2, v0        |
>>>>>>> ds_read_b32 v1, v3                 |
>>>>>>> ds_read_b32 v0, v0                 |
>>>>>>> s_waitcnt lgkmcnt(0)               |
>>>>>>>                                    |
>>>>>>> LDS: 1 blocks                      | LDS: 0 blocks
>>>>>>
>>>>>> Nice.
>>>>>>
>>>>>> Were these intrinsics already available in LLVM 3.6? If not, the old
>>>>>> code needs to be kept for backwards compatibility.
>>>>>
>>>>> I can see now that you're taking care of this for the bpermute
>>>>> intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.
>>>>
>>>> How do you feel about increasing the requirement to LLVM 3.8 for Mesa
>>>> git?
>>>
>>> +1 from me. Supporting more than two generations of LLVM is a bit much
>>> to carry imho.
>>
>> You don't want to support any released version which is older than one
>> month?
>> (This isn't an objection, just a remark...)
>
> Life's hard. Sometimes we have to make hard choices. :)
>
> Now seriously, LLVM 3.7 enables OpenGL 4.0-4.1 and LLVM 3.8 enables
> immediate shader compilation (without recompilations) for radeonsi.
> I'll let others assess how important those two are.

From a practical standpoint, gentoo and arch are shipping LLVM 3.7.1
by default. It may cause a bunch of frustration for people if you
require 3.8. I don't actually build or use radeonsi, just pointing out
some potentially pertinent facts.

Cheers,

  -ilia
Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute
On Sat, Apr 16, 2016 at 3:28 PM, Roland Scheidegger wrote:
> On 16.04.2016 at 15:19, eocallag...@alterapraxis.com wrote:
>> On 2016-04-16 20:20, Marek Olšák wrote:
>>> On Sat, Apr 16, 2016 at 8:04 AM, Michel Dänzer wrote:
>>>> On 16.04.2016 14:51, Michel Dänzer wrote:
>>>>> On 16.04.2016 11:39, Tom Stellard wrote:
>>>>>> The ds_bpermute instruction allows threads to transfer data directly
>>>>>> to or from the vgprs of other threads. These instructions use the lds
>>>>>> hardware to transfer data, but do not read or write lds memory.
>>>>>>
>>>>>> DDX BEFORE:                        | DDX AFTER:
>>>>>>                                    |
>>>>>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   | v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>>>>>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  | v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>>>>>> v_lshlrev_b32_e32 v4, 2, v2        | v_and_b32_e32 v2, 0x3ffc, v2
>>>>>> v_and_b32_e32 v2, -4, v2           | v_lshlrev_b32_e32 v2, 2, v2
>>>>>> v_lshlrev_b32_e32 v3, 2, v2        | ds_bpermute_b32 v3, v2, v0
>>>>>> s_mov_b32 m0, -1                   | ds_bpermute_b32 v0, v2, v0 offset:4
>>>>>> ds_write_b32 v4, v0                | s_waitcnt lgkmcnt(0)
>>>>>> s_waitcnt lgkmcnt(0)               |
>>>>>> v_or_b32_e32 v0, 1, v2             |
>>>>>> v_lshlrev_b32_e32 v0, 2, v0        |
>>>>>> ds_read_b32 v1, v3                 |
>>>>>> ds_read_b32 v0, v0                 |
>>>>>> s_waitcnt lgkmcnt(0)               |
>>>>>>                                    |
>>>>>> LDS: 1 blocks                      | LDS: 0 blocks
>>>>>
>>>>> Nice.
>>>>>
>>>>> Were these intrinsics already available in LLVM 3.6? If not, the old
>>>>> code needs to be kept for backwards compatibility.
>>>>
>>>> I can see now that you're taking care of this for the bpermute
>>>> intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.
>>>
>>> How do you feel about increasing the requirement to LLVM 3.8 for Mesa
>>> git?
>>
>> +1 from me. Supporting more than two generations of LLVM is a bit much
>> to carry imho.
>
> You don't want to support any released version which is older than one
> month?
> (This isn't an objection, just a remark...)

Life's hard. Sometimes we have to make hard choices. :)

Now seriously, LLVM 3.7 enables OpenGL 4.0-4.1 and LLVM 3.8 enables
immediate shader compilation (without recompilations) for radeonsi.
I'll let others assess how important those two are.

Marek
Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute
On 16.04.2016 at 15:19, eocallag...@alterapraxis.com wrote:
> On 2016-04-16 20:20, Marek Olšák wrote:
>> On Sat, Apr 16, 2016 at 8:04 AM, Michel Dänzer wrote:
>>> On 16.04.2016 14:51, Michel Dänzer wrote:
>>>> On 16.04.2016 11:39, Tom Stellard wrote:
>>>>> The ds_bpermute instruction allows threads to transfer data directly
>>>>> to or from the vgprs of other threads. These instructions use the lds
>>>>> hardware to transfer data, but do not read or write lds memory.
>>>>>
>>>>> DDX BEFORE:                        | DDX AFTER:
>>>>>                                    |
>>>>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   | v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>>>>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  | v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>>>>> v_lshlrev_b32_e32 v4, 2, v2        | v_and_b32_e32 v2, 0x3ffc, v2
>>>>> v_and_b32_e32 v2, -4, v2           | v_lshlrev_b32_e32 v2, 2, v2
>>>>> v_lshlrev_b32_e32 v3, 2, v2        | ds_bpermute_b32 v3, v2, v0
>>>>> s_mov_b32 m0, -1                   | ds_bpermute_b32 v0, v2, v0 offset:4
>>>>> ds_write_b32 v4, v0                | s_waitcnt lgkmcnt(0)
>>>>> s_waitcnt lgkmcnt(0)               |
>>>>> v_or_b32_e32 v0, 1, v2             |
>>>>> v_lshlrev_b32_e32 v0, 2, v0        |
>>>>> ds_read_b32 v1, v3                 |
>>>>> ds_read_b32 v0, v0                 |
>>>>> s_waitcnt lgkmcnt(0)               |
>>>>>                                    |
>>>>> LDS: 1 blocks                      | LDS: 0 blocks
>>>>
>>>> Nice.
>>>>
>>>> Were these intrinsics already available in LLVM 3.6? If not, the old
>>>> code needs to be kept for backwards compatibility.
>>>
>>> I can see now that you're taking care of this for the bpermute
>>> intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.
>>
>> How do you feel about increasing the requirement to LLVM 3.8 for Mesa
>> git?
>
> +1 from me. Supporting more than two generations of LLVM is a bit much
> to carry imho.

You don't want to support any released version which is older than one
month?
(This isn't an objection, just a remark...)

Roland
Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute
On 2016-04-16 20:20, Marek Olšák wrote:
> On Sat, Apr 16, 2016 at 8:04 AM, Michel Dänzer wrote:
>> On 16.04.2016 14:51, Michel Dänzer wrote:
>>> On 16.04.2016 11:39, Tom Stellard wrote:
>>>> The ds_bpermute instruction allows threads to transfer data directly
>>>> to or from the vgprs of other threads. These instructions use the lds
>>>> hardware to transfer data, but do not read or write lds memory.
>>>>
>>>> DDX BEFORE:                        | DDX AFTER:
>>>>                                    |
>>>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   | v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>>>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  | v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>>>> v_lshlrev_b32_e32 v4, 2, v2        | v_and_b32_e32 v2, 0x3ffc, v2
>>>> v_and_b32_e32 v2, -4, v2           | v_lshlrev_b32_e32 v2, 2, v2
>>>> v_lshlrev_b32_e32 v3, 2, v2        | ds_bpermute_b32 v3, v2, v0
>>>> s_mov_b32 m0, -1                   | ds_bpermute_b32 v0, v2, v0 offset:4
>>>> ds_write_b32 v4, v0                | s_waitcnt lgkmcnt(0)
>>>> s_waitcnt lgkmcnt(0)               |
>>>> v_or_b32_e32 v0, 1, v2             |
>>>> v_lshlrev_b32_e32 v0, 2, v0        |
>>>> ds_read_b32 v1, v3                 |
>>>> ds_read_b32 v0, v0                 |
>>>> s_waitcnt lgkmcnt(0)               |
>>>>                                    |
>>>> LDS: 1 blocks                      | LDS: 0 blocks
>>>
>>> Nice.
>>>
>>> Were these intrinsics already available in LLVM 3.6? If not, the old
>>> code needs to be kept for backwards compatibility.
>>
>> I can see now that you're taking care of this for the bpermute
>> intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.
>
> How do you feel about increasing the requirement to LLVM 3.8 for Mesa git?

+1 from me. Supporting more than two generations of LLVM is a bit much
to carry imho.

> Marek
Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute
On Sat, Apr 16, 2016 at 8:04 AM, Michel Dänzer wrote:
> On 16.04.2016 14:51, Michel Dänzer wrote:
>> On 16.04.2016 11:39, Tom Stellard wrote:
>>> The ds_bpermute instruction allows threads to transfer data directly
>>> to or from the vgprs of other threads. These instructions use the lds
>>> hardware to transfer data, but do not read or write lds memory.
>>>
>>> DDX BEFORE:                        | DDX AFTER:
>>>                                    |
>>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   | v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  | v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>>> v_lshlrev_b32_e32 v4, 2, v2        | v_and_b32_e32 v2, 0x3ffc, v2
>>> v_and_b32_e32 v2, -4, v2           | v_lshlrev_b32_e32 v2, 2, v2
>>> v_lshlrev_b32_e32 v3, 2, v2        | ds_bpermute_b32 v3, v2, v0
>>> s_mov_b32 m0, -1                   | ds_bpermute_b32 v0, v2, v0 offset:4
>>> ds_write_b32 v4, v0                | s_waitcnt lgkmcnt(0)
>>> s_waitcnt lgkmcnt(0)               |
>>> v_or_b32_e32 v0, 1, v2             |
>>> v_lshlrev_b32_e32 v0, 2, v0        |
>>> ds_read_b32 v1, v3                 |
>>> ds_read_b32 v0, v0                 |
>>> s_waitcnt lgkmcnt(0)               |
>>>                                    |
>>> LDS: 1 blocks                      | LDS: 0 blocks
>>
>> Nice.
>>
>> Were these intrinsics already available in LLVM 3.6? If not, the old
>> code needs to be kept for backwards compatibility.
>
> I can see now that you're taking care of this for the bpermute
> intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.

How do you feel about increasing the requirement to LLVM 3.8 for Mesa git?

Marek
Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute
On 16.04.2016 14:51, Michel Dänzer wrote:
> On 16.04.2016 11:39, Tom Stellard wrote:
>> The ds_bpermute instruction allows threads to transfer data directly
>> to or from the vgprs of other threads. These instructions use the lds
>> hardware to transfer data, but do not read or write lds memory.
>>
>> DDX BEFORE:                        | DDX AFTER:
>>                                    |
>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   | v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  | v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>> v_lshlrev_b32_e32 v4, 2, v2        | v_and_b32_e32 v2, 0x3ffc, v2
>> v_and_b32_e32 v2, -4, v2           | v_lshlrev_b32_e32 v2, 2, v2
>> v_lshlrev_b32_e32 v3, 2, v2        | ds_bpermute_b32 v3, v2, v0
>> s_mov_b32 m0, -1                   | ds_bpermute_b32 v0, v2, v0 offset:4
>> ds_write_b32 v4, v0                | s_waitcnt lgkmcnt(0)
>> s_waitcnt lgkmcnt(0)               |
>> v_or_b32_e32 v0, 1, v2             |
>> v_lshlrev_b32_e32 v0, 2, v0        |
>> ds_read_b32 v1, v3                 |
>> ds_read_b32 v0, v0                 |
>> s_waitcnt lgkmcnt(0)               |
>>                                    |
>> LDS: 1 blocks                      | LDS: 0 blocks
>
> Nice.
>
> Were these intrinsics already available in LLVM 3.6? If not, the old
> code needs to be kept for backwards compatibility.

I can see now that you're taking care of this for the bpermute
intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
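For readers following the version discussion, the guard the patch puts around
the bpermute path can be sketched in isolation. This is only an illustration:
the condition (LLVM >= 0x0309 and family >= CHIP_TONGA) is taken from the
patch, while choose_ddxy_path(), the ddxy_path enum and the
chip_family_sketch enum with its values are hypothetical names invented here
so the snippet compiles on its own.

```c
#include <assert.h>

/* Hypothetical stand-in for the driver's chip family enum. Only the relative
 * order matters; the values are made up for this sketch. */
enum chip_family_sketch {
	SKETCH_CHIP_TAHITI,
	SKETCH_CHIP_BONAIRE,
	SKETCH_CHIP_TONGA,	/* lowest family the patch lets use bpermute */
	SKETCH_CHIP_FIJI
};

/* Hypothetical labels for the two code paths shown in the BEFORE/AFTER dump. */
enum ddxy_path {
	DDXY_PATH_LDS,		/* ds_write_b32 + ds_read_b32 through LDS */
	DDXY_PATH_BPERMUTE	/* ds_bpermute_b32, no LDS allocation */
};

/* llvm_version uses the 0xMMmm encoding seen in the patch (0x0309 is 3.9).
 * Both conditions must hold, otherwise the old LDS path has to stay. */
enum ddxy_path choose_ddxy_path(unsigned llvm_version,
				enum chip_family_sketch family)
{
	assert(llvm_version != 0);

	if (llvm_version >= 0x0309 && family >= SKETCH_CHIP_TONGA)
		return DDXY_PATH_BPERMUTE;

	return DDXY_PATH_LDS;
}
```

With this shape, an LLVM 3.7 or 3.8 build keeps using the LDS path on every
chip, which is the backwards-compatibility property being asked about here.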
Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute
On 16.04.2016 11:39, Tom Stellard wrote:
> The ds_bpermute instruction allows threads to transfer data directly
> to or from the vgprs of other threads. These instructions use the lds
> hardware to transfer data, but do not read or write lds memory.
>
> DDX BEFORE:                        | DDX AFTER:
>                                    |
> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   | v_mbcnt_lo_u32_b32_e64 v2, -1, 0
> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  | v_mbcnt_hi_u32_b32_e64 v2, -1, v2
> v_lshlrev_b32_e32 v4, 2, v2        | v_and_b32_e32 v2, 0x3ffc, v2
> v_and_b32_e32 v2, -4, v2           | v_lshlrev_b32_e32 v2, 2, v2
> v_lshlrev_b32_e32 v3, 2, v2        | ds_bpermute_b32 v3, v2, v0
> s_mov_b32 m0, -1                   | ds_bpermute_b32 v0, v2, v0 offset:4
> ds_write_b32 v4, v0                | s_waitcnt lgkmcnt(0)
> s_waitcnt lgkmcnt(0)               |
> v_or_b32_e32 v0, 1, v2             |
> v_lshlrev_b32_e32 v0, 2, v0        |
> ds_read_b32 v1, v3                 |
> ds_read_b32 v0, v0                 |
> s_waitcnt lgkmcnt(0)               |
>                                    |
> LDS: 1 blocks                      | LDS: 0 blocks

Nice.

Were these intrinsics already available in LLVM 3.6? If not, the old
code needs to be kept for backwards compatibility.

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
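The lane-to-lane read described in the patch summary can be modelled on the
CPU, which may help when reading the AFTER column. The sketch below is a
software model only, not driver code. It assumes a 64-lane wavefront, that
lanes are grouped into 2x2 quads with lane & 3 ordered top-left, top-right,
bottom-left, bottom-right, that TID_MASK_TOP_LEFT is 0xfffffffc, and that
ds_bpermute_b32 selects its source lane as ((address + offset) / 4) mod 64.
model_ds_bpermute() and model_ddx() are hypothetical names, and the EXEC mask
is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WAVE_SIZE 64			/* assumed lanes per wavefront */
#define TID_MASK_TOP_LEFT 0xfffffffcu	/* assumed value: clears the two low lane bits */

/* Model of ds_bpermute_b32: lane i reads src[] from the lane selected by its
 * own byte address. No memory is written, data only moves between lanes. */
void model_ds_bpermute(uint32_t dst[WAVE_SIZE], const uint32_t addr[WAVE_SIZE],
		       const uint32_t src[WAVE_SIZE], uint32_t offset)
{
	uint32_t tmp[WAVE_SIZE];

	for (int i = 0; i < WAVE_SIZE; i++)
		tmp[i] = src[((addr[i] + offset) / 4) % WAVE_SIZE];
	memcpy(dst, tmp, sizeof(tmp));
}

/* Coarse DDX: every lane gets (top-right value) - (top-left value) of its quad. */
void model_ddx(float out[WAVE_SIZE], const float in[WAVE_SIZE])
{
	uint32_t addr[WAVE_SIZE], bits[WAVE_SIZE], tl[WAVE_SIZE], tr[WAVE_SIZE];

	/* Bitcast float to 32-bit integers, like the LLVMBuildBitCast in the patch. */
	memcpy(bits, in, sizeof(bits));

	/* Byte address of the quad's top-left lane: (tid & mask) * 4. */
	for (uint32_t i = 0; i < WAVE_SIZE; i++)
		addr[i] = (i & TID_MASK_TOP_LEFT) * 4;

	model_ds_bpermute(tl, addr, bits, 0);	/* ds_bpermute_b32 v3, v2, v0 */
	model_ds_bpermute(tr, addr, bits, 4);	/* ds_bpermute_b32 v0, v2, v0 offset:4 */

	for (int i = 0; i < WAVE_SIZE; i++) {
		float a, b;

		memcpy(&a, &tl[i], sizeof(a));
		memcpy(&b, &tr[i], sizeof(b));
		out[i] = b - a;
	}
}
```

For DDY the same model would use an offset of 8 bytes (two lanes) instead of
4, matching the idx of 2 that the patch uses for the "next Y pixel".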
[Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute
The ds_bpermute instruction allows threads to transfer data directly
to or from the vgprs of other threads. These instructions use the lds
hardware to transfer data, but do not read or write lds memory.

DDX BEFORE:                        | DDX AFTER:
                                   |
v_mbcnt_lo_u32_b32_e64 v2, -1, 0   | v_mbcnt_lo_u32_b32_e64 v2, -1, 0
v_mbcnt_hi_u32_b32_e64 v2, -1, v2  | v_mbcnt_hi_u32_b32_e64 v2, -1, v2
v_lshlrev_b32_e32 v4, 2, v2        | v_and_b32_e32 v2, 0x3ffc, v2
v_and_b32_e32 v2, -4, v2           | v_lshlrev_b32_e32 v2, 2, v2
v_lshlrev_b32_e32 v3, 2, v2        | ds_bpermute_b32 v3, v2, v0
s_mov_b32 m0, -1                   | ds_bpermute_b32 v0, v2, v0 offset:4
ds_write_b32 v4, v0                | s_waitcnt lgkmcnt(0)
s_waitcnt lgkmcnt(0)               |
v_or_b32_e32 v0, 1, v2             |
v_lshlrev_b32_e32 v0, 2, v0        |
ds_read_b32 v1, v3                 |
ds_read_b32 v0, v0                 |
s_waitcnt lgkmcnt(0)               |
                                   |
LDS: 1 blocks                      | LDS: 0 blocks
---
 src/gallium/drivers/radeonsi/si_shader.c | 51 +---
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 377ff26..c3d03eb 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4117,22 +4117,22 @@ static void si_llvm_emit_ddxy(
 	LLVMValueRef indices[2];
 	LLVMValueRef store_ptr, load_ptr0, load_ptr1;
 	LLVMValueRef tl, trbl, result[4];
-	LLVMValueRef tid_args[2];
+	LLVMValueRef tl_tid, trbl_tid, tid, tid_args[2];
 	unsigned swizzle[4];
 	unsigned c;
 	int idx;
 	unsigned mask;
-
 	tid_args[0] = lp_build_const_int32(gallivm, 0x);
 	tid_args[1] = bld_base->uint_bld.zero;
 	tid_args[1] = lp_build_intrinsic(gallivm->builder, "llvm.amdgcn.mbcnt.lo", ctx->i32,
 					tid_args, 2, LLVMReadNoneAttribute);
-
-	indices[0] = bld_base->uint_bld.zero;
-	indices[1] = lp_build_intrinsic(gallivm->builder,
+	tid = lp_build_intrinsic(gallivm->builder,
 				"llvm.amdgcn.mbcnt.hi", ctx->i32,
 				tid_args, 2, LLVMReadNoneAttribute);
+
+	indices[0] = bld_base->uint_bld.zero;
+	indices[1] = tid;
 	store_ptr = LLVMBuildGEP(gallivm->builder, ctx->lds,
 				indices, 2, "");
 
@@ -4143,20 +4143,24 @@ static void si_llvm_emit_ddxy(
 	else
 		mask = TID_MASK_TOP_LEFT;
 
-	indices[1] = LLVMBuildAnd(gallivm->builder, indices[1],
-				lp_build_const_int32(gallivm, mask), "");
+	tl_tid = LLVMBuildAnd(gallivm->builder, indices[1],
+				lp_build_const_int32(gallivm, mask), "");
+	indices[1] = tl_tid;
 	load_ptr0 = LLVMBuildGEP(gallivm->builder, ctx->lds,
 				indices, 2, "");
 
 	/* for DDX we want to next X pixel, DDY next Y pixel. */
 	idx = (opcode == TGSI_OPCODE_DDX || opcode == TGSI_OPCODE_DDX_FINE) ? 1 : 2;
-	indices[1] = LLVMBuildAdd(gallivm->builder, indices[1],
+	trbl_tid = LLVMBuildAdd(gallivm->builder, indices[1],
 				lp_build_const_int32(gallivm, idx), "");
+	indices[1] = trbl_tid;
 	load_ptr1 = LLVMBuildGEP(gallivm->builder, ctx->lds,
 				indices, 2, "");
 
 	for (c = 0; c < 4; ++c) {
 		unsigned i;
+		LLVMValueRef val;
+		LLVMValueRef args[2];
 
 		swizzle[c] = tgsi_util_get_full_src_register_swizzle(&inst->Src[0], c);
 		for (i = 0; i < c; ++i) {
@@ -4168,18 +4172,31 @@ static void si_llvm_emit_ddxy(
 		if (i != c)
 			continue;
 
-		LLVMBuildStore(gallivm->builder,
-			LLVMBuildBitCast(gallivm->builder,
-				lp_build_emit_fetch(bld_base, inst, 0, c),
-				ctx->i32, ""),
-			store_ptr);
+		val = LLVMBuildBitCast(gallivm->builder,
+				lp_build_emit_fetch(bld_base, inst, 0, c),
+				ctx->i32, "");
 
-		tl = LLVMBuildLoad(gallivm->builder, load_ptr0, "");
-		tl = LLVMBuildBitCast(gallivm->builder, tl, ctx->f32, "");
+		if ((HAVE_LLVM >= 0x0309) && ctx->screen->b.family >= CHIP_TONGA) {
 
-		trbl = LLVMBuildLoad(gallivm->builder, load_ptr1, "");
-		trbl = LLVMBuildBitCast(gallivm->builder, trbl, ctx->f32, "");
+			args[0] = LLVMBuildMul(gallivm->builder, tl_tid,
+