Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute

2016-04-17 Thread Michel Dänzer
On 16.04.2016 19:20, Marek Olšák wrote:
> On Sat, Apr 16, 2016 at 8:04 AM, Michel Dänzer wrote:
>> On 16.04.2016 14:51, Michel Dänzer wrote:
>>> On 16.04.2016 11:39, Tom Stellard wrote:
>>>> The ds_bpermute instruction allows threads to transfer data directly
>>>> to or from the vgprs of other threads.  These instructions use the lds
>>>> hardware to transfer data, but do not read or write lds memory.
>>>>
>>>> DDX BEFORE:                        |  DDX AFTER:
>>>>                                    |
>>>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   |  v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>>>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  |  v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>>>> v_lshlrev_b32_e32 v4, 2, v2        |  v_and_b32_e32 v2, 0x3ffc, v2
>>>> v_and_b32_e32 v2, -4, v2           |  v_lshlrev_b32_e32 v2, 2, v2
>>>> v_lshlrev_b32_e32 v3, 2, v2        |  ds_bpermute_b32 v3, v2, v0
>>>> s_mov_b32 m0, -1                   |  ds_bpermute_b32 v0, v2, v0 offset:4
>>>> ds_write_b32 v4, v0                |  s_waitcnt lgkmcnt(0)
>>>> s_waitcnt lgkmcnt(0)               |
>>>> v_or_b32_e32 v0, 1, v2             |
>>>> v_lshlrev_b32_e32 v0, 2, v0        |
>>>> ds_read_b32 v1, v3                 |
>>>> ds_read_b32 v0, v0                 |
>>>> s_waitcnt lgkmcnt(0)               |
>>>>                                    |
>>>> LDS: 1 blocks                      |  LDS: 0 blocks
>>>
>>> Nice.
>>>
>>>
>>> Were these intrinsics already available in LLVM 3.6? If not, the old
>>> code needs to be kept for backwards compatibility.
>>
>> I can see now that you're taking care of this for the bpermute
>> intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.
> 
> How do you feel about increasing the requirement to LLVM 3.8 for Mesa git?

It's too early for that. IMO we should always support at least two major
releases of LLVM.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute

2016-04-16 Thread Marek Olšák
On Sat, Apr 16, 2016 at 8:17 PM, Nicolai Hähnle wrote:
> On 16.04.2016 05:20, Marek Olšák wrote:
>>
>> On Sat, Apr 16, 2016 at 8:04 AM, Michel Dänzer wrote:
>>>
>>> On 16.04.2016 14:51, Michel Dänzer wrote:
>>>>
>>>> On 16.04.2016 11:39, Tom Stellard wrote:
>>>>>
>>>>> The ds_bpermute instruction allows threads to transfer data directly
>>>>> to or from the vgprs of other threads.  These instructions use the lds
>>>>> hardware to transfer data, but do not read or write lds memory.
>>>>>
>>>>> DDX BEFORE:                        |  DDX AFTER:
>>>>>                                    |
>>>>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   |  v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>>>>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  |  v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>>>>> v_lshlrev_b32_e32 v4, 2, v2        |  v_and_b32_e32 v2, 0x3ffc, v2
>>>>> v_and_b32_e32 v2, -4, v2           |  v_lshlrev_b32_e32 v2, 2, v2
>>>>> v_lshlrev_b32_e32 v3, 2, v2        |  ds_bpermute_b32 v3, v2, v0
>>>>> s_mov_b32 m0, -1                   |  ds_bpermute_b32 v0, v2, v0 offset:4
>>>>> ds_write_b32 v4, v0                |  s_waitcnt lgkmcnt(0)
>>>>> s_waitcnt lgkmcnt(0)               |
>>>>> v_or_b32_e32 v0, 1, v2             |
>>>>> v_lshlrev_b32_e32 v0, 2, v0        |
>>>>> ds_read_b32 v1, v3                 |
>>>>> ds_read_b32 v0, v0                 |
>>>>> s_waitcnt lgkmcnt(0)               |
>>>>>                                    |
>>>>> LDS: 1 blocks                      |  LDS: 0 blocks
>>>>
>>>>
>>>> Nice.
>>>>
>>>>
>>>> Were these intrinsics already available in LLVM 3.6? If not, the old
>>>> code needs to be kept for backwards compatibility.
>>>
>>>
>>> I can see now that you're taking care of this for the bpermute
>>> intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.
>>
>>
>> How do you feel about increasing the requirement to LLVM 3.8 for Mesa git?
>
>
> Please no.
>
> In addition to Gentoo and Arch mentioned by Ilia, Ubuntu also still ships
> 3.7. This will change soon enough, but even then, we should give people a
> few months to update.
>
> Let's not scare people away too much.

OK. Sounds good.

Marek


Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute

2016-04-16 Thread Nicolai Hähnle

On 16.04.2016 05:20, Marek Olšák wrote:
> On Sat, Apr 16, 2016 at 8:04 AM, Michel Dänzer wrote:
>> On 16.04.2016 14:51, Michel Dänzer wrote:
>>> On 16.04.2016 11:39, Tom Stellard wrote:
>>>> The ds_bpermute instruction allows threads to transfer data directly
>>>> to or from the vgprs of other threads.  These instructions use the lds
>>>> hardware to transfer data, but do not read or write lds memory.
>>>>
>>>> DDX BEFORE:                        |  DDX AFTER:
>>>>                                    |
>>>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   |  v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>>>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  |  v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>>>> v_lshlrev_b32_e32 v4, 2, v2        |  v_and_b32_e32 v2, 0x3ffc, v2
>>>> v_and_b32_e32 v2, -4, v2           |  v_lshlrev_b32_e32 v2, 2, v2
>>>> v_lshlrev_b32_e32 v3, 2, v2        |  ds_bpermute_b32 v3, v2, v0
>>>> s_mov_b32 m0, -1                   |  ds_bpermute_b32 v0, v2, v0 offset:4
>>>> ds_write_b32 v4, v0                |  s_waitcnt lgkmcnt(0)
>>>> s_waitcnt lgkmcnt(0)               |
>>>> v_or_b32_e32 v0, 1, v2             |
>>>> v_lshlrev_b32_e32 v0, 2, v0        |
>>>> ds_read_b32 v1, v3                 |
>>>> ds_read_b32 v0, v0                 |
>>>> s_waitcnt lgkmcnt(0)               |
>>>>                                    |
>>>> LDS: 1 blocks                      |  LDS: 0 blocks
>>>
>>> Nice.
>>>
>>>
>>> Were these intrinsics already available in LLVM 3.6? If not, the old
>>> code needs to be kept for backwards compatibility.
>>
>> I can see now that you're taking care of this for the bpermute
>> intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.
>
> How do you feel about increasing the requirement to LLVM 3.8 for Mesa git?


Please no.

In addition to Gentoo and Arch mentioned by Ilia, Ubuntu also still 
ships 3.7. This will change soon enough, but even then, we should give 
people a few months to update.


Let's not scare people away too much.

Nicolai


Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute

2016-04-16 Thread Ilia Mirkin
On Sat, Apr 16, 2016 at 10:36 AM, Marek Olšák wrote:
> On Sat, Apr 16, 2016 at 3:28 PM, Roland Scheidegger wrote:
>> On 16.04.2016 at 15:19, eocallag...@alterapraxis.com wrote:
>>> On 2016-04-16 20:20, Marek Olšák wrote:
>>>> On Sat, Apr 16, 2016 at 8:04 AM, Michel Dänzer wrote:
>>>>> On 16.04.2016 14:51, Michel Dänzer wrote:
>>>>>> On 16.04.2016 11:39, Tom Stellard wrote:
>>>>>>> The ds_bpermute instruction allows threads to transfer data directly
>>>>>>> to or from the vgprs of other threads.  These instructions use the lds
>>>>>>> hardware to transfer data, but do not read or write lds memory.
>>>>>>>
>>>>>>> DDX BEFORE:                        |  DDX AFTER:
>>>>>>>                                    |
>>>>>>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   |  v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>>>>>>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  |  v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>>>>>>> v_lshlrev_b32_e32 v4, 2, v2        |  v_and_b32_e32 v2, 0x3ffc, v2
>>>>>>> v_and_b32_e32 v2, -4, v2           |  v_lshlrev_b32_e32 v2, 2, v2
>>>>>>> v_lshlrev_b32_e32 v3, 2, v2        |  ds_bpermute_b32 v3, v2, v0
>>>>>>> s_mov_b32 m0, -1                   |  ds_bpermute_b32 v0, v2, v0 offset:4
>>>>>>> ds_write_b32 v4, v0                |  s_waitcnt lgkmcnt(0)
>>>>>>> s_waitcnt lgkmcnt(0)               |
>>>>>>> v_or_b32_e32 v0, 1, v2             |
>>>>>>> v_lshlrev_b32_e32 v0, 2, v0        |
>>>>>>> ds_read_b32 v1, v3                 |
>>>>>>> ds_read_b32 v0, v0                 |
>>>>>>> s_waitcnt lgkmcnt(0)               |
>>>>>>>                                    |
>>>>>>> LDS: 1 blocks                      |  LDS: 0 blocks
>>>>>>
>>>>>> Nice.
>>>>>>
>>>>>>
>>>>>> Were these intrinsics already available in LLVM 3.6? If not, the old
>>>>>> code needs to be kept for backwards compatibility.
>>>>>
>>>>> I can see now that you're taking care of this for the bpermute
>>>>> intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.
>>>>
>>>> How do you feel about increasing the requirement to LLVM 3.8 for Mesa git?
>>>
>>> +1 from me. Supporting more than two generations of LLVM is a bit much
>>> to carry imho.
>>>
>>
>> You don't want to support any released version which is older than one
>> month?
>> (This isn't an objection, just a remark...)
>
> Life's hard. Sometimes we have to make hard choices. :)
>
> Now seriously, LLVM 3.7 enables OpenGL 4.0-4.1 and LLVM 3.8 enables
> immediate shader compilation (without recompilations) for radeonsi.
> I'll let others assess how important those two are.

From a practical standpoint, gentoo and arch are shipping LLVM 3.7.1
by default. It may cause a bunch of frustration for people if you
require 3.8. I don't actually build or use radeonsi, just pointing out
some potentially pertinent facts.

Cheers,

  -ilia


Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute

2016-04-16 Thread Marek Olšák
On Sat, Apr 16, 2016 at 3:28 PM, Roland Scheidegger wrote:
> On 16.04.2016 at 15:19, eocallag...@alterapraxis.com wrote:
>> On 2016-04-16 20:20, Marek Olšák wrote:
>>> On Sat, Apr 16, 2016 at 8:04 AM, Michel Dänzer wrote:
>>>> On 16.04.2016 14:51, Michel Dänzer wrote:
>>>>> On 16.04.2016 11:39, Tom Stellard wrote:
>>>>>> The ds_bpermute instruction allows threads to transfer data directly
>>>>>> to or from the vgprs of other threads.  These instructions use the lds
>>>>>> hardware to transfer data, but do not read or write lds memory.
>>>>>>
>>>>>> DDX BEFORE:                        |  DDX AFTER:
>>>>>>                                    |
>>>>>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   |  v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>>>>>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  |  v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>>>>>> v_lshlrev_b32_e32 v4, 2, v2        |  v_and_b32_e32 v2, 0x3ffc, v2
>>>>>> v_and_b32_e32 v2, -4, v2           |  v_lshlrev_b32_e32 v2, 2, v2
>>>>>> v_lshlrev_b32_e32 v3, 2, v2        |  ds_bpermute_b32 v3, v2, v0
>>>>>> s_mov_b32 m0, -1                   |  ds_bpermute_b32 v0, v2, v0 offset:4
>>>>>> ds_write_b32 v4, v0                |  s_waitcnt lgkmcnt(0)
>>>>>> s_waitcnt lgkmcnt(0)               |
>>>>>> v_or_b32_e32 v0, 1, v2             |
>>>>>> v_lshlrev_b32_e32 v0, 2, v0        |
>>>>>> ds_read_b32 v1, v3                 |
>>>>>> ds_read_b32 v0, v0                 |
>>>>>> s_waitcnt lgkmcnt(0)               |
>>>>>>                                    |
>>>>>> LDS: 1 blocks                      |  LDS: 0 blocks
>>>>>
>>>>> Nice.
>>>>>
>>>>>
>>>>> Were these intrinsics already available in LLVM 3.6? If not, the old
>>>>> code needs to be kept for backwards compatibility.
>>>>
>>>> I can see now that you're taking care of this for the bpermute
>>>> intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.
>>>
>>> How do you feel about increasing the requirement to LLVM 3.8 for Mesa
>>> git?
>>
>> +1 from me. Supporting more than two generations of LLVM is a bit much
>> to carry imho.
>>
>
> You don't want to support any released version which is older than one
> month?
> (This isn't an objection, just a remark...)

Life's hard. Sometimes we have to make hard choices. :)

Now seriously, LLVM 3.7 enables OpenGL 4.0-4.1 and LLVM 3.8 enables
immediate shader compilation (without recompilations) for radeonsi.
I'll let others assess how important those two are.

Marek


Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute

2016-04-16 Thread Roland Scheidegger
On 16.04.2016 at 15:19, eocallag...@alterapraxis.com wrote:
> On 2016-04-16 20:20, Marek Olšák wrote:
>> On Sat, Apr 16, 2016 at 8:04 AM, Michel Dänzer wrote:
>>> On 16.04.2016 14:51, Michel Dänzer wrote:
>>>> On 16.04.2016 11:39, Tom Stellard wrote:
>>>>> The ds_bpermute instruction allows threads to transfer data directly
>>>>> to or from the vgprs of other threads.  These instructions use the lds
>>>>> hardware to transfer data, but do not read or write lds memory.
>>>>>
>>>>> DDX BEFORE:                        |  DDX AFTER:
>>>>>                                    |
>>>>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   |  v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>>>>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  |  v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>>>>> v_lshlrev_b32_e32 v4, 2, v2        |  v_and_b32_e32 v2, 0x3ffc, v2
>>>>> v_and_b32_e32 v2, -4, v2           |  v_lshlrev_b32_e32 v2, 2, v2
>>>>> v_lshlrev_b32_e32 v3, 2, v2        |  ds_bpermute_b32 v3, v2, v0
>>>>> s_mov_b32 m0, -1                   |  ds_bpermute_b32 v0, v2, v0 offset:4
>>>>> ds_write_b32 v4, v0                |  s_waitcnt lgkmcnt(0)
>>>>> s_waitcnt lgkmcnt(0)               |
>>>>> v_or_b32_e32 v0, 1, v2             |
>>>>> v_lshlrev_b32_e32 v0, 2, v0        |
>>>>> ds_read_b32 v1, v3                 |
>>>>> ds_read_b32 v0, v0                 |
>>>>> s_waitcnt lgkmcnt(0)               |
>>>>>                                    |
>>>>> LDS: 1 blocks                      |  LDS: 0 blocks
>>>>
>>>> Nice.
>>>>
>>>>
>>>> Were these intrinsics already available in LLVM 3.6? If not, the old
>>>> code needs to be kept for backwards compatibility.
>>>
>>> I can see now that you're taking care of this for the bpermute
>>> intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.
>>
>> How do you feel about increasing the requirement to LLVM 3.8 for Mesa
>> git?
> 
> +1 from me. Supporting more than two generations of LLVM is a bit much
> to carry imho.
> 

You don't want to support any released version which is older than one
month?
(This isn't an objection, just a remark...)

Roland




Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute

2016-04-16 Thread eocallaghan

On 2016-04-16 20:20, Marek Olšák wrote:
> On Sat, Apr 16, 2016 at 8:04 AM, Michel Dänzer wrote:
>> On 16.04.2016 14:51, Michel Dänzer wrote:
>>> On 16.04.2016 11:39, Tom Stellard wrote:
>>>> The ds_bpermute instruction allows threads to transfer data directly
>>>> to or from the vgprs of other threads.  These instructions use the lds
>>>> hardware to transfer data, but do not read or write lds memory.
>>>>
>>>> DDX BEFORE:                        |  DDX AFTER:
>>>>                                    |
>>>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   |  v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>>>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  |  v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>>>> v_lshlrev_b32_e32 v4, 2, v2        |  v_and_b32_e32 v2, 0x3ffc, v2
>>>> v_and_b32_e32 v2, -4, v2           |  v_lshlrev_b32_e32 v2, 2, v2
>>>> v_lshlrev_b32_e32 v3, 2, v2        |  ds_bpermute_b32 v3, v2, v0
>>>> s_mov_b32 m0, -1                   |  ds_bpermute_b32 v0, v2, v0 offset:4
>>>> ds_write_b32 v4, v0                |  s_waitcnt lgkmcnt(0)
>>>> s_waitcnt lgkmcnt(0)               |
>>>> v_or_b32_e32 v0, 1, v2             |
>>>> v_lshlrev_b32_e32 v0, 2, v0        |
>>>> ds_read_b32 v1, v3                 |
>>>> ds_read_b32 v0, v0                 |
>>>> s_waitcnt lgkmcnt(0)               |
>>>>                                    |
>>>> LDS: 1 blocks                      |  LDS: 0 blocks
>>>
>>> Nice.
>>>
>>>
>>> Were these intrinsics already available in LLVM 3.6? If not, the old
>>> code needs to be kept for backwards compatibility.
>>
>> I can see now that you're taking care of this for the bpermute
>> intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.
>
> How do you feel about increasing the requirement to LLVM 3.8 for Mesa git?

+1 from me. Supporting more than two generations of LLVM is a bit much
to carry imho.

> Marek


Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute

2016-04-16 Thread Marek Olšák
On Sat, Apr 16, 2016 at 8:04 AM, Michel Dänzer wrote:
> On 16.04.2016 14:51, Michel Dänzer wrote:
>> On 16.04.2016 11:39, Tom Stellard wrote:
>>> The ds_bpermute instruction allows threads to transfer data directly
>>> to or from the vgprs of other threads.  These instructions use the lds
>>> hardware to transfer data, but do not read or write lds memory.
>>>
>>> DDX BEFORE:                        |  DDX AFTER:
>>>                                    |
>>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   |  v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  |  v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>>> v_lshlrev_b32_e32 v4, 2, v2        |  v_and_b32_e32 v2, 0x3ffc, v2
>>> v_and_b32_e32 v2, -4, v2           |  v_lshlrev_b32_e32 v2, 2, v2
>>> v_lshlrev_b32_e32 v3, 2, v2        |  ds_bpermute_b32 v3, v2, v0
>>> s_mov_b32 m0, -1                   |  ds_bpermute_b32 v0, v2, v0 offset:4
>>> ds_write_b32 v4, v0                |  s_waitcnt lgkmcnt(0)
>>> s_waitcnt lgkmcnt(0)               |
>>> v_or_b32_e32 v0, 1, v2             |
>>> v_lshlrev_b32_e32 v0, 2, v0        |
>>> ds_read_b32 v1, v3                 |
>>> ds_read_b32 v0, v0                 |
>>> s_waitcnt lgkmcnt(0)               |
>>>                                    |
>>> LDS: 1 blocks                      |  LDS: 0 blocks
>>
>> Nice.
>>
>>
>> Were these intrinsics already available in LLVM 3.6? If not, the old
>> code needs to be kept for backwards compatibility.
>
> I can see now that you're taking care of this for the bpermute
> intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.

How do you feel about increasing the requirement to LLVM 3.8 for Mesa git?

Marek


Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute

2016-04-16 Thread Michel Dänzer
On 16.04.2016 14:51, Michel Dänzer wrote:
> On 16.04.2016 11:39, Tom Stellard wrote:
>> The ds_bpermute instruction allows threads to transfer data directly
>> to or from the vgprs of other threads.  These instructions use the lds
>> hardware to transfer data, but do not read or write lds memory.
>>
>> DDX BEFORE:                        |  DDX AFTER:
>>                                    |
>> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   |  v_mbcnt_lo_u32_b32_e64 v2, -1, 0
>> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  |  v_mbcnt_hi_u32_b32_e64 v2, -1, v2
>> v_lshlrev_b32_e32 v4, 2, v2        |  v_and_b32_e32 v2, 0x3ffc, v2
>> v_and_b32_e32 v2, -4, v2           |  v_lshlrev_b32_e32 v2, 2, v2
>> v_lshlrev_b32_e32 v3, 2, v2        |  ds_bpermute_b32 v3, v2, v0
>> s_mov_b32 m0, -1                   |  ds_bpermute_b32 v0, v2, v0 offset:4
>> ds_write_b32 v4, v0                |  s_waitcnt lgkmcnt(0)
>> s_waitcnt lgkmcnt(0)               |
>> v_or_b32_e32 v0, 1, v2             |
>> v_lshlrev_b32_e32 v0, 2, v0        |
>> ds_read_b32 v1, v3                 |
>> ds_read_b32 v0, v0                 |
>> s_waitcnt lgkmcnt(0)               |
>>                                    |
>> LDS: 1 blocks                      |  LDS: 0 blocks
> 
> Nice.
> 
> 
> Were these intrinsics already available in LLVM 3.6? If not, the old
> code needs to be kept for backwards compatibility.

I can see now that you're taking care of this for the bpermute
intrinsic, but AFAICT the mbcnt intrinsics were only added in LLVM 3.8.
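
As a rough illustration of the kind of guard this implies, the sketch
below picks a code path from the LLVM version and the chip generation, in
the spirit of the patch's "HAVE_LLVM >= 0x0309" check. LLVM_VERSION_CODE,
enum ddxy_path and choose_ddxy_path are hypothetical names, and the
major-in-high-byte encoding is inferred from the 0x0309 constant rather
than taken from the build system.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>

/* Hypothetical helper: packs a version as 0x0MNN, inferred from 0x0309. */
#define LLVM_VERSION_CODE(major, minor) (((major) << 8) | (minor))

static_assert(LLVM_VERSION_CODE(3, 9) == 0x0309,
              "encoding should match the constant used in the patch");

/* Hypothetical labels for the two ways of fetching neighbouring lanes. */
enum ddxy_path {
    DDXY_PATH_LDS,       /* old path: ds_write + ds_read through LDS memory */
    DDXY_PATH_BPERMUTE   /* new path: ds_bpermute, no LDS allocation */
};

/* Use the new instruction only when both the compiler and the hardware
 * support it; everything else keeps the old code, so builds against an
 * older LLVM still work. */
static enum ddxy_path choose_ddxy_path(int llvm_version, bool is_vi_or_newer)
{
    if (llvm_version >= LLVM_VERSION_CODE(3, 9) && is_vi_or_newer)
        return DDXY_PATH_BPERMUTE;
    return DDXY_PATH_LDS;
}
```

The same pattern would apply to any intrinsic that only exists in a newer
LLVM: the fallback branch has to stay for as long as older releases are
supported.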


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute

2016-04-15 Thread Michel Dänzer
On 16.04.2016 11:39, Tom Stellard wrote:
> The ds_bpermute instruction allows threads to transfer data directly
> to or from the vgprs of other threads.  These instructions use the lds
> hardware to transfer data, but do not read or write lds memory.
> 
> DDX BEFORE:                        |  DDX AFTER:
>                                    |
> v_mbcnt_lo_u32_b32_e64 v2, -1, 0   |  v_mbcnt_lo_u32_b32_e64 v2, -1, 0
> v_mbcnt_hi_u32_b32_e64 v2, -1, v2  |  v_mbcnt_hi_u32_b32_e64 v2, -1, v2
> v_lshlrev_b32_e32 v4, 2, v2        |  v_and_b32_e32 v2, 0x3ffc, v2
> v_and_b32_e32 v2, -4, v2           |  v_lshlrev_b32_e32 v2, 2, v2
> v_lshlrev_b32_e32 v3, 2, v2        |  ds_bpermute_b32 v3, v2, v0
> s_mov_b32 m0, -1                   |  ds_bpermute_b32 v0, v2, v0 offset:4
> ds_write_b32 v4, v0                |  s_waitcnt lgkmcnt(0)
> s_waitcnt lgkmcnt(0)               |
> v_or_b32_e32 v0, 1, v2             |
> v_lshlrev_b32_e32 v0, 2, v0        |
> ds_read_b32 v1, v3                 |
> ds_read_b32 v0, v0                 |
> s_waitcnt lgkmcnt(0)               |
>                                    |
> LDS: 1 blocks                      |  LDS: 0 blocks

Nice.


Were these intrinsics already available in LLVM 3.6? If not, the old
code needs to be kept for backwards compatibility.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


[Mesa-dev] [PATCH 2/2] radeonsi: Implement ddx/ddy on VI using ds_bpermute

2016-04-15 Thread Tom Stellard
The ds_bpermute instruction allows threads to transfer data directly
to or from the vgprs of other threads.  These instructions use the lds
hardware to transfer data, but do not read or write lds memory.

DDX BEFORE:                        |  DDX AFTER:
                                   |
v_mbcnt_lo_u32_b32_e64 v2, -1, 0   |  v_mbcnt_lo_u32_b32_e64 v2, -1, 0
v_mbcnt_hi_u32_b32_e64 v2, -1, v2  |  v_mbcnt_hi_u32_b32_e64 v2, -1, v2
v_lshlrev_b32_e32 v4, 2, v2        |  v_and_b32_e32 v2, 0x3ffc, v2
v_and_b32_e32 v2, -4, v2           |  v_lshlrev_b32_e32 v2, 2, v2
v_lshlrev_b32_e32 v3, 2, v2        |  ds_bpermute_b32 v3, v2, v0
s_mov_b32 m0, -1                   |  ds_bpermute_b32 v0, v2, v0 offset:4
ds_write_b32 v4, v0                |  s_waitcnt lgkmcnt(0)
s_waitcnt lgkmcnt(0)               |
v_or_b32_e32 v0, 1, v2             |
v_lshlrev_b32_e32 v0, 2, v0        |
ds_read_b32 v1, v3                 |
ds_read_b32 v0, v0                 |
s_waitcnt lgkmcnt(0)               |
                                   |
LDS: 1 blocks                      |  LDS: 0 blocks
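
For readers less familiar with the instruction, the following is a rough
software model in C of how the AFTER sequence arrives at a derivative. It
is only a sketch: model_bpermute, model_ddxy and WAVE_SIZE are made-up
names, the mask value is taken from the "v_and_b32_e32 v2, -4, v2" line
above, and the final subtraction (neighbour minus top-left) is an
assumption about how the two fetched values are combined.

```c
#include <assert.h>
#include <stdint.h>

#define WAVE_SIZE 64                    /* lanes modelled per wavefront */
#define TID_MASK_TOP_LEFT 0xfffffffcu   /* -4: first lane of each 2x2 quad */

/* Model of ds_bpermute_b32: every lane reads src[] from the lane selected
 * by its own byte address (lane index * 4). No memory is written. */
static void model_bpermute(const uint32_t addr[WAVE_SIZE],
                           const float src[WAVE_SIZE],
                           float dst[WAVE_SIZE])
{
    for (unsigned lane = 0; lane < WAVE_SIZE; ++lane) {
        unsigned from = (addr[lane] / 4) % WAVE_SIZE;
        dst[lane] = src[from];
    }
}

/* idx = 1 selects the next X pixel (DDX), idx = 2 the next Y pixel (DDY). */
static void model_ddxy(const float val[WAVE_SIZE], unsigned idx,
                       float out[WAVE_SIZE])
{
    uint32_t tl_addr[WAVE_SIZE], trbl_addr[WAVE_SIZE];
    float tl[WAVE_SIZE], trbl[WAVE_SIZE];

    assert(idx == 1 || idx == 2);

    for (unsigned tid = 0; tid < WAVE_SIZE; ++tid) {
        uint32_t tl_tid = tid & TID_MASK_TOP_LEFT;  /* v_and_b32 */
        tl_addr[tid] = tl_tid * 4;                  /* v_lshlrev_b32 ..., 2 */
        trbl_addr[tid] = (tl_tid + idx) * 4;        /* offset:4 when idx == 1 */
    }

    model_bpermute(tl_addr, val, tl);
    model_bpermute(trbl_addr, val, trbl);

    /* Assumed combination step: neighbour minus top-left. */
    for (unsigned tid = 0; tid < WAVE_SIZE; ++tid)
        out[tid] = trbl[tid] - tl[tid];
}
```

With lanes 0..3 of a quad holding 1, 3, 10 and 20, model_ddxy(val, 1, out)
gives 2 for every lane of that quad and model_ddxy(val, 2, out) gives 9.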
---
 src/gallium/drivers/radeonsi/si_shader.c | 51 +---
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 377ff26..c3d03eb 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4117,22 +4117,22 @@ static void si_llvm_emit_ddxy(
LLVMValueRef indices[2];
LLVMValueRef store_ptr, load_ptr0, load_ptr1;
LLVMValueRef tl, trbl, result[4];
-   LLVMValueRef tid_args[2];
+   LLVMValueRef tl_tid, trbl_tid, tid, tid_args[2];
unsigned swizzle[4];
unsigned c;
int idx;
unsigned mask;
-
tid_args[0] = lp_build_const_int32(gallivm, 0xffffffff);
tid_args[1] = bld_base->uint_bld.zero;
tid_args[1] = lp_build_intrinsic(gallivm->builder,
"llvm.amdgcn.mbcnt.lo", ctx->i32,
tid_args, 2, LLVMReadNoneAttribute);
-
-   indices[0] = bld_base->uint_bld.zero;
-   indices[1] = lp_build_intrinsic(gallivm->builder,
+   tid = lp_build_intrinsic(gallivm->builder,
"llvm.amdgcn.mbcnt.hi", ctx->i32,
tid_args, 2, LLVMReadNoneAttribute);
+
+   indices[0] = bld_base->uint_bld.zero;
+   indices[1] = tid;
store_ptr = LLVMBuildGEP(gallivm->builder, ctx->lds,
 indices, 2, "");
 
@@ -4143,20 +4143,24 @@ static void si_llvm_emit_ddxy(
else
mask = TID_MASK_TOP_LEFT;
 
-   indices[1] = LLVMBuildAnd(gallivm->builder, indices[1],
- lp_build_const_int32(gallivm, mask), "");
+   tl_tid = LLVMBuildAnd(gallivm->builder, indices[1],
+   lp_build_const_int32(gallivm, mask), "");
+   indices[1] = tl_tid;
load_ptr0 = LLVMBuildGEP(gallivm->builder, ctx->lds,
 indices, 2, "");
 
/* for DDX we want to next X pixel, DDY next Y pixel. */
idx = (opcode == TGSI_OPCODE_DDX || opcode == TGSI_OPCODE_DDX_FINE) ? 1 : 2;
-   indices[1] = LLVMBuildAdd(gallivm->builder, indices[1],
+   trbl_tid = LLVMBuildAdd(gallivm->builder, indices[1],
  lp_build_const_int32(gallivm, idx), "");
+   indices[1] = trbl_tid;
load_ptr1 = LLVMBuildGEP(gallivm->builder, ctx->lds,
 indices, 2, "");
 
for (c = 0; c < 4; ++c) {
unsigned i;
+   LLVMValueRef val;
+   LLVMValueRef args[2];
 
swizzle[c] = tgsi_util_get_full_src_register_swizzle(&inst->Src[0], c);
for (i = 0; i < c; ++i) {
@@ -4168,18 +4172,31 @@ static void si_llvm_emit_ddxy(
if (i != c)
continue;
 
-   LLVMBuildStore(gallivm->builder,
-  LLVMBuildBitCast(gallivm->builder,
-   lp_build_emit_fetch(bld_base, inst, 0, c),
-   ctx->i32, ""),
-  store_ptr);
+   val = LLVMBuildBitCast(gallivm->builder,
+   lp_build_emit_fetch(bld_base, inst, 0, c),
+   ctx->i32, "");
 
-   tl = LLVMBuildLoad(gallivm->builder, load_ptr0, "");
-   tl = LLVMBuildBitCast(gallivm->builder, tl, ctx->f32, "");
+   if ((HAVE_LLVM >= 0x0309) && ctx->screen->b.family >= CHIP_TONGA) {
 
-   trbl = LLVMBuildLoad(gallivm->builder, load_ptr1, "");
-   trbl = LLVMBuildBitCast(gallivm->builder, trbl, ctx->f32, "");
+   args[0] = LLVMBuildMul(gallivm->builder, tl_tid,
+