Re: [Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.

2017-07-08 Thread Marek Olšák
On Jul 8, 2017 1:59 PM, "Christian König"  wrote:

On 08.07.2017 at 00:27, Marek Olšák wrote:

> On Fri, Jul 7, 2017 at 9:37 PM, Dave Airlie  wrote:
>
>> On 8 July 2017 at 04:07, Christian König  wrote:
>>
>>> On 07.07.2017 at 18:51, Marek Olšák wrote:
>>>
>>>> On Fri, Jul 7, 2017 at 11:18 AM, Christian König
>>>>  wrote:
>>>>
>>>>> What tiling format do the destination textures have?
>>>>>
>>>>> Sounds like the offset is just added so that we distribute memory
>>>>> accesses more equally over memory channels.
>>>>
>>>> You can't set an offset that is not aligned. The hardware ignores the
>>>> low unaligned bits, so they have a different meaning. They specify
>>>> pipe and bank rotation for macro tiling. It's like a state. It
>>>> basically rotates the tile pattern.
>>>
>>> Yeah, I know. That's what I meant by distributing memory accesses more
>>> equally over all channels. The lower bits select a memory bank swizzle
>>> IIRC.
>>>
>>> Years ago I tried on R600 whether shuffling them randomly could improve
>>> performance, but MRT wasn't widely used and/or supported at that time.
>>
>> I'd known this and forgotten, the public CIK docs say bits 0..7 must be
>> zero, but I have older docs which had more info. It would be nice if we
>> could get proper docs released for the bottom bits considering AMD are
>> using them in their drivers.
>>
>
I'm pretty sure AMD released that stuff years ago because I knew of it
before starting to work for AMD.


I think it was first released as addrlib source code. Some people might
have had access to docs under NDA, but it wasn't known publicly. I didn't
know it when I started at AMD.

Marek



> The low 8 bits of the address are unused and can't be set, because
> CB_COLOR0_BASE is shifted by 8 bits. We are really talking about bits
> starting from 8 going higher. E.g. 8K alignment gives you 5 bits that
> can be used to express the rotation.
>
>> It would be good to know what registers have the bits that matter (i.e.
>> BASE, FMASK, CMASK, DCC, and resource descriptors.)
>>
>
The feature to select the memory pipe/bank to start with is implemented in
the MC. So AFAIK all blocks are programmed the same way regarding this.
E.g. you can use it for UVD/VCE as well.



>> Then I suppose we'd need to know the algorithm for programming them, and
>> if we need to make any allocations bigger in order to do so.
>>
>
As far as I understand it you don't need to make anything bigger. Addrlib
makes sure anyway that all pipe/banks are covered by a texture allocation
as soon as you select some tiling mode (linear is obviously an exception).

Regards,
Christian.


>> I expect this only starts to matter when we hit memory bandwidth limits.
>> The deferred demo renders 3 MRTs plus one depth buffer at 2kx2k, then
>> samples from those down to the 1280x720 display. This, combined with a
>> 3-instance, 57k-vertex draw, seemed to be enough to see the pain. (Maybe
>> a GL example doing something similar might show the problem for radeonsi.)
>>
> Addrlib contains the encoding code for the base address pipe/bank bits.
>
>> The other open question I have is whether this just matters for MRT or
>> whether texture sampling also gets some boost from it; my hack patch only
>> does it for surfaces which will end up attached to the CB.
>>
> Yes, it should be done for read-only textures too.
>
>> I'll update the patch to not call it an offset but name them the tile
>> rotation bits.
>>
> The proper name is "tile swizzle" or "pipe/bank swizzle". On gfx9,
> it's called "pipe/bank xor".
>
> Marek
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.

2017-07-08 Thread Christian König

On 08.07.2017 at 00:27, Marek Olšák wrote:
> On Fri, Jul 7, 2017 at 9:37 PM, Dave Airlie  wrote:
>> On 8 July 2017 at 04:07, Christian König  wrote:
>>> On 07.07.2017 at 18:51, Marek Olšák wrote:
>>>> On Fri, Jul 7, 2017 at 11:18 AM, Christian König
>>>>  wrote:
>>>>> What tiling format do the destination textures have?
>>>>>
>>>>> Sounds like the offset is just added so that we distribute memory
>>>>> accesses more equally over memory channels.
>>>>
>>>> You can't set an offset that is not aligned. The hardware ignores the
>>>> low unaligned bits, so they have a different meaning. They specify
>>>> pipe and bank rotation for macro tiling. It's like a state. It
>>>> basically rotates the tile pattern.
>>>
>>> Yeah, I know. That's what I meant by distributing memory accesses more
>>> equally over all channels. The lower bits select a memory bank swizzle IIRC.
>>>
>>> Years ago I tried on R600 whether shuffling them randomly could improve
>>> performance, but MRT wasn't widely used and/or supported at that time.
>>
>> I'd known this and forgotten, the public CIK docs say bits 0..7 must be zero,
>> but I have older docs which had more info. It would be nice if we could get
>> proper docs released for the bottom bits considering AMD are using them in their
>> drivers.

I'm pretty sure AMD released that stuff years ago because I knew of it
before starting to work for AMD.

> The low 8 bits of the address are unused and can't be set, because
> CB_COLOR0_BASE is shifted by 8 bits. We are really talking about bits
> starting from 8 going higher. E.g. 8K alignment gives you 5 bits that
> can be used to express the rotation.
>
>> It would be good to know what registers have the bits that matter (i.e. BASE,
>> FMASK, CMASK, DCC, and resource descriptors.)

The feature to select the memory pipe/bank to start with is implemented
in the MC. So AFAIK all blocks are programmed the same way regarding
this. E.g. you can use it for UVD/VCE as well.

>> Then I suppose we'd need to know the algorithm for programming them, and
>> if we need to make any allocations bigger in order to do so.

As far as I understand it you don't need to make anything bigger.
Addrlib makes sure anyway that all pipe/banks are covered by a texture
allocation as soon as you select some tiling mode (linear is obviously
an exception).

Regards,
Christian.

>> I expect this only starts to matter when we hit memory bandwidth limits.
>> The deferred demo renders 3 MRTs plus one depth buffer at 2kx2k, then
>> samples from those down to the 1280x720 display. This, combined with a
>> 3-instance, 57k-vertex draw, seemed to be enough to see the pain. (Maybe
>> a GL example doing something similar might show the problem for radeonsi.)
>
> Addrlib contains the encoding code for the base address pipe/bank bits.
>
>> The other open question I have is whether this just matters for MRT or
>> whether texture sampling also gets some boost from it; my hack patch only
>> does it for surfaces which will end up attached to the CB.
>
> Yes, it should be done for read-only textures too.
>
>> I'll update the patch to not call it an offset but name them the tile
>> rotation bits.
>
> The proper name is "tile swizzle" or "pipe/bank swizzle". On gfx9,
> it's called "pipe/bank xor".
>
> Marek





Re: [Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.

2017-07-07 Thread Marek Olšák
On Fri, Jul 7, 2017 at 9:37 PM, Dave Airlie  wrote:
> On 8 July 2017 at 04:07, Christian König  wrote:
>> On 07.07.2017 at 18:51, Marek Olšák wrote:
>>>
>>> On Fri, Jul 7, 2017 at 11:18 AM, Christian König
>>>  wrote:

>>>> What tiling format do the destination textures have?
>>>>
>>>> Sounds like the offset is just added so that we distribute memory
>>>> accesses more equally over memory channels.
>>>
>>> You can't set an offset that is not aligned. The hardware ignores the
>>> low unaligned bits, so they have a different meaning. They specify
>>> pipe and bank rotation for macro tiling. It's like a state. It
>>> basically rotates the tile pattern.
>>
>>
>> Yeah, I know. That's what I meant by distributing memory accesses more
>> equally over all channels. The lower bits select a memory bank swizzle IIRC.
>>
>> Years ago I tried on R600 whether shuffling them randomly could improve
>> performance, but MRT wasn't widely used and/or supported at that time.
>
> I'd known this and forgotten, the public CIK docs say bits 0..7 must be zero,
> but I have older docs which had more info. It would be nice if we could get
> proper docs released for the bottom bits considering AMD are using them in
> their drivers.

The low 8 bits of the address are unused and can't be set, because
CB_COLOR0_BASE is shifted by 8 bits. We are really talking about bits
starting from 8 going higher. E.g. 8K alignment gives you 5 bits that
can be used to express the rotation.
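A minimal C sketch of the arithmetic above, assuming only what is stated here: base registers such as CB_COLOR0_BASE hold the address shifted right by 8, so a surface aligned to 2^n bytes leaves n - 8 low bits of that field free for the swizzle (8K = 2^13 gives 5). The helper names are hypothetical and are not radv or addrlib functions.

```c
#include <assert.h>
#include <stdint.h>

/* Base registers store va >> 8, so address bits [7:0] can never be set. */
#define BASE_SHIFT 8

/* Number of low bits of the (va >> 8) field that are guaranteed zero for a
 * surface with this power-of-two alignment, i.e. the bits left over to
 * carry the pipe/bank swizzle. */
static unsigned swizzle_bits_available(uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    unsigned log2_align = 0;
    while ((1ull << log2_align) < alignment)
        log2_align++;
    return log2_align > BASE_SHIFT ? log2_align - BASE_SHIFT : 0;
}

/* Pack an aligned VA and a swizzle into a base-register field. */
static uint64_t pack_base(uint64_t va, uint32_t swizzle, uint64_t alignment)
{
    unsigned bits = swizzle_bits_available(alignment);
    assert(va % alignment == 0);
    assert((uint64_t)swizzle < (1ull << bits)); /* must fit the free bits */
    return (va >> BASE_SHIFT) | swizzle;
}
```

For example, `pack_base(0x100000, 0x1F, 8192)` uses all 5 free bits and yields 0x101F, while a 256-byte-aligned surface leaves no room for any swizzle.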

>
> It would be good to know what registers have the bits that matter (i.e. BASE,
> FMASK, CMASK, DCC, and resource descriptors.)
>
> Then I suppose we'd need to know the algorithm for programming them, and
> if we need to make any allocations bigger in order to do so.
>
> I expect this only starts to matter when we hit memory bandwidth limits.
> The deferred demo renders 3 MRTs plus one depth buffer at 2kx2k, then
> samples from those down to the 1280x720 display. This, combined with a
> 3-instance, 57k-vertex draw, seemed to be enough to see the pain. (Maybe
> a GL example doing something similar might show the problem for radeonsi.)

Addrlib contains the encoding code for the base address pipe/bank bits.

>
> The other open question I have is whether this just matters for MRT or
> whether texture sampling also gets some boost from it; my hack patch only
> does it for surfaces which will end up attached to the CB.

Yes, it should be done for read-only textures too.

>
> I'll update the patch to not call it an offset but name them the tile
> rotation bits.

The proper name is "tile swizzle" or "pipe/bank swizzle". On gfx9,
it's called "pipe/bank xor".

Marek


[Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount. (v2)

2017-07-07 Thread Dave Airlie
From: Dave Airlie 

(this patch doesn't seem to work fully, hopefully AMD can tell us
more info on the rules, and how to calculate the magic).

It appears that, to get full memory bandwidth with MRT rendering,
the pro Vulkan driver offsets each image by 0x3800.
I'm not sure how that value is calculated.

Glenn came up with the idea (probably what -pro does also) of just
offsetting every image in round-robin order, in the hope that apps
would create MRT images in sequence anyway.

This attempts to do that using an atomic counter in the device.

This gets the deferred demo from 800fps->1150fps on my rx480.

(I've tested that dota2 and talos at least still run after this.)

v2: acknowledge it isn't an offset but a tile rotation pattern.
add a quote from evergreen docs
---
 src/amd/vulkan/radv_device.c  |  8 
 src/amd/vulkan/radv_image.c   | 22 ++
 src/amd/vulkan/radv_private.h |  3 +++
 3 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 59efccf..fb15ed6 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -2756,16 +2756,16 @@ radv_initialise_color_surface(struct radv_device *device,
}
}
 
-   cb->cb_color_base = va >> 8;
+   cb->cb_color_base = (va >> 8) | iview->image->tile_rotate_bits;
 
/* CMASK variables */
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
va += iview->image->cmask.offset;
-   cb->cb_color_cmask = va >> 8;
+   cb->cb_color_cmask = (va >> 8) | iview->image->tile_rotate_bits;
 
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
va += iview->image->dcc_offset;
-   cb->cb_dcc_base = va >> 8;
+   cb->cb_dcc_base = (va >> 8) | iview->image->tile_rotate_bits;
 
uint32_t max_slice = radv_surface_layer_count(iview);
cb->cb_color_view = S_028C6C_SLICE_START(iview->base_layer) |
@@ -2780,7 +2780,7 @@ radv_initialise_color_surface(struct radv_device *device,
 
if (iview->image->fmask.size) {
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset + iview->image->fmask.offset;
-   cb->cb_color_fmask = va >> 8;
+   cb->cb_color_fmask = (va >> 8) | iview->image->tile_rotate_bits;
} else {
cb->cb_color_fmask = cb->cb_color_base;
}
diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
index b3a223b..b57a7d1 100644
--- a/src/amd/vulkan/radv_image.c
+++ b/src/amd/vulkan/radv_image.c
@@ -31,6 +31,7 @@
 #include "sid.h"
 #include "gfx9d.h"
 #include "util/debug.h"
+#include "util/u_atomic.h"
 static unsigned
 radv_choose_tiling(struct radv_device *Device,
   const struct radv_image_create_info *create_info)
@@ -208,7 +209,7 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
} else
va += base_level_info->offset;
 
-   state[0] = va >> 8;
+   state[0] = (va >> 8) | image->tile_rotate_bits;
state[1] &= C_008F14_BASE_ADDRESS_HI;
state[1] |= S_008F14_BASE_ADDRESS_HI(va >> 40);
state[3] |= S_008F1C_TILING_INDEX(si_tile_mode_index(image, base_level,
@@ -223,8 +224,7 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
if (chip_class <= VI)
meta_va += base_level_info->dcc_offset;
state[6] |= S_008F28_COMPRESSION_EN(1);
-   state[7] = meta_va >> 8;
-
+   state[7] = (meta_va >> 8) | image->tile_rotate_bits;
}
}
 
@@ -471,7 +471,7 @@ si_make_texture_descriptor(struct radv_device *device,
num_format = V_008F14_IMG_NUM_FORMAT_UINT;
}
 
-   fmask_state[0] = va >> 8;
+   fmask_state[0] = (va >> 8) | image->tile_rotate_bits;
fmask_state[1] = S_008F14_BASE_ADDRESS_HI(va >> 40) |
S_008F14_DATA_FORMAT_GFX6(fmask_format) |
S_008F14_NUM_FORMAT_GFX6(num_format);
@@ -801,6 +801,20 @@ radv_image_create(VkDevice _device,
image->size = image->surface.surf_size;
image->alignment = image->surface.surf_alignment;
 
+   if ((pCreateInfo->usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT) && !create_info->scanout) {
+   /*
+    * From the Evergreen docs:
+    * Bits [p-1:0] of this field, where p = log2(numPipes),
+    * specify the pipe swizzle. Bits [p+b-1:p], where
+    * b = log2(numBanks), specify the bank swizzle.
+    * This may not be correct for GCN GPUs.
+    */
+   uint32_t mrt_idx = p_atomic_inc_return(&device->image_mrt_offset_counter) - 1;
+   mrt_idx %= 4;
+   image->tile_rotate_bits = 0x38 * mrt_idx;
+   }
+
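The Evergreen layout quoted in the patch comment can be written out as a small encode/decode pair. This is only a sketch: as the comment itself says, the layout may not hold on GCN, and the 8-pipe/16-bank configuration used in the usage line is an assumed example, not RX 480 data. Under that assumption the 0x38 step (0x3800 >> 8) would decode to pipe swizzle 0 and bank swizzle 7. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(x)) for x > 0; pipe and bank counts are powers of two. */
static unsigned ilog2_u32(uint32_t x)
{
    unsigned r = 0;
    while (x >>= 1)
        r++;
    return r;
}

/* Evergreen-style layout: bits [p-1:0] hold the pipe swizzle and
 * bits [p+b-1:p] hold the bank swizzle, with p = log2(num_pipes)
 * and b = log2(num_banks). */
static uint32_t eg_encode_swizzle(uint32_t pipe_swz, uint32_t bank_swz,
                                  uint32_t num_pipes, uint32_t num_banks)
{
    assert(pipe_swz < num_pipes && bank_swz < num_banks);
    return pipe_swz | (bank_swz << ilog2_u32(num_pipes));
}

static void eg_decode_swizzle(uint32_t bits, uint32_t num_pipes,
                              uint32_t num_banks, uint32_t *pipe_swz,
                              uint32_t *bank_swz)
{
    unsigned p = ilog2_u32(num_pipes);
    unsigned b = ilog2_u32(num_banks);
    *pipe_swz = bits & ((1u << p) - 1);
    *bank_swz = (bits >> p) & ((1u << b) - 1);
}
```

With 8 pipes and 16 banks, `eg_encode_swizzle(0, 7, 8, 16)` returns 0x38; whether GCN interprets those bits the same way is exactly the open question in this thread.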

Re: [Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.

2017-07-07 Thread Dave Airlie
On 8 July 2017 at 04:07, Christian König  wrote:
> On 07.07.2017 at 18:51, Marek Olšák wrote:
>>
>> On Fri, Jul 7, 2017 at 11:18 AM, Christian König
>>  wrote:
>>>
>>> What tiling format do the destination textures have?
>>>
>>> Sounds like the offset is just added so that we distribute memory
>>> accesses more equally over memory channels.
>>
>> You can't set an offset that is not aligned. The hardware ignores the
>> low unaligned bits, so they have a different meaning. They specify
>> pipe and bank rotation for macro tiling. It's like a state. It
>> basically rotates the tile pattern.
>
>
> Yeah, I know. That's what I meant by distributing memory accesses more
> equally over all channels. The lower bits select a memory bank swizzle IIRC.
>
> Years ago I tried on R600 whether shuffling them randomly could improve
> performance, but MRT wasn't widely used and/or supported at that time.

I'd known this and forgotten, the public CIK docs say bits 0..7 must be zero,
but I have older docs which had more info. It would be nice if we could get
proper docs released for the bottom bits considering AMD are using them in their
drivers.

It would be good to know what registers have the bits that matter (i.e. BASE,
FMASK, CMASK, DCC, and resource descriptors.)

Then I suppose we'd need to know the algorithm for programming them, and
if we need to make any allocations bigger in order to do so.

I expect this only starts to matter when we hit memory bandwidth limits.
The deferred demo renders 3 MRTs plus one depth buffer at 2kx2k, then
samples from those down to the 1280x720 display. This, combined with a
3-instance, 57k-vertex draw, seemed to be enough to see the pain. (Maybe
a GL example doing something similar might show the problem for radeonsi.)

The other open question I have is whether this just matters for MRT or
whether texture sampling also gets some boost from it; my hack patch only
does it for surfaces which will end up attached to the CB.

I'll update the patch to not call it an offset but name them the tile
rotation bits.

Thanks,
Dave.


Re: [Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.

2017-07-07 Thread Christian König

On 07.07.2017 at 18:51, Marek Olšák wrote:
> On Fri, Jul 7, 2017 at 11:18 AM, Christian König
>  wrote:
>> What tiling format do the destination textures have?
>>
>> Sounds like the offset is just added so that we distribute memory accesses
>> more equally over memory channels.
>
> You can't set an offset that is not aligned. The hardware ignores the
> low unaligned bits, so they have a different meaning. They specify
> pipe and bank rotation for macro tiling. It's like a state. It
> basically rotates the tile pattern.

Yeah, I know. That's what I meant by distributing memory accesses more
equally over all channels. The lower bits select a memory bank swizzle IIRC.

Years ago I tried on R600 whether shuffling them randomly could improve
performance, but MRT wasn't widely used and/or supported at that time.

Christian.

> Marek





Re: [Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.

2017-07-07 Thread Marek Olšák
On Fri, Jul 7, 2017 at 11:18 AM, Christian König
 wrote:
> What tiling format do the destination textures have?
>
> Sounds like the offset is just added so that we distribute memory accesses
> more equally over memory channels.

You can't set an offset that is not aligned. The hardware ignores the
low unaligned bits, so they have a different meaning. They specify
pipe and bank rotation for macro tiling. It's like a state. It
basically rotates the tile pattern.
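This also explains why the v1 "offset" had an effect at all. A sketch, assuming the surface is aligned to at least 16 KiB: adding 0x3800 before the >> 8 shift only sets bits 3..5 of the register field, which is the same as ORing 0x38 into the shifted base, as the v2 patch does with tile_rotate_bits. With weaker alignment the addition disturbs real address bits instead. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* The v2 patch's 0x38 step is the v1 byte offset seen through the >> 8. */
static_assert((0x3800u >> 8) == 0x38u, "0x3800 >> 8 must be 0x38");

/* v1 approach: add a byte offset to the VA, then shift. */
static uint32_t base_with_offset(uint64_t va, uint64_t offset)
{
    return (uint32_t)((va + offset) >> 8);
}

/* v2 approach: shift the VA, then OR swizzle bits into the field. */
static uint32_t base_with_swizzle(uint64_t va, uint32_t swizzle)
{
    return (uint32_t)(va >> 8) | swizzle;
}

/* The two agree exactly when bits 11..13 of va are clear (no carry out of
 * the added bits), which every 16 KiB-aligned va satisfies. */
static int offset_equals_swizzle(uint64_t va)
{
    return base_with_offset(va, 0x3800) == base_with_swizzle(va, 0x38);
}
```

For a 64 KiB-aligned va such as 0x12340000 both forms give 0x123438; for 0x12342000 (only 8 KiB aligned) the v1 addition yields 0x123458 instead.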

Marek


Re: [Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.

2017-07-07 Thread Dave Airlie
On 7 Jul. 2017 19:29, "Christian König"  wrote:

> What tiling format do the destination textures have?
>
> Sounds like the offset is just added so that we distribute memory accesses
> more equally over memory channels.

From the traces, I think the tile index mode was 10.

Dave.

> Regards,
> Christian.
>
> On 07.07.2017 at 09:18, Dave Airlie wrote:

> From: Dave Airlie 
>
> (this patch doesn't seem to work fully, hopefully AMD can tell us
> more info on the rules, and how to calculate the magic).
>
> It appears that, to get full memory bandwidth with MRT rendering,
> the pro Vulkan driver offsets each image by 0x3800.
> I'm not sure how that value is calculated.
>
> Glenn came up with the idea (probably what -pro does also) of just
> offsetting every image in round-robin order, in the hope that apps
> would create MRT images in sequence anyway.
>
> This attempts to do that using an atomic counter in the device.
>
> This gets the deferred demo from 800fps->1150fps on my rx480.
>
> (I've tested that dota2 and talos at least still run after this.)
> ---
>   src/amd/vulkan/radv_device.c  |  7 ---
>   src/amd/vulkan/radv_image.c   | 16 +++-
>   src/amd/vulkan/radv_private.h |  3 +++
>   3 files changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index d1c519a..f39526d 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -2706,7 +2706,7 @@ radv_initialise_color_surface(struct radv_device *device,
> /* Intensity is implemented as Red, so treat it that way. */
> cb->cb_color_attrib = S_028C74_FORCE_DST_ALPHA_1(desc->swizzle[3] == VK_SWIZZLE_1);
> -   va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
> +   va = device->ws->buffer_get_va(iview->bo) + iview->image->offset + iview->image->mrt_offset;
> if (device->physical_device->rad_info.chip_class >= GFX9) {
> struct gfx9_surf_meta_flags meta;
> @@ -2756,11 +2756,11 @@ radv_initialise_color_surface(struct radv_device *device,
> /* CMASK variables */
> va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
> -   va += iview->image->cmask.offset;
> +   va += iview->image->cmask.offset + iview->image->mrt_offset;
> cb->cb_color_cmask = va >> 8;
> va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
> -   va += iview->image->dcc_offset;
> +   va += iview->image->dcc_offset + iview->image->mrt_offset;
> cb->cb_dcc_base = va >> 8;
> uint32_t max_slice = radv_surface_layer_count(iview);
> @@ -2776,6 +2776,7 @@ radv_initialise_color_surface(struct radv_device *device,
> if (iview->image->fmask.size) {
> va = device->ws->buffer_get_va(iview->bo) + iview->image->offset + iview->image->fmask.offset;
> +   va += iview->image->mrt_offset;
> cb->cb_color_fmask = va >> 8;
> } else {
> cb->cb_color_fmask = cb->cb_color_base;
> diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
> index b3a223b..bc20a53 100644
> --- a/src/amd/vulkan/radv_image.c
> +++ b/src/amd/vulkan/radv_image.c
> @@ -31,6 +31,7 @@
>   #include "sid.h"
>   #include "gfx9d.h"
>   #include "util/debug.h"
> +#include "util/u_atomic.h"
>   static unsigned
>   radv_choose_tiling(struct radv_device *Device,
>const struct radv_image_create_info *create_info)
> @@ -208,6 +209,7 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
> } else
> va += base_level_info->offset;
> +   va += image->mrt_offset;
> state[0] = va >> 8;
> state[1] &= C_008F14_BASE_ADDRESS_HI;
> state[1] |= S_008F14_BASE_ADDRESS_HI(va >> 40);
> @@ -220,6 +222,7 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
> state[7] = 0;
> if (image->surface.dcc_size && first_level < image->surface.num_dcc_levels) {
> uint64_t meta_va = gpu_address + image->dcc_offset;
> +   meta_va += image->mrt_offset;
> if (chip_class <= VI)
> meta_va += base_level_info->dcc_offset;
> state[6] |= S_008F28_COMPRESSION_EN(1);
> @@ -436,7 +439,7 @@ si_make_texture_descriptor(struct radv_device *device,
> uint64_t gpu_address = device->ws->buffer_get_va(image->bo);
> uint64_t va;
> -   va = gpu_address + image->offset + image->fmask.offset;
> +   va = gpu_address + image->offset + image->mrt_offset + image->fmask.offset;
> if (device->physical_device->rad_info.chip_class >= GFX9)
> {
> fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK;
> @@ -642,6 +645,7 @@ radv_image_alloc_fmask(struct radv_device *device,
> 

Re: [Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.

2017-07-07 Thread Christian König

What tiling format do the destination textures have?

Sounds like the offset is just added so that we distribute memory 
accesses more equally over memory channels.
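The intuition here can be illustrated with a deliberately simplified model. This is an assumption for illustration, not GCN addressing: tile t of a surface is placed on channel (t + swizzle) % num_channels. With MRT every render target writes the same tile at the same time, so with equal swizzles they all queue on one channel, while distinct per-target swizzles spread the writes out. All names are hypothetical.

```c
#include <assert.h>

/* Toy placement rule, not real hardware addressing. */
static unsigned toy_channel(unsigned tile, unsigned swizzle,
                            unsigned num_channels)
{
    return (tile + swizzle) % num_channels;
}

/* Number of distinct channels hit when num_rts render targets all write
 * the same tile and render target i uses swizzle i * step. */
static unsigned distinct_channels(unsigned num_rts, unsigned step,
                                  unsigned num_channels, unsigned tile)
{
    assert(num_channels > 0 && num_channels <= 32); /* fits the bitmask */
    unsigned seen = 0, count = 0;
    for (unsigned i = 0; i < num_rts; i++) {
        unsigned ch = toy_channel(tile, i * step, num_channels);
        if (!(seen & (1u << ch))) {
            seen |= 1u << ch;
            count++;
        }
    }
    return count;
}
```

In this model four targets with no swizzle hit a single channel out of 8, a step of 1 or 2 spreads them over 4 channels, and a step of 4 only reaches 2, so the choice of step matters as much as having one.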


Regards,
Christian.

On 07.07.2017 at 09:18, Dave Airlie wrote:

From: Dave Airlie 

(this patch doesn't seem to work fully, hopefully AMD can tell us
more info on the rules, and how to calculate the magic).

It appears that, to get full memory bandwidth with MRT rendering,
the pro Vulkan driver offsets each image by 0x3800.
I'm not sure how that value is calculated.

Glenn came up with the idea (probably what -pro does also) of just
offsetting every image in round-robin order, in the hope that apps
would create MRT images in sequence anyway.

This attempts to do that using an atomic counter in the device.

This gets the deferred demo from 800fps->1150fps on my rx480.

(I've tested that dota2 and talos at least still run after this.)
---
  src/amd/vulkan/radv_device.c  |  7 ---
  src/amd/vulkan/radv_image.c   | 16 +++-
  src/amd/vulkan/radv_private.h |  3 +++
  3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index d1c519a..f39526d 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -2706,7 +2706,7 @@ radv_initialise_color_surface(struct radv_device *device,
/* Intensity is implemented as Red, so treat it that way. */
cb->cb_color_attrib = S_028C74_FORCE_DST_ALPHA_1(desc->swizzle[3] == VK_SWIZZLE_1);
 
-   va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
+   va = device->ws->buffer_get_va(iview->bo) + iview->image->offset + iview->image->mrt_offset;
 
if (device->physical_device->rad_info.chip_class >= GFX9) {
struct gfx9_surf_meta_flags meta;
@@ -2756,11 +2756,11 @@ radv_initialise_color_surface(struct radv_device *device,
 
/* CMASK variables */
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
-   va += iview->image->cmask.offset;
+   va += iview->image->cmask.offset + iview->image->mrt_offset;
cb->cb_color_cmask = va >> 8;
 
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
-   va += iview->image->dcc_offset;
+   va += iview->image->dcc_offset + iview->image->mrt_offset;
cb->cb_dcc_base = va >> 8;
 
uint32_t max_slice = radv_surface_layer_count(iview);
@@ -2776,6 +2776,7 @@ radv_initialise_color_surface(struct radv_device *device,
 
if (iview->image->fmask.size) {
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset + iview->image->fmask.offset;
+   va += iview->image->mrt_offset;
cb->cb_color_fmask = va >> 8;
} else {
cb->cb_color_fmask = cb->cb_color_base;
diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
index b3a223b..bc20a53 100644
--- a/src/amd/vulkan/radv_image.c
+++ b/src/amd/vulkan/radv_image.c
@@ -31,6 +31,7 @@
  #include "sid.h"
  #include "gfx9d.h"
  #include "util/debug.h"
+#include "util/u_atomic.h"
  static unsigned
  radv_choose_tiling(struct radv_device *Device,
   const struct radv_image_create_info *create_info)
@@ -208,6 +209,7 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
} else
va += base_level_info->offset;
 
+   va += image->mrt_offset;
state[0] = va >> 8;
state[1] &= C_008F14_BASE_ADDRESS_HI;
state[1] |= S_008F14_BASE_ADDRESS_HI(va >> 40);
@@ -220,6 +222,7 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
state[7] = 0;
if (image->surface.dcc_size && first_level < image->surface.num_dcc_levels) {
uint64_t meta_va = gpu_address + image->dcc_offset;
+   meta_va += image->mrt_offset;
if (chip_class <= VI)
meta_va += base_level_info->dcc_offset;
state[6] |= S_008F28_COMPRESSION_EN(1);
@@ -436,7 +439,7 @@ si_make_texture_descriptor(struct radv_device *device,
uint64_t gpu_address = device->ws->buffer_get_va(image->bo);
uint64_t va;
 
-   va = gpu_address + image->offset + image->fmask.offset;
+   va = gpu_address + image->offset + image->mrt_offset + image->fmask.offset;
 
if (device->physical_device->rad_info.chip_class >= GFX9) {
fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK;
@@ -642,6 +645,7 @@ radv_image_alloc_fmask(struct radv_device *device,
radv_image_get_fmask_info(device, image, image->info.samples, &image->fmask);
 
image->fmask.offset = align64(image->size, image->fmask.alignment);
+   image->fmask.size += image->mrt_offset;
image->size = image->fmask.offset + image->fmask.size;
image->alignment = MAX2(image->alignment, image->fmask.alignment);
  }
@@ -709,6 +713,7 

[Mesa-dev] [PATCH] [rfc] radv: offset images by a differing amount.

2017-07-07 Thread Dave Airlie
From: Dave Airlie 

(this patch doesn't seem to work fully, hopefully AMD can tell us
more info on the rules, and how to calculate the magic).

It appears that, to get full memory bandwidth with MRT rendering,
the pro Vulkan driver offsets each image by 0x3800.
I'm not sure how that value is calculated.

Glenn came up with the idea (probably what -pro does also) of just
offsetting every image in round-robin order, in the hope that apps
would create MRT images in sequence anyway.

This attempts to do that using an atomic counter in the device.

This gets the deferred demo from 800fps->1150fps on my rx480.

(I've tested that dota2 and talos at least still run after this.)
---
 src/amd/vulkan/radv_device.c  |  7 ---
 src/amd/vulkan/radv_image.c   | 16 +++-
 src/amd/vulkan/radv_private.h |  3 +++
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index d1c519a..f39526d 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -2706,7 +2706,7 @@ radv_initialise_color_surface(struct radv_device *device,
/* Intensity is implemented as Red, so treat it that way. */
cb->cb_color_attrib = S_028C74_FORCE_DST_ALPHA_1(desc->swizzle[3] == VK_SWIZZLE_1);
 
-   va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
+   va = device->ws->buffer_get_va(iview->bo) + iview->image->offset + iview->image->mrt_offset;
 
if (device->physical_device->rad_info.chip_class >= GFX9) {
struct gfx9_surf_meta_flags meta;
@@ -2756,11 +2756,11 @@ radv_initialise_color_surface(struct radv_device *device,
 
/* CMASK variables */
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
-   va += iview->image->cmask.offset;
+   va += iview->image->cmask.offset + iview->image->mrt_offset;
cb->cb_color_cmask = va >> 8;
 
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset;
-   va += iview->image->dcc_offset;
+   va += iview->image->dcc_offset + iview->image->mrt_offset;
cb->cb_dcc_base = va >> 8;
 
uint32_t max_slice = radv_surface_layer_count(iview);
@@ -2776,6 +2776,7 @@ radv_initialise_color_surface(struct radv_device *device,
 
if (iview->image->fmask.size) {
va = device->ws->buffer_get_va(iview->bo) + iview->image->offset + iview->image->fmask.offset;
+   va += iview->image->mrt_offset;
cb->cb_color_fmask = va >> 8;
} else {
cb->cb_color_fmask = cb->cb_color_base;
diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
index b3a223b..bc20a53 100644
--- a/src/amd/vulkan/radv_image.c
+++ b/src/amd/vulkan/radv_image.c
@@ -31,6 +31,7 @@
 #include "sid.h"
 #include "gfx9d.h"
 #include "util/debug.h"
+#include "util/u_atomic.h"
 static unsigned
 radv_choose_tiling(struct radv_device *Device,
   const struct radv_image_create_info *create_info)
@@ -208,6 +209,7 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
} else
va += base_level_info->offset;
 
+   va += image->mrt_offset;
state[0] = va >> 8;
state[1] &= C_008F14_BASE_ADDRESS_HI;
state[1] |= S_008F14_BASE_ADDRESS_HI(va >> 40);
@@ -220,6 +222,7 @@ si_set_mutable_tex_desc_fields(struct radv_device *device,
state[7] = 0;
if (image->surface.dcc_size && first_level < image->surface.num_dcc_levels) {
uint64_t meta_va = gpu_address + image->dcc_offset;
+   meta_va += image->mrt_offset;
if (chip_class <= VI)
meta_va += base_level_info->dcc_offset;
state[6] |= S_008F28_COMPRESSION_EN(1);
@@ -436,7 +439,7 @@ si_make_texture_descriptor(struct radv_device *device,
uint64_t gpu_address = device->ws->buffer_get_va(image->bo);
uint64_t va;
 
-   va = gpu_address + image->offset + image->fmask.offset;
+   va = gpu_address + image->offset + image->mrt_offset + image->fmask.offset;
 
if (device->physical_device->rad_info.chip_class >= GFX9) {
fmask_format = V_008F14_IMG_DATA_FORMAT_FMASK;
@@ -642,6 +645,7 @@ radv_image_alloc_fmask(struct radv_device *device,
radv_image_get_fmask_info(device, image, image->info.samples, &image->fmask);
 
image->fmask.offset = align64(image->size, image->fmask.alignment);
+   image->fmask.size += image->mrt_offset;
image->size = image->fmask.offset + image->fmask.size;
image->alignment = MAX2(image->alignment, image->fmask.alignment);
 }
@@ -709,6 +713,7 @@ radv_image_alloc_cmask(struct radv_device *device,
radv_image_get_cmask_info(device, image, &image->cmask);
 
image->cmask.offset = align64(image->size, image->cmask.alignment);
+
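The round-robin scheme described in the commit message can be sketched with C11 atomics instead of Mesa's util/u_atomic.h. This is an illustration under assumptions: struct toy_device and toy_next_tile_swizzle are hypothetical, and the constants simply mirror the v2 patch (four rotations, step 0x38, i.e. 0x3800 >> 8). The v2 patch only draws from the counter for non-scanout color attachments; that filtering is left out here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_NUM_ROTATIONS 4u    /* mirrors "mrt_idx %= 4" in the v2 patch */
#define TOY_ROTATION_STEP 0x38u /* mirrors "0x38 * mrt_idx" (0x3800 >> 8) */

/* A power-of-two rotation count keeps the sequence intact when the
 * unsigned counter wraps around at 2^32. */
static_assert((TOY_NUM_ROTATIONS & (TOY_NUM_ROTATIONS - 1)) == 0,
              "rotation count must be a power of two");

struct toy_device {
    atomic_uint image_counter; /* shared by every thread creating images */
};

/* Returns the swizzle bits to OR into the image's shifted base addresses. */
static uint32_t toy_next_tile_swizzle(struct toy_device *dev)
{
    /* atomic_fetch_add returns the value before the increment, so no "- 1"
     * is needed (p_atomic_inc_return returns the value after it). */
    unsigned idx = atomic_fetch_add(&dev->image_counter, 1u) % TOY_NUM_ROTATIONS;
    return TOY_ROTATION_STEP * idx;
}
```

Because one counter is shared by the whole device, any image created in between (or on another thread) shifts which rotation each MRT receives, which is why the commit message only hopes that apps create their MRT images in sequence.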