Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-12 Thread Alex Smith
On Tue, 11 Jun 2019 at 14:32, Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Am 10.06.19 um 15:56 schrieb Bas Nieuwenhuizen:
> > On Sat, Jun 8, 2019 at 3:36 PM Alex Smith 
> wrote:
> >> On Mon, 3 Jun 2019 at 13:27, Koenig, Christian <
> christian.koe...@amd.com> wrote:
> >>> Am 03.06.19 um 14:21 schrieb Alex Smith:
> >>>
> >>> On Mon, 3 Jun 2019 at 11:57, Koenig, Christian <
> christian.koe...@amd.com> wrote:
> >>>> Am 02.06.19 um 12:32 schrieb Alex Smith:
> >>>>> Put the uncached GTT type at a higher index than the visible VRAM
> >>>>> type, rather than having GTT first.
> >>>>>
> >>>>> When we don't have dedicated VRAM, we don't have a non-visible VRAM
> >>>>> type, and the property flags for GTT and visible VRAM are identical.
> >>>>> According to the spec, for types with identical flags, we should give
> >>>>> the one with better performance a lower index.
> >>>>>
> >>>>> Previously, apps which follow the spec guidance for choosing a memory
> >>>>> type would have picked the GTT type in preference to visible VRAM
> >>>>> (all Feral games will do this), and end up with lower performance.
> >>>>>
> >>>>> On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
> >>>>> the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple
> >>>>> of other (Feral) games and saw similar improvement on those as well.
> >>>> Well, that patch doesn't look like a good idea to me.
> >>>>
> >>>> Using VRAM instead of uncached GTT should make somewhere between no
> >>>> and only a minimal performance difference on an APU.
> >>>>
> >>>> To make things even worse, VRAM is still needed for scanout, and newer
> >>>> laptops have only a very, very low default setting (32 or 16MB). So
> >>>> you can end up with VRAM contention on those systems.
> >>>>
> >>>> Can you check some kernel statistics to figure out what exactly is
> >>>> going on here?
> >>>
> >>> What statistics should I look at?
> >>>
> >>>
> >>> First of all take a look at the amdgpu_gem_info file in the debugfs
> >>> directory while using GTT and compare that to using VRAM. You should
> >>> see a lot more GTT ... CPU_GTT_USWC entries with the GTT variant. If
> >>> the CPU_GTT_USWC flag is missing we have found the problem.
> >>>
> >>> If that looks ok, then take a look at the ttm_page_pool or
> >>> ttm_dma_page_pool file and see how many wc/uc and wc/uc huge pages you
> >>> got. Huge pages should be used for anything larger than 2MB; if not,
> >>> we have found the problem.
> >>>
> >>> If that still isn't the issue I need to take a look at the VM code
> >>> again and see if we still map VRAM/GTT differently on APUs.
> >>
> >> OK, got around to looking at this. amdgpu_gem_info does have more USWC
> >> entries when using GTT. I've attached the output from VRAM vs GTT in
> >> case you can spot anything else in there.
> >>
> >> ttm_page_pool has 9806 wc, 238 wc huge, no uc or uc huge.
> > To add to this, I tried rounding up the size of all application GTT
> > allocations to a multiple of 2 megabytes (+ a suballocator for buffers
> > < 2M). This increased performance a bit, but not nearly what going from
> > GTT->"VRAM" brings.
>
> I need to dig deeper when I have a bit more time.
>
> The logs Alex provided didn't show anything obviously wrong, so I have
> no idea what the actual problem is here.
>
> Anyway feel free to go ahead with this approach, but please keep in mind
> that this might cause problems on some systems.
>

Thanks Christian.

Bas, Samuel - what do you think about going ahead with this change in the
meantime? Perhaps we could add some threshold on the minimum "VRAM" size
(256MB?) required to change the order, to avoid issues on systems where
that heap is really small?

Alex


>
> Christian.
>
> >
> >> FWIW this was from kernel 5.0.10; I just upgraded to 5.1.6 and the
> >> same perf difference is still there.
> >>
> >> Thanks,
> >> Alex
> >>
> >>>
> >>> Thanks,
> >>> Christian.
> >>>
> >>>
> >>> Thanks,
> >>> Alex
> >>>
> >>>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>

Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-08 Thread Alex Smith
On Mon, 3 Jun 2019 at 13:27, Koenig, Christian 
wrote:

> Am 03.06.19 um 14:21 schrieb Alex Smith:
>
> On Mon, 3 Jun 2019 at 11:57, Koenig, Christian 
> wrote:
>
>> Am 02.06.19 um 12:32 schrieb Alex Smith:
>> > Put the uncached GTT type at a higher index than the visible VRAM type,
>> > rather than having GTT first.
>> >
>> > When we don't have dedicated VRAM, we don't have a non-visible VRAM
>> > type, and the property flags for GTT and visible VRAM are identical.
>> > According to the spec, for types with identical flags, we should give
>> > the one with better performance a lower index.
>> >
>> > Previously, apps which follow the spec guidance for choosing a memory
>> > type would have picked the GTT type in preference to visible VRAM (all
>> > Feral games will do this), and end up with lower performance.
>> >
>> > On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
>> > the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
>> > other (Feral) games and saw similar improvement on those as well.
>>
>> Well, that patch doesn't look like a good idea to me.
>>
>> Using VRAM instead of uncached GTT should make somewhere between no and
>> only a minimal performance difference on an APU.
>>
>> To make things even worse, VRAM is still needed for scanout, and newer
>> laptops have only a very, very low default setting (32 or 16MB). So you
>> can end up with VRAM contention on those systems.
>>
>> Can you check some kernel statistics to figure out what exactly is going
>> on here?
>>
>
> What statistics should I look at?
>
>
> First of all take a look at the amdgpu_gem_info file in the debugfs
> directory while using GTT and compare that to using VRAM. You should see a
> lot more GTT ... CPU_GTT_USWC entries with the GTT variant. If the
> CPU_GTT_USWC flag is missing we have found the problem.
>
> If that looks ok, then take a look at the ttm_page_pool or
> ttm_dma_page_pool file and see how many wc/uc and wc/uc huge pages you got.
> Huge pages should be used for anything larger than 2MB; if not, we have
> found the problem.
>
> If that still isn't the issue I need to take a look at the VM code again
> and see if we still map VRAM/GTT differently on APUs.
>

OK, got around to looking at this. amdgpu_gem_info does have more USWC
entries when using GTT. I've attached the output from VRAM vs GTT in case
you can spot anything else in there.

ttm_page_pool has 9806 wc, 238 wc huge, no uc or uc huge.

FWIW this was from kernel 5.0.10; I just upgraded to 5.1.6 and the same
perf difference is still there.

Thanks,
Alex


>
> Thanks,
> Christian.
>
>
> Thanks,
> Alex
>
>
>>
>> Regards,
>> Christian.
>>
>> >
>> > Signed-off-by: Alex Smith 
>> > ---
>> > I noticed that the memory types advertised on my Raven laptop looked a
>> > bit odd so played around with it and found this. I'm not sure if it is
>> > actually expected that the performance difference between visible VRAM
>> > and GTT is so large, seeing as it's not dedicated VRAM, but the results
>> > are clear (and consistent, tested multiple times).
>> > ---
>> >   src/amd/vulkan/radv_device.c | 18 +++---
>> >   1 file changed, 15 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
>> > index 3cf050ed220..d36ee226ebd 100644
>> > --- a/src/amd/vulkan/radv_device.c
>> > +++ b/src/amd/vulkan/radv_device.c
>> > @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct
>> radv_physical_device *device)
>> >   .heapIndex = vram_index,
>> >   };
>> >   }
>> > - if (gart_index >= 0) {
>> > + if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
>> >   device->mem_type_indices[type_count] =
>> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
>> >   device->memory_properties.memoryTypes[type_count++] =
>> (VkMemoryType) {
>> >   .propertyFlags =
>> VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
>> > - VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
>> > - (device->rad_info.has_dedicated_vram ? 0 :
>> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
>> > + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
>> >   .heapIndex = gart_index,
>> >   };
>> >   }
>> > @@ -189,6 +188,19 @@ rad

Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-03 Thread Alex Smith
Thanks, will have a look. I can't look right now, but will see if I can
sometime tomorrow.

Alex

On Mon, 3 Jun 2019 at 13:27, Koenig, Christian 
wrote:

> Am 03.06.19 um 14:21 schrieb Alex Smith:
>
> On Mon, 3 Jun 2019 at 11:57, Koenig, Christian 
> wrote:
>
>> Am 02.06.19 um 12:32 schrieb Alex Smith:
>> > Put the uncached GTT type at a higher index than the visible VRAM type,
>> > rather than having GTT first.
>> >
>> > When we don't have dedicated VRAM, we don't have a non-visible VRAM
>> > type, and the property flags for GTT and visible VRAM are identical.
>> > According to the spec, for types with identical flags, we should give
>> > the one with better performance a lower index.
>> >
>> > Previously, apps which follow the spec guidance for choosing a memory
>> > type would have picked the GTT type in preference to visible VRAM (all
>> > Feral games will do this), and end up with lower performance.
>> >
>> > On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
>> > the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
>> > other (Feral) games and saw similar improvement on those as well.
>>
>> Well, that patch doesn't look like a good idea to me.
>>
>> Using VRAM instead of uncached GTT should make somewhere between no and
>> only a minimal performance difference on an APU.
>>
>> To make things even worse, VRAM is still needed for scanout, and newer
>> laptops have only a very, very low default setting (32 or 16MB). So you
>> can end up with VRAM contention on those systems.
>>
>> Can you check some kernel statistics to figure out what exactly is going
>> on here?
>>
>
> What statistics should I look at?
>
>
> First of all take a look at the amdgpu_gem_info file in the debugfs
> directory while using GTT and compare that to using VRAM. You should see a
> lot more GTT ... CPU_GTT_USWC entries with the GTT variant. If the
> CPU_GTT_USWC flag is missing we have found the problem.
>
> If that looks ok, then take a look at the ttm_page_pool or
> ttm_dma_page_pool file and see how many wc/uc and wc/uc huge pages you got.
> Huge pages should be used for anything larger than 2MB; if not, we have
> found the problem.
>
> If that still isn't the issue I need to take a look at the VM code again
> and see if we still map VRAM/GTT differently on APUs.
>
> Thanks,
> Christian.
>
>
> Thanks,
> Alex
>
>
>>
>> Regards,
>> Christian.
>>
>> >
>> > Signed-off-by: Alex Smith 
>> > ---
>> > I noticed that the memory types advertised on my Raven laptop looked a
>> > bit odd so played around with it and found this. I'm not sure if it is
>> > actually expected that the performance difference between visible VRAM
>> > and GTT is so large, seeing as it's not dedicated VRAM, but the results
>> > are clear (and consistent, tested multiple times).
>> > ---
>> >   src/amd/vulkan/radv_device.c | 18 +++---
>> >   1 file changed, 15 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
>> > index 3cf050ed220..d36ee226ebd 100644
>> > --- a/src/amd/vulkan/radv_device.c
>> > +++ b/src/amd/vulkan/radv_device.c
>> > @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct
>> radv_physical_device *device)
>> >   .heapIndex = vram_index,
>> >   };
>> >   }
>> > - if (gart_index >= 0) {
>> > + if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
>> >   device->mem_type_indices[type_count] =
>> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
>> >   device->memory_properties.memoryTypes[type_count++] =
>> (VkMemoryType) {
>> >   .propertyFlags =
>> VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
>> > - VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
>> > - (device->rad_info.has_dedicated_vram ? 0 :
>> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
>> > + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
>> >   .heapIndex = gart_index,
>> >   };
>> >   }
>> > @@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct
>> radv_physical_device *device)
>> >   .heapIndex = visible_vram_index,
>> >   };
>> >   }
>> > +  

Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-03 Thread Alex Smith
On Mon, 3 Jun 2019 at 11:57, Koenig, Christian 
wrote:

> Am 02.06.19 um 12:32 schrieb Alex Smith:
> > Put the uncached GTT type at a higher index than the visible VRAM type,
> > rather than having GTT first.
> >
> > When we don't have dedicated VRAM, we don't have a non-visible VRAM
> > type, and the property flags for GTT and visible VRAM are identical.
> > According to the spec, for types with identical flags, we should give
> > the one with better performance a lower index.
> >
> > Previously, apps which follow the spec guidance for choosing a memory
> > type would have picked the GTT type in preference to visible VRAM (all
> > Feral games will do this), and end up with lower performance.
> >
> > On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
> > the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
> > other (Feral) games and saw similar improvement on those as well.
>
> Well, that patch doesn't look like a good idea to me.
>
> Using VRAM instead of uncached GTT should make somewhere between no and
> only a minimal performance difference on an APU.
>
> To make things even worse, VRAM is still needed for scanout, and newer
> laptops have only a very, very low default setting (32 or 16MB). So you
> can end up with VRAM contention on those systems.
>
> Can you check some kernel statistics to figure out what exactly is going
> on here?
>

What statistics should I look at?

Thanks,
Alex


>
> Regards,
> Christian.
>
> >
> > Signed-off-by: Alex Smith 
> > ---
> > I noticed that the memory types advertised on my Raven laptop looked a
> > bit odd so played around with it and found this. I'm not sure if it is
> > actually expected that the performance difference between visible VRAM
> > and GTT is so large, seeing as it's not dedicated VRAM, but the results
> > are clear (and consistent, tested multiple times).
> > ---
> >   src/amd/vulkan/radv_device.c | 18 +++---
> >   1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> > index 3cf050ed220..d36ee226ebd 100644
> > --- a/src/amd/vulkan/radv_device.c
> > +++ b/src/amd/vulkan/radv_device.c
> > @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct
> radv_physical_device *device)
> >   .heapIndex = vram_index,
> >   };
> >   }
> > - if (gart_index >= 0) {
> > + if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
> >   device->mem_type_indices[type_count] =
> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
> >   device->memory_properties.memoryTypes[type_count++] =
> (VkMemoryType) {
> >   .propertyFlags =
> VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> > - VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
> > - (device->rad_info.has_dedicated_vram ? 0 :
> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
> > + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
> >   .heapIndex = gart_index,
> >   };
> >   }
> > @@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct
> radv_physical_device *device)
> >   .heapIndex = visible_vram_index,
> >   };
> >   }
> > + if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) {
> > + /* Put GTT after visible VRAM for GPUs without dedicated
> VRAM
> > +  * as they have identical property flags, and according to
> the
> > +  * spec, for types with identical flags, the one with
> greater
> > +  * performance must be given a lower index. */
> > + device->mem_type_indices[type_count] =
> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
> > + device->memory_properties.memoryTypes[type_count++] =
> (VkMemoryType) {
> > + .propertyFlags =
> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
> > + VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> > + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
> > + .heapIndex = gart_index,
> > + };
> > + }
> >   if (gart_index >= 0) {
> >   device->mem_type_indices[type_count] =
> RADV_MEM_TYPE_GTT_CACHED;
> >   device->memory_properties.memoryTypes[type_count++] =
> (VkMemoryType) {
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-03 Thread Alex Smith
On Sun, 2 Jun 2019 at 11:59, Bas Nieuwenhuizen 
wrote:

> On Sun, Jun 2, 2019 at 12:32 PM Alex Smith 
> wrote:
> >
> > Put the uncached GTT type at a higher index than the visible VRAM type,
> > rather than having GTT first.
> >
> > When we don't have dedicated VRAM, we don't have a non-visible VRAM
> > type, and the property flags for GTT and visible VRAM are identical.
> > According to the spec, for types with identical flags, we should give
> > the one with better performance a lower index.
> >
> > Previously, apps which follow the spec guidance for choosing a memory
> > type would have picked the GTT type in preference to visible VRAM (all
> > Feral games will do this), and end up with lower performance.
> >
> > On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
> > the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
> > other (Feral) games and saw similar improvement on those as well.
> >
> > Signed-off-by: Alex Smith 
> > ---
> > I noticed that the memory types advertised on my Raven laptop looked a
> > bit odd so played around with it and found this. I'm not sure if it is
> > actually expected that the performance difference between visible VRAM
> > and GTT is so large, seeing as it's not dedicated VRAM, but the results
> > are clear (and consistent, tested multiple times).
>
> AFAIU it is still using different memory paths, with GTT using
> different pagetables (those from the CPU, I believe, on APUs) and
> possibly CPU snooping.
>
> The main risk here seems to be applications pushing driver-internal
> stuff (descriptor sets etc.) out of "VRAM", possibly hitting perf
> elsewhere.
>

Driver-internal allocations have higher BO priorities than all app
allocations; wouldn't that help avoid that? I'm not sure how much effect
the priorities actually have...


> That said,
>
> Reviewed-by: Bas Nieuwenhuizen 
>
> > ---
> >  src/amd/vulkan/radv_device.c | 18 +++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> > index 3cf050ed220..d36ee226ebd 100644
> > --- a/src/amd/vulkan/radv_device.c
> > +++ b/src/amd/vulkan/radv_device.c
> > @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct
> radv_physical_device *device)
> > .heapIndex = vram_index,
> > };
> > }
> > -   if (gart_index >= 0) {
> > +   if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
> > device->mem_type_indices[type_count] =
> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
> > device->memory_properties.memoryTypes[type_count++] =
> (VkMemoryType) {
> > .propertyFlags =
> VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> > -   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
> > -   (device->rad_info.has_dedicated_vram ? 0 :
> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
> > +   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
> > .heapIndex = gart_index,
> > };
> > }
> > @@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct
> radv_physical_device *device)
> > .heapIndex = visible_vram_index,
> > };
> > }
> > +   if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) {
> > +   /* Put GTT after visible VRAM for GPUs without dedicated
> VRAM
> > +* as they have identical property flags, and according
> to the
> > +* spec, for types with identical flags, the one with
> greater
> > +* performance must be given a lower index. */
> > +   device->mem_type_indices[type_count] =
> RADV_MEM_TYPE_GTT_WRITE_COMBINE;
> > +   device->memory_properties.memoryTypes[type_count++] =
> (VkMemoryType) {
> > +   .propertyFlags =
> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
> > +   VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
> > +   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
> > +   .heapIndex = gart_index,
> > +   };
> > +   }
> > if (gart_index >= 0) {
> > device->mem_type_indices[type_count] =
> RADV_MEM_TYPE_GTT_CACHED;
> > device->memory_properties.memoryTypes[type_count++] =
> (VkMemoryType) {
> > --
> > 2.21.0
> >
>

[Mesa-dev] [PATCH] radv: Change memory type order for GPUs without dedicated VRAM

2019-06-02 Thread Alex Smith
Put the uncached GTT type at a higher index than the visible VRAM type,
rather than having GTT first.

When we don't have dedicated VRAM, we don't have a non-visible VRAM
type, and the property flags for GTT and visible VRAM are identical.
According to the spec, for types with identical flags, we should give
the one with better performance a lower index.

Previously, apps which follow the spec guidance for choosing a memory
type would have picked the GTT type in preference to visible VRAM (all
Feral games will do this), and end up with lower performance.

On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
other (Feral) games and saw similar improvement on those as well.

Signed-off-by: Alex Smith 
---
I noticed that the memory types advertised on my Raven laptop looked a
bit odd so played around with it and found this. I'm not sure if it is
actually expected that the performance difference between visible VRAM
and GTT is so large, seeing as it's not dedicated VRAM, but the results
are clear (and consistent, tested multiple times).
---
 src/amd/vulkan/radv_device.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 3cf050ed220..d36ee226ebd 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct 
radv_physical_device *device)
.heapIndex = vram_index,
};
}
-   if (gart_index >= 0) {
+   if (gart_index >= 0 && device->rad_info.has_dedicated_vram) {
device->mem_type_indices[type_count] = 
RADV_MEM_TYPE_GTT_WRITE_COMBINE;
device->memory_properties.memoryTypes[type_count++] = 
(VkMemoryType) {
.propertyFlags = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
-   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT |
-   (device->rad_info.has_dedicated_vram ? 0 : 
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT),
+   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
.heapIndex = gart_index,
};
}
@@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct 
radv_physical_device *device)
.heapIndex = visible_vram_index,
};
}
+   if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) {
+   /* Put GTT after visible VRAM for GPUs without dedicated VRAM
+* as they have identical property flags, and according to the
+* spec, for types with identical flags, the one with greater
+* performance must be given a lower index. */
+   device->mem_type_indices[type_count] = 
RADV_MEM_TYPE_GTT_WRITE_COMBINE;
+   device->memory_properties.memoryTypes[type_count++] = 
(VkMemoryType) {
+   .propertyFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
+   VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
+   VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
+   .heapIndex = gart_index,
+   };
+   }
if (gart_index >= 0) {
device->mem_type_indices[type_count] = RADV_MEM_TYPE_GTT_CACHED;
device->memory_properties.memoryTypes[type_count++] = 
(VkMemoryType) {
-- 
2.21.0


Re: [Mesa-dev] [PATCH v3 3/3] radv: add support for VK_EXT_memory_budget

2019-01-09 Thread Alex Smith
Reviewed-by: Alex Smith 

for the series.

On Wed, 9 Jan 2019 at 13:37, Samuel Pitoiset 
wrote:

> A simple Vulkan extension that allows apps to query size and
> usage of all exposed memory heaps.
>
> The different usage values are not really accurate because
> they are per drm-fd, but they should be close enough.
>
> v3: - use atomic operations in the winsys
> v2: - add software counters for the different heaps in the winsys
> - improve budget/usage computations based on these counters
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_device.c  | 72 +++
>  src/amd/vulkan/radv_extensions.py |  1 +
>  src/amd/vulkan/radv_radeon_winsys.h   |  4 ++
>  src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c | 38 +-
>  .../vulkan/winsys/amdgpu/radv_amdgpu_winsys.c |  6 ++
>  .../vulkan/winsys/amdgpu/radv_amdgpu_winsys.h |  4 ++
>  6 files changed, 124 insertions(+), 1 deletion(-)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 279917f3e0c..4bf36f9f384 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -1350,12 +1350,84 @@ void radv_GetPhysicalDeviceMemoryProperties(
> *pMemoryProperties = physical_device->memory_properties;
>  }
>
> +static void
> +radv_get_memory_budget_properties(VkPhysicalDevice physicalDevice,
> +
>  VkPhysicalDeviceMemoryBudgetPropertiesEXT *memoryBudget)
> +{
> +   RADV_FROM_HANDLE(radv_physical_device, device, physicalDevice);
> +   VkPhysicalDeviceMemoryProperties *memory_properties =
> >memory_properties;
> +   uint64_t visible_vram_size = radv_get_visible_vram_size(device);
> +   uint64_t vram_size = radv_get_vram_size(device);
> +   uint64_t gtt_size = device->rad_info.gart_size;
> +   uint64_t heap_budget, heap_usage;
> +
> +   /* For all memory heaps, the computation of budget is as follow:
> +*  heap_budget = heap_size - global_heap_usage +
> app_heap_usage
> +*
> +* The Vulkan spec 1.1.97 says that the budget should include any
> +* currently allocated device memory.
> +*
> +* Note that the application heap usages are not really accurate
> (eg.
> +* in presence of shared buffers).
> +*/
> +   if (vram_size) {
> +   heap_usage = device->ws->query_value(device->ws,
> +
> RADEON_ALLOCATED_VRAM);
> +
> +   heap_budget = vram_size -
> +   device->ws->query_value(device->ws,
> RADEON_VRAM_USAGE) +
> +   heap_usage;
> +
> +   memoryBudget->heapBudget[RADV_MEM_HEAP_VRAM] = heap_budget;
> +   memoryBudget->heapUsage[RADV_MEM_HEAP_VRAM] = heap_usage;
> +   }
> +
> +   if (visible_vram_size) {
> +   heap_usage = device->ws->query_value(device->ws,
> +
> RADEON_ALLOCATED_VRAM_VIS);
> +
> +   heap_budget = visible_vram_size -
> +   device->ws->query_value(device->ws,
> RADEON_VRAM_VIS_USAGE) +
> +   heap_usage;
> +
> +   memoryBudget->heapBudget[RADV_MEM_HEAP_VRAM_CPU_ACCESS] =
> heap_budget;
> +   memoryBudget->heapUsage[RADV_MEM_HEAP_VRAM_CPU_ACCESS] =
> heap_usage;
> +   }
> +
> +   if (gtt_size) {
> +   heap_usage = device->ws->query_value(device->ws,
> +RADEON_ALLOCATED_GTT);
> +
> +   heap_budget = gtt_size -
> +   device->ws->query_value(device->ws,
> RADEON_GTT_USAGE) +
> +   heap_usage;
> +
> +   memoryBudget->heapBudget[RADV_MEM_HEAP_GTT] = heap_budget;
> +   memoryBudget->heapUsage[RADV_MEM_HEAP_GTT] = heap_usage;
> +   }
> +
> +   /* The heapBudget and heapUsage values must be zero for array
> elements
> +* greater than or equal to
> +* VkPhysicalDeviceMemoryProperties::memoryHeapCount.
> +*/
> +   for (uint32_t i = memory_properties->memoryHeapCount; i <
> VK_MAX_MEMORY_HEAPS; i++) {
> +   memoryBudget->heapBudget[i] = 0;
> +   memoryBudget->heapUsage[i] = 0;
> +   }
> +}
> +
>  void radv_GetPhysicalDeviceMemoryProperties2(
> VkPhysicalDevicephysicalDevice,
> VkPhysicalDeviceMemoryProperties2  *pMemoryProperties)
>  {
> radv_GetPhysicalDeviceMemoryProperties(physicalDevice,
>
>  >memoryProperties);
> +
> +   VkPhysicalDeviceMemoryBudgetPropertiesE

Re: [Mesa-dev] [PATCH v2 3/3] radv: add support for VK_EXT_memory_budget

2019-01-08 Thread Alex Smith
Thanks! I've played around with this a bit and it looks like it's behaving
how I'd expect.

One comment inline below...

On Tue, 8 Jan 2019 at 15:17, Samuel Pitoiset 
wrote:

> A simple Vulkan extension that allows apps to query size and
> usage of all exposed memory heaps.
>
> The different usage values are not really accurate because
> they are per drm-fd, but they should be close enough.
>
> v2: - add software counters for the different heaps in the winsys
> - improve budget/usage computations based on these counters
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_device.c  | 72 +++
>  src/amd/vulkan/radv_extensions.py |  1 +
>  src/amd/vulkan/radv_radeon_winsys.h   |  4 ++
>  src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c | 29 +++-
>  .../vulkan/winsys/amdgpu/radv_amdgpu_winsys.c |  6 ++
>  .../vulkan/winsys/amdgpu/radv_amdgpu_winsys.h |  4 ++
>  6 files changed, 115 insertions(+), 1 deletion(-)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index d1e47133d1f..f79d54296b4 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -1350,12 +1350,84 @@ void radv_GetPhysicalDeviceMemoryProperties(
> *pMemoryProperties = physical_device->memory_properties;
>  }
>
> +static void
> +radv_get_memory_budget_properties(VkPhysicalDevice physicalDevice,
> +
>  VkPhysicalDeviceMemoryBudgetPropertiesEXT *memoryBudget)
> +{
> +   RADV_FROM_HANDLE(radv_physical_device, device, physicalDevice);
> +   VkPhysicalDeviceMemoryProperties *memory_properties =
> >memory_properties;
> +   uint64_t visible_vram_size = radv_get_visible_vram_size(device);
> +   uint64_t vram_size = radv_get_vram_size(device);
> +   uint64_t gtt_size = device->rad_info.gart_size;
> +   uint64_t heap_budget, heap_usage;
> +
> +   /* For all memory heaps, the computation of budget is as follow:
> +*  heap_budget = heap_size - global_heap_usage +
> app_heap_usage
> +*
> +* The Vulkan spec 1.1.97 says that the budget should include any
> +* currently allocated device memory.
> +*
> +* Note that the application heap usages are not really accurate
> (eg.
> +* in presence of shared buffers).
> +*/
> +   if (vram_size) {
> +   heap_usage = device->ws->query_value(device->ws,
> +
> RADEON_ALLOCATED_VRAM);
> +
> +   heap_budget = vram_size -
> +   device->ws->query_value(device->ws,
> RADEON_VRAM_USAGE) +
> +   heap_usage;
> +
> +   memoryBudget->heapBudget[RADV_MEM_HEAP_VRAM] = heap_budget;
> +   memoryBudget->heapUsage[RADV_MEM_HEAP_VRAM] = heap_usage;
> +   }
> +
> +   if (visible_vram_size) {
> +   heap_usage = device->ws->query_value(device->ws,
> +
> RADEON_ALLOCATED_VRAM_VIS);
> +
> +   heap_budget = visible_vram_size -
> +   device->ws->query_value(device->ws,
> RADEON_VRAM_VIS_USAGE) +
> +   heap_usage;
> +
> +   memoryBudget->heapBudget[RADV_MEM_HEAP_VRAM_CPU_ACCESS] =
> heap_budget;
> +   memoryBudget->heapUsage[RADV_MEM_HEAP_VRAM_CPU_ACCESS] =
> heap_usage;
> +   }
> +
> +   if (gtt_size) {
> +   heap_usage = device->ws->query_value(device->ws,
> +RADEON_ALLOCATED_GTT);
> +
> +   heap_budget = gtt_size -
> +   device->ws->query_value(device->ws,
> RADEON_GTT_USAGE) +
> +   heap_usage;
> +
> +   memoryBudget->heapBudget[RADV_MEM_HEAP_GTT] = heap_budget;
> +   memoryBudget->heapUsage[RADV_MEM_HEAP_GTT] = heap_usage;
> +   }
> +
> +   /* The heapBudget and heapUsage values must be zero for array
> elements
> +* greater than or equal to
> +* VkPhysicalDeviceMemoryProperties::memoryHeapCount.
> +*/
> +   for (uint32_t i = memory_properties->memoryHeapCount; i <
> VK_MAX_MEMORY_HEAPS; i++) {
> +   memoryBudget->heapBudget[i] = 0;
> +   memoryBudget->heapUsage[i] = 0;
> +   }
> +}
> +
>  void radv_GetPhysicalDeviceMemoryProperties2(
> VkPhysicalDevicephysicalDevice,
> VkPhysicalDeviceMemoryProperties2KHR   *pMemoryProperties)
>  {
> radv_GetPhysicalDeviceMemoryProperties(physicalDevice,
>
>  &pMemoryProperties->memoryProperties);
> +
> +   VkPhysicalDeviceMemoryBudgetPropertiesEXT *memory_budget =
> +   vk_find_struct(pMemoryProperties->pNext,
> +
> PHYSICAL_DEVICE_MEMORY_BUDGET_PROPERTIES_EXT);
> +   if (memory_budget)
> +   radv_get_memory_budget_properties(physicalDevice,
> memory_budget);
>  }
>
>  VkResult radv_GetMemoryHostPointerPropertiesEXT(
> diff --git a/src/amd/vulkan/radv_extensions.py
> b/src/amd/vulkan/radv_extensions.py
> index 

Re: [Mesa-dev] [PATCH 3/3] radv: add support for VK_EXT_memory_budget

2019-01-08 Thread Alex Smith
On Mon, 7 Jan 2019 at 17:20, Samuel Pitoiset 
wrote:

>
> On 1/7/19 6:06 PM, Alex Smith wrote:
>
> Hi Samuel,
>
> Thanks for implementing this - I've been wanting this extension for a
> while so it's good it's finally available.
>
> This is just reporting the total heap sizes as the budget, which is the
> same info we already get from the basic heap properties. The way I'd
> expected budget to work (and what the spec is saying as far as I can see)
> is that it's an estimate of how much is available for the calling app to
> use in that heap at the time of the call, so should account for current
> system-wide usage of the heap by other apps. Shouldn't this be something
> like (heap size - system wide usage of the heap + current app usage of the
> heap)? (+ app usage since the spec says budget includes currently allocated
> device memory)
>
> Hi Alex,
>
> Yes, I was also wondering about that. We can add per-process counters for
> VRAM and GTT heaps, but I don't see how we can be accurate for the visible
> VRAM heap.
>
> As said in the commit description, that implementation is really
> inaccurate. Though if you need something better, I can improve it.
>
What I'm after from this extension is the ability to get an idea of when we
will either start failing allocations, or start causing some allocations to
be paged out from a heap (and probably cause a perf degradation). We have
our own systems for relocating resources between heaps when under memory
pressure, but currently we can only decide when we need to do this based on
guesswork of how much we need to leave free in a heap for other apps or
when we actually get to the point where allocations fail. For this
extension to be useful to improve on that it needs to include some sort of
reporting of app usage vs system-wide usage.

Thanks,
Alex

> Note that I agree with you about the spec.
>
>
> Alex
>
> On Mon, 7 Jan 2019 at 16:35, Samuel Pitoiset 
> wrote:
>
>> A simple Vulkan extension that allows apps to query size and
>> usage of all exposed memory heaps.
>>
>> The different usage values are not really accurate because
>> they are per drm-fd, but they should be close enough.
>>
>> Signed-off-by: Samuel Pitoiset 
>> ---
>>  src/amd/vulkan/radv_device.c  | 44 +++
>>  src/amd/vulkan/radv_extensions.py |  1 +
>>  2 files changed, 45 insertions(+)
>>
>> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
>> index cef3a430555..32eaeb3b226 100644
>> --- a/src/amd/vulkan/radv_device.c
>> +++ b/src/amd/vulkan/radv_device.c
>> @@ -1352,12 +1352,56 @@ void radv_GetPhysicalDeviceMemoryProperties(
>> *pMemoryProperties = physical_device->memory_properties;
>>  }
>>
>> +static void
>> +radv_get_memory_budget_properties(VkPhysicalDevice physicalDevice,
>> +
>>  VkPhysicalDeviceMemoryBudgetPropertiesEXT *memoryBudget)
>> +{
>> +   RADV_FROM_HANDLE(radv_physical_device, device, physicalDevice);
>> +   VkPhysicalDeviceMemoryProperties *memory_properties =
>> &device->memory_properties;
>> +   uint64_t visible_vram_size = radv_get_visible_vram_size(device);
>> +   uint64_t vram_size = radv_get_vram_size(device);
>> +   uint64_t gtt_size = device->rad_info.gart_size;
>> +
>> +   if (vram_size) {
>> +   memoryBudget->heapBudget[RADV_MEM_HEAP_VRAM] = vram_size;
>> +   memoryBudget->heapUsage[RADV_MEM_HEAP_VRAM] =
>> +   device->ws->query_value(device->ws,
>> RADEON_VRAM_USAGE);
>> +   }
>> +
>> +   if (visible_vram_size) {
>> +   memoryBudget->heapBudget[RADV_MEM_HEAP_VRAM_CPU_ACCESS] =
>> visible_vram_size;
>> +   memoryBudget->heapUsage[RADV_MEM_HEAP_VRAM_CPU_ACCESS] =
>> +   device->ws->query_value(device->ws,
>> RADEON_VRAM_VIS_USAGE);
>> +   }
>> +
>> +   if (gtt_size) {
>> +   memoryBudget->heapBudget[RADV_MEM_HEAP_GTT] = gtt_size;
>> +   memoryBudget->heapUsage[RADV_MEM_HEAP_GTT] =
>> +   device->ws->query_value(device->ws,
>> RADEON_GTT_USAGE);
>> +   }
>> +
>> +   /* The heapBudget and heapUsage values must be zero for array
>> elements
>> +* greater than or equal to
>> +* VkPhysicalDeviceMemoryProperties::memoryHeapCount.
>> +*/
>> +   for (uint32_t i = memory_properties->memoryHeapCount; i <
>> VK_MAX_MEMORY_HEAPS; i++) {
>> +   memoryBudget->heapBudget[i] = 0;
>> +  

Re: [Mesa-dev] [PATCH 3/3] radv: add support for VK_EXT_memory_budget

2019-01-07 Thread Alex Smith
Hi Samuel,

Thanks for implementing this - I've been wanting this extension for a while
so it's good it's finally available.

This is just reporting the total heap sizes as the budget, which is the
same info we already get from the basic heap properties. The way I'd
expected budget to work (and what the spec is saying as far as I can see)
is that it's an estimate of how much is available for the calling app to
use in that heap at the time of the call, so should account for current
system-wide usage of the heap by other apps. Shouldn't this be something
like (heap size - system wide usage of the heap + current app usage of the
heap)? (+ app usage since the spec says budget includes currently allocated
device memory)

Alex

On Mon, 7 Jan 2019 at 16:35, Samuel Pitoiset 
wrote:

> A simple Vulkan extension that allows apps to query size and
> usage of all exposed memory heaps.
>
> The different usage values are not really accurate because
> they are per drm-fd, but they should be close enough.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_device.c  | 44 +++
>  src/amd/vulkan/radv_extensions.py |  1 +
>  2 files changed, 45 insertions(+)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index cef3a430555..32eaeb3b226 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -1352,12 +1352,56 @@ void radv_GetPhysicalDeviceMemoryProperties(
> *pMemoryProperties = physical_device->memory_properties;
>  }
>
> +static void
> +radv_get_memory_budget_properties(VkPhysicalDevice physicalDevice,
> +
>  VkPhysicalDeviceMemoryBudgetPropertiesEXT *memoryBudget)
> +{
> +   RADV_FROM_HANDLE(radv_physical_device, device, physicalDevice);
> +   VkPhysicalDeviceMemoryProperties *memory_properties =
> &device->memory_properties;
> +   uint64_t visible_vram_size = radv_get_visible_vram_size(device);
> +   uint64_t vram_size = radv_get_vram_size(device);
> +   uint64_t gtt_size = device->rad_info.gart_size;
> +
> +   if (vram_size) {
> +   memoryBudget->heapBudget[RADV_MEM_HEAP_VRAM] = vram_size;
> +   memoryBudget->heapUsage[RADV_MEM_HEAP_VRAM] =
> +   device->ws->query_value(device->ws,
> RADEON_VRAM_USAGE);
> +   }
> +
> +   if (visible_vram_size) {
> +   memoryBudget->heapBudget[RADV_MEM_HEAP_VRAM_CPU_ACCESS] =
> visible_vram_size;
> +   memoryBudget->heapUsage[RADV_MEM_HEAP_VRAM_CPU_ACCESS] =
> +   device->ws->query_value(device->ws,
> RADEON_VRAM_VIS_USAGE);
> +   }
> +
> +   if (gtt_size) {
> +   memoryBudget->heapBudget[RADV_MEM_HEAP_GTT] = gtt_size;
> +   memoryBudget->heapUsage[RADV_MEM_HEAP_GTT] =
> +   device->ws->query_value(device->ws,
> RADEON_GTT_USAGE);
> +   }
> +
> +   /* The heapBudget and heapUsage values must be zero for array
> elements
> +* greater than or equal to
> +* VkPhysicalDeviceMemoryProperties::memoryHeapCount.
> +*/
> +   for (uint32_t i = memory_properties->memoryHeapCount; i <
> VK_MAX_MEMORY_HEAPS; i++) {
> +   memoryBudget->heapBudget[i] = 0;
> +   memoryBudget->heapUsage[i] = 0;
> +   }
> +}
> +
>  void radv_GetPhysicalDeviceMemoryProperties2(
> VkPhysicalDevicephysicalDevice,
> VkPhysicalDeviceMemoryProperties2KHR   *pMemoryProperties)
>  {
> radv_GetPhysicalDeviceMemoryProperties(physicalDevice,
>
>  &pMemoryProperties->memoryProperties);
> +
> +   VkPhysicalDeviceMemoryBudgetPropertiesEXT *memory_budget =
> +   vk_find_struct(pMemoryProperties->pNext,
> +
> PHYSICAL_DEVICE_MEMORY_BUDGET_PROPERTIES_EXT);
> +   if (memory_budget)
> +   radv_get_memory_budget_properties(physicalDevice,
> memory_budget);
>  }
>
>  VkResult radv_GetMemoryHostPointerPropertiesEXT(
> diff --git a/src/amd/vulkan/radv_extensions.py
> b/src/amd/vulkan/radv_extensions.py
> index 9952bb9c1c6..491ed9d94c3 100644
> --- a/src/amd/vulkan/radv_extensions.py
> +++ b/src/amd/vulkan/radv_extensions.py
> @@ -105,6 +105,7 @@ EXTENSIONS = [
>  Extension('VK_EXT_external_memory_dma_buf',   1, True),
>  Extension('VK_EXT_external_memory_host',  1,
> 'device->rad_info.has_userptr'),
>  Extension('VK_EXT_global_priority',   1,
> 'device->rad_info.has_ctx_priority'),
> +Extension('VK_EXT_memory_budget', 1, True),
>  Extension('VK_EXT_pci_bus_info',  2, True),
>  Extension('VK_EXT_sampler_filter_minmax', 1,
> 'device->rad_info.chip_class >= CIK'),
>  Extension('VK_EXT_scalar_block_layout',   1,
> 'device->rad_info.chip_class >= CIK'),
> --
> 2.20.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> 

Re: [Mesa-dev] [PATCH] radv: reset pending_reset_query when flushing caches

2018-12-05 Thread Alex Smith
Reviewed-by: Alex Smith 

On Wed, 5 Dec 2018 at 10:32, Samuel Pitoiset 
wrote:

> If the driver used a compute shader for resetting a query pool,
> it should be completed when caches are flushed.
>
> This might reduce the number of stalls if operations are done
> between vkCmdResetQueryPool() and vkCmdBeginQuery()
> (or vkCmdWriteTimestamp()).
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_query.c| 1 -
>  src/amd/vulkan/si_cmd_buffer.c | 5 +
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c
> index e226bcef6a9..276cc1c42d7 100644
> --- a/src/amd/vulkan/radv_query.c
> +++ b/src/amd/vulkan/radv_query.c
> @@ -1447,7 +1447,6 @@ static void emit_query_flush(struct radv_cmd_buffer
> *cmd_buffer,
>  * because we use a CP dma clear.
>  */
> si_emit_cache_flush(cmd_buffer);
> -   cmd_buffer->pending_reset_query = false;
> }
> }
>  }
> diff --git a/src/amd/vulkan/si_cmd_buffer.c
> b/src/amd/vulkan/si_cmd_buffer.c
> index a9f25725415..2f57584bf82 100644
> --- a/src/amd/vulkan/si_cmd_buffer.c
> +++ b/src/amd/vulkan/si_cmd_buffer.c
> @@ -992,6 +992,11 @@ si_emit_cache_flush(struct radv_cmd_buffer
> *cmd_buffer)
> radv_cmd_buffer_trace_emit(cmd_buffer);
>
> cmd_buffer->state.flush_bits = 0;
> +
> +   /* If the driver used a compute shader for resetting a query pool,
> it
> +* should be finished at this point.
> +*/
> +   cmd_buffer->pending_reset_query = false;
>  }
>
>  /* sets the CP predication state using a boolean stored at va */
> --
> 2.19.2
>


Re: [Mesa-dev] [PATCH] radv: Flush before vkCmdWriteTimestamp() if needed

2018-12-05 Thread Alex Smith
Thanks. Though this fixes the 100% repro hang, I think your first patch is
still needed as well to handle getting 0xffffffff in the low 32 bits.

On Wed, 5 Dec 2018 at 10:04, Samuel Pitoiset 
wrote:

> Yes, this is correct, indeed.
>
> The issue wasn't present because we used EOP events before removing the
> availability bit.
>
> Btw, just noticed that we should reset pending_reset_query directly in
> si_emit_cache_flush() to reduce the number of stalls. I will send a patch.
>
> Also note that fill CP DMA operations are currently always sync'ed,
> while CP DMA copies are not. I plan to change this at some point.
>
> Reviewed-by: Samuel Pitoiset 
>
> On 12/5/18 10:52 AM, Alex Smith wrote:
> > As done for vkCmdBeginQuery() already. Prevents timestamps from being
> > overwritten by previous vkCmdResetQueryPool() calls if the shader path
> > was used to do the reset.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108925
> > Fixes: a41e2e9cf5 ("radv: allow to use a compute shader for resetting
> the query pool")
> > Signed-off-by: Alex Smith 
> > ---
> >   src/amd/vulkan/radv_query.c | 30 +++---
> >   1 file changed, 19 insertions(+), 11 deletions(-)
> >
> > diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c
> > index 550abe307a..e226bcef6a 100644
> > --- a/src/amd/vulkan/radv_query.c
> > +++ b/src/amd/vulkan/radv_query.c
> > @@ -1436,6 +1436,22 @@ static unsigned event_type_for_stream(unsigned
> stream)
> >   }
> >   }
> >
> > +static void emit_query_flush(struct radv_cmd_buffer *cmd_buffer,
> > +  struct radv_query_pool *pool)
> > +{
> > + if (cmd_buffer->pending_reset_query) {
> > + if (pool->size >= RADV_BUFFER_OPS_CS_THRESHOLD) {
> > + /* Only need to flush caches if the query pool
> size is
> > +  * large enough to be resetted using the compute
> shader
> > +  * path. Small pools don't need any cache flushes
> > +  * because we use a CP dma clear.
> > +  */
> > + si_emit_cache_flush(cmd_buffer);
> > + cmd_buffer->pending_reset_query = false;
> > + }
> > + }
> > +}
> > +
> >   static void emit_begin_query(struct radv_cmd_buffer *cmd_buffer,
> >uint64_t va,
> >VkQueryType query_type,
> > @@ -1582,17 +1598,7 @@ void radv_CmdBeginQueryIndexedEXT(
> >
> >   radv_cs_add_buffer(cmd_buffer->device->ws, cs, pool->bo);
> >
> > - if (cmd_buffer->pending_reset_query) {
> > - if (pool->size >= RADV_BUFFER_OPS_CS_THRESHOLD) {
> > - /* Only need to flush caches if the query pool
> size is
> > -  * large enough to be resetted using the compute
> shader
> > -  * path. Small pools don't need any cache flushes
> > -  * because we use a CP dma clear.
> > -  */
> > - si_emit_cache_flush(cmd_buffer);
> > - cmd_buffer->pending_reset_query = false;
> > - }
> > - }
> > + emit_query_flush(cmd_buffer, pool);
> >
> >   va += pool->stride * query;
> >
> > @@ -1669,6 +1675,8 @@ void radv_CmdWriteTimestamp(
> >
> >   radv_cs_add_buffer(cmd_buffer->device->ws, cs, pool->bo);
> >
> > + emit_query_flush(cmd_buffer, pool);
> > +
> >   int num_queries = 1;
> >   if (cmd_buffer->state.subpass &&
> cmd_buffer->state.subpass->view_mask)
> >   num_queries =
> util_bitcount(cmd_buffer->state.subpass->view_mask);
> >
>


[Mesa-dev] [PATCH] radv: Flush before vkCmdWriteTimestamp() if needed

2018-12-05 Thread Alex Smith
As done for vkCmdBeginQuery() already. Prevents timestamps from being
overwritten by previous vkCmdResetQueryPool() calls if the shader path
was used to do the reset.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108925
Fixes: a41e2e9cf5 ("radv: allow to use a compute shader for resetting the query 
pool")
Signed-off-by: Alex Smith 
---
 src/amd/vulkan/radv_query.c | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c
index 550abe307a..e226bcef6a 100644
--- a/src/amd/vulkan/radv_query.c
+++ b/src/amd/vulkan/radv_query.c
@@ -1436,6 +1436,22 @@ static unsigned event_type_for_stream(unsigned stream)
}
 }
 
+static void emit_query_flush(struct radv_cmd_buffer *cmd_buffer,
+struct radv_query_pool *pool)
+{
+   if (cmd_buffer->pending_reset_query) {
+   if (pool->size >= RADV_BUFFER_OPS_CS_THRESHOLD) {
+   /* Only need to flush caches if the query pool size is
+* large enough to be resetted using the compute shader
+* path. Small pools don't need any cache flushes
+* because we use a CP dma clear.
+*/
+   si_emit_cache_flush(cmd_buffer);
+   cmd_buffer->pending_reset_query = false;
+   }
+   }
+}
+
 static void emit_begin_query(struct radv_cmd_buffer *cmd_buffer,
 uint64_t va,
 VkQueryType query_type,
@@ -1582,17 +1598,7 @@ void radv_CmdBeginQueryIndexedEXT(
 
radv_cs_add_buffer(cmd_buffer->device->ws, cs, pool->bo);
 
-   if (cmd_buffer->pending_reset_query) {
-   if (pool->size >= RADV_BUFFER_OPS_CS_THRESHOLD) {
-   /* Only need to flush caches if the query pool size is
-* large enough to be resetted using the compute shader
-* path. Small pools don't need any cache flushes
-* because we use a CP dma clear.
-*/
-   si_emit_cache_flush(cmd_buffer);
-   cmd_buffer->pending_reset_query = false;
-   }
-   }
+   emit_query_flush(cmd_buffer, pool);
 
va += pool->stride * query;
 
@@ -1669,6 +1675,8 @@ void radv_CmdWriteTimestamp(
 
radv_cs_add_buffer(cmd_buffer->device->ws, cs, pool->bo);
 
+   emit_query_flush(cmd_buffer, pool);
+
int num_queries = 1;
if (cmd_buffer->state.subpass && cmd_buffer->state.subpass->view_mask)
num_queries = 
util_bitcount(cmd_buffer->state.subpass->view_mask);
-- 
2.19.1



Re: [Mesa-dev] [PATCH] radv: fix vkCmdCopyQueryoolResults() for timestamp queries

2018-12-05 Thread Alex Smith
On Tue, 4 Dec 2018 at 21:57, Bas Nieuwenhuizen 
wrote:

> On Tue, Dec 4, 2018 at 4:52 PM Samuel Pitoiset
>  wrote:
> >
> > Because WAIT_REG_MEM can only wait for a 32-bit value, it's not
> > safe to use it for timestamp queries. If we only wait on the low
> > 32 bits of a timestamp query we could be unlucky and the GPU
> > might hang.
> >
> > One possible fix is to emit a full end of pipe event and wait
> > on a 32-bit value which is actually an availability bit. This
> > bit is allocated at creation time and always cleared before
> > emitting the EOP event.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108925
> > Fixes: 5d6a560a29 ("radv: do not use the availability bit for timestamp
> queries")
> > Signed-off-by: Samuel Pitoiset 
> > ---
> >  src/amd/vulkan/radv_query.c | 49 +++--
> >  1 file changed, 41 insertions(+), 8 deletions(-)
> >
> > diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c
> > index 550abe307a1..9bb6b660add 100644
> > --- a/src/amd/vulkan/radv_query.c
> > +++ b/src/amd/vulkan/radv_query.c
> > @@ -1056,8 +1056,15 @@ VkResult radv_CreateQueryPool(
> > pool->pipeline_stats_mask = pCreateInfo->pipelineStatistics;
> > pool->availability_offset = pool->stride *
> pCreateInfo->queryCount;
> > pool->size = pool->availability_offset;
> > -   if (pCreateInfo->queryType == VK_QUERY_TYPE_PIPELINE_STATISTICS)
> > +   if (pCreateInfo->queryType == VK_QUERY_TYPE_PIPELINE_STATISTICS)
> {
> > pool->size += 4 * pCreateInfo->queryCount;
> > +   } else if (pCreateInfo->queryType == VK_QUERY_TYPE_TIMESTAMP) {
> > +   /* Allocate one DWORD for the availability bit which is
> needed
> > +* for vkCmdCopyQueryPoolResults() because we can't
> perform a
> > +* WAIT_REG_MEM on a 64-bit value.
> > +*/
> > +   pool->size += 4;
> > +   }
> >
> > pool->bo = device->ws->buffer_create(device->ws, pool->size,
> >  64, RADEON_DOMAIN_GTT,
> RADEON_FLAG_NO_INTERPROCESS_SHARING);
> > @@ -1328,19 +1335,45 @@ void radv_CmdCopyQueryPoolResults(
> >   pool->availability_offset + 4 *
> firstQuery);
> > break;
> > case VK_QUERY_TYPE_TIMESTAMP:
> > +   if (flags & VK_QUERY_RESULT_WAIT_BIT) {
> > +   /* Emit a full end of pipe event because we can't
> > +* perform a WAIT_REG_MEM on a 64-bit value. If
> we only
> > +* do a WAIT_REG_MEM on the low 32 bits of a
> timestamp
> > +* query we could be unlucky and the GPU might
> hang.
> > +*/
> > +   enum chip_class chip =
> cmd_buffer->device->physical_device->rad_info.chip_class;
> > +   bool is_mec =
> radv_cmd_buffer_uses_mec(cmd_buffer);
> > +   uint64_t avail_va = va +
> pool->availability_offset;
> > +
> > +   /* Clear the availability bit before waiting on
> the end
> > +* of pipe event.
> > +*/
> > +   radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
> > +   radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
> > +   S_370_WR_CONFIRM(1) |
> > +   S_370_ENGINE_SEL(V_370_ME));
> > +   radeon_emit(cs, avail_va);
> > +   radeon_emit(cs, avail_va >> 32);
> > +   radeon_emit(cs, 0xdeadbeef);
> > +
> > +   /* Wait for all prior GPU work. */
> > +   si_cs_emit_write_event_eop(cs, chip, is_mec,
> > +
> V_028A90_BOTTOM_OF_PIPE_TS, 0,
> > +
> EOP_DATA_SEL_VALUE_32BIT,
> > +  avail_va, 0, 1,
> > +
> cmd_buffer->gfx9_eop_bug_va);
> > +
> > +   /* Wait on the timestamp value. */
> > +   radv_cp_wait_mem(cs, WAIT_REG_MEM_EQUAL,
> avail_va,
> > +1, 0xffffffff);
> > +   }
> > +
>
> Can we put this in a separate function? Also, you'll want to allocate
> the availability bit in the upload buffer, in case there are multiple
> concurrent command buffers using the same query pool.
>
> Alternative solution: look at the upper 32 bits, those definitely
> should not be 0xfff until a far away point in the future.
>

I just looked into this a bit more, since if the cause of the hang is that
the low 32 bits of a valid timestamp are 0xffffffff, it seemed a bit
suspicious that it's 100% repro.

What's actually happening is that some of the timestamps are being written
before vkCmdResetQueryPool completes, so the reset ends up overwriting them
back to TIMESTAMP_NOT_READY. I've updated the test case on the bug to map

Re: [Mesa-dev] [PATCH] radv: fix vkCmdCopyQueryoolResults() for timestamp queries

2018-12-04 Thread Alex Smith
Tested-by: Alex Smith 

Thanks! s/Queryool/QueryPool/ in the subject, btw.

On Tue, 4 Dec 2018 at 15:52, Samuel Pitoiset 
wrote:

> Because WAIT_REG_MEM can only wait for a 32-bit value, it's not
> safe to use it for timestamp queries. If we only wait on the low
> 32 bits of a timestamp query we could be unlucky and the GPU
> might hang.
>
> One possible fix is to emit a full end of pipe event and wait
> on a 32-bit value which is actually an availability bit. This
> bit is allocated at creation time and always cleared before
> emitting the EOP event.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108925
> Fixes: 5d6a560a29 ("radv: do not use the availability bit for timestamp
> queries")
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_query.c | 49 +++--
>  1 file changed, 41 insertions(+), 8 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c
> index 550abe307a1..9bb6b660add 100644
> --- a/src/amd/vulkan/radv_query.c
> +++ b/src/amd/vulkan/radv_query.c
> @@ -1056,8 +1056,15 @@ VkResult radv_CreateQueryPool(
> pool->pipeline_stats_mask = pCreateInfo->pipelineStatistics;
> pool->availability_offset = pool->stride * pCreateInfo->queryCount;
> pool->size = pool->availability_offset;
> -   if (pCreateInfo->queryType == VK_QUERY_TYPE_PIPELINE_STATISTICS)
> +   if (pCreateInfo->queryType == VK_QUERY_TYPE_PIPELINE_STATISTICS) {
> pool->size += 4 * pCreateInfo->queryCount;
> +   } else if (pCreateInfo->queryType == VK_QUERY_TYPE_TIMESTAMP) {
> +   /* Allocate one DWORD for the availability bit which is
> needed
> +* for vkCmdCopyQueryPoolResults() because we can't
> perform a
> +* WAIT_REG_MEM on a 64-bit value.
> +*/
> +   pool->size += 4;
> +   }
>
> pool->bo = device->ws->buffer_create(device->ws, pool->size,
>  64, RADEON_DOMAIN_GTT,
> RADEON_FLAG_NO_INTERPROCESS_SHARING);
> @@ -1328,19 +1335,45 @@ void radv_CmdCopyQueryPoolResults(
>   pool->availability_offset + 4 *
> firstQuery);
> break;
> case VK_QUERY_TYPE_TIMESTAMP:
> +   if (flags & VK_QUERY_RESULT_WAIT_BIT) {
> +   /* Emit a full end of pipe event because we can't
> +* perform a WAIT_REG_MEM on a 64-bit value. If we
> only
> +* do a WAIT_REG_MEM on the low 32 bits of a
> timestamp
> +* query we could be unlucky and the GPU might
> hang.
> +*/
> +   enum chip_class chip =
> cmd_buffer->device->physical_device->rad_info.chip_class;
> +   bool is_mec = radv_cmd_buffer_uses_mec(cmd_buffer);
> +   uint64_t avail_va = va + pool->availability_offset;
> +
> +   /* Clear the availability bit before waiting on
> the end
> +* of pipe event.
> +*/
> +   radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
> +   radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
> +   S_370_WR_CONFIRM(1) |
> +   S_370_ENGINE_SEL(V_370_ME));
> +   radeon_emit(cs, avail_va);
> +   radeon_emit(cs, avail_va >> 32);
> +   radeon_emit(cs, 0xdeadbeef);
> +
> +   /* Wait for all prior GPU work. */
> +   si_cs_emit_write_event_eop(cs, chip, is_mec,
> +
> V_028A90_BOTTOM_OF_PIPE_TS, 0,
> +
> EOP_DATA_SEL_VALUE_32BIT,
> +  avail_va, 0, 1,
> +
> cmd_buffer->gfx9_eop_bug_va);
> +
> +   /* Wait on the timestamp value. */
> +   radv_cp_wait_mem(cs, WAIT_REG_MEM_EQUAL, avail_va,
> +1, 0xffffffff);
> +   }
> +
> for(unsigned i = 0; i < queryCount; ++i, dest_va +=
> stride) {
> unsigned query = firstQuery + i;
> uint64_t local_src_va = va  + query * pool->stride;
>
> MAYBE_UNUSED unsigned cdw_max =
> radeon_check_space(cmd_buffer->device->ws, cs, 19);
>
> -
> -   if (flags & VK_QUERY_RESULT_WAIT_BIT) {
> -   radv_cp_wait_mem(cs,
> WAIT_REG_MEM

Re: [Mesa-dev] [PATCH] radv: Clamp gfx9 image view extents to the allocated image extents.

2018-11-27 Thread Alex Smith
Tested-by: Alex Smith 

Confirmed it fixes both the testcase and the in-game bug it was causing.
Thanks!

On Tue, 27 Nov 2018 at 08:34, Samuel Pitoiset 
wrote:

> cc stable?
>
> Reviewed-by: Samuel Pitoiset 
>
> On 11/24/18 11:31 PM, Bas Nieuwenhuizen wrote:
> > Mirrors AMDVLK. Looks like if we go over the alignment of height
> > we actually start to change the addressing. Seems like the extra
> > miplevels actually work with this.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108245
> > Fixes: f6cc15dccd5 "radv/gfx9: fix block compression texture views. (v2)"
> > ---
> >   src/amd/vulkan/radv_image.c | 6 ++
> >   1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
> > index 7492bf48b51..ba8e28f0e23 100644
> > --- a/src/amd/vulkan/radv_image.c
> > +++ b/src/amd/vulkan/radv_image.c
> > @@ -1175,8 +1175,6 @@ radv_image_view_init(struct radv_image_view *iview,
> >if (device->physical_device->rad_info.chip_class >= GFX9
> &&
> >vk_format_is_compressed(image->vk_format) &&
> >!vk_format_is_compressed(iview->vk_format)) {
> > -  unsigned rounded_img_w =
> util_next_power_of_two(iview->extent.width);
> > -  unsigned rounded_img_h =
> util_next_power_of_two(iview->extent.height);
> >unsigned lvl_width  =
> radv_minify(image->info.width , range->baseMipLevel);
> >unsigned lvl_height =
> radv_minify(image->info.height, range->baseMipLevel);
> >
> > @@ -1186,8 +1184,8 @@ radv_image_view_init(struct radv_image_view *iview,
> >lvl_width <<= range->baseMipLevel;
> >lvl_height <<= range->baseMipLevel;
> >
> > -  iview->extent.width = CLAMP(lvl_width,
> iview->extent.width, rounded_img_w);
> > -  iview->extent.height = CLAMP(lvl_height,
> iview->extent.height, rounded_img_h);
> > +  iview->extent.width = CLAMP(lvl_width,
> iview->extent.width, iview->image->surface.u.gfx9.surf_pitch);
> > +  iview->extent.height = CLAMP(lvl_height,
> iview->extent.height, iview->image->surface.u.gfx9.surf_height);
> >}
> >   }
> >
> >


Re: [Mesa-dev] [PATCH] anv: Fix sanitization of stencil state when the depth test is disabled

2018-10-26 Thread Alex Smith
Good point, I didn't notice they were both now doing the same thing. Merged
them together and pushed.

On Thu, 25 Oct 2018 at 18:00, Jason Ekstrand  wrote:

> Maybe we should just roll the depthTestEnable check in with the ds_aspects
> & VK_IMAGE_ASPECT_DEPTH_BIT check right below it.  In either case,
>
> Reviewed-by: Jason Ekstrand 
>
> On Thu, Oct 25, 2018 at 5:25 AM Alex Smith 
> wrote:
>
>> When depth testing is disabled, we shouldn't pay attention to the
>> specified depthCompareOp, and just treat it as always passing. Before,
>> if the depth test is disabled, but depthCompareOp is VK_COMPARE_OP_NEVER
>> (e.g. from the app having zero-initialized the structure), then
>> sanitize_stencil_face() would have incorrectly changed passOp to
>> VK_STENCIL_OP_KEEP.
>>
>> Signed-off-by: Alex Smith 
>> ---
>>  src/intel/vulkan/genX_pipeline.c | 8 ++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/intel/vulkan/genX_pipeline.c
>> b/src/intel/vulkan/genX_pipeline.c
>> index 33f1f7832a..877a9fb850 100644
>> --- a/src/intel/vulkan/genX_pipeline.c
>> +++ b/src/intel/vulkan/genX_pipeline.c
>> @@ -755,9 +755,13 @@
>> sanitize_ds_state(VkPipelineDepthStencilStateCreateInfo *state,
>>  {
>> *stencilWriteEnable = state->stencilTestEnable;
>>
>> -   /* If the depth test is disabled, we won't be writing anything. */
>> -   if (!state->depthTestEnable)
>> +   /* If the depth test is disabled, we won't be writing anything. Make
>> sure
>> +* we treat it as always passing later on as well.
>> +*/
>> +   if (!state->depthTestEnable) {
>>state->depthWriteEnable = false;
>> +  state->depthCompareOp = VK_COMPARE_OP_ALWAYS;
>> +   }
>>
>> /* The Vulkan spec requires that if either depth or stencil is not
>> present,
>>  * the pipeline is to act as if the test silently passes.
>> --
>> 2.14.4
>>


[Mesa-dev] [PATCH] anv: Fix sanitization of stencil state when the depth test is disabled

2018-10-25 Thread Alex Smith
When depth testing is disabled, we shouldn't pay attention to the
specified depthCompareOp, and just treat it as always passing. Before,
if the depth test is disabled, but depthCompareOp is VK_COMPARE_OP_NEVER
(e.g. from the app having zero-initialized the structure), then
sanitize_stencil_face() would have incorrectly changed passOp to
VK_STENCIL_OP_KEEP.

Signed-off-by: Alex Smith 
---
 src/intel/vulkan/genX_pipeline.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 33f1f7832a..877a9fb850 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -755,9 +755,13 @@ sanitize_ds_state(VkPipelineDepthStencilStateCreateInfo 
*state,
 {
*stencilWriteEnable = state->stencilTestEnable;
 
-   /* If the depth test is disabled, we won't be writing anything. */
-   if (!state->depthTestEnable)
+   /* If the depth test is disabled, we won't be writing anything. Make sure
+* we treat it as always passing later on as well.
+*/
+   if (!state->depthTestEnable) {
   state->depthWriteEnable = false;
+  state->depthCompareOp = VK_COMPARE_OP_ALWAYS;
+   }
 
/* The Vulkan spec requires that if either depth or stencil is not present,
 * the pipeline is to act as if the test silently passes.
-- 
2.14.4



Re: [Mesa-dev] [PATCH] amd/common: check DRM version 3.27 for JPEG decode

2018-10-24 Thread Alex Smith
Thanks, that's fixed it for me.

On Tue, 23 Oct 2018 at 18:05, Liu, Leo  wrote:

> JPEG was added after DRM version 3.26
>
> Signed-off-by: Leo Liu 
> Fixes: 4558758c51749(amd/common: add vcn jpeg ip info query)
> Cc: Boyuan Zhang 
> Cc: Alex Smith 
> ---
>  src/amd/common/ac_gpu_info.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amd/common/ac_gpu_info.c b/src/amd/common/ac_gpu_info.c
> index ed08b500c63..2c70fb2c721 100644
> --- a/src/amd/common/ac_gpu_info.c
> +++ b/src/amd/common/ac_gpu_info.c
> @@ -186,7 +186,7 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle
> dev,
> }
> }
>
> -   if (info->drm_major == 3 && info->drm_minor >= 17) {
> +   if (info->drm_major == 3 && info->drm_minor >= 27) {
> r = amdgpu_query_hw_ip_info(dev, AMDGPU_HW_IP_VCN_JPEG, 0,
> &vcn_jpeg);
> if (r) {
> fprintf(stderr, "amdgpu:
> amdgpu_query_hw_ip_info(vcn_jpeg) failed.\n");
> --
> 2.17.1
>
>


Re: [Mesa-dev] [PATCH 13/15] amd/common: add vcn jpeg ip info query

2018-10-23 Thread Alex Smith
Hi,

With this commit, both radeonsi and radv fail to load for me with:

amdgpu: amdgpu_query_hw_ip_info(vcn_jpeg) failed.

If I comment out that query in ac_gpu_info.c, then they work again. I'm
running kernel 4.18.7 with a Vega 64 - is the DRM version check on that
correct?

Thanks,
Alex

On Wed, 17 Oct 2018 at 20:06,  wrote:

> From: Boyuan Zhang 
>
> Signed-off-by: Boyuan Zhang 
> Reviewed-by: Leo Liu 
> ---
>  src/amd/common/ac_gpu_info.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/src/amd/common/ac_gpu_info.c b/src/amd/common/ac_gpu_info.c
> index 766ad83547..8c50738c3f 100644
> --- a/src/amd/common/ac_gpu_info.c
> +++ b/src/amd/common/ac_gpu_info.c
> @@ -99,7 +99,7 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
> struct drm_amdgpu_info_device device_info = {};
> struct amdgpu_buffer_size_alignments alignment_info = {};
> struct drm_amdgpu_info_hw_ip dma = {}, compute = {}, uvd = {};
> -   struct drm_amdgpu_info_hw_ip uvd_enc = {}, vce = {}, vcn_dec = {};
> +   struct drm_amdgpu_info_hw_ip uvd_enc = {}, vce = {}, vcn_dec = {},
> vcn_jpeg = {};
> struct drm_amdgpu_info_hw_ip vcn_enc = {}, gfx = {};
> struct amdgpu_gds_resource_info gds = {};
> uint32_t vce_version = 0, vce_feature = 0, uvd_version = 0,
> uvd_feature = 0;
> @@ -186,6 +186,14 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle
> dev,
> }
> }
>
> +   if (info->drm_major == 3 && info->drm_minor >= 17) {
> +   r = amdgpu_query_hw_ip_info(dev, AMDGPU_HW_IP_VCN_JPEG, 0, &vcn_jpeg);
> +   if (r) {
> +   fprintf(stderr, "amdgpu: amdgpu_query_hw_ip_info(vcn_jpeg) failed.\n");
> +   return false;
> +   }
> +   }
> +
> r = amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_GFX_ME, 0, 0,
> >me_fw_version,
> >me_fw_feature);
> @@ -340,7 +348,8 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle
> dev,
> info->max_se = amdinfo->num_shader_engines;
> info->max_sh_per_se = amdinfo->num_shader_arrays_per_engine;
> info->has_hw_decode =
> -   (uvd.available_rings != 0) || (vcn_dec.available_rings !=
> 0);
> +   (uvd.available_rings != 0) || (vcn_dec.available_rings !=
> 0) ||
> +   (vcn_jpeg.available_rings != 0);
> info->uvd_fw_version =
> uvd.available_rings ? uvd_version : 0;
> info->vce_fw_version =
> @@ -439,6 +448,7 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle
> dev,
> ib_align = MAX2(ib_align, vce.ib_start_alignment);
> ib_align = MAX2(ib_align, vcn_dec.ib_start_alignment);
> ib_align = MAX2(ib_align, vcn_enc.ib_start_alignment);
> +   ib_align = MAX2(ib_align, vcn_jpeg.ib_start_alignment);
> assert(ib_align);
> info->ib_start_alignment = ib_align;
>
> --
> 2.17.1
>


[Mesa-dev] [PATCH v2] anv: Allow presenting via a different GPU

2018-10-23 Thread Alex Smith
anv_GetPhysicalDeviceSurfaceSupportKHR will already return success for
this, but anv_GetPhysicalDevice{Xcb,Xlib}PresentationSupportKHR do not.
Apps which check for presentation support via the latter (all Feral
Vulkan games at least) will therefore fail.

This allows me to render on an Intel GPU and present to a display
connected to an AMD card (tested HD 530 + Vega 64).

v2: Rebase on current master.

Signed-off-by: Alex Smith 
---
 src/intel/vulkan/anv_wsi_x11.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/anv_wsi_x11.c b/src/intel/vulkan/anv_wsi_x11.c
index bfa76e88e7..7a27ceab64 100644
--- a/src/intel/vulkan/anv_wsi_x11.c
+++ b/src/intel/vulkan/anv_wsi_x11.c
@@ -41,7 +41,7 @@ VkBool32 anv_GetPhysicalDeviceXcbPresentationSupportKHR(
return wsi_get_physical_device_xcb_presentation_support(
   &device->wsi_device,
   queueFamilyIndex,
-  false,
+  true,
   connection, visual_id);
 }
 
@@ -56,7 +56,7 @@ VkBool32 anv_GetPhysicalDeviceXlibPresentationSupportKHR(
return wsi_get_physical_device_xcb_presentation_support(
   &device->wsi_device,
   queueFamilyIndex,
-  false,
+  true,
   XGetXCBConnection(dpy), visualID);
 }
 
-- 
2.14.4



[Mesa-dev] [PATCH] anv: Allow presenting via a different GPU

2018-10-18 Thread Alex Smith
anv_GetPhysicalDeviceSurfaceSupportKHR will already return success for
this, but anv_GetPhysicalDevice{Xcb,Xlib}PresentationSupportKHR do not.
Apps which check for presentation support via the latter (all Feral
Vulkan games at least) will therefore fail.

This allows me to render on an Intel GPU and present to a display
connected to an AMD card (tested HD 530 + Vega 64).

Signed-off-by: Alex Smith 
---
 src/intel/vulkan/anv_wsi_x11.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/anv_wsi_x11.c b/src/intel/vulkan/anv_wsi_x11.c
index 2feb5f1337..d23cedb316 100644
--- a/src/intel/vulkan/anv_wsi_x11.c
+++ b/src/intel/vulkan/anv_wsi_x11.c
@@ -42,7 +42,7 @@ VkBool32 anv_GetPhysicalDeviceXcbPresentationSupportKHR(
   &device->wsi_device,
   &device->instance->alloc,
   queueFamilyIndex,
-  device->local_fd, false,
+  device->local_fd, true,
   connection, visual_id);
 }
 
@@ -58,7 +58,7 @@ VkBool32 anv_GetPhysicalDeviceXlibPresentationSupportKHR(
   &device->wsi_device,
   &device->instance->alloc,
   queueFamilyIndex,
-  device->local_fd, false,
+  device->local_fd, true,
   XGetXCBConnection(dpy), visualID);
 }
 
-- 
2.14.4



Re: [Mesa-dev] [PATCH] radv: Add support for VK_KHR_driver_properties.

2018-10-17 Thread Alex Smith
This patch never landed in git, is that intentional?

On Mon, 1 Oct 2018 at 17:46, Jason Ekstrand  wrote:

> On Sun, Sep 30, 2018 at 1:04 PM Bas Nieuwenhuizen 
> wrote:
>
>> ---
>>  src/amd/vulkan/radv_device.c  | 27 +++
>>  src/amd/vulkan/radv_extensions.py |  1 +
>>  2 files changed, 28 insertions(+)
>>
>> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
>> index f7752eac83b..fe7e7f7f6ac 100644
>> --- a/src/amd/vulkan/radv_device.c
>> +++ b/src/amd/vulkan/radv_device.c
>> @@ -1196,6 +1196,33 @@ void radv_GetPhysicalDeviceProperties2(
>>
>> properties->conservativeRasterizationPostDepthCoverage = VK_FALSE;
>> break;
>> }
>> +   case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DRIVER_PROPERTIES_KHR: {
>> +   VkPhysicalDeviceDriverPropertiesKHR *driver_props =
>> +   (VkPhysicalDeviceDriverPropertiesKHR *) ext;
>> +
>> +   driver_props->driverID = VK_DRIVER_ID_MESA_RADV_KHR;
>> +   memset(driver_props->driverName, 0, VK_MAX_DRIVER_NAME_SIZE_KHR);
>> +   strcpy(driver_props->driverName, "radv");
>> +
>> +   memset(driver_props->driverInfo, 0, VK_MAX_DRIVER_INFO_SIZE_KHR);
>> +   snprintf(driver_props->driverInfo, VK_MAX_DRIVER_INFO_SIZE_KHR,
>> +   "Mesa " PACKAGE_VERSION
>> +#ifdef MESA_GIT_SHA1
>> +   " ("MESA_GIT_SHA1")"
>> +#endif
>> +   " (LLVM %i.%i.%i)",
>>
>
> I think %d is more customary, but I don't care.  Assuming you actually
> pass 1.1.0.2,
>
> Reviewed-by: Jason Ekstrand 
>
>
>> +(HAVE_LLVM >> 8) & 0xff, HAVE_LLVM & 0xff,
>> +MESA_LLVM_VERSION_PATCH);
>> +
>> +   driver_props->conformanceVersion = (VkConformanceVersionKHR) {
>> +   .major = 1,
>> +   .minor = 1,
>> +   .subminor = 0,
>> +   .patch = 2,
>> +   };
>> +   break;
>> +   }
>> +
>> default:
>> break;
>> }
>> diff --git a/src/amd/vulkan/radv_extensions.py
>> b/src/amd/vulkan/radv_extensions.py
>> index 584926df390..8df5da76ed5 100644
>> --- a/src/amd/vulkan/radv_extensions.py
>> +++ b/src/amd/vulkan/radv_extensions.py
>> @@ -59,6 +59,7 @@ EXTENSIONS = [
>>  Extension('VK_KHR_device_group',  1, True),
>>  Extension('VK_KHR_device_group_creation', 1, True),
>>  Extension('VK_KHR_draw_indirect_count',   1, True),
>> +Extension('VK_KHR_driver_properties', 1, True),
>>  Extension('VK_KHR_external_fence',1,
>> 'device->rad_info.has_syncobj_wait_for_submit'),
>>  Extension('VK_KHR_external_fence_capabilities',   1, True),
>>  Extension('VK_KHR_external_fence_fd', 1,
>> 'device->rad_info.has_syncobj_wait_for_submit'),
>> --
>> 2.19.0
>>


[Mesa-dev] [PATCH] ac/nir: Use context-specific LLVM types

2018-10-15 Thread Alex Smith
LLVMInt*Type() return types from the global context and therefore are
not safe for use in other contexts. Use types from our own context
instead.

Fixes frequent crashes seen when doing multithreaded pipeline creation.

Fixes: 4d0b02bb5a "ac: add support for 16bit load_push_constant"
Fixes: 7e7ee82698 "ac: add support for 16bit buffer loads"
Cc: "18.2" 
Signed-off-by: Alex Smith 
---
 src/amd/common/ac_nir_to_llvm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index e0a8e04cf3..402cf2d665 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1387,7 +1387,7 @@ static LLVMValueRef visit_load_push_constant(struct ac_nir_context *ctx,
 
 	if (instr->dest.ssa.bit_size == 16) {
 		unsigned load_dwords = instr->dest.ssa.num_components / 2 + 1;
-		LLVMTypeRef vec_type = LLVMVectorType(LLVMInt16Type(), 2 * load_dwords);
+		LLVMTypeRef vec_type = LLVMVectorType(LLVMInt16TypeInContext(ctx->ac.context), 2 * load_dwords);
 		ptr = ac_cast_ptr(&ctx->ac, ptr, vec_type);
 		LLVMValueRef res = LLVMBuildLoad(ctx->ac.builder, ptr, "");
 		res = LLVMBuildBitCast(ctx->ac.builder, res, vec_type, "");
@@ -1671,7 +1671,7 @@ static LLVMValueRef visit_load_buffer(struct ac_nir_context *ctx,
 		};
 		results[idx] = ac_build_intrinsic(&ctx->ac, load_name, data_type, params, 5, 0);
 		unsigned num_elems = ac_get_type_size(data_type) / elem_size_bytes;
-		LLVMTypeRef resTy = LLVMVectorType(LLVMIntType(instr->dest.ssa.bit_size), num_elems);
+		LLVMTypeRef resTy = LLVMVectorType(LLVMIntTypeInContext(ctx->ac.context, instr->dest.ssa.bit_size), num_elems);
 		results[idx] = LLVMBuildBitCast(ctx->ac.builder, results[idx], resTy, "");
 	}
 }
-- 
2.14.4



Re: [Mesa-dev] [PATCH] nir/alu_to_scalar: Use ssa_for_alu_src in hand-rolled expansions

2018-10-04 Thread Alex Smith
Fixes the garbage output I was seeing with bloom enabled. Thanks!

Tested-by: Alex Smith 

On Wed, 3 Oct 2018 at 20:16, Jason Ekstrand  wrote:

> The ssa_for_alu_src helper will correctly handle swizzles and other
> source modifiers for you.  The expansions for unpack_half_2x16,
> pack_uvec2_to_uint, and pack_uvec4_to_uint were all broken with regards
> to swizzles.  The brokenness of unpack_half_2x16 was causing rendering
> errors in Rise of the Tomb Raider on Intel ever since c11833ab24dcba26
> which added an extra copy propagation to the optimization pipeline and
> caused us to start seeing swizzles where we hadn't seen any before.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926
> Fixes: 9ce901058f3d "nir: Add lowering of nir_op_unpack_half_2x16."
> Fixes: 9b8786eba955 "nir: Add lowering support for packing opcodes."
> Cc: Alex Smith 
> Cc: Matt Turner 
> ---
>  src/compiler/nir/nir_lower_alu_to_scalar.c | 33 --
>  1 file changed, 18 insertions(+), 15 deletions(-)
>
> diff --git a/src/compiler/nir/nir_lower_alu_to_scalar.c
> b/src/compiler/nir/nir_lower_alu_to_scalar.c
> index 742c8d8ee66..0be3aba9456 100644
> --- a/src/compiler/nir/nir_lower_alu_to_scalar.c
> +++ b/src/compiler/nir/nir_lower_alu_to_scalar.c
> @@ -107,11 +107,11 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
>if (!b->shader->options->lower_pack_half_2x16)
>   return false;
>
> +  nir_ssa_def *src_vec2 = nir_ssa_for_alu_src(b, instr, 0);
> +
>nir_ssa_def *val =
> - nir_pack_half_2x16_split(b, nir_channel(b, instr->src[0].src.ssa,
> -                                         instr->src[0].swizzle[0]),
> -                          nir_channel(b, instr->src[0].src.ssa,
> -                                      instr->src[0].swizzle[1]));
> + nir_pack_half_2x16_split(b, nir_channel(b, src_vec2, 0),
> + nir_channel(b, src_vec2, 1));
>
>nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa,
> nir_src_for_ssa(val));
>nir_instr_remove(>instr);
> @@ -130,9 +130,11 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
>if (!b->shader->options->lower_unpack_half_2x16)
>   return false;
>
> +  nir_ssa_def *packed = nir_ssa_for_alu_src(b, instr, 0);
> +
>nir_ssa_def *comps[2];
> -  comps[0] = nir_unpack_half_2x16_split_x(b, instr->src[0].src.ssa);
> -  comps[1] = nir_unpack_half_2x16_split_y(b, instr->src[0].src.ssa);
> +  comps[0] = nir_unpack_half_2x16_split_x(b, packed);
> +  comps[1] = nir_unpack_half_2x16_split_y(b, packed);
>nir_ssa_def *vec = nir_vec(b, comps, 2);
>
>nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa,
> nir_src_for_ssa(vec));
> @@ -144,8 +146,8 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
>assert(b->shader->options->lower_pack_snorm_2x16 ||
>   b->shader->options->lower_pack_unorm_2x16);
>
> -  nir_ssa_def *word =
> - nir_extract_u16(b, instr->src[0].src.ssa, nir_imm_int(b, 0));
> +  nir_ssa_def *word = nir_extract_u16(b, nir_ssa_for_alu_src(b, instr, 0),
> +  nir_imm_int(b, 0));
>nir_ssa_def *val =
>   nir_ior(b, nir_ishl(b, nir_channel(b, word, 1), nir_imm_int(b,
> 16)),
>  nir_channel(b, word, 0));
> @@ -159,8 +161,8 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
>assert(b->shader->options->lower_pack_snorm_4x8 ||
>   b->shader->options->lower_pack_unorm_4x8);
>
> -  nir_ssa_def *byte =
> - nir_extract_u8(b, instr->src[0].src.ssa, nir_imm_int(b, 0));
> +  nir_ssa_def *byte = nir_extract_u8(b, nir_ssa_for_alu_src(b, instr, 0),
> +  nir_imm_int(b, 0));
>nir_ssa_def *val =
>   nir_ior(b, nir_ior(b, nir_ishl(b, nir_channel(b, byte, 3),
> nir_imm_int(b, 24)),
> nir_ishl(b, nir_channel(b, byte, 2),
> nir_imm_int(b, 16))),
> @@ -173,14 +175,15 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
> }
>
> case nir_op_fdph: {
> +  nir_ssa_def *src0_vec = nir_ssa_for_alu_src(b, instr, 0);
> +  nir_ssa_def *src1_vec = nir_ssa_for_alu_src(b, instr, 1);
> +
>nir_ssa_def *sum[4];
>for (unsigned i = 0; i < 3; i++) {
> - sum[i] = nir_fmul(b, nir_channel(b, instr->src[0].src.ssa,
> -  instr->src[0].swizzle[i]),
> -  nir_channel(b, instr->src[1

Re: [Mesa-dev] [PATCH] anv/batch_chain: Don't start a new BO just for BATCH_BUFFER_START

2018-10-03 Thread Alex Smith
Fixes a crash I see on Broadwell.

Tested-by: Alex Smith 

Cc to stable for 18.2? The crash is reproducible there as well.

On Tue, 2 Oct 2018 at 23:25, Jason Ekstrand  wrote:

> Previously, we just went ahead and emitted MI_BATCH_BUFFER_START as
> normal.  If we are near enough to the end, this can cause us to start a
> new BO just for the MI_BATCH_BUFFER_START which messes up chaining.  We
> always reserve enough space at the end for an MI_BATCH_BUFFER_START so
> we can just increment cmd_buffer->batch.end prior to emitting the
> command.
>
> Fixes: a0b133286a3 "anv/batch_chain: Simplify secondary batch return..."
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926
> ---
>  src/intel/vulkan/anv_batch_chain.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/src/intel/vulkan/anv_batch_chain.c
> b/src/intel/vulkan/anv_batch_chain.c
> index 3e13553ac18..e08e07ad7bd 100644
> --- a/src/intel/vulkan/anv_batch_chain.c
> +++ b/src/intel/vulkan/anv_batch_chain.c
> @@ -894,8 +894,17 @@ anv_cmd_buffer_end_batch_buffer(struct anv_cmd_buffer *cmd_buffer)
>* It doesn't matter where it points now so long as has a valid
>* relocation.  We'll adjust it later as part of the chaining
>* process.
> +  *
> +  *
> +  * We set the end of the batch a little short so we would be sure we
> +  * have room for the chaining command.  Since we're about to emit the
>*/
> + cmd_buffer->batch.end += GEN8_MI_BATCH_BUFFER_START_length * 4;
> + assert(cmd_buffer->batch.start == batch_bo->bo.map);
> + assert(cmd_buffer->batch.end == batch_bo->bo.map + batch_bo->bo.size);
> +
>   emit_batch_buffer_start(cmd_buffer, &batch_bo->bo, 0);
> + assert(cmd_buffer->batch.start == batch_bo->bo.map);
>} else {
>   cmd_buffer->exec_mode = ANV_CMD_BUFFER_EXEC_MODE_COPY_AND_CHAIN;
>}
> --
> 2.17.1
>


Re: [Mesa-dev] [PATCH] radv: use radeon_info::name

2018-08-17 Thread Alex Smith
FWIW, putting "RADV" inside the brackets (e.g. " (RADV, LLVM ...)") would
still work for us.

On 17 August 2018 at 10:38, Samuel Pitoiset 
wrote:

> Yeah, ignore this patch.
>
> On 8/17/18 11:25 AM, Alex Smith wrote:
>
>> All of our Vulkan games rely on the presence of "RADV" somewhere in the
>> device name string to distinguish between RADV and AMDVLK/GPU-PRO, and
>> that's used to check whether the driver version is supported, whether to
>> enable bug workarounds, etc. This will certainly break that.
>>
>> Thanks,
>> Alex
>>
>> On 17 August 2018 at 10:00, Samuel Pitoiset wrote:
>>
>> Signed-off-by: Samuel Pitoiset
>>
>> ---
>>   src/amd/vulkan/radv_device.c | 33 +++--
>>   1 file changed, 3 insertions(+), 30 deletions(-)
>>
>> diff --git a/src/amd/vulkan/radv_device.c
>> b/src/amd/vulkan/radv_device.c
>> index cc88abb57a..e11005a1f8 100644
>> --- a/src/amd/vulkan/radv_device.c
>> +++ b/src/amd/vulkan/radv_device.c
>> @@ -77,41 +77,14 @@ radv_get_device_uuid(struct radeon_info *info, void *uuid)
>>   }
>>
>>   static void
>> -radv_get_device_name(enum radeon_family family, char *name, size_t name_len)
>> +radv_get_device_name(struct radeon_info *info, char *name, size_t name_len)
>>   {
>> -   const char *chip_string;
>>  char llvm_string[32] = {};
>>
>> -   switch (family) {
>> -   case CHIP_TAHITI: chip_string = "AMD RADV TAHITI"; break;
>> -   case CHIP_PITCAIRN: chip_string = "AMD RADV PITCAIRN"; break;
>> -   case CHIP_VERDE: chip_string = "AMD RADV CAPE VERDE"; break;
>> -   case CHIP_OLAND: chip_string = "AMD RADV OLAND"; break;
>> -   case CHIP_HAINAN: chip_string = "AMD RADV HAINAN"; break;
>> -   case CHIP_BONAIRE: chip_string = "AMD RADV BONAIRE"; break;
>> -   case CHIP_KAVERI: chip_string = "AMD RADV KAVERI"; break;
>> -   case CHIP_KABINI: chip_string = "AMD RADV KABINI"; break;
>> -   case CHIP_HAWAII: chip_string = "AMD RADV HAWAII"; break;
>> -   case CHIP_MULLINS: chip_string = "AMD RADV MULLINS"; break;
>> -   case CHIP_TONGA: chip_string = "AMD RADV TONGA"; break;
>> -   case CHIP_ICELAND: chip_string = "AMD RADV ICELAND"; break;
>> -   case CHIP_CARRIZO: chip_string = "AMD RADV CARRIZO"; break;
>> -   case CHIP_FIJI: chip_string = "AMD RADV FIJI"; break;
>> -   case CHIP_POLARIS10: chip_string = "AMD RADV POLARIS10"; break;
>> -   case CHIP_POLARIS11: chip_string = "AMD RADV POLARIS11"; break;
>> -   case CHIP_POLARIS12: chip_string = "AMD RADV POLARIS12"; break;
>> -   case CHIP_STONEY: chip_string = "AMD RADV STONEY"; break;
>> -   case CHIP_VEGAM: chip_string = "AMD RADV VEGA M"; break;
>> -   case CHIP_VEGA10: chip_string = "AMD RADV VEGA10"; break;
>> -   case CHIP_VEGA12: chip_string = "AMD RADV VEGA12"; break;
>> -   case CHIP_RAVEN: chip_string = "AMD RADV RAVEN"; break;
>> -   default: chip_string = "AMD RADV unknown"; break;
>> -   }
>> -
>>  snprintf(llvm_string, sizeof(llvm_string),
>>   " (LLVM %i.%i.%i)", (HAVE_LLVM >> 8) & 0xff,
>>   HAVE_LLVM & 0xff, MESA_LLVM_VERSION_PATCH);
>> -   snprintf(name, name_len, "%s%s", chip_string, llvm_string);
>> +   snprintf(name, name_len, "%s%s", info->name, llvm_string);
>>   }
>>
>>   static void
>> @@ -297,7 +270,7 @@ radv_physical_device_init(struct radv_physical_device *device,
>>
>>  radv_handle_env_var_force_family(device);
>>
>> -   radv_get_device_name(device->rad_info.family, device->name, sizeof(device->name));
>> +   radv_get_device_name(&device->rad_info, device->name, sizeof(device->name));
>>
>>  if (radv_device_get_cache_uuid(device->rad_info.family, device->cache_uuid)) {
>>  device->ws->destroy(device->ws);
>> -- 2.18.0
>>


Re: [Mesa-dev] [PATCH] radv: use radeon_info::name

2018-08-17 Thread Alex Smith
All of our Vulkan games rely on the presence of "RADV" somewhere in the
device name string to distinguish between RADV and AMDVLK/GPU-PRO, and
that's used to check whether the driver version is supported, whether to
enable bug workarounds, etc. This will certainly break that.

Thanks,
Alex

On 17 August 2018 at 10:00, Samuel Pitoiset 
wrote:

> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_device.c | 33 +++--
>  1 file changed, 3 insertions(+), 30 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index cc88abb57a..e11005a1f8 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -77,41 +77,14 @@ radv_get_device_uuid(struct radeon_info *info, void *uuid)
>  }
>
>  static void
> -radv_get_device_name(enum radeon_family family, char *name, size_t name_len)
> +radv_get_device_name(struct radeon_info *info, char *name, size_t name_len)
>  {
> -   const char *chip_string;
> char llvm_string[32] = {};
>
> -   switch (family) {
> -   case CHIP_TAHITI: chip_string = "AMD RADV TAHITI"; break;
> -   case CHIP_PITCAIRN: chip_string = "AMD RADV PITCAIRN"; break;
> -   case CHIP_VERDE: chip_string = "AMD RADV CAPE VERDE"; break;
> -   case CHIP_OLAND: chip_string = "AMD RADV OLAND"; break;
> -   case CHIP_HAINAN: chip_string = "AMD RADV HAINAN"; break;
> -   case CHIP_BONAIRE: chip_string = "AMD RADV BONAIRE"; break;
> -   case CHIP_KAVERI: chip_string = "AMD RADV KAVERI"; break;
> -   case CHIP_KABINI: chip_string = "AMD RADV KABINI"; break;
> -   case CHIP_HAWAII: chip_string = "AMD RADV HAWAII"; break;
> -   case CHIP_MULLINS: chip_string = "AMD RADV MULLINS"; break;
> -   case CHIP_TONGA: chip_string = "AMD RADV TONGA"; break;
> -   case CHIP_ICELAND: chip_string = "AMD RADV ICELAND"; break;
> -   case CHIP_CARRIZO: chip_string = "AMD RADV CARRIZO"; break;
> -   case CHIP_FIJI: chip_string = "AMD RADV FIJI"; break;
> -   case CHIP_POLARIS10: chip_string = "AMD RADV POLARIS10"; break;
> -   case CHIP_POLARIS11: chip_string = "AMD RADV POLARIS11"; break;
> -   case CHIP_POLARIS12: chip_string = "AMD RADV POLARIS12"; break;
> -   case CHIP_STONEY: chip_string = "AMD RADV STONEY"; break;
> -   case CHIP_VEGAM: chip_string = "AMD RADV VEGA M"; break;
> -   case CHIP_VEGA10: chip_string = "AMD RADV VEGA10"; break;
> -   case CHIP_VEGA12: chip_string = "AMD RADV VEGA12"; break;
> -   case CHIP_RAVEN: chip_string = "AMD RADV RAVEN"; break;
> -   default: chip_string = "AMD RADV unknown"; break;
> -   }
> -
> snprintf(llvm_string, sizeof(llvm_string),
>  " (LLVM %i.%i.%i)", (HAVE_LLVM >> 8) & 0xff,
>  HAVE_LLVM & 0xff, MESA_LLVM_VERSION_PATCH);
> -   snprintf(name, name_len, "%s%s", chip_string, llvm_string);
> +   snprintf(name, name_len, "%s%s", info->name, llvm_string);
>  }
>
>  static void
> @@ -297,7 +270,7 @@ radv_physical_device_init(struct radv_physical_device *device,
>
> radv_handle_env_var_force_family(device);
>
> -   radv_get_device_name(device->rad_info.family, device->name, sizeof(device->name));
> +   radv_get_device_name(&device->rad_info, device->name, sizeof(device->name));
>
> if (radv_device_get_cache_uuid(device->rad_info.family, device->cache_uuid)) {
> device->ws->destroy(device->ws);
> --
> 2.18.0
>


[Mesa-dev] [PATCH v2] anv: Pay attention to VK_ACCESS_MEMORY_(READ|WRITE)_BIT

2018-07-23 Thread Alex Smith
According to the spec, these should apply to all read/write access
types (so would be equivalent to specifying all other access types
individually). Currently, they were doing nothing.

v2: Handle VK_ACCESS_MEMORY_WRITE_BIT in dstAccessMask.

Signed-off-by: Alex Smith 
Cc: mesa-sta...@lists.freedesktop.org
---
 src/intel/vulkan/anv_private.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index cec2842792..1660fcbbc8 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1731,6 +1731,9 @@ anv_pipe_flush_bits_for_access_flags(VkAccessFlags flags)
  pipe_bits |= ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT;
  pipe_bits |= ANV_PIPE_DEPTH_CACHE_FLUSH_BIT;
  break;
+  case VK_ACCESS_MEMORY_WRITE_BIT:
+ pipe_bits |= ANV_PIPE_FLUSH_BITS;
+ break;
   default:
  break; /* Nothing to do */
   }
@@ -1761,6 +1764,12 @@ anv_pipe_invalidate_bits_for_access_flags(VkAccessFlags flags)
   case VK_ACCESS_TRANSFER_READ_BIT:
  pipe_bits |= ANV_PIPE_TEXTURE_CACHE_INVALIDATE_BIT;
  break;
+  case VK_ACCESS_MEMORY_READ_BIT:
+ pipe_bits |= ANV_PIPE_INVALIDATE_BITS;
+ break;
+  case VK_ACCESS_MEMORY_WRITE_BIT:
+ pipe_bits |= ANV_PIPE_FLUSH_BITS;
+ break;
   default:
  break; /* Nothing to do */
   }
-- 
2.14.3



Re: [Mesa-dev] [PATCH] anv: Pay attention to VK_ACCESS_MEMORY_(READ|WRITE)_BIT

2018-07-23 Thread Alex Smith
On 20 July 2018 at 19:01, Jason Ekstrand  wrote:

> On Fri, Jul 20, 2018 at 8:37 AM Lionel Landwerlin <
> lionel.g.landwer...@intel.com> wrote:
>
>> On 20/07/18 11:44, Alex Smith wrote:
>>
>> According to the spec, these should apply to all read/write access
>> types (so would be equivalent to specifying all other access types
>> individually). Currently, they were doing nothing.
>>
>> Signed-off-by: Alex Smith  
>> 
>> Cc: mesa-sta...@lists.freedesktop.org
>> ---
>>  src/intel/vulkan/anv_private.h | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
>> index cec2842792..775bacaff2 100644
>> --- a/src/intel/vulkan/anv_private.h
>> +++ b/src/intel/vulkan/anv_private.h
>> @@ -1731,6 +1731,9 @@ anv_pipe_flush_bits_for_access_flags(VkAccessFlags flags)
>>   pipe_bits |= ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT;
>>   pipe_bits |= ANV_PIPE_DEPTH_CACHE_FLUSH_BIT;
>>   break;
>> +  case VK_ACCESS_MEMORY_WRITE_BIT:
>> + pipe_bits |= ANV_PIPE_FLUSH_BITS;
>> + break;
>>default:
>>   break; /* Nothing to do */
>>}
>> @@ -1761,6 +1764,9 @@ anv_pipe_invalidate_bits_for_access_flags(VkAccessFlags flags)
>>case VK_ACCESS_TRANSFER_READ_BIT:
>>   pipe_bits |= ANV_PIPE_TEXTURE_CACHE_INVALIDATE_BIT;
>>   break;
>> +  case VK_ACCESS_MEMORY_READ_BIT:
>> + pipe_bits |= ANV_PIPE_INVALIDATE_BITS;
>> + break;
>>
>>
>> I know this function is a bit oddly named for that, but with this part of
>> the spec regarding VK_ACCESS_MEMORY_WRITE_BIT :
>>
>> "When included in a destination access mask, makes all available
>> writes visible to all future write accesses on entities known to the
>> Vulkan device."
>>
>> I would also add :
>>
>> case VK_ACCESS_MEMORY_WRITE_BIT:
>> pipe_bits |= ANV_PIPE_FLUSH_BITS;
>> break;
>>
>> Does that sound fair?
>>
>
> That's quite the heavy hammer. But I think it's the right thing to do.
>

Yes - these bits are supposed to be the heaviest hammer there is. I'll add
that.

Alex


[Mesa-dev] [PATCH] anv: Pay attention to VK_ACCESS_MEMORY_(READ|WRITE)_BIT

2018-07-20 Thread Alex Smith
According to the spec, these should apply to all read/write access
types (so would be equivalent to specifying all other access types
individually). Currently, they were doing nothing.

Signed-off-by: Alex Smith 
Cc: mesa-sta...@lists.freedesktop.org
---
 src/intel/vulkan/anv_private.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index cec2842792..775bacaff2 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1731,6 +1731,9 @@ anv_pipe_flush_bits_for_access_flags(VkAccessFlags flags)
  pipe_bits |= ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT;
  pipe_bits |= ANV_PIPE_DEPTH_CACHE_FLUSH_BIT;
  break;
+  case VK_ACCESS_MEMORY_WRITE_BIT:
+ pipe_bits |= ANV_PIPE_FLUSH_BITS;
+ break;
   default:
  break; /* Nothing to do */
   }
@@ -1761,6 +1764,9 @@ anv_pipe_invalidate_bits_for_access_flags(VkAccessFlags flags)
   case VK_ACCESS_TRANSFER_READ_BIT:
  pipe_bits |= ANV_PIPE_TEXTURE_CACHE_INVALIDATE_BIT;
  break;
+  case VK_ACCESS_MEMORY_READ_BIT:
+ pipe_bits |= ANV_PIPE_INVALIDATE_BITS;
+ break;
   default:
  break; /* Nothing to do */
   }
-- 
2.14.3



Re: [Mesa-dev] [PATCH] radv: optimize vkCmd{Set, Reset}Event() a little bit

2018-06-29 Thread Alex Smith
FWIW none of our released Vulkan games will be using these functions.

On 29 June 2018 at 03:28, Dieter Nützel  wrote:

> Tested-by: Dieter Nützel 
>
> on RX 580 with F1 2017.
>
> Dieter
>
>
> Am 28.06.2018 12:21, schrieb Samuel Pitoiset:
>
>> Always emitting a bottom-of-pipe event is quite dumb. Instead,
>> start to optimize these functions by syncing PFP for the
>> top-of-pipe and syncing ME for the post-index-fetch event.
>>
>> This can still be improved by emitting EOS events for
>> syncing PS and CS stages.
>>
>> Signed-off-by: Samuel Pitoiset 
>> ---
>>  src/amd/vulkan/radv_cmd_buffer.c | 46 ++--
>>  1 file changed, 38 insertions(+), 8 deletions(-)
>>
>> diff --git a/src/amd/vulkan/radv_cmd_buffer.c
>> b/src/amd/vulkan/radv_cmd_buffer.c
>> index 074e9c4c7f..17385aace1 100644
>> --- a/src/amd/vulkan/radv_cmd_buffer.c
>> +++ b/src/amd/vulkan/radv_cmd_buffer.c
>> @@ -4275,14 +4275,44 @@ static void write_event(struct radv_cmd_buffer *cmd_buffer,
>>
>> MAYBE_UNUSED unsigned cdw_max = radeon_check_space(cmd_buffer->device->ws, cs, 18);
>>
>> -   /* TODO: this is overkill. Probably should figure something out from
>> -* the stage mask. */
>> -
>> -   si_cs_emit_write_event_eop(cs,
>> -  cmd_buffer->device->physical_device->rad_info.chip_class,
>> -  radv_cmd_buffer_uses_mec(cmd_buffer),
>> -  V_028A90_BOTTOM_OF_PIPE_TS, 0,
>> -  EOP_DATA_SEL_VALUE_32BIT, va, 2, value);
>> +   /* Flags that only require a top-of-pipe event. */
>> +   static const VkPipelineStageFlags top_of_pipe_flags =
>> +   VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
>> +
>> +   /* Flags that only require a post-index-fetch event. */
>> +   static const VkPipelineStageFlags post_index_fetch_flags =
>> +   top_of_pipe_flags |
>> +   VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT |
>> +   VK_PIPELINE_STAGE_VERTEX_INPUT_BIT;
>> +
>> +   /* TODO: Emit EOS events for syncing PS/CS stages. */
>> +
>> +   if (!(stageMask & ~top_of_pipe_flags)) {
>> +   /* Just need to sync the PFP engine. */
>> +   radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
>> +   radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
>> +   S_370_WR_CONFIRM(1) |
>> +   S_370_ENGINE_SEL(V_370_PFP));
>> +   radeon_emit(cs, va);
>> +   radeon_emit(cs, va >> 32);
>> +   radeon_emit(cs, value);
>> +   } else if (!(stageMask & ~post_index_fetch_flags)) {
>> +   /* Sync ME because PFP reads index and indirect buffers. */
>> +   radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
>> +   radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
>> +   S_370_WR_CONFIRM(1) |
>> +   S_370_ENGINE_SEL(V_370_ME));
>> +   radeon_emit(cs, va);
>> +   radeon_emit(cs, va >> 32);
>> +   radeon_emit(cs, value);
>> +   } else {
>> +   /* Otherwise, sync all prior GPU work using an EOP event. */
>> +   si_cs_emit_write_event_eop(cs,
>> +  cmd_buffer->device->physical_device->rad_info.chip_class,
>> +  radv_cmd_buffer_uses_mec(cmd_buffer),
>> +  V_028A90_BOTTOM_OF_PIPE_TS, 0,
>> +  EOP_DATA_SEL_VALUE_32BIT, va, 2, value);
>> +   }
>>
>> assert(cmd_buffer->cs->cdw <= cdw_max);
>>  }
>>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 11/11] ac/radv: using tls to store llvm related info and speed up compiles (v3)

2018-06-28 Thread Alex Smith
Hi Dave,

I did a quick test with this on Rise of the Tomb Raider. It reduced the
time taken to create all pipelines for the whole game over 8 threads (with
RADV_DEBUG=nocache) from 12m24s to 11m35s. Nice improvement :)

Also didn't see any issues, so:

Tested-by: Alex Smith 

Thanks,
Alex

On 27 June 2018 at 04:58, Dave Airlie  wrote:

> From: Dave Airlie 
>
> I'd like to encourage people to test this to see if it helps (like
> does it make app startup better or less hitching in dxvk).
>
> The basic idea is to store a bunch of LLVM related data structs
> in thread local storage so we can avoid reiniting them every time
> we compile a shader. Since we know llvm objects aren't thread safe
> it has to be stored using TLS to avoid any collisions.
>
> This should remove all the fixed setup overhead of creating
> the pass manager each time.
>
> This takes a demo app time to compile the radv meta shaders on nocache
> and exit from 1.7s to 1s.
>
> TODO: this doesn't work for radeonsi yet, but I'm not sure how TLS
> works if you have radeonsi and radv loaded at the same time, if
> they'll magically try and use the same tls stuff, in which case
> this might explode all over the place.
>
> v2: fix llvm6 build, inline emit function, handle multiple targets
> in one thread
> v3: rebase and port onto new structure
> ---
>  src/amd/common/ac_llvm_helper.cpp | 120 --
>  src/amd/common/ac_llvm_util.c |  10 +--
>  src/amd/common/ac_llvm_util.h |   9 +++
>  src/amd/vulkan/radv_debug.h   |   1 +
>  src/amd/vulkan/radv_device.c  |   1 +
>  src/amd/vulkan/radv_shader.c  |   2 +
>  6 files changed, 132 insertions(+), 11 deletions(-)
>
> diff --git a/src/amd/common/ac_llvm_helper.cpp b/src/amd/common/ac_llvm_helper.cpp
> index 27403dbe085..f1f1399b3fb 100644
> --- a/src/amd/common/ac_llvm_helper.cpp
> +++ b/src/amd/common/ac_llvm_helper.cpp
> @@ -31,12 +31,21 @@
>
>  #include "ac_llvm_util.h"
>  #include 
> -#include 
> -#include 
> -#include 
> -#include 
> +#include 
>  #include 
>  #include 
> +#include 
> +
> +#include 
> +#include 
> +#if HAVE_LLVM >= 0x0700
> +#include 
> +#endif
> +
> +#if HAVE_LLVM < 0x0700
> +#include "llvm/Support/raw_ostream.h"
> +#endif
> +#include 
>
>  void ac_add_attr_dereferenceable(LLVMValueRef val, uint64_t bytes)
>  {
> @@ -101,11 +110,110 @@ ac_dispose_target_library_info(LLVMTargetLibraryInfoRef
> library_info)
> delete reinterpret_cast<llvm::TargetLibraryInfoImpl *>(library_info);
>  }
>
> +class ac_llvm_per_thread_info {
> +public:
> +   ac_llvm_per_thread_info(enum radeon_family arg_family,
> +   enum ac_target_machine_options
> arg_tm_options)
> +   : family(arg_family), tm_options(arg_tm_options),
> + OStream(CodeString) {}
> +   ~ac_llvm_per_thread_info() {
> +   ac_llvm_compiler_dispose_internal(&llvm_info);
> +   }
> +
> +   struct ac_llvm_compiler_info llvm_info;
> +   enum radeon_family family;
> +   enum ac_target_machine_options tm_options;
> +   llvm::SmallString<0> CodeString;
> +   llvm::raw_svector_ostream OStream;
> +   llvm::legacy::PassManager pass;
> +};
> +
> +/* we have to store a linked list per thread due to the possibility of
> multiple gpus being required */
> +static thread_local std::list<ac_llvm_per_thread_info> ac_llvm_per_thread_list;
> +
>  bool ac_compile_to_memory_buffer(struct ac_llvm_compiler_info *info,
>  LLVMModuleRef M,
>  char **ErrorMessage,
>  LLVMMemoryBufferRef *OutMemBuf)
>  {
> -   return LLVMTargetMachineEmitToMemoryBuffer(info->tm, M,
> LLVMObjectFile,
> -  ErrorMessage,
> OutMemBuf);
> +   ac_llvm_per_thread_info *thread_info = nullptr;
> +   if (info->thread_stored) {
> +   for (auto &I : ac_llvm_per_thread_list) {
> +   if (I.llvm_info.tm == info->tm) {
> +   thread_info = &I;
> +   break;
> +   }
> +   }
> +
> +   if (!thread_info) {
> +   assert(0);
> +   return false;
> +   }
> +   } else {
> +   return LLVMTargetMachineEmitToMemoryBuffer(info->tm, M,
> LLVMObjectFile,
> +  ErrorMessage,
> OutMemBuf);
> +   }
> +
> +   llvm::TargetMachine *TM = reinterpret_cast<llvm::TargetMachine *>(thread_info->

Re: [Mesa-dev] [PATCH mesa] radv: fix reported number of available VGPRs

2018-06-18 Thread Alex Smith
Reviewed-by: Alex Smith 

On 15 June 2018 at 17:52, Eric Engestrom  wrote:

> It's a bit late to round up after an integer division.
>
> Fixes: de889794134e6245e08a2 "radv: Implement VK_AMD_shader_info"
> Cc: Alex Smith 
> Signed-off-by: Eric Engestrom 
> ---
>  src/amd/vulkan/radv_shader.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
> index 76790a19047a86abdad5..b31eb9bfda5e9e29e115 100644
> --- a/src/amd/vulkan/radv_shader.c
> +++ b/src/amd/vulkan/radv_shader.c
> @@ -808,7 +808,7 @@ radv_GetShaderInfoAMD(VkDevice _device,
> unsigned workgroup_size = local_size[0] * local_size[1] * local_size[2];
>
> statistics.numAvailableVgprs = statistics.numPhysicalVgprs /
> -   ceil(workgroup_size / statistics.numPhysicalVgprs);
> +   ceil((double)workgroup_size / statistics.numPhysicalVgprs);
>
> statistics.computeWorkGroupSize[0] = local_size[0];
> statistics.computeWorkGroupSize[1] = local_size[1];
> --
> Cheers,
>   Eric
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/1] vulkan/wsi: Destroy swapchain images after terminating FIFO queues

2018-06-08 Thread Alex Smith
Thanks, I've pushed it.

On 8 June 2018 at 10:38, Lionel Landwerlin 
wrote:

> Sorry for missing that.
>
> Fixes: e73d136a023080 ("vulkan/wsi/x11: Implement FIFO mode.")
> Reviewed-by: Lionel Landwerlin 
>
>
> On 01/06/18 12:16, Cameron Kumar wrote:
>
>> The queue_manager thread can access the images from x11_present_to_x11,
>> hence this reorder prevents dereferencing of dangling pointers.
>>
>> Cc: "18.1" 
>> ---
>>   src/vulkan/wsi/wsi_common_x11.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/vulkan/wsi/wsi_common_x11.c b/src/vulkan/wsi/wsi_common_x11.c
>> index 1bfbc7c300..20d7cf5a2c 100644
>> --- a/src/vulkan/wsi/wsi_common_x11.c
>> +++ b/src/vulkan/wsi/wsi_common_x11.c
>> @@ -1235,9 +1235,6 @@ x11_swapchain_destroy(struct wsi_swapchain
>> *anv_chain,
>>  struct x11_swapchain *chain = (struct x11_swapchain *)anv_chain;
>>  xcb_void_cookie_t cookie;
>>   -   for (uint32_t i = 0; i < chain->base.image_count; i++)
>> -  x11_image_finish(chain, pAllocator, &chain->images[i]);
>> -
>>  if (chain->threaded) {
>> chain->status = VK_ERROR_OUT_OF_DATE_KHR;
>> /* Push a UINT32_MAX to wake up the manager */
>> @@ -1247,6 +1244,9 @@ x11_swapchain_destroy(struct wsi_swapchain
>> *anv_chain,
>> wsi_queue_destroy(&chain->present_queue);
>>  }
>>   +   for (uint32_t i = 0; i < chain->base.image_count; i++)
>> +  x11_image_finish(chain, pAllocator, &chain->images[i]);
>> +
>>  xcb_unregister_for_special_event(chain->conn, chain->special_event);
>>  cookie = xcb_present_select_input_checked(chain->conn,
>> chain->event_id,
>>chain->window,
>>
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/1] vulkan/wsi: Destroy swapchain images after terminating FIFO queues

2018-06-08 Thread Alex Smith
Any feedback on this?

On 1 June 2018 at 12:16, Cameron Kumar  wrote:

> The queue_manager thread can access the images from x11_present_to_x11,
> hence this reorder prevents dereferencing of dangling pointers.
>
> Cc: "18.1" 
> ---
>  src/vulkan/wsi/wsi_common_x11.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/vulkan/wsi/wsi_common_x11.c b/src/vulkan/wsi/wsi_common_x11.c
> index 1bfbc7c300..20d7cf5a2c 100644
> --- a/src/vulkan/wsi/wsi_common_x11.c
> +++ b/src/vulkan/wsi/wsi_common_x11.c
> @@ -1235,9 +1235,6 @@ x11_swapchain_destroy(struct wsi_swapchain
> *anv_chain,
> struct x11_swapchain *chain = (struct x11_swapchain *)anv_chain;
> xcb_void_cookie_t cookie;
>
> -   for (uint32_t i = 0; i < chain->base.image_count; i++)
> -  x11_image_finish(chain, pAllocator, &chain->images[i]);
> -
> if (chain->threaded) {
>chain->status = VK_ERROR_OUT_OF_DATE_KHR;
>/* Push a UINT32_MAX to wake up the manager */
> @@ -1247,6 +1244,9 @@ x11_swapchain_destroy(struct wsi_swapchain
> *anv_chain,
>wsi_queue_destroy(&chain->present_queue);
> }
>
> +   for (uint32_t i = 0; i < chain->base.image_count; i++)
> +  x11_image_finish(chain, pAllocator, &chain->images[i]);
> +
> xcb_unregister_for_special_event(chain->conn, chain->special_event);
> cookie = xcb_present_select_input_checked(chain->conn,
> chain->event_id,
>   chain->window,
> --
> 2.14.3
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 1/3] radv: Set active_stages the same whether or not shaders were cached

2018-06-04 Thread Alex Smith
On 1 June 2018 at 22:51, Dylan Baker  wrote:

> Quoting Samuel Pitoiset (2018-06-01 08:58:42)
> >
> >
> > On 06/01/2018 05:48 PM, Dylan Baker wrote:
> > > Quoting Alex Smith (2018-06-01 07:56:38)
> > >> On 1 June 2018 at 15:48, Dylan Baker  wrote:
> > >>
> > >>  Quoting Alex Smith (2018-05-31 08:44:18)
> > >>  > With GFX9 merged shaders, active_stages would be set to the
> original
> > >>  > stages specified if shaders were not cached, but to the stages
> still
> > >>  > present after merging if they were.
> > >>  >
> > >>  > Be consistent and use the original stages.
> > >>  >
> > >>  > Signed-off-by: Alex Smith 
> > >>  > Cc: "18.1" 
> > >>  > ---
> > >>  >  src/amd/vulkan/radv_pipeline.c | 7 ++-
> > >>  >  1 file changed, 2 insertions(+), 5 deletions(-)
> > >>  >
> > >>  > diff --git a/src/amd/vulkan/radv_pipeline.c
> b/src/amd/vulkan/radv_
> > >>  pipeline.c
> > >>  > index 52734a308a..18dcc43ebe 100644
> > >>  > --- a/src/amd/vulkan/radv_pipeline.c
> > >>  > +++ b/src/amd/vulkan/radv_pipeline.c
> > >>  > @@ -1964,6 +1964,8 @@ void radv_create_shaders(struct
> radv_pipeline
> > >>  *pipeline,
> > >>  > _mesa_sha1_compute(modules[i]
> ->nir->
> > >>  info.name,
> > >>  >
> strlen(modules[i]->
> > >>  nir->info.name),
> > >>  >
> modules[i]->sha1);
> > >>  > +
> > >>  > +   pipeline->active_stages |=
> > >>  mesa_to_vk_shader_stage(i);
> > >>  > }
> > >>  > }
> > >>  >
> > >>  > @@ -1979,10 +1981,6 @@ void radv_create_shaders(struct
> radv_pipeline
> > >>  *pipeline,
> > >>  >
> > >>  > if (radv_create_shader_variants_
> from_pipeline_cache(device,
> > >>  cache, hash, pipeline->shaders) &&
> > >>  > (!modules[MESA_SHADER_GEOMETRY] ||
> pipeline->gs_copy_shader))
> > >>  {
> > >>  > -   for (unsigned i = 0; i < MESA_SHADER_STAGES;
> ++i) {
> > >>  > -   if (pipeline->shaders[i])
> > >>  > -   pipeline->active_stages |=
> > >>  mesa_to_vk_shader_stage(i);
> > >>  > -   }
> > >>  > return;
> > >>  > }
> > >>  >
> > >>  > @@ -2015,7 +2013,6 @@ void radv_create_shaders(struct
> radv_pipeline
> > >>  *pipeline,
> > >>  > stage ?
> stage->pName
> > >>  : "main", i,
> > >>  > stage ?
> stage->
> > >>  pSpecializationInfo : NULL,
> > >>  > flags);
> > >>  > -   pipeline->active_stages |=
> mesa_to_vk_shader_stage(i);
> > >>  >
> > >>  > /* We don't want to alter meta shaders IR
> directly so
> > >>  clone it
> > >>  >  * first.
> > >>  > --
> > >>  > 2.14.3
> > >>  >
> > >>  > ___
> > >>  > mesa-stable mailing list
> > >>  > mesa-sta...@lists.freedesktop.org
> > >>  > https://lists.freedesktop.org/mailman/listinfo/mesa-stable
> > >>
> > >>  Hi Alex,
> > >>
> > >>  This doesn't apply cleanly to the 18.1 tree with the following
> collision:
> > >>
> > >>  ++<<<<<<< HEAD
> > >>
> > >>   +  stage ?
> stage->
> > >>  pSpecializationInfo : NULL);
> > >>   +  pipeline->active_stages |=
> mesa_to_vk_shader_stage(i);
> > >>
> > >>  ++===
> > >>
> > >>  

Re: [Mesa-dev] [PATCH] radv: Don't pass a TESS_EVAL shader when tesselation is not enabled.

2018-06-04 Thread Alex Smith
Oops. Thanks for tracking that down.

Reviewed-by: Alex Smith 

On 2 June 2018 at 13:31, Bas Nieuwenhuizen  wrote:

> Otherwise on pre-GFX9, if the constant layout allows both TESS_EVAL and
> GEOMETRY shaders, but the PIPELINE has only GEOMETRY, it would return the
> GEOMETRY shader for the TESS_EVAL shader.
>
> This would cause the flush_constants code to emit the GEOMETRY constants
> to the TESS_EVAL registers and then conclude that it did not need to set
> the GEOMETRY shader registers.
>
> Fixes: dfff9fb6f8d "radv: Handle GFX9 merged shaders in
> radv_flush_constants()"
> CC: 18.1 
> CC: Alex Smith 
> CC: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_pipeline.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
> index ff647ed9af3..375f7c357d3 100644
> --- a/src/amd/vulkan/radv_pipeline.c
> +++ b/src/amd/vulkan/radv_pipeline.c
> @@ -1594,6 +1594,8 @@ radv_get_shader(struct radv_pipeline *pipeline,
> if (pipeline->shaders[MESA_SHADER_GEOMETRY])
> return pipeline->shaders[MESA_SHADER_GEOMETRY];
> } else if (stage == MESA_SHADER_TESS_EVAL) {
> +   if (!radv_pipeline_has_tess(pipeline))
> +   return NULL;
> if (pipeline->shaders[MESA_SHADER_TESS_EVAL])
> return pipeline->shaders[MESA_SHADER_TESS_EVAL];
> if (pipeline->shaders[MESA_SHADER_GEOMETRY])
> --
> 2.17.0
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 1/3] radv: Set active_stages the same whether or not shaders were cached

2018-06-01 Thread Alex Smith
On 1 June 2018 at 16:58, Samuel Pitoiset  wrote:

>
>
> On 06/01/2018 05:48 PM, Dylan Baker wrote:
>
>> Quoting Alex Smith (2018-06-01 07:56:38)
>>
>>> On 1 June 2018 at 15:48, Dylan Baker  wrote:
>>>
>>>  Quoting Alex Smith (2018-05-31 08:44:18)
>>>  > With GFX9 merged shaders, active_stages would be set to the
>>> original
>>>  > stages specified if shaders were not cached, but to the stages
>>> still
>>>  > present after merging if they were.
>>>  >
>>>  > Be consistent and use the original stages.
>>>  >
>>>  > Signed-off-by: Alex Smith 
>>>  > Cc: "18.1" 
>>>  > ---
>>>  >  src/amd/vulkan/radv_pipeline.c | 7 ++-
>>>  >  1 file changed, 2 insertions(+), 5 deletions(-)
>>>  >
>>>  > diff --git a/src/amd/vulkan/radv_pipeline.c
>>> b/src/amd/vulkan/radv_
>>>  pipeline.c
>>>  > index 52734a308a..18dcc43ebe 100644
>>>  > --- a/src/amd/vulkan/radv_pipeline.c
>>>  > +++ b/src/amd/vulkan/radv_pipeline.c
>>>  > @@ -1964,6 +1964,8 @@ void radv_create_shaders(struct
>>> radv_pipeline
>>>  *pipeline,
>>>  > _mesa_sha1_compute(modules[i]
>>> ->nir->
>>>  info.name,
>>>  >
>>> strlen(modules[i]->
>>>  nir->info.name),
>>>  >
>>> modules[i]->sha1);
>>>  > +
>>>  > +   pipeline->active_stages |=
>>>  mesa_to_vk_shader_stage(i);
>>>  > }
>>>  > }
>>>  >
>>>  > @@ -1979,10 +1981,6 @@ void radv_create_shaders(struct
>>> radv_pipeline
>>>  *pipeline,
>>>  >
>>>  > if (radv_create_shader_variants_f
>>> rom_pipeline_cache(device,
>>>  cache, hash, pipeline->shaders) &&
>>>  > (!modules[MESA_SHADER_GEOMETRY] ||
>>> pipeline->gs_copy_shader))
>>>  {
>>>  > -   for (unsigned i = 0; i < MESA_SHADER_STAGES; ++i)
>>> {
>>>  > -   if (pipeline->shaders[i])
>>>  > -   pipeline->active_stages |=
>>>  mesa_to_vk_shader_stage(i);
>>>  > -   }
>>>  > return;
>>>  > }
>>>  >
>>>  > @@ -2015,7 +2013,6 @@ void radv_create_shaders(struct
>>> radv_pipeline
>>>  *pipeline,
>>>  > stage ?
>>> stage->pName
>>>  : "main", i,
>>>  > stage ?
>>> stage->
>>>  pSpecializationInfo : NULL,
>>>  > flags);
>>>  > -   pipeline->active_stages |=
>>> mesa_to_vk_shader_stage(i);
>>>  >
>>>  > /* We don't want to alter meta shaders IR
>>> directly so
>>>  clone it
>>>  >  * first.
>>>  > --
>>>  > 2.14.3
>>>  >
>>>
>>>  Hi Alex,
>>>
>>>  This doesn't apply cleanly to the 18.1 tree with the following
>>> collision:
>>>
>>>  ++<<<<<<< HEAD
>>>   +
>>> stage ? stage->
>>>  pSpecializationInfo : NULL);
>>>   +  pipeline->active_stages |=
>>> mesa_to_vk_shader_stage(i);
>>>  ++===
>>>  +
>>>  stage ? stage->
>>>  pSpecializationInfo : NULL,
>>>  +   flags);
>>>  ++>>>>>>> 0fa51bfdbe5... radv: Set
>>> active_stages the same whether or not
>>>  shaders were cached
>>>
>>>  I can remove the flags field (which doesn't exist in 18.1) before
>>> merging
>>>  if you
>>>  think that's the right thing to do. Does that seem reasonable?
>>>
>>>
>>> Yes, that's correct. Should just be removing the pipeline->active_stages
>>> line
>>> at that point.
>>>
>>> Thanks,
>>> Alex
>>>
>>
>> Thanks! This is in the tree for 18.1.2.
>>
>
> Apparently, this series regresses some CTS on my side, Bas will
> investigate, would be nice to not release 18.1.2 without this fixed. :)
>

What tests are failing?


>
>
>> Dylan
>>
>>
>>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 1/3] radv: Set active_stages the same whether or not shaders were cached

2018-06-01 Thread Alex Smith
On 1 June 2018 at 15:48, Dylan Baker  wrote:

> Quoting Alex Smith (2018-05-31 08:44:18)
> > With GFX9 merged shaders, active_stages would be set to the original
> > stages specified if shaders were not cached, but to the stages still
> > present after merging if they were.
> >
> > Be consistent and use the original stages.
> >
> > Signed-off-by: Alex Smith 
> > Cc: "18.1" 
> > ---
> >  src/amd/vulkan/radv_pipeline.c | 7 ++-
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
> > index 52734a308a..18dcc43ebe 100644
> > --- a/src/amd/vulkan/radv_pipeline.c
> > +++ b/src/amd/vulkan/radv_pipeline.c
> > @@ -1964,6 +1964,8 @@ void radv_create_shaders(struct radv_pipeline
> *pipeline,
> > _mesa_sha1_compute(modules[i]->nir->
> info.name,
> >
> strlen(modules[i]->nir->info.name),
> >modules[i]->sha1);
> > +
> > +   pipeline->active_stages |=
> mesa_to_vk_shader_stage(i);
> > }
> > }
> >
> > @@ -1979,10 +1981,6 @@ void radv_create_shaders(struct radv_pipeline
> *pipeline,
> >
> > if (radv_create_shader_variants_from_pipeline_cache(device,
> cache, hash, pipeline->shaders) &&
> > (!modules[MESA_SHADER_GEOMETRY] ||
> pipeline->gs_copy_shader)) {
> > -   for (unsigned i = 0; i < MESA_SHADER_STAGES; ++i) {
> > -   if (pipeline->shaders[i])
> > -   pipeline->active_stages |=
> mesa_to_vk_shader_stage(i);
> > -   }
> > return;
> > }
> >
> > @@ -2015,7 +2013,6 @@ void radv_create_shaders(struct radv_pipeline
> *pipeline,
> > stage ? stage->pName
> : "main", i,
> > stage ?
> stage->pSpecializationInfo : NULL,
> > flags);
> > -   pipeline->active_stages |= mesa_to_vk_shader_stage(i);
> >
> > /* We don't want to alter meta shaders IR directly so
> clone it
> >  * first.
> > --
> > 2.14.3
> >
>
> Hi Alex,
>
> This doesn't apply cleanly to the 18.1 tree with the following collision:
>
> ++<<<<<<< HEAD
>
>  +  stage ?
> stage->pSpecializationInfo : NULL);
>  +  pipeline->active_stages |= mesa_to_vk_shader_stage(i);
>
> ++===
>
> +   stage ?
> stage->pSpecializationInfo : NULL,
> +   flags);
>
> ++>>>>>>> 0fa51bfdbe5... radv: Set active_stages the same whether or not
> shaders were cached
>
> I can remove the flags field (which doesn't exist in 18.1) before merging
> if you
> think that's the right thing to do. Does that seem reasonable?
>

Yes, that's correct. Should just be removing the pipeline->active_stages
line at that point.

Thanks,
Alex


> Dylan
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] radv: Handle GFX9 merged shaders in radv_flush_constants()

2018-06-01 Thread Alex Smith
On 31 May 2018 at 21:15, Bas Nieuwenhuizen  wrote:

> On Thu, May 31, 2018 at 5:44 PM, Alex Smith 
> wrote:
> > This was not previously handled correctly. For example,
> > push_constant_stages might only contain MESA_SHADER_VERTEX because
> > only that stage was changed by CmdPushConstants or
> > CmdBindDescriptorSets.
> >
> > In that case, if vertex has been merged with tess control, then the
> > push constant address wouldn't be updated since
> > pipeline->shaders[MESA_SHADER_VERTEX] would be NULL.
> >
> > Use radv_get_shader() instead of getting the shader directly so that
> > we get the right shader if merged. Also, skip emitting the address
> > redundantly - if two merged stages are set in push_constant_stages
> > this change would have made the address get emitted twice.
> >
> > Signed-off-by: Alex Smith 
> > Cc: "18.1" 
> > ---
> >  src/amd/vulkan/radv_cmd_buffer.c | 9 -
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
> > index da9591b9a5..c6a2d6c5b9 100644
> > --- a/src/amd/vulkan/radv_cmd_buffer.c
> > +++ b/src/amd/vulkan/radv_cmd_buffer.c
> > @@ -1585,6 +1585,7 @@ radv_flush_constants(struct radv_cmd_buffer
> *cmd_buffer,
> >  ? cmd_buffer->state.compute_
> pipeline
> >  : cmd_buffer->state.pipeline;
> > struct radv_pipeline_layout *layout = pipeline->layout;
> > +   struct radv_shader_variant *shader, *prev_shader;
> > unsigned offset;
> > void *ptr;
> > uint64_t va;
> > @@ -1609,11 +1610,17 @@ radv_flush_constants(struct radv_cmd_buffer
> *cmd_buffer,
> > MAYBE_UNUSED unsigned cdw_max = radeon_check_space(cmd_buffer->device->ws,
> >cmd_buffer->cs, MESA_SHADER_STAGES * 4);
> >
> > +   prev_shader = NULL;
> > radv_foreach_stage(stage, stages) {
> > -   if (pipeline->shaders[stage]) {
> > +   shader = radv_get_shader(pipeline, stage);
> > +
> > +   /* Avoid redundantly emitting the address for merged stages. */
> > +   if (shader && shader != prev_shader) {
> > radv_emit_userdata_address(cmd_buffer, pipeline, stage,
> >AC_UD_PUSH_CONSTANTS, va);
> > }
> > +
> > +   prev_shader = shader;
>
> This emits the same shader twice if we have a geometry shader and a
> vertex shader but no tessellation shaders in cases were the stage mask
> is larger than needed and includes the tessellation stages. On the
> iteration for the tess shaders, prev_shader will be reset to NULL, and
> hence when we visit the geometry shader we will emit the constants
> again.
>
> I think this should be solved by moving the prev_shader update within
> the if statement.
>

Good point. Fixed, and pushed.


>
> With that, this series is
>
> Reviewed-by: Bas Nieuwenhuizen 
>
> Thanks a lot!
>
>
> > }
> >
> > cmd_buffer->push_constant_stages &= ~stages;
> > --
> > 2.14.3
> >
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] radv: Set active_stages the same whether or not shaders were cached

2018-05-31 Thread Alex Smith
With GFX9 merged shaders, active_stages would be set to the original
stages specified if shaders were not cached, but to the stages still
present after merging if they were.

Be consistent and use the original stages.

Signed-off-by: Alex Smith 
Cc: "18.1" 
---
 src/amd/vulkan/radv_pipeline.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index 52734a308a..18dcc43ebe 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -1964,6 +1964,8 @@ void radv_create_shaders(struct radv_pipeline *pipeline,
_mesa_sha1_compute(modules[i]->nir->info.name,
   strlen(modules[i]->nir->info.name),
   modules[i]->sha1);
+
+   pipeline->active_stages |= mesa_to_vk_shader_stage(i);
}
}
 
@@ -1979,10 +1981,6 @@ void radv_create_shaders(struct radv_pipeline *pipeline,
 
if (radv_create_shader_variants_from_pipeline_cache(device, cache, hash, pipeline->shaders) &&
    (!modules[MESA_SHADER_GEOMETRY] || pipeline->gs_copy_shader)) {
-   for (unsigned i = 0; i < MESA_SHADER_STAGES; ++i) {
-   if (pipeline->shaders[i])
-   pipeline->active_stages |= mesa_to_vk_shader_stage(i);
-   }
return;
}
 
@@ -2015,7 +2013,6 @@ void radv_create_shaders(struct radv_pipeline *pipeline,
stage ? stage->pName : "main", i,
stage ? stage->pSpecializationInfo : NULL,
flags);
-   pipeline->active_stages |= mesa_to_vk_shader_stage(i);
 
/* We don't want to alter meta shaders IR directly so clone it
 * first.
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] radv: Consolidate GFX9 merged shader lookup logic

2018-05-31 Thread Alex Smith
This was being handled in a few different places, consolidate it into a
single radv_get_shader() function.

Signed-off-by: Alex Smith 
Cc: "18.1" 
---
 src/amd/vulkan/radv_cmd_buffer.c | 20 
 src/amd/vulkan/radv_pipeline.c   | 38 --
 src/amd/vulkan/radv_private.h|  3 ++-
 3 files changed, 26 insertions(+), 35 deletions(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index 6ff1f1a6cb..da9591b9a5 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -560,20 +560,8 @@ radv_lookup_user_sgpr(struct radv_pipeline *pipeline,
  gl_shader_stage stage,
  int idx)
 {
-   if (stage == MESA_SHADER_VERTEX) {
-   if (pipeline->shaders[MESA_SHADER_VERTEX])
-   return &pipeline->shaders[MESA_SHADER_VERTEX]->info.user_sgprs_locs.shader_data[idx];
-   if (pipeline->shaders[MESA_SHADER_TESS_CTRL])
-   return &pipeline->shaders[MESA_SHADER_TESS_CTRL]->info.user_sgprs_locs.shader_data[idx];
-   if (pipeline->shaders[MESA_SHADER_GEOMETRY])
-   return &pipeline->shaders[MESA_SHADER_GEOMETRY]->info.user_sgprs_locs.shader_data[idx];
-   } else if (stage == MESA_SHADER_TESS_EVAL) {
-   if (pipeline->shaders[MESA_SHADER_TESS_EVAL])
-   return &pipeline->shaders[MESA_SHADER_TESS_EVAL]->info.user_sgprs_locs.shader_data[idx];
-   if (pipeline->shaders[MESA_SHADER_GEOMETRY])
-   return &pipeline->shaders[MESA_SHADER_GEOMETRY]->info.user_sgprs_locs.shader_data[idx];
-   }
-   return &pipeline->shaders[stage]->info.user_sgprs_locs.shader_data[idx];
+   struct radv_shader_variant *shader = radv_get_shader(pipeline, stage);
+   return &shader->info.user_sgprs_locs.shader_data[idx];
 }
 
 static void
@@ -1639,7 +1627,7 @@ radv_flush_vertex_descriptors(struct radv_cmd_buffer *cmd_buffer,
if ((pipeline_is_dirty ||
(cmd_buffer->state.dirty & RADV_CMD_DIRTY_VERTEX_BUFFER)) &&
cmd_buffer->state.pipeline->vertex_elements.count &&
-   radv_get_vertex_shader(cmd_buffer->state.pipeline)->info.info.vs.has_vertex_buffers) {
+   radv_get_shader(cmd_buffer->state.pipeline, MESA_SHADER_VERTEX)->info.info.vs.has_vertex_buffers) {
struct radv_vertex_elements_info *velems = &cmd_buffer->state.pipeline->vertex_elements;
unsigned vb_offset;
void *vb_ptr;
@@ -2940,7 +2928,7 @@ radv_cs_emit_indirect_draw_packet(struct radv_cmd_buffer *cmd_buffer,
struct radeon_winsys_cs *cs = cmd_buffer->cs;
unsigned di_src_sel = indexed ? V_0287F0_DI_SRC_SEL_DMA
  : V_0287F0_DI_SRC_SEL_AUTO_INDEX;
-   bool draw_id_enable = radv_get_vertex_shader(cmd_buffer->state.pipeline)->info.info.vs.needs_draw_id;
+   bool draw_id_enable = radv_get_shader(cmd_buffer->state.pipeline, MESA_SHADER_VERTEX)->info.info.vs.needs_draw_id;
uint32_t base_reg = cmd_buffer->state.pipeline->graphics.vtx_base_sgpr;
assert(base_reg);
 
diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index 18dcc43ebe..b44feae4cf 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -1583,21 +1583,23 @@ static void si_multiwave_lds_size_workaround(struct radv_device *device,
 }
 
 struct radv_shader_variant *
-radv_get_vertex_shader(struct radv_pipeline *pipeline)
+radv_get_shader(struct radv_pipeline *pipeline,
+   gl_shader_stage stage)
 {
-   if (pipeline->shaders[MESA_SHADER_VERTEX])
-   return pipeline->shaders[MESA_SHADER_VERTEX];
-   if (pipeline->shaders[MESA_SHADER_TESS_CTRL])
-   return pipeline->shaders[MESA_SHADER_TESS_CTRL];
-   return pipeline->shaders[MESA_SHADER_GEOMETRY];
-}
-
-static struct radv_shader_variant *
-radv_get_tess_eval_shader(struct radv_pipeline *pipeline)
-{
-   if (pipeline->shaders[MESA_SHADER_TESS_EVAL])
-   return pipeline->shaders[MESA_SHADER_TESS_EVAL];
-   return pipeline->shaders[MESA_SHADER_GEOMETRY];
+   if (stage == MESA_SHADER_VERTEX) {
+   if (pipeline->shaders[MESA_SHADER_VERTEX])
+   return pipeline->shaders[MESA_SHADER_VERTEX];
+   if (pipeline->shaders[MESA_SHADER_TESS_CTRL])
+   return pipeline->shaders[MESA_SHADER_TESS_CTRL];
+   if (pipeline->shaders[MESA_SHADER_GEOMETRY])
+   return pipeline->shaders[MESA_SHADER_GEOMETRY];
+   } else if (stage == MESA_SHADER_TESS_EVAL) {
+   if (pipeline->shaders[MESA_SHADER_TESS_EVAL])
+   return pipeline->shaders[MESA_SHADER_TESS_EVAL];
+

[Mesa-dev] [PATCH 3/3] radv: Handle GFX9 merged shaders in radv_flush_constants()

2018-05-31 Thread Alex Smith
This was not previously handled correctly. For example,
push_constant_stages might only contain MESA_SHADER_VERTEX because
only that stage was changed by CmdPushConstants or
CmdBindDescriptorSets.

In that case, if vertex has been merged with tess control, then the
push constant address wouldn't be updated since
pipeline->shaders[MESA_SHADER_VERTEX] would be NULL.

Use radv_get_shader() instead of getting the shader directly so that
we get the right shader if merged. Also, skip emitting the address
redundantly - if two merged stages are set in push_constant_stages
this change would have made the address get emitted twice.

Signed-off-by: Alex Smith 
Cc: "18.1" 
---
 src/amd/vulkan/radv_cmd_buffer.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index da9591b9a5..c6a2d6c5b9 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -1585,6 +1585,7 @@ radv_flush_constants(struct radv_cmd_buffer *cmd_buffer,
 ? cmd_buffer->state.compute_pipeline
 : cmd_buffer->state.pipeline;
struct radv_pipeline_layout *layout = pipeline->layout;
+   struct radv_shader_variant *shader, *prev_shader;
unsigned offset;
void *ptr;
uint64_t va;
@@ -1609,11 +1610,17 @@ radv_flush_constants(struct radv_cmd_buffer *cmd_buffer,
MAYBE_UNUSED unsigned cdw_max = 
radeon_check_space(cmd_buffer->device->ws,
   cmd_buffer->cs, 
MESA_SHADER_STAGES * 4);
 
+   prev_shader = NULL;
radv_foreach_stage(stage, stages) {
-   if (pipeline->shaders[stage]) {
+   shader = radv_get_shader(pipeline, stage);
+
+   /* Avoid redundantly emitting the address for merged stages. */
+   if (shader && shader != prev_shader) {
radv_emit_userdata_address(cmd_buffer, pipeline, stage,
   AC_UD_PUSH_CONSTANTS, va);
}
+
+   prev_shader = shader;
}
 
cmd_buffer->push_constant_stages &= ~stages;
-- 
2.14.3
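The fix above hinges on two things: resolving a stage through the merged-stage fallback (the role radv_get_shader() plays), and skipping stages that resolve to the same shader variant so the address is emitted only once. A standalone C sketch of that logic, using simplified types and made-up names rather than the driver's own:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of pipeline->shaders[]; indices mirror gl_shader_stage. */
enum stage { VERTEX, TESS_CTRL, TESS_EVAL, GEOMETRY, FRAGMENT, NUM_STAGES };

struct pipeline { const void *shaders[NUM_STAGES]; };

/* On GFX9, vertex may be merged into tess-ctrl or geometry, and tess-eval
 * into geometry, so fall through to the stage the shader was merged into. */
static const void *get_shader(const struct pipeline *p, enum stage s)
{
    if (p->shaders[s])
        return p->shaders[s];
    if (s == VERTEX) {
        if (p->shaders[TESS_CTRL]) return p->shaders[TESS_CTRL];
        if (p->shaders[GEOMETRY])  return p->shaders[GEOMETRY];
    } else if (s == TESS_EVAL) {
        if (p->shaders[GEOMETRY])  return p->shaders[GEOMETRY];
    }
    return NULL;
}

/* Count how many user-data addresses would be emitted for a stage mask,
 * skipping consecutive stages that resolve to the same merged shader --
 * the same prev_shader trick the patch adds to radv_flush_constants(). */
static int count_emits(const struct pipeline *p, unsigned stage_mask)
{
    const void *prev = NULL;
    int emits = 0;
    for (enum stage s = VERTEX; s < NUM_STAGES; s++) {
        if (!(stage_mask & (1u << s)))
            continue;
        const void *sh = get_shader(p, s);
        if (sh && sh != prev)
            emits++;
        prev = sh;
    }
    return emits;
}
```

With a merged VS+HS pipeline, a mask containing both VERTEX and TESS_CTRL resolves to the same variant and yields a single emit, which is exactly the redundancy the patch removes.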

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH] radeonsi: Fix crash on shaders using MSAA image load/store

2018-05-31 Thread Alex Smith
Hmm, the crash I was seeing is in RenderDoc from one of its own shaders.
Maybe it's missing some support checks? I'll look into it.

If you're happy with this though, I'll push it.

Thanks,
Alex

On 30 May 2018 at 21:17, Marek Olšák  wrote:

> Reviewed-by: Marek Olšák 
>
> Note that radeonsi doesn't support MSAA images.
>
> Marek
>
> On Wed, May 30, 2018 at 4:48 AM, Alex Smith 
> wrote:
>
>> The value returned by tgsi_util_get_texture_coord_dim() does not
>> account for the sample index. This means image_fetch_coords() will not
>> fetch it, leading to a null deref in ac_build_image_opcode() which
>> expects it to be present (the return value of ac_num_coords() *does*
>> include the sample index).
>>
>> Signed-off-by: Alex Smith 
>> Cc: "18.1" 
>> ---
>>  src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c | 8 +++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c
>> b/src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c
>> index 1c244fa3c0..d0dd4e7cab 100644
>> --- a/src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c
>> +++ b/src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c
>> @@ -276,10 +276,16 @@ static void image_fetch_coords(
>> struct si_shader_context *ctx = si_shader_context(bld_base);
>> LLVMBuilderRef builder = ctx->ac.builder;
>> unsigned target = inst->Memory.Texture;
>> -   const unsigned num_coords = tgsi_util_get_texture_coord_dim(target);
>> +   unsigned num_coords = tgsi_util_get_texture_coord_dim(target);
>> LLVMValueRef tmp;
>> int chan;
>>
>> +   if (target == TGSI_TEXTURE_2D_MSAA ||
>> +   target == TGSI_TEXTURE_2D_ARRAY_MSAA) {
>> +   /* Need the sample index as well. */
>> +   num_coords++;
>> +   }
>> +
>> for (chan = 0; chan < num_coords; ++chan) {
>> tmp = lp_build_emit_fetch(bld_base, inst, src, chan);
>> tmp = ac_to_integer(&ctx->ac, tmp);
>> --
>> 2.14.3
>>
>> ___
>> mesa-stable mailing list
>> mesa-sta...@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-stable
>>
>
>


[Mesa-dev] [PATCH] radeonsi: Fix crash on shaders using MSAA image load/store

2018-05-30 Thread Alex Smith
The value returned by tgsi_util_get_texture_coord_dim() does not
account for the sample index. This means image_fetch_coords() will not
fetch it, leading to a null deref in ac_build_image_opcode() which
expects it to be present (the return value of ac_num_coords() *does*
include the sample index).

Signed-off-by: Alex Smith 
Cc: "18.1" 
---
 src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c 
b/src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c
index 1c244fa3c0..d0dd4e7cab 100644
--- a/src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c
+++ b/src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c
@@ -276,10 +276,16 @@ static void image_fetch_coords(
struct si_shader_context *ctx = si_shader_context(bld_base);
LLVMBuilderRef builder = ctx->ac.builder;
unsigned target = inst->Memory.Texture;
-   const unsigned num_coords = tgsi_util_get_texture_coord_dim(target);
+   unsigned num_coords = tgsi_util_get_texture_coord_dim(target);
LLVMValueRef tmp;
int chan;
 
+   if (target == TGSI_TEXTURE_2D_MSAA ||
+   target == TGSI_TEXTURE_2D_ARRAY_MSAA) {
+   /* Need the sample index as well. */
+   num_coords++;
+   }
+
for (chan = 0; chan < num_coords; ++chan) {
tmp = lp_build_emit_fetch(bld_base, inst, src, chan);
tmp = ac_to_integer(&ctx->ac, tmp);
-- 
2.14.3
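The underlying rule is simple: for MSAA targets the sample index travels as one extra coordinate component, so the fetch loop must run one iteration longer than the base dimension that tgsi_util_get_texture_coord_dim() reports. A minimal C sketch of that accounting, with made-up enum values standing in for the TGSI target codes:

```c
#include <assert.h>

/* Illustrative texture-target model; the real values come from TGSI headers. */
enum target { TEX_2D, TEX_2D_ARRAY, TEX_2D_MSAA, TEX_2D_ARRAY_MSAA };

/* Base coordinate dimension, analogous to what
 * tgsi_util_get_texture_coord_dim() returns: it does NOT count the
 * sample index. */
static unsigned coord_dim(enum target t)
{
    switch (t) {
    case TEX_2D:            return 2; /* x, y */
    case TEX_2D_ARRAY:      return 3; /* x, y, layer */
    case TEX_2D_MSAA:       return 2;
    case TEX_2D_ARRAY_MSAA: return 3;
    }
    return 0;
}

/* Number of components the image opcode builder actually consumes; for
 * MSAA targets the sample index must be fetched too -- omitting it is
 * what caused the null dereference the patch fixes. */
static unsigned num_fetched_coords(enum target t)
{
    unsigned n = coord_dim(t);
    if (t == TEX_2D_MSAA || t == TEX_2D_ARRAY_MSAA)
        n++; /* sample index */
    return n;
}
```

So a 2D MSAA array image needs four fetched components (x, y, layer, sample), one more than its nominal coordinate dimension.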



Re: [Mesa-dev] [PATCH] radv: fix multisample image copies

2018-04-25 Thread Alex Smith
Any more thoughts on this? Any objections to it going to stable as well (it
fixes bugs, but is quite a large change)?

Thanks,
Alex

On 19 April 2018 at 09:27, Matthew Nicholls 
wrote:

> On 18/04/18 22:56, Dave Airlie wrote:
>
> On 18 April 2018 at 00:31, Matthew Nicholls
>>  wrote:
>>
>>> Previously before fb077b0728, the LOD parameter was being used in place
>>> of the
>>> sample index, which would only copy the first sample to all samples in
>>> the
>>> destination image. After that multisample image copies wouldn't copy
>>> anything
>>> from my observations.
>>>
>>> Fix this properly by copying each sample in a separate radv_CmdDraw and
>>> using a
>>> pipeline with the correct rasterizationSamples for the destination image.
>>>
>> Have you run CTS on this?
>>
> I ran the CTS tests under dEQP-VK.api.copy_and_blit.core.* and didn't see
> any
> changes. There were 6 failures both with and without this patch however:
>
> dEQP-VK.api.copy_and_blit.core.resolve_image.{whole_array_
> image,whole_copy_before_resolving}.{2,4,8}_bit
>
> This is on an RX 460.
>
> Matthew.
>
>
>> I wrote something similar (I'm on holidays at the moment so can't
>> confirm how similar)
>> but it failed some CTS tests for me.
>>
>> Dave.
>>
>> ---
>>>   src/amd/vulkan/radv_meta_blit2d.c | 279 --
>>> 
>>>   src/amd/vulkan/radv_private.h |  18 +--
>>>   2 files changed, 189 insertions(+), 108 deletions(-)
>>>
>>> diff --git a/src/amd/vulkan/radv_meta_blit2d.c
>>> b/src/amd/vulkan/radv_meta_blit2d.c
>>> index e163056257..d953241b55 100644
>>> --- a/src/amd/vulkan/radv_meta_blit2d.c
>>> +++ b/src/amd/vulkan/radv_meta_blit2d.c
>>> @@ -100,7 +100,8 @@ blit2d_bind_src(struct radv_cmd_buffer *cmd_buffer,
>>>   struct radv_meta_blit2d_buffer *src_buf,
>>>   struct blit2d_src_temps *tmp,
>>>   enum blit2d_src_type src_type, VkFormat depth_format,
>>> -VkImageAspectFlagBits aspects)
>>> +VkImageAspectFlagBits aspects,
>>> +uint32_t log2_samples)
>>>   {
>>>  struct radv_device *device = cmd_buffer->device;
>>>
>>> @@ -108,7 +109,7 @@ blit2d_bind_src(struct radv_cmd_buffer *cmd_buffer,
>>>  create_bview(cmd_buffer, src_buf, &tmp->bview, depth_format);
>>>
>>>  radv_meta_push_descriptor_set(cmd_buffer,
>>> VK_PIPELINE_BIND_POINT_GRAPHICS,
>>> -  device->meta_state.blit2d.p_layouts[src_type],
>>> +  device->meta_state.blit2d[log2_samples].p_layouts[src_type],
>>>0, /* set */
>>>1, /*
>>> descriptorWriteCount */
>>>(VkWriteDescriptorSet[]) {
>>> @@ -123,7 +124,7 @@ blit2d_bind_src(struct radv_cmd_buffer *cmd_buffer,
>>>});
>>>
>>>  radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
>>> - device->meta_state.blit2d.p_layouts[src_type],
>>> + device->meta_state.blit2d[log2_samples].p_layouts[src_type],
>>>VK_SHADER_STAGE_FRAGMENT_BIT, 16, 4,
>>>&src_buf->pitch);
>>>  } else {
>>> @@ -131,12 +132,12 @@ blit2d_bind_src(struct radv_cmd_buffer *cmd_buffer,
>>>
>>>  if (src_type == BLIT2D_SRC_TYPE_IMAGE_3D)
>>>  radv_CmdPushConstants(radv_cmd_buffer_to_handle(cmd_buffer),
>>> -  device->meta_state.blit2d.p_layouts[src_type],
>>> +  device->meta_state.blit2d[log2_samples].p_layouts[src_type],
>>>  VK_SHADER_STAGE_FRAGMENT_BIT, 16, 4,
>>>&src_img->layer);
>>>
>>>  radv_meta_push_descriptor_set(cmd_buffer,
>>> VK_PIPELINE_BIND_POINT_GRAPHICS,
>>> -  device->meta_state.blit2d.p_layouts[src_type],
>>> +  device->meta_state.blit2d[log2_samples].p_layouts[src_type],
>>>0, /* set */
>>>1, /*
>>> descriptorWriteCount */
>>>(VkWriteDescriptorSet[]) {
>>> @@ -190,10 +191,11 @@ blit2d_bind_dst(struct radv_cmd_buffer *cmd_buffer,
>>>
>>>   static void
>>>   bind_pipeline(struct radv_cmd_buffer *cmd_buffer,
>>> -  enum blit2d_src_type src_type, unsigned fs_key)
>>> +  enum blit2d_src_type src_type, unsigned fs_key,
>>> +  uint32_t log2_samples)
>>>   {
>>>  VkPipeline pipeline =
>>> -   cmd_buffer->device->meta_state.blit2d.pipelines[src_type][fs_key];
>>> +   cmd_buffer->device->meta_state.blit2d[log2_samples].pipelines[src_type][fs_key];
>>>
>>>  
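The approach the patch takes — one set of meta pipelines per destination sample count, indexed by log2 of that count, and one draw per sample rather than a single draw — can be sketched in C as follows (names are illustrative, not the driver's):

```c
#include <assert.h>

/* Index into a hypothetical blit2d[log2_samples] array: sample counts
 * 1, 2, 4, 8 map to indices 0, 1, 2, 3. */
static unsigned log2_samples(unsigned samples)
{
    unsigned l = 0;
    while ((1u << l) < samples)
        l++;
    return l;
}

/* Count the draws a multisample copy would record for a destination with
 * dst_samples samples: one per sample, each using a pipeline whose
 * rasterizationSamples matches the destination (rather than one draw that
 * only writes sample 0). */
static unsigned copy_draw_count(unsigned dst_samples)
{
    unsigned draws = 0;
    for (unsigned s = 0; s < dst_samples; s++)
        draws++; /* one radv_CmdDraw per sample, sample index s pushed */
    return draws;
}
```

The per-sample loop is what makes every sample of the destination receive the corresponding source sample, instead of sample 0 being replicated.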

Re: [Mesa-dev] [PATCH 3/3] ac: make use of if/loop build helpers

2018-04-10 Thread Alex Smith
On 10 April 2018 at 15:49, Juan A. Suarez Romero <jasua...@igalia.com>
wrote:

> On Tue, 2018-04-03 at 10:58 +0100, Alex Smith wrote:
> > I don't know exactly what's causing it, no. I noticed the issue was
> fixed on master so just bisected to this.
> >
> > CC'ing stable to nominate:
> > 42627dabb4db3011825a022325be7ae9b51103d6 - (1/3) ac: add if/loop build
> helpers
> > 6e1a142863b368a032e333f09feb107241446053 - (2/3) radeonsi: make use of
> if/loop build helpers in ac
> > 99cdc019bf6fe11c135b7544ef6daf4ac964fa24 - (3/3) ac: make use of
> if/loop build helpers
> >
>
> Hi, Alex.
>
> Are these 3 commits nominated for a specific stable branch? From the CC
> not sure
> if you want to nominate them for 17.3, 18.0 or both.
>

They work for me on both 18.0 and 17.3, so I think they can be nominated
for both.

Thanks,
Alex


>
>
> J.A.
>
> >
> >
> > On 3 April 2018 at 10:45, Timothy Arceri <tarc...@itsqueeze.com> wrote:
> > > I have no issue with these going in stable if they fix bugs. Ideally
> we should create a piglit test to catch this also but presumably you guys
> don't actually know the exact shader combination that's tripping things up?
> > >
> > >
> > > On 03/04/18 19:36, Samuel Pitoiset wrote:
> > > > This fixes a rendering issue with Wolfenstein 2 as well. A backport
> sounds reasonable to me.
> > > >
> > > > On 04/03/2018 11:33 AM, Alex Smith wrote:
> > > > > Hi Timothy,
> > > > >
> > > > > This patch fixes some rendering issues I see with RADV on SI.
> > > > >
> > > > > It doesn't sound like it was really intended to fix anything, so
> possibly it's masking some other issue, but would you object to nominating
> the series for stable? Applying it on the 18.0 branch fixes the issue there
> as well.
> > > > >
> > > > > Thanks,
> > > > > Alex
> > > > >
> > > > > On 7 March 2018 at 20:43, Marek Olšák <mar...@gmail.com  mar...@gmail.com>> wrote:
> > > > >
> > > > > For the series:
> > > > >
> > > > > Reviewed-by: Marek Olšák <marek.ol...@amd.com
> > > > > <mailto:marek.ol...@amd.com>>
> > > > >
> > > > > Marek
> > > > >
> > > > > On Tue, Mar 6, 2018 at 8:40 PM, Timothy Arceri
> > > > > <tarc...@itsqueeze.com <mailto:tarc...@itsqueeze.com>> wrote:
> > > > >  > These helpers insert the basic block in the same order as
> they
> > > > >  > appear in NIR making it easier to follow LLVM IR dumps. The
> helpers
> > > > >  > also insert more useful labels onto the blocks.
> > > > >  >
> > > > >  > TGSI uses the line number of the corresponding opcode in the
> TGSI
> > > > >  > dump as the label id, here we use the corresponding block
> index
> > > > >  > from NIR.
> > > > >  > ---
> > > > >  >  src/amd/common/ac_nir_to_llvm.c | 60
> > > > > +
> > > > >  >  1 file changed, 18 insertions(+), 42 deletions(-)
> > > > >  >
> > > > >  > diff --git a/src/amd/common/ac_nir_to_llvm.c
> > > > > b/src/amd/common/ac_nir_to_llvm.c
> > > > >  > index cda91fe8bf..dc463ed253 100644
> > > > >  > --- a/src/amd/common/ac_nir_to_llvm.c
> > > > >  > +++ b/src/amd/common/ac_nir_to_llvm.c
> > > > >  > @@ -5237,17 +5237,15 @@ static void visit_ssa_undef(struct
> > > > > ac_nir_context *ctx,
> > > > >  > _mesa_hash_table_insert(ctx->defs, &instr->def, undef);
> > > > >  >  }
> > > > >  >
> > > > >  > -static void visit_jump(struct ac_nir_context *ctx,
> > > > >  > +static void visit_jump(struct ac_llvm_context *ctx,
> > > > >  >const nir_jump_instr *instr)
> > > > >  >  {
> > > > >  > switch (instr->type) {
> > > > >  > case nir_jump_break:
> > > > >  > -   LLVMBuildBr(ctx->ac.builder, ctx->break_block);
> > > > >  > -   LLVMClearInsertionPosition(ctx->ac.builder);
> > > &

Re: [Mesa-dev] [PATCH 3/3] ac: make use of if/loop build helpers

2018-04-03 Thread Alex Smith
I don't know exactly what's causing it, no. I noticed the issue was fixed
on master so just bisected to this.

CC'ing stable to nominate:
42627dabb4db3011825a022325be7ae9b51103d6 - (1/3) ac: add if/loop build
helpers
6e1a142863b368a032e333f09feb107241446053 - (2/3) radeonsi: make use of
if/loop build helpers in ac
99cdc019bf6fe11c135b7544ef6daf4ac964fa24 - (3/3) ac: make use of if/loop
build helpers



On 3 April 2018 at 10:45, Timothy Arceri <tarc...@itsqueeze.com> wrote:

> I have no issue with these going in stable if they fix bugs. Ideally we
> should create a piglit test to catch this also but presumably you guys
> don't actually know the exact shader combination that's tripping things up?
>
>
> On 03/04/18 19:36, Samuel Pitoiset wrote:
>
>> This fixes a rendering issue with Wolfenstein 2 as well. A backport
>> sounds reasonable to me.
>>
>> On 04/03/2018 11:33 AM, Alex Smith wrote:
>>
>>> Hi Timothy,
>>>
>>> This patch fixes some rendering issues I see with RADV on SI.
>>>
>>> It doesn't sound like it was really intended to fix anything, so
>>> possibly it's masking some other issue, but would you object to nominating
>>> the series for stable? Applying it on the 18.0 branch fixes the issue there
>>> as well.
>>>
>>> Thanks,
>>> Alex
>>>
>>> On 7 March 2018 at 20:43, Marek Olšák <mar...@gmail.com >> mar...@gmail.com>> wrote:
>>>
>>> For the series:
>>>
>>> Reviewed-by: Marek Olšák <marek.ol...@amd.com
>>> <mailto:marek.ol...@amd.com>>
>>>
>>> Marek
>>>
>>> On Tue, Mar 6, 2018 at 8:40 PM, Timothy Arceri
>>> <tarc...@itsqueeze.com <mailto:tarc...@itsqueeze.com>> wrote:
>>>  > These helpers insert the basic block in the same order as they
>>>  > appear in NIR making it easier to follow LLVM IR dumps. The
>>> helpers
>>>  > also insert more useful labels onto the blocks.
>>>  >
>>>  > TGSI uses the line number of the corresponding opcode in the TGSI
>>>  > dump as the label id, here we use the corresponding block index
>>>  > from NIR.
>>>  > ---
>>>  >  src/amd/common/ac_nir_to_llvm.c | 60
>>> +
>>>  >  1 file changed, 18 insertions(+), 42 deletions(-)
>>>  >
>>>  > diff --git a/src/amd/common/ac_nir_to_llvm.c
>>> b/src/amd/common/ac_nir_to_llvm.c
>>>  > index cda91fe8bf..dc463ed253 100644
>>>  > --- a/src/amd/common/ac_nir_to_llvm.c
>>>  > +++ b/src/amd/common/ac_nir_to_llvm.c
>>>  > @@ -5237,17 +5237,15 @@ static void visit_ssa_undef(struct
>>> ac_nir_context *ctx,
>>>  > _mesa_hash_table_insert(ctx->defs, &instr->def, undef);
>>>  >  }
>>>  >
>>>  > -static void visit_jump(struct ac_nir_context *ctx,
>>>  > +static void visit_jump(struct ac_llvm_context *ctx,
>>>  >const nir_jump_instr *instr)
>>>  >  {
>>>  > switch (instr->type) {
>>>  > case nir_jump_break:
>>>  > -   LLVMBuildBr(ctx->ac.builder, ctx->break_block);
>>>  > -   LLVMClearInsertionPosition(ctx->ac.builder);
>>>  > +   ac_build_break(ctx);
>>>  > break;
>>>  > case nir_jump_continue:
>>>  > -   LLVMBuildBr(ctx->ac.builder, ctx->continue_block);
>>>  > -   LLVMClearInsertionPosition(ctx->ac.builder);
>>>  > +   ac_build_continue(ctx);
>>>  > break;
>>>  > default:
>>>  > fprintf(stderr, "Unknown NIR jump instr: ");
>>>  > @@ -5285,7 +5283,7 @@ static void visit_block(struct
>>> ac_nir_context *ctx, nir_block *block)
>>>  > visit_ssa_undef(ctx,
>>> nir_instr_as_ssa_undef(instr));
>>>  > break;
>>>  > case nir_instr_type_jump:
>>>  > -   visit_jump(ctx, nir_instr_as_jump(instr));
>>>  > +   visit_jump(&ctx->ac, nir_instr_as_jump(instr));
>>>  > break;
>>>  >  

Re: [Mesa-dev] [PATCH 3/3] ac: make use of if/loop build helpers

2018-04-03 Thread Alex Smith
Hi Timothy,

This patch fixes some rendering issues I see with RADV on SI.

It doesn't sound like it was really intended to fix anything, so possibly
it's masking some other issue, but would you object to nominating the
series for stable? Applying it on the 18.0 branch fixes the issue there as
well.

Thanks,
Alex

On 7 March 2018 at 20:43, Marek Olšák  wrote:

> For the series:
>
> Reviewed-by: Marek Olšák 
>
> Marek
>
> On Tue, Mar 6, 2018 at 8:40 PM, Timothy Arceri 
> wrote:
> > These helpers insert the basic block in the same order as they
> > appear in NIR making it easier to follow LLVM IR dumps. The helpers
> > also insert more useful labels onto the blocks.
> >
> > TGSI uses the line number of the corresponding opcode in the TGSI
> > dump as the label id, here we use the corresponding block index
> > from NIR.
> > ---
> >  src/amd/common/ac_nir_to_llvm.c | 60 +-
> ---
> >  1 file changed, 18 insertions(+), 42 deletions(-)
> >
> > diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_
> llvm.c
> > index cda91fe8bf..dc463ed253 100644
> > --- a/src/amd/common/ac_nir_to_llvm.c
> > +++ b/src/amd/common/ac_nir_to_llvm.c
> > @@ -5237,17 +5237,15 @@ static void visit_ssa_undef(struct
> ac_nir_context *ctx,
> > _mesa_hash_table_insert(ctx->defs, &instr->def, undef);
> >  }
> >
> > -static void visit_jump(struct ac_nir_context *ctx,
> > +static void visit_jump(struct ac_llvm_context *ctx,
> >const nir_jump_instr *instr)
> >  {
> > switch (instr->type) {
> > case nir_jump_break:
> > -   LLVMBuildBr(ctx->ac.builder, ctx->break_block);
> > -   LLVMClearInsertionPosition(ctx->ac.builder);
> > +   ac_build_break(ctx);
> > break;
> > case nir_jump_continue:
> > -   LLVMBuildBr(ctx->ac.builder, ctx->continue_block);
> > -   LLVMClearInsertionPosition(ctx->ac.builder);
> > +   ac_build_continue(ctx);
> > break;
> > default:
> > fprintf(stderr, "Unknown NIR jump instr: ");
> > @@ -5285,7 +5283,7 @@ static void visit_block(struct ac_nir_context
> *ctx, nir_block *block)
> > visit_ssa_undef(ctx,
> nir_instr_as_ssa_undef(instr));
> > break;
> > case nir_instr_type_jump:
> > -   visit_jump(ctx, nir_instr_as_jump(instr));
> > +   visit_jump(&ctx->ac, nir_instr_as_jump(instr));
> > break;
> > default:
> > fprintf(stderr, "Unknown NIR instr type: ");
> > @@ -5302,56 +5300,34 @@ static void visit_if(struct ac_nir_context *ctx,
> nir_if *if_stmt)
> >  {
> > LLVMValueRef value = get_src(ctx, if_stmt->condition);
> >
> > -   LLVMValueRef fn = LLVMGetBasicBlockParent(
> LLVMGetInsertBlock(ctx->ac.builder));
> > -   LLVMBasicBlockRef merge_block =
> > -   LLVMAppendBasicBlockInContext(ctx->ac.context, fn, "");
> > -   LLVMBasicBlockRef if_block =
> > -   LLVMAppendBasicBlockInContext(ctx->ac.context, fn, "");
> > -   LLVMBasicBlockRef else_block = merge_block;
> > -   if (!exec_list_is_empty(&if_stmt->else_list))
> > -   else_block = LLVMAppendBasicBlockInContext(
> > -   ctx->ac.context, fn, "");
> > -
> > -   LLVMValueRef cond = LLVMBuildICmp(ctx->ac.builder, LLVMIntNE,
> value,
> > - ctx->ac.i32_0, "");
> > -   LLVMBuildCondBr(ctx->ac.builder, cond, if_block, else_block);
> > -
> > -   LLVMPositionBuilderAtEnd(ctx->ac.builder, if_block);
> > +   nir_block *then_block =
> > +   (nir_block *) exec_list_get_head(&if_stmt->then_list);
> > +
> > +   ac_build_uif(&ctx->ac, value, then_block->index);
> > +
> > visit_cf_list(ctx, &if_stmt->then_list);
> > -   if (LLVMGetInsertBlock(ctx->ac.builder))
> > -   LLVMBuildBr(ctx->ac.builder, merge_block);
> >
> > if (!exec_list_is_empty(&if_stmt->else_list)) {
> > -   LLVMPositionBuilderAtEnd(ctx->ac.builder, else_block);
> > +   nir_block *else_block =
> > +   (nir_block *) exec_list_get_head(&if_stmt->else_list);
> > +
> > +   ac_build_else(&ctx->ac, else_block->index);
> > visit_cf_list(ctx, &if_stmt->else_list);
> > -   if (LLVMGetInsertBlock(ctx->ac.builder))
> > -   LLVMBuildBr(ctx->ac.builder, merge_block);
> > }
> >
> > -   LLVMPositionBuilderAtEnd(ctx->ac.builder, merge_block);
> > +   ac_build_endif(&ctx->ac, then_block->index);
> >  }
> >
> >  static void visit_loop(struct ac_nir_context *ctx, nir_loop *loop)
> >  {
> > -   LLVMValueRef fn = LLVMGetBasicBlockParent(
> LLVMGetInsertBlock(ctx->ac.builder));
> > -   LLVMBasicBlockRef continue_parent = 

Re: [Mesa-dev] [ANNOUNCE] Mesa 17.3.7 release candidate

2018-03-16 Thread Alex Smith
On 16 March 2018 at 12:46, Juan A. Suarez Romero <jasua...@igalia.com>
wrote:

> On Fri, 2018-03-16 at 13:40 +0100, Juan A. Suarez Romero wrote:
> > On Fri, 2018-03-16 at 12:17 +0000, Alex Smith wrote:
> > > Hi Juan,
> > >
> > > On 16 March 2018 at 11:42, Juan A. Suarez Romero <jasua...@igalia.com>
> wrote:
> > > > Hello list,
> > > >
> > > > The candidate for the Mesa 17.3.7 is now available. Currently we
> have:
> > > >  - 53 queued
> > > >  - 9 nominated (outstanding)
> > > >  - and 6 rejected patches
> > > >
> > > >
> > > > In the current queue we have a lot of fixes, as the latest two
> releases were
> > > > emergency releases to fix major issues.
> > > >
> > > > The i965 driver receives quite a few fixes. We have fixes for hangs on
> GFXBench 5's
> > > > Aztec Ruins benchmark, a fix for OpenGL CTS test in Haswell, another
> fix for the
> > > > number of input components, a fix for KHR_blend_equation_advanced,
> another fix
> > > > in the intel_from_planar, and some other fixes.
> > > >
> > > > In the RADV driver, there is a 3D images copying fix, another to
> disable tc-
> > > > compat on multisample d32s8, a fix related with HTILE, and a fix to
> avoid hangs
> > > > on Vega. Also, there are a couple of fixes regarding fences.
> > > >
> > > > R600 driver gets fixes for some hangs related with recip generation,
> a fix for
> > > > XFB stream check, another for indirect UBO access, and a final one
> for cubemap
> > > > arrays.
> > > >
> > > > In SWR/Rast driver we have a couple of fixes when using LLVM 6.0,
> which has been
> > > > recently released. Besides those, there are another couple of fixes
> too.
> > > >
> > > > Finally, there are some fixes for other drivers like Virgl, NVC0,
> Winsys, and
> > > > RadeonSI.
> > > >
> > > > Fixes also reach Wayland part. There's a fix to use the
> wayland-egl-backend.h
> > > > provided by Mesa itself, and a fix related with ARGB/XRGB
> transposition.
> > > >
> > > > Just to continue, there are also other framework-specific fixes.
> There are a
> > > > bunch of fixes for NIR, and fixes for GLSL components.
> > > >
> > > > Finally, and to avoid extending the list of fixes too much, there are
> several
> > > > fixes that touch different parts of Mesa and solve different bugs.
> > > >
> > > > Take a look at section "Mesa stable queue" for more information.
> > > >
> > > >
> > > > Testing reports/general approval
> > > > 
> > > > Any testing reports (or general approval of the state of the branch)
> will be
> > > > greatly appreciated.
> > > >
> > > > The plan is to have 17.3.7 this XXX (XXth Mar), around or shortly
> after 12:00
> > > > GMT.
> > > >
> > > > If you have any questions or suggestions - be that about the current
> patch queue
> > > > or otherwise, please go ahead.
> > > >
> > > >
> > > > Trivial merge conflicts
> > > > ---
> > > >
> > > > commit fc507dbfd193de5ef09ed2944090e9727820d9ea
> > > > Author: Dave Airlie <airl...@redhat.com>
> > > >
> > > > ac/nir: don't apply slice rounding on txf_ms
> > > >
> > > > (cherry picked from commit 69495b30a38fbb01a937cdea6f7674f89a2e60e7)
> > > >
> > > >
> > > > commit 5fd11359b66c8138d2c7ee29bd9740280b02d1e2
> > > > Author: Daniel Stone <dani...@collabora.com>
> > > >
> > > > egl/wayland: Fix ARGB/XRGB transposition in config map
> > > >
> > > > (cherry picked from commit 4fbd2d50b1c06a3c10f3a254e933646345123751)
> > > >
> > > >
> > > > commit 7f4a0a16284391f802a11717fa74ba5e13cbe43b
> > > > Author: Bas Nieuwenhuizen <ba...@chromium.org>
> > > >
> > > > radeonsi: Export signalled sync file instead of -1.
> > > >
> > > > (cherry picked from commit 5a3404d443e0c6e8e9a44d7f8dccf96c5ac18f0f)
> > > >
> > > > squashed with
> > > >
> > > > configure/meson: Bump libdrm_amdgpu version requirement.
> > >

Re: [Mesa-dev] [ANNOUNCE] Mesa 17.3.7 release candidate

2018-03-16 Thread Alex Smith
Hi Juan,

On 16 March 2018 at 11:42, Juan A. Suarez Romero <jasua...@igalia.com>
wrote:

> Hello list,
>
> The candidate for the Mesa 17.3.7 is now available. Currently we have:
>  - 53 queued
>  - 9 nominated (outstanding)
>  - and 6 rejected patches
>
>
> In the current queue we have a lot of fixes, as the latest two releases were
> emergency releases to fix major issues.
>
> The i965 driver receives quite a few fixes. We have fixes for hangs on
> GFXBench 5's
> Aztec Ruins benchmark, a fix for OpenGL CTS test in Haswell, another fix
> for the
> number of input components, a fix for KHR_blend_equation_advanced, another
> fix
> in the intel_from_planar, and some other fixes.
>
> In the RADV driver, there is a 3D images copying fix, another to disable
> tc-
> compat on multisample d32s8, a fix related with HTILE, and a fix to avoid
> hangs
> on Vega. Also, there are a couple of fixes regarding fences.
>
> R600 driver gets fixes for some hangs related with recip generation, a fix
> for
> XFB stream check, another for indirect UBO access, and a final one for
> cubemap
> arrays.
>
> In SWR/Rast driver we have a couple of fixes when using LLVM 6.0, which
> has been
> recently released. Besides those, there are another couple of fixes too.
>
> Finally, there are some fixes for other drivers like Virgl, NVC0, Winsys, and
> RadeonSI.
>
> Fixes also reach Wayland part. There's a fix to use the
> wayland-egl-backend.h
> provided by Mesa itself, and a fix related with ARGB/XRGB transposition.
>
> Just to continue, there are also other framework-specific fixes. There are
> a
> bunch of fixes for NIR, and fixes for GLSL components.
>
> Finally, and to avoid extending the list of fixes too much, there are
> several
> fixes that touch different parts of Mesa and solve different bugs.
>
> Take a look at section "Mesa stable queue" for more information.
>
>
> Testing reports/general approval
> 
> Any testing reports (or general approval of the state of the branch) will
> be
> greatly appreciated.
>
> The plan is to have 17.3.7 this XXX (XXth Mar), around or shortly after
> 12:00
> GMT.
>
> If you have any questions or suggestions - be that about the current patch
> queue
> or otherwise, please go ahead.
>
>
> Trivial merge conflicts
> ---
>
> commit fc507dbfd193de5ef09ed2944090e9727820d9ea
> Author: Dave Airlie <airl...@redhat.com>
>
> ac/nir: don't apply slice rounding on txf_ms
>
> (cherry picked from commit 69495b30a38fbb01a937cdea6f7674f89a2e60e7)
>
>
> commit 5fd11359b66c8138d2c7ee29bd9740280b02d1e2
> Author: Daniel Stone <dani...@collabora.com>
>
> egl/wayland: Fix ARGB/XRGB transposition in config map
>
> (cherry picked from commit 4fbd2d50b1c06a3c10f3a254e933646345123751)
>
>
> commit 7f4a0a16284391f802a11717fa74ba5e13cbe43b
> Author: Bas Nieuwenhuizen <ba...@chromium.org>
>
> radeonsi: Export signalled sync file instead of -1.
>
> (cherry picked from commit 5a3404d443e0c6e8e9a44d7f8dccf96c5ac18f0f)
>
> squashed with
>
> configure/meson: Bump libdrm_amdgpu version requirement.
>
> (cherry picked from commit 52be440f48ac7c337f6604846bb6f0cfd88e7118)
>
>
> commit 6ddf838def69036a48524e2f5ae79fb01170e59c
> Author: Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl>
>
> radv: Always lower indirect derefs after nir_lower_global_vars_to_local.
>
> (cherry picked from commit 05d84ed68add9e6adfcc602a274405e04226c1b7)
>
>
> Cheers,
>
> J.A.
>
>
>
> Mesa stable queue
> -
>
> Nominated (9)
> ==
>
> Alex Smith (1):
>   fcf267ba08 radv: Fix CmdCopyImage between uncompressed and compressed
> images
>
> Bas Nieuwenhuizen (1):
>   997306c031 radv: Increase the number of dynamic uniform buffers.
>

To clarify, does "nominated" mean that these will *not* be in 17.3.7? (I
guess so since I don't see them on the branch)

I'd like the above two changes in if possible.

Thanks,
Alex


>
> Dave Airlie (1):
>   5d4fbc2b54 r600: implement callstack workaround for evergreen.
>
> Jordan Justen (2):
>   06e3bd02c0 i965: Hard code CS scratch_ids_per_subslice for Cherryview
>   24b415270f intel/vulkan: Hard code CS scratch_ids_per_subslice for
> Cherryview
>
> Marek Olšák (3):
>   75c5d25f0f radeonsi: align command buffer starting address to fix
> some
> Raven hangs
>   2bdb54bce7 radeonsi: add a workaround for GFX9 hang with init_config
> alignment
>   5d0acff39e configure.ac: blacklist libdrm 2.4.90
>
> Samuel Pitois

Re: [Mesa-dev] [PATCH] radv: Fix CmdCopyImage between uncompressed and compressed images

2018-03-14 Thread Alex Smith
On 13 March 2018 at 19:14, Dave Airlie <airl...@gmail.com> wrote:

> On 13 March 2018 at 01:38, Alex Smith <asm...@feralinteractive.com> wrote:
> > From the spec:
> >
> > "When copying between compressed and uncompressed formats the
> >  extent members represent the texel dimensions of the source
> >  image and not the destination."
> >
> > However, as per 7b890a36, we must still use the destination image type
> > when clamping the extent so that we copy the correct number of layers
> > for 2D to 3D copies.
> >
> > Fixes: 7b890a36 "radv: Fix vkCmdCopyImage for 2d slices into 3d Images"
> > Cc: <mesa-sta...@lists.freedesktop.org>
> > Signed-off-by: Alex Smith <asm...@feralinteractive.com>
>
> Reviewed-by: Dave Airlie <airl...@redhat.com>
>
> Might be worth filing a cts issue to see if someone wants to write
> tests for this sort of hole.
>

Thanks, pushed. CTS issue filed:
https://github.com/KhronosGroup/VK-GL-CTS/issues/90


>
> Dave.
>


[Mesa-dev] [PATCH] radv: Fix CmdCopyImage between uncompressed and compressed images

2018-03-12 Thread Alex Smith
From the spec:

"When copying between compressed and uncompressed formats the
 extent members represent the texel dimensions of the source
 image and not the destination."

However, as per 7b890a36, we must still use the destination image type
when clamping the extent so that we copy the correct number of layers
for 2D to 3D copies.

Fixes: 7b890a36 "radv: Fix vkCmdCopyImage for 2d slices into 3d Images"
Cc: <mesa-sta...@lists.freedesktop.org>
Signed-off-by: Alex Smith <asm...@feralinteractive.com>
---
 src/amd/vulkan/radv_meta_copy.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/src/amd/vulkan/radv_meta_copy.c b/src/amd/vulkan/radv_meta_copy.c
index 2a3faa64f1..a0ef18ee70 100644
--- a/src/amd/vulkan/radv_meta_copy.c
+++ b/src/amd/vulkan/radv_meta_copy.c
@@ -37,10 +37,11 @@ meta_image_block_size(const struct radv_image *image)
  */
 static struct VkExtent3D
 meta_region_extent_el(const struct radv_image *image,
+  const VkImageType imageType,
   const struct VkExtent3D *extent)
 {
const VkExtent3D block = meta_image_block_size(image);
-   return radv_sanitize_image_extent(image->type, (VkExtent3D) {
+   return radv_sanitize_image_extent(imageType, (VkExtent3D) {
.width  = DIV_ROUND_UP(extent->width , block.width),
.height = DIV_ROUND_UP(extent->height, 
block.height),
.depth  = DIV_ROUND_UP(extent->depth , 
block.depth),
@@ -146,11 +147,11 @@ meta_copy_buffer_to_image(struct radv_cmd_buffer 
*cmd_buffer,
pRegions[r].bufferImageHeight : 
pRegions[r].imageExtent.height,
};
const VkExtent3D buf_extent_el =
-   meta_region_extent_el(image, &bufferExtent);
+   meta_region_extent_el(image, image->type, &bufferExtent);
 
/* Start creating blit rect */
const VkExtent3D img_extent_el =
-   meta_region_extent_el(image, &pRegions[r].imageExtent);
+   meta_region_extent_el(image, image->type, &pRegions[r].imageExtent);
struct radv_meta_blit2d_rect rect = {
.width = img_extent_el.width,
.height =  img_extent_el.height,
@@ -259,11 +260,11 @@ meta_copy_image_to_buffer(struct radv_cmd_buffer 
*cmd_buffer,
pRegions[r].bufferImageHeight : 
pRegions[r].imageExtent.height,
};
const VkExtent3D buf_extent_el =
-   meta_region_extent_el(image, &bufferExtent);
+   meta_region_extent_el(image, image->type, &bufferExtent);
 
/* Start creating blit rect */
const VkExtent3D img_extent_el =
-   meta_region_extent_el(image, &pRegions[r].imageExtent);
+   meta_region_extent_el(image, image->type, &pRegions[r].imageExtent);
struct radv_meta_blit2d_rect rect = {
.width = img_extent_el.width,
.height =  img_extent_el.height,
@@ -408,8 +409,18 @@ meta_copy_image(struct radv_cmd_buffer *cmd_buffer,
meta_region_offset_el(dest_image, &pRegions[r].dstOffset);
const VkOffset3D src_offset_el =
meta_region_offset_el(src_image, &pRegions[r].srcOffset);
+
+   /*
+* From Vulkan 1.0.68, "Copying Data Between Images":
+*"When copying between compressed and uncompressed formats
+* the extent members represent the texel dimensions of the
+* source image and not the destination."
+* However, we must use the destination image type to avoid
+* clamping depth when copying multiple layers of a 2D image to
+* a 3D image.
+*/
const VkExtent3D img_extent_el =
-   meta_region_extent_el(dest_image, &pRegions[r].extent);
+   meta_region_extent_el(src_image, dest_image->type, &pRegions[r].extent);
 
/* Start creating blit rect */
struct radv_meta_blit2d_rect rect = {
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radv: Increase the number of dynamic uniform buffers.

2018-03-09 Thread Alex Smith
Tested-by: Alex Smith <asm...@feralinteractive.com>

Thanks!

On 9 March 2018 at 16:21, Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl> wrote:

> The Vulkan API is not ideal as it does not allow us to have a
> shared limit.
>
> Feral needs 15+6 for one of their games, and I'm not a fan
> of overcommitting the limits, so increase the number of
> dynamic uniform buffers to 16.
>
> CC: <mesa-sta...@lists.freedesktop.org>
> CC: Alex Smith <asm...@feralinteractive.com>
> ---
>  src/amd/vulkan/radv_device.c  | 4 ++--
>  src/amd/vulkan/radv_private.h | 4 +++-
>  2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 7a11e08f97..0ed3e27c7b 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -775,9 +775,9 @@ void radv_GetPhysicalDeviceProperties(
> .maxPerStageResources =
> max_descriptor_set_size,
> .maxDescriptorSetSamplers =
> max_descriptor_set_size,
> .maxDescriptorSetUniformBuffers   =
> max_descriptor_set_size,
> -   .maxDescriptorSetUniformBuffersDynamic=
> MAX_DYNAMIC_BUFFERS / 2,
> +   .maxDescriptorSetUniformBuffersDynamic=
> MAX_DYNAMIC_UNIFORM_BUFFERS,
> .maxDescriptorSetStorageBuffers   =
> max_descriptor_set_size,
> -   .maxDescriptorSetStorageBuffersDynamic=
> MAX_DYNAMIC_BUFFERS / 2,
> +   .maxDescriptorSetStorageBuffersDynamic=
> MAX_DYNAMIC_STORAGE_BUFFERS,
> .maxDescriptorSetSampledImages=
> max_descriptor_set_size,
> .maxDescriptorSetStorageImages=
> max_descriptor_set_size,
> .maxDescriptorSetInputAttachments =
> max_descriptor_set_size,
> diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
> index 0f8ddb2e10..439522585a 100644
> --- a/src/amd/vulkan/radv_private.h
> +++ b/src/amd/vulkan/radv_private.h
> @@ -87,7 +87,9 @@ typedef uint32_t xcb_window_t;
>  #define MAX_DISCARD_RECTANGLES 4
>  #define MAX_PUSH_CONSTANTS_SIZE 128
>  #define MAX_PUSH_DESCRIPTORS 32
> -#define MAX_DYNAMIC_BUFFERS 16
> +#define MAX_DYNAMIC_UNIFORM_BUFFERS 16
> +#define MAX_DYNAMIC_STORAGE_BUFFERS 8
> +#define MAX_DYNAMIC_BUFFERS (MAX_DYNAMIC_UNIFORM_BUFFERS + MAX_DYNAMIC_STORAGE_BUFFERS)
>  #define MAX_SAMPLES_LOG2 4
>  #define NUM_META_FS_KEYS 13
>  #define RADV_MAX_DRM_DEVICES 8
> --
> 2.16.1
>
>


Re: [Mesa-dev] [PATCH] radv: Increase maxDescriptorSet{Uniform, Storage}BuffersDynamic limits

2018-03-09 Thread Alex Smith
Ping.

Maybe it'd be better to just increase MAX_DYNAMIC_BUFFERS? I can't see any
side effects of that other than increasing the size of radv_cmd_buffer?

Alex

On 5 March 2018 at 09:59, Alex Smith <asm...@feralinteractive.com> wrote:

> I just checked what Rise of the Tomb Raider is using. Maximum it hits for
> uniform buffers is 15, and 6 for storage buffers. The highest combined
> total is 15.
>
> Alex
>
> On 2 March 2018 at 20:11, Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl>
> wrote:
>
>> Hi Alex,
>>
>> How many do you need of either type?
>>
>> - Bas
>>
>> On Fri, Mar 2, 2018 at 4:28 PM, Alex Smith <asm...@feralinteractive.com>
>> wrote:
>> > These were set to MAX_DYNAMIC_BUFFERS / 2, which is too restrictive
>> > since an app may have its total usage of both uniform and storage
>> > within MAX_DYNAMIC_BUFFERS, but exceed the limit for one of the types.
>> >
>> > Recently the validation layers have started raising errors when
>> > these limits are exceeded, so they fire for something that
>> > actually works just fine.
>> >
>> > Set the limit for both to MAX_DYNAMIC_BUFFERS. Not ideal because it
>> > now allows the total across both to exceed the real limit, but we have
>> > no way to express that limit properly.
>> >
>> > Cc: <mesa-sta...@lists.freedesktop.org>
>> > Signed-off-by: Alex Smith <asm...@feralinteractive.com>
>> > ---
>> >  src/amd/vulkan/radv_device.c | 4 ++--
>> >  1 file changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
>> > index 36d7a406bf..1e81ddb891 100644
>> > --- a/src/amd/vulkan/radv_device.c
>> > +++ b/src/amd/vulkan/radv_device.c
>> > @@ -717,9 +717,9 @@ void radv_GetPhysicalDeviceProperties(
>> > .maxPerStageResources =
>> max_descriptor_set_size,
>> > .maxDescriptorSetSamplers =
>> max_descriptor_set_size,
>> > .maxDescriptorSetUniformBuffers   =
>> max_descriptor_set_size,
>> > -   .maxDescriptorSetUniformBuffersDynamic=
>> MAX_DYNAMIC_BUFFERS / 2,
>> > +   .maxDescriptorSetUniformBuffersDynamic=
>> MAX_DYNAMIC_BUFFERS,
>> > .maxDescriptorSetStorageBuffers   =
>> max_descriptor_set_size,
>> > -   .maxDescriptorSetStorageBuffersDynamic=
>> MAX_DYNAMIC_BUFFERS / 2,
>> > +   .maxDescriptorSetStorageBuffersDynamic=
>> MAX_DYNAMIC_BUFFERS,
>> > .maxDescriptorSetSampledImages=
>> max_descriptor_set_size,
>> > .maxDescriptorSetStorageImages=
>> max_descriptor_set_size,
>> > .maxDescriptorSetInputAttachments =
>> max_descriptor_set_size,
>> > --
>> > 2.14.3
>> >
>>
>
>


Re: [Mesa-dev] [ANNOUNCE] mesa 18.0.0-rc4

2018-03-07 Thread Alex Smith
Hi Emil,

Just wondering what the status of the 18.0 release is? It's been almost a
month since the last RC.

Thanks,
Alex

On 5 March 2018 at 17:28, Ernst Sjöstrand  wrote:

> Are there a lot of new patches queued up before the 18.0 release? In
> that case you could push them to a branch (before tagging)
> and it could be given a quick pre-test... Not sure how that ties in
> with the current release process.
>
> Regards
> //Ernst
>
> 2018-02-09 3:27 GMT+01:00 Emil Velikov :
> > The fourth release candidate for Mesa 18.0.0 is now available.
> >
> >
> > Andres Gomez (1):
> >   i965: perform 2 uploads with dual slot *64*PASSTHRU formats on
> gen<8
> >
> > Bas Nieuwenhuizen (1):
> >   radv: Signal fence correctly after sparse binding.
> >
> > Dave Airlie (4):
> >   r600/eg: construct proper rat mask for image/buffers.
> >   r600/sb: insert the else clause when we might depart from a loop
> >   radv/gfx9: fix block compression texture views. (v2)
> >   virgl: also remove dimension on indirect.
> >
> > Dylan Baker (2):
> >   meson: Don't confuse the install and search paths for dri drivers
> >   meson: Check for actual LLVM required versions
> >
> > Emil Velikov (3):
> >   radv: Stop advertising VK_KHX_multiview
> >   cherry-ignore: radv: Don't expose VK_KHX_multiview on android.
> >   Update version to 18.0.0-rc4
> >
> > Eric Anholt (1):
> >   mesa: Drop incorrect A4B4G4R4
> > _mesa_format_matches_format_and_type() cases.
> >
> > George Kyriazis (2):
> >   meson/swr: re-shuffle generated files
> >   meson/swr: Updated copyright dates
> >
> > Jason Ekstrand (3):
> >   anv/cmd_buffer: Re-emit the pipeline at every subpass
> >   anv: Stop advertising VK_KHX_multiview
> >   i965: Call prepare_external after implicit window-system MSAA
> resolves
> >
> > Jon Turney (9):
> >   meson: libdrm shouldn't appear in Requires.private: if it wasn't
> found
> >   configure: Default to gbm=no on osx
> >   osx: ld doesn't support --build-id
> >   glx/apple: include util/debug.h for env_var_as_boolean prototype
> >   glx/apple: locate dispatch table functions to wrap by name
> >   glx/test: fix building for osx
> >   travis: conditionalize building of prerequisites on if OS=linux
> >   travis: pip -> pip2
> >   travis: add osx autotools build
> >
> > Jordan Justen (1):
> >   i965: Create new program cache bo when clearing the program cache
> >
> > Kenneth Graunke (1):
> >   i965: Bump official kernel requirement to Linux v3.9.
> >
> > Lucas Stach (1):
> >   renderonly: fix dumb BO allocation for non 32bpp formats
> >
> > Marc Dietrich (1):
> >   meson: don't install windows headers on non-windows platforms
> >
> > Marek Olšák (1):
> >   winsys/amdgpu: fix assertion failure with UVD and VCE rings
> >
> > Matthew Nicholls (1):
> >   radv: remove predication on cache flushes
> >
> > Michel Dänzer (1):
> >   winsys/radeon: Compute is_displayable in surf_drm_to_winsys
> >
> > Rafael Antognolli (2):
> >   anv/gen10: Emit CS stall and mark push constants dirty.
> >   i965/gen10: Use CS Stall instead of WriteImmediate.
> >
> > Roland Scheidegger (1):
> >   r600: don't do stack workarounds for hemlock
> >
> > Stephan Gerhold (1):
> >   util/build-id: Fix address comparison for binaries with LOAD vaddr
> > 0
> >
> > Tapani Pälli (3):
> >   i965: fix prog_data leak in brw_disk_cache
> >   i965: fix disk_cache leak when destroying context
> >   nir: mark unused space in packed_tex_data
> >
> > Timothy Arceri (1):
> >   st/shader_cache: restore num_tgsi_tokens when loading from cache
> >
> > git tag: mesa-18.0.0-rc4
> >
> > https://mesa.freedesktop.org/archive/mesa-18.0.0-rc4.tar.gz
> > MD5:  b09a18dd7a0ab9c3f55a820b5a25  mesa-18.0.0-rc4.tar.gz
> > SHA1: b0785f1b2328e3d68a5e01cc00663301bb0829c2  mesa-18.0.0-rc4.tar.gz
> > SHA256: e5d038a58221fe9c62b3f784df67210e87df81559032cfee7648d51bdce3a356
> >  mesa-18.0.0-rc4.tar.gz
> > SHA512: e7d7898ab8b4788adaf34d7eb924f893314acd2dbf4970d71ecfa8e44b86
> 62e1e924db97365ddb7b8fc3bd0744ba82dd1942439582cb685b54e86f00f8e51310
> >  mesa-18.0.0-rc4.tar.gz
> > PGP:  https://mesa.freedesktop.org/archive/mesa-18.0.0-rc4.tar.gz.sig
> >
> > https://mesa.freedesktop.org/archive/mesa-18.0.0-rc4.tar.xz
> > MD5:  195889b71ee88785d55b03d99e0034d3  mesa-18.0.0-rc4.tar.xz
> > SHA1: ea24546b10f1a089c25c992d01083d7eb005ec44  mesa-18.0.0-rc4.tar.xz
> > SHA256: ad575becea192f04403b6783492955f395dd8faad7e51cbcbad203be70eb9075
> >  mesa-18.0.0-rc4.tar.xz
> > SHA512: 91dd0a4396715a7896fc47aabf38c4b486df3b50c9764795805550ef0172
> 4d2e2281ba9b000e82760ea0e199c58d8c9943dbc732b2adab46554ff5c2f9e2ece1
> >  mesa-18.0.0-rc4.tar.xz
> > PGP:  https://mesa.freedesktop.org/archive/mesa-18.0.0-rc4.tar.xz.sig

Re: [Mesa-dev] [PATCH] radv: Increase maxDescriptorSet{Uniform, Storage}BuffersDynamic limits

2018-03-05 Thread Alex Smith
I just checked what Rise of the Tomb Raider is using. Maximum it hits for
uniform buffers is 15, and 6 for storage buffers. The highest combined
total is 15.

Alex

On 2 March 2018 at 20:11, Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl> wrote:

> Hi Alex,
>
> How many do you need of either type?
>
> - Bas
>
> On Fri, Mar 2, 2018 at 4:28 PM, Alex Smith <asm...@feralinteractive.com>
> wrote:
> > These were set to MAX_DYNAMIC_BUFFERS / 2, which is too restrictive
> > since an app may have its total usage of both uniform and storage
> > within MAX_DYNAMIC_BUFFERS, but exceed the limit for one of the types.
> >
> > Recently the validation layers have started raising errors when
> > these limits are exceeded, so they fire for something that
> > actually works just fine.
> >
> > Set the limit for both to MAX_DYNAMIC_BUFFERS. Not ideal because it
> > now allows the total across both to exceed the real limit, but we have
> > no way to express that limit properly.
> >
> > Cc: <mesa-sta...@lists.freedesktop.org>
> > Signed-off-by: Alex Smith <asm...@feralinteractive.com>
> > ---
> >  src/amd/vulkan/radv_device.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> > index 36d7a406bf..1e81ddb891 100644
> > --- a/src/amd/vulkan/radv_device.c
> > +++ b/src/amd/vulkan/radv_device.c
> > @@ -717,9 +717,9 @@ void radv_GetPhysicalDeviceProperties(
> > .maxPerStageResources =
> max_descriptor_set_size,
> > .maxDescriptorSetSamplers =
> max_descriptor_set_size,
> > .maxDescriptorSetUniformBuffers   =
> max_descriptor_set_size,
> > -   .maxDescriptorSetUniformBuffersDynamic=
> MAX_DYNAMIC_BUFFERS / 2,
> > +   .maxDescriptorSetUniformBuffersDynamic=
> MAX_DYNAMIC_BUFFERS,
> > .maxDescriptorSetStorageBuffers   =
> max_descriptor_set_size,
> > -   .maxDescriptorSetStorageBuffersDynamic=
> MAX_DYNAMIC_BUFFERS / 2,
> > +   .maxDescriptorSetStorageBuffersDynamic=
> MAX_DYNAMIC_BUFFERS,
> > .maxDescriptorSetSampledImages=
> max_descriptor_set_size,
> > .maxDescriptorSetStorageImages=
> max_descriptor_set_size,
> > .maxDescriptorSetInputAttachments =
> max_descriptor_set_size,
> > --
> > 2.14.3
> >
>


Re: [Mesa-dev] [Mesa-stable] [PATCH] [RFC] gallivm: Use new LLVM fast-math-flags API

2018-03-05 Thread Alex Smith
Hi Emil,

On 2 March 2018 at 18:38, Emil Velikov <emil.l.veli...@gmail.com> wrote:

> Hi Alex,
>
> On 28 February 2018 at 15:25, Alex Smith <asm...@feralinteractive.com>
> wrote:
> > Hi,
> >
> > Could this (commit 5d61fa4e68b7eb6d481a37efdbb35fdce675a6ad on master)
> be
> > backported to the 17.3 branch to allow it to build with LLVM 6?
> >
> Normally we don't aim to support LLVM versions released after the .0
> Mesa release is out.
> Not that we don't want to - there is simply not enough testing happening.
>
> Sometimes picking the odd build fix is enough, but not always.
>

From my (not particularly extensive) testing, with just this compile fix
radeonsi and radv appear to work OK (radeonsi is functional enough to run
my desktop and radv can run a full game).

It'd be nice to have it able to compile, even if not officially supported.

Thanks,
Alex


>
> As a matter of fact, the only feedback for the AMD drivers status
> (brokenness) is the LunarG testing rig.
>
> Michel, you are usually more realistic/conservative with this kind
> of change.
> Any objections?
>
> Thanks
> Emil
>


[Mesa-dev] [PATCH] radv: Increase maxDescriptorSet{Uniform, Storage}BuffersDynamic limits

2018-03-02 Thread Alex Smith
These were set to MAX_DYNAMIC_BUFFERS / 2, which is too restrictive
since an app may have its total usage of both uniform and storage
within MAX_DYNAMIC_BUFFERS, but exceed the limit for one of the types.

Recently the validation layers have started raising errors when these
limits are exceeded, so they fire for something that actually works
just fine.

Set the limit for both to MAX_DYNAMIC_BUFFERS. Not ideal because it
now allows the total across both to exceed the real limit, but we have
no way to express that limit properly.
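The mismatch described above can be made concrete with a toy model (the constant and helper names are invented for illustration, not radv code):

```c
#include <assert.h>

#define MAX_DYNAMIC_BUFFERS 16 /* the shared hardware pool */

/* What the validation layers check against the two independently
 * advertised per-type limits. */
static int usage_fits_advertised(int dyn_ubo, int dyn_ssbo)
{
   return dyn_ubo <= MAX_DYNAMIC_BUFFERS && dyn_ssbo <= MAX_DYNAMIC_BUFFERS;
}

/* The real shared-pool constraint, which the API cannot express. */
static int usage_fits_hardware(int dyn_ubo, int dyn_ssbo)
{
   return dyn_ubo + dyn_ssbo <= MAX_DYNAMIC_BUFFERS;
}
```

An app using 15 dynamic uniform and 6 dynamic storage buffers now passes the advertised limits even though the combined total exceeds the shared pool; the later patch splitting the pool into 16 uniform + 8 storage slots removes that overcommit.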

Cc: <mesa-sta...@lists.freedesktop.org>
Signed-off-by: Alex Smith <asm...@feralinteractive.com>
---
 src/amd/vulkan/radv_device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 36d7a406bf..1e81ddb891 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -717,9 +717,9 @@ void radv_GetPhysicalDeviceProperties(
.maxPerStageResources = 
max_descriptor_set_size,
.maxDescriptorSetSamplers = 
max_descriptor_set_size,
.maxDescriptorSetUniformBuffers   = 
max_descriptor_set_size,
-   .maxDescriptorSetUniformBuffersDynamic= MAX_DYNAMIC_BUFFERS 
/ 2,
+   .maxDescriptorSetUniformBuffersDynamic= MAX_DYNAMIC_BUFFERS,
.maxDescriptorSetStorageBuffers   = 
max_descriptor_set_size,
-   .maxDescriptorSetStorageBuffersDynamic= MAX_DYNAMIC_BUFFERS 
/ 2,
+   .maxDescriptorSetStorageBuffersDynamic= MAX_DYNAMIC_BUFFERS,
.maxDescriptorSetSampledImages= 
max_descriptor_set_size,
.maxDescriptorSetStorageImages= 
max_descriptor_set_size,
.maxDescriptorSetInputAttachments = 
max_descriptor_set_size,
-- 
2.14.3



Re: [Mesa-dev] [PATCH] radv: do not set pending_reset_query in BeginCommandBuffer()

2018-03-01 Thread Alex Smith
Reviewed-by: Alex Smith <asm...@feralinteractive.com>

On 1 March 2018 at 09:53, Samuel Pitoiset <samuel.pitoi...@gmail.com> wrote:

> This is just useless for two reasons:
> 1) flush_bits is not set accordingly, so nothing will be flushed
>in BeginQuery().
> 2) we always flush caches in EndCommandBuffer(), so if a reset
>is done in a previous command buffer we are safe.
>
> Cc: Alex Smith <asm...@feralinteractive.com>
> Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
> ---
>  src/amd/vulkan/radv_cmd_buffer.c | 7 ---
>  1 file changed, 7 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_
> buffer.c
> index cfdc531acd..2b41baea3d 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -1930,13 +1930,6 @@ VkResult radv_BeginCommandBuffer(
>
> cmd_buffer->status = RADV_CMD_BUFFER_STATUS_RECORDING;
>
> -   /* Force cache flushes before starting a new query in case the
> -* corresponding pool has been resetted from a different command
> -* buffer. This is because we have to flush caches between reset
> and
> -* begin if the compute shader path has been used.
> -*/
> -   cmd_buffer->pending_reset_query = true;
> -
> return result;
>  }
>
> --
> 2.16.2
>
>


Re: [Mesa-dev] [PATCH v3] radv: make sure to emit cache flushes before starting a query

2018-03-01 Thread Alex Smith
Hi Samuel,

On 28 February 2018 at 20:47, Samuel Pitoiset 
wrote:

> If the query pool has previously been reset using the compute
> shader path.
>
> v3: set pending_reset_query only for the compute shader path
> v2: handle multiple commands buffers with same pool
>
> Fixes: a41e2e9cf5 ("radv: allow to use a compute shader for resetting the
> query pool")
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105292
> Cc: "18.0" 
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_cmd_buffer.c |  7 +++
>  src/amd/vulkan/radv_private.h|  5 +
>  src/amd/vulkan/radv_query.c  | 28 +---
>  3 files changed, 33 insertions(+), 7 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_
> buffer.c
> index 2b41baea3d..cfdc531acd 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -1930,6 +1930,13 @@ VkResult radv_BeginCommandBuffer(
>
> cmd_buffer->status = RADV_CMD_BUFFER_STATUS_RECORDING;
>
> +   /* Force cache flushes before starting a new query in case the
> +* corresponding pool has been resetted from a different command
> +* buffer. This is because we have to flush caches between reset
> and
> +* begin if the compute shader path has been used.
> +*/
> +   cmd_buffer->pending_reset_query = true;
>

Since this just ends up calling si_emit_cache_flush, doesn't flush_bits
need to be set accordingly for it to actually do anything? If the reset is
done in a previous command buffer, I think the flush would already have
been done in EndCommandBuffer on that?

Thanks,
Alex
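For reference, the flag protocol under discussion reduces to something like this toy model (names mirror the patch, but the logic is illustrative only, not the real driver):

```c
#include <assert.h>
#include <stdbool.h>

/* A compute-shader reset leaves caches dirty; the next BeginQuery that
 * might observe the pool must flush first. */
struct toy_cmd_buffer { bool pending_reset_query; int flushes; };

static void reset_query_pool(struct toy_cmd_buffer *cb)
{
   cb->pending_reset_query = true;   /* compute path wrote the pool */
}

static void begin_query(struct toy_cmd_buffer *cb)
{
   if (cb->pending_reset_query) {
      cb->flushes++;                 /* stands in for si_emit_cache_flush() */
      cb->pending_reset_query = false;
   }
}
```

Setting the flag unconditionally in BeginCommandBuffer forces a flush on the first BeginQuery of every command buffer; the question raised above is whether that flush actually does anything without the matching flush_bits, given that a reset in a previous command buffer is already covered by the flush in EndCommandBuffer.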

+
> return result;
>  }
>
> diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
> index c72df5a737..b76d2eb5cb 100644
> --- a/src/amd/vulkan/radv_private.h
> +++ b/src/amd/vulkan/radv_private.h
> @@ -1003,6 +1003,11 @@ struct radv_cmd_buffer {
> uint32_t gfx9_fence_offset;
> struct radeon_winsys_bo *gfx9_fence_bo;
> uint32_t gfx9_fence_idx;
> +
> +   /**
> +* Whether a query pool has been resetted and we have to flush
> caches.
> +*/
> +   bool pending_reset_query;
>  };
>
>  struct radv_image;
> diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c
> index ace745e4e6..b1393a2ec7 100644
> --- a/src/amd/vulkan/radv_query.c
> +++ b/src/amd/vulkan/radv_query.c
> @@ -1058,17 +1058,23 @@ void radv_CmdResetQueryPool(
>  {
> RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
> RADV_FROM_HANDLE(radv_query_pool, pool, queryPool);
> -   struct radv_cmd_state *state = &cmd_buffer->state;
> +   uint32_t flush_bits = 0;
>
> -   state->flush_bits |= radv_fill_buffer(cmd_buffer, pool->bo,
> - firstQuery * pool->stride,
> - queryCount * pool->stride,
> 0);
> +   flush_bits |= radv_fill_buffer(cmd_buffer, pool->bo,
> +  firstQuery * pool->stride,
> +  queryCount * pool->stride, 0);
>
> if (pool->type == VK_QUERY_TYPE_TIMESTAMP ||
> pool->type == VK_QUERY_TYPE_PIPELINE_STATISTICS) {
> -   state->flush_bits |= radv_fill_buffer(cmd_buffer, pool->bo,
> -
>  pool->availability_offset + firstQuery * 4,
> - queryCount * 4, 0);
> +   flush_bits |= radv_fill_buffer(cmd_buffer, pool->bo,
> +  pool->availability_offset +
> firstQuery * 4,
> +  queryCount * 4, 0);
> +   }
> +
> +   if (flush_bits) {
> +   /* Only need to flush caches for the compute shader path.
> */
> +   cmd_buffer->pending_reset_query = true;
> +   cmd_buffer->state.flush_bits |= flush_bits;
> }
>  }
>
> @@ -1086,6 +1092,14 @@ void radv_CmdBeginQuery(
>
> radv_cs_add_buffer(cmd_buffer->device->ws, cs, pool->bo, 8);
>
> +   if (cmd_buffer->pending_reset_query) {
> +   /* Make sure to flush caches if the query pool has been
> +* previously resetted using the compute shader path.
> +*/
> +   si_emit_cache_flush(cmd_buffer);
> +   cmd_buffer->pending_reset_query = false;
> +   }
> +
> switch (pool->type) {
> case VK_QUERY_TYPE_OCCLUSION:
> radeon_check_space(cmd_buffer->device->ws, cs, 7);
> --
> 2.16.2
>
>

Re: [Mesa-dev] [PATCH] [RFC] gallivm: Use new LLVM fast-math-flags API

2018-02-28 Thread Alex Smith
Hi,

Could this (commit 5d61fa4e68b7eb6d481a37efdbb35fdce675a6ad on master) be
backported to the 17.3 branch to allow it to build with LLVM 6?

Thanks,
Alex

On 6 November 2017 at 21:16, Marek Olšák  wrote:

> Reviewed-by: Marek Olšák 
>
> Marek
>
> On Mon, Nov 6, 2017 at 10:09 PM, Tobias Droste  wrote:
> > LLVM 6 changed the API on the fast-math-flags:
> > https://reviews.llvm.org/rL317488
> >
> > NOTE: This also enables the new flag 'ApproxFunc' to allow for
> > approximations for library functions (sin, cos, ...). I'm not completely
> > convinced that this is something mesa should do.
> >
> > Signed-off-by: Tobias Droste 
> > ---
> >  src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> > index d988910a7e..1319407290 100644
> > --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> > +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> > @@ -830,7 +830,11 @@ lp_create_builder(LLVMContextRef ctx, enum
> lp_float_mode float_mode)
> >llvm::unwrap(builder)->setFastMathFlags(flags);
> >break;
> > case LP_FLOAT_MODE_UNSAFE_FP_MATH:
> > +#if HAVE_LLVM >= 0x0600
> > +  flags.setFast();
> > +#else
> >flags.setUnsafeAlgebra();
> > +#endif
> >llvm::unwrap(builder)->setFastMathFlags(flags);
> >break;
> > }
> > --
> > 2.14.3
> >
>


Re: [Mesa-dev] [Mesa-stable] [PATCH] anv/pipeline: Don't look at blend state unless we have an attachment

2018-01-26 Thread Alex Smith
That commit is "anv/pipeline: Don't assert on more than 32 samplers"?
https://cgit.freedesktop.org/mesa/mesa/commit/?id=4b69ba381766cd911eb1284f1b0332a139ec8a75

On 25 January 2018 at 22:53, Jason Ekstrand  wrote:

> It landed as 4b69ba381766cd911eb1284f1b0332a139ec8a75
>
> On Thu, Jan 25, 2018 at 3:27 AM, Emil Velikov 
> wrote:
>
>> On 18 January 2018 at 01:16, Jason Ekstrand  wrote:
>> > Without this, we may end up dereferencing blend before we check for
>> > binding->index != UINT32_MAX.  However, Vulkan allows the blend state to
>> > be NULL so long as you don't have any color attachments.  This fixes a
>> > segfault when running The Talos Principal.
>> >
>> > Fixes: 12f4e00b69e724a23504b7bd3958fb75dc462950
>> > Cc: mesa-sta...@lists.freedesktop.org
>> > ---
>> Jason, did this fall through the cracks or it has been
>> superseded/rejected for some reason?
>>
>> -Emil
>>
>
>
>
>


Re: [Mesa-dev] [PATCH 3/4] ac/nir: set amdgpu.uniform and invariant.load for UBOs

2018-01-25 Thread Alex Smith
Tested-by: Alex Smith <asm...@feralinteractive.com>

This fixes a regression seen after 41c36c45 ("amd/common: use
ac_build_buffer_load() for emitting UBO loads").

On 24 January 2018 at 22:26, Samuel Pitoiset <samuel.pitoi...@gmail.com>
wrote:

> UBOs are constant buffers.
>
> Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
> ---
>  src/amd/common/ac_nir_to_llvm.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_
> llvm.c
> index 1a52367602..07089349e2 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -4568,8 +4568,14 @@ static LLVMValueRef radv_load_ssbo(struct
> ac_shader_abi *abi,
>  static LLVMValueRef radv_load_ubo(struct ac_shader_abi *abi, LLVMValueRef
> buffer_ptr)
>  {
> struct nir_to_llvm_context *ctx = nir_to_llvm_context_from_abi(
> abi);
> +   LLVMValueRef result;
>
> -   return LLVMBuildLoad(ctx->builder, buffer_ptr, "");
> +   LLVMSetMetadata(buffer_ptr, ctx->ac.uniform_md_kind,
> ctx->ac.empty_md);
> +
> +   result = LLVMBuildLoad(ctx->builder, buffer_ptr, "");
> +   LLVMSetMetadata(result, ctx->ac.invariant_load_md_kind,
> ctx->ac.empty_md);
> +
> +   return result;
>  }
>
>  static LLVMValueRef radv_get_sampler_desc(struct ac_shader_abi *abi,
> --
> 2.16.0
>
>


Re: [Mesa-dev] [PATCH] radv: restore previous stencil reference after depth-stencil clear

2018-01-19 Thread Alex Smith
Reviewed-by: Alex Smith <asm...@feralinteractive.com>

On 19 January 2018 at 15:14, Samuel Pitoiset <samuel.pitoi...@gmail.com>
wrote:

> Reviewed-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
>
>
> On 01/19/2018 03:11 PM, Matthew Nicholls wrote:
>
>> Cc: mesa-sta...@lists.freedesktop.org
>> ---
>>   src/amd/vulkan/radv_meta_clear.c | 6 ++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/src/amd/vulkan/radv_meta_clear.c
>> b/src/amd/vulkan/radv_meta_clear.c
>> index b42ecedfc9..98fb8fa6a7 100644
>> --- a/src/amd/vulkan/radv_meta_clear.c
>> +++ b/src/amd/vulkan/radv_meta_clear.c
>> @@ -624,6 +624,7 @@ emit_depthstencil_clear(struct radv_cmd_buffer
>> *cmd_buffer,
>>   VK_SHADER_STAGE_VERTEX_BIT, 0, 4,
>>   &clear_value.depth);
>> + uint32_t prev_reference = cmd_buffer->state.dynamic.stencil_reference.front;
>> if (aspects & VK_IMAGE_ASPECT_STENCIL_BIT) {
>> radv_CmdSetStencilReference(cmd_buffer_h,
>> VK_STENCIL_FACE_FRONT_BIT,
>>   clear_value.stencil);
>> @@ -658,6 +659,11 @@ emit_depthstencil_clear(struct radv_cmd_buffer
>> *cmd_buffer,
>> radv_CmdSetScissor(radv_cmd_buffer_to_handle(cmd_buffer), 0, 1,
>> &clear_rect->rect);
>> radv_CmdDraw(cmd_buffer_h, 3, clear_rect->layerCount, 0,
>> clear_rect->baseArrayLayer);
>> +
>> +   if (aspects & VK_IMAGE_ASPECT_STENCIL_BIT) {
>> +   radv_CmdSetStencilReference(cmd_buffer_h,
>> VK_STENCIL_FACE_FRONT_BIT,
>> + prev_reference);
>> +   }
>>   }
>> static bool
>>
>>


Re: [Mesa-dev] [PATCH] anv/pipeline: Don't look at blend state unless we have an attachment

2018-01-18 Thread Alex Smith
Oops, sorry about that.

Reviewed-by: Alex Smith <asm...@feralinteractive.com>

On 18 January 2018 at 01:16, Jason Ekstrand <ja...@jlekstrand.net> wrote:

> Without this, we may end up dereferencing blend before we check for
> binding->index != UINT32_MAX.  However, Vulkan allows the blend state to
> be NULL so long as you don't have any color attachments.  This fixes a
> segfault when running The Talos Principal.
>
> Fixes: 12f4e00b69e724a23504b7bd3958fb75dc462950
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/intel/vulkan/genX_pipeline.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_
> pipeline.c
> index cfc3bea..e8ac7c6 100644
> --- a/src/intel/vulkan/genX_pipeline.c
> +++ b/src/intel/vulkan/genX_pipeline.c
> @@ -1345,10 +1345,10 @@ has_color_buffer_write_enabled(const struct
> anv_pipeline *pipeline,
>if (binding->set != ANV_DESCRIPTOR_SET_COLOR_ATTACHMENTS)
>   continue;
>
> -  const VkPipelineColorBlendAttachmentState *a =
> - &blend->pAttachments[binding->index];
> +  if (binding->index == UINT32_MAX)
> + continue;
>
> -  if (binding->index != UINT32_MAX && a->colorWriteMask != 0)
> +  if (blend->pAttachments[binding->index].colorWriteMask != 0)
>   return true;
> }
>
> --
> 2.5.0.400.gff86faf
>
>


Re: [Mesa-dev] [PATCH 2/2] [rfc] radv: inline push constants where possible. (v2)

2018-01-12 Thread Alex Smith
Looks like it's working fine here now. One comment inline below.

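Dave's approach below (loading small push-constant ranges directly into spare user SGPRs instead of through the upload buffer) can be modeled roughly like this; the constant and function are invented for illustration, not the real per-ASIC accounting:

```c
#include <assert.h>

#define AVAILABLE_SGPRS 16 /* invented budget, not the real per-ASIC value */

/* If the shader's push constants fit in the SGPRs left over after other
 * user data, inline them as shader arguments and return how many dwords
 * were inlined; otherwise fall back to the memory path (return 0). */
static int num_inline_push_consts(int used_sgprs, int push_const_dwords)
{
   int remaining = AVAILABLE_SGPRS - used_sgprs;
   return push_const_dwords <= remaining ? push_const_dwords : 0;
}
```

Inlining saves the reads from the upload buffer that meta shaders would otherwise issue for every push-constant access.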
On 12 January 2018 at 02:43, Dave Airlie <airl...@gmail.com> wrote:

> From: Dave Airlie <airl...@redhat.com>
>
> Instead of putting the push constants into the upload buffer,
> if we have space in the sgprs we can upload the per-stage
> constants into the shaders directly.
>
> This saves a few reads from memory in the meta shaders,
> we should also be able to inline other objects like
> descriptors.
>
> v2: fixup 16->available_sgprs (Samuel)
> fixup dynamic offsets. (Alex)
> bump to 12.
> handle push consts > 32 better, avoid F1 2017 crash
>
> TODO: proper vega support (Samuel)
>
> Signed-off-by: Dave Airlie <airl...@redhat.com>
> ---
>  src/amd/common/ac_nir_to_llvm.c  | 102 ++
> +
>  src/amd/common/ac_nir_to_llvm.h  |   5 ++
>  src/amd/common/ac_shader_info.c  |   5 +-
>  src/amd/common/ac_shader_info.h  |   1 +
>  src/amd/vulkan/radv_cmd_buffer.c |  75 +---
>  5 files changed, 159 insertions(+), 29 deletions(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index 6c578de3aca..00ad76a82f7 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -92,6 +92,7 @@ struct nir_to_llvm_context {
> LLVMValueRef descriptor_sets[AC_UD_MAX_SETS];
> LLVMValueRef ring_offsets;
> LLVMValueRef push_constants;
> +   LLVMValueRef inline_push_consts[AC_UD_MAX_INLINE_PUSH_CONST];
> LLVMValueRef view_index;
> LLVMValueRef num_work_groups;
> LLVMValueRef workgroup_ids[3];
> @@ -243,7 +244,7 @@ static void set_llvm_calling_convention(LLVMValueRef
> func,
> LLVMSetFunctionCallConv(func, calling_conv);
>  }
>
> -#define MAX_ARGS 23
> +#define MAX_ARGS 32
>  struct arg_info {
> LLVMTypeRef types[MAX_ARGS];
> LLVMValueRef *assign[MAX_ARGS];
> @@ -538,6 +539,8 @@ struct user_sgpr_info {
> bool need_ring_offsets;
> uint8_t sgpr_count;
> bool indirect_all_descriptor_sets;
> +   uint8_t base_inline_push_consts;
> +   uint8_t num_inline_push_consts;
>  };
>
>  static void allocate_user_sgprs(struct nir_to_llvm_context *ctx,
> @@ -609,8 +612,49 @@ static void allocate_user_sgprs(struct
> nir_to_llvm_context *ctx,
> } else {
> user_sgpr_info->sgpr_count += 
> util_bitcount(ctx->shader_info->info.desc_set_used_mask)
> * 2;
> }
> +
> +   if (ctx->shader_info->info.loads_push_constants) {
> +   uint32_t remaining_sgprs = available_sgprs -
> user_sgpr_info->sgpr_count;
> +   if (!ctx->shader_info->info.has_indirect_push_constants &&
> +   !ctx->shader_info->info.loads_dynamic_offsets)
> +   remaining_sgprs += 2;
> +
> +   if (ctx->options->layout->push_constant_size) {
> +   uint8_t num_32bit_push_consts =
> (ctx->shader_info->info.max_push_constant_used -
> +
> ctx->shader_info->info.min_push_constant_used) / 4;
> +
> +   if ((ctx->shader_info->info.min_push_constant_used
> / 4) <= 63 &&
> +   (ctx->shader_info->info.max_push_constant_used
> / 4) <= 63) {
> +   user_sgpr_info->base_inline_push_consts =
> ctx->shader_info->info.min_push_constant_used / 4;
> +
> +   if (num_32bit_push_consts <
> remaining_sgprs) {
> +   user_sgpr_info->num_inline_push_consts
> = num_32bit_push_consts;
> +   if (!ctx->shader_info->info.has_indirect_push_constants)
> +   ctx->shader_info->info.loads_push_constants = false;
> +   } else {
> +   user_sgpr_info->num_inline_push_consts
> = remaining_sgprs;
> +   }
> +
> +   if (user_sgpr_info->num_inline_push_consts
> > AC_UD_MAX_INLINE_PUSH_CONST)
> +   user_sgpr_info->num_inline_push_consts
> = AC_UD_MAX_INLINE_PUSH_CONST;
>

This is done after possibly setting loads_push_constants to false; if the
clamp kicks in after that, shouldn't loads_push_constants still be true?
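The ordering issue raised above can be reduced to a standalone sketch (simplified, hypothetical names; MAX_INLINE_PUSH_CONSTS stands in for AC_UD_MAX_INLINE_PUSH_CONST): the clamp to the hardware maximum has to happen before deciding whether the shader can skip loading push constants from memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for AC_UD_MAX_INLINE_PUSH_CONST. */
#define MAX_INLINE_PUSH_CONSTS 8

/* Simplified model of the SGPR allocation being reviewed. The clamp to the
 * hardware maximum is applied *before* deciding whether the shader can skip
 * loading push constants from memory; otherwise constants beyond the clamp
 * would become unreachable. */
static bool
needs_push_constant_load(uint8_t num_32bit_consts, uint8_t remaining_sgprs)
{
   uint8_t num_inline = num_32bit_consts < remaining_sgprs
                           ? num_32bit_consts : remaining_sgprs;

   /* Clamp first... */
   if (num_inline > MAX_INLINE_PUSH_CONSTS)
      num_inline = MAX_INLINE_PUSH_CONSTS;

   /* ...then only drop the memory load if everything really fits inline. */
   return num_inline < num_32bit_consts;
}
```

For example, with 10 dwords of push constants and 12 free SGPRs, clearing loads_push_constants before the clamp would leave two dwords unreachable once the count is clamped to 8; the ordering above still reports that a memory load is needed.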

With that fixed, for the series:

Reviewed-by: Alex Smith <asm...@feralinteractive.com>

Thanks,
Alex


> +   }
> +   }
> +   }
>  }
>
> +static void
> +declare_inline_push_consts(struct nir_to

Re: [Mesa-dev] [PATCH 2/2] radv: inline push constants where possible.

2018-01-11 Thread Alex Smith
Hi Dave,

This seems to cause some breakage when both push constants and dynamic
descriptors are used.

I've commented 2 fixes inline below needed to avoid a crash, but with those
F1 2017 will still hang pretty quick before the main menu, not sure why so
far. Mad Max is OK but that doesn't use dynamic descriptors, so I presume
the problem is related to that.

On 11 January 2018 at 03:03, Dave Airlie  wrote:

> From: Dave Airlie 
>
> Instead of putting the push constants into the upload buffer,
> if we have space in the sgprs we can upload the per-stage
> constants into the shaders directly.
>
> This saves a few reads from memory in the meta shaders,
> we should also be able to inline other objects like
> descriptors.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/amd/common/ac_nir_to_llvm.c  | 93 ++
> ++
>  src/amd/common/ac_nir_to_llvm.h  |  4 ++
>  src/amd/common/ac_shader_info.c  |  5 ++-
>  src/amd/common/ac_shader_info.h  |  1 +
>  src/amd/vulkan/radv_cmd_buffer.c | 74 
>  5 files changed, 150 insertions(+), 27 deletions(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index c00220a9c3..818ce40168 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -92,6 +92,7 @@ struct nir_to_llvm_context {
> LLVMValueRef descriptor_sets[AC_UD_MAX_SETS];
> LLVMValueRef ring_offsets;
> LLVMValueRef push_constants;
> +   LLVMValueRef inline_push_consts[AC_UD_MAX_INLINE_PUSH_CONST];
> LLVMValueRef view_index;
> LLVMValueRef num_work_groups;
> LLVMValueRef workgroup_ids[3];
> @@ -243,7 +244,7 @@ static void set_llvm_calling_convention(LLVMValueRef
> func,
> LLVMSetFunctionCallConv(func, calling_conv);
>  }
>
> -#define MAX_ARGS 23
> +#define MAX_ARGS 32
>  struct arg_info {
> LLVMTypeRef types[MAX_ARGS];
> LLVMValueRef *assign[MAX_ARGS];
> @@ -538,6 +539,8 @@ struct user_sgpr_info {
> bool need_ring_offsets;
> uint8_t sgpr_count;
> bool indirect_all_descriptor_sets;
> +   uint8_t base_inline_push_consts;
> +   uint8_t num_inline_push_consts;
>  };
>
>  static void allocate_user_sgprs(struct nir_to_llvm_context *ctx,
> @@ -609,8 +612,45 @@ static void allocate_user_sgprs(struct
> nir_to_llvm_context *ctx,
> } else {
> user_sgpr_info->sgpr_count += 
> util_bitcount(ctx->shader_info->info.desc_set_used_mask)
> * 2;
> }
> +
> +   if (ctx->shader_info->info.loads_push_constants) {
> +   uint32_t remaining_sgprs = 16 - user_sgpr_info->sgpr_count;
> +   if (!ctx->shader_info->info.has_indirect_push_constants &&
> +   !ctx->shader_info->info.loads_dynamic_offsets)
> +   remaining_sgprs += 2;
> +
> +   if (ctx->options->layout->push_constant_size) {
> +   uint8_t num_32bit_push_consts =
> (ctx->shader_info->info.max_push_constant_used -
> +
> ctx->shader_info->info.min_push_constant_used) / 4;
> +   user_sgpr_info->base_inline_push_consts =
> ctx->shader_info->info.min_push_constant_used / 4;
> +
> +   if (num_32bit_push_consts < remaining_sgprs) {
> +   user_sgpr_info->num_inline_push_consts =
> num_32bit_push_consts;
> +   if (!ctx->shader_info->info.has_indirect_push_constants)
> +   ctx->shader_info->info.loads_push_constants = false;
> +   } else {
> +   user_sgpr_info->num_inline_push_consts =
> remaining_sgprs;
> +   }
> +
> +   if (user_sgpr_info->num_inline_push_consts >
> AC_UD_MAX_INLINE_PUSH_CONST)
> +   user_sgpr_info->num_inline_push_consts =
> AC_UD_MAX_INLINE_PUSH_CONST;
> +   }
> +   }
>  }
>
> +static void
> +declare_inline_push_consts(struct nir_to_llvm_context *ctx,
> +  gl_shader_stage stage,
> +  const struct user_sgpr_info *user_sgpr_info,
> +  struct arg_info *args)
> +{
> +   ctx->shader_info->inline_push_const_mask = (1 <<
> user_sgpr_info->num_inline_push_consts) - 1;
> +   ctx->shader_info->inline_push_const_mask <<=
> user_sgpr_info->base_inline_push_consts;
> +
> +   for (unsigned i = 0; i < user_sgpr_info->num_inline_push_consts;
> i++)
> +   add_arg(args, ARG_SGPR, ctx->ac.i32,
> &ctx->inline_push_consts[i]);
> +
> +}
>  static void
>  declare_global_input_sgprs(struct nir_to_llvm_context *ctx,
>gl_shader_stage stage,
> @@ -644,6 +684,9 @@ declare_global_input_sgprs(struct nir_to_llvm_context
> *ctx,
> /* 1 for push constants and dynamic descriptors */
>   

Re: [Mesa-dev] [PATCH] anv: Make sure state on primary is correct after CmdExecuteCommands

2018-01-09 Thread Alex Smith
Thanks Jason. I take it you didn't find any other state that needed
resetting then?

Alex

On 9 January 2018 at 16:49, Jason Ekstrand <ja...@jlekstrand.net> wrote:

> From: Alex Smith <asm...@feralinteractive.com>
>
> After executing a secondary command buffer, we need to update certain
> state on the primary command buffer to reflect changes by the secondary.
> Otherwise subsequent commands may not have the correct state set.
>
> This fixes various issues (rendering errors, GPU hangs) seen after
> executing secondary command buffers in some cases.
>
> v2 (Jason Ekstrand):
>  - Reset to invalid values instead of pulling from the secondary
>  - Change the comment to be more descriptive
>
> Signed-off-by: Alex Smith <asm...@feralinteractive.com>
> Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net>
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/intel/vulkan/genX_cmd_buffer.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c
> b/src/intel/vulkan/genX_cmd_buffer.c
> index b7253d5..4b73ac8 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -1078,6 +1078,15 @@ genX(CmdExecuteCommands)(
>anv_cmd_buffer_add_secondary(primary, secondary);
> }
>
> +   /* The secondary may have selected a different pipeline (3D or
> compute) and
> +* may have changed the current L3$ configuration.  Reset our tracking
> +* variables to invalid values to ensure that we re-emit these in the
> case
> +* where we do any draws or compute dispatches from the primary after
> the
> +* secondary has returned.
> +*/
> +   primary->state.current_pipeline = UINT32_MAX;
> +   primary->state.current_l3_config = NULL;
> +
> /* Each of the secondary command buffers will use its own state base
>  * address.  We need to re-emit state base address for the primary
> after
>  * all of the secondaries are done.
> --
> 2.5.0.400.gff86faf
>
>


Re: [Mesa-dev] [PATCH] spirv: Use correct type for sampled images

2018-01-06 Thread Alex Smith
On 6 January 2018 at 01:03, Jason Ekstrand <ja...@jlekstrand.net> wrote:

> On Tue, Nov 7, 2017 at 3:08 AM, Alex Smith <asm...@feralinteractive.com>
> wrote:
>
>> Thanks Jason. Can someone push this?
>>
>
> Did you never get push access?
>

I did - this is commit e9eb3c4753e4f56b03d16d8d6f71d49f1e7b97db.

Thanks,
Alex


> --Jason
>
>
>> On 6 November 2017 at 16:21, Jason Ekstrand <ja...@jlekstrand.net> wrote:
>>
>>> On Mon, Nov 6, 2017 at 2:37 AM, Alex Smith <asm...@feralinteractive.com>
>>> wrote:
>>>
>>>> We should use the result type of the OpSampledImage opcode, rather than
>>>> the type of the underlying image/samplers.
>>>>
>>>> This resolves an issue when using separate images and shadow samplers
>>>> with glslang. Example:
>>>>
>>>> layout (...) uniform samplerShadow s0;
>>>> layout (...) uniform texture2D res0;
>>>> ...
>>>> float result = textureLod(sampler2DShadow(res0, s0), uv, 0);
>>>>
>>>> For this, for the combined OpSampledImage, the type of the base image
>>>> was being used (which does not have the Depth flag set, whereas the
>>>> result type does), therefore it was not being recognised as a shadow
>>>> sampler. This led to the wrong LLVM intrinsics being emitted by RADV.
>>>>
>>>
>>> Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net>
>>>
>>>
>>>> Signed-off-by: Alex Smith <asm...@feralinteractive.com>
>>>> Cc: "17.2 17.3" <mesa-sta...@lists.freedesktop.org>
>>>> ---
>>>>  src/compiler/spirv/spirv_to_nir.c  | 10 --
>>>>  src/compiler/spirv/vtn_private.h   |  1 +
>>>>  src/compiler/spirv/vtn_variables.c |  1 +
>>>>  3 files changed, 6 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/src/compiler/spirv/spirv_to_nir.c
>>>> b/src/compiler/spirv/spirv_to_nir.c
>>>> index 6825e0d6a8..93a515d731 100644
>>>> --- a/src/compiler/spirv/spirv_to_nir.c
>>>> +++ b/src/compiler/spirv/spirv_to_nir.c
>>>> @@ -1490,6 +1490,8 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp
>>>> opcode,
>>>>struct vtn_value *val =
>>>>   vtn_push_value(b, w[2], vtn_value_type_sampled_image);
>>>>val->sampled_image = ralloc(b, struct vtn_sampled_image);
>>>> +  val->sampled_image->type =
>>>> + vtn_value(b, w[1], vtn_value_type_type)->type;
>>>>val->sampled_image->image =
>>>>   vtn_value(b, w[3], vtn_value_type_pointer)->pointer;
>>>>val->sampled_image->sampler =
>>>> @@ -1516,16 +1518,12 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp
>>>> opcode,
>>>>sampled = *sampled_val->sampled_image;
>>>> } else {
>>>>assert(sampled_val->value_type == vtn_value_type_pointer);
>>>> +  sampled.type = sampled_val->pointer->type;
>>>>sampled.image = NULL;
>>>>sampled.sampler = sampled_val->pointer;
>>>> }
>>>>
>>>> -   const struct glsl_type *image_type;
>>>> -   if (sampled.image) {
>>>> -  image_type = sampled.image->var->var->interface_type;
>>>> -   } else {
>>>> -  image_type = sampled.sampler->var->var->interface_type;
>>>> -   }
>>>> +   const struct glsl_type *image_type = sampled.type->type;
>>>> const enum glsl_sampler_dim sampler_dim =
>>>> glsl_get_sampler_dim(image_type);
>>>> const bool is_array = glsl_sampler_type_is_array(image_type);
>>>> const bool is_shadow = glsl_sampler_type_is_shadow(image_type);
>>>> diff --git a/src/compiler/spirv/vtn_private.h
>>>> b/src/compiler/spirv/vtn_private.h
>>>> index 84584620fc..6b4645acc8 100644
>>>> --- a/src/compiler/spirv/vtn_private.h
>>>> +++ b/src/compiler/spirv/vtn_private.h
>>>> @@ -411,6 +411,7 @@ struct vtn_image_pointer {
>>>>  };
>>>>
>>>>  struct vtn_sampled_image {
>>>> +   struct vtn_type *type;
>>>> struct vtn_pointer *image; /* Image or array of images */
>>>> struct vtn_pointer *sampler; /* Sampler */
>>>>  };
>>>> diff --git a/src/compiler/spirv/vtn_variables.c
>>>> b/src/compiler/spirv/vtn_variables.c
>>>> index 1cf9d597cf..9a69b4f6fc 100644
>>>> --- a/src/compiler/spirv/vtn_variables.c
>>>> +++ b/src/compiler/spirv/vtn_variables.c
>>>> @@ -1805,6 +1805,7 @@ vtn_handle_variables(struct vtn_builder *b, SpvOp
>>>> opcode,
>>>>   struct vtn_value *val =
>>>>  vtn_push_value(b, w[2], vtn_value_type_sampled_image);
>>>>   val->sampled_image = ralloc(b, struct vtn_sampled_image);
>>>> + val->sampled_image->type = base_val->sampled_image->type;
>>>>   val->sampled_image->image =
>>>>  vtn_pointer_dereference(b, base_val->sampled_image->image,
>>>> chain);
>>>>   val->sampled_image->sampler = base_val->sampled_image->sampl
>>>> er;
>>>> --
>>>> 2.13.6
>>>>
>>>>
>>>
>>
>


Re: [Mesa-dev] [PATCH v2] anv: Allow PMA optimization to be enabled in secondary command buffers

2018-01-05 Thread Alex Smith
On 5 January 2018 at 21:43, Jason Ekstrand <ja...@jlekstrand.net> wrote:

> On Fri, Jan 5, 2018 at 9:06 AM, Alex Smith <asm...@feralinteractive.com>
> wrote:
>
>> This was never enabled in secondary buffers because hiz_enabled was
>> never set to true for those.
>>
>> If the app provides a framebuffer in the inheritance info when beginning
>> a secondary buffer, we can determine if HiZ is enabled and therefore
>> allow the PMA optimization to be enabled within the command buffer.
>>
>> This improves performance by ~13% on an internal benchmark on Skylake.
>>
>
> Are you sure this is Sky Lake and not Broadwell?  We've never measured the
> stencil PMA to help anything before.  Neat!
>

Yep, this was on HD 530 specifically.


> Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net>
>
>
>> v2: Use anv_cmd_buffer_get_depth_stencil_view().
>>
>> Signed-off-by: Alex Smith <asm...@feralinteractive.com>
>> Reviewed-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
>> ---
>>  src/intel/vulkan/genX_cmd_buffer.c | 22 +-
>>  1 file changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/intel/vulkan/genX_cmd_buffer.c
>> b/src/intel/vulkan/genX_cmd_buffer.c
>> index 0bd3874db7..b7253d5251 100644
>> --- a/src/intel/vulkan/genX_cmd_buffer.c
>> +++ b/src/intel/vulkan/genX_cmd_buffer.c
>> @@ -977,11 +977,31 @@ genX(BeginCommandBuffer)(
>>   anv_render_pass_from_handle(pBeginInfo->pInheritanceInfo->renderPass);
>>cmd_buffer->state.subpass =
>>   &cmd_buffer->state.pass->subpasses[pBeginInfo->pInheritanceInfo->subpass];
>> -  cmd_buffer->state.framebuffer = NULL;
>> +
>> +  /* This is optional in the inheritance info. */
>> +  cmd_buffer->state.framebuffer =
>> + anv_framebuffer_from_handle(pBeginInfo->pInheritanceInfo->framebuffer);
>>
>
> FYI: The only reason why we were always setting framebuffer to NULL for
> secondaries was because most of the CTS tests specify a framebuffer and I
> wanted our testing to run without the framebuffer because that's the more
> difficult case.  (It's easy to forget that framebuffer may be NULL.)  I
> knew it was a potential performance issue but no one was heavily using
> secondaries in the wild up until now.  Thanks for getting this working!
>

All of our Vulkan games use secondary command buffers, but we also always
set the framebuffer in the inheritance info :)


>
>
>>
>>result = genX(cmd_buffer_setup_attachments)(cmd_buffer,
>>
>>  cmd_buffer->state.pass, NULL);
>>
>> +  /* Record that HiZ is enabled if we can. */
>> +  if (cmd_buffer->state.framebuffer) {
>> + const struct anv_image_view * const iview =
>> +anv_cmd_buffer_get_depth_stencil_view(cmd_buffer);
>> +
>> + if (iview) {
>> +VkImageLayout layout =
>> +cmd_buffer->state.subpass->depth_stencil_attachment.layout;
>> +
>> +enum isl_aux_usage aux_usage =
>> +   anv_layout_to_aux_usage(&cmd_buffer->device->info, iview->image,
>> +   VK_IMAGE_ASPECT_DEPTH_BIT, layout);
>> +
>> +cmd_buffer->state.hiz_enabled = aux_usage == ISL_AUX_USAGE_HIZ;
>> + }
>> +  }
>> +
>>cmd_buffer->state.dirty |= ANV_CMD_DIRTY_RENDER_TARGETS;
>> }
>>
>> --
>> 2.13.6
>>
>>
>
>


[Mesa-dev] [PATCH v2] anv: Allow PMA optimization to be enabled in secondary command buffers

2018-01-05 Thread Alex Smith
This was never enabled in secondary buffers because hiz_enabled was
never set to true for those.

If the app provides a framebuffer in the inheritance info when beginning
a secondary buffer, we can determine if HiZ is enabled and therefore
allow the PMA optimization to be enabled within the command buffer.

This improves performance by ~13% on an internal benchmark on Skylake.

v2: Use anv_cmd_buffer_get_depth_stencil_view().

Signed-off-by: Alex Smith <asm...@feralinteractive.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
---
 src/intel/vulkan/genX_cmd_buffer.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 0bd3874db7..b7253d5251 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -977,11 +977,31 @@ genX(BeginCommandBuffer)(
  anv_render_pass_from_handle(pBeginInfo->pInheritanceInfo->renderPass);
   cmd_buffer->state.subpass =
      &cmd_buffer->state.pass->subpasses[pBeginInfo->pInheritanceInfo->subpass];
-  cmd_buffer->state.framebuffer = NULL;
+
+  /* This is optional in the inheritance info. */
+  cmd_buffer->state.framebuffer =
+     anv_framebuffer_from_handle(pBeginInfo->pInheritanceInfo->framebuffer);
 
   result = genX(cmd_buffer_setup_attachments)(cmd_buffer,
                                               cmd_buffer->state.pass, NULL);
 
+  /* Record that HiZ is enabled if we can. */
+  if (cmd_buffer->state.framebuffer) {
+ const struct anv_image_view * const iview =
+anv_cmd_buffer_get_depth_stencil_view(cmd_buffer);
+
+ if (iview) {
+VkImageLayout layout =
+cmd_buffer->state.subpass->depth_stencil_attachment.layout;
+
+enum isl_aux_usage aux_usage =
+   anv_layout_to_aux_usage(&cmd_buffer->device->info, iview->image,
+   VK_IMAGE_ASPECT_DEPTH_BIT, layout);
+
+cmd_buffer->state.hiz_enabled = aux_usage == ISL_AUX_USAGE_HIZ;
+ }
+  }
+
   cmd_buffer->state.dirty |= ANV_CMD_DIRTY_RENDER_TARGETS;
}
 
-- 
2.13.6



Re: [Mesa-dev] [PATCH] anv: Allow PMA optimization to be enabled in secondary command buffers

2018-01-05 Thread Alex Smith
Thanks, that change makes it look a bit tidier - I'll send a v2 and wait to
see what Jason thinks.

On 5 January 2018 at 16:06, Lionel Landwerlin <lionel.g.landwer...@intel.com
> wrote:

> This makes sense to me, it would be good to have Jason's opinion.
> I have a suggestion below.
>
> Reviewed-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
>
> Thanks!
>
>
> On 05/01/18 11:20, Alex Smith wrote:
>
>> This was never enabled in secondary buffers because hiz_enabled was
>> never set to true for those.
>>
>> If the app provides a framebuffer in the inheritance info when beginning
>> a secondary buffer, we can determine if HiZ is enabled and therefore
>> allow the PMA optimization to be enabled within the command buffer.
>>
>> This improves performance by ~13% on an internal benchmark on Skylake.
>>
>> Signed-off-by: Alex Smith <asm...@feralinteractive.com>
>> ---
>>   src/intel/vulkan/genX_cmd_buffer.c | 22 +-
>>   1 file changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/intel/vulkan/genX_cmd_buffer.c
>> b/src/intel/vulkan/genX_cmd_buffer.c
>> index 0bd3874db7..2036151249 100644
>> --- a/src/intel/vulkan/genX_cmd_buffer.c
>> +++ b/src/intel/vulkan/genX_cmd_buffer.c
>> @@ -977,11 +977,31 @@ genX(BeginCommandBuffer)(
>>anv_render_pass_from_handle(pBeginInfo->pInheritanceInfo->renderPass);
>> cmd_buffer->state.subpass =
>>&cmd_buffer->state.pass->subpasses[pBeginInfo->pInheritanceInfo->subpass];
>> -  cmd_buffer->state.framebuffer = NULL;
>> +
>> +  /* This is optional in the inheritance info. */
>> +  cmd_buffer->state.framebuffer =
>> + anv_framebuffer_from_handle(pBeginInfo->pInheritanceInfo->framebuffer);
>>   result = genX(cmd_buffer_setup_attachments)(cmd_buffer,
>>
>> cmd_buffer->state.pass, NULL);
>>   +  /* Record that HiZ is enabled if we can. */
>> +  if (cmd_buffer->state.framebuffer) {
>>
>
> You might be able to knock a few lines below by reusing
> anv_cmd_buffer_get_depth_stencil_view().
>
>
> + const VkAttachmentReference * const ds =
>> +&cmd_buffer->state.subpass->depth_stencil_attachment;
>> +
>> + if (ds->attachment != VK_ATTACHMENT_UNUSED) {
>> +const struct anv_image_view * const iview =
>> +   cmd_buffer->state.framebuffer->attachments[ds->attachment];
>> +const struct anv_image * const image = iview->image;
>> +enum isl_aux_usage aux_usage =
>> +   anv_layout_to_aux_usage(&cmd_buffer->device->info, image,
>> +   VK_IMAGE_ASPECT_DEPTH_BIT, ds->layout);
>> +
>> +cmd_buffer->state.hiz_enabled = aux_usage == ISL_AUX_USAGE_HIZ;
>> + }
>> +  }
>> +
>> cmd_buffer->state.dirty |= ANV_CMD_DIRTY_RENDER_TARGETS;
>>  }
>>
>>
>
>
>


[Mesa-dev] [PATCH] anv: Allow PMA optimization to be enabled in secondary command buffers

2018-01-05 Thread Alex Smith
This was never enabled in secondary buffers because hiz_enabled was
never set to true for those.

If the app provides a framebuffer in the inheritance info when beginning
a secondary buffer, we can determine if HiZ is enabled and therefore
allow the PMA optimization to be enabled within the command buffer.

This improves performance by ~13% on an internal benchmark on Skylake.

Signed-off-by: Alex Smith <asm...@feralinteractive.com>
---
 src/intel/vulkan/genX_cmd_buffer.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 0bd3874db7..2036151249 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -977,11 +977,31 @@ genX(BeginCommandBuffer)(
  anv_render_pass_from_handle(pBeginInfo->pInheritanceInfo->renderPass);
   cmd_buffer->state.subpass =
      &cmd_buffer->state.pass->subpasses[pBeginInfo->pInheritanceInfo->subpass];
-  cmd_buffer->state.framebuffer = NULL;
+
+  /* This is optional in the inheritance info. */
+  cmd_buffer->state.framebuffer =
+     anv_framebuffer_from_handle(pBeginInfo->pInheritanceInfo->framebuffer);
 
   result = genX(cmd_buffer_setup_attachments)(cmd_buffer,
                                               cmd_buffer->state.pass, NULL);
 
+  /* Record that HiZ is enabled if we can. */
+  if (cmd_buffer->state.framebuffer) {
+ const VkAttachmentReference * const ds =
+        &cmd_buffer->state.subpass->depth_stencil_attachment;
+
+ if (ds->attachment != VK_ATTACHMENT_UNUSED) {
+const struct anv_image_view * const iview =
+   cmd_buffer->state.framebuffer->attachments[ds->attachment];
+const struct anv_image * const image = iview->image;
+enum isl_aux_usage aux_usage =
+   anv_layout_to_aux_usage(&cmd_buffer->device->info, image,
+   VK_IMAGE_ASPECT_DEPTH_BIT, ds->layout);
+
+cmd_buffer->state.hiz_enabled = aux_usage == ISL_AUX_USAGE_HIZ;
+ }
+  }
+
   cmd_buffer->state.dirty |= ANV_CMD_DIRTY_RENDER_TARGETS;
}
 
-- 
2.13.6



[Mesa-dev] [PATCH] anv: Take write mask into account in has_color_buffer_write_enabled

2018-01-04 Thread Alex Smith
If we have a color attachment, but its writes are masked, this would
have still returned true. This is inconsistent with how HasWriteableRT
in 3DSTATE_PS_BLEND is set, which does take the mask into account.

This could lead to PixelShaderHasUAV not being set in 3DSTATE_PS_EXTRA
if the fragment shader does use UAVs, meaning the fragment shader may
not be invoked because HasWriteableRT is false. Specifically, this was
seen to occur when the shader also enables early fragment tests: the
fragment shader was not invoked despite passing depth/stencil.

Fix by taking the color write mask into account in this function. This
is consistent with how things are done on i965.
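The corrected check can be modelled standalone (minimal sketch with simplified, hypothetical structures; not the real anv/Vulkan types): a surface only counts as a writeable RT if it is a bound color attachment *and* its write mask is non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-ins for the real anv/Vulkan structures. */
struct color_blend_attachment {
   uint32_t color_write_mask;
};

struct surface_binding {
   bool     is_color_attachment;
   uint32_t index; /* UINT32_MAX means "unused attachment" */
};

/* Returns true only if some bound color attachment actually has writes
 * enabled, mirroring how HasWriteableRT is derived for 3DSTATE_PS_BLEND. */
static bool
has_writeable_rt(const struct surface_binding *bindings, int count,
                 const struct color_blend_attachment *attachments)
{
   for (int i = 0; i < count; i++) {
      if (!bindings[i].is_color_attachment)
         continue;
      if (bindings[i].index == UINT32_MAX)
         continue;
      if (attachments[bindings[i].index].color_write_mask != 0)
         return true;
   }
   return false;
}
```

A fully masked attachment (mask 0) now reports false, matching HasWriteableRT, so PixelShaderHasUAV can be force-enabled for shaders with side effects.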

Signed-off-by: Alex Smith <asm...@feralinteractive.com>
Cc: mesa-sta...@lists.freedesktop.org
---
 src/intel/vulkan/genX_pipeline.c | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 0ae9ead587..cfc3bea426 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -1330,7 +1330,8 @@ emit_3dstate_gs(struct anv_pipeline *pipeline)
 }
 
 static bool
-has_color_buffer_write_enabled(const struct anv_pipeline *pipeline)
+has_color_buffer_write_enabled(const struct anv_pipeline *pipeline,
+                               const VkPipelineColorBlendStateCreateInfo *blend)
 {
const struct anv_shader_bin *shader_bin =
   pipeline->shaders[MESA_SHADER_FRAGMENT];
@@ -1339,10 +1340,15 @@ has_color_buffer_write_enabled(const struct 
anv_pipeline *pipeline)
 
   const struct anv_pipeline_bind_map *bind_map = &shader_bin->bind_map;
for (int i = 0; i < bind_map->surface_count; i++) {
-  if (bind_map->surface_to_descriptor[i].set !=
-  ANV_DESCRIPTOR_SET_COLOR_ATTACHMENTS)
+      struct anv_pipeline_binding *binding = &bind_map->surface_to_descriptor[i];
+
+  if (binding->set != ANV_DESCRIPTOR_SET_COLOR_ATTACHMENTS)
  continue;
-  if (bind_map->surface_to_descriptor[i].index != UINT32_MAX)
+
+  const VkPipelineColorBlendAttachmentState *a =
+         &blend->pAttachments[binding->index];
+
+  if (binding->index != UINT32_MAX && a->colorWriteMask != 0)
  return true;
}
 
@@ -1351,6 +1357,7 @@ has_color_buffer_write_enabled(const struct anv_pipeline 
*pipeline)
 
 static void
 emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
+const VkPipelineColorBlendStateCreateInfo *blend,
 const VkPipelineMultisampleStateCreateInfo *multisample)
 {
const struct brw_wm_prog_data *wm_prog_data = get_wm_prog_data(pipeline);
@@ -1395,7 +1402,7 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct 
anv_subpass *subpass,
  if (wm.PixelShaderComputedDepthMode != PSCDEPTH_OFF ||
  wm_prog_data->has_side_effects ||
  wm.PixelShaderKillsPixel ||
- has_color_buffer_write_enabled(pipeline))
+ has_color_buffer_write_enabled(pipeline, blend))
 wm.ThreadDispatchEnable = true;
 
  if (samples > 1) {
@@ -1520,7 +1527,8 @@ emit_3dstate_ps(struct anv_pipeline *pipeline,
 #if GEN_GEN >= 8
 static void
 emit_3dstate_ps_extra(struct anv_pipeline *pipeline,
-  struct anv_subpass *subpass)
+  struct anv_subpass *subpass,
+  const VkPipelineColorBlendStateCreateInfo *blend)
 {
const struct brw_wm_prog_data *wm_prog_data = get_wm_prog_data(pipeline);
 
@@ -1575,7 +1583,7 @@ emit_3dstate_ps_extra(struct anv_pipeline *pipeline,
* attachments, we need to force-enable here.
*/
   if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
-  !has_color_buffer_write_enabled(pipeline))
+  !has_color_buffer_write_enabled(pipeline, blend))
  ps.PixelShaderHasUAV = true;
 
 #if GEN_GEN >= 9
@@ -1705,10 +1713,11 @@ genX(graphics_pipeline_create)(
emit_3dstate_hs_te_ds(pipeline, pCreateInfo->pTessellationState);
emit_3dstate_gs(pipeline);
emit_3dstate_sbe(pipeline);
-   emit_3dstate_wm(pipeline, subpass, pCreateInfo->pMultisampleState);
+   emit_3dstate_wm(pipeline, subpass, pCreateInfo->pColorBlendState,
+   pCreateInfo->pMultisampleState);
emit_3dstate_ps(pipeline, pCreateInfo->pColorBlendState);
 #if GEN_GEN >= 8
-   emit_3dstate_ps_extra(pipeline, subpass);
+   emit_3dstate_ps_extra(pipeline, subpass, pCreateInfo->pColorBlendState);
emit_3dstate_vf_topology(pipeline);
 #endif
emit_3dstate_vf_statistics(pipeline);
-- 
2.13.6



[Mesa-dev] [PATCH] anv: Make sure state on primary is correct after CmdExecuteCommands

2018-01-04 Thread Alex Smith
After executing a secondary command buffer, we need to update certain
state on the primary command buffer to reflect changes by the secondary.
Otherwise subsequent commands may not have the correct state set.

This fixes various issues (rendering errors, GPU hangs) seen after
executing secondary command buffers in some cases.
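The failure mode can be sketched independently of anv (toy model, hypothetical names): a command buffer caches "last emitted" hardware state to elide redundant programming, so any state a secondary mutates must be reflected back into the primary's cache or the cache lies.

```c
#include <assert.h>
#include <stdint.h>

#define PIPELINE_3D      0u
#define PIPELINE_COMPUTE 1u

/* Toy model of per-command-buffer cached hardware state. */
struct cmd_state {
   uint32_t current_pipeline;
};

/* Emits PIPELINE_SELECT only when the cached value differs; returns 1 if
 * a command was emitted, 0 if it was elided as redundant. */
static int
flush_pipeline_select(struct cmd_state *s, uint32_t pipeline)
{
   if (s->current_pipeline == pipeline)
      return 0;
   s->current_pipeline = pipeline;
   return 1;
}

/* After executing a secondary, the primary adopts the secondary's final
 * state so its cache matches what the GPU is actually left in. */
static void
execute_secondary(struct cmd_state *primary, const struct cmd_state *secondary)
{
   primary->current_pipeline = secondary->current_pipeline;
}
```

Without execute_secondary(), a primary that was last on 3D would elide the re-emit after a secondary switched to compute, leaving the hardware on the wrong pipeline for subsequent draws.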

Signed-off-by: Alex Smith <asm...@feralinteractive.com>
Cc: mesa-sta...@lists.freedesktop.org
---
 src/intel/vulkan/genX_cmd_buffer.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 0bd3874db7..f6129f9d67 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -1056,6 +1056,14 @@ genX(CmdExecuteCommands)(
   }
 
   anv_cmd_buffer_add_secondary(primary, secondary);
+
+  /* Make sure state on the primary reflects any state that was set by
+   * the secondary. If the secondary didn't set these, they will be at
+   * their default values, so we will re-set them next time they're
+   * needed on the primary.
+   */
+  primary->state.current_pipeline = secondary->state.current_pipeline;
+  primary->state.current_l3_config = secondary->state.current_l3_config;
}
 
/* Each of the secondary command buffers will use its own state base
-- 
2.13.6



[Mesa-dev] [PATCH] anv: Add missing unlock in anv_scratch_pool_alloc

2018-01-04 Thread Alex Smith
Fixes hangs seen due to the lock not being released here.
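The bug class is the familiar early return while still holding a lock. A minimal sketch of the fixed shape, with hypothetical names (cache_mutex, get_or_create) standing in for the real scratch-pool code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t cache_mutex = PTHREAD_MUTEX_INITIALIZER;
static int  cached_value;
static bool cached;

/* Every path out of the critical section must release the mutex, including
 * the early "already allocated" return that the patch fixes (42 stands in
 * for the real BO allocation). */
static int
get_or_create(void)
{
   pthread_mutex_lock(&cache_mutex);

   if (cached) {
      int v = cached_value;
      pthread_mutex_unlock(&cache_mutex); /* the unlock the patch adds */
      return v;
   }

   cached_value = 42;
   cached = true;

   pthread_mutex_unlock(&cache_mutex);
   return cached_value;
}
```

Without the unlock on the early-return path, the second caller to hit the cached case leaves the mutex held forever, which is exactly the hang described above.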

Signed-off-by: Alex Smith <asm...@feralinteractive.com>
Cc: mesa-sta...@lists.freedesktop.org
---
 src/intel/vulkan/anv_allocator.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 33bd3c68c5..fe14d6cfab 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -1088,8 +1088,10 @@ anv_scratch_pool_alloc(struct anv_device *device, struct 
anv_scratch_pool *pool,
   pthread_mutex_lock(&pool->mutex);
 
__sync_synchronize();
-   if (bo->exists)
+   if (bo->exists) {
+      pthread_mutex_unlock(&pool->mutex);
       return &bo->bo;
+   }
 
const struct anv_physical_device *physical_device =
      &device->instance->physicalDevice;
-- 
2.13.6



Re: [Mesa-dev] radv: gfx9 3d image fixes

2017-12-21 Thread Alex Smith
Nice - this does fix the issue I was seeing, thanks.

Can at least patches 2 and 3 go to stable?

On 21 December 2017 at 01:50, Dave Airlie  wrote:

> This series fixes about 340 CTS tests on Vega that involve 3D images.
>
> The two main things are to use 3D samplers for copy paths sources that
> are 3D images.
>
> I've also found another bug, and refactors a bit of code at the end.
>
> I've also tested this on a Tonga and the tests don't seem to break.
>
> Dave.
>


Re: [Mesa-dev] [PATCH] radv/gfx9: add 3d sampler image->buffer copy shader. (v2)

2017-12-20 Thread Alex Smith
Looks like blit2d needs this fix as well - been debugging an issue that's
turned out to be due to a corrupted copy of a 3D texture with CmdCopyImage.
I can do that tomorrow.

However, I did notice that with KHR_maintenance1, it seems like creating 2D
views of 3D textures (and binding them to 2D samplers) is expected to work
- see VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT_KHR. Is there no way to make
that work on GFX9?

Thanks,
Alex

On 20 December 2017 at 14:49, Alex Smith <asm...@feralinteractive.com>
wrote:

> Tested-by: Alex Smith <asm...@feralinteractive.com>
>
> Fixes 3D texture contents being captured incorrectly in RenderDoc for me.
>
> On 19 December 2017 at 07:36, Dave Airlie <airl...@gmail.com> wrote:
>
>> From: Dave Airlie <airl...@redhat.com>
>>
>> On GFX9 we must access 3D textures with 3D samplers AFAICS.
>>
>> This fixes:
>> dEQP-VK.api.image_clearing.core.clear_color_image.3d.single_layer
>>
>> on GFX9 for me.
>>
>> v2: fixes a bunch of other tests as well.
>>
>> v1.1: fix tex->sampler_dim to dim
>> v2: send layer in from outside
>>
>> Fixes: e38685cc62e 'Revert "radv: disable support for VEGA for now."'
>> Signed-off-by: Dave Airlie <airl...@redhat.com>
>> ---
>>  src/amd/vulkan/radv_meta_bufimage.c | 87 ++
>> ---
>>  src/amd/vulkan/radv_private.h   |  1 +
>>  2 files changed, 72 insertions(+), 16 deletions(-)
>>
>> diff --git a/src/amd/vulkan/radv_meta_bufimage.c
>> b/src/amd/vulkan/radv_meta_bufimage.c
>> index dfd99aa75f..4a61beef18 100644
>> --- a/src/amd/vulkan/radv_meta_bufimage.c
>> +++ b/src/amd/vulkan/radv_meta_bufimage.c
>> @@ -29,11 +29,15 @@
>>   * Compute queue: implementation also of buffer->image, image->image,
>> and image clear.
>>   */
>>
>> +/* GFX9 needs to use a 3D sampler to access 3D resources, so the shader
>> has the options
>> + * for that.
>> + */
>>  static nir_shader *
>> -build_nir_itob_compute_shader(struct radv_device *dev)
>> +build_nir_itob_compute_shader(struct radv_device *dev, bool is_3d)
>>  {
>> nir_builder b;
>> -   const struct glsl_type *sampler_type =
>> glsl_sampler_type(GLSL_SAMPLER_DIM_2D,
>> +   enum glsl_sampler_dim dim = is_3d ? GLSL_SAMPLER_DIM_3D :
>> GLSL_SAMPLER_DIM_2D;
>> +   const struct glsl_type *sampler_type = glsl_sampler_type(dim,
>>  false,
>>  false,
>>
>>  GLSL_TYPE_FLOAT);
>> @@ -42,7 +46,7 @@ build_nir_itob_compute_shader(struct radv_device *dev)
>>  false,
>>
>>  GLSL_TYPE_FLOAT);
>> nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
>> -   b.shader->info.name = ralloc_strdup(b.shader, "meta_itob_cs");
>> +   b.shader->info.name = ralloc_strdup(b.shader, is_3d ?
>> "meta_itob_cs_3d" : "meta_itob_cs");
>> b.shader->info.cs.local_size[0] = 16;
>> b.shader->info.cs.local_size[1] = 16;
>> b.shader->info.cs.local_size[2] = 1;
>> @@ -69,32 +73,46 @@ build_nir_itob_compute_shader(struct radv_device
>> *dev)
>>
>> nir_intrinsic_instr *offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
>> nir_intrinsic_set_base(offset, 0);
>> -   nir_intrinsic_set_range(offset, 12);
>> +   nir_intrinsic_set_range(offset, 16);
>> offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
>> -   offset->num_components = 2;
>> -   nir_ssa_dest_init(&offset->instr, &offset->dest, 2, 32, "offset");
>> +   offset->num_components = 3;
>> +   nir_ssa_dest_init(&offset->instr, &offset->dest, 3, 32, "offset");
>> nir_builder_instr_insert(&b, &offset->instr);
>>
>> nir_intrinsic_instr *stride = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
>> nir_intrinsic_set_base(stride, 0);
>> -   nir_intrinsic_set_range(stride, 12);
>> -   stride->src[0] = nir_src_for_ssa(nir_imm_int(&b, 8));
>> +   nir_intrinsic_set_range(stride, 16);
>> +   stride->src[0] = nir_src_for_ssa(nir_imm_int(&b, 12));
>> stride->num_components = 1;
>> nir_ssa_dest_init(&stride->instr, &stride->dest, 1, 32, "stride");
>> nir_builder_instr_insert(&b, &stride->instr);
>>
>

Re: [Mesa-dev] [PATCH] radv/gfx9: add 3d sampler image->buffer copy shader. (v2)

2017-12-20 Thread Alex Smith
Tested-by: Alex Smith <asm...@feralinteractive.com>

Fixes 3D texture contents being captured incorrectly in RenderDoc for me.

On 19 December 2017 at 07:36, Dave Airlie <airl...@gmail.com> wrote:

> From: Dave Airlie <airl...@redhat.com>
>
> On GFX9 we must access 3D textures with 3D samplers AFAICS.
>
> This fixes:
> dEQP-VK.api.image_clearing.core.clear_color_image.3d.single_layer
>
> on GFX9 for me.
>
> v2: fixes a bunch of other tests as well.
>
> v1.1: fix tex->sampler_dim to dim
> v2: send layer in from outside
>
> Fixes: e38685cc62e 'Revert "radv: disable support for VEGA for now."'
> Signed-off-by: Dave Airlie <airl...@redhat.com>
> ---
>  src/amd/vulkan/radv_meta_bufimage.c | 87 ++
> ---
>  src/amd/vulkan/radv_private.h   |  1 +
>  2 files changed, 72 insertions(+), 16 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_meta_bufimage.c
> b/src/amd/vulkan/radv_meta_bufimage.c
> index dfd99aa75f..4a61beef18 100644
> --- a/src/amd/vulkan/radv_meta_bufimage.c
> +++ b/src/amd/vulkan/radv_meta_bufimage.c
> @@ -29,11 +29,15 @@
>   * Compute queue: implementation also of buffer->image, image->image, and
> image clear.
>   */
>
> +/* GFX9 needs to use a 3D sampler to access 3D resources, so the shader
> has the options
> + * for that.
> + */
>  static nir_shader *
> -build_nir_itob_compute_shader(struct radv_device *dev)
> +build_nir_itob_compute_shader(struct radv_device *dev, bool is_3d)
>  {
> nir_builder b;
> -   const struct glsl_type *sampler_type = glsl_sampler_type(GLSL_SAMPLER_DIM_2D,
> +   enum glsl_sampler_dim dim = is_3d ? GLSL_SAMPLER_DIM_3D :
> GLSL_SAMPLER_DIM_2D;
> +   const struct glsl_type *sampler_type = glsl_sampler_type(dim,
>  false,
>  false,
>
>  GLSL_TYPE_FLOAT);
> @@ -42,7 +46,7 @@ build_nir_itob_compute_shader(struct radv_device *dev)
>  false,
>
>  GLSL_TYPE_FLOAT);
> nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
> -   b.shader->info.name = ralloc_strdup(b.shader, "meta_itob_cs");
> +   b.shader->info.name = ralloc_strdup(b.shader, is_3d ?
> "meta_itob_cs_3d" : "meta_itob_cs");
> b.shader->info.cs.local_size[0] = 16;
> b.shader->info.cs.local_size[1] = 16;
> b.shader->info.cs.local_size[2] = 1;
> @@ -69,32 +73,46 @@ build_nir_itob_compute_shader(struct radv_device *dev)
>
> nir_intrinsic_instr *offset = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
> nir_intrinsic_set_base(offset, 0);
> -   nir_intrinsic_set_range(offset, 12);
> +   nir_intrinsic_set_range(offset, 16);
> offset->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
> -   offset->num_components = 2;
> -   nir_ssa_dest_init(&offset->instr, &offset->dest, 2, 32, "offset");
> +   offset->num_components = 3;
> +   nir_ssa_dest_init(&offset->instr, &offset->dest, 3, 32, "offset");
> nir_builder_instr_insert(&b, &offset->instr);
>
> nir_intrinsic_instr *stride = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
> nir_intrinsic_set_base(stride, 0);
> -   nir_intrinsic_set_range(stride, 12);
> -   stride->src[0] = nir_src_for_ssa(nir_imm_int(&b, 8));
> +   nir_intrinsic_set_range(stride, 16);
> +   stride->src[0] = nir_src_for_ssa(nir_imm_int(&b, 12));
> stride->num_components = 1;
> nir_ssa_dest_init(&stride->instr, &stride->dest, 1, 32, "stride");
> nir_builder_instr_insert(&b, &stride->instr);
>
> nir_ssa_def *img_coord = nir_iadd(&b, global_id, &offset->dest.ssa);
>
> +   nir_ssa_def *img_coord_3d = NULL;
> +
> +   if (is_3d) {
> +   nir_ssa_def *chans[3];
> +
> +   chans[0] = nir_channel(&b, img_coord, 0);
> +   chans[1] = nir_channel(&b, img_coord, 1);
> +   chans[2] = nir_channel(&b, img_coord, 2);
> +   img_coord_3d = nir_vec(&b, chans, 3);
> +   }
> +
> nir_tex_instr *tex = nir_tex_instr_create(b.shader, 2);
> -   tex->sampler_dim = GLSL_SAMPLER_DIM_2D;
> +   tex->sampler_dim = dim;
> tex->op = nir_texop_txf;
> tex->src[0].src_type = nir_tex_src_coord;
> -   tex->src[0].src = nir_src_for_ssa(nir_channels(&b, img_coord, 0x3));
> +   if (is_3d)
> +   tex->src[0].src = nir_src_for_ssa(nir_channels(&b,
>

Re: [Mesa-dev] [PATCH] Revert "radv: do not load unused gl_LocalInvocationID/gl_WorkGroupID components"

2017-12-15 Thread Alex Smith
Tested-by: Alex Smith <asm...@feralinteractive.com>

On 15 December 2017 at 15:01, Samuel Pitoiset <samuel.pitoi...@gmail.com>
wrote:

> This reverts commit 2294d35b243dee15af15895e876a63b7d22e48cc.
>
> We can't do this without adjusting the input SGPRs/VGPRs logic.
> For now, just revert it. I will send a proper solution later.
>
> It fixes a rendering issue in F1 2017 that CTS didn't catch.
>
> Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
> ---
>  src/amd/vulkan/radv_shader.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
> index ab8ba42511..f96b0c07f1 100644
> --- a/src/amd/vulkan/radv_shader.c
> +++ b/src/amd/vulkan/radv_shader.c
> @@ -395,11 +395,8 @@ radv_fill_shader_variant(struct radv_device *device,
> case MESA_SHADER_COMPUTE: {
> struct ac_shader_info *info = >info.info;
> variant->rsrc2 |=
> -   S_00B84C_TGID_X_EN(info->cs.uses_block_id[0]) |
> -   S_00B84C_TGID_Y_EN(info->cs.uses_block_id[1]) |
> -   S_00B84C_TGID_Z_EN(info->cs.uses_block_id[2]) |
> -   S_00B84C_TIDIG_COMP_CNT(info->cs.uses_thread_id[2] ? 2 :
> -   info->cs.uses_thread_id[1] ? 1 : 0) |
> +   S_00B84C_TGID_X_EN(1) | S_00B84C_TGID_Y_EN(1) |
> +   S_00B84C_TGID_Z_EN(1) | S_00B84C_TIDIG_COMP_CNT(2) |
> S_00B84C_TG_SIZE_EN(info->cs.uses_local_invocation_idx) |
> S_00B84C_LDS_SIZE(variant->config.lds_size);
> break;
> --
> 2.15.1
>


Re: [Mesa-dev] [PATCH] nir/opcodes: Fix constant-folding of bitfield_insert

2017-12-07 Thread Alex Smith
Pushed.

On 6 December 2017 at 17:35, Matt Turner  wrote:

> On Wed, Dec 6, 2017 at 3:55 AM, James Legg 
> wrote:
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104119
> > CC: 
> > CC: Samuel Pitoiset 
> > ---
> >  src/compiler/nir/nir_opcodes.py | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> > index ac7333fe78..278562b2bd 100644
> > --- a/src/compiler/nir/nir_opcodes.py
> > +++ b/src/compiler/nir/nir_opcodes.py
> > @@ -724,12 +724,12 @@ opcode("bitfield_insert", 0, tuint32, [0, 0, 0, 0],
> >  unsigned base = src0, insert = src1;
> >  int offset = src2, bits = src3;
> >  if (bits == 0) {
> > -   dst = 0;
> > +   dst = base;
> >  } else if (offset < 0 || bits < 0 || bits + offset > 32) {
> > dst = 0;
> >  } else {
> > unsigned mask = ((1ull << bits) - 1) << offset;
> > -   dst = (base & ~mask) | ((insert << bits) & mask);
> > +   dst = (base & ~mask) | ((insert << offset) & mask);
> >  }
> >  """)
>
> Reviewed-by: Matt Turner 


[Mesa-dev] [PATCH] radv: Add LLVM version to the device name string

2017-12-06 Thread Alex Smith
Allows apps to determine the LLVM version so that they can decide
whether or not to enable workarounds for LLVM issues.

Signed-off-by: Alex Smith <asm...@feralinteractive.com>
Cc: "17.2 17.3" <mesa-sta...@lists.freedesktop.org>
---
 src/amd/vulkan/radv_device.c  | 61 +--
 src/amd/vulkan/radv_private.h |  2 +-
 2 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 1b7cd35593..2c3c84ee19 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -75,32 +75,43 @@ radv_get_device_uuid(struct radeon_info *info, void *uuid)
ac_compute_device_uuid(info, uuid, VK_UUID_SIZE);
 }
 
-static const char *
-get_chip_name(enum radeon_family family)
+static void
+radv_get_device_name(enum radeon_family family, char *name, size_t name_len)
 {
+   const char *chip_string;
+   char llvm_string[32] = {};
+
switch (family) {
-   case CHIP_TAHITI: return "AMD RADV TAHITI";
-   case CHIP_PITCAIRN: return "AMD RADV PITCAIRN";
-   case CHIP_VERDE: return "AMD RADV CAPE VERDE";
-   case CHIP_OLAND: return "AMD RADV OLAND";
-   case CHIP_HAINAN: return "AMD RADV HAINAN";
-   case CHIP_BONAIRE: return "AMD RADV BONAIRE";
-   case CHIP_KAVERI: return "AMD RADV KAVERI";
-   case CHIP_KABINI: return "AMD RADV KABINI";
-   case CHIP_HAWAII: return "AMD RADV HAWAII";
-   case CHIP_MULLINS: return "AMD RADV MULLINS";
-   case CHIP_TONGA: return "AMD RADV TONGA";
-   case CHIP_ICELAND: return "AMD RADV ICELAND";
-   case CHIP_CARRIZO: return "AMD RADV CARRIZO";
-   case CHIP_FIJI: return "AMD RADV FIJI";
-   case CHIP_POLARIS10: return "AMD RADV POLARIS10";
-   case CHIP_POLARIS11: return "AMD RADV POLARIS11";
-   case CHIP_POLARIS12: return "AMD RADV POLARIS12";
-   case CHIP_STONEY: return "AMD RADV STONEY";
-   case CHIP_VEGA10: return "AMD RADV VEGA";
-   case CHIP_RAVEN: return "AMD RADV RAVEN";
-   default: return "AMD RADV unknown";
-   }
+   case CHIP_TAHITI: chip_string = "AMD RADV TAHITI"; break;
+   case CHIP_PITCAIRN: chip_string = "AMD RADV PITCAIRN"; break;
+   case CHIP_VERDE: chip_string = "AMD RADV CAPE VERDE"; break;
+   case CHIP_OLAND: chip_string = "AMD RADV OLAND"; break;
+   case CHIP_HAINAN: chip_string = "AMD RADV HAINAN"; break;
+   case CHIP_BONAIRE: chip_string = "AMD RADV BONAIRE"; break;
+   case CHIP_KAVERI: chip_string = "AMD RADV KAVERI"; break;
+   case CHIP_KABINI: chip_string = "AMD RADV KABINI"; break;
+   case CHIP_HAWAII: chip_string = "AMD RADV HAWAII"; break;
+   case CHIP_MULLINS: chip_string = "AMD RADV MULLINS"; break;
+   case CHIP_TONGA: chip_string = "AMD RADV TONGA"; break;
+   case CHIP_ICELAND: chip_string = "AMD RADV ICELAND"; break;
+   case CHIP_CARRIZO: chip_string = "AMD RADV CARRIZO"; break;
+   case CHIP_FIJI: chip_string = "AMD RADV FIJI"; break;
+   case CHIP_POLARIS10: chip_string = "AMD RADV POLARIS10"; break;
+   case CHIP_POLARIS11: chip_string = "AMD RADV POLARIS11"; break;
+   case CHIP_POLARIS12: chip_string = "AMD RADV POLARIS12"; break;
+   case CHIP_STONEY: chip_string = "AMD RADV STONEY"; break;
+   case CHIP_VEGA10: chip_string = "AMD RADV VEGA"; break;
+   case CHIP_RAVEN: chip_string = "AMD RADV RAVEN"; break;
+   default: chip_string = "AMD RADV unknown"; break;
+   }
+
+   if (HAVE_LLVM > 0) {
+   snprintf(llvm_string, sizeof(llvm_string),
+" (LLVM %i.%i.%i)", (HAVE_LLVM >> 8) & 0xff,
+HAVE_LLVM & 0xff, MESA_LLVM_VERSION_PATCH);
+   }
+
+   snprintf(name, name_len, "%s%s", chip_string, llvm_string);
 }
 
 static void
@@ -215,7 +226,7 @@ radv_physical_device_init(struct radv_physical_device *device,
device->local_fd = fd;
device->ws->query_info(device->ws, &device->rad_info);

-   device->name = get_chip_name(device->rad_info.family);
+   radv_get_device_name(device->rad_info.family, device->name, sizeof(device->name));

if (radv_device_get_cache_uuid(device->rad_info.family, device->cache_uuid)) {
device->ws->destroy(device->ws);
diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
index 67c2011107..3edfda6b12 100644
--- a/src/amd/vulkan/radv_private.h
+++ b/src/amd/vulka

Re: [Mesa-dev] [PATCH] nir/spirv: tg4 requires a sampler

2017-11-07 Thread Alex Smith
Good point :) Will do soon.

On 7 November 2017 at 15:46, Jason Ekstrand <ja...@jlekstrand.net> wrote:

> Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net>
>
> Given that you already have 27 non-trivial commits in mesa, I think now
> would be a good time to apply for commit access.  :-)
>
> On Tue, Nov 7, 2017 at 2:52 AM, Alex Smith <asm...@feralinteractive.com>
> wrote:
>
>> Gather operations in both GLSL and SPIR-V require a sampler. Fixes
>> gathers returning garbage when using separate texture/samplers (on AMD,
>> was using an invalid sampler descriptor).
>>
>> Signed-off-by: Alex Smith <asm...@feralinteractive.com>
>> Cc: "17.2 17.3" <mesa-sta...@lists.freedesktop.org>
>> ---
>>  src/compiler/nir/nir.h| 1 -
>>  src/compiler/spirv/spirv_to_nir.c | 2 +-
>>  2 files changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> index 0174c30504..9c804c62bd 100644
>> --- a/src/compiler/nir/nir.h
>> +++ b/src/compiler/nir/nir.h
>> @@ -1214,7 +1214,6 @@ typedef struct {
>>  *- nir_texop_txf_ms
>>  *- nir_texop_txs
>>  *- nir_texop_lod
>> -*- nir_texop_tg4
>>  *- nir_texop_query_levels
>>  *- nir_texop_texture_samples
>>  *- nir_texop_samples_identical
>> diff --git a/src/compiler/spirv/spirv_to_nir.c
>> b/src/compiler/spirv/spirv_to_nir.c
>> index 93a515d731..027efab88d 100644
>> --- a/src/compiler/spirv/spirv_to_nir.c
>> +++ b/src/compiler/spirv/spirv_to_nir.c
>> @@ -1755,6 +1755,7 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp
>> opcode,
>> case nir_texop_txb:
>> case nir_texop_txl:
>> case nir_texop_txd:
>> +   case nir_texop_tg4:
>>/* These operations require a sampler */
>>instr->sampler = nir_deref_var_clone(sampler, instr);
>>break;
>> @@ -1762,7 +1763,6 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp
>> opcode,
>> case nir_texop_txf_ms:
>> case nir_texop_txs:
>> case nir_texop_lod:
>> -   case nir_texop_tg4:
>> case nir_texop_query_levels:
>> case nir_texop_texture_samples:
>> case nir_texop_samples_identical:
>> --
>> 2.13.6
>>
>>
>


Re: [Mesa-dev] [PATCH] spirv: Use correct type for sampled images

2017-11-07 Thread Alex Smith
Thanks Jason. Can someone push this?

On 6 November 2017 at 16:21, Jason Ekstrand <ja...@jlekstrand.net> wrote:

> On Mon, Nov 6, 2017 at 2:37 AM, Alex Smith <asm...@feralinteractive.com>
> wrote:
>
>> We should use the result type of the OpSampledImage opcode, rather than
>> the type of the underlying image/samplers.
>>
>> This resolves an issue when using separate images and shadow samplers
>> with glslang. Example:
>>
>> layout (...) uniform samplerShadow s0;
>> layout (...) uniform texture2D res0;
>> ...
>> float result = textureLod(sampler2DShadow(res0, s0), uv, 0);
>>
>> For this, for the combined OpSampledImage, the type of the base image
>> was being used (which does not have the Depth flag set, whereas the
>> result type does), therefore it was not being recognised as a shadow
>> sampler. This led to the wrong LLVM intrinsics being emitted by RADV.
>>
>
> Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net>
>
>
>> Signed-off-by: Alex Smith <asm...@feralinteractive.com>
>> Cc: "17.2 17.3" <mesa-sta...@lists.freedesktop.org>
>> ---
>>  src/compiler/spirv/spirv_to_nir.c  | 10 --
>>  src/compiler/spirv/vtn_private.h   |  1 +
>>  src/compiler/spirv/vtn_variables.c |  1 +
>>  3 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/compiler/spirv/spirv_to_nir.c
>> b/src/compiler/spirv/spirv_to_nir.c
>> index 6825e0d6a8..93a515d731 100644
>> --- a/src/compiler/spirv/spirv_to_nir.c
>> +++ b/src/compiler/spirv/spirv_to_nir.c
>> @@ -1490,6 +1490,8 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp
>> opcode,
>>struct vtn_value *val =
>>   vtn_push_value(b, w[2], vtn_value_type_sampled_image);
>>val->sampled_image = ralloc(b, struct vtn_sampled_image);
>> +  val->sampled_image->type =
>> + vtn_value(b, w[1], vtn_value_type_type)->type;
>>val->sampled_image->image =
>>   vtn_value(b, w[3], vtn_value_type_pointer)->pointer;
>>val->sampled_image->sampler =
>> @@ -1516,16 +1518,12 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp
>> opcode,
>>sampled = *sampled_val->sampled_image;
>> } else {
>>assert(sampled_val->value_type == vtn_value_type_pointer);
>> +  sampled.type = sampled_val->pointer->type;
>>sampled.image = NULL;
>>sampled.sampler = sampled_val->pointer;
>> }
>>
>> -   const struct glsl_type *image_type;
>> -   if (sampled.image) {
>> -  image_type = sampled.image->var->var->interface_type;
>> -   } else {
>> -  image_type = sampled.sampler->var->var->interface_type;
>> -   }
>> +   const struct glsl_type *image_type = sampled.type->type;
>> const enum glsl_sampler_dim sampler_dim =
>> glsl_get_sampler_dim(image_type);
>> const bool is_array = glsl_sampler_type_is_array(image_type);
>> const bool is_shadow = glsl_sampler_type_is_shadow(image_type);
>> diff --git a/src/compiler/spirv/vtn_private.h
>> b/src/compiler/spirv/vtn_private.h
>> index 84584620fc..6b4645acc8 100644
>> --- a/src/compiler/spirv/vtn_private.h
>> +++ b/src/compiler/spirv/vtn_private.h
>> @@ -411,6 +411,7 @@ struct vtn_image_pointer {
>>  };
>>
>>  struct vtn_sampled_image {
>> +   struct vtn_type *type;
>> struct vtn_pointer *image; /* Image or array of images */
>> struct vtn_pointer *sampler; /* Sampler */
>>  };
>> diff --git a/src/compiler/spirv/vtn_variables.c
>> b/src/compiler/spirv/vtn_variables.c
>> index 1cf9d597cf..9a69b4f6fc 100644
>> --- a/src/compiler/spirv/vtn_variables.c
>> +++ b/src/compiler/spirv/vtn_variables.c
>> @@ -1805,6 +1805,7 @@ vtn_handle_variables(struct vtn_builder *b, SpvOp
>> opcode,
>>   struct vtn_value *val =
>>  vtn_push_value(b, w[2], vtn_value_type_sampled_image);
>>   val->sampled_image = ralloc(b, struct vtn_sampled_image);
>> + val->sampled_image->type = base_val->sampled_image->type;
>>   val->sampled_image->image =
>>  vtn_pointer_dereference(b, base_val->sampled_image->image,
>> chain);
>>   val->sampled_image->sampler = base_val->sampled_image->sampler;
>> --
>> 2.13.6
>>
>>
>


[Mesa-dev] [PATCH] nir/spirv: tg4 requires a sampler

2017-11-07 Thread Alex Smith
Gather operations in both GLSL and SPIR-V require a sampler. Fixes
gathers returning garbage when using separate texture/samplers (on AMD,
was using an invalid sampler descriptor).

Signed-off-by: Alex Smith <asm...@feralinteractive.com>
Cc: "17.2 17.3" <mesa-sta...@lists.freedesktop.org>
---
 src/compiler/nir/nir.h| 1 -
 src/compiler/spirv/spirv_to_nir.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 0174c30504..9c804c62bd 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1214,7 +1214,6 @@ typedef struct {
 *- nir_texop_txf_ms
 *- nir_texop_txs
 *- nir_texop_lod
-*- nir_texop_tg4
 *- nir_texop_query_levels
 *- nir_texop_texture_samples
 *- nir_texop_samples_identical
diff --git a/src/compiler/spirv/spirv_to_nir.c 
b/src/compiler/spirv/spirv_to_nir.c
index 93a515d731..027efab88d 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -1755,6 +1755,7 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp opcode,
case nir_texop_txb:
case nir_texop_txl:
case nir_texop_txd:
+   case nir_texop_tg4:
   /* These operations require a sampler */
   instr->sampler = nir_deref_var_clone(sampler, instr);
   break;
@@ -1762,7 +1763,6 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp opcode,
case nir_texop_txf_ms:
case nir_texop_txs:
case nir_texop_lod:
-   case nir_texop_tg4:
case nir_texop_query_levels:
case nir_texop_texture_samples:
case nir_texop_samples_identical:
-- 
2.13.6



Re: [Mesa-dev] [PATCH] radv: force enable LLVM sisched for The Talos Principle

2017-11-07 Thread Alex Smith
On 7 November 2017 at 09:28, Samuel Pitoiset 
wrote:

>
>
> On 11/07/2017 10:18 AM, Michel Dänzer wrote:
>
>> On 07/11/17 10:08 AM, Samuel Pitoiset wrote:
>>
>>> It seems safe and it improves performance by +4% (73->76).
>>>
>>> Signed-off-by: Samuel Pitoiset 
>>> ---
>>>   src/amd/vulkan/radv_device.c | 20 
>>>   1 file changed, 20 insertions(+)
>>>
>>> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
>>> index 1ecf70d4a9..29bbcc5a43 100644
>>> --- a/src/amd/vulkan/radv_device.c
>>> +++ b/src/amd/vulkan/radv_device.c
>>> @@ -341,6 +341,24 @@ radv_get_perftest_option_name(int id)
>>> return radv_perftest_options[id].string;
>>>   }
>>>   +static void
>>> +radv_handle_per_app_options(struct radv_instance *instance,
>>> +   const VkApplicationInfo *info)
>>> +{
>>> +   const char *name = info ? info->pApplicationName : NULL;
>>> +
>>> +   if (!name)
>>> +   return;
>>> +
>>> +   if (!strcmp(name, "Talos - Linux - 32bit") ||
>>> +   !strcmp(name, "Talos - Linux - 64bit")) {
>>> +   /* Force enable LLVM sisched for Talos because it looks
>>> safe
>>> +* and it gives few more FPS.
>>> +*/
>>> +   instance->perftest_flags |= RADV_PERFTEST_SISCHED;
>>> +   }
>>> +}
>>> +
>>>   VkResult radv_CreateInstance(
>>> const VkInstanceCreateInfo* pCreateInfo,
>>> const VkAllocationCallbacks*pAllocator,
>>> @@ -400,6 +418,8 @@ VkResult radv_CreateInstance(
>>> instance->perftest_flags = parse_debug_string(getenv("RADV_PERFTEST"),
>>>  radv_perftest_options);
>>>   + radv_handle_per_app_options(instance,
>>> pCreateInfo->pApplicationInfo);
>>> +
>>> *pInstance = radv_instance_to_handle(instance);
>>> return VK_SUCCESS;
>>>
>>>
>> There should probably be a way to explicitly disable sisched.
>>
>
> mmh, yeah probably. RADV_DEBUG="nosisched" could be a thing then.


Can't this sort of app-specific stuff go to a drirc-style config file?

Alex




[Mesa-dev] [PATCH] spirv: Use correct type for sampled images

2017-11-06 Thread Alex Smith
We should use the result type of the OpSampledImage opcode, rather than
the type of the underlying image/samplers.

This resolves an issue when using separate images and shadow samplers
with glslang. Example:

layout (...) uniform samplerShadow s0;
layout (...) uniform texture2D res0;
...
float result = textureLod(sampler2DShadow(res0, s0), uv, 0);

For this, for the combined OpSampledImage, the type of the base image
was being used (which does not have the Depth flag set, whereas the
result type does), therefore it was not being recognised as a shadow
sampler. This led to the wrong LLVM intrinsics being emitted by RADV.

Signed-off-by: Alex Smith <asm...@feralinteractive.com>
Cc: "17.2 17.3" <mesa-sta...@lists.freedesktop.org>
---
 src/compiler/spirv/spirv_to_nir.c  | 10 --
 src/compiler/spirv/vtn_private.h   |  1 +
 src/compiler/spirv/vtn_variables.c |  1 +
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/compiler/spirv/spirv_to_nir.c 
b/src/compiler/spirv/spirv_to_nir.c
index 6825e0d6a8..93a515d731 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -1490,6 +1490,8 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp opcode,
   struct vtn_value *val =
  vtn_push_value(b, w[2], vtn_value_type_sampled_image);
   val->sampled_image = ralloc(b, struct vtn_sampled_image);
+  val->sampled_image->type =
+ vtn_value(b, w[1], vtn_value_type_type)->type;
   val->sampled_image->image =
  vtn_value(b, w[3], vtn_value_type_pointer)->pointer;
   val->sampled_image->sampler =
@@ -1516,16 +1518,12 @@ vtn_handle_texture(struct vtn_builder *b, SpvOp opcode,
   sampled = *sampled_val->sampled_image;
} else {
   assert(sampled_val->value_type == vtn_value_type_pointer);
+  sampled.type = sampled_val->pointer->type;
   sampled.image = NULL;
   sampled.sampler = sampled_val->pointer;
}
 
-   const struct glsl_type *image_type;
-   if (sampled.image) {
-  image_type = sampled.image->var->var->interface_type;
-   } else {
-  image_type = sampled.sampler->var->var->interface_type;
-   }
+   const struct glsl_type *image_type = sampled.type->type;
const enum glsl_sampler_dim sampler_dim = glsl_get_sampler_dim(image_type);
const bool is_array = glsl_sampler_type_is_array(image_type);
const bool is_shadow = glsl_sampler_type_is_shadow(image_type);
diff --git a/src/compiler/spirv/vtn_private.h b/src/compiler/spirv/vtn_private.h
index 84584620fc..6b4645acc8 100644
--- a/src/compiler/spirv/vtn_private.h
+++ b/src/compiler/spirv/vtn_private.h
@@ -411,6 +411,7 @@ struct vtn_image_pointer {
 };
 
 struct vtn_sampled_image {
+   struct vtn_type *type;
struct vtn_pointer *image; /* Image or array of images */
struct vtn_pointer *sampler; /* Sampler */
 };
diff --git a/src/compiler/spirv/vtn_variables.c 
b/src/compiler/spirv/vtn_variables.c
index 1cf9d597cf..9a69b4f6fc 100644
--- a/src/compiler/spirv/vtn_variables.c
+++ b/src/compiler/spirv/vtn_variables.c
@@ -1805,6 +1805,7 @@ vtn_handle_variables(struct vtn_builder *b, SpvOp opcode,
  struct vtn_value *val =
 vtn_push_value(b, w[2], vtn_value_type_sampled_image);
  val->sampled_image = ralloc(b, struct vtn_sampled_image);
+ val->sampled_image->type = base_val->sampled_image->type;
  val->sampled_image->image =
 vtn_pointer_dereference(b, base_val->sampled_image->image, chain);
  val->sampled_image->sampler = base_val->sampled_image->sampler;
-- 
2.13.6



Re: [Mesa-dev] [PATCH] radv: do not advertise D16_UNORM on VI

2017-11-03 Thread Alex Smith
Hi Samuel,

D16_UNORM support is mandatory on 2D images according to the spec
("Features, Limits and Formats" chapter).

Thanks,
Alex

On 3 November 2017 at 10:02, Samuel Pitoiset 
wrote:

> TC compatible HTILE only supports D32_SFLOAT on VI, while GFX9
> supports both. This is a recommandation for apps because HTILE
> decompressions are costly.
>
> This improves performance with Talos (73->76FPS) and Serious
> Sam 2017 (119->134FPS) because they no longer use any 16bpp
> depth surfaces and thus no HTILE decompressions are needed.
>
> Mad Max and DOW3 are not affected, but F1 2017 still uses one
> 16bpp depth surface even when the format is not advertised.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_formats.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/src/amd/vulkan/radv_formats.c b/src/amd/vulkan/radv_formats.c
> index 5c79ea7406..21a05c4bbd 100644
> --- a/src/amd/vulkan/radv_formats.c
> +++ b/src/amd/vulkan/radv_formats.c
> @@ -1122,6 +1122,15 @@ static VkResult radv_get_image_format_properties(struct
> radv_physical_device *ph
> }
> }
>
> +   if (physical_device->rad_info.chip_class == VI &&
> +   info->format == VK_FORMAT_D16_UNORM) {
> +   /* Do not advertise D16_UNORM on VI because TC compatible
> HTILE
> +* only supports D32_SFLOAT (GFX9 supports both), and HTILE
> +* decompressions are costly.
> +*/
> +   goto unsupported;
> +   }
> +
> *pImageFormatProperties = (VkImageFormatProperties) {
> .maxExtent = maxExtent,
> .maxMipLevels = maxMipLevels,
> --
> 2.15.0
>


[Mesa-dev] [PATCH] radv: Fix -Wformat-security issue

2017-10-30 Thread Alex Smith
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103513
Fixes: de889794134e ("radv: Implement VK_AMD_shader_info")
Signed-off-by: Alex Smith <asm...@feralinteractive.com>
---
 src/amd/vulkan/radv_shader.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index c9edb28ba2..9162612284 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -680,7 +680,7 @@ radv_shader_dump_stats(struct radv_device *device,
generate_shader_stats(device, variant, stage, buf);
 
fprintf(file, "\n%s:\n", radv_get_shader_name(variant, stage));
-   fprintf(file, buf->buf);
+   fprintf(file, "%s", buf->buf);
 
_mesa_string_buffer_destroy(buf);
 }
-- 
2.13.6



[Mesa-dev] [PATCH v3] radv: Implement VK_AMD_shader_info

2017-10-27 Thread Alex Smith
This allows an app to query shader statistics and get a disassembly of
a shader. RenderDoc git has support for it, so this allows you to view
shader disassembly from a capture.

When this extension is enabled on a device (or when tracing), we now
disable pipeline caching, since we don't get the shader debug info when
we retrieve cached shaders.

v2: Improvements to resource usage reporting
v3: Disassembly string must be null terminated (string_buffer's length
does not include the terminator)

Signed-off-by: Alex Smith <asm...@feralinteractive.com>
---
 src/amd/vulkan/radv_device.c |   9 ++
 src/amd/vulkan/radv_extensions.py|   1 +
 src/amd/vulkan/radv_pipeline.c   |   2 +-
 src/amd/vulkan/radv_pipeline_cache.c |  11 ++-
 src/amd/vulkan/radv_private.h|   3 +
 src/amd/vulkan/radv_shader.c | 179 +--
 6 files changed, 170 insertions(+), 35 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index d25e9c97ba..0c2f6fa631 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -944,10 +944,15 @@ VkResult radv_CreateDevice(
VkResult result;
struct radv_device *device;
 
+   bool keep_shader_info = false;
+
for (uint32_t i = 0; i < pCreateInfo->enabledExtensionCount; i++) {
const char *ext_name = pCreateInfo->ppEnabledExtensionNames[i];
if (!radv_physical_device_extension_supported(physical_device, 
ext_name))
return vk_error(VK_ERROR_EXTENSION_NOT_PRESENT);
+
+   if (strcmp(ext_name, VK_AMD_SHADER_INFO_EXTENSION_NAME) == 0)
+   keep_shader_info = true;
}
 
/* Check enabled features */
@@ -1041,10 +1046,14 @@ VkResult radv_CreateDevice(
device->physical_device->rad_info.max_se >= 2;
 
if (getenv("RADV_TRACE_FILE")) {
+   keep_shader_info = true;
+
if (!radv_init_trace(device))
goto fail;
}
 
+   device->keep_shader_info = keep_shader_info;
+
result = radv_device_init_meta(device);
if (result != VK_SUCCESS)
goto fail;
diff --git a/src/amd/vulkan/radv_extensions.py 
b/src/amd/vulkan/radv_extensions.py
index dfeb2880fc..eeb679d65a 100644
--- a/src/amd/vulkan/radv_extensions.py
+++ b/src/amd/vulkan/radv_extensions.py
@@ -81,6 +81,7 @@ EXTENSIONS = [
 Extension('VK_EXT_global_priority',   1, 
'device->rad_info.has_ctx_priority'),
 Extension('VK_AMD_draw_indirect_count',   1, True),
 Extension('VK_AMD_rasterization_order',   1, 
'device->rad_info.chip_class >= VI && device->rad_info.max_se >= 2'),
+Extension('VK_AMD_shader_info',   1, True),
 ]
 
 class VkVersion:
diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index c25642c966..c6d9debc7e 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -1942,7 +1942,7 @@ void radv_create_shaders(struct radv_pipeline *pipeline,
 
for (int i = 0; i < MESA_SHADER_STAGES; ++i) {
free(codes[i]);
-   if (modules[i] && !pipeline->device->trace_bo)
+   if (modules[i] && !pipeline->device->keep_shader_info)
ralloc_free(nir[i]);
}
 
diff --git a/src/amd/vulkan/radv_pipeline_cache.c 
b/src/amd/vulkan/radv_pipeline_cache.c
index 5dee114749..441e2257d5 100644
--- a/src/amd/vulkan/radv_pipeline_cache.c
+++ b/src/amd/vulkan/radv_pipeline_cache.c
@@ -62,9 +62,11 @@ radv_pipeline_cache_init(struct radv_pipeline_cache *cache,
cache->hash_table = malloc(byte_size);
 
/* We don't consider allocation failure fatal, we just start with a 
0-sized
-* cache. */
+* cache. Disable caching when we want to keep shader debug info, since
+* we don't get the debug info on cached shaders. */
if (cache->hash_table == NULL ||
-   (device->instance->debug_flags & RADV_DEBUG_NO_CACHE))
+   (device->instance->debug_flags & RADV_DEBUG_NO_CACHE) ||
+   device->keep_shader_info)
cache->table_size = 0;
else
memset(cache->hash_table, 0, byte_size);
@@ -186,8 +188,11 @@ radv_create_shader_variants_from_pipeline_cache(struct 
radv_device *device,
entry = radv_pipeline_cache_search_unlocked(cache, sha1);
 
if (!entry) {
+   /* Again, don't cache when we want debug info, since this isn't
+* present in the cache. */
if (!device->physical_device->disk_cache ||
-   (device->instance->debug_flags & RADV_DEBUG_NO_CACHE)) {
+   (device->instance->debug_flags & RADV_DEBUG_NO_CACHE) ||
+   

[Mesa-dev] [PATCH v2] radv: Implement VK_AMD_shader_info

2017-10-26 Thread Alex Smith
This allows an app to query shader statistics and get a disassembly of
a shader. RenderDoc git has support for it, so this allows you to view
shader disassembly from a capture.

When this extension is enabled on a device (or when tracing), we now
disable pipeline caching, since we don't get the shader debug info when
we retrieve cached shaders.

v2: Improvements to resource usage reporting

Signed-off-by: Alex Smith <asm...@feralinteractive.com>
---
 src/amd/vulkan/radv_device.c |   9 ++
 src/amd/vulkan/radv_extensions.py|   1 +
 src/amd/vulkan/radv_pipeline.c   |   2 +-
 src/amd/vulkan/radv_pipeline_cache.c |  11 ++-
 src/amd/vulkan/radv_private.h|   3 +
 src/amd/vulkan/radv_shader.c | 176 +--
 6 files changed, 167 insertions(+), 35 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 19ff8fec64..e891e40467 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -944,10 +944,15 @@ VkResult radv_CreateDevice(
VkResult result;
struct radv_device *device;
 
+   bool keep_shader_info = false;
+
for (uint32_t i = 0; i < pCreateInfo->enabledExtensionCount; i++) {
const char *ext_name = pCreateInfo->ppEnabledExtensionNames[i];
if (!radv_physical_device_extension_supported(physical_device, 
ext_name))
return vk_error(VK_ERROR_EXTENSION_NOT_PRESENT);
+
+   if (strcmp(ext_name, VK_AMD_SHADER_INFO_EXTENSION_NAME) == 0)
+   keep_shader_info = true;
}
 
/* Check enabled features */
@@ -1041,10 +1046,14 @@ VkResult radv_CreateDevice(
device->physical_device->rad_info.max_se >= 2;
 
if (getenv("RADV_TRACE_FILE")) {
+   keep_shader_info = true;
+
if (!radv_init_trace(device))
goto fail;
}
 
+   device->keep_shader_info = keep_shader_info;
+
result = radv_device_init_meta(device);
if (result != VK_SUCCESS)
goto fail;
diff --git a/src/amd/vulkan/radv_extensions.py 
b/src/amd/vulkan/radv_extensions.py
index dfeb2880fc..eeb679d65a 100644
--- a/src/amd/vulkan/radv_extensions.py
+++ b/src/amd/vulkan/radv_extensions.py
@@ -81,6 +81,7 @@ EXTENSIONS = [
 Extension('VK_EXT_global_priority',   1, 
'device->rad_info.has_ctx_priority'),
 Extension('VK_AMD_draw_indirect_count',   1, True),
 Extension('VK_AMD_rasterization_order',   1, 
'device->rad_info.chip_class >= VI && device->rad_info.max_se >= 2'),
+Extension('VK_AMD_shader_info',   1, True),
 ]
 
 class VkVersion:
diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index c25642c966..c6d9debc7e 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -1942,7 +1942,7 @@ void radv_create_shaders(struct radv_pipeline *pipeline,
 
for (int i = 0; i < MESA_SHADER_STAGES; ++i) {
free(codes[i]);
-   if (modules[i] && !pipeline->device->trace_bo)
+   if (modules[i] && !pipeline->device->keep_shader_info)
ralloc_free(nir[i]);
}
 
diff --git a/src/amd/vulkan/radv_pipeline_cache.c 
b/src/amd/vulkan/radv_pipeline_cache.c
index 5dee114749..441e2257d5 100644
--- a/src/amd/vulkan/radv_pipeline_cache.c
+++ b/src/amd/vulkan/radv_pipeline_cache.c
@@ -62,9 +62,11 @@ radv_pipeline_cache_init(struct radv_pipeline_cache *cache,
cache->hash_table = malloc(byte_size);
 
/* We don't consider allocation failure fatal, we just start with a 
0-sized
-* cache. */
+* cache. Disable caching when we want to keep shader debug info, since
+* we don't get the debug info on cached shaders. */
if (cache->hash_table == NULL ||
-   (device->instance->debug_flags & RADV_DEBUG_NO_CACHE))
+   (device->instance->debug_flags & RADV_DEBUG_NO_CACHE) ||
+   device->keep_shader_info)
cache->table_size = 0;
else
memset(cache->hash_table, 0, byte_size);
@@ -186,8 +188,11 @@ radv_create_shader_variants_from_pipeline_cache(struct 
radv_device *device,
entry = radv_pipeline_cache_search_unlocked(cache, sha1);
 
if (!entry) {
+   /* Again, don't cache when we want debug info, since this isn't
+* present in the cache. */
if (!device->physical_device->disk_cache ||
-   (device->instance->debug_flags & RADV_DEBUG_NO_CACHE)) {
+   (device->instance->debug_flags & RADV_DEBUG_NO_CACHE) ||
+   device->keep_shader_info) {
pthread_mutex_unlock(&cache->mutex);

Re: [Mesa-dev] [PATCH 2/2] radv: Implement VK_AMD_shader_info

2017-10-26 Thread Alex Smith
On 25 October 2017 at 21:58, Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl>
wrote:

> On Wed, Oct 25, 2017 at 4:03 PM, Samuel Pitoiset
> <samuel.pitoi...@gmail.com> wrote:
> >
> >
> > On 10/25/2017 02:20 PM, Alex Smith wrote:
> >>
> >> On 25 October 2017 at 12:46, Samuel Pitoiset <samuel.pitoi...@gmail.com
> >> <mailto:samuel.pitoi...@gmail.com>> wrote:
> >>
> >> I have something similar on my local tree (started on monday).
> >>
> >> Though, I don't like the way we expose the number of VGPRS/SGPRS
> >> because we can't really figure out the number of spilled ones.
> >>
> >>
> >> My assumption was that if we've spilled then we've used all available
> >> registers, so if numUsed{V,S}gprs is greater than the number available,
> then
> >> you'd know that the number spilled is the difference between the two.
> Can we
> >> have spilling when num_{v,s}gprs is less than the number available?
> >
> >
> > Assuming the number of waves per CU is 4, I would go with:
> >
> > num_available_vgprs = num_physical_vgprs (ie. 256) / max_simd_waves
> (aligned
> > down to 4).
>
> for compute there is
>
> num_available_vgprs (as LLVM sees as constraints) = num_physical_vgprs
> / ceil(compute_workgroup_size / 256)
>
> for other stages it always is 256. (Until we implement the wave limit ext)
>
> Reading from the spec I think it is unintuitive that the usedVgpr
> stats include spilled registers though. I'd
> expect to see just the physically used regs. Is this something that
> Feral has tried on the official driver on any platform? I'd say to not
> include the spilled regs (you can get it approximately with scratch
> memory / 256), unless the official driver does otherwise, in which
> case we should go for consistency.
>

I've not looked at amdgpu-pro, and I'm unable to check it right now. I'm
not sure it would even have the extension, since it only appeared in the
spec very recently.

I'll go with what you suggest for now, I think you're probably right that
we shouldn't include the spilled registers.

Alex


>
> >
> > (or we can just set num_available_vgprs to conf->num_vgprs and return
> > num_used_vgprs = conf->num_vgprs + conf->num_spilled_sgprs).
> >
> > That way, if num_used_vgprs is greater than num_available_vgprs we know
> that
> > we are spilling some vgprs.
> >
> > For the number of available SGPRs, I think we can just hardcode the
> value to
> > 104 for now.
> >
> > Also with this, we can easily re-compute the maximum number of waves.
> >
> >>
> >> Alex
> >>
> >>
> >>
> >> On 10/25/2017 01:18 PM, Alex Smith wrote:
> >>
> >> This allows an app to query shader statistics and get a
> >> disassembly of
> >> a shader. RenderDoc git has support for it, so this allows you
> >> to view
> >> shader disassembly from a capture.
> >>
> >> When this extension is enabled on a device (or when tracing), we
> >> now
> >> disable pipeline caching, since we don't get the shader debug
> >> info when
> >> we retrieve cached shaders.
> >>
> >> Signed-off-by: Alex Smith <asm...@feralinteractive.com
> >> <mailto:asm...@feralinteractive.com>>
> >>
> >> ---
> >>src/amd/vulkan/radv_device.c |   9 ++
> >>src/amd/vulkan/radv_extensions.py|   1 +
> >>src/amd/vulkan/radv_pipeline.c   |   2 +-
> >>src/amd/vulkan/radv_pipeline_cache.c |  11 ++-
> >>src/amd/vulkan/radv_private.h|   3 +
> >>src/amd/vulkan/radv_shader.c | 163
> >> ---
> >>6 files changed, 154 insertions(+), 35 deletions(-)
> >>
> >> diff --git a/src/amd/vulkan/radv_device.c
> >> b/src/amd/vulkan/radv_device.c
> >> index c4e25222ea..5603551680 100644
> >> --- a/src/amd/vulkan/radv_device.c
> >> +++ b/src/amd/vulkan/radv_device.c
> >> @@ -943,10 +943,15 @@ VkResult radv_CreateDevice(
> >>  VkResult result;
> >>  struct radv_device *device;
> >>+ bool keep_shader_info = false;
> >> +
> >>  for (uint32_t i = 0; i <
> >> pCreateI

Re: [Mesa-dev] [PATCH 2/2] radv: Implement VK_AMD_shader_info

2017-10-25 Thread Alex Smith
On 25 October 2017 at 12:46, Samuel Pitoiset <samuel.pitoi...@gmail.com>
wrote:

> I have something similar on my local tree (started on monday).
>
> Though, I don't like the way we expose the number of VGPRS/SGPRS because
> we can't really figure out the number of spilled ones.


My assumption was that if we've spilled then we've used all available
registers, so if numUsed{V,S}gprs is greater than the number available,
then you'd know that the number spilled is the difference between the two.
Can we have spilling when num_{v,s}gprs is less than the number available?

Alex


>
>
> On 10/25/2017 01:18 PM, Alex Smith wrote:
>
>> This allows an app to query shader statistics and get a disassembly of
>> a shader. RenderDoc git has support for it, so this allows you to view
>> shader disassembly from a capture.
>>
>> When this extension is enabled on a device (or when tracing), we now
>> disable pipeline caching, since we don't get the shader debug info when
>> we retrieve cached shaders.
>>
>> Signed-off-by: Alex Smith <asm...@feralinteractive.com>
>> ---
>>   src/amd/vulkan/radv_device.c |   9 ++
>>   src/amd/vulkan/radv_extensions.py|   1 +
>>   src/amd/vulkan/radv_pipeline.c   |   2 +-
>>   src/amd/vulkan/radv_pipeline_cache.c |  11 ++-
>>   src/amd/vulkan/radv_private.h|   3 +
>>   src/amd/vulkan/radv_shader.c | 163
>> ---
>>   6 files changed, 154 insertions(+), 35 deletions(-)
>>
>> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
>> index c4e25222ea..5603551680 100644
>> --- a/src/amd/vulkan/radv_device.c
>> +++ b/src/amd/vulkan/radv_device.c
>> @@ -943,10 +943,15 @@ VkResult radv_CreateDevice(
>> VkResult result;
>> struct radv_device *device;
>>   + bool keep_shader_info = false;
>> +
>> for (uint32_t i = 0; i < pCreateInfo->enabledExtensionCount;
>> i++) {
>> const char *ext_name = pCreateInfo->ppEnabledExtensio
>> nNames[i];
>> if 
>> (!radv_physical_device_extension_supported(physical_device,
>> ext_name))
>> return vk_error(VK_ERROR_EXTENSION_NOT_PRESENT);
>> +
>> +   if (strcmp(ext_name, VK_AMD_SHADER_INFO_EXTENSION_NAME)
>> == 0)
>> +   keep_shader_info = true;
>> }
>> /* Check enabled features */
>> @@ -1040,10 +1045,14 @@ VkResult radv_CreateDevice(
>> device->physical_device->rad_info.max_se >= 2;
>> if (getenv("RADV_TRACE_FILE")) {
>> +   keep_shader_info = true;
>> +
>> if (!radv_init_trace(device))
>> goto fail;
>> }
>>   + device->keep_shader_info = keep_shader_info;
>> +
>> result = radv_device_init_meta(device);
>> if (result != VK_SUCCESS)
>> goto fail;
>> diff --git a/src/amd/vulkan/radv_extensions.py
>> b/src/amd/vulkan/radv_extensions.py
>> index dfeb2880fc..eeb679d65a 100644
>> --- a/src/amd/vulkan/radv_extensions.py
>> +++ b/src/amd/vulkan/radv_extensions.py
>> @@ -81,6 +81,7 @@ EXTENSIONS = [
>>   Extension('VK_EXT_global_priority',   1,
>> 'device->rad_info.has_ctx_priority'),
>>   Extension('VK_AMD_draw_indirect_count',   1, True),
>>   Extension('VK_AMD_rasterization_order',   1,
>> 'device->rad_info.chip_class >= VI && device->rad_info.max_se >= 2'),
>> +Extension('VK_AMD_shader_info',   1, True),
>>   ]
>> class VkVersion:
>> diff --git a/src/amd/vulkan/radv_pipeline.c
>> b/src/amd/vulkan/radv_pipeline.c
>> index d6b33a5327..2df03a83cf 100644
>> --- a/src/amd/vulkan/radv_pipeline.c
>> +++ b/src/amd/vulkan/radv_pipeline.c
>> @@ -1874,7 +1874,7 @@ void radv_create_shaders(struct radv_pipeline
>> *pipeline,
>> if (device->instance->debug_flags &
>> RADV_DEBUG_DUMP_SHADERS)
>> nir_print_shader(nir[i], stderr);
>>   - if (!pipeline->device->trace_bo)
>> +   if (!pipeline->device->keep_shader_info)
>> ralloc_free(nir[i]);
>> }
>> }
>> diff --git a/src/amd/vulkan/radv_pipeline_cache.c
>> b/src/amd/vulkan/radv_pipeline_cache.c
>> index 9ba9a3b61b..46198799a7 100644
>> --- a/s

[Mesa-dev] [PATCH 2/2] radv: Implement VK_AMD_shader_info

2017-10-25 Thread Alex Smith
This allows an app to query shader statistics and get a disassembly of
a shader. RenderDoc git has support for it, so this allows you to view
shader disassembly from a capture.

When this extension is enabled on a device (or when tracing), we now
disable pipeline caching, since we don't get the shader debug info when
we retrieve cached shaders.

Signed-off-by: Alex Smith <asm...@feralinteractive.com>
---
 src/amd/vulkan/radv_device.c |   9 ++
 src/amd/vulkan/radv_extensions.py|   1 +
 src/amd/vulkan/radv_pipeline.c   |   2 +-
 src/amd/vulkan/radv_pipeline_cache.c |  11 ++-
 src/amd/vulkan/radv_private.h|   3 +
 src/amd/vulkan/radv_shader.c | 163 ---
 6 files changed, 154 insertions(+), 35 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index c4e25222ea..5603551680 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -943,10 +943,15 @@ VkResult radv_CreateDevice(
VkResult result;
struct radv_device *device;
 
+   bool keep_shader_info = false;
+
for (uint32_t i = 0; i < pCreateInfo->enabledExtensionCount; i++) {
const char *ext_name = pCreateInfo->ppEnabledExtensionNames[i];
if (!radv_physical_device_extension_supported(physical_device, 
ext_name))
return vk_error(VK_ERROR_EXTENSION_NOT_PRESENT);
+
+   if (strcmp(ext_name, VK_AMD_SHADER_INFO_EXTENSION_NAME) == 0)
+   keep_shader_info = true;
}
 
/* Check enabled features */
@@ -1040,10 +1045,14 @@ VkResult radv_CreateDevice(
device->physical_device->rad_info.max_se >= 2;
 
if (getenv("RADV_TRACE_FILE")) {
+   keep_shader_info = true;
+
if (!radv_init_trace(device))
goto fail;
}
 
+   device->keep_shader_info = keep_shader_info;
+
result = radv_device_init_meta(device);
if (result != VK_SUCCESS)
goto fail;
diff --git a/src/amd/vulkan/radv_extensions.py 
b/src/amd/vulkan/radv_extensions.py
index dfeb2880fc..eeb679d65a 100644
--- a/src/amd/vulkan/radv_extensions.py
+++ b/src/amd/vulkan/radv_extensions.py
@@ -81,6 +81,7 @@ EXTENSIONS = [
 Extension('VK_EXT_global_priority',   1, 
'device->rad_info.has_ctx_priority'),
 Extension('VK_AMD_draw_indirect_count',   1, True),
 Extension('VK_AMD_rasterization_order',   1, 
'device->rad_info.chip_class >= VI && device->rad_info.max_se >= 2'),
+Extension('VK_AMD_shader_info',   1, True),
 ]
 
 class VkVersion:
diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index d6b33a5327..2df03a83cf 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -1874,7 +1874,7 @@ void radv_create_shaders(struct radv_pipeline *pipeline,
if (device->instance->debug_flags & 
RADV_DEBUG_DUMP_SHADERS)
nir_print_shader(nir[i], stderr);
 
-   if (!pipeline->device->trace_bo)
+   if (!pipeline->device->keep_shader_info)
ralloc_free(nir[i]);
}
}
diff --git a/src/amd/vulkan/radv_pipeline_cache.c 
b/src/amd/vulkan/radv_pipeline_cache.c
index 9ba9a3b61b..46198799a7 100644
--- a/src/amd/vulkan/radv_pipeline_cache.c
+++ b/src/amd/vulkan/radv_pipeline_cache.c
@@ -62,9 +62,11 @@ radv_pipeline_cache_init(struct radv_pipeline_cache *cache,
cache->hash_table = malloc(byte_size);
 
/* We don't consider allocation failure fatal, we just start with a 
0-sized
-* cache. */
+* cache. Disable caching when we want to keep shader debug info, since
+* we don't get the debug info on cached shaders. */
if (cache->hash_table == NULL ||
-   (device->instance->debug_flags & RADV_DEBUG_NO_CACHE))
+   (device->instance->debug_flags & RADV_DEBUG_NO_CACHE) ||
+   device->keep_shader_info)
cache->table_size = 0;
else
memset(cache->hash_table, 0, byte_size);
@@ -186,8 +188,11 @@ radv_create_shader_variants_from_pipeline_cache(struct 
radv_device *device,
entry = radv_pipeline_cache_search_unlocked(cache, sha1);
 
if (!entry) {
+   /* Again, don't cache when we want debug info, since this isn't
+* present in the cache. */
if (!device->physical_device->disk_cache ||
-   (device->instance->debug_flags & RADV_DEBUG_NO_CACHE)) {
+   (device->instance->debug_flags & RADV_DEBUG_NO_CACHE) ||
+   device->keep_shader_info) {
pthread_mutex_unlock

[Mesa-dev] [PATCH 1/2] vulkan: Update headers and registry to 1.0.64

2017-10-25 Thread Alex Smith
Signed-off-by: Alex Smith <asm...@feralinteractive.com>
---
 include/vulkan/vulkan.h|  50 +-
 src/vulkan/registry/vk.xml | 159 +
 2 files changed, 181 insertions(+), 28 deletions(-)

diff --git a/include/vulkan/vulkan.h b/include/vulkan/vulkan.h
index e1398c68ba..048866c444 100644
--- a/include/vulkan/vulkan.h
+++ b/include/vulkan/vulkan.h
@@ -43,7 +43,7 @@ extern "C" {
 #define VK_VERSION_MINOR(version) (((uint32_t)(version) >> 12) & 0x3ff)
 #define VK_VERSION_PATCH(version) ((uint32_t)(version) & 0xfff)
 // Version of this file
-#define VK_HEADER_VERSION 63
+#define VK_HEADER_VERSION 64
 
 
 #define VK_NULL_HANDLE 0
@@ -5194,7 +5194,7 @@ VKAPI_ATTR VkResult VKAPI_CALL vkBindImageMemory2KHR(
 #define VK_EXT_debug_report 1
 VK_DEFINE_NON_DISPATCHABLE_HANDLE(VkDebugReportCallbackEXT)
 
-#define VK_EXT_DEBUG_REPORT_SPEC_VERSION  8
+#define VK_EXT_DEBUG_REPORT_SPEC_VERSION  9
 #define VK_EXT_DEBUG_REPORT_EXTENSION_NAME "VK_EXT_debug_report"
 #define VK_STRUCTURE_TYPE_DEBUG_REPORT_CREATE_INFO_EXT 
VK_STRUCTURE_TYPE_DEBUG_REPORT_CALLBACK_CREATE_INFO_EXT
 #define VK_DEBUG_REPORT_OBJECT_TYPE_DEBUG_REPORT_EXT 
VK_DEBUG_REPORT_OBJECT_TYPE_DEBUG_REPORT_CALLBACK_EXT_EXT
@@ -5488,6 +5488,52 @@ typedef struct VkTextureLODGatherFormatPropertiesAMD {
 
 
 
+#define VK_AMD_shader_info 1
+#define VK_AMD_SHADER_INFO_SPEC_VERSION   1
+#define VK_AMD_SHADER_INFO_EXTENSION_NAME "VK_AMD_shader_info"
+
+
+typedef enum VkShaderInfoTypeAMD {
+VK_SHADER_INFO_TYPE_STATISTICS_AMD = 0,
+VK_SHADER_INFO_TYPE_BINARY_AMD = 1,
+VK_SHADER_INFO_TYPE_DISASSEMBLY_AMD = 2,
+VK_SHADER_INFO_TYPE_BEGIN_RANGE_AMD = VK_SHADER_INFO_TYPE_STATISTICS_AMD,
+VK_SHADER_INFO_TYPE_END_RANGE_AMD = VK_SHADER_INFO_TYPE_DISASSEMBLY_AMD,
+VK_SHADER_INFO_TYPE_RANGE_SIZE_AMD = (VK_SHADER_INFO_TYPE_DISASSEMBLY_AMD 
- VK_SHADER_INFO_TYPE_STATISTICS_AMD + 1),
+VK_SHADER_INFO_TYPE_MAX_ENUM_AMD = 0x7FFF
+} VkShaderInfoTypeAMD;
+
+typedef struct VkShaderResourceUsageAMD {
+uint32_tnumUsedVgprs;
+uint32_tnumUsedSgprs;
+uint32_tldsSizePerLocalWorkGroup;
+size_t  ldsUsageSizeInBytes;
+size_t  scratchMemUsageInBytes;
+} VkShaderResourceUsageAMD;
+
+typedef struct VkShaderStatisticsInfoAMD {
+VkShaderStageFlags  shaderStageMask;
+VkShaderResourceUsageAMDresourceUsage;
+uint32_tnumPhysicalVgprs;
+uint32_tnumPhysicalSgprs;
+uint32_tnumAvailableVgprs;
+uint32_tnumAvailableSgprs;
+uint32_tcomputeWorkGroupSize[3];
+} VkShaderStatisticsInfoAMD;
+
+
+typedef VkResult (VKAPI_PTR *PFN_vkGetShaderInfoAMD)(VkDevice device, 
VkPipeline pipeline, VkShaderStageFlagBits shaderStage, VkShaderInfoTypeAMD 
infoType, size_t* pInfoSize, void* pInfo);
+
+#ifndef VK_NO_PROTOTYPES
+VKAPI_ATTR VkResult VKAPI_CALL vkGetShaderInfoAMD(
+VkDevicedevice,
+VkPipeline  pipeline,
+VkShaderStageFlagBits   shaderStage,
+VkShaderInfoTypeAMD infoType,
+size_t* pInfoSize,
+void*   pInfo);
+#endif
+
 #define VK_AMD_shader_image_load_store_lod 1
 #define VK_AMD_SHADER_IMAGE_LOAD_STORE_LOD_SPEC_VERSION 1
 #define VK_AMD_SHADER_IMAGE_LOAD_STORE_LOD_EXTENSION_NAME 
"VK_AMD_shader_image_load_store_lod"
diff --git a/src/vulkan/registry/vk.xml b/src/vulkan/registry/vk.xml
index 88e0997148..d2aba617c9 100644
--- a/src/vulkan/registry/vk.xml
+++ b/src/vulkan/registry/vk.xml
@@ -107,7 +107,7 @@ private version is maintained in the 1.0 branch of the 
member gitlab server.
 // Vulkan 1.0 version number
 #define VK_API_VERSION_1_0 VK_MAKE_VERSION(1, 0, 
0)// Patch version should always be set to 0
 // Version of this file
-#define VK_HEADER_VERSION 63
+#define VK_HEADER_VERSION 64
 
 
 #define VK_DEFINE_HANDLE(object) typedef struct object##_T* 
object;
@@ -386,6 +386,7 @@ private version is maintained in the 1.0 branch of the 
member gitlab server.
 
 
 
+
 
 
 WSI extensions
@@ -1251,7 +1252,7 @@ private version is maintained in the 1.0 branch of the 
member gitlab server.
 VkBool32   
shaderInt6464-bit integers in shaders
 VkBool32   
shaderInt1616-bit integers in shaders
 VkBool32   
shaderResourceResidencyshader can use texture operations 
that return resource residency information (requires sparseNonResident 
support)
-VkBool32   
shaderResourceMinLodshader can use texture operations 
that specify minimum resource level of detail
+VkBool32   
shaderResourceMinLodshader can use texture operati

Re: [Mesa-dev] [PATCH] ac/nir: generate correct instruction for atomic min/max on unsigned images

2017-10-25 Thread Alex Smith
Forgot to add

Cc: "17.2 17.3" 

On 25 October 2017 at 11:24, Matthew Nicholls <
mnicho...@feralinteractive.com> wrote:

> ---
>  src/amd/common/ac_nir_to_llvm.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_
> llvm.c
> index 3d635d4206..870731f3eb 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -3667,15 +3667,17 @@ static LLVMValueRef visit_image_atomic(struct
> ac_nir_context *ctx,
> LLVMValueRef i1true = LLVMConstInt(ctx->ac.i1, 1, false);
> MAYBE_UNUSED int length;
>
> +   bool is_unsigned = glsl_get_sampler_result_type(type) ==
> GLSL_TYPE_UINT;
> +
> switch (instr->intrinsic) {
> case nir_intrinsic_image_atomic_add:
> atomic_name = "add";
> break;
> case nir_intrinsic_image_atomic_min:
> -   atomic_name = "smin";
> +   atomic_name = is_unsigned ? "umin" : "smin";
> break;
> case nir_intrinsic_image_atomic_max:
> -   atomic_name = "smax";
> +   atomic_name = is_unsigned ? "umax" : "smax";
> break;
> case nir_intrinsic_image_atomic_and:
> atomic_name = "and";
> --
> 2.13.6
>
>


[Mesa-dev] [PATCH] radv: Update code pointer correctly if a variant is already created

2017-10-23 Thread Alex Smith
This was the actual cause of GPU hangs fixed by 0fdd531457ec ("radv:
Fix pipeline cache locking issues"), since multiple threads would end
up trying to create the variants for a single entry.

Now that we're locking around the whole of this function, this isn't
really necessary (we either create all or none of the variants), but
fix this anyway in case things change later.

Signed-off-by: Alex Smith <asm...@feralinteractive.com>
---
 src/amd/vulkan/radv_pipeline_cache.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/amd/vulkan/radv_pipeline_cache.c 
b/src/amd/vulkan/radv_pipeline_cache.c
index a75356b822..9ba9a3b61b 100644
--- a/src/amd/vulkan/radv_pipeline_cache.c
+++ b/src/amd/vulkan/radv_pipeline_cache.c
@@ -231,6 +231,8 @@ radv_create_shader_variants_from_pipeline_cache(struct 
radv_device *device,
p += entry->code_sizes[i];
 
entry->variants[i] = variant;
+   } else if (entry->code_sizes[i]) {
+   p += sizeof(struct cache_entry_variant_info) + 
entry->code_sizes[i];
}
 
}
-- 
2.13.6



Re: [Mesa-dev] [PATCH] radv: Fix pipeline cache locking issues

2017-10-23 Thread Alex Smith
On 21 October 2017 at 02:54, Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl>
wrote:

> For radv_create_shader_variants_from_pipeline_cache I'm not really
> sure why this would cause corruption. Yes we might create the variants
> a few times too much, but that should not cause any corruption.
>

Just had another look and figured out what the actual problem is - if
there's a race between multiple threads to create the variants for a single
entry, then they may not update p inside the loop properly (since they only
do so when the variant isn't already created), so can end up using the
wrong code. I'll send a patch - I don't think it's necessary now that we're
locking around there (we'll only create all or none of the variants), but
may as well fix it in case things change later.

Alex


>
> Either way, it is a fix, so
> Reviewed-by: Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl>
>
> and pushed. Thanks.
>
> On Thu, Oct 19, 2017 at 12:49 PM, Alex Smith
> <asm...@feralinteractive.com> wrote:
> > Need to lock around the whole process of retrieving cached shaders, and
> > around GetPipelineCacheData.
> >
> > This fixes GPU hangs observed when creating multiple pipelines in
> > parallel, which appeared to be due to invalid shader code being pulled
> > from the cache.
> >
> > Signed-off-by: Alex Smith <asm...@feralinteractive.com>
> > ---
> >  src/amd/vulkan/radv_pipeline_cache.c | 30
> +++---
> >  1 file changed, 23 insertions(+), 7 deletions(-)
> >
> > diff --git a/src/amd/vulkan/radv_pipeline_cache.c b/src/amd/vulkan/radv_
> pipeline_cache.c
> > index 034dc35af8..a75356b822 100644
> > --- a/src/amd/vulkan/radv_pipeline_cache.c
> > +++ b/src/amd/vulkan/radv_pipeline_cache.c
> > @@ -177,15 +177,20 @@ radv_create_shader_variants_from_pipeline_cache(struct
> radv_device *device,
> > struct
> radv_shader_variant **variants)
> >  {
> > struct cache_entry *entry;
> > -   if (cache)
> > -   entry = radv_pipeline_cache_search(cache, sha1);
> > -   else
> > -   entry = radv_pipeline_cache_search(device->mem_cache,
> sha1);
> > +
> > +   if (!cache)
> > +   cache = device->mem_cache;
> > +
> > +   pthread_mutex_lock(&cache->mutex);
> > +
> > +   entry = radv_pipeline_cache_search_unlocked(cache, sha1);
> >
> > if (!entry) {
> > if (!device->physical_device->disk_cache ||
> > -   (device->instance->debug_flags &
> RADV_DEBUG_NO_CACHE))
> > +   (device->instance->debug_flags &
> RADV_DEBUG_NO_CACHE)) {
> > +   pthread_mutex_unlock(&cache->mutex);
> > return false;
> > +   }
> >
> > uint8_t disk_sha1[20];
> > disk_cache_compute_key(device-
> >physical_device->disk_cache,
> > @@ -193,8 +198,10 @@ radv_create_shader_variants_from_pipeline_cache(struct
> radv_device *device,
> > entry = (struct cache_entry *)
> > disk_cache_get(device->
> physical_device->disk_cache,
> >disk_sha1, NULL);
> > -   if (!entry)
> > +   if (!entry) {
> > +   pthread_mutex_unlock(&cache->mutex);
> > return false;
> > +   }
> > }
> >
> > char *p = entry->code;
> > @@ -204,8 +211,10 @@ radv_create_shader_variants_from_pipeline_cache(struct
> radv_device *device,
> > struct cache_entry_variant_info info;
> >
> > variant = calloc(1, sizeof(struct
> radv_shader_variant));
> > -   if (!variant)
> > +   if (!variant) {
> > +   pthread_mutex_unlock(&cache->mutex);
> > return false;
> > +   }
> >
> > memcpy(, p, sizeof(struct
> cache_entry_variant_info));
> > p += sizeof(struct cache_entry_variant_info);
> > @@ -231,6 +240,7 @@ radv_create_shader_variants_from_pipeline_cache(struct
> radv_device *device,
> > p_atomic_inc(&entry->variants[i]->ref_count);
> >
> > memcpy(variants, entry->variants, sizeof(entry->variants));
> > +   pthread_mutex_unlock(&cache->mutex);
> > return true;
> 
