Re: [Mesa-dev] [PATCH 6/8] radeonsi: use WRITE_DATA for small glBufferSubData sizes

2019-01-18 Thread Marek Olšák
FYI, I'm squashing this patch into patch 8.

Marek

On Fri, Jan 18, 2019 at 8:23 PM Bas Nieuwenhuizen 
wrote:

> Ack, patches 1-6 are
>
> Reviewed-by: Bas Nieuwenhuizen 
>
> On Sat, Jan 19, 2019 at 2:08 AM Marek Olšák  wrote:
> >
> > On Fri, Jan 18, 2019 at 6:05 PM Bas Nieuwenhuizen <
> b...@basnieuwenhuizen.nl> wrote:
> >>
> >> On Fri, Jan 18, 2019 at 5:44 PM Marek Olšák  wrote:
> >> >
> >> > From: Marek Olšák 
> >> >
> >> > ---
> >> >  src/gallium/drivers/radeonsi/si_buffer.c | 27
> 
> >> >  src/gallium/drivers/radeonsi/si_pipe.h   |  1 +
> >> >  2 files changed, 28 insertions(+)
> >> >
> >> > diff --git a/src/gallium/drivers/radeonsi/si_buffer.c
> b/src/gallium/drivers/radeonsi/si_buffer.c
> >> > index 4766cf4bdfa..a1e421b8b0d 100644
> >> > --- a/src/gallium/drivers/radeonsi/si_buffer.c
> >> > +++ b/src/gallium/drivers/radeonsi/si_buffer.c
> >> > @@ -16,20 +16,22 @@
> >> >   * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> >> >   * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> >> >   * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO
> EVENT SHALL
> >> >   * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
> >> >   * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> TORT OR
> >> >   * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> SOFTWARE OR THE
> >> >   * USE OR OTHER DEALINGS IN THE SOFTWARE.
> >> >   */
> >> >
> >> >  #include "radeonsi/si_pipe.h"
> >> > +#include "sid.h"
> >> > +
> >> >  #include "util/u_memory.h"
> >> >  #include "util/u_upload_mgr.h"
> >> >  #include "util/u_transfer.h"
> >> >  #include 
> >> >  #include 
> >> >
> >> >  bool si_rings_is_buffer_referenced(struct si_context *sctx,
> >> >struct pb_buffer *buf,
> >> >enum radeon_bo_usage usage)
> >> >  {
> >> > @@ -506,20 +508,38 @@ static void *si_buffer_transfer_map(struct
> pipe_context *ctx,
> >> > data = si_buffer_map_sync_with_rings(sctx, rbuffer, usage);
> >> > if (!data) {
> >> > return NULL;
> >> > }
> >> > data += box->x;
> >> >
> >> > return si_buffer_get_transfer(ctx, resource, usage, box,
> >> > ptransfer, data, NULL, 0);
> >> >  }
> >> >
> >> > +static void si_buffer_write_data(struct si_context *sctx, struct
> r600_resource *buf,
> >> > +unsigned offset, unsigned size,
> const void *data)
> >> > +{
> >> > +   struct radeon_cmdbuf *cs = sctx->gfx_cs;
> >> > +
> >> > +   si_need_gfx_cs_space(sctx);
> >> > +
> >> > +   sctx->flags |= SI_CONTEXT_PS_PARTIAL_FLUSH |
> >> > +  SI_CONTEXT_CS_PARTIAL_FLUSH |
> >> > +  si_get_flush_flags(sctx, SI_COHERENCY_SHADER,
> L2_LRU);
> >> > +   si_emit_cache_flush(sctx);
> >>
> >> Maybe only do the cache flush if the buffer is referenced by the
> >> current cmd buffer?
> >
> >
> > We can't do that, because 2 consecutive IBs can execute simultaneously
> for a moment.
> >
> > Marek
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/9] radeonsi: rename r600_resource -> si_resource

2019-01-18 Thread Bas Nieuwenhuizen
Not really a fan of the whole renaming thing due to the blame and
cherrypicking churn, but a bunch of these are long overdue. The series
is

Reviewed-by: Bas Nieuwenhuizen 


Re: [Mesa-dev] [PATCH 6/8] radeonsi: use WRITE_DATA for small glBufferSubData sizes

2019-01-18 Thread Bas Nieuwenhuizen
Ack, patches 1-6 are

Reviewed-by: Bas Nieuwenhuizen 

On Sat, Jan 19, 2019 at 2:08 AM Marek Olšák  wrote:
>
> On Fri, Jan 18, 2019 at 6:05 PM Bas Nieuwenhuizen  
> wrote:
>>
>> On Fri, Jan 18, 2019 at 5:44 PM Marek Olšák  wrote:
>> >
>> > From: Marek Olšák 
>> >
>> > ---
>> >  src/gallium/drivers/radeonsi/si_buffer.c | 27 
>> >  src/gallium/drivers/radeonsi/si_pipe.h   |  1 +
>> >  2 files changed, 28 insertions(+)
>> >
>> > diff --git a/src/gallium/drivers/radeonsi/si_buffer.c 
>> > b/src/gallium/drivers/radeonsi/si_buffer.c
>> > index 4766cf4bdfa..a1e421b8b0d 100644
>> > --- a/src/gallium/drivers/radeonsi/si_buffer.c
>> > +++ b/src/gallium/drivers/radeonsi/si_buffer.c
>> > @@ -16,20 +16,22 @@
>> >   * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
>> > EXPRESS OR
>> >   * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
>> > MERCHANTABILITY,
>> >   * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT 
>> > SHALL
>> >   * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
>> >   * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>> >   * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR 
>> > THE
>> >   * USE OR OTHER DEALINGS IN THE SOFTWARE.
>> >   */
>> >
>> >  #include "radeonsi/si_pipe.h"
>> > +#include "sid.h"
>> > +
>> >  #include "util/u_memory.h"
>> >  #include "util/u_upload_mgr.h"
>> >  #include "util/u_transfer.h"
>> >  #include 
>> >  #include 
>> >
>> >  bool si_rings_is_buffer_referenced(struct si_context *sctx,
>> >struct pb_buffer *buf,
>> >enum radeon_bo_usage usage)
>> >  {
>> > @@ -506,20 +508,38 @@ static void *si_buffer_transfer_map(struct 
>> > pipe_context *ctx,
>> > data = si_buffer_map_sync_with_rings(sctx, rbuffer, usage);
>> > if (!data) {
>> > return NULL;
>> > }
>> > data += box->x;
>> >
>> > return si_buffer_get_transfer(ctx, resource, usage, box,
>> > ptransfer, data, NULL, 0);
>> >  }
>> >
>> > +static void si_buffer_write_data(struct si_context *sctx, struct 
>> > r600_resource *buf,
>> > +unsigned offset, unsigned size, const 
>> > void *data)
>> > +{
>> > +   struct radeon_cmdbuf *cs = sctx->gfx_cs;
>> > +
>> > +   si_need_gfx_cs_space(sctx);
>> > +
>> > +   sctx->flags |= SI_CONTEXT_PS_PARTIAL_FLUSH |
>> > +  SI_CONTEXT_CS_PARTIAL_FLUSH |
>> > +  si_get_flush_flags(sctx, SI_COHERENCY_SHADER, 
>> > L2_LRU);
>> > +   si_emit_cache_flush(sctx);
>>
>> Maybe only do the cache flush if the buffer is referenced by the
>> current cmd buffer?
>
>
> We can't do that, because 2 consecutive IBs can execute simultaneously for a 
> moment.
>
> Marek


Re: [Mesa-dev] [PATCH 7/8] gallium/util: add a linear allocator for reducing malloc overhead

2019-01-18 Thread Bas Nieuwenhuizen
On Sat, Jan 19, 2019 at 2:10 AM Marek Olšák  wrote:
>
> On Fri, Jan 18, 2019 at 6:08 PM Bas Nieuwenhuizen  
> wrote:
>>
>> On Fri, Jan 18, 2019 at 5:44 PM Marek Olšák  wrote:
>> >
>> > From: Marek Olšák 
>> >
>> > ---
>> >  src/gallium/auxiliary/Makefile.sources  |  1 +
>> >  src/gallium/auxiliary/meson.build   |  1 +
>> >  src/gallium/auxiliary/util/u_cpu_suballoc.h | 90 +
>> >  3 files changed, 92 insertions(+)
>> >  create mode 100644 src/gallium/auxiliary/util/u_cpu_suballoc.h
>> >
>> > diff --git a/src/gallium/auxiliary/Makefile.sources 
>> > b/src/gallium/auxiliary/Makefile.sources
>> > index 50e88088ff8..b26415858f6 100644
>> > --- a/src/gallium/auxiliary/Makefile.sources
>> > +++ b/src/gallium/auxiliary/Makefile.sources
>> > @@ -211,20 +211,21 @@ C_SOURCES := \
>> > util/u_bitmask.c \
>> > util/u_bitmask.h \
>> > util/u_blend.h \
>> > util/u_blit.c \
>> > util/u_blit.h \
>> > util/u_blitter.c \
>> > util/u_blitter.h \
>> > util/u_box.h \
>> > util/u_cache.c \
>> > util/u_cache.h \
>> > +   util/u_cpu_suballoc.h \
>> > util/u_debug_gallium.h \
>> > util/u_debug_gallium.c \
>> > util/u_debug_describe.c \
>> > util/u_debug_describe.h \
>> > util/u_debug_flush.c \
>> > util/u_debug_flush.h \
>> > util/u_debug_image.c \
>> > util/u_debug_image.h \
>> > util/u_debug_memory.c \
>> > util/u_debug_refcnt.c \
>> > diff --git a/src/gallium/auxiliary/meson.build 
>> > b/src/gallium/auxiliary/meson.build
>> > index 57f7e69050f..7e1e4732421 100644
>> > --- a/src/gallium/auxiliary/meson.build
>> > +++ b/src/gallium/auxiliary/meson.build
>> > @@ -231,20 +231,21 @@ files_libgallium = files(
>> >'util/u_bitmask.c',
>> >'util/u_bitmask.h',
>> >'util/u_blend.h',
>> >'util/u_blit.c',
>> >'util/u_blit.h',
>> >'util/u_blitter.c',
>> >'util/u_blitter.h',
>> >'util/u_box.h',
>> >'util/u_cache.c',
>> >'util/u_cache.h',
>> > +  'util/u_cpu_suballoc.h',
>> >'util/u_debug_gallium.h',
>> >'util/u_debug_gallium.c',
>> >'util/u_debug_describe.c',
>> >'util/u_debug_describe.h',
>> >'util/u_debug_flush.c',
>> >'util/u_debug_flush.h',
>> >'util/u_debug_image.c',
>> >'util/u_debug_image.h',
>> >'util/u_debug_memory.c',
>> >'util/u_debug_refcnt.c',
>> > diff --git a/src/gallium/auxiliary/util/u_cpu_suballoc.h 
>> > b/src/gallium/auxiliary/util/u_cpu_suballoc.h
>> > new file mode 100644
>> > index 000..2373c1f7c70
>> > --- /dev/null
>> > +++ b/src/gallium/auxiliary/util/u_cpu_suballoc.h
>> > @@ -0,0 +1,90 @@
>> > +/**
>> > + *
>> > + * Copyright 2019 Advanced Micro Devices, Inc.
>> > + * All Rights Reserved.
>> > + *
>> > + * Permission is hereby granted, free of charge, to any person obtaining a
>> > + * copy of this software and associated documentation files (the
>> > + * "Software"), to deal in the Software without restriction, including
>> > + * without limitation the rights to use, copy, modify, merge, publish,
>> > + * distribute, sub license, and/or sell copies of the Software, and to
>> > + * permit persons to whom the Software is furnished to do so, subject to
>> > + * the following conditions:
>> > + *
>> > + * The above copyright notice and this permission notice (including the
>> > + * next paragraph) shall be included in all copies or substantial portions
>> > + * of the Software.
>> > + *
>> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
>> > + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
>> > + * IN NO EVENT SHALL AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR
>> > + * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF 
>> > CONTRACT,
>> > + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
>> > + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>> > + *
>> > + 
>> > **/
>> > +
>> > +/* A simple utility for suballocating out of malloc_aligned. */
>> > +
>> > +#ifndef U_CPU_SUBALLOC_H
>> > +#define U_CPU_SUBALLOC_H
>> > +
>> > +#include 
>> > +#include "util/os_memory.h"
>> > +
>> > +struct u_cpu_suballoc {
>> > +   unsigned default_size;  /* Default size of the buffer, in bytes. */
>> > +   unsigned current_size;  /* Current size of the buffer, in bytes. */
>> > +   unsigned alignment; /* malloc alignment. */
>> > +   unsigned offset;/* Offset pointing to the first unused byte. */
>> > +   uint8_t *buffer;/* Pointer to the CPU buffer. */
>> > +};
>> > +
>> > +
>> > +static inline void
>> > +u_cpu_suballoc_init(struct u_cpu_suballoc *alloc, unsigned default_size,
>> > +   unsigned alignment)
>> > +{
>> > + 

Re: [Mesa-dev] [PATCH] mesa: add EXT_debug_label support

2019-01-18 Thread Timothy Arceri



On 19/1/19 12:02 pm, Eric Anholt wrote:

Timothy Arceri  writes:


On 11/12/18 5:11 pm, Ian Romanick wrote:

On 12/10/18 5:52 PM, Timothy Arceri wrote:

On 11/12/18 11:35 am, Ian Romanick wrote:

It seems like someone already sent out patches to implement this, and we
decided to not take it for some reason.  Maybe it was Rob?


I discovered a thread from the beginning of 2017 titled "feature.txt &
EXT_debug_label extension". But couldn't find any implementation.

There was a reply from yourself, but it seems incorrect to me:

"I checked both extensions, and they're not "just" aliases.  The EXT adds
a single function with an enum to select the kind of object.  The KHR
adds a function per kind of object.  It would be easy enough to add, but
it seems more valuable to suggest the developer use the more broadly
supported extension."


That's weird for a couple reasons.  One, that's not even the discussion
that I was thinking of.  I'll check in the morning to see if I can find
it.  Two, I was clearly full of it... I really don't see how I came to that
conclusion.  I don't even see any other related extensions that I could
have been confusing either thing with.


So do you think we should push this?


I don't see any piglit or CTS tests for this extension.



Maybe I should have worded my reply above better. I'm happy to port the 
KHR_debug equivalent test. First I just want to know if people think we 
should go ahead and add support for this extension.



Re: [Mesa-dev] [PATCH 6/8] i965: Added support for ETC2 mipmaps

2019-01-18 Thread Nanley Chery
On Mon, Nov 19, 2018 at 10:54:10AM +0200, Eleni Maria Stea wrote:
> Extended intel_update_decompressed_shadow to update all the mipmap
> tree levels so that we can display and run Get functions on mipmaps.
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 48 +++
>  1 file changed, 29 insertions(+), 19 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index ef3e2c33d3..4886bb2b96 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -3962,15 +3962,16 @@ intel_update_decompressed_shadow(struct brw_context 
> *brw,
> int img_h = smt->surf.logical_level0_px.height;
> int img_d = smt->surf.logical_level0_px.depth;
>  
> -   ptrdiff_t shadow_stride = _mesa_format_row_stride(smt->format, img_w);
> +   int level_w = img_w;
> +   int level_h = img_h;
>  
> for (int level = smt->first_level; level <= smt->last_level; level++) {
> -  struct compressed_pixelstore store;
> -  _mesa_compute_compressed_pixelstore(mt->surf.dim,
> -  mt->format,
> -  img_w, img_h, img_d,
> -  &brw->ctx.Unpack,
> -  &store);
> +  ptrdiff_t shadow_stride = _mesa_format_row_stride(smt->format,
> +level_w);
> +
> +  ptrdiff_t main_stride = _mesa_format_row_stride(mt->format,
> +  level_w);
> +
>for (unsigned int slice = 0; slice < img_d; slice++) {
>   GLbitfield mmode = GL_MAP_READ_BIT | BRW_MAP_DIRECT_BIT |
>  BRW_MAP_ETC_BIT;
> @@ -3978,30 +3979,39 @@ intel_update_decompressed_shadow(struct brw_context 
> *brw,
>  GL_MAP_INVALIDATE_RANGE_BIT |
>  BRW_MAP_DIRECT_BIT;
>  
> - uint32_t img_x, img_y;
> - intel_miptree_get_image_offset(smt, level, slice, &img_x, &img_y);
> + uint32_t slevel_x, slevel_y;
> + intel_miptree_get_image_offset(smt, level, slice, &slevel_x,
> +&slevel_y);
> +
> + uint32_t mlevel_x, mlevel_y;
> + intel_miptree_get_image_offset(mt, level, slice, &mlevel_x,
> +&mlevel_y);
> +
> + void *mptr;
> + intel_miptree_map(brw, mt, level, slice, 0, 0,
> +   level_w, level_h, mmode, &mptr, &main_stride);
>  
> - void *mptr = intel_miptree_map_raw(brw, mt, mmode) + mt->offset
> -+ img_y * store.TotalBytesPerRow
> -+ img_x * store.TotalBytesPerRow / img_w;
>  
>   void *sptr;
> - intel_miptree_map(brw, smt, level, slice, img_x, img_y, img_w, 
> img_h,
> -   smode, &sptr, &shadow_stride);
> + intel_miptree_map(brw, smt, level, slice, 0, 0, level_w,
> +   level_h, smode, &sptr, &shadow_stride);
>  
>   if (mt->format == MESA_FORMAT_ETC1_RGB8) {
>  _mesa_etc1_unpack_rgba(sptr, shadow_stride,
> -   mptr, store.TotalBytesPerRow,
> -   img_w, img_h);
> +   mptr, main_stride,
> +   level_w, level_h);
>   } else {
>  _mesa_unpack_etc2_format(sptr, shadow_stride,
> - mptr, store.TotalBytesPerRow,
> - img_w, img_h, mt->format, true);
> + mptr, main_stride,
> + level_w, level_h, mt->format, true);
>   }
>  
> - intel_miptree_unmap_raw(mt);
> + intel_miptree_unmap(brw, mt, level, slice);
>   intel_miptree_unmap(brw, smt, level, slice);
>}
> +
> +  level_w /= 2;
> +  level_h /= 2;

You want to use minify() here to prevent level_w or level_h from becoming 0.

-Nanley
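
For illustration, a minimal sketch of the clamping behaviour this comment asks for, assuming a minify()-style helper that shifts a dimension right by the level count and never returns less than 1. The name minify_sketch is hypothetical and is not the helper used in the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for a minify()-style helper: halve `value`
 * `levels` times, but never let the result drop below 1. */
static unsigned minify_sketch(unsigned value, unsigned levels)
{
   /* Shifting by the full width of the type is undefined in C. */
   assert(levels < sizeof(unsigned) * 8);

   unsigned v = value >> levels;
   return v > 0 ? v : 1;
}
```

With an 8x2 base level, a plain "/= 2" per level would take the height to 0 at level 2, whereas the clamped helper keeps it at 1 for every remaining level.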

> }
>  
> mt->shadow_needs_update = false;
> -- 
> 2.19.0
> 


Re: [Mesa-dev] [PATCH 7/8] gallium/util: add a linear allocator for reducing malloc overhead

2019-01-18 Thread Marek Olšák
On Fri, Jan 18, 2019 at 6:08 PM Bas Nieuwenhuizen 
wrote:

> On Fri, Jan 18, 2019 at 5:44 PM Marek Olšák  wrote:
> >
> > From: Marek Olšák 
> >
> > ---
> >  src/gallium/auxiliary/Makefile.sources  |  1 +
> >  src/gallium/auxiliary/meson.build   |  1 +
> >  src/gallium/auxiliary/util/u_cpu_suballoc.h | 90 +
> >  3 files changed, 92 insertions(+)
> >  create mode 100644 src/gallium/auxiliary/util/u_cpu_suballoc.h
> >
> > diff --git a/src/gallium/auxiliary/Makefile.sources
> b/src/gallium/auxiliary/Makefile.sources
> > index 50e88088ff8..b26415858f6 100644
> > --- a/src/gallium/auxiliary/Makefile.sources
> > +++ b/src/gallium/auxiliary/Makefile.sources
> > @@ -211,20 +211,21 @@ C_SOURCES := \
> > util/u_bitmask.c \
> > util/u_bitmask.h \
> > util/u_blend.h \
> > util/u_blit.c \
> > util/u_blit.h \
> > util/u_blitter.c \
> > util/u_blitter.h \
> > util/u_box.h \
> > util/u_cache.c \
> > util/u_cache.h \
> > +   util/u_cpu_suballoc.h \
> > util/u_debug_gallium.h \
> > util/u_debug_gallium.c \
> > util/u_debug_describe.c \
> > util/u_debug_describe.h \
> > util/u_debug_flush.c \
> > util/u_debug_flush.h \
> > util/u_debug_image.c \
> > util/u_debug_image.h \
> > util/u_debug_memory.c \
> > util/u_debug_refcnt.c \
> > diff --git a/src/gallium/auxiliary/meson.build
> b/src/gallium/auxiliary/meson.build
> > index 57f7e69050f..7e1e4732421 100644
> > --- a/src/gallium/auxiliary/meson.build
> > +++ b/src/gallium/auxiliary/meson.build
> > @@ -231,20 +231,21 @@ files_libgallium = files(
> >'util/u_bitmask.c',
> >'util/u_bitmask.h',
> >'util/u_blend.h',
> >'util/u_blit.c',
> >'util/u_blit.h',
> >'util/u_blitter.c',
> >'util/u_blitter.h',
> >'util/u_box.h',
> >'util/u_cache.c',
> >'util/u_cache.h',
> > +  'util/u_cpu_suballoc.h',
> >'util/u_debug_gallium.h',
> >'util/u_debug_gallium.c',
> >'util/u_debug_describe.c',
> >'util/u_debug_describe.h',
> >'util/u_debug_flush.c',
> >'util/u_debug_flush.h',
> >'util/u_debug_image.c',
> >'util/u_debug_image.h',
> >'util/u_debug_memory.c',
> >'util/u_debug_refcnt.c',
> > diff --git a/src/gallium/auxiliary/util/u_cpu_suballoc.h
> b/src/gallium/auxiliary/util/u_cpu_suballoc.h
> > new file mode 100644
> > index 000..2373c1f7c70
> > --- /dev/null
> > +++ b/src/gallium/auxiliary/util/u_cpu_suballoc.h
> > @@ -0,0 +1,90 @@
> >
> +/**
> > + *
> > + * Copyright 2019 Advanced Micro Devices, Inc.
> > + * All Rights Reserved.
> > + *
> > + * Permission is hereby granted, free of charge, to any person
> obtaining a
> > + * copy of this software and associated documentation files (the
> > + * "Software"), to deal in the Software without restriction, including
> > + * without limitation the rights to use, copy, modify, merge, publish,
> > + * distribute, sub license, and/or sell copies of the Software, and to
> > + * permit persons to whom the Software is furnished to do so, subject to
> > + * the following conditions:
> > + *
> > + * The above copyright notice and this permission notice (including the
> > + * next paragraph) shall be included in all copies or substantial
> portions
> > + * of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS
> > + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> NON-INFRINGEMENT.
> > + * IN NO EVENT SHALL AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR
> > + * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
> CONTRACT,
> > + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> > + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
> > + *
> > +
> **/
> > +
> > +/* A simple utility for suballocating out of malloc_aligned. */
> > +
> > +#ifndef U_CPU_SUBALLOC_H
> > +#define U_CPU_SUBALLOC_H
> > +
> > +#include 
> > +#include "util/os_memory.h"
> > +
> > +struct u_cpu_suballoc {
> > +   unsigned default_size;  /* Default size of the buffer, in bytes. */
> > +   unsigned current_size;  /* Current size of the buffer, in bytes. */
> > +   unsigned alignment; /* malloc alignment. */
> > +   unsigned offset;/* Offset pointing to the first unused byte.
> */
> > +   uint8_t *buffer;/* Pointer to the CPU buffer. */
> > +};
> > +
> > +
> > +static inline void
> > +u_cpu_suballoc_init(struct u_cpu_suballoc *alloc, unsigned default_size,
> > +   unsigned alignment)
> > +{
> > +   memset(alloc, 0, sizeof(*alloc));
> > +   alloc->default_size = default_size;
> > +   alloc->alignment = alignment;
> > +}
> > +
> > +
> > +static inline void
> > 
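
Since the quoted header is cut off before the allocation function, here is a rough sketch of what a linear (bump) allocator of this kind can look like. This is not the code from the patch: the names lin_alloc, lin_chunk, lin_alloc_get and lin_alloc_destroy are hypothetical, plain malloc is used instead of an aligned allocation helper, and the chunk list is an assumption made so that everything can be freed at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One malloc'ed block that suballocations are carved out of. */
struct lin_chunk {
   struct lin_chunk *next;
   uint8_t *data;
   size_t size;     /* usable bytes in data */
   size_t offset;   /* first unused byte */
};

struct lin_alloc {
   size_t default_size;   /* payload size of a fresh chunk, in bytes */
   size_t alignment;      /* power of two, applied to every suballocation */
   struct lin_chunk *head;
};

static void lin_alloc_init(struct lin_alloc *a, size_t default_size,
                           size_t alignment)
{
   assert(default_size > 0);
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   a->default_size = default_size;
   a->alignment = alignment;
   a->head = NULL;
}

/* Bump the chunk's offset; return NULL if the request does not fit. */
static void *chunk_take(struct lin_chunk *c, size_t size, size_t alignment)
{
   uintptr_t base = (uintptr_t)c->data;
   uintptr_t p = (base + c->offset + alignment - 1) &
                 ~(uintptr_t)(alignment - 1);
   size_t start = (size_t)(p - base);

   if (start > c->size || size > c->size - start)
      return NULL;
   c->offset = start + size;
   return c->data + start;
}

static void *lin_alloc_get(struct lin_alloc *a, size_t size)
{
   void *p = a->head ? chunk_take(a->head, size, a->alignment) : NULL;
   if (p)
      return p;

   /* Start a new chunk; oversized requests get a chunk of their own size. */
   size_t payload = size > a->default_size ? size : a->default_size;
   if (payload > SIZE_MAX - a->alignment)
      return NULL;

   struct lin_chunk *c = malloc(sizeof(*c));
   if (!c)
      return NULL;
   c->size = payload + a->alignment - 1;   /* slack for aligning the start */
   c->data = malloc(c->size);
   if (!c->data) {
      free(c);
      return NULL;
   }
   c->offset = 0;
   c->next = a->head;
   a->head = c;
   return chunk_take(c, size, a->alignment);
}

/* Individual suballocations are never freed; everything goes at once. */
static void lin_alloc_destroy(struct lin_alloc *a)
{
   while (a->head) {
      struct lin_chunk *next = a->head->next;
      free(a->head->data);
      free(a->head);
      a->head = next;
   }
}
```

The point of the design is that most requests only bump an offset, so malloc is called once per chunk rather than once per object. The cost is that memory is only returned when the whole allocator is destroyed.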

Re: [Mesa-dev] [PATCH 4/8] i965: Update the shadow miptree from the main to fake the ETC2 compression

2019-01-18 Thread Nanley Chery
On Mon, Nov 19, 2018 at 10:54:08AM +0200, Eleni Maria Stea wrote:
> On GPUs gen < 8 that don't support ETC2 sampling/rendering we now fake
> the support using 2 mipmap trees: one (the main) that stores the
> compressed data for the Get* functions to work and one (the shadow) that
> stores the same data decompressed for the render/sampling to work.
> 
> Added the intel_update_decompressed_shadow function to update the shadow
> tree with the decompressed data whenever the compressed data in the
> main miptree changes.
> ---
>  .../drivers/dri/i965/brw_wm_surface_state.c   |  1 +
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 70 ++-
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  3 +
>  3 files changed, 71 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
> b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> index 4d1eafac91..2e6d85e1fe 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> @@ -579,6 +579,7 @@ static void brw_update_texture_surface(struct gl_context 
> *ctx,
>  
>if (obj->StencilSampling && firstImage->_BaseFormat == 
> GL_DEPTH_STENCIL) {
>   if (devinfo->gen <= 7) {
> +assert(!intel_obj->mt->needs_fake_etc);
>  assert(mt->shadow_mt && !mt->stencil_mt->shadow_needs_update);
>  mt = mt->shadow_mt;
>   } else {
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index b24332ff67..ef3e2c33d3 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -3740,12 +3740,15 @@ intel_miptree_map(struct brw_context *brw,
> assert(mt->surf.samples == 1);
>  
> if (mt->needs_fake_etc) {
> -  if (!(mode & BRW_MAP_ETC_BIT)) {
> +  if (!(mode & BRW_MAP_ETC_BIT) && !(mode & GL_MAP_READ_BIT)) {
>   assert(mt->shadow_mt);
>  
> - mt->is_shadow_mapped = true;
> + if (mt->shadow_needs_update) {
> +intel_update_decompressed_shadow(brw, mt);
> +mt->shadow_needs_update = false;
> + }
>  
> - mt->shadow_needs_update = false;
> + mt->is_shadow_mapped = true;
>   mt = miptree->shadow_mt;
>} else {
>   mt->is_shadow_mapped = false;
> @@ -3762,6 +3765,8 @@ intel_miptree_map(struct brw_context *brw,
>  
> map = intel_miptree_attach_map(mt, level, slice, x, y, w, h, mode);
> if (!map){
> +  miptree->is_shadow_mapped = false;
> +
>*out_ptr = NULL;
>*out_stride = 0;
>return;
> @@ -3942,3 +3947,62 @@ intel_miptree_get_clear_color(const struct 
> gen_device_info *devinfo,
>return mt->fast_clear_color;
> }
>  }
> +
> +void
> +intel_update_decompressed_shadow(struct brw_context *brw,
> + struct intel_mipmap_tree *mt)
> +{
> +   struct intel_mipmap_tree *smt = mt->shadow_mt;
> +
> +   assert(smt);
> +   assert(mt->needs_fake_etc);
> +   assert(mt->surf.size_B > 0);
> +
> +   int img_w = smt->surf.logical_level0_px.width;
> +   int img_h = smt->surf.logical_level0_px.height;
> +   int img_d = smt->surf.logical_level0_px.depth;

I don't think 3D ETC textures are possible. From the GL4.6 spec:

An INVALID_OPERATION error is generated by CompressedTexImage3D
if internalformat is one of the EAC, ETC2, or RGTC formats and
either border is non-zero, or target is not TEXTURE_2D_ARRAY.

> +
> +   ptrdiff_t shadow_stride = _mesa_format_row_stride(smt->format, img_w);
> +

This variable gets overwritten when calling intel_miptree_map().

> +   for (int level = smt->first_level; level <= smt->last_level; level++) {

Since we're already iterating levels here we should fold in patch 6 to
get the right level dimensions.

> +  struct compressed_pixelstore store;
> +  _mesa_compute_compressed_pixelstore(mt->surf.dim,
> +  mt->format,
> +  img_w, img_h, img_d,
> +  &brw->ctx.Unpack,
> +  &store);

store.TotalBytesPerRow will give you the pitch for a buffer allocated
without padding. mt->surf->row_pitch_B gives you the actual pitch.
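
To make the distinction concrete, a small sketch of the two quantities follows. The tightly packed row size comes straight from the 4x4 block layout (8 bytes per block for ETC1/ETC2 RGB8, 16 bytes for ETC2 RGBA8). The padded pitch is shown with a made-up power-of-two alignment parameter, since the real alignment rules are decided by the surface layout code and are not stated in this thread. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per row of 4x4 blocks when rows are tightly packed.
 * block_bytes is 8 for ETC1/ETC2 RGB8 and 16 for ETC2 RGBA8. */
static uint32_t packed_row_bytes(uint32_t width_px, uint32_t block_bytes)
{
   assert(width_px > 0);
   return ((width_px + 3) / 4) * block_bytes;
}

/* Hypothetical padded pitch: the packed row rounded up to pitch_align,
 * which must be a power of two. Real hardware rules may differ. */
static uint32_t padded_row_pitch(uint32_t width_px, uint32_t block_bytes,
                                 uint32_t pitch_align)
{
   assert(pitch_align != 0 && (pitch_align & (pitch_align - 1)) == 0);

   uint32_t packed = packed_row_bytes(width_px, block_bytes);
   return (packed + pitch_align - 1) & ~(pitch_align - 1);
}
```

For a 10-pixel-wide RGB8 level the packed row is 24 bytes, while a 64-byte pitch alignment would place consecutive block rows 64 bytes apart. Indexing the mapped memory with the packed value would then read the wrong rows.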

> +  for (unsigned int slice = 0; slice < img_d; slice++) {
> + GLbitfield mmode = GL_MAP_READ_BIT | BRW_MAP_DIRECT_BIT |
> +BRW_MAP_ETC_BIT;
> + GLbitfield smode = GL_MAP_WRITE_BIT |
> +GL_MAP_INVALIDATE_RANGE_BIT |
> +BRW_MAP_DIRECT_BIT;
> +
> + uint32_t img_x, img_y;
> + intel_miptree_get_image_offset(smt, level, slice, &img_x, &img_y);
> +
> + void *mptr = intel_miptree_map_raw(brw, mt, mmode) + mt->offset
> ++ img_y * store.TotalBytesPerRow
> ++ img_x * 

Re: [Mesa-dev] [PATCH 6/8] radeonsi: use WRITE_DATA for small glBufferSubData sizes

2019-01-18 Thread Marek Olšák
On Fri, Jan 18, 2019 at 6:05 PM Bas Nieuwenhuizen 
wrote:

> On Fri, Jan 18, 2019 at 5:44 PM Marek Olšák  wrote:
> >
> > From: Marek Olšák 
> >
> > ---
> >  src/gallium/drivers/radeonsi/si_buffer.c | 27 
> >  src/gallium/drivers/radeonsi/si_pipe.h   |  1 +
> >  2 files changed, 28 insertions(+)
> >
> > diff --git a/src/gallium/drivers/radeonsi/si_buffer.c
> b/src/gallium/drivers/radeonsi/si_buffer.c
> > index 4766cf4bdfa..a1e421b8b0d 100644
> > --- a/src/gallium/drivers/radeonsi/si_buffer.c
> > +++ b/src/gallium/drivers/radeonsi/si_buffer.c
> > @@ -16,20 +16,22 @@
> >   * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> >   * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> >   * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT
> SHALL
> >   * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
> >   * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
> >   * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
> OR THE
> >   * USE OR OTHER DEALINGS IN THE SOFTWARE.
> >   */
> >
> >  #include "radeonsi/si_pipe.h"
> > +#include "sid.h"
> > +
> >  #include "util/u_memory.h"
> >  #include "util/u_upload_mgr.h"
> >  #include "util/u_transfer.h"
> >  #include 
> >  #include 
> >
> >  bool si_rings_is_buffer_referenced(struct si_context *sctx,
> >struct pb_buffer *buf,
> >enum radeon_bo_usage usage)
> >  {
> > @@ -506,20 +508,38 @@ static void *si_buffer_transfer_map(struct
> pipe_context *ctx,
> > data = si_buffer_map_sync_with_rings(sctx, rbuffer, usage);
> > if (!data) {
> > return NULL;
> > }
> > data += box->x;
> >
> > return si_buffer_get_transfer(ctx, resource, usage, box,
> > ptransfer, data, NULL, 0);
> >  }
> >
> > +static void si_buffer_write_data(struct si_context *sctx, struct
> r600_resource *buf,
> > +unsigned offset, unsigned size, const
> void *data)
> > +{
> > +   struct radeon_cmdbuf *cs = sctx->gfx_cs;
> > +
> > +   si_need_gfx_cs_space(sctx);
> > +
> > +   sctx->flags |= SI_CONTEXT_PS_PARTIAL_FLUSH |
> > +  SI_CONTEXT_CS_PARTIAL_FLUSH |
> > +  si_get_flush_flags(sctx, SI_COHERENCY_SHADER,
> L2_LRU);
> > +   si_emit_cache_flush(sctx);
>
> Maybe only do the cache flush if the buffer is referenced by the
> current cmd buffer?
>

We can't do that, because 2 consecutive IBs can execute simultaneously for
a moment.

Marek


Re: [Mesa-dev] [PATCH 6/8] radeonsi: use WRITE_DATA for small glBufferSubData sizes

2019-01-18 Thread Marek Olšák
On Fri, Jan 18, 2019 at 6:05 PM Bas Nieuwenhuizen 
wrote:

> On Fri, Jan 18, 2019 at 5:44 PM Marek Olšák  wrote:
> >
> > From: Marek Olšák 
> >
> > ---
> >  src/gallium/drivers/radeonsi/si_buffer.c | 27 
> >  src/gallium/drivers/radeonsi/si_pipe.h   |  1 +
> >  2 files changed, 28 insertions(+)
> >
> > diff --git a/src/gallium/drivers/radeonsi/si_buffer.c
> b/src/gallium/drivers/radeonsi/si_buffer.c
> > index 4766cf4bdfa..a1e421b8b0d 100644
> > --- a/src/gallium/drivers/radeonsi/si_buffer.c
> > +++ b/src/gallium/drivers/radeonsi/si_buffer.c
> > @@ -16,20 +16,22 @@
> >   * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> >   * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> >   * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT
> SHALL
> >   * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
> >   * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
> >   * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
> OR THE
> >   * USE OR OTHER DEALINGS IN THE SOFTWARE.
> >   */
> >
> >  #include "radeonsi/si_pipe.h"
> > +#include "sid.h"
> > +
> >  #include "util/u_memory.h"
> >  #include "util/u_upload_mgr.h"
> >  #include "util/u_transfer.h"
> >  #include 
> >  #include 
> >
> >  bool si_rings_is_buffer_referenced(struct si_context *sctx,
> >struct pb_buffer *buf,
> >enum radeon_bo_usage usage)
> >  {
> > @@ -506,20 +508,38 @@ static void *si_buffer_transfer_map(struct
> pipe_context *ctx,
> > data = si_buffer_map_sync_with_rings(sctx, rbuffer, usage);
> > if (!data) {
> > return NULL;
> > }
> > data += box->x;
> >
> > return si_buffer_get_transfer(ctx, resource, usage, box,
> > ptransfer, data, NULL, 0);
> >  }
> >
> > +static void si_buffer_write_data(struct si_context *sctx, struct
> r600_resource *buf,
> > +unsigned offset, unsigned size, const
> void *data)
> > +{
> > +   struct radeon_cmdbuf *cs = sctx->gfx_cs;
> > +
> > +   si_need_gfx_cs_space(sctx);
> > +
> > +   sctx->flags |= SI_CONTEXT_PS_PARTIAL_FLUSH |
> > +  SI_CONTEXT_CS_PARTIAL_FLUSH |
> > +  si_get_flush_flags(sctx, SI_COHERENCY_SHADER,
> L2_LRU);
> > +   si_emit_cache_flush(sctx);
>
> Maybe only do the cache flush if the buffer is referenced by the
> current cmd buffer?
>
> (I'm kinda surprised reading this that we don't do
> DISCARD_WHOLE_RESOURCE if the offset is 0 and the size equal to the
> buffer size. )
>

Thanks, that's a good point. We promote to DISCARD_WHOLE_RESOURCE in
si_buffer_transfer_map. I'll remove the hunk in si_buffer_subdata, because
it doesn't do optimizations that si_buffer_transfer_map does.

Marek
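
As a rough illustration of the promotion discussed here, the sketch below upgrades a range discard to a whole-resource discard when the written range covers the entire buffer. The flag names and the function are hypothetical stand-ins, not the driver's transfer flags, and the real si_buffer_transfer_map may check further conditions that are not shown in this thread.

```c
#include <assert.h>

/* Hypothetical usage flags, standing in for the driver's transfer flags. */
enum {
   SKETCH_DISCARD_RANGE = 1u << 0,
   SKETCH_DISCARD_WHOLE = 1u << 1,
};

/* If the caller discards a range that spans the whole buffer, the old
 * contents are irrelevant, so the request can be treated as a
 * whole-resource discard. */
static unsigned promote_discard(unsigned usage, unsigned offset,
                                unsigned size, unsigned buffer_size)
{
   if ((usage & SKETCH_DISCARD_RANGE) &&
       offset == 0 && size == buffer_size)
      usage |= SKETCH_DISCARD_WHOLE;

   return usage;
}
```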


> > +
> > +   si_cp_write_data(sctx, buf, offset, size, V_370_TC_L2, V_370_ME,
> data);
> > +
> > +   radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0));
> > +   radeon_emit(cs, 0);
> > +}
> > +
> >  static void si_buffer_do_flush_region(struct pipe_context *ctx,
> >   struct pipe_transfer *transfer,
> >   const struct pipe_box *box)
> >  {
> > struct si_transfer *stransfer = (struct si_transfer*)transfer;
> > struct r600_resource *rbuffer =
> r600_resource(transfer->resource);
> >
> > if (stransfer->u.staging) {
> > /* Copy the staging buffer into the original one. */
> > si_copy_buffer((struct si_context*)ctx,
> transfer->resource,
> > @@ -568,20 +588,27 @@ static void si_buffer_transfer_unmap(struct
> pipe_context *ctx,
> >
> >  static void si_buffer_subdata(struct pipe_context *ctx,
> >   struct pipe_resource *buffer,
> >   unsigned usage, unsigned offset,
> >   unsigned size, const void *data)
> >  {
> > struct pipe_transfer *transfer = NULL;
> > struct pipe_box box;
> > uint8_t *map = NULL;
> >
> > +   if (size <= SI_TRANSFER_WRITE_DATA_THRESHOLD &&
> > +   offset % 4 == 0 && size % 4 == 0 && (uintptr_t)data % 4 ==
> 0) {
> > +   si_buffer_write_data((struct si_context*)ctx,
> > +r600_resource(buffer), offset,
> size, data);
> > +   return;
> > +   }
> > +
> > u_box_1d(offset, size, &box);
> > map = si_buffer_transfer_map(ctx, buffer, 0,
> >PIPE_TRANSFER_WRITE |
> >PIPE_TRANSFER_DISCARD_RANGE |
> >usage,
> >&box, &transfer);
> > if (!map)
> > return;
> >
> > memcpy(map, data, size);
> > diff --git 
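
The eligibility test in the quoted si_buffer_subdata hunk can be restated as a standalone predicate. This is a sketch for illustration, assuming a threshold of 64 bytes. The real SI_TRANSFER_WRITE_DATA_THRESHOLD is defined in si_pipe.h and its value is not shown in this message. The name can_use_write_data is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value, for illustration only. */
#define WRITE_DATA_THRESHOLD 64u

/* True when a small upload qualifies for the WRITE_DATA path: the size is
 * within the threshold, and the destination offset, the size, and the
 * source pointer are all dword-aligned. */
bool can_use_write_data(unsigned offset, unsigned size, const void *data)
{
	return size <= WRITE_DATA_THRESHOLD &&
	       offset % 4 == 0 &&
	       size % 4 == 0 &&
	       (uintptr_t)data % 4 == 0;
}
```

Anything that fails the check falls through to the regular transfer_map and memcpy path shown in the hunk.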

[Mesa-dev] [PATCH 1/9] radeonsi: rename r600_resource -> si_resource

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeon/radeon_vcn_dec.c   |  4 +-
 src/gallium/drivers/radeon/radeon_video.c |  6 +-
 src/gallium/drivers/radeon/radeon_video.h |  2 +-
 src/gallium/drivers/radeonsi/cik_sdma.c   |  4 +-
 src/gallium/drivers/radeonsi/si_buffer.c  | 52 +++---
 src/gallium/drivers/radeonsi/si_compute.c | 30 
 .../drivers/radeonsi/si_compute_blit.c|  6 +-
 src/gallium/drivers/radeonsi/si_cp_dma.c  | 18 ++---
 src/gallium/drivers/radeonsi/si_debug.c   |  8 +--
 src/gallium/drivers/radeonsi/si_descriptors.c | 70 +--
 src/gallium/drivers/radeonsi/si_dma.c |  4 +-
 src/gallium/drivers/radeonsi/si_dma_cs.c  |  6 +-
 src/gallium/drivers/radeonsi/si_fence.c   | 12 ++--
 src/gallium/drivers/radeonsi/si_gfx_cs.c  |  2 +-
 src/gallium/drivers/radeonsi/si_perfcounter.c |  6 +-
 src/gallium/drivers/radeonsi/si_pipe.c| 18 ++---
 src/gallium/drivers/radeonsi/si_pipe.h| 58 +++
 src/gallium/drivers/radeonsi/si_pm4.c | 12 ++--
 src/gallium/drivers/radeonsi/si_pm4.h |  6 +-
 src/gallium/drivers/radeonsi/si_query.c   | 28 
 src/gallium/drivers/radeonsi/si_query.h   | 10 +--
 src/gallium/drivers/radeonsi/si_shader.c  |  6 +-
 src/gallium/drivers/radeonsi/si_shader.h  |  4 +-
 src/gallium/drivers/radeonsi/si_state.c   | 12 ++--
 src/gallium/drivers/radeonsi/si_state.h   |  8 +--
 src/gallium/drivers/radeonsi/si_state_draw.c  | 26 +++
 .../drivers/radeonsi/si_state_shaders.c   |  8 +--
 .../drivers/radeonsi/si_state_streamout.c |  8 +--
 src/gallium/drivers/radeonsi/si_texture.c | 30 
 29 files changed, 232 insertions(+), 232 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_vcn_dec.c 
b/src/gallium/drivers/radeon/radeon_vcn_dec.c
index e402af21a64..a4e6d9dc6b5 100644
--- a/src/gallium/drivers/radeon/radeon_vcn_dec.c
+++ b/src/gallium/drivers/radeon/radeon_vcn_dec.c
@@ -822,8 +822,8 @@ static struct pb_buffer *rvcn_dec_message_decode(struct 
radeon_decoder *dec,
decode->bsd_size = align(dec->bs_size, 128);
decode->dpb_size = dec->dpb.res->buf->size;
decode->dt_size =
-   r600_resource(((struct vl_video_buffer 
*)target)->resources[0])->buf->size +
-   r600_resource(((struct vl_video_buffer 
*)target)->resources[1])->buf->size;
+   si_resource(((struct vl_video_buffer 
*)target)->resources[0])->buf->size +
+   si_resource(((struct vl_video_buffer 
*)target)->resources[1])->buf->size;
 
decode->sct_size = 0;
decode->sc_coeff_size = 0;
diff --git a/src/gallium/drivers/radeon/radeon_video.c 
b/src/gallium/drivers/radeon/radeon_video.c
index bb1173e8005..06f0d132219 100644
--- a/src/gallium/drivers/radeon/radeon_video.c
+++ b/src/gallium/drivers/radeon/radeon_video.c
@@ -63,8 +63,8 @@ bool si_vid_create_buffer(struct pipe_screen *screen, struct 
rvid_buffer *buffer
 * able to move buffers around individually, so request a
 * non-sub-allocated buffer.
 */
-   buffer->res = r600_resource(pipe_buffer_create(screen, PIPE_BIND_SHARED,
-  usage, size));
+   buffer->res = si_resource(pipe_buffer_create(screen, PIPE_BIND_SHARED,
+usage, size));
 
return buffer->res != NULL;
 }
@@ -72,7 +72,7 @@ bool si_vid_create_buffer(struct pipe_screen *screen, struct 
rvid_buffer *buffer
 /* destroy a buffer */
 void si_vid_destroy_buffer(struct rvid_buffer *buffer)
 {
-   r600_resource_reference(&buffer->res, NULL);
+   si_resource_reference(&buffer->res, NULL);
 }
 
 /* reallocate a buffer, preserving its content */
diff --git a/src/gallium/drivers/radeon/radeon_video.h 
b/src/gallium/drivers/radeon/radeon_video.h
index 71904b313f4..b7797c05d16 100644
--- a/src/gallium/drivers/radeon/radeon_video.h
+++ b/src/gallium/drivers/radeon/radeon_video.h
@@ -40,7 +40,7 @@
 struct rvid_buffer
 {
unsignedusage;
-   struct r600_resource*res;
+   struct si_resource  *res;
 };
 
 /* generate an stream handle */
diff --git a/src/gallium/drivers/radeonsi/cik_sdma.c 
b/src/gallium/drivers/radeonsi/cik_sdma.c
index 1c2fd0f7b1c..8bf6b30aec7 100644
--- a/src/gallium/drivers/radeonsi/cik_sdma.c
+++ b/src/gallium/drivers/radeonsi/cik_sdma.c
@@ -35,8 +35,8 @@ static void cik_sdma_copy_buffer(struct si_context *ctx,
 {
struct radeon_cmdbuf *cs = ctx->dma_cs;
unsigned i, ncopy, csize;
-   struct r600_resource *rdst = r600_resource(dst);
-   struct r600_resource *rsrc = r600_resource(src);
+   struct si_resource *rdst = si_resource(dst);
+   struct si_resource *rsrc = si_resource(src);
 
/* Mark the buffer range of destination as valid (initialized),
 * so that transfer_map knows it should wait for the GPU when mapping
diff --git 

Re: [Mesa-dev] [PATCH] mesa: add EXT_debug_label support

2019-01-18 Thread Eric Anholt
Timothy Arceri  writes:

> On 11/12/18 5:11 pm, Ian Romanick wrote:
>> On 12/10/18 5:52 PM, Timothy Arceri wrote:
>>> On 11/12/18 11:35 am, Ian Romanick wrote:
>>>> It seems like someone already sent out patches to implement this, and we
>>>> decided to not take it for some reason.  Maybe it was Rob?
>>>
>>> I discovered a thread from the beginning of 2017 titled "feature.txt &
>>> EXT_debug_label extension". But couldn't find any implementation.
>>>
>>> There was a reply from yourself, but it seems incorrect to me:
>>>
>>> "I checked both extensions, and they're not "just" aliases.  The EXT adds
>>> a single function with an enum to select the kind of object.  The KHR
>>> adds a function per kind of object.  It would be easy enough to add, but
>>> it seems more valuable to suggest the developer use the more broadly
>>> supported extension."
>> 
>> That's weird for a couple reasons.  One, that's not even the discussion
>> that I was thinking of.  I'll check in the morning to see if I can find
>> it.  Two, I was clearly full of it... I really don't see how I came that
>> conclusion.  I don't even see any other related extensions that I could
>> have been confusing either thing with.
>
> So do you think we should push this?

I don't see any piglit or CTS tests for this extension.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 102204] GL_ARB_buffer_storage crippled extension on r600, radeonsi and amdgpu Mesa drivers

2019-01-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=102204

John  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |john.etted...@gmail.com

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Mesa-dev] [Bug 109391] LTO Build fails

2019-01-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109391

Bug ID: 109391
   Summary: LTO Build fails
   Product: Mesa
   Version: git
  Hardware: Other
OS: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: GLX
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: che...@gmail.com
QA Contact: mesa-dev@lists.freedesktop.org

Building Mesa git with Meson fails if LTO is enabled. This worked for a long
time before (back when I was still building with autotools, although because
of a longer Christmas holiday that was some time ago).

Here is the build log of the failed build:

https://copr-be.cloud.fedoraproject.org/results/che/mesa/fedora-29-x86_64/00847998-mesa/build.log.gz

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [PATCH 9/9] radeonsi: remove r600 from comments

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_gfx_cs.c | 2 +-
 src/gallium/drivers/radeonsi/si_pipe.h   | 2 +-
 src/gallium/drivers/radeonsi/si_state.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_gfx_cs.c 
b/src/gallium/drivers/radeonsi/si_gfx_cs.c
index b77231366e2..3d64587fa2b 100644
--- a/src/gallium/drivers/radeonsi/si_gfx_cs.c
+++ b/src/gallium/drivers/radeonsi/si_gfx_cs.c
@@ -26,21 +26,21 @@
 #include "si_pipe.h"
 
 #include "util/os_time.h"
 
 /* initialize */
 void si_need_gfx_cs_space(struct si_context *ctx)
 {
struct radeon_cmdbuf *cs = ctx->gfx_cs;
 
/* There is no need to flush the DMA IB here, because
-* r600_need_dma_space always flushes the GFX IB if there is
+* si_need_dma_space always flushes the GFX IB if there is
 * a conflict, which means any unflushed DMA commands automatically
 * precede the GFX IB (= they had no dependency on the GFX IB when
 * they were submitted).
 */
 
/* There are two memory usage counters in the winsys for all buffers
 * that have been added (cs_add_buffer) and two counters in the pipe
 * driver for those that haven't been added yet.
 */
if (unlikely(!radeon_cs_memory_below_limit(ctx->screen, ctx->gfx_cs,
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index 352d1ba3034..caaa42e2f77 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -1245,21 +1245,21 @@ struct pipe_fence_handle *si_create_fence(struct 
pipe_context *ctx,
 
 /* si_get.c */
 void si_init_screen_get_functions(struct si_screen *sscreen);
 
 /* si_gfx_cs.c */
 void si_flush_gfx_cs(struct si_context *ctx, unsigned flags,
 struct pipe_fence_handle **fence);
 void si_begin_new_gfx_cs(struct si_context *ctx);
 void si_need_gfx_cs_space(struct si_context *ctx);
 
-/* r600_gpu_load.c */
+/* si_gpu_load.c */
 void si_gpu_load_kill_thread(struct si_screen *sscreen);
 uint64_t si_begin_counter(struct si_screen *sscreen, unsigned type);
 unsigned si_end_counter(struct si_screen *sscreen, unsigned type,
uint64_t begin);
 
 /* si_compute.c */
 void si_init_compute_functions(struct si_context *sctx);
 
 /* si_perfcounters.c */
 void si_init_perfcounters(struct si_screen *screen);
diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index fc8343133ac..b49a1b3695e 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -2145,21 +2145,21 @@ static boolean si_is_format_supported(struct 
pipe_screen *screen,
  enum pipe_format format,
  enum pipe_texture_target target,
  unsigned sample_count,
  unsigned storage_sample_count,
  unsigned usage)
 {
struct si_screen *sscreen = (struct si_screen *)screen;
unsigned retval = 0;
 
if (target >= PIPE_MAX_TEXTURE_TYPES) {
-   PRINT_ERR("r600: unsupported texture type %d\n", target);
+   PRINT_ERR("radeonsi: unsupported texture type %d\n", target);
return false;
}
 
if (MAX2(1, sample_count) < MAX2(1, storage_sample_count))
return false;
 
if (sample_count > 1) {
if (!screen->get_param(screen, PIPE_CAP_TEXTURE_MULTISAMPLE))
return false;
 
-- 
2.17.1



[Mesa-dev] [PATCH 4/9] radeonsi: rename rsrc -> ssrc, rdst -> sdst

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/cik_sdma.c  | 12 -
 src/gallium/drivers/radeonsi/si_buffer.c | 32 
 src/gallium/drivers/radeonsi/si_cp_dma.c | 16 ++--
 src/gallium/drivers/radeonsi/si_dma.c| 12 -
 src/gallium/drivers/radeonsi/si_dma_cs.c | 10 
 src/gallium/drivers/radeonsi/si_fence.c  | 20 +++
 6 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/cik_sdma.c 
b/src/gallium/drivers/radeonsi/cik_sdma.c
index 8bf6b30aec7..096f75e508f 100644
--- a/src/gallium/drivers/radeonsi/cik_sdma.c
+++ b/src/gallium/drivers/radeonsi/cik_sdma.c
@@ -28,34 +28,34 @@
 
 static void cik_sdma_copy_buffer(struct si_context *ctx,
 struct pipe_resource *dst,
 struct pipe_resource *src,
 uint64_t dst_offset,
 uint64_t src_offset,
 uint64_t size)
 {
struct radeon_cmdbuf *cs = ctx->dma_cs;
unsigned i, ncopy, csize;
-   struct si_resource *rdst = si_resource(dst);
-   struct si_resource *rsrc = si_resource(src);
+   struct si_resource *sdst = si_resource(dst);
+   struct si_resource *ssrc = si_resource(src);
 
/* Mark the buffer range of destination as valid (initialized),
 * so that transfer_map knows it should wait for the GPU when mapping
 * that range. */
-   util_range_add(&rdst->valid_buffer_range, dst_offset,
+   util_range_add(&sdst->valid_buffer_range, dst_offset,
   dst_offset + size);
 
-   dst_offset += rdst->gpu_address;
-   src_offset += rsrc->gpu_address;
+   dst_offset += sdst->gpu_address;
+   src_offset += ssrc->gpu_address;
 
ncopy = DIV_ROUND_UP(size, CIK_SDMA_COPY_MAX_SIZE);
-   si_need_dma_space(ctx, ncopy * 7, rdst, rsrc);
+   si_need_dma_space(ctx, ncopy * 7, sdst, ssrc);
 
for (i = 0; i < ncopy; i++) {
csize = MIN2(size, CIK_SDMA_COPY_MAX_SIZE);
radeon_emit(cs, CIK_SDMA_PACKET(CIK_SDMA_OPCODE_COPY,
CIK_SDMA_COPY_SUB_OPCODE_LINEAR,
0));
radeon_emit(cs, ctx->chip_class >= GFX9 ? csize - 1 : csize);
radeon_emit(cs, 0); /* src/dst endian swap */
radeon_emit(cs, src_offset);
radeon_emit(cs, src_offset >> 32);
diff --git a/src/gallium/drivers/radeonsi/si_buffer.c 
b/src/gallium/drivers/radeonsi/si_buffer.c
index f84dae79102..5c93eacc4b1 100644
--- a/src/gallium/drivers/radeonsi/si_buffer.c
+++ b/src/gallium/drivers/radeonsi/si_buffer.c
@@ -296,36 +296,36 @@ si_invalidate_buffer(struct si_context *sctx,
 
return true;
 }
 
 /* Replace the storage of dst with src. */
 void si_replace_buffer_storage(struct pipe_context *ctx,
 struct pipe_resource *dst,
 struct pipe_resource *src)
 {
struct si_context *sctx = (struct si_context*)ctx;
-   struct si_resource *rdst = si_resource(dst);
-   struct si_resource *rsrc = si_resource(src);
-   uint64_t old_gpu_address = rdst->gpu_address;
-
-   pb_reference(&rdst->buf, rsrc->buf);
-   rdst->gpu_address = rsrc->gpu_address;
-   rdst->b.b.bind = rsrc->b.b.bind;
-   rdst->b.max_forced_staging_uploads = rsrc->b.max_forced_staging_uploads;
-   rdst->max_forced_staging_uploads = rsrc->max_forced_staging_uploads;
-   rdst->flags = rsrc->flags;
-
-   assert(rdst->vram_usage == rsrc->vram_usage);
-   assert(rdst->gart_usage == rsrc->gart_usage);
-   assert(rdst->bo_size == rsrc->bo_size);
-   assert(rdst->bo_alignment == rsrc->bo_alignment);
-   assert(rdst->domains == rsrc->domains);
+   struct si_resource *sdst = si_resource(dst);
+   struct si_resource *ssrc = si_resource(src);
+   uint64_t old_gpu_address = sdst->gpu_address;
+
+   pb_reference(&sdst->buf, ssrc->buf);
+   sdst->gpu_address = ssrc->gpu_address;
+   sdst->b.b.bind = ssrc->b.b.bind;
+   sdst->b.max_forced_staging_uploads = ssrc->b.max_forced_staging_uploads;
+   sdst->max_forced_staging_uploads = ssrc->max_forced_staging_uploads;
+   sdst->flags = ssrc->flags;
+
+   assert(sdst->vram_usage == ssrc->vram_usage);
+   assert(sdst->gart_usage == ssrc->gart_usage);
+   assert(sdst->bo_size == ssrc->bo_size);
+   assert(sdst->bo_alignment == ssrc->bo_alignment);
+   assert(sdst->domains == ssrc->domains);
 
si_rebind_buffer(sctx, dst, old_gpu_address);
 }
 
 static void si_invalidate_resource(struct pipe_context *ctx,
   struct pipe_resource *resource)
 {
struct si_context *sctx = (struct si_context*)ctx;
struct si_resource *rbuffer = si_resource(resource);
 
diff --git 

[Mesa-dev] [PATCH 6/9] radeonsi: rename rfence -> sfence

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_fence.c | 98 -
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_fence.c 
b/src/gallium/drivers/radeonsi/si_fence.c
index bb53ccba947..78da742b5da 100644
--- a/src/gallium/drivers/radeonsi/si_fence.c
+++ b/src/gallium/drivers/radeonsi/si_fence.c
@@ -279,81 +279,81 @@ static void si_fine_fence_set(struct si_context *ctx,
assert(false);
}
 }
 
 static boolean si_fence_finish(struct pipe_screen *screen,
   struct pipe_context *ctx,
   struct pipe_fence_handle *fence,
   uint64_t timeout)
 {
struct radeon_winsys *rws = ((struct si_screen*)screen)->ws;
-   struct si_multi_fence *rfence = (struct si_multi_fence *)fence;
+   struct si_multi_fence *sfence = (struct si_multi_fence *)fence;
struct si_context *sctx;
int64_t abs_timeout = os_time_get_absolute_timeout(timeout);
 
ctx = threaded_context_unwrap_sync(ctx);
sctx = (struct si_context*)(ctx ? ctx : NULL);
 
-   if (!util_queue_fence_is_signalled(&rfence->ready)) {
-   if (rfence->tc_token) {
+   if (!util_queue_fence_is_signalled(&sfence->ready)) {
+   if (sfence->tc_token) {
/* Ensure that si_flush_from_st will be called for
 * this fence, but only if we're in the API thread
 * where the context is current.
 *
 * Note that the batch containing the flush may already
 * be in flight in the driver thread, so the fence
 * may not be ready yet when this call returns.
 */
-   threaded_context_flush(ctx, rfence->tc_token,
+   threaded_context_flush(ctx, sfence->tc_token,
   timeout == 0);
}
 
if (!timeout)
return false;
 
if (timeout == PIPE_TIMEOUT_INFINITE) {
-   util_queue_fence_wait(&rfence->ready);
+   util_queue_fence_wait(&sfence->ready);
} else {
-   if (!util_queue_fence_wait_timeout(&rfence->ready, abs_timeout))
+   if (!util_queue_fence_wait_timeout(&sfence->ready, abs_timeout))
return false;
}
 
if (timeout && timeout != PIPE_TIMEOUT_INFINITE) {
int64_t time = os_time_get_nano();
timeout = abs_timeout > time ? abs_timeout - time : 0;
}
}
 
-   if (rfence->sdma) {
-   if (!rws->fence_wait(rws, rfence->sdma, timeout))
+   if (sfence->sdma) {
+   if (!rws->fence_wait(rws, sfence->sdma, timeout))
return false;
 
/* Recompute the timeout after waiting. */
if (timeout && timeout != PIPE_TIMEOUT_INFINITE) {
int64_t time = os_time_get_nano();
timeout = abs_timeout > time ? abs_timeout - time : 0;
}
}
 
-   if (!rfence->gfx)
+   if (!sfence->gfx)
return true;
 
-   if (rfence->fine.buf &&
-   si_fine_fence_signaled(rws, &rfence->fine)) {
-   rws->fence_reference(&rfence->gfx, NULL);
-   si_resource_reference(&rfence->fine.buf, NULL);
+   if (sfence->fine.buf &&
+   si_fine_fence_signaled(rws, &sfence->fine)) {
+   rws->fence_reference(&sfence->gfx, NULL);
+   si_resource_reference(&sfence->fine.buf, NULL);
return true;
}
 
/* Flush the gfx IB if it hasn't been flushed yet. */
-   if (sctx && rfence->gfx_unflushed.ctx == sctx &&
-   rfence->gfx_unflushed.ib_index == sctx->num_gfx_cs_flushes) {
+   if (sctx && sfence->gfx_unflushed.ctx == sctx &&
+   sfence->gfx_unflushed.ib_index == sctx->num_gfx_cs_flushes) {
/* Section 4.1.2 (Signaling) of the OpenGL 4.6 (Core profile)
 * spec says:
 *
 *"If the sync object being blocked upon will not be
 * signaled in finite time (for example, by an associated
 * fence command issued previously, but not yet flushed to
 * the graphics pipeline), then ClientWaitSync may hang
 * forever. To help prevent this behavior, if
 * ClientWaitSync is called and all of the following are
 * true:
@@ -366,111 +366,111 @@ static boolean si_fence_finish(struct pipe_screen 
*screen,
 * then the GL will behave as if the equivalent of Flush
 * were inserted immediately after the creation of sync."
 *
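
The "recompute the timeout after waiting" step that appears twice in si_fence_finish above reduces to a small clamp. This is a sketch for illustration: the helper name remaining_timeout is hypothetical, and the caller is assumed to obtain the current time from something like os_time_get_nano, as the patch context does.

```c
#include <stdint.h>

/* Remaining relative timeout in nanoseconds, given an absolute deadline
 * and the current time; clamps to 0 once the deadline has passed. */
uint64_t remaining_timeout(int64_t abs_timeout_ns, int64_t now_ns)
{
	return abs_timeout_ns > now_ns ?
	       (uint64_t)(abs_timeout_ns - now_ns) : 0;
}
```

In the quoted code the recomputation is skipped when the timeout is 0 or PIPE_TIMEOUT_INFINITE, so the helper would only be called for finite, nonzero timeouts.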
 

[Mesa-dev] [PATCH 5/9] radeonsi: rename rbo, rbuffer to buf or buffer

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_buffer.c  | 138 +-
 src/gallium/drivers/radeonsi/si_descriptors.c |  48 +++---
 src/gallium/drivers/radeonsi/si_pipe.h|  14 +-
 src/gallium/drivers/radeonsi/si_state.h   |   2 +-
 .../drivers/radeonsi/si_state_streamout.c |   4 +-
 5 files changed, 103 insertions(+), 103 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_buffer.c 
b/src/gallium/drivers/radeonsi/si_buffer.c
index 5c93eacc4b1..200aaa32ebb 100644
--- a/src/gallium/drivers/radeonsi/si_buffer.c
+++ b/src/gallium/drivers/radeonsi/si_buffer.c
@@ -243,62 +243,62 @@ bool si_alloc_resource(struct si_screen *sscreen,
fprintf(stderr, "VM start=0x%"PRIX64"  end=0x%"PRIX64" | Buffer 
%"PRIu64" bytes\n",
res->gpu_address, res->gpu_address + res->buf->size,
res->buf->size);
}
return true;
 }
 
 static void si_buffer_destroy(struct pipe_screen *screen,
  struct pipe_resource *buf)
 {
-   struct si_resource *rbuffer = si_resource(buf);
+   struct si_resource *buffer = si_resource(buf);
 
threaded_resource_deinit(buf);
-   util_range_destroy(&rbuffer->valid_buffer_range);
-   pb_reference(&rbuffer->buf, NULL);
-   FREE(rbuffer);
+   util_range_destroy(&buffer->valid_buffer_range);
+   pb_reference(&buffer->buf, NULL);
+   FREE(buffer);
 }
 
 /* Reallocate the buffer a update all resource bindings where the buffer is
  * bound.
  *
  * This is used to avoid CPU-GPU synchronizations, because it makes the buffer
  * idle by discarding its contents.
  */
 static bool
 si_invalidate_buffer(struct si_context *sctx,
-struct si_resource *rbuffer)
+struct si_resource *buf)
 {
/* Shared buffers can't be reallocated. */
-   if (rbuffer->b.is_shared)
+   if (buf->b.is_shared)
return false;
 
/* Sparse buffers can't be reallocated. */
-   if (rbuffer->flags & RADEON_FLAG_SPARSE)
+   if (buf->flags & RADEON_FLAG_SPARSE)
return false;
 
/* In AMD_pinned_memory, the user pointer association only gets
 * broken when the buffer is explicitly re-allocated.
 */
-   if (rbuffer->b.is_user_ptr)
+   if (buf->b.is_user_ptr)
return false;
 
/* Check if mapping this buffer would cause waiting for the GPU. */
-   if (si_rings_is_buffer_referenced(sctx, rbuffer->buf, 
RADEON_USAGE_READWRITE) ||
-   !sctx->ws->buffer_wait(rbuffer->buf, 0, RADEON_USAGE_READWRITE)) {
-   uint64_t old_va = rbuffer->gpu_address;
+   if (si_rings_is_buffer_referenced(sctx, buf->buf, 
RADEON_USAGE_READWRITE) ||
+   !sctx->ws->buffer_wait(buf->buf, 0, RADEON_USAGE_READWRITE)) {
+   uint64_t old_va = buf->gpu_address;
 
/* Reallocate the buffer in the same pipe_resource. */
-   si_alloc_resource(sctx->screen, rbuffer);
-   si_rebind_buffer(sctx, &rbuffer->b.b, old_va);
+   si_alloc_resource(sctx->screen, buf);
+   si_rebind_buffer(sctx, &buf->b.b, old_va);
} else {
-   util_range_set_empty(&rbuffer->valid_buffer_range);
+   util_range_set_empty(&buf->valid_buffer_range);
}
 
return true;
 }
 
 /* Replace the storage of dst with src. */
 void si_replace_buffer_storage(struct pipe_context *ctx,
 struct pipe_resource *dst,
 struct pipe_resource *src)
 {
@@ -320,25 +320,25 @@ void si_replace_buffer_storage(struct pipe_context *ctx,
assert(sdst->bo_alignment == ssrc->bo_alignment);
assert(sdst->domains == ssrc->domains);
 
si_rebind_buffer(sctx, dst, old_gpu_address);
 }
 
 static void si_invalidate_resource(struct pipe_context *ctx,
   struct pipe_resource *resource)
 {
struct si_context *sctx = (struct si_context*)ctx;
-   struct si_resource *rbuffer = si_resource(resource);
+   struct si_resource *buf = si_resource(resource);
 
/* We currently only do anyting here for buffers */
if (resource->target == PIPE_BUFFER)
-   (void)si_invalidate_buffer(sctx, rbuffer);
+   (void)si_invalidate_buffer(sctx, buf);
 }
 
 static void *si_buffer_get_transfer(struct pipe_context *ctx,
struct pipe_resource *resource,
unsigned usage,
const struct pipe_box *box,
struct pipe_transfer **ptransfer,
void *data, struct si_resource *staging,
unsigned offset)
 {
@@ -365,98 +365,98 @@ static void *si_buffer_get_transfer(struct pipe_context 
*ctx,
 }
 
 static void *si_buffer_transfer_map(struct pipe_context *ctx,
struct 

[Mesa-dev] [PATCH 3/9] radeonsi: rename rscreen -> sscreen

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeon/radeon_uvd_enc.h |  2 +-
 src/gallium/drivers/radeon/radeon_uvd_enc_1_1.c | 10 +-
 src/gallium/drivers/radeonsi/si_compute.c   |  4 ++--
 src/gallium/drivers/radeonsi/si_gpu_load.c  |  4 ++--
 src/gallium/drivers/radeonsi/si_texture.c   |  2 +-
 5 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_uvd_enc.h 
b/src/gallium/drivers/radeon/radeon_uvd_enc.h
index 63176d264c2..52e7ae3c0a9 100644
--- a/src/gallium/drivers/radeon/radeon_uvd_enc.h
+++ b/src/gallium/drivers/radeon/radeon_uvd_enc.h
@@ -457,13 +457,13 @@ struct radeon_uvd_encoder
unsigned byte_index;
unsigned bits_output;
uint32_t total_task_size;
uint32_t *p_task_size;
 
bool emulation_prevention;
bool need_feedback;
 };
 
 void radeon_uvd_enc_1_1_init(struct radeon_uvd_encoder *enc);
-bool si_radeon_uvd_enc_supported(struct si_screen *rscreen);
+bool si_radeon_uvd_enc_supported(struct si_screen *sscreen);
 
 #endif // _RADEON_UVD_ENC_H
diff --git a/src/gallium/drivers/radeon/radeon_uvd_enc_1_1.c 
b/src/gallium/drivers/radeon/radeon_uvd_enc_1_1.c
index ddb219792ae..1f41b09472f 100644
--- a/src/gallium/drivers/radeon/radeon_uvd_enc_1_1.c
+++ b/src/gallium/drivers/radeon/radeon_uvd_enc_1_1.c
@@ -828,24 +828,24 @@ radeon_uvd_enc_slice_header_hevc(struct 
radeon_uvd_encoder *enc)
   RADEON_ENC_CS(instruction[j]);
   RADEON_ENC_CS(num_bits[j]);
}
 
RADEON_ENC_END();
 }
 
 static void
 radeon_uvd_enc_ctx(struct radeon_uvd_encoder *enc)
 {
-   struct si_screen *rscreen = (struct si_screen *) enc->screen;
+   struct si_screen *sscreen = (struct si_screen *) enc->screen;
 
enc->enc_pic.ctx_buf.swizzle_mode = 0;
-   if (rscreen->info.chip_class < GFX9) {
+   if (sscreen->info.chip_class < GFX9) {
   enc->enc_pic.ctx_buf.rec_luma_pitch =
  (enc->luma->u.legacy.level[0].nblk_x * enc->luma->bpe);
   enc->enc_pic.ctx_buf.rec_chroma_pitch =
  (enc->chroma->u.legacy.level[0].nblk_x * enc->chroma->bpe);
}
else {
   enc->enc_pic.ctx_buf.rec_luma_pitch =
  enc->luma->u.gfx9.surf_pitch * enc->luma->bpe;
   enc->enc_pic.ctx_buf.rec_chroma_pitch =
  enc->chroma->u.gfx9.surf_pitch * enc->chroma->bpe;
@@ -943,41 +943,41 @@ radeon_uvd_enc_rc_per_pic(struct radeon_uvd_encoder *enc,
RADEON_ENC_CS(enc->enc_pic.rc_per_pic.max_au_size);
RADEON_ENC_CS(enc->enc_pic.rc_per_pic.enabled_filler_data);
RADEON_ENC_CS(enc->enc_pic.rc_per_pic.skip_frame_enable);
RADEON_ENC_CS(enc->enc_pic.rc_per_pic.enforce_hrd);
RADEON_ENC_END();
 }
 
 static void
 radeon_uvd_enc_encode_params_hevc(struct radeon_uvd_encoder *enc)
 {
-   struct si_screen *rscreen = (struct si_screen *) enc->screen;
+   struct si_screen *sscreen = (struct si_screen *) enc->screen;
switch (enc->enc_pic.picture_type) {
case PIPE_H265_ENC_PICTURE_TYPE_I:
case PIPE_H265_ENC_PICTURE_TYPE_IDR:
   enc->enc_pic.enc_params.pic_type = RENC_UVD_PICTURE_TYPE_I;
   break;
case PIPE_H265_ENC_PICTURE_TYPE_P:
   enc->enc_pic.enc_params.pic_type = RENC_UVD_PICTURE_TYPE_P;
   break;
case PIPE_H265_ENC_PICTURE_TYPE_SKIP:
   enc->enc_pic.enc_params.pic_type = RENC_UVD_PICTURE_TYPE_P_SKIP;
   break;
case PIPE_H265_ENC_PICTURE_TYPE_B:
   enc->enc_pic.enc_params.pic_type = RENC_UVD_PICTURE_TYPE_B;
   break;
default:
   enc->enc_pic.enc_params.pic_type = RENC_UVD_PICTURE_TYPE_I;
}
 
enc->enc_pic.enc_params.allowed_max_bitstream_size = enc->bs_size;
-   if (rscreen->info.chip_class < GFX9) {
+   if (sscreen->info.chip_class < GFX9) {
   enc->enc_pic.enc_params.input_pic_luma_pitch =
  (enc->luma->u.legacy.level[0].nblk_x * enc->luma->bpe);
   enc->enc_pic.enc_params.input_pic_chroma_pitch =
  (enc->chroma->u.legacy.level[0].nblk_x * enc->chroma->bpe);
}
else {
   enc->enc_pic.enc_params.input_pic_luma_pitch =
  enc->luma->u.gfx9.surf_pitch * enc->luma->bpe;
   enc->enc_pic.enc_params.input_pic_chroma_pitch =
  enc->chroma->u.gfx9.surf_pitch * enc->chroma->bpe;
@@ -991,21 +991,21 @@ radeon_uvd_enc_encode_params_hevc(struct 
radeon_uvd_encoder *enc)
   enc->enc_pic.enc_params.reference_picture_index =
  (enc->enc_pic.frame_num - 1) % 2;
 
enc->enc_pic.enc_params.reconstructed_picture_index =
   enc->enc_pic.frame_num % 2;
 
RADEON_ENC_BEGIN(RENC_UVD_IB_PARAM_ENCODE_PARAMS);
RADEON_ENC_CS(enc->enc_pic.enc_params.pic_type);
RADEON_ENC_CS(enc->enc_pic.enc_params.allowed_max_bitstream_size);
 
-   if (rscreen->info.chip_class < GFX9) {
+   if (sscreen->info.chip_class < GFX9) {
   RADEON_ENC_READ(enc->handle, RADEON_DOMAIN_VRAM,
   enc->luma->u.legacy.level[0].offset);
   RADEON_ENC_READ(enc->handle, RADEON_DOMAIN_VRAM,
   enc->chroma->u.legacy.level[0].offset);
}
else {
   

[Mesa-dev] [PATCH 8/9] winsys/amdgpu: rename rfence, rsrc, rdst -> afence, asrc, adst

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 36 +++
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.h | 10 +++
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index 72cf1e6c639..b4e62acbae4 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -165,99 +165,99 @@ static int amdgpu_export_signalled_sync_file(struct 
radeon_winsys *rws)
}
 
amdgpu_cs_destroy_syncobj(ws->dev, syncobj);
return fd;
 }
 
 static void amdgpu_fence_submitted(struct pipe_fence_handle *fence,
uint64_t seq_no,
uint64_t *user_fence_cpu_address)
 {
-   struct amdgpu_fence *rfence = (struct amdgpu_fence*)fence;
+   struct amdgpu_fence *afence = (struct amdgpu_fence*)fence;
 
-   rfence->fence.fence = seq_no;
-   rfence->user_fence_cpu_address = user_fence_cpu_address;
-   util_queue_fence_signal(&rfence->submitted);
+   afence->fence.fence = seq_no;
+   afence->user_fence_cpu_address = user_fence_cpu_address;
+   util_queue_fence_signal(&afence->submitted);
 }
 
 static void amdgpu_fence_signalled(struct pipe_fence_handle *fence)
 {
-   struct amdgpu_fence *rfence = (struct amdgpu_fence*)fence;
+   struct amdgpu_fence *afence = (struct amdgpu_fence*)fence;
 
-   rfence->signalled = true;
-   util_queue_fence_signal(&rfence->submitted);
+   afence->signalled = true;
+   util_queue_fence_signal(&afence->submitted);
 }
 
 bool amdgpu_fence_wait(struct pipe_fence_handle *fence, uint64_t timeout,
bool absolute)
 {
-   struct amdgpu_fence *rfence = (struct amdgpu_fence*)fence;
+   struct amdgpu_fence *afence = (struct amdgpu_fence*)fence;
uint32_t expired;
int64_t abs_timeout;
uint64_t *user_fence_cpu;
int r;
 
-   if (rfence->signalled)
+   if (afence->signalled)
   return true;
 
/* Handle syncobjs. */
-   if (amdgpu_fence_is_syncobj(rfence)) {
+   if (amdgpu_fence_is_syncobj(afence)) {
   /* Absolute timeouts are only be used by BO fences, which aren't
* backed by syncobjs.
*/
   assert(!absolute);
 
-  if (amdgpu_cs_syncobj_wait(rfence->ws->dev, &rfence->syncobj, 1,
+  if (amdgpu_cs_syncobj_wait(afence->ws->dev, &afence->syncobj, 1,
  timeout, 0, NULL))
  return false;
 
-  rfence->signalled = true;
+  afence->signalled = true;
   return true;
}
 
if (absolute)
   abs_timeout = timeout;
else
   abs_timeout = os_time_get_absolute_timeout(timeout);
 
/* The fence might not have a number assigned if its IB is being
 * submitted in the other thread right now. Wait until the submission
 * is done. */
-   if (!util_queue_fence_wait_timeout(&rfence->submitted, abs_timeout))
+   if (!util_queue_fence_wait_timeout(&afence->submitted, abs_timeout))
   return false;
 
-   user_fence_cpu = rfence->user_fence_cpu_address;
+   user_fence_cpu = afence->user_fence_cpu_address;
if (user_fence_cpu) {
-  if (*user_fence_cpu >= rfence->fence.fence) {
- rfence->signalled = true;
+  if (*user_fence_cpu >= afence->fence.fence) {
+ afence->signalled = true;
  return true;
   }
 
   /* No timeout, just query: no need for the ioctl. */
   if (!absolute && !timeout)
  return false;
}
 
/* Now use the libdrm query. */
-   r = amdgpu_cs_query_fence_status(&rfence->fence,
+   r = amdgpu_cs_query_fence_status(&afence->fence,
abs_timeout,
AMDGPU_QUERY_FENCE_TIMEOUT_IS_ABSOLUTE,
                                     &expired);
if (r) {
   fprintf(stderr, "amdgpu: amdgpu_cs_query_fence_status failed.\n");
   return false;
}
 
if (expired) {
   /* This variable can only transition from false to true, so it doesn't
* matter if threads race for it. */
-  rfence->signalled = true;
+  afence->signalled = true;
   return true;
}
return false;
 }
 
 static bool amdgpu_fence_wait_rel_timeout(struct radeon_winsys *rws,
   struct pipe_fence_handle *fence,
   uint64_t timeout)
 {
return amdgpu_fence_wait(fence, timeout, false);
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
index 5de770c89e7..07b5d4b350c 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
@@ -163,35 +163,35 @@ static inline void amdgpu_ctx_unref(struct amdgpu_ctx 
*ctx)
if (p_atomic_dec_zero(>refcount)) {
   amdgpu_cs_ctx_free(ctx->ctx);
   amdgpu_bo_free(ctx->user_fence_bo);
   FREE(ctx);
}
 }
 
 static inline void amdgpu_fence_reference(struct pipe_fence_handle **dst,
   struct pipe_fence_handle *src)
 {
-   struct amdgpu_fence 

[Mesa-dev] [PATCH 2/9] radeonsi: rename rquery -> squery

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_perfcounter.c | 34 +++
 src/gallium/drivers/radeonsi/si_query.c   | 94 +--
 src/gallium/drivers/radeonsi/si_query.h   |  8 +-
 3 files changed, 68 insertions(+), 68 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_perfcounter.c 
b/src/gallium/drivers/radeonsi/si_perfcounter.c
index 949724ca720..2da14f8868f 100644
--- a/src/gallium/drivers/radeonsi/si_perfcounter.c
+++ b/src/gallium/drivers/radeonsi/si_perfcounter.c
@@ -750,42 +750,42 @@ static void si_pc_emit_read(struct si_context *sctx,
radeon_emit(cs, 0); /* immediate */
radeon_emit(cs, 0);
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
va += sizeof(uint64_t);
}
}
 }
 
 static void si_pc_query_destroy(struct si_screen *sscreen,
-   struct si_query *rquery)
+   struct si_query *squery)
 {
-   struct si_query_pc *query = (struct si_query_pc *)rquery;
+   struct si_query_pc *query = (struct si_query_pc *)squery;
 
while (query->groups) {
struct si_query_group *group = query->groups;
query->groups = group->next;
FREE(group);
}
 
FREE(query->counters);
 
si_query_buffer_destroy(sscreen, >buffer);
FREE(query);
 }
 
-static void si_pc_query_resume(struct si_context *sctx, struct si_query 
*rquery)
+static void si_pc_query_resume(struct si_context *sctx, struct si_query 
*squery)
 /*
   struct si_query_hw *hwquery,
   struct si_resource *buffer, uint64_t va)*/
 {
-   struct si_query_pc *query = (struct si_query_pc *)rquery;
+   struct si_query_pc *query = (struct si_query_pc *)squery;
int current_se = -1;
int current_instance = -1;
 
if (!si_query_buffer_alloc(sctx, >buffer, NULL, 
query->result_size))
return;
si_need_gfx_cs_space(sctx);
 
if (query->shaders)
si_pc_emit_shaders(sctx, query->shaders);
 
@@ -801,23 +801,23 @@ static void si_pc_query_resume(struct si_context *sctx, 
struct si_query *rquery)
si_pc_emit_select(sctx, block, group->num_counters, 
group->selectors);
}
 
if (current_se != -1 || current_instance != -1)
si_pc_emit_instance(sctx, -1, -1);
 
uint64_t va = query->buffer.buf->gpu_address + 
query->buffer.results_end;
si_pc_emit_start(sctx, query->buffer.buf, va);
 }
 
-static void si_pc_query_suspend(struct si_context *sctx, struct si_query 
*rquery)
+static void si_pc_query_suspend(struct si_context *sctx, struct si_query 
*squery)
 {
-   struct si_query_pc *query = (struct si_query_pc *)rquery;
+   struct si_query_pc *query = (struct si_query_pc *)squery;
 
if (!query->buffer.buf)
return;
 
uint64_t va = query->buffer.buf->gpu_address + 
query->buffer.results_end;
query->buffer.results_end += query->result_size;
 
si_pc_emit_stop(sctx, query->buffer.buf, va);
 
for (struct si_query_group *group = query->groups; group; group = 
group->next) {
@@ -835,42 +835,42 @@ static void si_pc_query_suspend(struct si_context *sctx, 
struct si_query *rquery
si_pc_emit_instance(sctx, se, instance);
si_pc_emit_read(sctx, block, 
group->num_counters, va);
va += sizeof(uint64_t) * group->num_counters;
} while (group->instance < 0 && ++instance < 
block->num_instances);
} while (++se < se_end);
}
 
si_pc_emit_instance(sctx, -1, -1);
 }
 
-static bool si_pc_query_begin(struct si_context *ctx, struct si_query *rquery)
+static bool si_pc_query_begin(struct si_context *ctx, struct si_query *squery)
 {
-   struct si_query_pc *query = (struct si_query_pc *)rquery;
+   struct si_query_pc *query = (struct si_query_pc *)squery;
 
si_query_buffer_reset(ctx, >buffer);
 
LIST_ADDTAIL(>b.active_list, >active_queries);
ctx->num_cs_dw_queries_suspend += query->b.num_cs_dw_suspend;
 
-   si_pc_query_resume(ctx, rquery);
+   si_pc_query_resume(ctx, squery);
 
return true;
 }
 
-static bool si_pc_query_end(struct si_context *ctx, struct si_query *rquery)
+static bool si_pc_query_end(struct si_context *ctx, struct si_query *squery)
 {
-   struct si_query_pc *query = (struct si_query_pc *)rquery;
+   struct si_query_pc *query = (struct si_query_pc *)squery;
 
-   si_pc_query_suspend(ctx, rquery);
+   si_pc_query_suspend(ctx, squery);
 
-   LIST_DEL(>active_list);
-   ctx->num_cs_dw_queries_suspend -= rquery->num_cs_dw_suspend;
+   LIST_DEL(>active_list);
+   ctx->num_cs_dw_queries_suspend -= 

[Mesa-dev] [PATCH 7/9] radeonsi: rename rview -> sview

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_descriptors.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
b/src/gallium/drivers/radeonsi/si_descriptors.c
index 9900cde51ee..21d4ca946d3 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -490,32 +490,32 @@ static bool depth_needs_decompression(struct si_texture 
*tex)
 */
return tex->db_compatible;
 }
 
 static void si_set_sampler_view(struct si_context *sctx,
unsigned shader,
unsigned slot, struct pipe_sampler_view *view,
bool disallow_early_out)
 {
struct si_samplers *samplers = >samplers[shader];
-   struct si_sampler_view *rview = (struct si_sampler_view*)view;
+   struct si_sampler_view *sview = (struct si_sampler_view*)view;
struct si_descriptors *descs = si_sampler_and_image_descriptors(sctx, 
shader);
unsigned desc_slot = si_get_sampler_slot(slot);
uint32_t *desc = descs->list + desc_slot * 16;
 
if (samplers->views[slot] == view && !disallow_early_out)
return;
 
if (view) {
struct si_texture *tex = (struct si_texture *)view->texture;
 
-   si_set_sampler_view_desc(sctx, rview,
+   si_set_sampler_view_desc(sctx, sview,
 samplers->sampler_states[slot], desc);
 
if (tex->buffer.b.b.target == PIPE_BUFFER) {
tex->buffer.bind_history |= PIPE_BIND_SAMPLER_VIEW;
samplers->needs_depth_decompress_mask &= ~(1u << slot);
samplers->needs_color_decompress_mask &= ~(1u << slot);
} else {
if (depth_needs_decompression(tex)) {
samplers->needs_depth_decompress_mask |= 1u << 
slot;
} else {
@@ -532,21 +532,21 @@ static void si_set_sampler_view(struct si_context *sctx,
sctx->need_check_render_feedback = true;
}
 
pipe_sampler_view_reference(>views[slot], view);
samplers->enabled_mask |= 1u << slot;
 
/* Since this can flush, it must be done after enabled_mask is
 * updated. */
si_sampler_view_add_buffer(sctx, view->texture,
   RADEON_USAGE_READ,
-  rview->is_stencil_sampler, true);
+  sview->is_stencil_sampler, true);
} else {
pipe_sampler_view_reference(>views[slot], NULL);
memcpy(desc, null_texture_descriptor, 8*4);
/* Only clear the lower dwords of FMASK. */
memcpy(desc + 8, null_texture_descriptor, 4*4);
/* Re-set the sampler state if we are transitioning from FMASK. 
*/
if (samplers->sampler_states[slot])

si_set_sampler_state_desc(samplers->sampler_states[slot], NULL, NULL,
  desc + 12);
 
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/5] intel/fs: Exclude control sources from execution type and region alignment calculations.

2019-01-18 Thread Francisco Jerez
Currently the execution type calculation will return a bogus value in
cases like:

  mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u

Which will be considered to have a 32-bit integer execution type even
though the actual indirect move operation will be carried out with
16-bit precision.

Similarly there's no need to apply the CHV/BXT double-precision region
alignment restrictions to such control sources, since they aren't
directly involved in the double-precision arithmetic operations
emitted by these virtual instructions.  Applying the CHV/BXT
restrictions to control sources was expected to be harmless if mildly
inefficient, but unfortunately it exposed problems at codegen level
for virtual instructions (namely the SHUFFLE instruction used for the
Vulkan 1.1 subgroup feature) that weren't prepared to accept control
sources with an arbitrary strided region.
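
The effect on the execution type can be sketched as follows. This is a toy model, not the Mesa code: `Src`, `Inst`, `exec_type_size()` and `make_mov_indirect()` are hypothetical stand-ins for `fs_inst` and its helpers, and only operand sizes are tracked rather than full register types.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the IR types, for illustration only.
struct Src {
   unsigned type_size;   // operand type size in bytes
   bool is_control;      // true for offsets, indices, surface handles, ...
};

struct Inst {
   std::vector<Src> srcs;
};

// Size in bytes of the widest source that takes part in the operation.
// When skip_control is set, control sources are ignored, which is the
// role is_control_source() plays in the patch.
unsigned exec_type_size(const Inst &inst, bool skip_control)
{
   unsigned size = 0;
   for (const Src &src : inst.srcs) {
      if (skip_control && src.is_control)
         continue;
      size = std::max(size, src.type_size);
   }
   return size;
}

// Models: mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u
// Sources 1 and 2 are the control sources of MOV_INDIRECT.
Inst make_mov_indirect()
{
   Inst inst;
   inst.srcs = {Src{2, false}, Src{4, true}, Src{4, true}};
   return inst;
}
```

With the control sources counted, the toy instruction appears to execute at 4 bytes; with them skipped it executes at 2 bytes, matching the 16-bit data actually moved.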

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes 
Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass."
---
 src/intel/compiler/brw_fs.cpp | 54 +++
 src/intel/compiler/brw_fs_lower_regioning.cpp |  6 +--
 src/intel/compiler/brw_ir_fs.h| 10 +++-
 3 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 0359eb079f7..f475b617df2 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -271,6 +271,60 @@ fs_inst::is_send_from_grf() const
}
 }
 
+bool
+fs_inst::is_control_source(unsigned arg) const
+{
+   switch (opcode) {
+   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
+   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD_GEN7:
+   case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN4:
+   case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7:
+  return arg == 0;
+
+   case SHADER_OPCODE_BROADCAST:
+   case SHADER_OPCODE_SHUFFLE:
+   case SHADER_OPCODE_QUAD_SWIZZLE:
+   case FS_OPCODE_INTERPOLATE_AT_SAMPLE:
+   case FS_OPCODE_INTERPOLATE_AT_SHARED_OFFSET:
+   case FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET:
+   case SHADER_OPCODE_IMAGE_SIZE:
+   case SHADER_OPCODE_GET_BUFFER_SIZE:
+  return arg == 1;
+
+   case SHADER_OPCODE_MOV_INDIRECT:
+   case SHADER_OPCODE_CLUSTER_BROADCAST:
+   case SHADER_OPCODE_TEX:
+   case FS_OPCODE_TXB:
+   case SHADER_OPCODE_TXD:
+   case SHADER_OPCODE_TXF:
+   case SHADER_OPCODE_TXF_LZ:
+   case SHADER_OPCODE_TXF_CMS:
+   case SHADER_OPCODE_TXF_CMS_W:
+   case SHADER_OPCODE_TXF_UMS:
+   case SHADER_OPCODE_TXF_MCS:
+   case SHADER_OPCODE_TXL:
+   case SHADER_OPCODE_TXL_LZ:
+   case SHADER_OPCODE_TXS:
+   case SHADER_OPCODE_LOD:
+   case SHADER_OPCODE_TG4:
+   case SHADER_OPCODE_TG4_OFFSET:
+   case SHADER_OPCODE_SAMPLEINFO:
+   case SHADER_OPCODE_UNTYPED_ATOMIC:
+   case SHADER_OPCODE_UNTYPED_ATOMIC_FLOAT:
+   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
+   case SHADER_OPCODE_UNTYPED_SURFACE_WRITE:
+   case SHADER_OPCODE_BYTE_SCATTERED_READ:
+   case SHADER_OPCODE_BYTE_SCATTERED_WRITE:
+   case SHADER_OPCODE_TYPED_ATOMIC:
+   case SHADER_OPCODE_TYPED_SURFACE_READ:
+   case SHADER_OPCODE_TYPED_SURFACE_WRITE:
+  return arg == 1 || arg == 2;
+
+   default:
+  return false;
+   }
+}
+
 /**
  * Returns true if this instruction's sources and destinations cannot
  * safely be the same register.
diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp 
b/src/intel/compiler/brw_fs_lower_regioning.cpp
index df50993dee6..6a3c23892b4 100644
--- a/src/intel/compiler/brw_fs_lower_regioning.cpp
+++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
@@ -74,7 +74,7 @@ namespace {
  unsigned stride = inst->dst.stride * type_sz(inst->dst.type);
 
  for (unsigned i = 0; i < inst->sources; i++) {
-if (!is_uniform(inst->src[i]))
+if (!is_uniform(inst->src[i]) && !inst->is_control_source(i))
stride = MAX2(stride, inst->src[i].stride *
  type_sz(inst->src[i].type));
  }
@@ -92,7 +92,7 @@ namespace {
required_dst_byte_offset(const fs_inst *inst)
{
   for (unsigned i = 0; i < inst->sources; i++) {
- if (!is_uniform(inst->src[i]))
+ if (!is_uniform(inst->src[i]) && !inst->is_control_source(i))
 if (reg_offset(inst->src[i]) % REG_SIZE !=
 reg_offset(inst->dst) % REG_SIZE)
return 0;
@@ -109,7 +109,7 @@ namespace {
has_invalid_src_region(const gen_device_info *devinfo, const fs_inst *inst,
   unsigned i)
{
-  if (is_unordered(inst)) {
+  if (is_unordered(inst) || inst->is_control_source(i)) {
  return false;
   } else {
  const unsigned dst_byte_stride = inst->dst.stride * 
type_sz(inst->dst.type);
diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
index 08e3d83d910..0a0ba1d363a 100644
--- a/src/intel/compiler/brw_ir_fs.h
+++ b/src/intel/compiler/brw_ir_fs.h
@@ -358,6 +358,13 @@ public:
bool can_change_types() const;

[Mesa-dev] [PATCH 5/5] intel/fs: Rely on undocumented unrestricted regioning for 32x16-bit integer multiply.

2019-01-18 Thread Francisco Jerez
Even though the hardware spec claims that any "integer DWord multiply"
operation is affected by the regioning restrictions of CHV/BXT/GLK,
this is inconsistent with the behavior of the simulator and with
empirical evidence -- Return false from has_dst_aligned_region_restriction()
for such instructions as a micro-optimization.
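
A simplified sketch of the narrowed check, assuming operand sizes in bytes. The helper names are hypothetical, and the real predicate additionally requires an integer execution type and a MUL or MAD opcode.

```cpp
// Hypothetical, simplified version of the check: sizes are in bytes and only
// the two multiplied sources are considered.
constexpr unsigned min_size(unsigned a, unsigned b)
{
   return a < b ? a : b;
}

// True only when both multiplied sources are at least 32 bits wide.
constexpr bool is_dword_multiply(unsigned src_a_size, unsigned src_b_size)
{
   return min_size(src_a_size, src_b_size) >= 4;
}

static_assert(is_dword_multiply(4, 4), "32x32-bit multiply stays restricted");
static_assert(!is_dword_multiply(4, 2), "32x16-bit multiply is not restricted");
```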
---
 src/intel/compiler/brw_ir_fs.h | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
index 0a0ba1d363a..c50df45922a 100644
--- a/src/intel/compiler/brw_ir_fs.h
+++ b/src/intel/compiler/brw_ir_fs.h
@@ -543,11 +543,19 @@ has_dst_aligned_region_restriction(const gen_device_info 
*devinfo,
const fs_inst *inst)
 {
const brw_reg_type exec_type = get_exec_type(inst);
-   const bool is_int_multiply = !brw_reg_type_is_floating_point(exec_type) &&
- (inst->opcode == BRW_OPCODE_MUL || inst->opcode == BRW_OPCODE_MAD);
+   /* Even though the hardware spec claims that "integer DWord multiply"
+* operations are restricted, empirical evidence and the behavior of the
+* simulator suggest that only 32x32-bit integer multiplication is
+* restricted.
+*/
+   const bool is_dword_multiply = !brw_reg_type_is_floating_point(exec_type) &&
+  ((inst->opcode == BRW_OPCODE_MUL &&
+MIN2(type_sz(inst->src[0].type), type_sz(inst->src[1].type)) >= 4) ||
+   (inst->opcode == BRW_OPCODE_MAD &&
+MIN2(type_sz(inst->src[1].type), type_sz(inst->src[2].type)) >= 4));
 
if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
-   (type_sz(exec_type) == 4 && is_int_multiply))
+   (type_sz(exec_type) == 4 && is_dword_multiply))
   return devinfo->is_cherryview || gen_device_info_is_9lp(devinfo);
else
   return false;
-- 
2.19.2



[Mesa-dev] [PATCH 2/5] intel/fs: Lower integer multiply correctly when destination stride equals 4.

2019-01-18 Thread Francisco Jerez
Because the "low" temporary needs to be accessed with word type and
twice the original stride, attempting to preserve the alignment of the
original destination can potentially lead to instructions with illegal
destination stride greater than four.  Because the CHV/BXT alignment
restrictions are now being enforced by the regioning lowering pass run
after lower_integer_multiplication(), there is no real need to
preserve the original strides anymore.
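
The stride arithmetic behind this can be sketched as below. The helper names are hypothetical, and the limit of four elements for a destination stride is taken from the description above.

```cpp
// Stride, in 16-bit words, seen when a 32-bit region is accessed with word type.
constexpr unsigned word_stride(unsigned dword_stride)
{
   return 2 * dword_stride;
}

// Destination strides greater than four elements are illegal.
constexpr bool is_legal_dst_stride(unsigned stride)
{
   return stride <= 4;
}

static_assert(is_legal_dst_stride(word_stride(2)), "stride 2 doubles to 4");
static_assert(!is_legal_dst_stride(word_stride(4)), "stride 4 doubles to 8");
```

This is why the patch forces a fresh temporary, and a final MOV, whenever the destination stride is already 4.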

Note that this bug can be reproduced on stable branches, but
back-porting would be non-trivial, because the fix relies on the
regioning lowering pass recently introduced.
---
 src/intel/compiler/brw_fs.cpp | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index f475b617df2..5768e0d6542 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -3962,13 +3962,11 @@ fs_visitor::lower_integer_multiplication()
 regions_overlap(inst->dst, inst->size_written,
 inst->src[0], inst->size_read(0)) ||
 regions_overlap(inst->dst, inst->size_written,
-inst->src[1], inst->size_read(1))) {
+inst->src[1], inst->size_read(1)) ||
+inst->dst.stride >= 4) {
needs_mov = true;
-   /* Get a new VGRF but keep the same stride as inst->dst */
low = fs_reg(VGRF, alloc.allocate(regs_written(inst)),
 inst->dst.type);
-   low.stride = inst->dst.stride;
-   low.offset = inst->dst.offset % REG_SIZE;
 }
 
 /* Get a new VGRF but keep the same stride as inst->dst */
-- 
2.19.2



[Mesa-dev] [PATCH 3/5] intel/fs: Cap dst-aligned region stride to maximum representable hstride value.

2019-01-18 Thread Francisco Jerez
This is required in combination with the following commit, because
otherwise if a source region with an extended 8+ stride is present in
the instruction (which we're about to declare legal) we'll end up
emitting code that attempts to write to such a region, even though
strides greater than four are still illegal for the destination.
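
A minimal sketch of the capping rule, assuming a toy `Operand` type (hypothetical) and omitting the uniform and control-source filtering done by the real function:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical operand description: stride in elements, type size in bytes.
struct Operand {
   unsigned stride;
   unsigned type_size;
};

// Byte stride to use for the destination-aligned region: the largest byte
// stride among the operands, capped at an element stride of 4 for the
// smallest type present.
unsigned capped_byte_stride(const Operand &dst, const std::vector<Operand> &srcs)
{
   unsigned max_stride = dst.stride * dst.type_size;
   unsigned min_size = dst.type_size;

   for (const Operand &src : srcs) {
      max_stride = std::max(max_stride, src.stride * src.type_size);
      min_size = std::min(min_size, src.type_size);
   }

   return std::min(max_stride, 4 * min_size);
}
```

For example, a packed dword destination with a word source of stride 8 has a largest byte stride of 16, but the result is capped at 4 * 2 = 8 bytes.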
---
 src/intel/compiler/brw_fs_lower_regioning.cpp | 20 ++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp 
b/src/intel/compiler/brw_fs_lower_regioning.cpp
index 6a3c23892b4..b86e95ed9eb 100644
--- a/src/intel/compiler/brw_fs_lower_regioning.cpp
+++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
@@ -71,15 +71,25 @@ namespace {
   !is_byte_raw_mov(inst)) {
  return get_exec_type_size(inst);
   } else {
- unsigned stride = inst->dst.stride * type_sz(inst->dst.type);
+ /* Calculate the maximum byte stride and the minimum type size across
+  * all source and destination operands.
+  */
+ unsigned max_stride = inst->dst.stride * type_sz(inst->dst.type);
+ unsigned min_size = type_sz(inst->dst.type);
 
  for (unsigned i = 0; i < inst->sources; i++) {
-if (!is_uniform(inst->src[i]) && !inst->is_control_source(i))
-   stride = MAX2(stride, inst->src[i].stride *
- type_sz(inst->src[i].type));
+if (!is_uniform(inst->src[i]) && !inst->is_control_source(i)) {
+   max_stride = MAX2(max_stride, inst->src[i].stride *
+ type_sz(inst->src[i].type));
+   min_size = MIN2(min_size, type_sz(inst->src[i].type));
+}
  }
 
- return stride;
+ /* Attempt to use the largest byte stride among all present operands,
+  * but never exceed a stride of 4 since that would lead to illegal
+  * destination regions during lowering.
+  */
+ return MIN2(max_stride, 4 * min_size);
   }
}
 
-- 
2.19.2



[Mesa-dev] [PATCH 4/5] intel/fs: Implement extended strides greater than 4 for IR source regions.

2019-01-18 Thread Francisco Jerez
Strides up to 32B can be implemented for the source regions of most
instructions by leveraging either the vertical or the horizontal
stride of the hardware Align1 region.  The main motivation for this is
that currently the lower_integer_multiplication() pass will happily
double the stride of one of the 32-bit sources, which can blow up if
the stride of the original source was already the maximum value
allowed by the hardware.
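
The two encodings can be compared with a small model of Align1 region addressing. This is a sketch under assumptions: the helper names are hypothetical, offsets are in elements rather than bytes, and the usual <vstride; width, hstride> interpretation is assumed.

```cpp
// Element offset of channel `i` within a region <vstride; width, hstride>.
constexpr unsigned region_offset(unsigned i, unsigned vstride,
                                 unsigned width, unsigned hstride)
{
   return (i / width) * vstride + (i % width) * hstride;
}

// Usual encoding for an IR stride of at most 4: <width * stride; width, stride>.
constexpr unsigned regular_offset(unsigned i, unsigned width, unsigned stride)
{
   return region_offset(i, width * stride, width, stride);
}

// Encoding used by the patch for IR strides above 4: <stride; 1, 0>, so the
// vertical stride alone steps from one channel to the next.
constexpr unsigned extended_offset(unsigned i, unsigned stride)
{
   return region_offset(i, stride, 1, 0);
}

static_assert(regular_offset(5, 4, 2) == 5 * 2, "regular region is linear");
static_assert(extended_offset(5, 8) == 5 * 8, "extended region is linear");
```

Both encodings place channel i at i * stride; the second one simply moves the step into the vertical stride, which can take larger values than the horizontal one.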

An alternative would be to use the regioning legalization pass in
order to lower such strides into the composition of multiple legal
strides, but that would be somewhat less efficient.

This showed up as a regression from my commit cbea91eb57a501bebb1ca2
in Vulkan 1.1 CTS tests on CHV/BXT platforms, however it was really a
pre-existing problem that had affected conformance on other platforms
without native support for integer multiplication.  CHV/BXT were
getting around it because the code I removed in that commit had the
"fortunate" side effect of emitting narrower regions that didn't hit
the hardware stride limit after lowering.  Beyond fixing the
regression this fixes ~90 additional Vulkan 1.1 subgroup CTS tests on
ICL (that's why this patch is marked for inclusion in mesa-stable even
though the original regressing patch was not).

Cc: mesa-sta...@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes 
---
 src/intel/compiler/brw_fs_generator.cpp | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/intel/compiler/brw_fs_generator.cpp 
b/src/intel/compiler/brw_fs_generator.cpp
index 5fc6cf5f8cc..b169eacf15b 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -90,9 +90,17 @@ brw_reg_from_fs_reg(const struct gen_device_info *devinfo, 
fs_inst *inst,
   *   different execution size when the number of components
   *   written to each destination GRF is not the same.
   */
- const unsigned width = MIN2(reg_width, phys_width);
- brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0);
- brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride);
+ if (reg->stride > 4) {
+assert(reg != >dst);
+assert(reg->stride * type_sz(reg->type) <= REG_SIZE);
+brw_reg = brw_vecn_reg(1, brw_file_from_reg(reg), reg->nr, 0);
+brw_reg = stride(brw_reg, reg->stride, 1, 0);
+
+ } else {
+const unsigned width = MIN2(reg_width, phys_width);
+brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), reg->nr, 0);
+brw_reg = stride(brw_reg, width * reg->stride, width, reg->stride);
+ }
 
  if (devinfo->gen == 7 && !devinfo->is_haswell) {
 /* From the IvyBridge PRM (EU Changes by Processor Generation, 
page 13):
-- 
2.19.2



Re: [Mesa-dev] [PATCH] mesa: add EXT_debug_label support

2019-01-18 Thread Timothy Arceri

On 11/12/18 5:11 pm, Ian Romanick wrote:

On 12/10/18 5:52 PM, Timothy Arceri wrote:

On 11/12/18 11:35 am, Ian Romanick wrote:

It seems like someone already sent out patches to implement this, and we
decided to not take it for some reason.  Maybe it was Rob?


I discovered a thread from the beginning of 2017 titled "feature.txt &
EXT_debug_label extension", but I couldn't find any implementation.

There was a reply from yourself, but it seems incorrect to me:

"I checked both extensions, and they're not "just" aliases.  The EXT adds
a single function with an enum to select the kind of object.  The KHR
adds a function per kind of object.  It would be easy enough to add, but
it seems more valuable to suggest the developer use the more broadly
supported extension."


That's weird for a couple reasons.  One, that's not even the discussion
that I was thinking of.  I'll check in the morning to see if I can find
it.  Two, I was clearly full of it... I really don't see how I came to that
conclusion.  I don't even see any other related extensions that I could
have been confusing either thing with.


So do you think we should push this?




On 12/10/18 4:08 PM, Timothy Arceri wrote:

KHR_debug already provides superior functionality but this
extension is still in use and adding support for it seems fairly
harmless. For example it seems to be used by Unity as seen in the
Parkitect trace attached to Mesa bug #108919.
---
   src/mapi/glapi/gen/gl_API.xml    | 17 +
   src/mesa/main/extensions_table.h |  1 +
   src/mesa/main/objectlabel.c  |  6 ++
   3 files changed, 24 insertions(+)

diff --git a/src/mapi/glapi/gen/gl_API.xml
b/src/mapi/glapi/gen/gl_API.xml
index f1def8090d..75423c4edb 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -12973,6 +12973,23 @@
   
   
   +
+  


Since these are just aliases, I don't think any changes are needed in
dispatch-sanity... but did you run 'make check' anyway? :)



Yes :) Passed as expected.



+    
+    
+    
+    
+  
+
+  
+    
+    
+    
+    
+    
+  
+
+
   http://www.w3.org/2001/XInclude"/>
     
diff --git a/src/mesa/main/extensions_table.h
b/src/mesa/main/extensions_table.h
index dad38124d5..b68f6781c4 100644
--- a/src/mesa/main/extensions_table.h
+++ b/src/mesa/main/extensions_table.h
@@ -217,6 +217,7 @@ EXT(EXT_compiled_vertex_array   ,
dummy_true
   EXT(EXT_compressed_ETC1_RGB8_sub_texture    ,
OES_compressed_ETC1_RGB8_texture   ,  x ,  x , ES1, ES2, 2014)
   EXT(EXT_copy_image  ,
OES_copy_image ,  x ,  x ,  x ,  30, 2014)
   EXT(EXT_copy_texture    ,
dummy_true , GLL,  x ,  x ,  x , 1995)
+EXT(EXT_debug_label ,
dummy_true , GLL, GLC,  x ,  x , 2013)
   EXT(EXT_depth_bounds_test   ,
EXT_depth_bounds_test  , GLL, GLC,  x ,  x , 2002)
   EXT(EXT_discard_framebuffer ,
dummy_true ,  x ,  x , ES1, ES2, 2009)
   EXT(EXT_disjoint_timer_query    ,
EXT_disjoint_timer_query   ,  x ,  x ,  x , ES2, 2016)
diff --git a/src/mesa/main/objectlabel.c b/src/mesa/main/objectlabel.c
index 1e3022ee54..9d4cc1871e 100644
--- a/src/mesa/main/objectlabel.c
+++ b/src/mesa/main/objectlabel.c
@@ -139,6 +139,7 @@ get_label_pointer(struct gl_context *ctx, GLenum
identifier, GLuint name,
    switch (identifier) {
  case GL_BUFFER:
+   case GL_BUFFER_OBJECT_EXT:
     {
    struct gl_buffer_object *bufObj =
_mesa_lookup_bufferobj(ctx, name);
    if (bufObj)
@@ -146,6 +147,7 @@ get_label_pointer(struct gl_context *ctx, GLenum
identifier, GLuint name,
     }
     break;
  case GL_SHADER:
+   case GL_SHADER_OBJECT_EXT:
     {
    struct gl_shader *shader = _mesa_lookup_shader(ctx, name);
    if (shader)
@@ -153,6 +155,7 @@ get_label_pointer(struct gl_context *ctx, GLenum
identifier, GLuint name,
     }
     break;
  case GL_PROGRAM:
+   case GL_PROGRAM_OBJECT_EXT:
     {
    struct gl_shader_program *program =
   _mesa_lookup_shader_program(ctx, name);
@@ -161,6 +164,7 @@ get_label_pointer(struct gl_context *ctx, GLenum
identifier, GLuint name,
     }
     break;
  case GL_VERTEX_ARRAY:
+   case GL_VERTEX_ARRAY_OBJECT_EXT:
     {
    struct gl_vertex_array_object *obj = _mesa_lookup_vao(ctx,
name);
    if (obj)
@@ -168,6 +172,7 @@ get_label_pointer(struct gl_context *ctx, GLenum
identifier, GLuint name,
     }
     break;
  case GL_QUERY:
+   case GL_QUERY_OBJECT_EXT:
     {
    struct gl_query_object *query =
_mesa_lookup_query_object(ctx, name);
    if (query)
@@ -225,6 +230,7 @@ get_label_pointer(struct gl_context *ctx, GLenum
identifier, GLuint name,
     }
     break;
  case GL_PROGRAM_PIPELINE:
+   case 

Re: [Mesa-dev] [PATCH v10 09/20] clover: Track flags per module section

2019-01-18 Thread Francisco Jerez
Pierre Moreau  writes:

> One flag that needs to be tracked is whether a library is allowed to
> receive mathematics optimisations or not, as the authorisation is given
> when creating the library while the optimisations are specified when
> creating the executable.
>
> Reviewed-by: Aaron Watry 
>
> Changes since:
> * v3: drop the modification to the tgsi backend, as already dropped
>   (Aaron Watry)
>
> Signed-off-by: Pierre Moreau 
> ---
>  src/gallium/state_trackers/clover/core/module.cpp   |  1 +
>  src/gallium/state_trackers/clover/core/module.hpp   | 13 +
>  .../state_trackers/clover/llvm/codegen/bitcode.cpp  |  3 ++-
>  .../state_trackers/clover/llvm/codegen/common.cpp   |  2 +-
>  4 files changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/src/gallium/state_trackers/clover/core/module.cpp 
> b/src/gallium/state_trackers/clover/core/module.cpp
> index a6c5b98d8e0..0e11506d0d7 100644
> --- a/src/gallium/state_trackers/clover/core/module.cpp
> +++ b/src/gallium/state_trackers/clover/core/module.cpp
> @@ -163,6 +163,7 @@ namespace {
>proc(S , QT ) {
>   _proc(s, x.id);
>   _proc(s, x.type);
> + _proc(s, x.flags);
>   _proc(s, x.size);
>   _proc(s, x.data);
>}
> diff --git a/src/gallium/state_trackers/clover/core/module.hpp 
> b/src/gallium/state_trackers/clover/core/module.hpp
> index 2ddd26426fb..ff7e9b6234a 100644
> --- a/src/gallium/state_trackers/clover/core/module.hpp
> +++ b/src/gallium/state_trackers/clover/core/module.hpp
> @@ -41,14 +41,19 @@ namespace clover {
>  data_local,
>  data_private
>   };
> + enum class flags_t {

You probably want the type to be "enum flags" for consistency with the
other enums defined here.

> +none,
> +allow_link_options

And explicitly define allow_link_options = 1u, assuming that this is
going to be a bit-mask with multiple flags.

Is this patch being used at all in this series?
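
A sketch of what the suggestion above could look like, assuming a plain `enum flags` used as a bit-mask. The stand-alone `section` struct, the second flag, and the helpers are hypothetical and only there to show how flags would combine.

```cpp
// Hypothetical stand-in for module::section, reduced to the flags enum.
struct section {
   enum flags {
      none = 0u,
      allow_link_options = 1u << 0,
      example_other_flag = 1u << 1   // hypothetical, only to show combination
   };
};

// Combine two flag sets.
inline section::flags operator|(section::flags a, section::flags b)
{
   return static_cast<section::flags>(static_cast<unsigned>(a) |
                                      static_cast<unsigned>(b));
}

// Test whether any bit of `f` is present in `set`.
inline bool has_flag(section::flags set, section::flags f)
{
   return (static_cast<unsigned>(set) & static_cast<unsigned>(f)) != 0;
}
```

Giving each enumerator an explicit power-of-two value keeps the on-disk representation stable if more flags are added later.
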

> + };
>  
> - section(resource_id id, enum type type, size_t size,
> - const std::vector ) :
> - id(id), type(type), size(size), data(data) { }
> - section() : id(0), type(text_intermediate), size(0), data() { }
> + section(resource_id id, enum type type, flags_t flags,
> + size_t size, const std::vector ) :
> + id(id), type(type), flags(flags), size(size), data(data) { }
> + section() : id(0), type(text_intermediate), flags(flags_t::none), 
> size(0), data() { }
>  
>   resource_id id;
>   type type;
> + flags_t flags;
>   size_t size;
>   std::vector data;
>};
> diff --git a/src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp 
> b/src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp
> index 40bb426218d..8e9d4c7e85c 100644
> --- a/src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp
> +++ b/src/gallium/state_trackers/clover/llvm/codegen/bitcode.cpp
> @@ -84,7 +84,8 @@ clover::llvm::build_module_library(const ::llvm::Module 
> ,
> enum module::section::type section_type) {
> module m;
> const auto code = emit_code(mod);
> -   m.secs.emplace_back(0, section_type, code.size(), code);
> +   m.secs.emplace_back(0, section_type, module::section::flags_t::none,
> +   code.size(), code);
> return m;
>  }
>  
> diff --git a/src/gallium/state_trackers/clover/llvm/codegen/common.cpp 
> b/src/gallium/state_trackers/clover/llvm/codegen/common.cpp
> index ca5f78940d2..a278e675003 100644
> --- a/src/gallium/state_trackers/clover/llvm/codegen/common.cpp
> +++ b/src/gallium/state_trackers/clover/llvm/codegen/common.cpp
> @@ -178,7 +178,7 @@ namespace {
> make_text_section(const std::vector ) {
>const pipe_llvm_program_header header { uint32_t(code.size()) };
>module::section text { 0, module::section::text_executable,
> - header.num_bytes, {} };
> + module::section::flags_t::none, 
> header.num_bytes, {} };
>  
>text.data.insert(text.data.end(), reinterpret_cast *>(),
> reinterpret_cast() + 
> sizeof(header));
> -- 
> 2.20.1




Re: [Mesa-dev] [PATCH v10 06/20] clover/api: Rework the validation of devices for building

2019-01-18 Thread Francisco Jerez
Pierre Moreau  writes:

> Reviewed-by: Francisco Jerez 
>
> Changes since:
> * v5:
>   - Drop the `valid_devs` argument to `validate_build_common()`
> (Francisco Jerez)
>   - Change `clLinkProgram()` to initialise `prog`’s devices prior to
> calling `validate_build_common()`.
> * v2:
>   - validate_build_common no longer returns a list of devices (Francisco
> Jerez);
>   - Dropped duplicate checks (Francisco Jerez).
>
> Signed-off-by: Pierre Moreau 

The current revision of this patch is still:

Reviewed-by: Francisco Jerez 

> ---
>  .../state_trackers/clover/api/program.cpp  | 18 +-
>  .../state_trackers/clover/core/program.cpp |  3 ++-
>  2 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/src/gallium/state_trackers/clover/api/program.cpp 
> b/src/gallium/state_trackers/clover/api/program.cpp
> index 9d59668f8f6..891a002f3d0 100644
> --- a/src/gallium/state_trackers/clover/api/program.cpp
> +++ b/src/gallium/state_trackers/clover/api/program.cpp
> @@ -41,7 +41,7 @@ namespace {
>   throw error(CL_INVALID_OPERATION);
>  
>if (any_of([&](const device ) {
> -   return !count(dev, prog.context().devices());
> +   return !count(dev, prog.devices());
>  }, objs(d_devs, num_devs)))
>   throw error(CL_INVALID_DEVICE);
> }
> @@ -176,8 +176,8 @@ clBuildProgram(cl_program d_prog, cl_uint num_devs,
> void (*pfn_notify)(cl_program, void *),
> void *user_data) try {
> auto  = obj(d_prog);
> -   auto devs = (d_devs ? objs(d_devs, num_devs) :
> -ref_vector(prog.context().devices()));
> +   auto devs =
> +  (d_devs ? objs(d_devs, num_devs) : ref_vector(prog.devices()));
> const auto opts = std::string(p_opts ? p_opts : "") + " " +
>   debug_get_option("CLOVER_EXTRA_BUILD_OPTIONS", "");
>  
> @@ -202,8 +202,8 @@ clCompileProgram(cl_program d_prog, cl_uint num_devs,
>   void (*pfn_notify)(cl_program, void *),
>   void *user_data) try {
> auto  = obj(d_prog);
> -   auto devs = (d_devs ? objs(d_devs, num_devs) :
> -ref_vector(prog.context().devices()));
> +   auto devs =
> +   (d_devs ? objs(d_devs, num_devs) : 
> ref_vector(prog.devices()));
> const auto opts = std::string(p_opts ? p_opts : "") + " " +
>   debug_get_option("CLOVER_EXTRA_COMPILE_OPTIONS", "");
> header_map headers;
> @@ -279,10 +279,10 @@ clLinkProgram(cl_context d_ctx, cl_uint num_devs, const 
> cl_device_id *d_devs,
> const auto opts = std::string(p_opts ? p_opts : "") + " " +
>   debug_get_option("CLOVER_EXTRA_LINK_OPTIONS", "");
> auto progs = objs(d_progs, num_progs);
> -   auto prog = create(ctx);
> -   auto devs = validate_link_devices(progs,
> - (d_devs ? objs(d_devs, num_devs) :
>  ref_vector(ctx.devices())));
> +   auto all_devs =
> +  (d_devs ? objs(d_devs, num_devs) : ref_vector(ctx.devices()));
> +   auto prog = create(ctx, all_devs);
> +   auto devs = validate_link_devices(progs, all_devs);
>  
> validate_build_common(prog, num_devs, d_devs, pfn_notify, user_data);
>  
> diff --git a/src/gallium/state_trackers/clover/core/program.cpp 
> b/src/gallium/state_trackers/clover/core/program.cpp
> index ec71d99b017..62fa13efbf9 100644
> --- a/src/gallium/state_trackers/clover/core/program.cpp
> +++ b/src/gallium/state_trackers/clover/core/program.cpp
> @@ -26,7 +26,8 @@
>  using namespace clover;
>  
>  program::program(clover::context &ctx, const std::string &source) :
> -   has_source(true), context(ctx), _source(source), _kernel_ref_counter(0) {
> +   has_source(true), context(ctx), _devices(ctx.devices()), _source(source),
> +   _kernel_ref_counter(0) {
>  }
>  
>  program::program(clover::context &ctx,
> -- 
> 2.20.1


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/8] i965: Added support for ETC2 texture arrays on Gen7

2019-01-18 Thread Nanley Chery
On Mon, Nov 19, 2018 at 10:54:11AM +0200, Eleni Maria Stea wrote:
> Modified the calculation of the number of slices in the
> intel_update_decompressed_shadow function to take the array length into
> account to support arrays.
> ---

At this point, we can delete map_etc and unmap_etc, right?

-Nanley

>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index 4886bb2b96..0840b3b243 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -3965,6 +3965,8 @@ intel_update_decompressed_shadow(struct brw_context 
> *brw,
> int level_w = img_w;
> int level_h = img_h;
>  
> +   int num_slices = img_d * smt->surf.logical_level0_px.array_len;
> +
> for (int level = smt->first_level; level <= smt->last_level; level++) {
>ptrdiff_t shadow_stride = _mesa_format_row_stride(smt->format,
>  level_w);
> @@ -3972,7 +3974,7 @@ intel_update_decompressed_shadow(struct brw_context 
> *brw,
>ptrdiff_t main_stride = _mesa_format_row_stride(mt->format,
>level_w);
>  
> -  for (unsigned int slice = 0; slice < img_d; slice++) {
> +  for (unsigned int slice = 0; slice < num_slices; slice++) {
>   GLbitfield mmode = GL_MAP_READ_BIT | BRW_MAP_DIRECT_BIT |
>  BRW_MAP_ETC_BIT;
>   GLbitfield smode = GL_MAP_WRITE_BIT |
> -- 
> 2.19.0
> 
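
For illustration, here is a minimal sketch of the slice count the patch
description refers to. The helper names shadow_num_slices and
shadow_slice_updates are hypothetical and not part of the i965 driver; the
only thing taken from the patch is that the per-level slice loop is bounded
by depth times array length rather than by depth alone.

```c
#include <assert.h>

/* Hypothetical helper: number of slices to copy per mip level.
 * Mirrors "img_d * array_len" from the patch. */
static unsigned
shadow_num_slices(unsigned depth, unsigned array_len)
{
   return depth * array_len;
}

/* Hypothetical helper: how many (level, slice) pairs a loop shaped like
 * the one in the patch would visit. As in the patch, the slice count is
 * computed once and reused for every level. */
static unsigned
shadow_slice_updates(unsigned first_level, unsigned last_level,
                     unsigned depth, unsigned array_len)
{
   unsigned num_slices = shadow_num_slices(depth, array_len);
   unsigned visited = 0;

   for (unsigned level = first_level; level <= last_level; level++) {
      for (unsigned slice = 0; slice < num_slices; slice++)
         visited++;
   }
   return visited;
}
```

With a depth of 1 and an array length of 6 over three mip levels, this
visits 18 slices, where a depth-only bound would visit 3.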


Re: [Mesa-dev] [PATCH] ac/nir_to_llvm: fix interpolateAt* for arrays

2019-01-18 Thread Timothy Arceri

On 19/1/19 10:29 am, Bas Nieuwenhuizen wrote:

On Sat, Jan 19, 2019 at 12:27 AM Bas Nieuwenhuizen wrote:

On Sat, Jan 19, 2019 at 12:17 AM Timothy Arceri wrote:

On 19/1/19 9:36 am, Bas Nieuwenhuizen wrote:

On Thu, Jan 10, 2019 at 6:59 AM Timothy Arceri wrote:

This builds on the recent interpolate fix by Rhys ee8488ea3b99.

This doesn't handle arrays of structs but I've got a feeling those
might be broken even for radeonsi tgsi (we currently have no tests).

This fixes the arb_gpu_shader5 interpolateAt* tests that contain
arrays.

Fixes: ee8488ea3b99 ("ac/nir,radv,radeonsi/nir: use correct indices for 
interpolation intrinsics")
---
   src/amd/common/ac_nir_to_llvm.c | 80 +
   1 file changed, 61 insertions(+), 19 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 5023b96f92..00011a439d 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -2830,15 +2830,16 @@ static LLVMValueRef visit_interp(struct ac_nir_context 
*ctx,
   const nir_intrinsic_instr *instr)
   {
  LLVMValueRef result[4];
-   LLVMValueRef interp_param, attr_number;
+   LLVMValueRef interp_param;
  unsigned location;
  unsigned chan;
  LLVMValueRef src_c0 = NULL;
  LLVMValueRef src_c1 = NULL;
  LLVMValueRef src0 = NULL;

-   nir_variable *var = 
nir_deref_instr_get_variable(nir_instr_as_deref(instr->src[0].ssa->parent_instr));
-   int input_index = ctx->abi->fs_input_attr_indices[var->data.location - 
VARYING_SLOT_VAR0];
+   nir_deref_instr *deref_instr = 
nir_instr_as_deref(instr->src[0].ssa->parent_instr);
+   nir_variable *var = nir_deref_instr_get_variable(deref_instr);
+   int input_base = ctx->abi->fs_input_attr_indices[var->data.location - 
VARYING_SLOT_VAR0];
  switch (instr->intrinsic) {
  case nir_intrinsic_interp_deref_at_centroid:
  location = INTERP_CENTROID;
@@ -2868,7 +2869,6 @@ static LLVMValueRef visit_interp(struct ac_nir_context 
*ctx,
  src_c1 = LLVMBuildFSub(ctx->ac.builder, src_c1, halfval, "");
  }
  interp_param = ctx->abi->lookup_interp_param(ctx->abi, 
var->data.interpolation, location);
-   attr_number = LLVMConstInt(ctx->ac.i32, input_index, false);

  if (location == INTERP_CENTER) {
  LLVMValueRef ij_out[2];
@@ -2906,26 +2906,68 @@ static LLVMValueRef visit_interp(struct ac_nir_context 
*ctx,

  }

+   LLVMValueRef array_idx = ctx->ac.i32_0;
+   while(deref_instr->deref_type != nir_deref_type_var) {
+   if (deref_instr->deref_type == nir_deref_type_array) {
+   unsigned array_size = 
glsl_get_aoa_size(deref_instr->type);
+   if (!array_size)
+   array_size = 1;
+
+   LLVMValueRef offset;
+   nir_const_value *const_value = 
nir_src_as_const_value(deref_instr->arr.index);
+   if (const_value) {
+   offset = LLVMConstInt(ctx->ac.i32, array_size * 
const_value->u32[0], false);
+   } else {
+   LLVMValueRef indirect = get_src(ctx, 
deref_instr->arr.index);
+
+   offset = LLVMBuildMul(ctx->ac.builder, indirect,
+ LLVMConstInt(ctx->ac.i32, 
array_size, false), "");
+   }
+
+   array_idx = LLVMBuildAdd(ctx->ac.builder, array_idx, offset, 
"");
+   deref_instr = nir_src_as_deref(deref_instr->parent);
+   } else if (deref_instr->deref_type == nir_deref_type_struct) {
+   /* TODO: Probably need to do more here to support 
arrays of structs etc */
+   deref_instr = nir_src_as_deref(deref_instr->parent);


If we don't have confidence this works can we just have it go to the
unreachable below. IIRC spirv->nir also lowered struct inputs so I'm
not even sure we would encounter this.

This will work for structs, just probably not for arrays of structs. We
do need struct handling for radeonsi so I'd rather leave this as is.

Actually, how does this work for structs? I find it suspicious we don't
care about which member is taken?

Yeah, you're right. It seems the piglit tests are too simple and always use
the first member.

I think I will fall through to the unreachable() as you suggested for
now. Then I'll write some better tests before adding proper struct support.

Thanks for the review.

Otherwise,

Reviewed-by: Bas Nieuwenhuizen


+   } else {
+   unreachable("Unsupported deref type");
+   }
+
+   }
+
+   unsigned input_array_size = glsl_get_aoa_size(var->type);
+   if (!input_array_size)
+   input_array_size = 1;
+
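
As a reading aid, here is a small sketch of how the deref walk in this patch
appears to accumulate a flat element index for arrays of arrays. It is not
the driver code: plain arrays of dimensions and indices stand in for the NIR
deref chain, the names aoa_stride and flatten_index are hypothetical, and it
assumes the stride at each level is the product of the inner dimensions
(1 for the innermost level, matching the "if (!array_size) array_size = 1"
fallback).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: number of scalar elements covered by one step at the
 * given nesting level, i.e. the product of all inner dimensions. */
static unsigned
aoa_stride(const unsigned *dims, size_t ndims, size_t level)
{
   unsigned stride = 1;

   for (size_t i = level + 1; i < ndims; i++)
      stride *= dims[i];
   return stride;
}

/* Hypothetical: walk from the innermost index outwards, the same
 * direction as following deref->parent, adding index * stride. */
static unsigned
flatten_index(const unsigned *dims, const unsigned *idx, size_t ndims)
{
   unsigned array_idx = 0;

   for (size_t level = ndims; level-- > 0;)
      array_idx += idx[level] * aoa_stride(dims, ndims, level);
   return array_idx;
}
```

For an input declared as v[3][4], the access v[2][1] flattens to
2 * 4 + 1 = 9. Struct members are deliberately left out, in line with the
discussion above.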
 

Re: [Mesa-dev] [PATCH 3/8] i965: Faking the ETC2 compression on Gen < 8 GPUs using two miptrees.

2019-01-18 Thread Nanley Chery
On Mon, Nov 19, 2018 at 10:54:07AM +0200, Eleni Maria Stea wrote:
> GPUs Gen < 8 cannot render ETC2 formats. So far, they converted the
> compressed EAC/ETC2 images to non-compressed RGB format images that they
> can render. When GetCompressed* functions were called, the pixels were
> returned in the RGB format and not the compressed format as expected.
> 
> Trying to fix this problem, we use the shadow miptree to store the
> decompressed data for the rendering and the main miptree to store the
> compressed. We use the BRW_MAP_ETC_BIT as a flag to indicate when we
> use the fake compression in order to map the main tree with the
> compressed data. The functions that upload the compressed data as well
> as the mapping/unmapping functions are now updated to use this flag.

Did you mean sample instead of render?

> ---
>  .../drivers/dri/i965/brw_wm_surface_state.c   | 26 +-
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 73 +--
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 17 
>  src/mesa/drivers/dri/i965/intel_tex_image.c   | 45 -
>  src/mesa/main/texstore.c  | 92 +++
>  src/mesa/main/texstore.h  |  9 ++
>  6 files changed, 204 insertions(+), 58 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
> b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> index e214fae140..4d1eafac91 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> @@ -329,6 +329,17 @@ brw_get_texture_swizzle(const struct gl_context *ctx,
>  {
> const struct gl_texture_image *img = t->Image[0][t->BaseLevel];
>  
> +   struct brw_context *brw = brw_context((struct gl_context *)ctx);
> +   const struct gen_device_info *devinfo = &brw->screen->devinfo;
> +   bool is_fake_etc = _mesa_is_format_etc2(img->TexFormat) &&
> +  devinfo->gen < 8;
> +
> +   mesa_format format;
> +   if (is_fake_etc)
> +  format = intel_lower_compressed_format(brw, img->TexFormat);
> +   else
> +  format = img->TexFormat;
> +

Why is modifying this function necessary?

> int swizzles[SWIZZLE_NIL + 1] = {
>SWIZZLE_X,
>SWIZZLE_Y,
> @@ -381,7 +392,7 @@ brw_get_texture_swizzle(const struct gl_context *ctx,
>}
> }
>  
> -   GLenum datatype = _mesa_get_format_datatype(img->TexFormat);
> +   GLenum datatype = _mesa_get_format_datatype(format);
>  
> /* If the texture's format is alpha-only, force R, G, and B to
>  * 0.0. Similarly, if the texture's format has no alpha channel,
> @@ -422,9 +433,9 @@ brw_get_texture_swizzle(const struct gl_context *ctx,
> case GL_RED:
> case GL_RG:
> case GL_RGB:
> -  if (_mesa_get_format_bits(img->TexFormat, GL_ALPHA_BITS) > 0 ||
> -  img->TexFormat == MESA_FORMAT_RGB_DXT1 ||
> -  img->TexFormat == MESA_FORMAT_SRGB_DXT1)
> +  if (_mesa_get_format_bits(format, GL_ALPHA_BITS) > 0 ||
> +  format == MESA_FORMAT_RGB_DXT1 ||
> +  format == MESA_FORMAT_SRGB_DXT1)
>   swizzles[3] = SWIZZLE_ONE;
>break;
> }
> @@ -474,6 +485,11 @@ static void brw_update_texture_surface(struct gl_context 
> *ctx,
>struct intel_texture_object *intel_obj = intel_texture_object(obj);
>struct intel_mipmap_tree *mt = intel_obj->mt;
>  
> +  if (mt->needs_fake_etc) {
> + assert(mt->shadow_mt);
> + mt = mt->shadow_mt;
> +  }
> +
>if (plane > 0) {
>   if (mt->plane[plane - 1] == NULL)
>  return;
> @@ -512,7 +528,7 @@ static void brw_update_texture_surface(struct gl_context 
> *ctx,
>* is safe because texture views aren't allowed on depth/stencil.
>*/
>   mesa_fmt = mt->format;
> -  } else if (mt->etc_format != MESA_FORMAT_NONE) {
> +  } else if (intel_obj->mt->etc_format != MESA_FORMAT_NONE) {
>   mesa_fmt = mt->format;

For uniformity, let's access mt->shadow_mt->format here and move the
mt->needs_fake_etc check from above to below this condition:

} else if (devinfo->gen <= 7 && mt->format == MESA_FORMAT_S_UINT8) {


>} else if (plane > 0) {
>   mesa_fmt = mt->format;
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index 0e67e4d8f3..b24332ff67 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -689,6 +689,8 @@ miptree_create(struct brw_context *brw,
> if (devinfo->gen < 6 && _mesa_is_format_color_format(format))
>tiling_flags &= ~ISL_TILING_Y0_BIT;
>  
> +   bool fakes_etc_compression = devinfo->gen < 8 && 
> _mesa_is_format_etc2(format);
> +
> mesa_format mt_fmt;
> if (_mesa_is_format_color_format(format)) {
>mt_fmt = intel_lower_compressed_format(brw, format);

Why not reserve calling intel_lower_compressed_format() for the case in
which we're creating a 

Re: [Mesa-dev] [PATCH] ac/nir_to_llvm: fix interpolateAt* for arrays

2019-01-18 Thread Bas Nieuwenhuizen
On Sat, Jan 19, 2019 at 12:27 AM Bas Nieuwenhuizen
 wrote:

> On Sat, Jan 19, 2019 at 12:17 AM Timothy Arceri  wrote:
> >
> >
> >
> > On 19/1/19 9:36 am, Bas Nieuwenhuizen wrote:
> > > On Thu, Jan 10, 2019 at 6:59 AM Timothy Arceri  
> > > wrote:
> > >>
> > >> This builds on the recent interpolate fix by Rhys ee8488ea3b99.
> > >>
> > >> This doesn't handle arrays of structs but I've got a feeling those
> > >> might be broken even for radeonsi tgsi (we currently have no tests).
> > >>
> > >> This fixes the arb_gpu_shader5 interpolateAt* tests that contain
> > >> arrays.
> > >>
> > >> Fixes: ee8488ea3b99 ("ac/nir,radv,radeonsi/nir: use correct indices for 
> > >> interpolation intrinsics")
> > >> ---
> > >>   src/amd/common/ac_nir_to_llvm.c | 80 +
> > >>   1 file changed, 61 insertions(+), 19 deletions(-)
> > >>
> > >> diff --git a/src/amd/common/ac_nir_to_llvm.c 
> > >> b/src/amd/common/ac_nir_to_llvm.c
> > >> index 5023b96f92..00011a439d 100644
> > >> --- a/src/amd/common/ac_nir_to_llvm.c
> > >> +++ b/src/amd/common/ac_nir_to_llvm.c
> > >> @@ -2830,15 +2830,16 @@ static LLVMValueRef visit_interp(struct 
> > >> ac_nir_context *ctx,
> > >>   const nir_intrinsic_instr *instr)
> > >>   {
> > >>  LLVMValueRef result[4];
> > >> -   LLVMValueRef interp_param, attr_number;
> > >> +   LLVMValueRef interp_param;
> > >>  unsigned location;
> > >>  unsigned chan;
> > >>  LLVMValueRef src_c0 = NULL;
> > >>  LLVMValueRef src_c1 = NULL;
> > >>  LLVMValueRef src0 = NULL;
> > >>
> > >> -   nir_variable *var = 
> > >> nir_deref_instr_get_variable(nir_instr_as_deref(instr->src[0].ssa->parent_instr));
> > >> -   int input_index = 
> > >> ctx->abi->fs_input_attr_indices[var->data.location - VARYING_SLOT_VAR0];
> > >> +   nir_deref_instr *deref_instr = 
> > >> nir_instr_as_deref(instr->src[0].ssa->parent_instr);
> > >> +   nir_variable *var = nir_deref_instr_get_variable(deref_instr);
> > >> +   int input_base = 
> > >> ctx->abi->fs_input_attr_indices[var->data.location - VARYING_SLOT_VAR0];
> > >>  switch (instr->intrinsic) {
> > >>  case nir_intrinsic_interp_deref_at_centroid:
> > >>  location = INTERP_CENTROID;
> > >> @@ -2868,7 +2869,6 @@ static LLVMValueRef visit_interp(struct 
> > >> ac_nir_context *ctx,
> > >>  src_c1 = LLVMBuildFSub(ctx->ac.builder, src_c1, 
> > >> halfval, "");
> > >>  }
> > >>  interp_param = ctx->abi->lookup_interp_param(ctx->abi, 
> > >> var->data.interpolation, location);
> > >> -   attr_number = LLVMConstInt(ctx->ac.i32, input_index, false);
> > >>
> > >>  if (location == INTERP_CENTER) {
> > >>  LLVMValueRef ij_out[2];
> > >> @@ -2906,26 +2906,68 @@ static LLVMValueRef visit_interp(struct 
> > >> ac_nir_context *ctx,
> > >>
> > >>  }
> > >>
> > >> +   LLVMValueRef array_idx = ctx->ac.i32_0;
> > >> +   while(deref_instr->deref_type != nir_deref_type_var) {
> > >> +   if (deref_instr->deref_type == nir_deref_type_array) {
> > >> +   unsigned array_size = 
> > >> glsl_get_aoa_size(deref_instr->type);
> > >> +   if (!array_size)
> > >> +   array_size = 1;
> > >> +
> > >> +   LLVMValueRef offset;
> > >> +   nir_const_value *const_value = 
> > >> nir_src_as_const_value(deref_instr->arr.index);
> > >> +   if (const_value) {
> > >> +   offset = LLVMConstInt(ctx->ac.i32, 
> > >> array_size * const_value->u32[0], false);
> > >> +   } else {
> > >> +   LLVMValueRef indirect = get_src(ctx, 
> > >> deref_instr->arr.index);
> > >> +
> > >> +   offset = LLVMBuildMul(ctx->ac.builder, 
> > >> indirect,
> > >> + 
> > >> LLVMConstInt(ctx->ac.i32, array_size, false), "");
> > >> +   }
> > >> +
> > >> +   array_idx = LLVMBuildAdd(ctx->ac.builder, 
> > >> array_idx, offset, "");
> > >> +   deref_instr = 
> > >> nir_src_as_deref(deref_instr->parent);
> > >> +   } else if (deref_instr->deref_type == 
> > >> nir_deref_type_struct) {
> > >> +   /* TODO: Probably need to do more here to 
> > >> support arrays of structs etc */
> > >> +   deref_instr = 
> > >> nir_src_as_deref(deref_instr->parent);
> > >
> > > If we don't have confidence this works can we just have it go to the
> > > unreachable below. IIRC spirv->nir also lowered struct inputs so I'm
> > > not even sure we would encounter this.
> >
> > This will work for structs, just probably not for arrays of structs. We
> > do need struct handling for radeonsi so I'd rather leave this as is.

Actually, how does this work for 

Re: [Mesa-dev] [PATCH] ac/nir_to_llvm: fix interpolateAt* for arrays

2019-01-18 Thread Bas Nieuwenhuizen
Fair, r-b

On Sat, Jan 19, 2019 at 12:17 AM Timothy Arceri  wrote:
>
>
>
> On 19/1/19 9:36 am, Bas Nieuwenhuizen wrote:
> > On Thu, Jan 10, 2019 at 6:59 AM Timothy Arceri  
> > wrote:
> >>
> >> This builds on the recent interpolate fix by Rhys ee8488ea3b99.
> >>
> >> This doesn't handle arrays of structs but I've got a feeling those
> >> might be broken even for radeonsi tgsi (we currently have no tests).
> >>
> >> This fixes the arb_gpu_shader5 interpolateAt* tests that contain
> >> arrays.
> >>
> >> Fixes: ee8488ea3b99 ("ac/nir,radv,radeonsi/nir: use correct indices for 
> >> interpolation intrinsics")
> >> ---
> >>   src/amd/common/ac_nir_to_llvm.c | 80 +
> >>   1 file changed, 61 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/src/amd/common/ac_nir_to_llvm.c 
> >> b/src/amd/common/ac_nir_to_llvm.c
> >> index 5023b96f92..00011a439d 100644
> >> --- a/src/amd/common/ac_nir_to_llvm.c
> >> +++ b/src/amd/common/ac_nir_to_llvm.c
> >> @@ -2830,15 +2830,16 @@ static LLVMValueRef visit_interp(struct 
> >> ac_nir_context *ctx,
> >>   const nir_intrinsic_instr *instr)
> >>   {
> >>  LLVMValueRef result[4];
> >> -   LLVMValueRef interp_param, attr_number;
> >> +   LLVMValueRef interp_param;
> >>  unsigned location;
> >>  unsigned chan;
> >>  LLVMValueRef src_c0 = NULL;
> >>  LLVMValueRef src_c1 = NULL;
> >>  LLVMValueRef src0 = NULL;
> >>
> >> -   nir_variable *var = 
> >> nir_deref_instr_get_variable(nir_instr_as_deref(instr->src[0].ssa->parent_instr));
> >> -   int input_index = 
> >> ctx->abi->fs_input_attr_indices[var->data.location - VARYING_SLOT_VAR0];
> >> +   nir_deref_instr *deref_instr = 
> >> nir_instr_as_deref(instr->src[0].ssa->parent_instr);
> >> +   nir_variable *var = nir_deref_instr_get_variable(deref_instr);
> >> +   int input_base = 
> >> ctx->abi->fs_input_attr_indices[var->data.location - VARYING_SLOT_VAR0];
> >>  switch (instr->intrinsic) {
> >>  case nir_intrinsic_interp_deref_at_centroid:
> >>  location = INTERP_CENTROID;
> >> @@ -2868,7 +2869,6 @@ static LLVMValueRef visit_interp(struct 
> >> ac_nir_context *ctx,
> >>  src_c1 = LLVMBuildFSub(ctx->ac.builder, src_c1, halfval, 
> >> "");
> >>  }
> >>  interp_param = ctx->abi->lookup_interp_param(ctx->abi, 
> >> var->data.interpolation, location);
> >> -   attr_number = LLVMConstInt(ctx->ac.i32, input_index, false);
> >>
> >>  if (location == INTERP_CENTER) {
> >>  LLVMValueRef ij_out[2];
> >> @@ -2906,26 +2906,68 @@ static LLVMValueRef visit_interp(struct 
> >> ac_nir_context *ctx,
> >>
> >>  }
> >>
> >> +   LLVMValueRef array_idx = ctx->ac.i32_0;
> >> +   while(deref_instr->deref_type != nir_deref_type_var) {
> >> +   if (deref_instr->deref_type == nir_deref_type_array) {
> >> +   unsigned array_size = 
> >> glsl_get_aoa_size(deref_instr->type);
> >> +   if (!array_size)
> >> +   array_size = 1;
> >> +
> >> +   LLVMValueRef offset;
> >> +   nir_const_value *const_value = 
> >> nir_src_as_const_value(deref_instr->arr.index);
> >> +   if (const_value) {
> >> +   offset = LLVMConstInt(ctx->ac.i32, 
> >> array_size * const_value->u32[0], false);
> >> +   } else {
> >> +   LLVMValueRef indirect = get_src(ctx, 
> >> deref_instr->arr.index);
> >> +
> >> +   offset = LLVMBuildMul(ctx->ac.builder, 
> >> indirect,
> >> + 
> >> LLVMConstInt(ctx->ac.i32, array_size, false), "");
> >> +   }
> >> +
> >> +   array_idx = LLVMBuildAdd(ctx->ac.builder, 
> >> array_idx, offset, "");
> >> +   deref_instr = 
> >> nir_src_as_deref(deref_instr->parent);
> >> +   } else if (deref_instr->deref_type == 
> >> nir_deref_type_struct) {
> >> +   /* TODO: Probably need to do more here to support 
> >> arrays of structs etc */
> >> +   deref_instr = 
> >> nir_src_as_deref(deref_instr->parent);
> >
> > If we don't have confidence this works can we just have it go to the
> > unreachable below. IIRC spirv->nir also lowered struct inputs so I'm
> > not even sure we would encounter this.
>
> This will work for structs, just probably not for arrays of structs. We
> do need struct handling for radeonsi so I'd rather leave this as is.
>
> >
> > Otherwise,
> >
> > Reviewed-by: Bas Nieuwenhuizen 
> >
> >> +   } else {
> >> +   unreachable("Unsupported deref type");
> >> +   }
> >> +
> >> +   }
> >> +
> >> +   unsigned input_array_size = glsl_get_aoa_size(var->type);
> >> +   if 

Re: [Mesa-dev] [PATCH] ac/nir_to_llvm: fix interpolateAt* for arrays

2019-01-18 Thread Timothy Arceri



On 19/1/19 9:36 am, Bas Nieuwenhuizen wrote:

On Thu, Jan 10, 2019 at 6:59 AM Timothy Arceri  wrote:


This builds on the recent interpolate fix by Rhys ee8488ea3b99.

This doesn't handle arrays of structs but I've got a feeling those
might be broken even for radeonsi tgsi (we currently have no tests).

This fixes the arb_gpu_shader5 interpolateAt* tests that contain
arrays.

Fixes: ee8488ea3b99 ("ac/nir,radv,radeonsi/nir: use correct indices for 
interpolation intrinsics")
---
  src/amd/common/ac_nir_to_llvm.c | 80 +
  1 file changed, 61 insertions(+), 19 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 5023b96f92..00011a439d 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -2830,15 +2830,16 @@ static LLVMValueRef visit_interp(struct ac_nir_context 
*ctx,
  const nir_intrinsic_instr *instr)
  {
 LLVMValueRef result[4];
-   LLVMValueRef interp_param, attr_number;
+   LLVMValueRef interp_param;
 unsigned location;
 unsigned chan;
 LLVMValueRef src_c0 = NULL;
 LLVMValueRef src_c1 = NULL;
 LLVMValueRef src0 = NULL;

-   nir_variable *var = 
nir_deref_instr_get_variable(nir_instr_as_deref(instr->src[0].ssa->parent_instr));
-   int input_index = ctx->abi->fs_input_attr_indices[var->data.location - 
VARYING_SLOT_VAR0];
+   nir_deref_instr *deref_instr = 
nir_instr_as_deref(instr->src[0].ssa->parent_instr);
+   nir_variable *var = nir_deref_instr_get_variable(deref_instr);
+   int input_base = ctx->abi->fs_input_attr_indices[var->data.location - 
VARYING_SLOT_VAR0];
 switch (instr->intrinsic) {
 case nir_intrinsic_interp_deref_at_centroid:
 location = INTERP_CENTROID;
@@ -2868,7 +2869,6 @@ static LLVMValueRef visit_interp(struct ac_nir_context 
*ctx,
 src_c1 = LLVMBuildFSub(ctx->ac.builder, src_c1, halfval, "");
 }
 interp_param = ctx->abi->lookup_interp_param(ctx->abi, 
var->data.interpolation, location);
-   attr_number = LLVMConstInt(ctx->ac.i32, input_index, false);

 if (location == INTERP_CENTER) {
 LLVMValueRef ij_out[2];
@@ -2906,26 +2906,68 @@ static LLVMValueRef visit_interp(struct ac_nir_context 
*ctx,

 }

+   LLVMValueRef array_idx = ctx->ac.i32_0;
+   while(deref_instr->deref_type != nir_deref_type_var) {
+   if (deref_instr->deref_type == nir_deref_type_array) {
+   unsigned array_size = 
glsl_get_aoa_size(deref_instr->type);
+   if (!array_size)
+   array_size = 1;
+
+   LLVMValueRef offset;
+   nir_const_value *const_value = 
nir_src_as_const_value(deref_instr->arr.index);
+   if (const_value) {
+   offset = LLVMConstInt(ctx->ac.i32, array_size * 
const_value->u32[0], false);
+   } else {
+   LLVMValueRef indirect = get_src(ctx, 
deref_instr->arr.index);
+
+   offset = LLVMBuildMul(ctx->ac.builder, indirect,
+ LLVMConstInt(ctx->ac.i32, 
array_size, false), "");
+   }
+
+   array_idx = LLVMBuildAdd(ctx->ac.builder, array_idx, offset, 
"");
+   deref_instr = nir_src_as_deref(deref_instr->parent);
+   } else if (deref_instr->deref_type == nir_deref_type_struct) {
+   /* TODO: Probably need to do more here to support 
arrays of structs etc */
+   deref_instr = nir_src_as_deref(deref_instr->parent);


If we don't have confidence this works can we just have it go to the
unreachable below. IIRC spirv->nir also lowered struct inputs so I'm
not even sure we would encounter this.


This will work for structs, just probably not for arrays of structs. We 
do need struct handling for radeonsi so I'd rather leave this as is.




Otherwise,

Reviewed-by: Bas Nieuwenhuizen 


+   } else {
+   unreachable("Unsupported deref type");
+   }
+
+   }
+
+   unsigned input_array_size = glsl_get_aoa_size(var->type);
+   if (!input_array_size)
+   input_array_size = 1;
+
 for (chan = 0; chan < 4; chan++) {
+   LLVMValueRef gather = LLVMGetUndef(LLVMVectorType(ctx->ac.f32, 
input_array_size));
 LLVMValueRef llvm_chan = LLVMConstInt(ctx->ac.i32, chan, 
false);

-   if (interp_param) {
-   interp_param = LLVMBuildBitCast(ctx->ac.builder,
+   for (unsigned idx = 0; idx < input_array_size; ++idx) {
+   LLVMValueRef v, attr_number;
+
+   attr_number = LLVMConstInt(ctx->ac.i32, input_base + 
idx, false);
+   

Re: [Mesa-dev] [PATCH 7/8] gallium/util: add a linear allocator for reducing malloc overhead

2019-01-18 Thread Bas Nieuwenhuizen
On Fri, Jan 18, 2019 at 5:44 PM Marek Olšák  wrote:
>
> From: Marek Olšák 
>
> ---
>  src/gallium/auxiliary/Makefile.sources  |  1 +
>  src/gallium/auxiliary/meson.build   |  1 +
>  src/gallium/auxiliary/util/u_cpu_suballoc.h | 90 +
>  3 files changed, 92 insertions(+)
>  create mode 100644 src/gallium/auxiliary/util/u_cpu_suballoc.h
>
> diff --git a/src/gallium/auxiliary/Makefile.sources 
> b/src/gallium/auxiliary/Makefile.sources
> index 50e88088ff8..b26415858f6 100644
> --- a/src/gallium/auxiliary/Makefile.sources
> +++ b/src/gallium/auxiliary/Makefile.sources
> @@ -211,20 +211,21 @@ C_SOURCES := \
> util/u_bitmask.c \
> util/u_bitmask.h \
> util/u_blend.h \
> util/u_blit.c \
> util/u_blit.h \
> util/u_blitter.c \
> util/u_blitter.h \
> util/u_box.h \
> util/u_cache.c \
> util/u_cache.h \
> +   util/u_cpu_suballoc.h \
> util/u_debug_gallium.h \
> util/u_debug_gallium.c \
> util/u_debug_describe.c \
> util/u_debug_describe.h \
> util/u_debug_flush.c \
> util/u_debug_flush.h \
> util/u_debug_image.c \
> util/u_debug_image.h \
> util/u_debug_memory.c \
> util/u_debug_refcnt.c \
> diff --git a/src/gallium/auxiliary/meson.build 
> b/src/gallium/auxiliary/meson.build
> index 57f7e69050f..7e1e4732421 100644
> --- a/src/gallium/auxiliary/meson.build
> +++ b/src/gallium/auxiliary/meson.build
> @@ -231,20 +231,21 @@ files_libgallium = files(
>'util/u_bitmask.c',
>'util/u_bitmask.h',
>'util/u_blend.h',
>'util/u_blit.c',
>'util/u_blit.h',
>'util/u_blitter.c',
>'util/u_blitter.h',
>'util/u_box.h',
>'util/u_cache.c',
>'util/u_cache.h',
> +  'util/u_cpu_suballoc.h',
>'util/u_debug_gallium.h',
>'util/u_debug_gallium.c',
>'util/u_debug_describe.c',
>'util/u_debug_describe.h',
>'util/u_debug_flush.c',
>'util/u_debug_flush.h',
>'util/u_debug_image.c',
>'util/u_debug_image.h',
>'util/u_debug_memory.c',
>'util/u_debug_refcnt.c',
> diff --git a/src/gallium/auxiliary/util/u_cpu_suballoc.h 
> b/src/gallium/auxiliary/util/u_cpu_suballoc.h
> new file mode 100644
> index 000..2373c1f7c70
> --- /dev/null
> +++ b/src/gallium/auxiliary/util/u_cpu_suballoc.h
> @@ -0,0 +1,90 @@
> +/**
> + *
> + * Copyright 2019 Advanced Micro Devices, Inc.
> + * All Rights Reserved.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the
> + * "Software"), to deal in the Software without restriction, including
> + * without limitation the rights to use, copy, modify, merge, publish,
> + * distribute, sub license, and/or sell copies of the Software, and to
> + * permit persons to whom the Software is furnished to do so, subject to
> + * the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the
> + * next paragraph) shall be included in all copies or substantial portions
> + * of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
> + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
> + * IN NO EVENT SHALL AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR
> + * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
> + *
> + **/
> +
> +/* A simple utility for suballocating out of malloc_aligned. */
> +
> +#ifndef U_CPU_SUBALLOC_H
> +#define U_CPU_SUBALLOC_H
> +
> +#include 
> +#include "util/os_memory.h"
> +
> +struct u_cpu_suballoc {
> +   unsigned default_size;  /* Default size of the buffer, in bytes. */
> +   unsigned current_size;  /* Current size of the buffer, in bytes. */
> +   unsigned alignment; /* malloc alignment. */
> +   unsigned offset;/* Offset pointing to the first unused byte. */
> +   uint8_t *buffer;/* Pointer to the CPU buffer. */
> +};
> +
> +
> +static inline void
> +u_cpu_suballoc_init(struct u_cpu_suballoc *alloc, unsigned default_size,
> +   unsigned alignment)
> +{
> +   memset(alloc, 0, sizeof(*alloc));
> +   alloc->default_size = default_size;
> +   alloc->alignment = alignment;
> +}
> +
> +
> +static inline void
> +u_cpu_suballoc_deinit(struct u_cpu_suballoc *alloc)
> +{
> +   os_free_aligned(alloc->buffer);
> +   alloc->buffer = NULL;
> +}
> +
> +
> +static inline void *
> +u_cpu_suballoc(struct u_cpu_suballoc *alloc, unsigned size, unsigned 
> alignment)
> +{
> +   unsigned offset = align(alloc->offset, alignment);
> +
> +   /* Make sure we have 
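
The quoted patch is cut off at this point in the archive, so as a generic
illustration only, here is a minimal sketch of what a linear (bump)
allocator looks like. It is not the u_cpu_suballoc implementation: the
names bump_arena, bump_init, bump_alloc and bump_deinit are hypothetical, it
uses plain malloc instead of os_malloc_aligned, and it simply returns NULL
when the buffer is full, which is an assumption and may differ from what
the patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size arena; allocations are carved out in order. */
struct bump_arena {
   uint8_t *buffer;   /* Backing storage. */
   size_t size;       /* Size of the backing storage, in bytes. */
   size_t offset;     /* Offset of the first unused byte. */
};

static int
bump_init(struct bump_arena *a, size_t size)
{
   a->buffer = malloc(size);
   a->size = a->buffer ? size : 0;
   a->offset = 0;
   return a->buffer != NULL;
}

/* alignment must be a power of two. Offsets are aligned relative to the
 * start of the buffer, so the returned pointer is only as aligned as
 * malloc's own guarantee allows. */
static void *
bump_alloc(struct bump_arena *a, size_t size, size_t alignment)
{
   size_t start = (a->offset + alignment - 1) & ~(alignment - 1);

   if (start > a->size || size > a->size - start)
      return NULL;   /* Assumption: no growth, just fail. */

   a->offset = start + size;
   return a->buffer + start;
}

static void
bump_deinit(struct bump_arena *a)
{
   free(a->buffer);
   a->buffer = NULL;
   a->size = 0;
   a->offset = 0;
}
```

Individual allocations are never freed; everything is released at once in
bump_deinit, which is where the savings over per-object malloc come from.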

Re: [Mesa-dev] [PATCH 6/8] radeonsi: use WRITE_DATA for small glBufferSubData sizes

2019-01-18 Thread Bas Nieuwenhuizen
On Fri, Jan 18, 2019 at 5:44 PM Marek Olšák  wrote:
>
> From: Marek Olšák 
>
> ---
>  src/gallium/drivers/radeonsi/si_buffer.c | 27 
>  src/gallium/drivers/radeonsi/si_pipe.h   |  1 +
>  2 files changed, 28 insertions(+)
>
> diff --git a/src/gallium/drivers/radeonsi/si_buffer.c 
> b/src/gallium/drivers/radeonsi/si_buffer.c
> index 4766cf4bdfa..a1e421b8b0d 100644
> --- a/src/gallium/drivers/radeonsi/si_buffer.c
> +++ b/src/gallium/drivers/radeonsi/si_buffer.c
> @@ -16,20 +16,22 @@
>   * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>   * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>   * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
>   * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
>   * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>   * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
>   * USE OR OTHER DEALINGS IN THE SOFTWARE.
>   */
>
>  #include "radeonsi/si_pipe.h"
> +#include "sid.h"
> +
>  #include "util/u_memory.h"
>  #include "util/u_upload_mgr.h"
>  #include "util/u_transfer.h"
>  #include 
>  #include 
>
>  bool si_rings_is_buffer_referenced(struct si_context *sctx,
>struct pb_buffer *buf,
>enum radeon_bo_usage usage)
>  {
> @@ -506,20 +508,38 @@ static void *si_buffer_transfer_map(struct pipe_context 
> *ctx,
> data = si_buffer_map_sync_with_rings(sctx, rbuffer, usage);
> if (!data) {
> return NULL;
> }
> data += box->x;
>
> return si_buffer_get_transfer(ctx, resource, usage, box,
> ptransfer, data, NULL, 0);
>  }
>
> +static void si_buffer_write_data(struct si_context *sctx, struct 
> r600_resource *buf,
> +unsigned offset, unsigned size, const void 
> *data)
> +{
> +   struct radeon_cmdbuf *cs = sctx->gfx_cs;
> +
> +   si_need_gfx_cs_space(sctx);
> +
> +   sctx->flags |= SI_CONTEXT_PS_PARTIAL_FLUSH |
> +  SI_CONTEXT_CS_PARTIAL_FLUSH |
> +  si_get_flush_flags(sctx, SI_COHERENCY_SHADER, L2_LRU);
> +   si_emit_cache_flush(sctx);

Maybe only do the cache flush if the buffer is referenced by the
current cmd buffer?

(I'm kinda surprised reading this that we don't do
DISCARD_WHOLE_RESOURCE if the offset is 0 and the size equal to the
buffer size. )
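
To make the two suggestions above concrete, here is a minimal standalone
sketch of the decision they imply. It is not driver code: the enum, the
function name, the `referenced_by_current_cs` flag and the threshold value
are all hypothetical, and the ordering of the checks is only one possible
reading of the comments. The alignment test is copied from the quoted
si_buffer_subdata() hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value: the real SI_TRANSFER_WRITE_DATA_THRESHOLD lives in
 * si_pipe.h and is not visible in the quoted hunk. */
#define WRITE_DATA_THRESHOLD 64

/* Hypothetical names for the possible upload strategies. */
enum upload_path {
   UPLOAD_WRITE_DATA_NO_FLUSH, /* small write, buffer not used by this CS */
   UPLOAD_WRITE_DATA_FLUSH,    /* small write, buffer in use: flush first */
   UPLOAD_DISCARD_WHOLE,       /* full overwrite: storage could be replaced */
   UPLOAD_MAP_AND_MEMCPY       /* everything else: existing transfer path */
};

static enum upload_path
choose_upload_path(unsigned buf_size, unsigned offset, unsigned size,
                   const void *data, bool referenced_by_current_cs)
{
   /* A write that covers the whole buffer could discard the old contents
    * (the DISCARD_WHOLE_RESOURCE idea). */
   if (offset == 0 && size == buf_size)
      return UPLOAD_DISCARD_WHOLE;

   /* Same eligibility test as the patch: small and dword-aligned. */
   if (size <= WRITE_DATA_THRESHOLD &&
       offset % 4 == 0 && size % 4 == 0 && (uintptr_t)data % 4 == 0) {
      /* Only pay for the cache flush when the current command buffer
       * actually references the destination. */
      return referenced_by_current_cs ? UPLOAD_WRITE_DATA_FLUSH
                                      : UPLOAD_WRITE_DATA_NO_FLUSH;
   }

   return UPLOAD_MAP_AND_MEMCPY;
}
```

Whether skipping the flush is actually safe depends on cache state that
this model does not capture, so treat it as a way to discuss the idea
rather than a proposed implementation.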

> +
> +   si_cp_write_data(sctx, buf, offset, size, V_370_TC_L2, V_370_ME, 
> data);
> +
> +   radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0));
> +   radeon_emit(cs, 0);
> +}
> +
>  static void si_buffer_do_flush_region(struct pipe_context *ctx,
>   struct pipe_transfer *transfer,
>   const struct pipe_box *box)
>  {
> struct si_transfer *stransfer = (struct si_transfer*)transfer;
> struct r600_resource *rbuffer = r600_resource(transfer->resource);
>
> if (stransfer->u.staging) {
> /* Copy the staging buffer into the original one. */
> si_copy_buffer((struct si_context*)ctx, transfer->resource,
> @@ -568,20 +588,27 @@ static void si_buffer_transfer_unmap(struct 
> pipe_context *ctx,
>
>  static void si_buffer_subdata(struct pipe_context *ctx,
>   struct pipe_resource *buffer,
>   unsigned usage, unsigned offset,
>   unsigned size, const void *data)
>  {
> struct pipe_transfer *transfer = NULL;
> struct pipe_box box;
> uint8_t *map = NULL;
>
> +   if (size <= SI_TRANSFER_WRITE_DATA_THRESHOLD &&
> +   offset % 4 == 0 && size % 4 == 0 && (uintptr_t)data % 4 == 0) {
> +   si_buffer_write_data((struct si_context*)ctx,
> +r600_resource(buffer), offset, size, 
> data);
> +   return;
> +   }
> +
> u_box_1d(offset, size, &box);
> map = si_buffer_transfer_map(ctx, buffer, 0,
>PIPE_TRANSFER_WRITE |
>PIPE_TRANSFER_DISCARD_RANGE |
>usage,
>&box, &transfer);
> if (!map)
> return;
>
> memcpy(map, data, size);
> diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
> b/src/gallium/drivers/radeonsi/si_pipe.h
> index 5bd3d9641d2..f79828f3438 100644
> --- a/src/gallium/drivers/radeonsi/si_pipe.h
> +++ b/src/gallium/drivers/radeonsi/si_pipe.h
> @@ -94,20 +94,21 @@
>  #define SI_PREFETCH_ES (1 << 3)
>  #define SI_PREFETCH_GS (1 << 4)
>  #define SI_PREFETCH_VS (1 << 5)
>  #define SI_PREFETCH_PS (1 << 6)
>
>  #define SI_MAX_BORDER_COLORS   4096
>  #define SI_MAX_VIEWPORTS

Re: [Mesa-dev] [PATCH 2/8] i965: r8stencil_mt/needs_update renamed to shadow_mt/needs_update

2019-01-18 Thread Nanley Chery
On Mon, Nov 19, 2018 at 10:54:06AM +0200, Eleni Maria Stea wrote:
> Renamed the r8stencil_mt and r8stencil_needs_update to shadow_mt and
> shadow_needs_update respectively to allow reusing the shadow_mt as a
> generic purpose secondary mipmap tree.

The series I pointed you to earlier has a patch like this, but it's more
complete. It also modifies the comment above the data structure being
modified. Do you want to review it?

https://patchwork.freedesktop.org/patch/253197/

I think what people usually do in this case is send out their series
with the other person's patch included (and their rb tacked onto it).

-Nanley

> ---
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c |  8 
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c| 16 
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h|  4 ++--
>  3 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
> b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> index 8d21cf5fa7..e214fae140 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> @@ -563,15 +563,15 @@ static void brw_update_texture_surface(struct 
> gl_context *ctx,
>  
>if (obj->StencilSampling && firstImage->_BaseFormat == 
> GL_DEPTH_STENCIL) {
>   if (devinfo->gen <= 7) {
> -assert(mt->r8stencil_mt && 
> !mt->stencil_mt->r8stencil_needs_update);
> -mt = mt->r8stencil_mt;
> +assert(mt->shadow_mt && !mt->stencil_mt->shadow_needs_update);
> +mt = mt->shadow_mt;
>   } else {
>  mt = mt->stencil_mt;
>   }
>   format = ISL_FORMAT_R8_UINT;
>} else if (devinfo->gen <= 7 && mt->format == MESA_FORMAT_S_UINT8) {
> - assert(mt->r8stencil_mt && !mt->r8stencil_needs_update);
> - mt = mt->r8stencil_mt;
> + assert(mt->shadow_mt && !mt->shadow_needs_update);
> + mt = mt->shadow_mt;
>   format = ISL_FORMAT_R8_UINT;
>}
>  
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index 5e11ec0c30..0e67e4d8f3 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -1216,7 +1216,7 @@ intel_miptree_release(struct intel_mipmap_tree **mt)
>  
>brw_bo_unreference((*mt)->bo);
>intel_miptree_release(&(*mt)->stencil_mt);
> -  intel_miptree_release(&(*mt)->r8stencil_mt);
> +  intel_miptree_release(&(*mt)->shadow_mt);
>intel_miptree_aux_buffer_free((*mt)->aux_buf);
>free_aux_state_map((*mt)->aux_state);
>  
> @@ -2429,7 +2429,7 @@ intel_miptree_finish_write(struct brw_context *brw,
> switch (mt->aux_usage) {
> case ISL_AUX_USAGE_NONE:
>if (mt->format == MESA_FORMAT_S_UINT8 && devinfo->gen <= 7)
> - mt->r8stencil_needs_update = true;
> + mt->shadow_needs_update = true;
>break;
>  
> case ISL_AUX_USAGE_MCS:
> @@ -2935,9 +2935,9 @@ intel_update_r8stencil(struct brw_context *brw,
>  
> assert(src->surf.size_B > 0);
>  
> -   if (!mt->r8stencil_mt) {
> +   if (!mt->shadow_mt) {
>assert(devinfo->gen > 6); /* Handle MIPTREE_LAYOUT_GEN6_HIZ_STENCIL */
> -  mt->r8stencil_mt = make_surface(
> +  mt->shadow_mt = make_surface(
>  brw,
>  src->target,
>  MESA_FORMAT_R_UINT8,
> @@ -2951,13 +2951,13 @@ intel_update_r8stencil(struct brw_context *brw,
>  ISL_TILING_Y0_BIT,
>  ISL_SURF_USAGE_TEXTURE_BIT,
>  BO_ALLOC_BUSY, 0, NULL);
> -  assert(mt->r8stencil_mt);
> +  assert(mt->shadow_mt);
> }
>  
> -   if (src->r8stencil_needs_update == false)
> +   if (src->shadow_needs_update == false)
>return;
>  
> -   struct intel_mipmap_tree *dst = mt->r8stencil_mt;
> +   struct intel_mipmap_tree *dst = mt->shadow_mt;
>  
> for (int level = src->first_level; level <= src->last_level; level++) {
>const unsigned depth = src->surf.dim == ISL_SURF_DIM_3D ?
> @@ -2977,7 +2977,7 @@ intel_update_r8stencil(struct brw_context *brw,
> }
>  
> brw_cache_flush_for_read(brw, dst->bo);
> -   src->r8stencil_needs_update = false;
> +   src->shadow_needs_update = false;
>  }
>  
>  static void *
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> index b0333655ad..b955a2bab1 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> @@ -302,8 +302,8 @@ struct intel_mipmap_tree
>  *
>  * \see intel_update_r8stencil()
>  */
> -   struct intel_mipmap_tree *r8stencil_mt;
> -   bool r8stencil_needs_update;
> +   struct intel_mipmap_tree *shadow_mt;
> +   bool shadow_needs_update;
>  
> /**
>  * 

Re: [Mesa-dev] [PATCH 1/8] i965: Removed assertions from intel_miptree_map_etc

2019-01-18 Thread Nanley Chery
On Mon, Nov 19, 2018 at 10:54:05AM +0200, Eleni Maria Stea wrote:
> The assertions that the GL_MAP_WRITE_BIT and GL_MAP_INVALIDATE_RANGE_BIT
> in intel_miptree_map_etc should be removed since they will fail when the
  ^
  missing "bits are set"?

> ETC miptree is mapped for reading.
> 

The assertion is still valid at this point. Reading will give you
incorrect results. You'll want to do this later on in the series though.

> Fixes: KHR-GL45.direct_state_access.textures_compressed_subimage crash
   ^
   Should probably remove the colon so that you're not using the
   Fixes tag. I think that's reserved for fixing bugs in 
   commits. See the git log for more info.

-Nanley

> on Gen 7 GPUs.
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index 8e50aabb3b..5e11ec0c30 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -3444,9 +3444,6 @@ intel_miptree_map_etc(struct brw_context *brw,
>assert(mt->format == MESA_FORMAT_R8G8B8X8_UNORM);
> }
>  
> -   assert(map->mode & GL_MAP_WRITE_BIT);
> -   assert(map->mode & GL_MAP_INVALIDATE_RANGE_BIT);
> -
> intel_miptree_access_raw(brw, mt, level, slice, true);
>  
> map->stride = _mesa_format_row_stride(mt->etc_format, map->w);
> -- 
> 2.19.0
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] ac/nir_to_llvm: fix interpolateAt* for arrays

2019-01-18 Thread Bas Nieuwenhuizen
On Thu, Jan 10, 2019 at 6:59 AM Timothy Arceri  wrote:
>
> This builds on the recent interpolate fix by Rhys ee8488ea3b99.
>
> This doesn't handle arrays of structs but I've got a feeling those
> might be broken even for radeonsi tgsi (we currently have no tests).
>
> This fixes the arb_gpu_shader5 interpolateAt* tests that contain
> arrays.
>
> Fixes: ee8488ea3b99 ("ac/nir,radv,radeonsi/nir: use correct indices for 
> interpolation intrinsics")
> ---
>  src/amd/common/ac_nir_to_llvm.c | 80 +
>  1 file changed, 61 insertions(+), 19 deletions(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index 5023b96f92..00011a439d 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -2830,15 +2830,16 @@ static LLVMValueRef visit_interp(struct 
> ac_nir_context *ctx,
>  const nir_intrinsic_instr *instr)
>  {
> LLVMValueRef result[4];
> -   LLVMValueRef interp_param, attr_number;
> +   LLVMValueRef interp_param;
> unsigned location;
> unsigned chan;
> LLVMValueRef src_c0 = NULL;
> LLVMValueRef src_c1 = NULL;
> LLVMValueRef src0 = NULL;
>
> -   nir_variable *var = 
> nir_deref_instr_get_variable(nir_instr_as_deref(instr->src[0].ssa->parent_instr));
> -   int input_index = ctx->abi->fs_input_attr_indices[var->data.location 
> - VARYING_SLOT_VAR0];
> +   nir_deref_instr *deref_instr = 
> nir_instr_as_deref(instr->src[0].ssa->parent_instr);
> +   nir_variable *var = nir_deref_instr_get_variable(deref_instr);
> +   int input_base = ctx->abi->fs_input_attr_indices[var->data.location - 
> VARYING_SLOT_VAR0];
> switch (instr->intrinsic) {
> case nir_intrinsic_interp_deref_at_centroid:
> location = INTERP_CENTROID;
> @@ -2868,7 +2869,6 @@ static LLVMValueRef visit_interp(struct ac_nir_context 
> *ctx,
> src_c1 = LLVMBuildFSub(ctx->ac.builder, src_c1, halfval, "");
> }
> interp_param = ctx->abi->lookup_interp_param(ctx->abi, 
> var->data.interpolation, location);
> -   attr_number = LLVMConstInt(ctx->ac.i32, input_index, false);
>
> if (location == INTERP_CENTER) {
> LLVMValueRef ij_out[2];
> @@ -2906,26 +2906,68 @@ static LLVMValueRef visit_interp(struct 
> ac_nir_context *ctx,
>
> }
>
> +   LLVMValueRef array_idx = ctx->ac.i32_0;
> +   while(deref_instr->deref_type != nir_deref_type_var) {
> +   if (deref_instr->deref_type == nir_deref_type_array) {
> +   unsigned array_size = 
> glsl_get_aoa_size(deref_instr->type);
> +   if (!array_size)
> +   array_size = 1;
> +
> +   LLVMValueRef offset;
> +   nir_const_value *const_value = 
> nir_src_as_const_value(deref_instr->arr.index);
> +   if (const_value) {
> +   offset = LLVMConstInt(ctx->ac.i32, array_size 
> * const_value->u32[0], false);
> +   } else {
> +   LLVMValueRef indirect = get_src(ctx, 
> deref_instr->arr.index);
> +
> +   offset = LLVMBuildMul(ctx->ac.builder, 
> indirect,
> + 
> LLVMConstInt(ctx->ac.i32, array_size, false), "");
> +   }
> +
> +   array_idx = LLVMBuildAdd(ctx->ac.builder, array_idx, 
> offset, "");
> +   deref_instr = nir_src_as_deref(deref_instr->parent);
> +   } else if (deref_instr->deref_type == nir_deref_type_struct) {
> +   /* TODO: Probably need to do more here to support 
> arrays of structs etc */
> +   deref_instr = nir_src_as_deref(deref_instr->parent);

If we don't have confidence this works can we just have it go to the
unreachable below. IIRC spirv->nir also lowered struct inputs so I'm
not even sure we would encounter this.

Otherwise,

Reviewed-by: Bas Nieuwenhuizen 
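
For reference, the struct-deref suggestion above could look roughly like
the following standalone model of the deref walk. The types and names here
are made up for illustration (they are not the NIR API), and the function
returns -1 where the real code would hit unreachable(). The index math
mirrors the quoted loop: each array deref contributes its index times the
flattened size of the type it yields, with 0 treated as 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for nir_deref_type_var/array/struct. */
enum deref_type { DEREF_VAR, DEREF_ARRAY, DEREF_STRUCT };

struct deref {
   enum deref_type type;
   const struct deref *parent; /* NULL for DEREF_VAR */
   unsigned index;             /* array index, DEREF_ARRAY only */
   unsigned aoa_size;          /* flattened array size of the type this
                                * deref yields; 0 for non-array types */
};

/* Walks from the innermost deref up to the variable and returns the
 * flattened array index, or -1 for any deref type that is not known to
 * work (struct derefs included, as suggested in the review). */
static int flatten_array_index(const struct deref *d)
{
   unsigned idx = 0;

   while (d->type != DEREF_VAR) {
      if (d->type != DEREF_ARRAY)
         return -1; /* the driver would use unreachable() here */

      unsigned stride = d->aoa_size ? d->aoa_size : 1;
      idx += stride * d->index;
      d = d->parent;
   }

   return (int)idx;
}
```

For a `float a[3][4]` input, `a[2][3]` flattens to 2 * 4 + 3 = 11 with
this walk.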

> +   } else {
> +   unreachable("Unsupported deref type");
> +   }
> +
> +   }
> +
> +   unsigned input_array_size = glsl_get_aoa_size(var->type);
> +   if (!input_array_size)
> +   input_array_size = 1;
> +
> for (chan = 0; chan < 4; chan++) {
> +   LLVMValueRef gather = 
> LLVMGetUndef(LLVMVectorType(ctx->ac.f32, input_array_size));
> LLVMValueRef llvm_chan = LLVMConstInt(ctx->ac.i32, chan, 
> false);
>
> -   if (interp_param) {
> -   interp_param = LLVMBuildBitCast(ctx->ac.builder,
> +   for (unsigned idx = 0; idx < input_array_size; ++idx) {
> +   LLVMValueRef v, attr_number;
> +
> +   attr_number = LLVMConstInt(ctx->ac.i32, input_base + 
> idx, false);
> +

Re: [Mesa-dev] [PATCH] ac/nir_to_llvm: fix interpolateAt* for arrays

2019-01-18 Thread Timothy Arceri

Ping!

On 10/1/19 4:59 pm, Timothy Arceri wrote:

This builds on the recent interpolate fix by Rhys ee8488ea3b99.

This doesn't handle arrays of structs but I've got a feeling those
might be broken even for radeonsi tgsi (we currently have no tests).

This fixes the arb_gpu_shader5 interpolateAt* tests that contain
arrays.

Fixes: ee8488ea3b99 ("ac/nir,radv,radeonsi/nir: use correct indices for 
interpolation intrinsics")
---
  src/amd/common/ac_nir_to_llvm.c | 80 +
  1 file changed, 61 insertions(+), 19 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 5023b96f92..00011a439d 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -2830,15 +2830,16 @@ static LLVMValueRef visit_interp(struct ac_nir_context 
*ctx,
 const nir_intrinsic_instr *instr)
  {
LLVMValueRef result[4];
-   LLVMValueRef interp_param, attr_number;
+   LLVMValueRef interp_param;
unsigned location;
unsigned chan;
LLVMValueRef src_c0 = NULL;
LLVMValueRef src_c1 = NULL;
LLVMValueRef src0 = NULL;
  
-	nir_variable *var = nir_deref_instr_get_variable(nir_instr_as_deref(instr->src[0].ssa->parent_instr));

-   int input_index = ctx->abi->fs_input_attr_indices[var->data.location - 
VARYING_SLOT_VAR0];
+   nir_deref_instr *deref_instr = 
nir_instr_as_deref(instr->src[0].ssa->parent_instr);
+   nir_variable *var = nir_deref_instr_get_variable(deref_instr);
+   int input_base = ctx->abi->fs_input_attr_indices[var->data.location - 
VARYING_SLOT_VAR0];
switch (instr->intrinsic) {
case nir_intrinsic_interp_deref_at_centroid:
location = INTERP_CENTROID;
@@ -2868,7 +2869,6 @@ static LLVMValueRef visit_interp(struct ac_nir_context 
*ctx,
src_c1 = LLVMBuildFSub(ctx->ac.builder, src_c1, halfval, "");
}
interp_param = ctx->abi->lookup_interp_param(ctx->abi, 
var->data.interpolation, location);
-   attr_number = LLVMConstInt(ctx->ac.i32, input_index, false);
  
  	if (location == INTERP_CENTER) {

LLVMValueRef ij_out[2];
@@ -2906,26 +2906,68 @@ static LLVMValueRef visit_interp(struct ac_nir_context 
*ctx,
  
  	}
  
+	LLVMValueRef array_idx = ctx->ac.i32_0;

+   while(deref_instr->deref_type != nir_deref_type_var) {
+   if (deref_instr->deref_type == nir_deref_type_array) {
+   unsigned array_size = 
glsl_get_aoa_size(deref_instr->type);
+   if (!array_size)
+   array_size = 1;
+
+   LLVMValueRef offset;
+   nir_const_value *const_value = 
nir_src_as_const_value(deref_instr->arr.index);
+   if (const_value) {
+   offset = LLVMConstInt(ctx->ac.i32, array_size * 
const_value->u32[0], false);
+   } else {
+   LLVMValueRef indirect = get_src(ctx, 
deref_instr->arr.index);
+
+   offset = LLVMBuildMul(ctx->ac.builder, indirect,
+ LLVMConstInt(ctx->ac.i32, 
array_size, false), "");
+   }
+
+   array_idx = LLVMBuildAdd(ctx->ac.builder, array_idx, offset, 
"");
+   deref_instr = nir_src_as_deref(deref_instr->parent);
+   } else if (deref_instr->deref_type == nir_deref_type_struct) {
+   /* TODO: Probably need to do more here to support 
arrays of structs etc */
+   deref_instr = nir_src_as_deref(deref_instr->parent);
+   } else {
+   unreachable("Unsupported deref type");
+   }
+
+   }
+
+   unsigned input_array_size = glsl_get_aoa_size(var->type);
+   if (!input_array_size)
+   input_array_size = 1;
+
for (chan = 0; chan < 4; chan++) {
+   LLVMValueRef gather = LLVMGetUndef(LLVMVectorType(ctx->ac.f32, 
input_array_size));
LLVMValueRef llvm_chan = LLVMConstInt(ctx->ac.i32, chan, false);
  
-		if (interp_param) {

-   interp_param = LLVMBuildBitCast(ctx->ac.builder,
+   for (unsigned idx = 0; idx < input_array_size; ++idx) {
+   LLVMValueRef v, attr_number;
+
+   attr_number = LLVMConstInt(ctx->ac.i32, input_base + 
idx, false);
+   if (interp_param) {
+   interp_param = LLVMBuildBitCast(ctx->ac.builder,
interp_param, ctx->ac.v2f32, 
"");
-   LLVMValueRef i = LLVMBuildExtractElement(
-   ctx->ac.builder, interp_param, ctx->ac.i32_0, 
"");
-   LLVMValueRef j = LLVMBuildExtractElement(
-   

Re: [Mesa-dev] [PATCH v3 3/5] st/mesa: add support for EXT_shader_image_load_formatted

2019-01-18 Thread Marek Olšák
For patches 1-3:

Reviewed-by: Marek Olšák 

You can also update release notes for the next Mesa release.

Marek

On Wed, Jan 16, 2019 at 6:20 PM Rhys Perry  wrote:

> v3: rebase
>
> Signed-off-by: Rhys Perry 
> Reviewed-by: Marek Olšák  (v2)
> ---
>  src/mesa/state_tracker/st_extensions.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/mesa/state_tracker/st_extensions.c
> b/src/mesa/state_tracker/st_extensions.c
> index 4628079260..b713eed969 100644
> --- a/src/mesa/state_tracker/st_extensions.c
> +++ b/src/mesa/state_tracker/st_extensions.c
> @@ -717,6 +717,7 @@ void st_init_extensions(struct pipe_screen *screen,
>{ o(ARB_shader_clock), PIPE_CAP_TGSI_CLOCK
>  },
>{ o(ARB_shader_draw_parameters),   PIPE_CAP_DRAW_PARAMETERS
>   },
>{ o(ARB_shader_group_vote),PIPE_CAP_TGSI_VOTE
>   },
> +  { o(EXT_shader_image_load_formatted),
> PIPE_CAP_IMAGE_LOAD_FORMATTED },
>{ o(ARB_shader_stencil_export),
> PIPE_CAP_SHADER_STENCIL_EXPORT},
>{ o(ARB_shader_texture_image_samples), PIPE_CAP_TGSI_TXQS
>   },
>{ o(ARB_shader_texture_lod),   PIPE_CAP_SM3
>   },
> --
> 2.20.1
>


Re: [Mesa-dev] [PATCH v3 00/42] intel: VK_KHR_shader_float16_int8 implementation

2019-01-18 Thread Jason Ekstrand
I think I've gotten through everything at this point.  If I've missed
anything, please let me know.

On Fri, Jan 18, 2019 at 5:37 AM Iago Toral  wrote:

> Thanks a lot for all the review work! When you're done reviewing all
> the patches I'll prepare a v4 with all the changes.
>
> On Thu, 2019-01-17 at 18:24 -0600, Jason Ekstrand wrote:
>
> I'm done for the day but I've read through most of the patches.  I think
> I've got 4 or 5 tricky ones left.  By and large, I think things are looking
> really good.  I don't know that we'll make 19.0 but there's a possibility.
> If not, it'll likely land shortly after.
>
> On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga 
> wrote:
>
> The changes in this version address review feedback to v2 and, most
> importantly, rebase on top of relevant changes in master, specifically
> Curro's regioning lowering pass. This new regioning pass simplifies some
> of the NIR translation code (specifically the code for translating
> regioning restrictions on conversions for atom platforms), making some of
> the previous work in this series unnecessary. The regioning restrictions
> for conversions between integer and half-float added with this series are
> now implemented as part of this framework instead of at NIR translation
> time. This version of the series also dropped the SPIR-V compiler patches
> that have already been merged.
>
> As always, a branch with these patches is available for testing in the
> itoral/VK_KHR_shader_float16_int8 branch of the Igalia Mesa repository at
> https://github.com/Igalia/mesa.
>
> Iago Toral Quiroga (42):
>   intel/compiler: handle conversions between int and half-float on atom
>   intel/compiler: add a NIR pass to lower conversions
>   intel/compiler: split float to 64-bit opcodes from int to 64-bit
>   intel/compiler: handle b2i/b2f with other integer conversion opcodes
>   intel/compiler: assert restrictions on conversions to half-float
>   intel/compiler: lower some 16-bit float operations to 32-bit
>   intel/compiler: lower 16-bit extended math to 32-bit prior to gen9
>   intel/compiler: implement 16-bit fsign
>   intel/compiler: allow extended math functions with HF operands
>   compiler/nir: add lowering option for 16-bit fmod
>   intel/compiler: lower 16-bit fmod
>   compiler/nir: add lowering for 16-bit flrp
>   intel/compiler: lower 16-bit flrp
>   compiler/nir: add lowering for 16-bit ldexp
>   intel/compiler: Extended Math is limited to SIMD8 on half-float
>   intel/compiler: add instruction setters for Src1Type and Src2Type.
>   intel/compiler: add new half-float register type for 3-src
> instructions
>   intel/compiler: add a helper function to query hardware type table
>   intel/compiler: don't compact 3-src instructions with Src1Type or
> Src2Type bits
>   intel/compiler: allow half-float on 3-source instructions since gen8
>   intel/compiler: set correct precision fields for 3-source float
> instructions
>   intel/compiler: don't propagate HF immediates to 3-src instructions
>   intel/compiler: fix ddx and ddy for 16-bit float
>   intel/compiler: fix ddy for half-float in gen8
>   intel/compiler: workaround for SIMD8 half-float MAD in gen8
>   intel/compiler: split is_partial_write() into two variants
>   intel/compiler: activate 16-bit bit-size lowerings also for 8-bit
>   intel/compiler: handle 64-bit float to 8-bit integer conversions
>   intel/compiler: handle conversions between int and half-float on atom
>   intel/compiler: implement isign for int8
>   intel/compiler: ask for an integer type if requesting an 8-bit type
>   intel/eu: force stride of 2 on NULL register for Byte instructions
>   compiler/spirv: add support for Float16 and Int8 capabilities
>   anv/pipeline: support Float16 and Int8 capabilities in gen8+
>   anv/device: expose shaderFloat16 and shaderInt8 in gen8+
>   intel/compiler: implement is_zero, is_one, is_negative_one for
> 8-bit/16-bit
>   intel/compiler: add a brw_reg_type_is_integer helper
>   intel/compiler: fix cmod propagation for non 32-bit types
>   intel/compiler: remove MAD/LRP algebraic optimizations from the
> backend
>   intel/compiler: support half-float in the combine constants pass
>   intel/compiler: fix combine constants for Align16 with half-float
> prior to gen9
>   intel/compiler: allow propagating HF immediates to MAD/LRP
>
>  src/compiler/nir/nir.h|   2 +
>  src/compiler/nir/nir_opt_algebraic.py |  11 +-
>  src/compiler/shader_info.h|   2 +
>  src/compiler/spirv/spirv_to_nir.c |   8 +-
>  src/intel/Makefile.sources|   1 +
>  src/intel/compiler/brw_compiler.c |   2 +
>  src/intel/compiler/brw_eu_compact.c   |   5 +-
>  src/intel/compiler/brw_eu_emit.c  |  36 +++-
>  src/intel/compiler/brw_fs.cpp | 143 ++--
>  src/intel/compiler/brw_fs.h   |   1 +
>  

Re: [Mesa-dev] [PATCH v3 26/42] intel/compiler: split is_partial_write() into two variants

2019-01-18 Thread Jason Ekstrand
Ugh... I really don't like this...  But I also don't have a better idea
off-hand.  The unfortunate reality is that this IR really isn't designed to
be able to handle this sort of thing.  My #1 concern here is that I don't
think it does good things when we have instructions with exec_size <
dispatch_width such as split instructions in SIMD32.  I think it's *mostly*
a no-op there.  I'll have to think on this one a bit more.  Don't wait to
re-send the v4 until I've come up with something.

On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga 
wrote:

> This function is used in two different scenarios that for 32-bit
> instructions are the same, but for 16-bit instructions are not.
>
> One scenario is that in which we are working at a SIMD8 register
> level and we need to know if a register is fully defined or written.
> This is useful, for example, in the context of liveness analysis or
> register allocation, where we work with units of registers.
>
> The other scenario is that in which we want to know if an instruction
> is writing a full scalar component or just some subset of it. This is
> useful, for example, in the context of some optimization passes
> like copy propagation.
>
> For 32-bit instructions (or larger), a SIMD8 dispatch will always write
> at least a full SIMD8 register (32B) if the write is not partial. The
> function is_partial_write() checks this to determine if we have a partial
> write. However, when we deal with 16-bit instructions, that logic disables
> some optimizations that should be safe. For example, a SIMD8 16-bit MOV
> will only update half of a SIMD register, but it is still a complete write
> of the variable for a SIMD8 dispatch, so we should not prevent copy
> propagation in this scenario because we don't write all 32 bytes in the
> SIMD register or because the write starts at offset 16B (where we pack
> components Y or W of 16-bit vectors).
>
> This is a problem for SIMD8 executions (VS, TCS, TES, GS) of 16-bit
> instructions, which lose a number of optimizations because of this, most
> important of which is copy-propagation.
>
> This patch splits is_partial_write() into is_partial_reg_write(), which
> represents the current is_partial_write(), useful for things like
> liveness analysis, and is_partial_var_write(), which considers
> the dispatch size to check if we are writing a full variable (rather
> than a full register) to decide if the write is partial or not, which
> is what we really want in many optimization passes.
>
> Then the patch goes on and rewrites all uses of is_partial_write() to use
> one or the other version. Specifically, we use is_partial_var_write()
> in the following places: copy propagation, cmod propagation, common
> subexpression elimination, saturate propagation and sel peephole.
>
> Notice that the semantics of is_partial_var_write() exactly match the
> current implementation of is_partial_write() for anything that is
> 32-bit or larger, so no changes are expected for 32-bit instructions.
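
A small standalone model of the two predicates may help when reasoning
about the cases described above. The register-level check follows the
quoted is_partial_reg_write() hunk. The variable-level check is an
assumption on my part, since the hunk is cut off before its body: it simply
compares against min(REG_SIZE, dispatch_width * type size) instead of
REG_SIZE. The struct and its field names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define REG_SIZE 32 /* bytes per register, as used in the quoted hunk */

/* Hypothetical flattened view of the fields the predicates look at. */
struct inst_model {
   bool predicated;     /* predicate set and opcode is not SEL */
   bool dst_contiguous; /* result of dst.is_contiguous() */
   unsigned exec_size;  /* channels written */
   unsigned type_size;  /* bytes per channel of the destination type */
   unsigned dst_offset; /* byte offset of the destination */
};

/* Register-level view: does the write cover a whole register? */
static bool is_partial_reg_write(const struct inst_model *i)
{
   return i->predicated ||
          !i->dst_contiguous ||
          i->exec_size * i->type_size < REG_SIZE ||
          i->dst_offset % REG_SIZE != 0;
}

/* Variable-level view (assumed form): does the write cover a whole
 * variable for the given dispatch width? */
static bool is_partial_var_write(const struct inst_model *i,
                                 unsigned dispatch_width)
{
   unsigned var_size = dispatch_width * i->type_size;
   if (var_size > REG_SIZE)
      var_size = REG_SIZE;

   return i->predicated ||
          !i->dst_contiguous ||
          i->exec_size * i->type_size < var_size ||
          i->dst_offset % var_size != 0;
}
```

With this model, a SIMD8 16-bit MOV at offset 16B is partial at register
level but complete at variable level, while 32-bit SIMD8 and 16-bit SIMD16
writes give the same answer from both predicates.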
>
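> Testing against ~5000 CTS tests involving 16-bit instructions produced
> the following changes in instruction counts:
>
>        | Patched | Master  |    %    |
> -------+---------+---------+---------+
> SIMD8  | 621,900 | 706,721 | -12.00% |
> -------+---------+---------+---------+
> SIMD16 |  93,252 |  93,252 |   0.00% |
> -------+---------+---------+---------+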
>
> As expected, the change only affects SIMD8 dispatches.
>
> Reviewed-by: Topi Pohjolainen 
> ---
>  src/intel/compiler/brw_fs.cpp | 31 +++
>  .../compiler/brw_fs_cmod_propagation.cpp  | 20 ++--
>  .../compiler/brw_fs_copy_propagation.cpp  |  8 ++---
>  src/intel/compiler/brw_fs_cse.cpp |  3 +-
>  .../compiler/brw_fs_dead_code_eliminate.cpp   |  2 +-
>  src/intel/compiler/brw_fs_live_variables.cpp  |  2 +-
>  src/intel/compiler/brw_fs_reg_allocate.cpp|  2 +-
>  .../compiler/brw_fs_register_coalesce.cpp |  2 +-
>  .../compiler/brw_fs_saturate_propagation.cpp  |  7 +++--
>  src/intel/compiler/brw_fs_sel_peephole.cpp|  4 +--
>  src/intel/compiler/brw_ir_fs.h|  3 +-
>  11 files changed, 54 insertions(+), 30 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index d6096cd667d..77c955ac435 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -716,14 +716,33 @@ fs_visitor::limit_dispatch_width(unsigned n, const
> char *msg)
>   * it.
>   */
>  bool
> -fs_inst::is_partial_write() const
> +fs_inst::is_partial_reg_write() const
>  {
> return ((this->predicate && this->opcode != BRW_OPCODE_SEL) ||
> -   (this->exec_size * type_sz(this->dst.type)) < 32 ||
> !this->dst.is_contiguous() ||
> +   (this->exec_size * type_sz(this->dst.type)) < REG_SIZE ||
> this->dst.offset % REG_SIZE != 0);
>  }
>
> +/**
> + * Returns true if the instruction 

Re: [Mesa-dev] [PATCH] intel/genxml: add missing MI_PREDICATE compare operations

2019-01-18 Thread Rafael Antognolli
Reviewed-by: Rafael Antognolli 

On Fri, Jan 18, 2019 at 05:01:58PM +, Lionel Landwerlin wrote:
> Doesn't save us a great deal of lines but at least they get decoded in
> aubinators.
> 
> Signed-off-by: Lionel Landwerlin 
> ---
>  src/intel/genxml/gen10.xml | 2 ++
>  src/intel/genxml/gen11.xml | 2 ++
>  src/intel/genxml/gen7.xml  | 2 ++
>  src/intel/genxml/gen75.xml | 2 ++
>  src/intel/genxml/gen8.xml  | 2 ++
>  src/intel/genxml/gen9.xml  | 2 ++
>  src/intel/vulkan/genX_cmd_buffer.c | 1 -
>  7 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/src/intel/genxml/gen10.xml b/src/intel/genxml/gen10.xml
> index 9ec311d6cc5..7043ab8995d 100644
> --- a/src/intel/genxml/gen10.xml
> +++ b/src/intel/genxml/gen10.xml
> @@ -3047,6 +3047,8 @@
>
>  
>   prefix="COMPARE">
> +  
> +  
>
>
>  
> diff --git a/src/intel/genxml/gen11.xml b/src/intel/genxml/gen11.xml
> index 6ab1f965650..3af80a6ed3d 100644
> --- a/src/intel/genxml/gen11.xml
> +++ b/src/intel/genxml/gen11.xml
> @@ -3042,6 +3042,8 @@
>
>  
>   prefix="COMPARE">
> +  
> +  
>
>
>  
> diff --git a/src/intel/genxml/gen7.xml b/src/intel/genxml/gen7.xml
> index 893c12b8af9..3c445757300 100644
> --- a/src/intel/genxml/gen7.xml
> +++ b/src/intel/genxml/gen7.xml
> @@ -2051,6 +2051,8 @@
>
>  
>   prefix="COMPARE">
> +  
> +  
>
>
>  
> diff --git a/src/intel/genxml/gen75.xml b/src/intel/genxml/gen75.xml
> index 009a123ad69..3df7dc29939 100644
> --- a/src/intel/genxml/gen75.xml
> +++ b/src/intel/genxml/gen75.xml
> @@ -2462,6 +2462,8 @@
>
>  
>   prefix="COMPARE">
> +  
> +  
>
>
>  
> diff --git a/src/intel/genxml/gen8.xml b/src/intel/genxml/gen8.xml
> index fd19b0c8b33..4d1488dae62 100644
> --- a/src/intel/genxml/gen8.xml
> +++ b/src/intel/genxml/gen8.xml
> @@ -2690,6 +2690,8 @@
>
>  
>   prefix="COMPARE">
> +  
> +  
>
>
>  
> diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml
> index 706d398babb..3f02e866d0c 100644
> --- a/src/intel/genxml/gen9.xml
> +++ b/src/intel/genxml/gen9.xml
> @@ -2973,6 +2973,8 @@
>
>  
>   prefix="COMPARE">
> +  
> +  
>
>
>  
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
> b/src/intel/vulkan/genX_cmd_buffer.c
> index 6fb19661ebb..544e2929990 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -3310,7 +3310,6 @@ void genX(CmdDispatchIndirect)(
> }
>  
> /* predicate = !predicate; */
> -#define COMPARE_FALSE   1
> anv_batch_emit(batch, GENX(MI_PREDICATE), mip) {
>mip.LoadOperation= LOAD_LOADINV;
>mip.CombineOperation = COMBINE_OR;
> -- 
> 2.20.1
> 


Re: [Mesa-dev] [PATCH v3 05/24] mapi: add all _glapi_table entrypoints to static_data.py

2019-01-18 Thread Erik Faye-Lund
On Mon, 2019-01-14 at 11:41 +, Emil Velikov wrote:
> From: Emil Velikov 
> 
> Currently various parts of mesa use the glapi_table differently.
> 
> Some use _glapi_get_proc_offset() to get the offset, while others
> directly reference the specific offset via _gloffset_Function.
> 
> Add all static entries, to ensure things don't break as we flip to
> the
> upstream XML + new mapi generator.
> 
> Note: the offsets are also used for the alias remap table, thus we
> need
> to ensure we honour the correct offsets range or it will break.
> 
> Currently this is done via MAX_OFFSETS constant, although a better
> solution is in the works.
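
The MAX_OFFSETS guard described above amounts to the lookup below, written
as a standalone C sketch rather than the Python in gl_XML.py. The table
holds only three of the entries from the quoted static_data.py hunk, and
the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OFFSETS 407 /* value from the quoted static_data.py hunk */

/* Hypothetical table type; the patch uses a Python dict. */
struct static_offset {
   const char *name;
   int offset;
};

/* Three sample entries taken from the quoted hunk. */
static const struct static_offset offsets[] = {
   { "NewList", 0 },
   { "MultiTexCoord4sv", 407 },
   { "CompressedTexImage1D", 408 },
};

/* Returns the static offset for a name, or -1 when the name is unknown
 * or its offset lies beyond MAX_OFFSETS, matching the guarded branch in
 * the quoted gl_XML.py hunk. */
static int lookup_static_offset(const char *name)
{
   for (size_t i = 0; i < sizeof(offsets) / sizeof(offsets[0]); i++) {
      if (strcmp(offsets[i].name, name) == 0)
         return offsets[i].offset <= MAX_OFFSETS ? offsets[i].offset : -1;
   }
   return -1;
}
```

Entries above the limit stay in the table but are treated as having no
static offset by this lookup.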
> 
> v2: add FramebufferTexture2DMultisampleEXT
> v3: add MAX_OFFSETS guard
> 
> Signed-off-by: Emil Velikov 
> Reviewed-by: Erik Faye-Lund  (v1)

Consider this version also:

Reviewed-by: Erik Faye-Lund 

> Signed-off-by: Emil Velikov 
> ---
>  src/mapi/glapi/gen/gl_XML.py  |2 +-
>  src/mapi/glapi/gen/static_data.py | 1023
> -
>  2 files changed, 1023 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mapi/glapi/gen/gl_XML.py
> b/src/mapi/glapi/gen/gl_XML.py
> index b4aa6be985e..d2972992d1e 100644
> --- a/src/mapi/glapi/gen/gl_XML.py
> +++ b/src/mapi/glapi/gen/gl_XML.py
> @@ -693,7 +693,7 @@ class gl_function( gl_item ):
>  # Only try to set the offset when a non-alias entry-point
>  # is being processed.
>  
> -if name in static_data.offsets:
> +if name in static_data.offsets and static_data.offsets[name] <= static_data.MAX_OFFSETS:
>  self.offset = static_data.offsets[name]
>  else:
>  self.offset = -1
> diff --git a/src/mapi/glapi/gen/static_data.py
> b/src/mapi/glapi/gen/static_data.py
> index 0596d2cd3bb..1c71e188ef1 100644
> --- a/src/mapi/glapi/gen/static_data.py
> +++ b/src/mapi/glapi/gen/static_data.py
> @@ -20,8 +20,17 @@
>  # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>  # IN THE SOFTWARE.
>  
> +
> +"""The maximum entries of actual static data required by indirect GLX."""
> +
> +
> +MAX_OFFSETS = 407
> +
>  """Table of functions that have ABI-mandated offsets in the dispatch table.
>  
> +The first MAX_OFFSETS entries are required by indirect GLX. The rest are
> +required to preserve the glapi <> drivers ABI. This is to be addressed shortly.
> +
>  This list will never change."""
>  offsets = {
>  "NewList": 0,
> @@ -431,7 +440,1019 @@ offsets = {
>  "MultiTexCoord4i": 404,
>  "MultiTexCoord4iv": 405,
>  "MultiTexCoord4s": 406,
> -"MultiTexCoord4sv": 407
> +"MultiTexCoord4sv": 407,
> +"CompressedTexImage1D": 408,
> +"CompressedTexImage2D": 409,
> +"CompressedTexImage3D": 410,
> +"CompressedTexSubImage1D": 411,
> +"CompressedTexSubImage2D": 412,
> +"CompressedTexSubImage3D": 413,
> +"GetCompressedTexImage": 414,
> +"LoadTransposeMatrixd": 415,
> +"LoadTransposeMatrixf": 416,
> +"MultTransposeMatrixd": 417,
> +"MultTransposeMatrixf": 418,
> +"SampleCoverage": 419,
> +"BlendFuncSeparate": 420,
> +"FogCoordPointer": 421,
> +"FogCoordd": 422,
> +"FogCoorddv": 423,
> +"MultiDrawArrays": 424,
> +"PointParameterf": 425,
> +"PointParameterfv": 426,
> +"PointParameteri": 427,
> +"PointParameteriv": 428,
> +"SecondaryColor3b": 429,
> +"SecondaryColor3bv": 430,
> +"SecondaryColor3d": 431,
> +"SecondaryColor3dv": 432,
> +"SecondaryColor3i": 433,
> +"SecondaryColor3iv": 434,
> +"SecondaryColor3s": 435,
> +"SecondaryColor3sv": 436,
> +"SecondaryColor3ub": 437,
> +"SecondaryColor3ubv": 438,
> +"SecondaryColor3ui": 439,
> +"SecondaryColor3uiv": 440,
> +"SecondaryColor3us": 441,
> +"SecondaryColor3usv": 442,
> +"SecondaryColorPointer": 443,
> +"WindowPos2d": 444,
> +"WindowPos2dv": 445,
> +"WindowPos2f": 446,
> +"WindowPos2fv": 447,
> +"WindowPos2i": 448,
> +"WindowPos2iv": 449,
> +"WindowPos2s": 450,
> +"WindowPos2sv": 451,
> +"WindowPos3d": 452,
> +"WindowPos3dv": 453,
> +"WindowPos3f": 454,
> +"WindowPos3fv": 455,
> +"WindowPos3i": 456,
> +"WindowPos3iv": 457,
> +"WindowPos3s": 458,
> +"WindowPos3sv": 459,
> +"BeginQuery": 460,
> +"BindBuffer": 461,
> +"BufferData": 462,
> +"BufferSubData": 463,
> +"DeleteBuffers": 464,
> +"DeleteQueries": 465,
> +"EndQuery": 466,
> +"GenBuffers": 467,
> +"GenQueries": 468,
> +"GetBufferParameteriv": 469,
> +"GetBufferPointerv": 470,
> +"GetBufferSubData": 471,
> +"GetQueryObjectiv": 472,
> +"GetQueryObjectuiv": 473,
> +"GetQueryiv": 474,
> +"IsBuffer": 475,
> +"IsQuery": 476,
> +"MapBuffer": 477,
> +"UnmapBuffer": 478,
> +"AttachShader": 479,
> +"BindAttribLocation": 480,
> +"BlendEquationSeparate": 481,
> +"CompileShader": 482,
> +
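
For readers skimming the (truncated) table above, here is a minimal sketch of what the MAX_OFFSETS guard in the gl_XML.py hunk does. It is restated in C purely for illustration; the patch itself is Python. The table holds only four of the entries quoted above, and `struct static_entry` and `lookup_static_offset` are hypothetical names, not part of Mesa.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value taken from the static_data.py hunk quoted above. */
#define MAX_OFFSETS 407

/* Hypothetical stand-in for the Python "offsets" dict (a few entries only). */
struct static_entry {
   const char *name;
   int offset;
};

static const struct static_entry static_offsets[] = {
   { "NewList", 0 },
   { "MultiTexCoord4s", 406 },
   { "MultiTexCoord4sv", 407 },
   { "CompressedTexImage1D", 408 },
};

/* Return the fixed dispatch offset for name, or -1 when the name is unknown
 * or its offset lies above MAX_OFFSETS (mirroring the "else: -1" branch). */
static int lookup_static_offset(const char *name)
{
   assert(name != NULL);

   for (size_t i = 0; i < sizeof(static_offsets) / sizeof(static_offsets[0]); i++) {
      if (strcmp(static_offsets[i].name, name) == 0) {
         int offset = static_offsets[i].offset;
         return offset <= MAX_OFFSETS ? offset : -1;
      }
   }
   return -1;
}
```

With this guard, entries such as CompressedTexImage1D (408) are present in the table but still get offset -1, which is the behaviour the "v3: add MAX_OFFSETS guard" note describes.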

Re: [Mesa-dev] [PATCH v3 25/42] intel/compiler: workaround for SIMD8 half-float MAD in gen8

2019-01-18 Thread Jason Ekstrand
On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga 
wrote:

> Broadwell hardware has a bug that manifests in SIMD8 executions of
> 16-bit MAD instructions when any of the sources is a Y or W component.
> We pack these components in the same SIMD register as components X and
> Z respectively, but starting at offset 16B (so they live in the second
> half of the register). The problem does not exist in SKL or later.
>
> We work around this issue by moving any such sources to a temporary
> starting at offset 0B. We want to do this after the main optimization loop
> to prevent copy-propagation and friends from undoing the fix.
>
> Reviewed-by: Topi Pohjolainen 
> ---
>  src/intel/compiler/brw_fs.cpp | 48 +++
>  src/intel/compiler/brw_fs.h   |  1 +
>  2 files changed, 49 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 0b3ec94e2d2..d6096cd667d 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -6540,6 +6540,48 @@ fs_visitor::optimize()
> validate();
>  }
>
> +/**
> + * Broadwell hardware has a bug that manifests in SIMD8 executions of 16-bit
> + * MAD instructions when any of the sources is a Y or W component. We pack
> + * these components in the same SIMD register as components X and Z
> + * respectively, but starting at offset 16B (so they live in the second half
> + * of the register).
>

What exactly do you mean by a Y or W component?  Is this for the case where
you have a scalar that happens to land at certain offsets?  Or does it
apply to regular stride == 1 MADs?  If it applies in the stride == 1 case,
then I really don't see what this is doing to fix it.  It might help if you
provided a before-and-after assembly example.

Also, this seems like something that should go in the new region
restrictions pass as a special case in has_invalid_src_region.

--Jason


> + *
> + * We work around this issue by moving any such sources to a temporary
> + * starting at offset 0B. We want to do this after the main optimization loop
> + * to prevent copy-propagation and friends from undoing the fix.
> + */
> +void
> +fs_visitor::fixup_hf_mad()
> +{
> +   if (devinfo->gen != 8)
> +  return;
> +
> +   bool progress = false;
> +
> +   foreach_block_and_inst_safe (block, fs_inst, inst, cfg) {
> +  if (inst->opcode != BRW_OPCODE_MAD ||
> +  inst->dst.type != BRW_REGISTER_TYPE_HF ||
> +  inst->exec_size > 8)
> + continue;
> +
> +  for (int i = 0; i < 3; i++) {
> + if (inst->src[i].offset > 0) {
> +assert(inst->src[i].type == BRW_REGISTER_TYPE_HF);
> +const fs_builder ibld =
> +   bld.at(block, inst).exec_all().group(inst->exec_size, 0);
> +fs_reg tmp = ibld.vgrf(inst->src[i].type);
> +ibld.MOV(tmp, inst->src[i]);
> +inst->src[i] = tmp;
> +progress = true;
> + }
> +  }
> +   }
> +
> +   if (progress)
> +  invalidate_live_intervals();
> +}
> +
>  /**
>   * Three source instruction must have a GRF/MRF destination register.
>   * ARF NULL is not allowed.  Fix that up by allocating a temporary GRF.
> @@ -6698,6 +6740,7 @@ fs_visitor::run_vs()
> assign_curb_setup();
> assign_vs_urb_setup();
>
> +   fixup_hf_mad();
> fixup_3src_null_dest();
> allocate_registers(8, true);
>
> @@ -6782,6 +6825,7 @@ fs_visitor::run_tcs_single_patch()
> assign_curb_setup();
> assign_tcs_single_patch_urb_setup();
>
> +   fixup_hf_mad();
> fixup_3src_null_dest();
> allocate_registers(8, true);
>
> @@ -6816,6 +6860,7 @@ fs_visitor::run_tes()
> assign_curb_setup();
> assign_tes_urb_setup();
>
> +   fixup_hf_mad();
> fixup_3src_null_dest();
> allocate_registers(8, true);
>
> @@ -6865,6 +6910,7 @@ fs_visitor::run_gs()
> assign_curb_setup();
> assign_gs_urb_setup();
>
> +   fixup_hf_mad();
> fixup_3src_null_dest();
> allocate_registers(8, true);
>
> @@ -6965,6 +7011,7 @@ fs_visitor::run_fs(bool allow_spilling, bool
> do_rep_send)
>
>assign_urb_setup();
>
> +  fixup_hf_mad();
>fixup_3src_null_dest();
>allocate_registers(8, allow_spilling);
>
> @@ -7009,6 +7056,7 @@ fs_visitor::run_cs(unsigned min_dispatch_width)
>
> assign_curb_setup();
>
> +   fixup_hf_mad();
> fixup_3src_null_dest();
> allocate_registers(min_dispatch_width, true);
>
> diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
> index 68287bcdcea..1879d4bc7f7 100644
> --- a/src/intel/compiler/brw_fs.h
> +++ b/src/intel/compiler/brw_fs.h
> @@ -103,6 +103,7 @@ public:
> void setup_vs_payload();
> void setup_gs_payload();
> void setup_cs_payload();
> +   void fixup_hf_mad();
> void fixup_3src_null_dest();
> void assign_curb_setup();
> void calculate_urb_setup();
> --
> 2.17.1
>

[Mesa-dev] [PATCH] intel/genxml: add missing MI_PREDICATE compare operations

2019-01-18 Thread Lionel Landwerlin
This doesn't save us many lines, but at least the compare operations get
decoded in aubinators.

Signed-off-by: Lionel Landwerlin 
---
 src/intel/genxml/gen10.xml | 2 ++
 src/intel/genxml/gen11.xml | 2 ++
 src/intel/genxml/gen7.xml  | 2 ++
 src/intel/genxml/gen75.xml | 2 ++
 src/intel/genxml/gen8.xml  | 2 ++
 src/intel/genxml/gen9.xml  | 2 ++
 src/intel/vulkan/genX_cmd_buffer.c | 1 -
 7 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/intel/genxml/gen10.xml b/src/intel/genxml/gen10.xml
index 9ec311d6cc5..7043ab8995d 100644
--- a/src/intel/genxml/gen10.xml
+++ b/src/intel/genxml/gen10.xml
@@ -3047,6 +3047,8 @@
   
 
 
+  
+  
   
   
 
diff --git a/src/intel/genxml/gen11.xml b/src/intel/genxml/gen11.xml
index 6ab1f965650..3af80a6ed3d 100644
--- a/src/intel/genxml/gen11.xml
+++ b/src/intel/genxml/gen11.xml
@@ -3042,6 +3042,8 @@
   
 
 
+  
+  
   
   
 
diff --git a/src/intel/genxml/gen7.xml b/src/intel/genxml/gen7.xml
index 893c12b8af9..3c445757300 100644
--- a/src/intel/genxml/gen7.xml
+++ b/src/intel/genxml/gen7.xml
@@ -2051,6 +2051,8 @@
   
 
 
+  
+  
   
   
 
diff --git a/src/intel/genxml/gen75.xml b/src/intel/genxml/gen75.xml
index 009a123ad69..3df7dc29939 100644
--- a/src/intel/genxml/gen75.xml
+++ b/src/intel/genxml/gen75.xml
@@ -2462,6 +2462,8 @@
   
 
 
+  
+  
   
   
 
diff --git a/src/intel/genxml/gen8.xml b/src/intel/genxml/gen8.xml
index fd19b0c8b33..4d1488dae62 100644
--- a/src/intel/genxml/gen8.xml
+++ b/src/intel/genxml/gen8.xml
@@ -2690,6 +2690,8 @@
   
 
 
+  
+  
   
   
 
diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml
index 706d398babb..3f02e866d0c 100644
--- a/src/intel/genxml/gen9.xml
+++ b/src/intel/genxml/gen9.xml
@@ -2973,6 +2973,8 @@
   
 
 
+  
+  
   
   
 
diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 6fb19661ebb..544e2929990 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -3310,7 +3310,6 @@ void genX(CmdDispatchIndirect)(
}
 
/* predicate = !predicate; */
-#define COMPARE_FALSE   1
anv_batch_emit(batch, GENX(MI_PREDICATE), mip) {
   mip.LoadOperation= LOAD_LOADINV;
   mip.CombineOperation = COMBINE_OR;
-- 
2.20.1



[Mesa-dev] [PATCH 3/8] radeonsi: don't use WRITE_DATA.DST_SEL == MEM_GRBM on >= CIK

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_pipe.c   | 3 ++-
 src/gallium/drivers/radeonsi/si_state_draw.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index b2eb91dca92..f68ef3f67ce 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -525,21 +525,22 @@ static struct pipe_context *si_create_context(struct 
pipe_screen *screen,
 
if (sctx->chip_class >= GFX9) {
sctx->wait_mem_scratch = r600_resource(
pipe_buffer_create(screen, 0, PIPE_USAGE_DEFAULT, 4));
if (!sctx->wait_mem_scratch)
goto fail;
 
/* Initialize the memory. */
struct radeon_cmdbuf *cs = sctx->gfx_cs;
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
-   radeon_emit(cs, S_370_DST_SEL(V_370_MEM_GRBM) |
+   radeon_emit(cs, S_370_DST_SEL(sctx->chip_class >= CIK ? V_370_MEM
+ : V_370_MEM_GRBM) |
S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_ME));
radeon_emit(cs, sctx->wait_mem_scratch->gpu_address);
radeon_emit(cs, sctx->wait_mem_scratch->gpu_address >> 32);
radeon_emit(cs, sctx->wait_mem_number);
radeon_add_to_buffer_list(sctx, cs, sctx->wait_mem_scratch,
  RADEON_USAGE_WRITE, 
RADEON_PRIO_FENCE);
}
 
/* CIK cannot unbind a constant buffer (S_BUFFER_LOAD doesn't skip loads
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
b/src/gallium/drivers/radeonsi/si_state_draw.c
index ea8c5d054b5..9a80bd81327 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -1589,21 +1589,22 @@ si_draw_rectangle(struct blitter_context *blitter,
si_draw_vbo(pipe, );
 }
 
 void si_trace_emit(struct si_context *sctx)
 {
struct radeon_cmdbuf *cs = sctx->gfx_cs;
uint64_t va = sctx->current_saved_cs->trace_buf->gpu_address;
uint32_t trace_id = ++sctx->current_saved_cs->trace_id;
 
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
-   radeon_emit(cs, S_370_DST_SEL(V_370_MEM_GRBM) |
+   radeon_emit(cs, S_370_DST_SEL(sctx->chip_class >= CIK ? V_370_MEM
+ : V_370_MEM_GRBM) |
S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_ME));
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
radeon_emit(cs, trace_id);
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
radeon_emit(cs, AC_ENCODE_TRACE_POINT(trace_id));
 
if (sctx->log)
u_log_flush(sctx->log);
-- 
2.17.1
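
The control dword that both hunks build can be sketched standalone as below. The S_370_* and V_370_* values are the ones given in the sid.h hunk of patch 1/8 in this thread; `enum gpu_gen` and `write_data_control` are hypothetical names introduced only for this illustration, standing in for `sctx->chip_class` and the inline `radeon_emit` argument.

```c
#include <assert.h>
#include <stdint.h>

/* Field encodings as given in the sid.h hunk of patch 1/8. */
#define S_370_ENGINE_SEL(x) (((unsigned)(x) & 0x3) << 30)
#define   V_370_ME          0
#define   V_370_PFP         1
#define S_370_WR_CONFIRM(x) (((unsigned)(x) & 0x1) << 20)
#define S_370_DST_SEL(x)    (((unsigned)(x) & 0xf) << 8)
#define   V_370_MEM_GRBM    1 /* sync across GRBM */
#define   V_370_MEM         5 /* not on SI */

/* Hypothetical generation enum, ordered so ">= GEN_CIK" behaves like the
 * chip_class comparison in the patch. */
enum gpu_gen { GEN_SI, GEN_CIK, GEN_VI, GEN_GFX9 };

/* Build the WRITE_DATA control dword, falling back to MEM_GRBM on SI. */
static uint32_t write_data_control(enum gpu_gen gen, unsigned engine)
{
   assert(engine == V_370_ME || engine == V_370_PFP);

   unsigned dst_sel = gen >= GEN_CIK ? V_370_MEM : V_370_MEM_GRBM;

   return S_370_DST_SEL(dst_sel) |
          S_370_WR_CONFIRM(1) |
          S_370_ENGINE_SEL(engine);
}
```

Only bits 8..11 differ between the SI and CIK+ results; the write-confirm and engine-select fields are unaffected by the change.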



[Mesa-dev] [PATCH 1/8] radeonsi: correct WRITE_DATA.DST_SEL definitions

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

---
 src/amd/common/sid.h |  4 ++--
 src/amd/vulkan/radv_cmd_buffer.c | 16 
 src/amd/vulkan/radv_meta_buffer.c|  2 +-
 src/amd/vulkan/radv_query.c  |  2 +-
 src/gallium/drivers/radeonsi/si_fence.c  |  2 +-
 src/gallium/drivers/radeonsi/si_pipe.c   |  2 +-
 src/gallium/drivers/radeonsi/si_state_draw.c |  2 +-
 7 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/src/amd/common/sid.h b/src/amd/common/sid.h
index 12e80df4884..5c8eee0124d 100644
--- a/src/amd/common/sid.h
+++ b/src/amd/common/sid.h
@@ -126,25 +126,25 @@
 #define   R_370_CONTROL0x370 /* 0x[packet 
number][word index] */
 #define S_370_ENGINE_SEL(x)(((unsigned)(x) & 0x3) 
<< 30)
 #define   V_370_ME 0
 #define   V_370_PFP1
 #define   V_370_CE 2
 #define   V_370_DE 3
 #define S_370_WR_CONFIRM(x)(((unsigned)(x) & 0x1) 
<< 20)
 #define S_370_WR_ONE_ADDR(x)   (((unsigned)(x) & 0x1) << 16)
 #define S_370_DST_SEL(x)   (((unsigned)(x) & 0xf) << 8)
 #define   V_370_MEM_MAPPED_REGISTER0
-#define   V_370_MEMORY_SYNC1
+#define   V_370_MEM_GRBM   1 /* sync across GRBM */
 #define   V_370_TC_L2  2
 #define   V_370_GDS3
 #define   V_370_RESERVED   4
-#define   V_370_MEM_ASYNC  5
+#define   V_370_MEM5 /* not on SI */
 #define   R_371_DST_ADDR_LO0x371
 #define   R_372_DST_ADDR_HI0x372
 #define PKT3_DRAW_INDEX_INDIRECT_MULTI 0x38
 #define PKT3_MEM_SEMAPHORE 0x39
 #define PKT3_MPEG_INDEX0x3A /* not on CIK */
 #define PKT3_WAIT_REG_MEM  0x3C
 #defineWAIT_REG_MEM_EQUAL  3
 #defineWAIT_REG_MEM_NOT_EQUAL  4
 #defineWAIT_REG_MEM_GREATER_OR_EQUAL   5
 #define WAIT_REG_MEM_MEM_SPACE(x)   (((unsigned)(x) & 0x3) << 4)
diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index f41d6c0b3e7..20aeda96da2 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -446,21 +446,21 @@ radv_cmd_buffer_upload_data(struct radv_cmd_buffer 
*cmd_buffer,
 
 static void
 radv_emit_write_data_packet(struct radv_cmd_buffer *cmd_buffer, uint64_t va,
unsigned count, const uint32_t *data)
 {
struct radeon_cmdbuf *cs = cmd_buffer->cs;
 
radeon_check_space(cmd_buffer->device->ws, cs, 4 + count);
 
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 2 + count, 0));
-   radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
+   radeon_emit(cs, S_370_DST_SEL(V_370_MEM) |
S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_ME));
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
radeon_emit_array(cs, data, count);
 }
 
 void radv_cmd_buffer_trace_emit(struct radv_cmd_buffer *cmd_buffer)
 {
struct radv_device *device = cmd_buffer->device;
@@ -1237,21 +1237,21 @@ radv_set_ds_clear_metadata(struct radv_cmd_buffer 
*cmd_buffer,
if (aspects & VK_IMAGE_ASPECT_STENCIL_BIT) {
++reg_count;
} else {
++reg_offset;
va += 4;
}
if (aspects & VK_IMAGE_ASPECT_DEPTH_BIT)
++reg_count;
 
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 2 + reg_count, 0));
-   radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
+   radeon_emit(cs, S_370_DST_SEL(V_370_MEM) |
S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_PFP));
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
if (aspects & VK_IMAGE_ASPECT_STENCIL_BIT)
radeon_emit(cs, ds_clear_value.stencil);
if (aspects & VK_IMAGE_ASPECT_DEPTH_BIT)
radeon_emit(cs, fui(ds_clear_value.depth));
 }
 
@@ -1261,21 +1261,21 @@ radv_set_ds_clear_metadata(struct radv_cmd_buffer 
*cmd_buffer,
 static void
 radv_set_tc_compat_zrange_metadata(struct radv_cmd_buffer *cmd_buffer,
   struct radv_image *image,
   uint32_t value)
 {
struct radeon_cmdbuf *cs = cmd_buffer->cs;
uint64_t va = radv_buffer_get_va(image->bo);
va += image->offset + image->tc_compat_zrange_offset;
 
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
-   radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
+   radeon_emit(cs, S_370_DST_SEL(V_370_MEM) |
S_370_WR_CONFIRM(1) |

[Mesa-dev] [PATCH 7/8] gallium/util: add a linear allocator for reducing malloc overhead

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/auxiliary/Makefile.sources  |  1 +
 src/gallium/auxiliary/meson.build   |  1 +
 src/gallium/auxiliary/util/u_cpu_suballoc.h | 90 +
 3 files changed, 92 insertions(+)
 create mode 100644 src/gallium/auxiliary/util/u_cpu_suballoc.h

diff --git a/src/gallium/auxiliary/Makefile.sources 
b/src/gallium/auxiliary/Makefile.sources
index 50e88088ff8..b26415858f6 100644
--- a/src/gallium/auxiliary/Makefile.sources
+++ b/src/gallium/auxiliary/Makefile.sources
@@ -211,20 +211,21 @@ C_SOURCES := \
util/u_bitmask.c \
util/u_bitmask.h \
util/u_blend.h \
util/u_blit.c \
util/u_blit.h \
util/u_blitter.c \
util/u_blitter.h \
util/u_box.h \
util/u_cache.c \
util/u_cache.h \
+   util/u_cpu_suballoc.h \
util/u_debug_gallium.h \
util/u_debug_gallium.c \
util/u_debug_describe.c \
util/u_debug_describe.h \
util/u_debug_flush.c \
util/u_debug_flush.h \
util/u_debug_image.c \
util/u_debug_image.h \
util/u_debug_memory.c \
util/u_debug_refcnt.c \
diff --git a/src/gallium/auxiliary/meson.build 
b/src/gallium/auxiliary/meson.build
index 57f7e69050f..7e1e4732421 100644
--- a/src/gallium/auxiliary/meson.build
+++ b/src/gallium/auxiliary/meson.build
@@ -231,20 +231,21 @@ files_libgallium = files(
   'util/u_bitmask.c',
   'util/u_bitmask.h',
   'util/u_blend.h',
   'util/u_blit.c',
   'util/u_blit.h',
   'util/u_blitter.c',
   'util/u_blitter.h',
   'util/u_box.h',
   'util/u_cache.c',
   'util/u_cache.h',
+  'util/u_cpu_suballoc.h',
   'util/u_debug_gallium.h',
   'util/u_debug_gallium.c',
   'util/u_debug_describe.c',
   'util/u_debug_describe.h',
   'util/u_debug_flush.c',
   'util/u_debug_flush.h',
   'util/u_debug_image.c',
   'util/u_debug_image.h',
   'util/u_debug_memory.c',
   'util/u_debug_refcnt.c',
diff --git a/src/gallium/auxiliary/util/u_cpu_suballoc.h 
b/src/gallium/auxiliary/util/u_cpu_suballoc.h
new file mode 100644
index 000..2373c1f7c70
--- /dev/null
+++ b/src/gallium/auxiliary/util/u_cpu_suballoc.h
@@ -0,0 +1,90 @@
+/**
+ *
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ **/
+
+/* A simple utility for suballocating out of malloc_aligned. */
+
+#ifndef U_CPU_SUBALLOC_H
+#define U_CPU_SUBALLOC_H
+
+#include 
+#include "util/os_memory.h"
+
+struct u_cpu_suballoc {
+   unsigned default_size;  /* Default size of the buffer, in bytes. */
+   unsigned current_size;  /* Current size of the buffer, in bytes. */
+   unsigned alignment; /* malloc alignment. */
+   unsigned offset;/* Offset pointing to the first unused byte. */
+   uint8_t *buffer;/* Pointer to the CPU buffer. */
+};
+
+
+static inline void
+u_cpu_suballoc_init(struct u_cpu_suballoc *alloc, unsigned default_size,
+   unsigned alignment)
+{
+   memset(alloc, 0, sizeof(*alloc));
+   alloc->default_size = default_size;
+   alloc->alignment = alignment;
+}
+
+
+static inline void
+u_cpu_suballoc_deinit(struct u_cpu_suballoc *alloc)
+{
+   os_free_aligned(alloc->buffer);
+   alloc->buffer = NULL;
+}
+
+
+static inline void *
+u_cpu_suballoc(struct u_cpu_suballoc *alloc, unsigned size, unsigned alignment)
+{
+   unsigned offset = align(alloc->offset, alignment);
+
+   /* Make sure we have enough space in the buffer for the sub-allocation. */
+   if (unlikely(!alloc->buffer || offset + size > alloc->current_size)) {
+  os_free_aligned(alloc->buffer);
+
+  alloc->current_size = MAX2(alloc->default_size, size);
+  alloc->offset = 0;
+  alloc->buffer = (uint8_t*)os_malloc_aligned(alloc->current_size,
+
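
The patch text is cut off here in the archive. As a rough illustration of the same idea (not the author's code), a linear sub-allocator can be sketched with only the C standard library. All names below (`struct bump_suballoc`, `bump_init`, `bump_alloc`, `bump_deinit`) are hypothetical, and C11 `aligned_alloc`/`free` are swapped in for the `os_malloc_aligned`/`os_free_aligned` helpers the patch uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct bump_suballoc {
   size_t default_size;  /* Default size of the buffer, in bytes. */
   size_t current_size;  /* Current size of the buffer, in bytes. */
   size_t alignment;     /* Alignment of the buffer itself (power of two). */
   size_t offset;        /* Offset of the first unused byte. */
   uint8_t *buffer;      /* Backing storage, or NULL before first use. */
};

/* Round value up to the next multiple of a power-of-two alignment. */
static size_t align_up(size_t value, size_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

static void bump_init(struct bump_suballoc *a, size_t default_size,
                      size_t alignment)
{
   memset(a, 0, sizeof(*a));
   a->default_size = default_size;
   a->alignment = alignment;
}

static void bump_deinit(struct bump_suballoc *a)
{
   free(a->buffer);
   a->buffer = NULL;
}

/* Hand out size bytes; the returned pointer is aligned as requested as long
 * as the requested alignment does not exceed the buffer alignment. */
static void *bump_alloc(struct bump_suballoc *a, size_t size, size_t alignment)
{
   size_t offset = align_up(a->offset, alignment);

   if (!a->buffer || offset + size > a->current_size) {
      /* Start over with a fresh buffer. As with the free call in the hunk
       * above, this invalidates every pointer handed out earlier. */
      free(a->buffer);
      a->current_size = size > a->default_size ? size : a->default_size;
      /* aligned_alloc expects a size that is a multiple of the alignment. */
      a->buffer = aligned_alloc(a->alignment,
                                align_up(a->current_size, a->alignment));
      if (!a->buffer) {
         a->current_size = 0;
         a->offset = 0;
         return NULL;
      }
      offset = 0;
   }

   a->offset = offset + size;
   return a->buffer + offset;
}
```

The point of the design is that most calls only bump an offset, and the heap is touched only when the buffer is exhausted. The price is that callers must treat earlier sub-allocations as dead once the buffer is replaced, so this only suits short-lived data.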

[Mesa-dev] [PATCH 8/8] radeonsi: use WRITE_DATA for small MapBuffer(INVALIDATE_RANGE) sizes

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_buffer.c | 30 
 src/gallium/drivers/radeonsi/si_pipe.c   |  2 ++
 src/gallium/drivers/radeonsi/si_pipe.h   |  8 +++
 3 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_buffer.c 
b/src/gallium/drivers/radeonsi/si_buffer.c
index a1e421b8b0d..1d4387252a0 100644
--- a/src/gallium/drivers/radeonsi/si_buffer.c
+++ b/src/gallium/drivers/radeonsi/si_buffer.c
@@ -443,32 +443,47 @@ static void *si_buffer_transfer_map(struct pipe_context 
*ctx,
 PIPE_TRANSFER_PERSISTENT))) ||
 (rbuffer->flags & RADEON_FLAG_SPARSE))) {
assert(usage & PIPE_TRANSFER_WRITE);
 
/* Check if mapping this buffer would cause waiting for the GPU.
 */
if (rbuffer->flags & RADEON_FLAG_SPARSE ||
force_discard_range ||
si_rings_is_buffer_referenced(sctx, rbuffer->buf, 
RADEON_USAGE_READWRITE) ||
!sctx->ws->buffer_wait(rbuffer->buf, 0, 
RADEON_USAGE_READWRITE)) {
+   unsigned alloc_start = box->x % SI_MAP_BUFFER_ALIGNMENT;
+   unsigned alloc_size = alloc_start + box->width;
+
+   /* Use PKT3_WRITE_DATA for small uploads. */
+   if (box->width <= SI_TRANSFER_WRITE_DATA_THRESHOLD &&
+   box->x % 4 == 0 && box->width % 4 == 0) {
+   void *cpu_map = 
u_cpu_suballoc(>cpu_suballoc, alloc_size,
+  
SI_MAP_BUFFER_ALIGNMENT);
+   cpu_map = (char*)cpu_map + alloc_start;
+
+   return si_buffer_get_transfer(ctx, resource, 
usage, box,
+ ptransfer, 
cpu_map, cpu_map,
+ 
SI_TRANSFER_SPECIAL_OFFSET_USE_CPU_ALLOC);
+   }
+
/* Do a wait-free write-only transfer using a temporary 
buffer. */
unsigned offset;
struct r600_resource *staging = NULL;
 
u_upload_alloc(ctx->stream_uploader, 0,
-   box->width + (box->x % 
SI_MAP_BUFFER_ALIGNMENT),
+   alloc_size,
   sctx->screen->info.tcc_cache_line_size,
   , (struct 
pipe_resource**),
(void**));
 
if (staging) {
-   data += box->x % SI_MAP_BUFFER_ALIGNMENT;
+   data += alloc_start;
return si_buffer_get_transfer(ctx, resource, 
usage, box,
ptransfer, 
data, staging, offset);
} else if (rbuffer->flags & RADEON_FLAG_SPARSE) {
return NULL;
}
} else {
/* At this point, the buffer is always idle (we checked 
it above). */
usage |= PIPE_TRANSFER_UNSYNCHRONIZED;
}
}
@@ -530,26 +545,30 @@ static void si_buffer_write_data(struct si_context *sctx, 
struct r600_resource *
si_cp_write_data(sctx, buf, offset, size, V_370_TC_L2, V_370_ME, data);
 
radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0));
radeon_emit(cs, 0);
 }
 
 static void si_buffer_do_flush_region(struct pipe_context *ctx,
  struct pipe_transfer *transfer,
  const struct pipe_box *box)
 {
+   struct si_context *sctx = (struct si_context*)ctx;
struct si_transfer *stransfer = (struct si_transfer*)transfer;
struct r600_resource *rbuffer = r600_resource(transfer->resource);
 
-   if (stransfer->u.staging) {
+   if (stransfer->offset == SI_TRANSFER_SPECIAL_OFFSET_USE_CPU_ALLOC) {
+   si_buffer_write_data(sctx, rbuffer, box->x, box->width,
+stransfer->u.cpu);
+   } else if (stransfer->u.staging) {
/* Copy the staging buffer into the original one. */
-   si_copy_buffer((struct si_context*)ctx, transfer->resource,
+   si_copy_buffer(sctx, transfer->resource,
   >u.staging->b.b, box->x,
   stransfer->offset + box->x % 
SI_MAP_BUFFER_ALIGNMENT,
   box->width);
}
 
util_range_add(>valid_buffer_range, box->x,
   box->x + box->width);
 }
 
 static void si_buffer_flush_region(struct pipe_context *ctx,
@@ -570,21 +589,22 @@ static void si_buffer_flush_region(struct pipe_context 
*ctx,
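
The `alloc_start`/`alloc_size` arithmetic introduced in the first hunk can be tried in isolation with the sketch below. `SI_MAP_BUFFER_ALIGNMENT` is 64 per the si_pipe.h context in patch 6/8; `struct staging_layout` and `staging_layout_for` are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>

/* Value from the si_pipe.h context lines in patch 6/8. */
#define SI_MAP_BUFFER_ALIGNMENT 64

/* Hypothetical helper type: where the mapped range starts inside the
 * temporary allocation, and how large that allocation must be. */
struct staging_layout {
   unsigned alloc_start;
   unsigned alloc_size;
};

static struct staging_layout staging_layout_for(unsigned box_x,
                                                unsigned box_width)
{
   struct staging_layout l;

   /* Keep the same misalignment relative to 64 bytes as the destination
    * offset, so the returned pointer and box_x agree modulo the alignment. */
   l.alloc_start = box_x % SI_MAP_BUFFER_ALIGNMENT;
   l.alloc_size = l.alloc_start + box_width;

   /* Guard against unsigned wrap-around in this sketch. */
   assert(l.alloc_size >= box_width);
   return l;
}
```

For example, mapping 8 bytes at offset 100 needs a 44-byte allocation whose usable range starts 36 bytes in.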
 

[Mesa-dev] [PATCH 6/8] radeonsi: use WRITE_DATA for small glBufferSubData sizes

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_buffer.c | 27 
 src/gallium/drivers/radeonsi/si_pipe.h   |  1 +
 2 files changed, 28 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_buffer.c 
b/src/gallium/drivers/radeonsi/si_buffer.c
index 4766cf4bdfa..a1e421b8b0d 100644
--- a/src/gallium/drivers/radeonsi/si_buffer.c
+++ b/src/gallium/drivers/radeonsi/si_buffer.c
@@ -16,20 +16,22 @@
  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
  * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
  * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
  * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
  * USE OR OTHER DEALINGS IN THE SOFTWARE.
  */
 
 #include "radeonsi/si_pipe.h"
+#include "sid.h"
+
 #include "util/u_memory.h"
 #include "util/u_upload_mgr.h"
 #include "util/u_transfer.h"
 #include 
 #include 
 
 bool si_rings_is_buffer_referenced(struct si_context *sctx,
   struct pb_buffer *buf,
   enum radeon_bo_usage usage)
 {
@@ -506,20 +508,38 @@ static void *si_buffer_transfer_map(struct pipe_context 
*ctx,
data = si_buffer_map_sync_with_rings(sctx, rbuffer, usage);
if (!data) {
return NULL;
}
data += box->x;
 
return si_buffer_get_transfer(ctx, resource, usage, box,
ptransfer, data, NULL, 0);
 }
 
+static void si_buffer_write_data(struct si_context *sctx, struct r600_resource 
*buf,
+unsigned offset, unsigned size, const void 
*data)
+{
+   struct radeon_cmdbuf *cs = sctx->gfx_cs;
+
+   si_need_gfx_cs_space(sctx);
+
+   sctx->flags |= SI_CONTEXT_PS_PARTIAL_FLUSH |
+  SI_CONTEXT_CS_PARTIAL_FLUSH |
+  si_get_flush_flags(sctx, SI_COHERENCY_SHADER, L2_LRU);
+   si_emit_cache_flush(sctx);
+
+   si_cp_write_data(sctx, buf, offset, size, V_370_TC_L2, V_370_ME, data);
+
+   radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0));
+   radeon_emit(cs, 0);
+}
+
 static void si_buffer_do_flush_region(struct pipe_context *ctx,
  struct pipe_transfer *transfer,
  const struct pipe_box *box)
 {
struct si_transfer *stransfer = (struct si_transfer*)transfer;
struct r600_resource *rbuffer = r600_resource(transfer->resource);
 
if (stransfer->u.staging) {
/* Copy the staging buffer into the original one. */
si_copy_buffer((struct si_context*)ctx, transfer->resource,
@@ -568,20 +588,27 @@ static void si_buffer_transfer_unmap(struct pipe_context 
*ctx,
 
 static void si_buffer_subdata(struct pipe_context *ctx,
  struct pipe_resource *buffer,
  unsigned usage, unsigned offset,
  unsigned size, const void *data)
 {
struct pipe_transfer *transfer = NULL;
struct pipe_box box;
uint8_t *map = NULL;
 
+   if (size <= SI_TRANSFER_WRITE_DATA_THRESHOLD &&
+   offset % 4 == 0 && size % 4 == 0 && (uintptr_t)data % 4 == 0) {
+   si_buffer_write_data((struct si_context*)ctx,
+r600_resource(buffer), offset, size, data);
+   return;
+   }
+
u_box_1d(offset, size, );
map = si_buffer_transfer_map(ctx, buffer, 0,
   PIPE_TRANSFER_WRITE |
   PIPE_TRANSFER_DISCARD_RANGE |
   usage,
   , );
if (!map)
return;
 
memcpy(map, data, size);
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index 5bd3d9641d2..f79828f3438 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -94,20 +94,21 @@
 #define SI_PREFETCH_ES (1 << 3)
 #define SI_PREFETCH_GS (1 << 4)
 #define SI_PREFETCH_VS (1 << 5)
 #define SI_PREFETCH_PS (1 << 6)
 
 #define SI_MAX_BORDER_COLORS   4096
 #define SI_MAX_VIEWPORTS   16
 #define SIX_BITS   0x3F
 #define SI_MAP_BUFFER_ALIGNMENT64
 #define SI_MAX_VARIABLE_THREADS_PER_BLOCK 1024
+#define SI_TRANSFER_WRITE_DATA_THRESHOLD 64
 
 #define SI_RESOURCE_FLAG_TRANSFER  (PIPE_RESOURCE_FLAG_DRV_PRIV << 0)
 #define SI_RESOURCE_FLAG_FLUSHED_DEPTH (PIPE_RESOURCE_FLAG_DRV_PRIV << 1)
 #define SI_RESOURCE_FLAG_FORCE_MSAA_TILING (PIPE_RESOURCE_FLAG_DRV_PRIV << 2)
 #define SI_RESOURCE_FLAG_DISABLE_DCC   (PIPE_RESOURCE_FLAG_DRV_PRIV 
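
As a quick way to see which uploads take the new path, the eligibility check from `si_buffer_subdata` can be lifted into a standalone sketch. The threshold value comes from the si_pipe.h hunk above. The packet size follows the dwords emitted by `si_cp_write_data` in patch 4/8 (header, control, two address dwords, then the payload). `can_use_write_data` and `write_data_packet_dwords` are hypothetical helper names, not functions in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value from the si_pipe.h hunk above. */
#define SI_TRANSFER_WRITE_DATA_THRESHOLD 64

/* Same condition as the early-out added to si_buffer_subdata: small, and
 * dword-aligned in offset, size and source pointer. */
static bool can_use_write_data(unsigned offset, unsigned size, const void *data)
{
   return size <= SI_TRANSFER_WRITE_DATA_THRESHOLD &&
          offset % 4 == 0 && size % 4 == 0 && (uintptr_t)data % 4 == 0;
}

/* Command-stream dwords for one WRITE_DATA packet carrying size bytes:
 * 1 header + 1 control + 2 address + size/4 payload. */
static unsigned write_data_packet_dwords(unsigned size)
{
   assert(size % 4 == 0);
   return 4 + size / 4;
}
```

At the 64-byte threshold the packet is 20 dwords, so the largest upload on this path costs 80 bytes of command stream.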

[Mesa-dev] [PATCH 2/8] radeonsi: fix the top-of-pipe fence on SI

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

SI doesn't have DST_SEL == MEM, so use MEM_GRBM there.
---
 src/gallium/drivers/radeonsi/si_fence.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_fence.c 
b/src/gallium/drivers/radeonsi/si_fence.c
index be394119af6..46d0289c90b 100644
--- a/src/gallium/drivers/radeonsi/si_fence.c
+++ b/src/gallium/drivers/radeonsi/si_fence.c
@@ -259,21 +259,22 @@ static void si_fine_fence_set(struct si_context *ctx,
 
*fence_ptr = 0;
 
uint64_t fence_va = fine->buf->gpu_address + fine->offset;
 
radeon_add_to_buffer_list(ctx, ctx->gfx_cs, fine->buf,
  RADEON_USAGE_WRITE, RADEON_PRIO_QUERY);
if (flags & PIPE_FLUSH_TOP_OF_PIPE) {
struct radeon_cmdbuf *cs = ctx->gfx_cs;
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
-   radeon_emit(cs, S_370_DST_SEL(V_370_MEM) |
+   radeon_emit(cs, S_370_DST_SEL(ctx->chip_class >= CIK ? V_370_MEM
+ : V_370_MEM_GRBM) |
S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_PFP));
radeon_emit(cs, fence_va);
radeon_emit(cs, fence_va >> 32);
radeon_emit(cs, 0x8000);
} else if (flags & PIPE_FLUSH_BOTTOM_OF_PIPE) {
si_cp_release_mem(ctx,
  V_028A90_BOTTOM_OF_PIPE_TS, 0,
  EOP_DST_SEL_MEM, EOP_INT_SEL_NONE,
  EOP_DATA_SEL_VALUE_32BIT,
-- 
2.17.1



[Mesa-dev] [PATCH 4/8] radeonsi: move PKT3_WRITE_DATA generation into a helper function

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_cp_dma.c  | 25 +++
 src/gallium/drivers/radeonsi/si_descriptors.c | 10 ++--
 src/gallium/drivers/radeonsi/si_fence.c   | 21 ++--
 src/gallium/drivers/radeonsi/si_pipe.c| 13 ++
 src/gallium/drivers/radeonsi/si_pipe.h|  3 +++
 src/gallium/drivers/radeonsi/si_state_draw.c  | 12 +++--
 6 files changed, 43 insertions(+), 41 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_cp_dma.c 
b/src/gallium/drivers/radeonsi/si_cp_dma.c
index 80673f3f5f2..59360c0d4aa 100644
--- a/src/gallium/drivers/radeonsi/si_cp_dma.c
+++ b/src/gallium/drivers/radeonsi/si_cp_dma.c
@@ -574,10 +574,35 @@ void si_test_gds(struct si_context *sctx)
 
pipe_buffer_read(ctx, dst, 0, sizeof(r), r);
printf("GDS clear = %08x %08x %08x %08x -> %s\n", r[0], r[1], r[2], r[3],
r[0] == 0xc1ea4146 && r[1] == 0xc1ea4146 &&
r[2] == 0xc1ea4146 && r[3] == 0xc1ea4146 ? "pass" : "fail");
 
pipe_resource_reference(, NULL);
pipe_resource_reference(, NULL);
exit(0);
 }
+
+void si_cp_write_data(struct si_context *sctx, struct r600_resource *buf,
+ unsigned offset, unsigned size, unsigned dst_sel,
+ unsigned engine, const void *data)
+{
+   struct radeon_cmdbuf *cs = sctx->gfx_cs;
+
+   assert(offset % 4 == 0);
+   assert(size % 4 == 0);
+
+   if (sctx->chip_class == SI && dst_sel == V_370_MEM)
+   dst_sel = V_370_MEM_GRBM;
+
+   radeon_add_to_buffer_list(sctx, cs, buf,
+ RADEON_USAGE_WRITE, RADEON_PRIO_CP_DMA);
+   uint64_t va = buf->gpu_address + offset;
+
+   radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 2 + size/4, 0));
+   radeon_emit(cs, S_370_DST_SEL(dst_sel) |
+   S_370_WR_CONFIRM(1) |
+   S_370_ENGINE_SEL(engine));
+   radeon_emit(cs, va);
+   radeon_emit(cs, va >> 32);
+   radeon_emit_array(cs, (const uint32_t*)data, size/4);
+}
diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c
index 71ae00c53cb..ca62848296b 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -1814,35 +1814,29 @@ void si_rebind_buffer(struct si_context *sctx, struct pipe_resource *buf,
}
}
}
 }
 
 static void si_upload_bindless_descriptor(struct si_context *sctx,
  unsigned desc_slot,
  unsigned num_dwords)
 {
struct si_descriptors *desc = &sctx->bindless_descriptors;
-   struct radeon_cmdbuf *cs = sctx->gfx_cs;
unsigned desc_slot_offset = desc_slot * 16;
uint32_t *data;
uint64_t va;
 
data = desc->list + desc_slot_offset;
va = desc->gpu_address + desc_slot_offset * 4;
 
-   radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 2 + num_dwords, 0));
-   radeon_emit(cs, S_370_DST_SEL(V_370_TC_L2) |
-   S_370_WR_CONFIRM(1) |
-   S_370_ENGINE_SEL(V_370_ME));
-   radeon_emit(cs, va);
-   radeon_emit(cs, va >> 32);
-   radeon_emit_array(cs, data, num_dwords);
+   si_cp_write_data(sctx, desc->buffer, va - desc->buffer->gpu_address,
+num_dwords * 4, V_370_TC_L2, V_370_ME, data);
 }
 
 static void si_upload_bindless_descriptors(struct si_context *sctx)
 {
if (!sctx->bindless_descriptors_dirty)
return;
 
/* Wait for graphics/compute to be idle before updating the resident
 * descriptors directly in memory, in case the GPU is using them.
 */
diff --git a/src/gallium/drivers/radeonsi/si_fence.c b/src/gallium/drivers/radeonsi/si_fence.c
index 46d0289c90b..84bf4d10c20 100644
--- a/src/gallium/drivers/radeonsi/si_fence.c
+++ b/src/gallium/drivers/radeonsi/si_fence.c
@@ -252,35 +252,30 @@ static void si_fine_fence_set(struct si_context *ctx,
assert(util_bitcount(flags & (PIPE_FLUSH_TOP_OF_PIPE | PIPE_FLUSH_BOTTOM_OF_PIPE)) == 1);
 
/* Use uncached system memory for the fence. */
u_upload_alloc(ctx->cached_gtt_allocator, 0, 4, 4,
   &fine->offset, (struct pipe_resource **)&fine->buf, (void **)&fence_ptr);
if (!fine->buf)
return;
 
*fence_ptr = 0;
 
-   uint64_t fence_va = fine->buf->gpu_address + fine->offset;
-
-   radeon_add_to_buffer_list(ctx, ctx->gfx_cs, fine->buf,
- RADEON_USAGE_WRITE, RADEON_PRIO_QUERY);
if (flags & PIPE_FLUSH_TOP_OF_PIPE) {
-   struct radeon_cmdbuf *cs = ctx->gfx_cs;
-   radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
-   radeon_emit(cs, S_370_DST_SEL(ctx->chip_class >= CIK ? V_370_MEM
-: 
V_370_MEM_GRBM) 

[Mesa-dev] [PATCH 0/8] RadeonSI: PKT3_WRITE_DATA for small uploads

2019-01-18 Thread Marek Olšák
Hi,

These uploads should have lower CPU overhead.

There are also some cleanups around the WRITE_DATA packet.

Please review.

Thanks,
Marek
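
For readers following the WRITE_DATA cleanups, here is a minimal, CPU-only sketch of the packet layout that the helper in patch 4/8 emits: a header, a control dword, a 64-bit address split into two dwords, then the payload. The macro names mirror the ones used in the patches, but the bit positions and the DST_SEL/ENGINE values below are assumptions written from memory of Mesa's sid.h, and build_write_data() and pick_dst_sel() are hypothetical names that do not exist in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED encodings, modeled on Mesa's sid.h; verify against the real header. */
#define PKT3_WRITE_DATA 0x37
#define PKT3(op, count, predicate) \
    ((3u << 30) | (((uint32_t)(count) & 0x3FFF) << 16) | \
     (((uint32_t)(op) & 0xFF) << 8) | ((uint32_t)(predicate) & 0x1))
#define S_370_DST_SEL(x)    (((uint32_t)(x) & 0xf) << 8)
#define S_370_WR_CONFIRM(x) (((uint32_t)(x) & 0x1) << 20)
#define S_370_ENGINE_SEL(x) (((uint32_t)(x) & 0x3) << 30)

/* ASSUMED field values (hypothetical names for this sketch). */
enum { DST_SEL_MEM_GRBM = 1, DST_SEL_MEM = 5 };
enum { ENGINE_ME = 0, ENGINE_PFP = 1 };

/* Mirrors the chip check in the patch: SI falls back to the GRBM path. */
unsigned pick_dst_sel(int is_si, unsigned dst_sel)
{
    if (is_si && dst_sel == DST_SEL_MEM)
        return DST_SEL_MEM_GRBM;
    return dst_sel;
}

/* Writes one WRITE_DATA packet into out[] and returns the dword count. */
size_t build_write_data(uint32_t *out, uint64_t va, unsigned dst_sel,
                        unsigned engine, const uint32_t *data,
                        unsigned size_bytes)
{
    size_t n = 0;
    unsigned num_dwords = size_bytes / 4;

    assert(va % 4 == 0);
    assert(size_bytes % 4 == 0);

    /* The count field is "dwords after the header, minus one":
     * control + va_lo + va_hi + payload = 3 + num_dwords. */
    out[n++] = PKT3(PKT3_WRITE_DATA, 2 + num_dwords, 0);
    out[n++] = S_370_DST_SEL(dst_sel) |
               S_370_WR_CONFIRM(1) |
               S_370_ENGINE_SEL(engine);
    out[n++] = (uint32_t)va;         /* low 32 bits of the GPU address */
    out[n++] = (uint32_t)(va >> 32); /* high 32 bits */
    for (unsigned i = 0; i < num_dwords; i++)
        out[n++] = data[i];
    return n;
}
```

With a single payload dword this yields five dwords, which matches the PKT3(PKT3_WRITE_DATA, 3, 0) header used for the fence write in si_fence.c.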


[Mesa-dev] [PATCH 5/8] radeonsi: wrap si_transfer::staging in a union

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

The union will be used later.
---
 src/gallium/drivers/radeonsi/si_buffer.c  |  8 +++
 src/gallium/drivers/radeonsi/si_pipe.h|  4 +++-
 src/gallium/drivers/radeonsi/si_texture.c | 26 +++
 3 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_buffer.c b/src/gallium/drivers/radeonsi/si_buffer.c
index c7260e06ccf..4766cf4bdfa 100644
--- a/src/gallium/drivers/radeonsi/si_buffer.c
+++ b/src/gallium/drivers/radeonsi/si_buffer.c
@@ -350,21 +350,21 @@ static void *si_buffer_get_transfer(struct pipe_context *ctx,
 
transfer->b.b.resource = NULL;
pipe_resource_reference(&transfer->b.b.resource, resource);
transfer->b.b.level = 0;
transfer->b.b.usage = usage;
transfer->b.b.box = *box;
transfer->b.b.stride = 0;
transfer->b.b.layer_stride = 0;
transfer->b.staging = NULL;
transfer->offset = offset;
-   transfer->staging = staging;
+   transfer->u.staging = staging;
*ptransfer = &transfer->b.b;
return data;
 }
 
 static void *si_buffer_transfer_map(struct pipe_context *ctx,
struct pipe_resource *resource,
unsigned level,
unsigned usage,
const struct pipe_box *box,
struct pipe_transfer **ptransfer)
@@ -513,24 +513,24 @@ static void *si_buffer_transfer_map(struct pipe_context *ctx,
ptransfer, data, NULL, 0);
 }
 
 static void si_buffer_do_flush_region(struct pipe_context *ctx,
  struct pipe_transfer *transfer,
  const struct pipe_box *box)
 {
struct si_transfer *stransfer = (struct si_transfer*)transfer;
struct r600_resource *rbuffer = r600_resource(transfer->resource);
 
-   if (stransfer->staging) {
+   if (stransfer->u.staging) {
/* Copy the staging buffer into the original one. */
si_copy_buffer((struct si_context*)ctx, transfer->resource,
-  &stransfer->staging->b.b, box->x,
+  &stransfer->u.staging->b.b, box->x,
   stransfer->offset + box->x % SI_MAP_BUFFER_ALIGNMENT,
   box->width);
}
 
util_range_add(&rbuffer->valid_buffer_range, box->x,
   box->x + box->width);
 }
 
 static void si_buffer_flush_region(struct pipe_context *ctx,
   struct pipe_transfer *transfer,
@@ -550,21 +550,21 @@ static void si_buffer_flush_region(struct pipe_context *ctx,
 static void si_buffer_transfer_unmap(struct pipe_context *ctx,
 struct pipe_transfer *transfer)
 {
struct si_context *sctx = (struct si_context*)ctx;
struct si_transfer *stransfer = (struct si_transfer*)transfer;
 
if (transfer->usage & PIPE_TRANSFER_WRITE &&
!(transfer->usage & PIPE_TRANSFER_FLUSH_EXPLICIT))
si_buffer_do_flush_region(ctx, transfer, &transfer->box);
 
-   r600_resource_reference(&stransfer->staging, NULL);
+   r600_resource_reference(&stransfer->u.staging, NULL);
assert(stransfer->b.staging == NULL); /* for threaded context only */
pipe_resource_reference(&transfer->resource, NULL);
 
/* Don't use pool_transfers_unsync. We are always in the driver
 * thread. */
slab_free(&sctx->pool_transfers, transfer);
 }
 
 static void si_buffer_subdata(struct pipe_context *ctx,
  struct pipe_resource *buffer,
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h
index d874f215a21..5bd3d9641d2 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -245,21 +245,23 @@ struct r600_resource {
/* Whether this resource is referenced by bindless handles. */
bool texture_handle_allocated;
bool image_handle_allocated;
 
/* Whether the resource has been exported via resource_get_handle. */
unsigned external_usage; /* PIPE_HANDLE_USAGE_* */
 };
 
 struct si_transfer {
struct threaded_transfer b;
-   struct r600_resource *staging;
+   union {
+   struct r600_resource *staging;
+   } u;
unsigned offset;
 };
 
 struct si_texture {
struct r600_resource buffer;
 
struct radeon_surf surface;
uint64_t size;
struct si_texture *flushed_depth_texture;
 
diff --git a/src/gallium/drivers/radeonsi/si_texture.c b/src/gallium/drivers/radeonsi/si_texture.c
index 585f58c1e38..8f81c777aba 100644
--- a/src/gallium/drivers/radeonsi/si_texture.c
+++ 

Re: [Mesa-dev] [PATCH] anv: Re-sort the extensions list

2019-01-18 Thread Jason Ekstrand
Thanks!  Pushed.

On Fri, Jan 18, 2019 at 10:30 AM Lionel Landwerlin <
lionel.g.landwer...@intel.com> wrote:

> Reviewed-by: Lionel Landwerlin 
>
> On 18/01/2019 16:24, Jason Ekstrand wrote:
> > I like to keep things in good order so that you can find them.
> > ---
> >   src/intel/vulkan/anv_extensions.py | 12 ++--
> >   1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/src/intel/vulkan/anv_extensions.py
> b/src/intel/vulkan/anv_extensions.py
> > index 2ea4cab0e97..2d212361955 100644
> > --- a/src/intel/vulkan/anv_extensions.py
> > +++ b/src/intel/vulkan/anv_extensions.py
> > @@ -71,8 +71,8 @@ MAX_API_VERSION = None # Computed later
> >   EXTENSIONS = [
> >   Extension('VK_ANDROID_external_memory_android_hardware_buffer', 3,
> 'ANDROID'),
> >   Extension('VK_ANDROID_native_buffer', 5,
> 'ANDROID'),
> > -Extension('VK_KHR_16bit_storage', 1,
> 'device->info.gen >= 8'),
> >   Extension('VK_KHR_8bit_storage',  1,
> 'device->info.gen >= 8'),
> > +Extension('VK_KHR_16bit_storage', 1,
> 'device->info.gen >= 8'),
> >   Extension('VK_KHR_bind_memory2',  1, True),
> >   Extension('VK_KHR_create_renderpass2',1, True),
> >   Extension('VK_KHR_dedicated_allocation',  1, True),
> > @@ -80,6 +80,7 @@ EXTENSIONS = [
> >   Extension('VK_KHR_descriptor_update_template',1, True),
> >   Extension('VK_KHR_device_group',  1, True),
> >   Extension('VK_KHR_device_group_creation', 1, True),
> > +Extension('VK_KHR_display',  23,
> 'VK_USE_PLATFORM_DISPLAY_KHR'),
> >   Extension('VK_KHR_driver_properties', 1, True),
> >   Extension('VK_KHR_external_fence',1,
> > 'device->has_syncobj_wait'),
> > @@ -101,6 +102,7 @@ EXTENSIONS = [
> >   Extension('VK_KHR_maintenance1',  1, True),
> >   Extension('VK_KHR_maintenance2',  1, True),
> >   Extension('VK_KHR_maintenance3',  1, True),
> > +Extension('VK_KHR_multiview', 1, True),
> >   Extension('VK_KHR_push_descriptor',   1, True),
> >   Extension('VK_KHR_relaxed_block_layout',  1, True),
> >   Extension('VK_KHR_sampler_mirror_clamp_to_edge',  1, True),
> > @@ -113,9 +115,8 @@ EXTENSIONS = [
> >   Extension('VK_KHR_wayland_surface',   6,
> 'VK_USE_PLATFORM_WAYLAND_KHR'),
> >   Extension('VK_KHR_xcb_surface',   6,
> 'VK_USE_PLATFORM_XCB_KHR'),
> >   Extension('VK_KHR_xlib_surface',  6,
> 'VK_USE_PLATFORM_XLIB_KHR'),
> > -Extension('VK_KHR_multiview', 1, True),
> > -Extension('VK_KHR_display',  23,
> 'VK_USE_PLATFORM_DISPLAY_KHR'),
> >   Extension('VK_EXT_acquire_xlib_display',  1,
> 'VK_USE_PLATFORM_XLIB_XRANDR_EXT'),
> > +Extension('VK_EXT_calibrated_timestamps', 1, True),
> >   Extension('VK_EXT_debug_report',  8, True),
> >   Extension('VK_EXT_direct_mode_display',   1,
> 'VK_USE_PLATFORM_DISPLAY_KHR'),
> >   Extension('VK_EXT_display_control',   1,
> 'VK_USE_PLATFORM_DISPLAY_KHR'),
> > @@ -124,13 +125,12 @@ EXTENSIONS = [
> >   Extension('VK_EXT_global_priority',   1,
> > 'device->has_context_priority'),
> >   Extension('VK_EXT_pci_bus_info',  2, True),
> > +Extension('VK_EXT_post_depth_coverage',   1,
> 'device->info.gen >= 9'),
> > +Extension('VK_EXT_sampler_filter_minmax', 1,
> 'device->info.gen >= 9'),
> >   Extension('VK_EXT_scalar_block_layout',   1, True),
> >   Extension('VK_EXT_shader_viewport_index_layer',   1, True),
> >   Extension('VK_EXT_shader_stencil_export', 1,
> 'device->info.gen >= 9'),
> >   Extension('VK_EXT_vertex_attribute_divisor',  3, True),
> > -Extension('VK_EXT_post_depth_coverage',   1,
> 'device->info.gen >= 9'),
> > -Extension('VK_EXT_sampler_filter_minmax', 1,
> 'device->info.gen >= 9'),
> > -Extension('VK_EXT_calibrated_timestamps', 1, True),
> >   Extension('VK_GOOGLE_decorate_string',1, True),
> >   Extension('VK_GOOGLE_hlsl_functionality1',1, True),
> >   ]
>
>
>


Re: [Mesa-dev] [PATCH] anv: Re-sort the extensions list

2019-01-18 Thread Lionel Landwerlin

Reviewed-by: Lionel Landwerlin 

On 18/01/2019 16:24, Jason Ekstrand wrote:

I like to keep things in good order so that you can find them.
---
  src/intel/vulkan/anv_extensions.py | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/intel/vulkan/anv_extensions.py b/src/intel/vulkan/anv_extensions.py
index 2ea4cab0e97..2d212361955 100644
--- a/src/intel/vulkan/anv_extensions.py
+++ b/src/intel/vulkan/anv_extensions.py
@@ -71,8 +71,8 @@ MAX_API_VERSION = None # Computed later
  EXTENSIONS = [
  Extension('VK_ANDROID_external_memory_android_hardware_buffer', 3, 
'ANDROID'),
  Extension('VK_ANDROID_native_buffer', 5, 'ANDROID'),
-Extension('VK_KHR_16bit_storage', 1, 'device->info.gen 
>= 8'),
  Extension('VK_KHR_8bit_storage',  1, 'device->info.gen 
>= 8'),
+Extension('VK_KHR_16bit_storage', 1, 'device->info.gen 
>= 8'),
  Extension('VK_KHR_bind_memory2',  1, True),
  Extension('VK_KHR_create_renderpass2',1, True),
  Extension('VK_KHR_dedicated_allocation',  1, True),
@@ -80,6 +80,7 @@ EXTENSIONS = [
  Extension('VK_KHR_descriptor_update_template',1, True),
  Extension('VK_KHR_device_group',  1, True),
  Extension('VK_KHR_device_group_creation', 1, True),
+Extension('VK_KHR_display',  23, 
'VK_USE_PLATFORM_DISPLAY_KHR'),
  Extension('VK_KHR_driver_properties', 1, True),
  Extension('VK_KHR_external_fence',1,
'device->has_syncobj_wait'),
@@ -101,6 +102,7 @@ EXTENSIONS = [
  Extension('VK_KHR_maintenance1',  1, True),
  Extension('VK_KHR_maintenance2',  1, True),
  Extension('VK_KHR_maintenance3',  1, True),
+Extension('VK_KHR_multiview', 1, True),
  Extension('VK_KHR_push_descriptor',   1, True),
  Extension('VK_KHR_relaxed_block_layout',  1, True),
  Extension('VK_KHR_sampler_mirror_clamp_to_edge',  1, True),
@@ -113,9 +115,8 @@ EXTENSIONS = [
  Extension('VK_KHR_wayland_surface',   6, 
'VK_USE_PLATFORM_WAYLAND_KHR'),
  Extension('VK_KHR_xcb_surface',   6, 
'VK_USE_PLATFORM_XCB_KHR'),
  Extension('VK_KHR_xlib_surface',  6, 
'VK_USE_PLATFORM_XLIB_KHR'),
-Extension('VK_KHR_multiview', 1, True),
-Extension('VK_KHR_display',  23, 
'VK_USE_PLATFORM_DISPLAY_KHR'),
  Extension('VK_EXT_acquire_xlib_display',  1, 
'VK_USE_PLATFORM_XLIB_XRANDR_EXT'),
+Extension('VK_EXT_calibrated_timestamps', 1, True),
  Extension('VK_EXT_debug_report',  8, True),
  Extension('VK_EXT_direct_mode_display',   1, 
'VK_USE_PLATFORM_DISPLAY_KHR'),
  Extension('VK_EXT_display_control',   1, 
'VK_USE_PLATFORM_DISPLAY_KHR'),
@@ -124,13 +125,12 @@ EXTENSIONS = [
  Extension('VK_EXT_global_priority',   1,
'device->has_context_priority'),
  Extension('VK_EXT_pci_bus_info',  2, True),
+Extension('VK_EXT_post_depth_coverage',   1, 'device->info.gen 
>= 9'),
+Extension('VK_EXT_sampler_filter_minmax', 1, 'device->info.gen 
>= 9'),
  Extension('VK_EXT_scalar_block_layout',   1, True),
  Extension('VK_EXT_shader_viewport_index_layer',   1, True),
  Extension('VK_EXT_shader_stencil_export', 1, 'device->info.gen 
>= 9'),
  Extension('VK_EXT_vertex_attribute_divisor',  3, True),
-Extension('VK_EXT_post_depth_coverage',   1, 'device->info.gen 
>= 9'),
-Extension('VK_EXT_sampler_filter_minmax', 1, 'device->info.gen 
>= 9'),
-Extension('VK_EXT_calibrated_timestamps', 1, True),
  Extension('VK_GOOGLE_decorate_string',1, True),
  Extension('VK_GOOGLE_hlsl_functionality1',1, True),
  ]





[Mesa-dev] [PATCH] st/mesa: fix PRIMITIVES_GENERATED query after the "pipeline stat single" changes

2019-01-18 Thread Marek Olšák
From: Marek Olšák 

---
 src/mesa/state_tracker/st_cb_queryobj.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_cb_queryobj.c b/src/mesa/state_tracker/st_cb_queryobj.c
index abb126547c9..642b901d05a 100644
--- a/src/mesa/state_tracker/st_cb_queryobj.c
+++ b/src/mesa/state_tracker/st_cb_queryobj.c
@@ -84,21 +84,22 @@ st_DeleteQuery(struct gl_context *ctx, struct gl_query_object *q)
struct st_query_object *stq = st_query_object(q);
 
free_queries(pipe, stq);
 
free(stq);
 }
 
 static int
 target_to_index(const struct st_context *st, const struct gl_query_object *q)
 {
-   if (q->Target == GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN ||
+   if (q->Target == GL_PRIMITIVES_GENERATED ||
+   q->Target == GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN ||
q->Target == GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW_ARB)
   return q->Stream;
 
if (st->has_single_pipe_stat) {
   switch (q->Target) {
   case GL_VERTICES_SUBMITTED_ARB:
  return PIPE_STAT_QUERY_IA_VERTICES;
   case GL_PRIMITIVES_SUBMITTED_ARB:
  return PIPE_STAT_QUERY_IA_PRIMITIVES;
   case GL_VERTEX_SHADER_INVOCATIONS_ARB:
-- 
2.17.1



[Mesa-dev] [PATCH] anv: Re-sort the extensions list

2019-01-18 Thread Jason Ekstrand
I like to keep things in good order so that you can find them.
---
 src/intel/vulkan/anv_extensions.py | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/intel/vulkan/anv_extensions.py b/src/intel/vulkan/anv_extensions.py
index 2ea4cab0e97..2d212361955 100644
--- a/src/intel/vulkan/anv_extensions.py
+++ b/src/intel/vulkan/anv_extensions.py
@@ -71,8 +71,8 @@ MAX_API_VERSION = None # Computed later
 EXTENSIONS = [
 Extension('VK_ANDROID_external_memory_android_hardware_buffer', 3, 
'ANDROID'),
 Extension('VK_ANDROID_native_buffer', 5, 'ANDROID'),
-Extension('VK_KHR_16bit_storage', 1, 'device->info.gen 
>= 8'),
 Extension('VK_KHR_8bit_storage',  1, 'device->info.gen 
>= 8'),
+Extension('VK_KHR_16bit_storage', 1, 'device->info.gen 
>= 8'),
 Extension('VK_KHR_bind_memory2',  1, True),
 Extension('VK_KHR_create_renderpass2',1, True),
 Extension('VK_KHR_dedicated_allocation',  1, True),
@@ -80,6 +80,7 @@ EXTENSIONS = [
 Extension('VK_KHR_descriptor_update_template',1, True),
 Extension('VK_KHR_device_group',  1, True),
 Extension('VK_KHR_device_group_creation', 1, True),
+Extension('VK_KHR_display',  23, 
'VK_USE_PLATFORM_DISPLAY_KHR'),
 Extension('VK_KHR_driver_properties', 1, True),
 Extension('VK_KHR_external_fence',1,
   'device->has_syncobj_wait'),
@@ -101,6 +102,7 @@ EXTENSIONS = [
 Extension('VK_KHR_maintenance1',  1, True),
 Extension('VK_KHR_maintenance2',  1, True),
 Extension('VK_KHR_maintenance3',  1, True),
+Extension('VK_KHR_multiview', 1, True),
 Extension('VK_KHR_push_descriptor',   1, True),
 Extension('VK_KHR_relaxed_block_layout',  1, True),
 Extension('VK_KHR_sampler_mirror_clamp_to_edge',  1, True),
@@ -113,9 +115,8 @@ EXTENSIONS = [
 Extension('VK_KHR_wayland_surface',   6, 
'VK_USE_PLATFORM_WAYLAND_KHR'),
 Extension('VK_KHR_xcb_surface',   6, 
'VK_USE_PLATFORM_XCB_KHR'),
 Extension('VK_KHR_xlib_surface',  6, 
'VK_USE_PLATFORM_XLIB_KHR'),
-Extension('VK_KHR_multiview', 1, True),
-Extension('VK_KHR_display',  23, 
'VK_USE_PLATFORM_DISPLAY_KHR'),
 Extension('VK_EXT_acquire_xlib_display',  1, 
'VK_USE_PLATFORM_XLIB_XRANDR_EXT'),
+Extension('VK_EXT_calibrated_timestamps', 1, True),
 Extension('VK_EXT_debug_report',  8, True),
 Extension('VK_EXT_direct_mode_display',   1, 
'VK_USE_PLATFORM_DISPLAY_KHR'),
 Extension('VK_EXT_display_control',   1, 
'VK_USE_PLATFORM_DISPLAY_KHR'),
@@ -124,13 +125,12 @@ EXTENSIONS = [
 Extension('VK_EXT_global_priority',   1,
   'device->has_context_priority'),
 Extension('VK_EXT_pci_bus_info',  2, True),
+Extension('VK_EXT_post_depth_coverage',   1, 'device->info.gen 
>= 9'),
+Extension('VK_EXT_sampler_filter_minmax', 1, 'device->info.gen 
>= 9'),
 Extension('VK_EXT_scalar_block_layout',   1, True),
 Extension('VK_EXT_shader_viewport_index_layer',   1, True),
 Extension('VK_EXT_shader_stencil_export', 1, 'device->info.gen 
>= 9'),
 Extension('VK_EXT_vertex_attribute_divisor',  3, True),
-Extension('VK_EXT_post_depth_coverage',   1, 'device->info.gen 
>= 9'),
-Extension('VK_EXT_sampler_filter_minmax', 1, 'device->info.gen 
>= 9'),
-Extension('VK_EXT_calibrated_timestamps', 1, True),
 Extension('VK_GOOGLE_decorate_string',1, True),
 Extension('VK_GOOGLE_hlsl_functionality1',1, True),
 ]
-- 
2.20.1



Re: [Mesa-dev] Double free error on etnaviv driver.

2019-01-18 Thread Lucas Stach
Hi Sergey,

Am Donnerstag, den 17.01.2019, 18:14 +0300 schrieb Nazarov Sergey:
> Hi, Lucas!
> Here is the result of running the standard Qt5 example application mainwindow,
> built under a custom buildroot with gcc-4.9.4 and mesa3d-17.3.6.
> We had the same error with the latest buildroot and the latest supported mesa
> and gcc.
> ---
> # ./mainwindow
> QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/root/.tmp/runtime-root'
> Failed to move cursor on screen LVDS1: -14
> *** Error in `./mainwindow': double free or corruption (fasttop): 0x021295b8 ***
> Aborted
> ---
> Some of the valgrind output:
> ==1473== Invalid free() / delete / delete[] / realloc()
> ==1473==    at 0x48469F8: free (vg_replace_malloc.c:530)
> ==1473==    by 0x825C61B: _mesa_glsl_release_builtin_functions() (in /usr/lib/dri/imx-drm_dri.so)
> ==1473==    by 0x8286113: _mesa_destroy_shader_compiler (in /usr/lib/dri/imx-drm_dri.so)
> ==1473==    by 0x808F073: one_time_fini (in /usr/lib/dri/imx-drm_dri.so)
> ==1473==  Address 0x96f4ee8 is 0 bytes inside a block of size 24 free'd
> ==1473==    at 0x48469F8: free (vg_replace_malloc.c:530)
> ==1473==    by 0x4015CF7: _dl_close_worker (in /lib/ld-2.23.so)
> ==1473==  Block was alloc'd at
> ==1473==    at 0x48454B0: malloc (vg_replace_malloc.c:299)
> ==1473==    by 0x829EA7F: ralloc_size (in /usr/lib/dri/imx-drm_dri.so)
I've just been made aware of a peculiarity of the buildroot toolchain
configuration, which might well explain why you are seeing the obvious
crash, but no one else is complaining about this.

Can you check whether [1] also solves this issue for you? If it does, I think
we should not try to work around this in Mesa.

Regards,
Lucas

[1] http://lists.busybox.net/pipermail/buildroot/2018-November/235923.html


[Mesa-dev] [PATCH] glsl/lower_output_reads: set invariant and precise flags on temporaries

2019-01-18 Thread Karol Herbst
Fixes a couple of dEQP tests (on nvc0 and potentially other drivers):
dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_1
dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_2
dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_3
dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_1
dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_2
dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_3
dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_1
dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_2
dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_3

Signed-off-by: Karol Herbst 
---
 src/compiler/glsl/lower_output_reads.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/compiler/glsl/lower_output_reads.cpp b/src/compiler/glsl/lower_output_reads.cpp
index bd3accb3dda..6d4132854f5 100644
--- a/src/compiler/glsl/lower_output_reads.cpp
+++ b/src/compiler/glsl/lower_output_reads.cpp
@@ -101,6 +101,9 @@ output_read_remover::visit(ir_dereference_variable *ir)
   void *var_ctx = ralloc_parent(ir->var);
   temp = new(var_ctx) ir_variable(ir->var->type, ir->var->name,
   ir_var_temporary);
+  /* save invariant and precise flags */
+  temp->data.invariant = ir->var->data.invariant;
+  temp->data.precise = ir->var->data.precise;
   _mesa_hash_table_insert(replacements, ir->var, temp);
   ir->var->insert_after(temp);
}
-- 
2.20.1



Re: [Mesa-dev] [PATCH v3 28/42] intel/compiler: handle 64-bit float to 8-bit integer conversions

2019-01-18 Thread Jason Ekstrand



On January 18, 2019 06:23:35 Iago Toral  wrote:

On Fri, 2019-01-18 at 12:13 +0100, Iago Toral wrote:

On Thu, 2019-01-17 at 17:12 -0600, Jason Ekstrand wrote:
This patch doesn't really do what the commit message says.  What it really
does is implement float -> 8-bit conversions for *any* size float.


On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga  wrote:

These are not directly supported in hardware and brw_nir_lower_conversions
should have taken care of that before we get here. Also, while we are
at it, make sure 64-bit integer to 8-bit are also properly split by
the same lowering pass, since they have the same hardware restrictions.


Now that we have a lowering pass, having separate cases just so one of them 
can assert seems silly.  If anything, we should just do


if (result.type == BRW_REGISTER_TYPE_B ||
   result.type == BRW_REGISTER_TYPE_UB ||
   result.type == BRW_REGISTER_TYPE_HF)
  assert(type_sz(op[0].type) < 8); /* brw_nir_lower_conversions */

and have it all in one big case.  The only special case we need is for 
booleans where we need to negate them and fall through.


There are more cases, since the inverse conversions of these are not 
supported either. I guess I'll just add this as well:


if (op[0].type == BRW_REGISTER_TYPE_B ||
op[0].type == BRW_REGISTER_TYPE_UB ||
op[0].type == BRW_REGISTER_TYPE_HF)
assert(type_sz(result.type) < 8); /* brw_nir_lower_conversions */


Oh, and there are also the rounding opcodes for f16 destinations (plus the
big comment about brw_F32TO16 that comes with their fallthrough)... I think
we might want to keep the three f2f16* opcodes as a separate block and do
as you suggest for the remaining ones, if that is okay.


That's fine as they actually do do something different.

-Jason




--Jason


---
src/intel/compiler/brw_fs_nir.cpp | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp

index cf546b8ff09..e454578d99b 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -786,6 +786,10 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)

   case nir_op_f2f16:
   case nir_op_i2f16:
   case nir_op_u2f16:
+   case nir_op_i2i8:
+   case nir_op_u2u8:
+   case nir_op_f2i8:
+   case nir_op_f2u8:
  assert(type_sz(op[0].type) < 8); /* brw_nir_lower_conversions */
  inst = bld.MOV(result, op[0]);
  inst->saturate = instr->dest.saturate;
@@ -824,8 +828,6 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)

   case nir_op_u2u32:
   case nir_op_i2i16:
   case nir_op_u2u16:
-   case nir_op_i2i8:
-   case nir_op_u2u8:
  inst = bld.MOV(result, op[0]);
  inst->saturate = instr->dest.saturate;
  break;


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 21/42] intel/compiler: set correct precision fields for 3-source float instructions

2019-01-18 Thread Jason Ekstrand


On January 18, 2019 04:47:51 Iago Toral  wrote:

On Thu, 2019-01-17 at 14:18 -0600, Jason Ekstrand wrote:

On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga  wrote:

Source0 and Destination extract the floating-point precision automatically
from the SrcType and DstType instruction fields respectively when they are
set to types :F or :HF. For Source1 and Source2 operands, we use the new
1-bit fields Src1Type and Src2Type, where 0 means normal precision and 1
means half-precision. Since we always use the type of the destination for
all operands when we emit 3-source instructions, we only need set Src1Type
and Src2Type to 1 when we are emitting a half-precision instruction.

v2:
- Set the bit separately for each source based on its type so we can
  do mixed floating-point mode in the future (Topi).

Reviewed-by: Topi Pohjolainen 
---
src/intel/compiler/brw_eu_emit.c | 16 
1 file changed, 16 insertions(+)

diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c

index a785f96b650..2fa89f8a2a3 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -801,6 +801,22 @@ brw_alu3(struct brw_codegen *p, unsigned opcode, struct brw_reg dest,

  */
 brw_inst_set_3src_a16_src_type(devinfo, inst, dest.type);
 brw_inst_set_3src_a16_dst_type(devinfo, inst, dest.type);
+
+ /* From the Bspec: Instruction types
+  *
+  * Three source instructions can use operands with mixed-mode
+  * precision. When SrcType field is set to :f or :hf it defines
+  * precision for source 0 only, and fields Src1Type and Src2Type
+  * define precision for other source operands:
+  *
+  *   0b = :f. Single precision Float (32-bit).
+  *   1b = :hf. Half precision Float (16-bit).
+  */
+ if (src1.type == BRW_REGISTER_TYPE_HF)
+brw_inst_set_3src_a16_src1_type(devinfo, inst, 1);


Maybe worth throwing in an

assert(src0.type == BRW_REGISTER_TYPE_F || src0.type == BRW_REGISTER_TYPE_HF);

just to be sure?


If we are going to do this I guess we should also check the same for src2.


Yeah, it'd probably be good to have a general assertion that the three 
sources have the same type with the caveat that they can vary to mix half 
and full float.  Maybe that would be better than something specific right here.






Either way, this and patch 20 are

Reviewed-by: Jason Ekstrand 


+
+ if (src2.type == BRW_REGISTER_TYPE_HF)
+brw_inst_set_3src_a16_src2_type(devinfo, inst, 1);
  }


And the same here (for src0 and src1)


   }
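
The general check discussed in this thread (all three sources share a type, except that half and full float may be mixed) together with the per-source precision bit can be sketched roughly as below. This is only an illustration under stated assumptions: the enum and the function names are hypothetical stand-ins, not the brw_reg types or brw_inst setters from the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the register types used in the thread. */
enum reg_type { REG_TYPE_F, REG_TYPE_HF, REG_TYPE_D };

static bool is_float_type(enum reg_type t)
{
    return t == REG_TYPE_F || t == REG_TYPE_HF;
}

/* True when the three sources agree on a type, with the caveat from the
 * review: :f and :hf sources may be freely mixed. */
bool alu3_src_types_ok(enum reg_type src0, enum reg_type src1,
                       enum reg_type src2)
{
    if (is_float_type(src0) && is_float_type(src1) && is_float_type(src2))
        return true;
    return src0 == src1 && src1 == src2;
}

/* Value for the 1-bit Src1Type/Src2Type fields quoted in the patch:
 * 0b = :f (32-bit float), 1b = :hf (16-bit float). */
unsigned src_precision_bit(enum reg_type t)
{
    return t == REG_TYPE_HF ? 1u : 0u;
}
```

A caller would assert alu3_src_types_ok() once and then derive each source's bit independently, which keeps the door open for the mixed floating-point mode mentioned in the v2 note.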


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 19/42] intel/compiler: don't compact 3-src instructions with Src1Type or Src2Type bits

2019-01-18 Thread Jason Ekstrand


On January 18, 2019 04:12:44 Iago Toral  wrote:

On Thu, 2019-01-17 at 14:14 -0600, Jason Ekstrand wrote:

On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga  wrote:

We are now using these bits, so don't assert that they are not set, just
avoid compaction in that case.

Reviewed-by: Topi Pohjolainen 
---
src/intel/compiler/brw_eu_compact.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_eu_compact.c b/src/intel/compiler/brw_eu_compact.c

index ae14ef10ec0..20fed254331 100644
--- a/src/intel/compiler/brw_eu_compact.c
+++ b/src/intel/compiler/brw_eu_compact.c
@@ -928,8 +928,11 @@ has_3src_unmapped_bits(const struct gen_device_info *devinfo,

  assert(!brw_inst_bits(src, 127, 126) &&
 !brw_inst_bits(src, 105, 105) &&
 !brw_inst_bits(src, 84, 84) &&
- !brw_inst_bits(src, 36, 35) &&
 !brw_inst_bits(src, 7,  7));
+
+  /* Src1Type and Src2Type, used for mixed-precision floating point */
+  if (brw_inst_bits(src, 36, 35))
+ return true;


You're only doing this in the Broadwell case.  What about SKL+ and CHV?
Can we compact mixed-precision stuff there?  Looks like maybe we can, but
there should be at least something in the commit message about that.


On these platforms compaction is possible in some cases, and
set_3src_control_index() takes care of this by including these bits in a
table lookup for accepted combinations. I can add this to the commit message:

"We are now using these bits, so don't assert that they are not set. On
gen8, if these bits are set, compaction is not possible. On gen9 and CHV
platforms, set_3src_control_index() checks these bits (and others) against
a table to validate whether the particular bit combination is eligible for
compaction or not."

With that said, if I am reading this correctly, it looks like all entries 
in gen8_3src_control_index_table that allow compaction require these bits 
to be zero at present, so I guess that right now we could also just extend 
the check I am adding here for BDW to other platforms, however, I guess 
that relying on the array with accepted bit combinations for compaction is 
more reliable should any future platforms allow for more combinations.


Sounds good. With that commit message update, r-b me.




   }

   return false;


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 08/42] intel/compiler: implement 16-bit fsign

2019-01-18 Thread Jason Ekstrand



On January 18, 2019 01:56:05 Iago Toral wrote:

On Thu, 2019-01-17 at 13:55 -0600, Jason Ekstrand wrote:

On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga  wrote:

v2:
- make 16-bit be its own separate case (Jason)

Reviewed-by: Topi Pohjolainen 
---
src/intel/compiler/brw_fs_nir.cpp | 18 +-
1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp

index d742f55a957..cf546b8ff09 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -844,7 +844,22 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)

: bld.MOV(result, brw_imm_f(1.0f));

 set_predicate(BRW_PREDICATE_NORMAL, inst);
-  } else if (type_sz(op[0].type) < 8) {
+  } else if (type_sz(op[0].type) == 2) {
+ /* AND(val, 0x8000) gives the sign bit.
+  *
+  * Predicated OR ORs 1.0 (0x3c00) with the sign bit if val is not zero.
+  */
+ fs_reg zero = retype(brw_imm_uw(0), BRW_REGISTER_TYPE_HF);
+ bld.CMP(bld.null_reg_f(), op[0], zero, BRW_CONDITIONAL_NZ);
+
+ fs_reg result_int = retype(result, BRW_REGISTER_TYPE_UW);
+ op[0].type = BRW_REGISTER_TYPE_UW;
+ result.type = BRW_REGISTER_TYPE_UW;


Why are you whacking the type on result and also making a result_int temp?  
I guess you just copied that from the 32-bit case?


Oh yes, I didn't notice that.

If we're going to whack result.type (which is fine), just use result for 
the rest of it.  With that fixed,


Right, while I am at it I guess it makes sense to do this small fix for the
32-bit case in the same patch, unless you prefer that to be a separate change.


That's probably best separate on the off chance something bisects to it.  
You can automatically add my review to the new patch though.





Reviewed-by: Jason Ekstrand 


+ bld.AND(result_int, op[0], brw_imm_uw(0x8000u));
+
+ inst = bld.OR(result_int, result_int, brw_imm_uw(0x3c00u));
+ inst->predicate = BRW_PREDICATE_NORMAL;
+  } else if (type_sz(op[0].type) == 4) {
 /* AND(val, 0x8000) gives the sign bit.
  *
  * Predicated OR ORs 1.0 (0x3f80) with the sign bit if val is not
@@ -866,6 +881,7 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)

  * - The sign is encoded in the high 32-bit of each DF
  * - We need to produce a DF result.
  */
+ assert(type_sz(op[0].type) == 8);

 fs_reg zero = vgrf(glsl_type::double_type);
 bld.MOV(zero, setup_imm_df(bld, 0.0));




Re: [Mesa-dev] [PATCH v3 02/42] intel/compiler: add a NIR pass to lower conversions

2019-01-18 Thread Jason Ekstrand

On January 18, 2019 01:48:25 Iago Toral  wrote:

On Thu, 2019-01-17 at 13:42 -0600, Jason Ekstrand wrote:

On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga  wrote:

Some conversions are not directly supported in hardware and need to be
split in two conversion instructions going through an intermediary type.
Doing this at the NIR level reduces the complexity in the backend somewhat.
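
As a rough illustration of what "split in two conversion instructions going through an intermediary type" means, here is a scalar C sketch of a 64-bit float to 8-bit integer conversion done in two steps. The function name is hypothetical, and the choice of a 32-bit integer as the intermediate type is an assumption made for this example, not something stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a double to int8_t in two steps instead of one.
 * Assumption: the value is within int8_t range, so neither step
 * overflows (out-of-range float-to-int conversion is undefined in C). */
static int8_t
f64_to_i8_two_step(double value)
{
   assert(value >= INT8_MIN && value <= INT8_MAX);

   /* Step 1: 64-bit float -> 32-bit integer (truncates toward zero). */
   const int32_t intermediate = (int32_t) value;

   /* Step 2: 32-bit integer -> 8-bit integer. */
   return (int8_t) intermediate;
}
```

The lowering pass does the equivalent at the NIR level by replacing one conversion opcode with two, so the backend only ever sees conversions it can emit directly.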

v2:
- Consider fp16 rounding conversion opcodes
- Properly handle swizzles on conversion sources.

Reviewed-by: Topi Pohjolainen  (v1)
---
src/intel/Makefile.sources|   1 +
src/intel/compiler/brw_nir.c  |   1 +
src/intel/compiler/brw_nir.h  |   2 +
.../compiler/brw_nir_lower_conversions.c  | 158 ++
src/intel/compiler/meson.build|   1 +
5 files changed, 163 insertions(+)
create mode 100644 src/intel/compiler/brw_nir_lower_conversions.c

diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
index 94a28d370e8..9975daa3ad1 100644
--- a/src/intel/Makefile.sources
+++ b/src/intel/Makefile.sources
@@ -83,6 +83,7 @@ COMPILER_FILES = \
   compiler/brw_nir_analyze_boolean_resolves.c \
   compiler/brw_nir_analyze_ubo_ranges.c \
   compiler/brw_nir_attribute_workarounds.c \
+   compiler/brw_nir_lower_conversions.c \
   compiler/brw_nir_lower_cs_intrinsics.c \
   compiler/brw_nir_lower_image_load_store.c \
   compiler/brw_nir_lower_mem_access_bit_sizes.c \
diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 92d7fe4bede..572ab824a94 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -882,6 +882,7 @@ brw_postprocess_nir(nir_shader *nir, const struct 
brw_compiler *compiler,

   OPT(nir_opt_move_comparisons);

   OPT(nir_lower_bool_to_int32);
+   OPT(brw_nir_lower_conversions);


This is *really* late and I don't think you actually want this to run after 
we apply source/destination modifiers.  If you just move it to right after 
nir_opt_algebraic_late, that will fix a multitude of issues.


Sure, I'll move it there.



   OPT(nir_lower_locals_to_regs);

diff --git a/src/intel/compiler/brw_nir.h b/src/intel/compiler/brw_nir.h
index bc81950d47e..662b2627e95 100644
--- a/src/intel/compiler/brw_nir.h
+++ b/src/intel/compiler/brw_nir.h
@@ -114,6 +114,8 @@ void brw_nir_lower_tcs_outputs(nir_shader *nir, const 
struct brw_vue_map *vue,

   GLenum tes_primitive_mode);
void brw_nir_lower_fs_outputs(nir_shader *nir);

+bool brw_nir_lower_conversions(nir_shader *nir);
+
bool brw_nir_lower_image_load_store(nir_shader *nir,
const struct gen_device_info *devinfo);
void brw_nir_rewrite_image_intrinsic(nir_intrinsic_instr *intrin,
diff --git a/src/intel/compiler/brw_nir_lower_conversions.c 
b/src/intel/compiler/brw_nir_lower_conversions.c

new file mode 100644
index 000..583167c7753
--- /dev/null
+++ b/src/intel/compiler/brw_nir_lower_conversions.c
@@ -0,0 +1,158 @@
+/*
+ * Copyright © 2018 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_nir.h"
+#include "compiler/nir/nir_builder.h"
+
+static nir_op
+get_conversion_op(nir_alu_type src_type,
+  unsigned src_bit_size,
+  nir_alu_type dst_type,
+  unsigned dst_bit_size,
+  nir_rounding_mode rounding_mode)
+{
+   nir_alu_type src_full_type = (nir_alu_type) (src_type | src_bit_size);
+   nir_alu_type dst_full_type = (nir_alu_type) (dst_type | dst_bit_size);
+
+   return nir_type_conversion_op(src_full_type, dst_full_type, rounding_mode);
+}
+
+static nir_rounding_mode
+get_opcode_rounding_mode(nir_op op)
+{
+   switch (op) {
+   case nir_op_f2f16_rtz:
+  return nir_rounding_mode_rtz;
+   case nir_op_f2f16_rtne:
+  return nir_rounding_mode_rtne;
+   default:
+  return 

Re: [Mesa-dev] [PATCH v3 28/42] intel/compiler: handle 64-bit float to 8-bit integer conversions

2019-01-18 Thread Iago Toral
On Fri, 2019-01-18 at 12:13 +0100, Iago Toral wrote:
> On Thu, 2019-01-17 at 17:12 -0600, Jason Ekstrand wrote:
> > This patch doesn't really do what the commit message says.  What it
> > really does is implement float -> 8-bit conversions for *any* size
> > float.
> > 
> > On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga <
> > ito...@igalia.com> wrote:
> > > These are not directly supported in hardware and
> > > brw_nir_lower_conversions
> > > 
> > > should have taken care of that before we get here. Also, while we
> > > are
> > > 
> > > at it, make sure 64-bit integer to 8-bit are also properly split
> > > by
> > > 
> > > the same lowering pass, since they have the same hardware
> > > restrictions.
> > 
> > Now that we have a lowering pass, having separate cases just so one
> > of them can assert seems silly.  If anything, we should just do
> > 
> > if (result.type == BRW_REGISTER_TYPE_B ||
> > result.type == BRW_REGISTER_TYPE_UB ||
> > result.type == BRW_REGISTER_TYPE_HF)
> >    assert(type_sz(op[0].type) < 8); /* brw_nir_lower_conversions */
> > 
> > and have it all in one big case.  The only special case we need is
> > for booleans where we need to negate them and fall through.
> 
> There are more cases, since the inverse conversions of these are not
> supported either. I guess I'll just add this as well:
> 
> if (op[0].type == BRW_REGISTER_TYPE_B ||
> op[0].type == BRW_REGISTER_TYPE_UB ||
> op[0].type == BRW_REGISTER_TYPE_HF)
>assert(type_sz(result.type) < 8); /* brw_nir_lower_conversions */

Oh, and there are also the rounding opcodes for f16 destinations (plus
the big comment about brw_F32TO16 that comes with their fallthrough)...
I think we might want to keep the three f2f16* opcodes as a separate
block and do as you suggest for the remaining ones, if that is okay.
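
To make the two assertions discussed above concrete, here is a small standalone sketch of the rule they encode: a conversion is only expected to reach the backend directly if it does not pair a byte or half-float type with a 64-bit type, in either direction. The enum and function names are invented for this example and are not the brw_reg_type API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register types, loosely mirroring the ones in the thread. */
enum reg_type { TYPE_B, TYPE_UB, TYPE_HF, TYPE_W, TYPE_D, TYPE_F, TYPE_Q, TYPE_DF };

static unsigned
type_size_bytes(enum reg_type t)
{
   switch (t) {
   case TYPE_B:
   case TYPE_UB: return 1;
   case TYPE_HF:
   case TYPE_W:  return 2;
   case TYPE_D:
   case TYPE_F:  return 4;
   case TYPE_Q:
   case TYPE_DF: return 8;
   }
   assert(!"unknown register type");
   return 0;
}

/* Types that cannot be converted directly to or from a 64-bit type. */
static bool
is_byte_or_half(enum reg_type t)
{
   return t == TYPE_B || t == TYPE_UB || t == TYPE_HF;
}

/* True if the conversion needs no lowering under the rule above. */
static bool
conversion_is_direct(enum reg_type src, enum reg_type dst)
{
   if (is_byte_or_half(dst) && type_size_bytes(src) == 8)
      return false;
   if (is_byte_or_half(src) && type_size_bytes(dst) == 8)
      return false;
   return true;
}
```

In the backend this check would be an assert rather than a branch, since the lowering pass is expected to have removed the unsupported cases already.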
> > --Jason
> >  
> > > ---
> > > 
> > >  src/intel/compiler/brw_fs_nir.cpp | 6 --
> > > 
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > 
> > > 
> > > diff --git a/src/intel/compiler/brw_fs_nir.cpp
> > > b/src/intel/compiler/brw_fs_nir.cpp
> > > 
> > > index cf546b8ff09..e454578d99b 100644
> > > 
> > > --- a/src/intel/compiler/brw_fs_nir.cpp
> > > 
> > > +++ b/src/intel/compiler/brw_fs_nir.cpp
> > > 
> > > @@ -786,6 +786,10 @@ fs_visitor::nir_emit_alu(const fs_builder
> > > &bld, nir_alu_instr *instr)
> > > 
> > > case nir_op_f2f16:
> > > 
> > > case nir_op_i2f16:
> > > 
> > > case nir_op_u2f16:
> > > 
> > > +   case nir_op_i2i8:
> > > 
> > > +   case nir_op_u2u8:
> > > 
> > > +   case nir_op_f2i8:
> > > 
> > > +   case nir_op_f2u8:
> > > 
> > >assert(type_sz(op[0].type) < 8); /*
> > > brw_nir_lower_conversions */
> > > 
> > >inst = bld.MOV(result, op[0]);
> > > 
> > >inst->saturate = instr->dest.saturate;
> > > 
> > > @@ -824,8 +828,6 @@ fs_visitor::nir_emit_alu(const fs_builder
> > > &bld, nir_alu_instr *instr)
> > > 
> > > case nir_op_u2u32:
> > > 
> > > case nir_op_i2i16:
> > > 
> > > case nir_op_u2u16:
> > > 
> > > -   case nir_op_i2i8:
> > > 
> > > -   case nir_op_u2u8:
> > > 
> > >inst = bld.MOV(result, op[0]);
> > > 
> > >inst->saturate = instr->dest.saturate;
> > > 
> > >break;
> > > 


Re: [Mesa-dev] [PATCH 4/4] radv: initialize the per-queue descriptor BO only once

2019-01-18 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

for the series.

On Thu, Jan 17, 2019 at 6:08 PM Samuel Pitoiset
 wrote:
>
> Totally useless to write the descriptors inside the loop.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_device.c | 47 ++--
>  1 file changed, 23 insertions(+), 24 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 0bb2dcdcc20..c2de61c935d 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -2456,6 +2456,29 @@ radv_get_preamble_cs(struct radv_queue *queue,
> } else
> descriptor_bo = queue->descriptor_bo;
>
> +   if (descriptor_bo != queue->descriptor_bo) {
> +   uint32_t *map = 
> (uint32_t*)queue->device->ws->buffer_map(descriptor_bo);
> +
> +   if (scratch_bo) {
> +   uint64_t scratch_va = radv_buffer_get_va(scratch_bo);
> +   uint32_t rsrc1 = S_008F04_BASE_ADDRESS_HI(scratch_va 
> >> 32) |
> +S_008F04_SWIZZLE_ENABLE(1);
> +   map[0] = scratch_va;
> +   map[1] = rsrc1;
> +   }
> +
> +   if (esgs_ring_bo || gsvs_ring_bo || tess_rings_bo || 
> add_sample_positions)
> +   fill_geom_tess_rings(queue, map, add_sample_positions,
> +esgs_ring_size, esgs_ring_bo,
> +gsvs_ring_size, gsvs_ring_bo,
> +tess_factor_ring_size,
> +tess_offchip_ring_offset,
> +tess_offchip_ring_size,
> +tess_rings_bo);
> +
> +   queue->device->ws->buffer_unmap(descriptor_bo);
> +   }
> +
> for(int i = 0; i < 3; ++i) {
> struct radeon_cmdbuf *cs = NULL;
> cs = queue->device->ws->cs_create(queue->device->ws,
> @@ -2480,30 +2503,6 @@ radv_get_preamble_cs(struct radv_queue *queue,
> break;
> }
>
> -   if (descriptor_bo != queue->descriptor_bo) {
> -   uint32_t *map = 
> (uint32_t*)queue->device->ws->buffer_map(descriptor_bo);
> -
> -   if (scratch_bo) {
> -   uint64_t scratch_va = 
> radv_buffer_get_va(scratch_bo);
> -   uint32_t rsrc1 = 
> S_008F04_BASE_ADDRESS_HI(scratch_va >> 32) |
> -S_008F04_SWIZZLE_ENABLE(1);
> -   map[0] = scratch_va;
> -   map[1] = rsrc1;
> -   }
> -
> -   if (esgs_ring_bo || gsvs_ring_bo || tess_rings_bo ||
> -   add_sample_positions)
> -   fill_geom_tess_rings(queue, map, 
> add_sample_positions,
> -esgs_ring_size, 
> esgs_ring_bo,
> -gsvs_ring_size, 
> gsvs_ring_bo,
> -tess_factor_ring_size,
> -tess_offchip_ring_offset,
> -tess_offchip_ring_size,
> -tess_rings_bo);
> -
> -   queue->device->ws->buffer_unmap(descriptor_bo);
> -   }
> -
> if (esgs_ring_bo || gsvs_ring_bo || tess_rings_bo)  {
> radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
> radeon_emit(cs, EVENT_TYPE(V_028A90_VS_PARTIAL_FLUSH) 
> | EVENT_INDEX(4));
> --
> 2.20.1
>
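
The change above is a plain loop-invariant hoist: the descriptor contents do not depend on which of the three preambles is being built, so they can be written once before the loop. A toy C sketch of the same shape follows; every name in it (fake_queue, write_descriptors, build_preambles) is hypothetical and only the "write once, then loop three times" structure reflects the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PREAMBLES 3

/* Hypothetical stand-in for the queue state touched by the patch. */
struct fake_queue {
   uint32_t descriptors[4];
   unsigned descriptor_writes;
   unsigned preambles_built;
};

/* Store the low and high halves of the scratch address, and count how
 * many times the descriptors were written. */
static void
write_descriptors(struct fake_queue *q, uint64_t scratch_va)
{
   q->descriptors[0] = (uint32_t) scratch_va;
   q->descriptors[1] = (uint32_t) (scratch_va >> 32);
   q->descriptor_writes++;
}

static void
build_preambles(struct fake_queue *q, uint64_t scratch_va)
{
   assert(q);

   /* Loop-invariant work: done once, before the loop. */
   write_descriptors(q, scratch_va);

   for (int i = 0; i < NUM_PREAMBLES; ++i)
      q->preambles_built++;
}
```

With the write inside the loop, descriptor_writes would end up at 3 for identical contents; hoisted, it is 1.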


Re: [Mesa-dev] [PATCH v3 00/42] intel: VK_KHR_shader_float16_int8 implementation

2019-01-18 Thread Iago Toral
Thanks a lot for all the review work! When you're done reviewing all
the patches I'll prepare a v4 with all the changes.
On Thu, 2019-01-17 at 18:24 -0600, Jason Ekstrand wrote:
> I'm done for the day but I've read through most of the patches.  I
> think I've got 4 or 5 tricky ones left.  By and large, I think things
> are looking really good.  I don't know that we'll make 19.0 but
> there's a possibility.  If not, it'll likely land shortly after.
> 
> On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga wrote:
> > The changes in this version address review feedback to v2 and, most
> > importantly,
> > 
> > rebase on top of relevant changes in master, specifically Curro's
> > regioning
> > 
> > lowering pass. This new regioning pass simplifies some of the NIR
> > translation
> > 
> > code (specifically the code for translating regioning restrictions
> > on
> > 
> > conversions for atom platforms) making some of the previous work in
> > this series
> > 
> > unnecessary. The regioning restrictions for conversions between
> > integer and
> > 
> > half-float added with this series are now implemented as part
> > of this
> > 
> > framework instead of doing it at NIR translation time. This version
> > of the
> > 
> > series also dropped the SPIR-V compiler patches that have already
> > been merged.
> > 
> > 
> > 
> > As always, a branch with these patches is available for testing
> > in the
> > 
> > itoral/VK_KHR_shader_float16_int8 branch of the Igalia Mesa
> > repository at
> > 
> > https://github.com/Igalia/mesa.
> > 
> > 
> > 
> > Iago Toral Quiroga (42):
> > 
> >   intel/compiler: handle conversions between int and half-float on
> > atom
> > 
> >   intel/compiler: add a NIR pass to lower conversions
> > 
> >   intel/compiler: split float to 64-bit opcodes from int to 64-bit
> > 
> >   intel/compiler: handle b2i/b2f with other integer conversion
> > opcodes
> > 
> >   intel/compiler: assert restrictions on conversions to half-float
> > 
> >   intel/compiler: lower some 16-bit float operations to 32-bit
> > 
> >   intel/compiler: lower 16-bit extended math to 32-bit prior to
> > gen9
> > 
> >   intel/compiler: implement 16-bit fsign
> > 
> >   intel/compiler: allow extended math functions with HF operands
> > 
> >   compiler/nir: add lowering option for 16-bit fmod
> > 
> >   intel/compiler: lower 16-bit fmod
> > 
> >   compiler/nir: add lowering for 16-bit flrp
> > 
> >   intel/compiler: lower 16-bit flrp
> > 
> >   compiler/nir: add lowering for 16-bit ldexp
> > 
> >   intel/compiler: Extended Math is limited to SIMD8 on half-float
> > 
> >   intel/compiler: add instruction setters for Src1Type and
> > Src2Type.
> > 
> >   intel/compiler: add new half-float register type for 3-src
> > 
> > instructions
> > 
> >   intel/compiler: add a helper function to query hardware type
> > table
> > 
> >   intel/compiler: don't compact 3-src instructions with Src1Type or
> > 
> > Src2Type bits
> > 
> >   intel/compiler: allow half-float on 3-source instructions since
> > gen8
> > 
> >   intel/compiler: set correct precision fields for 3-source float
> > 
> > instructions
> > 
> >   intel/compiler: don't propagate HF immediates to 3-src
> > instructions
> > 
> >   intel/compiler: fix ddx and ddy for 16-bit float
> > 
> >   intel/compiler: fix ddy for half-float in gen8
> > 
> >   intel/compiler: workaround for SIMD8 half-float MAD in gen8
> > 
> >   intel/compiler: split is_partial_write() into two variants
> > 
> >   intel/compiler: activate 16-bit bit-size lowerings also for 8-bit
> > 
> >   intel/compiler: handle 64-bit float to 8-bit integer conversions
> > 
> >   intel/compiler: handle conversions between int and half-float on
> > atom
> > 
> >   intel/compiler: implement isign for int8
> > 
> >   intel/compiler: ask for an integer type if requesting an 8-bit
> > type
> > 
> >   intel/eu: force stride of 2 on NULL register for Byte
> > instructions
> > 
> >   compiler/spirv: add support for Float16 and Int8 capabilities
> > 
> >   anv/pipeline: support Float16 and Int8 capabilities in gen8+
> > 
> >   anv/device: expose shaderFloat16 and shaderInt8 in gen8+
> > 
> >   intel/compiler: implement is_zero, is_one, is_negative_one for
> > 
> > 8-bit/16-bit
> > 
> >   intel/compiler: add a brw_reg_type_is_integer helper
> > 
> >   intel/compiler: fix cmod propagation for non 32-bit types
> > 
> >   intel/compiler: remove MAD/LRP algebraic optimizations from the
> > 
> > backend
> > 
> >   intel/compiler: support half-float in the combine constants pass
> > 
> >   intel/compiler: fix combine constants for Align16 with half-float
> > 
> > prior to gen9
> > 
> >   intel/compiler: allow propagating HF immediates to MAD/LRP
> > 
> > 
> > 
> >  src/compiler/nir/nir.h|   2 +
> > 
> >  src/compiler/nir/nir_opt_algebraic.py |  11 +-
> > 
> >  src/compiler/shader_info.h|   2 +
> > 
> >  src/compiler/spirv/spirv_to_nir.c |   8 +-
> > 

Re: [Mesa-dev] [PATCH v3 40/42] intel/compiler: support half-float in the combine constants pass

2019-01-18 Thread Iago Toral
On Thu, 2019-01-17 at 18:18 -0600, Jason Ekstrand wrote:
> On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga wrote:
> > Reviewed-by: Topi Pohjolainen 
> > 
> > ---
> > 
> >  .../compiler/brw_fs_combine_constants.cpp | 60
> > +++
> > 
> >  1 file changed, 49 insertions(+), 11 deletions(-)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_fs_combine_constants.cpp
> > b/src/intel/compiler/brw_fs_combine_constants.cpp
> > 
> > index 7343f77bb45..54017e5668b 100644
> > 
> > --- a/src/intel/compiler/brw_fs_combine_constants.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs_combine_constants.cpp
> > 
> > @@ -36,6 +36,7 @@
> > 
> > 
> > 
> >  #include "brw_fs.h"
> > 
> >  #include "brw_cfg.h"
> > 
> > +#include "util/half_float.h"
> > 
> > 
> > 
> >  using namespace brw;
> > 
> > 
> > 
> > @@ -114,8 +115,9 @@ struct imm {
> > 
> >  */
> > 
> > exec_list *uses;
> > 
> > 
> > 
> > -   /** The immediate value.  We currently only handle floats. */
> > 
> > +   /** The immediate value.  We currently only handle float and
> > half-float. */
> > 
> > float val;
> > 
> > +   brw_reg_type type;
> 
> I had a brief chat with Matt today and I think that this may be going
> in the wrong direction.  In particular, I'd like us to eventually
> (maybe we can do it now?) generalize the combine_constants pass to
> more data types; in particular, integers.  I recently came across a
> shader where the fact that we couldn't do combine_constants on
> integers was causing significant register pressure problems and
> spilling.  (The test was doing a bunch of BFI2/BFE with constant
> sources.)  It could also be a huge win for 8-bit and 64-bit where we
> can't put immediates in regular 2-src instructions.
> What does this mean for the pass?  I suspect we want a bit size
> instead of a type and a simple char[8] for the data and just make it
> a blob of bits.  We may also want some sort of heuristic so we don't
> burn constant table space for things that are only used once or maybe
> even twice.
> 
> Normally, I would say "do it as a fixup" but if we go the direction
> of having a float and using _mesa_half_to_float and
> _mesa_float_to_half, I suspect it'll be harder to go for the bag-of-
> bits approach.
> 
> Thoughts?

Fair enough, I see the value in having this support more than just F
and HF. I'll think a bit about it and I'll send a different version
that tries to cover these cases as well.
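
One possible shape for the "bag of bits" idea suggested above, as a hedged standalone sketch: each immediate is stored as up to 8 raw bytes plus a size, and lookup compares size and bytes instead of float values. All names here (imm_bits, imm_table, find_imm_bits, add_imm_bits) are hypothetical, the fixed table capacity is an arbitrary choice for the example, and the use-count heuristic mentioned in the review is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_IMMS 16

/* An immediate as raw bits: 'size' bytes of 'bytes' are meaningful. */
struct imm_bits {
   uint8_t bytes[8];
   uint8_t size;
};

struct imm_table {
   struct imm_bits imm[MAX_IMMS];
   int len;
};

/* Find an existing entry with the same size and identical bits. */
static struct imm_bits *
find_imm_bits(struct imm_table *table, const void *data, uint8_t size)
{
   assert(size == 1 || size == 2 || size == 4 || size == 8);

   for (int i = 0; i < table->len; i++) {
      if (table->imm[i].size == size &&
          memcmp(table->imm[i].bytes, data, size) == 0)
         return &table->imm[i];
   }
   return NULL;
}

/* Return the matching entry, adding a new one if none exists yet. */
static struct imm_bits *
add_imm_bits(struct imm_table *table, const void *data, uint8_t size)
{
   struct imm_bits *imm = find_imm_bits(table, data, size);
   if (imm)
      return imm;

   assert(table->len < MAX_IMMS);
   imm = &table->imm[table->len++];
   memset(imm->bytes, 0, sizeof(imm->bytes));
   memcpy(imm->bytes, data, size);
   imm->size = size;
   return imm;
}
```

Because the comparison is bitwise, the same table can hold half-float, float, integer and 64-bit immediates without any half-to-float round trips; the trade-off is that +0.0 and -0.0 become distinct entries unless the caller canonicalises them first.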
> --Jason
>  
> > 
> > /**
> > 
> >  * The GRF register and subregister number where we've decided
> > to store the
> > 
> > @@ -145,10 +147,10 @@ struct table {
> > 
> >  };
> > 
> > 
> > 
> >  static struct imm *
> > 
> > -find_imm(struct table *table, float val)
> > 
> > +find_imm(struct table *table, float val, brw_reg_type type)
> > 
> >  {
> > 
> > for (int i = 0; i < table->len; i++) {
> > 
> > -  if (table->imm[i].val == val) {
> > 
> > +  if (table->imm[i].val == val && table->imm[i].type == type)
> > {
> > 
> >   return &table->imm[i];
> > 
> >}
> > 
> > }
> > 
> > @@ -190,6 +192,20 @@ compare(const void *_a, const void *_b)
> > 
> > return a->first_use_ip - b->first_use_ip;
> > 
> >  }
> > 
> > 
> > 
> > +static bool
> > 
> > +needs_negate(float reg_val, float imm_val, brw_reg_type type)
> > 
> > +{
> > 
> > +   /* reg_val represents the immediate value in the register in
> > its original
> > 
> > +* bit-size, while imm_val is always a valid 32-bit float
> > value.
> > 
> > +*/
> > 
> > +   if (type == BRW_REGISTER_TYPE_HF) {
> > 
> > +  uint32_t reg_val_ud = *((uint32_t *) &reg_val);
> > 
> > +  reg_val = _mesa_half_to_float(reg_val_ud & 0x);
> > 
> > +   }
> > 
> > +
> > 
> > +   return signbit(imm_val) != signbit(reg_val);
> > 
> > +}
> > 
> > +
> > 
> >  bool
> > 
> >  fs_visitor::opt_combine_constants()
> > 
> >  {
> > 
> > @@ -215,12 +231,20 @@ fs_visitor::opt_combine_constants()
> > 
> > 
> > 
> >for (int i = 0; i < inst->sources; i++) {
> > 
> >   if (inst->src[i].file != IMM ||
> > 
> > - inst->src[i].type != BRW_REGISTER_TYPE_F)
> > 
> > + (inst->src[i].type != BRW_REGISTER_TYPE_F &&
> > 
> > +  inst->src[i].type != BRW_REGISTER_TYPE_HF))
> > 
> >  continue;
> > 
> > 
> > 
> > - float val = !inst->can_do_source_mods(devinfo) ? inst-
> > >src[i].f :
> > 
> > - fabs(inst->src[i].f);
> > 
> > - struct imm *imm = find_imm(&table, val);
> > 
> > + float val;
> > 
> > + if (inst->src[i].type == BRW_REGISTER_TYPE_F) {
> > 
> > +val = !inst->can_do_source_mods(devinfo) ? inst-
> > >src[i].f :
> > 
> > +fabs(inst->src[i].f);
> > 
> > + } else {
> > 
> > +val = !inst->can_do_source_mods(devinfo) ?
> > 
> > +   _mesa_half_to_float(inst->src[i].d & 0x) :
> > 
> > +   fabs(_mesa_half_to_float(inst->src[i].d & 0x));
> > 
> > + }
> > 
> > + 

Re: [Mesa-dev] [PATCH v3 28/42] intel/compiler: handle 64-bit float to 8-bit integer conversions

2019-01-18 Thread Iago Toral
On Thu, 2019-01-17 at 17:12 -0600, Jason Ekstrand wrote:
> This patch doesn't really do what the commit message says.  What it
> really does is implement float -> 8-bit conversions for *any* size
> float.
> 
> On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga wrote:
> > These are not directly supported in hardware and
> > brw_nir_lower_conversions
> > 
> > should have taken care of that before we get here. Also, while we
> > are
> > 
> > at it, make sure 64-bit integer to 8-bit are also properly split by
> > 
> > the same lowering pass, since they have the same hardware
> > restrictions.
> 
> Now that we have a lowering pass, having separate cases just so one
> of them can assert seems silly.  If anything, we should just do
> 
> if (result.type == BRW_REGISTER_TYPE_B ||
> result.type == BRW_REGISTER_TYPE_UB ||
> result.type == BRW_REGISTER_TYPE_HF)
>    assert(type_sz(op[0].type) < 8); /* brw_nir_lower_conversions */
> 
> and have it all in one big case.  The only special case we need is
> for booleans where we need to negate them and fall through.

There are more cases, since the inverse conversions of these are not
supported either. I guess I'll just add this as well:

if (op[0].type == BRW_REGISTER_TYPE_B ||
    op[0].type == BRW_REGISTER_TYPE_UB ||
    op[0].type == BRW_REGISTER_TYPE_HF)
   assert(type_sz(result.type) < 8); /* brw_nir_lower_conversions */

> --Jason
>  
> > ---
> > 
> >  src/intel/compiler/brw_fs_nir.cpp | 6 --
> > 
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_fs_nir.cpp
> > b/src/intel/compiler/brw_fs_nir.cpp
> > 
> > index cf546b8ff09..e454578d99b 100644
> > 
> > --- a/src/intel/compiler/brw_fs_nir.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs_nir.cpp
> > 
> > @@ -786,6 +786,10 @@ fs_visitor::nir_emit_alu(const fs_builder
> > &bld, nir_alu_instr *instr)
> > 
> > case nir_op_f2f16:
> > 
> > case nir_op_i2f16:
> > 
> > case nir_op_u2f16:
> > 
> > +   case nir_op_i2i8:
> > 
> > +   case nir_op_u2u8:
> > 
> > +   case nir_op_f2i8:
> > 
> > +   case nir_op_f2u8:
> > 
> >assert(type_sz(op[0].type) < 8); /*
> > brw_nir_lower_conversions */
> > 
> >inst = bld.MOV(result, op[0]);
> > 
> >inst->saturate = instr->dest.saturate;
> > 
> > @@ -824,8 +828,6 @@ fs_visitor::nir_emit_alu(const fs_builder &bld,
> > nir_alu_instr *instr)
> > 
> > case nir_op_u2u32:
> > 
> > case nir_op_i2i16:
> > 
> > case nir_op_u2u16:
> > 
> > -   case nir_op_i2i8:
> > 
> > -   case nir_op_u2u8:
> > 
> >inst = bld.MOV(result, op[0]);
> > 
> >inst->saturate = instr->dest.saturate;
> > 
> >break;
> > 


[Mesa-dev] MR: Anv: document & improve pipeline flushes/invalidates

2019-01-18 Thread Lionel Landwerlin

https://gitlab.freedesktop.org/mesa/mesa/merge_requests/132

2 changes in this MR:

    * add some documentation to clarify how we choose pipeline flushes and
invalidations
    * narrow the CS stall & RT flushes for the query copies to track only
operations that write a destination buffer


For the second change we have crucible test bug.108909 to verify that 
this is still correct.


-
Lionel


Re: [Mesa-dev] [PATCH v3 00/42] intel: VK_KHR_shader_float16_int8 implementation

2019-01-18 Thread Iago Toral
On Thu, 2019-01-17 at 17:16 -0600, Jason Ekstrand wrote:
> On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga wrote:
> > The changes in this version address review feedback to v2 and, most
> > importantly,
> > 
> > rebase on top of relevant changes in master, specifically Curro's
> > regioning
> > 
> > lowering pass. This new regioning pass simplifies some of the NIR
> > translation
> > 
> > code (specifically the code for translating regioning restrictions
> > on
> > 
> > conversions for atom platforms) making some of the previous work in
> > this series
> > 
> > unnecessary. The regioning restrictions for conversions between
> > integer and
> > 
> > half-float added with this series are now implemented as part
> > of this
> > 
> > framework instead of doing it at NIR translation time. This version
> > of the
> > 
> > series also dropped the SPIR-V compiler patches that have already
> > been merged.
> > 
> > 
> > 
> > As always, a branch with these patches is available for testing
> > in the
> > 
> > itoral/VK_KHR_shader_float16_int8 branch of the Igalia Mesa
> > repository at
> > 
> > https://github.com/Igalia/mesa.
> > 
> > 
> > 
> > Iago Toral Quiroga (42):
> > 
> >   intel/compiler: handle conversions between int and half-float on
> > atom
> > 
> >   intel/compiler: add a NIR pass to lower conversions
> > 
> >   intel/compiler: split float to 64-bit opcodes from int to 64-bit
> > 
> >   intel/compiler: handle b2i/b2f with other integer conversion
> > opcodes
> > 
> >   intel/compiler: assert restrictions on conversions to half-float
> > 
> >   intel/compiler: lower some 16-bit float operations to 32-bit
> > 
> >   intel/compiler: lower 16-bit extended math to 32-bit prior to
> > gen9
> > 
> >   intel/compiler: implement 16-bit fsign
> > 
> >   intel/compiler: allow extended math functions with HF operands
> > 
> >   compiler/nir: add lowering option for 16-bit fmod
> > 
> >   intel/compiler: lower 16-bit fmod
> > 
> >   compiler/nir: add lowering for 16-bit flrp
> > 
> >   intel/compiler: lower 16-bit flrp
> > 
> >   compiler/nir: add lowering for 16-bit ldexp
> > 
> >   intel/compiler: Extended Math is limited to SIMD8 on half-float
> > 
> >   intel/compiler: add instruction setters for Src1Type and
> > Src2Type.
> > 
> >   intel/compiler: add new half-float register type for 3-src
> > 
> > instructions
> > 
> >   intel/compiler: add a helper function to query hardware type
> > table
> > 
> >   intel/compiler: don't compact 3-src instructions with Src1Type or
> > 
> > Src2Type bits
> > 
> >   intel/compiler: allow half-float on 3-source instructions since
> > gen8
> > 
> >   intel/compiler: set correct precision fields for 3-source float
> > 
> > instructions
> > 
> >   intel/compiler: don't propagate HF immediates to 3-src
> > instructions
> > 
> >   intel/compiler: fix ddx and ddy for 16-bit float
> > 
> >   intel/compiler: fix ddy for half-float in gen8
> > 
> >   intel/compiler: workaround for SIMD8 half-float MAD in gen8
> > 
> >   intel/compiler: split is_partial_write() into two variants
> > 
> >   intel/compiler: activate 16-bit bit-size lowerings also for 8-bit
> > 
> >   intel/compiler: handle 64-bit float to 8-bit integer conversions
> > 
> >   intel/compiler: handle conversions between int and half-float on
> > atom
> 
> I can't find this patch (29/42) in my e-mail but it's on patchwork. 
> In any case, I don't think it's doing anything useful anymore and
> should just be dropped.

Right, addressing your comment to the previous patch takes care of
this. I'll drop it.
> >   intel/compiler: implement isign for int8
> > 
> >   intel/compiler: ask for an integer type if requesting an 8-bit
> > type
> > 
> >   intel/eu: force stride of 2 on NULL register for Byte
> > instructions
> > 
> >   compiler/spirv: add support for Float16 and Int8 capabilities
> > 
> >   anv/pipeline: support Float16 and Int8 capabilities in gen8+
> > 
> >   anv/device: expose shaderFloat16 and shaderInt8 in gen8+
> > 
> >   intel/compiler: implement is_zero, is_one, is_negative_one for
> > 
> > 8-bit/16-bit
> > 
> >   intel/compiler: add a brw_reg_type_is_integer helper
> > 
> >   intel/compiler: fix cmod propagation for non 32-bit types
> > 
> >   intel/compiler: remove MAD/LRP algebraic optimizations from the
> > 
> > backend
> > 
> >   intel/compiler: support half-float in the combine constants pass
> > 
> >   intel/compiler: fix combine constants for Align16 with half-float
> > 
> > prior to gen9
> > 
> >   intel/compiler: allow propagating HF immediates to MAD/LRP
> > 
> > 
> > 
> >  src/compiler/nir/nir.h|   2 +
> > 
> >  src/compiler/nir/nir_opt_algebraic.py |  11 +-
> > 
> >  src/compiler/shader_info.h|   2 +
> > 
> >  src/compiler/spirv/spirv_to_nir.c |   8 +-
> > 
> >  src/intel/Makefile.sources|   1 +
> > 
> >  src/intel/compiler/brw_compiler.c |   2 +
> > 
> >  

[Mesa-dev] [Bug 109362] Objects are invisible in Resident Evil 2 "1-Shot Demo" with RADV

2019-01-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109362

Samuel Pitoiset  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #1 from Samuel Pitoiset  ---
Should be fixed with
https://cgit.freedesktop.org/mesa/mesa/commit/?id=8424cd8fbd1671c4c13f57cfa34bf8145d0fffcf

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 21/42] intel/compiler: set correct precision fields for 3-source float instructions

2019-01-18 Thread Iago Toral
On Thu, 2019-01-17 at 14:18 -0600, Jason Ekstrand wrote:
> On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga wrote:
> > Source0 and Destination extract the floating-point precision
> > automatically
> > 
> > from the SrcType and DstType instruction fields respectively when
> > they are
> > 
> > set to types :F or :HF. For Source1 and Source2 operands, we use
> > the new
> > 
> > 1-bit fields Src1Type and Src2Type, where 0 means normal precision
> > and 1
> > 
> > means half-precision. Since we always use the type of the
> > destination for
> > 
> > all operands when we emit 3-source instructions, we only need set
> > Src1Type
> > 
> > and Src2Type to 1 when we are emitting a half-precision
> > instruction.
> > 
> > 
> > 
> > v2:
> > 
> >  - Set the bit separately for each source based on its type so we
> > can
> > 
> >do mixed floating-point mode in the future (Topi).
> > 
> > 
> > 
> > Reviewed-by: Topi Pohjolainen 
> > 
> > ---
> > 
> >  src/intel/compiler/brw_eu_emit.c | 16 
> > 
> >  1 file changed, 16 insertions(+)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_eu_emit.c
> > b/src/intel/compiler/brw_eu_emit.c
> > 
> > index a785f96b650..2fa89f8a2a3 100644
> > 
> > --- a/src/intel/compiler/brw_eu_emit.c
> > 
> > +++ b/src/intel/compiler/brw_eu_emit.c
> > 
> > @@ -801,6 +801,22 @@ brw_alu3(struct brw_codegen *p, unsigned
> > opcode, struct brw_reg dest,
> > 
> >*/
> > 
> >   brw_inst_set_3src_a16_src_type(devinfo, inst, dest.type);
> > 
> >   brw_inst_set_3src_a16_dst_type(devinfo, inst, dest.type);
> > 
> > +
> > 
> > + /* From the Bspec: Instruction types
> > 
> > +  *
> > 
> > +  * Three source instructions can use operands with mixed-
> > mode
> > 
> > +  * precision. When SrcType field is set to :f or :hf it
> > defines
> > 
> > +  * precision for source 0 only, and fields Src1Type and
> > Src2Type
> > 
> > +  * define precision for other source operands:
> > 
> > +  *
> > 
> > +  *   0b = :f. Single precision Float (32-bit).
> > 
> > +  *   1b = :hf. Half precision Float (16-bit).
> > 
> > +  */
> > 
> > + if (src1.type == BRW_REGISTER_TYPE_HF)
> > 
> > +brw_inst_set_3src_a16_src1_type(devinfo, inst, 1);
> 
> Maybe worth throwing in an
> 
> assert(src0.type == BRW_REGISTER_TYPE_F || src0.type ==
> BRW_REGISTER_TYPE_HF);
> 
> just to be sure? 

If we are going to do this I guess we should also check the same for
src2.
>  Either way, this and patch 20 are
> Reviewed-by: Jason Ekstrand 
>  
> > +
> > 
> > + if (src2.type == BRW_REGISTER_TYPE_HF)
> > 
> > +brw_inst_set_3src_a16_src2_type(devinfo, inst, 1);
> > 
> >}

And the same here (for src0 and src1)
> > }
> > 
> > 
> > 
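
To illustrate the per-source precision bits discussed in this thread, here is a minimal sketch. It assumes a simplified model: the enum, struct and function names below are hypothetical stand-ins rather than the Mesa definitions, and asserting on all three sources is only one reading of the review comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for BRW_REGISTER_TYPE_*; not the Mesa definitions. */
enum reg_type { TYPE_F, TYPE_HF, TYPE_D };

/* Values for the 1-bit Src1Type/Src2Type fields: 0 = :f, 1 = :hf. */
struct src_precision {
   unsigned src1_type;
   unsigned src2_type;
};

static bool
is_float_type(enum reg_type t)
{
   return t == TYPE_F || t == TYPE_HF;
}

/* Pick the precision bit for source 1 and source 2 independently, so that
 * mixed-precision operands remain possible. Source 0 takes its precision
 * from the regular SrcType field, so it has no bit here.
 */
static struct src_precision
alu3_src_precision(enum reg_type src0, enum reg_type src1, enum reg_type src2)
{
   /* Sanity checks in the spirit of the review: every source must be
    * a single- or half-precision float.
    */
   assert(is_float_type(src0));
   assert(is_float_type(src1));
   assert(is_float_type(src2));

   struct src_precision p;
   p.src1_type = (src1 == TYPE_HF) ? 1 : 0;
   p.src2_type = (src2 == TYPE_HF) ? 1 : 0;
   return p;
}
```

Calling alu3_src_precision(TYPE_F, TYPE_HF, TYPE_F) in this model sets only the bit for source 1.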


[Mesa-dev] [PATCH] egl/dri2: try to bind old context if bindContext failed

2019-01-18 Thread Luigi Santivetti
Before this change, if bindContext() failed then dri2_make_current() would
rebind the old EGL context and surfaces and return EGL_BAD_MATCH. However,
it wouldn't rebind the DRI context and surfaces, thus leaving it in an
inconsistent and unrecoverable state.

After this change, dri2_make_current() tries to bind the old DRI context
and surfaces when bindContext() failed. If unable to do so, it leaves EGL
and the DRI driver in a consistent state, it reports an error and returns
EGL_BAD_MATCH.

Fixes: 4e8f95f64d004aa1 ("egl_dri2: Always unbind old contexts")

Signed-off-by: Luigi Santivetti 
---
 src/egl/drivers/dri2/egl_dri2.c | 54 ++---
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index dca22762047..016a3ced96d 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -1648,8 +1648,9 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, 
_EGLSurface *dsurf,
_EGLSurface *old_dsurf, *old_rsurf;
_EGLSurface *tmp_dsurf, *tmp_rsurf;
__DRIdrawable *ddraw, *rdraw;
-   __DRIcontext *cctx;
+   __DRIcontext *cctx, *old_cctx;
EGLBoolean unbind;
+   EGLint egl_error;
 
if (!dri2_dpy)
   return _eglError(EGL_NOT_INITIALIZED, "eglMakeCurrent");
@@ -1674,7 +1675,7 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, 
_EGLSurface *dsurf,
cctx = (dri2_ctx) ? dri2_ctx->dri_context : NULL;
 
if (old_ctx) {
-  __DRIcontext *old_cctx = dri2_egl_context(old_ctx)->dri_context;
+  old_cctx = dri2_egl_context(old_ctx)->dri_context;
 
   if (old_dsurf)
  dri2_surf_update_fence_fd(old_ctx, disp, old_dsurf);
@@ -1691,17 +1692,24 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, 
_EGLSurface *dsurf,
unbind = (cctx == NULL && ddraw == NULL && rdraw == NULL);
 
if (!unbind && !dri2_dpy->core->bindContext(cctx, ddraw, rdraw)) {
+  __DRIdrawable *old_ddraw, *old_rdraw;
+
+  /* dri2_dpy->core->bindContext failed. We cannot tell for sure why, but
+   * setting the error to EGL_BAD_MATCH is surely better than leaving it
+   * as EGL_SUCCESS.
+   */
+  egl_error = EGL_BAD_MATCH;
+
+  old_ddraw = (old_dsurf) ? dri2_dpy->vtbl->get_dri_drawable(old_dsurf) : 
NULL;
+  old_rdraw = (old_rsurf) ? dri2_dpy->vtbl->get_dri_drawable(old_rsurf) : 
NULL;
+  old_cctx = (old_ctx) ? dri2_egl_context(old_ctx)->dri_context : NULL;
+
   /* undo the previous _eglBindContext */
   _eglBindContext(old_ctx, old_dsurf, old_rsurf, &ctx, &tmp_dsurf, &tmp_rsurf);
   assert(&dri2_ctx->base == ctx &&
  tmp_dsurf == dsurf &&
  tmp_rsurf == rsurf);
 
-  if (old_dsurf && _eglSurfaceInSharedBufferMode(old_dsurf) &&
-  old_dri2_dpy->vtbl->set_shared_buffer_mode) {
- old_dri2_dpy->vtbl->set_shared_buffer_mode(old_disp, old_dsurf, true);
-  }
-
   _eglPutSurface(dsurf);
   _eglPutSurface(rsurf);
   _eglPutContext(ctx);
@@ -1710,11 +1718,32 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, 
_EGLSurface *dsurf,
   _eglPutSurface(old_rsurf);
   _eglPutContext(old_ctx);
 
-  /* dri2_dpy->core->bindContext failed. We cannot tell for sure why, but
-   * setting the error to EGL_BAD_MATCH is surely better than leaving it
-   * as EGL_SUCCESS.
+  /* undo the previous dri2_dpy->core->unbindContext */
+  if (dri2_dpy->core->bindContext(old_cctx, old_ddraw, old_rdraw)) {
+ if (old_dsurf && _eglSurfaceInSharedBufferMode(old_dsurf) &&
+ old_dri2_dpy->vtbl->set_shared_buffer_mode) {
+old_dri2_dpy->vtbl->set_shared_buffer_mode(old_disp, old_dsurf, 
true);
+ }
+
+ return _eglError(egl_error, "eglMakeCurrent");
+  }
+
+  /* We cannot restore the same state as it was before calling
+   * eglMakeCurrent(), but we can keep EGL in a consistent state with
+   * the DRI driver by unbinding the old EGL context and surfaces.
*/
-  return _eglError(EGL_BAD_MATCH, "eglMakeCurrent");
+  ctx = dsurf = rsurf = NULL;
+  unbind = true;
+
+  _eglBindContext(ctx, dsurf, rsurf, _ctx, _dsurf, _rsurf);
+  assert(_ctx->base == old_ctx &&
+ tmp_dsurf == old_dsurf &&
+ tmp_rsurf == old_rsurf);
+
+  _eglLog(_EGL_FATAL, "DRI2: failed to rebind the previous context");
+   } else {
+  /* We can no longer fail at this point. */
+  egl_error = EGL_SUCCESS;
}
 
dri2_destroy_surface(drv, disp, old_dsurf);
@@ -1740,6 +1769,9 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, 
_EGLSurface *dsurf,
   dri2_dpy->vtbl->set_shared_buffer_mode(disp, dsurf, mode);
}
 
+   if (egl_error != EGL_SUCCESS)
+  return _eglError(egl_error, "eglMakeCurrent");
+
return EGL_TRUE;
 }
 
-- 
2.20.1
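
The recovery strategy described in the commit message can be sketched in isolation. This is a toy model under stated assumptions, not the egl_dri2.c code: fake_driver, fake_bind(), fake_unbind() and the integer context ids are all hypothetical. It only shows the order of operations: unbind the old context, try the new one, fall back to the old one, and if that also fails leave both sides unbound so they agree.

```c
#include <stdbool.h>

/* Hypothetical driver model: context ids are small integers, 0 = none. */
struct fake_driver {
   int bound;          /* context currently bound in the driver */
   unsigned fail_mask; /* bit N set: binding context N fails */
};

static bool
fake_bind(struct fake_driver *d, int ctx)
{
   if (d->fail_mask & (1u << ctx))
      return false;
   d->bound = ctx;
   return true;
}

static void
fake_unbind(struct fake_driver *d)
{
   d->bound = 0;
}

enum mc_result { MC_OK, MC_FAILED_RESTORED, MC_FAILED_UNBOUND };

/* api_ctx models the context the API layer believes is current. */
static enum mc_result
make_current(struct fake_driver *d, int *api_ctx, int new_ctx)
{
   const int old_ctx = *api_ctx;

   /* The old context is always unbound first. */
   if (old_ctx)
      fake_unbind(d);
   *api_ctx = new_ctx;

   if (new_ctx == 0 || fake_bind(d, new_ctx))
      return MC_OK;

   /* Binding failed: try to restore the old context on both sides. */
   if (old_ctx == 0 || fake_bind(d, old_ctx)) {
      *api_ctx = old_ctx;
      return MC_FAILED_RESTORED;
   }

   /* The old state cannot be restored; keep both sides unbound. */
   *api_ctx = 0;
   return MC_FAILED_UNBOUND;
}
```

In both failure cases the caller still gets an error, but the API-side and driver-side views of the current context never diverge.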



Re: [Mesa-dev] Android + tegra + mesa

2019-01-18 Thread Artyom Bambalov
> Do you have some notes on how to reproduce your setup? I've been meaning
> to try this out but never found enough time to see it through. Having a
> set of simple instructions for getting this built and booted would help
> reproduce the behavior and debug.

My device is a Xiaomi MiPad 1 (codename mocha). It's similar to the Shield
Tablet (tn8), but uses a different LCD panel, a different PMIC (tps65913 like
in t114 instead of as3xxx) and some other secondary hardware.

My device tree (lineage-16.0 branch):
https://github.com/art/android_device_xiaomi_mocha_mainline
My kernel (android-4.19_mocha-changes branch):
https://github.com/Insei/linux

> That said, it looks like you've got a completely white display, which
> usually means that you're getting page faults from the SMMU, so maybe
> that'd be an interesting place to look at. You should be seeing errors
> from the SMMU in the kernel log in that case.

The display can be white or black. It depends on the real picture color.

> There was a brief period where the SMMU wasn't working properly with
> Nouveau, which could be related to this, especially since it was around
> the timeframe of 4.19. That issue was fixed with this commit:
>
> b59fb482b522 ("drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping")
>
> which was merged into v4.19, but maybe it's not in the tree from Google
> that you're using (for whatever reason).
>
> Come to think of it, that wouldn't really explain the white display,
> though, since that usually happens on SMMU faults for reads by the
> display controller. In any case, the kernel log would hopefully contain
> some clues as to what could be wrong.
>
> Thierry

I attached dmesg. It contains quite a few nouveau errors.

[0.00] Booting Linux on physical CPU 0x0
[0.00] Linux version 
4.19.13-4.19_mocha-build27-17834-g07c6e936c512-dirty (art@eOS-5.0-Juno) 
(gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision 
d29120a424ecfbc167ef90065c0eeb7f91977701] (Linaro GCC 7.3-2018.05)) #1 SMP 
PREEMPT Sat Jan 12 00:27:20 MSK 2019
[0.00] CPU: ARMv7 Processor [413fc0f3] revision 3 (ARMv7), cr=10c5387d
[0.00] CPU: div instructions available: patching division code
[0.00] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
[0.00] OF: fdt: Machine model: NVIDIA Tegra124 MOCHA
[0.00] Memory policy: Data cache writealloc
[0.00] cma: Reserved 64 MiB at 0xf940
[0.00] On node 0 totalpages: 513024
[0.00] Normal zone: 1728 pages used for memmap
[0.00] Normal zone: 0 pages reserved
[0.00] Normal zone: 196608 pages, LIFO batch:63
[0.00] HighMem zone: 316416 pages, LIFO batch:63
[0.00] psci: probing for conduit method from DT.
[0.00] psci: PSCIv0.2 detected in firmware.
[0.00] psci: Using standard PSCI v0.2 function IDs
[0.00] psci: MIGRATE_INFO_TYPE not supported.
[0.00] random: get_random_bytes called from start_kernel+0xa0/0x49c 
with crng_init=0
[0.00] percpu: Embedded 17 pages/cpu @(ptrval) s40844 r8192 d20596 
u69632
[0.00] pcpu-alloc: s40844 r8192 d20596 u69632 alloc=17*4096
[0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
[0.00] Built 1 zonelists, mobility grouping on.  Total pages: 511296
[0.00] Kernel command line: vpr_resize androidboot.selinux=permissive 
androidboot.hardware=tn8 buildvariant=userdebug tegraid=40.1.1.0.0 
mem=2006M@2048M memtype=0 tsec=32M@4064M 
otf_key=df908e015017cc8c8111a390a434436e tzram=4M@4058M ddr_die=1024M@2048M 
ddr_die=1024M@3072M section=128M commchip_id=0 usb_port_owner_info=0 
lane_owner_info=0 emc_max_dvfs=0 androidboot.serialno=1413E094 
androidboot.cpuid=17409C1470C000BFD00C0 androidboot.commchip_id=0 
androidboot.modem=none bl_version=1.0 syspart=system  touch_id=0@2150632068 
video=tegrafb no_console_suspend=1 console=none debug_uartport=hsport 
usbcore.old_scheme_first=1 lp0_vec=2896@0xfdfff000 
tegra_fbmem=25296896@0xad012000 nvdumper_reserved=0xfd70 core_edp_mv=1150 
core_edp_ma=4000 pmuboard=0x06c8:0x03e8:0x00:0x44:0x04 
displayboard=0x8038:0x:0x00:0x00:0x00 power_supply=Battery 
board_info=0x06f4:0x044c:0x03:0x41:0x07 tegraboot=sdmmc gpt gpt_sector=4095 
modem_id=0 androidboot.hwversion=1 panel_id=32 android.kerneltype=norma
[0.00] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[0.00] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[0.00] Memory: 1946400K/2052096K available (12288K kernel code, 982K 
rwdata, 4656K rodata, 1024K init, 304K bss, 40160K reserved, 65536K 
cma-reserved, 1200128K highmem)
[0.00] Virtual kernel memory layout:
[0.00] vector  : 0x - 0x1000   (   4 kB)
[0.00] fixmap  : 0xffc0 - 0xfff0   (3072 kB)
[0.00] vmalloc : 0xf080 - 0xff80   ( 240 MB)
[0.00] lowmem  : 0xc000 - 0xf000   ( 768 MB)
[0.00] pkmap   : 0xbfe0 - 0xc000   (   2 MB)
[0.00] modules : 0xbf00 - 0xbfe0   (  14 MB)
[0.00] 

Re: [Mesa-dev] [PATCH v3 18/42] intel/compiler: add a helper function to query hardware type table

2019-01-18 Thread Iago Toral
On Thu, 2019-01-17 at 14:16 -0600, Jason Ekstrand wrote:
> On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga wrote:
> > We open coded this in a couple of places, so a helper function is
> > probably
> > 
> > sensible. Plus it makes it more consistent with the 3src hardware
> > type case.
> > 
> > 
> > 
> > Suggested-by: Topi Pohjolainen 
> > 
> > ---
> > 
> >  src/intel/compiler/brw_reg_type.c | 34 ---
> > 
> > 
> >  1 file changed, 18 insertions(+), 16 deletions(-)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_reg_type.c
> > b/src/intel/compiler/brw_reg_type.c
> > 
> > index 09b3ea61d4c..0c9f522eca0 100644
> > 
> > --- a/src/intel/compiler/brw_reg_type.c
> > 
> > +++ b/src/intel/compiler/brw_reg_type.c
> > 
> > @@ -193,6 +193,20 @@ static const struct hw_3src_type {
> > 
> >  #undef E
> > 
> >  };
> > 
> > 
> > 
> > +static inline const struct hw_type *
> > 
> > +get_hw_type_map(const struct gen_device_info *devinfo, uint32_t
> > *size)
> > 
> > +{
> > 
> > +   if (devinfo->gen >= 11) {
> > 
> > +  if (size)
> > 
> > + *size = ARRAY_SIZE(gen11_hw_type);
> 
> All of these type tables have the same size because we declare
> everything through BRW_REGISTER_TYPE_LAST.  Do we really need to be
> returning the array size separately for every table?

Good point, I guess we can drop that in this patch and the previous and
simply check that whatever type we are indexing is <=
BRW_REGISTER_TYPE_LAST. I'll do that.
> > +  return gen11_hw_type;
> > 
> > +   } else {
> > 
> > +  if (size)
> > 
> > + *size = ARRAY_SIZE(gen4_hw_type);
> > 
> > +  return gen4_hw_type;
> > 
> > +   }
> > 
> > +}
> > 
> > +
> > 
> >  /**
> > 
> >   * Convert a brw_reg_type enumeration value into the hardware
> > representation.
> > 
> >   *
> > 
> > @@ -203,16 +217,10 @@ brw_reg_type_to_hw_type(const struct
> > gen_device_info *devinfo,
> > 
> >  enum brw_reg_file file,
> > 
> >  enum brw_reg_type type)
> > 
> >  {
> > 
> > -   const struct hw_type *table;
> > 
> > -
> > 
> > -   if (devinfo->gen >= 11) {
> > 
> > -  assert(type < ARRAY_SIZE(gen11_hw_type));
> > 
> > -  table = gen11_hw_type;
> > 
> > -   } else {
> > 
> > -  assert(type < ARRAY_SIZE(gen4_hw_type));
> > 
> > -  table = gen4_hw_type;
> > 
> > -   }
> > 
> > +   uint32_t table_size;
> > 
> > +   const struct hw_type *table = get_hw_type_map(devinfo,
> > &table_size);
> > 
> > 
> > 
> > +   assert(type < table_size);
> > 
> > assert(devinfo->has_64bit_types || brw_reg_type_to_size(type) <
> > 8 ||
> > 
> >type == BRW_REGISTER_TYPE_NF);
> > 
> > 
> > 
> > @@ -234,13 +242,7 @@ enum brw_reg_type
> > 
> >  brw_hw_type_to_reg_type(const struct gen_device_info *devinfo,
> > 
> >  enum brw_reg_file file, unsigned hw_type)
> > 
> >  {
> > 
> > -   const struct hw_type *table;
> > 
> > -
> > 
> > -   if (devinfo->gen >= 11) {
> > 
> > -  table = gen11_hw_type;
> > 
> > -   } else {
> > 
> > -  table = gen4_hw_type;
> > 
> > -   }
> > 
> > +   const struct hw_type *table = get_hw_type_map(devinfo, NULL);
> > 
> > 
> > 
> > if (file == BRW_IMMEDIATE_VALUE) {
> > 
> >for (enum brw_reg_type i = 0; i <= BRW_REGISTER_TYPE_LAST;
> > i++) {
> > 
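
The simplification agreed on in this thread (drop the size out-parameter and bound the index by the last enum value) might look roughly like the sketch below. The enum, the table contents and the function names are hypothetical placeholders; only the shape of the helper follows the discussion.

```c
#include <assert.h>

/* Hypothetical register types; TYPE_LAST plays the role of
 * BRW_REGISTER_TYPE_LAST.
 */
enum reg_type { TYPE_F, TYPE_HF, TYPE_D, TYPE_LAST = TYPE_D };

struct hw_type {
   int reg_type;
};

/* Every table is sized by TYPE_LAST + 1, so all tables have the same
 * length and no per-table size needs to be returned. The encodings are
 * made-up values for illustration only.
 */
static const struct hw_type gen4_table[TYPE_LAST + 1] = {
   [TYPE_F] = { 1 }, [TYPE_HF] = { 2 }, [TYPE_D] = { 3 },
};

static const struct hw_type gen11_table[TYPE_LAST + 1] = {
   [TYPE_F] = { 11 }, [TYPE_HF] = { 12 }, [TYPE_D] = { 13 },
};

static const struct hw_type *
get_hw_type_map(int gen)
{
   return gen >= 11 ? gen11_table : gen4_table;
}

static int
reg_type_to_hw_type(int gen, enum reg_type type)
{
   /* A single bound check is valid for whichever table is selected. */
   assert((unsigned)type <= TYPE_LAST);
   return get_hw_type_map(gen)[type].reg_type;
}
```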


Re: [Mesa-dev] [PATCH] radv: avoid context rolls when binding graphics pipelines

2019-01-18 Thread Bas Nieuwenhuizen
Besides CTS, I'd appreciate it if you could test with Talos, as that was
the most affected by bugs in this code.

Otherwise,

Reviewed-by: Bas Nieuwenhuizen 

On Mon, Jan 14, 2019 at 5:02 PM Rhys Perry  wrote:
>
> It's common in some applications to bind a new graphics pipeline without
> ending up changing any context registers.
>
> This makes a pipeline have two command buffers: one for setting context
> registers and one for everything else. The context register command buffer
> is only emitted if it differs from the previous pipeline's.
>
> Signed-off-by: Rhys Perry 
> ---
>  src/amd/vulkan/radv_cmd_buffer.c |  46 +--
>  src/amd/vulkan/radv_pipeline.c   | 217 ---
>  src/amd/vulkan/radv_private.h|   2 +
>  3 files changed, 150 insertions(+), 115 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c 
> b/src/amd/vulkan/radv_cmd_buffer.c
> index f41d6c0b3e7..59903ab64d8 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -634,7 +634,7 @@ radv_emit_descriptor_pointers(struct radv_cmd_buffer 
> *cmd_buffer,
> }
>  }
>
> -static void
> +static bool
>  radv_update_multisample_state(struct radv_cmd_buffer *cmd_buffer,
>   struct radv_pipeline *pipeline)
>  {
> @@ -646,7 +646,7 @@ radv_update_multisample_state(struct radv_cmd_buffer 
> *cmd_buffer,
> cmd_buffer->sample_positions_needed = true;
>
> if (old_pipeline && num_samples == 
> old_pipeline->graphics.ms.num_samples)
> -   return;
> +   return false;
>
> radeon_set_context_reg_seq(cmd_buffer->cs, R_028BDC_PA_SC_LINE_CNTL, 
> 2);
> radeon_emit(cmd_buffer->cs, ms->pa_sc_line_cntl);
> @@ -661,6 +661,8 @@ radv_update_multisample_state(struct radv_cmd_buffer 
> *cmd_buffer,
> radeon_emit(cmd_buffer->cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
> radeon_emit(cmd_buffer->cs, EVENT_TYPE(V_028A90_FLUSH_DFSM) | 
> EVENT_INDEX(0));
> }
> +
> +   return true;
>  }
>
>  static void
> @@ -863,15 +865,15 @@ radv_emit_rbplus_state(struct radv_cmd_buffer 
> *cmd_buffer)
> radeon_emit(cmd_buffer->cs, sx_blend_opt_control);
>  }
>
> -static void
> +static bool
>  radv_emit_graphics_pipeline(struct radv_cmd_buffer *cmd_buffer)
>  {
> struct radv_pipeline *pipeline = cmd_buffer->state.pipeline;
>
> if (!pipeline || cmd_buffer->state.emitted_pipeline == pipeline)
> -   return;
> +   return false;
>
> -   radv_update_multisample_state(cmd_buffer, pipeline);
> +   bool context_roll = radv_update_multisample_state(cmd_buffer, 
> pipeline);
>
> cmd_buffer->scratch_size_needed =
>   MAX2(cmd_buffer->scratch_size_needed,
> @@ -884,6 +886,15 @@ radv_emit_graphics_pipeline(struct radv_cmd_buffer 
> *cmd_buffer)
>
> radeon_emit_array(cmd_buffer->cs, pipeline->cs.buf, pipeline->cs.cdw);
>
> +   if (!cmd_buffer->state.emitted_pipeline ||
> +   cmd_buffer->state.emitted_pipeline->ctx_cs.cdw != 
> pipeline->ctx_cs.cdw ||
> +   cmd_buffer->state.emitted_pipeline->ctx_cs_hash != 
> pipeline->ctx_cs_hash ||
> +   memcmp(cmd_buffer->state.emitted_pipeline->ctx_cs.buf,
> +  pipeline->ctx_cs.buf, pipeline->ctx_cs.cdw * 4)) {
> +   radeon_emit_array(cmd_buffer->cs, pipeline->ctx_cs.buf, 
> pipeline->ctx_cs.cdw);
> +   context_roll = true;
> +   }
> +
> for (unsigned i = 0; i < MESA_SHADER_COMPUTE; i++) {
> if (!pipeline->shaders[i])
> continue;
> @@ -902,6 +913,8 @@ radv_emit_graphics_pipeline(struct radv_cmd_buffer 
> *cmd_buffer)
> cmd_buffer->state.emitted_pipeline = pipeline;
>
> cmd_buffer->state.dirty &= ~RADV_CMD_DIRTY_PIPELINE;
> +
> +   return context_roll;
>  }
>
>  static void
> @@ -2859,6 +2872,8 @@ radv_emit_compute_pipeline(struct radv_cmd_buffer 
> *cmd_buffer)
> if (!pipeline || pipeline == 
> cmd_buffer->state.emitted_compute_pipeline)
> return;
>
> +   assert(!pipeline->ctx_cs.cdw);
> +
> cmd_buffer->state.emitted_compute_pipeline = pipeline;
>
> radeon_check_space(cmd_buffer->device->ws, cmd_buffer->cs, 
> pipeline->cs.cdw);
> @@ -3609,30 +3624,30 @@ radv_emit_draw_packets(struct radv_cmd_buffer 
> *cmd_buffer,
>   * any context registers.
>   */
>  static bool radv_need_late_scissor_emission(struct radv_cmd_buffer 
> *cmd_buffer,
> -bool indexed_draw)
> +bool indexed_draw,
> +bool pipeline_context_roll)
>  {
> struct radv_cmd_state *state = &cmd_buffer->state;
>
> if (!cmd_buffer->device->physical_device->has_scissor_bug)
> return false;
>
> +   if (pipeline_context_roll)
> +   return true;
> +
> uint32_t 

Re: [Mesa-dev] [PATCH v3 19/42] intel/compiler: don't compact 3-src instructions with Src1Type or Src2Type bits

2019-01-18 Thread Iago Toral
On Thu, 2019-01-17 at 14:14 -0600, Jason Ekstrand wrote:
> On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga wrote:
> > We are now using these bits, so don't assert that they are not set,
> > just
> > 
> > avoid compaction in that case.
> > 
> > 
> > 
> > Reviewed-by: Topi Pohjolainen 
> > 
> > ---
> > 
> >  src/intel/compiler/brw_eu_compact.c | 5 -
> > 
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_eu_compact.c
> > b/src/intel/compiler/brw_eu_compact.c
> > 
> > index ae14ef10ec0..20fed254331 100644
> > 
> > --- a/src/intel/compiler/brw_eu_compact.c
> > 
> > +++ b/src/intel/compiler/brw_eu_compact.c
> > 
> > @@ -928,8 +928,11 @@ has_3src_unmapped_bits(const struct
> > gen_device_info *devinfo,
> > 
> >assert(!brw_inst_bits(src, 127, 126) &&
> > 
> >   !brw_inst_bits(src, 105, 105) &&
> > 
> >   !brw_inst_bits(src, 84, 84) &&
> > 
> > - !brw_inst_bits(src, 36, 35) &&
> > 
> >   !brw_inst_bits(src, 7,  7));
> > 
> > +
> > 
> > +  /* Src1Type and Src2Type, used for mixed-precision floating
> > point */
> > 
> > +  if (brw_inst_bits(src, 36, 35))
> > 
> > + return true;
> 
> You're only doing this in the broadwell case.  What about SKL+ and
> CHV?  Can we compact mixed-precision stuff there?  Looks like maybe
> we can but there should be at least something in the commit message
> about that.

On these platforms compaction is possible in some cases, and
set_3src_control_index() takes care of this by including these bits
in a table lookup for accepted combinations. I can add this to the
commit message:
"We are now using these bits, so don't assert that they are not set. In
gen8, if these bits are set, compaction is not possible. On gen9 and CHV
platforms set_3src_control_index() checks these bits (and others)
against a table to validate whether the particular bit combination is
eligible for compaction or not."
With that said, if I am reading this correctly, it looks like all
entries in gen8_3src_control_index_table that allow compaction require
these bits to be zero at present, so right now we could also just extend
the check I am adding here for BDW to other platforms. However, relying
on the array of accepted bit combinations is more robust should any
future platforms allow more combinations.
> > }
> > 
> > 
> > 
> > return false;
> > 
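
For readers unfamiliar with the bit-range check in this patch, here is a minimal sketch of the idea. The struct inst layout and the helper names are assumptions made for illustration, and the bit extraction only handles fields that do not straddle the 64-bit word boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit instruction: data[0] holds bits 63:0 and
 * data[1] holds bits 127:64.
 */
struct inst {
   uint64_t data[2];
};

/* Extract bits high:low (inclusive). Assumes the field lies entirely
 * within one 64-bit word.
 */
static uint64_t
inst_bits(const struct inst *i, unsigned high, unsigned low)
{
   const unsigned word = high / 64;
   high %= 64;
   low %= 64;

   const uint64_t mask = ~0ull >> (64 - (high - low + 1));
   return (i->data[word] >> low) & mask;
}

/* Bits 36:35 carry Src1Type and Src2Type. If either is set, this model
 * treats the instruction as not compactable instead of asserting.
 */
static bool
has_mixed_precision_bits(const struct inst *i)
{
   return inst_bits(i, 36, 35) != 0;
}
```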


Re: [Mesa-dev] [PATCH v4] anv/device: fix maximum number of images supported

2019-01-18 Thread Iago Toral
Argh, I am really sorry about that :-(

It seems I didn't push the right version of the patch (the one I sent for
review) but a previous version of it. The patch that Lionel sent to
fix this is exactly what I had changed in the version I sent for
review.

I am sorry for the mess, I'll be more careful next time.

Iago

On Thu, 2019-01-17 at 10:13 -0800, Mark Janes wrote:
> Hi Iago,
> 
> It looks like you tested this patch in CI and got the same failures
> that
> we are seeing on master:
> 
> 
http://mesa-ci-results.jf.intel.com/itoral/builds/263/group/63a9f0ea7bb98050796b649e85481845
> 
> Why was this patch pushed?
> 
> -Mark
> 
> Mark Janes  writes:
> 
> > This patch regresses thousands of tests on BDW and HSW:
> > 
> > 
http://mesa-ci-results.jf.intel.com/vulkancts/builds/10035/group/63a9f0ea7bb98050796b649e85481845#fails
> > 
> > I'll revert it as soon as my testing completes.
> > 
> > Iago Toral Quiroga  writes:
> > 
> > > We had defined MAX_IMAGES as 8, which we used to size the array
> > > for
> > > image push constant data. The comment there stated that this was
> > > for
> > > gen8, but anv_nir_apply_pipeline_layout runs for all gens and
> > > writes
> > > that array, asserting that we don't exceed that number of images,
> > > which imposes a limit of MAX_IMAGES on all gens.
> > > 
> > > Furthermore, despite this, we are exposing up to 64 images per
> > > shader
> > > stage on all gens, gen8 included.
> > > 
> > > This patch lowers the number of images we expose in gen8 to 8 and
> > > keeps 64 images for gen9+ while making sure that only pre-SKL
> > > gens
> > > use push constant space to handle images.
> > > 
> > > v2:
> > >  - <= instead of < in the assert (Eric, Lionel)
> > >  - Change the way the assertion is written (Eric)
> > > 
> > > v3:
> > >  - Revert the way the assertion is written to the form it had in
> > > v1,
> > >the version in v2 was not equivalent and was incorrect.
> > > (Lionel)
> > > 
> > > v4:
> > >  - gen9+ doesn't need push constants for images at all (Jason)
> > > ---
> > >  src/intel/vulkan/anv_device.c |  7 --
> > >  .../vulkan/anv_nir_apply_pipeline_layout.c|  4 +--
> > >  src/intel/vulkan/anv_private.h|  5 ++--
> > >  src/intel/vulkan/genX_cmd_buffer.c| 25
> > > +--
> > >  4 files changed, 28 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/src/intel/vulkan/anv_device.c
> > > b/src/intel/vulkan/anv_device.c
> > > index 523f1483e29..f85458b672e 100644
> > > --- a/src/intel/vulkan/anv_device.c
> > > +++ b/src/intel/vulkan/anv_device.c
> > > @@ -987,9 +987,12 @@ void anv_GetPhysicalDeviceProperties(
> > > const uint32_t max_samplers = (devinfo->gen >= 8 || devinfo-
> > > >is_haswell) ?
> > >   128 : 16;
> > >  
> > > +   const uint32_t max_images = devinfo->gen < 9 ?
> > > MAX_GEN8_IMAGES : MAX_IMAGES;
> > > +
> > > VkSampleCountFlags sample_counts =
> > >isl_device_get_sample_counts(&pdevice->isl_dev);
> > >  
> > > +
> > > VkPhysicalDeviceLimits limits = {
> > >.maxImageDimension1D  = (1 << 14),
> > >.maxImageDimension2D  = (1 << 14),
> > > @@ -1009,7 +1012,7 @@ void anv_GetPhysicalDeviceProperties(
> > >.maxPerStageDescriptorUniformBuffers  = 64,
> > >.maxPerStageDescriptorStorageBuffers  = 64,
> > >.maxPerStageDescriptorSampledImages   = max_samplers,
> > > -  .maxPerStageDescriptorStorageImages   = 64,
> > > +  .maxPerStageDescriptorStorageImages   = max_images,
> > >.maxPerStageDescriptorInputAttachments= 64,
> > >.maxPerStageResources = 250,
> > >.maxDescriptorSetSamplers = 6 *
> > > max_samplers, /* number of stages * maxPerStageDescriptorSamplers
> > > */
> > > @@ -1018,7 +1021,7 @@ void anv_GetPhysicalDeviceProperties(
> > >.maxDescriptorSetStorageBuffers   = 6 *
> > > 64,   /* number of stages *
> > > maxPerStageDescriptorStorageBuffers */
> > >.maxDescriptorSetStorageBuffersDynamic=
> > > MAX_DYNAMIC_BUFFERS / 2,
> > >.maxDescriptorSetSampledImages= 6 *
> > > max_samplers, /* number of stages *
> > > maxPerStageDescriptorSampledImages */
> > > -  .maxDescriptorSetStorageImages= 6 *
> > > 64,   /* number of stages *
> > > maxPerStageDescriptorStorageImages */
> > > +  .maxDescriptorSetStorageImages= 6 *
> > > max_images,   /* number of stages *
> > > maxPerStageDescriptorStorageImages */
> > >.maxDescriptorSetInputAttachments = 256,
> > >.maxVertexInputAttributes = MAX_VBS,
> > >.maxVertexInputBindings   = MAX_VBS,
> > > diff --git a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
> > > b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
> > > index b3daf702bc0..623984b0f8c 100644
> > > --- 
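
The limit selection described in the commit message can be summarized in a few lines. This is a sketch under assumptions: the value 8 for MAX_GEN8_IMAGES and 64 for MAX_IMAGES are taken from the commit message text rather than from anv_private.h, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed values, taken from the commit message: 8 images on gen8 and
 * earlier, 64 on gen9+.
 */
#define MAX_GEN8_IMAGES 8
#define MAX_IMAGES      64

/* Number of shader stages used for the per-set limits in the patch. */
#define NUM_STAGES 6

/* Models maxPerStageDescriptorStorageImages. */
static uint32_t
max_storage_images_per_stage(int gen)
{
   return gen < 9 ? MAX_GEN8_IMAGES : MAX_IMAGES;
}

/* Models maxDescriptorSetStorageImages:
 * number of stages * per-stage limit.
 */
static uint32_t
max_storage_images_per_set(int gen)
{
   return NUM_STAGES * max_storage_images_per_stage(gen);
}
```

With these assumed values, gen8 exposes 8 storage images per stage and 48 per descriptor set, while gen9+ exposes 64 and 384.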

Re: [Mesa-dev] Android + tegra + mesa

2019-01-18 Thread Thierry Reding
On Fri, Jan 18, 2019 at 12:28:07AM +0300, Artyom Bambalov wrote:
> Hello, guys. I have a device with a Tegra K1 (T124). I am trying to get an
> open source graphics stack running. What I use: Mesa 3D (18.2-19.0),
> libdrm (2.4.96), drm_hwcomposer (the latest), gbm_gralloc (the latest),
> 4.19 kernel by Google (tegra drm + nouveau). What I get: a normal boot
> animation, but the system UI is not displayed properly (look at the
> attached picture). I also tested the kernel with a Linux distribution and
> everything works fine. So, I wanted to ask: is it possible to get a normal
> picture on Android with nouveau on Tegra K1 (T124)?

Do you have some notes on how to reproduce your setup? I've been meaning
to try this out but never found enough time to see it through. Having a
set of simple instructions for getting this built and booted would help
reproduce the behavior and debug.

That said, it looks like you've got a completely white display, which
usually means that you're getting page faults from the SMMU, so maybe
that'd be an interesting place to look at. You should be seeing errors
from the SMMU in the kernel log in that case.

There was a brief period where the SMMU wasn't working properly with
Nouveau, which could be related to this, especially since it was around
the timeframe of 4.19. That issue was fixed with this commit:

b59fb482b522 ("drm/nouveau: tegra: Detach from ARM DMA/IOMMU mapping")

which was merged into v4.19, but maybe it's not in the tree from Google
that you're using (for whatever reason).

Come to think of it, that wouldn't really explain the white display,
though, since that usually happens on SMMU faults for reads by the
display controller. In any case, the kernel log would hopefully contain
some clues as to what could be wrong.

Thierry




Re: [Mesa-dev] [PATCH v3 13/42] intel/compiler: lower 16-bit flrp

2019-01-18 Thread Iago Toral
Sorry about that, I split them like this because one change is for NIR
and the other one is for Intel so I thought it was preferred to keep
changes like this in separate patches. Anyway, I understand that for
small changes like this it is not helpful, I'll keep that in mind for
future patches.
On Thu, 2019-01-17 at 14:04 -0600, Jason Ekstrand wrote:
> Again, please squash with the previous patch.  Splitting stuff this
> granular just makes review harder.
> 
> On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga wrote:
> > Reviewed-by: Jason Ekstrand 
> > 
> > ---
> > 
> >  src/intel/compiler/brw_compiler.c | 1 +
> > 
> >  1 file changed, 1 insertion(+)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_compiler.c
> > b/src/intel/compiler/brw_compiler.c
> > 
> > index f885e79c3e6..04a1a7cac4e 100644
> > 
> > --- a/src/intel/compiler/brw_compiler.c
> > 
> > +++ b/src/intel/compiler/brw_compiler.c
> > 
> > @@ -33,6 +33,7 @@
> > 
> > .lower_sub = true, 
> >\
> > 
> > .lower_fdiv = true,   
> > \
> > 
> > .lower_scmp = true,   
> > \
> > 
> > +   .lower_flrp16 = true, 
> > \
> > 
> > .lower_fmod16 = true, 
> > \
> > 
> > .lower_fmod32 = true, 
> > \
> > 
> > .lower_fmod64 = false, 
> >\
> > 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 09/42] intel/compiler: allow extended math functions with HF operands

2019-01-18 Thread Iago Toral
Sure, I'll squash all 3.

On Thu, 2019-01-17 at 14:00 -0600, Jason Ekstrand wrote:
> This should probably be squashed into patch 7 so that it's clear from
> one patch that it's being properly handed on both sides of the gen9
> boundary.  I'd also squash in patch 15 and just give the whole thing
> the title "Handle extended math restrictions for half-float" with a
> detailed message describing what all we have to handle.  If you want
> to keep it as two patches, that's fine but unnecessary.
> 
> 
> On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga wrote:
> > The PRM states that half-float operands are supported since gen9.
> > 
> > 
> > 
> > Reviewed-by: Topi Pohjolainen 
> > 
> > ---
> > 
> >  src/intel/compiler/brw_eu_emit.c | 6 --
> > 
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_eu_emit.c
> > b/src/intel/compiler/brw_eu_emit.c
> > 
> > index 45e2552783b..e21df4624b3 100644
> > 
> > --- a/src/intel/compiler/brw_eu_emit.c
> > 
> > +++ b/src/intel/compiler/brw_eu_emit.c
> > 
> > @@ -1874,8 +1874,10 @@ void gen6_math(struct brw_codegen *p,
> > 
> >assert(src1.file == BRW_GENERAL_REGISTER_FILE ||
> > 
> >   (devinfo->gen >= 8 && src1.file ==
> > BRW_IMMEDIATE_VALUE));
> > 
> > } else {
> > 
> > -  assert(src0.type == BRW_REGISTER_TYPE_F);
> > 
> > -  assert(src1.type == BRW_REGISTER_TYPE_F);
> > 
> > +  assert(src0.type == BRW_REGISTER_TYPE_F ||
> > 
> > + (src0.type == BRW_REGISTER_TYPE_HF && devinfo->gen >=
> > 9));
> > 
> > +  assert(src1.type == BRW_REGISTER_TYPE_F ||
> > 
> > + (src1.type == BRW_REGISTER_TYPE_HF && devinfo->gen >=
> > 9));
> > 
> > }
> > 
> > 
> > 
> > /* Source modifiers are ignored for extended math instructions
> > on Gen6. */
> > 
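
The relaxed assertion in this patch boils down to a small predicate, sketched below with hypothetical names. It assumes only what the commit message states: single-precision operands are always accepted, and half-float operands are accepted from gen9 onwards.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for BRW_REGISTER_TYPE_*. */
enum reg_type { TYPE_F, TYPE_HF, TYPE_D };

/* Returns true if a float source type is acceptable for an extended
 * math instruction on the given hardware generation.
 */
static bool
math_float_src_ok(enum reg_type type, int gen)
{
   return type == TYPE_F || (type == TYPE_HF && gen >= 9);
}
```

Folding both conditions into one helper keeps the rule for both sides of the gen9 boundary in a single place.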
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev