Re: [Mesa-dev] [PATCH 48/95] i965/vec4: add a force_vstride0 flag to src_reg

2016-09-13 Thread Iago Toral
On Tue, 2016-09-13 at 22:12 -0700, Francisco Jerez wrote:
> Iago Toral  writes:
> 
> > 
> > On Mon, 2016-09-12 at 14:05 -0700, Francisco Jerez wrote:
> > > 
> > > Iago Toral Quiroga  writes:
> > > 
> > > > 
> > > > 
> > > > We will use this in cases where we want to force the vstride of
> > > > a
> > > > src_reg
> > > > to 0 to exploit a particular behavior of the hardware. It will
> > > > come
> > > > in
> > > > handy to implement access to components Z/W.
> > > > ---
> > > >  src/mesa/drivers/dri/i965/brw_ir_vec4.h | 1 +
> > > >  src/mesa/drivers/dri/i965/brw_vec4.cpp  | 2 ++
> > > >  2 files changed, 3 insertions(+)
> > > > 
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> > > > b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> > > > index f66c093..f3cce4b 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> > > > +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> > > > @@ -51,6 +51,7 @@ public:
> > > > > explicit src_reg(const dst_reg &reg);
> > > >  
> > > > src_reg *reladdr;
> > > > +   bool force_vstride0;
> > > I was wondering whether it would make more sense to unify this
> > > with
> > > the
> > > FS back-end's fs_reg::stride (a numeric stride field is also
> > > likely
> > > more
> > > convenient to do arithmetic on than a boolean) and promote it to
> > > backend_reg?  It could be defined as the number of components to
> > > jump
> > > over for each logical channel of the register, which is just the
> > > vstride
> > > in single-precision SIMD4x2 and the hstride in scalar mode.
> > We could do that, but I thought it would be a good idea to make it
> > clear that here we are using the vstride=0 with a very specific
> > > intention and we don't expect the hardware to do what would normally be
> > expected (we are trying to exploit a hardware bug after all). If we
> > were to use a normal stride field for this I think we would make
> > this
> > intention much less obvious and other people reading the code would
> > have a much harder time understanding what is really going on.
> > Since we
> > are being tricky here I think the extra field to signal that we are
> > trying to do something "special" might be worth it: people can
> > track
> > where we read and write that field and see exactly where it is
> > being
> > used for the purpose of exploiting this particular hardware
> > behavior.
> > 
> Yes, I agree that the hardware's behavior on Gen7 with non-identity
> vstride is tricky and special -- Special enough that *none* of the
> VEC4
> optimization passes and IR-handling code need to be aware of it,
> because
> the field is only going to be used as internal book-keeping data
> structure in convert_to_hw_regs() and immediately discarded.  IOW
> you're
> storing an internal data structure of convert_to_hw_regs() as part of
> the shared IR data structure, with no well-defined semantics and
> which
> no back-end code (not even convert_to_hw_regs()) is going to be able
> to
> honor.
> 
> So if your argument for making the representation of vstride
> unnecessarily non-orthogonal is that you want to discourage people
> from
> using it at the IR level (which is fair because it won't work at
> all!),
> I would argue that it doesn't belong in the IR data structures in the
> first place, because you could just keep convert_to_hw_regs' internal
> data structures internal to convert_to_hw_regs.  (I don't actually
> think
> you need the data structure, neither internal nor external, but more
> on
> that later)

Yes, that makes sense.

> > 
> > > 
> > > But thinking about it some more, I wonder if it's really
> > > necessary to
> > > > expose vertical strides at the IR level?  Aren't you planning to
> > > use
> > > this
> > > during the conversion to HW registers exclusively?  Why don't you
> > > set
> > > the vstride field directly in that case?
> > Yes, this is used exclusively at that time. The conversion to
> > hardware
> > registers in convert_to_hw_regs() happens in two stages now:
> > 
> > We call our 'expand_64bit_swizzle_to_32bit()' helper first. This
> > one
> > > takes care of checking the regioning on DF instructions, translating
> > > swizzles and setting force_vstride0 to true when needed (which is also
> > the
> > only place that would set this to true). Then the rest of the code
> > in
> > convert_to_hw_regs() just operates as usual, only that it will
> > check
> > the force_vstride0 setting to decide the vstride to use for DF
> > regions.
> > 
> > I did it like this because it allows us to keep the DF swizzle
> > translation and regioning checking logic separated from the
> > conversion
> > to hardware registers, but this separation means that we need to
> > tell
> > the latter when it has to set the vstride to 0, thus the addition
> > of
> > > the force_vstride0 field. I think having these two things separated
> > makes sense and makes the code easier to read. We can keep both
> > things
> > separate and still avoid the force_vstride0 field by using a stride
> > field as you suggest above, but as I 

[Mesa-dev] [PATCH] i965/fs: Take the sample mask into account in FIND_LIVE_CHANNEL

2016-09-13 Thread Jason Ekstrand
Just looking at the channel enables is not sufficient, at least not on Sky
Lake.  Channels that are disabled by the sample_mask may show up in the
channel enable register as being enabled even if they are not executing.
This can cause FIND_LIVE_CHANNEL to return a channel that isn't actually
executing.  In our handling of interpolateAtSample we do a clever trick
with emit_uniformize to call the interpolator once for each unique sample
id.  Thanks to FIND_LIVE_CHANNEL returning a dead channel, we can get an
infinite loop which hangs the GPU.
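
As an illustration of the idea only (this is not the code in the patch
below), the fix amounts to AND-ing the channel enables with the sample mask
before looking for the first set bit. A minimal software sketch, where
first_bit_low() and find_live_channel() are hypothetical helper names and
returning -1 for an empty mask is a choice made for this sketch:

```c
#include <stdint.h>

/* Index of the lowest set bit, or -1 if no bit is set.  This is a
 * software stand-in for the find-first-bit step; the -1 result for an
 * empty mask is specific to this sketch. */
static int first_bit_low(uint32_t v)
{
   for (int i = 0; i < 32; i++) {
      if (v & (UINT32_C(1) << i))
         return i;
   }
   return -1;
}

/* Hypothetical helper: pick the first channel that is both enabled in
 * the channel enable mask and covered by the sample mask. */
static int find_live_channel(uint32_t exec_mask, uint32_t sample_mask)
{
   return first_bit_low(exec_mask & sample_mask);
}
```

With exec_mask = 0xf and sample_mask = 0xc, looking at the channel enables
alone would pick channel 0, which is not executing; the masked version
picks channel 2.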

Signed-off-by: Jason Ekstrand 
---
 src/mesa/drivers/dri/i965/brw_eu.h   |  3 ++-
 src/mesa/drivers/dri/i965/brw_eu_emit.c  | 22 +++---
 src/mesa/drivers/dri/i965/brw_fs_builder.h   |  3 ++-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  2 +-
 5 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu.h 
b/src/mesa/drivers/dri/i965/brw_eu.h
index 3e52764..9aaab78 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -488,7 +488,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
 
 void
 brw_find_live_channel(struct brw_codegen *p,
-  struct brw_reg dst);
+  struct brw_reg dst,
+  struct brw_reg sample_mask);
 
 void
 brw_broadcast(struct brw_codegen *p,
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 3b12030..f593a8d 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -3361,7 +3361,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
 }
 
 void
-brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
+brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst,
+  struct brw_reg sample_mask)
 {
const struct gen_device_info *devinfo = p->devinfo;
const unsigned exec_size = 1 << brw_inst_exec_size(devinfo, p->current);
@@ -3377,13 +3378,20 @@ brw_find_live_channel(struct brw_codegen *p, struct 
brw_reg dst)
 
   if (devinfo->gen >= 8) {
  /* Getting the first active channel index is easy on Gen8: Just find
-  * the first bit set in the mask register.  The same register exists
-  * on HSW already but it reads back as all ones when the current
-  * instruction has execution masking disabled, so it's kind of
-  * useless.
+  * the first bit set in the mask register AND the sample mask.  The
+  * same register exists on HSW already but it reads back as all ones
+  * when the current instruction has execution masking disabled, so
+  * it's kind of useless.
   */
- inst = brw_FBL(p, vec1(dst),
-retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD));
+ struct brw_reg mask_reg = retype(brw_mask_reg(0),
+  BRW_REGISTER_TYPE_UD);
+ if (sample_mask.file != BRW_IMMEDIATE_VALUE ||
+ sample_mask.ud != 0x) {
+brw_AND(p, vec1(dst), mask_reg, sample_mask);
+mask_reg = vec1(dst);
+ }
+
+ inst = brw_FBL(p, vec1(dst), mask_reg);
 
  /* Quarter control has the effect of magically shifting the value of
   * this register so you'll get the first active channel relative to
diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
b/src/mesa/drivers/dri/i965/brw_fs_builder.h
index 483672f..45b5f88 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
+++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
@@ -407,7 +407,8 @@ namespace brw {
  const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD);
  const dst_reg dst = vgrf(src.type);
 
- ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
+ ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index,
+   sample_mask_reg());
  ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, component(chan_index, 
0));
 
  return src_reg(component(dst, 0));
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 2f4ba7b..d923b0b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -2041,7 +2041,7 @@ fs_generator::generate_code(const cfg_t *cfg, int 
dispatch_width)
  break;
 
   case SHADER_OPCODE_FIND_LIVE_CHANNEL:
- brw_find_live_channel(p, dst);
+ brw_find_live_channel(p, dst, src[0]);
  break;
 
   case SHADER_OPCODE_BROADCAST:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 256abae..63fca6f 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1863,7 +1863,7 @@ generate_code(struct brw_c

Re: [Mesa-dev] [PATCH v2 3/7] intel/isl: Add support for 1-D compressed textures

2016-09-13 Thread Pohjolainen, Topi
On Mon, Sep 12, 2016 at 05:58:20PM -0700, Jason Ekstrand wrote:
> Compressed 1-D textures are a well-defined thing in both GL and Vulkan.

Looks correct to me:

Reviewed-by: Topi Pohjolainen 
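
For reference, the extent computation in the quoted hunk can be sketched as
follows. This is a simplified model, not the isl code: align_npot() is
assumed to round up to a multiple of a block dimension that need not be a
power of two, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Round n up to the next multiple of a; a need not be a power of two. */
static uint32_t align_npot(uint32_t n, uint32_t a)
{
   return ((n + a - 1) / a) * a;
}

/* Hypothetical reduced extent type for this sketch. */
struct extent2d {
   uint32_t w;
   uint32_t h;
};

/* Level-0 extent of a 1-D surface in samples: the width is padded to a
 * whole number of compression blocks and the height is one block high.
 * bw and bh are the block width and height of the format. */
static struct extent2d phys_level0_extent_1d(uint32_t width,
                                             uint32_t bw, uint32_t bh)
{
   struct extent2d e = { align_npot(width, bw), bh };
   return e;
}
```

For an uncompressed format (bw = bh = 1) this reduces to the old
.w = width, .h = 1, so only compressed formats see a change.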

> ---
>  src/intel/isl/isl.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> index a75fddf..185984d 100644
> --- a/src/intel/isl/isl.c
> +++ b/src/intel/isl/isl.c
> @@ -518,7 +518,6 @@ isl_calc_phys_level0_extent_sa(const struct isl_device 
> *dev,
>assert(info->height == 1);
>assert(info->depth == 1);
>assert(info->samples == 1);
> -  assert(!isl_format_is_compressed(info->format));
>  
>switch (dim_layout) {
>case ISL_DIM_LAYOUT_GEN4_3D:
> @@ -527,8 +526,8 @@ isl_calc_phys_level0_extent_sa(const struct isl_device 
> *dev,
>case ISL_DIM_LAYOUT_GEN9_1D:
>case ISL_DIM_LAYOUT_GEN4_2D:
>   *phys_level0_sa = (struct isl_extent4d) {
> -.w = info->width,
> -.h = 1,
> +.w = isl_align_npot(info->width, fmtl->bw),
> +.h = fmtl->bh,
>  .d = 1,
>  .a = info->array_len,
>   };
> -- 
> 2.5.0.400.gff86faf
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] st/vdpau: flush the context before calling flush_frontbuffer

2016-09-13 Thread Nayan Deshmukh
so that the texture is rendered to back buffer before calling
flush_frontbuffer and can be copied to a different buffer in
the function

Signed-off-by: Nayan Deshmukh 
---
 src/gallium/state_trackers/vdpau/presentation.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/gallium/state_trackers/vdpau/presentation.c 
b/src/gallium/state_trackers/vdpau/presentation.c
index 2862eaf..f35d73a 100644
--- a/src/gallium/state_trackers/vdpau/presentation.c
+++ b/src/gallium/state_trackers/vdpau/presentation.c
@@ -271,11 +271,14 @@ vlVdpPresentationQueueDisplay(VdpPresentationQueue 
presentation_queue,
}
 
vscreen->set_next_timestamp(vscreen, earliest_presentation_time);
-   pipe->screen->flush_frontbuffer(pipe->screen, tex, 0, 0,
-   vscreen->get_private(vscreen), NULL);
 
+   // flush before calling flush_frontbuffer so that rendering is flushed
+   //  to back buffer so the texture can be copied in flush_frontbuffer
pipe->screen->fence_reference(pipe->screen, &surf->fence, NULL);
pipe->flush(pipe, &surf->fence, 0);
+   pipe->screen->flush_frontbuffer(pipe->screen, tex, 0, 0,
+   vscreen->get_private(vscreen), NULL);
+
pq->last_surf = surf;
 
if (dump_window == -1) {
-- 
2.7.4


[Mesa-dev] [PATCH 3/3] st/va: flush the context before calling flush_frontbuffer

2016-09-13 Thread Nayan Deshmukh
so that the texture is rendered to back buffer before calling
flush_frontbuffer and can be copied to a different buffer in
the function

Signed-off-by: Nayan Deshmukh 
---
 src/gallium/state_trackers/va/surface.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/va/surface.c 
b/src/gallium/state_trackers/va/surface.c
index 3ee1cdd..05c890d 100644
--- a/src/gallium/state_trackers/va/surface.c
+++ b/src/gallium/state_trackers/va/surface.c
@@ -321,10 +321,13 @@ vlVaPutSurface(VADriverContextP ctx, VASurfaceID 
surface_id, void* draw, short s
   return status;
}
 
+   // flush before calling flush_frontbuffer so that rendering is flushed
+   //  to back buffer so the texture can be copied in flush_frontbuffer
+   drv->pipe->flush(drv->pipe, NULL, 0);
+
screen->flush_frontbuffer(screen, tex, 0, 0,
  vscreen->get_private(vscreen), NULL);
 
-   drv->pipe->flush(drv->pipe, NULL, 0);
 
pipe_resource_reference(&tex, NULL);
pipe_surface_reference(&surf_draw, NULL);
-- 
2.7.4


[Mesa-dev] [PATCH 1/3] vl/dri3: handle the case of different GPU(v4)

2016-09-13 Thread Nayan Deshmukh
In case of prime when rendering is done on a GPU other than the
server GPU, use a separate linear buffer for each back buffer
which will be displayed using the present extension.

v2: Use a separate linear buffer for each back buffer (Michel)
v3: Change variable names and fix coding style (Leo and Emil)
v4: Use PIPE_BIND_SAMPLER_VIEW for back buffer in case when
a separate linear buffer is used (Michel)
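
The bind-flag selection this results in can be summarised with a small
sketch. The flag values, struct and function names here are stand-ins for
illustration only; the real PIPE_BIND_* values come from gallium and the
actual allocation code is in the diff below.

```c
#include <stdbool.h>

/* Stand-in flag values for this sketch only; they are not the real
 * PIPE_BIND_* values. */
enum {
   BIND_RENDER_TARGET = 1 << 0,
   BIND_SAMPLER_VIEW  = 1 << 1,
   BIND_SCANOUT       = 1 << 2,
   BIND_SHARED        = 1 << 3,
   BIND_LINEAR        = 1 << 4
};

/* Hypothetical result type: bind flags for the two resources. */
struct back_buffer_binds {
   unsigned texture;        /* buffer the video is rendered into */
   unsigned linear_texture; /* 0 when no separate linear buffer is used */
};

static struct back_buffer_binds choose_binds(bool is_different_gpu)
{
   struct back_buffer_binds b;
   unsigned base = BIND_RENDER_TARGET | BIND_SAMPLER_VIEW;

   if (is_different_gpu) {
      /* Render into a private texture; a second, linear buffer is the
       * one that gets shared and presented. */
      b.texture = base;
      b.linear_texture = base | BIND_SCANOUT | BIND_SHARED | BIND_LINEAR;
   } else {
      /* Same GPU: the render target itself is shared and presented. */
      b.texture = base | BIND_SCANOUT | BIND_SHARED;
      b.linear_texture = 0;
   }
   return b;
}
```

In both cases exactly one buffer carries the SCANOUT and SHARED flags, and
that is the buffer whose handle is exported for the pixmap.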

Signed-off-by: Nayan Deshmukh 
---
 src/gallium/auxiliary/vl/vl_winsys_dri3.c | 62 ---
 1 file changed, 49 insertions(+), 13 deletions(-)

diff --git a/src/gallium/auxiliary/vl/vl_winsys_dri3.c 
b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
index 3d596a6..f86300d 100644
--- a/src/gallium/auxiliary/vl/vl_winsys_dri3.c
+++ b/src/gallium/auxiliary/vl/vl_winsys_dri3.c
@@ -49,6 +49,7 @@
 struct vl_dri3_buffer
 {
struct pipe_resource *texture;
+   struct pipe_resource *linear_texture;
 
uint32_t pixmap;
uint32_t sync_fence;
@@ -69,6 +70,8 @@ struct vl_dri3_screen
xcb_present_event_t eid;
xcb_special_event_t *special_event;
 
+   struct pipe_context *pipe;
+
struct vl_dri3_buffer *back_buffers[BACK_BUFFER_NUM];
int cur_back;
 
@@ -82,6 +85,7 @@ struct vl_dri3_screen
int64_t last_ust, ns_frame, last_msc, next_msc;
 
bool flushed;
+   bool is_different_gpu;
 };
 
 static void
@@ -102,6 +106,8 @@ dri3_free_back_buffer(struct vl_dri3_screen *scrn,
xcb_sync_destroy_fence(scrn->conn, buffer->sync_fence);
xshmfence_unmap_shm(buffer->shm_fence);
pipe_resource_reference(&buffer->texture, NULL);
+   if (buffer->linear_texture)
+   pipe_resource_reference(&buffer->linear_texture, NULL);
FREE(buffer);
 }
 
@@ -209,7 +215,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
xcb_sync_fence_t sync_fence;
struct xshmfence *shm_fence;
int buffer_fd, fence_fd;
-   struct pipe_resource templ;
+   struct pipe_resource templ, *pixmap_buffer_texture;
struct winsys_handle whandle;
unsigned usage;
 
@@ -226,8 +232,7 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
   goto close_fd;
 
memset(&templ, 0, sizeof(templ));
-   templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW |
-PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
+   templ.bind = PIPE_BIND_RENDER_TARGET | PIPE_BIND_SAMPLER_VIEW;
templ.format = PIPE_FORMAT_B8G8R8X8_UNORM;
templ.target = PIPE_TEXTURE_2D;
templ.last_level = 0;
@@ -235,16 +240,35 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
templ.height0 = scrn->height;
templ.depth0 = 1;
templ.array_size = 1;
-   buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
- &templ);
-   if (!buffer->texture)
-  goto unmap_shm;
 
+   if (scrn->is_different_gpu) {
+  buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+&templ);
+  if (!buffer->texture)
+ goto unmap_shm;
+
+  templ.bind |= PIPE_BIND_SCANOUT | PIPE_BIND_SHARED |
+PIPE_BIND_LINEAR;
+  buffer->linear_texture = 
scrn->base.pscreen->resource_create(scrn->base.pscreen,
+  &templ);
+  pixmap_buffer_texture = buffer->linear_texture;
+
+  if (!buffer->linear_texture)
+ goto no_linear_texture;
+
+   } else {
+  templ.bind |= PIPE_BIND_SCANOUT | PIPE_BIND_SHARED;
+  buffer->texture = scrn->base.pscreen->resource_create(scrn->base.pscreen,
+&templ);
+  if (!buffer->texture)
+ goto unmap_shm;
+  pixmap_buffer_texture = buffer->texture;
+   }
memset(&whandle, 0, sizeof(whandle));
whandle.type= DRM_API_HANDLE_TYPE_FD;
usage = PIPE_HANDLE_USAGE_EXPLICIT_FLUSH | PIPE_HANDLE_USAGE_READ;
scrn->base.pscreen->resource_get_handle(scrn->base.pscreen, NULL,
-   buffer->texture, &whandle,
+   pixmap_buffer_texture, &whandle,
usage);
buffer_fd = whandle.handle;
buffer->pitch = whandle.stride;
@@ -271,6 +295,8 @@ dri3_alloc_back_buffer(struct vl_dri3_screen *scrn)
 
return buffer;
 
+no_linear_texture:
+   pipe_resource_reference(&buffer->texture, NULL);
 unmap_shm:
xshmfence_unmap_shm(shm_fence);
 close_fd:
@@ -474,6 +500,7 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen,
struct vl_dri3_screen *scrn = (struct vl_dri3_screen *)context_private;
uint32_t options = XCB_PRESENT_OPTION_NONE;
struct vl_dri3_buffer *back;
+   struct pipe_box src_box;
 
back = scrn->back_buffers[scrn->cur_back];
if (!back)
@@ -485,6 +512,16 @@ vl_dri3_flush_frontbuffer(struct pipe_screen *screen,
 return;
}
 
+   if (scrn->is_different_gpu) {
+  u_box_origin_2d(scrn->width, scrn->height, &src_box);
+

Re: [Mesa-dev] [PATCH 62/95] i965/vec4: Add a shuffle_64bit_data helper

2016-09-13 Thread Francisco Jerez
Iago Toral  writes:

> On Mon, 2016-09-12 at 14:19 -0700, Francisco Jerez wrote:
>> Iago Toral Quiroga  writes:
>> 
>> > 
>> > SIMD4x2 64bit data is stored in register space like this:
>> > 
>> > r0.0:DF  x0 y0 z0 w0
>> > r0.1:DF  x1 y1 z1 w1
>> > 
>> > When we need to write data such as this to memory using 32-bit
>> > write
>> > messages we need to shuffle it in this fashion:
>> > 
>> > r0.0:DF  x0 y0 x1 y1
>> > r0.1:DF  z0 w0 z1 w1
>> > 
>> > and emit two 32-bit write messages, one for r0.0 at base_offset
>> > and another one for r0.1 at base_offset+16.
>> > 
>> > We also need to do the inverse operation when we read using 32-bit
>> > messages
>> > to produce valid SIMD4x2 64bit data from the data read. We can
>> > achieve this
>> > by applying the exact same shuffling to the data read, although we
>> > need to
>> > apply different channel enables since the layout of the data is
>> > reversed.
>> > 
>> > This helper implements the data shuffling logic and we will use it
>> > in
>> > various places where we read and write 64bit data from/to memory.
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_vec4.h   |  5 ++
>> >  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 96
>> > ++
>> >  2 files changed, 101 insertions(+)
>> > 
>> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h
>> > b/src/mesa/drivers/dri/i965/brw_vec4.h
>> > index 26228d0..3337fc0 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_vec4.h
>> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
>> > @@ -327,6 +327,11 @@ public:
>> >  
>> > src_reg setup_imm_df(double v);
>> >  
>> > +   vec4_instruction *shuffle_64bit_data(dst_reg dst, src_reg src,
>> > +bool for_write,
>> > +bblock_t *block = NULL,
>> > +vec4_instruction *ref =
>> > NULL);
>> > +
>> > virtual void emit_nir_code();
>> > virtual void nir_setup_uniforms();
>> > virtual void
>> > nir_setup_system_value_intrinsic(nir_intrinsic_instr *instr);
>> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > index 450db92..346e822 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
>> > @@ -2145,4 +2145,100 @@
>> > vec4_visitor::nir_emit_undef(nir_ssa_undef_instr *instr)
>> >    dst_reg(VGRF, alloc.allocate(instr->def.bit_size / 32));
>> >  }
>> >  
>> > +/* SIMD4x2 64bit data is stored in register space like this:
>> > + *
>> > + * r0.0:DF  x0 y0 z0 w0
>> > + * r0.1:DF  x1 y1 z1 w1
>> > + *
>> > + * When we need to write data such as this to memory using 32-bit
>> > write
>> > + * messages we need to shuffle it in this fashion:
>> > + *
>> > + * r0.0:DF  x0 y0 x1 y1 (to be written at base offset)
>> > + * r0.1:DF  z0 w0 z1 w1 (to be written at base offset + 16)
>> > + *
>> > + * We need to do the inverse operation when we read using 32-bit
>> > messages,
>> > + * which we can do by applying the same exact shuffling on the 64-
>> > bit data
>> > + * read, only that because the data for each vertex is positioned
>> > differently
>> > + * we need to apply different channel enables.
>> > + *
>> > + * This function takes 64bit data and shuffles it as explained
>> > above.
>> > + *
>> > + * The @for_write parameter is used to specify if the shuffling is
>> > being done
>> > + * for proper SIMD4x2 64-bit data that needs to be shuffled prior
>> > to a 32-bit
>> > + * write message (for_write = true), or instead we are doing the
>> > inverse
>> > + * operation and we have just read 64-bit data using 32-bit
>> > messages that we
>> > + * need to shuffle to create valid SIMD4x2 64-bit data (for_write
>> > = false).
>> > + *
>> > + * If @block and @ref are non-NULL, then the shuffling is done
>> > after @ref,
>> > + * otherwise the instructions are emitted normally at the end. The
>> > function
>> > + * returns the last instruction inserted.
>> > + *
>> > + * Notice that @src and @dst cannot be the same register.
>> > + */
>> > +vec4_instruction *
>> > +vec4_visitor::shuffle_64bit_data(dst_reg dst, src_reg src, bool
>> > for_write,
>> > + bblock_t *block, vec4_instruction
>> > *ref)
>> > +{
>> > +   assert(type_sz(src.type) == 8);
>> > +   assert(type_sz(dst.type) == 8);
>> > +   assert(!src.in_range(dst, 2));
>> > +   assert(dst.writemask == WRITEMASK_XYZW);
>> > +   assert(!ref == !block);
>> > +
>> > +   vec4_instruction *inst, *last;
>> > +   bool emit_before = ref != NULL;
>> > +
>> > +   #define EMIT(i) \
>> > +  if (!emit_before) { \
>> > + emit(i); \
>> > +  } else { \
>> > + ref->insert_after(block, i); \
>> > + ref = i; \
>> > +  }  \
>> > +  last = i;
>> > +
>> > +   /* Resolve swizzle in src */
>> > +   if (src.swizzle != BRW_SWIZZLE_XYZW) {
>> > +  dst_reg data = dst_reg(this, glsl_type::dvec4_type);
>> > +  inst = MOV(data, src);
>> 
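
The shuffle described in the quoted commit message can be modelled on plain
arrays. This is only a data-layout sketch under simplifying assumptions: the
struct and function names are hypothetical, each register is modelled as a
row of four doubles, and the channel enables that differ between the read
and write cases are ignored.

```c
/* Hypothetical model of a SIMD4x2 DF register pair: two rows of four
 * doubles.  Before the shuffle, row[0] holds x0 y0 z0 w0 and row[1]
 * holds x1 y1 z1 w1. */
struct df_pair {
   double row[2][4];
};

/* Produce x0 y0 x1 y1 in the first row and z0 w0 z1 w1 in the second.
 * Taking the input by value and returning a fresh struct means source
 * and destination can never alias. */
static struct df_pair shuffle_64bit_rows(struct df_pair in)
{
   struct df_pair out;

   for (int r = 0; r < 2; r++) {
      out.row[0][2 * r + 0] = in.row[r][0]; /* x of vertex r */
      out.row[0][2 * r + 1] = in.row[r][1]; /* y of vertex r */
      out.row[1][2 * r + 0] = in.row[r][2]; /* z of vertex r */
      out.row[1][2 * r + 1] = in.row[r][3]; /* w of vertex r */
   }
   return out;
}
```

Applying the function twice returns the original layout, which is why the
same shuffle can serve both the write path and the read path.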

Re: [Mesa-dev] [PATCH 48/95] i965/vec4: add a force_vstride0 flag to src_reg

2016-09-13 Thread Francisco Jerez
Iago Toral  writes:

> On Mon, 2016-09-12 at 14:05 -0700, Francisco Jerez wrote:
>> Iago Toral Quiroga  writes:
>> 
>> > 
>> > We will use this in cases where we want to force the vstride of a
>> > src_reg
>> > to 0 to exploit a particular behavior of the hardware. It will come
>> > in
>> > handy to implement access to components Z/W.
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_ir_vec4.h | 1 +
>> >  src/mesa/drivers/dri/i965/brw_vec4.cpp  | 2 ++
>> >  2 files changed, 3 insertions(+)
>> > 
>> > diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
>> > b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
>> > index f66c093..f3cce4b 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
>> > +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
>> > @@ -51,6 +51,7 @@ public:
>> > explicit src_reg(const dst_reg &reg);
>> >  
>> > src_reg *reladdr;
>> > +   bool force_vstride0;
>> I was wondering whether it would make more sense to unify this with
>> the
>> FS back-end's fs_reg::stride (a numeric stride field is also likely
>> more
>> convenient to do arithmetic on than a boolean) and promote it to
>> backend_reg?  It could be defined as the number of components to jump
>> over for each logical channel of the register, which is just the
>> vstride
>> in single-precision SIMD4x2 and the hstride in scalar mode.
>
> We could do that, but I thought it would be a good idea to make it
> clear that here we are using the vstride=0 with a very specific
> intention and we don't expect the hardware to do what would normally be
> expected (we are trying to exploit a hardware bug after all). If we
> were to use a normal stride field for this I think we would make this
> intention much less obvious and other people reading the code would
> have a much harder time understanding what is really going on. Since we
> are being tricky here I think the extra field to signal that we are
> trying to do something "special" might be worth it: people can track
> where we read and write that field and see exactly where it is being
> used for the purpose of exploiting this particular hardware behavior.
>

Yes, I agree that the hardware's behavior on Gen7 with non-identity
vstride is tricky and special -- Special enough that *none* of the VEC4
optimization passes and IR-handling code need to be aware of it, because
the field is only going to be used as internal book-keeping data
structure in convert_to_hw_regs() and immediately discarded.  IOW you're
storing an internal data structure of convert_to_hw_regs() as part of
the shared IR data structure, with no well-defined semantics and which
no back-end code (not even convert_to_hw_regs()) is going to be able to
honor.

So if your argument for making the representation of vstride
unnecessarily non-orthogonal is that you want to discourage people from
using it at the IR level (which is fair because it won't work at all!),
I would argue that it doesn't belong in the IR data structures in the
first place, because you could just keep convert_to_hw_regs' internal
data structures internal to convert_to_hw_regs.  (I don't actually think
you need the data structure, neither internal nor external, but more on
that later)

>> But thinking about it some more, I wonder if it's really necessary to
>> expose vertical strides at the IR level?  Aren't you planning to use
>> this
>> during the conversion to HW registers exclusively?  Why don't you set
>> the vstride field directly in that case?
>
> Yes, this is used exclusively at that time. The conversion to hardware
> registers in convert_to_hw_regs() happens in two stages now:
>
> We call our 'expand_64bit_swizzle_to_32bit()' helper first. This one
> takes care of checking the regioning on DF instructions, translating
> swizzles and setting force_vstride0 to true when needed (which is also the
> only place that would set this to true). Then the rest of the code in
> convert_to_hw_regs() just operates as usual, only that it will check
> the force_vstride0 setting to decide the vstride to use for DF regions.
>
> I did it like this because it allows us to keep the DF swizzle
> translation and regioning checking logic separated from the conversion
> to hardware registers, but this separation means that we need to tell
> the latter when it has to set the vstride to 0, thus the addition of
> the force_vstride0 field. I think having these two things separated
> makes sense and makes the code easier to read. We can keep both things
> separate and still avoid the force_vstride0 field by using a stride
> field as you suggest above, but as I said, I think we might be doing a
> rather tricky thing a bit less obvious than it should to other people.
>

Keeping these two tasks logically separate from each other sounds fine
to me, but you don't need to extend the IR for them to exchange data.
AFAICT expand_64bit_swizzle_to_32bit() is doing two things:

 - Calculate the hardware swizzle, which potentially involves an
   adjustment of the subregister offset -- These two are uniquely
   determined

Re: [Mesa-dev] [PATCH 1/3] nir: Call nir_metadata_preserve from nir_lower_alu_to_scalar().

2016-09-13 Thread Eric Anholt
Kenneth Graunke  writes:

> This is mandatory.

This series is:

Reviewed-by: Eric Anholt 



Re: [Mesa-dev] [PATCH 2/2] nir/spirv/glsl450: Add support for the InterpolateAt opcodes

2016-09-13 Thread Dave Airlie
On 14 September 2016 at 14:26, Jason Ekstrand  wrote:
> Signed-off-by: Jason Ekstrand 

After writing my own version, I now understand the code well enough to
review it.

The only thing I did differently was to set a flag in the first switch
to avoid the second switch, but otherwise this looks how it should.
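
A rough sketch of what such a single-switch variant might look like. It is
not the code from either version of the patch; the enums, struct and
function names are stand-ins that only mirror the three InterpolateAt
cases.

```c
#include <stdbool.h>

/* Stand-in enums for this sketch only. */
enum interp_opcode {
   INTERP_AT_CENTROID,
   INTERP_AT_SAMPLE,
   INTERP_AT_OFFSET
};

enum interp_intrinsic {
   INTRIN_AT_CENTROID,
   INTRIN_AT_SAMPLE,
   INTRIN_AT_OFFSET
};

struct interp_info {
   enum interp_intrinsic op;
   bool has_extra_src; /* sample id or offset operand present */
};

/* One switch selects the intrinsic and records whether an extra source
 * operand has to be set up, so no second switch on the opcode is
 * needed.  Returns false for an opcode it does not know. */
static bool classify_interp(enum interp_opcode opcode,
                            struct interp_info *info)
{
   switch (opcode) {
   case INTERP_AT_CENTROID:
      info->op = INTRIN_AT_CENTROID;
      info->has_extra_src = false;
      return true;
   case INTERP_AT_SAMPLE:
      info->op = INTRIN_AT_SAMPLE;
      info->has_extra_src = true;
      return true;
   case INTERP_AT_OFFSET:
      info->op = INTRIN_AT_OFFSET;
      info->has_extra_src = true;
      return true;
   }
   return false;
}
```

The caller would then test has_extra_src once instead of switching on the
opcode a second time.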

For both:

Reviewed-by: Dave Airlie 
> ---
>  src/compiler/spirv/vtn_glsl450.c | 54 
> +++-
>  1 file changed, 53 insertions(+), 1 deletion(-)
>
> diff --git a/src/compiler/spirv/vtn_glsl450.c 
> b/src/compiler/spirv/vtn_glsl450.c
> index e05d28f..cb0570d 100644
> --- a/src/compiler/spirv/vtn_glsl450.c
> +++ b/src/compiler/spirv/vtn_glsl450.c
> @@ -634,6 +634,57 @@ handle_glsl450_alu(struct vtn_builder *b, enum 
> GLSLstd450 entrypoint,
> }
>  }
>
> +static void
> +handle_glsl450_interpolation(struct vtn_builder *b, enum GLSLstd450 opcode,
> + const uint32_t *w, unsigned count)
> +{
> +   const struct glsl_type *dest_type =
> +  vtn_value(b, w[1], vtn_value_type_type)->type->type;
> +
> +   struct vtn_value *val = vtn_push_value(b, w[2], vtn_value_type_ssa);
> +   val->ssa = vtn_create_ssa_value(b, dest_type);
> +
> +   nir_intrinsic_op op;
> +   switch (opcode) {
> +   case GLSLstd450InterpolateAtCentroid:
> +  op = nir_intrinsic_interp_var_at_centroid;
> +  break;
> +   case GLSLstd450InterpolateAtSample:
> +  op = nir_intrinsic_interp_var_at_sample;
> +  break;
> +   case GLSLstd450InterpolateAtOffset:
> +  op = nir_intrinsic_interp_var_at_offset;
> +  break;
> +   default:
> +  unreachable("Invalid opcode");
> +   }
> +
> +   nir_intrinsic_instr *intrin = nir_intrinsic_instr_create(b->nb.shader, 
> op);
> +
> +   nir_deref_var *deref = vtn_nir_deref(b, w[5]);
> +   intrin->variables[0] =
> +  nir_deref_as_var(nir_copy_deref(intrin, &deref->deref));
> +
> +   switch (opcode) {
> +   case GLSLstd450InterpolateAtCentroid:
> +  break;
> +   case GLSLstd450InterpolateAtSample:
> +   case GLSLstd450InterpolateAtOffset:
> +  intrin->src[0] = nir_src_for_ssa(vtn_ssa_value(b, w[6])->def);
> +  break;
> +   default:
> +  unreachable("Invalid opcode");
> +   }
> +
> +   intrin->num_components = glsl_get_vector_elements(dest_type);
> +   nir_ssa_dest_init(&intrin->instr, &intrin->dest,
> + glsl_get_vector_elements(dest_type),
> + glsl_get_bit_size(dest_type), NULL);
> +   val->ssa->def = &intrin->dest.ssa;
> +
> +   nir_builder_instr_insert(&b->nb, &intrin->instr);
> +}
> +
>  bool
>  vtn_handle_glsl450_instruction(struct vtn_builder *b, uint32_t ext_opcode,
> const uint32_t *w, unsigned count)
> @@ -656,7 +707,8 @@ vtn_handle_glsl450_instruction(struct vtn_builder *b, 
> uint32_t ext_opcode,
> case GLSLstd450InterpolateAtCentroid:
> case GLSLstd450InterpolateAtSample:
> case GLSLstd450InterpolateAtOffset:
> -  unreachable("Unhandled opcode");
> +  handle_glsl450_interpolation(b, ext_opcode, w, count);
> +  break;
>
> default:
>handle_glsl450_alu(b, (enum GLSLstd450)ext_opcode, w, count);
> --
> 2.5.0.400.gff86faf
>


[Mesa-dev] [PATCH 1/2] nir/spirv: Claim support for SampleRateShading

2016-09-13 Thread Jason Ekstrand
We already support all of the decorations that require this capability.

Signed-off-by: Jason Ekstrand 
---
 src/compiler/spirv/spirv_to_nir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
index 7e7a026..0c6743b 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -2448,6 +2448,7 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, SpvOp opcode,
   case SpvCapabilityDerivativeControl:
   case SpvCapabilityInterpolationFunction:
   case SpvCapabilityMultiViewport:
+  case SpvCapabilitySampleRateShading:
  break;
 
   case SpvCapabilityClipDistance:
@@ -2467,7 +2468,6 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, SpvOp opcode,
   case SpvCapabilityImageGatherExtended:
   case SpvCapabilityStorageImageMultisample:
   case SpvCapabilityImageCubeArray:
-  case SpvCapabilitySampleRateShading:
   case SpvCapabilityInt8:
   case SpvCapabilityInputAttachment:
   case SpvCapabilitySparseResidency:
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] nir/spirv/glsl450: Add support for the InterpolateAt opcodes

2016-09-13 Thread Jason Ekstrand
Signed-off-by: Jason Ekstrand 
---
 src/compiler/spirv/vtn_glsl450.c | 54 +++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c
index e05d28f..cb0570d 100644
--- a/src/compiler/spirv/vtn_glsl450.c
+++ b/src/compiler/spirv/vtn_glsl450.c
@@ -634,6 +634,57 @@ handle_glsl450_alu(struct vtn_builder *b, enum GLSLstd450 entrypoint,
}
 }
 
+static void
+handle_glsl450_interpolation(struct vtn_builder *b, enum GLSLstd450 opcode,
+ const uint32_t *w, unsigned count)
+{
+   const struct glsl_type *dest_type =
+  vtn_value(b, w[1], vtn_value_type_type)->type->type;
+
+   struct vtn_value *val = vtn_push_value(b, w[2], vtn_value_type_ssa);
+   val->ssa = vtn_create_ssa_value(b, dest_type);
+
+   nir_intrinsic_op op;
+   switch (opcode) {
+   case GLSLstd450InterpolateAtCentroid:
+  op = nir_intrinsic_interp_var_at_centroid;
+  break;
+   case GLSLstd450InterpolateAtSample:
+  op = nir_intrinsic_interp_var_at_sample;
+  break;
+   case GLSLstd450InterpolateAtOffset:
+  op = nir_intrinsic_interp_var_at_offset;
+  break;
+   default:
+  unreachable("Invalid opcode");
+   }
+
+   nir_intrinsic_instr *intrin = nir_intrinsic_instr_create(b->nb.shader, op);
+
+   nir_deref_var *deref = vtn_nir_deref(b, w[5]);
+   intrin->variables[0] =
+  nir_deref_as_var(nir_copy_deref(intrin, &deref->deref));
+
+   switch (opcode) {
+   case GLSLstd450InterpolateAtCentroid:
+  break;
+   case GLSLstd450InterpolateAtSample:
+   case GLSLstd450InterpolateAtOffset:
+  intrin->src[0] = nir_src_for_ssa(vtn_ssa_value(b, w[6])->def);
+  break;
+   default:
+  unreachable("Invalid opcode");
+   }
+
+   intrin->num_components = glsl_get_vector_elements(dest_type);
+   nir_ssa_dest_init(&intrin->instr, &intrin->dest,
+ glsl_get_vector_elements(dest_type),
+ glsl_get_bit_size(dest_type), NULL);
+   val->ssa->def = &intrin->dest.ssa;
+
+   nir_builder_instr_insert(&b->nb, &intrin->instr);
+}
+
 bool
 vtn_handle_glsl450_instruction(struct vtn_builder *b, uint32_t ext_opcode,
const uint32_t *w, unsigned count)
@@ -656,7 +707,8 @@ vtn_handle_glsl450_instruction(struct vtn_builder *b, uint32_t ext_opcode,
case GLSLstd450InterpolateAtCentroid:
case GLSLstd450InterpolateAtSample:
case GLSLstd450InterpolateAtOffset:
-  unreachable("Unhandled opcode");
+  handle_glsl450_interpolation(b, ext_opcode, w, count);
+  break;
 
default:
   handle_glsl450_alu(b, (enum GLSLstd450)ext_opcode, w, count);
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Problem with RX 480 on Alien: Isolation and Dota 2

2016-09-13 Thread Michel Dänzer
On 14/09/16 02:53 AM, Marek Olšák wrote:
> 
> cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/x86_64-linux-gnu
> -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=O
>   -DCMAKE_BUILD_TYPE=RelWithDebInfo
> -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \
>   -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG
> -fno-omit-frame-pointer" \
>   -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG
> -fno-omit-frame-pointer".

FWIW, I recommend enabling assertions, i.e. setting
-DLLVM_ENABLE_ASSERTIONS=1 and removing -DNDEBUG.


>   -DLLVM_BUILD_32_BITS=ON

Hah, didn't know about this, I manually added -m32 to C(XX)FLAGS. Thanks.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: enable ARB_ES3_2_compatibility on gen8+

2016-09-13 Thread Ilia Mirkin
Note that ASTC support is not actually mandated for this extension to be
exposed.

Signed-off-by: Ilia Mirkin 
---

Also note that it doesn't seem required for the driver to simultaneously be
exposing an actual ES 3.2 context. The ext does, however, nominally require
GL 4.5. I think that can be ignored though.

 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index f1ef4f6..fe22d3f 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -400,6 +400,7 @@ intelInitExtensions(struct gl_context *ctx)
   ctx->Extensions.ARB_shader_precision = true;
   ctx->Extensions.ARB_gpu_shader_fp64 = true;
   ctx->Extensions.ARB_vertex_attrib_64bit = true;
+  ctx->Extensions.ARB_ES3_2_compatibility = true;
   ctx->Extensions.OES_geometry_shader = true;
   ctx->Extensions.OES_texture_cube_map_array = true;
}
-- 
2.7.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Enable ANDROID_extension_pack_es31a on Gen9+.

2016-09-13 Thread Ilia Mirkin
Seems reasonable. Perhaps it'd be worth figuring out what the deal
with CHV's ASTC support is, since that's probably a more likely
Android target. In the meantime, this is

Reviewed-by: Ilia Mirkin 

On Tue, Sep 13, 2016 at 9:06 PM, Kenneth Graunke  wrote:
> AEP requires ASTC, which is only supported on Skylake and later.
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
> index 0f28546..6bb73b8 100644
> --- a/src/mesa/drivers/dri/i965/intel_extensions.c
> +++ b/src/mesa/drivers/dri/i965/intel_extensions.c
> @@ -409,6 +409,7 @@ intelInitExtensions(struct gl_context *ctx)
>ctx->Extensions.KHR_texture_compression_astc_ldr = true;
>ctx->Extensions.KHR_texture_compression_astc_sliced_3d = true;
>ctx->Extensions.ARB_shader_stencil_export = true;
> +  ctx->Extensions.ANDROID_extension_pack_es31a = true;
>ctx->Extensions.MESA_shader_framebuffer_fetch = true;
> }
>
> --
> 2.9.3
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Enable ANDROID_extension_pack_es31a on Gen9+.

2016-09-13 Thread Kenneth Graunke
AEP requires ASTC, which is only supported on Skylake and later.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index 0f28546..6bb73b8 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -409,6 +409,7 @@ intelInitExtensions(struct gl_context *ctx)
   ctx->Extensions.KHR_texture_compression_astc_ldr = true;
   ctx->Extensions.KHR_texture_compression_astc_sliced_3d = true;
   ctx->Extensions.ARB_shader_stencil_export = true;
+  ctx->Extensions.ANDROID_extension_pack_es31a = true;
   ctx->Extensions.MESA_shader_framebuffer_fetch = true;
}
 
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] st/mesa: enable GL_ANDROID_extension_pack_es31a when available

2016-09-13 Thread Ilia Mirkin
For now that's never, since advanced blend hasn't been piped through.

Signed-off-by: Ilia Mirkin 
---
 src/mesa/state_tracker/st_extensions.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index 807fbfb..4d54928 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -1228,4 +1228,22 @@ void st_init_extensions(struct pipe_screen *screen,
 
extensions->OES_primitive_bounding_box = extensions->ARB_ES3_1_compatibility;
consts->NoPrimitiveBoundingBoxOutput = true;
+
+   extensions->ANDROID_extension_pack_es31a =
+  extensions->KHR_texture_compression_astc_ldr &&
+  extensions->KHR_blend_equation_advanced &&
+  extensions->OES_sample_variables &&
+  extensions->ARB_shader_image_load_store &&
+  extensions->ARB_texture_stencil8 &&
+  extensions->ARB_texture_multisample &&
+  extensions->OES_copy_image &&
+  extensions->ARB_draw_buffers_blend &&
+  extensions->OES_geometry_shader &&
+  extensions->ARB_gpu_shader5 &&
+  extensions->OES_primitive_bounding_box &&
+  extensions->ARB_tessellation_shader &&
+  extensions->ARB_texture_border_clamp &&
+  extensions->OES_texture_buffer &&
+  extensions->OES_texture_cube_map_array &&
+  extensions->EXT_texture_sRGB_decode;
 }
-- 
2.7.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
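
The patch above derives the extension pack flag as the logical AND of its prerequisites, so the pack is advertised only once every component is already exposed. A minimal sketch of that gating pattern follows; the struct and its field names are hypothetical stand-ins, and the real prerequisite list is the longer one in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver's extension table. The fields are
 * illustrative only and far fewer than the real prerequisite list. */
struct ext_flags {
   bool astc_ldr;
   bool blend_equation_advanced;
   bool sample_variables;
   bool geometry_shader;
};

/* The pack is reported as supported only if every prerequisite is set. */
static bool
pack_supported(const struct ext_flags *e)
{
   return e->astc_ldr &&
          e->blend_equation_advanced &&
          e->sample_variables &&
          e->geometry_shader;
}
```

With this shape, a driver that lacks a single prerequisite (advanced blend, in the case the commit message mentions) simply never sets the pack flag, and no per-driver special case is needed.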


[Mesa-dev] [PATCH 2/2] st/mesa: enable ARB_ES3_2_compatibility when enough available

2016-09-13 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin 
---
 src/mesa/state_tracker/st_extensions.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index 4d54928..55019d7 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -1246,4 +1246,24 @@ void st_init_extensions(struct pipe_screen *screen,
   extensions->OES_texture_buffer &&
   extensions->OES_texture_cube_map_array &&
   extensions->EXT_texture_sRGB_decode;
+
+   /* Same deal as for ARB_ES3_1_compatibility - this has to be computed
+* before overall versions are selected. Also it's actually a subset of ES
+* 3.2, since it doesn't require ASTC or advanced blending.
+*/
+   extensions->ARB_ES3_2_compatibility =
+  extensions->ARB_ES3_1_compatibility &&
+  extensions->KHR_robustness &&
+  extensions->ARB_copy_image &&
+  extensions->ARB_draw_buffers_blend &&
+  extensions->ARB_draw_elements_base_vertex &&
+  extensions->OES_geometry_shader &&
+  extensions->ARB_gpu_shader5 &&
+  extensions->ARB_sample_shading &&
+  extensions->ARB_tessellation_shader &&
+  extensions->ARB_texture_border_clamp &&
+  extensions->OES_texture_buffer &&
+  extensions->ARB_texture_cube_map_array &&
+  extensions->ARB_texture_stencil8 &&
+  extensions->ARB_texture_multisample;
 }
-- 
2.7.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] nir: Report progress from nir_lower_phis_to_scalar.

2016-09-13 Thread Kenneth Graunke
Signed-off-by: Kenneth Graunke 
---
 src/compiler/nir/nir.h  |  2 +-
 src/compiler/nir/nir_lower_phis_to_scalar.c | 20 +++-
 src/gallium/drivers/freedreno/ir3/ir3_nir.c |  2 +-
 src/gallium/drivers/vc4/vc4_program.c   |  3 +--
 src/mesa/drivers/dri/i965/brw_nir.c |  2 +-
 5 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index ea8837d..8c7837a 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2411,7 +2411,7 @@ bool nir_lower_vec_to_movs(nir_shader *shader);
 bool nir_lower_alu_to_scalar(nir_shader *shader);
 void nir_lower_load_const_to_scalar(nir_shader *shader);
 
-void nir_lower_phis_to_scalar(nir_shader *shader);
+bool nir_lower_phis_to_scalar(nir_shader *shader);
 void nir_lower_io_to_scalar(nir_shader *shader, nir_variable_mode mask);
 
 void nir_lower_samplers(nir_shader *shader,
diff --git a/src/compiler/nir/nir_lower_phis_to_scalar.c b/src/compiler/nir/nir_lower_phis_to_scalar.c
index 9fd00cc..b12718f 100644
--- a/src/compiler/nir/nir_lower_phis_to_scalar.c
+++ b/src/compiler/nir/nir_lower_phis_to_scalar.c
@@ -166,6 +166,8 @@ static bool
 lower_phis_to_scalar_block(nir_block *block,
struct lower_phis_to_scalar_state *state)
 {
+   bool progress = false;
+
/* Find the last phi node in the block */
nir_phi_instr *last_phi = NULL;
nir_foreach_instr(instr, block) {
@@ -248,6 +250,8 @@ lower_phis_to_scalar_block(nir_block *block,
   ralloc_steal(state->dead_ctx, phi);
   nir_instr_remove(&phi->instr);
 
+  progress = true;
+
   /* We're using the safe iterator and inserting all the newly
* scalarized phi nodes before their non-scalarized version so that's
* ok.  However, we are also inserting vec operations after all of
@@ -258,13 +262,14 @@ lower_phis_to_scalar_block(nir_block *block,
  break;
}
 
-   return true;
+   return progress;
 }
 
-static void
+static bool
 lower_phis_to_scalar_impl(nir_function_impl *impl)
 {
struct lower_phis_to_scalar_state state;
+   bool progress = false;
 
state.mem_ctx = ralloc_parent(impl);
state.dead_ctx = ralloc_context(NULL);
@@ -272,13 +277,14 @@ lower_phis_to_scalar_impl(nir_function_impl *impl)
  _mesa_key_pointer_equal);
 
nir_foreach_block(block, impl) {
-  lower_phis_to_scalar_block(block, &state);
+  progress = lower_phis_to_scalar_block(block, &state) || progress;
}
 
nir_metadata_preserve(impl, nir_metadata_block_index |
nir_metadata_dominance);
 
ralloc_free(state.dead_ctx);
+   return progress;
 }
 
 /** A pass that lowers vector phi nodes to scalar
@@ -288,11 +294,15 @@ lower_phis_to_scalar_impl(nir_function_impl *impl)
  * instance, if one of the sources is a non-scalarizable vector, then we
  * don't bother lowering because that would generate hard-to-coalesce movs.
  */
-void
+bool
 nir_lower_phis_to_scalar(nir_shader *shader)
 {
+   bool progress = false;
+
nir_foreach_function(function, shader) {
   if (function->impl)
- lower_phis_to_scalar_impl(function->impl);
+ progress = lower_phis_to_scalar_impl(function->impl) || progress;
}
+
+   return progress;
 }
diff --git a/src/gallium/drivers/freedreno/ir3/ir3_nir.c b/src/gallium/drivers/freedreno/ir3/ir3_nir.c
index 2526222..2d86a52 100644
--- a/src/gallium/drivers/freedreno/ir3/ir3_nir.c
+++ b/src/gallium/drivers/freedreno/ir3/ir3_nir.c
@@ -91,7 +91,7 @@ ir3_optimize_loop(nir_shader *s)
 
OPT_V(s, nir_lower_vars_to_ssa);
progress |= OPT(s, nir_lower_alu_to_scalar);
-   OPT_V(s, nir_lower_phis_to_scalar);
+   progress |= OPT(s, nir_lower_phis_to_scalar);
 
progress |= OPT(s, nir_copy_prop);
progress |= OPT(s, nir_opt_dce);
diff --git a/src/gallium/drivers/vc4/vc4_program.c b/src/gallium/drivers/vc4/vc4_program.c
index ca0bd44..64c075a 100644
--- a/src/gallium/drivers/vc4/vc4_program.c
+++ b/src/gallium/drivers/vc4/vc4_program.c
@@ -1424,8 +1424,7 @@ vc4_optimize_nir(struct nir_shader *s)
 
 NIR_PASS_V(s, nir_lower_vars_to_ssa);
 NIR_PASS(progress, s, nir_lower_alu_to_scalar);
-NIR_PASS_V(s, nir_lower_phis_to_scalar);
-
+NIR_PASS(progress, s, nir_lower_phis_to_scalar);
 NIR_PASS(progress, s, nir_copy_prop);
 NIR_PASS(progress, s, nir_opt_remove_phis);
 NIR_PASS(progress, s, nir_opt_dce);
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c
index 27be201..5b2130f 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -381,7 +381,7 @@ nir_optimize(nir_shader *nir, bool is_scalar)
   OPT(nir_copy_prop);
 
   if (is_scalar) {
- OPT_V(nir_lower_phis_to_scalar);
+ 
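
The progress-reporting pattern this patch introduces can be sketched in isolation. The snippet below is a toy, not NIR code: `lower_one` and `lower_all` are hypothetical names, and the "lowering" just halves even integers. What it shows is the operand order used in the patch, `progress = call() || progress`, which guarantees the call always runs; writing `progress || call()` would short-circuit and skip the remaining work once progress is true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-item pass: halves non-zero even values and reports
 * whether it changed anything. */
static bool
lower_one(int *v)
{
   if (*v == 0 || *v % 2 != 0)
      return false;
   *v /= 2;
   return true;
}

/* Runs the per-item pass over every element and accumulates progress. */
static bool
lower_all(int *vals, int n)
{
   bool progress = false;
   for (int i = 0; i < n; i++) {
      /* Call first, then OR, so the call is never short-circuited away. */
      progress = lower_one(&vals[i]) || progress;
   }
   return progress;
}
```

A caller can then rerun a group of passes until none of them reports progress, which is how the optimization loops touched by this patch consume the return value.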

[Mesa-dev] [PATCH 2/3] nir: Report progress from nir_lower_alu_to_scalar.

2016-09-13 Thread Kenneth Graunke
Signed-off-by: Kenneth Graunke 
---
 src/compiler/nir/nir.h  |  2 +-
 src/compiler/nir/nir_lower_alu_to_scalar.c  | 42 ++---
 src/gallium/drivers/freedreno/ir3/ir3_nir.c |  2 +-
 src/gallium/drivers/vc4/vc4_program.c   |  2 +-
 src/mesa/drivers/dri/i965/brw_nir.c |  2 +-
 5 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index ff7c422..ea8837d 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2408,7 +2408,7 @@ bool nir_remove_dead_variables(nir_shader *shader, nir_variable_mode modes);
 
 void nir_move_vec_src_uses_to_dest(nir_shader *shader);
 bool nir_lower_vec_to_movs(nir_shader *shader);
-void nir_lower_alu_to_scalar(nir_shader *shader);
+bool nir_lower_alu_to_scalar(nir_shader *shader);
 void nir_lower_load_const_to_scalar(nir_shader *shader);
 
 void nir_lower_phis_to_scalar(nir_shader *shader);
diff --git a/src/compiler/nir/nir_lower_alu_to_scalar.c b/src/compiler/nir/nir_lower_alu_to_scalar.c
index a84fbdf..fa18deb 100644
--- a/src/compiler/nir/nir_lower_alu_to_scalar.c
+++ b/src/compiler/nir/nir_lower_alu_to_scalar.c
@@ -73,7 +73,7 @@ lower_reduction(nir_alu_instr *instr, nir_op chan_op, nir_op merge_op,
nir_instr_remove(&instr->instr);
 }
 
-static void
+static bool
 lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
 {
unsigned num_src = nir_op_infos[instr->op].num_inputs;
@@ -90,7 +90,7 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
case name##3: \
case name##4: \
   lower_reduction(instr, chan, merge, b); \
-  return;
+  return true;
 
switch (instr->op) {
case nir_op_vec4:
@@ -99,11 +99,11 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
   /* We don't need to scalarize these ops, they're the ones generated to
* group up outputs into a value that can be SSAed.
*/
-  return;
+  return false;
 
case nir_op_pack_half_2x16:
   if (!b->shader->options->lower_pack_half_2x16)
- return;
+ return false;
 
   nir_ssa_def *val =
  nir_pack_half_2x16_split(b, nir_channel(b, instr->src[0].src.ssa,
@@ -113,7 +113,7 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
 
   nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa, nir_src_for_ssa(val));
   nir_instr_remove(&instr->instr);
-  return;
+  return true;
 
case nir_op_unpack_unorm_4x8:
case nir_op_unpack_snorm_4x8:
@@ -122,11 +122,11 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
   /* There is no scalar version of these ops, unless we were to break it
* down to bitshifts and math (which is definitely not intended).
*/
-  return;
+  return false;
 
case nir_op_unpack_half_2x16: {
   if (!b->shader->options->lower_unpack_half_2x16)
- return;
+ return false;
 
   nir_ssa_def *comps[2];
   comps[0] = nir_unpack_half_2x16_split_x(b, instr->src[0].src.ssa);
@@ -135,7 +135,7 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
 
   nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa, nir_src_for_ssa(vec));
   nir_instr_remove(&instr->instr);
-  return;
+  return true;
}
 
case nir_op_pack_uvec2_to_uint: {
@@ -185,11 +185,11 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
 
   nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa, nir_src_for_ssa(val));
   nir_instr_remove(&instr->instr);
-  return;
+  return true;
}
 
case nir_op_unpack_double_2x32:
-  return;
+  return false;
 
   LOWER_REDUCTION(nir_op_fdot, nir_op_fmul, nir_op_fadd);
   LOWER_REDUCTION(nir_op_ball_fequal, nir_op_feq, nir_op_iand);
@@ -204,7 +204,7 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
}
 
if (instr->dest.dest.ssa.num_components == 1)
-  return;
+  return false;
 
unsigned num_components = instr->dest.dest.ssa.num_components;
nir_ssa_def *comps[] = { NULL, NULL, NULL, NULL };
@@ -240,30 +240,40 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa, nir_src_for_ssa(vec));
 
nir_instr_remove(&instr->instr);
+   return true;
 }
 
-static void
+static bool
 nir_lower_alu_to_scalar_impl(nir_function_impl *impl)
 {
nir_builder builder;
nir_builder_init(&builder, impl);
+   bool progress = false;
 
nir_foreach_block(block, impl) {
   nir_foreach_instr_safe(instr, block) {
- if (instr->type == nir_instr_type_alu)
-lower_alu_instr_scalar(nir_instr_as_alu(instr), &builder);
+ if (instr->type == nir_instr_type_alu) {
+progress = lower_alu_instr_scalar(nir_instr_as_alu(instr),
+  &builder) || progress;
+ }
   }
}
 
nir_metadata_preserve(impl, nir_metadata_block_index |
nir_met

[Mesa-dev] [PATCH 1/3] nir: Call nir_metadata_preserve from nir_lower_alu_to_scalar().

2016-09-13 Thread Kenneth Graunke
This is mandatory.

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Kenneth Graunke 
---
 src/compiler/nir/nir_lower_alu_to_scalar.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/compiler/nir/nir_lower_alu_to_scalar.c b/src/compiler/nir/nir_lower_alu_to_scalar.c
index 4f72cf7..a84fbdf 100644
--- a/src/compiler/nir/nir_lower_alu_to_scalar.c
+++ b/src/compiler/nir/nir_lower_alu_to_scalar.c
@@ -254,6 +254,9 @@ nir_lower_alu_to_scalar_impl(nir_function_impl *impl)
 lower_alu_instr_scalar(nir_instr_as_alu(instr), &builder);
   }
}
+
+   nir_metadata_preserve(impl, nir_metadata_block_index |
+   nir_metadata_dominance);
 }
 
 void
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/6] RadeonSI: Let's just stop spilling SGPRs

2016-09-13 Thread Edward O'Callaghan
Patches 2, 3 and 5 are easy enough, so those are
Reviewed-by: Edward O'Callaghan 

Those numbers do look good but I think I'll leave the rest up to other
folks to review.
Kind Regards,
Edward.

On 09/14/2016 03:13 AM, Marek Olšák wrote:
> This is quite easy because we just have to get rid of all of
> the preloading at the beginning of shaders.
> 
> I also removed preloading of PS inputs with literal indexing, which
> has almost the same effect as sinking interp instructions.
> 
> I'm slightly concerned that LICM won't move interps because they are
> not considered speculatively-executable (=movable) by LLVM, but
> the shader-db stats show that it doesn't matter.
> 
> LLVM is smart enough to do CSE where needed for both descriptor loads
> and interps. In fact, it's the CSE which is responsible for some of
> the remaining SGPR spills. (It makes sense if you think about it)
> 
> The compile time increased by 6% because CSE has a lot more work,
> but it's certainly worth it.
> 
> 
> shader-db stats:
> 
> [PATCH 4/6] radeonsi: get rid of img/buf/sampler descriptor
> https://people.freedesktop.org/~mareko/no_preload1.html
> [PATCH 5/6] radeonsi: get rid of constant buffer preloading
> https://people.freedesktop.org/~mareko/no_preload2.html
> [PATCH 6/6] radeonsi: reload PS inputs with direct indexing at each
> https://people.freedesktop.org/~mareko/no_preload3_ps.html
> 
> Total diff:
> https://people.freedesktop.org/~mareko/no_preload_total.html
> 
> Please review.
> 
> Marek



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 1/3] mesa: add a GLES3.2 enums section, and expose new MS line width params

2016-09-13 Thread Kenneth Graunke
On Tuesday, September 13, 2016 7:10:57 PM PDT Ilia Mirkin wrote:
> This also exposes them for ARB_ES3_2_compatibility.
> 
> While both specs refer to the new MS line width parameters being
> separate from the existing AA line widths, reality begs to differ. It's
> the same on all hardware currently supported by mesa. Should hardware
> come along that wants these to be different, they're easy enough to
> separate out.
> 
> Signed-off-by: Ilia Mirkin 
> Reviewed-by: Ian Romanick  (v1)

Series is:
Reviewed-by: Kenneth Graunke 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 0/3] *** Aubinator code simplification ***

2016-09-13 Thread Kenneth Graunke
On Tuesday, September 13, 2016 4:19:28 PM PDT Sirisha Gandikota wrote:
> From: Sirisha Gandikota 
> 
> This patch set simplifies parts of code in the aubinator tool
> as per review comments from Ken (Wed Aug 24 04:51:47 UTC 2016)
> 
> v2 of the earlier patches simplifying code further as per Ken's comments
> 
> Sirisha Gandikota (3):
>   aubinator: Simplify print_dword_val() method
>   aubinator: Make gen_disasm_disassemble handle split sends
>   aubinator: Remove bogus "end" parameter in gen_disasm_disassemble()
> 
>  src/intel/tools/aubinator.c  | 24 ++--
>  src/intel/tools/disasm.c | 27 +--
>  src/intel/tools/gen_disasm.h |  2 +-
>  3 files changed, 28 insertions(+), 25 deletions(-)

Pushed, thanks!

To ssh://git.freedesktop.org/git/mesa/mesa
   1eebb60..aa7b410  master -> master


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 0/3] *** Aubinator code simplification ***

2016-09-13 Thread Sirisha Gandikota
From: Sirisha Gandikota 

This patch set simplifies parts of code in the aubinator tool
as per review comments from Ken (Wed Aug 24 04:51:47 UTC 2016)

v2 of the earlier patches simplifying code further as per Ken's comments

Sirisha Gandikota (3):
  aubinator: Simplify print_dword_val() method
  aubinator: Make gen_disasm_disassemble handle split sends
  aubinator: Remove bogus "end" parameter in gen_disasm_disassemble()

 src/intel/tools/aubinator.c  | 24 ++--
 src/intel/tools/disasm.c | 27 +--
 src/intel/tools/gen_disasm.h |  2 +-
 3 files changed, 28 insertions(+), 25 deletions(-)

-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 3/3] aubinator: Remove bogus "end" parameter in gen_disasm_disassemble()

2016-09-13 Thread Sirisha Gandikota
From: Sirisha Gandikota 

Previously, the loop pretended to iterate over instructions from "start" to
"end", but the callers always passed 8192 for end, which is a huge bogus
value. The real loop termination condition is send-with-EOT or opcode 0. (Ken)

v2: no change

Signed-off-by: Sirisha Gandikota 
---
 src/intel/tools/aubinator.c  | 12 ++--
 src/intel/tools/disasm.c |  8 +---
 src/intel/tools/gen_disasm.h |  2 +-
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
index 89d29f2..fad8aaa 100644
--- a/src/intel/tools/aubinator.c
+++ b/src/intel/tools/aubinator.c
@@ -303,7 +303,7 @@ handle_media_interface_descriptor_load(struct gen_spec *spec, uint32_t *p)
   }
 
   insns = (struct brw_instruction *) (gtt + start);
-  gen_disasm_disassemble(disasm, insns, 0, 8192, stdout);
+  gen_disasm_disassemble(disasm, insns, 0, stdout);
 
   dump_samplers(spec, descriptors[3] & ~0x1f);
   dump_binding_table(spec, descriptors[4] & ~0x1f);
@@ -401,7 +401,7 @@ handle_3dstate_vs(struct gen_spec *spec, uint32_t *p)
  instruction_base, start);
 
   insns = (struct brw_instruction *) (gtt + instruction_base + start);
-  gen_disasm_disassemble(disasm, insns, 0, 8192, stdout);
+  gen_disasm_disassemble(disasm, insns, 0, stdout);
}
 }
 
@@ -425,7 +425,7 @@ handle_3dstate_hs(struct gen_spec *spec, uint32_t *p)
  instruction_base, start);
 
   insns = (struct brw_instruction *) (gtt + instruction_base + start);
-  gen_disasm_disassemble(disasm, insns, 0, 8192, stdout);
+  gen_disasm_disassemble(disasm, insns, 0, stdout);
}
 }
 
@@ -519,21 +519,21 @@ handle_3dstate_ps(struct gen_spec *spec, uint32_t *p)
printf("  Kernel[0] %s\n", k0);
if (k0 != unused) {
   insns = (struct brw_instruction *) (gtt + start);
-  gen_disasm_disassemble(disasm, insns, 0, 8192, stdout);
+  gen_disasm_disassemble(disasm, insns, 0, stdout);
}
 
start = instruction_base + (p[k1_offset] & mask);
printf("  Kernel[1] %s\n", k1);
if (k1 != unused) {
   insns = (struct brw_instruction *) (gtt + start);
-  gen_disasm_disassemble(disasm, insns, 0, 8192, stdout);
+  gen_disasm_disassemble(disasm, insns, 0, stdout);
}
 
start = instruction_base + (p[k2_offset] & mask);
printf("  Kernel[2] %s\n", k2);
if (k2 != unused) {
   insns = (struct brw_instruction *) (gtt + start);
-  gen_disasm_disassemble(disasm, insns, 0, 8192, stdout);
+  gen_disasm_disassemble(disasm, insns, 0, stdout);
}
 }
 
diff --git a/src/intel/tools/disasm.c b/src/intel/tools/disasm.c
index 89c711b..2b51424 100644
--- a/src/intel/tools/disasm.c
+++ b/src/intel/tools/disasm.c
@@ -45,13 +45,15 @@ is_send(uint32_t opcode)
 }
 
 void
-gen_disasm_disassemble(struct gen_disasm *disasm, void *assembly, int start,
-   int end, FILE *out)
+gen_disasm_disassemble(struct gen_disasm *disasm, void *assembly,
+   int start, FILE *out)
 {
struct gen_device_info *devinfo = &disasm->devinfo;
bool dump_hex = false;
+   int offset = start;
 
-   for (int offset = start; offset < end;) {
+   /* This loop exits when send-with-EOT or when opcode is 0 */
+   while (true) {
   brw_inst *insn = assembly + offset;
   brw_inst uncompacted;
   bool compacted = brw_inst_cmpt_control(devinfo, insn);
diff --git a/src/intel/tools/gen_disasm.h b/src/intel/tools/gen_disasm.h
index af6654f..24b56c9 100644
--- a/src/intel/tools/gen_disasm.h
+++ b/src/intel/tools/gen_disasm.h
@@ -28,7 +28,7 @@ struct gen_disasm;
 
 struct gen_disasm *gen_disasm_create(int pciid);
 void gen_disasm_disassemble(struct gen_disasm *disasm,
-void *assembly, int start, int end, FILE *out);
+void *assembly, int start, FILE *out);
 
 void gen_disasm_destroy(struct gen_disasm *disasm);
 
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 2/3] aubinator: Make gen_disasm_disassemble handle split sends

2016-09-13 Thread Sirisha Gandikota
From: Sirisha Gandikota 

Skylake adds new SENDS and SENDSC opcodes, which should be
handled in the send-with-EOT check. Make an is_send() helper
that checks if the opcode is SEND/SENDC/SENDS/SENDSC (Ken)

v2: Make is_send() much crisper; mix declarations and
code to keep the code compact (Ken)

Signed-off-by: Sirisha Gandikota 
---
 src/intel/tools/disasm.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/intel/tools/disasm.c b/src/intel/tools/disasm.c
index 7e5a7cb..89c711b 100644
--- a/src/intel/tools/disasm.c
+++ b/src/intel/tools/disasm.c
@@ -35,6 +35,15 @@ struct gen_disasm {
 struct gen_device_info devinfo;
 };
 
+static bool
+is_send(uint32_t opcode)
+{
+   return (opcode == BRW_OPCODE_SEND  ||
+   opcode == BRW_OPCODE_SENDC ||
+   opcode == BRW_OPCODE_SENDS ||
+   opcode == BRW_OPCODE_SENDSC );
+}
+
 void
 gen_disasm_disassemble(struct gen_disasm *disasm, void *assembly, int start,
int end, FILE *out)
@@ -74,14 +83,10 @@ gen_disasm_disassemble(struct gen_disasm *disasm, void *assembly, int start,
   brw_disassemble_inst(out, devinfo, insn, compacted);
 
   /* Simplistic, but efficient way to terminate disasm */
-  if (brw_inst_opcode(devinfo, insn) == BRW_OPCODE_SEND ||
-  brw_inst_opcode(devinfo, insn) == BRW_OPCODE_SENDC) {
- if (brw_inst_eot(devinfo, insn))
-break;
-  }
-
-  if (brw_inst_opcode(devinfo, insn) == 0)
+  uint32_t opcode = brw_inst_opcode(devinfo, insn);
+  if (opcode == 0 || (is_send(opcode) && brw_inst_eot(devinfo, insn))) {
  break;
+  }
}
 }
 
-- 
2.7.4



[Mesa-dev] [PATCH v2 1/3] aubinator: Simplify print_dword_val() method

2016-09-13 Thread Sirisha Gandikota
From: Sirisha Gandikota 

Remove the float/dword union and use iter->p[f->start / 32]
directly, since the printf format %08x expects a uint32_t (Ken)

v2: Make the cleanup much crisper (Ken)

Signed-off-by: Sirisha Gandikota 
---
 src/intel/tools/aubinator.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
index 9d29b68..89d29f2 100644
--- a/src/intel/tools/aubinator.c
+++ b/src/intel/tools/aubinator.c
@@ -91,18 +91,14 @@ print_dword_val(struct gen_field_iterator *iter, uint64_t offset,
 int *dword_num)
 {
struct gen_field *f;
-   union {
-  uint32_t dw;
-  float f;
-   } v;
 
f = iter->group->fields[iter->i - 1];
-   v.dw = iter->p[f->start / 32];
+   const int dword = f->start / 32;
 
-   if (*dword_num != (f->start / 32)) {
+   if (*dword_num != dword) {
   printf("0x%08lx:  0x%08x : Dword %d\n",
- offset + 4 * (f->start / 32), v.dw, f->start / 32);
-  *dword_num = (f->start / 32);
+ offset + 4 * dword,  iter->p[dword], dword);
+  *dword_num = dword;
}
 }
 
-- 
2.7.4
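
Editor's note: a minimal standalone sketch of the simplified logic, assuming a hypothetical sketch_field type that only carries the starting bit. It formats into a buffer instead of calling printf so the result can be checked, and it uses the PRIx64/PRIx32 macros where the patch uses %08lx/%08x; that substitution is this sketch's choice, not something the patch does.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a field: only the starting bit matters here. */
struct sketch_field {
   int start;   /* bit offset of the field within the packet */
};

/* Formats the "Dword N" header line for the dword that contains `f`,
 * unless that dword was already reported. Returns 1 if a line was
 * written to `buf`, 0 otherwise. */
static int
sketch_format_dword(char *buf, size_t len, const uint32_t *p,
                    const struct sketch_field *f, uint64_t offset,
                    int *dword_num)
{
   assert(buf && p && f && dword_num);

   /* Computed once, as in the patch, instead of repeating f->start / 32. */
   const int dword = f->start / 32;

   if (*dword_num == dword)
      return 0;

   snprintf(buf, len, "0x%08" PRIx64 ":  0x%08" PRIx32 " : Dword %d",
            offset + 4 * (uint64_t)dword, p[dword], dword);
   *dword_num = dword;
   return 1;
}
```

Indexing the uint32_t array directly means no union is needed just to satisfy the format specifier.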



[Mesa-dev] [PATCH v3 3/3] glsl: add core plumbing for GL_ANDROID_extension_pack_es31a

2016-09-13 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin 
---
 src/compiler/glsl/glsl_parser_extras.cpp | 58 +++-
 src/compiler/glsl/glsl_parser_extras.h   |  2 ++
 src/mesa/main/extensions_table.h |  2 ++
 src/mesa/main/mtypes.h   |  1 +
 4 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/src/compiler/glsl/glsl_parser_extras.cpp 
b/src/compiler/glsl/glsl_parser_extras.cpp
index 436ddd0..0e9bfa7 100644
--- a/src/compiler/glsl/glsl_parser_extras.cpp
+++ b/src/compiler/glsl/glsl_parser_extras.cpp
@@ -523,6 +523,11 @@ struct _mesa_glsl_extension {
const char *name;
 
/**
+* Whether this extension is a part of AEP
+*/
+   bool aep;
+
+   /**
 * Predicate that checks whether the relevant extension is available for
 * this context.
 */
@@ -565,9 +570,14 @@ has_##name_str(const struct gl_context *ctx, gl_api api, 
uint8_t version) \
 #undef EXT
 
 #define EXT(NAME)   \
-   { "GL_" #NAME, has_##NAME, \
- &_mesa_glsl_parse_state::NAME##_enable,\
- &_mesa_glsl_parse_state::NAME##_warn }
+   { "GL_" #NAME, false, has_##NAME,\
+ &_mesa_glsl_parse_state::NAME##_enable,\
+ &_mesa_glsl_parse_state::NAME##_warn }
+
+#define EXT_AEP(NAME)   \
+   { "GL_" #NAME, true, has_##NAME, \
+ &_mesa_glsl_parse_state::NAME##_enable,\
+ &_mesa_glsl_parse_state::NAME##_warn }
 
 /**
  * Table of extensions that can be enabled/disabled within a shader,
@@ -623,7 +633,7 @@ static const _mesa_glsl_extension 
_mesa_glsl_supported_extensions[] = {
 
/* KHR extensions go here, sorted alphabetically.
 */
-   EXT(KHR_blend_equation_advanced),
+   EXT_AEP(KHR_blend_equation_advanced),
 
/* OES extensions go here, sorted alphabetically.
 */
@@ -632,17 +642,17 @@ static const _mesa_glsl_extension 
_mesa_glsl_supported_extensions[] = {
EXT(OES_geometry_shader),
EXT(OES_gpu_shader5),
EXT(OES_primitive_bounding_box),
-   EXT(OES_sample_variables),
-   EXT(OES_shader_image_atomic),
+   EXT_AEP(OES_sample_variables),
+   EXT_AEP(OES_shader_image_atomic),
EXT(OES_shader_io_blocks),
-   EXT(OES_shader_multisample_interpolation),
+   EXT_AEP(OES_shader_multisample_interpolation),
EXT(OES_standard_derivatives),
EXT(OES_tessellation_point_size),
EXT(OES_tessellation_shader),
EXT(OES_texture_3D),
EXT(OES_texture_buffer),
EXT(OES_texture_cube_map_array),
-   EXT(OES_texture_storage_multisample_2d_array),
+   EXT_AEP(OES_texture_storage_multisample_2d_array),
 
/* All other extensions go here, sorted alphabetically.
 */
@@ -651,23 +661,24 @@ static const _mesa_glsl_extension 
_mesa_glsl_supported_extensions[] = {
EXT(AMD_shader_trinary_minmax),
EXT(AMD_vertex_shader_layer),
EXT(AMD_vertex_shader_viewport_index),
+   EXT(ANDROID_extension_pack_es31a),
EXT(EXT_blend_func_extended),
EXT(EXT_draw_buffers),
EXT(EXT_clip_cull_distance),
EXT(EXT_geometry_point_size),
-   EXT(EXT_geometry_shader),
-   EXT(EXT_gpu_shader5),
-   EXT(EXT_primitive_bounding_box),
+   EXT_AEP(EXT_geometry_shader),
+   EXT_AEP(EXT_gpu_shader5),
+   EXT_AEP(EXT_primitive_bounding_box),
EXT(EXT_separate_shader_objects),
EXT(EXT_shader_framebuffer_fetch),
EXT(EXT_shader_integer_mix),
-   EXT(EXT_shader_io_blocks),
+   EXT_AEP(EXT_shader_io_blocks),
EXT(EXT_shader_samples_identical),
EXT(EXT_tessellation_point_size),
-   EXT(EXT_tessellation_shader),
+   EXT_AEP(EXT_tessellation_shader),
EXT(EXT_texture_array),
-   EXT(EXT_texture_buffer),
-   EXT(EXT_texture_cube_map_array),
+   EXT_AEP(EXT_texture_buffer),
+   EXT_AEP(EXT_texture_cube_map_array),
EXT(MESA_shader_integer_functions),
 };
 
@@ -713,7 +724,6 @@ static const _mesa_glsl_extension *find_extension(const 
char *name)
return NULL;
 }
 
-
 bool
 _mesa_glsl_process_extension(const char *name, YYLTYPE *name_locp,
 const char *behavior_string, YYLTYPE *behavior_locp,
@@ -768,6 +778,22 @@ _mesa_glsl_process_extension(const char *name, YYLTYPE 
*name_locp,
   const _mesa_glsl_extension *extension = find_extension(name);
   if (extension && extension->compatible_with_state(state, api, gl_version)) {
  extension->set_flags(state, behavior);
+ if (extension->available_pred == has_ANDROID_extension_pack_es31a) {
+for (unsigned i = 0;
+ i < ARRAY_SIZE(_mesa_glsl_supported_extensions); ++i) {
+   const _mesa_glsl_extension *extension =
+  &_mesa_glsl_supported_extensions[i];
+
+   if (!extension->aep)
+  continue;
+   /* AEP should not be enabled if all of the sub-extensions can't
+* also be enabled. This is not the proper layer to do such
+* error-checking though
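
Editor's note: the message is cut off above, so the rest of the hunk is not reproduced here. The idea visible in the surviving part (a per-extension aep flag, and enabling the pack walks the table and enables every flagged entry) can be sketched as follows. Everything named sketch_* is hypothetical, the table is a tiny excerpt, and matching the pack by name is a simplification; the patch compares the available_pred pointer instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-reduced version of the extension table entry. */
struct sketch_ext {
   const char *name;
   bool aep;       /* true if the extension is part of the pack */
   bool enabled;
};

static struct sketch_ext sketch_exts[] = {
   { "GL_ANDROID_extension_pack_es31a", false, false },
   { "GL_EXT_geometry_shader",          true,  false },
   { "GL_EXT_texture_array",            false, false },
   { "GL_OES_sample_variables",         true,  false },
};

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Enables `name`; if it is the pack itself, also enables every entry
 * flagged as part of the pack. Returns false for unknown names. */
static bool
sketch_enable(const char *name)
{
   assert(name != NULL);

   for (size_t i = 0; i < SKETCH_ARRAY_SIZE(sketch_exts); i++) {
      if (strcmp(sketch_exts[i].name, name) != 0)
         continue;

      sketch_exts[i].enabled = true;

      if (strcmp(name, "GL_ANDROID_extension_pack_es31a") == 0) {
         for (size_t j = 0; j < SKETCH_ARRAY_SIZE(sketch_exts); j++) {
            if (sketch_exts[j].aep)
               sketch_exts[j].enabled = true;
         }
      }
      return true;
   }
   return false;
}

static bool
sketch_is_enabled(const char *name)
{
   for (size_t i = 0; i < SKETCH_ARRAY_SIZE(sketch_exts); i++) {
      if (strcmp(sketch_exts[i].name, name) == 0)
         return sketch_exts[i].enabled;
   }
   return false;
}
```

Keeping the flag in the table entry means the list of pack members lives in one place, next to the extension names, rather than in a second hand-maintained list.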

[Mesa-dev] [PATCH v3 2/3] mesa: introduce glPrimitiveBoundingBoxARB entrypoint

2016-09-13 Thread Ilia Mirkin
This requires a bit of rejiggering, since normally ES entrypoints alias
core ones, not vice-versa.

Signed-off-by: Ilia Mirkin 
Reviewed-by: Ian Romanick 
---
 src/mapi/glapi/gen/es_EXT.xml   | 19 -
 src/mapi/glapi/gen/gl_API.xml   | 37 +
 src/mesa/main/tests/dispatch_sanity.cpp |  3 +++
 3 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/src/mapi/glapi/gen/es_EXT.xml b/src/mapi/glapi/gen/es_EXT.xml
index b9fbec4..332dc5e 100644
--- a/src/mapi/glapi/gen/es_EXT.xml
+++ b/src/mapi/glapi/gen/es_EXT.xml
@@ -1342,23 +1342,4 @@
 
 
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
 
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index c39aa22..17c59db 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8318,6 +8318,43 @@
 
 
 
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
 
 
 
diff --git a/src/mesa/main/tests/dispatch_sanity.cpp 
b/src/mesa/main/tests/dispatch_sanity.cpp
index 42fe61a..c87b1dc 100644
--- a/src/mesa/main/tests/dispatch_sanity.cpp
+++ b/src/mesa/main/tests/dispatch_sanity.cpp
@@ -1866,6 +1866,9 @@ const struct function gl_core_functions_possible[] = {
{ "glMultiDrawArraysIndirectCountARB", 31, -1 },
{ "glMultiDrawElementsIndirectCountARB", 31, -1 },
 
+   /* GL_ARB_ES3_2_compatibility */
+   { "glPrimitiveBoundingBoxARB", 45, -1 },
+
{ NULL, 0, -1 }
 };
 
-- 
2.7.3



[Mesa-dev] [PATCH v3 1/3] mesa: add a GLES3.2 enums section, and expose new MS line width params

2016-09-13 Thread Ilia Mirkin
This also exposes them for ARB_ES3_2_compatibility.

While both specs refer to the new MS line width parameters being
separate from the existing AA line widths, reality begs to differ. It's
the same on all hardware currently supported by mesa. Should hardware
come along that wants these to be different, they're easy enough to
separate out.

Signed-off-by: Ilia Mirkin 
Reviewed-by: Ian Romanick  (v1)
---

v3: drop separate constants for the MS params, reuse the AA ones

 src/mesa/main/context.h | 10 ++
 src/mesa/main/get.c | 26 --
 src/mesa/main/get_hash_generator.py | 15 +++
 src/mesa/main/get_hash_params.py|  5 +
 4 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/src/mesa/main/context.h b/src/mesa/main/context.h
index 4cd149d..520b3bb 100644
--- a/src/mesa/main/context.h
+++ b/src/mesa/main/context.h
@@ -318,6 +318,16 @@ _mesa_is_gles31(const struct gl_context *ctx)
 
 
 /**
+ * Checks if the context is for GLES 3.2 or later
+ */
+static inline bool
+_mesa_is_gles32(const struct gl_context *ctx)
+{
+   return ctx->API == API_OPENGLES2 && ctx->Version >= 32;
+}
+
+
+/**
  * Checks if the context supports geometry shaders.
  */
 static inline bool
diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index 810ccb9..3cabb2b 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -142,6 +142,7 @@ enum value_extra {
EXTRA_API_ES2,
EXTRA_API_ES3,
EXTRA_API_ES31,
+   EXTRA_API_ES32,
EXTRA_NEW_BUFFERS, 
EXTRA_NEW_FRAG_CLAMP,
EXTRA_VALID_DRAW_BUFFER,
@@ -416,6 +417,12 @@ static const int 
extra_ARB_gpu_shader5_or_OES_sample_variables[] = {
EXTRA_END
 };
 
+static const int extra_ES32[] = {
+   EXT(ARB_ES3_2_compatibility),
+   EXTRA_API_ES32,
+   EXTRA_END
+};
+
 EXTRA_EXT(ARB_texture_cube_map);
 EXTRA_EXT(EXT_texture_array);
 EXTRA_EXT(NV_fog_distance);
@@ -1164,6 +1171,11 @@ check_extra(struct gl_context *ctx, const char *func, 
const struct value_desc *d
  if (_mesa_is_gles31(ctx))
 api_found = GL_TRUE;
 break;
+  case EXTRA_API_ES32:
+ api_check = GL_TRUE;
+ if (_mesa_is_gles32(ctx))
+api_found = GL_TRUE;
+break;
   case EXTRA_API_GL:
  api_check = GL_TRUE;
  if (_mesa_is_desktop_gl(ctx))
@@ -1312,12 +1324,14 @@ find_value(const char *func, GLenum pname, void **p, 
union value *v)
 * value since it's compatible with GLES2 its entry in table_set[] is at the
 * end.
 */
-   STATIC_ASSERT(ARRAY_SIZE(table_set) == API_OPENGL_LAST + 3);
-   if (_mesa_is_gles3(ctx)) {
-  api = API_OPENGL_LAST + 1;
-   }
-   if (_mesa_is_gles31(ctx)) {
-  api = API_OPENGL_LAST + 2;
+   STATIC_ASSERT(ARRAY_SIZE(table_set) == API_OPENGL_LAST + 4);
+   if (ctx->API == API_OPENGLES2) {
+  if (ctx->Version >= 32)
+ api = API_OPENGL_LAST + 3;
+  else if (ctx->Version >= 31)
+ api = API_OPENGL_LAST + 2;
+  else if (ctx->Version >= 30)
+ api = API_OPENGL_LAST + 1;
}
mask = ARRAY_SIZE(table(api)) - 1;
hash = (pname * prime_factor);
diff --git a/src/mesa/main/get_hash_generator.py 
b/src/mesa/main/get_hash_generator.py
index c777b78..d7460c8 100644
--- a/src/mesa/main/get_hash_generator.py
+++ b/src/mesa/main/get_hash_generator.py
@@ -44,7 +44,7 @@ prime_factor = 89
 prime_step = 281
 hash_table_size = 1024
 
-gl_apis=set(["GL", "GL_CORE", "GLES", "GLES2", "GLES3", "GLES31"])
+gl_apis=set(["GL", "GL_CORE", "GLES", "GLES2", "GLES3", "GLES31", "GLES32"])
 
 def print_header():
print "typedef const unsigned short table_t[%d];\n" % (hash_table_size)
@@ -69,6 +69,7 @@ api_enum = [
'GL_CORE',
'GLES3', # Not in gl_api enum in mtypes.h
'GLES31', # Not in gl_api enum in mtypes.h
+   'GLES32', # Not in gl_api enum in mtypes.h
 ]
 
 def api_index(api):
@@ -168,13 +169,18 @@ def generate_hash_tables(enum_list, enabled_apis, 
param_descriptors):
 
  for api in valid_apis:
 add_to_hash_table(tables[api], hash_val, len(params))
-# Also add GLES2 items to the GLES3 and GLES31 hash table
+# Also add GLES2 items to the GLES3+ hash tables
 if api == "GLES2":
add_to_hash_table(tables["GLES3"], hash_val, len(params))
add_to_hash_table(tables["GLES31"], hash_val, len(params))
-# Also add GLES3 items to the GLES31 hash table
+   add_to_hash_table(tables["GLES32"], hash_val, len(params))
+# Also add GLES3 items to the GLES31+ hash tables
 if api == "GLES3":
add_to_hash_table(tables["GLES31"], hash_val, len(params))
+   add_to_hash_table(tables["GLES32"], hash_val, len(params))
+# Also add GLES31 items to the GLES32+ hash tables
+if api == "GLES31":
+   add_to_hash_table(tables["GLES32"], hash_val, len(params))
  params.append(["GL_" + enum_name, param[1]])
 
sort
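
Editor's note: the message is truncated above. The table selection that the get.c hunk introduces can be sketched on its own as below. The enum is a hypothetical mirror of the API enum (the sketch_* names and their order are assumptions made for this example); the point is only that GLES 3.0, 3.1 and 3.2 get extra table slots after the last real API value, picked by context version.

```c
#include <assert.h>

/* Hypothetical mirror of the API enum; names and order are assumptions. */
enum sketch_api {
   SKETCH_API_OPENGL_COMPAT,
   SKETCH_API_OPENGLES,
   SKETCH_API_OPENGLES2,
   SKETCH_API_OPENGL_CORE,
   SKETCH_API_OPENGL_LAST = SKETCH_API_OPENGL_CORE
};

/* Number of tables: one per API plus GLES3, GLES31 and GLES32. */
#define SKETCH_NUM_TABLES (SKETCH_API_OPENGL_LAST + 4)

/* Picks the hash table slot for a context. `version` is major * 10 + minor,
 * for example 31 for GLES 3.1. */
static int
sketch_table_index(enum sketch_api api, int version)
{
   assert(version > 0);

   int index = api;
   if (api == SKETCH_API_OPENGLES2) {
      if (version >= 32)
         index = SKETCH_API_OPENGL_LAST + 3;
      else if (version >= 31)
         index = SKETCH_API_OPENGL_LAST + 2;
      else if (version >= 30)
         index = SKETCH_API_OPENGL_LAST + 1;
   }
   assert(index < SKETCH_NUM_TABLES);
   return index;
}
```

Testing the highest version first makes each context land in exactly one slot, which matches the generator change that copies lower-version entries into every higher-version table.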

Re: [Mesa-dev] [PATCH v3] HUD: Add support for block I/O, network I/O and lmsensor stats

2016-09-13 Thread Brian Paul

On 09/13/2016 02:21 PM, Steven Toth wrote:

V3: Flatten the entire patchset ready for the ML

V2: Additional separate patches based on feedback
a) configure.ac: Add a comment related to libsensors

b) HUD: Disable Block/NIC I/O stats by default.
Implement configuration option --enable-gallium-extra-hud=yes
and enable both statistics when this option is enabled.

c) Configure.ac: Minor cleanup to user visible configuration settings

d) Configure.ac: HUD stats - build system improvements
Move the -lsensors out of a deeper Makefile, bring it into the configure.ac.
Also, rename a compiler directive to more closely follow the standard.

V1: Initial release to the ML
Three new features:
1. Disk/block I/O device read/write stats in MB/s.
2. Network Interface RX/TX transfer statistics as a percentage
of the overall NIC speed.
3. lmsensor power, voltage and temperature sensors.

The lmsensor changes add a dependency on libsensors, so lmsensor
support is disabled by default.

Signed-off-by: Steven Toth 


Builds and runs as expected with MSVC.

I'll leave the detailed review to others.

Tested-by: Brian Paul 



Re: [Mesa-dev] [PATCH 13/14] egl: Track EGL_KHR_debug state when going through EGL API calls (v2)

2016-09-13 Thread Kyle Brenneman

On 09/13/2016 02:42 PM, Adam Jackson wrote:

On Tue, 2016-09-13 at 14:14 -0600, Kyle Brenneman wrote:

On 09/13/2016 11:57 AM, Adam Jackson wrote:

@@ -37,7 +39,7 @@
   
   /* This should be kept in sync with _eglInitThreadInfo() */

   #define _EGL_THREAD_INFO_INITIALIZER \
-   { EGL_SUCCESS, NULL, 0, NULL, NULL, NULL }
+   { EGL_SUCCESS, NULL, EGL_NONE, NULL, NULL, NULL }

The API here should be EGL_OPENGL_ES_API, not EGL_NONE. Otherwise, the
current API would effectively change when the _EGLThreadInfo struct is
allocated. Or I guess more generally, _EGL_THREAD_INFO_INITIALIZER
should produce the same data as _eglInitThreadInfo.

Mmm, okay. That's a very close reading of the spec. QueryAPI allows the
result to be EGL_NONE, which does make sense for the dummy thread since
you sure won't be doing much with it. But BindAPI says the default is
EGL_OPENGL_ES_API, so presumably that should apply even to the dummy
context. One does wonder then how you could ever get EGL_NONE out of
QueryAPI.

- ajax

eglQueryAPI allows the result to be EGL_NONE only if the implementation
doesn't support GLES. From the spec (EGL 1.5, section 3.7):
"The initial value of the current rendering API is EGL_OPENGL_ES_API,
unless OpenGL ES is not supported by an implementation, in which case
the initial value is EGL_NONE."
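
Editor's note: the review point (the static initializer and the runtime init function must agree, and both should start at EGL_OPENGL_ES_API) can be sketched with a reduced, hypothetical struct. The sketch_* names and the three-field layout are assumptions made for the example; the real _EGLThreadInfo has more members.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the EGL enums used below. */
#define SKETCH_EGL_SUCCESS       0x3000
#define SKETCH_EGL_OPENGL_ES_API 0x30A0

/* Hypothetical, reduced per-thread state. */
struct sketch_thread_info {
   int last_error;
   void *current_context;
   unsigned current_api;
};

/* Must be kept in sync with sketch_init_thread_info(). */
#define SKETCH_THREAD_INFO_INITIALIZER \
   { SKETCH_EGL_SUCCESS, NULL, SKETCH_EGL_OPENGL_ES_API }

/* Fallback state used before (or instead of) a real allocation. */
static struct sketch_thread_info sketch_dummy_thread =
   SKETCH_THREAD_INFO_INITIALIZER;

static void
sketch_init_thread_info(struct sketch_thread_info *t)
{
   assert(t != NULL);
   t->last_error = SKETCH_EGL_SUCCESS;
   t->current_context = NULL;
   t->current_api = SKETCH_EGL_OPENGL_ES_API;
}

static unsigned
sketch_query_api(const struct sketch_thread_info *t)
{
   assert(t != NULL);
   return t->current_api;
}
```

If the two paths disagreed, a query made through the dummy state and one made after allocation would report different APIs even though the application never called BindAPI, which is the behavior change the review comment warns about.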




Re: [Mesa-dev] [PATCH] tgsi: Enable returns from within loops

2016-09-13 Thread Lars Hamre
Yes please, thanks!

On Tue, Sep 13, 2016 at 4:22 PM, Brian Paul  wrote:
> On 09/13/2016 01:08 PM, Lars Hamre wrote:
>>
>> Fixes the following piglit test (for softpipe):
>> /spec/glsl-1.10/execution/fs-loop-return
>>
>> Signed-off-by: Lars Hamre 
>>
>> ---
>>   src/gallium/auxiliary/tgsi/tgsi_exec.c | 4 
>>   1 file changed, 4 insertions(+)
>>
>> NOTE: Someone with access will need to commit this
>>after the review process
>>
>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c
>> b/src/gallium/auxiliary/tgsi/tgsi_exec.c
>> index 1457c06..aff35e6 100644
>> --- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
>> +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
>> @@ -5148,6 +5148,10 @@ exec_instruction(
>>   /* returning from main() */
>>   mach->CondStackTop = 0;
>>   mach->LoopStackTop = 0;
>> +mach->ContStackTop = 0;
>> +mach->LoopLabelStackTop = 0;
>> +mach->SwitchStackTop = 0;
>> +mach->BreakStackTop = 0;
>>   *pc = -1;
>>   return FALSE;
>>}
>> --
>> 2.7.4
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>
> Reviewed-by: Brian Paul 
>
> Do you need me to push this for you?
>


Re: [Mesa-dev] [PATCH 13/14] egl: Track EGL_KHR_debug state when going through EGL API calls (v2)

2016-09-13 Thread Adam Jackson
On Tue, 2016-09-13 at 14:14 -0600, Kyle Brenneman wrote:
> On 09/13/2016 11:57 AM, Adam Jackson wrote:
> > @@ -37,7 +39,7 @@
> >   
> >   /* This should be kept in sync with _eglInitThreadInfo() */
> >   #define _EGL_THREAD_INFO_INITIALIZER \
> > -   { EGL_SUCCESS, NULL, 0, NULL, NULL, NULL }
> > +   { EGL_SUCCESS, NULL, EGL_NONE, NULL, NULL, NULL }
> 
> The API here should be EGL_OPENGL_ES_API, not EGL_NONE. Otherwise, the 
> current API would effectively change when the _EGLThreadInfo struct is 
> allocated. Or I guess more generally, _EGL_THREAD_INFO_INITIALIZER 
> should produce the same data as _eglInitThreadInfo.

Mmm, okay. That's a very close reading of the spec. QueryAPI allows the
result to be EGL_NONE, which does make sense for the dummy thread since
you sure won't be doing much with it. But BindAPI says the default is
EGL_OPENGL_ES_API, so presumably that should apply even to the dummy
context. One does wonder then how you could ever get EGL_NONE out of
QueryAPI.

- ajax


Re: [Mesa-dev] [PATCH v3] HUD: Add support for block I/O, network I/O and lmsensor stats

2016-09-13 Thread Steven Toth
> V3: Flatten the entire patchset ready for the ML

Compile tested on Windows via AppVeyor
Patches tested in various ./configure disable/enable modes on Ubuntu
16.04, 4.5.7 kernel on 32bit.

Many thanks to everyone who provided feedback.

-- 
Steven Toth - Kernel Labs
http://www.kernellabs.com
+1.646.355.8490


[Mesa-dev] [PATCH] radeonsi: get rid of img/buf/sampler descriptor preloading (v2)

2016-09-13 Thread Marek Olšák
From: Marek Olšák 

26011 shaders in 14651 tests
Totals:
SGPRS: 1251920 -> 1152636 (-7.93 %)
VGPRS: 728421 -> 728198 (-0.03 %)
Spilled SGPRs: 16644 -> 3776 (-77.31 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 36001064 -> 35835152 (-0.46 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 21 -> 222372 (0.07 %)
Wait states: 0 -> 0 (0.00 %)

v2: merge codepaths where possible
---
 src/gallium/drivers/radeonsi/si_shader.c | 173 ---
 1 file changed, 41 insertions(+), 132 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 84cbfd7..6f9c45f 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -100,25 +100,20 @@ struct si_shader_context
 
LLVMTargetMachineRef tm;
 
unsigned invariant_load_md_kind;
unsigned range_md_kind;
unsigned uniform_md_kind;
LLVMValueRef empty_md;
 
/* Preloaded descriptors. */
LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS];
-   LLVMValueRef shader_buffers[SI_NUM_SHADER_BUFFERS];
-   LLVMValueRef sampler_views[SI_NUM_SAMPLERS];
-   LLVMValueRef sampler_states[SI_NUM_SAMPLERS];
-   LLVMValueRef fmasks[SI_NUM_SAMPLERS];
-   LLVMValueRef images[SI_NUM_IMAGES];
LLVMValueRef esgs_ring;
LLVMValueRef gsvs_ring[4];
 
LLVMValueRef lds;
LLVMValueRef gs_next_vertex[4];
LLVMValueRef return_value;
 
LLVMTypeRef voidt;
LLVMTypeRef i1;
LLVMTypeRef i8;
@@ -3399,32 +3394,32 @@ static void membar_emit(
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
 
emit_waitcnt(ctx);
 }
 
 static LLVMValueRef
 shader_buffer_fetch_rsrc(struct si_shader_context *ctx,
 const struct tgsi_full_src_register *reg)
 {
-   LLVMValueRef ind_index;
-   LLVMValueRef rsrc_ptr;
+   LLVMValueRef index;
+   LLVMValueRef rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn,
+SI_PARAM_SHADER_BUFFERS);
 
if (!reg->Register.Indirect)
-   return ctx->shader_buffers[reg->Register.Index];
-
-   ind_index = get_bounded_indirect_index(ctx, &reg->Indirect,
-  reg->Register.Index,
-  SI_NUM_SHADER_BUFFERS);
+   index = LLVMConstInt(ctx->i32, reg->Register.Index, 0);
+   else
+   index = get_bounded_indirect_index(ctx, &reg->Indirect,
+  reg->Register.Index,
+  SI_NUM_SHADER_BUFFERS);
 
-   rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, SI_PARAM_SHADER_BUFFERS);
-   return build_indexed_load_const(ctx, rsrc_ptr, ind_index);
+   return build_indexed_load_const(ctx, rsrc_ptr, index);
 }
 
 static bool tgsi_is_array_sampler(unsigned target)
 {
return target == TGSI_TEXTURE_1D_ARRAY ||
   target == TGSI_TEXTURE_SHADOW1D_ARRAY ||
   target == TGSI_TEXTURE_2D_ARRAY ||
   target == TGSI_TEXTURE_SHADOW2D_ARRAY ||
   target == TGSI_TEXTURE_CUBE_ARRAY ||
   target == TGSI_TEXTURE_SHADOWCUBE_ARRAY ||
@@ -3473,51 +3468,47 @@ static LLVMValueRef force_dcc_off(struct 
si_shader_context *ctx,
  * Load the resource descriptor for \p image.
  */
 static void
 image_fetch_rsrc(
struct lp_build_tgsi_context *bld_base,
const struct tgsi_full_src_register *image,
bool dcc_off,
LLVMValueRef *rsrc)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
+   LLVMValueRef rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn,
+SI_PARAM_IMAGES);
+   LLVMValueRef index, tmp;
 
assert(image->Register.File == TGSI_FILE_IMAGE);
 
if (!image->Register.Indirect) {
-   /* Fast path: use preloaded resources */
-   *rsrc = ctx->images[image->Register.Index];
+   index = LLVMConstInt(ctx->i32, image->Register.Index, 0);
} else {
-   /* Indexing and manual load */
-   LLVMValueRef ind_index;
-   LLVMValueRef rsrc_ptr;
-   LLVMValueRef tmp;
-
/* From the GL_ARB_shader_image_load_store extension spec:
 *
 *If a shader performs an image load, store, or atomic
 *operation using an image variable declared as an array,
 *and if the index used to select an individual element is
 *negative or greater than or equal to the size of the
 *array, the results of the operation are undefined but may
 *not lead to termination.
 */
-   ind_index = get_bounded_indirect_ind

[Mesa-dev] [PATCH] radeonsi: load streamout buffer descriptors before use (v2)

2016-09-13 Thread Marek Olšák
From: Marek Olšák 

v2: inline the code and remove the conditional that's a no-op now
---
 src/gallium/drivers/radeonsi/si_shader.c | 47 ++--
 1 file changed, 14 insertions(+), 33 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index be6fae7..d61f4ff 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -105,21 +105,20 @@ struct si_shader_context
unsigned uniform_md_kind;
LLVMValueRef empty_md;
 
LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS];
LLVMValueRef lds;
LLVMValueRef shader_buffers[SI_NUM_SHADER_BUFFERS];
LLVMValueRef sampler_views[SI_NUM_SAMPLERS];
LLVMValueRef sampler_states[SI_NUM_SAMPLERS];
LLVMValueRef fmasks[SI_NUM_SAMPLERS];
LLVMValueRef images[SI_NUM_IMAGES];
-   LLVMValueRef so_buffers[4];
LLVMValueRef esgs_ring;
LLVMValueRef gsvs_ring[4];
LLVMValueRef gs_next_vertex[4];
LLVMValueRef return_value;
 
LLVMTypeRef voidt;
LLVMTypeRef i1;
LLVMTypeRef i8;
LLVMTypeRef i32;
LLVMTypeRef i64;
@@ -2264,20 +2263,33 @@ static void si_dump_streamout(struct 
pipe_stream_output_info *so)
  * to buffers. */
 static void si_llvm_emit_streamout(struct si_shader_context *ctx,
   struct si_shader_output_values *outputs,
   unsigned noutput)
 {
struct pipe_stream_output_info *so = &ctx->shader->selector->so;
struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm;
LLVMBuilderRef builder = gallivm->builder;
int i, j;
struct lp_build_if_state if_ctx;
+   LLVMValueRef so_buffers[4];
+   LLVMValueRef buf_ptr = LLVMGetParam(ctx->radeon_bld.main_fn,
+   SI_PARAM_RW_BUFFERS);
+
+   /* Load the descriptors. */
+   for (i = 0; i < 4; ++i) {
+   if (ctx->shader->selector->so.stride[i]) {
+   LLVMValueRef offset = lp_build_const_int32(gallivm,
+  SI_VS_STREAMOUT_BUF0 + i);
+
+   so_buffers[i] = build_indexed_load_const(ctx, buf_ptr, offset);
+   }
+   }
 
/* Get bits [22:16], i.e. (so_param >> 16) & 127; */
LLVMValueRef so_vtx_count =
unpack_param(ctx, ctx->param_streamout_config, 16, 7);
 
LLVMValueRef tid = get_thread_id(ctx);
 
/* can_emit = tid < so_vtx_count; */
LLVMValueRef can_emit =
LLVMBuildICmp(builder, LLVMIntULT, tid, so_vtx_count, "");
@@ -2359,21 +2371,21 @@ static void si_llvm_emit_streamout(struct 
si_shader_context *ctx,
}
break;
}
 
LLVMValueRef can_emit_stream =
LLVMBuildICmp(builder, LLVMIntEQ,
  stream_id,
  lp_build_const_int32(gallivm, stream), "");
 
lp_build_if(&if_ctx_stream, gallivm, can_emit_stream);
-   build_tbuffer_store_dwords(ctx, ctx->so_buffers[buf_idx],
+   build_tbuffer_store_dwords(ctx, so_buffers[buf_idx],
   vdata, num_comps,
   so_write_offset[buf_idx],
   LLVMConstInt(ctx->i32, 0, 0),
   so->output[i].dst_offset*4);
lp_build_endif(&if_ctx_stream);
}
}
lp_build_endif(&if_ctx);
 }
 
@@ -5917,49 +5929,20 @@ static void preload_images(struct si_shader_context 
*ctx)
 lp_build_const_int32(gallivm, 
i));
 
if (info->images_writemask & (1 << i) &&
!(info->images_buffers & (1 << i)))
rsrc = force_dcc_off(ctx, rsrc);
 
ctx->images[i] = rsrc;
}
 }
 
-static void preload_streamout_buffers(struct si_shader_context *ctx)
-{
-   struct lp_build_tgsi_context *bld_base = &ctx->radeon_bld.soa.bld_base;
-   struct gallivm_state *gallivm = bld_base->base.gallivm;
-   unsigned i;
-
-   /* Streamout can only be used if the shader is compiled as VS. */
-   if (!ctx->shader->selector->so.num_outputs ||
-   (ctx->type == PIPE_SHADER_VERTEX &&
-(ctx->shader->key.vs.as_es ||
- ctx->shader->key.vs.as_ls)) ||
-   (ctx->type == PIPE_SHADER_TESS_EVAL &&
-ctx->shader->key.tes.as_es))
-   return;
-
-   LLVMValueRef buf_ptr = LLVMGetParam(ctx->radeon_bld.main_fn,
-   SI_PARAM_R

[Mesa-dev] [PATCH v3] HUD: Add support for block I/O, network I/O and lmsensor stats

2016-09-13 Thread Steven Toth
V3: Flatten the entire patchset ready for the ML

V2: Additional separate patches based on feedback
a) configure.ac: Add a comment related to libsensors

b) HUD: Disable Block/NIC I/O stats by default.
Implement configuration option --enable-gallium-extra-hud=yes
and enable both statistics when this option is enabled.

c) Configure.ac: Minor cleanup to user visible configuration settings

d) Configure.ac: HUD stats - build system improvements
Move the -lsensors out of a deeper Makefile, bring it into the configure.ac.
Also, rename a compiler directive to more closely follow the standard.

V1: Initial release to the ML
Three new features:
1. Disk/block I/O device read/write stats in MB/s.
2. Network Interface RX/TX transfer statistics as a percentage
   of the overall NIC speed.
3. lmsensor power, voltage and temperature sensors.

The lmsensor changes add a dependency on libsensors, so lmsensor
support is disabled by default.

Signed-off-by: Steven Toth 
---
 configure.ac |  42 +++
 src/gallium/auxiliary/Makefile.am|   2 +
 src/gallium/auxiliary/Makefile.sources   |   3 +
 src/gallium/auxiliary/hud/hud_context.c  |  73 +
 src/gallium/auxiliary/hud/hud_diskstat.c | 335 
 src/gallium/auxiliary/hud/hud_nic.c  | 441 +++
 src/gallium/auxiliary/hud/hud_private.h  |  25 ++
 src/gallium/auxiliary/hud/hud_sensors_temp.c | 374 +++
 src/gallium/include/pipe/p_defines.h |   4 +
 9 files changed, 1299 insertions(+)
 create mode 100644 src/gallium/auxiliary/hud/hud_diskstat.c
 create mode 100644 src/gallium/auxiliary/hud/hud_nic.c
 create mode 100644 src/gallium/auxiliary/hud/hud_sensors_temp.c

diff --git a/configure.ac b/configure.ac
index a413a3a..610dff0 100644
--- a/configure.ac
+++ b/configure.ac
@@ -91,6 +91,7 @@ XCBGLX_REQUIRED=1.8.1
 XSHMFENCE_REQUIRED=1.1
 XVMC_REQUIRED=1.0.6
 PYTHON_MAKO_REQUIRED=0.8.0
+LIBSENSORS_REQUIRED=4.0.0
 
 dnl Check for progs
 AC_PROG_CPP
@@ -871,6 +872,32 @@ AC_ARG_ENABLE([dri],
 [enable_dri="$enableval"],
 [enable_dri=yes])
 
+AC_ARG_ENABLE([gallium-extra-hud],
+[AS_HELP_STRING([--enable-gallium-extra-hud],
+[enable HUD block/NIC I/O HUD stats support @<:@default=disabled@:>@])],
+[enable_gallium_extra_hud="$enableval"],
+[enable_gallium_extra_hud=no])
+AM_CONDITIONAL(HAVE_GALLIUM_EXTRA_HUD, test "x$enable_gallium_extra_hud" = xyes)
+if test "x$enable_gallium_extra_hud" = xyes ; then
+DEFINES="${DEFINES} -DHAVE_GALLIUM_EXTRA_HUD=1"
+fi
+
+#TODO: no pkgconfig .pc available for libsensors.
+#PKG_CHECK_MODULES([LIBSENSORS], [libsensors >= $LIBSENSORS_REQUIRED], [enable_lmsensors=yes], [enable_lmsensors=no])
+AC_ARG_ENABLE([lmsensors],
+[AS_HELP_STRING([--enable-lmsensors],
+[enable HUD lmsensor support @<:@default=disabled@:>@])],
+[enable_lmsensors="$enableval"],
+[enable_lmsensors=no])
+AM_CONDITIONAL(HAVE_LIBSENSORS, test "x$enable_lmsensors" = xyes)
+if test "x$enable_lmsensors" = xyes ; then
+DEFINES="${DEFINES} -DHAVE_LIBSENSORS=1"
+LIBSENSORS_LDFLAGS="-lsensors"
+else
+LIBSENSORS_LDFLAGS=""
+fi
+AC_SUBST(LIBSENSORS_LDFLAGS)
+
 case "$host_os" in
 linux*)
 dri3_default=yes
@@ -1122,6 +1149,8 @@ AM_CONDITIONAL(HAVE_DRISW_KMS, test "x$have_drisw_kms" = 
xyes )
 AM_CONDITIONAL(HAVE_DRI2, test "x$enable_dri" = xyes -a "x$dri_platform" = 
xdrm -a "x$have_libdrm" = xyes )
 AM_CONDITIONAL(HAVE_DRI3, test "x$enable_dri3" = xyes -a "x$dri_platform" = 
xdrm -a "x$have_libdrm" = xyes )
 AM_CONDITIONAL(HAVE_APPLEDRI, test "x$enable_dri" = xyes -a "x$dri_platform" = 
xapple )
+AM_CONDITIONAL(HAVE_LMSENSORS, test "x$enable_lmsensors" = xyes )
+AM_CONDITIONAL(HAVE_GALLIUM_EXTRA_HUD, test "x$enable_gallium_extra_hud" = 
xyes )
 
 AC_ARG_ENABLE([shared-glapi],
 [AS_HELP_STRING([--enable-shared-glapi],
@@ -2876,6 +2905,19 @@ else
 echo "Gallium: no"
 fi
 
+echo ""
+if test "x$enable_gallium_extra_hud" != xyes; then
+echo "HUD extra stats: no"
+else
+echo "HUD extra stats: yes"
+fi
+
+if test "x$enable_lmsensors" != xyes; then
+echo "HUD lmsensors:   no"
+else
+echo "HUD lmsensors:   yes"
+fi
+
 dnl Shader cache
 echo ""
 echo "Shader cache:$enable_shader_cache"
diff --git a/src/gallium/auxiliary/Makefile.am 
b/src/gallium/auxiliary/Makefile.am
index d971a2b..4a4a4fb 100644
--- a/src/gallium/auxiliary/Makefile.am
+++ b/src/gallium/auxiliary/Makefile.am
@@ -34,6 +34,8 @@ libgallium_la_SOURCES += \
 
 endif
 
+libgallium_la_LDFLAGS = $(LIBSENSORS_LDFLAGS)
+
 MKDIR_GEN = $(AM_V_at)$(MKDIR_P) $(@D)
 PYTHON_GEN =  $(AM_V_GEN)$(PYTHON2) $(PYTHON_FLAGS)
 
diff --git a/src/gallium/auxiliary/Makefile.sources 
b/src/gallium/auxiliary/Makefile.sources
index f8954c9..650a403 100644
--- a/src/gallium/auxiliary/Makefile.sources
+++ b/src/gallium/auxiliary/Makefile.sources
@@ -62,6 +62,9 @@ C_SOURCES := \
hud

Re: [Mesa-dev] [PATCH] tgsi: Enable returns from within loops

2016-09-13 Thread Brian Paul

On 09/13/2016 01:08 PM, Lars Hamre wrote:

Fixes the following piglit test (for softpipe):
/spec/glsl-1.10/execution/fs-loop-return

Signed-off-by: Lars Hamre 

---
  src/gallium/auxiliary/tgsi/tgsi_exec.c | 4 
  1 file changed, 4 insertions(+)

NOTE: Someone with access will need to commit this
   after the review process

diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index 1457c06..aff35e6 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -5148,6 +5148,10 @@ exec_instruction(
  /* returning from main() */
  mach->CondStackTop = 0;
  mach->LoopStackTop = 0;
+mach->ContStackTop = 0;
+mach->LoopLabelStackTop = 0;
+mach->SwitchStackTop = 0;
+mach->BreakStackTop = 0;
  *pc = -1;
  return FALSE;
   }
--
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



Reviewed-by: Brian Paul 

Do you need me to push this for you?



[Mesa-dev] [PATCH] radeonsi: reload PS inputs with direct indexing at each use (v2)

2016-09-13 Thread Marek Olšák
From: Marek Olšák 

The LLVM compiler can CSE interp intrinsics thanks to
LLVMReadNoneAttribute.

26011 shaders in 14651 tests
Totals:
SGPRS: 1146340 -> 1132676 (-1.19 %)
VGPRS: 727371 -> 711730 (-2.15 %)
Spilled SGPRs: 2218 -> 2078 (-6.31 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 35841268 -> 36009732 (0.47 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 222559 -> 224779 (1.00 %)
Wait states: 0 -> 0 (0.00 %)

v2: don't call load_input for fragment shaders in emit_declaration
---
 src/gallium/drivers/radeon/radeon_llvm.h   |  6 -
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 30 ++
 src/gallium/drivers/radeonsi/si_shader.c   | 27 ---
 3 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_llvm.h 
b/src/gallium/drivers/radeon/radeon_llvm.h
index da5b7f5..f508d32 100644
--- a/src/gallium/drivers/radeon/radeon_llvm.h
+++ b/src/gallium/drivers/radeon/radeon_llvm.h
@@ -23,21 +23,23 @@
  * Authors: Tom Stellard 
  *
  */
 
 #ifndef RADEON_LLVM_H
 #define RADEON_LLVM_H
 
 #include 
 #include "gallivm/lp_bld_init.h"
 #include "gallivm/lp_bld_tgsi.h"
+#include "tgsi/tgsi_parse.h"
 
+#define RADEON_LLVM_MAX_INPUT_SLOTS 32
 #define RADEON_LLVM_MAX_INPUTS 32 * 4
 #define RADEON_LLVM_MAX_OUTPUTS 32 * 4
 
 #define RADEON_LLVM_INITIAL_CF_DEPTH 4
 
 #define RADEON_LLVM_MAX_SYSTEM_VALUES 4
 
 struct radeon_llvm_branch {
LLVMBasicBlockRef endif_block;
LLVMBasicBlockRef if_block;
@@ -55,33 +57,35 @@ struct radeon_llvm_context {
 
/*=== Front end configuration ===*/
 
/* Instructions that are not described by any of the TGSI opcodes. */
 
/** This function is responsible for initilizing the inputs array and 
will be
  * called once for each input declared in the TGSI shader.
  */
void (*load_input)(struct radeon_llvm_context *,
   unsigned input_index,
-  const struct tgsi_full_declaration *decl);
+  const struct tgsi_full_declaration *decl,
+  LLVMValueRef out[4]);
 
void (*load_system_value)(struct radeon_llvm_context *,
  unsigned index,
  const struct tgsi_full_declaration *decl);
 
void (*declare_memory_region)(struct radeon_llvm_context *,
  const struct tgsi_full_declaration *decl);
 
/** This array contains the input values for the shader.  Typically 
these
  * values will be in the form of a target intrinsic that will inform 
the
  * backend how to load the actual inputs to the shader. 
  */
+   struct tgsi_full_declaration input_decls[RADEON_LLVM_MAX_INPUT_SLOTS];
LLVMValueRef inputs[RADEON_LLVM_MAX_INPUTS];
LLVMValueRef outputs[RADEON_LLVM_MAX_OUTPUTS][TGSI_NUM_CHANNELS];
 
/** This pointer is used to contain the temporary values.
  * The amount of temporary used in tgsi can't be bound to a max value 
and
  * thus we must allocate this array at runtime.
  */
LLVMValueRef *temps;
unsigned temps_count;
LLVMValueRef system_values[RADEON_LLVM_MAX_SYSTEM_VALUES];
diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 4643e6d..4fa43cd 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -439,28 +439,43 @@ LLVMValueRef radeon_llvm_emit_fetch(struct 
lp_build_tgsi_context *bld_base,
bld_base->int_bld.zero);
result = LLVMConstInsertElement(result,

bld->immediates[reg->Register.Index][swizzle + 1],
bld_base->int_bld.one);
return LLVMConstBitCast(result, ctype);
} else {
return 
LLVMConstBitCast(bld->immediates[reg->Register.Index][swizzle], ctype);
}
}
 
-   case TGSI_FILE_INPUT:
-   result = 
ctx->inputs[radeon_llvm_reg_index_soa(reg->Register.Index, swizzle)];
+   case TGSI_FILE_INPUT: {
+   unsigned index = reg->Register.Index;
+   LLVMValueRef input[4];
+
+   /* I don't think doing this for vertex shaders is beneficial.
+* For those, we want to make sure the VMEM loads are executed
+* only once. Fragment shaders don't care much, because
+* v_interp instructions are much cheaper than VMEM loads.
+*/
+   if (ctx->soa.bld_base.info->processor == PIPE_SHADER_FRAGMENT)
+   ctx->load_input(ctx, index, &ctx->input_de

Re: [Mesa-dev] [PATCH 13/14] egl: Track EGL_KHR_debug state when going through EGL API calls (v2)

2016-09-13 Thread Kyle Brenneman

On 09/13/2016 11:57 AM, Adam Jackson wrote:

From: Kyle Brenneman 

This decorates every EGL entrypoint with _EGL_FUNC_START, which records
the function name and primary dispatch object label in the current
thread state. It also adds debug report functions and calls them when
appropriate.

This would be useful enough for debugging on its own, if the user set a
breakpoint when the report function was called. We will also need this
state tracked in order to expose EGL_KHR_debug.

v2:
- Clear the object label in more cases in _eglSetFuncName
- Set dummy thread's CurrentAPI to EGL_NONE not zero
- Pass draw surface (if any) to _EGL_FUNC_START in eglSwapInterval
---
  src/egl/main/eglapi.c | 155 ++
  src/egl/main/eglcurrent.c |  91 ++-
  src/egl/main/eglcurrent.h |  22 +++
  src/egl/main/eglglobals.h |   5 ++
  4 files changed, 259 insertions(+), 14 deletions(-)

diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index 0477ad9..216b289 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -250,6 +250,37 @@ _eglUnlockDisplay(_EGLDisplay *dpy)
 mtx_unlock(&dpy->Mutex);
  }
  
+static EGLBoolean
+_eglSetFuncName(const char *funcName, _EGLDisplay *disp, EGLenum objectType, _EGLResource *object)
+{
+   _EGLThreadInfo *thr = _eglGetCurrentThread();
+   if (!_eglIsCurrentThreadDummy()) {
+  thr->CurrentFuncName = funcName;
+  thr->CurrentObjectLabel = NULL;
+
+  if (objectType == EGL_OBJECT_THREAD_KHR)
+ thr->CurrentObjectLabel = thr->Label;
+  else if (objectType == EGL_OBJECT_DISPLAY_KHR)
+ thr->CurrentObjectLabel = disp ? disp->Label : NULL;
+  else
+ thr->CurrentObjectLabel = object ? object->Label : NULL;
+
+  return EGL_TRUE;
+   }
+
+   _eglDebugReportFull(EGL_BAD_ALLOC, funcName, funcName,
+  EGL_DEBUG_MSG_CRITICAL_KHR, NULL, NULL);
+   return EGL_FALSE;
+}
_eglSetFuncName starts with "thr->CurrentObjectLabel = NULL", so if it 
didn't set the label to something else later, it would have been cleared.
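
To make the point above concrete, here is a minimal sketch of the label selection only. It is an illustration under assumptions, not the Mesa code: the sketch_* types are hypothetical stand-ins, labels are plain strings instead of EGLLabelKHR values, and the dummy-thread error path is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the EGL object types; labels are plain
 * strings here, whereas the real code stores EGLLabelKHR values. */
enum sketch_object_type {
   SKETCH_OBJECT_THREAD,
   SKETCH_OBJECT_DISPLAY,
   SKETCH_OBJECT_OTHER
};

struct sketch_labeled {
   const char *Label;
};

struct sketch_thread {
   const char *Label;
   const char *CurrentFuncName;
   const char *CurrentObjectLabel;
};

static void
sketch_set_func_name(struct sketch_thread *thr, const char *funcName,
                     const struct sketch_labeled *disp,
                     enum sketch_object_type objectType,
                     const struct sketch_labeled *object)
{
   thr->CurrentFuncName = funcName;
   /* Cleared first, so a stale label never survives into this call. */
   thr->CurrentObjectLabel = NULL;

   if (objectType == SKETCH_OBJECT_THREAD)
      thr->CurrentObjectLabel = thr->Label;
   else if (objectType == SKETCH_OBJECT_DISPLAY)
      thr->CurrentObjectLabel = disp ? disp->Label : NULL;
   else
      thr->CurrentObjectLabel = object ? object->Label : NULL;
}
```

In this model, a call that passes no object always ends with a NULL current label, whatever the previous call stored.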

+
+#define _EGL_FUNC_START(disp, objectType, object, ret) \
+   do { \
+  if (!_eglSetFuncName(__func__, disp, objectType, (_EGLResource *) object)) { \
+ if (disp) \
+_eglUnlockDisplay(disp);   \
+ return ret; \
+  } \
+   } while(0)
  
  static EGLint *

  _eglConvertAttribsToInt(const EGLAttrib *attr_list)
@@ -287,6 +318,8 @@ eglGetDisplay(EGLNativeDisplayType nativeDisplay)
 _EGLDisplay *dpy;
 void *native_display_ptr;
  
+   _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY);

+
 STATIC_ASSERT(sizeof(void*) == sizeof(nativeDisplay));
 native_display_ptr = (void*) nativeDisplay;
  
@@ -330,6 +363,7 @@ static EGLDisplay EGLAPIENTRY

  eglGetPlatformDisplayEXT(EGLenum platform, void *native_display,
   const EGLint *attrib_list)
  {
+   _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY);
 return _eglGetPlatformDisplayCommon(platform, native_display, attrib_list);
  }
  
@@ -340,6 +374,8 @@ eglGetPlatformDisplay(EGLenum platform, void *native_display,

 EGLDisplay display;
 EGLint *int_attribs;
  
+   _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY);

+
 int_attribs = _eglConvertAttribsToInt(attrib_list);
 if (attrib_list && !int_attribs)
RETURN_EGL_ERROR(NULL, EGL_BAD_ALLOC, NULL);
@@ -483,6 +519,8 @@ eglInitialize(EGLDisplay dpy, EGLint *major, EGLint *minor)
  {
 _EGLDisplay *disp = _eglLockDisplay(dpy);
  
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);

+
 if (!disp)
RETURN_EGL_ERROR(NULL, EGL_BAD_DISPLAY, EGL_FALSE);
  
@@ -533,6 +571,8 @@ eglTerminate(EGLDisplay dpy)

  {
 _EGLDisplay *disp = _eglLockDisplay(dpy);
  
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);

+
 if (!disp)
RETURN_EGL_ERROR(NULL, EGL_BAD_DISPLAY, EGL_FALSE);
  
@@ -560,6 +600,7 @@ eglQueryString(EGLDisplay dpy, EGLint name)

 }
  
 disp = _eglLockDisplay(dpy);

+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, NULL);
 _EGL_CHECK_DISPLAY(disp, NULL, drv);
  
 switch (name) {

@@ -585,6 +626,8 @@ eglGetConfigs(EGLDisplay dpy, EGLConfig *configs,
 _EGLDriver *drv;
 EGLBoolean ret;
  
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);

+
 _EGL_CHECK_DISPLAY(disp, EGL_FALSE, drv);
 ret = drv->API.GetConfigs(drv, disp, configs, config_size, num_config);
  
@@ -600,6 +643,8 @@ eglChooseConfig(EGLDisplay dpy, const EGLint *attrib_list, EGLConfig *configs,

 _EGLDriver *drv;
 EGLBoolean ret;
  
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);

+
 _EGL_CHECK_DISPLAY(disp, EGL_FALSE, drv);
 ret = drv->API.ChooseConfig(drv, disp, attrib_list, configs,
  config_size, num_config);
@@ -617,6 +662

Re: [Mesa-dev] [PATCH] Remove GL_GLEXT_PROTOTYPES guards from non-ext headers.

2016-09-13 Thread Ilia Mirkin
On Tue, Sep 13, 2016 at 4:05 PM, Eric Anholt  wrote:
> Ilia Mirkin  writes:
>
>> On Mon, Sep 12, 2016 at 11:55 AM, Emil Velikov  
>> wrote:
>>> On 12 September 2016 at 15:35, Ilia Mirkin  wrote:
>>>> On Mon, Sep 12, 2016 at 10:10 AM, Emil Velikov
>>>> wrote:
>>>>> Keeping diff/patches in git always felt like a hack, imho. Plus
>>>>> most/all(?) distros rely on the Mesa headers, so I'm not sure how that
>>>>> is going to work.
>>>>
>>>> The alternatives are considerably more painful for just a handful of
>>>> files with a small number of diffs. This would be as a tool for
>>>> developers like us who update the mesa versions by importing new KHR
>>>> versions, which will not have our local changes applied. The patch
>>>> would not be used as part of the build process or anything else.
>>>>
>>> The goal being to have the patches alongside the patched headers.
>>> This way one can use them as reference ? Sure sounds great imho.
>>
>> Exactly. So that when I download new KHR headers, I just apply the
>> patch to them (and hope it applies), and if not, look at what was
>> being done and try to repeat the process. Then I regenerate the patch
>> against the (new) originals and check the whole thing in.
>
> Or you could just use git like normal: You have a public branch of the
> unchanged headers.  You make your own changes to the headers on master.
> When you want to update to new upstream headers, you check out the
> unchanged-headers branch, commit new unchanged upstreams there, check
> out master, and git merge.

Right. Seems hardly worth the hassle for a small handful of diffs on
files we update once every 2 years.

  -ilia


Re: [Mesa-dev] [PATCH] Remove GL_GLEXT_PROTOTYPES guards from non-ext headers.

2016-09-13 Thread Eric Anholt
Ilia Mirkin  writes:

> On Mon, Sep 12, 2016 at 11:55 AM, Emil Velikov  
> wrote:
>> On 12 September 2016 at 15:35, Ilia Mirkin  wrote:
>>> On Mon, Sep 12, 2016 at 10:10 AM, Emil Velikov  
>>> wrote:
>>>> Keeping diff/patches in git always felt like a hack, imho. Plus
>>>> most/all(?) distros rely on the Mesa headers, so I'm not sure how that
>>>> is going to work.
>>>
>>> The alternatives are considerably more painful for just a handful of
>>> files with a small number of diffs. This would be as a tool for
>>> developers like us who update the mesa versions by importing new KHR
>>> versions, which will not have our local changes applied. The patch
>>> would not be used as part of the build process or anything else.
>>>
>> The goal being to have the patches alongside the patched headers.
>> This way one can use them as reference ? Sure sounds great imho.
>
> Exactly. So that when I download new KHR headers, I just apply the
> patch to them (and hope it applies), and if not, look at what was
> being done and try to repeat the process. Then I regenerate the patch
> against the (new) originals and check the whole thing in.

Or you could just use git like normal: You have a public branch of the
unchanged headers.  You make your own changes to the headers on master.
When you want to update to new upstream headers, you check out the
unchanged-headers branch, commit new unchanged upstreams there, check
out master, and git merge.




[Mesa-dev] [Bug 97261] vaapi u/v wrong order since vl/util: add copy func for yv12image to nv12surface

2016-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=97261

Andy Furniss  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Andy Furniss  ---
Fixed in mesa git -

https://cgit.freedesktop.org/mesa/mesa/commit/?id=304f70536a73f4b63360632428241c7488c99610

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [PATCH v3 3/9] nv50/ir: teach load propagation about src2

2016-09-13 Thread Samuel Pitoiset
With OP_ADD3, we might want to swap sources 2 and 1.

Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 29 ++
 1 file changed, 29 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index a9172f8..f212eba 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -153,6 +153,7 @@ private:
virtual bool visit(BasicBlock *);
 
void checkSwapSrc01(Instruction *);
+   void checkSwapSrc21(Instruction *);
 
bool isCSpaceLoad(Instruction *);
bool isImmdLoad(Instruction *);
@@ -239,6 +240,32 @@ LoadPropagation::checkSwapSrc01(Instruction *insn)
}
 }
 
+void
+LoadPropagation::checkSwapSrc21(Instruction *insn)
+{
+   const Target *targ = prog->getTarget();
+   if (insn->op != OP_ADD3)
+  return;
+   if (insn->src(2).getFile() != FILE_GPR)
+  return;
+
+   Instruction *i1 = insn->getSrc(1)->getInsn();
+   Instruction *i2 = insn->getSrc(2)->getInsn();
+
+   // Swap sources to inline the less frequently used source. That way,
+   // optimistically, it will eventually be able to remove the instruction.
+   int i1refs = insn->getSrc(1)->refCount();
+   int i2refs = insn->getSrc(2)->refCount();
+
+   if ((isCSpaceLoad(i2) || isImmdLoad(i2)) && targ->insnCanLoad(insn, 1, i2)) {
+  if ((!isImmdLoad(i1) && !isCSpaceLoad(i1)) ||
+  !targ->insnCanLoad(insn, 1, i1) ||
+  i2refs < i1refs) {
+ insn->swapSources(2, 1);
+  }
+   }
+}
+
 bool
 LoadPropagation::visit(BasicBlock *bb)
 {
@@ -256,6 +283,8 @@ LoadPropagation::visit(BasicBlock *bb)
 
   if (i->srcExists(1))
  checkSwapSrc01(i);
+  if (i->srcExists(2))
+ checkSwapSrc21(i);
 
   for (int s = 0; i->srcExists(s); ++s) {
  Instruction *ld = i->getSrc(s)->getInsn();
-- 
2.9.3
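
For readers following the series, the swap condition in checkSwapSrc21() can be modelled in isolation. This is a minimal sketch under assumptions: struct sketch_load_src and its fields are hypothetical and only summarise what isCSpaceLoad()/isImmdLoad(), insnCanLoad(insn, 1, ...) and refCount() report in the patch. The OP_ADD3 and FILE_GPR early-outs are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one source operand of an ADD3. */
struct sketch_load_src {
   bool is_const_or_imm_load; /* produced by a constant-space or immediate load */
   bool fits_slot1;           /* slot 1 of the instruction could inline it */
   int refs;                  /* number of uses of the loaded value */
};

/* Returns true when sources 2 and 1 should trade places. */
static bool
sketch_want_swap21(struct sketch_load_src src1, struct sketch_load_src src2)
{
   /* Swapping only helps if source 2 could be inlined from slot 1. */
   if (!src2.is_const_or_imm_load || !src2.fits_slot1)
      return false;

   /* Swap if source 1 cannot be inlined anyway, or if source 2 has
    * fewer users, so its load is more likely to become dead. */
   return !src1.is_const_or_imm_load || !src1.fits_slot1 ||
          src2.refs < src1.refs;
}
```

With equal reference counts and two inlinable sources the sketch keeps the existing order, which matches the strict less-than comparison in the patch.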



[Mesa-dev] [PATCH v3 5/9] nv50/ir: optimize ADD3(d, a, b, c) to MOV(d, a + b + c)

2016-09-13 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index fe815e3..ecde364 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -797,6 +797,17 @@ ConstantFolding::expr(Instruction *i,
   }
   break;
}
+   case OP_ADD3: {
+  switch (i->dType) {
+  case TYPE_S32:
+  case TYPE_U32:
+ res.data.u32 = a->data.u32 + b->data.u32 + c->data.u32;
+ break;
+  default:
+ return;
+  }
+  break;
+   }
default:
   return;
}
-- 
2.9.3
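
A minimal sketch of why TYPE_S32 and TYPE_U32 can share one folding path, assuming two's-complement integers: the sum is computed on the unsigned bit patterns and wraps modulo 2^32, which yields the same bits as the signed sum. The function name is hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Fold three 32-bit integer immediates the way the ADD3 case does:
 * add the raw u32 bit patterns and let the result wrap modulo 2^32. */
static uint32_t
sketch_fold_add3(uint32_t a, uint32_t b, uint32_t c)
{
   return a + b + c;
}
```

For example, folding the bit pattern of -4 with 1 and 1 gives the bit pattern of -2, and 0xffffffff + 1 + 5 wraps to 5.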



[Mesa-dev] [PATCH v3 4/9] nv50/ir: optimize ADD(ADD(a, b), c) to ADD3(a, b, c)

2016-09-13 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 55 ++
 1 file changed, 55 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index f212eba..fe815e3 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1569,6 +1569,7 @@ private:
void handleABS(Instruction *);
bool handleADD(Instruction *);
bool tryADDToMADOrSAD(Instruction *, operation toOp);
+   bool tryADDToADD3(Instruction *);
void handleMINMAX(Instruction *);
void handleRCP(Instruction *);
void handleSLCT(Instruction *);
@@ -1642,6 +1643,8 @@ AlgebraicOpt::handleADD(Instruction *add)
   changed = tryADDToMADOrSAD(add, OP_MAD);
if (!changed && prog->getTarget()->isOpSupported(OP_SAD, add->dType))
   changed = tryADDToMADOrSAD(add, OP_SAD);
+   if (!changed && prog->getTarget()->isOpSupported(OP_ADD3, add->dType))
+  changed = tryADDToADD3(add);
return changed;
 }
 
@@ -1712,6 +1715,58 @@ AlgebraicOpt::tryADDToMADOrSAD(Instruction *add, 
operation toOp)
return true;
 }
 
+// ADD(ADD(a,b), c) -> ADD3(a,b,c)
+bool
+AlgebraicOpt::tryADDToADD3(Instruction *add)
+{
+   Value *src0 = add->getSrc(0);
+   Value *src1 = add->getSrc(1);
+   const Modifier modBad = Modifier(~NV50_IR_MOD_NEG);
+   Modifier mod[4];
+   Value *src;
+   int s;
+
+   if (src0->refCount() == 1 &&
+   src0->getUniqueInsn() && src0->getUniqueInsn()->op == OP_ADD)
+  s = 0;
+   else
+   if (src1->refCount() == 1 &&
+   src1->getUniqueInsn() && src1->getUniqueInsn()->op == OP_ADD)
+  s = 1;
+   else
+  return false;
+
+   src = add->getSrc(s);
+
+   if (src->getUniqueInsn() && src->getUniqueInsn()->bb != add->bb)
+  return false;
+
+   if (src->getInsn()->saturate)
+  return false;
+
+   if (typeSizeof(add->dType) != typeSizeof(src->getInsn()->dType))
+  return false;
+
+   mod[0] = add->src(0).mod;
+   mod[1] = add->src(1).mod;
+   mod[2] = src->getUniqueInsn()->src(0).mod;
+   mod[3] = src->getUniqueInsn()->src(1).mod;
+
+   if (((mod[0] | mod[1]) | (mod[2] | mod[3])) & modBad)
+  return false;
+
+   add->op = OP_ADD3;
+   add->dType = src->getInsn()->dType;
+   add->sType = src->getInsn()->sType;
+
+   add->setSrc(s, src->getInsn()->getSrc(0));
+   add->src(s).mod = mod[s] ^ mod[2];
+   add->setSrc(2, src->getInsn()->getSrc(1));
+   add->src(2).mod = mod[3];
+
+   return true;
+}
+
 void
 AlgebraicOpt::handleMINMAX(Instruction *minmax)
 {
-- 
2.9.3
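
The algebra behind the ADD(ADD(a, b), c) -> ADD3(a, b, c) rewrite can be checked with a small model. This is a sketch under assumptions, not a transcription of tryADDToADD3(): struct sketch_src is hypothetical, negation is the only modifier considered (as with modBad in the patch), and a negation applied to the inner ADD result is distributed over both of its operands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operand: a 32-bit value plus an optional negate modifier. */
struct sketch_src {
   uint32_t value;
   bool neg;
};

/* Apply the negate modifier using wrapping unsigned arithmetic. */
static uint32_t
sketch_apply(struct sketch_src s)
{
   return s.neg ? 0u - s.value : s.value;
}

static uint32_t
sketch_eval_add(struct sketch_src x, struct sketch_src y)
{
   return sketch_apply(x) + sketch_apply(y);
}

static uint32_t
sketch_eval_add3(const struct sketch_src s[3])
{
   return sketch_apply(s[0]) + sketch_apply(s[1]) + sketch_apply(s[2]);
}

/* Fuse ADD(mod(ADD(a, b)), c) into the three sources of an ADD3.
 * inner_neg is the negate modifier the outer ADD applies to the
 * inner result: -(a + b) == (-a) + (-b), so it flips both a and b. */
static void
sketch_fuse_add3(struct sketch_src a, struct sketch_src b, bool inner_neg,
                 struct sketch_src c, struct sketch_src out[3])
{
   out[0].value = a.value;
   out[0].neg = (a.neg != inner_neg);
   out[1].value = b.value;
   out[1].neg = (b.neg != inner_neg);
   out[2] = c;
}
```

With a = 5, b = -3 and c = 10, and the inner sum negated by the outer ADD, both the two-step form and the fused form evaluate to 8.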



[Mesa-dev] [PATCH v3 6/9] nv50/ir: optimize ADD3(d, a, b, 0x0) to ADD(d, a, b)

2016-09-13 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index ecde364..246cdff 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -937,6 +937,14 @@ ConstantFolding::opnd3(Instruction *i, ImmediateValue 
&imm2)
  return;
   }
   break;
+   case OP_ADD3:
+  if (imm2.isInteger(0)) {
+ i->op = OP_ADD;
+ i->setSrc(2, NULL);
+ foldCount++;
+ return;
+  }
+  break;
default:
   return;
}
-- 
2.9.3



[Mesa-dev] [PATCH v3 2/9] gm107/ir: add emission for OP_ADD3

2016-09-13 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index cfde66c..fd3dd3f 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -150,6 +150,7 @@ private:
void emitLOP();
void emitNOT();
void emitIADD();
+   void emitIADD3();
void emitIMUL();
void emitIMAD();
void emitIMNMX();
@@ -1728,6 +1729,36 @@ CodeEmitterGM107::emitIADD()
 }
 
 void
+CodeEmitterGM107::emitIADD3()
+{
+   switch (insn->src(1).getFile()) {
+   case FILE_GPR:
+  emitInsn(0x5cc0);
+  emitGPR (0x14, insn->src(1));
+  break;
+   case FILE_MEMORY_CONST:
+  emitInsn(0x4cc0);
+  emitCBUF(0x22, -1, 0x14, 16, 2, insn->src(1));
+  break;
+   case FILE_IMMEDIATE:
+  emitInsn(0x38c0);
+  emitIMMD(0x14, 19, insn->src(1));
+  break;
+   default:
+  assert(!"bad src1 file");
+  break;
+   }
+   emitNEG(0x33, insn->src(0));
+   emitNEG(0x32, insn->src(1));
+   emitNEG(0x31, insn->src(2));
+   emitX  (0x30);
+   emitCC (0x2f);
+   emitGPR(0x27, insn->src(2));
+   emitGPR(0x08, insn->src(0));
+   emitGPR(0x00, insn->def(0));
+}
+
+void
 CodeEmitterGM107::emitIMUL()
 {
if (insn->src(1).getFile() != FILE_IMMEDIATE) {
@@ -3077,6 +3108,9 @@ CodeEmitterGM107::emitInstruction(Instruction *i)
  emitIADD();
   }
   break;
+   case OP_ADD3:
+  emitIADD3();
+  break;
case OP_MUL:
   if (isFloatType(insn->dType)) {
  if (insn->dType == TYPE_F64)
-- 
2.9.3



[Mesa-dev] [PATCH v3 7/9] nv50/ir: optimize ADD3(d, 0x0, b, c) to ADD(d, b, c)

2016-09-13 Thread Samuel Pitoiset
And ADD3(d, a, 0x0, c) to ADD(d, a, c) as well.

v2: - use moveSources()
- allow ADD3 -> ADD when srcFlags is set

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 246cdff..284f187 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1070,7 +1070,12 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue 
&imm0, int s)
 i->src(0).mod = Modifier(0);
   }
   break;
-
+   case OP_ADD3:
+  if (imm0.isInteger(0)) {
+ i->op = OP_ADD;
+ i->moveSources(s + 1, -1);
+  }
+  break;
case OP_DIV:
   if (s != 1 || (i->dType != TYPE_S32 && i->dType != TYPE_U32))
  break;
-- 
2.9.3
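
A minimal sketch of the ADD3-with-zero simplification from this patch and the previous one, assuming a simplified instruction model. All sketch_* names are hypothetical, and the loop stands in for what the patch does with moveSources(); it is not that function's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_op { SKETCH_OP_ADD, SKETCH_OP_ADD3 };

/* Hypothetical operand: either an immediate value or a register number. */
struct sketch_opnd {
   bool is_imm;
   uint32_t value;
};

struct sketch_insn {
   enum sketch_op op;
   int num_srcs;
   struct sketch_opnd src[3];
};

/* If source s of an ADD3 is the immediate 0, drop it and turn the
 * instruction into a two-source ADD. Returns true if it changed. */
static bool
sketch_drop_zero_src(struct sketch_insn *insn, int s)
{
   if (insn->op != SKETCH_OP_ADD3 || s < 0 || s > 2)
      return false;
   if (!insn->src[s].is_imm || insn->src[s].value != 0)
      return false;

   /* Shift the sources after slot s down by one; slot 2 is then
    * unused, which num_srcs records. */
   for (int k = s; k < 2; k++)
      insn->src[k] = insn->src[k + 1];

   insn->num_srcs = 2;
   insn->op = SKETCH_OP_ADD;
   return true;
}
```

The same helper covers a zero in any of the three slots, so ADD3(d, 0, b, c), ADD3(d, a, 0, c) and ADD3(d, a, b, 0) all reduce to a plain ADD of the two remaining sources.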



[Mesa-dev] [PATCH v3 1/9] nv50/ir: add preliminary support for OP_ADD3

2016-09-13 Thread Samuel Pitoiset
This instruction was introduced with SM50 (Maxwell) and performs an add
with three sources. Unfortunately, it only supports integers.

v3: - set commutative flag for OP_ADD3
- move OP_ADD3 after arithmetic ops

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h|  1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp|  1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp   |  6 +++---
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp |  4 
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp  |  1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp  | 12 
 6 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index d6011d9..12a8b10 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -57,6 +57,7 @@ enum operation
OP_MAD,
OP_FMA,
OP_SAD, // abs(src0 - src1) + src2
+   OP_ADD3,
OP_ABS,
OP_NEG,
OP_NOT,
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
index 22f2f5d..83340f2 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
@@ -86,6 +86,7 @@ const char *operationStr[OP_LAST + 1] =
"mad",
"fma",
"sad",
+   "add3",
"abs",
"neg",
"not",
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
index 7d7b315..dcf35ba 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp
@@ -30,7 +30,7 @@ const uint8_t Target::operationSrcNr[] =
0, 0,   // NOP, PHI
0, 0, 0, 0, // UNION, SPLIT, MERGE, CONSTRAINT
1, 1, 2,// MOV, LOAD, STORE
-   2, 2, 2, 2, 2, 3, 3, 3, // ADD, SUB, MUL, DIV, MOD, MAD, FMA, SAD
+   2, 2, 2, 2, 2, 3, 3, 3, 3, // ADD, SUB, MUL, DIV, MOD, MAD, FMA, SAD, ADD3
1, 1, 1,// ABS, NEG, NOT
2, 2, 2, 2, 2,  // AND, OR, XOR, SHL, SHR
2, 2, 1,// MAX, MIN, SAT
@@ -70,10 +70,10 @@ const OpClass Target::operationClass[] =
OPCLASS_MOVE,
OPCLASS_LOAD,
OPCLASS_STORE,
-   // ADD, SUB, MUL; DIV, MOD; MAD, FMA, SAD
+   // ADD, SUB, MUL; DIV, MOD; MAD, FMA, SAD, ADD3
OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH,
OPCLASS_ARITH, OPCLASS_ARITH,
-   OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH,
+   OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH, OPCLASS_ARITH,
// ABS, NEG; NOT, AND, OR, XOR; SHL, SHR
OPCLASS_CONVERT, OPCLASS_CONVERT,
OPCLASS_LOGIC, OPCLASS_LOGIC, OPCLASS_LOGIC, OPCLASS_LOGIC,
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
index 6b8f767..eecd61f 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp
@@ -61,6 +61,10 @@ TargetGM107::isOpSupported(operation op, DataType ty) const
case OP_DIV:
case OP_MOD:
   return false;
+   case OP_ADD3:
+  if (isFloatType(ty))
+ return false;
+  break;
default:
   break;
}
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
index b37ea73..e1a7963 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
@@ -437,6 +437,7 @@ TargetNV50::isOpSupported(operation op, DataType ty) const
case OP_EXTBF:
case OP_EXIT: // want exit modifier instead (on NOP if required)
case OP_MEMBAR:
+   case OP_ADD3:
   return false;
case OP_SAD:
   return ty == TYPE_S32;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
index f5981de..a927c1e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
@@ -147,7 +147,9 @@ static const struct opProperties _initProps[] =
{ OP_SUSTP,   0x0, 0x0, 0x0, 0x0, 0x2, 0x0 },
{ OP_SUCLAMP, 0x0, 0x0, 0x0, 0x0, 0x2, 0x2 },
{ OP_SUBFM,   0x0, 0x0, 0x0, 0x0, 0x6, 0x2 },
-   { OP_SUEAU,   0x0, 0x0, 0x0, 0x0, 0x6, 0x2 }
+   { OP_SUEAU,   0x0, 0x0, 0x0, 0x0, 0x6, 0x2 },
+   // gm107 ops:
+   { OP_ADD3,0x7, 0x0, 0x0, 0x0, 0x2, 0x2 },
 };
 
 void TargetNVC0::initOpInfo()
@@ -156,14 +158,14 @@ void TargetNVC0::initOpInfo()
 
static const uint32_t commutative[(OP_LAST + 31) / 32] =
{
-  // ADD, MAD, MUL, AND, OR, XOR, MAX, MIN
-  0x0670ca00, 0x003f, 0x, 0x
+  // ADD, MAD, MUL, ADD3, AND, OR, XOR, MAX, MIN
+

[Mesa-dev] [PATCH v3 8/9] nv50/ir: optimize ADD3(d, a, b, c) to ADD(d, c, a + b)

2016-09-13 Thread Samuel Pitoiset
This is similar to what we already do for MAD/FMA.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 284f187..6ba2af6 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -605,6 +605,14 @@ ConstantFolding::expr(Instruction *i,
  return;
   }
   break;
+   case OP_ADD3:
+  switch (i->dType) {
+  case TYPE_S32:
+  case TYPE_U32: res.data.u32 = a->data.u32 + b->data.u32; break;
+  default:
+ return;
+  }
+  break;
case OP_POW:
   switch (i->dType) {
   case TYPE_F32: res.data.f32 = pow(a->data.f32, b->data.f32); break;
@@ -721,7 +729,8 @@ ConstantFolding::expr(Instruction *i,
 
switch (i->op) {
case OP_MAD:
-   case OP_FMA: {
+   case OP_FMA:
+   case OP_ADD3: {
   ImmediateValue src0, src1 = *i->getSrc(0)->asImm();
 
   // Move the immediate into position 1, where we know it might be
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v3 9/9] nv50/ir: optimize ADD3(d, a, b, c) to ADD(d, a, b + c)

2016-09-13 Thread Samuel Pitoiset
And ADD3(d, a, b, c) to ADD(d, b, a + c) as well.

Very modest effect because OP_ADD3 only supports integers, but can
reduce the number of instructions in some shaders.

total instructions in shared programs :2594754 -> 2594686 (-0.00%)
total gprs used in shared programs:366893 -> 366919 (0.01%)
total local used in shared programs   :31872 -> 31872 (0.00%)

localgpr   inst  bytes
helped   0   0  39  39
  hurt   0  26   0   0

Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 62 ++
 1 file changed, 62 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 6ba2af6..e5e6e8e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -374,6 +374,7 @@ private:
void expr(Instruction *, ImmediateValue&, ImmediateValue&);
void expr(Instruction *, ImmediateValue&, ImmediateValue&, ImmediateValue&);
void opnd(Instruction *, ImmediateValue&, int s);
+   void opnd2(Instruction *, ImmediateValue&, int, ImmediateValue&, int);
void opnd3(Instruction *, ImmediateValue&);
 
void unary(Instruction *, const ImmediateValue&);
@@ -429,6 +430,13 @@ ConstantFolding::visit(BasicBlock *bb)
  opnd(i, src1, 1);
   if (i->srcExists(2) && i->src(2).getImmediate(src2))
  opnd3(i, src2);
+  if (i->srcExists(2) &&
+  i->src(0).getImmediate(src0) && i->src(2).getImmediate(src2))
+ opnd2(i, src0, 0, src2, 2);
+  else
+  if (i->srcExists(2) &&
+  i->src(1).getImmediate(src1) && i->src(2).getImmediate(src2))
+ opnd2(i, src1, 1, src2, 2);
}
return true;
 }
@@ -960,6 +968,60 @@ ConstantFolding::opnd3(Instruction *i, ImmediateValue 
&imm2)
 }
 
 void
+ConstantFolding::opnd2(Instruction *i, ImmediateValue &imm0, int s0,
+   ImmediateValue &imm1, int s1)
+{
+   struct Storage *const a = &imm0.reg, *const b = &imm1.reg;
+   ImmediateValue src0, src1;
+   struct Storage res;
+   DataType type = i->dType;
+
+   memset(&res.data, 0, sizeof(res.data));
+
+   switch (i->op) {
+   case OP_ADD3:
+  switch (i->dType) {
+  case TYPE_S32:
+  case TYPE_U32: res.data.u32 = a->data.u32 + b->data.u32; break;
+  default:
+ return;
+  }
+  break;
+   default:
+  return;
+   }
+   ++foldCount;
+
+   i->op = OP_ADD;
+
+   if (s0 == 0) {
+  i->setSrc(0, i->getSrc(1));
+  i->src(0).mod = i->src(1).mod;
+   }
+
+   i->setSrc(1, new_ImmediateValue(i->bb->getProgram(), res.data.u32));
+   i->setSrc(2, NULL);
+
+   i->getSrc(1)->reg.data = res.data;
+   i->getSrc(1)->reg.type = type;
+   i->getSrc(1)->reg.size = typeSizeof(type);
+
+   src1 = *i->getSrc(1)->asImm();
+
+   // Move the immediate into position 1, where we know it might be
+   // emittable. However it might not be anyways, as there may be other
+   // restrictions, so move it into a separate LValue.
+   bld.setPosition(i, false);
+   i->setSrc(1, bld.mkMov(bld.getSSA(type), i->getSrc(1), type)->getDef(0));
+   i->src(1).mod = Modifier(0);
+
+   if (i->src(0).getImmediate(src0))
+  expr(i, src0, src1);
+   else
+  opnd(i, src1, 1);
+}
+
+void
 ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
 {
const int t = !s;
-- 
2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
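
The ADD3 folding added by opnd2() in the patch above can be illustrated
outside the compiler. What follows is only a minimal sketch, assuming a
simplified instruction with three 32-bit sources; struct operand,
struct add_insn and fold_add3() are hypothetical names and are not part of
the nv50_ir code. Like the patch, it adds the raw 32-bit values and ignores
source modifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operand: a known 32-bit immediate or a register number. */
struct operand {
   bool is_imm;
   uint32_t value; /* immediate value, or register number when !is_imm */
};

/* Hypothetical three-source add; num_srcs drops to 2 once folded. */
struct add_insn {
   int num_srcs;
   struct operand src[3];
};

/* If src[2] and one of src[0]/src[1] are immediates, replace the pair by
 * their sum (unsigned 32-bit wraparound, which gives the same bits for
 * signed and unsigned adds) and turn the instruction into a two-source
 * add. Returns true when something was folded. */
static bool fold_add3(struct add_insn *insn)
{
   int s;

   if (insn->num_srcs != 3 || !insn->src[2].is_imm)
      return false;

   if (insn->src[0].is_imm)
      s = 0;
   else if (insn->src[1].is_imm)
      s = 1;
   else
      return false;

   uint32_t sum = insn->src[s].value + insn->src[2].value;

   /* Keep the operand that was not folded in slot 0 and put the folded
    * immediate in slot 1. */
   insn->src[0] = insn->src[!s];
   insn->src[1].is_imm = true;
   insn->src[1].value = sum;
   insn->num_srcs = 2;
   return true;
}
```

If the remaining source in slot 0 is also an immediate, a second pass can
fold the whole instruction to a constant, which is what the patch does by
calling expr() at the end of opnd2().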


[Mesa-dev] [PATCH] tgsi: Enable returns from within loops

2016-09-13 Thread Lars Hamre
Fixes the following piglit test (for softpipe):
/spec/glsl-1.10/execution/fs-loop-return

Signed-off-by: Lars Hamre 

---
 src/gallium/auxiliary/tgsi/tgsi_exec.c | 4 
 1 file changed, 4 insertions(+)

NOTE: Someone with access will need to commit this
  after the review process

diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index 1457c06..aff35e6 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -5148,6 +5148,10 @@ exec_instruction(
 /* returning from main() */
 mach->CondStackTop = 0;
 mach->LoopStackTop = 0;
+mach->ContStackTop = 0;
+mach->LoopLabelStackTop = 0;
+mach->SwitchStackTop = 0;
+mach->BreakStackTop = 0;
 *pc = -1;
 return FALSE;
  }
--
2.7.4
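
The idea of the change above is that a return from main() has to unwind
every control-flow stack, not only the condition and loop stacks. A minimal
sketch, assuming a cut-down machine state: struct exec_stacks and
return_from_main() are hypothetical, only the field names are taken from
the patch.

```c
/* Hypothetical subset of the interpreter state; field names follow the patch. */
struct exec_stacks {
   int CondStackTop, LoopStackTop, ContStackTop;
   int LoopLabelStackTop, SwitchStackTop, BreakStackTop;
};

/* Returning from main(): drop everything that enclosing IF/LOOP/SWITCH
 * blocks pushed, then signal the end of execution through pc = -1. */
static void return_from_main(struct exec_stacks *mach, int *pc)
{
   mach->CondStackTop = 0;
   mach->LoopStackTop = 0;
   mach->ContStackTop = 0;
   mach->LoopLabelStackTop = 0;
   mach->SwitchStackTop = 0;
   mach->BreakStackTop = 0;
   *pc = -1;
}
```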



[Mesa-dev] [Bug 94627] Game Risen on wine black grass

2016-09-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=94627

--- Comment #9 from Heiko Ernst  ---
Bug is closed in mesa 12.0.2

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [PATCH 2/2] radeonsi/compute: Use the HSA abi for non-TGSI compute shaders v2

2016-09-13 Thread Tom Stellard
This patch switches non-TGSI compute shaders over to using the HSA
ABI described here:

https://github.com/RadeonOpenCompute/ROCm-Docs/blob/master/AMDGPU-ABI.md

The HSA ABI provides a much cleaner interface for compute shaders and allows
us to share more code in the compiler with the HSA stack.

The main changes in this patch are:
  - We now pass the scratch buffer resource into the shader via user sgprs
rather than using relocations.
  - Grid/Block sizes are now passed to the shader via the dispatch packet
rather than at the beginning of the kernel arguments.

Typically for HSA, the CP firmware will create the dispatch packet and set
up the user sgprs automatically.  However, in Mesa we let the driver do
this work.  The main reason for this is that I haven't researched how to
get the CP to do all these things, and I'm not sure if it is supported
for all GPUs.

v2:
  - Add comments explaining why we are setting certain bits of the scratch
resource descriptor.
---
 src/gallium/drivers/radeon/r600_pipe_common.c|   6 +-
 src/gallium/drivers/radeonsi/amd_kernel_code_t.h | 534 +++
 src/gallium/drivers/radeonsi/si_compute.c| 236 +-
 3 files changed, 758 insertions(+), 18 deletions(-)
 create mode 100644 src/gallium/drivers/radeonsi/amd_kernel_code_t.h

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 6d7cc1b..8f17f36 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -822,7 +822,11 @@ static int r600_get_compute_param(struct pipe_screen *screen,
if (rscreen->family <= CHIP_ARUBA) {
triple = "r600--";
} else {
-   triple = "amdgcn--";
+   if (HAVE_LLVM < 0x0400) {
+   triple = "amdgcn--";
+   } else {
+   triple = "amdgcn--mesa3d";
+   }
}
switch(rscreen->family) {
/* Clang < 3.6 is missing Hainan in its list of
diff --git a/src/gallium/drivers/radeonsi/amd_kernel_code_t.h 
b/src/gallium/drivers/radeonsi/amd_kernel_code_t.h
new file mode 100644
index 0000000..d0d7809
--- /dev/null
+++ b/src/gallium/drivers/radeonsi/amd_kernel_code_t.h
@@ -0,0 +1,534 @@
+/*
+ * Copyright 2015,2016 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * on the rights to use, copy, modify, merge, publish, distribute, sub
+ * license, and/or sell copies of the Software, and to permit persons to whom
+ * the Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef AMDKERNELCODET_H
+#define AMDKERNELCODET_H
+
+//---//
+// AMD Kernel Code, and its dependencies //
+//---//
+
+// Sets val bits for specified mask in specified dst packed instance.
+#define AMD_HSA_BITS_SET(dst, mask, val)                                       \
+  dst &= (~(1 << mask ## _SHIFT) & ~mask);                                     \
+  dst |= (((val) << mask ## _SHIFT) & mask)
+
+// Gets bits for specified mask from specified src packed instance.
+#define AMD_HSA_BITS_GET(src, mask)                                            \
+  ((src & mask) >> mask ## _SHIFT)                                             \
+
+/* Every amd_*_code_t has the following properties, which are composed of
+ * a number of bit fields. Every bit field has a mask (AMD_CODE_PROPERTY_*),
+ * bit width (AMD_CODE_PROPERTY_*_WIDTH), and bit shift amount
+ * (AMD_CODE_PROPERTY_*_SHIFT) for convenient access. Unused bits must be 0.
+ *
+ * (Note that bit fields cannot be used as their layout is
+ * implementation defined in the C standard and so cannot be used to
+ * specify an ABI)
+ */
+enum amd_code_property_mask_t {
+
+  /* Enable the setup of the SGPR user data registers
+   * (AMD_CODE_PROPERTY_ENAB
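
The mask/shift/width convention described in the header comment above can
be sketched with plain functions. This is only an illustration under
assumptions: EXAMPLE_FIELD and its shift and width are hypothetical and do
not correspond to any real AMD_CODE_PROPERTY_* field, and bits_set() and
bits_get() are stand-ins for the AMD_HSA_BITS_SET/GET macros, not
replacements taken from the patch.

```c
#include <stdint.h>

/* Hypothetical field: 2 bits wide, starting at bit 5. */
#define EXAMPLE_FIELD_SHIFT 5
#define EXAMPLE_FIELD_WIDTH 2
#define EXAMPLE_FIELD (((1u << EXAMPLE_FIELD_WIDTH) - 1) << EXAMPLE_FIELD_SHIFT)

/* Clear the field in dst, then store val into it; bits of val that do not
 * fit into the field are dropped by the final mask. */
static inline uint32_t bits_set(uint32_t dst, uint32_t mask, unsigned shift,
                                uint32_t val)
{
   return (dst & ~mask) | ((val << shift) & mask);
}

/* Extract the field and move it down to bit 0. */
static inline uint32_t bits_get(uint32_t src, uint32_t mask, unsigned shift)
{
   return (src & mask) >> shift;
}
```

Unlike a macro that expands to two statements, a function of this shape is
safe to call as the body of an unbraced if, at the cost of passing the
shift explicitly instead of deriving it by token pasting.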

[Mesa-dev] [PATCH 1/2] radeonsi/compute: Add some more debug printfs

2016-09-13 Thread Tom Stellard
---
 src/gallium/drivers/radeonsi/si_compute.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
b/src/gallium/drivers/radeonsi/si_compute.c
index 5041761..a79c224 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -298,6 +298,9 @@ static bool si_switch_compute_shader(struct si_context *sctx,
radeon_emit(cs, config->rsrc1);
radeon_emit(cs, config->rsrc2);
 
+   COMPUTE_DBG(sctx->screen, "COMPUTE_PGM_RSRC1: 0x%08x "
+   "COMPUTE_PGM_RSRC2: 0x%08x\n", config->rsrc1, config->rsrc2);
+
radeon_set_sh_reg(cs, R_00B860_COMPUTE_TMPRING_SIZE,
  S_00B860_WAVES(sctx->scratch_waves)
 | S_00B860_WAVESIZE(config->scratch_bytes_per_wave >> 10));
-- 
2.7.4



Re: [Mesa-dev] [PATCH] st/va: also honors interlaced preference when providing a video format

2016-09-13 Thread Andy Furniss

Andy Furniss wrote:

Leo Liu wrote:



mpv --vo-vaapi all is apparently OK when playing say a 25fps vid,
but I've found that if I push the framerate to refresh rate and
do something that draws the OSD then the image is corrupted, possibly with
many VM faults logged, followed by a segfault in u_copy_yv12_img_to_nv12_surf.
This happens with or without the uv swap patch below. I will file a bug
after more investigation. Bisecting mesa goes back to the commit that
introduced VAAPI_DISABLE_INTERLACE.


We have to be careful: we cannot override the preferred interlaced type
obtained from querying.


OK, but it won't work = corrupted but no crash without the env.


Ahh, ignore that I see what you mean (I think), but I don't know how to 
fix it.


I notice the env was described as a "temporary solution" in the commit 
message.





Re: [Mesa-dev] [PATCH 0/6] RadeonSI: Let's just stop spilling SGPRs

2016-09-13 Thread Marek Olšák
On Tue, Sep 13, 2016 at 8:16 PM, Nicolai Hähnle  wrote:
>
> On 13.09.2016 19:13, Marek Olšák wrote:
>>
>> This is quite easy because we just have to get rid of all of
>> the preloading at the beginning of shaders.
>>
>> I also removed preloading of PS inputs with literal indexing, which
>> has almost the same effect as sinking interp instructions.
>>
>> I'm slightly concerned that LICM won't move interps because they are
>> not considered speculatively-executable (=movable) by LLVM, but
>> the shader-db stats show that it doesn't matter.
>>
>> LLVM is smart enough to do CSE where needed for both descriptor loads
>> and interps. In fact, it's the CSE which is responsible for some of
>> the remaining SGPR spills. (It makes sense if you think about it)
>>
>> The compile time increased by 6% because CSE has a lot more work,
>> but it's certainly worth it.
>>
>>
>> shader-db stats:
>>
>> [PATCH 4/6] radeonsi: get rid of img/buf/sampler descriptor
>> https://people.freedesktop.org/~mareko/no_preload1.html
>> [PATCH 5/6] radeonsi: get rid of constant buffer preloading
>> https://people.freedesktop.org/~mareko/no_preload2.html
>> [PATCH 6/6] radeonsi: reload PS inputs with direct indexing at each
>> https://people.freedesktop.org/~mareko/no_preload3_ps.html
>>
>> Total diff:
>> https://people.freedesktop.org/~mareko/no_preload_total.html
>
>
> Those numbers are impressive.
>
> We do have to be slightly careful, I noticed that LLVM didn't lift some 
> constant loads out of loops with the earlier preload removal, in shaders 
> where SGPR pressure wasn't an issue at all.
>
> I think the right way to deal with this is to improve heuristics in LLVM, so 
> I'm fine with changing Mesa in this way.


Yeah. The problem is the LICM (moving stuff out of loops) and Sink
(moving stuff forward) passes are no-ops with intrinsics, because
intrinsics fail the "isSafeToSpeculativelyExecute" function. The
trivial fix would be to add a new "movable" flag for intrinsics and
process it in that function.

Marek


Re: [Mesa-dev] Problem with RX 480 on Alien: Isolation and Dota 2

2016-09-13 Thread Romain Failliot
Thanks a lot! I'll try that tonight!

I have a 64-bit distro. I don't think so, but do I need to compile the 32-bit
version of LLVM as well (because Steam uses 32-bit libraries)?

2016-09-13 13:53 GMT-04:00 Marek Olšák :
> LLVM 64-bit:
>
> mkdir -p build
> cd build
> cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/x86_64-linux-gnu \
>   -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=ON \
>   -DCMAKE_BUILD_TYPE=RelWithDebInfo \
>   -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \
>   -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -fno-omit-frame-pointer" \
>   -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -fno-omit-frame-pointer"
> ninja
> sudo ninja install
>
>
> LLVM 32-bit:
>
> mkdir -p build32
> cd build32
> cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/i386-linux-gnu \
>   -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=ON \
>   -DCMAKE_BUILD_TYPE=RelWithDebInfo \
>   -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \
>   -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -fno-omit-frame-pointer" \
>   -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -fno-omit-frame-pointer" \
>   -DLLVM_BUILD_32_BITS=ON
> ninja
> sudo ninja install
> # then add /usr/llvm/x86_64-linux-gnu and /usr/llvm/i386-linux-gnu to
> ld.conf
>
>
> Mesa configure helper script, it will overwrite the /usr/lib/ files on
> Ubuntu (run as-is for 64-bit, or use "-32" for 32-bit):
>
> if test x$1 = x-32; then
> dir=i386-linux-gnu
> build=i686-linux-gnu
> export CFLAGS="-m32 -O2 -g"
> export CXXFLAGS="$CFLAGS"
> export LDFLAGS="-L/usr/lib/$dir"
> export PKG_CONFIG_PATH="/usr/lib/$dir/pkgconfig"
> else
> dir=x86_64-linux-gnu
> build=$dir
> fi
>
> ./autogen.sh \
>  --build=$build --prefix=/usr --libdir=/usr/lib/$dir \
>  --with-llvm-prefix=/usr/llvm/$dir \
>  --enable-glx-tls --enable-texture-float --enable-debug --enable-vdpau \
>  --disable-xvmc --disable-va --enable-nine --with-sha1=libnettle \
>  --with-gallium-drivers=radeonsi,r600,swrast --with-dri-drivers= \
>  --with-egl-platforms=x11,drm --enable-gles1 --enable-gles2
>
> make -j4
> sudo make install
>
> You'll probably want to delete /usr/lib/$dir/*mesa*/*. That's Ubuntu's
> invention that will prevent you from using installed libGL and libEGL.
>
> It's all kind of a mess, but I don't know of a better way.
>
> Marek
>
>
>
> On Tue, Sep 13, 2016 at 7:33 PM, Romain Failliot
>  wrote:
>> 2016-09-13 12:41 GMT-04:00 Marek Olšák :
>>>
>>> BTW, If you update LLVM to a newer version, you also have to re-build
>>> Mesa, because the LLVM version used by Mesa is determined while Mesa
>>> is being built.
>>>
>>> Also, the chance to rage-quit while building LLVM+Mesa is pretty high
>>> if you've never done it before.
>>
>> I see, is there a tutorial somewhere maybe on how to do that?
>> I know how to compile projects, that's not a problem. It's more about the
>> little details to make everything work once it's compiled.
>


Re: [Mesa-dev] [PATCH 09/14] egl: Lock the display in _eglCreateSync's callers

2016-09-13 Thread Adam Jackson
On Tue, 2016-09-13 at 19:18 +0100, Emil Velikov wrote:

> For the series as a whole ?
> Two words which contradict any software's stable scheme - new feature.

Disagree, but I'm not the one running Mesa's stable branch, so my
opinion doesn't count here.

- ajax


[Mesa-dev] [PATCH] virgl: fix flushing while removing sampler views

2016-09-13 Thread marcandre . lureau
From: Marc-André Lureau 

When updating the sampler views, virgl might need to flush. After
flushing, the resources are reattached, however, the sampler
enabled_mask isn't yet updated, and some views could be in the process
of being removed, which leads to the following crash:

 Thread 1 "heaven_x64" received signal SIGSEGV, Segmentation fault.
 0x7f83893cc3de in virgl_attach_res_sampler_views (vctx=vctx@entry=0x1c22c00, shader_type=shader_type@entry=1) at virgl_context.c:113
 113  res = virgl_resource(tinfo->views[i]->base.texture);
 (gdb) bt
 #0  0x7f83893cc3de in virgl_attach_res_sampler_views (vctx=vctx@entry=0x1c22c00, shader_type=shader_type@entry=1) at virgl_context.c:113
 #1  0x7f83893cc703 in virgl_reemit_res (vctx=0x1c22c00) at virgl_context.c:182
 #2  virgl_flush_eq (ctx=ctx@entry=0x1c22c00, closure=0x1c22c00) at virgl_context.c:637
 #3  0x7f83893ccbf8 in virgl_flush_from_st (ctx=0x1c22c00, fence=<optimized out>, flags=<optimized out>) at virgl_context.c:659
 #4  0x7f83893cd6b0 in virgl_encoder_write_cmd_dword (ctx=ctx@entry=0x1c22c00, dword=dword@entry=67075) at virgl_encode.c:43
 #5  0x7f83893cd76b in virgl_encode_delete_object (ctx=0x1c22c00, handle=1306480, object=object@entry=6) at virgl_encode.c:72
 #6  0x7f83893ccc81 in virgl_destroy_sampler_view (ctx=<optimized out>, view=0x7aca1b0) at virgl_context.c:741
 #7  0x7f83893cca17 in pipe_sampler_view_reference (view=0x0, ptr=0x1c22fc8) at ../../../../src/gallium/auxiliary/util/u_inlines.h:151
 #8  virgl_set_sampler_views (ctx=0x1c22c00, shader_type=1, start_slot=<optimized out>, num_views=<optimized out>, views=<optimized out>) at virgl_context.c:724
 #9  0x7f8388fffd68 in cso_set_sampler_views (ctx=0x1ca2ee0, shader_stage=<optimized out>, count=9, views=<optimized out>) at cso_cache/cso_context.c:1301
 #10 0x7f8388e670c1 in update_textures (st=<optimized out>, mesa_shader=<optimized out>, prog=<optimized out>, max_units=16, sampler_views=0x1c8c140, num_textures=0x1c8c644) at state_tracker/st_atom_texture.c:465
 #11 0x7f8388e6296d in st_validate_state (st=st@entry=0x1c8a710, pipeline=pipeline@entry=ST_PIPELINE_RENDER) at state_tracker/st_atom.c:289
 #12 0x7f8388e8343b in st_draw_vbo (ctx=0x1c50600, prims=0x7ffe99b5a580, nr_prims=1, ib=0x7ffe99b5a560, index_bounds_valid=<optimized out>, min_index=<optimized out>, max_index=<optimized out>, tfb_vertcount=0x0, stream=0, indirect=0x0) at state_tracker/st_draw.c:176
 #13 0x7f8388e44d34 in vbo_validated_drawrangeelements (ctx=ctx@entry=0x1c50600, mode=mode@entry=4, index_bounds_valid=index_bounds_valid@entry=0 '\000', start=start@entry=4294967295, end=end@entry=4294967295, count=count@entry=2688, type=5123, indices=0x0, basevertex=0, numInstances=1, baseInstance=0) at vbo/vbo_exec_array.c:849
 #14 0x7f8388e44db5 in vbo_exec_DrawElementsInstanced (mode=4, count=2688, type=5123, indices=0x0, numInstances=1) at vbo/vbo_exec_array.c:1030

Instead, remove the views from enabled_mask immediately.

Signed-off-by: Marc-André Lureau 
---
 src/gallium/drivers/virgl/virgl_context.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/virgl/virgl_context.c 
b/src/gallium/drivers/virgl/virgl_context.c
index 9007583..55144d5 100644
--- a/src/gallium/drivers/virgl/virgl_context.c
+++ b/src/gallium/drivers/virgl/virgl_context.c
@@ -711,6 +711,7 @@ static void virgl_set_sampler_views(struct pipe_context *ctx,
   pipe_sampler_view_reference((struct pipe_sampler_view **)&tinfo->views[i], NULL);
}
 
+   tinfo->enabled_mask &= ~disable_mask;
for (i = 0; i < num_views; i++) {
   struct virgl_sampler_view *grview = virgl_sampler_view(views[i]);
 
@@ -721,12 +722,11 @@ static void virgl_set_sampler_views(struct pipe_context *ctx,
  new_mask |= 1 << i;
  pipe_sampler_view_reference((struct pipe_sampler_view **)&tinfo->views[i], views[i]);
   } else {
+ tinfo->enabled_mask &= ~(1 << i);
  pipe_sampler_view_reference((struct pipe_sampler_view **)&tinfo->views[i], NULL);
- disable_mask |= 1 << i;
   }
}
 
-   tinfo->enabled_mask &= ~disable_mask;
tinfo->enabled_mask |= new_mask;
virgl_encode_set_sampler_views(vctx, shader_type, start_slot, num_views, tinfo->views);
virgl_attach_res_sampler_views(vctx, shader_type);
-- 
2.10.0
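
The ordering problem fixed above can be reproduced in miniature. This is a
rough sketch only, assuming a much simplified model of the driver state:
struct view_table, count_dangling() and remove_view() are hypothetical, and
count_dangling() merely stands in for what a flush-triggered re-attach
would trip over.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VIEWS 16 /* hypothetical table size */

struct view_table {
   uint32_t enabled_mask;        /* bit i set: slot i is expected to be valid */
   const int *views[MAX_VIEWS];  /* NULL when the slot holds no view */
};

/* Stand-in for the re-attach done after a flush: it walks every slot that
 * the mask claims is enabled. Each enabled slot without a view is a place
 * where the real code would dereference a dead pointer. */
static int count_dangling(const struct view_table *t)
{
   int dangling = 0;
   for (unsigned i = 0; i < MAX_VIEWS; i++)
      if ((t->enabled_mask & (1u << i)) && t->views[i] == NULL)
         dangling++;
   return dangling;
}

/* Remove the view in slot i. Releasing a view may flush, and the flush
 * re-walks the table, so the mask has to be cleared before the release.
 * Returns what a flush in the middle of the removal would have seen. */
static int remove_view(struct view_table *t, unsigned i, bool clear_mask_first)
{
   if (clear_mask_first)
      t->enabled_mask &= ~(1u << i);
   t->views[i] = NULL;               /* the view is gone from here on */
   int seen = count_dangling(t);     /* a flush may happen at this point */
   t->enabled_mask &= ~(1u << i);    /* the late update of the old code */
   return seen;
}
```

With clear_mask_first set the walk never sees a stale slot, which is the
effect of moving the enabled_mask update ahead of the reference drops.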



Re: [Mesa-dev] [PATCH 09/14] egl: Lock the display in _eglCreateSync's callers

2016-09-13 Thread Emil Velikov
On 13 September 2016 at 19:14, Adam Jackson  wrote:
> On Tue, 2016-09-13 at 17:22 +0100, Emil Velikov wrote:
>
>> Actually, current code has a bunch of such bugs which this series addresses.
>> Considering there's only a couple of those and they are pretty hard to
>> hit I won't bother with respinning the patches.
>>
>> That is unless we want them for stable ?
>
> I mean, I'm going to want this in "stable" because I want to switch to
> libglvnd sooner than later. I'm perfectly capable of applying the
> series to Fedora's Mesa build on my own though.
>
> I guess my question is why applying the whole series to stable wouldn't
> be acceptable.
>
For the series as a whole ?
Two words which contradict any software's stable scheme - new feature.

Emil


Re: [Mesa-dev] [PATCH] st/va: also honors interlaced preference when providing a video format

2016-09-13 Thread Andy Furniss

Leo Liu wrote:



On 09/13/2016 01:29 PM, Andy Furniss wrote:

Leo Liu wrote:

Hi Andy,

On 09/13/2016 06:22 AM, Andy Furniss wrote:

Zhang, Boyuan wrote:

Hi Leo, Christian and Julien,

I tested the patch with Vaapi Encoding and Transcoding, it seems
working fine. We are using "VAAPI_DISABLE_INTERLACE" env, so
interlaced is always disabled.


Though I notice it will break screen recording scripts for existing
users who previously didn't need the env set but will after this.

Totally untested/thought through, but maybe the env should default
to on?


Agree, can you come up with a patch for that?


OK, but maybe I should test a bit first to see if anything regresses.

Unfortunately, by chance I found an issue with mpv today.

With VAAPI_DISABLE_INTERLACE=1 which it needs for

mpv --vo-vaapi all is apparently OK when playing say a 25fps vid,
but I've found that if I push the framerate to refresh rate and
do something that draws the OSD then the image is corrupted, possibly with
many VM faults logged, followed by a segfault in u_copy_yv12_img_to_nv12_surf.
This happens with or without the uv swap patch below. I will file a bug
after more investigation. Bisecting mesa goes back to the commit that
introduced VAAPI_DISABLE_INTERLACE.


We have to be careful: we cannot override the preferred interlaced type
obtained from querying.


OK, but it won't work = corrupted but no crash without the env.






Also, are there any outstanding VA-API encode patches from you that were
reviewed but not committed?
If so, send them to me and I can push them.


There's only

https://lists.freedesktop.org/archives/mesa-dev/2016-July/124695.html

for the uv swap issue.


Done. pushed.


Thanks.





Not my issue as such, but did anyone notice this from Mark Thompson, who
does vaapi for libav/ffmpeg?

I notice he didn't keep the CCs so maybe it got missed.

https://lists.freedesktop.org/archives/mesa-dev/2016-September/128076.html




Did anyone have this reviewed?


No - I was thinking that as no one replied at all it had been missed,
it's kind of a bug report really.



Re: [Mesa-dev] [PATCH 0/6] RadeonSI: Let's just stop spilling SGPRs

2016-09-13 Thread Nicolai Hähnle

On 13.09.2016 19:13, Marek Olšák wrote:

This is quite easy because we just have to get rid of all of
the preloading at the beginning of shaders.

I also removed preloading of PS inputs with literal indexing, which
has almost the same effect as sinking interp instructions.

I'm slightly concerned that LICM won't move interps because they are
not considered speculatively-executable (=movable) by LLVM, but
the shader-db stats show that it doesn't matter.

LLVM is smart enough to do CSE where needed for both descriptor loads
and interps. In fact, it's the CSE which is responsible for some of
the remaining SGPR spills. (It makes sense if you think about it)

The compile time increased by 6% because CSE has a lot more work,
but it's certainly worth it.


shader-db stats:

[PATCH 4/6] radeonsi: get rid of img/buf/sampler descriptor
https://people.freedesktop.org/~mareko/no_preload1.html
[PATCH 5/6] radeonsi: get rid of constant buffer preloading
https://people.freedesktop.org/~mareko/no_preload2.html
[PATCH 6/6] radeonsi: reload PS inputs with direct indexing at each
https://people.freedesktop.org/~mareko/no_preload3_ps.html

Total diff:
https://people.freedesktop.org/~mareko/no_preload_total.html


Those numbers are impressive.

We do have to be slightly careful, I noticed that LLVM didn't lift some 
constant loads out of loops with the earlier preload removal, in shaders 
where SGPR pressure wasn't an issue at all.


I think the right way to deal with this is to improve heuristics in 
LLVM, so I'm fine with changing Mesa in this way.


Some comments sent in reply to the respective patches. Patches 2, 3, and 
5 are


Reviewed-by: Nicolai Hähnle 


Please review.

Marek




Re: [Mesa-dev] [PATCH 14/14] egl: Implement EGL_KHR_debug

2016-09-13 Thread Emil Velikov
On 13 September 2016 at 18:55, Adam Jackson  wrote:
> On Tue, 2016-09-13 at 17:17 +0100, Emil Velikov wrote:
>
>> > +  } else {
>> > + _eglDebugReportFull(EGL_BAD_ALLOC, __func__, __func__,
>> > +   EGL_DEBUG_MSG_CRITICAL_KHR, NULL, NULL);
>> > + return EGL_BAD_ALLOC;
>> > +  }
>>
>> Nit: Please use the same style as the "objectType ==
>> EGL_OBJECT_DISPLAY_KHR" case.
>
> AFAICT the reason this code doesn't use RETURN_EGL_ERROR like
> everything else is because it doesn't lock the display. Which is
> extremely wrong, since we definitely depend on it not going away from
> under us later! Fixed in v2.
>
Hehe, the locking 'issue' mentioned in 09/14 is already upon us. I've
completely missed the lack of unlock here.

>> Nit: You can also drop the "else" and flatten (indent one level less)
>> all of the following code.
>
> Done in v2.
>
>> Missing EGLAPIENTRY
>
> Fixed in v2.
>
>> > +eglDebugMessageControlKHR(EGLDEBUGPROCKHR callback,
>> > + const EGLAttrib *attrib_list)
>> > +{
>> > +   unsigned int newEnabled;
>> > +
>> > +   _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC);
>> > +
>> > +   mtx_lock(_eglGlobal.Mutex);
>> > +
>> > +   newEnabled = _eglGlobal.debugTypesEnabled;
>> > +   if (attrib_list != NULL) {
>> > +  int i;
>> > +
>> > +  for (i = 0; attrib_list[i] != EGL_NONE; i += 2) {
>>
>> Don't think we check it elsewhere (and/or if we should care too much) but 
>> still:
>> Check if i overflows or use unsigned type ?
>
> There's a bunch of places where we don't check that...
>
>> > + if (attrib_list[i] >= EGL_DEBUG_MSG_CRITICAL_KHR &&
>> > +   attrib_list[i] <= EGL_DEBUG_MSG_INFO_KHR) {
>> > +if (attrib_list[i + 1]) {
>> > +   newEnabled |= DebugBitFromType(attrib_list[i]);
>> > +} else {
>> > +   newEnabled &= ~DebugBitFromType(attrib_list[i]);
>> > +}
>>
>> Nit: break; ?
>
> Nope. You're allowed to set the disposition for multiple error levels
> in a single call to DebugMessageControl, so you need to validate them
> all.
>
Right, had a bit of a brain freeze moment.
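
The point made above, that every name/value pair in the attribute list has
to be looked at rather than stopping at the first match, can be sketched as
follows. All names here are hypothetical: ATTR_NONE and the MSG_* values
are made-up stand-ins for EGL_NONE and the EGL_DEBUG_MSG_*_KHR tokens, and
debug_bit() and update_enabled() are not the functions from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tokens; the real EGL values differ. */
enum { ATTR_NONE = 0, MSG_CRITICAL = 100, MSG_ERROR, MSG_WARN, MSG_INFO };

/* One bit per message level, critical being bit 0. */
static unsigned debug_bit(intptr_t type)
{
   return 1u << (unsigned)(type - MSG_CRITICAL);
}

/* Walk the whole ATTR_NONE-terminated list of (level, enable) pairs.
 * Several levels may be changed in one call, so there is no early exit on
 * a match. The result is committed only if every pair was valid; on an
 * invalid name *enabled is left untouched and -1 is returned. */
static int update_enabled(unsigned *enabled, const intptr_t *attrib_list)
{
   unsigned new_enabled = *enabled;

   if (attrib_list != NULL) {
      /* size_t index: no signed overflow on very long lists. */
      for (size_t i = 0; attrib_list[i] != ATTR_NONE; i += 2) {
         if (attrib_list[i] < MSG_CRITICAL || attrib_list[i] > MSG_INFO)
            return -1;
         if (attrib_list[i + 1])
            new_enabled |= debug_bit(attrib_list[i]);
         else
            new_enabled &= ~debug_bit(attrib_list[i]);
      }
   }

   *enabled = new_enabled;
   return 0;
}
```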

>> > +eglQueryDebugKHR(EGLint attribute, EGLAttrib *value)
>> > +{
>> > +   _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC);
>> > +
>> > +   mtx_lock(_eglGlobal.Mutex);
>> > +   if (attribute >= EGL_DEBUG_MSG_CRITICAL_KHR &&
>> > + attribute <= EGL_DEBUG_MSG_INFO_KHR) {
>> > +  if (_eglGlobal.debugTypesEnabled & DebugBitFromType(attribute)) {
>> > + *value = EGL_TRUE;
>> > +  } else {
>> > + *value = EGL_FALSE;
>> > +  }
>> > +   } else if (attribute == EGL_DEBUG_CALLBACK_KHR) {
>> > +  *value = (EGLAttrib) _eglGlobal.debugCallback;
>> > +   } else {
>> > +  mtx_unlock(_eglGlobal.Mutex);
>> > +  _eglReportError(EGL_BAD_ATTRIBUTE, NULL,
>> > +  "Invalid attribute 0x%04lx", (unsigned long) attribute);
>> > +  return EGL_FALSE;
>> > +   }
>>
>> Nit: Switch statement will be a lot easier to read.
>
> Meh. I factored out the valid-debug-level check to a helper, at which
> point you can't really use a switch. Redone as a do-while so I could
> use break to bail out of the success conditions.
>
Whichever works really. Just pointing out that using if/else chains
esp. when the else isn't needed makes things messy/less
appealing/etc..

Thanks
Emil


Re: [Mesa-dev] [PATCH 09/14] egl: Lock the display in _eglCreateSync's callers

2016-09-13 Thread Adam Jackson
On Tue, 2016-09-13 at 17:22 +0100, Emil Velikov wrote:

> Actually, current code has a bunch of such bugs which this series addresses.
> Considering there's only a couple of those and they are pretty hard to
> hit I won't bother with respinning the patches.
> 
> That is unless we want them for stable ?

I mean, I'm going to want this in "stable" because I want to switch to
libglvnd sooner than later. I'm perfectly capable of applying the
series to Fedora's Mesa build on my own though.

I guess my question is why applying the whole series to stable wouldn't
be acceptable.

- ajax


Re: [Mesa-dev] [PATCH 6/6] radeonsi: reload PS inputs with direct indexing at each use

2016-09-13 Thread Nicolai Hähnle

On 13.09.2016 19:13, Marek Olšák wrote:

From: Marek Olšák 

The LLVM compiler can CSE interp intrinsics thanks to
LLVMReadNoneAttribute.

26011 shaders in 14651 tests
Totals:
SGPRS: 1146340 -> 1132676 (-1.19 %)
VGPRS: 727371 -> 711730 (-2.15 %)
Spilled SGPRs: 2218 -> 2078 (-6.31 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 35841268 -> 36009732 (0.47 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 222559 -> 224779 (1.00 %)
Wait states: 0 -> 0 (0.00 %)
---
 src/gallium/drivers/radeon/radeon_llvm.h   |  6 -
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 28 ++
 src/gallium/drivers/radeonsi/si_shader.c   | 27 +
 3 files changed, 39 insertions(+), 22 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_llvm.h 
b/src/gallium/drivers/radeon/radeon_llvm.h
index da5b7f5..f508d32 100644
--- a/src/gallium/drivers/radeon/radeon_llvm.h
+++ b/src/gallium/drivers/radeon/radeon_llvm.h
@@ -23,21 +23,23 @@
  * Authors: Tom Stellard 
  *
  */

 #ifndef RADEON_LLVM_H
 #define RADEON_LLVM_H

 #include 
 #include "gallivm/lp_bld_init.h"
 #include "gallivm/lp_bld_tgsi.h"
+#include "tgsi/tgsi_parse.h"

+#define RADEON_LLVM_MAX_INPUT_SLOTS 32
 #define RADEON_LLVM_MAX_INPUTS 32 * 4
 #define RADEON_LLVM_MAX_OUTPUTS 32 * 4

 #define RADEON_LLVM_INITIAL_CF_DEPTH 4

 #define RADEON_LLVM_MAX_SYSTEM_VALUES 4

 struct radeon_llvm_branch {
LLVMBasicBlockRef endif_block;
LLVMBasicBlockRef if_block;
@@ -55,33 +57,35 @@ struct radeon_llvm_context {

/*=== Front end configuration ===*/

/* Instructions that are not described by any of the TGSI opcodes. */

/** This function is responsible for initilizing the inputs array and 
will be
  * called once for each input declared in the TGSI shader.
  */
void (*load_input)(struct radeon_llvm_context *,
   unsigned input_index,
-  const struct tgsi_full_declaration *decl);
+  const struct tgsi_full_declaration *decl,
+  LLVMValueRef out[4]);

void (*load_system_value)(struct radeon_llvm_context *,
  unsigned index,
  const struct tgsi_full_declaration *decl);

void (*declare_memory_region)(struct radeon_llvm_context *,
  const struct tgsi_full_declaration *decl);

/** This array contains the input values for the shader.  Typically 
these
  * values will be in the form of a target intrinsic that will inform 
the
  * backend how to load the actual inputs to the shader.
  */
+   struct tgsi_full_declaration input_decls[RADEON_LLVM_MAX_INPUT_SLOTS];
LLVMValueRef inputs[RADEON_LLVM_MAX_INPUTS];
LLVMValueRef outputs[RADEON_LLVM_MAX_OUTPUTS][TGSI_NUM_CHANNELS];

/** This pointer is used to contain the temporary values.
  * The amount of temporary used in tgsi can't be bound to a max value 
and
  * thus we must allocate this array at runtime.
  */
LLVMValueRef *temps;
unsigned temps_count;
LLVMValueRef system_values[RADEON_LLVM_MAX_SYSTEM_VALUES];
diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 4643e6d..11f0cf2 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -439,28 +439,43 @@ LLVMValueRef radeon_llvm_emit_fetch(struct 
lp_build_tgsi_context *bld_base,
bld_base->int_bld.zero);
result = LLVMConstInsertElement(result,

bld->immediates[reg->Register.Index][swizzle + 1],
bld_base->int_bld.one);
return LLVMConstBitCast(result, ctype);
} else {
return 
LLVMConstBitCast(bld->immediates[reg->Register.Index][swizzle], ctype);
}
}

-   case TGSI_FILE_INPUT:
-   result = 
ctx->inputs[radeon_llvm_reg_index_soa(reg->Register.Index, swizzle)];
+   case TGSI_FILE_INPUT: {
+   unsigned index = reg->Register.Index;
+   LLVMValueRef input[4];
+
+   /* I don't think doing this for vertex shaders is beneficial.
+* For those, we want to make sure the VMEM loads are executed
+* only once. Fragment shaders don't care much, because
+* v_interp instructions are much cheaper than VMEM loads.
+*/
+   if (ctx->soa.bld_base.info->processor == PIPE_SHADER_FRAGMENT)
+   ctx->load_input(ctx, index, &ctx->input_decls[index], input);
+   el

Re: [Mesa-dev] Problem with RX 480 on Alien: Isolation and Dota 2

2016-09-13 Thread Kai Wasserbäch
Marek Olšák wrote on 13.09.2016 19:53:
> LLVM 64-bit:
> 
> mkdir -p build
> cd build
> cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/x86_64-linux-gnu \
>   -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=ON \
>   -DCMAKE_BUILD_TYPE=RelWithDebInfo \
>   -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \
>   -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -fno-omit-frame-pointer" \
>   -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -fno-omit-frame-pointer"
> ninja
> sudo ninja install
> 
> 
> LLVM 32-bit:
> 
> mkdir -p build32
> cd build32
> cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/i386-linux-gnu \
>   -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=ON \
>   -DCMAKE_BUILD_TYPE=RelWithDebInfo \
>   -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \
>   -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -fno-omit-frame-pointer" \
>   -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -fno-omit-frame-pointer" \
>   -DLLVM_BUILD_32_BITS=ON
> ninja
> sudo ninja install
> # then add /usr/llvm/x86_64-linux-gnu and /usr/llvm/i386-linux-gnu to ld.conf
> 
> 
> Mesa configure helper script, it will overwrite the /usr/lib/ files on
> Ubuntu (run as-is for 64-bit, or use "-32" for 32-bit):

Just a note about Debian/Ubuntu (even though I don't think that's too
interesting for Romain):
Well, if you build for Debian or a derivative like Ubuntu and you do not need
many versions in parallel, i.e. as a user, then the best option (IMHO) is using
Debian's package and building it with the current upstream (plus some odd
changes here and there for new symbols and such). LLVM you mostly don't need to
build (in that case), because you can just use the packages from apt.llvm.org,
which Sylvestre, the Debian LLVM maintainer, builds for various Debian and
Ubuntu flavours. (And that saves a lot of the rage-quit potential IMHO, since
building LLVM can take a while and fail at very inopportune moments.) That way
you can switch cleanly back to your distro's packages if something breaks (or
they catch up far enough that you want to stop building Mesa yourself).
Anyway, this is how I do my builds: LLVM only if I have to, in case apt.llvm.org
is currently outdated (happens occasionally) or I'm testing a patch for upstream
inclusion. And Mesa I build regularly by just using git-buildpackage with the
Debian package as a base and a local branch for current Mesa git plus a local
"Debian branch" including different configuration (e.g. I was already building
the VA-API stuff for myself before Debian started doing it) or patches (again:
those I'm testing for upstream inclusion), if any. Mesa builds are quite fast,
only a couple of minutes in a clean chroot environment (pbuilder!), so it's not
nearly as annoying as building LLVM.
If there's interest, I could probably whip up a guide, which could be shipped
in Mesa's doc directory?


By the way, since nobody mentioned this so far: if you want OpenCL support
you're going to need to rebuild libclc as well. Your distro's libclc was built
against the LLVM it ships.

Cheers,
Kai



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/6] radeonsi: get rid of img/buf/sampler descriptor preloading

2016-09-13 Thread Nicolai Hähnle

On 13.09.2016 19:13, Marek Olšák wrote:

From: Marek Olšák 

26011 shaders in 14651 tests
Totals:
SGPRS: 1251920 -> 1152636 (-7.93 %)
VGPRS: 728421 -> 728198 (-0.03 %)
Spilled SGPRs: 16644 -> 3776 (-77.31 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 36001064 -> 35835152 (-0.46 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 21 -> 222372 (0.07 %)
Wait states: 0 -> 0 (0.00 %)
---
 src/gallium/drivers/radeonsi/si_shader.c | 123 +++
 1 file changed, 28 insertions(+), 95 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 3f77714..c96c52e 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -100,25 +100,20 @@ struct si_shader_context

LLVMTargetMachineRef tm;

unsigned invariant_load_md_kind;
unsigned range_md_kind;
unsigned uniform_md_kind;
LLVMValueRef empty_md;

/* Preloaded descriptors. */
LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS];
-   LLVMValueRef shader_buffers[SI_NUM_SHADER_BUFFERS];
-   LLVMValueRef sampler_views[SI_NUM_SAMPLERS];
-   LLVMValueRef sampler_states[SI_NUM_SAMPLERS];
-   LLVMValueRef fmasks[SI_NUM_SAMPLERS];
-   LLVMValueRef images[SI_NUM_IMAGES];
LLVMValueRef esgs_ring;
LLVMValueRef gsvs_ring[4];

LLVMValueRef lds;
LLVMValueRef gs_next_vertex[4];
LLVMValueRef return_value;

LLVMTypeRef voidt;
LLVMTypeRef i1;
LLVMTypeRef i8;
@@ -3420,30 +3415,32 @@ static void membar_emit(
struct si_shader_context *ctx = si_shader_context(bld_base);

emit_waitcnt(ctx);
 }

 static LLVMValueRef
 shader_buffer_fetch_rsrc(struct si_shader_context *ctx,
 const struct tgsi_full_src_register *reg)
 {
LLVMValueRef ind_index;
-   LLVMValueRef rsrc_ptr;
+   LLVMValueRef rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn,
+SI_PARAM_SHADER_BUFFERS);

-   if (!reg->Register.Indirect)
-   return ctx->shader_buffers[reg->Register.Index];
+   if (!reg->Register.Indirect) {
+   ind_index = LLVMConstInt(ctx->i32, reg->Register.Index, 0);
+   return build_indexed_load_const(ctx, rsrc_ptr, ind_index);
+   }

ind_index = get_bounded_indirect_index(ctx, &reg->Indirect,
   reg->Register.Index,
   SI_NUM_SHADER_BUFFERS);

-   rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, SI_PARAM_SHADER_BUFFERS);
return build_indexed_load_const(ctx, rsrc_ptr, ind_index);


The calls to build_indexed_load_const can be further unified.
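
For illustration only, a minimal sketch of what that unification could look like: compute the index in either branch, then issue a single load. Everything here is a hypothetical stand-in (plain ints instead of LLVM values, `bounded_index` in place of get_bounded_indirect_index, and its clamping behaviour is an assumption), so it shows the control flow rather than the actual radeonsi code.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_BUFFERS 4 /* hypothetical stand-in for SI_NUM_SHADER_BUFFERS */

/* Hypothetical, simplified register description. */
struct src_reg {
   bool indirect;   /* true if the index has a runtime-relative part */
   unsigned index;  /* constant base index */
   unsigned rel;    /* stand-in for the runtime-relative part */
};

/* Stand-in for the bounded indirect index; clamping is an assumption. */
static unsigned bounded_index(const struct src_reg *reg)
{
   unsigned idx = reg->index + reg->rel;
   return idx < NUM_BUFFERS ? idx : NUM_BUFFERS - 1;
}

/* Stand-in for the indexed descriptor load. */
static int load_descriptor(const int *table, unsigned idx)
{
   assert(idx < NUM_BUFFERS);
   return table[idx];
}

/* Only the index computation differs between the two paths,
 * so the load itself is emitted once. */
static int fetch_rsrc(const int *table, const struct src_reg *reg)
{
   unsigned idx = reg->indirect ? bounded_index(reg) : reg->index;
   return load_descriptor(table, idx);
}
```

With a four-entry table, a direct register with index 2 and an indirect one that runs past the end both go through the same `load_descriptor` call.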


 }

 static bool tgsi_is_array_sampler(unsigned target)
 {
return target == TGSI_TEXTURE_1D_ARRAY ||
   target == TGSI_TEXTURE_SHADOW1D_ARRAY ||
   target == TGSI_TEXTURE_2D_ARRAY ||
   target == TGSI_TEXTURE_SHADOW2D_ARRAY ||
   target == TGSI_TEXTURE_CUBE_ARRAY ||
@@ -3493,46 +3490,54 @@ static LLVMValueRef force_dcc_off(struct si_shader_context *ctx,
  * Load the resource descriptor for \p image.
  */
 static void
 image_fetch_rsrc(
struct lp_build_tgsi_context *bld_base,
const struct tgsi_full_src_register *image,
bool dcc_off,
LLVMValueRef *rsrc)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
+   LLVMValueRef rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn,
+SI_PARAM_IMAGES);

assert(image->Register.File == TGSI_FILE_IMAGE);

if (!image->Register.Indirect) {
-   /* Fast path: use preloaded resources */
-   *rsrc = ctx->images[image->Register.Index];
+   struct tgsi_shader_info *info = &ctx->shader->selector->info;
+   int i = image->Register.Index;
+   LLVMValueRef index = LLVMConstInt(ctx->i32, i, 0);
+
+   /* Rely on LLVM to shrink the load for buffer resources. */
+   *rsrc = build_indexed_load_const(ctx, rsrc_ptr, index);
+
+   if (info->images_writemask & (1 << i) &&
+   !(info->images_buffers & (1 << i)))
+   *rsrc = force_dcc_off(ctx, *rsrc);
} else {
/* Indexing and manual load */
LLVMValueRef ind_index;
-   LLVMValueRef rsrc_ptr;
LLVMValueRef tmp;

/* From the GL_ARB_shader_image_load_store extension spec:
 *
 *If a shader performs an image load, store, or atomic
 *operation using an image variable declared as an array,
 *and if the index used to select an individual element is
 *negative or greater than or equal to th

Re: [Mesa-dev] [PATCH] st/va: also honors interlaced preference when providing a video format

2016-09-13 Thread Leo Liu



On 09/13/2016 01:29 PM, Andy Furniss wrote:

Leo Liu wrote:

Hi Andy,

On 09/13/2016 06:22 AM, Andy Furniss wrote:

Zhang, Boyuan wrote:

Hi Leo, Christian and Julien,

I tested the patch with VA-API encoding and transcoding, and it seems
to be working fine. We are using the "VAAPI_DISABLE_INTERLACE" env, so
interlaced is always disabled.


Though I notice it will break screen recording scripts for existing
users who previously didn't need the env set but will after this.

Totally untested/thought through, but maybe the env should default to on?


Agreed, can you come up with a patch for that?


OK, but maybe I should test a bit first to see if anything regresses.

Unfortunately, today I found an issue with mpv by chance.

With VAAPI_DISABLE_INTERLACE=1, which it needs for mpv --vo-vaapi,
all is apparently OK when playing, say, a 25fps vid,
but I've found that if I push the framerate to the refresh rate and
do something that draws the OSD, then the image is corrupted, possibly with
many VM faults logged, followed by a segfault in u_copy_yv12_img_to_nv12_surf.
This happens with or without the uv swap patch below. I will file a bug
after more investigation. Bisecting mesa goes back to the commit that
introduced VAAPI_DISABLE_INTERLACE.


We have to be careful: we cannot override the preferred interlaced type
obtained from querying.





Also, were any outstanding patches for VA-API encode from you reviewed
but not committed?
If so, send them to me and I can push them.


There's only

https://lists.freedesktop.org/archives/mesa-dev/2016-July/124695.html

for the uv swap issue.


Done. pushed.



Not my issue as such, but did anyone notice this from Mark Thompson, who
does vaapi for libav/ffmpeg?

I notice he didn't keep the CCs so maybe it got missed.

https://lists.freedesktop.org/archives/mesa-dev/2016-September/128076.html 





Did anyone have this reviewed?

Regards,
Leo



Re: [Mesa-dev] [PATCH 3/3] intel/blorp: Stop setting 3DSTATE_DRAWING_RECTANGLE

2016-09-13 Thread Anuj Phogat
On Mon, Sep 12, 2016 at 3:50 PM, Jason Ekstrand  wrote:
> The Vulkan driver sets 3DSTATE_DRAWING_RECTANGLE once to MAX_INT x MAX_INT
> at the GPU initialization time and never sets it again.  The GL driver sets
> it every time the framebuffer changes.  Originally, blorp set it to the
> size of the drawing area, but that meant we had to set it back in the Vulkan
> driver.  Instead, we can easily just do that in the GL driver's blorp_exec
> implementation and not set it in blorp core.
>
> Signed-off-by: Jason Ekstrand 
> ---
>  src/intel/blorp/blorp_genX_exec.h   |  5 -
>  src/intel/vulkan/genX_blorp_exec.c  | 15 ---
>  src/mesa/drivers/dri/i965/genX_blorp_exec.c |  5 +
>  3 files changed, 5 insertions(+), 20 deletions(-)
>
> diff --git a/src/intel/blorp/blorp_genX_exec.h b/src/intel/blorp/blorp_genX_exec.h
> index aff59e1..eb4a5b9 100644
> --- a/src/intel/blorp/blorp_genX_exec.h
> +++ b/src/intel/blorp/blorp_genX_exec.h
> @@ -1216,11 +1216,6 @@ blorp_exec(struct blorp_batch *batch, const struct blorp_params *params)
>clear.DepthClearValue = params->depth.clear_color.u32[0];
> }
>
> -   blorp_emit(batch, GENX(3DSTATE_DRAWING_RECTANGLE), rect) {
> -  rect.ClippedDrawingRectangleXMax = MAX2(params->x1, params->x0) - 1;
> -  rect.ClippedDrawingRectangleYMax = MAX2(params->y1, params->y0) - 1;
> -   }
> -
> blorp_emit(batch, GENX(3DPRIMITIVE), prim) {
>prim.VertexAccessType = SEQUENTIAL;
>prim.PrimitiveTopologyType = _3DPRIM_RECTLIST;
> diff --git a/src/intel/vulkan/genX_blorp_exec.c b/src/intel/vulkan/genX_blorp_exec.c
> index a3ad97a..5ddbb7d 100644
> --- a/src/intel/vulkan/genX_blorp_exec.c
> +++ b/src/intel/vulkan/genX_blorp_exec.c
> @@ -203,21 +203,6 @@ genX(blorp_exec)(struct blorp_batch *batch,
>
> blorp_exec(batch, params);
>
> -   /* BLORP sets DRAWING_RECTANGLE but we always want it set to the maximum.
> -* Since we set it once at driver init and never again, we have to set it
> -* back after invoking blorp.
> -*
> -* TODO: BLORP should assume a max drawing rectangle
> -*/
> -   blorp_emit(batch, GENX(3DSTATE_DRAWING_RECTANGLE), rect) {
> -  rect.ClippedDrawingRectangleYMin = 0;
> -  rect.ClippedDrawingRectangleXMin = 0;
> -  rect.ClippedDrawingRectangleYMax = UINT16_MAX;
> -  rect.ClippedDrawingRectangleXMax = UINT16_MAX;
> -  rect.DrawingRectangleOriginY = 0;
> -  rect.DrawingRectangleOriginX = 0;
> -   }
> -
> cmd_buffer->state.vb_dirty = ~0;
> cmd_buffer->state.dirty = ~0;
> cmd_buffer->state.push_constants_dirty = ~0;
> diff --git a/src/mesa/drivers/dri/i965/genX_blorp_exec.c b/src/mesa/drivers/dri/i965/genX_blorp_exec.c
> index 8cd5a62..edcd896 100644
> --- a/src/mesa/drivers/dri/i965/genX_blorp_exec.c
> +++ b/src/mesa/drivers/dri/i965/genX_blorp_exec.c
> @@ -206,6 +206,11 @@ retry:
>
> brw_emit_depth_stall_flushes(brw);
>
> +   blorp_emit(batch, GENX(3DSTATE_DRAWING_RECTANGLE), rect) {
> +  rect.ClippedDrawingRectangleXMax = MAX2(params->x1, params->x0) - 1;
> +  rect.ClippedDrawingRectangleYMax = MAX2(params->y1, params->y0) - 1;
> +   }
> +
> blorp_exec(batch, params);
>
> /* Make sure we didn't wrap the batch unintentionally, and make sure we
> --
> 2.5.0.400.gff86faf
>

Reviewed-by: Anuj Phogat 


Re: [Mesa-dev] [PATCH 1/6] radeonsi: load streamout buffer descriptors before use

2016-09-13 Thread Nicolai Hähnle

On 13.09.2016 19:13, Marek Olšák wrote:

From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_shader.c | 67 
 1 file changed, 34 insertions(+), 33 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index be6fae7..b9ad4be 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -105,21 +105,20 @@ struct si_shader_context
unsigned uniform_md_kind;
LLVMValueRef empty_md;

LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS];
LLVMValueRef lds;
LLVMValueRef shader_buffers[SI_NUM_SHADER_BUFFERS];
LLVMValueRef sampler_views[SI_NUM_SAMPLERS];
LLVMValueRef sampler_states[SI_NUM_SAMPLERS];
LLVMValueRef fmasks[SI_NUM_SAMPLERS];
LLVMValueRef images[SI_NUM_IMAGES];
-   LLVMValueRef so_buffers[4];
LLVMValueRef esgs_ring;
LLVMValueRef gsvs_ring[4];
LLVMValueRef gs_next_vertex[4];
LLVMValueRef return_value;

LLVMTypeRef voidt;
LLVMTypeRef i1;
LLVMTypeRef i8;
LLVMTypeRef i32;
LLVMTypeRef i64;
@@ -2253,31 +2252,64 @@ static void si_dump_streamout(struct pipe_stream_output_info *so)
i, so->output[i].output_buffer,
so->output[i].dst_offset, so->output[i].dst_offset + so->output[i].num_components - 1,
so->output[i].register_index,
mask & 1 ? "x" : "",
mask & 2 ? "y" : "",
mask & 4 ? "z" : "",
mask & 8 ? "w" : "");
}
 }

+static void load_streamout_descriptors(struct si_shader_context *ctx,
+  LLVMValueRef so_buffers[4])
+{
+   struct lp_build_tgsi_context *bld_base = &ctx->radeon_bld.soa.bld_base;
+   struct gallivm_state *gallivm = bld_base->base.gallivm;
+   unsigned i;
+
+   /* Streamout can only be used if the shader is compiled as VS. */
+   if (!ctx->shader->selector->so.num_outputs ||
+   (ctx->type == PIPE_SHADER_VERTEX &&
+(ctx->shader->key.vs.as_es ||
+ ctx->shader->key.vs.as_ls)) ||
+   (ctx->type == PIPE_SHADER_TESS_EVAL &&
+ctx->shader->key.tes.as_es))
+   return;


This should probably be an assertion now.

Cheers
Nicolai


+
+   LLVMValueRef buf_ptr = LLVMGetParam(ctx->radeon_bld.main_fn,
+   SI_PARAM_RW_BUFFERS);
+
+   /* Load the resources, we rely on the code sinking to do the rest */
+   for (i = 0; i < 4; ++i) {
+   if (ctx->shader->selector->so.stride[i]) {
+   LLVMValueRef offset = lp_build_const_int32(gallivm,
+  SI_VS_STREAMOUT_BUF0 + i);
+
+   so_buffers[i] = build_indexed_load_const(ctx, buf_ptr, offset);
+   }
+   }
+}
+
 /* On SI, the vertex shader is responsible for writing streamout data
  * to buffers. */
 static void si_llvm_emit_streamout(struct si_shader_context *ctx,
   struct si_shader_output_values *outputs,
   unsigned noutput)
 {
struct pipe_stream_output_info *so = &ctx->shader->selector->so;
struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm;
LLVMBuilderRef builder = gallivm->builder;
int i, j;
struct lp_build_if_state if_ctx;
+   LLVMValueRef so_buffers[4];
+
+   load_streamout_descriptors(ctx, so_buffers);

/* Get bits [22:16], i.e. (so_param >> 16) & 127; */
LLVMValueRef so_vtx_count =
unpack_param(ctx, ctx->param_streamout_config, 16, 7);

LLVMValueRef tid = get_thread_id(ctx);

/* can_emit = tid < so_vtx_count; */
LLVMValueRef can_emit =
LLVMBuildICmp(builder, LLVMIntULT, tid, so_vtx_count, "");
@@ -2359,21 +2391,21 @@ static void si_llvm_emit_streamout(struct si_shader_context *ctx,
}
break;
}

LLVMValueRef can_emit_stream =
LLVMBuildICmp(builder, LLVMIntEQ,
  stream_id,
  lp_build_const_int32(gallivm, stream), "");

lp_build_if(&if_ctx_stream, gallivm, can_emit_stream);
-   build_tbuffer_store_dwords(ctx, ctx->so_buffers[buf_idx],
+   build_tbuffer_store_dwords(ctx, so_buffers[buf_idx],
   vdata, num_comps,
   so_write_offset[buf_idx],
   LLVMConstInt(ctx->i32, 0, 0),
 

[Mesa-dev] [PATCH 08/14] egl: Factor out _eglCreateImageCommon (v2)

2016-09-13 Thread Adam Jackson
From: Kyle Brenneman 

v2:
- Pass disp to RETURN_EGL_ERROR so we unlock the display
---
 src/egl/main/eglapi.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index a74e5e4..ba4826a 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -1309,11 +1309,10 @@ eglReleaseThread(void)
 }
 
 
-static EGLImage EGLAPIENTRY
-eglCreateImageKHR(EGLDisplay dpy, EGLContext ctx, EGLenum target,
+static EGLImage
+_eglCreateImageCommon(_EGLDisplay *disp, EGLContext ctx, EGLenum target,
   EGLClientBuffer buffer, const EGLint *attr_list)
 {
-   _EGLDisplay *disp = _eglLockDisplay(dpy);
_EGLContext *context = _eglLookupContext(ctx, disp);
_EGLDriver *drv;
_EGLImage *img;
@@ -1337,18 +1336,27 @@ eglCreateImageKHR(EGLDisplay dpy, EGLContext ctx, EGLenum target,
RETURN_EGL_EVAL(disp, ret);
 }
 
+static EGLImage EGLAPIENTRY
+eglCreateImageKHR(EGLDisplay dpy, EGLContext ctx, EGLenum target,
+  EGLClientBuffer buffer, const EGLint *attr_list)
+{
+   _EGLDisplay *disp = _eglLockDisplay(dpy);
+   return _eglCreateImageCommon(disp, ctx, target, buffer, attr_list);
+}
+
 
 EGLImage EGLAPIENTRY
 eglCreateImage(EGLDisplay dpy, EGLContext ctx, EGLenum target,
EGLClientBuffer buffer, const EGLAttrib *attr_list)
 {
+   _EGLDisplay *disp = _eglLockDisplay(dpy);
EGLImage image;
EGLint *int_attribs = _eglConvertAttribsToInt(attr_list);
 
if (attr_list && !int_attribs)
-  RETURN_EGL_ERROR(NULL, EGL_BAD_ALLOC, EGL_NO_IMAGE);
+  RETURN_EGL_ERROR(disp, EGL_BAD_ALLOC, EGL_NO_IMAGE);
 
-   image = eglCreateImageKHR(dpy, ctx, target, buffer, int_attribs);
+   image = _eglCreateImageCommon(disp, ctx, target, buffer, int_attribs);
free(int_attribs);
return image;
 }
-- 
2.9.3



[Mesa-dev] [PATCH 14/14] egl: Implement EGL_KHR_debug (v2)

2016-09-13 Thread Adam Jackson
From: Kyle Brenneman 

Wire up the debug entrypoints to EGL dispatch, and add the extension
string to the client extension list.

v2:
- Lots of style fixes
- Fix missing EGLAPIENTRYs
- Factor out valid attribute check
- Lock display in eglLabelObjectKHR as needed, and use RETURN_EGL_*
- Move "EGL_KHR_debug" into asciibetical order in client extension
  string
---
 src/egl/main/eglapi.c | 145 ++
 src/egl/main/eglglobals.c |   1 +
 2 files changed, 146 insertions(+)

diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index 216b289..7162039 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -1977,6 +1977,148 @@ eglExportDMABUFImageMESA(EGLDisplay dpy, EGLImage image,
RETURN_EGL_EVAL(disp, ret);
 }
 
+static EGLint EGLAPIENTRY
+eglLabelObjectKHR(EGLDisplay dpy, EGLenum objectType, EGLObjectKHR object,
+ EGLLabelKHR label)
+{
+   _EGLDisplay *disp = NULL;
+   _EGLResourceType type;
+
+   _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC);
+
+   if (objectType == EGL_OBJECT_THREAD_KHR) {
+  _EGLThreadInfo *t = _eglGetCurrentThread();
+
+  if (!_eglIsCurrentThreadDummy()) {
+ t->Label = label;
+ return EGL_SUCCESS;
+  }
+
+  RETURN_EGL_ERROR(NULL, EGL_BAD_ALLOC, EGL_BAD_ALLOC);
+   }
+
+   disp = _eglLockDisplay(dpy);
+   if (disp == NULL)
+  RETURN_EGL_ERROR(disp, EGL_BAD_DISPLAY, EGL_BAD_DISPLAY);
+
+   if (objectType == EGL_OBJECT_DISPLAY_KHR) {
+  if (dpy != (EGLDisplay) object)
+ RETURN_EGL_ERROR(disp, EGL_BAD_PARAMETER, EGL_BAD_PARAMETER);
+
+  disp->Label = label;
+  RETURN_EGL_EVAL(disp, EGL_SUCCESS);
+   }
+
+   switch (objectType) {
+  case EGL_OBJECT_CONTEXT_KHR:
+ type = _EGL_RESOURCE_CONTEXT;
+ break;
+  case EGL_OBJECT_SURFACE_KHR:
+ type = _EGL_RESOURCE_SURFACE;
+ break;
+  case EGL_OBJECT_IMAGE_KHR:
+ type = _EGL_RESOURCE_IMAGE;
+ break;
+  case EGL_OBJECT_SYNC_KHR:
+ type = _EGL_RESOURCE_SYNC;
+ break;
+  case EGL_OBJECT_STREAM_KHR:
+  default:
+ RETURN_EGL_ERROR(disp, EGL_BAD_PARAMETER, EGL_BAD_PARAMETER);
+   }
+
+   if (_eglCheckResource(object, type, disp)) {
+  _EGLResource *res = (_EGLResource *) object;
+
+  res->Label = label;
+  RETURN_EGL_EVAL(disp, EGL_SUCCESS);
+   }
+
+   RETURN_EGL_ERROR(disp, EGL_BAD_PARAMETER, EGL_BAD_PARAMETER);
+}
+
+static EGLBoolean
+validDebugMessageLevel(EGLAttrib level)
+{
+   return (level >= EGL_DEBUG_MSG_CRITICAL_KHR &&
+   level <= EGL_DEBUG_MSG_INFO_KHR);
+}
+
+static EGLint EGLAPIENTRY
+eglDebugMessageControlKHR(EGLDEBUGPROCKHR callback,
+ const EGLAttrib *attrib_list)
+{
+   unsigned int newEnabled;
+
+   _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC);
+
+   mtx_lock(_eglGlobal.Mutex);
+
+   newEnabled = _eglGlobal.debugTypesEnabled;
+   if (attrib_list != NULL) {
+  int i;
+
+  for (i = 0; attrib_list[i] != EGL_NONE; i += 2) {
+ if (validDebugMessageLevel(attrib_list[i])) {
+if (attrib_list[i + 1])
+   newEnabled |= DebugBitFromType(attrib_list[i]);
+else
+   newEnabled &= ~DebugBitFromType(attrib_list[i]);
+continue;
+ }
+
+ // On error, set the last error code, call the current
+ // debug callback, and return the error code.
+ mtx_unlock(_eglGlobal.Mutex);
+ _eglReportError(EGL_BAD_ATTRIBUTE, NULL,
+   "Invalid attribute 0x%04lx", (unsigned long) attrib_list[i]);
+ return EGL_BAD_ATTRIBUTE;
+  }
+   }
+
+   if (callback != NULL) {
+  _eglGlobal.debugCallback = callback;
+  _eglGlobal.debugTypesEnabled = newEnabled;
+   } else {
+  _eglGlobal.debugCallback = NULL;
+  _eglGlobal.debugTypesEnabled = _EGL_DEBUG_BIT_CRITICAL | _EGL_DEBUG_BIT_ERROR;
+   }
+
+   mtx_unlock(_eglGlobal.Mutex);
+   return EGL_SUCCESS;
+}
+
+static EGLBoolean EGLAPIENTRY
+eglQueryDebugKHR(EGLint attribute, EGLAttrib *value)
+{
+   _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC);
+
+   mtx_lock(_eglGlobal.Mutex);
+
+   do {
+  if (validDebugMessageLevel(attribute)) {
+ if (_eglGlobal.debugTypesEnabled & DebugBitFromType(attribute))
+*value = EGL_TRUE;
+ else
+*value = EGL_FALSE;
+ break;
+  }
+
+  if (attribute == EGL_DEBUG_CALLBACK_KHR) {
+ *value = (EGLAttrib) _eglGlobal.debugCallback;
+ break;
+  }
+
+  mtx_unlock(_eglGlobal.Mutex);
+  _eglReportError(EGL_BAD_ATTRIBUTE, NULL,
+  "Invalid attribute 0x%04lx", (unsigned long) attribute);
+  return EGL_FALSE;
+   } while (0);
+
+   mtx_unlock(_eglGlobal.Mutex);
+   return EGL_TRUE;
+}
+
 __eglMustCastToProperFunctionPointerType EGLAPIENTRY
 eglGetProcAddress(const char *procname)
 {
@@ -2056,6 +2198,9 @@ eglGetProcAddress(const char *procname)
   { "eglGetSyncValuesCHROMIU

[Mesa-dev] [PATCH 13/14] egl: Track EGL_KHR_debug state when going through EGL API calls (v2)

2016-09-13 Thread Adam Jackson
From: Kyle Brenneman 

This decorates every EGL entrypoint with _EGL_FUNC_START, which records
the function name and primary dispatch object label in the current
thread state. It also adds debug report functions and calls them when
appropriate.

This would be useful enough for debugging on its own, if the user set a
breakpoint when the report function was called. We will also need this
state tracked in order to expose EGL_KHR_debug.

v2:
- Clear the object label in more cases in _eglSetFuncName
- Set dummy thread's CurrentAPI to EGL_NONE not zero
- Pass draw surface (if any) to _EGL_FUNC_START in eglSwapInterval
---
 src/egl/main/eglapi.c | 155 ++
 src/egl/main/eglcurrent.c |  91 ++-
 src/egl/main/eglcurrent.h |  22 +++
 src/egl/main/eglglobals.h |   5 ++
 4 files changed, 259 insertions(+), 14 deletions(-)

diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index 0477ad9..216b289 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -250,6 +250,37 @@ _eglUnlockDisplay(_EGLDisplay *dpy)
mtx_unlock(&dpy->Mutex);
 }
 
+static EGLBoolean
+_eglSetFuncName(const char *funcName, _EGLDisplay *disp, EGLenum objectType, _EGLResource *object)
+{
+   _EGLThreadInfo *thr = _eglGetCurrentThread();
+   if (!_eglIsCurrentThreadDummy()) {
+  thr->CurrentFuncName = funcName;
+  thr->CurrentObjectLabel = NULL;
+
+  if (objectType == EGL_OBJECT_THREAD_KHR)
+ thr->CurrentObjectLabel = thr->Label;
+  else if (objectType == EGL_OBJECT_DISPLAY_KHR)
+ thr->CurrentObjectLabel = disp ? disp->Label : NULL;
+  else
+ thr->CurrentObjectLabel = object ? object->Label : NULL;
+
+  return EGL_TRUE;
+   }
+
+   _eglDebugReportFull(EGL_BAD_ALLOC, funcName, funcName,
+  EGL_DEBUG_MSG_CRITICAL_KHR, NULL, NULL);
+   return EGL_FALSE;
+}
+
+#define _EGL_FUNC_START(disp, objectType, object, ret) \
+   do { \
+  if (!_eglSetFuncName(__func__, disp, objectType, (_EGLResource *) object)) { \
+ if (disp) \
+_eglUnlockDisplay(disp);   \
+ return ret; \
+  } \
+   } while(0)
 
 static EGLint *
 _eglConvertAttribsToInt(const EGLAttrib *attr_list)
@@ -287,6 +318,8 @@ eglGetDisplay(EGLNativeDisplayType nativeDisplay)
_EGLDisplay *dpy;
void *native_display_ptr;
 
+   _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY);
+
STATIC_ASSERT(sizeof(void*) == sizeof(nativeDisplay));
native_display_ptr = (void*) nativeDisplay;
 
@@ -330,6 +363,7 @@ static EGLDisplay EGLAPIENTRY
 eglGetPlatformDisplayEXT(EGLenum platform, void *native_display,
  const EGLint *attrib_list)
 {
+   _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY);
return _eglGetPlatformDisplayCommon(platform, native_display, attrib_list);
 }
 
@@ -340,6 +374,8 @@ eglGetPlatformDisplay(EGLenum platform, void *native_display,
EGLDisplay display;
EGLint *int_attribs;
 
+   _EGL_FUNC_START(NULL, EGL_OBJECT_THREAD_KHR, NULL, EGL_NO_DISPLAY);
+
int_attribs = _eglConvertAttribsToInt(attrib_list);
if (attrib_list && !int_attribs)
   RETURN_EGL_ERROR(NULL, EGL_BAD_ALLOC, NULL);
@@ -483,6 +519,8 @@ eglInitialize(EGLDisplay dpy, EGLint *major, EGLint *minor)
 {
_EGLDisplay *disp = _eglLockDisplay(dpy);
 
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);
+
if (!disp)
   RETURN_EGL_ERROR(NULL, EGL_BAD_DISPLAY, EGL_FALSE);
 
@@ -533,6 +571,8 @@ eglTerminate(EGLDisplay dpy)
 {
_EGLDisplay *disp = _eglLockDisplay(dpy);
 
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);
+
if (!disp)
   RETURN_EGL_ERROR(NULL, EGL_BAD_DISPLAY, EGL_FALSE);
 
@@ -560,6 +600,7 @@ eglQueryString(EGLDisplay dpy, EGLint name)
}
 
disp = _eglLockDisplay(dpy);
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, NULL);
_EGL_CHECK_DISPLAY(disp, NULL, drv);
 
switch (name) {
@@ -585,6 +626,8 @@ eglGetConfigs(EGLDisplay dpy, EGLConfig *configs,
_EGLDriver *drv;
EGLBoolean ret;
 
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);
+
_EGL_CHECK_DISPLAY(disp, EGL_FALSE, drv);
ret = drv->API.GetConfigs(drv, disp, configs, config_size, num_config);
 
@@ -600,6 +643,8 @@ eglChooseConfig(EGLDisplay dpy, const EGLint *attrib_list, EGLConfig *configs,
_EGLDriver *drv;
EGLBoolean ret;
 
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);
+
_EGL_CHECK_DISPLAY(disp, EGL_FALSE, drv);
ret = drv->API.ChooseConfig(drv, disp, attrib_list, configs,
 config_size, num_config);
@@ -617,6 +662,8 @@ eglGetConfigAttrib(EGLDisplay dpy, EGLConfig config,
_EGLDriver *drv;
EGLBoolean ret;
 
+   _EGL_FUNC_START(disp, EGL_OBJECT_DISPLAY_KHR, NULL, EGL_FALSE);
+
_EGL_CHECK_CONFIG(disp, conf, EGL_FALSE, drv);
ret = drv->API.GetConfigAttrib(drv, di

Re: [Mesa-dev] [PATCH 14/14] egl: Implement EGL_KHR_debug

2016-09-13 Thread Adam Jackson
On Tue, 2016-09-13 at 17:17 +0100, Emil Velikov wrote:

> > +  } else {
> > + _eglDebugReportFull(EGL_BAD_ALLOC, __func__, __func__,
> > +   EGL_DEBUG_MSG_CRITICAL_KHR, NULL, NULL);
> > + return EGL_BAD_ALLOC;
> > +  }
> 
> Nit: Please use the same style as the "objectType ==
> EGL_OBJECT_DISPLAY_KHR" case.

AFAICT the reason this code doesn't use RETURN_EGL_ERROR like
everything else is because it doesn't lock the display. Which is
extremely wrong, since we definitely depend on it not going away from
under us later! Fixed in v2.
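
To make the locking point concrete, here is a rough sketch of the pattern, assuming a RETURN_EGL_ERROR-style macro drops the display lock (when there is one) before returning. The struct, the counter standing in for a mutex, and the names `lock_display`, `RETURN_ERROR` and `label_display` are all made up for the example; this is not the Mesa implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical display object; `locked` stands in for a real mutex. */
struct display {
   int locked;
   const char *label;
};

static int last_error; /* stand-in for the per-thread last error */

/* Takes the lock if the handle is valid and hands the display back. */
static struct display *lock_display(struct display *d)
{
   if (d)
      d->locked++;
   return d;
}

/* Every exit goes through here, so the lock is always released. */
#define RETURN_ERROR(disp, err, ret) \
   do {                              \
      if (disp)                      \
         (disp)->locked--;           \
      last_error = (err);            \
      return (ret);                  \
   } while (0)

static int label_display(struct display *dpy, const char *label)
{
   struct display *disp = lock_display(dpy);

   if (disp == NULL)
      RETURN_ERROR(disp, 1, 0); /* nothing was locked, nothing to unlock */

   /* The display cannot go away while we hold the lock. */
   disp->label = label;
   RETURN_ERROR(disp, 0, 1);
}
```

The point is only that the object is modified between a lock and an unlock, and that both the error and the success exits release it.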

> Nit: You can also drop the "else" and flatten (indent one level less)
> all of the following code.

Done in v2.

> Missing EGLAPIENTRY

Fixed in v2.

> > +eglDebugMessageControlKHR(EGLDEBUGPROCKHR callback,
> > + const EGLAttrib *attrib_list)
> > +{
> > +   unsigned int newEnabled;
> > +
> > +   _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC);
> > +
> > +   mtx_lock(_eglGlobal.Mutex);
> > +
> > +   newEnabled = _eglGlobal.debugTypesEnabled;
> > +   if (attrib_list != NULL) {
> > +  int i;
> > +
> > +  for (i = 0; attrib_list[i] != EGL_NONE; i += 2) {
> 
> Don't think we check it elsewhere (and/or if we should care too much) but still:
> Check if i overflows or use unsigned type ?

There's a bunch of places where we don't check that...

> > + if (attrib_list[i] >= EGL_DEBUG_MSG_CRITICAL_KHR &&
> > +   attrib_list[i] <= EGL_DEBUG_MSG_INFO_KHR) {
> > +if (attrib_list[i + 1]) {
> > +   newEnabled |= DebugBitFromType(attrib_list[i]);
> > +} else {
> > +   newEnabled &= ~DebugBitFromType(attrib_list[i]);
> > +}
> 
> Nit: break; ?

Nope. You're allowed to set the disposition for multiple error levels
in a single call to DebugMessageControl, so you need to validate them
all.
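
A small sketch of why the loop has to keep going, with made-up level constants and a hypothetical `apply_levels` helper instead of the real EGL tokens or Mesa internals: every (level, enabled) pair is applied, and a single bad key rejects the whole list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real EGL_KHR_debug token values. */
enum { LIST_END = 0, LEVEL_CRITICAL, LEVEL_ERROR, LEVEL_WARN, LEVEL_INFO };

static int valid_level(long attr)
{
   return attr >= LEVEL_CRITICAL && attr <= LEVEL_INFO;
}

/* Walks every (level, enabled) pair in the list. Returns 0 and updates
 * *mask on success; returns -1 and leaves *mask untouched if any key
 * in the list is invalid. */
static int apply_levels(const long *attribs, unsigned *mask)
{
   unsigned new_mask;
   size_t i;

   assert(mask != NULL);
   new_mask = *mask;

   for (i = 0; attribs != NULL && attribs[i] != LIST_END; i += 2) {
      if (!valid_level(attribs[i]))
         return -1;
      if (attribs[i + 1])
         new_mask |= 1u << attribs[i];
      else
         new_mask &= ~(1u << attribs[i]);
      /* No break here: later pairs still need to be seen. */
   }

   *mask = new_mask;
   return 0;
}
```

Passing a list that enables two levels sets both bits in one call, which a break after the first valid pair would prevent.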

> > +eglQueryDebugKHR(EGLint attribute, EGLAttrib *value)
> > +{
> > +   _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC);
> > +
> > +   mtx_lock(_eglGlobal.Mutex);
> > +   if (attribute >= EGL_DEBUG_MSG_CRITICAL_KHR &&
> > + attribute <= EGL_DEBUG_MSG_INFO_KHR) {
> > +  if (_eglGlobal.debugTypesEnabled & DebugBitFromType(attribute)) {
> > + *value = EGL_TRUE;
> > +  } else {
> > + *value = EGL_FALSE;
> > +  }
> > +   } else if (attribute == EGL_DEBUG_CALLBACK_KHR) {
> > +  *value = (EGLAttrib) _eglGlobal.debugCallback;
> > +   } else {
> > +  mtx_unlock(_eglGlobal.Mutex);
> > +  _eglReportError(EGL_BAD_ATTRIBUTE, NULL,
> > +  "Invalid attribute 0x%04lx", (unsigned long) attribute);
> > +  return EGL_FALSE;
> > +   }
> 
> Nit: Switch statement will be a lot easier to read.

Meh. I factored out the valid-debug-level check to a helper, at which
point you can't really use a switch. Redone as a do-while so I could
use break to bail out of the success conditions.
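
Roughly, the shape is the following. This is a standalone sketch with invented attribute values and a counter standing in for the mutex, not the code from the patch: each success case breaks out to the shared unlock-and-return, and only the fall-through case takes the error exit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute values, not the real EGL tokens. */
enum { ATTR_LEVEL_MIN = 1, ATTR_LEVEL_MAX = 4, ATTR_CALLBACK = 5 };

static int lock_depth;                  /* stand-in for a mutex */
static unsigned enabled_mask = 1u << 2; /* only level 2 is enabled */
static long callback_value = 42;        /* stand-in for a callback pointer */

static int valid_level(int attribute)
{
   return attribute >= ATTR_LEVEL_MIN && attribute <= ATTR_LEVEL_MAX;
}

/* Returns 1 and fills *value on success, 0 for an unknown attribute. */
static int query(int attribute, long *value)
{
   assert(value != NULL);
   lock_depth++; /* "lock" */

   do {
      if (valid_level(attribute)) {
         *value = (enabled_mask >> attribute) & 1u;
         break; /* success */
      }

      if (attribute == ATTR_CALLBACK) {
         *value = callback_value;
         break; /* success */
      }

      /* Only reached when nothing matched: the error exit. */
      lock_depth--;
      return 0;
   } while (0);

   lock_depth--; /* shared success exit */
   return 1;
}
```

A switch would need the range check spelled out as case labels, whereas here the helper stays a plain function call.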

- ajax


Re: [Mesa-dev] Problem with RX 480 on Alien: Isolation and Dota 2

2016-09-13 Thread Marek Olšák
LLVM 64-bit:

mkdir -p build
cd build
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/x86_64-linux-gnu \
  -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=ON \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \
  -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -fno-omit-frame-pointer" \
  -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -fno-omit-frame-pointer"
ninja
sudo ninja install


LLVM 32-bit:

mkdir -p build32
cd build32
cmake .. -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/llvm/i386-linux-gnu \
  -DLLVM_TARGETS_TO_BUILD="X86;AMDGPU" -DLLVM_ENABLE_ASSERTIONS=ON \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON \
  -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -fno-omit-frame-pointer" \
  -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g -DNDEBUG -fno-omit-frame-pointer" \
  -DLLVM_BUILD_32_BITS=ON
ninja
sudo ninja install
# then add /usr/llvm/x86_64-linux-gnu and /usr/llvm/i386-linux-gnu to ld.conf


Mesa configure helper script. It will overwrite the /usr/lib/ files on
Ubuntu (run as-is for 64-bit, or use "-32" for 32-bit):

if test x$1 = x-32; then
dir=i386-linux-gnu
build=i686-linux-gnu
export CFLAGS="-m32 -O2 -g"
export CXXFLAGS="$CFLAGS"
export LDFLAGS="-L/usr/lib/$dir"
export PKG_CONFIG_PATH="/usr/lib/$dir/pkgconfig"
else
dir=x86_64-linux-gnu
build=$dir
fi

./autogen.sh \
 --build=$build --prefix=/usr --libdir=/usr/lib/$dir \
 --with-llvm-prefix=/usr/llvm/$dir \
 --enable-glx-tls --enable-texture-float --enable-debug --enable-vdpau \
 --disable-xvmc --disable-va --enable-nine --with-sha1=libnettle \
 --with-gallium-drivers=radeonsi,r600,swrast --with-dri-drivers= \
 --with-egl-platforms=x11,drm --enable-gles1 --enable-gles2

make -j4
sudo make install

You'll probably want to delete /usr/lib/$dir/*mesa*/*. That's Ubuntu's
invention that will prevent you from using installed libGL and libEGL.

It's all kind of a mess, but I don't know of a better way.

Marek


On Tue, Sep 13, 2016 at 7:33 PM, Romain Failliot <
romain.faill...@foolstep.com> wrote:
> 2016-09-13 12:41 GMT-04:00 Marek Olšák :
>>
>> BTW, If you update LLVM to a newer version, you also have to re-build
>> Mesa, because the LLVM version used by Mesa is determined while Mesa
>> is being built.
>>
>> Also, the chance to rage-quit while building LLVM+Mesa is pretty high
>> if you've never done it before.
>
> I see, is there a tutorial somewhere maybe on how to do that?
> I know how to compile projects, that's not a problem. It's more about the
> little details to make everything work once it's compiled.


Re: [Mesa-dev] [PATCH 2/2] intel/isl: Divide QPitch by 2 for 3-D stencil textures on SKL+

2016-09-13 Thread Jason Ekstrand
On Tue, Sep 13, 2016 at 10:46 AM, Chad Versace 
wrote:

> On Thu 08 Sep 2016, Jason Ekstrand wrote:
> > ---
> >  src/intel/isl/isl_surface_state.c | 15 ++-
> >  1 file changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/intel/isl/isl_surface_state.c
> b/src/intel/isl/isl_surface_state.c
> > index f8ea122..22fef3d 100644
> > --- a/src/intel/isl/isl_surface_state.c
> > +++ b/src/intel/isl/isl_surface_state.c
> > @@ -173,7 +173,20 @@ get_qpitch(const struct isl_surf *surf)
> >unreachable("Bad isl_surf_dim");
> > case ISL_DIM_LAYOUT_GEN4_2D:
> >if (GEN_GEN >= 9) {
> > - return isl_surf_get_array_pitch_el_rows(surf);
> > + if (surf->dim == ISL_SURF_DIM_3D && surf->tiling ==
> ISL_TILING_W) {
> > +/* This is rather annoying and completely undocumented.  It
> > + * appears that the hardware has a bug (or undocumented
> feature)
> > + * regarding stencil buffers most likely related to the way
> > + * W-tiling is handled as modified Y-tiling.  If you bind a
> 3-D or
> > + * 2-D array stencil buffer normally, and use texelFetch on
> it,
> > + * the z or array index will get implicitly multiplied by 2
> for no
> > + * obvious reason.  The fix appears to be to divide qpitch
> by 2
> > + * for W-tiled surfaces.
> > + */
>
> Have you confirmed that this fix is not needed on other gens? Or have
> you only confirmed that it's needed on SKL, and are deferring the
> workaround on the other gens until you had a chance to test it on them?
>

I'm not sure about KBL or later hardware.  However, it's not needed pre-SKL
because they use the GEN4_3D layout whereas SKL uses GEN4_2D for 3D
textures.


> Either way, the patch is sound. And the workaround doesn't surprise me.
> Reviewed-by: Chad Versace 
>


Re: [Mesa-dev] [PATCH 2/2] intel/isl: Divide QPitch by 2 for 3-D stencil textures on SKL+

2016-09-13 Thread Chad Versace
On Thu 08 Sep 2016, Jason Ekstrand wrote:
> ---
>  src/intel/isl/isl_surface_state.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/src/intel/isl/isl_surface_state.c 
> b/src/intel/isl/isl_surface_state.c
> index f8ea122..22fef3d 100644
> --- a/src/intel/isl/isl_surface_state.c
> +++ b/src/intel/isl/isl_surface_state.c
> @@ -173,7 +173,20 @@ get_qpitch(const struct isl_surf *surf)
>unreachable("Bad isl_surf_dim");
> case ISL_DIM_LAYOUT_GEN4_2D:
>if (GEN_GEN >= 9) {
> - return isl_surf_get_array_pitch_el_rows(surf);
> + if (surf->dim == ISL_SURF_DIM_3D && surf->tiling == ISL_TILING_W) {
> +/* This is rather annoying and completely undocumented.  It
> + * appears that the hardware has a bug (or undocumented feature)
> + * regarding stencil buffers most likely related to the way
> + * W-tiling is handled as modified Y-tiling.  If you bind a 3-D 
> or
> + * 2-D array stencil buffer normally, and use texelFetch on it,
> + * the z or array index will get implicitly multiplied by 2 for 
> no
> + * obvious reason.  The fix appears to be to divide qpitch by 2
> + * for W-tiled surfaces.
> + */

Have you confirmed that this fix is not needed on other gens? Or have
you only confirmed that it's needed on SKL, and are deferring the
workaround on the other gens until you had a chance to test it on them?

Either way, the patch is sound. And the workaround doesn't surprise me.
Reviewed-by: Chad Versace 


Re: [Mesa-dev] [PATCH 1/2] isl/state: Don't set QPitch for GEN4_3D surfaces

2016-09-13 Thread Chad Versace
On Thu 08 Sep 2016, Jason Ekstrand wrote:
> ---
>  src/intel/isl/isl_surface_state.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/src/intel/isl/isl_surface_state.c 
> b/src/intel/isl/isl_surface_state.c
> index 979e140..f8ea122 100644
> --- a/src/intel/isl/isl_surface_state.c
> +++ b/src/intel/isl/isl_surface_state.c
> @@ -172,7 +172,6 @@ get_qpitch(const struct isl_surf *surf)
> default:
>unreachable("Bad isl_surf_dim");
> case ISL_DIM_LAYOUT_GEN4_2D:
> -   case ISL_DIM_LAYOUT_GEN4_3D:
>if (GEN_GEN >= 9) {
>   return isl_surf_get_array_pitch_el_rows(surf);
>} else {
> @@ -199,6 +198,22 @@ get_qpitch(const struct isl_surf *surf)
> *slices.
> */
>return isl_surf_get_array_pitch_el(surf);
> +   case ISL_DIM_LAYOUT_GEN4_3D:
> +  /* QPitch doesn't make sense for ISL_DIM_LAYOUT_GEN4_3D since it uses a
> +   * different pitch at each LOD.  Also, the QPitch field is ignored for
> +   * these surfaces.

Yep.
Reviewed-by: Chad Versace 



Re: [Mesa-dev] [PATCH] intel/isl: Ignore base_array_layer and array_len for 3D storage surfaces

2016-09-13 Thread Chad Versace
On Tue 13 Sep 2016, Jason Ekstrand wrote:
> The time we want to restrict the Z range of a 3-D surface is when rendering
> to it.  For storage surfaces, we always want he full range.  However, we
Typo --^^
> still need to set MinimumArrayElement and RenderTargetViewExtent to
> sensible values so we'll just set them to the reasonable defaults we used
> before we started respecting the base_array_layer and array_len.
> 
> This fixes a bunch of Vulkan CTS regressions caused by 48f195d7c6483ed.
> 
> Signed-off-by: Jason Ekstrand 
> ---
>  src/intel/isl/isl_surface_state.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)

Reviewed-by: Chad Versace 



Re: [Mesa-dev] [PATCH 14/14] egl: Implement EGL_KHR_debug

2016-09-13 Thread Kyle Brenneman

On 09/13/2016 10:17 AM, Emil Velikov wrote:

Hi guys,

There's a bunch of outstanding style nitpicks (come to think of it
13/14 could use the same)

Those aside: there's a bunch of serious suggestions which I missed last time.

On 12 September 2016 at 23:19, Adam Jackson  wrote:

From: Kyle Brenneman 

Wire up the debug entrypoints to EGL dispatch, and add the extension
string to the client extension list.
---
  src/egl/main/eglapi.c | 140 ++
  src/egl/main/eglglobals.c |   3 +-
  2 files changed, 142 insertions(+), 1 deletion(-)

diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index 0a6ebe7..6b0fd2e 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -1987,6 +1987,143 @@ eglExportDMABUFImageMESA(EGLDisplay dpy, EGLImage image,
 RETURN_EGL_EVAL(disp, ret);
  }

+static EGLint EGLAPIENTRY
+eglLabelObjectKHR(EGLDisplay dpy, EGLenum objectType, EGLObjectKHR object,
+ EGLLabelKHR label)
+{
+   _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC);
+
+   if (objectType == EGL_OBJECT_THREAD_KHR) {
+  _EGLThreadInfo *t = _eglGetCurrentThread();
+
+  if (!_eglIsCurrentThreadDummy()) {
+ t->Label = label;
+ return EGL_SUCCESS;
+  } else {
+ _eglDebugReportFull(EGL_BAD_ALLOC, __func__, __func__,
+   EGL_DEBUG_MSG_CRITICAL_KHR, NULL, NULL);
+ return EGL_BAD_ALLOC;
+  }

Nit: Please use the same style as the "objectType ==
EGL_OBJECT_DISPLAY_KHR" case.



+   } else {

Nit: You can also drop the "else" and flatten (indent one level less)
all of the following code.


+  _EGLDisplay *disp = _eglLookupDisplay(dpy);
+
+  if (disp == NULL) {
+ _eglError(EGL_BAD_DISPLAY, "eglLabelObjectKHR");
+ return EGL_BAD_DISPLAY;
+  }
+
+  if (objectType == EGL_OBJECT_DISPLAY_KHR) {
+ if (dpy != (EGLDisplay) object) {
+_eglError(EGL_BAD_PARAMETER, "eglLabelObjectKHR");
+return EGL_BAD_PARAMETER;
+ }
+ disp->Label = label;
+ return EGL_SUCCESS;
+  } else {

Nit: kill this "else {" as well ?


+ _EGLResourceType type;
+
+ switch (objectType)
+ {

Nit: move to previous line


+case EGL_OBJECT_CONTEXT_KHR:
+   type = _EGL_RESOURCE_CONTEXT;
+   break;
+case EGL_OBJECT_SURFACE_KHR:
+   type = _EGL_RESOURCE_SURFACE;
+   break;
+case EGL_OBJECT_IMAGE_KHR:
+   type = _EGL_RESOURCE_IMAGE;
+   break;
+case EGL_OBJECT_SYNC_KHR:
+   type = _EGL_RESOURCE_SYNC;
+   break;
+case EGL_OBJECT_STREAM_KHR:
+default:
+_eglError(EGL_BAD_PARAMETER, "eglLabelObjectKHR");
+   return EGL_BAD_PARAMETER;
+ }
+
+ if (_eglCheckResource(object, type, disp)) {
+_EGLResource *res = (_EGLResource *) object;
+res->Label = label;
+return EGL_SUCCESS;
+ } else {
+_eglError(EGL_BAD_PARAMETER, "eglLabelObjectKHR");
+return EGL_BAD_PARAMETER;
+ }

Nit: coding style.


+  }
+   }
+}
+
+static EGLint

Missing EGLAPIENTRY


+eglDebugMessageControlKHR(EGLDEBUGPROCKHR callback,
+ const EGLAttrib *attrib_list)
+{
+   unsigned int newEnabled;
+
+   _EGL_FUNC_START(NULL, EGL_NONE, NULL, EGL_BAD_ALLOC);
+
+   mtx_lock(_eglGlobal.Mutex);
+
+   newEnabled = _eglGlobal.debugTypesEnabled;
+   if (attrib_list != NULL) {
+  int i;
+
+  for (i = 0; attrib_list[i] != EGL_NONE; i += 2) {

Don't think we check it elsewhere (and/or if we should care too much) but still:
Check if i overflows or use unsigned type ?


+ if (attrib_list[i] >= EGL_DEBUG_MSG_CRITICAL_KHR &&
+   attrib_list[i] <= EGL_DEBUG_MSG_INFO_KHR) {
+if (attrib_list[i + 1]) {
+   newEnabled |= DebugBitFromType(attrib_list[i]);
+} else {
+   newEnabled &= ~DebugBitFromType(attrib_list[i]);
+}

Nit: break; ?

A break here would be incorrect, since you can specify multiple flags in
the attribute list.



+ } else {
+// On error, set the last error code, call the current
+// debug callback, and return the error code.
+mtx_unlock(_eglGlobal.Mutex);
+_eglReportError(EGL_BAD_ATTRIBUTE, NULL,
+  "Invalid attribute 0x%04lx", (unsigned long) attrib_list[i]);
+return EGL_BAD_ATTRIBUTE;
+ }
+  }
+   }
+
+   if (callback != NULL) {
+  _eglGlobal.debugCallback = callback;
+  _eglGlobal.debugTypesEnabled = newEnabled;
+   } else {
+  _eglGlobal.debugCallback = NULL;
+  _eglGlobal.debugTypesEnabled = _EGL_DEBUG_BIT_CRITICAL | 
_EGL_DEBUG_BIT_ERROR;
+   }
+
+   mtx_unlock(_eglGlobal.Mutex);
+   return EGL_SUCCESS;
+}
+
+static EGLBoolean

Missing EGLAPIENTRY


+eglQueryDebugKH

[Mesa-dev] [PATCH] nvc0/ir: fix comments about instructions info

2016-09-13 Thread Samuel Pitoiset
The comment for the commutative flags was wrong because OP_MUL is
before OP_MAD. While we are at it add missing opcodes, and fix
the comment about the short forms.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
index f5981de..f75e395 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp
@@ -156,13 +156,14 @@ void TargetNVC0::initOpInfo()
 
static const uint32_t commutative[(OP_LAST + 31) / 32] =
{
-  // ADD, MAD, MUL, AND, OR, XOR, MAX, MIN
+  // ADD, MUL, MAD, FMA, AND, OR, XOR, MAX, MIN, SET_AND, SET_OR, SET_XOR,
+  // SET, SELP, SLCT
   0x0670ca00, 0x003f, 0x, 0x
};
 
static const uint32_t shortForm[(OP_LAST + 31) / 32] =
{
-  // ADD, MAD, MUL, AND, OR, XOR, PRESIN, PREEX2, SFN, CVT, PINTERP, MOV
+  // ADD, MUL, MAD, FMA, AND, OR, XOR, MAX, MIN
   0x0670ca00, 0x, 0x, 0x
};
 
-- 
2.9.3
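Why the order of the opcodes in those comments matters can be shown with a small sketch. It assumes the tables are consumed by testing bit (op % 32) of word (op / 32), so the bits follow the enum order. The `enum op` numbering and both helper functions below are hypothetical and do not match the real nv50_ir opcode list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode numbering; the real enum in nv50_ir differs. */
enum op { OP_NOP, OP_ADD, OP_SUB, OP_MUL, OP_MAD, OP_AND, OP_LAST };

/* Number of 32-bit words needed to hold one bit per opcode. */
#define OP_WORDS ((OP_LAST + 31) / 32)

/* Builds the packed table from a list of opcodes. */
void op_mask_build(uint32_t mask[OP_WORDS], const enum op *ops, size_t count)
{
   for (size_t w = 0; w < OP_WORDS; ++w)
      mask[w] = 0;

   for (size_t i = 0; i < count; ++i) {
      assert(ops[i] < OP_LAST);
      mask[ops[i] / 32] |= UINT32_C(1) << (ops[i] % 32);
   }
}

/* Looks up one opcode in the packed table. */
bool op_mask_test(const uint32_t mask[OP_WORDS], enum op op)
{
   return (mask[op / 32] >> (op % 32)) & 1u;
}
```

With this numbering, the set { ADD, MUL, MAD, AND } packs to 0x3a, and reading the set bits from least significant upwards yields the opcodes in enum order, which is the order a comment next to the constant should list them in.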



Re: [Mesa-dev] Problem with RX 480 on Alien: Isolation and Dota 2

2016-09-13 Thread Romain Failliot
2016-09-13 12:41 GMT-04:00 Marek Olšák :
>
> BTW, If you update LLVM to a newer version, you also have to re-build
> Mesa, because the LLVM version used by Mesa is determined while Mesa
> is being built.
>
> Also, the chance to rage-quit while building LLVM+Mesa is pretty high
> if you've never done it before.

I see, is there a tutorial somewhere maybe on how to do that?
I know how to compile projects, that's not a problem. It's more about the
little details to make everything work once it's compiled.


Re: [Mesa-dev] [PATCH 04/12] anv: Add func anv_image_has_hiz()

2016-09-13 Thread Chad Versace
On Tue 13 Sep 2016, Nanley Chery wrote:
> On Wed, Sep 07, 2016 at 03:51:14PM -0700, Chad Versace wrote:
> > On Wed 07 Sep 2016, Nanley Chery wrote:
> > > On Fri, Sep 02, 2016 at 11:42:24AM -0700, Chad Versace wrote:
> > > > On Thu 01 Sep 2016, Jason Ekstrand wrote:
> > > > > On Wed, Aug 31, 2016 at 8:29 PM, Nanley Chery  
> > > > > wrote:
> > > > > 
> > > > > From: Chad Versace 
> > > > > 
> > > > > Nanley Chery (amend):
> > > > >  - Remove wip! tag
> > > > > 
> > > > > Signed-off-by: Nanley Chery 
> > > > > ---
> > > > >  src/intel/vulkan/anv_private.h | 10 ++
> > > > >  1 file changed, 10 insertions(+)
> > > > > 
> > > > 
> > > > 
> > > > > +static inline bool
> > > > > +anv_image_has_hiz(const struct anv_image *image)
> > > > > +{
> > > > > +   /* We must check the usage because anv_image::hiz_surface 
> > > > > belongs to
> > > > > +    * a union.
> > > > > +    */
> > > > > +   return (image->usage & 
> > > > > VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT) &&
> > > > > 
> > > > > 
> > > > > Would checking (image->aspects & VK_IMAGE_ASPECT_DEPTH_BIT) be more
> > > > > appropriate?
> > > > 
> > > > I agree. VK_IMAGE_ASPECT_DEPTH_BIT makes more sense.
> > > > 
> > > > Also, that's what the documentation for anv_image says, quoted below:
> > > >
> > > >struct anv_image {
> > > >   ...
> > > > 
> > > >   /**
> > > >* Image subsurfaces
> > > >*
> > > >* For each foo, anv_image::foo_surface is valid if and only if
> > > >* anv_image::aspects has a foo aspect.
> > > >*
> > > >* ...
> > > >*/
> > > >   union {
> > > >  struct anv_surface color_surface;
> > > >
> > > >  struct {
> > > > struct anv_surface depth_surface;
> > > > struct anv_surface stencil_surface; 
> > > >
> > > >  };
> > > >   };
> > > >};
> > > >
> > > 
> > > Sure. Thanks for the documentation quote.
> > > 
> > > A HiZ surface is created for a depth image if both usage and aspect 
> > > conditions
> > > are satisfied. Would it be better for me to add the aspect check instead 
> > > of
> > > replacing the usage check with it?
> > 
> > I see. You want to avoid allocating the HiZ surface if it's never
> > rendered as a depth attachment. 
> > 
> > So yes, your suggestion sounds good to me.
> 
> I'll actually leave it out if you don't mind. The usage check isn't
> required to get the right result.

Sure. As long as the aspect check is present, then it's good with me.


Re: [Mesa-dev] AppVeyor fails with 404 during wget

2016-09-13 Thread Steven Toth
>> It looks like the winflexbison URL changed some time ago.  But this
>> didn't cause any build failures because the ZIP was being recovered from
>> the cache.
>>
>> I'll look into it.
>>
>> Jose
>
>
> It looks like the archive was moved into a old_versions subdir.
>
> The attached patch should fix it.  Could you please try it on your repos and
> confirm.

Much better, compilation starts correctly now, thanks.

- Steve


Re: [Mesa-dev] [PATCH] st/va: also honors interlaced preference when providing a video format

2016-09-13 Thread Andy Furniss

Leo Liu wrote:

Hi Andy,

On 09/13/2016 06:22 AM, Andy Furniss wrote:

Zhang, Boyuan wrote:

Hi Leo, Christian and Julien,

I tested the patch with Vaapi Encoding and Transcoding, it seems
working fine. We are using "VAAPI_DISABLE_INTERLACE" env, so
interlaced is always disabled.


Though I notice it will break screen recording scripts for existing
users who previously didn't need the env set but will after this.

Totally untested/thought through, but maybe the env should default to on?


Agree, can you come up with a patch for that?


OK, but maybe I should test a bit first to see if anything regresses.

Unfortunately, today I found an issue with mpv by chance.

With VAAPI_DISABLE_INTERLACE=1, which it needs for

mpv --vo-vaapi

all is apparently OK when playing, say, a 25fps vid. But I've found
that if I push the framerate to the refresh rate and do something that
draws the OSD, then the image is corrupted, possibly with many VM faults
logged, followed by a segfault in u_copy_yv12_img_to_nv12_surf. This
happens with or without the uv swap patch below. I will file a bug
after more investigation. Bisecting mesa goes back to the commit that
introduced VAAPI_DISABLE_INTERLACE.


Also, are there any outstanding patches for VA-API encode from you that
were reviewed but not committed?
If so, send them to me and I can push them.


There's only

https://lists.freedesktop.org/archives/mesa-dev/2016-July/124695.html

for the uv swap issue.

Not my issue as such, but did anyone notice this from Mark Thompson, who
does vaapi for libav/ffmpeg?

I notice he didn't keep the CCs so maybe it got missed.

https://lists.freedesktop.org/archives/mesa-dev/2016-September/128076.html




Re: [Mesa-dev] [PATCH 1/3] intel: Move Vulkan sample positions to common code

2016-09-13 Thread Anuj Phogat
On Mon, Sep 12, 2016 at 3:50 PM, Jason Ekstrand  wrote:
> Signed-off-by: Jason Ekstrand 
> ---
>  .../genX_multisample.h => common/gen_sample_positions.h} | 10 +-
>  src/intel/vulkan/genX_blorp_exec.c   | 10 +-
>  src/intel/vulkan/genX_pipeline_util.h| 10 +-
>  src/intel/vulkan/genX_state.c| 12 
> ++--
>  4 files changed, 21 insertions(+), 21 deletions(-)
>  rename src/intel/{vulkan/genX_multisample.h => 
> common/gen_sample_positions.h} (94%)
>
> diff --git a/src/intel/vulkan/genX_multisample.h 
> b/src/intel/common/gen_sample_positions.h
> similarity index 94%
> rename from src/intel/vulkan/genX_multisample.h
> rename to src/intel/common/gen_sample_positions.h
> index 0deb48f..0411bf0 100644
> --- a/src/intel/vulkan/genX_multisample.h
> +++ b/src/intel/common/gen_sample_positions.h
> @@ -22,17 +22,17 @@
>   */
>  #pragma once
>
> -#define SAMPLE_POS_1X(prefix) \
> +#define GEN_SAMPLE_POS_1X(prefix) \
>  prefix##0XOffset   = 0.5; \
>  prefix##0YOffset   = 0.5;
>
> -#define SAMPLE_POS_2X(prefix) \
> +#define GEN_SAMPLE_POS_2X(prefix) \
>  prefix##0XOffset   = 0.25; \
>  prefix##0YOffset   = 0.25; \
>  prefix##1XOffset   = 0.75; \
>  prefix##1YOffset   = 0.75;
>
> -#define SAMPLE_POS_4X(prefix) \
> +#define GEN_SAMPLE_POS_4X(prefix) \
>  prefix##0XOffset   = 0.375; \
>  prefix##0YOffset   = 0.125; \
>  prefix##1XOffset   = 0.875; \
> @@ -42,7 +42,7 @@ prefix##2YOffset   = 0.625; \
>  prefix##3XOffset   = 0.625; \
>  prefix##3YOffset   = 0.875;
>
> -#define SAMPLE_POS_8X(prefix) \
> +#define GEN_SAMPLE_POS_8X(prefix) \
>  prefix##0XOffset   = 0.5625; \
>  prefix##0YOffset   = 0.3125; \
>  prefix##1XOffset   = 0.4375; \
> @@ -60,7 +60,7 @@ prefix##6YOffset   = 0.9375; \
>  prefix##7XOffset   = 0.9375; \
>  prefix##7YOffset   = 0.0625;
>
> -#define SAMPLE_POS_16X(prefix) \
> +#define GEN_SAMPLE_POS_16X(prefix) \
>  prefix##0XOffset   = 0.5625; \
>  prefix##0YOffset   = 0.5625; \
>  prefix##1XOffset   = 0.4375; \
> diff --git a/src/intel/vulkan/genX_blorp_exec.c 
> b/src/intel/vulkan/genX_blorp_exec.c
> index 889c423..5a08ed3 100644
> --- a/src/intel/vulkan/genX_blorp_exec.c
> +++ b/src/intel/vulkan/genX_blorp_exec.c
> @@ -24,7 +24,6 @@
>  #include 
>
>  #include "anv_private.h"
> -#include "genX_multisample.h"
>
>  /* These are defined in anv_private.h and blorp_genX_exec.h */
>  #undef __gen_address_type
> @@ -32,6 +31,7 @@
>  #undef __gen_combine_address
>
>  #include "common/gen_l3_config.h"
> +#include "common/gen_sample_positions.h"
>  #include "blorp/blorp_genX_exec.h"
>
>  static void *
> @@ -164,16 +164,16 @@ blorp_emit_3dstate_multisample(struct blorp_batch 
> *batch, unsigned samples)
>
>switch (samples) {
>case 1:
> - SAMPLE_POS_1X(ms.Sample);
> + GEN_SAMPLE_POS_1X(ms.Sample);
>   break;
>case 2:
> - SAMPLE_POS_2X(ms.Sample);
> + GEN_SAMPLE_POS_2X(ms.Sample);
>   break;
>case 4:
> - SAMPLE_POS_4X(ms.Sample);
> + GEN_SAMPLE_POS_4X(ms.Sample);
>   break;
>case 8:
> - SAMPLE_POS_8X(ms.Sample);
> + GEN_SAMPLE_POS_8X(ms.Sample);
>   break;
>default:
>   break;
> diff --git a/src/intel/vulkan/genX_pipeline_util.h 
> b/src/intel/vulkan/genX_pipeline_util.h
> index 2c0bf3f..0ff92f1 100644
> --- a/src/intel/vulkan/genX_pipeline_util.h
> +++ b/src/intel/vulkan/genX_pipeline_util.h
> @@ -22,8 +22,8 @@
>   */
>
>  #include "common/gen_l3_config.h"
> +#include "common/gen_sample_positions.h"
>  #include "vk_format_info.h"
> -#include "genX_multisample.h"
>
>  static uint32_t
>  vertex_element_comp_control(enum isl_format format, unsigned comp)
> @@ -610,16 +610,16 @@ emit_ms_state(struct anv_pipeline *pipeline,
>
>switch (samples) {
>case 1:
> - SAMPLE_POS_1X(ms.Sample);
> + GEN_SAMPLE_POS_1X(ms.Sample);
>   break;
>case 2:
> - SAMPLE_POS_2X(ms.Sample);
> + GEN_SAMPLE_POS_2X(ms.Sample);
>   break;
>case 4:
> - SAMPLE_POS_4X(ms.Sample);
> + GEN_SAMPLE_POS_4X(ms.Sample);
>   break;
>case 8:
> - SAMPLE_POS_8X(ms.Sample);
> + GEN_SAMPLE_POS_8X(ms.Sample);
>   break;
>default:
>   break;
> diff --git a/src/intel/vulkan/genX_state.c b/src/intel/vulkan/genX_state.c
> index 2849b50..a6d405d 100644
> --- a/src/intel/vulkan/genX_state.c
> +++ b/src/intel/vulkan/genX_state.c
> @@ -28,8 +28,8 @@
>  #include 
>
>  #include "anv_private.h"
> -#include "genX_multisample.h"
>
> +#include "common/gen_sample_positions.h"
>  #include "genxml/gen_macros.h"
>  #include "genxml/genX_pack.h"
>
> @@ -77,12 +77,12 @@ genX(init_device_state)(struct anv_device *device)
>  * VkPhysicalDeviceFeatures::standardSampleLocations.
>  */
> anv_batch_emit(&batch, GENX(3DSTATE_SAMPLE_PATTERN), sp) {
> -  SAMPLE_POS_1X(sp

[Mesa-dev] [PATCH v3 0/3] Make eglExportDMABUFImageMESA return corresponding offset.

2016-09-13 Thread Chuanbo Weng
This patchset makes eglExportDMABUFImageMESA return corresponding offset
of EGLImage instead of 0 on intel platform with classic dri driver (i965).

v2: Add version check of __DRIimageExtension implementation in egl loader
(Suggested by Axel Davy).

v3: Don't add version check of __DRIimageExtension implementation in egl
loader. Set the offset only when queryImage() succeeds. (Suggested by Emil
Velikov)

Chuanbo Weng (3):
  dri: add offset attribute and bump version of EGLImage extensions.
  egl: return corresponding offset of EGLImage instead of 0.
  i965: implement querying __DRI_IMAGE_ATTRIB_OFFSET.

 include/GL/internal/dri_interface.h  | 4 +++-
 src/egl/drivers/dri2/egl_dri2.c  | 8 +++-
 src/mesa/drivers/dri/i965/intel_screen.c | 9 +++--
 3 files changed, 17 insertions(+), 4 deletions(-)

-- 
1.9.1



[Mesa-dev] [PATCH v3 2/3] egl: return corresponding offset of EGLImage instead of 0.

2016-09-13 Thread Chuanbo Weng
The offset should not always be 0. For example, if EGLImage is
created from a 2D texture with EGL_GL_TEXTURE_LEVEL=1, then the
offset should be the actual start of miplevel 1 in bo.

v2: Add version check of __DRIimageExtension implementation
(Suggested by Axel Davy).

v3: Don't add version check of __DRIimageExtension implementation.
Set the offset only when queryImage() succeeds. (Suggested by Emil
Velikov)

Signed-off-by: Chuanbo Weng 
---
 src/egl/drivers/dri2/egl_dri2.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index bbc457c..84687de 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -2259,8 +2259,14 @@ dri2_export_dma_buf_image_mesa(_EGLDriver *drv, 
_EGLDisplay *disp, _EGLImage *im
   dri2_dpy->image->queryImage(dri2_img->dri_image,
  __DRI_IMAGE_ATTRIB_STRIDE, strides);
 
-   if (offsets)
+   if (offsets) {
   offsets[0] = 0;
+  EGLint img_offset = 0;
+  bool ret = dri2_dpy->image->queryImage(dri2_img->dri_image,
+ __DRI_IMAGE_ATTRIB_OFFSET, &img_offset);
+  if(ret == true)
+ offsets[0] = img_offset;
+   }
 
return EGL_TRUE;
 }
-- 
1.9.1
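The v3 approach (default to 0, overwrite only when the query succeeds) can be sketched on its own. Everything below is a hypothetical simplification: `struct image`, `query_image()` and `export_offset()` stand in for the DRI image, `queryImage()` and the export path, and `supports_offset` models a driver that does not know the new attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute tokens for this sketch. */
enum { ATTRIB_STRIDE = 1, ATTRIB_OFFSET = 2 };

struct image {
   int stride;
   int offset;
   bool supports_offset;  /* false models a driver without the attribute */
};

/* Returns true and writes *value only for attributes the driver knows. */
bool query_image(const struct image *img, int attrib, int *value)
{
   switch (attrib) {
   case ATTRIB_STRIDE:
      *value = img->stride;
      return true;
   case ATTRIB_OFFSET:
      if (!img->supports_offset)
         return false;
      *value = img->offset;
      return true;
   default:
      return false;
   }
}

/* Keeps the old behaviour (offset 0) unless the query succeeds. */
void export_offset(const struct image *img, int *offsets)
{
   int img_offset = 0;

   assert(img != NULL);
   if (offsets == NULL)
      return;

   offsets[0] = 0;
   if (query_image(img, ATTRIB_OFFSET, &img_offset))
      offsets[0] = img_offset;
}
```

Since an unknown attribute simply makes the query fail, the caller gets a sensible result from older drivers without an explicit version check.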



[Mesa-dev] [PATCH v3 1/3] dri: add offset attribute and bump version of EGLImage extensions.

2016-09-13 Thread Chuanbo Weng
Offset is useful for buffer sharing with other components, so add
it to queryImage attributes.

Signed-off-by: Chuanbo Weng 
---
 include/GL/internal/dri_interface.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/GL/internal/dri_interface.h 
b/include/GL/internal/dri_interface.h
index 1c73cce..d0b1bc6 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -1094,7 +1094,7 @@ struct __DRIdri2ExtensionRec {
  * extensions.
  */
 #define __DRI_IMAGE "DRI_IMAGE"
-#define __DRI_IMAGE_VERSION 12
+#define __DRI_IMAGE_VERSION 13
 
 /**
  * These formats correspond to the similarly named MESA_FORMAT_*
@@ -1208,6 +1208,8 @@ struct __DRIdri2ExtensionRec {
 #define __DRI_IMAGE_ATTRIB_FOURCC   0x2008 /* available in versions 11 */
 #define __DRI_IMAGE_ATTRIB_NUM_PLANES   0x2009 /* available in versions 11 */
 
+#define __DRI_IMAGE_ATTRIB_OFFSET 0x200A /* available in versions 13 */
+
 enum __DRIYUVColorSpace {
__DRI_YUV_COLOR_SPACE_UNDEFINED = 0,
__DRI_YUV_COLOR_SPACE_ITU_REC601 = 0x327F,
-- 
1.9.1



[Mesa-dev] [PATCH v3 3/3] i965: implement querying __DRI_IMAGE_ATTRIB_OFFSET.

2016-09-13 Thread Chuanbo Weng
Implement querying this attribute in intelImageExtension and bump
version of intelImageExtension.

Signed-off-by: Chuanbo Weng 
---
 src/mesa/drivers/dri/i965/intel_screen.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index a3d252d..8c75e61 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -609,6 +609,9 @@ intel_query_image(__DRIimage *image, int attrib, int *value)
case __DRI_IMAGE_ATTRIB_NUM_PLANES:
   *value = 1;
   return true;
+   case __DRI_IMAGE_ATTRIB_OFFSET:
+  *value = image->offset;
+  return true;
 
   default:
   return false;
@@ -845,7 +848,7 @@ intel_from_planar(__DRIimage *parent, int plane, void 
*loaderPrivate)
 }
 
 static const __DRIimageExtension intelImageExtension = {
-.base = { __DRI_IMAGE, 11 },
+.base = { __DRI_IMAGE, 13 },
 
 .createImageFromName= intel_create_image_from_name,
 .createImageFromRenderbuffer= intel_create_image_from_renderbuffer,
@@ -860,7 +863,9 @@ static const __DRIimageExtension intelImageExtension = {
 .createImageFromFds = intel_create_image_from_fds,
 .createImageFromDmaBufs = intel_create_image_from_dma_bufs,
 .blitImage  = NULL,
-.getCapabilities= NULL
+.getCapabilities= NULL,
+.mapImage   = NULL,
+.unmapImage = NULL,
 };
 
 static int
-- 
1.9.1



[Mesa-dev] [PATCH 3/6] radeonsi: rename get_sampler_desc -> load_sampler_desc

2016-09-13 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_shader.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 696f67b..3f77714 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4298,23 +4298,23 @@ enum desc_type {
 
 static LLVMTypeRef const_array(LLVMTypeRef elem_type, int num_elements)
 {
return LLVMPointerType(LLVMArrayType(elem_type, num_elements),
   CONST_ADDR_SPACE);
 }
 
 /**
  * Load an image view, fmask view. or sampler state descriptor.
  */
-static LLVMValueRef get_sampler_desc_custom(struct si_shader_context *ctx,
-   LLVMValueRef list, LLVMValueRef 
index,
-   enum desc_type type)
+static LLVMValueRef load_sampler_desc_custom(struct si_shader_context *ctx,
+LLVMValueRef list, LLVMValueRef 
index,
+enum desc_type type)
 {
struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm;
LLVMBuilderRef builder = gallivm->builder;
 
switch (type) {
case DESC_IMAGE:
/* The image is at [0:7]. */
index = LLVMBuildMul(builder, index, LLVMConstInt(ctx->i32, 2, 
0), "");
break;
case DESC_FMASK:
@@ -4327,27 +4327,27 @@ static LLVMValueRef get_sampler_desc_custom(struct 
si_shader_context *ctx,
index = LLVMBuildMul(builder, index, LLVMConstInt(ctx->i32, 4, 
0), "");
index = LLVMBuildAdd(builder, index, LLVMConstInt(ctx->i32, 3, 
0), "");
list = LLVMBuildPointerCast(builder, list,
const_array(ctx->v4i32, 0), "");
break;
}
 
return build_indexed_load_const(ctx, list, index);
 }
 
-static LLVMValueRef get_sampler_desc(struct si_shader_context *ctx,
+static LLVMValueRef load_sampler_desc(struct si_shader_context *ctx,
 LLVMValueRef index, enum desc_type type)
 {
LLVMValueRef list = LLVMGetParam(ctx->radeon_bld.main_fn,
 SI_PARAM_SAMPLERS);
 
-   return get_sampler_desc_custom(ctx, list, index, type);
+   return load_sampler_desc_custom(ctx, list, index, type);
 }
 
 /* Disable anisotropic filtering if BASE_LEVEL == LAST_LEVEL.
  *
  * SI-CI:
  *   If BASE_LEVEL == LAST_LEVEL, the shader must disable anisotropic
  *   filtering manually. The driver sets img7 to a mask clearing
  *   MAX_ANISO_RATIO if BASE_LEVEL == LAST_LEVEL. The shader must do:
  * s_and_b32 samp0, samp0, img7
  *
@@ -4388,31 +4388,31 @@ static void tex_fetch_ptrs(
 
if (emit_data->inst->Src[sampler_src].Register.Indirect) {
const struct tgsi_full_src_register *reg = 
&emit_data->inst->Src[sampler_src];
LLVMValueRef ind_index;
 
ind_index = get_bounded_indirect_index(ctx,
   &reg->Indirect,
   reg->Register.Index,
   SI_NUM_SAMPLERS);
 
-   *res_ptr = get_sampler_desc(ctx, ind_index, DESC_IMAGE);
+   *res_ptr = load_sampler_desc(ctx, ind_index, DESC_IMAGE);
 
if (target == TGSI_TEXTURE_2D_MSAA ||
target == TGSI_TEXTURE_2D_ARRAY_MSAA) {
if (samp_ptr)
*samp_ptr = NULL;
if (fmask_ptr)
-   *fmask_ptr = get_sampler_desc(ctx, ind_index, DESC_FMASK);
+   *fmask_ptr = load_sampler_desc(ctx, ind_index, DESC_FMASK);
} else {
if (samp_ptr) {
-   *samp_ptr = get_sampler_desc(ctx, ind_index, DESC_SAMPLER);
+   *samp_ptr = load_sampler_desc(ctx, ind_index, DESC_SAMPLER);
*samp_ptr = sici_fix_sampler_aniso(ctx, 
*res_ptr, *samp_ptr);
}
if (fmask_ptr)
*fmask_ptr = NULL;
}
} else {
*res_ptr = ctx->sampler_views[sampler_index];
if (samp_ptr)
*samp_ptr = ctx->sampler_states[sampler_index];
if (fmask_ptr)
@@ -5907,29 +5907,29 @@ static void preload_samplers(struct si_shader_context *ctx)
LLVMValueRef offset;
 
if (num_samplers == 0)
return;
 
/* Load the resources and samplers, we rely on the code sinking to do 
the rest */
for (i = 0; i < num_samplers; ++i) {
/* Resource */
offset = lp_build_co
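
For anyone following the index arithmetic in load_sampler_desc_custom above, here is a minimal sketch in plain C. It is not driver code. It assumes the sampler list is viewed as an array of 8-dword elements for images (element index * 2) and, after the pointer cast to v4i32, as an array of 4-dword elements for sampler states (element index * 4 + 3), which is what the quoted hunks suggest. The FMASK case is cut off by the hunk boundary in this message, so it is left out. The enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical names for illustration; only the arithmetic mirrors the patch. */
enum sketch_desc_type { SKETCH_DESC_IMAGE, SKETCH_DESC_SAMPLER };

/* Return the first dword of the descriptor for `slot` within the list. */
static unsigned sketch_desc_first_dword(unsigned slot, enum sketch_desc_type type)
{
    switch (type) {
    case SKETCH_DESC_IMAGE:
        /* Assumed view: 8-dword elements; element index is slot * 2. */
        return (slot * 2) * 8;
    case SKETCH_DESC_SAMPLER:
        /* Assumed view: 4-dword elements (v4i32); element index is slot * 4 + 3. */
        return (slot * 4 + 3) * 4;
    }
    assert(!"unknown descriptor type");
    return 0;
}
```

Under those assumptions every slot spans 16 dwords, with the image descriptor starting at dword 0 of the slot and the sampler state starting at dword 12.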

[Mesa-dev] [PATCH 4/6] radeonsi: get rid of img/buf/sampler descriptor preloading

2016-09-13 Thread Marek Olšák
From: Marek Olšák 

26011 shaders in 14651 tests
Totals:
SGPRS: 1251920 -> 1152636 (-7.93 %)
VGPRS: 728421 -> 728198 (-0.03 %)
Spilled SGPRs: 16644 -> 3776 (-77.31 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 36001064 -> 35835152 (-0.46 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 21 -> 222372 (0.07 %)
Wait states: 0 -> 0 (0.00 %)
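
The percentages in the totals above are plain relative changes. A small sketch for re-deriving them, assuming the usual (after - before) / before formula; the helper name is hypothetical and not part of any shader-db tooling:

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
static double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

For example, percent_change(16644, 3776) reproduces the -77.31 % reported for spilled SGPRs. The Max Waves line does not reproduce this way from the "before" value as archived, so that number looks truncated in this copy; the following patch in the series lists 222372 as its starting value.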
---
 src/gallium/drivers/radeonsi/si_shader.c | 123 +++
 1 file changed, 28 insertions(+), 95 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 3f77714..c96c52e 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -100,25 +100,20 @@ struct si_shader_context
 
LLVMTargetMachineRef tm;
 
unsigned invariant_load_md_kind;
unsigned range_md_kind;
unsigned uniform_md_kind;
LLVMValueRef empty_md;
 
/* Preloaded descriptors. */
LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS];
-   LLVMValueRef shader_buffers[SI_NUM_SHADER_BUFFERS];
-   LLVMValueRef sampler_views[SI_NUM_SAMPLERS];
-   LLVMValueRef sampler_states[SI_NUM_SAMPLERS];
-   LLVMValueRef fmasks[SI_NUM_SAMPLERS];
-   LLVMValueRef images[SI_NUM_IMAGES];
LLVMValueRef esgs_ring;
LLVMValueRef gsvs_ring[4];
 
LLVMValueRef lds;
LLVMValueRef gs_next_vertex[4];
LLVMValueRef return_value;
 
LLVMTypeRef voidt;
LLVMTypeRef i1;
LLVMTypeRef i8;
@@ -3420,30 +3415,32 @@ static void membar_emit(
struct si_shader_context *ctx = si_shader_context(bld_base);
 
emit_waitcnt(ctx);
 }
 
 static LLVMValueRef
 shader_buffer_fetch_rsrc(struct si_shader_context *ctx,
 const struct tgsi_full_src_register *reg)
 {
LLVMValueRef ind_index;
-   LLVMValueRef rsrc_ptr;
+   LLVMValueRef rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn,
+SI_PARAM_SHADER_BUFFERS);
 
-   if (!reg->Register.Indirect)
-   return ctx->shader_buffers[reg->Register.Index];
+   if (!reg->Register.Indirect) {
+   ind_index = LLVMConstInt(ctx->i32, reg->Register.Index, 0);
+   return build_indexed_load_const(ctx, rsrc_ptr, ind_index);
+   }
 
ind_index = get_bounded_indirect_index(ctx, &reg->Indirect,
   reg->Register.Index,
   SI_NUM_SHADER_BUFFERS);
 
-   rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn, SI_PARAM_SHADER_BUFFERS);
return build_indexed_load_const(ctx, rsrc_ptr, ind_index);
 }
 
 static bool tgsi_is_array_sampler(unsigned target)
 {
return target == TGSI_TEXTURE_1D_ARRAY ||
   target == TGSI_TEXTURE_SHADOW1D_ARRAY ||
   target == TGSI_TEXTURE_2D_ARRAY ||
   target == TGSI_TEXTURE_SHADOW2D_ARRAY ||
   target == TGSI_TEXTURE_CUBE_ARRAY ||
@@ -3493,46 +3490,54 @@ static LLVMValueRef force_dcc_off(struct si_shader_context *ctx,
  * Load the resource descriptor for \p image.
  */
 static void
 image_fetch_rsrc(
struct lp_build_tgsi_context *bld_base,
const struct tgsi_full_src_register *image,
bool dcc_off,
LLVMValueRef *rsrc)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
+   LLVMValueRef rsrc_ptr = LLVMGetParam(ctx->radeon_bld.main_fn,
+SI_PARAM_IMAGES);
 
assert(image->Register.File == TGSI_FILE_IMAGE);
 
if (!image->Register.Indirect) {
-   /* Fast path: use preloaded resources */
-   *rsrc = ctx->images[image->Register.Index];
+   struct tgsi_shader_info *info = &ctx->shader->selector->info;
+   int i = image->Register.Index;
+   LLVMValueRef index = LLVMConstInt(ctx->i32, i, 0);
+
+   /* Rely on LLVM to shrink the load for buffer resources. */
+   *rsrc = build_indexed_load_const(ctx, rsrc_ptr, index);
+
+   if (info->images_writemask & (1 << i) &&
+   !(info->images_buffers & (1 << i)))
+   *rsrc = force_dcc_off(ctx, *rsrc);
} else {
/* Indexing and manual load */
LLVMValueRef ind_index;
-   LLVMValueRef rsrc_ptr;
LLVMValueRef tmp;
 
/* From the GL_ARB_shader_image_load_store extension spec:
 *
 *If a shader performs an image load, store, or atomic
 *operation using an image variable declared as an array,
 *and if the index used to select an individual element is
 *negative or greater than or equal to the size of the
 *array, the results of the operation are undefined but may

[Mesa-dev] [PATCH 1/6] radeonsi: load streamout buffer descriptors before use

2016-09-13 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_shader.c | 67 
 1 file changed, 34 insertions(+), 33 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index be6fae7..b9ad4be 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -105,21 +105,20 @@ struct si_shader_context
unsigned uniform_md_kind;
LLVMValueRef empty_md;
 
LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS];
LLVMValueRef lds;
LLVMValueRef shader_buffers[SI_NUM_SHADER_BUFFERS];
LLVMValueRef sampler_views[SI_NUM_SAMPLERS];
LLVMValueRef sampler_states[SI_NUM_SAMPLERS];
LLVMValueRef fmasks[SI_NUM_SAMPLERS];
LLVMValueRef images[SI_NUM_IMAGES];
-   LLVMValueRef so_buffers[4];
LLVMValueRef esgs_ring;
LLVMValueRef gsvs_ring[4];
LLVMValueRef gs_next_vertex[4];
LLVMValueRef return_value;
 
LLVMTypeRef voidt;
LLVMTypeRef i1;
LLVMTypeRef i8;
LLVMTypeRef i32;
LLVMTypeRef i64;
@@ -2253,31 +2252,64 @@ static void si_dump_streamout(struct pipe_stream_output_info *so)
i, so->output[i].output_buffer,
so->output[i].dst_offset, so->output[i].dst_offset + 
so->output[i].num_components - 1,
so->output[i].register_index,
mask & 1 ? "x" : "",
mask & 2 ? "y" : "",
mask & 4 ? "z" : "",
mask & 8 ? "w" : "");
}
 }
 
+static void load_streamout_descriptors(struct si_shader_context *ctx,
+  LLVMValueRef so_buffers[4])
+{
+   struct lp_build_tgsi_context *bld_base = &ctx->radeon_bld.soa.bld_base;
+   struct gallivm_state *gallivm = bld_base->base.gallivm;
+   unsigned i;
+
+   /* Streamout can only be used if the shader is compiled as VS. */
+   if (!ctx->shader->selector->so.num_outputs ||
+   (ctx->type == PIPE_SHADER_VERTEX &&
+(ctx->shader->key.vs.as_es ||
+ ctx->shader->key.vs.as_ls)) ||
+   (ctx->type == PIPE_SHADER_TESS_EVAL &&
+ctx->shader->key.tes.as_es))
+   return;
+
+   LLVMValueRef buf_ptr = LLVMGetParam(ctx->radeon_bld.main_fn,
+   SI_PARAM_RW_BUFFERS);
+
+   /* Load the resources, we rely on the code sinking to do the rest */
+   for (i = 0; i < 4; ++i) {
+   if (ctx->shader->selector->so.stride[i]) {
+   LLVMValueRef offset = lp_build_const_int32(gallivm,
+  
SI_VS_STREAMOUT_BUF0 + i);
+
+   so_buffers[i] = build_indexed_load_const(ctx, buf_ptr, 
offset);
+   }
+   }
+}
+
 /* On SI, the vertex shader is responsible for writing streamout data
  * to buffers. */
 static void si_llvm_emit_streamout(struct si_shader_context *ctx,
   struct si_shader_output_values *outputs,
   unsigned noutput)
 {
struct pipe_stream_output_info *so = &ctx->shader->selector->so;
struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm;
LLVMBuilderRef builder = gallivm->builder;
int i, j;
struct lp_build_if_state if_ctx;
+   LLVMValueRef so_buffers[4];
+
+   load_streamout_descriptors(ctx, so_buffers);
 
/* Get bits [22:16], i.e. (so_param >> 16) & 127; */
LLVMValueRef so_vtx_count =
unpack_param(ctx, ctx->param_streamout_config, 16, 7);
 
LLVMValueRef tid = get_thread_id(ctx);
 
/* can_emit = tid < so_vtx_count; */
LLVMValueRef can_emit =
LLVMBuildICmp(builder, LLVMIntULT, tid, so_vtx_count, "");
@@ -2359,21 +2391,21 @@ static void si_llvm_emit_streamout(struct si_shader_context *ctx,
}
break;
}
 
LLVMValueRef can_emit_stream =
LLVMBuildICmp(builder, LLVMIntEQ,
  stream_id,
  lp_build_const_int32(gallivm, 
stream), "");
 
lp_build_if(&if_ctx_stream, gallivm, can_emit_stream);
-   build_tbuffer_store_dwords(ctx, ctx->so_buffers[buf_idx],
+   build_tbuffer_store_dwords(ctx, so_buffers[buf_idx],
   vdata, num_comps,
   so_write_offset[buf_idx],
   LLVMConstInt(ctx->i32, 0, 0),
   so->output[i].dst_offset*4);
lp_build_endif(&if_ctx_st
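
The "Get bits [22:16], i.e. (so_param >> 16) & 127" comment in si_llvm_emit_streamout describes an ordinary bitfield extract. A minimal scalar sketch of that operation in plain C, not the LLVM IR builder code the driver uses; the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Extract `bitwidth` bits of `value`, starting at bit `rshift`. */
static uint32_t sketch_unpack_bits(uint32_t value, unsigned rshift, unsigned bitwidth)
{
    assert(bitwidth >= 1 && rshift + bitwidth <= 32);

    /* Build the mask without shifting by 32, which would be undefined. */
    uint32_t mask = bitwidth == 32 ? UINT32_MAX
                                   : ((UINT32_C(1) << bitwidth) - 1);
    return (value >> rshift) & mask;
}
```

With rshift = 16 and bitwidth = 7 this yields the vertex count that the quoted code compares against the thread id (can_emit = tid < so_vtx_count).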

[Mesa-dev] [PATCH 6/6] radeonsi: reload PS inputs with direct indexing at each use

2016-09-13 Thread Marek Olšák
From: Marek Olšák 

The LLVM compiler can CSE interp intrinsics thanks to
LLVMReadNoneAttribute.

26011 shaders in 14651 tests
Totals:
SGPRS: 1146340 -> 1132676 (-1.19 %)
VGPRS: 727371 -> 711730 (-2.15 %)
Spilled SGPRs: 2218 -> 2078 (-6.31 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 35841268 -> 36009732 (0.47 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 222559 -> 224779 (1.00 %)
Wait states: 0 -> 0 (0.00 %)
---
 src/gallium/drivers/radeon/radeon_llvm.h   |  6 -
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 28 ++
 src/gallium/drivers/radeonsi/si_shader.c   | 27 +
 3 files changed, 39 insertions(+), 22 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_llvm.h b/src/gallium/drivers/radeon/radeon_llvm.h
index da5b7f5..f508d32 100644
--- a/src/gallium/drivers/radeon/radeon_llvm.h
+++ b/src/gallium/drivers/radeon/radeon_llvm.h
@@ -23,21 +23,23 @@
  * Authors: Tom Stellard 
  *
  */
 
 #ifndef RADEON_LLVM_H
 #define RADEON_LLVM_H
 
 #include 
 #include "gallivm/lp_bld_init.h"
 #include "gallivm/lp_bld_tgsi.h"
+#include "tgsi/tgsi_parse.h"
 
+#define RADEON_LLVM_MAX_INPUT_SLOTS 32
 #define RADEON_LLVM_MAX_INPUTS 32 * 4
 #define RADEON_LLVM_MAX_OUTPUTS 32 * 4
 
 #define RADEON_LLVM_INITIAL_CF_DEPTH 4
 
 #define RADEON_LLVM_MAX_SYSTEM_VALUES 4
 
 struct radeon_llvm_branch {
LLVMBasicBlockRef endif_block;
LLVMBasicBlockRef if_block;
@@ -55,33 +57,35 @@ struct radeon_llvm_context {
 
/*=== Front end configuration ===*/
 
/* Instructions that are not described by any of the TGSI opcodes. */
 
/** This function is responsible for initilizing the inputs array and 
will be
  * called once for each input declared in the TGSI shader.
  */
void (*load_input)(struct radeon_llvm_context *,
   unsigned input_index,
-  const struct tgsi_full_declaration *decl);
+  const struct tgsi_full_declaration *decl,
+  LLVMValueRef out[4]);
 
void (*load_system_value)(struct radeon_llvm_context *,
  unsigned index,
  const struct tgsi_full_declaration *decl);
 
void (*declare_memory_region)(struct radeon_llvm_context *,
  const struct tgsi_full_declaration *decl);
 
/** This array contains the input values for the shader.  Typically 
these
  * values will be in the form of a target intrinsic that will inform 
the
  * backend how to load the actual inputs to the shader. 
  */
+   struct tgsi_full_declaration input_decls[RADEON_LLVM_MAX_INPUT_SLOTS];
LLVMValueRef inputs[RADEON_LLVM_MAX_INPUTS];
LLVMValueRef outputs[RADEON_LLVM_MAX_OUTPUTS][TGSI_NUM_CHANNELS];
 
/** This pointer is used to contain the temporary values.
  * The amount of temporary used in tgsi can't be bound to a max value 
and
  * thus we must allocate this array at runtime.
  */
LLVMValueRef *temps;
unsigned temps_count;
LLVMValueRef system_values[RADEON_LLVM_MAX_SYSTEM_VALUES];
diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 4643e6d..11f0cf2 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -439,28 +439,43 @@ LLVMValueRef radeon_llvm_emit_fetch(struct lp_build_tgsi_context *bld_base,
bld_base->int_bld.zero);
result = LLVMConstInsertElement(result,

bld->immediates[reg->Register.Index][swizzle + 1],
bld_base->int_bld.one);
return LLVMConstBitCast(result, ctype);
} else {
return 
LLVMConstBitCast(bld->immediates[reg->Register.Index][swizzle], ctype);
}
}
 
-   case TGSI_FILE_INPUT:
-   result = 
ctx->inputs[radeon_llvm_reg_index_soa(reg->Register.Index, swizzle)];
+   case TGSI_FILE_INPUT: {
+   unsigned index = reg->Register.Index;
+   LLVMValueRef input[4];
+
+   /* I don't think doing this for vertex shaders is beneficial.
+* For those, we want to make sure the VMEM loads are executed
+* only once. Fragment shaders don't care much, because
+* v_interp instructions are much cheaper than VMEM loads.
+*/
+   if (ctx->soa.bld_base.info->processor == PIPE_SHADER_FRAGMENT)
+   ctx->load_input(ctx, index, &ctx->input_decls[index], 
input);
+   else
+   
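
The commit message relies on the compiler merging repeated interp intrinsics. As a toy illustration of why a side-effect-free operation can be emitted at every use and still end up computed once, here is a sketch in plain C. It is not LLVM code; LLVM's own passes are more involved (they must respect dominance, for instance). All names are hypothetical.

```c
#include <assert.h>

#define SKETCH_MAX_OPS 64

/* A toy "instruction": interpolate attribute `attr`, channel `chan`. */
struct sketch_op {
    unsigned attr;
    unsigned chan;
};

struct sketch_builder {
    struct sketch_op ops[SKETCH_MAX_OPS];
    unsigned num_ops;
};

/* Emit an interpolation op and return its id. The op is assumed to have no
 * side effects and to depend only on its operands (the property a readnone
 * attribute asserts), so an identical earlier op can be reused. */
static unsigned sketch_emit_interp(struct sketch_builder *b,
                                   unsigned attr, unsigned chan)
{
    for (unsigned i = 0; i < b->num_ops; i++) {
        if (b->ops[i].attr == attr && b->ops[i].chan == chan)
            return i; /* same operands: reuse the existing op */
    }

    assert(b->num_ops < SKETCH_MAX_OPS);
    b->ops[b->num_ops].attr = attr;
    b->ops[b->num_ops].chan = chan;
    return b->num_ops++;
}
```

Requesting the same (attr, chan) pair twice returns the same id and adds only one op to the builder, which is the effect the patch expects for fragment shader inputs reloaded at each use.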

[Mesa-dev] [PATCH 2/6] radeonsi: cosmetic changes in si_shader.c

2016-09-13 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_shader.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index b9ad4be..696f67b 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -98,29 +98,31 @@ struct si_shader_context
 */
int param_tess_offchip;
 
LLVMTargetMachineRef tm;
 
unsigned invariant_load_md_kind;
unsigned range_md_kind;
unsigned uniform_md_kind;
LLVMValueRef empty_md;
 
+   /* Preloaded descriptors. */
LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS];
-   LLVMValueRef lds;
LLVMValueRef shader_buffers[SI_NUM_SHADER_BUFFERS];
LLVMValueRef sampler_views[SI_NUM_SAMPLERS];
LLVMValueRef sampler_states[SI_NUM_SAMPLERS];
LLVMValueRef fmasks[SI_NUM_SAMPLERS];
LLVMValueRef images[SI_NUM_IMAGES];
LLVMValueRef esgs_ring;
LLVMValueRef gsvs_ring[4];
+
+   LLVMValueRef lds;
LLVMValueRef gs_next_vertex[4];
LLVMValueRef return_value;
 
LLVMTypeRef voidt;
LLVMTypeRef i1;
LLVMTypeRef i8;
LLVMTypeRef i32;
LLVMTypeRef i64;
LLVMTypeRef i128;
LLVMTypeRef f32;
@@ -5856,21 +5858,21 @@ static void create_function(struct si_shader_context *ctx)
LLVMArrayType(ctx->i32, 64),
"ddxy_lds",
LOCAL_ADDR_SPACE);
 
if ((ctx->type == PIPE_SHADER_VERTEX && shader->key.vs.as_ls) ||
ctx->type == PIPE_SHADER_TESS_CTRL ||
ctx->type == PIPE_SHADER_TESS_EVAL)
declare_tess_lds(ctx);
 }
 
-static void preload_constants(struct si_shader_context *ctx)
+static void preload_constant_buffers(struct si_shader_context *ctx)
 {
struct lp_build_tgsi_context *bld_base = &ctx->radeon_bld.soa.bld_base;
struct gallivm_state *gallivm = bld_base->base.gallivm;
const struct tgsi_shader_info *info = bld_base->info;
unsigned buf;
LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, 
SI_PARAM_CONST_BUFFERS);
 
for (buf = 0; buf < SI_NUM_CONST_BUFFERS; buf++) {
if (info->const_file_max[buf] == -1)
continue;
@@ -6790,21 +6792,21 @@ int si_compile_tgsi_shader(struct si_screen *sscreen,
case PIPE_SHADER_COMPUTE:
ctx.radeon_bld.declare_memory_region = declare_compute_memory;
break;
default:
assert(!"Unsupported shader type");
return -1;
}
 
create_meta_data(&ctx);
create_function(&ctx);
-   preload_constants(&ctx);
+   preload_constant_buffers(&ctx);
preload_shader_buffers(&ctx);
preload_samplers(&ctx);
preload_images(&ctx);
preload_ring_buffers(&ctx);
 
if (ctx.is_monolithic && sel->type == PIPE_SHADER_FRAGMENT &&
shader->key.ps.prolog.poly_stipple) {
LLVMValueRef list = LLVMGetParam(ctx.radeon_bld.main_fn,
 SI_PARAM_RW_BUFFERS);
si_llvm_emit_polygon_stipple(&ctx, list,
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/6] radeonsi: get rid of constant buffer preloading

2016-09-13 Thread Marek Olšák
From: Marek Olšák 

26011 shaders in 14651 tests
Totals:
SGPRS: 1152636 -> 1146340 (-0.55 %)
VGPRS: 728198 -> 727371 (-0.11 %)
Spilled SGPRs: 3776 -> 2218 (-41.26 %)
Spilled VGPRs: 369 -> 369 (0.00 %)
Scratch VGPRs: 1344 -> 1344 (0.00 %) dwords per thread
Code Size: 35835152 -> 35841268 (0.02 %) bytes
LDS: 767 -> 767 (0.00 %) blocks
Max Waves: 222372 -> 222559 (0.08 %)
Wait states: 0 -> 0 (0.00 %)
---
 src/gallium/drivers/radeonsi/si_shader.c | 38 
 1 file changed, 14 insertions(+), 24 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index c96c52e..faa5363 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -99,21 +99,20 @@ struct si_shader_context
int param_tess_offchip;
 
LLVMTargetMachineRef tm;
 
unsigned invariant_load_md_kind;
unsigned range_md_kind;
unsigned uniform_md_kind;
LLVMValueRef empty_md;
 
/* Preloaded descriptors. */
-   LLVMValueRef const_buffers[SI_NUM_CONST_BUFFERS];
LLVMValueRef esgs_ring;
LLVMValueRef gsvs_ring[4];
 
LLVMValueRef lds;
LLVMValueRef gs_next_vertex[4];
LLVMValueRef return_value;
 
LLVMTypeRef voidt;
LLVMTypeRef i1;
LLVMTypeRef i8;
@@ -1842,20 +1841,29 @@ static void declare_compute_memory(struct radeon_llvm_context *radeon_bld,
 
var = LLVMAddGlobalInAddressSpace(gallivm->module,
  LLVMArrayType(ctx->i8, 
sel->local_size),
  "compute_lds",
  LOCAL_ADDR_SPACE);
LLVMSetAlignment(var, 4);
 
ctx->shared_memory = LLVMBuildBitCast(gallivm->builder, var, i8p, "");
 }
 
+static LLVMValueRef load_const_buffer_desc(struct si_shader_context *ctx, int i)
+{
+   LLVMValueRef list_ptr = LLVMGetParam(ctx->radeon_bld.main_fn,
+SI_PARAM_CONST_BUFFERS);
+
+   return build_indexed_load_const(ctx, list_ptr,
+   LLVMConstInt(ctx->i32, i, 0));
+}
+
 static LLVMValueRef fetch_constant(
struct lp_build_tgsi_context *bld_base,
const struct tgsi_full_src_register *reg,
enum tgsi_opcode_type type,
unsigned swizzle)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
struct lp_build_context *base = &bld_base->base;
const struct tgsi_ind_register *ireg = &reg->Indirect;
unsigned buf, idx;
@@ -1869,45 +1877,46 @@ static LLVMValueRef fetch_constant(
for (chan = 0; chan < TGSI_NUM_CHANNELS; ++chan)
values[chan] = fetch_constant(bld_base, reg, type, 
chan);
 
return lp_build_gather_values(bld_base->base.gallivm, values, 
4);
}
 
buf = reg->Register.Dimension ? reg->Dimension.Index : 0;
idx = reg->Register.Index * 4 + swizzle;
 
if (!reg->Register.Indirect && !reg->Dimension.Indirect) {
-   LLVMValueRef c0, c1;
+   LLVMValueRef c0, c1, desc;
 
-   c0 = buffer_load_const(ctx, ctx->const_buffers[buf],
+   desc = load_const_buffer_desc(ctx, buf);
+   c0 = buffer_load_const(ctx, desc,
   LLVMConstInt(ctx->i32, idx * 4, 0));
 
if (!tgsi_type_is_64bit(type))
return bitcast(bld_base, type, c0);
else {
-   c1 = buffer_load_const(ctx, ctx->const_buffers[buf],
+   c1 = buffer_load_const(ctx, desc,
   LLVMConstInt(ctx->i32,
(idx + 1) * 4, 0));
return radeon_llvm_emit_fetch_64bit(bld_base, type,
c0, c1);
}
}
 
if (reg->Register.Dimension && reg->Dimension.Indirect) {
LLVMValueRef ptr = LLVMGetParam(ctx->radeon_bld.main_fn, 
SI_PARAM_CONST_BUFFERS);
LLVMValueRef index;
index = get_bounded_indirect_index(ctx, &reg->DimIndirect,
   reg->Dimension.Index,
   SI_NUM_CONST_BUFFERS);
bufp = build_indexed_load_const(ctx, ptr, index);
} else
-   bufp = ctx->const_buffers[buf];
+   bufp = load_const_buffer_desc(ctx, buf);
 
addr = ctx->radeon_bld.soa.addr[ireg->Index][ireg->Swizzle];
addr = LLVMBuildLoad(base->gallivm->builder, addr, "load addr reg");
addr = lp_build_mul_imm(&bld_base->uint_bld, addr, 16);
addr = lp_build_add(&bld_base->uint_bld, addr,
lp_build_const_int32(base->gallivm, idx * 4));
 
result = buffer_
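
The offset arithmetic in fetch_constant above can be restated with plain integers. This is a sketch of the address computation only, taken from the quoted lines (idx = Index * 4 + swizzle, byte offset idx * 4, address register scaled by 16); the function names are hypothetical and nothing here touches LLVM.

```c
#include <assert.h>

/* Byte offset of one 32-bit channel of a vec4 constant (direct indexing). */
static unsigned sketch_const_byte_offset(unsigned reg_index, unsigned swizzle)
{
    assert(swizzle < 4);
    unsigned idx = reg_index * 4 + swizzle; /* dword index into the buffer */
    return idx * 4;                         /* 4 bytes per dword */
}

/* With indirect addressing, the address register value is scaled by
 * 16 bytes (one vec4) and added to the direct offset. */
static unsigned sketch_const_byte_offset_indirect(unsigned addr_reg,
                                                  unsigned reg_index,
                                                  unsigned swizzle)
{
    return addr_reg * 16 + sketch_const_byte_offset(reg_index, swizzle);
}
```

For 64-bit types the quoted code loads a second dword at (idx + 1) * 4, i.e. 4 bytes past the offset returned by the first helper.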
