date:20160408

[Mesa-dev] r600g,radeonsi: ARB_query_buffer_object grunt work

2016-04-08 Thread Edward O'Callaghan

These two patches provide just the grunt work for ARB_query_buffer_object
support. My incomplete hardware specific callback has been left out of
this series. However this does allow a user to fake the PIPE_CAP and not
just get a segfault from a unchecked derefenced to a NULL pointer from
within 'st_StoreQueryResult()'.

Edward O'Callaghan (2):
  radeon: Provision a query_ops callback,
  radeon: Implement the get_query_result_resource() call-in
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/2] radeon: Implement the get_query_result_resource() call-in site

2016-04-08 Thread Edward O'Callaghan

Implement the frontend part of the ARB_query_buffer_object
gallium callback into the driver query_ops.

Signed-off-by: Edward O'Callaghan 
---
 src/gallium/drivers/radeon/r600_query.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/gallium/drivers/radeon/r600_query.c 
b/src/gallium/drivers/radeon/r600_query.c
index 7a2d2ee..b75ad2d 100644
--- a/src/gallium/drivers/radeon/r600_query.c
+++ b/src/gallium/drivers/radeon/r600_query.c
@@ -899,6 +899,24 @@ static void r600_query_hw_add_result(struct 
r600_common_context *ctx,
}
 }
 
+static void r600_get_query_result_resource(struct pipe_context *ctx,
+  struct pipe_query *query, boolean 
wait,
+  enum pipe_query_value_type 
result_type,
+  int index, struct pipe_resource 
*resource,
+  unsigned offset)
+{
+   struct r600_common_context *rctx = (struct r600_common_context *)ctx;
+   struct r600_query *rquery = (struct r600_query *)query;
+
+   if (!rquery->ops->get_query_result_resource) {
+  assert(!"Unexpected lack of get_query_result_resource");
+  return;
+   }
+
+   rquery->ops->get_query_result_resource(rctx, rquery, wait, result_type,
+  index, resource, offset);
+}
+
 static boolean r600_get_query_result(struct pipe_context *ctx,
 struct pipe_query *query, boolean wait,
 union pipe_query_result *result)
@@ -1269,6 +1287,7 @@ void r600_query_init(struct r600_common_context *rctx)
rctx->b.begin_query = r600_begin_query;
rctx->b.end_query = r600_end_query;
rctx->b.get_query_result = r600_get_query_result;
+   rctx->b.get_query_result_resource = r600_get_query_result_resource;
rctx->render_cond_atom.emit = r600_emit_query_predication;
 
if (((struct 
r600_common_screen*)rctx->b.screen)->info.num_render_backends > 0)
-- 
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/2] radeon: Provision a query_ops callback, get_query_result_resource()

2016-04-08 Thread Edward O'Callaghan

ARB_query_buffer_object requires us to provide a query_ops
callback for statistics about the operation in the OpenGL pipeline.

Signed-off-by: Edward O'Callaghan 
---
 src/gallium/drivers/radeon/r600_query.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/gallium/drivers/radeon/r600_query.h 
b/src/gallium/drivers/radeon/r600_query.h
index 8b2c4e3..b3d4501 100644
--- a/src/gallium/drivers/radeon/r600_query.h
+++ b/src/gallium/drivers/radeon/r600_query.h
@@ -73,6 +73,11 @@ struct r600_query_ops {
boolean (*get_result)(struct r600_common_context *,
  struct r600_query *, boolean wait,
  union pipe_query_result *result);
+   void (*get_query_result_resource)(struct r600_common_context *,
+ struct r600_query *, boolean wait,
+ enum pipe_query_value_type result_type,
+ int index, struct pipe_resource *resource,
+ unsigned offset);
 };
 
 struct r600_query {
-- 
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] dri/i965: extend GLES3 sRGB workaround to cover all byte orders

2016-04-08 Thread Jason Ekstrand

On Wed, Apr 6, 2016 at 10:35 PM, Haixia Shi  wrote:

> I am afraid that I am not the original author of the
> gles3_srgb_workaround, and from the comments it appears to only apply for
> argb formats, so I am not sure if extending it to all other formats would
> cause behavioral regression.
> On Apr 6, 2016 10:13 PM, "Jason Ekstrand"  wrote:
>
>>
>> On Apr 6, 2016 7:23 PM, "Kenneth Graunke"  wrote:
>> >
>> > On Wednesday, April 6, 2016 4:43:39 PM PDT Haixia Shi wrote:
>> > > It is incorrect to assume BGRA byte order for the GLES3 sRGB
>> workaround.
>> > >
>> > > Signed-off-by: Haixia Shi 
>> > > Reviewed-by: Stéphane Marchesin 
>> > > Cc: kenneth.w.grau...@intel.com
>> > >
>> > > Change-Id: I5a081d7eaa7544afff0e7874cffef80d3f69a401
>> > > ---
>> > >  src/mesa/drivers/dri/i965/brw_context.c | 18 --
>> > >  1 file changed, 16 insertions(+), 2 deletions(-)
>> > >
>> > > diff --git a/src/mesa/drivers/dri/i965/brw_context.c
>> b/src/mesa/drivers/dri/
>> > i965/brw_context.c
>> > > index 2d480d0..cebfbda 100644
>> > > --- a/src/mesa/drivers/dri/i965/brw_context.c
>> > > +++ b/src/mesa/drivers/dri/i965/brw_context.c
>> > > @@ -1151,9 +1151,23 @@ intel_gles3_srgb_workaround(struct brw_context
>> *brw,
>> > >  */
>> > > fb->Visual.sRGBCapable = false;
>> > > for (int i = 0; i < BUFFER_COUNT; i++) {
>> > > -  if (fb->Attachment[i].Renderbuffer &&
>> > > -  fb->Attachment[i].Renderbuffer->Format ==
>> > MESA_FORMAT_B8G8R8A8_SRGB) {
>> > > +  if (!fb->Attachment[i].Renderbuffer)
>> > > + continue;
>> > > +  switch (fb->Attachment[i].Renderbuffer->Format) {
>> > > +  case MESA_FORMAT_A8B8G8R8_SRGB:
>> > > + fb->Attachment[i].Renderbuffer->Format =
>> > MESA_FORMAT_A8B8G8R8_UNORM;
>> > > + break;
>> > > +  case MESA_FORMAT_B8G8R8A8_SRGB:
>> > >   fb->Attachment[i].Renderbuffer->Format =
>> > MESA_FORMAT_B8G8R8A8_UNORM;
>> > > + break;
>> > > +  case MESA_FORMAT_A8R8G8B8_SRGB:
>> > > + fb->Attachment[i].Renderbuffer->Format =
>> > MESA_FORMAT_A8R8G8B8_UNORM;
>> > > + break;
>> > > +  case MESA_FORMAT_R8G8B8A8_SRGB:
>> > > + fb->Attachment[i].Renderbuffer->Format =
>> > MESA_FORMAT_R8G8B8A8_UNORM;
>> > > + break;
>> > > +  default:
>> > > + break;
>> > >}
>> > > }
>> > >  }
>> > >
>> >
>> > Why don't we simply do:
>> >
>> >struct gl_renderbuffer *rb = fb->Attachment[i].Renderbuffer;
>> >rb->Format = _mesa_get_srgb_format_linear(rb->Format);
>> >
>> > This would handle far more formats than we need to, but that shouldn't
>> > be a problem.
>>
>> I'll second that.
>>
>
Look at the comment right above the function in question, I think Ken's
suggest is exactly the right thing to do.  There's nothing involved here
that should be -specific.
--Jason
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2] glsl: handle unsigned int wraparound in link_shaders()

2016-04-08 Thread Timothy Arceri

Pushed thanks!
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] mesa: enable ARB_explicit_uniform_location for swrast

2016-04-08 Thread Timothy Arceri

This is untested but since ARB_explicit_attrib_location is enabled
there should be no problem enabling this also.
---
 src/mesa/main/extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index fa50cb6..a889902 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -132,6 +132,7 @@ _mesa_enable_sw_extensions(struct gl_context *ctx)
ctx->Extensions.ARB_draw_elements_base_vertex = GL_TRUE;
ctx->Extensions.ARB_draw_instanced = GL_TRUE;
ctx->Extensions.ARB_explicit_attrib_location = GL_TRUE;
+   ctx->Extensions.ARB_explicit_uniform_location = GL_TRUE;
ctx->Extensions.ARB_fragment_coord_conventions = GL_TRUE;
ctx->Extensions.ARB_fragment_program = GL_TRUE;
ctx->Extensions.ARB_fragment_program_shadow = GL_TRUE;
-- 
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/4] nir: Silence unused "options" warning in algebraic passes.

2016-04-08 Thread Dylan Baker

Quoting Dylan Baker (2016-04-08 14:54:38)
> Quoting Kenneth Graunke (2016-04-08 11:48:01)
> > On Friday, April 8, 2016 11:18:46 AM PDT Dylan Baker wrote: > 
> > > Quoting Eduardo Lima Mitev (2016-04-08 08:26:22)
> > > > On 04/08/2016 01:35 AM, Kenneth Graunke wrote:
> > > > > Some passes may not refer to options->..., at which point the compiler
> > > > > will warn about an unused variable. Just cast to void unconditionally
> > > > > to shut it up.
> > > > >
> > > > > Signed-off-by: Kenneth Graunke 
> > > > > ---
> > > > > src/compiler/nir/nir_algebraic.py | 1 +
> > > > > 1 file changed, 1 insertion(+)
> > > > >
> > > > > diff --git a/src/compiler/nir/nir_algebraic.py b/src/compiler/nir/
> > nir_algebraic.py
> > > > > index d05564f..53a7907 100644
> > > > > --- a/src/compiler/nir/nir_algebraic.py
> > > > > +++ b/src/compiler/nir/nir_algebraic.py
> > > > > @@ -291,6 +291,7 @@ ${pass_name}(nir_shader *shader)
> > > > > bool progress = false;
> > > > > bool condition_flags[${len(condition_list)}];
> > > > > const nir_shader_compiler_options *options = shader->options;
> > > > > + (void) options;
> > > > >
> > > >
> > > > Hmm, don't like this very much. I suppose since it is generated code, it
> > > > can not be done per block where options is unused.
> > > > For respect to other people's right to have clean build logs, patch is:
> > > >
> > > > Reviewed-by: Eduardo Lima Mitev 
> > > >
> > > > > % for index, condition in enumerate(condition_list):
> > > > > condition_flags[${index}] = ${condition};
> > > > >
> > > >
> > >
> > > I'm not familiar with the code, but looking at it options must be used
> > > in the condition_list argument?
> >  
> > Right, we create the 'options' temporary so you can write rules like:
> >  
> > (('~fmin', ('fmax', a, 0.0), 1.0), ('fsat', a), '!options->lower_fsat'),
> >  
> > where !options->lower_fsat becomes part of the condition_list. However,
> > conditions are optional. If you have an algebraic rule list which doesn't
> > include any conditions referring to options->..., the variable will be
> > unused. (For example, the list in patch 4 meets this criteria.)
> >  
> 
> So what about wrapping the definitions of options in a conditional?
> % if 'options' in conditions[-1]:
> const nir_shader_compiler_options *options = shader->options;
> % endif
> 
> This only covers that options is the last argument, which seems to be
> what nir_opt_algebraic does, although I don't know that is actually a
> hard requirement, or just a coincidence.
> 
> Dylan

Never mind. Jason and I talked about this and this might possibly work,
by accident. Please ignore.

Dylan


signature.asc
Description: signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/5] include/GL: add mesa_glinterop.h for OpenGL-OpenCL interop (v3)

2016-04-08 Thread Marek Olšák

On Fri, Apr 1, 2016 at 2:13 PM, Emil Velikov  wrote:
> On 16 March 2016 at 10:40, Marek Olšák  wrote:
>
>> "offset_after" isn't nasty. :) Yeah, I was inspired by other APIs I
>> had seen. The sizes make even more sense when they are function
>> parameters, because the caller can just do:
>> (sizeof(in), , sizeof(out), )
>>
> A nice list of arguments:
>  - If the majority of people like offset_after, the question "Why
> barely any projects use it (from a quick search) ?" comes to mind.
>  - I wasn't the only one advocating for versioned interfaces ;-)
>  - They will just work in an identical way and the code will be less.
>  - Regardless of how ugly/nasty/etc, mesa uses versioned interfaces
> throughout. "Consistency is the key" a wise man have said once.
>  - The interfaces using explicit size argument, that I'm aware of, are
> not designed with extensibility in mind.

Updated branch with your suggestions applied:
https://cgit.freedesktop.org/~mareko/mesa/log/?h=interop

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir: remove duplicate Makefile.sources

2016-04-08 Thread Rob Clark

On Fri, Apr 8, 2016 at 4:37 AM, Jose Fonseca  wrote:
> On 02/04/16 19:35, Rob Clark wrote:
>>
>> From: Rob Clark 
>>
>> Allegedly this was needed still by scons build.. but in practice it
>> doesn't seem to be needed.  Removing it and running 'scons' results
>> in no build errors.
>>
>> Signed-off-by: Rob Clark 
>> ---
>> So, afaict NIR is not even built w/ scons build (I'm just running
>> 'scons' with no args, so let me know if I'm missing some build
>> variant).  So at least if there is no scons variant that *does*
>> build NIR, I think this is the right thing to do to reduce
>> confusion.  But it brings up a bigger question of what to do
>> with my patchset which adds NIR support in mesa/st, since that
>> obviosly won't work with scons build as-is.
>>
>> I guess the two options are to try to add NIR into scons build
>> (which involves some .py generated code, so maybe not trivial)
>> or just #ifdef'ify all the mesa/st parts in my gallium-nir
>> patchset which introduce dependencies on NIR.  Opinions?
>
>
> If you might recall, I already had patches to build NIR with SCons:
>
>   http://cgit.freedesktop.org/~jrfonseca/mesa/log/?h=scons-nir
>
> I ended up not taking any action at the time.  First due to lack of time,
> second because I noticed NIR source trees being moving around, so the risk
> of exposing scons build to breakage didn't seem worthwhile.
>
> So those changes need to be rebased and probably updated a bit, but it gives
> you an idea of what needs to be done.


So, I don't suppose you could give me some hint about how you are
supposed to get path to generated header (in this case nir_opcodes.h
included from nir.h included from mesa/st) into the include path for
src/mesa?

BR,
-R
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 7/7] gallium/radeon: remove R600_QUERY_HW_FLAG_TIMER

2016-04-08 Thread Marek Olšák

From: Marek Olšák 

not used anymore
---
 src/gallium/drivers/r600/r600_hw_context.c| 2 +-
 src/gallium/drivers/radeon/r600_perfcounter.c | 1 -
 src/gallium/drivers/radeon/r600_query.c   | 4 +---
 src/gallium/drivers/radeon/r600_query.h   | 3 +--
 4 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 3ef2ac5..0c3b580 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -64,7 +64,7 @@ void r600_need_cs_space(struct r600_context *ctx, unsigned 
num_dw,
num_dw += R600_MAX_FLUSH_CS_DWORDS + R600_MAX_DRAW_CS_DWORDS;
}
 
-   /* Count in queries_suspend. */
+   /* Count in r600_suspend_queries. */
num_dw += ctx->b.num_cs_dw_queries_suspend;
 
/* Count in streamout_end at the end of CS. */
diff --git a/src/gallium/drivers/radeon/r600_perfcounter.c 
b/src/gallium/drivers/radeon/r600_perfcounter.c
index f3529a1..9ab17d9 100644
--- a/src/gallium/drivers/radeon/r600_perfcounter.c
+++ b/src/gallium/drivers/radeon/r600_perfcounter.c
@@ -310,7 +310,6 @@ struct pipe_query *r600_create_batch_query(struct 
pipe_context *ctx,
 
query->b.b.ops = _query_ops;
query->b.ops = _query_hw_ops;
-   query->b.flags = R600_QUERY_HW_FLAG_TIMER;
 
query->num_counters = num_queries;
 
diff --git a/src/gallium/drivers/radeon/r600_query.c 
b/src/gallium/drivers/radeon/r600_query.c
index aa86560..de6e37b 100644
--- a/src/gallium/drivers/radeon/r600_query.c
+++ b/src/gallium/drivers/radeon/r600_query.c
@@ -369,13 +369,11 @@ static struct pipe_query *r600_query_hw_create(struct 
r600_common_context *rctx,
query->result_size = 16;
query->num_cs_dw_begin = 8;
query->num_cs_dw_end = 8;
-   query->flags = R600_QUERY_HW_FLAG_TIMER;
break;
case PIPE_QUERY_TIMESTAMP:
query->result_size = 8;
query->num_cs_dw_end = 8;
-   query->flags = R600_QUERY_HW_FLAG_TIMER |
-  R600_QUERY_HW_FLAG_NO_START;
+   query->flags = R600_QUERY_HW_FLAG_NO_START;
break;
case PIPE_QUERY_PRIMITIVES_EMITTED:
case PIPE_QUERY_PRIMITIVES_GENERATED:
diff --git a/src/gallium/drivers/radeon/r600_query.h 
b/src/gallium/drivers/radeon/r600_query.h
index 8b2c4e3..9f3a917 100644
--- a/src/gallium/drivers/radeon/r600_query.h
+++ b/src/gallium/drivers/radeon/r600_query.h
@@ -84,8 +84,7 @@ struct r600_query {
 
 enum {
R600_QUERY_HW_FLAG_NO_START = (1 << 0),
-   R600_QUERY_HW_FLAG_TIMER = (1 << 1),
-   R600_QUERY_HW_FLAG_PREDICATE = (1 << 2),
+   R600_QUERY_HW_FLAG_PREDICATE = (1 << 1),
 };
 
 struct r600_query_hw_ops {
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 6/7] gallium/radeon: merge timer and non-timer query lists

2016-04-08 Thread Marek Olšák

From: Marek Olšák 

All of them are paused only between IBs.
---
 src/gallium/drivers/r600/r600_hw_context.c|  3 +-
 src/gallium/drivers/radeon/r600_pipe_common.c | 18 ++--
 src/gallium/drivers/radeon/r600_pipe_common.h | 19 +++-
 src/gallium/drivers/radeon/r600_query.c   | 65 ++-
 4 files changed, 23 insertions(+), 82 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 63b631a..3ef2ac5 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -65,8 +65,7 @@ void r600_need_cs_space(struct r600_context *ctx, unsigned 
num_dw,
}
 
/* Count in queries_suspend. */
-   num_dw += ctx->b.num_cs_dw_nontimer_queries_suspend +
- ctx->b.num_cs_dw_timer_queries_suspend;
+   num_dw += ctx->b.num_cs_dw_queries_suspend;
 
/* Count in streamout_end at the end of CS. */
if (ctx->b.streamout.begin_emitted) {
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 32bd6e4..f587332 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -156,14 +156,8 @@ static void r600_memory_barrier(struct pipe_context *ctx, 
unsigned flags)
 void r600_preflush_suspend_features(struct r600_common_context *ctx)
 {
/* suspend queries */
-   if (ctx->num_cs_dw_nontimer_queries_suspend) {
-   /* Since non-timer queries are suspended during blits,
-* we have to guard against double-suspends. */
-   r600_suspend_nontimer_queries(ctx);
-   ctx->nontimer_queries_suspended_by_flush = true;
-   }
-   if (!LIST_IS_EMPTY(>active_timer_queries))
-   r600_suspend_timer_queries(ctx);
+   if (!LIST_IS_EMPTY(>active_queries))
+   r600_suspend_queries(ctx);
 
ctx->streamout.suspended = false;
if (ctx->streamout.begin_emitted) {
@@ -180,12 +174,8 @@ void r600_postflush_resume_features(struct 
r600_common_context *ctx)
}
 
/* resume queries */
-   if (!LIST_IS_EMPTY(>active_timer_queries))
-   r600_resume_timer_queries(ctx);
-   if (ctx->nontimer_queries_suspended_by_flush) {
-   ctx->nontimer_queries_suspended_by_flush = false;
-   r600_resume_nontimer_queries(ctx);
-   }
+   if (!LIST_IS_EMPTY(>active_queries))
+   r600_resume_queries(ctx);
 }
 
 static void r600_flush_from_st(struct pipe_context *ctx,
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index 57af0ff..c387922 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -428,18 +428,11 @@ struct r600_common_context {
unsigned flags; /* flush flags */
 
/* Queries. */
-   /* The list of active queries. */
+   /* Maintain the list of active queries for pausing between IBs. */
int num_occlusion_queries;
int num_perfect_occlusion_queries;
-   /* Keep track of non-timer queries, because they should be suspended
-* during context flushing.
-* The timer queries (TIME_ELAPSED) shouldn't be suspended for blits,
-* but they should be suspended between IBs. */
-   struct list_headactive_nontimer_queries;
-   struct list_headactive_timer_queries;
-   unsignednum_cs_dw_nontimer_queries_suspend;
-   boolnontimer_queries_suspended_by_flush;
-   unsignednum_cs_dw_timer_queries_suspend;
+   struct list_headactive_queries;
+   unsignednum_cs_dw_queries_suspend;
/* Additional hardware info. */
unsignedbackend_mask;
unsignedmax_db; /* for OQ */
@@ -569,10 +562,8 @@ void r600_perfcounters_destroy(struct r600_common_screen 
*rscreen);
 /* r600_query.c */
 void r600_init_screen_query_functions(struct r600_common_screen *rscreen);
 void r600_query_init(struct r600_common_context *rctx);
-void r600_suspend_nontimer_queries(struct r600_common_context *ctx);
-void r600_resume_nontimer_queries(struct r600_common_context *ctx);
-void r600_suspend_timer_queries(struct r600_common_context *ctx);
-void r600_resume_timer_queries(struct r600_common_context *ctx);
+void r600_suspend_queries(struct r600_common_context *ctx);
+void r600_resume_queries(struct r600_common_context *ctx);
 void r600_query_init_backend_mask(struct r600_common_context *ctx);
 
 /* r600_streamout.c */
diff --git a/src/gallium/drivers/radeon/r600_query.c 
b/src/gallium/drivers/radeon/r600_query.c
index 7a2d2ee..aa86560 100644
---

[Mesa-dev] [PATCH 1/7] gallium/radeon: move pipeline stat context flags to common code

2016-04-08 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeon/r600_pipe_common.h | 5 -
 src/gallium/drivers/radeonsi/si_pipe.h| 3 ---
 src/gallium/drivers/radeonsi/si_state.c   | 8 
 src/gallium/drivers/radeonsi/si_state_draw.c  | 4 ++--
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index 7da7736..57af0ff 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -50,7 +50,10 @@
 #define R600_RESOURCE_FLAG_FORCE_TILING
(PIPE_RESOURCE_FLAG_DRV_PRIV << 2)
 
 #define R600_CONTEXT_STREAMOUT_FLUSH   (1u << 0)
-#define R600_CONTEXT_PRIVATE_FLAG  (1u << 1)
+/* Pipeline & streamout query controls. */
+#define R600_CONTEXT_START_PIPELINE_STATS  (1u << 1)
+#define R600_CONTEXT_STOP_PIPELINE_STATS   (1u << 2)
+#define R600_CONTEXT_PRIVATE_FLAG  (1u << 3)
 
 /* special primitive types */
 #define R600_PRIM_RECTANGLE_LIST   PIPE_PRIM_MAX
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index 8fcfcd2..f665c81 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -66,9 +66,6 @@
 /* Compute only. */
 #define SI_CONTEXT_FLUSH_WITH_INV_L2   (R600_CONTEXT_PRIVATE_FLAG << 13) /* 
TODO: merge with TC? */
 #define SI_CONTEXT_FLAG_COMPUTE(R600_CONTEXT_PRIVATE_FLAG << 
14)
-/* Pipeline & streamout query controls. */
-#define SI_CONTEXT_START_PIPELINE_STATS(R600_CONTEXT_PRIVATE_FLAG << 
15)
-#define SI_CONTEXT_STOP_PIPELINE_STATS (R600_CONTEXT_PRIVATE_FLAG << 16)
 
 #define SI_CONTEXT_FLUSH_AND_INV_FRAMEBUFFER (SI_CONTEXT_FLUSH_AND_INV_CB | \
  SI_CONTEXT_FLUSH_AND_INV_CB_META 
| \
diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 0c46425..94130a9 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -1358,11 +1358,11 @@ static void si_set_active_query_state(struct 
pipe_context *ctx, boolean enable)
 
/* Pipeline stat & streamout queries. */
if (enable) {
-   sctx->b.flags &= ~SI_CONTEXT_STOP_PIPELINE_STATS;
-   sctx->b.flags |= SI_CONTEXT_START_PIPELINE_STATS;
+   sctx->b.flags &= ~R600_CONTEXT_STOP_PIPELINE_STATS;
+   sctx->b.flags |= R600_CONTEXT_START_PIPELINE_STATS;
} else {
-   sctx->b.flags &= ~SI_CONTEXT_START_PIPELINE_STATS;
-   sctx->b.flags |= SI_CONTEXT_STOP_PIPELINE_STATS;
+   sctx->b.flags &= ~R600_CONTEXT_START_PIPELINE_STATS;
+   sctx->b.flags |= R600_CONTEXT_STOP_PIPELINE_STATS;
}
 
/* Occlusion queries. */
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
b/src/gallium/drivers/radeonsi/si_state_draw.c
index 105c5fb..40cad50 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -722,11 +722,11 @@ void si_emit_cache_flush(struct si_context *si_ctx, 
struct r600_atom *atom)
}
}
 
-   if (sctx->flags & SI_CONTEXT_START_PIPELINE_STATS) {
+   if (sctx->flags & R600_CONTEXT_START_PIPELINE_STATS) {
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
radeon_emit(cs, EVENT_TYPE(V_028A90_PIPELINESTAT_START) |
EVENT_INDEX(0));
-   } else if (sctx->flags & SI_CONTEXT_STOP_PIPELINE_STATS) {
+   } else if (sctx->flags & R600_CONTEXT_STOP_PIPELINE_STATS) {
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
radeon_emit(cs, EVENT_TYPE(V_028A90_PIPELINESTAT_STOP) |
EVENT_INDEX(0));
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] r600g: fix typo in r600 register definitions

2016-04-08 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/r600/r600d.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600d.h b/src/gallium/drivers/r600/r600d.h
index 3d223ed..ef99573 100644
--- a/src/gallium/drivers/r600/r600d.h
+++ b/src/gallium/drivers/r600/r600d.h
@@ -780,7 +780,7 @@
 #define   S_028D0C_STENCIL_COMPRESS_DISABLE(x) (((x) & 0x1) << 5)
 #define   S_028D0C_DEPTH_COMPRESS_DISABLE(x)   (((x) & 0x1) << 6)
 #define   S_028D0C_COPY_CENTROID(x)(((x) & 0x1) << 7)
-#define   S_028D0C_COPY_SAMPLE(x)  (((x) & 0x1) << 8)
+#define   S_028D0C_COPY_SAMPLE(x)  (((x) & 0x03) << 8)
 #define   S_028D0C_R700_PERFECT_ZPASS_COUNTS(x)(((x) & 0x1) << 15)
 #define   S_028D0C_CONSERVATIVE_Z_EXPORT(x)(((x) & 0x03) << 13)
 #define   G_028D0C_CONSERVATIVE_Z_EXPORT(x)(((x) >> 13) & 0x03)
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 5/7] r600g: don't manually stop queries for blitter

2016-04-08 Thread Marek Olšák

From: Marek Olšák 

r600_set_active_query_state does it better.
---
 src/gallium/drivers/r600/r600_blit.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_blit.c 
b/src/gallium/drivers/r600/r600_blit.c
index c52d5a9..7ddd4fa 100644
--- a/src/gallium/drivers/r600/r600_blit.c
+++ b/src/gallium/drivers/r600/r600_blit.c
@@ -54,8 +54,6 @@ static void r600_blitter_begin(struct pipe_context *ctx, enum 
r600_blitter_op op
 {
struct r600_context *rctx = (struct r600_context *)ctx;
 
-   r600_suspend_nontimer_queries(>b);
-
util_blitter_save_vertex_buffer_slot(rctx->blitter, 
rctx->vertex_buffer_state.vb);
util_blitter_save_vertex_elements(rctx->blitter, 
rctx->vertex_fetch_shader.cso);
util_blitter_save_vertex_shader(rctx->blitter, rctx->vs_shader);
@@ -98,7 +96,6 @@ static void r600_blitter_end(struct pipe_context *ctx)
struct r600_context *rctx = (struct r600_context *)ctx;
 
rctx->b.render_cond_force_off = false;
-   r600_resume_nontimer_queries(>b);
 }
 
 static unsigned u_max_sample(struct pipe_resource *r)
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/7] r600g: implement set_active_query_state for pausing occlusion queries

2016-04-08 Thread Marek Olšák

From: Marek Olšák 

Use ZPASS_INCREMENT_DISABLE everywhere.
---
 src/gallium/drivers/r600/evergreen_state.c   |  5 -
 src/gallium/drivers/r600/evergreend.h|  2 +-
 src/gallium/drivers/r600/r600_pipe.h |  1 +
 src/gallium/drivers/r600/r600_state.c|  6 +-
 src/gallium/drivers/r600/r600_state_common.c | 12 
 src/gallium/drivers/r600/r600d.h |  1 +
 src/gallium/drivers/radeon/r600_query.c  |  6 --
 7 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 077664d..c1b0b56 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -1802,12 +1802,15 @@ static void evergreen_emit_db_misc_state(struct 
r600_context *rctx, struct r600_
S_02800C_FORCE_HIS_ENABLE0(V_02800C_FORCE_DISABLE) |
S_02800C_FORCE_HIS_ENABLE1(V_02800C_FORCE_DISABLE);
 
-   if (rctx->b.num_occlusion_queries > 0) {
+   if (rctx->b.num_occlusion_queries > 0 &&
+   !a->occlusion_queries_disabled) {
db_count_control |= S_028004_PERFECT_ZPASS_COUNTS(1);
if (rctx->b.chip_class == CAYMAN) {
db_count_control |= 
S_028004_SAMPLE_RATE(a->log_samples);
}
db_render_override |= S_02800C_NOOP_CULL_DISABLE(1);
+   } else {
+   db_count_control |= S_028004_ZPASS_INCREMENT_DISABLE(1);
}
 
/* This is to fix a lockup when hyperz and alpha test are enabled at
diff --git a/src/gallium/drivers/r600/evergreend.h 
b/src/gallium/drivers/r600/evergreend.h
index ebe8c4a..a900458 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -1735,7 +1735,7 @@
 #define   S_028000_COPY_SAMPLE(x)  (((x) & 0x7) << 8)
 #define   S_028000_COLOR_DISABLE(x)(((x) & 0x1) << 12)
 #define R_028004_DB_COUNT_CONTROL0x00028004
-#define   S_028004_ZPASS_INCREMENT_DISABLE(((x) & 0x1) << 0)
+#define   S_028004_ZPASS_INCREMENT_DISABLE(x) (((x) & 0x1) << 0)
 #define   S_028004_PERFECT_ZPASS_COUNTS(x)(((x) & 0x1) << 1)
 #define   S_028004_SAMPLE_RATE(x) (((x) & 0x7) << 4) /* cayman 
only */
 #define R_028008_DB_DEPTH_VIEW   0x00028008
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index de3fd06..0102638 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -120,6 +120,7 @@ struct r600_db_state {
 
 struct r600_db_misc_state {
struct r600_atomatom;
+   boolocclusion_queries_disabled;
boolflush_depthstencil_through_cb;
boolflush_depth_inplace;
boolflush_stencil_inplace;
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index 62b46ce..c4de963 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -1644,12 +1644,16 @@ static void r600_emit_db_misc_state(struct r600_context 
*rctx, struct r600_atom
}
}
 
-   if (rctx->b.num_occlusion_queries > 0) {
+   if (rctx->b.num_occlusion_queries > 0 &&
+   !a->occlusion_queries_disabled) {
if (rctx->b.chip_class >= R700) {
db_render_control |= 
S_028D0C_R700_PERFECT_ZPASS_COUNTS(1);
}
db_render_override |= S_028D10_NOOP_CULL_DISABLE(1);
+   } else {
+   db_render_control |= S_028D0C_ZPASS_INCREMENT_DISABLE(1);
}
+
if (rctx->db_state.rsurf && rctx->db_state.rsurf->db_htile_surface) {
/* FORCE_OFF means HiZ/HiS are determined by DB_SHADER_CONTROL 
*/
db_render_override |= 
S_028D10_FORCE_HIZ_ENABLE(V_028D10_FORCE_OFF);
diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index 32a1049..cdb493d 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -2860,6 +2860,17 @@ static void r600_invalidate_buffer(struct pipe_context 
*ctx, struct pipe_resourc
}
 }
 
+static void r600_set_active_query_state(struct pipe_context *ctx, boolean 
enable)
+{
+   struct r600_context *rctx = (struct r600_context*)ctx;
+
+   /* Occlusion queries. */
+   if (rctx->db_misc_state.occlusion_queries_disabled != !enable) {
+   rctx->db_misc_state.occlusion_queries_disabled = !enable;
+   r600_mark_atom_dirty(rctx, >db_misc_state.atom);
+   }
+}
+
 static void r600_set_occlusion_query_state(struct pipe_context *ctx, bool 
enable)
 {
struct r600_context *rctx = (struct

[Mesa-dev] [PATCH 2/7] r600g: simplify r600_set_occlusion_query_state

2016-04-08 Thread Marek Olšák

From: Marek Olšák 

The caller does the same checking.
---
 src/gallium/drivers/r600/evergreen_state.c   | 2 +-
 src/gallium/drivers/r600/r600_pipe.h | 1 -
 src/gallium/drivers/r600/r600_state.c| 2 +-
 src/gallium/drivers/r600/r600_state_common.c | 5 +
 4 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 6595267..077664d 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -1802,7 +1802,7 @@ static void evergreen_emit_db_misc_state(struct 
r600_context *rctx, struct r600_
S_02800C_FORCE_HIS_ENABLE0(V_02800C_FORCE_DISABLE) |
S_02800C_FORCE_HIS_ENABLE1(V_02800C_FORCE_DISABLE);
 
-   if (a->occlusion_query_enabled) {
+   if (rctx->b.num_occlusion_queries > 0) {
db_count_control |= S_028004_PERFECT_ZPASS_COUNTS(1);
if (rctx->b.chip_class == CAYMAN) {
db_count_control |= 
S_028004_SAMPLE_RATE(a->log_samples);
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index cd0052a..de3fd06 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -120,7 +120,6 @@ struct r600_db_state {
 
 struct r600_db_misc_state {
struct r600_atomatom;
-   boolocclusion_query_enabled;
boolflush_depthstencil_through_cb;
boolflush_depth_inplace;
boolflush_stencil_inplace;
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index 3189a13..62b46ce 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -1644,7 +1644,7 @@ static void r600_emit_db_misc_state(struct r600_context 
*rctx, struct r600_atom
}
}
 
-   if (a->occlusion_query_enabled) {
+   if (rctx->b.num_occlusion_queries > 0) {
if (rctx->b.chip_class >= R700) {
db_render_control |= 
S_028D0C_R700_PERFECT_ZPASS_COUNTS(1);
}
diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index 82babeb..32a1049 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -2864,10 +2864,7 @@ static void r600_set_occlusion_query_state(struct 
pipe_context *ctx, bool enable
 {
struct r600_context *rctx = (struct r600_context*)ctx;
 
-   if (rctx->db_misc_state.occlusion_query_enabled != enable) {
-   rctx->db_misc_state.occlusion_query_enabled = enable;
-   r600_mark_atom_dirty(rctx, >db_misc_state.atom);
-   }
+   r600_mark_atom_dirty(rctx, >db_misc_state.atom);
 }
 
 static void r600_need_gfx_cs_space(struct pipe_context *ctx, unsigned num_dw,
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 4/7] r600g: add pausing pipeline & streamout queries into set_active_query_state

2016-04-08 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/r600/evergreen_state.c   | 12 
 src/gallium/drivers/r600/r600_hw_context.c   | 10 ++
 src/gallium/drivers/r600/r600_pipe.h |  2 +-
 src/gallium/drivers/r600/r600_state.c|  6 ++
 src/gallium/drivers/r600/r600_state_common.c |  9 +
 5 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index c1b0b56..f76d7a9 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -2395,6 +2395,12 @@ static void cayman_init_atom_start_cs(struct 
r600_context *rctx)
r600_store_value(cb, PKT3(PKT3_EVENT_WRITE, 0, 0));
r600_store_value(cb, EVENT_TYPE(EVENT_TYPE_PS_PARTIAL_FLUSH) | 
EVENT_INDEX(4));
 
+   /* This enables pipeline stat & streamout queries.
+* They are only disabled by blits.
+*/
+   r600_store_value(cb, PKT3(PKT3_EVENT_WRITE, 0, 0));
+   r600_store_value(cb, EVENT_TYPE(EVENT_TYPE_PIPELINESTAT_START) | 
EVENT_INDEX(0));
+
cayman_init_common_regs(cb, rctx->b.chip_class,
rctx->b.family, rctx->screen->b.info.drm_minor);
 
@@ -2648,6 +2654,12 @@ void evergreen_init_atom_start_cs(struct r600_context 
*rctx)
r600_store_value(cb, PKT3(PKT3_EVENT_WRITE, 0, 0));
r600_store_value(cb, EVENT_TYPE(EVENT_TYPE_PS_PARTIAL_FLUSH) | 
EVENT_INDEX(4));
 
+   /* This enables pipeline stat & streamout queries.
+* They are only disabled by blits.
+*/
+   r600_store_value(cb, PKT3(PKT3_EVENT_WRITE, 0, 0));
+   r600_store_value(cb, EVENT_TYPE(EVENT_TYPE_PIPELINESTAT_START) | 
EVENT_INDEX(0));
+
evergreen_init_common_regs(rctx, cb, rctx->b.chip_class,
   rctx->b.family, 
rctx->screen->b.info.drm_minor);
 
diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 7a6f957..63b631a 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -223,6 +223,16 @@ void r600_flush_emit(struct r600_context *rctx)
cs->buf[cs->cdw++] = 0x000A;  /* POLL_INTERVAL */
}
 
+   if (rctx->b.flags & R600_CONTEXT_START_PIPELINE_STATS) {
+   radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
+   radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_PIPELINESTAT_START) |
+   EVENT_INDEX(0));
+   } else if (rctx->b.flags & R600_CONTEXT_STOP_PIPELINE_STATS) {
+   radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
+   radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_PIPELINESTAT_STOP) |
+   EVENT_INDEX(0));
+   }
+
if (wait_until) {
/* Use of WAIT_UNTIL is deprecated on Cayman+ */
if (rctx->b.family < CHIP_CAYMAN) {
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index 0102638..86dd3c8 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -56,7 +56,7 @@
 #define R600_CONTEXT_WAIT_CP_DMA_IDLE  (R600_CONTEXT_PRIVATE_FLAG << 
10)
 
 /* the number of CS dwords for flushing and drawing */
-#define R600_MAX_FLUSH_CS_DWORDS   16
+#define R600_MAX_FLUSH_CS_DWORDS   18
 #define R600_MAX_DRAW_CS_DWORDS58
 
 #define R600_MAX_USER_CONST_BUFFERS 13
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index c4de963..02702ae 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -2177,6 +2177,12 @@ void r600_init_atom_start_cs(struct r600_context *rctx)
r600_store_value(cb, PKT3(PKT3_EVENT_WRITE, 0, 0));
r600_store_value(cb, EVENT_TYPE(EVENT_TYPE_PS_PARTIAL_FLUSH) | 
EVENT_INDEX(4));
 
+   /* This enables pipeline stat & streamout queries.
+* They are only disabled by blits.
+*/
+   r600_store_value(cb, PKT3(PKT3_EVENT_WRITE, 0, 0));
+   r600_store_value(cb, EVENT_TYPE(EVENT_TYPE_PIPELINESTAT_START) | 
EVENT_INDEX(0));
+
family = rctx->b.family;
ps_prio = 0;
vs_prio = 1;
diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index cdb493d..c03b75a 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -2864,6 +2864,15 @@ static void r600_set_active_query_state(struct 
pipe_context *ctx, boolean enable
 {
struct r600_context *rctx = (struct r600_context*)ctx;
 
+   /* Pipeline stat & streamout queries. */
+   if (enable) {
+   rctx->b.flags &= ~R600_CONTEXT_STOP_PIPELINE_STATS;
+   rctx->b.flags |= R600_CONTEXT_START_PIPELINE_STATS;
+   } else {
+   rctx->b.flags &=

Re: [Mesa-dev] [PATCH 2/2] Remove fs-discard-exit-3

2016-04-08 Thread Marek Olšák

BTW, Talos Principle had a bug that it calculated derivatives after
discard, which is undefined on radeonsi if it results in non-uniform
control flow. Then, somebody reported the bug to them, and they fixed
it.

I have to say that rarely do we find a test that quotes the spec and
expects the exact opposite. :)

(oh sorry, this should have been on the piglit ML)

Marek

On Fri, Apr 8, 2016 at 11:58 PM, Francisco Jerez  wrote:
> Marek Olšák  writes:
>
>> From: Marek Olšák 
>>
>> The test is wrong and the GLSL 1.30 citation in the test states very
>> clearly that discard can cause non-uniform control flow for any code that
>> follows.
>
> Heh, IIRC I sent this exact same patch over a year ago but I didn't get
> particularly enthusiastic feed-back so I never pushed it (you can
> probably find the discussion in the piglit mailing list archives if
> you're interested).  Anyway the test is obviously bogus so this seems
> like the right thing to do to me:
>
> Reviewed-by: Francisco Jerez 
>
>> ---
>>  .../execution/fs-discard-exit-3.shader_test| 76 
>> --
>>  1 file changed, 76 deletions(-)
>>  delete mode 100644 
>> tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test
>>
>> diff --git a/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test 
>> b/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test
>> deleted file mode 100644
>> index 14e9b47..000
>> --- a/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test
>> +++ /dev/null
>> @@ -1,76 +0,0 @@
>> -# This is a test for derivatives behavior after a discard.
>> -#
>> -# From the GLSL 1.30 spec:
>> -#
>> -# "The discard keyword is only allowed within fragment shaders. It
>> -#  can be used within a fragment shader to abandon the operation
>> -#  on the current fragment. This keyword causes the fragment to be
>> -#  discarded and no updates to any buffers will occur. Control
>> -#  flow exits the shader, and subsequent implicit or explicit
>> -#  derivatives are undefined when this control flow is non-uniform
>> -#  (meaning different fragments within the primitive take
>> -#  different control paths)."
>> -
>> -
>> -[require]
>> -GLSL >= 1.30
>> -
>> -[vertex shader]
>> -#version 130
>> -
>> -in vec4 vertex;
>> -out vec2 texcoords;
>> -void main()
>> -{
>> - gl_Position = vertex;
>> -
>> - /* Turn the texcoords into a 1:1 mapping with pixels when
>> -  * interpolated.  This means that the coords for our 2x2
>> -  * subspan we're interested in for the FS will be:
>> -  *
>> -  * +-+-+
>> -  * | 0,1 | 1,1 |
>> -  * +-+-+
>> -  * | 0,0 | 0,1 |
>> -  * +-+-+
>> -  *
>> -  * So it would sample the 1x1 miplevel of the GL_TEXTURE_2D
>> -  * miptree, unless some other math occurs...
>> -  */
>> - texcoords.yx = (vertex.xy + 1) / 2 * 250;
>> -}
>> -
>> -[fragment shader]
>> -#version 130
>> -in vec2 texcoords;
>> -uniform sampler2D s;
>> -
>> -void main()
>> -{
>> - if (gl_FragCoord.x >= 1.0 || gl_FragCoord.y >= 1.0)
>> - discard;
>> -
>> - /* Now, we have uniform control after the discard (well,
>> -  * except for the join after the if statement up there).  The
>> -  * derivatives on this sample should get us the same values
>> -  * for the undiscarded pixel as if we hadn't done any discard
>> -  * (comment out the "discard" above to see).
>> -  */
>> - gl_FragColor = texture(s, texcoords / 4);
>> -}
>> -
>> -[vertex data]
>> -vertex/float/2
>> --1.0 -1.0
>> - 1.0 -1.0
>> - 1.0  1.0
>> --1.0  1.0
>> -
>> -[test]
>> -clear color 0.5 0.5 0.5 0.5
>> -clear
>> -
>> -texture miptree 0
>> -
>> -draw arrays GL_TRIANGLE_FAN 0 4
>> -probe rgba 0 0 0.0 1.0 0.0 1.0
>> --
>> 2.5.0
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 4/4] i965: Port INTEL_PRECISE_TRIG=1 to NIR.

2016-04-08 Thread Jason Ekstrand

On Fri, Apr 8, 2016 at 9:36 AM, Eduardo Lima Mitev  wrote:

> On 04/08/2016 01:35 AM, Kenneth Graunke wrote:
>
>> This makes the extra multiply visible to NIR's algebraic optimizations
>> (for constant reassociation) as well as constant folding.  This means
>> that when the result of sin/cos are multiplied by an constant, we can
>> eliminate the extra multiply altogether, reducing the cost of the
>> workaround.
>>
>> It also means we only have to implement it one place, rather than in
>> both backends.
>>
>> This makes INTEL_PRECISE_TRIG=1 cost nothing on GPUTest/Volplosion,
>> which has a ton of sin() calls, but always multiplies them by an
>> immediate constant.  The extra multiply gets folded away.
>>
>>
> Cleaner, indeed. Patch (and series) is:
>
> Reviewed-by: Eduardo Lima Mitev 
>
> Thanks!
>
> Eduardo
>
>
> Signed-off-by: Kenneth Graunke 
>> ---
>>   src/mesa/drivers/dri/i965/Makefile.am  |  5 +++
>>   src/mesa/drivers/dri/i965/Makefile.sources |  1 +
>>   src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 16 +---
>>   src/mesa/drivers/dri/i965/brw_nir.c|  3 ++
>>   src/mesa/drivers/dri/i965/brw_nir.h|  2 +
>>   .../drivers/dri/i965/brw_nir_trig_workarounds.py   | 43
>> ++
>>   src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 16 +---
>>   7 files changed, 58 insertions(+), 28 deletions(-)
>>   create mode 100755 src/mesa/drivers/dri/i965/brw_nir_trig_workarounds.py
>>
>> diff --git a/src/mesa/drivers/dri/i965/Makefile.am
>> b/src/mesa/drivers/dri/i965/Makefile.am
>> index 0db5a51..a41c830 100644
>> --- a/src/mesa/drivers/dri/i965/Makefile.am
>> +++ b/src/mesa/drivers/dri/i965/Makefile.am
>> @@ -33,6 +33,7 @@ AM_CFLAGS = \
>> -I$(top_srcdir)/src/mesa/drivers/dri/common \
>> -I$(top_srcdir)/src/mesa/drivers/dri/intel/server \
>> -I$(top_srcdir)/src/gtest/include \
>> +   -I$(top_srcdir)/src/compiler/nir \
>> -I$(top_builddir)/src/compiler/nir \
>> -I$(top_builddir)/src/mesa/drivers/dri/common \
>> $(DEFINES) \
>> @@ -41,6 +42,10 @@ AM_CFLAGS = \
>>
>>   AM_CXXFLAGS = $(AM_CFLAGS)
>>
>> +brw_nir_trig_workarounds.c: brw_nir_trig_workarounds.py
>> $(top_srcdir)/src/compiler/nir/nir_algebraic.py
>> +   $(MKDIR_GEN)
>> +   $(AM_V_GEN) PYTHONPATH=$(top_srcdir)/src/compiler/nir $(PYTHON2)
>> $(PYTHON_FLAGS) $(srcdir)/brw_nir_trig_workarounds.py > $@ || ($(RM) $@;
>> false)
>> +
>>   noinst_LTLIBRARIES = libi965_dri.la libi965_compiler.la
>>   libi965_dri_la_SOURCES = $(i965_FILES)
>>   libi965_dri_la_LIBADD = libi965_compiler.la $(INTEL_LIBS)
>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources
>> b/src/mesa/drivers/dri/i965/Makefile.sources
>> index 4689588..2619e43 100644
>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>> @@ -44,6 +44,7 @@ i965_compiler_FILES = \
>> brw_nir.c \
>> brw_nir_analyze_boolean_resolves.c \
>> brw_nir_attribute_workarounds.c \
>> +   brw_nir_trig_workarounds.c \
>> brw_nir_opt_peephole_ffma.c \
>> brw_nir_uniforms.cpp \
>> brw_packed_float.c \
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> index 90b8789..bd6314a 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> @@ -775,24 +775,12 @@ fs_visitor::nir_emit_alu(const fs_builder ,
>> nir_alu_instr *instr)
>> break;
>>
>>  case nir_op_fsin:
>> -  if (!compiler->precise_trig) {
>> - inst = bld.emit(SHADER_OPCODE_SIN, result, op[0]);
>> -  } else {
>> - fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F);
>> - inst = bld.emit(SHADER_OPCODE_SIN, tmp, op[0]);
>> - inst = bld.MUL(result, tmp, brw_imm_f(0.7));
>> -  }
>> +  inst = bld.emit(SHADER_OPCODE_SIN, result, op[0]);
>> inst->saturate = instr->dest.saturate;
>> break;
>>
>>  case nir_op_fcos:
>> -  if (!compiler->precise_trig) {
>> - inst = bld.emit(SHADER_OPCODE_COS, result, op[0]);
>> -  } else {
>> - fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F);
>> - inst = bld.emit(SHADER_OPCODE_COS, tmp, op[0]);
>> - inst = bld.MUL(result, tmp, brw_imm_f(0.7));
>> -  }
>> +  inst = bld.emit(SHADER_OPCODE_COS, result, op[0]);
>> inst->saturate = instr->dest.saturate;
>> break;
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_nir.c
>> b/src/mesa/drivers/dri/i965/brw_nir.c
>> index 1821c0d..932979a 100644
>> --- a/src/mesa/drivers/dri/i965/brw_nir.c
>> +++ b/src/mesa/drivers/dri/i965/brw_nir.c
>> @@ -447,6 +447,9 @@ brw_preprocess_nir(const struct brw_compiler
>> *compiler, nir_shader *nir)
>>  if (nir->stage == MESA_SHADER_GEOMETRY)
>> OPT(nir_lower_gs_intrinsics);
>>
>> +   if (compiler->precise_trig)
>> +

Re: [Mesa-dev] [PATCH 1/4] nir: Do basic constant reassociation.

2016-04-08 Thread Jason Ekstrand

On Fri, Apr 8, 2016 at 8:21 AM, Eduardo Lima Mitev  wrote:

> On 04/08/2016 01:35 AM, Kenneth Graunke wrote:
>
>> Many shaders contain expression trees of the form:
>>
>>  const_1 * (value * const_2)
>>
>> Reorganizing these to
>>
>>  (const_1 * const_2) * value
>>
>> will allow constant folding to combine the constants.  Sometimes, these
>> constants are 2 and 0.5, so we can remove a multiply altogether.  Other
>> times, it can create more immediate constants, which can actually hurt.
>>
>
Further justification would be that the original was 4 NIR instructions: 2
load_const and 2 multiplies.  Once constant folding takes over, we are
again left with 4 NIR instructions: 2 load_const and 2 multiplies.  The
difference is that we now have (value * const_2) and (value * (const_1 *
const_2)).  So we have no more instructions in the end and, hopefully, the
(value * const_2) goes away.  Obviously, the emperical evidence says this
actually hurts sometimes, but that's not surprising.


> Finding a good balance here is tricky.  While much more could be done,
>> this simple patch seems to have a lot of positive benefit while having
>> a low downside.
>>
>>
> Makes sense.
>
> Reviewed-by: Eduardo Lima Mitev 
>
> shader-db results on Broadwell:
>>
>> total instructions in shared programs: 8963768 -> 8961369 (-0.03%)
>> instructions in affected programs: 438318 -> 435919 (-0.55%)
>> helped: 1502
>> HURT: 245
>>
>> total cycles in shared programs: 71527354 -> 71421516 (-0.15%)
>> cycles in affected programs: 11541788 -> 11435950 (-0.92%)
>> helped: 3445
>> HURT: 1224
>>
>>
> FWIW, these are my results on HSW:
>
> total instructions in shared programs: 6223995 -> 6222470 (-0.02%)
> instructions in affected programs: 300945 -> 299420 (-0.51%)
> helped: 1231
> HURT: 240
>
> total cycles in shared programs: 56809432 -> 56615552 (-0.34%)
> cycles in affected programs: 7980160 -> 7786280 (-2.43%)
> helped: 2087
> HURT: 634
>
> Eduardo
>
> Signed-off-by: Kenneth Graunke 
>> ---
>>   src/compiler/nir/nir_opt_algebraic.py | 8 
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/src/compiler/nir/nir_opt_algebraic.py
>> b/src/compiler/nir/nir_opt_algebraic.py
>> index e72b4a7..420d9d9 100644
>> --- a/src/compiler/nir/nir_opt_algebraic.py
>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>> @@ -274,6 +274,14 @@ optimizations = [
>>  (('fmul', ('fneg', a), b), ('fneg', ('fmul', a, b))),
>>  (('imul', ('ineg', a), b), ('ineg', ('imul', a, b))),
>>
>> +   # Reassociate constants in add/mul chains so they can be folded
>> together.
>> +   # For now, we only handle cases where the constants are separated by
>> +   # a single non-constant.  We could do better eventually.
>> +   (('~fmul', '#a', ('fmul', b, '#c')), ('fmul', ('fmul', a, c), b)),
>> +   (('imul', '#a', ('imul', b, '#c')), ('imul', ('imul', a, c), b)),
>> +   (('~fadd', '#a', ('fadd', b, '#c')), ('fadd', ('fadd', a, c), b)),
>> +   (('iadd', '#a', ('iadd', b, '#c')), ('iadd', ('iadd', a, c), b)),
>> +
>>  # Misc. lowering
>>  (('fmod', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a,
>> b, 'options->lower_fmod'),
>>  (('uadd_carry', a, b), ('b2i', ('ult', ('iadd', a, b), a)),
>> 'options->lower_uadd_carry'),
>>
>>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] Remove fs-discard-exit-3

2016-04-08 Thread Francisco Jerez

Marek Olšák  writes:

> From: Marek Olšák 
>
> The test is wrong and the GLSL 1.30 citation in the test states very
> clearly that discard can cause non-uniform control flow for any code that
> follows.

Heh, IIRC I sent this exact same patch over a year ago but I didn't get
particularly enthusiastic feed-back so I never pushed it (you can
probably find the discussion in the piglit mailing list archives if
you're interested).  Anyway the test is obviously bogus so this seems
like the right thing to do to me:

Reviewed-by: Francisco Jerez 

> ---
>  .../execution/fs-discard-exit-3.shader_test| 76 
> --
>  1 file changed, 76 deletions(-)
>  delete mode 100644 
> tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test
>
> diff --git a/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test 
> b/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test
> deleted file mode 100644
> index 14e9b47..000
> --- a/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test
> +++ /dev/null
> @@ -1,76 +0,0 @@
> -# This is a test for derivatives behavior after a discard.
> -#
> -# From the GLSL 1.30 spec:
> -#
> -# "The discard keyword is only allowed within fragment shaders. It
> -#  can be used within a fragment shader to abandon the operation
> -#  on the current fragment. This keyword causes the fragment to be
> -#  discarded and no updates to any buffers will occur. Control
> -#  flow exits the shader, and subsequent implicit or explicit
> -#  derivatives are undefined when this control flow is non-uniform
> -#  (meaning different fragments within the primitive take
> -#  different control paths)."
> -
> -
> -[require]
> -GLSL >= 1.30
> -
> -[vertex shader]
> -#version 130
> -
> -in vec4 vertex;
> -out vec2 texcoords;
> -void main()
> -{
> - gl_Position = vertex;
> -
> - /* Turn the texcoords into a 1:1 mapping with pixels when
> -  * interpolated.  This means that the coords for our 2x2
> -  * subspan we're interested in for the FS will be:
> -  *
> -  * +-+-+
> -  * | 0,1 | 1,1 |
> -  * +-+-+
> -  * | 0,0 | 0,1 |
> -  * +-+-+
> -  *
> -  * So it would sample the 1x1 miplevel of the GL_TEXTURE_2D
> -  * miptree, unless some other math occurs...
> -  */
> - texcoords.yx = (vertex.xy + 1) / 2 * 250;
> -}
> -
> -[fragment shader]
> -#version 130
> -in vec2 texcoords;
> -uniform sampler2D s;
> -
> -void main()
> -{
> - if (gl_FragCoord.x >= 1.0 || gl_FragCoord.y >= 1.0)
> - discard;
> -
> - /* Now, we have uniform control after the discard (well,
> -  * except for the join after the if statement up there).  The
> -  * derivatives on this sample should get us the same values
> -  * for the undiscarded pixel as if we hadn't done any discard
> -  * (comment out the "discard" above to see).
> -  */
> - gl_FragColor = texture(s, texcoords / 4);
> -}
> -
> -[vertex data]
> -vertex/float/2
> --1.0 -1.0
> - 1.0 -1.0
> - 1.0  1.0
> --1.0  1.0
> -
> -[test]
> -clear color 0.5 0.5 0.5 0.5
> -clear
> -
> -texture miptree 0
> -
> -draw arrays GL_TRIANGLE_FAN 0 4
> -probe rgba 0 0 0.0 1.0 0.0 1.0
> -- 
> 2.5.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/4] nir: Silence unused "options" warning in algebraic passes.

2016-04-08 Thread Dylan Baker

Quoting Kenneth Graunke (2016-04-08 11:48:01)
> On Friday, April 8, 2016 11:18:46 AM PDT Dylan Baker wrote: > 
> > Quoting Eduardo Lima Mitev (2016-04-08 08:26:22)
> > > On 04/08/2016 01:35 AM, Kenneth Graunke wrote:
> > > > Some passes may not refer to options->..., at which point the compiler
> > > > will warn about an unused variable. Just cast to void unconditionally
> > > > to shut it up.
> > > >
> > > > Signed-off-by: Kenneth Graunke 
> > > > ---
> > > > src/compiler/nir/nir_algebraic.py | 1 +
> > > > 1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/src/compiler/nir/nir_algebraic.py b/src/compiler/nir/
> nir_algebraic.py
> > > > index d05564f..53a7907 100644
> > > > --- a/src/compiler/nir/nir_algebraic.py
> > > > +++ b/src/compiler/nir/nir_algebraic.py
> > > > @@ -291,6 +291,7 @@ ${pass_name}(nir_shader *shader)
> > > > bool progress = false;
> > > > bool condition_flags[${len(condition_list)}];
> > > > const nir_shader_compiler_options *options = shader->options;
> > > > + (void) options;
> > > >
> > >
> > > Hmm, don't like this very much. I suppose since it is generated code, it
> > > can not be done per block where options is unused.
> > > For respect to other people's right to have clean build logs, patch is:
> > >
> > > Reviewed-by: Eduardo Lima Mitev 
> > >
> > > > % for index, condition in enumerate(condition_list):
> > > > condition_flags[${index}] = ${condition};
> > > >
> > >
> >
> > I'm not familiar with the code, but looking at it options must be used
> > in the condition_list argument?
>  
> Right, we create the 'options' temporary so you can write rules like:
>  
> (('~fmin', ('fmax', a, 0.0), 1.0), ('fsat', a), '!options->lower_fsat'),
>  
> where !options->lower_fsat becomes part of the condition_list. However,
> conditions are optional. If you have an algebraic rule list which doesn't
> include any conditions referring to options->..., the variable will be
> unused. (For example, the list in patch 4 meets this criteria.)
>  

So what about wrapping the definitions of options in a conditional?
% if 'options' in conditions[-1]:
const nir_shader_compiler_options *options = shader->options;
% endif

This only covers that options is the last argument, which seems to be
what nir_opt_algebraic does, although I don't know that is actually a
hard requirement, or just a coincidence.

Dylan


signature.asc
Description: signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/3] i965/disasm: Decode "channel mask present" bit correctly.

2016-04-08 Thread Kenneth Graunke

Bit 15 means "interleave" for most messages, but for SIMD8 messages it
means "use channel masks".

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_disasm.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c 
b/src/mesa/drivers/dri/i965/brw_disasm.c
index 0ae237d..0848657 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++ b/src/mesa/drivers/dri/i965/brw_disasm.c
@@ -1505,7 +1505,9 @@ brw_disassemble_inst(FILE *file, const struct 
brw_device_info *devinfo,
 break;
  }
 
- case BRW_SFID_URB:
+ case BRW_SFID_URB: {
+unsigned opcode = brw_inst_urb_opcode(devinfo, inst);
+
 format(file, " %ld", brw_inst_urb_global_offset(devinfo, inst));
 
 space = 1;
@@ -1513,10 +1515,18 @@ brw_disassemble_inst(FILE *file, const struct 
brw_device_info *devinfo,
 err |= control(file, "urb opcode",
devinfo->gen >= 7 ? gen7_urb_opcode
  : gen5_urb_opcode,
-   brw_inst_urb_opcode(devinfo, inst), );
+   opcode, );
+
+if (opcode == GEN8_URB_OPCODE_SIMD8_WRITE ||
+opcode == GEN8_URB_OPCODE_SIMD8_READ) {
+   if (brw_inst_urb_channel_mask_present(devinfo, inst))
+  string(file, " masked");
+} else {
+   err |= control(file, "urb swizzle", urb_swizzle,
+  brw_inst_urb_swizzle_control(devinfo, inst),
+  );
+}
 
-err |= control(file, "urb swizzle", urb_swizzle,
-   brw_inst_urb_swizzle_control(devinfo, inst), 
);
 if (devinfo->gen < 7) {
err |= control(file, "urb allocate", urb_allocate,
   brw_inst_urb_allocate(devinfo, inst), );
@@ -1528,6 +1538,7 @@ brw_disassemble_inst(FILE *file, const struct 
brw_device_info *devinfo,
   brw_inst_urb_complete(devinfo, inst), );
 }
 break;
+ }
  case BRW_SFID_THREAD_SPAWNER:
 break;
 
-- 
2.8.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/3] i965/disasm: Simplify the URB opcode printing with ?:.

2016-04-08 Thread Kenneth Graunke

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_disasm.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c 
b/src/mesa/drivers/dri/i965/brw_disasm.c
index 09eb239..0ae237d 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++ b/src/mesa/drivers/dri/i965/brw_disasm.c
@@ -1509,13 +1509,12 @@ brw_disassemble_inst(FILE *file, const struct 
brw_device_info *devinfo,
 format(file, " %ld", brw_inst_urb_global_offset(devinfo, inst));
 
 space = 1;
-if (devinfo->gen >= 7) {
-   err |= control(file, "urb opcode", gen7_urb_opcode,
-  brw_inst_urb_opcode(devinfo, inst), );
-} else if (devinfo->gen >= 5) {
-   err |= control(file, "urb opcode", gen5_urb_opcode,
-  brw_inst_urb_opcode(devinfo, inst), );
-}
+
+err |= control(file, "urb opcode",
+   devinfo->gen >= 7 ? gen7_urb_opcode
+ : gen5_urb_opcode,
+   brw_inst_urb_opcode(devinfo, inst), );
+
 err |= control(file, "urb swizzle", urb_swizzle,
brw_inst_urb_swizzle_control(devinfo, inst), 
);
 if (devinfo->gen < 7) {
-- 
2.8.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/3] i965/disasm: Decode per-slot offsets.

2016-04-08 Thread Kenneth Graunke

We just never bothered to decode this.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_disasm.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c 
b/src/mesa/drivers/dri/i965/brw_disasm.c
index 0848657..88bd7a4 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++ b/src/mesa/drivers/dri/i965/brw_disasm.c
@@ -1517,6 +1517,11 @@ brw_disassemble_inst(FILE *file, const struct 
brw_device_info *devinfo,
  : gen5_urb_opcode,
opcode, );
 
+if (devinfo->gen >= 7 &&
+brw_inst_urb_per_slot_offset(devinfo, inst)) {
+   string(file, " per-slot");
+}
+
 if (opcode == GEN8_URB_OPCODE_SIMD8_WRITE ||
 opcode == GEN8_URB_OPCODE_SIMD8_READ) {
if (brw_inst_urb_channel_mask_present(devinfo, inst))
-- 
2.8.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 05/16] nir: add lowering pass for glBitmap

2016-04-08 Thread Eric Anholt

Rob Clark  writes:

> From: Rob Clark 
>
> Signed-off-by: Rob Clark 
> Reviewed-by: Connor Abbott 
> ---
>  src/compiler/Makefile.sources   |   1 +
>  src/compiler/nir/nir.h  |   7 ++
>  src/compiler/nir/nir_lower_bitmap.c | 166 
> 
>  3 files changed, 174 insertions(+)
>  create mode 100644 src/compiler/nir/nir_lower_bitmap.c
>
> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
> index c396128..cb9eee9 100644
> --- a/src/compiler/Makefile.sources
> +++ b/src/compiler/Makefile.sources
> @@ -185,6 +185,7 @@ NIR_FILES = \
>   nir/nir_liveness.c \
>   nir/nir_lower_alu_to_scalar.c \
>   nir/nir_lower_atomics.c \
> + nir/nir_lower_bitmap.c \
>   nir/nir_lower_clip.c \
>   nir/nir_lower_drawpixels.c \
>   nir/nir_lower_global_vars_to_local.c \
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 34635ed..85b28d2 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2269,6 +2269,13 @@ typedef struct nir_lower_drawpixels_options {
>  void nir_lower_drawpixels(nir_shader *shader,
>const nir_lower_drawpixels_options *options);
>  
> +typedef struct nir_lower_bitmap_options {
> +   unsigned sampler;
> +   bool swizzle_;
> +} nir_lower_bitmap_options;
> +
> +void nir_lower_bitmap(nir_shader *shader, const nir_lower_bitmap_options 
> *options);
> +
>  void nir_lower_atomics(nir_shader *shader,
> const struct gl_shader_program *shader_program);
>  void nir_lower_to_source_mods(nir_shader *shader);
> diff --git a/src/compiler/nir/nir_lower_bitmap.c 
> b/src/compiler/nir/nir_lower_bitmap.c
> new file mode 100644
> index 000..3cba69a
> --- /dev/null
> +++ b/src/compiler/nir/nir_lower_bitmap.c
> @@ -0,0 +1,166 @@
> +/*
> + * Copyright © 2015 Red Hat
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 
> THE
> + * SOFTWARE.
> + */
> +
> +#include "nir.h"
> +#include "nir_builder.h"
> +
> +/* Lower glBitmap().
> + *
> + * This is based on the logic in st_get_bitmap_shader() in TGSI compiler.
> + * From st_cb_bitmap.c:
> + *
> + *glBitmaps are drawn as textured quads.  The user's bitmap pattern
> + *is stored in a texture image.  An alpha8 texture format is used.
> + *The fragment shader samples a bit (texel) from the texture, then
> + *discards the fragment if the bit is off.
> + *
> + *Note that we actually store the inverse image of the bitmap to
> + *simplify the fragment program.  An "on" bit gets stored as texel=0x0
> + *and an "off" bit is stored as texel=0xff.  Then we kill the
> + *fragment if the negated texel value is less than zero.
> + *
> + * Note that the texture format will be, according to what driver supports,
> + * in order of preference (with swizzle):
> + *
> + *I8_UNORM - .
> + *A8_UNORM - .000x
> + *L8_UNORM - .xxx1
> + *
> + * If L8_UNORM, options->swizzle_ is true.  Otherwise we can just use
> + * the .w comp.
> + *
> + * Run before nir_lower_io.
> + */
> +
> +typedef struct {
> +   const nir_lower_bitmap_options *options;
> +   nir_shader   *shader;
> +   nir_builder   b;
> +   nir_variable *texcoord;
> +} lower_bitmap_state;
> +
> +static nir_ssa_def *
> +get_texcoord(lower_bitmap_state *state)
> +{
> +   if (state->texcoord == NULL) {
> +  nir_variable *texcoord = NULL;
> +
> +  /* find gl_TexCoord, if it exists: */
> +  nir_foreach_variable(var, >shader->inputs) {
> + if (strcmp(var->name, "gl_TexCoord") == 0) {
> +texcoord = var;
> +break;
> + }
> +  }
> +
> +  /* otherwise create it: */
> +  if (texcoord == NULL) {
> + texcoord = nir_variable_create(state->shader,
> +

Re: [Mesa-dev] [PATCH 07/16] nir: passthrough-edgeflags support

2016-04-08 Thread Eric Anholt

Rob Clark  writes:

> From: Rob Clark 
>
> Handled by tgsi_emulate for glsl->tgsi case.
>
> Signed-off-by: Rob Clark 
> Reviewed-by: Connor Abbott 
> ---
>  src/compiler/Makefile.sources  |  1 +
>  src/compiler/nir/nir.h |  2 +
>  src/compiler/nir/nir_lower_passthrough_edgeflags.c | 82 
> ++
>  3 files changed, 85 insertions(+)
>  create mode 100644 src/compiler/nir/nir_lower_passthrough_edgeflags.c
>
> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
> index 12badf0..ae7efbf 100644
> --- a/src/compiler/Makefile.sources
> +++ b/src/compiler/Makefile.sources
> @@ -197,6 +197,7 @@ NIR_FILES = \
>   nir/nir_lower_indirect_derefs.c \
>   nir/nir_lower_io.c \
>   nir/nir_lower_outputs_to_temporaries.c \
> + nir/nir_lower_passthrough_edgeflags.c \
>   nir/nir_lower_phis_to_scalar.c \
>   nir/nir_lower_returns.c \
>   nir/nir_lower_samplers.c \
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 67373b6..65f75bd 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2246,6 +2246,8 @@ void nir_lower_two_sided_color(nir_shader *shader);
>  
>  void nir_lower_clamp_color_outputs(nir_shader *shader);
>  
> +void nir_lower_passthrough_edgeflags(nir_shader *shader);
> +
>  typedef struct nir_lower_wpos_ytransform_options {
> int state_tokens[5];
> bool fs_coord_origin_upper_left :1;
> diff --git a/src/compiler/nir/nir_lower_passthrough_edgeflags.c 
> b/src/compiler/nir/nir_lower_passthrough_edgeflags.c
> new file mode 100644
> index 000..1afcbf7
> --- /dev/null
> +++ b/src/compiler/nir/nir_lower_passthrough_edgeflags.c
> @@ -0,0 +1,82 @@
> +/*
> + * Copyright © 2015 Red Hat
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 
> THE
> + * SOFTWARE.
> + */
> +
> +#include "nir.h"
> +#include "nir_builder.h"
> +
> +typedef struct {
> +   nir_shader *shader;
> +   nir_builder b;
> +} lower_state;
> +
> +static nir_variable *
> +create_edgeflag_var(nir_shader *shader, bool output)
> +{
> +   nir_variable_mode mode = output ? nir_var_shader_out : nir_var_shader_in;
> +   nir_variable *var = nir_variable_create(shader, mode, glsl_vec4_type(),
> +   output ? "edgeflag_out" : 
> "edgeflag_in");
> +   var->data.location = output ? VARYING_SLOT_EDGE : VERT_ATTRIB_EDGEFLAG;
> +   return var;
> +}
> +
> +static bool
> +lower_block(nir_block *block, void *_state)
> +{
> +   lower_state *state = _state;
> +   nir_builder *b = >b;
> +   nir_variable *in, *out;
> +   nir_ssa_def *def;
> +
> +   in = create_edgeflag_var(state->shader, false);
> +   out = create_edgeflag_var(state->shader, true);

I think the code would be clearer if you inlined create_edgeflag_var.

> +
> +   b->cursor = nir_before_block(block);
> +
> +   def = nir_load_var(b, in);
> +   nir_store_var(b, out, def, 0xf);
> +
> +   /* only do this for first block: */
> +   return false;
> +}
> +
> +static void
> +lower_impl(lower_state *state, nir_function_impl *impl)
> +{
> +   nir_builder_init(>b, impl);
> +
> +   nir_foreach_block(impl, lower_block, state);

I think in the previous review when I suggested nir_before_block(), I
actually wanted to say that you probably want here to just do
nir_before_cf_list(>cf_node).

> +   nir_metadata_preserve(impl, nir_metadata_block_index |
> +   nir_metadata_dominance);
> +}
> +
> +void nir_lower_passthrough_edgeflags(nir_shader *shader)
> +{
> +   lower_state state = {
> +  .shader = shader,
> +   };
> +
> +   nir_foreach_function(shader, function) {
> +  if ((strcmp(function->name, "main") == 0) && function->impl)
> + lower_impl(, function->impl);
> +   }
> +}

We pretty clearly want a "Get me the main

Re: [Mesa-dev] Merging the Vulkan driver

2016-04-08 Thread Eric Anholt

Jason Ekstrand  writes:

> All,
>
> We are getting very close to being able to merge the Vulkan driver into
> mesa master.  I've got around 30 patches on the mailing list that are still
> awaiting review.  Once those get merged, the diff between the "vulkan" and
> "master" branches is basically zero except for adding new files.  The going
> plan is to get the diff between the two branches down to zero except for
> the new "src/intel" directory and minimal build system changes and then do
> an actual git merge.  I know we don't usually do merges in this project,
> but the Vulkan driver is big and has a lot of history that we would like to
> preserve.  If you're strongly opposed to doing a merge, please speak up now!
>
> Things that still need review:
>
>  1) i965 back-end patches for indirect push constants
>  2) i965 and NIR patches to add a couple new opcodes
>  3) Misc i965 and NIR patches we've been carrying around in the vulkan
> branch for a while
>  4) A patch to configure.ac to add new flags for building Vulkan drivers
> (mostly written, not yet sent).
>
> Those will all undergo the normal review process and get pushed *before* we
> merge.  That way the only thing the merge does is add a new directory and a
> three or four lines each to configure.ac and src/Makefile.am.
>
> If people step up with code review, I'm hoping that we can merge some time
> this week or next.

I think a merge makes a ton of sense for this.  Having duplicate commits
in the history sucks, but maintaining history is worth it.


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 00/15] i965: Rework uniform handling in the back-end

2016-04-08 Thread Jason Ekstrand

On Thu, Apr 7, 2016 at 10:01 PM, Matt Turner  wrote:

> On Tue, Mar 22, 2016 at 3:33 PM, Jason Ekstrand 
> wrote:
> > This is mostly a re-send of a patch series I've had floating around in
> one
> > form or a while for quite some time.  It's basically the same except that
> > the original version was missing a work-around for Sandy Bridge.  For a
> > while, I wasn't really pushing to get it merged because I couldn't
> > demonstrate any actual performance benifit from pushing arrays.  However,
> > with the Vulkan API, the concept of push constants is directly exposed to
> > the user and we really need to be able to indirect on them.  This series
> > makes the FS backend 100% ready for indirect push constants;  vec4 will
> > take a little more work.
> >
> > It's worth noting that we've been carying these patches around in our
> > Vulkan driver for probably 3 or 4 months now and it's working great.
> >
> > For those that prefer to review on a branch:
> >
> > https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/i965-uniforms
> >
> > I think Kristian has mostly reviewed these patches.  However, he never
> sent
> > any R-Bs to the list.  I'd also like Ken or Matt to look at it from a
> > design perspective.
>
> I don't know what I think. I'm sympathetic to Curro's argument, but in
> the absence of more data it's hard to judge anything really. I'm not
> at all sympathetic to
>
> """
> Do I have a proof-of-concept in code, no.  However, I've run through
> it in my head and I have a pretty good idea what it would look like.
> You are free to go off and do it if you don't believe me, but I don't
> really want to hold things up while you do.
> """
>
> That's what... An Appeal to Your Brain? :)
>
> I don't know how to proceed on that front if no one is willing or
> interested in trying to implement it using reladdr.
>
> I ran shader-db.
>
> total instructions in shared programs: 7113290 -> 7161760 (0.68%)
> instructions in affected programs: 866011 -> 914481 (5.60%)
> helped: 0
> HURT: 7180
>
> total cycles in shared programs: 64705926 -> 64776118 (0.11%)
> cycles in affected programs: 4951554 -> 5021746 (1.42%)
> helped: 1605
> HURT: 5204
>
> of which the overwhelming majority is vertex shaders (why? this series
> is i965/fs). FS changes are just
>
> instructions in affected programs: 13550 -> 14132 (4.30%)
> helped: 0
> HURT: 50
>
> but I'm having a hard time finding shaders that actually use the
> address register.
>
> What's going on with the shader-db regressions?
>

As I figured, the shader-db issues are primarily a problem with
inconsistent use of D and UD.  This patch fixes that problem:

https://cgit.freedesktop.org/~jekstrand/mesa/commit/?h=wip/i965-uniforms-v3=0dbf9c8ee415b19073efe92fa586fddf22b725e6

I've pushed a version of the series with that patch to fd.o here:

https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=wip/i965-uniforms-v3

There is still a small shader-db delta in vec4 which comes from changing
the algorithm we use to place pull constants in the buffer.  The original
algorithm walks over the instructions and places uniforms in the pull
constant buffer as it finds an instructions that use them.  The new
algorithm places them in the same order as their uniform number which is
effectively the order in which they are declared in the shader.

There are a number of shaders from Valve games (left4dead, portal 2, tf2)
that seem to have a single uniform array they use *a lot* and others that
they use less frequently.  The one they use most frequently happens to also
be used first so it gets placed first in the buffer with the old algorighm
and later with the new one.  The only reason this makes any difference
whatsoever is that whatever uniform gets placed first in the buffer is at
offset 0 so you don't need to add a constant offset to the array offset and
the address calculation has one less instruction.  In these Valve games,
that uniform happens to be used enough more often than the rest that that
extra instruction shows up in the shader-db results.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/3] i965/tiled_memoyp: Get rid of the direction parameter to get_memcpy

2016-04-08 Thread Jason Ekstrand

On Thu, Apr 7, 2016 at 4:57 PM, Matt Turner  wrote:

> On Thu, Apr 7, 2016 at 1:56 PM, Jason Ekstrand 
> wrote:
> > Now that we can use the much simpler rgba8_copy function, we don't need
> to
> > hand different functions out based on direction.
> > ---
>
> Typo in the subject, and PATCH 2/3 needs a bugzilla link.
>

Fixed


> The series is
>
> Reviewed-by: Matt Turner 
>
> I noticed that gcc does *not* unroll the loop in the 64-byte case.
> I've got some small optimizations to this code that I'll send after
> this series, so I'll just fix that up at the same time.
>

I just pushed.  I added a Cc to stable for the first two patches.  I
recommend you do as well for whatever you send that fixes the unrolling
problem.
--Jason
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/4] nir: Silence unused "options" warning in algebraic passes.

2016-04-08 Thread Kenneth Graunke

On Friday, April 8, 2016 11:18:46 AM PDT Dylan Baker wrote:
> Quoting Eduardo Lima Mitev (2016-04-08 08:26:22)
> > On 04/08/2016 01:35 AM, Kenneth Graunke wrote:
> > > Some passes may not refer to options->..., at which point the 
compiler
> > > will warn about an unused variable.  Just cast to void unconditionally
> > > to shut it up.
> > >
> > > Signed-off-by: Kenneth Graunke 
> > > ---
> > >   src/compiler/nir/nir_algebraic.py | 1 +
> > >   1 file changed, 1 insertion(+)
> > >
> > > diff --git a/src/compiler/nir/nir_algebraic.py b/src/compiler/nir/
nir_algebraic.py
> > > index d05564f..53a7907 100644
> > > --- a/src/compiler/nir/nir_algebraic.py
> > > +++ b/src/compiler/nir/nir_algebraic.py
> > > @@ -291,6 +291,7 @@ ${pass_name}(nir_shader *shader)
> > >  bool progress = false;
> > >  bool condition_flags[${len(condition_list)}];
> > >  const nir_shader_compiler_options *options = shader->options;
> > > +   (void) options;
> > >
> > 
> > Hmm, don't like this very much. I suppose since it is generated code, it 
> > can not be done per block where options is unused.
> > For respect to other people's right to have clean build logs, patch is:
> > 
> > Reviewed-by: Eduardo Lima Mitev 
> > 
> > >  % for index, condition in enumerate(condition_list):
> > >  condition_flags[${index}] = ${condition};
> > >
> > 
> 
> I'm not familiar with the code, but looking at it options must be used
> in the condition_list argument?

Right, we create the 'options' temporary so you can write rules like:

   (('~fmin', ('fmax', a, 0.0), 1.0), ('fsat', a), '!options->lower_fsat'),

where !options->lower_fsat becomes part of the condition_list.  However,
conditions are optional.  If you have an algebraic rule list which doesn't
include any conditions referring to options->..., the variable will be
unused.  (For example, the list in patch 4 meets this criteria.)




signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/4] nir: Silence unused "options" warning in algebraic passes.

2016-04-08 Thread Dylan Baker

Quoting Eduardo Lima Mitev (2016-04-08 08:26:22)
> On 04/08/2016 01:35 AM, Kenneth Graunke wrote:
> > Some passes may not refer to options->..., at which point the compiler
> > will warn about an unused variable.  Just cast to void unconditionally
> > to shut it up.
> >
> > Signed-off-by: Kenneth Graunke 
> > ---
> >   src/compiler/nir/nir_algebraic.py | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/src/compiler/nir/nir_algebraic.py 
> > b/src/compiler/nir/nir_algebraic.py
> > index d05564f..53a7907 100644
> > --- a/src/compiler/nir/nir_algebraic.py
> > +++ b/src/compiler/nir/nir_algebraic.py
> > @@ -291,6 +291,7 @@ ${pass_name}(nir_shader *shader)
> >  bool progress = false;
> >  bool condition_flags[${len(condition_list)}];
> >  const nir_shader_compiler_options *options = shader->options;
> > +   (void) options;
> >
> 
> Hmm, don't like this very much. I suppose since it is generated code, it 
> can not be done per block where options is unused.
> For respect to other people's right to have clean build logs, patch is:
> 
> Reviewed-by: Eduardo Lima Mitev 
> 
> >  % for index, condition in enumerate(condition_list):
> >  condition_flags[${index}] = ${condition};
> >
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

I'm not familiar with the code, but looking at it options must be used
in the condition_list argument?


signature.asc
Description: signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/3] i965/tiled_memoyp: Get rid of the direction parameter to get_memcpy

2016-04-08 Thread Chad Versace

On 04/07/2016 01:56 PM, Jason Ekstrand wrote:
> Now that we can use the much simpler rgba8_copy function, we don't need to
> hand different functions out based on direction.
> ---
>  src/mesa/drivers/dri/i965/intel_pixel_read.c   |  3 +--
>  src/mesa/drivers/dri/i965/intel_tex_image.c|  3 +--
>  src/mesa/drivers/dri/i965/intel_tex_subimage.c |  3 +--
>  src/mesa/drivers/dri/i965/intel_tiled_memcpy.c |  3 +--
>  src/mesa/drivers/dri/i965/intel_tiled_memcpy.h | 15 +--
>  5 files changed, 5 insertions(+), 22 deletions(-)

I read the patches closely, and I like their approach.

For the series,
Reviewed-by: Chad Versace 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeon/uvd: alignment fix for decode message buffer

2016-04-08 Thread Leo Liu


Cc: 

On 04/08/2016 11:34 AM, Boyuan Zhang wrote:

Signed-off-by: Boyuan Zhang 
Reviewed-by: Christian König 
---
  src/gallium/drivers/radeon/radeon_uvd.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c 
b/src/gallium/drivers/radeon/radeon_uvd.c
index 233f460..098baf2 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -1003,7 +1003,7 @@ static void ruvd_end_frame(struct pipe_video_codec 
*decoder,
  
  	dec->msg->body.decode.dpb_size = dec->dpb.res->buf->size;

dec->msg->body.decode.bsd_size = bs_size;
-   dec->msg->body.decode.db_pitch = dec->base.width;
+   dec->msg->body.decode.db_pitch = align(dec->base.width, 16);
  
  	dt = dec->set_dtb(dec->msg, (struct vl_video_buffer *)target);

if (((struct r600_common_screen*)dec->screen)->family >= CHIP_STONEY)


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 4/4] i965: Port INTEL_PRECISE_TRIG=1 to NIR.

2016-04-08 Thread Eduardo Lima Mitev


On 04/08/2016 01:35 AM, Kenneth Graunke wrote:

This makes the extra multiply visible to NIR's algebraic optimizations
(for constant reassociation) as well as constant folding.  This means
that when the result of sin/cos are multiplied by an constant, we can
eliminate the extra multiply altogether, reducing the cost of the
workaround.

It also means we only have to implement it one place, rather than in
both backends.

This makes INTEL_PRECISE_TRIG=1 cost nothing on GPUTest/Volplosion,
which has a ton of sin() calls, but always multiplies them by an
immediate constant.  The extra multiply gets folded away.



Cleaner, indeed. Patch (and series) is:

Reviewed-by: Eduardo Lima Mitev 

Thanks!

Eduardo


Signed-off-by: Kenneth Graunke 
---
  src/mesa/drivers/dri/i965/Makefile.am  |  5 +++
  src/mesa/drivers/dri/i965/Makefile.sources |  1 +
  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 16 +---
  src/mesa/drivers/dri/i965/brw_nir.c|  3 ++
  src/mesa/drivers/dri/i965/brw_nir.h|  2 +
  .../drivers/dri/i965/brw_nir_trig_workarounds.py   | 43 ++
  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 16 +---
  7 files changed, 58 insertions(+), 28 deletions(-)
  create mode 100755 src/mesa/drivers/dri/i965/brw_nir_trig_workarounds.py

diff --git a/src/mesa/drivers/dri/i965/Makefile.am 
b/src/mesa/drivers/dri/i965/Makefile.am
index 0db5a51..a41c830 100644
--- a/src/mesa/drivers/dri/i965/Makefile.am
+++ b/src/mesa/drivers/dri/i965/Makefile.am
@@ -33,6 +33,7 @@ AM_CFLAGS = \
-I$(top_srcdir)/src/mesa/drivers/dri/common \
-I$(top_srcdir)/src/mesa/drivers/dri/intel/server \
-I$(top_srcdir)/src/gtest/include \
+   -I$(top_srcdir)/src/compiler/nir \
-I$(top_builddir)/src/compiler/nir \
-I$(top_builddir)/src/mesa/drivers/dri/common \
$(DEFINES) \
@@ -41,6 +42,10 @@ AM_CFLAGS = \

  AM_CXXFLAGS = $(AM_CFLAGS)

+brw_nir_trig_workarounds.c: brw_nir_trig_workarounds.py 
$(top_srcdir)/src/compiler/nir/nir_algebraic.py
+   $(MKDIR_GEN)
+   $(AM_V_GEN) PYTHONPATH=$(top_srcdir)/src/compiler/nir $(PYTHON2) 
$(PYTHON_FLAGS) $(srcdir)/brw_nir_trig_workarounds.py > $@ || ($(RM) $@; false)
+
  noinst_LTLIBRARIES = libi965_dri.la libi965_compiler.la
  libi965_dri_la_SOURCES = $(i965_FILES)
  libi965_dri_la_LIBADD = libi965_compiler.la $(INTEL_LIBS)
diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 4689588..2619e43 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -44,6 +44,7 @@ i965_compiler_FILES = \
brw_nir.c \
brw_nir_analyze_boolean_resolves.c \
brw_nir_attribute_workarounds.c \
+   brw_nir_trig_workarounds.c \
brw_nir_opt_peephole_ffma.c \
brw_nir_uniforms.cpp \
brw_packed_float.c \
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 90b8789..bd6314a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -775,24 +775,12 @@ fs_visitor::nir_emit_alu(const fs_builder , 
nir_alu_instr *instr)
break;

 case nir_op_fsin:
-  if (!compiler->precise_trig) {
- inst = bld.emit(SHADER_OPCODE_SIN, result, op[0]);
-  } else {
- fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F);
- inst = bld.emit(SHADER_OPCODE_SIN, tmp, op[0]);
- inst = bld.MUL(result, tmp, brw_imm_f(0.7));
-  }
+  inst = bld.emit(SHADER_OPCODE_SIN, result, op[0]);
inst->saturate = instr->dest.saturate;
break;

 case nir_op_fcos:
-  if (!compiler->precise_trig) {
- inst = bld.emit(SHADER_OPCODE_COS, result, op[0]);
-  } else {
- fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F);
- inst = bld.emit(SHADER_OPCODE_COS, tmp, op[0]);
- inst = bld.MUL(result, tmp, brw_imm_f(0.7));
-  }
+  inst = bld.emit(SHADER_OPCODE_COS, result, op[0]);
inst->saturate = instr->dest.saturate;
break;

diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index 1821c0d..932979a 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -447,6 +447,9 @@ brw_preprocess_nir(const struct brw_compiler *compiler, 
nir_shader *nir)
 if (nir->stage == MESA_SHADER_GEOMETRY)
OPT(nir_lower_gs_intrinsics);

+   if (compiler->precise_trig)
+  OPT(brw_nir_apply_trig_workarounds);
+
 static const nir_lower_tex_options tex_options = {
.lower_txp = ~0,
 };
diff --git a/src/mesa/drivers/dri/i965/brw_nir.h 
b/src/mesa/drivers/dri/i965/brw_nir.h
index b10c083..2711606 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.h
+++ b/src/mesa/drivers/dri/i965/brw_nir.h
@@ -106,6 +106,8 @@ bool brw_nir_apply_attribute_workarounds(nir_shader *nir,

Re: [Mesa-dev] [Nouveau] [PATCH] nouveau: codegen: Take src swizzle into account on loads

2016-04-08 Thread Ilia Mirkin

On Fri, Apr 8, 2016 at 12:26 PM, Hans de Goede  wrote:
> Hi,
>
>
> On 08-04-16 18:06, Hans de Goede wrote:
>>
>> Hi,
>>
>> On 08-04-16 17:45, Ilia Mirkin wrote:
>>>
>>> On Fri, Apr 8, 2016 at 11:28 AM, Hans de Goede 
>>> wrote:

 When dealing with non vector variables the llvm register allocator
 will use TEMP[0].x then TEMP[0].y, etc.

 When loading something from a global buffer it will calculate the
 address to use, and store that in say TEMP[0].x, so it ends up
 generating:

 LOAD TEMP[0].y, MEMORY[0], TEMP[0]

 Expecting the contents of TEMP[0].y to become the 32 bits of data
 to which TEMP[0].x is pointing. But instead it will get the 32 bits of
 data at address (TEMP[0].x + 4).

 With the old RES[32767] code one could generate the following TGSI:

 LOAD TEMP[0].y, RES[32767]., TEMP[0]

 And things would work fine since the . swizzling postfix would
 be honored and when storing to y (the only component set in the
 dest-mask)
 the x component at address (TEMP[0].x) would be loaded, rather then the
 y component at (TEMP[0].y)

 Note that another approach would be to not increment the address by
 a 32 bit word for skipped (not set in destmask) components.

 The way I see it either:

 1) We see that LOAD does not deal with vectors, but with flat memory,
 in which case skipping 4 bytes because x is not set in the destmask
 does not make sense, as that is a vector thing todo.

 2) LOAD is vector layout aware in which case supporting swizzling
 makes sense.

 Currently we have a weird hybrid which is rather cumbersome to
 work with from a compiler pov.
>>>
>>>
>>> And I guess LLVM never ends up generating any of the other "funny"
>>> instructions like LIT and the such. Well, I have no problem adding the
>>> swizzling logic, i.e. the way that LOAD will now work (logically) is
>>> that it will
>>>
>>> (a) fetch 4 values from the coordinates provided (4 sequential dwords
>>> from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in
>>> the case of images)
>>> (b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE
>>> argument
>>> (c) store that swizzled result into the destination based on the
>>> writemask
>>>
>>> That would sound reasonable to me, and if I understand correctly, is
>>> option 2 of your proposal.
>>
>>
>> Yes that is option 2, and is basically what the patch which started this
>> thread does. So that would work for me :)
>>
>>> We'd need some docs updates and buy-in from the other gallium driver
>>> developers.
>>
>>
>> What docs would need updating ? The TGSI docs I'm aware of are at:
>>
>> http://gallium.readthedocs.org/en/latest/tgsi.html
>>
>> I assume those have a source in the mesa src somewhere (I've not looked),
>> but those mostly just look quite incomplete in general when it comes to
>> TGSI
>> (I've had to revert to figuring what things do from the mesa srcs quite
>> often)
>>
>> Have I been looking at the wrong docs perhaps ?
>>
>> Note that them being incomplete is not intended as an excuse to not
>> document
>> this, I'm all for better documentation.
>>
>>> STORE remains unchanged, as the MEMORY/etc is in the destination,
>>> where there is a writemask, which is presently used and will remain
>>> effective.
>>
>>
>> Right and note that the first src operand of STORE already takes swizzling
>> into account, so the proposed option 2 will actually make the 2 more
>> inline.
>
>
> Erm, I mean the 2nd src operand of the store of-course, the actual src.
>
> On a related note, comparing handleLOAD and handleSTORE, this bit in
> handleLOAD seems wrong:
>
>  Value *off = fetchSrc(1, c);
>
> I believe that should be:
>
>  Value *off = fetchSrc(1, 0);

Yep, that's wrong. I think I was waffling back and forth early on in
the lifetime of the patchset about how it would work, and one of the
options was to read each dword from a separate offset. (I think I
started implementing atomic buffers well over a year ago, only to be
stymied by the memory window issue and give up for a long time.) I
eventually came to realize that was insanity.

>
> Just like handleSTORE does:
>
>  off = fetchSrc(0, 0);
>
> And always using a 'c' of 0 seems correct here since we are dealing
> with an address.
>
> Once I know which docs to update for this, I'll do a v2 of this patch
> and add a preparation patch fixing the above to the v2 set.

src/gallium/docs/source/tgsi.rst

There are push hooks which make readthedocs.org re-pull from the mesa
repo on every push so that things are up to date (well, it takes a few
minutes to regenerate the html).

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [Nouveau] [PATCH] nouveau: codegen: Take src swizzle into account on loads

2016-04-08 Thread Hans de Goede


Hi,

On 08-04-16 18:06, Hans de Goede wrote:

Hi,

On 08-04-16 17:45, Ilia Mirkin wrote:

On Fri, Apr 8, 2016 at 11:28 AM, Hans de Goede  wrote:

When dealing with non vector variables the llvm register allocator
will use TEMP[0].x then TEMP[0].y, etc.

When loading something from a global buffer it will calculate the
address to use, and store that in say TEMP[0].x, so it ends up
generating:

LOAD TEMP[0].y, MEMORY[0], TEMP[0]

Expecting the contents of TEMP[0].y to become the 32 bits of data
to which TEMP[0].x is pointing. But instead it will get the 32 bits of
data at address (TEMP[0].x + 4).

With the old RES[32767] code one could generate the following TGSI:

LOAD TEMP[0].y, RES[32767]., TEMP[0]

And things would work fine since the . swizzling postfix would
be honored and when storing to y (the only component set in the dest-mask)
the x component at address (TEMP[0].x) would be loaded, rather then the
y component at (TEMP[0].y)

Note that another approach would be to not increment the address by
a 32 bit word for skipped (not set in destmask) components.

The way I see it either:

1) We see that LOAD does not deal with vectors, but with flat memory,
in which case skipping 4 bytes because x is not set in the destmask
does not make sense, as that is a vector thing todo.

2) LOAD is vector layout aware in which case supporting swizzling
makes sense.

Currently we have a weird hybrid which is rather cumbersome to
work with from a compiler pov.


And I guess LLVM never ends up generating any of the other "funny"
instructions like LIT and the such. Well, I have no problem adding the
swizzling logic, i.e. the way that LOAD will now work (logically) is
that it will

(a) fetch 4 values from the coordinates provided (4 sequential dwords
from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in
the case of images)
(b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE argument
(c) store that swizzled result into the destination based on the writemask

That would sound reasonable to me, and if I understand correctly, is
option 2 of your proposal.


Yes that is option 2, and is basically what the patch which started this
thread does. So that would work for me :)


We'd need some docs updates and buy-in from the other gallium driver developers.


What docs would need updating ? The TGSI docs I'm aware of are at:

http://gallium.readthedocs.org/en/latest/tgsi.html

I assume those have a source in the mesa src somewhere (I've not looked),
but those mostly just look quite incomplete in general when it comes to TGSI
(I've had to revert to figuring what things do from the mesa srcs quite often)

Have I been looking at the wrong docs perhaps ?

Note that them being incomplete is not intended as an excuse to not document
this, I'm all for better documentation.


STORE remains unchanged, as the MEMORY/etc is in the destination,
where there is a writemask, which is presently used and will remain
effective.


Right and note that the first src operand of STORE already takes swizzling
into account, so the proposed option 2 will actually make the 2 more inline.


Erm, I mean the 2nd src operand of the store of-course, the actual src.

On a related note, comparing handleLOAD and handleSTORE, this bit in
handleLOAD seems wrong:

 Value *off = fetchSrc(1, c);

I believe that should be:

 Value *off = fetchSrc(1, 0);

Just like handleSTORE does:

 off = fetchSrc(0, 0);

And always using a 'c' of 0 seems correct here since we are dealing
with an address.

Once I know which docs to update for this, I'll do a v2 of this patch
and add a preparation patch fixing the above to the v2 set.

Regards,

Hans
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH mesa v2 1/2] nouveau: codegen: Use FILE_MEMORY_BUFFER for buffers

2016-04-08 Thread Samuel Pitoiset




On 04/08/2016 12:17 PM, Hans de Goede wrote:

Hi,

On 23-03-16 23:10, Samuel Pitoiset wrote:

Are you sure this won't break compute shaders on fermi?
Could you please double-check that?


I just checked:

lspci:
01:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT
610] (rev a1)

Before this patch-set:

[hans@plank piglit]$ ./piglit run -o shader -t
'.*arb_shader_storage_buffer_object.*' results/shader
[9/9] pass: 9 /

After this patch-set:

[hans@plank piglit]$ ./piglit run -o shader -t
'.*arb_shader_storage_buffer_object.*' results/shader
[9/9] pass: 9 /


And what about compute shaders? (ie. -t arb_compute_shader)
Sorry to ask you again but I just want to be sure it's fine. :-)




One minor comment below.

On 03/17/2016 05:07 PM, Hans de Goede wrote:

Some of the lowering steps we currently do for FILE_MEMORY_GLOBAL only
apply to buffers, making it impossible to use FILE_MEMORY_GLOBAL for
OpenCL global buffers.

This commits changes the buffer code to use FILE_MEMORY_BUFFER at the
ir_from_tgsi and lowering steps, freeing use of FILE_MEMORY_GLOBAL
for use with OpenCL global buffers.

Note that after lowering buffer accesses use the FILE_MEMORY_GLOBAL
register file.

Tested with piglet on a gk107, before this patch:
./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*'
results/shader
[9/9] pass: 9 /
after:
./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*'
results/shader
[9/9] pass: 9 /

Signed-off-by: Hans de Goede 
---
Changes in v2:
-New patch in v2 of patch-set to re-enable support for global opencl
buffers
---
  src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 +
  src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 +-
  src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 8
+---
  src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp | 1 +
  src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp   | 5
-
  src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp   | 1 +
  6 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index 7b0eb2f..5141fc6 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -332,6 +332,7 @@ enum DataFile
 FILE_MEMORY_CONST,
 FILE_SHADER_INPUT,
 FILE_SHADER_OUTPUT,
+   FILE_MEMORY_BUFFER,
 FILE_MEMORY_GLOBAL,
 FILE_MEMORY_SHARED,
 FILE_MEMORY_LOCAL,
diff --git
a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index baa2e30..7ae0cb2 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -373,7 +373,7 @@ static nv50_ir::DataFile translateFile(uint file)
 case TGSI_FILE_PREDICATE:   return nv50_ir::FILE_PREDICATE;
 case TGSI_FILE_IMMEDIATE:   return nv50_ir::FILE_IMMEDIATE;
 case TGSI_FILE_SYSTEM_VALUE:return nv50_ir::FILE_SYSTEM_VALUE;
-   case TGSI_FILE_BUFFER:  return nv50_ir::FILE_MEMORY_GLOBAL;
+   case TGSI_FILE_BUFFER:  return nv50_ir::FILE_MEMORY_BUFFER;
 case TGSI_FILE_MEMORY:  return nv50_ir::FILE_MEMORY_GLOBAL;
 case TGSI_FILE_SAMPLER:
 case TGSI_FILE_NULL:
diff --git
a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index d0936d8..628deb7 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -1141,13 +1141,14 @@ NVC0LoweringPass::handleATOM(Instruction *atom)
handleSharedATOM(atom);
return true;
 default:
-  assert(atom->src(0).getFile() == FILE_MEMORY_GLOBAL);
+  assert(atom->src(0).getFile() == FILE_MEMORY_BUFFER);
base = loadResInfo64(ind, atom->getSrc(0)->reg.fileIndex * 16);
assert(base->reg.size == 8);
if (ptr)
   base = bld.mkOp2v(OP_ADD, TYPE_U64, base, base, ptr);
assert(base->reg.size == 8);
atom->setIndirect(0, 0, base);
+  atom->getSrc(0)->reg.file = FILE_MEMORY_GLOBAL;
return true;
 }
 base =
@@ -1963,7 +1964,7 @@ NVC0LoweringPass::visit(Instruction *i)
} else if (i->src(0).getFile() == FILE_SHADER_OUTPUT) {
   assert(prog->getType() ==
Program::TYPE_TESSELLATION_CONTROL);
   i->op = OP_VFETCH;
-  } else if (i->src(0).getFile() == FILE_MEMORY_GLOBAL) {
+  } else if (i->src(0).getFile() == FILE_MEMORY_BUFFER) {
   Value *ind = i->getIndirect(0, 1);
   Value *ptr = loadResInfo64(ind,
i->getSrc(0)->reg.fileIndex * 16);
   // XXX come up with a way not to do this for EVERY little
access but
@@ -1978,6 +1979,7 @@ NVC0LoweringPass::visit(Instruction *i)
   }
   i->setIndirect(0, 1, NULL);

Re: [Mesa-dev] [PATCH] nouveau: codegen: Take src swizzle into account on loads

2016-04-08 Thread Hans de Goede


Hi,

On 08-04-16 17:45, Ilia Mirkin wrote:

On Fri, Apr 8, 2016 at 11:28 AM, Hans de Goede  wrote:

When dealing with non vector variables the llvm register allocator
will use TEMP[0].x then TEMP[0].y, etc.

When loading something from a global buffer it will calculate the
address to use, and store that in say TEMP[0].x, so it ends up
generating:

LOAD TEMP[0].y, MEMORY[0], TEMP[0]

Expecting the contents of TEMP[0].y to become the 32 bits of data
to which TEMP[0].x is pointing. But instead it will get the 32 bits of
data at address (TEMP[0].x + 4).

With the old RES[32767] code one could generate the following TGSI:

LOAD TEMP[0].y, RES[32767]., TEMP[0]

And things would work fine since the . swizzling postfix would
be honored and when storing to y (the only component set in the dest-mask)
the x component at address (TEMP[0].x) would be loaded, rather then the
y component at (TEMP[0].y)

Note that another approach would be to not increment the address by
a 32 bit word for skipped (not set in destmask) components.

The way I see it either:

1) We see that LOAD does not deal with vectors, but with flat memory,
in which case skipping 4 bytes because x is not set in the destmask
does not make sense, as that is a vector thing todo.

2) LOAD is vector layout aware in which case supporting swizzling
makes sense.

Currently we have a weird hybrid which is rather cumbersome to
work with from a compiler pov.


And I guess LLVM never ends up generating any of the other "funny"
instructions like LIT and the such. Well, I have no problem adding the
swizzling logic, i.e. the way that LOAD will now work (logically) is
that it will

(a) fetch 4 values from the coordinates provided (4 sequential dwords
from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in
the case of images)
(b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE argument
(c) store that swizzled result into the destination based on the writemask

That would sound reasonable to me, and if I understand correctly, is
option 2 of your proposal.


Yes that is option 2, and is basically what the patch which started this
thread does. So that would work for me :)


We'd need some docs updates and buy-in from the other gallium driver developers.


What docs would need updating ? The TGSI docs I'm aware of are at:

http://gallium.readthedocs.org/en/latest/tgsi.html

I assume those have a source in the mesa src somewhere (I've not looked),
but those mostly just look quite incomplete in general when it comes to TGSI
(I've had to revert to figuring what things do from the mesa srcs quite often)

Have I been looking at the wrong docs perhaps ?

Note that them being incomplete is not intended as an excuse to not document
this, I'm all for better documentation.


STORE remains unchanged, as the MEMORY/etc is in the destination,
where there is a writemask, which is presently used and will remain
effective.


Right and note that the first src operand of STORE already takes swizzling
into account, so the proposed option 2 will actually make the 2 more inline.

Regards,

Hans
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nouveau: codegen: Take src swizzle into account on loads

2016-04-08 Thread Ilia Mirkin

On Fri, Apr 8, 2016 at 11:28 AM, Hans de Goede  wrote:
> When dealing with non vector variables the llvm register allocator
> will use TEMP[0].x then TEMP[0].y, etc.
>
> When loading something from a global buffer it will calculate the
> address to use, and store that in say TEMP[0].x, so it ends up
> generating:
>
> LOAD TEMP[0].y, MEMORY[0], TEMP[0]
>
> Expecting the contents of TEMP[0].y to become the 32 bits of data
> to which TEMP[0].x is pointing. But instead it will get the 32 bits of
> data at address (TEMP[0].x + 4).
>
> With the old RES[32767] code one could generate the following TGSI:
>
> LOAD TEMP[0].y, RES[32767]., TEMP[0]
>
> And things would work fine since the . swizzling postfix would
> be honored and when storing to y (the only component set in the dest-mask)
> the x component at address (TEMP[0].x) would be loaded, rather then the
> y component at (TEMP[0].y)
>
> Note that another approach would be to not increment the address by
> a 32 bit word for skipped (not set in destmask) components.
>
> The way I see it either:
>
> 1) We see that LOAD does not deal with vectors, but with flat memory,
> in which case skipping 4 bytes because x is not set in the destmask
> does not make sense, as that is a vector thing todo.
>
> 2) LOAD is vector layout aware in which case supporting swizzling
> makes sense.
>
> Currently we have a weird hybrid which is rather cumbersome to
> work with from a compiler pov.

And I guess LLVM never ends up generating any of the other "funny"
instructions like LIT and the such. Well, I have no problem adding the
swizzling logic, i.e. the way that LOAD will now work (logically) is
that it will

(a) fetch 4 values from the coordinates provided (4 sequential dwords
from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in
the case of images)
(b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE argument
(c) store that swizzled result into the destination based on the writemask

That would sound reasonable to me, and if I understand correctly, is
option 2 of your proposal. We'd need some docs updates and buy-in from
the other gallium driver developers.

STORE remains unchanged, as the MEMORY/etc is in the destination,
where there is a writemask, which is presently used and will remain
effective.

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] radeon/uvd: alignment fix for decode message buffer

2016-04-08 Thread Boyuan Zhang

Signed-off-by: Boyuan Zhang 
Reviewed-by: Christian König 
---
 src/gallium/drivers/radeon/radeon_uvd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c 
b/src/gallium/drivers/radeon/radeon_uvd.c
index 233f460..098baf2 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -1003,7 +1003,7 @@ static void ruvd_end_frame(struct pipe_video_codec 
*decoder,
 
dec->msg->body.decode.dpb_size = dec->dpb.res->buf->size;
dec->msg->body.decode.bsd_size = bs_size;
-   dec->msg->body.decode.db_pitch = dec->base.width;
+   dec->msg->body.decode.db_pitch = align(dec->base.width, 16);
 
dt = dec->set_dtb(dec->msg, (struct vl_video_buffer *)target);
if (((struct r600_common_screen*)dec->screen)->family >= CHIP_STONEY)
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/4] nir: Silence unused "options" warning in algebraic passes.

2016-04-08 Thread Eduardo Lima Mitev


On 04/08/2016 01:35 AM, Kenneth Graunke wrote:

Some passes may not refer to options->..., at which point the compiler
will warn about an unused variable.  Just cast to void unconditionally
to shut it up.

Signed-off-by: Kenneth Graunke 
---
  src/compiler/nir/nir_algebraic.py | 1 +
  1 file changed, 1 insertion(+)

diff --git a/src/compiler/nir/nir_algebraic.py 
b/src/compiler/nir/nir_algebraic.py
index d05564f..53a7907 100644
--- a/src/compiler/nir/nir_algebraic.py
+++ b/src/compiler/nir/nir_algebraic.py
@@ -291,6 +291,7 @@ ${pass_name}(nir_shader *shader)
 bool progress = false;
 bool condition_flags[${len(condition_list)}];
 const nir_shader_compiler_options *options = shader->options;
+   (void) options;



Hmm, don't like this very much. I suppose since it is generated code, it 
can not be done per block where options is unused.

For respect to other people's right to have clean build logs, patch is:

Reviewed-by: Eduardo Lima Mitev 


 % for index, condition in enumerate(condition_list):
 condition_flags[${index}] = ${condition};



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nouveau: codegen: Take src swizzle into account on loads

2016-04-08 Thread Hans de Goede


Hi,

On 08-04-16 17:02, Ilia Mirkin wrote:

On Fri, Apr 8, 2016 at 5:27 AM, Hans de Goede  wrote:

Hi,

On 07-04-16 15:58, Ilia Mirkin wrote:


That's wrong.



It used to work with the old RES[] code and if one cannot specify
a source swizzle, then how can I do something like

LOAD TEMP[0].y, MEMORY[0], address

And get the data at absolute global memory address "address" into TEMP[0].y
?

This is a must-have for llvm to be able to generate working TGSI code,
I do not see any way around this.

AFAIK this is exactly what src-swizzling is for. Also note that
this commit does not change anything if no src-swizzling is specified,
in that case things work exactly as before.


The spec for the instruction needs to be clarified...

The current nouveau impl is correct - only the .x of the address
should be loaded, with up to 16 bytes read into the destination.



Ah note this is not about swizzling on the address, that indeed
makes no sense given how the addressing works for BUFFERS / MEMORY,
no this is about adding a swizlling postfix to the buffer / memory
resource specification, for example:

LOAD TEMP[0].y, MEMORY[0]., TEMP[0]

See the swizzling is done on the resource, not on the address, so
the swizzling specifies swizzling of the up to 16 bytes read from
address, it does not influence the address handling at all.

I now see I made an error in my commit msg, it gives the following
example:

LOAD TEMP[0].y, MEMORY[0]., TEMP[0].x

This clearly is wrong, the last TEMP[0].x is not even valid TGSI,
the correct example would be:

LOAD TEMP[0].y, MEMORY[0]., TEMP[0]


I stand by my comment of "working as intended". But that doesn't mean
the intent can't be changed :)

For memory/buffers, LOAD takes the address at TEMP[0].x and loads 16
bytes (4 words), and sticks them into the destination's .xyzw. If you
happen to have a writemask, then only some of those are written out.

It seems that you're trying to add additional meaning to the swizzle
on the "memory" argument. However I don't believe that such a thing is
defined. (And definitely not used anywhere, at least not on purpose.)

Why does this cause you issues with LLVM-generated TGSI?


When dealing with non vector variables the llvm register allocator
will use TEMP[0].x then TEMP[0].y, etc.

When loading something from a global buffer it will calculate the
address to use, and store that in say TEMP[0].x, so it ends up
generating:

LOAD TEMP[0].y, MEMORY[0], TEMP[0]

Expecting the contents of TEMP[0].y to become the 32 bits of data
to which TEMP[0].x is pointing. But instead it will get the 32 bits of
data at address (TEMP[0].x + 4).

With the old RES[32767] code one could generate the following TGSI:

LOAD TEMP[0].y, RES[32767]., TEMP[0]

And things would work fine since the . swizzling postfix would
be honored and when storing to y (the only component set in the dest-mask)
the x component at address (TEMP[0].x) would be loaded, rather then the
y component at (TEMP[0].y)

Note that another approach would be to not increment the address by
a 32 bit word for skipped (not set in destmask) components.

The way I see it either:

1) We see that LOAD does not deal with vectors, but with flat memory,
in which case skipping 4 bytes because x is not set in the destmask
does not make sense, as that is a vector thing todo.

2) LOAD is vector layout aware in which case supporting swizzling
makes sense.

Currently we have a weird hybrid which is rather cumbersome to
work with from a compiler pov.

Regards,

Hans















   -ilia


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/4] nir: Do basic constant reassociation.

2016-04-08 Thread Eduardo Lima Mitev


On 04/08/2016 01:35 AM, Kenneth Graunke wrote:

Many shaders contain expression trees of the form:

 const_1 * (value * const_2)

Reorganizing these to

 (const_1 * const_2) * value

will allow constant folding to combine the constants.  Sometimes, these
constants are 2 and 0.5, so we can remove a multiply altogether.  Other
times, it can create more immediate constants, which can actually hurt.

Finding a good balance here is tricky.  While much more could be done,
this simple patch seems to have a lot of positive benefit while having
a low downside.



Makes sense.

Reviewed-by: Eduardo Lima Mitev 


shader-db results on Broadwell:

total instructions in shared programs: 8963768 -> 8961369 (-0.03%)
instructions in affected programs: 438318 -> 435919 (-0.55%)
helped: 1502
HURT: 245

total cycles in shared programs: 71527354 -> 71421516 (-0.15%)
cycles in affected programs: 11541788 -> 11435950 (-0.92%)
helped: 3445
HURT: 1224



FWIW, these are my results on HSW:

total instructions in shared programs: 6223995 -> 6222470 (-0.02%)
instructions in affected programs: 300945 -> 299420 (-0.51%)
helped: 1231
HURT: 240

total cycles in shared programs: 56809432 -> 56615552 (-0.34%)
cycles in affected programs: 7980160 -> 7786280 (-2.43%)
helped: 2087
HURT: 634

Eduardo


Signed-off-by: Kenneth Graunke 
---
  src/compiler/nir/nir_opt_algebraic.py | 8 
  1 file changed, 8 insertions(+)

diff --git a/src/compiler/nir/nir_opt_algebraic.py 
b/src/compiler/nir/nir_opt_algebraic.py
index e72b4a7..420d9d9 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -274,6 +274,14 @@ optimizations = [
 (('fmul', ('fneg', a), b), ('fneg', ('fmul', a, b))),
 (('imul', ('ineg', a), b), ('ineg', ('imul', a, b))),

+   # Reassociate constants in add/mul chains so they can be folded together.
+   # For now, we only handle cases where the constants are separated by
+   # a single non-constant.  We could do better eventually.
+   (('~fmul', '#a', ('fmul', b, '#c')), ('fmul', ('fmul', a, c), b)),
+   (('imul', '#a', ('imul', b, '#c')), ('imul', ('imul', a, c), b)),
+   (('~fadd', '#a', ('fadd', b, '#c')), ('fadd', ('fadd', a, c), b)),
+   (('iadd', '#a', ('iadd', b, '#c')), ('iadd', ('iadd', a, c), b)),
+
 # Misc. lowering
 (('fmod', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b, 
'options->lower_fmod'),
 (('uadd_carry', a, b), ('b2i', ('ult', ('iadd', a, b), a)), 
'options->lower_uadd_carry'),



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 00/15] i965: Rework uniform handling in the back-end

2016-04-08 Thread Jason Ekstrand

On Apr 7, 2016 11:08 PM, "Jason Ekstrand"  wrote:
>
>
> On Apr 7, 2016 10:01 PM, "Matt Turner"  wrote:
> >
> > On Tue, Mar 22, 2016 at 3:33 PM, Jason Ekstrand 
wrote:
> > > This is mostly a re-send of a patch series I've had floating around
in one
> > > form or a while for quite some time.  It's basically the same except
that
> > > the original version was missing a work-around for Sandy Bridge.  For
a
> > > while, I wasn't really pushing to get it merged because I couldn't
> > > demonstrate any actual performance benifit from pushing arrays.
However,
> > > with the Vulkan API, the concept of push constants is directly
exposed to
> > > the user and we really need to be able to indirect on them.  This
series
> > > makes the FS backend 100% ready for indirect push constants;  vec4
will
> > > take a little more work.
> > >
> > > It's worth noting that we've been carying these patches around in our
> > > Vulkan driver for probably 3 or 4 months now and it's working great.
> > >
> > > For those that prefer to review on a branch:
> > >
> > >
https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/i965-uniforms
> > >
> > > I think Kristian has mostly reviewed these patches.  However, he
never sent
> > > any R-Bs to the list.  I'd also like Ken or Matt to look at it from a
> > > design perspective.
> >
> > I don't know what I think. I'm sympathetic to Curro's argument, but in
> > the absence of more data it's hard to judge anything really. I'm not
> > at all sympathetic to
> >
> > """
> > Do I have a proof-of-concept in code, no.  However, I've run through
> > it in my head and I have a pretty good idea what it would look like.
> > You are free to go off and do it if you don't believe me, but I don't
> > really want to hold things up while you do.
> > """
> >
> > That's what... An Appeal to Your Brain? :)
>
> Sort-of... It was more a remark of frustration at the (percieved)
implication that I hadn't thought about it or at the very least hadnt given
it a fair shake.  In a bit more detail here are some of my thoughts on
reladdr and an ADDR file in no particular order
>
> a) Not a single FS optimization pass handles it.  Yes, "if you see
reladdr, bail” is a valid (if suboptimal) strategy 90% of the time.
However, anything that computes any sort of kill set now needs a recursive
algorithm to walk register sources.  We do handle this in NIR and it's not
terrible but it does come with nontrivial pain and retrofitting it isn't
necessarily going to be quick-and-easy.  Curro's response of "use-def
chains will fix this" while probably accurate doesn't solve the immediate
problem while these patches have been on the list for 6 months.
>
> b) The hardware doesn't do reladdr.  It has an address register with
substantial restrictions.  Eventually, we would need to lower to something
that writes the address register and have an indirect source type that
consumes it.  If you end up with two indirect sources, we have to emit a
move for one of them.  Where do we do that lowering?  Do we do it in the
generator or as a pass?
>
> c) If we handle it all in the generator, we have no ability to schedule
it at all.  It also makes the generator far more complex.
>
> d) If we handle it in a lowering pass, what does that pass produce? Do we
expose the ADDR file and try to do RA on it or do we treat it as a fixed
thing like flag?  In either case, we need to add extra logic to at least
the scheduler if not other places to add this whole new concept.
>
> e) If we allow indirect sources of any sort, how do we carry range
information around post-RA.  Pre-RA we can theoretically just say if you
indirect you touch the whole thing.  Post-RA, you either have to carry that
information around per-instruction or you have to assume that any
instruction that uses an indirect source could be reading anything in the
entire GRF and it becomes almost a complete scheduling barrier.
>
> Those are the thoughts that pop to the top. I could come up with more if
you'd like.
>
> So, yes, using reladdr or or an ADDR file would be possible but it would
involve substantial IR surgery.  What's the benefit?  You can put the
relative source directly in the instruction that uses it and maybe do an
address calculation directly to the address register instead of having to
move it there.  The approach I've taken on the other hand, neatly
side-steps all of the issues listed above.  This comes at the cost of a few
extra instructions (which you probably have to spend anyway on gen7).  I
think that trade-off is worthwhile.

There are some other things I found to be very nice about the approach. One
is that we got too stop carrying around these arrays of extra uniform size
information.  They're the only substantial per-backend bit of uniform setup
that we do and they're only used for knowing how big the indirected
uniforms are. Their existence has bugged me ever since we first got NIR
going.  They're not too

Re: [Mesa-dev] [PATCH] nouveau: codegen: Take src swizzle into account on loads

2016-04-08 Thread Ilia Mirkin

On Fri, Apr 8, 2016 at 5:27 AM, Hans de Goede  wrote:
> Hi,
>
> On 07-04-16 15:58, Ilia Mirkin wrote:
>>
>> That's wrong.
>
>
> It used to work with the old RES[] code and if one cannot specify
> a source swizzle, then how can I do something like
>
> LOAD TEMP[0].y, MEMORY[0], address
>
> And get the data at absolute global memory address "address" into TEMP[0].y
> ?
>
> This is a must-have for llvm to be able to generate working TGSI code,
> I do not see any way around this.
>
> AFAIK this is exactly what src-swizzling is for. Also note that
> this commit does not change anything if no src-swizzling is specified,
> in that case things work exactly as before.
>
>> The spec for the instruction needs to be clarified...
>>
>> The current nouveau impl is correct - only the .x of the address
>> should be loaded, with up to 16 bytes read into the destination.
>
>
> Ah note this is not about swizzling on the address, that indeed
> makes no sense given how the addressing works for BUFFERS / MEMORY,
> no this is about adding a swizlling postfix to the buffer / memory
> resource specification, for example:
>
> LOAD TEMP[0].y, MEMORY[0]., TEMP[0]
>
> See the swizzling is done on the resource, not on the address, so
> the swizzling specifies swizzling of the up to 16 bytes read from
> address, it does not influence the address handling at all.
>
> I now see I made an error in my commit msg, it gives the following
> example:
>
> LOAD TEMP[0].y, MEMORY[0]., TEMP[0].x
>
> This clearly is wrong, the last TEMP[0].x is not even valid TGSI,
> the correct example would be:
>
> LOAD TEMP[0].y, MEMORY[0]., TEMP[0]

I stand by my comment of "working as intended". But that doesn't mean
the intent can't be changed :)

For memory/buffers, LOAD takes the address at TEMP[0].x and loads 16
bytes (4 words), and sticks them into the destination's .xyzw. If you
happen to have a writemask, then only some of those are written out.

It seems that you're trying to add additional meaning to the swizzle
on the "memory" argument. However I don't believe that such a thing is
defined. (And definitely not used anywhere, at least not on purpose.)

Why does this cause you issues with LLVM-generated TGSI?

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir: remove duplicate Makefile.sources

2016-04-08 Thread Rob Clark

On Fri, Apr 8, 2016 at 4:37 AM, Jose Fonseca  wrote:
> On 02/04/16 19:35, Rob Clark wrote:
>>
>> From: Rob Clark 
>>
>> Allegedly this was needed still by scons build.. but in practice it
>> doesn't seem to be needed.  Removing it and running 'scons' results
>> in no build errors.
>>
>> Signed-off-by: Rob Clark 
>> ---
>> So, afaict NIR is not even built w/ scons build (I'm just running
>> 'scons' with no args, so let me know if I'm missing some build
>> variant).  So at least if there is no scons variant that *does*
>> build NIR, I think this is the right thing to do to reduce
>> confusion.  But it brings up a bigger question of what to do
>> with my patchset which adds NIR support in mesa/st, since that
>> obviosly won't work with scons build as-is.
>>
>> I guess the two options are to try to add NIR into scons build
>> (which involves some .py generated code, so maybe not trivial)
>> or just #ifdef'ify all the mesa/st parts in my gallium-nir
>> patchset which introduce dependencies on NIR.  Opinions?
>
>
> If you might recall, I already had patches to build NIR with SCons:
>
>   http://cgit.freedesktop.org/~jrfonseca/mesa/log/?h=scons-nir

oh, awesome, I'd forgotten about that..  let me see if I can resurrect that

BR,
-R

> I ended up not taking any action at the time.  First due to lack of time,
> second because I noticed NIR source trees being moving around, so the risk
> of exposing scons build to breakage didn't seem worthwhile.
>
> So those changes need to be rebased and probably updated a bit, but it gives
> you an idea of what needs to be done.
>
> Jose
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2] glsl: handle unsigned int wraparound in link_shaders()

2016-04-08 Thread Lars Hamre

NOTE: someone with access will need to commit this afer the
  review process

v2: change check_explicit_uniform_locations() to return an
unsigned 0 (Timothy Arceri)

We were storing the int result of check_explicit_uniform_locations()
in num_explicit_uniform_locs as an unsigned int which caused it to
be 4294967295 when a -1 was returned.

This in turn would cause the following error during linking:
error: count of uniform locations > MAX_UNIFORM_LOCATIONS(4294967295 > 98304)

Here are the results from running piglit tests/all with this patch:
changes: 178
fixes:   176
regressions: 2

The two regressions are for the following tests:
glean@glsl1-matrix column check (1)
glean@glsl1-matrix column check (2)
which regress from FAIL to CRASH.

I am okay with these regressions because the tests are currently
failing due to the aforementioned linker error.

Signed-off-by: Lars Hamre 

---
 src/compiler/glsl/linker.cpp | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
index cd35464..cc74574 100644
--- a/src/compiler/glsl/linker.cpp
+++ b/src/compiler/glsl/linker.cpp
@@ -3164,12 +3164,12 @@ reserve_subroutine_explicit_locations(struct 
gl_shader_program *prog,
  * any optimizations happen to handle also inactive uniforms and
  * inactive array elements that may get trimmed away.
  */
-static int
+static unsigned
 check_explicit_uniform_locations(struct gl_context *ctx,
  struct gl_shader_program *prog)
 {
if (!ctx->Extensions.ARB_explicit_uniform_location)
-  return -1;
+  return 0;

/* This map is used to detect if overlapping explicit locations
 * occur with the same uniform (from different stage) or a different one.
@@ -3178,7 +3178,7 @@ check_explicit_uniform_locations(struct gl_context *ctx,

if (!uniform_map) {
   linker_error(prog, "Out of memory during linking.\n");
-  return -1;
+  return 0;
}

unsigned entries_total = 0;
@@ -3207,7 +3207,7 @@ check_explicit_uniform_locations(struct gl_context *ctx,
 }
 if (!ret) {
delete uniform_map;
-   return -1;
+   return 0;
 }
  }
   }
--
2.5.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nouveau: codegen: Take src swizzle into account on loads

2016-04-08 Thread Ilia Mirkin

Ah, I may have read over your commit too hastily. Will have another look.
On Apr 8, 2016 5:27 AM, "Hans de Goede"  wrote:

> Hi,
>
> On 07-04-16 15:58, Ilia Mirkin wrote:
>
>> That's wrong.
>>
>
> It used to work with the old RES[] code and if one cannot specify
> a source swizzle, then how can I do something like
>
> LOAD TEMP[0].y, MEMORY[0], address
>
> And get the data at absolute global memory address "address" into
> TEMP[0].y ?
>
> This is a must-have for llvm to be able to generate working TGSI code,
> I do not see any way around this.
>
> AFAIK this is exactly what src-swizzling is for. Also note that
> this commit does not change anything if no src-swizzling is specified,
> in that case things work exactly as before.
>
> > The spec for the instruction needs to be clarified...
>
>> The current nouveau impl is correct - only the .x of the address
>> should be loaded, with up to 16 bytes read into the destination.
>>
>
> Ah note this is not about swizzling on the address, that indeed
> makes no sense given how the addressing works for BUFFERS / MEMORY,
> no this is about adding a swizlling postfix to the buffer / memory
> resource specification, for example:
>
> LOAD TEMP[0].y, MEMORY[0]., TEMP[0]
>
> See the swizzling is done on the resource, not on the address, so
> the swizzling specifies swizzling of the up to 16 bytes read from
> address, it does not influence the address handling at all.
>
> I now see I made an error in my commit msg, it gives the following
> example:
>
> LOAD TEMP[0].y, MEMORY[0]., TEMP[0].x
>
> This clearly is wrong, the last TEMP[0].x is not even valid TGSI,
> the correct example would be:
>
> LOAD TEMP[0].y, MEMORY[0]., TEMP[0]
>
> Regards,
>
> Hans
>
>
> On Thu, Apr 7, 2016 at 9:27 AM, Hans de Goede  wrote:
>>
>>> The llvm TGSI backend does things like:
>>>
>>>
>>>
>>> Expecting the data at address TEMP[0].x to get loaded to
>>> TEMP[0].y. Before this commit the data at TEMP[0].x + 4 would be
>>> loaded instead. This commit fixes this.
>>>
>>> Signed-off-by: Hans de Goede 
>>> ---
>>>   src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 8 ++--
>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>>> index 557608e..cc51f5a 100644
>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>>> @@ -2279,12 +2279,16 @@ Converter::handleLOAD(Value *dst0[4])
>>>
>>>Value *off = fetchSrc(1, c);
>>>Symbol *sym;
>>> + uint32_t src0_component_offset = tgsi.getSrc(0).getSwizzle(c)
>>> * 4;
>>> +
>>>if (tgsi.getSrc(1).getFile() == TGSI_FILE_IMMEDIATE) {
>>>   off = NULL;
>>>   sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c,
>>> -  tgsi.getSrc(1).getValueU32(0, info) + 4 * c);
>>> +  tgsi.getSrc(1).getValueU32(0, info) +
>>> +  src0_component_offset);
>>>} else {
>>> -sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c, 4 * c);
>>> +sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c,
>>> +  src0_component_offset);
>>>}
>>>
>>>Instruction *ld = mkLoad(TYPE_U32, dst0[c], sym, off);
>>> --
>>> 2.7.3
>>>
>>>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glsl: handle unsigned int wraparound in link_shaders()

2016-04-08 Thread Lars Hamre

Agreed, I didn't see that check_explicit_uniform_locations() was
only used in link_shaders(). I will submit a v2 with those changes.

On Thu, Apr 7, 2016 at 11:24 PM, Timothy Arceri <
timothy.arc...@collabora.com> wrote:

> On Thu, 2016-04-07 at 11:22 -0400, Lars Hamre wrote:
> > NOTES:
> > * Reposting due to no response last week
> > * Someone with access will need to commit this patch after the review
> >   process
> >
> > check_explicit_uniform_locations() returns a -1 when the extension
> > ARB_explicit_uniform_location is not found.
> >
> > We were storing the result in num_explicit_uniform_locs as an
> > unsigned int which caused it to be 4294967295 when a -1 was returned.
> >
> > This in turn would cause the following error during linking:
> > error: count of uniform locations > MAX_UNIFORM_LOCATIONS(4294967295
> > > 98304)
> >
> > Here are the results from running piglit tests/all with this patch:
> > changes: 178
> > fixes:   176
> > regressions: 2
> >
> > The two regressions are for the following tests:
> > glean@glsl1-matrix column check (1)
> > glean@glsl1-matrix column check (2)
> > which regress from FAIL to CRASH.
> >
> > I am okay with these regressions because the tests are currently
> > failing due to the aforementioned linker error.
> >
> > Signed-off-by: Lars Hamre 
> >
> > ---
> >  src/compiler/glsl/linker.cpp | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/compiler/glsl/linker.cpp
> > b/src/compiler/glsl/linker.cpp
> > index cd35464..a0a9104 100644
> > --- a/src/compiler/glsl/linker.cpp
> > +++ b/src/compiler/glsl/linker.cpp
> > @@ -4171,7 +4171,7 @@ link_shaders(struct gl_context *ctx, struct
> > gl_shader_program *prog)
> >
> > tfeedback_decl *tfeedback_decls = NULL;
> > unsigned num_tfeedback_decls = prog-
> > >TransformFeedback.NumVarying;
> > -   unsigned int num_explicit_uniform_locs = 0;
> > +   int num_explicit_uniform_locs = 0;
> >
> > void *mem_ctx = ralloc_context(NULL); // temporary linker context
> >
> > @@ -4354,6 +4354,8 @@ link_shaders(struct gl_context *ctx, struct
> > gl_shader_program *prog)
> > }
> >
> > num_explicit_uniform_locs = check_explicit_uniform_locations(ctx,
> > prog);
> > +   if (num_explicit_uniform_locs < 0)
> > +  num_explicit_uniform_locs = 0;
>
> Thanks for spotting this. However the just hacks around the problem.
>
> Since we never do anything with the -1 I would rather
> see check_explicit_uniform_locations() changed to return an unsigned 0
> rather than -1.
>
>
> > link_assign_subroutine_types(prog);
> >
> > if (!prog->LinkStatus)
> > @@ -4585,7 +4587,7 @@ link_shaders(struct gl_context *ctx, struct
> > gl_shader_program *prog)
> >
> > update_array_sizes(prog);
> > link_assign_uniform_locations(prog, ctx-
> > >Const.UniformBooleanTrue,
> > - num_explicit_uniform_locs,
> > + (unsigned
> > int)num_explicit_uniform_locs,
> >   ctx-
> > >Const.MaxUserAssignableUniformLocations);
> > link_assign_atomic_counter_resources(ctx, prog);
> > store_fragdepth_layout(prog);
> > --
> > 2.5.5
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: fix mask checking when emitting scissors and viewports

2016-04-08 Thread Marek Olšák

On Fri, Apr 8, 2016 at 12:16 PM,   wrote:
> On 2016-04-08 19:00, Marek Olšák wrote:
>>
>> From: Marek Olšák 
>>
>> ---
>>  src/gallium/drivers/radeonsi/si_state.c | 12 
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_state.c
>> b/src/gallium/drivers/radeonsi/si_state.c
>> index 8087d23..3894e1d 100644
>> --- a/src/gallium/drivers/radeonsi/si_state.c
>> +++ b/src/gallium/drivers/radeonsi/si_state.c
>> @@ -912,8 +912,10 @@ static void si_emit_scissors(struct si_context
>> *sctx, struct r600_atom *atom)
>> bool scissor_enable =
>> sctx->queued.named.rasterizer->scissor_enable;
>>
>> /* The simple case: Only 1 viewport is active. */
>> -   if (mask & 1 &&
>> -   !si_get_vs_info(sctx)->writes_viewport_index) {
>> +   if (!si_get_vs_info(sctx)->writes_viewport_index) {
>> +   if (!(mask & 1))
>
>
> seems a bit tentative.. did you want 1u here or?

1 and 1u are equivalent in this case.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [android-x86-devel] Re: gralloc_drm_pipe

2016-04-08 Thread Chih-Wei Huang

2016-04-07 13:30 GMT+08:00 Rob Herring :
> On Wed, Apr 6, 2016 at 11:12 AM, Chih-Wei Huang  
> wrote:
>
>> I guess the first supported GPU is virgl. Right?
>
> Yes. Any gallium driver really. The classic mesa drivers will need
> their own additions for GBM map/unmap.
>
>> When could we expect it's ready to test?
>
> Not sure. Definitely not until the GBM interface is set. There's not
> really much reason for android-x86 to move to it until it gets flushed
> out.

Could you provide me the patches of mesa?
I'm glad to test it. Thanks!


-- 
Chih-Wei
Android-x86 project
http://www.android-x86.org
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] radeonsi: implement and rely on set_active_query_state

2016-04-08 Thread eocallaghan


Reviewed-by: Edward O'Callaghan 

On 2016-04-08 18:58, Marek Olšák wrote:

From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_blit.c   |  3 ---
 src/gallium/drivers/radeonsi/si_pipe.h   |  4 
 src/gallium/drivers/radeonsi/si_state.c  | 32 
+++-

 src/gallium/drivers/radeonsi/si_state_draw.c | 10 +
 4 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_blit.c
b/src/gallium/drivers/radeonsi/si_blit.c
index c5ea8b1..aed783f 100644
--- a/src/gallium/drivers/radeonsi/si_blit.c
+++ b/src/gallium/drivers/radeonsi/si_blit.c
@@ -52,8 +52,6 @@ static void si_blitter_begin(struct pipe_context
*ctx, enum si_blitter_op op)
 {
struct si_context *sctx = (struct si_context *)ctx;

-   r600_suspend_nontimer_queries(>b);
-
 	util_blitter_save_vertex_buffer_slot(sctx->blitter, 
sctx->vertex_buffer);
 	util_blitter_save_vertex_elements(sctx->blitter, 
sctx->vertex_elements);

util_blitter_save_vertex_shader(sctx->blitter, sctx->vs_shader.cso);
@@ -95,7 +93,6 @@ static void si_blitter_end(struct pipe_context *ctx)
struct si_context *sctx = (struct si_context *)ctx;

sctx->b.render_cond_force_off = false;
-   r600_resume_nontimer_queries(>b);
 }

 static unsigned u_max_sample(struct pipe_resource *r)
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h
b/src/gallium/drivers/radeonsi/si_pipe.h
index 4158fc5..8fcfcd2 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -66,6 +66,9 @@
 /* Compute only. */
 #define SI_CONTEXT_FLUSH_WITH_INV_L2   (R600_CONTEXT_PRIVATE_FLAG <<
13) /* TODO: merge with TC? */
 #define SI_CONTEXT_FLAG_COMPUTE(R600_CONTEXT_PRIVATE_FLAG << 
14)
+/* Pipeline & streamout query controls. */
+#define SI_CONTEXT_START_PIPELINE_STATS	(R600_CONTEXT_PRIVATE_FLAG << 
15)
+#define SI_CONTEXT_STOP_PIPELINE_STATS	(R600_CONTEXT_PRIVATE_FLAG << 
16)


 #define SI_CONTEXT_FLUSH_AND_INV_FRAMEBUFFER 
(SI_CONTEXT_FLUSH_AND_INV_CB | \

  SI_CONTEXT_FLUSH_AND_INV_CB_META 
| \
@@ -289,6 +292,7 @@ struct si_context {
booldb_stencil_clear;
booldb_stencil_disable_expclear;
unsignedps_db_shader_control;
+   boolocclusion_queries_disabled;

/* Emitted draw state. */
int last_base_vertex;
diff --git a/src/gallium/drivers/radeonsi/si_state.c
b/src/gallium/drivers/radeonsi/si_state.c
index a66bd30..6fbbb68 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -1352,6 +1352,26 @@ static void *si_create_db_flush_dsa(struct
si_context *sctx)

 /* DB RENDER STATE */

+static void si_set_active_query_state(struct pipe_context *ctx, 
boolean enable)

+{
+   struct si_context *sctx = (struct si_context*)ctx;
+
+   /* Pipeline stat & streamout queries. */
+   if (enable) {
+   sctx->b.flags &= ~SI_CONTEXT_STOP_PIPELINE_STATS;
+   sctx->b.flags |= SI_CONTEXT_START_PIPELINE_STATS;
+   } else {
+   sctx->b.flags &= ~SI_CONTEXT_START_PIPELINE_STATS;
+   sctx->b.flags |= SI_CONTEXT_STOP_PIPELINE_STATS;
+   }
+
+   /* Occlusion queries. */
+   if (sctx->occlusion_queries_disabled != !enable) {
+   sctx->occlusion_queries_disabled = !enable;
+   si_mark_atom_dirty(sctx, >db_render_state);
+   }
+}
+
 static void si_set_occlusion_query_state(struct pipe_context *ctx, 
bool enable)

 {
struct si_context *sctx = (struct si_context*)ctx;
@@ -1386,7 +1406,8 @@ static void si_emit_db_render_state(struct
si_context *sctx, struct r600_atom *s
}

/* DB_COUNT_CONTROL (occlusion queries) */
-   if (sctx->b.num_occlusion_queries > 0) {
+   if (sctx->b.num_occlusion_queries > 0 &&
+   !sctx->occlusion_queries_disabled) {
bool perfect = sctx->b.num_perfect_occlusion_queries > 0;

if (sctx->b.chip_class >= CIK) {
@@ -3765,6 +3786,7 @@ void si_init_state_functions(struct si_context 
*sctx)

sctx->b.b.set_min_samples = si_set_min_samples;
sctx->b.b.set_tess_state = si_set_tess_state;

+   sctx->b.b.set_active_query_state = si_set_active_query_state;
sctx->b.set_occlusion_query_state = si_set_occlusion_query_state;
sctx->b.need_gfx_cs_space = si_need_gfx_cs_space;

@@ -3995,6 +4017,14 @@ static void si_init_config(struct si_context 
*sctx)

si_pm4_cmd_add(pm4, 0x8000);
si_pm4_cmd_end(pm4, false);

+   /* This enables pipeline stat & streamout queries.
+* They are only disabled by blits.
+*/
+   si_pm4_cmd_begin(pm4, PKT3_EVENT_WRITE);
+   si_pm4_cmd_add(pm4, EVENT_TYPE(V_028A90_PIPELINESTAT_START) |
+

Re: [Mesa-dev] [PATCH mesa v2 1/2] nouveau: codegen: Use FILE_MEMORY_BUFFER for buffers

2016-04-08 Thread Hans de Goede


Hi,

On 23-03-16 23:10, Samuel Pitoiset wrote:

Are you sure this won't break compute shaders on fermi?
Could you please double-check that?


I just checked:

lspci:
01:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 610] 
(rev a1)

Before this patch-set:

[hans@plank piglit]$ ./piglit run -o shader -t 
'.*arb_shader_storage_buffer_object.*' results/shader
[9/9] pass: 9 /

After this patch-set:

[hans@plank piglit]$ ./piglit run -o shader -t 
'.*arb_shader_storage_buffer_object.*' results/shader
[9/9] pass: 9 /


One minor comment below.

On 03/17/2016 05:07 PM, Hans de Goede wrote:

Some of the lowering steps we currently do for FILE_MEMORY_GLOBAL only
apply to buffers, making it impossible to use FILE_MEMORY_GLOBAL for
OpenCL global buffers.

This commits changes the buffer code to use FILE_MEMORY_BUFFER at the
ir_from_tgsi and lowering steps, freeing use of FILE_MEMORY_GLOBAL
for use with OpenCL global buffers.

Note that after lowering buffer accesses use the FILE_MEMORY_GLOBAL
register file.

Tested with piglet on a gk107, before this patch:
./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader
[9/9] pass: 9 /
after:
./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader
[9/9] pass: 9 /

Signed-off-by: Hans de Goede 
---
Changes in v2:
-New patch in v2 of patch-set to re-enable support for global opencl buffers
---
  src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 +
  src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 +-
  src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 8 +---
  src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp | 1 +
  src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp   | 5 -
  src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp   | 1 +
  6 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index 7b0eb2f..5141fc6 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -332,6 +332,7 @@ enum DataFile
 FILE_MEMORY_CONST,
 FILE_SHADER_INPUT,
 FILE_SHADER_OUTPUT,
+   FILE_MEMORY_BUFFER,
 FILE_MEMORY_GLOBAL,
 FILE_MEMORY_SHARED,
 FILE_MEMORY_LOCAL,
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index baa2e30..7ae0cb2 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -373,7 +373,7 @@ static nv50_ir::DataFile translateFile(uint file)
 case TGSI_FILE_PREDICATE:   return nv50_ir::FILE_PREDICATE;
 case TGSI_FILE_IMMEDIATE:   return nv50_ir::FILE_IMMEDIATE;
 case TGSI_FILE_SYSTEM_VALUE:return nv50_ir::FILE_SYSTEM_VALUE;
-   case TGSI_FILE_BUFFER:  return nv50_ir::FILE_MEMORY_GLOBAL;
+   case TGSI_FILE_BUFFER:  return nv50_ir::FILE_MEMORY_BUFFER;
 case TGSI_FILE_MEMORY:  return nv50_ir::FILE_MEMORY_GLOBAL;
 case TGSI_FILE_SAMPLER:
 case TGSI_FILE_NULL:
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index d0936d8..628deb7 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -1141,13 +1141,14 @@ NVC0LoweringPass::handleATOM(Instruction *atom)
handleSharedATOM(atom);
return true;
 default:
-  assert(atom->src(0).getFile() == FILE_MEMORY_GLOBAL);
+  assert(atom->src(0).getFile() == FILE_MEMORY_BUFFER);
base = loadResInfo64(ind, atom->getSrc(0)->reg.fileIndex * 16);
assert(base->reg.size == 8);
if (ptr)
   base = bld.mkOp2v(OP_ADD, TYPE_U64, base, base, ptr);
assert(base->reg.size == 8);
atom->setIndirect(0, 0, base);
+  atom->getSrc(0)->reg.file = FILE_MEMORY_GLOBAL;
return true;
 }
 base =
@@ -1963,7 +1964,7 @@ NVC0LoweringPass::visit(Instruction *i)
} else if (i->src(0).getFile() == FILE_SHADER_OUTPUT) {
   assert(prog->getType() == Program::TYPE_TESSELLATION_CONTROL);
   i->op = OP_VFETCH;
-  } else if (i->src(0).getFile() == FILE_MEMORY_GLOBAL) {
+  } else if (i->src(0).getFile() == FILE_MEMORY_BUFFER) {
   Value *ind = i->getIndirect(0, 1);
   Value *ptr = loadResInfo64(ind, i->getSrc(0)->reg.fileIndex * 16);
   // XXX come up with a way not to do this for EVERY little access but
@@ -1978,6 +1979,7 @@ NVC0LoweringPass::visit(Instruction *i)
   }
   i->setIndirect(0, 1, NULL);
   i->setIndirect(0, 0, ptr);
+ i->getSrc(0)->reg.file = FILE_MEMORY_GLOBAL;
   bld.mkCmp(OP_SET, CC_GT, TYPE_U32, pred, TYPE_U32, offset, length);

Re: [Mesa-dev] [PATCH] radeonsi: fix mask checking when emitting scissors and viewports

2016-04-08 Thread eocallaghan


On 2016-04-08 19:00, Marek Olšák wrote:

From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_state.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state.c
b/src/gallium/drivers/radeonsi/si_state.c
index 8087d23..3894e1d 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -912,8 +912,10 @@ static void si_emit_scissors(struct si_context
*sctx, struct r600_atom *atom)
bool scissor_enable = sctx->queued.named.rasterizer->scissor_enable;

/* The simple case: Only 1 viewport is active. */
-   if (mask & 1 &&
-   !si_get_vs_info(sctx)->writes_viewport_index) {
+   if (!si_get_vs_info(sctx)->writes_viewport_index) {
+   if (!(mask & 1))


seems a bit tentative.. did you want 1u here or?


+   return;
+
 		radeon_set_context_reg_seq(cs, R_028250_PA_SC_VPORT_SCISSOR_0_TL, 
2);

si_emit_one_scissor(cs, >viewports.states[0],
scissor_enable ? [0] : NULL);
@@ -960,8 +962,10 @@ static void si_emit_viewports(struct si_context
*sctx, struct r600_atom *atom)
unsigned mask = sctx->viewports.dirty_mask;

/* The simple case: Only 1 viewport is active. */
-   if (mask & 1 &&
-   !si_get_vs_info(sctx)->writes_viewport_index) {
+   if (!si_get_vs_info(sctx)->writes_viewport_index) {
+   if (!(mask & 1))
+   return;
+
radeon_set_context_reg_seq(cs, R_02843C_PA_CL_VPORT_XSCALE, 6);
radeon_emit(cs, fui(states[0].scale[0]));
radeon_emit(cs, fui(states[0].translate[0]));


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] gallium/radeon: unify checking streamout enable state

2016-04-08 Thread eocallaghan


Reviewed-by: Edward O'Callaghan 

On 2016-04-08 19:00, Marek Olšák wrote:

From: Marek Olšák 

---
 src/gallium/drivers/r600/r600_state_common.c  | 5 ++---
 src/gallium/drivers/radeon/r600_pipe_common.h | 6 ++
 src/gallium/drivers/radeon/r600_streamout.c   | 6 --
 src/gallium/drivers/radeonsi/si_state_draw.c  | 3 +--
 4 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c
b/src/gallium/drivers/r600/r600_state_common.c
index df41d3f..82babeb 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -1841,8 +1841,7 @@ static void r600_draw_vbo(struct pipe_context
*ctx, const struct pipe_draw_info
ia_switch_on_eop = true;
}

-   if (rctx->b.streamout.streamout_enabled ||
-   rctx->b.streamout.prims_gen_query_enabled)
+   if (r600_get_strmout_en(>b))
partial_vs_wave = true;

radeon_set_context_reg(cs, CM_R_028AA8_IA_MULTI_VGT_PARAM,
@@ -2018,7 +2017,7 @@ static void r600_draw_vbo(struct pipe_context
*ctx, const struct pipe_draw_info
rctx->b.family == CHIP_RV635) {
/* if we have gs shader or streamout
   we need to do a wait idle after every draw */
-   if (rctx->gs_shader || rctx->b.streamout.streamout_enabled) {
+   if (rctx->gs_shader || r600_get_strmout_en(>b)) {
 			radeon_set_config_reg(cs, R_008040_WAIT_UNTIL, 
S_008040_WAIT_3D_IDLE(1));

}
}
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h
b/src/gallium/drivers/radeon/r600_pipe_common.h
index 062c319..7da7736 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -639,6 +639,12 @@ r600_resource_reference(struct r600_resource
**ptr, struct r600_resource *res)
(struct pipe_resource *)res);
 }

+static inline bool r600_get_strmout_en(struct r600_common_context 
*rctx)

+{
+   return rctx->streamout.streamout_enabled ||
+  rctx->streamout.prims_gen_query_enabled;
+}
+
 static inline unsigned r600_tex_aniso_filter(unsigned filter)
 {
if (filter <= 1)   return 0;
diff --git a/src/gallium/drivers/radeon/r600_streamout.c
b/src/gallium/drivers/radeon/r600_streamout.c
index e977ed9..fc9ec48 100644
--- a/src/gallium/drivers/radeon/r600_streamout.c
+++ b/src/gallium/drivers/radeon/r600_streamout.c
@@ -311,12 +311,6 @@ void r600_emit_streamout_end(struct
r600_common_context *rctx)
  * are no buffers bound.
  */

-static bool r600_get_strmout_en(struct r600_common_context *rctx)
-{
-   return rctx->streamout.streamout_enabled ||
-  rctx->streamout.prims_gen_query_enabled;
-}
-
 static void r600_emit_streamout_enable(struct r600_common_context 
*rctx,

   struct r600_atom *atom)
 {
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c
b/src/gallium/drivers/radeonsi/si_state_draw.c
index 84b850a..3863e59 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -882,8 +882,7 @@ void si_draw_vbo(struct pipe_context *ctx, const
struct pipe_draw_info *info)
if ((sctx->b.family == CHIP_HAWAII ||
 sctx->b.family == CHIP_TONGA ||
 sctx->b.family == CHIP_FIJI) &&
-   (sctx->b.streamout.streamout_enabled ||
-sctx->b.streamout.prims_gen_query_enabled)) {
+   r600_get_strmout_en(>b)) {
sctx->b.flags |= SI_CONTEXT_VGT_STREAMOUT_SYNC;
}


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nouveau: codegen: Take src swizzle into account on loads

2016-04-08 Thread Hans de Goede


Hi,

On 07-04-16 15:58, Ilia Mirkin wrote:

That's wrong.


It used to work with the old RES[] code and if one cannot specify
a source swizzle, then how can I do something like

LOAD TEMP[0].y, MEMORY[0], address

And get the data at absolute global memory address "address" into TEMP[0].y ?

This is a must-have for llvm to be able to generate working TGSI code,
I do not see any way around this.

AFAIK this is exactly what src-swizzling is for. Also note that
this commit does not change anything if no src-swizzling is specified,
in that case things work exactly as before.

> The spec for the instruction needs to be clarified...

The current nouveau impl is correct - only the .x of the address
should be loaded, with up to 16 bytes read into the destination.


Ah note this is not about swizzling on the address, that indeed
makes no sense given how the addressing works for BUFFERS / MEMORY,
no this is about adding a swizlling postfix to the buffer / memory
resource specification, for example:

LOAD TEMP[0].y, MEMORY[0]., TEMP[0]

See the swizzling is done on the resource, not on the address, so
the swizzling specifies swizzling of the up to 16 bytes read from
address, it does not influence the address handling at all.

I now see I made an error in my commit msg, it gives the following
example:

LOAD TEMP[0].y, MEMORY[0]., TEMP[0].x

This clearly is wrong, the last TEMP[0].x is not even valid TGSI,
the correct example would be:

LOAD TEMP[0].y, MEMORY[0]., TEMP[0]

Regards,

Hans



On Thu, Apr 7, 2016 at 9:27 AM, Hans de Goede  wrote:

The llvm TGSI backend does things like:



Expecting the data at address TEMP[0].x to get loaded to
TEMP[0].y. Before this commit the data at TEMP[0].x + 4 would be
loaded instead. This commit fixes this.

Signed-off-by: Hans de Goede 
---
  src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 557608e..cc51f5a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -2279,12 +2279,16 @@ Converter::handleLOAD(Value *dst0[4])

   Value *off = fetchSrc(1, c);
   Symbol *sym;
+ uint32_t src0_component_offset = tgsi.getSrc(0).getSwizzle(c) * 4;
+
   if (tgsi.getSrc(1).getFile() == TGSI_FILE_IMMEDIATE) {
  off = NULL;
  sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c,
-  tgsi.getSrc(1).getValueU32(0, info) + 4 * c);
+  tgsi.getSrc(1).getValueU32(0, info) +
+  src0_component_offset);
   } else {
-sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c, 4 * c);
+sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c,
+  src0_component_offset);
   }

   Instruction *ld = mkLoad(TYPE_U32, dst0[c], sym, off);
--
2.7.3


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] radeonsi: fix mask checking when emitting scissors and viewports

2016-04-08 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_state.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 8087d23..3894e1d 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -912,8 +912,10 @@ static void si_emit_scissors(struct si_context *sctx, 
struct r600_atom *atom)
bool scissor_enable = sctx->queued.named.rasterizer->scissor_enable;
 
/* The simple case: Only 1 viewport is active. */
-   if (mask & 1 &&
-   !si_get_vs_info(sctx)->writes_viewport_index) {
+   if (!si_get_vs_info(sctx)->writes_viewport_index) {
+   if (!(mask & 1))
+   return;
+
radeon_set_context_reg_seq(cs, 
R_028250_PA_SC_VPORT_SCISSOR_0_TL, 2);
si_emit_one_scissor(cs, >viewports.states[0],
scissor_enable ? [0] : NULL);
@@ -960,8 +962,10 @@ static void si_emit_viewports(struct si_context *sctx, 
struct r600_atom *atom)
unsigned mask = sctx->viewports.dirty_mask;
 
/* The simple case: Only 1 viewport is active. */
-   if (mask & 1 &&
-   !si_get_vs_info(sctx)->writes_viewport_index) {
+   if (!si_get_vs_info(sctx)->writes_viewport_index) {
+   if (!(mask & 1))
+   return;
+
radeon_set_context_reg_seq(cs, R_02843C_PA_CL_VPORT_XSCALE, 6);
radeon_emit(cs, fui(states[0].scale[0]));
radeon_emit(cs, fui(states[0].translate[0]));
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] gallium/radeon: unify checking streamout enable state

2016-04-08 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/r600/r600_state_common.c  | 5 ++---
 src/gallium/drivers/radeon/r600_pipe_common.h | 6 ++
 src/gallium/drivers/radeon/r600_streamout.c   | 6 --
 src/gallium/drivers/radeonsi/si_state_draw.c  | 3 +--
 4 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index df41d3f..82babeb 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -1841,8 +1841,7 @@ static void r600_draw_vbo(struct pipe_context *ctx, const 
struct pipe_draw_info
ia_switch_on_eop = true;
}
 
-   if (rctx->b.streamout.streamout_enabled ||
-   rctx->b.streamout.prims_gen_query_enabled)
+   if (r600_get_strmout_en(>b))
partial_vs_wave = true;
 
radeon_set_context_reg(cs, CM_R_028AA8_IA_MULTI_VGT_PARAM,
@@ -2018,7 +2017,7 @@ static void r600_draw_vbo(struct pipe_context *ctx, const 
struct pipe_draw_info
rctx->b.family == CHIP_RV635) {
/* if we have gs shader or streamout
   we need to do a wait idle after every draw */
-   if (rctx->gs_shader || rctx->b.streamout.streamout_enabled) {
+   if (rctx->gs_shader || r600_get_strmout_en(>b)) {
radeon_set_config_reg(cs, R_008040_WAIT_UNTIL, 
S_008040_WAIT_3D_IDLE(1));
}
}
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index 062c319..7da7736 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -639,6 +639,12 @@ r600_resource_reference(struct r600_resource **ptr, struct 
r600_resource *res)
(struct pipe_resource *)res);
 }
 
+static inline bool r600_get_strmout_en(struct r600_common_context *rctx)
+{
+   return rctx->streamout.streamout_enabled ||
+  rctx->streamout.prims_gen_query_enabled;
+}
+
 static inline unsigned r600_tex_aniso_filter(unsigned filter)
 {
if (filter <= 1)   return 0;
diff --git a/src/gallium/drivers/radeon/r600_streamout.c 
b/src/gallium/drivers/radeon/r600_streamout.c
index e977ed9..fc9ec48 100644
--- a/src/gallium/drivers/radeon/r600_streamout.c
+++ b/src/gallium/drivers/radeon/r600_streamout.c
@@ -311,12 +311,6 @@ void r600_emit_streamout_end(struct r600_common_context 
*rctx)
  * are no buffers bound.
  */
 
-static bool r600_get_strmout_en(struct r600_common_context *rctx)
-{
-   return rctx->streamout.streamout_enabled ||
-  rctx->streamout.prims_gen_query_enabled;
-}
-
 static void r600_emit_streamout_enable(struct r600_common_context *rctx,
   struct r600_atom *atom)
 {
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
b/src/gallium/drivers/radeonsi/si_state_draw.c
index 84b850a..3863e59 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -882,8 +882,7 @@ void si_draw_vbo(struct pipe_context *ctx, const struct 
pipe_draw_info *info)
if ((sctx->b.family == CHIP_HAWAII ||
 sctx->b.family == CHIP_TONGA ||
 sctx->b.family == CHIP_FIJI) &&
-   (sctx->b.streamout.streamout_enabled ||
-sctx->b.streamout.prims_gen_query_enabled)) {
+   r600_get_strmout_en(>b)) {
sctx->b.flags |= SI_CONTEXT_VGT_STREAMOUT_SYNC;
}
 
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/3] i965: Extract SSEU configuration info

2016-04-08 Thread Kenneth Graunke

On Thursday, April 7, 2016 10:53:12 AM PDT Ben Widawsky wrote:
> Signed-off-by: Ben Widawsky 
> ---
>  src/mesa/drivers/dri/i965/intel_screen.c | 35 ++
+-
>  1 file changed, 21 insertions(+), 14 deletions(-)

Thanks for taking care of this.

Series is:
Reviewed-by: Kenneth Graunke 

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] radeonsi: implement and rely on set_active_query_state

2016-04-08 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_blit.c   |  3 ---
 src/gallium/drivers/radeonsi/si_pipe.h   |  4 
 src/gallium/drivers/radeonsi/si_state.c  | 32 +++-
 src/gallium/drivers/radeonsi/si_state_draw.c | 10 +
 4 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_blit.c 
b/src/gallium/drivers/radeonsi/si_blit.c
index c5ea8b1..aed783f 100644
--- a/src/gallium/drivers/radeonsi/si_blit.c
+++ b/src/gallium/drivers/radeonsi/si_blit.c
@@ -52,8 +52,6 @@ static void si_blitter_begin(struct pipe_context *ctx, enum 
si_blitter_op op)
 {
struct si_context *sctx = (struct si_context *)ctx;
 
-   r600_suspend_nontimer_queries(>b);
-
util_blitter_save_vertex_buffer_slot(sctx->blitter, 
sctx->vertex_buffer);
util_blitter_save_vertex_elements(sctx->blitter, sctx->vertex_elements);
util_blitter_save_vertex_shader(sctx->blitter, sctx->vs_shader.cso);
@@ -95,7 +93,6 @@ static void si_blitter_end(struct pipe_context *ctx)
struct si_context *sctx = (struct si_context *)ctx;
 
sctx->b.render_cond_force_off = false;
-   r600_resume_nontimer_queries(>b);
 }
 
 static unsigned u_max_sample(struct pipe_resource *r)
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index 4158fc5..8fcfcd2 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -66,6 +66,9 @@
 /* Compute only. */
 #define SI_CONTEXT_FLUSH_WITH_INV_L2   (R600_CONTEXT_PRIVATE_FLAG << 13) /* 
TODO: merge with TC? */
 #define SI_CONTEXT_FLAG_COMPUTE(R600_CONTEXT_PRIVATE_FLAG << 
14)
+/* Pipeline & streamout query controls. */
+#define SI_CONTEXT_START_PIPELINE_STATS(R600_CONTEXT_PRIVATE_FLAG << 
15)
+#define SI_CONTEXT_STOP_PIPELINE_STATS (R600_CONTEXT_PRIVATE_FLAG << 16)
 
 #define SI_CONTEXT_FLUSH_AND_INV_FRAMEBUFFER (SI_CONTEXT_FLUSH_AND_INV_CB | \
  SI_CONTEXT_FLUSH_AND_INV_CB_META 
| \
@@ -289,6 +292,7 @@ struct si_context {
booldb_stencil_clear;
booldb_stencil_disable_expclear;
unsignedps_db_shader_control;
+   boolocclusion_queries_disabled;
 
/* Emitted draw state. */
int last_base_vertex;
diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index a66bd30..6fbbb68 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -1352,6 +1352,26 @@ static void *si_create_db_flush_dsa(struct si_context 
*sctx)
 
 /* DB RENDER STATE */
 
+static void si_set_active_query_state(struct pipe_context *ctx, boolean enable)
+{
+   struct si_context *sctx = (struct si_context*)ctx;
+
+   /* Pipeline stat & streamout queries. */
+   if (enable) {
+   sctx->b.flags &= ~SI_CONTEXT_STOP_PIPELINE_STATS;
+   sctx->b.flags |= SI_CONTEXT_START_PIPELINE_STATS;
+   } else {
+   sctx->b.flags &= ~SI_CONTEXT_START_PIPELINE_STATS;
+   sctx->b.flags |= SI_CONTEXT_STOP_PIPELINE_STATS;
+   }
+
+   /* Occlusion queries. */
+   if (sctx->occlusion_queries_disabled != !enable) {
+   sctx->occlusion_queries_disabled = !enable;
+   si_mark_atom_dirty(sctx, >db_render_state);
+   }
+}
+
 static void si_set_occlusion_query_state(struct pipe_context *ctx, bool enable)
 {
struct si_context *sctx = (struct si_context*)ctx;
@@ -1386,7 +1406,8 @@ static void si_emit_db_render_state(struct si_context 
*sctx, struct r600_atom *s
}
 
/* DB_COUNT_CONTROL (occlusion queries) */
-   if (sctx->b.num_occlusion_queries > 0) {
+   if (sctx->b.num_occlusion_queries > 0 &&
+   !sctx->occlusion_queries_disabled) {
bool perfect = sctx->b.num_perfect_occlusion_queries > 0;
 
if (sctx->b.chip_class >= CIK) {
@@ -3765,6 +3786,7 @@ void si_init_state_functions(struct si_context *sctx)
sctx->b.b.set_min_samples = si_set_min_samples;
sctx->b.b.set_tess_state = si_set_tess_state;
 
+   sctx->b.b.set_active_query_state = si_set_active_query_state;
sctx->b.set_occlusion_query_state = si_set_occlusion_query_state;
sctx->b.need_gfx_cs_space = si_need_gfx_cs_space;
 
@@ -3995,6 +4017,14 @@ static void si_init_config(struct si_context *sctx)
si_pm4_cmd_add(pm4, 0x8000);
si_pm4_cmd_end(pm4, false);
 
+   /* This enables pipeline stat & streamout queries.
+* They are only disabled by blits.
+*/
+   si_pm4_cmd_begin(pm4, PKT3_EVENT_WRITE);
+   si_pm4_cmd_add(pm4, EVENT_TYPE(V_028A90_PIPELINESTAT_START) |
+   EVENT_INDEX(0));
+   si_pm4_cmd_end(pm4, false);
+
si_pm4_set_reg(pm4,

[Mesa-dev] [PATCH 1/2] arb_shader_image_size/builtin: don't report subtests that are never run

2016-04-08 Thread Marek Olšák

From: Marek Olšák 

this removes a lot of skipped subtests from results
---
 tests/spec/arb_shader_image_size/builtin.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/tests/spec/arb_shader_image_size/builtin.c 
b/tests/spec/arb_shader_image_size/builtin.c
index 434af70..c8a7724 100755
--- a/tests/spec/arb_shader_image_size/builtin.c
+++ b/tests/spec/arb_shader_image_size/builtin.c
@@ -254,11 +254,13 @@ test_max_dimensions(const struct image_format_info 
*format,
const struct image_extent size =
get_test_extent(target, d);
 
-   subtest(status,
-   !quick &&
-   is_test_reasonable(!slow, size) &&
-   is_format_interesting(format, slow) &&
-   is_stage_interesting(stage, slow),
+   if (quick ||
+   !is_test_reasonable(!slow, size) ||
+   !is_format_interesting(format, slow) ||
+   !is_stage_interesting(stage, slow))
+   continue;
+
+   subtest(status, true,
run_test(format, target, size),
"%s/%s/image%s max size test/%dx%dx%dx%d",
format->name, stage->name, target->name,
@@ -278,9 +280,11 @@ test_small_dimensions(const struct image_format_info 
*format,
image_extent_for_target(target,
16, 96);
 
-   subtest(status,
-   is_format_interesting(format, slow) &&
-   is_stage_interesting(stage, slow),
+   if (!is_format_interesting(format, slow) ||
+   !is_stage_interesting(stage, slow))
+   return;
+
+   subtest(status, true,
run_test(format, target, size),
"%s/%s/image%s size test/%dx%dx%dx%d",
format->name, stage->name, target->name,
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/2] Remove fs-discard-exit-3

2016-04-08 Thread Marek Olšák

From: Marek Olšák 

The test is wrong and the GLSL 1.30 citation in the test states very
clearly that discard can cause non-uniform control flow for any code that
follows.
---
 .../execution/fs-discard-exit-3.shader_test| 76 --
 1 file changed, 76 deletions(-)
 delete mode 100644 tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test

diff --git a/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test 
b/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test
deleted file mode 100644
index 14e9b47..000
--- a/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test
+++ /dev/null
@@ -1,76 +0,0 @@
-# This is a test for derivatives behavior after a discard.
-#
-# From the GLSL 1.30 spec:
-#
-# "The discard keyword is only allowed within fragment shaders. It
-#  can be used within a fragment shader to abandon the operation
-#  on the current fragment. This keyword causes the fragment to be
-#  discarded and no updates to any buffers will occur. Control
-#  flow exits the shader, and subsequent implicit or explicit
-#  derivatives are undefined when this control flow is non-uniform
-#  (meaning different fragments within the primitive take
-#  different control paths)."
-
-
-[require]
-GLSL >= 1.30
-
-[vertex shader]
-#version 130
-
-in vec4 vertex;
-out vec2 texcoords;
-void main()
-{
-   gl_Position = vertex;
-
-   /* Turn the texcoords into a 1:1 mapping with pixels when
-* interpolated.  This means that the coords for our 2x2
-* subspan we're interested in for the FS will be:
-*
-* +-+-+
-* | 0,1 | 1,1 |
-* +-+-+
-* | 0,0 | 0,1 |
-* +-+-+
-*
-* So it would sample the 1x1 miplevel of the GL_TEXTURE_2D
-* miptree, unless some other math occurs...
-*/
-   texcoords.yx = (vertex.xy + 1) / 2 * 250;
-}
-
-[fragment shader]
-#version 130
-in vec2 texcoords;
-uniform sampler2D s;
-
-void main()
-{
-   if (gl_FragCoord.x >= 1.0 || gl_FragCoord.y >= 1.0)
-   discard;
-
-   /* Now, we have uniform control after the discard (well,
-* except for the join after the if statement up there).  The
-* derivatives on this sample should get us the same values
-* for the undiscarded pixel as if we hadn't done any discard
-* (comment out the "discard" above to see).
-*/
-   gl_FragColor = texture(s, texcoords / 4);
-}
-
-[vertex data]
-vertex/float/2
--1.0 -1.0
- 1.0 -1.0
- 1.0  1.0
--1.0  1.0
-
-[test]
-clear color 0.5 0.5 0.5 0.5
-clear
-
-texture miptree 0
-
-draw arrays GL_TRIANGLE_FAN 0 4
-probe rgba 0 0 0.0 1.0 0.0 1.0
-- 
2.5.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir: remove duplicate Makefile.sources

2016-04-08 Thread Jose Fonseca


On 02/04/16 19:35, Rob Clark wrote:

From: Rob Clark 

Allegedly this was needed still by scons build.. but in practice it
doesn't seem to be needed.  Removing it and running 'scons' results
in no build errors.

Signed-off-by: Rob Clark 
---
So, afaict NIR is not even built w/ scons build (I'm just running
'scons' with no args, so let me know if I'm missing some build
variant).  So at least if there is no scons variant that *does*
build NIR, I think this is the right thing to do to reduce
confusion.  But it brings up a bigger question of what to do
with my patchset which adds NIR support in mesa/st, since that
obviosly won't work with scons build as-is.

I guess the two options are to try to add NIR into scons build
(which involves some .py generated code, so maybe not trivial)
or just #ifdef'ify all the mesa/st parts in my gallium-nir
patchset which introduce dependencies on NIR.  Opinions?


If you might recall, I already had patches to build NIR with SCons:

  http://cgit.freedesktop.org/~jrfonseca/mesa/log/?h=scons-nir

I ended up not taking any action at the time.  First due to lack of 
time, second because I noticed NIR source trees being moving around, so 
the risk of exposing scons build to breakage didn't seem worthwhile.


So those changes need to be rebased and probably updated a bit, but it 
gives you an idea of what needs to be done.


Jose

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/4] i965: Pass brw_compiler into brw_preprocess_nir() instead of is_scalar.

2016-04-08 Thread Alejandro Piñeiro

LGTM,

Reviewed-by: Alejandro Piñeiro 

On 08/04/16 01:35, Kenneth Graunke wrote:
> I want to be able to read other fields.
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_nir.c | 6 --
>  src/mesa/drivers/dri/i965/brw_nir.h | 3 ++-
>  2 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
> b/src/mesa/drivers/dri/i965/brw_nir.c
> index c62840a..1821c0d 100644
> --- a/src/mesa/drivers/dri/i965/brw_nir.c
> +++ b/src/mesa/drivers/dri/i965/brw_nir.c
> @@ -437,11 +437,13 @@ nir_optimize(nir_shader *nir, bool is_scalar)
>   * is_scalar = true to scalarize everything prior to code gen.
>   */
>  nir_shader *
> -brw_preprocess_nir(nir_shader *nir, bool is_scalar)
> +brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
>  {
> bool progress; /* Written by OPT and OPT_V */
> (void)progress;
>  
> +   const bool is_scalar = compiler->scalar_stage[nir->stage];
> +
> if (nir->stage == MESA_SHADER_GEOMETRY)
>OPT(nir_lower_gs_intrinsics);
>  
> @@ -568,7 +570,7 @@ brw_create_nir(struct brw_context *brw,
>  
> (void)progress;
>  
> -   nir = brw_preprocess_nir(nir, is_scalar);
> +   nir = brw_preprocess_nir(brw->intelScreen->compiler, nir);
>  
> OPT(nir_lower_system_values);
> OPT_V(brw_nir_lower_uniforms, is_scalar);
> diff --git a/src/mesa/drivers/dri/i965/brw_nir.h 
> b/src/mesa/drivers/dri/i965/brw_nir.h
> index 440b4ce..b10c083 100644
> --- a/src/mesa/drivers/dri/i965/brw_nir.h
> +++ b/src/mesa/drivers/dri/i965/brw_nir.h
> @@ -81,7 +81,8 @@ nir_shader *brw_create_nir(struct brw_context *brw,
> gl_shader_stage stage,
> bool is_scalar);
>  
> -nir_shader *brw_preprocess_nir(nir_shader *nir, bool is_scalar);
> +nir_shader *brw_preprocess_nir(const struct brw_compiler *compiler,
> +   nir_shader *nir);
>  
>  void brw_nir_lower_vs_inputs(nir_shader *nir,
>   const struct brw_device_info *devinfo,

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/mesa: fix glReadBuffer() assertion failure

2016-04-08 Thread Jose Fonseca


On 08/04/16 00:45, Brian Paul wrote:

If the first call in a GL app is glReadPixels(GL_FRONT) we'd fail the
assert(st->ctx->FragmentProgram._Current) at st_atom_shader.c:114 in
update_fp().

This is because we were calling st_validate_state() without first
updating Mesa state with _mesa_update_state().

The regression came from commit 83b589301f4a150f4 "st/mesa: fix
frontbuffer glReadPixels regressions".

The new piglit gl-1.0-simple-readbuffer test exercises this.

Cc: "11.1 11.2" 
---
  src/mesa/state_tracker/st_cb_fbo.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/src/mesa/state_tracker/st_cb_fbo.c 
b/src/mesa/state_tracker/st_cb_fbo.c
index ff570e0..456ad83 100644
--- a/src/mesa/state_tracker/st_cb_fbo.c
+++ b/src/mesa/state_tracker/st_cb_fbo.c
@@ -40,6 +40,7 @@
  #include "main/glformats.h"
  #include "main/macros.h"
  #include "main/renderbuffer.h"
+#include "main/state.h"

  #include "pipe/p_context.h"
  #include "pipe/p_defines.h"
@@ -729,6 +730,7 @@ st_ReadBuffer(struct gl_context *ctx, GLenum buffer)
 fb->Attachment[fb->_ColorReadBufferIndex].Type == GL_NONE) {
/* add the buffer */
st_manager_add_color_renderbuffer(st, fb, fb->_ColorReadBufferIndex);
+  _mesa_update_state(ctx);
st_validate_state(st, ST_PIPELINE_RENDER);
 }
  }



Reviewed-by: Jose Fonseca 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 00/15] i965: Rework uniform handling in the back-end

2016-04-08 Thread Jason Ekstrand

On Apr 7, 2016 10:01 PM, "Matt Turner"  wrote:
>
> On Tue, Mar 22, 2016 at 3:33 PM, Jason Ekstrand 
wrote:
> > This is mostly a re-send of a patch series I've had floating around in
one
> > form or a while for quite some time.  It's basically the same except
that
> > the original version was missing a work-around for Sandy Bridge.  For a
> > while, I wasn't really pushing to get it merged because I couldn't
> > demonstrate any actual performance benifit from pushing arrays.
However,
> > with the Vulkan API, the concept of push constants is directly exposed
to
> > the user and we really need to be able to indirect on them.  This series
> > makes the FS backend 100% ready for indirect push constants;  vec4 will
> > take a little more work.
> >
> > It's worth noting that we've been carying these patches around in our
> > Vulkan driver for probably 3 or 4 months now and it's working great.
> >
> > For those that prefer to review on a branch:
> >
> > https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/i965-uniforms
> >
> > I think Kristian has mostly reviewed these patches.  However, he never
sent
> > any R-Bs to the list.  I'd also like Ken or Matt to look at it from a
> > design perspective.
>
> I don't know what I think. I'm sympathetic to Curro's argument, but in
> the absence of more data it's hard to judge anything really. I'm not
> at all sympathetic to
>
> """
> Do I have a proof-of-concept in code, no.  However, I've run through
> it in my head and I have a pretty good idea what it would look like.
> You are free to go off and do it if you don't believe me, but I don't
> really want to hold things up while you do.
> """
>
> That's what... An Appeal to Your Brain? :)

Sort-of... It was more a remark of frustration at the (percieved)
implication that I hadn't thought about it or at the very least hadnt given
it a fair shake.  In a bit more detail here are some of my thoughts on
reladdr and an ADDR file in no particular order

a) Not a single FS optimization pass handles it.  Yes, "if you see reladdr,
bail” is a valid (if suboptimal) strategy 90% of the time.  However,
anything that computes any sort of kill set now needs a recursive algorithm
to walk register sources.  We do handle this in NIR and it's not terrible
but it does come with nontrivial pain and retrofitting it isn't necessarily
going to be quick-and-easy.  Curro's response of "use-def chains will fix
this" while probably accurate doesn't solve the immediate problem while
these patches have been on the list for 6 months.

b) The hardware doesn't do reladdr.  It has an address register with
substantial restrictions.  Eventually, we would need to lower to something
that writes the address register and have an indirect source type that
consumes it.  If you end up with two indirect sources, we have to emit a
move for one of them.  Where do we do that lowering?  Do we do it in the
generator or as a pass?

c) If we handle it all in the generator, we have no ability to schedule it
at all.  It also makes the generator far more complex.

d) If we handle it in a lowering pass, what does that pass produce? Do we
expose the ADDR file and try to do RA on it or do we treat it as a fixed
thing like flag?  In either case, we need to add extra logic to at least
the scheduler if not other places to add this whole new concept.

e) If we allow indirect sources of any sort, how do we carry range
information around post-RA.  Pre-RA we can theoretically just say if you
indirect you touch the whole thing.  Post-RA, you either have to carry that
information around per-instruction or you have to assume that any
instruction that uses an indirect source could be reading anything in the
entire GRF and it becomes almost a complete scheduling barrier.

Those are the thoughts that pop to the top. I could come up with more if
you'd like.

So, yes, using reladdr or or an ADDR file would be possible but it would
involve substantial IR surgery.  What's the benefit?  You can put the
relative source directly in the instruction that uses it and maybe do an
address calculation directly to the address register instead of having to
move it there.  The approach I've taken on the other hand, neatly
side-steps all of the issues listed above.  This comes at the cost of a few
extra instructions (which you probably have to spend anyway on gen7).  I
think that trade-off is worthwhile.

--Jason

> I don't know how to proceed on that front if no one is willing or
> interested in trying to implement it using reladdr.
>
> I ran shader-db.
>
> total instructions in shared programs: 7113290 -> 7161760 (0.68%)
> instructions in affected programs: 866011 -> 914481 (5.60%)
> helped: 0
> HURT: 7180
>
> total cycles in shared programs: 64705926 -> 64776118 (0.11%)
> cycles in affected programs: 4951554 -> 5021746 (1.42%)
> helped: 1605
> HURT: 5204
>
> of which the overwhelming majority is vertex shaders (why? this series
> is i965/fs). FS changes are

Re: [Mesa-dev] [PATCH] st/va: avoid dereference after free

2016-04-08 Thread Julien Isorce

Hi Thomas and Emil, I tested it and pushed. Thx a lot.

On 6 April 2016 at 21:34, Thomas H.P. Andersen  wrote:

>
>
> On Sun, Mar 6, 2016 at 10:08 AM, Thomas H.P. Andersen 
> wrote:
>
>>
>>
>> On Sat, Mar 5, 2016 at 1:30 PM, Emil Velikov 
>> wrote:
>>
>>> Hi Thomas,
>>>
>>> On 5 March 2016 at 12:07, Thomas Hindoe Paaboel Andersen
>>>  wrote:
>>> > ---
>>> >  src/gallium/state_trackers/va/image.c | 4 +++-
>>> >  1 file changed, 3 insertions(+), 1 deletion(-)
>>> >
>>> > diff --git a/src/gallium/state_trackers/va/image.c
>>> b/src/gallium/state_trackers/va/image.c
>>> > index 2c42a98..92d014c 100644
>>> > --- a/src/gallium/state_trackers/va/image.c
>>> > +++ b/src/gallium/state_trackers/va/image.c
>>> > @@ -280,6 +280,7 @@ vlVaDestroyImage(VADriverContextP ctx, VAImageID
>>> image)
>>> >  {
>>> > vlVaDriver *drv;
>>> > VAImage  *vaimage;
>>> > +   VAStatus status;
>>> >
>>> > if (!ctx)
>>> >return VA_STATUS_ERROR_INVALID_CONTEXT;
>>> > @@ -294,8 +295,9 @@ vlVaDestroyImage(VADriverContextP ctx, VAImageID
>>> image)
>>> >
>>> > handle_table_remove(VL_VA_DRIVER(ctx)->htab, image);
>>> > pipe_mutex_unlock(drv->mutex);
>>> > +   status = vlVaDestroyBuffer(ctx, vaimage->buf);
>>> > FREE(vaimage);
>>> > -   return vlVaDestroyBuffer(ctx, vaimage->buf);
>>> > +   return status;
>>>
>>> Nicely spotted !
>>> Out of curiosity: did you notice this during code inspection or did it
>>> pop up during testing ?
>>>
>>
>> Thanks for the review! I only saw it by inspection. I am just poking
>> around a bit to get to know the code.
>>
>>
>>>
>>> For the patch
>>>
>>> Cc: "11.1 11.2" 
>>> Reviewed-by: Emil Velikov 
>>>
>>> I'll push this in a couple of days unless someone beats me to it.
>>>
>>
> Hi Emil,
>
> Friendly ping :)
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

68 matches

Mail list logo