[Mesa-dev] r600g,radeonsi: ARB_query_buffer_object grunt work
These two patches provide just the grunt work for ARB_query_buffer_object support. My incomplete hardware specific callback has been left out of this series. However this does allow a user to fake the PIPE_CAP and not just get a segfault from a unchecked derefenced to a NULL pointer from within 'st_StoreQueryResult()'. Edward O'Callaghan (2): radeon: Provision a query_ops callback, radeon: Implement the get_query_result_resource() call-in ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] radeon: Implement the get_query_result_resource() call-in site
Implement the frontend part of the ARB_query_buffer_object gallium callback into the driver query_ops. Signed-off-by: Edward O'Callaghan--- src/gallium/drivers/radeon/r600_query.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/src/gallium/drivers/radeon/r600_query.c b/src/gallium/drivers/radeon/r600_query.c index 7a2d2ee..b75ad2d 100644 --- a/src/gallium/drivers/radeon/r600_query.c +++ b/src/gallium/drivers/radeon/r600_query.c @@ -899,6 +899,24 @@ static void r600_query_hw_add_result(struct r600_common_context *ctx, } } +static void r600_get_query_result_resource(struct pipe_context *ctx, + struct pipe_query *query, boolean wait, + enum pipe_query_value_type result_type, + int index, struct pipe_resource *resource, + unsigned offset) +{ + struct r600_common_context *rctx = (struct r600_common_context *)ctx; + struct r600_query *rquery = (struct r600_query *)query; + + if (!rquery->ops->get_query_result_resource) { + assert(!"Unexpected lack of get_query_result_resource"); + return; + } + + rquery->ops->get_query_result_resource(rctx, rquery, wait, result_type, + index, resource, offset); +} + static boolean r600_get_query_result(struct pipe_context *ctx, struct pipe_query *query, boolean wait, union pipe_query_result *result) @@ -1269,6 +1287,7 @@ void r600_query_init(struct r600_common_context *rctx) rctx->b.begin_query = r600_begin_query; rctx->b.end_query = r600_end_query; rctx->b.get_query_result = r600_get_query_result; + rctx->b.get_query_result_resource = r600_get_query_result_resource; rctx->render_cond_atom.emit = r600_emit_query_predication; if (((struct r600_common_screen*)rctx->b.screen)->info.num_render_backends > 0) -- 2.5.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] radeon: Provision a query_ops callback, get_query_result_resource()
ARB_query_buffer_object requires us to provide a query_ops callback for statistics about the operation in the OpenGL pipeline. Signed-off-by: Edward O'Callaghan--- src/gallium/drivers/radeon/r600_query.h | 5 + 1 file changed, 5 insertions(+) diff --git a/src/gallium/drivers/radeon/r600_query.h b/src/gallium/drivers/radeon/r600_query.h index 8b2c4e3..b3d4501 100644 --- a/src/gallium/drivers/radeon/r600_query.h +++ b/src/gallium/drivers/radeon/r600_query.h @@ -73,6 +73,11 @@ struct r600_query_ops { boolean (*get_result)(struct r600_common_context *, struct r600_query *, boolean wait, union pipe_query_result *result); + void (*get_query_result_resource)(struct r600_common_context *, + struct r600_query *, boolean wait, + enum pipe_query_value_type result_type, + int index, struct pipe_resource *resource, + unsigned offset); }; struct r600_query { -- 2.5.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] dri/i965: extend GLES3 sRGB workaround to cover all byte orders
On Wed, Apr 6, 2016 at 10:35 PM, Haixia Shiwrote: > I am afraid that I am not the original author of the > gles3_srgb_workaround, and from the comments it appears to only apply for > argb formats, so I am not sure if extending it to all other formats would > cause behavioral regression. > On Apr 6, 2016 10:13 PM, "Jason Ekstrand" wrote: > >> >> On Apr 6, 2016 7:23 PM, "Kenneth Graunke" wrote: >> > >> > On Wednesday, April 6, 2016 4:43:39 PM PDT Haixia Shi wrote: >> > > It is incorrect to assume BGRA byte order for the GLES3 sRGB >> workaround. >> > > >> > > Signed-off-by: Haixia Shi >> > > Reviewed-by: Stéphane Marchesin >> > > Cc: kenneth.w.grau...@intel.com >> > > >> > > Change-Id: I5a081d7eaa7544afff0e7874cffef80d3f69a401 >> > > --- >> > > src/mesa/drivers/dri/i965/brw_context.c | 18 -- >> > > 1 file changed, 16 insertions(+), 2 deletions(-) >> > > >> > > diff --git a/src/mesa/drivers/dri/i965/brw_context.c >> b/src/mesa/drivers/dri/ >> > i965/brw_context.c >> > > index 2d480d0..cebfbda 100644 >> > > --- a/src/mesa/drivers/dri/i965/brw_context.c >> > > +++ b/src/mesa/drivers/dri/i965/brw_context.c >> > > @@ -1151,9 +1151,23 @@ intel_gles3_srgb_workaround(struct brw_context >> *brw, >> > > */ >> > > fb->Visual.sRGBCapable = false; >> > > for (int i = 0; i < BUFFER_COUNT; i++) { >> > > - if (fb->Attachment[i].Renderbuffer && >> > > - fb->Attachment[i].Renderbuffer->Format == >> > MESA_FORMAT_B8G8R8A8_SRGB) { >> > > + if (!fb->Attachment[i].Renderbuffer) >> > > + continue; >> > > + switch (fb->Attachment[i].Renderbuffer->Format) { >> > > + case MESA_FORMAT_A8B8G8R8_SRGB: >> > > + fb->Attachment[i].Renderbuffer->Format = >> > MESA_FORMAT_A8B8G8R8_UNORM; >> > > + break; >> > > + case MESA_FORMAT_B8G8R8A8_SRGB: >> > > fb->Attachment[i].Renderbuffer->Format = >> > MESA_FORMAT_B8G8R8A8_UNORM; >> > > + break; >> > > + case MESA_FORMAT_A8R8G8B8_SRGB: >> > > + fb->Attachment[i].Renderbuffer->Format = >> > MESA_FORMAT_A8R8G8B8_UNORM; >> > > + break; >> > > + case MESA_FORMAT_R8G8B8A8_SRGB: >> > > + fb->Attachment[i].Renderbuffer->Format = >> > MESA_FORMAT_R8G8B8A8_UNORM; >> > > + break; >> > > + default: >> > > + break; >> > >} >> > > } >> > > } >> > > >> > >> > Why don't we simply do: >> > >> >struct gl_renderbuffer *rb = fb->Attachment[i].Renderbuffer; >> >rb->Format = _mesa_get_srgb_format_linear(rb->Format); >> > >> > This would handle far more formats than we need to, but that shouldn't >> > be a problem. >> >> I'll second that. >> > Look at the comment right above the function in question, I think Ken's suggest is exactly the right thing to do. There's nothing involved here that should be -specific. --Jason ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] glsl: handle unsigned int wraparound in link_shaders()
Pushed thanks! ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: enable ARB_explicit_uniform_location for swrast
This is untested but since ARB_explicit_attrib_location is enabled there should be no problem enabling this also. --- src/mesa/main/extensions.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c index fa50cb6..a889902 100644 --- a/src/mesa/main/extensions.c +++ b/src/mesa/main/extensions.c @@ -132,6 +132,7 @@ _mesa_enable_sw_extensions(struct gl_context *ctx) ctx->Extensions.ARB_draw_elements_base_vertex = GL_TRUE; ctx->Extensions.ARB_draw_instanced = GL_TRUE; ctx->Extensions.ARB_explicit_attrib_location = GL_TRUE; + ctx->Extensions.ARB_explicit_uniform_location = GL_TRUE; ctx->Extensions.ARB_fragment_coord_conventions = GL_TRUE; ctx->Extensions.ARB_fragment_program = GL_TRUE; ctx->Extensions.ARB_fragment_program_shadow = GL_TRUE; -- 2.5.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/4] nir: Silence unused "options" warning in algebraic passes.
Quoting Dylan Baker (2016-04-08 14:54:38) > Quoting Kenneth Graunke (2016-04-08 11:48:01) > > On Friday, April 8, 2016 11:18:46 AM PDT Dylan Baker wrote: > > > > Quoting Eduardo Lima Mitev (2016-04-08 08:26:22) > > > > On 04/08/2016 01:35 AM, Kenneth Graunke wrote: > > > > > Some passes may not refer to options->..., at which point the compiler > > > > > will warn about an unused variable. Just cast to void unconditionally > > > > > to shut it up. > > > > > > > > > > Signed-off-by: Kenneth Graunke> > > > > --- > > > > > src/compiler/nir/nir_algebraic.py | 1 + > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > diff --git a/src/compiler/nir/nir_algebraic.py b/src/compiler/nir/ > > nir_algebraic.py > > > > > index d05564f..53a7907 100644 > > > > > --- a/src/compiler/nir/nir_algebraic.py > > > > > +++ b/src/compiler/nir/nir_algebraic.py > > > > > @@ -291,6 +291,7 @@ ${pass_name}(nir_shader *shader) > > > > > bool progress = false; > > > > > bool condition_flags[${len(condition_list)}]; > > > > > const nir_shader_compiler_options *options = shader->options; > > > > > + (void) options; > > > > > > > > > > > > > Hmm, don't like this very much. I suppose since it is generated code, it > > > > can not be done per block where options is unused. > > > > For respect to other people's right to have clean build logs, patch is: > > > > > > > > Reviewed-by: Eduardo Lima Mitev > > > > > > > > > % for index, condition in enumerate(condition_list): > > > > > condition_flags[${index}] = ${condition}; > > > > > > > > > > > > > > > I'm not familiar with the code, but looking at it options must be used > > > in the condition_list argument? > > > > Right, we create the 'options' temporary so you can write rules like: > > > > (('~fmin', ('fmax', a, 0.0), 1.0), ('fsat', a), '!options->lower_fsat'), > > > > where !options->lower_fsat becomes part of the condition_list. However, > > conditions are optional. If you have an algebraic rule list which doesn't > > include any conditions referring to options->..., the variable will be > > unused. (For example, the list in patch 4 meets this criteria.) > > > > So what about wrapping the definitions of options in a conditional? > % if 'options' in conditions[-1]: > const nir_shader_compiler_options *options = shader->options; > % endif > > This only covers that options is the last argument, which seems to be > what nir_opt_algebraic does, although I don't know that is actually a > hard requirement, or just a coincidence. > > Dylan Never mind. Jason and I talked about this and this might possibly work, by accident. Please ignore. Dylan signature.asc Description: signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/5] include/GL: add mesa_glinterop.h for OpenGL-OpenCL interop (v3)
On Fri, Apr 1, 2016 at 2:13 PM, Emil Velikovwrote: > On 16 March 2016 at 10:40, Marek Olšák wrote: > >> "offset_after" isn't nasty. :) Yeah, I was inspired by other APIs I >> had seen. The sizes make even more sense when they are function >> parameters, because the caller can just do: >> (sizeof(in), , sizeof(out), ) >> > A nice list of arguments: > - If the majority of people like offset_after, the question "Why > barely any projects use it (from a quick search) ?" comes to mind. > - I wasn't the only one advocating for versioned interfaces ;-) > - They will just work in an identical way and the code will be less. > - Regardless of how ugly/nasty/etc, mesa uses versioned interfaces > throughout. "Consistency is the key" a wise man have said once. > - The interfaces using explicit size argument, that I'm aware of, are > not designed with extensibility in mind. Updated branch with your suggestions applied: https://cgit.freedesktop.org/~mareko/mesa/log/?h=interop Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: remove duplicate Makefile.sources
On Fri, Apr 8, 2016 at 4:37 AM, Jose Fonsecawrote: > On 02/04/16 19:35, Rob Clark wrote: >> >> From: Rob Clark >> >> Allegedly this was needed still by scons build.. but in practice it >> doesn't seem to be needed. Removing it and running 'scons' results >> in no build errors. >> >> Signed-off-by: Rob Clark >> --- >> So, afaict NIR is not even built w/ scons build (I'm just running >> 'scons' with no args, so let me know if I'm missing some build >> variant). So at least if there is no scons variant that *does* >> build NIR, I think this is the right thing to do to reduce >> confusion. But it brings up a bigger question of what to do >> with my patchset which adds NIR support in mesa/st, since that >> obviosly won't work with scons build as-is. >> >> I guess the two options are to try to add NIR into scons build >> (which involves some .py generated code, so maybe not trivial) >> or just #ifdef'ify all the mesa/st parts in my gallium-nir >> patchset which introduce dependencies on NIR. Opinions? > > > If you might recall, I already had patches to build NIR with SCons: > > http://cgit.freedesktop.org/~jrfonseca/mesa/log/?h=scons-nir > > I ended up not taking any action at the time. First due to lack of time, > second because I noticed NIR source trees being moving around, so the risk > of exposing scons build to breakage didn't seem worthwhile. > > So those changes need to be rebased and probably updated a bit, but it gives > you an idea of what needs to be done. So, I don't suppose you could give me some hint about how you are supposed to get path to generated header (in this case nir_opcodes.h included from nir.h included from mesa/st) into the include path for src/mesa? BR, -R ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 7/7] gallium/radeon: remove R600_QUERY_HW_FLAG_TIMER
From: Marek Olšáknot used anymore --- src/gallium/drivers/r600/r600_hw_context.c| 2 +- src/gallium/drivers/radeon/r600_perfcounter.c | 1 - src/gallium/drivers/radeon/r600_query.c | 4 +--- src/gallium/drivers/radeon/r600_query.h | 3 +-- 4 files changed, 3 insertions(+), 7 deletions(-) diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c index 3ef2ac5..0c3b580 100644 --- a/src/gallium/drivers/r600/r600_hw_context.c +++ b/src/gallium/drivers/r600/r600_hw_context.c @@ -64,7 +64,7 @@ void r600_need_cs_space(struct r600_context *ctx, unsigned num_dw, num_dw += R600_MAX_FLUSH_CS_DWORDS + R600_MAX_DRAW_CS_DWORDS; } - /* Count in queries_suspend. */ + /* Count in r600_suspend_queries. */ num_dw += ctx->b.num_cs_dw_queries_suspend; /* Count in streamout_end at the end of CS. */ diff --git a/src/gallium/drivers/radeon/r600_perfcounter.c b/src/gallium/drivers/radeon/r600_perfcounter.c index f3529a1..9ab17d9 100644 --- a/src/gallium/drivers/radeon/r600_perfcounter.c +++ b/src/gallium/drivers/radeon/r600_perfcounter.c @@ -310,7 +310,6 @@ struct pipe_query *r600_create_batch_query(struct pipe_context *ctx, query->b.b.ops = _query_ops; query->b.ops = _query_hw_ops; - query->b.flags = R600_QUERY_HW_FLAG_TIMER; query->num_counters = num_queries; diff --git a/src/gallium/drivers/radeon/r600_query.c b/src/gallium/drivers/radeon/r600_query.c index aa86560..de6e37b 100644 --- a/src/gallium/drivers/radeon/r600_query.c +++ b/src/gallium/drivers/radeon/r600_query.c @@ -369,13 +369,11 @@ static struct pipe_query *r600_query_hw_create(struct r600_common_context *rctx, query->result_size = 16; query->num_cs_dw_begin = 8; query->num_cs_dw_end = 8; - query->flags = R600_QUERY_HW_FLAG_TIMER; break; case PIPE_QUERY_TIMESTAMP: query->result_size = 8; query->num_cs_dw_end = 8; - query->flags = R600_QUERY_HW_FLAG_TIMER | - R600_QUERY_HW_FLAG_NO_START; + query->flags = R600_QUERY_HW_FLAG_NO_START; break; case PIPE_QUERY_PRIMITIVES_EMITTED: case PIPE_QUERY_PRIMITIVES_GENERATED: diff --git a/src/gallium/drivers/radeon/r600_query.h b/src/gallium/drivers/radeon/r600_query.h index 8b2c4e3..9f3a917 100644 --- a/src/gallium/drivers/radeon/r600_query.h +++ b/src/gallium/drivers/radeon/r600_query.h @@ -84,8 +84,7 @@ struct r600_query { enum { R600_QUERY_HW_FLAG_NO_START = (1 << 0), - R600_QUERY_HW_FLAG_TIMER = (1 << 1), - R600_QUERY_HW_FLAG_PREDICATE = (1 << 2), + R600_QUERY_HW_FLAG_PREDICATE = (1 << 1), }; struct r600_query_hw_ops { -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/7] gallium/radeon: merge timer and non-timer query lists
From: Marek OlšákAll of them are paused only between IBs. --- src/gallium/drivers/r600/r600_hw_context.c| 3 +- src/gallium/drivers/radeon/r600_pipe_common.c | 18 ++-- src/gallium/drivers/radeon/r600_pipe_common.h | 19 +++- src/gallium/drivers/radeon/r600_query.c | 65 ++- 4 files changed, 23 insertions(+), 82 deletions(-) diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c index 63b631a..3ef2ac5 100644 --- a/src/gallium/drivers/r600/r600_hw_context.c +++ b/src/gallium/drivers/r600/r600_hw_context.c @@ -65,8 +65,7 @@ void r600_need_cs_space(struct r600_context *ctx, unsigned num_dw, } /* Count in queries_suspend. */ - num_dw += ctx->b.num_cs_dw_nontimer_queries_suspend + - ctx->b.num_cs_dw_timer_queries_suspend; + num_dw += ctx->b.num_cs_dw_queries_suspend; /* Count in streamout_end at the end of CS. */ if (ctx->b.streamout.begin_emitted) { diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index 32bd6e4..f587332 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -156,14 +156,8 @@ static void r600_memory_barrier(struct pipe_context *ctx, unsigned flags) void r600_preflush_suspend_features(struct r600_common_context *ctx) { /* suspend queries */ - if (ctx->num_cs_dw_nontimer_queries_suspend) { - /* Since non-timer queries are suspended during blits, -* we have to guard against double-suspends. */ - r600_suspend_nontimer_queries(ctx); - ctx->nontimer_queries_suspended_by_flush = true; - } - if (!LIST_IS_EMPTY(>active_timer_queries)) - r600_suspend_timer_queries(ctx); + if (!LIST_IS_EMPTY(>active_queries)) + r600_suspend_queries(ctx); ctx->streamout.suspended = false; if (ctx->streamout.begin_emitted) { @@ -180,12 +174,8 @@ void r600_postflush_resume_features(struct r600_common_context *ctx) } /* resume queries */ - if (!LIST_IS_EMPTY(>active_timer_queries)) - r600_resume_timer_queries(ctx); - if (ctx->nontimer_queries_suspended_by_flush) { - ctx->nontimer_queries_suspended_by_flush = false; - r600_resume_nontimer_queries(ctx); - } + if (!LIST_IS_EMPTY(>active_queries)) + r600_resume_queries(ctx); } static void r600_flush_from_st(struct pipe_context *ctx, diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h index 57af0ff..c387922 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.h +++ b/src/gallium/drivers/radeon/r600_pipe_common.h @@ -428,18 +428,11 @@ struct r600_common_context { unsigned flags; /* flush flags */ /* Queries. */ - /* The list of active queries. */ + /* Maintain the list of active queries for pausing between IBs. */ int num_occlusion_queries; int num_perfect_occlusion_queries; - /* Keep track of non-timer queries, because they should be suspended -* during context flushing. -* The timer queries (TIME_ELAPSED) shouldn't be suspended for blits, -* but they should be suspended between IBs. */ - struct list_headactive_nontimer_queries; - struct list_headactive_timer_queries; - unsignednum_cs_dw_nontimer_queries_suspend; - boolnontimer_queries_suspended_by_flush; - unsignednum_cs_dw_timer_queries_suspend; + struct list_headactive_queries; + unsignednum_cs_dw_queries_suspend; /* Additional hardware info. */ unsignedbackend_mask; unsignedmax_db; /* for OQ */ @@ -569,10 +562,8 @@ void r600_perfcounters_destroy(struct r600_common_screen *rscreen); /* r600_query.c */ void r600_init_screen_query_functions(struct r600_common_screen *rscreen); void r600_query_init(struct r600_common_context *rctx); -void r600_suspend_nontimer_queries(struct r600_common_context *ctx); -void r600_resume_nontimer_queries(struct r600_common_context *ctx); -void r600_suspend_timer_queries(struct r600_common_context *ctx); -void r600_resume_timer_queries(struct r600_common_context *ctx); +void r600_suspend_queries(struct r600_common_context *ctx); +void r600_resume_queries(struct r600_common_context *ctx); void r600_query_init_backend_mask(struct r600_common_context *ctx); /* r600_streamout.c */ diff --git a/src/gallium/drivers/radeon/r600_query.c b/src/gallium/drivers/radeon/r600_query.c index 7a2d2ee..aa86560 100644 ---
[Mesa-dev] [PATCH 1/7] gallium/radeon: move pipeline stat context flags to common code
From: Marek Olšák--- src/gallium/drivers/radeon/r600_pipe_common.h | 5 - src/gallium/drivers/radeonsi/si_pipe.h| 3 --- src/gallium/drivers/radeonsi/si_state.c | 8 src/gallium/drivers/radeonsi/si_state_draw.c | 4 ++-- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h index 7da7736..57af0ff 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.h +++ b/src/gallium/drivers/radeon/r600_pipe_common.h @@ -50,7 +50,10 @@ #define R600_RESOURCE_FLAG_FORCE_TILING (PIPE_RESOURCE_FLAG_DRV_PRIV << 2) #define R600_CONTEXT_STREAMOUT_FLUSH (1u << 0) -#define R600_CONTEXT_PRIVATE_FLAG (1u << 1) +/* Pipeline & streamout query controls. */ +#define R600_CONTEXT_START_PIPELINE_STATS (1u << 1) +#define R600_CONTEXT_STOP_PIPELINE_STATS (1u << 2) +#define R600_CONTEXT_PRIVATE_FLAG (1u << 3) /* special primitive types */ #define R600_PRIM_RECTANGLE_LIST PIPE_PRIM_MAX diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h index 8fcfcd2..f665c81 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.h +++ b/src/gallium/drivers/radeonsi/si_pipe.h @@ -66,9 +66,6 @@ /* Compute only. */ #define SI_CONTEXT_FLUSH_WITH_INV_L2 (R600_CONTEXT_PRIVATE_FLAG << 13) /* TODO: merge with TC? */ #define SI_CONTEXT_FLAG_COMPUTE(R600_CONTEXT_PRIVATE_FLAG << 14) -/* Pipeline & streamout query controls. */ -#define SI_CONTEXT_START_PIPELINE_STATS(R600_CONTEXT_PRIVATE_FLAG << 15) -#define SI_CONTEXT_STOP_PIPELINE_STATS (R600_CONTEXT_PRIVATE_FLAG << 16) #define SI_CONTEXT_FLUSH_AND_INV_FRAMEBUFFER (SI_CONTEXT_FLUSH_AND_INV_CB | \ SI_CONTEXT_FLUSH_AND_INV_CB_META | \ diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c index 0c46425..94130a9 100644 --- a/src/gallium/drivers/radeonsi/si_state.c +++ b/src/gallium/drivers/radeonsi/si_state.c @@ -1358,11 +1358,11 @@ static void si_set_active_query_state(struct pipe_context *ctx, boolean enable) /* Pipeline stat & streamout queries. */ if (enable) { - sctx->b.flags &= ~SI_CONTEXT_STOP_PIPELINE_STATS; - sctx->b.flags |= SI_CONTEXT_START_PIPELINE_STATS; + sctx->b.flags &= ~R600_CONTEXT_STOP_PIPELINE_STATS; + sctx->b.flags |= R600_CONTEXT_START_PIPELINE_STATS; } else { - sctx->b.flags &= ~SI_CONTEXT_START_PIPELINE_STATS; - sctx->b.flags |= SI_CONTEXT_STOP_PIPELINE_STATS; + sctx->b.flags &= ~R600_CONTEXT_START_PIPELINE_STATS; + sctx->b.flags |= R600_CONTEXT_STOP_PIPELINE_STATS; } /* Occlusion queries. */ diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c index 105c5fb..40cad50 100644 --- a/src/gallium/drivers/radeonsi/si_state_draw.c +++ b/src/gallium/drivers/radeonsi/si_state_draw.c @@ -722,11 +722,11 @@ void si_emit_cache_flush(struct si_context *si_ctx, struct r600_atom *atom) } } - if (sctx->flags & SI_CONTEXT_START_PIPELINE_STATS) { + if (sctx->flags & R600_CONTEXT_START_PIPELINE_STATS) { radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0)); radeon_emit(cs, EVENT_TYPE(V_028A90_PIPELINESTAT_START) | EVENT_INDEX(0)); - } else if (sctx->flags & SI_CONTEXT_STOP_PIPELINE_STATS) { + } else if (sctx->flags & R600_CONTEXT_STOP_PIPELINE_STATS) { radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0)); radeon_emit(cs, EVENT_TYPE(V_028A90_PIPELINESTAT_STOP) | EVENT_INDEX(0)); -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] r600g: fix typo in r600 register definitions
From: Marek Olšák--- src/gallium/drivers/r600/r600d.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/r600/r600d.h b/src/gallium/drivers/r600/r600d.h index 3d223ed..ef99573 100644 --- a/src/gallium/drivers/r600/r600d.h +++ b/src/gallium/drivers/r600/r600d.h @@ -780,7 +780,7 @@ #define S_028D0C_STENCIL_COMPRESS_DISABLE(x) (((x) & 0x1) << 5) #define S_028D0C_DEPTH_COMPRESS_DISABLE(x) (((x) & 0x1) << 6) #define S_028D0C_COPY_CENTROID(x)(((x) & 0x1) << 7) -#define S_028D0C_COPY_SAMPLE(x) (((x) & 0x1) << 8) +#define S_028D0C_COPY_SAMPLE(x) (((x) & 0x03) << 8) #define S_028D0C_R700_PERFECT_ZPASS_COUNTS(x)(((x) & 0x1) << 15) #define S_028D0C_CONSERVATIVE_Z_EXPORT(x)(((x) & 0x03) << 13) #define G_028D0C_CONSERVATIVE_Z_EXPORT(x)(((x) >> 13) & 0x03) -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/7] r600g: don't manually stop queries for blitter
From: Marek Olšákr600_set_active_query_state does it better. --- src/gallium/drivers/r600/r600_blit.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/src/gallium/drivers/r600/r600_blit.c b/src/gallium/drivers/r600/r600_blit.c index c52d5a9..7ddd4fa 100644 --- a/src/gallium/drivers/r600/r600_blit.c +++ b/src/gallium/drivers/r600/r600_blit.c @@ -54,8 +54,6 @@ static void r600_blitter_begin(struct pipe_context *ctx, enum r600_blitter_op op { struct r600_context *rctx = (struct r600_context *)ctx; - r600_suspend_nontimer_queries(>b); - util_blitter_save_vertex_buffer_slot(rctx->blitter, rctx->vertex_buffer_state.vb); util_blitter_save_vertex_elements(rctx->blitter, rctx->vertex_fetch_shader.cso); util_blitter_save_vertex_shader(rctx->blitter, rctx->vs_shader); @@ -98,7 +96,6 @@ static void r600_blitter_end(struct pipe_context *ctx) struct r600_context *rctx = (struct r600_context *)ctx; rctx->b.render_cond_force_off = false; - r600_resume_nontimer_queries(>b); } static unsigned u_max_sample(struct pipe_resource *r) -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/7] r600g: implement set_active_query_state for pausing occlusion queries
From: Marek OlšákUse ZPASS_INCREMENT_DISABLE everywhere. --- src/gallium/drivers/r600/evergreen_state.c | 5 - src/gallium/drivers/r600/evergreend.h| 2 +- src/gallium/drivers/r600/r600_pipe.h | 1 + src/gallium/drivers/r600/r600_state.c| 6 +- src/gallium/drivers/r600/r600_state_common.c | 12 src/gallium/drivers/r600/r600d.h | 1 + src/gallium/drivers/radeon/r600_query.c | 6 -- 7 files changed, 24 insertions(+), 9 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index 077664d..c1b0b56 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -1802,12 +1802,15 @@ static void evergreen_emit_db_misc_state(struct r600_context *rctx, struct r600_ S_02800C_FORCE_HIS_ENABLE0(V_02800C_FORCE_DISABLE) | S_02800C_FORCE_HIS_ENABLE1(V_02800C_FORCE_DISABLE); - if (rctx->b.num_occlusion_queries > 0) { + if (rctx->b.num_occlusion_queries > 0 && + !a->occlusion_queries_disabled) { db_count_control |= S_028004_PERFECT_ZPASS_COUNTS(1); if (rctx->b.chip_class == CAYMAN) { db_count_control |= S_028004_SAMPLE_RATE(a->log_samples); } db_render_override |= S_02800C_NOOP_CULL_DISABLE(1); + } else { + db_count_control |= S_028004_ZPASS_INCREMENT_DISABLE(1); } /* This is to fix a lockup when hyperz and alpha test are enabled at diff --git a/src/gallium/drivers/r600/evergreend.h b/src/gallium/drivers/r600/evergreend.h index ebe8c4a..a900458 100644 --- a/src/gallium/drivers/r600/evergreend.h +++ b/src/gallium/drivers/r600/evergreend.h @@ -1735,7 +1735,7 @@ #define S_028000_COPY_SAMPLE(x) (((x) & 0x7) << 8) #define S_028000_COLOR_DISABLE(x)(((x) & 0x1) << 12) #define R_028004_DB_COUNT_CONTROL0x00028004 -#define S_028004_ZPASS_INCREMENT_DISABLE(((x) & 0x1) << 0) +#define S_028004_ZPASS_INCREMENT_DISABLE(x) (((x) & 0x1) << 0) #define S_028004_PERFECT_ZPASS_COUNTS(x)(((x) & 0x1) << 1) #define S_028004_SAMPLE_RATE(x) (((x) & 0x7) << 4) /* cayman only */ #define R_028008_DB_DEPTH_VIEW 0x00028008 diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h index de3fd06..0102638 100644 --- a/src/gallium/drivers/r600/r600_pipe.h +++ b/src/gallium/drivers/r600/r600_pipe.h @@ -120,6 +120,7 @@ struct r600_db_state { struct r600_db_misc_state { struct r600_atomatom; + boolocclusion_queries_disabled; boolflush_depthstencil_through_cb; boolflush_depth_inplace; boolflush_stencil_inplace; diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c index 62b46ce..c4de963 100644 --- a/src/gallium/drivers/r600/r600_state.c +++ b/src/gallium/drivers/r600/r600_state.c @@ -1644,12 +1644,16 @@ static void r600_emit_db_misc_state(struct r600_context *rctx, struct r600_atom } } - if (rctx->b.num_occlusion_queries > 0) { + if (rctx->b.num_occlusion_queries > 0 && + !a->occlusion_queries_disabled) { if (rctx->b.chip_class >= R700) { db_render_control |= S_028D0C_R700_PERFECT_ZPASS_COUNTS(1); } db_render_override |= S_028D10_NOOP_CULL_DISABLE(1); + } else { + db_render_control |= S_028D0C_ZPASS_INCREMENT_DISABLE(1); } + if (rctx->db_state.rsurf && rctx->db_state.rsurf->db_htile_surface) { /* FORCE_OFF means HiZ/HiS are determined by DB_SHADER_CONTROL */ db_render_override |= S_028D10_FORCE_HIZ_ENABLE(V_028D10_FORCE_OFF); diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c index 32a1049..cdb493d 100644 --- a/src/gallium/drivers/r600/r600_state_common.c +++ b/src/gallium/drivers/r600/r600_state_common.c @@ -2860,6 +2860,17 @@ static void r600_invalidate_buffer(struct pipe_context *ctx, struct pipe_resourc } } +static void r600_set_active_query_state(struct pipe_context *ctx, boolean enable) +{ + struct r600_context *rctx = (struct r600_context*)ctx; + + /* Occlusion queries. */ + if (rctx->db_misc_state.occlusion_queries_disabled != !enable) { + rctx->db_misc_state.occlusion_queries_disabled = !enable; + r600_mark_atom_dirty(rctx, >db_misc_state.atom); + } +} + static void r600_set_occlusion_query_state(struct pipe_context *ctx, bool enable) { struct r600_context *rctx = (struct
[Mesa-dev] [PATCH 2/7] r600g: simplify r600_set_occlusion_query_state
From: Marek OlšákThe caller does the same checking. --- src/gallium/drivers/r600/evergreen_state.c | 2 +- src/gallium/drivers/r600/r600_pipe.h | 1 - src/gallium/drivers/r600/r600_state.c| 2 +- src/gallium/drivers/r600/r600_state_common.c | 5 + 4 files changed, 3 insertions(+), 7 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index 6595267..077664d 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -1802,7 +1802,7 @@ static void evergreen_emit_db_misc_state(struct r600_context *rctx, struct r600_ S_02800C_FORCE_HIS_ENABLE0(V_02800C_FORCE_DISABLE) | S_02800C_FORCE_HIS_ENABLE1(V_02800C_FORCE_DISABLE); - if (a->occlusion_query_enabled) { + if (rctx->b.num_occlusion_queries > 0) { db_count_control |= S_028004_PERFECT_ZPASS_COUNTS(1); if (rctx->b.chip_class == CAYMAN) { db_count_control |= S_028004_SAMPLE_RATE(a->log_samples); diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h index cd0052a..de3fd06 100644 --- a/src/gallium/drivers/r600/r600_pipe.h +++ b/src/gallium/drivers/r600/r600_pipe.h @@ -120,7 +120,6 @@ struct r600_db_state { struct r600_db_misc_state { struct r600_atomatom; - boolocclusion_query_enabled; boolflush_depthstencil_through_cb; boolflush_depth_inplace; boolflush_stencil_inplace; diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c index 3189a13..62b46ce 100644 --- a/src/gallium/drivers/r600/r600_state.c +++ b/src/gallium/drivers/r600/r600_state.c @@ -1644,7 +1644,7 @@ static void r600_emit_db_misc_state(struct r600_context *rctx, struct r600_atom } } - if (a->occlusion_query_enabled) { + if (rctx->b.num_occlusion_queries > 0) { if (rctx->b.chip_class >= R700) { db_render_control |= S_028D0C_R700_PERFECT_ZPASS_COUNTS(1); } diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c index 82babeb..32a1049 100644 --- a/src/gallium/drivers/r600/r600_state_common.c +++ b/src/gallium/drivers/r600/r600_state_common.c @@ -2864,10 +2864,7 @@ static void r600_set_occlusion_query_state(struct pipe_context *ctx, bool enable { struct r600_context *rctx = (struct r600_context*)ctx; - if (rctx->db_misc_state.occlusion_query_enabled != enable) { - rctx->db_misc_state.occlusion_query_enabled = enable; - r600_mark_atom_dirty(rctx, >db_misc_state.atom); - } + r600_mark_atom_dirty(rctx, >db_misc_state.atom); } static void r600_need_gfx_cs_space(struct pipe_context *ctx, unsigned num_dw, -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/7] r600g: add pausing pipeline & streamout queries into set_active_query_state
From: Marek Olšák--- src/gallium/drivers/r600/evergreen_state.c | 12 src/gallium/drivers/r600/r600_hw_context.c | 10 ++ src/gallium/drivers/r600/r600_pipe.h | 2 +- src/gallium/drivers/r600/r600_state.c| 6 ++ src/gallium/drivers/r600/r600_state_common.c | 9 + 5 files changed, 38 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index c1b0b56..f76d7a9 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -2395,6 +2395,12 @@ static void cayman_init_atom_start_cs(struct r600_context *rctx) r600_store_value(cb, PKT3(PKT3_EVENT_WRITE, 0, 0)); r600_store_value(cb, EVENT_TYPE(EVENT_TYPE_PS_PARTIAL_FLUSH) | EVENT_INDEX(4)); + /* This enables pipeline stat & streamout queries. +* They are only disabled by blits. +*/ + r600_store_value(cb, PKT3(PKT3_EVENT_WRITE, 0, 0)); + r600_store_value(cb, EVENT_TYPE(EVENT_TYPE_PIPELINESTAT_START) | EVENT_INDEX(0)); + cayman_init_common_regs(cb, rctx->b.chip_class, rctx->b.family, rctx->screen->b.info.drm_minor); @@ -2648,6 +2654,12 @@ void evergreen_init_atom_start_cs(struct r600_context *rctx) r600_store_value(cb, PKT3(PKT3_EVENT_WRITE, 0, 0)); r600_store_value(cb, EVENT_TYPE(EVENT_TYPE_PS_PARTIAL_FLUSH) | EVENT_INDEX(4)); + /* This enables pipeline stat & streamout queries. +* They are only disabled by blits. +*/ + r600_store_value(cb, PKT3(PKT3_EVENT_WRITE, 0, 0)); + r600_store_value(cb, EVENT_TYPE(EVENT_TYPE_PIPELINESTAT_START) | EVENT_INDEX(0)); + evergreen_init_common_regs(rctx, cb, rctx->b.chip_class, rctx->b.family, rctx->screen->b.info.drm_minor); diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c index 7a6f957..63b631a 100644 --- a/src/gallium/drivers/r600/r600_hw_context.c +++ b/src/gallium/drivers/r600/r600_hw_context.c @@ -223,6 +223,16 @@ void r600_flush_emit(struct r600_context *rctx) cs->buf[cs->cdw++] = 0x000A; /* POLL_INTERVAL */ } + if (rctx->b.flags & R600_CONTEXT_START_PIPELINE_STATS) { + radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0)); + radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_PIPELINESTAT_START) | + EVENT_INDEX(0)); + } else if (rctx->b.flags & R600_CONTEXT_STOP_PIPELINE_STATS) { + radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0)); + radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_PIPELINESTAT_STOP) | + EVENT_INDEX(0)); + } + if (wait_until) { /* Use of WAIT_UNTIL is deprecated on Cayman+ */ if (rctx->b.family < CHIP_CAYMAN) { diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h index 0102638..86dd3c8 100644 --- a/src/gallium/drivers/r600/r600_pipe.h +++ b/src/gallium/drivers/r600/r600_pipe.h @@ -56,7 +56,7 @@ #define R600_CONTEXT_WAIT_CP_DMA_IDLE (R600_CONTEXT_PRIVATE_FLAG << 10) /* the number of CS dwords for flushing and drawing */ -#define R600_MAX_FLUSH_CS_DWORDS 16 +#define R600_MAX_FLUSH_CS_DWORDS 18 #define R600_MAX_DRAW_CS_DWORDS58 #define R600_MAX_USER_CONST_BUFFERS 13 diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c index c4de963..02702ae 100644 --- a/src/gallium/drivers/r600/r600_state.c +++ b/src/gallium/drivers/r600/r600_state.c @@ -2177,6 +2177,12 @@ void r600_init_atom_start_cs(struct r600_context *rctx) r600_store_value(cb, PKT3(PKT3_EVENT_WRITE, 0, 0)); r600_store_value(cb, EVENT_TYPE(EVENT_TYPE_PS_PARTIAL_FLUSH) | EVENT_INDEX(4)); + /* This enables pipeline stat & streamout queries. +* They are only disabled by blits. +*/ + r600_store_value(cb, PKT3(PKT3_EVENT_WRITE, 0, 0)); + r600_store_value(cb, EVENT_TYPE(EVENT_TYPE_PIPELINESTAT_START) | EVENT_INDEX(0)); + family = rctx->b.family; ps_prio = 0; vs_prio = 1; diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c index cdb493d..c03b75a 100644 --- a/src/gallium/drivers/r600/r600_state_common.c +++ b/src/gallium/drivers/r600/r600_state_common.c @@ -2864,6 +2864,15 @@ static void r600_set_active_query_state(struct pipe_context *ctx, boolean enable { struct r600_context *rctx = (struct r600_context*)ctx; + /* Pipeline stat & streamout queries. */ + if (enable) { + rctx->b.flags &= ~R600_CONTEXT_STOP_PIPELINE_STATS; + rctx->b.flags |= R600_CONTEXT_START_PIPELINE_STATS; + } else { + rctx->b.flags &=
Re: [Mesa-dev] [PATCH 2/2] Remove fs-discard-exit-3
BTW, Talos Principle had a bug that it calculated derivatives after discard, which is undefined on radeonsi if it results in non-uniform control flow. Then, somebody reported the bug to them, and they fixed it. I have to say that rarely do we find a test that quotes the spec and expects the exact opposite. :) (oh sorry, this should have been on the piglit ML) Marek On Fri, Apr 8, 2016 at 11:58 PM, Francisco Jerezwrote: > Marek Olšák writes: > >> From: Marek Olšák >> >> The test is wrong and the GLSL 1.30 citation in the test states very >> clearly that discard can cause non-uniform control flow for any code that >> follows. > > Heh, IIRC I sent this exact same patch over a year ago but I didn't get > particularly enthusiastic feed-back so I never pushed it (you can > probably find the discussion in the piglit mailing list archives if > you're interested). Anyway the test is obviously bogus so this seems > like the right thing to do to me: > > Reviewed-by: Francisco Jerez > >> --- >> .../execution/fs-discard-exit-3.shader_test| 76 >> -- >> 1 file changed, 76 deletions(-) >> delete mode 100644 >> tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test >> >> diff --git a/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test >> b/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test >> deleted file mode 100644 >> index 14e9b47..000 >> --- a/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test >> +++ /dev/null >> @@ -1,76 +0,0 @@ >> -# This is a test for derivatives behavior after a discard. >> -# >> -# From the GLSL 1.30 spec: >> -# >> -# "The discard keyword is only allowed within fragment shaders. It >> -# can be used within a fragment shader to abandon the operation >> -# on the current fragment. This keyword causes the fragment to be >> -# discarded and no updates to any buffers will occur. Control >> -# flow exits the shader, and subsequent implicit or explicit >> -# derivatives are undefined when this control flow is non-uniform >> -# (meaning different fragments within the primitive take >> -# different control paths)." >> - >> - >> -[require] >> -GLSL >= 1.30 >> - >> -[vertex shader] >> -#version 130 >> - >> -in vec4 vertex; >> -out vec2 texcoords; >> -void main() >> -{ >> - gl_Position = vertex; >> - >> - /* Turn the texcoords into a 1:1 mapping with pixels when >> - * interpolated. This means that the coords for our 2x2 >> - * subspan we're interested in for the FS will be: >> - * >> - * +-+-+ >> - * | 0,1 | 1,1 | >> - * +-+-+ >> - * | 0,0 | 0,1 | >> - * +-+-+ >> - * >> - * So it would sample the 1x1 miplevel of the GL_TEXTURE_2D >> - * miptree, unless some other math occurs... >> - */ >> - texcoords.yx = (vertex.xy + 1) / 2 * 250; >> -} >> - >> -[fragment shader] >> -#version 130 >> -in vec2 texcoords; >> -uniform sampler2D s; >> - >> -void main() >> -{ >> - if (gl_FragCoord.x >= 1.0 || gl_FragCoord.y >= 1.0) >> - discard; >> - >> - /* Now, we have uniform control after the discard (well, >> - * except for the join after the if statement up there). The >> - * derivatives on this sample should get us the same values >> - * for the undiscarded pixel as if we hadn't done any discard >> - * (comment out the "discard" above to see). >> - */ >> - gl_FragColor = texture(s, texcoords / 4); >> -} >> - >> -[vertex data] >> -vertex/float/2 >> --1.0 -1.0 >> - 1.0 -1.0 >> - 1.0 1.0 >> --1.0 1.0 >> - >> -[test] >> -clear color 0.5 0.5 0.5 0.5 >> -clear >> - >> -texture miptree 0 >> - >> -draw arrays GL_TRIANGLE_FAN 0 4 >> -probe rgba 0 0 0.0 1.0 0.0 1.0 >> -- >> 2.5.0 >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/4] i965: Port INTEL_PRECISE_TRIG=1 to NIR.
On Fri, Apr 8, 2016 at 9:36 AM, Eduardo Lima Mitevwrote: > On 04/08/2016 01:35 AM, Kenneth Graunke wrote: > >> This makes the extra multiply visible to NIR's algebraic optimizations >> (for constant reassociation) as well as constant folding. This means >> that when the result of sin/cos are multiplied by an constant, we can >> eliminate the extra multiply altogether, reducing the cost of the >> workaround. >> >> It also means we only have to implement it one place, rather than in >> both backends. >> >> This makes INTEL_PRECISE_TRIG=1 cost nothing on GPUTest/Volplosion, >> which has a ton of sin() calls, but always multiplies them by an >> immediate constant. The extra multiply gets folded away. >> >> > Cleaner, indeed. Patch (and series) is: > > Reviewed-by: Eduardo Lima Mitev > > Thanks! > > Eduardo > > > Signed-off-by: Kenneth Graunke >> --- >> src/mesa/drivers/dri/i965/Makefile.am | 5 +++ >> src/mesa/drivers/dri/i965/Makefile.sources | 1 + >> src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 16 +--- >> src/mesa/drivers/dri/i965/brw_nir.c| 3 ++ >> src/mesa/drivers/dri/i965/brw_nir.h| 2 + >> .../drivers/dri/i965/brw_nir_trig_workarounds.py | 43 >> ++ >> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 16 +--- >> 7 files changed, 58 insertions(+), 28 deletions(-) >> create mode 100755 src/mesa/drivers/dri/i965/brw_nir_trig_workarounds.py >> >> diff --git a/src/mesa/drivers/dri/i965/Makefile.am >> b/src/mesa/drivers/dri/i965/Makefile.am >> index 0db5a51..a41c830 100644 >> --- a/src/mesa/drivers/dri/i965/Makefile.am >> +++ b/src/mesa/drivers/dri/i965/Makefile.am >> @@ -33,6 +33,7 @@ AM_CFLAGS = \ >> -I$(top_srcdir)/src/mesa/drivers/dri/common \ >> -I$(top_srcdir)/src/mesa/drivers/dri/intel/server \ >> -I$(top_srcdir)/src/gtest/include \ >> + -I$(top_srcdir)/src/compiler/nir \ >> -I$(top_builddir)/src/compiler/nir \ >> -I$(top_builddir)/src/mesa/drivers/dri/common \ >> $(DEFINES) \ >> @@ -41,6 +42,10 @@ AM_CFLAGS = \ >> >> AM_CXXFLAGS = $(AM_CFLAGS) >> >> +brw_nir_trig_workarounds.c: brw_nir_trig_workarounds.py >> $(top_srcdir)/src/compiler/nir/nir_algebraic.py >> + $(MKDIR_GEN) >> + $(AM_V_GEN) PYTHONPATH=$(top_srcdir)/src/compiler/nir $(PYTHON2) >> $(PYTHON_FLAGS) $(srcdir)/brw_nir_trig_workarounds.py > $@ || ($(RM) $@; >> false) >> + >> noinst_LTLIBRARIES = libi965_dri.la libi965_compiler.la >> libi965_dri_la_SOURCES = $(i965_FILES) >> libi965_dri_la_LIBADD = libi965_compiler.la $(INTEL_LIBS) >> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources >> b/src/mesa/drivers/dri/i965/Makefile.sources >> index 4689588..2619e43 100644 >> --- a/src/mesa/drivers/dri/i965/Makefile.sources >> +++ b/src/mesa/drivers/dri/i965/Makefile.sources >> @@ -44,6 +44,7 @@ i965_compiler_FILES = \ >> brw_nir.c \ >> brw_nir_analyze_boolean_resolves.c \ >> brw_nir_attribute_workarounds.c \ >> + brw_nir_trig_workarounds.c \ >> brw_nir_opt_peephole_ffma.c \ >> brw_nir_uniforms.cpp \ >> brw_packed_float.c \ >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> index 90b8789..bd6314a 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> @@ -775,24 +775,12 @@ fs_visitor::nir_emit_alu(const fs_builder , >> nir_alu_instr *instr) >> break; >> >> case nir_op_fsin: >> - if (!compiler->precise_trig) { >> - inst = bld.emit(SHADER_OPCODE_SIN, result, op[0]); >> - } else { >> - fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F); >> - inst = bld.emit(SHADER_OPCODE_SIN, tmp, op[0]); >> - inst = bld.MUL(result, tmp, brw_imm_f(0.7)); >> - } >> + inst = bld.emit(SHADER_OPCODE_SIN, result, op[0]); >> inst->saturate = instr->dest.saturate; >> break; >> >> case nir_op_fcos: >> - if (!compiler->precise_trig) { >> - inst = bld.emit(SHADER_OPCODE_COS, result, op[0]); >> - } else { >> - fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F); >> - inst = bld.emit(SHADER_OPCODE_COS, tmp, op[0]); >> - inst = bld.MUL(result, tmp, brw_imm_f(0.7)); >> - } >> + inst = bld.emit(SHADER_OPCODE_COS, result, op[0]); >> inst->saturate = instr->dest.saturate; >> break; >> >> diff --git a/src/mesa/drivers/dri/i965/brw_nir.c >> b/src/mesa/drivers/dri/i965/brw_nir.c >> index 1821c0d..932979a 100644 >> --- a/src/mesa/drivers/dri/i965/brw_nir.c >> +++ b/src/mesa/drivers/dri/i965/brw_nir.c >> @@ -447,6 +447,9 @@ brw_preprocess_nir(const struct brw_compiler >> *compiler, nir_shader *nir) >> if (nir->stage == MESA_SHADER_GEOMETRY) >> OPT(nir_lower_gs_intrinsics); >> >> + if (compiler->precise_trig) >> +
Re: [Mesa-dev] [PATCH 1/4] nir: Do basic constant reassociation.
On Fri, Apr 8, 2016 at 8:21 AM, Eduardo Lima Mitevwrote: > On 04/08/2016 01:35 AM, Kenneth Graunke wrote: > >> Many shaders contain expression trees of the form: >> >> const_1 * (value * const_2) >> >> Reorganizing these to >> >> (const_1 * const_2) * value >> >> will allow constant folding to combine the constants. Sometimes, these >> constants are 2 and 0.5, so we can remove a multiply altogether. Other >> times, it can create more immediate constants, which can actually hurt. >> > Further justification would be that the original was 4 NIR instructions: 2 load_const and 2 multiplies. Once constant folding takes over, we are again left with 4 NIR instructions: 2 load_const and 2 multiplies. The difference is that we now have (value * const_2) and (value * (const_1 * const_2)). So we have no more instructions in the end and, hopefully, the (value * const_2) goes away. Obviously, the emperical evidence says this actually hurts sometimes, but that's not surprising. > Finding a good balance here is tricky. While much more could be done, >> this simple patch seems to have a lot of positive benefit while having >> a low downside. >> >> > Makes sense. > > Reviewed-by: Eduardo Lima Mitev > > shader-db results on Broadwell: >> >> total instructions in shared programs: 8963768 -> 8961369 (-0.03%) >> instructions in affected programs: 438318 -> 435919 (-0.55%) >> helped: 1502 >> HURT: 245 >> >> total cycles in shared programs: 71527354 -> 71421516 (-0.15%) >> cycles in affected programs: 11541788 -> 11435950 (-0.92%) >> helped: 3445 >> HURT: 1224 >> >> > FWIW, these are my results on HSW: > > total instructions in shared programs: 6223995 -> 6222470 (-0.02%) > instructions in affected programs: 300945 -> 299420 (-0.51%) > helped: 1231 > HURT: 240 > > total cycles in shared programs: 56809432 -> 56615552 (-0.34%) > cycles in affected programs: 7980160 -> 7786280 (-2.43%) > helped: 2087 > HURT: 634 > > Eduardo > > Signed-off-by: Kenneth Graunke >> --- >> src/compiler/nir/nir_opt_algebraic.py | 8 >> 1 file changed, 8 insertions(+) >> >> diff --git a/src/compiler/nir/nir_opt_algebraic.py >> b/src/compiler/nir/nir_opt_algebraic.py >> index e72b4a7..420d9d9 100644 >> --- a/src/compiler/nir/nir_opt_algebraic.py >> +++ b/src/compiler/nir/nir_opt_algebraic.py >> @@ -274,6 +274,14 @@ optimizations = [ >> (('fmul', ('fneg', a), b), ('fneg', ('fmul', a, b))), >> (('imul', ('ineg', a), b), ('ineg', ('imul', a, b))), >> >> + # Reassociate constants in add/mul chains so they can be folded >> together. >> + # For now, we only handle cases where the constants are separated by >> + # a single non-constant. We could do better eventually. >> + (('~fmul', '#a', ('fmul', b, '#c')), ('fmul', ('fmul', a, c), b)), >> + (('imul', '#a', ('imul', b, '#c')), ('imul', ('imul', a, c), b)), >> + (('~fadd', '#a', ('fadd', b, '#c')), ('fadd', ('fadd', a, c), b)), >> + (('iadd', '#a', ('iadd', b, '#c')), ('iadd', ('iadd', a, c), b)), >> + >> # Misc. lowering >> (('fmod', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, >> b, 'options->lower_fmod'), >> (('uadd_carry', a, b), ('b2i', ('ult', ('iadd', a, b), a)), >> 'options->lower_uadd_carry'), >> >> > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] Remove fs-discard-exit-3
Marek Olšákwrites: > From: Marek Olšák > > The test is wrong and the GLSL 1.30 citation in the test states very > clearly that discard can cause non-uniform control flow for any code that > follows. Heh, IIRC I sent this exact same patch over a year ago but I didn't get particularly enthusiastic feed-back so I never pushed it (you can probably find the discussion in the piglit mailing list archives if you're interested). Anyway the test is obviously bogus so this seems like the right thing to do to me: Reviewed-by: Francisco Jerez > --- > .../execution/fs-discard-exit-3.shader_test| 76 > -- > 1 file changed, 76 deletions(-) > delete mode 100644 > tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test > > diff --git a/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test > b/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test > deleted file mode 100644 > index 14e9b47..000 > --- a/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test > +++ /dev/null > @@ -1,76 +0,0 @@ > -# This is a test for derivatives behavior after a discard. > -# > -# From the GLSL 1.30 spec: > -# > -# "The discard keyword is only allowed within fragment shaders. It > -# can be used within a fragment shader to abandon the operation > -# on the current fragment. This keyword causes the fragment to be > -# discarded and no updates to any buffers will occur. Control > -# flow exits the shader, and subsequent implicit or explicit > -# derivatives are undefined when this control flow is non-uniform > -# (meaning different fragments within the primitive take > -# different control paths)." > - > - > -[require] > -GLSL >= 1.30 > - > -[vertex shader] > -#version 130 > - > -in vec4 vertex; > -out vec2 texcoords; > -void main() > -{ > - gl_Position = vertex; > - > - /* Turn the texcoords into a 1:1 mapping with pixels when > - * interpolated. This means that the coords for our 2x2 > - * subspan we're interested in for the FS will be: > - * > - * +-+-+ > - * | 0,1 | 1,1 | > - * +-+-+ > - * | 0,0 | 0,1 | > - * +-+-+ > - * > - * So it would sample the 1x1 miplevel of the GL_TEXTURE_2D > - * miptree, unless some other math occurs... > - */ > - texcoords.yx = (vertex.xy + 1) / 2 * 250; > -} > - > -[fragment shader] > -#version 130 > -in vec2 texcoords; > -uniform sampler2D s; > - > -void main() > -{ > - if (gl_FragCoord.x >= 1.0 || gl_FragCoord.y >= 1.0) > - discard; > - > - /* Now, we have uniform control after the discard (well, > - * except for the join after the if statement up there). The > - * derivatives on this sample should get us the same values > - * for the undiscarded pixel as if we hadn't done any discard > - * (comment out the "discard" above to see). > - */ > - gl_FragColor = texture(s, texcoords / 4); > -} > - > -[vertex data] > -vertex/float/2 > --1.0 -1.0 > - 1.0 -1.0 > - 1.0 1.0 > --1.0 1.0 > - > -[test] > -clear color 0.5 0.5 0.5 0.5 > -clear > - > -texture miptree 0 > - > -draw arrays GL_TRIANGLE_FAN 0 4 > -probe rgba 0 0 0.0 1.0 0.0 1.0 > -- > 2.5.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/4] nir: Silence unused "options" warning in algebraic passes.
Quoting Kenneth Graunke (2016-04-08 11:48:01) > On Friday, April 8, 2016 11:18:46 AM PDT Dylan Baker wrote: > > > Quoting Eduardo Lima Mitev (2016-04-08 08:26:22) > > > On 04/08/2016 01:35 AM, Kenneth Graunke wrote: > > > > Some passes may not refer to options->..., at which point the compiler > > > > will warn about an unused variable. Just cast to void unconditionally > > > > to shut it up. > > > > > > > > Signed-off-by: Kenneth Graunke> > > > --- > > > > src/compiler/nir/nir_algebraic.py | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/src/compiler/nir/nir_algebraic.py b/src/compiler/nir/ > nir_algebraic.py > > > > index d05564f..53a7907 100644 > > > > --- a/src/compiler/nir/nir_algebraic.py > > > > +++ b/src/compiler/nir/nir_algebraic.py > > > > @@ -291,6 +291,7 @@ ${pass_name}(nir_shader *shader) > > > > bool progress = false; > > > > bool condition_flags[${len(condition_list)}]; > > > > const nir_shader_compiler_options *options = shader->options; > > > > + (void) options; > > > > > > > > > > Hmm, don't like this very much. I suppose since it is generated code, it > > > can not be done per block where options is unused. > > > For respect to other people's right to have clean build logs, patch is: > > > > > > Reviewed-by: Eduardo Lima Mitev > > > > > > > % for index, condition in enumerate(condition_list): > > > > condition_flags[${index}] = ${condition}; > > > > > > > > > > > I'm not familiar with the code, but looking at it options must be used > > in the condition_list argument? > > Right, we create the 'options' temporary so you can write rules like: > > (('~fmin', ('fmax', a, 0.0), 1.0), ('fsat', a), '!options->lower_fsat'), > > where !options->lower_fsat becomes part of the condition_list. However, > conditions are optional. If you have an algebraic rule list which doesn't > include any conditions referring to options->..., the variable will be > unused. (For example, the list in patch 4 meets this criteria.) > So what about wrapping the definitions of options in a conditional? % if 'options' in conditions[-1]: const nir_shader_compiler_options *options = shader->options; % endif This only covers that options is the last argument, which seems to be what nir_opt_algebraic does, although I don't know that is actually a hard requirement, or just a coincidence. Dylan signature.asc Description: signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3] i965/disasm: Decode "channel mask present" bit correctly.
Bit 15 means "interleave" for most messages, but for SIMD8 messages it means "use channel masks". Signed-off-by: Kenneth Graunke--- src/mesa/drivers/dri/i965/brw_disasm.c | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c index 0ae237d..0848657 100644 --- a/src/mesa/drivers/dri/i965/brw_disasm.c +++ b/src/mesa/drivers/dri/i965/brw_disasm.c @@ -1505,7 +1505,9 @@ brw_disassemble_inst(FILE *file, const struct brw_device_info *devinfo, break; } - case BRW_SFID_URB: + case BRW_SFID_URB: { +unsigned opcode = brw_inst_urb_opcode(devinfo, inst); + format(file, " %ld", brw_inst_urb_global_offset(devinfo, inst)); space = 1; @@ -1513,10 +1515,18 @@ brw_disassemble_inst(FILE *file, const struct brw_device_info *devinfo, err |= control(file, "urb opcode", devinfo->gen >= 7 ? gen7_urb_opcode : gen5_urb_opcode, - brw_inst_urb_opcode(devinfo, inst), ); + opcode, ); + +if (opcode == GEN8_URB_OPCODE_SIMD8_WRITE || +opcode == GEN8_URB_OPCODE_SIMD8_READ) { + if (brw_inst_urb_channel_mask_present(devinfo, inst)) + string(file, " masked"); +} else { + err |= control(file, "urb swizzle", urb_swizzle, + brw_inst_urb_swizzle_control(devinfo, inst), + ); +} -err |= control(file, "urb swizzle", urb_swizzle, - brw_inst_urb_swizzle_control(devinfo, inst), ); if (devinfo->gen < 7) { err |= control(file, "urb allocate", urb_allocate, brw_inst_urb_allocate(devinfo, inst), ); @@ -1528,6 +1538,7 @@ brw_disassemble_inst(FILE *file, const struct brw_device_info *devinfo, brw_inst_urb_complete(devinfo, inst), ); } break; + } case BRW_SFID_THREAD_SPAWNER: break; -- 2.8.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] i965/disasm: Simplify the URB opcode printing with ?:.
Signed-off-by: Kenneth Graunke--- src/mesa/drivers/dri/i965/brw_disasm.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c index 09eb239..0ae237d 100644 --- a/src/mesa/drivers/dri/i965/brw_disasm.c +++ b/src/mesa/drivers/dri/i965/brw_disasm.c @@ -1509,13 +1509,12 @@ brw_disassemble_inst(FILE *file, const struct brw_device_info *devinfo, format(file, " %ld", brw_inst_urb_global_offset(devinfo, inst)); space = 1; -if (devinfo->gen >= 7) { - err |= control(file, "urb opcode", gen7_urb_opcode, - brw_inst_urb_opcode(devinfo, inst), ); -} else if (devinfo->gen >= 5) { - err |= control(file, "urb opcode", gen5_urb_opcode, - brw_inst_urb_opcode(devinfo, inst), ); -} + +err |= control(file, "urb opcode", + devinfo->gen >= 7 ? gen7_urb_opcode + : gen5_urb_opcode, + brw_inst_urb_opcode(devinfo, inst), ); + err |= control(file, "urb swizzle", urb_swizzle, brw_inst_urb_swizzle_control(devinfo, inst), ); if (devinfo->gen < 7) { -- 2.8.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] i965/disasm: Decode per-slot offsets.
We just never bothered to decode this. Signed-off-by: Kenneth Graunke--- src/mesa/drivers/dri/i965/brw_disasm.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c index 0848657..88bd7a4 100644 --- a/src/mesa/drivers/dri/i965/brw_disasm.c +++ b/src/mesa/drivers/dri/i965/brw_disasm.c @@ -1517,6 +1517,11 @@ brw_disassemble_inst(FILE *file, const struct brw_device_info *devinfo, : gen5_urb_opcode, opcode, ); +if (devinfo->gen >= 7 && +brw_inst_urb_per_slot_offset(devinfo, inst)) { + string(file, " per-slot"); +} + if (opcode == GEN8_URB_OPCODE_SIMD8_WRITE || opcode == GEN8_URB_OPCODE_SIMD8_READ) { if (brw_inst_urb_channel_mask_present(devinfo, inst)) -- 2.8.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 05/16] nir: add lowering pass for glBitmap
Rob Clarkwrites: > From: Rob Clark > > Signed-off-by: Rob Clark > Reviewed-by: Connor Abbott > --- > src/compiler/Makefile.sources | 1 + > src/compiler/nir/nir.h | 7 ++ > src/compiler/nir/nir_lower_bitmap.c | 166 > > 3 files changed, 174 insertions(+) > create mode 100644 src/compiler/nir/nir_lower_bitmap.c > > diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources > index c396128..cb9eee9 100644 > --- a/src/compiler/Makefile.sources > +++ b/src/compiler/Makefile.sources > @@ -185,6 +185,7 @@ NIR_FILES = \ > nir/nir_liveness.c \ > nir/nir_lower_alu_to_scalar.c \ > nir/nir_lower_atomics.c \ > + nir/nir_lower_bitmap.c \ > nir/nir_lower_clip.c \ > nir/nir_lower_drawpixels.c \ > nir/nir_lower_global_vars_to_local.c \ > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h > index 34635ed..85b28d2 100644 > --- a/src/compiler/nir/nir.h > +++ b/src/compiler/nir/nir.h > @@ -2269,6 +2269,13 @@ typedef struct nir_lower_drawpixels_options { > void nir_lower_drawpixels(nir_shader *shader, >const nir_lower_drawpixels_options *options); > > +typedef struct nir_lower_bitmap_options { > + unsigned sampler; > + bool swizzle_; > +} nir_lower_bitmap_options; > + > +void nir_lower_bitmap(nir_shader *shader, const nir_lower_bitmap_options > *options); > + > void nir_lower_atomics(nir_shader *shader, > const struct gl_shader_program *shader_program); > void nir_lower_to_source_mods(nir_shader *shader); > diff --git a/src/compiler/nir/nir_lower_bitmap.c > b/src/compiler/nir/nir_lower_bitmap.c > new file mode 100644 > index 000..3cba69a > --- /dev/null > +++ b/src/compiler/nir/nir_lower_bitmap.c > @@ -0,0 +1,166 @@ > +/* > + * Copyright © 2015 Red Hat > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > + * copy of this software and associated documentation files (the "Software"), > + * to deal in the Software without restriction, including without limitation > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > + * and/or sell copies of the Software, and to permit persons to whom the > + * Software is furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice (including the next > + * paragraph) shall be included in all copies or substantial portions of the > + * Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > FROM, > + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > THE > + * SOFTWARE. > + */ > + > +#include "nir.h" > +#include "nir_builder.h" > + > +/* Lower glBitmap(). > + * > + * This is based on the logic in st_get_bitmap_shader() in TGSI compiler. > + * From st_cb_bitmap.c: > + * > + *glBitmaps are drawn as textured quads. The user's bitmap pattern > + *is stored in a texture image. An alpha8 texture format is used. > + *The fragment shader samples a bit (texel) from the texture, then > + *discards the fragment if the bit is off. > + * > + *Note that we actually store the inverse image of the bitmap to > + *simplify the fragment program. An "on" bit gets stored as texel=0x0 > + *and an "off" bit is stored as texel=0xff. Then we kill the > + *fragment if the negated texel value is less than zero. > + * > + * Note that the texture format will be, according to what driver supports, > + * in order of preference (with swizzle): > + * > + *I8_UNORM - . > + *A8_UNORM - .000x > + *L8_UNORM - .xxx1 > + * > + * If L8_UNORM, options->swizzle_ is true. Otherwise we can just use > + * the .w comp. > + * > + * Run before nir_lower_io. > + */ > + > +typedef struct { > + const nir_lower_bitmap_options *options; > + nir_shader *shader; > + nir_builder b; > + nir_variable *texcoord; > +} lower_bitmap_state; > + > +static nir_ssa_def * > +get_texcoord(lower_bitmap_state *state) > +{ > + if (state->texcoord == NULL) { > + nir_variable *texcoord = NULL; > + > + /* find gl_TexCoord, if it exists: */ > + nir_foreach_variable(var, >shader->inputs) { > + if (strcmp(var->name, "gl_TexCoord") == 0) { > +texcoord = var; > +break; > + } > + } > + > + /* otherwise create it: */ > + if (texcoord == NULL) { > + texcoord = nir_variable_create(state->shader, > +
Re: [Mesa-dev] [PATCH 07/16] nir: passthrough-edgeflags support
Rob Clarkwrites: > From: Rob Clark > > Handled by tgsi_emulate for glsl->tgsi case. > > Signed-off-by: Rob Clark > Reviewed-by: Connor Abbott > --- > src/compiler/Makefile.sources | 1 + > src/compiler/nir/nir.h | 2 + > src/compiler/nir/nir_lower_passthrough_edgeflags.c | 82 > ++ > 3 files changed, 85 insertions(+) > create mode 100644 src/compiler/nir/nir_lower_passthrough_edgeflags.c > > diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources > index 12badf0..ae7efbf 100644 > --- a/src/compiler/Makefile.sources > +++ b/src/compiler/Makefile.sources > @@ -197,6 +197,7 @@ NIR_FILES = \ > nir/nir_lower_indirect_derefs.c \ > nir/nir_lower_io.c \ > nir/nir_lower_outputs_to_temporaries.c \ > + nir/nir_lower_passthrough_edgeflags.c \ > nir/nir_lower_phis_to_scalar.c \ > nir/nir_lower_returns.c \ > nir/nir_lower_samplers.c \ > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h > index 67373b6..65f75bd 100644 > --- a/src/compiler/nir/nir.h > +++ b/src/compiler/nir/nir.h > @@ -2246,6 +2246,8 @@ void nir_lower_two_sided_color(nir_shader *shader); > > void nir_lower_clamp_color_outputs(nir_shader *shader); > > +void nir_lower_passthrough_edgeflags(nir_shader *shader); > + > typedef struct nir_lower_wpos_ytransform_options { > int state_tokens[5]; > bool fs_coord_origin_upper_left :1; > diff --git a/src/compiler/nir/nir_lower_passthrough_edgeflags.c > b/src/compiler/nir/nir_lower_passthrough_edgeflags.c > new file mode 100644 > index 000..1afcbf7 > --- /dev/null > +++ b/src/compiler/nir/nir_lower_passthrough_edgeflags.c > @@ -0,0 +1,82 @@ > +/* > + * Copyright © 2015 Red Hat > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > + * copy of this software and associated documentation files (the "Software"), > + * to deal in the Software without restriction, including without limitation > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > + * and/or sell copies of the Software, and to permit persons to whom the > + * Software is furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice (including the next > + * paragraph) shall be included in all copies or substantial portions of the > + * Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > FROM, > + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > THE > + * SOFTWARE. > + */ > + > +#include "nir.h" > +#include "nir_builder.h" > + > +typedef struct { > + nir_shader *shader; > + nir_builder b; > +} lower_state; > + > +static nir_variable * > +create_edgeflag_var(nir_shader *shader, bool output) > +{ > + nir_variable_mode mode = output ? nir_var_shader_out : nir_var_shader_in; > + nir_variable *var = nir_variable_create(shader, mode, glsl_vec4_type(), > + output ? "edgeflag_out" : > "edgeflag_in"); > + var->data.location = output ? VARYING_SLOT_EDGE : VERT_ATTRIB_EDGEFLAG; > + return var; > +} > + > +static bool > +lower_block(nir_block *block, void *_state) > +{ > + lower_state *state = _state; > + nir_builder *b = >b; > + nir_variable *in, *out; > + nir_ssa_def *def; > + > + in = create_edgeflag_var(state->shader, false); > + out = create_edgeflag_var(state->shader, true); I think the code would be clearer if you inlined create_edgeflag_var. > + > + b->cursor = nir_before_block(block); > + > + def = nir_load_var(b, in); > + nir_store_var(b, out, def, 0xf); > + > + /* only do this for first block: */ > + return false; > +} > + > +static void > +lower_impl(lower_state *state, nir_function_impl *impl) > +{ > + nir_builder_init(>b, impl); > + > + nir_foreach_block(impl, lower_block, state); I think in the previous review when I suggested nir_before_block(), I actually wanted to say that you probably want here to just do nir_before_cf_list(>cf_node). > + nir_metadata_preserve(impl, nir_metadata_block_index | > + nir_metadata_dominance); > +} > + > +void nir_lower_passthrough_edgeflags(nir_shader *shader) > +{ > + lower_state state = { > + .shader = shader, > + }; > + > + nir_foreach_function(shader, function) { > + if ((strcmp(function->name, "main") == 0) && function->impl) > + lower_impl(, function->impl); > + } > +} We pretty clearly want a "Get me the main
Re: [Mesa-dev] Merging the Vulkan driver
Jason Ekstrandwrites: > All, > > We are getting very close to being able to merge the Vulkan driver into > mesa master. I've got around 30 patches on the mailing list that are still > awaiting review. Once those get merged, the diff between the "vulkan" and > "master" branches is basically zero except for adding new files. The going > plan is to get the diff between the two branches down to zero except for > the new "src/intel" directory and minimal build system changes and then do > an actual git merge. I know we don't usually do merges in this project, > but the Vulkan driver is big and has a lot of history that we would like to > preserve. If you're strongly opposed to doing a merge, please speak up now! > > Things that still need review: > > 1) i965 back-end patches for indirect push constants > 2) i965 and NIR patches to add a couple new opcodes > 3) Misc i965 and NIR patches we've been carrying around in the vulkan > branch for a while > 4) A patch to configure.ac to add new flags for building Vulkan drivers > (mostly written, not yet sent). > > Those will all undergo the normal review process and get pushed *before* we > merge. That way the only thing the merge does is add a new directory and a > three or four lines each to configure.ac and src/Makefile.am. > > If people step up with code review, I'm hoping that we can merge some time > this week or next. I think a merge makes a ton of sense for this. Having duplicate commits in the history sucks, but maintaining history is worth it. signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 00/15] i965: Rework uniform handling in the back-end
On Thu, Apr 7, 2016 at 10:01 PM, Matt Turnerwrote: > On Tue, Mar 22, 2016 at 3:33 PM, Jason Ekstrand > wrote: > > This is mostly a re-send of a patch series I've had floating around in > one > > form or a while for quite some time. It's basically the same except that > > the original version was missing a work-around for Sandy Bridge. For a > > while, I wasn't really pushing to get it merged because I couldn't > > demonstrate any actual performance benifit from pushing arrays. However, > > with the Vulkan API, the concept of push constants is directly exposed to > > the user and we really need to be able to indirect on them. This series > > makes the FS backend 100% ready for indirect push constants; vec4 will > > take a little more work. > > > > It's worth noting that we've been carying these patches around in our > > Vulkan driver for probably 3 or 4 months now and it's working great. > > > > For those that prefer to review on a branch: > > > > https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/i965-uniforms > > > > I think Kristian has mostly reviewed these patches. However, he never > sent > > any R-Bs to the list. I'd also like Ken or Matt to look at it from a > > design perspective. > > I don't know what I think. I'm sympathetic to Curro's argument, but in > the absence of more data it's hard to judge anything really. I'm not > at all sympathetic to > > """ > Do I have a proof-of-concept in code, no. However, I've run through > it in my head and I have a pretty good idea what it would look like. > You are free to go off and do it if you don't believe me, but I don't > really want to hold things up while you do. > """ > > That's what... An Appeal to Your Brain? :) > > I don't know how to proceed on that front if no one is willing or > interested in trying to implement it using reladdr. > > I ran shader-db. > > total instructions in shared programs: 7113290 -> 7161760 (0.68%) > instructions in affected programs: 866011 -> 914481 (5.60%) > helped: 0 > HURT: 7180 > > total cycles in shared programs: 64705926 -> 64776118 (0.11%) > cycles in affected programs: 4951554 -> 5021746 (1.42%) > helped: 1605 > HURT: 5204 > > of which the overwhelming majority is vertex shaders (why? this series > is i965/fs). FS changes are just > > instructions in affected programs: 13550 -> 14132 (4.30%) > helped: 0 > HURT: 50 > > but I'm having a hard time finding shaders that actually use the > address register. > > What's going on with the shader-db regressions? > As I figured, the shader-db issues are primarily a problem with inconsistent use of D and UD. This patch fixes that problem: https://cgit.freedesktop.org/~jekstrand/mesa/commit/?h=wip/i965-uniforms-v3=0dbf9c8ee415b19073efe92fa586fddf22b725e6 I've pushed a version of the series with that patch to fd.o here: https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=wip/i965-uniforms-v3 There is still a small shader-db delta in vec4 which comes from changing the algorithm we use to place pull constants in the buffer. The original algorithm walks over the instructions and places uniforms in the pull constant buffer as it finds an instructions that use them. The new algorithm places them in the same order as their uniform number which is effectively the order in which they are declared in the shader. There are a number of shaders from Valve games (left4dead, portal 2, tf2) that seem to have a single uniform array they use *a lot* and others that they use less frequently. The one they use most frequently happens to also be used first so it gets placed first in the buffer with the old algorighm and later with the new one. The only reason this makes any difference whatsoever is that whatever uniform gets placed first in the buffer is at offset 0 so you don't need to add a constant offset to the array offset and the address calculation has one less instruction. In these Valve games, that uniform happens to be used enough more often than the rest that that extra instruction shows up in the shader-db results. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] i965/tiled_memoyp: Get rid of the direction parameter to get_memcpy
On Thu, Apr 7, 2016 at 4:57 PM, Matt Turnerwrote: > On Thu, Apr 7, 2016 at 1:56 PM, Jason Ekstrand > wrote: > > Now that we can use the much simpler rgba8_copy function, we don't need > to > > hand different functions out based on direction. > > --- > > Typo in the subject, and PATCH 2/3 needs a bugzilla link. > Fixed > The series is > > Reviewed-by: Matt Turner > > I noticed that gcc does *not* unroll the loop in the 64-byte case. > I've got some small optimizations to this code that I'll send after > this series, so I'll just fix that up at the same time. > I just pushed. I added a Cc to stable for the first two patches. I recommend you do as well for whatever you send that fixes the unrolling problem. --Jason ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/4] nir: Silence unused "options" warning in algebraic passes.
On Friday, April 8, 2016 11:18:46 AM PDT Dylan Baker wrote: > Quoting Eduardo Lima Mitev (2016-04-08 08:26:22) > > On 04/08/2016 01:35 AM, Kenneth Graunke wrote: > > > Some passes may not refer to options->..., at which point the compiler > > > will warn about an unused variable. Just cast to void unconditionally > > > to shut it up. > > > > > > Signed-off-by: Kenneth Graunke> > > --- > > > src/compiler/nir/nir_algebraic.py | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/src/compiler/nir/nir_algebraic.py b/src/compiler/nir/ nir_algebraic.py > > > index d05564f..53a7907 100644 > > > --- a/src/compiler/nir/nir_algebraic.py > > > +++ b/src/compiler/nir/nir_algebraic.py > > > @@ -291,6 +291,7 @@ ${pass_name}(nir_shader *shader) > > > bool progress = false; > > > bool condition_flags[${len(condition_list)}]; > > > const nir_shader_compiler_options *options = shader->options; > > > + (void) options; > > > > > > > Hmm, don't like this very much. I suppose since it is generated code, it > > can not be done per block where options is unused. > > For respect to other people's right to have clean build logs, patch is: > > > > Reviewed-by: Eduardo Lima Mitev > > > > > % for index, condition in enumerate(condition_list): > > > condition_flags[${index}] = ${condition}; > > > > > > > I'm not familiar with the code, but looking at it options must be used > in the condition_list argument? Right, we create the 'options' temporary so you can write rules like: (('~fmin', ('fmax', a, 0.0), 1.0), ('fsat', a), '!options->lower_fsat'), where !options->lower_fsat becomes part of the condition_list. However, conditions are optional. If you have an algebraic rule list which doesn't include any conditions referring to options->..., the variable will be unused. (For example, the list in patch 4 meets this criteria.) signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/4] nir: Silence unused "options" warning in algebraic passes.
Quoting Eduardo Lima Mitev (2016-04-08 08:26:22) > On 04/08/2016 01:35 AM, Kenneth Graunke wrote: > > Some passes may not refer to options->..., at which point the compiler > > will warn about an unused variable. Just cast to void unconditionally > > to shut it up. > > > > Signed-off-by: Kenneth Graunke> > --- > > src/compiler/nir/nir_algebraic.py | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/src/compiler/nir/nir_algebraic.py > > b/src/compiler/nir/nir_algebraic.py > > index d05564f..53a7907 100644 > > --- a/src/compiler/nir/nir_algebraic.py > > +++ b/src/compiler/nir/nir_algebraic.py > > @@ -291,6 +291,7 @@ ${pass_name}(nir_shader *shader) > > bool progress = false; > > bool condition_flags[${len(condition_list)}]; > > const nir_shader_compiler_options *options = shader->options; > > + (void) options; > > > > Hmm, don't like this very much. I suppose since it is generated code, it > can not be done per block where options is unused. > For respect to other people's right to have clean build logs, patch is: > > Reviewed-by: Eduardo Lima Mitev > > > % for index, condition in enumerate(condition_list): > > condition_flags[${index}] = ${condition}; > > > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev I'm not familiar with the code, but looking at it options must be used in the condition_list argument? signature.asc Description: signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] i965/tiled_memoyp: Get rid of the direction parameter to get_memcpy
On 04/07/2016 01:56 PM, Jason Ekstrand wrote: > Now that we can use the much simpler rgba8_copy function, we don't need to > hand different functions out based on direction. > --- > src/mesa/drivers/dri/i965/intel_pixel_read.c | 3 +-- > src/mesa/drivers/dri/i965/intel_tex_image.c| 3 +-- > src/mesa/drivers/dri/i965/intel_tex_subimage.c | 3 +-- > src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 3 +-- > src/mesa/drivers/dri/i965/intel_tiled_memcpy.h | 15 +-- > 5 files changed, 5 insertions(+), 22 deletions(-) I read the patches closely, and I like their approach. For the series, Reviewed-by: Chad Versace___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeon/uvd: alignment fix for decode message buffer
Cc:On 04/08/2016 11:34 AM, Boyuan Zhang wrote: Signed-off-by: Boyuan Zhang Reviewed-by: Christian König --- src/gallium/drivers/radeon/radeon_uvd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/radeon/radeon_uvd.c b/src/gallium/drivers/radeon/radeon_uvd.c index 233f460..098baf2 100644 --- a/src/gallium/drivers/radeon/radeon_uvd.c +++ b/src/gallium/drivers/radeon/radeon_uvd.c @@ -1003,7 +1003,7 @@ static void ruvd_end_frame(struct pipe_video_codec *decoder, dec->msg->body.decode.dpb_size = dec->dpb.res->buf->size; dec->msg->body.decode.bsd_size = bs_size; - dec->msg->body.decode.db_pitch = dec->base.width; + dec->msg->body.decode.db_pitch = align(dec->base.width, 16); dt = dec->set_dtb(dec->msg, (struct vl_video_buffer *)target); if (((struct r600_common_screen*)dec->screen)->family >= CHIP_STONEY) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/4] i965: Port INTEL_PRECISE_TRIG=1 to NIR.
On 04/08/2016 01:35 AM, Kenneth Graunke wrote: This makes the extra multiply visible to NIR's algebraic optimizations (for constant reassociation) as well as constant folding. This means that when the result of sin/cos are multiplied by an constant, we can eliminate the extra multiply altogether, reducing the cost of the workaround. It also means we only have to implement it one place, rather than in both backends. This makes INTEL_PRECISE_TRIG=1 cost nothing on GPUTest/Volplosion, which has a ton of sin() calls, but always multiplies them by an immediate constant. The extra multiply gets folded away. Cleaner, indeed. Patch (and series) is: Reviewed-by: Eduardo Lima MitevThanks! Eduardo Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/Makefile.am | 5 +++ src/mesa/drivers/dri/i965/Makefile.sources | 1 + src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 16 +--- src/mesa/drivers/dri/i965/brw_nir.c| 3 ++ src/mesa/drivers/dri/i965/brw_nir.h| 2 + .../drivers/dri/i965/brw_nir_trig_workarounds.py | 43 ++ src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 16 +--- 7 files changed, 58 insertions(+), 28 deletions(-) create mode 100755 src/mesa/drivers/dri/i965/brw_nir_trig_workarounds.py diff --git a/src/mesa/drivers/dri/i965/Makefile.am b/src/mesa/drivers/dri/i965/Makefile.am index 0db5a51..a41c830 100644 --- a/src/mesa/drivers/dri/i965/Makefile.am +++ b/src/mesa/drivers/dri/i965/Makefile.am @@ -33,6 +33,7 @@ AM_CFLAGS = \ -I$(top_srcdir)/src/mesa/drivers/dri/common \ -I$(top_srcdir)/src/mesa/drivers/dri/intel/server \ -I$(top_srcdir)/src/gtest/include \ + -I$(top_srcdir)/src/compiler/nir \ -I$(top_builddir)/src/compiler/nir \ -I$(top_builddir)/src/mesa/drivers/dri/common \ $(DEFINES) \ @@ -41,6 +42,10 @@ AM_CFLAGS = \ AM_CXXFLAGS = $(AM_CFLAGS) +brw_nir_trig_workarounds.c: brw_nir_trig_workarounds.py $(top_srcdir)/src/compiler/nir/nir_algebraic.py + $(MKDIR_GEN) + $(AM_V_GEN) PYTHONPATH=$(top_srcdir)/src/compiler/nir $(PYTHON2) $(PYTHON_FLAGS) $(srcdir)/brw_nir_trig_workarounds.py > $@ || ($(RM) $@; false) + noinst_LTLIBRARIES = libi965_dri.la libi965_compiler.la libi965_dri_la_SOURCES = $(i965_FILES) libi965_dri_la_LIBADD = libi965_compiler.la $(INTEL_LIBS) diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources index 4689588..2619e43 100644 --- a/src/mesa/drivers/dri/i965/Makefile.sources +++ b/src/mesa/drivers/dri/i965/Makefile.sources @@ -44,6 +44,7 @@ i965_compiler_FILES = \ brw_nir.c \ brw_nir_analyze_boolean_resolves.c \ brw_nir_attribute_workarounds.c \ + brw_nir_trig_workarounds.c \ brw_nir_opt_peephole_ffma.c \ brw_nir_uniforms.cpp \ brw_packed_float.c \ diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 90b8789..bd6314a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -775,24 +775,12 @@ fs_visitor::nir_emit_alu(const fs_builder , nir_alu_instr *instr) break; case nir_op_fsin: - if (!compiler->precise_trig) { - inst = bld.emit(SHADER_OPCODE_SIN, result, op[0]); - } else { - fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F); - inst = bld.emit(SHADER_OPCODE_SIN, tmp, op[0]); - inst = bld.MUL(result, tmp, brw_imm_f(0.7)); - } + inst = bld.emit(SHADER_OPCODE_SIN, result, op[0]); inst->saturate = instr->dest.saturate; break; case nir_op_fcos: - if (!compiler->precise_trig) { - inst = bld.emit(SHADER_OPCODE_COS, result, op[0]); - } else { - fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F); - inst = bld.emit(SHADER_OPCODE_COS, tmp, op[0]); - inst = bld.MUL(result, tmp, brw_imm_f(0.7)); - } + inst = bld.emit(SHADER_OPCODE_COS, result, op[0]); inst->saturate = instr->dest.saturate; break; diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c index 1821c0d..932979a 100644 --- a/src/mesa/drivers/dri/i965/brw_nir.c +++ b/src/mesa/drivers/dri/i965/brw_nir.c @@ -447,6 +447,9 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir) if (nir->stage == MESA_SHADER_GEOMETRY) OPT(nir_lower_gs_intrinsics); + if (compiler->precise_trig) + OPT(brw_nir_apply_trig_workarounds); + static const nir_lower_tex_options tex_options = { .lower_txp = ~0, }; diff --git a/src/mesa/drivers/dri/i965/brw_nir.h b/src/mesa/drivers/dri/i965/brw_nir.h index b10c083..2711606 100644 --- a/src/mesa/drivers/dri/i965/brw_nir.h +++ b/src/mesa/drivers/dri/i965/brw_nir.h @@ -106,6 +106,8 @@ bool brw_nir_apply_attribute_workarounds(nir_shader *nir,
Re: [Mesa-dev] [Nouveau] [PATCH] nouveau: codegen: Take src swizzle into account on loads
On Fri, Apr 8, 2016 at 12:26 PM, Hans de Goedewrote: > Hi, > > > On 08-04-16 18:06, Hans de Goede wrote: >> >> Hi, >> >> On 08-04-16 17:45, Ilia Mirkin wrote: >>> >>> On Fri, Apr 8, 2016 at 11:28 AM, Hans de Goede >>> wrote: When dealing with non vector variables the llvm register allocator will use TEMP[0].x then TEMP[0].y, etc. When loading something from a global buffer it will calculate the address to use, and store that in say TEMP[0].x, so it ends up generating: LOAD TEMP[0].y, MEMORY[0], TEMP[0] Expecting the contents of TEMP[0].y to become the 32 bits of data to which TEMP[0].x is pointing. But instead it will get the 32 bits of data at address (TEMP[0].x + 4). With the old RES[32767] code one could generate the following TGSI: LOAD TEMP[0].y, RES[32767]., TEMP[0] And things would work fine since the . swizzling postfix would be honored and when storing to y (the only component set in the dest-mask) the x component at address (TEMP[0].x) would be loaded, rather then the y component at (TEMP[0].y) Note that another approach would be to not increment the address by a 32 bit word for skipped (not set in destmask) components. The way I see it either: 1) We see that LOAD does not deal with vectors, but with flat memory, in which case skipping 4 bytes because x is not set in the destmask does not make sense, as that is a vector thing todo. 2) LOAD is vector layout aware in which case supporting swizzling makes sense. Currently we have a weird hybrid which is rather cumbersome to work with from a compiler pov. >>> >>> >>> And I guess LLVM never ends up generating any of the other "funny" >>> instructions like LIT and the such. Well, I have no problem adding the >>> swizzling logic, i.e. the way that LOAD will now work (logically) is >>> that it will >>> >>> (a) fetch 4 values from the coordinates provided (4 sequential dwords >>> from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in >>> the case of images) >>> (b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE >>> argument >>> (c) store that swizzled result into the destination based on the >>> writemask >>> >>> That would sound reasonable to me, and if I understand correctly, is >>> option 2 of your proposal. >> >> >> Yes that is option 2, and is basically what the patch which started this >> thread does. So that would work for me :) >> >>> We'd need some docs updates and buy-in from the other gallium driver >>> developers. >> >> >> What docs would need updating ? The TGSI docs I'm aware of are at: >> >> http://gallium.readthedocs.org/en/latest/tgsi.html >> >> I assume those have a source in the mesa src somewhere (I've not looked), >> but those mostly just look quite incomplete in general when it comes to >> TGSI >> (I've had to revert to figuring what things do from the mesa srcs quite >> often) >> >> Have I been looking at the wrong docs perhaps ? >> >> Note that them being incomplete is not intended as an excuse to not >> document >> this, I'm all for better documentation. >> >>> STORE remains unchanged, as the MEMORY/etc is in the destination, >>> where there is a writemask, which is presently used and will remain >>> effective. >> >> >> Right and note that the first src operand of STORE already takes swizzling >> into account, so the proposed option 2 will actually make the 2 more >> inline. > > > Erm, I mean the 2nd src operand of the store of-course, the actual src. > > On a related note, comparing handleLOAD and handleSTORE, this bit in > handleLOAD seems wrong: > > Value *off = fetchSrc(1, c); > > I believe that should be: > > Value *off = fetchSrc(1, 0); Yep, that's wrong. I think I was waffling back and forth early on in the lifetime of the patchset about how it would work, and one of the options was to read each dword from a separate offset. (I think I started implementing atomic buffers well over a year ago, only to be stymied by the memory window issue and give up for a long time.) I eventually came to realize that was insanity. > > Just like handleSTORE does: > > off = fetchSrc(0, 0); > > And always using a 'c' of 0 seems correct here since we are dealing > with an address. > > Once I know which docs to update for this, I'll do a v2 of this patch > and add a preparation patch fixing the above to the v2 set. src/gallium/docs/source/tgsi.rst There are push hooks which make readthedocs.org re-pull from the mesa repo on every push so that things are up to date (well, it takes a few minutes to regenerate the html). -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Nouveau] [PATCH] nouveau: codegen: Take src swizzle into account on loads
Hi, On 08-04-16 18:06, Hans de Goede wrote: Hi, On 08-04-16 17:45, Ilia Mirkin wrote: On Fri, Apr 8, 2016 at 11:28 AM, Hans de Goedewrote: When dealing with non vector variables the llvm register allocator will use TEMP[0].x then TEMP[0].y, etc. When loading something from a global buffer it will calculate the address to use, and store that in say TEMP[0].x, so it ends up generating: LOAD TEMP[0].y, MEMORY[0], TEMP[0] Expecting the contents of TEMP[0].y to become the 32 bits of data to which TEMP[0].x is pointing. But instead it will get the 32 bits of data at address (TEMP[0].x + 4). With the old RES[32767] code one could generate the following TGSI: LOAD TEMP[0].y, RES[32767]., TEMP[0] And things would work fine since the . swizzling postfix would be honored and when storing to y (the only component set in the dest-mask) the x component at address (TEMP[0].x) would be loaded, rather then the y component at (TEMP[0].y) Note that another approach would be to not increment the address by a 32 bit word for skipped (not set in destmask) components. The way I see it either: 1) We see that LOAD does not deal with vectors, but with flat memory, in which case skipping 4 bytes because x is not set in the destmask does not make sense, as that is a vector thing todo. 2) LOAD is vector layout aware in which case supporting swizzling makes sense. Currently we have a weird hybrid which is rather cumbersome to work with from a compiler pov. And I guess LLVM never ends up generating any of the other "funny" instructions like LIT and the such. Well, I have no problem adding the swizzling logic, i.e. the way that LOAD will now work (logically) is that it will (a) fetch 4 values from the coordinates provided (4 sequential dwords from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in the case of images) (b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE argument (c) store that swizzled result into the destination based on the writemask That would sound reasonable to me, and if I understand correctly, is option 2 of your proposal. Yes that is option 2, and is basically what the patch which started this thread does. So that would work for me :) We'd need some docs updates and buy-in from the other gallium driver developers. What docs would need updating ? The TGSI docs I'm aware of are at: http://gallium.readthedocs.org/en/latest/tgsi.html I assume those have a source in the mesa src somewhere (I've not looked), but those mostly just look quite incomplete in general when it comes to TGSI (I've had to revert to figuring what things do from the mesa srcs quite often) Have I been looking at the wrong docs perhaps ? Note that them being incomplete is not intended as an excuse to not document this, I'm all for better documentation. STORE remains unchanged, as the MEMORY/etc is in the destination, where there is a writemask, which is presently used and will remain effective. Right and note that the first src operand of STORE already takes swizzling into account, so the proposed option 2 will actually make the 2 more inline. Erm, I mean the 2nd src operand of the store of-course, the actual src. On a related note, comparing handleLOAD and handleSTORE, this bit in handleLOAD seems wrong: Value *off = fetchSrc(1, c); I believe that should be: Value *off = fetchSrc(1, 0); Just like handleSTORE does: off = fetchSrc(0, 0); And always using a 'c' of 0 seems correct here since we are dealing with an address. Once I know which docs to update for this, I'll do a v2 of this patch and add a preparation patch fixing the above to the v2 set. Regards, Hans ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH mesa v2 1/2] nouveau: codegen: Use FILE_MEMORY_BUFFER for buffers
On 04/08/2016 12:17 PM, Hans de Goede wrote: Hi, On 23-03-16 23:10, Samuel Pitoiset wrote: Are you sure this won't break compute shaders on fermi? Could you please double-check that? I just checked: lspci: 01:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 610] (rev a1) Before this patch-set: [hans@plank piglit]$ ./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader [9/9] pass: 9 / After this patch-set: [hans@plank piglit]$ ./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader [9/9] pass: 9 / And what about compute shaders? (ie. -t arb_compute_shader) Sorry to ask you again but I just want to be sure it's fine. :-) One minor comment below. On 03/17/2016 05:07 PM, Hans de Goede wrote: Some of the lowering steps we currently do for FILE_MEMORY_GLOBAL only apply to buffers, making it impossible to use FILE_MEMORY_GLOBAL for OpenCL global buffers. This commits changes the buffer code to use FILE_MEMORY_BUFFER at the ir_from_tgsi and lowering steps, freeing use of FILE_MEMORY_GLOBAL for use with OpenCL global buffers. Note that after lowering buffer accesses use the FILE_MEMORY_GLOBAL register file. Tested with piglet on a gk107, before this patch: ./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader [9/9] pass: 9 / after: ./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader [9/9] pass: 9 / Signed-off-by: Hans de Goede--- Changes in v2: -New patch in v2 of patch-set to re-enable support for global opencl buffers --- src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 + src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 +- src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 8 +--- src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp | 1 + src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 5 - src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp | 1 + 6 files changed, 13 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h index 7b0eb2f..5141fc6 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h @@ -332,6 +332,7 @@ enum DataFile FILE_MEMORY_CONST, FILE_SHADER_INPUT, FILE_SHADER_OUTPUT, + FILE_MEMORY_BUFFER, FILE_MEMORY_GLOBAL, FILE_MEMORY_SHARED, FILE_MEMORY_LOCAL, diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index baa2e30..7ae0cb2 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -373,7 +373,7 @@ static nv50_ir::DataFile translateFile(uint file) case TGSI_FILE_PREDICATE: return nv50_ir::FILE_PREDICATE; case TGSI_FILE_IMMEDIATE: return nv50_ir::FILE_IMMEDIATE; case TGSI_FILE_SYSTEM_VALUE:return nv50_ir::FILE_SYSTEM_VALUE; - case TGSI_FILE_BUFFER: return nv50_ir::FILE_MEMORY_GLOBAL; + case TGSI_FILE_BUFFER: return nv50_ir::FILE_MEMORY_BUFFER; case TGSI_FILE_MEMORY: return nv50_ir::FILE_MEMORY_GLOBAL; case TGSI_FILE_SAMPLER: case TGSI_FILE_NULL: diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp index d0936d8..628deb7 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp @@ -1141,13 +1141,14 @@ NVC0LoweringPass::handleATOM(Instruction *atom) handleSharedATOM(atom); return true; default: - assert(atom->src(0).getFile() == FILE_MEMORY_GLOBAL); + assert(atom->src(0).getFile() == FILE_MEMORY_BUFFER); base = loadResInfo64(ind, atom->getSrc(0)->reg.fileIndex * 16); assert(base->reg.size == 8); if (ptr) base = bld.mkOp2v(OP_ADD, TYPE_U64, base, base, ptr); assert(base->reg.size == 8); atom->setIndirect(0, 0, base); + atom->getSrc(0)->reg.file = FILE_MEMORY_GLOBAL; return true; } base = @@ -1963,7 +1964,7 @@ NVC0LoweringPass::visit(Instruction *i) } else if (i->src(0).getFile() == FILE_SHADER_OUTPUT) { assert(prog->getType() == Program::TYPE_TESSELLATION_CONTROL); i->op = OP_VFETCH; - } else if (i->src(0).getFile() == FILE_MEMORY_GLOBAL) { + } else if (i->src(0).getFile() == FILE_MEMORY_BUFFER) { Value *ind = i->getIndirect(0, 1); Value *ptr = loadResInfo64(ind, i->getSrc(0)->reg.fileIndex * 16); // XXX come up with a way not to do this for EVERY little access but @@ -1978,6 +1979,7 @@ NVC0LoweringPass::visit(Instruction *i) } i->setIndirect(0, 1, NULL);
Re: [Mesa-dev] [PATCH] nouveau: codegen: Take src swizzle into account on loads
Hi, On 08-04-16 17:45, Ilia Mirkin wrote: On Fri, Apr 8, 2016 at 11:28 AM, Hans de Goedewrote: When dealing with non vector variables the llvm register allocator will use TEMP[0].x then TEMP[0].y, etc. When loading something from a global buffer it will calculate the address to use, and store that in say TEMP[0].x, so it ends up generating: LOAD TEMP[0].y, MEMORY[0], TEMP[0] Expecting the contents of TEMP[0].y to become the 32 bits of data to which TEMP[0].x is pointing. But instead it will get the 32 bits of data at address (TEMP[0].x + 4). With the old RES[32767] code one could generate the following TGSI: LOAD TEMP[0].y, RES[32767]., TEMP[0] And things would work fine since the . swizzling postfix would be honored and when storing to y (the only component set in the dest-mask) the x component at address (TEMP[0].x) would be loaded, rather then the y component at (TEMP[0].y) Note that another approach would be to not increment the address by a 32 bit word for skipped (not set in destmask) components. The way I see it either: 1) We see that LOAD does not deal with vectors, but with flat memory, in which case skipping 4 bytes because x is not set in the destmask does not make sense, as that is a vector thing todo. 2) LOAD is vector layout aware in which case supporting swizzling makes sense. Currently we have a weird hybrid which is rather cumbersome to work with from a compiler pov. And I guess LLVM never ends up generating any of the other "funny" instructions like LIT and the such. Well, I have no problem adding the swizzling logic, i.e. the way that LOAD will now work (logically) is that it will (a) fetch 4 values from the coordinates provided (4 sequential dwords from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in the case of images) (b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE argument (c) store that swizzled result into the destination based on the writemask That would sound reasonable to me, and if I understand correctly, is option 2 of your proposal. Yes that is option 2, and is basically what the patch which started this thread does. So that would work for me :) We'd need some docs updates and buy-in from the other gallium driver developers. What docs would need updating ? The TGSI docs I'm aware of are at: http://gallium.readthedocs.org/en/latest/tgsi.html I assume those have a source in the mesa src somewhere (I've not looked), but those mostly just look quite incomplete in general when it comes to TGSI (I've had to revert to figuring what things do from the mesa srcs quite often) Have I been looking at the wrong docs perhaps ? Note that them being incomplete is not intended as an excuse to not document this, I'm all for better documentation. STORE remains unchanged, as the MEMORY/etc is in the destination, where there is a writemask, which is presently used and will remain effective. Right and note that the first src operand of STORE already takes swizzling into account, so the proposed option 2 will actually make the 2 more inline. Regards, Hans ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nouveau: codegen: Take src swizzle into account on loads
On Fri, Apr 8, 2016 at 11:28 AM, Hans de Goedewrote: > When dealing with non vector variables the llvm register allocator > will use TEMP[0].x then TEMP[0].y, etc. > > When loading something from a global buffer it will calculate the > address to use, and store that in say TEMP[0].x, so it ends up > generating: > > LOAD TEMP[0].y, MEMORY[0], TEMP[0] > > Expecting the contents of TEMP[0].y to become the 32 bits of data > to which TEMP[0].x is pointing. But instead it will get the 32 bits of > data at address (TEMP[0].x + 4). > > With the old RES[32767] code one could generate the following TGSI: > > LOAD TEMP[0].y, RES[32767]., TEMP[0] > > And things would work fine since the . swizzling postfix would > be honored and when storing to y (the only component set in the dest-mask) > the x component at address (TEMP[0].x) would be loaded, rather then the > y component at (TEMP[0].y) > > Note that another approach would be to not increment the address by > a 32 bit word for skipped (not set in destmask) components. > > The way I see it either: > > 1) We see that LOAD does not deal with vectors, but with flat memory, > in which case skipping 4 bytes because x is not set in the destmask > does not make sense, as that is a vector thing todo. > > 2) LOAD is vector layout aware in which case supporting swizzling > makes sense. > > Currently we have a weird hybrid which is rather cumbersome to > work with from a compiler pov. And I guess LLVM never ends up generating any of the other "funny" instructions like LIT and the such. Well, I have no problem adding the swizzling logic, i.e. the way that LOAD will now work (logically) is that it will (a) fetch 4 values from the coordinates provided (4 sequential dwords from src1.x in the case of buffer/memory, RGBA colors from src1.xyz in the case of images) (b) swizzle them according to the swizzle on the MEMORY/BUFFER/IMAGE argument (c) store that swizzled result into the destination based on the writemask That would sound reasonable to me, and if I understand correctly, is option 2 of your proposal. We'd need some docs updates and buy-in from the other gallium driver developers. STORE remains unchanged, as the MEMORY/etc is in the destination, where there is a writemask, which is presently used and will remain effective. -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radeon/uvd: alignment fix for decode message buffer
Signed-off-by: Boyuan ZhangReviewed-by: Christian König --- src/gallium/drivers/radeon/radeon_uvd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/radeon/radeon_uvd.c b/src/gallium/drivers/radeon/radeon_uvd.c index 233f460..098baf2 100644 --- a/src/gallium/drivers/radeon/radeon_uvd.c +++ b/src/gallium/drivers/radeon/radeon_uvd.c @@ -1003,7 +1003,7 @@ static void ruvd_end_frame(struct pipe_video_codec *decoder, dec->msg->body.decode.dpb_size = dec->dpb.res->buf->size; dec->msg->body.decode.bsd_size = bs_size; - dec->msg->body.decode.db_pitch = dec->base.width; + dec->msg->body.decode.db_pitch = align(dec->base.width, 16); dt = dec->set_dtb(dec->msg, (struct vl_video_buffer *)target); if (((struct r600_common_screen*)dec->screen)->family >= CHIP_STONEY) -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/4] nir: Silence unused "options" warning in algebraic passes.
On 04/08/2016 01:35 AM, Kenneth Graunke wrote: Some passes may not refer to options->..., at which point the compiler will warn about an unused variable. Just cast to void unconditionally to shut it up. Signed-off-by: Kenneth Graunke--- src/compiler/nir/nir_algebraic.py | 1 + 1 file changed, 1 insertion(+) diff --git a/src/compiler/nir/nir_algebraic.py b/src/compiler/nir/nir_algebraic.py index d05564f..53a7907 100644 --- a/src/compiler/nir/nir_algebraic.py +++ b/src/compiler/nir/nir_algebraic.py @@ -291,6 +291,7 @@ ${pass_name}(nir_shader *shader) bool progress = false; bool condition_flags[${len(condition_list)}]; const nir_shader_compiler_options *options = shader->options; + (void) options; Hmm, don't like this very much. I suppose since it is generated code, it can not be done per block where options is unused. For respect to other people's right to have clean build logs, patch is: Reviewed-by: Eduardo Lima Mitev % for index, condition in enumerate(condition_list): condition_flags[${index}] = ${condition}; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nouveau: codegen: Take src swizzle into account on loads
Hi, On 08-04-16 17:02, Ilia Mirkin wrote: On Fri, Apr 8, 2016 at 5:27 AM, Hans de Goedewrote: Hi, On 07-04-16 15:58, Ilia Mirkin wrote: That's wrong. It used to work with the old RES[] code and if one cannot specify a source swizzle, then how can I do something like LOAD TEMP[0].y, MEMORY[0], address And get the data at absolute global memory address "address" into TEMP[0].y ? This is a must-have for llvm to be able to generate working TGSI code, I do not see any way around this. AFAIK this is exactly what src-swizzling is for. Also note that this commit does not change anything if no src-swizzling is specified, in that case things work exactly as before. The spec for the instruction needs to be clarified... The current nouveau impl is correct - only the .x of the address should be loaded, with up to 16 bytes read into the destination. Ah note this is not about swizzling on the address, that indeed makes no sense given how the addressing works for BUFFERS / MEMORY, no this is about adding a swizlling postfix to the buffer / memory resource specification, for example: LOAD TEMP[0].y, MEMORY[0]., TEMP[0] See the swizzling is done on the resource, not on the address, so the swizzling specifies swizzling of the up to 16 bytes read from address, it does not influence the address handling at all. I now see I made an error in my commit msg, it gives the following example: LOAD TEMP[0].y, MEMORY[0]., TEMP[0].x This clearly is wrong, the last TEMP[0].x is not even valid TGSI, the correct example would be: LOAD TEMP[0].y, MEMORY[0]., TEMP[0] I stand by my comment of "working as intended". But that doesn't mean the intent can't be changed :) For memory/buffers, LOAD takes the address at TEMP[0].x and loads 16 bytes (4 words), and sticks them into the destination's .xyzw. If you happen to have a writemask, then only some of those are written out. It seems that you're trying to add additional meaning to the swizzle on the "memory" argument. However I don't believe that such a thing is defined. (And definitely not used anywhere, at least not on purpose.) Why does this cause you issues with LLVM-generated TGSI? When dealing with non vector variables the llvm register allocator will use TEMP[0].x then TEMP[0].y, etc. When loading something from a global buffer it will calculate the address to use, and store that in say TEMP[0].x, so it ends up generating: LOAD TEMP[0].y, MEMORY[0], TEMP[0] Expecting the contents of TEMP[0].y to become the 32 bits of data to which TEMP[0].x is pointing. But instead it will get the 32 bits of data at address (TEMP[0].x + 4). With the old RES[32767] code one could generate the following TGSI: LOAD TEMP[0].y, RES[32767]., TEMP[0] And things would work fine since the . swizzling postfix would be honored and when storing to y (the only component set in the dest-mask) the x component at address (TEMP[0].x) would be loaded, rather then the y component at (TEMP[0].y) Note that another approach would be to not increment the address by a 32 bit word for skipped (not set in destmask) components. The way I see it either: 1) We see that LOAD does not deal with vectors, but with flat memory, in which case skipping 4 bytes because x is not set in the destmask does not make sense, as that is a vector thing todo. 2) LOAD is vector layout aware in which case supporting swizzling makes sense. Currently we have a weird hybrid which is rather cumbersome to work with from a compiler pov. Regards, Hans -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] nir: Do basic constant reassociation.
On 04/08/2016 01:35 AM, Kenneth Graunke wrote: Many shaders contain expression trees of the form: const_1 * (value * const_2) Reorganizing these to (const_1 * const_2) * value will allow constant folding to combine the constants. Sometimes, these constants are 2 and 0.5, so we can remove a multiply altogether. Other times, it can create more immediate constants, which can actually hurt. Finding a good balance here is tricky. While much more could be done, this simple patch seems to have a lot of positive benefit while having a low downside. Makes sense. Reviewed-by: Eduardo Lima Mitevshader-db results on Broadwell: total instructions in shared programs: 8963768 -> 8961369 (-0.03%) instructions in affected programs: 438318 -> 435919 (-0.55%) helped: 1502 HURT: 245 total cycles in shared programs: 71527354 -> 71421516 (-0.15%) cycles in affected programs: 11541788 -> 11435950 (-0.92%) helped: 3445 HURT: 1224 FWIW, these are my results on HSW: total instructions in shared programs: 6223995 -> 6222470 (-0.02%) instructions in affected programs: 300945 -> 299420 (-0.51%) helped: 1231 HURT: 240 total cycles in shared programs: 56809432 -> 56615552 (-0.34%) cycles in affected programs: 7980160 -> 7786280 (-2.43%) helped: 2087 HURT: 634 Eduardo Signed-off-by: Kenneth Graunke --- src/compiler/nir/nir_opt_algebraic.py | 8 1 file changed, 8 insertions(+) diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py index e72b4a7..420d9d9 100644 --- a/src/compiler/nir/nir_opt_algebraic.py +++ b/src/compiler/nir/nir_opt_algebraic.py @@ -274,6 +274,14 @@ optimizations = [ (('fmul', ('fneg', a), b), ('fneg', ('fmul', a, b))), (('imul', ('ineg', a), b), ('ineg', ('imul', a, b))), + # Reassociate constants in add/mul chains so they can be folded together. + # For now, we only handle cases where the constants are separated by + # a single non-constant. We could do better eventually. + (('~fmul', '#a', ('fmul', b, '#c')), ('fmul', ('fmul', a, c), b)), + (('imul', '#a', ('imul', b, '#c')), ('imul', ('imul', a, c), b)), + (('~fadd', '#a', ('fadd', b, '#c')), ('fadd', ('fadd', a, c), b)), + (('iadd', '#a', ('iadd', b, '#c')), ('iadd', ('iadd', a, c), b)), + # Misc. lowering (('fmod', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b, 'options->lower_fmod'), (('uadd_carry', a, b), ('b2i', ('ult', ('iadd', a, b), a)), 'options->lower_uadd_carry'), ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 00/15] i965: Rework uniform handling in the back-end
On Apr 7, 2016 11:08 PM, "Jason Ekstrand"wrote: > > > On Apr 7, 2016 10:01 PM, "Matt Turner" wrote: > > > > On Tue, Mar 22, 2016 at 3:33 PM, Jason Ekstrand wrote: > > > This is mostly a re-send of a patch series I've had floating around in one > > > form or a while for quite some time. It's basically the same except that > > > the original version was missing a work-around for Sandy Bridge. For a > > > while, I wasn't really pushing to get it merged because I couldn't > > > demonstrate any actual performance benifit from pushing arrays. However, > > > with the Vulkan API, the concept of push constants is directly exposed to > > > the user and we really need to be able to indirect on them. This series > > > makes the FS backend 100% ready for indirect push constants; vec4 will > > > take a little more work. > > > > > > It's worth noting that we've been carying these patches around in our > > > Vulkan driver for probably 3 or 4 months now and it's working great. > > > > > > For those that prefer to review on a branch: > > > > > > https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/i965-uniforms > > > > > > I think Kristian has mostly reviewed these patches. However, he never sent > > > any R-Bs to the list. I'd also like Ken or Matt to look at it from a > > > design perspective. > > > > I don't know what I think. I'm sympathetic to Curro's argument, but in > > the absence of more data it's hard to judge anything really. I'm not > > at all sympathetic to > > > > """ > > Do I have a proof-of-concept in code, no. However, I've run through > > it in my head and I have a pretty good idea what it would look like. > > You are free to go off and do it if you don't believe me, but I don't > > really want to hold things up while you do. > > """ > > > > That's what... An Appeal to Your Brain? :) > > Sort-of... It was more a remark of frustration at the (percieved) implication that I hadn't thought about it or at the very least hadnt given it a fair shake. In a bit more detail here are some of my thoughts on reladdr and an ADDR file in no particular order > > a) Not a single FS optimization pass handles it. Yes, "if you see reladdr, bail” is a valid (if suboptimal) strategy 90% of the time. However, anything that computes any sort of kill set now needs a recursive algorithm to walk register sources. We do handle this in NIR and it's not terrible but it does come with nontrivial pain and retrofitting it isn't necessarily going to be quick-and-easy. Curro's response of "use-def chains will fix this" while probably accurate doesn't solve the immediate problem while these patches have been on the list for 6 months. > > b) The hardware doesn't do reladdr. It has an address register with substantial restrictions. Eventually, we would need to lower to something that writes the address register and have an indirect source type that consumes it. If you end up with two indirect sources, we have to emit a move for one of them. Where do we do that lowering? Do we do it in the generator or as a pass? > > c) If we handle it all in the generator, we have no ability to schedule it at all. It also makes the generator far more complex. > > d) If we handle it in a lowering pass, what does that pass produce? Do we expose the ADDR file and try to do RA on it or do we treat it as a fixed thing like flag? In either case, we need to add extra logic to at least the scheduler if not other places to add this whole new concept. > > e) If we allow indirect sources of any sort, how do we carry range information around post-RA. Pre-RA we can theoretically just say if you indirect you touch the whole thing. Post-RA, you either have to carry that information around per-instruction or you have to assume that any instruction that uses an indirect source could be reading anything in the entire GRF and it becomes almost a complete scheduling barrier. > > Those are the thoughts that pop to the top. I could come up with more if you'd like. > > So, yes, using reladdr or or an ADDR file would be possible but it would involve substantial IR surgery. What's the benefit? You can put the relative source directly in the instruction that uses it and maybe do an address calculation directly to the address register instead of having to move it there. The approach I've taken on the other hand, neatly side-steps all of the issues listed above. This comes at the cost of a few extra instructions (which you probably have to spend anyway on gen7). I think that trade-off is worthwhile. There are some other things I found to be very nice about the approach. One is that we got too stop carrying around these arrays of extra uniform size information. They're the only substantial per-backend bit of uniform setup that we do and they're only used for knowing how big the indirected uniforms are. Their existence has bugged me ever since we first got NIR going. They're not too
Re: [Mesa-dev] [PATCH] nouveau: codegen: Take src swizzle into account on loads
On Fri, Apr 8, 2016 at 5:27 AM, Hans de Goedewrote: > Hi, > > On 07-04-16 15:58, Ilia Mirkin wrote: >> >> That's wrong. > > > It used to work with the old RES[] code and if one cannot specify > a source swizzle, then how can I do something like > > LOAD TEMP[0].y, MEMORY[0], address > > And get the data at absolute global memory address "address" into TEMP[0].y > ? > > This is a must-have for llvm to be able to generate working TGSI code, > I do not see any way around this. > > AFAIK this is exactly what src-swizzling is for. Also note that > this commit does not change anything if no src-swizzling is specified, > in that case things work exactly as before. > >> The spec for the instruction needs to be clarified... >> >> The current nouveau impl is correct - only the .x of the address >> should be loaded, with up to 16 bytes read into the destination. > > > Ah note this is not about swizzling on the address, that indeed > makes no sense given how the addressing works for BUFFERS / MEMORY, > no this is about adding a swizlling postfix to the buffer / memory > resource specification, for example: > > LOAD TEMP[0].y, MEMORY[0]., TEMP[0] > > See the swizzling is done on the resource, not on the address, so > the swizzling specifies swizzling of the up to 16 bytes read from > address, it does not influence the address handling at all. > > I now see I made an error in my commit msg, it gives the following > example: > > LOAD TEMP[0].y, MEMORY[0]., TEMP[0].x > > This clearly is wrong, the last TEMP[0].x is not even valid TGSI, > the correct example would be: > > LOAD TEMP[0].y, MEMORY[0]., TEMP[0] I stand by my comment of "working as intended". But that doesn't mean the intent can't be changed :) For memory/buffers, LOAD takes the address at TEMP[0].x and loads 16 bytes (4 words), and sticks them into the destination's .xyzw. If you happen to have a writemask, then only some of those are written out. It seems that you're trying to add additional meaning to the swizzle on the "memory" argument. However I don't believe that such a thing is defined. (And definitely not used anywhere, at least not on purpose.) Why does this cause you issues with LLVM-generated TGSI? -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: remove duplicate Makefile.sources
On Fri, Apr 8, 2016 at 4:37 AM, Jose Fonsecawrote: > On 02/04/16 19:35, Rob Clark wrote: >> >> From: Rob Clark >> >> Allegedly this was needed still by scons build.. but in practice it >> doesn't seem to be needed. Removing it and running 'scons' results >> in no build errors. >> >> Signed-off-by: Rob Clark >> --- >> So, afaict NIR is not even built w/ scons build (I'm just running >> 'scons' with no args, so let me know if I'm missing some build >> variant). So at least if there is no scons variant that *does* >> build NIR, I think this is the right thing to do to reduce >> confusion. But it brings up a bigger question of what to do >> with my patchset which adds NIR support in mesa/st, since that >> obviosly won't work with scons build as-is. >> >> I guess the two options are to try to add NIR into scons build >> (which involves some .py generated code, so maybe not trivial) >> or just #ifdef'ify all the mesa/st parts in my gallium-nir >> patchset which introduce dependencies on NIR. Opinions? > > > If you might recall, I already had patches to build NIR with SCons: > > http://cgit.freedesktop.org/~jrfonseca/mesa/log/?h=scons-nir oh, awesome, I'd forgotten about that.. let me see if I can resurrect that BR, -R > I ended up not taking any action at the time. First due to lack of time, > second because I noticed NIR source trees being moving around, so the risk > of exposing scons build to breakage didn't seem worthwhile. > > So those changes need to be rebased and probably updated a bit, but it gives > you an idea of what needs to be done. > > Jose > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] glsl: handle unsigned int wraparound in link_shaders()
NOTE: someone with access will need to commit this afer the review process v2: change check_explicit_uniform_locations() to return an unsigned 0 (Timothy Arceri) We were storing the int result of check_explicit_uniform_locations() in num_explicit_uniform_locs as an unsigned int which caused it to be 4294967295 when a -1 was returned. This in turn would cause the following error during linking: error: count of uniform locations > MAX_UNIFORM_LOCATIONS(4294967295 > 98304) Here are the results from running piglit tests/all with this patch: changes: 178 fixes: 176 regressions: 2 The two regressions are for the following tests: glean@glsl1-matrix column check (1) glean@glsl1-matrix column check (2) which regress from FAIL to CRASH. I am okay with these regressions because the tests are currently failing due to the aforementioned linker error. Signed-off-by: Lars Hamre--- src/compiler/glsl/linker.cpp | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp index cd35464..cc74574 100644 --- a/src/compiler/glsl/linker.cpp +++ b/src/compiler/glsl/linker.cpp @@ -3164,12 +3164,12 @@ reserve_subroutine_explicit_locations(struct gl_shader_program *prog, * any optimizations happen to handle also inactive uniforms and * inactive array elements that may get trimmed away. */ -static int +static unsigned check_explicit_uniform_locations(struct gl_context *ctx, struct gl_shader_program *prog) { if (!ctx->Extensions.ARB_explicit_uniform_location) - return -1; + return 0; /* This map is used to detect if overlapping explicit locations * occur with the same uniform (from different stage) or a different one. @@ -3178,7 +3178,7 @@ check_explicit_uniform_locations(struct gl_context *ctx, if (!uniform_map) { linker_error(prog, "Out of memory during linking.\n"); - return -1; + return 0; } unsigned entries_total = 0; @@ -3207,7 +3207,7 @@ check_explicit_uniform_locations(struct gl_context *ctx, } if (!ret) { delete uniform_map; - return -1; + return 0; } } } -- 2.5.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nouveau: codegen: Take src swizzle into account on loads
Ah, I may have read over your commit too hastily. Will have another look. On Apr 8, 2016 5:27 AM, "Hans de Goede"wrote: > Hi, > > On 07-04-16 15:58, Ilia Mirkin wrote: > >> That's wrong. >> > > It used to work with the old RES[] code and if one cannot specify > a source swizzle, then how can I do something like > > LOAD TEMP[0].y, MEMORY[0], address > > And get the data at absolute global memory address "address" into > TEMP[0].y ? > > This is a must-have for llvm to be able to generate working TGSI code, > I do not see any way around this. > > AFAIK this is exactly what src-swizzling is for. Also note that > this commit does not change anything if no src-swizzling is specified, > in that case things work exactly as before. > > > The spec for the instruction needs to be clarified... > >> The current nouveau impl is correct - only the .x of the address >> should be loaded, with up to 16 bytes read into the destination. >> > > Ah note this is not about swizzling on the address, that indeed > makes no sense given how the addressing works for BUFFERS / MEMORY, > no this is about adding a swizlling postfix to the buffer / memory > resource specification, for example: > > LOAD TEMP[0].y, MEMORY[0]., TEMP[0] > > See the swizzling is done on the resource, not on the address, so > the swizzling specifies swizzling of the up to 16 bytes read from > address, it does not influence the address handling at all. > > I now see I made an error in my commit msg, it gives the following > example: > > LOAD TEMP[0].y, MEMORY[0]., TEMP[0].x > > This clearly is wrong, the last TEMP[0].x is not even valid TGSI, > the correct example would be: > > LOAD TEMP[0].y, MEMORY[0]., TEMP[0] > > Regards, > > Hans > > > On Thu, Apr 7, 2016 at 9:27 AM, Hans de Goede wrote: >> >>> The llvm TGSI backend does things like: >>> >>> >>> >>> Expecting the data at address TEMP[0].x to get loaded to >>> TEMP[0].y. Before this commit the data at TEMP[0].x + 4 would be >>> loaded instead. This commit fixes this. >>> >>> Signed-off-by: Hans de Goede >>> --- >>> src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 8 ++-- >>> 1 file changed, 6 insertions(+), 2 deletions(-) >>> >>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp >>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp >>> index 557608e..cc51f5a 100644 >>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp >>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp >>> @@ -2279,12 +2279,16 @@ Converter::handleLOAD(Value *dst0[4]) >>> >>>Value *off = fetchSrc(1, c); >>>Symbol *sym; >>> + uint32_t src0_component_offset = tgsi.getSrc(0).getSwizzle(c) >>> * 4; >>> + >>>if (tgsi.getSrc(1).getFile() == TGSI_FILE_IMMEDIATE) { >>> off = NULL; >>> sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c, >>> - tgsi.getSrc(1).getValueU32(0, info) + 4 * c); >>> + tgsi.getSrc(1).getValueU32(0, info) + >>> + src0_component_offset); >>>} else { >>> -sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c, 4 * c); >>> +sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c, >>> + src0_component_offset); >>>} >>> >>>Instruction *ld = mkLoad(TYPE_U32, dst0[c], sym, off); >>> -- >>> 2.7.3 >>> >>> ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: handle unsigned int wraparound in link_shaders()
Agreed, I didn't see that check_explicit_uniform_locations() was only used in link_shaders(). I will submit a v2 with those changes. On Thu, Apr 7, 2016 at 11:24 PM, Timothy Arceri < timothy.arc...@collabora.com> wrote: > On Thu, 2016-04-07 at 11:22 -0400, Lars Hamre wrote: > > NOTES: > > * Reposting due to no response last week > > * Someone with access will need to commit this patch after the review > > process > > > > check_explicit_uniform_locations() returns a -1 when the extension > > ARB_explicit_uniform_location is not found. > > > > We were storing the result in num_explicit_uniform_locs as an > > unsigned int which caused it to be 4294967295 when a -1 was returned. > > > > This in turn would cause the following error during linking: > > error: count of uniform locations > MAX_UNIFORM_LOCATIONS(4294967295 > > > 98304) > > > > Here are the results from running piglit tests/all with this patch: > > changes: 178 > > fixes: 176 > > regressions: 2 > > > > The two regressions are for the following tests: > > glean@glsl1-matrix column check (1) > > glean@glsl1-matrix column check (2) > > which regress from FAIL to CRASH. > > > > I am okay with these regressions because the tests are currently > > failing due to the aforementioned linker error. > > > > Signed-off-by: Lars Hamre> > > > --- > > src/compiler/glsl/linker.cpp | 6 -- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/src/compiler/glsl/linker.cpp > > b/src/compiler/glsl/linker.cpp > > index cd35464..a0a9104 100644 > > --- a/src/compiler/glsl/linker.cpp > > +++ b/src/compiler/glsl/linker.cpp > > @@ -4171,7 +4171,7 @@ link_shaders(struct gl_context *ctx, struct > > gl_shader_program *prog) > > > > tfeedback_decl *tfeedback_decls = NULL; > > unsigned num_tfeedback_decls = prog- > > >TransformFeedback.NumVarying; > > - unsigned int num_explicit_uniform_locs = 0; > > + int num_explicit_uniform_locs = 0; > > > > void *mem_ctx = ralloc_context(NULL); // temporary linker context > > > > @@ -4354,6 +4354,8 @@ link_shaders(struct gl_context *ctx, struct > > gl_shader_program *prog) > > } > > > > num_explicit_uniform_locs = check_explicit_uniform_locations(ctx, > > prog); > > + if (num_explicit_uniform_locs < 0) > > + num_explicit_uniform_locs = 0; > > Thanks for spotting this. However the just hacks around the problem. > > Since we never do anything with the -1 I would rather > see check_explicit_uniform_locations() changed to return an unsigned 0 > rather than -1. > > > > link_assign_subroutine_types(prog); > > > > if (!prog->LinkStatus) > > @@ -4585,7 +4587,7 @@ link_shaders(struct gl_context *ctx, struct > > gl_shader_program *prog) > > > > update_array_sizes(prog); > > link_assign_uniform_locations(prog, ctx- > > >Const.UniformBooleanTrue, > > - num_explicit_uniform_locs, > > + (unsigned > > int)num_explicit_uniform_locs, > > ctx- > > >Const.MaxUserAssignableUniformLocations); > > link_assign_atomic_counter_resources(ctx, prog); > > store_fragdepth_layout(prog); > > -- > > 2.5.5 > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: fix mask checking when emitting scissors and viewports
On Fri, Apr 8, 2016 at 12:16 PM,wrote: > On 2016-04-08 19:00, Marek Olšák wrote: >> >> From: Marek Olšák >> >> --- >> src/gallium/drivers/radeonsi/si_state.c | 12 >> 1 file changed, 8 insertions(+), 4 deletions(-) >> >> diff --git a/src/gallium/drivers/radeonsi/si_state.c >> b/src/gallium/drivers/radeonsi/si_state.c >> index 8087d23..3894e1d 100644 >> --- a/src/gallium/drivers/radeonsi/si_state.c >> +++ b/src/gallium/drivers/radeonsi/si_state.c >> @@ -912,8 +912,10 @@ static void si_emit_scissors(struct si_context >> *sctx, struct r600_atom *atom) >> bool scissor_enable = >> sctx->queued.named.rasterizer->scissor_enable; >> >> /* The simple case: Only 1 viewport is active. */ >> - if (mask & 1 && >> - !si_get_vs_info(sctx)->writes_viewport_index) { >> + if (!si_get_vs_info(sctx)->writes_viewport_index) { >> + if (!(mask & 1)) > > > seems a bit tentative.. did you want 1u here or? 1 and 1u are equivalent in this case. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [android-x86-devel] Re: gralloc_drm_pipe
2016-04-07 13:30 GMT+08:00 Rob Herring: > On Wed, Apr 6, 2016 at 11:12 AM, Chih-Wei Huang > wrote: > >> I guess the first supported GPU is virgl. Right? > > Yes. Any gallium driver really. The classic mesa drivers will need > their own additions for GBM map/unmap. > >> When could we expect it's ready to test? > > Not sure. Definitely not until the GBM interface is set. There's not > really much reason for android-x86 to move to it until it gets flushed > out. Could you provide me the patches of mesa? I'm glad to test it. Thanks! -- Chih-Wei Android-x86 project http://www.android-x86.org ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: implement and rely on set_active_query_state
Reviewed-by: Edward O'CallaghanOn 2016-04-08 18:58, Marek Olšák wrote: From: Marek Olšák --- src/gallium/drivers/radeonsi/si_blit.c | 3 --- src/gallium/drivers/radeonsi/si_pipe.h | 4 src/gallium/drivers/radeonsi/si_state.c | 32 +++- src/gallium/drivers/radeonsi/si_state_draw.c | 10 + 4 files changed, 45 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_blit.c b/src/gallium/drivers/radeonsi/si_blit.c index c5ea8b1..aed783f 100644 --- a/src/gallium/drivers/radeonsi/si_blit.c +++ b/src/gallium/drivers/radeonsi/si_blit.c @@ -52,8 +52,6 @@ static void si_blitter_begin(struct pipe_context *ctx, enum si_blitter_op op) { struct si_context *sctx = (struct si_context *)ctx; - r600_suspend_nontimer_queries(>b); - util_blitter_save_vertex_buffer_slot(sctx->blitter, sctx->vertex_buffer); util_blitter_save_vertex_elements(sctx->blitter, sctx->vertex_elements); util_blitter_save_vertex_shader(sctx->blitter, sctx->vs_shader.cso); @@ -95,7 +93,6 @@ static void si_blitter_end(struct pipe_context *ctx) struct si_context *sctx = (struct si_context *)ctx; sctx->b.render_cond_force_off = false; - r600_resume_nontimer_queries(>b); } static unsigned u_max_sample(struct pipe_resource *r) diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h index 4158fc5..8fcfcd2 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.h +++ b/src/gallium/drivers/radeonsi/si_pipe.h @@ -66,6 +66,9 @@ /* Compute only. */ #define SI_CONTEXT_FLUSH_WITH_INV_L2 (R600_CONTEXT_PRIVATE_FLAG << 13) /* TODO: merge with TC? */ #define SI_CONTEXT_FLAG_COMPUTE(R600_CONTEXT_PRIVATE_FLAG << 14) +/* Pipeline & streamout query controls. */ +#define SI_CONTEXT_START_PIPELINE_STATS (R600_CONTEXT_PRIVATE_FLAG << 15) +#define SI_CONTEXT_STOP_PIPELINE_STATS (R600_CONTEXT_PRIVATE_FLAG << 16) #define SI_CONTEXT_FLUSH_AND_INV_FRAMEBUFFER (SI_CONTEXT_FLUSH_AND_INV_CB | \ SI_CONTEXT_FLUSH_AND_INV_CB_META | \ @@ -289,6 +292,7 @@ struct si_context { booldb_stencil_clear; booldb_stencil_disable_expclear; unsignedps_db_shader_control; + boolocclusion_queries_disabled; /* Emitted draw state. */ int last_base_vertex; diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c index a66bd30..6fbbb68 100644 --- a/src/gallium/drivers/radeonsi/si_state.c +++ b/src/gallium/drivers/radeonsi/si_state.c @@ -1352,6 +1352,26 @@ static void *si_create_db_flush_dsa(struct si_context *sctx) /* DB RENDER STATE */ +static void si_set_active_query_state(struct pipe_context *ctx, boolean enable) +{ + struct si_context *sctx = (struct si_context*)ctx; + + /* Pipeline stat & streamout queries. */ + if (enable) { + sctx->b.flags &= ~SI_CONTEXT_STOP_PIPELINE_STATS; + sctx->b.flags |= SI_CONTEXT_START_PIPELINE_STATS; + } else { + sctx->b.flags &= ~SI_CONTEXT_START_PIPELINE_STATS; + sctx->b.flags |= SI_CONTEXT_STOP_PIPELINE_STATS; + } + + /* Occlusion queries. */ + if (sctx->occlusion_queries_disabled != !enable) { + sctx->occlusion_queries_disabled = !enable; + si_mark_atom_dirty(sctx, >db_render_state); + } +} + static void si_set_occlusion_query_state(struct pipe_context *ctx, bool enable) { struct si_context *sctx = (struct si_context*)ctx; @@ -1386,7 +1406,8 @@ static void si_emit_db_render_state(struct si_context *sctx, struct r600_atom *s } /* DB_COUNT_CONTROL (occlusion queries) */ - if (sctx->b.num_occlusion_queries > 0) { + if (sctx->b.num_occlusion_queries > 0 && + !sctx->occlusion_queries_disabled) { bool perfect = sctx->b.num_perfect_occlusion_queries > 0; if (sctx->b.chip_class >= CIK) { @@ -3765,6 +3786,7 @@ void si_init_state_functions(struct si_context *sctx) sctx->b.b.set_min_samples = si_set_min_samples; sctx->b.b.set_tess_state = si_set_tess_state; + sctx->b.b.set_active_query_state = si_set_active_query_state; sctx->b.set_occlusion_query_state = si_set_occlusion_query_state; sctx->b.need_gfx_cs_space = si_need_gfx_cs_space; @@ -3995,6 +4017,14 @@ static void si_init_config(struct si_context *sctx) si_pm4_cmd_add(pm4, 0x8000); si_pm4_cmd_end(pm4, false); + /* This enables pipeline stat & streamout queries. +* They are only disabled by blits. +*/ + si_pm4_cmd_begin(pm4, PKT3_EVENT_WRITE); + si_pm4_cmd_add(pm4, EVENT_TYPE(V_028A90_PIPELINESTAT_START) | +
Re: [Mesa-dev] [PATCH mesa v2 1/2] nouveau: codegen: Use FILE_MEMORY_BUFFER for buffers
Hi, On 23-03-16 23:10, Samuel Pitoiset wrote: Are you sure this won't break compute shaders on fermi? Could you please double-check that? I just checked: lspci: 01:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 610] (rev a1) Before this patch-set: [hans@plank piglit]$ ./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader [9/9] pass: 9 / After this patch-set: [hans@plank piglit]$ ./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader [9/9] pass: 9 / One minor comment below. On 03/17/2016 05:07 PM, Hans de Goede wrote: Some of the lowering steps we currently do for FILE_MEMORY_GLOBAL only apply to buffers, making it impossible to use FILE_MEMORY_GLOBAL for OpenCL global buffers. This commits changes the buffer code to use FILE_MEMORY_BUFFER at the ir_from_tgsi and lowering steps, freeing use of FILE_MEMORY_GLOBAL for use with OpenCL global buffers. Note that after lowering buffer accesses use the FILE_MEMORY_GLOBAL register file. Tested with piglet on a gk107, before this patch: ./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader [9/9] pass: 9 / after: ./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader [9/9] pass: 9 / Signed-off-by: Hans de Goede--- Changes in v2: -New patch in v2 of patch-set to re-enable support for global opencl buffers --- src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 + src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 +- src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 8 +--- src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp | 1 + src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 5 - src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp | 1 + 6 files changed, 13 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h index 7b0eb2f..5141fc6 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h @@ -332,6 +332,7 @@ enum DataFile FILE_MEMORY_CONST, FILE_SHADER_INPUT, FILE_SHADER_OUTPUT, + FILE_MEMORY_BUFFER, FILE_MEMORY_GLOBAL, FILE_MEMORY_SHARED, FILE_MEMORY_LOCAL, diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index baa2e30..7ae0cb2 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -373,7 +373,7 @@ static nv50_ir::DataFile translateFile(uint file) case TGSI_FILE_PREDICATE: return nv50_ir::FILE_PREDICATE; case TGSI_FILE_IMMEDIATE: return nv50_ir::FILE_IMMEDIATE; case TGSI_FILE_SYSTEM_VALUE:return nv50_ir::FILE_SYSTEM_VALUE; - case TGSI_FILE_BUFFER: return nv50_ir::FILE_MEMORY_GLOBAL; + case TGSI_FILE_BUFFER: return nv50_ir::FILE_MEMORY_BUFFER; case TGSI_FILE_MEMORY: return nv50_ir::FILE_MEMORY_GLOBAL; case TGSI_FILE_SAMPLER: case TGSI_FILE_NULL: diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp index d0936d8..628deb7 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp @@ -1141,13 +1141,14 @@ NVC0LoweringPass::handleATOM(Instruction *atom) handleSharedATOM(atom); return true; default: - assert(atom->src(0).getFile() == FILE_MEMORY_GLOBAL); + assert(atom->src(0).getFile() == FILE_MEMORY_BUFFER); base = loadResInfo64(ind, atom->getSrc(0)->reg.fileIndex * 16); assert(base->reg.size == 8); if (ptr) base = bld.mkOp2v(OP_ADD, TYPE_U64, base, base, ptr); assert(base->reg.size == 8); atom->setIndirect(0, 0, base); + atom->getSrc(0)->reg.file = FILE_MEMORY_GLOBAL; return true; } base = @@ -1963,7 +1964,7 @@ NVC0LoweringPass::visit(Instruction *i) } else if (i->src(0).getFile() == FILE_SHADER_OUTPUT) { assert(prog->getType() == Program::TYPE_TESSELLATION_CONTROL); i->op = OP_VFETCH; - } else if (i->src(0).getFile() == FILE_MEMORY_GLOBAL) { + } else if (i->src(0).getFile() == FILE_MEMORY_BUFFER) { Value *ind = i->getIndirect(0, 1); Value *ptr = loadResInfo64(ind, i->getSrc(0)->reg.fileIndex * 16); // XXX come up with a way not to do this for EVERY little access but @@ -1978,6 +1979,7 @@ NVC0LoweringPass::visit(Instruction *i) } i->setIndirect(0, 1, NULL); i->setIndirect(0, 0, ptr); + i->getSrc(0)->reg.file = FILE_MEMORY_GLOBAL; bld.mkCmp(OP_SET, CC_GT, TYPE_U32, pred, TYPE_U32, offset, length);
Re: [Mesa-dev] [PATCH] radeonsi: fix mask checking when emitting scissors and viewports
On 2016-04-08 19:00, Marek Olšák wrote: From: Marek Olšák--- src/gallium/drivers/radeonsi/si_state.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c index 8087d23..3894e1d 100644 --- a/src/gallium/drivers/radeonsi/si_state.c +++ b/src/gallium/drivers/radeonsi/si_state.c @@ -912,8 +912,10 @@ static void si_emit_scissors(struct si_context *sctx, struct r600_atom *atom) bool scissor_enable = sctx->queued.named.rasterizer->scissor_enable; /* The simple case: Only 1 viewport is active. */ - if (mask & 1 && - !si_get_vs_info(sctx)->writes_viewport_index) { + if (!si_get_vs_info(sctx)->writes_viewport_index) { + if (!(mask & 1)) seems a bit tentative.. did you want 1u here or? + return; + radeon_set_context_reg_seq(cs, R_028250_PA_SC_VPORT_SCISSOR_0_TL, 2); si_emit_one_scissor(cs, >viewports.states[0], scissor_enable ? [0] : NULL); @@ -960,8 +962,10 @@ static void si_emit_viewports(struct si_context *sctx, struct r600_atom *atom) unsigned mask = sctx->viewports.dirty_mask; /* The simple case: Only 1 viewport is active. */ - if (mask & 1 && - !si_get_vs_info(sctx)->writes_viewport_index) { + if (!si_get_vs_info(sctx)->writes_viewport_index) { + if (!(mask & 1)) + return; + radeon_set_context_reg_seq(cs, R_02843C_PA_CL_VPORT_XSCALE, 6); radeon_emit(cs, fui(states[0].scale[0])); radeon_emit(cs, fui(states[0].translate[0])); ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] gallium/radeon: unify checking streamout enable state
Reviewed-by: Edward O'CallaghanOn 2016-04-08 19:00, Marek Olšák wrote: From: Marek Olšák --- src/gallium/drivers/r600/r600_state_common.c | 5 ++--- src/gallium/drivers/radeon/r600_pipe_common.h | 6 ++ src/gallium/drivers/radeon/r600_streamout.c | 6 -- src/gallium/drivers/radeonsi/si_state_draw.c | 3 +-- 4 files changed, 9 insertions(+), 11 deletions(-) diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c index df41d3f..82babeb 100644 --- a/src/gallium/drivers/r600/r600_state_common.c +++ b/src/gallium/drivers/r600/r600_state_common.c @@ -1841,8 +1841,7 @@ static void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info ia_switch_on_eop = true; } - if (rctx->b.streamout.streamout_enabled || - rctx->b.streamout.prims_gen_query_enabled) + if (r600_get_strmout_en(>b)) partial_vs_wave = true; radeon_set_context_reg(cs, CM_R_028AA8_IA_MULTI_VGT_PARAM, @@ -2018,7 +2017,7 @@ static void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info rctx->b.family == CHIP_RV635) { /* if we have gs shader or streamout we need to do a wait idle after every draw */ - if (rctx->gs_shader || rctx->b.streamout.streamout_enabled) { + if (rctx->gs_shader || r600_get_strmout_en(>b)) { radeon_set_config_reg(cs, R_008040_WAIT_UNTIL, S_008040_WAIT_3D_IDLE(1)); } } diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h index 062c319..7da7736 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.h +++ b/src/gallium/drivers/radeon/r600_pipe_common.h @@ -639,6 +639,12 @@ r600_resource_reference(struct r600_resource **ptr, struct r600_resource *res) (struct pipe_resource *)res); } +static inline bool r600_get_strmout_en(struct r600_common_context *rctx) +{ + return rctx->streamout.streamout_enabled || + rctx->streamout.prims_gen_query_enabled; +} + static inline unsigned r600_tex_aniso_filter(unsigned filter) { if (filter <= 1) return 0; diff --git a/src/gallium/drivers/radeon/r600_streamout.c b/src/gallium/drivers/radeon/r600_streamout.c index e977ed9..fc9ec48 100644 --- a/src/gallium/drivers/radeon/r600_streamout.c +++ b/src/gallium/drivers/radeon/r600_streamout.c @@ -311,12 +311,6 @@ void r600_emit_streamout_end(struct r600_common_context *rctx) * are no buffers bound. */ -static bool r600_get_strmout_en(struct r600_common_context *rctx) -{ - return rctx->streamout.streamout_enabled || - rctx->streamout.prims_gen_query_enabled; -} - static void r600_emit_streamout_enable(struct r600_common_context *rctx, struct r600_atom *atom) { diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c index 84b850a..3863e59 100644 --- a/src/gallium/drivers/radeonsi/si_state_draw.c +++ b/src/gallium/drivers/radeonsi/si_state_draw.c @@ -882,8 +882,7 @@ void si_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info *info) if ((sctx->b.family == CHIP_HAWAII || sctx->b.family == CHIP_TONGA || sctx->b.family == CHIP_FIJI) && - (sctx->b.streamout.streamout_enabled || -sctx->b.streamout.prims_gen_query_enabled)) { + r600_get_strmout_en(>b)) { sctx->b.flags |= SI_CONTEXT_VGT_STREAMOUT_SYNC; } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nouveau: codegen: Take src swizzle into account on loads
Hi, On 07-04-16 15:58, Ilia Mirkin wrote: That's wrong. It used to work with the old RES[] code and if one cannot specify a source swizzle, then how can I do something like LOAD TEMP[0].y, MEMORY[0], address And get the data at absolute global memory address "address" into TEMP[0].y ? This is a must-have for llvm to be able to generate working TGSI code, I do not see any way around this. AFAIK this is exactly what src-swizzling is for. Also note that this commit does not change anything if no src-swizzling is specified, in that case things work exactly as before. > The spec for the instruction needs to be clarified... The current nouveau impl is correct - only the .x of the address should be loaded, with up to 16 bytes read into the destination. Ah note this is not about swizzling on the address, that indeed makes no sense given how the addressing works for BUFFERS / MEMORY, no this is about adding a swizlling postfix to the buffer / memory resource specification, for example: LOAD TEMP[0].y, MEMORY[0]., TEMP[0] See the swizzling is done on the resource, not on the address, so the swizzling specifies swizzling of the up to 16 bytes read from address, it does not influence the address handling at all. I now see I made an error in my commit msg, it gives the following example: LOAD TEMP[0].y, MEMORY[0]., TEMP[0].x This clearly is wrong, the last TEMP[0].x is not even valid TGSI, the correct example would be: LOAD TEMP[0].y, MEMORY[0]., TEMP[0] Regards, Hans On Thu, Apr 7, 2016 at 9:27 AM, Hans de Goedewrote: The llvm TGSI backend does things like: Expecting the data at address TEMP[0].x to get loaded to TEMP[0].y. Before this commit the data at TEMP[0].x + 4 would be loaded instead. This commit fixes this. Signed-off-by: Hans de Goede --- src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index 557608e..cc51f5a 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -2279,12 +2279,16 @@ Converter::handleLOAD(Value *dst0[4]) Value *off = fetchSrc(1, c); Symbol *sym; + uint32_t src0_component_offset = tgsi.getSrc(0).getSwizzle(c) * 4; + if (tgsi.getSrc(1).getFile() == TGSI_FILE_IMMEDIATE) { off = NULL; sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c, - tgsi.getSrc(1).getValueU32(0, info) + 4 * c); + tgsi.getSrc(1).getValueU32(0, info) + + src0_component_offset); } else { -sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c, 4 * c); +sym = makeSym(tgsi.getSrc(0).getFile(), r, -1, c, + src0_component_offset); } Instruction *ld = mkLoad(TYPE_U32, dst0[c], sym, off); -- 2.7.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radeonsi: fix mask checking when emitting scissors and viewports
From: Marek Olšák--- src/gallium/drivers/radeonsi/si_state.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c index 8087d23..3894e1d 100644 --- a/src/gallium/drivers/radeonsi/si_state.c +++ b/src/gallium/drivers/radeonsi/si_state.c @@ -912,8 +912,10 @@ static void si_emit_scissors(struct si_context *sctx, struct r600_atom *atom) bool scissor_enable = sctx->queued.named.rasterizer->scissor_enable; /* The simple case: Only 1 viewport is active. */ - if (mask & 1 && - !si_get_vs_info(sctx)->writes_viewport_index) { + if (!si_get_vs_info(sctx)->writes_viewport_index) { + if (!(mask & 1)) + return; + radeon_set_context_reg_seq(cs, R_028250_PA_SC_VPORT_SCISSOR_0_TL, 2); si_emit_one_scissor(cs, >viewports.states[0], scissor_enable ? [0] : NULL); @@ -960,8 +962,10 @@ static void si_emit_viewports(struct si_context *sctx, struct r600_atom *atom) unsigned mask = sctx->viewports.dirty_mask; /* The simple case: Only 1 viewport is active. */ - if (mask & 1 && - !si_get_vs_info(sctx)->writes_viewport_index) { + if (!si_get_vs_info(sctx)->writes_viewport_index) { + if (!(mask & 1)) + return; + radeon_set_context_reg_seq(cs, R_02843C_PA_CL_VPORT_XSCALE, 6); radeon_emit(cs, fui(states[0].scale[0])); radeon_emit(cs, fui(states[0].translate[0])); -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] gallium/radeon: unify checking streamout enable state
From: Marek Olšák--- src/gallium/drivers/r600/r600_state_common.c | 5 ++--- src/gallium/drivers/radeon/r600_pipe_common.h | 6 ++ src/gallium/drivers/radeon/r600_streamout.c | 6 -- src/gallium/drivers/radeonsi/si_state_draw.c | 3 +-- 4 files changed, 9 insertions(+), 11 deletions(-) diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c index df41d3f..82babeb 100644 --- a/src/gallium/drivers/r600/r600_state_common.c +++ b/src/gallium/drivers/r600/r600_state_common.c @@ -1841,8 +1841,7 @@ static void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info ia_switch_on_eop = true; } - if (rctx->b.streamout.streamout_enabled || - rctx->b.streamout.prims_gen_query_enabled) + if (r600_get_strmout_en(>b)) partial_vs_wave = true; radeon_set_context_reg(cs, CM_R_028AA8_IA_MULTI_VGT_PARAM, @@ -2018,7 +2017,7 @@ static void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info rctx->b.family == CHIP_RV635) { /* if we have gs shader or streamout we need to do a wait idle after every draw */ - if (rctx->gs_shader || rctx->b.streamout.streamout_enabled) { + if (rctx->gs_shader || r600_get_strmout_en(>b)) { radeon_set_config_reg(cs, R_008040_WAIT_UNTIL, S_008040_WAIT_3D_IDLE(1)); } } diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h index 062c319..7da7736 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.h +++ b/src/gallium/drivers/radeon/r600_pipe_common.h @@ -639,6 +639,12 @@ r600_resource_reference(struct r600_resource **ptr, struct r600_resource *res) (struct pipe_resource *)res); } +static inline bool r600_get_strmout_en(struct r600_common_context *rctx) +{ + return rctx->streamout.streamout_enabled || + rctx->streamout.prims_gen_query_enabled; +} + static inline unsigned r600_tex_aniso_filter(unsigned filter) { if (filter <= 1) return 0; diff --git a/src/gallium/drivers/radeon/r600_streamout.c b/src/gallium/drivers/radeon/r600_streamout.c index e977ed9..fc9ec48 100644 --- a/src/gallium/drivers/radeon/r600_streamout.c +++ b/src/gallium/drivers/radeon/r600_streamout.c @@ -311,12 +311,6 @@ void r600_emit_streamout_end(struct r600_common_context *rctx) * are no buffers bound. */ -static bool r600_get_strmout_en(struct r600_common_context *rctx) -{ - return rctx->streamout.streamout_enabled || - rctx->streamout.prims_gen_query_enabled; -} - static void r600_emit_streamout_enable(struct r600_common_context *rctx, struct r600_atom *atom) { diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c index 84b850a..3863e59 100644 --- a/src/gallium/drivers/radeonsi/si_state_draw.c +++ b/src/gallium/drivers/radeonsi/si_state_draw.c @@ -882,8 +882,7 @@ void si_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info *info) if ((sctx->b.family == CHIP_HAWAII || sctx->b.family == CHIP_TONGA || sctx->b.family == CHIP_FIJI) && - (sctx->b.streamout.streamout_enabled || -sctx->b.streamout.prims_gen_query_enabled)) { + r600_get_strmout_en(>b)) { sctx->b.flags |= SI_CONTEXT_VGT_STREAMOUT_SYNC; } -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] i965: Extract SSEU configuration info
On Thursday, April 7, 2016 10:53:12 AM PDT Ben Widawsky wrote: > Signed-off-by: Ben Widawsky> --- > src/mesa/drivers/dri/i965/intel_screen.c | 35 ++ +- > 1 file changed, 21 insertions(+), 14 deletions(-) Thanks for taking care of this. Series is: Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radeonsi: implement and rely on set_active_query_state
From: Marek Olšák--- src/gallium/drivers/radeonsi/si_blit.c | 3 --- src/gallium/drivers/radeonsi/si_pipe.h | 4 src/gallium/drivers/radeonsi/si_state.c | 32 +++- src/gallium/drivers/radeonsi/si_state_draw.c | 10 + 4 files changed, 45 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_blit.c b/src/gallium/drivers/radeonsi/si_blit.c index c5ea8b1..aed783f 100644 --- a/src/gallium/drivers/radeonsi/si_blit.c +++ b/src/gallium/drivers/radeonsi/si_blit.c @@ -52,8 +52,6 @@ static void si_blitter_begin(struct pipe_context *ctx, enum si_blitter_op op) { struct si_context *sctx = (struct si_context *)ctx; - r600_suspend_nontimer_queries(>b); - util_blitter_save_vertex_buffer_slot(sctx->blitter, sctx->vertex_buffer); util_blitter_save_vertex_elements(sctx->blitter, sctx->vertex_elements); util_blitter_save_vertex_shader(sctx->blitter, sctx->vs_shader.cso); @@ -95,7 +93,6 @@ static void si_blitter_end(struct pipe_context *ctx) struct si_context *sctx = (struct si_context *)ctx; sctx->b.render_cond_force_off = false; - r600_resume_nontimer_queries(>b); } static unsigned u_max_sample(struct pipe_resource *r) diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h index 4158fc5..8fcfcd2 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.h +++ b/src/gallium/drivers/radeonsi/si_pipe.h @@ -66,6 +66,9 @@ /* Compute only. */ #define SI_CONTEXT_FLUSH_WITH_INV_L2 (R600_CONTEXT_PRIVATE_FLAG << 13) /* TODO: merge with TC? */ #define SI_CONTEXT_FLAG_COMPUTE(R600_CONTEXT_PRIVATE_FLAG << 14) +/* Pipeline & streamout query controls. */ +#define SI_CONTEXT_START_PIPELINE_STATS(R600_CONTEXT_PRIVATE_FLAG << 15) +#define SI_CONTEXT_STOP_PIPELINE_STATS (R600_CONTEXT_PRIVATE_FLAG << 16) #define SI_CONTEXT_FLUSH_AND_INV_FRAMEBUFFER (SI_CONTEXT_FLUSH_AND_INV_CB | \ SI_CONTEXT_FLUSH_AND_INV_CB_META | \ @@ -289,6 +292,7 @@ struct si_context { booldb_stencil_clear; booldb_stencil_disable_expclear; unsignedps_db_shader_control; + boolocclusion_queries_disabled; /* Emitted draw state. */ int last_base_vertex; diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c index a66bd30..6fbbb68 100644 --- a/src/gallium/drivers/radeonsi/si_state.c +++ b/src/gallium/drivers/radeonsi/si_state.c @@ -1352,6 +1352,26 @@ static void *si_create_db_flush_dsa(struct si_context *sctx) /* DB RENDER STATE */ +static void si_set_active_query_state(struct pipe_context *ctx, boolean enable) +{ + struct si_context *sctx = (struct si_context*)ctx; + + /* Pipeline stat & streamout queries. */ + if (enable) { + sctx->b.flags &= ~SI_CONTEXT_STOP_PIPELINE_STATS; + sctx->b.flags |= SI_CONTEXT_START_PIPELINE_STATS; + } else { + sctx->b.flags &= ~SI_CONTEXT_START_PIPELINE_STATS; + sctx->b.flags |= SI_CONTEXT_STOP_PIPELINE_STATS; + } + + /* Occlusion queries. */ + if (sctx->occlusion_queries_disabled != !enable) { + sctx->occlusion_queries_disabled = !enable; + si_mark_atom_dirty(sctx, >db_render_state); + } +} + static void si_set_occlusion_query_state(struct pipe_context *ctx, bool enable) { struct si_context *sctx = (struct si_context*)ctx; @@ -1386,7 +1406,8 @@ static void si_emit_db_render_state(struct si_context *sctx, struct r600_atom *s } /* DB_COUNT_CONTROL (occlusion queries) */ - if (sctx->b.num_occlusion_queries > 0) { + if (sctx->b.num_occlusion_queries > 0 && + !sctx->occlusion_queries_disabled) { bool perfect = sctx->b.num_perfect_occlusion_queries > 0; if (sctx->b.chip_class >= CIK) { @@ -3765,6 +3786,7 @@ void si_init_state_functions(struct si_context *sctx) sctx->b.b.set_min_samples = si_set_min_samples; sctx->b.b.set_tess_state = si_set_tess_state; + sctx->b.b.set_active_query_state = si_set_active_query_state; sctx->b.set_occlusion_query_state = si_set_occlusion_query_state; sctx->b.need_gfx_cs_space = si_need_gfx_cs_space; @@ -3995,6 +4017,14 @@ static void si_init_config(struct si_context *sctx) si_pm4_cmd_add(pm4, 0x8000); si_pm4_cmd_end(pm4, false); + /* This enables pipeline stat & streamout queries. +* They are only disabled by blits. +*/ + si_pm4_cmd_begin(pm4, PKT3_EVENT_WRITE); + si_pm4_cmd_add(pm4, EVENT_TYPE(V_028A90_PIPELINESTAT_START) | + EVENT_INDEX(0)); + si_pm4_cmd_end(pm4, false); + si_pm4_set_reg(pm4,
[Mesa-dev] [PATCH 1/2] arb_shader_image_size/builtin: don't report subtests that are never run
From: Marek Olšákthis removes a lot of skipped subtests from results --- tests/spec/arb_shader_image_size/builtin.c | 20 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/tests/spec/arb_shader_image_size/builtin.c b/tests/spec/arb_shader_image_size/builtin.c index 434af70..c8a7724 100755 --- a/tests/spec/arb_shader_image_size/builtin.c +++ b/tests/spec/arb_shader_image_size/builtin.c @@ -254,11 +254,13 @@ test_max_dimensions(const struct image_format_info *format, const struct image_extent size = get_test_extent(target, d); - subtest(status, - !quick && - is_test_reasonable(!slow, size) && - is_format_interesting(format, slow) && - is_stage_interesting(stage, slow), + if (quick || + !is_test_reasonable(!slow, size) || + !is_format_interesting(format, slow) || + !is_stage_interesting(stage, slow)) + continue; + + subtest(status, true, run_test(format, target, size), "%s/%s/image%s max size test/%dx%dx%dx%d", format->name, stage->name, target->name, @@ -278,9 +280,11 @@ test_small_dimensions(const struct image_format_info *format, image_extent_for_target(target, 16, 96); - subtest(status, - is_format_interesting(format, slow) && - is_stage_interesting(stage, slow), + if (!is_format_interesting(format, slow) || + !is_stage_interesting(stage, slow)) + return; + + subtest(status, true, run_test(format, target, size), "%s/%s/image%s size test/%dx%dx%dx%d", format->name, stage->name, target->name, -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] Remove fs-discard-exit-3
From: Marek OlšákThe test is wrong and the GLSL 1.30 citation in the test states very clearly that discard can cause non-uniform control flow for any code that follows. --- .../execution/fs-discard-exit-3.shader_test| 76 -- 1 file changed, 76 deletions(-) delete mode 100644 tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test diff --git a/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test b/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test deleted file mode 100644 index 14e9b47..000 --- a/tests/spec/glsl-1.30/execution/fs-discard-exit-3.shader_test +++ /dev/null @@ -1,76 +0,0 @@ -# This is a test for derivatives behavior after a discard. -# -# From the GLSL 1.30 spec: -# -# "The discard keyword is only allowed within fragment shaders. It -# can be used within a fragment shader to abandon the operation -# on the current fragment. This keyword causes the fragment to be -# discarded and no updates to any buffers will occur. Control -# flow exits the shader, and subsequent implicit or explicit -# derivatives are undefined when this control flow is non-uniform -# (meaning different fragments within the primitive take -# different control paths)." - - -[require] -GLSL >= 1.30 - -[vertex shader] -#version 130 - -in vec4 vertex; -out vec2 texcoords; -void main() -{ - gl_Position = vertex; - - /* Turn the texcoords into a 1:1 mapping with pixels when -* interpolated. This means that the coords for our 2x2 -* subspan we're interested in for the FS will be: -* -* +-+-+ -* | 0,1 | 1,1 | -* +-+-+ -* | 0,0 | 0,1 | -* +-+-+ -* -* So it would sample the 1x1 miplevel of the GL_TEXTURE_2D -* miptree, unless some other math occurs... -*/ - texcoords.yx = (vertex.xy + 1) / 2 * 250; -} - -[fragment shader] -#version 130 -in vec2 texcoords; -uniform sampler2D s; - -void main() -{ - if (gl_FragCoord.x >= 1.0 || gl_FragCoord.y >= 1.0) - discard; - - /* Now, we have uniform control after the discard (well, -* except for the join after the if statement up there). The -* derivatives on this sample should get us the same values -* for the undiscarded pixel as if we hadn't done any discard -* (comment out the "discard" above to see). -*/ - gl_FragColor = texture(s, texcoords / 4); -} - -[vertex data] -vertex/float/2 --1.0 -1.0 - 1.0 -1.0 - 1.0 1.0 --1.0 1.0 - -[test] -clear color 0.5 0.5 0.5 0.5 -clear - -texture miptree 0 - -draw arrays GL_TRIANGLE_FAN 0 4 -probe rgba 0 0 0.0 1.0 0.0 1.0 -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: remove duplicate Makefile.sources
On 02/04/16 19:35, Rob Clark wrote: From: Rob ClarkAllegedly this was needed still by scons build.. but in practice it doesn't seem to be needed. Removing it and running 'scons' results in no build errors. Signed-off-by: Rob Clark --- So, afaict NIR is not even built w/ scons build (I'm just running 'scons' with no args, so let me know if I'm missing some build variant). So at least if there is no scons variant that *does* build NIR, I think this is the right thing to do to reduce confusion. But it brings up a bigger question of what to do with my patchset which adds NIR support in mesa/st, since that obviosly won't work with scons build as-is. I guess the two options are to try to add NIR into scons build (which involves some .py generated code, so maybe not trivial) or just #ifdef'ify all the mesa/st parts in my gallium-nir patchset which introduce dependencies on NIR. Opinions? If you might recall, I already had patches to build NIR with SCons: http://cgit.freedesktop.org/~jrfonseca/mesa/log/?h=scons-nir I ended up not taking any action at the time. First due to lack of time, second because I noticed NIR source trees being moving around, so the risk of exposing scons build to breakage didn't seem worthwhile. So those changes need to be rebased and probably updated a bit, but it gives you an idea of what needs to be done. Jose ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/4] i965: Pass brw_compiler into brw_preprocess_nir() instead of is_scalar.
LGTM, Reviewed-by: Alejandro PiñeiroOn 08/04/16 01:35, Kenneth Graunke wrote: > I want to be able to read other fields. > > Signed-off-by: Kenneth Graunke > --- > src/mesa/drivers/dri/i965/brw_nir.c | 6 -- > src/mesa/drivers/dri/i965/brw_nir.h | 3 ++- > 2 files changed, 6 insertions(+), 3 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_nir.c > b/src/mesa/drivers/dri/i965/brw_nir.c > index c62840a..1821c0d 100644 > --- a/src/mesa/drivers/dri/i965/brw_nir.c > +++ b/src/mesa/drivers/dri/i965/brw_nir.c > @@ -437,11 +437,13 @@ nir_optimize(nir_shader *nir, bool is_scalar) > * is_scalar = true to scalarize everything prior to code gen. > */ > nir_shader * > -brw_preprocess_nir(nir_shader *nir, bool is_scalar) > +brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir) > { > bool progress; /* Written by OPT and OPT_V */ > (void)progress; > > + const bool is_scalar = compiler->scalar_stage[nir->stage]; > + > if (nir->stage == MESA_SHADER_GEOMETRY) >OPT(nir_lower_gs_intrinsics); > > @@ -568,7 +570,7 @@ brw_create_nir(struct brw_context *brw, > > (void)progress; > > - nir = brw_preprocess_nir(nir, is_scalar); > + nir = brw_preprocess_nir(brw->intelScreen->compiler, nir); > > OPT(nir_lower_system_values); > OPT_V(brw_nir_lower_uniforms, is_scalar); > diff --git a/src/mesa/drivers/dri/i965/brw_nir.h > b/src/mesa/drivers/dri/i965/brw_nir.h > index 440b4ce..b10c083 100644 > --- a/src/mesa/drivers/dri/i965/brw_nir.h > +++ b/src/mesa/drivers/dri/i965/brw_nir.h > @@ -81,7 +81,8 @@ nir_shader *brw_create_nir(struct brw_context *brw, > gl_shader_stage stage, > bool is_scalar); > > -nir_shader *brw_preprocess_nir(nir_shader *nir, bool is_scalar); > +nir_shader *brw_preprocess_nir(const struct brw_compiler *compiler, > + nir_shader *nir); > > void brw_nir_lower_vs_inputs(nir_shader *nir, > const struct brw_device_info *devinfo, ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/mesa: fix glReadBuffer() assertion failure
On 08/04/16 00:45, Brian Paul wrote: If the first call in a GL app is glReadPixels(GL_FRONT) we'd fail the assert(st->ctx->FragmentProgram._Current) at st_atom_shader.c:114 in update_fp(). This is because we were calling st_validate_state() without first updating Mesa state with _mesa_update_state(). The regression came from commit 83b589301f4a150f4 "st/mesa: fix frontbuffer glReadPixels regressions". The new piglit gl-1.0-simple-readbuffer test exercises this. Cc: "11.1 11.2"--- src/mesa/state_tracker/st_cb_fbo.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mesa/state_tracker/st_cb_fbo.c b/src/mesa/state_tracker/st_cb_fbo.c index ff570e0..456ad83 100644 --- a/src/mesa/state_tracker/st_cb_fbo.c +++ b/src/mesa/state_tracker/st_cb_fbo.c @@ -40,6 +40,7 @@ #include "main/glformats.h" #include "main/macros.h" #include "main/renderbuffer.h" +#include "main/state.h" #include "pipe/p_context.h" #include "pipe/p_defines.h" @@ -729,6 +730,7 @@ st_ReadBuffer(struct gl_context *ctx, GLenum buffer) fb->Attachment[fb->_ColorReadBufferIndex].Type == GL_NONE) { /* add the buffer */ st_manager_add_color_renderbuffer(st, fb, fb->_ColorReadBufferIndex); + _mesa_update_state(ctx); st_validate_state(st, ST_PIPELINE_RENDER); } } Reviewed-by: Jose Fonseca ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 00/15] i965: Rework uniform handling in the back-end
On Apr 7, 2016 10:01 PM, "Matt Turner"wrote: > > On Tue, Mar 22, 2016 at 3:33 PM, Jason Ekstrand wrote: > > This is mostly a re-send of a patch series I've had floating around in one > > form or a while for quite some time. It's basically the same except that > > the original version was missing a work-around for Sandy Bridge. For a > > while, I wasn't really pushing to get it merged because I couldn't > > demonstrate any actual performance benifit from pushing arrays. However, > > with the Vulkan API, the concept of push constants is directly exposed to > > the user and we really need to be able to indirect on them. This series > > makes the FS backend 100% ready for indirect push constants; vec4 will > > take a little more work. > > > > It's worth noting that we've been carying these patches around in our > > Vulkan driver for probably 3 or 4 months now and it's working great. > > > > For those that prefer to review on a branch: > > > > https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/i965-uniforms > > > > I think Kristian has mostly reviewed these patches. However, he never sent > > any R-Bs to the list. I'd also like Ken or Matt to look at it from a > > design perspective. > > I don't know what I think. I'm sympathetic to Curro's argument, but in > the absence of more data it's hard to judge anything really. I'm not > at all sympathetic to > > """ > Do I have a proof-of-concept in code, no. However, I've run through > it in my head and I have a pretty good idea what it would look like. > You are free to go off and do it if you don't believe me, but I don't > really want to hold things up while you do. > """ > > That's what... An Appeal to Your Brain? :) Sort-of... It was more a remark of frustration at the (percieved) implication that I hadn't thought about it or at the very least hadnt given it a fair shake. In a bit more detail here are some of my thoughts on reladdr and an ADDR file in no particular order a) Not a single FS optimization pass handles it. Yes, "if you see reladdr, bail” is a valid (if suboptimal) strategy 90% of the time. However, anything that computes any sort of kill set now needs a recursive algorithm to walk register sources. We do handle this in NIR and it's not terrible but it does come with nontrivial pain and retrofitting it isn't necessarily going to be quick-and-easy. Curro's response of "use-def chains will fix this" while probably accurate doesn't solve the immediate problem while these patches have been on the list for 6 months. b) The hardware doesn't do reladdr. It has an address register with substantial restrictions. Eventually, we would need to lower to something that writes the address register and have an indirect source type that consumes it. If you end up with two indirect sources, we have to emit a move for one of them. Where do we do that lowering? Do we do it in the generator or as a pass? c) If we handle it all in the generator, we have no ability to schedule it at all. It also makes the generator far more complex. d) If we handle it in a lowering pass, what does that pass produce? Do we expose the ADDR file and try to do RA on it or do we treat it as a fixed thing like flag? In either case, we need to add extra logic to at least the scheduler if not other places to add this whole new concept. e) If we allow indirect sources of any sort, how do we carry range information around post-RA. Pre-RA we can theoretically just say if you indirect you touch the whole thing. Post-RA, you either have to carry that information around per-instruction or you have to assume that any instruction that uses an indirect source could be reading anything in the entire GRF and it becomes almost a complete scheduling barrier. Those are the thoughts that pop to the top. I could come up with more if you'd like. So, yes, using reladdr or or an ADDR file would be possible but it would involve substantial IR surgery. What's the benefit? You can put the relative source directly in the instruction that uses it and maybe do an address calculation directly to the address register instead of having to move it there. The approach I've taken on the other hand, neatly side-steps all of the issues listed above. This comes at the cost of a few extra instructions (which you probably have to spend anyway on gen7). I think that trade-off is worthwhile. --Jason > I don't know how to proceed on that front if no one is willing or > interested in trying to implement it using reladdr. > > I ran shader-db. > > total instructions in shared programs: 7113290 -> 7161760 (0.68%) > instructions in affected programs: 866011 -> 914481 (5.60%) > helped: 0 > HURT: 7180 > > total cycles in shared programs: 64705926 -> 64776118 (0.11%) > cycles in affected programs: 4951554 -> 5021746 (1.42%) > helped: 1605 > HURT: 5204 > > of which the overwhelming majority is vertex shaders (why? this series > is i965/fs). FS changes are
Re: [Mesa-dev] [PATCH] st/va: avoid dereference after free
Hi Thomas and Emil, I tested it and pushed. Thx a lot. On 6 April 2016 at 21:34, Thomas H.P. Andersenwrote: > > > On Sun, Mar 6, 2016 at 10:08 AM, Thomas H.P. Andersen > wrote: > >> >> >> On Sat, Mar 5, 2016 at 1:30 PM, Emil Velikov >> wrote: >> >>> Hi Thomas, >>> >>> On 5 March 2016 at 12:07, Thomas Hindoe Paaboel Andersen >>> wrote: >>> > --- >>> > src/gallium/state_trackers/va/image.c | 4 +++- >>> > 1 file changed, 3 insertions(+), 1 deletion(-) >>> > >>> > diff --git a/src/gallium/state_trackers/va/image.c >>> b/src/gallium/state_trackers/va/image.c >>> > index 2c42a98..92d014c 100644 >>> > --- a/src/gallium/state_trackers/va/image.c >>> > +++ b/src/gallium/state_trackers/va/image.c >>> > @@ -280,6 +280,7 @@ vlVaDestroyImage(VADriverContextP ctx, VAImageID >>> image) >>> > { >>> > vlVaDriver *drv; >>> > VAImage *vaimage; >>> > + VAStatus status; >>> > >>> > if (!ctx) >>> >return VA_STATUS_ERROR_INVALID_CONTEXT; >>> > @@ -294,8 +295,9 @@ vlVaDestroyImage(VADriverContextP ctx, VAImageID >>> image) >>> > >>> > handle_table_remove(VL_VA_DRIVER(ctx)->htab, image); >>> > pipe_mutex_unlock(drv->mutex); >>> > + status = vlVaDestroyBuffer(ctx, vaimage->buf); >>> > FREE(vaimage); >>> > - return vlVaDestroyBuffer(ctx, vaimage->buf); >>> > + return status; >>> >>> Nicely spotted ! >>> Out of curiosity: did you notice this during code inspection or did it >>> pop up during testing ? >>> >> >> Thanks for the review! I only saw it by inspection. I am just poking >> around a bit to get to know the code. >> >> >>> >>> For the patch >>> >>> Cc: "11.1 11.2" >>> Reviewed-by: Emil Velikov >>> >>> I'll push this in a couple of days unless someone beats me to it. >>> >> > Hi Emil, > > Friendly ping :) > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev