Re: [Mesa-dev] [PATCH 2/7] glsl/glcpp: use ralloc_sprint_rewrite_tail to avoid slow vsprintf
On Sunday, January 1, 2017 1:34:27 AM PST Marek Olšák wrote:
> From: Marek Olšák
>
> This reduces compile times by 4.5% with the Gallium noop driver and
> gl_constants::GLSLOptimizeConservatively == true.

Compile times of...what exactly? Do you have any statistics for this by itself?

Assuming we add your helper, this patch looks reasonable.

Reviewed-by: Kenneth Graunke

BTW, I suspect you could get some additional speedup by changing

    parser->output = ralloc_strdup(parser, "");

to something like:

    parser->output = ralloc_size(parser, strlen(orig_concatenated_src));
    parser->output[0] = '\0';

to try to avoid reallocations. rewrite_tail will realloc just enough space every time it allocates, which means that once you reallocate, you're going to be calling realloc on every single token. Yuck!

ralloc/talloc's string libraries were never meant for serious string processing like the preprocessor does. They're meant for convenience when constructing debug messages which don't need to be that efficient.

Perhaps a better approach would be to have the preprocessor do this itself. Just ralloc_size() the output and initialize the null byte. reralloc to double the size if you need more space. At the end of preprocessing, reralloc to output_length to free any waste from doubling.

I suspect that would be a *lot* more efficient, and is probably what we should have done in the first place...

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
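The doubling strategy suggested in the reply above could look roughly like the sketch below. It is only an illustration under stated assumptions: plain malloc()/realloc() stand in for ralloc_size()/reralloc, and struct out_buf and the out_buf_*() helpers are hypothetical names, not part of Mesa or of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical output buffer; not a Mesa type. */
struct out_buf {
   char *data;       /* NUL-terminated text */
   size_t length;    /* bytes used, not counting the NUL */
   size_t capacity;  /* bytes allocated */
};

/* Allocate room for size_hint characters plus the NUL up front. */
static bool
out_buf_init(struct out_buf *b, size_t size_hint)
{
   b->capacity = size_hint + 1;
   b->data = malloc(b->capacity);
   if (b->data == NULL)
      return false;

   b->data[0] = '\0';
   b->length = 0;
   return true;
}

/* Append text_length bytes; double the capacity only when out of room. */
static bool
out_buf_append(struct out_buf *b, const char *text, size_t text_length)
{
   size_t needed = b->length + text_length + 1;

   if (needed > b->capacity) {
      size_t new_capacity = b->capacity;
      char *ptr;

      while (new_capacity < needed)
         new_capacity *= 2;

      ptr = realloc(b->data, new_capacity);
      if (ptr == NULL)
         return false;

      b->data = ptr;
      b->capacity = new_capacity;
   }

   memcpy(b->data + b->length, text, text_length);
   b->length += text_length;
   b->data[b->length] = '\0';
   return true;
}

/* Give back whatever the doubling over-allocated. */
static bool
out_buf_finish(struct out_buf *b)
{
   char *ptr = realloc(b->data, b->length + 1);
   if (ptr == NULL)
      return false;

   b->data = ptr;
   b->capacity = b->length + 1;
   return true;
}
```

With a size hint close to the length of the concatenated source, most appends would never reach realloc(); when they do, the number of reallocations grows with the logarithm of the output size rather than with the number of tokens.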
Re: [Mesa-dev] [PATCH 1/7] ralloc: add a new printing helper ralloc_sprint_rewrite_tail
On Sunday, January 1, 2017 1:34:26 AM PST Marek Olšák wrote:
> From: Marek Olšák
>
> This one is much faster when you don't need vsprintf.
> ---
>  src/util/ralloc.c | 25 +
>  src/util/ralloc.h | 24 
>  2 files changed, 49 insertions(+)
>
> diff --git a/src/util/ralloc.c b/src/util/ralloc.c
> index 980e4e4..7976ca6 100644
> --- a/src/util/ralloc.c
> +++ b/src/util/ralloc.c
> @@ -522,20 +522,45 @@ ralloc_vasprintf_rewrite_tail(char **str, size_t *start, const char *fmt,
>     ptr = resize(*str, *start + new_length + 1);
>     if (unlikely(ptr == NULL))
>        return false;
>
>     vsnprintf(ptr + *start, new_length + 1, fmt, args);
>     *str = ptr;
>     *start += new_length;
>     return true;
>  }
>
> +bool
> +ralloc_sprint_rewrite_tail(char **str, size_t *start, const char *text,
> +                           unsigned text_length)
> +{

Cool. I always thought about adding this.

Can we shorten it to "ralloc_rewrite_tail()" or "ralloc_str_rewrite_tail()"?

> +   char *ptr;
> +
> +   assert(str != NULL);
> +
> +   if (unlikely(*str == NULL)) {
> +      /* Assuming a NULL context is probably bad, but it's expected behavior. */
> +      *str = ralloc_strdup(NULL, text);
> +      *start = strlen(*str);
> +      return true;
> +   }
> +
> +   ptr = resize(*str, *start + text_length + 1);
> +   if (unlikely(ptr == NULL))
> +      return false;
> +
> +   memcpy(ptr + *start, text, text_length + 1); /* also copy '\0' */

I don't know how I feel about copying the \0 from the source string. I suppose that works, and is probably more efficient. But I noticed that this has an awful lot of similarity to the cat() helper (for ralloc_strncat), which copies n bytes and appends a \0 explicitly.

I suppose if we wanted to add a ralloc_strn_rewrite_tail(), we'd need to do it that way. I don't know that there's any use in that, though.

> +   *str = ptr;
> +   *start += text_length;
> +   return true;
> +}
> +
>  /***
>   * Linear allocator for short-lived allocations.
>   ***
>   *
>   * The allocator consists of a parent node (2K buffer), which requires
>   * a ralloc parent, and child nodes (allocations). Child nodes can't be freed
>   * directly, because the parent doesn't track them. You have to release
>   * the parent node in order to release all its children.
>   *
>   * The allocator uses a fixed-sized buffer with a monotonically increasing
> diff --git a/src/util/ralloc.h b/src/util/ralloc.h
> index 3e2d342..6c31a6d 100644
> --- a/src/util/ralloc.h
> +++ b/src/util/ralloc.h
> @@ -401,20 +401,44 @@ bool ralloc_asprintf_append (char **str, const char *fmt, ...)
>   * \sa ralloc_strcat
>   *
>   * \p str will be updated to the new pointer unless allocation fails.
>   *
>   * \return True unless allocation failed.
>   */
>  bool ralloc_vasprintf_append(char **str, const char *fmt, va_list args);
>  /// @}
>
>  /**
> + * Rewrite the tail of an existing string, starting at a given index.
> + *
> + * Overwrites the contents of *str starting at \p start with "text",
> + * including a new null-terminator. Allocates more memory as necessary.
> + *
> + * This can be used to append formatted text when the length of the existing
> + * string is already known, saving a strlen() call.
> + *
> + * \sa ralloc_asprintf_rewrite_tail
> + *
> + * \param str         The string to be updated.
> + * \param start       The index to start appending new data at.
> + * \param text        The input string terminated by zero.
> + * \param text_length The length of the input string.
> + *
> + * \p str will be updated to the new pointer unless allocation fails.
> + * \p start will be increased by the length of the newly formatted text.
> + *
> + * \return True unless allocation failed.
> + */
> +bool ralloc_sprint_rewrite_tail(char **str, size_t *start, const char *text,
> +                                unsigned text_length);
> +
> +/**
>   * Declare C++ new and delete operators which use ralloc.
>   *
>   * Placing this macro in the body of a class makes it possible to do:
>   *
>   * TYPE *var = new(mem_ctx) TYPE(...);
>   * delete var;
>   *
>   * which is more idiomatic in C++ than calling ralloc.
>   */
>  #define DECLARE_ALLOC_CXX_OPERATORS_TEMPLATE(TYPE, ALLOC_FUNC) \
>
[Mesa-dev] [PATCH 6/7] glsl_to_tgsi: do fewer optimizations with GLSLOptimizeConservatively
From: Marek Olšák

---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 76 ++
 1 file changed, 67 insertions(+), 9 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 59d4d69..b68a02a 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -6785,40 +6785,88 @@ get_mesa_program(struct gl_context *ctx,
          break;
       default:
          unreachable("unhandled shader stage");
       }
    }

    return prog;
 }

+/* See if there are unsupported control flow statements. */
+class ir_control_flow_info_visitor : public ir_hierarchical_visitor {
+private:
+   const struct gl_shader_compiler_options *options;
+public:
+   ir_control_flow_info_visitor(const struct gl_shader_compiler_options *options)
+      : options(options),
+        unsupported(false)
+   {
+   }
+
+   virtual ir_visitor_status visit_enter(ir_function *ir)
+   {
+      /* Other functions are skipped (same as glsl_to_tgsi). */
+      if (strcmp(ir->name, "main") == 0)
+         return visit_continue;
+
+      return visit_continue_with_parent;
+   }
+
+   virtual ir_visitor_status visit_enter(ir_call *ir)
+   {
+      if (!ir->callee->is_intrinsic()) {
+         unsupported = true; /* it's a function call */
+         return visit_stop;
+      }
+      return visit_continue;
+   }
+
+   virtual ir_visitor_status visit_enter(ir_return *ir)
+   {
+      if (options->EmitNoMainReturn) {
+         unsupported = true;
+         return visit_stop;
+      }
+      return visit_continue;
+   }
+
+   bool unsupported;
+};
+
+static bool
+has_unsupported_control_flow(exec_list *ir,
+                             const struct gl_shader_compiler_options *options)
+{
+   ir_control_flow_info_visitor visitor(options);
+   visit_list_elements(&visitor, ir);
+   return visitor.unsupported;
+}

 extern "C" {

 /**
  * Link a shader.
  * Called via ctx->Driver.LinkShader()
  * This actually involves converting GLSL IR into an intermediate TGSI-like IR
  * with code lowering and other optimizations.
  */
 GLboolean
 st_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
 {
    struct pipe_screen *pscreen = ctx->st->pipe->screen;
    assert(prog->data->LinkStatus);

    for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
       if (prog->_LinkedShaders[i] == NULL)
          continue;

-      bool progress;
       exec_list *ir = prog->_LinkedShaders[i]->ir;
       gl_shader_stage stage = prog->_LinkedShaders[i]->Stage;
       const struct gl_shader_compiler_options *options =
             &ctx->Const.ShaderCompilerOptions[stage];
       enum pipe_shader_type ptarget = st_shader_stage_to_ptarget(stage);
       bool have_dround = pscreen->get_shader_param(pscreen, ptarget,
                                    PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED);
       bool have_dfrexp = pscreen->get_shader_param(pscreen, ptarget,
                                    PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED);
       unsigned if_threshold = pscreen->get_shader_param(pscreen, ptarget,
@@ -6888,28 +6936,38 @@ st_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
                                  : 0));

       do_vec_index_to_cond_assign(ir);
       lower_vector_insert(ir, true);
       lower_quadop_vector(ir, false);
       lower_noise(ir);
       if (options->MaxIfDepth == 0) {
          lower_discard(ir);
       }

-      do {
-         progress = do_common_optimization(ir, true, true, options,
-                                           ctx->Const.NativeIntegers);
-         progress = lower_if_to_cond_assign((gl_shader_stage)i, ir,
-                                            options->MaxIfDepth, if_threshold) ||
-                    progress;
-
-      } while (progress);
+      if (ctx->Const.GLSLOptimizeConservatively) {
+         /* Do it once and repeat only if there's unsupported control flow. */
+         do {
+            do_common_optimization(ir, true, true, options,
+                                   ctx->Const.NativeIntegers);
+            lower_if_to_cond_assign((gl_shader_stage)i, ir,
+                                    options->MaxIfDepth, if_threshold);
+         } while (has_unsupported_control_flow(ir, options));
+      } else {
+         /* Repeat it until it stops making changes. */
+         bool progress;
+         do {
+            progress = do_common_optimization(ir, true, true, options,
+                                              ctx->Const.NativeIntegers);
+            progress |= lower_if_to_cond_assign((gl_shader_stage)i, ir,
+                                                options->MaxIfDepth, if_threshold);
+         } while (progress);
+      }

       validate_ir_tree(ir);
    }

    build_program_resource_list(ctx, prog);

    for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
       struct gl_program
[Mesa-dev] [PATCH 5/7] mesa: add gl_constants::GLSLOptimizeConservatively
From: Marek Olšák

to reduce the amount of GLSL optimizations for drivers that can do better.
---
 src/compiler/glsl/glsl_parser_extras.cpp | 14 +++---
 src/compiler/glsl/linker.cpp             | 16 
 src/mesa/main/ff_fragment_shader.cpp     | 10 +++---
 src/mesa/main/mtypes.h                   |  7 +++
 4 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/src/compiler/glsl/glsl_parser_extras.cpp b/src/compiler/glsl/glsl_parser_extras.cpp
index b12cf3d..e97cbf4 100644
--- a/src/compiler/glsl/glsl_parser_extras.cpp
+++ b/src/compiler/glsl/glsl_parser_extras.cpp
@@ -1942,26 +1942,34 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader,
       }
    }

    if (!state->error && !shader->ir->is_empty()) {
       struct gl_shader_compiler_options *options =
          &ctx->Const.ShaderCompilerOptions[shader->Stage];

       assign_subroutine_indexes(shader, state);
       lower_subroutine(shader->ir, state);
+
       /* Do some optimization at compile time to reduce shader IR size
        * and reduce later work if the same shader is linked multiple times
        */
-      while (do_common_optimization(shader->ir, false, false, options,
-                                    ctx->Const.NativeIntegers))
-         ;
+      if (ctx->Const.GLSLOptimizeConservatively) {
+         /* Run it just once. */
+         do_common_optimization(shader->ir, false, false, options,
+                                ctx->Const.NativeIntegers);
+      } else {
+         /* Repeat it until it stops making changes. */
+         while (do_common_optimization(shader->ir, false, false, options,
+                                       ctx->Const.NativeIntegers))
+            ;
+      }
       validate_ir_tree(shader->ir);

       enum ir_variable_mode other;
       switch (shader->Stage) {
       case MESA_SHADER_VERTEX:
          other = ir_var_shader_in;
          break;
       case MESA_SHADER_FRAGMENT:
          other = ir_var_shader_out;
diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
index f4f918a..13fbb30 100644
--- a/src/compiler/glsl/linker.cpp
+++ b/src/compiler/glsl/linker.cpp
@@ -5041,24 +5041,32 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog)
          goto done;

       if (ctx->Const.ShaderCompilerOptions[i].LowerCombinedClipCullDistance) {
          lower_clip_cull_distance(prog, prog->_LinkedShaders[i]);
       }

       if (ctx->Const.LowerTessLevel) {
          lower_tess_level(prog->_LinkedShaders[i]);
       }

-      while (do_common_optimization(prog->_LinkedShaders[i]->ir, true, false,
-                                    &ctx->Const.ShaderCompilerOptions[i],
-                                    ctx->Const.NativeIntegers))
-         ;
+      if (ctx->Const.GLSLOptimizeConservatively) {
+         /* Run it just once. */
+         do_common_optimization(prog->_LinkedShaders[i]->ir, true, false,
+                                &ctx->Const.ShaderCompilerOptions[i],
+                                ctx->Const.NativeIntegers);
+      } else {
+         /* Repeat it until it stops making changes. */
+         while (do_common_optimization(prog->_LinkedShaders[i]->ir, true, false,
+                                       &ctx->Const.ShaderCompilerOptions[i],
+                                       ctx->Const.NativeIntegers))
+            ;
+      }

       lower_const_arrays_to_uniforms(prog->_LinkedShaders[i]->ir, i);
       propagate_invariance(prog->_LinkedShaders[i]->ir);
    }

    /* Validation for special cases where we allow sampler array indexing
     * with loop induction variable. This check emits a warning or error
     * depending if backend can handle dynamic indexing.
     */
    if ((!prog->IsES && prog->data->Version < 130) ||
diff --git a/src/mesa/main/ff_fragment_shader.cpp b/src/mesa/main/ff_fragment_shader.cpp
index fd2c71f..48b84e8 100644
--- a/src/mesa/main/ff_fragment_shader.cpp
+++ b/src/mesa/main/ff_fragment_shader.cpp
@@ -1247,23 +1247,27 @@ create_new_program(struct gl_context *ctx, struct state_key *key)
    p.instructions = _sig->body;
    if (key->num_draw_buffers)
       emit_instructions();

    validate_ir_tree(p.shader->ir);

    const struct gl_shader_compiler_options *options =
       &ctx->Const.ShaderCompilerOptions[MESA_SHADER_FRAGMENT];

-   while (do_common_optimization(p.shader->ir, false, false, options,
-                                 ctx->Const.NativeIntegers))
-      ;
+   /* Conservative approach: Don't optimize here, the linker does it too. */
+   if (!ctx->Const.GLSLOptimizeConservatively) {
+      while (do_common_optimization(p.shader->ir, false, false, options,
+                                    ctx->Const.NativeIntegers))
+         ;
+   }
+
    reparent_ir(p.shader->ir, p.shader->ir);

    p.shader->CompileStatus = true;
    p.shader->Version = state->language_version;
    p.shader_program->Shaders = (gl_shader
[Mesa-dev] [PATCH 7/7] st/mesa: enable GLSLOptimizeConservatively for drivers that want it
From: Marek Olšák

GLSL compilation now takes 24% less time with the Gallium noop driver.
I used my shader-db for the measurement. The difference for the whole
radeonsi driver can be ~10%.

The generated TGSI is mostly the same. For example, the compilation
success rate with a TGSI->GCN bytecode converter without any optimizations
is the same. Note that glsl_to_tgsi does its own copy propagation and
simple register allocation.

shader-db GCN report:
- Talos spills fewer SGPRs.
- DOTA 2 spills more SGPRs.
- The average shader-db score is better, but it's just due to randomness.

29045 shaders in 17564 tests
Totals:
SGPRS: 1325929 -> 1325017 (-0.07 %)
VGPRS: 1010808 -> 1010172 (-0.06 %)
Spilled SGPRs: 1432 -> 1399 (-2.30 %)
Spilled VGPRs: 93 -> 92 (-1.08 %)
Private memory VGPRs: 688 -> 688 (0.00 %)
Scratch size: 2540 -> 2484 (-2.20 %) dwords per thread
Code Size: 39336732 -> 39342936 (0.02 %) bytes
Max Waves: 217937 -> 217969 (0.01 %)
---
 src/mesa/state_tracker/st_extensions.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index ef926e4..7ff5716 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -303,20 +303,22 @@ void st_init_limits(struct pipe_screen *screen,
                                       65536);
       else
          options->MaxUnrollIterations =
             screen->get_shader_param(screen, sh,
                                   PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT);

       options->LowerCombinedClipCullDistance = true;
       options->LowerBufferInterfaceBlocks = true;
    }

+   c->GLSLOptimizeConservatively =
+      screen->get_param(screen, PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY);
    c->LowerTessLevel = true;
    c->LowerCsDerivedVariables = true;
    c->PrimitiveRestartForPatches =
       screen->get_param(screen, PIPE_CAP_PRIMITIVE_RESTART_FOR_PATCHES);

    c->MaxCombinedTextureImageUnits =
          _min(c->Program[MESA_SHADER_VERTEX].MaxTextureImageUnits +
               c->Program[MESA_SHADER_TESS_CTRL].MaxTextureImageUnits +
               c->Program[MESA_SHADER_TESS_EVAL].MaxTextureImageUnits +
               c->Program[MESA_SHADER_GEOMETRY].MaxTextureImageUnits +
--
2.7.4
[Mesa-dev] [PATCH 3/7] glsl: run do_lower_jumps properly in do_common_optimizations
From: Marek Olšák

so that backends don't have to run it manually
---
 src/compiler/glsl/glsl_parser_extras.cpp   | 3 ++-
 src/mesa/program/ir_to_mesa.cpp            | 2 --
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 8 +---
 3 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/src/compiler/glsl/glsl_parser_extras.cpp b/src/compiler/glsl/glsl_parser_extras.cpp
index 4566aa9..b12cf3d 100644
--- a/src/compiler/glsl/glsl_parser_extras.cpp
+++ b/src/compiler/glsl/glsl_parser_extras.cpp
@@ -2099,21 +2099,22 @@ do_common_optimization(exec_list *ir, bool linked,
    OPT(do_tree_grafting, ir);
    OPT(do_constant_propagation, ir);
    if (linked)
       OPT(do_constant_variable, ir);
    else
       OPT(do_constant_variable_unlinked, ir);
    OPT(do_constant_folding, ir);
    OPT(do_minmax_prune, ir);
    OPT(do_rebalance_tree, ir);
    OPT(do_algebraic, ir, native_integers, options);
-   OPT(do_lower_jumps, ir);
+   OPT(do_lower_jumps, ir, true, true, options->EmitNoMainReturn,
+       options->EmitNoCont, options->EmitNoLoops);
    OPT(do_vec_index_to_swizzle, ir);
    OPT(lower_vector_insert, ir, false);
    OPT(do_swizzle_swizzle, ir);
    OPT(do_noop_swizzle, ir);

    OPT(optimize_split_arrays, ir, linked);
    OPT(optimize_redundant_jumps, ir);

    if (options->MaxUnrollIterations) {
       loop_state *ls = analyze_loop_variables(ir);
diff --git a/src/mesa/program/ir_to_mesa.cpp b/src/mesa/program/ir_to_mesa.cpp
index 653b822..0089e80 100644
--- a/src/mesa/program/ir_to_mesa.cpp
+++ b/src/mesa/program/ir_to_mesa.cpp
@@ -2972,22 +2972,20 @@ _mesa_ir_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
       do {
          progress = false;

          /* Lowering */
          do_mat_op_to_vec(ir);
          lower_instructions(ir, (MOD_TO_FLOOR | DIV_TO_MUL_RCP | EXP_TO_EXP2
                                  | LOG_TO_LOG2 | INT_DIV_TO_MUL_RCP
                                  | ((options->EmitNoPow) ? POW_TO_EXP2 : 0)));

-         progress = do_lower_jumps(ir, true, true, options->EmitNoMainReturn, options->EmitNoCont, options->EmitNoLoops) || progress;
-
          progress = do_common_optimization(ir, true, true,
                                            options, ctx->Const.NativeIntegers)
             || progress;

          progress = lower_quadop_vector(ir, true) || progress;

          if (options->MaxIfDepth == 0)
             progress = lower_discard(ir) || progress;

          progress = lower_if_to_cond_assign((gl_shader_stage)i, ir,
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index f738084..59d4d69 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -6889,28 +6889,22 @@ st_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
       do_vec_index_to_cond_assign(ir);
       lower_vector_insert(ir, true);
       lower_quadop_vector(ir, false);
       lower_noise(ir);
       if (options->MaxIfDepth == 0) {
          lower_discard(ir);
       }

       do {
-         progress = false;
-
-         progress = do_lower_jumps(ir, true, true, options->EmitNoMainReturn, options->EmitNoCont, options->EmitNoLoops) || progress;
-
          progress = do_common_optimization(ir, true, true, options,
-                                           ctx->Const.NativeIntegers)
-           || progress;
-
+                                           ctx->Const.NativeIntegers);
          progress = lower_if_to_cond_assign((gl_shader_stage)i, ir,
                                             options->MaxIfDepth, if_threshold) ||
                     progress;

       } while (progress);

       validate_ir_tree(ir);
    }

    build_program_resource_list(ctx, prog);
--
2.7.4
[Mesa-dev] [PATCH 4/7] gallium: add PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY
From: Marek Olšák

Drivers with good compilers don't need aggressive optimizations before TGSI.
---
 src/gallium/docs/source/screen.rst               | 3 +++
 src/gallium/drivers/freedreno/freedreno_screen.c | 1 +
 src/gallium/drivers/i915/i915_screen.c           | 1 +
 src/gallium/drivers/ilo/ilo_screen.c             | 1 +
 src/gallium/drivers/llvmpipe/lp_screen.c         | 1 +
 src/gallium/drivers/nouveau/nv30/nv30_screen.c   | 1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.c   | 1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   | 1 +
 src/gallium/drivers/r300/r300_screen.c           | 1 +
 src/gallium/drivers/r600/r600_pipe.c             | 1 +
 src/gallium/drivers/radeonsi/si_pipe.c           | 1 +
 src/gallium/drivers/softpipe/sp_screen.c         | 1 +
 src/gallium/drivers/svga/svga_screen.c           | 1 +
 src/gallium/drivers/swr/swr_screen.cpp           | 1 +
 src/gallium/drivers/vc4/vc4_screen.c             | 1 +
 src/gallium/drivers/virgl/virgl_screen.c         | 1 +
 src/gallium/include/pipe/p_defines.h             | 1 +
 17 files changed, 19 insertions(+)

diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
index 86aa259..000551a 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -359,20 +359,23 @@ The integer capabilities:
   UsageMask of xy or yzw is allowed, but xz or yw isn't. Declarations with
   overlapping locations must have matching semantic names and indices, and
   equal interpolation qualifiers.
   Components may overlap, notably when the gaps in an array of dvec3 are
   filled in.
 * ``PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS``: Whether interleaved stream
   output mode is able to interleave across buffers. This is required for
   ARB_transform_feedback3.
 * ``PIPE_CAP_TGSI_CAN_READ_OUTPUTS``: Whether every TGSI shader stage can read
   from the output file.
+* ``PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY``: Tell the GLSL compiler to use
+  the minimum amount of optimizations just to be able to do all the linking
+  and lowering.

 .. _pipe_capf:

 PIPE_CAPF_*

 The floating-point capabilities are:

 * ``PIPE_CAPF_MAX_LINE_WIDTH``: The maximum width of a regular line.
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c
index d84cd82..3b631372 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -288,20 +288,21 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
 	case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
 	case PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR:
 	case PIPE_CAP_CULL_DISTANCE:
 	case PIPE_CAP_PRIMITIVE_RESTART_FOR_PATCHES:
 	case PIPE_CAP_TGSI_VOTE:
 	case PIPE_CAP_MAX_WINDOW_RECTANGLES:
 	case PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED:
 	case PIPE_CAP_VIEWPORT_SUBPIXEL_BITS:
 	case PIPE_CAP_TGSI_ARRAY_COMPONENTS:
 	case PIPE_CAP_TGSI_CAN_READ_OUTPUTS:
+	case PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY:
 		return 0;

 	case PIPE_CAP_MAX_VIEWPORTS:
 		return 1;

 	case PIPE_CAP_SHAREABLE_SHADERS:
 	/* manage the variants for these ourself, to avoid breaking precompile: */
 	case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
 	case PIPE_CAP_VERTEX_COLOR_CLAMPED:
 		if (is_ir3(screen))
diff --git a/src/gallium/drivers/i915/i915_screen.c b/src/gallium/drivers/i915/i915_screen.c
index 14f4271..18578c0 100644
--- a/src/gallium/drivers/i915/i915_screen.c
+++ b/src/gallium/drivers/i915/i915_screen.c
@@ -289,20 +289,21 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap cap)
    case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT:
    case PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT:
    case PIPE_CAP_DRAW_INDIRECT:
    case PIPE_CAP_MULTI_DRAW_INDIRECT:
    case PIPE_CAP_MULTI_DRAW_INDIRECT_PARAMS:
    case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
    case PIPE_CAP_SAMPLER_VIEW_TARGET:
    case PIPE_CAP_VIEWPORT_SUBPIXEL_BITS:
    case PIPE_CAP_TGSI_CAN_READ_OUTPUTS:
    case PIPE_CAP_NATIVE_FENCE_FD:
+   case PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY:
       return 0;

    case PIPE_CAP_MAX_VIEWPORTS:
       return 1;

    case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
       return 64;

    case PIPE_CAP_GLSL_FEATURE_LEVEL:
       return 120;
diff --git a/src/gallium/drivers/ilo/ilo_screen.c b/src/gallium/drivers/ilo/ilo_screen.c
index c3fad73..20a0e8d 100644
--- a/src/gallium/drivers/ilo/ilo_screen.c
+++ b/src/gallium/drivers/ilo/ilo_screen.c
@@ -512,20 +512,21 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap param)
    case PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR:
    case PIPE_CAP_CULL_DISTANCE:
    case PIPE_CAP_PRIMITIVE_RESTART_FOR_PATCHES:
    case PIPE_CAP_TGSI_VOTE:
    case PIPE_CAP_MAX_WINDOW_RECTANGLES:
    case PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED:
    case PIPE_CAP_VIEWPORT_SUBPIXEL_BITS:
[Mesa-dev] [PATCH 2/7] glsl/glcpp: use ralloc_sprint_rewrite_tail to avoid slow vsprintf
From: Marek Olšák

This reduces compile times by 4.5% with the Gallium noop driver and
gl_constants::GLSLOptimizeConservatively == true.
---
 src/compiler/glsl/glcpp/glcpp-parse.y | 39 +++
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/src/compiler/glsl/glcpp/glcpp-parse.y b/src/compiler/glsl/glcpp/glcpp-parse.y
index 63012bc..b84e5ff 100644
--- a/src/compiler/glsl/glcpp/glcpp-parse.y
+++ b/src/compiler/glsl/glcpp/glcpp-parse.y
@@ -202,21 +202,21 @@ add_builtin_define(glcpp_parser_t *parser, const char *name, int value);
 input:
         /* empty */
 |       input line
 ;

 line:
         control_line
 |       SPACE control_line
 |       text_line {
                 _glcpp_parser_print_expanded_token_list (parser, $1);
-                ralloc_asprintf_rewrite_tail (&parser->output, &parser->output_length, "\n");
+                ralloc_sprint_rewrite_tail(&parser->output, &parser->output_length, "\n", 1);
         }
 |       expanded_line
 ;

 expanded_line:
         IF_EXPANDED expression NEWLINE {
                 if (parser->is_gles && $2.undefined_macro)
                         glcpp_error(& @1, parser, "undefined macro %s in expression (illegal in GLES)", $2.undefined_macro);
                 _glcpp_parser_skip_stack_push_if (parser, & @1, $2.value);
         }
@@ -252,21 +252,21 @@ define:
 |       FUNC_IDENTIFIER '(' ')' replacement_list NEWLINE {
                 _define_function_macro (parser, & @1, $1, NULL, $4);
         }
 |       FUNC_IDENTIFIER '(' identifier_list ')' replacement_list NEWLINE {
                 _define_function_macro (parser, & @1, $1, $3, $5);
         }
 ;

 control_line:
         control_line_success {
-                ralloc_asprintf_rewrite_tail (&parser->output, &parser->output_length, "\n");
+                ralloc_sprint_rewrite_tail(&parser->output, &parser->output_length, "\n", 1);
         }
 |       control_line_error
 |       HASH_TOKEN LINE pp_tokens NEWLINE {

                 if (parser->skip_stack == NULL ||
                     parser->skip_stack->type == SKIP_NO_SKIP)
                 {
                         _glcpp_parser_expand_and_lex_from (parser,
                                                            LINE_EXPANDED, $3,
                                                            EXPANSION_MODE_IGNORE_DEFINED);
@@ -428,21 +428,22 @@ control_line_success:
 |       HASH_TOKEN VERSION_TOKEN version_constant IDENTIFIER NEWLINE {
                 if (parser->version_set) {
                         glcpp_error(& @1, parser, "#version must appear on the first line");
                 }
                 _glcpp_parser_handle_version_declaration(parser, $3, $4, true);
         }
 |       HASH_TOKEN NEWLINE {
                 glcpp_parser_resolve_implicit_version(parser);
         }
 |       HASH_TOKEN PRAGMA NEWLINE {
-                ralloc_asprintf_rewrite_tail (&parser->output, &parser->output_length, "#%s", $2);
+                ralloc_sprint_rewrite_tail(&parser->output, &parser->output_length, "#", 1);
+                ralloc_sprint_rewrite_tail(&parser->output, &parser->output_length, $2, strlen($2));
         }
 ;

 control_line_error:
         HASH_TOKEN ERROR_TOKEN NEWLINE {
                 glcpp_error(& @1, parser, "#%s", $2);
         }
 |       HASH_TOKEN DEFINE_TOKEN NEWLINE {
                 glcpp_error (& @1, parser, "#define without macro name");
         }
@@ -1109,71 +1110,73 @@ _token_list_equal_ignoring_space(token_list_t *a, token_list_t *b)
                 node_b = node_b->next;
         }

         return 1;
 }

 static void
 _token_print(char **out, size_t *len, token_t *token)
 {
         if (token->type < 256) {
-                ralloc_asprintf_rewrite_tail (out, len, "%c", token->type);
+                char s[2] = {token->type, 0};
+                ralloc_sprint_rewrite_tail(out, len, s, 1);
                 return;
         }

         switch (token->type) {
         case INTEGER:
                 ralloc_asprintf_rewrite_tail (out, len, "%" PRIiMAX, token->value.ival);
                 break;
         case IDENTIFIER:
         case INTEGER_STRING:
         case OTHER:
-                ralloc_asprintf_rewrite_tail (out, len, "%s", token->value.str);
+                ralloc_sprint_rewrite_tail(out, len, token->value.str,
+                                           strlen(token->value.str));
                 break;
         case SPACE:
-                ralloc_asprintf_rewrite_tail (out, len, " ");
+                ralloc_sprint_rewrite_tail(out, len, " ", 1);
                 break;
         case LEFT_SHIFT:
-                ralloc_asprintf_rewrite_tail (out, len, "<<");
+                ralloc_sprint_rewrite_tail(out, len, "<<", 2);
                 break;
         case RIGHT_SHIFT:
-                ralloc_asprintf_rewrite_tail (out, len, ">>");
+                ralloc_sprint_rewrite_tail(out, len, ">>", 2);
                 break;
         case LESS_OR_EQUAL:
-                ralloc_asprintf_rewrite_tail (out, len, "<=");
+                ralloc_sprint_rewrite_tail(out, len, "<=", 2);
                 break;
         case GREATER_OR_EQUAL:
-                ralloc_asprintf_rewrite_tail (out, len, ">=");
+                ralloc_sprint_rewrite_tail(out, len, ">=", 2);
                 break;
         case EQUAL:
-                ralloc_asprintf_rewrite_tail (out, len, "==");
+                ralloc_sprint_rewrite_tail(out, len, "==", 2);
[Mesa-dev] [PATCH 1/7] ralloc: add a new printing helper ralloc_sprint_rewrite_tail
From: Marek Olšák

This one is much faster when you don't need vsprintf.
---
 src/util/ralloc.c | 25 +
 src/util/ralloc.h | 24 
 2 files changed, 49 insertions(+)

diff --git a/src/util/ralloc.c b/src/util/ralloc.c
index 980e4e4..7976ca6 100644
--- a/src/util/ralloc.c
+++ b/src/util/ralloc.c
@@ -522,20 +522,45 @@ ralloc_vasprintf_rewrite_tail(char **str, size_t *start, const char *fmt,
    ptr = resize(*str, *start + new_length + 1);
    if (unlikely(ptr == NULL))
       return false;

    vsnprintf(ptr + *start, new_length + 1, fmt, args);
    *str = ptr;
    *start += new_length;
    return true;
 }

+bool
+ralloc_sprint_rewrite_tail(char **str, size_t *start, const char *text,
+                           unsigned text_length)
+{
+   char *ptr;
+
+   assert(str != NULL);
+
+   if (unlikely(*str == NULL)) {
+      /* Assuming a NULL context is probably bad, but it's expected behavior. */
+      *str = ralloc_strdup(NULL, text);
+      *start = strlen(*str);
+      return true;
+   }
+
+   ptr = resize(*str, *start + text_length + 1);
+   if (unlikely(ptr == NULL))
+      return false;
+
+   memcpy(ptr + *start, text, text_length + 1); /* also copy '\0' */
+   *str = ptr;
+   *start += text_length;
+   return true;
+}
+
 /***
  * Linear allocator for short-lived allocations.
  ***
  *
  * The allocator consists of a parent node (2K buffer), which requires
  * a ralloc parent, and child nodes (allocations). Child nodes can't be freed
  * directly, because the parent doesn't track them. You have to release
  * the parent node in order to release all its children.
  *
  * The allocator uses a fixed-sized buffer with a monotonically increasing
diff --git a/src/util/ralloc.h b/src/util/ralloc.h
index 3e2d342..6c31a6d 100644
--- a/src/util/ralloc.h
+++ b/src/util/ralloc.h
@@ -401,20 +401,44 @@ bool ralloc_asprintf_append (char **str, const char *fmt, ...)
  * \sa ralloc_strcat
  *
  * \p str will be updated to the new pointer unless allocation fails.
  *
  * \return True unless allocation failed.
  */
 bool ralloc_vasprintf_append(char **str, const char *fmt, va_list args);
 /// @}

 /**
+ * Rewrite the tail of an existing string, starting at a given index.
+ *
+ * Overwrites the contents of *str starting at \p start with "text",
+ * including a new null-terminator. Allocates more memory as necessary.
+ *
+ * This can be used to append formatted text when the length of the existing
+ * string is already known, saving a strlen() call.
+ *
+ * \sa ralloc_asprintf_rewrite_tail
+ *
+ * \param str         The string to be updated.
+ * \param start       The index to start appending new data at.
+ * \param text        The input string terminated by zero.
+ * \param text_length The length of the input string.
+ *
+ * \p str will be updated to the new pointer unless allocation fails.
+ * \p start will be increased by the length of the newly formatted text.
+ *
+ * \return True unless allocation failed.
+ */
+bool ralloc_sprint_rewrite_tail(char **str, size_t *start, const char *text,
+                                unsigned text_length);
+
+/**
  * Declare C++ new and delete operators which use ralloc.
  *
  * Placing this macro in the body of a class makes it possible to do:
  *
  * TYPE *var = new(mem_ctx) TYPE(...);
  * delete var;
  *
  * which is more idiomatic in C++ than calling ralloc.
  */
 #define DECLARE_ALLOC_CXX_OPERATORS_TEMPLATE(TYPE, ALLOC_FUNC) \
--
2.7.4
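The helper in this patch copies text_length + 1 bytes and so relies on the source string supplying the terminator. For comparison, here is a rough sketch of a length-limited variant that writes the terminator explicitly, in the style of strncat. It is an assumption-laden illustration, not code from the series: strn_rewrite_tail is a hypothetical name, and plain realloc() stands in for ralloc's internal resize().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "strn" variant: appends at most n bytes of text at offset
 * *start and always writes its own terminator. */
static bool
strn_rewrite_tail(char **str, size_t *start, const char *text, size_t n)
{
   const char *end;
   size_t text_length;
   char *ptr;

   assert(str != NULL && start != NULL);

   /* Stop early if the source string is shorter than n. */
   end = memchr(text, '\0', n);
   text_length = end ? (size_t)(end - text) : n;

   /* A NULL string is treated as empty; realloc(NULL, ...) allocates. */
   if (*str == NULL)
      *start = 0;

   ptr = realloc(*str, *start + text_length + 1);
   if (ptr == NULL)
      return false;

   memcpy(ptr + *start, text, text_length);
   ptr[*start + text_length] = '\0';   /* explicit terminator */
   *str = ptr;
   *start += text_length;
   return true;
}
```

The extra store costs almost nothing, but it lets the caller pass a prefix of a longer string, which the copy-the-terminator approach cannot do.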
[Mesa-dev] [PATCH 0/7] Faster GLSL compilation (for Gallium)
Hi,

The first 2 patches make the GLSL preprocessor a little faster.

The others add an optional CAP/Const flag to Mesa and Gallium that decreases the amount of GLSL optimizations that are executed by tweaking the do_common_optimization call sites. I've not seen a drop in the quality of the produced TGSI (thanks to TGSI passes in glsl_to_tgsi?), but there is a CAP just in case some other drivers don't want this.

It reduces GLSL compile times by 24% with the Gallium noop driver (or 10% with full radeonsi?).

Please review.

BTW, after this series, it became more obvious that debug builds have much slower compilation because of these two:
- validate_ir_tree
- tgsi_sanity_check

If you wanna profile the compiler, disable those two or don't use a debug build.

Marek
Re: [Mesa-dev] [PATCH 5/5] gallium: remove TGSI_OPCODE_SUB
On Sat, Dec 31, 2016 at 7:04 PM, Marek Olšák wrote:
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -1695,21 +1695,22 @@ glsl_to_tgsi_visitor::visit_expression(ir_expression* ir, st_src_reg *op)
>         * driver.
>         */
>        emit_asm(ir, TGSI_OPCODE_MOV, result_dst, st_src_reg_for_float(0.5));
>        break;
>     }
>
>     case ir_binop_add:
>        emit_asm(ir, TGSI_OPCODE_ADD, result_dst, op[0], op[1]);
>        break;
>     case ir_binop_sub:
> -      emit_asm(ir, TGSI_OPCODE_SUB, result_dst, op[0], op[1]);
> +      op[1].negate = 1;
> +      emit_asm(ir, TGSI_OPCODE_ADD, result_dst, op[0], op[1]);
>        break;

I think you want op[1].negate = ~op[1].negate, as it could have been inverted by ir_unop_neg.

[Note that this is not a full review, just something I happened to notice.]

Cheers,

  -ilia
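The difference between forcing and toggling the negate modifier can be shown with a small standalone sketch. The struct below is a hypothetical, simplified operand with a one-bit modifier, not the real st_src_reg; if the real field is a per-component mask, `~` would play the role that `!` plays here.

```c
#include <assert.h>

/* Hypothetical, simplified source operand: a value plus a one-bit negate
 * modifier. The real st_src_reg has more fields. */
struct src_operand {
   float value;
   unsigned negate : 1;
};

/* Value the operand contributes once its modifier is applied. */
static float
fetch(struct src_operand s)
{
   return s.negate ? -s.value : s.value;
}

/* Lower "a - b" to ADD by forcing the negate bit, as in the quoted hunk. */
static float
sub_force_negate(struct src_operand a, struct src_operand b)
{
   b.negate = 1;
   return fetch(a) + fetch(b);
}

/* Lower "a - b" to ADD by toggling the negate bit, as suggested above. */
static float
sub_toggle_negate(struct src_operand a, struct src_operand b)
{
   b.negate = !b.negate;
   return fetch(a) + fetch(b);
}
```

With a = 5 and an operand b that already carries a negate (value 3, negate set, so it reads as -3), the subtraction a - b should yield 8. Forcing the bit computes 5 + (-3) = 2, while toggling clears the bit and computes 5 + 3 = 8. When b has no modifier set, both variants agree.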
[Mesa-dev] [PATCH 3/5] gallium/hud: add an option to reset the color counter
From: Marek Olšák--- src/gallium/auxiliary/hud/hud_context.c | 21 ++--- src/gallium/auxiliary/hud/hud_private.h | 1 + 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/src/gallium/auxiliary/hud/hud_context.c b/src/gallium/auxiliary/hud/hud_context.c index 9e17d9b..eefbe60 100644 --- a/src/gallium/auxiliary/hud/hud_context.c +++ b/src/gallium/auxiliary/hud/hud_context.c @@ -817,31 +817,32 @@ hud_pane_add_graph(struct hud_pane *pane, struct hud_graph *gr) {1, 0.5, 0.5}, {0.5, 1, 1}, {1, 0.5, 1}, {1, 1, 0.5}, {0, 0.5, 0}, {0.5, 0, 0}, {0, 0.5, 0.5}, {0.5, 0, 0.5}, {0.5, 0.5, 0}, }; - unsigned color = pane->num_graphs % ARRAY_SIZE(colors); + unsigned color = pane->next_color % ARRAY_SIZE(colors); strip_hyphens(gr->name); gr->vertices = MALLOC(pane->max_num_vertices * sizeof(float) * 2); gr->color[0] = colors[color][0]; gr->color[1] = colors[color][1]; gr->color[2] = colors[color][2]; gr->pane = pane; LIST_ADDTAIL(>head, >graph_list); pane->num_graphs++; + pane->next_color++; } void hud_graph_add_value(struct hud_graph *gr, uint64_t value) { gr->current_value = value; value = value > gr->pane->ceiling ? gr->pane->ceiling : value; if (gr->fd) fprintf(gr->fd, "%" PRIu64 "\n", value); @@ -916,21 +917,22 @@ parse_string(const char *s, char *out) "parsing a string\n", *s, *s); fflush(stderr); } return i; } static char * read_pane_settings(char *str, unsigned * const x, unsigned * const y, unsigned * const width, unsigned * const height, - uint64_t * const ceiling, boolean * const dyn_ceiling) + uint64_t * const ceiling, boolean * const dyn_ceiling, + boolean *reset_colors) { char *ret = str; unsigned tmp; while (*str == '.') { ++str; switch (*str) { case 'x': ++str; *x = strtoul(str, , 10); @@ -967,20 +969,26 @@ read_pane_settings(char *str, unsigned * const x, unsigned * const y, *ceiling = tmp > 10 ? 
tmp : 10; str = ret; break; case 'd': ++str; ret = str; *dyn_ceiling = true; break; + case 'r': + ++str; + ret = str; + *reset_colors = true; + break; + default: fprintf(stderr, "gallium_hud: syntax error: unexpected '%c'\n", *str); fflush(stderr); } } return ret; } @@ -1008,56 +1016,62 @@ hud_parse_env_var(struct hud_context *hud, const char *env) unsigned num, i; char name_a[256], s[256]; char *name; struct hud_pane *pane = NULL; unsigned x = 10, y = 10; unsigned width = 251, height = 100; unsigned period = 500 * 1000; /* default period (1/2 second) */ uint64_t ceiling = UINT64_MAX; unsigned column_width = 251; boolean dyn_ceiling = false; + boolean reset_colors = false; const char *period_env; /* * The GALLIUM_HUD_PERIOD env var sets the graph update rate. * The env var is in seconds (a float). * Zero means update after every frame. */ period_env = getenv("GALLIUM_HUD_PERIOD"); if (period_env) { float p = (float) atof(period_env); if (p >= 0.0f) { period = (unsigned) (p * 1000 * 1000); } } while ((num = parse_string(env, name_a)) != 0) { env += num; /* check for explicit location, size and etc. settings */ name = read_pane_settings(name_a, , , , , , - _ceiling); + _ceiling, _colors); /* * Keep track of overall column width to avoid pane overlapping in case * later we create a new column while the bottom pane in the current * column is less wide than the rest of the panes in it. */ column_width = width > column_width ? width : column_width; if (!pane) { pane = hud_pane_create(x, y, x + width, y + height, period, 10, ceiling, dyn_ceiling); if (!pane) return; } + if (reset_colors) { + pane->next_color = 0; + reset_colors = false; + } + /* Add a graph. */ #if HAVE_GALLIUM_EXTRA_HUD || HAVE_LIBSENSORS char arg_name[64]; #endif /* IF YOU CHANGE THIS, UPDATE print_help! 
*/ if (strcmp(name, "fps") == 0) { hud_fps_graph_install(pane); } else if (strcmp(name, "cpu") == 0) { hud_cpu_graph_install(pane, ALL_CPUS); @@ -1306,20 +1320,21 @@ print_help(struct pipe_screen *screen) puts(" to the upper-left corner of the viewport, in pixels."); puts(" 'y[value]' sets the location of the pane on the y axis relative"); puts(" to the upper-left corner of the viewport, in pixels."); puts("
[Mesa-dev] [PATCH 2/5] gallium/hud: allow more data sources per pane
From: Marek Olšák

---
 src/gallium/auxiliary/hud/hud_context.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/src/gallium/auxiliary/hud/hud_context.c b/src/gallium/auxiliary/hud/hud_context.c
index 4c65af3..9e17d9b 100644
--- a/src/gallium/auxiliary/hud/hud_context.c
+++ b/src/gallium/auxiliary/hud/hud_context.c
@@ -806,37 +806,39 @@ strip_hyphens(char *s)
  */
 void
 hud_pane_add_graph(struct hud_pane *pane, struct hud_graph *gr)
 {
    static const float colors[][3] = {
       {0, 1, 0},
       {1, 0, 0},
       {0, 1, 1},
       {1, 0, 1},
       {1, 1, 0},
-      {0.5, 0.5, 1},
-      {0.5, 0.5, 0.5},
+      {0.5, 1, 0.5},
+      {1, 0.5, 0.5},
+      {0.5, 1, 1},
+      {1, 0.5, 1},
+      {1, 1, 0.5},
+      {0, 0.5, 0},
+      {0.5, 0, 0},
+      {0, 0.5, 0.5},
+      {0.5, 0, 0.5},
+      {0.5, 0.5, 0},
    };
-   char *name = gr->name;
+   unsigned color = pane->num_graphs % ARRAY_SIZE(colors);
 
-   /* replace '-' with a space */
-   while (*name) {
-      if (*name == '-')
-         *name = ' ';
-      name++;
-   }
+   strip_hyphens(gr->name);
 
-   assert(pane->num_graphs < ARRAY_SIZE(colors));
    gr->vertices = MALLOC(pane->max_num_vertices * sizeof(float) * 2);
-   gr->color[0] = colors[pane->num_graphs][0];
-   gr->color[1] = colors[pane->num_graphs][1];
-   gr->color[2] = colors[pane->num_graphs][2];
+   gr->color[0] = colors[color][0];
+   gr->color[1] = colors[color][1];
+   gr->color[2] = colors[color][2];
    gr->pane = pane;
    LIST_ADDTAIL(&gr->head, &pane->graph_list);
    pane->num_graphs++;
 }
 
 void
 hud_graph_add_value(struct hud_graph *gr, uint64_t value)
 {
    gr->current_value = value;
    value = value > gr->pane->ceiling ? gr->pane->ceiling : value;
-- 
2.7.4
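The patch above replaces a hard assert on the number of graphs with a modulo lookup, so a pane with more graphs than palette entries reuses colors instead of indexing past the table. A minimal standalone sketch of that lookup follows; the three-entry palette and the name graph_color are assumptions made for illustration (the patch's table has 15 entries).

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Shortened, hypothetical palette; the patch uses a longer table. */
static const float palette[][3] = {
   {0, 1, 0},
   {1, 0, 0},
   {0, 1, 1},
};

/* Return the RGB triple for the graph at the given position in a pane.
 * Indices past the end of the table wrap around instead of overflowing. */
static const float *
graph_color(unsigned graph_index)
{
   return palette[graph_index % ARRAY_SIZE(palette)];
}
```

With this three-entry table, the fourth graph (index 3) gets the same color as the first, which is the trade-off the patch accepts in exchange for lifting the limit on data sources per pane.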
[Mesa-dev] [PATCH 4/5] gallium/hud: add an option to sort items below graphs
From: Marek Olšák--- src/gallium/auxiliary/hud/hud_context.c | 37 - src/gallium/auxiliary/hud/hud_private.h | 1 + 2 files changed, 33 insertions(+), 5 deletions(-) diff --git a/src/gallium/auxiliary/hud/hud_context.c b/src/gallium/auxiliary/hud/hud_context.c index eefbe60..6892289 100644 --- a/src/gallium/auxiliary/hud/hud_context.c +++ b/src/gallium/auxiliary/hud/hud_context.c @@ -476,21 +476,21 @@ void hud_draw(struct hud_context *hud, struct pipe_resource *tex) { struct cso_context *cso = hud->cso; struct pipe_context *pipe = hud->pipe; struct pipe_framebuffer_state fb; struct pipe_surface surf_templ, *surf; struct pipe_viewport_state viewport; const struct pipe_sampler_state *sampler_states[] = { >font_sampler_state }; struct hud_pane *pane; - struct hud_graph *gr; + struct hud_graph *gr, *next; if (!huds_visible) return; hud->fb_width = tex->width0; hud->fb_height = tex->height0; hud->constants.two_div_fb_width = 2.0f / hud->fb_width; hud->constants.two_div_fb_height = 2.0f / hud->fb_height; cso_save_state(cso, (CSO_BIT_FRAMEBUFFER | @@ -569,20 +569,37 @@ hud_draw(struct hud_context *hud, struct pipe_resource *tex) hud_alloc_vertices(hud, >text, 4 * 1024, 4 * sizeof(float)); /* prepare all graphs */ hud_batch_query_update(hud->batch_query); LIST_FOR_EACH_ENTRY(pane, >pane_list, head) { LIST_FOR_EACH_ENTRY(gr, >graph_list, head) { gr->query_new_value(gr); } + if (pane->sort_items) { + LIST_FOR_EACH_ENTRY_SAFE(gr, next, >graph_list, head) { +/* ignore the last one */ +if (>head == pane->graph_list.prev) + continue; + +/* This is an incremental bubble sort, because we only do one pass + * per frame. It will eventually reach an equilibrium. 
+ */ +if (gr->current_value < +LIST_ENTRY(struct hud_graph, next, head)->current_value) { + LIST_DEL(>head); + LIST_ADD(>head, >head); +} + } + } + hud_pane_accumulate_vertices(hud, pane); } /* unmap the uploader's vertex buffer before drawing */ u_upload_unmap(hud->uploader); /* draw accumulated vertices for background quads */ cso_set_blend(cso, >alpha_blend); cso_set_fragment_shader_handle(hud->cso, hud->fs_color); @@ -754,21 +771,21 @@ hud_pane_update_dyn_ceiling(struct hud_graph *gr, struct hud_pane *pane) /* * Mark this adjustment run so we could avoid repeating a full update * again needlessly in case the pane has more than one graph. */ pane->dyn_ceil_last_ran = gr->index; } static struct hud_pane * hud_pane_create(unsigned x1, unsigned y1, unsigned x2, unsigned y2, unsigned period, uint64_t max_value, uint64_t ceiling, -boolean dyn_ceiling) +boolean dyn_ceiling, boolean sort_items) { struct hud_pane *pane = CALLOC_STRUCT(hud_pane); if (!pane) return NULL; pane->x1 = x1; pane->y1 = y1; pane->x2 = x2; pane->y2 = y2; @@ -776,20 +793,21 @@ hud_pane_create(unsigned x1, unsigned y1, unsigned x2, unsigned y2, pane->inner_x2 = x2 - 1; pane->inner_y1 = y1 + 1; pane->inner_y2 = y2 - 1; pane->inner_width = pane->inner_x2 - pane->inner_x1; pane->inner_height = pane->inner_y2 - pane->inner_y1; pane->period = period; pane->max_num_vertices = (x2 - x1 + 2) / 2; pane->ceiling = ceiling; pane->dyn_ceiling = dyn_ceiling; pane->dyn_ceil_last_ran = 0; + pane->sort_items = sort_items; pane->initial_max_value = max_value; hud_pane_set_max_value(pane, max_value); LIST_INITHEAD(>graph_list); return pane; } /* replace '-' with a space */ static void strip_hyphens(char *s) { @@ -918,21 +936,21 @@ parse_string(const char *s, char *out) fflush(stderr); } return i; } static char * read_pane_settings(char *str, unsigned * const x, unsigned * const y, unsigned * const width, unsigned * const height, uint64_t * const ceiling, boolean * const dyn_ceiling, - boolean *reset_colors) + 
boolean *reset_colors, boolean *sort_items) { char *ret = str; unsigned tmp; while (*str == '.') { ++str; switch (*str) { case 'x': ++str; *x = strtoul(str, , 10); @@ -975,20 +993,26 @@ read_pane_settings(char *str, unsigned * const x, unsigned * const y, ret = str; *dyn_ceiling = true; break; case 'r': ++str; ret = str; *reset_colors = true; break; + case 's': + ++str; + ret = str; + *sort_items = true; + break; + default: fprintf(stderr, "gallium_hud: syntax error: unexpected
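The "incremental bubble sort" mentioned in the patch comment, one pass per drawn frame until the order settles, can be sketched on a plain array. This is not the patch's code: the patch walks a linked list with LIST_FOR_EACH_ENTRY_SAFE, whereas the hypothetical sort_pass_descending below runs a textbook bubble pass over an array. The comparison direction (larger values move toward the front) matches the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Run one bubble-sort pass over values[], moving larger values toward the
 * front. Returns true if any adjacent pair was swapped. */
static bool
sort_pass_descending(uint64_t *values, size_t count)
{
   bool swapped = false;

   for (size_t i = 0; i + 1 < count; i++) {
      if (values[i] < values[i + 1]) {
         uint64_t tmp = values[i];
         values[i] = values[i + 1];
         values[i + 1] = tmp;
         swapped = true;
      }
   }
   return swapped;
}
```

Calling this once per frame costs at most count - 1 comparisons each time. Starting from {1, 2, 3, 4}, three swapping passes bring the array to {4, 3, 2, 1}, after which further passes leave it unchanged until the values themselves change.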
[Mesa-dev] [PATCH 5/5] gallium/hud: increase the vertex buffer size for text
From: Marek Olšák

---
 src/gallium/auxiliary/hud/hud_context.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/hud/hud_context.c b/src/gallium/auxiliary/hud/hud_context.c
index 6892289..50c2f80 100644
--- a/src/gallium/auxiliary/hud/hud_context.c
+++ b/src/gallium/auxiliary/hud/hud_context.c
@@ -559,21 +559,21 @@ hud_draw(struct hud_context *hud, struct pipe_resource *tex)
    cso_set_vertex_elements(cso, 2, hud->velems);
    cso_set_render_condition(cso, NULL, FALSE, 0);
    cso_set_sampler_views(cso, PIPE_SHADER_FRAGMENT, 1,
                          &hud->font_sampler_view);
    cso_set_samplers(cso, PIPE_SHADER_FRAGMENT, 1, sampler_states);
    cso_set_constant_buffer(cso, PIPE_SHADER_VERTEX, 0, &hud->constbuf);
 
    /* prepare vertex buffers */
    hud_alloc_vertices(hud, &hud->bg, 4 * 256, 2 * sizeof(float));
    hud_alloc_vertices(hud, &hud->whitelines, 4 * 256, 2 * sizeof(float));
-   hud_alloc_vertices(hud, &hud->text, 4 * 1024, 4 * sizeof(float));
+   hud_alloc_vertices(hud, &hud->text, 16 * 1024, 4 * sizeof(float));
 
    /* prepare all graphs */
    hud_batch_query_update(hud->batch_query);
 
    LIST_FOR_EACH_ENTRY(pane, &hud->pane_list, head) {
       LIST_FOR_EACH_ENTRY(gr, &pane->graph_list, head) {
          gr->query_new_value(gr);
       }
 
       if (pane->sort_items) {
-- 
2.7.4
[Mesa-dev] [PATCH 1/5] gallium/hud: add an option to rename each data source
From: Marek Olšákuseful for radeonsi performance counters --- src/gallium/auxiliary/hud/hud_context.c | 40 - 1 file changed, 30 insertions(+), 10 deletions(-) diff --git a/src/gallium/auxiliary/hud/hud_context.c b/src/gallium/auxiliary/hud/hud_context.c index 779c116..4c65af3 100644 --- a/src/gallium/auxiliary/hud/hud_context.c +++ b/src/gallium/auxiliary/hud/hud_context.c @@ -782,20 +782,31 @@ hud_pane_create(unsigned x1, unsigned y1, unsigned x2, unsigned y2, pane->max_num_vertices = (x2 - x1 + 2) / 2; pane->ceiling = ceiling; pane->dyn_ceiling = dyn_ceiling; pane->dyn_ceil_last_ran = 0; pane->initial_max_value = max_value; hud_pane_set_max_value(pane, max_value); LIST_INITHEAD(>graph_list); return pane; } +/* replace '-' with a space */ +static void +strip_hyphens(char *s) +{ + while (*s) { + if (*s == '-') + *s = ' '; + s++; + } +} + /** * Add a graph to an existing pane. * One pane can contain multiple graphs over each other. */ void hud_pane_add_graph(struct hud_pane *pane, struct hud_graph *gr) { static const float colors[][3] = { {0, 1, 0}, {1, 0, 0}, @@ -885,21 +896,21 @@ hud_graph_set_dump_file(struct hud_graph *gr) /** * Read a string from the environment variable. * The separators "+", ",", ":", and ";" terminate the string. * Return the number of read characters. 
*/ static int parse_string(const char *s, char *out) { int i; - for (i = 0; *s && *s != '+' && *s != ',' && *s != ':' && *s != ';'; + for (i = 0; *s && *s != '+' && *s != ',' && *s != ':' && *s != ';' && *s != '='; s++, out++, i++) *out = *s; *out = 0; if (*s && !i) { fprintf(stderr, "gallium_hud: syntax error: unexpected '%c' (%i) while " "parsing a string\n", *s, *s); fflush(stderr); } @@ -1164,41 +1175,48 @@ hud_parse_env_var(struct hud_context *hud, const char *env) /* driver queries */ if (!processed) { if (!hud_driver_query_install(>batch_query, pane, hud->pipe, name)) { fprintf(stderr, "gallium_hud: unknown driver query '%s'\n", name); fflush(stderr); } } } - if (*env == ':') { + if (*env == ':' || *env == '=') { + char key = *env; env++; if (!pane) { fprintf(stderr, "gallium_hud: syntax error: unexpected ':', " "expected a name\n"); fflush(stderr); break; } num = parse_string(env, s); env += num; - if (num && sscanf(s, "%u", ) == 1) { -hud_pane_set_max_value(pane, i); -pane->initial_max_value = i; - } - else { -fprintf(stderr, "gallium_hud: syntax error: unexpected '%c' (%i) " -"after ':'\n", *env, *env); -fflush(stderr); + if (key == ':') { +if (num && sscanf(s, "%u", ) == 1) { + hud_pane_set_max_value(pane, i); + pane->initial_max_value = i; +} +else { + fprintf(stderr, "gallium_hud: syntax error: unexpected '%c' (%i) " + "after ':'\n", *env, *env); + fflush(stderr); +} + } else if (key == '=') { +strip_hyphens(s); +strcpy(LIST_ENTRY(struct hud_graph, + pane->graph_list.prev, head)->name, s); } } if (*env == 0) break; /* parse a separator */ switch (*env) { case '+': env++; @@ -1264,20 +1282,22 @@ print_help(struct pipe_screen *screen) puts(""); puts(" Names are identifiers of data sources which will be drawn as graphs"); puts(" in panes. 
Multiple graphs can be drawn in the same pane."); puts(" There can be multiple panes placed in rows and columns."); puts(""); puts(" '+' separates names which will share a pane."); puts(" ':[value]' specifies the initial maximum value of the Y axis"); puts(" for the given pane."); puts(" ',' creates a new pane below the last one."); puts(" ';' creates a new pane at the top of the next column."); + puts(" '=' followed by a string, changes the name of the last data source"); + puts(" to that string"); puts(""); puts(" Example: GALLIUM_HUD=\"cpu,fps;primitives-generated\""); puts(""); puts(" Additionally, by prepending '.[identifier][value]' modifiers to"); puts(" a name, it is possible to explicitly set the location and size"); puts(" of a pane, along with limiting overall maximum value of the"); puts(" Y axis and activating dynamic readjustment of the Y axis."); puts(" Several modifiers may be applied to the same pane simultaneously.");
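How the new '=' separator and the hyphen replacement combine can be sketched outside the HUD code. The helpers below are hypothetical stand-ins modeled on parse_string and strip_hyphens from the patch, and the GALLIUM_HUD value used in the note afterwards is an illustrative example, not one taken from the patch.

```c
#include <assert.h>
#include <string.h>

/* Copy characters from s into out until a separator or the end of the
 * string is reached; return the number of characters copied. The caller
 * must make out large enough (no bounds checking in this sketch). */
static int
parse_token(const char *s, char *out)
{
   int i = 0;

   while (*s && strchr("+,:;=", *s) == NULL) {
      *out++ = *s++;
      i++;
   }
   *out = '\0';
   return i;
}

/* Replace every '-' with a space, in place. */
static void
hyphens_to_spaces(char *s)
{
   for (; *s; s++) {
      if (*s == '-')
         *s = ' ';
   }
}
```

For an input such as "fps=frames-per-second,cpu", the first call stops at '=' and yields the data source name "fps". A second call on the text after '=' stops at ',' and yields "frames-per-second", which becomes the label "frames per second" once the hyphens are replaced.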
[Mesa-dev] [PATCH 4/5] gallium: remove TGSI_OPCODE_ABS
From: Marek OlšákIt's redundant with the source modifier. --- src/gallium/auxiliary/gallivm/lp_bld_tgsi.c| 2 +- src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c | 16 +- src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c| 2 +- src/gallium/auxiliary/nir/tgsi_to_nir.c| 1 - src/gallium/auxiliary/tgsi/tgsi_exec.c | 4 --- src/gallium/auxiliary/tgsi/tgsi_info.c | 2 +- src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h | 1 - src/gallium/auxiliary/tgsi/tgsi_util.c | 1 - src/gallium/drivers/freedreno/a2xx/fd2_compiler.c | 5 src/gallium/drivers/i915/i915_fpc_optimize.c | 1 - src/gallium/drivers/i915/i915_fpc_translate.c | 9 -- src/gallium/drivers/ilo/shader/toy_tgsi.c | 4 --- .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 3 -- src/gallium/drivers/nouveau/nv30/nvfx_fragprog.c | 3 -- src/gallium/drivers/nouveau/nv30/nvfx_vertprog.c | 3 -- src/gallium/drivers/r300/r300_tgsi_to_rc.c | 1 - src/gallium/drivers/r600/r600_shader.c | 9 ++ src/gallium/drivers/radeonsi/si_shader_tgsi_alu.c | 2 -- src/gallium/drivers/svga/svga_tgsi_insn.c | 1 - src/gallium/drivers/svga/svga_tgsi_vgpu10.c| 24 --- src/gallium/include/pipe/p_shader_tokens.h | 2 +- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 35 ++ src/mesa/state_tracker/st_mesa_to_tgsi.c | 6 ++-- 23 files changed, 41 insertions(+), 96 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c index 68ac695..d368f38 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c @@ -353,21 +353,21 @@ lp_build_emit_fetch( assert(0 && "invalid src register in emit_fetch()"); return bld_base->base.undef; } if (reg->Register.Absolute) { switch (stype) { case TGSI_TYPE_FLOAT: case TGSI_TYPE_DOUBLE: case TGSI_TYPE_UNTYPED: /* modifiers on movs assume data is float */ - res = lp_build_emit_llvm_unary(bld_base, TGSI_OPCODE_ABS, res); + res = lp_build_abs(_base->base, res); break; case TGSI_TYPE_UNSIGNED: case TGSI_TYPE_SIGNED: case TGSI_TYPE_UNSIGNED64: 
case TGSI_TYPE_SIGNED64: case TGSI_TYPE_VOID: default: /* abs modifier is only legal on floating point types */ assert(0); break; diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c index 9c6fc4b..7d939e8 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c @@ -492,22 +492,21 @@ static struct lp_build_tgsi_action lit_action = { static void log_emit( const struct lp_build_tgsi_action * action, struct lp_build_tgsi_context * bld_base, struct lp_build_emit_data * emit_data) { LLVMValueRef abs_x, log_abs_x, flr_log_abs_x, ex2_flr_log_abs_x; /* abs( src0.x) */ - abs_x = lp_build_emit_llvm_unary(bld_base, TGSI_OPCODE_ABS, -emit_data->args[0] /* src0.x */); + abs_x = lp_build_abs(_base->base, emit_data->args[0] /* src0.x */); /* log( abs( src0.x ) ) */ log_abs_x = lp_build_emit_llvm_unary(bld_base, TGSI_OPCODE_LG2, abs_x); /* floor( log( abs( src0.x ) ) ) */ flr_log_abs_x = lp_build_emit_llvm_unary(bld_base, TGSI_OPCODE_FLR, log_abs_x); /* dst.x */ emit_data->output[TGSI_CHAN_X] = flr_log_abs_x; @@ -1405,32 +1404,20 @@ lp_set_default_actions(struct lp_build_tgsi_context * bld_base) bld_base->op_actions[TGSI_OPCODE_U642D].emit = u642d_emit; } /* CPU Only default actions */ /* These actions are CPU only, because they could potentially output SSE * intrinsics. 
*/ -/* TGSI_OPCODE_ABS (CPU Only)*/ - -static void -abs_emit_cpu( - const struct lp_build_tgsi_action * action, - struct lp_build_tgsi_context * bld_base, - struct lp_build_emit_data * emit_data) -{ - emit_data->output[emit_data->chan] = lp_build_abs(_base->base, - emit_data->args[0]); -} - /* TGSI_OPCODE_ADD (CPU Only) */ static void add_emit_cpu( const struct lp_build_tgsi_action * action, struct lp_build_tgsi_context * bld_base, struct lp_build_emit_data * emit_data) { emit_data->output[emit_data->chan] = lp_build_add(_base->base, emit_data->args[0], emit_data->args[1]); } @@ -2581,21 +2568,20 @@ u64shr_emit_cpu( LLVMValueRef masked_count = lp_build_and(uint_bld, emit_data->args[1], mask); emit_data->output[emit_data->chan] = lp_build_shr(uint_bld, emit_data->args[0],
[Mesa-dev] [PATCH 5/5] gallium: remove TGSI_OPCODE_SUB
From: Marek OlšákIt's redundant with the source modifier. --- src/gallium/auxiliary/draw/draw_pipe_aaline.c | 2 +- src/gallium/auxiliary/draw/draw_pipe_aapoint.c | 20 ++-- src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c | 38 +++--- src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c| 6 src/gallium/auxiliary/nir/tgsi_to_nir.c| 1 - src/gallium/auxiliary/tgsi/tgsi_aa_point.c | 20 ++-- src/gallium/auxiliary/tgsi/tgsi_exec.c | 4 --- src/gallium/auxiliary/tgsi/tgsi_info.c | 2 +- src/gallium/auxiliary/tgsi/tgsi_lowering.c | 22 - src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h | 1 - src/gallium/auxiliary/tgsi/tgsi_point_sprite.c | 12 +++ src/gallium/auxiliary/tgsi/tgsi_transform.h| 8 +++-- src/gallium/auxiliary/tgsi/tgsi_util.c | 1 - src/gallium/auxiliary/util/u_pstipple.c| 2 +- src/gallium/auxiliary/vl/vl_bicubic_filter.c | 4 +-- src/gallium/auxiliary/vl/vl_compositor.c | 4 +-- src/gallium/auxiliary/vl/vl_deint_filter.c | 8 ++--- src/gallium/drivers/i915/i915_fpc_optimize.c | 1 - src/gallium/drivers/i915/i915_fpc_translate.c | 11 --- src/gallium/drivers/ilo/shader/toy_tgsi.c | 6 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 2 -- src/gallium/drivers/nouveau/nv30/nvfx_fragprog.c | 3 -- src/gallium/drivers/nouveau/nv30/nvfx_vertprog.c | 3 -- src/gallium/drivers/r300/r300_tgsi_to_rc.c | 1 - src/gallium/drivers/r600/r600_shader.c | 14 src/gallium/drivers/svga/svga_tgsi_insn.c | 27 --- src/gallium/drivers/svga/svga_tgsi_vgpu10.c| 25 -- src/gallium/include/pipe/p_shader_tokens.h | 2 +- src/gallium/state_trackers/xa/xa_tgsi.c| 4 +-- src/mesa/state_tracker/st_atifs_to_tgsi.c | 18 +- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 3 +- src/mesa/state_tracker/st_mesa_to_tgsi.c | 6 ++-- src/mesa/state_tracker/st_tgsi_lower_yuv.c | 3 +- 33 files changed, 82 insertions(+), 202 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_pipe_aaline.c b/src/gallium/auxiliary/draw/draw_pipe_aaline.c index c236caa..57ca12e 100644 --- a/src/gallium/auxiliary/draw/draw_pipe_aaline.c +++ 
b/src/gallium/auxiliary/draw/draw_pipe_aaline.c @@ -278,21 +278,21 @@ aa_transform_epilog(struct tgsi_transform_context *ctx) tgsi_transform_op1_inst(ctx, TGSI_OPCODE_MOV, TGSI_FILE_OUTPUT, aactx->colorOutput, TGSI_WRITEMASK_XYZ, TGSI_FILE_TEMPORARY, aactx->colorTemp); /* MUL alpha */ tgsi_transform_op2_inst(ctx, TGSI_OPCODE_MUL, TGSI_FILE_OUTPUT, aactx->colorOutput, TGSI_WRITEMASK_W, TGSI_FILE_TEMPORARY, aactx->colorTemp, - TGSI_FILE_TEMPORARY, aactx->texTemp); + TGSI_FILE_TEMPORARY, aactx->texTemp, false); } } /** * TGSI instruction transform callback. * Replace writes to result.color w/ a temp reg. */ static void aa_transform_inst(struct tgsi_transform_context *ctx, diff --git a/src/gallium/auxiliary/draw/draw_pipe_aapoint.c b/src/gallium/auxiliary/draw/draw_pipe_aapoint.c index 33ef8ec..2b96b8a 100644 --- a/src/gallium/auxiliary/draw/draw_pipe_aapoint.c +++ b/src/gallium/auxiliary/draw/draw_pipe_aapoint.c @@ -206,88 +206,88 @@ aa_transform_prolog(struct tgsi_transform_context *ctx) * t0.x = distance of fragment from center point * t0.y = boolean, is t0.x > 1.0, also misc temp usage * t0.z = temporary for computing 1/(1-k) value * t0.w = final coverage value */ /* MUL t0.xy, tex, tex; # compute x^2, y^2 */ tgsi_transform_op2_inst(ctx, TGSI_OPCODE_MUL, TGSI_FILE_TEMPORARY, tmp0, TGSI_WRITEMASK_XY, TGSI_FILE_INPUT, texInput, - TGSI_FILE_INPUT, texInput); + TGSI_FILE_INPUT, texInput, false); /* ADD t0.x, t0.x, t0.y; # x^2 + y^2 */ tgsi_transform_op2_swz_inst(ctx, TGSI_OPCODE_ADD, TGSI_FILE_TEMPORARY, tmp0, TGSI_WRITEMASK_X, TGSI_FILE_TEMPORARY, tmp0, TGSI_SWIZZLE_X, - TGSI_FILE_TEMPORARY, tmp0, TGSI_SWIZZLE_Y); + TGSI_FILE_TEMPORARY, tmp0, TGSI_SWIZZLE_Y, false); #if NORMALIZE /* OPTIONAL normalization of length */ /* RSQ t0.x, t0.x; */ tgsi_transform_op1_inst(ctx, TGSI_OPCODE_RSQ, TGSI_FILE_TEMPORARY, tmp0, TGSI_WRITEMASK_X, TGSI_FILE_TEMPORARY, tmp0); /* RCP t0.x, t0.x; */
[Mesa-dev] [PATCH 3/5] st/nine: Remove all usage of ureg_SUB in nine_shader
From: Axel DavyThis is required to drop gallium SUB. Signed-off-by: Axel Davy --- src/gallium/state_trackers/nine/nine_shader.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/src/gallium/state_trackers/nine/nine_shader.c b/src/gallium/state_trackers/nine/nine_shader.c index a1e0070..0a75c07 100644 --- a/src/gallium/state_trackers/nine/nine_shader.c +++ b/src/gallium/state_trackers/nine/nine_shader.c @@ -1075,22 +1075,22 @@ tx_src_param(struct shader_translator *tx, const struct sm1_src_param *param) src = ureg_src(tx->regs.address); break; case D3DSPR_MISCTYPE: switch (param->idx) { case D3DSMO_POSITION: if (ureg_src_is_undef(tx->regs.vPos)) tx->regs.vPos = nine_get_position_input(tx); if (tx->shift_wpos) { /* TODO: do this only once */ struct ureg_dst wpos = tx_scratch(tx); - ureg_SUB(ureg, wpos, tx->regs.vPos, -ureg_imm4f(ureg, 0.5f, 0.5f, 0.0f, 0.0f)); + ureg_ADD(ureg, wpos, tx->regs.vPos, +ureg_imm4f(ureg, -0.5f, -0.5f, 0.0f, 0.0f)); src = ureg_src(wpos); } else { src = tx->regs.vPos; } break; case D3DSMO_FACE: if (ureg_src_is_undef(tx->regs.vFace)) { if (tx->face_is_sysval_integer) { tmp = tx_scratch(tx); tx->regs.vFace = @@ -1154,39 +1154,39 @@ tx_src_param(struct shader_translator *tx, const struct sm1_src_param *param) src = ureg_abs(src); break; case NINED3DSPSM_ABSNEG: src = ureg_negate(ureg_abs(src)); break; case NINED3DSPSM_NEG: src = ureg_negate(src); break; case NINED3DSPSM_BIAS: tmp = tx_scratch(tx); -ureg_SUB(ureg, tmp, src, ureg_imm1f(ureg, 0.5f)); +ureg_ADD(ureg, tmp, src, ureg_imm1f(ureg, -0.5f)); src = ureg_src(tmp); break; case NINED3DSPSM_BIASNEG: tmp = tx_scratch(tx); -ureg_SUB(ureg, tmp, ureg_imm1f(ureg, 0.5f), src); +ureg_ADD(ureg, tmp, ureg_imm1f(ureg, 0.5f), ureg_negate(src)); src = ureg_src(tmp); break; case NINED3DSPSM_NOT: if (tx->native_integers) { tmp = tx_scratch(tx); ureg_NOT(ureg, tmp, src); src = ureg_src(tmp); break; } /* fall through */ case NINED3DSPSM_COMP: tmp = tx_scratch(tx); -ureg_SUB(ureg, tmp, 
ureg_imm1f(ureg, 1.0f), src); +ureg_ADD(ureg, tmp, ureg_imm1f(ureg, 1.0f), ureg_negate(src)); src = ureg_src(tmp); break; case NINED3DSPSM_DZ: case NINED3DSPSM_DW: /* Already handled*/ break; case NINED3DSPSM_SIGN: tmp = tx_scratch(tx); ureg_MAD(ureg, tmp, src, ureg_imm1f(ureg, 2.0f), ureg_imm1f(ureg, -1.0f)); src = ureg_src(tmp); @@ -2545,21 +2545,21 @@ DECL_SPECIAL(TEXM3x3SPEC) ureg_DP3(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_X), ureg_src(dst), ureg_src(dst)); ureg_RCP(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_X), ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X)); /* at this step tmp.x = 1/N.N */ ureg_DP3(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_Y), ureg_src(dst), E); /* at this step tmp.y = N.E */ ureg_MUL(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_X), ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_Y)); /* at this step tmp.x = N.E/N.N */ ureg_MUL(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_X), ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), ureg_imm1f(ureg, 2.0f)); ureg_MUL(ureg, tmp, ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), ureg_src(dst)); /* at this step tmp.xyz = 2 * (N.E / N.N) * N */ -ureg_SUB(ureg, tmp, ureg_src(tmp), E); +ureg_ADD(ureg, tmp, ureg_src(tmp), ureg_negate(E)); ureg_TEX(ureg, dst, ps1x_sampler_type(tx->info, m + 2), ureg_src(tmp), sample); return D3D_OK; } DECL_SPECIAL(TEXREG2RGB) { struct ureg_program *ureg = tx->ureg; struct ureg_dst dst = tx_dst_param(tx, >insn.dst[0]); struct ureg_src sample; @@ -2684,21 +2684,21 @@ DECL_SPECIAL(TEXM3x3) ureg_DP3(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_X), ureg_src(dst), ureg_src(dst)); ureg_RCP(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_X), ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X)); /* at this step tmp.x = 1/N.N */ ureg_DP3(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_Y), ureg_src(dst), ureg_src(E)); /* at this step tmp.y = N.E */ ureg_MUL(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_X), ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_X), ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_Y)); /* at this 
step tmp.x = N.E/N.N */ ureg_MUL(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_X),
[Mesa-dev] [PATCH 1/5] st/nine: Do not map SUB and ABS to their gallium equivalent.
From: Axel DavyThis is required for gallium SUB and ABS to be removed. Signed-off-by: Axel Davy --- src/gallium/state_trackers/nine/nine_shader.c | 25 +++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/src/gallium/state_trackers/nine/nine_shader.c b/src/gallium/state_trackers/nine/nine_shader.c index 5effc2c..a1e0070 100644 --- a/src/gallium/state_trackers/nine/nine_shader.c +++ b/src/gallium/state_trackers/nine/nine_shader.c @@ -1573,20 +1573,41 @@ d3dsio_to_string( unsigned opcode ) static HRESULT NineTranslateInstruction_Generic(struct shader_translator *); DECL_SPECIAL(NOP) { /* Nothing to do. NOP was used to avoid hangs * with very old d3d drivers. */ return D3D_OK; } +DECL_SPECIAL(SUB) +{ +struct ureg_program *ureg = tx->ureg; +struct ureg_dst dst = tx_dst_param(tx, >insn.dst[0]); +struct ureg_src src0 = tx_src_param(tx, >insn.src[0]); +struct ureg_src src1 = tx_src_param(tx, >insn.src[1]); + +ureg_ADD(ureg, dst, src0, ureg_negate(src1)); +return D3D_OK; +} + +DECL_SPECIAL(ABS) +{ +struct ureg_program *ureg = tx->ureg; +struct ureg_dst dst = tx_dst_param(tx, >insn.dst[0]); +struct ureg_src src = tx_src_param(tx, >insn.src[0]); + +ureg_MOV(ureg, dst, ureg_abs(src)); +return D3D_OK; +} + DECL_SPECIAL(M4x4) { return NineTranslateInstruction_Mkxn(tx, 4, 4); } DECL_SPECIAL(M4x3) { return NineTranslateInstruction_Mkxn(tx, 4, 3); } @@ -2866,21 +2887,21 @@ DECL_SPECIAL(COMMENT) #define _OPI(o,t,vv1,vv2,pv1,pv2,d,s,h) \ { D3DSIO_##o, TGSI_OPCODE_##t, { vv1, vv2 }, { pv1, pv2, }, d, s, h } struct sm1_op_info inst_table[] = { _OPI(NOP, NOP, V(0,0), V(3,0), V(0,0), V(3,0), 0, 0, SPECIAL(NOP)), /* 0 */ _OPI(MOV, MOV, V(0,0), V(3,0), V(0,0), V(3,0), 1, 1, NULL), _OPI(ADD, ADD, V(0,0), V(3,0), V(0,0), V(3,0), 1, 2, NULL), /* 2 */ -_OPI(SUB, SUB, V(0,0), V(3,0), V(0,0), V(3,0), 1, 2, NULL), /* 3 */ +_OPI(SUB, NOP, V(0,0), V(3,0), V(0,0), V(3,0), 1, 2, SPECIAL(SUB)), /* 3 */ _OPI(MAD, MAD, V(0,0), V(3,0), V(0,0), V(3,0), 1, 3, NULL), /* 4 */ _OPI(MUL, MUL, 
V(0,0), V(3,0), V(0,0), V(3,0), 1, 2, NULL), /* 5 */ _OPI(RCP, RCP, V(0,0), V(3,0), V(0,0), V(3,0), 1, 1, NULL), /* 6 */ _OPI(RSQ, RSQ, V(0,0), V(3,0), V(0,0), V(3,0), 1, 1, SPECIAL(RSQ)), /* 7 */ _OPI(DP3, DP3, V(0,0), V(3,0), V(0,0), V(3,0), 1, 2, NULL), /* 8 */ _OPI(DP4, DP4, V(0,0), V(3,0), V(0,0), V(3,0), 1, 2, NULL), /* 9 */ _OPI(MIN, MIN, V(0,0), V(3,0), V(0,0), V(3,0), 1, 2, NULL), /* 10 */ _OPI(MAX, MAX, V(0,0), V(3,0), V(0,0), V(3,0), 1, 2, NULL), /* 11 */ _OPI(SLT, SLT, V(0,0), V(3,0), V(0,0), V(3,0), 1, 2, NULL), /* 12 */ _OPI(SGE, SGE, V(0,0), V(3,0), V(0,0), V(3,0), 1, 2, NULL), /* 13 */ @@ -2902,21 +2923,21 @@ struct sm1_op_info inst_table[] = _OPI(LOOP,BGNLOOP, V(2,0), V(3,0), V(3,0), V(3,0), 0, 2, SPECIAL(LOOP)), _OPI(RET, RET, V(2,0), V(3,0), V(2,1), V(3,0), 0, 0, SPECIAL(RET)), _OPI(ENDLOOP, ENDLOOP, V(2,0), V(3,0), V(3,0), V(3,0), 0, 0, SPECIAL(ENDLOOP)), _OPI(LABEL, NOP, V(2,0), V(3,0), V(2,1), V(3,0), 0, 1, SPECIAL(LABEL)), _OPI(DCL, NOP, V(0,0), V(3,0), V(0,0), V(3,0), 0, 0, SPECIAL(DCL)), _OPI(POW, POW, V(0,0), V(3,0), V(0,0), V(3,0), 1, 2, SPECIAL(POW)), _OPI(CRS, XPD, V(0,0), V(3,0), V(0,0), V(3,0), 1, 2, NULL), /* XXX: .w */ _OPI(SGN, SSG, V(2,0), V(3,0), V(0,0), V(0,0), 1, 3, SPECIAL(SGN)), /* ignore src1,2 */ -_OPI(ABS, ABS, V(0,0), V(3,0), V(0,0), V(3,0), 1, 1, NULL), +_OPI(ABS, NOP, V(0,0), V(3,0), V(0,0), V(3,0), 1, 1, SPECIAL(ABS)), _OPI(NRM, NOP, V(0,0), V(3,0), V(0,0), V(3,0), 1, 1, SPECIAL(NRM)), /* NRM doesn't fit */ _OPI(SINCOS, SCS, V(2,0), V(2,1), V(2,0), V(2,1), 1, 3, SPECIAL(SINCOS)), _OPI(SINCOS, SCS, V(3,0), V(3,0), V(3,0), V(3,0), 1, 1, SPECIAL(SINCOS)), /* More flow control */ _OPI(REP,NOP,V(2,0), V(3,0), V(2,1), V(3,0), 0, 1, SPECIAL(REP)), _OPI(ENDREP, NOP,V(2,0), V(3,0), V(2,1), V(3,0), 0, 0, SPECIAL(ENDREP)), _OPI(IF, IF, V(2,0), V(3,0), V(2,1), V(3,0), 0, 1, SPECIAL(IF)), _OPI(IFC,IF, V(2,1), V(3,0), V(2,1), V(3,0), 0, 2, SPECIAL(IFC)), -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org 
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
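
The two DECL_SPECIAL handlers in the patch above lower SUB to an ADD with a negated second source, and ABS to a MOV with an absolute-value source modifier. The following is a minimal sketch of that idea on a mock operand type, not Mesa code. `struct mock_src` and every `mock_*` / `lower_*` name are hypothetical, and the modifier order (absolute first, then negate) is an assumption about how such modifiers are usually resolved.

```c
#include <stdbool.h>

/* Mock of a shader source operand with TGSI-style modifiers.
 * Hypothetical type for illustration; not the real struct ureg_src. */
struct mock_src {
   float value;
   bool absolute;   /* take |value| first */
   bool negate;     /* then flip the sign */
};

struct mock_src
mock_src_of(float value)
{
   struct mock_src s = { value, false, false };
   return s;
}

/* Plays the role of ureg_negate(): toggle the negate modifier. */
struct mock_src
mock_negate(struct mock_src s)
{
   s.negate = !s.negate;
   return s;
}

/* Plays the role of ureg_abs(): set the absolute modifier, clear negate
 * (assumed behaviour). */
struct mock_src
mock_abs(struct mock_src s)
{
   s.absolute = true;
   s.negate = false;
   return s;
}

/* Resolve the modifiers the way an ALU would when reading the operand. */
float
mock_read(struct mock_src s)
{
   float v = s.value;
   if (s.absolute && v < 0.0f)
      v = -v;
   return s.negate ? -v : v;
}

/* SUB dst, a, b  ->  ADD dst, a, -b */
float
lower_sub(struct mock_src a, struct mock_src b)
{
   return mock_read(a) + mock_read(mock_negate(b));
}

/* ABS dst, a  ->  MOV dst, |a| */
float
lower_abs(struct mock_src a)
{
   return mock_read(mock_abs(a));
}
```

Because the negation lives on the operand rather than in the opcode, a dedicated SUB opcode becomes redundant, which is presumably why the series can remove it from gallium.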
[Mesa-dev] [PATCH 2/5] st/nine: Remove all usage of ureg_SUB in nine_ff
From: Axel DavyThis is required to remove gallium SUB. Signed-off-by: Axel Davy --- src/gallium/state_trackers/nine/nine_ff.c | 40 +++ 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/src/gallium/state_trackers/nine/nine_ff.c b/src/gallium/state_trackers/nine/nine_ff.c index a0a33cd..7cbe3f7 100644 --- a/src/gallium/state_trackers/nine/nine_ff.c +++ b/src/gallium/state_trackers/nine/nine_ff.c @@ -442,23 +442,23 @@ nine_ff_build_vs(struct NineDevice9 *device, struct vs_build_ctx *vs) ureg_MOV(ureg, oPos, vs->aVtx); } else { struct ureg_dst tmp = ureg_DECL_temporary(ureg); /* vs->aVtx contains the coordinates buffer wise. * later in the pipeline, clipping, viewport and division * by w (rhw = 1/w) are going to be applied, so do the reverse * of these transformations (except clipping) to have the good * position at the end.*/ ureg_MOV(ureg, tmp, vs->aVtx); /* X from [X_min, X_min + width] to [-1, 1], same for Y. Z to [0, 1] */ -ureg_SUB(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_XYZ), ureg_src(tmp), _CONST(101)); +ureg_ADD(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_XYZ), ureg_src(tmp), ureg_negate(_CONST(101))); ureg_MUL(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_XYZ), ureg_src(tmp), _CONST(100)); -ureg_SUB(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_XY), ureg_src(tmp), ureg_imm1f(ureg, 1.0f)); +ureg_ADD(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_XY), ureg_src(tmp), ureg_imm1f(ureg, -1.0f)); /* Y needs to be reversed */ ureg_MOV(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_Y), ureg_negate(ureg_src(tmp))); /* inverse rhw */ ureg_RCP(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_W), _W(tmp)); /* multiply X, Y, Z by w */ ureg_MUL(ureg, ureg_writemask(tmp, TGSI_WRITEMASK_XYZ), ureg_src(tmp), _W(tmp)); ureg_MOV(ureg, oPos, ureg_src(tmp)); ureg_release_temporary(ureg, tmp); } } else if (key->vertexblend) { @@ -504,21 +504,21 @@ nine_ff_build_vs(struct NineDevice9 *device, struct vs_build_ctx *vs) ureg_MAD(ureg, tmp2, _(vs->aNrm), cWM[1], ureg_src(tmp2)); ureg_MAD(ureg, tmp2, 
_(vs->aNrm), cWM[2], ureg_src(tmp2)); } if (i < (key->vertexblend - 1)) { /* accumulate weighted position value */ ureg_MAD(ureg, aVtx_dst, ureg_src(tmp), ureg_scalar(vs->aWgt, i), ureg_src(aVtx_dst)); if (has_aNrm) ureg_MAD(ureg, aNrm_dst, ureg_src(tmp2), ureg_scalar(vs->aWgt, i), ureg_src(aNrm_dst)); /* subtract weighted position value for last value */ -ureg_SUB(ureg, sum_blendweights, ureg_src(sum_blendweights), ureg_scalar(vs->aWgt, i)); +ureg_ADD(ureg, sum_blendweights, ureg_src(sum_blendweights), ureg_negate(ureg_scalar(vs->aWgt, i))); } } /* the last weighted position is always 1 - sum_of_previous_weights */ ureg_MAD(ureg, aVtx_dst, ureg_src(tmp), ureg_scalar(ureg_src(sum_blendweights), key->vertexblend - 1), ureg_src(aVtx_dst)); if (has_aNrm) ureg_MAD(ureg, aNrm_dst, ureg_src(tmp2), ureg_scalar(ureg_src(sum_blendweights), key->vertexblend - 1), ureg_src(aNrm_dst)); /* multiply by VIEW_PROJ */ ureg_MUL(ureg, tmp, _X(aVtx_dst), _CONST(8)); @@ -654,36 +654,36 @@ nine_ff_build_vs(struct NineDevice9 *device, struct vs_build_ctx *vs) ureg_MOV(ureg, ureg_writemask(input_coord, TGSI_WRITEMASK_W), ureg_imm1f(ureg, 1.0f)); dim_input = 4; break; case NINED3DTSS_TCI_CAMERASPACEREFLECTIONVECTOR: tmp.WriteMask = TGSI_WRITEMASK_XYZ; aVtx_normed = ureg_DECL_temporary(ureg); ureg_normalize3(ureg, aVtx_normed, vs->aVtx); ureg_DP3(ureg, tmp_x, ureg_src(aVtx_normed), vs->aNrm); ureg_MUL(ureg, tmp, vs->aNrm, _X(tmp)); ureg_ADD(ureg, tmp, ureg_src(tmp), ureg_src(tmp)); -ureg_SUB(ureg, ureg_writemask(input_coord, TGSI_WRITEMASK_XYZ), ureg_src(aVtx_normed), ureg_src(tmp)); +ureg_ADD(ureg, ureg_writemask(input_coord, TGSI_WRITEMASK_XYZ), ureg_src(aVtx_normed), ureg_negate(ureg_src(tmp))); ureg_MOV(ureg, ureg_writemask(input_coord, TGSI_WRITEMASK_W), ureg_imm1f(ureg, 1.0f)); ureg_release_temporary(ureg, aVtx_normed); dim_input = 4; tmp.WriteMask = TGSI_WRITEMASK_XYZW; break; case NINED3DTSS_TCI_SPHEREMAP: /* Implement the formula of GL_SPHERE_MAP */ tmp.WriteMask = 
TGSI_WRITEMASK_XYZ; aVtx_normed = ureg_DECL_temporary(ureg); tmp2 =
[Mesa-dev] [Bug 99125] Log to a file all GALLIUM_HUD infos
https://bugs.freedesktop.org/show_bug.cgi?id=99125

--- Comment #2 from Edmondo Tommasina ---
FYI: Marek pushed the series of patches to mesa git master.
https://cgit.freedesktop.org/mesa/mesa/commit/?id=3f5fba8a7be61bfc0f46a5ea058108f6e0e1c268
Re: [Mesa-dev] [AppVeyor] mesa master #3012 failed
src\gallium\auxiliary\hud\hud_context.c(874) : warning C4013: 'access' undefined; assuming extern returning int
src\gallium\auxiliary\hud\hud_context.c(874) : error C2065: 'W_OK' : undeclared identifier
scons: *** [build\windows-x86-debug\gallium\auxiliary\hud\hud_context.obj] Error 2

On Sat, Dec 31, 2016 at 6:31 PM, AppVeyor wrote:
> Build mesa 3012 failed
>
> Commit 3f5fba8a7b by Edmondo Tommasina on 12/21/2016 9:58 PM:
> docs: document GALLIUM_HUD_DUMP_DIR envvar\n\nSigned-off-by: Marek Olšák
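
The MSVC errors above arise because `access()` and `W_OK` are POSIX and come from `<unistd.h>`, which the Windows build does not have. One possible way to keep the check portable is sketched below. This is not the fix that was applied in Mesa; `dir_is_writable`, `dump_access` and `DUMP_W_OK` are hypothetical names. It relies on the MSVC CRT's `_access()` from `<io.h>`, where mode 2 tests for write permission.

```c
#include <stdbool.h>
#include <stddef.h>

#ifdef _WIN32
#include <io.h>                 /* _access() */
#define DUMP_W_OK 2             /* MSVC CRT: mode 2 checks write permission */
#define dump_access _access
#else
#include <unistd.h>             /* access(), W_OK */
#define DUMP_W_OK W_OK
#define dump_access access
#endif

/* Hypothetical helper, not part of Mesa: true if `path` exists and the
 * calling process may write to it. */
bool
dir_is_writable(const char *path)
{
   return path != NULL && dump_access(path, DUMP_W_OK) == 0;
}
```

A caller would then write `if (dir_is_writable(hud_dump_dir))` instead of calling `access()` directly.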
[Mesa-dev] [AppVeyor] mesa master #3012 failed
Build mesa 3012 failed

Commit 3f5fba8a7b by Edmondo Tommasina on 12/21/2016 9:58 PM:
docs: document GALLIUM_HUD_DUMP_DIR envvar\n\nSigned-off-by: Marek Olšák
Re: [Mesa-dev] [PATCH 1/6] gallium/hud: dump hud_driver_query values to files
FYI, I've pushed the series and squashed the first 2 patches.

Thanks,
Marek

On Sat, Dec 31, 2016 at 10:15 PM, Marek Olšák wrote:
> On Wed, Dec 21, 2016 at 10:58 PM, Edmondo Tommasina wrote:
>> Dump values for every selected data source in GALLIUM_HUD.
>>
>> Every data source has its own file and the filename is
>> equal to the data source identifier.
>> ---
>>  src/gallium/auxiliary/hud/hud_context.c      | 6 ++
>>  src/gallium/auxiliary/hud/hud_driver_query.c | 2 ++
>>  src/gallium/auxiliary/hud/hud_private.h      | 1 +
>>  3 files changed, 9 insertions(+)
>>
>> diff --git a/src/gallium/auxiliary/hud/hud_context.c
>> b/src/gallium/auxiliary/hud/hud_context.c
>> index ceb157a..edd831a 100644
>> --- a/src/gallium/auxiliary/hud/hud_context.c
>> +++ b/src/gallium/auxiliary/hud/hud_context.c
>> @@ -33,6 +33,7 @@
>>   * Set GALLIUM_HUD=help for more info.
>>   */
>>
>> +#include
>>  #include
>>  #include
>>
>> @@ -829,6 +830,9 @@ hud_graph_add_value(struct hud_graph *gr, uint64_t value)
>>     gr->current_value = value;
>>     value = value > gr->pane->ceiling ? gr->pane->ceiling : value;
>>
>> +   if (gr->fd)
>> +      fprintf(gr->fd, "%" PRIu64 "\n", value);
>> +
>>     if (gr->index == gr->pane->max_num_vertices) {
>>        gr->vertices[0] = 0;
>>        gr->vertices[1] = gr->vertices[(gr->index-1)*2+1];
>> @@ -856,6 +860,8 @@ hud_graph_destroy(struct hud_graph *graph)
>>     FREE(graph->vertices);
>>     if (graph->free_query_data)
>>        graph->free_query_data(graph->query_data);
>> +   if (graph->fd)
>> +      fclose(graph->fd);
>>     FREE(graph);
>>  }
>>
>> diff --git a/src/gallium/auxiliary/hud/hud_driver_query.c
>> b/src/gallium/auxiliary/hud/hud_driver_query.c
>> index 40ea120..bfde16a 100644
>> --- a/src/gallium/auxiliary/hud/hud_driver_query.c
>> +++ b/src/gallium/auxiliary/hud/hud_driver_query.c
>> @@ -378,6 +378,8 @@ hud_pipe_query_install(struct hud_batch_query_context
>> **pbq,
>>        info->result_index = result_index;
>>     }
>>
>> +   gr->fd = fopen(gr->name, "w+");
>
> This opens the file unconditionally. Did you forget to check the env var here?
>
> Marek
Re: [Mesa-dev] [PATCH 2/6] gallium/hud: add dump directory enviroment variable
Ignore my comment on patch 1. This patch can be merged with the first one.

Marek

On Wed, Dec 21, 2016 at 10:58 PM, Edmondo Tommasina wrote:
> Set GALLIUM_HUD_DUMP_DIR to dump values to files in this directory.
>
> No values are dumped if the environment variable is not set, the
> directory doesn't exist or the user doesn't have write access.
> ---
>  src/gallium/auxiliary/hud/hud_driver_query.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/auxiliary/hud/hud_driver_query.c
> b/src/gallium/auxiliary/hud/hud_driver_query.c
> index bfde16a..23fda01 100644
> --- a/src/gallium/auxiliary/hud/hud_driver_query.c
> +++ b/src/gallium/auxiliary/hud/hud_driver_query.c
> @@ -351,6 +351,8 @@ hud_pipe_query_install(struct hud_batch_query_context
> **pbq,
>  {
>     struct hud_graph *gr;
>     struct query_info *info;
> +   const char *hud_dump_dir = getenv("GALLIUM_HUD_DUMP_DIR");
> +   char *dump_file;
>
>     gr = CALLOC_STRUCT(hud_graph);
>     if (!gr)
> @@ -378,7 +380,15 @@ hud_pipe_query_install(struct hud_batch_query_context
> **pbq,
>        info->result_index = result_index;
>     }
>
> -   gr->fd = fopen(gr->name, "w+");
> +   if (hud_dump_dir && access(hud_dump_dir, W_OK) == 0) {
> +      dump_file = malloc(strlen(hud_dump_dir) + sizeof(gr->name));
> +      if (dump_file) {
> +         strcpy(dump_file, hud_dump_dir);
> +         strcat(dump_file, gr->name);
> +         gr->fd = fopen(dump_file, "w+");
> +         free(dump_file);
> +      }
> +   }
>
>     hud_pane_add_graph(pane, gr);
>     pane->type = type; /* must be set before updating the max_value */
> --
> 2.10.0
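
As the quoted hunk reads, the directory and the graph name are concatenated with `strcpy`/`strcat` and no path separator, so the result appears to depend on GALLIUM_HUD_DUMP_DIR ending in a slash. A minimal sketch of an alternative that sizes the buffer exactly and always inserts the separator is shown here. `build_dump_path` is a hypothetical helper, not the code that was pushed, and it assumes a POSIX-style `/` separator.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed "<dir>/<name>" string, or NULL
 * on bad input or allocation failure. The caller frees the result. */
char *
build_dump_path(const char *dir, const char *name)
{
   size_t len;
   char *path;

   if (dir == NULL || name == NULL)
      return NULL;

   len = strlen(dir) + 1 + strlen(name) + 1;   /* dir + '/' + name + '\0' */
   path = malloc(len);
   if (path != NULL)
      snprintf(path, len, "%s/%s", dir, name);

   return path;
}
```

With this shape, the call site reduces to building the path, calling `fopen(path, "w+")`, and freeing the path, regardless of how the user wrote the directory.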
Re: [Mesa-dev] [PATCH 18/27] i965/miptree: Add a return for updating of winsys
On 16-12-31 14:40:42, Ben Widawsky wrote:
> On 16-12-10 15:39:12, Pohjolainen, Topi wrote:
>> On Thu, Dec 01, 2016 at 02:09:59PM -0800, Ben Widawsky wrote:
>
> [snip]
>
>> We don't seem to use "zero for success"-style at least in i965. Could you
>> change this to bool and flip the check earlier for consistency?
>
> What do you mean by flip the check earlier?

nvm. I realized what you meant.

[snip]
Re: [Mesa-dev] [PATCH 18/27] i965/miptree: Add a return for updating of winsys
On 16-12-10 15:39:12, Pohjolainen, Topi wrote: On Thu, Dec 01, 2016 at 02:09:59PM -0800, Ben Widawsky wrote: From: Ben WidawskyThere is nothing particularly useful to do currently if the update fails, but there is no point carrying on either. As a result, this has a behavior change. Signed-off-by: Ben Widawsky --- src/mesa/drivers/dri/i965/brw_context.c | 14 -- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 6 +++--- src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 2 +- 3 files changed, 12 insertions(+), 10 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index b928f94..593fa67 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -1645,9 +1645,10 @@ intel_process_dri2_buffer(struct brw_context *brw, return; } - intel_update_winsys_renderbuffer_miptree(brw, rb, bo, -drawable->w, drawable->h, -buffer->pitch); + if (intel_update_winsys_renderbuffer_miptree(brw, rb, bo, +drawable->w, drawable->h, +buffer->pitch)) + return; if (_mesa_is_front_buffer_drawing(fb) && (buffer->attachment == __DRI_BUFFER_FRONT_LEFT || @@ -1703,9 +1704,10 @@ intel_update_image_buffer(struct brw_context *intel, if (last_mt && last_mt->bo == buffer->bo) return; - intel_update_winsys_renderbuffer_miptree(intel, rb, buffer->bo, -buffer->width, buffer->height, -buffer->pitch); + if (intel_update_winsys_renderbuffer_miptree(intel, rb, buffer->bo, +buffer->width, buffer->height, +buffer->pitch)) + return; if (_mesa_is_front_buffer_drawing(fb) && buffer_type == __DRI_IMAGE_BUFFER_FRONT && diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index d002546..74db507 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -908,7 +908,7 @@ intel_miptree_create_for_image(struct brw_context *intel, * that will contain the actual rendering (which is lazily resolved to * irb->singlesample_mt). 
*/ -void +int We don't seem to use "zero for success"-style at least in i965. Could you change this to bool and flip the check earlier for consistency? What do you mean by flip the check earlier? intel_update_winsys_renderbuffer_miptree(struct brw_context *intel, struct intel_renderbuffer *irb, drm_intel_bo *bo, @@ -974,12 +974,12 @@ intel_update_winsys_renderbuffer_miptree(struct brw_context *intel, irb->mt = multisample_mt; } } - return; + return 0; fail: intel_miptree_release(>singlesample_mt); intel_miptree_release(>mt); - return; + return -1; } struct intel_mipmap_tree* diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h index 7b9a7be..85fe118 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h @@ -726,7 +726,7 @@ intel_miptree_create_for_image(struct brw_context *intel, uint32_t pitch, uint32_t layout_flags); -void +int intel_update_winsys_renderbuffer_miptree(struct brw_context *intel, struct intel_renderbuffer *irb, drm_intel_bo *bo, -- 2.10.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
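
One reading of the review comment above: return `bool` with `true` meaning success, and have the callers test for `!ok` rather than for a nonzero error code. The sketch below illustrates that convention with stand-in functions only. `update_miptree` and `process_buffer` are hypothetical and do not reproduce the real i965 signatures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the miptree update; hypothetical, for illustration only.
 * Returns true on success, false on failure. */
bool
update_miptree(const void *bo)
{
   if (bo == NULL)
      goto fail;

   /* ... the real function would create the miptrees here ... */
   return true;

fail:
   /* ... the real function would release any partial miptrees here ... */
   return false;
}

/* Caller side: the check is "flipped" compared to the int-returning
 * version, bailing out when the update did NOT succeed.
 * Returns true if processing carried on past the update. */
bool
process_buffer(const void *bo)
{
   if (!update_miptree(bo))
      return false;   /* nothing useful to do without a miptree */

   /* ... front-buffer handling would continue here ... */
   return true;
}
```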
Re: [Mesa-dev] [PATCH 1/6] gallium/hud: dump hud_driver_query values to files
On Wed, Dec 21, 2016 at 10:58 PM, Edmondo Tommasina wrote:
> Dump values for every selected data source in GALLIUM_HUD.
>
> Every data source has its own file and the filename is
> equal to the data source identifier.
> ---
>  src/gallium/auxiliary/hud/hud_context.c      | 6 ++
>  src/gallium/auxiliary/hud/hud_driver_query.c | 2 ++
>  src/gallium/auxiliary/hud/hud_private.h      | 1 +
>  3 files changed, 9 insertions(+)
>
> diff --git a/src/gallium/auxiliary/hud/hud_context.c
> b/src/gallium/auxiliary/hud/hud_context.c
> index ceb157a..edd831a 100644
> --- a/src/gallium/auxiliary/hud/hud_context.c
> +++ b/src/gallium/auxiliary/hud/hud_context.c
> @@ -33,6 +33,7 @@
>   * Set GALLIUM_HUD=help for more info.
>   */
>
> +#include
>  #include
>  #include
>
> @@ -829,6 +830,9 @@ hud_graph_add_value(struct hud_graph *gr, uint64_t value)
>     gr->current_value = value;
>     value = value > gr->pane->ceiling ? gr->pane->ceiling : value;
>
> +   if (gr->fd)
> +      fprintf(gr->fd, "%" PRIu64 "\n", value);
> +
>     if (gr->index == gr->pane->max_num_vertices) {
>        gr->vertices[0] = 0;
>        gr->vertices[1] = gr->vertices[(gr->index-1)*2+1];
> @@ -856,6 +860,8 @@ hud_graph_destroy(struct hud_graph *graph)
>     FREE(graph->vertices);
>     if (graph->free_query_data)
>        graph->free_query_data(graph->query_data);
> +   if (graph->fd)
> +      fclose(graph->fd);
>     FREE(graph);
>  }
>
> diff --git a/src/gallium/auxiliary/hud/hud_driver_query.c
> b/src/gallium/auxiliary/hud/hud_driver_query.c
> index 40ea120..bfde16a 100644
> --- a/src/gallium/auxiliary/hud/hud_driver_query.c
> +++ b/src/gallium/auxiliary/hud/hud_driver_query.c
> @@ -378,6 +378,8 @@ hud_pipe_query_install(struct hud_batch_query_context
> **pbq,
>        info->result_index = result_index;
>     }
>
> +   gr->fd = fopen(gr->name, "w+");

This opens the file unconditionally. Did you forget to check the env var here?

Marek
Re: [Mesa-dev] [PATCH 17/27] i965: Create correctly sized mcs for an image
On 16-12-10 15:36:06, Pohjolainen, Topi wrote: On Thu, Dec 01, 2016 at 02:09:58PM -0800, Ben Widawsky wrote: From: Ben WidawskySigned-off-by: Ben Widawsky --- src/mesa/drivers/dri/i965/intel_screen.c | 37 1 file changed, 33 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index 0f19a6e..91eb7ec 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -545,8 +545,11 @@ create_image_with_modifier(struct intel_screen *screen, { uint32_t tiling; unsigned long pitch; + unsigned ccs_height = 0; switch (modifier) { + case /* I915_FORMAT_MOD_CCS */ fourcc_mod_code(INTEL, 4): + ccs_height = ALIGN(DIV_ROUND_UP(height, 16), 32); case I915_FORMAT_MOD_Y_TILED: tiling = I915_TILING_Y; } @@ -554,10 +557,35 @@ create_image_with_modifier(struct intel_screen *screen, /* For now, all modifiers require some tiling */ assert(tiling); + /* +* CCS width is always going to be less than or equal to the image's width. +* All we need to do is make sure we add extra rows (height) for the CCS. +* +* A pair of CCS bits correspond to 8x4 pixels, and must be cacheline +* granularity. Each CCS tile is laid out in 8b strips, which corresponds to +* 1024x512 pixel region. In memory, it looks like the following: +* +* ? +* ??? ??? +* ??? ??? +* ??? ??? +* ??? Image ??? +* ??? ??? +* ??? ??? +* ???x??? +* ? +* ??? ??? | +* ???ccs ??? unused | +* ?---??? I guess this looks okay as actual source code and Mutt just displays it like this for me. It's UTF-8 codes. The problem is more likely your terminal than mutt. I can use ASCII if people prefer. +* <--pitch--> +*/ + unsigned y_tiled_height = ALIGN(height, 32); + cpp = _mesa_get_format_bytes(image->format); - image->bo = drm_intel_bo_alloc_tiled(screen->bufmgr, "image+mod", -width, height, cpp, , -, 0); + image->bo = drm_intel_bo_alloc_tiled(screen->bufmgr, +ccs_height ? 
"image+ccs" : "image", Do want to keep "image+mod" for the non-ccs case? Yes, thanks. This was originally a bit different before as the function was called regardless of whether or not there were modifiers. I'll change it. +width, y_tiled_height + ccs_height, +cpp, , , 0); if (image->bo == NULL) return false; @@ -575,7 +603,8 @@ create_image_with_modifier(struct intel_screen *screen, if (image->planar_format) assert(image->planar_format->nplanes == 1); - image->aux_offset = 0; /* y_tiled_height * pitch; */ + if (ccs_height) + image->aux_offset = y_tiled_height * pitch; Here you set 'aux_offset' relative to the beginning of the actual color region and therefore in the previous patch you shouldn't add mt->offset to it, right? (Just like I wrote in the previous patch, I think mt->offset is only used for moving to subsequent slices within color region). Originally I did add offset here. The formula was: aux_offset = mt->offset + y_tiled_height * pitch; So it was consistent at least with the previous patch. We can finish the discussion of what to do on that patch, and I'll have this part match it. return true; } -- 2.10.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
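
The layout arithmetic in the patch above (main surface rows aligned to 32, CCS rows appended below, aux offset at `y_tiled_height * pitch`) can be checked in isolation. The following is a small sketch that uses only the formulas quoted in the diff. The struct, the function name and the local macro names are hypothetical, and the example pitch is an arbitrary value chosen for illustration.

```c
#include <stdint.h>

/* Local helpers; `a` must be a power of two for ALIGN_POT. */
#define ALIGN_POT(v, a)     (((v) + (a) - 1) & ~((a) - 1))
#define DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))

/* Hypothetical container for the values the patch computes. */
struct ccs_layout {
   uint32_t y_tiled_height;   /* rows occupied by the main surface */
   uint32_t ccs_height;       /* extra rows appended for the CCS */
   uint32_t total_height;     /* height passed to the BO allocation */
   uint64_t aux_offset;       /* byte offset of the CCS within the BO */
};

struct ccs_layout
compute_ccs_layout(uint32_t height, uint32_t pitch)
{
   struct ccs_layout l;

   l.y_tiled_height = ALIGN_POT(height, 32u);
   l.ccs_height = ALIGN_POT(DIV_ROUND_UP(height, 16u), 32u);
   l.total_height = l.y_tiled_height + l.ccs_height;
   l.aux_offset = (uint64_t)l.y_tiled_height * pitch;

   return l;
}
```

For a 1080-row image, this gives 1088 rows for the main surface and 96 rows for the CCS, so the BO is allocated with 1184 rows. With an assumed pitch of 7680 bytes, the aux offset is 1088 * 7680 = 8355840 bytes from the start of the BO.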
Re: [Mesa-dev] [PATCH 00/27] Renderbuffer Decompression (and GBM modifiers)
On 16-12-29 17:34:19, Ben Widawsky wrote: On 16-12-06 13:34:02, Paulo Zanoni wrote: 2016-12-01 20:09 GMT-02:00 Ben Widawsky: From: Ben Widawsky This patch series ultimately adds support within the i965 driver for Renderbuffer Decompression with GBM. In short, this feature reduces memory bandwidth by allowing the GPU to work with losslessly compressed data and having that compression scheme understood by the display engine for decompression. The display engine will decompress on the fly and scanout the image. Quoting from the final patch, the bandwidth savings on a SKL GT4 with a 19x10 display running kmscube: Without compression: Read bandwidth: 603.91 MiB/s Write bandwidth: 615.28 MiB/s With compression: Read bandwidth: 259.34 MiB/s Write bandwidth: 337.83 MiB/s The hardware achieves this savings by maintaining an auxiliary buffer containing "opaque" compression information. It's opaque in the sense that the low level compression scheme is not needed, but, knowledge of the overall layout of the compressed data is required. The auxiliary buffer is created by the driver on behalf of the client when requested. That buffer needs to be passed along wherever the main image's buffer goes. The overall strategy is that the buffer/surface is created with a list of modifiers. The list of modifiers the hardware is capable of using will come from a new kernel API that is aware of the hardware and general constraints. A client will request the list of modifiers and pass it directly back in during buffer creation (potentially the client can prune the list, but as of now there is no reason to.) This new API is being developed by Kristian. I did not get far enough to play with that. For EGL, a similar mechanism would exist whereby when importing a buffer into EGL, one would provide a modifier and probably a pointer to the auxiliary data upon import. (Import therefore might require multiple dma-buf fds), but for i965 and Intel, this wouldn't be necessary. 
Here is a brief description of the series: 1-6 Adds support in GBM for per plane functions where necessary. This is required because the kernel expects the auxiliary buffer to be passed along as a plane. It has its own offset, and stride, and the client shouldn't need to calculate those. 7-9 Adds support in GBM to understand modifiers. When creating a buffer or surface, the client is expected to pass in a list of modifiers that the driver will optimally choose from. As a result of this, the GBM APIs need to support modifiers. 10-12 Support Y-tiled modifier. Y-tiling was already a modifier exposed by the kernel. With the previous patches in place, it's easy to support this too. 13-26 Plumbing to support sending CCS buffers to display. Leveraging much of the existing code for MCS buffers, these patches creating an MCS for the scanout buffer. The trickery here is that a single BO contains both the main surface and the auxiliary data. Previously, auxiliary data always lived in its own BO. 27 Support CCS-modifier. Finally, the code can parse the CCS fb modifier(s) and realize the bandwidth savings that come with it. This was tested using kmscube (https://github.com/bwidawsk/kmscube/tree/modifiers). The kmscube implementation is missing support for GET_PLANE2 - which is currently being worked on by Kristian. Upstream plan: First of all, I'd like to point that I haven't really been following this feature closely, so maybe my questions are irrelevant to this series. But still, I feel I have to poitn these things since maybe they are relevant. Please tell me if I'm not talking about the same thing as you are. The main question is: where's the matching i915.ko series? Shouldn't that be step 0 in your upstream plan? Ville is working on it. All patches except the last can be merged without kernel support. That is assuming that we agree upon the general solution, using the modifiers and having both buffers be part of the same BO. 
There is also a requisite series from Kristian which will allow the client to query per plane modifiers.

I guess this is a lie actually. I depend on fourcc_mod_code(INTEL, 4) being the Y-tiled CCS modifier. I can figure out a way to defer this until the last patch.

I do recall seeing BSpec text containing "do this thing if render decompression is enabled" and, at that time, our code wasn't implementing those instructions. AFAIU, the kernel didn't really have support for render decompression, so its specific bits were just ignored. I was assuming that whoever implemented the feature would add all the necessary bits, especially since we didn't seem to have any sort of "if (has_render_decompression(dev_priv))" to call. I am 100% sure there's such an example in the Gen 9 Watermarks instructions, but I'm sure I saw more somewhere else (Display WA page?). And remember: missing watermarks workarounds equals flickering screens. Is this relevant to your series? How will Mesa be
[Mesa-dev] [AMD] Screen flickering with 4K and RX 480, would be glad to help debugging
Hi!

I've recently bought a 4K display and an RX 480, but I've got some trouble with Dota when using high settings. I've created a thread on the Phoronix forum (because maybe other people have the same problem...):
https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/921416-screen-flickering-with-an-rx-480-dota-2-4k-full-settings

The idea is not to get support (I know this mailing list isn't for that), but to try to find the source of the problem. Maybe it's a hardware problem (that's on my side), but maybe it's a driver problem.

I'd be glad to give you any logs, outputs, or info you ask for. For instance, is there a log somewhere where we can see that the display actually tried to change its resolution?

Thanks!

--
Romain "Creak" Failliot
Re: [Mesa-dev] Patch for freedreno features
I tried to use git send-email but it doesn't seem to work (although the output says otherwise). So eventually it's simpler to just copy/paste the patch generated by git format-patch: --- docs/features.txt | 38 +++--- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/docs/features.txt b/docs/features.txt index c27d521..63b45af 100644 --- a/docs/features.txt +++ b/docs/features.txt @@ -33,7 +33,7 @@ are exposed in the 3.0 context as extensions. Feature Status --- -GL 3.0, GLSL 1.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr +GL 3.0, GLSL 1.30 --- all DONE: freedreno, i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr glBindFragDataLocation, glGetFragDataLocation DONE GL_NV_conditional_render (Conditional rendering) DONE () @@ -60,12 +60,12 @@ GL 3.0, GLSL 1.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, soft glVertexAttribI commands DONE Depth format cube texturesDONE () GLX_ARB_create_context (GLX 1.4 is required) DONE - Multisample anti-aliasing DONE (llvmpipe (*), softpipe (*), swr (*)) + Multisample anti-aliasing DONE (freedreno (*), llvmpipe (*), softpipe (*), swr (*)) -(*) llvmpipe, softpipe, and swr have fake Multisample anti-aliasing support +(*) freedreno, llvmpipe, softpipe, and swr have fake Multisample anti-aliasing support -GL 3.1, GLSL 1.40 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr +GL 3.1, GLSL 1.40 --- all DONE: freedreno, i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe, swr Forward compatible context support/deprecations DONE () GL_ARB_draw_instanced (Instanced drawing) DONE () @@ -82,34 +82,34 @@ GL 3.2, GLSL 1.50 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, soft Core/compatibility profiles DONE Geometry shaders DONE () - GL_ARB_vertex_array_bgra (BGRA vertex order) DONE (swr) - GL_ARB_draw_elements_base_vertex (Base vertex offset) DONE (swr) - GL_ARB_fragment_coord_conventions (Frag shader coord) DONE (swr) - GL_ARB_provoking_vertex 
(Provoking vertex)DONE (swr) - GL_ARB_seamless_cube_map (Seamless cubemaps) DONE (swr) + GL_ARB_vertex_array_bgra (BGRA vertex order) DONE (freedreno, swr) + GL_ARB_draw_elements_base_vertex (Base vertex offset) DONE (freedreno, swr) + GL_ARB_fragment_coord_conventions (Frag shader coord) DONE (freedreno, swr) + GL_ARB_provoking_vertex (Provoking vertex)DONE (freedreno, swr) + GL_ARB_seamless_cube_map (Seamless cubemaps) DONE (freedreno, swr) GL_ARB_texture_multisample (Multisample textures) DONE (swr) - GL_ARB_depth_clamp (Frag depth clamp) DONE (swr) - GL_ARB_sync (Fence objects) DONE (swr) + GL_ARB_depth_clamp (Frag depth clamp) DONE (freedreno, swr) + GL_ARB_sync (Fence objects) DONE (freedreno, swr) GLX_ARB_create_context_profileDONE GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe - GL_ARB_blend_func_extendedDONE (swr) + GL_ARB_blend_func_extendedDONE (freedreno/a3xx, swr) GL_ARB_explicit_attrib_location DONE (all drivers that support GLSL) - GL_ARB_occlusion_query2 DONE (swr) + GL_ARB_occlusion_query2 DONE (freedreno, swr) GL_ARB_sampler_objectsDONE (all drivers) - GL_ARB_shader_bit_encodingDONE (swr) - GL_ARB_texture_rgb10_a2ui DONE (swr) - GL_ARB_texture_swizzleDONE (swr) + GL_ARB_shader_bit_encodingDONE (freedreno, swr) + GL_ARB_texture_rgb10_a2ui DONE (freedreno, swr) + GL_ARB_texture_swizzleDONE (freedreno, swr) GL_ARB_timer_queryDONE (swr) - GL_ARB_instanced_arrays DONE (swr) - GL_ARB_vertex_type_2_10_10_10_rev DONE (swr) + GL_ARB_instanced_arrays DONE (freedreno, swr) + GL_ARB_vertex_type_2_10_10_10_rev DONE (freedreno, swr) GL 4.0, GLSL 4.00 --- all DONE: i965/gen8+, nvc0, r600, radeonsi - GL_ARB_draw_buffers_blend DONE (i965/gen6+, nv50, llvmpipe, softpipe, swr) + GL_ARB_draw_buffers_blend DONE (freedreno, i965/gen6+, nv50, llvmpipe, softpipe,
Re: [Mesa-dev] Patch for freedreno features
I'll try to do the git patch!

I know features.txt isn't the official support source and is more for the devs to keep track of their work, so it's really up to you if you want to add freedreno to features.txt. I simply don't have a device for each driver, which is why I'm parsing features.txt in mesamatrix.

brb with a patch! Thanks!

2016-12-31 12:08 GMT-05:00 Rob Clark:
> hey, I don't suppose you could send a git patch? I can push (although
> tbh glxinfo is the authoritative source when it comes to which
> extensions are supported on which generations of adreno)
>
> BR,
> -R
Re: [Mesa-dev] Patch for freedreno features
hey, I don't suppose you could send a git patch? I can push (although tbh glxinfo is the authoritative source when it comes to which extensions are supported on which generations of adreno)

BR,
-R

On Fri, Dec 30, 2016 at 2:09 PM, Romain Failliot wrote:
> Hi!
>
> There's a patch by Rob Clark that sits in bugzilla for while now:
> https://bugs.freedesktop.org/show_bug.cgi?id=95460
>
> I've just updated it to HEAD. It would be nice to merge it, especially
> since there hasn't been much changes in features.txt for a while.
>
> Cheers!
>
> --
> Romain "Creak" Failliot
[Mesa-dev] [Bug 99237] Impossible to create transparent X11/EGL windows while respecting EGL_NATIVE_VISUAL_ID
https://bugs.freedesktop.org/show_bug.cgi?id=99237

--- Comment #1 from nfx...@gmail.com ---
> Second it has some code that explicitly excludes RGBA X visuals,

Looking at the code I linked again, this is wrong. But if xcb_depth_iterator_t returns the RGB visual before the RGBA one, the RGBA one will of course never be added. I have no idea how the iterator sorts visuals, though. On my system, the only 32 bit (i.e. RGBA) visual is the very last one according to glxinfo and xdpyinfo.
[Mesa-dev] [Bug 99237] Impossible to create transparent X11/EGL windows while respecting EGL_NATIVE_VISUAL_ID
https://bugs.freedesktop.org/show_bug.cgi?id=99237 Bug ID: 99237 Summary: Impossible to create transparent X11/EGL windows while respecting EGL_NATIVE_VISUAL_ID Product: Mesa Version: 13.0 Hardware: Other OS: All Status: NEW Severity: normal Priority: medium Component: EGL Assignee: mesa-dev@lists.freedesktop.org Reporter: nfx...@gmail.com QA Contact: mesa-dev@lists.freedesktop.org eglGetConfigs() never lists EGLConfigs that point to RGBA X visuals in their EGL_NATIVE_VISUAL_ID attribute. If you create an EGL context and X11 window the "proper" way, i.e. use eglGetConfigs() or eglChooseConfig() and then create a window with the visual noted in the chosen EGLConfig's EGL_NATIVE_VISUAL_ID attribute, the resulting window's GL-rendered contents will never be alpha-blended by the compositor. nVidia's EGL implementation on the other hand does list EGLConfigs with RGBA visuals. (They have two almost identical EGLConfigs following each other: first one with a RGB X visual, then with a RGBA one. All fields except EGL_CONFIG_ID and EGL_NATIVE_VISUAL_ID are the same.) I think that Mesa should do it the same way, and that not doing this is a bug and/or a missing feature that should be added. Seems like this is done explicitly in dri2_x11_add_configs_for_visuals: https://cgit.freedesktop.org/mesa/mesa/tree/src/egl/drivers/dri2/platform_x11.c#n752 First, it allows only 1 config/visual per color class. It really should add two visuals at least for TrueColor visuals (a RGB and then a RGBA one). Second it has some code that explicitly excludes RGBA X visuals, with comments about how applications don't want alpha-composited windows. I'm not completely aware why this can't just be done via the EGL_ALPHA_SIZE attribute (if it's 0, select a RGB-only config), but maybe there are reasons. (Wayland apparently does not care about those reasons, and if you get a RGBA config, your window is always alpha-composited, but I didn't double-check this.) 
nVidia/EGL and GLX avoid such problems by always listing a config backed by a RGB X visual first. You could also argue that the API user should just use a RGBA visual when creating the X window, either by completely ignoring the chosen EGLConfig's EGL_NATIVE_VISUAL_ID attribute, or by attempting to select a "compatible" one. However, not all drivers allow using a different X visual (even if it's compatible). The nVidia driver is one such driver. The EGL spec also explicitly states that it's implementation specific whether another underlying pixel format can be used (and mention different X visual IDs as example). I'm sure application developers should be spared from trying to support both Mesa and nVidia in the hard way by having to implement both methods. While I'm personally less than enchanted for needing platform-specific code in the EGL config selection just to get X11 transparency, nVidia's idea seems to be the best way to implement it. It's similar to how GLX behaves, backwards-compatible, and the application does not need to try to use different visuals for window creation. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
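
The behaviour the report asks for can be illustrated with a toy model: if the config list contains an RGB-backed entry first and an otherwise identical RGBA-backed entry after it, an application that simply takes the first match still gets an opaque window, while one that wants transparency can find the RGBA entry. The sketch below uses mock data only and makes no EGL calls. `struct mock_config`, its fields and `pick_config` are hypothetical, and visual depth 24 versus 32 is used as a simplified stand-in for "RGB visual" versus "RGBA visual".

```c
#include <stddef.h>

/* Simplified stand-in for the attributes discussed in the report;
 * not the real EGLConfig. */
struct mock_config {
   int config_id;             /* plays the role of EGL_CONFIG_ID */
   int alpha_size;            /* plays the role of EGL_ALPHA_SIZE */
   int native_visual_depth;   /* depth of the X visual behind
                               * EGL_NATIVE_VISUAL_ID: 24 = RGB, 32 = RGBA */
};

/* Return the first config whose native visual matches the request,
 * or NULL if the list has none. */
const struct mock_config *
pick_config(const struct mock_config *configs, size_t count,
            int want_transparent)
{
   int wanted_depth = want_transparent ? 32 : 24;

   for (size_t i = 0; i < count; i++) {
      if (configs[i].native_visual_depth == wanted_depth)
         return &configs[i];
   }
   return NULL;
}
```

With a list shaped like the one described for the nVidia driver (RGB entry, then RGBA entry), both requests succeed. With a list that contains only RGB-backed entries, as reported for Mesa 13.0, the transparent request returns NULL, which is the situation the bug describes.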
[Mesa-dev] [Bug 99010] --disable-gallium-llvm no longer recognized
https://bugs.freedesktop.org/show_bug.cgi?id=99010

Jonathan Gray changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |j...@openbsd.org