Re: [Mesa-dev] [PATCH] glsl: Use ir_var_temporary when generating inline functions.
Reviewed-by: Iago Toral QuirogaOn Mon, 2016-12-19 at 15:29 -0800, Kenneth Graunke wrote: > We were using ir_var_auto for the inlined function parameter > variables, > which is wrong, as it suggests that those are real variables declared > by the program. > > Normally this doesn't matter. However, if you called built-ins at > global scope, it would pollute the global variable namespace with > these new parameter temporaries. If the shader already had variables > with those names, the linker might see contradictory global variable > declarations and raise an error. > > Making them temporaries indicates that these are just things > generated > by the compiler internally. This avoids confusing the linker. > > Fixes a new Piglit test: glsl-fs-multiple-builtins. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99097 > Reported-by: Niels Ole Salscheider > Signed-off-by: Kenneth Graunke > --- > src/compiler/glsl/opt_function_inlining.cpp | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/compiler/glsl/opt_function_inlining.cpp > b/src/compiler/glsl/opt_function_inlining.cpp > index 62c1f4b..78a726b 100644 > --- a/src/compiler/glsl/opt_function_inlining.cpp > +++ b/src/compiler/glsl/opt_function_inlining.cpp > @@ -164,7 +164,7 @@ ir_call::generate_inline(ir_instruction *next_ir) > parameters[i] = NULL; > } else { > parameters[i] = sig_param->clone(ctx, ht); > - parameters[i]->data.mode = ir_var_auto; > + parameters[i]->data.mode = ir_var_temporary; > > /* Remove the read-only decoration because we're going to > write > * directly to this variable. If the cloned variable is > left ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] gallium: add renderonly library
2016-12-19 22:50 GMT+01:00 Thierry Reding: > On Mon, Dec 19, 2016 at 08:54:04PM +0100, Christian Gmeiner wrote: >> 2016-12-19 14:08 GMT+01:00 Thierry Reding : >> > On Wed, Nov 30, 2016 at 02:44:34PM +0100, Christian Gmeiner wrote: > [...] >> >> GALLIUM_WINSYS_CFLAGS = \ >> >> -I$(top_srcdir)/src \ >> >> -I$(top_srcdir)/include \ >> >> diff --git a/src/gallium/auxiliary/Makefile.am >> >> b/src/gallium/auxiliary/Makefile.am >> >> index 4a4a4fb..6b63cf1 100644 >> >> --- a/src/gallium/auxiliary/Makefile.am >> >> +++ b/src/gallium/auxiliary/Makefile.am >> >> @@ -20,6 +20,16 @@ libgallium_la_SOURCES = \ >> >> $(NIR_SOURCES) \ >> >> $(GENERATED_SOURCES) >> >> >> >> +if HAVE_LIBDRM >> >> + >> >> +AM_CFLAGS += \ >> >> + $(LIBDRM_CFLAGS) >> > >> > Same here. >> >> I just redone what other have done in that file. There are some if's >> in that file and I am unsure >> what to do here. > > Maybe Emil has a strong opinion here, I was really just nit-picking, so > feel free to leave this. > :) >> >> diff --git a/src/gallium/auxiliary/Makefile.sources >> >> b/src/gallium/auxiliary/Makefile.sources >> >> index 5d4fe30..8d3e4a9 100644 >> >> --- a/src/gallium/auxiliary/Makefile.sources >> >> +++ b/src/gallium/auxiliary/Makefile.sources >> >> @@ -435,3 +435,7 @@ GALLIVM_SOURCES := \ >> >> draw/draw_llvm_sample.c \ >> >> draw/draw_pt_fetch_shade_pipeline_llvm.c \ >> >> draw/draw_vs_llvm.c >> >> + >> >> +RENDERONLY_SOURCES := \ >> >> + renderonly/renderonly.c \ >> >> + renderonly/renderonly.h >> >> diff --git a/src/gallium/auxiliary/renderonly/renderonly.c >> >> b/src/gallium/auxiliary/renderonly/renderonly.c >> >> new file mode 100644 >> >> index 000..c4ea784 >> >> --- /dev/null >> >> +++ b/src/gallium/auxiliary/renderonly/renderonly.c >> >> @@ -0,0 +1,199 @@ >> >> +/* >> >> + * Copyright (C) 2016 Christian Gmeiner >> >> + * >> >> + * Permission is hereby granted, free of charge, to any person obtaining >> >> a >> >> + * copy of this software and associated documentation files (the >> >> "Software"), >> >> + * to deal in the Software without restriction, including without >> >> limitation >> >> + * the rights to use, copy, modify, merge, publish, distribute, >> >> sublicense, >> >> + * and/or sell copies of the Software, and to permit persons to whom the >> >> + * Software is furnished to do so, subject to the following conditions: >> >> + * >> >> + * The above copyright notice and this permission notice (including the >> >> next >> >> + * paragraph) shall be included in all copies or substantial portions of >> >> the >> >> + * Software. >> >> + * >> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, >> >> EXPRESS OR >> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF >> >> MERCHANTABILITY, >> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT >> >> SHALL >> >> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR >> >> OTHER >> >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, >> >> ARISING FROM, >> >> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER >> >> DEALINGS IN THE >> >> + * SOFTWARE. >> >> + * >> >> + * Authors: >> >> + *Christian Gmeiner >> >> + */ >> >> + >> >> +#include "renderonly/renderonly.h" >> > >> > This include is oddly placed. Should it not go below all other includes? >> > >> >> I think thats a matter of style but I can puit it above #include >> "state_tracker/drm_driver.h" > > I prefer having these in hierarchical order. That is, anything from the > C runtime goes first, then other system includes, followed by project > core includes and finally driver-local includes. By that ordering > renderonly.h would be very last in line. Not sure if anyone in Mesa is > pedantic enough to insist on a common style, so feel free to leave this > as-is if you prefer. > Okay.. I think this could be fixed after it went in. >> >> +struct pipe_screen * >> >> +renderonly_screen_create(int fd, const struct renderonly_ops *ops, void >> >> *priv) >> > >> > This is slightly nit-picky, but I found it confusing when first reading >> > through the patches: I know this started out as an effort to support the >> > various render-only devices out there, but what you're creating here is >> > really an abstraction for the scanout engine. Reading renderonly_* the >> > first thing I think of is that this creates a render-only device, where >> > in fact is creates the scanout engine. >> > >> > Perhaps renaming this to struct scanout, struct scanout_engine or any >> > other number of possibilities would remove that confusion. >> > >> >> In my current version I have only one struct left: >> >> struct renderonly { >>int (*tiling)(int fd, uint32_t handle); /**< for !intermediate_rendering >> */ >>bool intermediate_rendering; >>
[Mesa-dev] [PATCH] egl/dri2: implement query surface hook
This makes better guarantee that the values we return are in sync what the underlying drawable currently has. Together with dEQP change bug #98327 this fixes following test: dEQP-EGL.functional.resize.surface_size.grow Signed-off-by: Tapani PälliBugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98327 --- src/egl/drivers/dri2/platform_x11.c | 30 ++ 1 file changed, 30 insertions(+) diff --git a/src/egl/drivers/dri2/platform_x11.c b/src/egl/drivers/dri2/platform_x11.c index db7d3b9..0c5d577 100644 --- a/src/egl/drivers/dri2/platform_x11.c +++ b/src/egl/drivers/dri2/platform_x11.c @@ -395,6 +395,34 @@ dri2_x11_destroy_surface(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *surf) } /** + * Function utilizes swrastGetDrawableInfo to get surface + * geometry from x server and calls default query surface + * implementation that returns the updated values. + * + * In case of errors we still return values that we currently + * have. + */ +static EGLBoolean +dri2_query_surface(_EGLDriver *drv, _EGLDisplay *dpy, + _EGLSurface *surf, EGLint attribute, + EGLint *value) +{ + struct dri2_egl_display *dri2_dpy = dri2_egl_display(dpy); + struct dri2_egl_surface *dri2_surf = dri2_egl_surface(surf); + int x, y, w = -1, h = -1; + + __DRIdrawable *drawable = dri2_dpy->vtbl->get_dri_drawable(surf); + swrastGetDrawableInfo(drawable, , , , , dri2_surf); + + if (w != -1 && h != -1) { + surf->Width = w; + surf->Height = h; + } + + return _eglQuerySurface(drv, dpy, surf, attribute, value); +} + +/** * Process list of buffer received from the server * * Processes the list of buffers received in a reply from the server to either @@ -1113,6 +1141,7 @@ static struct dri2_egl_display_vtbl dri2_x11_swrast_display_vtbl = { .post_sub_buffer = dri2_fallback_post_sub_buffer, .copy_buffers = dri2_x11_copy_buffers, .query_buffer_age = dri2_fallback_query_buffer_age, + .query_surface = dri2_query_surface, .create_wayland_buffer_from_image = dri2_fallback_create_wayland_buffer_from_image, .get_sync_values = dri2_fallback_get_sync_values, .get_dri_drawable = dri2_surface_get_dri_drawable, @@ -1132,6 +1161,7 @@ static struct dri2_egl_display_vtbl dri2_x11_display_vtbl = { .post_sub_buffer = dri2_x11_post_sub_buffer, .copy_buffers = dri2_x11_copy_buffers, .query_buffer_age = dri2_fallback_query_buffer_age, + .query_surface = dri2_query_surface, .create_wayland_buffer_from_image = dri2_fallback_create_wayland_buffer_from_image, .get_sync_values = dri2_x11_get_sync_values, .get_dri_drawable = dri2_surface_get_dri_drawable, -- 2.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0
On Mon, 2016-12-19 at 11:31 -0600, Matt Turner wrote: > On Mon, Dec 19, 2016 at 2:00 AM, Samuel Iglesias Gonsálvez >wrote: > > Hello Matt, > > > > We have done most of the suggestions you made to our patches. > > However, > > we have replied to some of your questions/suggestions and we are > > waiting for a reply before marking them as R-b or not. > > Thank you guys so much. > > > You can clone the new version of the patch series by running this > > command: > > > > $ git clone -b i965-fp64-gen7-scalar-vec4-rc3 https://github.com/Ig > > alia > > /mesa.git > > > > Below is the list of patches that need a R-b (they are marked as > > UNREVIEWED in the branch). > > > > * i965/vec4: implement hardware workaround for align16 double to > > float > > conversion > > > > > > This always seemed like a really strange hardware bug, and > > > > one > > > that no one should ever hit. > > > > > > I'd prefer that, instead of loading an immediate double and > > > then > > > performing a conversion to float, that we just convert the > > > double to float in the compiler and emit an instruction to > > > > load > > > that. > > > > > > > We have done this. Does this change get your R-b? > > Yes! > > > > > * i965/vec4: fix optimize predicate for doubles > > > > We have replied here [0]. > > Sounds good to me. > > > > > * i965/vec4: handle 32 and 64 bit channels in liveness analysis > > > > It is still unreviewed. Maybe Curro can take a look at it. > > I've also pinged Curro to ask if he'll review it. > > > * i965/vec4: add a SIMD lowering pass > > > > Replied here [1]. > > Silly messy hardware. :) > > > * i965/vec4: Prevent copy propagation from violating pre-gen8 > > restrictions > > > > Replied here [1]. > > > > * i965/vec4: run scalarize_df() after spilling > > > > Replied here [1]. > > Makes sense. > > Yes, all of those should be > > Reviewed-by: Matt Turner > > Again, thank you so much. This was a large amount of work, and the > way > you guys handled it was extremely impressive. I'm only sorry that the > review of your work wasn't executed as well as your actual work! > Thanks to you for the review! :-) Sam signature.asc Description: This is a digitally signed message part ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] new EGL internal SyncSurface hook
Forget about this ... with some more coffee I realized I can just go and implement the geometry fetching in QuerySurface hook for dri2 x11, will send a patch for that! Sorry about the noise! On 12/19/2016 02:36 PM, Tapani Pälli wrote: Hi; I've been investigating dEQP EGL bugs. Following test seems to fail now and then and only on X11 when using DRI2: dEQP-EGL.functional.resize.surface_size.grow This patch is RFC for a 'SyncSurface' API that will query surface geometry (xcb_get_geometry) before returning current stored values in '_eglQuerySurface'. Without this we might have wrong values in place, I believe this is related to following comments in the source code .. --- 8< --- src/mesa/drivers/dri/i965/brw_context.c: /* GLX uses DRI2 invalidate events to handle window resizing. * Unfortunately, EGL does not - libEGL is written in XCB (not Xlib), * which doesn't provide a mechanism for snooping the event queues. * * So EGL still relies on viewport hacks to handle window resizing. * This should go away with DRI3000. */ src/egl/drivers/dri2/platform_x11.c: /* Since we aren't watching for the server's invalidate events like we're * supposed to (due to XCB providing no mechanism for filtering the events * the way xlib does), and SwapBuffers is a common cause of invalidate * events, just shove one down to the driver, even though we haven't told * the driver that we're the kind of loader that provides reliable * invalidate events. This causes the driver to request buffers again at * its next draw, so that we get the correct buffers if a pageflip * happened. The driver should still be using the viewport hack to catch * window resizes. */ --- 8< --- This patch together with a dEQP change I've attached to bug #98327 makes this test pass consistently for me on 2 machines, one is running Fedora with DRI3 and another one Ubuntu with DRI2. Is this OK approach? Should the name be something else like SyncSurfaceGeometry or GetSurfaceGeometry? Any comments appreciated; Tapani Pälli (1): egl: syncronize surface information with driver RFC src/egl/drivers/dri2/egl_dri2.c | 12 src/egl/drivers/dri2/egl_dri2.h | 2 ++ src/egl/drivers/dri2/platform_x11.c | 31 +++ src/egl/main/eglapi.h | 2 ++ src/egl/main/eglfallbacks.c | 1 + src/egl/main/eglsurface.c | 3 +++ 6 files changed, 51 insertions(+) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] vulkan/wsi/x11: don't crash on null wsi x11 connection
Acked-by: Edward O'CallaghanOn 12/20/2016 02:59 PM, Arda Coskunses wrote: > Without this check driver crash when application window > closed unexpectedly. > --- > src/vulkan/wsi/wsi_common_x11.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/src/vulkan/wsi/wsi_common_x11.c b/src/vulkan/wsi/wsi_common_x11.c > index 25ba0c1..afb7809 100644 > --- a/src/vulkan/wsi/wsi_common_x11.c > +++ b/src/vulkan/wsi/wsi_common_x11.c > @@ -261,6 +261,11 @@ VkBool32 > wsi_get_physical_device_xcb_presentation_support( > struct wsi_x11_connection *wsi_conn = >wsi_x11_get_connection(wsi_device, alloc, connection); > > + if (!wsi_conn) { > + fprintf(stderr, "vulkan: wsi connection lost\n"); > + return false; > + } > + > if (!wsi_conn->has_dri3) { >fprintf(stderr, "vulkan: No DRI3 support\n"); >return false; > signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nir/algebraic: Add optimizations for "a == a && a CMP b"
This sequence shows up The Talos Principal, at least under Vulkan, and prevents loop analysis from properly computing trip counts in a few loops. --- src/compiler/nir/nir_opt_algebraic.py | 8 1 file changed, 8 insertions(+) diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py index 698ac67..cc70ad5 100644 --- a/src/compiler/nir/nir_opt_algebraic.py +++ b/src/compiler/nir/nir_opt_algebraic.py @@ -464,6 +464,14 @@ def bitfield_reverse(u): optimizations += [(bitfield_reverse('x@32'), ('bitfield_reverse', 'x'))] +# For any comparison operation, "cmp", if you have "a != a && a cmp b" then +# the "a != a" is redundant because it's equivalent to "a is not NaN" and, if +# a is a NaN then the second comparison will fail anyway. +for op in ['flt', 'fge', 'feq', 'fne']: + optimizations += [ + (('iand', ('feq', a, a), (op, a, b)), (op, a, b)), + (('iand', ('feq', a, a), (op, b, a)), (op, b, a)), + ] # Add optimizations to handle the case where the result of a ternary is # compared to a constant. This way we can take things like -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 99076] dEQP-GLES3.functional.negative_api.texture#teximage3d fails due to wrong Error code
https://bugs.freedesktop.org/show_bug.cgi?id=99076 Gary Wangchanged: What|Removed |Added CC||gary.c.w...@intel.com -- You are receiving this mail because: You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] i965: make use of nir_lower_returns() for GL
Fixes two new piglit tests: spec/glsl-1.10/execution/vs-nested-return-sibling-loop.shader_test spec/glsl-1.10/execution/vs-nested-return-sibling-loop2.shader_test shader-db results for BDW: total instructions in shared programs: 12903158 -> 12903134 (-0.00%) instructions in affected programs: 27100 -> 27076 (-0.09%) helped: 32 HURT: 6 total cycles in shared programs: 294922518 -> 294922804 (0.00%) cycles in affected programs: 4372828 -> 4373114 (0.01%) helped: 31 HURT: 8 --- src/mesa/drivers/dri/i965/brw_link.cpp | 6 -- src/mesa/drivers/dri/i965/brw_program.c | 2 ++ 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_link.cpp b/src/mesa/drivers/dri/i965/brw_link.cpp index 6f37428..19db982 100644 --- a/src/mesa/drivers/dri/i965/brw_link.cpp +++ b/src/mesa/drivers/dri/i965/brw_link.cpp @@ -160,12 +160,6 @@ process_glsl_ir(struct brw_context *brw, brw_do_vector_splitting(shader->ir); } - progress = do_lower_jumps(shader->ir, true, true, -true, /* main return */ -false, /* continue */ -false /* loops */ -) || progress; - progress = do_common_optimization(shader->ir, true, true, options, ctx->Const.NativeIntegers) || progress; } while (progress); diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c index a502b8e..c4ab5ee 100644 --- a/src/mesa/drivers/dri/i965/brw_program.c +++ b/src/mesa/drivers/dri/i965/brw_program.c @@ -78,6 +78,8 @@ brw_create_nir(struct brw_context *brw, if (shader_prog) { nir = glsl_to_nir(shader_prog, stage, options); nir_remove_dead_variables(nir, nir_var_shader_in | nir_var_shader_out); + nir_lower_returns(nir); + nir_validate_shader(nir); NIR_PASS_V(nir, nir_lower_io_to_temporaries, nir_shader_get_entrypoint(nir), true, false); } else { -- 2.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] nir: update nir_lower_returns to only predicate instructions when needed
Unless an if statement contains nested returns we can simply add any following instructions to the branch without the return. V2: fix handling if_nested_return value when there is a sibling if/loop that doesn't contain a return. (Spotted by Ken) --- src/compiler/nir/nir_lower_returns.c | 37 ++-- 1 file changed, 31 insertions(+), 6 deletions(-) diff --git a/src/compiler/nir/nir_lower_returns.c b/src/compiler/nir/nir_lower_returns.c index cf49d5b..5eec984 100644 --- a/src/compiler/nir/nir_lower_returns.c +++ b/src/compiler/nir/nir_lower_returns.c @@ -30,6 +30,8 @@ struct lower_returns_state { struct exec_list *cf_list; nir_loop *loop; nir_variable *return_flag; + /* Are there other return statments nested in the current if */ + bool if_nested_return; }; static bool lower_returns_in_cf_list(struct exec_list *cf_list, @@ -82,8 +84,10 @@ lower_returns_in_loop(nir_loop *loop, struct lower_returns_state *state) * flag set to true. We need to predicate everything following the loop * on the return flag. */ - if (progress) + if (progress) { predicate_following(>cf_node, state); + state->if_nested_return = true; + } return progress; } @@ -91,10 +95,13 @@ lower_returns_in_loop(nir_loop *loop, struct lower_returns_state *state) static bool lower_returns_in_if(nir_if *if_stmt, struct lower_returns_state *state) { - bool progress; + bool progress, then_progress; - progress = lower_returns_in_cf_list(_stmt->then_list, state); - progress = lower_returns_in_cf_list(_stmt->else_list, state) || progress; + bool if_nested_return = state->if_nested_return; + state->if_nested_return = false; + + then_progress = lower_returns_in_cf_list(_stmt->then_list, state); + progress = lower_returns_in_cf_list(_stmt->else_list, state) || then_progress; /* If either of the recursive calls made progress, then there were * returns inside of the body of the if. If we're in a loop, then these @@ -106,8 +113,25 @@ lower_returns_in_if(nir_if *if_stmt, struct lower_returns_state *state) * after a return, we need to predicate everything following on the * return flag. */ - if (progress && !state->loop) - predicate_following(_stmt->cf_node, state); + if (progress && !state->loop) { + if (state->if_nested_return) { + predicate_following(_stmt->cf_node, state); + } else { + /* If there are no nested returns we can just add the instructions to + * the end of the branch that doesn't have the return. + */ + nir_cf_list list; + nir_cf_extract(, nir_after_cf_node(_stmt->cf_node), +nir_after_cf_list(state->cf_list)); + + if (then_progress) +nir_cf_reinsert(, nir_after_cf_list(_stmt->else_list)); + else +nir_cf_reinsert(, nir_after_cf_list(_stmt->then_list)); + } + } + + state->if_nested_return = progress || if_nested_return; return progress; } @@ -221,6 +245,7 @@ nir_lower_returns_impl(nir_function_impl *impl) state.cf_list = >body; state.loop = NULL; state.return_flag = NULL; + state.if_nested_return = false; nir_builder_init(, impl); bool progress = lower_returns_in_cf_list(>body, ); -- 2.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/6] nir: A couple of SPIR-V focused loop optimizations
While working with Tim on loop unrolling, I realized that the new NIR loop unrolling pass couldn't actually unroll any loops coming out of SPIR-V thanks to the way we handle continues. This little series adds two relatively small optimization passes that deal with this by peeling apart ifs at the tops of loops that take one branch on the first iteration and the other branch on all subsequent operations. Jason Ekstrand (6): nir: Add a couple quick-and-dirty out-of-SSA helpers nir: Correctly handle blocks in cf_node_cf_tree_next nir: Expose function_impl versions of copy-prop and dce nir: Add an optimization pass to remove trivial continues nir: Add a pass for moving SPIR-V continue blocks to the ends of loops i965: Use nir_opt_trivial_continues and nir_opt_if src/compiler/Makefile.sources| 2 + src/compiler/nir/nir.c | 2 +- src/compiler/nir/nir.h | 9 + src/compiler/nir/nir_from_ssa.c | 189 ++-- src/compiler/nir/nir_opt_copy_propagate.c| 2 +- src/compiler/nir/nir_opt_dce.c | 2 +- src/compiler/nir/nir_opt_if.c| 253 +++ src/compiler/nir/nir_opt_trivial_continues.c | 141 +++ src/mesa/drivers/dri/i965/brw_nir.c | 2 + 9 files changed, 587 insertions(+), 15 deletions(-) create mode 100644 src/compiler/nir/nir_opt_if.c create mode 100644 src/compiler/nir/nir_opt_trivial_continues.c -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/6] nir: Expose function_impl versions of copy-prop and dce
--- src/compiler/nir/nir.h| 2 ++ src/compiler/nir/nir_opt_copy_propagate.c | 2 +- src/compiler/nir/nir_opt_dce.c| 2 +- 3 files changed, 4 insertions(+), 2 deletions(-) diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index 0fd6a77..89b0c70 100644 --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h @@ -2545,10 +2545,12 @@ bool nir_opt_constant_folding(nir_shader *shader); bool nir_opt_global_to_local(nir_shader *shader); +bool nir_copy_prop_impl(nir_function_impl *impl); bool nir_copy_prop(nir_shader *shader); bool nir_opt_cse(nir_shader *shader); +bool nir_opt_dce_impl(nir_function_impl *impl); bool nir_opt_dce(nir_shader *shader); bool nir_opt_dead_cf(nir_shader *shader); diff --git a/src/compiler/nir/nir_opt_copy_propagate.c b/src/compiler/nir/nir_opt_copy_propagate.c index c26e07f..3cd65a2 100644 --- a/src/compiler/nir/nir_opt_copy_propagate.c +++ b/src/compiler/nir/nir_opt_copy_propagate.c @@ -240,7 +240,7 @@ copy_prop_if(nir_if *if_stmt) return copy_prop_src(_stmt->condition, NULL, if_stmt); } -static bool +bool nir_copy_prop_impl(nir_function_impl *impl) { bool progress = false; diff --git a/src/compiler/nir/nir_opt_dce.c b/src/compiler/nir/nir_opt_dce.c index 5cefba3..2d2ec29 100644 --- a/src/compiler/nir/nir_opt_dce.c +++ b/src/compiler/nir/nir_opt_dce.c @@ -128,7 +128,7 @@ init_block(nir_block *block, struct exec_list *worklist) return true; } -static bool +bool nir_opt_dce_impl(nir_function_impl *impl) { struct exec_list *worklist = rzalloc(NULL, struct exec_list); -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/6] nir: Add a pass for moving SPIR-V continue blocks to the ends of loops
When shaders come in from SPIR-V, we handle continue blocks by placing the contents of the continue inside of a "if (!first_iteration)". We do this so that we can properly handle the fact that continues in SPIR-V jump to the continue block at the end of the loop rather than jumping directly to the top of the loop like they do in NIR. In particular, the increment step of a simple for loop ends up in the continue block. This pass looks for this case in loops that don't actually have any continues and moves the continue contents to the end of the loop instead. We need this because loop unrolling doesn't work if the increment is inside of a condition. --- src/compiler/Makefile.sources | 1 + src/compiler/nir/nir.h| 2 + src/compiler/nir/nir_opt_if.c | 253 ++ 3 files changed, 256 insertions(+) create mode 100644 src/compiler/nir/nir_opt_if.c diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources index a0abede..6f701af 100644 --- a/src/compiler/Makefile.sources +++ b/src/compiler/Makefile.sources @@ -239,6 +239,7 @@ NIR_FILES = \ nir/nir_opt_dead_cf.c \ nir/nir_opt_gcm.c \ nir/nir_opt_global_to_local.c \ + nir/nir_opt_if.c \ nir/nir_opt_loop_unroll.c \ nir/nir_opt_peephole_select.c \ nir/nir_opt_remove_phis.c \ diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index a6c8956..7b582a6 100644 --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h @@ -2557,6 +2557,8 @@ bool nir_opt_dead_cf(nir_shader *shader); bool nir_opt_gcm(nir_shader *shader, bool value_number); +bool nir_opt_if(nir_shader *shader); + bool nir_opt_loop_unroll(nir_shader *shader, nir_variable_mode indirect_mask); bool nir_opt_peephole_select(nir_shader *shader, unsigned limit); diff --git a/src/compiler/nir/nir_opt_if.c b/src/compiler/nir/nir_opt_if.c new file mode 100644 index 000..5590684 --- /dev/null +++ b/src/compiler/nir/nir_opt_if.c @@ -0,0 +1,253 @@ +/* + * Copyright © 2016 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "nir.h" +#include "nir_control_flow.h" + +/** + * This optimization detects if statements at the tops of loops where the + * condition is a phi node of two constants and moves half of the if to above + * the loop and the other half of the if to the end of the loop. A simple for + * loop "for (int i = 0; i < 4; i++)", when run through the SPIR-V front-end, + * ends up looking something like this: + * + * vec1 32 ssa_0 = load_const (0x) + * vec1 32 ssa_1 = load_const (0x) + * loop { + *block block_1: + *vec1 32 ssa_2 = phi block_0: ssa_0, block_7: ssa_5 + *vec1 32 ssa_3 = phi block_0: ssa_0, block_7: ssa_1 + *if ssa_2 { + * block block_2: + * vec1 32 ssa_4 = load_const (0x0001) + * vec1 32 ssa_5 = iadd ssa_2, ssa_4 + *} else { + * block block_3: + *} + *block block_4: + *vec1 32 ssa_6 = load_const (0x0004) + *vec1 32 ssa_7 = ilt ssa_5, ssa_6 + *if ssa_7 { + * block block_5: + *} else { + * block block_6: + * break + *} + *block block_7: + * } + * + * This turns it into something like this: + * + * // Stuff from block 1 + * // Stuff from block 3 + * loop { + *block block_1: + *vec1 32 ssa_3 = phi block_0: ssa_0, block_7: ssa_1 + *vec1 32 ssa_6 = load_const (0x0004) + *vec1 32 ssa_7 = ilt ssa_5, ssa_6 + *if ssa_7 { + * block block_5: + *} else { + * block block_6: + * break + *} + *block block_7: + *// Stuff from block 1 + *// Stuff from block 2 + *vec1 32 ssa_4 = load_const (0x0001) + *vec1 32 ssa_5 = iadd ssa_2, ssa_4 + * } + */ +static bool +opt_peel_loop_initial_if(nir_loop *loop) +{ + nir_block *header_block = nir_loop_first_block(loop); + nir_block *prev_block = +
[Mesa-dev] [PATCH 2/6] nir: Correctly handle blocks in cf_node_cf_tree_next
--- src/compiler/nir/nir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c index e522a67..b8d1abd 100644 --- a/src/compiler/nir/nir.c +++ b/src/compiler/nir/nir.c @@ -1754,7 +1754,7 @@ nir_block *nir_cf_node_cf_tree_last(nir_cf_node *node) nir_block *nir_cf_node_cf_tree_next(nir_cf_node *node) { if (node->type == nir_cf_node_block) - return nir_cf_node_cf_tree_first(nir_cf_node_next(node)); + return nir_block_cf_tree_next(nir_cf_node_as_block(node)); else if (node->type == nir_cf_node_function) return NULL; else -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/6] i965: Use nir_opt_trivial_continues and nir_opt_if
--- src/mesa/drivers/dri/i965/brw_nir.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c index 0c1fb44..a091861 100644 --- a/src/mesa/drivers/dri/i965/brw_nir.c +++ b/src/mesa/drivers/dri/i965/brw_nir.c @@ -429,6 +429,8 @@ nir_optimize(nir_shader *nir, const struct brw_compiler *compiler, OPT(nir_opt_algebraic); OPT(nir_opt_constant_folding); OPT(nir_opt_dead_cf); + OPT(nir_opt_trivial_continues); + OPT(nir_opt_if); if (nir->options->max_unroll_iterations != 0) { OPT(nir_opt_loop_unroll, indirect_mask); } -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/6] nir: Add an optimization pass to remove trivial continues
--- src/compiler/Makefile.sources| 1 + src/compiler/nir/nir.h | 2 + src/compiler/nir/nir_opt_trivial_continues.c | 141 +++ 3 files changed, 144 insertions(+) create mode 100644 src/compiler/nir/nir_opt_trivial_continues.c diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources index ae3e5f0..a0abede 100644 --- a/src/compiler/Makefile.sources +++ b/src/compiler/Makefile.sources @@ -242,6 +242,7 @@ NIR_FILES = \ nir/nir_opt_loop_unroll.c \ nir/nir_opt_peephole_select.c \ nir/nir_opt_remove_phis.c \ + nir/nir_opt_trivial_continues.c \ nir/nir_opt_undef.c \ nir/nir_phi_builder.c \ nir/nir_phi_builder.h \ diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index 89b0c70..a6c8956 100644 --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h @@ -2563,6 +2563,8 @@ bool nir_opt_peephole_select(nir_shader *shader, unsigned limit); bool nir_opt_remove_phis(nir_shader *shader); +bool nir_opt_trivial_continues(nir_shader *shader); + bool nir_opt_undef(nir_shader *shader); bool nir_opt_conditional_discard(nir_shader *shader); diff --git a/src/compiler/nir/nir_opt_trivial_continues.c b/src/compiler/nir/nir_opt_trivial_continues.c new file mode 100644 index 000..ff3205d --- /dev/null +++ b/src/compiler/nir/nir_opt_trivial_continues.c @@ -0,0 +1,141 @@ +/* + * Copyright © 2016 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "nir.h" + +static bool +instr_is_continue(nir_instr *instr) +{ + if (instr->type != nir_instr_type_jump) + return false; + + return nir_instr_as_jump(instr)->type == nir_jump_continue; +} + +static bool +lower_trivial_continues_block(nir_block *block, nir_loop *loop) +{ + bool progress = false; + nir_instr *first_instr = nir_block_first_instr(block); + if (!first_instr || instr_is_continue(first_instr)) { + /* The block contains only a continue if anything */ + nir_cf_node *prev_node = nir_cf_node_prev(>cf_node); + if (prev_node && prev_node->type == nir_cf_node_if) { + nir_if *prev_if = nir_cf_node_as_if(prev_node); + progress |= lower_trivial_continues_block( +nir_if_last_then_block(prev_if), loop); + progress |= lower_trivial_continues_block( +nir_if_last_else_block(prev_if), loop); + } + + /* The lower_phis_to_regs helper is smart enough to push phi sources as + * far up if-ladders as it possibly can. In other words, the recursive + * calls above shouldn't be adding instructions to this block since it + * follows an if statement. + */ + first_instr = nir_block_first_instr(block); + assert(!first_instr || instr_is_continue(first_instr)); + } + + nir_instr *last_instr = nir_block_last_instr(block); + if (!last_instr || !instr_is_continue(last_instr)) + return progress; + + /* We're at the end of a block that goes straight through to the end of the +* loop and continues to the top. We can exit normally instead of jump. +*/ + nir_lower_phis_to_regs_block(nir_loop_first_block(loop)); + nir_instr_remove(last_instr); + return true; +} + +static bool +lower_trivial_continues_list(struct exec_list *cf_list, + bool list_ends_at_loop_tail, + nir_loop *loop) +{ + bool progress = false; + foreach_list_typed(nir_cf_node, cf_node, node, cf_list) { + bool at_loop_tail = list_ends_at_loop_tail && + _node->node == exec_list_get_tail(cf_list); + switch (cf_node->type) { + case nir_cf_node_block: + break; + + case nir_cf_node_if: { + nir_if *nif = nir_cf_node_as_if(cf_node); + if (lower_trivial_continues_list(>then_list, at_loop_tail, loop)) +progress = true; +
[Mesa-dev] [PATCH 1/6] nir: Add a couple quick-and-dirty out-of-SSA helpers
These are designed for use within an optimization pass when SSA becomes more pain than it's worth. They're very naive and don't generate anything close to optimal register-based NIR. Also, they may result in shaders which do not validate because of, for instance, registers in phi sources. However, the register-based into-SSA pass should be pretty efficient at cleaning up the mess. --- src/compiler/nir/nir.h | 3 + src/compiler/nir/nir_from_ssa.c | 189 +--- 2 files changed, 180 insertions(+), 12 deletions(-) diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index 51bc6b2..0fd6a77 100644 --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h @@ -2536,6 +2536,9 @@ void nir_convert_loop_to_lcssa(nir_loop *loop); */ void nir_convert_from_ssa(nir_shader *shader, bool phi_webs_only); +bool nir_lower_phis_to_regs_block(nir_block *block); +bool nir_lower_ssa_defs_to_regs_block(nir_block *block); + bool nir_opt_algebraic(nir_shader *shader); bool nir_opt_algebraic_late(nir_shader *shader); bool nir_opt_constant_folding(nir_shader *shader); diff --git a/src/compiler/nir/nir_from_ssa.c b/src/compiler/nir/nir_from_ssa.c index 15e89a3..603d1a5 100644 --- a/src/compiler/nir/nir_from_ssa.c +++ b/src/compiler/nir/nir_from_ssa.c @@ -447,6 +447,19 @@ aggressive_coalesce_block(nir_block *block, struct from_ssa_state *state) return true; } +static nir_register * +create_reg_for_ssa_def(nir_ssa_def *def, nir_function_impl *impl) +{ + nir_register *reg = nir_local_reg_create(impl); + + reg->name = def->name; + reg->num_components = def->num_components; + reg->bit_size = def->bit_size; + reg->num_array_elems = 0; + + return reg; +} + static bool rewrite_ssa_def(nir_ssa_def *def, void *void_state) { @@ -463,13 +476,8 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state) * the things in the merge set should be the same so it doesn't * matter which node's definition we use. */ - if (node->set->reg == NULL) { - node->set->reg = nir_local_reg_create(state->impl); - node->set->reg->name = def->name; - node->set->reg->num_components = def->num_components; - node->set->reg->bit_size = def->bit_size; - node->set->reg->num_array_elems = 0; - } + if (node->set->reg == NULL) + node->set->reg = create_reg_for_ssa_def(def, state->impl); reg = node->set->reg; } else { @@ -482,11 +490,7 @@ rewrite_ssa_def(nir_ssa_def *def, void *void_state) if (def->parent_instr->type == nir_instr_type_load_const) return true; - reg = nir_local_reg_create(state->impl); - reg->name = def->name; - reg->num_components = def->num_components; - reg->bit_size = def->bit_size; - reg->num_array_elems = 0; + reg = create_reg_for_ssa_def(def, state->impl); } nir_ssa_def_rewrite_uses(def, nir_src_for_reg(reg)); @@ -810,3 +814,164 @@ nir_convert_from_ssa(nir_shader *shader, bool phi_webs_only) nir_convert_from_ssa_impl(function->impl, phi_webs_only); } } + + +static void +place_phi_read(nir_shader *shader, nir_register *reg, + nir_ssa_def *def, nir_block *block) +{ + if (block != def->parent_instr->block) { + /* Try to go up the single-successor tree */ + bool all_single_successors = true; + struct set_entry *entry; + set_foreach(block->predecessors, entry) { + nir_block *pred = (nir_block *)entry->key; + if (pred->successors[0] && pred->successors[1]) { +all_single_successors = false; +break; + } + } + + if (all_single_successors) { + /* All predecessors of this block have exactly one successor and it + * is this block so they must eventually lead here without + * intersecting each other. Place the reads in the predecessors + * instead of this block. + */ + set_foreach(block->predecessors, entry) +place_phi_read(shader, reg, def, (nir_block *)entry->key); + return; + } + } + + nir_alu_instr *mov = nir_alu_instr_create(shader, nir_op_imov); + mov->src[0].src = nir_src_for_ssa(def); + mov->dest.dest = nir_dest_for_reg(reg); + mov->dest.write_mask = (1 << reg->num_components) - 1; + nir_instr_insert(nir_after_block_before_jump(block), >instr); +} + +/** Lower all of the phi nodes in a block to imov's to and from a register + * + * This provides a very quick-and-dirty out-of-SSA pass that you can run on a + * single block to convert all of it's phis to a register and some imov's. + * The code that is generated, while not optimal for actual codegen in a + * back-end, is easy to generate, correct, and will turn into the same set of + * phis after you call regs_to_ssa and do some copy propagation. + * + * The one intelligent thing this pass does is that it places the moves from + * the phi sources as high up the predecessor tree
Re: [Mesa-dev] [PATCH] glsl: Fix program interface queries relating to interface blocks.
On Monday, December 19, 2016 8:41:17 PM PST Matt Turner wrote: > On Mon, Dec 19, 2016 at 4:13 PM, Kenneth Graunke> wrote: > > For what it's worth, the OpenGL wiki's Program Introspection page(*), > > under "Interface block member naming" gives an example matching my above > > reply. It says: > > > > uniform BlockName3 > > { > > int mem; > > } instanceName3[4]; > > > > This definition will create a single member named "BlockName3.min". > > The reason this array of four blocks only counts as having one > > variable is because each of the four blocks uses the same internal > > definition. There is nothing that could be queried from > > BlockName3[1] that could not be queried from BlockName3[0]. > > > > (*) https://www.khronos.org/opengl/wiki/Program_Introspection > > > > I think that's a decent explanation of why this is reasonable. Because > > the block entries have per-element entries (Block[0], Block[1], etc. > > you can query whether a block is referenced (i.e. a UBO binding is used). > > This might be a stupid question, but why is the field named "mem" in > the code, but the following paragraph says "min"? Are the two not > supposed to be the same? Oh. I think that's a typo. It should be BlockName3.mem. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] vulkan/wsi/x11: don't crash on null wsi x11 connection
Without this check driver crash when application window closed unexpectedly. --- src/vulkan/wsi/wsi_common_x11.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/vulkan/wsi/wsi_common_x11.c b/src/vulkan/wsi/wsi_common_x11.c index 25ba0c1..afb7809 100644 --- a/src/vulkan/wsi/wsi_common_x11.c +++ b/src/vulkan/wsi/wsi_common_x11.c @@ -261,6 +261,11 @@ VkBool32 wsi_get_physical_device_xcb_presentation_support( struct wsi_x11_connection *wsi_conn = wsi_x11_get_connection(wsi_device, alloc, connection); + if (!wsi_conn) { + fprintf(stderr, "vulkan: wsi connection lost\n"); + return false; + } + if (!wsi_conn->has_dri3) { fprintf(stderr, "vulkan: No DRI3 support\n"); return false; -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3] compiler/glsl: fix precision problem of tanh
Am 20.12.2016 um 00:12 schrieb Giuseppe Bilotta: > Hello, > > I realize that I'm a little late to comment about this, but I think > the formula used for > tanh should be changed again. Specifically, as suggested by Roland > > On Fri, Dec 9, 2016 at 5:41 AM, Roland Scheideggerwrote: >> btw I'm wondering if some vendors wouldn't implement that with slightly >> simplified formula, e.g. (e^2x - 1) / (e^2x + 1) (this is what nvidia >> used for cg apparently according to docs, saving one of the >> exponentials). Might be worse for accuracy though (and won't solve this >> problem, though it would now only need a one-sided clamp). It was changed to this formula. > > Another option is the 1 - 2/(1+expf(2x)), or even better 1 - > 2/(2+expm1f(2x)).. I've run some tests and this seems to have the same > accuracy as the > one mentioned by Roland, with the bonus benefit of not needing any > clamping. The accuracy seems to actually be better > than the direct evaluation (difference over sum of exps), except > around 0 (say, when abs(x) < 1). The 1 - 2/(1+expf(2x)) is worse for numbers close to zero (probably provably so, I think you might have one bit more to play with there with the other formula due to the division by essentially 2). e.g. if you have 8e-8, libm tanhf() gives me 8e-8 as a result (it looks like it's actually hard-coded to return the input as result for sufficiently small values), the (e^2x - 1) / (e^2x + 1) formula gives 5.960464e-08 whereas 1 - 2/(1+expf(2x)) will give you back 0.0f (but, with even smaller values like 2e-8, both methods will return 0.0f which is pretty wrong in any case, the relative error can get to enormous levels there). I'm not sure which method is better for larger values, I think they might be about the same. Nvidia docs stating they use the slightly more complex formula for cg though may be a hint that this indeed has some properties which are nice-to-have. Though arguably it's not that more complex, since the only part it saves is the one-sided clamp - the most expensive parts are the exp and the div, neither of which you can get rid of. Not sure it really matters though one way or another. If you wanted good accuracy around 0, you'd have to use a different formula plus a select (seems like libm implementations actually use 3 cases depending on input value magnitude - not so hot with vectors, but thankfully glsl doesn't require 1 ULP accuracy). Roland > > I've found the relative error away from 0 to be typically in the same > order of magnitude as the error in tanhf() itself (compared to tanh()) > , and generally less than machine epsilon.. I'm currently looking at > options to improve the accuracy without clamping and without excessive > additional computations, might propose a patch in the next couple of > days. > > Just one question though —not knowing much of the shader language, can > I expect expm1 to be available? > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: Elminate the open-coded version of process_block_array_leaf
Reviewed-by: Timothy Arceri___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: Fix program interface queries relating to interface blocks.
On Mon, Dec 19, 2016 at 4:13 PM, Kenneth Graunkewrote: > For what it's worth, the OpenGL wiki's Program Introspection page(*), > under "Interface block member naming" gives an example matching my above > reply. It says: > > uniform BlockName3 > { > int mem; > } instanceName3[4]; > > This definition will create a single member named "BlockName3.min". > The reason this array of four blocks only counts as having one > variable is because each of the four blocks uses the same internal > definition. There is nothing that could be queried from > BlockName3[1] that could not be queried from BlockName3[0]. > > (*) https://www.khronos.org/opengl/wiki/Program_Introspection > > I think that's a decent explanation of why this is reasonable. Because > the block entries have per-element entries (Block[0], Block[1], etc. > you can query whether a block is referenced (i.e. a UBO binding is used). This might be a stupid question, but why is the field named "mem" in the code, but the following paragraph says "min"? Are the two not supposed to be the same? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radeonsi: Bugfix needed for hashcat
Hashcat needs MAX_GLOBAL_BUFFERS to be 21 or even 22 for some modes. It'll crash otherwise. I'm adding an assert to see if programs need it to be even higher. Signed-off-by: Christian Inci--- src/gallium/drivers/radeonsi/si_compute.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c index 9d83cb3a..9bad34ed 100644 --- a/src/gallium/drivers/radeonsi/si_compute.c +++ b/src/gallium/drivers/radeonsi/si_compute.c @@ -32,7 +32,7 @@ #include "si_pipe.h" #include "sid.h" -#define MAX_GLOBAL_BUFFERS 20 +#define MAX_GLOBAL_BUFFERS 22 struct si_compute { unsigned ir_type; @@ -195,6 +195,7 @@ static void si_set_global_binding( unsigned i; struct si_context *sctx = (struct si_context*)ctx; struct si_compute *program = sctx->cs_shader_state.program; + assert(n <= MAX_GLOBAL_BUFFERS); if (!resources) { for (i = first; i < first + n; i++) { -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] glsl: Elminate the open-coded version of process_block_array_leaf
From: Ian RomanickSigned-off-by: Ian Romanick --- src/compiler/glsl/link_uniform_blocks.cpp | 40 +++ 1 file changed, 9 insertions(+), 31 deletions(-) diff --git a/src/compiler/glsl/link_uniform_blocks.cpp b/src/compiler/glsl/link_uniform_blocks.cpp index 3ed3556..ba01269 100644 --- a/src/compiler/glsl/link_uniform_blocks.cpp +++ b/src/compiler/glsl/link_uniform_blocks.cpp @@ -205,7 +205,7 @@ struct block { bool has_instance_name; }; -static void process_block_array_leaf(char **name, gl_uniform_block *blocks, +static void process_block_array_leaf(const char *name, gl_uniform_block *blocks, ubo_visitor *parcel, gl_uniform_buffer_variable *variables, const struct link_uniform_block_active *const b, @@ -241,7 +241,7 @@ process_block_array(struct uniform_block_array_elements *ub_array, char **name, parcel, variables, b, block_index, binding_offset, ctx, prog, first_index); } else { - process_block_array_leaf(name, blocks, + process_block_array_leaf(*name, blocks, parcel, variables, b, block_index, binding_offset, *block_index - first_index, ctx, prog); @@ -250,7 +250,7 @@ process_block_array(struct uniform_block_array_elements *ub_array, char **name, } static void -process_block_array_leaf(char **name, +process_block_array_leaf(const char *name, gl_uniform_block *blocks, ubo_visitor *parcel, gl_uniform_buffer_variable *variables, const struct link_uniform_block_active *const b, @@ -261,7 +261,7 @@ process_block_array_leaf(char **name, unsigned i = *block_index; const glsl_type *type = b->type->without_array(); - blocks[i].Name = ralloc_strdup(blocks, *name); + blocks[i].Name = ralloc_strdup(blocks, name); blocks[i].Uniforms = [(*parcel).index]; /* The ARB_shading_language_420pack spec says: @@ -278,7 +278,7 @@ process_block_array_leaf(char **name, blocks[i]._RowMajor = type->get_interface_row_major(); blocks[i].linearized_array_index = linearized_index; - parcel->process(type, blocks[i].Name); + parcel->process(type, b->has_instance_name ? blocks[i].Name : ""); blocks[i].UniformBufferSize = parcel->buffer_size; @@ -367,8 +367,8 @@ create_buffer_blocks(void *mem_ctx, struct gl_context *ctx, if ((create_ubo_blocks && !b->is_shader_storage) || (!create_ubo_blocks && b->is_shader_storage)) { + unsigned binding_offset = 0; if (b->array != NULL) { -unsigned binding_offset = 0; char *name = ralloc_strdup(NULL, block_type->without_array()->name); size_t name_length = strlen(name); @@ -379,31 +379,9 @@ create_buffer_blocks(void *mem_ctx, struct gl_context *ctx, i); ralloc_free(name); } else { -blocks[i].Name = ralloc_strdup(blocks, block_type->name); -blocks[i].Uniforms = [parcel.index]; -blocks[i].Binding = (b->has_binding) ? b->binding : 0; -blocks[i].UniformBufferSize = 0; -blocks[i]._Packing = - gl_uniform_block_packing(block_type->interface_packing); -blocks[i]._RowMajor = block_type->get_interface_row_major(); - -parcel.process(block_type, - b->has_instance_name ? block_type->name : ""); - -blocks[i].UniformBufferSize = parcel.buffer_size; - -/* Check SSBO size is lower than maximum supported size for SSBO - */ -if (b->is_shader_storage && -parcel.buffer_size > ctx->Const.MaxShaderStorageBlockSize) { - linker_error(prog, "shader storage block `%s' has size %d, " -"which is larger than than the maximum allowed (%d)", -block_type->name, parcel.buffer_size, -ctx->Const.MaxShaderStorageBlockSize); -} -blocks[i].NumUniforms = (unsigned)(ptrdiff_t) - ([parcel.index] - blocks[i].Uniforms); -i++; +process_block_array_leaf(block_type->name, blocks, , + variables, b, , _offset, + 0, ctx, prog); } } } -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3] compiler/glsl: fix precision problem of tanh
On Mon, Dec 19, 2016 at 8:17 PM, Matt Turnerwrote: > On Mon, Dec 19, 2016 at 5:12 PM, Giuseppe Bilotta > wrote: >> Just one question though —not knowing much of the shader language, can >> I expect expm1 to be available? > > No, expm1 doesn't exist in GLSL. And - more importantly - not in the hardware either. At least not on NVIDIA, and on a brief scan, not Intel either (SKL "math" instruction). -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3] compiler/glsl: fix precision problem of tanh
On Mon, Dec 19, 2016 at 5:12 PM, Giuseppe Bilottawrote: > Just one question though —not knowing much of the shader language, can > I expect expm1 to be available? No, expm1 doesn't exist in GLSL. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: don't attempt to unlock an unlocked debug state mutex
Can someone push this to master? On Sun, Dec 11, 2016 at 07:21:36PM +0100, Eduardo Lima Mitev wrote: > Looks good. > > Reviewed-by: Eduardo Lima Mitev> > On 12/11/2016 04:42 PM, Jonathan Gray wrote: > > Commit 929fcee47e46781c57f2a354ce0a013915c033d1 introduced code that > > attempts to unlock an unlocked mutex which is undefined behaviour. > > > > On OpenBSD this leads to an abort: > > > > 0 0x124dadfa96ba in thrkill () at :2 > > 1 0x124dadf3da39 in *_libc_abort () at > > /usr/src/lib/libc/stdlib/abort.c:52 > > 2 0x124d2c1165b5 in *_libpthread_pthread_mutex_unlock > > (mutexp=) > > at /usr/src/lib/librthread/rthread_sync.c:221 > > 3 0x124d279c02e4 in init_attrib_groups (ctx=0x124df0fda000) at > > main/context.c:825 > > 4 _mesa_initialize_context (ctx=ctx@entry=0x124df0fda000, > > api=api@entry=API_OPENGL_CORE, > > visual=visual@entry=0x7f7bdfd0, share_list=share_list@entry=0x0, > > driverFunctions=driverFunctions@entry=0x7f7bda60) at > > main/context.c:1204 > > 5 0x124d27b507ec in st_create_context (api=api@entry=API_OPENGL_CORE, > > pipe=pipe@entry=0x124dc491, visual=visual@entry=0x7f7bdfd0, > > share=share@entry=0x0, options=options@entry=0x7f7be128) > > at state_tracker/st_context.c:545 > > 6 0x124d27b8639f in st_api_create_context (stapi=, > > smapi=0x124d1b608800, attribs=0x7f7be100, error=0x7f7be0fc, > > shared_stctxi=0x0) > > at state_tracker/st_manager.c:669 > > 7 0x124d27cc5b9c in dri_create_context (api=, > > visual=0x124d8a0f8a00, > > cPriv=0x124de473f240, major_version=, > > minor_version=, > > flags=, notify_reset=false, error=0x7f7be2b4, > > sharedContextPrivate=0x0) at dri_context.c:123 > > 8 0x124d27cc5029 in driCreateContextAttribs (screen=0x124d8a0f8400, > > api=, config=0x124d8a0f8a00, shared=, > > num_attribs=, attribs=, > > error=0x7f7be2b4, > > data=0x124d77814a00) at dri_util.c:448 > > 9 0x124d8e109b00 in drisw_create_context_attribs (base=0x124df3e08700, > > config_base=0x124d7a0e7300, shareList=, > > num_attribs=, > > attribs=, error=0x7f7be2b4) at drisw_glx.c:476 > > 10 0x124d8e104b4a in glXCreateContextAttribsARB (dpy=0x124d533f, > > config=0x124d7a0e7300, share_context=0x0, direct=1, > > attrib_list=0x7f7be300) > > at create_context.c:78 > > > > Signed-off-by: Jonathan Gray > > --- > > src/mesa/main/debug_output.c | 5 ++--- > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/src/mesa/main/debug_output.c b/src/mesa/main/debug_output.c > > index 48dbbb3..bc933db 100644 > > --- a/src/mesa/main/debug_output.c > > +++ b/src/mesa/main/debug_output.c > > @@ -1282,14 +1282,13 @@ _mesa_init_debug_output(struct gl_context *ctx) > > */ > >struct gl_debug_state *debug = _mesa_lock_debug_state(ctx); > >if (!debug) { > > - goto done; > > + return; > >} > >debug->DebugOutput = GL_TRUE; > >debug->LogToStderr = GL_TRUE; > >ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_DEBUG_BIT; > > + _mesa_unlock_debug_state(ctx); > > } > > -done: > > - _mesa_unlock_debug_state(ctx); > > } > > > > > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: Use ir_var_temporary when generating inline functions.
Makes sense. Reviewed-by: Ilia MirkinOn Mon, Dec 19, 2016 at 6:33 PM, Kenneth Graunke wrote: > On Monday, December 19, 2016 3:29:13 PM PST Kenneth Graunke wrote: >> We were using ir_var_auto for the inlined function parameter variables, >> which is wrong, as it suggests that those are real variables declared >> by the program. >> >> Normally this doesn't matter. However, if you called built-ins at >> global scope, it would pollute the global variable namespace with >> these new parameter temporaries. If the shader already had variables >> with those names, the linker might see contradictory global variable >> declarations and raise an error. >> >> Making them temporaries indicates that these are just things generated >> by the compiler internally. This avoids confusing the linker. >> >> Fixes a new Piglit test: glsl-fs-multiple-builtins. >> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99097 > > That should be > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99154 > > (fixed locally) > >> Reported-by: Niels Ole Salscheider >> Signed-off-by: Kenneth Graunke >> --- >> src/compiler/glsl/opt_function_inlining.cpp | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/src/compiler/glsl/opt_function_inlining.cpp >> b/src/compiler/glsl/opt_function_inlining.cpp >> index 62c1f4b..78a726b 100644 >> --- a/src/compiler/glsl/opt_function_inlining.cpp >> +++ b/src/compiler/glsl/opt_function_inlining.cpp >> @@ -164,7 +164,7 @@ ir_call::generate_inline(ir_instruction *next_ir) >>parameters[i] = NULL; >>} else { >>parameters[i] = sig_param->clone(ctx, ht); >> - parameters[i]->data.mode = ir_var_auto; >> + parameters[i]->data.mode = ir_var_temporary; >> >>/* Remove the read-only decoration because we're going to write >> * directly to this variable. If the cloned variable is left >> > > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: Use ir_var_temporary when generating inline functions.
On Monday, December 19, 2016 3:29:13 PM PST Kenneth Graunke wrote: > We were using ir_var_auto for the inlined function parameter variables, > which is wrong, as it suggests that those are real variables declared > by the program. > > Normally this doesn't matter. However, if you called built-ins at > global scope, it would pollute the global variable namespace with > these new parameter temporaries. If the shader already had variables > with those names, the linker might see contradictory global variable > declarations and raise an error. > > Making them temporaries indicates that these are just things generated > by the compiler internally. This avoids confusing the linker. > > Fixes a new Piglit test: glsl-fs-multiple-builtins. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99097 That should be Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99154 (fixed locally) > Reported-by: Niels Ole Salscheider> Signed-off-by: Kenneth Graunke > --- > src/compiler/glsl/opt_function_inlining.cpp | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/compiler/glsl/opt_function_inlining.cpp > b/src/compiler/glsl/opt_function_inlining.cpp > index 62c1f4b..78a726b 100644 > --- a/src/compiler/glsl/opt_function_inlining.cpp > +++ b/src/compiler/glsl/opt_function_inlining.cpp > @@ -164,7 +164,7 @@ ir_call::generate_inline(ir_instruction *next_ir) >parameters[i] = NULL; >} else { >parameters[i] = sig_param->clone(ctx, ht); > - parameters[i]->data.mode = ir_var_auto; > + parameters[i]->data.mode = ir_var_temporary; > >/* Remove the read-only decoration because we're going to write > * directly to this variable. If the cloned variable is left > signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 99154] Link time error when using multiple builtin functions
https://bugs.freedesktop.org/show_bug.cgi?id=99154 Kenneth Graunkechanged: What|Removed |Added Assignee|mesa-dev@lists.freedesktop. |kenn...@whitecape.org |org | Status|NEW |ASSIGNED --- Comment #1 from Kenneth Graunke --- Patch on list: https://lists.freedesktop.org/archives/mesa-dev/2016-December/138613.html -- You are receiving this mail because: You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] glsl: Use ir_var_temporary when generating inline functions.
We were using ir_var_auto for the inlined function parameter variables, which is wrong, as it suggests that those are real variables declared by the program. Normally this doesn't matter. However, if you called built-ins at global scope, it would pollute the global variable namespace with these new parameter temporaries. If the shader already had variables with those names, the linker might see contradictory global variable declarations and raise an error. Making them temporaries indicates that these are just things generated by the compiler internally. This avoids confusing the linker. Fixes a new Piglit test: glsl-fs-multiple-builtins. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99097 Reported-by: Niels Ole SalscheiderSigned-off-by: Kenneth Graunke --- src/compiler/glsl/opt_function_inlining.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/compiler/glsl/opt_function_inlining.cpp b/src/compiler/glsl/opt_function_inlining.cpp index 62c1f4b..78a726b 100644 --- a/src/compiler/glsl/opt_function_inlining.cpp +++ b/src/compiler/glsl/opt_function_inlining.cpp @@ -164,7 +164,7 @@ ir_call::generate_inline(ir_instruction *next_ir) parameters[i] = NULL; } else { parameters[i] = sig_param->clone(ctx, ht); -parameters[i]->data.mode = ir_var_auto; +parameters[i]->data.mode = ir_var_temporary; /* Remove the read-only decoration because we're going to write * directly to this variable. If the cloned variable is left -- 2.10.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3] compiler/glsl: fix precision problem of tanh
Hello, I realize that I'm a little late to comment about this, but I think the formula used for tanh should be changed again. Specifically, as suggested by Roland On Fri, Dec 9, 2016 at 5:41 AM, Roland Scheideggerwrote: > btw I'm wondering if some vendors wouldn't implement that with slightly > simplified formula, e.g. (e^2x - 1) / (e^2x + 1) (this is what nvidia > used for cg apparently according to docs, saving one of the > exponentials). Might be worse for accuracy though (and won't solve this > problem, though it would now only need a one-sided clamp). Another option is the 1 - 2/(1+expf(2x)), or even better 1 - 2/(2+expm1f(2x)).. I've run some tests and this seems to have the same accuracy as the one mentioned by Roland, with the bonus benefit of not needing any clamping. The accuracy seems to actually be better than the direct evaluation (difference over sum of exps), except around 0 (say, when abs(x) < 1). I've found the relative error away from 0 to be typically in the same order of magnitude as the error in tanhf() itself (compared to tanh()) , and generally less than machine epsilon.. I'm currently looking at options to improve the accuracy without clamping and without excessive additional computations, might propose a patch in the next couple of days. Just one question though —not knowing much of the shader language, can I expect expm1 to be available? -- Giuseppe "Oblomov" Bilotta ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] st/nine: Upload on secondary context for Draw*Up
Avoid synchronization by using the secondary context for uploading the vertex data for Draw*Up. v2: Rely on u_upload_mgr to use persistent coherent buffers. Do not flush. Signed-off-by: Axel Davy--- src/gallium/state_trackers/nine/device9.c | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/src/gallium/state_trackers/nine/device9.c b/src/gallium/state_trackers/nine/device9.c index 9f2575309a..f095ff3c86 100644 --- a/src/gallium/state_trackers/nine/device9.c +++ b/src/gallium/state_trackers/nine/device9.c @@ -274,6 +274,12 @@ NineDevice9_ctor( struct NineDevice9 *This, /* r600 and radeonsi are thread safe. */ This->csmt_active = strstr(pScreen->get_name(pScreen), "AMD") != NULL; +/* We rely on u_upload_mgr using persistent coherent buffers (which don't + * require flush to work in multi-pipe_context scenario) for vertex and + * index buffers */ +if (!GET_PCAP(BUFFER_MAP_PERSISTENT_COHERENT)) +This->csmt_active = false; + if (This->csmt_active) { This->csmt_ctx = nine_csmt_create(This); if (!This->csmt_ctx) @@ -472,13 +478,20 @@ NineDevice9_ctor( struct NineDevice9 *This, This->driver_caps.user_sw_vbufs = This->screen_sw->get_param(This->screen_sw, PIPE_CAP_USER_VERTEX_BUFFERS); This->driver_caps.user_sw_cbufs = This->screen_sw->get_param(This->screen_sw, PIPE_CAP_USER_CONSTANT_BUFFERS); +/* Implicit use of context pipe for vertex and index uploaded when + * csmt is not active. Does not need to sync since csmt is unactive, + * thus no need to call NineDevice9_GetPipe at each upload. */ if (!This->driver_caps.user_vbufs) -This->vertex_uploader = u_upload_create(This->context.pipe, 65536, +This->vertex_uploader = u_upload_create(This->csmt_active ? +This->pipe_secondary : This->context.pipe, +65536, PIPE_BIND_VERTEX_BUFFER, PIPE_USAGE_STREAM); This->vertex_sw_uploader = u_upload_create(This->pipe_sw, 65536, PIPE_BIND_VERTEX_BUFFER, PIPE_USAGE_STREAM); if (!This->driver_caps.user_ibufs) -This->index_uploader = u_upload_create(This->context.pipe, 128 * 1024, +This->index_uploader = u_upload_create(This->csmt_active ? +This->pipe_secondary : This->context.pipe, + 128 * 1024, PIPE_BIND_INDEX_BUFFER, PIPE_USAGE_STREAM); if (!This->driver_caps.user_cbufs) { This->constbuf_alignment = GET_PCAP(CONSTANT_BUFFER_OFFSET_ALIGNMENT); @@ -2835,8 +2848,6 @@ NineDevice9_DrawPrimitiveUP( struct NineDevice9 *This, vtxbuf.user_buffer = pVertexStreamZeroData; if (!This->driver_caps.user_vbufs) { -/* Implicit use of context pipe */ -(void)NineDevice9_GetPipe(This); u_upload_data(This->vertex_uploader, 0, (prim_count_to_vertex_count(PrimitiveType, PrimitiveCount)) * VertexStreamZeroStride, /* XXX */ @@ -2900,8 +2911,6 @@ NineDevice9_DrawIndexedPrimitiveUP( struct NineDevice9 *This, if (!This->driver_caps.user_vbufs) { const unsigned base = MinVertexIndex * VertexStreamZeroStride; -/* Implicit use of context pipe */ -(void)NineDevice9_GetPipe(This); u_upload_data(This->vertex_uploader, base, NumVertices * VertexStreamZeroStride, /* XXX */ @@ -2915,8 +2924,6 @@ NineDevice9_DrawIndexedPrimitiveUP( struct NineDevice9 *This, vbuf.user_buffer = NULL; } if (!This->driver_caps.user_ibufs) { -/* Implicit use of context pipe */ -(void)NineDevice9_GetPipe(This); u_upload_data(This->index_uploader, 0, (prim_count_to_vertex_count(PrimitiveType, PrimitiveCount)) * ibuf.index_size, -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 84/84 v7] st/nine: Implement new buffer upload path
This new buffer upload path enables to lock faster than the normal path when using DISCARD/NOOVERWRITE. v2: Diverse cleanups and fixes. v3: Fix allocation size for 'lone' buffers and add more debug info. v4: Rewrite of the path to handle when DISCARD/NOOVERWRITE is not used anymore. The resource content is copied to the new resource used. v5: flush for safety after unmap (not sure it is really required here, but safer to flush). v6: Do not use the path if persistent coherent mapping is unavailable. Fix buffer creation flags. v7: Do not flush since it is not needed. Signed-off-by: Axel Davy--- src/gallium/state_trackers/nine/Makefile.sources | 2 + src/gallium/state_trackers/nine/buffer9.c | 97 ++- src/gallium/state_trackers/nine/buffer9.h | 9 +- src/gallium/state_trackers/nine/device9.c | 6 + src/gallium/state_trackers/nine/device9.h | 3 + src/gallium/state_trackers/nine/indexbuffer9.c | 10 +- src/gallium/state_trackers/nine/indexbuffer9.h | 2 - .../state_trackers/nine/nine_buffer_upload.c | 294 + .../state_trackers/nine/nine_buffer_upload.h | 59 + src/gallium/state_trackers/nine/vertexbuffer9.c| 3 +- 10 files changed, 461 insertions(+), 24 deletions(-) create mode 100644 src/gallium/state_trackers/nine/nine_buffer_upload.c create mode 100644 src/gallium/state_trackers/nine/nine_buffer_upload.h diff --git a/src/gallium/state_trackers/nine/Makefile.sources b/src/gallium/state_trackers/nine/Makefile.sources index 2bb08a2612..56698a19f1 100644 --- a/src/gallium/state_trackers/nine/Makefile.sources +++ b/src/gallium/state_trackers/nine/Makefile.sources @@ -23,6 +23,8 @@ C_SOURCES := \ indexbuffer9.h \ iunknown.c \ iunknown.h \ + nine_buffer_upload.c \ + nine_buffer_upload.h \ nine_csmt_helper.h \ nine_debug.c \ nine_debug.h \ diff --git a/src/gallium/state_trackers/nine/buffer9.c b/src/gallium/state_trackers/nine/buffer9.c index c745d77c2c..b22713b351 100644 --- a/src/gallium/state_trackers/nine/buffer9.c +++ b/src/gallium/state_trackers/nine/buffer9.c @@ -23,6 +23,7 @@ #include "buffer9.h" #include "device9.h" +#include "nine_buffer_upload.h" #include "nine_helpers.h" #include "nine_pipe.h" @@ -100,6 +101,10 @@ NineBuffer9_ctor( struct NineBuffer9 *This, else info->usage = PIPE_USAGE_DYNAMIC; +/* When Writeonly is not set, we don't want to enable the + * optimizations */ +This->discard_nooverwrite_only = !!(Usage & D3DUSAGE_WRITEONLY) && + pParams->device->buffer_upload; /* if (pDesc->Usage & D3DUSAGE_DONOTCLIP) { } */ /* if (pDesc->Usage & D3DUSAGE_NONSECURE) { } */ /* if (pDesc->Usage & D3DUSAGE_NPATCHES) { } */ @@ -161,12 +166,18 @@ NineBuffer9_dtor( struct NineBuffer9 *This ) list_del(>managed.list2); } +if (This->buf) +nine_upload_release_buffer(This->base.base.device->buffer_upload, This->buf); + NineResource9_dtor(>base); } struct pipe_resource * -NineBuffer9_GetResource( struct NineBuffer9 *This ) +NineBuffer9_GetResource( struct NineBuffer9 *This, unsigned *offset ) { +if (This->buf) +return nine_upload_buffer_resource_and_offset(This->buf, offset); +*offset = 0; return NineResource9_GetResource(>base); } @@ -264,6 +275,8 @@ NineBuffer9_Lock( struct NineBuffer9 *This, if (Flags & D3DLOCK_DONOTWAIT && !(This->base.usage & D3DUSAGE_DYNAMIC)) usage |= PIPE_TRANSFER_DONTBLOCK; +This->discard_nooverwrite_only &= !!(Flags & (D3DLOCK_DISCARD | D3DLOCK_NOOVERWRITE)); + if (This->nmaps == This->maxmaps) { struct NineTransfer *newmaps = REALLOC(This->maps, sizeof(struct NineTransfer)*This->maxmaps, @@ -275,8 +288,67 @@ NineBuffer9_Lock( struct NineBuffer9 *This, This->maps = newmaps; } -This->maps[This->nmaps].is_pipe_secondary = FALSE; +if (This->buf && !This->discard_nooverwrite_only) { +struct pipe_box src_box; +unsigned offset; +struct pipe_resource *src_res; +DBG("Disabling nine_subbuffer for a buffer having" +"used a nine_subbuffer buffer\n"); +/* Copy buffer content to the buffer resource, which + * we will now use. + * Note: The behaviour may be different from what is expected + * with double lock. However applications can't really make expectations + * about double locks, and don't really use them, so that's ok. */ +src_res = nine_upload_buffer_resource_and_offset(This->buf, ); +u_box_1d(offset, This->size, _box); + +pipe = NineDevice9_GetPipe(device); +pipe->resource_copy_region(pipe, This->base.resource, 0, 0, 0, 0, + src_res, 0, _box); +/* Release previous resource */ +if (This->nmaps >= 1) +
[Mesa-dev] [PATCH 1/2] radeonsi: add Polaris12 support (v3)
From: Junwei Zhangv2: use gfxip names for llvm 4.0+ v3: use tonga for llvm <= 3.8 Signed-off-by: Junwei Zhang Reviewed-by: Nicolai Hähnle Acked-by: Christian König --- src/amd/addrlib/r800/ciaddrlib.cpp| 3 ++- src/amd/addrlib/r800/ciaddrlib.h | 1 + src/amd/common/amd_family.h | 1 + src/amd/common/amdgpu_id.h| 4 src/gallium/drivers/radeon/r600_pipe_common.c | 3 +++ src/gallium/drivers/radeon/radeon_vce.c | 3 ++- src/gallium/drivers/radeonsi/si_pipe.c| 1 + src/gallium/drivers/radeonsi/si_state.c | 1 + src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c | 4 9 files changed, 19 insertions(+), 2 deletions(-) diff --git a/src/amd/addrlib/r800/ciaddrlib.cpp b/src/amd/addrlib/r800/ciaddrlib.cpp index 7c5d29a..c726c4d 100644 --- a/src/amd/addrlib/r800/ciaddrlib.cpp +++ b/src/amd/addrlib/r800/ciaddrlib.cpp @@ -353,6 +353,7 @@ AddrChipFamily CIAddrLib::HwlConvertChipFamily( m_settings.isFiji= ASICREV_IS_FIJI_P(uChipRevision); m_settings.isPolaris10 = ASICREV_IS_POLARIS10_P(uChipRevision); m_settings.isPolaris11 = ASICREV_IS_POLARIS11_M(uChipRevision); +m_settings.isPolaris12 = ASICREV_IS_POLARIS12_V(uChipRevision); break; case FAMILY_CZ: m_settings.isCarrizo = 1; @@ -417,7 +418,7 @@ BOOL_32 CIAddrLib::HwlInitGlobalParams( { m_pipes = 16; } -else if (m_settings.isPolaris11) +else if (m_settings.isPolaris11 || m_settings.isPolaris12) { m_pipes = 4; } diff --git a/src/amd/addrlib/r800/ciaddrlib.h b/src/amd/addrlib/r800/ciaddrlib.h index de995fa..2c9a4cc 100644 --- a/src/amd/addrlib/r800/ciaddrlib.h +++ b/src/amd/addrlib/r800/ciaddrlib.h @@ -62,6 +62,7 @@ struct CIChipSettings UINT_32 isFiji: 1; UINT_32 isPolaris10 : 1; UINT_32 isPolaris11 : 1; +UINT_32 isPolaris12 : 1; // VI fusion (Carrizo) UINT_32 isCarrizo : 1; }; diff --git a/src/amd/common/amd_family.h b/src/amd/common/amd_family.h index 6a713ad..b09bbb8 100644 --- a/src/amd/common/amd_family.h +++ b/src/amd/common/amd_family.h @@ -91,6 +91,7 @@ enum radeon_family { CHIP_STONEY, CHIP_POLARIS10, CHIP_POLARIS11, +CHIP_POLARIS12, CHIP_LAST, }; diff --git a/src/amd/common/amdgpu_id.h b/src/amd/common/amdgpu_id.h index f91df55..1683a5a 100644 --- a/src/amd/common/amdgpu_id.h +++ b/src/amd/common/amdgpu_id.h @@ -142,6 +142,8 @@ enum { VI_POLARIS11_M_A0 = 90, + VI_POLARIS12_V_A0 = 100, + VI_UNKNOWN= 0xFF }; @@ -156,6 +158,8 @@ enum { ((eChipRev >= VI_POLARIS10_P_A0) && (eChipRev < VI_POLARIS11_M_A0)) #define ASICREV_IS_POLARIS11_M(eChipRev) \ (eChipRev >= VI_POLARIS11_M_A0) +#define ASICREV_IS_POLARIS12_V(eChipRev)\ + (eChipRev >= VI_POLARIS12_V_A0) /* CZ specific rev IDs */ enum { diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index 0b5c6dc..e0b914c 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -755,6 +755,7 @@ static const char* r600_get_chip_name(struct r600_common_screen *rscreen) case CHIP_FIJI: return "AMD FIJI"; case CHIP_POLARIS10: return "AMD POLARIS10"; case CHIP_POLARIS11: return "AMD POLARIS11"; + case CHIP_POLARIS12: return "AMD POLARIS12"; case CHIP_STONEY: return "AMD STONEY"; default: return "AMD unknown"; } @@ -889,9 +890,11 @@ const char *r600_get_llvm_processor_name(enum radeon_family family) #if HAVE_LLVM <= 0x0308 case CHIP_POLARIS10: return "tonga"; case CHIP_POLARIS11: return "tonga"; + case CHIP_POLARIS12: return "tonga"; #else case CHIP_POLARIS10: return "polaris10"; case CHIP_POLARIS11: return "polaris11"; + case CHIP_POLARIS12: return "polaris11"; #endif default: return ""; } diff --git a/src/gallium/drivers/radeon/radeon_vce.c b/src/gallium/drivers/radeon/radeon_vce.c index aad2ec1..dcd56ea 100644 --- a/src/gallium/drivers/radeon/radeon_vce.c +++ b/src/gallium/drivers/radeon/radeon_vce.c @@ -413,7 +413,8 @@ struct pipe_video_codec *rvce_create_encoder(struct pipe_context *context, enc->use_vui = true; if (rscreen->info.family >= CHIP_TONGA && rscreen->info.family != CHIP_STONEY && - rscreen->info.family != CHIP_POLARIS11) + rscreen->info.family != CHIP_POLARIS11 && + rscreen->info.family != CHIP_POLARIS12) enc->dual_pipe = true; /* TODO enable B frame with dual instance */ if ((rscreen->info.family >= CHIP_TONGA) && diff --git
Re: [Mesa-dev] [PATCH] glsl: Drop bogus is_vertex_input from add_shader_variable().
On Fri, Dec 16, 2016 at 9:46 PM, Kenneth Graunkewrote: > > stage_mask is a bitmask of shader stages, so the proper comparison would > be (1 << MESA_SHADER_VERTEX), not MESA_SHADER_VERTEX itself. > > But we only care for structure types, and VS inputs cannot be structs. > So we can just drop this entirely. > > Signed-off-by: Kenneth Graunke > --- > src/compiler/glsl/linker.cpp | 7 +-- > 1 file changed, 1 insertion(+), 6 deletions(-) > > Sorry, I forgot I split this into two patches...apply this one before > [PATCH] glsl: Fix program interface queries relating to interface blocks. > > diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp > index 3660257..5066014 100644 > --- a/src/compiler/glsl/linker.cpp > +++ b/src/compiler/glsl/linker.cpp > @@ -3735,10 +3735,6 @@ add_shader_variable(const struct gl_context *ctx, > bool use_implicit_location, int location, > const glsl_type *outermost_struct_type = NULL) > { > - const bool is_vertex_input = > - programInterface == GL_PROGRAM_INPUT && > - stage_mask == MESA_SHADER_VERTEX; > - > switch (type->base_type) { > case GLSL_TYPE_STRUCT: { >/* The ARB_program_interface_query spec says: > @@ -3764,8 +3760,7 @@ add_shader_variable(const struct gl_context *ctx, >outermost_struct_type)) > return false; > > - field_location += > -field->type->count_attribute_slots(is_vertex_input); > + field_location += field->type->count_attribute_slots(false); >} >return true; > } > -- > 2.10.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev Reviewed-by: Anuj Phogat ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/19] glsl: Use simpler visitor to determine which UBO and SSBO blocks are used
On 12/18/2016 10:09 PM, Timothy Arceri wrote: > On Thu, 2016-12-15 at 20:10 -0800, Ian Romanick wrote: >> From: Ian Romanick>> >> Very soon this visitor will get more complicated. The users of the >> existing ir_variable_refcount visitor won't need the coming >> functionality, and this use doesn't need much of the functionality of >> ir_variable_refcount. >> >> Signed-off-by: Ian Romanick >> Cc: mesa-sta...@lists.freedesktop.org >> --- >> src/compiler/Makefile.sources | 2 + >> src/compiler/glsl/ir_array_refcount.cpp | 100 >> >> src/compiler/glsl/ir_array_refcount.h | 65 + >> src/compiler/glsl/link_uniforms.cpp | 10 ++-- >> 4 files changed, 172 insertions(+), 5 deletions(-) >> create mode 100644 src/compiler/glsl/ir_array_refcount.cpp >> create mode 100644 src/compiler/glsl/ir_array_refcount.h >> >> diff --git a/src/compiler/Makefile.sources >> b/src/compiler/Makefile.sources >> index 17b15de..15f410f 100644 >> --- a/src/compiler/Makefile.sources >> +++ b/src/compiler/Makefile.sources >> @@ -29,6 +29,8 @@ LIBGLSL_FILES = \ >> glsl/glsl_to_nir.cpp \ >> glsl/glsl_to_nir.h \ >> glsl/hir_field_selection.cpp \ >> +glsl/ir_array_refcount.cpp \ >> +glsl/ir_array_refcount.h \ >> glsl/ir_basic_block.cpp \ >> glsl/ir_basic_block.h \ >> glsl/ir_builder.cpp \ >> diff --git a/src/compiler/glsl/ir_array_refcount.cpp >> b/src/compiler/glsl/ir_array_refcount.cpp >> new file mode 100644 >> index 000..41a0914 >> --- /dev/null >> +++ b/src/compiler/glsl/ir_array_refcount.cpp >> @@ -0,0 +1,100 @@ >> +/* >> + * Copyright © 2016 Intel Corporation >> + * >> + * Permission is hereby granted, free of charge, to any person >> obtaining a >> + * copy of this software and associated documentation files (the >> "Software"), >> + * to deal in the Software without restriction, including without >> limitation >> + * the rights to use, copy, modify, merge, publish, distribute, >> sublicense, >> + * and/or sell copies of the Software, and to permit persons to whom >> the >> + * Software is furnished to do so, subject to the following >> conditions: >> + * >> + * The above copyright notice and this permission notice (including >> the next >> + * paragraph) shall be included in all copies or substantial >> portions of the >> + * Software. >> + * >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, >> EXPRESS OR >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF >> MERCHANTABILITY, >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO >> EVENT SHALL >> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES >> OR OTHER >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, >> ARISING >> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR >> OTHER >> + * DEALINGS IN THE SOFTWARE. >> + */ >> + >> +/** >> + * \file ir_array_refcount.cpp >> + * >> + * Provides a visitor which produces a list of variables referenced. >> + */ >> + >> +#include "ir.h" >> +#include "ir_visitor.h" >> +#include "ir_array_refcount.h" >> +#include "compiler/glsl_types.h" >> +#include "util/hash_table.h" >> + >> +ir_array_refcount_visitor::ir_array_refcount_visitor() >> +{ >> + this->mem_ctx = ralloc_context(NULL); >> + this->ht = _mesa_hash_table_create(NULL, _mesa_hash_pointer, >> + _mesa_key_pointer_equal); >> +} >> + >> +static void >> +free_entry(struct hash_entry *entry) >> +{ >> + ir_array_refcount_entry *ivre = (ir_array_refcount_entry *) >> entry->data; >> + delete ivre; >> +} >> + >> +ir_array_refcount_visitor::~ir_array_refcount_visitor() >> +{ >> + ralloc_free(this->mem_ctx); >> + _mesa_hash_table_destroy(this->ht, free_entry); >> +} >> + >> +ir_array_refcount_entry::ir_array_refcount_entry(ir_variable *var) >> + : var(var), is_referenced(false) >> +{ >> + /* empty */ >> +} >> + >> + >> +ir_array_refcount_entry * >> +ir_array_refcount_visitor::get_variable_entry(ir_variable *var) >> +{ >> + assert(var); >> + >> + struct hash_entry *e = _mesa_hash_table_search(this->ht, var); >> + if (e) >> + return (ir_array_refcount_entry *)e->data; >> + >> + ir_array_refcount_entry *entry = new >> ir_array_refcount_entry(var); >> + _mesa_hash_table_insert(this->ht, var, entry); >> + >> + return entry; >> +} >> + >> + >> +ir_visitor_status >> +ir_array_refcount_visitor::visit(ir_dereference_variable *ir) >> +{ >> + ir_variable *const var = ir->variable_referenced(); >> + ir_array_refcount_entry *entry = this->get_variable_entry(var); >> + >> + if (entry) > > I don't think this can ever be not true. maybe just assert(entry). If > new fails it will throw an exception rather than return null as far as > I understand. I believe you are correct. I'll change that. > Otherwise seems fine: > > Reviewed-by: Timothy Arceri
Re: [Mesa-dev] [PATCH 03/19] glsl: Track the linearized array index for each UBO instance array element
On 12/18/2016 09:54 PM, Timothy Arceri wrote: > On Thu, 2016-12-15 at 20:10 -0800, Ian Romanick wrote: >> From: Ian Romanick>> >> Signed-off-by: Ian Romanick >> Cc: mesa-sta...@lists.freedesktop.org >> --- >> src/compiler/glsl/link_uniform_blocks.cpp | 17 ++--- >> src/mesa/main/mtypes.h| 15 +++ >> 2 files changed, 29 insertions(+), 3 deletions(-) >> >> diff --git a/src/compiler/glsl/link_uniform_blocks.cpp >> b/src/compiler/glsl/link_uniform_blocks.cpp >> index 41b26e7..9adfbd5 100644 >> --- a/src/compiler/glsl/link_uniform_blocks.cpp >> +++ b/src/compiler/glsl/link_uniform_blocks.cpp >> @@ -209,13 +209,19 @@ static void process_block_array_leaf(char >> **name, gl_uniform_block *blocks, >> struct gl_context *ctx, >> struct gl_shader_program >> *prog); >> >> +/** >> + * >> + * \param first_index Value of \c block_index for the first element >> of the >> + *array. >> + */ >> static void >> process_block_array(struct uniform_block_array_elements *ub_array, >> char **name, >> size_t name_length, gl_uniform_block *blocks, >> ubo_visitor *parcel, gl_uniform_buffer_variable >> *variables, >> const struct link_uniform_block_active *const b, >> unsigned *block_index, unsigned *binding_offset, >> -struct gl_context *ctx, struct gl_shader_program >> *prog) >> +struct gl_context *ctx, struct gl_shader_program >> *prog, >> +unsigned first_index) >> { >> for (unsigned j = 0; j < ub_array->num_array_elements; j++) { >>size_t new_length = name_length; >> @@ -227,11 +233,15 @@ process_block_array(struct >> uniform_block_array_elements *ub_array, char **name, >>if (ub_array->array) { >> process_block_array(ub_array->array, name, new_length, >> blocks, >> parcel, variables, b, block_index, >> - binding_offset, ctx, prog); >> + binding_offset, ctx, prog, >> first_index); >>} else { >> + const unsigned i = *block_index; >> + >> process_block_array_leaf(name, blocks, >>parcel, variables, b, block_index, >>binding_offset, ctx, prog); >> + >> + blocks[i].linearized_array_index = i - first_index; > > Shouldn't this go in the new process_block_array_leaf() too? It only needs to be one place. *block_index is modified by process_block_array_leaf, but the linearized index is the pre-modified (that's a word... I'm sure of it!) value. It could go in process_block_array_leaf instead, but then I'd also need to pass either first_index or (i - first_index). I originally had a patch that removed the open-coded copy of process_block_array_leaf which does not need to set linearized_array_index. I dropped that patch because there was a subtle difference between the two implementations. I will probably revisit the idea, however. > Otherwise this patch is: > > Reviewed-by: Timothy Arceri > > >>} >> } >> } >> @@ -359,7 +369,8 @@ create_buffer_blocks(void *mem_ctx, struct >> gl_context *ctx, >> >> assert(b->has_instance_name); >> process_block_array(b->array, , name_length, >> blocks, , >> -variables, b, , _offset, >> ctx, prog); >> +variables, b, , _offset, >> ctx, prog, >> +i); >> ralloc_free(name); >> } else { >> blocks[i].Name = ralloc_strdup(blocks, block_type- >>> name); >> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h >> index 36d48e2..ac4cac0 100644 >> --- a/src/mesa/main/mtypes.h >> +++ b/src/mesa/main/mtypes.h >> @@ -2493,6 +2493,21 @@ struct gl_uniform_block >> uint8_t stageref; >> >> /** >> +* Linearized array index for uniform block instance arrays >> +* >> +* Given a uniform block instance array declared with size >> +* blk[s_0][s_1]..[s_m], the block referenced by >> blk[i_0][i_1]..[i_m] will >> +* have the linearized array index >> +* >> +* m-1 m >> +* i_m + ∑ i_j * ∏ s_k >> +* j=0 k=j+1 >> +* >> +* For a uniform block instance that is not an array, this is >> always 0. >> +*/ >> + uint8_t linearized_array_index; >> + >> + /** >> * Layout specified in the shader >> * >> * This isn't accessible through the API, but it is used while ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: Fix program interface queries relating to interface blocks.
On Monday, December 19, 2016 1:36:00 PM PST Ian Romanick wrote: > On 12/16/2016 09:35 PM, Kenneth Graunke wrote: > > This fixes 555 dEQP tests (using the nougat-cts-dev branch), Piglit's > > arb_program_interface_query/arb_program_interface_query-resource-query, > > and GL45-CTS.program_interface_query.separate-programs-{tess-control, > > tess-eval,geometry}. Only one dEQP program interface failure remains. > > > > I would have liked to split this up into several distinct changes, but > > I wasn't sure how to do that given thet tangled nature of these issues. > > > > So, the issues: > > > >* We need to treat interface blocks declared as an array of instances > > as a single block - removing the outer array. The resource list > > entry's name should not include the array length. Properties such > > as GL_ARRAY_SIZE should refer to the variable inside the block, not > > the interface block's array properties. > > > >* We need to do this prefixing even for structure variables. > > > >* We need to do this for built-ins (such as gl_PerVertex.gl_Position). > > > >* After interface array unwrapping, any variable which is an array > > should have [0] appended. It doesn't matter if it's a TCS/TES/GS > > input or TCS output - that looked like an attempt to unwrap for > > per-vertex variables, but that didn't consider per-patch variables, > > and as far as I can tell there's nothing to justify this. > > > > Several Mesa developers have suggested that Issue 16 contradicts the > > main specification, but I believe that it doesn't - the main spec just > > isn't terribly clear. The main ARB_program_interface query spec says: > > > > "* For an active interface block not declared as an array of block > > instances, a single entry will be generated, using the block name from > > the shader source. > > > >* For an active interface block declared as an array of instances, > > separate entries will be generated for each active instance. The name > > of the instance is formed by concatenating the block name, the "[" > > character, an integer identifying the instance number, and the "]" > > character." > > > > Issue 16 says that built-ins should be named "gl_PerVertex.gl_Position", > > but several people suggested the second bullet above means that it > > should be named "gl_PerVertex[array length].gl_Position". > > > > There are two important things to note. Those bullet points say > > "an active interface block", while the others say "variable" or "active > > shader storage block member". They also don't mention applying the > > rules recursively (unlike the other bullets). Both suggest that > > these rules apply to blocks themselves, not members of blocks. > > > > In fact, for GL_UNIFORM_BLOCK queries, we do have "block[0]", > > "block[1]", ... resource list entries - so those rules are real, > > and actually used. So if they don't apply to block members, then how > > should members be named? Unfortunately, I don't see any rules outside > > of issue 16 - where the rationale is very unclear. I hope to clarify > > the spec in the future. > > Based on my understanding, something like > > out Vertex { > vec4 p; > vec4 c[3]; > } vertex[2]; > > in a vertex shader should produce > > Vertex[0].p > Vertex[0].c[0] (with GL_ARRAY_SIZE = 3) > Vertex[1].p > Vertex[1].c[0] (with GL_ARRAY_SIZE = 3) > > This is definitely what we would produce for a uniform block. > > What I've never understood is why that isn't done for gl_PerVertex stage > boundaries where gl_PerVertex is explicitly arrayed. My expectation is that the above code would produce: Block entries: Vertex[0] Vertex[1] Variable/member entries: Vertex.p Vertex.c[0] (with GL_ARRAY_SIZE = 3) We would produce similar results for uniform blocks after my patch. I don't think anything about the rules is gl_PerVertex specific. > I think this patch is good for now. I like that it brings all the > strange edge-case handling into one place. > > Reviewed-by: Ian RomanickI've attached some notes I took while looking at dEQP program interface query tests - it has the shader code and expected results for a lot of test cases. I didn't write down the uniform ones, but they follow the same pattern (as shown above). For what it's worth, the OpenGL wiki's Program Introspection page(*), under "Interface block member naming" gives an example matching my above reply. It says: uniform BlockName3 { int mem; } instanceName3[4]; This definition will create a single member named "BlockName3.min". The reason this array of four blocks only counts as having one variable is because each of the four blocks uses the same internal definition. There is nothing that could be queried from BlockName3[1] that could not be queried from BlockName3[0]. (*)
Re: [Mesa-dev] [PATCH v2 043/103] i965/vec4: handle 32 and 64 bit channels in liveness analysis
Iago Toral Quirogawrites: > From: "Juan A. Suarez Romero" > > Our current data flow analysis does not take into account that channels > on 64-bit operands are 64-bit. This is a problem when the same register > is accessed using both 64-bit and 32-bit channels. This is very common > in operations where we need to access 64-bit data in 32-bit chunks, > such as the double packing and packing operations. > > This patch changes the analysis by checking the bits that each source > or destination datatype needs. Actually, rather than bits, we use > blocks of 32bits, which is the minimum channel size. > > Because a vgrf can contain a dvec4 (256 bits), we reserve 8 > 32-bit blocks to map the channels. > > v2 (Curro): > - Simplify code by making the var_from_reg helpers take an extra > argument with the register component we want. > - Fix a couple of cases where we had to update the code to the new > way of representing live variables. > --- > src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +- > src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 2 +- > .../dri/i965/brw_vec4_dead_code_eliminate.cpp | 25 + > .../drivers/dri/i965/brw_vec4_live_variables.cpp | 32 > +++--- > .../drivers/dri/i965/brw_vec4_live_variables.h | 15 ++ > 5 files changed, 42 insertions(+), 34 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp > b/src/mesa/drivers/dri/i965/brw_vec4.cpp > index 3191eab..34cab04 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp > @@ -1140,7 +1140,7 @@ vec4_visitor::opt_register_coalesce() >/* Can't coalesce this GRF if someone else was going to > * read it later. > */ > - if (var_range_end(var_from_reg(alloc, dst_reg(inst->src[0])), 4) > ip) > + if (var_range_end(var_from_reg(alloc, dst_reg(inst->src[0])), 8) > ip) >continue; > >/* We need to check interference with the final destination between > this > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp > index 1b91db9..bef897a 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp > @@ -246,7 +246,7 @@ vec4_visitor::opt_cse_local(bblock_t *block) > * more -- a sure sign they'll fail operands_match(). > */ > if (src->file == VGRF) { > - if (var_range_end(var_from_reg(alloc, dst_reg(*src)), 4) < > ip) { > + if (var_range_end(var_from_reg(alloc, dst_reg(*src)), 8) < > ip) { >entry->remove(); >ralloc_free(entry); >break; > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp > index 950c6c8..6a80810 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp > @@ -57,12 +57,13 @@ vec4_visitor::dead_code_eliminate() > if ((inst->dst.file == VGRF && !inst->has_side_effects()) || > (inst->dst.is_null() && inst->writes_flag())){ > bool result_live[4] = { false }; > - > if (inst->dst.file == VGRF) { > - for (unsigned i = 0; i < regs_written(inst); i++) { > - for (int c = 0; c < 4; c++) > - result_live[c] |= BITSET_TEST( > -live, var_from_reg(alloc, offset(inst->dst, i), c)); > + for (unsigned i = 0; i < 2 * regs_written(inst); i++) { One of the issues we discussed in the past about this approach is that it would overestimate the number of register OWORDs accessed by instructions with size_written < REG_SIZE (or size_read(i) < REG_SIZE), which will be emitted by the SIMD lowering pass. Now that the amount of data read and written by instructions is represented in byte units you can avoid this problem by using DIV_ROUND_UP(inst->size_written, 16) instead of 2 * regs_written(inst) above. > + for (int c = 0; c < 4; c++) { > + const unsigned v = > +var_from_reg(alloc, inst->dst, c, i); > + result_live[c] |= BITSET_TEST(live, v); > + } > } > } else { > for (unsigned c = 0; c < 4; c++) > @@ -111,11 +112,12 @@ vec4_visitor::dead_code_eliminate() > > if (inst->dst.file == VGRF && !inst->predicate && > !inst->is_align1_partial_write()) { > -for (unsigned i = 0; i < regs_written(inst); i++) { > +for (unsigned i = 0; i < 2 * regs_written(inst); i++) { Same rounding issue as above. > for (int c = 0; c < 4; c++) { >if (inst->dst.writemask & (1 << c)) { > -
Re: [Mesa-dev] [PATCH 1/3] gallium: add renderonly library
On Mon, Dec 19, 2016 at 08:54:04PM +0100, Christian Gmeiner wrote: > 2016-12-19 14:08 GMT+01:00 Thierry Reding: > > On Wed, Nov 30, 2016 at 02:44:34PM +0100, Christian Gmeiner wrote: [...] > >> GALLIUM_WINSYS_CFLAGS = \ > >> -I$(top_srcdir)/src \ > >> -I$(top_srcdir)/include \ > >> diff --git a/src/gallium/auxiliary/Makefile.am > >> b/src/gallium/auxiliary/Makefile.am > >> index 4a4a4fb..6b63cf1 100644 > >> --- a/src/gallium/auxiliary/Makefile.am > >> +++ b/src/gallium/auxiliary/Makefile.am > >> @@ -20,6 +20,16 @@ libgallium_la_SOURCES = \ > >> $(NIR_SOURCES) \ > >> $(GENERATED_SOURCES) > >> > >> +if HAVE_LIBDRM > >> + > >> +AM_CFLAGS += \ > >> + $(LIBDRM_CFLAGS) > > > > Same here. > > I just redone what other have done in that file. There are some if's > in that file and I am unsure > what to do here. Maybe Emil has a strong opinion here, I was really just nit-picking, so feel free to leave this. > >> diff --git a/src/gallium/auxiliary/Makefile.sources > >> b/src/gallium/auxiliary/Makefile.sources > >> index 5d4fe30..8d3e4a9 100644 > >> --- a/src/gallium/auxiliary/Makefile.sources > >> +++ b/src/gallium/auxiliary/Makefile.sources > >> @@ -435,3 +435,7 @@ GALLIVM_SOURCES := \ > >> draw/draw_llvm_sample.c \ > >> draw/draw_pt_fetch_shade_pipeline_llvm.c \ > >> draw/draw_vs_llvm.c > >> + > >> +RENDERONLY_SOURCES := \ > >> + renderonly/renderonly.c \ > >> + renderonly/renderonly.h > >> diff --git a/src/gallium/auxiliary/renderonly/renderonly.c > >> b/src/gallium/auxiliary/renderonly/renderonly.c > >> new file mode 100644 > >> index 000..c4ea784 > >> --- /dev/null > >> +++ b/src/gallium/auxiliary/renderonly/renderonly.c > >> @@ -0,0 +1,199 @@ > >> +/* > >> + * Copyright (C) 2016 Christian Gmeiner > >> + * > >> + * Permission is hereby granted, free of charge, to any person obtaining a > >> + * copy of this software and associated documentation files (the > >> "Software"), > >> + * to deal in the Software without restriction, including without > >> limitation > >> + * the rights to use, copy, modify, merge, publish, distribute, > >> sublicense, > >> + * and/or sell copies of the Software, and to permit persons to whom the > >> + * Software is furnished to do so, subject to the following conditions: > >> + * > >> + * The above copyright notice and this permission notice (including the > >> next > >> + * paragraph) shall be included in all copies or substantial portions of > >> the > >> + * Software. > >> + * > >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > >> EXPRESS OR > >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > >> MERCHANTABILITY, > >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT > >> SHALL > >> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR > >> OTHER > >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, > >> ARISING FROM, > >> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS > >> IN THE > >> + * SOFTWARE. > >> + * > >> + * Authors: > >> + *Christian Gmeiner > >> + */ > >> + > >> +#include "renderonly/renderonly.h" > > > > This include is oddly placed. Should it not go below all other includes? > > > > I think thats a matter of style but I can puit it above #include > "state_tracker/drm_driver.h" I prefer having these in hierarchical order. That is, anything from the C runtime goes first, then other system includes, followed by project core includes and finally driver-local includes. By that ordering renderonly.h would be very last in line. Not sure if anyone in Mesa is pedantic enough to insist on a common style, so feel free to leave this as-is if you prefer. > >> +struct pipe_screen * > >> +renderonly_screen_create(int fd, const struct renderonly_ops *ops, void > >> *priv) > > > > This is slightly nit-picky, but I found it confusing when first reading > > through the patches: I know this started out as an effort to support the > > various render-only devices out there, but what you're creating here is > > really an abstraction for the scanout engine. Reading renderonly_* the > > first thing I think of is that this creates a render-only device, where > > in fact is creates the scanout engine. > > > > Perhaps renaming this to struct scanout, struct scanout_engine or any > > other number of possibilities would remove that confusion. > > > > In my current version I have only one struct left: > > struct renderonly { >int (*tiling)(int fd, uint32_t handle); /**< for !intermediate_rendering */ >bool intermediate_rendering; >int kms_fd; >int gpu_fd; > }; > > And here is how the only function in imx-drm winsys looks like: > > struct pipe_screen *imx_drm_screen_create(int fd) > { >struct renderonly ro = { > .intermediate_rendering = true, > .kms_fd = fd, > .gpu_fd
Re: [Mesa-dev] [PATCH] glsl: Fix program interface queries relating to interface blocks.
On 12/16/2016 09:35 PM, Kenneth Graunke wrote: > This fixes 555 dEQP tests (using the nougat-cts-dev branch), Piglit's > arb_program_interface_query/arb_program_interface_query-resource-query, > and GL45-CTS.program_interface_query.separate-programs-{tess-control, > tess-eval,geometry}. Only one dEQP program interface failure remains. > > I would have liked to split this up into several distinct changes, but > I wasn't sure how to do that given thet tangled nature of these issues. > > So, the issues: > >* We need to treat interface blocks declared as an array of instances > as a single block - removing the outer array. The resource list > entry's name should not include the array length. Properties such > as GL_ARRAY_SIZE should refer to the variable inside the block, not > the interface block's array properties. > >* We need to do this prefixing even for structure variables. > >* We need to do this for built-ins (such as gl_PerVertex.gl_Position). > >* After interface array unwrapping, any variable which is an array > should have [0] appended. It doesn't matter if it's a TCS/TES/GS > input or TCS output - that looked like an attempt to unwrap for > per-vertex variables, but that didn't consider per-patch variables, > and as far as I can tell there's nothing to justify this. > > Several Mesa developers have suggested that Issue 16 contradicts the > main specification, but I believe that it doesn't - the main spec just > isn't terribly clear. The main ARB_program_interface query spec says: > > "* For an active interface block not declared as an array of block > instances, a single entry will be generated, using the block name from > the shader source. > >* For an active interface block declared as an array of instances, > separate entries will be generated for each active instance. The name > of the instance is formed by concatenating the block name, the "[" > character, an integer identifying the instance number, and the "]" > character." > > Issue 16 says that built-ins should be named "gl_PerVertex.gl_Position", > but several people suggested the second bullet above means that it > should be named "gl_PerVertex[array length].gl_Position". > > There are two important things to note. Those bullet points say > "an active interface block", while the others say "variable" or "active > shader storage block member". They also don't mention applying the > rules recursively (unlike the other bullets). Both suggest that > these rules apply to blocks themselves, not members of blocks. > > In fact, for GL_UNIFORM_BLOCK queries, we do have "block[0]", > "block[1]", ... resource list entries - so those rules are real, > and actually used. So if they don't apply to block members, then how > should members be named? Unfortunately, I don't see any rules outside > of issue 16 - where the rationale is very unclear. I hope to clarify > the spec in the future. Based on my understanding, something like out Vertex { vec4 p; vec4 c[3]; } vertex[2]; in a vertex shader should produce Vertex[0].p Vertex[0].c[0] (with GL_ARRAY_SIZE = 3) Vertex[1].p Vertex[1].c[0] (with GL_ARRAY_SIZE = 3) This is definitely what we would produce for a uniform block. What I've never understood is why that isn't done for gl_PerVertex stage boundaries where gl_PerVertex is explicitly arrayed. I think this patch is good for now. I like that it brings all the strange edge-case handling into one place. Reviewed-by: Ian Romanick> Cc: Ian Romanick > Cc: Alejandro Piñeiro > Signed-off-by: Kenneth Graunke > --- > src/compiler/glsl/linker.cpp | 100 > - > src/mesa/main/shader_query.cpp | 92 - > 2 files changed, 79 insertions(+), 113 deletions(-) > > diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp > index 5066014..5508d58 100644 > --- a/src/compiler/glsl/linker.cpp > +++ b/src/compiler/glsl/linker.cpp > @@ -3735,6 +3735,65 @@ add_shader_variable(const struct gl_context *ctx, > bool use_implicit_location, int location, > const glsl_type *outermost_struct_type = NULL) > { > + const glsl_type *interface_type = var->get_interface_type(); > + > + if (outermost_struct_type == NULL) { > + /* Unsized (non-patch) TCS output/TES input arrays are implicitly > + * sized to gl_MaxPatchVertices. Internally, we shrink them to a > + * smaller size. > + * > + * This can cause trouble with SSO programs. Since the TCS declares > + * the number of output vertices, we can always shrink TCS output > + * arrays. However, the TES might not be linked with a TCS, in > + * which case it won't know the size of the patch. In other words, > +
[Mesa-dev] [Bug 99154] Link time error when using multiple builtin functions
https://bugs.freedesktop.org/show_bug.cgi?id=99154 Bug ID: 99154 Summary: Link time error when using multiple builtin functions Product: Mesa Version: git Hardware: Other OS: All Status: NEW Severity: normal Priority: medium Component: glsl-compiler Assignee: mesa-dev@lists.freedesktop.org Reporter: niels_...@salscheider-online.de QA Contact: intel-3d-b...@lists.freedesktop.org Created attachment 128570 --> https://bugs.freedesktop.org/attachment.cgi?id=128570=edit Simple test case When using multiple builtin functions, linking of a shader can fail if temporary variables of the GLSL compiler collide. builtin_builder::binop creates variables named "x" and "y" for the parameters of builtin functions. Later, ir_call::generate_inline inlines the call to the builtins. It clones the variables (without changing the name) and assigns the function inputs to them. This can cause a collision of variable names, e. g. when using multiple builtin functions with parameters of different types. The attached test case can be used to reproduce the problem. I can work around the problem by adding a sequence number and possibly a prefix to the variable names but I am not sure if that is the correct solution. Thanks to imirkin and robclark for helping me to figure this out on IRC. -- You are receiving this mail because: You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 99154] Link time error when using multiple builtin functions
https://bugs.freedesktop.org/show_bug.cgi?id=99154 Niels Ole Salscheiderchanged: What|Removed |Added CC||niels_ole@salscheider-onlin ||e.de -- You are receiving this mail because: You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] imx: gallium driver for imx-drm scanout driver
On Mon, Dec 19, 2016 at 04:04:34PM +, Emil Velikov wrote: > On Monday, 19 December 2016, Thierry Reding> wrote: > > > On Wed, Nov 30, 2016 at 02:44:36PM +0100, Christian Gmeiner wrote: > > [...] > > > +static struct pipe_screen *imx_open_render_node(struct renderonly *ro) > > > +{ > > > + return etna_drm_screen_create_rendernode(ro); > > > +} > > > > Patch 2/3 never made it into my inbox for some reason, and had to find > > it in some archives. I'll have to resort to commenting on the code here. > > From what I saw, etna_drm_screen_create_rendernode() hard-codes the GPU > > render node (to /dev/dri/renderD128, I think). It's a little brittle for > > obvious reasons, but I think you might be able to get away with it, > > depending on the SoC. > > > > On Tegra we have a bunch of people that stick discrete GPUs into the > > PCIe slot and run a second instance of Nouveau on that. It's an > > interesting use-case, but it also has the drawback of creating a second > > renderD device. What's more, it uses the same kernel driver, so hard- > > coding the device node is going to give us a 50-50 chance on average > > that we get the right one. Back when I had written the original proposal > > I had also coded a heuristic that would walk sysfs and select the render > > node that was on the same bus as the card node. This was before Emil had > > removed the dependency on udev, but I've rewritten that code to work > > with Emil's drmDevice*() API, which needs some fleshing out[0]. > > > > Out of curiosity, is this something that could happen on i.MX devices as > > well? Or even if not i.MX, I'm fairly sure that there are other SoCs > > that have a Vivante GPU and a PCI slot that people could use to stick > > big GPUs into, so you may run into the same issue eventually. > > Thanks Thierry for the nice write-up. Must admit that I was feeling a bit > lazy. > > Considering the ~ok odds and the fact that we don't have platform support > for the drm*Device helpers I'm inclined to have this fixed after the merge. > > Let's have etna and(?) tegra and sort bugs during the RCs ;-) I'm fine if you want etnaviv as-is. I personally would want to hold off on Tegra for a wee bit longer. Let's see if Alex has a strong opinion. What are your requirements for release? Will you be freezing the libdrm dependency during RCs? I've got most of the patches ready to make this work reliably on Tegra, though I'd still have to port my patches to Christian's renderonly library. Thierry signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] radeonsi: add Polaris12 support (v2)
2016-12-19 21:41 GMT+01:00 Alex Deucher: > diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c > b/src/gallium/drivers/radeon/r600_pipe_common.c > index 0b5c6dc..76a34fe 100644 > --- a/src/gallium/drivers/radeon/r600_pipe_common.c > +++ b/src/gallium/drivers/radeon/r600_pipe_common.c > @@ -755,6 +755,7 @@ static const char* r600_get_chip_name(struct > r600_common_screen *rscreen) > case CHIP_FIJI: return "AMD FIJI"; > case CHIP_POLARIS10: return "AMD POLARIS10"; > case CHIP_POLARIS11: return "AMD POLARIS11"; > + case CHIP_POLARIS12: return "AMD POLARIS12"; > case CHIP_STONEY: return "AMD STONEY"; > default: return "AMD unknown"; > } > @@ -893,6 +894,11 @@ const char *r600_get_llvm_processor_name(enum > radeon_family family) > case CHIP_POLARIS10: return "polaris10"; > case CHIP_POLARIS11: return "polaris11"; > #endif LLVM <= 3.8 needs the same fallback as CHIP_POLARIS11. #if HAVE_LLVM <= 0x0308 needs to return "tonga" Thanks, Andreas > +#if HAVE_LLVM <= 0x0309 > + case CHIP_POLARIS12: return "polaris11"; > +#else > + case CHIP_POLARIS12: return "gfx803"; > +#endif > default: return ""; > } > } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] radeonsi: add Polaris12 support (v2)
From: Junwei Zhangv2: use gfxip names for llvm 4.0+ Signed-off-by: Junwei Zhang Reviewed-by: Nicolai Hähnle Acked-by: Christian König --- src/amd/addrlib/r800/ciaddrlib.cpp| 3 ++- src/amd/addrlib/r800/ciaddrlib.h | 1 + src/amd/common/amd_family.h | 1 + src/amd/common/amdgpu_id.h| 4 src/gallium/drivers/radeon/r600_pipe_common.c | 6 ++ src/gallium/drivers/radeon/radeon_vce.c | 3 ++- src/gallium/drivers/radeonsi/si_pipe.c| 1 + src/gallium/drivers/radeonsi/si_state.c | 1 + src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c | 4 9 files changed, 22 insertions(+), 2 deletions(-) diff --git a/src/amd/addrlib/r800/ciaddrlib.cpp b/src/amd/addrlib/r800/ciaddrlib.cpp index 7c5d29a..c726c4d 100644 --- a/src/amd/addrlib/r800/ciaddrlib.cpp +++ b/src/amd/addrlib/r800/ciaddrlib.cpp @@ -353,6 +353,7 @@ AddrChipFamily CIAddrLib::HwlConvertChipFamily( m_settings.isFiji= ASICREV_IS_FIJI_P(uChipRevision); m_settings.isPolaris10 = ASICREV_IS_POLARIS10_P(uChipRevision); m_settings.isPolaris11 = ASICREV_IS_POLARIS11_M(uChipRevision); +m_settings.isPolaris12 = ASICREV_IS_POLARIS12_V(uChipRevision); break; case FAMILY_CZ: m_settings.isCarrizo = 1; @@ -417,7 +418,7 @@ BOOL_32 CIAddrLib::HwlInitGlobalParams( { m_pipes = 16; } -else if (m_settings.isPolaris11) +else if (m_settings.isPolaris11 || m_settings.isPolaris12) { m_pipes = 4; } diff --git a/src/amd/addrlib/r800/ciaddrlib.h b/src/amd/addrlib/r800/ciaddrlib.h index de995fa..2c9a4cc 100644 --- a/src/amd/addrlib/r800/ciaddrlib.h +++ b/src/amd/addrlib/r800/ciaddrlib.h @@ -62,6 +62,7 @@ struct CIChipSettings UINT_32 isFiji: 1; UINT_32 isPolaris10 : 1; UINT_32 isPolaris11 : 1; +UINT_32 isPolaris12 : 1; // VI fusion (Carrizo) UINT_32 isCarrizo : 1; }; diff --git a/src/amd/common/amd_family.h b/src/amd/common/amd_family.h index 6a713ad..b09bbb8 100644 --- a/src/amd/common/amd_family.h +++ b/src/amd/common/amd_family.h @@ -91,6 +91,7 @@ enum radeon_family { CHIP_STONEY, CHIP_POLARIS10, CHIP_POLARIS11, +CHIP_POLARIS12, CHIP_LAST, }; diff --git a/src/amd/common/amdgpu_id.h b/src/amd/common/amdgpu_id.h index f91df55..1683a5a 100644 --- a/src/amd/common/amdgpu_id.h +++ b/src/amd/common/amdgpu_id.h @@ -142,6 +142,8 @@ enum { VI_POLARIS11_M_A0 = 90, + VI_POLARIS12_V_A0 = 100, + VI_UNKNOWN= 0xFF }; @@ -156,6 +158,8 @@ enum { ((eChipRev >= VI_POLARIS10_P_A0) && (eChipRev < VI_POLARIS11_M_A0)) #define ASICREV_IS_POLARIS11_M(eChipRev) \ (eChipRev >= VI_POLARIS11_M_A0) +#define ASICREV_IS_POLARIS12_V(eChipRev)\ + (eChipRev >= VI_POLARIS12_V_A0) /* CZ specific rev IDs */ enum { diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index 0b5c6dc..76a34fe 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -755,6 +755,7 @@ static const char* r600_get_chip_name(struct r600_common_screen *rscreen) case CHIP_FIJI: return "AMD FIJI"; case CHIP_POLARIS10: return "AMD POLARIS10"; case CHIP_POLARIS11: return "AMD POLARIS11"; + case CHIP_POLARIS12: return "AMD POLARIS12"; case CHIP_STONEY: return "AMD STONEY"; default: return "AMD unknown"; } @@ -893,6 +894,11 @@ const char *r600_get_llvm_processor_name(enum radeon_family family) case CHIP_POLARIS10: return "polaris10"; case CHIP_POLARIS11: return "polaris11"; #endif +#if HAVE_LLVM <= 0x0309 + case CHIP_POLARIS12: return "polaris11"; +#else + case CHIP_POLARIS12: return "gfx803"; +#endif default: return ""; } } diff --git a/src/gallium/drivers/radeon/radeon_vce.c b/src/gallium/drivers/radeon/radeon_vce.c index aad2ec1..dcd56ea 100644 --- a/src/gallium/drivers/radeon/radeon_vce.c +++ b/src/gallium/drivers/radeon/radeon_vce.c @@ -413,7 +413,8 @@ struct pipe_video_codec *rvce_create_encoder(struct pipe_context *context, enc->use_vui = true; if (rscreen->info.family >= CHIP_TONGA && rscreen->info.family != CHIP_STONEY && - rscreen->info.family != CHIP_POLARIS11) + rscreen->info.family != CHIP_POLARIS11 && + rscreen->info.family != CHIP_POLARIS12) enc->dual_pipe = true; /* TODO enable B frame with dual instance */ if ((rscreen->info.family >= CHIP_TONGA) && diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c index 88685f9..11cca6f 100644 ---
Re: [Mesa-dev] [PATCH 08/19] glsl: Walk a list of ir_dereference_array to mark array elements as accessed
On Thursday, December 15, 2016 8:10:20 PM PST Ian Romanick wrote: > From: Ian Romanick> > Signed-off-by: Ian Romanick > Cc: mesa-sta...@lists.freedesktop.org > --- > src/compiler/glsl/ir_array_refcount.cpp | 73 +++- > src/compiler/glsl/ir_array_refcount.h | 10 + > src/compiler/glsl/tests/array_refcount_test.cpp | 425 > > 3 files changed, 507 insertions(+), 1 deletion(-) > > diff --git a/src/compiler/glsl/ir_array_refcount.cpp > b/src/compiler/glsl/ir_array_refcount.cpp > index 75dca0a..7c9ab8e 100644 > --- a/src/compiler/glsl/ir_array_refcount.cpp > +++ b/src/compiler/glsl/ir_array_refcount.cpp > @@ -34,7 +34,7 @@ > #include "util/hash_table.h" > > ir_array_refcount_visitor::ir_array_refcount_visitor() > - : derefs(0), num_derefs(0), derefs_size(0) > + : last_array_deref(0), derefs(0), num_derefs(0), derefs_size(0) > { > this->mem_ctx = ralloc_context(NULL); > this->ht = _mesa_hash_table_create(NULL, _mesa_hash_pointer, > @@ -159,6 +159,77 @@ ir_array_refcount_visitor::get_array_deref() > return d; > } > > +ir_visitor_status > +ir_array_refcount_visitor::visit_enter(ir_dereference_array *ir) > +{ > + /* It could also be a vector or a matrix. Individual elements of vectors > +* are natrices are not tracked, so bail. and matrices > +*/ > + if (!ir->array->type->is_array()) > + return visit_continue; > + > + /* If this array dereference is a child of an array dereference that was > +* already visited, just continue on. Otherwise, for an arrays-of-arrays > +* dereference like x[1][2][3][4], we'd process the [1][2][3][4] sequence, > +* the [1][2][3] sequence, the [1][2] sequence, and the [1] sequence. > This > +* ensures that we only process the full sequence. > +*/ > + if (last_array_deref && last_array_deref->array == ir) { > + last_array_deref = ir; > + return visit_continue; > + } > + > + last_array_deref = ir; > + > + num_derefs = 0; > + > + ir_rvalue *rv = ir; > + while (rv->ir_type == ir_type_dereference_array) { > + ir_dereference_array *const deref = rv->as_dereference_array(); > + > + assert(deref != NULL); > + assert(deref->array->type->is_array()); > + > + ir_rvalue *const array = deref->array; > + const ir_constant *const idx = deref->array_index->as_constant(); > + array_deref_range *const dr = get_array_deref(); > + > + dr->size = array->type->array_size(); > + > + if (idx != NULL) { > + dr->index = idx->get_int_component(0); > + } else { > + /* An unsized array can occur at the end of an SSBO. We can't track > + * accesses to such an array, so bail. > + */ > + if (array->type->array_size() == 0) > +return visit_continue; > + > + dr->index = dr->size; > + } > + > + rv = array; > + } > + > + ir_dereference_variable *const var_deref = rv->as_dereference_variable(); > + > + /* If the array being dereferenced is not a variable, bail. At the very > +* least, ir_constant and ir_dereference_record are possible. > +*/ > + if (var_deref == NULL) > + return visit_continue; > + > + ir_array_refcount_entry *const entry = > + this->get_variable_entry(var_deref->var); > + > + if (entry == NULL) > + return visit_stop; > + > + entry->mark_array_elements_referenced(derefs, num_derefs); > + > + return visit_continue; > +} > + > > ir_visitor_status > ir_array_refcount_visitor::visit(ir_dereference_variable *ir) > diff --git a/src/compiler/glsl/ir_array_refcount.h > b/src/compiler/glsl/ir_array_refcount.h > index 2988046..46ba36c 100644 > --- a/src/compiler/glsl/ir_array_refcount.h > +++ b/src/compiler/glsl/ir_array_refcount.h > @@ -140,6 +140,7 @@ public: > virtual ir_visitor_status visit(ir_dereference_variable *); > > virtual ir_visitor_status visit_enter(ir_function_signature *); > + virtual ir_visitor_status visit_enter(ir_dereference_array *); > > /** > * Find variable in the hash table, and insert it if not present > @@ -158,6 +159,15 @@ private: > array_deref_range *get_array_deref(); > > /** > +* Last ir_dereference_array that was visited > +* > +* Used to prevent some redundant calculations. > +* > +* \sa ::visit_enter(ir_dereference_array *) > +*/ > + ir_dereference_array *last_array_deref; > + > + /** > * \name array_deref_range tracking > */ > /*@{*/ > diff --git a/src/compiler/glsl/tests/array_refcount_test.cpp > b/src/compiler/glsl/tests/array_refcount_test.cpp > index 29bb133..ecd7f46 100644 > --- a/src/compiler/glsl/tests/array_refcount_test.cpp > +++ b/src/compiler/glsl/tests/array_refcount_test.cpp > @@ -23,12 +23,18 @@ > #include > #include "ir.h" > #include "ir_array_refcount.h" > +#include "ir_builder.h" > +#include "util/hash_table.h" > + > +using
Re: [Mesa-dev] [PATCH 1/3] gallium: add renderonly library
Hi Thierry, 2016-12-19 14:08 GMT+01:00 Thierry Reding: > On Wed, Nov 30, 2016 at 02:44:34PM +0100, Christian Gmeiner wrote: >> This a very lightweight library to add basic support for >> renderonly GPUs. It does all the magic regarding in/exporting >> buffers etc. This library will likely break android support and >> hopefully will get replaced with a better solution based on gbm2. >> >> Signed-off-by: Christian Gmeiner >> --- >> src/gallium/Automake.inc | 5 + >> src/gallium/auxiliary/Makefile.am | 10 ++ >> src/gallium/auxiliary/Makefile.sources| 4 + >> src/gallium/auxiliary/renderonly/renderonly.c | 199 >> ++ >> src/gallium/auxiliary/renderonly/renderonly.h | 81 +++ >> 5 files changed, 299 insertions(+) >> create mode 100644 src/gallium/auxiliary/renderonly/renderonly.c >> create mode 100644 src/gallium/auxiliary/renderonly/renderonly.h > > Hi Christian, > > sorry for being late to the party. I only ran across this by accident. > Would you mind keeping me in Cc for subsequent versions of this? It's > been more than two years since I wrote the original proposal for this, > but I'm still interested in seeing a solution emerge. > Sure will add you to CC list. Btw. your timing could not be better I wanted to send out v2 today. But I will hold it back and incooperate your feedback. > Anyway, thanks for picking this up. > > From a diff point of view this is certainly much more terse than the > original proposal. I find it slightly unfortunate that the drivers for > the render GPU have to be changed. But it's hard to argue with the > reduction in size. > > I still think that Tegra will eventually require more than the stub that > you have in this series because the same device that exposes the scanout > engine also contains units that can do video compositing, decoding and > encoding. There's in fact recently been discussions on how best to > provide that functionality and Mesa looks like the best long-term target > currently. Anyway, that's a bridge I think we can cross when we get > there, this concept looks fine to get started. > >> diff --git a/src/gallium/Automake.inc b/src/gallium/Automake.inc >> index 6fe2e22..6aadcb9 100644 >> --- a/src/gallium/Automake.inc >> +++ b/src/gallium/Automake.inc >> @@ -50,6 +50,11 @@ GALLIUM_COMMON_LIB_DEPS = \ >> $(PTHREAD_LIBS) \ >> $(DLOPEN_LIBS) >> >> +if HAVE_LIBDRM >> +GALLIUM_COMMON_LIB_DEPS += \ >> + $(LIBDRM_LIBS) >> +endif > > Nit: Is the conditional necessary here? LIBDRM_LIBS would be empty if > libdrm wasn't found, right? So no harm in just appending it to the > GALLIUM_COMMON_LIB_DEPS variable unconditonally? > Done. >> GALLIUM_WINSYS_CFLAGS = \ >> -I$(top_srcdir)/src \ >> -I$(top_srcdir)/include \ >> diff --git a/src/gallium/auxiliary/Makefile.am >> b/src/gallium/auxiliary/Makefile.am >> index 4a4a4fb..6b63cf1 100644 >> --- a/src/gallium/auxiliary/Makefile.am >> +++ b/src/gallium/auxiliary/Makefile.am >> @@ -20,6 +20,16 @@ libgallium_la_SOURCES = \ >> $(NIR_SOURCES) \ >> $(GENERATED_SOURCES) >> >> +if HAVE_LIBDRM >> + >> +AM_CFLAGS += \ >> + $(LIBDRM_CFLAGS) > > Same here. I just redone what other have done in that file. There are some if's in that file and I am unsure what to do here. > >> diff --git a/src/gallium/auxiliary/Makefile.sources >> b/src/gallium/auxiliary/Makefile.sources >> index 5d4fe30..8d3e4a9 100644 >> --- a/src/gallium/auxiliary/Makefile.sources >> +++ b/src/gallium/auxiliary/Makefile.sources >> @@ -435,3 +435,7 @@ GALLIVM_SOURCES := \ >> draw/draw_llvm_sample.c \ >> draw/draw_pt_fetch_shade_pipeline_llvm.c \ >> draw/draw_vs_llvm.c >> + >> +RENDERONLY_SOURCES := \ >> + renderonly/renderonly.c \ >> + renderonly/renderonly.h >> diff --git a/src/gallium/auxiliary/renderonly/renderonly.c >> b/src/gallium/auxiliary/renderonly/renderonly.c >> new file mode 100644 >> index 000..c4ea784 >> --- /dev/null >> +++ b/src/gallium/auxiliary/renderonly/renderonly.c >> @@ -0,0 +1,199 @@ >> +/* >> + * Copyright (C) 2016 Christian Gmeiner >> + * >> + * Permission is hereby granted, free of charge, to any person obtaining a >> + * copy of this software and associated documentation files (the >> "Software"), >> + * to deal in the Software without restriction, including without limitation >> + * the rights to use, copy, modify, merge, publish, distribute, sublicense, >> + * and/or sell copies of the Software, and to permit persons to whom the >> + * Software is furnished to do so, subject to the following conditions: >> + * >> + * The above copyright notice and this permission notice (including the next >> + * paragraph) shall be included in all copies or substantial portions of the >> + * Software. >> + * >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS >> OR >> + * IMPLIED,
Re: [Mesa-dev] [PATCH 1/2] radeonsi: add Polaris12 support
On Mon, Dec 19, 2016 at 02:04:05PM -0500, Alex Deucher wrote: > From: Junwei Zhang> > Signed-off-by: Junwei Zhang > Reviewed-by: Nicolai Hähnle > Acked-by: Christian König > --- > src/amd/addrlib/r800/ciaddrlib.cpp| 3 ++- > src/amd/addrlib/r800/ciaddrlib.h | 1 + > src/amd/common/amd_family.h | 1 + > src/amd/common/amdgpu_id.h| 4 > src/gallium/drivers/radeon/r600_pipe_common.c | 6 ++ > src/gallium/drivers/radeon/radeon_vce.c | 3 ++- > src/gallium/drivers/radeonsi/si_pipe.c| 1 + > src/gallium/drivers/radeonsi/si_state.c | 1 + > src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c | 4 > 9 files changed, 22 insertions(+), 2 deletions(-) > > diff --git a/src/amd/addrlib/r800/ciaddrlib.cpp > b/src/amd/addrlib/r800/ciaddrlib.cpp > index 7c5d29a..c726c4d 100644 > --- a/src/amd/addrlib/r800/ciaddrlib.cpp > +++ b/src/amd/addrlib/r800/ciaddrlib.cpp > @@ -353,6 +353,7 @@ AddrChipFamily CIAddrLib::HwlConvertChipFamily( > m_settings.isFiji= ASICREV_IS_FIJI_P(uChipRevision); > m_settings.isPolaris10 = > ASICREV_IS_POLARIS10_P(uChipRevision); > m_settings.isPolaris11 = > ASICREV_IS_POLARIS11_M(uChipRevision); > +m_settings.isPolaris12 = > ASICREV_IS_POLARIS12_V(uChipRevision); > break; > case FAMILY_CZ: > m_settings.isCarrizo = 1; > @@ -417,7 +418,7 @@ BOOL_32 CIAddrLib::HwlInitGlobalParams( > { > m_pipes = 16; > } > -else if (m_settings.isPolaris11) > +else if (m_settings.isPolaris11 || m_settings.isPolaris12) > { > m_pipes = 4; > } > diff --git a/src/amd/addrlib/r800/ciaddrlib.h > b/src/amd/addrlib/r800/ciaddrlib.h > index de995fa..2c9a4cc 100644 > --- a/src/amd/addrlib/r800/ciaddrlib.h > +++ b/src/amd/addrlib/r800/ciaddrlib.h > @@ -62,6 +62,7 @@ struct CIChipSettings > UINT_32 isFiji: 1; > UINT_32 isPolaris10 : 1; > UINT_32 isPolaris11 : 1; > +UINT_32 isPolaris12 : 1; > // VI fusion (Carrizo) > UINT_32 isCarrizo : 1; > }; > diff --git a/src/amd/common/amd_family.h b/src/amd/common/amd_family.h > index 6a713ad..b09bbb8 100644 > --- a/src/amd/common/amd_family.h > +++ b/src/amd/common/amd_family.h > @@ -91,6 +91,7 @@ enum radeon_family { > CHIP_STONEY, > CHIP_POLARIS10, > CHIP_POLARIS11, > +CHIP_POLARIS12, > CHIP_LAST, > }; > > diff --git a/src/amd/common/amdgpu_id.h b/src/amd/common/amdgpu_id.h > index f91df55..1683a5a 100644 > --- a/src/amd/common/amdgpu_id.h > +++ b/src/amd/common/amdgpu_id.h > @@ -142,6 +142,8 @@ enum { > > VI_POLARIS11_M_A0 = 90, > > + VI_POLARIS12_V_A0 = 100, > + > VI_UNKNOWN= 0xFF > }; > > @@ -156,6 +158,8 @@ enum { > ((eChipRev >= VI_POLARIS10_P_A0) && (eChipRev < VI_POLARIS11_M_A0)) > #define ASICREV_IS_POLARIS11_M(eChipRev) \ > (eChipRev >= VI_POLARIS11_M_A0) > +#define ASICREV_IS_POLARIS12_V(eChipRev)\ > + (eChipRev >= VI_POLARIS12_V_A0) > > /* CZ specific rev IDs */ > enum { > diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c > b/src/gallium/drivers/radeon/r600_pipe_common.c > index 0b5c6dc..033e59c 100644 > --- a/src/gallium/drivers/radeon/r600_pipe_common.c > +++ b/src/gallium/drivers/radeon/r600_pipe_common.c > @@ -755,6 +755,7 @@ static const char* r600_get_chip_name(struct > r600_common_screen *rscreen) > case CHIP_FIJI: return "AMD FIJI"; > case CHIP_POLARIS10: return "AMD POLARIS10"; > case CHIP_POLARIS11: return "AMD POLARIS11"; > + case CHIP_POLARIS12: return "AMD POLARIS12"; > case CHIP_STONEY: return "AMD STONEY"; > default: return "AMD unknown"; > } > @@ -893,6 +894,11 @@ const char *r600_get_llvm_processor_name(enum > radeon_family family) > case CHIP_POLARIS10: return "polaris10"; > case CHIP_POLARIS11: return "polaris11"; > #endif > +#if HAVE_LLVM <= 0x0309 > + case CHIP_POLARIS12: return "polaris11"; > +#else > + case CHIP_POLARIS12: return "polaris12"; There is a preference to move away from code names and use gfxip names in the compiler. For LLVM >= 4.0 this should be "gfx803" -Tom > +#endif > default: return ""; > } > } > diff --git a/src/gallium/drivers/radeon/radeon_vce.c > b/src/gallium/drivers/radeon/radeon_vce.c > index aad2ec1..dcd56ea 100644 > --- a/src/gallium/drivers/radeon/radeon_vce.c > +++ b/src/gallium/drivers/radeon/radeon_vce.c > @@ -413,7 +413,8 @@ struct pipe_video_codec *rvce_create_encoder(struct > pipe_context *context, > enc->use_vui = true; > if (rscreen->info.family >= CHIP_TONGA && > rscreen->info.family != CHIP_STONEY && > - rscreen->info.family != CHIP_POLARIS11) > +
[Mesa-dev] [PATCH] i965: Don't bail on vertex element processing if we need draw params.
BaseVertex, BaseInstance, DrawID, and some edge flag conditions need vertex buffer and elements structs. We can't bail early in this case. Gen4-7 already do this properly. Gen8+ did not. Thanks to Ilia Mirkin for helping track this down. Cc: mesa-sta...@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99144 Reported-by: Pierre-Eric Pelloux-PrayerSigned-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/gen8_draw_upload.c | 34 ++-- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c b/src/mesa/drivers/dri/i965/gen8_draw_upload.c index 69ba8e9..3177f9a 100644 --- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c +++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c @@ -110,6 +110,22 @@ gen8_emit_vertices(struct brw_context *brw) ADVANCE_BATCH(); } + /* Normally we don't need an element for the SGVS attribute because the +* 3DSTATE_VF_SGVS instruction lets you store the generated attribute in an +* element that is past the list in 3DSTATE_VERTEX_ELEMENTS. However if +* we're using draw parameters then we need an element for the those +* values. Additionally if there is an edge flag element then the SGVS +* can't be inserted past that so we need a dummy element to ensure that +* the edge flag is the last one. +*/ + const bool needs_sgvs_element = (vs_prog_data->uses_basevertex || +vs_prog_data->uses_baseinstance || +((vs_prog_data->uses_instanceid || + vs_prog_data->uses_vertexid) && + uses_edge_flag)); + const unsigned nr_elements = + brw->vb.nr_enabled + needs_sgvs_element + vs_prog_data->uses_drawid; + /* If the VS doesn't read any inputs (calculating vertex position from * a state variable for some reason, for example), emit a single pad * VERTEX_ELEMENT struct and bail. @@ -117,7 +133,7 @@ gen8_emit_vertices(struct brw_context *brw) * The stale VB state stays in place, but they don't do anything unless * a VE loads from them. */ - if (brw->vb.nr_enabled == 0) { + if (nr_elements == 0) { BEGIN_BATCH(3); OUT_BATCH((_3DSTATE_VERTEX_ELEMENTS << 16) | (3 - 2)); OUT_BATCH((0 << GEN6_VE0_INDEX_SHIFT) | @@ -172,22 +188,6 @@ gen8_emit_vertices(struct brw_context *brw) ADVANCE_BATCH(); } - /* Normally we don't need an element for the SGVS attribute because the -* 3DSTATE_VF_SGVS instruction lets you store the generated attribute in an -* element that is past the list in 3DSTATE_VERTEX_ELEMENTS. However if -* we're using draw parameters then we need an element for the those -* values. Additionally if there is an edge flag element then the SGVS -* can't be inserted past that so we need a dummy element to ensure that -* the edge flag is the last one. -*/ - const bool needs_sgvs_element = (vs_prog_data->uses_basevertex || -vs_prog_data->uses_baseinstance || -((vs_prog_data->uses_instanceid || - vs_prog_data->uses_vertexid) && - uses_edge_flag)); - const unsigned nr_elements = - brw->vb.nr_enabled + needs_sgvs_element + vs_prog_data->uses_drawid; - /* The hardware allows one more VERTEX_ELEMENTS than VERTEX_BUFFERS, * presumably for VertexID/InstanceID. */ -- 2.10.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] gallium-docs: Add documentation for when using several contexts
Add documentation to explicit what can be expected and what is allowed when using several contexts. Signed-off-by: Axel Davy--- src/gallium/docs/source/context.rst | 23 +++ 1 file changed, 23 insertions(+) diff --git a/src/gallium/docs/source/context.rst b/src/gallium/docs/source/context.rst index e190cefc85..35f51a0941 100644 --- a/src/gallium/docs/source/context.rst +++ b/src/gallium/docs/source/context.rst @@ -707,3 +707,26 @@ notifications are single-shot, i.e. subsequent calls to since the last call or since the last notification by callback. * ``set_device_reset_callback`` sets a callback which will be called when a device reset is detected. The callback is only called synchronously. + +Using several contexts +-- + +Several contexts from the same screen can be used at the same time. Objects +created on one context cannot be used in another context, but the objects +created by the screen methods can be used by all contexts. + +Transfers +^ +A transfer on one context is not expected to synchronize properly with +rendering on other contexts, thus only areas not yet used for rendering should +be locked. + +A flush is required after transfer_unmap to expect other contexts to see the +uploaded data, unless: + +* Using persistent mapping. Associated with coherent mapping, unmapping the + resource is also not required to use it in other contexts. Without coherent + mapping, memory_barrier(PIPE_BARRIER_MAPPED_BUFFER) should be called on the + context that has mapped the resource. No flush is required. + +* Mapping the resource with PIPE_TRANSFER_MAP_DIRECTLY. -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFCv2] New gallium documentation to explicit threading requirements
This is a new RFC to replace "New gallium flags for using different contexts in several threads". Please comment. Yours, Axel Davy Axel Davy (2): gallium-docs: Add documentation for threading requirements gallium-docs: Add documentation for when using several contexts src/gallium/docs/source/context.rst | 23 +++ src/gallium/docs/source/screen.rst | 10 ++ 2 files changed, 33 insertions(+) -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] gallium-docs: Add documentation for threading requirements
Add documentation for the requirements related to threading for screens and contexts. Signed-off-by: Axel Davy--- src/gallium/docs/source/screen.rst | 10 ++ 1 file changed, 10 insertions(+) diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst index 7ac39ffc44..86aa2591ab 100644 --- a/src/gallium/docs/source/screen.rst +++ b/src/gallium/docs/source/screen.rst @@ -737,3 +737,13 @@ query group at the specified **index** is returned in **info**. The function returns non-zero on success. The driver-specific query group is described with the pipe_driver_query_group_info structure. + + +Thread safety +- + +Screen methods are required to be thread safe. While gallium rendering +contexts are not required to be thread safe, it is required to be safe to use +different contexts created with the same screen in different threads without +locks. It is also required to be safe using screen methods in a thread, while +using one of its contexts in another (without locks). -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] radeonsi: add Polaris12 PCI ID
From: Junwei ZhangSigned-off-by: Junwei Zhang Reviewed-by: Nicolai Hähnle --- include/pci_ids/radeonsi_pci_ids.h | 7 +++ 1 file changed, 7 insertions(+) diff --git a/include/pci_ids/radeonsi_pci_ids.h b/include/pci_ids/radeonsi_pci_ids.h index 20c1583..fca47b0 100644 --- a/include/pci_ids/radeonsi_pci_ids.h +++ b/include/pci_ids/radeonsi_pci_ids.h @@ -205,3 +205,10 @@ CHIPSET(0x67CF, POLARIS10_, POLARIS10) CHIPSET(0x67DF, POLARIS10_, POLARIS10) CHIPSET(0x98E4, STONEY_, STONEY) + +CHIPSET(0x6980, POLARIS12_, POLARIS12) +CHIPSET(0x6981, POLARIS12_, POLARIS12) +CHIPSET(0x6985, POLARIS12_, POLARIS12) +CHIPSET(0x6986, POLARIS12_, POLARIS12) +CHIPSET(0x6987, POLARIS12_, POLARIS12) +CHIPSET(0x699F, POLARIS12_, POLARIS12) -- 2.5.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] radeonsi: add Polaris12 support
From: Junwei ZhangSigned-off-by: Junwei Zhang Reviewed-by: Nicolai Hähnle Acked-by: Christian König --- src/amd/addrlib/r800/ciaddrlib.cpp| 3 ++- src/amd/addrlib/r800/ciaddrlib.h | 1 + src/amd/common/amd_family.h | 1 + src/amd/common/amdgpu_id.h| 4 src/gallium/drivers/radeon/r600_pipe_common.c | 6 ++ src/gallium/drivers/radeon/radeon_vce.c | 3 ++- src/gallium/drivers/radeonsi/si_pipe.c| 1 + src/gallium/drivers/radeonsi/si_state.c | 1 + src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c | 4 9 files changed, 22 insertions(+), 2 deletions(-) diff --git a/src/amd/addrlib/r800/ciaddrlib.cpp b/src/amd/addrlib/r800/ciaddrlib.cpp index 7c5d29a..c726c4d 100644 --- a/src/amd/addrlib/r800/ciaddrlib.cpp +++ b/src/amd/addrlib/r800/ciaddrlib.cpp @@ -353,6 +353,7 @@ AddrChipFamily CIAddrLib::HwlConvertChipFamily( m_settings.isFiji= ASICREV_IS_FIJI_P(uChipRevision); m_settings.isPolaris10 = ASICREV_IS_POLARIS10_P(uChipRevision); m_settings.isPolaris11 = ASICREV_IS_POLARIS11_M(uChipRevision); +m_settings.isPolaris12 = ASICREV_IS_POLARIS12_V(uChipRevision); break; case FAMILY_CZ: m_settings.isCarrizo = 1; @@ -417,7 +418,7 @@ BOOL_32 CIAddrLib::HwlInitGlobalParams( { m_pipes = 16; } -else if (m_settings.isPolaris11) +else if (m_settings.isPolaris11 || m_settings.isPolaris12) { m_pipes = 4; } diff --git a/src/amd/addrlib/r800/ciaddrlib.h b/src/amd/addrlib/r800/ciaddrlib.h index de995fa..2c9a4cc 100644 --- a/src/amd/addrlib/r800/ciaddrlib.h +++ b/src/amd/addrlib/r800/ciaddrlib.h @@ -62,6 +62,7 @@ struct CIChipSettings UINT_32 isFiji: 1; UINT_32 isPolaris10 : 1; UINT_32 isPolaris11 : 1; +UINT_32 isPolaris12 : 1; // VI fusion (Carrizo) UINT_32 isCarrizo : 1; }; diff --git a/src/amd/common/amd_family.h b/src/amd/common/amd_family.h index 6a713ad..b09bbb8 100644 --- a/src/amd/common/amd_family.h +++ b/src/amd/common/amd_family.h @@ -91,6 +91,7 @@ enum radeon_family { CHIP_STONEY, CHIP_POLARIS10, CHIP_POLARIS11, +CHIP_POLARIS12, CHIP_LAST, }; diff --git a/src/amd/common/amdgpu_id.h b/src/amd/common/amdgpu_id.h index f91df55..1683a5a 100644 --- a/src/amd/common/amdgpu_id.h +++ b/src/amd/common/amdgpu_id.h @@ -142,6 +142,8 @@ enum { VI_POLARIS11_M_A0 = 90, + VI_POLARIS12_V_A0 = 100, + VI_UNKNOWN= 0xFF }; @@ -156,6 +158,8 @@ enum { ((eChipRev >= VI_POLARIS10_P_A0) && (eChipRev < VI_POLARIS11_M_A0)) #define ASICREV_IS_POLARIS11_M(eChipRev) \ (eChipRev >= VI_POLARIS11_M_A0) +#define ASICREV_IS_POLARIS12_V(eChipRev)\ + (eChipRev >= VI_POLARIS12_V_A0) /* CZ specific rev IDs */ enum { diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index 0b5c6dc..033e59c 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -755,6 +755,7 @@ static const char* r600_get_chip_name(struct r600_common_screen *rscreen) case CHIP_FIJI: return "AMD FIJI"; case CHIP_POLARIS10: return "AMD POLARIS10"; case CHIP_POLARIS11: return "AMD POLARIS11"; + case CHIP_POLARIS12: return "AMD POLARIS12"; case CHIP_STONEY: return "AMD STONEY"; default: return "AMD unknown"; } @@ -893,6 +894,11 @@ const char *r600_get_llvm_processor_name(enum radeon_family family) case CHIP_POLARIS10: return "polaris10"; case CHIP_POLARIS11: return "polaris11"; #endif +#if HAVE_LLVM <= 0x0309 + case CHIP_POLARIS12: return "polaris11"; +#else + case CHIP_POLARIS12: return "polaris12"; +#endif default: return ""; } } diff --git a/src/gallium/drivers/radeon/radeon_vce.c b/src/gallium/drivers/radeon/radeon_vce.c index aad2ec1..dcd56ea 100644 --- a/src/gallium/drivers/radeon/radeon_vce.c +++ b/src/gallium/drivers/radeon/radeon_vce.c @@ -413,7 +413,8 @@ struct pipe_video_codec *rvce_create_encoder(struct pipe_context *context, enc->use_vui = true; if (rscreen->info.family >= CHIP_TONGA && rscreen->info.family != CHIP_STONEY && - rscreen->info.family != CHIP_POLARIS11) + rscreen->info.family != CHIP_POLARIS11 && + rscreen->info.family != CHIP_POLARIS12) enc->dual_pipe = true; /* TODO enable B frame with dual instance */ if ((rscreen->info.family >= CHIP_TONGA) && diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c index 88685f9..11cca6f 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.c +++
Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0
On Mon, Dec 19, 2016 at 2:00 AM, Samuel Iglesias Gonsálvezwrote: > Hello Matt, > > We have done most of the suggestions you made to our patches. However, > we have replied to some of your questions/suggestions and we are > waiting for a reply before marking them as R-b or not. Thank you guys so much. > You can clone the new version of the patch series by running this > command: > > $ git clone -b i965-fp64-gen7-scalar-vec4-rc3 https://github.com/Igalia > /mesa.git > > Below is the list of patches that need a R-b (they are marked as > UNREVIEWED in the branch). > > * i965/vec4: implement hardware workaround for align16 double to float > conversion >> >> This always seemed like a really strange hardware bug, and > one >> that no one should ever hit. >> >> I'd prefer that, instead of loading an immediate double and >> then >> performing a conversion to float, that we just convert the >> double to float in the compiler and emit an instruction to > load >> that. >> > > We have done this. Does this change get your R-b? Yes! > > * i965/vec4: fix optimize predicate for doubles > > We have replied here [0]. Sounds good to me. > > * i965/vec4: handle 32 and 64 bit channels in liveness analysis > > It is still unreviewed. Maybe Curro can take a look at it. I've also pinged Curro to ask if he'll review it. > * i965/vec4: add a SIMD lowering pass > > Replied here [1]. Silly messy hardware. :) > * i965/vec4: Prevent copy propagation from violating pre-gen8 > restrictions > > Replied here [1]. > > * i965/vec4: run scalarize_df() after spilling > > Replied here [1]. Makes sense. Yes, all of those should be Reviewed-by: Matt Turner Again, thank you so much. This was a large amount of work, and the way you guys handled it was extremely impressive. I'm only sorry that the review of your work wasn't executed as well as your actual work! ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] imx: gallium driver for imx-drm scanout driver
On Monday, 19 December 2016, Thierry Redingwrote: > On Wed, Nov 30, 2016 at 02:44:36PM +0100, Christian Gmeiner wrote: > [...] > > +static struct pipe_screen *imx_open_render_node(struct renderonly *ro) > > +{ > > + return etna_drm_screen_create_rendernode(ro); > > +} > > Patch 2/3 never made it into my inbox for some reason, and had to find > it in some archives. I'll have to resort to commenting on the code here. > From what I saw, etna_drm_screen_create_rendernode() hard-codes the GPU > render node (to /dev/dri/renderD128, I think). It's a little brittle for > obvious reasons, but I think you might be able to get away with it, > depending on the SoC. > > On Tegra we have a bunch of people that stick discrete GPUs into the > PCIe slot and run a second instance of Nouveau on that. It's an > interesting use-case, but it also has the drawback of creating a second > renderD device. What's more, it uses the same kernel driver, so hard- > coding the device node is going to give us a 50-50 chance on average > that we get the right one. Back when I had written the original proposal > I had also coded a heuristic that would walk sysfs and select the render > node that was on the same bus as the card node. This was before Emil had > removed the dependency on udev, but I've rewritten that code to work > with Emil's drmDevice*() API, which needs some fleshing out[0]. > > Out of curiosity, is this something that could happen on i.MX devices as > well? Or even if not i.MX, I'm fairly sure that there are other SoCs > that have a Vivante GPU and a PCI slot that people could use to stick > big GPUs into, so you may run into the same issue eventually. > > Thanks Thierry for the nice write-up. Must admit that I was feeling a bit lazy. Considering the ~ok odds and the fact that we don't have platform support for the drm*Device helpers I'm inclined to have this fixed after the merge. Let's have etna and(?) tegra and sort bugs during the RCs ;-) Emil P.S. Pardon the html formatting ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] imx: gallium driver for imx-drm scanout driver
On Wed, Nov 30, 2016 at 02:44:36PM +0100, Christian Gmeiner wrote: [...] > +static struct pipe_screen *imx_open_render_node(struct renderonly *ro) > +{ > + return etna_drm_screen_create_rendernode(ro); > +} Patch 2/3 never made it into my inbox for some reason, and had to find it in some archives. I'll have to resort to commenting on the code here. From what I saw, etna_drm_screen_create_rendernode() hard-codes the GPU render node (to /dev/dri/renderD128, I think). It's a little brittle for obvious reasons, but I think you might be able to get away with it, depending on the SoC. On Tegra we have a bunch of people that stick discrete GPUs into the PCIe slot and run a second instance of Nouveau on that. It's an interesting use-case, but it also has the drawback of creating a second renderD device. What's more, it uses the same kernel driver, so hard- coding the device node is going to give us a 50-50 chance on average that we get the right one. Back when I had written the original proposal I had also coded a heuristic that would walk sysfs and select the render node that was on the same bus as the card node. This was before Emil had removed the dependency on udev, but I've rewritten that code to work with Emil's drmDevice*() API, which needs some fleshing out[0]. Out of curiosity, is this something that could happen on i.MX devices as well? Or even if not i.MX, I'm fairly sure that there are other SoCs that have a Vivante GPU and a PCI slot that people could use to stick big GPUs into, so you may run into the same issue eventually. Thierry [0]: Interestingly, things would probably work for Tegra regardless of whether the GPU is the Tegra one or a discrete one, as long as the discrete is driven by Nouveau. So perhaps an even more interesting heuristic would be to look for a render node that's driven by the Nouveau driver... signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure.ac: Respect LLVM_CFLAGS in LLVM version detection
On Mon, Dec 19, 2016 at 8:06 PM, Emil Velikovwrote: > On 19 December 2016 at 08:52, Tomasz Figa wrote: >> Hi Tobias, >> >> On Sat, Dec 17, 2016 at 2:15 AM, Tobias Droste wrote: >>> Hi Tomasz, >>> >>> does this actually fix anything? >>> >>> Because right now llvm-config.h does not include anything and I doubt it >>> will >>> in the future, as it's just a collection of defines. >>> The path to the header file itself is given by llvm-config >>> ($LLVM_INCLUDEDIR). >>> >>> Did you just happen to see this or do you get an error without this patch? >> >> We happen to have the setup exactly as described in the patch >> description, i.e. LLVM in a non-standard directory and with >> llvm-config.h that includes another header. Without the patch >> ./configure fails because of LLVM version detection errors. >> > I believe you're spot on - be that due to missing include [directives] > or conditional header inclusion one would need to have the respective > CFLAGS. Thus the patch is > Reviewed-by: Emil Velikov > > At the same time - I seems to be lucky enough to have the headers sane > [neither includes nor defines are required] on my Arch setup. > Wondering if it's not something specific to the Android way of > building LLVM Yeah, that's very likely. :) Although I can imagine distributors creating a wrapper including one or another real header, depending on some setting. I might have seen something like this in Gentoo, but I think it used absolute paths, which made it compile without any flags. > - IIRC the Android-x86 guys were explaining some > interesting things. > ... but that for another day. I'll commit your patch once I > double-check Tobias' fixes. Thanks! Best regards, Tomasz ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] gallium: add renderonly library
On Wed, Nov 30, 2016 at 02:44:34PM +0100, Christian Gmeiner wrote: > This a very lightweight library to add basic support for > renderonly GPUs. It does all the magic regarding in/exporting > buffers etc. This library will likely break android support and > hopefully will get replaced with a better solution based on gbm2. > > Signed-off-by: Christian Gmeiner> --- > src/gallium/Automake.inc | 5 + > src/gallium/auxiliary/Makefile.am | 10 ++ > src/gallium/auxiliary/Makefile.sources| 4 + > src/gallium/auxiliary/renderonly/renderonly.c | 199 > ++ > src/gallium/auxiliary/renderonly/renderonly.h | 81 +++ > 5 files changed, 299 insertions(+) > create mode 100644 src/gallium/auxiliary/renderonly/renderonly.c > create mode 100644 src/gallium/auxiliary/renderonly/renderonly.h Hi Christian, sorry for being late to the party. I only ran across this by accident. Would you mind keeping me in Cc for subsequent versions of this? It's been more than two years since I wrote the original proposal for this, but I'm still interested in seeing a solution emerge. Anyway, thanks for picking this up. From a diff point of view this is certainly much more terse than the original proposal. I find it slightly unfortunate that the drivers for the render GPU have to be changed. But it's hard to argue with the reduction in size. I still think that Tegra will eventually require more than the stub that you have in this series because the same device that exposes the scanout engine also contains units that can do video compositing, decoding and encoding. There's in fact recently been discussions on how best to provide that functionality and Mesa looks like the best long-term target currently. Anyway, that's a bridge I think we can cross when we get there, this concept looks fine to get started. > diff --git a/src/gallium/Automake.inc b/src/gallium/Automake.inc > index 6fe2e22..6aadcb9 100644 > --- a/src/gallium/Automake.inc > +++ b/src/gallium/Automake.inc > @@ -50,6 +50,11 @@ GALLIUM_COMMON_LIB_DEPS = \ > $(PTHREAD_LIBS) \ > $(DLOPEN_LIBS) > > +if HAVE_LIBDRM > +GALLIUM_COMMON_LIB_DEPS += \ > + $(LIBDRM_LIBS) > +endif Nit: Is the conditional necessary here? LIBDRM_LIBS would be empty if libdrm wasn't found, right? So no harm in just appending it to the GALLIUM_COMMON_LIB_DEPS variable unconditonally? > GALLIUM_WINSYS_CFLAGS = \ > -I$(top_srcdir)/src \ > -I$(top_srcdir)/include \ > diff --git a/src/gallium/auxiliary/Makefile.am > b/src/gallium/auxiliary/Makefile.am > index 4a4a4fb..6b63cf1 100644 > --- a/src/gallium/auxiliary/Makefile.am > +++ b/src/gallium/auxiliary/Makefile.am > @@ -20,6 +20,16 @@ libgallium_la_SOURCES = \ > $(NIR_SOURCES) \ > $(GENERATED_SOURCES) > > +if HAVE_LIBDRM > + > +AM_CFLAGS += \ > + $(LIBDRM_CFLAGS) Same here. > diff --git a/src/gallium/auxiliary/Makefile.sources > b/src/gallium/auxiliary/Makefile.sources > index 5d4fe30..8d3e4a9 100644 > --- a/src/gallium/auxiliary/Makefile.sources > +++ b/src/gallium/auxiliary/Makefile.sources > @@ -435,3 +435,7 @@ GALLIVM_SOURCES := \ > draw/draw_llvm_sample.c \ > draw/draw_pt_fetch_shade_pipeline_llvm.c \ > draw/draw_vs_llvm.c > + > +RENDERONLY_SOURCES := \ > + renderonly/renderonly.c \ > + renderonly/renderonly.h > diff --git a/src/gallium/auxiliary/renderonly/renderonly.c > b/src/gallium/auxiliary/renderonly/renderonly.c > new file mode 100644 > index 000..c4ea784 > --- /dev/null > +++ b/src/gallium/auxiliary/renderonly/renderonly.c > @@ -0,0 +1,199 @@ > +/* > + * Copyright (C) 2016 Christian Gmeiner > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > + * copy of this software and associated documentation files (the "Software"), > + * to deal in the Software without restriction, including without limitation > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > + * and/or sell copies of the Software, and to permit persons to whom the > + * Software is furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice (including the next > + * paragraph) shall be included in all copies or substantial portions of the > + * Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > FROM, > + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > THE > + * SOFTWARE. > + * > + * Authors: > + *Christian Gmeiner > + */ >
Re: [Mesa-dev] [PATCH 3/3] imx: gallium driver for imx-drm scanout driver
On Thu, Dec 01, 2016 at 03:00:20PM +, Emil Velikov wrote: > On 30 November 2016 at 13:44, Christian Gmeiner >wrote: > > The imx (stub) driver is needed to get hardware acceleration from > > etnaviv on a platform using imx-drm kms driver. This adds support > > for wayland and native kms egl apps. > > > > Signed-off-by: Christian Gmeiner > > --- > > configure.ac | 12 +++ > > src/gallium/Makefile.am| 4 +++ > > .../auxiliary/pipe-loader/pipe_loader_drm.c| 5 +++ > > src/gallium/auxiliary/target-helpers/drm_helper.h | 23 > > .../auxiliary/target-helpers/drm_helper_public.h | 3 ++ > > src/gallium/drivers/imx/Automake.inc | 9 + > > src/gallium/drivers/imx/Makefile.am| 9 + > > src/gallium/winsys/imx/drm/Makefile.am | 33 + > > src/gallium/winsys/imx/drm/Makefile.sources| 3 ++ > > src/gallium/winsys/imx/drm/imx_drm_public.h| 31 > > src/gallium/winsys/imx/drm/imx_drm_winsys.c| 41 > > ++ > > 11 files changed, 173 insertions(+) > I think you want to add the following to src/gallium/targets/dri/Makefile.am > > include $(top_srcdir)/src/gallium/drivers/imx/Automake.inc > > Otherwise there will be no imx_dri.so module which you can use. > ^^ is a must have afaics, everything else (mentioned below) can be > tackled at a later stage. > > A set of targets/pipe-loader/* changes would be nice... unless I beat > you to it and fold the final round of duplication that we have in the > pipe-loader/targets topic ;-) > > > > +include Makefile.sources > > +include $(top_srcdir)/src/gallium/Automake.inc > > + > > +AM_CFLAGS = \ > > + -I$(top_srcdir)/src/gallium/drivers \ > Add the following and then ... >-I$(top_srcdir)/src/gallium/winsys \ > > > + $(GALLIUM_WINSYS_CFLAGS) \ > > + $(IMX_CFLAGS) > > + > > +noinst_LTLIBRARIES = libimxdrm.la > > + > > +libimxdrm_la_SOURCES = $(C_SOURCES) > > \ No newline at end of file > Please add the missing newlines throughout. > > > > diff --git a/src/gallium/winsys/imx/drm/imx_drm_public.h > > b/src/gallium/winsys/imx/drm/imx_drm_public.h > > new file mode 100644 > > index 000..2d93da2 > > --- /dev/null > > +++ b/src/gallium/winsys/imx/drm/imx_drm_public.h > > @@ -0,0 +1,31 @@ > > +/* > > + * Copyright © 2014 NVIDIA Corporation > > + * > Disclaimer: IANAL > > Here and other copyright notices could be updated to reflect you. > Things have changed noticeably that any recemblense with the original > is conicidential. Agreed, there's nothing in this file that I'd consider copyright of NVIDIA. If there's anything copyrightable in this series at all I'd think it'd be the create/export/import/tiling dance that the render- only library does. Thierry signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] egl: syncronize surface information with driver RFC
This makes sure that the values we return are in sync what the driver currently has. Together with dEQP change bug #98327 this fixes following test: dEQP-EGL.functional.resize.surface_size.grow v2: implement callback also for dri3 v3: make optional for dri2 drivers, only x11 dri2 seems to require this right now Signed-off-by: Tapani PälliBugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98327 --- src/egl/drivers/dri2/egl_dri2.c | 12 src/egl/drivers/dri2/egl_dri2.h | 2 ++ src/egl/drivers/dri2/platform_x11.c | 31 +++ src/egl/main/eglapi.h | 2 ++ src/egl/main/eglfallbacks.c | 1 + src/egl/main/eglsurface.c | 3 +++ 6 files changed, 51 insertions(+) diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c index 52fbdff..112c6ad 100644 --- a/src/egl/drivers/dri2/egl_dri2.c +++ b/src/egl/drivers/dri2/egl_dri2.c @@ -1264,6 +1264,17 @@ dri2_destroy_surface(_EGLDriver *drv, _EGLDisplay *dpy, _EGLSurface *surf) return dri2_dpy->vtbl->destroy_surface(drv, dpy, surf); } +static EGLBoolean +dri2_sync_surface(_EGLDriver *drv, _EGLDisplay *dpy, _EGLSurface *surf) +{ + struct dri2_egl_display *dri2_dpy = dri2_egl_display(dpy); + + if (dri2_dpy->vtbl->sync_surface) + return dri2_dpy->vtbl->sync_surface(drv, dpy, surf); + + return EGL_TRUE; +} + /** * Called via eglMakeCurrent(), drv->API.MakeCurrent(). */ @@ -2942,6 +2953,7 @@ _eglBuiltInDriverDRI2(const char *args) dri2_drv->base.API.CreatePixmapSurface = dri2_create_pixmap_surface; dri2_drv->base.API.CreatePbufferSurface = dri2_create_pbuffer_surface; dri2_drv->base.API.DestroySurface = dri2_destroy_surface; + dri2_drv->base.API.SyncSurface = dri2_sync_surface; dri2_drv->base.API.GetProcAddress = dri2_get_proc_address; dri2_drv->base.API.WaitClient = dri2_wait_client; dri2_drv->base.API.WaitNative = dri2_wait_native; diff --git a/src/egl/drivers/dri2/egl_dri2.h b/src/egl/drivers/dri2/egl_dri2.h index eac58f3..d5cc9b8 100644 --- a/src/egl/drivers/dri2/egl_dri2.h +++ b/src/egl/drivers/dri2/egl_dri2.h @@ -110,6 +110,8 @@ struct dri2_egl_display_vtbl { EGLBoolean (*destroy_surface)(_EGLDriver *drv, _EGLDisplay *dpy, _EGLSurface *surface); + EGLBoolean (*sync_surface)(_EGLDriver *drv, _EGLDisplay *dpy, + _EGLSurface *surface); EGLBoolean (*swap_interval)(_EGLDriver *drv, _EGLDisplay *dpy, _EGLSurface *surf, EGLint interval); diff --git a/src/egl/drivers/dri2/platform_x11.c b/src/egl/drivers/dri2/platform_x11.c index db7d3b9..ecbd8de 100644 --- a/src/egl/drivers/dri2/platform_x11.c +++ b/src/egl/drivers/dri2/platform_x11.c @@ -395,6 +395,36 @@ dri2_x11_destroy_surface(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *surf) } /** + * Syncronize surface geometry with server + */ +static EGLBoolean +dri2_x11_sync_surface(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *surf) +{ + struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp); + struct dri2_egl_surface *dri2_surf = dri2_egl_surface(surf); + xcb_get_geometry_cookie_t cookie; + xcb_get_geometry_reply_t *reply; + EGLBoolean result = EGL_TRUE; + xcb_generic_error_t *error; + + cookie = xcb_get_geometry (dri2_dpy->conn, dri2_surf->drawable); + reply = xcb_get_geometry_reply (dri2_dpy->conn, cookie, ); + if (reply == NULL) + return EGL_FALSE; + + if (error != NULL) { + _eglLog(_EGL_WARNING, "error in xcb_get_geometry"); + result = EGL_FALSE; + free(error); + } else { + surf->Width = reply->width; + surf->Height = reply->height; + } + free(reply); + return result; +} + +/** * Process list of buffer received from the server * * Processes the list of buffers received in a reply from the server to either @@ -1124,6 +1154,7 @@ static struct dri2_egl_display_vtbl dri2_x11_display_vtbl = { .create_pixmap_surface = dri2_x11_create_pixmap_surface, .create_pbuffer_surface = dri2_x11_create_pbuffer_surface, .destroy_surface = dri2_x11_destroy_surface, + .sync_surface = dri2_x11_sync_surface, .create_image = dri2_x11_create_image_khr, .swap_interval = dri2_x11_swap_interval, .swap_buffers = dri2_x11_swap_buffers, diff --git a/src/egl/main/eglapi.h b/src/egl/main/eglapi.h index 710c5d8..5152b93 100644 --- a/src/egl/main/eglapi.h +++ b/src/egl/main/eglapi.h @@ -94,6 +94,8 @@ struct _egl_api const EGLint *attrib_list); EGLBoolean (*DestroySurface)(_EGLDriver *drv, _EGLDisplay *dpy, _EGLSurface *surface); + EGLBoolean (*SyncSurface)(_EGLDriver *drv, _EGLDisplay *dpy, + _EGLSurface *surface); EGLBoolean (*QuerySurface)(_EGLDriver *drv, _EGLDisplay *dpy, _EGLSurface *surface, EGLint attribute,
[Mesa-dev] [PATCH] new EGL internal SyncSurface hook
Hi; I've been investigating dEQP EGL bugs. Following test seems to fail now and then and only on X11 when using DRI2: dEQP-EGL.functional.resize.surface_size.grow This patch is RFC for a 'SyncSurface' API that will query surface geometry (xcb_get_geometry) before returning current stored values in '_eglQuerySurface'. Without this we might have wrong values in place, I believe this is related to following comments in the source code .. --- 8< --- src/mesa/drivers/dri/i965/brw_context.c: /* GLX uses DRI2 invalidate events to handle window resizing. * Unfortunately, EGL does not - libEGL is written in XCB (not Xlib), * which doesn't provide a mechanism for snooping the event queues. * * So EGL still relies on viewport hacks to handle window resizing. * This should go away with DRI3000. */ src/egl/drivers/dri2/platform_x11.c: /* Since we aren't watching for the server's invalidate events like we're * supposed to (due to XCB providing no mechanism for filtering the events * the way xlib does), and SwapBuffers is a common cause of invalidate * events, just shove one down to the driver, even though we haven't told * the driver that we're the kind of loader that provides reliable * invalidate events. This causes the driver to request buffers again at * its next draw, so that we get the correct buffers if a pageflip * happened. The driver should still be using the viewport hack to catch * window resizes. */ --- 8< --- This patch together with a dEQP change I've attached to bug #98327 makes this test pass consistently for me on 2 machines, one is running Fedora with DRI3 and another one Ubuntu with DRI2. Is this OK approach? Should the name be something else like SyncSurfaceGeometry or GetSurfaceGeometry? Any comments appreciated; Tapani Pälli (1): egl: syncronize surface information with driver RFC src/egl/drivers/dri2/egl_dri2.c | 12 src/egl/drivers/dri2/egl_dri2.h | 2 ++ src/egl/drivers/dri2/platform_x11.c | 31 +++ src/egl/main/eglapi.h | 2 ++ src/egl/main/eglfallbacks.c | 1 + src/egl/main/eglsurface.c | 3 +++ 6 files changed, 51 insertions(+) -- 2.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: Drop bogus is_vertex_input from add_shader_variable().
lgtm Reviewed-by: Alejandro PiñeiroOn 17/12/16 03:46, Kenneth Graunke wrote: > stage_mask is a bitmask of shader stages, so the proper comparison would > be (1 << MESA_SHADER_VERTEX), not MESA_SHADER_VERTEX itself. > > But we only care for structure types, and VS inputs cannot be structs. > So we can just drop this entirely. > > Signed-off-by: Kenneth Graunke > --- > src/compiler/glsl/linker.cpp | 7 +-- > 1 file changed, 1 insertion(+), 6 deletions(-) > > Sorry, I forgot I split this into two patches...apply this one before > [PATCH] glsl: Fix program interface queries relating to interface blocks. > > diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp > index 3660257..5066014 100644 > --- a/src/compiler/glsl/linker.cpp > +++ b/src/compiler/glsl/linker.cpp > @@ -3735,10 +3735,6 @@ add_shader_variable(const struct gl_context *ctx, > bool use_implicit_location, int location, > const glsl_type *outermost_struct_type = NULL) > { > - const bool is_vertex_input = > - programInterface == GL_PROGRAM_INPUT && > - stage_mask == MESA_SHADER_VERTEX; > - > switch (type->base_type) { > case GLSL_TYPE_STRUCT: { >/* The ARB_program_interface_query spec says: > @@ -3764,8 +3760,7 @@ add_shader_variable(const struct gl_context *ctx, >outermost_struct_type)) > return false; > > - field_location += > -field->type->count_attribute_slots(is_vertex_input); > + field_location += field->type->count_attribute_slots(false); >} >return true; > } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure.ac: Respect LLVM_CFLAGS in LLVM version detection
On 19 December 2016 at 08:52, Tomasz Figawrote: > Hi Tobias, > > On Sat, Dec 17, 2016 at 2:15 AM, Tobias Droste wrote: >> Hi Tomasz, >> >> does this actually fix anything? >> >> Because right now llvm-config.h does not include anything and I doubt it will >> in the future, as it's just a collection of defines. >> The path to the header file itself is given by llvm-config >> ($LLVM_INCLUDEDIR). >> >> Did you just happen to see this or do you get an error without this patch? > > We happen to have the setup exactly as described in the patch > description, i.e. LLVM in a non-standard directory and with > llvm-config.h that includes another header. Without the patch > ./configure fails because of LLVM version detection errors. > I believe you're spot on - be that due to missing include [directives] or conditional header inclusion one would need to have the respective CFLAGS. Thus the patch is Reviewed-by: Emil Velikov At the same time - I seems to be lucky enough to have the headers sane [neither includes nor defines are required] on my Arch setup. Wondering if it's not something specific to the Android way of building LLVM - IIRC the Android-x86 guys were explaining some interesting things. ... but that for another day. I'll commit your patch once I double-check Tobias' fixes. Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: Fix program interface queries relating to interface blocks.
On 17/12/16 17:57, Kenneth Graunke wrote: > On Saturday, December 17, 2016 5:41:35 PM PST Alejandro Piñeiro wrote: >> On 17/12/16 03:35, Kenneth Graunke wrote: >>> This fixes 555 dEQP tests (using the nougat-cts-dev branch), Piglit's >>> arb_program_interface_query/arb_program_interface_query-resource-query, >> Somewhat offtopic: FWIW, piglit tests for arb_program_interface_query >> doesn't tests too much the REFERENCED_BY_XXX property (see below). I >> have a wip piglit tests locally that still fails. Mentioning this >> because ... >> >>> and GL45-CTS.program_interface_query.separate-programs-{tess-control, >>> tess-eval,geometry}. Only one dEQP program interface failure remains. >> ... as far as I see, just one CTS program interface query keeps failing >> after your patch, and it is caused by some issues with >> REFERENCED_BY_XXX. In any case, about this, now Im somewhat more >> optimistic, after realizing (while testing your patch) that Ian Romanick >> fixed GL45-CTS.geometry_shader.program_resource.program_resource, and >> cleaned up referenced_by on the patch "linker: Accurately track >> gl_uniform_block::stageref" (commit 084105). Will take a look to that >> next Monday. > Is the one remaining failure > > GL45-CTS.program_interface_query.uniform-block-types > > ...? Yes that one. > I believe that one's fixed by Ian's series: > > https://lists.freedesktop.org/archives/mesa-dev/2016-December/138359.html Oh neat! Yes, that series fixes that test, just tested. Thanks! signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: Fix program interface queries relating to interface blocks.
On 17/12/16 17:57, Kenneth Graunke wrote: > On Saturday, December 17, 2016 5:41:35 PM PST Alejandro Piñeiro wrote: >> On 17/12/16 03:35, Kenneth Graunke wrote: >>> This fixes 555 dEQP tests (using the nougat-cts-dev branch), Piglit's >>> arb_program_interface_query/arb_program_interface_query-resource-query, >> Somewhat offtopic: FWIW, piglit tests for arb_program_interface_query >> doesn't tests too much the REFERENCED_BY_XXX property (see below). I >> have a wip piglit tests locally that still fails. Mentioning this >> because ... >> >>> and GL45-CTS.program_interface_query.separate-programs-{tess-control, >>> tess-eval,geometry}. Only one dEQP program interface failure remains. >> ... as far as I see, just one CTS program interface query keeps failing >> after your patch, and it is caused by some issues with >> REFERENCED_BY_XXX. In any case, about this, now Im somewhat more >> optimistic, after realizing (while testing your patch) that Ian Romanick >> fixed GL45-CTS.geometry_shader.program_resource.program_resource, and >> cleaned up referenced_by on the patch "linker: Accurately track >> gl_uniform_block::stageref" (commit 084105). Will take a look to that >> next Monday. > Is the one remaining failure > > GL45-CTS.program_interface_query.uniform-block-types Yes I meant that test. > > ...? I believe that one's fixed by Ian's series: > > https://lists.freedesktop.org/archives/mesa-dev/2016-December/138359.html Oh neat, just made a quick test, and yes it fixes that test. BR signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] V7 Loop unrolling in NIR
On Mon, 2016-12-19 at 17:38 +1100, Timothy Arceri wrote: > On Sun, 2016-12-18 at 21:58 -0800, Jason Ekstrand wrote: > > On Dec 18, 2016 9:48 PM, "Timothy Arceri"> .c > > om> wrote: > > V7: > > - paritally out of ssa in unrolling pass to avoid phis > > - lots of simplification/tidy ups in the analysis pass > > - if_uses bug fix (missing functionality) in lcssa fixed > > - better support for non trivial loop terminators > > - fixed all loop HURT except 1 that is too big to unroll. > > > > total instructions in shared programs: 12584624 -> 12584621 (- > > 0.00%) > > instructions in affected programs: 68507 -> 68504 (-0.00%) > > helped: 70 > > HURT: 170 > > > > total cycles in shared programs: 24146 -> 241476226 (-0.01%) > > cycles in affected programs: 4060722 -> 4036952 (-0.59%) > > helped: 1241 > > HURT: 1278 > > > > total loops in shared programs: 4245 -> 2948 (-30.55%) > > loops in affected programs: 1535 -> 238 (-84.50%) > > helped: 1142 > > HURT: 1 > > > > That is a *lot* of loops that we were leaving intact for no good > > reason. Any idea how many of those were because the glsl pass > > didn't > > know what to do weight them vs. the heuristic subtly changing? > Oh whoops I need to learn how to read. Doing things the other way around if I remove the instruction and iterations limit from the GLSL IR pass I get ... a frozen PC for a few minutes processing various shaders and eventually it crashes processing a deus ex shader. Looking at the history is seems the heuristics might have more to do with GLSL IR being slow than the actual output being too large. commit be5f27a84d0d4efb57071d9d7ecda061223d03ef Author: Eric Anholt Date: Tue Feb 21 13:37:49 2012 -0800 glsl: Refine the loop instruction counting. Before, we were only counting top-level instructions. But if we have an assignment of a giant expression tree (such as the ones eventually generated by glsl-fs-unroll), we were counting the same as an assignment of a variable deref. glsl-fs-unroll-explosion now fails in a reasonable amount of time on i965 because the unrolling didn't go ridiculously far. Reviewed-by: Kenneth Graunke commit 67007080b716c7e51039a381f407ababd68230f7 Author: Mathias Fröhlich Date: Wed Jan 25 17:35:01 2012 +0100 glsl: Avoid excessive loop unrolling. Avoid unrollong loops that are either nested loops or where the loop body times the unroll count is huge. The change is far from being perfect but it extends the loop unrolling decision heuristic by some additional safeguard. In particular this cuts down compilation of a shader precomputing atmospheric scattering integral tables containing two nesting levels in a loop from something way beyond some minutes (I never waited for it to finish) to some fractions of a second. This fixes piglit tests glsl-fs-unroll-explosion and glsl-vs-unroll-explosion on r600g. Reviewed-by: Eric Anholt Signed-off-by: Mathias Fröhlich > Are you sending html emails lately? Your comments have been > indistinguishable from the original email in a number of emails > lately. > > I disable instruction/iteration unrolling restrictions and it helped > on > ly 13 loops. > > For the record V7 unrolls around 500 more loops than V6. I think > handling phis as a conditional might give us another good jump. I > would > need to dig a bit deeper to be sure why so many loops remain. > > I'm happy to work on improving things further once we land the > initial > version. > > > > > > > LOST: 26 > > GAINED: 16 > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure.ac: Respect LLVM_CFLAGS in LLVM version detection
Hi Tobias, On Sat, Dec 17, 2016 at 2:15 AM, Tobias Drostewrote: > Hi Tomasz, > > does this actually fix anything? > > Because right now llvm-config.h does not include anything and I doubt it will > in the future, as it's just a collection of defines. > The path to the header file itself is given by llvm-config ($LLVM_INCLUDEDIR). > > Did you just happen to see this or do you get an error without this patch? We happen to have the setup exactly as described in the patch description, i.e. LLVM in a non-standard directory and with llvm-config.h that includes another header. Without the patch ./configure fails because of LLVM version detection errors. Best regards, Tomasz ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/6] cso: Optimize cso_save/restore_fragment_samplers
On 19.12.2016 09:11, Michel Dänzer wrote: On 16/12/16 08:22 PM, Nicolai Hähnle wrote: On 16.12.2016 10:52, Michel Dänzer wrote: From: Michel DänzerOnly copy/memset the pointers that actually need to be. Signed-off-by: Michel Dänzer --- src/gallium/auxiliary/cso_cache/cso_context.c | 21 + 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/src/gallium/auxiliary/cso_cache/cso_context.c b/src/gallium/auxiliary/cso_cache/cso_context.c index f52969d366..03f78eea53 100644 --- a/src/gallium/auxiliary/cso_cache/cso_context.c +++ b/src/gallium/auxiliary/cso_cache/cso_context.c @@ -1267,8 +1267,10 @@ cso_save_fragment_samplers(struct cso_context *ctx) struct sampler_info *saved = >fragment_samplers_saved; saved->nr_samplers = info->nr_samplers; - memcpy(saved->cso_samplers, info->cso_samplers, sizeof(info->cso_samplers)); - memcpy(saved->samplers, info->samplers, sizeof(info->samplers)); + memcpy(saved->cso_samplers, info->cso_samplers, info->nr_samplers * + sizeof(*info->cso_samplers)); + memcpy(saved->samplers, info->samplers, info->nr_samplers * + sizeof(*info->samplers)); } @@ -1277,9 +1279,20 @@ cso_restore_fragment_samplers(struct cso_context *ctx) { struct sampler_info *info = >samplers[PIPE_SHADER_FRAGMENT]; struct sampler_info *saved = >fragment_samplers_saved; + int delta = info->nr_samplers - saved->nr_samplers; + + memcpy(info->cso_samplers, saved->cso_samplers, + saved->nr_samplers * sizeof(*info->cso_samplers)); + memcpy(info->samplers, saved->samplers, + saved->nr_samplers * sizeof(*info->samplers)); + + if (delta > 0) { I'd be more comfortable with an explicit info->nr_samplers > saved->nr_samplers given that nr_samplers is an unsigned. How about: int delta = (int)info->nr_samplers - saved->nr_samplers; This generates identical code as without the cast, whereas with the explicit comparison the ELF text section grows by 48 bytes. Yes, that sounds good. Apart from that, patches 1-3: Reviewed-by: Nicolai Hähnle Thanks! ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/6] cso: Optimize cso_save/restore_fragment_samplers
On 16/12/16 08:22 PM, Nicolai Hähnle wrote: > On 16.12.2016 10:52, Michel Dänzer wrote: >> From: Michel Dänzer>> >> Only copy/memset the pointers that actually need to be. >> >> Signed-off-by: Michel Dänzer >> --- >> src/gallium/auxiliary/cso_cache/cso_context.c | 21 + >> 1 file changed, 17 insertions(+), 4 deletions(-) >> >> diff --git a/src/gallium/auxiliary/cso_cache/cso_context.c >> b/src/gallium/auxiliary/cso_cache/cso_context.c >> index f52969d366..03f78eea53 100644 >> --- a/src/gallium/auxiliary/cso_cache/cso_context.c >> +++ b/src/gallium/auxiliary/cso_cache/cso_context.c >> @@ -1267,8 +1267,10 @@ cso_save_fragment_samplers(struct cso_context >> *ctx) >> struct sampler_info *saved = >fragment_samplers_saved; >> >> saved->nr_samplers = info->nr_samplers; >> - memcpy(saved->cso_samplers, info->cso_samplers, >> sizeof(info->cso_samplers)); >> - memcpy(saved->samplers, info->samplers, sizeof(info->samplers)); >> + memcpy(saved->cso_samplers, info->cso_samplers, info->nr_samplers * >> + sizeof(*info->cso_samplers)); >> + memcpy(saved->samplers, info->samplers, info->nr_samplers * >> + sizeof(*info->samplers)); >> } >> >> >> @@ -1277,9 +1279,20 @@ cso_restore_fragment_samplers(struct >> cso_context *ctx) >> { >> struct sampler_info *info = >samplers[PIPE_SHADER_FRAGMENT]; >> struct sampler_info *saved = >fragment_samplers_saved; >> + int delta = info->nr_samplers - saved->nr_samplers; >> + >> + memcpy(info->cso_samplers, saved->cso_samplers, >> + saved->nr_samplers * sizeof(*info->cso_samplers)); >> + memcpy(info->samplers, saved->samplers, >> + saved->nr_samplers * sizeof(*info->samplers)); >> + >> + if (delta > 0) { > > I'd be more comfortable with an explicit info->nr_samplers > > saved->nr_samplers given that nr_samplers is an unsigned. How about: int delta = (int)info->nr_samplers - saved->nr_samplers; This generates identical code as without the cast, whereas with the explicit comparison the ELF text section grows by 48 bytes. > Apart from that, patches 1-3: > > Reviewed-by: Nicolai Hähnle Thanks! -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0
Hello Matt, We have done most of the suggestions you made to our patches. However, we have replied to some of your questions/suggestions and we are waiting for a reply before marking them as R-b or not. You can clone the new version of the patch series by running this command: $ git clone -b i965-fp64-gen7-scalar-vec4-rc3 https://github.com/Igalia /mesa.git Below is the list of patches that need a R-b (they are marked as UNREVIEWED in the branch). * i965/vec4: implement hardware workaround for align16 double to float conversion > > This always seemed like a really strange hardware bug, and one > that no one should ever hit. > > I'd prefer that, instead of loading an immediate double and > then > performing a conversion to float, that we just convert the > double to float in the compiler and emit an instruction to load > that. > We have done this. Does this change get your R-b? * i965/vec4: fix optimize predicate for doubles We have replied here [0]. * i965/vec4: handle 32 and 64 bit channels in liveness analysis It is still unreviewed. Maybe Curro can take a look at it. * i965/vec4: add a SIMD lowering pass Replied here [1]. * i965/vec4: Prevent copy propagation from violating pre-gen8 restrictions Replied here [1]. * i965/vec4: run scalarize_df() after spilling Replied here [1]. * i965/gen7: expose OpenGL 4.0 on Haswell We are currently discussing it with Curro :-) Thanks! Sam [0] https://lists.freedesktop.org/archives/mesa-dev/2016-December/13815 1.html [1] https://lists.freedesktop.org/archives/mesa-dev/2016-December/13816 0.html signature.asc Description: This is a digitally signed message part ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev