Re: [Mesa-dev] [PATCH V3] glapi: add GL_OES_geometry_shader extension
On 01/13/2016 10:29 AM, Lofstedt, Marta wrote:
>> -----Original Message-----
>> From: ibmir...@gmail.com [mailto:ibmir...@gmail.com] On Behalf Of Ilia Mirkin
>> Sent: Tuesday, January 12, 2016 7:09 PM
>> To: Marta Lofstedt
>> Cc: mesa-dev@lists.freedesktop.org; Romanick, Ian D; Lofstedt, Marta
>> Subject: Re: [PATCH V3] glapi: add GL_OES_geometry_shader extension
>>
>> On Tue, Jan 12, 2016 at 9:13 AM, Marta Lofstedt wrote:
>>> From: Marta Lofstedt
>>>
>>> Add xml definitions for the GL_OES_geometry_shader extension and
>>> expose the extension for OpenGL ES 3.1.
>>>
>>> V3: Added dependency to OES_shader_io_blocks and updated to correct
>>> Khronos extension number.
>>
>> May I ask why you did this? OES_shader_io_blocks is a purely shader
>> compiler/linker feature, I expect it will be enabled whenever GLES 3.1 is
>> enabled, no? Why would it be tied to geometry shaders? Sure, geometry
>> shaders require it to work, but just because you have OES_shader_io_blocks
>> doesn't necessarily mean you also have geometry shaders...
>
> My intention was to address the co-dependency between
> oes_geometry_shader and oes_shader_io_block. But as always, you are
> right Ilia. The dependency issue needs to be fixed in the driver. So,
> please disregard this V3, I will push the V2 with the changes suggested
> by Ilia in the comments.
>
> FYI here are quotes from the oes_geometry_shader specification:
> " OES_shader_io_blocks or EXT_shader_io_blocks is required."

IMO, according to this, OES_shader_io_blocks looks like a valid
requirement, as that functionality is not part of OpenGL ES 3.1.

> " This extension relies on the OES_shader_io_blocks extension to
> provide the required functionality for declaring input and output
> blocks and interfacing between shaders."
>
> " If the OES_geometry_shader extension is enabled, the
> OES_shader_io_blocks extension is also implicitly enabled."
>
>> In practical terms, there's a non-trivial chance that A4xx will get
>> tessellation before geometry shaders, which would also require
>> OES_shader_io_blocks to be exposed.
>
> Yes, but for desktop we already have the "shader_io_block"
> functionality. So, the dependency is a GLES issue, and tessellation is
> not yet exposed under GLES.
>
>> -ilia

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
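The one-way dependency discussed in this thread (geometry shaders need
io blocks, io blocks do not need geometry shaders) could be sketched
roughly as below. This is only an illustration: struct ext_caps and both
helper functions are hypothetical names, not Mesa's extension tables.

```c
#include <stdbool.h>

/* Hypothetical capability flags; not Mesa's real extension structures. */
struct ext_caps {
    bool es31;               /* context is OpenGL ES 3.1 or later */
    bool shader_io_blocks;   /* compiler/linker supports io blocks */
    bool hw_geometry_shader; /* driver can actually run geometry shaders */
};

/* OES_shader_io_blocks stands on its own: it only needs ES 3.1 and
 * compiler support, so tessellation could rely on it as well. */
static bool can_expose_shader_io_blocks(const struct ext_caps *c)
{
    return c->es31 && c->shader_io_blocks;
}

/* OES_geometry_shader requires io blocks, but not the other way round. */
static bool can_expose_geometry_shader(const struct ext_caps *c)
{
    return can_expose_shader_io_blocks(c) && c->hw_geometry_shader;
}
```

With this shape, a driver without geometry shader support can still
advertise io blocks, which is the case Ilia raises for A4xx.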
Re: [Mesa-dev] [PATCH 7/8] gallium/radeon: implement PIPE_CAP_INVALIDATE_BUFFER
On Tuesday 12 January 2016, Nicolai Hähnle wrote:
> On 12.01.2016 13:41, Fredrik Höglund wrote:
> > On Tuesday 12 January 2016, Nicolai Hähnle wrote:
> >> From: Nicolai Hähnle
> >>
> >> ---
> >>  src/gallium/drivers/r600/r600_pipe.c            | 2 +-
> >>  src/gallium/drivers/radeon/r600_buffer_common.c | 23 ---
> >>  src/gallium/drivers/radeon/r600_pipe_common.c   | 1 +
> >>  src/gallium/drivers/radeon/r600_pipe_common.h   | 3 +++
> >>  src/gallium/drivers/radeonsi/si_pipe.c          | 2 +-
> >>  5 files changed, 22 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
> >> index a8805f6..569f77c 100644
> >> --- a/src/gallium/drivers/r600/r600_pipe.c
> >> +++ b/src/gallium/drivers/r600/r600_pipe.c
> >> @@ -278,6 +278,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
> >>       case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
> >>       case PIPE_CAP_TGSI_TXQS:
> >>       case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS:
> >> +     case PIPE_CAP_INVALIDATE_BUFFER:
> >>               return 1;
> >>
> >>       case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
> >> @@ -355,7 +356,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
> >>       case PIPE_CAP_TGSI_FS_POSITION_IS_SYSVAL:
> >>       case PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL:
> >>       case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT:
> >> -     case PIPE_CAP_INVALIDATE_BUFFER:
> >>               return 0;
> >>
> >>       case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS:
> >> diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
> >> index aeb9a20..09755e0 100644
> >> --- a/src/gallium/drivers/radeon/r600_buffer_common.c
> >> +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
> >> @@ -209,6 +209,21 @@ static void r600_buffer_destroy(struct pipe_screen *screen,
> >>       FREE(rbuffer);
> >>  }
> >>
> >> +void r600_invalidate_resource(struct pipe_context *ctx,
> >> +                              struct pipe_resource *resource)
> >> +{
> >> +     struct r600_common_context *rctx = (struct r600_common_context*)ctx;
> >> +     struct r600_resource *rbuffer = r600_resource(resource);
> >> +
> >> +     /* Check if mapping this buffer would cause waiting for the GPU. */
> >> +     if (r600_rings_is_buffer_referenced(rctx, rbuffer->buf, RADEON_USAGE_READWRITE) ||
> >> +         !rctx->ws->buffer_wait(rbuffer->buf, 0, RADEON_USAGE_READWRITE)) {
> >> +             rctx->invalidate_buffer(&rctx->b, &rbuffer->b.b);
> >> +     } else {
> >> +             util_range_set_empty(&rbuffer->valid_buffer_range);
> >> +     }
> >
> > This implementation does not exactly comply with the specification.
> >
> > The point of InvalidateBuffer is to tell the driver that it may discard the
> > contents of the buffer if, for example, the buffer needs to be evicted.
> >
> > Calling InvalidateBuffer is not equivalent to calling MapBufferRange
> > with GL_MAP_INVALIDATE_BUFFER_BIT, since the former should invalidate
> > the buffer regardless of whether it is busy or not.
>
> Can you back this with a quote from the spec? Given that no-op seems to
> be a correct implementation of InvalidateBuffer, I find what you write
> rather hard to believe.

The overview says:

"GL implementations often include several memory spaces, each with
distinct performance characteristics, and the implementations
transparently move allocations between memory spaces. With this
extension, an application can tell the GL that the contents of a
texture or buffer are no longer needed, and the implementation can
avoid transferring the data unnecessarily."

This to me makes the intent pretty clear. The implementation is of
course free to do what it wants with this information, including
nothing at all.

My objection here is that your implementation only helps applications
that are using the extension incorrectly. But it is still an
improvement over doing nothing at all.

> Part of the problem may be that the spec talks about "invalidating"
> without - as far as I can tell - ever defining what that means. In any
> case, I see no reason why the behavior should be different from
> GL_MAP_INVALIDATE_BUFFER_BIT.
>
> Thanks,
> Nicolai
>
> >
> >> +}
> >> +
> >>  static void *r600_buffer_get_transfer(struct pipe_context *ctx,
> >>                                        struct pipe_resource *resource,
> >>                                        unsigned level,
> >> @@ -276,13 +291,7 @@ static void *r600_buffer_transfer_map(struct pipe_context *ctx,
> >>           !(usage & PIPE_TRANSFER_UNSYNCHRONIZED)) {
> >>               assert(usage & PIPE_TRANSFER_WRITE);
> >>
> >> -             /* Check if mapping this buffer would cause waiting for the GPU. */
> >> -             if (r600_rings_is_buffer_referenced(rctx, rbuffer->buf, RADEON_USAGE_READWRITE) ||
> >> -
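To make the two branches of the quoted patch easier to follow, here is
a reduced sketch of the decision it makes. Everything in it is an
assumption for illustration: struct sketch_buffer and sketch_invalidate
are hypothetical stand-ins, not the r600 driver types, and resetting
the valid range after replacing the storage is a choice made for this
sketch rather than something the patch shows.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver buffer; not struct r600_resource. */
struct sketch_buffer {
    bool gpu_busy;       /* GPU still references the current storage */
    unsigned generation; /* bumped each time the storage is replaced */
    size_t valid_start;  /* initialized data lives in [start, end) */
    size_t valid_end;
};

/* Discard the contents of a buffer without ever waiting for the GPU. */
static void sketch_invalidate(struct sketch_buffer *buf)
{
    if (buf->gpu_busy) {
        /* Busy: switch to fresh storage so later writes do not stall. */
        buf->generation++;
        buf->gpu_busy = false;
    }
    /* Busy or idle, no byte holds defined data any more (assumption:
     * fresh storage also starts with an empty valid range). */
    buf->valid_start = 0;
    buf->valid_end = 0;
}
```

Fredrik's objection concerns the idle case: an implementation following
the spec overview could also use the hint to skip copying the old
contents when the buffer is evicted, which this sketch does not model.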
Re: [Mesa-dev] [PATCH] radeonsi: don't print a warning for unhandled registers returned by LLVM
On Wed, Jan 13, 2016 at 4:25 AM, Michel Dänzer wrote:
> On 13.01.2016 03:44, Marek Olšák wrote:
>> From: Marek Olšák
>>
>> We don't want apps to flood stderr. New LLVM + old Mesa is a perfectly
>> valid combination (if it doesn't fail to build, of course).
>
> Actually it's not, in general.

Why not?

Marek
Re: [Mesa-dev] [PATCH 1/2] nir: Add a fquantize2f16 opcode
On 01/12/2016 05:41 PM, Matt Turner wrote:
> On Tue, Jan 12, 2016 at 4:10 PM, Jason Ekstrand wrote:
>> On Tue, Jan 12, 2016 at 3:52 PM, Matt Turner wrote:
>>>
>>> On Tue, Jan 12, 2016 at 3:35 PM, Jason Ekstrand wrote:
>>>> This opcode simply takes a 32-bit floating-point value and reduces its
>>>> effective precision to 16 bits.
>>>> ---
>>>
>>> What's it supposed to do for values not representable in half-precision?
>>
>>
>> If they're in-range, round. If they're out-of-range, the appropriate
>> infinity.
>
> Are you sure that's the behavior hardware has? And by "are you sure" I
> mean "have you tested it"
>
> The conversion table in the f32to16 documentation in the IVB PRM says:
>
> single precision -> half precision
>
> -finite -> -finite/-denorm/-0
> +finite -> +finite/+denorm/+0
>
>> https://www.khronos.org/registry/spir-v/specs/1.0/SPIRV.html#OpQuantizeToF16
>
>> Quantize a floating-point value to what is expressible by a 16-bit
>> floating-point value.
>
> Erf, anyway,
>
> ... and the "convert too-large values to inf" isn't the behavior of
> other languages like C [1] (and I don't think GLSL either, but I can't
> find anything on the matter in the spec) or OpenCL C [2].

Some background may either clarify or further muddy things.

Right now applications sprinkle mediump and lowp all over the place in
GLSL ES shaders. Many vertex shader implementations, even on mobile
devices, do everything in single precision. Many devices will only use
f16 part of the time because some instructions may not have f16
versions. When we finally implement f16 in the i965 driver, we'll be in
this boat too.

As a result, people think that their mediump-decorated code is fine...
until it actually runs on a device that really does mediump. Then they
report a bug to the vendor of that hardware. Sound like a familiar
situation?

From this problem the OpQuantizeToF16 SPIR-V instruction was born. The
intention is that people could compile their code in a way that mediump
gives you mediump precision on every device. While you probably
wouldn't want to ship such code, this at least makes it possible to
test it without having to find a device that will really do native
mediump calculations all the time.

IIRC, GLSL doesn't require Inf in mediump. I don't recall what SPIR-V
says. I believe that GLSL allows saturating to the maximum magnitude
representable value. What we want is for an expression tree like

    OpQuantizeToF16(OpQuantizeToF16(x) + OpQuantizeToF16(y))

to produce the same value that 'x + y' would produce in "real" f16
mediump. The SPIR-V +/-Inf requirement doesn't completely jibe with my
recollection of the discussions... but there was a lot of
back-and-forth, and it was quite a few months ago at this point. I
think we may have picked just one possible answer instead of allowing
both choices just for consistency. I don't have any memory whether
anyone strongly wanted the +/-Inf behavior or if it was just a coin
toss.

> Section 8.3.2 of the OpenCL C 2.0 spec is also relevant, but doesn't
> touch directly on the issue at hand.
>
> I'm worried that what is specified is not implementable via a round
> trip through half-precision, because it's not the behavior other
> languages implement.
>
> If I had to guess, given the table in the IVB PRM and section 8.3.2,
> out-of-range single-precision floats are converted to the
> half-precision value with the largest magnitude.

You are correct, we should test it to be sure what the hardware really
does. This is not intended to be a performance operation. If we need to
use a different, more expensive expansion to meet the requirements, we
shouldn't lose any sleep over it.

> [1] C99 spec, 6.3.1.5 says "If the value being converted is outside
> the range of values that can be represented, the behavior is
> undefined."
> [2] OpenCL C 2.0 spec 6.2.3.3 says to refer to C99 spec section 6.3.
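The two out-of-range behaviors debated in this thread (the appropriate
infinity versus the largest-magnitude half value) can be compared with
a small software reference. This is a sketch under assumptions, not the
NIR lowering or what any hardware does: quantize_to_f16, round_to_even
and the saturate flag are hypothetical names, and it assumes the
default round-to-nearest-even floating-point mode.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

#define F16_MAX 65504.0f /* largest finite half-precision value */

/* Round a value in [0, 2^23) to the nearest integer, ties to even.
 * Adding 2^23 moves it to where the float spacing is exactly 1.0. */
static float round_to_even(float s)
{
    volatile float t = s + 8388608.0f;
    return t - 8388608.0f;
}

/* Reduce x to a value representable in half precision, kept as float.
 * saturate=false: out-of-range values become +/-Inf.
 * saturate=true:  out-of-range values become +/-F16_MAX. */
static float quantize_to_f16(float x, bool saturate)
{
    if (x != x || x == 0.0f)
        return x;                 /* NaN and +/-0 pass through */

    float a = x < 0.0f ? -x : x;  /* work on the magnitude */
    float q = a;                  /* an infinite input stays infinite */
    if (a <= FLT_MAX) {
        int e;
        (void)frexpf(a, &e);      /* a = m * 2^e with m in [0.5, 1) */
        int ulp_exp = e - 11;     /* half keeps 11 significant bits */
        if (ulp_exp < -24)
            ulp_exp = -24;        /* fixed step in the denormal range */
        q = ldexpf(round_to_even(ldexpf(a, -ulp_exp)), ulp_exp);
        if (q > F16_MAX)
            q = saturate ? F16_MAX : INFINITY;
    }
    return x < 0.0f ? -q : q;
}
```

For example, quantize_to_f16(100000.0f, false) gives +Inf while
quantize_to_f16(100000.0f, true) gives 65504.0f; whichever of these
matches real f32-to-f16 hardware conversion is exactly what the thread
says still needs to be tested.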
[Mesa-dev] [PATCH V2 19/28] glsl: add support for explicit components to frag outputs
V2: fix error checking for arrays and components. V1 was only taking
into account all the array elements and all the components of one of
the varyings during the comparison and treating the other as a single
slot/component.

Cc: Anuj Phogat
---
 src/glsl/linker.cpp | 72 +
 1 file changed, 62 insertions(+), 10 deletions(-)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index b81bfba..c66dcc4 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -2411,7 +2411,12 @@ assign_attribute_or_color_locations(gl_shader_program *prog,
       }
    } to_assign[16];
 
+   /* Temporary array for the set of attributes that have locations assigned.
+    */
+   ir_variable *assigned[16];
+
    unsigned num_attr = 0;
+   unsigned assigned_attr = 0;
 
    foreach_in_list(ir_instruction, node, sh->ir) {
       ir_variable *const var = node->as_variable();
@@ -2573,18 +2578,62 @@ assign_attribute_or_color_locations(gl_shader_program *prog,
           * attribute overlaps any previously allocated bits.
           */
          if ((~(use_mask << attr) & used_locations) != used_locations) {
-            if (target_index == MESA_SHADER_FRAGMENT ||
-                (prog->IsES && prog->Version >= 300)) {
-               linker_error(prog,
-                            "overlapping location is assigned "
-                            "to %s `%s' %d %d %d\n", string,
-                            var->name, used_locations, use_mask, attr);
+            if (target_index == MESA_SHADER_FRAGMENT && !prog->IsES) {
+               /* From section 4.4.2 (Output Layout Qualifiers) of the GLSL
+                * 4.40 spec:
+                *
+                *    "Additionally, for fragment shader outputs, if two
+                *    variables are placed within the same location, they
+                *    must have the same underlying type (floating-point or
+                *    integer). No component aliasing of output variables or
+                *    members is allowed.
+                */
+               for (unsigned i = 0; i < assigned_attr; i++) {
+                  unsigned assigned_slots =
+                     assigned[i]->type->count_attribute_slots(false);
+                  unsigned assig_attr =
+                     assigned[i]->data.location - generic_base;
+                  unsigned assigned_use_mask = (1 << assigned_slots) - 1;
+
+                  if ((assigned_use_mask << assig_attr) &
+                      (use_mask << attr)) {
+
+                     const glsl_type *assigned_type =
+                        assigned[i]->type->without_array();
+                     const glsl_type *type = var->type->without_array();
+                     if (assigned_type->base_type != type->base_type) {
+                        linker_error(prog, "types do not match for aliased"
+                                     " %ss %s and %s\n", string,
+                                     assigned[i]->name, var->name);
+                        return false;
+                     }
+
+                     unsigned assigned_component_mask =
+                        ((1 << assigned_type->vector_elements) - 1) <<
+                        assigned[i]->data.location_frac;
+                     unsigned component_mask =
+                        ((1 << type->vector_elements) - 1) <<
+                        var->data.location_frac;
+                     if (assigned_component_mask & component_mask) {
+                        linker_error(prog, "overlapping component is "
+                                     "assigned to %ss %s and %s "
+                                     "(component=%d)\n",
+                                     string, assigned[i]->name, var->name,
+                                     var->data.location_frac);
+                        return false;
+                     }
+                  }
+               }
+            } else if (target_index == MESA_SHADER_FRAGMENT ||
+                       (prog->IsES && prog->Version >= 300)) {
+               linker_error(prog, "overlapping location is assigned "
+                            "to %s `%s' %d %d %d\n", string, var->name,
+                            used_locations, use_mask, attr);
                return false;
             } else {
-               linker_warning(prog,
-                              "overlapping location is assigned "
-                              "to %s `%s' %d %d %d\n", string,
-                              var->name, used_locations, use_mask, attr);
+               linker_warning(prog, "overlapping location is assigned "
+                              "to %s `%s' %d %d %d\n", string, var->name,
+                              used_locations, use_mask, attr);
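The two mask tests in the V2 patch can be tried in isolation with a
reduced version like the one below. It is a sketch only: struct
frag_output and the three helpers are hypothetical names rather than
the linker's ir_variable data, and it assumes that location plus slot
count stays below 32 so the shifts are well defined.

```c
#include <stdbool.h>

/* Hypothetical description of one fragment output; not ir_variable. */
struct frag_output {
    unsigned location;        /* first location, relative to the base */
    unsigned num_slots;       /* locations occupied (array size, or 1) */
    unsigned vector_elements; /* 1..4 components per location */
    unsigned component;       /* first component used, 0..3 */
};

/* One bit per occupied location, like use_mask << attr in the patch. */
static unsigned slot_mask(const struct frag_output *o)
{
    return ((1u << o->num_slots) - 1u) << o->location;
}

/* One bit per occupied component within a location. */
static unsigned component_mask(const struct frag_output *o)
{
    return ((1u << o->vector_elements) - 1u) << o->component;
}

/* Two outputs alias only if they share a location AND a component. */
static bool outputs_alias(const struct frag_output *a,
                          const struct frag_output *b)
{
    if ((slot_mask(a) & slot_mask(b)) == 0)
        return false;
    return (component_mask(a) & component_mask(b)) != 0;
}
```

A vec2 at component 0 and a vec2 at component 2 of the same location
produce masks 0x3 and 0xc and do not alias, while a vec3 at component 0
and a float at component 2 produce 0x7 and 0x4 and do.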
Re: [Mesa-dev] [PATCH 19/28] glsl: add support for explicit components to frag outputs
On Tue, 2016-01-12 at 16:36 -0800, Anuj Phogat wrote:
> On Mon, Dec 28, 2015 at 9:00 PM, Timothy Arceri wrote:
> > ---
> >  src/glsl/linker.cpp | 56 -
> >  1 file changed, 55 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> > index 41ff057..44dd7f0 100644
> > --- a/src/glsl/linker.cpp
> > +++ b/src/glsl/linker.cpp
> > @@ -2411,7 +2411,12 @@ assign_attribute_or_color_locations(gl_shader_program *prog,
> >        }
> >     } to_assign[16];
> >
> > +   /* Temporary array for the set of attributes that have locations assigned.
> > +    */
> > +   ir_variable *assigned[16];
> > +
> >     unsigned num_attr = 0;
> > +   unsigned assigned_attr = 0;
> >
> >     foreach_in_list(ir_instruction, node, sh->ir) {
> >        ir_variable *const var = node->as_variable();
> > @@ -2573,7 +2578,53 @@ assign_attribute_or_color_locations(gl_shader_program *prog,
> >            * attribute overlaps any previously allocated bits.
> >            */
> >           if ((~(use_mask << attr) & used_locations) != used_locations) {
> > -            if (target_index == MESA_SHADER_FRAGMENT ||
> > +            if (target_index == MESA_SHADER_FRAGMENT && !prog->IsES) {
> > +               /* From section 4.4.2 (Output Layout Qualifiers) of the GLSL
> > +                * 4.40 spec:
> > +                *
> > +                *    "Additionally, for fragment shader outputs, if two
> > +                *    variables are placed within the same location, they
> > +                *    must have the same underlying type (floating-point or
> > +                *    integer). No component aliasing of output variables or
> > +                *    members is allowed.
> > +                */
> > +               int frag_out_end_loc = (var->type->is_array() ?
> > +                  var->type->arrays_of_arrays_size() : 1) +
> > +                  var->data.location;
> > +
> > +               for (unsigned i = 0; i < assigned_attr; i++) {
> > +                  for (int j = var->data.location; j < frag_out_end_loc;
> > +                       j++) {
> > +                     if (assigned[i]->data.location == j) {
> I find assigned[i]->data.location == var->data.location more
> readable.

This comment got me looking at this code and the piglit tests again,
and I seem to be missing a piglit test for overlapping array output
from the fragment shader.

... 20 minutes later, after writing the tests, it seems that both error
checks for arrays and components are only half working correctly with
this patch.

I'm about to send a V2 and have already sent the piglit tests:

http://patchwork.freedesktop.org/patch/70351/

>
> > +                        if (assigned[i]->type->without_array()->base_type !=
> > +                            var->type->without_array()->base_type) {
> > +                           linker_error(prog,
> > +                                        "types do not match for aliased"
> > +                                        " %ss %s and %s\n", string,
> > +                                        assigned[i]->name, var->name);
> > +                           return false;
> > +                        }
> > +
> > +                        if ((assigned[i]->data.location_frac ==
> > +                             var->data.location_frac) ||
> > +                            ((assigned[i]->data.location_frac <
> > +                              var->data.location_frac) &&
> > +                             ((assigned[i]->data.location_frac +
> > +                               assigned[i]->type->vector_elements) >
> > +                              var->data.location_frac))) {
> > +                           linker_error(prog,
> > +                                        "overlapping component is "
> > +                                        "assigned to %ss %s and %s "
> > +                                        "(component=%d)\n",
> > +                                        string, assigned[i]->name,
> > +                                        var->name,
> > +                                        var->data.location_frac);
> > +                           return false;
> > +                        }
> > +                     }
> > +                  }
> > +               }
> > +            } else if (target_index == MESA_SHADER_FRAGMENT ||
> >                         (prog->IsES && prog->Version >= 300)) {
> >                 linker_error(prog,
> >                              "overlapping location is assigned "
> > @@ -2614,6 +2665,9 @@ assign_attribute_or_color_locations(gl_shader_program *prog,
> >              double_storage_locations |= (use_mask << attr);
> >
Re: [Mesa-dev] [PATCH demos] configure.ac: Fix default behavior of AC_ARG_WITH(glut) if glut isn't available
ping

2015-12-10 16:32 GMT+01:00 Andreas Boll:
> Fixes a regression introduced in
> 406248811eb0dfabf75ae9495b54529ec59cce66
>
> It wrongly sets glut_enabled=yes if glut isn't available and neither
> option --with-glut nor --without-glut was given.
>
> The default behavior in that case should be if glut is available then
> enable glut else it should disable glut.
>
> To fix this the default value of glut_enabled is set back to yes and in
> case --without-glut was given glut_enabled is set to no.
>
> Cc: Ross Burton
> Signed-off-by: Andreas Boll
> ---
>  configure.ac | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index 0525b09..ddc68b5 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -67,7 +67,7 @@ DEMO_CFLAGS="$DEMO_CFLAGS $GL_CFLAGS"
>  DEMO_LIBS="$DEMO_LIBS $GL_LIBS"
>
>  dnl Check for GLUT
> -glut_enabled=no
> +glut_enabled=yes
>  AC_ARG_WITH([glut],
>         [AS_HELP_STRING([--with-glut=DIR],
>                         [glut install directory])],
> @@ -83,9 +83,8 @@ AS_IF([test "x$with_glut" != xno],
>        AC_CHECK_LIB([glut],
>                     [glutInit],
>                     [],
> -                   [glut_enabled=no])
> -      glut_enabled=yes
> -])
> +                   [glut_enabled=no])],
> +      [glut_enabled=no])
>
>  dnl Check for FreeGLUT 2.6 or later
>  AC_EGREP_HEADER([glutInitContextProfile],
> --
> 2.1.4
>
Re: [Mesa-dev] [PATCH V3] glapi: add GL_OES_geometry_shader extension
> -----Original Message-----
> From: ibmir...@gmail.com [mailto:ibmir...@gmail.com] On Behalf Of Ilia
> Mirkin
> Sent: Tuesday, January 12, 2016 7:09 PM
> To: Marta Lofstedt
> Cc: mesa-dev@lists.freedesktop.org; Romanick, Ian D; Lofstedt, Marta
> Subject: Re: [PATCH V3] glapi: add GL_OES_geometry_shader extension
>
> On Tue, Jan 12, 2016 at 9:13 AM, Marta Lofstedt wrote:
> > From: Marta Lofstedt
> >
> > Add xml definitions for the GL_OES_geometry_shader extension and
> > expose the extension for OpenGL ES 3.1.
> >
> > V3: Added dependency to OES_shader_io_blocks and updated to correct
> > Khronos extension number.
>
> May I ask why you did this? OES_shader_io_blocks is a purely shader
> compiler/linker feature, I expect it will be enabled whenever GLES 3.1 is
> enabled, no? Why would it be tied to geometry shaders? Sure, geometry
> shaders require it to work, but just because you have OES_shader_io_blocks
> doesn't necessarily mean you also have geometry shaders...
>

My intention was to address the co-dependency between
oes_geometry_shader and oes_shader_io_block. But as always, you are
right Ilia. The dependency issue needs to be fixed in the driver. So,
please disregard this V3, I will push the V2 with the changes suggested
by Ilia in the comments.

FYI here are quotes from the oes_geometry_shader specification:

" OES_shader_io_blocks or EXT_shader_io_blocks is required."

" This extension relies on the OES_shader_io_blocks extension to
provide the required functionality for declaring input and output
blocks and interfacing between shaders."

" If the OES_geometry_shader extension is enabled, the
OES_shader_io_blocks extension is also implicitly enabled."

> In practical terms, there's a non-trivial chance that A4xx will get
> tessellation before geometry shaders, which would also require
> OES_shader_io_blocks to be exposed.

Yes, but for desktop we already have the "shader_io_block"
functionality. So, the dependency is a GLES issue, and tessellation is
not yet exposed under GLES.

>
> -ilia
[Mesa-dev] [Bug 92687] Add support for ARB_internalformat_query2
https://bugs.freedesktop.org/show_bug.cgi?id=92687

--- Comment #3 from Eduardo Lima Mitev ---
(In reply to Eduardo Lima Mitev from comment #2)
>
> Following, there are some initial issues/questions we have been gathering:
>

Independently of feedback on the branch we posted, it would be very
useful to get insights on the issues/questions above, most of which are
generic.

In any case, we plan to send the branch as an RFC series to the
mesa-dev list soon (e.g., end of this week).
Re: [Mesa-dev] [PATCH] i965/meta-fast-clear: Convert the clear color through the surf format
Bump. Anyone fancy reviewing this small patch? I think it would be good
to have because it makes the code a bit simpler as well as fixing a
corner case and making it more robust.

- Neil

Neil Roberts writes:

> When programming the fast clear color there was previously a chunk of
> code to try to make the color match the constraints of the surface
> format such as by filling in missing components and handling luminance
> formats. These cases are not handled by the hardware. There are some
> additional possible restrictions that the hardware does seem to
> handle, such as clamping to [0,1] for normalised formats. However for
> whatever reason it doesn't clamp to [0,∞] for the special float
> formats that don't have a sign bit. Rather than adding yet another
> special case for this format this patch makes it instead convert the
> color to the actual surface format and back again so that we can be
> sure it will have all of the possible restrictions. Additionally this
> would avoid some other potential surprises such as getting more
> precision for the clear color when fast clears are used.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93338
> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 57 -
>  1 file changed, 27 insertions(+), 30 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> index cf0e56b..29ae6f0 100644
> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> @@ -37,6 +37,8 @@
>  #include "main/uniforms.h"
>  #include "main/fbobject.h"
>  #include "main/texobj.h"
> +#include "main/format_unpack.h"
> +#include "main/format_pack.h"
>
>  #include "main/api_validate.h"
>  #include "main/state.h"
> @@ -397,45 +399,40 @@ set_fast_clear_color(struct brw_context *brw,
>                       struct intel_mipmap_tree *mt,
>                       const union gl_color_union *color)
>  {
> +   mesa_format linear_format = _mesa_get_srgb_format_linear(mt->format);
>     union gl_color_union override_color = *color;
> -
> -   /* The sampler doesn't look at the format of the surface when the fast
> -    * clear color is used so we need to implement luminance, intensity and
> -    * missing components manually.
> -    */
> -   switch (_mesa_get_format_base_format(mt->format)) {
> -   case GL_INTENSITY:
> -      override_color.ui[3] = override_color.ui[0];
> -      /* flow through */
> -   case GL_LUMINANCE:
> -   case GL_LUMINANCE_ALPHA:
> -      override_color.ui[1] = override_color.ui[0];
> -      override_color.ui[2] = override_color.ui[0];
> -      break;
> -   default:
> -      for (int i = 0; i < 3; i++) {
> -         if (!_mesa_format_has_color_component(mt->format, i))
> -            override_color.ui[i] = 0;
> -      }
> -      break;
> -   }
> -
> -   if (!_mesa_format_has_color_component(mt->format, 3)) {
> -      if (_mesa_is_format_integer_color(mt->format))
> -         override_color.ui[3] = 1;
> -      else
> -         override_color.f[3] = 1.0f;
> -   }
> +   union gl_color_union tmp_color;
>
>     /* Handle linear→SRGB conversion */
> -   if (brw->ctx.Color.sRGBEnabled &&
> -       _mesa_get_srgb_format_linear(mt->format) != mt->format) {
> +   if (brw->ctx.Color.sRGBEnabled && linear_format != mt->format) {
>        for (int i = 0; i < 3; i++) {
>           override_color.f[i] =
>              util_format_linear_to_srgb_float(override_color.f[i]);
>        }
>     }
>
> +   /* Convert the clear color to the surface format and back so that the color
> +    * returned when sampling is guaranteed to be a value that could be stored
> +    * in the surface. For example if the surface is a luminance format and we
> +    * clear to 0.5,0.75,0.1,0.2 we want the color to come back as
> +    * 0.5,0.5,0.5,1.0. In general the hardware doesn't seem to look at the
> +    * surface format when returning the clear color so we need to do this to
> +    * implement luminance, intensity and missing components. However it does
> +    * seem to look at it in some cases such as to clamp to the range [0,1] for
> +    * unorm formats. Surprisingly however it doesn't clamp to [0,∞] for the
> +    * special float formats that don't have a sign bit.
> +    */
> +   if (!_mesa_is_format_integer_color(linear_format)) {
> +      _mesa_pack_float_rgba_row(linear_format,
> +                                1, /* n_pixels */
> +                                (const GLfloat (*)[4]) override_color.f,
> +                                &tmp_color);
> +      _mesa_unpack_rgba_row(linear_format,
> +                            1, /* n_pixels */
> +                            &tmp_color,
> +                            (GLfloat (*)[4]) override_color.f);
> +   }
> +
>     if (brw->gen >= 9) {
>        mt->gen9_fast_clear_color = override_color;
>     } else {
> --
> 1.9.3
>
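The pack-then-unpack idea in the patch can be illustrated for a single
format without Mesa's format tables. This is a sketch under
assumptions: the three functions are hypothetical and handle only an
8-bit unorm luminance texel, whereas the patch relies on Mesa's generic
pack and unpack helpers for whatever format the surface has.

```c
#include <stdint.h>

/* Pack RGBA into one 8-bit unorm luminance texel: only red is stored,
 * clamped to [0,1] and rounded to the nearest of 256 levels. */
static uint8_t pack_l8_unorm(const float rgba[4])
{
    float l = rgba[0];
    if (!(l > 0.0f))
        l = 0.0f;        /* negative values and NaN become 0 */
    if (l > 1.0f)
        l = 1.0f;
    return (uint8_t)(l * 255.0f + 0.5f);
}

/* Unpack as sampling would: luminance replicates to RGB, alpha is 1. */
static void unpack_l8_unorm(uint8_t texel, float rgba[4])
{
    float l = (float)texel / 255.0f;
    rgba[0] = l;
    rgba[1] = l;
    rgba[2] = l;
    rgba[3] = 1.0f;
}

/* Replace a clear color by the value the surface could really hold. */
static void round_trip_l8(float rgba[4])
{
    unpack_l8_unorm(pack_l8_unorm(rgba), rgba);
}
```

Clearing to 0.5,0.75,0.1,0.2 then comes back with equal red, green and
blue and an alpha of 1.0, as the patch comment describes, except that
the luminance is 128/255 rather than exactly 0.5 because the round trip
also reduces the color to the precision of the format.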
[Mesa-dev] Mesa 11.1.1
Mesa 11.1.1 is now available.

With this release we have a significant number of fixes - from radeonsi
(Fiji, Hyper-Z), r600 (geom. shaders), nouveau (ir), freedreno
(piglits), i965 (UBOs) and a few patches for "GRID Autosport" (i965 and
glsl).

Additionally I've included the PCI IDs for Intel's KabyLake devices.

Last but not least - a few more BSD related build fixes are included :-)


Brian Paul (1):
      st/mesa: check state->mesa in early return check in st_validate_state()

Dave Airlie (6):
      mesa/varray: set double arrays to non-normalised.
      mesa/shader: return correct attribute location for double matrix arrays
      glsl: pass stage into mark function
      glsl/fp64: add helper for dual slot double detection.
      glsl: fix count_attribute_slots to allow for different 64-bit handling
      glsl: only update doubles inputs for vertex inputs.

Emil Velikov (5):
      docs: add sha256 checksums for 11.0.1
      cherry-ignore: drop the "re-enable" DCC on Stoney
      cherry-ignore: don't pick a specific i965 formats patch
      Update version to 11.1.1
      docs: add release notes for 11.1.1

Eric Anholt (2):
      vc4: Warn instead of abort()ing on exec ioctl failures.
      vc4: Keep sample mask writes from being reordered after TLB writes

Grazvydas Ignotas (1):
      r600: fix constant buffer size programming

Ian Romanick (1):
      meta/generate_mipmap: Work-around GLES 1.x problem with GL_DRAW_FRAMEBUFFER

Ilia Mirkin (9):
      nv50/ir: can't have predication and immediates
      gk104/ir: simplify and fool-proof texbar algorithm
      glsl: assign varying locations to tess shaders when doing SSO
      glx/dri3: a drawable might not be bound at wait time
      nvc0: don't forget to reset VTX_TMP bufctx slot after blit completion
      nv50/ir: float(s32 & 0xff) = float(u8), not s8
      nv50,nvc0: make sure there's pushbuf space and that we ref the bo early
      nv50,nvc0: fix crash when increasing bsp bo size for h264
      nvc0: scale up inter_bo size so that it's 16M for a 4K video

Jonathan Gray (2):
      configure.ac: use pkg-config for libelf
      configure: check for python2.7 for PYTHON2

Kenneth Graunke (5):
      ralloc: Fix ralloc_adopt() to the old context's last child's parent.
      drirc: Disable ARB_blend_func_extended for Heaven 4.0/Valley 1.0.
      glsl: Fix varying struct locations when varying packing is disabled.
      nvc0: Set winding order regardless of domain.
      nir: Add a lower_fdiv option, turn fdiv into fmul/frcp.

Marek Olšák (7):
      tgsi/scan: add flag colors_written
      r600g: write all MRTs only if there is exactly one output (fixes a hang)
      radeonsi: don't call of u_prims_for_vertices for patches and rectangles
      radeonsi: apply the streamout workaround to Fiji as well
      gallium/radeon: fix Hyper-Z hangs by programming PA_SC_MODE_CNTL_1 correctly
      program: add _mesa_reserve_parameter_storage
      st/mesa: fix GLSL uniform updates for glBitmap & glDrawPixels (v2)

Miklós Máté (1):
      mesa: Don't leak ATIfs instructions in DeleteFragmentShader

Neil Roberts (3):
      i965: Add MESA_FORMAT_B8G8R8X8_SRGB to brw_format_for_mesa_format
      i965: Add B8G8R8X8_SRGB to the alpha format override
      i965: Fix crash when calling glViewport with no surface bound

Nicolai Hähnle (2):
      gallium/radeon: only dispose locally created target machine in radeon_llvm_compile
      gallium/radeon: fix regression in a number of driver queries

Oded Gabbay (1):
      configura.ac: fix test for SSE4.1 assembler support

Patrick Rudolph (2):
      nv50,nvc0: fix use-after-free when vertex buffers are unbound
      gallium/util: return correct number of bound vertex buffers

Rob Herring (1):
      freedreno/ir3: fix 32-bit builds with pointer-to-int-cast error enabled

Samuel Pitoiset (3):
      nvc0: free memory allocated by the prog which reads MP perf counters
      nv50,nvc0: free memory allocated by performance metrics
      nv50: free memory allocated by the prog which reads MP perf counters

Sarah Sharp (1):
      mesa: Add KBL PCI IDs and platform information.


git tag: mesa-11.1.1

ftp://ftp.freedesktop.org/pub/mesa/11.1.1/mesa-11.1.1.tar.gz
MD5: f0f6df1bd436fd2ccf2dec9b4d583638 mesa-11.1.1.tar.gz
SHA1: 98351f58e5ba906cb9ed2311c5b07832c756ca22 mesa-11.1.1.tar.gz
SHA256: b15089817540ba0bffd0aad323ecf3a8ff6779568451827c7274890b4a269d58 mesa-11.1.1.tar.gz
PGP: ftp://ftp.freedesktop.org/pub/mesa/11.1.1/mesa-11.1.1.tar.gz.sig

ftp://ftp.freedesktop.org/pub/mesa/11.1.1/mesa-11.1.1.tar.xz
MD5: 1043dfb907beecb2a761272455960427 mesa-11.1.1.tar.xz
SHA1: 77eeb75660e8d0851457151ef18c87540c6fd6bc mesa-11.1.1.tar.xz
SHA256: 64db074fc514136b5fb3890111f0d50604db52f0b1e94ba3fcb0fe8668a7fd20 mesa-11.1.1.tar.xz
PGP: ftp://ftp.freedesktop.org/pub/mesa/11.1.1/mesa-11.1.1.tar.xz.sig

--
-Emil
Re: [Mesa-dev] [PATCH 7/8] gallium/radeon: implement PIPE_CAP_INVALIDATE_BUFFER
On Wed, Jan 13, 2016 at 11:41 AM, Fredrik Höglundwrote: > On Tuesday 12 January 2016, Nicolai Hähnle wrote: >> On 12.01.2016 13:41, Fredrik Höglund wrote: >> > On Tuesday 12 January 2016, Nicolai Hähnle wrote: >> >> From: Nicolai Hähnle >> >> >> >> --- >> >> src/gallium/drivers/r600/r600_pipe.c| 2 +- >> >> src/gallium/drivers/radeon/r600_buffer_common.c | 23 >> >> --- >> >> src/gallium/drivers/radeon/r600_pipe_common.c | 1 + >> >> src/gallium/drivers/radeon/r600_pipe_common.h | 3 +++ >> >> src/gallium/drivers/radeonsi/si_pipe.c | 2 +- >> >> 5 files changed, 22 insertions(+), 9 deletions(-) >> >> >> >> diff --git a/src/gallium/drivers/r600/r600_pipe.c >> >> b/src/gallium/drivers/r600/r600_pipe.c >> >> index a8805f6..569f77c 100644 >> >> --- a/src/gallium/drivers/r600/r600_pipe.c >> >> +++ b/src/gallium/drivers/r600/r600_pipe.c >> >> @@ -278,6 +278,7 @@ static int r600_get_param(struct pipe_screen* >> >> pscreen, enum pipe_cap param) >> >>case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR: >> >>case PIPE_CAP_TGSI_TXQS: >> >>case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS: >> >> + case PIPE_CAP_INVALIDATE_BUFFER: >> >>return 1; >> >> >> >>case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: >> >> @@ -355,7 +356,6 @@ static int r600_get_param(struct pipe_screen* >> >> pscreen, enum pipe_cap param) >> >>case PIPE_CAP_TGSI_FS_POSITION_IS_SYSVAL: >> >>case PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL: >> >>case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT: >> >> - case PIPE_CAP_INVALIDATE_BUFFER: >> >>return 0; >> >> >> >>case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS: >> >> diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c >> >> b/src/gallium/drivers/radeon/r600_buffer_common.c >> >> index aeb9a20..09755e0 100644 >> >> --- a/src/gallium/drivers/radeon/r600_buffer_common.c >> >> +++ b/src/gallium/drivers/radeon/r600_buffer_common.c >> >> @@ -209,6 +209,21 @@ static void r600_buffer_destroy(struct pipe_screen >> >> *screen, >> >>FREE(rbuffer); >> >> } >> >> >> >> +void 
r600_invalidate_resource(struct pipe_context *ctx, >> >> +struct pipe_resource *resource) >> >> +{ >> >> + struct r600_common_context *rctx = (struct r600_common_context*)ctx; >> >> +struct r600_resource *rbuffer = r600_resource(resource); >> >> + >> >> + /* Check if mapping this buffer would cause waiting for the GPU. */ >> >> + if (r600_rings_is_buffer_referenced(rctx, rbuffer->buf, >> >> RADEON_USAGE_READWRITE) || >> >> + !rctx->ws->buffer_wait(rbuffer->buf, 0, RADEON_USAGE_READWRITE)) { >> >> + rctx->invalidate_buffer(>b, >b.b); >> >> + } else { >> >> + util_range_set_empty(>valid_buffer_range); >> >> + } >> > >> > This implementation does not exactly comply with the specification. >> > >> > The point of InvalidateBuffer is to tell the driver that it may discard the >> > contents of the buffer if, for example, the buffer needs to be evicted. >> > >> > Calling InvalidateBuffer is not equivalent to calling MapBufferRange >> > with GL_MAP_INVALIDATE_BUFFER_BIT, since the former should invalidate >> > the buffer regardless of whether it is busy or not. >> >> Can you back this with a quote from the spec? Given that no-op seems to >> be a correct implmentation of InvalidateBuffer, I find what you write >> rather hard to believe. > > The overview says: > > "GL implementations often include several memory spaces, each with > distinct performance characteristics, and the implementations > transparently move allocations between memory spaces. With this > extension, an application can tell the GL that the contents of a > texture or buffer are no longer needed, and the implementation can > avoid transferring the data unnecessarily." > > This to me makes the intent pretty clear. The implementation is of > course free to do what it wants with this information, including nothing > at all. My objection here is that your implementation only helps > applications that are using the extension incorrectly. But it is still an > improvement over doing nothing at all. 
I wouldn't worry about the spec overview too much. It's just a motivating introduction to the spec. However, immediately before InvalidateBufferData, there is this sentence: "After this command, data in the specified range have undefined values." That's a very clear definition of the behavior, and this patch seems to do the right thing. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
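For readers following the thread, the behaviour being debated can be sketched in isolation. This is only a minimal mock, assuming a buffer that is either still referenced by the GPU or idle; struct mock_buffer, mock_invalidate() and the storage_id counter are hypothetical names for illustration and are not the r600 driver's types or functions. It resets the valid range in both branches, which is an assumption of the sketch rather than something the quoted patch shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver buffer; not Mesa's r600_resource. */
struct mock_buffer {
    int storage_id;        /* identifies the backing allocation */
    bool gpu_busy;         /* true while the GPU still references the storage */
    unsigned valid_start;  /* [valid_start, valid_end) holds defined data */
    unsigned valid_end;
};

static int next_storage_id = 100;

/* After this call the buffer contents are undefined, which is all the
 * quoted spec sentence requires. */
static void mock_invalidate(struct mock_buffer *buf)
{
    assert(buf != NULL);
    if (buf->gpu_busy) {
        /* Busy: swap in fresh storage so that a later map need not wait. */
        buf->storage_id = next_storage_id++;
        buf->gpu_busy = false;
    }
    /* Either way, no part of the buffer is considered valid any more. */
    buf->valid_start = 0;
    buf->valid_end = 0;
}
```

A busy buffer ends up with a new storage_id, while an idle one keeps its storage and only loses its valid range; both outcomes leave the data "undefined" in the sense of the sentence Marek quotes.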
[Mesa-dev] [PATCH v3] glapi: Build glapi_gentable.c only on Darwin
Removes the public symbol _glapi_create_table_from_handle from libGL.so.1 on all platforms except Darwin. Since the symbol is not used on other platforms it makes sense to build glapi_gentable.c only on Darwin. A little bit of history: _glapi_create_table_from_handle was introduced in commit 85937f4c0d4a78d3a11e3c1fa6148640f2a9ad7b Author: Jeremy HuddlestonDate: Thu Jun 9 16:59:49 2011 -0700 glapi: Add API that can create a _glapi_table from a dlfcn handle Example usage: void *handle = dlopen(opengl_library_path, RTLD_LOCAL); struct _glapi_table *disp = _glapi_create_table_from_handle(handle, "gl"); Signed-off-by: Jeremy Huddleston and the only user in mesa was added in commit f35913b96e743c5014e99220b1a1c5532a894d69 Author: Jeremy Huddleston Date: Thu Jun 9 17:29:51 2011 -0700 apple: Use _glapi_create_table_from_handle to initialize our dispatch table Signed-off-by: Jeremy Huddleston gl_gentable.py was also used for XQuartz in xserver 1.11 - 1.14. v2: Fix typos in commit message Add missing XORG_GLAPI_OUTPUTS += \ into src/mapi/glapi/gen/Makefile.am Add glapi_gentable.c to EXTRA_DIST for inclusion in the release tarball v3: Fix commit message: s/gl_gentable.c/glapi_gentable.c/ Cc: Jeremy Huddleston Signed-off-by: Andreas Boll --- src/mapi/Makefile.am | 6 +- src/mapi/glapi/gen/Makefile.am | 14 +++--- src/mapi/glapi/glapi.h | 2 ++ 3 files changed, 18 insertions(+), 4 deletions(-) diff --git a/src/mapi/Makefile.am b/src/mapi/Makefile.am index 307e05d..ddd3daa 100644 --- a/src/mapi/Makefile.am +++ b/src/mapi/Makefile.am @@ -106,12 +106,16 @@ if HAVE_SPARC_ASM GLAPI_ASM_SOURCES = glapi/glapi_sparc.S endif -glapi_libglapi_la_SOURCES = glapi/glapi_gentable.c +glapi_libglapi_la_SOURCES = glapi_libglapi_la_CPPFLAGS = \ $(AM_CPPFLAGS) \ -I$(top_srcdir)/src/mapi/glapi \ -I$(top_srcdir)/src/mesa +if HAVE_APPLEDRI +glapi_libglapi_la_SOURCES += glapi/glapi_gentable.c +endif + if HAVE_SHARED_GLAPI glapi_libglapi_la_SOURCES += $(MAPI_BRIDGE_FILES) glapi/glapi_mapi_tmp.h 
glapi_libglapi_la_CPPFLAGS += \ diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am index 900b61a..3f3e0b9 100644 --- a/src/mapi/glapi/gen/Makefile.am +++ b/src/mapi/glapi/gen/Makefile.am @@ -27,8 +27,11 @@ MESA_GLAPI_OUTPUTS = \ $(MESA_GLAPI_DIR)/glapi_mapi_tmp.h \ $(MESA_GLAPI_DIR)/glprocs.h \ $(MESA_GLAPI_DIR)/glapitemp.h \ - $(MESA_GLAPI_DIR)/glapitable.h \ - $(MESA_GLAPI_DIR)/glapi_gentable.c + $(MESA_GLAPI_DIR)/glapitable.h + +if HAVE_APPLEDRI +MESA_GLAPI_OUTPUTS += $(MESA_GLAPI_DIR)/glapi_gentable.c +endif MESA_GLAPI_ASM_OUTPUTS = if HAVE_X86_ASM @@ -57,6 +60,7 @@ BUILT_SOURCES = \ $(MESA_GLX_DIR)/indirect_size.c EXTRA_DIST= \ $(BUILT_SOURCES) \ + $(MESA_GLAPI_DIR)/glapi_gentable.c \ $(MESA_GLAPI_DIR)/glapi_x86.S \ $(MESA_GLAPI_DIR)/glapi_x86-64.S \ $(MESA_GLAPI_DIR)/glapi_sparc.S \ @@ -88,8 +92,12 @@ XORG_GLAPI_DIR = $(XORG_BASE)/glx XORG_GLAPI_OUTPUTS = \ $(XORG_GLAPI_DIR)/glprocs.h \ $(XORG_GLAPI_DIR)/glapitable.h \ - $(XORG_GLAPI_DIR)/dispatch.h \ + $(XORG_GLAPI_DIR)/dispatch.h + +if HAVE_APPLEDRI +XORG_GLAPI_OUTPUTS += \ $(XORG_GLAPI_DIR)/glapi_gentable.c +endif XORG_OUTPUTS = \ $(XORG_GLAPI_OUTPUTS) \ diff --git a/src/mapi/glapi/glapi.h b/src/mapi/glapi/glapi.h index f269b17..3593c88 100644 --- a/src/mapi/glapi/glapi.h +++ b/src/mapi/glapi/glapi.h @@ -158,8 +158,10 @@ _GLAPI_EXPORT const char * _glapi_get_proc_name(unsigned int offset); +#ifdef GLX_USE_APPLEGL _GLAPI_EXPORT struct _glapi_table * _glapi_create_table_from_handle(void *handle, const char *symbol_prefix); +#endif _GLAPI_EXPORT void -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [android-x86-devel] Re: need-help: how to change to newest mesa in android-x86?
On Wed, Jan 13, 2016 at 12:54 PM, Rob Herringwrote: > On Tue, Jan 12, 2016 at 8:06 PM, Chih-Wei Huang > wrote: >> 2016-01-13 6:29 GMT+08:00 Rob Herring : >>> On Tue, Jan 12, 2016 at 7:05 AM, Chih-Wei Huang >>> wrote: 2016-01-12 19:55 GMT+08:00 陈渝 : > hi, Rob, Dave, Zhiwei: > Thank you all! > > Next I need to update other parts should be in the user level. > > I need to update drm_gralloc? Do I need to update drm_hwcomposer or > libdrm? > Are there any other things I need notice? libdrm in marshmallow-x86 is 2.4.66 which is newer enough to support it, I think. drm_hwcomposer is not been used in marshmallow-x86 yet. So don't worry about it. The keypoint is to implement the gralloc_drm_virgil3d.c. You may look at other gralloc_drm_*.c as examples. >>> >>> Nope, virgl is a pipe driver and support is already there for the most part. >> >> Rob, thank you very much for the input. >> It's the first time I heard the usage of >> AOSP's drm_gralloc & drm_hwcomposer from others. >> Great! >> >> Indeed we have tried to enable AOSP's drm_gralloc & drm_hwcomposer >> in the beginning of marshmallow-x86 porting but failed. > > How far did you get? > >> So we keep to use our implementation. >> (AOSP's drm_gralloc was forked from our lollipop-x86 branch >> since about Jan 2015) >> >> I'm excited to know you have succeeded to use them. >> Could you please guide us how to enable them correctly? > > Well, things are not completely working. I've got 1 device config > which can build for x86 or arm64 (any arch in theory) and runs on x86 > KVM, Dragonboard 410c or arm64 QEMU. x86 KVM seems to work the best. I > can boot and navigate around a bit until it dies. There's at least 2 > problems. 
> > After a little bit of navigating around I get errors like this from > virglrenderer: > vrend_set_single_sampler_view: context error reported 6 > "ndroid.contacts" Illegal handle 112 > vrend_set_single_sampler_view: context error reported 6 > "ndroid.contacts" Illegal handle 113 > vrend_set_single_sampler_view: context error reported 6 > "ndroid.contacts" Illegal handle 114 > vrend_set_single_sampler_view: context error reported 6 > "ndroid.contacts" Illegal handle 115 > vrend_set_single_sampler_view: context error reported 6 > "ndroid.contacts" Illegal handle 116 > vrend_set_single_sampler_view: context error reported 6 > "ndroid.contacts" Illegal handle 117 > vrend_set_single_sampler_view: context error reported 6 > "ndroid.contacts" Illegal handle 118 > vrend_set_framebuffer_state: context error reported 5 > "ndroid.systemui" Illegal surface 63 > > Usually the screen get flipped and drawn in about 1/4 of the original > screen size after these errors. I can capture a screenshot if > interested. I wonder a bit if refcnt'ing imbalance or something like that.. accidentally free'ing the last reference to buffer, and then numeric handle getting re-used on a different unrelated buffer (for example) can cause all sorts of fun. > The 2nd problem is the screen fade to black shader program crashes on > linking. Seems to have a NULL function name from the stack trace, but > I've not debugged it further. This triggers whenever the screen off > timeout triggers. iirc, android-x86 had something to comment out this shader. I do remember it causing a segfault in mesa in the shader compiler. I did attempt to reproduce this w/ same shader in a test program (where I could debug w/ sane gdb env, no java, etc), but no luck. (If anyone knows how to apitrace android "java stuff".. that might be a way to get something I could debug.) 
I can't find the link to the android-x86 patch anymore, since the git servers moved (and not even sure if that still applies to more recent android) BR, -R > Freedreno seems to have some additional problem I haven't fully characterized. > >> Any necessary changes? > > Yes, my changes are all pushed into my github acct[1]. Instructions > are here[2]. The changes are largely build fixes, virtio-gpu support, > freedreno/dmabuf support from Rob Clark's tree, and hacks around > issues I've found. > >> Especially, is the vanilla kernel 4.4 ready to use them? > > What I'm using is pretty close to stock 4.4[3]. There's a couple of > Android patches and some virtio-gpu related changes. The virtio-gpu > changes are mainly adding atomic support for virtio-gpu. There's some > others for fb mmap and panning support as well. > > Rob > > [1] https://github.com/robherring?tab=repositories > [2] > https://github.com/robherring/generic_device/wiki/Android-with-DRM-mesa-graphics > [3] > https://git.linaro.org/people/rob.herring/linux.git/shortlog/refs/heads/android-4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 7/8] gallium/radeon: implement PIPE_CAP_INVALIDATE_BUFFER
On 13.01.2016 05:41, Fredrik Höglund wrote: On Tuesday 12 January 2016, Nicolai Hähnle wrote: On 12.01.2016 13:41, Fredrik Höglund wrote: On Tuesday 12 January 2016, Nicolai Hähnle wrote: From: Nicolai Hähnle--- src/gallium/drivers/r600/r600_pipe.c| 2 +- src/gallium/drivers/radeon/r600_buffer_common.c | 23 --- src/gallium/drivers/radeon/r600_pipe_common.c | 1 + src/gallium/drivers/radeon/r600_pipe_common.h | 3 +++ src/gallium/drivers/radeonsi/si_pipe.c | 2 +- 5 files changed, 22 insertions(+), 9 deletions(-) diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index a8805f6..569f77c 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -278,6 +278,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR: case PIPE_CAP_TGSI_TXQS: case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS: + case PIPE_CAP_INVALIDATE_BUFFER: return 1; case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: @@ -355,7 +356,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_TGSI_FS_POSITION_IS_SYSVAL: case PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL: case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT: - case PIPE_CAP_INVALIDATE_BUFFER: return 0; case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS: diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c index aeb9a20..09755e0 100644 --- a/src/gallium/drivers/radeon/r600_buffer_common.c +++ b/src/gallium/drivers/radeon/r600_buffer_common.c @@ -209,6 +209,21 @@ static void r600_buffer_destroy(struct pipe_screen *screen, FREE(rbuffer); } +void r600_invalidate_resource(struct pipe_context *ctx, + struct pipe_resource *resource) +{ + struct r600_common_context *rctx = (struct r600_common_context*)ctx; +struct r600_resource *rbuffer = r600_resource(resource); + + /* Check if mapping this buffer would cause waiting for the GPU. 
*/ + if (r600_rings_is_buffer_referenced(rctx, rbuffer->buf, RADEON_USAGE_READWRITE) || + !rctx->ws->buffer_wait(rbuffer->buf, 0, RADEON_USAGE_READWRITE)) { + rctx->invalidate_buffer(>b, >b.b); + } else { + util_range_set_empty(>valid_buffer_range); + } This implementation does not exactly comply with the specification. The point of InvalidateBuffer is to tell the driver that it may discard the contents of the buffer if, for example, the buffer needs to be evicted. Calling InvalidateBuffer is not equivalent to calling MapBufferRange with GL_MAP_INVALIDATE_BUFFER_BIT, since the former should invalidate the buffer regardless of whether it is busy or not. Can you back this with a quote from the spec? Given that no-op seems to be a correct implmentation of InvalidateBuffer, I find what you write rather hard to believe. The overview says: "GL implementations often include several memory spaces, each with distinct performance characteristics, and the implementations transparently move allocations between memory spaces. With this extension, an application can tell the GL that the contents of a texture or buffer are no longer needed, and the implementation can avoid transferring the data unnecessarily." This to me makes the intent pretty clear. The implementation is of course free to do what it wants with this information, including nothing at all. My objection here is that your implementation only helps applications that are using the extension incorrectly. But it is still an improvement over doing nothing at all. This implementation helps applications that use glInvalidateBufferData to invalidate a buffer that they use in a streaming fashion. It seems to me that that is a correct use. Perhaps you could give an example of what you think a correct use is, and how it isn't helped by this patch? Thanks, Nicolai Part of the problems may be that the spec talks about "invalidating" without - as far as I can tell - ever defining what that means. 
In any case, I see no reason why the behavior should be different from GL_MAP_INVALIDATE_BUFFER_BIT. Thanks, Nicolai +} + static void *r600_buffer_get_transfer(struct pipe_context *ctx, struct pipe_resource *resource, unsigned level, @@ -276,13 +291,7 @@ static void *r600_buffer_transfer_map(struct pipe_context *ctx, !(usage & PIPE_TRANSFER_UNSYNCHRONIZED)) { assert(usage & PIPE_TRANSFER_WRITE); - /* Check if mapping this buffer would cause waiting for the GPU. */ - if
[Mesa-dev] [PATCH] radeonsi: don't miss changes to SPI_TMPRING_SIZE
From: Marek Olšák

I'm not sure about the consequences of this bug, but it's definitely
dangerous. This applies to SI, CIK, VI.

Cc: 11.0 11.1
---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 35b226f..8ff70b4 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1317,6 +1317,7 @@ static bool si_update_spi_tmpring_size(struct si_context *sctx)
 		si_get_max_scratch_bytes_per_wave(sctx);
 	unsigned scratch_needed_size = scratch_bytes_per_wave *
 		sctx->scratch_waves;
+	unsigned spi_tmpring_size;
 	int r;

 	if (scratch_needed_size > 0) {
@@ -1386,8 +1387,12 @@ static bool si_update_spi_tmpring_size(struct si_context *sctx)

 	assert((scratch_needed_size & ~0x3FF) == scratch_needed_size &&
 		"scratch size should already be aligned correctly.");

-	sctx->spi_tmpring_size = S_0286E8_WAVES(sctx->scratch_waves) |
-				S_0286E8_WAVESIZE(scratch_bytes_per_wave >> 10);
+	spi_tmpring_size = S_0286E8_WAVES(sctx->scratch_waves) |
+			   S_0286E8_WAVESIZE(scratch_bytes_per_wave >> 10);
+	if (spi_tmpring_size != sctx->spi_tmpring_size) {
+		sctx->spi_tmpring_size = spi_tmpring_size;
+		sctx->emit_scratch_reloc = true;
+	}
 	return true;
 }
--
2.1.4
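The pattern the patch introduces (compute the new register value, compare it with the cached one, and flag a re-emit only on change) can be shown with a small stand-alone sketch. Everything here is an assumption made for illustration: struct mock_ctx, pack_tmpring() and the bit layout are hypothetical and do not reproduce the real S_0286E8_* macros or si_context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified context; field names only echo the patch. */
struct mock_ctx {
    uint32_t spi_tmpring_size;  /* last value that was programmed */
    bool emit_scratch_reloc;    /* set when the register must be re-emitted */
};

/* Hypothetical packing: wave count in the low 12 bits, size in KiB above. */
static uint32_t pack_tmpring(uint32_t waves, uint32_t bytes_per_wave)
{
    assert((bytes_per_wave & 0x3FF) == 0);  /* caller aligns to 1 KiB */
    return (waves & 0xFFF) | ((bytes_per_wave >> 10) << 12);
}

/* Cache the packed value and raise the dirty flag only when it changes. */
static void update_tmpring(struct mock_ctx *ctx, uint32_t waves,
                           uint32_t bytes_per_wave)
{
    uint32_t value = pack_tmpring(waves, bytes_per_wave);

    if (value != ctx->spi_tmpring_size) {
        ctx->spi_tmpring_size = value;
        ctx->emit_scratch_reloc = true;
    }
}
```

Without the comparison, a changed value would be stored silently and nothing would tell the emit path that the register is stale, which is the kind of missed update the subject line refers to.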
Re: [Mesa-dev] [PATCH] radeonsi: don't miss changes to SPI_TMPRING_SIZE
Good catch.

Reviewed-by: Nicolai Hähnle

On 13.01.2016 13:32, Marek Olšák wrote: From: Marek Olšák I'm not sure about the consequences of this bug, but it's definitely dangerous. This applies to SI, CIK, VI. Cc: 11.0 11.1 --- src/gallium/drivers/radeonsi/si_state_shaders.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c index 35b226f..8ff70b4 100644 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c @@ -1317,6 +1317,7 @@ static bool si_update_spi_tmpring_size(struct si_context *sctx) si_get_max_scratch_bytes_per_wave(sctx); unsigned scratch_needed_size = scratch_bytes_per_wave * sctx->scratch_waves; + unsigned spi_tmpring_size; int r; if (scratch_needed_size > 0) { @@ -1386,8 +1387,12 @@ static bool si_update_spi_tmpring_size(struct si_context *sctx) assert((scratch_needed_size & ~0x3FF) == scratch_needed_size && "scratch size should already be aligned correctly."); - sctx->spi_tmpring_size = S_0286E8_WAVES(sctx->scratch_waves) | - S_0286E8_WAVESIZE(scratch_bytes_per_wave >> 10); + spi_tmpring_size = S_0286E8_WAVES(sctx->scratch_waves) | + S_0286E8_WAVESIZE(scratch_bytes_per_wave >> 10); + if (spi_tmpring_size != sctx->spi_tmpring_size) { + sctx->spi_tmpring_size = spi_tmpring_size; + sctx->emit_scratch_reloc = true; + } return true; }
Re: [Mesa-dev] NIR, SCons, and Gallium
On 11/01/16 14:21, Jose Fonseca wrote:
> FWIW, I updated SCons to build NIR, both with GCC and MSVC:
>
> http://cgit.freedesktop.org/~jrfonseca/mesa/log/?h=scons-nir
>
> It was actually simpler than I anticipated.
>
> But I hit a wall -- there's actually no way to get NIR used with
> softpipe/llvmpipe, not even as an intermediate IR somewhere between
> GLSL IR and TGSI, is there?
>
> Without this I can't actually test it. And I'm afraid the scons
> integration will rot again unless it is used.
>
> I know other gallium drivers already use NIR, but IIUC, they use NIR
> internally, i.e., TGSI -> NIR -> HW.
>
> So what exactly is the long term plan for NIR in Mesa in general, and
> Gallium in particular?
>
> - replace GLSL IR completely?
> - use NIR as intermediate IR between GLSL IR and TGSI, and run
>   optimizations in there?
> - use NIR instead of TGSI at the gallium interface?
> - be only used internally by drivers?
> - something else?
>
> Jose

Thanks for all the replies.

So IIUC, there is a NIR -> TGSI pass in progress. It's not ready for
production, but there are several parties interested in having it as an
option.

It's still not crystal clear to me whether building NIR with SCons and
MSVC will:

- accelerate synergy (e.g. make it easier to use more NIR code in more
  places without risking build failures due to missing headers/symbols)
- or cause more trouble (i.e. make MSVC builds fail even more often)

I don't think there's any way to find out other than trying it.

So I'm going to polish my patches, post them for review and get them
committed. And if it turns out that keeping NIR in a buildable state
with MSVC ends up causing more problems for everybody than it solves, we
can take a step back then (e.g. add a switch to not build NIR on MSVC,
and set it off by default.)

Jose
[Mesa-dev] [PATCH 1/2] nir: Handle bits=32 case in bitfield_insert lowering.
The OpenGL specification for bitfieldInsert() says:

   The result will be undefined if offset or bits is negative, or if the
   sum of offset and bits is greater than the number of bits used to
   store the operand.

Therefore passing bits=32, offset=0 is legal and defined in GLSL.

But the earlier SM5 bfi opcode is specified to accept a bitfield width
ranging from 0-31. As such, Intel and AMD instructions read only the low
5 bits of the width operand, making them unable to implement the
GLSL-specified behavior directly.

This commit fixes the lowering of bitfield_insert to handle the trivial
case of bits = 32 as

   bitfieldInsert:
      bits > 31 ? insert : bfi(bfm(bits, offset), insert, base)

Fixes:
   ES31-CTS.shader_bitfield_operation.bitfieldInsert.uint_2
   ES31-CTS.shader_bitfield_operation.bitfieldInsert.uvec4_3

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
---
These two patches replace 8/9 and 9/9 of the previous series. The first 7
patches from it have been reviewed and committed.

 src/glsl/nir/nir_opcodes.py       | 1 +
 src/glsl/nir/nir_opt_algebraic.py | 6 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py
index 1c65def..3e43438 100644
--- a/src/glsl/nir/nir_opcodes.py
+++ b/src/glsl/nir/nir_opcodes.py
@@ -558,6 +558,7 @@ triop("fcsel", tfloat, "(src0 != 0.0f) ? src1 : src2")
 opcode("bcsel", 0, tuint, [0, 0, 0],
       [tbool, tuint, tuint], "", "src0 ? src1 : src2")

+# SM5 bfi assembly
 triop("bfi", tuint, """
 unsigned mask = src0, insert = src1, base = src2;
 if (mask == 0) {
diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py
index 1eb044a..0d31e39 100644
--- a/src/glsl/nir/nir_opt_algebraic.py
+++ b/src/glsl/nir/nir_opt_algebraic.py
@@ -225,9 +225,13 @@ optimizations = [

    # Misc. lowering
    (('fmod', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b)))), 'options->lower_fmod'),
-   (('bitfield_insert', a, b, c, d), ('bfi', ('bfm', d, c), b, a), 'options->lower_bitfield_insert'),
    (('uadd_carry', a, b), ('b2i', ('ult', ('iadd', a, b), a)), 'options->lower_uadd_carry'),
    (('usub_borrow', a, b), ('b2i', ('ult', a, b)), 'options->lower_usub_borrow'),
+
+   (('bitfield_insert', 'base', 'insert', 'offset', 'bits'),
+    ('bcsel', ('ilt', 31, 'bits'), 'insert',
+              ('bfi', ('bfm', 'bits', 'offset'), 'insert', 'base')),
+    'options->lower_bitfield_insert'),
 ]

 # Add optimizations to handle the case where the result of a ternary is
--
2.4.9
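To make the bits=32 problem in the patch above concrete, here is a small stand-alone C sketch. It assumes a hardware-like bfm that reads only the low 5 bits of its operands, as the commit message describes; hw_bfm(), hw_bfi() and the two bitfield_insert_* wrappers are hypothetical names written for this illustration and are not NIR's or any driver's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed hardware-like bfm: only the low 5 bits of each operand are read. */
static uint32_t hw_bfm(int bits, int offset)
{
    uint32_t width = (uint32_t)bits & 31u;
    uint32_t shift = (uint32_t)offset & 31u;

    return ((1u << width) - 1u) << shift;
}

/* bfi: place `insert` into the bits of `base` that `mask` selects. */
static uint32_t hw_bfi(uint32_t mask, uint32_t insert, uint32_t base)
{
    if (mask == 0)
        return base;
    /* Shift `insert` up to the position of the lowest set mask bit. */
    for (uint32_t tmp = mask; !(tmp & 1u); tmp >>= 1)
        insert <<= 1;
    return (base & ~mask) | (insert & mask);
}

/* Old lowering: bits == 32 is truncated to a width of 0. */
static uint32_t bitfield_insert_naive(uint32_t base, uint32_t insert,
                                      int offset, int bits)
{
    return hw_bfi(hw_bfm(bits, offset), insert, base);
}

/* New lowering: bits > 31 ? insert : bfi(bfm(bits, offset), insert, base) */
static uint32_t bitfield_insert_lowered(uint32_t base, uint32_t insert,
                                        int offset, int bits)
{
    assert(offset >= 0 && bits >= 0 && offset + bits <= 32);
    return bits > 31 ? insert : hw_bfi(hw_bfm(bits, offset), insert, base);
}
```

With bits=32 and offset=0 the naive form builds an empty mask and hands back base unchanged, whereas the lowered form returns insert, which is what replacing all 32 bits should yield.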
[Mesa-dev] [PATCH 2/2] nir: Lower bitfield_extract.
The OpenGL specification for bitfieldExtract() says:

   The result will be undefined if offset or bits is negative, or if the
   sum of offset and bits is greater than the number of bits used to
   store the operand.

Therefore passing bits=32, offset=0 is legal and defined in GLSL.

But the earlier SM5 ubfe/ibfe opcodes are specified to accept a bitfield
width ranging from 0-31. As such, Intel and AMD instructions read only
the low 5 bits of the width operand, making them unable to implement the
GLSL-specified behavior directly.

This commit adds ubfe/ibfe operations from SM5 and a lowering pass for
bitfield_extract to handle the trivial case of bits = 32 as

   bitfieldExtract:
      bits > 31 ? value : bfe(value, offset, bits)

Fixes:
   ES31-CTS.shader_bitfield_operation.bitfieldExtract.uvec3_0

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
---
 src/glsl/nir/nir.h                         | 1 +
 src/glsl/nir/nir_opcodes.py                | 31 ++
 src/glsl/nir/nir_opt_algebraic.py          | 10 ++
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 3 +++
 src/mesa/drivers/dri/i965/brw_shader.cpp   | 1 +
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 3 +++
 6 files changed, 49 insertions(+)

diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 23aec69..11add65 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1447,6 +1447,7 @@ typedef struct nir_shader_compiler_options {
    bool lower_fsat;
    bool lower_fsqrt;
    bool lower_fmod;
+   bool lower_bitfield_extract;
    bool lower_bitfield_insert;
    bool lower_uadd_carry;
    bool lower_usub_borrow;
diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py
index 3e43438..e79810c 100644
--- a/src/glsl/nir/nir_opcodes.py
+++ b/src/glsl/nir/nir_opcodes.py
@@ -573,6 +573,37 @@ if (mask == 0) {
 }
 """)

+# SM5 ubfe/ibfe assembly
+opcode("ubfe", 0, tuint,
+       [0, 0, 0], [tuint, tint, tint], "", """
+unsigned base = src0;
+int offset = src1, bits = src2;
+if (bits == 0) {
+   dst = 0;
+} else if (bits < 0 || offset < 0) {
+   dst = 0; /* undefined */
+} else if (offset + bits < 32) {
+   dst = (base << (32 - bits - offset)) >> (32 - bits);
+} else {
+   dst = base >> offset;
+}
+""")
+opcode("ibfe", 0, tint,
+       [0, 0, 0], [tint, tint, tint], "", """
+int base = src0;
+int offset = src1, bits = src2;
+if (bits == 0) {
+   dst = 0;
+} else if (bits < 0 || offset < 0) {
+   dst = 0; /* undefined */
+} else if (offset + bits < 32) {
+   dst = (base << (32 - bits - offset)) >> (32 - bits);
+} else {
+   dst = base >> offset;
+}
+""")
+
+# GLSL bitfieldExtract()
 opcode("ubitfield_extract", 0, tuint,
        [0, 0, 0], [tuint, tint, tint], "", """
 unsigned base = src0;
diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py
index 0d31e39..7745b76 100644
--- a/src/glsl/nir/nir_opt_algebraic.py
+++ b/src/glsl/nir/nir_opt_algebraic.py
@@ -232,6 +232,16 @@ optimizations = [
     ('bcsel', ('ilt', 31, 'bits'), 'insert',
               ('bfi', ('bfm', 'bits', 'offset'), 'insert', 'base')),
     'options->lower_bitfield_insert'),
+
+   (('ibitfield_extract', 'value', 'offset', 'bits'),
+    ('bcsel', ('ilt', 31, 'bits'), 'value',
+              ('ibfe', 'value', 'offset', 'bits')),
+    'options->lower_bitfield_extract'),
+
+   (('ubitfield_extract', 'value', 'offset', 'bits'),
+    ('bcsel', ('ult', 31, 'bits'), 'value',
+              ('ubfe', 'value', 'offset', 'bits')),
+    'options->lower_bitfield_extract'),
 ]

 # Add optimizations to handle the case where the result of a ternary is
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 8740925..d7bcc1c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -1027,6 +1027,9 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)

    case nir_op_ubitfield_extract:
    case nir_op_ibitfield_extract:
+      unreachable("should have been lowered");
+   case nir_op_ubfe:
+   case nir_op_ibfe:
       bld.BFE(result, op[2], op[1], op[0]);
       break;
    case nir_op_bfm:
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 0ac3f4a..3a69c23 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -106,6 +106,7 @@ brw_compiler_create(void *mem_ctx, const struct brw_device_info *devinfo)
       nir_options->lower_fdiv = true;
       nir_options->lower_scmp = true;
       nir_options->lower_fmod = true;
+      nir_options->lower_bitfield_extract = true;
       nir_options->lower_bitfield_insert = true;
       nir_options->lower_uadd_carry = true;
       nir_options->lower_usub_borrow = true;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index ecca166..0ae723f 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1385,6 +1385,9
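The unsigned half of the extract lowering in this patch can be illustrated the same way. The sketch below assumes a hardware-like ubfe that truncates offset and width to 5 bits and otherwise follows the ubfe body shown in the diff; hw_ubfe() and ubitfield_extract_lowered() are hypothetical helper names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed hardware-like ubfe: offset and width are truncated to 5 bits. */
static uint32_t hw_ubfe(uint32_t value, int offset, int bits)
{
    uint32_t off = (uint32_t)offset & 31u;
    uint32_t width = (uint32_t)bits & 31u;

    if (width == 0)
        return 0;
    if (off + width < 32)
        return (value << (32 - width - off)) >> (32 - width);
    return value >> off;
}

/* Lowering from the commit message: bits > 31 ? value : bfe(...) */
static uint32_t ubitfield_extract_lowered(uint32_t value, int offset, int bits)
{
    assert(offset >= 0 && bits >= 0 && offset + bits <= 32);
    return bits > 31 ? value : hw_ubfe(value, offset, bits);
}
```

Calling hw_ubfe() directly with bits=32 sees a width of 0 and returns 0, while the lowered form returns the whole value; narrower fields take the same path in both.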
Re: [Mesa-dev] [PATCH v3] glapi: Build glapi_gentable.c only on Darwin
glxgears still works for me, and libGL goes from 4.2M to 3.3M.

Reviewed-by: Matt Turner

We should also include some mention of Arlie's contribution, since he
identified this and sent the initial patch:

Reported-by: Arlie Davis
[Mesa-dev] [Bug 92850] Segfault loading War Thunder
https://bugs.freedesktop.org/show_bug.cgi?id=92850 --- Comment #57 from Ernst Sjöstrand--- With current git I get a crash like this: 0x726ff349 in glsl_to_tgsi_visitor::visit (this=0x7fffa2725600, ir=0x7fffa271baf8) at state_tracker/st_glsl_to_tgsi.cpp:3161 3161 const glsl_type *sampler_type = ir->sampler->type; (gdb) bt full #0 0x726ff349 in glsl_to_tgsi_visitor::visit (this=0x7fffa2725600, ir=0x7fffa271baf8) at state_tracker/st_glsl_to_tgsi.cpp:3161 result_src = {file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle = 0, negate = 0, type = 13, reladdr = 0x0, reladdr2 = , has_index2 = , double_reg2 = , array_id = , is_double_vertex_input = } coord = {file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle = 0, negate = 0, type = 13, reladdr = 0x0, reladdr2 = , has_index2 = , double_reg2 = , array_id = , is_double_vertex_input = } cube_sc = {file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle = 0, negate = 0, type = 13, reladdr = 0x0, reladdr2 = , has_index2 = , double_reg2 = , array_id = , is_double_vertex_input = } lod_info = {file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle = 0, negate = 0, type = 13, reladdr = 0x0, reladdr2 = , has_index2 = , double_reg2 = , array_id = , is_double_vertex_input = } projector = {file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle = 0, negate = 0, type = 13, reladdr = 0x0, reladdr2 = , has_index2 = , double_reg2 = , array_id = , is_double_vertex_input = } dx = {file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle = 0, negate = 0, type = 13, reladdr = 0x0, reladdr2 = , has_index2 = , double_reg2 = , array_id = , is_double_vertex_input = } dy = {file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle = 0, negate = 0, type = 13, reladdr = 0x0, reladdr2 = , has_index2 = , double_reg2 = , array_id = , is_double_vertex_input = } offset = {{file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle = 0, negate = 0, type = 13, reladdr = 0x0, reladdr2 = 0x0, has_index2 = false, double_reg2 = false, array_id 
= 0, is_double_vertex_input = false}, {file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle = 0, negate = 0, type = 13, reladdr = 0x0, reladdr2 = 0x0, has_index2 = false, double_reg2 = false, array_id = 0, is_double_vertex_input = false}, {file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle = 0, negate = 0, type = 13, reladdr = 0x0, reladdr2 = 0x0, has_index2 = false, double_reg2 = false, array_id = 0, is_double_vertex_input = false}, {file = 4067491840, index = 0, index2D = 0, swizzle = 89, negate = 12, type = 6, reladdr = 0x4000, reladdr2 = 0x0, has_index2 = false, double_reg2 = false, array_id = 0, is_double_vertex_input = false}} sample_index = component = levels_src = result_dst = coord_dst = cube_sc_dst = inst = opcode = sampler_type = sampler_index = is_cube_array = i = (gdb) p *(ir->sampler) $8 = { = { = { = {next = 0x0, prev = 0x0}, _vptr.ir_instruction = 0x72e23b90 , ir_type = ir_type_dereference_variable}, type = 0x72e3fcc0 }, } -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V2 19/28] glsl: add support for explicit components to frag outputs
On Wed, Jan 13, 2016 at 1:19 AM, Timothy Arceriwrote: > V2: fix error checking for arrays and components. V1 was > only taking into account all the array elements and all the > components of one of the varyings during the comparision > and treating the other as a single slot/component. > > Cc: Anuj Phogat > --- > src/glsl/linker.cpp | 72 > + > 1 file changed, 62 insertions(+), 10 deletions(-) > > diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp > index b81bfba..c66dcc4 100644 > --- a/src/glsl/linker.cpp > +++ b/src/glsl/linker.cpp > @@ -2411,7 +2411,12 @@ assign_attribute_or_color_locations(gl_shader_program > *prog, >} > } to_assign[16]; > > + /* Temporary array for the set of attributes that have locations assigned. > +*/ > + ir_variable *assigned[16]; > + > unsigned num_attr = 0; > + unsigned assigned_attr = 0; > > foreach_in_list(ir_instruction, node, sh->ir) { >ir_variable *const var = node->as_variable(); > @@ -2573,18 +2578,62 @@ assign_attribute_or_color_locations(gl_shader_program > *prog, > * attribute overlaps any previously allocated bits. > */ > if ((~(use_mask << attr) & used_locations) != used_locations) { > - if (target_index == MESA_SHADER_FRAGMENT || > - (prog->IsES && prog->Version >= 300)) { > - linker_error(prog, > - "overlapping location is assigned " > - "to %s `%s' %d %d %d\n", string, > - var->name, used_locations, use_mask, attr); > + if (target_index == MESA_SHADER_FRAGMENT && !prog->IsES) { > + /* From section 4.4.2 (Output Layout Qualifiers) of the > GLSL > + * 4.40 spec: > + * > + *"Additionally, for fragment shader outputs, if two > + *variables are placed within the same location, they > + *must have the same underlying type (floating-point or > + *integer). No component aliasing of output variables > or > + *members is allowed. 
> + */ > + for (unsigned i = 0; i < assigned_attr; i++) { > + unsigned assigned_slots = > +assigned[i]->type->count_attribute_slots(false); > +unsigned assig_attr = > +assigned[i]->data.location - generic_base; > +unsigned assigned_use_mask = (1 << assigned_slots) - 1; > + > + if ((assigned_use_mask << assig_attr) & > + (use_mask << attr)) { > + > +const glsl_type *assigned_type = > + assigned[i]->type->without_array(); > +const glsl_type *type = var->type->without_array(); > +if (assigned_type->base_type != type->base_type) { > + linker_error(prog, "types do not match for > aliased" > +" %ss %s and %s\n", string, > +assigned[i]->name, var->name); > + return false; > +} > + > +unsigned assigned_component_mask = > + ((1 << assigned_type->vector_elements) - 1) << > + assigned[i]->data.location_frac; > +unsigned component_mask = > + ((1 << type->vector_elements) - 1) << > + var->data.location_frac; > +if (assigned_component_mask & component_mask) { > + linker_error(prog, "overlapping component is " > +"assigned to %ss %s and %s " > +"(component=%d)\n", > +string, assigned[i]->name, var->name, > +var->data.location_frac); > + return false; > +} > + } > + } > + } else if (target_index == MESA_SHADER_FRAGMENT || > + (prog->IsES && prog->Version >= 300)) { > + linker_error(prog, "overlapping location is assigned " > + "to %s `%s' %d %d %d\n", string, var->name, > + used_locations, use_mask, attr); >return false; > } else { > - linker_warning(prog, > - "overlapping location is assigned " > - "to %s `%s' %d %d %d\n", string, > -
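For readers following the quoted hunk, here is a minimal sketch of the component-overlap test it performs. This is not Mesa code: `struct frag_output` and both function names are made up for illustration, and the bounds checks assume that an output's components must fit within one four-component location.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two ir_variable fields the patch reads:
 * type->vector_elements and data.location_frac. */
struct frag_output {
   unsigned vector_elements; /* 1..4 */
   unsigned location_frac;   /* first component used, 0..3 */
};

/* Bitmask of the components this output occupies within its location. */
static unsigned
component_mask(const struct frag_output *out)
{
   assert(out->vector_elements >= 1 && out->vector_elements <= 4);
   assert(out->location_frac + out->vector_elements <= 4);
   return ((1u << out->vector_elements) - 1) << out->location_frac;
}

/* Two outputs sharing a location alias if their masks intersect. */
static bool
components_overlap(const struct frag_output *a, const struct frag_output *b)
{
   return (component_mask(a) & component_mask(b)) != 0;
}
```

With these definitions, a vec2 at component 0 and a vec2 at component 2 can share a location, while a vec2 at component 0 and a vec3 at component 1 cannot.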
[Mesa-dev] [PATCH] texobj: Check completeness with InternalFormat rather than Mesa format
The internal Mesa format used for a texture might not match the one
requested in the internalFormat when the texture was created, for
example if the driver is internally remapping RGB textures to RGBA.
Comparing the Mesa formats can therefore cause false positives for
completeness if one mipmap image is created as RGBA and the other as
RGB, because they would both have an RGBA Mesa format. If we check the
InternalFormat instead then we are directly checking the API usage,
which I think better matches the intention of the check.

https://bugs.freedesktop.org/show_bug.cgi?id=93700
---
 src/mesa/main/texobj.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/texobj.c b/src/mesa/main/texobj.c
index 547055e..b107a8f 100644
--- a/src/mesa/main/texobj.c
+++ b/src/mesa/main/texobj.c
@@ -835,7 +835,7 @@ _mesa_test_texobj_completeness( const struct gl_context *ctx,
             incomplete(t, MIPMAP, "TexImage[%d] is missing", i);
             return;
          }
-         if (img->TexFormat != baseImage->TexFormat) {
+         if (img->InternalFormat != baseImage->InternalFormat) {
             incomplete(t, MIPMAP, "Format[i] != Format[baseLevel]");
             return;
          }
-- 
2.5.0
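To make the false positive concrete, here is a small sketch under stated assumptions. The enums, `struct mip_image` and the helper names are hypothetical and much simpler than Mesa's real types; the only point is that a driver which stores RGB as RGBA makes the two comparisons disagree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified formats; not Mesa's real enums. */
enum api_format { API_RGB, API_RGBA };
enum hw_format  { HW_RGBA8888 };

struct mip_image {
   enum api_format internal_format; /* what the application requested */
   enum hw_format  tex_format;      /* what the driver actually allocated */
};

/* Models a driver that stores both RGB and RGBA textures as RGBA. */
static struct mip_image
make_image(enum api_format requested)
{
   struct mip_image img = { requested, HW_RGBA8888 };
   return img;
}

/* Old check: compares the driver-side format. */
static bool
levels_match_old(const struct mip_image *base, const struct mip_image *img)
{
   return img->tex_format == base->tex_format;
}

/* New check: compares what the API user asked for. */
static bool
levels_match_new(const struct mip_image *base, const struct mip_image *img)
{
   return img->internal_format == base->internal_format;
}
```

For a base level requested as RGBA and a level 1 requested as RGB, the old comparison reports a match while the new one reports a mismatch.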
[Mesa-dev] [PATCH] ttn: use writemask for store_var
From: Rob Clark

Only user is freedreno, and after array-rework it can cope. Avoids
generating loads for a store.

Signed-off-by: Rob Clark
---
Note: I need to finish some array re-work in ir3 in order to be able
to deal w/ the writemasks, so I intend to push this as part of that
series once I've debugged a few last things. It doesn't affect vc4,
which doesn't handle TEMP arrays. But wanted to send it to list now
for review.

 src/gallium/auxiliary/nir/tgsi_to_nir.c | 28 ++--
 1 file changed, 2 insertions(+), 26 deletions(-)

diff --git a/src/gallium/auxiliary/nir/tgsi_to_nir.c b/src/gallium/auxiliary/nir/tgsi_to_nir.c
index 94d992b..46c9297 100644
--- a/src/gallium/auxiliary/nir/tgsi_to_nir.c
+++ b/src/gallium/auxiliary/nir/tgsi_to_nir.c
@@ -673,10 +673,6 @@ ttn_get_dest(struct ttn_compile *c, struct tgsi_full_dst_register *tgsi_fdst)
 
    if (tgsi_dst->File == TGSI_FILE_TEMPORARY) {
       if (c->temp_regs[index].var) {
-         nir_builder *b = &c->build;
-         nir_intrinsic_instr *load;
-         struct tgsi_ind_register *indirect =
-            tgsi_dst->Indirect ? &tgsi_fdst->Indirect : NULL;
          nir_register *reg;
 
          /* this works, because TGSI will give us a base offset
@@ -690,26 +686,6 @@ ttn_get_dest(struct ttn_compile *c, struct tgsi_full_dst_register *tgsi_fdst)
          reg->num_components = 4;
          dest.dest.reg.reg = reg;
          dest.dest.reg.base_offset = 0;
-
-         /* since the alu op might not write to all components
-          * of the temporary, we must first do a load_var to
-          * get the previous array elements into the register.
-          * This is one area that NIR could use a bit of
-          * improvement (or opt pass to clean up the mess
-          * once things are scalarized)
-          */
-
-         load = nir_intrinsic_instr_create(c->build.shader,
-                                           nir_intrinsic_load_var);
-         load->num_components = 4;
-         load->variables[0] =
-            ttn_array_deref(c, load, c->temp_regs[index].var,
-                            c->temp_regs[index].offset,
-                            indirect);
-
-         load->dest = nir_dest_for_reg(reg);
-
-         nir_builder_instr_insert(b, &load->instr);
       } else {
          assert(!tgsi_dst->Indirect);
          dest.dest.reg.reg = c->temp_regs[index].reg;
@@ -1886,7 +1862,7 @@ ttn_emit_instruction(struct ttn_compile *c)
       ttn_move_dest(b, dest, nir_fsat(b, ttn_src_for_dest(b, &dest)));
    }
 
-   /* if the dst has a matching var, append store_global to move
+   /* if the dst has a matching var, append store_var to move
     * output from reg to var
     */
    nir_variable *var = ttn_get_var(c, tgsi_dst);
@@ -1899,7 +1875,7 @@ ttn_emit_instruction(struct ttn_compile *c)
                                           &tgsi_dst->Indirect : NULL;
 
       store->num_components = 4;
-      store->const_index[0] = 0xf;
+      store->const_index[0] = dest.write_mask;
       store->variables[0] = ttn_array_deref(c, store, var, offset, indirect);
       store->src[0] = nir_src_for_reg(dest.dest.reg.reg);
 
-- 
2.5.0
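As a rough illustration of why a writemask removes the need for the load, consider this sketch. It is a plain C model with a made-up function name, not NIR: a masked store touches only the selected components, so the untouched ones never need to be read back into a register first.

```c
#include <assert.h>

/* Hypothetical model of a 4-component store with a writemask:
 * bit i set means component i of dst is overwritten with src[i]. */
static void
store_with_writemask(float dst[4], const float src[4], unsigned write_mask)
{
   assert((write_mask & ~0xfu) == 0);
   for (unsigned i = 0; i < 4; i++) {
      if (write_mask & (1u << i))
         dst[i] = src[i];
   }
}
```

A mask of 0xf reproduces the old behaviour of writing all four components, which is only correct if the register was pre-loaded with the previous contents.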
Re: [Mesa-dev] [PATCH V2 19/28] glsl: add support for explicit components to frag outputs
On Wed, 2016-01-13 at 12:02 -0800, Anuj Phogat wrote: > Timothy, Do you have a branch somewhere with the latest patches? https://github.com/tarceri/Mesa_arrays_of_arrays.git explicit_offset Contains the latest for component, offset, and align qualifiers all of which have now been sent to the list. > > On Wed, Jan 13, 2016 at 10:58 AM, Anuj Phogat> wrote: > > On Wed, Jan 13, 2016 at 1:19 AM, Timothy Arceri > > wrote: > > > V2: fix error checking for arrays and components. V1 was > > > only taking into account all the array elements and all the > > > components of one of the varyings during the comparision > > > and treating the other as a single slot/component. > > > > > > Cc: Anuj Phogat > > > --- > > > src/glsl/linker.cpp | 72 > > > + > > > 1 file changed, 62 insertions(+), 10 deletions(-) > > > > > > diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp > > > index b81bfba..c66dcc4 100644 > > > --- a/src/glsl/linker.cpp > > > +++ b/src/glsl/linker.cpp > > > @@ -2411,7 +2411,12 @@ > > > assign_attribute_or_color_locations(gl_shader_program *prog, > > >} > > > } to_assign[16]; > > > > > > + /* Temporary array for the set of attributes that have > > > locations assigned. > > > +*/ > > > + ir_variable *assigned[16]; > > > + > > > unsigned num_attr = 0; > > > + unsigned assigned_attr = 0; > > > > > > foreach_in_list(ir_instruction, node, sh->ir) { > > >ir_variable *const var = node->as_variable(); > > > @@ -2573,18 +2578,62 @@ > > > assign_attribute_or_color_locations(gl_shader_program *prog, > > > * attribute overlaps any previously allocated bits. 
> > > */ > > > if ((~(use_mask << attr) & used_locations) != > > > used_locations) { > > > - if (target_index == MESA_SHADER_FRAGMENT || > > > - (prog->IsES && prog->Version >= 300)) { > > > - linker_error(prog, > > > - "overlapping location is assigned > > > " > > > - "to %s `%s' %d %d %d\n", string, > > > - var->name, used_locations, > > > use_mask, attr); > > > + if (target_index == MESA_SHADER_FRAGMENT && !prog > > > ->IsES) { > > > + /* From section 4.4.2 (Output Layout > > > Qualifiers) of the GLSL > > > + * 4.40 spec: > > > + * > > > + *"Additionally, for fragment shader > > > outputs, if two > > > + *variables are placed within the same > > > location, they > > > + *must have the same underlying type > > > (floating-point or > > > + *integer). No component aliasing of > > > output variables or > > > + *members is allowed. > > > + */ > > > + for (unsigned i = 0; i < assigned_attr; i++) { > > > + unsigned assigned_slots = > > > +assigned[i]->type > > > ->count_attribute_slots(false); > > > +unsigned assig_attr = > > > +assigned[i]->data.location - > > > generic_base; > > > +unsigned assigned_use_mask = (1 << > > > assigned_slots) - 1; > > > + > > > + if ((assigned_use_mask << assig_attr) & > > > + (use_mask << attr)) { > > > + > > > +const glsl_type *assigned_type = > > > + assigned[i]->type->without_array(); > > > +const glsl_type *type = var->type > > > ->without_array(); > > > +if (assigned_type->base_type != type > > > ->base_type) { > > > + linker_error(prog, "types do not > > > match for aliased" > > > +" %ss %s and %s\n", > > > string, > > > +assigned[i]->name, var > > > ->name); > > > + return false; > > > +} > > > + > > > +unsigned assigned_component_mask = > > > + ((1 << assigned_type > > > ->vector_elements) - 1) << > > > + assigned[i]->data.location_frac; > > > +unsigned component_mask = > > > + ((1 << type->vector_elements) - 1) << > > > + var->data.location_frac; > > > +if (assigned_component_mask & > > > component_mask) { > > > + 
linker_error(prog, "overlapping > > > component is " > > > +"assigned to %ss %s and > > > %s " > > > +"(component=%d)\n", > > > +string,
Re: [Mesa-dev] [PATCH 1/2] nir: Add a fquantize2f16 opcode
On Wed, Jan 13, 2016 at 2:14 PM, Matt Turner wrote:

> On Wed, Jan 13, 2016 at 1:46 PM, Jason Ekstrand wrote:
> > On Wed, Jan 13, 2016 at 2:01 AM, Ian Romanick wrote:
> >> On 01/12/2016 05:41 PM, Matt Turner wrote:
> >> > Section 8.3.2 of the OpenCL C 2.0 spec is also relevant, but doesn't
> >> > touch directly on the issue at hand.
> >> >
> >> > I'm worried that what is specified is not implementable via a round
> >> > trip through half-precision, because it's not the behavior other
> >> > languages implement.
> >> >
> >> > If I had to guess, given the table in the IVB PRM and section 8.3.2,
> >> > out-of-range single-precision floats are converted to the
> >> > half-precision value with the largest magnitude.
> >>
> >> You are correct, we should test it to be sure what the hardware really
> >> does. This is not intended to be a performance operation. If we need to
> >> use a different, more expensive expansion to meet the requirements, we
> >> shouldn't lose any sleep over it.
> >
> > I haven't looked at it in bit-for-bit detail, but I did run it through a
> > set of tests which explicitly hits denorms and the out-of-bounds cases in
> > both directions. The tests seem to indicate that the hardware does what
> > the opcode claims.
>
> I checked out the tests you mention, and none of the cases touch on
> what I'm saying (and this has nothing to do with denormal values). Let
> me explain again.

Right. Thanks for looking at it. I guess it only checks the explicit
infinity case.

> The largest representable value in half-precision is
>
>    65504 == 2.0**15 * (1.0 + 1023.0 / 2.0**10)
>
> and the distance between representable integers at this range is 32.
> Converting 65505.0f through 65519.0f (i.e., one less than half the
> interval more than the largest representable value) to half-precision
> should round to 65504.0. 65520.0f and larger should round to infinity.
>
> This is what piglit tests
> (generated_tests/gen_builtin_packing_tests.py) and since we pass those
> tests I believe this is what the hardware does.
>
> This is, unfortunately, *not* what the documentation you've cited
> says. I expect that that's an oversight more than intentional
> behavior. Maybe tomorrow we can figure out how to submit changes to
> the spec and test suite?

Yeah, we can look at that tomorrow. The objective of the opcode is to
get the behavior that Ian mentioned where if you sprinkle enough of
them in, you can emulate half-float precision. What happens if you do
FLOAT_MAX + FLOAT_MAX? Maybe infinity is what's wanted. If that's the
case, then we'll have to do some sort of absolute value range-check. It
doesn't have to be efficient.

--Jason
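The arithmetic quoted above can be checked with a short sketch. The function names are invented for this example, and the threshold assumes round-to-nearest-even conversion as described in the quoted message; it is not the opcode's actual lowering, only the kind of absolute-value range check mentioned at the end of the reply.

```c
#include <assert.h>
#include <stdbool.h>

/* Largest finite half-precision value: 2^15 * (1 + 1023/2^10) = 65504. */
static float
half_max(void)
{
   return 32768.0f * (1.0f + 1023.0f / 1024.0f);
}

/* Spacing between adjacent half values in [2^15, 2^16): 2^(15-10) = 32. */
static float
half_top_spacing(void)
{
   return 32.0f;
}

/* True if |x| is at or beyond the midpoint between half_max() and the
 * next (unrepresentable) step, i.e. 65504 + 16 = 65520. Under
 * round-to-nearest-even the tie at exactly 65520 goes away from 65504,
 * whose mantissa is odd, so it is assumed to become infinity. */
static bool
rounds_to_half_infinity(float x)
{
   float magnitude = x < 0.0f ? -x : x;
   return magnitude >= half_max() + half_top_spacing() / 2.0f;
}
```

Under these assumptions 65519.0f still quantizes to 65504.0, while 65520.0f and anything larger in magnitude would need to be replaced by infinity.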
[Mesa-dev] [PATCH] glsl: restrict consumer stage condition to modify interpolation type
Only modify interpolation type for integer-based varyings or when the
consumer is known and different than fragment shader.

If we are linking separate shader programs and the consumer is unknown,
the consumer could be added later and be a fragment shader. If we
modify the interpolation type in this case, we could read wrong values
in the fragment shader inputs, as shown in bug 93320.

Fixes the following CTS test:

ES31-CTS.vertex_attrib_binding.advanced-bindingUpdate

Fixes the following dEQP tests:

dEQP-GLES31.functional.separate_shader.random.102
dEQP-GLES31.functional.separate_shader.random.111
dEQP-GLES31.functional.separate_shader.random.115
dEQP-GLES31.functional.separate_shader.random.17
dEQP-GLES31.functional.separate_shader.random.22
dEQP-GLES31.functional.separate_shader.random.23
dEQP-GLES31.functional.separate_shader.random.3
dEQP-GLES31.functional.separate_shader.random.32
dEQP-GLES31.functional.separate_shader.random.39
dEQP-GLES31.functional.separate_shader.random.64
dEQP-GLES31.functional.separate_shader.random.73
dEQP-GLES31.functional.separate_shader.random.91

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93320
Signed-off-by: Samuel Iglesias Gonsálvez
---

This patch adds 2 regressions in dEQP:

dEQP-GLES31.functional.separate_shader.random.49
dEQP-GLES31.functional.separate_shader.random.106

The failure is returned by validation_io() because the number of inputs
and outputs does not match with this patch applied.

As the interpolation type is not modified in varying_matches::record()
when we don't know the consumer (consumer_stage == -1, for example in
some separate shader objects), we don't pack them together because
their packing classes do not match. As a result, some output packed
varyings are in different varying slots than the input ones.

Due to that, we have a mismatch in the number of inputs and outputs
because we don't check how many varyings we have inside of each varying
slot ("packed:var0,var1...") nor their type.

The validation of packed varyings doesn't seem to be trivial and this
is a different issue than the one this patch fixes.

 src/glsl/link_varyings.cpp | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
index 7cc5880..09f80d0 100644
--- a/src/glsl/link_varyings.cpp
+++ b/src/glsl/link_varyings.cpp
@@ -968,10 +968,12 @@ varying_matches::record(ir_variable *producer_var, ir_variable *consumer_var)
    }
 
    if ((consumer_var == NULL && producer_var->type->contains_integer()) ||
-       consumer_stage != MESA_SHADER_FRAGMENT) {
+       (consumer_stage != -1 && consumer_stage != MESA_SHADER_FRAGMENT)) {
       /* Since this varying is not being consumed by the fragment shader, its
-       * interpolation type varying cannot possibly affect rendering. Also,
-       * this variable is non-flat and is (or contains) an integer.
+       * interpolation type varying cannot possibly affect rendering.
+       * Also, this variable is non-flat and is (or contains) an integer.
+       * If the consumer stage is unknown, don't modify the interpolation
+       * type as it could affect rendering later with separate shaders.
        *
        * lower_packed_varyings requires all integer varyings to flat,
        * regardless of where they appear. We can trivially satisfy that
-- 
2.5.0
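The new condition can be read as a small predicate. The sketch below mirrors its shape only; the stage constants and the function name are hypothetical (Mesa's real enum values may differ), with -1 standing for the unknown-consumer case described in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stage identifiers for this sketch only. */
enum {
   STAGE_UNKNOWN  = -1, /* separate shader object, consumer not linked yet */
   STAGE_VERTEX   = 0,
   STAGE_GEOMETRY = 1,
   STAGE_FRAGMENT = 2
};

/* Returns true when the linker may rewrite the varying's interpolation
 * type to flat, following the condition in the patch. */
static bool
may_force_flat(bool has_consumer_var, bool producer_contains_integer,
               int consumer_stage)
{
   return (!has_consumer_var && producer_contains_integer) ||
          (consumer_stage != STAGE_UNKNOWN &&
           consumer_stage != STAGE_FRAGMENT);
}
```

The case the patch changes is the first one in the usage below: a non-integer varying with an unknown consumer is now left alone, because a fragment shader could still be attached later.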
Re: [Mesa-dev] [PATCH V3] glapi: add GL_OES_geometry_shader extension
On 01/14/2016 01:11 AM, Ilia Mirkin wrote:
> On Wed, Jan 13, 2016 at 3:55 AM, Tapani Pälli wrote:
>> On 01/13/2016 10:29 AM, Lofstedt, Marta wrote:
>>>> -Original Message-
>>>> From: ibmir...@gmail.com [mailto:ibmir...@gmail.com] On Behalf Of Ilia Mirkin
>>>> Sent: Tuesday, January 12, 2016 7:09 PM
>>>> To: Marta Lofstedt
>>>> Cc: mesa-dev@lists.freedesktop.org; Romanick, Ian D; Lofstedt, Marta
>>>> Subject: Re: [PATCH V3] glapi: add GL_OES_geometry_shader extension
>>>>
>>>> On Tue, Jan 12, 2016 at 9:13 AM, Marta Lofstedt wrote:
>>>>> From: Marta Lofstedt
>>>>>
>>>>> Add xml definitions for the GL_OES_geometry_shader extension and
>>>>> expose the extension for OpenGL ES 3.1.
>>>>>
>>>>> V3: Added dependency to OES_shader_io_blocks and updated to correct
>>>>> Khronos extension number.
>>>>
>>>> May I ask why you did this? OES_shader_io_blocks is a purely shader
>>>> compiler/linker feature, I expect it will be enabled whenever GLES 3.1
>>>> is enabled, no? Why would it be tied to geometry shaders? Sure, geometry
>>>> shaders require it to work, but just because you have
>>>> OES_shader_io_blocks doesn't necessarily mean you also have geometry
>>>> shaders...
>>>
>>> My intension was to address the co-dependency between
>>> oes_geometry_shader and oes_shader_io_block.
>>> But as always, you are right Ilia. The dependency issue need to be
>>> fixed in the driver.
>>>
>>> So, please disregard this V3, I will push the V2 with the changes
>>> suggested by Ilia in the comments.
>>>
>>> FYI here are quotes from the oes_geometry_shader specification:
>>> " OES_shader_io_blocks or EXT_shader_io_blocks is required."
>>
>> IMO according to this it looks OES_shader_io_blocks is a valid
>> requirement as that functionality is not part of OpenGL ES 3.1.
>
> Sure. But that has little bearing on the discussion here --
> OES_shader_io_blocks is a compiler feature, not a backend feature. In
> order for any backend to expose OES_geometry_shader, the
> OES_shader_io_blocks ext needs to be done. But just because it is done
> doesn't mean you have geometry shaders.

Right, that is correct. For a moment there I forgot how this table
works :) There needs to be separate enable bits for these.

> So you have to make sure that not only does the backend support
> geometry shaders, but the core supports OES_shader_io_blocks before
> you expose OES_geometry_shader. That doesn't seem too onerous.
>
> -ilia

// Tapani
[Mesa-dev] [PATCH] glsl: allow duplicate layout-qualifier-names
The special case for detecting stream duplicates is also removed, as
testing never triggered this error.

From the ARB_shading_language_420pack spec:

   "More than one layout qualifier may appear in a single declaration.
   If the same layout-qualifier-name occurs in multiple layout
   qualifiers for the same declaration, the last one overrides the
   former ones."

While the extension spec is talking about multiple layout qualifiers,
we interpret that to mean layout-qualifier-names can also occur
multiple times within a single layout qualifier. Section 4.4 (Layout
Qualifiers) of the GLSL 4.40 spec clarifies this:

   "More than one layout qualifier may appear in a single declaration.
   Additionally, the same layout-qualifier-name can occur multiple
   times within a layout qualifier or across multiple layout qualifiers
   in the same declaration"
---

The Nvidia driver allows this for GLSL 4.20 but not for the extension.

Piglit tests: http://patchwork.freedesktop.org/patch/70459/

 src/glsl/ast_type.cpp | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/src/glsl/ast_type.cpp b/src/glsl/ast_type.cpp
index f4e51b8..afae687 100644
--- a/src/glsl/ast_type.cpp
+++ b/src/glsl/ast_type.cpp
@@ -158,7 +158,8 @@ ast_type_qualifier::merge_qualifier(YYLTYPE *loc,
    allowed_duplicates_mask.flags.i |=
       stream_layout_mask.flags.i;
 
-   if ((this->flags.i & q.flags.i & ~allowed_duplicates_mask.flags.i) != 0) {
+   if (!state->has_420pack() &&
+       (this->flags.i & q.flags.i & ~allowed_duplicates_mask.flags.i) != 0) {
       _mesa_glsl_error(loc, state,
                        "duplicate layout qualifiers used");
       return false;
@@ -209,11 +210,6 @@ ast_type_qualifier::merge_qualifier(YYLTYPE *loc,
             this->flags.q.stream = 1;
             this->stream = state->out_qualifier->stream;
          }
-      } else {
-         if (q.flags.q.explicit_stream) {
-            _mesa_glsl_error(loc, state,
-                             "duplicate layout `stream' qualifier");
-         }
       }
    }
 
-- 
2.4.3
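A compact way to see the effect of the first hunk is the predicate below. It is only a sketch: the qualifier bits and the function name are hypothetical, and the real `merge_qualifier()` does considerably more than this check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical qualifier flag bits for this sketch only. */
enum {
   Q_LOCATION = 1 << 0,
   Q_BINDING  = 1 << 1,
   Q_STREAM   = 1 << 2
};

/* Returns true if merging `incoming` into `existing` should be accepted.
 * Without 420pack semantics, a repeated qualifier is an error unless it
 * is in the allowed-duplicates mask; with them, repeats are always fine
 * and the later value is expected to override the earlier one. */
static bool
merge_is_allowed(unsigned existing, unsigned incoming,
                 unsigned allowed_duplicates, bool has_420pack)
{
   if (has_420pack)
      return true;
   return (existing & incoming & ~allowed_duplicates) == 0;
}
```

So a declaration that repeats `location` is rejected only when 420pack-style behaviour is unavailable.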
Re: [Mesa-dev] [PATCH] ttn: add missing writemask on store_output
On Wednesday, January 13, 2016 6:40:59 PM PST Rob Clark wrote:
> From: Rob Clark
>
> Signed-off-by: Rob Clark
> ---
>  src/gallium/auxiliary/nir/tgsi_to_nir.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/gallium/auxiliary/nir/tgsi_to_nir.c b/src/gallium/auxiliary/nir/tgsi_to_nir.c
> index 46c9297..e127174 100644
> --- a/src/gallium/auxiliary/nir/tgsi_to_nir.c
> +++ b/src/gallium/auxiliary/nir/tgsi_to_nir.c
> @@ -1908,6 +1908,7 @@ ttn_add_output_stores(struct ttn_compile *c)
>           store->src[0].reg.reg = c->output_regs[loc].reg;
>           store->src[0].reg.base_offset = c->output_regs[loc].offset;
>           store->const_index[0] = loc;
> +         store->const_index[1] = 0xf; /* writemask */
>           store->src[1] = nir_src_for_ssa(nir_imm_int(b, 0));
>           nir_builder_instr_insert(b, &store->instr);
>        }

Oops...sorry :(

Reviewed-by: Kenneth Graunke
[Mesa-dev] [PATCH] nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem space
Reduces local memory usage in a lot of Metro 2033 Redux and a few KSP
shaders:

total local used in shared programs : 54116 -> 30372 (-43.88%)

Probably modest advantage to execution, but it's an important
prerequisite to dropping some of the TGSI optimizations done by the
state tracker.

Signed-off-by: Ilia Mirkin
---

Seems like there ought to be a simpler way of doing this... oh well.

 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 64 +-
 1 file changed, 50 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 0e1c332..2085978 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -841,6 +841,11 @@ public:
    std::set locals;
 
    std::set indirectTempArrays;
+   struct TempBase {
+      int oldBase, newBase;
+   };
+   std::map indirectTempBases;
+   std::map > tempArrayInfo;
    std::vector tempArrayId;
 
    int clipVertexOutput;
@@ -949,9 +954,19 @@ bool Source::scanSource()
    }
    tgsi_parse_free();
 
-   // TODO: Compute based on relevant array sizes
-   if (indirectTempArrays.size())
-      info->bin.tlsSpace += (scan.file_max[TGSI_FILE_TEMPORARY] + 1) * 16;
+   if (indirectTempArrays.size()) {
+      int tempBase = 0;
+      for (std::set::const_iterator it = indirectTempArrays.begin();
+           it != indirectTempArrays.end(); ++it) {
+         std::pair & info = tempArrayInfo[*it];
+         TempBase base;
+         base.oldBase = info.first;
+         base.newBase = tempBase;
+         indirectTempBases.insert(std::make_pair(*it, base));
+         tempBase += info.second;
+      }
+      info->bin.tlsSpace += tempBase * 16;
+   }
 
    if (info->io.genUserClip > 0) {
       info->io.clipDistances = info->io.genUserClip;
@@ -1208,6 +1223,9 @@ bool Source::scanDeclaration(const struct tgsi_full_declaration *decl)
    case TGSI_FILE_TEMPORARY:
       for (i = first; i <= last; ++i)
          tempArrayId[i] = arrayId;
+      if (arrayId)
+         tempArrayInfo.insert(std::make_pair(arrayId, std::make_pair(
+                                                first, last - first + 1)));
       break;
    case TGSI_FILE_NULL:
    case TGSI_FILE_ADDRESS:
@@ -1374,6 +1392,7 @@ private:
    void storeDst(const tgsi::Instruction::DstRegister dst, int c,
                  Value *val, Value *ptr);
 
+   void adjustTempIndex(int arrayId, int &idx, int &idx2d) const;
    Value *applySrcMod(Value *, int s, int c);
 
    Symbol *makeSym(uint file, int fileIndex, int idx, int c, uint32_t addr);
@@ -1679,11 +1698,23 @@ Converter::shiftAddress(Value *index)
    return mkOp2v(OP_SHL, TYPE_U32, getSSA(4, FILE_ADDRESS), index, mkImm(4));
 }
 
+void
+Converter::adjustTempIndex(int arrayId, int &idx, int &idx2d) const
+{
+   std::map ::const_iterator it =
+      code->indirectTempBases.find(arrayId);
+   if (it == code->indirectTempBases.end())
+      return;
+
+   idx2d = 1;
+   idx += it->second.newBase - it->second.oldBase;
+}
+
 Value *
 Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
 {
    int idx2d = src.is2D() ? src.getIndex(1) : 0;
-   const int idx = src.getIndex(0);
+   int idx = src.getIndex(0);
    const int swz = src.getSwizzle(c);
    Instruction *ld;
 
@@ -1728,8 +1759,7 @@ Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
          int arrayid = src.getArrayId();
          if (!arrayid)
             arrayid = code->tempArrayId[idx];
-         idx2d = (code->indirectTempArrays.find(arrayid) !=
-                  code->indirectTempArrays.end());
+         adjustTempIndex(arrayid, idx, idx2d);
       }
       /* fallthrough */
    default:
@@ -1743,7 +1773,7 @@ Converter::acquireDst(int d, int c)
 {
    const tgsi::Instruction::DstRegister dst = tgsi.getDst(d);
    const unsigned f = dst.getFile();
-   const int idx = dst.getIndex(0);
+   int idx = dst.getIndex(0);
    int idx2d = dst.is2D() ? dst.getIndex(1) : 0;
 
    if (dst.isMasked(c) || f == TGSI_FILE_BUFFER || f == TGSI_FILE_IMAGE)
@@ -1754,9 +1784,12 @@ Converter::acquireDst(int d, int c)
        (f == TGSI_FILE_OUTPUT && prog->getType() != Program::TYPE_FRAGMENT))
       return getScratch();
 
-   if (f == TGSI_FILE_TEMPORARY)
-      idx2d = code->indirectTempArrays.find(code->tempArrayId[idx]) !=
-         code->indirectTempArrays.end();
+   if (f == TGSI_FILE_TEMPORARY) {
+      int arrayid = dst.getArrayId();
+      if (!arrayid)
+         arrayid = code->tempArrayId[idx];
+      adjustTempIndex(arrayid, idx, idx2d);
+   }
 
    return getArrayForFile(f, idx2d)-> acquire(sub.cur->values, idx, c);
 }
@@ -1789,7 +1822,7 @@ Converter::storeDst(const tgsi::Instruction::DstRegister dst, int c,
                     Value *val, Value *ptr)
 {
    const unsigned f = dst.getFile();
-   const int idx =
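The rebasing idea in the patch can be sketched independently of nouveau. Everything below is hypothetical naming in plain C, assuming each indirectly addressed array is described by its first temp index and its length, and that each temp is a 16-byte vec4 as the `* 16` in the patch suggests.

```c
#include <assert.h>

/* Hypothetical description of one indirectly addressed temp array. */
struct temp_array {
   int old_base; /* first TGSI temp index of the array */
   int size;     /* number of vec4 temps in the array */
   int new_base; /* filled in by rebase_temp_arrays() */
};

/* Packs the arrays back to back starting at 0 and returns the number of
 * bytes of local memory needed, at 16 bytes per vec4 temp. */
static int
rebase_temp_arrays(struct temp_array *arrays, int count)
{
   int next_base = 0;
   for (int i = 0; i < count; i++) {
      assert(arrays[i].size > 0);
      arrays[i].new_base = next_base;
      next_base += arrays[i].size;
   }
   return next_base * 16;
}

/* Translates an original temp index into its rebased position. */
static int
rebased_index(const struct temp_array *array, int idx)
{
   assert(idx >= array->old_base && idx < array->old_base + array->size);
   return idx + (array->new_base - array->old_base);
}
```

In this model two arrays occupying temps 10..13 and 40..47 need 12 * 16 = 192 bytes, whereas sizing by the highest temp index would have reserved 48 * 16 = 768 bytes.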
[Mesa-dev] [PATCH] st/mesa: add check for color logicop in blit_copy_pixels()
We check that a bunch of raster operations are disabled in
blit_copy_pixels(). We also need to check that color logicop is
disabled, or set to GL_COPY.
---
 src/mesa/state_tracker/st_cb_drawpixels.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c
index 7ed52dd..04a9de0 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -1302,6 +1302,7 @@ blit_copy_pixels(struct gl_context *ctx, GLint srcx, GLint srcy,
        ctx->_ImageTransferState == 0x0 &&
        !ctx->Color.BlendEnabled &&
        !ctx->Color.AlphaEnabled &&
+       (!ctx->Color.ColorLogicOpEnabled || ctx->Color.LogicOp == GL_COPY) &&
        !ctx->Depth.Test &&
        !ctx->Fog.Enabled &&
        !ctx->Stencil.Enabled &&
-- 
1.9.1
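The added term can be read as a tiny predicate. In this sketch the enum and function name are hypothetical placeholders rather than the GL tokens or Mesa fields; the only claim is that a blit is equivalent to the slow path when the logic op is off, or when it is the plain copy operation that writes the source colour unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical logic-op identifiers for this sketch only. */
enum logic_op { LOGIC_OP_COPY, LOGIC_OP_XOR, LOGIC_OP_INVERT };

/* True if the colour logic-op state does not prevent using a blit. */
static bool
logicop_allows_blit(bool logicop_enabled, enum logic_op op)
{
   return !logicop_enabled || op == LOGIC_OP_COPY;
}
```

An enabled XOR or INVERT has to fall back, since a blit would ignore the destination contents those operations depend on.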
Re: [Mesa-dev] [PATCH V3] glapi: add GL_OES_geometry_shader extension
On Wed, Jan 13, 2016 at 3:55 AM, Tapani Pälli wrote:
> On 01/13/2016 10:29 AM, Lofstedt, Marta wrote:
>>> -Original Message-
>>> From: ibmir...@gmail.com [mailto:ibmir...@gmail.com] On Behalf Of Ilia Mirkin
>>> Sent: Tuesday, January 12, 2016 7:09 PM
>>> To: Marta Lofstedt
>>> Cc: mesa-dev@lists.freedesktop.org; Romanick, Ian D; Lofstedt, Marta
>>> Subject: Re: [PATCH V3] glapi: add GL_OES_geometry_shader extension
>>>
>>> On Tue, Jan 12, 2016 at 9:13 AM, Marta Lofstedt wrote:
>>>> From: Marta Lofstedt
>>>>
>>>> Add xml definitions for the GL_OES_geometry_shader extension and
>>>> expose the extension for OpenGL ES 3.1.
>>>>
>>>> V3: Added dependency to OES_shader_io_blocks and updated to correct
>>>> Khronos extension number.
>>>
>>> May I ask why you did this? OES_shader_io_blocks is a purely shader
>>> compiler/linker feature, I expect it will be enabled whenever GLES 3.1
>>> is enabled, no? Why would it be tied to geometry shaders? Sure, geometry
>>> shaders require it to work, but just because you have
>>> OES_shader_io_blocks doesn't necessarily mean you also have geometry
>>> shaders...
>>
>> My intention was to address the co-dependency between
>> oes_geometry_shader and oes_shader_io_block.
>> But as always, you are right Ilia. The dependency issue needs to be
>> fixed in the driver.
>>
>> So, please disregard this V3, I will push the V2 with the changes
>> suggested by Ilia in the comments.
>>
>> FYI here are quotes from the oes_geometry_shader specification:
>> " OES_shader_io_blocks or EXT_shader_io_blocks is required."
>
> IMO according to this it looks like OES_shader_io_blocks is a valid
> requirement as that functionality is not part of OpenGL ES 3.1.

Sure. But that has little bearing on the discussion here --
OES_shader_io_blocks is a compiler feature, not a backend feature. In
order for any backend to expose OES_geometry_shader, the
OES_shader_io_blocks ext needs to be done. But just because it is done
doesn't mean you have geometry shaders.

So you have to make sure that not only does the backend support
geometry shaders, but the core supports OES_shader_io_blocks before you
expose OES_geometry_shader. That doesn't seem too onerous.

-ilia
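The gating described above amounts to requiring both conditions. A minimal sketch, with a made-up struct and function name rather than Mesa's extension table:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability flags for this sketch only. */
struct caps {
   bool backend_geometry_shaders; /* driver/backend can run geometry shaders */
   bool core_shader_io_blocks;    /* compiler supports OES_shader_io_blocks */
};

/* OES_shader_io_blocks is necessary but not sufficient: the backend must
 * also support geometry shaders before OES_geometry_shader is exposed. */
static bool
can_expose_oes_geometry_shader(const struct caps *caps)
{
   return caps->backend_geometry_shaders && caps->core_shader_io_blocks;
}
```

Note that the reverse does not hold in this model: io blocks can be exposed on their own, for example for a backend that gains tessellation first.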
Re: [Mesa-dev] [PATCH V3] glapi: add GL_OES_geometry_shader extension
On 01/13/2016 12:55 AM, Tapani Pälli wrote: > On 01/13/2016 10:29 AM, Lofstedt, Marta wrote: >> >>> -Original Message- >>> From: ibmir...@gmail.com [mailto:ibmir...@gmail.com] On Behalf Of Ilia >>> Mirkin >>> Sent: Tuesday, January 12, 2016 7:09 PM >>> To: Marta Lofstedt >>> Cc: mesa-dev@lists.freedesktop.org; Romanick, Ian D; Lofstedt, Marta >>> Subject: Re: [PATCH V3] glapi: add GL_OES_geometry_shader extension >>> >>> On Tue, Jan 12, 2016 at 9:13 AM, Marta Lofstedt >>> wrote: From: Marta Lofstedt Add xml definitions for the GL_OES_geometry_shader extension and expose the extension for OpenGL ES 3.1. V3: Added dependency to OES_shader_io_blocks and updated to correct Khronos extension number. >>> May I ask why you did this? OES_shader_io_blocks is a purely shader >>> compiler/linker feature, I expect it will be enabled whenever GLES >>> 3.1 is >>> enabled, no? Why would it be tied to geometry shaders? Sure, geometry >>> shaders require it to work, but just because you have >>> OES_shader_io_blocks >>> doesn't necessarily mean you also have geometry shaders... >>> >> My intention was to address the co-dependency between >> oes_geometry_shader and oes_shader_io_blocks. >> But as always, you are right Ilia. The dependency issue needs to be >> fixed in the driver. >> >> So, please disregard this V3, I will push the V2 with the changes >> suggested by Ilia in the comments. >> >> FYI here are quotes from the oes_geometry_shader specification: >> " OES_shader_io_blocks or EXT_shader_io_blocks is required." > > IMO according to this it looks like OES_shader_io_blocks is a valid > requirement as that functionality is not part of OpenGL ES 3.1. True. Any driver that enables OES_geometry_shader but does not also enable OES_shader_io_blocks has a bug. OES_shader_io_blocks is necessary but not sufficient.
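The "necessary but not sufficient" rule from this thread can be sketched in a few lines of C. This is only an illustration of the dependency direction: the struct, its fields, and both function names are hypothetical and are not Mesa's actual extension tables or driver hooks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability flags; the names are illustrative only. */
struct caps {
   bool backend_has_geometry_shaders; /* backend/hardware feature */
   bool core_has_shader_io_blocks;    /* compiler/linker feature */
};

/* OES_shader_io_blocks depends only on the compiler/linker support. */
static bool
expose_shader_io_blocks(const struct caps *c)
{
   assert(c);
   return c->core_has_shader_io_blocks;
}

/* OES_geometry_shader needs the backend feature AND io_blocks. */
static bool
expose_geometry_shader(const struct caps *c)
{
   assert(c);
   return c->backend_has_geometry_shaders && expose_shader_io_blocks(c);
}
```

With this shape, a backend without geometry shaders can still expose OES_shader_io_blocks (for example if tessellation lands first), while OES_geometry_shader can never be exposed without it.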
[Mesa-dev] [RFC] nir: const_index sanity
From: Rob Clark--- An idea for how to bring some sanity to the wild-west of intrinsic const_index[] usage. Also w/ nir_print support, which could be split into other patch, but makes the nir_print output a bit nicer: intrinsic store_output (ssa_210, ssa_66) () (0, 15) /* base=0 wrmask=xyzw */ (and already made me realize that ttn was neglecting to set wrmask on store_output's) Probably I'd add "setter" functions to, and then in follow-on patches, update the gazillion places where const_index[] access is open-coded. But first, before big conflicty changes like that, I figured I see what others thought. The other variation of the idea is to simply drop the const_index[] field and replace w/ 'unsigned wrmask' and 'int base'. Although that would be a bigger more flag-day sort of patch. BR, -R src/glsl/nir/nir.h| 48 +++- src/glsl/nir/nir_intrinsics.c | 11 ++- src/glsl/nir/nir_intrinsics.h | 178 +- src/glsl/nir/nir_print.c | 30 --- 4 files changed, 166 insertions(+), 101 deletions(-) diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index bedcc0d..2235154 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -786,7 +786,7 @@ typedef struct { } nir_call_instr; #define INTRINSIC(name, num_srcs, src_components, has_dest, dest_components, \ - num_variables, num_indices, flags) \ + num_variables, num_indices, idx0, idx1, idx2, flags) \ nir_intrinsic_##name, #define LAST_INTRINSIC(name) nir_last_intrinsic = nir_intrinsic_##name, @@ -799,6 +799,8 @@ typedef enum { #undef INTRINSIC #undef LAST_INTRINSIC +#define NIR_INTRINSIC_MAX_CONST_INDEX 3 + /** Represents an intrinsic * * An intrinsic is an instruction type for handling things that are @@ -842,7 +844,7 @@ typedef struct { */ uint8_t num_components; - int const_index[3]; + int const_index[NIR_INTRINSIC_MAX_CONST_INDEX]; nir_deref_var *variables[2]; @@ -871,6 +873,29 @@ typedef enum { NIR_INTRINSIC_CAN_REORDER = (1 << 1), } nir_intrinsic_semantic_flag; +/** + * \name NIR intrinsics const-index flag + * + * Indicates 
the usage of a const_index slot. + * + * \sa nir_intrinsic_info::index_map + */ +typedef enum { + /** +* Generally instructions that take a offset src argument, can encode +* a constant 'base' value which is added to the offset. +*/ + NIR_INTRINSIC_BASE = 1, + + /** +* For store instructions, a writemask for the store. +*/ + NIR_INTRINSIC_WRMASK = 2, + + NIR_INTRINSIC_NUM_INDEX_FLAGS, + +} nir_intrinsic_index_flag; + #define NIR_INTRINSIC_MAX_INPUTS 4 typedef struct { @@ -900,12 +925,31 @@ typedef struct { /** the number of constant indices used by the intrinsic */ unsigned num_indices; + /** indicates the usage of intr->const_index[n] */ + unsigned index_map[NIR_INTRINSIC_NUM_INDEX_FLAGS]; + /** semantic flags for calls to this intrinsic */ nir_intrinsic_semantic_flag flags; } nir_intrinsic_info; extern const nir_intrinsic_info nir_intrinsic_infos[nir_num_intrinsics]; +static inline unsigned +nir_intrinsic_write_mask(nir_intrinsic_instr *instr) +{ + const nir_intrinsic_info *info = _intrinsic_infos[instr->intrinsic]; + assert(info->index_map[NIR_INTRINSIC_WRMASK] > 0); + return instr->const_index[info->index_map[NIR_INTRINSIC_WRMASK] - 1]; +} + +static inline int +nir_intrinsic_base(nir_intrinsic_instr *instr) +{ + const nir_intrinsic_info *info = _intrinsic_infos[instr->intrinsic]; + assert(info->index_map[NIR_INTRINSIC_BASE] > 0); + return instr->const_index[info->index_map[NIR_INTRINSIC_BASE] - 1]; +} + /** * \group texture information * diff --git a/src/glsl/nir/nir_intrinsics.c b/src/glsl/nir/nir_intrinsics.c index a7c868c..7dddc70 100644 --- a/src/glsl/nir/nir_intrinsics.c +++ b/src/glsl/nir/nir_intrinsics.c @@ -30,7 +30,8 @@ #define OPCODE(name) nir_intrinsic_##name #define INTRINSIC(_name, _num_srcs, _src_components, _has_dest, \ - _dest_components, _num_variables, _num_indices, _flags) \ + _dest_components, _num_variables, _num_indices, \ + idx0, idx1, idx2, _flags) \ { \ .name = #_name, \ .num_srcs = _num_srcs, \ @@ -39,9 +40,15 @@ .dest_components = 
_dest_components, \ .num_variables = _num_variables, \ .num_indices = _num_indices, \ - .flags = _flags \ + .index_map = { \ + [NIR_INTRINSIC_ ## idx0] = 1, \ + [NIR_INTRINSIC_ ## idx1] = 2, \ + [NIR_INTRINSIC_ ## idx2] = 3, \ + }, \ }, +#define NIR_INTRINSIC_xx 0 + #define LAST_INTRINSIC(name) const nir_intrinsic_info nir_intrinsic_infos[nir_num_intrinsics] = { diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h index 62eead4..fd46692 100644 --- a/src/glsl/nir/nir_intrinsics.h +++ b/src/glsl/nir/nir_intrinsics.h @@ -30,7 +30,7 @@ * expands to a list of macros of the form: * *
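For readers skimming the RFC, here is a small standalone C sketch of the index_map idea. It assumes, as in the patch, that index_map[flag] stores the const_index slot plus one and that zero means the intrinsic has no such index. All type and function names are simplified stand-ins rather than the real NIR definitions, and set_index is only a guess at what the "setter" functions mentioned above might look like.

```c
#include <assert.h>

#define MAX_CONST_INDEX 3

/* Flag 0 is reserved to mean "unused" (NIR_INTRINSIC_xx in the patch). */
enum index_flag {
   INDEX_UNUSED = 0,
   INDEX_BASE = 1,
   INDEX_WRMASK = 2,
   NUM_INDEX_FLAGS,
};

struct intrinsic_info {
   /* index_map[flag] holds (slot + 1), or 0 if the intrinsic lacks it. */
   unsigned index_map[NUM_INDEX_FLAGS];
};

struct intrinsic_instr {
   const struct intrinsic_info *info;
   int const_index[MAX_CONST_INDEX];
};

/* Assumed store_output layout: const_index[0] = base, [1] = wrmask. */
static const struct intrinsic_info store_output_info = {
   .index_map = { [INDEX_BASE] = 1, [INDEX_WRMASK] = 2 },
};

/* Getter: look up the slot through the map instead of hard-coding it. */
static int
get_index(const struct intrinsic_instr *instr, enum index_flag flag)
{
   unsigned slot_plus_one = instr->info->index_map[flag];
   assert(slot_plus_one > 0); /* intrinsic must actually have this index */
   return instr->const_index[slot_plus_one - 1];
}

/* Hypothetical setter counterpart. */
static void
set_index(struct intrinsic_instr *instr, enum index_flag flag, int value)
{
   unsigned slot_plus_one = instr->info->index_map[flag];
   assert(slot_plus_one > 0);
   instr->const_index[slot_plus_one - 1] = value;
}
```

The point of the indirection is that callers name what they want (base, wrmask) rather than remembering which const_index[] slot a given intrinsic happens to use.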
Re: [Mesa-dev] [PATCH 1/2] nir: Add a fquantize2f16 opcode
On Wed, Jan 13, 2016 at 2:01 AM, Ian Romanick wrote: > On 01/12/2016 05:41 PM, Matt Turner wrote: > > On Tue, Jan 12, 2016 at 4:10 PM, Jason Ekstrand > wrote: > >> On Tue, Jan 12, 2016 at 3:52 PM, Matt Turner > wrote: > >>> > >>> On Tue, Jan 12, 2016 at 3:35 PM, Jason Ekstrand > >>> wrote: > This opcode simply takes a 32-bit floating-point value and reduces its > effective precision to 16 bits. > --- > >>> > >>> What's it supposed to do for values not representable in > half-precision? > >> > >> > >> If they're in-range, round. If they're out-of-range, the appropriate > >> infinity. > > > > Are you sure that's the behavior hardware has? And by "are you sure" I > > mean "have you tested it" > > > > The conversion table in the f32to16 documentation in the IVB PRM says: > > > > single precision -> half precision > > > > -finite -> -finite/-denorm/-0 > > +finite -> +finite/+denorm/+0 > > > >> > https://www.khronos.org/registry/spir-v/specs/1.0/SPIRV.html#OpQuantizeToF16 > > > >> Quantize a floating-point value to a what is expressible by a 16-bit > floating-point value. > > > > Erf, anyway, > > > > ... and the "convert too-large values to inf" isn't the behavior of > > other languages like C [1] (and I don't think GLSL either, but I can't > > find anything on the matter in the spec) or OpenCL C [2]. > > Some background may either clarify or further muddy things. > > Right now applications sprinkle mediump and lowp all over the place in > GLSL ES shaders. Many vertex shader implementations, even on mobile > devices, do everything in single precision. Many devices will only use > f16 part of the time because some instructions may not have f16 > versions. When we finally implement f16 in the i965 driver, we'll be in > this boat too. > > As a result, people think that their mediump-decorated code is fine... > until it actually runs on a device that really does mediump. Then they > report a bug to the vendor of that hardware. Sound like a familiar > situation? 
> > From this problem the OpQuantizeToF16 SPIR-V instruction was born. The > intention is that people could compile their code in a way that mediump > gives you mediump precision on every device. While you probably > wouldn't want to ship such code, this at least makes it possible to test > it without having to find a device that will really do native mediump > calculations all the time. > > IIRC, GLSL doesn't require Inf in mediump. I don't recall what SPIR-V > says. I believe that GLSL allows saturating to the maximum magnitude > representable value. What we want is for an expression tree like > > OpQuantizeToF16(OpQuantizeToF16(x) + OpQuantizeToF16(y)) > > to produce the same value that 'x + y' would produce in "real" f16 mediump. > Right. This is exactly why the opcode was created. > > The SPIR-V +/-Inf requirement doesn't completely jibe with my > recollection of the discussions... but there was a lot of > back-and-forth, and it was quite a few months ago at this point. I > think we may have picked just one possible answer instead of allowing > both choices just for consistency. I don't have any memory whether > anyone strongly wanted the +/-Inf behavior or if it was just a coin toss. > For OpQuantizeToF16, the spec does currently > > > Section 8.3.2 of the OpenCL C 2.0 spec is also relevant, but doesn't > > touch directly on the issue at hand. > > > > I'm worried that what is specified is not implementable via a round > > trip through half-precision, because it's not the behavior other > > languages implement. > > > > If I had to guess, given the table in the IVB PRM and section 8.3.2, > > out-of-range single-precision floats are converted to the > > half-precision value with the largest magnitude. > > You are correct, we should test it to be sure what the hardware really > does. This is not intended to be a performance operation. If we need to > use a different, more expensive expansion to meet the requirements, we > shouldn't lose any sleep over it. 
I haven't looked at it in bit-for-bit detail, but I did run it through a set of tests which explicitly hit denorms and the out-of-bounds cases in both directions. The tests seem to indicate that the hardware does what the opcode claims. --Jason
[Mesa-dev] [PATCH] ttn: add missing writemask on store_output
From: Rob ClarkSigned-off-by: Rob Clark --- src/gallium/auxiliary/nir/tgsi_to_nir.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/gallium/auxiliary/nir/tgsi_to_nir.c b/src/gallium/auxiliary/nir/tgsi_to_nir.c index 46c9297..e127174 100644 --- a/src/gallium/auxiliary/nir/tgsi_to_nir.c +++ b/src/gallium/auxiliary/nir/tgsi_to_nir.c @@ -1908,6 +1908,7 @@ ttn_add_output_stores(struct ttn_compile *c) store->src[0].reg.reg = c->output_regs[loc].reg; store->src[0].reg.base_offset = c->output_regs[loc].offset; store->const_index[0] = loc; + store->const_index[1] = 0xf; /* writemask */ store->src[1] = nir_src_for_ssa(nir_imm_int(b, 0)); nir_builder_instr_insert(b, >instr); } -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] nir: Add a fquantize2f16 opcode
On Wed, Jan 13, 2016 at 1:46 PM, Jason Ekstrand wrote: > On Wed, Jan 13, 2016 at 2:01 AM, Ian Romanick wrote: >> On 01/12/2016 05:41 PM, Matt Turner wrote: >> > Section 8.3.2 of the OpenCL C 2.0 spec is also relevant, but doesn't >> > touch directly on the issue at hand. >> > >> > I'm worried that what is specified is not implementable via a round >> > trip through half-precision, because it's not the behavior other >> > languages implement. >> > >> > If I had to guess, given the table in the IVB PRM and section 8.3.2, >> > out-of-range single-precision floats are converted to the >> > half-precision value with the largest magnitude. >> >> You are correct, we should test it to be sure what the hardware really >> does. This is not intended to be a performance operation. If we need to >> use a different, more expensive expansion to meet the requirements, we >> shouldn't lose any sleep over it. > > > I haven't looked at it in bit-for-bit detail, but I did run it through a > set of tests which explicitly hits denorms and the out-of-bounds cases in > both directions. The tests seem to indicate that the hardware does what the > opcode claims. I checked out the tests you mention, and none of the cases touch on what I'm saying (and this has nothing to do with denormal values). Let me explain again. The largest representable value in half-precision is 65504 == 2.0**15 * (1.0 + 1023.0 / 2.0**10) and the distance between representable integers at this range is 32. Converting 65505.0f through 65519.0f (i.e., anything less than half an interval above the largest representable value) to half-precision should round to 65504.0. 65520.0f and larger should round to infinity. This is what piglit tests (generated_tests/gen_builtin_packing_tests.py), and since we pass those tests I believe this is what the hardware does. This is, unfortunately, *not* what the documentation you've cited says. I expect that that's an oversight more than intentional behavior. 
Maybe tomorrow we can figure out how to submit changes to the spec and test suite? (And thanks to Chad for writing a significantly better quality test than what I found from Khronos)
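To make the numbers above concrete, here is a minimal software sketch of a float32 to float16 conversion using round-to-nearest-even. It is not the hardware's f32to16 path, the piglit generator, or any Mesa helper; float_to_half_rtne and check_top_of_half_range are names made up for this illustration, and the code assumes IEEE 754 binary32 floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert a float to IEEE 754 binary16 bits, rounding to nearest even. */
static uint16_t
float_to_half_rtne(float f)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof(bits));

   uint32_t sign = (bits >> 16) & 0x8000u;
   uint32_t exp = (bits >> 23) & 0xffu;
   uint32_t mant = bits & 0x7fffffu;

   if (exp == 0xff) /* Inf or NaN: keep NaN-ness with a quiet bit */
      return (uint16_t)(sign | 0x7c00u | (mant ? 0x200u : 0u));

   int32_t e = (int32_t)exp - 127 + 15; /* rebias exponent for half */

   if (e >= 31) /* magnitude of 2**16 or more: always infinity */
      return (uint16_t)(sign | 0x7c00u);

   if (e <= 0) {
      /* Result is a half denormal or zero. */
      if (e < -10)
         return (uint16_t)sign; /* below half the smallest denormal */
      mant |= 0x800000u; /* make the implicit leading 1 explicit */
      uint32_t shift = (uint32_t)(14 - e); /* 14..24 */
      uint32_t half_mant = mant >> shift;
      uint32_t rem = mant & ((1u << shift) - 1u);
      uint32_t halfway = 1u << (shift - 1u);
      if (rem > halfway || (rem == halfway && (half_mant & 1u)))
         half_mant++; /* may carry into the smallest normal, which is fine */
      return (uint16_t)(sign | half_mant);
   }

   /* Normal case: drop 13 mantissa bits and round on the remainder. */
   uint32_t half = ((uint32_t)e << 10) | (mant >> 13);
   uint32_t rem = mant & 0x1fffu;
   if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
      half++; /* a carry out of 0x7bff lands on 0x7c00, i.e. infinity */
   return (uint16_t)(sign | half);
}

/* The cases from the message: 0x7bff is 65504.0, 0x7c00 is +infinity. */
static void
check_top_of_half_range(void)
{
   assert(float_to_half_rtne(65504.0f) == 0x7bff);
   assert(float_to_half_rtne(65519.0f) == 0x7bff); /* rounds down */
   assert(float_to_half_rtne(65520.0f) == 0x7c00); /* tie rounds to inf */
}
```

The interesting case is 65520.0f: it sits exactly halfway between 65504 and the next step (65536, which half-precision cannot represent), and since the mantissa of 65504 is odd, round-to-nearest-even carries into the exponent and yields infinity.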
Re: [Mesa-dev] [PATCH] texobj: Check completeness with InternalFormat rather than Mesa format
On Wed, Jan 13, 2016 at 11:28 AM, Neil Robertswrote: > The internal Mesa format used for a texture might not match the one > requested in the internalFormat when the texture was created, for > example if the driver is internally remapping RGB textures to RGBA. > Otherwise it can cause false positives for completeness if one mipmap > image is created as RGBA and the other as RGB because they would both > have an RGBA Mesa format. If we check the InternalFormat instead then > we are directly checking the API usage which I think better matches > the intention of the check. > > https://bugs.freedesktop.org/show_bug.cgi?id=93700 > --- > src/mesa/main/texobj.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/mesa/main/texobj.c b/src/mesa/main/texobj.c > index 547055e..b107a8f 100644 > --- a/src/mesa/main/texobj.c > +++ b/src/mesa/main/texobj.c > @@ -835,7 +835,7 @@ _mesa_test_texobj_completeness( const struct gl_context > *ctx, >incomplete(t, MIPMAP, "TexImage[%d] is missing", i); >return; > } > - if (img->TexFormat != baseImage->TexFormat) { > + if (img->InternalFormat != baseImage->InternalFormat) { >incomplete(t, MIPMAP, "Format[i] != Format[baseLevel]"); >return; > } > -- > 2.5.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev LGTM. Reviewed-by: Anuj Phogat ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
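The false positive described in the commit message can be shown with a toy model. Everything below is hypothetical (the enums, struct image, and make_image are not Mesa types); it only assumes a driver that stores both RGB and RGBA requests in the same RGBA storage format.

```c
#include <assert.h>
#include <stdbool.h>

enum api_format { API_RGB, API_RGBA }; /* what the application requested */
enum hw_format { HW_RGBA8888 };        /* what the driver actually stores */

struct image {
   enum api_format internal_format;
   enum hw_format tex_format;
};

/* Hypothetical driver that remaps every request to RGBA storage. */
static struct image
make_image(enum api_format requested)
{
   struct image img = { requested, HW_RGBA8888 };
   return img;
}

/* Old check: compares storage formats, so RGB vs RGBA looks consistent. */
static bool
levels_match_by_tex_format(const struct image *a, const struct image *b)
{
   assert(a && b);
   return a->tex_format == b->tex_format;
}

/* New check: compares what the API caller asked for. */
static bool
levels_match_by_internal_format(const struct image *a, const struct image *b)
{
   assert(a && b);
   return a->internal_format == b->internal_format;
}
```

With a base level created as RGBA and another level created as RGB, the first comparison reports a match while the second correctly reports a mismatch.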
[Mesa-dev] [PATCH 6/7] i965/vec4/gs: Stop munging the ATTR containing gl_PointSize.
gl_PointSize is delivered in the .w component of the VUE header, while the language expects it to be a float (and thus in the .x component). Previously, we emitted MOVs to copy it over to the .x component. But this is silly - we can just use a . swizzle and access it without copying anything or clobbering the value stored at .x (which admittedly is useless). Removes the last use of ATTR destinations. Signed-off-by: Kenneth Graunke--- src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp | 4 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 23 --- 2 files changed, 4 insertions(+), 23 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp index 6f66978..90aa54e 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp @@ -73,6 +73,10 @@ vec4_gs_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) src = src_reg(ATTR, BRW_VARYING_SLOT_COUNT * vertex->u[0] + instr->const_index[0] + offset->u[0], type); + /* gl_PointSize is passed in the .w component of the VUE header */ + if (instr->const_index[0] == VARYING_SLOT_PSIZ) + src.swizzle = SWIZZLE_; + dest = get_nir_dest(instr->dest, src.type); dest.writemask = brw_writemask_for_size(instr->num_components); emit(MOV(dest, src)); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp index b13d36e..374b1a7 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp @@ -182,29 +182,6 @@ vec4_gs_visitor::emit_prolog() } } - /* If the geometry shader uses the gl_PointSize input, we need to fix it up -* to account for the fact that the vertex shader stored it in the w -* component of VARYING_SLOT_PSIZ. 
-*/ - if (nir->info.inputs_read & VARYING_BIT_PSIZ) { - this->current_annotation = "swizzle gl_PointSize input"; - for (int vertex = 0; vertex < (int)nir->info.gs.vertices_in; vertex++) { - dst_reg dst(ATTR, - BRW_VARYING_SLOT_COUNT * vertex + VARYING_SLOT_PSIZ); - dst.type = BRW_REGISTER_TYPE_F; - src_reg src(dst); - dst.writemask = WRITEMASK_X; - src.swizzle = BRW_SWIZZLE_; - inst = emit(MOV(dst, src)); - - /* In dual instanced dispatch mode, dst has a width of 4, so we need - * to make sure the MOV happens regardless of which channels are - * enabled. - */ - inst->force_writemask_all = true; - } - } - this->current_annotation = NULL; } -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/7] i965: Apply add_const_offset_to_base for vec4 VS inputs too.
This shouldn't hurt anything, and I'm about to introduce a pass that will want it. Signed-off-by: Kenneth Graunke--- src/mesa/drivers/dri/i965/brw_nir.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c index 55ba732..935529a 100644 --- a/src/mesa/drivers/dri/i965/brw_nir.c +++ b/src/mesa/drivers/dri/i965/brw_nir.c @@ -220,6 +220,11 @@ brw_nir_lower_inputs(nir_shader *nir, */ nir_lower_io(nir, nir_var_shader_in, type_size_vec4); + /* This pass needs actual constants */ + nir_opt_constant_folding(nir); + + add_const_offset_to_base(nir, nir_var_shader_in); + if (is_scalar) { /* Finally, translate VERT_ATTRIB_* values into the actual registers. * @@ -229,11 +234,6 @@ brw_nir_lower_inputs(nir_shader *nir, */ GLbitfield64 inputs_read = nir->info.inputs_read; - /* This pass needs actual constants */ - nir_opt_constant_folding(nir); - - add_const_offset_to_base(nir, nir_var_shader_in); - nir_foreach_function(nir, function) { if (function->impl) { nir_foreach_block(function->impl, remap_vs_attrs, _read); -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/7] nir/builder: Add a nir_build_ivec4() convenience helper.
nir_build_ivec4 is more readable and succinct than using nir_build_imm directly, even if you have C99. Signed-off-by: Kenneth Graunke--- src/glsl/nir/nir_builder.h | 14 ++ 1 file changed, 14 insertions(+) diff --git a/src/glsl/nir/nir_builder.h b/src/glsl/nir/nir_builder.h index cfaaf8e..88ba3a1 100644 --- a/src/glsl/nir/nir_builder.h +++ b/src/glsl/nir/nir_builder.h @@ -121,6 +121,20 @@ nir_imm_int(nir_builder *build, int x) } static inline nir_ssa_def * +nir_imm_ivec4(nir_builder *build, int x, int y, int z, int w) +{ + nir_const_value v; + + memset(, 0, sizeof(v)); + v.i[0] = x; + v.i[1] = y; + v.i[2] = z; + v.i[3] = w; + + return nir_build_imm(build, 4, v); +} + +static inline nir_ssa_def * nir_build_alu(nir_builder *build, nir_op op, nir_ssa_def *src0, nir_ssa_def *src1, nir_ssa_def *src2, nir_ssa_def *src3) { -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 7/7] i965/vec4: Drop support for ATTR as an instruction destination.
This is no longer necessary...and it doesn't make much sense to have inputs as destinations. Signed-off-by: Kenneth Graunke--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 16 1 file changed, 16 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index d2c27ff..4b3f2af 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1522,22 +1522,6 @@ vec4_visitor::lower_attributes_to_hw_regs(const int *attribute_map, bool interleaved) { foreach_block_and_inst(block, vec4_instruction, inst, cfg) { - /* We have to support ATTR as a destination for GL_FIXED fixup. */ - if (inst->dst.file == ATTR) { - int grf = attribute_map[inst->dst.nr + inst->dst.reg_offset]; - - /* All attributes used in the shader need to have been assigned a - * hardware register by the caller - */ - assert(grf != 0); - -struct brw_reg reg = attribute_to_hw_reg(grf, interleaved); -reg.type = inst->dst.type; -reg.writemask = inst->dst.writemask; - - inst->dst = reg; - } - for (int i = 0; i < 3; i++) { if (inst->src[i].file != ATTR) continue; -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/7] i965: Make add_const_offset_to_base() work at the shader level.
This makes it a pass, hiding the parameter structs and block callbacks so it's simpler to work with. Signed-off-by: Kenneth Graunke--- src/mesa/drivers/dri/i965/brw_nir.c | 38 - 1 file changed, 21 insertions(+), 17 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c index f8b258b..55ba732 100644 --- a/src/mesa/drivers/dri/i965/brw_nir.c +++ b/src/mesa/drivers/dri/i965/brw_nir.c @@ -60,7 +60,7 @@ struct add_const_offset_to_base_params { }; static bool -add_const_offset_to_base(nir_block *block, void *closure) +add_const_offset_to_base_block(nir_block *block, void *closure) { struct add_const_offset_to_base_params *params = closure; nir_builder *b = >b; @@ -85,7 +85,19 @@ add_const_offset_to_base(nir_block *block, void *closure) } } return true; +} + +static void +add_const_offset_to_base(nir_shader *nir, nir_variable_mode mode) +{ + struct add_const_offset_to_base_params params = { .mode = mode }; + nir_foreach_function(nir, f) { + if (f->impl) { + nir_builder_init(, f->impl); + nir_foreach_block(f->impl, add_const_offset_to_base_block, ); + } + } } static bool @@ -195,10 +207,6 @@ brw_nir_lower_inputs(nir_shader *nir, const struct brw_device_info *devinfo, bool is_scalar) { - struct add_const_offset_to_base_params params = { - .mode = nir_var_shader_in - }; - switch (nir->stage) { case MESA_SHADER_VERTEX: /* Start with the location of the variable's base. 
*/ @@ -224,10 +232,10 @@ brw_nir_lower_inputs(nir_shader *nir, /* This pass needs actual constants */ nir_opt_constant_folding(nir); + add_const_offset_to_base(nir, nir_var_shader_in); + nir_foreach_function(nir, function) { if (function->impl) { - nir_builder_init(, function->impl); - nir_foreach_block(function->impl, add_const_offset_to_base, ); nir_foreach_block(function->impl, remap_vs_attrs, _read); } } @@ -270,10 +278,10 @@ brw_nir_lower_inputs(nir_shader *nir, /* This pass needs actual constants */ nir_opt_constant_folding(nir); + add_const_offset_to_base(nir, nir_var_shader_in); + nir_foreach_function(nir, function) { if (function->impl) { - nir_builder_init(, function->impl); - nir_foreach_block(function->impl, add_const_offset_to_base, ); nir_foreach_block(function->impl, remap_inputs_with_vue_map, _vue_map); } @@ -296,10 +304,10 @@ brw_nir_lower_inputs(nir_shader *nir, /* This pass needs actual constants */ nir_opt_constant_folding(nir); + add_const_offset_to_base(nir, nir_var_shader_in); + nir_foreach_function(nir, function) { if (function->impl) { -nir_builder_init(, function->impl); -nir_foreach_block(function->impl, add_const_offset_to_base, ); nir_builder_init(, function->impl); nir_foreach_block(function->impl, remap_patch_urb_offsets, ); } @@ -339,10 +347,6 @@ brw_nir_lower_outputs(nir_shader *nir, } break; case MESA_SHADER_TESS_CTRL: { - struct add_const_offset_to_base_params params = { - .mode = nir_var_shader_out - }; - struct remap_patch_urb_offsets_state state; brw_compute_tess_vue_map(_map, nir->info.outputs_written, nir->info.patch_outputs_written); @@ -356,10 +360,10 @@ brw_nir_lower_outputs(nir_shader *nir, /* This pass needs actual constants */ nir_opt_constant_folding(nir); + add_const_offset_to_base(nir, nir_var_shader_out); + nir_foreach_function(nir, function) { if (function->impl) { -nir_builder_init(, function->impl); -nir_foreach_block(function->impl, add_const_offset_to_base, ); nir_builder_init(, function->impl); 
nir_foreach_block(function->impl, remap_patch_urb_offsets, ); } -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/7] i965: Apply VS attribute workarounds in NIR.
This patch re-implements the pre-Haswell VS attribute workarounds. Instead of emitting shader code in the vec4 backend, we now simply call a NIR pass to emit the necessary code. This simplifies the vec4 backend. Beyond deleting code, it removes the primary use of ATTR as a destination. It also eliminates the requirement that the vec4 VS backend express the ATTR file in terms of VERT_ATTRIB_* locations, giving us a bit more flexibility. This approach is a little different: rather than munging the attributes at the top, we emit code to fix them up when they're accessed. However, we run the optimizer afterwards, so CSE should eliminate the redundant math. It may even be able to fuse it with other calculations based on the input value. shader-db does not handle non-default NOS settings, so I have no statistics about this patch. Note that the scalar backend does not implement VS attribute workarounds, as they are unnecessary on hardware which allows SIMD8 VS. Signed-off-by: Kenneth Graunke--- src/mesa/drivers/dri/i965/Makefile.sources | 1 + src/mesa/drivers/dri/i965/brw_nir.c| 19 ++- src/mesa/drivers/dri/i965/brw_nir.h| 7 +- .../dri/i965/brw_nir_attribute_workarounds.c | 178 + src/mesa/drivers/dri/i965/brw_shader.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 + src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp | 109 - 8 files changed, 204 insertions(+), 117 deletions(-) create mode 100644 src/mesa/drivers/dri/i965/brw_nir_attribute_workarounds.c diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources index 5aeeca5..c654f94 100644 --- a/src/mesa/drivers/dri/i965/Makefile.sources +++ b/src/mesa/drivers/dri/i965/Makefile.sources @@ -42,6 +42,7 @@ i965_compiler_FILES = \ brw_nir.h \ brw_nir.c \ brw_nir_analyze_boolean_resolves.c \ + brw_nir_attribute_workarounds.c \ brw_nir_opt_peephole_ffma.c \ brw_nir_uniforms.cpp \ brw_packed_float.c \ diff --git 
a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c index 935529a..cdecc3d 100644 --- a/src/mesa/drivers/dri/i965/brw_nir.c +++ b/src/mesa/drivers/dri/i965/brw_nir.c @@ -205,7 +205,9 @@ remap_patch_urb_offsets(nir_block *block, void *closure) static void brw_nir_lower_inputs(nir_shader *nir, const struct brw_device_info *devinfo, - bool is_scalar) + bool is_scalar, + bool use_legacy_snorm_formula, + const uint8_t *vs_attrib_wa_flags) { switch (nir->stage) { case MESA_SHADER_VERTEX: @@ -225,6 +227,9 @@ brw_nir_lower_inputs(nir_shader *nir, add_const_offset_to_base(nir, nir_var_shader_in); + brw_nir_apply_attribute_workarounds(nir, use_legacy_snorm_formula, + vs_attrib_wa_flags); + if (is_scalar) { /* Finally, translate VERT_ATTRIB_* values into the actual registers. * @@ -497,12 +502,15 @@ brw_preprocess_nir(nir_shader *nir, bool is_scalar) nir_shader * brw_nir_lower_io(nir_shader *nir, const struct brw_device_info *devinfo, - bool is_scalar) + bool is_scalar, + bool use_legacy_snorm_formula, + const uint8_t *vs_attrib_wa_flags) { bool progress; /* Written by OPT and OPT_V */ (void)progress; - OPT_V(brw_nir_lower_inputs, devinfo, is_scalar); + OPT_V(brw_nir_lower_inputs, devinfo, is_scalar, + use_legacy_snorm_formula, vs_attrib_wa_flags); OPT_V(brw_nir_lower_outputs, devinfo, is_scalar); OPT_V(nir_lower_io, nir_var_all, is_scalar ? 
type_size_scalar : type_size_vec4); @@ -613,9 +621,10 @@ brw_create_nir(struct brw_context *brw, OPT_V(nir_lower_atomics, shader_prog); } - if (nir->stage != MESA_SHADER_TESS_CTRL && + if (nir->stage != MESA_SHADER_VERTEX && + nir->stage != MESA_SHADER_TESS_CTRL && nir->stage != MESA_SHADER_TESS_EVAL) { - nir = brw_nir_lower_io(nir, devinfo, is_scalar); + nir = brw_nir_lower_io(nir, devinfo, is_scalar, false, NULL); } return nir; diff --git a/src/mesa/drivers/dri/i965/brw_nir.h b/src/mesa/drivers/dri/i965/brw_nir.h index 78b139b..5bfe40f 100644 --- a/src/mesa/drivers/dri/i965/brw_nir.h +++ b/src/mesa/drivers/dri/i965/brw_nir.h @@ -84,11 +84,16 @@ nir_shader *brw_create_nir(struct brw_context *brw, nir_shader *brw_preprocess_nir(nir_shader *nir, bool is_scalar); nir_shader *brw_nir_lower_io(nir_shader *nir, const struct brw_device_info *devinfo, -bool is_scalar); +bool is_scalar, +bool use_legacy_snorm_formula, +
[Mesa-dev] [PATCH 2/7] i965: Make an is_scalar boolean in brw_compile_vs().
Shorter than compiler->scalar_stage[MESA_SHADER_VERTEX], which can help with line-wrapping. Signed-off-by: Kenneth Graunke--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index c6a52c5..ca27066 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1988,11 +1988,11 @@ brw_compile_vs(const struct brw_compiler *compiler, void *log_data, unsigned *final_assembly_size, char **error_str) { + const bool is_scalar = compiler->scalar_stage[MESA_SHADER_VERTEX]; nir_shader *shader = nir_shader_clone(mem_ctx, src_shader); shader = brw_nir_apply_sampler_key(shader, compiler->devinfo, >tex, - compiler->scalar_stage[MESA_SHADER_VERTEX]); - shader = brw_postprocess_nir(shader, compiler->devinfo, -compiler->scalar_stage[MESA_SHADER_VERTEX]); + is_scalar); + shader = brw_postprocess_nir(shader, compiler->devinfo, is_scalar); const unsigned *assembly = NULL; @@ -2018,7 +2018,7 @@ brw_compile_vs(const struct brw_compiler *compiler, void *log_data, * Read Length" as 1 in vec4 mode, and 0 in SIMD8 mode. Empirically, in * vec4 mode, the hardware appears to wedge unless we read something. */ - if (compiler->scalar_stage[MESA_SHADER_VERTEX]) + if (is_scalar) prog_data->base.urb_read_length = DIV_ROUND_UP(nr_attributes, 2); else prog_data->base.urb_read_length = DIV_ROUND_UP(MAX2(nr_attributes, 1), 2); @@ -2037,7 +2037,7 @@ brw_compile_vs(const struct brw_compiler *compiler, void *log_data, else prog_data->base.urb_entry_size = DIV_ROUND_UP(vue_entries, 4); - if (compiler->scalar_stage[MESA_SHADER_VERTEX]) { + if (is_scalar) { prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8; fs_visitor v(compiler, log_data, mem_ctx, key, _data->base.base, -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] nir: Handle <bits>=32 case in bitfield_insert lowering.
Both are

Reviewed-by: Connor Abbott

On Wed, Jan 13, 2016 at 2:25 PM, Matt Turner wrote:
> The OpenGL specifications for bitfieldInsert() says:
>
>    The result will be undefined if <offset> or <bits> is negative, or if
>    the sum of <offset> and <bits> is greater than the number of bits
>    used to store the operand.
>
> Therefore passing bits=32, offset=0 is legal and defined in GLSL.
>
> But the earlier SM5 bfi opcode is specified to accept a bitfield width
> ranging from 0-31. As such, Intel and AMD instructions read only the low
> 5 bits of the width operand, making them not able to implement the
> GLSL-specified behavior directly.
>
> This commit fixes the lowering of bitfield_insert to handle the trivial
> case of <bits> = 32 as
>
>    bitfieldInsert:
>       bits > 31 ? insert : bfi(bfm(bits, offset), insert, base)
>
> Fixes:
>    ES31-CTS.shader_bitfield_operation.bitfieldInsert.uint_2
>    ES31-CTS.shader_bitfield_operation.bitfieldInsert.uvec4_3
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
> ---
> These two patches replace 8/9 and 9/9 of the previous series.
> The first 7 patches from it have been reviewed and committed.
>
>  src/glsl/nir/nir_opcodes.py       | 1 +
>  src/glsl/nir/nir_opt_algebraic.py | 6 +-
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py
> index 1c65def..3e43438 100644
> --- a/src/glsl/nir/nir_opcodes.py
> +++ b/src/glsl/nir/nir_opcodes.py
> @@ -558,6 +558,7 @@ triop("fcsel", tfloat, "(src0 != 0.0f) ? src1 : src2")
>  opcode("bcsel", 0, tuint, [0, 0, 0],
>         [tbool, tuint, tuint], "", "src0 ? src1 : src2")
>
> +# SM5 bfi assembly
>  triop("bfi", tuint, """
>  unsigned mask = src0, insert = src1, base = src2;
>  if (mask == 0) {
> diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py
> index 1eb044a..0d31e39 100644
> --- a/src/glsl/nir/nir_opt_algebraic.py
> +++ b/src/glsl/nir/nir_opt_algebraic.py
> @@ -225,9 +225,13 @@ optimizations = [
>
>     # Misc. lowering
>     (('fmod', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b)))), 'options->lower_fmod'),
> -   (('bitfield_insert', a, b, c, d), ('bfi', ('bfm', d, c), b, a), 'options->lower_bitfield_insert'),
>     (('uadd_carry', a, b), ('b2i', ('ult', ('iadd', a, b), a)), 'options->lower_uadd_carry'),
>     (('usub_borrow', a, b), ('b2i', ('ult', a, b)), 'options->lower_usub_borrow'),
> +
> +   (('bitfield_insert', 'base', 'insert', 'offset', 'bits'),
> +    ('bcsel', ('ilt', 31, 'bits'), 'insert',
> +              ('bfi', ('bfm', 'bits', 'offset'), 'insert', 'base')),
> +    'options->lower_bitfield_insert'),
>  ]
>
>  # Add optimizations to handle the case where the result of a ternary is
> --
> 2.4.9
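To make the bits=32 corner case from the commit message concrete, here is a simplified Python model of the old and new lowerings. This is a sketch under stated assumptions rather than the NIR implementation: bfm and bfi below are my own approximations of opcodes that read only the low 5 bits of their width and offset operands, and the two bitfield_insert_* helper names are invented for the comparison.

```python
MASK32 = 0xFFFFFFFF


def bfm(bits, offset):
    """Build a mask of `bits` ones starting at `offset`.

    Only the low 5 bits of each operand are read, as the commit message
    says of the hardware instructions.
    """
    bits &= 31
    offset &= 31
    return (((1 << bits) - 1) << offset) & MASK32


def bfi(mask, insert, base):
    """Insert `insert` into `base` at the position described by `mask`."""
    if mask == 0:
        return base
    # Shift `insert` up to the lowest set bit of the mask.
    shift = (mask & -mask).bit_length() - 1
    return ((base & ~mask) | ((insert << shift) & mask)) & MASK32


def bitfield_insert_old(base, insert, offset, bits):
    # Old lowering: bits=32 wraps to 0, the mask is empty, base is returned.
    return bfi(bfm(bits, offset), insert, base)


def bitfield_insert_new(base, insert, offset, bits):
    # New lowering: bits > 31 ? insert : bfi(bfm(bits, offset), insert, base)
    return insert if bits > 31 else bfi(bfm(bits, offset), insert, base)
```

In this model, calling the old lowering with bits=32 and offset=0 returns base unchanged, while the new one returns insert, which is what GLSL requires for a full-width insert.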
Re: [Mesa-dev] [RFC] nir: const_index sanity
On Jan 13, 2016 4:03 PM, "Rob Clark" wrote:
>
> From: Rob Clark
>
> ---
> An idea for how to bring some sanity to the wild-west of intrinsic
> const_index[] usage. Also w/ nir_print support, which could be
> split into other patch, but makes the nir_print output a bit nicer:
>
>   intrinsic store_output (ssa_210, ssa_66) () (0, 15) /* base=0 wrmask=xyzw */
>
> (and already made me realize that ttn was neglecting to set wrmask on
> store_output's)
>
> Probably I'd add "setter" functions too, and then in follow-on patches,
> update the gazillion places where const_index[] access is open-coded.
>
> But first, before big conflicty changes like that, I figured I see what
> others thought. The other variation of the idea is to simply drop the
> const_index[] field and replace w/ 'unsigned wrmask' and 'int base'.
> Although that would be a bigger more flag-day sort of patch.

We really need to do something here and what you've done is a pretty
clever way to handle the problem. I'll have to give it a bit more
thought before I'll whole-heartedly endorse it, but a first brush looks
pretty good. A few minor comments below.

> BR,
> -R
>
>  src/glsl/nir/nir.h            |  48 +++-
>  src/glsl/nir/nir_intrinsics.c |  11 ++-
>  src/glsl/nir/nir_intrinsics.h | 178 +-
>  src/glsl/nir/nir_print.c      |  30 ---
>  4 files changed, 166 insertions(+), 101 deletions(-)
>
> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> index bedcc0d..2235154 100644
> --- a/src/glsl/nir/nir.h
> +++ b/src/glsl/nir/nir.h
> @@ -786,7 +786,7 @@ typedef struct {
>  } nir_call_instr;
>
>  #define INTRINSIC(name, num_srcs, src_components, has_dest, dest_components, \
> -                  num_variables, num_indices, flags) \
> +                  num_variables, num_indices, idx0, idx1, idx2, flags) \
>     nir_intrinsic_##name,
>
>  #define LAST_INTRINSIC(name) nir_last_intrinsic = nir_intrinsic_##name,
> @@ -799,6 +799,8 @@ typedef enum {
>  #undef INTRINSIC
>  #undef LAST_INTRINSIC
>
> +#define NIR_INTRINSIC_MAX_CONST_INDEX 3
> +
>  /** Represents an intrinsic
>   *
>   * An intrinsic is an instruction type for handling things that are
> @@ -842,7 +844,7 @@ typedef struct {
>      */
>     uint8_t num_components;
>
> -   int const_index[3];
> +   int const_index[NIR_INTRINSIC_MAX_CONST_INDEX];
>
>     nir_deref_var *variables[2];
>
> @@ -871,6 +873,29 @@ typedef enum {
>     NIR_INTRINSIC_CAN_REORDER = (1 << 1),
>  } nir_intrinsic_semantic_flag;
>
> +/**
> + * \name NIR intrinsics const-index flag
> + *
> + * Indicates the usage of a const_index slot.
> + *
> + * \sa nir_intrinsic_info::index_map
> + */
> +typedef enum {
> +   /**
> +    * Generally instructions that take a offset src argument, can encode
> +    * a constant 'base' value which is added to the offset.
> +    */
> +   NIR_INTRINSIC_BASE = 1,
> +
> +   /**
> +    * For store instructions, a writemask for the store.
> +    */
> +   NIR_INTRINSIC_WRMASK = 2,
> +
> +   NIR_INTRINSIC_NUM_INDEX_FLAGS,
> +
> +} nir_intrinsic_index_flag;
> +
>  #define NIR_INTRINSIC_MAX_INPUTS 4
>
>  typedef struct {
> @@ -900,12 +925,31 @@ typedef struct {
>     /** the number of constant indices used by the intrinsic */
>     unsigned num_indices;
>
> +   /** indicates the usage of intr->const_index[n] */
> +   unsigned index_map[NIR_INTRINSIC_NUM_INDEX_FLAGS];
> +
>     /** semantic flags for calls to this intrinsic */
>     nir_intrinsic_semantic_flag flags;
>  } nir_intrinsic_info;
>
>  extern const nir_intrinsic_info nir_intrinsic_infos[nir_num_intrinsics];
>
> +static inline unsigned
> +nir_intrinsic_write_mask(nir_intrinsic_instr *instr)
> +{
> +   const nir_intrinsic_info *info = &nir_intrinsic_infos[instr->intrinsic];
> +   assert(info->index_map[NIR_INTRINSIC_WRMASK] > 0);
> +   return instr->const_index[info->index_map[NIR_INTRINSIC_WRMASK] - 1];
> +}
> +
> +static inline int
> +nir_intrinsic_base(nir_intrinsic_instr *instr)
> +{
> +   const nir_intrinsic_info *info = &nir_intrinsic_infos[instr->intrinsic];
> +   assert(info->index_map[NIR_INTRINSIC_BASE] > 0);
> +   return instr->const_index[info->index_map[NIR_INTRINSIC_BASE] - 1];
> +}
> +
>  /**
>   * \group texture information
>   *
> diff --git a/src/glsl/nir/nir_intrinsics.c b/src/glsl/nir/nir_intrinsics.c
> index a7c868c..7dddc70 100644
> --- a/src/glsl/nir/nir_intrinsics.c
> +++ b/src/glsl/nir/nir_intrinsics.c
> @@ -30,7 +30,8 @@
>  #define OPCODE(name) nir_intrinsic_##name
>
>  #define INTRINSIC(_name, _num_srcs, _src_components, _has_dest, \
> -                  _dest_components, _num_variables, _num_indices, _flags) \
> +                  _dest_components, _num_variables, _num_indices, \
> +                  idx0, idx1, idx2, _flags) \
>  { \
>     .name = #_name, \
>     .num_srcs = _num_srcs, \
> @@ -39,9 +40,15 @@
>     .dest_components = _dest_components, \
>     .num_variables = _num_variables, \
>     .num_indices =
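The index_map scheme in the quoted nir.h hunk can be modeled in a few lines of Python. This is a sketch of the idea as I read it from the RFC, not the actual NIR code: the IntrinsicInfo class and the get_index helper are invented here, and the store_output layout is inferred from the "(0, 15) /* base=0 wrmask=xyzw */" example in the cover letter.

```python
NIR_INTRINSIC_BASE = 1
NIR_INTRINSIC_WRMASK = 2
NIR_INTRINSIC_NUM_INDEX_FLAGS = 3


class IntrinsicInfo:
    """Per-intrinsic table: index_map[flag] holds slot + 1, or 0 if unused."""

    def __init__(self, name, index_map):
        assert len(index_map) == NIR_INTRINSIC_NUM_INDEX_FLAGS
        self.name = name
        self.index_map = index_map


def get_index(info, const_index, flag):
    """Look up a named value in const_index[] via the info table."""
    slot_plus_one = info.index_map[flag]
    # Mirrors the assert in the accessors: the intrinsic must use this flag.
    assert slot_plus_one > 0, f"{info.name} has no index for flag {flag}"
    return const_index[slot_plus_one - 1]


# Assumed layout for store_output: base in slot 0, wrmask in slot 1.
store_output = IntrinsicInfo("store_output", [0, 1, 2])
const_index = [0, 15, 0]  # base=0, wrmask=0xf (xyzw)

base = get_index(store_output, const_index, NIR_INTRINSIC_BASE)
wrmask = get_index(store_output, const_index, NIR_INTRINSIC_WRMASK)
```

Storing slot + 1 lets a zero-initialized table mean "this intrinsic has no such index", which is why the accessors subtract 1 after the assert.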
Re: [Mesa-dev] [PATCH] radeonsi: don't print a warning for unhandled registers returned by LLVM
On 13.01.2016 20:23, Marek Olšák wrote:
> On Wed, Jan 13, 2016 at 4:25 AM, Michel Dänzer wrote:
>> On 13.01.2016 03:44, Marek Olšák wrote:
>>> From: Marek Olšák
>>>
>>> We don't want apps to flood stderr. New LLVM + old Mesa is a perfectly
>>> valid combination (if it doesn't fail to build, of course).
>>
>> Actually it's not, in general.
>
> Why not?

LLVM (at least outside of the C APIs it exposes) only guarantees
compatibility between minor releases of the same major release branch.
So, using Mesa with an SVN snapshot or major release of LLVM which is
newer than the Git snapshot or release of Mesa isn't guaranteed to work,
even if it happens to build.

I think this message should still be printed at least once, as an
indication that something might be wrong with the setup.

--
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
[Mesa-dev] [PATCH] nv50/ir: only use FILE_LOCAL_MEMORY for temp arrays that use indirection
Previously we were treating any indirect temp array usage to mean that
everything should end up in lmem. The MemoryOpt pass would clean a lot
of that up later, but in the meanwhile we would lose a lot of
opportunity for optimization.

This helps a lot of Metro 2033 Redux and a handful of KSP shaders:

total instructions in shared programs : 6288373 -> 6261517 (-0.43%)
total gprs used in shared programs    : 944051 -> 945131 (0.11%)
total local used in shared programs   : 54116 -> 54116 (0.00%)
total bytes used in shared programs   : 50306984 -> 50092136 (-0.43%)

A typical case is for register usage to double and for instructions to
halve. A future commit can also optimize local memory usage size to be
reduced with better packing.

Signed-off-by: Ilia Mirkin
---
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 65 +-
 1 file changed, 50 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 7e3b093..507749d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -96,6 +96,13 @@ public:
       return tgsi_util_get_src_register_swizzle(&reg, chan);
    }

+   int getArrayId() const
+   {
+      if (isIndirect(0))
+         return fsr->Indirect.ArrayID;
+      return 0;
+   }
+
    nv50_ir::Modifier getMod(int chan) const;

    SrcRegister getIndirect(int dim) const
@@ -155,6 +162,13 @@ public:
       return SrcRegister(fdr->Indirect);
    }

+   int getArrayId() const
+   {
+      if (isIndirect(0))
+         return fdr->Indirect.ArrayID;
+      return 0;
+   }
+
 private:
    const struct tgsi_dst_register reg;
    const struct tgsi_full_dst_register *fdr;
@@ -826,7 +840,8 @@ public:
    // these registers are per-subroutine, cannot be used for parameter passing
    std::set<int> locals;

-   bool mainTempsInLMem;
+   std::set<int> indirectTempArrays;
+   std::vector<int> tempArrayId;

    int clipVertexOutput;

@@ -859,8 +874,6 @@ Source::Source(struct nv50_ir_prog_info *prog) : info(prog)

    if (prog->dbgFlags & NV50_IR_DEBUG_BASIC)
       tgsi_dump(tokens, 0);
-
-   mainTempsInLMem = false;
 }

 Source::~Source()
@@ -890,6 +903,7 @@ bool Source::scanSource()

    textureViews.resize(scan.file_max[TGSI_FILE_SAMPLER_VIEW] + 1);
    resources.resize(scan.file_max[TGSI_FILE_IMAGE] + 1);
+   tempArrayId.resize(scan.file_max[TGSI_FILE_TEMPORARY] + 1);

    info->immd.bufSize = 0;

@@ -935,7 +949,8 @@ bool Source::scanSource()
    }
    tgsi_parse_free(&parse);

-   if (mainTempsInLMem)
+   // TODO: Compute based on relevant array sizes
+   if (indirectTempArrays.size())
       info->bin.tlsSpace += (scan.file_max[TGSI_FILE_TEMPORARY] + 1) * 16;

    if (info->io.genUserClip > 0) {
@@ -1046,6 +1061,7 @@ bool Source::scanDeclaration(const struct tgsi_full_declaration *decl)
    unsigned sn = TGSI_SEMANTIC_GENERIC;
    unsigned si = 0;
    const unsigned first = decl->Range.First, last = decl->Range.Last;
+   const int arrayId = decl->Array.ArrayID;

    if (decl->Declaration.Semantic) {
       sn = decl->Semantic.Name;
@@ -1189,8 +1205,11 @@ bool Source::scanDeclaration(const struct tgsi_full_declaration *decl)
       for (i = first; i <= last; ++i)
          textureViews[i].target = decl->SamplerView.Resource;
       break;
-   case TGSI_FILE_NULL:
    case TGSI_FILE_TEMPORARY:
+      for (i = first; i <= last; ++i)
+         tempArrayId[i] = arrayId;
+      break;
+   case TGSI_FILE_NULL:
    case TGSI_FILE_ADDRESS:
    case TGSI_FILE_CONSTANT:
    case TGSI_FILE_IMMEDIATE:
@@ -1241,7 +1260,7 @@ bool Source::scanInstruction(const struct tgsi_full_instruction *inst)
    } else
    if (insn.getDst(0).getFile() == TGSI_FILE_TEMPORARY) {
       if (insn.getDst(0).isIndirect(0))
-         mainTempsInLMem = true;
+         indirectTempArrays.insert(insn.getDst(0).getArrayId());
    } else
    if (insn.getDst(0).getFile() == TGSI_FILE_BUFFER) {
       info->io.globalAccess |= 0x2;
@@ -1252,7 +1271,7 @@ bool Source::scanInstruction(const struct tgsi_full_instruction *inst)
       Instruction::SrcRegister src = insn.getSrc(s);
       if (src.getFile() == TGSI_FILE_TEMPORARY) {
          if (src.isIndirect(0))
-            mainTempsInLMem = true;
+            indirectTempArrays.insert(src.getArrayId());
       } else
       if (src.getFile() == TGSI_FILE_BUFFER) {
          info->io.globalAccess |= (insn.getOpcode() == TGSI_OPCODE_LOAD) ?
@@ -1434,6 +1453,7 @@ private:
    DataType srcTy;

    DataArray tData; // TGSI_FILE_TEMPORARY
+   DataArray lData; // TGSI_FILE_TEMPORARY, for indirect arrays
    DataArray aData; // TGSI_FILE_ADDRESS
    DataArray pData; // TGSI_FILE_PREDICATE
    DataArray oData; // TGSI_FILE_OUTPUT (if outputs in registers)
@@ -1637,7 +1657,7 @@
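The bookkeeping this patch introduces (a per-register array ID plus a set of array IDs that are accessed indirectly) can be sketched in Python as follows. This is an illustrative model only. The function names are invented, and the final membership test is my assumption about how the part of the patch that is cut off above decides between tData and lData.

```python
def build_temp_array_ids(num_temps, declarations):
    """Record the array ID of every temporary register.

    declarations is a list of (first, last, array_id) tuples, loosely
    modeled on the TGSI_FILE_TEMPORARY case in scanDeclaration().
    Registers outside any declared array keep ID 0.
    """
    ids = [0] * num_temps
    for first, last, array_id in declarations:
        for i in range(first, last + 1):
            ids[i] = array_id
    return ids


def collect_indirect_arrays(accesses):
    """Collect the IDs of arrays that are accessed indirectly.

    accesses is a list of (is_indirect, array_id) pairs, one per
    temporary operand seen while scanning instructions.
    """
    return {array_id for is_indirect, array_id in accesses if is_indirect}


def temps_in_local_memory(temp_array_id, indirect_arrays):
    """Assumed placement rule: a register lives in local memory only if
    its array is one of the indirectly accessed ones."""
    return [i for i, aid in enumerate(temp_array_id) if aid in indirect_arrays]


ids = build_temp_array_ids(9, [(0, 3, 1), (4, 7, 2)])
indirect = collect_indirect_arrays([(False, 0), (True, 2), (False, 0)])
lmem_regs = temps_in_local_memory(ids, indirect)
```

In this example only registers 4 through 7 (array 2) end up in local memory, whereas the old mainTempsInLMem flag would have forced all nine temporaries there.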
Re: [Mesa-dev] [PATCH] radeonsi: enable late VS export memory allocation
>
> Axel Davy benchmarked this briefly. We may need more benchmarks though.
>
> Marek
>

I confirm setting this register helps get a few % with heaven.

There was also another register to kill color exports early when doing
depth only pass that helped a few % (but less).

Axel
[Mesa-dev] [Bug 93686] Performance improvement: Please consider hardware ɢᴘᴜ rendering in llvmpipe
https://bugs.freedesktop.org/show_bug.cgi?id=93686

--- Comment #2 from Roland Scheidegger ---
I'm not sure if this exact same proposal really came up already. We have
seen some though asking if we couldn't combine llvmpipe with less
capable gpus to make a driver offering more features, that is executing
the stuff the gpu can't do with llvmpipe (but no, we really can't in any
meaningful way). This proposal sounds even more ambitious in some ways,
I certainly agree we can't make it happen.

With Vulkan, it may be the developer's choice if multiple gpus are
available which one to use for what, so theoretically there might be
some way there to make something like that happen, but I've no idea
there really (plus, unless you're looking at something like at least 5
year old low-end gpu vs. 8-core current high-end cpu, there'd still be
no benefits even if that could be made to work).

There is one thing llvmpipe is "reasonably good" at compared to gpus,
which is shader arithmetic (at least for pixel shaders, not running in
parallel for vertex ones, with tons of gotchas as we don't currently
even optimize empty branches away), but there's just no way to separate
that.

--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
[Mesa-dev] [PATCH] android: enable building static version of libdrm
From: Sumit Semwal

Android needs libdrm built statically for recovery; enable that as well.

Signed-off-by: Sumit Semwal
Signed-off-by: Rob Herring
Cc: Chih-Wei Huang
Cc: Emil Velikov
---
 Android.mk | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/Android.mk b/Android.mk
index 90cdcb3..1d8cd65 100644
--- a/Android.mk
+++ b/Android.mk
@@ -27,6 +27,8 @@ include $(CLEAR_VARS)
 # Import variables LIBDRM_{,H_,INCLUDE_H_,INCLUDE_VMWGFX_H_}FILES
 include $(LOCAL_PATH)/Makefile.sources

+#static library for the device (recovery)
+include $(CLEAR_VARS)
 LOCAL_MODULE := libdrm
 LOCAL_MODULE_TAGS := optional

@@ -41,7 +43,24 @@ LOCAL_C_INCLUDES := \
 LOCAL_CFLAGS := \
         -DHAVE_VISIBILITY=1 \
         -DHAVE_LIBDRM_ATOMIC_PRIMITIVES=1
+include $(BUILD_STATIC_LIBRARY)
+
+# Shared library for the device
+include $(CLEAR_VARS)
+LOCAL_MODULE := libdrm
+LOCAL_MODULE_TAGS := optional
+LOCAL_SRC_FILES := $(LIBDRM_FILES)
+LOCAL_EXPORT_C_INCLUDE_DIRS := \
+        $(LOCAL_PATH) \
+        $(LOCAL_PATH)/include/drm
+
+LOCAL_C_INCLUDES := \
+        $(LOCAL_PATH)/include/drm
+
+LOCAL_CFLAGS := \
+        -DHAVE_VISIBILITY=1 \
+        -DHAVE_LIBDRM_ATOMIC_PRIMITIVES=1

 include $(BUILD_SHARED_LIBRARY)

 include $(call all-makefiles-under,$(LOCAL_PATH))
--
2.5.0
Re: [Mesa-dev] [PATCH] gallivm: merge two identical LLVM version checks
On 13.01.2016 at 05:41, Evangelos Foutras wrote:
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> index 3ee708f..b119a93 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> @@ -515,12 +515,9 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>     MAttrs.push_back(util_cpu_caps.has_ssse3 ? "+ssse3" : "-ssse3" );
>  #if HAVE_LLVM >= 0x0304
>     MAttrs.push_back(util_cpu_caps.has_sse4_1 ? "+sse4.1" : "-sse4.1");
> -#else
> -   MAttrs.push_back(util_cpu_caps.has_sse4_1 ? "+sse41" : "-sse41" );
> -#endif
> -#if HAVE_LLVM >= 0x0304
>     MAttrs.push_back(util_cpu_caps.has_sse4_2 ? "+sse4.2" : "-sse4.2");
>  #else
> +   MAttrs.push_back(util_cpu_caps.has_sse4_1 ? "+sse41" : "-sse41" );
>     MAttrs.push_back(util_cpu_caps.has_sse4_2 ? "+sse42" : "-sse42" );
>  #endif
>     /*
>

Reviewed-by: Roland Scheidegger
Re: [Mesa-dev] Where to find MAPI_TABLE_NUM_STATIC & MAPI_TABLE_NUM_DYNAMIC
Looks like both symbols are defined in
src/mapi/shared-glapi/glapi_mapi_tmp.h which is generated at compile
time by the src/mapi/mapi_abi.py script.

-Brian

On 01/13/2016 08:40 AM, Jouk Jansen wrote:
> Hi all,
>
> I'm trying to create a fresh compilation for my OpenVMS system using
> the sources I extracted using git today. At some point the compilation
> fails because MAPI_TABLE_NUM_STATIC and MAPI_TABLE_NUM_DYNAMIC are not
> defined.
>
> In a version compiled sometime ago I found the definitions in the file
> mapi/vgapi/vgapi_tmp.h, a file generated during the compilation.
> However that directory is now obsolete. It seems that
> mapi/glapi/glapi_mapi_tmp.h is the replacement, but that file contains
> neither MAPI_TABLE_NUM_STATIC nor MAPI_TABLE_NUM_DYNAMIC.
>
> Where am I supposed to find the definitions?
>
> Regards
> Jouk
>
> Pax, vel iniusta, utilior est quam iustissimum bellum.
> (free after Marcus Tullius Cicero (106 b.Chr.-46 b.Chr.)
> Epistularum ad Atticum 7.1.4.3)
>
> Touch not the cat bot a glove
>
> Jouk Jansen
> jo...@hrem.nano.tudelft.nl
>
> Technische Universiteit Delft
> Kavli Institute of Nanoscience
> Nationaal centrum voor HREM
> Lorentzweg 1
> 2628 CJ Delft
> Nederland
> tel. 31-15-2782272
Re: [Mesa-dev] [PATCH 8/8] gallium/radeon: do not reallocate user memory buffers
Patches 3-8:

Reviewed-by: Marek Olšák

Marek

On Tue, Jan 12, 2016 at 5:06 PM, Nicolai Hähnle wrote:
> From: Nicolai Hähnle
>
> The whole point of AMD_pinned_memory is that applications don't have to map
> buffers via OpenGL - but they're still allowed to, so make sure we don't break
> the link between buffer object and user memory unless explicitly instructed
> to.
> ---
>  src/gallium/drivers/radeon/r600_buffer_common.c | 31 ++---
>  src/gallium/drivers/radeon/radeon_winsys.h      |  8 +++
>  src/gallium/winsys/amdgpu/drm/amdgpu_bo.c       |  6 +
>  src/gallium/winsys/radeon/drm/radeon_drm_bo.c   |  6 +
>  4 files changed, 43 insertions(+), 8 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
> index 09755e0..6592c5b 100644
> --- a/src/gallium/drivers/radeon/r600_buffer_common.c
> +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
> @@ -209,11 +209,15 @@ static void r600_buffer_destroy(struct pipe_screen *screen,
>          FREE(rbuffer);
>  }
>
> -void r600_invalidate_resource(struct pipe_context *ctx,
> -                              struct pipe_resource *resource)
> +static bool
> +r600_do_invalidate_resource(struct r600_common_context *rctx,
> +                            struct r600_resource *rbuffer)
>  {
> -        struct r600_common_context *rctx = (struct r600_common_context*)ctx;
> -        struct r600_resource *rbuffer = r600_resource(resource);
> +        /* In AMD_pinned_memory, the user pointer association only gets
> +         * broken when the buffer is explicitly re-allocated.
> +         */
> +        if (rctx->ws->buffer_is_user_ptr(rbuffer->buf))
> +                return false;
>
>          /* Check if mapping this buffer would cause waiting for the GPU. */
>          if (r600_rings_is_buffer_referenced(rctx, rbuffer->buf, RADEON_USAGE_READWRITE) ||
> @@ -222,6 +226,17 @@ void r600_invalidate_resource(struct pipe_context *ctx,
>          } else {
>                  util_range_set_empty(&rbuffer->valid_buffer_range);
>          }
> +
> +        return true;
> +}
> +
> +void r600_invalidate_resource(struct pipe_context *ctx,
> +                              struct pipe_resource *resource)
> +{
> +        struct r600_common_context *rctx = (struct r600_common_context*)ctx;
> +        struct r600_resource *rbuffer = r600_resource(resource);
> +
> +        (void)r600_do_invalidate_resource(rctx, rbuffer);
>  }
>
>  static void *r600_buffer_get_transfer(struct pipe_context *ctx,
> @@ -291,10 +306,10 @@ static void *r600_buffer_transfer_map(struct pipe_context *ctx,
>              !(usage & PIPE_TRANSFER_UNSYNCHRONIZED)) {
>                  assert(usage & PIPE_TRANSFER_WRITE);
>
> -                r600_invalidate_resource(ctx, resource);
> -
> -                /* At this point, the buffer is always idle. */
> -                usage |= PIPE_TRANSFER_UNSYNCHRONIZED;
> +                if (r600_do_invalidate_resource(rctx, rbuffer)) {
> +                        /* At this point, the buffer is always idle. */
> +                        usage |= PIPE_TRANSFER_UNSYNCHRONIZED;
> +                }
>          }
>          else if ((usage & PIPE_TRANSFER_DISCARD_RANGE) &&
>                   !(usage & PIPE_TRANSFER_UNSYNCHRONIZED) &&
> diff --git a/src/gallium/drivers/radeon/radeon_winsys.h b/src/gallium/drivers/radeon/radeon_winsys.h
> index 4af6a18..ad30474 100644
> --- a/src/gallium/drivers/radeon/radeon_winsys.h
> +++ b/src/gallium/drivers/radeon/radeon_winsys.h
> @@ -530,6 +530,14 @@ struct radeon_winsys {
>                                             void *pointer, unsigned size);
>
>      /**
> +     * Whether the buffer was created from a user pointer.
> +     *
> +     * \param buf       A winsys buffer object
> +     * \return          whether \p buf was created via buffer_from_ptr
> +     */
> +    bool (*buffer_is_user_ptr)(struct pb_buffer *buf);
> +
> +    /**
>       * Get a winsys handle from a winsys buffer. The internal structure
>       * of the handle is platform-specific and only a winsys should access it.
>       *
> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> index a844773..82c803b 100644
> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> @@ -686,6 +686,11 @@ error:
>      return NULL;
>  }
>
> +static bool amdgpu_bo_is_user_ptr(struct pb_buffer *buf)
> +{
> +   return ((struct amdgpu_winsys_bo*)buf)->user_ptr != NULL;
> +}
> +
>  static uint64_t amdgpu_bo_get_va(struct pb_buffer *buf)
>  {
>      return ((struct amdgpu_winsys_bo*)buf)->va;
> @@ -701,6 +706,7 @@ void amdgpu_bo_init_functions(struct amdgpu_winsys *ws)
>      ws->base.buffer_create = amdgpu_bo_create;
>      ws->base.buffer_from_handle = amdgpu_bo_from_handle;
>      ws->base.buffer_from_ptr = amdgpu_bo_from_ptr;
> +    ws->base.buffer_is_user_ptr =
[Mesa-dev] Where to find MAPI_TABLE_NUM_STATIC & MAPI_TABLE_NUM_DYNAMIC
Hi all,

I'm trying to create a fresh compilation for my OpenVMS system using the
sources I extracted using git today. At some point the compilation fails
because MAPI_TABLE_NUM_STATIC and MAPI_TABLE_NUM_DYNAMIC are not defined.

In a version compiled sometime ago I found the definitions in the file
mapi/vgapi/vgapi_tmp.h, a file generated during the compilation. However
that directory is now obsolete. It seems that mapi/glapi/glapi_mapi_tmp.h
is the replacement, but that file contains neither MAPI_TABLE_NUM_STATIC
nor MAPI_TABLE_NUM_DYNAMIC.

Where am I supposed to find the definitions?

Regards
Jouk

Pax, vel iniusta, utilior est quam iustissimum bellum.
(free after Marcus Tullius Cicero (106 b.Chr.-46 b.Chr.)
Epistularum ad Atticum 7.1.4.3)

Touch not the cat bot a glove

Jouk Jansen
jo...@hrem.nano.tudelft.nl

Technische Universiteit Delft
Kavli Institute of Nanoscience
Nationaal centrum voor HREM
Lorentzweg 1
2628 CJ Delft
Nederland
tel. 31-15-2782272
Re: [Mesa-dev] [PATCH 2/2] i965: Implement nir_op_fquantize2f16
Hi Jason

On 13/01/2016 at 00:35, Jason Ekstrand wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 13 +
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 10 ++
>  2 files changed, 23 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 6213378..ffb8059 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -943,6 +943,19 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>        inst->saturate = instr->dest.saturate;
>        break;
>
> +   case nir_op_fquantize2f16: {
> +      fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_D);
> +
> +      /* The destination stride must be at least as big as the source stride. */
> +      tmp.type = BRW_REGISTER_TYPE_W;
> +      tmp.stride = 2;

After a comment like 'at least as big' one would normally expect some
check to ensure that. Maybe add a "So set it to 2" or whatever.

-Michael

> +
> +      bld.emit(BRW_OPCODE_F32TO16, tmp, op[0]);
> +      inst = bld.emit(BRW_OPCODE_F16TO32, result, tmp);
> +      inst->saturate = instr->dest.saturate;
> +      break;
> +   }
> +
>     case nir_op_fmin:
>     case nir_op_imin:
>     case nir_op_umin:
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> index 37f517d..77a2f8b 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> @@ -1177,6 +1177,16 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
>        inst->saturate = instr->dest.saturate;
>        break;
>
> +   case nir_op_fquantize2f16: {
> +      /* See also vec4_visitor::emit_pack_half_2x16() */
> +      src_reg tmp = src_reg(this, glsl_type::uvec4_type);
> +
> +      emit(F32TO16(dst_reg(tmp), op[0]));
> +      inst = emit(F16TO32(dst, tmp));
> +      inst->saturate = instr->dest.saturate;
> +      break;
> +   }
> +
>     case nir_op_fmin:
>     case nir_op_imin:
>     case nir_op_umin:
>
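Both backends in the quoted patch implement the quantize as a float-to-half conversion followed by a half-to-float conversion. The same round trip can be sketched in Python with the struct module's half-precision format. This is only a model of the idea, not of the i965 instructions: quantize_to_f16 is an invented name, the input is a Python double rather than a 32-bit float, the overflow-to-infinity handling is my assumption, and denormal behavior is whatever struct does rather than what the hardware does.

```python
import math
import struct


def quantize_to_f16(x):
    """Round x to the nearest half-precision value and widen it back.

    struct's "e" format is IEEE 754 binary16. It raises OverflowError for
    values too large for a half, so map those to a signed infinity here.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)
```

Values exactly representable as a half, such as 1.0 or 65504.0, come back unchanged, while something like 0.1 loses the precision that a 10-bit mantissa cannot hold.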
Re: [Mesa-dev] [PATCH] radeonsi: enable late VS export memory allocation
On Wed, Jan 13, 2016 at 4:35 PM, Axel Davy wrote:
>
>>
>> Axel Davy benchmarked this briefly. We may need more benchmarks though.
>>
>> Marek
>>
>
> I confirm setting this register helps get a few % with heaven.
>
> There was also another register to kill color exports early when doing
> depth only pass that helped a few % (but less).

Do you remember which register it was? The hardware should not execute
PS when doing depth-only rendering and the shader doesn't use KILL.

Marek