Re: [Mesa-dev] [PATCH 2/2] util/ralloc: Make sizeof(linear_header) a multiple of 8
Tue, 13 Nov 2018, 06:03: Matt Turner wrote:

> On Mon, Nov 12, 2018 at 3:07 PM Eric Anholt wrote:
> >
> > Matt Turner writes:
> >
> > > Prior to this patch sizeof(linear_header) was 20 bytes in a
> > > non-debug build on 32-bit platforms. We do some pointer arithmetic to
> > > calculate the next available location with
> > >
> > >    ptr = (linear_size_chunk *)((char *)&latest[1] + latest->offset);
> > >
> > > in linear_alloc_child(). The &latest[1] adds 20 bytes, so an allocation
> > > would only be 4-byte aligned.
> > >
> > > On 32-bit SPARC a 'sttw' instruction (which stores a consecutive pair of
> > > 4-byte registers to memory) requires an 8-byte aligned address. Such an
> > > instruction is used to store to an 8-byte integer type, like intmax_t
> > > which is used in glcpp's expression_value_t struct.
> > >
> > > As a result of the 4-byte alignment returned by linear_alloc_child() we
> > > would generate a SIGBUS (unaligned exception) on SPARC.
> > >
> > > According to the GNU libc manual malloc() always returns memory that has
> > > at least an alignment of 8-bytes [1]. I think our allocator should do
> > > the same.
> > >
> > > So, simple fix with two parts:
> > >
> > >    (1) Increase SUBALLOC_ALIGNMENT to 8 unconditionally.
> > >    (2) Mark linear_header with an aligned attribute, which will cause
> > >        its sizeof to be rounded up to that alignment. (We already do
> > >        this for ralloc_header)
> > >
> > > With this done, all Mesa's unit tests now pass on SPARC.
> > >
> > > [1] https://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html
> > >
> > > Fixes: 47e17586924f ("glcpp: use the linear allocator for most objects")
> > > Bug: https://bugs.gentoo.org/636326
> > > ---
> > >  src/util/ralloc.c | 14 ++++++++++++--
> > >  1 file changed, 12 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/src/util/ralloc.c b/src/util/ralloc.c
> > > index 745b4cf1226..fc35661996d 100644
> > > --- a/src/util/ralloc.c
> > > +++ b/src/util/ralloc.c
> > > @@ -552,10 +552,18 @@ ralloc_vasprintf_rewrite_tail(char **str, size_t *start, const char *fmt,
> > >   */
> > >
> > >  #define MIN_LINEAR_BUFSIZE 2048
> > > -#define SUBALLOC_ALIGNMENT sizeof(uintptr_t)
> > > +#define SUBALLOC_ALIGNMENT 8
> > >  #define LMAGIC 0x87b9c7d3
> > >
> > > -struct linear_header {
> > > +struct
> > > +#ifdef _MSC_VER
> > > +   __declspec(align(8))
> > > +#elif defined(__LP64__)
> > > +   __attribute__((aligned(16)))
> > > +#else
> > > +   __attribute__((aligned(8)))
> > > +#endif
> > > +   linear_header {
> > >  #ifndef NDEBUG
> > >     unsigned magic;   /* for debugging */
> > >  #endif
> > > @@ -647,6 +655,8 @@ linear_alloc_child(void *parent, unsigned size)
> > >     ptr = (linear_size_chunk *)((char*)&latest[1] + latest->offset);
> > >     ptr->size = size;
> > >     latest->offset += full_size;
> > > +
> > > +   assert((uintptr_t)&ptr[1] % SUBALLOC_ALIGNMENT == 0);
> > >     return &ptr[1];
> > >  }
> >
> > These patches are:
> >
> > Reviewed-by: Eric Anholt
>
> Thanks a bunch! I hope this is useful for you on arm as well.
>
> > However, shouldn't we also bump SUBALLOC_ALIGNMENT to 16 on LP64, too,
> > if that's what glibc is doing for malloc?
>
> 16-byte alignment is necessary for SSE aligned vector load/store
> instructions. I suppose we're not getting any vectorized SSE
> load/store instructions to memory allocated by linear_alloc_* and
> that's why we haven't seen problems?

FWIW, at least clang on x86 assumes malloc/new return pointers aligned
to 16 bytes, though it probably doesn't detect linear_alloc_* as such.

Regards,
Gustaw Smolarczyk

> Seems reasonable to bump it to 16-bytes.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
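Editorial note: the effect of raising a struct's alignment on its sizeof
can be checked in isolation. The sketch below is not the Mesa code. The
names plain_header, aligned_header, align_up and DEMO_ALIGNMENT are
hypothetical, the five-field layout is only a stand-in for a 20-byte
header, and C11 _Alignas is used in place of the compiler-specific
attributes in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ALIGNMENT 8   /* assumption: mirrors SUBALLOC_ALIGNMENT = 8 */

/* Hypothetical 20-byte header: five 32-bit fields, natural alignment 4. */
struct plain_header {
   uint32_t offset, size, next, latest, extra;
};

/* Same fields, but the alignment of the struct is raised to 8, so the
 * compiler pads sizeof up to the next multiple of 8. */
struct aligned_header {
   _Alignas(DEMO_ALIGNMENT) uint32_t offset;
   uint32_t size, next, latest, extra;
};

/* Round n up to a multiple of a (a must be a power of two). */
static size_t align_up(size_t n, size_t a)
{
   return (n + a - 1) & ~(a - 1);
}

/* Distance from a header to the first byte after it, i.e. what
 * (char *)&hdr[1] - (char *)hdr evaluates to. */
static size_t payload_offset_plain(void)
{
   return sizeof(struct plain_header);
}

static size_t payload_offset_aligned(void)
{
   return sizeof(struct aligned_header);
}
```

With an 8-byte-aligned base address, the plain header leaves the payload
4 bytes off an 8-byte boundary, while the aligned header keeps it on one.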
Re: [Mesa-dev] [radeonsi] Blender/vsraytrace/fsraytrace/gsraytrace - GPUShader: compile error
SR#648671

On Tue, 13 Nov 2018, Dieter Nützel wrote:

> GREAT hint Tim!
>
> Yes, of course.
>
> /home/dieter> gcc --version
> gcc (SUSE Linux) 8.2.1 20181025 [gcc-8-branch revision 265488]
>
> So I have to ping SUSE to push the fix, too.
>
> Thanks a lot.
>
> Dieter
>
> On 12.11.2018 at 08:28, Timothy Arceri wrote:
> > I'm guessing you're using GCC 8.2.1 to compile Mesa? There was a compiler bug:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1645400
> >
> > On 12/11/18 2:11 pm, Dieter Nützel wrote:
> > > Hello,
> > >
> > > I get broken shaders with Blender and the above demos no longer start.
> > >
> > > NOT NIR related.
> > > Have to start bisect.
> > >
> > > OpenGL renderer string: Radeon RX 580 Series (POLARIS10, DRM 3.27.0, 4.19.0-rc1-1.g7262353-default+, LLVM 8.0.0)
> > > OpenGL core profile version string: 4.5 (Core Profile) Mesa 19.0.0-devel (git-590fcb50e7)
> > > OpenGL core profile shading language version string: 4.50
> > >
> > > mesa-demos/glsl> blender
> > > Read prefs: /home/dieter/.config/blender/2.79/config/userpref.blend
> > > libGL: Can't open configuration file /usr/local/etc/drirc: No such file or directory.
> > > libGL: Can't open configuration file /usr/local/etc/drirc: No such file or directory.
> > > libGL: pci id for fd 16: 1002:67df, driver radeonsi
> > > libGL: OpenDriver: trying /usr/local/lib64/dri/tls/radeonsi_dri.so
> > > libGL: OpenDriver: trying /usr/local/lib64/dri/radeonsi_dri.so
> > > libGL: Can't open configuration file /usr/local/etc/drirc: No such file or directory.
> > > libGL: Can't open configuration file /usr/local/etc/drirc: No such file or directory.
> > > libGL: Can't open configuration file /usr/local/etc/drirc: No such file or directory.
> > > usr/share/libdrm/amdgpu.ids version: 1.0.0
> > > libGL: Using DRI3 for screen 0
> > > Read blend: /data/Blender/BMW3v2.blend
> > > 2.66 versioning fix: replacing black sky with premultiplied alpha for scene Scene
> > > Read blend: /data/Blender/BMW27GE.blend
> > > GPUShader: compile error:
> > > 0:1177(22): error: invalid input layout qualifier used
> > > [-]
> > >
> > > Read blend: /data/Blender/BMW27.blend
> > > skipping driver '100*power', automatic scripts are disabled
> > > skipping driver '-100*power', automatic scripts are disabled
> > > skipping driver '-90*brake', automatic scripts are disabled
> > > skipping driver '90*brake', automatic scripts are disabled
> > > libGL: Can't open configuration file /usr/local/etc/drirc: No such file or directory.
> > > libGL: Can't open configuration file /usr/local/etc/drirc: No such file or directory.
> > > libGL: Can't open configuration file /usr/local/etc/drirc: No such file or directory.
> > > libGL: Can't open configuration file /usr/local/etc/drirc: No such file or directory.
> > > Read blend: /data/Blender/sanisidro.blend
> > > Read blend: /data/Blender/bh.blend
> > > Info: Read library: '/projeto.blend', '//../../projeto.blend', parent ''
> > > Warning: Cannot find lib '/projeto.blend'
> > > Warning: LIB: Group: 'Projeto' missing from '/projeto.blend', parent ''
> > > Info: Read library: '/projeto.blend', '//../../projeto.blend', parent ''
> > > Warning: Unable to open '/projeto.blend': No such file or directory
> > > Warning: Cannot find lib '/projeto.blend'
> > > Warning: LIB: Group: 'Projeto' missing from '/projeto.blend', parent ''
> > >
> > > GPUShader: compile error:
> > > 0:1177(22): error: invalid input layout qualifier used
> > > [-]
> > >
> > > mesa-demos/glsl> ./fsraytrace
> > > libGL: Can't open configuration file /usr/local/etc/drirc: No such file or directory.
> > > libGL: Can't open configuration file /usr/local/etc/drirc: No such file or directory.
> > > libGL: pci id for fd 4: 1002:67df, driver radeonsi
> > > libGL: OpenDriver: trying /usr/local/lib64/dri/tls/radeonsi_dri.so
> > > libGL: OpenDriver: trying /usr/local/lib64/dri/radeonsi_dri.so
> > > libGL: Can't open configuration file /usr/local/etc/drirc: No such file or directory.
> > > libGL: Can't open configuration file /usr/local/etc/drirc: No such file or directory.
> > > libGL: Can't open configuration file /usr/local/etc/drirc: No such file or directory.
> > > usr/share/libdrm/amdgpu.ids version: 1.0.0
> > > libGL: Using DRI3 for screen 0
> > > Error: problem compiling shader: 0:48(2): error: invalid input layout qualifier used
> > >
> > > Same with 'vsraytrace' and 'gsraytrace'.
> > >
> > > Thanks,
> > > Dieter

--
Richard Biener
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [Mesa-dev] [PATCH 1/3 v2] glsl: prevent qualifiers modification of predeclared variables
Hello,

Thanks a lot for the review.

Regards,
Andrii.

On Sat, Nov 10, 2018 at 5:38 AM Timothy Arceri wrote:

> Nice! Series is:
>
> Reviewed-by: Timothy Arceri
>
> On 10/10/18 9:07 am, Ian Romanick wrote:
> > From: Ian Romanick
> >
> > Section 3.7 (Identifiers) of the GLSL spec says:
> >
> >     However, as noted in the specification, there are some cases where
> >     previously declared variables can be redeclared to change or add
> >     some property, and predeclared "gl_" names are allowed to be
> >     redeclared in a shader only for these specific purposes. More
> >     generally, it is an error to redeclare a variable, including those
> >     starting "gl_".
> >
> > This patch should fix piglit tests:
> > clip-distance-redeclare-without-inout.frag
> > clip-distance-redeclare-without-inout.vert
> >
> > However, this causes a regression in
> > clip-distance-out-values.shader_test. A fix for that test has been sent
> > to the piglit list for review:
> >
> > https://patchwork.freedesktop.org/patch/255201/
> >
> > As far as I understood from the following mailing thread:
> > https://lists.freedesktop.org/archives/piglit/2013-October/007935.html
> > it looks like we agreed to remove the ability to change qualifiers
> > but have not done it yet. Unless I missed something :)
> >
> > v2 (idr): Move 'earlier->data.mode != var->data.mode' test much earlier
> > in the function. Add special handling for gl_LastFragData.
> >
> > Signed-off-by: Andrii Simiklit
> > Signed-off-by: Ian Romanick
> > ---
> >  src/compiler/glsl/ast_to_hir.cpp | 51 +---
> >  1 file changed, 27 insertions(+), 24 deletions(-)
> >
> > diff --git a/src/compiler/glsl/ast_to_hir.cpp b/src/compiler/glsl/ast_to_hir.cpp
> > index 1082d6c91cf..2e4c9ef6776 100644
> > --- a/src/compiler/glsl/ast_to_hir.cpp
> > +++ b/src/compiler/glsl/ast_to_hir.cpp
> > @@ -4238,6 +4238,29 @@ get_variable_being_redeclared(ir_variable **var_ptr, YYLTYPE loc,
> >
> >     *is_redeclaration = true;
> >
> > +   if (earlier->data.how_declared == ir_var_declared_implicitly) {
> > +      /* Verify that the redeclaration of a built-in does not change the
> > +       * storage qualifier. There are a couple special cases.
> > +       *
> > +       * 1. Some built-in variables that are defined as 'in' in the
> > +       *    specification are implemented as system values. Allow
> > +       *    ir_var_system_value -> ir_var_shader_in.
> > +       *
> > +       * 2. gl_LastFragData is implemented as a ir_var_shader_out, but the
> > +       *    specification requires that redeclarations omit any qualifier.
> > +       *    Allow ir_var_shader_out -> ir_var_auto for this one variable.
> > +       */
> > +      if (earlier->data.mode != var->data.mode &&
> > +          !(earlier->data.mode == ir_var_system_value &&
> > +            var->data.mode == ir_var_shader_in) &&
> > +          !(strcmp(var->name, "gl_LastFragData") == 0 &&
> > +            var->data.mode == ir_var_auto)) {
> > +         _mesa_glsl_error(&loc, state,
> > +                          "redeclaration cannot change qualification of `%s'",
> > +                          var->name);
> > +      }
> > +   }
> > +
> >     /* From page 24 (page 30 of the PDF) of the GLSL 1.50 spec,
> >      *
> >      * "It is legal to declare an array without a size and then
> > @@ -4246,11 +4269,6 @@ get_variable_being_redeclared(ir_variable **var_ptr, YYLTYPE loc,
> >      */
> >     if (earlier->type->is_unsized_array() && var->type->is_array()
> >         && (var->type->fields.array == earlier->type->fields.array)) {
> > -      /* FINISHME: This doesn't match the qualifiers on the two
> > -       * FINISHME: declarations. It's not 100% clear whether this is
> > -       * FINISHME: required or not.
> > -       */
> > -
> >        const int size = var->type->array_size();
> >        check_builtin_array_max_size(var->name, size, loc, state);
> >        if ((size > 0) && (size <= earlier->data.max_array_access)) {
> > @@ -4342,28 +4360,13 @@ get_variable_being_redeclared(ir_variable **var_ptr, YYLTYPE loc,
> >        earlier->data.precision = var->data.precision;
> >        earlier->data.memory_coherent = var->data.memory_coherent;
> >
> > -   } else if (earlier->data.how_declared == ir_var_declared_implicitly &&
> > -              state->allow_builtin_variable_redeclaration) {
> > +   } else if ((earlier->data.how_declared == ir_var_declared_implicitly &&
> > +               state->allow_builtin_variable_redeclaration) ||
> > +              allow_all_redeclarations) {
> >        /* Allow verbatim redeclarations of built-in variables. Not explicitly
> >         * valid, but some applications do it.
> >         */
> > -      if (earlier->data.mode != var->data.mode &&
> > -          !(earlier->data.mode == ir_var_system_value &&
> > -            var->data.mode == ir_var_shader_in)) {
> > -         _mesa_glsl_error(&loc, state,
> > -
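Editorial note: the storage-qualifier rule added by the first hunk can be
read as a small predicate. The sketch below restates that condition in
plain C under stated assumptions: enum demo_mode and
redeclaration_mode_ok are hypothetical names standing in for the
ir_variable_mode values and the inline check in the patch, and nothing
here is taken from the Mesa tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the variable modes named in the patch. */
enum demo_mode {
   DEMO_AUTO,
   DEMO_SHADER_IN,
   DEMO_SHADER_OUT,
   DEMO_SYSTEM_VALUE
};

/* Returns true when redeclaring built-in `name` from mode `earlier`
 * to mode `redecl` would not trigger the "cannot change qualification"
 * error, following the three clauses of the patch's condition. */
static bool redeclaration_mode_ok(const char *name,
                                  enum demo_mode earlier,
                                  enum demo_mode redecl)
{
   if (earlier == redecl)
      return true;   /* verbatim redeclaration */
   if (earlier == DEMO_SYSTEM_VALUE && redecl == DEMO_SHADER_IN)
      return true;   /* special case 1: system value declared as 'in' */
   if (strcmp(name, "gl_LastFragData") == 0 && redecl == DEMO_AUTO)
      return true;   /* special case 2: qualifier omitted */
   return false;
}
```

As in the patch, the gl_LastFragData case only looks at the new mode and
the name, not at the earlier mode.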
Re: [Mesa-dev] [PATCH 12/15] anv: introduce helper to resolve vk_format from anv_format
On 11/6/18 3:01 PM, Lionel Landwerlin wrote:
> We could touch the macros in anv_formats.c to include VkFormat in
> anv_format if that makes your life easier.

Yep, this makes sense. I'll add VkFormat there.

> On 30/10/2018 05:26, Tapani Pälli wrote:
> > Signed-off-by: Tapani Pälli
> > ---
> >  src/intel/vulkan/anv_formats.c | 18 ++++++++++++++++++
> >  src/intel/vulkan/anv_private.h |  3 +++
> >  2 files changed, 21 insertions(+)
> >
> > diff --git a/src/intel/vulkan/anv_formats.c b/src/intel/vulkan/anv_formats.c
> > index 1d3b1f67928..166b50f5a07 100644
> > --- a/src/intel/vulkan/anv_formats.c
> > +++ b/src/intel/vulkan/anv_formats.c
> > @@ -405,6 +405,24 @@ anv_get_format(VkFormat vk_format)
> >     return format;
> >  }
> >
> > +VkFormat
> > +anv_get_vkformat(const struct anv_format *format)
> > +{
> > +#define LAST_FORMAT(table) table + sizeof(table) - sizeof(struct anv_format)
> > +
> > +   const struct anv_format *last_main = LAST_FORMAT(main_formats);
> > +   const struct anv_format *last_ycbcr = LAST_FORMAT(ycbcr_formats);
> > +
> > +#undef LAST_FORMAT
> > +
> > +   if (format >= main_formats && format <= last_main)
> > +      return format - main_formats;
> > +   else if (format >= ycbcr_formats && format <= last_ycbcr)
> > +      return format - ycbcr_formats;
> > +
> > +   return VK_FORMAT_UNDEFINED;
> > +}
> > +
> >  /**
> >   * Exactly one bit must be set in \a aspect.
> >   */
> > diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
> > index 882de030ae0..bfdb711337e 100644
> > --- a/src/intel/vulkan/anv_private.h
> > +++ b/src/intel/vulkan/anv_private.h
> > @@ -2634,6 +2634,9 @@ anv_plane_to_aspect(VkImageAspectFlags image_aspects,
> >  const struct anv_format *
> >  anv_get_format(VkFormat format);
> >
> > +VkFormat
> > +anv_get_vkformat(const struct anv_format *format);
> > +
> >  static inline uint32_t
> >  anv_get_format_planes(VkFormat vk_format)
> >  {
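Editorial note: one detail worth keeping in mind when reading the
LAST_FORMAT macro in the quoted patch is that C pointer arithmetic counts
elements, not bytes, so the last element of a table is at
table + (number of elements) - 1. The sketch below shows a
pointer-to-index lookup written that way. It is an illustration only:
demo_format, demo_table and demo_table_index are hypothetical names, and
the returned value is an index into this one table, which is not
necessarily the same thing as an enum value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical format record; the real struct is not reproduced here. */
struct demo_format {
   int hw_format;
};

static const struct demo_format demo_table[] = {
   { 10 }, { 11 }, { 12 }, { 13 },
};

/* Returns the index of `f` within demo_table, or -1 if `f` does not
 * point into the table. Addresses are compared as integers so that a
 * pointer to an unrelated object can be tested safely. */
static ptrdiff_t demo_table_index(const struct demo_format *f)
{
   /* Last element: element count, not sizeof(table), is added. */
   const struct demo_format *last = demo_table + ARRAY_SIZE(demo_table) - 1;

   if ((uintptr_t)f >= (uintptr_t)demo_table &&
       (uintptr_t)f <= (uintptr_t)last)
      return f - demo_table;

   return -1;
}
```

Storing the enum value in the record itself, as suggested at the top of
this message, avoids the reverse lookup altogether.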
Re: [Mesa-dev] [PATCH v3] intel/decoder: tools: Use engine for decoding batch instructions
I forgot that aubinator_viewer_decoder.cpp needs to be updated too.
But I have updated it locally and will push with the fix.

Thanks!

- Lionel

On 08/11/2018 10:36, Lionel Landwerlin wrote:
> Reviewed-by: Lionel Landwerlin
>
> On 07/11/2018 14:50, Toni Lönnberg wrote:
> > The engine to which the batch was sent to is now set to the decoder
> > context when decoding the batch. This is needed so that we can
> > distinguish between instructions as the render and video pipe share
> > some of the instruction opcodes.
> >
> > v2: The engine is now in the decoder context and the batch decoder
> >     uses a local function for finding the instruction for an engine.
> >
> > v3: Spec uses engine_mask now instead of engine, replaced engine class
> >     enums with the definitions from UAPI.
> > ---
> >  src/intel/common/gen_batch_decoder.c     | 25 +---
> >  src/intel/common/gen_decoder.c           |  7 ++-
> >  src/intel/common/gen_decoder.h           |  6 +-
> >  src/intel/tools/aubinator.c              |  3 +-
> >  src/intel/tools/aubinator_error_decode.c | 73
> >  5 files changed, 63 insertions(+), 51 deletions(-)
> >
> > diff --git a/src/intel/common/gen_batch_decoder.c b/src/intel/common/gen_batch_decoder.c
> > index 63f04627572..d5482a4d455 100644
> > --- a/src/intel/common/gen_batch_decoder.c
> > +++ b/src/intel/common/gen_batch_decoder.c
> > @@ -45,6 +45,7 @@ gen_batch_decode_ctx_init(struct gen_batch_decode_ctx *ctx,
> >     ctx->fp = fp;
> >     ctx->flags = flags;
> >     ctx->max_vbo_decoded_lines = -1; /* No limit! */
> > +   ctx->engine = I915_ENGINE_CLASS_RENDER;
> >
> >     if (xml_path == NULL)
> >        ctx->spec = gen_spec_load(devinfo);
> > @@ -192,10 +193,16 @@ ctx_print_buffer(struct gen_batch_decode_ctx *ctx,
> >     fprintf(ctx->fp, "\n");
> >  }
> >
> > +static struct gen_group *
> > +gen_ctx_find_instruction(struct gen_batch_decode_ctx *ctx, const uint32_t *p)
> > +{
> > +   return gen_spec_find_instruction(ctx->spec, ctx->engine, p);
> > +}
> > +
> >  static void
> >  handle_state_base_address(struct gen_batch_decode_ctx *ctx, const uint32_t *p)
> >  {
> > -   struct gen_group *inst = gen_spec_find_instruction(ctx->spec, p);
> > +   struct gen_group *inst = gen_ctx_find_instruction(ctx, p);
> >
> >     struct gen_field_iterator iter;
> >     gen_field_iterator_init(&iter, inst, p, 0, false);
> > @@ -309,7 +316,7 @@ static void
> >  handle_media_interface_descriptor_load(struct gen_batch_decode_ctx *ctx,
> >                                         const uint32_t *p)
> >  {
> > -   struct gen_group *inst = gen_spec_find_instruction(ctx->spec, p);
> > +   struct gen_group *inst = gen_ctx_find_instruction(ctx, p);
> >     struct gen_group *desc =
> >        gen_spec_find_struct(ctx->spec, "INTERFACE_DESCRIPTOR_DATA");
> >
> > @@ -373,7 +380,7 @@ static void
> >  handle_3dstate_vertex_buffers(struct gen_batch_decode_ctx *ctx,
> >                                const uint32_t *p)
> >  {
> > -   struct gen_group *inst = gen_spec_find_instruction(ctx->spec, p);
> > +   struct gen_group *inst = gen_ctx_find_instruction(ctx, p);
> >     struct gen_group *vbs = gen_spec_find_struct(ctx->spec, "VERTEX_BUFFER_STATE");
> >
> >     struct gen_batch_decode_bo vb = {};
> > @@ -436,7 +443,7 @@ static void
> >  handle_3dstate_index_buffer(struct gen_batch_decode_ctx *ctx,
> >                              const uint32_t *p)
> >  {
> > -   struct gen_group *inst = gen_spec_find_instruction(ctx->spec, p);
> > +   struct gen_group *inst = gen_ctx_find_instruction(ctx, p);
> >
> >     struct gen_batch_decode_bo ib = {};
> >     uint32_t ib_size = 0;
> > @@ -486,7 +493,7 @@ handle_3dstate_index_buffer(struct gen_batch_decode_ctx *ctx,
> >  static void
> >  decode_single_ksp(struct gen_batch_decode_ctx *ctx, const uint32_t *p)
> >  {
> > -   struct gen_group *inst = gen_spec_find_instruction(ctx->spec, p);
> > +   struct gen_group *inst = gen_ctx_find_instruction(ctx, p);
> >
> >     uint64_t ksp = 0;
> >     bool is_simd8 = false; /* vertex shaders on Gen8+ only */
> > @@ -528,7 +535,7 @@ decode_single_ksp(struct gen_batch_decode_ctx *ctx, const uint32_t *p)
> >  static void
> >  decode_ps_kernels(struct gen_batch_decode_ctx *ctx, const uint32_t *p)
> >  {
> > -   struct gen_group *inst = gen_spec_find_instruction(ctx->spec, p);
> > +   struct gen_group *inst = gen_ctx_find_instruction(ctx, p);
> >
> >     uint64_t ksp[3] = {0, 0, 0};
> >     bool enabled[3] = {false, false, false};
> > @@ -576,7 +583,7 @@ decode_ps_kernels(struct gen_batch_decode_ctx *ctx, const uint32_t *p)
> >  static void
> >  decode_3dstate_constant(struct gen_batch_decode_ctx *ctx, const uint32_t *p)
> >  {
> > -   struct gen_group *inst = gen_spec_find_instruction(ctx->spec, p);
> > +   struct gen_group *inst = gen_ctx_find_instruction(ctx, p);
> >     struct gen_group *body =
> >        gen_spec_find_struct(ctx->spec, "3DSTATE_CONSTANT_BODY");
> >
> > @@ -658,7 +665,7 @@ decode_dynamic_state_pointers(struct gen_batch_decode_ctx *ctx,
> >                                const char *struct_type, const uint32_t *p,
> >                                int count)
> >  {
> > -   struct gen_group *inst = gen_spec_find_instruction(ctx->spec, p);
> > +   struct gen_group *inst = gen_ctx_find_instruction(ctx, p);
> >
> >     uint32_t state_offset = 0;
> >
> > @@ -802,7 +809,7
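Editorial note: the commit message says the render and video pipes share
some opcodes, and that the spec now carries an engine_mask per
instruction. A lookup that filters on both could look like the sketch
below. Everything in it is hypothetical: demo_inst, demo_engine,
demo_insts and demo_find_instruction are illustration names, the opcode
values are made up, and the real gen_spec_find_instruction is not
reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical engine classes; the bit position in engine_mask is the
 * engine class number. */
enum demo_engine {
   DEMO_ENGINE_RENDER = 0,
   DEMO_ENGINE_VIDEO  = 2
};

struct demo_inst {
   const char *name;
   uint32_t opcode;
   uint32_t engine_mask;   /* bit N set: valid on engine class N */
};

/* Two made-up instructions that share an opcode but not an engine. */
static const struct demo_inst demo_insts[] = {
   { "RENDER_THING", 0x7800, 1u << DEMO_ENGINE_RENDER },
   { "VIDEO_THING",  0x7800, 1u << DEMO_ENGINE_VIDEO  },
};

/* Returns the first instruction matching both opcode and engine,
 * or NULL when there is none. */
static const struct demo_inst *
demo_find_instruction(enum demo_engine engine, uint32_t opcode)
{
   const size_t count = sizeof(demo_insts) / sizeof(demo_insts[0]);

   for (size_t i = 0; i < count; i++) {
      if (demo_insts[i].opcode == opcode &&
          (demo_insts[i].engine_mask & (1u << engine)))
         return &demo_insts[i];
   }
   return NULL;
}
```

Without the engine filter, the first table entry would always win for a
shared opcode, which is the ambiguity the patch describes.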
Re: [Mesa-dev] [PATCH v2 3/4] dri: add AYUV format
I think this chunk (or the whole patch) should be cherry-picked to stable.
Otherwise we get a BAD_ATTRIBUTE error when trying to create an AYUV
EGLImage. We should get BAD_MATCH instead.

- Lionel

On 09/11/2018 10:55, Lionel Landwerlin wrote:
> diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
> index 87e1a704c6e..3b63aebbf9a 100644
> --- a/src/egl/drivers/dri2/egl_dri2.c
> +++ b/src/egl/drivers/dri2/egl_dri2.c
> @@ -2278,6 +2278,7 @@ dri2_num_fourcc_format_planes(EGLint format)
>     case DRM_FORMAT_YVYU:
>     case DRM_FORMAT_UYVY:
>     case DRM_FORMAT_VYUY:
> +   case DRM_FORMAT_AYUV:
>        return 1;
>
>     case DRM_FORMAT_NV12:
Re: [Mesa-dev] [PATCH v3] virgl: Add command and flags to initiate debugging on the host (v2)
The host side has now landed, but because I re-worked the guest side
since Erik gave his R-B, I thought I'd ask you to take another look.

Best,
Gert

On Wednesday, 12.09.2018, at 11:59 +0200, Gert Wollny wrote:
> From: Gert Wollny
>
> On the host VREND_DEBUG=guestallow must be set to let the guest override
> the debug flags.
>
> v2: Send flag string instead of flags, this avoids the need to keep
>     the flags in sync.
> v3: Only request host logging if the host actually understands the
>     command
>
> Signed-off-by: Gert Wollny
> ---
> The corresponding virglrenderer patches can be found in this MR:
> https://gitlab.freedesktop.org/virgl/virglrenderer/merge_requests/39
>
> Thanks for reviewing,
> Gert
>
>  src/gallium/drivers/virgl/virgl_context.c  |  8 ++++++++
>  src/gallium/drivers/virgl/virgl_encode.c   | 24 ++++++++++++++++++++++++
>  src/gallium/drivers/virgl/virgl_encode.h   |  3 +++
>  src/gallium/drivers/virgl/virgl_hw.h       |  1 +
>  src/gallium/drivers/virgl/virgl_protocol.h |  1 +
>  5 files changed, 37 insertions(+)
>
> diff --git a/src/gallium/drivers/virgl/virgl_context.c b/src/gallium/drivers/virgl/virgl_context.c
> index 4511bf3b2f..96932c473d 100644
> --- a/src/gallium/drivers/virgl/virgl_context.c
> +++ b/src/gallium/drivers/virgl/virgl_context.c
> @@ -1164,6 +1164,7 @@ struct pipe_context *virgl_context_create(struct pipe_screen *pscreen,
>     struct virgl_context *vctx;
>     struct virgl_screen *rs = virgl_screen(pscreen);
>     vctx = CALLOC_STRUCT(virgl_context);
> +   const char *host_debug_flagstring;
>
>     vctx->cbuf = rs->vws->cmd_buf_create(rs->vws);
>     if (!vctx->cbuf) {
> @@ -1268,6 +1269,13 @@ struct pipe_context *virgl_context_create(struct pipe_screen *pscreen,
>     virgl_encoder_create_sub_ctx(vctx, vctx->hw_sub_ctx_id);
>
>     virgl_encoder_set_sub_ctx(vctx, vctx->hw_sub_ctx_id);
> +
> +   if (rs->caps.caps.v2.capability_bits & VIRGL_CAP_GUEST_MAY_INIT_LOG) {
> +      host_debug_flagstring = getenv("VIRGL_HOST_DEBUG");
> +      if (host_debug_flagstring)
> +         virgl_encode_host_debug_flagstring(vctx, host_debug_flagstring);
> +   }
> +
>     return &vctx->base;
>  fail:
>     return NULL;
> diff --git a/src/gallium/drivers/virgl/virgl_encode.c b/src/gallium/drivers/virgl/virgl_encode.c
> index e86d0711a5..400ba68474 100644
> --- a/src/gallium/drivers/virgl/virgl_encode.c
> +++ b/src/gallium/drivers/virgl/virgl_encode.c
> @@ -1044,3 +1044,27 @@ int virgl_encode_texture_barrier(struct virgl_context *ctx,
>     virgl_encoder_write_dword(ctx->cbuf, flags);
>     return 0;
>  }
> +
> +int virgl_encode_host_debug_flagstring(struct virgl_context *ctx,
> +                                       char *flagstring)
> +{
> +   unsigned long slen = strlen(flagstring) + 1;
> +   uint32_t sslen;
> +   uint32_t string_length;
> +
> +   if (!slen)
> +      return 0;
> +
> +   if (slen > 4 * 0x) {
> +      debug_printf("VIRGL: host debug flag string too long, will be truncated\n");
> +      slen = 4 * 0x;
> +   }
> +
> +   sslen = (uint32_t )(slen + 3) / 4;
> +   string_length = (uint32_t)MIN2(sslen * 4, slen);
> +
> +   virgl_encoder_write_cmd_dword(ctx, VIRGL_CMD0(VIRGL_CCMD_SET_DEBUG_FLAGS, 0, sslen));
> +   virgl_encoder_write_block(ctx->cbuf, (uint8_t *)flagstring, string_length);
> +
> +   return 0;
> +}
> diff --git a/src/gallium/drivers/virgl/virgl_encode.h b/src/gallium/drivers/virgl/virgl_encode.h
> index 40e62d453b..80b943a6b3 100644
> --- a/src/gallium/drivers/virgl/virgl_encode.h
> +++ b/src/gallium/drivers/virgl/virgl_encode.h
> @@ -276,4 +276,7 @@ int virgl_encode_launch_grid(struct virgl_context *ctx,
>                               const struct pipe_grid_info *grid_info);
>  int virgl_encode_texture_barrier(struct virgl_context *ctx,
>                                   unsigned flags);
> +
> +int virgl_encode_host_debug_flagstring(struct virgl_context *ctx,
> +                                       char *envname);
>  #endif
> diff --git a/src/gallium/drivers/virgl/virgl_hw.h b/src/gallium/drivers/virgl/virgl_hw.h
> index 7736ceb935..e682c750e7 100644
> --- a/src/gallium/drivers/virgl/virgl_hw.h
> +++ b/src/gallium/drivers/virgl/virgl_hw.h
> @@ -231,6 +231,7 @@ enum virgl_formats {
>  #define VIRGL_CAP_SHADER_CLOCK (1 << 11)
>  #define VIRGL_CAP_TEXTURE_BARRIER (1 << 12)
>  #define VIRGL_CAP_TGSI_COMPONENTS (1 << 13)
> +#define VIRGL_CAP_GUEST_MAY_INIT_LOG (1 << 14)
>
>  /* virgl bind flags - these are compatible with mesa 10.5 gallium.
>   * but are fixed, no other should be passed to virgl either.
> diff --git a/src/gallium/drivers/virgl/virgl_protocol.h b/src/gallium/drivers/virgl/virgl_protocol.h
> index 8d99c5ed47..3373121bf7 100644
> --- a/src/gallium/drivers/virgl/virgl_protocol.h
> +++ b/src/gallium/drivers/virgl/virgl_protocol.h
> @@ -92,6 +92,7 @@ enum virgl_context_cmd {
>     VIRGL_CCMD_SET_FRAMEBUFFER_STATE_NO_ATTACH,
>     VIRGL_CCMD_TEXTURE_BARRIER,
>
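Editorial note: the encoder in this patch sends a NUL-terminated string
as a whole number of 32-bit words, computed as (length + 3) / 4. The
sketch below shows that rounding and a zero-padded copy in isolation.
It assumes nothing about the virgl wire format beyond what the patch
shows; dwords_for_string and pack_string are hypothetical helper names,
not virgl functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of 32-bit words needed for `s` including its terminating NUL. */
static uint32_t dwords_for_string(const char *s)
{
   size_t bytes = strlen(s) + 1;
   return (uint32_t)((bytes + 3) / 4);
}

/* Copies `s` (with its NUL) into `out`, zero-filling the tail of the
 * last word. Returns the number of words used, or 0 if `out` is too
 * small to hold the string. */
static size_t pack_string(uint32_t *out, size_t out_dwords, const char *s)
{
   size_t bytes = strlen(s) + 1;
   size_t dwords = (bytes + 3) / 4;

   if (dwords > out_dwords)
      return 0;

   memset(out, 0, dwords * sizeof(uint32_t));   /* padding bytes are zero */
   memcpy(out, s, bytes);
   return dwords;
}
```

Because the length always includes the terminator, the word count is
never zero, even for an empty string.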
Re: [Mesa-dev] [PATCH 6/7] RFC: nir/xfb_info: arrays of basic types adds just one varying
Hi Jason,

just one thing here. Although I appreciate your interest in
understanding how varyings are enumerated, I think we are getting
sidetracked, as in the end that is something I would need to solve. I
just wanted to know which way to go.

The main question here is whether we are really interested in adding
such complexity to the general xfb gathering pass. This RFC was
basically a way to show how many changes we would need, even for an
incomplete solution that I'm not totally happy with.

So at this point, do you think it is worth adding varying computation
to the general pass in the name of code reuse, or should ARB_gl_spirv
stick to its own gathering pass?

On 10/11/18 12:13, Alejandro Piñeiro wrote:
> On 09/11/18 16:58, Jason Ekstrand wrote:
>> On November 9, 2018 06:39:25 Alejandro Piñeiro wrote:
>>> On 08/11/18 23:14, Jason Ekstrand wrote:

On Thu, Nov 8, 2018 at 7:22 AM Alejandro Piñeiro wrote:

On OpenGL, an array of a simple type adds just one varying. So the
gl_transform_feedback_varying_info struct defined at mtypes.h includes
the parameters Type (base_type) and Size (number of elements).

This commit checks this when the recursive add_var_xfb_outputs call
handles arrays, to ensure that just one is added.

RFC: Until this point, all changes were reasonable, but this change is
(imho) ugly. My idea was to introduce as few changes as possible to the
code, especially to its logic/flow. But this commit is almost a hack.
The ideal solution would be to change the focus of the recursive
function, focusing on varyings, and at each varying, recursively add
outputs. But that seems like overkill for a pass that was originally
intended for consumers only caring about the outputs. So perhaps
ARB_gl_spirv should keep its own gathering pass, with varyings and
outputs, and leave this one untouched for those that only care about
outputs.
---
 src/compiler/nir/nir_gather_xfb_info.c | 52 --
 1 file changed, 43 insertions(+), 9 deletions(-)

diff --git a/src/compiler/nir/nir_gather_xfb_info.c b/src/compiler/nir/nir_gather_xfb_info.c
index 948b802a815..cb0e2724cab 100644
--- a/src/compiler/nir/nir_gather_xfb_info.c
+++ b/src/compiler/nir/nir_gather_xfb_info.c
@@ -36,23 +36,59 @@ nir_gather_xfb_info_create(void *mem_ctx, uint16_t output_count, uint16_t varyin
    return xfb;
 }

+static bool
+glsl_type_is_leaf(const struct glsl_type *type)
+{
+   if (glsl_type_is_struct(type) ||
+       (glsl_type_is_array(type) &&
+        (glsl_type_is_array(glsl_get_array_element(type)) ||
+         glsl_type_is_struct(glsl_get_array_element(type))))) {

I'm trying to understand exactly what this means. From what you wrote
here it looks like the following are all one varying:

float var[3];
vec2 var[3];
mat4 var[3];

>>>
>>> Yes, GLSL returns one varying per each one (Size 3).
>>
>> Just to be clear, a matrix or array of matrices is one varying?
>
> Yep, and being more clear, for this shader:
> #version 150
> #extension GL_ARB_enhanced_layouts: require
>
> layout(xfb_offset = 0) out mat4 var[3];
>
> void main() {
>   mat4 m4;
>
>   gl_Position = vec4(0.0);
>
>   var[0] = m4;
> }
>
> We get the following when we dump gl_program::LinkedTransformFeedback,
> that is a struct gl_transform_feedback_info defined at mtypes.h:
>
> [gl_transform_feedback_info]
> NumOuputs = 12, (OutputRegister, OutputBuffer, NumComponents,
> StreamId, DstOffset, ComponentOffset)
>   0:(31, 0, 4, 0, 0, 0)
>   1:(32, 0, 4, 0, 4, 0)
>   2:(33, 0, 4, 0, 8, 0)
>   3:(34, 0, 4, 0, 12, 0)
>   4:(35, 0, 4, 0, 16, 0)
>   5:(36, 0, 4, 0, 20, 0)
>   6:(37, 0, 4, 0, 24, 0)
>   7:(38, 0, 4, 0, 28, 0)
>   8:(39, 0, 4, 0, 32, 0)
>   9:(40, 0, 4, 0, 36, 0)
>   10:(41, 0, 4, 0, 40, 0)
>   11:(42, 0, 4, 0, 44, 0)
> NumVarying=1, (Offset, Type, BufferIndex, Size, Name)
>   0:( 0, GL_FLOAT_MAT4, 0, 3, var)
> ActiveBuffers=1, (Binding, NumVaryings, Stride, Stream):
>   0:( 0, 1, 192, 0)
>
> FWIW, in some cases we are also getting a slightly different number of
> Outputs. But I'm personally not really worried about that as long as it
> keeps working. The number of varyings is somewhat different as it is
> exposed through the program interface queries, so (I assume) it should
> be consistent.
>
>>
>>> but the following are not struct S { float f; vec4 v; };
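Editorial note: the numbers in the dump above follow from simple
arithmetic on the declared type. The sketch below reproduces them for
`mat4 var[3]` under stated assumptions: one output per matrix column,
32-bit float components, and tightly packed data. demo_varying and the
three helper functions are hypothetical names, not part of NIR or Mesa.

```c
#include <assert.h>

/* Hypothetical description of one varying of matrix type. */
struct demo_varying {
   unsigned array_len;   /* 3 for var[3] */
   unsigned columns;     /* 4 for mat4 */
   unsigned rows;        /* 4 components per column for mat4 */
};

/* One output per column of every array element. */
static unsigned demo_num_outputs(const struct demo_varying *v)
{
   return v->array_len * v->columns;
}

/* Total size in bytes, assuming 4-byte components and no padding. */
static unsigned demo_stride_bytes(const struct demo_varying *v)
{
   return v->array_len * v->columns * v->rows * 4;
}

/* Destination offset, in 32-bit words, of output number `index`. */
static unsigned demo_dst_offset_dwords(const struct demo_varying *v,
                                       unsigned index)
{
   return index * v->rows;
}
```

For `mat4 var[3]` this gives 12 outputs, a stride of 192 bytes and a
destination offset of 44 for the last output, while the variable still
counts as a single varying of Size 3.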
Re: [Mesa-dev] [PATCH mesa] xmlpool: update translation po files
On Mon, 12 Nov 2018 at 18:14, Dylan Baker wrote:
>
> Quoting Eric Engestrom (2018-11-12 09:47:22)
> > On Monday, 2018-11-12 16:56:32 +0000, Emil Velikov wrote:
> > > On Mon, 12 Nov 2018 at 14:24, Eric Engestrom wrote:
> > > >
> > > > These files are close to 4 years out of date; a lot's changed since.
> > > > Let's just check in a recently-regenerated version.
> > > >
> > > Worth removing them from git and letting the build regenerate them as
> > > needed?
> >
> > No, the point is for them to be filled with the translations.
> > They aren't 100% generated, they're more like "refreshed" by running the
> > ninja command, to add new strings to be translated and adjust file/line
> > references.
> >
> >
> > That said, I've just looked at the state of the translations, and
> > "partial" is already generous. Users would currently get a mostly
> > English driconf interface with a few strings translated here and there,
> > which I'm not sure is worth the hassle of maintaining all this.
> >
> > Should we just drop the translation infrastructure?
>
> I'd try pinging the people who provided the translations in the first place to
> see if they're interested in updating them. If not I'd be in favor of dropping
> unmaintained translations, if there are no maintained translations drop the
> whole thing.
>
> Just my 2¢
>

Very well said Dylan. I'm on the same page.

-Emil
Re: [Mesa-dev] [PATCH v3] virgl: native fence fd support
[This time with mesa-dev@ in the list, and fewer typos]

Hi Rob,

On Mon, 12 Nov 2018 at 15:14, Robert Foss wrote:

> +++ b/src/gallium/drivers/virgl/virgl_screen.c
> @@ -340,7 +340,7 @@ virgl_get_param(struct pipe_screen *screen, enum pipe_cap param)
>     case PIPE_CAP_VIDEO_MEMORY:
>        return 0;
>     case PIPE_CAP_NATIVE_FENCE_FD:
> -      return 0;
> +      return vscreen->vws->driver_version(vscreen->vws) >= 1;

It seems like the driver_version() vfunc is missing for the vtest
winsys. One could go with an empty stub or drop the function in favour
of a winsys variable (or bitmask). Personally, I'm leaning towards the
latter, although either will do.

> +static void virgl_fence_server_sync(struct virgl_winsys *vws,
> +                                    struct virgl_cmd_buf *cbuf,
> +                                    struct pipe_fence_handle *fence)
> +{
> +   struct virgl_hw_res *hw_res = virgl_hw_res(fence);
> +
> +   assert(hw_res->fence_fd != -1);
> +

Skimming through other drivers: they're not using an assert, so I'd
change this to an if statement.

> +   if (cbuf->in_fence_fd == -1) {
> +      cbuf->in_fence_fd = dup(hw_res->fence_fd);
> +   } else {
> +      int new_fd = sync_merge("virgl", cbuf->in_fence_fd, hw_res->fence_fd);
> +      close(cbuf->in_fence_fd);
> +      cbuf->in_fence_fd = new_fd;

The above if/else seems like an open-coded version of sync_accumulate().

Despite the above comments, the kernel interface seems reasonable IMHO.
It would be great if someone else double-checked it, though.

With the three bits handled the patch is:
Reviewed-by: Emil Velikov

HTH
Emil
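Editorial note: the dup-or-merge pattern discussed above can be written
as one helper that also covers the "no fence" case with an if rather
than an assert. The sketch below is an illustration, not the libsync
implementation: accumulate_fence, merge_fn and demo_merge are
hypothetical names, and demo_merge only duplicates its first argument so
the example runs on ordinary file descriptors instead of real sync
files.

```c
#include <assert.h>
#include <unistd.h>

/* Signature of a merge routine that returns a new fd, or -1 on error. */
typedef int (*merge_fn)(const char *name, int fd1, int fd2);

/* Stand-in merge for the example only: it ignores fd2 and duplicates
 * fd1. A real merge would return an fd that signals after both. */
static int demo_merge(const char *name, int fd1, int fd2)
{
   (void)name;
   (void)fd2;
   return dup(fd1);
}

/* Folds `fd` into the accumulated fence `*acc`.
 * - fd < 0: nothing to wait for, *acc is left alone.
 * - *acc < 0: *acc becomes a duplicate of fd.
 * - otherwise: *acc is replaced by merge(*acc, fd) and the old fd is
 *   closed. On merge failure *acc keeps its previous value.
 * Returns 0 on success and -1 on error. */
static int accumulate_fence(const char *name, int *acc, int fd, merge_fn merge)
{
   if (fd < 0)
      return 0;

   if (*acc < 0) {
      *acc = dup(fd);
      return *acc < 0 ? -1 : 0;
   }

   int merged = merge(name, *acc, fd);
   if (merged < 0)
      return -1;

   close(*acc);
   *acc = merged;
   return 0;
}
```

Keeping the old descriptor when the merge fails means the caller never
ends up waiting on a closed fd.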
[Mesa-dev] [Bug 107822] Just Cause 3 Flickering Textures with AMD RADV
https://bugs.freedesktop.org/show_bug.cgi?id=107822

--- Comment #6 from Alexander ---
I already have tested that.

--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
[Mesa-dev] [PATCH v2 5/5] intel/tools: avoid 'ignoring return value'
From: Andrii Simiklit

1. tools/i965_disasm.c:58:4: warning:
     ignoring return value of ‘fread’,
     declared with attribute warn_unused_result
     fread(assembly, *end, 1, fp);

v2: - Fixed incorrect return value check.
       ( Eric Engestrom )

Signed-off-by: Andrii Simiklit
---
 src/intel/tools/i965_disasm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/intel/tools/i965_disasm.c b/src/intel/tools/i965_disasm.c
index 73a6760fc1..329f6327ed 100644
--- a/src/intel/tools/i965_disasm.c
+++ b/src/intel/tools/i965_disasm.c
@@ -55,7 +55,8 @@ i965_disasm_read_binary(FILE *fp, size_t *end)
    if (assembly == NULL)
       return NULL;

-   fread(assembly, *end, 1, fp);
+   MAYBE_UNUSED size_t size = fread(assembly, *end, 1, fp);
+   assert((size || (*end == 0)) && "error: unable to read all elements!");

    fclose(fp);
    return assembly;
--
2.17.1
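As an aside on the return-value check in this patch: fread() returns the number of complete items read, so with an item size of `*end` and a count of 1 the result is 1 on success and 0 otherwise, which is why a zero-length request has to be special-cased. A minimal standalone sketch follows; `read_exact` is a hypothetical helper for illustration, not code from i965_disasm.c, and it returns an error instead of asserting.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the patch): read exactly `len` bytes from
 * `fp` into a newly allocated buffer. Returns NULL on allocation failure,
 * short read, or I/O error. */
static void *
read_exact(FILE *fp, size_t len)
{
   assert(fp != NULL);

   /* Allocate at least one byte so a zero-length read still yields a
    * valid, freeable pointer. */
   void *buf = malloc(len ? len : 1);
   if (buf == NULL)
      return NULL;

   /* With size == len and nmemb == 1, fread() returns 1 only if the whole
    * item was read. For len == 0 it returns 0 without that being an error. */
   size_t items = fread(buf, len, 1, fp);
   if (items != 1 && len != 0) {
      free(buf);
      return NULL;
   }

   return buf;
}
```

Returning NULL rather than asserting keeps the check active in release builds, where assert() compiles to nothing.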
[Mesa-dev] [PATCH v2 4/5] main: avoid 'may be used uninitialized' warnings
From: Andrii Simiklit

1. main/texcompress_etc.c:1314:12: warning:
     ‘*((void *)+2)’ may be used uninitialized in this function
2. main/texcompress_etc.c:1354:12: warning:
     ‘*((void *)+2)’ may be used uninitialized in this function
3. main/texcompress_etc.c:1293:12: warning:
     ‘dst’ may be used uninitialized in this function
4. main/texcompress_etc.c:1335:12: warning:
     ‘dst’ may be used uninitialized in this function
5. main/texcompress_etc.c:1460:12: warning:
     ‘*((void *)+1)’ may be used uninitialized in this function

v2: Fixed by adding an unreachable() case to etc2_rgb8_fetch_texel
      ( Eric Engestrom )
    Changes for the 'pixerrorcolorbest' warning were removed.

Signed-off-by: Andrii Simiklit
---
 src/mesa/main/texcompress_etc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/main/texcompress_etc.c b/src/mesa/main/texcompress_etc.c
index b39ab33d36..f1da4d0f11 100644
--- a/src/mesa/main/texcompress_etc.c
+++ b/src/mesa/main/texcompress_etc.c
@@ -548,6 +548,7 @@ etc2_rgb8_fetch_texel(const struct etc2_block *block,
       if (punchthrough_alpha)
          dst[3] = 255;
    }
+   else unreachable("unhandled block mode");
 }

 static void
--
2.17.1
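To illustrate why a trailing unreachable() branch can silence a 'may be used uninitialized' warning, here is a small sketch under stated assumptions. The `UNREACHABLE` macro below is a simplified GCC/Clang-style stand-in written for this example, and the enum and function names are hypothetical; none of it is taken from texcompress_etc.c.

```c
#include <assert.h>

/* Assumption: simplified stand-in for an unreachable() macro. In debug
 * builds it asserts; in release builds it tells the compiler the path
 * cannot be taken. */
#if defined(__GNUC__)
#define UNREACHABLE(msg) do { assert(!msg); __builtin_unreachable(); } while (0)
#else
#define UNREACHABLE(msg) assert(!msg)
#endif

/* Hypothetical block modes, for illustration only. */
enum block_mode { MODE_INDIVIDUAL, MODE_T, MODE_PLANAR };

static int
mode_weight(enum block_mode mode)
{
   int w; /* deliberately not initialized */

   if (mode == MODE_INDIVIDUAL)
      w = 1;
   else if (mode == MODE_T)
      w = 2;
   else if (mode == MODE_PLANAR)
      w = 3;
   else
      UNREACHABLE("unhandled block mode");

   /* Without the final else, the compiler sees a path on which `w` is
    * never written and may warn here. */
   return w;
}
```

The final branch documents that the if/else chain is meant to be exhaustive, so the compiler no longer has to consider a path that skips every assignment.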
[Mesa-dev] [PATCH v2 2/5] compiler: avoid 'unused variable'
From: Andrii Simiklit

1. nir/nir_lower_vars_to_ssa.c:691:21: warning:
     unused variable ‘var’
     nir_variable *var = path->path[0]->var;

v2: Changes for some of the 'may be used uninitialized' warnings
    were removed; they seem to be a compiler issue.
      ( Eric Engestrom )
    Possibly like this one:
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46684
    That issue is flagged as a duplicate, but the original one is not
    closed yet.

Signed-off-by: Andrii Simiklit
---
 src/compiler/nir/nir_lower_vars_to_ssa.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/compiler/nir/nir_lower_vars_to_ssa.c b/src/compiler/nir/nir_lower_vars_to_ssa.c
index 8e517a7895..646efd9ad8 100644
--- a/src/compiler/nir/nir_lower_vars_to_ssa.c
+++ b/src/compiler/nir/nir_lower_vars_to_ssa.c
@@ -683,10 +683,9 @@ nir_lower_vars_to_ssa_impl(nir_function_impl *impl)
       nir_deref_path *path = >path;

       assert(path->path[0]->deref_type == nir_deref_type_var);
-      nir_variable *var = path->path[0]->var;

       /* We don't build deref nodes for non-local variables */
-      assert(var->data.mode == nir_var_local);
+      assert(path->path[0]->var->data.mode == nir_var_local);

       if (path_may_be_aliased(path, )) {
          exec_node_remove(>direct_derefs_link);
--
2.17.1
[Mesa-dev] [PATCH v2 3/5] i965: avoid 'unused variable'
From: Andrii Simiklit 1. brw_pipe_control.c:311:34: warning: unused variable ‘devinfo’ 2. brw_program_binary.c:209:19: warning: unused variable ‘gen_size’ 3. brw_program_binary.c:216:19: warning: unused variable ‘nir_size’ v2: Changes for unreproducible issues were removed Signed-off-by: Andrii Simiklit --- src/mesa/drivers/dri/i965/brw_pipe_control.c | 2 +- src/mesa/drivers/dri/i965/brw_program_binary.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c index 122ac26070..a3f521b5ae 100644 --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c @@ -308,7 +308,7 @@ brw_emit_depth_stall_flushes(struct brw_context *brw) void gen7_emit_vs_workaround_flush(struct brw_context *brw) { - const struct gen_device_info *devinfo = >screen->devinfo; + MAYBE_UNUSED const struct gen_device_info *devinfo = >screen->devinfo; assert(devinfo->gen == 7); brw_emit_pipe_control_write(brw, diff --git a/src/mesa/drivers/dri/i965/brw_program_binary.c b/src/mesa/drivers/dri/i965/brw_program_binary.c index db03332241..1298d9e765 100644 --- a/src/mesa/drivers/dri/i965/brw_program_binary.c +++ b/src/mesa/drivers/dri/i965/brw_program_binary.c @@ -206,14 +206,14 @@ brw_program_deserialize_driver_blob(struct gl_context *ctx, break; switch ((enum driver_cache_blob_part)part_type) { case GEN_PART: { - uint32_t gen_size = blob_read_uint32(); + MAYBE_UNUSED uint32_t gen_size = blob_read_uint32(); assert(!reader.overrun && (uintptr_t)(reader.end - reader.current) > gen_size); deserialize_gen_program(, ctx, prog, stage); break; } case NIR_PART: { - uint32_t nir_size = blob_read_uint32(); + MAYBE_UNUSED uint32_t nir_size = blob_read_uint32(); assert(!reader.overrun && (uintptr_t)(reader.end - reader.current) > nir_size); const struct nir_shader_compiler_options *options = -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org 
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 0/5] mesa: fix against several compilation warnings
From: Andrii Simiklit

Fixes several compilation warnings in the release configuration.

v2: the patch '1/4' was split into '1/5' and '5/5'
      ( Eric Engestrom )

Andrii Simiklit (5):
  intel/tools: avoid 'unused variable' warnings
  compiler: avoid 'unused variable'
  i965: avoid 'unused variable'
  main: avoid 'may be used uninitialized' warnings
  intel/tools: avoid 'ignoring return value'

 src/compiler/nir/nir_lower_vars_to_ssa.c       |  3 +--
 src/intel/tools/aub_mem.c                      | 10 ++
 src/intel/tools/aub_read.c                     |  3 ++-
 src/intel/tools/i965_disasm.c                  |  3 ++-
 src/mesa/drivers/dri/i965/brw_pipe_control.c   |  2 +-
 src/mesa/drivers/dri/i965/brw_program_binary.c |  4 ++--
 src/mesa/main/texcompress_etc.c                |  1 +
 7 files changed, 15 insertions(+), 11 deletions(-)

--
2.17.1
[Mesa-dev] [PATCH v2 1/5] intel/tools: avoid 'unused variable' warnings
From: Andrii Simiklit 1. tools/aub_read.c:271:31: warning: unused variable ‘end’ const uint32_t *p = data, *end = data + data_len, *next; 2. tools/aub_mem.c:292:13: warning: unused variable ‘res’ void *res = mmap((uint8_t *)bo.map + map_offset, 4096, PROT_READ, tools/aub_mem.c:357:13: warning: unused variable ‘res’ void *res = mmap((uint8_t *)bo.map + (page - bo.addr), 4096, PROT_READ, v2: The i965_disasm.c changes was moved into a separate patch The 'end' variable declared separately with MAYBE_UNUSED to avoid effect of it to other variables. ( Eric Engestrom ) Signed-off-by: Andrii Simiklit --- src/intel/tools/aub_mem.c | 10 ++ src/intel/tools/aub_read.c | 3 ++- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/src/intel/tools/aub_mem.c b/src/intel/tools/aub_mem.c index 58b51b78a5..98e14219c5 100644 --- a/src/intel/tools/aub_mem.c +++ b/src/intel/tools/aub_mem.c @@ -289,8 +289,9 @@ aub_mem_get_ggtt_bo(void *_mem, uint64_t address) continue; uint32_t map_offset = i->virt_addr - address; - void *res = mmap((uint8_t *)bo.map + map_offset, 4096, PROT_READ, - MAP_SHARED | MAP_FIXED, mem->mem_fd, phys_mem->fd_offset); + MAYBE_UNUSED void *res = +mmap((uint8_t *)bo.map + map_offset, 4096, PROT_READ, + MAP_SHARED | MAP_FIXED, mem->mem_fd, phys_mem->fd_offset); assert(res != MAP_FAILED); } @@ -354,8 +355,9 @@ aub_mem_get_ppgtt_bo(void *_mem, uint64_t address) for (uint64_t page = address; page < end; page += 4096) { struct phys_mem *phys_mem = ppgtt_walk(mem, mem->pml4, page); - void *res = mmap((uint8_t *)bo.map + (page - bo.addr), 4096, PROT_READ, - MAP_SHARED | MAP_FIXED, mem->mem_fd, phys_mem->fd_offset); + MAYBE_UNUSED void *res = +mmap((uint8_t *)bo.map + (page - bo.addr), 4096, PROT_READ, + MAP_SHARED | MAP_FIXED, mem->mem_fd, phys_mem->fd_offset); assert(res != MAP_FAILED); } diff --git a/src/intel/tools/aub_read.c b/src/intel/tools/aub_read.c index d83e88ddce..552ca2cc62 100644 --- a/src/intel/tools/aub_read.c +++ b/src/intel/tools/aub_read.c @@ 
-294,7 +294,8 @@ handle_memtrace_mem_write(struct aub_read *read, const uint32_t *p) int aub_read_command(struct aub_read *read, const void *data, uint32_t data_len) { - const uint32_t *p = data, *end = data + data_len, *next; + const uint32_t *p = data, *next; + MAYBE_UNUSED const uint32_t *end = data + data_len; uint32_t h, header_length, bias; assert(data_len >= 4); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
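For context on why these variables only become "unused" in release builds: when NDEBUG is defined, assert() expands to nothing, so a variable that is read only inside an assert no longer has any use. The sketch below shows the pattern; the `MAYBE_UNUSED` definition is an assumption for GCC/Clang-style compilers (Mesa's own macro may be defined differently), and `sum_words` is a hypothetical function, not code from aub_read.c.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC/Clang-style definition, for illustration only. */
#if defined(__GNUC__)
#define MAYBE_UNUSED __attribute__((unused))
#else
#define MAYBE_UNUSED
#endif

/* Hypothetical function: sum the first `len` entries of `data`. */
static unsigned
sum_words(const unsigned *data, size_t len)
{
   /* `end` is read only by the assert below. Declaring it on its own line
    * keeps the attribute from applying to any other declarator. */
   MAYBE_UNUSED const unsigned *end = data + len;
   unsigned total = 0;

   for (size_t i = 0; i < len; i++) {
      assert(&data[i] < end);
      total += data[i];
   }

   return total;
}
```

Building this with -DNDEBUG -Wall and then removing MAYBE_UNUSED should reproduce the kind of warning the patch addresses.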
Re: [Mesa-dev] [PATCH v4 01/10] intel/genxml: Add engine definition to render engine instructions (gen4)
For all the xml changes:

Reviewed-by: Lionel Landwerlin
Re: [Mesa-dev] [PATCH v2 3/4] dri: add AYUV format
On 11/13/18 1:43 PM, Lionel Landwerlin wrote:
> I think this chunk (or the whole patch) should be cherry-picked to stable.
> Otherwise we get a BAD_ATTRIBUTE error for trying to create an AYUV
> EGLImage. We should have BAD_MATCH instead.

Or should we change the reported error code in places where this is called?
That seems like an existing bug; things wouldn't work correctly if someone
adds a new format to drm_fourcc.h.

> - Lionel
>
> On 09/11/2018 10:55, Lionel Landwerlin wrote:
> > diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
> > index 87e1a704c6e..3b63aebbf9a 100644
> > --- a/src/egl/drivers/dri2/egl_dri2.c
> > +++ b/src/egl/drivers/dri2/egl_dri2.c
> > @@ -2278,6 +2278,7 @@ dri2_num_fourcc_format_planes(EGLint format)
> >     case DRM_FORMAT_YVYU:
> >     case DRM_FORMAT_UYVY:
> >     case DRM_FORMAT_VYUY:
> > +   case DRM_FORMAT_AYUV:
> >        return 1;
> >
> >     case DRM_FORMAT_NV12:
[Mesa-dev] [Bug 32211] [GLSL] lower_jumps with continue-statements in for-loops prevents loop unrolling
https://bugs.freedesktop.org/show_bug.cgi?id=32211 Danylo changed: What|Removed |Added CC||danylo.pilia...@gmail.com --- Comment #12 from Danylo --- (In reply to Timothy Arceri from comment #11) > > So all we need to do is move everything after the if into the else block and > remove the continue. Removing myself as assignee, this would probably be a > good beginners task. Hi, I've tried to do this and it works for me however it alone doesn't solve the problem. Consider the resulting nir: loop { block block_1: /* preds: block_0 block_7 */ vec1 32 ssa_8 = phi block_0: ssa_4, block_7: ssa_20 vec1 32 ssa_9 = phi block_0: ssa_0, block_7: ssa_4 vec1 32 ssa_10 = phi block_0: ssa_1, block_7: ssa_4 vec1 32 ssa_11 = phi block_0: ssa_2, block_7: ssa_21 vec1 32 ssa_12 = phi block_0: ssa_3, block_7: ssa_22 vec4 32 ssa_13 = vec4 ssa_12, ssa_11, ssa_10, ssa_9 vec1 32 ssa_14 = ige ssa_8, ssa_5 /* succs: block_2 block_3 */ if ssa_14 { block block_2: /* preds: block_1 */ break /* succs: block_8 */ } else { block block_3: /* preds: block_1 */ /* succs: block_4 */ } block block_4: /* preds: block_3 */ vec1 32 ssa_15 = ilt ssa_6, ssa_8 /* succs: block_5 block_6 */ if ssa_15 { block block_5: /* preds: block_4 */ vec1 32 ssa_16 = iadd ssa_8, ssa_7 vec1 32 ssa_17 = load_const (0x3f80 /* 1.00 */) /* succs: block_7 */ } else { block block_6: /* preds: block_4 */ vec1 32 ssa_18 = iadd ssa_8, ssa_7 vec1 32 ssa_19 = load_const (0x3f80 /* 1.00 */) /* succs: block_7 */ } block block_7: /* preds: block_5 block_6 */ vec1 32 ssa_20 = phi block_5: ssa_16, block_6: ssa_18 vec1 32 ssa_21 = phi block_5: ssa_17, block_6: ssa_4 vec1 32 ssa_22 = phi block_5: ssa_4, block_6: ssa_19 /* succs: block_1 */ } Now in both "if" (block_5) and "else" (block_6) blocks there are identical expressions and no continue. However there is no optimization to pull these identical expressions out - CSE pass won't do this since it's a local CSE, only global CSE would help. 
And there is no active global CSE pass. There is only the Global Code Motion
pass with Global Value Numbering, and it is not enabled. Enabling it optimizes
the condition in question, and in the end the whole loop disappears as
expected. However, this pass doesn't look like something we want, since in
other cases it hurts shaders and it does more than just global CSE.

Any opinions on this?

--
You are receiving this mail because:
You are the assignee for the bug.
Re: [Mesa-dev] [PATCH v3] i965: Fix calculation of layers array length for isl_view
Hello, Could anyone look at the patch? Thanks! On 10/24/18 2:22 PM, Danylo Piliaiev wrote: I have made a Piglit test that exercises the issue: https://patchwork.freedesktop.org/patch/258180/ - Danil On 9/10/18 6:21 PM, Danylo Piliaiev wrote: Handle all cases in calculation of layers count for isl_view taking into account texture view and image unit. st_convert_image was taken as a reference. When u->Layered is true the whole level is taken with respect to image view. In other case only one layer is taken. v3: (Józef Kucia and Ilia Mirkin) - Rewrote patch by taking st_convert_image as a reference - Removed now unused get_image_num_layers function - Changed commit message Fixes: 5a8c8903 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107856 Signed-off-by: Danylo Piliaiev --- .../drivers/dri/i965/brw_wm_surface_state.c | 32 ++- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c index 944762ec46..9bfe6e2037 100644 --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c @@ -1499,18 +1499,6 @@ update_buffer_image_param(struct brw_context *brw, param->stride[0] = _mesa_get_format_bytes(u->_ActualFormat); } -static unsigned -get_image_num_layers(const struct intel_mipmap_tree *mt, GLenum target, - unsigned level) -{ - if (target == GL_TEXTURE_CUBE_MAP) - return 6; - - return target == GL_TEXTURE_3D ? - minify(mt->surf.logical_level0_px.depth, level) : - mt->surf.logical_level0_px.array_len; -} - static void update_image_surface(struct brw_context *brw, struct gl_image_unit *u, @@ -1541,14 +1529,28 @@ update_image_surface(struct brw_context *brw, } else { struct intel_texture_object *intel_obj = intel_texture_object(obj); struct intel_mipmap_tree *mt = intel_obj->mt; - const unsigned num_layers = u->Layered ? 
- get_image_num_layers(mt, obj->Target, u->Level) : 1; + + unsigned base_layer, num_layers; + if (u->Layered) { + if (obj->Target == GL_TEXTURE_3D) { + base_layer = 0; + num_layers = minify(mt->surf.logical_level0_px.depth, u->Level); + } else { + base_layer = obj->MinLayer; + num_layers = obj->Immutable ? + obj->NumLayers : + mt->surf.logical_level0_px.array_len; + } + } else { + base_layer = obj->MinLayer + u->_Layer; + num_layers = 1; + } struct isl_view view = { .format = format, .base_level = obj->MinLevel + u->Level, .levels = 1, - .base_array_layer = obj->MinLayer + u->_Layer, + .base_array_layer = base_layer, .array_len = num_layers, .swizzle = ISL_SWIZZLE_IDENTITY, .usage = ISL_SURF_USAGE_STORAGE_BIT, ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
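The layer selection described in this commit message can be restated as a small standalone function. This is only a sketch of the logic as the patch text describes it: `struct tex_state`, `struct layer_range`, `minify_dim` and `image_layer_range` are hypothetical simplifications introduced here, not the i965 or GL types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the texture state the patch reads. */
struct tex_state {
   bool is_3d;           /* target is a 3D texture */
   bool immutable;       /* texture view / immutable storage */
   unsigned min_layer;   /* first layer of the view */
   unsigned view_layers; /* number of layers in the view */
   unsigned depth0;      /* level-0 depth of a 3D texture */
   unsigned array_len;   /* array length of the underlying surface */
};

struct layer_range {
   unsigned base;
   unsigned count;
};

/* Size of a dimension at a given mip level, never smaller than 1. */
static unsigned
minify_dim(unsigned size, unsigned level)
{
   unsigned v = size >> level;
   return v > 0 ? v : 1;
}

static struct layer_range
image_layer_range(const struct tex_state *t, bool layered,
                  unsigned level, unsigned layer)
{
   struct layer_range r;

   assert(t != NULL);

   if (layered) {
      if (t->is_3d) {
         /* A layered 3D image exposes every slice of the chosen level. */
         r.base = 0;
         r.count = minify_dim(t->depth0, level);
      } else {
         /* A layered array image exposes the whole view. */
         r.base = t->min_layer;
         r.count = t->immutable ? t->view_layers : t->array_len;
      }
   } else {
      /* A non-layered binding exposes exactly one layer. */
      r.base = t->min_layer + layer;
      r.count = 1;
   }

   return r;
}
```

For example, a layered binding of level 2 of a 3D texture with depth 8 yields base 0 and count 2, while a non-layered binding of layer 3 in a view starting at layer 2 yields base 5 and count 1.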
Re: [Mesa-dev] [PATCH v2 3/4] dri: add AYUV format
On 13/11/2018 12:04, Tapani Pälli wrote:
>
> On 11/13/18 1:43 PM, Lionel Landwerlin wrote:
> > I think this chunk (or the whole patch) should be cherry-picked to stable.
> > Otherwise we get a BAD_ATTRIBUTE error for trying to create an AYUV
> > EGLImage. We should have BAD_MATCH instead.
>
> Or should we change the reported error code in places where this is called?
> That seems like an existing bug; things wouldn't work correctly if someone
> adds a new format to drm_fourcc.h.

Sounds fair, running a patch through CI.

> > - Lionel
> >
> > On 09/11/2018 10:55, Lionel Landwerlin wrote:
> > > diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
> > > index 87e1a704c6e..3b63aebbf9a 100644
> > > --- a/src/egl/drivers/dri2/egl_dri2.c
> > > +++ b/src/egl/drivers/dri2/egl_dri2.c
> > > @@ -2278,6 +2278,7 @@ dri2_num_fourcc_format_planes(EGLint format)
> > >     case DRM_FORMAT_YVYU:
> > >     case DRM_FORMAT_UYVY:
> > >     case DRM_FORMAT_VYUY:
> > > +   case DRM_FORMAT_AYUV:
> > >        return 1;
> > >
> > >     case DRM_FORMAT_NV12:
Re: [Mesa-dev] [PATCH v3] virgl: Add command and flags to initiate debugging on the host (v2)
On Wed, 2018-09-12 at 11:59 +0200, Gert Wollny wrote: > From: Gert Wollny > > On the host VREND_DEBUG=guestallow must be set to let the guest > override > the debug flags. > > v2: Send flag string instead of flags, this avoids the need to keep > the flags in sync. > v3: Only request host logging if the host actually understands the > command > > Signed-off-by: Gert Wollny Looks good to me! Reviewed-by: Erik Faye-Lund > --- > The corresponding virglrenderer patches can be found in this MR: > https://gitlab.freedesktop.org/virgl/virglrenderer/merge_requests/39 > > Thanks for reviewing, > Gert > > src/gallium/drivers/virgl/virgl_context.c | 8 > src/gallium/drivers/virgl/virgl_encode.c | 24 > > src/gallium/drivers/virgl/virgl_encode.h | 3 +++ > src/gallium/drivers/virgl/virgl_hw.h | 1 + > src/gallium/drivers/virgl/virgl_protocol.h | 1 + > 5 files changed, 37 insertions(+) > > diff --git a/src/gallium/drivers/virgl/virgl_context.c > b/src/gallium/drivers/virgl/virgl_context.c > index 4511bf3b2f..96932c473d 100644 > --- a/src/gallium/drivers/virgl/virgl_context.c > +++ b/src/gallium/drivers/virgl/virgl_context.c > @@ -1164,6 +1164,7 @@ struct pipe_context > *virgl_context_create(struct pipe_screen *pscreen, > struct virgl_context *vctx; > struct virgl_screen *rs = virgl_screen(pscreen); > vctx = CALLOC_STRUCT(virgl_context); > + const char *host_debug_flagstring; > > vctx->cbuf = rs->vws->cmd_buf_create(rs->vws); > if (!vctx->cbuf) { > @@ -1268,6 +1269,13 @@ struct pipe_context > *virgl_context_create(struct pipe_screen *pscreen, > virgl_encoder_create_sub_ctx(vctx, vctx->hw_sub_ctx_id); > > virgl_encoder_set_sub_ctx(vctx, vctx->hw_sub_ctx_id); > + > + if (rs->caps.caps.v2.capability_bits & > VIRGL_CAP_GUEST_MAY_INIT_LOG) { > + host_debug_flagstring = getenv("VIRGL_HOST_DEBUG"); > + if (host_debug_flagstring) > + virgl_encode_host_debug_flagstring(vctx, > host_debug_flagstring); > + } > + > return >base; > fail: > return NULL; > diff --git 
a/src/gallium/drivers/virgl/virgl_encode.c > b/src/gallium/drivers/virgl/virgl_encode.c > index e86d0711a5..400ba68474 100644 > --- a/src/gallium/drivers/virgl/virgl_encode.c > +++ b/src/gallium/drivers/virgl/virgl_encode.c > @@ -1044,3 +1044,27 @@ int virgl_encode_texture_barrier(struct > virgl_context *ctx, > virgl_encoder_write_dword(ctx->cbuf, flags); > return 0; > } > + > +int virgl_encode_host_debug_flagstring(struct virgl_context *ctx, > + char *flagstring) > +{ > + unsigned long slen = strlen(flagstring) + 1; > + uint32_t sslen; > + uint32_t string_length; > + > + if (!slen) > + return 0; > + > + if (slen > 4 * 0x) { > + debug_printf("VIRGL: host debug flag string too long, will be > truncated\n"); > + slen = 4 * 0x; > + } > + > + sslen = (uint32_t )(slen + 3) / 4; > + string_length = (uint32_t)MIN2(sslen * 4, slen); > + > + virgl_encoder_write_cmd_dword(ctx, > VIRGL_CMD0(VIRGL_CCMD_SET_DEBUG_FLAGS, 0, sslen)); > + virgl_encoder_write_block(ctx->cbuf, (uint8_t *)flagstring, > string_length); > + > + return 0; > +} > diff --git a/src/gallium/drivers/virgl/virgl_encode.h > b/src/gallium/drivers/virgl/virgl_encode.h > index 40e62d453b..80b943a6b3 100644 > --- a/src/gallium/drivers/virgl/virgl_encode.h > +++ b/src/gallium/drivers/virgl/virgl_encode.h > @@ -276,4 +276,7 @@ int virgl_encode_launch_grid(struct virgl_context > *ctx, > const struct pipe_grid_info > *grid_info); > int virgl_encode_texture_barrier(struct virgl_context *ctx, > unsigned flags); > + > +int virgl_encode_host_debug_flagstring(struct virgl_context *ctx, > + char *envname); > #endif > diff --git a/src/gallium/drivers/virgl/virgl_hw.h > b/src/gallium/drivers/virgl/virgl_hw.h > index 7736ceb935..e682c750e7 100644 > --- a/src/gallium/drivers/virgl/virgl_hw.h > +++ b/src/gallium/drivers/virgl/virgl_hw.h > @@ -231,6 +231,7 @@ enum virgl_formats { > #define VIRGL_CAP_SHADER_CLOCK (1 << 11) > #define VIRGL_CAP_TEXTURE_BARRIER (1 << 12) > #define VIRGL_CAP_TGSI_COMPONENTS (1 << 13) > +#define 
VIRGL_CAP_GUEST_MAY_INIT_LOG (1 << 14) > > /* virgl bind flags - these are compatible with mesa 10.5 gallium. > * but are fixed, no other should be passed to virgl either. > diff --git a/src/gallium/drivers/virgl/virgl_protocol.h > b/src/gallium/drivers/virgl/virgl_protocol.h > index 8d99c5ed47..3373121bf7 100644 > --- a/src/gallium/drivers/virgl/virgl_protocol.h > +++ b/src/gallium/drivers/virgl/virgl_protocol.h > @@ -92,6 +92,7 @@ enum virgl_context_cmd { > VIRGL_CCMD_SET_FRAMEBUFFER_STATE_NO_ATTACH, > VIRGL_CCMD_TEXTURE_BARRIER, > VIRGL_CCMD_SET_ATOMIC_BUFFERS, > + VIRGL_CCMD_SET_DEBUG_FLAGS, > }; > > /*
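The quoted virgl_encode_host_debug_flagstring() sizes its command in 32-bit dwords by rounding the string length (including the terminating NUL) up to a multiple of four. A minimal sketch of just that arithmetic, with a hypothetical helper name and no virgl types involved:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of 32-bit dwords needed to hold `s`
 * including its terminating NUL, rounded up. */
static uint32_t
dwords_for_string(const char *s)
{
   assert(s != NULL);

   size_t bytes = strlen(s) + 1;       /* payload plus NUL */
   return (uint32_t)((bytes + 3) / 4); /* round up to whole dwords */
}
```

A three-character string plus NUL fits exactly in one dword, while a four-character string needs two because of the terminator.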
Re: [Mesa-dev] [PATCH 2/3] radv: make use of nir_move_out_const_to_consumer()
Reviewed-by: Samuel Pitoiset

On 11/7/18 5:20 AM, Timothy Arceri wrote:
> vkpipeline-db results:
>
> Totals from affected shaders:
> SGPRS: 28400 -> 28576 (0.62 %)
> VGPRS: 27916 -> 27692 (-0.80 %)
> Spilled SGPRs: 140 -> 138 (-1.43 %)
> Spilled VGPRs: 0 -> 0 (0.00 %)
> Private memory VGPRs: 0 -> 0 (0.00 %)
> Scratch size: 0 -> 0 (0.00 %) dwords per thread
> Code Size: 1534456 -> 1520560 (-0.91 %) bytes
> LDS: 0 -> 0 (0.00 %) blocks
> Max Waves: 3541 -> 3582 (1.16 %)
> Wait states: 0 -> 0 (0.00 %)
> ---
>  src/amd/vulkan/radv_pipeline.c | 4
>  1 file changed, 4 insertions(+)
>
> diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
> index bced19573c1..12e7f43bde7 100644
> --- a/src/amd/vulkan/radv_pipeline.c
> +++ b/src/amd/vulkan/radv_pipeline.c
> @@ -1814,6 +1814,10 @@ radv_link_shaders(struct radv_pipeline *pipeline, nir_shader **shaders)
>           nir_lower_io_arrays_to_elements(ordered_shaders[i],
>                                           ordered_shaders[i - 1]);
>
> +         if (nir_move_out_const_to_consumer(ordered_shaders[i],
> +                                            ordered_shaders[i - 1]))
> +            radv_optimize_nir(ordered_shaders[i - 1], false, false);
> +
>           nir_remove_dead_variables(ordered_shaders[i],
>                                     nir_var_shader_out);
>           nir_remove_dead_variables(ordered_shaders[i - 1],
Re: [Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf
On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote:
> Quoting Erik Faye-Lund (2018-11-12 04:51:47)
> > On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote:
> > > Which has the same behavior.
> >
> > Does it? I'm not so sure... IROUND_POS seems to round to nearest
> > integer depending on the FPU rounding mode, _mesa_roundevenf rounds to
> > the nearest *even* value regardless of the FPU rounding mode, no?
> >
> > I'm not sure if it matters or not, but *at least* point that out in the
> > commit message. Unless I'm missing something, of course...
>
> I should put it in the commit message, but there is a comment in rounding.h
> that if you change the rounding mode you get to keep the pieces.

Well, this might regress performance pretty badly. Especially in the
swrast code, this could be bad...
Re: [Mesa-dev] [PATCH] egl/dri: fix error value with unknown drm format
On 13/11/2018 15:43, Emil Velikov wrote:
> On Tue, 13 Nov 2018 at 14:11, Lionel Landwerlin wrote:
> > According to the EGL_EXT_image_dma_buf_import spec, creating an EGL
> > image with a DRM format not supported should yield the BAD_MATCH
> > error:
> >
> > "
> >    * If <target> is EGL_LINUX_DMA_BUF_EXT, and the EGL_LINUX_DRM_FOURCC_EXT
> >      attribute is set to a format not supported by the EGL, EGL_BAD_MATCH
> >      is generated.
> > "
> >
> > Signed-off-by: Lionel Landwerlin
> > Fixes: 20de7f9f226401 ("egl/dri2: support for creating images out of dma buffers")
>
> Reviewed-by: Emil Velikov
>
> Great catch Lionel. Out of curiosity, how did you spot this?
>
> -Emil

Running:

   ext_image_dma_buf_import-sample_yuv -fmt=AYUV

on an older driver.

- Lionel
[Mesa-dev] [PATCH 22/22] nir/spirv: handle OpBitcasts for pointers
Signed-off-by: Karol Herbst --- src/compiler/spirv/spirv_to_nir.c | 5 +- src/compiler/spirv/vtn_alu.c | 187 +- src/compiler/spirv/vtn_private.h | 3 + 3 files changed, 115 insertions(+), 80 deletions(-) diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c index 8c341e9c1fa..cbd40df7473 100644 --- a/src/compiler/spirv/spirv_to_nir.c +++ b/src/compiler/spirv/spirv_to_nir.c @@ -4067,7 +4067,6 @@ vtn_handle_body_instruction(struct vtn_builder *b, SpvOp opcode, case SpvOpConvertUToPtr: case SpvOpPtrCastToGeneric: case SpvOpGenericCastToPtr: - case SpvOpBitcast: case SpvOpIsNan: case SpvOpIsInf: case SpvOpIsFinite: @@ -4152,6 +4151,10 @@ vtn_handle_body_instruction(struct vtn_builder *b, SpvOp opcode, vtn_handle_alu(b, opcode, w, count); break; + case SpvOpBitcast: + vtn_handle_bitcast(b, opcode, w, count); + break; + case SpvOpVectorExtractDynamic: case SpvOpVectorInsertDynamic: case SpvOpVectorShuffle: diff --git a/src/compiler/spirv/vtn_alu.c b/src/compiler/spirv/vtn_alu.c index 32825da29cb..e1088a7e9db 100644 --- a/src/compiler/spirv/vtn_alu.c +++ b/src/compiler/spirv/vtn_alu.c @@ -211,81 +211,6 @@ vtn_handle_matrix_alu(struct vtn_builder *b, SpvOp opcode, } } -static void -vtn_handle_bitcast(struct vtn_builder *b, struct vtn_ssa_value *dest, - struct nir_ssa_def *src) -{ - if (glsl_get_vector_elements(dest->type) == src->num_components) { - /* From the definition of OpBitcast in the SPIR-V 1.2 spec: - * - * "If Result Type has the same number of components as Operand, they - * must also have the same component width, and results are computed per - * component." - */ - dest->def = nir_imov(>nb, src); - return; - } - - /* From the definition of OpBitcast in the SPIR-V 1.2 spec: -* -* "If Result Type has a different number of components than Operand, the -* total number of bits in Result Type must equal the total number of bits -* in Operand. 
Let L be the type, either Result Type or Operand’s type, that -* has the larger number of components. Let S be the other type, with the -* smaller number of components. The number of components in L must be an -* integer multiple of the number of components in S. The first component -* (that is, the only or lowest-numbered component) of S maps to the first -* components of L, and so on, up to the last component of S mapping to the -* last components of L. Within this mapping, any single component of S -* (mapping to multiple components of L) maps its lower-ordered bits to the -* lower-numbered components of L." -*/ - unsigned src_bit_size = src->bit_size; - unsigned dest_bit_size = glsl_get_bit_size(dest->type); - unsigned src_components = src->num_components; - unsigned dest_components = glsl_get_vector_elements(dest->type); - vtn_assert(src_bit_size * src_components == dest_bit_size * dest_components); - - nir_ssa_def *dest_chan[NIR_MAX_VEC_COMPONENTS]; - if (src_bit_size > dest_bit_size) { - vtn_assert(src_bit_size % dest_bit_size == 0); - unsigned divisor = src_bit_size / dest_bit_size; - for (unsigned comp = 0; comp < src_components; comp++) { - nir_ssa_def *split; - if (src_bit_size == 64) { -assert(dest_bit_size == 32 || dest_bit_size == 16); -split = dest_bit_size == 32 ? 
- nir_unpack_64_2x32(>nb, nir_channel(>nb, src, comp)) : - nir_unpack_64_4x16(>nb, nir_channel(>nb, src, comp)); - } else { -vtn_assert(src_bit_size == 32); -vtn_assert(dest_bit_size == 16); -split = nir_unpack_32_2x16(>nb, nir_channel(>nb, src, comp)); - } - for (unsigned i = 0; i < divisor; i++) -dest_chan[divisor * comp + i] = nir_channel(>nb, split, i); - } - } else { - vtn_assert(dest_bit_size % src_bit_size == 0); - unsigned divisor = dest_bit_size / src_bit_size; - for (unsigned comp = 0; comp < dest_components; comp++) { - unsigned channels = ((1 << divisor) - 1) << (comp * divisor); - nir_ssa_def *src_chan = nir_channels(>nb, src, channels); - if (dest_bit_size == 64) { -assert(src_bit_size == 32 || src_bit_size == 16); -dest_chan[comp] = src_bit_size == 32 ? - nir_pack_64_2x32(>nb, src_chan) : - nir_pack_64_4x16(>nb, src_chan); - } else { -vtn_assert(dest_bit_size == 32); -vtn_assert(src_bit_size == 16); -dest_chan[comp] = nir_pack_32_2x16(>nb, src_chan); - } - } - } - dest->def = nir_vec(>nb, dest_chan, dest_components); -} - nir_op vtn_nir_alu_op_for_spirv_opcode(struct vtn_builder *b, SpvOp opcode, bool *swap, @@ -451,6 +376,114 @@ handle_rounding_mode(struct
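The OpBitcast wording quoted in this patch (lower-ordered bits of the wide component map to the lower-numbered narrow components) can be checked with plain integers. The sketch below is only an illustration of that mapping for the 2x32 <-> 64 case; `bitcast_2x32_to_64` and `bitcast_64_to_2x32` are hypothetical C helpers and are not the NIR builder functions used in vtn_alu.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: component 0 supplies the low 32 bits and
 * component 1 the high 32 bits of the 64-bit result. */
static uint64_t
bitcast_2x32_to_64(const uint32_t comps[2])
{
   assert(comps != 0);
   return (uint64_t)comps[0] | ((uint64_t)comps[1] << 32);
}

/* Hypothetical helper: the inverse mapping, low bits to component 0. */
static void
bitcast_64_to_2x32(uint64_t value, uint32_t comps[2])
{
   assert(comps != 0);
   comps[0] = (uint32_t)(value & 0xffffffffu);
   comps[1] = (uint32_t)(value >> 32);
}
```

Round-tripping a value through both helpers returns the original components, which is the property a bitcast between these two shapes is expected to have.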
[Mesa-dev] [PATCH 02/22] nir: replace nir_load_system_value calls with appropiate builder functions
this helps reduce the overall code changes when a bit_size parameter is added to nir_load_system_value Reviewed-by: Jason Ekstrand Reviewed-by: Eric Anholt Signed-off-by: Karol Herbst --- src/amd/vulkan/radv_meta_buffer.c| 8 src/amd/vulkan/radv_meta_bufimage.c | 16 src/amd/vulkan/radv_meta_clear.c | 8 src/amd/vulkan/radv_meta_fast_clear.c| 4 ++-- src/amd/vulkan/radv_meta_resolve_cs.c| 4 ++-- src/amd/vulkan/radv_query.c | 8 src/compiler/nir/nir_lower_clip.c| 3 +-- src/compiler/nir/nir_lower_wpos_center.c | 3 +-- .../vulkan/anv_nir_lower_input_attachments.c | 3 +-- 9 files changed, 27 insertions(+), 30 deletions(-) diff --git a/src/amd/vulkan/radv_meta_buffer.c b/src/amd/vulkan/radv_meta_buffer.c index 8726d36f5fa..76854d7bbad 100644 --- a/src/amd/vulkan/radv_meta_buffer.c +++ b/src/amd/vulkan/radv_meta_buffer.c @@ -15,8 +15,8 @@ build_buffer_fill_shader(struct radv_device *dev) b.shader->info.cs.local_size[1] = 1; b.shader->info.cs.local_size[2] = 1; - nir_ssa_def *invoc_id = nir_load_system_value(, nir_intrinsic_load_local_invocation_id, 0); - nir_ssa_def *wg_id = nir_load_system_value(, nir_intrinsic_load_work_group_id, 0); + nir_ssa_def *invoc_id = nir_load_local_invocation_id(); + nir_ssa_def *wg_id = nir_load_work_group_id(); nir_ssa_def *block_size = nir_imm_ivec4(, b.shader->info.cs.local_size[0], b.shader->info.cs.local_size[1], @@ -67,8 +67,8 @@ build_buffer_copy_shader(struct radv_device *dev) b.shader->info.cs.local_size[1] = 1; b.shader->info.cs.local_size[2] = 1; - nir_ssa_def *invoc_id = nir_load_system_value(, nir_intrinsic_load_local_invocation_id, 0); - nir_ssa_def *wg_id = nir_load_system_value(, nir_intrinsic_load_work_group_id, 0); + nir_ssa_def *invoc_id = nir_load_local_invocation_id(); + nir_ssa_def *wg_id = nir_load_work_group_id(); nir_ssa_def *block_size = nir_imm_ivec4(, b.shader->info.cs.local_size[0], b.shader->info.cs.local_size[1], diff --git a/src/amd/vulkan/radv_meta_bufimage.c b/src/amd/vulkan/radv_meta_bufimage.c index 
6f074a70b4c..f5b68f6c9a6 100644 --- a/src/amd/vulkan/radv_meta_bufimage.c +++ b/src/amd/vulkan/radv_meta_bufimage.c @@ -60,8 +60,8 @@ build_nir_itob_compute_shader(struct radv_device *dev, bool is_3d) output_img->data.descriptor_set = 0; output_img->data.binding = 1; - nir_ssa_def *invoc_id = nir_load_system_value(, nir_intrinsic_load_local_invocation_id, 0); - nir_ssa_def *wg_id = nir_load_system_value(, nir_intrinsic_load_work_group_id, 0); + nir_ssa_def *invoc_id = nir_load_local_invocation_id(); + nir_ssa_def *wg_id = nir_load_work_group_id(); nir_ssa_def *block_size = nir_imm_ivec4(, b.shader->info.cs.local_size[0], b.shader->info.cs.local_size[1], @@ -289,8 +289,8 @@ build_nir_btoi_compute_shader(struct radv_device *dev, bool is_3d) output_img->data.descriptor_set = 0; output_img->data.binding = 1; - nir_ssa_def *invoc_id = nir_load_system_value(, nir_intrinsic_load_local_invocation_id, 0); - nir_ssa_def *wg_id = nir_load_system_value(, nir_intrinsic_load_work_group_id, 0); + nir_ssa_def *invoc_id = nir_load_local_invocation_id(); + nir_ssa_def *wg_id = nir_load_work_group_id(); nir_ssa_def *block_size = nir_imm_ivec4(, b.shader->info.cs.local_size[0], b.shader->info.cs.local_size[1], @@ -719,8 +719,8 @@ build_nir_itoi_compute_shader(struct radv_device *dev, bool is_3d) output_img->data.descriptor_set = 0; output_img->data.binding = 1; - nir_ssa_def *invoc_id = nir_load_system_value(, nir_intrinsic_load_local_invocation_id, 0); - nir_ssa_def *wg_id = nir_load_system_value(, nir_intrinsic_load_work_group_id, 0); + nir_ssa_def *invoc_id = nir_load_local_invocation_id(); + nir_ssa_def *wg_id = nir_load_work_group_id(); nir_ssa_def *block_size = nir_imm_ivec4(, b.shader->info.cs.local_size[0], b.shader->info.cs.local_size[1], @@ -1139,8 +1139,8 @@ build_nir_cleari_compute_shader(struct radv_device *dev, bool is_3d) output_img->data.descriptor_set = 0; output_img->data.binding = 0; - nir_ssa_def *invoc_id = nir_load_system_value(, 
nir_intrinsic_load_local_invocation_id, 0); - nir_ssa_def *wg_id =
[Mesa-dev] [PATCH 01/22] nir: add const_index parameters to system value builder function
this allows to replace some nir_load_system_value calls with the specific
system value constructor

Reviewed-by: Jason Ekstrand
Reviewed-by: Eric Anholt
Signed-off-by: Karol Herbst
---
 src/compiler/nir/nir_builder_opcodes_h.py | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/compiler/nir/nir_builder_opcodes_h.py b/src/compiler/nir/nir_builder_opcodes_h.py
index e600093e9f6..34b8c4371e1 100644
--- a/src/compiler/nir/nir_builder_opcodes_h.py
+++ b/src/compiler/nir/nir_builder_opcodes_h.py
@@ -55,11 +55,28 @@ nir_load_system_value(nir_builder *build, nir_intrinsic_op op, int index)
    return >dest.ssa;
 }

+<%
+def sysval_decl_list(opcode):
+   res = ''
+   if opcode.indices:
+      res += ', unsigned ' + opcode.indices[0].lower()
+   return res
+
+def sysval_arg_list(opcode):
+   args = []
+   if opcode.indices:
+      args.append(opcode.indices[0].lower())
+   else:
+      args.append('0')
+   return ', '.join(args)
+%>
+
 % for name, opcode in filter(lambda v: v[1].sysval, sorted(INTR_OPCODES.items())):
 static inline nir_ssa_def *
-nir_${name}(nir_builder *build)
+nir_${name}(nir_builder *build${sysval_decl_list(opcode)})
 {
-   return nir_load_system_value(build, nir_intrinsic_${name}, 0);
+   return nir_load_system_value(build, nir_intrinsic_${name},
+                                ${sysval_arg_list(opcode)});
 }
 % endfor
--
2.19.1

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
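For illustration only, the two template helpers from this patch can be run as plain Python to see what the generated builder signatures would look like. This is a rough sketch: the Opcode tuple, the render_builder() function and the sample index name NIR_INTRINSIC_BASE are stand-ins made up here, not the real Intrinsic class or the Mako template.

```python
from collections import namedtuple

# Hypothetical stand-in for the entries of INTR_OPCODES; only the
# 'indices' attribute used by the helpers is modelled.
Opcode = namedtuple("Opcode", ["indices"])


def sysval_decl_list(opcode):
    # Extra C parameter for the first const index, if the opcode has one.
    res = ''
    if opcode.indices:
        res += ', unsigned ' + opcode.indices[0].lower()
    return res


def sysval_arg_list(opcode):
    # Forward the first const index, or a literal 0 if there is none.
    args = []
    if opcode.indices:
        args.append(opcode.indices[0].lower())
    else:
        args.append('0')
    return ', '.join(args)


def render_builder(name, opcode):
    # Plain-Python approximation of the template body (assumption:
    # the real output comes from Mako, not from this function).
    return (
        "static inline nir_ssa_def *\n"
        f"nir_{name}(nir_builder *build{sysval_decl_list(opcode)})\n"
        "{\n"
        f"   return nir_load_system_value(build, nir_intrinsic_{name}, "
        f"{sysval_arg_list(opcode)});\n"
        "}\n"
    )


print(render_builder("load_work_group_id", Opcode(indices=[])))
print(render_builder("load_example", Opcode(indices=["NIR_INTRINSIC_BASE"])))
```

An opcode without indices keeps the old one-argument signature and passes 0, while one with an index grows a single unsigned parameter named after that index.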
Re: [Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf
Quoting Erik Faye-Lund (2018-11-13 01:34:53)
> On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote:
> > Quoting Erik Faye-Lund (2018-11-12 04:51:47)
> > > On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote:
> > > > Which has the same behavior.
> > >
> > > Does it? I'm not so sure... IROUND_POS seems to round to nearest
> > > integer depending on the FPU rounding mode, _mesa_roundevenf rounds to
> > > the nearest *even* value regardless of the FPU rounding mode, no?
> > >
> > > I'm not sure if it matters or not, but *at least* point that out in the
> > > commit message. Unless I'm missing something, of course...
> >
> > I should put it in the commit message, but there is a comment in rounding.h
> > that if you change the rounding mode you get to keep the pieces.
>
> Well, this might regress performance pretty badly. Especially in the
> swrast code, this could be bad...
>

Why? we have the assumption that you don't change the rounding mode already in
core mesa and many of the drivers. For performance, I measured a simple 1000
loops of rounding, and found that the only way the rounding.h function was
slower is if you used the __SSE4_1__ path... (It was the same performance as the
int cast +0.5 implementation)

Dylan
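As a side note on the rounding question, the two strategies mentioned in this thread only disagree on exact .5 ties. A minimal sketch in Python, assuming Python's built-in round() (which rounds ties to even) as a stand-in for round-to-nearest-even, and a hypothetical cast_plus_half() for the "int cast +0.5" approach; neither function is Mesa code:

```python
def round_half_even(x):
    # Python's built-in round() sends exact ties to the nearest even integer.
    return round(x)


def cast_plus_half(x):
    # "int cast +0.5": add one half and truncate; only meaningful for x >= 0.
    assert x >= 0.0
    return int(x + 0.5)


samples = [0.5, 1.5, 2.5, 3.5, 2.4, 2.6]
for x in samples:
    print(x, round_half_even(x), cast_plus_half(x))

# Inputs on which the two strategies give different integers.
differing = [x for x in samples if round_half_even(x) != cast_plus_half(x)]
print(differing)
```

Away from ties both give the same integer, so whether the difference matters depends on how often callers hit exact halves.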
[Mesa-dev] [PATCH 14/22] nir: add legal bit_sizes to intrinsics
With OpenCL some system values match the address bits, but in GLSL we also have some system values being 64 bit. With this it is possible to adjust the builder functions so that depending on the bit_sizes the correct bit_size is used or an additional argument is added in case of multiple possible values. Also this allows for further validation Signed-off-by: Karol Herbst --- src/compiler/nir/nir.h | 3 +++ src/compiler/nir/nir_intrinsics.py | 15 ++- src/compiler/nir/nir_intrinsics_c.py | 6 +- src/nouveau/meson.build | 25 + 4 files changed, 43 insertions(+), 6 deletions(-) create mode 100644 src/nouveau/meson.build diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index be4f64464f9..3855eb0b582 100644 --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h @@ -1283,6 +1283,9 @@ typedef struct { /** semantic flags for calls to this intrinsic */ nir_intrinsic_semantic_flag flags; + + /** bitfield of legal bit sizes */ + unsigned bit_sizes : 7; } nir_intrinsic_info; extern const nir_intrinsic_info nir_intrinsic_infos[nir_num_intrinsics]; diff --git a/src/compiler/nir/nir_intrinsics.py b/src/compiler/nir/nir_intrinsics.py index ec3049ca06d..9ada44aad8a 100644 --- a/src/compiler/nir/nir_intrinsics.py +++ b/src/compiler/nir/nir_intrinsics.py @@ -32,7 +32,7 @@ class Intrinsic(object): NOTE: this must be kept in sync with nir_intrinsic_info. 
""" def __init__(self, name, src_components, dest_components, -indices, flags, sysval): +indices, flags, sysval, bit_sizes): """Parameters: - name: the intrinsic name @@ -45,6 +45,7 @@ class Intrinsic(object): - indices: list of constant indicies - flags: list of semantic flags - sysval: is this a system-value intrinsic + - bit_sizes: allowed dest bit_sizes """ assert isinstance(name, str) assert isinstance(src_components, list) @@ -58,6 +59,8 @@ class Intrinsic(object): if flags: assert isinstance(flags[0], str) assert isinstance(sysval, bool) + if bit_sizes: + assert isinstance(bit_sizes[0], int) self.name = name self.num_srcs = len(src_components) @@ -68,6 +71,7 @@ class Intrinsic(object): self.indices = indices self.flags = flags self.sysval = sysval + self.bit_sizes = bit_sizes # # Possible indices: @@ -120,10 +124,10 @@ CAN_REORDER = "NIR_INTRINSIC_CAN_REORDER" INTR_OPCODES = {} def intrinsic(name, src_comp=[], dest_comp=-1, indices=[], - flags=[], sysval=False): + flags=[], sysval=False, bit_sizes=[]): assert name not in INTR_OPCODES INTR_OPCODES[name] = Intrinsic(name, src_comp, dest_comp, - indices, flags, sysval) + indices, flags, sysval, bit_sizes) intrinsic("nop", flags=[CAN_ELIMINATE]) @@ -446,9 +450,10 @@ intrinsic("shared_atomic_fmin", src_comp=[1, 1], dest_comp=1, indices=[BASE]) intrinsic("shared_atomic_fmax", src_comp=[1, 1], dest_comp=1, indices=[BASE]) intrinsic("shared_atomic_fcomp_swap", src_comp=[1, 1, 1], dest_comp=1, indices=[BASE]) -def system_value(name, dest_comp, indices=[]): +def system_value(name, dest_comp, indices=[], bit_sizes=[32]): intrinsic("load_" + name, [], dest_comp, indices, - flags=[CAN_ELIMINATE, CAN_REORDER], sysval=True) + flags=[CAN_ELIMINATE, CAN_REORDER], sysval=True, + bit_sizes=bit_sizes) system_value("frag_coord", 4) system_value("front_face", 1) diff --git a/src/compiler/nir/nir_intrinsics_c.py b/src/compiler/nir/nir_intrinsics_c.py index ac45b94d496..d0f1c29fa39 100644 --- a/src/compiler/nir/nir_intrinsics_c.py 
+++ b/src/compiler/nir/nir_intrinsics_c.py @@ -1,3 +1,5 @@ +from functools import reduce +import operator template = """\ /* Copyright (C) 2018 Red Hat @@ -45,6 +47,7 @@ const nir_intrinsic_info nir_intrinsic_infos[nir_num_intrinsics] = { }, % endif .flags = ${"0" if len(opcode.flags) == 0 else " | ".join(opcode.flags)}, + .bit_sizes = ${reduce(operator.or_, opcode.bit_sizes, 0)}, }, % endfor }; @@ -54,6 +57,7 @@ from nir_intrinsics import INTR_OPCODES from mako.template import Template import argparse import os +import functools def main(): parser = argparse.ArgumentParser() @@ -64,7 +68,7 @@ def main(): path = os.path.join(args.outdir, 'nir_intrinsics.c') with open(path, 'wb') as f: -f.write(Template(template, output_encoding='utf-8').render(INTR_OPCODES=INTR_OPCODES)) +f.write(Template(template, output_encoding='utf-8').render(INTR_OPCODES=INTR_OPCODES, reduce=reduce, operator=operator)) if __name__ == '__main__': main() diff --git a/src/nouveau/meson.build b/src/nouveau/meson.build new file mode 100644 index 000..5c265f207ab --- /dev/null +++
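To make the bit_sizes encoding concrete: the template expression reduce(operator.or_, opcode.bit_sizes, 0) simply ORs the legal sizes together. The sketch below assumes the sizes are powers of two, so each size is its own bit in the mask; is_legal_bit_size() is a hypothetical helper showing how a validator might test the mask, not something taken from the patch.

```python
from functools import reduce
import operator


def encode_bit_sizes(bit_sizes):
    # Same expression as in the nir_intrinsics_c.py template of the patch.
    return reduce(operator.or_, bit_sizes, 0)


def is_legal_bit_size(mask, bit_size):
    # Hypothetical check: a power-of-two size is legal if its bit is set.
    return (mask & bit_size) != 0


default_mask = encode_bit_sizes([32])      # system_value() default in the patch
wide_mask = encode_bit_sizes([32, 64])     # made-up intrinsic with two legal sizes
all_sizes = encode_bit_sizes([1, 2, 4, 8, 16, 32, 64])

print(default_mask, wide_mask, all_sizes)
print(is_legal_bit_size(wide_mask, 64), is_legal_bit_size(default_mask, 64))
```

Under that assumption every combination of sizes from 1 to 64 fits in the 7-bit field added to nir_intrinsic_info.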
[Mesa-dev] [PATCH 20/22] nir/spirv: physical pointer support
this adds support for pointers from CL kernels. The basic idea here is to be
able to start a deref chain from a random ssa value and vice versa.

changes summed up:
1. derefs can start from a deref_cast
2. new ptr_as_array deref type to offset a pointer
3. derefs can end with a ssa_from_deref intrinsic

One problem with this implementation is that we don't track whether a deref
points to a value or a pointer, although that shouldn't cause any problem at
runtime as a pointer gets fed into a load or ssa_from_deref.

What annoys me more is that we need to keep track of the original SSA value we
started the pointer from. For graphics that's most of the time a variable or
something we can base the pointer on, but for kernels we usually start from a
plain SSA value. Currently I reuse the UBO offset variable to keep track of the
initial ssa value, but it's hacky.

Most bits of the patch feel rather hacky, but it is much smaller than what we
had with the fat ptr approach and I don't have to add a new phys_pointer value
type, which made things a lot easier.

Signed-off-by: Karol Herbst --- src/compiler/nir/nir.c| 4 +- src/compiler/nir/nir.h| 9 ++ src/compiler/nir/nir_builder.h| 37 +++- src/compiler/nir/nir_clone.c | 1 + src/compiler/nir/nir_deref.c | 26 +- src/compiler/nir/nir_instr_set.c | 2 + src/compiler/nir/nir_intrinsics.py| 7 ++ src/compiler/nir/nir_loop_analyze.c | 2 +- src/compiler/nir/nir_lower_indirect_derefs.c | 6 +- src/compiler/nir/nir_lower_io.c | 79 +++- .../nir/nir_lower_io_arrays_to_elements.c | 4 +- src/compiler/nir/nir_lower_locals_to_regs.c | 9 +- src/compiler/nir/nir_lower_var_copies.c | 3 +- src/compiler/nir/nir_lower_vars_to_ssa.c | 12 ++- src/compiler/nir/nir_opt_copy_propagate.c | 2 +- src/compiler/nir/nir_opt_dead_write_vars.c| 4 +- src/compiler/nir/nir_print.c | 6 +- src/compiler/nir/nir_propagate_invariant.c| 2 + src/compiler/nir/nir_remove_dead_variables.c | 2 + src/compiler/nir/nir_serialize.c | 2 + src/compiler/nir/nir_split_vars.c | 4 +- src/compiler/nir/nir_validate.c | 17 +++- src/compiler/spirv/spirv_to_nir.c | 28 +- src/compiler/spirv/vtn_cfg.c | 3 +- src/compiler/spirv/vtn_private.h | 1 + src/compiler/spirv/vtn_variables.c| 90 +++ 26 files changed, 296 insertions(+), 66 deletions(-) diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c index ca258b7c80e..66648885ec7 100644 --- a/src/compiler/nir/nir.c +++ b/src/compiler/nir/nir.c @@ -463,7 +463,7 @@ nir_deref_instr_create(nir_shader *shader, nir_deref_type deref_type) if (deref_type != nir_deref_type_var) src_init(>parent); - if (deref_type == nir_deref_type_array) + if (nir_deref_is_array(instr)) src_init(>arr.index); dest_init(>dest); @@ -1069,7 +1069,7 @@ visit_deref_instr_src(nir_deref_instr *instr, return false; } - if (instr->deref_type == nir_deref_type_array) { + if (nir_deref_is_array(instr)) { if (!visit_src(>arr.index, cb, state)) return false; } diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index 35f2ec02c31..40f5a0e4e06 100644 --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h @@ -990,6 
+990,7 @@ typedef enum { nir_deref_type_array_wildcard, nir_deref_type_struct, nir_deref_type_cast, + nir_deref_type_ptr_as_array, } nir_deref_type; typedef struct { @@ -1014,6 +1015,7 @@ typedef struct { /** Additional deref parameters */ union { + /** used for deref_array and deref_ptr_as_array */ struct { nir_src index; } arr; @@ -1068,6 +1070,13 @@ bool nir_deref_instr_has_indirect(nir_deref_instr *instr); bool nir_deref_instr_remove_if_unused(nir_deref_instr *instr); +static inline bool +nir_deref_is_array(const nir_deref_instr *instr) +{ + return instr->deref_type == nir_deref_type_array || + instr->deref_type == nir_deref_type_ptr_as_array; +} + typedef struct { nir_instr instr; diff --git a/src/compiler/nir/nir_builder.h b/src/compiler/nir/nir_builder.h index 57f0a188c46..428c6b4fd78 100644 --- a/src/compiler/nir/nir_builder.h +++ b/src/compiler/nir/nir_builder.h @@ -623,6 +623,7 @@ nir_ssa_for_alu_src(nir_builder *build, nir_alu_instr *instr, unsigned srcn) static inline nir_deref_instr * nir_build_deref_var(nir_builder *build, nir_variable *var) { + unsigned ptr_size = build->shader->ptr_size ? build->shader->ptr_size : 32; nir_deref_instr *deref = nir_deref_instr_create(build->shader, nir_deref_type_var); @@ -630,7 +631,7 @@ nir_build_deref_var(nir_builder *build, nir_variable *var) deref->type = var->type; deref->var = var; -
[Mesa-dev] [PATCH 19/22] nir/spirv: handle kernel function parameters
the idea here is to generate an entry point stub function wrapping around the
actual kernel function and turn all parameters into shader inputs with byte
addressing instead of vec4.

This gives us several advantages:
1. calling kernel functions doesn't differ from calling any other function
2. CL inputs match uniforms in most ways and we can just take advantage of most
   of nir_lower_io

Signed-off-by: Karol Herbst
---
 src/compiler/spirv/spirv_to_nir.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
index a350a95e27e..dbac3b2e052 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -4335,6 +4335,38 @@ spirv_to_nir(const uint32_t *words, size_t word_count,
    nir_function *entry_point = b->entry_point->func->impl->function;
    vtn_assert(entry_point);

+   /* post process entry_points with input params */
+   if (entry_point->num_params) {
+      /* we shouldn't have any inputs yet */
+      vtn_assert(!entry_point->shader->num_inputs);
+
+      nir_function *main_entry_point = nir_function_create(b->shader, ralloc_strdup(b->shader, "main"));
+      main_entry_point->impl = nir_function_impl_create(main_entry_point);
+      nir_builder_init(&b->nb, main_entry_point->impl);
+      b->nb.cursor = nir_after_cf_list(&main_entry_point->impl->body);
+      b->func_param_idx = 0;
+
+      nir_call_instr *call = nir_call_instr_create(b->nb.shader, entry_point);
+
+      for (unsigned i = 0; i < entry_point->num_params; ++i) {
+         /* input variable */
+         nir_variable *in_var = rzalloc(b->nb.shader, nir_variable);
+         in_var->data.mode = nir_var_shader_in;
+         in_var->data.read_only = true;
+         in_var->data.location = i;
+         in_var->type = b->entry_point->func->type->params[i]->type;
+
+         nir_shader_add_variable(b->nb.shader, in_var);
+         b->nb.shader->num_inputs++;
+
+         call->params[i] = nir_src_for_ssa(nir_load_var(&b->nb, in_var));
+      }
+
+      nir_builder_instr_insert(&b->nb, &call->instr);
+
+      entry_point = main_entry_point;
+   }
+
    /* Unparent the shader from the vtn_builder before we delete the builder */
    ralloc_steal(NULL, b->shader);

--
2.19.1
[Mesa-dev] [Bug 108734] Regression: [bisected] dEQP-GLES31.functional.tessellation.invariance.* start failing on r600
https://bugs.freedesktop.org/show_bug.cgi?id=108734

            Bug ID: 108734
           Summary: Regression: [bisected]
                    dEQP-GLES31.functional.tessellation.invariance.* start
                    failing on r600
           Product: Mesa
           Version: git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Other
          Assignee: mesa-dev@lists.freedesktop.org
          Reporter: gw.foss...@gmail.com
        QA Contact: mesa-dev@lists.freedesktop.org

The patch

  5d517a599b1eabd1d5696bf31e26f16568d35770
  st/mesa: Don't record garbage streamout information in the non-SSO case.

breaks dEQP-GLES31.functional.tessellation.invariance.* on r600. All the tests
pass without this patch, but with the patch applied

  glGetQueryObjectuiv(queryObject, GL_QUERY_RESULT, );

returns zero in result for all the tests from this set, which is not correct.
Re: [Mesa-dev] [PATCH v2 4/5] main: avoid 'may be used uninitialized' warnings
On Tuesday, 2018-11-13 14:19:31 +0200, asimiklit.w...@gmail.com wrote:
> From: Andrii Simiklit
>
> 1. main/texcompress_etc.c:1314:12:
>    warning: ‘*((void *)+2)’ may be used uninitialized in this function
> 2. main/texcompress_etc.c:1354:12:
>    warning: ‘*((void *)+2)’ may be used uninitialized in this function
> 3. main/texcompress_etc.c:1293:12:
>    warning: ‘dst’ may be used uninitialized in this function
> 4. main/texcompress_etc.c:1335:12:
>    warning: ‘dst’ may be used uninitialized in this function
> 5. main/texcompress_etc.c:1460:12:
>    warning: ‘*((void *)+1)’ may be used uninitialized in this function
>
> v2: Fixed by adding the unreachable case to the etc2_rgb8_fetch_texel
>     ( Eric Engestrom )
>     Changes for warning 'pixerrorcolorbest' were removed.
>
> Signed-off-by: Andrii Simiklit

This is the right way of fixing this code-wise, but I'm not 100% sure we
can actually guarantee this logic-wise, so I'll let someone else review
(and push) this patch.

Acked-by: Eric Engestrom

> ---
>  src/mesa/main/texcompress_etc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/mesa/main/texcompress_etc.c b/src/mesa/main/texcompress_etc.c
> index b39ab33d36..f1da4d0f11 100644
> --- a/src/mesa/main/texcompress_etc.c
> +++ b/src/mesa/main/texcompress_etc.c
> @@ -548,6 +548,7 @@ etc2_rgb8_fetch_texel(const struct etc2_block *block,
>        if (punchthrough_alpha)
>           dst[3] = 255;
>     }
> +   else unreachable("unhandled block mode");
>  }
>
>  static void
> --
> 2.17.1
>
Re: [Mesa-dev] [PATCH v2 5/5] intel/tools: avoid 'ignoring return value'
On Tuesday, 2018-11-13 14:19:32 +0200, asimiklit.w...@gmail.com wrote:
> From: Andrii Simiklit
>
> 1. tools/i965_disasm.c:58:4: warning:
>    ignoring return value of ‘fread’,
>    declared with attribute warn_unused_result
>    fread(assembly, *end, 1, fp);
>
> v2: - Fixed incorrect return value check.
>       ( Eric Engestrom )
>
> Signed-off-by: Andrii Simiklit
> ---
>  src/intel/tools/i965_disasm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/tools/i965_disasm.c b/src/intel/tools/i965_disasm.c
> index 73a6760fc1..329f6327ed 100644
> --- a/src/intel/tools/i965_disasm.c
> +++ b/src/intel/tools/i965_disasm.c
> @@ -55,7 +55,8 @@ i965_disasm_read_binary(FILE *fp, size_t *end)
>     if (assembly == NULL)
>        return NULL;
>
> -   fread(assembly, *end, 1, fp);
> +   MAYBE_UNUSED size_t size = fread(assembly, *end, 1, fp);
> +   assert((size || (*end == 0)) && "error: unable to read all elements!");

I think `*end == 0` should be handled before fread() with an exit(), or
just leave it out from the assert so that it fails here.
(Realistically most people will be using these tools from debug builds
anyway.)

>     fclose(fp);
>
>     return assembly;
> --
> 2.17.1
>
Re: [Mesa-dev] [PATCH v2 3/5] i965: avoid 'unused variable'
On Tuesday, 2018-11-13 14:19:30 +0200, asimiklit.w...@gmail.com wrote:
> From: Andrii Simiklit
>
> 1. brw_pipe_control.c:311:34: warning:
>    unused variable ‘devinfo’
> 2. brw_program_binary.c:209:19: warning:
>    unused variable ‘gen_size’
> 3. brw_program_binary.c:216:19: warning:
>    unused variable ‘nir_size’
>
> v2: Changes for unreproducible issues were removed
>
> Signed-off-by: Andrii Simiklit
> ---
>  src/mesa/drivers/dri/i965/brw_pipe_control.c   | 2 +-
>  src/mesa/drivers/dri/i965/brw_program_binary.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> index 122ac26070..a3f521b5ae 100644
> --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> @@ -308,7 +308,7 @@ brw_emit_depth_stall_flushes(struct brw_context *brw)
>  void
>  gen7_emit_vs_workaround_flush(struct brw_context *brw)
>  {
> -   const struct gen_device_info *devinfo = &brw->screen->devinfo;
> +   MAYBE_UNUSED const struct gen_device_info *devinfo = &brw->screen->devinfo;
>
>     assert(devinfo->gen == 7);

This could've just been folded into the assert, but this works.

Patches 1-3 are:
Reviewed-by: Eric Engestrom

I assume you want me to push them for you?

>     brw_emit_pipe_control_write(brw,
> diff --git a/src/mesa/drivers/dri/i965/brw_program_binary.c b/src/mesa/drivers/dri/i965/brw_program_binary.c
> index db03332241..1298d9e765 100644
> --- a/src/mesa/drivers/dri/i965/brw_program_binary.c
> +++ b/src/mesa/drivers/dri/i965/brw_program_binary.c
> @@ -206,14 +206,14 @@ brw_program_deserialize_driver_blob(struct gl_context *ctx,
>           break;
>        switch ((enum driver_cache_blob_part)part_type) {
>        case GEN_PART: {
> -         uint32_t gen_size = blob_read_uint32(&reader);
> +         MAYBE_UNUSED uint32_t gen_size = blob_read_uint32(&reader);
>           assert(!reader.overrun &&
>                  (uintptr_t)(reader.end - reader.current) > gen_size);
>           deserialize_gen_program(&reader, ctx, prog, stage);
>           break;
>        }
>        case NIR_PART: {
> -         uint32_t nir_size = blob_read_uint32(&reader);
> +         MAYBE_UNUSED uint32_t nir_size = blob_read_uint32(&reader);
>           assert(!reader.overrun &&
>                  (uintptr_t)(reader.end - reader.current) > nir_size);
>           const struct nir_shader_compiler_options *options =
> --
> 2.17.1
>
[Mesa-dev] [PATCH] egl/dri: fix error value with unknown drm format
According to the EGL_EXT_image_dma_buf_import spec, creating an EGL
image with a DRM format not supported should yield the BAD_MATCH
error :

"
   * If <target> is EGL_LINUX_DMA_BUF_EXT, and the EGL_LINUX_DRM_FOURCC_EXT
     attribute is set to a format not supported by the EGL, EGL_BAD_MATCH
     is generated.
"

Signed-off-by: Lionel Landwerlin
Fixes: 20de7f9f226401 ("egl/dri2: support for creating images out of dma buffers")
---
 src/egl/drivers/dri2/egl_dri2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index 3b63aebbf9a..198ba73247f 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -2310,7 +2310,7 @@ dri2_check_dma_buf_format(const _EGLImageAttribs *attrs)
 {
    unsigned plane_n = dri2_num_fourcc_format_planes(attrs->DMABufFourCC.Value);
    if (plane_n == 0) {
-      _eglError(EGL_BAD_ATTRIBUTE, "invalid format");
+      _eglError(EGL_BAD_MATCH, "unknown drm fourcc format");
       return 0;
    }

--
2.19.1
Re: [Mesa-dev] [PATCH 2/3] nir: combine fmul and fadd across ffma operations
The brw_nir_opt_peephole_ffma pass is only doing what the fuse_ffma
option already does. It produces the same result as the fuse_ffma
option, which is not optimal.

This is what I get:

  vec4 32 ssa_7 = fmul ssa_6, ssa_1.
  vec4 32 ssa_8 = ffma ssa_5, ssa_1., ssa_7
  vec4 32 ssa_10 = ffma ssa_9, ssa_1., ssa_8
  vec4 32 ssa_12 = fadd ssa_10, ssa_11

But better optimized as (example with the least rearrangements):

  vec4 32 ssa_7 = ffma ssa_6, ssa_1., ssa_11
  vec4 32 ssa_8 = ffma ssa_5, ssa_1., ssa_7
  vec4 32 ssa_10 = ffma ssa_9, ssa_1., ssa_8

Fusing the fmul and fadd in this case is not obvious. Could this patch
be OK if it is behind the fuse_ffma option?

On 11/12/2018 02:30 PM, Jason Ekstrand wrote:
> In general, you're not supposed to mess around with the precision of
> fma... What we do in the Intel drivers is to leave fma split, apply
> operations, and then we have a special mul+add fusion pass we run at
> the end. Leaving them split allows for exactly this kind of
> optimization without mixing up those FMAs that are supposed to be kept
> fused and those generated by mul+add fusion which can be split back
> apart and re-optimized.
>
> On Mon, Nov 12, 2018 at 12:17 PM Jonathan Marek wrote:
> > This works by moving the fadd up across the ffma operations, so that
> > it can eventually be combined with a fmul. I'm not sure it works in
> > all cases, but it works in all the common cases.
> >
> > Example:
> >   matrix * vec4(coord, 1.0)
> > is compiled as:
> >   fmul, ffma, ffma, fadd
> > and with this patch:
> >   ffma, ffma, ffma
> >
> > Signed-off-by: Jonathan Marek
> > ---
> >  src/compiler/nir/nir_opt_algebraic.py | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
> > index 8f4df891b8..82e10731a6 100644
> > --- a/src/compiler/nir/nir_opt_algebraic.py
> > +++ b/src/compiler/nir/nir_opt_algebraic.py
> > @@ -133,6 +133,7 @@ optimizations = [
> >     (('~fadd@64', a, ('fmul', c , ('fadd', b, ('fneg', a)))), ('flrp', a, b, c), '!options->lower_flrp64'),
> >     (('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'),
> >     (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'),
> > +   (('~fadd', ('ffma', a, b, c), d), ('ffma', a, b, ('fadd', c, d))),
> >     (('fdot4', ('vec4', a, b, c, 1.0), d), ('fdph', ('vec3', a, b, c), d)),
> >     (('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)),
> > --
> > 2.17.1
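The effect of the proposed rule on the example above can be traced with a toy rewriter over tuple expressions. This is only a sketch of the idea, not how nir_opt_algebraic matches patterns: the rewrite() and ops() functions are made up here, only the left operand of fadd is checked, and the precision concerns raised in the thread are ignored.

```python
def rewrite(expr):
    # Leaves (SSA names as strings) are returned unchanged.
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [rewrite(a) for a in args]
    if op == 'fadd':
        lhs, rhs = args
        if isinstance(lhs, tuple) and lhs[0] == 'ffma':
            # Proposed rule: fadd(ffma(a, b, c), d) -> ffma(a, b, fadd(c, d))
            _, a, b, c = lhs
            return rewrite(('ffma', a, b, ('fadd', c, rhs)))
        if isinstance(lhs, tuple) and lhs[0] == 'fmul':
            # Existing fuse rule: fadd(fmul(a, b), c) -> ffma(a, b, c)
            _, a, b = lhs
            return ('ffma', a, b, rhs)
    return (op, *args)


def ops(expr):
    # List the operations of an expression tree, innermost first.
    if not isinstance(expr, tuple):
        return []
    result = []
    for arg in expr[1:]:
        result += ops(arg)
    return result + [expr[0]]


# fmul, ffma, ffma, fadd, with placeholder names for the operands.
before = ('fadd',
          ('ffma', 'm2', 'z',
           ('ffma', 'm1', 'y',
            ('fmul', 'm0', 'x'))),
          'm3')
after = rewrite(before)

print(ops(before))
print(ops(after))
```

The fadd is pushed down through both ffma operations until it meets the fmul, where the existing fuse rule turns the pair into a third ffma.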
Re: [Mesa-dev] [PATCH] egl/dri: fix error value with unknown drm format
On Tue, 13 Nov 2018 at 14:11, Lionel Landwerlin wrote:
>
> According to the EGL_EXT_image_dma_buf_import spec, creating an EGL
> image with a DRM format not supported should yield the BAD_MATCH
> error :
>
> "
>    * If <target> is EGL_LINUX_DMA_BUF_EXT, and the EGL_LINUX_DRM_FOURCC_EXT
>      attribute is set to a format not supported by the EGL, EGL_BAD_MATCH
>      is generated.
> "
>
> Signed-off-by: Lionel Landwerlin
> Fixes: 20de7f9f226401 ("egl/dri2: support for creating images out of dma buffers")

Reviewed-by: Emil Velikov

Great catch Lionel. Out of curiosity, how did you spot this?

-Emil
[Mesa-dev] [PATCH 09/22] glsl: add cl_size and cl_alignment
Signed-off-by: Karol Herbst --- src/compiler/glsl_types.cpp | 48 + src/compiler/glsl_types.h | 10 src/compiler/nir_types.cpp | 12 ++ src/compiler/nir_types.h| 4 4 files changed, 74 insertions(+) diff --git a/src/compiler/glsl_types.cpp b/src/compiler/glsl_types.cpp index 9b1fd809b41..c74e67f7be1 100644 --- a/src/compiler/glsl_types.cpp +++ b/src/compiler/glsl_types.cpp @@ -2236,3 +2236,51 @@ decode_type_from_blob(struct blob_reader *blob) return NULL; } } + +unsigned +glsl_type::cl_alignment() const +{ + /* vectors unlike arrays are aligned to their size */ + if (this->is_scalar() || this->is_vector()) + return this->cl_size(); + else if (this->is_array()) + return this->without_array()->cl_alignment(); + else if (this->is_record()) { + /* Packed Structs are 0x1 aligned despite their size. */ + if (this->packed) + return 1; + + unsigned res = 1; + for (unsigned i = 0; i < this->length; ++i) { + struct glsl_struct_field = this->fields.structure[i]; + res = MAX2(res, field.type->cl_alignment()); + } + return res; + } + return 1; +} + +unsigned +glsl_type::cl_size() const +{ + if (this->is_scalar()) { + return glsl_base_get_byte_size(this->base_type); + } else if (this->is_vector()) { + unsigned vec_elemns = this->vector_elements == 3 ? 
4 : this->vector_elements; + return vec_elemns * glsl_base_get_byte_size(this->base_type); + } else if (this->is_array()) { + unsigned size = this->without_array()->cl_size(); + return size * this->length; + } else if (this->is_record()) { + unsigned size = 0; + for (unsigned i = 0; i < this->length; ++i) { + struct glsl_struct_field = this->fields.structure[i]; + /* if a struct is packed, members don't get aligned */ + if (!this->packed) +size = align(size, field.type->cl_alignment()); + size += field.type->cl_size(); + } + return size; + } + return 1; +} diff --git a/src/compiler/glsl_types.h b/src/compiler/glsl_types.h index efcbc70af26..c72109cdcfe 100644 --- a/src/compiler/glsl_types.h +++ b/src/compiler/glsl_types.h @@ -421,6 +421,16 @@ public: */ unsigned std430_size(bool row_major) const; + /** +* Alignment in bytes of the start of this type in OpenCL memory. +*/ + unsigned cl_alignment() const; + + /** +* Size in bytes of this type in OpenCL memory +*/ + unsigned cl_size() const; + /** * \brief Can this type be implicitly converted to another? 
* diff --git a/src/compiler/nir_types.cpp b/src/compiler/nir_types.cpp index 506dabdeb1d..cdde3597f70 100644 --- a/src/compiler/nir_types.cpp +++ b/src/compiler/nir_types.cpp @@ -597,3 +597,15 @@ glsl_contains_atomic(const struct glsl_type *type) { return type->contains_atomic(); } + +int +glsl_get_cl_size(const struct glsl_type *type) +{ + return type->cl_size(); +} + +int +glsl_get_cl_alignment(const struct glsl_type *type) +{ + return type->cl_alignment(); +} diff --git a/src/compiler/nir_types.h b/src/compiler/nir_types.h index c06d227e45a..8304b22254b 100644 --- a/src/compiler/nir_types.h +++ b/src/compiler/nir_types.h @@ -91,6 +91,10 @@ unsigned glsl_get_record_location_offset(const struct glsl_type *type, unsigned glsl_atomic_size(const struct glsl_type *type); +int glsl_get_cl_size(const struct glsl_type *type); + +int glsl_get_cl_alignment(const struct glsl_type *type); + static inline unsigned glsl_get_bit_size(const struct glsl_type *type) { -- 2.19.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
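As a rough model of the layout rules in cl_size() and cl_alignment() above, the same logic can be written in a few lines of Python. The Scalar, Vector, Array and Struct classes are invented for this sketch and only approximate glsl_type; like the patch as posted, the struct size is not rounded up to the struct alignment, and nested arrays are not treated specially.

```python
from dataclasses import dataclass


@dataclass
class Scalar:
    byte_size: int


@dataclass
class Vector:
    elem: Scalar
    components: int


@dataclass
class Array:
    elem: object
    length: int


@dataclass
class Struct:
    fields: list
    packed: bool = False


def align(value, alignment):
    # Round value up to the next multiple of alignment.
    return (value + alignment - 1) // alignment * alignment


def cl_size(t):
    if isinstance(t, Scalar):
        return t.byte_size
    if isinstance(t, Vector):
        # 3-component vectors occupy the space of 4 components.
        n = 4 if t.components == 3 else t.components
        return n * t.elem.byte_size
    if isinstance(t, Array):
        return cl_size(t.elem) * t.length
    size = 0
    for field in t.fields:
        # Members of a packed struct are not aligned.
        if not t.packed:
            size = align(size, cl_alignment(field))
        size += cl_size(field)
    return size


def cl_alignment(t):
    if isinstance(t, (Scalar, Vector)):
        # Vectors, unlike arrays, are aligned to their size.
        return cl_size(t)
    if isinstance(t, Array):
        return cl_alignment(t.elem)
    if t.packed:
        return 1
    return max([1] + [cl_alignment(f) for f in t.fields])


char, i32, f32 = Scalar(1), Scalar(4), Scalar(4)
float3 = Vector(f32, 3)
fields = [char, i32, float3]      # struct { char c; int i; float3 v; }
plain = Struct(fields)
packed = Struct(fields, packed=True)

print(cl_size(plain), cl_alignment(plain))
print(cl_size(packed), cl_alignment(packed))
```

In the unpacked case the int lands at offset 4 and the float3 at offset 16, while the packed variant simply sums the member sizes.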
[Mesa-dev] [PATCH 07/22] glsl: add packed for struct types
We need this for OpenCL kernels because we have to apply C rules for alignment and padding inside structs and for this we also have to know if a struct is packed or not. Signed-off-by: Karol Herbst --- src/compiler/glsl_types.cpp | 17 +++-- src/compiler/glsl_types.h | 12 ++-- src/compiler/nir_types.cpp| 5 +++-- src/compiler/nir_types.h | 3 ++- src/compiler/spirv/spirv_to_nir.c | 10 +- src/compiler/spirv/vtn_private.h | 7 +++ 6 files changed, 42 insertions(+), 12 deletions(-) diff --git a/src/compiler/glsl_types.cpp b/src/compiler/glsl_types.cpp index e6262371bd0..9b1fd809b41 100644 --- a/src/compiler/glsl_types.cpp +++ b/src/compiler/glsl_types.cpp @@ -91,11 +91,11 @@ glsl_type::glsl_type(GLenum gl_type, glsl_base_type base_type, } glsl_type::glsl_type(const glsl_struct_field *fields, unsigned num_fields, - const char *name) : + const char *name, bool packed) : gl_type(0), base_type(GLSL_TYPE_STRUCT), sampled_type(GLSL_TYPE_VOID), sampler_dimensionality(0), sampler_shadow(0), sampler_array(0), - interface_packing(0), interface_row_major(0), + interface_packing(0), interface_row_major(0), packed(packed), vector_elements(0), matrix_columns(0), length(num_fields) { @@ -1004,9 +1004,10 @@ glsl_type::record_key_hash(const void *a) const glsl_type * glsl_type::get_record_instance(const glsl_struct_field *fields, unsigned num_fields, - const char *name) + const char *name, + bool packed) { - const glsl_type key(fields, num_fields, name); + const glsl_type key(fields, num_fields, name, packed); mtx_lock(_type::hash_mutex); @@ -1018,7 +1019,7 @@ glsl_type::get_record_instance(const glsl_struct_field *fields, const struct hash_entry *entry = _mesa_hash_table_search(record_types, ); if (entry == NULL) { - const glsl_type *t = new glsl_type(fields, num_fields, name); + const glsl_type *t = new glsl_type(fields, num_fields, name, packed); entry = _mesa_hash_table_insert(record_types, t, (void *) t); } @@ -1026,6 +1027,7 @@ glsl_type::get_record_instance(const glsl_struct_field 
*fields, assert(((glsl_type *) entry->data)->base_type == GLSL_TYPE_STRUCT); assert(((glsl_type *) entry->data)->length == num_fields); assert(strcmp(((glsl_type *) entry->data)->name, name) == 0); + assert(((glsl_type *) entry->data)->packed == packed); mtx_unlock(_type::hash_mutex); @@ -2138,6 +2140,8 @@ encode_type_to_blob(struct blob *blob, const glsl_type *type) if (type->is_interface()) { blob_write_uint32(blob, type->interface_packing); blob_write_uint32(blob, type->interface_row_major); + } else { + blob_write_uint32(blob, type->packed); } return; case GLSL_TYPE_VOID: @@ -2217,7 +2221,8 @@ decode_type_from_blob(struct blob_reader *blob) t = glsl_type::get_interface_instance(fields, num_fields, packing, row_major, name); } else { - t = glsl_type::get_record_instance(fields, num_fields, name); + unsigned packed = blob_read_uint32(blob); + t = glsl_type::get_record_instance(fields, num_fields, name, packed); } free(fields); diff --git a/src/compiler/glsl_types.h b/src/compiler/glsl_types.h index d32b580acc1..f2163728610 100644 --- a/src/compiler/glsl_types.h +++ b/src/compiler/glsl_types.h @@ -176,6 +176,13 @@ struct glsl_type { unsigned interface_packing:2; unsigned interface_row_major:1; + /** +* For \c GLSL_TYPE_STRUCT this specifies if the struct is packed or not. 
+* +* Only used for Compute kernels +*/ + unsigned packed:1; + private: glsl_type() : mem_ctx(NULL) { @@ -299,7 +306,8 @@ public: */ static const glsl_type *get_record_instance(const glsl_struct_field *fields, unsigned num_fields, - const char *name); + const char *name, + bool packed = false); /** * Get the instance of an interface block type @@ -888,7 +896,7 @@ private: /** Constructor for record types */ glsl_type(const glsl_struct_field *fields, unsigned num_fields, -const char *name); +const char *name, bool packed = false); /** Constructor for interface types */ glsl_type(const glsl_struct_field *fields, unsigned num_fields, diff --git a/src/compiler/nir_types.cpp b/src/compiler/nir_types.cpp index 3cd61f66056..506dabdeb1d 100644 --- a/src/compiler/nir_types.cpp +++ b/src/compiler/nir_types.cpp @@ -439,9 +439,10 @@ glsl_array_type(const
[Mesa-dev] [PATCH 13/22] nir/spirv: parse memory model
Signed-off-by: Karol Herbst
---
 src/compiler/nir/nir.h            |  8 ++++++++
 src/compiler/nir/nir_clone.c      |  1 +
 src/compiler/nir/nir_serialize.c  |  2 ++
 src/compiler/spirv/spirv_to_nir.c | 15 +++++++++++++--
 4 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 11e3d18320a..be4f64464f9 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2204,6 +2204,14 @@ typedef struct nir_shader {
     */
    void *constant_data;
    unsigned constant_data_size;
+
+   /**
+    * pointer size is:
+    *   AddressingModelLogical:    0 (default)
+    *   AddressingModelPhysical32: 32
+    *   AddressingModelPhysical64: 64
+    */
+   unsigned ptr_size;
 } nir_shader;
 
 static inline nir_function_impl *
diff --git a/src/compiler/nir/nir_clone.c b/src/compiler/nir/nir_clone.c
index 989c5051a54..d47d3e8cb72 100644
--- a/src/compiler/nir/nir_clone.c
+++ b/src/compiler/nir/nir_clone.c
@@ -733,6 +733,7 @@ nir_shader_clone(void *mem_ctx, const nir_shader *s)
    ns->num_uniforms = s->num_uniforms;
    ns->num_outputs = s->num_outputs;
    ns->num_shared = s->num_shared;
+   ns->ptr_size = s->ptr_size;
 
    ns->constant_data_size = s->constant_data_size;
    if (s->constant_data_size > 0) {
diff --git a/src/compiler/nir/nir_serialize.c b/src/compiler/nir/nir_serialize.c
index 43016310048..5ec6972b02a 100644
--- a/src/compiler/nir/nir_serialize.c
+++ b/src/compiler/nir/nir_serialize.c
@@ -1106,6 +1106,7 @@ nir_serialize(struct blob *blob, const nir_shader *nir)
    blob_write_uint32(blob, nir->num_uniforms);
    blob_write_uint32(blob, nir->num_outputs);
    blob_write_uint32(blob, nir->num_shared);
+   blob_write_uint32(blob, nir->ptr_size);
 
    blob_write_uint32(blob, exec_list_length(&nir->functions));
    nir_foreach_function(fxn, nir) {
@@ -1165,6 +1166,7 @@ nir_deserialize(void *mem_ctx,
    ctx.nir->num_uniforms = blob_read_uint32(blob);
    ctx.nir->num_outputs = blob_read_uint32(blob);
    ctx.nir->num_shared = blob_read_uint32(blob);
+   ctx.nir->ptr_size = blob_read_uint32(blob);
 
    unsigned num_functions = blob_read_uint32(blob);
    for (unsigned i = 0; i < num_functions; i++)
diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
index db2ee51340c..e597b2462cb 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -3588,9 +3588,20 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, SpvOp opcode,
       break;
 
    case SpvOpMemoryModel:
+      if (w[2] == SpvMemoryModelOpenCL) {
+         if (w[1] == SpvAddressingModelPhysical32)
+            b->shader->ptr_size = 32;
+         else if (w[1] == SpvAddressingModelPhysical64)
+            b->shader->ptr_size = 64;
+         else
+            vtn_fail("Couldn't parse OpenCL Memory Model");
+         break;
+      }
+
       vtn_assert(w[1] == SpvAddressingModelLogical);
       vtn_assert(w[2] == SpvMemoryModelSimple ||
                  w[2] == SpvMemoryModelGLSL450);
+      b->shader->ptr_size = 0;
       break;
 
    case SpvOpEntryPoint:
@@ -4265,6 +4276,8 @@ spirv_to_nir(const uint32_t *words, size_t word_count,
    /* Skip the SPIR-V header, handled at vtn_create_builder */
    words+= 5;
 
+   b->shader = nir_shader_create(b, stage, nir_options, NULL);
+
    /* Handle all the preamble instructions */
    words = vtn_foreach_instruction(b, words, word_end,
                                    vtn_handle_preamble_instruction);
@@ -4275,8 +4288,6 @@ spirv_to_nir(const uint32_t *words, size_t word_count,
       return NULL;
    }
 
-   b->shader = nir_shader_create(b, stage, nir_options, NULL);
-
    /* Set shader info defaults */
    b->shader->info.gs.invocations = 1;
 
-- 
2.19.1

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
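For readers who want the OpMemoryModel decision from the patch above in isolation, here is a standalone sketch of the same mapping. It is not Mesa code: the enum names and ptr_size_for_memory_model() are hypothetical stand-ins, the numeric values follow the SPIR-V enumerant tables, and the sketch returns -1 where the patch would hit vtn_fail() or a vtn_assert().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SPIR-V enumerants (values per the spec). */
enum addressing_model { ADDR_LOGICAL = 0, ADDR_PHYSICAL32 = 1, ADDR_PHYSICAL64 = 2 };
enum memory_model { MEM_SIMPLE = 0, MEM_GLSL450 = 1, MEM_OPENCL = 2 };

/*
 * Returns the pointer size in bits (0 means logical addressing), or -1 for
 * a combination this sketch rejects.
 */
static int
ptr_size_for_memory_model(uint32_t addressing, uint32_t memory)
{
   if (memory == MEM_OPENCL) {
      /* OpenCL kernels need a physical addressing model. */
      if (addressing == ADDR_PHYSICAL32)
         return 32;
      if (addressing == ADDR_PHYSICAL64)
         return 64;
      return -1;
   }

   /* Graphics-style memory models only go with logical addressing. */
   if (addressing != ADDR_LOGICAL)
      return -1;
   if (memory != MEM_SIMPLE && memory != MEM_GLSL450)
      return -1;
   return 0;
}
```

In the patch the result is stored in nir_shader::ptr_size, which is why nir_shader_create() has to move in front of the preamble pass.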
[Mesa-dev] [PATCH 03/22] nir/spirv: initial handling of OpenCL.std extension opcodes
Not complete, mostly just adding things as I encounter them in CTS. But not getting far enough yet to hit most of the OpenCL.std instructions. Anyway, this is better than nothing and covers the most common builtins. Signed-off-by: Karol Herbst --- src/compiler/nir/meson.build | 1 + src/compiler/nir/nir_builtin_builder.c | 249 +- src/compiler/nir/nir_builtin_builder.h | 150 - src/compiler/spirv/spirv_to_nir.c | 2 + src/compiler/spirv/vtn_alu.c | 15 ++ src/compiler/spirv/vtn_glsl450.c | 2 +- src/compiler/spirv/vtn_opencl.c| 284 + src/compiler/spirv/vtn_private.h | 3 + 8 files changed, 701 insertions(+), 5 deletions(-) create mode 100644 src/compiler/spirv/vtn_opencl.c diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build index b0c3a7feb31..00d7f56e6eb 100644 --- a/src/compiler/nir/meson.build +++ b/src/compiler/nir/meson.build @@ -206,6 +206,7 @@ files_libnir = files( '../spirv/vtn_amd.c', '../spirv/vtn_cfg.c', '../spirv/vtn_glsl450.c', + '../spirv/vtn_opencl.c', '../spirv/vtn_private.h', '../spirv/vtn_subgroup.c', '../spirv/vtn_variables.c', diff --git a/src/compiler/nir/nir_builtin_builder.c b/src/compiler/nir/nir_builtin_builder.c index 252a7691f36..e37915e92ca 100644 --- a/src/compiler/nir/nir_builtin_builder.c +++ b/src/compiler/nir/nir_builtin_builder.c @@ -21,11 +21,43 @@ * IN THE SOFTWARE. 
*/ +#include + #include "nir.h" #include "nir_builtin_builder.h" nir_ssa_def* -nir_cross(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y) +nir_iadd_sat(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y) +{ + int64_t max; + switch (x->bit_size) { + case 64: + max = INT64_MAX; + break; + case 32: + max = INT32_MAX; + break; + case 16: + max = INT16_MAX; + break; + case 8: + max = INT8_MAX; + break; + } + + nir_ssa_def *sum = nir_iadd(b, x, y); + + nir_ssa_def *hi = nir_bcsel(b, nir_ilt(b, sum, x), + nir_imm_intN_t(b, max, x->bit_size), sum); + + nir_ssa_def *lo = nir_bcsel(b, nir_ilt(b, x, sum), + nir_imm_intN_t(b, max + 1, x->bit_size), sum); + + return nir_bcsel(b, nir_ige(b, y, nir_imm_intN_t(b, 1, y->bit_size)), hi, lo); +} + +nir_ssa_def* +nir_cross3(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y) { unsigned yzx[3] = { 1, 2, 0 }; unsigned zxy[3] = { 2, 0, 1 }; @@ -36,6 +68,63 @@ nir_cross(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y) nir_swizzle(b, y, yzx, 3, true))); } +nir_ssa_def* +nir_cross4(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y) +{ + nir_ssa_def *cross = nir_cross3(b, x, y); + + return nir_vec4(b, + nir_channel(b, cross, 0), + nir_channel(b, cross, 1), + nir_channel(b, cross, 2), + nir_imm_intN_t(b, 0, cross->bit_size)); +} + +static nir_ssa_def* +nir_hadd(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y, bool sign) +{ + nir_ssa_def *imm1 = nir_imm_int(b, 1); + + nir_ssa_def *t0 = nir_ixor(b, x, y); + nir_ssa_def *t1 = nir_iand(b, x, y); + + nir_ssa_def *t2; + if (sign) + t2 = nir_ishr(b, t0, imm1); + else + t2 = nir_ushr(b, t0, imm1); + return nir_iadd(b, t1, t2); +} + +nir_ssa_def* +nir_ihadd(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y) +{ + return nir_hadd(b, x, y, true); +} + +nir_ssa_def* +nir_uhadd(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y) +{ + return nir_hadd(b, x, y, false); +} + +nir_ssa_def* +nir_length(nir_builder *b, nir_ssa_def *vec) +{ + nir_ssa_def *finf = nir_imm_floatN_t(b, INFINITY, vec->bit_size); + + nir_ssa_def *abs = 
nir_fabs(b, vec); + if (vec->num_components == 1) + return abs; + + nir_ssa_def *maxc = nir_fmax(b, nir_channel(b, abs, 0), nir_channel(b, abs, 1)); + for (int i = 2; i < vec->num_components; ++i) + maxc = nir_fmax(b, maxc, nir_channel(b, abs, i)); + abs = nir_fdiv(b, abs, maxc); + nir_ssa_def *res = nir_fmul(b, nir_fsqrt(b, nir_fdot(b, abs, abs)), maxc); + return nir_bcsel(b, nir_feq(b, maxc, finf), maxc, res); +} + nir_ssa_def* nir_fast_length(nir_builder *b, nir_ssa_def *vec) { @@ -49,6 +138,107 @@ nir_fast_length(nir_builder *b, nir_ssa_def *vec) } } +nir_ssa_def* +nir_nextafter(nir_builder *b, nir_ssa_def *x, nir_ssa_def *y) +{ + nir_ssa_def *zero = nir_imm_intN_t(b, 0, x->bit_size); + nir_ssa_def *one = nir_imm_intN_t(b, 1, x->bit_size); + nir_ssa_def *nzero = nir_imm_intN_t(b, 1ull << (x->bit_size - 1), x->bit_size); + + nir_ssa_def *condeq = nir_feq(b, x, y); + nir_ssa_def *conddir = nir_flt(b, x, y); + nir_ssa_def *condnzero = nir_feq(b, x, nzero); + + // beware of -0.0 - 1 == NaN + nir_ssa_def *xn = + nir_bcsel(b, +condnzero, +nir_imm_intN_t(b, (1 << (x->bit_size - 1)) + 1, x->bit_size), +nir_isub(b, x, one)); + + // beware of -0.0 + 1 == -0x1p-149 +
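As a side note on the integer builtins in this patch: the halving add and the saturating add can be checked on plain C scalars. The following is a sketch, not the NIR code; uhadd32() and iadd_sat32() are hypothetical names, and the saturating add assumes the usual two's-complement wrap when converting the unsigned sum back to int32_t (implementation-defined in C, but what mainstream compilers do).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Unsigned halving add: floor((x + y) / 2) without overflowing.
 * Same identity as nir_hadd() above: the AND keeps the bits both inputs
 * share, the XOR keeps the differing bits, and only the latter are halved.
 */
static uint32_t
uhadd32(uint32_t x, uint32_t y)
{
   return (x & y) + ((x ^ y) >> 1);
}

/*
 * Signed saturating add, following the shape of nir_iadd_sat() above:
 * compute the wrapped sum, then pick the clamp based on the sign of y.
 */
static int32_t
iadd_sat32(int32_t x, int32_t y)
{
   /* Wrapping add done in unsigned arithmetic to avoid signed overflow. */
   int32_t sum = (int32_t)((uint32_t)x + (uint32_t)y);

   if (y >= 1)
      return sum < x ? INT32_MAX : sum;   /* wrapped past the top */
   return x < sum ? INT32_MIN : sum;      /* wrapped past the bottom */
}
```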
[Mesa-dev] [PATCH 12/22] nir: add type alignment support to lower_io
From: Rob Clark

For CL we can have structs with 8/16/32/64-bit scalar types (as well
as, of course, arrays, structs, etc.), which are padded according to C
rules. So when lowering struct derefs we need to consider not just a
field's size but also its alignment.

Signed-off-by: Karol Herbst
---
 src/compiler/nir/nir.h          | 10 +++
 src/compiler/nir/nir_lower_io.c | 52 -
 2 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index c469e111b2c..11e3d18320a 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2825,10 +2825,20 @@ typedef enum {
     */
    nir_lower_io_force_sample_interpolation = (1 << 1),
 } nir_lower_io_options;
+typedef struct nir_memory_model {
+   int (*type_size)(const struct glsl_type *);
+   int (*type_align)(const struct glsl_type *);
+} nir_memory_model;
 bool nir_lower_io(nir_shader *shader,
                   nir_variable_mode modes,
                   int (*type_size)(const struct glsl_type *),
                   nir_lower_io_options);
+// TEMP use different name to avoid fixing all the callers yet:
+bool nir_lower_io2(nir_shader *shader,
+                   nir_variable_mode modes,
+                   const nir_memory_model *mm,
+                   nir_lower_io_options);
+
 nir_src *nir_get_io_offset_src(nir_intrinsic_instr *instr);
 nir_src *nir_get_io_vertex_index_src(nir_intrinsic_instr *instr);
 
diff --git a/src/compiler/nir/nir_lower_io.c b/src/compiler/nir/nir_lower_io.c
index 2a6c284de2b..292baf9e4fc 100644
--- a/src/compiler/nir/nir_lower_io.c
+++ b/src/compiler/nir/nir_lower_io.c
@@ -38,7 +38,7 @@
 struct lower_io_state {
    void *dead_ctx;
    nir_builder builder;
-   int (*type_size)(const struct glsl_type *type);
+   const nir_memory_model *mm;
    nir_variable_mode modes;
    nir_lower_io_options options;
 };
@@ -86,12 +86,26 @@ nir_is_per_vertex_io(const nir_variable *var, gl_shader_stage stage)
    return false;
 }
 
+static int
+default_type_align(const struct glsl_type *type)
+{
+   return 1;
+}
+
+static inline int
+align(int value, int alignment)
+{
+   return (value + alignment - 1) & ~(alignment - 1);
+}
+
 static nir_ssa_def *
 get_io_offset(nir_deref_instr *deref, nir_ssa_def **vertex_index,
               struct lower_io_state *state, unsigned *component)
 {
    nir_builder *b = &state->builder;
-   int (*type_size)(const struct glsl_type *) = state->type_size;
+   int (*type_size)(const struct glsl_type *) = state->mm->type_size;
+   int (*type_align)(const struct glsl_type *) = state->mm->type_align ?
+      state->mm->type_align : default_type_align;
    nir_deref_path path;
    nir_deref_path_init(&path, deref, NULL);
 
@@ -137,7 +151,10 @@ get_io_offset(nir_deref_instr *deref, nir_ssa_def **vertex_index,
 
          unsigned field_offset = 0;
          for (unsigned i = 0; i < (*p)->strct.index; i++) {
-            field_offset += type_size(glsl_get_struct_field(parent->type, i));
+            const struct glsl_type *field_type =
+               glsl_get_struct_field(parent->type, i);
+            field_offset = align(field_offset, type_align(field_type));
+            field_offset += type_size(field_type);
          }
          offset = nir_iadd(b, offset, nir_imm_int(b, field_offset));
       } else {
@@ -207,7 +224,7 @@ lower_load(nir_intrinsic_instr *intrin, struct lower_io_state *state,
    nir_intrinsic_set_component(load, component);
 
    if (load->intrinsic == nir_intrinsic_load_uniform)
-      nir_intrinsic_set_range(load, state->type_size(var->type));
+      nir_intrinsic_set_range(load, state->mm->type_size(var->type));
 
    if (vertex_index) {
       load->src[0] = nir_src_for_ssa(vertex_index);
@@ -488,10 +505,8 @@ nir_lower_io_block(nir_block *block,
 }
 
 static bool
-nir_lower_io_impl(nir_function_impl *impl,
-                  nir_variable_mode modes,
-                  int (*type_size)(const struct glsl_type *),
-                  nir_lower_io_options options)
+nir_lower_io_impl(nir_function_impl *impl, nir_variable_mode modes,
+                  const nir_memory_model *mm, nir_lower_io_options options)
 {
    struct lower_io_state state;
    bool progress = false;
@@ -499,7 +514,7 @@ nir_lower_io_impl(nir_function_impl *impl,
    nir_builder_init(&state.builder, impl);
    state.dead_ctx = ralloc_context(NULL);
    state.modes = modes;
-   state.type_size = type_size;
+   state.mm = mm;
    state.options = options;
 
    nir_foreach_block(block, impl) {
@@ -514,22 +529,33 @@ nir_lower_io_impl(nir_function_impl *impl,
 }
 
 bool
-nir_lower_io(nir_shader *shader, nir_variable_mode modes,
-             int (*type_size)(const struct glsl_type *),
-             nir_lower_io_options options)
+nir_lower_io2(nir_shader *shader, nir_variable_mode modes,
+              const nir_memory_model *mm, nir_lower_io_options options)
 {
    bool progress = false;
 
    nir_foreach_function(function, shader) {
       if (function->impl) {
          progress
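To make the "align, then add size" rule from this patch concrete, here is a small standalone sketch of C-style field offsets. It is not the lower_io code: struct field_layout, align_pot() and field_offset() are hypothetical names, and alignments are assumed to be powers of two (the mask trick in the patch's align() helper relies on the same assumption). One deliberate difference: the sketch also rounds up to the alignment of the selected field itself, which the loop as quoted in the hunk above does not appear to do; that extra step is an assumption about the intended C layout, not something taken from the patch.

```c
#include <assert.h>

/* Size and alignment of one struct member, in bytes (hypothetical type). */
struct field_layout {
   unsigned size;
   unsigned align;
};

/* Round value up to the next multiple of a power-of-two alignment. */
static unsigned
align_pot(unsigned value, unsigned alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Byte offset of fields[index] when members are laid out by C rules. */
static unsigned
field_offset(const struct field_layout *fields, unsigned index)
{
   unsigned offset = 0;

   /* Walk the preceding members: pad to each one's alignment, then skip it. */
   for (unsigned i = 0; i < index; i++) {
      offset = align_pot(offset, fields[i].align);
      offset += fields[i].size;
   }

   /* Assumption of this sketch: the selected member is aligned as well. */
   return align_pot(offset, fields[index].align);
}
```

For a layout like { char; int32; int16; int64 } with natural alignment this yields offsets 0, 4, 8 and 16. Without the final rounding step the second member would land at offset 1.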
[Mesa-dev] [PATCH 17/22] nir: rename global to private memory
The naming is a bit confusing no matter how you look at it. Within
OpenCL, "global" memory is memory accessible from all threads. In GLSL,
"global" memory normally refers to shader-thread-private memory declared
at global scope.

As we already use "shared" for memory shared across all threads of a
work group, the solution everybody could be happy with is to rename
"global" to "private" and use "global" later for memory usually stored
within system-accessible memory (be it VRAM or system RAM, if keeping
SVM in mind).

Signed-off-by: Karol Herbst
---
 src/compiler/glsl/glsl_to_nir.cpp                 |  4 ++--
 src/compiler/nir/nir.c                            |  2 +-
 src/compiler/nir/nir.h                            |  2 +-
 src/compiler/nir/nir_linking_helpers.c            |  2 +-
 .../nir/nir_lower_constant_initializers.c         |  2 +-
 .../nir/nir_lower_global_vars_to_local.c          |  4 ++--
 src/compiler/nir/nir_lower_io_to_temporaries.c    |  2 +-
 src/compiler/nir/nir_opt_copy_prop_vars.c         |  4 ++--
 src/compiler/nir/nir_opt_dead_write_vars.c        |  2 +-
 src/compiler/nir/nir_print.c                      |  4 ++--
 src/compiler/nir/nir_remove_dead_variables.c      |  4 ++--
 src/compiler/nir/nir_split_vars.c                 | 16 ++++++++--------
 src/compiler/nir/tests/vars_tests.cpp             |  2 +-
 src/compiler/spirv/vtn_private.h                  |  2 +-
 src/compiler/spirv/vtn_variables.c                |  6 +++---
 src/gallium/auxiliary/nir/tgsi_to_nir.c           |  2 +-
 src/mesa/state_tracker/st_glsl_to_nir.cpp         |  2 +-
 17 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/src/compiler/glsl/glsl_to_nir.cpp b/src/compiler/glsl/glsl_to_nir.cpp
index 0479f8fcfe4..8564cd89b5a 100644
--- a/src/compiler/glsl/glsl_to_nir.cpp
+++ b/src/compiler/glsl/glsl_to_nir.cpp
@@ -312,7 +312,7 @@ nir_visitor::visit(ir_variable *ir)
    case ir_var_auto:
    case ir_var_temporary:
       if (is_global)
-         var->data.mode = nir_var_global;
+         var->data.mode = nir_var_private;
       else
          var->data.mode = nir_var_local;
       break;
@@ -1433,7 +1433,7 @@ nir_visitor::visit(ir_expression *ir)
           * sense, we'll just turn it into a load which will probably
           * eventually end up as an SSA definition.
           */
-         assert(this->deref->mode == nir_var_global);
+         assert(this->deref->mode == nir_var_private);
          op = nir_intrinsic_load_deref;
       }
 
diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c
index 249b9357c3f..27f5d1b7bca 100644
--- a/src/compiler/nir/nir.c
+++ b/src/compiler/nir/nir.c
@@ -129,7 +129,7 @@ nir_shader_add_variable(nir_shader *shader, nir_variable *var)
       assert(!"nir_shader_add_variable cannot be used for local variables");
       break;
 
-   case nir_var_global:
+   case nir_var_private:
       exec_list_push_tail(&shader->globals, &var->node);
       break;
 
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 89c28e36618..78f3204d3e2 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -96,7 +96,7 @@ typedef struct {
 typedef enum {
    nir_var_shader_in       = (1 << 0),
    nir_var_shader_out      = (1 << 1),
-   nir_var_global          = (1 << 2),
+   nir_var_private         = (1 << 2),
    nir_var_local           = (1 << 3),
    nir_var_uniform         = (1 << 4),
    nir_var_shader_storage  = (1 << 5),
diff --git a/src/compiler/nir/nir_linking_helpers.c b/src/compiler/nir/nir_linking_helpers.c
index a05890ada43..d8358e08e5a 100644
--- a/src/compiler/nir/nir_linking_helpers.c
+++ b/src/compiler/nir/nir_linking_helpers.c
@@ -134,7 +134,7 @@ nir_remove_unused_io_vars(nir_shader *shader, struct exec_list *var_list,
          if (!(other_stage & get_variable_io_mask(var, shader->info.stage))) {
             /* This one is invalid, make it a global variable instead */
             var->data.location = 0;
-            var->data.mode = nir_var_global;
+            var->data.mode = nir_var_private;
 
             exec_node_remove(&var->node);
             exec_list_push_tail(&shader->globals, &var->node);
diff --git a/src/compiler/nir/nir_lower_constant_initializers.c b/src/compiler/nir/nir_lower_constant_initializers.c
index 4e9cea46157..932a32b3c9c 100644
--- a/src/compiler/nir/nir_lower_constant_initializers.c
+++ b/src/compiler/nir/nir_lower_constant_initializers.c
@@ -98,7 +98,7 @@ nir_lower_constant_initializers(nir_shader *shader, nir_variable_mode modes)
    if (modes & nir_var_shader_out)
       progress |= lower_const_initializer(&builder, &shader->outputs);
 
-   if (modes & nir_var_global)
+   if (modes & nir_var_private)
       progress |= lower_const_initializer(&builder, &shader->globals);
 
    if (modes & nir_var_system_value)
diff --git a/src/compiler/nir/nir_lower_global_vars_to_local.c b/src/compiler/nir/nir_lower_global_vars_to_local.c
index be99cf9ad02..6c6d9a9d25c 100644
--- a/src/compiler/nir/nir_lower_global_vars_to_local.c
+++ b/src/compiler/nir/nir_lower_global_vars_to_local.c
@@ -36,7 +36,7 @@
 static void
[Mesa-dev] [PATCH 04/22] nir/spirv: add OpIsFinite and OpIsNormal
From: Rob Clark

changes by Karol:
v2: make compatible with 64 bit floats
    fix isfinite
v3: use snake_case.

Signed-off-by: Karol Herbst
---
 src/compiler/spirv/vtn_alu.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/src/compiler/spirv/vtn_alu.c b/src/compiler/spirv/vtn_alu.c
index b1492c1501a..ea25d4bcbdc 100644
--- a/src/compiler/spirv/vtn_alu.c
+++ b/src/compiler/spirv/vtn_alu.c
@@ -583,6 +583,38 @@ vtn_handle_alu(struct vtn_builder *b, SpvOp opcode,
       break;
    }
 
+   case SpvOpIsFinite: {
+      nir_ssa_def *inf = nir_imm_floatN_t(&b->nb, INFINITY, src[0]->bit_size);
+      nir_ssa_def *is_number = nir_feq(&b->nb, src[0], src[0]);
+      nir_ssa_def *is_not_inf = nir_ine(&b->nb, nir_fabs(&b->nb, src[0]), inf);
+      val->ssa->def = nir_iand(&b->nb, is_number, is_not_inf);
+      break;
+   }
+
+   case SpvOpIsNormal: {
+      unsigned bit_size = src[0]->bit_size;
+
+      uint32_t m;
+      if (bit_size == 64)
+         m = 11;
+      else if (bit_size == 32)
+         m = 8;
+      else if (bit_size == 16)
+         m = 5;
+      else
+         assert(!"unknown float type");
+
+      nir_ssa_def *shift = nir_imm_int(&b->nb, bit_size - m - 1);
+      nir_ssa_def *abs = nir_fabs(&b->nb, src[0]);
+      nir_ssa_def *exp = nir_iadd(&b->nb,
+                                  nir_ushr(&b->nb, abs, shift),
+                                  nir_imm_intN_t(&b->nb, -1, bit_size));
+      val->ssa->def = nir_ult(&b->nb,
+                              exp,
+                              nir_imm_intN_t(&b->nb, (1 << m) - 2, bit_size));
+      break;
+   }
+
    case SpvOpFUnordEqual:
    case SpvOpFUnordNotEqual:
    case SpvOpFUnordLessThan:
-- 
2.19.1
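The exponent trick used for OpIsNormal can be tried on ordinary C floats. Below is a standalone sketch for the 32-bit case only, not the vtn code: is_normal_f32() and is_finite_f32() are hypothetical helper names, and the float is assumed to be IEEE 754 binary32. With the sign bit cleared, shifting right by (bit_size - m - 1) = 23 leaves the biased exponent e in 0..255; subtracting 1 in unsigned arithmetic wraps e == 0 (zero and denormals) to a huge value and maps e == 255 (inf, NaN) to 254, so a single unsigned compare against (1 << m) - 2 = 254 accepts exactly e in 1..254.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True for normal numbers; false for zero, denormals, infinities and NaN. */
static bool
is_normal_f32(float f)
{
   const unsigned bit_size = 32;
   const unsigned m = 8;               /* exponent width of binary32 */
   uint32_t bits;

   memcpy(&bits, &f, sizeof(bits));    /* reinterpret without aliasing UB */

   uint32_t abs_bits = bits & 0x7fffffffu;               /* like fabs() */
   uint32_t exp = (abs_bits >> (bit_size - m - 1)) - 1u; /* e - 1, wraps at 0 */

   return exp < (1u << m) - 2u;
}

/* Same idea as the OpIsFinite case: x == x rejects NaN, then rule out inf. */
static bool
is_finite_f32(float f)
{
   return f == f && f != INFINITY && f != -INFINITY;
}
```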
[Mesa-dev] [PATCH 10/22] nir/vtn: add caps for some cl related capabilities
From: Rob Clark

vtn supports these, so don't squawk if the user is happy with enabling
them.

Signed-off-by: Karol Herbst
---
 src/compiler/shader_info.h         |  3 +++
 src/compiler/spirv/spirv_to_nir.c  | 16 +++++++++++++---
 src/compiler/spirv/vtn_variables.c |  6 ++++--
 3 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/src/compiler/shader_info.h b/src/compiler/shader_info.h
index 65bc0588d67..5286cf8fc5f 100644
--- a/src/compiler/shader_info.h
+++ b/src/compiler/shader_info.h
@@ -62,6 +62,9 @@ struct spirv_supported_capabilities {
    bool post_depth_coverage;
    bool transform_feedback;
    bool geometry_streams;
+   bool address;
+   bool kernel;
+   bool int8;
 };
 
 typedef struct shader_info {
diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
index d7dd5a67cc4..db2ee51340c 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -792,8 +792,10 @@ struct_member_decoration_cb(struct vtn_builder *b,
    case SpvDecorationFPRoundingMode:
    case SpvDecorationFPFastMathMode:
    case SpvDecorationAlignment:
-      vtn_warn("Decoration only allowed for CL-style kernels: %s",
-               spirv_decoration_to_string(dec->decoration));
+      if (!b->kernel_mode) {
+         vtn_warn("Decoration only allowed for CL-style kernels: %s",
+                  spirv_decoration_to_string(dec->decoration));
+      }
       break;
 
    case SpvDecorationHlslSemanticGOOGLE:
@@ -3428,7 +3430,6 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, SpvOp opcode,
       case SpvCapabilityFloat16:
       case SpvCapabilityInt64Atomics:
       case SpvCapabilityStorageImageMultisample:
-      case SpvCapabilityInt8:
       case SpvCapabilitySparseResidency:
       case SpvCapabilityMinLod:
          vtn_warn("Unsupported SPIR-V capability: %s",
@@ -3457,8 +3458,17 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, SpvOp opcode,
          spv_check_supported(geometry_streams, cap);
          break;
 
+      case SpvCapabilityInt8:
+         spv_check_supported(int8, cap);
+         break;
+
       case SpvCapabilityAddresses:
+         spv_check_supported(address, cap);
+         break;
+
       case SpvCapabilityKernel:
+         spv_check_supported(kernel, cap);
+         break;
+
       case SpvCapabilityImageBasic:
       case SpvCapabilityImageReadWrite:
       case SpvCapabilityImageMipmap:
diff --git a/src/compiler/spirv/vtn_variables.c b/src/compiler/spirv/vtn_variables.c
index c5cf345d02a..e7654b768af 100644
--- a/src/compiler/spirv/vtn_variables.c
+++ b/src/compiler/spirv/vtn_variables.c
@@ -1371,8 +1371,10 @@ apply_var_decoration(struct vtn_builder *b,
    case SpvDecorationFPRoundingMode:
    case SpvDecorationFPFastMathMode:
    case SpvDecorationAlignment:
-      vtn_warn("Decoration only allowed for CL-style kernels: %s",
-               spirv_decoration_to_string(dec->decoration));
+      if (!b->kernel_mode) {
+         vtn_warn("Decoration only allowed for CL-style kernels: %s",
+                  spirv_decoration_to_string(dec->decoration));
+      }
       break;
 
    case SpvDecorationHlslSemanticGOOGLE:
-- 
2.19.1
[Mesa-dev] [PATCH 15/22] nir: add support for address bit sized system values
Signed-off-by: Karol Herbst
---
 src/amd/vulkan/radv_meta_buffer.c             |  8 ++--
 src/amd/vulkan/radv_meta_bufimage.c           | 16
 src/amd/vulkan/radv_meta_fast_clear.c         |  4 +-
 src/amd/vulkan/radv_meta_resolve_cs.c         |  4 +-
 src/amd/vulkan/radv_query.c                   |  8 ++--
 src/compiler/nir/nir_builder_opcodes_h.py     | 15 ++-
 src/compiler/nir/nir_intrinsics.py            | 10 ++---
 src/compiler/nir/nir_lower_system_values.c    | 40 ---
 src/gallium/auxiliary/nir/tgsi_to_nir.c       |  2 +-
 src/gallium/drivers/vc4/vc4_nir_lower_blend.c |  4 +-
 src/intel/compiler/brw_nir.c                  |  2 +-
 11 files changed, 67 insertions(+), 46 deletions(-)

diff --git a/src/amd/vulkan/radv_meta_buffer.c b/src/amd/vulkan/radv_meta_buffer.c
index 76854d7bbad..208988c3775 100644
--- a/src/amd/vulkan/radv_meta_buffer.c
+++ b/src/amd/vulkan/radv_meta_buffer.c
@@ -15,8 +15,8 @@ build_buffer_fill_shader(struct radv_device *dev)
 	b.shader->info.cs.local_size[1] = 1;
 	b.shader->info.cs.local_size[2] = 1;
 
-	nir_ssa_def *invoc_id = nir_load_local_invocation_id(&b);
-	nir_ssa_def *wg_id = nir_load_work_group_id(&b);
+	nir_ssa_def *invoc_id = nir_load_local_invocation_id(&b, 32);
+	nir_ssa_def *wg_id = nir_load_work_group_id(&b, 32);
 	nir_ssa_def *block_size = nir_imm_ivec4(&b,
 						b.shader->info.cs.local_size[0],
 						b.shader->info.cs.local_size[1],
@@ -67,8 +67,8 @@ build_buffer_copy_shader(struct radv_device *dev)
 	b.shader->info.cs.local_size[1] = 1;
 	b.shader->info.cs.local_size[2] = 1;
 
-	nir_ssa_def *invoc_id = nir_load_local_invocation_id(&b);
-	nir_ssa_def *wg_id = nir_load_work_group_id(&b);
+	nir_ssa_def *invoc_id = nir_load_local_invocation_id(&b, 32);
+	nir_ssa_def *wg_id = nir_load_work_group_id(&b, 32);
 	nir_ssa_def *block_size = nir_imm_ivec4(&b,
 						b.shader->info.cs.local_size[0],
 						b.shader->info.cs.local_size[1],
diff --git a/src/amd/vulkan/radv_meta_bufimage.c b/src/amd/vulkan/radv_meta_bufimage.c
index f5b68f6c9a6..e79919a984b 100644
--- a/src/amd/vulkan/radv_meta_bufimage.c
+++ b/src/amd/vulkan/radv_meta_bufimage.c
@@ -60,8 +60,8 @@ build_nir_itob_compute_shader(struct radv_device *dev, bool is_3d)
 	output_img->data.descriptor_set = 0;
 	output_img->data.binding = 1;
 
-	nir_ssa_def *invoc_id = nir_load_local_invocation_id(&b);
-	nir_ssa_def *wg_id = nir_load_work_group_id(&b);
+	nir_ssa_def *invoc_id = nir_load_local_invocation_id(&b, 32);
+	nir_ssa_def *wg_id = nir_load_work_group_id(&b, 32);
 	nir_ssa_def *block_size = nir_imm_ivec4(&b,
 						b.shader->info.cs.local_size[0],
 						b.shader->info.cs.local_size[1],
@@ -289,8 +289,8 @@ build_nir_btoi_compute_shader(struct radv_device *dev, bool is_3d)
 	output_img->data.descriptor_set = 0;
 	output_img->data.binding = 1;
 
-	nir_ssa_def *invoc_id = nir_load_local_invocation_id(&b);
-	nir_ssa_def *wg_id = nir_load_work_group_id(&b);
+	nir_ssa_def *invoc_id = nir_load_local_invocation_id(&b, 32);
+	nir_ssa_def *wg_id = nir_load_work_group_id(&b, 32);
 	nir_ssa_def *block_size = nir_imm_ivec4(&b,
 						b.shader->info.cs.local_size[0],
 						b.shader->info.cs.local_size[1],
@@ -719,8 +719,8 @@ build_nir_itoi_compute_shader(struct radv_device *dev, bool is_3d)
 	output_img->data.descriptor_set = 0;
 	output_img->data.binding = 1;
 
-	nir_ssa_def *invoc_id = nir_load_local_invocation_id(&b);
-	nir_ssa_def *wg_id = nir_load_work_group_id(&b);
+	nir_ssa_def *invoc_id = nir_load_local_invocation_id(&b, 32);
+	nir_ssa_def *wg_id = nir_load_work_group_id(&b, 32);
 	nir_ssa_def *block_size = nir_imm_ivec4(&b,
 						b.shader->info.cs.local_size[0],
 						b.shader->info.cs.local_size[1],
@@ -1139,8 +1139,8 @@ build_nir_cleari_compute_shader(struct radv_device *dev, bool is_3d)
 	output_img->data.descriptor_set = 0;
 	output_img->data.binding = 0;
 
-	nir_ssa_def *invoc_id = nir_load_local_invocation_id(&b);
-	nir_ssa_def *wg_id = nir_load_work_group_id(&b);
+	nir_ssa_def *invoc_id = nir_load_local_invocation_id(&b, 32);
+	nir_ssa_def *wg_id = nir_load_work_group_id(&b, 32);
 	nir_ssa_def *block_size = nir_imm_ivec4(&b,
 						b.shader->info.cs.local_size[0],
 						b.shader->info.cs.local_size[1],
diff --git a/src/amd/vulkan/radv_meta_fast_clear.c b/src/amd/vulkan/radv_meta_fast_clear.c
index
[Mesa-dev] [PATCH 16/22] nir+vtn: vec8+vec16 support
This introduces new vec8 and vec16 instructions (which are the only
instructions taking more than 4 sources), in order to construct 8- and
16-component vectors.

To avoid fixing up the non-autogenerated nir_build_alu() call sites and
making them pass 16 src args for the benefit of the two instructions
that take more than 4 srcs (i.e. vec8 and vec16), nir_build_alu() has
nir_build_alu_tail() split out, which is reused by nir_build_alu2()
(used for the > 4 src args case).

Signed-off-by: Rob Clark
Signed-off-by: Karol Herbst
---
 src/compiler/nir/nir.h                       |  4 +-
 src/compiler/nir/nir_builder.h               | 58 +++-
 src/compiler/nir/nir_builder_opcodes_h.py    |  5 +-
 src/compiler/nir/nir_constant_expressions.py | 33 +--
 src/compiler/nir/nir_lower_alu_to_scalar.c   |  2 +
 src/compiler/nir/nir_opcodes.py              | 39 -
 src/compiler/nir/nir_print.c                 | 17 --
 src/compiler/nir/nir_search.c                |  8 ++-
 src/compiler/spirv/spirv_to_nir.c            |  4 +-
 9 files changed, 140 insertions(+), 30 deletions(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 3855eb0b582..89c28e36618 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -57,8 +57,8 @@ extern "C" {
 #define NIR_FALSE 0u
 #define NIR_TRUE (~0u)
 
-#define NIR_MAX_VEC_COMPONENTS 4
-typedef uint8_t nir_component_mask_t;
+#define NIR_MAX_VEC_COMPONENTS 16
+typedef uint16_t nir_component_mask_t;
 
 /** Defines a cast function
  *
diff --git a/src/compiler/nir/nir_builder.h b/src/compiler/nir/nir_builder.h
index 3271a480520..57f0a188c46 100644
--- a/src/compiler/nir/nir_builder.h
+++ b/src/compiler/nir/nir_builder.h
@@ -352,24 +352,12 @@ nir_imm_ivec4(nir_builder *build, int x, int y, int z, int w)
 }
 
 static inline nir_ssa_def *
-nir_build_alu(nir_builder *build, nir_op op, nir_ssa_def *src0,
-              nir_ssa_def *src1, nir_ssa_def *src2, nir_ssa_def *src3)
+nir_build_alu_tail(nir_builder *build, nir_alu_instr *instr)
 {
-   const nir_op_info *op_info = &nir_op_infos[op];
-   nir_alu_instr *instr = nir_alu_instr_create(build->shader, op);
-   if (!instr)
-      return NULL;
+   const nir_op_info *op_info = &nir_op_infos[instr->op];
 
    instr->exact = build->exact;
 
-   instr->src[0].src = nir_src_for_ssa(src0);
-   if (src1)
-      instr->src[1].src = nir_src_for_ssa(src1);
-   if (src2)
-      instr->src[2].src = nir_src_for_ssa(src2);
-   if (src3)
-      instr->src[3].src = nir_src_for_ssa(src3);
-
    /* Guess the number of components the destination temporary should have
     * based on our input sizes, if it's not fixed for the op.
     */
@@ -425,12 +413,54 @@ nir_build_alu(nir_builder *build, nir_op op, nir_ssa_def *src0,
    return &instr->dest.dest.ssa;
 }
 
+static inline nir_ssa_def *
+nir_build_alu(nir_builder *build, nir_op op, nir_ssa_def *src0,
+              nir_ssa_def *src1, nir_ssa_def *src2, nir_ssa_def *src3)
+{
+   nir_alu_instr *instr = nir_alu_instr_create(build->shader, op);
+   if (!instr)
+      return NULL;
+
+   instr->src[0].src = nir_src_for_ssa(src0);
+   if (src1)
+      instr->src[1].src = nir_src_for_ssa(src1);
+   if (src2)
+      instr->src[2].src = nir_src_for_ssa(src2);
+   if (src3)
+      instr->src[3].src = nir_src_for_ssa(src3);
+
+   return nir_build_alu_tail(build, instr);
+}
+
+/* for the couple special cases with more than 4 src args: */
+static inline nir_ssa_def *
+nir_build_alu2(nir_builder *build, nir_op op, nir_ssa_def **srcs)
+{
+   const nir_op_info *op_info = &nir_op_infos[op];
+   nir_alu_instr *instr = nir_alu_instr_create(build->shader, op);
+   if (!instr)
+      return NULL;
+
+   for (unsigned i = 0; i < op_info->num_inputs; i++)
+      instr->src[i].src = nir_src_for_ssa(srcs[i]);
+
+   return nir_build_alu_tail(build, instr);
+}
+
 #include "nir_builder_opcodes.h"
 
 static inline nir_ssa_def *
 nir_vec(nir_builder *build, nir_ssa_def **comp, unsigned num_components)
 {
    switch (num_components) {
+   case 16:
+      return nir_vec16(build, comp[0], comp[1], comp[2], comp[3],
+                       comp[4], comp[5], comp[6], comp[7],
+                       comp[8], comp[9], comp[10], comp[11],
+                       comp[12], comp[13], comp[14], comp[15]);
+   case 8:
+      return nir_vec8(build, comp[0], comp[1], comp[2], comp[3],
+                      comp[4], comp[5], comp[6], comp[7]);
    case 4:
       return nir_vec4(build, comp[0], comp[1], comp[2], comp[3]);
    case 3:
diff --git a/src/compiler/nir/nir_builder_opcodes_h.py b/src/compiler/nir/nir_builder_opcodes_h.py
index 84e5400958e..47edc02896c 100644
--- a/src/compiler/nir/nir_builder_opcodes_h.py
+++ b/src/compiler/nir/nir_builder_opcodes_h.py
@@ -31,14 +31,15 @@ def src_decl_list(num_srcs):
    return ', '.join('nir_ssa_def *src' + str(i) for i in range(num_srcs))
 
 def src_list(num_srcs):
-   return ', '.join('src' + str(i) if i < num_srcs else 'NULL' for i in range(4))
+   return ',
[Mesa-dev] [PATCH 08/22] glsl: add glsl_base_get_byte_size
Signed-off-by: Karol Herbst
---
 src/compiler/glsl_types.h | 34 ++
 src/compiler/nir_types.h  | 30 +-
 2 files changed, 35 insertions(+), 29 deletions(-)

diff --git a/src/compiler/glsl_types.h b/src/compiler/glsl_types.h
index f2163728610..efcbc70af26 100644
--- a/src/compiler/glsl_types.h
+++ b/src/compiler/glsl_types.h
@@ -1089,4 +1089,38 @@ glsl_align(unsigned int a, unsigned int align)
    return (a + align - 1) / align * align;
 }
 
+static inline unsigned
+glsl_base_get_byte_size(const enum glsl_base_type base_type)
+{
+   switch (base_type) {
+   case GLSL_TYPE_INT:
+   case GLSL_TYPE_UINT:
+   case GLSL_TYPE_BOOL:
+   case GLSL_TYPE_FLOAT: /* TODO handle mediump */
+   case GLSL_TYPE_SUBROUTINE:
+      return 4;
+
+   case GLSL_TYPE_FLOAT16:
+   case GLSL_TYPE_UINT16:
+   case GLSL_TYPE_INT16:
+      return 2;
+
+   case GLSL_TYPE_UINT8:
+   case GLSL_TYPE_INT8:
+      return 1;
+
+   case GLSL_TYPE_DOUBLE:
+   case GLSL_TYPE_INT64:
+   case GLSL_TYPE_UINT64:
+   case GLSL_TYPE_IMAGE:
+   case GLSL_TYPE_SAMPLER:
+      return 8;
+
+   default:
+      unreachable("unknown base type");
+   }
+
+   return 0;
+}
+
 #endif /* GLSL_TYPES_H */
diff --git a/src/compiler/nir_types.h b/src/compiler/nir_types.h
index 7080a23e1cc..c06d227e45a 100644
--- a/src/compiler/nir_types.h
+++ b/src/compiler/nir_types.h
@@ -94,35 +94,7 @@ unsigned glsl_atomic_size(const struct glsl_type *type);
 static inline unsigned
 glsl_get_bit_size(const struct glsl_type *type)
 {
-   switch (glsl_get_base_type(type)) {
-   case GLSL_TYPE_INT:
-   case GLSL_TYPE_UINT:
-   case GLSL_TYPE_BOOL:
-   case GLSL_TYPE_FLOAT: /* TODO handle mediump */
-   case GLSL_TYPE_SUBROUTINE:
-      return 32;
-
-   case GLSL_TYPE_FLOAT16:
-   case GLSL_TYPE_UINT16:
-   case GLSL_TYPE_INT16:
-      return 16;
-
-   case GLSL_TYPE_UINT8:
-   case GLSL_TYPE_INT8:
-      return 8;
-
-   case GLSL_TYPE_DOUBLE:
-   case GLSL_TYPE_INT64:
-   case GLSL_TYPE_UINT64:
-   case GLSL_TYPE_IMAGE:
-   case GLSL_TYPE_SAMPLER:
-      return 64;
-
-   default:
-      unreachable("unknown base type");
-   }
-
-   return 0;
+   return glsl_base_get_byte_size(glsl_get_base_type(type)) * 8;
 }
 
 bool glsl_type_is_16bit(const struct glsl_type *type);
-- 
2.19.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
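The relationship this patch sets up, with the bit size derived from a single byte-size table, can be sketched in isolation. The `demo_*` names and the reduced enum below are assumptions made for the example; they are not Mesa's definitions.

```c
#include <assert.h>

/* Hypothetical stand-in for enum glsl_base_type; only a few members shown. */
enum demo_base_type {
   DEMO_TYPE_UINT8,
   DEMO_TYPE_FLOAT16,
   DEMO_TYPE_FLOAT,
   DEMO_TYPE_DOUBLE,
};

/* Size in bytes, following the shape of glsl_base_get_byte_size(). */
static unsigned
demo_base_get_byte_size(enum demo_base_type base_type)
{
   switch (base_type) {
   case DEMO_TYPE_UINT8:   return 1;
   case DEMO_TYPE_FLOAT16: return 2;
   case DEMO_TYPE_FLOAT:   return 4;
   case DEMO_TYPE_DOUBLE:  return 8;
   }

   /* Plays the role of unreachable() in this sketch. */
   assert(!"unknown base type");
   return 0;
}

/* The bit size is derived from the byte size instead of a second table. */
static unsigned
demo_get_bit_size(enum demo_base_type base_type)
{
   return demo_base_get_byte_size(base_type) * 8;
}
```

Keeping one table means a newly added base type only has to be listed once.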
[Mesa-dev] [PATCH 06/22] vtn: handle SpvExecutionModelKernel
Signed-off-by: Karol Herbst
---
 src/compiler/spirv/spirv_to_nir.c | 3 +++
 src/compiler/spirv/vtn_private.h  | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
index 2c214324774..650eb6a977c 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -3318,6 +3318,9 @@ stage_for_execution_model(struct vtn_builder *b, SpvExecutionModel model)
       return MESA_SHADER_FRAGMENT;
    case SpvExecutionModelGLCompute:
       return MESA_SHADER_COMPUTE;
+   case SpvExecutionModelKernel:
+      b->kernel_mode = true;
+      return MESA_SHADER_COMPUTE;
    default:
       vtn_fail("Unsupported execution model");
    }
diff --git a/src/compiler/spirv/vtn_private.h b/src/compiler/spirv/vtn_private.h
index 643a88d1abe..df6356f50fe 100644
--- a/src/compiler/spirv/vtn_private.h
+++ b/src/compiler/spirv/vtn_private.h
@@ -605,6 +605,8 @@ struct vtn_builder {
    unsigned func_param_idx;
 
    bool has_loop_continue;
+
+   bool kernel_mode;
 };
 
 nir_ssa_def *
-- 
2.19.1
[Mesa-dev] [PATCH 11/22] nir: simplify get_io_offset() parameters
From: Rob Clark

For pointers we'll need to add another caller, plus a type_align() function pointer. So just simplify things and pass the lower_io_state to get_io_offset().

Signed-off-by: Karol Herbst
---
 src/compiler/nir/nir_lower_io.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/compiler/nir/nir_lower_io.c b/src/compiler/nir/nir_lower_io.c
index b3595bb19d5..2a6c284de2b 100644
--- a/src/compiler/nir/nir_lower_io.c
+++ b/src/compiler/nir/nir_lower_io.c
@@ -87,11 +87,11 @@ nir_is_per_vertex_io(const nir_variable *var, gl_shader_stage stage)
 }
 
 static nir_ssa_def *
-get_io_offset(nir_builder *b, nir_deref_instr *deref,
-              nir_ssa_def **vertex_index,
-              int (*type_size)(const struct glsl_type *),
-              unsigned *component)
+get_io_offset(nir_deref_instr *deref, nir_ssa_def **vertex_index,
+              struct lower_io_state *state, unsigned *component)
 {
+   nir_builder *b = &state->builder;
+   int (*type_size)(const struct glsl_type *) = state->type_size;
    nir_deref_path path;
    nir_deref_path_init(&path, deref, NULL);
 
@@ -421,8 +421,8 @@ nir_lower_io_block(nir_block *block,
          nir_ssa_def *vertex_index = NULL;
          unsigned component_offset = var->data.location_frac;
 
-         offset = get_io_offset(b, deref, per_vertex ? &vertex_index : NULL,
-                                state->type_size, &component_offset);
+         offset = get_io_offset(deref, per_vertex ? &vertex_index : NULL,
+                                state, &component_offset);
 
          nir_intrinsic_instr *replacement;
 
-- 
2.19.1
[Mesa-dev] [PATCH 18/22] nir/spirv: handle SpvStorageClassCrossWorkgroup
Signed-off-by: Karol Herbst
---
 src/compiler/nir/nir.c             | 4
 src/compiler/nir/nir.h             | 1 +
 src/compiler/nir/nir_print.c       | 2 ++
 src/compiler/spirv/vtn_private.h   | 1 +
 src/compiler/spirv/vtn_variables.c | 4
 5 files changed, 12 insertions(+)

diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c
index 27f5d1b7bca..ca258b7c80e 100644
--- a/src/compiler/nir/nir.c
+++ b/src/compiler/nir/nir.c
@@ -129,6 +129,10 @@ nir_shader_add_variable(nir_shader *shader, nir_variable *var)
       assert(!"nir_shader_add_variable cannot be used for local variables");
       break;
 
+   case nir_var_global:
+      assert(!"nir_shader_add_variable cannot be used for global memory");
+      break;
+
    case nir_var_private:
       exec_list_push_tail(&shader->globals, &var->node);
       break;
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 78f3204d3e2..35f2ec02c31 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -102,6 +102,7 @@ typedef enum {
    nir_var_shader_storage  = (1 << 5),
    nir_var_system_value    = (1 << 6),
    nir_var_shared          = (1 << 8),
+   nir_var_global          = (1 << 9),
    nir_var_all             = ~0,
 } nir_variable_mode;
 
diff --git a/src/compiler/nir/nir_print.c b/src/compiler/nir/nir_print.c
index 88f91087134..2fb041039c6 100644
--- a/src/compiler/nir/nir_print.c
+++ b/src/compiler/nir/nir_print.c
@@ -420,6 +420,8 @@ get_variable_mode_str(nir_variable_mode mode, bool want_local_global_mode)
       return want_local_global_mode ? "private" : "";
    case nir_var_local:
       return want_local_global_mode ? "local" : "";
+   case nir_var_global:
+      return want_local_global_mode ? "global" : "";
    default:
       return "";
    }
diff --git a/src/compiler/spirv/vtn_private.h b/src/compiler/spirv/vtn_private.h
index 86f98083f58..4dec2b66ff0 100644
--- a/src/compiler/spirv/vtn_private.h
+++ b/src/compiler/spirv/vtn_private.h
@@ -424,6 +424,7 @@ enum vtn_variable_mode {
    vtn_variable_mode_ssbo,
    vtn_variable_mode_push_constant,
    vtn_variable_mode_workgroup,
+   vtn_variable_mode_cross_workgroup,
    vtn_variable_mode_input,
    vtn_variable_mode_output,
 };
diff --git a/src/compiler/spirv/vtn_variables.c b/src/compiler/spirv/vtn_variables.c
index 5738941ffb6..7896e58f7e5 100644
--- a/src/compiler/spirv/vtn_variables.c
+++ b/src/compiler/spirv/vtn_variables.c
@@ -1572,6 +1572,9 @@ vtn_storage_class_to_mode(struct vtn_builder *b,
       nir_mode = nir_var_uniform;
       break;
    case SpvStorageClassCrossWorkgroup:
+      mode = vtn_variable_mode_cross_workgroup;
+      nir_mode = nir_var_global;
+      break;
    case SpvStorageClassGeneric:
    default:
       vtn_fail("Unhandled variable storage class");
@@ -1830,6 +1833,7 @@ vtn_create_variable(struct vtn_builder *b, struct vtn_value *val,
    case vtn_variable_mode_ubo:
    case vtn_variable_mode_ssbo:
    case vtn_variable_mode_push_constant:
+   case vtn_variable_mode_cross_workgroup:
       /* These don't need actual variables. */
       break;
    }
-- 
2.19.1
[Mesa-dev] [PATCH 05/22] nir/spirv: cast shift operand to u32
v2: fix for specialization constants as well

Signed-off-by: Karol Herbst
---
 src/compiler/spirv/spirv_to_nir.c | 20
 src/compiler/spirv/vtn_alu.c      | 11 +++
 2 files changed, 31 insertions(+)

diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
index d72f07dc1f9..2c214324774 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -1813,6 +1813,26 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp opcode,
             src[j] = src_val->constant->values[0];
          }
 
+         /* fix up fixed size sources */
+         switch (op) {
+         case nir_op_ishl:
+         case nir_op_ishr:
+         case nir_op_ushr: {
+            if (bit_size == 32)
+               break;
+            for (unsigned i = 0; i < num_components; ++i) {
+               switch (bit_size) {
+               case 64: src[1].u32[i] = src[1].u64[i]; break;
+               case 16: src[1].u32[i] = src[1].u16[i]; break;
+               case 8: src[1].u32[i] = src[1].u8[i]; break;
+               }
+            }
+            break;
+         }
+         default:
+            break;
+         }
+
          val->constant->values[0] =
             nir_eval_const_opcode(op, num_components, bit_size, src);
          break;
diff --git a/src/compiler/spirv/vtn_alu.c b/src/compiler/spirv/vtn_alu.c
index ea25d4bcbdc..32825da29cb 100644
--- a/src/compiler/spirv/vtn_alu.c
+++ b/src/compiler/spirv/vtn_alu.c
@@ -743,6 +743,17 @@ vtn_handle_alu(struct vtn_builder *b, SpvOp opcode,
          src[1] = tmp;
       }
 
+      switch (op) {
+      case nir_op_ishl:
+      case nir_op_ishr:
+      case nir_op_ushr:
+         if (src[1]->bit_size != 32)
+            src[1] = nir_u2u32(&b->nb, src[1]);
+         break;
+      default:
+         break;
+      }
+
       val->ssa->def = nir_build_alu(&b->nb, op, src[0], src[1], src[2], src[3]);
       break;
    } /* default */
-- 
2.19.1
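To illustrate the constant fix-up idea (shift counts stored at 8, 16 or 64 bits being converted to 32 bits), here is a standalone sketch. It assumes a constant value laid out as a union of per-size arrays that share storage; `demo_const_value`, `DEMO_MAX_COMPONENTS` and `demo_shift_counts_to_u32` are invented names, and the real NIR type may be laid out differently. Under that shared-storage assumption, the order of an in-place widening matters, so the sketch walks downwards when widening.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_COMPONENTS 4

/* Hypothetical constant storage: all views alias the same bytes. */
union demo_const_value {
   uint8_t  u8[DEMO_MAX_COMPONENTS];
   uint16_t u16[DEMO_MAX_COMPONENTS];
   uint32_t u32[DEMO_MAX_COMPONENTS];
   uint64_t u64[DEMO_MAX_COMPONENTS];
};

/* Convert n shift counts stored with bit_size bits into 32-bit counts,
 * in place.
 */
static void
demo_shift_counts_to_u32(union demo_const_value *v, unsigned n,
                         unsigned bit_size)
{
   assert(n <= DEMO_MAX_COMPONENTS);

   switch (bit_size) {
   case 64:
      /* Narrowing: ascending order never overwrites an unread source. */
      for (unsigned i = 0; i < n; i++)
         v->u32[i] = (uint32_t)v->u64[i];
      break;
   case 32:
      /* Already the right size. */
      break;
   case 16:
      /* Widening: go from the last component down so that writing
       * u32[i] cannot clobber a u16 element that is still to be read.
       */
      for (unsigned i = n; i-- > 0;)
         v->u32[i] = v->u16[i];
      break;
   case 8:
      for (unsigned i = n; i-- > 0;)
         v->u32[i] = v->u8[i];
      break;
   default:
      assert(!"unsupported bit size");
   }
}
```

If the components do not alias each other in the actual data structure, the iteration order is irrelevant and a plain ascending loop is fine.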
[Mesa-dev] [PATCH 21/22] spirv/cl: support vload/vstore
Signed-off-by: Karol Herbst
---
 src/compiler/spirv/vtn_opencl.c | 59 +
 1 file changed, 59 insertions(+)

diff --git a/src/compiler/spirv/vtn_opencl.c b/src/compiler/spirv/vtn_opencl.c
index 089e6168fd8..ecaca4c17bc 100644
--- a/src/compiler/spirv/vtn_opencl.c
+++ b/src/compiler/spirv/vtn_opencl.c
@@ -191,6 +191,59 @@ handle_special(struct vtn_builder *b, enum OpenCLstd opcode, unsigned num_srcs,
    }
 }
 
+static void
+_handle_v_load_store(struct vtn_builder *b, enum OpenCLstd opcode,
+                     const uint32_t *w, unsigned count, bool load)
+{
+   struct vtn_type *type;
+   if (load)
+      type = vtn_value(b, w[1], vtn_value_type_type)->type;
+   else
+      type = vtn_untyped_value(b, w[5])->type;
+   unsigned a = load ? 0 : 1;
+
+   const struct glsl_type *dest_type = type->type;
+   enum glsl_base_type base_type = glsl_get_base_type(dest_type);
+   const struct glsl_type *scalar_type = glsl_scalar_type(base_type);
+
+   nir_ssa_def *offset = vtn_ssa_value(b, w[5 + a])->def;
+   struct vtn_value *p = vtn_value(b, w[6 + a], vtn_value_type_pointer);
+
+   nir_deref_instr *deref = vtn_pointer_to_deref(b, p->pointer);
+
+   /* we have to manually handle alignment here for vec3 */
+   /* 1. cast to scalar type */
+   deref = nir_build_deref_cast(&b->nb, &deref->dest.ssa, nir_var_global, scalar_type);
+   /* 2. multiply offset by vector size */
+   offset = nir_imul(&b->nb, offset, nir_imm_intN_t(&b->nb, glsl_get_vector_elements(dest_type), offset->bit_size));
+   /* 3. deref ptr_as_array */
+   deref = nir_build_deref_ptr_as_array(&b->nb, deref, offset, scalar_type);
+   /* 4. cast to vec type */
+   deref = nir_build_deref_cast(&b->nb, &deref->dest.ssa, nir_var_global, dest_type);
+
+   if (load) {
+      struct vtn_ssa_value *val = vtn_local_load(b, deref);
+      vtn_push_ssa(b, w[2], type, val);
+   } else {
+      struct vtn_ssa_value *val = vtn_ssa_value(b, w[5]);
+      vtn_local_store(b, val, deref);
+   }
+}
+
+static void
+vtn_handle_opencl_vload(struct vtn_builder *b, enum OpenCLstd opcode,
+                        const uint32_t *w, unsigned count)
+{
+   _handle_v_load_store(b, opcode, w, count, true);
+}
+
+static void
+vtn_handle_opencl_vstore(struct vtn_builder *b, enum OpenCLstd opcode,
+                         const uint32_t *w, unsigned count)
+{
+   _handle_v_load_store(b, opcode, w, count, false);
+}
+
 static nir_ssa_def *
 handle_printf(struct vtn_builder *b, enum OpenCLstd opcode, unsigned num_srcs,
               nir_ssa_def **srcs, const struct glsl_type *dest_type)
@@ -271,6 +324,12 @@ vtn_handle_opencl_instruction(struct vtn_builder *b, uint32_t ext_opcode,
    case U_Upsample:
       handle_instr(b, ext_opcode, w, count, handle_special);
       return true;
+   case Vloadn:
+      vtn_handle_opencl_vload(b, ext_opcode, w, count);
+      return true;
+   case Vstoren:
+      vtn_handle_opencl_vstore(b, ext_opcode, w, count);
+      return true;
    case Printf:
       handle_instr(b, ext_opcode, w, count, handle_printf);
       return true;
-- 
2.19.1
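The four numbered steps in `_handle_v_load_store` amount to plain pointer arithmetic on the scalar element type. The sketch below restates them in ordinary C for `float` data; `demo_vloadn` and `demo_vstoren` are hypothetical helpers written for this illustration, not part of Mesa or of the OpenCL headers.

```c
#include <assert.h>
#include <stddef.h>

/* Read n consecutive scalars starting at p + offset * n. Because the
 * index is scaled by the element count rather than by a padded vector
 * size, a 3-component load uses a stride of three scalars.
 */
static void
demo_vloadn(const float *p, size_t offset, unsigned n, float *out)
{
   assert(p != NULL && out != NULL);

   /* Steps 1-3: view the memory as scalars and scale the index. */
   const float *base = p + offset * n;

   /* Step 4: treat the n scalars at that address as one vector. */
   for (unsigned i = 0; i < n; i++)
      out[i] = base[i];
}

/* The store direction uses the same address computation. */
static void
demo_vstoren(const float *data, size_t offset, unsigned n, float *p)
{
   assert(p != NULL && data != NULL);

   float *base = p + offset * n;
   for (unsigned i = 0; i < n; i++)
      base[i] = data[i];
}
```

For n = 3 and offset = 2 the load starts at scalar index 6, which is why the patch comment singles out vec3: the address is based on three scalars per element, not four.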
[Mesa-dev] [PATCH 00/22] nir/spirv: support for CL kernel
Some of those patches are already reviewed, but not pushed. I just wanted to post the patches to show the most current approach and to start a discussion on what we might want to handle differently.

There are some things I am not so happy about as well, like the bit_size handling for system values or how the derefs for pointers are created. But overall it feels like we require fewer changes with my new approach to support physical pointers inside nir and vtn.

Karol Herbst (18):
  nir: add const_index parameters to system value builder function
  nir: replace nir_load_system_value calls with appropriate builder functions
  nir/spirv: initial handling of OpenCL.std extension opcodes
  nir/spirv: cast shift operand to u32
  vtn: handle SpvExecutionModelKernel
  glsl: add packed for struct types
  glsl: add glsl_base_get_byte_size
  glsl: add cl_size and cl_alignment
  nir/spirv: parse memory model
  nir: add legal bit_sizes to intrinsics
  nir: add support for address bit sized system values
  nir+vtn: vec8+vec16 support
  nir: rename global to private memory
  nir/spirv: handle SpvStorageClassCrossWorkgroup
  nir/spirv: handle kernel function parameters
  nir/spirv: physical pointer support
  spirv/cl: support vload/vstore
  nir/spirv: handle OpBitcasts for pointers

Rob Clark (4):
  nir/spirv: add OpIsFinite and OpIsNormal
  nir/vtn: add caps for some cl related capabilities
  nir: simplify get_io_offset() parameters
  nir: add type alignment support to lower_io

 src/amd/vulkan/radv_meta_buffer.c             | 8 +-
 src/amd/vulkan/radv_meta_bufimage.c           | 16 +-
 src/amd/vulkan/radv_meta_clear.c              | 8 +-
 src/amd/vulkan/radv_meta_fast_clear.c         | 4 +-
 src/amd/vulkan/radv_meta_resolve_cs.c         | 4 +-
 src/amd/vulkan/radv_query.c                   | 8 +-
 src/compiler/glsl/glsl_to_nir.cpp             | 4 +-
 src/compiler/glsl_types.cpp                   | 65 +++-
 src/compiler/glsl_types.h                     | 56 ++-
 src/compiler/nir/meson.build                  | 1 +
 src/compiler/nir/nir.c                        | 8 +-
 src/compiler/nir/nir.h                        | 37 +-
 src/compiler/nir/nir_builder.h                | 95 -
 src/compiler/nir/nir_builder_opcodes_h.py     | 41 ++-
 src/compiler/nir/nir_builtin_builder.c        | 249 -
 src/compiler/nir/nir_builtin_builder.h        | 150 +++-
 src/compiler/nir/nir_clone.c                  | 2 +
 src/compiler/nir/nir_constant_expressions.py  | 33 +-
 src/compiler/nir/nir_deref.c                  | 26 +-
 src/compiler/nir/nir_instr_set.c              | 2 +
 src/compiler/nir/nir_intrinsics.py            | 32 +-
 src/compiler/nir/nir_intrinsics_c.py          | 6 +-
 src/compiler/nir/nir_linking_helpers.c        | 2 +-
 src/compiler/nir/nir_loop_analyze.c           | 2 +-
 src/compiler/nir/nir_lower_alu_to_scalar.c    | 2 +
 src/compiler/nir/nir_lower_clip.c             | 3 +-
 .../nir/nir_lower_constant_initializers.c     | 2 +-
 .../nir/nir_lower_global_vars_to_local.c      | 4 +-
 src/compiler/nir/nir_lower_indirect_derefs.c  | 6 +-
 src/compiler/nir/nir_lower_io.c               | 141 +--
 .../nir/nir_lower_io_arrays_to_elements.c     | 4 +-
 .../nir/nir_lower_io_to_temporaries.c         | 2 +-
 src/compiler/nir/nir_lower_locals_to_regs.c   | 9 +-
 src/compiler/nir/nir_lower_system_values.c    | 40 +-
 src/compiler/nir/nir_lower_var_copies.c       | 3 +-
 src/compiler/nir/nir_lower_vars_to_ssa.c      | 12 +-
 src/compiler/nir/nir_lower_wpos_center.c      | 3 +-
 src/compiler/nir/nir_opcodes.py               | 39 +-
 src/compiler/nir/nir_opt_copy_prop_vars.c     | 4 +-
 src/compiler/nir/nir_opt_copy_propagate.c     | 2 +-
 src/compiler/nir/nir_opt_dead_write_vars.c    | 6 +-
 src/compiler/nir/nir_print.c                  | 29 +-
 src/compiler/nir/nir_propagate_invariant.c    | 2 +
 src/compiler/nir/nir_remove_dead_variables.c  | 6 +-
 src/compiler/nir/nir_search.c                 | 8 +-
 src/compiler/nir/nir_serialize.c              | 4 +
 src/compiler/nir/nir_split_vars.c             | 20 +-
 src/compiler/nir/nir_validate.c               | 17 +-
 src/compiler/nir/tests/vars_tests.cpp         | 2 +-
 src/compiler/nir_types.cpp                    | 17 +-
 src/compiler/nir_types.h                      | 37 +-
 src/compiler/shader_info.h                    | 3 +
 src/compiler/spirv/spirv_to_nir.c             | 131 ++-
 src/compiler/spirv/vtn_alu.c                  | 245 +
 src/compiler/spirv/vtn_cfg.c                  | 3 +-
 src/compiler/spirv/vtn_glsl450.c              | 2 +-
 src/compiler/spirv/vtn_opencl.c               | 343 ++
 src/compiler/spirv/vtn_private.h              | 19 +-
 src/compiler/spirv/vtn_variables.c            | 106 --
 src/gallium/auxiliary/nir/tgsi_to_nir.c       | 4 +-
 src/gallium/drivers/vc4/vc4_nir_lower_blend.c | 4 +-
 src/intel/compiler/brw_nir.c
Re: [Mesa-dev] [PATCH mesa] xmlpool: update translation po files
On Tuesday, 2018-11-13 13:37:14 +, Emil Velikov wrote:
> On Mon, 12 Nov 2018 at 18:14, Dylan Baker wrote:
> >
> > Quoting Eric Engestrom (2018-11-12 09:47:22)
> > > On Monday, 2018-11-12 16:56:32 +, Emil Velikov wrote:
> > > > On Mon, 12 Nov 2018 at 14:24, Eric Engestrom wrote:
> > > > >
> > > > > These files are close to 4 years out of date; a lot's changed since.
> > > > > Let's just check in a recently-regenerated version.
> > > > >
> > > > Worth removing them from git and letting the build regenerate them as
> > > > needed?
> > >
> > > No, the point is for them to be filled with the translations.
> > > They aren't 100% generated, they're more like "refreshed" by running the
> > > ninja command, to add new strings to be translated and adjust file/line
> > > references.
> > >
> > >
> > > That said, I've just looked at the state of the translations, and
> > > "partial" is already generous. Users would currently get a mostly
> > > english driconf interface with a few strings translated here and there,
> > > which I'm not sure is worth the hassle of maintaining all this.
> > >
> > > Should we just drop the translation infrastructure?
> >
> > I'd try pinging the people who provided the translations in the first place to
> > see if they're interested in updating them. If not I'd be in favor of dropping
> > unmaintained translations, if there are no maintained translations drop the
> > whole things.
> >
> > Just my 2¢
> >
> Very well said Dylan. I'm on the same page.

Sounds like a good plan; I'll ping them privately and we'll see from there :)
[Mesa-dev] [RFC PATCH 4/8] mesa/main/version: Lower the requirements for GLES 3.0
From: Gert Wollny

GLES 3.0 does not actually require support for EXT_framebuffer_sRGB; it only needs support for sRGB attachments to framebuffers.

Signed-off-by: Gert Wollny
---
 src/mesa/main/version.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/version.c b/src/mesa/main/version.c
index 610ba2f08c..2f7ac75a81 100644
--- a/src/mesa/main/version.c
+++ b/src/mesa/main/version.c
@@ -512,8 +512,9 @@ compute_version_es2(const struct gl_extensions *extensions,
                          extensions->ARB_texture_float &&
                          extensions->ARB_texture_rg &&
                          extensions->ARB_depth_buffer_float &&
-                         /* extensions->ARB_framebuffer_object && */
-                         extensions->EXT_framebuffer_sRGB &&
+                         (extensions->EXT_framebuffer_sRGB ||
+                          (extensions->ARB_framebuffer_object &&
+                           extensions->EXT_sRGB)) &&
                          extensions->EXT_packed_float &&
                          extensions->EXT_texture_array &&
                          extensions->EXT_texture_shared_exponent &&
-- 
2.18.1
[Mesa-dev] [RFC PATCH 2/8] virgl: Set sRGB write control CAP based on host capabilities
From: Gert Wollny

Signed-off-by: Gert Wollny
---
 src/gallium/drivers/virgl/virgl_hw.h     | 1 +
 src/gallium/drivers/virgl/virgl_screen.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/src/gallium/drivers/virgl/virgl_hw.h b/src/gallium/drivers/virgl/virgl_hw.h
index e682c750e7..7b4c063f35 100644
--- a/src/gallium/drivers/virgl/virgl_hw.h
+++ b/src/gallium/drivers/virgl/virgl_hw.h
@@ -232,6 +232,7 @@ enum virgl_formats {
 #define VIRGL_CAP_TEXTURE_BARRIER      (1 << 12)
 #define VIRGL_CAP_TGSI_COMPONENTS      (1 << 13)
 #define VIRGL_CAP_GUEST_MAY_INIT_LOG   (1 << 14)
+#define VIRGL_CAP_SRGB_WRITE_CONTROL   (1 << 15)
 
 /* virgl bind flags - these are compatible with mesa 10.5 gallium.
  * but are fixed, no other should be passed to virgl either.
diff --git a/src/gallium/drivers/virgl/virgl_screen.c b/src/gallium/drivers/virgl/virgl_screen.c
index e71883b06f..ec486463fe 100644
--- a/src/gallium/drivers/virgl/virgl_screen.c
+++ b/src/gallium/drivers/virgl/virgl_screen.c
@@ -341,6 +341,8 @@ virgl_get_param(struct pipe_screen *screen, enum pipe_cap param)
       return 0;
    case PIPE_CAP_NATIVE_FENCE_FD:
       return 0;
+   case PIPE_CAP_SRGB_WRITE_CONTROL:
+      return vscreen->caps.caps.v2.capability_bits & VIRGL_CAP_SRGB_WRITE_CONTROL;
    default:
       return u_pipe_screen_get_param_defaults(screen, param);
    }
-- 
2.18.1
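The capability query in this patch is a plain bit test on a host-provided mask. As written, the masked expression yields either 0 or the bit value itself (1 << 15), which is still truthy. A caller that wants a strict 0/1 answer could normalize it, as in this sketch; `DEMO_CAP_SRGB_WRITE_CONTROL` and `demo_cap_enabled` are names invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit position as the new VIRGL_CAP_SRGB_WRITE_CONTROL define. */
#define DEMO_CAP_SRGB_WRITE_CONTROL (1u << 15)

/* Returns 1 if the single-bit capability `cap` is set in
 * `capability_bits`, and 0 otherwise.
 */
static int
demo_cap_enabled(uint32_t capability_bits, uint32_t cap)
{
   /* The helper is meant for exactly one capability bit at a time. */
   assert(cap != 0 && (cap & (cap - 1)) == 0);
   return (capability_bits & cap) != 0;
}
```

Whether normalizing matters depends on how the caller consumes the result; code that only tests for non-zero behaves the same either way.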
[Mesa-dev] [RFC PATCH 7/8] mesa/main: Remove now superfluous tests for both EXT_sRGB and EXT_framebuffer_sRGB
From: Gert Wollny

Signed-off-by: Gert Wollny
---
 src/mesa/main/fbobject.c | 2 +-
 src/mesa/main/teximage.c | 3 +--
 src/mesa/main/version.c  | 5 ++---
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
index ca3f3f7f76..7d45ce43f4 100644
--- a/src/mesa/main/fbobject.c
+++ b/src/mesa/main/fbobject.c
@@ -4253,7 +4253,7 @@ get_framebuffer_attachment_parameter(struct gl_context *ctx,
          }
       }
       else {
-         if (ctx->Extensions.EXT_framebuffer_sRGB || ctx->Extensions.EXT_sRGB) {
+         if (ctx->Extensions.EXT_sRGB) {
             *params =
                _mesa_get_format_color_encoding(att->Renderbuffer->Format);
          }
diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index e1d652824e..3c9c8ada99 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -2438,8 +2438,7 @@ copytexture_error_check( struct gl_context *ctx, GLuint dimensions,
       bool rb_is_srgb = false;
       bool dst_is_srgb = false;
 
-      if ((ctx->Extensions.EXT_framebuffer_sRGB ||
-           ctx->Extensions.EXT_sRGB) &&
+      if (ctx->Extensions.EXT_sRGB &&
           _mesa_get_format_color_encoding(rb->Format) == GL_SRGB) {
          rb_is_srgb = true;
       }
diff --git a/src/mesa/main/version.c b/src/mesa/main/version.c
index 2f7ac75a81..5709d283f3 100644
--- a/src/mesa/main/version.c
+++ b/src/mesa/main/version.c
@@ -512,9 +512,8 @@ compute_version_es2(const struct gl_extensions *extensions,
                          extensions->ARB_texture_float &&
                          extensions->ARB_texture_rg &&
                          extensions->ARB_depth_buffer_float &&
-                         (extensions->EXT_framebuffer_sRGB ||
-                          (extensions->ARB_framebuffer_object &&
-                           extensions->EXT_sRGB)) &&
+                         extensions->ARB_framebuffer_object &&
+                         extensions->EXT_sRGB &&
                          extensions->EXT_packed_float &&
                          extensions->EXT_texture_array &&
                          extensions->EXT_texture_shared_exponent &&
-- 
2.18.1
[Mesa-dev] [RFC PATCH 8/8] mesa/main: Expose EXT_sRGB_write_control
From: Gert Wollny

Use EXT_framebuffer_sRGB to expose EXT_sRGB_write_control on GLES. Remove the checks for desktop GL in the enable calls, since EXT_framebuffer_sRGB now also indicates support for switching the linear-sRGB color space conversion on GLES.

Thanks to Ilia Mirkin for all the helpful discussions that helped to rework this series.

Signed-off-by: Gert Wollny
---
 src/mesa/main/enable.c           | 4
 src/mesa/main/extensions_table.h | 1 +
 src/mesa/main/get_hash_params.py | 4 +++-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/mesa/main/enable.c b/src/mesa/main/enable.c
index bd3e493da5..d03ffc9d80 100644
--- a/src/mesa/main/enable.c
+++ b/src/mesa/main/enable.c
@@ -1125,8 +1125,6 @@ _mesa_set_enable(struct gl_context *ctx, GLenum cap, GLboolean state)
 
       /* GL3.0 - GL_framebuffer_sRGB */
       case GL_FRAMEBUFFER_SRGB_EXT:
-         if (!_mesa_is_desktop_gl(ctx))
-            goto invalid_enum_error;
          CHECK_EXTENSION(EXT_framebuffer_sRGB, cap);
          _mesa_set_framebuffer_srgb(ctx, state);
          return;
@@ -1765,8 +1763,6 @@ _mesa_IsEnabled( GLenum cap )
 
       /* GL3.0 - GL_framebuffer_sRGB */
       case GL_FRAMEBUFFER_SRGB_EXT:
-         if (!_mesa_is_desktop_gl(ctx))
-            goto invalid_enum_error;
          CHECK_EXTENSION(EXT_framebuffer_sRGB);
          return ctx->Color.sRGBEnabled;
 
diff --git a/src/mesa/main/extensions_table.h b/src/mesa/main/extensions_table.h
index a516a1b17f..ea9f54ecdc 100644
--- a/src/mesa/main/extensions_table.h
+++ b/src/mesa/main/extensions_table.h
@@ -266,6 +266,7 @@ EXT(EXT_shader_integer_mix , EXT_shader_integer_mix
 EXT(EXT_shader_io_blocks         , dummy_true                   , x  , x  , x , 31, 2014)
 EXT(EXT_shader_samples_identical , EXT_shader_samples_identical , GLL, GLC, x , 31, 2015)
 EXT(EXT_shadow_funcs             , ARB_shadow                   , GLL, x  , x , x , 2002)
+EXT(EXT_sRGB_write_control       , EXT_framebuffer_sRGB         , x  , x  , x , 30, 2013)
 EXT(EXT_stencil_two_side         , EXT_stencil_two_side         , GLL, x  , x , x , 2001)
 EXT(EXT_stencil_wrap             , dummy_true                   , GLL, x  , x , x , 2002)
 EXT(EXT_subtexture               , dummy_true                   , GLL, x  , x , x , 1995)
diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py
index 1840db6ebb..8de634e90a 100644
--- a/src/mesa/main/get_hash_params.py
+++ b/src/mesa/main/get_hash_params.py
@@ -463,6 +463,9 @@ descriptor=[
   [ "MIN_FRAGMENT_INTERPOLATION_OFFSET", "CONTEXT_FLOAT(Const.MinFragmentInterpolationOffset), extra_ARB_gpu_shader5_or_OES_sample_variables" ],
   [ "MAX_FRAGMENT_INTERPOLATION_OFFSET", "CONTEXT_FLOAT(Const.MaxFragmentInterpolationOffset), extra_ARB_gpu_shader5_or_OES_sample_variables" ],
   [ "FRAGMENT_INTERPOLATION_OFFSET_BITS", "CONST(FRAGMENT_INTERPOLATION_OFFSET_BITS), extra_ARB_gpu_shader5_or_OES_sample_variables" ],
+
+# GL_EXT_framebuffer_EXT / GLES 3.0 + EXT_sRGB_write_control
+  [ "FRAMEBUFFER_SRGB_EXT", "CONTEXT_BOOL(Color.sRGBEnabled), extra_EXT_framebuffer_sRGB" ],
 ]},
 
 { "apis": ["GLES", "GLES2"], "params": [
@@ -934,7 +937,6 @@ descriptor=[
   [ "RGBA_FLOAT_MODE_ARB", "BUFFER_FIELD(Visual.floatMode, TYPE_BOOLEAN), extra_core_ARB_color_buffer_float_and_new_buffers" ],
 
 # GL3.0 / GL_EXT_framebuffer_sRGB
-  [ "FRAMEBUFFER_SRGB_EXT", "CONTEXT_BOOL(Color.sRGBEnabled), extra_EXT_framebuffer_sRGB" ],
   [ "FRAMEBUFFER_SRGB_CAPABLE_EXT", "BUFFER_INT(Visual.sRGBCapable), extra_EXT_framebuffer_sRGB_and_new_buffers" ],
 
 # GL 3.1
-- 
2.18.1
[Mesa-dev] [RFC PATCH 6/8] i965: Set flag for EXT_sRGB
From: Gert Wollny

Signed-off-by: Gert Wollny
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index d7e02efb54..ca369e39f2 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -104,6 +104,7 @@ intelInitExtensions(struct gl_context *ctx)
    ctx->Extensions.EXT_point_parameters = true;
    ctx->Extensions.EXT_provoking_vertex = true;
    ctx->Extensions.EXT_render_snorm = true;
+   ctx->Extensions.EXT_sRGB = true;
    ctx->Extensions.EXT_stencil_two_side = true;
    ctx->Extensions.EXT_texture_array = true;
    ctx->Extensions.EXT_texture_env_dot3 = true;
-- 
2.18.1
[Mesa-dev] [RFC PATCH 5/8] mesa/st: rework support for sRGB framebuffer attachments
From: Gert Wollny

For GLES, sRGB framebuffer attachment support is provided in two steps: sRGB attachments as described in EXT_sRGB and GLES 3.0, which enable the linear to sRGB color space transformation automatically, and sRGB write control, which brings GLES on par with EXT_framebuffer_sRGB. Set the corresponding flags to reflect these two parts.

As a difference between desktop GL and GLES, on desktop GL the linear-sRGB conversion for an sRGB framebuffer attachment is turned off by default, whereas on GLES it is turned on. This needs to be taken into account when creating framebuffer attachments.

v2: - always enable the extension when sRGB is supported (Ilia Mirkin).
    - Correct handling by moving extension initialization to the place
      where gallium/st actually takes care of this. This also fixes
      properly disabling the extension via MESA_EXTENSION_OVERRIDE
    - reinstate check for desktop GL and add check for the extension
      when creating the framebuffer

v3: - Only create sRGB renderbuffers based on Visual.srgbCapable when on
      desktop GL.

v4: - Use PIPE_FORMAT_B8G8R8A8_SRGB to check for the capability, since
      this is also the format that is used to check for
      EGL_KHR_gl_colorspace support. virgl on a GLES host usually doesn't
      provide this format but one can make it available to signal that
      the host supports this extension.

v5: - drop check for PIPE_FORMAT_B8G8R8A8_SRGB in favour of using the new
      PIPE_CAP_SRGB_WRITE_CONTROL cap flag.
    - enable EXT_sRGB based on the sRGB formats supported and
      EXT_framebuffer_sRGB by checking for PIPE_CAP_SRGB_WRITE_CONTROL.
Signed-off-by: Gert Wollny
---
 src/mesa/state_tracker/st_cb_fbo.c     | 4 +--
 src/mesa/state_tracker/st_extensions.c | 6 -
 src/mesa/state_tracker/st_format.c     | 2 +-
 src/mesa/state_tracker/st_manager.c    | 37 --
 4 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_fbo.c b/src/mesa/state_tracker/st_cb_fbo.c
index 0e535257cb..49a989f126 100644
--- a/src/mesa/state_tracker/st_cb_fbo.c
+++ b/src/mesa/state_tracker/st_cb_fbo.c
@@ -139,7 +139,7 @@ st_renderbuffer_alloc_storage(struct gl_context * ctx,
    /* If an sRGB framebuffer is unsupported, sRGB formats behave like linear
     * formats.
     */
-   if (!ctx->Extensions.EXT_framebuffer_sRGB) {
+   if (!ctx->Extensions.EXT_sRGB) {
       internalFormat = _mesa_get_linear_internalformat(internalFormat);
    }
 
@@ -656,7 +656,7 @@ st_validate_attachment(struct gl_context *ctx,
    /* If the encoding is sRGB and sRGB rendering cannot be enabled,
     * check for linear format support instead.
     * Later when we create a surface, we change the format to a linear one.
     */
-   if (!ctx->Extensions.EXT_framebuffer_sRGB &&
+   if (!ctx->Extensions.EXT_sRGB &&
        _mesa_get_format_color_encoding(texFormat) == GL_SRGB) {
       const mesa_format linearFormat = _mesa_get_srgb_format_linear(texFormat);
       format = st_mesa_format_to_pipe_format(st_context(ctx), linearFormat);
diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index 16889074f6..9e63e7b74c 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -786,7 +786,7 @@ void st_init_extensions(struct pipe_screen *screen,
           PIPE_FORMAT_B10G10R10A2_UINT },
         GL_TRUE }, /* at least one format must be supported */
 
-      { { o(EXT_framebuffer_sRGB) },
+      { { o(EXT_sRGB) },
         { PIPE_FORMAT_A8B8G8R8_SRGB,
           PIPE_FORMAT_B8G8R8A8_SRGB,
           PIPE_FORMAT_R8G8B8A8_SRGB },
@@ -1316,6 +1316,10 @@ void st_init_extensions(struct pipe_screen *screen,
       extensions->ARB_texture_buffer_object_rgb32 &&
       extensions->ARB_shader_image_load_store;
 
+   extensions->EXT_framebuffer_sRGB =
+         screen->get_param(screen, PIPE_CAP_SRGB_WRITE_CONTROL) &&
+         extensions->EXT_sRGB;
+
    /* Unpacking a varying in the fragment shader costs 1 texture indirection.
     * If the number of available texture indirections is very limited, then we
     * prefer to disable varying packing rather than run the risk of varying
diff --git a/src/mesa/state_tracker/st_format.c b/src/mesa/state_tracker/st_format.c
index caddd76c5d..aacb878828 100644
--- a/src/mesa/state_tracker/st_format.c
+++ b/src/mesa/state_tracker/st_format.c
@@ -2457,7 +2457,7 @@ st_QuerySamplesForFormat(struct gl_context *ctx, GLenum target,
    /* If an sRGB framebuffer is unsupported, sRGB formats behave like linear
     * formats.
     */
-   if (!ctx->Extensions.EXT_framebuffer_sRGB) {
+   if (!ctx->Extensions.EXT_sRGB) {
       internalFormat = _mesa_get_linear_internalformat(internalFormat);
    }
 
diff --git a/src/mesa/state_tracker/st_manager.c b/src/mesa/state_tracker/st_manager.c
index 076ad42646..25e2dcad4c 100644
--- a/src/mesa/state_tracker/st_manager.c
+++ b/src/mesa/state_tracker/st_manager.c
@@ -295,7 +295,7 @@
[Mesa-dev] [RFC PATCH 3/8] mesa/main: Add flag for EXT_sRGB and use it parallel with EXT_framebuffer_sRGB
From: Gert Wollny

EXT_sRGB is an (incomplete) GLES extension that provides support for sRGB framebuffer attachments, so it can be used to check for this support as an alternative to EXT_framebuffer_sRGB, which provides the same functionality plus sRGB write control. All drivers that support EXT_framebuffer_sRGB also support EXT_sRGB, but to keep this commit minimal and avoid breaking any drivers, both flags are checked. Since EXT_sRGB is incomplete and superseded by GLES 3.0, it will not be exposed as an extension.

Signed-off-by: Gert Wollny
---
 src/mesa/main/fbobject.c    | 2 +-
 src/mesa/main/formatquery.c | 3 ++-
 src/mesa/main/framebuffer.c | 3 ++-
 src/mesa/main/mtypes.h      | 1 +
 src/mesa/main/teximage.c    | 3 ++-
 5 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c index 68e0daf342..ca3f3f7f76 100644 --- a/src/mesa/main/fbobject.c +++ b/src/mesa/main/fbobject.c @@ -4253,7 +4253,7 @@ get_framebuffer_attachment_parameter(struct gl_context *ctx, } } else { - if (ctx->Extensions.EXT_framebuffer_sRGB) { + if (ctx->Extensions.EXT_framebuffer_sRGB || ctx->Extensions.EXT_sRGB) { *params = _mesa_get_format_color_encoding(att->Renderbuffer->Format); }
diff --git a/src/mesa/main/formatquery.c b/src/mesa/main/formatquery.c index 84b5f512ba..1d43c1e860 100644 --- a/src/mesa/main/formatquery.c +++ b/src/mesa/main/formatquery.c @@ -1241,7 +1241,8 @@ _mesa_GetInternalformativ(GLenum target, GLenum internalformat, GLenum pname, break; case GL_SRGB_WRITE: - if (!_mesa_has_EXT_framebuffer_sRGB(ctx) || + if ((!_mesa_has_EXT_framebuffer_sRGB(ctx) && + !ctx->Extensions.EXT_sRGB) || !_mesa_is_color_format(internalformat)) { goto end; }
diff --git a/src/mesa/main/framebuffer.c b/src/mesa/main/framebuffer.c index 10dd2fde44..90314ee1bd 100644 --- a/src/mesa/main/framebuffer.c +++ b/src/mesa/main/framebuffer.c @@ -459,7 +459,8 @@ _mesa_update_framebuffer_visual(struct gl_context *ctx, fb->Visual.rgbBits =
fb->Visual.redBits + fb->Visual.greenBits + fb->Visual.blueBits; if (_mesa_get_format_color_encoding(fmt) == GL_SRGB) -fb->Visual.sRGBCapable = ctx->Extensions.EXT_framebuffer_sRGB; +fb->Visual.sRGBCapable = ctx->Extensions.EXT_framebuffer_sRGB || + ctx->Extensions.EXT_sRGB; break; } } diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index 656e1226f9..4ee55266e5 100644 --- a/src/mesa/main/mtypes.h +++ b/src/mesa/main/mtypes.h @@ -4253,6 +4253,7 @@ struct gl_extensions GLboolean EXT_semaphore_fd; GLboolean EXT_shader_integer_mix; GLboolean EXT_shader_samples_identical; + GLboolean EXT_sRGB; GLboolean EXT_stencil_two_side; GLboolean EXT_texture_array; GLboolean EXT_texture_compression_latc; diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c index 6805b47c72..e1d652824e 100644 --- a/src/mesa/main/teximage.c +++ b/src/mesa/main/teximage.c @@ -2438,7 +2438,8 @@ copytexture_error_check( struct gl_context *ctx, GLuint dimensions, bool rb_is_srgb = false; bool dst_is_srgb = false; - if (ctx->Extensions.EXT_framebuffer_sRGB && + if ((ctx->Extensions.EXT_framebuffer_sRGB || + ctx->Extensions.EXT_sRGB) && _mesa_get_format_color_encoding(rb->Format) == GL_SRGB) { rb_is_srgb = true; } -- 2.18.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFC PATCH 0/8] Add and enable extension EXT_sRGB_write_control (reworked)
From: Gert Wollny

Dear all,

Based on the feedback given by Ilia, I've completely reworked the series to add internal support for EXT_sRGB as a stepping stone to implementing EXT_sRGB_write_control and exposing GLES 3.0 properly.

Because the rework was thorough, most of the original patches have changed completely, so carrying a history didn't make much sense for most of them.

I'd like to thank Ilia for all his comments on the first series, which helped me a lot in reworking it.

Thanks for any comments,
Gert

Gert Wollny (8):
  Gallium: Add new CAPS to indicate whether a driver can switch SRGB write
  virgl: Set sRGB write control CAP based on host capabilities
  mesa/main: Add flag for EXT_sRGB and use it parallel with EXT_framebuffer_sRGB
  mesa/main/version: Lower the requirements for GLES 3.0
  mesa/st: rework support for sRGB framebuffer attachements
  i965: Set flag for EXT_sRGB
  mesa/main: Remove now superfluos tests for both EXT_sRGB and EXT_framebuffer_sRGB
  mesa/main: Expose EXT_sRGB_write_control

 src/gallium/auxiliary/util/u_screen.c        |  3 ++
 src/gallium/docs/source/screen.rst           |  3 ++
 src/gallium/drivers/virgl/virgl_hw.h         |  1 +
 src/gallium/drivers/virgl/virgl_screen.c     |  2 ++
 src/gallium/include/pipe/p_defines.h         |  1 +
 src/mesa/drivers/dri/i965/intel_extensions.c |  1 +
 src/mesa/main/enable.c                       |  4 ---
 src/mesa/main/extensions_table.h             |  1 +
 src/mesa/main/fbobject.c                     |  2 +-
 src/mesa/main/formatquery.c                  |  3 +-
 src/mesa/main/framebuffer.c                  |  3 +-
 src/mesa/main/get_hash_params.py             |  4 ++-
 src/mesa/main/mtypes.h                       |  1 +
 src/mesa/main/teximage.c                     |  2 +-
 src/mesa/main/version.c                      |  4 +--
 src/mesa/state_tracker/st_cb_fbo.c           |  4 +--
 src/mesa/state_tracker/st_extensions.c       |  6 +++-
 src/mesa/state_tracker/st_manager.c          | 37
 19 files changed, 55 insertions(+), 29 deletions(-)

--
2.18.1
[Mesa-dev] [RFC PATCH 1/8] Gallium: Add new CAPS to indicate whether a driver can switch SRGB write
From: Gert Wollny

Add a new cap that indicates whether the driver supports enabling/disabling the conversion from linear space to sRGB for a framebuffer attachment.

Signed-off-by: Gert Wollny
---
 src/gallium/auxiliary/util/u_screen.c | 3 +++
 src/gallium/docs/source/screen.rst    | 3 +++
 src/gallium/include/pipe/p_defines.h  | 1 +
 3 files changed, 7 insertions(+)

diff --git a/src/gallium/auxiliary/util/u_screen.c b/src/gallium/auxiliary/util/u_screen.c index 73dbbee94a..1d9f367501 100644 --- a/src/gallium/auxiliary/util/u_screen.c +++ b/src/gallium/auxiliary/util/u_screen.c @@ -326,6 +326,9 @@ u_pipe_screen_get_param_defaults(struct pipe_screen *pscreen, case PIPE_CAP_MAX_VERTEX_ELEMENT_SRC_OFFSET: return 2047; + case PIPE_CAP_SRGB_WRITE_CONTROL: + return 1; + default: unreachable("bad PIPE_CAP_*"); }
diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst index 0abd164494..da677eb04b 100644 --- a/src/gallium/docs/source/screen.rst +++ b/src/gallium/docs/source/screen.rst @@ -477,6 +477,9 @@ subpixel precision bias in bits during conservative rasterization. 0 means no limit. * ``PIPE_CAP_MAX_VERTEX_ELEMENT_SRC_OFFSET``: The maximum supported value for of pipe_vertex_element::src_offset. +* ``PIPE_CAP_SRGB_WRITE_CONTROL``: Indicates whether the drivers on GLES supports + enabling/disabling the conversion from linear space to sRGB at framebuffer or + blend time. .. _pipe_capf:
diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h index 693f041b1d..7838b18be8 100644 --- a/src/gallium/include/pipe/p_defines.h +++ b/src/gallium/include/pipe/p_defines.h @@ -826,6 +826,7 @@ enum pipe_cap PIPE_CAP_MAX_COMBINED_HW_ATOMIC_COUNTER_BUFFERS, PIPE_CAP_MAX_TEXTURE_UPLOAD_MEMORY_BUDGET, PIPE_CAP_MAX_VERTEX_ELEMENT_SRC_OFFSET, + PIPE_CAP_SRGB_WRITE_CONTROL, }; /**
--
2.18.1
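As a rough illustration of how such a cap might be consumed, the sketch below models a screen that answers the cap query and a caller that only advertises write control when sRGB attachments are supported as well, in the spirit of the st_extensions.c hunk elsewhere in this series. All toy_* names are hypothetical stand-ins and not Gallium types; only the cap's purpose and the default value of 1 are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the real Gallium types. */
enum toy_cap {
    TOY_CAP_SRGB_WRITE_CONTROL,
};

struct toy_screen {
    /* Whether this imaginary driver can switch the linear-to-sRGB
     * conversion on and off for a framebuffer attachment. */
    bool can_toggle_srgb_write;
    /* Whether the driver answers the cap itself or defers to defaults. */
    bool overrides_cap;
};

/* Shared fallback, mirroring the "return 1" added to the defaults. */
static int toy_get_param_defaults(enum toy_cap cap)
{
    switch (cap) {
    case TOY_CAP_SRGB_WRITE_CONTROL:
        return 1;
    }
    return 0;
}

static int toy_get_param(const struct toy_screen *screen, enum toy_cap cap)
{
    if (screen->overrides_cap && cap == TOY_CAP_SRGB_WRITE_CONTROL)
        return screen->can_toggle_srgb_write ? 1 : 0;
    return toy_get_param_defaults(cap);
}

/* Write control is only advertised on top of sRGB attachment support. */
static bool toy_has_srgb_write_control(const struct toy_screen *screen,
                                       bool has_srgb_attachments)
{
    return toy_get_param(screen, TOY_CAP_SRGB_WRITE_CONTROL) != 0 &&
           has_srgb_attachments;
}
```

With a default of 1, a driver that says nothing keeps its current behaviour, and only a driver that cannot toggle the conversion has to opt out explicitly.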
Re: [Mesa-dev] [RFC PATCH 4/8] mesa/main/version: Lower the requirements for GLES 3.0
Is ARB_framebuffer_object really needed? IIRC one of the sticking points is that it allows differently-sized render targets. Does ES3 allow that? If so, this is fine. On Tue, Nov 13, 2018 at 12:28 PM Gert Wollny wrote: > > From: Gert Wollny > > GLES 3.0 does not actually require support for EXT_framebuffer_sRGB, it > only needs support for sRGB attachments to framebuffers. > > Signed-off-by: Gert Wollny > --- > src/mesa/main/version.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/src/mesa/main/version.c b/src/mesa/main/version.c > index 610ba2f08c..2f7ac75a81 100644 > --- a/src/mesa/main/version.c > +++ b/src/mesa/main/version.c > @@ -512,8 +512,9 @@ compute_version_es2(const struct gl_extensions > *extensions, > extensions->ARB_texture_float && > extensions->ARB_texture_rg && > extensions->ARB_depth_buffer_float && > - /* extensions->ARB_framebuffer_object && */ > - extensions->EXT_framebuffer_sRGB && > + (extensions->EXT_framebuffer_sRGB || > + (extensions->ARB_framebuffer_object && > + extensions->EXT_sRGB)) && > extensions->EXT_packed_float && > extensions->EXT_texture_array && > extensions->EXT_texture_shared_exponent && > -- > 2.18.1
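To make the effect of the quoted hunk concrete, here is a minimal sketch that compares the sRGB part of the ES 3.0 requirement before and after the change. The toy_extensions struct is a hypothetical three-flag subset used only for illustration; the two predicates are transcribed from the removed and added lines of the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the extension flags discussed in the thread. */
struct toy_extensions {
    bool ARB_framebuffer_object;
    bool EXT_framebuffer_sRGB;
    bool EXT_sRGB;
};

/* Before the patch: EXT_framebuffer_sRGB was mandatory. */
static bool srgb_ok_before(const struct toy_extensions *e)
{
    return e->EXT_framebuffer_sRGB;
}

/* After the patch: EXT_sRGB is enough when paired with
 * ARB_framebuffer_object. */
static bool srgb_ok_after(const struct toy_extensions *e)
{
    return e->EXT_framebuffer_sRGB ||
           (e->ARB_framebuffer_object && e->EXT_sRGB);
}
```

A driver that sets EXT_sRGB and ARB_framebuffer_object but not EXT_framebuffer_sRGB fails the old check and passes the new one, which is the case the series is after.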
Re: [Mesa-dev] [RFC PATCH 7/8] mesa/main: Remove now superfluos tests for both EXT_sRGB and EXT_framebuffer_sRGB
Why not order the series such that this commit is not needed? On Tue, Nov 13, 2018 at 12:28 PM Gert Wollny wrote: > > From: Gert Wollny > > Signed-off-by: Gert Wollny > --- > src/mesa/main/fbobject.c | 2 +- > src/mesa/main/teximage.c | 3 +-- > src/mesa/main/version.c | 5 ++--- > 3 files changed, 4 insertions(+), 6 deletions(-) > > diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c > index ca3f3f7f76..7d45ce43f4 100644 > --- a/src/mesa/main/fbobject.c > +++ b/src/mesa/main/fbobject.c > @@ -4253,7 +4253,7 @@ get_framebuffer_attachment_parameter(struct gl_context > *ctx, > } >} >else { > - if (ctx->Extensions.EXT_framebuffer_sRGB || > ctx->Extensions.EXT_sRGB) { > + if (ctx->Extensions.EXT_sRGB) { > *params = > _mesa_get_format_color_encoding(att->Renderbuffer->Format); > } > diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c > index e1d652824e..3c9c8ada99 100644 > --- a/src/mesa/main/teximage.c > +++ b/src/mesa/main/teximage.c > @@ -2438,8 +2438,7 @@ copytexture_error_check( struct gl_context *ctx, GLuint > dimensions, >bool rb_is_srgb = false; >bool dst_is_srgb = false; > > - if ((ctx->Extensions.EXT_framebuffer_sRGB || > - ctx->Extensions.EXT_sRGB) && > + if (ctx->Extensions.EXT_sRGB && >_mesa_get_format_color_encoding(rb->Format) == GL_SRGB) { > rb_is_srgb = true; >} > diff --git a/src/mesa/main/version.c b/src/mesa/main/version.c > index 2f7ac75a81..5709d283f3 100644 > --- a/src/mesa/main/version.c > +++ b/src/mesa/main/version.c > @@ -512,9 +512,8 @@ compute_version_es2(const struct gl_extensions > *extensions, > extensions->ARB_texture_float && > extensions->ARB_texture_rg && > extensions->ARB_depth_buffer_float && > - (extensions->EXT_framebuffer_sRGB || > - (extensions->ARB_framebuffer_object && > - extensions->EXT_sRGB)) && > + extensions->ARB_framebuffer_object && > + extensions->EXT_sRGB && > extensions->EXT_packed_float && > extensions->EXT_texture_array && > extensions->EXT_texture_shared_exponent && > -- > 2.18.1
[Mesa-dev] [PATCH 1/3] st/xa: Fix transformations when we have both source and mask samplers
When both source and mask samplers were present, transformations were typically not applied correctly.

Signed-off-by: Thomas Hellstrom
Reviewed-by: Brian Paul
---
 src/gallium/state_trackers/xa/xa_renderer.c | 117
 1 file changed, 49 insertions(+), 68 deletions(-)

diff --git a/src/gallium/state_trackers/xa/xa_renderer.c b/src/gallium/state_trackers/xa/xa_renderer.c index 0cb75a8c968..ac26c5508cf 100644 --- a/src/gallium/state_trackers/xa/xa_renderer.c +++ b/src/gallium/state_trackers/xa/xa_renderer.c @@ -192,47 +192,55 @@ add_vertex_2tex(struct xa_context *r, } static void -add_vertex_data1(struct xa_context *r, - float srcX, float srcY, float dstX, float dstY, - float width, float height, - struct pipe_resource *src, const float *src_matrix) +compute_src_coords(float sx, float sy, struct pipe_resource *src, + const float *src_matrix, + float width, float height, + float tc0[2], float tc1[2], float tc2[2], float tc3[2]) { -float s0, t0, s1, t1, s2, t2, s3, t3; -float pt0[2], pt1[2], pt2[2], pt3[2]; - -pt0[0] = srcX; -pt0[1] = srcY; -pt1[0] = (srcX + width); -pt1[1] = srcY; -pt2[0] = (srcX + width); -pt2[1] = (srcY + height); -pt3[0] = srcX; -pt3[1] = (srcY + height); +tc0[0] = sx; +tc0[1] = sy; +tc1[0] = (sx + width); +tc1[1] = sy; +tc2[0] = (sx + width); +tc2[1] = (sy + height); +tc3[0] = sx; +tc3[1] = (sy + height); if (src_matrix) { - map_point((float *)src_matrix, pt0[0], pt0[1], &pt0[0], &pt0[1]); - map_point((float *)src_matrix, pt1[0], pt1[1], &pt1[0], &pt1[1]); - map_point((float *)src_matrix, pt2[0], pt2[1], &pt2[0], &pt2[1]); - map_point((float *)src_matrix, pt3[0], pt3[1], &pt3[0], &pt3[1]); + map_point((float *)src_matrix, tc0[0], tc0[1], &tc0[0], &tc0[1]); + map_point((float *)src_matrix, tc1[0], tc1[1], &tc1[0], &tc1[1]); + map_point((float *)src_matrix, tc2[0], tc2[1], &tc2[0], &tc2[1]); + map_point((float *)src_matrix, tc3[0], tc3[1], &tc3[0], &tc3[1]); } -s0 = pt0[0] / src->width0; -s1 = pt1[0] / src->width0; -s2 = pt2[0] / src->width0; -s3 = pt3[0] / src->width0; -t0 = pt0[1] / src->height0; -t1 = pt1[1] /
src->height0; -t2 = pt2[1] / src->height0; -t3 = pt3[1] / src->height0; +tc0[0] /= src->width0; +tc1[0] /= src->width0; +tc2[0] /= src->width0; +tc3[0] /= src->width0; +tc0[1] /= src->height0; +tc1[1] /= src->height0; +tc2[1] /= src->height0; +tc3[1] /= src->height0; +} +static void +add_vertex_data1(struct xa_context *r, + float srcX, float srcY, float dstX, float dstY, + float width, float height, + struct pipe_resource *src, const float *src_matrix) +{ +float tc0[2], tc1[2], tc2[2], tc3[2]; + +compute_src_coords(srcX, srcY, src, src_matrix, width, height, + tc0, tc1, tc2, tc3); /* 1st vertex */ -add_vertex_1tex(r, dstX, dstY, s0, t0); +add_vertex_1tex(r, dstX, dstY, tc0[0], tc0[1]); /* 2nd vertex */ -add_vertex_1tex(r, dstX + width, dstY, s1, t1); +add_vertex_1tex(r, dstX + width, dstY, tc1[0], tc1[1]); /* 3rd vertex */ -add_vertex_1tex(r, dstX + width, dstY + height, s2, t2); +add_vertex_1tex(r, dstX + width, dstY + height, tc2[0], tc2[1]); /* 4th vertex */ -add_vertex_1tex(r, dstX, dstY + height, s3, t3); +add_vertex_1tex(r, dstX, dstY + height, tc3[0], tc3[1]); } static void @@ -243,53 +251,26 @@ add_vertex_data2(struct xa_context *r, struct pipe_resource *mask, const float *src_matrix, const float *mask_matrix) { -float src_s0, src_t0, src_s1, src_t1; -float mask_s0, mask_t0, mask_s1, mask_t1; -float spt0[2], spt1[2]; -float mpt0[2], mpt1[2]; - -spt0[0] = srcX; -spt0[1] = srcY; -spt1[0] = srcX + width; -spt1[1] = srcY + height; - -mpt0[0] = maskX; -mpt0[1] = maskY; -mpt1[0] = maskX + width; -mpt1[1] = maskY + height; - -if (src_matrix) { - map_point((float *)src_matrix, spt0[0], spt0[1], [0], [1]); - map_point((float *)src_matrix, spt1[0], spt1[1], [0], [1]); -} - -if (mask_matrix) { - map_point((float *)mask_matrix, mpt0[0], mpt0[1], [0], [1]); - map_point((float *)mask_matrix, mpt1[0], mpt1[1], [0], [1]); -} - -src_s0 = spt0[0] / src->width0; -src_t0 = spt0[1] / src->height0; -src_s1 = spt1[0] / src->width0; -src_t1 = spt1[1] / src->height0; +float 
spt0[2], spt1[2], spt2[2], spt3[2]; +float mpt0[2], mpt1[2], mpt2[2], mpt3[2]; -mask_s0 = mpt0[0] / mask->width0; -mask_t0 = mpt0[1] / mask->height0; -mask_s1 = mpt1[0] / mask->width0; -mask_t1 = mpt1[1] / mask->height0; +compute_src_coords(srcX, srcY, src, src_matrix, width, height, + spt0, spt1, spt2, spt3); +
[Mesa-dev] [PATCH 3/3] st/xa: Support Component Alpha with trivial blending
Support Component Alpha for those composite operations that do not require per-channel alpha blending.

Signed-off-by: Thomas Hellstrom
Reviewed-by: Brian Paul
---
 src/gallium/state_trackers/xa/xa_composite.c | 33
 src/gallium/state_trackers/xa/xa_priv.h      | 1 +
 src/gallium/state_trackers/xa/xa_tgsi.c      | 18 ---
 3 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/src/gallium/state_trackers/xa/xa_composite.c b/src/gallium/state_trackers/xa/xa_composite.c index b0746327522..34d78027e27 100644 --- a/src/gallium/state_trackers/xa/xa_composite.c +++ b/src/gallium/state_trackers/xa/xa_composite.c @@ -111,12 +111,6 @@ blend_for_op(struct xa_composite_blend *blend, int i; boolean supported = FALSE; -/* - * No component alpha yet. - */ -if (mask_pic && mask_pic->component_alpha) - return FALSE; - /* * our default in case something goes wrong */ @@ -130,6 +124,12 @@ blend_for_op(struct xa_composite_blend *blend, } } +/* + * No component alpha yet. + */ +if (mask_pic && mask_pic->component_alpha && blend->alpha_src) + return FALSE; + if (!dst_pic->srf) return supported; @@ -224,15 +224,9 @@ xa_src_pict_is_accelerated(const union xa_source_pict *src_pic) XA_EXPORT int xa_composite_check_accelerated(const struct xa_composite *comp) { -struct xa_composite_blend blend; struct xa_picture *src_pic = comp->src; struct xa_picture *mask_pic = comp->mask; - -/* - * No component alpha yet. - */ -if (mask_pic && mask_pic->component_alpha) - return -XA_ERR_INVAL; +struct xa_composite_blend blend; if (!xa_is_filter_accelerated(src_pic) || !xa_is_filter_accelerated(comp->mask)) { @@ -246,6 +240,12 @@ xa_composite_check_accelerated(const struct xa_composite *comp) if (!blend_for_op(&blend, comp->op, comp->src, comp->mask, comp->dst)) return -XA_ERR_INVAL; +/* + * No component alpha yet.
+ */ +if (mask_pic && mask_pic->component_alpha && blend.alpha_src) + return -XA_ERR_INVAL; + return XA_ERR_NONE; } @@ -382,10 +382,15 @@ bind_shaders(struct xa_context *ctx, const struct xa_composite *comp) struct xa_shader shader; struct xa_picture *src_pic = comp->src; struct xa_picture *mask_pic = comp->mask; +struct xa_picture *dst_pic = comp->dst; ctx->has_solid_src = FALSE; ctx->has_solid_mask = FALSE; +if (dst_pic && xa_format_type(dst_pic->pict_format) != +xa_format_type(xa_surface_format(dst_pic->srf))) + return -XA_ERR_INVAL; + if (src_pic) { if (src_pic->wrap == xa_wrap_clamp_to_border && src_pic->has_transform) fs_traits |= FS_SRC_REPEAT_NONE; @@ -405,6 +410,8 @@ bind_shaders(struct xa_context *ctx, const struct xa_composite *comp) if (mask_pic) { vs_traits |= VS_MASK; fs_traits |= FS_MASK; +if (mask_pic->component_alpha) + fs_traits |= FS_CA; if (mask_pic->src_pict) { if (!xa_handle_src_pict(ctx, mask_pic->src_pict, true)) return -XA_ERR_INVAL; diff --git a/src/gallium/state_trackers/xa/xa_priv.h b/src/gallium/state_trackers/xa/xa_priv.h index 09a858ff972..f368de3b81f 100644 --- a/src/gallium/state_trackers/xa/xa_priv.h +++ b/src/gallium/state_trackers/xa/xa_priv.h @@ -166,6 +166,7 @@ enum xa_fs_traits { FS_SRC_LUMINANCE = 1 << 11, FS_MASK_LUMINANCE = 1 << 12, FS_DST_LUMINANCE = 1 << 13, +FS_CA = 1 << 14, }; struct xa_shader { diff --git a/src/gallium/state_trackers/xa/xa_tgsi.c b/src/gallium/state_trackers/xa/xa_tgsi.c index 5f2608aee55..ed3e0895d98 100644 --- a/src/gallium/state_trackers/xa/xa_tgsi.c +++ b/src/gallium/state_trackers/xa/xa_tgsi.c @@ -82,6 +82,7 @@ print_fs_traits(int fs_traits) "FS_SRC_LUMINANCE", /* = 1 << 11, */ "FS_MASK_LUMINANCE",/* = 1 << 12, */ "FS_DST_LUMINANCE", /* = 1 << 13, */ +"FS_CA",/* = 1 << 14, */ }; int i, k; @@ -107,12 +108,20 @@ src_in_mask(struct ureg_program *ureg, struct ureg_dst dst, struct ureg_src src, struct ureg_src mask, - unsigned mask_luminance) + unsigned mask_luminance, boolean component_alpha) { if 
(mask_luminance) -ureg_MUL(ureg, dst, src, ureg_scalar(mask, TGSI_SWIZZLE_X)); -else +if (component_alpha) { +ureg_MOV(ureg, dst, src); +ureg_MUL(ureg, ureg_writemask(dst, TGSI_WRITEMASK_W), + src, ureg_scalar(mask, TGSI_SWIZZLE_X)); +} else { +ureg_MUL(ureg, dst, src, ureg_scalar(mask, TGSI_SWIZZLE_X)); +} +else if (!component_alpha) ureg_MUL(ureg, dst, src, ureg_scalar(mask, TGSI_SWIZZLE_W)); +else +ureg_MUL(ureg, dst, src, mask); } static struct ureg_src @@ -347,6 +356,7 @@ create_fs(struct pipe_context *pipe,
[Mesa-dev] [PATCH 2/3] st/xa: Minor renderer cleanups
Constify function arguments to clean up the code a bit.

Reported-by: Brian Paul
Signed-off-by: Thomas Hellstrom
Reviewed-by: Brian Paul
---
 src/gallium/state_trackers/xa/xa_renderer.c | 24 ++---
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/src/gallium/state_trackers/xa/xa_renderer.c b/src/gallium/state_trackers/xa/xa_renderer.c index ac26c5508cf..582a5fa1308 100644 --- a/src/gallium/state_trackers/xa/xa_renderer.c +++ b/src/gallium/state_trackers/xa/xa_renderer.c @@ -46,14 +46,14 @@ renderer_set_constants(struct xa_context *r, int shader_type, const float *params, int param_bytes); static inline boolean -is_affine(float *matrix) +is_affine(const float *matrix) { return floatIsZero(matrix[2]) && floatIsZero(matrix[5]) && floatsEqual(matrix[8], 1); } static inline void -map_point(float *mat, float x, float y, float *out_x, float *out_y) +map_point(const float *mat, float x, float y, float *out_x, float *out_y) { if (!mat) { *out_x = x; @@ -192,25 +192,25 @@ add_vertex_2tex(struct xa_context *r, } static void -compute_src_coords(float sx, float sy, struct pipe_resource *src, +compute_src_coords(float sx, float sy, const struct pipe_resource *src, const float *src_matrix, float width, float height, float tc0[2], float tc1[2], float tc2[2], float tc3[2]) { tc0[0] = sx; tc0[1] = sy; -tc1[0] = (sx + width); +tc1[0] = sx + width; tc1[1] = sy; -tc2[0] = (sx + width); -tc2[1] = (sy + height); +tc2[0] = sx + width; +tc2[1] = sy + height; tc3[0] = sx; -tc3[1] = (sy + height); +tc3[1] = sy + height; if (src_matrix) { - map_point((float *)src_matrix, tc0[0], tc0[1], &tc0[0], &tc0[1]); - map_point((float *)src_matrix, tc1[0], tc1[1], &tc1[0], &tc1[1]); - map_point((float *)src_matrix, tc2[0], tc2[1], &tc2[0], &tc2[1]); - map_point((float *)src_matrix, tc3[0], tc3[1], &tc3[0], &tc3[1]); + map_point(src_matrix, tc0[0], tc0[1], &tc0[0], &tc0[1]); + map_point(src_matrix, tc1[0], tc1[1], &tc1[0], &tc1[1]); + map_point(src_matrix, tc2[0], tc2[1], &tc2[0], &tc2[1]); + map_point(src_matrix, tc3[0], tc3[1], &tc3[0], &tc3[1]); } tc0[0] /= src->width0; @@ -227,7 +227,7 @@ static void add_vertex_data1(struct xa_context *r, float srcX, float srcY, float dstX, float dstY, float width, float height, - struct pipe_resource *src, const float *src_matrix) + const struct pipe_resource *src, const float *src_matrix) { float tc0[2], tc1[2], tc2[2], tc3[2];
--
2.19.0.rc1
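For readers following the compute_src_coords() refactoring in this series, here is a minimal sketch of why all four corners are mapped rather than two opposite ones. It assumes a 3x3 matrix stored column-major (translation in mat[6] and mat[7], consistent with is_affine() testing elements 2, 5 and 8) and treats a NULL matrix as identity, as the early-out in map_point() suggests. The toy_* names and the exact arithmetic are assumptions for illustration, not the xa code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: 3x3 matrix, column-major. NULL means identity. */
static void toy_map_point(const float *mat, float x, float y,
                          float *out_x, float *out_y)
{
    if (!mat) {
        *out_x = x;
        *out_y = y;
        return;
    }
    /* Homogeneous divide; w is 1 for affine matrices. */
    float w = mat[2] * x + mat[5] * y + mat[8];
    *out_x = (mat[0] * x + mat[3] * y + mat[6]) / w;
    *out_y = (mat[1] * x + mat[4] * y + mat[7]) / w;
}

/* Map every corner of the source rectangle, then normalize by the
 * texture size. Corner order: top-left, top-right, bottom-right,
 * bottom-left. */
static void toy_src_coords(float sx, float sy, float width, float height,
                           float tex_w, float tex_h, const float *mat,
                           float tc[4][2])
{
    const float corners[4][2] = {
        { sx,         sy          },
        { sx + width, sy          },
        { sx + width, sy + height },
        { sx,         sy + height },
    };

    for (size_t i = 0; i < 4; i++) {
        toy_map_point(mat, corners[i][0], corners[i][1],
                      &tc[i][0], &tc[i][1]);
        tc[i][0] /= tex_w;
        tc[i][1] /= tex_h;
    }
}
```

With a 90-degree rotation, the top-right corner ends up with a different t coordinate than the top-left one, so it cannot be rebuilt by combining components of two opposite corners. That is presumably what went wrong when only two points per sampler were transformed.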
[Mesa-dev] [PATCH] mesa/st: swap order of clear() and clear_with_quad()
If we can't clear all the buffers with pctx->clear() (say, for example, because of ColorMask), push the buffers we *can* clear with pctx->clear() first. Tilers want to see clears coming before draws to enable fast-paths, and clearing one of the attachments with a quad-draw first confuses that logic.

Signed-off-by: Rob Clark
---
 src/mesa/state_tracker/st_cb_clear.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_clear.c b/src/mesa/state_tracker/st_cb_clear.c index 22e85019764..3b51bd2c8a7 100644 --- a/src/mesa/state_tracker/st_cb_clear.c +++ b/src/mesa/state_tracker/st_cb_clear.c @@ -442,9 +442,6 @@ st_Clear(struct gl_context *ctx, GLbitfield mask) * use pipe->clear. We want to always use pipe->clear for the other * renderbuffers, because it's likely to be faster. */ - if (quad_buffers) { - clear_with_quad(ctx, quad_buffers); - } if (clear_buffers) { /* We can't translate the clear color to the colorbuffer format, * because different colorbuffers may have different formats. @@ -453,6 +450,9 @@ st_Clear(struct gl_context *ctx, GLbitfield mask) (union pipe_color_union*)&ctx->Color.ClearColor, ctx->Depth.Clear, ctx->Stencil.Clear); } + if (quad_buffers) { + clear_with_quad(ctx, quad_buffers); + } if (mask & BUFFER_BIT_ACCUM) _mesa_clear_accum_buffer(ctx); }
--
2.19.1
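The reordering can be pictured with a small model that records which clear path runs first. This is only a sketch: the buffer bits, the toy_log recorder and the needs_quad parameter are hypothetical, and the real st_Clear() decides per buffer from GL state rather than from a precomputed mask.

```c
#include <assert.h>

/* Hypothetical buffer bits standing in for the real buffer masks. */
enum {
    TOY_BUFFER_COLOR0  = 1 << 0,
    TOY_BUFFER_DEPTH   = 1 << 1,
    TOY_BUFFER_STENCIL = 1 << 2,
};

enum toy_op { TOY_OP_FAST_CLEAR, TOY_OP_QUAD_CLEAR };

/* Records which clear path ran, in order, and on which buffers. */
struct toy_log {
    enum toy_op op[2];
    unsigned buffers[2];
    int count;
};

static void toy_record(struct toy_log *log, enum toy_op op, unsigned buffers)
{
    log->op[log->count] = op;
    log->buffers[log->count] = buffers;
    log->count++;
}

/* needs_quad marks buffers the fast path cannot handle, for example
 * because a color mask is active. */
static void toy_clear(struct toy_log *log, unsigned mask, unsigned needs_quad)
{
    unsigned quad_buffers = mask & needs_quad;
    unsigned clear_buffers = mask & ~needs_quad;

    log->count = 0;
    /* Fast clears first, so they are seen before any draw. */
    if (clear_buffers)
        toy_record(log, TOY_OP_FAST_CLEAR, clear_buffers);
    /* The quad path is a draw, so it goes last. */
    if (quad_buffers)
        toy_record(log, TOY_OP_QUAD_CLEAR, quad_buffers);
}
```

Clearing color and depth with a color mask active then yields a fast depth clear followed by a quad draw for the color buffer, which is the order the commit message says tilers want.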
Re: [Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf
On 13.11.18 at 18:00, Dylan Baker wrote:
> Quoting Erik Faye-Lund (2018-11-13 01:34:53)
>> On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote:
>>> Quoting Erik Faye-Lund (2018-11-12 04:51:47)
>>>> On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote:
>>>>> Which has the same behavior.
>>>>
>>>> Does it? I'm not so sure... IROUND_POS seems to round to nearest integer depending on the FPU rounding mode, _mesa_roundevenf rounds to the nearest *even* value regardless of the FPU rounding mode, no?
>>>>
>>>> I'm not sure if it matters or not, but *at least* point that out in the commit message. Unless I'm missing something, of course...
>>>
>>> I should put it in the commit message, but there is a comment in rounding.h that if you change the rounding mode you get to keep the pieces.
>>
>> Well, this might regress performance pretty badly. Especially in the swrast code, this could be bad...
>
> Why? we have the assumption that you don't change the rounding mode already in core mesa and many of the drivers.
>
> For performance, I measured a simple 1000 loops of rounding, and found that the only way the rounding.h function was slower is if you used the __SSE4_1__ path... (It was the same performance as the int cast +0.5 implementation)

FWIW I'm not entirely sure it's useful to have an sse41 implementation, since all sse2-capable CPUs can natively do rintf. Although maybe it should be pointed out that the sse41 implementation will use a defined rounding mode, whereas rintf will use the current rounding mode. But I don't think anyone ever cares about the results if a different rounding mode is set. Of course, rint and its variants do not actually guarantee the "even" part (but if it's an sse41-capable box we pretty much know it would do just that anyway)...

(And technically nearbyintf would probably be an even better solution, since we never want to get involved with the clunky exceptions; otherwise it's identical. But there might be reasons why it isn't used.)

Roland

> Dylan
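The behavioural difference raised in this thread only shows up on exact halves. Below is a minimal sketch, assuming non-negative inputs that fit in an int. The helper names are made up for illustration and are not Mesa's, and the ties-to-even rule is written out by hand instead of calling rintf, which under the default rounding mode should give the same results on these inputs.

```c
#include <assert.h>

/* The "int cast +0.5" idiom: ties always go up. Only meaningful for
 * non-negative inputs that fit in an int. */
static int toy_round_half_up_pos(float f)
{
    return (int)(f + 0.5f);
}

/* Round to nearest, ties to even, spelled out by hand for
 * non-negative inputs that fit in an int. */
static int toy_round_half_even_pos(float f)
{
    int i = (int)f;              /* truncate toward zero */
    float frac = f - (float)i;   /* fractional part in [0, 1) */

    /* Round up past the midpoint, or at the midpoint when i is odd. */
    if (frac > 0.5f || (frac == 0.5f && (i & 1)))
        i++;
    return i;
}
```

For 0.5 and 2.5 the two helpers disagree (1 versus 0, and 3 versus 2), while for 1.5, 2.4 and 2.6 they agree. Whether that matters for a given caller is exactly the question asked of the commit message.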
[Mesa-dev] [PATCH 8/8] intel/compiler: Lower SSBO and shared loads/stores in NIR
We have a bunch of code to do this in the back-end compiler but it's fairly specific to typed surface messages and the way we emit them. This breaks it out into NIR where it's easier to do things a bit more generally. It also means we can easily share the code between the vec4 and FS back-ends if we wish.
---
 src/intel/Makefile.sources                    | 1 +
 src/intel/compiler/brw_fs_nir.cpp             | 381 --
 src/intel/compiler/brw_nir.c                  | 2 +
 src/intel/compiler/brw_nir.h                  | 2 +
 .../brw_nir_lower_mem_access_bit_sizes.c      | 313 ++
 src/intel/compiler/brw_vec4_nir.cpp           | 126 +-
 src/intel/compiler/meson.build                | 1 +
 7 files changed, 421 insertions(+), 405 deletions(-)
 create mode 100644 src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c

diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources index 4da887f7ed2..5e7d32293b7 100644 --- a/src/intel/Makefile.sources +++ b/src/intel/Makefile.sources @@ -85,6 +85,7 @@ COMPILER_FILES = \ compiler/brw_nir_attribute_workarounds.c \ compiler/brw_nir_lower_cs_intrinsics.c \ compiler/brw_nir_lower_image_load_store.c \ + compiler/brw_nir_lower_mem_access_bit_sizes.c \ compiler/brw_nir_opt_peephole_ffma.c \ compiler/brw_nir_tcs_workarounds.c \ compiler/brw_packed_float.c \
diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp index 2b36171136e..84d0c6be6c3 100644 --- a/src/intel/compiler/brw_fs_nir.cpp +++ b/src/intel/compiler/brw_fs_nir.cpp @@ -26,6 +26,7 @@ #include "brw_fs_surface_builder.h" #include "brw_nir.h" #include "util/u_math.h" +#include "util/bitscan.h" using namespace brw; using namespace brw::surface_access; @@ -2250,107 +2251,6 @@ fs_visitor::get_indirect_offset(nir_intrinsic_instr *instr) return get_nir_src(*offset_src); } -static void -do_untyped_vector_read(const fs_builder &bld, - const fs_reg dest, - const fs_reg surf_index, - const fs_reg offset_reg, - unsigned num_components) -{ - if (type_sz(dest.type) <= 2) { - assert(dest.stride == 1); - boolean is_const_offset = offset_reg.file ==
BRW_IMMEDIATE_VALUE; - - if (is_const_offset) { - uint32_t start = offset_reg.ud & ~3; - uint32_t end = offset_reg.ud + num_components * type_sz(dest.type); - end = ALIGN(end, 4); - assert (end - start <= 16); - - /* At this point we have 16-bit component/s that have constant - * offset aligned to 4-bytes that can be read with untyped_reads. - * untyped_read message requires 32-bit aligned offsets. - */ - unsigned first_component = (offset_reg.ud & 3) / type_sz(dest.type); - unsigned num_components_32bit = (end - start) / 4; - - fs_reg read_result = -emit_untyped_read(bld, surf_index, brw_imm_ud(start), - 1 /* dims */, - num_components_32bit, - BRW_PREDICATE_NONE); - shuffle_from_32bit_read(bld, dest, read_result, first_component, - num_components); - } else { - fs_reg read_offset = bld.vgrf(BRW_REGISTER_TYPE_UD); - for (unsigned i = 0; i < num_components; i++) { -if (i == 0) { - bld.MOV(read_offset, offset_reg); -} else { - bld.ADD(read_offset, offset_reg, - brw_imm_ud(i * type_sz(dest.type))); -} -/* Non constant offsets are not guaranteed to be aligned 32-bits - * so they are read using one byte_scattered_read message - * for each component. - */ -fs_reg read_result = - emit_byte_scattered_read(bld, surf_index, read_offset, -1 /* dims */, 1, -type_sz(dest.type) * 8 /* bit_size */, -BRW_PREDICATE_NONE); -bld.MOV(offset(dest, bld, i), -subscript (read_result, dest.type, 0)); - } - } - } else if (type_sz(dest.type) == 4) { - fs_reg read_result = emit_untyped_read(bld, surf_index, offset_reg, - 1 /* dims */, - num_components, - BRW_PREDICATE_NONE); - read_result.type = dest.type; - for (unsigned i = 0; i < num_components; i++) - bld.MOV(offset(dest, bld, i), offset(read_result, bld, i)); - } else if (type_sz(dest.type) == 8) { - /* Reading a dvec, so we need to: - * - * 1. Multiply num_components by 2, to account for the fact that we - *need to read 64-bit components. - * 2. Shuffle the result
[Mesa-dev] [PATCH 7/8] nir: Add alignment parameters to SSBO, UBO, and shared access
This also changes spirv_to_nir and glsl_to_nir to set them. The one place that doesn't set them is shared memory access lowering in nir_lower_io. That will have to be updated before any consumers of it can effectively use these new alignments.
---
 src/compiler/glsl/glsl_to_nir.cpp            | 14 +++
 src/compiler/nir/nir.h                       | 41
 src/compiler/nir/nir_intrinsics.py           | 26 -
 src/compiler/nir/nir_lower_atomics_to_ssbo.c | 4 ++
 src/compiler/nir/nir_print.c                 | 2 +
 src/compiler/spirv/spirv_to_nir.c            | 2 +
 src/compiler/spirv/vtn_variables.c           | 6 +++
 7 files changed, 85 insertions(+), 10 deletions(-)

diff --git a/src/compiler/glsl/glsl_to_nir.cpp b/src/compiler/glsl/glsl_to_nir.cpp index 9bb0f5d4044..9f73b721e39 100644 --- a/src/compiler/glsl/glsl_to_nir.cpp +++ b/src/compiler/glsl/glsl_to_nir.cpp @@ -33,6 +33,7 @@ #include "compiler/nir/nir_builder.h" #include "main/imports.h" #include "main/mtypes.h" +#include "util/u_math.h" /* * pass to lower GLSL IR to NIR @@ -603,6 +604,14 @@ nir_visitor::visit(ir_return *ir) nir_builder_instr_insert(&b, &instr->instr); } +static void +intrinsic_set_std430_align(nir_intrinsic_instr *intrin, const glsl_type *type) +{ + unsigned bit_size = type->is_boolean() ? 
32 : glsl_get_bit_size(type); + unsigned pow2_components = util_next_power_of_two(type->vector_elements); + nir_intrinsic_set_align(intrin, (bit_size / 8) * pow2_components, 0); +} + void nir_visitor::visit(ir_call *ir) { @@ -1006,6 +1015,7 @@ nir_visitor::visit(ir_call *ir) instr->src[0] = nir_src_for_ssa(nir_val); instr->src[1] = nir_src_for_ssa(evaluate_rvalue(block)); instr->src[2] = nir_src_for_ssa(evaluate_rvalue(offset)); + intrinsic_set_std430_align(instr, val->type); nir_intrinsic_set_write_mask(instr, write_mask->value.u[0]); instr->num_components = val->type->vector_elements; @@ -1024,6 +1034,7 @@ nir_visitor::visit(ir_call *ir) const glsl_type *type = ir->return_deref->var->type; instr->num_components = type->vector_elements; + intrinsic_set_std430_align(instr, type); /* Setup destination register */ unsigned bit_size = type->is_boolean() ? 32 : glsl_get_bit_size(type); @@ -1101,6 +1112,7 @@ nir_visitor::visit(ir_call *ir) const glsl_type *type = ir->return_deref->var->type; instr->num_components = type->vector_elements; + intrinsic_set_std430_align(instr, type); /* Setup destination register */ unsigned bit_size = type->is_boolean() ? 
32 : glsl_get_bit_size(type); @@ -1131,6 +1143,7 @@ nir_visitor::visit(ir_call *ir) instr->src[0] = nir_src_for_ssa(nir_val); instr->num_components = val->type->vector_elements; + intrinsic_set_std430_align(instr, val->type); nir_builder_instr_insert(, >instr); break; @@ -1388,6 +1401,7 @@ nir_visitor::visit(ir_expression *ir) load->num_components = ir->type->vector_elements; load->src[0] = nir_src_for_ssa(evaluate_rvalue(ir->operands[0])); load->src[1] = nir_src_for_ssa(evaluate_rvalue(ir->operands[1])); + intrinsic_set_std430_align(load, ir->type); add_instr(>instr, ir->type->vector_elements, bit_size); /* diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index c469e111b2c..41d61dc8105 100644 --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h @@ -34,6 +34,7 @@ #include "util/list.h" #include "util/ralloc.h" #include "util/set.h" +#include "util/bitscan.h" #include "util/bitset.h" #include "util/macros.h" #include "compiler/nir_types.h" @@ -1248,6 +1249,18 @@ typedef enum { */ NIR_INTRINSIC_ACCESS = 16, + /** +* Alignment for offsets and addresses +* +* These two parameters, specify an alignment in terms of a multiplier and +* an offset. 
The offset or address parameter X of the intrinsic is +* guaranteed to satisfy the following: +* +*(X - align_offset) % align_mul == 0 +*/ + NIR_INTRINSIC_ALIGN_MUL = 17, + NIR_INTRINSIC_ALIGN_OFFSET = 18, + NIR_INTRINSIC_NUM_INDEX_FLAGS, } nir_intrinsic_index_flag; @@ -1342,6 +1355,34 @@ INTRINSIC_IDX_ACCESSORS(image_dim, IMAGE_DIM, enum glsl_sampler_dim) INTRINSIC_IDX_ACCESSORS(image_array, IMAGE_ARRAY, bool) INTRINSIC_IDX_ACCESSORS(access, ACCESS, enum gl_access_qualifier) INTRINSIC_IDX_ACCESSORS(format, FORMAT, unsigned) +INTRINSIC_IDX_ACCESSORS(align_mul, ALIGN_MUL, unsigned) +INTRINSIC_IDX_ACCESSORS(align_offset, ALIGN_OFFSET, unsigned) + +static inline void +nir_intrinsic_set_align(nir_intrinsic_instr *intrin, +unsigned align_mul, unsigned align_offset) +{ + assert(util_is_power_of_two_nonzero(align_mul)); + assert(align_offset < align_mul); + nir_intrinsic_set_align_mul(intrin, align_mul); +
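The alignment contract in the nir.h comment above can be illustrated with a small standalone sketch. Both helper names below are hypothetical and are not part of NIR; the first checks the documented invariant `(X - align_offset) % align_mul == 0` under the same preconditions that nir_intrinsic_set_align() asserts, and the second mirrors the arithmetic of intrinsic_set_std430_align() on plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a NIR function): does offset x satisfy the
 * (align_mul, align_offset) contract described in the nir.h comment? */
static bool
offset_satisfies_align(uint32_t x, uint32_t align_mul, uint32_t align_offset)
{
   /* Same preconditions that nir_intrinsic_set_align() asserts. */
   assert(align_mul != 0 && (align_mul & (align_mul - 1)) == 0);
   assert(align_offset < align_mul);

   /* Unsigned subtraction wraps modulo 2^32. Since align_mul is a power
    * of two it divides 2^32, so the remainder is still meaningful when
    * x < align_offset. */
   return ((x - align_offset) % align_mul) == 0;
}

/* Hypothetical helper: component size in bytes times the component count
 * rounded up to a power of two, as intrinsic_set_std430_align() computes. */
static uint32_t
std430_vector_align(uint32_t bit_size, uint32_t components)
{
   uint32_t pow2_components = 1;
   while (pow2_components < components)
      pow2_components <<= 1;
   return (bit_size / 8) * pow2_components;
}
```

With these definitions a 32-bit vec3 gets a 16-byte alignment, the same as a vec4, which is why the patch rounds the component count up before multiplying.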
[Mesa-dev] [PATCH 2/8] nir/builder: Assert that intN_t immediates fit
This assert won't catch all mistakes with this helper but it will at
least ensure that the top bits are all zero or all one which should help
catch bugs.
---
 src/compiler/nir/nir_builder.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/compiler/nir/nir_builder.h b/src/compiler/nir/nir_builder.h
index 3271a480520..3be630ab3dd 100644
--- a/src/compiler/nir/nir_builder.h
+++ b/src/compiler/nir/nir_builder.h
@@ -330,6 +330,10 @@ nir_imm_intN_t(nir_builder *build, uint64_t x, unsigned bit_size)
 {
    nir_const_value v;
 
+   assert(bit_size == 64 ||
+          (int64_t)x >> bit_size == 0 ||
+          (int64_t)x >> bit_size == -1);
+
    memset(&v, 0, sizeof(v));
    assert(bit_size <= 64);
    v.i64[0] = x & (~0ull >> (64 - bit_size));
-- 
2.19.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
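To see what the new assert accepts and rejects, the same predicate can be pulled out into a standalone function. This is only a sketch: `imm_fits_in_bits` is a hypothetical name, and like the patch it assumes that right-shifting a negative `int64_t` is an arithmetic shift, which is implementation-defined in C but holds for GCC and Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical standalone version of the check added to nir_imm_intN_t():
 * the bits above bit_size must be all zero or all one. */
static bool
imm_fits_in_bits(uint64_t x, unsigned bit_size)
{
   assert(bit_size >= 1 && bit_size <= 64);

   /* Shifting a 64-bit value by 64 is undefined, so handle it first,
    * exactly as the patch's "bit_size == 64 ||" clause does. */
   if (bit_size == 64)
      return true;

   /* Assumes arithmetic right shift for negative values. */
   const int64_t top = (int64_t)x >> bit_size;
   return top == 0 || top == -1;
}
```

As the commit message says, this does not catch every mistake: `(uint64_t)-129` with `bit_size == 8` passes because all of its top bits are one, even though -129 is not representable as a signed 8-bit integer.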
[Mesa-dev] [PATCH 1/8] nir/lower_alu_to_scalar: Don't try to lower unpack_32_2x16
It messes up when trying to lower.

Cc: mesa-sta...@lists.freedesktop.org
---
 src/compiler/nir/nir_lower_alu_to_scalar.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/compiler/nir/nir_lower_alu_to_scalar.c b/src/compiler/nir/nir_lower_alu_to_scalar.c
index 0be3aba9456..7ef032cd164 100644
--- a/src/compiler/nir/nir_lower_alu_to_scalar.c
+++ b/src/compiler/nir/nir_lower_alu_to_scalar.c
@@ -194,6 +194,7 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
    }
 
    case nir_op_unpack_64_2x32:
+   case nir_op_unpack_32_2x16:
       return false;
 
       LOWER_REDUCTION(nir_op_fdot, nir_op_fmul, nir_op_fadd);
-- 
2.19.1
[Mesa-dev] [PATCH 0/8] intel: Move shared/SSBO access lowering to NIR
In order to properly do all the different kinds of SSBO and SLM writes
that we have in GL and Vulkan, we have to do some lowering. The hardware
doesn't have instructions for writing an N-bit vecM with an arbitrary
write-mask. Instead, we have byte scattered messages which work on a
scalar byte, word, or dword at an unaligned address and untyped surface
messages which work on a 32-bit vecN. All SSBO and SLM access has to be
lowered to one of these two things.

Previously we did this in the back-end and had separate copies for fs
and vec4. This worked but it was fairly heavily tied to the
fs_surface_builder and the way we emit typed load/store ops. I've been
interested in wiring up the A64 messages for doing "global" reads and
writes and they will need exactly the same lowering but I'm not at all
convinced I want to shove them through the same emit_untyped_read/write
helpers we have today. In any case, this lets us share code between
vec4 and fs and I think the implementation is overall cleaner for it.

This series has a few other advantages beyond just code sharing:

 1) The new splitting code acts on ranges of bytes and is able to
    combine loads/stores in more cases than the old code could. For
    example, an indirect u8vec3 load is now just a single dword load
    where we throw away the last 16 bits. Another example is that a
    u16vec4 write with a YZ writemask is now written with a single
    unaligned dword store.

 2) OpBitcast in SPIR-V now works correctly on 8-bit types.

 3) Writes to 8 and 16-bit shared variables should now work.

Cc: Samuel Iglesias Gonsálvez
Cc: Jose Maria Casanova Crespo

Jason Ekstrand (8):
  nir/lower_alu_to_scalar: Don't try to lower unpack_32_2x16
  nir/builder: Assert that intN_t immediates fit
  nir/builder: Add iadd_imm and imul_imm helpers
  nir/builder: Add a nir_pack/unpack/bitcast helpers
  nir/spirv: Force 32-bit for UBO and SSBO Booleans
  nir/glsl: Force 32-bit for UBO and SSBO Booleans
  nir: Add alignment parameters to SSBO, UBO, and shared access
  intel/compiler: Lower SSBO and shared loads/stores in NIR

 src/compiler/glsl/glsl_to_nir.cpp             |  31 +-
 src/compiler/nir/nir.h                        |  41 ++
 src/compiler/nir/nir_builder.h                | 142 +++
 src/compiler/nir/nir_intrinsics.py            |  26 +-
 src/compiler/nir/nir_lower_alu_to_scalar.c    |   1 +
 src/compiler/nir/nir_lower_atomics_to_ssbo.c  |   4 +
 src/compiler/nir/nir_lower_io.c               |   5 +-
 src/compiler/nir/nir_print.c                  |   2 +
 src/compiler/spirv/spirv_to_nir.c             |   2 +
 src/compiler/spirv/vtn_alu.c                  | 101 ++---
 src/compiler/spirv/vtn_variables.c            |  30 +-
 src/intel/Makefile.sources                    |   1 +
 src/intel/compiler/brw_fs_nir.cpp             | 381 --
 src/intel/compiler/brw_nir.c                  |   2 +
 src/intel/compiler/brw_nir.h                  |   2 +
 .../brw_nir_lower_mem_access_bit_sizes.c      | 313 ++
 src/intel/compiler/brw_vec4_nir.cpp           | 126 +-
 src/intel/compiler/meson.build                |   1 +
 18 files changed, 702 insertions(+), 509 deletions(-)
 create mode 100644 src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c

-- 
2.19.1
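The u16vec4 example from point 1 of the cover letter can be worked through by hand. The sketch below is not code from the series (brw_nir_lower_mem_access_bit_sizes.c is not shown in this thread); `struct byte_range` and `first_write_range` are hypothetical names used only to show how a write-mask maps to a contiguous range of bytes.

```c
#include <assert.h>

/* Hypothetical type: a contiguous run of bytes touched by a store. */
struct byte_range {
   unsigned start;   /* byte offset from the start of the vector */
   unsigned length;  /* number of bytes written */
};

/* Hypothetical helper: byte range covered by the first contiguous run of
 * enabled components in write_mask. Bit c of the mask enables component c. */
static struct byte_range
first_write_range(unsigned write_mask, unsigned num_components,
                  unsigned bit_size)
{
   assert(bit_size % 8 == 0 && num_components <= 4);

   const unsigned comp_bytes = bit_size / 8;
   struct byte_range range = { 0, 0 };
   unsigned c = 0;

   /* Skip leading disabled components. */
   while (c < num_components && !(write_mask & (1u << c)))
      c++;
   range.start = c * comp_bytes;

   /* Extend over the run of enabled components. */
   while (c < num_components && (write_mask & (1u << c))) {
      range.length += comp_bytes;
      c++;
   }
   return range;
}
```

For a u16vec4 with a YZ mask (binary 0110) this gives bytes 2 through 5: four bytes starting at an offset that is not dword-aligned, which matches the "single unaligned dword store" described above.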
[Mesa-dev] [PATCH 6/8] nir/glsl: Force 32-bit for UBO and SSBO Booleans
---
 src/compiler/glsl/glsl_to_nir.cpp | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/compiler/glsl/glsl_to_nir.cpp b/src/compiler/glsl/glsl_to_nir.cpp
index 0479f8fcfe4..9bb0f5d4044 100644
--- a/src/compiler/glsl/glsl_to_nir.cpp
+++ b/src/compiler/glsl/glsl_to_nir.cpp
@@ -1000,7 +1000,10 @@ nir_visitor::visit(ir_call *ir)
          ir_constant *write_mask = ((ir_instruction *)param)->as_constant();
          assert(write_mask);
 
-         instr->src[0] = nir_src_for_ssa(evaluate_rvalue(val));
+         nir_ssa_def *nir_val = evaluate_rvalue(val);
+         assert(!val->type->is_boolean() || nir_val->bit_size == 32);
+
+         instr->src[0] = nir_src_for_ssa(nir_val);
          instr->src[1] = nir_src_for_ssa(evaluate_rvalue(block));
          instr->src[2] = nir_src_for_ssa(evaluate_rvalue(offset));
          nir_intrinsic_set_write_mask(instr, write_mask->value.u[0]);
@@ -1023,7 +1026,7 @@ nir_visitor::visit(ir_call *ir)
          instr->num_components = type->vector_elements;
 
          /* Setup destination register */
-         unsigned bit_size = glsl_get_bit_size(type);
+         unsigned bit_size = type->is_boolean() ? 32 : glsl_get_bit_size(type);
          nir_ssa_dest_init(&instr->instr, &instr->dest,
                            type->vector_elements, bit_size, NULL);
 
@@ -1100,7 +1103,7 @@ nir_visitor::visit(ir_call *ir)
          instr->num_components = type->vector_elements;
 
          /* Setup destination register */
-         unsigned bit_size = glsl_get_bit_size(type);
+         unsigned bit_size = type->is_boolean() ? 32 : glsl_get_bit_size(type);
          nir_ssa_dest_init(&instr->instr, &instr->dest,
                            type->vector_elements, bit_size, NULL);
 
@@ -1123,7 +1126,10 @@ nir_visitor::visit(ir_call *ir)
 
          nir_intrinsic_set_write_mask(instr, write_mask->value.u[0]);
 
-         instr->src[0] = nir_src_for_ssa(evaluate_rvalue(val));
+         nir_ssa_def *nir_val = evaluate_rvalue(val);
+         assert(!val->type->is_boolean() || nir_val->bit_size == 32);
+
+         instr->src[0] = nir_src_for_ssa(nir_val);
          instr->num_components = val->type->vector_elements;
 
          nir_builder_instr_insert(&b, &instr->instr);
@@ -1377,7 +1383,8 @@ nir_visitor::visit(ir_expression *ir)
    case ir_binop_ubo_load: {
       nir_intrinsic_instr *load =
          nir_intrinsic_instr_create(this->shader, nir_intrinsic_load_ubo);
-      unsigned bit_size = glsl_get_bit_size(ir->type);
+      unsigned bit_size = ir->type->is_boolean() ? 32 :
+                          glsl_get_bit_size(ir->type);
       load->num_components = ir->type->vector_elements;
       load->src[0] = nir_src_for_ssa(evaluate_rvalue(ir->operands[0]));
       load->src[1] = nir_src_for_ssa(evaluate_rvalue(ir->operands[1]));
-- 
2.19.1
[Mesa-dev] [PATCH] radeonsi: fix video APIs on Raven2
From: Marek Olšák

This was missed when I added the new enum.

Cc: 18.3
---
 src/gallium/drivers/radeonsi/si_get.c | 9 ++---
 src/gallium/drivers/radeonsi/si_uvd.c | 3 ++-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_get.c b/src/gallium/drivers/radeonsi/si_get.c
index b440230d227..91f38329d59 100644
--- a/src/gallium/drivers/radeonsi/si_get.c
+++ b/src/gallium/drivers/radeonsi/si_get.c
@@ -573,24 +573,26 @@ static int si_get_video_param(struct pipe_screen *screen,
 			      enum pipe_video_cap param)
 {
 	struct si_screen *sscreen = (struct si_screen *)screen;
 	enum pipe_video_format codec = u_reduce_video_profile(profile);
 
 	if (entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE) {
 		switch (param) {
 		case PIPE_VIDEO_CAP_SUPPORTED:
 			return (codec == PIPE_VIDEO_FORMAT_MPEG4_AVC &&
 				(si_vce_is_fw_version_supported(sscreen) ||
-				sscreen->info.family == CHIP_RAVEN)) ||
+				 sscreen->info.family == CHIP_RAVEN ||
+				 sscreen->info.family == CHIP_RAVEN2)) ||
 				(profile == PIPE_VIDEO_PROFILE_HEVC_MAIN &&
 				(sscreen->info.family == CHIP_RAVEN ||
-				si_radeon_uvd_enc_supported(sscreen)));
+				 sscreen->info.family == CHIP_RAVEN2 ||
+				 si_radeon_uvd_enc_supported(sscreen)));
 		case PIPE_VIDEO_CAP_NPOT_TEXTURES:
 			return 1;
 		case PIPE_VIDEO_CAP_MAX_WIDTH:
 			return (sscreen->info.family < CHIP_TONGA) ? 2048 : 4096;
 		case PIPE_VIDEO_CAP_MAX_HEIGHT:
 			return (sscreen->info.family < CHIP_TONGA) ? 1152 : 2304;
 		case PIPE_VIDEO_CAP_PREFERED_FORMAT:
 			return PIPE_FORMAT_NV12;
 		case PIPE_VIDEO_CAP_PREFERS_INTERLACED:
 			return false;
@@ -624,21 +626,22 @@ static int si_get_video_param(struct pipe_screen *screen,
 			return true;
 		case PIPE_VIDEO_FORMAT_HEVC:
 			/* Carrizo only supports HEVC Main */
 			if (sscreen->info.family >= CHIP_STONEY)
 				return (profile == PIPE_VIDEO_PROFILE_HEVC_MAIN ||
 					profile == PIPE_VIDEO_PROFILE_HEVC_MAIN_10);
 			else if (sscreen->info.family >= CHIP_CARRIZO)
 				return profile == PIPE_VIDEO_PROFILE_HEVC_MAIN;
 			return false;
 		case PIPE_VIDEO_FORMAT_JPEG:
-			if (sscreen->info.family == CHIP_RAVEN)
+			if (sscreen->info.family == CHIP_RAVEN ||
+			    sscreen->info.family == CHIP_RAVEN2)
 				return true;
 			if (sscreen->info.family < CHIP_CARRIZO || sscreen->info.family >= CHIP_VEGA10)
 				return false;
 			if (!(sscreen->info.drm_major == 3 && sscreen->info.drm_minor >= 19)) {
 				RVID_ERR("No MJPEG support for the kernel version\n");
 				return false;
 			}
 			return true;
 		case PIPE_VIDEO_FORMAT_VP9:
 			if (sscreen->info.family < CHIP_RAVEN)
diff --git a/src/gallium/drivers/radeonsi/si_uvd.c b/src/gallium/drivers/radeonsi/si_uvd.c
index 1a9d8f8d9fa..8c9553acbf3 100644
--- a/src/gallium/drivers/radeonsi/si_uvd.c
+++ b/src/gallium/drivers/radeonsi/si_uvd.c
@@ -139,21 +139,22 @@ static void si_vce_get_buffer(struct pipe_resource *resource,
 		*surface = &res->surface;
 }
 
 /**
  * creates an UVD compatible decoder
  */
 struct pipe_video_codec *si_uvd_create_decoder(struct pipe_context *context,
 					       const struct pipe_video_codec *templ)
 {
 	struct si_context *ctx = (struct si_context *)context;
-	bool vcn = (ctx->family == CHIP_RAVEN) ? true : false;
+	bool vcn = ctx->family == CHIP_RAVEN ||
+		   ctx->family == CHIP_RAVEN2;
 
 	if (templ->entrypoint == PIPE_VIDEO_ENTRYPOINT_ENCODE) {
 		if (vcn) {
 			return radeon_create_encoder(context, templ, ctx->ws, si_vce_get_buffer);
 		} else {
 			if (u_reduce_video_profile(templ->profile) == PIPE_VIDEO_FORMAT_HEVC)
 				return radeon_uvd_create_encoder(context, templ, ctx->ws, si_vce_get_buffer);
 			else
 				return si_vce_create_encoder(context, templ, ctx->ws, si_vce_get_buffer);
 		}
-- 
2.17.1
Re: [Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf
Am 14.11.18 um 03:02 schrieb Roland Scheidegger: > Am 13.11.18 um 23:49 schrieb Dylan Baker: >> Quoting Roland Scheidegger (2018-11-13 14:13:00) >>> Am 13.11.18 um 18:00 schrieb Dylan Baker: Quoting Erik Faye-Lund (2018-11-13 01:34:53) > On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote: >> Quoting Erik Faye-Lund (2018-11-12 04:51:47) >>> On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote: Which has the same behavior. >>> >>> Does it? I'm not so sure... IROUND_POS seems to round to nearest >>> integer depending on the FPU rounding mode, _mesa_roundevenf rounds >>> to >>> the nearest *even* value regardless of the FPU rounding mode, no? >>> >>> I'm not sure if it matters or not, but *at least* point that out in >>> the >>> commit message. Unless I'm missing something, of course... >> >> I should put it in the commit message, but there is a comment in >> rounding.h that >> if you change the rounding mode you get to keep the pieces. > > Well, this might regress performance pretty badly. Especially in the > swrast code, this could be bad... > Why? we have the assumption that you don't change the rounding mode already in core mesa and many of the drivers. For performance, I measured a simple 1000 loops of rounding, and found that the only way the rounding.h function was slower is if you used the __SSE4_1__ path... (It was the same performance as the int cast +0.5 implementation) >>> FWIW I'm not entirely sure it's useful to have a sse41 implementation - >>> since all sse2 capable cpus can natively do rintf. Although maybe it >>> should be pointed out that the sse41 implementation will use a defined >>> rounding mode, whereas rintf will use current rounding mode. But I don't >>> think anyone ever cares for the results if a different rounding mode >>> would be set. Although of course rint and its variant do not actually >>> guarantee the even part of it (but well if it's a sse41 capable box we >>> pretty much know it would do just that anyway)... 
(And technically >>> nearbyintf would probably be an even better solution, since we never >>> want to get involved with the clunky exceptions, otherwise it's >>> identical. But there might be reasons why it isn't used.) >>> >>> Roland >> >> I'm not convinced we want it either, since it seems to be slower than glibc's >> rintf. I guess it probably does make sense to use the nearbyintf instead. >> >> As an aside (since I know 0 about assembly), does _MM_FROUND_CUR_DIRECTION >> not >> check the rounding mode? > Oh indeed, I didn't check the code too closely (I was just assuming > _mm_round_ss() was used because it is possible to use round-to-nearest > regardless the actual rounding mode, but that's not the case). > > But actually I misread this code: the point of mesa_roundevenf is to > round to float WITHOUT conversion to int. In which case it makes more > sense at least at first look... > > But if you want to round to nearest integer WITH conversion to int, you > probably really want to use something else. nearbyint family doesn't > have variants which give you ints. There's rint functions which give you > ints directly, but they are likely a very bad idea (aside from exception > handling, not quite sure if this really causes the compiler to do > something different) because of giving you long (or long long) results - > meaning that you can't use the simple cpu instructions giving you 32bit > results (because conversion to 64bit long + trunc to 32bit will give you > defined (although meaningless) results in some cases where direct > conversion to 32bit int wouldn't). > So ideally you'd pick a variant where the compiler is smart enough to > recognize it can be done with a single instruction. I would guess > nearbyintf + int cast should do just about everywhere, at least as long > as x64 or x86 + sse2 is used, my suspicion is the old IROUND function > was done in a time where x87 was still relevant. 
Or maybe rintf + int > cast, no idea how the compiler really handles them differently (I tried > to quickly look at it in gcc source, but no idea where those are > buried). As a side note, I hate it when the assembly solution is obvious > and you can't really figure out how the hell you should coax the > compiler in giving you the right answer (I mean, high level languages > are there to help, not get in your way...). > > All that said, I still don't really see the point of the manual sse41 > assembly (even for the case when we don't want to convert to int) - > assuming there is an easy solution to get the compiler to do the right > thing... Err, I tried it out and was completely unable to come up with something which wouldn't generate huge amounts of crap code (or library calls). WTF. (But might depend on compiler, of course.) So I guess maybe for round conversion to int you actually want to manually do sse2 inline asm
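The behavioural difference the thread keeps returning to (add 0.5 and truncate versus round-half-to-even) shows up only on exact ties. The sketch below is an illustration, not Mesa's implementation: both function names are hypothetical, the second one is written without libm or intrinsics, and both assume non-negative inputs as the old IROUND_POS name implies.

```c
#include <assert.h>

/* Hypothetical helper: the "int cast +0.5" approach mentioned in the
 * thread. Ties such as 2.5 always round up. */
static int
iround_pos_add_half(float f)
{
   assert(f >= 0.0f);
   return (int)(f + 0.5f);
}

/* Hypothetical helper: round to nearest, ties to even, which is what
 * rintf()/nearbyintf() produce in the default rounding mode. Restricted
 * to [0, 2^23) so that the truncation and the subtraction are exact. */
static int
iround_pos_half_even(float f)
{
   assert(f >= 0.0f && f < 8388608.0f);

   int i = (int)f;                  /* truncates toward zero */
   const float frac = f - (float)i; /* exact for this input range */

   if (frac > 0.5f || (frac == 0.5f && (i & 1)))
      i++;
   return i;
}
```

For 0.5, 1.5 and 2.5 the first function returns 1, 2 and 3 while the second returns 0, 2 and 2, so swapping one for the other is a visible change on ties even when the FPU rounding mode is never touched.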
Re: [Mesa-dev] [PATCH v3] i965: Fix calculation of layers array length for isl_view
On Mon, Sep 10, 2018 at 10:21 AM Danylo Piliaiev wrote:

> Handle all cases in calculation of layers count for isl_view
> taking into account texture view and image unit.
> st_convert_image was taken as a reference.
>
> When u->Layered is true the whole level is taken with respect to
> image view. In other case only one layer is taken.
>
> v3: (Józef Kucia and Ilia Mirkin)
> - Rewrote patch by taking st_convert_image as a reference
> - Removed now unused get_image_num_layers function
> - Changed commit message
>
> Fixes: 5a8c8903
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107856
>
> Signed-off-by: Danylo Piliaiev
> ---
>  .../drivers/dri/i965/brw_wm_surface_state.c | 32 ++-
>  1 file changed, 17 insertions(+), 15 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> index 944762ec46..9bfe6e2037 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> @@ -1499,18 +1499,6 @@ update_buffer_image_param(struct brw_context *brw,
>     param->stride[0] = _mesa_get_format_bytes(u->_ActualFormat);
>  }
>
> -static unsigned
> -get_image_num_layers(const struct intel_mipmap_tree *mt, GLenum target,
> -                     unsigned level)
> -{
> -   if (target == GL_TEXTURE_CUBE_MAP)
> -      return 6;
> -
> -   return target == GL_TEXTURE_3D ?
> -      minify(mt->surf.logical_level0_px.depth, level) :
> -      mt->surf.logical_level0_px.array_len;
> -}
> -
>  static void
>  update_image_surface(struct brw_context *brw,
>                       struct gl_image_unit *u,
> @@ -1541,14 +1529,28 @@ update_image_surface(struct brw_context *brw,
>        } else {
>           struct intel_texture_object *intel_obj =
> intel_texture_object(obj);
>           struct intel_mipmap_tree *mt = intel_obj->mt;
> -         const unsigned num_layers = u->Layered ?
> -            get_image_num_layers(mt, obj->Target, u->Level) : 1;
> +
> +         unsigned base_layer, num_layers;
> +         if (u->Layered) {
> +            if (obj->Target == GL_TEXTURE_3D) {
> +               base_layer = 0;
> +               num_layers = minify(mt->surf.logical_level0_px.depth,
> u->Level);
> +            } else {
> +               base_layer = obj->MinLayer;
> +               num_layers = obj->Immutable ?
> +                  obj->NumLayers :
> +                  mt->surf.logical_level0_px.array_len;
>

Doesn't this need to be array_len - base_layer? I'm not sure on the
others without digging.

> +            }
> +         } else {
> +            base_layer = obj->MinLayer + u->_Layer;
> +            num_layers = 1;
> +         }
>
>           struct isl_view view = {
>              .format = format,
>              .base_level = obj->MinLevel + u->Level,
>              .levels = 1,
> -            .base_array_layer = obj->MinLayer + u->_Layer,
> +            .base_array_layer = base_layer,
>              .array_len = num_layers,
>              .swizzle = ISL_SWIZZLE_IDENTITY,
>              .usage = ISL_SURF_USAGE_STORAGE_BIT,
> --
> 2.18.0
>
>
Re: [Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf
Quoting Roland Scheidegger (2018-11-13 14:13:00) > Am 13.11.18 um 18:00 schrieb Dylan Baker: > > Quoting Erik Faye-Lund (2018-11-13 01:34:53) > >> On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote: > >>> Quoting Erik Faye-Lund (2018-11-12 04:51:47) > On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote: > > Which has the same behavior. > > Does it? I'm not so sure... IROUND_POS seems to round to nearest > integer depending on the FPU rounding mode, _mesa_roundevenf rounds > to > the nearest *even* value regardless of the FPU rounding mode, no? > > I'm not sure if it matters or not, but *at least* point that out in > the > commit message. Unless I'm missing something, of course... > >>> > >>> I should put it in the commit message, but there is a comment in > >>> rounding.h that > >>> if you change the rounding mode you get to keep the pieces. > >> > >> Well, this might regress performance pretty badly. Especially in the > >> swrast code, this could be bad... > >> > > > > Why? we have the assumption that you don't change the rounding mode already > > in > > core mesa and many of the drivers. > > > > For performance, I measured a simple 1000 loops of rounding, and found that > > the > > only way the rounding.h function was slower is if you used the __SSE4_1__ > > path... (It was the same performance as the int cast +0.5 implementation) > FWIW I'm not entirely sure it's useful to have a sse41 implementation - > since all sse2 capable cpus can natively do rintf. Although maybe it > should be pointed out that the sse41 implementation will use a defined > rounding mode, whereas rintf will use current rounding mode. But I don't > think anyone ever cares for the results if a different rounding mode > would be set. Although of course rint and its variant do not actually > guarantee the even part of it (but well if it's a sse41 capable box we > pretty much know it would do just that anyway)... 
> (And technically nearbyintf would probably be an even better solution,
> since we never want to get involved with the clunky exceptions,
> otherwise it's identical. But there might be reasons why it isn't used.)
>
> Roland

I'm not convinced we want it either, since it seems to be slower than
glibc's rintf. I guess it probably does make sense to use the nearbyintf
instead.

As an aside (since I know 0 about assembly), does _MM_FROUND_CUR_DIRECTION
not check the rounding mode?

Dylan
Re: [Mesa-dev] [PATCH] mesa/st: swap order of clear() and clear_with_quad()
On Tue, Nov 13, 2018 at 5:25 PM Eric Anholt wrote:
>
> Rob Clark writes:
>
> > If we can't clear all the buffers with pctx->clear() (say, for example,
> > because of ColorMask), push the buffers we *can* clear with pctx->clear()
> > first. Tilers want to see clears coming before draws to enable
> > fast-paths, and clearing one of the attachments with a quad-draw first
> > confuses that logic.
>
> Oh, nice!
>
> Reviewed-by: Eric Anholt
>
> Though it feels pretty silly that the ->clear() caller needs a
> clear_with_quad implementation when the ->clear() implementation in the
> driver also needs a clear_with_quad implementation for non-fast-cleared
> buffers. :/

hmm, so perhaps one easy option is to change pctx->clear() to return a
boolean, so driver can return false to ask the state tracker to do a
clear_with_quad().. maybe that would be a first step towards allowing
driver to handle clears w/ colormask and possibly scissor (although
for the latter, plus
glInvalidateFramebuffer()/glInvalidateSubFramebuffer(), I was thinking
of pctx->invalidate_surface()/pctx->invalidate_sub_surface()).

But either way, I guess this patch is a simple stop-gap solution.

BR,
-R
Re: [Mesa-dev] [PATCH] mesa/st: swap order of clear() and clear_with_quad()
On Tue, Nov 13, 2018 at 6:50 PM Rob Clark wrote:
>
> On Tue, Nov 13, 2018 at 6:19 PM Eric Anholt wrote:
> >
> > Rob Clark writes:
> >
> > > On Tue, Nov 13, 2018 at 5:25 PM Eric Anholt wrote:
> > >>
> > >> Rob Clark writes:
> > >>
> > >> > If we can't clear all the buffers with pctx->clear() (say, for example,
> > >> > because of ColorMask), push the buffers we *can* clear with
> > >> > pctx->clear()
> > >> > first. Tilers want to see clears coming before draws to enable
> > >> > fast-paths, and clearing one of the attachments with a quad-draw first
> > >> > confuses that logic.
> > >>
> > >> Oh, nice!
> > >>
> > >> Reviewed-by: Eric Anholt
> > >>
> > >> Though it feels pretty silly that the ->clear() caller needs a
> > >> clear_with_quad implementation when the ->clear() implementation in the
> > >> driver also needs a clear_with_quad implementation for non-fast-cleared
> > >> buffers. :/
> > >
> > > hmm, so perhaps one easy option is to change pctx->clear() to return a
> > > boolean, so driver can return false to ask the state tracker to do a
> > > clear_with_quad().. maybe that would be a first step towards allowing
> > > driver to handle clears w/ colormask and possibly scissor (although
> > > for the later, plus
> > > glInvalidateFramebuffer()/glInvalidateSubFramebuffer(), I was thinking
> > > of pctx->invalidate_surface()/pctx->invalidate_sub_surface()).
> >
> > I was thinking you'd return the mask of what buffers you couldn't (fast)
> > clear.
>
> yeah, makes sense.. I kinda came to same conclusion when I started
> thinking some drivers might not want us to split up the clear per
> attachment.. still not quite sure about adding scissor/colormask,
> might end up needing a pipe cap so st_Clear() would know to flush the
> corresponding state down to driver. I guess low hanging fruit is to
> not change the definition of pctx->clear() but just let driver ask for
> fallback path for some/all attachments.

You could also create a pipe_clear_info which would take that data
directly and let the driver worry about it.

FWIW, nvidia command stream clears can take into account stencil,
scissors, window rectangles, color masks - maybe everything that
st_Clear needs to worry about. It never seemed important enough to
address myself, but I'll happily go along for the ride.

  -ilia
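The idea floated in this thread, a clear hook that returns the mask of buffers it could not fast-clear so that the caller falls back to a quad for just those, can be sketched in isolation. Everything below is hypothetical: the buffer bits are not Gallium's PIPE_CLEAR_* values, `driver_clear_fn` is not the real pctx->clear() signature, and no such interface exists in the tree being discussed.

```c
#include <assert.h>

/* Hypothetical buffer bits for the sketch (not PIPE_CLEAR_*). */
enum {
   BUF_COLOR0  = 1u << 0,
   BUF_DEPTH   = 1u << 1,
   BUF_STENCIL = 1u << 2,
};

/* Result of a clear request, split by the path each buffer took. */
struct clear_split {
   unsigned fast_cleared;  /* handled by the driver's clear hook */
   unsigned quad_cleared;  /* left for a clear_with_quad() style fallback */
};

/* Hypothetical driver hook: attempts to clear `buffers` and returns the
 * subset it could NOT (fast) clear. */
typedef unsigned (*driver_clear_fn)(unsigned buffers);

static struct clear_split
clear_with_fallback(driver_clear_fn driver_clear, unsigned buffers)
{
   struct clear_split split = { 0, 0 };

   /* The driver goes first, so its clears are seen before any draw. */
   const unsigned remaining = driver_clear(buffers);
   assert((remaining & ~buffers) == 0);

   split.fast_cleared = buffers & ~remaining;
   split.quad_cleared = remaining;  /* would be drawn as a quad afterwards */
   return split;
}

/* Example driver for the sketch: it cannot fast-clear stencil. */
static unsigned
example_driver_clear(unsigned buffers)
{
   return buffers & BUF_STENCIL;
}
```

Returning a mask rather than a boolean keeps the ordering property the patch is after: whatever the driver can clear is issued first, and only the leftovers become a draw.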
[Mesa-dev] [Bug 102597] [Regression] mpv, high rendering times (two to three times higher)
https://bugs.freedesktop.org/show_bug.cgi?id=102597

--- Comment #10 from Dieter Nützel ---
Code fix under way:
https://lists.freedesktop.org/archives/mesa-dev/2018-November/209473.html

With this patch, mpv's rendering times drop notably. Note that
'--vo=opengl-hq' is no longer available; it was replaced by '--vo=gpu'.
Re: [Mesa-dev] [ANNOUNCE] Mesa 18.2.5 release candidate
On Mon, Nov 12, 2018 at 8:35 AM Juan A. Suarez Romero wrote:
>
> Hello list,
>
> The candidate for the Mesa 18.2.5 is now available. Currently we have:
>  - 25 queued
>  - 0 nominated (outstanding)
>  - and 2 rejected patch

If it's not a big deal, it would be convenient for me (for Gentoo) to
have the following patches included in 18.2.5:

efb1ccadca89 ("util/ralloc: Make sizeof(linear_header) a multiple of 8")
 - Maybe needs 7e3748c268cd ("util/ralloc: Switch from DEBUG to NDEBUG")

4eab98b66e7d ("meson: fix libatomic tests")

and the patches to fix
https://bugs.freedesktop.org/show_bug.cgi?id=105328#c8

Emil says that the needed commits are

87c156183cd6 ("configure: install KHR/khrplatform.h when needed")
e02f061b690d ("meson: install KHR/khrplatform.h when needed")
f7d42ee7d319 ("include: update GL & GLES headers (v2)")

If they slip to 18.2.6 it's okay.

Thanks!
Matt
Re: [Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf
Am 14.11.18 um 03:21 schrieb Matt Turner: > On Tue, Nov 13, 2018 at 6:03 PM Roland Scheidegger wrote: >> >> Am 13.11.18 um 23:49 schrieb Dylan Baker: >>> Quoting Roland Scheidegger (2018-11-13 14:13:00) Am 13.11.18 um 18:00 schrieb Dylan Baker: > Quoting Erik Faye-Lund (2018-11-13 01:34:53) >> On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote: >>> Quoting Erik Faye-Lund (2018-11-12 04:51:47) On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote: > Which has the same behavior. Does it? I'm not so sure... IROUND_POS seems to round to nearest integer depending on the FPU rounding mode, _mesa_roundevenf rounds to the nearest *even* value regardless of the FPU rounding mode, no? I'm not sure if it matters or not, but *at least* point that out in the commit message. Unless I'm missing something, of course... >>> >>> I should put it in the commit message, but there is a comment in >>> rounding.h that >>> if you change the rounding mode you get to keep the pieces. >> >> Well, this might regress performance pretty badly. Especially in the >> swrast code, this could be bad... >> > > Why? we have the assumption that you don't change the rounding mode > already in > core mesa and many of the drivers. > > For performance, I measured a simple 1000 loops of rounding, and found > that the > only way the rounding.h function was slower is if you used the __SSE4_1__ > path... (It was the same performance as the int cast +0.5 implementation) FWIW I'm not entirely sure it's useful to have a sse41 implementation - since all sse2 capable cpus can natively do rintf. Although maybe it should be pointed out that the sse41 implementation will use a defined rounding mode, whereas rintf will use current rounding mode. But I don't think anyone ever cares for the results if a different rounding mode would be set. Although of course rint and its variant do not actually guarantee the even part of it (but well if it's a sse41 capable box we pretty much know it would do just that anyway)... 
(And technically nearbyintf would probably be an even better solution, since we never want to get involved with the clunky exceptions, otherwise it's identical. But there might be reasons why it isn't used.) Roland >>> >>> I'm not convinced we want it either, since it seems to be slower than >>> glibc's >>> rintf. I guess it probably does make sense to use the nearbyintf instead. >>> >>> As an aside (since I know 0 about assembly), does _MM_FROUND_CUR_DIRECTION >>> not >>> check the rounding mode? >> Oh indeed, I didn't check the code too closely (I was just assuming >> _mm_round_ss() was used because it is possible to use round-to-nearest >> regardless the actual rounding mode, but that's not the case). >> >> But actually I misread this code: the point of mesa_roundevenf is to >> round to float WITHOUT conversion to int. In which case it makes more >> sense at least at first look... >> >> But if you want to round to nearest integer WITH conversion to int, you >> probably really want to use something else. nearbyint family doesn't >> have variants which give you ints. There's rint functions which give you >> ints directly, but they are likely a very bad idea (aside from exception > > Why? Not sure what the why refers to here? > >> handling, not quite sure if this really causes the compiler to do >> something different) because of giving you long (or long long) results - >> meaning that you can't use the simple cpu instructions giving you 32bit >> results (because conversion to 64bit long + trunc to 32bit will give you >> defined (although meaningless) results in some cases where direct >> conversion to 32bit int wouldn't). >> So ideally you'd pick a variant where the compiler is smart enough to >> recognize it can be done with a single instruction. I would guess >> nearbyintf + int cast should do just about everywhere, at least as long >> as x64 or x86 + sse2 is used, my suspicion is the old IROUND function >> was done in a time where x87 was still relevant. 
Or maybe rintf + int >> cast, no idea how the compiler really handles them differently (I tried >> to quickly look at it in gcc source, but no idea where those are >> buried). As a side note, I hate it when the assembly solution is obvious >> and you can't really figure out how the hell you should coax the >> compiler in giving you the right answer (I mean, high level languages >> are there to help, not get in your way...). > > Please read the commit message of > > commit dd0d3a2c0fb388745519c8a3be800720541eccfe > Author: Matt Turner > Date: Tue Mar 10 17:55:21 2015 -0700 > > mesa: Replace _mesa_round_to_even() with _mesa_roundeven(). > > for a lot of the background. > > I expect IROUND_POS can be replaced with the _mesa_lroundevenf
[Mesa-dev] [PATCH 3/5] intel/icl: Set way_size_per_bank to 4
Signed-off-by: Anuj Phogat
Cc: Kenneth Graunke
Cc: Francisco Jerez
Cc: Lionel Landwerlin
---
 src/intel/common/gen_l3_config.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/intel/common/gen_l3_config.c b/src/intel/common/gen_l3_config.c
index 079608198bc..de16ad23017 100644
--- a/src/intel/common/gen_l3_config.c
+++ b/src/intel/common/gen_l3_config.c
@@ -313,7 +313,8 @@ static unsigned
 get_l3_way_size(const struct gen_device_info *devinfo)
 {
    const unsigned way_size_per_bank =
-      devinfo->gen >= 9 && devinfo->l3_banks == 1 ? 4 : 2;
+      (devinfo->gen >= 9 && devinfo->l3_banks == 1) || devinfo->gen == 11 ?
+      4 : 2;

    assert(devinfo->l3_banks);
    return way_size_per_bank * devinfo->l3_banks;
-- 
2.17.1
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
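A minimal sketch of the selection logic in the patch above, assuming a cut-down stand-in for gen_device_info. The struct and function names here are hypothetical; only the conditional expression is taken from the diff.

```c
#include <assert.h>

/* Hypothetical stand-in for the two gen_device_info fields the patch reads. */
struct device_info_sketch {
   int gen;            /* hardware generation, e.g. 9 or 11 */
   unsigned l3_banks;  /* number of L3 banks, must be non-zero */
};

/* Same expression as the patched get_l3_way_size(): 4 per bank on
 * single-bank gen9+ parts and on every gen11 part, 2 per bank otherwise. */
static unsigned
l3_way_size_sketch(const struct device_info_sketch *devinfo)
{
   const unsigned way_size_per_bank =
      (devinfo->gen >= 9 && devinfo->l3_banks == 1) || devinfo->gen == 11 ?
      4 : 2;

   assert(devinfo->l3_banks);
   return way_size_per_bank * devinfo->l3_banks;
}
```

The parentheses matter for readability more than for precedence: && already binds tighter than ||, and both bind tighter than ?:, so the gen == 11 case is picked up regardless of the bank count.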
[Mesa-dev] [PATCH 5/5] anv/icl: Set use full ways in L3CNTLREG
L3 allocation table in h/w specification recommends using 4 KB
granularity for programming allocation fields in L3CNTLREG.

Signed-off-by: Anuj Phogat
Cc: Kenneth Graunke
Cc: Francisco Jerez
Cc: Lionel Landwerlin
---
 src/intel/genxml/gen11.xml         | 1 +
 src/intel/vulkan/genX_cmd_buffer.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/intel/genxml/gen11.xml b/src/intel/genxml/gen11.xml
index b975fe94776..1239ed011ed 100644
--- a/src/intel/genxml/gen11.xml
+++ b/src/intel/genxml/gen11.xml
@@ -3547,6 +3547,7 @@
+
diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
index ed88157170d..c7e5ef9596e 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -1623,6 +1623,7 @@ genX(cmd_buffer_config_l3)(struct anv_cmd_buffer *cmd_buffer,
        * desirable behavior.
        */
       .ErrorDetectionBehaviorControl = true,
+      .UseFullWays = true,
 #endif
       .URBAllocation = cfg->n[GEN_L3P_URB],
       .ROAllocation = cfg->n[GEN_L3P_RO],
-- 
2.17.1
[Mesa-dev] [PATCH 1/5] i965/icl: Fix L3 configurations
Use L3 configuration table specified in h/w specification.

Signed-off-by: Anuj Phogat
Cc: Kenneth Graunke
Cc: Francisco Jerez
Cc: Lionel Landwerlin
---
 src/intel/common/gen_l3_config.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/src/intel/common/gen_l3_config.c b/src/intel/common/gen_l3_config.c
index b977c6ab136..079608198bc 100644
--- a/src/intel/common/gen_l3_config.c
+++ b/src/intel/common/gen_l3_config.c
@@ -137,12 +137,16 @@ static const struct gen_l3_config cnl_l3_configs[] = {
  */
 static const struct gen_l3_config icl_l3_configs[] = {
    /* SLM URB ALL DC RO IS C T */
-   {{ 0, 64, 64, 0, 0, 0, 0, 0 }},
-   {{ 0, 64, 0, 16, 48, 0, 0, 0 }},
-   {{ 0, 48, 0, 16, 64, 0, 0, 0 }},
-   {{ 0, 32, 0, 0, 96, 0, 0, 0 }},
-   {{ 0, 32, 96, 0, 0, 0, 0, 0 }},
-   {{ 0, 32, 0, 16, 80, 0, 0, 0 }},
+   {{ 0, 32, 32, 0, 0, 0, 0, 0 }},
+   {{ 0, 32, 28, 0, 0, 0, 0, 0 }},
+   {{ 0, 24, 0, 8, 28, 0, 0, 0 }},
+   {{ 0, 16, 0, 0, 44, 0, 0, 0 }},
+   {{ 0, 16, 12, 0, 0, 0, 0, 0 }},
+   {{ 0, 16, 0, 0, 12, 0, 0, 0 }},
+   {{ 0, 16, 80, 0, 0, 0, 0, 0 }},
+   {{ 0, 16, 48, 0, 0, 0, 0, 0 }},
+   {{ 0, 16, 44, 0, 0, 0, 0, 0 }},
+   {{ 0, 32, 64, 0, 0, 0, 0, 0 }},
    {{ 0 }}
 };
-- 
2.17.1
[Mesa-dev] [PATCH 2/5] i965/icl: Set use full ways in L3CNTLREG
L3 allocation table in h/w specification recommends using 4 KB
granularity for programming allocation fields in L3CNTLREG.

Signed-off-by: Anuj Phogat
Cc: Kenneth Graunke
Cc: Francisco Jerez
Cc: Lionel Landwerlin
---
 src/mesa/drivers/dri/i965/brw_defines.h   | 1 +
 src/mesa/drivers/dri/i965/gen7_l3_state.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 897c91aa31e..b8ada02d6eb 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1647,6 +1647,7 @@ enum brw_pixel_shader_coverage_mask_mode {
 # define GEN8_L3CNTLREG_ALL_ALLOC_SHIFT 25
 # define GEN8_L3CNTLREG_ALL_ALLOC_MASK INTEL_MASK(31, 25)
 # define GEN8_L3CNTLREG_EDBC_NO_HANG (1 << 9)
+# define GEN8_L3CNTLREG_USE_FULL_WAYS (1 << 10)

 #define GEN10_CACHE_MODE_SS 0x0e420
 #define GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE (1 << 4)
diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c
index 8c6c4c47481..fb9b2703a50 100644
--- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
@@ -119,6 +119,7 @@ setup_l3_config(struct brw_context *brw, const struct gen_l3_config *cfg)
    assert(!cfg->n[GEN_L3P_IS] && !cfg->n[GEN_L3P_C] && !cfg->n[GEN_L3P_T]);

    const unsigned imm_data = ((has_slm ? GEN8_L3CNTLREG_SLM_ENABLE : 0) |
+                              (devinfo->gen == 11 ? GEN8_L3CNTLREG_USE_FULL_WAYS : 0) |
                               SET_FIELD(cfg->n[GEN_L3P_URB], GEN8_L3CNTLREG_URB_ALLOC) |
                               SET_FIELD(cfg->n[GEN_L3P_RO], GEN8_L3CNTLREG_RO_ALLOC) |
                               SET_FIELD(cfg->n[GEN_L3P_DC], GEN8_L3CNTLREG_DC_ALLOC) |
-- 
2.17.1
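A rough sketch of how the new bit combines with an allocation field when the register value is built. It uses only the bit positions visible in the diff (bit 10 for "use full ways", bits 31:25 for the ALL allocation). The SKETCH_* names and the helper function are hypothetical and stand in for the real GEN8_L3CNTLREG_* macros and SET_FIELD().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_USE_FULL_WAYS   (UINT32_C(1) << 10)     /* bit 10, as in the patch */
#define SKETCH_ALL_ALLOC_SHIFT 25                      /* field occupies bits 31:25 */
#define SKETCH_ALL_ALLOC_MASK  (UINT32_C(0x7f) << 25)

/* Build a partial register value: the gen11-only flag plus one 7-bit field. */
static uint32_t
l3cntlreg_sketch(int gen, uint32_t all_alloc)
{
   /* The value has to fit in the 7-bit field before it is shifted. */
   assert(all_alloc <= 0x7f);

   return (gen == 11 ? SKETCH_USE_FULL_WAYS : 0) |
          ((all_alloc << SKETCH_ALL_ALLOC_SHIFT) & SKETCH_ALL_ALLOC_MASK);
}
```

On anything other than gen11 the flag bit stays clear, so the value written on older hardware is unchanged by the patch.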
Re: [Mesa-dev] [PATCH v3] i965: Fix calculation of layers array length for isl_view
On Tue, Nov 13, 2018 at 4:53 PM Jason Ekstrand wrote:
>
> On Mon, Sep 10, 2018 at 10:21 AM Danylo Piliaiev wrote:
>>
>> Handle all cases in calculation of layers count for isl_view
>> taking into account texture view and image unit.
>> st_convert_image was taken as a reference.
>>
>> When u->Layered is true the whole level is taken with respect to
>> image view. In other case only one layer is taken.
>>
>> v3: (Józef Kucia and Ilia Mirkin)
>> - Rewrote patch by taking st_convert_image as a reference
>> - Removed now unused get_image_num_layers function
>> - Changed commit message
>>
>> Fixes: 5a8c8903
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107856
>>
>> Signed-off-by: Danylo Piliaiev
>> ---
>>  .../drivers/dri/i965/brw_wm_surface_state.c | 32 ++-
>>  1 file changed, 17 insertions(+), 15 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
>> index 944762ec46..9bfe6e2037 100644
>> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
>> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
>> @@ -1499,18 +1499,6 @@ update_buffer_image_param(struct brw_context *brw,
>>     param->stride[0] = _mesa_get_format_bytes(u->_ActualFormat);
>>  }
>>
>> -static unsigned
>> -get_image_num_layers(const struct intel_mipmap_tree *mt, GLenum target,
>> -                     unsigned level)
>> -{
>> -   if (target == GL_TEXTURE_CUBE_MAP)
>> -      return 6;
>> -
>> -   return target == GL_TEXTURE_3D ?
>> -      minify(mt->surf.logical_level0_px.depth, level) :
>> -      mt->surf.logical_level0_px.array_len;
>> -}
>> -
>>  static void
>>  update_image_surface(struct brw_context *brw,
>>                       struct gl_image_unit *u,
>> @@ -1541,14 +1529,28 @@ update_image_surface(struct brw_context *brw,
>>        } else {
>>           struct intel_texture_object *intel_obj = intel_texture_object(obj);
>>           struct intel_mipmap_tree *mt = intel_obj->mt;
>> -         const unsigned num_layers = u->Layered ?
>> -            get_image_num_layers(mt, obj->Target, u->Level) : 1;
>> +
>> +         unsigned base_layer, num_layers;
>> +         if (u->Layered) {
>> +            if (obj->Target == GL_TEXTURE_3D) {
>> +               base_layer = 0;
>> +               num_layers = minify(mt->surf.logical_level0_px.depth, u->Level);
>> +            } else {
>> +               base_layer = obj->MinLayer;
>> +               num_layers = obj->Immutable ?
>> +                  obj->NumLayers :
>> +                  mt->surf.logical_level0_px.array_len;
>
>
> Doesn't this need to be array_len - base_layer? I'm not sure on the others
> without digging.

Probably not intuitively obvious, but MinLayer/NumLayers are only set
for Immutable textures.

-ilia
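A minimal sketch of the layer-count rule quoted above, assuming a simplified stand-in for the texture object and miptree. The struct, its field names and both function names are hypothetical, and minify_sketch() is written on the assumption that minify() shifts the size right by the level and clamps to 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed behaviour of minify(): size of a mip level, never below 1. */
static unsigned
minify_sketch(unsigned n, unsigned levels)
{
   assert(levels < 32);
   unsigned m = n >> levels;
   return m > 0 ? m : 1;
}

/* Hypothetical flattened view of the fields the patch consults. */
struct image_sketch {
   bool layered;         /* u->Layered */
   bool is_3d;           /* obj->Target == GL_TEXTURE_3D */
   bool immutable;       /* obj->Immutable */
   unsigned level;       /* u->Level */
   unsigned depth;       /* level-0 depth of a 3D texture */
   unsigned array_len;   /* level-0 array length */
   unsigned view_layers; /* obj->NumLayers, only meaningful if immutable */
};

static unsigned
num_layers_sketch(const struct image_sketch *img)
{
   if (!img->layered)
      return 1;                                   /* a single layer is bound */
   if (img->is_3d)
      return minify_sketch(img->depth, img->level);
   /* NumLayers is only set for immutable textures, so fall back to the
    * full array length for everything else. */
   return img->immutable ? img->view_layers : img->array_len;
}
```

This also shows why subtracting base_layer from array_len is not needed in the non-immutable case: per the reply above, MinLayer is only set for immutable textures, and those take the NumLayers branch instead.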
Re: [Mesa-dev] [PATCH 1/2] radeonsi: don't send data after write-confirm with BOTTOM_OF_PIPE_TS
For the series Tested-by: Dieter Nützel mpv drops notably, apart that '--vo=opengl-hq' isn't available any longer. Was replaced by '--vo=gpu'. Dieter Am 13.11.2018 22:23, schrieb Marek Olšák: From: Marek Olšák There are no writes. --- src/gallium/drivers/radeonsi/si_fence.c | 3 +-- src/gallium/drivers/radeonsi/si_perfcounter.c | 3 +-- src/gallium/drivers/radeonsi/si_query.c | 8 +++- 3 files changed, 5 insertions(+), 9 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_fence.c b/src/gallium/drivers/radeonsi/si_fence.c index 3f22ee31ae8..d385f445774 100644 --- a/src/gallium/drivers/radeonsi/si_fence.c +++ b/src/gallium/drivers/radeonsi/si_fence.c @@ -270,22 +270,21 @@ static void si_fine_fence_set(struct si_context *ctx, radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0)); radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) | S_370_WR_CONFIRM(1) | S_370_ENGINE_SEL(V_370_PFP)); radeon_emit(cs, fence_va); radeon_emit(cs, fence_va >> 32); radeon_emit(cs, 0x8000); } else if (flags & PIPE_FLUSH_BOTTOM_OF_PIPE) { si_cp_release_mem(ctx, V_028A90_BOTTOM_OF_PIPE_TS, 0, - EOP_DST_SEL_MEM, - EOP_INT_SEL_SEND_DATA_AFTER_WR_CONFIRM, + EOP_DST_SEL_MEM, EOP_INT_SEL_NONE, EOP_DATA_SEL_VALUE_32BIT, NULL, fence_va, 0x8000, PIPE_QUERY_GPU_FINISHED); } else { assert(false); } } static boolean si_fence_finish(struct pipe_screen *screen, struct pipe_context *ctx, diff --git a/src/gallium/drivers/radeonsi/si_perfcounter.c b/src/gallium/drivers/radeonsi/si_perfcounter.c index 2ca6d2d7410..cea7d57e518 100644 --- a/src/gallium/drivers/radeonsi/si_perfcounter.c +++ b/src/gallium/drivers/radeonsi/si_perfcounter.c @@ -574,22 +574,21 @@ static void si_pc_emit_start(struct si_context *sctx, } /* Note: The buffer was already added in si_pc_emit_start, so we don't have to * do it again in here. 
*/ static void si_pc_emit_stop(struct si_context *sctx, struct r600_resource *buffer, uint64_t va) { struct radeon_cmdbuf *cs = sctx->gfx_cs; si_cp_release_mem(sctx, V_028A90_BOTTOM_OF_PIPE_TS, 0, - EOP_DST_SEL_MEM, - EOP_INT_SEL_SEND_DATA_AFTER_WR_CONFIRM, + EOP_DST_SEL_MEM, EOP_INT_SEL_NONE, EOP_DATA_SEL_VALUE_32BIT, buffer, va, 0, SI_NOT_QUERY); si_cp_wait_mem(sctx, va, 0, 0x, 0); radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0)); radeon_emit(cs, EVENT_TYPE(V_028A90_PERFCOUNTER_SAMPLE) | EVENT_INDEX(0)); radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0)); radeon_emit(cs, EVENT_TYPE(V_028A90_PERFCOUNTER_STOP) | EVENT_INDEX(0)); radeon_set_uconfig_reg(cs, R_036020_CP_PERFMON_CNTL, S_036020_PERFMON_STATE(V_036020_STOP_COUNTING) | diff --git a/src/gallium/drivers/radeonsi/si_query.c b/src/gallium/drivers/radeonsi/si_query.c index 9b09c74d48a..21b9aeeac28 100644 --- a/src/gallium/drivers/radeonsi/si_query.c +++ b/src/gallium/drivers/radeonsi/si_query.c @@ -883,23 +883,22 @@ static void si_query_hw_do_emit_stop(struct si_context *sctx, break; case PIPE_QUERY_SO_OVERFLOW_ANY_PREDICATE: va += 16; for (unsigned stream = 0; stream < SI_MAX_STREAMS; ++stream) emit_sample_streamout(cs, va + 32 * stream, stream); break; case PIPE_QUERY_TIME_ELAPSED: va += 8; /* fall through */ case PIPE_QUERY_TIMESTAMP: - si_cp_release_mem(sctx, V_028A90_BOTTOM_OF_PIPE_TS, - 0, EOP_DST_SEL_MEM, - EOP_INT_SEL_SEND_DATA_AFTER_WR_CONFIRM, + si_cp_release_mem(sctx, V_028A90_BOTTOM_OF_PIPE_TS, 0, + EOP_DST_SEL_MEM, EOP_INT_SEL_NONE, EOP_DATA_SEL_TIMESTAMP, NULL, va, 0, query->b.type); fence_va = va + 8; break; case PIPE_QUERY_PIPELINE_STATISTICS: { unsigned sample_size = (query->result_size - 8) / 2; va += sample_size; radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 2, 0)); radeon_emit(cs, EVENT_TYPE(V_028A90_SAMPLE_PIPELINESTAT) | EVENT_INDEX(2)); @@ -910,22 +909,21 @@ static void si_query_hw_do_emit_stop(struct si_context *sctx, break; } default: assert(0); }
Re: [Mesa-dev] [PATCH] mesa/st: swap order of clear() and clear_with_quad()
Rob Clark writes:

> On Tue, Nov 13, 2018 at 5:25 PM Eric Anholt wrote:
>>
>> Rob Clark writes:
>>
>> > If we can't clear all the buffers with pctx->clear() (say, for example,
>> > because of ColorMask), push the buffers we *can* clear with pctx->clear()
>> > first. Tilers want to see clears coming before draws to enable
>> > fast-paths, and clearing one of the attachments with a quad-draw first
>> > confuses that logic.
>>
>> Oh, nice!
>>
>> Reviewed-by: Eric Anholt
>>
>> Though it feels pretty silly that the ->clear() caller needs a
>> clear_with_quad implementation when the ->clear() implementation in the
>> driver also needs a clear_with_quad implementation for non-fast-cleared
>> buffers. :/
>
> hmm, so perhaps one easy option is to change pctx->clear() to return a
> boolean, so driver can return false to ask the state tracker to do a
> clear_with_quad().. maybe that would be a first step towards allowing
> driver to handle clears w/ colormask and possibly scissor (although
> for the latter, plus
> glInvalidateFramebuffer()/glInvalidateSubFramebuffer(), I was thinking
> of pctx->invalidate_surface()/pctx->invalidate_sub_surface()).

I was thinking you'd return the mask of what buffers you couldn't
(fast) clear.
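A small sketch of the "return the mask of buffers you couldn't clear" idea, purely as an illustration. Nothing here is an existing gallium interface: the function names, the struct and the fast_capable parameter are all hypothetical, and the driver side is faked by a simple bitmask.

```c
#include <assert.h>

/* Which buffers ended up on which path, as bitmasks. */
struct clear_split_sketch {
   unsigned fast;  /* cleared by the driver's clear hook */
   unsigned quad;  /* left over for the quad-draw fallback */
};

/* Hypothetical driver hook: clears what it can and returns the bits of
 * the buffers it could not clear. */
static unsigned
driver_clear_sketch(unsigned buffers, unsigned fast_capable)
{
   return buffers & ~fast_capable;
}

static struct clear_split_sketch
st_clear_sketch(unsigned buffers, unsigned fast_capable)
{
   struct clear_split_sketch r;

   /* Driver clear goes first, so a tiler sees it before any draw. */
   unsigned leftover = driver_clear_sketch(buffers, fast_capable);
   assert((leftover & ~buffers) == 0);

   r.fast = buffers & ~leftover;
   /* Only the buffers the driver handed back fall through to the quad. */
   r.quad = leftover;
   return r;
}
```

Compared with a boolean return, a mask lets the caller fall back for just the buffers the driver rejected instead of redoing the whole clear with a quad.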
Re: [Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf
On 13.11.18 at 23:49, Dylan Baker wrote:
> Quoting Roland Scheidegger (2018-11-13 14:13:00)
>> On 13.11.18 at 18:00, Dylan Baker wrote:
>>> Quoting Erik Faye-Lund (2018-11-13 01:34:53)
>>>> On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote:
>>>>> Quoting Erik Faye-Lund (2018-11-12 04:51:47)
>>>>>> On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote:
>>>>>>> Which has the same behavior.
>>>>>>
>>>>>> Does it? I'm not so sure... IROUND_POS seems to round to nearest
>>>>>> integer depending on the FPU rounding mode, _mesa_roundevenf rounds to
>>>>>> the nearest *even* value regardless of the FPU rounding mode, no?
>>>>>>
>>>>>> I'm not sure if it matters or not, but *at least* point that out in the
>>>>>> commit message. Unless I'm missing something, of course...
>>>>>
>>>>> I should put it in the commit message, but there is a comment in
>>>>> rounding.h that if you change the rounding mode you get to keep the
>>>>> pieces.
>>>>
>>>> Well, this might regress performance pretty badly. Especially in the
>>>> swrast code, this could be bad...
>>>
>>> Why? we have the assumption that you don't change the rounding mode
>>> already in core mesa and many of the drivers.
>>>
>>> For performance, I measured a simple 1000 loops of rounding, and found
>>> that the only way the rounding.h function was slower is if you used the
>>> __SSE4_1__ path... (It was the same performance as the int cast +0.5
>>> implementation)
>> FWIW I'm not entirely sure it's useful to have a sse41 implementation -
>> since all sse2 capable cpus can natively do rintf. Although maybe it
>> should be pointed out that the sse41 implementation will use a defined
>> rounding mode, whereas rintf will use current rounding mode. But I don't
>> think anyone ever cares for the results if a different rounding mode
>> would be set. Although of course rint and its variant do not actually
>> guarantee the even part of it (but well if it's a sse41 capable box we
>> pretty much know it would do just that anyway)... (And technically
>> nearbyintf would probably be an even better solution, since we never
>> want to get involved with the clunky exceptions, otherwise it's
>> identical. But there might be reasons why it isn't used.)
>>
>> Roland
>
> I'm not convinced we want it either, since it seems to be slower than
> glibc's rintf. I guess it probably does make sense to use the nearbyintf
> instead.
>
> As an aside (since I know 0 about assembly), does _MM_FROUND_CUR_DIRECTION
> not check the rounding mode?

Oh indeed, I didn't check the code too closely (I was just assuming
_mm_round_ss() was used because it is possible to use round-to-nearest
regardless of the actual rounding mode, but that's not the case).

But actually I misread this code: the point of _mesa_roundevenf is to
round to float WITHOUT conversion to int. In which case it makes more
sense, at least at first look...

But if you want to round to nearest integer WITH conversion to int, you
probably really want to use something else. The nearbyint family doesn't
have variants which give you ints. There are rint functions which give you
ints directly, but they are likely a very bad idea (aside from exception
handling, not quite sure if this really causes the compiler to do
something different) because they give you long (or long long) results,
meaning that you can't use the simple cpu instructions giving you 32bit
results (because conversion to 64bit long + trunc to 32bit will give you
defined (although meaningless) results in some cases where direct
conversion to 32bit int wouldn't).
So ideally you'd pick a variant where the compiler is smart enough to
recognize it can be done with a single instruction. I would guess
nearbyintf + int cast should do just about everywhere, at least as long
as x64 or x86 + sse2 is used; my suspicion is the old IROUND function
was done in a time when x87 was still relevant. Or maybe rintf + int
cast, no idea how the compiler really handles them differently (I tried
to quickly look at it in gcc source, but no idea where those are
buried). As a side note, I hate it when the assembly solution is obvious
and you can't really figure out how the hell you should coax the
compiler into giving you the right answer (I mean, high level languages
are there to help, not get in your way...).

All that said, I still don't really see the point of the manual sse41
assembly (even for the case when we don't want to convert to int) -
assuming there is an easy solution to get the compiler to do the right
thing...

Roland

>
> Dylan
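To make the behavioural difference raised at the top of this thread concrete, here is a small reference sketch comparing "add 0.5 and cast" with round-half-to-even for non-negative inputs. Both functions are hypothetical helpers written out by hand; neither is Mesa's IROUND_POS or _mesa_roundevenf, and the second one only models the result that rintf/nearbyintf followed by an int cast would give under the default round-to-nearest mode.

```c
#include <assert.h>

/* "int cast + 0.5" style rounding: halfway cases always go up. */
static int
cast_plus_half_sketch(float f)
{
   assert(f >= 0.0f);
   return (int)(f + 0.5f);
}

/* Round to nearest with ties going to the even neighbour. Spelled out
 * with integer logic so the sketch needs no math library. */
static int
round_even_pos_sketch(float f)
{
   assert(f >= 0.0f);
   int i = (int)f;              /* truncation equals floor for f >= 0 */
   float frac = f - (float)i;   /* fractional part in [0, 1) */

   if (frac > 0.5f || (frac == 0.5f && (i & 1)))
      i++;
   return i;
}
```

The two only disagree on exact halves whose integer part is even (0.5, 2.5, 4.5, ...), which is the case the commit message was asked to mention.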
Re: [Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf
On Tue, Nov 13, 2018 at 6:03 PM Roland Scheidegger wrote: > > Am 13.11.18 um 23:49 schrieb Dylan Baker: > > Quoting Roland Scheidegger (2018-11-13 14:13:00) > >> Am 13.11.18 um 18:00 schrieb Dylan Baker: > >>> Quoting Erik Faye-Lund (2018-11-13 01:34:53) > On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote: > > Quoting Erik Faye-Lund (2018-11-12 04:51:47) > >> On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote: > >>> Which has the same behavior. > >> > >> Does it? I'm not so sure... IROUND_POS seems to round to nearest > >> integer depending on the FPU rounding mode, _mesa_roundevenf rounds > >> to > >> the nearest *even* value regardless of the FPU rounding mode, no? > >> > >> I'm not sure if it matters or not, but *at least* point that out in > >> the > >> commit message. Unless I'm missing something, of course... > > > > I should put it in the commit message, but there is a comment in > > rounding.h that > > if you change the rounding mode you get to keep the pieces. > > Well, this might regress performance pretty badly. Especially in the > swrast code, this could be bad... > > >>> > >>> Why? we have the assumption that you don't change the rounding mode > >>> already in > >>> core mesa and many of the drivers. > >>> > >>> For performance, I measured a simple 1000 loops of rounding, and found > >>> that the > >>> only way the rounding.h function was slower is if you used the __SSE4_1__ > >>> path... (It was the same performance as the int cast +0.5 implementation) > >> FWIW I'm not entirely sure it's useful to have a sse41 implementation - > >> since all sse2 capable cpus can natively do rintf. Although maybe it > >> should be pointed out that the sse41 implementation will use a defined > >> rounding mode, whereas rintf will use current rounding mode. But I don't > >> think anyone ever cares for the results if a different rounding mode > >> would be set. 
Although of course rint and its variant do not actually > >> guarantee the even part of it (but well if it's a sse41 capable box we > >> pretty much know it would do just that anyway)... (And technically > >> nearbyintf would probably be an even better solution, since we never > >> want to get involved with the clunky exceptions, otherwise it's > >> identical. But there might be reasons why it isn't used.) > >> > >> Roland > > > > I'm not convinced we want it either, since it seems to be slower than > > glibc's > > rintf. I guess it probably does make sense to use the nearbyintf instead. > > > > As an aside (since I know 0 about assembly), does _MM_FROUND_CUR_DIRECTION > > not > > check the rounding mode? > Oh indeed, I didn't check the code too closely (I was just assuming > _mm_round_ss() was used because it is possible to use round-to-nearest > regardless the actual rounding mode, but that's not the case). > > But actually I misread this code: the point of mesa_roundevenf is to > round to float WITHOUT conversion to int. In which case it makes more > sense at least at first look... > > But if you want to round to nearest integer WITH conversion to int, you > probably really want to use something else. nearbyint family doesn't > have variants which give you ints. There's rint functions which give you > ints directly, but they are likely a very bad idea (aside from exception Why? > handling, not quite sure if this really causes the compiler to do > something different) because of giving you long (or long long) results - > meaning that you can't use the simple cpu instructions giving you 32bit > results (because conversion to 64bit long + trunc to 32bit will give you > defined (although meaningless) results in some cases where direct > conversion to 32bit int wouldn't). > So ideally you'd pick a variant where the compiler is smart enough to > recognize it can be done with a single instruction. 
I would guess > nearbyintf + int cast should do just about everywhere, at least as long > as x64 or x86 + sse2 is used, my suspicion is the old IROUND function > was done in a time where x87 was still relevant. Or maybe rintf + int > cast, no idea how the compiler really handles them differently (I tried > to quickly look at it in gcc source, but no idea where those are > buried). As a side note, I hate it when the assembly solution is obvious > and you can't really figure out how the hell you should coax the > compiler in giving you the right answer (I mean, high level languages > are there to help, not get in your way...). Please read the commit message of commit dd0d3a2c0fb388745519c8a3be800720541eccfe Author: Matt Turner Date: Tue Mar 10 17:55:21 2015 -0700 mesa: Replace _mesa_round_to_even() with _mesa_roundeven(). for a lot of the background. I expect IROUND_POS can be replaced with the _mesa_lroundevenf function. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org
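On the "Why?" above: a sketch of the width problem Roland describes with the long-returning rint variants. A float that is out of range for a 32-bit int can still be in range for a 64-bit integer, so converting through 64 bits and then keeping the low 32 bits produces a value instead of undefined behaviour, just not a useful one. The function name is hypothetical, and a plain truncating cast stands in for lrintf/llrintf so that the sketch does not depend on the math library; only the width handling is the point.

```c
#include <assert.h>
#include <stdint.h>

/* Convert through a 64-bit integer, then keep only the low 32 bits. */
static int32_t
narrow_via_64_sketch(float f)
{
   /* In range for any |f| below 2^63, so this conversion is defined. */
   long long wide = (long long)f;

   /* Dropping the high bits is defined for the unsigned type. The final
    * step back to int32_t is exact whenever the low 32 bits fit. */
   return (int32_t)(uint32_t)wide;
}
```

For example 2^32 + 512 is exactly representable as a float but far outside the int range: a direct (int) cast of it is undefined in C, while the two-step path quietly returns 512.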
Re: [Mesa-dev] [PATCH] st/mesa: don't do L3 thread pinning for Blender
Hi Marek

Sure. Thanks for writing these patches. They look good.

I've done some small testing:

drawoverhead numbers look great in my eyes:
29: DrawElements ( 1 VBO, 8 UBO, 8 Tex) w/ sample mask enable change: 6.63 million (94.7%)

Hitman benchmark runs nicely, even slightly faster than before, and
uses all the cores: 52.61 FPS average

Tomb Raider benchmark is fine: 105 FPS

Thanks
edmondo

On Tue, Nov 13, 2018 at 1:21 AM Marek Olšák wrote:

> Hi Edmondo,
>
> can you test the two attached patches? They re-enable and rework the
> thread pinning.
>
> Thanks,
> Marek
>
> On Mon, Nov 12, 2018 at 4:31 PM Edmondo Tommasina <
> edmondo.tommas...@gmail.com> wrote:
>
>> On Mon, Nov 12, 2018 at 6:43 PM Michel Dänzer wrote:
>>
>>> On 2018-11-08 6:23 a.m., Marek Olšák wrote:
>>> > Thanks a lot man. I'll reconsider this depending on the results I
>>> > receive.
>>> >
>>> > I may also just pin the Mesa threads and keep the app thread intact. It
>>> > should perform OK with glthread, but not without glthread.
>>> >
>>> > Another option is to have the gallium and winsys threads "chase" the
>>> > main thread within the CPU by changing the thread affinity based on
>>> > getcpu().
>>>
>>> While those are interesting ideas for the future, I'm afraid it's too
>>> late for them for the 18.3.0 release (scheduled for November 21st IIRC).
>>>
>>> Please make sure the thread pinning code is disabled for the release, at
>>> least by default.
>>>
>>
>> I'm not sure what the best solution is, but pinning the threads to
>> the L3 CCX has shown great potential on my Ryzen 5 2600 and it would
>> be nice to explore the ideas presented by Marek or maybe understand
>> why the kernel scheduler prefers to put the threads on cores on
>> different CCX.
>>
>> For example The Witcher 2 goes from 60 FPS to 70 FPS average and this
>> is impressive. Tomb Raider just increases about 1 FPS (average 104 FPS)
>> but this can be just noise and for sure not noticeable.
>>
>> Regards
>> edmondo
>>
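For readers unfamiliar with what "pinning to the L3 CCX" amounts to, a minimal sketch of computing the CPU mask for one L3 group. It assumes logical CPUs that share an L3 are numbered contiguously, which is not guaranteed on every system, and the function name and parameters are hypothetical. The resulting mask would then be handed to whatever affinity call the platform provides.

```c
#include <assert.h>
#include <stdint.h>

/* Bitmask of the logical CPUs that belong to L3 group l3_index, under
 * the assumption that each group owns cpus_per_l3 consecutive CPU ids. */
static uint64_t
l3_cpu_mask_sketch(unsigned l3_index, unsigned cpus_per_l3)
{
   /* Keep every bit position inside the 64-bit mask. */
   assert(cpus_per_l3 > 0 && (l3_index + 1) * cpus_per_l3 <= 64);

   uint64_t mask = 0;
   for (unsigned i = 0; i < cpus_per_l3; i++)
      mask |= UINT64_C(1) << (l3_index * cpus_per_l3 + i);
   return mask;
}
```

Keeping the application thread and the driver threads inside one such mask is what lets them share an L3, which is the effect the FPS numbers above are attributed to.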
Re: [Mesa-dev] [PATCH v2] nir: Allow to skip integer ops in nir_lower_to_source_mods
Looks correct. Reviewed-by: Jason Ekstrand On Mon, Nov 12, 2018 at 2:17 AM Gert Wollny wrote: > From: Gert Wollny > > Some hardware supports source mods only for float operations. Make it > possible to skip lowering to source mods in these cases. > > v2: use option flags instead of a boolean (Jason Ekstrand) > > Signed-off-by: Gert Wollny > --- > src/compiler/nir/nir.h | 10 ++- > src/compiler/nir/nir_lower_to_source_mods.c | 78 + > src/intel/compiler/brw_nir.c| 2 +- > 3 files changed, 58 insertions(+), 32 deletions(-) > > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h > index dc3c729dee..c4601ed218 100644 > --- a/src/compiler/nir/nir.h > +++ b/src/compiler/nir/nir.h > @@ -3013,7 +3013,15 @@ typedef struct nir_lower_bitmap_options { > void nir_lower_bitmap(nir_shader *shader, const nir_lower_bitmap_options > *options); > > bool nir_lower_atomics_to_ssbo(nir_shader *shader, unsigned ssbo_offset); > -bool nir_lower_to_source_mods(nir_shader *shader); > + > +typedef enum { > + nir_lower_int_source_mods = 1 << 0, > + nir_lower_float_source_mods = 1 << 1, > + nir_lower_all_source_mods = (1 << 2) - 1 > +} nir_lower_to_source_mods_flags; > + > + > +bool nir_lower_to_source_mods(nir_shader *shader, > nir_lower_to_source_mods_flags options); > > bool nir_lower_gs_intrinsics(nir_shader *shader); > > diff --git a/src/compiler/nir/nir_lower_to_source_mods.c > b/src/compiler/nir/nir_lower_to_source_mods.c > index 077ca53704..657bf8a3d7 100644 > --- a/src/compiler/nir/nir_lower_to_source_mods.c > +++ b/src/compiler/nir/nir_lower_to_source_mods.c > @@ -34,7 +34,8 @@ > */ > > static bool > -nir_lower_to_source_mods_block(nir_block *block) > +nir_lower_to_source_mods_block(nir_block *block, > + nir_lower_to_source_mods_flags options) > { > bool progress = false; > > @@ -58,10 +59,14 @@ nir_lower_to_source_mods_block(nir_block *block) > > switch > (nir_alu_type_get_base_type(nir_op_infos[alu->op].input_types[i])) { > case nir_type_float: > +if (!(options & 
nir_lower_float_source_mods)) > + continue; > if (parent->op != nir_op_fmov) > continue; > break; > case nir_type_int: > +if (!(options & nir_lower_int_source_mods)) > + continue; > if (parent->op != nir_op_imov) > continue; > break; > @@ -97,33 +102,41 @@ nir_lower_to_source_mods_block(nir_block *block) > progress = true; >} > > - switch (alu->op) { > - case nir_op_fsat: > - alu->op = nir_op_fmov; > - alu->dest.saturate = true; > - break; > - case nir_op_ineg: > - alu->op = nir_op_imov; > - alu->src[0].negate = !alu->src[0].negate; > - break; > - case nir_op_fneg: > - alu->op = nir_op_fmov; > - alu->src[0].negate = !alu->src[0].negate; > - break; > - case nir_op_iabs: > - alu->op = nir_op_imov; > - alu->src[0].abs = true; > - alu->src[0].negate = false; > - break; > - case nir_op_fabs: > - alu->op = nir_op_fmov; > - alu->src[0].abs = true; > - alu->src[0].negate = false; > - break; > - default: > - break; > + if (options & nir_lower_float_source_mods) { > + switch (alu->op) { > + case nir_op_fsat: > +alu->op = nir_op_fmov; > +alu->dest.saturate = true; > +break; > + case nir_op_fneg: > +alu->op = nir_op_fmov; > +alu->src[0].negate = !alu->src[0].negate; > +break; > + case nir_op_fabs: > +alu->op = nir_op_fmov; > +alu->src[0].abs = true; > +alu->src[0].negate = false; > +break; > + default: > +break; > + } >} > > + if (options & nir_lower_int_source_mods) { > + switch (alu->op) { > + case nir_op_ineg: > +alu->op = nir_op_imov; > +alu->src[0].negate = !alu->src[0].negate; > +break; > + case nir_op_iabs: > +alu->op = nir_op_imov; > +alu->src[0].abs = true; > +alu->src[0].negate = false; > +break; > + default: > +break; > + } > + } >/* We've covered sources. Now we're going to try and saturate the > * destination if we can. 
> */ > @@ -136,6 +149,9 @@ nir_lower_to_source_mods_block(nir_block *block) >nir_type_float) > continue; > > + if (!(options & nir_lower_float_source_mods)) > + continue; > + >if (!list_empty(>dest.dest.ssa.if_uses)) > continue; > > @@ -185,12 +201,13 @@ nir_lower_to_source_mods_block(nir_block *block) > } > > static bool >