Re: [Mesa-dev] [PATCH v2] i965: fix release build unused variable warning
On 12/11/2016 08:57 PM, Grazvydas Ignotas wrote: > Signed-off-by: Grazvydas Ignotas> --- > v2: take Eduardo Lima's suggestion further - remove the variable completely > no commit access > I just pushed the patch. Thanks! Eduardo > src/mesa/drivers/dri/i965/brw_blorp.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c > b/src/mesa/drivers/dri/i965/brw_blorp.c > index 4c1d858..43ac3be 100644 > --- a/src/mesa/drivers/dri/i965/brw_blorp.c > +++ b/src/mesa/drivers/dri/i965/brw_blorp.c > @@ -813,8 +813,6 @@ do_single_blorp_clear(struct brw_context *brw, struct > gl_framebuffer *fb, >can_fast_clear = false; > > const unsigned logical_layer = irb_logical_mt_layer(irb); > - const bool is_lossless_compressed = intel_miptree_is_lossless_compressed( > - brw, irb->mt); > const enum intel_fast_clear_state fast_clear_state = >intel_miptree_get_fast_clear_state(irb->mt, irb->mt_level, > logical_layer); > @@ -850,7 +848,7 @@ do_single_blorp_clear(struct brw_context *brw, struct > gl_framebuffer *fb, > * it now. > */ >if (!irb->mt->mcs_buf) { > - assert(!is_lossless_compressed); > + assert(!intel_miptree_is_lossless_compressed(brw, irb->mt)); > if (!intel_miptree_alloc_non_msrt_mcs(brw, irb->mt, false)) { > /* MCS allocation failed--probably this will only happen in > * out-of-memory conditions. But in any case, try to recover > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/9] i965: Factor out oword block read and write message control calculation.
On Sunday, December 11, 2016 5:44:03 PM PST Francisco Jerez wrote: > Kenneth Graunkewrites: > > > On Friday, December 9, 2016 11:03:27 AM PST Francisco Jerez wrote: > >> We'll need roughly the same logic in other places and it would be > >> annoying to duplicate it. Instead factor it out into a function-like > >> macro that takes the number of dwords per block (which will prove more > >> convenient than taking the same value in owords or some other unit). > >> --- > >> src/mesa/drivers/dri/i965/brw_defines.h | 6 ++ > >> src/mesa/drivers/dri/i965/brw_eu_emit.c | 14 ++ > >> 2 files changed, 8 insertions(+), 12 deletions(-) > >> > >> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > >> b/src/mesa/drivers/dri/i965/brw_defines.h > >> index cae8e9a..1c638a0 100644 > >> --- a/src/mesa/drivers/dri/i965/brw_defines.h > >> +++ b/src/mesa/drivers/dri/i965/brw_defines.h > >> @@ -1669,6 +1669,12 @@ enum brw_message_target { > >> #define BRW_DATAPORT_OWORD_BLOCK_2_OWORDS 2 > >> #define BRW_DATAPORT_OWORD_BLOCK_4_OWORDS 3 > >> #define BRW_DATAPORT_OWORD_BLOCK_8_OWORDS 4 > >> +#define BRW_DATAPORT_OWORD_BLOCK_DWORDS(n) \ > >> + ((n) == 4 ? BRW_DATAPORT_OWORD_BLOCK_1_OWORDLOW :\ > >> +(n) == 8 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS : \ > >> +(n) == 16 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS : \ > >> +(n) == 32 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : \ > >> +(abort(), ~0)) > > > > How about: > > > > #define BRW_DATAPORT_OWORD_BLOCK_DWORDS(n) \ > >(assert(n == 4 || n == 8 || n == 16 || n == 32), ffs(n) - 3) > > > > I don't think that would work for n > 4 due to the rather unfortunate > hardware encoding, e.g. BRW_DATAPORT_OWORD_BLOCK_2_OWORDS is supposed to > be encoded as 2 but your macro would give 1 as result. :| Whoops. I suppose you could do: #define BRW_DATAPORT_OWORD_BLOCK_DWORDS(n) \ ((n) == 4 ? 0 : (assert(n == 8 || n == 16 || n == 32), ffs(n) - 2)) but I'm not sure that's that much better... signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: Make copy propagation not panic when it sees an intrinsic.
On Mon, 2016-12-12 at 11:35 +1100, Timothy Arceri wrote: > On Sun, 2016-12-11 at 00:00 -0800, Kenneth Graunke wrote: > > > > On Saturday, December 10, 2016 12:37:16 PM PST Matt Turner wrote: > > > > > > > > > On Fri, Dec 9, 2016 at 8:28 PM, Kenneth Graunke> > e. > > > org> wrote: > > > > > > > > > > > > A number of games have large arrays of constants, which we > > > > promote to > > > > uniforms. This introduces copies from the uniform array to the > > > > original > > > > temporary array. Normally, copy propagation eliminates those > > > > copies, > > > > making everything refer to the uniform array directly. > > > > > > > > A number of shaders in "Deus Ex: Mankind Divided" recently > > > > exposed a > > > > limitation of copy propagation - if we had any intrinsics (i.e. > > > > image > > > > access in a compute shader), we weren't able to get rid of > > > > these > > > > copies. > > > > > > > > That meant that any variable indexing remained on the temporary > > > > array > > > > rather being moved to the uniform array. i965's scalar backend > > > > currently doesn't support indirect addressing of temporary > > > > arrays, > > > > which meant lowering it to if-ladders. This was horrible. > > > > > > > > On Skylake: > > > > > > > > total instructions in shared programs: 13700090 -> 13654519 (- > > > > 0.33%) > > > > instructions in affected programs: 56438 -> 10867 (-80.75%) > > > > > > Wow! > > > > > > > > > > > > > > > helped: 14 > > > > HURT: 0 > > > > > > > > total cycles in shared programs: 288879704 -> 291270232 (0.83%) > > > > cycles in affected programs: 12758080 -> 15148608 (18.74%) > > > > > > ... that seems nuts? > > > > > > Any idea what's going on with the cycle counts? > > > > Good point...I glossed over the cycle counts when I saw the -80% > > reduction in instructions with 0 shaders hurt. But they do look > > pretty bad, so let's take a closer look... > > > > There are two nearly identical shaders that are the worst > > offenders: > > > > shaders/closed/steam/deus-ex-mankind-divided/256.shader_test CS > > SIMD16: > > > > instructions: 2770 -> 253 (-2,517 instructions or -90.87%) > > spills: 25 -> 0 > > fills: 29 -> 0 > > cycles: 923266 -> 1420534 (+497,268 cycles or +53.86%) > > compile time: 2.73 seconds -> 0.17 seconds > > > > There are three loops in the program, each of which contains two > > indirect reads of the uvec4[98] constant array. > > > > Before this patch, there were: > > - 67 UNIFORM_PULL_CONSTANT_LOADs at the top of the program > > - 1 UNIFORM_PULL_CONSTANT_LOAD in the first (cheap) loop > > - 1 UNIFORM_PULL_CONSTANT_LOAD in the second (expensive) loop > > - 1 UNIFORM_PULL_CONSTANT_LOAD in the third (very expensive) loop > > > > After this patch, there are: > > - 0 loads at the top of the program > > - 1 VARYING_PULL_CONSTANT_LOAD in the first (cheap) loop > > - 2 VARYING_PULL_CONSTANT_LOAD in the second (expensive) loop > > - 2 VARYING_PULL_CONSTANT_LOAD in the third (very expensive) loop > > > > The array indexes in the expensive loop are a[foo] and a[foo + 1]. > > foo is modified in the loop, so they can't be hoisted out. I don't > > think we can determine the number of loop iterations. > > The first loop has a trip count of 49. If I force it to unroll I get: > > CS SIMD16 shader: 682 inst, 2 loops, 1009856 cycles. > > The current unrolling rules are: > > static bool > is_loop_small_enough_to_unroll(nir_shader *shader, nir_loop_info *li) > { > unsigned max_iter = shader->options->max_unroll_iterations; > > if (li->trip_count > max_iter) > return false; > > if (li->force_unroll) // this is for consecutive indirect array > access > return true; > > bool loop_not_too_large = > li->num_instructions * li->trip_count <= max_iter * 25; > > return loop_not_too_large; > } > > > Maybe we should just drop the first trip count check? Checking the > number of intructions * trip count makes a lot more sense. Dropping > the > check allows the first loop to unroll. I tried to get this working for GLSL IR loop unrolling and fixed a couple of the other problems, panicing at intrinsics (partialy fixed -well I just ignore them for now), and nequal condition. But I also ran into futher issues: 1) Not being able to detect an induction variable because we end up with: (assign (x) (var_ref compiler_temp@19) (expression uint bitcast_f2u (swiz y (var_ref r1) )) ) (if (expression bool != (var_ref compiler_temp@19) (constant uint (0)) ) ( break ) 2) The GLSL IR analysis pass bails if the first if in the loop is not a loop terminator. It would have been nice to fix the GLSL IR pass so I could compare the NIR pass to the improved version and try to avoid regressing shaders like this but I'm stating to think it's not worth all the effort. What do you think? [1]
Re: [Mesa-dev] [PATCH v3 2/3] svga: Fix a strict-aliasing violation in shader dumper
Brian/Roland ping? On 12/07/2016 10:30 AM, Edward O'Callaghan wrote: > As per the C spec, it is illegal to alias pointers to different > types. This results in undefined behaviour after optimization > passes, resulting in very subtle bugs that happen only on a > full moon.. > > Use a memcpy() as a well defined coercion between the isomorphic > bit-field interpretations of memory. > > V.2: Use C99 compat STATIC_ASSERT() over C11 static_assert(). > > Signed-off-by: Edward O'Callaghan> --- > src/gallium/drivers/svga/svgadump/svga_shader_dump.c | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/src/gallium/drivers/svga/svgadump/svga_shader_dump.c > b/src/gallium/drivers/svga/svgadump/svga_shader_dump.c > index 4ee1bf2..46126a5 100644 > --- a/src/gallium/drivers/svga/svgadump/svga_shader_dump.c > +++ b/src/gallium/drivers/svga/svgadump/svga_shader_dump.c > @@ -30,6 +30,9 @@ > * @author Michal Krol > */ > > +#include > +#include > + > #include "svga_shader.h" > #include "svga_shader_dump.h" > #include "svga_shader_op.h" > @@ -413,6 +416,11 @@ dump_dstreg(struct sh_dstreg dstreg, > > static void dump_srcreg( struct sh_srcreg srcreg, struct sh_srcreg *indreg, > const struct dump_info *di ) > { > + struct sh_reg srcreg_sh = {0}; > + /* bit-fields carefully aligned, ensure they stay that way. */ > + STATIC_ASSERT(sizeof(struct sh_reg) == sizeof(struct sh_srcreg)); > + memcpy(_sh, , sizeof(srcreg_sh)); > + > switch (srcreg.modifier) { > case SVGA3DSRCMOD_NEG: > case SVGA3DSRCMOD_BIASNEG: > @@ -427,7 +435,7 @@ static void dump_srcreg( struct sh_srcreg srcreg, struct > sh_srcreg *indreg, cons > case SVGA3DSRCMOD_NOT: >_debug_printf( "!" ); > } > - dump_reg( *(struct sh_reg *) , indreg, di ); > + dump_reg(srcreg_sh, indreg, di ); > switch (srcreg.modifier) { > case SVGA3DSRCMOD_NONE: > case SVGA3DSRCMOD_NEG: > signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3] EGL/android: Enhance pbuffer implementation
Hi, On Fri, Dec 9, 2016 at 8:29 PM, Liu Zhiquanwrote: > Some dri drivers will pass multiple bits in buffer_mask parameter > to droid_image_get_buffer(), more than the actual supported buffer > type combination. For such case, will go through all the bits, and > will not return error when unsupported buffer is requested, only > return error when the allocation for supported buffer failed. > > v2: coding style and log changes > v3: coding style changes and update patch format > > Signed-off-by: Liu Zhiquan > Signed-off-by: Long, Zhifang > Reviewed-by: Tomasz Figa > --- > src/egl/drivers/dri2/platform_android.c | 177 > +--- > 1 file changed, 96 insertions(+), 81 deletions(-) > Looks good, thanks for patiently addressing all the nitpicks. Re-affirming: Reviewed-by: Tomasz Figa Best regards, Tomasz ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] vulkan: fix 'statement with no effect' warning
I have a patch in my tree that fixes that warning by making us actually return the error code in that function rather than simply using it as a fancy print function. I think that's the better fix. On Dec 11, 2016 9:06 AM, "Grazvydas Ignotas"wrote: > Emitted on release build in case vk_error() return is not used. > > Signed-off-by: Grazvydas Ignotas > --- > no commit access > > src/vulkan/vk_util.c | 4 > src/vulkan/vk_util.h | 13 - > 2 files changed, 12 insertions(+), 5 deletions(-) > > diff --git a/src/vulkan/vk_util.c b/src/vulkan/vk_util.c > index 5d38117..716b8e5 100644 > --- a/src/vulkan/vk_util.c > +++ b/src/vulkan/vk_util.c > @@ -60,6 +60,8 @@ __vk_finishme(const char *file, int line, const char > *format, ...) > fprintf(stderr, "%s:%d: FINISHME: %s\n", file, line, buffer); > } > > +#ifdef DEBUG > + > VkResult > __vk_errorf(VkResult error, const char *file, int line, const char > *format, ...) > { > @@ -103,3 +105,5 @@ __vk_errorf(VkResult error, const char *file, int > line, const char *format, ...) > > return error; > } > + > +#endif /* DEBUG */ > diff --git a/src/vulkan/vk_util.h b/src/vulkan/vk_util.h > index c384838..5ce206c 100644 > --- a/src/vulkan/vk_util.h > +++ b/src/vulkan/vk_util.h > @@ -72,15 +72,18 @@ align_i32(int32_t v, int32_t a) > * propagating errors. Might be useful to plug in a stack trace here. > */ > > +#ifdef DEBUG > VkResult __vk_errorf(VkResult error, const char *file, int line, const > char *format, ...); > +#else > +static inline VkResult > +__vk_errorf(VkResult error, const char *file, int line, const char > *format, ...) > +{ > + return error; > +} > +#endif > > -#ifdef DEBUG > #define vk_error(error) __vk_errorf(error, __FILE__, __LINE__, NULL); > #define vk_errorf(error, format, ...) __vk_errorf(error, __FILE__, > __LINE__, format, ## __VA_ARGS__); > -#else > -#define vk_error(error) error > -#define vk_errorf(error, format, ...) error > -#endif > > void __vk_finishme(const char *file, int line, const char *format, ...) > vk_printflike(3, 4); > -- > 2.7.4 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/4] clover: Implement 'CL_MEM_OBJECT_IMAGE1D'
Edward O'Callaghanwrites: > Hi Francisco, thanks for the review.. > > On 11/25/2016 03:39 PM, Francisco Jerez wrote: >> Edward O'Callaghan writes: >> >>> Signed-off-by: Edward O'Callaghan >>> --- >>> src/gallium/state_trackers/clover/api/memory.cpp | 9 - >>> src/gallium/state_trackers/clover/core/memory.cpp | 13 + >>> src/gallium/state_trackers/clover/core/memory.hpp | 10 ++ >>> 3 files changed, 31 insertions(+), 1 deletion(-) >>> >>> diff --git a/src/gallium/state_trackers/clover/api/memory.cpp >>> b/src/gallium/state_trackers/clover/api/memory.cpp >>> index 58f56d1..41e344e 100644 >>> --- a/src/gallium/state_trackers/clover/api/memory.cpp >>> +++ b/src/gallium/state_trackers/clover/api/memory.cpp >>> @@ -166,6 +166,14 @@ clCreateImage(cl_context d_ctx, cl_mem_flags d_flags, >>> ret_error(r_errcode, CL_SUCCESS); >>> >>> switch (desc->image_type) { >>> + case CL_MEM_OBJECT_IMAGE1D: >>> + if (!desc->image_width) >>> + throw error(CL_INVALID_IMAGE_SIZE); >>> + >> >> We probably need to check that the specified image width is within the >> device limits -- There's no device::max_image_levels_1d() query but >> max_image_levels_2d() should work as well. >> >>> + return new image1d(ctx, flags, format, >>> + desc->image_width, >>> + desc->image_row_pitch, host_ptr); >>> + >>> case CL_MEM_OBJECT_IMAGE2D: >>>if (!desc->image_width || !desc->image_height) >>> throw error(CL_INVALID_IMAGE_SIZE); >>> @@ -214,7 +222,6 @@ clCreateImage(cl_context d_ctx, cl_mem_flags d_flags, >>> desc->image_depth, desc->image_row_pitch, >>> desc->image_slice_pitch, host_ptr); >>> >>> - case CL_MEM_OBJECT_IMAGE1D: >>> case CL_MEM_OBJECT_IMAGE1D_ARRAY: >>> case CL_MEM_OBJECT_IMAGE1D_BUFFER: >>>// XXX - Not implemented. >>> diff --git a/src/gallium/state_trackers/clover/core/memory.cpp >>> b/src/gallium/state_trackers/clover/core/memory.cpp >>> index de1862b..246bd2d 100644 >>> --- a/src/gallium/state_trackers/clover/core/memory.cpp >>> +++ b/src/gallium/state_trackers/clover/core/memory.cpp >>> @@ -185,6 +185,19 @@ image::slice_pitch() const { >>> return _slice_pitch; >>> } >>> >>> +image1d::image1d(clover::context , cl_mem_flags flags, >>> + const cl_image_format *format, >>> + size_t width, size_t row_pitch, >>> + void *host_ptr) : >>> + image(ctx, flags, format, width, 0, 1, >> >> All surface dimension fields (width, height, depth) of a clover::image >> object are considered valid regardless of the image type, so we should >> set unused dimensions to 1 in order to avoid unexpected results while >> doing arithmetic or various error checking with them. >> >>> + row_pitch, 0, row_pitch, host_ptr) { >> >> I don't think you can rely on the row pitch to be meaningful for 1D >> images, it's probably necessary to pass it as argument to the >> constructor, we're probably better off calculating the size by hand. > Why not and what do you propose here exactly? > Because technically 1D images are a single row, so the row pitch is kind of meaningless... How about 'util_format_get_blocksize(translate_format(format)) * width'? >> >>> +} >>> + >>> +cl_mem_object_type >>> +image1d::type() const { >>> + return CL_MEM_OBJECT_IMAGE1D; >>> +} >>> + >>> image2d::image2d(clover::context , cl_mem_flags flags, >>> const cl_image_format *format, size_t width, >>> size_t height, size_t row_pitch, >>> diff --git a/src/gallium/state_trackers/clover/core/memory.hpp >>> b/src/gallium/state_trackers/clover/core/memory.hpp >>> index 1a3e8c9..ad9052b 100644 >>> --- a/src/gallium/state_trackers/clover/core/memory.hpp >>> +++ b/src/gallium/state_trackers/clover/core/memory.hpp >>> @@ -134,6 +134,16 @@ namespace clover { >>> std::unique_ptr> resources; >>> }; >>> >>> + class image1d : public image { >>> + public: >>> + image1d(clover::context , cl_mem_flags flags, >>> + const cl_image_format *format, >>> + size_t width, size_t row_pitch, >>> + void *host_ptr); >>> + >>> + virtual cl_mem_object_type type() const; >>> + }; >>> + >>> class image2d : public image { >>> public: >>>image2d(clover::context , cl_mem_flags flags, >>> -- >>> 2.7.4 >>> >>> ___ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] clover: Implement 'CL_MEM_OBJECT_IMAGE2D_ARRAY'
Edward O'Callaghanwrites: > On 11/25/2016 03:38 PM, Francisco Jerez wrote: >> Edward O'Callaghan writes: >> >>> Signed-off-by: Edward O'Callaghan >>> --- >>> src/gallium/state_trackers/clover/api/memory.cpp | 17 - >>> src/gallium/state_trackers/clover/core/memory.cpp | 14 ++ >>> src/gallium/state_trackers/clover/core/memory.hpp | 11 +++ >>> 3 files changed, 41 insertions(+), 1 deletion(-) >>> >>> diff --git a/src/gallium/state_trackers/clover/api/memory.cpp >>> b/src/gallium/state_trackers/clover/api/memory.cpp >>> index 9b3cd8b..58f56d1 100644 >>> --- a/src/gallium/state_trackers/clover/api/memory.cpp >>> +++ b/src/gallium/state_trackers/clover/api/memory.cpp >>> @@ -181,6 +181,22 @@ clCreateImage(cl_context d_ctx, cl_mem_flags d_flags, >>> desc->image_width, desc->image_height, >>> desc->image_row_pitch, host_ptr); >>> >>> + case CL_MEM_OBJECT_IMAGE2D_ARRAY: >>> + if (!desc->image_width || !desc->image_height) >>> + throw error(CL_INVALID_IMAGE_SIZE); >>> + >>> + if (all_of([=](const device ) { >>> + const size_t max = 1 << dev.max_image_levels_2d(); >>> + return (desc->image_width > max || >>> + desc->image_height > max); >>> +}, ctx.devices())) >>> + throw error(CL_INVALID_IMAGE_SIZE); >>> + >>> + return new image2d_array(ctx, flags, format, >>> + desc->image_width, desc->image_height, >>> + desc->image_array_size, >>> desc->image_slice_pitch, >>> + host_ptr); >>> + >>> case CL_MEM_OBJECT_IMAGE3D: >>>if (!desc->image_width || !desc->image_height || !desc->image_depth) >>> throw error(CL_INVALID_IMAGE_SIZE); >>> @@ -201,7 +217,6 @@ clCreateImage(cl_context d_ctx, cl_mem_flags d_flags, >>> case CL_MEM_OBJECT_IMAGE1D: >>> case CL_MEM_OBJECT_IMAGE1D_ARRAY: >>> case CL_MEM_OBJECT_IMAGE1D_BUFFER: >>> - case CL_MEM_OBJECT_IMAGE2D_ARRAY: >>>// XXX - Not implemented. >>>throw error(CL_IMAGE_FORMAT_NOT_SUPPORTED); >>> >>> diff --git a/src/gallium/state_trackers/clover/core/memory.cpp >>> b/src/gallium/state_trackers/clover/core/memory.cpp >>> index b852e68..de1862b 100644 >>> --- a/src/gallium/state_trackers/clover/core/memory.cpp >>> +++ b/src/gallium/state_trackers/clover/core/memory.cpp >>> @@ -198,6 +198,20 @@ image2d::type() const { >>> return CL_MEM_OBJECT_IMAGE2D; >>> } >>> >>> +image2d_array::image2d_array(clover::context , cl_mem_flags flags, >>> + const cl_image_format *format, >>> + size_t width, size_t height, >>> + size_t array_size, size_t slice_pitch, >>> + void *host_ptr) : >>> + image(ctx, flags, format, width, height, 1, >>> + 0, slice_pitch, slice_pitch * array_size, host_ptr) { >>> +} >> >> Seems like you aren't passing the row pitch as argument, and the array >> size is getting lost so you won't be able to recover it in order to >> implement memory transfer functions and CL image queries. > So I should extend the image base class with a private _array_size state > and a const getter method also, would that be reasonable? > Yeah, that sounds reasonable to me. >> >>> + >>> +cl_mem_object_type >>> +image2d_array::type() const { >>> + return CL_MEM_OBJECT_IMAGE2D_ARRAY; >>> +} >>> + >>> image3d::image3d(clover::context , cl_mem_flags flags, >>> const cl_image_format *format, >>> size_t width, size_t height, size_t depth, >>> diff --git a/src/gallium/state_trackers/clover/core/memory.hpp >>> b/src/gallium/state_trackers/clover/core/memory.hpp >>> index bd6da6b..1a3e8c9 100644 >>> --- a/src/gallium/state_trackers/clover/core/memory.hpp >>> +++ b/src/gallium/state_trackers/clover/core/memory.hpp >>> @@ -144,6 +144,17 @@ namespace clover { >>>virtual cl_mem_object_type type() const; >>> }; >>> >>> + class image2d_array : public image { >>> + public: >>> + image2d_array(clover::context , cl_mem_flags flags, >>> +const cl_image_format *format, >>> +size_t width, size_t height, >>> +size_t array_size, size_t slice_pitch, >>> +void *host_ptr); >>> + >>> + virtual cl_mem_object_type type() const; >>> + }; >>> + >>> class image3d : public image { >>> public: >>>image3d(clover::context , cl_mem_flags flags, >>> -- >>> 2.7.4 >>> >>> ___ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: PGP signature ___
Re: [Mesa-dev] [PATCH v3 2/2] clover: add GetKernelArgInfo (CL 1.2)
Serge Martinwrites: > --- > src/gallium/state_trackers/clover/api/kernel.cpp | 92 > +- > src/gallium/state_trackers/clover/core/module.cpp | 14 > src/gallium/state_trackers/clover/core/module.hpp | 11 +++ > .../state_trackers/clover/llvm/codegen/common.cpp | 11 +++ > 4 files changed, 125 insertions(+), 3 deletions(-) > > diff --git a/src/gallium/state_trackers/clover/api/kernel.cpp > b/src/gallium/state_trackers/clover/api/kernel.cpp > index 73ba34a..81a4a38 100644 > --- a/src/gallium/state_trackers/clover/api/kernel.cpp > +++ b/src/gallium/state_trackers/clover/api/kernel.cpp > @@ -192,9 +192,95 @@ clGetKernelWorkGroupInfo(cl_kernel d_kern, cl_device_id > d_dev, > CLOVER_API cl_int > clGetKernelArgInfo(cl_kernel d_kern, > cl_uint idx, cl_kernel_arg_info param, > - size_t size, void *r_buf, size_t *r_size) { > - CLOVER_NOT_SUPPORTED_UNTIL("1.2"); > - return CL_KERNEL_ARG_INFO_NOT_AVAILABLE; > + size_t size, void *r_buf, size_t *r_size) try { > + property_buffer buf { r_buf, size, r_size }; > + const auto = obj(d_kern); > + > + const auto args = > + find(name_equals(kern.name()), kern.program().symbols()).args; > + > + if (args.at(0).arg_info.arg_name.empty()) > + throw error(CL_KERNEL_ARG_INFO_NOT_AVAILABLE); > + > + const auto = find([&](const clover::module::argument ) { > + return arg.arg_info.index == idx; > +}, args); > + > + switch (param) { > + case CL_KERNEL_ARG_ADDRESS_QUALIFIER: { > + cl_kernel_arg_address_qualifier v; > + switch (arg.type) { > + case module::argument::local: > +v = CL_KERNEL_ARG_ADDRESS_LOCAL; > +break; > + case module::argument::constant: > +v = CL_KERNEL_ARG_ADDRESS_CONSTANT; > +break; > + case module::argument::global: > + case module::argument::image2d_rd: > + case module::argument::image2d_wr: > + case module::argument::image3d_rd: > + case module::argument::image3d_wr: > +v = CL_KERNEL_ARG_ADDRESS_GLOBAL; > +break; > + default: > +v = CL_KERNEL_ARG_ADDRESS_PRIVATE; > + } > + buf.as_scalar() = v; > + break; > + } > + > + case CL_KERNEL_ARG_ACCESS_QUALIFIER: { > + cl_kernel_arg_access_qualifier v; > + switch (arg.type) { > + case module::argument::image2d_rd: > + case module::argument::image3d_rd: > +v = CL_KERNEL_ARG_ACCESS_READ_ONLY; > +break; > + case module::argument::image2d_wr: > + case module::argument::image3d_wr: > +v = CL_KERNEL_ARG_ACCESS_WRITE_ONLY; > +break; > + default: > +v = CL_KERNEL_ARG_ACCESS_NONE; > + } > + buf.as_scalar() = v; > + break; > + } > + > + case CL_KERNEL_ARG_TYPE_NAME: > + buf.as_string() = arg.arg_info.type_name; > + break; > + > + case CL_KERNEL_ARG_TYPE_QUALIFIER: { > + cl_kernel_arg_type_qualifier v = CL_KERNEL_ARG_TYPE_NONE; > + > + if (arg.arg_info.type_qualifier.find("const") != std::string::npos) > + v |= CL_KERNEL_ARG_TYPE_CONST; > + if (arg.arg_info.type_qualifier.find("restrict") != std::string::npos) > + v |= CL_KERNEL_ARG_TYPE_RESTRICT; > + if (arg.arg_info.type_qualifier.find("volatile") != std::string::npos) > + v |= CL_KERNEL_ARG_TYPE_VOLATILE; > + > + buf.as_scalar() = v; > + break; > + } > + > + case CL_KERNEL_ARG_NAME: > + buf.as_string() = arg.arg_info.arg_name; > + break; > + > + default: > + throw error(CL_INVALID_VALUE); > + } > + > + return CL_SUCCESS; > + > +} catch (std::out_of_range ) { > + return CL_INVALID_ARG_INDEX; > + > +} catch (error ) { > + return e.get(); > } > > namespace { > diff --git a/src/gallium/state_trackers/clover/core/module.cpp > b/src/gallium/state_trackers/clover/core/module.cpp > index a6c5b98..1b6b642 100644 > --- a/src/gallium/state_trackers/clover/core/module.cpp > +++ b/src/gallium/state_trackers/clover/core/module.cpp > @@ -168,6 +168,19 @@ namespace { >} > }; > > + /// (De)serialize a module::argument::info > + template<> > + struct _serializer { > + template > + static void > + proc(S , QT ) { > + _proc(s, x.index); > + _proc(s, x.type_name); > + _proc(s, x.type_qualifier); > + _proc(s, x.arg_name); > + } > + }; > + > /// (De)serialize a module::argument. > template<> > struct _serializer { > @@ -180,6 +193,7 @@ namespace { > _proc(s, x.target_align); > _proc(s, x.ext_type); > _proc(s, x.semantic); > + _proc(s, x.arg_info); >} > }; > > diff --git a/src/gallium/state_trackers/clover/core/module.hpp >
Re: [Mesa-dev] [PATCH 4/9] i965: Factor out oword block read and write message control calculation.
Kenneth Graunkewrites: > On Friday, December 9, 2016 11:03:27 AM PST Francisco Jerez wrote: >> We'll need roughly the same logic in other places and it would be >> annoying to duplicate it. Instead factor it out into a function-like >> macro that takes the number of dwords per block (which will prove more >> convenient than taking the same value in owords or some other unit). >> --- >> src/mesa/drivers/dri/i965/brw_defines.h | 6 ++ >> src/mesa/drivers/dri/i965/brw_eu_emit.c | 14 ++ >> 2 files changed, 8 insertions(+), 12 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h >> b/src/mesa/drivers/dri/i965/brw_defines.h >> index cae8e9a..1c638a0 100644 >> --- a/src/mesa/drivers/dri/i965/brw_defines.h >> +++ b/src/mesa/drivers/dri/i965/brw_defines.h >> @@ -1669,6 +1669,12 @@ enum brw_message_target { >> #define BRW_DATAPORT_OWORD_BLOCK_2_OWORDS 2 >> #define BRW_DATAPORT_OWORD_BLOCK_4_OWORDS 3 >> #define BRW_DATAPORT_OWORD_BLOCK_8_OWORDS 4 >> +#define BRW_DATAPORT_OWORD_BLOCK_DWORDS(n) \ >> + ((n) == 4 ? BRW_DATAPORT_OWORD_BLOCK_1_OWORDLOW :\ >> +(n) == 8 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS : \ >> +(n) == 16 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS : \ >> +(n) == 32 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : \ >> +(abort(), ~0)) > > How about: > > #define BRW_DATAPORT_OWORD_BLOCK_DWORDS(n) \ >(assert(n == 4 || n == 8 || n == 16 || n == 32), ffs(n) - 3) > I don't think that would work for n > 4 due to the rather unfortunate hardware encoding, e.g. BRW_DATAPORT_OWORD_BLOCK_2_OWORDS is supposed to be encoded as 2 but your macro would give 1 as result. :| >> >> #define BRW_DATAPORT_OWORD_DUAL_BLOCK_1OWORD 0 >> #define BRW_DATAPORT_OWORD_DUAL_BLOCK_4OWORDS2 >> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c >> b/src/mesa/drivers/dri/i965/brw_eu_emit.c >> index 341f543..6141bfb 100644 >> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c >> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c >> @@ -2056,11 +2056,6 @@ void brw_oword_block_write_scratch(struct brw_codegen >> *p, >> mrf = retype(mrf, BRW_REGISTER_TYPE_UD); >> >> const unsigned mlen = 1 + num_regs; >> - const unsigned msg_control = >> - (num_regs == 1 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS : >> - num_regs == 2 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS : >> - num_regs == 4 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : 0); >> - assert(msg_control); >> >> /* Set up the message header. This is g0, with g0.2 filled with >> * the offset. We don't want to leave our offset around in g0 or >> @@ -2134,7 +2129,7 @@ void brw_oword_block_write_scratch(struct brw_codegen >> *p, >>brw_set_dp_write_message(p, >> insn, >> brw_scratch_surface_idx(p), >> - msg_control, >> + BRW_DATAPORT_OWORD_BLOCK_DWORDS(num_regs * 8), >> msg_type, >> target_cache, >> mlen, >> @@ -2181,11 +2176,6 @@ brw_oword_block_read_scratch(struct brw_codegen *p, >> dest = retype(dest, BRW_REGISTER_TYPE_UW); >> >> const unsigned rlen = num_regs; >> - const unsigned msg_control = >> - (num_regs == 1 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS : >> - num_regs == 2 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS : >> - num_regs == 4 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : 0); >> - assert(msg_control); >> const unsigned target_cache = >>(devinfo->gen >= 7 ? GEN7_SFID_DATAPORT_DATA_CACHE : >> devinfo->gen >= 6 ? GEN6_SFID_DATAPORT_RENDER_CACHE : >> @@ -,7 +2212,7 @@ brw_oword_block_read_scratch(struct brw_codegen *p, >>brw_set_dp_read_message(p, >>insn, >>brw_scratch_surface_idx(p), >> - msg_control, >> + BRW_DATAPORT_OWORD_BLOCK_DWORDS(num_regs * 8), >>BRW_DATAPORT_READ_MESSAGE_OWORD_BLOCK_READ, /* >> msg_type */ >>target_cache, >>1, /* msg_length */ >> signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] fp64 support for Intel HSW
Hello devs, could someone please ack the patches from the Igalia guys. It is really frustrating that their patches are still not in master. By the way. I use this branch now for months... https://github.com/Igalia/mesa/tree/i965-va64-gen7-scalar-vec4 and it also enables va64 for HSW and it works! > OpenGL 4.2 (4.3 could also be enabled I think). Thank you all for you great work and a happy Advent season from Darius Spitznagel ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: don't attempt to unlock an unlocked debug state mutex
Previously there was no _mesa_unlock_debug_state() call at all, one is still retained here for the case where _mesa_lock_debug_state() is called and did not return NULL (which is documented as being unlocked). On Mon, Dec 12, 2016 at 11:40:32AM +1100, Edward O'Callaghan wrote: > Hold up.. > > Does this reintroduce the hang in glsl-fs-loop piglit test with > MESA_DEBUG=context though? Was that tested? I'm interested to know how > this got so muddled up in the first place. > > Kind Regards, > Edward. > > On 12/12/2016 05:21 AM, Eduardo Lima Mitev wrote: > > Looks good. > > > > Reviewed-by: Eduardo Lima Mitev> > > > On 12/11/2016 04:42 PM, Jonathan Gray wrote: > >> Commit 929fcee47e46781c57f2a354ce0a013915c033d1 introduced code that > >> attempts to unlock an unlocked mutex which is undefined behaviour. > >> > >> On OpenBSD this leads to an abort: > >> > >> 0 0x124dadfa96ba in thrkill () at :2 > >> 1 0x124dadf3da39 in *_libc_abort () at > >> /usr/src/lib/libc/stdlib/abort.c:52 > >> 2 0x124d2c1165b5 in *_libpthread_pthread_mutex_unlock > >> (mutexp=) > >> at /usr/src/lib/librthread/rthread_sync.c:221 > >> 3 0x124d279c02e4 in init_attrib_groups (ctx=0x124df0fda000) at > >> main/context.c:825 > >> 4 _mesa_initialize_context (ctx=ctx@entry=0x124df0fda000, > >> api=api@entry=API_OPENGL_CORE, > >> visual=visual@entry=0x7f7bdfd0, share_list=share_list@entry=0x0, > >> driverFunctions=driverFunctions@entry=0x7f7bda60) at > >> main/context.c:1204 > >> 5 0x124d27b507ec in st_create_context (api=api@entry=API_OPENGL_CORE, > >> pipe=pipe@entry=0x124dc491, visual=visual@entry=0x7f7bdfd0, > >> share=share@entry=0x0, options=options@entry=0x7f7be128) > >> at state_tracker/st_context.c:545 > >> 6 0x124d27b8639f in st_api_create_context (stapi=, > >> smapi=0x124d1b608800, attribs=0x7f7be100, error=0x7f7be0fc, > >> shared_stctxi=0x0) > >> at state_tracker/st_manager.c:669 > >> 7 0x124d27cc5b9c in dri_create_context (api=, > >> visual=0x124d8a0f8a00, > >> cPriv=0x124de473f240, major_version=, > >> minor_version=, > >> flags=, notify_reset=false, error=0x7f7be2b4, > >> sharedContextPrivate=0x0) at dri_context.c:123 > >> 8 0x124d27cc5029 in driCreateContextAttribs (screen=0x124d8a0f8400, > >> api=, config=0x124d8a0f8a00, shared=, > >> num_attribs=, attribs=, > >> error=0x7f7be2b4, > >> data=0x124d77814a00) at dri_util.c:448 > >> 9 0x124d8e109b00 in drisw_create_context_attribs (base=0x124df3e08700, > >> config_base=0x124d7a0e7300, shareList=, > >> num_attribs=, > >> attribs=, error=0x7f7be2b4) at drisw_glx.c:476 > >> 10 0x124d8e104b4a in glXCreateContextAttribsARB (dpy=0x124d533f, > >> config=0x124d7a0e7300, share_context=0x0, direct=1, > >> attrib_list=0x7f7be300) > >> at create_context.c:78 > >> > >> Signed-off-by: Jonathan Gray > >> --- > >> src/mesa/main/debug_output.c | 5 ++--- > >> 1 file changed, 2 insertions(+), 3 deletions(-) > >> > >> diff --git a/src/mesa/main/debug_output.c b/src/mesa/main/debug_output.c > >> index 48dbbb3..bc933db 100644 > >> --- a/src/mesa/main/debug_output.c > >> +++ b/src/mesa/main/debug_output.c > >> @@ -1282,14 +1282,13 @@ _mesa_init_debug_output(struct gl_context *ctx) > >> */ > >>struct gl_debug_state *debug = _mesa_lock_debug_state(ctx); > >>if (!debug) { > >> - goto done; > >> + return; > >>} > >>debug->DebugOutput = GL_TRUE; > >>debug->LogToStderr = GL_TRUE; > >>ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_DEBUG_BIT; > >> + _mesa_unlock_debug_state(ctx); > >> } > >> -done: > >> - _mesa_unlock_debug_state(ctx); > >> } > >> > >> > >> > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/9] i965: Factor out oword block read and write message control calculation.
On Friday, December 9, 2016 11:03:27 AM PST Francisco Jerez wrote: > We'll need roughly the same logic in other places and it would be > annoying to duplicate it. Instead factor it out into a function-like > macro that takes the number of dwords per block (which will prove more > convenient than taking the same value in owords or some other unit). > --- > src/mesa/drivers/dri/i965/brw_defines.h | 6 ++ > src/mesa/drivers/dri/i965/brw_eu_emit.c | 14 ++ > 2 files changed, 8 insertions(+), 12 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > b/src/mesa/drivers/dri/i965/brw_defines.h > index cae8e9a..1c638a0 100644 > --- a/src/mesa/drivers/dri/i965/brw_defines.h > +++ b/src/mesa/drivers/dri/i965/brw_defines.h > @@ -1669,6 +1669,12 @@ enum brw_message_target { > #define BRW_DATAPORT_OWORD_BLOCK_2_OWORDS 2 > #define BRW_DATAPORT_OWORD_BLOCK_4_OWORDS 3 > #define BRW_DATAPORT_OWORD_BLOCK_8_OWORDS 4 > +#define BRW_DATAPORT_OWORD_BLOCK_DWORDS(n) \ > + ((n) == 4 ? BRW_DATAPORT_OWORD_BLOCK_1_OWORDLOW :\ > +(n) == 8 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS : \ > +(n) == 16 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS : \ > +(n) == 32 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : \ > +(abort(), ~0)) How about: #define BRW_DATAPORT_OWORD_BLOCK_DWORDS(n) \ (assert(n == 4 || n == 8 || n == 16 || n == 32), ffs(n) - 3) > > #define BRW_DATAPORT_OWORD_DUAL_BLOCK_1OWORD 0 > #define BRW_DATAPORT_OWORD_DUAL_BLOCK_4OWORDS2 > diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c > b/src/mesa/drivers/dri/i965/brw_eu_emit.c > index 341f543..6141bfb 100644 > --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c > +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c > @@ -2056,11 +2056,6 @@ void brw_oword_block_write_scratch(struct brw_codegen > *p, > mrf = retype(mrf, BRW_REGISTER_TYPE_UD); > > const unsigned mlen = 1 + num_regs; > - const unsigned msg_control = > - (num_regs == 1 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS : > - num_regs == 2 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS : > - num_regs == 4 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : 0); > - assert(msg_control); > > /* Set up the message header. This is g0, with g0.2 filled with > * the offset. We don't want to leave our offset around in g0 or > @@ -2134,7 +2129,7 @@ void brw_oword_block_write_scratch(struct brw_codegen > *p, >brw_set_dp_write_message(p, > insn, > brw_scratch_surface_idx(p), > -msg_control, > +BRW_DATAPORT_OWORD_BLOCK_DWORDS(num_regs * 8), > msg_type, > target_cache, > mlen, > @@ -2181,11 +2176,6 @@ brw_oword_block_read_scratch(struct brw_codegen *p, > dest = retype(dest, BRW_REGISTER_TYPE_UW); > > const unsigned rlen = num_regs; > - const unsigned msg_control = > - (num_regs == 1 ? BRW_DATAPORT_OWORD_BLOCK_2_OWORDS : > - num_regs == 2 ? BRW_DATAPORT_OWORD_BLOCK_4_OWORDS : > - num_regs == 4 ? BRW_DATAPORT_OWORD_BLOCK_8_OWORDS : 0); > - assert(msg_control); > const unsigned target_cache = >(devinfo->gen >= 7 ? GEN7_SFID_DATAPORT_DATA_CACHE : > devinfo->gen >= 6 ? GEN6_SFID_DATAPORT_RENDER_CACHE : > @@ -,7 +2212,7 @@ brw_oword_block_read_scratch(struct brw_codegen *p, >brw_set_dp_read_message(p, > insn, >brw_scratch_surface_idx(p), > - msg_control, > + BRW_DATAPORT_OWORD_BLOCK_DWORDS(num_regs * 8), > BRW_DATAPORT_READ_MESSAGE_OWORD_BLOCK_READ, /* > msg_type */ > target_cache, > 1, /* msg_length */ > signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3 1/2] clover: fix sampler argument type detection
Serge Martinwrites: > --- > src/gallium/state_trackers/clover/llvm/codegen/common.cpp | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/src/gallium/state_trackers/clover/llvm/codegen/common.cpp > b/src/gallium/state_trackers/clover/llvm/codegen/common.cpp > index 13ccd59..aa6ca50 100644 > --- a/src/gallium/state_trackers/clover/llvm/codegen/common.cpp > +++ b/src/gallium/state_trackers/clover/llvm/codegen/common.cpp > @@ -118,6 +118,11 @@ namespace { >module::argument::zero_ext, >module::argument::image_format); > > + } else if (type_name == "sampler_t") { > +args.emplace_back(module::argument::sampler, arg_api_size, > + target_size, target_align, > + module::argument::zero_ext); > + Change looks reasonable, but, do you expect this to write anything into the kernel's input buffer? Currently it won't (see the implementation of sampler_argument::bind in core/kernel.cpp). If you didn't expect it to, you should probably pass zero instead of 'target_size' and 'target_align' for consistency. If you did, you'll need to fix core/kernel.cpp in addition. > } else { > // Other types. > const auto actual_type = > -- > 2.5.5 signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: don't attempt to unlock an unlocked debug state mutex
Hold up.. Does this reintroduce the hang in glsl-fs-loop piglit test with MESA_DEBUG=context though? Was that tested? I'm interested to know how this got so muddled up in the first place. Kind Regards, Edward. On 12/12/2016 05:21 AM, Eduardo Lima Mitev wrote: > Looks good. > > Reviewed-by: Eduardo Lima Mitev> > On 12/11/2016 04:42 PM, Jonathan Gray wrote: >> Commit 929fcee47e46781c57f2a354ce0a013915c033d1 introduced code that >> attempts to unlock an unlocked mutex which is undefined behaviour. >> >> On OpenBSD this leads to an abort: >> >> 0 0x124dadfa96ba in thrkill () at :2 >> 1 0x124dadf3da39 in *_libc_abort () at >> /usr/src/lib/libc/stdlib/abort.c:52 >> 2 0x124d2c1165b5 in *_libpthread_pthread_mutex_unlock >> (mutexp=) >> at /usr/src/lib/librthread/rthread_sync.c:221 >> 3 0x124d279c02e4 in init_attrib_groups (ctx=0x124df0fda000) at >> main/context.c:825 >> 4 _mesa_initialize_context (ctx=ctx@entry=0x124df0fda000, >> api=api@entry=API_OPENGL_CORE, >> visual=visual@entry=0x7f7bdfd0, share_list=share_list@entry=0x0, >> driverFunctions=driverFunctions@entry=0x7f7bda60) at >> main/context.c:1204 >> 5 0x124d27b507ec in st_create_context (api=api@entry=API_OPENGL_CORE, >> pipe=pipe@entry=0x124dc491, visual=visual@entry=0x7f7bdfd0, >> share=share@entry=0x0, options=options@entry=0x7f7be128) >> at state_tracker/st_context.c:545 >> 6 0x124d27b8639f in st_api_create_context (stapi=, >> smapi=0x124d1b608800, attribs=0x7f7be100, error=0x7f7be0fc, >> shared_stctxi=0x0) >> at state_tracker/st_manager.c:669 >> 7 0x124d27cc5b9c in dri_create_context (api=, >> visual=0x124d8a0f8a00, >> cPriv=0x124de473f240, major_version=, >> minor_version=, >> flags=, notify_reset=false, error=0x7f7be2b4, >> sharedContextPrivate=0x0) at dri_context.c:123 >> 8 0x124d27cc5029 in driCreateContextAttribs (screen=0x124d8a0f8400, >> api=, config=0x124d8a0f8a00, shared=, >> num_attribs=, attribs=, >> error=0x7f7be2b4, >> data=0x124d77814a00) at dri_util.c:448 >> 9 0x124d8e109b00 in drisw_create_context_attribs (base=0x124df3e08700, >> config_base=0x124d7a0e7300, shareList=, >> num_attribs=, >> attribs=, error=0x7f7be2b4) at drisw_glx.c:476 >> 10 0x124d8e104b4a in glXCreateContextAttribsARB (dpy=0x124d533f, >> config=0x124d7a0e7300, share_context=0x0, direct=1, >> attrib_list=0x7f7be300) >> at create_context.c:78 >> >> Signed-off-by: Jonathan Gray >> --- >> src/mesa/main/debug_output.c | 5 ++--- >> 1 file changed, 2 insertions(+), 3 deletions(-) >> >> diff --git a/src/mesa/main/debug_output.c b/src/mesa/main/debug_output.c >> index 48dbbb3..bc933db 100644 >> --- a/src/mesa/main/debug_output.c >> +++ b/src/mesa/main/debug_output.c >> @@ -1282,14 +1282,13 @@ _mesa_init_debug_output(struct gl_context *ctx) >> */ >>struct gl_debug_state *debug = _mesa_lock_debug_state(ctx); >>if (!debug) { >> - goto done; >> + return; >>} >>debug->DebugOutput = GL_TRUE; >>debug->LogToStderr = GL_TRUE; >>ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_DEBUG_BIT; >> + _mesa_unlock_debug_state(ctx); >> } >> -done: >> - _mesa_unlock_debug_state(ctx); >> } >> >> >> > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 98869] Electronic Super Joy graphic artefacts (regression)
https://bugs.freedesktop.org/show_bug.cgi?id=98869 --- Comment #5 from Mark Janes--- If you can't bisect mesa, then providing an apitrace of the workload will allow someone else to bisect it. -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: Make copy propagation not panic when it sees an intrinsic.
On Sun, 2016-12-11 at 00:00 -0800, Kenneth Graunke wrote: > On Saturday, December 10, 2016 12:37:16 PM PST Matt Turner wrote: > > > > On Fri, Dec 9, 2016 at 8:28 PM, Kenneth Graunke> org> wrote: > > > > > > A number of games have large arrays of constants, which we > > > promote to > > > uniforms. This introduces copies from the uniform array to the > > > original > > > temporary array. Normally, copy propagation eliminates those > > > copies, > > > making everything refer to the uniform array directly. > > > > > > A number of shaders in "Deus Ex: Mankind Divided" recently > > > exposed a > > > limitation of copy propagation - if we had any intrinsics (i.e. > > > image > > > access in a compute shader), we weren't able to get rid of these > > > copies. > > > > > > That meant that any variable indexing remained on the temporary > > > array > > > rather being moved to the uniform array. i965's scalar backend > > > currently doesn't support indirect addressing of temporary > > > arrays, > > > which meant lowering it to if-ladders. This was horrible. > > > > > > On Skylake: > > > > > > total instructions in shared programs: 13700090 -> 13654519 (- > > > 0.33%) > > > instructions in affected programs: 56438 -> 10867 (-80.75%) > > > > Wow! > > > > > > > > helped: 14 > > > HURT: 0 > > > > > > total cycles in shared programs: 288879704 -> 291270232 (0.83%) > > > cycles in affected programs: 12758080 -> 15148608 (18.74%) > > > > ... that seems nuts? > > > > Any idea what's going on with the cycle counts? > > Good point...I glossed over the cycle counts when I saw the -80% > reduction in instructions with 0 shaders hurt. But they do look > pretty bad, so let's take a closer look... > > There are two nearly identical shaders that are the worst offenders: > > shaders/closed/steam/deus-ex-mankind-divided/256.shader_test CS > SIMD16: > > instructions: 2770 -> 253 (-2,517 instructions or -90.87%) > spills: 25 -> 0 > fills: 29 -> 0 > cycles: 923266 -> 1420534 (+497,268 cycles or +53.86%) > compile time: 2.73 seconds -> 0.17 seconds > > There are three loops in the program, each of which contains two > indirect reads of the uvec4[98] constant array. > > Before this patch, there were: > - 67 UNIFORM_PULL_CONSTANT_LOADs at the top of the program > - 1 UNIFORM_PULL_CONSTANT_LOAD in the first (cheap) loop > - 1 UNIFORM_PULL_CONSTANT_LOAD in the second (expensive) loop > - 1 UNIFORM_PULL_CONSTANT_LOAD in the third (very expensive) loop > > After this patch, there are: > - 0 loads at the top of the program > - 1 VARYING_PULL_CONSTANT_LOAD in the first (cheap) loop > - 2 VARYING_PULL_CONSTANT_LOAD in the second (expensive) loop > - 2 VARYING_PULL_CONSTANT_LOAD in the third (very expensive) loop > > The array indexes in the expensive loop are a[foo] and a[foo + 1]. > foo is modified in the loop, so they can't be hoisted out. I don't > think we can determine the number of loop iterations. The first loop has a trip count of 49. If I force it to unroll I get: CS SIMD16 shader: 682 inst, 2 loops, 1009856 cycles. The current unrolling rules are: static bool is_loop_small_enough_to_unroll(nir_shader *shader, nir_loop_info *li) { unsigned max_iter = shader->options->max_unroll_iterations; if (li->trip_count > max_iter) return false; if (li->force_unroll) // this is for consecutive indirect array access return true; bool loop_not_too_large = li->num_instructions * li->trip_count <= max_iter * 25; return loop_not_too_large; } Maybe we should just drop the first trip count check? Checking the number of intructions * trip count makes a lot more sense. Dropping the check allows the first loop to unroll. > > The two expensive loops look to be twice as expensive after this > patch. > The numbers aren't quite adding up for me - it looks like we should > spend 200 more cycles per loop iteration, but the loops are like > 40,000 > -> 90,000 cycles. > > I'm not sure what to do with this information. Eliminating 90% of > the > instructions seems good. Requiring no scratch access seems good. > Eliminating the 67 memory loads outside of the loops seems good. > Doing two memory loads per loop doesn't seem too crazy, given that > it matches the GLSL source code. Burning 49 registers to store the > entire array for the lifetime of the program seems pretty crazy... > > --Ken > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/6] gallivm: optimize gather a bit, by using supplied destination type
From: Roland ScheideggerBy using a dst_type in the the gather interface, gather has some more knowledge about how values should be fetched. E.g. if this is a 3x32bit fetch and dst_type is 4x32bit vector gather will no longer do a ZExt with a 96bit scalar value to 128bit, but just fetch the 96bit as 3x32bit vector (this is still going to be 2 loads of course, but the loads can be done directly to simd vector that way). Also, we can now do some try to use the right int/float type. This should make no difference really since there's typically no domain transition penalties for such simd loads, however it actually makes a difference since llvm will use different shuffle lowering afterwards so the caller can use this to trick llvm into using sane shuffle afterwards (and yes llvm is really stupid there - nothing against using the shuffle instruction from the correct domain, but not at the cost of doing 3 times more shuffles, the case which actually matters is refusal to use shufps for integer values). Also do some attempt to avoid things which look great on paper but llvm doesn't really handle (e.g. fetching 3-element 8 bit and 16 bit vectors which is simply disastrous - I suspect type legalizer is to blame trying to extend these vectors to 128bit types somehow, so fetching these with scalars like before which is suboptimal due to the ZExt). Remove the ability for truncation (no point, this is gather, not conversion) as it is complex enough already. While here also implement not just the float, but also the 64bit avx2 gathers (disabled though since based on the theoretical numbers the benefit just isn't there at all until Skylake at least). --- src/gallium/auxiliary/draw/draw_llvm.c | 2 +- src/gallium/auxiliary/gallivm/lp_bld_format_aos.c | 5 +- .../auxiliary/gallivm/lp_bld_format_aos_array.c| 9 +- src/gallium/auxiliary/gallivm/lp_bld_format_soa.c | 23 +- src/gallium/auxiliary/gallivm/lp_bld_format_yuv.c | 4 +- src/gallium/auxiliary/gallivm/lp_bld_gather.c | 359 + src/gallium/auxiliary/gallivm/lp_bld_gather.h | 2 +- src/gallium/auxiliary/gallivm/lp_bld_sample_aos.c | 8 +- 8 files changed, 333 insertions(+), 79 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c index c548572..19b75a5 100644 --- a/src/gallium/auxiliary/draw/draw_llvm.c +++ b/src/gallium/auxiliary/draw/draw_llvm.c @@ -1864,7 +1864,7 @@ draw_llvm_generate(struct draw_llvm *llvm, struct draw_llvm_variant *variant) LLVMPointerType(LLVMInt8TypeInContext(context), 0), ""); tmp = lp_build_gather(gallivm, vs_type.length, - 32, 32, TRUE, + 32, bld.type, TRUE, fetch_elts, tmp, FALSE); LLVMBuildStore(builder, tmp, index_store); } diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format_aos.c b/src/gallium/auxiliary/gallivm/lp_bld_format_aos.c index 9f6b9e9..322e7b8 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_format_aos.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_format_aos.c @@ -224,6 +224,7 @@ lp_build_unpack_arith_rgba_aos(struct gallivm_state *gallivm, /* Ex: convert packed = {XYZW, XYZW, XYZW, XYZW} * into masked = {X, Y, Z, W} */ + /* Note: we cannot do this shift on x86 natively until AVX2. */ shifted = LLVMBuildLShr(builder, packed, LLVMConstVector(shifts, 4), ""); masked = LLVMBuildAnd(builder, shifted, LLVMConstVector(masks, 4), ""); @@ -394,6 +395,7 @@ lp_build_fetch_rgba_aos(struct gallivm_state *gallivm, util_is_power_of_two(format_desc->block.bits)) { LLVMValueRef packed; LLVMTypeRef dst_vec_type = lp_build_vec_type(gallivm, type); + struct lp_type fetch_type; unsigned vec_len = type.width * type.length; /* @@ -401,8 +403,9 @@ lp_build_fetch_rgba_aos(struct gallivm_state *gallivm, * scaling or converting. */ + fetch_type = lp_type_uint(type.width*4); packed = lp_build_gather(gallivm, type.length/4, - format_desc->block.bits, type.width*4, + format_desc->block.bits, fetch_type, aligned, base_ptr, offset, TRUE); assert(format_desc->block.bits <= vec_len); diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format_aos_array.c b/src/gallium/auxiliary/gallivm/lp_bld_format_aos_array.c index 8cad3a6..636a4a6 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_format_aos_array.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_format_aos_array.c @@ -70,7 +70,14 @@ lp_build_fetch_rgba_aos_array(struct gallivm_state *gallivm, src_vec_type = lp_build_vec_type(gallivm, src_type); - /* Read whole vector from memory, unaligned */ + /* +* Read whole vector from memory, unaligned.
[Mesa-dev] [PATCH 2/6] gallivm: optimize SoA AoS fallback fetch path a little
From: Roland ScheideggerWe should do transpose, not extract/insert, at least with "sufficient" amount of channels (for 4 channels, extract/insert shuffles generated otherwise look truly terrifying). Albeit we shouldn't fallback to that so often in any case. --- src/gallium/auxiliary/gallivm/lp_bld_format_soa.c | 83 +++ 1 file changed, 70 insertions(+), 13 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c index 389bfa0..902c763 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c @@ -40,6 +40,39 @@ #include "lp_bld_debug.h" #include "lp_bld_format.h" #include "lp_bld_arit.h" +#include "lp_bld_pack.h" + + +static void +convert_to_soa(struct gallivm_state *gallivm, + LLVMValueRef src_aos[LP_MAX_VECTOR_WIDTH / 32], + LLVMValueRef dst_soa[4], + const struct lp_type soa_type) +{ + unsigned j, k; + struct lp_type aos_channel_type = soa_type; + + LLVMValueRef aos_channels[4]; + unsigned pixels_per_channel = soa_type.length / 4; + + debug_assert((soa_type.length % 4) == 0); + + aos_channel_type.length >>= 1; + + for (j = 0; j < 4; ++j) { + LLVMValueRef channel[LP_MAX_VECTOR_LENGTH] = { 0 }; + + assert(pixels_per_channel <= LP_MAX_VECTOR_LENGTH); + + for (k = 0; k < pixels_per_channel; ++k) { + channel[k] = src_aos[j + 4 * k]; + } + + aos_channels[j] = lp_build_concat(gallivm, channel, aos_channel_type, pixels_per_channel); + } + + lp_build_transpose_aos(gallivm, soa_type, aos_channels, dst_soa); +} void @@ -48,9 +81,6 @@ lp_build_format_swizzle_soa(const struct util_format_description *format_desc, const LLVMValueRef *unswizzled, LLVMValueRef swizzled_out[4]) { - assert(PIPE_SWIZZLE_0 == (int)PIPE_SWIZZLE_0); - assert(PIPE_SWIZZLE_1 == (int)PIPE_SWIZZLE_1); - if (format_desc->colorspace == UTIL_FORMAT_COLORSPACE_ZS) { enum pipe_swizzle swizzle; LLVMValueRef depth_or_stencil; @@ -547,9 +577,11 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm, { unsigned k, chan; struct lp_type tmp_type; + LLVMValueRef aos_fetch[LP_MAX_VECTOR_WIDTH / 32]; + boolean vec_transpose = FALSE; if (gallivm_debug & GALLIVM_DEBUG_PERF) { - debug_printf("%s: scalar unpacking of %s\n", + debug_printf("%s: AoS fetch fallback for %s\n", __FUNCTION__, format_desc->short_name); } @@ -560,12 +592,31 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm, rgba_out[chan] = lp_build_undef(gallivm, type); } + if (format_desc->nr_channels > 2 || + format_desc->layout != UTIL_FORMAT_LAYOUT_PLAIN) { + /* + * Note that vector transpose can be worse. This is because + * llvm will ensure the missing channels have the correct + * values, in particular typically 1.0 for the last channel + * (if they are used or not doesn't matter, usually llvm can't + * figure this out here probably due to the transpose). + * But with the extract/insert path, since those missing elements + * were just directly inserted/extracted llvm can optimize this + * somewhat (though it still doesn't look great - and not for + * the compressed formats due to their external fetch funcs). + * So restrict to cases where we are sure it helps (albeit + * with 2 channels it MIGHT be worth it at least with AVX). + * In any case, this is just a bandaid, it does NOT replace proper + * SoA format unpack. + */ + vec_transpose = TRUE; + } + /* loop over number of pixels */ for(k = 0; k < type.length; ++k) { LLVMValueRef index = lp_build_const_int32(gallivm, k); LLVMValueRef offset_elem; LLVMValueRef i_elem, j_elem; - LLVMValueRef tmp; offset_elem = LLVMBuildExtractElement(builder, offset, index, ""); @@ -574,20 +625,26 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm, j_elem = LLVMBuildExtractElement(builder, j, index, ""); /* Get a single float[4]={R,G,B,A} pixel */ - tmp = lp_build_fetch_rgba_aos(gallivm, format_desc, tmp_type, - aligned, base_ptr, offset_elem, - i_elem, j_elem, cache); + aos_fetch[k] = lp_build_fetch_rgba_aos(gallivm, format_desc, tmp_type, +aligned, base_ptr, offset_elem, +i_elem, j_elem, cache); /* * Insert the AoS tmp value channels into the SoA result vectors at * position =
[Mesa-dev] [PATCH 5/6] gallivm: generalize the compressed format soa fetch a bit
From: Roland ScheideggerThis can now handle rgtc (unorm) too - this path no longer handles plain formats, but that's unnecessary they now all have their proper SoA unpack (this will still be dog-slow though due to the actual fetch being per-pixel util fallbacks). --- src/gallium/auxiliary/gallivm/lp_bld_format_soa.c | 86 +-- 1 file changed, 49 insertions(+), 37 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c index 9550f26..68cbb10 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c @@ -733,64 +733,69 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm, /* * Try calling lp_build_fetch_rgba_aos for all pixels. +* Should only really hit subsampled, compressed +* (for s3tc srgb too, for rgtc the unorm ones only) by now. +* (This is invalid for plain 8unorm formats because we're lazy with +* the swizzle since some results would arrive swizzled, some not.) */ - if (util_format_fits_8unorm(format_desc) && + if ((format_desc->layout != UTIL_FORMAT_LAYOUT_PLAIN) && + (util_format_fits_8unorm(format_desc) || +format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC) && type.floating && type.width == 32 && (type.length == 1 || (type.length % 4 == 0))) { struct lp_type tmp_type; - LLVMValueRef tmp; + struct lp_build_context bld; + LLVMValueRef packed, rgba[4]; + const struct util_format_description *flinear_desc; + const struct util_format_description *frgba8_desc; + unsigned chan; + lp_build_context_init(, gallivm, type); + + /* + * Make sure the conversion in aos really only does convert to rgba8 + * and not anything more (so use linear format, adjust type). + */ + flinear_desc = util_format_description(util_format_linear(format)); memset(_type, 0, sizeof tmp_type); tmp_type.width = 8; tmp_type.length = type.length * 4; tmp_type.norm = TRUE; - tmp = lp_build_fetch_rgba_aos(gallivm, format_desc, tmp_type, -aligned, base_ptr, offset, i, j, cache); + packed = lp_build_fetch_rgba_aos(gallivm, flinear_desc, tmp_type, + aligned, base_ptr, offset, i, j, cache); + packed = LLVMBuildBitCast(builder, packed, bld.int_vec_type, ""); - lp_build_rgba8_to_fi32_soa(gallivm, -type, -tmp, -rgba_out); - - return; - } - - if (format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC && - /* non-srgb case is already handled above */ - format_desc->colorspace == UTIL_FORMAT_COLORSPACE_SRGB && - type.floating && type.width == 32 && - (type.length == 1 || (type.length % 4 == 0)) && - cache) { - const struct util_format_description *format_decompressed; - const struct util_format_description *flinear_desc; - LLVMValueRef packed; - flinear_desc = util_format_description(util_format_linear(format_desc->format)); - /* This probably only works with aligned data */ - packed = lp_build_fetch_cached_texels(gallivm, -flinear_desc, -type.length, -base_ptr, -offset, -i, j, -cache); - packed = LLVMBuildBitCast(builder, packed, -lp_build_int_vec_type(gallivm, type), ""); /* - * The values are now packed so they match ordinary srgb RGBA8 format, + * The values are now packed so they match ordinary (srgb) RGBA8 format, * hence need to use matching format for unpack. */ - format_decompressed = util_format_description(PIPE_FORMAT_R8G8B8A8_SRGB); - + frgba8_desc = util_format_description(PIPE_FORMAT_R8G8B8A8_UNORM); + if (format_desc->colorspace == UTIL_FORMAT_COLORSPACE_SRGB) { + assert(format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC); + frgba8_desc = util_format_description(PIPE_FORMAT_R8G8B8A8_SRGB); + } lp_build_unpack_rgba_soa(gallivm, - format_decompressed, + frgba8_desc, type, - packed, rgba_out); + packed, rgba); + /* + * We converted 4 channels. Make sure llvm can drop unneeded ones + * (luckily the rgba order is fixed, only la needs special case). + */ + for (chan = 0; chan < 4; chan++) { + enum pipe_swizzle swizzle = format_desc->swizzle[chan]; + if (chan == 3 &&
[Mesa-dev] [PATCH 4/6] gallivm: provide soa fetch path handling formats with more than 32bit
From: Roland ScheideggerThis previously always fell back to AoS conversion. Even for 4-float formats (which is the optimal case by far for that fallback case) this was suboptimal, since it meant the conversion couldn't be done with 256bit vectors. While this may still only be partly possible for some formats, (unless there's AVX2 support) at least the transpose can be done with half the unpacks (and before using the transpose for AoS fallbacks, it was worse still). With less than 4 channels, things got way worse with the AoS fallback quickly even with 128bit vectors. The strategy is pretty much the same as the existing one for formats which fit into 32 bits, except there's now multiple vectors to be fetched (2 or 4 to be exact), which need to be shuffled first (if it's 4 vectors, this amounts to a transpose, for 2 it's a bit different), then the unpack is done the same (with the exception that the shift of the channels is now modulo 32, and we need to select the right vector). In fact the most complex part about it is to get the shuffles right for separating into lo/hi parts for AVX/AVX2... This also makes use of the new ability of gather to use provided type information, which we abuse to outsmart llvm so we get decent shuffles, and to fetch 3x32bit vectors without having to ZExt the scalar. And just because we can, we handle double formats too, albeit they are a bit different (draw sometimes needs to handle that). --- src/gallium/auxiliary/gallivm/lp_bld_format_soa.c | 529 +++--- 1 file changed, 375 insertions(+), 154 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c index b3ea709..9550f26 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c @@ -31,6 +31,7 @@ #include "util/u_format.h" #include "util/u_memory.h" #include "util/u_string.h" +#include "util/u_math.h" #include "lp_bld_type.h" #include "lp_bld_const.h" @@ -113,6 +114,166 @@ lp_build_format_swizzle_soa(const struct util_format_description *format_desc, } + +static LLVMValueRef +lp_build_extract_soa_chan(struct lp_build_context *bld, + unsigned blockbits, + boolean srgb_chan, + struct util_format_channel_description chan_desc, + LLVMValueRef packed) +{ + struct gallivm_state *gallivm = bld->gallivm; + LLVMBuilderRef builder = gallivm->builder; + struct lp_type type = bld->type; + LLVMValueRef input = packed; + const unsigned width = chan_desc.size; + const unsigned start = chan_desc.shift; + const unsigned stop = start + width; + + /* Decode the input vector component */ + + switch(chan_desc.type) { + case UTIL_FORMAT_TYPE_VOID: + input = bld->undef; + break; + + case UTIL_FORMAT_TYPE_UNSIGNED: + /* + * Align the LSB + */ + if (start) { + input = LLVMBuildLShr(builder, input, + lp_build_const_int_vec(gallivm, type, start), ""); + } + + /* + * Zero the MSBs + */ + if (stop < blockbits) { + unsigned mask = ((unsigned long long)1 << width) - 1; + input = LLVMBuildAnd(builder, input, + lp_build_const_int_vec(gallivm, type, mask), ""); + } + + /* + * Type conversion + */ + if (type.floating) { + if (srgb_chan) { +struct lp_type conv_type = lp_uint_type(type); +input = lp_build_srgb_to_linear(gallivm, conv_type, width, input); + } + else { +if(chan_desc.normalized) + input = lp_build_unsigned_norm_to_float(gallivm, width, type, input); +else + input = LLVMBuildSIToFP(builder, input, bld->vec_type, ""); + } + } + else if (chan_desc.pure_integer) { + /* Nothing to do */ + } else { + /* FIXME */ + assert(0); + } + break; + + case UTIL_FORMAT_TYPE_SIGNED: + /* + * Align the sign bit first. + */ + if (stop < type.width) { + unsigned bits = type.width - stop; + LLVMValueRef bits_val = lp_build_const_int_vec(gallivm, type, bits); + input = LLVMBuildShl(builder, input, bits_val, ""); + } + + /* + * Align the LSB (with an arithmetic shift to preserve the sign) + */ + if (chan_desc.size < type.width) { + unsigned bits = type.width - chan_desc.size; + LLVMValueRef bits_val = lp_build_const_int_vec(gallivm, type, bits); + input = LLVMBuildAShr(builder, input, bits_val, ""); + } + + /* + * Type conversion + */ + if (type.floating) { + input = LLVMBuildSIToFP(builder, input, bld->vec_type, ""); + if (chan_desc.normalized) { +double scale = 1.0 / ((1 << (chan_desc.size -
[Mesa-dev] [PATCH 6/6] draw: use SoA fetch, not AoS one
From: Roland ScheideggerNow that there's some SoA fetch which never falls back, we should usually get results which are better or at least not worse (something like rgba32f will stay the same). I suppose though it might be worse in some cases where the format doesn't require conversion (e.g. rg32f) and goes straight to output - if llvm was able to see through all shuffles then it might have been able to do away with the aos->soa->aos transpose entirely which can no longer work possibly except for 4-channel formats (due to replacing the undef channels with 0/1 before the second transpose and not the first - llvm will definitely not be able to figure that out). That might actually be quite common, but I'm not sure llvm really could optimize it in the first place, and if it's a problem we should just special case such inputs (though note that if conversion is needed, it isn't obvious if it's better to skip the transpose or do the conversion AoS-style). For cases which get way better, think something like R16_UNORM with 8-wide vectors: this was 8 sign-extend fetches, 8 cvt, 8 muls, followed by a couple of shuffles to stitch things together (if it is smart enough, 6 unpacks) and then a (8-wide) transpose (not sure if llvm could even optimize the shuffles + transpose, since the 16bit values were actually sign-extended to 128bit before being cast to a float vec, so that would be another 8 unpacks). Now that is just 8 fetches (directly inserted into vector, albeit there's one 128bit insert needed), 1 cvt, 1 mul. --- src/gallium/auxiliary/draw/draw_llvm.c | 54 +- 1 file changed, 40 insertions(+), 14 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c index 19b75a5..f895b76 100644 --- a/src/gallium/auxiliary/draw/draw_llvm.c +++ b/src/gallium/auxiliary/draw/draw_llvm.c @@ -755,11 +755,9 @@ fetch_vector(struct gallivm_state *gallivm, LLVMValueRef *inputs, LLVMValueRef indices) { - LLVMValueRef zero = LLVMConstNull(LLVMInt32TypeInContext(gallivm->context)); LLVMBuilderRef builder = gallivm->builder; struct lp_build_context blduivec; LLVMValueRef offset, valid_mask; - LLVMValueRef aos_fetch[LP_MAX_VECTOR_WIDTH / 32]; unsigned i; lp_build_context_init(, gallivm, lp_uint_type(vs_type)); @@ -783,21 +781,49 @@ fetch_vector(struct gallivm_state *gallivm, } /* -* Note: we probably really want to use SoA fetch, not AoS one (albeit -* for most formats it will amount to the same as this isn't very -* optimized). But looks dangerous since it assumes alignment. +* Use SoA fetch. This should produce better code usually. +* Albeit it's possible there's exceptions (in particular if the fetched +* value is going directly to output if it's something like RG32F). */ - for (i = 0; i < vs_type.length; i++) { - LLVMValueRef offset1, elem; - elem = lp_build_const_int32(gallivm, i); - offset1 = LLVMBuildExtractElement(builder, offset, elem, ""); + if (1) { + struct lp_type res_type = vs_type; + /* The type handling is annoying here... */ + if (format_desc->colorspace == UTIL_FORMAT_COLORSPACE_RGB && + format_desc->channel[0].pure_integer) { + if (format_desc->channel[0].type == UTIL_FORMAT_TYPE_SIGNED) { +res_type = lp_type_int_vec(vs_type.width, vs_type.width * vs_type.length); + } + else if (format_desc->channel[0].type == UTIL_FORMAT_TYPE_UNSIGNED) { +res_type = lp_type_uint_vec(vs_type.width, vs_type.width * vs_type.length); + } + } - aos_fetch[i] = lp_build_fetch_rgba_aos(gallivm, format_desc, - lp_float32_vec4_type(), - FALSE, map_ptr, offset1, - zero, zero, NULL); + lp_build_fetch_rgba_soa(gallivm, format_desc, + res_type, FALSE, map_ptr, offset, + blduivec.zero, blduivec.zero, + NULL, inputs); + + for (i = 0; i < TGSI_NUM_CHANNELS; i++) { + inputs[i] = LLVMBuildBitCast(builder, inputs[i], + lp_build_vec_type(gallivm, vs_type), ""); + } + + } + else { + LLVMValueRef zero = LLVMConstNull(LLVMInt32TypeInContext(gallivm->context)); + LLVMValueRef aos_fetch[LP_MAX_VECTOR_WIDTH / 32]; + for (i = 0; i < vs_type.length; i++) { + LLVMValueRef offset1, elem; + elem = lp_build_const_int32(gallivm, i); + offset1 = LLVMBuildExtractElement(builder, offset, elem, ""); + + aos_fetch[i] = lp_build_fetch_rgba_aos(gallivm, format_desc, +lp_float32_vec4_type(), +FALSE, map_ptr, offset1, +
[Mesa-dev] [PATCH 1/6] gallivm: (trivial) handle non-aligned fetch for lp_build_fetch_rgba_soa
From: Roland Scheideggersoa fetch so far always assumed that data was aligned. However, we want to use this for vertex fetch, and data might not be aligned there, so handle it in this path too (basically just pass through alignment through to other functions). (It looks like it wouldn't work for for cached s3tc but this is no different than with AoS fetch.) --- src/gallium/auxiliary/gallivm/lp_bld_format.h | 1 + src/gallium/auxiliary/gallivm/lp_bld_format_soa.c | 15 +-- src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 4 ++-- 3 files changed, 12 insertions(+), 8 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format.h b/src/gallium/auxiliary/gallivm/lp_bld_format.h index 5c866f4..6540caa 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_format.h +++ b/src/gallium/auxiliary/gallivm/lp_bld_format.h @@ -143,6 +143,7 @@ void lp_build_fetch_rgba_soa(struct gallivm_state *gallivm, const struct util_format_description *format_desc, struct lp_type type, +boolean aligned, LLVMValueRef base_ptr, LLVMValueRef offsets, LLVMValueRef i, diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c index 7444c51..389bfa0 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_format_soa.c @@ -349,6 +349,7 @@ lp_build_rgba8_to_fi32_soa(struct gallivm_state *gallivm, * * \param type the desired return type for 'rgba'. The vector length * is the number of texels to fetch + * \param aligned if the offset is guaranteed to be aligned to element width * * \param base_ptr points to the base of the texture mip tree. * \param offsetoffset to start of the texture image block. For non- @@ -365,6 +366,7 @@ void lp_build_fetch_rgba_soa(struct gallivm_state *gallivm, const struct util_format_description *format_desc, struct lp_type type, +boolean aligned, LLVMValueRef base_ptr, LLVMValueRef offset, LLVMValueRef i, @@ -402,7 +404,7 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm, type.length, format_desc->block.bits, type.width, - TRUE, + aligned, base_ptr, offset, FALSE); /* @@ -428,7 +430,7 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm, packed = lp_build_gather(gallivm, type.length, format_desc->block.bits, - type.width, TRUE, + type.width, aligned, base_ptr, offset, FALSE); if (format_desc->format == PIPE_FORMAT_R11G11B10_FLOAT) { lp_build_r11g11b10_to_float(gallivm, packed, rgba_out); @@ -456,14 +458,14 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm, LLVMValueRef s_offset = lp_build_const_int_vec(gallivm, type, 4); offset = LLVMBuildAdd(builder, offset, s_offset, ""); packed = lp_build_gather(gallivm, type.length, 32, type.width, - TRUE, base_ptr, offset, FALSE); + aligned, base_ptr, offset, FALSE); packed = LLVMBuildAnd(builder, packed, lp_build_const_int_vec(gallivm, type, mask), ""); } else { assert (format_desc->format == PIPE_FORMAT_Z32_FLOAT_S8X24_UINT); packed = lp_build_gather(gallivm, type.length, 32, type.width, - TRUE, base_ptr, offset, TRUE); + aligned, base_ptr, offset, TRUE); packed = LLVMBuildBitCast(builder, packed, lp_build_vec_type(gallivm, type), ""); } @@ -489,7 +491,7 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm, tmp_type.norm = TRUE; tmp = lp_build_fetch_rgba_aos(gallivm, format_desc, tmp_type, -TRUE, base_ptr, offset, i, j, cache); +aligned, base_ptr, offset, i, j, cache); lp_build_rgba8_to_fi32_soa(gallivm, type, @@ -509,6 +511,7 @@ lp_build_fetch_rgba_soa(struct gallivm_state *gallivm, const struct util_format_description *flinear_desc; LLVMValueRef packed; flinear_desc = util_format_description(util_format_linear(format_desc->format)); + /* This probably only works with aligned data */ packed = lp_build_fetch_cached_texels(gallivm,
[Mesa-dev] [Bug 99055] Games hang / freeze completely
https://bugs.freedesktop.org/show_bug.cgi?id=99055 --- Comment #2 from Timothy Arceri--- (In reply to Etienne Bruines from comment #0) > > I'd be happy to provide any additional information. The most helpful thing you could do is build mesa from source and bisect the commit that introduces the hang. I'd recommend building mesa from the master branch to start with to see if its still in master of if this is something introduced in the stable branch only. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: Clamp GetUniformuiv values to be >= 0.
Section 2.2.2 (Data Conversions For State Query Commands) of the OpenGL 4.5 October 24th 2016 specification says: "If a command returning unsigned integer data is called, such as GetSamplerParameterIuiv, negative values are clamped to zero." Fixes GL44-CTS.gpu_shader_fp64.state_query. Signed-off-by: Kenneth Graunke--- src/mesa/main/uniform_query.cpp | 48 + 1 file changed, 39 insertions(+), 9 deletions(-) Hey Nicolai, I wrote a similar patch a while back, but never got around to sending it, since I realized that the gl45release branch expects our current behavior, and the change to make the CTS expect clamping is only on the master branch. Apparently I made some additional changes, compared to yours. I figured I'd send this along and let you see if you think any of my extra changes are still necessary. If so, feel free to fold them into your patch. I also think we need to fix several other glGet* commands...it's just that this is the only one currently tested. A bunch work because the values returned can't be negative. --Ken diff --git a/src/mesa/main/uniform_query.cpp b/src/mesa/main/uniform_query.cpp index db700df..f05a29f 100644 --- a/src/mesa/main/uniform_query.cpp +++ b/src/mesa/main/uniform_query.cpp @@ -347,14 +347,10 @@ _mesa_get_uniform(struct gl_context *ctx, GLuint program, GLint location, * just memcpy the data. If the types are not compatible, perform a * slower convert-and-copy process. */ - if (returnType == uni->type->base_type - || ((returnType == GLSL_TYPE_INT - || returnType == GLSL_TYPE_UINT) - && - (uni->type->base_type == GLSL_TYPE_INT - || uni->type->base_type == GLSL_TYPE_UINT - || uni->type->base_type == GLSL_TYPE_SAMPLER - || uni->type->base_type == GLSL_TYPE_IMAGE))) { + if (returnType == uni->type->base_type || + ((returnType == GLSL_TYPE_INT || returnType == GLSL_TYPE_UINT) && + (uni->type->base_type == GLSL_TYPE_SAMPLER || +uni->type->base_type == GLSL_TYPE_IMAGE))) { memcpy(paramsOut, src, bytes); } else { union gl_constant_value *const dst = @@ -422,7 +418,6 @@ _mesa_get_uniform(struct gl_context *ctx, GLuint program, GLint location, } break; case GLSL_TYPE_INT: - case GLSL_TYPE_UINT: switch (uni->type->base_type) { case GLSL_TYPE_FLOAT: /* While the GL 3.2 core spec doesn't explicitly @@ -447,6 +442,9 @@ _mesa_get_uniform(struct gl_context *ctx, GLuint program, GLint location, case GLSL_TYPE_BOOL: dst[didx].i = src[sidx].i ? 1 : 0; break; + case GLSL_TYPE_UINT: + dst[didx].i = src[sidx].i; + break; case GLSL_TYPE_DOUBLE: { double tmp; memcpy(, [sidx].f, sizeof(tmp)); @@ -458,6 +456,38 @@ _mesa_get_uniform(struct gl_context *ctx, GLuint program, GLint location, break; } break; +case GLSL_TYPE_UINT: + switch (uni->type->base_type) { + case GLSL_TYPE_FLOAT: + /* The spec isn't terribly clear how to handle negative + * values with an unsigned return type. + * + * GL 4.5 section 2.2.2 ("Data Conversions for State + * Query Commands") says: + * + * "If a value is so large in magnitude that it cannot be + * represented by the returned data type, then the nearest + * value representable using the requested type is + * returned." + */ + dst[didx].i = src[sidx].f < 0.0f ? 0 : IROUND(src[sidx].f); + break; + case GLSL_TYPE_BOOL: + dst[didx].i = src[sidx].i ? 1 : 0; + break; + case GLSL_TYPE_INT: + dst[didx].i = MAX2(src[sidx].i, 0); + break; + case GLSL_TYPE_DOUBLE: { + double tmp; + memcpy(, [sidx].f, sizeof(tmp)); + dst[didx].i = tmp < 0.0 ? 0 : IROUNDD(tmp); + break; + } + default: + unreachable("invalid uniform type"); + } + break; default: assert(!"Should not get here."); -- 2.10.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Print out cycle estimates at the start of block annotations.
Good plan. Reviewed-by: Matt Turner___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0
i965/vec4: handle 32 and 64 bit channels in liveness analysis Please indent the returned multiline expressions in var_from_reg() like we do elsewhere, so that the second line begins on the same column as the first line. */ goes on its own line. I'm having a hard time reviewing this one. The logic is rather complex. I'll ask someone to help me review it on Monday at the office. i965/vec4: add a horiz_offset() helper i965: move the group field from fs_inst to backend_instruction. i965/vec4: add a SIMD lowering pass In the commit message, you say For now the pass only handles the gen7 restriction where any instruction that writes 2 registers also needs to read 2 registers. This affects double-precision instructions reading uniforms, for example. Later patches will extend the lowering pass adding a few more cases. But the rule about if-writing-two-regs, must-read-two-regs says that scalar sources are an exception: "When source is scalar, the source registers are not incremented." I don't see any code that allows us to avoid splitting an instruction if it's writing two registers but sourcing a scalar uniform. Maybe this doesn't apply because we have to use a non scalar swizzle (.xy) to access a single fp64 component? i965/vec4: make the generator set correct NibCtrl for SIMD4 DF instructions i965/vec4: dump NibCtrl for instructions with execsize != 8 i965/disasm: print NibCtrl for instructions with execsize < 8 i965/vec4: teach CSE about exec_size, group and doubles i965/vec4: teach cmod propagation about different execution sizes i965/vec4: split double-precision bcsel bcsel is the NIR opcode. I'd change references to bcsel to SEL. Very interesting find... i965/vec4: add a scalarization pass for double-precision instructions Don't indent case inside a switch. i965/vec4: translate 64-bit swizzles to 32-bit i965/vec4: implement access to DF source components Z/W Wow, bien hecho! i965/disasm: fix subreg for dst in Align16 mode i965/vec4: teach register coalescing about 64-bit i965/vec4: fix pack_uniform_registers for doubles i965/vec4: fix indentation in pack_uniform_registers i965/vec4: Skip swizzle to subnr in 3src instructions with DF operands s/need/needs/ in the comment. i965/vec4/nir: do not emit 64-bit MAD i965/vec4: do not emit 64-bit MAD I might change the name of this commit to "i965/vec4: Lower 64-bit MAD" or "i965/vec4: Lower DF MAD" I think I'd change the name of the function as well, maybe to lower_64bit_mad[_to_mul_add] or something. i965/vec4: support multiple dispatch widths and groups in the IR builder. i965/vec4: Add a shuffle_64bit_data helper I was initially confused by r0.0:DF/r0.1:DF, thinking that .1 in r0.1:DF was a subreg offset. But I think it's actually the register offset (i.e., .offset)? If that's the case, I think it would be clearer just to increment the register number in the example: r0.0:DF x0 y0 z0 w0 r1.0:DF x1 y1 z1 w1 s/opperation/operation/ in the comment. On the multiline bld.group(...), I think Curro's style is to align with the '.'. For instance, inst = bld.group(4, for_write ? 1 : 0) .MOV(writemask(dst, WRITEMASK_ZW), swizzle(byte_offset(src, REG_SIZE), BRW_SWIZZLE_XYXY)); so that group and MOV align, with the '.' on the same line as the MOV. i965/vec4: Fix UBO loads for 64-bit data i965/vec4: Fix SSBO loads for 64-bit data i965/vec4: Fix SSBO stores for 64-bit data i965/vec4: don't constant propagate 64-bit immediates i965/vec4: prevent copy-propagation from values with a different type size i965/vec4: Prevent copy propagation from violating pre-gen8 restrictions Similar comment as before about being allowed to write two registers while sourcing a scalar. Maybe doesn't apply because of the double swizzle. i965/vec4: don't propagate single-precision uniforms into 4-wide instructions i965/vec4: don't copy propagate misaligned registers i965/vec4: extend the DWORD multiply DepCtrl restriction to all gen8 platforms I don't see this in the BDW PRMs, and the internal documentation says that it applies to "CHV, BXT" I suggest dropping this patch (or replacing it with one that adds || devinfo->is_broxton). i965/vec4: Do not use DepCtrl with 64-bit instructions i965/vec4: do not split scratch read/write opcodes i965/vec4: fix scratch offset for 64bit data i965/vec4: fix scratch reads for 64bit data i965/vec4: fix scratch writes for 64bit data i965/vec4: fix move_uniform_array_access_to_pull_constant() for 64-bit data i965/vec4: fix indentation in
[Mesa-dev] [PATCH] i965: Print out cycle estimates at the start of block annotations.
We now print START B15 <-B14 (42774 cycles) indicating that we estimate B15 will take 42,774 cycles. Printing this should make it easier where time is spent in the program. Signed-off-by: Kenneth Graunke--- src/mesa/drivers/dri/i965/intel_asm_annotation.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/intel_asm_annotation.c b/src/mesa/drivers/dri/i965/intel_asm_annotation.c index b01490a..bb24e5e 100644 --- a/src/mesa/drivers/dri/i965/intel_asm_annotation.c +++ b/src/mesa/drivers/dri/i965/intel_asm_annotation.c @@ -45,7 +45,7 @@ dump_assembly(void *assembly, int num_annotations, struct annotation *annotation struct bblock_t *predecessor_block = predecessor_link->block; fprintf(stderr, " <-B%d", predecessor_block->num); } - fprintf(stderr, "\n"); + fprintf(stderr, " (%u cycles)\n", annotation[i].block_start->cycle_count); } if (last_annotation_ir != annotation[i].ir) { -- 2.10.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 84/84 v3] st/nine: Implement new buffer upload path
This new buffer upload path enables to lock faster than the normal path when using DISCARD/NOOVERWRITE. v2: Diverse cleanups and fixes. v3: Fix allocation size for 'lone' buffers and add more debug info. Signed-off-by: Axel Davy--- This fixes the reported 'leak' (which was not a leak but the code allocating buffers too big). I'm still interested in feedback though. src/gallium/state_trackers/nine/Makefile.sources | 2 + src/gallium/state_trackers/nine/buffer9.c | 75 +- src/gallium/state_trackers/nine/buffer9.h | 9 +- src/gallium/state_trackers/nine/device9.c | 5 + src/gallium/state_trackers/nine/device9.h | 3 + src/gallium/state_trackers/nine/indexbuffer9.c | 10 +- src/gallium/state_trackers/nine/indexbuffer9.h | 2 - .../state_trackers/nine/nine_buffer_upload.c | 293 + .../state_trackers/nine/nine_buffer_upload.h | 59 + src/gallium/state_trackers/nine/vertexbuffer9.c| 3 +- 10 files changed, 440 insertions(+), 21 deletions(-) create mode 100644 src/gallium/state_trackers/nine/nine_buffer_upload.c create mode 100644 src/gallium/state_trackers/nine/nine_buffer_upload.h diff --git a/src/gallium/state_trackers/nine/Makefile.sources b/src/gallium/state_trackers/nine/Makefile.sources index 2bb08a2..56698a1 100644 --- a/src/gallium/state_trackers/nine/Makefile.sources +++ b/src/gallium/state_trackers/nine/Makefile.sources @@ -23,6 +23,8 @@ C_SOURCES := \ indexbuffer9.h \ iunknown.c \ iunknown.h \ + nine_buffer_upload.c \ + nine_buffer_upload.h \ nine_csmt_helper.h \ nine_debug.c \ nine_debug.h \ diff --git a/src/gallium/state_trackers/nine/buffer9.c b/src/gallium/state_trackers/nine/buffer9.c index 9a7c30c..203e1d9 100644 --- a/src/gallium/state_trackers/nine/buffer9.c +++ b/src/gallium/state_trackers/nine/buffer9.c @@ -23,6 +23,7 @@ #include "buffer9.h" #include "device9.h" +#include "nine_buffer_upload.h" #include "nine_helpers.h" #include "nine_pipe.h" @@ -100,6 +101,9 @@ NineBuffer9_ctor( struct NineBuffer9 *This, else info->usage = PIPE_USAGE_DYNAMIC; +/* When Writeonly is not set, we don't want to enable the + * optimizations */ +This->discard_nooverwrite_only = !!(Usage & D3DUSAGE_WRITEONLY); /* if (pDesc->Usage & D3DUSAGE_DONOTCLIP) { } */ /* if (pDesc->Usage & D3DUSAGE_NONSECURE) { } */ /* if (pDesc->Usage & D3DUSAGE_NPATCHES) { } */ @@ -161,12 +165,18 @@ NineBuffer9_dtor( struct NineBuffer9 *This ) list_del(>managed.list2); } +if (This->buf) +nine_upload_release_buffer(This->base.base.device->buffer_upload, This->buf); + NineResource9_dtor(>base); } struct pipe_resource * -NineBuffer9_GetResource( struct NineBuffer9 *This ) +NineBuffer9_GetResource( struct NineBuffer9 *This, unsigned *offset ) { +if (This->buf) +return nine_upload_buffer_resource_and_offset(This->buf, offset); +*offset = 0; return NineResource9_GetResource(>base); } @@ -260,6 +270,8 @@ NineBuffer9_Lock( struct NineBuffer9 *This, if (Flags & D3DLOCK_DONOTWAIT && !(This->base.usage & D3DUSAGE_DYNAMIC)) usage |= PIPE_TRANSFER_DONTBLOCK; +This->discard_nooverwrite_only &= !!(Flags & (D3DLOCK_DISCARD | D3DLOCK_NOOVERWRITE)); + if (This->nmaps == This->maxmaps) { struct NineTransfer *newmaps = REALLOC(This->maps, sizeof(struct NineTransfer)*This->maxmaps, @@ -271,8 +283,52 @@ NineBuffer9_Lock( struct NineBuffer9 *This, This->maps = newmaps; } -This->maps[This->nmaps].is_pipe_secondary = FALSE; +if (This->buf && +(!This->discard_nooverwrite_only || + (This->discard_nooverwrite_only && (Flags & D3DLOCK_DISCARD { +/* Release previous buffer */ +if (This->nmaps >= 1) +This->maps[This->nmaps-1].should_destroy_buf = true; +else +nine_upload_release_buffer(device->buffer_upload, This->buf); +This->buf = NULL; +/* Mapping a nine_subbuffer without discard/nooverwrite. + * Since that will hurt (we'll wait the entire buffer + * to be idle), we'll won't nine_subbuffer for this buffer + * in fear that happens again. */ +if (!This->discard_nooverwrite_only) { +DBG("Disabling nine_subbuffer for a buffer having" +"used a nine_subbuffer buffer\n"); +/* Rebind previous resource */ +NineBuffer9_RebindIfRequired(This, device); +} +} + +This->maps[This->nmaps].transfer = NULL; +This->maps[This->nmaps].is_pipe_secondary = false; +This->maps[This->nmaps].buf = NULL; +This->maps[This->nmaps].should_destroy_buf = false; + +if (This->discard_nooverwrite_only) { +if (!This->buf) { +This->buf = nine_upload_create_buffer(device->buffer_upload, This->base.info.width0); +
Re: [Mesa-dev] [PATCH] nv50/ir: use OPCLASS_SURFACE for SUSTB
I don't think we use SUSTB anyways... it was used in the old handleSTORE logic in the tgsi -> nvir converter, but wtvr. I have a feeling it's the sort of thing that might matter for nv50 (vs nvc0), but for now this is fine. (Esp since the opclasses don't matter on nv50.) On Sun, Dec 11, 2016 at 5:44 PM, Samuel Pitoisetwrote: > Found by inspection, probably a typo because a surface store is > definitely not an atomic operation. > > Signed-off-by: Samuel Pitoiset > --- > src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp > b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp > index 273ec34..789dba4 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp > @@ -109,7 +109,7 @@ const OpClass Target::operationClass[] = > OPCLASS_TEXTURE, OPCLASS_TEXTURE, OPCLASS_TEXTURE, OPCLASS_TEXTURE, > OPCLASS_TEXTURE, OPCLASS_TEXTURE, > // SULDB, SULDP, SUSTB, SUSTP; SUREDB, SUREDP, SULEA > - OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_ATOMIC, OPCLASS_SURFACE, > + OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_SURFACE, > OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_SURFACE, > // SUBFM, SUCLAMP, SUEAU, SUQ, MADSP > OPCLASS_OTHER, OPCLASS_OTHER, OPCLASS_OTHER, OPCLASS_OTHER, OPCLASS_ARITH, > -- > 2.10.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nv50/ir: use OPCLASS_SURFACE for SUSTB
Found by inspection, probably a typo because a surface store is definitely not an atomic operation. Signed-off-by: Samuel Pitoiset--- src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp index 273ec34..789dba4 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.cpp @@ -109,7 +109,7 @@ const OpClass Target::operationClass[] = OPCLASS_TEXTURE, OPCLASS_TEXTURE, OPCLASS_TEXTURE, OPCLASS_TEXTURE, OPCLASS_TEXTURE, OPCLASS_TEXTURE, // SULDB, SULDP, SUSTB, SUSTP; SUREDB, SUREDP, SULEA - OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_ATOMIC, OPCLASS_SURFACE, + OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_SURFACE, OPCLASS_SURFACE, // SUBFM, SUCLAMP, SUEAU, SUQ, MADSP OPCLASS_OTHER, OPCLASS_OTHER, OPCLASS_OTHER, OPCLASS_OTHER, OPCLASS_ARITH, -- 2.10.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 99055] Games hang / freeze completely
https://bugs.freedesktop.org/show_bug.cgi?id=99055 --- Comment #1 from Etienne Bruines--- Update: apitrace did not magically fix it, but it did delay the symptom by a few hours. -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] i965: fix release build unused variable warning
Signed-off-by: Grazvydas Ignotas--- v2: take Eduardo Lima's suggestion further - remove the variable completely no commit access src/mesa/drivers/dri/i965/brw_blorp.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c index 4c1d858..43ac3be 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp.c +++ b/src/mesa/drivers/dri/i965/brw_blorp.c @@ -813,8 +813,6 @@ do_single_blorp_clear(struct brw_context *brw, struct gl_framebuffer *fb, can_fast_clear = false; const unsigned logical_layer = irb_logical_mt_layer(irb); - const bool is_lossless_compressed = intel_miptree_is_lossless_compressed( - brw, irb->mt); const enum intel_fast_clear_state fast_clear_state = intel_miptree_get_fast_clear_state(irb->mt, irb->mt_level, logical_layer); @@ -850,7 +848,7 @@ do_single_blorp_clear(struct brw_context *brw, struct gl_framebuffer *fb, * it now. */ if (!irb->mt->mcs_buf) { - assert(!is_lossless_compressed); + assert(!intel_miptree_is_lossless_compressed(brw, irb->mt)); if (!intel_miptree_alloc_non_msrt_mcs(brw, irb->mt, false)) { /* MCS allocation failed--probably this will only happen in * out-of-memory conditions. But in any case, try to recover -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] intel/aubinator: fix 32bit shift overflow warning
Looks good. Reviewed-by: Eduardo Lima MitevWill push this and the other ones I reviewed. Thanks! Eduardo On 12/10/2016 08:10 PM, Grazvydas Ignotas wrote: > Doesn't look like this can work on 32bit, just rids of annoying > warning. > > Signed-off-by: Grazvydas Ignotas > --- > no commit access, somebody please push > > src/intel/tools/aubinator.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c > index 5e3a684..e2bec8f 100644 > --- a/src/intel/tools/aubinator.c > +++ b/src/intel/tools/aubinator.c > @@ -1297,7 +1297,7 @@ int main(int argc, char *argv[]) >file = aub_file_open(input_file); > > /* mmap a terabyte for our gtt space. */ > - gtt_size = 1ul << 40; > + gtt_size = 1ull << 40; > gtt = mmap(NULL, gtt_size, PROT_READ | PROT_WRITE, >MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0); > if (gtt == MAP_FAILED) { > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] anv: fix release build unused variable warnings
Looks good. I don't remember the MAYBE_UNUSED macro being used that much in mesa opengl, but it seems it is trendy in vulkan code. Reviewed-by: Eduardo Lima MitevOn 12/10/2016 08:10 PM, Grazvydas Ignotas wrote: > Signed-off-by: Grazvydas Ignotas > --- > no commit access, somebody please push > > src/intel/vulkan/anv_blorp.c | 3 ++- > src/intel/vulkan/genX_cmd_buffer.c | 2 +- > 2 files changed, 3 insertions(+), 2 deletions(-) > > diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c > index 159e4a0..b431d6a 100644 > --- a/src/intel/vulkan/anv_blorp.c > +++ b/src/intel/vulkan/anv_blorp.c > @@ -909,7 +909,8 @@ anv_cmd_buffer_alloc_blorp_binding_table(struct > anv_cmd_buffer *cmd_buffer, > state_offset); > if (bt_state.map == NULL) { >/* We ran out of space. Grab a new binding table block. */ > - VkResult result = anv_cmd_buffer_new_binding_table_block(cmd_buffer); > + MAYBE_UNUSED VkResult result = > + anv_cmd_buffer_new_binding_table_block(cmd_buffer); >assert(result == VK_SUCCESS); > >/* Re-emit state base addresses so we get the new surface state base > diff --git a/src/intel/vulkan/genX_cmd_buffer.c > b/src/intel/vulkan/genX_cmd_buffer.c > index f761d9a..6131cfb 100644 > --- a/src/intel/vulkan/genX_cmd_buffer.c > +++ b/src/intel/vulkan/genX_cmd_buffer.c > @@ -697,7 +697,7 @@ genX(cmd_buffer_config_l3)(struct anv_cmd_buffer > *cmd_buffer, > assert(!urb_low_bw || cfg->n[GEN_L3P_URB] == cfg->n[GEN_L3P_SLM]); > > /* Minimum number of ways that can be allocated to the URB. */ > - const unsigned n0_urb = (devinfo->is_baytrail ? 32 : 0); > + MAYBE_UNUSED const unsigned n0_urb = devinfo->is_baytrail ? 32 : 0; > assert(cfg->n[GEN_L3P_URB] >= n0_urb); > > uint32_t l3sqcr1, l3cr2, l3cr3; > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: fix release build unused variable warning
The patch looks good but now that we are at it, I would move the definition and assignment of 'is_lossless_compressed' to the block where it is used, right before the assert. Right now we are always defining and assigning the variable, but it is consumed conditionally. With that changed, patch is: Reviewed-by: Eduardo Lima MitevThanks! Should I push this and the other patches you submitted and I reviewed? Eduardo On 12/10/2016 08:10 PM, Grazvydas Ignotas wrote: > Signed-off-by: Grazvydas Ignotas > --- > src/mesa/drivers/dri/i965/brw_blorp.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c > b/src/mesa/drivers/dri/i965/brw_blorp.c > index 4c1d858..8b5750c 100644 > --- a/src/mesa/drivers/dri/i965/brw_blorp.c > +++ b/src/mesa/drivers/dri/i965/brw_blorp.c > @@ -813,8 +813,8 @@ do_single_blorp_clear(struct brw_context *brw, struct > gl_framebuffer *fb, >can_fast_clear = false; > > const unsigned logical_layer = irb_logical_mt_layer(irb); > - const bool is_lossless_compressed = intel_miptree_is_lossless_compressed( > - brw, irb->mt); > + MAYBE_UNUSED const bool is_lossless_compressed = > + intel_miptree_is_lossless_compressed(brw, irb->mt); > const enum intel_fast_clear_state fast_clear_state = >intel_miptree_get_fast_clear_state(irb->mt, irb->mt_level, > logical_layer); > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: don't attempt to unlock an unlocked debug state mutex
Looks good. Reviewed-by: Eduardo Lima MitevOn 12/11/2016 04:42 PM, Jonathan Gray wrote: > Commit 929fcee47e46781c57f2a354ce0a013915c033d1 introduced code that > attempts to unlock an unlocked mutex which is undefined behaviour. > > On OpenBSD this leads to an abort: > > 0 0x124dadfa96ba in thrkill () at :2 > 1 0x124dadf3da39 in *_libc_abort () at > /usr/src/lib/libc/stdlib/abort.c:52 > 2 0x124d2c1165b5 in *_libpthread_pthread_mutex_unlock (mutexp= out>) > at /usr/src/lib/librthread/rthread_sync.c:221 > 3 0x124d279c02e4 in init_attrib_groups (ctx=0x124df0fda000) at > main/context.c:825 > 4 _mesa_initialize_context (ctx=ctx@entry=0x124df0fda000, > api=api@entry=API_OPENGL_CORE, > visual=visual@entry=0x7f7bdfd0, share_list=share_list@entry=0x0, > driverFunctions=driverFunctions@entry=0x7f7bda60) at > main/context.c:1204 > 5 0x124d27b507ec in st_create_context (api=api@entry=API_OPENGL_CORE, > pipe=pipe@entry=0x124dc491, visual=visual@entry=0x7f7bdfd0, > share=share@entry=0x0, options=options@entry=0x7f7be128) > at state_tracker/st_context.c:545 > 6 0x124d27b8639f in st_api_create_context (stapi=, > smapi=0x124d1b608800, attribs=0x7f7be100, error=0x7f7be0fc, > shared_stctxi=0x0) > at state_tracker/st_manager.c:669 > 7 0x124d27cc5b9c in dri_create_context (api=, > visual=0x124d8a0f8a00, > cPriv=0x124de473f240, major_version=, > minor_version=, > flags=, notify_reset=false, error=0x7f7be2b4, > sharedContextPrivate=0x0) at dri_context.c:123 > 8 0x124d27cc5029 in driCreateContextAttribs (screen=0x124d8a0f8400, > api=, config=0x124d8a0f8a00, shared=, > num_attribs=, attribs=, > error=0x7f7be2b4, > data=0x124d77814a00) at dri_util.c:448 > 9 0x124d8e109b00 in drisw_create_context_attribs (base=0x124df3e08700, > config_base=0x124d7a0e7300, shareList=, > num_attribs=, > attribs=, error=0x7f7be2b4) at drisw_glx.c:476 > 10 0x124d8e104b4a in glXCreateContextAttribsARB (dpy=0x124d533f, > config=0x124d7a0e7300, share_context=0x0, direct=1, > attrib_list=0x7f7be300) > at create_context.c:78 > > Signed-off-by: Jonathan Gray > --- > src/mesa/main/debug_output.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/src/mesa/main/debug_output.c b/src/mesa/main/debug_output.c > index 48dbbb3..bc933db 100644 > --- a/src/mesa/main/debug_output.c > +++ b/src/mesa/main/debug_output.c > @@ -1282,14 +1282,13 @@ _mesa_init_debug_output(struct gl_context *ctx) > */ >struct gl_debug_state *debug = _mesa_lock_debug_state(ctx); >if (!debug) { > - goto done; > + return; >} >debug->DebugOutput = GL_TRUE; >debug->LogToStderr = GL_TRUE; >ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_DEBUG_BIT; > + _mesa_unlock_debug_state(ctx); > } > -done: > - _mesa_unlock_debug_state(ctx); > } > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] i965/blit: Fix the src dimension sanity check in miptree_copy
On Sun, Dec 11, 2016 at 07:56:59AM -0800, Jason Ekstrand wrote: >On Dec 11, 2016 12:28 AM, "Pohjolainen, Topi" ><[1]topi.pohjolai...@gmail.com> wrote: > >On Tue, Dec 06, 2016 at 12:37:45PM -0800, Jason Ekstrand wrote: >> Cc: "13.0" <[2]mesa-sta...@lists.freedesktop.org> >> --- >> src/mesa/drivers/dri/i965/intel_blit.c | 6 -- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/intel_blit.c >b/src/mesa/drivers/dri/i965/intel_blit.c >> index 03a35ee..5f0cf74 100644 >> --- a/src/mesa/drivers/dri/i965/intel_blit.c >> +++ b/src/mesa/drivers/dri/i965/intel_blit.c >> @@ -421,8 +421,10 @@ intel_miptree_copy(struct brw_context *brw, >> >>assert(src_x % bw == 0); >>assert(src_y % bh == 0); >> - assert(src_width % bw == 0); >> - assert(src_height % bh == 0); >> + assert(src_width % bw == 0 || >> + src_x + src_width == minify(src_mt->logical_width0, >src_level)); >> + assert(src_height % bh == 0 || >> + src_y + src_height == minify(src_mt->logical_height0, >src_level)); > > Can you given example how we can blit sub-block worth of data? Above > there is > check for src_x/y being block aligned don't all mipmap levels need > to be > aligned as well? > >For compressed textures, the width and height of the entire texture >don't have to be block-aligned. When computing the size of a miplevel >in blocks, you just round up to the nearest block and part of the last >block may be outside the texture. This means that every time you have >a subrectangle of a compressed textures, the upper and left edges have >to be block-aligned and the lower and right edges have to be either >block-aligned or be the left or right edge of the miplevel. Make >sense? It's really annoying. Ah, that makes sense. Would you mind leaving some of that as comment? Anyway, both patches: Reviewed-by: Topi Pohjolainen> >> >>src_x /= (int)bw; >>src_y /= (int)bh; >> -- >> 2.5.0.400.gff86faf >> > > > ___ > > mesa-dev mailing list > > [3]mesa-dev@lists.freedesktop.org > > [4]https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > References > >1. mailto:topi.pohjolai...@gmail.com >2. mailto:mesa-sta...@lists.freedesktop.org >3. mailto:mesa-dev@lists.freedesktop.org >4. https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] vulkan: deduplicate the util code
anv_util.c and radv_util.c are essentially duplicates, so create a new vk_util.c to be shared by both vulkan drivers. Some other content from anv_private.h and radv_private.h was moved to vk_util.h too. Signed-off-by: Grazvydas Ignotas--- no commit access configure.ac | 1 + src/Makefile.am | 1 + src/amd/vulkan/Makefile.am| 1 + src/amd/vulkan/Makefile.sources | 1 - src/amd/vulkan/radv_device.c | 2 +- src/amd/vulkan/radv_private.h | 60 +- src/amd/vulkan/radv_util.c| 130 -- src/intel/vulkan/Makefile.am | 1 + src/intel/vulkan/Makefile.sources | 1 - src/intel/vulkan/anv_private.h| 60 +- src/intel/vulkan/anv_util.c | 127 - src/vulkan/Makefile.am| 17 + src/vulkan/Makefile.sources | 4 ++ src/vulkan/vk_util.c | 105 ++ src/vulkan/vk_util.h | 94 +++ 15 files changed, 229 insertions(+), 376 deletions(-) delete mode 100644 src/amd/vulkan/radv_util.c delete mode 100644 src/intel/vulkan/anv_util.c create mode 100644 src/vulkan/Makefile.am create mode 100644 src/vulkan/Makefile.sources create mode 100644 src/vulkan/vk_util.c create mode 100644 src/vulkan/vk_util.h diff --git a/configure.ac b/configure.ac index 799f5eb..715a0ff 100644 --- a/configure.ac +++ b/configure.ac @@ -2855,6 +2855,7 @@ AC_CONFIG_FILES([Makefile src/mesa/main/tests/Makefile src/util/Makefile src/util/tests/hash_table/Makefile + src/vulkan/Makefile src/vulkan/wsi/Makefile]) AC_OUTPUT diff --git a/src/Makefile.am b/src/Makefile.am index ad54356..2c36dbd 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -116,6 +116,7 @@ SUBDIRS += intel/tools endif if HAVE_VULKAN_COMMON +SUBDIRS += vulkan SUBDIRS += vulkan/wsi endif diff --git a/src/amd/vulkan/Makefile.am b/src/amd/vulkan/Makefile.am index 6e184c0..2f746dd 100644 --- a/src/amd/vulkan/Makefile.am +++ b/src/amd/vulkan/Makefile.am @@ -94,6 +94,7 @@ libvulkan_common_la_SOURCES = $(VULKAN_SOURCES) VULKAN_LIB_DEPS += \ libvulkan_common.la \ + $(top_builddir)/src/vulkan/libvulkan_shared.la \ $(top_builddir)/src/vulkan/wsi/libvulkan_wsi.la \ $(top_builddir)/src/amd/common/libamd_common.la \ $(top_builddir)/src/amd/addrlib/libamdgpu_addrlib.la \ diff --git a/src/amd/vulkan/Makefile.sources b/src/amd/vulkan/Makefile.sources index 425a00f..139209b 100644 --- a/src/amd/vulkan/Makefile.sources +++ b/src/amd/vulkan/Makefile.sources @@ -56,7 +56,6 @@ VULKAN_FILES := \ radv_private.h \ radv_radeon_winsys.h \ radv_query.c \ - radv_util.c \ radv_util.h \ radv_wsi.c \ si_cmd_buffer.c \ diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 75b7af1..3ecc24f 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -829,7 +829,7 @@ VkResult radv_QueueSubmit( pSubmits[i].commandBufferCount, can_patch, base_fence); if (ret) - radv_loge("failed to submit CS %d\n", i); + vk_loge("failed to submit CS %d\n", i); free(cs_array); } diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h index 3d4b111..0abbe0b 100644 --- a/src/amd/vulkan/radv_private.h +++ b/src/amd/vulkan/radv_private.h @@ -49,6 +49,7 @@ #include "util/list.h" #include "util/vk_alloc.h" #include "main/macros.h" +#include "vulkan/vk_util.h" #include "radv_radeon_winsys.h" #include "ac_binary.h" @@ -100,36 +101,12 @@ enum radv_mem_type { RADV_MEM_TYPE_COUNT }; -#define radv_noreturn __attribute__((__noreturn__)) -#define radv_printflike(a, b) __attribute__((__format__(__printf__, a, b))) - -static inline uint32_t -align_u32(uint32_t v, uint32_t a) -{ - assert(a != 0 && a == (a & -a)); - return (v + a - 1) & ~(a - 1); -} - static inline uint32_t align_u32_npot(uint32_t v, uint32_t a) { return (v + a - 1) / a * a; } -static inline uint64_t -align_u64(uint64_t v, uint64_t a) -{ - assert(a != 0 && a == (a & -a)); - return (v + a - 1) & ~(a - 1); -} - -static inline int32_t -align_i32(int32_t v, int32_t a) -{ - assert(a != 0 && a == (a & -a)); - return (v + a - 1) & ~(a - 1); -} - /** Alignment must be a power of 2. */ static inline bool radv_is_aligned(uintmax_t n, uintmax_t a) @@ -182,38 +159,8 @@ radv_clear_mask(uint32_t *inout_mask, uint32_t clear_mask) } } -#define for_each_bit(b, dword) \ - for (uint32_t __dword = (dword);\ -(b) =
[Mesa-dev] [PATCH 2/2] vulkan: fix 'statement with no effect' warning
Emitted on release build in case vk_error() return is not used. Signed-off-by: Grazvydas Ignotas--- no commit access src/vulkan/vk_util.c | 4 src/vulkan/vk_util.h | 13 - 2 files changed, 12 insertions(+), 5 deletions(-) diff --git a/src/vulkan/vk_util.c b/src/vulkan/vk_util.c index 5d38117..716b8e5 100644 --- a/src/vulkan/vk_util.c +++ b/src/vulkan/vk_util.c @@ -60,6 +60,8 @@ __vk_finishme(const char *file, int line, const char *format, ...) fprintf(stderr, "%s:%d: FINISHME: %s\n", file, line, buffer); } +#ifdef DEBUG + VkResult __vk_errorf(VkResult error, const char *file, int line, const char *format, ...) { @@ -103,3 +105,5 @@ __vk_errorf(VkResult error, const char *file, int line, const char *format, ...) return error; } + +#endif /* DEBUG */ diff --git a/src/vulkan/vk_util.h b/src/vulkan/vk_util.h index c384838..5ce206c 100644 --- a/src/vulkan/vk_util.h +++ b/src/vulkan/vk_util.h @@ -72,15 +72,18 @@ align_i32(int32_t v, int32_t a) * propagating errors. Might be useful to plug in a stack trace here. */ +#ifdef DEBUG VkResult __vk_errorf(VkResult error, const char *file, int line, const char *format, ...); +#else +static inline VkResult +__vk_errorf(VkResult error, const char *file, int line, const char *format, ...) +{ + return error; +} +#endif -#ifdef DEBUG #define vk_error(error) __vk_errorf(error, __FILE__, __LINE__, NULL); #define vk_errorf(error, format, ...) __vk_errorf(error, __FILE__, __LINE__, format, ## __VA_ARGS__); -#else -#define vk_error(error) error -#define vk_errorf(error, format, ...) error -#endif void __vk_finishme(const char *file, int line, const char *format, ...) vk_printflike(3, 4); -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 98869] Electronic Super Joy graphic artefacts (regression)
https://bugs.freedesktop.org/show_bug.cgi?id=98869 cosiek...@o2.pl changed: What|Removed |Added Priority|medium |high -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] i965/blit: Fix the src dimension sanity check in miptree_copy
On Dec 11, 2016 12:28 AM, "Pohjolainen, Topi"wrote: On Tue, Dec 06, 2016 at 12:37:45PM -0800, Jason Ekstrand wrote: > Cc: "13.0" > --- > src/mesa/drivers/dri/i965/intel_blit.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/intel_blit.c b/src/mesa/drivers/dri/i965/intel_blit.c > index 03a35ee..5f0cf74 100644 > --- a/src/mesa/drivers/dri/i965/intel_blit.c > +++ b/src/mesa/drivers/dri/i965/intel_blit.c > @@ -421,8 +421,10 @@ intel_miptree_copy(struct brw_context *brw, > >assert(src_x % bw == 0); >assert(src_y % bh == 0); > - assert(src_width % bw == 0); > - assert(src_height % bh == 0); > + assert(src_width % bw == 0 || > + src_x + src_width == minify(src_mt->logical_width0, src_level)); > + assert(src_height % bh == 0 || > + src_y + src_height == minify(src_mt->logical_height0, src_level)); Can you given example how we can blit sub-block worth of data? Above there is check for src_x/y being block aligned don't all mipmap levels need to be aligned as well? For compressed textures, the width and height of the entire texture don't have to be block-aligned. When computing the size of a miplevel in blocks, you just round up to the nearest block and part of the last block may be outside the texture. This means that every time you have a subrectangle of a compressed textures, the upper and left edges have to be block-aligned and the lower and right edges have to be either block-aligned or be the left or right edge of the miplevel. Make sense? It's really annoying. > >src_x /= (int)bw; >src_y /= (int)bh; > -- > 2.5.0.400.gff86faf > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: don't attempt to unlock an unlocked debug state mutex
Commit 929fcee47e46781c57f2a354ce0a013915c033d1 introduced code that attempts to unlock an unlocked mutex which is undefined behaviour. On OpenBSD this leads to an abort: 0 0x124dadfa96ba in thrkill () at :2 1 0x124dadf3da39 in *_libc_abort () at /usr/src/lib/libc/stdlib/abort.c:52 2 0x124d2c1165b5 in *_libpthread_pthread_mutex_unlock (mutexp=) at /usr/src/lib/librthread/rthread_sync.c:221 3 0x124d279c02e4 in init_attrib_groups (ctx=0x124df0fda000) at main/context.c:825 4 _mesa_initialize_context (ctx=ctx@entry=0x124df0fda000, api=api@entry=API_OPENGL_CORE, visual=visual@entry=0x7f7bdfd0, share_list=share_list@entry=0x0, driverFunctions=driverFunctions@entry=0x7f7bda60) at main/context.c:1204 5 0x124d27b507ec in st_create_context (api=api@entry=API_OPENGL_CORE, pipe=pipe@entry=0x124dc491, visual=visual@entry=0x7f7bdfd0, share=share@entry=0x0, options=options@entry=0x7f7be128) at state_tracker/st_context.c:545 6 0x124d27b8639f in st_api_create_context (stapi=, smapi=0x124d1b608800, attribs=0x7f7be100, error=0x7f7be0fc, shared_stctxi=0x0) at state_tracker/st_manager.c:669 7 0x124d27cc5b9c in dri_create_context (api=, visual=0x124d8a0f8a00, cPriv=0x124de473f240, major_version=, minor_version=, flags=, notify_reset=false, error=0x7f7be2b4, sharedContextPrivate=0x0) at dri_context.c:123 8 0x124d27cc5029 in driCreateContextAttribs (screen=0x124d8a0f8400, api=, config=0x124d8a0f8a00, shared=, num_attribs=, attribs=, error=0x7f7be2b4, data=0x124d77814a00) at dri_util.c:448 9 0x124d8e109b00 in drisw_create_context_attribs (base=0x124df3e08700, config_base=0x124d7a0e7300, shareList=, num_attribs=, attribs=, error=0x7f7be2b4) at drisw_glx.c:476 10 0x124d8e104b4a in glXCreateContextAttribsARB (dpy=0x124d533f, config=0x124d7a0e7300, share_context=0x0, direct=1, attrib_list=0x7f7be300) at create_context.c:78 Signed-off-by: Jonathan Gray--- src/mesa/main/debug_output.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/mesa/main/debug_output.c b/src/mesa/main/debug_output.c index 48dbbb3..bc933db 100644 --- a/src/mesa/main/debug_output.c +++ b/src/mesa/main/debug_output.c @@ -1282,14 +1282,13 @@ _mesa_init_debug_output(struct gl_context *ctx) */ struct gl_debug_state *debug = _mesa_lock_debug_state(ctx); if (!debug) { - goto done; + return; } debug->DebugOutput = GL_TRUE; debug->LogToStderr = GL_TRUE; ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_DEBUG_BIT; + _mesa_unlock_debug_state(ctx); } -done: - _mesa_unlock_debug_state(ctx); } -- 2.10.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 99055] Games hang / freeze completely
https://bugs.freedesktop.org/show_bug.cgi?id=99055 Bug ID: 99055 Summary: Games hang / freeze completely Product: Mesa Version: 13.0 Hardware: x86-64 (AMD64) OS: Linux (All) Status: NEW Severity: major Priority: medium Component: Other Assignee: mesa-dev@lists.freedesktop.org Reporter: etie...@bruines.com QA Contact: mesa-dev@lists.freedesktop.org I recently updated to 13.0.2-1, and since then, any game will hang after about a few minutes of gameplay (sometimes even less than a minute). I am not entirely sure this is related to Mesa, but I'm confident that you'll be able to point me in the right direction. Games tested: Factorio, Democracy 3. Things that do work: Google Chrome (Netflix / YouTube). - GL_VERSION: 3.0 Mesa 13.0.2 GL_VENDOR: Intel Open Source Technology Center GL_RENDERER: Mesa DRI Intel(R) Sandybridge Mobile x86/MMX/SSE2 - No logs available at (looking around the moment of the 'hanging') /var/log/syslog /var/log/messages /var/log/Xorg.0.log dmesg The games do not output anything either. The game just freezes, as if the drawing-thread is hanging. Meaning after switching workspaces (screens), it'll refuse to load anything new and show the "old workspace" at that location. At this stage I have no idea how to debug this. SIGINT won't stop the program, I have to use SIGKILL. I have tried using `apitrace trace`. This "fixed" the bug: not able to reproduce when using apitrace. The current workaround I'm using right now is `apitrace trace -o /dev/null factorio` to start the game; it's a bit slower because of the tracing, but it hasn't crashed so far. (It often looks like it's going to crash, but it seems to recover from it) I'd be happy to provide any additional information. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 84/84 v2] st/nine: Implement new buffer upload path
This new buffer upload path enables to lock faster than the normal path when using DISCARD/NOOVERWRITE. v2: Diverse cleanups and fixes. Signed-off-by: Axel Davy--- This patch needed a few cleanups, whih this new version adds. With this version, I wasn't able to reproduce the memory leak. I'm not sure though it is fixed, I will ask the ones who reported the leak to check. I'm still interested in comments. For example, is the pool big enough ? src/gallium/state_trackers/nine/Makefile.sources | 2 + src/gallium/state_trackers/nine/buffer9.c | 75 +- src/gallium/state_trackers/nine/buffer9.h | 9 +- src/gallium/state_trackers/nine/device9.c | 5 + src/gallium/state_trackers/nine/device9.h | 3 + src/gallium/state_trackers/nine/indexbuffer9.c | 10 +- src/gallium/state_trackers/nine/indexbuffer9.h | 2 - .../state_trackers/nine/nine_buffer_upload.c | 290 + .../state_trackers/nine/nine_buffer_upload.h | 59 + src/gallium/state_trackers/nine/vertexbuffer9.c| 3 +- 10 files changed, 437 insertions(+), 21 deletions(-) create mode 100644 src/gallium/state_trackers/nine/nine_buffer_upload.c create mode 100644 src/gallium/state_trackers/nine/nine_buffer_upload.h diff --git a/src/gallium/state_trackers/nine/Makefile.sources b/src/gallium/state_trackers/nine/Makefile.sources index 2bb08a2..56698a1 100644 --- a/src/gallium/state_trackers/nine/Makefile.sources +++ b/src/gallium/state_trackers/nine/Makefile.sources @@ -23,6 +23,8 @@ C_SOURCES := \ indexbuffer9.h \ iunknown.c \ iunknown.h \ + nine_buffer_upload.c \ + nine_buffer_upload.h \ nine_csmt_helper.h \ nine_debug.c \ nine_debug.h \ diff --git a/src/gallium/state_trackers/nine/buffer9.c b/src/gallium/state_trackers/nine/buffer9.c index 9a7c30c..203e1d9 100644 --- a/src/gallium/state_trackers/nine/buffer9.c +++ b/src/gallium/state_trackers/nine/buffer9.c @@ -23,6 +23,7 @@ #include "buffer9.h" #include "device9.h" +#include "nine_buffer_upload.h" #include "nine_helpers.h" #include "nine_pipe.h" @@ -100,6 +101,9 @@ NineBuffer9_ctor( struct NineBuffer9 *This, else info->usage = PIPE_USAGE_DYNAMIC; +/* When Writeonly is not set, we don't want to enable the + * optimizations */ +This->discard_nooverwrite_only = !!(Usage & D3DUSAGE_WRITEONLY); /* if (pDesc->Usage & D3DUSAGE_DONOTCLIP) { } */ /* if (pDesc->Usage & D3DUSAGE_NONSECURE) { } */ /* if (pDesc->Usage & D3DUSAGE_NPATCHES) { } */ @@ -161,12 +165,18 @@ NineBuffer9_dtor( struct NineBuffer9 *This ) list_del(>managed.list2); } +if (This->buf) +nine_upload_release_buffer(This->base.base.device->buffer_upload, This->buf); + NineResource9_dtor(>base); } struct pipe_resource * -NineBuffer9_GetResource( struct NineBuffer9 *This ) +NineBuffer9_GetResource( struct NineBuffer9 *This, unsigned *offset ) { +if (This->buf) +return nine_upload_buffer_resource_and_offset(This->buf, offset); +*offset = 0; return NineResource9_GetResource(>base); } @@ -260,6 +270,8 @@ NineBuffer9_Lock( struct NineBuffer9 *This, if (Flags & D3DLOCK_DONOTWAIT && !(This->base.usage & D3DUSAGE_DYNAMIC)) usage |= PIPE_TRANSFER_DONTBLOCK; +This->discard_nooverwrite_only &= !!(Flags & (D3DLOCK_DISCARD | D3DLOCK_NOOVERWRITE)); + if (This->nmaps == This->maxmaps) { struct NineTransfer *newmaps = REALLOC(This->maps, sizeof(struct NineTransfer)*This->maxmaps, @@ -271,8 +283,52 @@ NineBuffer9_Lock( struct NineBuffer9 *This, This->maps = newmaps; } -This->maps[This->nmaps].is_pipe_secondary = FALSE; +if (This->buf && +(!This->discard_nooverwrite_only || + (This->discard_nooverwrite_only && (Flags & D3DLOCK_DISCARD { +/* Release previous buffer */ +if (This->nmaps >= 1) +This->maps[This->nmaps-1].should_destroy_buf = true; +else +nine_upload_release_buffer(device->buffer_upload, This->buf); +This->buf = NULL; +/* Mapping a nine_subbuffer without discard/nooverwrite. + * Since that will hurt (we'll wait the entire buffer + * to be idle), we'll won't nine_subbuffer for this buffer + * in fear that happens again. */ +if (!This->discard_nooverwrite_only) { +DBG("Disabling nine_subbuffer for a buffer having" +"used a nine_subbuffer buffer\n"); +/* Rebind previous resource */ +NineBuffer9_RebindIfRequired(This, device); +} +} + +This->maps[This->nmaps].transfer = NULL; +This->maps[This->nmaps].is_pipe_secondary = false; +This->maps[This->nmaps].buf = NULL; +This->maps[This->nmaps].should_destroy_buf = false; + +if (This->discard_nooverwrite_only) { +if (!This->buf) { +This->buf
Re: [Mesa-dev] [PATCH] glsl: Make copy propagation not panic when it sees an intrinsic.
I don't see any difference in Deus Ex, but F1 2015 uses 64% less spilled-temp-array memory with this on GCN. Marek On Sat, Dec 10, 2016 at 5:28 AM, Kenneth Graunkewrote: > A number of games have large arrays of constants, which we promote to > uniforms. This introduces copies from the uniform array to the original > temporary array. Normally, copy propagation eliminates those copies, > making everything refer to the uniform array directly. > > A number of shaders in "Deus Ex: Mankind Divided" recently exposed a > limitation of copy propagation - if we had any intrinsics (i.e. image > access in a compute shader), we weren't able to get rid of these copies. > > That meant that any variable indexing remained on the temporary array > rather being moved to the uniform array. i965's scalar backend > currently doesn't support indirect addressing of temporary arrays, > which meant lowering it to if-ladders. This was horrible. > > On Skylake: > > total instructions in shared programs: 13700090 -> 13654519 (-0.33%) > instructions in affected programs: 56438 -> 10867 (-80.75%) > helped: 14 > HURT: 0 > > total cycles in shared programs: 288879704 -> 291270232 (0.83%) > cycles in affected programs: 12758080 -> 15148608 (18.74%) > helped: 6 > HURT: 8 > > All shaders helped are compute shaders in Tomb Raider or Deus Ex. > > Signed-off-by: Kenneth Graunke > --- > src/compiler/glsl/opt_copy_propagation.cpp | 31 > ++ > 1 file changed, 27 insertions(+), 4 deletions(-) > > diff --git a/src/compiler/glsl/opt_copy_propagation.cpp > b/src/compiler/glsl/opt_copy_propagation.cpp > index 247c498..2240421 100644 > --- a/src/compiler/glsl/opt_copy_propagation.cpp > +++ b/src/compiler/glsl/opt_copy_propagation.cpp > @@ -186,11 +186,34 @@ ir_copy_propagation_visitor::visit_enter(ir_call *ir) >} > } > > - /* Since we're unlinked, we don't (necessarily) know the side effects of > -* this call. So kill all copies. > + /* Since this pass can run when unlinked, we don't (necessarily) know > +* the side effects of calls. (When linked, most calls are inlined > +* anyway, so it doesn't matter much.) > +* > +* One place where this does matter is IR intrinsics. They're never > +* inlined. We also know what they do - while some have side effects > +* (such as image writes), none edit random global variables. So we > +* can assume they're side-effect free (other than the return value > +* and out parameters). > */ > - _mesa_hash_table_clear(acp, NULL); > - this->killed_all = true; > + if (!ir->callee->is_intrinsic()) { > + _mesa_hash_table_clear(acp, NULL); > + this->killed_all = true; > + } else { > + if (ir->return_deref) > + kill(ir->return_deref->var); > + > + foreach_two_lists(formal_node, >callee->parameters, > +actual_node, >actual_parameters) { > + ir_variable *sig_param = (ir_variable *) formal_node; > + if (sig_param->data.mode == ir_var_function_out || > + sig_param->data.mode == ir_var_function_inout) { > +ir_rvalue *ir = (ir_rvalue *) actual_node; > +ir_variable *var = ir->variable_referenced(); > +kill(var); > + } > + } > + } > > return visit_continue_with_parent; > } > -- > 2.10.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] i965/blit: Fix the src dimension sanity check in miptree_copy
On Tue, Dec 06, 2016 at 12:37:45PM -0800, Jason Ekstrand wrote: > Cc: "13.0"> --- > src/mesa/drivers/dri/i965/intel_blit.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/intel_blit.c > b/src/mesa/drivers/dri/i965/intel_blit.c > index 03a35ee..5f0cf74 100644 > --- a/src/mesa/drivers/dri/i965/intel_blit.c > +++ b/src/mesa/drivers/dri/i965/intel_blit.c > @@ -421,8 +421,10 @@ intel_miptree_copy(struct brw_context *brw, > >assert(src_x % bw == 0); >assert(src_y % bh == 0); > - assert(src_width % bw == 0); > - assert(src_height % bh == 0); > + assert(src_width % bw == 0 || > + src_x + src_width == minify(src_mt->logical_width0, src_level)); > + assert(src_height % bh == 0 || > + src_y + src_height == minify(src_mt->logical_height0, > src_level)); Can you given example how we can blit sub-block worth of data? Above there is check for src_x/y being block aligned don't all mipmap levels need to be aligned as well? > >src_x /= (int)bw; >src_y /= (int)bh; > -- > 2.5.0.400.gff86faf > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC 3/7] i965/fs: Add a CHANNEL_IDS opcode
On Mon, Dec 05, 2016 at 11:59:54AM -0800, Jason Ekstrand wrote: > --- > src/mesa/drivers/dri/i965/brw_defines.h| 6 ++ > src/mesa/drivers/dri/i965/brw_fs.h | 3 +++ > src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 1 + > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 28 > ++ > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 14 +++-- > src/mesa/drivers/dri/i965/brw_shader.cpp | 2 ++ > 6 files changed, 43 insertions(+), 11 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > b/src/mesa/drivers/dri/i965/brw_defines.h > index f22a52f..16a72c4 100644 > --- a/src/mesa/drivers/dri/i965/brw_defines.h > +++ b/src/mesa/drivers/dri/i965/brw_defines.h > @@ -1095,6 +1095,12 @@ enum opcode { > */ > SHADER_OPCODE_BROADCAST, > > + /* Takes no sources and returns the index of each channel in its > respective > +* SIMD channel in the destination register. The instruction must have > +* WE_all set and the destination must have type BRW_REGISTER_TYPE_UW. In the implementation further down you actually pass BRW_REGISTER_TYPE_W and assert for W or UW. While I haven't actually worked much on the compiler and just been reading patches in order to keep up, to me this patch looks to make things better structured its own - generator emitting the actual trick of instructions. > +*/ > + SHADER_OPCODE_CHANNEL_IDS, > + > VEC4_OPCODE_MOV_BYTES, > VEC4_OPCODE_PACK_BYTES, > VEC4_OPCODE_UNPACK_UNIFORM, > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h > b/src/mesa/drivers/dri/i965/brw_fs.h > index 002cee8..4ce0f56 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.h > +++ b/src/mesa/drivers/dri/i965/brw_fs.h > @@ -437,6 +437,9 @@ private: >struct brw_reg msg_data, >unsigned msg_type); > > + void generate_channel_ids(fs_inst *inst, > + struct brw_reg dst); > + > void generate_set_sample_id(fs_inst *inst, > struct brw_reg dst, > struct brw_reg src0, > diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp > b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp > index 48220ef..02ccda1 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp > @@ -79,6 +79,7 @@ is_expression(const fs_visitor *v, const fs_inst *const > inst) > case FS_OPCODE_LINTERP: > case SHADER_OPCODE_FIND_LIVE_CHANNEL: > case SHADER_OPCODE_BROADCAST: > + case SHADER_OPCODE_CHANNEL_IDS: > case SHADER_OPCODE_MOV_INDIRECT: > case SHADER_OPCODE_TEX_LOGICAL: > case SHADER_OPCODE_TXD_LOGICAL: > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > index c5b50e1..3ef2d5b 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > @@ -1435,6 +1435,30 @@ fs_generator::generate_set_simd4x2_offset(fs_inst > *inst, > brw_pop_insn_state(p); > } > > +void > +fs_generator::generate_channel_ids(fs_inst *inst, struct brw_reg dst) > +{ > + assert(dst.type == BRW_REGISTER_TYPE_UW || > + dst.type == BRW_REGISTER_TYPE_W); > + assert(inst->exec_size >= 8); > + > + brw_push_insn_state(p); > + brw_set_default_exec_size(p, BRW_EXECUTE_8); > + brw_set_default_compression_control(p, BRW_COMPRESSION_NONE); > + brw_set_default_mask_control(p, true); > + brw_MOV(p, dst, brw_imm_v(0x76543210)); > + > + if (inst->exec_size > 8) { > + assert(inst->exec_size == 16); > + brw_ADD(p, vec8(suboffset(dst, 8)), vec8(dst), brw_imm_w(8)); > + } > + > + brw_pop_insn_state(p); > + > + if (inst->group > 0) > + brw_ADD(p, dst, dst, brw_imm_w(inst->group)); > +} > + > /* Sets vstride=1, width=4, hstride=0 of register src1 during > * the ADD instruction. > */ > @@ -2056,6 +2080,10 @@ fs_generator::generate_code(const cfg_t *cfg, int > dispatch_width) > brw_broadcast(p, dst, src[0], src[1]); > break; > > + case SHADER_OPCODE_CHANNEL_IDS: > + generate_channel_ids(inst, dst); > + break; > + >case FS_OPCODE_SET_SAMPLE_ID: > generate_set_sample_id(inst, dst, src[0], src[1]); > break; > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > index 855266f..9478bb8 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -4310,17 +4310,9 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , > nir_intrinsic_instr *instr > } > > case nir_intrinsic_load_channel_num: { > - fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_UW); > - dest = retype(dest, BRW_REGISTER_TYPE_UD); > - const fs_builder allbld8 = bld.group(8, 0).exec_all(); > - allbld8.MOV(tmp, brw_imm_v(0x76543210)); > -
Re: [Mesa-dev] [PATCH 26/27] i965: Remove scanout restriction from lossless compression
On Thu, Dec 01, 2016 at 02:10:07PM -0800, Ben Widawsky wrote: > From: Ben Widawsky> > Cc: Topi Pohjolainen > Signed-off-by: Ben Widawsky > --- > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 12 +++- > 1 file changed, 3 insertions(+), 9 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > index b79de08..b297f79 100644 > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > @@ -157,7 +157,7 @@ intel_miptree_supports_non_msrt_fast_clear(struct > brw_context *brw, > if (mt->disable_aux_buffers) >return false; > > - if (mt->is_scanout) > + if (mt->is_scanout && mt->msaa_layout != INTEL_MSAA_LAYOUT_CMS) I guess the same question I had in previous becomes more general. How do we know that the external consumer is prepared for compression (mt->is_scanout just tells that the buffer goes out from the driver, right)? >return false; > > /* This function applies only to non-multisampled render targets. */ > @@ -528,10 +528,6 @@ intel_miptree_create_layout(struct brw_context *brw, >const UNUSED bool is_lossless_compressed_aux = > brw->gen >= 9 && num_samples == 1 && > mt->format == MESA_FORMAT_R_UINT32; > - > - /* For now, nothing else has this requirement */ > - assert(is_lossless_compressed_aux || > - (layout_flags & MIPTREE_LAYOUT_FORCE_HALIGN16) == 0); Why do we need to drop this? And if we do, then we can drop the entire else-branch, "is_lossless_compressed_aux" is only used in the assertion. > } > > brw_miptree_layout(brw, mt, layout_flags); > @@ -752,11 +748,9 @@ intel_miptree_create(struct brw_context *brw, > * resolves. > */ >const bool lossless_compression_disabled = INTEL_DEBUG & DEBUG_NO_RBC; > - assert(!mt->is_scanout); >const bool is_lossless_compressed = > unlikely(!lossless_compression_disabled) && > - brw->gen >= 9 && !mt->is_scanout && > - intel_miptree_supports_lossless_compressed(brw, mt); > + brw->gen >= 9 && intel_miptree_supports_lossless_compressed(brw, > mt); > >if (is_lossless_compressed) { > intel_miptree_alloc_non_msrt_mcs(brw, mt, is_lossless_compressed); > @@ -1043,7 +1037,7 @@ intel_miptree_release(struct intel_mipmap_tree **mt) > drm_intel_bo_unreference((*mt)->hiz_buf->aux_base.bo); > free((*mt)->hiz_buf); >} > - if ((*mt)->mcs_buf && !(*mt)->is_scanout) { > + if ((*mt)->mcs_buf) { > drm_intel_bo_unreference((*mt)->mcs_buf->bo); > free((*mt)->mcs_buf); >} > -- > 2.10.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: Make copy propagation not panic when it sees an intrinsic.
On Saturday, December 10, 2016 12:37:16 PM PST Matt Turner wrote: > On Fri, Dec 9, 2016 at 8:28 PM, Kenneth Graunkewrote: > > A number of games have large arrays of constants, which we promote to > > uniforms. This introduces copies from the uniform array to the original > > temporary array. Normally, copy propagation eliminates those copies, > > making everything refer to the uniform array directly. > > > > A number of shaders in "Deus Ex: Mankind Divided" recently exposed a > > limitation of copy propagation - if we had any intrinsics (i.e. image > > access in a compute shader), we weren't able to get rid of these copies. > > > > That meant that any variable indexing remained on the temporary array > > rather being moved to the uniform array. i965's scalar backend > > currently doesn't support indirect addressing of temporary arrays, > > which meant lowering it to if-ladders. This was horrible. > > > > On Skylake: > > > > total instructions in shared programs: 13700090 -> 13654519 (-0.33%) > > instructions in affected programs: 56438 -> 10867 (-80.75%) > > Wow! > > > helped: 14 > > HURT: 0 > > > > total cycles in shared programs: 288879704 -> 291270232 (0.83%) > > cycles in affected programs: 12758080 -> 15148608 (18.74%) > > ... that seems nuts? > > Any idea what's going on with the cycle counts? Good point...I glossed over the cycle counts when I saw the -80% reduction in instructions with 0 shaders hurt. But they do look pretty bad, so let's take a closer look... There are two nearly identical shaders that are the worst offenders: shaders/closed/steam/deus-ex-mankind-divided/256.shader_test CS SIMD16: instructions: 2770 -> 253 (-2,517 instructions or -90.87%) spills: 25 -> 0 fills: 29 -> 0 cycles: 923266 -> 1420534 (+497,268 cycles or +53.86%) compile time: 2.73 seconds -> 0.17 seconds There are three loops in the program, each of which contains two indirect reads of the uvec4[98] constant array. Before this patch, there were: - 67 UNIFORM_PULL_CONSTANT_LOADs at the top of the program - 1 UNIFORM_PULL_CONSTANT_LOAD in the first (cheap) loop - 1 UNIFORM_PULL_CONSTANT_LOAD in the second (expensive) loop - 1 UNIFORM_PULL_CONSTANT_LOAD in the third (very expensive) loop After this patch, there are: - 0 loads at the top of the program - 1 VARYING_PULL_CONSTANT_LOAD in the first (cheap) loop - 2 VARYING_PULL_CONSTANT_LOAD in the second (expensive) loop - 2 VARYING_PULL_CONSTANT_LOAD in the third (very expensive) loop The array indexes in the expensive loop are a[foo] and a[foo + 1]. foo is modified in the loop, so they can't be hoisted out. I don't think we can determine the number of loop iterations. The two expensive loops look to be twice as expensive after this patch. The numbers aren't quite adding up for me - it looks like we should spend 200 more cycles per loop iteration, but the loops are like 40,000 -> 90,000 cycles. I'm not sure what to do with this information. Eliminating 90% of the instructions seems good. Requiring no scratch access seems good. Eliminating the 67 memory loads outside of the loops seems good. Doing two memory loads per loop doesn't seem too crazy, given that it matches the GLSL source code. Burning 49 registers to store the entire array for the lifetime of the program seems pretty crazy... --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev