Re: [Mesa-dev] [PATCH] i965: Fix ARB_indirect_parameters logic.
On Monday, October 30, 2017 2:14:24 PM PDT Plamena Manolova wrote:
> This patch modifies the ARB_indirect_parameters logic in
> brw_draw_prims, so that our implementation isn't affected if
> another application attempts to use predicates. Previously we
> were using a predicate with a DELTAS_EQUAL comparison operation
> and relying on the MI_PREDICATE_DATA register being 0. Our code
> to initialize MI_PREDICATE_DATA to 0 was incorrect, so we were
> accidentally using whatever value was written there. Because the
> kernel does not initialize the MI_PREDICATE_DATA register on
> hardware context creation, we might inherit the value from whatever
> context was last running on the GPU (likely another process).
> The Haswell command parser also does not currently allow us to write
> the MI_PREDICATE_DATA register. Rather than fixing this and requiring
> an updated kernel, we switch to a different approach which uses a
> SRCS_EQUAL predicate that makes no assumptions about the states of any
> of the predicate registers.
>
> Fixes: piglit.spec.arb_indirect_parameters.tf-count-arrays
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103085
>
> Signed-off-by: Plamena Manolova

Reviewed-by: Kenneth Graunke

and pushed. Thanks Pam!

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
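The difference between the two comparison operations can be illustrated with a small software model. This is only a sketch under stated assumptions: the struct and function names are hypothetical, the registers are modelled as plain 64-bit integers, and DELTAS_EQUAL is assumed to compare (SRC0 - SRC1) against the data register while SRCS_EQUAL compares SRC0 against SRC1. It is not the driver code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the predicate registers. */
struct predicate_regs {
    uint64_t src0;
    uint64_t src1;
    uint64_t data; /* may hold a stale value left by another context */
};

/* Assumed DELTAS_EQUAL semantics: (src0 - src1) is compared to data,
 * so the result depends on whatever data currently contains. */
static bool deltas_equal(const struct predicate_regs *r)
{
    return (r->src0 - r->src1) == r->data;
}

/* Assumed SRCS_EQUAL semantics: only src0 and src1 are read. */
static bool srcs_equal(const struct predicate_regs *r)
{
    return r->src0 == r->src1;
}
```

In this model, a stale nonzero data value flips the DELTAS_EQUAL result for equal sources, while the SRCS_EQUAL result is unaffected.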
Re: [Mesa-dev] [PATCH 5/7] meson: build r600
Hi Dylan,

I took a crack at testing this one. My BARTS (Radeon 6850) seems to be
running through piglit with a reasonable pass-rate afterwards, a restart
of GDM/gnome went fine, and at least xonotic launched and is running.

I did attempt to build it with vdpau/va gallium-media API support (as
defined in meson_options.txt), but that doesn't seem to be getting built
(from the TODOs in src/gallium/meson.build, I see that's still pending).

I also noticed that there are still TODO items for the clover state
tracker. Once we add clover support, the r600g driver will need an LLVM
dependency when that state tracker is enabled, as well...

I did end up having to do some mild rebasing of this patch on top of
current master, but it was pretty trivial.

That being said, this patch seems to be working as intended, so you can
add:

Tested-by: Aaron Watry

If you want, I can dig an r300-based card out of the closet and give that
a whirl as well, but I won't bother unless you need me to.

--Aaron

On Thu, Oct 26, 2017 at 6:57 PM, Dylan Baker wrote:
> This has been build tested only.
>
> Signed-off-by: Dylan Baker
> ---
>  meson.build                          |  20 --
>  meson_options.txt                    |   2 +-
>  src/gallium/drivers/r600/meson.build | 128 +++
>  src/gallium/meson.build              |   4 +-
>  src/gallium/targets/dri/meson.build  |   7 +-
>  5 files changed, 153 insertions(+), 8 deletions(-)
>  create mode 100644 src/gallium/drivers/r600/meson.build
>
> diff --git a/meson.build b/meson.build
> index a03da18659e..3aa4cbfddc4 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -107,6 +107,7 @@ with_gallium = false
>  with_gallium_pl111 = false
>  with_gallium_radeonsi = false
>  with_gallium_r300 = false
> +with_gallium_r600 = false
>  with_gallium_nouveau = false
>  with_gallium_freedreno = false
>  with_gallium_softpipe = false
> @@ -121,6 +122,7 @@ if _drivers != ''
>    with_gallium_pl111 = _split.contains('pl111')
>    with_gallium_radeonsi = _split.contains('radeonsi')
>    with_gallium_r300 = _split.contains('r300')
> +  with_gallium_r600 = _split.contains('r600')
>    with_gallium_nouveau = _split.contains('nouveau')
>    with_gallium_freedreno = _split.contains('freedreno')
>    with_gallium_softpipe = _split.contains('swrast')
> @@ -623,9 +625,13 @@ dep_thread = dependency('threads')
>  if dep_thread.found() and host_machine.system() == 'linux'
>    pre_args += '-DHAVE_PTHREAD'
>  endif
> -dep_elf = dependency('libelf', required : false)
> -if not dep_elf.found() and (with_amd_vk or with_gallium_radeonsi) # TODO: clover, r600
> -  dep_elf = cc.find_library('elf')
> +if with_amd_vk or with_gallium_radeonsi or with_gallium_r600 # TODO: clover
> +  dep_elf = dependency('libelf', required : false)
> +  if not dep_elf.found()
> +    dep_elf = cc.find_library('elf')
> +  endif
> +else
> +  dep_elf = []
>  endif
>  dep_expat = dependency('expat')
>  # this only exists on linux so either this is linux and it will be found, or
> @@ -640,7 +646,8 @@ dep_libdrm_freedreno = []
>  if with_amd_vk or with_gallium_radeonsi
>    dep_libdrm_amdgpu = dependency('libdrm_amdgpu', version : '>= 2.4.85')
>  endif
> -if with_gallium_radeonsi or with_dri_r100 or with_dri_r200 or with_gallium_r300
> +if (with_gallium_radeonsi or with_dri_r100 or with_dri_r200 or
> +    with_gallium_r300 or with_gallium_r600)
>    dep_libdrm_radeon = dependency('libdrm_radeon', version : '>= 2.4.71')
>  endif
>  if with_gallium_nouveau or with_dri_nouveau
> @@ -654,8 +661,11 @@ if with_gallium_freedreno
>  endif
>
>  llvm_modules = ['bitwriter', 'engine', 'mcdisassembler', 'mcjit']
> -if with_amd_vk or with_gallium_radeonsi # TODO: r600
> +if with_amd_vk or with_gallium_radeonsi or with_gallium_r600
>    llvm_modules += ['amdgpu', 'bitreader', 'ipo']
> +  if with_gallium_r600
> +    llvm_modules += 'asmparser'
> +  endif
>  endif
>  dep_llvm = dependency(
>    'llvm', version : '>= 3.9.0', required : with_amd_vk, modules : llvm_modules,
> diff --git a/meson_options.txt b/meson_options.txt
> index 6ac22600ceb..b811fda0dc1 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -46,7 +46,7 @@ option(
>  option(
>    'gallium-drivers',
>    type : 'string',
> -  value : 'pl111,radeonsi,nouveau,freedreno,swrast,vc4,etnaviv,imx,r300',
> +  value : 'pl111,radeonsi,nouveau,freedreno,swrast,vc4,etnaviv,imx,r300,r600',
>    description : 'comma separated list of gallium drivers to build.'
>  )
>  option(
> diff --git a/src/gallium/drivers/r600/meson.build b/src/gallium/drivers/r600/meson.build
> new file mode 100644
> index 000..411b550331d
> --- /dev/null
> +++ b/src/gallium/drivers/r600/meson.build
> @@ -0,0 +1,128 @@
> +# Copyright © 2017 Intel Corporation
> +
> +# Permission is hereby granted, free of charge, to any person obtaining a copy
> +# of this software and associated documentation files (the
[Mesa-dev] [PATCH] radeonsi: fix culldist_writemask in nir path
In RADV we need to offset the writemask because
nir_lower_clip_cull_distance_arrays() combines the arrays. However, we
can't use this with radeonsi currently, so don't offset the writemask.

Fixes the following piglit tests:

arb_cull_distance/clip-cull-3.shader_test
arb_cull_distance/clip-cull-4.shader_test
---
 src/gallium/drivers/radeonsi/si_shader_nir.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader_nir.c b/src/gallium/drivers/radeonsi/si_shader_nir.c
index e186661caf..7a88227381 100644
--- a/src/gallium/drivers/radeonsi/si_shader_nir.c
+++ b/src/gallium/drivers/radeonsi/si_shader_nir.c
@@ -295,22 +295,21 @@ void si_nir_scan_shader(const struct nir_shader *nir,
 				info->samplers_declared |=
 					u_bit_consecutive(variable->data.binding, aoa_size);
 			else if (base_type == GLSL_TYPE_IMAGE)
 				info->images_declared |=
 					u_bit_consecutive(variable->data.binding, aoa_size);
 	}
 
 	info->num_written_clipdistance = nir->info.clip_distance_array_size;
 	info->num_written_culldistance = nir->info.cull_distance_array_size;
 	info->clipdist_writemask = u_bit_consecutive(0, info->num_written_clipdistance);
-	info->culldist_writemask = u_bit_consecutive(info->num_written_clipdistance,
-						     info->num_written_culldistance);
+	info->culldist_writemask = u_bit_consecutive(0, info->num_written_culldistance);
 
 	if (info->processor == PIPE_SHADER_FRAGMENT)
 		info->uses_kill = nir->info.fs.uses_discard;
 
 	/* TODO make this more accurate */
 	info->const_buffers_declared = u_bit_consecutive(0, SI_NUM_CONST_BUFFERS);
 	info->shader_buffers_declared = u_bit_consecutive(0, SI_NUM_SHADER_BUFFERS);
 
 	func = (struct nir_function *)exec_list_get_head_const(&nir->functions);
 	nir_foreach_block(block, func->impl) {
--
2.14.3
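The effect of the changed start bit can be shown with a small standalone sketch. bit_consecutive() below is a local stand-in written for this illustration (it is assumed to behave like Mesa's u_bit_consecutive(start, count)), and the two helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in: `count` consecutive set bits starting at bit `start`. */
static uint32_t bit_consecutive(unsigned start, unsigned count)
{
    assert(start + count <= 32);
    if (count == 32)
        return ~0u; /* avoid shifting a 32-bit value by 32 */
    return ((1u << count) - 1) << start;
}

/* Cull mask when clip and cull distances share one combined array:
 * the cull bits follow the clip bits. */
static uint32_t culldist_mask_combined(unsigned num_clip, unsigned num_cull)
{
    return bit_consecutive(num_clip, num_cull);
}

/* Cull mask when the cull distances are not combined with the clip
 * distances: the bits start at 0. */
static uint32_t culldist_mask_separate(unsigned num_cull)
{
    return bit_consecutive(0, num_cull);
}
```

With 2 clip distances and 3 cull distances, the combined form yields 0x1c and the separate form yields 0x7.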
Re: [Mesa-dev] [PATCH 09/25] gallium/u_threaded: implement asynchronous flushes
On Sun, Oct 22, 2017 at 9:07 PM, Nicolai Hähnle wrote:
> @@ -107,20 +138,46 @@ static boolean si_fence_finish(struct pipe_screen *screen,
>  			       uint64_t timeout)
>  {
>  	struct radeon_winsys *rws = ((struct r600_common_screen*)screen)->ws;
>  	struct si_multi_fence *rfence = (struct si_multi_fence *)fence;
>  	struct r600_common_context *rctx;
>  	int64_t abs_timeout = os_time_get_absolute_timeout(timeout);
>
>  	ctx = threaded_context_unwrap_sync(ctx);
>  	rctx = ctx ? (struct r600_common_context*)ctx : NULL;
>
> +	if (!util_queue_fence_is_signalled(&rfence->ready)) {
> +		if (!timeout)
> +			return false;
> +
> +		if (rfence->tc_token) {
> +			/* Ensure that si_flush_from_st will be called for
> +			 * this fence, but only if we're in the API thread
> +			 * where the context is current.
> +			 *
> +			 * Note that the batch containing the flush may already
> +			 * be in flight in the driver thread, so the fence
> +			 * may not be ready yet when this call returns.
> +			 */
> +			threaded_context_flush(ctx, rfence->tc_token);
> +		}
> +
> +		if (timeout == PIPE_TIMEOUT_INFINITE) {
> +			util_queue_fence_wait(&rfence->ready);
> +		} else {
> +			if (!util_queue_fence_wait_timeout(&rfence->ready, abs_timeout))
> +				return false;
> +		}
> +
> +		assert(!rfence->tc_token);

tc_token might be non-NULL if this code is executed right after
si_flush_from_st signals the fence.

Marek
[Mesa-dev] [PATCH] ac/nir: for ubo load use correct num_components
From: Dave Airlie

I was hacking something stupid in doom, and hit an assert for the
bitcast following this. It definitely looks like this should be the
number of 32-bit components, not the instr level ones.

Signed-off-by: Dave Airlie
---
 src/amd/common/ac_nir_to_llvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index a736d34..fb44d55 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -2510,7 +2510,7 @@ static LLVMValueRef visit_load_ubo_buffer(struct ac_nir_context *ctx,
 	}
 
 
-	ret = ac_build_gather_values(&ctx->ac, results, instr->num_components);
+	ret = ac_build_gather_values(&ctx->ac, results, num_components);
 	return LLVMBuildBitCast(ctx->ac.builder, ret,
 				get_def_type(ctx, &instr->dest.ssa), "");
 }
--
2.9.5
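The distinction between instruction-level components and 32-bit components can be sketched as follows. This is a simplified illustration, not the code in ac_nir_to_llvm.c: the function name is hypothetical, and it assumes the loaded value is assembled from 32-bit elements, so that each 64-bit component occupies two of them.

```c
#include <assert.h>

/* Hypothetical helper: number of 32-bit elements needed to hold
 * `instr_components` components of `bit_size` bits each. */
static unsigned dword_components(unsigned instr_components, unsigned bit_size)
{
    assert(bit_size == 32 || bit_size == 64);
    return instr_components * (bit_size / 32);
}
```

Under that assumption, a two-component 64-bit load gathers four 32-bit values, so passing the instruction-level count (two) to the gather step would build a vector of the wrong size for the following bitcast.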
Re: [Mesa-dev] [PATCH 2/2] meson: set visibility flags on gbm
Dylan Baker writes:

> This is done in autotools, and is an oversight in the meson build.
>
> Signed-off-by: Dylan Baker

Both are:

Reviewed-by: Eric Anholt
Re: [Mesa-dev] [PATCH] radv: use correct alloc function when loading from disk
R-b

On 31 Oct 2017 01:35, "Timothy Arceri" wrote:

> Fixes regression in:
>
> dEQP-VK.api.object_management.alloc_callback_fail.graphics_pipeline
>
> Fixes: 1e84e53712ae "radv: add cache items to in memory cache when reading from disk"
> ---
>  src/amd/vulkan/radv_pipeline_cache.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/src/amd/vulkan/radv_pipeline_cache.c b/src/amd/vulkan/radv_pipeline_cache.c
> index 91470d1419..2904b62e6b 100644
> --- a/src/amd/vulkan/radv_pipeline_cache.c
> +++ b/src/amd/vulkan/radv_pipeline_cache.c
> @@ -269,21 +269,34 @@ radv_create_shader_variants_from_pipeline_cache(struct radv_device *device,
>  		uint8_t disk_sha1[20];
>  		disk_cache_compute_key(device->physical_device->disk_cache,
>  				       sha1, 20, disk_sha1);
>  		entry = (struct cache_entry *)
>  			disk_cache_get(device->physical_device->disk_cache,
>  				       disk_sha1, NULL);
>  		if (!entry) {
>  			pthread_mutex_unlock(&cache->mutex);
>  			return false;
>  		} else {
> -			radv_pipeline_cache_add_entry(cache, entry);
> +			size_t size = entry_size(entry);
> +			struct cache_entry *new_entry = vk_alloc(&cache->alloc, size, 8,
> +								 VK_SYSTEM_ALLOCATION_SCOPE_CACHE);
> +			if (!new_entry) {
> +				free(entry);
> +				pthread_mutex_unlock(&cache->mutex);
> +				return false;
> +			}
> +
> +			memcpy(new_entry, entry, entry_size(entry));
> +			free(entry);
> +			entry = new_entry;
> +
> +			radv_pipeline_cache_add_entry(cache, new_entry);
>  		}
>  	}
>
>  	char *p = entry->code;
>  	for(int i = 0; i < MESA_SHADER_STAGES; ++i) {
>  		if (!entry->variants[i] && entry->code_sizes[i]) {
>  			struct radv_shader_variant *variant;
>  			struct cache_entry_variant_info info;
>
>  			variant = calloc(1, sizeof(struct radv_shader_variant));
> --
> 2.14.3
Re: [Mesa-dev] [PATCH v3 38/43] i965/fs: Optimize 16-bit SSBO stores by packing two into a 32-bit reg
On Thu, Oct 12, 2017 at 11:38 AM, Jose Maria Casanova Crespo <jmcasan...@igalia.com> wrote:

> From: Eduardo Lima Mitev
>
> Currently, we use byte-scattered write messages for storing 16-bit
> into an SSBO. This is because untyped surface messages have a fixed
> 32-bit size.
>
> This patch optimizes these 16-bit writes by combining 2 values (e.g.,
> two consecutive components) into a 32-bit register, packing the two
> 16-bit words.
>
> 16-bit single component values will continue to use byte-scattered
> write messages.
>
> This optimization reduces the number of SEND messages used for storing
> 16-bit values potentially by 2 or 4, which cuts down execution time
> significantly because byte-scattered writes are an expensive
> operation.
>
> v2: Removed use of stride = 2 on sources (Jason Ekstrand)
>     Rework optimization using shuffle 16 write and enable writes
>     of 16bit vec4 with only one message of 32-bits. (Chema Casanova)
>
> Signed-off-by: Jose Maria Casanova Crespo
> Signed-off-by: Eduardo Lima
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 64 +++
>  1 file changed, 52 insertions(+), 12 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
> index 2d0b3e139e..c07b3e4d8d 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -4218,6 +4218,9 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>                                              instr->num_components);
>           val_reg = tmp;
>        }
> +      if (bit_size == 16) {
> +         val_reg=retype(val_reg, BRW_REGISTER_TYPE_HF);
> +      }
>
>        /* 16-bit types would use a minimum of 1 slot */
>        unsigned type_slots = MAX2(type_size / 4, 1);
> @@ -4231,6 +4234,9 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>           unsigned first_component = ffs(writemask) - 1;
>           unsigned length = ffs(~(writemask >> first_component)) - 1;
>
> +         fs_reg current_val_reg =
> +            offset(val_reg, bld, first_component * type_slots);
> +
>           /* We can't write more than 2 64-bit components at once. Limit the
>            * length of the write to what we can do and let the next iteration
>            * handle the rest
> @@ -4238,11 +4244,40 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>           if (type_size > 4) {
>              length = MIN2(2, length);
>           } else if (type_size == 2) {
> -            /* For 16-bit types we are using byte scattered writes, that can
> -             * only write one component per call. So we limit the length, and
> -             * let the write happening in several iterations.
> +            /* For 16-bit types we pack two consecutive values into a
> +             * 32-bit word and use an untyped write message. For single values
> +             * we need to use byte-scattered writes because untyped writes work
> +             * on multiples of 32 bits.
> +             *
> +             * For example, if there is a 3-component vector we submit one
> +             * untyped-write message of 32-bit (first two components), and one
> +             * byte-scattered write message (the last component).
>               */
> -            length = 1;
> +            if (length >= 2) {
> +               /* pack two consecutive 16-bit words into a 32-bit register,
> +                * using the same original source register.
> +                */

This doesn't work if you have a writemask of .xz

> +               length -= length % 2;

I'm very confused by this bit of math.

> +               fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F, length / 2);
> +               shuffle_16bit_data_for_32bit_write(bld,
> +                                                  tmp,
> +                                                  current_val_reg,
> +                                                  length);
> +               current_val_reg = tmp;
> +
> +            } else {
> +               /* For single 16-bit values, we just limit the length to 1 and
> +                * use a byte-scattered write message below.
> +                */
> +               length = 1;

I think this can be an assert. Also, why do we need the shuffle? I
thought this case would work if we just set length == 1. I answered my
own question about the shuffle. It lets us delete some code below.

> +               fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F);
> +               shuffle_16bit_data_for_32bit_write(bld,
> +                                                  tmp,
> +                                                  current_val_reg,
> +                                                  length);
> +               current_val_reg = tmp;
> +
> +            }
>           }
>
>           fs_reg offset_reg;
> @@ -4257,24 +4292,29 @@
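The writemask loop quoted in this patch (first enabled channel via ffs, then the length of the run of consecutive enabled channels) can be tried out in isolation. The sketch below is a standalone model, not the i965 code: the struct and function names are hypothetical, the writemask is assumed to have at most four channels, ffs() is taken from POSIX <strings.h>, and placing the first 16-bit value in the low half of the packed word is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h> /* ffs(), POSIX */

/* One run of consecutive enabled channels in a writemask. */
struct channel_run {
    unsigned first;
    unsigned length;
};

/* Splits a 4-channel writemask into runs of consecutive enabled
 * channels and returns the number of runs written to `runs`. */
static unsigned split_writemask(unsigned writemask, struct channel_run runs[4])
{
    unsigned count = 0;

    assert(writemask <= 0xfu);
    while (writemask != 0) {
        unsigned first = (unsigned)ffs((int)writemask) - 1;
        /* Masking with 0x1f keeps a guaranteed zero-terminator bit. */
        unsigned length =
            (unsigned)ffs((int)(~(writemask >> first) & 0x1fu)) - 1;

        runs[count].first = first;
        runs[count].length = length;
        count++;

        /* Clear the channels that were just handled. */
        writemask &= ~(((1u << length) - 1) << first);
    }
    return count;
}

/* Packs two 16-bit values into one 32-bit word (first value in the
 * low half; an assumption for this sketch). */
static uint32_t pack_2x16(uint16_t lo, uint16_t hi)
{
    return (uint32_t)lo | ((uint32_t)hi << 16);
}
```

In this model a writemask of .xz (0x5) splits into two runs of length 1, while .xyz (0x7) gives a single run of length 3; applying `length -= length % 2` from the quoted patch to that run leaves 2 components for the packed write and one component for a later iteration.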
[Mesa-dev] [PATCH] radv: use correct alloc function when loading from disk
Fixes regression in:

dEQP-VK.api.object_management.alloc_callback_fail.graphics_pipeline

Fixes: 1e84e53712ae "radv: add cache items to in memory cache when reading from disk"
---
 src/amd/vulkan/radv_pipeline_cache.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_pipeline_cache.c b/src/amd/vulkan/radv_pipeline_cache.c
index 91470d1419..2904b62e6b 100644
--- a/src/amd/vulkan/radv_pipeline_cache.c
+++ b/src/amd/vulkan/radv_pipeline_cache.c
@@ -269,21 +269,34 @@ radv_create_shader_variants_from_pipeline_cache(struct radv_device *device,
 		uint8_t disk_sha1[20];
 		disk_cache_compute_key(device->physical_device->disk_cache,
 				       sha1, 20, disk_sha1);
 		entry = (struct cache_entry *)
 			disk_cache_get(device->physical_device->disk_cache,
 				       disk_sha1, NULL);
 		if (!entry) {
 			pthread_mutex_unlock(&cache->mutex);
 			return false;
 		} else {
-			radv_pipeline_cache_add_entry(cache, entry);
+			size_t size = entry_size(entry);
+			struct cache_entry *new_entry = vk_alloc(&cache->alloc, size, 8,
+								 VK_SYSTEM_ALLOCATION_SCOPE_CACHE);
+			if (!new_entry) {
+				free(entry);
+				pthread_mutex_unlock(&cache->mutex);
+				return false;
+			}
+
+			memcpy(new_entry, entry, entry_size(entry));
+			free(entry);
+			entry = new_entry;
+
+			radv_pipeline_cache_add_entry(cache, new_entry);
 		}
 	}
 
 	char *p = entry->code;
 	for(int i = 0; i < MESA_SHADER_STAGES; ++i) {
 		if (!entry->variants[i] && entry->code_sizes[i]) {
 			struct radv_shader_variant *variant;
 			struct cache_entry_variant_info info;
 
 			variant = calloc(1, sizeof(struct radv_shader_variant));
--
2.14.3
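The pattern in this patch (copy a malloc'ed object into memory obtained from the cache's own allocator, then free the original) can be sketched on its own. Everything below is hypothetical scaffolding for illustration: struct allocator stands in for application-provided allocation callbacks, and the counting and failing allocators exist only to exercise the helper. The presumed motivation is that an object later released through the allocator must also have been obtained from it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator interface standing in for user callbacks. */
struct allocator {
    void *(*alloc)(void *user_data, size_t size);
    void (*free)(void *user_data, void *ptr);
    void *user_data;
};

/* Takes ownership of `heap_obj` (allocated with malloc) and returns a
 * copy owned by `a`, or NULL if the allocator fails. `heap_obj` is
 * freed in both cases, mirroring the error path in the patch. */
static void *adopt_into_allocator(const struct allocator *a,
                                  void *heap_obj, size_t size)
{
    void *copy = a->alloc(a->user_data, size);

    if (copy)
        memcpy(copy, heap_obj, size);
    free(heap_obj);
    return copy;
}

/* Example allocator that counts live allocations in *user_data. */
static void *counting_alloc(void *user_data, size_t size)
{
    void *ptr = malloc(size);

    if (ptr)
        ++*(int *)user_data;
    return ptr;
}

static void counting_free(void *user_data, void *ptr)
{
    if (ptr)
        --*(int *)user_data;
    free(ptr);
}

/* Example allocator that always fails, like an allocation-failure test. */
static void *failing_alloc(void *user_data, size_t size)
{
    (void)user_data;
    (void)size;
    return NULL;
}
```

With the counting allocator, every adopted object shows up in the live count and is released through the same interface; with the failing allocator, the helper returns NULL without leaking the original buffer.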
Re: [Mesa-dev] [PATCH V2 1/4] i965/gen10: Implement WaSampleOffsetIZ workaround
On Fri, Oct 06, 2017 at 04:30:47PM -0700, Anuj Phogat wrote:
> There are a few other (duplicate) workarounds which have similar
> recommendations:
> WaFlushHangWhenNonPipelineStateAndMarkerStalled
> WaCSStallBefore3DSamplePattern
> WaPipeControlBefore3DStateSamplePattern
>
> WaPipeControlBefore3DStateSamplePattern has some extra recommendations if
> driver is using mid batch context restore. Ignoring it for now because we're
> not doing mid-batch context restore in Mesa.
>

How do we know we've implemented this correctly? Does this fix or
improve any hangs we've been seeing? It would be helpful to have this
information in the commit message.

> Cc: mesa-sta...@lists.freedesktop.org
> Cc: Jason Ekstrand
> Cc: Rafael Antognolli
> Signed-off-by: Anuj Phogat
> ---
>  src/mesa/drivers/dri/i965/brw_context.h            |  2 +
>  src/mesa/drivers/dri/i965/brw_defines.h            |  1 +
>  src/mesa/drivers/dri/i965/brw_pipe_control.c       | 50 ++
>  src/mesa/drivers/dri/i965/gen8_multisample_state.c |  8
>  4 files changed, 61 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index 92fc16de13..f0e8d562e9 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1647,6 +1647,8 @@ void brw_emit_post_sync_nonzero_flush(struct brw_context *brw);
>  void brw_emit_depth_stall_flushes(struct brw_context *brw);
>  void gen7_emit_vs_workaround_flush(struct brw_context *brw);
>  void gen7_emit_cs_stall_flush(struct brw_context *brw);
> +void gen10_emit_wa_cs_stall_flush(struct brw_context *brw);
> +void gen10_emit_wa_lri_to_cache_mode_zero(struct brw_context *brw);
>
>  /* brw_queryformat.c */
>  void brw_query_internal_format(struct gl_context *ctx, GLenum target,
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
> index 4abb790612..270cdf29db 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -1609,6 +1609,7 @@ enum brw_pixel_shader_coverage_mask_mode {
>  #define GEN7_GPGPU_DISPATCHDIMY         0x2504
>  #define GEN7_GPGPU_DISPATCHDIMZ         0x2508
>
> +#define GEN7_CACHE_MODE_0               0x7000
>  #define GEN7_CACHE_MODE_1               0x7004
>  # define GEN9_FLOAT_BLEND_OPTIMIZATION_ENABLE (1 << 4)
>  # define GEN8_HIZ_NP_PMA_FIX_ENABLE        (1 << 11)
> diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> index 460b8f73b6..156f5c25ec 100644
> --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> @@ -278,6 +278,56 @@ gen7_emit_cs_stall_flush(struct brw_context *brw)
>                                 brw->workaround_bo, 0, 0);
>  }
>
> +static void
> +brw_flush_gpu_caches(struct brw_context *brw) {
> +   brw_emit_pipe_control_flush(brw,
> +                               PIPE_CONTROL_CACHE_FLUSH_BITS |
> +                               PIPE_CONTROL_CACHE_INVALIDATE_BITS);
> +}
> +
> +/**
> + * From Gen10 Workarounds page in h/w specs:
> + * WaSampleOffsetIZ:
> + * Prior to the 3DSTATE_SAMPLE_PATTERN driver must ensure there are no
> + * markers in the pipeline by programming a PIPE_CONTROL with stall.
> + */
> +void
> +gen10_emit_wa_cs_stall_flush(struct brw_context *brw)
> +{
> +   const struct gen_device_info *devinfo = &brw->screen->devinfo;
> +   assert(devinfo->gen == 10);
> +   brw_emit_pipe_control_flush(brw,
> +                               PIPE_CONTROL_CS_STALL |
> +                               PIPE_CONTROL_STALL_AT_SCOREBOARD);
> +}
> +
> +/**
> + * From Gen10 Workarounds page in h/w specs:
> + * WaSampleOffsetIZ:
> + * When 3DSTATE_SAMPLE_PATTERN is programmed, driver must then issue an
> + * MI_LOAD_REGISTER_IMM command to an offset between 0x7000 and 0x7FFF(SVL)
> + * after the command to ensure the state has been delivered prior to any
> + * command causing a marker in the pipeline.
> + */
> +void
> +gen10_emit_wa_lri_to_cache_mode_zero(struct brw_context *brw)
> +{
> +   const struct gen_device_info *devinfo = &brw->screen->devinfo;
> +   assert(devinfo->gen == 10);
> +
> +   /* Before changing the value of CACHE_MODE_0 register, GFX pipeline must
> +    * be idle; i.e., full flush is required.
> +    */
> +   brw_flush_gpu_caches(brw);
> +

I don't know how the flushing mechanism works completely, but I'm
guessing that doing the following would be better:

brw_emit_mi_flush();
brw_emit_pipe_control_flush(brw, PIPE_CONTROL_CS_STALL |
                                 PIPE_CONTROL_STATE_CACHE_INVALIDATE);

The first command seems to be the standard flush/invalidate/stall
everything command. It doesn't invalidate the state cache however, hence
the second command. Since the GPU must be idle, perhaps we want a CS
stall as well?

> +   /* Write to CACHE_MODE_0
Re: [Mesa-dev] [PATCH v3 00/43] anv: SPV_KHR_16bit_storage/VK_KHR_16bit_storage for gen8+
Patches 1-5, 8-11, and 13-18 are

Reviewed-by: Jason Ekstrand

On Mon, Oct 16, 2017 at 8:23 AM, Pohjolainen, Topi <topi.pohjolai...@gmail.com> wrote:

> On Mon, Oct 16, 2017 at 08:03:41AM -0700, Jason Ekstrand wrote:
> > FYI: I'm planning to review this some time this week. Probably not today
> > though.
>
> Great, I was hoping you would. I'm just reading out of curiosity and asking
> random questions. Mostly trying to remind myself how compiler works :) It has
> been a while since I had anything to do with it.
>
> >
> > On Thu, Oct 12, 2017 at 11:37 AM, Jose Maria Casanova Crespo <jmcasan...@igalia.com> wrote:
> >
> > > Hello,
> > >
> > > this is the V3 series for the implementation of the
> > > SPV_KHR_16bit_storage and VK_KHR_16bit_storage extensions on the anv
> > > vulkan driver, in addition to the GLSL and NIR support needed.
> > >
> > > The original series can be found here [1], and the V2 is available
> > > here [2].
> > >
> > > In short V3 includes the following:
> > >
> > >  * Updates on several patches after the review of the V2 series.
> > >    This includes some squashes, and especially changes so 16-bit
> > >    types are always packed, not using stride 2 by default.
> > >    This implied a re-implementation of all load_input/store_output
> > >    intrinsics for 16-bit. New solution shuffles and unshuffles
> > >    16-bit components in 32-bit URB write and read operations. This
> > >    saves space in the URB writes and reduces the register pressure
> > >    just using half of the space.
> > >
> > >  * 5 patches have been removed from v2 series because we no longer
> > >    assume the stride 2 for 16-bit registers. We also removed the
> > >    patch of reuse_16bit_conversion_register. The problems related
> > >    to spilling that motivate that patch were better addressed by
> > >    Curro's liveness patch.
> > >
> > >    i965/fs: Set stride 2 when dealing with 16-bit floats/ints
> > >    i965/fs: Retype 16-bit/stride2 movs to UD on nir_op_vecX
> > >    i965/fs: Need to allocate as minimum 32-bit register
> > >    i965/fs: Update assertion on copy propagation
> > >    i965/fs: Add reuse_16bit_conversions_register optimization
> > >
> > > Finally an updated overview of the patches:
> > >
> > > Patches 1-2 add 16-bit float, int and uint types to GLSL. This is
> > > needed because NIR uses GLSL types internally. We use the enums
> > > already defined at AMD_gpu_shader_half_float and NV_gpu_shader
> > > extensions. Patch 4 updates mesa/st, in order to avoid warnings for
> > > types not handled on a switch.
> > >
> > > Patches 3-6 add NIR support for those new GLSL 16-bit types,
> > > conversion opcodes, and rounding modes for float to half-float
> > > conversions.
> > >
> > > Patches 7-9 add the SPIR-V (SPV_KHR_16bit_storage) to NIR support.
> > >
> > > Patches 10-13 add general 16-bit support for i965. This includes
> > > handling of new types on several general purpose methods,
> > > update/remove some asserts.
> > >
> > > Patches 14-18 add support for 32 to 16-bit conversions for i965,
> > > including rounding mode opcodes (needed for float to half-float
> > > conversions), and an optimization that removes superfluous rounding
> > > mode sets.
> > >
> > > Patch 19 adds 16-bit support for constant location.
> > >
> > > Patches 20-24 add and use two new messages: byte scattered read and
> > > write. Those were needed because untyped surface message has a fixed
> > > 32-bit write size. Those messages are used on the 16-bit support of
> > > store SSBO, load SSBO, load UBO and load shared.
> > >
> > > Patches 25-29 implement 16-bit vertex attribute inputs support on
> > > i965. These include changes on anv. This was needed because 16-bit
> > > surface formats do implicit conversion to 32-bit. To workaround this,
> > > we override the 16-bit surface format, and use 32-bit ones.
> > >
> > > Patch 30 implements load input and load store for all intra stage.
> > > This patch substitutes the previous simple patch i965/fs: Set stride 2
> > > when dealing with 16-bit floats/ints.
> > >
> > > Patch 31-37 implements 16-bit store output support for fragment
> > > shaders on i965.
> > >
> > > Patches 38-41 are the new patches included in V2. Three of them are
> > > improvements over V1 that don't fix any execution problem, but they
> > > improve performance reducing the use of multiple scattered messages
> > > for untyped read/write operations. 16bit CTS tests pass without them.
> > > The other one would fix a real problem (patch 41), but unfortunately
> > > no CTS test yet catching it.
> > >
> > > Patches 42-43 enable both extensions on anv vulkan driver.
> > >
> > > [1] https://lists.freedesktop.org/archives/mesa-dev/2017-July/162791.html
> > > [2] https://lists.freedesktop.org/archives/mesa-dev/2017-August/167455.html
> > >
> > > Alejandro Piñeiro (14):
> > >   i965/vec4: Handle 16-bit types at type_size_xvec4
> > >   i965/fs: Add brw_reg_type_from_bit_size
Re: [Mesa-dev] [PATCH v3 22/43] i965/fs: Use byte_scattered_write on 16-bit store_ssbo
On Thu, Oct 12, 2017 at 11:38 AM, Jose Maria Casanova Crespo <jmcasan...@igalia.com> wrote:

> From: Alejandro Piñeiro
>
> We need to rely on byte scattered writes as untyped writes are 32-bit
> size. We could try to keep using 32-bit messages when we have two or
> four 16-bit elements, but for simplicity sake, we use the same message
> for any component number. We revisit this in this series.
>
> v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)
>
> Signed-off-by: Jose Maria Casanova Crespo
> Signed-off-by: Alejandro Piñeiro
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 35 +--
>  1 file changed, 29 insertions(+), 6 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
> index e108b5517b..13c16fc912 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -28,6 +28,7 @@
>
>  using namespace brw;
>  using namespace brw::surface_access;
> +using namespace brw::scattered_access;
>
>  void
>  fs_visitor::emit_nir_code()
> @@ -4085,8 +4086,15 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>            * length of the write to what we can do and let the next iteration
>            * handle the rest
>            */
> -         if (type_size > 4)
> +         if (type_size > 4) {

Maybe move the above comment to here.

>              length = MIN2(2, length);
> +         } else if (type_size == 2) {
> +            /* For 16-bit types we are using byte scattered writes, that can
> +             * only write one component per call. So we limit the length, and
> +             * let the write happening in several iterations.
> +             */
> +            length = 1;
> +         }
>
>           fs_reg offset_reg;
>           nir_const_value *const_offset = nir_src_as_const_value(instr->src[2]);
> @@ -4100,11 +4108,26 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>                      brw_imm_ud(type_size * first_component));
>           }
>
> -
> -         emit_untyped_write(bld, surf_index, offset_reg,
> -                            offset(val_reg, bld, first_component * type_slots),
> -                            1 /* dims */, length * type_slots,
> -                            BRW_PREDICATE_NONE);
> +         if (type_size == 2) {
> +            /* Untyped Surface messages have a fixed 32-bit size, so we need
> +             * to rely on byte scattered in order to write 16-bit elements.
> +             * The byte_scattered_write message needs that every written 16-bit
> +             * type to be aligned 32-bits (stride=2).
> +             */
> +            fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F);
> +            val_reg.type = BRW_REGISTER_TYPE_HF;
> +            bld.MOV (subscript(tmp, BRW_REGISTER_TYPE_HF, 0),
> +                     offset(val_reg, bld, first_component));

We should probably use W types here so we don't get any float conversion
problems.

With that fixed and the previous patch squashed in,

Reviewed-by: Jason Ekstrand

> +            emit_byte_scattered_write(bld, surf_index, offset_reg,
> +                                      tmp,
> +                                      1 /* dims */, length * type_slots,
> +                                      BRW_PREDICATE_NONE);
> +         } else {
> +            emit_untyped_write(bld, surf_index, offset_reg,
> +                               offset(val_reg, bld, first_component * type_slots),
> +                               1 /* dims */, length * type_slots,
> +                               BRW_PREDICATE_NONE);
> +         }
>
>           /* Clear the bits in the writemask that we just wrote, then try
>            * again to see if more channels are left.
> --
> 2.13.6
Re: [Mesa-dev] [PATCH v3 21/43] i965/fs: Adjust type_size/type_slots on store_ssbo
This patch should really be squashed in with the next one.

On Thu, Oct 12, 2017 at 11:38 AM, Jose Maria Casanova Crespo <jmcasan...@igalia.com> wrote:

> From: Alejandro Piñeiro
>
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
> index b356836e80..e108b5517b 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -4056,11 +4056,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>         * Also, we have to suffle 64-bit data to be in the appropriate layout
>         * expected by our 32-bit write messages.
>         */
> -      unsigned type_size = 4;
>        unsigned bit_size = instr->src[0].is_ssa ?
>           instr->src[0].ssa->bit_size : instr->src[0].reg.reg->bit_size;
> +      unsigned type_size = bit_size / 8;
>        if (bit_size == 64) {
> -         type_size = 8;
>           fs_reg tmp =
>              fs_reg(VGRF, alloc.allocate(alloc.sizes[val_reg.nr]), val_reg.type);
>           shuffle_64bit_data_for_32bit_write(bld,
> @@ -4070,7 +4069,8 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>           val_reg = tmp;
>        }
>
> -      unsigned type_slots = type_size / 4;
> +      /* 16-bit types would use a minimum of 1 slot */
> +      unsigned type_slots = MAX2(type_size / 4, 1);
>
>        /* Combine groups of consecutive enabled channels in one write
>         * message. We use ffs to find the first enabled channel and then ffs on
> --
> 2.13.6
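The arithmetic in the two changed lines can be checked with a few values. In this sketch MAX2 is defined locally as a stand-in for the Mesa macro of the same name, and the two function names are hypothetical.

```c
#include <assert.h>

/* Local stand-in for the MAX2 macro used in the quoted patch. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Size of one component in bytes, as in `type_size = bit_size / 8`. */
static unsigned type_size_bytes(unsigned bit_size)
{
    assert(bit_size == 16 || bit_size == 32 || bit_size == 64);
    return bit_size / 8;
}

/* Number of 32-bit slots per component; without the MAX2 clamp a
 * 16-bit type would get 2 / 4 == 0 slots. */
static unsigned type_slots(unsigned bit_size)
{
    return MAX2(type_size_bytes(bit_size) / 4, 1);
}
```

For 16-bit, 32-bit and 64-bit values this gives 1, 1 and 2 slots respectively.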
Re: [Mesa-dev] [PATCH v3 20/43] i965/fs: Add byte scattered write message and fs support
On Thu, Oct 12, 2017 at 11:38 AM, Jose Maria Casanova Crespo < jmcasan...@igalia.com> wrote: > Signed-off-by: Jose Maria Casanova Crespo> Signed-off-by: Alejandro Piñeiro > --- > src/intel/compiler/brw_eu.h| 6 ++ > src/intel/compiler/brw_eu_defines.h| 17 + > src/intel/compiler/brw_eu_emit.c | 89 > ++ > src/intel/compiler/brw_fs.cpp | 10 +++ > src/intel/compiler/brw_fs_copy_propagation.cpp | 2 + > src/intel/compiler/brw_fs_generator.cpp| 5 ++ > src/intel/compiler/brw_fs_surface_builder.cpp | 17 + > src/intel/compiler/brw_fs_surface_builder.h| 9 +++ > src/intel/compiler/brw_shader.cpp | 7 ++ > 9 files changed, 162 insertions(+) > > diff --git a/src/intel/compiler/brw_eu.h b/src/intel/compiler/brw_eu.h > index 145942a54f..b44ca0f518 100644 > --- a/src/intel/compiler/brw_eu.h > +++ b/src/intel/compiler/brw_eu.h > @@ -476,6 +476,12 @@ brw_typed_surface_write(struct brw_codegen *p, > unsigned num_channels); > > void > +brw_byte_scattered_write(struct brw_codegen *p, > + struct brw_reg payload, > + struct brw_reg surface, > + unsigned msg_length); > + > +void > brw_memory_fence(struct brw_codegen *p, > struct brw_reg dst); > > diff --git a/src/intel/compiler/brw_eu_defines.h > b/src/intel/compiler/brw_eu_defines.h > index 1751f18293..9aac385ba7 100644 > --- a/src/intel/compiler/brw_eu_defines.h > +++ b/src/intel/compiler/brw_eu_defines.h > @@ -390,6 +390,16 @@ enum opcode { > > SHADER_OPCODE_RND_MODE, > > + /** > +* Byte scattered write/read opcodes. > +* > +* LOGICAL opcodes are eventually translated to the matching > non-LOGICAL > +* opcode, but instead of taking a single payload blog they expect > their > +* arguments separately as individual sources, like untyped write/read. 
> +*/ > + SHADER_OPCODE_BYTE_SCATTERED_WRITE, > + SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL, > + > SHADER_OPCODE_MEMORY_FENCE, > > SHADER_OPCODE_GEN4_SCRATCH_READ, > @@ -1231,4 +1241,11 @@ enum PACKED brw_rnd_mode { > BRW_RND_MODE_UNSPECIFIED, /* Unspecified rounding mode */ > }; > > +/* MDC_DS - Data Size Message Descriptor Control Field */ > +enum PACKED brw_data_size { > + GEN7_BYTE_SCATTERED_DATA_SIZE_BYTE = 0, > + GEN7_BYTE_SCATTERED_DATA_SIZE_WORD = 1, > + GEN7_BYTE_SCATTERED_DATA_SIZE_DWORD = 2 > +}; > + > #endif /* BRW_EU_DEFINES_H */ > diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_ > emit.c > index 8c1e4c5eae..84d85be653 100644 > --- a/src/intel/compiler/brw_eu_emit.c > +++ b/src/intel/compiler/brw_eu_emit.c > @@ -2483,6 +2483,49 @@ brw_send_indirect_surface_message(struct > brw_codegen *p, > return insn; > } > > + > +static struct brw_inst * > +brw_send_indirect_scattered_message(struct brw_codegen *p, > +unsigned sfid, > +struct brw_reg dst, > +struct brw_reg payload, > +struct brw_reg surface, > +unsigned message_len, > +unsigned response_len, > +bool header_present) > How is this any different from brw_send_indirect_surface_message? They look identical except for the fact that this one is missing the explicit brw_set_default_exec_size I added to the other as part of my subgroup series. If there's no real difference, let's delete this one and just use the other. You can make a pretty good case that the scattered byte messages are "surface" messages. 
> +{ > + const struct gen_device_info *devinfo = p->devinfo; > + struct brw_inst *insn; > + > + if (surface.file != BRW_IMMEDIATE_VALUE) { > + struct brw_reg addr = retype(brw_address_reg(0), > BRW_REGISTER_TYPE_UD); > + > + brw_push_insn_state(p); > + brw_set_default_access_mode(p, BRW_ALIGN_1); > + brw_set_default_mask_control(p, BRW_MASK_DISABLE); > + brw_set_default_predicate_control(p, BRW_PREDICATE_NONE); > + > + /* Mask out invalid bits from the surface index to avoid hangs e.g. > when > + * some surface array is accessed out of bounds. > + */ > + insn = brw_AND(p, addr, > + suboffset(vec1(retype(surface, > BRW_REGISTER_TYPE_UD)), > + BRW_GET_SWZ(surface.swizzle, 0)), > + brw_imm_ud(0xff)); > + > + brw_pop_insn_state(p); > + > + surface = addr; > + } > + > + insn = brw_send_indirect_message(p, sfid, dst, payload, surface); > + brw_inst_set_mlen(devinfo, insn, message_len); > + brw_inst_set_rlen(devinfo, insn, response_len); > + brw_inst_set_header_present(devinfo, insn, header_present); > + > +
Re: [Mesa-dev] [PATCH 2/2] broadcom/genxml: Fix decoding of groups with small fields.
Kenneth Graunke writes:

> Groups containing fields smaller than a byte were probably not being
> decoded correctly. For example:
>
>
>
>
>
> gen_field_iterator_next would properly walk over each element of the
> array, incrementing group_iter. However, the code to print the actual
> values only considered iter->field->start/end, which are 0 and 3 in the
> above example. So it would always fetch bits 3:0 of the current byte,
> printing the same value over and over.

I don't have any groups currently (haven't figured out how to use them well), but it looks right, so I've reviewed it and it'll be in my next push. Thanks for propagating your fix over!
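The bug described above can be illustrated with a small sketch. This is not the genxml decoder: `fetch_bits` and `fetch_group_field` are hypothetical helpers, and the assumption is that element i of a group lives at the group's start bit plus i times the group size, with the field's start/end relative to that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract bits [start, end] (inclusive) from a packet
 * viewed as an array of dwords. Assumes the field does not straddle a
 * dword boundary. */
static uint32_t
fetch_bits(const uint32_t *dwords, unsigned start, unsigned end)
{
   assert(end >= start);
   assert(start / 32 == end / 32);
   const unsigned width = end - start + 1;
   const uint32_t mask = width == 32 ? 0xffffffffu : ((1u << width) - 1);
   return (dwords[start / 32] >> (start % 32)) & mask;
}

/* Hypothetical helper: fetch a field of group element i. The key step is
 * offsetting the field bounds by the element's position; using only
 * field_start/field_end would return element 0 every time. */
static uint32_t
fetch_group_field(const uint32_t *dwords,
                  unsigned group_start, unsigned group_size, unsigned i,
                  unsigned field_start, unsigned field_end)
{
   const unsigned base = group_start + i * group_size;
   return fetch_bits(dwords, base + field_start, base + field_end);
}
```

For a dword holding 0x76543210 and a group of 4-bit elements, element 5 comes back as 0x5 here, whereas a decoder that ignores the element offset would keep returning 0x0.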
Re: [Mesa-dev] [PATCH v3 19/43] i965/fs: Support push constants of 16-bit types
On Thu, Oct 12, 2017 at 11:38 AM, Jose Maria Casanova Crespo < jmcasan...@igalia.com> wrote: > We enable the use of 16-bit values in push constants > modifying the assign_constant_locations function to work > with 16-bit types. > > The API to access buffers in Vulkan use multiples of 4-byte for > offsets and sizes. Current accountability of uniforms based on 4-byte > slots will work for 16-bit values if they are allowed to use 32-bit > slots. For that, we replace the division by 4 by a DIV_ROUND_UP, so > 2-byte elements will use 1 slot instead of 0. > I'm fairly sure this doesn't actually work correctly. That said, I'm also fairly sure the current code is broken for 64 bits. In particular, let's suppose we have something like this: layout(push_constant) { struct { int i; int pad1; float f; double d; int pad2; } arr[2]; } main() { out_color = vec4(arr[arr[0].i].f, float(arr[arr[0].i].d), arr[arr[1].i].f, float(arr[arr[1].i].d)); } I'm pretty sure the current code will explode because it will see a single contiguous chunk that's neither 32 nor 64-bit. If that particular shader doesn't break it, I'm sure some permutation of it will. Things only get worse if we throw in 16-bit. Ultimately, I think the solution is to throw away our current scheme of trying to separate things out by bit size and move to a scheme where we work with everything in bytes in assign_constant_locations and trust in the contiguous[] array to determine where to split things up. We would probably want to continue only re-arranging things 4 bytes at a time. That said, I don't think this patch makes anything any worse... > We aligns the 16-bit locations after assigning the 32-bit > ones. 
> --- > src/intel/compiler/brw_fs.cpp | 30 +++--- > 1 file changed, 23 insertions(+), 7 deletions(-) > > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp > index a1d49a63be..8da16145dc 100644 > --- a/src/intel/compiler/brw_fs.cpp > +++ b/src/intel/compiler/brw_fs.cpp > @@ -1909,8 +1909,9 @@ set_push_pull_constant_loc(unsigned uniform, int > *chunk_start, > if (!contiguous) { >/* If bitsize doesn't match the target one, skip it */ >if (*max_chunk_bitsize != target_bitsize) { > - /* FIXME: right now we only support 32 and 64-bit accesses */ > - assert(*max_chunk_bitsize == 4 || *max_chunk_bitsize == 8); > + assert(*max_chunk_bitsize == 4 || > +*max_chunk_bitsize == 8 || > +*max_chunk_bitsize == 2); > *max_chunk_bitsize = 0; > *chunk_start = -1; > return; > @@ -1987,8 +1988,9 @@ fs_visitor::assign_constant_locations() > int constant_nr = inst->src[i].nr + inst->src[i].offset / 4; > > if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) { > -assert(inst->src[2].ud % 4 == 0); > -unsigned last = constant_nr + (inst->src[2].ud / 4) - 1; > +assert(type_sz(inst->src[i].type) == 2 ? 
> + (inst->src[2].ud % 2 == 0) : (inst->src[2].ud % 4 == > 0)); > +unsigned last = constant_nr + DIV_ROUND_UP(inst->src[2].ud, > 4) - 1; > assert(last < uniforms); > > for (unsigned j = constant_nr; j < last; j++) { > @@ -2000,8 +2002,8 @@ fs_visitor::assign_constant_locations() > bitsize_access[last] = MAX2(bitsize_access[last], > type_sz(inst->src[i].type)); > } else { > if (constant_nr >= 0 && constant_nr < (int) uniforms) { > - int regs_read = inst->components_read(i) * > - type_sz(inst->src[i].type) / 4; > + int regs_read = DIV_ROUND_UP(inst->components_read(i) * > +type_sz(inst->src[i].type), > 4); > for (int j = 0; j < regs_read; j++) { >is_live[constant_nr + j] = true; >bitsize_access[constant_nr + j] = > @@ -2062,7 +2064,7 @@ fs_visitor::assign_constant_locations() > > } > > - /* Then push the rest of uniforms */ > + /* Then push the 32-bit uniforms */ > const unsigned uniform_32_bit_size = type_sz(BRW_REGISTER_TYPE_F); > for (unsigned u = 0; u < uniforms; u++) { >if (!is_live[u]) > @@ -2081,6 +2083,20 @@ fs_visitor::assign_constant_locations() > stage_prog_data); > } > > + const unsigned uniform_16_bit_size = type_sz(BRW_REGISTER_TYPE_HF); > + for (unsigned u = 0; u < uniforms; u++) { > + if (!is_live[u]) > + continue; > + > + set_push_pull_constant_loc(u, _start, _chunk_bitsize, > + contiguous[u], bitsize_access[u], > + uniform_16_bit_size, > + push_constant_loc, pull_constant_loc, > + _push_constants, _pull_constants, > +
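The slot accounting discussed in this thread (replacing a plain division by 4 with DIV_ROUND_UP) can be sketched as follows. The two function names are invented for the comparison, and `DIV_ROUND_UP` is redefined locally on the assumption that it matches Mesa's macro.

```c
#include <assert.h>

/* Local stand-in for Mesa's DIV_ROUND_UP (assumption: same semantics). */
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Hypothetical: slot count with truncating division, as before the patch. */
static unsigned
slots_truncating(unsigned components, unsigned type_sz)
{
   return components * type_sz / 4;
}

/* Hypothetical: slot count with rounding up, as the patch proposes. */
static unsigned
slots_rounded_up(unsigned components, unsigned type_sz)
{
   return DIV_ROUND_UP(components * type_sz, 4);
}
```

A single 2-byte component yields 0 slots with truncation and 1 slot with rounding up; three 2-byte components round up to 2 slots. This only illustrates the arithmetic and says nothing about the layout concerns raised in the review.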
Re: [Mesa-dev] [PATCH v3 28/34] i965: add cache fallback support using serialized nir
On Sunday, October 29, 2017 5:21:47 PM PDT Jordan Justen wrote: > On 2017-10-29 01:11:32, Kenneth Graunke wrote: > > On Sunday, October 22, 2017 1:01:36 PM PDT Jordan Justen wrote: > > > If the i965 gen program cannot be loaded from the cache, then we > > > fallback to using a serialized nir program. > > > > > > This is based on "i965: add cache fallback support" by Timothy Arceri > > >. Tim's version was written to fallback > > > to compiling from source, and therefore had to be much more complex. > > > After Connor and Jason implemented nir serialization, I was able to > > > rewrite and greatly simplify this patch. > > > > > > Signed-off-by: Jordan Justen > > > Acked-by: Timothy Arceri > > > --- > > > src/mesa/drivers/dri/i965/brw_disk_cache.c | 27 > > > ++- > > > 1 file changed, 26 insertions(+), 1 deletion(-) > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_disk_cache.c > > > b/src/mesa/drivers/dri/i965/brw_disk_cache.c > > > index 503c6c7b499..9af893d40a7 100644 > > > --- a/src/mesa/drivers/dri/i965/brw_disk_cache.c > > > +++ b/src/mesa/drivers/dri/i965/brw_disk_cache.c > > > @@ -24,6 +24,7 @@ > > > #include "compiler/blob.h" > > > #include "compiler/glsl/ir_uniform.h" > > > #include "compiler/glsl/shader_cache.h" > > > +#include "compiler/nir/nir_serialize.h" > > > #include "main/mtypes.h" > > > #include "util/disk_cache.h" > > > #include "util/macros.h" > > > @@ -58,6 +59,27 @@ gen_shader_sha1(struct brw_context *brw, struct > > > gl_program *prog, > > > _mesa_sha1_compute(manifest, strlen(manifest), out_sha1); > > > } > > > > > > +static void > > > +fallback_to_full_recompile(struct brw_context *brw, struct gl_program > > > *prog, > > > > It's not exactly a full recompile anymore, maybe rename this to > > recompile_from_nir? Or fallback_to_partial_recompile? > > Good point. I guess eventually we'll recompile from nir, but at this > point we are just restoring the nir program. What about > restore_serialized_nir_shader? Reviewed-by from you with that? 
>
> -Jordan

Sure.

Reviewed-by: Kenneth Graunke
Re: [Mesa-dev] [PATCH v3 34/34] i965: Initialize disk shader cache if MESA_GLSL_CACHE_DISABLE is false
On Sunday, October 22, 2017 1:01:42 PM PDT Jordan Justen wrote: > Double negative FTW! > > For now, the shader cache is disabled by default on i965 to allow us > to verify its stability. > > In other words, to enable the shader cache on i965, set > MESA_GLSL_CACHE_DISABLE to false or 0. If the variable is unset, then > the shader cache will be disabled. > > We use the build-id of i965_dri.so for the timestamp, and the pci > device id for the device name. > > v2: > * Simplify code by forcing link to include build id sha. (Matt) > > v3: > * Don't use a for loop with snprintf for bin to hex. (Matt) > * Assume fixed length render and timestamp string to further simplify >code. > > Cc: Matt Turner> Signed-off-by: Jordan Justen > --- > src/mesa/drivers/dri/i965/brw_context.c| 2 ++ > src/mesa/drivers/dri/i965/brw_disk_cache.c | 29 + > src/mesa/drivers/dri/i965/brw_state.h | 1 + > 3 files changed, 32 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_context.c > b/src/mesa/drivers/dri/i965/brw_context.c > index 949ec4a2a3d..bb9474035c9 100644 > --- a/src/mesa/drivers/dri/i965/brw_context.c > +++ b/src/mesa/drivers/dri/i965/brw_context.c > @@ -1037,6 +1037,8 @@ brwCreateContext(gl_api api, > brw->dri_config_options_sha1); > brw->ctx.Const.dri_config_options_sha1 = brw->dri_config_options_sha1; > > + brw_disk_cache_init(brw); > + > return true; > } > > diff --git a/src/mesa/drivers/dri/i965/brw_disk_cache.c > b/src/mesa/drivers/dri/i965/brw_disk_cache.c > index 9af893d40a7..22670e31667 100644 > --- a/src/mesa/drivers/dri/i965/brw_disk_cache.c > +++ b/src/mesa/drivers/dri/i965/brw_disk_cache.c > @@ -26,6 +26,8 @@ > #include "compiler/glsl/shader_cache.h" > #include "compiler/nir/nir_serialize.h" > #include "main/mtypes.h" > +#include "util/build_id.h" > +#include "util/debug.h" > #include "util/disk_cache.h" > #include "util/macros.h" > #include "util/mesa-sha1.h" > @@ -460,3 +462,30 @@ brw_disk_cache_write_compute_program(struct brw_context > *brw) > 
MESA_SHADER_COMPUTE); > } > } > + > +void > +brw_disk_cache_init(struct brw_context *brw) > +{ > +#ifdef ENABLE_SHADER_CACHE > + if (env_var_as_boolean("MESA_GLSL_CACHE_DISABLE", true)) > + return; > + > + char renderer[10]; > + int len = snprintf(renderer, sizeof(renderer), "i965_%04x", > + brw->screen->deviceID); > + assert(len == sizeof(renderer) - 1); > + > + const struct build_id_note *note = > + build_id_find_nhdr_for_addr(brw_disk_cache_init); > + int id_size = build_id_length(note); > + assert(note && id_size == 20 /* sha1 */); > + > + const uint8_t *id_sha1 = build_id_data(note); > + assert(id_sha1); > + > + char timestamp[41]; > + _mesa_sha1_format(timestamp, id_sha1); > + > + brw->ctx.Cache = disk_cache_create(renderer, timestamp, 0); It's not really a timestamp, but that is what disk_cache_create's parameter is called...not inclined to care that much. Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
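The fixed-length assumptions in the patch above (a 10-byte renderer string and a 41-byte hex string for a 20-byte SHA-1) can be checked with a small standalone sketch. `format_renderer` and `format_sha1` are hypothetical names; `format_sha1` is only a stand-in for what `_mesa_sha1_format` is used for here, not its actual implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: "i965_" (5) + 4 hex digits + NUL = 10 bytes. Returns the
 * snprintf result, which is 9 for any 16-bit device ID. */
static int
format_renderer(char out[10], unsigned device_id)
{
   return snprintf(out, 10, "i965_%04x", device_id);
}

/* Hypothetical stand-in for a SHA-1 formatter: 20 bytes become
 * 40 lowercase hex characters plus a terminating NUL. */
static void
format_sha1(char out[41], const uint8_t sha1[20])
{
   static const char hex[] = "0123456789abcdef";
   for (int i = 0; i < 20; i++) {
      out[2 * i] = hex[sha1[i] >> 4];
      out[2 * i + 1] = hex[sha1[i] & 0xf];
   }
   out[40] = '\0';
}
```

Since PCI device IDs are 16 bits wide, the `%04x` conversion always produces exactly four digits, which is presumably why the patch can assert on the length instead of handling truncation.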
Re: [Mesa-dev] [PATCH v3 33/34] dri drivers: Always add the sha1 build-id
On Sunday, October 22, 2017 1:01:41 PM PDT Jordan Justen wrote:
> Cc: Dylan Baker
> Signed-off-by: Jordan Justen
> ---
>  src/mesa/drivers/dri/Makefile.am | 1 +
>  src/mesa/drivers/dri/meson.build | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/Makefile.am b/src/mesa/drivers/dri/Makefile.am
> index 95c637d0cdd..5cb2127501e 100644
> --- a/src/mesa/drivers/dri/Makefile.am
> +++ b/src/mesa/drivers/dri/Makefile.am
> @@ -57,6 +57,7 @@ mesa_dri_drivers_la_LDFLAGS = \
>          -module \
>          -no-undefined \
>          -avoid-version \
> +        -Wl,--build-id=sha1 \
>          $(BSYMBOLIC) \
>          $(GC_SECTIONS) \
>          $(LD_NO_UNDEFINED)
> diff --git a/src/mesa/drivers/dri/meson.build b/src/mesa/drivers/dri/meson.build
> index 36079324d41..98ed28d04ca 100644
> --- a/src/mesa/drivers/dri/meson.build
> +++ b/src/mesa/drivers/dri/meson.build
> @@ -41,7 +41,7 @@ if dri_drivers != []
>                   libmesa_util, libnir, libmesa_classic],
>      dependencies : [dep_selinux, dep_libdrm, dep_expat, dep_m, dep_thread, dep_dl],
> -    link_args : [ld_args_bsymbolic, ld_args_gc_sections],
> +    link_args : ['-Wl,--build-id=sha1', ld_args_bsymbolic, ld_args_gc_sections],
>    )
>
>    pkg.generate(
>

The autotools bits look fine to me.

Reviewed-by: Kenneth Graunke
Re: [Mesa-dev] [PATCH v3 32/34] disk_cache: Fix issue reading GLSL metadata
On Sunday, October 22, 2017 1:01:40 PM PDT Jordan Justen wrote:
> This would cause the read of the metadata content to fail, which would
> prevent the linking from being skipped.
>
> Seen on Rocket League with i965 shader cache.
>
> Cc: Timothy Arceri
> Signed-off-by: Jordan Justen
> Reviewed-by: Timothy Arceri
> ---
>  src/util/disk_cache.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/util/disk_cache.c b/src/util/disk_cache.c
> index e38cacb259b..fde6e2e0974 100644
> --- a/src/util/disk_cache.c
> +++ b/src/util/disk_cache.c
> @@ -1110,7 +1110,7 @@ disk_cache_get(struct disk_cache *cache, const cache_key key, size_t *size)
>            * TODO: pass the metadata back to the caller and do some basic
>            * validation.
>            */
> -         cache_item_md_size += sizeof(cache_key);
> +         cache_item_md_size += num_keys * sizeof(cache_key);
>           ret = lseek(fd, num_keys * sizeof(cache_key), SEEK_CUR);
>           if (ret == -1)
>              goto fail;
>

Reviewed-by: Kenneth Graunke
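To see why the one-line fix above matters, here is a rough sketch of the size accounting. The layout (a 32-bit type tag, a 32-bit key count, then the keys) is an assumption made for illustration, and both function names are hypothetical; only `cache_key` being a 20-byte SHA-1 is taken from the patch context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_KEY_SIZE 20 /* a SHA-1 digest */
typedef uint8_t cache_key[CACHE_KEY_SIZE];

/* Hypothetical: metadata size when every key is counted, matching the
 * number of bytes the lseek() in the patch skips over. */
static size_t
metadata_size(uint32_t num_keys)
{
   size_t size = sizeof(uint32_t);        /* assumed type tag */
   size += sizeof(uint32_t);              /* assumed key count */
   size += num_keys * sizeof(cache_key);  /* all keys */
   return size;
}

/* Hypothetical: the pre-patch accounting, which adds a single key
 * regardless of how many are stored. */
static size_t
metadata_size_one_key(uint32_t num_keys)
{
   (void)num_keys;
   return 2 * sizeof(uint32_t) + sizeof(cache_key);
}
```

The two agree when there is exactly one key, which would explain why the problem only shows up for programs with several shader stages: the file offset and the recorded metadata size drift apart by 20 bytes per extra key.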
Re: [Mesa-dev] [PATCH v3 30/34] i965: Initialize sha1 hash of dri config options
On Sunday, October 22, 2017 1:01:38 PM PDT Jordan Justen wrote:
> Signed-off-by: Jordan Justen
> Reviewed-by: Timothy Arceri
> ---
>  src/mesa/drivers/dri/i965/brw_context.c | 4 ++++
>  src/mesa/drivers/dri/i965/brw_context.h | 1 +
>  2 files changed, 5 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
> index c8de0746387..949ec4a2a3d 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -1033,6 +1033,10 @@ brwCreateContext(gl_api api,
>     vbo_use_buffer_objects(ctx);
>     vbo_always_unmap_buffers(ctx);
>
> +   driComputeOptionsSha1(&brw->screen->optionCache,
> +                         brw->dri_config_options_sha1);
> +   brw->ctx.Const.dri_config_options_sha1 = brw->dri_config_options_sha1;
> +
>     return true;
>  }
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index 26e71e62b54..834b9ae3d5a 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1211,6 +1211,7 @@ struct brw_context
>     bool draw_aux_buffer_disabled[MAX_DRAW_BUFFERS];
>
>     __DRIcontext *driContext;
> +   unsigned char dri_config_options_sha1[20];
>     struct intel_screen *screen;
>  };
>

Why are we storing this in brw_context? Maybe I'm missing something, but it looks like we store it in ctx->Const and never use the brw_context copy again. Seems like we could put it there directly.

Can we also move this to brw_process_driconf_options() instead of the end of brwCreateContext? I'm always a bit confused about the driver/screen split here, but I think this looks right.

With those changes,
Reviewed-by: Kenneth Graunke
Re: [Mesa-dev] [PATCH] i965: Don't flag BRW_NEW_SURFACES unless some push constants are dirty.
On Mon, Oct 30, 2017 at 12:54:43AM -0700, Kenneth Graunke wrote:
> Due to a gaffe on my part, we were re-emitting all binding table entries
> on every single draw call. The push_constant_packets atom listens to
> BRW_NEW_DRAW_CALL, but skips emitting 3DSTATE_CONSTANT_XS for each stage
> unless stage_state->push_constants_dirty is true. However, it flagged
> BRW_NEW_SURFACES unconditionally at the end, by mistake.
>
> Instead, it should only flag it if we actually emit 3DSTATE_CONSTANT_XS
> for a stage. We can move it a few lines up, inside the loop - the early
> continues will skip over it if push constants aren't dirty for a stage.
>
> With INTEL_NO_HW=1 set, improves performance of GFXBench5 gl_driver_2
> on Apollolake at 1280x720 by 1.01122% +/- 0.470723% (n=35).
> ---
>  src/mesa/drivers/dri/i965/genX_state_upload.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c b/src/mesa/drivers/dri/i965/genX_state_upload.c
> index 98f69522de5..b7a6cd73619 100644
> --- a/src/mesa/drivers/dri/i965/genX_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
> @@ -3117,9 +3117,8 @@ genX(upload_push_constant_packets)(struct brw_context *brw)
>        }
>
>        stage_state->push_constants_dirty = false;
> +      brw->ctx.NewDriverState |= GEN_GEN >= 9 ? BRW_NEW_SURFACES : 0;
>     }
> -
> -   brw->ctx.NewDriverState |= GEN_GEN >= 9 ? BRW_NEW_SURFACES : 0;

Reviewed-by: Rafael Antognolli

>  }
>
>  const struct brw_tracked_state genX(push_constant_packets) = {
> --
> 2.14.3
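The control-flow point in the commit message above (flag the state only when a packet is actually emitted) can be sketched in isolation. Everything here is a simplified stand-in: `NEW_SURFACES`, `struct stage_state`, and `upload_push_constants` are hypothetical, and the sketch leaves out the GEN_GEN >= 9 condition that the real patch keeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_STAGES 5
#define NEW_SURFACES (1u << 0) /* hypothetical stand-in for BRW_NEW_SURFACES */

struct stage_state {
   bool push_constants_dirty;
};

/* Hypothetical: returns the dirty bits to flag. The flag is set inside the
 * loop, after the early continue, so a draw with no dirty push constants
 * flags nothing. */
static uint32_t
upload_push_constants(struct stage_state stages[NUM_STAGES],
                      unsigned *packets_emitted)
{
   uint32_t new_state = 0;

   for (int i = 0; i < NUM_STAGES; i++) {
      if (!stages[i].push_constants_dirty)
         continue; /* skips both the emit and the flag below */

      (*packets_emitted)++; /* stands in for emitting 3DSTATE_CONSTANT_XS */
      stages[i].push_constants_dirty = false;
      new_state |= NEW_SURFACES;
   }

   return new_state;
}
```

Calling this twice in a row with one dirty stage returns the flag the first time and zero the second time, which is the per-draw saving the patch describes.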
Re: [Mesa-dev] [PATCH 2/4] i965/gen10: Implement WaForceRCPFEHangWorkaround
On Mon, Oct 30, 2017 at 2:02 PM, Nanley Chery wrote:
> On Mon, Oct 02, 2017 at 04:07:58PM -0700, Anuj Phogat wrote:
>> Cc: mesa-sta...@lists.freedesktop.org
>> Signed-off-by: Anuj Phogat
>> ---
>>  src/mesa/drivers/dri/i965/brw_pipe_control.c | 19 +++++++++++++++++++
>>  1 file changed, 19 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
>> index 6326957a7a..3192d31758 100644
>> --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
>> +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
>> @@ -89,6 +89,22 @@ gen7_cs_stall_every_four_pipe_controls(struct brw_context *brw, uint32_t flags)
>>     return 0;
>>  }
>>
>> +/* #1130 from gen10 workarounds page in h/w specs:
>> + * "If a PIPE_CONTROL performs Render Target Cache Flush, function sets stall
>       ^
> Was this meant to be a quote? I don't see this text on this page.
>
>> + * at Pixel Scoreboard. Otherwise, the function assumes that PIPE_CONTROL
>> + * performs Post Sync Operation and WA sets Depth Stall Enable.
>> + *
>
> Why assume instead of checking bits 14, 15 and 23 of `flags`?
>

Good catch. Let me send a V2 with added check and above comment fixed.
We don't need to check for bit 23. Workaround talks about 'Post Sync Operation' but says nothing about 'LRI Post Sync Operation'. We're not using bit 23 in Mesa anyways and the workaround is applicable only until CNL C0 stepping.

> -Nanley
>
>> + * Applicable to CNL B0 and C0 steppings only.
>> + */
>> +static void
>> +gen10_add_rcpfe_workaround_bits(uint32_t *flags)
>> +{
>> +   if ((*flags & PIPE_CONTROL_RENDER_TARGET_FLUSH) != 0)
>> +      *flags = *flags | PIPE_CONTROL_STALL_AT_SCOREBOARD;
>> +   else
>> +      *flags = *flags | PIPE_CONTROL_DEPTH_STALL;
>> +}
>> +
>>  static void
>>  brw_emit_pipe_control(struct brw_context *brw, uint32_t flags,
>>                        struct brw_bo *bo, uint32_t offset, uint64_t imm)
>> @@ -109,6 +125,9 @@ brw_emit_pipe_control(struct brw_context *brw, uint32_t flags,
>>        brw_emit_pipe_control_flush(brw, 0);
>>     }
>>
>> +   if (devinfo->gen == 10)
>> +      gen10_add_rcpfe_workaround_bits(&flags);
>> +
>>     BEGIN_BATCH(6);
>>     OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2));
>>     OUT_BATCH(flags);
>> --
>> 2.13.5
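The flag selection in the patch as posted can be exercised on its own with a small sketch. The bit values below are placeholders chosen for the example, not the driver's real PIPE_CONTROL_* definitions, and `add_rcpfe_workaround_bits` is a renamed copy of the logic shown above; it does not include the extra post-sync check promised for V2.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit values (assumption: the real ones live in the driver). */
#define PIPE_CONTROL_RENDER_TARGET_FLUSH (1u << 0)
#define PIPE_CONTROL_STALL_AT_SCOREBOARD (1u << 1)
#define PIPE_CONTROL_DEPTH_STALL         (1u << 2)

/* Mirrors the posted logic: a render target flush gets a scoreboard stall,
 * anything else gets a depth stall. The caller passes a pointer so the
 * modified flags are visible after the call. */
static void
add_rcpfe_workaround_bits(uint32_t *flags)
{
   if ((*flags & PIPE_CONTROL_RENDER_TARGET_FLUSH) != 0)
      *flags |= PIPE_CONTROL_STALL_AT_SCOREBOARD;
   else
      *flags |= PIPE_CONTROL_DEPTH_STALL;
}
```

A caller would write `add_rcpfe_workaround_bits(&flags);` just before emitting the packet, so the added bits end up in the batch.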
Re: [Mesa-dev] [PATCH v3 29/34] i965: Don't link when the program was found in the disk cache
On Sunday, October 22, 2017 1:01:37 PM PDT Jordan Justen wrote:
> Signed-off-by: Jordan Justen
> Cc: Timothy Arceri
> ---
>  src/mesa/drivers/dri/i965/brw_link.cpp | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_link.cpp b/src/mesa/drivers/dri/i965/brw_link.cpp
> index 988dd3a73d7..9019db56aa0 100644
> --- a/src/mesa/drivers/dri/i965/brw_link.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_link.cpp
> @@ -225,6 +225,9 @@ brw_link_shader(struct gl_context *ctx, struct gl_shader_program *shProg)
>     unsigned int stage;
>     struct shader_info *infos[MESA_SHADER_STAGES] = { 0, };
>
> +   if (shProg->data->LinkStatus == linking_skipped)
> +      return GL_TRUE;
> +
>     for (stage = 0; stage < ARRAY_SIZE(shProg->_LinkedShaders); stage++) {
>        struct gl_linked_shader *shader = shProg->_LinkedShaders[stage];
>        if (!shader)
>

I was concerned that we might be skipping some steps here, but I think everything's OK...

Reviewed-by: Kenneth Graunke
Re: [Mesa-dev] [PATCH] meson: Use true and false instead of yes and no for tristate options
Dylan Baker writes:

> This allows a user to not care whether they're setting a tristate or a
> boolean option, which is a nice user facing feature, and something I've
> personally run into.
>
> Suggested-by: Adam Jackson
> Signed-off-by: Dylan Baker

Reviewed-by: Eric Anholt
Re: [Mesa-dev] [PATCH] meson: use dep_m in libgallium
Erik Faye-Lund writes:

> u_format_other.c uses sqrtf, which on some systems requires
> a math library. So let's make sure we link with it.
>
> Signed-off-by: Erik Faye-Lund

Reviewed-by: Eric Anholt
Re: [Mesa-dev] [PATCH v3 17/43] i965/fs: Enable rounding mode on f2f16 ops
On Thu, Oct 12, 2017 at 11:38 AM, Jose Maria Casanova Crespo < jmcasan...@igalia.com> wrote: > From: Alejandro Piñeiro> > By default we don't set the rounding mode. We only set > round-to-near-even or round-to-zero mode if explicitly set from nir. > > v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate > with the rounding mode (Curro) > > Signed-off-by: Jose Maria Casanova Crespo > Signed-off-by: Alejandro Piñeiro > --- > src/intel/compiler/brw_fs_nir.cpp | 8 > 1 file changed, 8 insertions(+) > > diff --git a/src/intel/compiler/brw_fs_nir.cpp > b/src/intel/compiler/brw_fs_nir.cpp > index 6908c7ea02..b356836e80 100644 > --- a/src/intel/compiler/brw_fs_nir.cpp > +++ b/src/intel/compiler/brw_fs_nir.cpp > @@ -693,6 +693,14 @@ fs_visitor::nir_emit_alu(const fs_builder , > nir_alu_instr *instr) >inst->saturate = instr->dest.saturate; >break; > > + case nir_op_f2f16_rtne: > + case nir_op_f2f16_rtz: > + if (instr->op == nir_op_f2f16_rtz) > + bld.emit(SHADER_OPCODE_RND_MODE, bld.null_reg_ud(), > brw_imm_d(BRW_RND_MODE_RTZ)); > + else if (instr->op == nir_op_f2f16_rtne) > + bld.emit(SHADER_OPCODE_RND_MODE, bld.null_reg_ud(), > brw_imm_d(BRW_RND_MODE_RTNE)); > + /* fallthrough */ > It might look a little nicer (though it's more lines of code) to have a little brw_from_nir_rounding_mode helper and then we could have just the one emit call. I don't care too much though. > + >/* In theory, it would be better to use BRW_OPCODE_F32TO16. > Depending > * on the HW gen, it is a special hw opcode or just a MOV, and > * brw_F32TO16 (at brw_eu_emit) would do the work to chose. > -- > 2.13.6 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
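The helper suggested in the review above might look roughly like the following. This is only a sketch under stated assumptions: the enums are local stand-ins rather than the real nir_op and brw_rnd_mode types, and `rnd_mode_from_op` is a hypothetical name loosely modelled on the `brw_from_nir_rounding_mode` idea.

```c
#include <assert.h>

/* Stand-in opcodes (assumption: only the conversions relevant here). */
enum op_sketch {
   OP_F2F16_RTNE,
   OP_F2F16_RTZ,
   OP_F2F16_UNDEF, /* conversion with no explicit rounding mode */
};

/* Stand-in rounding modes. */
enum rnd_mode_sketch {
   RND_MODE_RTNE,
   RND_MODE_RTZ,
   RND_MODE_UNSPECIFIED,
};

/* Hypothetical helper: map an opcode to the rounding mode it asks for, so
 * the caller needs a single emit of the rounding-mode instruction. */
static enum rnd_mode_sketch
rnd_mode_from_op(enum op_sketch op)
{
   switch (op) {
   case OP_F2F16_RTNE:
      return RND_MODE_RTNE;
   case OP_F2F16_RTZ:
      return RND_MODE_RTZ;
   default:
      return RND_MODE_UNSPECIFIED;
   }
}
```

With such a helper the two branches in the patch would collapse into one emit call taking the helper's result as the immediate, at the cost of a few more lines overall, as the review notes.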
[Mesa-dev] [Bug 103496] svga_screen.c:26:46: error: git_sha1.h: No such file or directory
https://bugs.freedesktop.org/show_bug.cgi?id=103496

Ian Romanick changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #4 from Ian Romanick ---
I was just about to report the same bug... but 2117d033 seems to fix it for me. I'm going to close it. Vinson should reopen this bug if it's not fixed for him.
Re: [Mesa-dev] [PATCH v3 12/43] i965/fs: Add brw_reg_type_from_bit_size utility method
On Mon, Oct 30, 2017 at 3:08 PM, Jason Ekstrandwrote: > On Thu, Oct 12, 2017 at 11:38 AM, Jose Maria Casanova Crespo < > jmcasan...@igalia.com> wrote: > >> From: Alejandro Piñeiro >> >> Returns the brw_type for a given ssa.bit_size, and a reference type. >> So if bit_size is 64, and the reference type is BRW_REGISTER_TYPE_F, >> it returns BRW_REGISTER_TYPE_DF. The same applies if bit_size is 32 >> and reference type is BRW_REGISTER_TYPE_HF it returns BRW_REGISTER_TYPE_F >> >> v2 (Jason Ekstrand): >> - Use better unreachable() messages >> - Add Q types >> >> Signed-off-by: Jose Maria Casanova Crespo >> Signed-off-by: Alejandro Piñeiro > Reviewed-by: Jason Ekstrand >> --- >> src/intel/compiler/brw_fs_nir.cpp | 69 ++ >> ++--- >> 1 file changed, 64 insertions(+), 5 deletions(-) >> >> diff --git a/src/intel/compiler/brw_fs_nir.cpp >> b/src/intel/compiler/brw_fs_nir.cpp >> index 7ed44f534c..affe65d5e9 100644 >> --- a/src/intel/compiler/brw_fs_nir.cpp >> +++ b/src/intel/compiler/brw_fs_nir.cpp >> @@ -227,6 +227,65 @@ fs_visitor::nir_emit_system_values() >> } >> } >> >> +/* >> + * Returns a type based on a reference_type (word, float, half-float) >> and a >> + * given bit_size. >> + * >> + * Reference BRW_REGISTER_TYPE are HF,F,DF,W,D,UW,UD. >> + * >> + * @FIXME: 64-bit return types are always DF on integer types to maintain >> + * compability with uses of DF previously to the introduction of int64 >> + * support. >> > I just read this comment and I really don't like it. This is going to come back to bite us if we don't fix it some better way. How many places do we actually need to override to DF? I suppose we'll need it for intrinsics and a couple of ALU operations such as bcsel. I'd like to keep it as contained as we can. 
--Jason > + */ >> +static brw_reg_type >> +brw_reg_type_from_bit_size(const unsigned bit_size, >> + const brw_reg_type reference_type) >> +{ >> + switch(reference_type) { >> + case BRW_REGISTER_TYPE_HF: >> + case BRW_REGISTER_TYPE_F: >> + case BRW_REGISTER_TYPE_DF: >> + switch(bit_size) { >> + case 16: >> + return BRW_REGISTER_TYPE_HF; >> + case 32: >> + return BRW_REGISTER_TYPE_F; >> + case 64: >> + return BRW_REGISTER_TYPE_DF; >> + default: >> + unreachable("Invalid bit size"); >> + } >> + case BRW_REGISTER_TYPE_W: >> + case BRW_REGISTER_TYPE_D: >> + case BRW_REGISTER_TYPE_Q: >> + switch(bit_size) { >> + case 16: >> + return BRW_REGISTER_TYPE_W; >> + case 32: >> + return BRW_REGISTER_TYPE_D; >> + case 64: >> + return BRW_REGISTER_TYPE_DF; >> > > This should be BRW_REGISTER_TYPE_Q > > >> + default: >> + unreachable("Invalid bit size"); >> + } >> + case BRW_REGISTER_TYPE_UW: >> + case BRW_REGISTER_TYPE_UD: >> + case BRW_REGISTER_TYPE_UQ: >> + switch(bit_size) { >> + case 16: >> + return BRW_REGISTER_TYPE_UW; >> + case 32: >> + return BRW_REGISTER_TYPE_UD; >> + case 64: >> + return BRW_REGISTER_TYPE_DF; >> > > This should be BRW_REGISTER_TYPE_UQ > > With those fixed, > > Reviewed-by: Jason Ekstrand > > >> + default: >> + unreachable("Invalid bit size"); >> + } >> + default: >> + unreachable("Unknown type"); >> + } >> +} >> + >> void >> fs_visitor::nir_emit_impl(nir_function_impl *impl) >> { >> @@ -240,7 +299,7 @@ fs_visitor::nir_emit_impl(nir_function_impl *impl) >> reg->num_array_elems == 0 ? 1 : reg->num_array_elems; >>unsigned size = array_elems * reg->num_components; >>const brw_reg_type reg_type = >> - reg->bit_size == 32 ? 
BRW_REGISTER_TYPE_F : >> BRW_REGISTER_TYPE_DF; >> + brw_reg_type_from_bit_size(reg->bit_size, BRW_REGISTER_TYPE_F); >>nir_locals[reg->index] = bld.vgrf(reg_type, size); >> } >> >> @@ -1341,7 +1400,7 @@ fs_visitor::nir_emit_load_const(const fs_builder >> , >> nir_load_const_instr *instr) >> { >> const brw_reg_type reg_type = >> - instr->def.bit_size == 32 ? BRW_REGISTER_TYPE_D : >> BRW_REGISTER_TYPE_DF; >> + brw_reg_type_from_bit_size(instr->def.bit_size, >> BRW_REGISTER_TYPE_D); >> fs_reg reg = bld.vgrf(reg_type, instr->def.num_components); >> >> switch (instr->def.bit_size) { >> @@ -1369,8 +1428,8 @@ fs_visitor::get_nir_src(const nir_src ) >> fs_reg reg; >> if (src.is_ssa) { >>if (src.ssa->parent_instr->type == nir_instr_type_ssa_undef) { >> - const brw_reg_type reg_type = src.ssa->bit_size == 32 ? >> -BRW_REGISTER_TYPE_D : BRW_REGISTER_TYPE_DF; >> + const brw_reg_type reg_type = >> +brw_reg_type_from_bit_size(src.ssa->bit_size, >> BRW_REGISTER_TYPE_D); >> reg =
Re: [Mesa-dev] [PATCH v3 12/43] i965/fs: Add brw_reg_type_from_bit_size utility method
On Thu, Oct 12, 2017 at 11:38 AM, Jose Maria Casanova Crespo <jmcasan...@igalia.com> wrote:

> From: Alejandro Piñeiro
>
> Returns the brw_type for a given ssa.bit_size, and a reference type.
> So if bit_size is 64, and the reference type is BRW_REGISTER_TYPE_F,
> it returns BRW_REGISTER_TYPE_DF. The same applies if bit_size is 32
> and reference type is BRW_REGISTER_TYPE_HF it returns BRW_REGISTER_TYPE_F
>
> v2 (Jason Ekstrand):
>  - Use better unreachable() messages
>  - Add Q types
>
> Signed-off-by: Jose Maria Casanova Crespo
> Signed-off-by: Alejandro Piñeiro
Reviewed-by: Jason Ekstrand
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 69 ++++---
>  1 file changed, 64 insertions(+), 5 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
> index 7ed44f534c..affe65d5e9 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -227,6 +227,65 @@ fs_visitor::nir_emit_system_values()
>     }
>  }
>
> +/*
> + * Returns a type based on a reference_type (word, float, half-float) and a
> + * given bit_size.
> + *
> + * Reference BRW_REGISTER_TYPE are HF,F,DF,W,D,UW,UD.
> + *
> + * @FIXME: 64-bit return types are always DF on integer types to maintain
> + * compability with uses of DF previously to the introduction of int64
> + * support.
> + */
> +static brw_reg_type
> +brw_reg_type_from_bit_size(const unsigned bit_size,
> +                           const brw_reg_type reference_type)
> +{
> +   switch(reference_type) {
> +   case BRW_REGISTER_TYPE_HF:
> +   case BRW_REGISTER_TYPE_F:
> +   case BRW_REGISTER_TYPE_DF:
> +      switch(bit_size) {
> +      case 16:
> +         return BRW_REGISTER_TYPE_HF;
> +      case 32:
> +         return BRW_REGISTER_TYPE_F;
> +      case 64:
> +         return BRW_REGISTER_TYPE_DF;
> +      default:
> +         unreachable("Invalid bit size");
> +      }
> +   case BRW_REGISTER_TYPE_W:
> +   case BRW_REGISTER_TYPE_D:
> +   case BRW_REGISTER_TYPE_Q:
> +      switch(bit_size) {
> +      case 16:
> +         return BRW_REGISTER_TYPE_W;
> +      case 32:
> +         return BRW_REGISTER_TYPE_D;
> +      case 64:
> +         return BRW_REGISTER_TYPE_DF;

This should be BRW_REGISTER_TYPE_Q

> +      default:
> +         unreachable("Invalid bit size");
> +      }
> +   case BRW_REGISTER_TYPE_UW:
> +   case BRW_REGISTER_TYPE_UD:
> +   case BRW_REGISTER_TYPE_UQ:
> +      switch(bit_size) {
> +      case 16:
> +         return BRW_REGISTER_TYPE_UW;
> +      case 32:
> +         return BRW_REGISTER_TYPE_UD;
> +      case 64:
> +         return BRW_REGISTER_TYPE_DF;

This should be BRW_REGISTER_TYPE_UQ

With those fixed,

Reviewed-by: Jason Ekstrand

> +      default:
> +         unreachable("Invalid bit size");
> +      }
> +   default:
> +      unreachable("Unknown type");
> +   }
> +}
> +
>  void
>  fs_visitor::nir_emit_impl(nir_function_impl *impl)
>  {
> @@ -240,7 +299,7 @@ fs_visitor::nir_emit_impl(nir_function_impl *impl)
>           reg->num_array_elems == 0 ? 1 : reg->num_array_elems;
>        unsigned size = array_elems * reg->num_components;
>        const brw_reg_type reg_type =
> -         reg->bit_size == 32 ? BRW_REGISTER_TYPE_F : BRW_REGISTER_TYPE_DF;
> +         brw_reg_type_from_bit_size(reg->bit_size, BRW_REGISTER_TYPE_F);
>        nir_locals[reg->index] = bld.vgrf(reg_type, size);
>     }
>
> @@ -1341,7 +1400,7 @@ fs_visitor::nir_emit_load_const(const fs_builder &bld,
>                                  nir_load_const_instr *instr)
>  {
>     const brw_reg_type reg_type =
> -      instr->def.bit_size == 32 ? BRW_REGISTER_TYPE_D : BRW_REGISTER_TYPE_DF;
> +      brw_reg_type_from_bit_size(instr->def.bit_size, BRW_REGISTER_TYPE_D);
>     fs_reg reg = bld.vgrf(reg_type, instr->def.num_components);
>
>     switch (instr->def.bit_size) {
> @@ -1369,8 +1428,8 @@ fs_visitor::get_nir_src(const nir_src &src)
>     fs_reg reg;
>     if (src.is_ssa) {
>        if (src.ssa->parent_instr->type == nir_instr_type_ssa_undef) {
> -         const brw_reg_type reg_type = src.ssa->bit_size == 32 ?
> -            BRW_REGISTER_TYPE_D : BRW_REGISTER_TYPE_DF;
> +         const brw_reg_type reg_type =
> +            brw_reg_type_from_bit_size(src.ssa->bit_size, BRW_REGISTER_TYPE_D);
>           reg = bld.vgrf(reg_type, src.ssa->num_components);
>        } else {
>           reg = nir_ssa_values[src.ssa->index];
> @@ -1404,7 +1463,7 @@ fs_visitor::get_nir_dest(const nir_dest &dest)
>  {
>     if (dest.is_ssa) {
>        const brw_reg_type reg_type =
> -         dest.ssa.bit_size == 32 ? BRW_REGISTER_TYPE_F : BRW_REGISTER_TYPE_DF;
> +         brw_reg_type_from_bit_size(dest.ssa.bit_size, BRW_REGISTER_TYPE_F);
>        nir_ssa_values[dest.ssa.index] =
>           bld.vgrf(reg_type, dest.ssa.num_components);
>        return nir_ssa_values[dest.ssa.index];
> --
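For reference, the mapping under review, with the two requested fixes applied (64-bit signed maps to Q, 64-bit unsigned to UQ), can be sketched in standalone C. This is only an illustration: the enum, the helper class_of() and the table layout are stand-ins invented for the sketch and are not the driver's brw_reg_type or the code that was eventually pushed.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the driver's register type enum. Only the suffixes mirror
 * the names quoted in the patch; the numeric values are arbitrary. */
enum reg_type {
   TYPE_HF, TYPE_F, TYPE_DF,   /* 16/32/64-bit float */
   TYPE_W, TYPE_D, TYPE_Q,     /* 16/32/64-bit signed integer */
   TYPE_UW, TYPE_UD, TYPE_UQ,  /* 16/32/64-bit unsigned integer */
};

enum reg_class { CLASS_FLOAT, CLASS_SINT, CLASS_UINT };

/* Hypothetical helper: which family the reference type belongs to. */
static enum reg_class
class_of(enum reg_type t)
{
   switch (t) {
   case TYPE_HF: case TYPE_F: case TYPE_DF:
      return CLASS_FLOAT;
   case TYPE_W: case TYPE_D: case TYPE_Q:
      return CLASS_SINT;
   case TYPE_UW: case TYPE_UD: case TYPE_UQ:
      return CLASS_UINT;
   }
   abort(); /* unknown type */
}

/* Keep the family of the reference type, pick the width from bit_size. */
enum reg_type
reg_type_from_bit_size(unsigned bit_size, enum reg_type reference)
{
   static const enum reg_type table[3][3] = {
      [CLASS_FLOAT] = { TYPE_HF, TYPE_F,  TYPE_DF },
      [CLASS_SINT]  = { TYPE_W,  TYPE_D,  TYPE_Q  },
      [CLASS_UINT]  = { TYPE_UW, TYPE_UD, TYPE_UQ },
   };
   unsigned column;

   switch (bit_size) {
   case 16: column = 0; break;
   case 32: column = 1; break;
   case 64: column = 2; break;
   default: abort(); /* invalid bit size */
   }
   return table[class_of(reference)][column];
}
```

With this shape, a 64-bit request against a D reference yields Q rather than DF, which is the behaviour the review asks for.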
Re: [Mesa-dev] [PATCH v3 08/43] spirv: Enable FPRoundingMode decorator to nir operations
On Thu, Oct 12, 2017 at 11:37 AM, Jose Maria Casanova Crespo <jmcasan...@igalia.com> wrote:

> SpvOpFConvert now manages the FPRoundingMode decorator for the
> returning values enabling the nir_rounding_mode in the conversion
> operation to fp16 values.
>
> v2: Fixed breaking of specialization constants. (Jason Ekstrand)
> ---
>  src/compiler/spirv/vtn_alu.c | 32
>  1 file changed, 32 insertions(+)
>
> diff --git a/src/compiler/spirv/vtn_alu.c b/src/compiler/spirv/vtn_alu.c
> index 7ec30b8a63..b7e1b72889 100644
> --- a/src/compiler/spirv/vtn_alu.c
> +++ b/src/compiler/spirv/vtn_alu.c
> @@ -381,6 +381,26 @@ handle_no_contraction(struct vtn_builder *b, struct vtn_value *val, int member,
>     b->nb.exact = true;
>  }
>
> +static void
> +handle_rounding_mode(struct vtn_builder *b, struct vtn_value *val, int member,
> +                     const struct vtn_decoration *dec, void *_out_rounding_mode)
> +{

How about we put

   nir_rounding_mode *out_rounding_mode = _out_rounding_mode;

here and avoid the cast below.

> +    assert(dec->scope == VTN_DEC_DECORATION);
> +    if (dec->decoration != SpvDecorationFPRoundingMode)
> +       return;
> +    switch (dec->literals[0]) {
> +    case SpvFPRoundingModeRTE:
> +       *((nir_rounding_mode *) _out_rounding_mode) = nir_rounding_mode_rtne;
> +       break;
> +    case SpvFPRoundingModeRTZ:
> +       *((nir_rounding_mode *) _out_rounding_mode) = nir_rounding_mode_rtz;
> +       break;
> +    default:
> +       unreachable("Not supported rounding mode");
> +       break;
> +    }
> +}
> +
>  void
>  vtn_handle_alu(struct vtn_builder *b, SpvOp opcode,
>                 const uint32_t *w, unsigned count)
> @@ -568,6 +588,18 @@ vtn_handle_alu(struct vtn_builder *b, SpvOp opcode,
>        vtn_handle_bitcast(b, val->ssa, src[0]);
>        break;
>
> +   case SpvOpFConvert: {
> +      nir_alu_type src_alu_type = nir_get_nir_type_for_glsl_type(vtn_src[0]->type);
> +      nir_alu_type dst_alu_type = nir_get_nir_type_for_glsl_type(type);
> +      nir_rounding_mode rounding_mode = nir_rounding_mode_undef;
> +
> +      vtn_foreach_decoration(b, val, handle_rounding_mode, &rounding_mode);
> +      nir_op op = nir_type_conversion_op(src_alu_type, dst_alu_type, rounding_mode);
> +
> +      val->ssa->def = nir_build_alu(&b->nb, op, src[0], src[1], NULL, NULL);
> +      break;
> +   }
> +
>     default: {
>        bool swap;
>        nir_alu_type src_alu_type = nir_get_nir_type_for_glsl_type(vtn_src[0]->type);
> --
> 2.13.6
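The suggestion in this review (convert the void * callback argument to a typed pointer once, at the top of the handler) can be illustrated with a small standalone C sketch. Everything here is hypothetical scaffolding: struct decoration, foreach_decoration() and the enum names are simplified stand-ins for the vtn_* types, and only the numeric SPIR-V enumerants (RTE = 0, RTZ = 1) are taken from the SPIR-V specification.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the NIR/SPIR-V types used by the patch. */
enum rounding_mode { ROUNDING_UNDEF, ROUNDING_RTNE, ROUNDING_RTZ };
enum decoration_kind { DECORATION_FP_ROUNDING_MODE, DECORATION_OTHER };
enum spv_rounding { SPV_RTE = 0, SPV_RTZ = 1 };

struct decoration {
   enum decoration_kind kind;
   unsigned literal;
};

typedef void (*decoration_cb)(const struct decoration *dec, void *data);

/* Callback: the untyped pointer is converted exactly once, so the body
 * needs no casts. */
void
handle_rounding_mode(const struct decoration *dec, void *_out_rounding_mode)
{
   enum rounding_mode *out_rounding_mode = _out_rounding_mode;

   if (dec->kind != DECORATION_FP_ROUNDING_MODE)
      return;

   switch (dec->literal) {
   case SPV_RTE:
      *out_rounding_mode = ROUNDING_RTNE;
      break;
   case SPV_RTZ:
      *out_rounding_mode = ROUNDING_RTZ;
      break;
   default:
      /* The patch treats this as unreachable; the sketch leaves the
       * output untouched instead. */
      break;
   }
}

/* Hypothetical iterator standing in for vtn_foreach_decoration(). */
void
foreach_decoration(const struct decoration *decs, size_t count,
                   decoration_cb cb, void *data)
{
   for (size_t i = 0; i < count; i++)
      cb(&decs[i], data);
}

/* Starts from "undefined" and lets the callback overwrite it. */
enum rounding_mode
rounding_mode_for(const struct decoration *decs, size_t count)
{
   enum rounding_mode mode = ROUNDING_UNDEF;
   foreach_decoration(decs, count, handle_rounding_mode, &mode);
   return mode;
}
```

In C the assignment from void * needs no cast, which is why the one-line local at the top of the handler is enough to remove both casts in the switch.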
Re: [Mesa-dev] [PATCH v3 07/43] spirv/nir: Handle 16-bit types
On Thu, Oct 12, 2017 at 11:37 AM, Jose Maria Casanova Crespo < jmcasan...@igalia.com> wrote: > From: Eduardo Lima Mitev> > v2: Added more missing implementations of 16-bit types. (Jason Ekstrand) > > Signed-off-by: Jose Maria Casanova Crespo > Signed-off-by: Eduardo Lima > --- > src/compiler/spirv/spirv_to_nir.c | 46 ++ > ++-- > src/compiler/spirv/vtn_variables.c | 21 + > 2 files changed, 60 insertions(+), 7 deletions(-) > > diff --git a/src/compiler/spirv/spirv_to_nir.c > b/src/compiler/spirv/spirv_to_nir.c > index 079ff0fe95..ea544b065c 100644 > --- a/src/compiler/spirv/spirv_to_nir.c > +++ b/src/compiler/spirv/spirv_to_nir.c > @@ -104,10 +104,13 @@ vtn_const_ssa_value(struct vtn_builder *b, > nir_constant *constant, > switch (glsl_get_base_type(type)) { > case GLSL_TYPE_INT: > case GLSL_TYPE_UINT: > + case GLSL_TYPE_INT16: > + case GLSL_TYPE_UINT16: > case GLSL_TYPE_INT64: > case GLSL_TYPE_UINT64: > case GLSL_TYPE_BOOL: > case GLSL_TYPE_FLOAT: > + case GLSL_TYPE_FLOAT16: > case GLSL_TYPE_DOUBLE: { >int bit_size = glsl_get_bit_size(type); >if (glsl_type_is_vector_or_scalar(type)) { > @@ -751,16 +754,32 @@ vtn_handle_type(struct vtn_builder *b, SpvOp opcode, >int bit_size = w[2]; >const bool signedness = w[3]; >val->type->base_type = vtn_base_type_scalar; > - if (bit_size == 64) > + if (bit_size == 64) { > val->type->type = (signedness ? glsl_int64_t_type() : > glsl_uint64_t_type()); > - else > + } else if (bit_size == 16) { > + val->type->type = (signedness ? glsl_int16_t_type() : > glsl_uint16_t_type()); > + } else { > + assert(bit_size == 32); > val->type->type = (signedness ? glsl_int_type() : > glsl_uint_type()); > + } >break; > } > case SpvOpTypeFloat: { >int bit_size = w[2]; >val->type->base_type = vtn_base_type_scalar; > - val->type->type = bit_size == 64 ? 
glsl_double_type() : > glsl_float_type(); > + switch (bit_size) { > + case 16: > + val->type->type = glsl_float16_t_type(); > + break; > + case 32: > + val->type->type = glsl_float_type(); > + break; > + case 64: > + val->type->type = glsl_double_type(); > + break; > + default: > + assert(!"Invalid float bit size"); > + } >break; > } > > @@ -980,10 +999,13 @@ vtn_null_constant(struct vtn_builder *b, const > struct glsl_type *type) > switch (glsl_get_base_type(type)) { > case GLSL_TYPE_INT: > case GLSL_TYPE_UINT: > + case GLSL_TYPE_INT16: > + case GLSL_TYPE_UINT16: > case GLSL_TYPE_INT64: > case GLSL_TYPE_UINT64: > case GLSL_TYPE_BOOL: > case GLSL_TYPE_FLOAT: > + case GLSL_TYPE_FLOAT16: > case GLSL_TYPE_DOUBLE: >/* Nothing to do here. It's already initialized to zero */ >break; > @@ -1110,7 +1132,7 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp > opcode, > val->constant->values->u32[0] = w[3]; > val->constant->values->u32[1] = w[4]; >} else { > - assert(bit_size == 32); > + assert(bit_size == 32 || bit_size == 16); > val->constant->values->u32[0] = w[3]; >} >break; > @@ -1136,9 +1158,12 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp > opcode, >switch (glsl_get_base_type(val->const_type)) { >case GLSL_TYPE_UINT: >case GLSL_TYPE_INT: > + case GLSL_TYPE_UINT16: > + case GLSL_TYPE_INT16: >case GLSL_TYPE_UINT64: >case GLSL_TYPE_INT64: >case GLSL_TYPE_FLOAT: > + case GLSL_TYPE_FLOAT16: >case GLSL_TYPE_BOOL: >case GLSL_TYPE_DOUBLE: { > int bit_size = glsl_get_bit_size(val->const_type); > @@ -1153,7 +1178,7 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp > opcode, > if (bit_size == 64) { >val->constant->values[0].u64[i] = > elems[i]->values[0].u64[0]; > } else { > - assert(bit_size == 32); > + assert(bit_size == 32 || bit_size == 16); >val->constant->values[0].u32[i] = > elems[i]->values[0].u32[0]; > } > } > @@ -1228,6 +1253,7 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp > opcode, >val->constant->values[0].u64[j] = u64[comp]; > } > } else { > +/* This 
is for both 32-bit and 16-bit values */ > uint32_t u32[8]; > if (v0->value_type == vtn_value_type_constant) { > for (unsigned i = 0; i < len0; i++) > @@ -1276,9 +1302,12 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp > opcode, > switch (glsl_get_base_type(type)) { > case GLSL_TYPE_UINT: > case GLSL_TYPE_INT: > +
Re: [Mesa-dev] [PATCH v3 06/43] nir: Handle fp16 rounding modes at nir_type_conversion_op
On Thu, Oct 12, 2017 at 11:37 AM, Jose Maria Casanova Crespo < jmcasan...@igalia.com> wrote: > nir_type_conversion enables new operations to handle rounding modes to > convert to fp16 values. Two new opcodes are enabled nir_op_f2f16_rtne > and nir_op_f2f16_rtz. > > The undefined behaviour doesn't has any effect and uses the original > nir_op_f2f16 operation. > > v2: Indentation fixed (Jason Ekstrand) > --- > src/compiler/glsl/glsl_to_nir.cpp | 3 ++- > src/compiler/nir/nir.h| 3 ++- > src/compiler/nir/nir_opcodes.py | 10 -- > src/compiler/nir/nir_opcodes_c.py | 15 ++- > src/compiler/spirv/vtn_alu.c | 2 +- > 5 files changed, 27 insertions(+), 6 deletions(-) > > diff --git a/src/compiler/glsl/glsl_to_nir.cpp > b/src/compiler/glsl/glsl_to_nir.cpp > index 9f25e30678..5738979b19 100644 > --- a/src/compiler/glsl/glsl_to_nir.cpp > +++ b/src/compiler/glsl/glsl_to_nir.cpp > @@ -1575,7 +1575,8 @@ nir_visitor::visit(ir_expression *ir) > case ir_unop_u642i64: { >nir_alu_type src_type = nir_get_nir_type_for_glsl_ > base_type(types[0]); >nir_alu_type dst_type = nir_get_nir_type_for_glsl_ > base_type(out_type); > - result = nir_build_alu(, nir_type_conversion_op(src_type, > dst_type), > + result = nir_build_alu(, nir_type_conversion_op(src_type, > dst_type, > + nir_rounding_mode_undef), > srcs[0], NULL, NULL, NULL); >/* b2i and b2f don't have fixed bit-size versions so the builder > will > * just assume 32 and we have to fix it up here. 
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h > index fb269fcb28..93f0d52804 100644 > --- a/src/compiler/nir/nir.h > +++ b/src/compiler/nir/nir.h > @@ -753,7 +753,8 @@ nir_get_nir_type_for_glsl_type(const struct glsl_type > *type) > return nir_get_nir_type_for_glsl_base_type(glsl_get_base_type(type)); > } > > -nir_op nir_type_conversion_op(nir_alu_type src, nir_alu_type dst); > +nir_op nir_type_conversion_op(nir_alu_type src, nir_alu_type dst, > + nir_rounding_mode rnd); > > typedef enum { > NIR_OP_IS_COMMUTATIVE = (1 << 0), > diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_ > opcodes.py > index 06ae820c3e..0abc34f037 100644 > --- a/src/compiler/nir/nir_opcodes.py > +++ b/src/compiler/nir/nir_opcodes.py > @@ -179,8 +179,14 @@ for src_t in [tint, tuint, tfloat]: >else: > bit_sizes = [8, 16, 32, 64] >for bit_size in bit_sizes: > - unop_convert("{0}2{1}{2}".format(src_t[0], dst_t[0], bit_size), > - dst_t + str(bit_size), src_t, "src0") > + if bit_size == 16 and dst_t == tfloat and src_t == tfloat: > + rnd_modes = ['rtne', 'rtz'] > + for rnd_mode in rnd_modes: > + unop_convert("{0}2{1}{2}_{3}".format(src_t[0], > dst_t[0], > + bit_size, > rnd_mode), > + dst_t + str(bit_size), src_t, "src0") > + unop_convert("{0}2{1}{2}".format(src_t[0], dst_t[0], bit_size), > + dst_t + str(bit_size), src_t, "src0") > > # We'll hand-code the to/from bool conversion opcodes. Because bool > doesn't > # have multiple bit-sizes, we can always infer the size from the other > type. 
> diff --git a/src/compiler/nir/nir_opcodes_c.py b/src/compiler/nir/nir_ > opcodes_c.py > index 02bb4738ed..95a76ea39f 100644 > --- a/src/compiler/nir/nir_opcodes_c.py > +++ b/src/compiler/nir/nir_opcodes_c.py > @@ -30,7 +30,7 @@ template = Template(""" > #include "nir.h" > > nir_op > -nir_type_conversion_op(nir_alu_type src, nir_alu_type dst) > +nir_type_conversion_op(nir_alu_type src, nir_alu_type dst, > nir_rounding_mode rnd) > { > nir_alu_type src_base = (nir_alu_type) nir_alu_type_get_base_type( > src); > nir_alu_type dst_base = (nir_alu_type) nir_alu_type_get_base_type( > dst); > @@ -64,7 +64,20 @@ nir_type_conversion_op(nir_alu_type src, nir_alu_type > dst) > switch (dst_bit_size) { > % for dst_bits in [16, 32, 64]: >case ${dst_bits}: > +%if src_t == 'float' and dst_t == 'float' and > dst_bits == 16: > + switch(rnd) { > +% for rnd_t in ['rtne', 'rtz']: > +case nir_rounding_mode_${rnd_t}: > + return ${'nir_op_{0}2{1}{2}_{3}'.format(src_t[0], > dst_t[0], > + > dst_bits, rnd_t)}; > +% endfor > +default: > + return ${'nir_op_{0}2{1}{2}'.format(src_t[0], > dst_t[0], > + dst_bits)}; > I commented on an earlier version of this series that having a default here makes me nervous. Someone will pass a rounding mode in for a double->float conversion or pass in a rounding mode of
[Mesa-dev] [PATCH] i965: Fix ARB_indirect_parameters logic.
This patch modifies the ARB_indirect_parameters logic in
brw_draw_prims, so that our implementation isn't affected if
another application attempts to use predicates. Previously we
were using a predicate with a DELTAS_EQUAL comparison operation
and relying on the MI_PREDICATE_DATA register being 0. Our code
to initialize MI_PREDICATE_DATA to 0 was incorrect, so we were
accidentally using whatever value was written there. Because the
kernel does not initialize the MI_PREDICATE_DATA register on
hardware context creation, we might inherit the value from whatever
context was last running on the GPU (likely another process).
The Haswell command parser also does not currently allow us to write
the MI_PREDICATE_DATA register. Rather than fixing this and requiring
an updated kernel, we switch to a different approach which uses a
SRCS_EQUAL predicate that makes no assumptions about the states of any
of the predicate registers.

Fixes: piglit.spec.arb_indirect_parameters.tf-count-arrays
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103085
Signed-off-by: Plamena Manolova
CC: Kenneth Graunke
---
 src/mesa/drivers/dri/i965/brw_draw.c | 45 +++-
 1 file changed, 14 insertions(+), 31 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index 58145fad77..0259a75e1d 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -875,7 +875,6 @@ brw_draw_prims(struct gl_context *ctx,
    struct brw_context *brw = brw_context(ctx);
    const struct gl_vertex_array **arrays = ctx->Array._DrawArrays;
    int predicate_state = brw->predicate.state;
-   int combine_op = MI_PREDICATE_COMBINEOP_SET;
    struct brw_transform_feedback_object *xfb_obj =
       (struct brw_transform_feedback_object *) gl_xfb_obj;

@@ -919,49 +918,33 @@ brw_draw_prims(struct gl_context *ctx,
     * to it.
     */

-   if (brw->draw.draw_params_count_bo &&
-       predicate_state == BRW_PREDICATE_STATE_USE_BIT) {
-      /* We need to empty the MI_PREDICATE_DATA register since it might
-       * already be set.
-       */
-
-      BEGIN_BATCH(4);
-      OUT_BATCH(MI_PREDICATE_DATA);
-      OUT_BATCH(0u);
-      OUT_BATCH(MI_PREDICATE_DATA + 4);
-      OUT_BATCH(0u);
-      ADVANCE_BATCH();
-
-      /* We need to combine the results of both predicates.*/
-      combine_op = MI_PREDICATE_COMBINEOP_AND;
-   }
-
    for (i = 0; i < nr_prims; i++) {
       /* Implementation of ARB_indirect_parameters via predicates */
       if (brw->draw.draw_params_count_bo) {
-         struct brw_bo *draw_id_bo = NULL;
-         uint32_t draw_id_offset;
-
-         intel_upload_data(brw, &prims[i].draw_id, 4, 4, &draw_id_bo,
-                           &draw_id_offset);
-
          brw_emit_pipe_control_flush(brw, PIPE_CONTROL_FLUSH_ENABLE);

+         /* Upload the current draw count from the draw parameters buffer to MI_PREDICATE_SRC0.*/
          brw_load_register_mem(brw, MI_PREDICATE_SRC0,
                                brw->draw.draw_params_count_bo,
                                brw->draw.draw_params_count_offset);
-         brw_load_register_mem(brw, MI_PREDICATE_SRC1, draw_id_bo,
-                               draw_id_offset);
+         /* Zero the top 32-bits of MI_PREDICATE_SRC0 */
+         brw_load_register_imm32(brw, MI_PREDICATE_SRC0 + 4, 0);
+         /* Upload the id of the current primitive to MI_PREDICATE_SRC1.*/
+         brw_load_register_imm64(brw, MI_PREDICATE_SRC1, prims[i].draw_id);

          BEGIN_BATCH(1);
-         OUT_BATCH(GEN7_MI_PREDICATE |
-                   MI_PREDICATE_LOADOP_LOADINV | combine_op |
-                   MI_PREDICATE_COMPAREOP_DELTAS_EQUAL);
+         if (i == 0 && brw->predicate.state != BRW_PREDICATE_STATE_USE_BIT) {
+            OUT_BATCH(GEN7_MI_PREDICATE | MI_PREDICATE_LOADOP_LOADINV |
+                      MI_PREDICATE_COMBINEOP_SET |
+                      MI_PREDICATE_COMPAREOP_SRCS_EQUAL);
+         } else {
+            OUT_BATCH(GEN7_MI_PREDICATE |
+                      MI_PREDICATE_LOADOP_LOAD | MI_PREDICATE_COMBINEOP_XOR |
+                      MI_PREDICATE_COMPAREOP_SRCS_EQUAL);
+         }
          ADVANCE_BATCH();

          brw->predicate.state = BRW_PREDICATE_STATE_USE_BIT;
-
-         brw_bo_unreference(draw_id_bo);
       }

       brw_draw_single_prim(ctx, arrays, &prims[i], i, xfb_obj, stream,
--
2.11.0
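The new predicate sequence can be checked on paper with a small software model. The sketch below is an assumption-laden reading of the patch, not a description of the hardware: it models LOADINV + SET as "predicate = !(SRC0 == SRC1)" and LOAD + XOR as "predicate ^= (SRC0 == SRC1)", treats a set predicate as "this draw executes", and ignores the case where conditional rendering has already put the driver in the USE_BIT state. The function name and the stale_predicate parameter are invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how many of nr_prims draws stay enabled when the indirect draw
 * count is draw_count. stale_predicate stands for whatever value another
 * context may have left behind; the result should not depend on it. */
unsigned
count_enabled_draws(unsigned draw_count, unsigned nr_prims,
                    bool stale_predicate)
{
   bool predicate = stale_predicate;
   unsigned enabled = 0;

   for (unsigned i = 0; i < nr_prims; i++) {
      /* SRC0 holds the draw count, SRC1 the id of the current draw. */
      bool srcs_equal = (draw_count == i);

      if (i == 0)
         predicate = !srcs_equal;              /* LOADINV + SET */
      else
         predicate = (predicate != srcs_equal); /* LOAD + XOR */

      if (predicate)
         enabled++;
   }
   return enabled;
}
```

Under these assumptions the predicate is overwritten on the first draw and flips to false exactly once, when the draw id reaches the count, so min(draw_count, nr_prims) draws execute regardless of the stale value.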
Re: [Mesa-dev] [PATCH 2/4] i965/gen10: Implement WaForceRCPFEHangWorkaround
On Mon, Oct 02, 2017 at 04:07:58PM -0700, Anuj Phogat wrote:
> Cc: mesa-sta...@lists.freedesktop.org
> Signed-off-by: Anuj Phogat
> ---
>  src/mesa/drivers/dri/i965/brw_pipe_control.c | 19 +++
>  1 file changed, 19 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> index 6326957a7a..3192d31758 100644
> --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> @@ -89,6 +89,22 @@ gen7_cs_stall_every_four_pipe_controls(struct brw_context *brw, uint32_t flags)
>     return 0;
>  }
>
> +/* #1130 from gen10 workarounds page in h/w specs:
> + * "If a PIPE_CONTROL performs Render Target Cache Flush, function sets stall
      ^
Was this meant to be a quote? I don't see this text on this page.

> + * at Pixel Scoreboard. Otherwise, the function assumes that PIPE_CONTROL
> + * performs Post Sync Operation and WA sets Depth Stall Enable.
> + *

Why assume instead of checking bits 14, 15 and 23 of `flags`?

-Nanley

> + * Applicable to CNL B0 and C0 steppings only.
> + */
> +static void
> +gen10_add_rcpfe_workaround_bits(uint32_t *flags)
> +{
> +   if ((*flags & PIPE_CONTROL_RENDER_TARGET_FLUSH) != 0)
> +      *flags = *flags | PIPE_CONTROL_STALL_AT_SCOREBOARD;
> +   else
> +      *flags = *flags | PIPE_CONTROL_DEPTH_STALL;
> +}
> +
>  static void
>  brw_emit_pipe_control(struct brw_context *brw, uint32_t flags,
>                        struct brw_bo *bo, uint32_t offset, uint64_t imm)
> @@ -109,6 +125,9 @@ brw_emit_pipe_control(struct brw_context *brw, uint32_t flags,
>           brw_emit_pipe_control_flush(brw, 0);
>        }
>
> +      if (devinfo->gen == 10)
> +         gen10_add_rcpfe_workaround_bits(&flags);
> +
>        BEGIN_BATCH(6);
>        OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2));
>        OUT_BATCH(flags);
> --
> 2.13.5
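A standalone C sketch of the flag fix-up in this patch may make the control flow easier to follow. The bit positions below are placeholders chosen for the sketch and should not be read as the driver's real PIPE_CONTROL_* definitions; flags_for_gen() is a hypothetical wrapper that stands in for the call site in brw_emit_pipe_control().

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit assignments, for illustration only. */
#define PIPE_CONTROL_STALL_AT_SCOREBOARD (1u << 1)
#define PIPE_CONTROL_RENDER_TARGET_FLUSH (1u << 12)
#define PIPE_CONTROL_DEPTH_STALL         (1u << 13)

/* Mirrors the patch: a render target flush gets a scoreboard stall,
 * anything else gets a depth stall. */
void
add_rcpfe_workaround_bits(uint32_t *flags)
{
   if ((*flags & PIPE_CONTROL_RENDER_TARGET_FLUSH) != 0)
      *flags |= PIPE_CONTROL_STALL_AT_SCOREBOARD;
   else
      *flags |= PIPE_CONTROL_DEPTH_STALL;
}

/* Hypothetical call site: only gen10 applies the workaround, and the
 * caller's flags are passed by address so the helper can modify them. */
uint32_t
flags_for_gen(int gen, uint32_t flags)
{
   if (gen == 10)
      add_rcpfe_workaround_bits(&flags);
   return flags;
}
```

As written, the else branch fires for every PIPE_CONTROL without a render target flush, whether or not it carries a post-sync operation; that unconditional assumption is what the review question about checking the individual bits is pointing at.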
Re: [Mesa-dev] [PATCH] i965: Disable L3 cache allocation for external buffers
Bump. Any update on this? On Tue, 2017-10-24 at 14:03 -0700, Jason Ekstrand wrote: > On Tue, Oct 24, 2017 at 9:06 AM, Chris Wilson> wrote: > > Through the use of mocs, we can define the cache usage for any surface > > > > used by the GPU. In particular, we can request that L3 cache be > > > > allocated for either a read/write miss so that subsequent reads can be > > > > fetched from cache rather than memory. A consequence of this is that if > > > > we allocate a L3/LLC cacheline for a read and the object is changed in > > > > main memory (e.g. a PCIe write bypassing the CPU) then the next read > > > > will be serviced from the stale cache and not from the new data in > > > > memory. This is an issue for external PRIME buffers where we may miss > > > > the updates entirely if the image is small enough to fit within our > > > > cache. > > > > > > > > Currently, we have a single bit to mark all external buffers so use that > > > > to tell us when it is unsafe to use a cache override in mocs and > > > > fallback to the PTE value instead (which should be set to the correct > > > > cache level to be coherent amongst all active parties: PRIME, scanout and > > > > render). This may be refined in future to limit the override to buffers > > > > outside the control of mesa; as buffers being shared between mesa > > > > clients should be able to coordinate themselves without resolves. 
> > > > > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101691 > > > > Cc: Kenneth Graunke > > > > Cc: Jason Ekstrand > > > > Cc: Lyude Paul > > > > Cc: Timo Aalton > > > > Cc: Ben Widawsky > > > > Cc: Daniel Vetter > > > > --- > > > > src/intel/blorp/blorp.c | 1 + > > > > src/intel/blorp/blorp.h | 1 + > > > > src/intel/blorp/blorp_genX_exec.h| 2 +- > > > > src/intel/blorp/blorp_priv.h | 1 + > > > > src/mesa/drivers/dri/i965/brw_blorp.c| 1 + > > > > src/mesa/drivers/dri/i965/brw_state.h| 3 ++- > > > > src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 16 +++- > > > > 7 files changed, 18 insertions(+), 7 deletions(-) > > > > > > > > diff --git a/src/intel/blorp/blorp.c b/src/intel/blorp/blorp.c > > > > index 7cc6335f2f..459ad66652 100644 > > > > --- a/src/intel/blorp/blorp.c > > > > +++ b/src/intel/blorp/blorp.c > > > > @@ -71,6 +71,7 @@ brw_blorp_surface_info_init(struct blorp_context *blorp, > > > > surf->surf->logical_level0_px.array_len)); > > > > > > > > info->enabled = true; > > > > + info->external = surf->external; > > > > > > > > if (format == ISL_FORMAT_UNSUPPORTED) > > > >format = surf->surf->format; > > > > diff --git a/src/intel/blorp/blorp.h b/src/intel/blorp/blorp.h > > > > index 9716c66302..af056c9d52 100644 > > > > --- a/src/intel/blorp/blorp.h > > > > +++ b/src/intel/blorp/blorp.h > > > > @@ -106,6 +106,7 @@ struct blorp_surf > > > > enum isl_aux_usage aux_usage; > > > > > > > > union isl_color_value clear_color; > > > > + bool external; > > > > }; > > > > > > > > void > > > > diff --git a/src/intel/blorp/blorp_genX_exec.h > > b/src/intel/blorp/blorp_genX_exec.h > > > > index 5389262098..18715788ff 100644 > > > > --- a/src/intel/blorp/blorp_genX_exec.h > > > > +++ b/src/intel/blorp/blorp_genX_exec.h > > > > @@ -1328,7 +1328,7 @@ blorp_emit_surface_states(struct blorp_batch *batch, > > > > blorp_emit_surface_state(batch, >src, > > > >surface_maps[BLORP_TEXTURE_BT_INDEX], > > > > > > surface_offsets[BLORP_TEXTURE_BT_INDEX], > > > > - 
NULL, false); > > > > + NULL, params->src.external); > > > >} > > > > } > > > > > > > > diff --git a/src/intel/blorp/blorp_priv.h b/src/intel/blorp/blorp_priv.h > > > > index c7d5d308da..f841aa7cdc 100644 > > > > --- a/src/intel/blorp/blorp_priv.h > > > > +++ b/src/intel/blorp/blorp_priv.h > > > > @@ -47,6 +47,7 @@ enum { > > > > struct brw_blorp_surface_info > > > > { > > > > bool enabled; > > > > + bool external; > > > > > > > > struct isl_surf surf; > > > > struct blorp_address addr; > > > > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c > > b/src/mesa/drivers/dri/i965/brw_blorp.c > > > > index ed4f9870f2..563d13a037 100644 > > > > --- a/src/mesa/drivers/dri/i965/brw_blorp.c > > > > +++ b/src/mesa/drivers/dri/i965/brw_blorp.c > > > > @@ -160,6 +160,7 @@ blorp_surf_for_miptree(struct brw_context *brw, > > > >.offset = mt->offset, > > > >.reloc_flags = is_render_target ? EXEC_OBJECT_WRITE : 0, > > > > }; > > > > + surf->external =
Re: [Mesa-dev] [PATCH v3 43/48] nir/lower_subgroups: Lower ballot intrinsics to the specified bit size
On Mon, Oct 30, 2017 at 11:53 AM, Jason Ekstrandwrote: > On Mon, Oct 30, 2017 at 5:10 AM, Iago Toral wrote: > >> On Wed, 2017-10-25 at 16:26 -0700, Jason Ekstrand wrote: >> > Ballot intrinsics return a bitfield of subgroups. In GLSL and some >> > SPIR-V extensions, they return a uint64_t. In SPV_KHR_shader_ballot, >> > they return a uvec4. Also, some back-ends would rather pass around >> > 32-bit values because it's easier than messing with 64-bit all the >> > time. >> > To solve this mess, we make nir_lower_subgroups take a new parameter >> > called ballot_bit_size and it lowers whichever thing it gets in from >> > the >> > source language (uint64_t or uvec4) to a scalar with the specified >> > number of bits. This replaces a chunk of the old lowering code. >> > >> > Reviewed-by: Lionel Landwerlin >> > --- >> > src/compiler/nir/nir.h | 3 +- >> > src/compiler/nir/nir_lower_subgroups.c | 101 >> > +++-- >> > src/compiler/nir/nir_opt_intrinsics.c | 18 -- >> > src/intel/compiler/brw_compiler.c | 1 - >> > src/intel/compiler/brw_nir.c | 1 + >> > 5 files changed, 98 insertions(+), 26 deletions(-) >> > >> > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h >> > index 1a25d7b..563b57f 100644 >> > --- a/src/compiler/nir/nir.h >> > +++ b/src/compiler/nir/nir.h >> > @@ -1854,8 +1854,6 @@ typedef struct nir_shader_compiler_options { >> > */ >> > bool use_interpolated_input_intrinsics; >> > >> > - unsigned max_subgroup_size; >> > - >> > unsigned max_unroll_iterations; >> > } nir_shader_compiler_options; >> > >> > @@ -2469,6 +2467,7 @@ bool nir_lower_samplers_as_deref(nir_shader >> > *shader, >> > const struct gl_shader_program >> > *shader_program); >> > >> > typedef struct nir_lower_subgroups_options { >> > + uint8_t ballot_bit_size; >> > bool lower_to_scalar:1; >> > bool lower_vote_trivial:1; >> > bool lower_subgroup_masks:1; >> > diff --git a/src/compiler/nir/nir_lower_subgroups.c >> > b/src/compiler/nir/nir_lower_subgroups.c >> > index 02738c4..1969740 100644 >> > 
--- a/src/compiler/nir/nir_lower_subgroups.c >> > +++ b/src/compiler/nir/nir_lower_subgroups.c >> > @@ -28,6 +28,43 @@ >> > * \file nir_opt_intrinsics.c >> > */ >> > >> > +/* Converts a uint32_t or uint64_t value to uint64_t or uvec4 */ >> > +static nir_ssa_def * >> > +uint_to_ballot_type(nir_builder *b, nir_ssa_def *value, >> > +unsigned num_components, unsigned bit_size, >> > +uint32_t extend_val) >> > +{ >> > + assert(value->num_components == 1); >> > + assert(value->bit_size == 32 || value->bit_size == 64); >> > + >> > + nir_ssa_def *extend = nir_imm_int(b, extend_val); >> >> Is it required that we do this extension? would it be incorrect if we >> extended with 0's? >> > > Thanks for making me look at that. The Vulkan spec requires that they be > set to 0. I guess I need to go rework some things. :) > I did a bit more looking and things here are thoroughly insane. I've filed CTS and spec bugs. Let's see where they go before I rewrite anything. > > + if (num_components > 1) { >> > + /* SPIR-V uses a uvec4 for ballot values */ >> > + assert(num_components == 4); >> > + assert(bit_size == 32); >> > + >> > + if (value->bit_size == 32) { >> > + return nir_vec4(b, value, extend, extend, extend); >> > + } else { >> > + assert(value->bit_size == 64); >> > + return nir_vec4(b, nir_unpack_64_2x32_split_x(b, value), >> > +nir_unpack_64_2x32_split_y(b, value), >> > +extend, extend); >> > + } >> > + } else { >> > + /* GLSL uses a uint64_t for ballot values */ >> > + assert(num_components == 1); >> > + assert(bit_size == 64); >> > + >> > + if (value->bit_size == 32) { >> > + return nir_pack_64_2x32_split(b, value, extend); >> > + } else { >> > + assert(value->bit_size == 64); >> > + return value; >> > + } >> > + } >> > +} >> > + >> > static nir_ssa_def * >> > lower_read_invocation_to_scalar(nir_builder *b, nir_intrinsic_instr >> > *intrin) >> > { >> > @@ -86,24 +123,78 @@ lower_subgroups_intrin(nir_builder *b, >> > nir_intrinsic_instr *intrin, >> >if 
(!options->lower_subgroup_masks) >> > return NULL; >> > >> > + uint64_t mask; >> > + switch (intrin->intrinsic) { >> > + case nir_intrinsic_load_subgroup_eq_mask: >> > + mask = 1ull; >> > + break; >> > + case nir_intrinsic_load_subgroup_ge_mask: >> > + case nir_intrinsic_load_subgroup_lt_mask: >> > + mask = ~0ull; >> > + break; >> > + case nir_intrinsic_load_subgroup_gt_mask: >> > + case nir_intrinsic_load_subgroup_le_mask: >> > + mask = ~1ull; >> > + break; >> > +
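The widening performed by uint_to_ballot_type() in the quoted patch can be mirrored in plain C to show where the extension value ends up. This is a sketch only: struct uvec4 and both function names are invented here, and the low/high word order assumes that the 2x32 split helpers treat x as the low half and y as the high half.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a four-component 32-bit vector (the SPIR-V ballot type). */
struct uvec4 {
   uint32_t x, y, z, w;
};

/* 64-bit scalar ballot -> uvec4: low word, high word, then two
 * components filled with the extension value. */
struct uvec4
ballot64_to_uvec4(uint64_t value, uint32_t extend)
{
   struct uvec4 v = {
      (uint32_t)value,
      (uint32_t)(value >> 32),
      extend,
      extend,
   };
   return v;
}

/* 32-bit scalar ballot -> uint64_t (the GLSL ballot type): the extension
 * value becomes the high word. */
uint64_t
ballot32_to_uint64(uint32_t value, uint32_t extend)
{
   return (uint64_t)value | ((uint64_t)extend << 32);
}
```

Passing 0 as the extension value gives the zero-filled upper bits that the reply above says the Vulkan spec requires; any other value shows up verbatim in the unused components.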
Re: [Mesa-dev] [PATCH 4/4] i965/gen10: Implement Wa3DStateMode
On Mon, Oct 30, 2017 at 12:28:53PM -0700, Anuj Phogat wrote:
> On Mon, Oct 30, 2017 at 11:08 AM, Nanley Chery wrote:
> > On Mon, Oct 02, 2017 at 04:08:00PM -0700, Anuj Phogat wrote:
> >> Cc: mesa-sta...@lists.freedesktop.org
> >> Signed-off-by: Anuj Phogat
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_state_upload.c | 7 +--
> >>  1 file changed, 5 insertions(+), 2 deletions(-)
> >>
> >
> > Assuming my comment in patch 3 is correct, we no longer have a need to
> > program this register. Therefore, we also don't need to implement this
> > workaround right?
>
> I think we should still keep the workaround so that we don't miss it if we
> start making use of this register in future. We've to enable dw1 bit:6 for few
> CNL SKUs. Look at the bit description.
>

That's a good idea.

> >
> > -Nanley
> >
> >> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
> >> index a1bf54dc72..c224355a2b 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> >> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> >> @@ -88,8 +88,11 @@ brw_upload_initial_gpu_state(struct brw_context *brw)
> >>     if (devinfo->gen == 10) {
> >>        BEGIN_BATCH(2);
> >>        OUT_BATCH(_3DSTATE_3D_MODE << 16 | (2 - 2));
> >> -      OUT_BATCH(GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE << 16 |
> >> -                GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE);
> >> +      /* From gen10 workaround table in h/w specs:
> >> +       * "On 3DSTATE_3D_MODE, driver must always program bits 31:16 of DW1
> >> +       * a value of 0x"
> >> +       */
> >> +      OUT_BATCH(0x << 16 | GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE);
> >>        ADVANCE_BATCH();
> >>     }
> >>
> >> --
> >> 2.13.5
[Mesa-dev] [PATCH] meson: use dep_m in libgallium
u_format_other.c uses sqrtf, which on some systems requires a math
library. So let's make sure we link with it.

Signed-off-by: Erik Faye-Lund
---
I noticed this while debugging something else, thought I'd just send
it upstream directly.

 src/gallium/auxiliary/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/meson.build b/src/gallium/auxiliary/meson.build
index bb7c0506d8..eed7064792 100644
--- a/src/gallium/auxiliary/meson.build
+++ b/src/gallium/auxiliary/meson.build
@@ -496,7 +496,7 @@ libgallium = static_library(
   ],
   c_args : [c_vis_args, c_msvc_compat_args],
   cpp_args : [cpp_vis_args, cpp_msvc_compat_args],
-  dependencies : [dep_libdrm, dep_llvm, dep_unwind, dep_dl],
+  dependencies : [dep_libdrm, dep_llvm, dep_unwind, dep_dl, dep_m],
   build_by_default : false,
 )

--
2.11.0
Re: [Mesa-dev] [PATCH 3/4] i965/gen10: Enable float blend optimization
On Mon, Oct 30, 2017 at 11:03 AM, Nanley Cherywrote: > On Mon, Oct 02, 2017 at 04:07:59PM -0700, Anuj Phogat wrote: >> This optimization is enabled for previous generations too. >> See Mesa commit c17e214a6b >> On CNL this bit is moved to 3DSTATE_3D_MODE. > > Is this true? Looking at the HW docs, I actually found this bit to exist > in CACHE_MODE_SS. Bit 9 of 3DSTATE_3D_MODE shows that it's reserved MBZ. > You're right. Seems like things changed in the docs. I'll drop this patch. > -Nanley > >> >> Cc: mesa-sta...@lists.freedesktop.org >> Signed-off-by: Anuj Phogat >> --- >> src/mesa/drivers/dri/i965/brw_defines.h | 3 +++ >> src/mesa/drivers/dri/i965/brw_state_upload.c | 8 >> 2 files changed, 11 insertions(+) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h >> b/src/mesa/drivers/dri/i965/brw_defines.h >> index 270cdf29db..743b9d0a0d 100644 >> --- a/src/mesa/drivers/dri/i965/brw_defines.h >> +++ b/src/mesa/drivers/dri/i965/brw_defines.h >> @@ -1333,6 +1333,9 @@ enum brw_pixel_shader_coverage_mask_mode { >> /* DW2: start address */ >> /* DW3: end address. 
*/ >> >> +#define _3DSTATE_3D_MODE 0x791E >> +# define GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE (1 << 9) >> + >> #define CMD_MI_FLUSH 0x0200 >> >> # define BLT_X_SHIFT 0 >> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c >> b/src/mesa/drivers/dri/i965/brw_state_upload.c >> index 7b31aad170..a1bf54dc72 100644 >> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c >> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c >> @@ -85,6 +85,14 @@ brw_upload_initial_gpu_state(struct brw_context *brw) >>} >> } >> >> + if (devinfo->gen == 10) { >> + BEGIN_BATCH(2); >> + OUT_BATCH(_3DSTATE_3D_MODE << 16 | (2 - 2)); >> + OUT_BATCH(GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE << 16 | >> +GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE); >> + ADVANCE_BATCH(); >> + } >> + >> if (devinfo->gen >= 8) { >>gen8_emit_3dstate_sample_pattern(brw); >> >> -- >> 2.13.5 >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/4] i965/gen10: Implement Wa3DStateMode
On Mon, Oct 30, 2017 at 11:08 AM, Nanley Cherywrote: > On Mon, Oct 02, 2017 at 04:08:00PM -0700, Anuj Phogat wrote: >> Cc: mesa-sta...@lists.freedesktop.org >> Signed-off-by: Anuj Phogat >> --- >> src/mesa/drivers/dri/i965/brw_state_upload.c | 7 +-- >> 1 file changed, 5 insertions(+), 2 deletions(-) >> > > Assuming my comment in patch 3 is correct, we no longer have a need to > program this register. Therefore, we also don't need to implement this > workaround right? I think we should still keep the workaround so that we don't miss it if we start making use of this register in future. We've to enable dw1 bit:6 for few CNL SKUs. Look at the bit description. > > -Nanley > >> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c >> b/src/mesa/drivers/dri/i965/brw_state_upload.c >> index a1bf54dc72..c224355a2b 100644 >> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c >> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c >> @@ -88,8 +88,11 @@ brw_upload_initial_gpu_state(struct brw_context *brw) >> if (devinfo->gen == 10) { >>BEGIN_BATCH(2); >>OUT_BATCH(_3DSTATE_3D_MODE << 16 | (2 - 2)); >> - OUT_BATCH(GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE << 16 | >> -GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE); >> + /* From gen10 workaround table in h/w specs: >> + * "On 3DSTATE_3D_MODE, driver must always program bits 31:16 of DW1 >> + * a value of 0x" >> + */ >> + OUT_BATCH(0x << 16 | GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE); >>ADVANCE_BATCH(); >> } >> >> -- >> 2.13.5 >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3 43/48] nir/lower_subgroups: Lower ballot intrinsics to the specified bit size
On Mon, Oct 30, 2017 at 5:10 AM, Iago Toralwrote: > On Wed, 2017-10-25 at 16:26 -0700, Jason Ekstrand wrote: > > Ballot intrinsics return a bitfield of subgroups. In GLSL and some > > SPIR-V extensions, they return a uint64_t. In SPV_KHR_shader_ballot, > > they return a uvec4. Also, some back-ends would rather pass around > > 32-bit values because it's easier than messing with 64-bit all the > > time. > > To solve this mess, we make nir_lower_subgroups take a new parameter > > called ballot_bit_size and it lowers whichever thing it gets in from > > the > > source language (uint64_t or uvec4) to a scalar with the specified > > number of bits. This replaces a chunk of the old lowering code. > > > > Reviewed-by: Lionel Landwerlin > > --- > > src/compiler/nir/nir.h | 3 +- > > src/compiler/nir/nir_lower_subgroups.c | 101 > > +++-- > > src/compiler/nir/nir_opt_intrinsics.c | 18 -- > > src/intel/compiler/brw_compiler.c | 1 - > > src/intel/compiler/brw_nir.c | 1 + > > 5 files changed, 98 insertions(+), 26 deletions(-) > > > > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h > > index 1a25d7b..563b57f 100644 > > --- a/src/compiler/nir/nir.h > > +++ b/src/compiler/nir/nir.h > > @@ -1854,8 +1854,6 @@ typedef struct nir_shader_compiler_options { > > */ > > bool use_interpolated_input_intrinsics; > > > > - unsigned max_subgroup_size; > > - > > unsigned max_unroll_iterations; > > } nir_shader_compiler_options; > > > > @@ -2469,6 +2467,7 @@ bool nir_lower_samplers_as_deref(nir_shader > > *shader, > > const struct gl_shader_program > > *shader_program); > > > > typedef struct nir_lower_subgroups_options { > > + uint8_t ballot_bit_size; > > bool lower_to_scalar:1; > > bool lower_vote_trivial:1; > > bool lower_subgroup_masks:1; > > diff --git a/src/compiler/nir/nir_lower_subgroups.c > > b/src/compiler/nir/nir_lower_subgroups.c > > index 02738c4..1969740 100644 > > --- a/src/compiler/nir/nir_lower_subgroups.c > > +++ b/src/compiler/nir/nir_lower_subgroups.c > > @@ -28,6 
+28,43 @@ > > * \file nir_opt_intrinsics.c > > */ > > > > +/* Converts a uint32_t or uint64_t value to uint64_t or uvec4 */ > > +static nir_ssa_def * > > +uint_to_ballot_type(nir_builder *b, nir_ssa_def *value, > > +unsigned num_components, unsigned bit_size, > > +uint32_t extend_val) > > +{ > > + assert(value->num_components == 1); > > + assert(value->bit_size == 32 || value->bit_size == 64); > > + > > + nir_ssa_def *extend = nir_imm_int(b, extend_val); > > Is it required that we do this extension? would it be incorrect if we > extended with 0's? > Thanks for making me look at that. The Vulkan spec requires that they be set to 0. I guess I need to go rework some things. :) > > + if (num_components > 1) { > > + /* SPIR-V uses a uvec4 for ballot values */ > > + assert(num_components == 4); > > + assert(bit_size == 32); > > + > > + if (value->bit_size == 32) { > > + return nir_vec4(b, value, extend, extend, extend); > > + } else { > > + assert(value->bit_size == 64); > > + return nir_vec4(b, nir_unpack_64_2x32_split_x(b, value), > > +nir_unpack_64_2x32_split_y(b, value), > > +extend, extend); > > + } > > + } else { > > + /* GLSL uses a uint64_t for ballot values */ > > + assert(num_components == 1); > > + assert(bit_size == 64); > > + > > + if (value->bit_size == 32) { > > + return nir_pack_64_2x32_split(b, value, extend); > > + } else { > > + assert(value->bit_size == 64); > > + return value; > > + } > > + } > > +} > > + > > static nir_ssa_def * > > lower_read_invocation_to_scalar(nir_builder *b, nir_intrinsic_instr > > *intrin) > > { > > @@ -86,24 +123,78 @@ lower_subgroups_intrin(nir_builder *b, > > nir_intrinsic_instr *intrin, > >if (!options->lower_subgroup_masks) > > return NULL; > > > > + uint64_t mask; > > + switch (intrin->intrinsic) { > > + case nir_intrinsic_load_subgroup_eq_mask: > > + mask = 1ull; > > + break; > > + case nir_intrinsic_load_subgroup_ge_mask: > > + case nir_intrinsic_load_subgroup_lt_mask: > > + mask = ~0ull; > > + break; > > + case 
nir_intrinsic_load_subgroup_gt_mask: > > + case nir_intrinsic_load_subgroup_le_mask: > > + mask = ~1ull; > > + break; > > + default: > > + unreachable("you seriously can't tell this is > > unreachable?"); > > + } > > + > >nir_ssa_def *count = nir_load_subgroup_invocation(b); > > + nir_ssa_def *shifted; > > + if (options->ballot_bit_size == 32 && intrin- > > >dest.ssa.bit_size == 32) { > > Maybe add a comment here stating that this is the
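As a side note on the extension question raised earlier in this message, the following is a minimal C sketch of the widening that uint_to_ballot_type performs, assuming the unused upper bits are filled with zeros as the reply says the Vulkan spec requires. `uvec4_t`, `ballot_to_uvec4` and `ballot_to_uint64` are hypothetical names for illustration only and are not part of NIR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a SPIR-V uvec4 ballot value. */
typedef struct {
   uint32_t c[4];
} uvec4_t;

/* Widen a 32- or 64-bit ballot to a uvec4; unused components stay 0. */
static uvec4_t
ballot_to_uvec4(uint64_t value, unsigned src_bit_size)
{
   assert(src_bit_size == 32 || src_bit_size == 64);

   uvec4_t out = { { 0, 0, 0, 0 } };
   out.c[0] = (uint32_t)value;             /* low 32 bits */
   if (src_bit_size == 64)
      out.c[1] = (uint32_t)(value >> 32);  /* high 32 bits */
   return out;
}

/* Widen a 32- or 64-bit ballot to the GLSL-style uint64_t form. */
static uint64_t
ballot_to_uint64(uint64_t value, unsigned src_bit_size)
{
   assert(src_bit_size == 32 || src_bit_size == 64);

   /* A 32-bit source only contributes its low half; the rest is 0. */
   return src_bit_size == 32 ? (value & 0xffffffffull) : value;
}
```

Zero-filling means a consumer that counts or scans bits in the wider value never sees invocations that do not exist.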
Re: [Mesa-dev] [PATCH v3 39/48] nir: Add a new subgroups lowering pass
On Mon, Oct 30, 2017 at 4:38 AM, Iago Toralwrote: > On Wed, 2017-10-25 at 16:26 -0700, Jason Ekstrand wrote: > > This commit pulls nir_lower_read_invocations_to_scalar along with > > most > > of the guts of nir_opt_intrinsics (which mostly does subgroup > > lowering) > > into a new nir_lower_subgroups pass. There are various other bits of > > subgroup lowering that we're going to want to do so it makes a bit > > more > > sense to keep it all together in one pass. We also move it in i965 > > to > > happen after nir_lower_system_values to ensure that because we want > > to > > handle the subgroup mask system value intrinsics here. > > --- > > src/compiler/Makefile.sources | 2 +- > > src/compiler/nir/nir.h | 12 +- > > .../nir/nir_lower_read_invocation_to_scalar.c | 112 - > > - > (...) > > diff --git a/src/compiler/nir/nir_opt_intrinsics.c > > b/src/compiler/nir/nir_opt_intrinsics.c > > index 26a0f96..98c8b1a 100644 > > --- a/src/compiler/nir/nir_opt_intrinsics.c > > +++ b/src/compiler/nir/nir_opt_intrinsics.c > > @@ -46,22 +46,14 @@ opt_intrinsics_impl(nir_function_impl *impl) > > > > switch (intrin->intrinsic) { > > case nir_intrinsic_vote_any: > > - case nir_intrinsic_vote_all: { > > -nir_const_value *val = nir_src_as_const_value(intrin- > > >src[0]); > > -if (!val && !b.shader->options->lower_vote_trivial) > > - continue; > > - > > -replacement = nir_ssa_for_src(, intrin->src[0], 1); > > + case nir_intrinsic_vote_all: > > +if (nir_src_as_const_value(intrin->src[0])) > > + replacement = nir_ssa_for_src(, intrin->src[0], 1); > > Isn't this redundant with what is being done in nir_lower_subgroups.c? > I was expectin that this code here would all be removed. > The old code was handling two cases: trivial lowering and constant folding. The former is now part of lowering but the later is an optimization that needs to be run as part of the optimization loop. One could make a case that these could be rolled into nir_opt_constant_folding. 
> > break; > > - } > > - case nir_intrinsic_vote_eq: { > > -nir_const_value *val = nir_src_as_const_value(intrin- > > >src[0]); > > -if (!val && !b.shader->options->lower_vote_trivial) > > - continue; > > - > > -replacement = nir_imm_int(, NIR_TRUE); > > + case nir_intrinsic_vote_eq: > > +if (nir_src_as_const_value(intrin->src[0])) > > + replacement = nir_imm_int(, NIR_TRUE); > > Same here. > > > break; > > - } > > (...) > > > diff --git a/src/intel/compiler/brw_compiler.c > > b/src/intel/compiler/brw_compiler.c > > index 2f6af7d..a6129e9 100644 > > --- a/src/intel/compiler/brw_compiler.c > > +++ b/src/intel/compiler/brw_compiler.c > > @@ -57,7 +57,6 @@ static const struct nir_shader_compiler_options > > scalar_nir_options = { > > .lower_unpack_snorm_4x8 = true, > > .lower_unpack_unorm_2x16 = true, > > .lower_unpack_unorm_4x8 = true, > > - .lower_subgroup_masks = true, > > .max_subgroup_size = 32, > > .max_unroll_iterations = 32, > > }; > > @@ -80,7 +79,6 @@ static const struct nir_shader_compiler_options > > vector_nir_options = { > > .lower_unpack_unorm_2x16 = true, > > .lower_extract_byte = true, > > .lower_extract_word = true, > > - .lower_vote_trivial = true, > > .max_unroll_iterations = 32, > > }; > > > > @@ -99,7 +97,6 @@ static const struct nir_shader_compiler_options > > vector_nir_options_gen6 = { > > .lower_unpack_unorm_2x16 = true, > > .lower_extract_byte = true, > > .lower_extract_word = true, > > - .lower_vote_trivial = true, > > .max_unroll_iterations = 32, > > }; > > > > diff --git a/src/intel/compiler/brw_nir.c > > b/src/intel/compiler/brw_nir.c > > index e5ff6de..f599f74 100644 > > --- a/src/intel/compiler/brw_nir.c > > +++ b/src/intel/compiler/brw_nir.c > > @@ -620,7 +620,6 @@ brw_preprocess_nir(const struct brw_compiler > > *compiler, nir_shader *nir) > > > > OPT(nir_lower_tex, _options); > > OPT(nir_normalize_cubemap_coords); > > - OPT(nir_lower_read_invocation_to_scalar); > > > > OPT(nir_lower_global_vars_to_local); > > > > @@ -637,6 +636,13 
@@ brw_preprocess_nir(const struct brw_compiler > > *compiler, nir_shader *nir) > > > > OPT(nir_lower_system_values); > > > > + const nir_lower_subgroups_options subgroups_options = { > > + .lower_to_scalar = true, > > + .lower_subgroup_masks = true, > > lower_subgroup_masks was not being set for vector compiles before, so > this is a change. Is this intended? > Yes and no. Matt didn't set it on vec4 shaders because we haven't enabled GL_ARB_shader_ballot on anything earlier than gen8 because it relies on int64. Even with the Vulkan feature, we won't support the geometry pipeline on gen7
Re: [Mesa-dev] [PATCH v3 23/48] intel/fs: Assign constant locations if they haven't been assigned
On Mon, Oct 30, 2017 at 12:43 AM, Iago Toralwrote: > On Fri, 2017-10-27 at 12:43 -0700, Jason Ekstrand wrote: > > On Fri, Oct 27, 2017 at 12:35 AM, Iago Toral wrote: > > This sounds good to me, but I guess it is not really fixing anything, > right? I ask because the subject claims that this patch does something > that the original code was already supposed to be doing. > > > This patch is a bit of an artifact of history. I needed it at one point > in the development of the series but I think it may have ended up not > mattering in the end. I still think it's a nice clean-up. > > > Ok, fair enough. In that case I'd suggest to change the subject to > something like this: > > "intel/fs: use pull constant locations to check for first compile of a > shader" > Done. > Also, I just noticed something wrong: > > (...) > > > > diff --git a/src/intel/compiler/brw_fs.cpp > > b/src/intel/compiler/brw_fs.cpp > > index 52079d3..75139fd 100644 > > --- a/src/intel/compiler/brw_fs.cpp > > +++ b/src/intel/compiler/brw_fs.cpp > > @@ -1956,8 +1956,10 @@ void > > fs_visitor::assign_constant_locations() > > { > > /* Only the first compile gets to decide on locations. */ > > - if (dispatch_width != min_dispatch_width) > > + if (push_constant_loc) { > > + assert(pull_constant_loc); > > > It is impossible that the assert is ever hit. I'd suggest to drop it or > maybe change it by: > The if condition is for push and the assert is for pull. --Jason > assert(dispatch_width != min_dispatch_width); > > But since you get rid of min_dispatch_with in the next patch, maybe just > drop the assert. > > The same in nir_setup_uniforms below. 
> > >return; > > + } > > > > bool is_live[uniforms]; > > memset(is_live, 0, sizeof(is_live)); > > diff --git a/src/intel/compiler/brw_fs_nir.cpp > > b/src/intel/compiler/brw_fs_nir.cpp > > index 7556576..05efee3 100644 > > --- a/src/intel/compiler/brw_fs_nir.cpp > > +++ b/src/intel/compiler/brw_fs_nir.cpp > > @@ -81,8 +81,11 @@ fs_visitor::nir_setup_outputs() > > void > > fs_visitor::nir_setup_uniforms() > > { > > - if (dispatch_width != min_dispatch_width) > > + /* Only the first compile gets to set up uniforms. */ > > + if (push_constant_loc) { > > + assert(pull_constant_loc); > > >return; > > + } > > > > uniforms = nir->num_uniforms / 4; > > } > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv: bail out when binding the same vertex buffers
Signed-off-by: Samuel Pitoiset
---
 src/amd/vulkan/radv_cmd_buffer.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index 00ed7182a7..4b38ece10f 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -2259,14 +2259,28 @@ void radv_CmdBindVertexBuffers(
 {
         RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
         struct radv_vertex_binding *vb = cmd_buffer->state.vertex_bindings;
+        bool changed = false;

         /* We have to defer setting up vertex buffer since we need the buffer
          * stride from the pipeline. */

         assert(firstBinding + bindingCount <= MAX_VBS);
         for (uint32_t i = 0; i < bindingCount; i++) {
-                vb[firstBinding + i].buffer = radv_buffer_from_handle(pBuffers[i]);
-                vb[firstBinding + i].offset = pOffsets[i];
+                uint32_t idx = firstBinding + i;
+
+                if (!changed &&
+                    (vb[idx].buffer != radv_buffer_from_handle(pBuffers[i]) ||
+                     vb[idx].offset != pOffsets[i])) {
+                        changed = true;
+                }
+
+                vb[idx].buffer = radv_buffer_from_handle(pBuffers[i]);
+                vb[idx].offset = pOffsets[i];
+        }
+
+        if (!changed) {
+                /* No state changes. */
+                return;
         }

         cmd_buffer->state.vb_dirty = true;
--
2.14.3
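The idea of the patch can be tried out in isolation with a small standalone C sketch. Everything here (`struct binding`, `struct bind_state`, `MAX_BINDINGS`, `bind_vertex_buffers`) is hypothetical scaffolding rather than radv code; it only models "compare against the cached bindings and skip the dirty flag when nothing changed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BINDINGS 32 /* hypothetical limit, stands in for MAX_VBS */

/* Hypothetical cached binding: an opaque buffer pointer plus an offset. */
struct binding {
   const void *buffer;
   uint64_t offset;
};

struct bind_state {
   struct binding vb[MAX_BINDINGS];
   bool dirty;
};

/* Update the cached bindings; returns true only if something changed. */
static bool
bind_vertex_buffers(struct bind_state *state, uint32_t first, uint32_t count,
                    const void *const *buffers, const uint64_t *offsets)
{
   bool changed = false;

   assert(first + count <= MAX_BINDINGS);
   for (uint32_t i = 0; i < count; i++) {
      struct binding *b = &state->vb[first + i];

      if (b->buffer != buffers[i] || b->offset != offsets[i]) {
         b->buffer = buffers[i];
         b->offset = offsets[i];
         changed = true;
      }
   }

   /* Only flag the state dirty when a binding really differs. */
   if (changed)
      state->dirty = true;

   return changed;
}
```

Binding the same buffers and offsets twice in a row returns false the second time and leaves the dirty flag untouched, which is the redundant work the patch aims to skip.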
Re: [Mesa-dev] [PATCH v3 29/48] intel/cs: Rework the way thread local ID is handled
On Mon, Oct 30, 2017 at 12:33 AM, Iago Toralwrote: > On Fri, 2017-10-27 at 12:37 -0700, Jason Ekstrand wrote: > > On Fri, Oct 27, 2017 at 2:11 AM, Iago Toral wrote: > > On Wed, 2017-10-25 at 16:26 -0700, Jason Ekstrand wrote: > > Previously, brw_nir_lower_intrinsics added the param and then emitted > > a > > load_uniform intrinsic to load it directly. This commit switches > > things > > over to use a specific NIR intrinsic for the thread id. The one > > thing I > > don't like about this approach is that we have to copy > > thread_local_id > > over to the new visitor in import_uniforms. > > It is not clear to me why you are doing this... why do you like this > better? > > > For compute shaders, the SPIR-V subgroups stuff has a gl_subgroupId system > value which subgroup in the dispatch you are. That information is > basically the same as the thread_local_id only off by a factor of the SIMD > size. It's fairly arbitrary, but I figured we might as well switch over to > pushing the value that's defined in SPIR-V. > > > Oh, my question was not about pushing the subgroup id instead of the > thread local id (that is actually done in a later patch, not here) it is > about using a system value and changing the place where we push that last > uniform, which is what you change here. The implementation seems exactly > equivalent to what we had prior to this patch, so I was wondering if there > is any practical advantage in doing it like this. > Not really. It just seemed like, if we have a nir_load_* system value intrinsic, we may as well treat it as a system value like everything else. Assuming it doesn't cause too much pain, I think I'd be ok with dropping this if you really want. 
> Iago > > > --- > > src/compiler/nir/nir_intrinsics.h| 3 ++ > > src/intel/compiler/brw_fs.cpp| 4 +- > > src/intel/compiler/brw_fs.h | 1 + > > src/intel/compiler/brw_fs_nir.cpp| 14 +++ > > src/intel/compiler/brw_nir.h | 3 +- > > src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 53 +- > > -- > > 6 files changed, 32 insertions(+), 46 deletions(-) > > > > diff --git a/src/compiler/nir/nir_intrinsics.h > > b/src/compiler/nir/nir_intrinsics.h > > index cefd18b..47022dd 100644 > > --- a/src/compiler/nir/nir_intrinsics.h > > +++ b/src/compiler/nir/nir_intrinsics.h > > @@ -364,6 +364,9 @@ SYSTEM_VALUE(blend_const_color_a_float, 1, 0, xx, > > xx, xx) > > SYSTEM_VALUE(blend_const_color_rgba_unorm, 1, 0, xx, xx, xx) > > SYSTEM_VALUE(blend_const_color__unorm, 1, 0, xx, xx, xx) > > > > +/* Intel specific system values */ > > +SYSTEM_VALUE(intel_thread_local_id, 1, 0, xx, xx, xx) > > + > > /** > > * Barycentric coordinate intrinsics. > > * > > diff --git a/src/intel/compiler/brw_fs.cpp > > b/src/intel/compiler/brw_fs.cpp > > index 2acd838..c0d4c05 100644 > > --- a/src/intel/compiler/brw_fs.cpp > > +++ b/src/intel/compiler/brw_fs.cpp > > @@ -996,6 +996,7 @@ fs_visitor::import_uniforms(fs_visitor *v) > > this->push_constant_loc = v->push_constant_loc; > > this->pull_constant_loc = v->pull_constant_loc; > > this->uniforms = v->uniforms; > > + this->thread_local_id = v->thread_local_id; > > } > > > > void > > @@ -6781,8 +6782,7 @@ brw_compile_cs(const struct brw_compiler > > *compiler, void *log_data, > > { > > nir_shader *shader = nir_shader_clone(mem_ctx, src_shader); > > shader = brw_nir_apply_sampler_key(shader, compiler, >tex, > > true); > > - > > - brw_nir_lower_cs_intrinsics(shader, prog_data); > > + brw_nir_lower_cs_intrinsics(shader); > > shader = brw_postprocess_nir(shader, compiler, true); > > > > prog_data->local_size[0] = shader->info.cs.local_size[0]; > > diff --git a/src/intel/compiler/brw_fs.h > > b/src/intel/compiler/brw_fs.h > > index da32593..f51a4d8 100644 > > 
--- a/src/intel/compiler/brw_fs.h > > +++ b/src/intel/compiler/brw_fs.h > > @@ -315,6 +315,7 @@ public: > > */ > > int *push_constant_loc; > > > > + fs_reg thread_local_id; > > fs_reg frag_depth; > > fs_reg frag_stencil; > > fs_reg sample_mask; > > diff --git a/src/intel/compiler/brw_fs_nir.cpp > > b/src/intel/compiler/brw_fs_nir.cpp > > index 05efee3..fdc6fc6 100644 > > --- a/src/intel/compiler/brw_fs_nir.cpp > > +++ b/src/intel/compiler/brw_fs_nir.cpp > > @@ -88,6 +88,16 @@ fs_visitor::nir_setup_uniforms() > > } > > > > uniforms = nir->num_uniforms / 4; > > + > > + if (stage == MESA_SHADER_COMPUTE) { > > + /* Add a uniform for the thread local id. It must be the last > > uniform > > + * on the list. > > + */ > > + assert(uniforms == prog_data->nr_params); > > + uint32_t *param = brw_stage_prog_data_add_params(prog_data, > > 1); > > + *param = BRW_PARAM_BUILTIN_THREAD_LOCAL_ID; > > + thread_local_id = fs_reg(UNIFORM, uniforms++, > > BRW_REGISTER_TYPE_UD); > > + } > > } > > > >
Re: [Mesa-dev] [PATCH] meson: implement default driver arguments
Quoting Eric Engestrom (2017-10-30 10:29:25) > On Monday, 2017-10-30 10:21:50 -0700, Dylan Baker wrote: > > This allows drivers to be set by OS/arch in a sane manner. > > > > Signed-off-by: Dylan Baker> > --- > > meson.build | 37 +++-- > > meson_options.txt | 8 > > 2 files changed, 39 insertions(+), 6 deletions(-) > > > > diff --git a/meson.build b/meson.build > > index 24d997b3e0a..436d676d72d 100644 > > --- a/meson.build > > +++ b/meson.build > > @@ -90,7 +90,19 @@ with_dri_r200 = false > > with_dri_nouveau = false > > with_dri_swrast = false > > _drivers = get_option('dri-drivers') > > -if _drivers != '' > > +if _drivers == 'default' > > + if ['linux', 'bsd'].contains(host_machine.system()) > > +if ['x86', 'x86_64'].contains(host_machine.cpu_family()) > > + with_dri_i915 = true > > + with_dri_i965 = true > > + with_dri_r100 = true > > + with_dri_r200 = true > > + with_dri_nouveau = true > > + with_dri = true > > Yes to the idea, but to avoid having different parse paths for the > default case and when drivers are specified, how about > > if _drivers == 'default' > if os_and_arch logic > _drivers = 'i915,i965,r100,r200,nouveau' > else ... > endif > endif Sure, I can do that instead. 
> > > +endif > > +# TODO: PPC, Sparc > > + endif > > +elif _drivers != '' > >_split = _drivers.split(',') > >with_dri_i915 = _split.contains('i915') > >with_dri_i965 = _split.contains('i965') > > @@ -112,7 +124,28 @@ with_gallium_vc5 = false > > with_gallium_etnaviv = false > > with_gallium_imx = false > > _drivers = get_option('gallium-drivers') > > -if _drivers != '' > > +if _drivers == 'default' > > + if ['linux', 'bsd'].contains(host_machine.system()) > > +if ['x86', 'x86_64'].contains(host_machine.cpu_family()) > > + with_gallium_radeonsi = true > > + with_gallium_nouveau = true > > + with_gallium_softpipe = true > > + with_gallium = true > > + with_dri = true > > +elif ['arm', 'aarch64'].contains(host_machine.cpu_family()) > > + with_gallium_pl111 = true > > + with_gallium_vc4 = true > > + with_gallium_vc5 = true > > + with_gallium_freedreno = true > > + with_gallium_etnaviv = true > > + with_gallium_imx = true > > + with_gallium_softpipe = true > > + with_gallium = true > > + with_dri = true > > +endif > > +# TODO: PPC, Sparc > > + endif > > +elif _drivers != '' > >_split = _drivers.split(',') > >with_gallium_pl111 = _split.contains('pl111') > >with_gallium_radeonsi = _split.contains('radeonsi') > > diff --git a/meson_options.txt b/meson_options.txt > > index 74f1e71bf43..0de54f9422d 100644 > > --- a/meson_options.txt > > +++ b/meson_options.txt > > @@ -34,8 +34,8 @@ option( > > option( > >'dri-drivers', > >type : 'string', > > - value : 'i915,i965,r100,r200,nouveau', > > - description : 'comma separated list of dri drivers to build.' > > + value : 'default', > > + description : 'comma separated list of dri drivers to build. 
If this is > > set to default all drivers applicable to the target OS/architecture will be > > built' > > ) > > option( > >'dri-drivers-path', > > @@ -46,8 +46,8 @@ option( > > option( > >'gallium-drivers', > >type : 'string', > > - value : 'pl111,radeonsi,nouveau,freedreno,swrast,vc4,etnaviv,imx', > > - description : 'comma separated list of gallium drivers to build.' > > + value : 'default', > > + description : 'comma separated list of gallium drivers to build. If this > > is set to default all drivers applicable to the target OS/architecture will > > be built' > > ) > > option( > >'gallium-media', > > -- > > 2.14.3 > > signature.asc Description: signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] meson: Don't link gbm with threads
It's supposed to be linked with pthread-stubs (if the platform needs
pthread-stubs). Pthread stubs support isn't (yet) implemented in the
meson build, so add a TODO.

Signed-off-by: Dylan Baker
---
 src/gbm/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gbm/meson.build b/src/gbm/meson.build
index 1bb3c94c387..fc1816cc17a 100644
--- a/src/gbm/meson.build
+++ b/src/gbm/meson.build
@@ -34,7 +34,7 @@ deps_gbm = []

 if with_dri2
   files_gbm += files('backends/dri/gbm_dri.c', 'backends/dri/gbm_driint.h')
-  deps_gbm += [dep_libdrm, dep_thread]
+  deps_gbm += dep_libdrm # TODO: pthread-stubs
   args_gbm += '-DDEFAULT_DRIVER_DIR="@0@"'.format(dri_driver_dir)
 endif
 if with_platform_wayland
--
2.14.3
[Mesa-dev] [PATCH 2/2] meson: set visibility flags on gbm
This is done in autotools, and is an oversight in the meson build.

Signed-off-by: Dylan Baker
---
 src/gbm/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gbm/meson.build b/src/gbm/meson.build
index fc1816cc17a..2fad2a2a8e3 100644
--- a/src/gbm/meson.build
+++ b/src/gbm/meson.build
@@ -50,7 +50,7 @@ libgbm = shared_library(
   include_directories : [
     include_directories('main'), inc_include, inc_src, inc_loader,
     include_directories('../egl/wayland/wayland-drm')],
-  c_args : args_gbm,
+  c_args : [c_vis_args, args_gbm],
   link_args : [ld_args_gc_sections],
   link_with : [links_gbm, libloader, libmesa_util, libxmlconfig],
   dependencies : [deps_gbm, dep_dl],
--
2.14.3
Re: [Mesa-dev] [PATCH v3 19/48] i965/fs/nir: Don't stomp 64-bit values to D in get_nir_src
On Mon, Oct 30, 2017 at 12:15 AM, Iago Toralwrote: > On Fri, 2017-10-27 at 12:21 -0700, Jason Ekstrand wrote: > > On Thu, Oct 26, 2017 at 11:53 PM, Iago Toral wrote: > > On Wed, 2017-10-25 at 16:25 -0700, Jason Ekstrand wrote: > > --- > > src/intel/compiler/brw_fs_nir.cpp | 33 +-- > > -- > > 1 file changed, 21 insertions(+), 12 deletions(-) > > > > diff --git a/src/intel/compiler/brw_fs_nir.cpp > > b/src/intel/compiler/brw_fs_nir.cpp > > index e008e2e..a441f57 100644 > > --- a/src/intel/compiler/brw_fs_nir.cpp > > +++ b/src/intel/compiler/brw_fs_nir.cpp > > @@ -1441,11 +1441,19 @@ fs_visitor::get_nir_src(const nir_src ) > > src.reg.base_offset * src.reg.reg- > > >num_components); > > } > > > > - /* to avoid floating-point denorm flushing problems, set the type > > by > > -* default to D - instructions that need floating point semantics > > will set > > -* this to F if they need to > > -*/ > > - return retype(reg, BRW_REGISTER_TYPE_D); > > + if (nir_src_bit_size(src) == 64 && devinfo->gen == 7) { > > + /* The only 64-bit type available on gen7 is DF, so use that. > > */ > > + reg.type = BRW_REGISTER_TYPE_DF; > > + } else { > > + /* To avoid floating-point denorm flushing problems, set the > > type by > > + * default to an integer type - instructions that need > > floating point > > + * semantics will set this to F if they need to > > + */ > > + reg.type = brw_reg_type_from_bit_size(nir_src_bit_size(src), > > +BRW_REGISTER_TYPE_D); > > + } > > + > > + return reg; > > } > > > > /** > > @@ -1455,6 +1463,10 @@ fs_reg > > fs_visitor::get_nir_src_imm(const nir_src ) > > { > > nir_const_value *val = nir_src_as_const_value(src); > > + /* This function shouldn't be called on anything which can even > > +* possibly be 64 bits as it can't do what it claims. > > +*/ > > What would be wrong with something like this? > > if (nir_src_bit_size(src) == 32) >return val ? fs_reg(brw_imm_d(val->i32[0])) : get_nir_src(src); > else >return val ? 
fs_reg(brw_imm_df(val->f64[0])) : get_nir_src(src); > > > Because double immediates only really work on BDW+ and I didn't want > someone to call this function and get tripped up by that. > > > Ok, fair enough. In that case, maybe I'd suggest to clarify this in the > comment, since otherwise it is a bit confusing (or maybe assert on the > generation rather than the bitsize since that would be more > self-explanatory). > I'm not really clear on what you're asking for. Do you want it to work for 64-bit immediates and just assert on gen7? Or do you want the comment to be more clear? > > > > + assert(nir_src_bit_size(src) == 32); > > return val ? fs_reg(brw_imm_d(val->i32[0])) : get_nir_src(src); > > } > > > > @@ -2648,8 +2660,7 @@ fs_visitor::nir_emit_tcs_intrinsic(const > > fs_builder , > > */ > > unsigned channel = iter * 2 + i; > > fs_reg dest = shuffle_64bit_data_for_32bit_write(bld, > > - retype(offset(value, bld, 2 * channel), > > BRW_REGISTER_TYPE_DF), > > - 1); > > + offset(value, bld, channel), 1); > > > > srcs[header_regs + (i + first_component) * 2] = dest; > > srcs[header_regs + (i + first_component) * 2 + 1] = > > @@ -3505,8 +3516,7 @@ fs_visitor::nir_emit_cs_intrinsic(const > > fs_builder , > >if (nir_src_bit_size(instr->src[0]) == 64) { > > type_size = 8; > > val_reg = shuffle_64bit_data_for_32bit_write(bld, > > -retype(val_reg, BRW_REGISTER_TYPE_DF), > > -instr->num_components); > > +val_reg, instr->num_components); > >} > > > >unsigned type_slots = type_size / 4; > > @@ -4005,8 +4015,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder > > , nir_intrinsic_instr *instr > >if (nir_src_bit_size(instr->src[0]) == 64) { > > type_size = 8; > > val_reg = shuffle_64bit_data_for_32bit_write(bld, > > -retype(val_reg, BRW_REGISTER_TYPE_DF), > > -instr->num_components); > > +val_reg, instr->num_components); > >} > > > >unsigned type_slots = type_size / 4; > > @@ -4063,7 +4072,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder > > , nir_intrinsic_instr *instr > 
>unsigned first_component = nir_intrinsic_component(instr); > >if (nir_src_bit_size(instr->src[0]) == 64) { > > fs_reg tmp = shuffle_64bit_data_for_32bit_write(bld, > > -retype(src, BRW_REGISTER_TYPE_DF), num_components); > > +src, num_components); > > src = tmp; > > num_components *= 2; > >} > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org
Re: [Mesa-dev] [PATCH 4/4] i965/gen10: Implement Wa3DStateMode
On Mon, Oct 02, 2017 at 04:08:00PM -0700, Anuj Phogat wrote:
> Cc: mesa-sta...@lists.freedesktop.org
> Signed-off-by: Anuj Phogat
> ---
>  src/mesa/drivers/dri/i965/brw_state_upload.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>

Assuming my comment in patch 3 is correct, we no longer have a need to
program this register. Therefore, we also don't need to implement this
workaround, right?

-Nanley

> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c
> b/src/mesa/drivers/dri/i965/brw_state_upload.c
> index a1bf54dc72..c224355a2b 100644
> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> @@ -88,8 +88,11 @@ brw_upload_initial_gpu_state(struct brw_context *brw)
>     if (devinfo->gen == 10) {
>        BEGIN_BATCH(2);
>        OUT_BATCH(_3DSTATE_3D_MODE << 16 | (2 - 2));
> -      OUT_BATCH(GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE << 16 |
> -                GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE);
> +      /* From gen10 workaround table in h/w specs:
> +       * "On 3DSTATE_3D_MODE, driver must always program bits 31:16 of DW1
> +       * a value of 0x"
> +       */
> +      OUT_BATCH(0x << 16 | GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE);
>        ADVANCE_BATCH();
>     }
>
> --
> 2.13.5
Re: [Mesa-dev] [PATCH 3/4] i965/gen10: Enable float blend optimization
On Mon, Oct 02, 2017 at 04:07:59PM -0700, Anuj Phogat wrote:
> This optimization is enabled for previous generations too.
> See Mesa commit c17e214a6b
> On CNL this bit is moved to 3DSTATE_3D_MODE.

Is this true? Looking at the HW docs, I actually found this bit to exist in
CACHE_MODE_SS. Bit 9 of 3DSTATE_3D_MODE shows that it's reserved MBZ.

-Nanley

>
> Cc: mesa-sta...@lists.freedesktop.org
> Signed-off-by: Anuj Phogat
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h      | 3 +++
>  src/mesa/drivers/dri/i965/brw_state_upload.c | 8
>  2 files changed, 11 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 270cdf29db..743b9d0a0d 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -1333,6 +1333,9 @@ enum brw_pixel_shader_coverage_mask_mode {
>  /* DW2: start address */
>  /* DW3: end address. */
>
> +#define _3DSTATE_3D_MODE 0x791E
> +# define GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE (1 << 9)
> +
>  #define CMD_MI_FLUSH 0x0200
>
>  # define BLT_X_SHIFT 0
> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c
> b/src/mesa/drivers/dri/i965/brw_state_upload.c
> index 7b31aad170..a1bf54dc72 100644
> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> @@ -85,6 +85,14 @@ brw_upload_initial_gpu_state(struct brw_context *brw)
>        }
>     }
>
> +   if (devinfo->gen == 10) {
> +      BEGIN_BATCH(2);
> +      OUT_BATCH(_3DSTATE_3D_MODE << 16 | (2 - 2));
> +      OUT_BATCH(GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE << 16 |
> +                GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE);
> +      ADVANCE_BATCH();
> +   }
> +
>     if (devinfo->gen >= 8) {
>        gen8_emit_3dstate_sample_pattern(brw);
>
> --
> 2.13.5
Re: [Mesa-dev] [PATCH 4/4] i965/gen10: Implement Wa3DStateMode
On Mon, Oct 30, 2017 at 10:40 AM, Rafael Antognolli <rafael.antogno...@intel.com> wrote:

> On Thu, Oct 05, 2017 at 10:53:42AM -0700, Anuj Phogat wrote:
> > On Wed, Oct 4, 2017 at 9:29 PM, Jason Ekstrand wrote:
> > > On Wed, Oct 4, 2017 at 3:11 PM, Anuj Phogat wrote:
> > >>
> > >> On Mon, Oct 2, 2017 at 7:46 PM, Jason Ekstrand wrote:
> > >> > On Mon, Oct 2, 2017 at 4:08 PM, Anuj Phogat wrote:
> > >> >>
> > >> >> Cc: mesa-sta...@lists.freedesktop.org
> > >> >> Signed-off-by: Anuj Phogat
> > >> >> ---
> > >> >>  src/mesa/drivers/dri/i965/brw_state_upload.c | 7 +--
> > >> >>  1 file changed, 5 insertions(+), 2 deletions(-)
> > >> >>
> > >> >> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c
> > >> >> b/src/mesa/drivers/dri/i965/brw_state_upload.c
> > >> >> index a1bf54dc72..c224355a2b 100644
> > >> >> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> > >> >> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> > >> >> @@ -88,8 +88,11 @@ brw_upload_initial_gpu_state(struct brw_context *brw)
> > >> >>     if (devinfo->gen == 10) {
> > >> >>        BEGIN_BATCH(2);
> > >> >>        OUT_BATCH(_3DSTATE_3D_MODE << 16 | (2 - 2));
> > >> >> -      OUT_BATCH(GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE << 16 |
> > >> >> -                GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE);
> > >> >> +      /* From gen10 workaround table in h/w specs:
> > >> >> +       * "On 3DSTATE_3D_MODE, driver must always program bits 31:16 of DW1
> > >> >> +       * a value of 0x"
> > >> >> +       */
> > >> >> +      OUT_BATCH(0x << 16 | GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE);
> > >> >
> > >> >
> > >> > Bits 31:16 are the mask bits. By programming them to 0x, you're making
> > >> > it write the entire register and not just the float blend optimization
> > >> > enable bit. If we're going to do that, we need to figure out what
> > >> > values we want in the other fields and always set them along with the
> > >> > float blend optimization enable bit.
> > >> >
> > >> Right. After looking at all other fields, I don't think we want to set
> > >> any of them except one. That field is "Slice Hashing Table Enable" which
> > >> says:
> > >> "For gen10, when the total number of subslices enabled is 6,8,10, or
> > >> 12, slice hashing table must be enabled."
> > >>
> > >> I have no idea about slice hashing tables and I think enabling it
> > >> should be handled in a separate patch anyways.
> > >
> > >
> > > What I wonder is what we're using today. I don't think mesa is actually
> > > setting anything other than the default right now but Ken was looking
> > > into it at one point.
> > Right. Mesa is not setting anything and default values are all zero.
>
> If Jason is fine with that, patches 3 and 4 are
>
> Reviewed-by: Rafael Antognolli
>

I'm fine with it so long as we improve the comment a bit. In particular,
how about something like this at the end:

"This means that we end up setting the entire 3D_MODE state and not just
the float blend optimization. The other bits in this register control
things such as slice hashing and we want the default values of zero."
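To make the point about the mask bits concrete, here is a minimal sketch in C of how a masked register write behaves, assuming the convention described in the review: bits 31:16 of the dword select which of bits 15:0 get updated. The helper `masked_write`, the macro `FLOAT_BLEND_OPT`, and the use of `0xFFFF` as the "all mask bits set" value are illustrative assumptions, not code or values taken from the patch or the hardware docs.

```c
#include <assert.h>  /* only needed for the checks in the usage note */
#include <stdint.h>

/* Same bit position as GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE in patch 3. */
#define FLOAT_BLEND_OPT (1u << 9)

/*
 * Hypothetical model of a masked register: the upper 16 bits of the
 * written dword are a per-bit write enable for the lower 16 bits.
 * Bits whose mask bit is clear keep their old value.
 */
static uint16_t masked_write(uint16_t old_value, uint32_t dword)
{
    uint16_t mask = (uint16_t)(dword >> 16);
    uint16_t data = (uint16_t)(dword & 0xFFFFu);

    return (uint16_t)((old_value & (uint16_t)~mask) | (data & mask));
}
```

With a register that already holds 0x0005, writing `(FLOAT_BLEND_OPT << 16) | FLOAT_BLEND_OPT` only touches bit 9 and gives 0x0205, while writing `(0xFFFFu << 16) | FLOAT_BLEND_OPT` overwrites every field and gives 0x0200. That second case is why the review asks what values the other fields should have.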
Re: [Mesa-dev] create src/wsi
On 30 October 2017 at 17:05, Dylan Baker wrote:
> So I think the consensus is this is okay?
>
> Emil, is the autotools right here?
>

Without a clear separation or cleanup of the existing code, such a move
brings no technical benefit. If anything, it makes things harder for people
[roughly] familiar with the current setup.

I was wondering if we can nuke wl_drm, but I don't see that happening
anytime soon :-(

-Emil
Re: [Mesa-dev] [PATCH 4/4] i965/gen10: Implement Wa3DStateMode
On Thu, Oct 05, 2017 at 10:53:42AM -0700, Anuj Phogat wrote:
> On Wed, Oct 4, 2017 at 9:29 PM, Jason Ekstrand wrote:
> > On Wed, Oct 4, 2017 at 3:11 PM, Anuj Phogat wrote:
> >>
> >> On Mon, Oct 2, 2017 at 7:46 PM, Jason Ekstrand wrote:
> >> > On Mon, Oct 2, 2017 at 4:08 PM, Anuj Phogat wrote:
> >> >>
> >> >> Cc: mesa-sta...@lists.freedesktop.org
> >> >> Signed-off-by: Anuj Phogat
> >> >> ---
> >> >>  src/mesa/drivers/dri/i965/brw_state_upload.c | 7 +--
> >> >>  1 file changed, 5 insertions(+), 2 deletions(-)
> >> >>
> >> >> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c
> >> >> b/src/mesa/drivers/dri/i965/brw_state_upload.c
> >> >> index a1bf54dc72..c224355a2b 100644
> >> >> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> >> >> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> >> >> @@ -88,8 +88,11 @@ brw_upload_initial_gpu_state(struct brw_context *brw)
> >> >>     if (devinfo->gen == 10) {
> >> >>        BEGIN_BATCH(2);
> >> >>        OUT_BATCH(_3DSTATE_3D_MODE << 16 | (2 - 2));
> >> >> -      OUT_BATCH(GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE << 16 |
> >> >> -                GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE);
> >> >> +      /* From gen10 workaround table in h/w specs:
> >> >> +       * "On 3DSTATE_3D_MODE, driver must always program bits 31:16 of DW1
> >> >> +       * a value of 0x"
> >> >> +       */
> >> >> +      OUT_BATCH(0x << 16 | GEN10_FLOAT_BLEND_OPTIMIZATION_ENABLE);
> >> >
> >> >
> >> > Bits 31:16 are the mask bits. By programming them to 0x, you're making
> >> > it write the entire register and not just the float blend optimization
> >> > enable bit. If we're going to do that, we need to figure out what
> >> > values we want in the other fields and always set them along with the
> >> > float blend optimization enable bit.
> >> >
> >> Right. After looking at all other fields, I don't think we want to set
> >> any of them except one. That field is "Slice Hashing Table Enable" which
> >> says:
> >> "For gen10, when the total number of subslices enabled is 6,8,10, or
> >> 12, slice hashing table must be enabled."
> >>
> >> I have no idea about slice hashing tables and I think enabling it
> >> should be handled in a separate patch anyways.
> >
> >
> > What I wonder is what we're using today. I don't think mesa is actually
> > setting anything other than the default right now but Ken was looking into
> > it at one point.
> Right. Mesa is not setting anything and default values are all zero.

If Jason is fine with that, patches 3 and 4 are

Reviewed-by: Rafael Antognolli
Re: [Mesa-dev] [PATCH V2 1/4] i965/gen10: Implement WaSampleOffsetIZ workaround
On Mon, Oct 23, 2017 at 08:46:26AM -0700, Anuj Phogat wrote: > Ping. Patches 2-4 in this series are still waiting for review. > Anyone interested? > Thanks! > > > > On Fri, Oct 13, 2017 at 3:35 PM, Rafael Antognolli >wrote: > > Hi Anuj, sorry that I missed this patch. Please see below. > > > > On Fri, Oct 06, 2017 at 04:30:47PM -0700, Anuj Phogat wrote: > >> There are few other (duplicate) workarounds which have similar > >> recommendations: > >> WaFlushHangWhenNonPipelineStateAndMarkerStalled > >> WaCSStallBefore3DSamplePattern > >> WaPipeControlBefore3DStateSamplePattern > >> > >> WaPipeControlBefore3DStateSamplePattern has some extra recommendations if > >> driver is using mid batch context restore. Ignoring it for now because > >> We're > >> not doing mid-batch context restore in Mesa. > >> > >> Cc: mesa-sta...@lists.freedesktop.org > >> Cc: Jason Ekstrand > >> Cc: Rafael Antognolli > >> Signed-off-by: Anuj Phogat > >> --- > >> src/mesa/drivers/dri/i965/brw_context.h| 2 + > >> src/mesa/drivers/dri/i965/brw_defines.h| 1 + > >> src/mesa/drivers/dri/i965/brw_pipe_control.c | 50 > >> ++ > >> src/mesa/drivers/dri/i965/gen8_multisample_state.c | 8 > >> 4 files changed, 61 insertions(+) > >> > >> diff --git a/src/mesa/drivers/dri/i965/brw_context.h > >> b/src/mesa/drivers/dri/i965/brw_context.h > >> index 92fc16de13..f0e8d562e9 100644 > >> --- a/src/mesa/drivers/dri/i965/brw_context.h > >> +++ b/src/mesa/drivers/dri/i965/brw_context.h > >> @@ -1647,6 +1647,8 @@ void brw_emit_post_sync_nonzero_flush(struct > >> brw_context *brw); > >> void brw_emit_depth_stall_flushes(struct brw_context *brw); > >> void gen7_emit_vs_workaround_flush(struct brw_context *brw); > >> void gen7_emit_cs_stall_flush(struct brw_context *brw); > >> +void gen10_emit_wa_cs_stall_flush(struct brw_context *brw); > >> +void gen10_emit_wa_lri_to_cache_mode_zero(struct brw_context *brw); > >> > >> /* brw_queryformat.c */ > >> void brw_query_internal_format(struct gl_context *ctx, GLenum target, > >> 
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > >> b/src/mesa/drivers/dri/i965/brw_defines.h > >> index 4abb790612..270cdf29db 100644 > >> --- a/src/mesa/drivers/dri/i965/brw_defines.h > >> +++ b/src/mesa/drivers/dri/i965/brw_defines.h > >> @@ -1609,6 +1609,7 @@ enum brw_pixel_shader_coverage_mask_mode { > >> #define GEN7_GPGPU_DISPATCHDIMY 0x2504 > >> #define GEN7_GPGPU_DISPATCHDIMZ 0x2508 > >> > >> +#define GEN7_CACHE_MODE_0 0x7000 > >> #define GEN7_CACHE_MODE_1 0x7004 > >> # define GEN9_FLOAT_BLEND_OPTIMIZATION_ENABLE (1 << 4) > >> # define GEN8_HIZ_NP_PMA_FIX_ENABLE(1 << 11) > >> diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c > >> b/src/mesa/drivers/dri/i965/brw_pipe_control.c > >> index 460b8f73b6..156f5c25ec 100644 > >> --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c > >> +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c > >> @@ -278,6 +278,56 @@ gen7_emit_cs_stall_flush(struct brw_context *brw) > >> brw->workaround_bo, 0, 0); > >> } > >> > >> +static void > >> +brw_flush_gpu_caches(struct brw_context *brw) { > >> + brw_emit_pipe_control_flush(brw, > >> + PIPE_CONTROL_CACHE_FLUSH_BITS | > >> + PIPE_CONTROL_CACHE_INVALIDATE_BITS); > >> +} > > > > This function is only calling another function without any extra logic, so I > > would just call brw_emit_pipe_control_flush() and remove this declaration. > > But > > that's just cosmetic. > > > > With or without this change, this patch correctly implements the workaround > > imho, so it is > > > > Reviewed-by: Rafael Antognolli > > > >> +/** > >> + * From Gen10 Workarounds page in h/w specs: > >> + * WaSampleOffsetIZ: > >> + * Prior to the 3DSTATE_SAMPLE_PATTERN driver must ensure there are no > >> + * markers in the pipeline by programming a PIPE_CONTROL with stall. 
> >> + */ > >> +void > >> +gen10_emit_wa_cs_stall_flush(struct brw_context *brw) > >> +{ > >> + const struct gen_device_info *devinfo = >screen->devinfo; > >> + assert(devinfo->gen == 10); > >> + brw_emit_pipe_control_flush(brw, > >> + PIPE_CONTROL_CS_STALL | > >> + PIPE_CONTROL_STALL_AT_SCOREBOARD); > >> +} > >> + > >> +/** > >> + * From Gen10 Workarounds page in h/w specs: > >> + * WaSampleOffsetIZ: > >> + * When 3DSTATE_SAMPLE_PATTERN is programmed, driver must then issue an > >> + * MI_LOAD_REGISTER_IMM command to an offset between 0x7000 and > >> 0x7FFF(SVL) > >> + * after the command to ensure the state has been delivered prior to any > >> + * command causing a marker in the pipeline. > >> + */ > >> +void > >> +gen10_emit_wa_lri_to_cache_mode_zero(struct brw_context *brw) > >> +{
Re: [Mesa-dev] [PATCH 2/2] i965: Check CCS_E compatibility for texture view rendering
On Fri, Oct 27, 2017 at 05:14:16PM -0700, Jason Ekstrand wrote: > On Fri, Oct 27, 2017 at 3:16 PM, Nanley Cherywrote: > > > On Fri, Oct 27, 2017 at 12:52:30PM -0700, Jason Ekstrand wrote: > > > On Fri, Oct 27, 2017 at 12:24 PM, Nanley Chery > > > wrote: > > > > > > > Only use CCS_E to render to a texture that is CCS_E-compatible with the > > > > original texture's miptree (linear) format. This prevents render > > > > operations from writing data that can't be decoded with the original > > > > miptree format. > > > > > > > > On Gen10, with the new CCS_E-enabled formats handled, this enables the > > > > driver to pass the arb_texture_view-rendering-formats piglit test. > > > > > > > > Cc: > > > > Signed-off-by: Nanley Chery > > > > --- > > > > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 28 > > > > +-- > > > > 1 file changed, 26 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > > > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > > > index a850f4d17b..59c57c227b 100644 > > > > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > > > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > > > @@ -241,6 +241,27 @@ intel_miptree_supports_hiz(const struct > > brw_context > > > > *brw, > > > > } > > > > } > > > > > > > > +/** > > > > + * Return true if the format that will be used to access the miptree > > is > > > > + * CCS_E-compatible with the miptree's linear/non-sRGB format. > > > > + * > > > > + * Why use the linear format? Well, although the miptree may be > > specified > > > > with > > > > + * an sRGB format, the usage of that color space/format can be > > toggled. > > > > Since > > > > + * our HW tends to support more linear formats than sRGB ones, we use > > this > > > > + * format variant for check for CCS_E compatibility. 
> > > > + */ > > > > +static bool > > > > +format_ccs_e_compat_with_miptree(const struct gen_device_info > > *devinfo, > > > > + const struct intel_mipmap_tree *mt, > > > > + enum isl_format access_format) > > > > +{ > > > > + assert(mt->aux_usage == ISL_AUX_USAGE_CCS_E); > > > > + > > > > + mesa_format linear_format = _mesa_get_srgb_format_linear( > > mt->format); > > > > + enum isl_format isl_format = brw_isl_format_for_mesa_ > > > > format(linear_format); > > > > + return isl_formats_are_ccs_e_compatible(devinfo, isl_format, > > > > access_format); > > > > +} > > > > + > > > > static bool > > > > intel_miptree_supports_ccs_e(struct brw_context *brw, > > > > const struct intel_mipmap_tree *mt) > > > > @@ -2662,8 +2683,11 @@ intel_miptree_render_aux_usage(struct > > brw_context > > > > *brw, > > > >return mt->mcs_buf ? ISL_AUX_USAGE_CCS_D : ISL_AUX_USAGE_NONE; > > > > > > > > case ISL_AUX_USAGE_CCS_E: { > > > > - /* If the format supports CCS_E, then we can just use it */ > > > > - if (isl_format_supports_ccs_e(>screen->devinfo, > > > > render_format)) > > > > + /* If the format supports CCS_E and is compatible with the > > miptree, > > > > + * then we can use it. > > > > + */ > > > > + if (format_ccs_e_compat_with_miptree(>screen->devinfo, > > > > + mt, render_format)) > > > > > > > > > > You don't need the helper if you just use mt->surf.format. That's what > > > can_texture_with_ccs does. > > > > > > > > > > Isn't using mt->surf.format making the code more restrictive than > > necessary? That field may give you the sRGB format of the surface, but > > the helper would always give you the linear variant. Therefore the > > helper allows for CCS_E to be used in more cases. > > > > You're right. I completely missed that. In that case, we probably want to > use your helper for can_texture_with_ccs as well. Maybe two patches: one > which adds the helper and makes can_texture_with_ccs use it and a second to > add the render target stuff? 
Or you can keep this one as-is and do
> can_texture_with_ccs as a follow-on, I don't really care.

I'm assuming I have your Rb?

I'm thinking of keeping this one as-is. I tried modifying
can_texture_with_ccs to use the helper before sending the patch out, but it
caused one GLES3.1 test to fail and I couldn't figure out why (on jenkins,
nchery build#249):

dEQP-GLES31.functional.srgb_texture_decode.skip_decode.srgba8.texel_fetch

Since this patch is needed for correctness and the other is more of a
performance enhancement, I figured we could get to it later. Any ideas
would be appreciated though.
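The argument for checking against the linear variant rather than mt->surf.format can be sketched with a toy model in C. Everything here is hypothetical: the `toy_format` enum, `to_linear`, and the compatibility rule in `toy_ccs_e_compatible` are stand-ins invented for illustration and do not reflect the real isl format tables or what `isl_formats_are_ccs_e_compatible` actually accepts. The only thing the sketch is meant to show is that converting to the linear format first can only widen the set of accepted access formats.

```c
#include <assert.h>  /* only needed for the checks in the usage note */
#include <stdbool.h>

/* Hypothetical toy formats, not the real isl_format or mesa_format enums. */
enum toy_format { RGBA8_UNORM, RGBA8_SRGB, RGBA8_UINT, R32_FLOAT };

/* Toy stand-in for _mesa_get_srgb_format_linear(). */
static enum toy_format to_linear(enum toy_format f)
{
    return f == RGBA8_SRGB ? RGBA8_UNORM : f;
}

/*
 * Assumed rule for illustration only: sRGB formats are never compatible,
 * identical formats are, and the 8-bit linear RGBA formats are compatible
 * with each other.
 */
static bool toy_ccs_e_compatible(enum toy_format a, enum toy_format b)
{
    if (a == RGBA8_SRGB || b == RGBA8_SRGB)
        return false;
    if (a == b)
        return true;
    return (a == RGBA8_UNORM || a == RGBA8_UINT) &&
           (b == RGBA8_UNORM || b == RGBA8_UINT);
}

/* Check against the surface format as stored (may be sRGB). */
static bool compat_via_surface_format(enum toy_format surf,
                                      enum toy_format access)
{
    return toy_ccs_e_compatible(surf, access);
}

/* Check against the linear variant, as the helper in the patch does. */
static bool compat_via_linear_format(enum toy_format mt_format,
                                     enum toy_format access)
{
    return toy_ccs_e_compatible(to_linear(mt_format), access);
}
```

Under these assumptions, a miptree created as RGBA8_SRGB and accessed as RGBA8_UNORM is rejected by the surface-format check but accepted by the linear-format check, which is the "more cases" behaviour described above. An access format that is incompatible either way, such as R32_FLOAT here, is rejected by both.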
[Mesa-dev] [PATCH] meson: Use true and false instead of yes and no for tristate options
This allows a user to not care whether they're setting a tristate or a
boolean option, which is a nice user facing feature, and something I've
personally run into.

Suggested-by: Adam Jackson
Signed-off-by: Dylan Baker
---
 meson.build       | 6 +++---
 meson_options.txt | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/meson.build b/meson.build
index 24d997b3e0a..efa9fbf3bae 100644
--- a/meson.build
+++ b/meson.build
@@ -170,7 +170,7 @@ endif
 with_gbm = get_option('gbm')
 if with_gbm == 'auto' and with_dri  # TODO: or gallium
   with_gbm = host_machine.system() == 'linux'
-elif with_gbm == 'yes'
+elif with_gbm == 'true'
   if not ['linux', 'bsd'].contains(host_machine.system())
     error('GBM only supports unix-like platforms')
   endif
@@ -182,7 +182,7 @@ endif
 _egl = get_option('egl')
 if _egl == 'auto'
   with_egl = with_dri and with_shared_glapi and egl_native_platform != ''
-elif _egl == 'yes'
+elif _egl == 'true'
   if not with_dri
     error('EGL requires dri')
   elif not with_shared_glapi
@@ -264,7 +264,7 @@ if with_dri3 == 'auto'
   else
     with_dri3 = false
   endif
-elif with_dri3 == 'yes'
+elif with_dri3 == 'true'
   with_dri3 = true
 else
   with_dri3 = false
diff --git a/meson_options.txt b/meson_options.txt
index 74f1e71bf43..d326ad9296f 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -28,7 +28,7 @@ option(
   'dri3',
   type : 'combo',
   value : 'auto',
-  choices : ['auto', 'yes', 'no'],
+  choices : ['auto', 'true', 'false'],
   description : 'enable support for dri3'
 )
 option(
@@ -101,7 +101,7 @@ option(
   'gbm',
   type : 'combo',
   value : 'auto',
-  choices : ['auto', 'yes', 'no'],
+  choices : ['auto', 'true', 'false'],
   description : 'Build support for gbm platform'
 )
 option(
@@ -115,7 +115,7 @@ option(
   'egl',
   type : 'combo',
   value : 'auto',
-  choices : ['auto', 'yes', 'no'],
+  choices : ['auto', 'true', 'false'],
   description : 'Build support for EGL platform'
 )
 option(
--
2.14.3
[Mesa-dev] [Bug 102677] [OpenGL CTS] KHR-GL45.CommonBugs.CommonBug_PerVertexValidation fails
https://bugs.freedesktop.org/show_bug.cgi?id=102677

Neil Roberts changed:

           What    |Removed                     |Added
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #4 from Neil Roberts ---
I’ve pushed the patches to master with Ken’s RB. Thanks!

--
You are receiving this mail because:
You are the assignee for the bug.
Re: [Mesa-dev] [PATCH] meson: implement default driver arguments
On Monday, 2017-10-30 10:21:50 -0700, Dylan Baker wrote: > This allows drivers to be set by OS/arch in a sane manner. > > Signed-off-by: Dylan Baker> --- > meson.build | 37 +++-- > meson_options.txt | 8 > 2 files changed, 39 insertions(+), 6 deletions(-) > > diff --git a/meson.build b/meson.build > index 24d997b3e0a..436d676d72d 100644 > --- a/meson.build > +++ b/meson.build > @@ -90,7 +90,19 @@ with_dri_r200 = false > with_dri_nouveau = false > with_dri_swrast = false > _drivers = get_option('dri-drivers') > -if _drivers != '' > +if _drivers == 'default' > + if ['linux', 'bsd'].contains(host_machine.system()) > +if ['x86', 'x86_64'].contains(host_machine.cpu_family()) > + with_dri_i915 = true > + with_dri_i965 = true > + with_dri_r100 = true > + with_dri_r200 = true > + with_dri_nouveau = true > + with_dri = true Yes to the idea, but to avoid having different parse paths for the default case and when drivers are specified, how about if _drivers == 'default' if os_and_arch logic _drivers = 'i915,i965,r100,r200,nouveau' else ... 
endif endif > +endif > +# TODO: PPC, Sparc > + endif > +elif _drivers != '' >_split = _drivers.split(',') >with_dri_i915 = _split.contains('i915') >with_dri_i965 = _split.contains('i965') > @@ -112,7 +124,28 @@ with_gallium_vc5 = false > with_gallium_etnaviv = false > with_gallium_imx = false > _drivers = get_option('gallium-drivers') > -if _drivers != '' > +if _drivers == 'default' > + if ['linux', 'bsd'].contains(host_machine.system()) > +if ['x86', 'x86_64'].contains(host_machine.cpu_family()) > + with_gallium_radeonsi = true > + with_gallium_nouveau = true > + with_gallium_softpipe = true > + with_gallium = true > + with_dri = true > +elif ['arm', 'aarch64'].contains(host_machine.cpu_family()) > + with_gallium_pl111 = true > + with_gallium_vc4 = true > + with_gallium_vc5 = true > + with_gallium_freedreno = true > + with_gallium_etnaviv = true > + with_gallium_imx = true > + with_gallium_softpipe = true > + with_gallium = true > + with_dri = true > +endif > +# TODO: PPC, Sparc > + endif > +elif _drivers != '' >_split = _drivers.split(',') >with_gallium_pl111 = _split.contains('pl111') >with_gallium_radeonsi = _split.contains('radeonsi') > diff --git a/meson_options.txt b/meson_options.txt > index 74f1e71bf43..0de54f9422d 100644 > --- a/meson_options.txt > +++ b/meson_options.txt > @@ -34,8 +34,8 @@ option( > option( >'dri-drivers', >type : 'string', > - value : 'i915,i965,r100,r200,nouveau', > - description : 'comma separated list of dri drivers to build.' > + value : 'default', > + description : 'comma separated list of dri drivers to build. If this is > set to default all drivers applicable to the target OS/architecture will be > built' > ) > option( >'dri-drivers-path', > @@ -46,8 +46,8 @@ option( > option( >'gallium-drivers', >type : 'string', > - value : 'pl111,radeonsi,nouveau,freedreno,swrast,vc4,etnaviv,imx', > - description : 'comma separated list of gallium drivers to build.' 
> + value : 'default', > + description : 'comma separated list of gallium drivers to build. If this > is set to default all drivers applicable to the target OS/architecture will > be built' > ) > option( >'gallium-media', > -- > 2.14.3 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 103496] svga_screen.c:26:46: error: git_sha1.h: No such file or directory
https://bugs.freedesktop.org/show_bug.cgi?id=103496

--- Comment #3 from Eric Engestrom ---
(In reply to Dylan Baker from comment #2)
> I think Eric Engestrom sent a fix for this that's been reviewed but hasn't
> landed yet.

Indeed, I just pushed it:

commit 2117d0331020874c28bb66b0596467f960259eb7
Author: Eric Engestrom
Date:   Sun Oct 29 22:06:28 2017 +

    git_sha1_gen: create empty file in fallback path

    I missed this part in my conversion, the old stream redirection meant
    the file was always created.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103496
    Fixes: 7088622e5fb506b64c90 "buildsys: move file regeneration logic to the script itself"
    Signed-off-by: Eric Engestrom
    Reviewed-by: Dylan Baker

--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
[Mesa-dev] [PATCH] meson: implement default driver arguments
This allows drivers to be set by OS/arch in a sane manner.

Signed-off-by: Dylan Baker
---
 meson.build       | 37 +++--
 meson_options.txt |  8
 2 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/meson.build b/meson.build
index 24d997b3e0a..436d676d72d 100644
--- a/meson.build
+++ b/meson.build
@@ -90,7 +90,19 @@ with_dri_r200 = false
 with_dri_nouveau = false
 with_dri_swrast = false
 _drivers = get_option('dri-drivers')
-if _drivers != ''
+if _drivers == 'default'
+  if ['linux', 'bsd'].contains(host_machine.system())
+    if ['x86', 'x86_64'].contains(host_machine.cpu_family())
+      with_dri_i915 = true
+      with_dri_i965 = true
+      with_dri_r100 = true
+      with_dri_r200 = true
+      with_dri_nouveau = true
+      with_dri = true
+    endif
+    # TODO: PPC, Sparc
+  endif
+elif _drivers != ''
   _split = _drivers.split(',')
   with_dri_i915 = _split.contains('i915')
   with_dri_i965 = _split.contains('i965')
@@ -112,7 +124,28 @@ with_gallium_vc5 = false
 with_gallium_etnaviv = false
 with_gallium_imx = false
 _drivers = get_option('gallium-drivers')
-if _drivers != ''
+if _drivers == 'default'
+  if ['linux', 'bsd'].contains(host_machine.system())
+    if ['x86', 'x86_64'].contains(host_machine.cpu_family())
+      with_gallium_radeonsi = true
+      with_gallium_nouveau = true
+      with_gallium_softpipe = true
+      with_gallium = true
+      with_dri = true
+    elif ['arm', 'aarch64'].contains(host_machine.cpu_family())
+      with_gallium_pl111 = true
+      with_gallium_vc4 = true
+      with_gallium_vc5 = true
+      with_gallium_freedreno = true
+      with_gallium_etnaviv = true
+      with_gallium_imx = true
+      with_gallium_softpipe = true
+      with_gallium = true
+      with_dri = true
+    endif
+    # TODO: PPC, Sparc
+  endif
+elif _drivers != ''
   _split = _drivers.split(',')
   with_gallium_pl111 = _split.contains('pl111')
   with_gallium_radeonsi = _split.contains('radeonsi')
diff --git a/meson_options.txt b/meson_options.txt
index 74f1e71bf43..0de54f9422d 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -34,8 +34,8 @@ option(
 option(
   'dri-drivers',
   type : 'string',
-  value : 'i915,i965,r100,r200,nouveau',
-  description : 'comma separated list of dri drivers to build.'
+  value : 'default',
+  description : 'comma separated list of dri drivers to build. If this is set to default all drivers applicable to the target OS/architecture will be built'
 )
 option(
   'dri-drivers-path',
@@ -46,8 +46,8 @@ option(
 option(
   'gallium-drivers',
   type : 'string',
-  value : 'pl111,radeonsi,nouveau,freedreno,swrast,vc4,etnaviv,imx',
-  description : 'comma separated list of gallium drivers to build.'
+  value : 'default',
+  description : 'comma separated list of gallium drivers to build. If this is set to default all drivers applicable to the target OS/architecture will be built'
 )
 option(
   'gallium-media',
--
2.14.3
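The selection logic in this patch, combined with the review suggestion of resolving 'default' to a driver string first so that only one parse path exists, can be sketched in C as follows. This is an illustration under assumptions rather than a translation of the meson code: `resolve_dri_drivers` and `list_contains` are hypothetical names, and the OS and architecture are passed in as plain strings instead of coming from host_machine.

```c
#include <assert.h>  /* only needed for the checks in the usage note */
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: turn the option value into a concrete
 * comma-separated driver list. Anything other than "default" is
 * returned unchanged, so callers parse a single format.
 */
static const char *resolve_dri_drivers(const char *opt, const char *os,
                                       const char *arch)
{
    if (strcmp(opt, "default") != 0)
        return opt;

    bool unix_like = strcmp(os, "linux") == 0 || strcmp(os, "bsd") == 0;
    bool x86 = strcmp(arch, "x86") == 0 || strcmp(arch, "x86_64") == 0;

    if (unix_like && x86)
        return "i915,i965,r100,r200,nouveau";

    /* The patch leaves other architectures (PPC, Sparc) as a TODO. */
    return "";
}

/* Exact-match lookup of one name in a comma-separated list. */
static bool list_contains(const char *list, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = list;

    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);

        if (len == name_len && strncmp(p, name, len) == 0)
            return true;
        if (comma == NULL)
            break;
        p = comma + 1;
    }
    return false;
}
```

For example, `list_contains(resolve_dri_drivers("default", "linux", "x86_64"), "i965")` is true, while the same query for "aarch64" is false because the sketch, like the patch, defines no DRI defaults there. An explicit list such as "i915" passes through untouched.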
[Mesa-dev] [Bug 103496] svga_screen.c:26:46: error: git_sha1.h: No such file or directory
https://bugs.freedesktop.org/show_bug.cgi?id=103496

--- Comment #2 from Dylan Baker ---
I think Eric Engestrom sent a fix for this that's been reviewed but hasn't
landed yet.

--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
Re: [Mesa-dev] [PATCH v3 19/43] i965/fs: Support push constants of 16-bit types
On Mon, Oct 30, 2017 at 05:10:53PM +0100, Chema Casanova wrote: > El 30/10/17 a las 07:44, Pohjolainen, Topi escribió: > > On Sun, Oct 29, 2017 at 11:17:11PM +0100, Chema Casanova wrote: > >> On 29/10/17 19:55, Pohjolainen, Topi wrote: > >>> On Thu, Oct 12, 2017 at 08:38:08PM +0200, Jose Maria Casanova Crespo > >>> wrote: > We enable the use of 16-bit values in push constants > modifying the assign_constant_locations function to work > with 16-bit types. > > The API to access buffers in Vulkan use multiples of 4-byte for > offsets and sizes. Current accountability of uniforms based on 4-byte > slots will work for 16-bit values if they are allowed to use 32-bit > slots. For that, we replace the division by 4 by a DIV_ROUND_UP, so > 2-byte elements will use 1 slot instead of 0. > > We aligns the 16-bit locations after assigning the 32-bit > ones. > --- > src/intel/compiler/brw_fs.cpp | 30 +++--- > 1 file changed, 23 insertions(+), 7 deletions(-) > > diff --git a/src/intel/compiler/brw_fs.cpp > b/src/intel/compiler/brw_fs.cpp > index a1d49a63be..8da16145dc 100644 > --- a/src/intel/compiler/brw_fs.cpp > +++ b/src/intel/compiler/brw_fs.cpp > @@ -1909,8 +1909,9 @@ set_push_pull_constant_loc(unsigned uniform, int > *chunk_start, > if (!contiguous) { > /* If bitsize doesn't match the target one, skip it */ > if (*max_chunk_bitsize != target_bitsize) { > - /* FIXME: right now we only support 32 and 64-bit accesses */ > - assert(*max_chunk_bitsize == 4 || *max_chunk_bitsize == 8); > + assert(*max_chunk_bitsize == 4 || > +*max_chunk_bitsize == 8 || > +*max_chunk_bitsize == 2); > *max_chunk_bitsize = 0; > *chunk_start = -1; > return; > @@ -1987,8 +1988,9 @@ fs_visitor::assign_constant_locations() > int constant_nr = inst->src[i].nr + inst->src[i].offset / 4; > >>> Did you test this with, for example, vec4? > >> CTS has 16bit scalar, vec2 (uint,sint), vec4 (float) and matrix tests > >> for push constants for compute and graphics pipelines. 
For vec4 you can > >> try: > >> > >> dEQP-VK.spirv_assembly.instruction.compute.16bit_storage.push_constant_16_to_32.vector_float > >> > >> For push constant tests in general there are 42 tests, but vec3 aren't > >> tested: > >> > >> dEQP-VK.*16bit_storage.*push_constant. > >> > >> > >>> I've been toying with a glsl > >>> lowering pass changing mediump floats into float16. I was curious to know > >>> how > >>> much is needed as you have addressed most of the things from NIR onwards. > >>> Here I'm seeing offsets 0,2,4,6 which result into 0,0,1,1 when divided by > >>> four. Don't we need something of this sort in addition? > >> If i remember correctly, tests were testing to use push constants with > >> 64 16bit values, to use the minimum spec maximum available as > >> max_push_constants_size that is 128 bytes. So at the end the generated > >> intrinsic was: > >> > >> vec4 16 ssa_4 = intrinsic load_uniform (ssa_3) () (0, 128) /* base=0 */ > >> /* range=128 */ > >> > >> As the calculus here is to calculate the number of location used, and > >> taking into account that the Vulkan API restrictions for push constants > >> that says that push constant ranges that say that offset must be > >> multiple of 4 and size must be multiple of 4, maintain the use of > >> 4-bytes slots was ok for supporting the feature. Our code changes just > >> take the accountability in the number of 32-bits location needed, mainly > >> changing the divisions by 4 using DIV_ROUND_UP( , 4) to calculate sizes. > > I'm probably misunderstanding something. Let me ask a few clarifying > > questions. > > > > I'm reading that the incoming 16-bit values are given in 32-bit slots, and > > for > > the same reason we place them in the push/pull buffers in 32-bits slots. In > > other words a vec4 would take 16-bytes and each component would 32-bits > > apart? > > Probably I explained quite bad. A f16vec4 would use 8-bytes, and each > component > is going to be 16-bits apart. 
The 32-bit multiple offset only applies to > the first > element. > > > If that is the case, then don't we need to adjust the register offsets > > somewhere the way I did in the fragment below? Otherwise the offsets will > > point to locations in the register that are simply 16-bits apart? > > Yes components are 16-bits apart. Because of that we don't need anything > especial to adjust the offsets. At least we didn't needed for existing > tests. Ah, okay. I misunderstood, thanks for explaining. That sounds good. I need the hack below for now. I'll come back to it once I have all other things needed for the benchmarks addressed somehow. > > > > >>> commit 1a6d2bf3302f6e4305e383da0f27712dc5c20a67 > >>> Author: Topi Pohjolainen
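The slot accounting discussed in this thread can be sketched in isolation. This is only an illustration, not the code from brw_fs.cpp: `DIV_ROUND_UP` is written the way the usual helper macro is, while `slots_for_bytes` and `slot_of_offset` are hypothetical names introduced here.

```c
#include <assert.h>

/* Round-up integer division, written like the usual helper macro. */
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Hypothetical helper: number of 4-byte uniform slots needed to hold a
 * value that is `bytes` bytes wide. */
static unsigned
slots_for_bytes(unsigned bytes)
{
   assert(bytes > 0);
   return DIV_ROUND_UP(bytes, 4);
}

/* Hypothetical helper: the 4-byte slot that a byte offset falls into. */
static unsigned
slot_of_offset(unsigned byte_offset)
{
   return byte_offset / 4;
}
```

With these definitions a single 16-bit value takes one slot rather than the zero that a plain division by 4 would give, an 8-byte f16vec4 takes two slots, and the component offsets 0, 2, 4, 6 land in slots 0, 0, 1, 1, which matches the numbers Topi reported.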
Re: [Mesa-dev] create src/wsi
So I think the consensus is this is okay? Emil, is the autotools right here?

Quoting Dylan Baker (2017-10-20 18:00:13)
> This very short series creates a new src/wsi folder, and moves wayland-drm into
> it. Basically wsi stuff is scattered about, and is needed by multiple components
> within mesa, wayland-drm, for example, is used by EGL, GBM, and vulkan
> wayland-wsi.
>
> I think there's more that could be moved into wsi, we could move EGL, GBM, and
> GLX, and vulkan/wsi, for example.
[Mesa-dev] [Bug 103496] svga_screen.c:26:46: error: git_sha1.h: No such file or directory
https://bugs.freedesktop.org/show_bug.cgi?id=103496

--- Comment #1 from Brian Paul ---
Hi Vinson, can you re-test with ToT? The scons build is working for me,
testing at 134a40d2a67.
[Mesa-dev] [PATCH 28/33] intel: genxml: rename output urb offset field
"Output Read Offset" is a bit non-sensical, let's just make this match "Output Length". Signed-off-by: Lionel Landwerlin--- src/intel/genxml/gen10.xml| 6 +++--- src/intel/genxml/gen8.xml | 6 +++--- src/intel/genxml/gen9.xml | 6 +++--- src/mesa/drivers/dri/i965/genX_state_upload.c | 2 +- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/src/intel/genxml/gen10.xml b/src/intel/genxml/gen10.xml index 5bb46f819a8..d94c6e4ecc2 100644 --- a/src/intel/genxml/gen10.xml +++ b/src/intel/genxml/gen10.xml @@ -1447,7 +1447,7 @@ - + @@ -1653,7 +1653,7 @@ - + @@ -2669,7 +2669,7 @@ - + diff --git a/src/intel/genxml/gen8.xml b/src/intel/genxml/gen8.xml index 7ccb8046796..4511e3d3749 100644 --- a/src/intel/genxml/gen8.xml +++ b/src/intel/genxml/gen8.xml @@ -1349,7 +1349,7 @@ - + @@ -1511,7 +1511,7 @@ - + @@ -2353,7 +2353,7 @@ - + diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml index db4b608f61f..87fff9e7ba0 100644 --- a/src/intel/genxml/gen9.xml +++ b/src/intel/genxml/gen9.xml @@ -1411,7 +1411,7 @@ - + @@ -1617,7 +1617,7 @@ - + @@ -2585,7 +2585,7 @@ - + diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c b/src/mesa/drivers/dri/i965/genX_state_upload.c index 98f69522de5..03b0ac75325 100644 --- a/src/mesa/drivers/dri/i965/genX_state_upload.c +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c @@ -2649,7 +2649,7 @@ genX(upload_gs_state)(struct brw_context *brw) DIV_ROUND_UP(vue_prog_data->vue_map.num_slots, 2) - urb_entry_write_offset; - gs.VertexURBEntryOutputReadOffset = urb_entry_write_offset; + gs.VertexURBEntryOutputOffset = urb_entry_write_offset; gs.VertexURBEntryOutputLength = MAX2(urb_entry_output_length, 1); #endif } -- 2.15.0.rc2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 18/33] intel: decoder: simplify field_is_header()
Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.c | 7 ++++---
 src/intel/common/gen_decoder.h | 3 ++-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 050926f5642..94e7e15399f 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -288,6 +288,7 @@ create_field(struct parser_context *ctx, const char **atts)
    struct gen_field *field;
 
    field = rzalloc(ctx->group, struct gen_field);
+   field->parent = ctx->group;
 
    for (int i = 0; atts[i]; i += 2) {
       char *p;
@@ -952,7 +953,7 @@ print_dword_header(FILE *outfile,
 }
 
 bool
-gen_group_header_is_header(struct gen_group *group, struct gen_field *field)
+gen_field_is_header(struct gen_field *field)
 {
    uint32_t bits;
 
@@ -962,7 +963,7 @@ gen_group_header_is_header(struct gen_group *group, struct gen_field *field)
    bits = (1U << (field->end - field->start + 1)) - 1;
    bits <<= field->start;
 
-   return (group->opcode_mask & bits) != 0;
+   return (field->parent->opcode_mask & bits) != 0;
 }
 
 void
@@ -979,7 +980,7 @@ gen_print_group(FILE *outfile, struct gen_group *group,
             print_dword_header(outfile, &iter, offset, i);
          last_dword = iter.dword;
       }
-      if (!gen_group_header_is_header(group, iter.field)) {
+      if (!gen_field_is_header(iter.field)) {
          fprintf(outfile, "%s: %s\n", iter.name, iter.value);
          if (iter.struct_desc) {
             uint64_t struct_offset = offset + 4 * iter.dword;
diff --git a/src/intel/common/gen_decoder.h b/src/intel/common/gen_decoder.h
index 2c54eed267b..658dd7f7b09 100644
--- a/src/intel/common/gen_decoder.h
+++ b/src/intel/common/gen_decoder.h
@@ -56,7 +56,7 @@ int gen_group_get_length(struct gen_group *group, const uint32_t *p);
 const char *gen_group_get_name(struct gen_group *group);
 uint32_t gen_group_get_opcode(struct gen_group *group);
 struct gen_enum *gen_spec_find_enum(struct gen_spec *spec, const char *name);
-bool gen_group_header_is_header(struct gen_group *group, struct gen_field *field);
+bool gen_field_is_header(struct gen_field *field);
 
 struct gen_field_iterator {
    struct gen_group *group;
@@ -143,6 +143,7 @@ struct gen_type {
 };
 
 struct gen_field {
+   struct gen_group *parent;
    struct gen_field *next;
    char *name;
 
-- 
2.15.0.rc2
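The header test in this patch boils down to building a bit mask from a field's start and end bits and checking whether it overlaps the parent group's opcode mask. A minimal standalone sketch of that idea follows. The names `field_mask` and `field_overlaps_opcode` are hypothetical, and the 64-bit intermediate is a choice made for this sketch (so that a field spanning all 32 bits does not shift by the full width); it is not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: mask covering bits [start, end], inclusive,
 * within a single dword. */
static uint32_t
field_mask(unsigned start, unsigned end)
{
   assert(start <= end && end < 32);
   /* 64-bit intermediate so a 32-bit wide field never shifts by 32. */
   uint64_t bits = (UINT64_C(1) << (end - start + 1)) - 1;
   return (uint32_t)(bits << start);
}

/* Hypothetical helper: a field counts as part of the header when any of
 * its bits are also covered by the opcode mask. */
static bool
field_overlaps_opcode(uint32_t opcode_mask, unsigned start, unsigned end)
{
   return (opcode_mask & field_mask(start, end)) != 0;
}
```

With an opcode mask of 0xffff0000, a field in bits 16..23 would be treated as header while a field in bits 0..7 would not.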
[Mesa-dev] [PATCH 27/33] intel: genxml: be consistent about register offset naming
Signed-off-by: Lionel Landwerlin--- src/intel/genxml/gen10.xml | 8 src/intel/genxml/gen4.xml | 2 +- src/intel/genxml/gen45.xml | 2 +- src/intel/genxml/gen5.xml | 2 +- src/intel/genxml/gen6.xml | 2 +- src/intel/genxml/gen7.xml | 4 ++-- src/intel/genxml/gen75.xml | 8 src/intel/genxml/gen8.xml | 8 src/intel/genxml/gen9.xml | 8 src/intel/vulkan/genX_cmd_buffer.c | 6 +++--- src/intel/vulkan/genX_gpu_memcpy.c | 4 ++-- src/intel/vulkan/genX_query.c | 20 ++-- 12 files changed, 37 insertions(+), 37 deletions(-) diff --git a/src/intel/genxml/gen10.xml b/src/intel/genxml/gen10.xml index bb33526e6dc..5bb46f819a8 100644 --- a/src/intel/genxml/gen10.xml +++ b/src/intel/genxml/gen10.xml @@ -3195,7 +3195,7 @@ - + @@ -3203,8 +3203,8 @@ - - + + @@ -3405,7 +3405,7 @@ - + diff --git a/src/intel/genxml/gen4.xml b/src/intel/genxml/gen4.xml index fc24329535d..6345b75c48f 100644 --- a/src/intel/genxml/gen4.xml +++ b/src/intel/genxml/gen4.xml @@ -1100,7 +1100,7 @@ - + diff --git a/src/intel/genxml/gen45.xml b/src/intel/genxml/gen45.xml index c91085831ea..dd9ca262030 100644 --- a/src/intel/genxml/gen45.xml +++ b/src/intel/genxml/gen45.xml @@ -1130,7 +1130,7 @@ - + diff --git a/src/intel/genxml/gen5.xml b/src/intel/genxml/gen5.xml index 93e687a32bd..4c822df67f8 100644 --- a/src/intel/genxml/gen5.xml +++ b/src/intel/genxml/gen5.xml @@ -1216,7 +1216,7 @@ - + diff --git a/src/intel/genxml/gen6.xml b/src/intel/genxml/gen6.xml index 0707a33cd2a..317529b4065 100644 --- a/src/intel/genxml/gen6.xml +++ b/src/intel/genxml/gen6.xml @@ -1833,7 +1833,7 @@ - + diff --git a/src/intel/genxml/gen7.xml b/src/intel/genxml/gen7.xml index bda3b82e718..09d45818c3a 100644 --- a/src/intel/genxml/gen7.xml +++ b/src/intel/genxml/gen7.xml @@ -2254,7 +2254,7 @@ - + @@ -2347,7 +2347,7 @@ - + diff --git a/src/intel/genxml/gen75.xml b/src/intel/genxml/gen75.xml index b6aa9c55031..0bd1ce6ace6 100644 --- a/src/intel/genxml/gen75.xml +++ b/src/intel/genxml/gen75.xml @@ -2614,7 +2614,7 @@ - + @@ -2622,8 +2622,8 @@ - - + + @@ 
-2796,7 +2796,7 @@ - + diff --git a/src/intel/genxml/gen8.xml b/src/intel/genxml/gen8.xml index 9f0fa48ce66..7ccb8046796 100644 --- a/src/intel/genxml/gen8.xml +++ b/src/intel/genxml/gen8.xml @@ -2835,7 +2835,7 @@ - + @@ -2843,8 +2843,8 @@ - - + + @@ -3056,7 +3056,7 @@ - + diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml index 0e2dddeacfb..db4b608f61f 100644 --- a/src/intel/genxml/gen9.xml +++ b/src/intel/genxml/gen9.xml @@ -3120,7 +3120,7 @@ - + @@ -3128,8 +3128,8 @@ - - + + @@ -3341,7 +3341,7 @@ - + diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c index 20a885c4381..fd395ff3077 100644 --- a/src/intel/vulkan/genX_cmd_buffer.c +++ b/src/intel/vulkan/genX_cmd_buffer.c @@ -37,7 +37,7 @@ emit_lrm(struct anv_batch *batch, uint32_t reg, struct anv_bo *bo, uint32_t offset) { anv_batch_emit(batch, GENX(MI_LOAD_REGISTER_MEM), lrm) { - lrm.RegisterAddress = reg; + lrm.RegisterOffset = reg; lrm.MemoryAddress= (struct anv_address) { bo, offset }; } } @@ -56,8 +56,8 @@ static void emit_lrr(struct anv_batch *batch, uint32_t dst, uint32_t src) { anv_batch_emit(batch, GENX(MI_LOAD_REGISTER_REG), lrr) { - lrr.SourceRegisterAddress= src; - lrr.DestinationRegisterAddress = dst; + lrr.SourceRegisterOffset= src; + lrr.DestinationRegisterOffset = dst; } } #endif diff --git a/src/intel/vulkan/genX_gpu_memcpy.c b/src/intel/vulkan/genX_gpu_memcpy.c index 5b00b314f6f..0ef2fec53a3 100644 --- a/src/intel/vulkan/genX_gpu_memcpy.c +++ b/src/intel/vulkan/genX_gpu_memcpy.c @@ -78,11 +78,11 @@ genX(cmd_buffer_mi_memcpy)(struct anv_cmd_buffer *cmd_buffer, */ #define TEMP_REG 0x2440 /* GEN7_3DPRIM_BASE_VERTEX */ anv_batch_emit(_buffer->batch, GENX(MI_LOAD_REGISTER_MEM), load) { - load.RegisterAddress = TEMP_REG; + load.RegisterOffset = TEMP_REG; load.MemoryAddress = src_addr; } anv_batch_emit(_buffer->batch, GENX(MI_STORE_REGISTER_MEM), store) { - store.RegisterAddress = TEMP_REG;
[Mesa-dev] [PATCH 32/33] intel: decoder: add function to query shader length
Signed-off-by: Lionel Landwerlin
---
 src/intel/tools/disasm.c     | 34 ++++++++++++++++++++++++++++++++++
 src/intel/tools/gen_disasm.h |  2 ++
 2 files changed, 36 insertions(+)

diff --git a/src/intel/tools/disasm.c b/src/intel/tools/disasm.c
index e2f5c11f6f5..c038949d9ec 100644
--- a/src/intel/tools/disasm.c
+++ b/src/intel/tools/disasm.c
@@ -26,6 +26,8 @@
 #include "compiler/brw_inst.h"
 #include "compiler/brw_eu.h"
 
+#include "common/gen_decoder.h"
+
 #include "gen_disasm.h"
 
 uint64_t INTEL_DEBUG;
@@ -43,6 +45,38 @@ is_send(uint32_t opcode)
            opcode == BRW_OPCODE_SENDSC );
 }
 
+uint32_t
+gen_disasm_get_assembly_size(struct gen_disasm *disasm,
+                             const struct gen_dword_reader *reader)
+{
+   struct gen_device_info *devinfo = &disasm->devinfo;
+   uint32_t size = 0;
+
+   /* This loop exits when send-with-EOT or when opcode is 0 */
+   while (true) {
+      union {
+         brw_inst insn;
+         uint32_t data[sizeof(brw_inst) / sizeof(uint32_t)];
+      } data;
+      for (int i = 0; i < ARRAY_SIZE(data.data); i++)
+         data.data[i] = gen_read_dword(reader, size / 4 + i);
+
+      if (brw_inst_cmpt_control(devinfo, &data.insn)) {
+         size += 8;
+      } else {
+         size += 16;
+      }
+
+      /* Simplistic, but efficient way to terminate disasm */
+      uint32_t opcode = brw_inst_opcode(devinfo, &data.insn);
+      if (opcode == 0 || (is_send(opcode) && brw_inst_eot(devinfo, &data.insn))) {
+         break;
+      }
+   }
+
+   return size;
+}
+
 static int
 gen_disasm_find_end(struct gen_disasm *disasm, void *assembly, int start)
 {
diff --git a/src/intel/tools/gen_disasm.h b/src/intel/tools/gen_disasm.h
index 281c2dc9cff..8bc2c698610 100644
--- a/src/intel/tools/gen_disasm.h
+++ b/src/intel/tools/gen_disasm.h
@@ -31,6 +31,8 @@ extern "C" {
 struct gen_disasm;
 
 struct gen_disasm *gen_disasm_create(int pciid);
+uint32_t gen_disasm_get_assembly_size(struct gen_disasm *disasm,
+                                      const struct gen_dword_reader *reader);
 void gen_disasm_disassemble(struct gen_disasm *disasm,
                             void *assembly, int start, char **ret);
 
-- 
2.15.0.rc2
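The size query above walks the instruction stream, adding 8 bytes for a compacted instruction and 16 otherwise, and stops after an instruction whose opcode is 0 or a send with EOT set. The sketch below shows that loop in a self-contained form. Everything prefixed `sk_`/`SK_` is hypothetical: in particular the bit positions used for the opcode, the compaction flag and EOT are made up for this illustration and should not be read as the real EU instruction layout, and the reader callback is a simplified stand-in for `gen_dword_reader`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding for this sketch only (not the real ISA layout). */
#define SK_OPCODE(dw0)    ((dw0) & 0x7fu)
#define SK_COMPACTED(dw0) (((dw0) >> 29) & 1u)
#define SK_EOT(dw0)       (((dw0) >> 31) & 1u)
#define SK_OPCODE_SEND    0x31u

/* Simplified stand-in for a dword reader callback. */
typedef uint32_t (*sk_read_dword_fn)(const void *user_data,
                                     uint32_t dword_offset);

static uint32_t
sk_read_from_array(const void *user_data, uint32_t dword_offset)
{
   return ((const uint32_t *)user_data)[dword_offset];
}

/* Returns the size in bytes of the program, including the terminating
 * instruction, as the loop in the patch does. */
static uint32_t
sk_assembly_size(sk_read_dword_fn read, const void *user_data)
{
   assert(read);
   uint32_t size = 0;

   for (;;) {
      uint32_t dw0 = read(user_data, size / 4);

      size += SK_COMPACTED(dw0) ? 8 : 16;

      uint32_t opcode = SK_OPCODE(dw0);
      if (opcode == 0 || (opcode == SK_OPCODE_SEND && SK_EOT(dw0)))
         break;
   }

   return size;
}
```

Like the patch, this relies on the stream actually containing a terminator; a caller with untrusted input would probably want an upper bound on the walk as well.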
[Mesa-dev] [PATCH 30/33] intel: decoder: change find_instruction() to take first dword
Another step into decoupling memory access from pointers.

Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.c                | 4 ++--
 src/intel/common/gen_decoder.h                | 2 +-
 src/intel/tools/aubinator.c                   | 2 +-
 src/intel/tools/aubinator_error_decode.c      | 2 +-
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 217e84fb38e..098ff472b37 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -685,13 +685,13 @@ void gen_spec_destroy(struct gen_spec *spec)
 }
 
 struct gen_group *
-gen_spec_find_instruction(struct gen_spec *spec, const uint32_t *p)
+gen_spec_find_instruction(struct gen_spec *spec, uint32_t dw0)
 {
    struct hash_entry *entry;
 
    hash_table_foreach(spec->commands, entry) {
       struct gen_group *command = entry->data;
-      uint32_t opcode = *p & command->opcode_mask;
+      uint32_t opcode = dw0 & command->opcode_mask;
       if (opcode == command->opcode)
          return command;
    }
diff --git a/src/intel/common/gen_decoder.h b/src/intel/common/gen_decoder.h
index 334cfaac2c2..b6cb735753d 100644
--- a/src/intel/common/gen_decoder.h
+++ b/src/intel/common/gen_decoder.h
@@ -50,7 +50,7 @@ struct gen_spec *gen_spec_load_from_path(const struct gen_device_info *devinfo,
                                          const char *path);
 void gen_spec_destroy(struct gen_spec *spec);
 uint32_t gen_spec_get_gen(struct gen_spec *spec);
-struct gen_group *gen_spec_find_instruction(struct gen_spec *spec, const uint32_t *p);
+struct gen_group *gen_spec_find_instruction(struct gen_spec *spec, const uint32_t dw0);
 struct gen_group *gen_spec_find_instruction_by_name(struct gen_spec *spec, const char *name);
 struct gen_group *gen_spec_find_register(struct gen_spec *spec, uint32_t offset);
 struct gen_group *gen_spec_find_register_by_name(struct gen_spec *spec, const char *name);
diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
index 72f8d2aa4e8..abbebbb462f 100644
--- a/src/intel/tools/aubinator.c
+++ b/src/intel/tools/aubinator.c
@@ -698,7 +698,7 @@ parse_commands(struct gen_spec *spec, uint32_t *cmds, int size, int engine)
    struct gen_group *inst;
 
    for (p = cmds; p < end; p += length) {
-      inst = gen_spec_find_instruction(spec, p);
+      inst = gen_spec_find_instruction(spec, p[0]);
       length = gen_group_get_length(inst, p[0]);
       assert(inst == NULL || length > 0);
       length = MAX2(1, length);
diff --git a/src/intel/tools/aubinator_error_decode.c b/src/intel/tools/aubinator_error_decode.c
index 089683dac98..6573c2cc25b 100644
--- a/src/intel/tools/aubinator_error_decode.c
+++ b/src/intel/tools/aubinator_error_decode.c
@@ -249,7 +249,7 @@ static void decode(struct gen_spec *spec,
          *reset_color = NORMAL;
 
       uint64_t offset = gtt_offset + 4 * (p - data);
-      inst = gen_spec_find_instruction(spec, p);
+      inst = gen_spec_find_instruction(spec, p[0]);
       length = gen_group_get_length(inst, p[0]);
       assert(inst == NULL || length > 0);
       length = MAX2(1, length);
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 811f8a42f1e..0f6759d55aa 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -454,7 +454,7 @@ do_batch_dump(struct brw_context *brw)
    const char *reset_color = color ? NORMAL : "";
 
    for (uint32_t *p = batch_data; p < end; p += length) {
-      struct gen_group *inst = gen_spec_find_instruction(spec, p[0]);
+      struct gen_group *inst = gen_spec_find_instruction(spec, p[0]);
       length = gen_group_get_length(inst, p[0]);
       assert(inst == NULL || length > 0);
       length = MAX2(1, length);
-- 
2.15.0.rc2
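The lookup this patch touches only ever needs the first dword of a command: it masks that dword with each command's opcode mask and compares against the command's opcode. A rough standalone sketch of that matching step, using a plain array instead of Mesa's hash table, might look like this. The `sk_command` struct, `sk_find_instruction` and any opcode values you feed it are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified description of a command. */
struct sk_command {
   const char *name;
   uint32_t opcode;
   uint32_t opcode_mask;
};

/* Return the first command whose masked opcode matches dw0, or NULL.
 * Only the value of the first dword is needed, not a pointer to it. */
static const struct sk_command *
sk_find_instruction(const struct sk_command *cmds, size_t count, uint32_t dw0)
{
   assert(cmds != NULL || count == 0);

   for (size_t i = 0; i < count; i++) {
      if ((dw0 & cmds[i].opcode_mask) == cmds[i].opcode)
         return &cmds[i];
   }

   return NULL;
}
```

Passing the dword by value is what lets callers fetch it from anywhere (a mapped buffer, a reader callback, a file) before doing the lookup.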
[Mesa-dev] [PATCH 22/33] intel: decoder: expose missing find_enum()
Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/intel/common/gen_decoder.h b/src/intel/common/gen_decoder.h
index 658dd7f7b09..81b5beb5baf 100644
--- a/src/intel/common/gen_decoder.h
+++ b/src/intel/common/gen_decoder.h
@@ -52,6 +52,8 @@ uint32_t gen_spec_get_gen(struct gen_spec *spec);
 struct gen_group *gen_spec_find_instruction(struct gen_spec *spec, const uint32_t *p);
 struct gen_group *gen_spec_find_register(struct gen_spec *spec, uint32_t offset);
 struct gen_group *gen_spec_find_register_by_name(struct gen_spec *spec, const char *name);
+struct gen_enum *gen_spec_find_enum(struct gen_spec *spec, const char *name);
+
 int gen_group_get_length(struct gen_group *group, const uint32_t *p);
 const char *gen_group_get_name(struct gen_group *group);
 uint32_t gen_group_get_opcode(struct gen_group *group);
-- 
2.15.0.rc2
Re: [Mesa-dev] Mesa 17.2.4 release candidate
On Mon, Oct 30, 2017 at 12:43 PM, Andres Gomez wrote:
> On Mon, 2017-10-30 at 11:39 -0400, Ilia Mirkin wrote:
>> On Mon, Oct 30, 2017 at 11:29 AM, Emil Velikov wrote:
>> > On 28 October 2017 at 21:35, Andres Gomez wrote:
>> > > On Fri, 2017-10-27 at 14:14 -0400, Ilia Mirkin wrote:
>> > > > On Fri, Oct 27, 2017 at 1:43 PM, Andres Gomez wrote:
>> > > > > Rejected (6)
>> > > > >
>> > > > >
>> > > > > Ilia Mirkin (1):
>> > > > >       glsl: fix derived cs variables
>> > > > >
>> > > > > Reason: Commit is too big for stable at this point.
>> > > >
>> > > > The issue it fixes in regular compute shaders is slightly difficult to
>> > > > hit (but there are piglits that do now), however the issue it hits
>> > > > with ARB_compute_variable_group_size is fairly trivial to encounter.
>> > > >
>> > > > It seems silly to put out releases with known bugs when a fix is
>> > > > easily available and apply-able, with negligible risk of messing
>> > > > things up.
>> > > >
>> > > > Note that this all only affects nouveau and radeonsi, as those are the
>> > > > only drivers that make use of the lowering.
>> > >
>> > > Ilia, I understood by your answer [1] when I asked about your opinion
>> > > regarding 17.2.4 inclusion that it was OK to omit in this series but
>> > > that it should definitely be included in 17.3.
>> > >
>> > > Maybe I didn't make it clear that I was asking for the 17.2 queue and I
>> > > misunderstood your answer (?).
>> > >
>> > > Let's leave it as it is for this release and will see if we can include
>> > > it for the next one.
>> > >
>> >
>> > I'm inclined to agree with Andres - let's leave the patch out of 17.2.x.
>> >
>> > 17.3.x on the other hand is still fairly fresh, so I've pulled the
>> > patch for 17.3.0-rc2.
>>
>> In that case you should definitely remove
>> GL_ARB_compute_variable_group_size support from that release series.
>
> In that case, can we get?
>
>  * A specific less invasive solution for the 17.2 queue (don't know if
>    that's even possible).

Invasive is in the eye of the beholder. I don't think my change is
invasive at all. (But even if it were, that still shouldn't cause it
to not be backported into a stable release at developer request.)

>
> or
>
>  * A patch to disable the mentioned extension in the affected drivers.

Feel free to do that yourself, or find someone to do it for you. Doing
that will still have some bugs with regular compute shaders which
piglit will run into, although it's less clear that real applications
will hit it.

I think at this point, I will do what I originally said I'd do last
time this kind of issue came up -- stop caring about stable releases.
From now on, I will no longer mark patches for stable (since doing so
only seems to cause me heartache), and make sure that anyone who's
trying to get help for mesa issues starts with git master, since the
stable releases aren't interested in all bug fixes.

(And I certainly don't have the time to re-debug issues that have
already been fixed, especially ones like this which I would never have
figured out in a full application, if it wasn't for the simple test
case provided in the referenced bug.)

Cheers,

  -ilia
[Mesa-dev] [PATCH 23/33] intel: decoder: group enum related declarations
Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.c | 8 ++++++++
 src/intel/common/gen_decoder.h | 3 ++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index b0bd161fef3..1d57d350855 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -73,6 +73,14 @@ gen_group_get_opcode(struct gen_group *group)
    return group->opcode;
 }
 
+struct gen_group *
+gen_spec_find_instruction_by_name(struct gen_spec *spec, const char *name)
+{
+   struct hash_entry *entry = _mesa_hash_table_search(spec->commands,
+                                                      name);
+   return entry ? entry->data : NULL;
+}
+
 struct gen_group *
 gen_spec_find_struct(struct gen_spec *spec, const char *name)
 {
diff --git a/src/intel/common/gen_decoder.h b/src/intel/common/gen_decoder.h
index 81b5beb5baf..eeda899cb69 100644
--- a/src/intel/common/gen_decoder.h
+++ b/src/intel/common/gen_decoder.h
@@ -50,6 +50,7 @@ struct gen_spec *gen_spec_load_from_path(const struct gen_device_info *devinfo,
 void gen_spec_destroy(struct gen_spec *spec);
 uint32_t gen_spec_get_gen(struct gen_spec *spec);
 struct gen_group *gen_spec_find_instruction(struct gen_spec *spec, const uint32_t *p);
+struct gen_group *gen_spec_find_instruction_by_name(struct gen_spec *spec, const char *name);
 struct gen_group *gen_spec_find_register(struct gen_spec *spec, uint32_t offset);
 struct gen_group *gen_spec_find_register_by_name(struct gen_spec *spec, const char *name);
 struct gen_enum *gen_spec_find_enum(struct gen_spec *spec, const char *name);
@@ -57,7 +58,7 @@ struct gen_enum *gen_spec_find_enum(struct gen_spec *spec, const char *name);
 int gen_group_get_length(struct gen_group *group, const uint32_t *p);
 const char *gen_group_get_name(struct gen_group *group);
 uint32_t gen_group_get_opcode(struct gen_group *group);
-struct gen_enum *gen_spec_find_enum(struct gen_spec *spec, const char *name);
+
 bool gen_field_is_header(struct gen_field *field);
 
 struct gen_field_iterator {
-- 
2.15.0.rc2
[Mesa-dev] [PATCH 19/33] intel: decoder: rename internal function to free name
Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 94e7e15399f..91076e901fe 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -819,7 +819,7 @@ iter_advance_field(struct gen_field_iterator *iter)
 }
 
 static void
-gen_field_decode(struct gen_field_iterator *iter)
+iter_decode_field(struct gen_field_iterator *iter)
 {
    union {
       uint64_t qw;
@@ -929,7 +929,7 @@ gen_field_iterator_init(struct gen_field_iterator *iter,
    iter->end = &p[gen_group_get_length(iter->group, iter->p)];
    iter->print_colors = print_colors;
 
-   gen_field_decode(iter);
+   iter_decode_field(iter);
 }
 
 bool
@@ -938,7 +938,7 @@ gen_field_iterator_next(struct gen_field_iterator *iter)
    if (!iter_advance_field(iter))
       return false;
 
-   gen_field_decode(iter);
+   iter_decode_field(iter);
 
    return true;
 }
-- 
2.15.0.rc2
[Mesa-dev] [PATCH 24/33] intel: decoder: enable decoding a single field
Signed-off-by: Lionel Landwerlin--- src/intel/common/gen_decoder.c | 37 + src/intel/common/gen_decoder.h | 14 ++ 2 files changed, 51 insertions(+) diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c index 1d57d350855..bd39ff3654c 100644 --- a/src/intel/common/gen_decoder.c +++ b/src/intel/common/gen_decoder.c @@ -584,6 +584,9 @@ gen_spec_load(const struct gen_device_info *devinfo) ctx.spec->enums = _mesa_hash_table_create(ctx.spec, _mesa_hash_string, _mesa_key_string_equal); + ctx.spec->access_cache = + _mesa_hash_table_create(ctx.spec, _mesa_hash_string, _mesa_key_string_equal); + total_length = zlib_inflate(compress_genxmls, sizeof(compress_genxmls), (void **) _data); @@ -696,6 +699,32 @@ gen_spec_find_instruction(struct gen_spec *spec, const uint32_t *p) return NULL; } +struct gen_field * +gen_group_find_field(struct gen_group *group, const char *name) +{ + char path[256]; + snprintf(path, sizeof(path), "%s/%s", group->name, name); + + struct gen_spec *spec = group->spec; + struct hash_entry *entry = _mesa_hash_table_search(spec->access_cache, + path); + if (entry) + return entry->data; + + struct gen_field *field = group->fields; + while (field) { + if (strcmp(field->name, name) == 0) { + _mesa_hash_table_insert(spec->access_cache, + ralloc_strdup(spec, path), + field); + return field; + } + field = field->next; + } + + return NULL; +} + int gen_group_get_length(struct gen_group *group, const uint32_t *p) { @@ -989,6 +1018,14 @@ gen_field_is_header(struct gen_field *field) return (field->parent->opcode_mask & bits) != 0; } +void gen_field_decode(struct gen_field *field, + const uint32_t *p, const uint32_t *end, + union gen_field_value *value) +{ + uint32_t dword = field->start / 32; + value->u64 = iter_decode_field_raw(field, [dword], end); +} + void gen_print_group(FILE *outfile, struct gen_group *group, uint64_t offset, const uint32_t *p, bool color) diff --git a/src/intel/common/gen_decoder.h b/src/intel/common/gen_decoder.h index 
eeda899cb69..89ce05ef6d0 100644 --- a/src/intel/common/gen_decoder.h +++ b/src/intel/common/gen_decoder.h @@ -37,6 +37,7 @@ extern "C" { struct gen_spec; struct gen_group; struct gen_field; +union gen_field_value; static inline uint32_t gen_make_gen(uint32_t major, uint32_t minor) { @@ -58,8 +59,12 @@ struct gen_enum *gen_spec_find_enum(struct gen_spec *spec, const char *name); int gen_group_get_length(struct gen_group *group, const uint32_t *p); const char *gen_group_get_name(struct gen_group *group); uint32_t gen_group_get_opcode(struct gen_group *group); +struct gen_field *gen_group_find_field(struct gen_group *group, const char *name); bool gen_field_is_header(struct gen_field *field); +void gen_field_decode(struct gen_field *field, + const uint32_t *p, const uint32_t *end, + union gen_field_value *value); struct gen_field_iterator { struct gen_group *group; @@ -84,6 +89,8 @@ struct gen_spec { struct hash_table *registers_by_name; struct hash_table *registers_by_offset; struct hash_table *enums; + + struct hash_table *access_cache; }; struct gen_group { @@ -145,6 +152,13 @@ struct gen_type { }; }; +union gen_field_value { + bool b32; + float f32; + uint64_t u64; + int64_t i64; +}; + struct gen_field { struct gen_group *parent; struct gen_field *next; -- 2.15.0.rc2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 31/33] intel: decoder: decouple decoding from memory pointers
We want to introduce a reader interface for accessing memory, so that later on we can use different ways of storing the content of the GTT address space that don't involve a pointer to a linear buffer. Signed-off-by: Lionel Landwerlin--- src/intel/common/gen_decoder.c| 75 --- src/intel/common/gen_decoder.h| 24 +++-- src/intel/tools/aubinator.c | 7 ++- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 26 +++--- 4 files changed, 101 insertions(+), 31 deletions(-) diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c index 098ff472b37..c3fa150a6ea 100644 --- a/src/intel/common/gen_decoder.c +++ b/src/intel/common/gen_decoder.c @@ -807,12 +807,18 @@ iter_group_offset_bits(const struct gen_field_iterator *iter, return iter->group->group_offset + (group_iter * iter->group->group_size); } +uint32_t gen_read_dword_from_pointer(void *user_data, uint32_t dword_offset) +{ + return ((uint32_t *) user_data)[dword_offset]; +} + static bool iter_more_groups(const struct gen_field_iterator *iter) { if (iter->group->variable) { return iter_group_offset_bits(iter, iter->group_iter + 1) < - (gen_group_get_length(iter->group, iter->p) * 32); + (gen_group_get_length(iter->group, + gen_read_dword(iter->reader, 0)) * 32); } else { return (iter->group_iter + 1) < iter->group->group_count || iter->group->next != NULL; @@ -856,17 +862,20 @@ iter_advance_field(struct gen_field_iterator *iter) static uint64_t iter_decode_field_raw(struct gen_field *field, - const uint32_t *p, - const uint32_t *end) + uint32_t dword_offset, + uint32_t dword_end, + const struct gen_dword_reader *reader) { uint64_t qw = 0; if ((field->end - field->start) > 32) { - if ((p + 1) < end) - qw = ((uint64_t) p[1]) << 32; - qw |= p[0]; + if ((dword_offset + 1) < dword_end) { + qw = gen_read_dword(reader, dword_offset + 1); + qw <<= 32; + } + qw |= gen_read_dword(reader, dword_offset); } else - qw = p[0]; + qw = gen_read_dword(reader, dword_offset); qw = field_value(qw, field->start, field->end); @@ 
-895,8 +904,8 @@ iter_decode_field(struct gen_field_iterator *iter) memset(, 0, sizeof(v)); - v.qw = iter_decode_field_raw(iter->field, ->p[iter->dword], iter->end); + v.qw = iter_decode_field_raw(iter->field, iter->dword, +iter->dword_end, iter->reader); const char *enum_name = NULL; @@ -966,7 +975,7 @@ iter_decode_field(struct gen_field_iterator *iter) void gen_field_iterator_init(struct gen_field_iterator *iter, struct gen_group *group, -const uint32_t *p, +const struct gen_dword_reader *reader, bool print_colors) { memset(iter, 0, sizeof(*iter)); @@ -976,8 +985,9 @@ gen_field_iterator_init(struct gen_field_iterator *iter, iter->field = group->fields; else iter->field = group->next->fields; - iter->p = p; - iter->end = [gen_group_get_length(iter->group, p[0])]; + iter->reader = reader; + iter->dword_end = gen_group_get_length(iter->group, + gen_read_dword(reader, 0)); iter->print_colors = print_colors; iter_decode_field(iter); @@ -997,10 +1007,12 @@ gen_field_iterator_next(struct gen_field_iterator *iter) static void print_dword_header(FILE *outfile, struct gen_field_iterator *iter, - uint64_t offset, uint32_t dword) + uint64_t offset, + uint32_t dword) { fprintf(outfile, "0x%08"PRIx64": 0x%08x : Dword %d\n", - offset + 4 * dword, iter->p[dword], dword); + offset + 4 * dword, + gen_read_dword(iter->reader, dword), dword); } bool @@ -1018,21 +1030,38 @@ gen_field_is_header(struct gen_field *field) } void gen_field_decode(struct gen_field *field, - const uint32_t *p, const uint32_t *end, + const struct gen_dword_reader *reader, union gen_field_value *value) { + uint32_t length = gen_group_get_length(field->parent, + gen_read_dword(reader, 0)); uint32_t dword = field->start / 32; - value->u64 = iter_decode_field_raw(field, [dword], end); + value->u64 = iter_decode_field_raw(field, dword, length, reader); +} + +struct sub_struct_reader { + struct gen_dword_reader base; + const struct gen_dword_reader *reader; + uint32_t struct_offset; +}; + +static uint32_t 
+read_struct_dword(void *user_data, uint32_t dword_offset) +{ + struct
[Mesa-dev] [PATCH 07/33] intel: decoder: split out getting the next field and decoding it
Due to the new way we handle fields, we need *not* to forget the first field when decoding instructions. The issue was that the advance function was called first and skipped the first field. Signed-off-by: Lionel Landwerlin--- src/intel/common/gen_decoder.c | 31 +-- 1 file changed, 21 insertions(+), 10 deletions(-) diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c index 8336269b183..4ce3a577f96 100644 --- a/src/intel/common/gen_decoder.c +++ b/src/intel/common/gen_decoder.c @@ -829,17 +829,14 @@ iter_advance_field(struct gen_field_iterator *iter) return true; } -bool -gen_field_iterator_next(struct gen_field_iterator *iter) +static void +gen_field_decode(struct gen_field_iterator *iter) { union { uint64_t qw; float f; } v; - if (!iter_advance_field(iter)) - return false; - if (iter->field->name) strncpy(iter->name, iter->field->name, sizeof(iter->name)); else @@ -920,8 +917,6 @@ gen_field_iterator_next(struct gen_field_iterator *iter) snprintf(iter->value + length, sizeof(iter->value) - length, " (%s)", enum_name); } - - return true; } void @@ -933,9 +928,25 @@ gen_field_iterator_init(struct gen_field_iterator *iter, memset(iter, 0, sizeof(*iter)); iter->group = group; - iter->field = group->fields; + if (group->fields) + iter->field = group->fields; + else + iter->field = group->next->fields; iter->p = p; iter->print_colors = print_colors; + + gen_field_decode(iter); +} + +bool +gen_field_iterator_next(struct gen_field_iterator *iter) +{ + if (!iter_advance_field(iter)) + return false; + + gen_field_decode(iter); + + return true; } static void @@ -969,7 +980,7 @@ gen_print_group(FILE *outfile, struct gen_group *group, int last_dword = -1; gen_field_iterator_init(, group, p, color); - while (gen_field_iterator_next()) { + do { if (last_dword != iter.dword) { for (int i = last_dword + 1; i <= iter.dword; i++) print_dword_header(outfile, , offset, i); @@ -983,5 +994,5 @@ gen_print_group(FILE *outfile, struct gen_group *group, [iter.dword], 
color); } } - } + } while (gen_field_iterator_next(&iter)); } -- 2.15.0.rc2
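The control-flow change in this patch (init decodes the first element, next() only advances, callers loop with do/while) can be illustrated with a much smaller iterator. A minimal sketch, assuming a hypothetical `int_iterator` over a plain array; none of these names exist in gen_decoder.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical iterator over a non-empty array of ints. */
struct int_iterator {
   const int *items;
   size_t count;
   size_t index;
   int value; /* "decoded" current element */
};

static void
int_iterator_decode(struct int_iterator *iter)
{
   iter->value = iter->items[iter->index];
}

/* After init, the iterator already exposes the first element. */
static void
int_iterator_init(struct int_iterator *iter, const int *items, size_t count)
{
   assert(count > 0);
   iter->items = items;
   iter->count = count;
   iter->index = 0;
   int_iterator_decode(iter);
}

/* Advance first, then decode; returns false once the end is reached. */
static bool
int_iterator_next(struct int_iterator *iter)
{
   if (iter->index + 1 >= iter->count)
      return false;
   iter->index++;
   int_iterator_decode(iter);
   return true;
}

static int
sum_all(const int *items, size_t count)
{
   struct int_iterator iter;
   int sum = 0;

   int_iterator_init(&iter, items, count);
   /* A plain while (int_iterator_next(&iter)) loop would skip items[0]. */
   do {
      sum += iter.value;
   } while (int_iterator_next(&iter));

   return sum;
}
```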
[Mesa-dev] [PATCH 25/33] intel: compiler: abstract printing
This is required to have output redirected to something else than a file descriptor (stdout). Signed-off-by: Lionel Landwerlin--- src/intel/compiler/brw_compile_clip.c | 5 +- src/intel/compiler/brw_compile_sf.c | 5 +- src/intel/compiler/brw_disasm.c | 645 -- src/intel/compiler/brw_eu.c | 39 -- src/intel/compiler/brw_eu.h | 13 +- src/intel/compiler/brw_eu_compact.c | 9 +- src/intel/compiler/intel_asm_annotation.c | 5 +- src/intel/compiler/test_eu_compact.cpp| 7 +- src/intel/tools/aubinator.c | 12 +- src/intel/tools/disasm.c | 63 ++- src/intel/tools/gen_disasm.h | 7 +- 11 files changed, 459 insertions(+), 351 deletions(-) diff --git a/src/intel/compiler/brw_compile_clip.c b/src/intel/compiler/brw_compile_clip.c index 83788e4b648..8ed82a78fb4 100644 --- a/src/intel/compiler/brw_compile_clip.c +++ b/src/intel/compiler/brw_compile_clip.c @@ -88,7 +88,10 @@ brw_compile_clip(const struct brw_compiler *compiler, if (unlikely(INTEL_DEBUG & DEBUG_CLIP)) { fprintf(stderr, "clip:\n"); brw_disassemble(compiler->devinfo, - program, 0, *final_assembly_size, stderr); + program, 0, *final_assembly_size, + &(struct brw_print){ + .handle = stderr, + .string = (brw_print_cb) fputs, }); fprintf(stderr, "\n"); } diff --git a/src/intel/compiler/brw_compile_sf.c b/src/intel/compiler/brw_compile_sf.c index 91e8a6da6cf..8ad5497290e 100644 --- a/src/intel/compiler/brw_compile_sf.c +++ b/src/intel/compiler/brw_compile_sf.c @@ -871,7 +871,10 @@ brw_compile_sf(const struct brw_compiler *compiler, if (unlikely(INTEL_DEBUG & DEBUG_SF)) { fprintf(stderr, "sf:\n"); brw_disassemble(compiler->devinfo, - program, 0, *final_assembly_size, stderr); + program, 0, *final_assembly_size, + &(struct brw_print){ + .handle = stderr, + .string = (brw_print_cb) fputs, }); fprintf(stderr, "\n"); } diff --git a/src/intel/compiler/brw_disasm.c b/src/intel/compiler/brw_disasm.c index 1a94ed39540..b09813529ac 100644 --- a/src/intel/compiler/brw_disasm.c +++ b/src/intel/compiler/brw_disasm.c @@ -24,6 +24,8 @@ #include 
#include +#include "common/gen_debug.h" + #include "brw_eu_defines.h" #include "brw_inst.h" #include "brw_shader.h" @@ -568,18 +570,18 @@ static const char *const sampler_target_format[4] = { static int column; static int -string(FILE *file, const char *string) +string(const struct brw_print *prnt, const char *string) { - fputs(string, file); + prnt->string(string, prnt->handle); column += strlen(string); return 0; } static int -format(FILE *f, const char *format, ...) PRINTFLIKE(2, 3); +format(const struct brw_print *prnt, const char *format, ...) PRINTFLIKE(2, 3); static int -format(FILE *f, const char *format, ...) +format(const struct brw_print *prnt, const char *format, ...) { char buf[1024]; va_list args; @@ -587,39 +589,39 @@ format(FILE *f, const char *format, ...) vsnprintf(buf, sizeof(buf) - 1, format, args); va_end(args); - string(f, buf); + string(prnt, buf); return 0; } static int -newline(FILE *f) +newline(const struct brw_print *prnt) { - putc('\n', f); + prnt->string("\n", prnt->handle); column = 0; return 0; } static int -pad(FILE *f, int c) +pad(const struct brw_print *prnt, int c) { do - string(f, " "); + string(prnt, " "); while (column < c); return 0; } static int -control(FILE *file, const char *name, const char *const ctrl[], +control(const struct brw_print *prnt, const char *name, const char *const ctrl[], unsigned id, int *space) { if (!ctrl[id]) { - fprintf(file, "*** invalid %s value %d ", name, id); + format(prnt, "*** invalid %s value %d ", name, id); return 1; } if (ctrl[id][0]) { if (space && *space) - string(file, " "); - string(file, ctrl[id]); + string(prnt, " "); + string(prnt, ctrl[id]); if (space) *space = 1; } @@ -627,20 +629,20 @@ control(FILE *file, const char *name, const char *const ctrl[], } static int -print_opcode(FILE *file, const struct gen_device_info *devinfo, +print_opcode(const struct brw_print *prnt, const struct gen_device_info *devinfo, enum opcode id) { const struct opcode_desc *desc = brw_opcode_desc(devinfo, 
id); if (!desc) { - format(file, "*** invalid opcode value %d ", id); + format(prnt, "*** invalid opcode value %d ", id); return 1; } - string(file, desc->name); + string(prnt, desc->name); return 0; } static int -reg(FILE *file, unsigned _reg_file,
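To make the intent of the print abstraction concrete, here is a rough sketch of the same pattern: a handle plus a string callback, so the output can go to something other than a FILE. It is written under assumptions: `print_sink`, `print_cb` and `string_buffer` are hypothetical names, and the callback signature (string first, opaque handle second) is inferred from the prnt->string(string, prnt->handle) calls in the patch rather than from the brw_eu.h change, which is not shown here.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: mirrors the (string, handle) call order. */
typedef int (*print_cb)(const char *str, void *handle);

struct print_sink {
   void *handle;
   print_cb string;
};

/* Sink 1: append to a fixed-size in-memory buffer, truncating if full. */
struct string_buffer {
   char data[256];
   size_t length;
};

static int
string_buffer_append(const char *str, void *handle)
{
   struct string_buffer *buf = handle;
   size_t n = strlen(str);
   size_t room = sizeof(buf->data) - 1 - buf->length;

   if (n > room)
      n = room;
   memcpy(buf->data + buf->length, str, n);
   buf->length += n;
   buf->data[buf->length] = '\0';
   return 0;
}

/* Sink 2: write to a FILE through a small wrapper instead of casting
 * fputs to the callback type. */
static int
file_append(const char *str, void *handle)
{
   return fputs(str, handle);
}

/* printf-style helper that works with any sink. */
static int
sink_format(const struct print_sink *sink, const char *fmt, ...)
{
   char tmp[1024];
   va_list args;

   assert(sink != NULL && sink->string != NULL);
   va_start(args, fmt);
   vsnprintf(tmp, sizeof(tmp), fmt, args);
   va_end(args);
   return sink->string(tmp, sink->handle);
}
```

The wrapper in file_append() is a design choice of this sketch: if the callback type takes a void * handle, calling fputs through a cast pointer relies on the two function types being treated alike, which the wrapper avoids.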
[Mesa-dev] [PATCH 29/33] intel: decoder: change group_get_length() to take first dword
This is a first step in not accessing the dwords through pointers. Signed-off-by: Lionel Landwerlin--- src/intel/common/gen_decoder.c| 27 +-- src/intel/common/gen_decoder.h| 2 +- src/intel/tools/aubinator.c | 4 ++-- src/intel/tools/aubinator_error_decode.c | 2 +- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 2 +- 5 files changed, 18 insertions(+), 19 deletions(-) diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c index bd39ff3654c..217e84fb38e 100644 --- a/src/intel/common/gen_decoder.c +++ b/src/intel/common/gen_decoder.c @@ -726,35 +726,34 @@ gen_group_find_field(struct gen_group *group, const char *name) } int -gen_group_get_length(struct gen_group *group, const uint32_t *p) +gen_group_get_length(struct gen_group *group, uint32_t dw0) { - uint32_t h = p[0]; - uint32_t type = field_value(h, 29, 31); + uint32_t type = field_value(dw0, 29, 31); switch (type) { case 0: /* MI */ { - uint32_t opcode = field_value(h, 23, 28); + uint32_t opcode = field_value(dw0, 23, 28); if (opcode < 16) return 1; else - return field_value(h, 0, 7) + 2; + return field_value(dw0, 0, 7) + 2; break; } case 2: /* BLT */ { - return field_value(h, 0, 7) + 2; + return field_value(dw0, 0, 7) + 2; } case 3: /* Render */ { - uint32_t subtype = field_value(h, 27, 28); - uint32_t opcode = field_value(h, 24, 26); - uint16_t whole_opcode = field_value(h, 16, 31); + uint32_t subtype = field_value(dw0, 27, 28); + uint32_t opcode = field_value(dw0, 24, 26); + uint16_t whole_opcode = field_value(dw0, 16, 31); switch (subtype) { case 0: if (whole_opcode == 0x6104 /* PIPELINE_SELECT_965 */) return 1; else if (opcode < 2) -return field_value(h, 0, 7) + 2; +return field_value(dw0, 0, 7) + 2; else return -1; case 1: @@ -764,9 +763,9 @@ gen_group_get_length(struct gen_group *group, const uint32_t *p) return -1; case 2: { if (opcode == 0) -return field_value(h, 0, 7) + 2; +return field_value(dw0, 0, 7) + 2; else if (opcode < 3) -return field_value(h, 0, 15) + 2; +return 
field_value(dw0, 0, 15) + 2; else return -1; } @@ -774,7 +773,7 @@ gen_group_get_length(struct gen_group *group, const uint32_t *p) if (whole_opcode == 0x780b) return 1; else if (opcode < 4) -return field_value(h, 0, 7) + 2; +return field_value(dw0, 0, 7) + 2; else return -1; } @@ -978,7 +977,7 @@ gen_field_iterator_init(struct gen_field_iterator *iter, else iter->field = group->next->fields; iter->p = p; - iter->end = [gen_group_get_length(iter->group, iter->p)]; + iter->end = [gen_group_get_length(iter->group, p[0])]; iter->print_colors = print_colors; iter_decode_field(iter); diff --git a/src/intel/common/gen_decoder.h b/src/intel/common/gen_decoder.h index 89ce05ef6d0..334cfaac2c2 100644 --- a/src/intel/common/gen_decoder.h +++ b/src/intel/common/gen_decoder.h @@ -56,7 +56,7 @@ struct gen_group *gen_spec_find_register(struct gen_spec *spec, uint32_t offset) struct gen_group *gen_spec_find_register_by_name(struct gen_spec *spec, const char *name); struct gen_enum *gen_spec_find_enum(struct gen_spec *spec, const char *name); -int gen_group_get_length(struct gen_group *group, const uint32_t *p); +int gen_group_get_length(struct gen_group *group, uint32_t dw0); const char *gen_group_get_name(struct gen_group *group); uint32_t gen_group_get_opcode(struct gen_group *group); struct gen_field *gen_group_find_field(struct gen_group *group, const char *name); diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c index 436a4928979..72f8d2aa4e8 100644 --- a/src/intel/tools/aubinator.c +++ b/src/intel/tools/aubinator.c @@ -699,7 +699,7 @@ parse_commands(struct gen_spec *spec, uint32_t *cmds, int size, int engine) for (p = cmds; p < end; p += length) { inst = gen_spec_find_instruction(spec, p); - length = gen_group_get_length(inst, p); + length = gen_group_get_length(inst, p[0]); assert(inst == NULL || length > 0); length = MAX2(1, length); if (inst == NULL) { @@ -732,7 +732,7 @@ parse_commands(struct gen_spec *spec, uint32_t *cmds, int size, int engine) 
fprintf(outfile, "%s0x%08"PRIx64": 0x%08x: %s (%i Dwords) %-80s %s\n", color, offset, p[0], gen_group_get_name(inst), - gen_group_get_length(inst, p), + gen_group_get_length(inst, p[0]), "", reset_color); diff --git a/src/intel/tools/aubinator_error_decode.c
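As an illustration of why the first dword is enough to compute a length, here is a reduced sketch of the decoding shown in the hunk above. It only covers the MI and BLT cases exactly as they appear in the patch and returns -1 for everything else, which is a simplification of this sketch (the real function also handles the Render case). `dword_bits()` and `command_length_from_dw0()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits [start, end] (inclusive) from a dword. */
static uint32_t
dword_bits(uint32_t dw, unsigned start, unsigned end)
{
   assert(start <= end && end < 32);
   uint32_t width = end - start + 1;
   uint32_t mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
   return (dw >> start) & mask;
}

/* Length in dwords derived from the header dword alone, or -1 when this
 * reduced sketch does not know the command type. */
static int
command_length_from_dw0(uint32_t dw0)
{
   switch (dword_bits(dw0, 29, 31)) {
   case 0: /* MI */
      if (dword_bits(dw0, 23, 28) < 16)
         return 1;
      return (int) dword_bits(dw0, 0, 7) + 2;
   case 2: /* BLT */
      return (int) dword_bits(dw0, 0, 7) + 2;
   default:
      return -1;
   }
}
```

Since the function never looks past the header, callers can pass a value obtained by any means, not only by dereferencing a pointer.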
[Mesa-dev] [PATCH 14/33] intel: decoder: extract instruction/structs length
Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.c | 7 +++++++
 src/intel/common/gen_decoder.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index cd18580aea8..2562aa56175 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -162,6 +162,13 @@ create_group(struct parser_context *ctx,
    group->spec = ctx->spec;
    group->variable = false;

+   for (int i = 0; atts[i]; i += 2) {
+      char *p;
+      if (strcmp(atts[i], "length") == 0) {
+         group->dw_length = strtoul(atts[i + 1], &p, 0);
+      }
+   }
+
    if (parent) {
       group->parent = parent;
       get_group_offset_count(atts,
diff --git a/src/intel/common/gen_decoder.h b/src/intel/common/gen_decoder.h
index 2637c42e7d3..dc1f92ab6a3 100644
--- a/src/intel/common/gen_decoder.h
+++ b/src/intel/common/gen_decoder.h
@@ -85,6 +85,7 @@ struct gen_group {
    struct gen_field *fields; /* linked list of fields */

+   uint32_t dw_length;
    uint32_t group_offset, group_count;
    uint32_t group_size;
    bool variable;
-- 
2.15.0.rc2
[Mesa-dev] [PATCH 20/33] intel: decoder: rename field() to field_value()
We would like to avoid collisions with variables named field. Signed-off-by: Lionel Landwerlin--- src/intel/common/gen_decoder.c | 36 ++-- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c index 91076e901fe..a63c09cd37e 100644 --- a/src/intel/common/gen_decoder.c +++ b/src/intel/common/gen_decoder.c @@ -235,7 +235,7 @@ mask(int start, int end) } static inline uint64_t -field(uint64_t value, int start, int end) +field_value(uint64_t value, int start, int end) { get_start_end_pos(, ); return (value & mask(start, end)) >> (start); @@ -692,32 +692,32 @@ int gen_group_get_length(struct gen_group *group, const uint32_t *p) { uint32_t h = p[0]; - uint32_t type = field(h, 29, 31); + uint32_t type = field_value(h, 29, 31); switch (type) { case 0: /* MI */ { - uint32_t opcode = field(h, 23, 28); + uint32_t opcode = field_value(h, 23, 28); if (opcode < 16) return 1; else - return field(h, 0, 7) + 2; + return field_value(h, 0, 7) + 2; break; } case 2: /* BLT */ { - return field(h, 0, 7) + 2; + return field_value(h, 0, 7) + 2; } case 3: /* Render */ { - uint32_t subtype = field(h, 27, 28); - uint32_t opcode = field(h, 24, 26); - uint16_t whole_opcode = field(h, 16, 31); + uint32_t subtype = field_value(h, 27, 28); + uint32_t opcode = field_value(h, 24, 26); + uint16_t whole_opcode = field_value(h, 16, 31); switch (subtype) { case 0: if (whole_opcode == 0x6104 /* PIPELINE_SELECT_965 */) return 1; else if (opcode < 2) -return field(h, 0, 7) + 2; +return field_value(h, 0, 7) + 2; else return -1; case 1: @@ -727,9 +727,9 @@ gen_group_get_length(struct gen_group *group, const uint32_t *p) return -1; case 2: { if (opcode == 0) -return field(h, 0, 7) + 2; +return field_value(h, 0, 7) + 2; else if (opcode < 3) -return field(h, 0, 15) + 2; +return field_value(h, 0, 15) + 2; else return -1; } @@ -737,7 +737,7 @@ gen_group_get_length(struct gen_group *group, const uint32_t *p) if (whole_opcode == 0x780b) return 
1; else if (opcode < 4) -return field(h, 0, 7) + 2; +return field_value(h, 0, 7) + 2; else return -1; } @@ -845,13 +845,13 @@ iter_decode_field(struct gen_field_iterator *iter) switch (iter->field->type.kind) { case GEN_TYPE_UNKNOWN: case GEN_TYPE_INT: { - uint64_t value = field(v.qw, iter->field->start, iter->field->end); + uint64_t value = field_value(v.qw, iter->field->start, iter->field->end); snprintf(iter->value, sizeof(iter->value), "%"PRId64, value); enum_name = gen_get_enum_name(>field->inline_enum, value); break; } case GEN_TYPE_UINT: { - uint64_t value = field(v.qw, iter->field->start, iter->field->end); + uint64_t value = field_value(v.qw, iter->field->start, iter->field->end); snprintf(iter->value, sizeof(iter->value), "%"PRIu64, value); enum_name = gen_get_enum_name(>field->inline_enum, value); break; @@ -860,7 +860,7 @@ iter_decode_field(struct gen_field_iterator *iter) const char *true_string = iter->print_colors ? "\e[0;35mtrue\e[0m" : "true"; snprintf(iter->value, sizeof(iter->value), "%s", - field(v.qw, iter->field->start, iter->field->end) ? + field_value(v.qw, iter->field->start, iter->field->end) ? true_string : "false"); break; } @@ -881,8 +881,8 @@ iter_decode_field(struct gen_field_iterator *iter) break; case GEN_TYPE_UFIXED: snprintf(iter->value, sizeof(iter->value), "%f", - (float) field(v.qw, iter->field->start, - iter->field->end) / (1 << iter->field->type.f)); + (float) field_value(v.qw, iter->field->start, + iter->field->end) / (1 << iter->field->type.f)); break; case GEN_TYPE_SFIXED: /* FIXME: Sign extend extracted field. 
*/ @@ -891,7 +891,7 @@ iter_decode_field(struct gen_field_iterator *iter) case GEN_TYPE_MBO: break; case GEN_TYPE_ENUM: { - uint64_t value = field(v.qw, iter->field->start, iter->field->end); + uint64_t value = field_value(v.qw, iter->field->start, iter->field->end); snprintf(iter->value, sizeof(iter->value), "%"PRId64, value); enum_name = gen_get_enum_name(iter->field->type.gen_enum, value); -- 2.15.0.rc2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org
[Mesa-dev] [PATCH 21/33] intel: decoder: extract field value computation
Signed-off-by: Lionel Landwerlin--- src/intel/common/gen_decoder.c | 59 ++ 1 file changed, 37 insertions(+), 22 deletions(-) diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c index a63c09cd37e..b0bd161fef3 100644 --- a/src/intel/common/gen_decoder.c +++ b/src/intel/common/gen_decoder.c @@ -818,6 +818,32 @@ iter_advance_field(struct gen_field_iterator *iter) return true; } +static uint64_t +iter_decode_field_raw(struct gen_field *field, + const uint32_t *p, + const uint32_t *end) +{ + uint64_t qw = 0; + + if ((field->end - field->start) > 32) { + if ((p + 1) < end) + qw = ((uint64_t) p[1]) << 32; + qw |= p[0]; + } else + qw = p[0]; + + qw = field_value(qw, field->start, field->end); + + /* Address & offset types have to be aligned to dwords, their start bit is +* a reminder of the alignment requirement. +*/ + if (field->type.kind == GEN_TYPE_ADDRESS || + field->type.kind == GEN_TYPE_OFFSET) + qw <<= field->start % 32; + + return qw; +} + static void iter_decode_field(struct gen_field_iterator *iter) { @@ -833,35 +859,28 @@ iter_decode_field(struct gen_field_iterator *iter) memset(, 0, sizeof(v)); - if ((iter->field->end - iter->field->start) > 32) { - if (>p[iter->dword + 1] < iter->end) - v.qw = ((uint64_t) iter->p[iter->dword+1] << 32); - v.qw |= iter->p[iter->dword]; - } else - v.qw = iter->p[iter->dword]; + v.qw = iter_decode_field_raw(iter->field, +>p[iter->dword], iter->end); const char *enum_name = NULL; switch (iter->field->type.kind) { case GEN_TYPE_UNKNOWN: case GEN_TYPE_INT: { - uint64_t value = field_value(v.qw, iter->field->start, iter->field->end); - snprintf(iter->value, sizeof(iter->value), "%"PRId64, value); - enum_name = gen_get_enum_name(>field->inline_enum, value); + snprintf(iter->value, sizeof(iter->value), "%"PRId64, v.qw); + enum_name = gen_get_enum_name(>field->inline_enum, v.qw); break; } case GEN_TYPE_UINT: { - uint64_t value = field_value(v.qw, iter->field->start, iter->field->end); - snprintf(iter->value, 
sizeof(iter->value), "%"PRIu64, value); - enum_name = gen_get_enum_name(>field->inline_enum, value); + snprintf(iter->value, sizeof(iter->value), "%"PRIu64, v.qw); + enum_name = gen_get_enum_name(>field->inline_enum, v.qw); break; } case GEN_TYPE_BOOL: { const char *true_string = iter->print_colors ? "\e[0;35mtrue\e[0m" : "true"; snprintf(iter->value, sizeof(iter->value), "%s", - field_value(v.qw, iter->field->start, iter->field->end) ? - true_string : "false"); + v.qw ? true_string : "false"); break; } case GEN_TYPE_FLOAT: @@ -869,8 +888,7 @@ iter_decode_field(struct gen_field_iterator *iter) break; case GEN_TYPE_ADDRESS: case GEN_TYPE_OFFSET: - snprintf(iter->value, sizeof(iter->value), "0x%08"PRIx64, - field_address(v.qw, iter->field->start, iter->field->end)); + snprintf(iter->value, sizeof(iter->value), "0x%08"PRIx64, v.qw); break; case GEN_TYPE_STRUCT: snprintf(iter->value, sizeof(iter->value), "", @@ -881,8 +899,7 @@ iter_decode_field(struct gen_field_iterator *iter) break; case GEN_TYPE_UFIXED: snprintf(iter->value, sizeof(iter->value), "%f", - (float) field_value(v.qw, iter->field->start, - iter->field->end) / (1 << iter->field->type.f)); + (float) v.qw / (1 << iter->field->type.f)); break; case GEN_TYPE_SFIXED: /* FIXME: Sign extend extracted field. */ @@ -891,10 +908,8 @@ iter_decode_field(struct gen_field_iterator *iter) case GEN_TYPE_MBO: break; case GEN_TYPE_ENUM: { - uint64_t value = field_value(v.qw, iter->field->start, iter->field->end); - snprintf(iter->value, sizeof(iter->value), - "%"PRId64, value); - enum_name = gen_get_enum_name(iter->field->type.gen_enum, value); + snprintf(iter->value, sizeof(iter->value), "%"PRId64, v.qw); + enum_name = gen_get_enum_name(iter->field->type.gen_enum, v.qw); break; } } -- 2.15.0.rc2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
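The raw value computation extracted by this patch can be tried out in isolation with a sketch like the following. It is written under assumptions: bit positions are taken relative to the first dword of the field, the second dword is read when the field's end bit is 32 or above (the patch itself tests the field width instead), and `extract_bits()` and `decode_raw()` are hypothetical stand-ins for field_value() and iter_decode_field_raw().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract bits [start, end] (inclusive) from a 64-bit value. */
static uint64_t
extract_bits(uint64_t value, unsigned start, unsigned end)
{
   assert(start <= end && end < 64);
   unsigned width = end - start + 1;
   uint64_t mask = (width == 64) ? ~UINT64_C(0)
                                 : ((UINT64_C(1) << width) - 1);
   return (value >> start) & mask;
}

/* Decode one field starting at dword p; end points one past the last
 * valid dword, so a field at the very end never reads out of bounds. */
static uint64_t
decode_raw(const uint32_t *p, const uint32_t *end,
           unsigned start_bit, unsigned end_bit, bool is_address)
{
   assert(p < end);
   uint64_t qw = p[0];

   if (end_bit >= 32 && (p + 1) < end)
      qw |= (uint64_t) p[1] << 32;

   qw = extract_bits(qw, start_bit, end_bit);

   /* Addresses and offsets are stored without their low alignment bits;
    * shifting back by the start bit restores the aligned value. */
   if (is_address)
      qw <<= start_bit % 32;

   return qw;
}
```

With the dwords { 0xdeadb000, 0x00000001 } and an address field at bits 12..47, the extracted value is 0x1deadb and the shift restores 0x1deadb000.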
[Mesa-dev] [PATCH 17/33] intel: common: make intel utils available from C++
Signed-off-by: Lionel Landwerlin--- src/intel/common/gen_decoder.h | 9 + src/intel/common/gen_device_info.h | 8 src/intel/tools/gen_disasm.h | 8 3 files changed, 25 insertions(+) diff --git a/src/intel/common/gen_decoder.h b/src/intel/common/gen_decoder.h index dc1f92ab6a3..2c54eed267b 100644 --- a/src/intel/common/gen_decoder.h +++ b/src/intel/common/gen_decoder.h @@ -30,6 +30,10 @@ #include "common/gen_device_info.h" #include "util/hash_table.h" +#ifdef __cplusplus +extern "C" { +#endif + struct gen_spec; struct gen_group; struct gen_field; @@ -162,4 +166,9 @@ void gen_print_group(FILE *out, uint64_t offset, const uint32_t *p, bool color); +#ifdef __cplusplus +} +#endif + + #endif /* GEN_DECODER_H */ diff --git a/src/intel/common/gen_device_info.h b/src/intel/common/gen_device_info.h index 59b345e949c..30ddd905be1 100644 --- a/src/intel/common/gen_device_info.h +++ b/src/intel/common/gen_device_info.h @@ -28,6 +28,10 @@ #include #include +#ifdef __cplusplus +extern "C" { +#endif + /** * Intel hardware information and quirks */ @@ -198,4 +202,8 @@ struct gen_device_info bool gen_get_device_info(int devid, struct gen_device_info *devinfo); const char *gen_get_device_name(int devid); +#ifdef __cplusplus +} +#endif + #endif /* GEN_DEVICE_INFO_H */ diff --git a/src/intel/tools/gen_disasm.h b/src/intel/tools/gen_disasm.h index 24b56c9a8e1..d2764bb90b7 100644 --- a/src/intel/tools/gen_disasm.h +++ b/src/intel/tools/gen_disasm.h @@ -24,6 +24,10 @@ #ifndef GEN_DISASM_H #define GEN_DISASM_H +#ifdef __cplusplus +extern "C" { +#endif + struct gen_disasm; struct gen_disasm *gen_disasm_create(int pciid); @@ -32,4 +36,8 @@ void gen_disasm_disassemble(struct gen_disasm *disasm, void gen_disasm_destroy(struct gen_disasm *disasm); +#ifdef __cplusplus +} +#endif + #endif /* GEN_DISASM_H */ -- 2.15.0.rc2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 12/33] intel: decoder: simplify creation of struct when 0-allocated
Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 0bf705fa9e1..6a6a1f0aca4 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -161,8 +161,6 @@ create_group(struct parser_context *ctx,
       group->name = ralloc_strdup(group, name);

    group->spec = ctx->spec;
-   group->group_offset = 0;
-   group->group_count = 0;
    group->variable = false;

    if (parent) {
@@ -186,8 +184,6 @@ create_enum(struct parser_context *ctx, const char *name, const char **atts)
    if (name)
       e->name = ralloc_strdup(e, name);

-   e->nvalues = 0;
-
    return e;
 }

-- 
2.15.0.rc2
[Mesa-dev] [PATCH 16/33] intel: decoder: remove unused platform field
Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 2562aa56175..050926f5642 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -49,7 +49,6 @@ struct parser_context {
    XML_Parser parser;
    int foo;
    struct location loc;
-   const char *platform;

    struct gen_group *group;
    struct gen_enum *enoom;
@@ -369,7 +368,6 @@ start_element(void *data, const char *element_name, const char **atts)
       if (gen == NULL)
          fail(&ctx->loc, "no gen given");

-      ctx->platform = xstrdup(name);
       int major, minor;
       int n = sscanf(gen, "%d.%d", &major, &minor);
       if (n == 0)
-- 
2.15.0.rc2
[Mesa-dev] [PATCH 15/33] intel: error-decode: implement a rolling window of programs
If we have more programs than what we can store, aubinator_error_decode will assert. Instead let's have a rolling window of programs. v2: Fix overflowing issues (Eric Engestrom) Signed-off-by: Lionel LandwerlinReviewed-by: Eric Engestrom --- src/intel/tools/aubinator_error_decode.c | 24 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/src/intel/tools/aubinator_error_decode.c b/src/intel/tools/aubinator_error_decode.c index ed4d6f662ce..52c323e77ee 100644 --- a/src/intel/tools/aubinator_error_decode.c +++ b/src/intel/tools/aubinator_error_decode.c @@ -47,6 +47,8 @@ #define GREEN_HEADER CSI "1;42m" #define NORMAL CSI "0m" +#define MIN(a, b) ((a) < (b) ? (a) : (b)) + /* options */ static bool option_full_decode = true; @@ -220,7 +222,15 @@ struct program { #define MAX_NUM_PROGRAMS 4096 static struct program programs[MAX_NUM_PROGRAMS]; -static int num_programs = 0; +static int idx_program = 0, num_programs = 0; + +static int next_program(void) +{ + int ret = idx_program; + idx_program = (idx_program + 1) % MAX_NUM_PROGRAMS; + num_programs = MIN(num_programs + 1, MAX_NUM_PROGRAMS); + return ret; +} static void decode(struct gen_spec *spec, const char *buffer_name, @@ -300,7 +310,7 @@ static void decode(struct gen_spec *spec, enabled[1] ? "SIMD16 fragment shader" : enabled[2] ? 
"SIMD32 fragment shader" : NULL; -programs[num_programs++] = (struct program) { +programs[next_program()] = (struct program) { .type = type, .command = inst->name, .command_offset = offset, @@ -309,7 +319,7 @@ static void decode(struct gen_spec *spec, }; } else { if (enabled[0]) /* SIMD8 */ { - programs[num_programs++] = (struct program) { + programs[next_program()] = (struct program) { .type = "SIMD8 fragment shader", .command = inst->name, .command_offset = offset, @@ -318,7 +328,7 @@ static void decode(struct gen_spec *spec, }; } if (enabled[1]) /* SIMD16 */ { - programs[num_programs++] = (struct program) { + programs[next_program()] = (struct program) { .type = "SIMD16 fragment shader", .command = inst->name, .command_offset = offset, @@ -327,7 +337,7 @@ static void decode(struct gen_spec *spec, }; } if (enabled[2]) /* SIMD32 */ { - programs[num_programs++] = (struct program) { + programs[next_program()] = (struct program) { .type = "SIMD32 fragment shader", .command = inst->name, .command_offset = offset, @@ -374,7 +384,7 @@ static void decode(struct gen_spec *spec, NULL; if (is_enabled) { -programs[num_programs++] = (struct program) { +programs[next_program()] = (struct program) { .type = type, .command = inst->name, .command_offset = offset, @@ -383,8 +393,6 @@ static void decode(struct gen_spec *spec, }; } } - - assert(num_programs < MAX_NUM_PROGRAMS); } } -- 2.15.0.rc2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
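The rolling window introduced here is a small ring buffer: the write index wraps around and the count saturates at the capacity. The sketch below reuses the shape of next_program() from the patch with a tiny capacity so the wrap-around is easy to follow. `MAX_ENTRIES`, `record_entry()` and `nth_oldest_entry()` are hypothetical additions for illustration; the patch itself does not show how the stored programs are read back.

```c
#include <assert.h>

#define MAX_ENTRIES 4
#define MIN(a, b) ((a) < (b) ? (a) : (b))

static int entries[MAX_ENTRIES];
static int idx_entry = 0, num_entries = 0;

/* Same shape as next_program(): hand out the current slot, advance the
 * write index with wrap-around, and saturate the count at the capacity. */
static int
next_entry(void)
{
   int ret = idx_entry;
   idx_entry = (idx_entry + 1) % MAX_ENTRIES;
   num_entries = MIN(num_entries + 1, MAX_ENTRIES);
   return ret;
}

static void
record_entry(int value)
{
   entries[next_entry()] = value;
}

/* Return the i-th oldest stored value (0 is the oldest). Once the buffer
 * has wrapped, the oldest value sits at the write index. */
static int
nth_oldest_entry(int i)
{
   assert(i >= 0 && i < num_entries);
   int start = (num_entries < MAX_ENTRIES) ? 0 : idx_entry;
   return entries[(start + i) % MAX_ENTRIES];
}
```

Recording the values 1 through 6 leaves 3, 4, 5 and 6 in the window, with the two oldest values overwritten instead of tripping an assert.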
[Mesa-dev] [PATCH 11/33] intel: decoder: add destructor for gen_spec
This makes use of ralloc to simplify the destruction. We can also store instructions in hash tables. Signed-off-by: Lionel Landwerlin--- src/intel/common/gen_decoder.c | 176 +++-- src/intel/common/gen_decoder.h | 15 ++-- 2 files changed, 90 insertions(+), 101 deletions(-) diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c index f8fca28a18b..0bf705fa9e1 100644 --- a/src/intel/common/gen_decoder.c +++ b/src/intel/common/gen_decoder.c @@ -38,6 +38,7 @@ #include "genxml/genX_xml.h" #define XML_BUFFER_SIZE 4096 +#define MAX_VALUE_ITEMS 128 struct location { const char *filename; @@ -53,11 +54,10 @@ struct parser_context { struct gen_group *group; struct gen_enum *enoom; - int nvalues; - struct gen_value *values[256]; + int n_values, n_allocated_values; + struct gen_value **values; - struct gen_field *fields[256]; - int nfields; + struct gen_field *last_field; struct gen_spec *spec; }; @@ -77,42 +77,34 @@ gen_group_get_opcode(struct gen_group *group) struct gen_group * gen_spec_find_struct(struct gen_spec *spec, const char *name) { - for (int i = 0; i < spec->nstructs; i++) - if (strcmp(spec->structs[i]->name, name) == 0) - return spec->structs[i]; - - return NULL; + struct hash_entry *entry = _mesa_hash_table_search(spec->structs, + name); + return entry ? entry->data : NULL; } struct gen_group * gen_spec_find_register(struct gen_spec *spec, uint32_t offset) { - for (int i = 0; i < spec->nregisters; i++) - if (spec->registers[i]->register_offset == offset) - return spec->registers[i]; - - return NULL; + struct hash_entry *entry = + _mesa_hash_table_search(spec->registers_by_offset, + (void *) (uintptr_t) offset); + return entry ? 
entry->data : NULL; } struct gen_group * gen_spec_find_register_by_name(struct gen_spec *spec, const char *name) { - for (int i = 0; i < spec->nregisters; i++) { - if (strcmp(spec->registers[i]->name, name) == 0) - return spec->registers[i]; - } - - return NULL; + struct hash_entry *entry = + _mesa_hash_table_search(spec->registers_by_name, name); + return entry ? entry->data : NULL; } struct gen_enum * gen_spec_find_enum(struct gen_spec *spec, const char *name) { - for (int i = 0; i < spec->nenums; i++) - if (strcmp(spec->enums[i]->name, name) == 0) - return spec->enums[i]; - - return NULL; + struct hash_entry *entry = _mesa_hash_table_search(spec->enums, + name); + return entry ? entry->data : NULL; } uint32_t @@ -135,35 +127,6 @@ fail(struct location *loc, const char *msg, ...) exit(EXIT_FAILURE); } -static void * -fail_on_null(void *p) -{ - if (p == NULL) { - fprintf(stderr, "aubinator: out of memory\n"); - exit(EXIT_FAILURE); - } - - return p; -} - -static char * -xstrdup(const char *s) -{ - return fail_on_null(strdup(s)); -} - -static void * -zalloc(size_t s) -{ - return calloc(s, 1); -} - -static void * -xzalloc(size_t s) -{ - return fail_on_null(zalloc(s)); -} - static void get_group_offset_count(const char **atts, uint32_t *offset, uint32_t *count, uint32_t *size, bool *variable) @@ -193,9 +156,9 @@ create_group(struct parser_context *ctx, { struct gen_group *group; - group = xzalloc(sizeof(*group)); + group = rzalloc(ctx->spec, struct gen_group); if (name) - group->name = xstrdup(name); + group->name = ralloc_strdup(group, name); group->spec = ctx->spec; group->group_offset = 0; @@ -219,9 +182,9 @@ create_enum(struct parser_context *ctx, const char *name, const char **atts) { struct gen_enum *e; - e = xzalloc(sizeof(*e)); + e = rzalloc(ctx->spec, struct gen_enum); if (name) - e->name = xstrdup(name); + e->name = ralloc_strdup(e, name); e->nvalues = 0; @@ -326,11 +289,11 @@ create_field(struct parser_context *ctx, const char **atts) char *p; int i; - field 
= xzalloc(sizeof(*field)); + field = rzalloc(ctx->group, struct gen_field); for (i = 0; atts[i]; i += 2) { if (strcmp(atts[i], "name") == 0) - field->name = xstrdup(atts[i + 1]); + field->name = ralloc_strdup(field, atts[i + 1]); else if (strcmp(atts[i], "start") == 0) field->start = strtoul(atts[i + 1], , 0); else if (strcmp(atts[i], "end") == 0) { @@ -350,11 +313,11 @@ create_field(struct parser_context *ctx, const char **atts) static struct gen_value * create_value(struct parser_context *ctx, const char **atts) { - struct gen_value *value = xzalloc(sizeof(*value)); + struct gen_value *value = rzalloc(ctx->values, struct gen_value); for (int i = 0; atts[i]; i += 2) { if (strcmp(atts[i], "name") == 0) -
[Mesa-dev] [PATCH 13/33] intel: decoder: pack iterator variable declarations
Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 6a6a1f0aca4..cd18580aea8 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -131,10 +131,9 @@ static void
 get_group_offset_count(const char **atts, uint32_t *offset, uint32_t *count,
                        uint32_t *size, bool *variable)
 {
-   char *p;
-   int i;
+   for (int i = 0; atts[i]; i += 2) {
+      char *p;

-   for (i = 0; atts[i]; i += 2) {
       if (strcmp(atts[i], "count") == 0) {
          *count = strtoul(atts[i + 1], &p, 0);
          if (*count == 0)
@@ -190,10 +189,9 @@ create_enum(struct parser_context *ctx, const char *name, const char **atts)
 static void
 get_register_offset(const char **atts, uint32_t *offset)
 {
-   char *p;
-   int i;
+   for (int i = 0; atts[i]; i += 2) {
+      char *p;

-   for (i = 0; atts[i]; i += 2) {
       if (strcmp(atts[i], "num") == 0)
          *offset = strtoul(atts[i + 1], &p, 0);
    }
@@ -282,12 +280,12 @@ static struct gen_field *
 create_field(struct parser_context *ctx, const char **atts)
 {
    struct gen_field *field;
-   char *p;
-   int i;

    field = rzalloc(ctx->group, struct gen_field);

-   for (i = 0; atts[i]; i += 2) {
+   for (int i = 0; atts[i]; i += 2) {
+      char *p;
+
       if (strcmp(atts[i], "name") == 0)
          field->name = ralloc_strdup(field, atts[i + 1]);
       else if (strcmp(atts[i], "start") == 0)
@@ -346,13 +344,12 @@ static void
 start_element(void *data, const char *element_name, const char **atts)
 {
    struct parser_context *ctx = data;
-   int i;
    const char *name = NULL;
    const char *gen = NULL;

    ctx->loc.line_number = XML_GetCurrentLineNumber(ctx->parser);

-   for (i = 0; atts[i]; i += 2) {
+   for (int i = 0; atts[i]; i += 2) {
       if (strcmp(atts[i], "name") == 0)
          name = atts[i + 1];
       else if (strcmp(atts[i], "gen") == 0)
-- 
2.15.0.rc2
[Mesa-dev] [PATCH 09/33] aubinator: print number of dwords per instruction
Signed-off-by: Lionel Landwerlin
---
 src/intel/tools/aubinator.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/intel/tools/aubinator.c b/src/intel/tools/aubinator.c
index 48d4456cc16..2c4eaab1701 100644
--- a/src/intel/tools/aubinator.c
+++ b/src/intel/tools/aubinator.c
@@ -729,9 +729,12 @@ parse_commands(struct gen_spec *spec, uint32_t *cmds, int size, int engine)
          else
             offset = 0;

-         fprintf(outfile, "%s0x%08"PRIx64": 0x%08x: %-80s%s\n",
+         fprintf(outfile, "%s0x%08"PRIx64": 0x%08x: %s (%i Dwords) %-80s %s\n",
                  color, offset, p[0],
-                 gen_group_get_name(inst), reset_color);
+                 gen_group_get_name(inst),
+                 gen_group_get_length(inst, p),
+                 "",
+                 reset_color);

          if (option_full_decode) {
             decode_group(inst, p, 0);
-- 
2.15.0.rc2
[Mesa-dev] [PATCH 10/33] intel: decoder: expose helper to test header fields
These fields are of little importance as they're used to recognize
instructions.

Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.c | 6 +++---
 src/intel/common/gen_decoder.h | 1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index c1affc47a02..f8fca28a18b 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -963,8 +963,8 @@ print_dword_header(FILE *outfile,
            offset + 4 * dword, iter->p[dword], dword);
 }

-static bool
-is_header_field(struct gen_group *group, struct gen_field *field)
+bool
+gen_group_header_is_header(struct gen_group *group, struct gen_field *field)
 {
    uint32_t bits;

@@ -991,7 +991,7 @@ gen_print_group(FILE *outfile, struct gen_group *group,
             print_dword_header(outfile, &iter, offset, i);
          last_dword = iter.dword;
       }
-      if (!is_header_field(group, iter.field)) {
+      if (!gen_group_header_is_header(group, iter.field)) {
          fprintf(outfile, "%s: %s\n", iter.name, iter.value);
          if (iter.struct_desc) {
             uint64_t struct_offset = offset + 4 * iter.dword;
diff --git a/src/intel/common/gen_decoder.h b/src/intel/common/gen_decoder.h
index 4d9edf78ff0..86ececeb8b1 100644
--- a/src/intel/common/gen_decoder.h
+++ b/src/intel/common/gen_decoder.h
@@ -50,6 +50,7 @@ int gen_group_get_length(struct gen_group *group, const uint32_t *p);
 const char *gen_group_get_name(struct gen_group *group);
 uint32_t gen_group_get_opcode(struct gen_group *group);
 struct gen_enum *gen_spec_find_enum(struct gen_spec *spec, const char *name);
+bool gen_group_header_is_header(struct gen_group *group, struct gen_field *field);

 struct gen_field_iterator {
    struct gen_group *group;
-- 
2.15.0.rc2
[Mesa-dev] [PATCH 05/33] intel: decoder: reorder iterator init function
Making the next change more readable.

Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 040201541ff..ef39c1c14db 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -758,20 +758,6 @@ gen_group_get_length(struct gen_group *group, const uint32_t *p)
    return -1;
 }

-void
-gen_field_iterator_init(struct gen_field_iterator *iter,
-                        struct gen_group *group,
-                        const uint32_t *p,
-                        bool print_colors)
-{
-   memset(iter, 0, sizeof(*iter));
-
-   iter->group = group;
-   iter->field = group->fields;
-   iter->p = p;
-   iter->print_colors = print_colors;
-}
-
 static const char *
 gen_get_enum_name(struct gen_enum *e, uint64_t value)
 {
@@ -937,6 +923,20 @@ gen_field_iterator_next(struct gen_field_iterator *iter)
    return true;
 }

+void
+gen_field_iterator_init(struct gen_field_iterator *iter,
+                        struct gen_group *group,
+                        const uint32_t *p,
+                        bool print_colors)
+{
+   memset(iter, 0, sizeof(*iter));
+
+   iter->group = group;
+   iter->field = group->fields;
+   iter->p = p;
+   iter->print_colors = print_colors;
+}
+
 static void
 print_dword_header(FILE *outfile,
                    struct gen_field_iterator *iter,
-- 
2.15.0.rc2
[Mesa-dev] [PATCH 04/33] intel: common: print out all dword with field spanning multiple dwords
For example, we were skipping Dword 3 in this PIPE_CONTROL :

0x000ce130:  0x7a000004:  PIPE_CONTROL
    DWord Length: 4
0x000ce134:  0x00000010 : Dword 1
    Flush LLC: false
    Destination Address Type: 0 (PPGTT)
    LRI Post Sync Operation: 0 (No LRI Operation)
    Store Data Index: 0
    Command Streamer Stall Enable: false
    Global Snapshot Count Reset: false
    TLB Invalidate: false
    Generic Media State Clear: false
    Post Sync Operation: 0 (No Write)
    Depth Stall Enable: false
    Render Target Cache Flush Enable: false
    Instruction Cache Invalidate Enable: false
    Texture Cache Invalidation Enable: false
    Indirect State Pointers Disable: false
    Notify Enable: false
    Pipe Control Flush Enable: false
    DC Flush Enable: false
    VF Cache Invalidation Enable: true
    Constant Cache Invalidation Enable: false
    State Cache Invalidation Enable: false
    Stall At Pixel Scoreboard: false
    Depth Cache Flush Enable: false
0x000ce138:  0x00000000 : Dword 2
    Address: 0x00000000
0x000ce140:  0x00000000 : Dword 4
    Immediate Data: 0

Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index f7455507abd..040201541ff 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -939,10 +939,11 @@ gen_field_iterator_next(struct gen_field_iterator *iter)

 static void
 print_dword_header(FILE *outfile,
-                   struct gen_field_iterator *iter, uint64_t offset)
+                   struct gen_field_iterator *iter,
+                   uint64_t offset, uint32_t dword)
 {
    fprintf(outfile, "0x%08"PRIx64": 0x%08x : Dword %d\n",
-           offset + 4 * iter->dword, iter->p[iter->dword], iter->dword);
+           offset + 4 * dword, iter->p[dword], dword);
 }

 static bool
@@ -964,12 +965,13 @@ gen_print_group(FILE *outfile, struct gen_group *group,
                 uint64_t offset, const uint32_t *p, bool color)
 {
    struct gen_field_iterator iter;
-   int last_dword = 0;
+   int last_dword = -1;

    gen_field_iterator_init(&iter, group, p, color);
    while (gen_field_iterator_next(&iter)) {
       if (last_dword != iter.dword) {
-         print_dword_header(outfile, &iter, offset);
+         for (int i = last_dword + 1; i <= iter.dword; i++)
+            print_dword_header(outfile, &iter, offset, i);
          last_dword = iter.dword;
       }
       if (!is_header_field(group, iter.field)) {
-- 
2.15.0.rc2
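The loop introduced by this patch can be tried in isolation. What follows is only a minimal sketch, assuming a bare array of dwords instead of a struct gen_field_iterator; the helper name print_dword_headers_upto() and its return value are hypothetical and do not exist in gen_decoder.c.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of gen_decoder.c: print one header line for
 * every dword after last_dword, up to and including dword, so that dwords
 * covered only by the tail of a multi-dword field are not skipped.
 * Returns the number of header lines written. */
static int
print_dword_headers_upto(FILE *out, uint64_t offset, const uint32_t *p,
                         int last_dword, int dword)
{
   int printed = 0;

   assert(out != NULL && p != NULL);

   for (int i = last_dword + 1; i <= dword; i++) {
      fprintf(out, "0x%08" PRIx64 ":  0x%08" PRIx32 " : Dword %d\n",
              offset + 4 * (uint64_t) i, p[i], i);
      printed++;
   }

   return printed;
}
```

Starting last_dword at -1, as the patch does, means Dword 0 goes through the same path as every other dword. A jump from dword 2 to dword 4, as in the PIPE_CONTROL above, then emits headers for both 3 and 4.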
[Mesa-dev] [PATCH 03/33] intel: decoder: build sorted linked lists of fields
The xml files don't always have fields in order. This might confuse
our parsing of the commands. Let's have the fields in order. To do
this, the easiest way is to use a linked list. It also helps a bit
with the iterator.

Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.c | 52 +-
 src/intel/common/gen_decoder.h | 7 +++---
 2 files changed, 34 insertions(+), 25 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 55e7305117c..f7455507abd 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -56,6 +56,9 @@ struct parser_context {
    int nvalues;
    struct gen_value *values[256];

+   struct gen_field *fields[256];
+   int nfields;
+
    struct gen_spec *spec;
 };

@@ -359,19 +362,25 @@ create_value(struct parser_context *ctx, const char **atts)
    return value;
 }

-static void
+static struct gen_field *
 create_and_append_field(struct parser_context *ctx,
                         const char **atts)
 {
-   if (ctx->group->nfields == ctx->group->fields_size) {
-      ctx->group->fields_size = MAX2(ctx->group->fields_size * 2, 2);
-      ctx->group->fields =
-         (struct gen_field **) realloc(ctx->group->fields,
-                                       sizeof(ctx->group->fields[0]) *
-                                       ctx->group->fields_size);
+   struct gen_field *field = create_field(ctx, atts);
+   struct gen_field *prev = NULL, *list = ctx->group->fields;
+
+   while (list && field->start > list->start) {
+      prev = list;
+      list = list->next;
    }

-   ctx->group->fields[ctx->group->nfields++] = create_field(ctx, atts);
+   field->next = list;
+   if (prev == NULL)
+      ctx->group->fields = field;
+   else
+      prev->next = field;
+
+   return field;
 }

 static void
@@ -421,7 +430,7 @@ start_element(void *data, const char *element_name, const char **atts)
          previous_group->next = group;
       ctx->group = group;
    } else if (strcmp(element_name, "field") == 0) {
-      create_and_append_field(ctx, atts);
+      ctx->fields[ctx->nfields++] = create_and_append_field(ctx, atts);
    } else if (strcmp(element_name, "enum") == 0) {
       ctx->enoom = create_enum(ctx, name, atts);
    } else if (strcmp(element_name, "value") == 0) {
@@ -441,18 +450,17 @@ end_element(void *data, const char *name)
        strcmp(name, "struct") == 0 ||
        strcmp(name, "register") == 0) {
       struct gen_group *group = ctx->group;
+      struct gen_field *list = group->fields;

       ctx->group = ctx->group->parent;

-      for (int i = 0; i < group->nfields; i++) {
-         if (group->fields[i]->start >= 16 &&
-             group->fields[i]->end <= 31 &&
-             group->fields[i]->has_default) {
+      while (list && list->end <= 31) {
+         if (list->start >= 16 && list->has_default) {
             group->opcode_mask |=
-               mask(group->fields[i]->start % 32, group->fields[i]->end % 32);
-            group->opcode |=
-               group->fields[i]->default_value << group->fields[i]->start;
+               mask(list->start % 32, list->end % 32);
+            group->opcode |= list->default_value << list->start;
          }
+         list = list->next;
       }

       if (strcmp(name, "instruction") == 0)
@@ -468,9 +476,10 @@ end_element(void *data, const char *name)
    } else if (strcmp(name, "group") == 0) {
       ctx->group = ctx->group->parent;
    } else if (strcmp(name, "field") == 0) {
-      assert(ctx->group->nfields > 0);
-      struct gen_field *field = ctx->group->fields[ctx->group->nfields - 1];
+      struct gen_field *field = ctx->fields[ctx->nfields - 1];
       size_t size = ctx->nvalues * sizeof(ctx->values[0]);
+      ctx->nfields--;
+      assert(ctx->nfields >= 0);
       field->inline_enum.values = xzalloc(size);
       field->inline_enum.nvalues = ctx->nvalues;
       memcpy(field->inline_enum.values, ctx->values, size);
@@ -758,6 +767,7 @@ gen_field_iterator_init(struct gen_field_iterator *iter,
    memset(iter, 0, sizeof(*iter));

    iter->group = group;
+   iter->field = group->fields;
    iter->p = p;
    iter->print_colors = print_colors;
 }
@@ -776,7 +786,7 @@ gen_get_enum_name(struct gen_enum *e, uint64_t value)
 static bool
 iter_more_fields(const struct gen_field_iterator *iter)
 {
-   return iter->field_iter < iter->group->nfields;
+   return iter->field != NULL && iter->field->next != NULL;
 }

 static uint32_t
@@ -812,7 +822,7 @@ iter_advance_group(struct gen_field_iterator *iter)
       }
    }

-   iter->field_iter = 0;
+   iter->field = iter->group->fields;
 }

 static bool
@@ -825,7 +835,7 @@ iter_advance_field(struct gen_field_iterator *iter)
       iter_advance_group(iter);
    }

-   iter->field = iter->group->fields[iter->field_iter++];
+   iter->field = iter->field->next;
    if (iter->field->name)
       strncpy(iter->name,
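The sorted insertion done by create_and_append_field() in this patch can be sketched on its own. The version below is a minimal sketch under stated assumptions: struct field is a hypothetical stand-in for struct gen_field with only the two members the walk needs, and insert_sorted() is an illustrative name, not a function from the decoder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct gen_field: only the members used here. */
struct field {
   int start;            /* first bit covered by the field */
   struct field *next;   /* next field, in ascending start order */
};

/* Insert field into the list at *head, keeping the list sorted by start.
 * The walk uses a strict '>' comparison, as the patch does, so a field
 * whose start equals an existing one is placed before it. */
static void
insert_sorted(struct field **head, struct field *field)
{
   struct field *prev = NULL, *list = *head;

   assert(head != NULL && field != NULL);

   /* Skip every node that starts before the new field. */
   while (list && field->start > list->start) {
      prev = list;
      list = list->next;
   }

   /* Splice the new field in between prev and list. */
   field->next = list;
   if (prev == NULL)
      *head = field;
   else
      prev->next = field;
}
```

Each insertion walks the list from its head, so building a group of n fields costs O(n^2) comparisons in the worst case. That is probably an acceptable trade for the small per-group field counts involved here, and it removes the realloc-based array growth.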
[Mesa-dev] [PATCH 01/33] intel: common: expose gen_spec fields
Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.c | 13 -------------
 src/intel/common/gen_decoder.h | 13 +++++++++++++
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 85880143f00..395ff02908a 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -39,19 +39,6 @@

 #define XML_BUFFER_SIZE 4096

-struct gen_spec {
-   uint32_t gen;
-
-   int ncommands;
-   struct gen_group *commands[256];
-   int nstructs;
-   struct gen_group *structs[256];
-   int nregisters;
-   struct gen_group *registers[256];
-   int nenums;
-   struct gen_enum *enums[256];
-};
-
 struct location {
    const char *filename;
    int line_number;
diff --git a/src/intel/common/gen_decoder.h b/src/intel/common/gen_decoder.h
index cfc9f2e3f15..9c8fce2baa2 100644
--- a/src/intel/common/gen_decoder.h
+++ b/src/intel/common/gen_decoder.h
@@ -66,6 +66,19 @@ struct gen_field_iterator {
    bool print_colors;
 };

+struct gen_spec {
+   uint32_t gen;
+
+   uint32_t ncommands;
+   struct gen_group *commands[256];
+   uint32_t nstructs;
+   struct gen_group *structs[256];
+   uint32_t nregisters;
+   struct gen_group *registers[256];
+   uint32_t nenums;
+   struct gen_enum *enums[256];
+};
+
 struct gen_group {
    struct gen_spec *spec;
    char *name;
-- 
2.15.0.rc2
[Mesa-dev] [PATCH 08/33] intel: decoder: don't read qword outside instruction/struct limit
We used to print invalid data when the last field was being clamped to
32bits due to Dword Length of the whole instruction.

Here is an example where the decoder read part of the next instruction
instead of stopping at the 32bit limit:

0x000ce0b4:  0x10000002:  MI_STORE_DATA_IMM
0x000ce0b4:  0x10000002 : Dword 0
    DWord Length: 2
    Store Qword: 0
    Use Global GTT: false
0x000ce0b8:  0x00045010 : Dword 1
    Core Mode Enable: 0
    Address: 0x00045010
0x000ce0bc:  0x00000000 : Dword 2
0x000ce0c0:  0x00000000 : Dword 3
    Immediate Data: 8791026489807077376

With this change we have the proper value :

0x000ce0b4:  0x10000002:  MI_STORE_DATA_IMM (4 Dwords)
0x000ce0b4:  0x10000002 : Dword 0
    DWord Length: 2
    Store Qword: 0
    Use Global GTT: false
0x000ce0b8:  0x00045010 : Dword 1
    Core Mode Enable: 0
    Address: 0x00045010
0x000ce0bc:  0x00000000 : Dword 2
0x000ce0c0:  0x00000000 : Dword 3
    Immediate Data: 0

Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.c | 11 ++++++++---
 src/intel/common/gen_decoder.h | 1 +
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 4ce3a577f96..c1affc47a02 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -842,9 +842,13 @@ gen_field_decode(struct gen_field_iterator *iter)
    else
       memset(iter->name, 0, sizeof(iter->name));

-   if ((iter->field->end - iter->field->start) > 32)
-      v.qw = ((uint64_t) iter->p[iter->dword+1] << 32) | iter->p[iter->dword];
-   else
+   memset(&v, 0, sizeof(v));
+
+   if ((iter->field->end - iter->field->start) > 32) {
+      if (&iter->p[iter->dword + 1] < iter->end)
+         v.qw = ((uint64_t) iter->p[iter->dword+1] << 32);
+      v.qw |= iter->p[iter->dword];
+   } else
       v.qw = iter->p[iter->dword];

    const char *enum_name = NULL;
@@ -933,6 +937,7 @@ gen_field_iterator_init(struct gen_field_iterator *iter,
    else
       iter->field = group->next->fields;
    iter->p = p;
+   iter->end = &p[gen_group_get_length(iter->group, iter->p)];
    iter->print_colors = print_colors;

    gen_field_decode(iter);
diff --git a/src/intel/common/gen_decoder.h b/src/intel/common/gen_decoder.h
index b11927d2693..4d9edf78ff0 100644
--- a/src/intel/common/gen_decoder.h
+++ b/src/intel/common/gen_decoder.h
@@ -57,6 +57,7 @@ struct gen_field_iterator {
    char value[128];
    struct gen_group *struct_desc;
    const uint32_t *p;
+   const uint32_t *end;
    int dword; /**< current field starts at &p[dword] */
    int group_iter;

-- 
2.15.0.rc2
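The clamped read this patch adds can be exercised outside the decoder. This is only a sketch, assuming the instruction is a plain array of dwords with a one-past-the-end pointer; read_qword_clamped() is a hypothetical name and is not part of gen_decoder.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of gen_decoder.c: read the 64-bit value that
 * starts at p[dword], but only fetch the upper half if p[dword + 1] still
 * lies before end (one past the last dword of the instruction or struct).
 * When the upper dword is out of bounds the result is just the low 32 bits. */
static uint64_t
read_qword_clamped(const uint32_t *p, const uint32_t *end, int dword)
{
   uint64_t v = 0;

   /* The low dword itself must belong to the instruction. */
   assert(&p[dword] < end);

   if (&p[dword + 1] < end)
      v = (uint64_t) p[dword + 1] << 32;
   v |= p[dword];

   return v;
}
```

As a sanity check on the first example in the commit message, 8791026489807077376 equals 0x7a000004 shifted left by 32 bits. That is consistent with the decoder having picked up the first dword of whatever instruction followed as the upper half of Immediate Data.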
[Mesa-dev] [PATCH 00/33] intel: UI for aubinator
Hi all,

This is a proposal for a tool to help debug the Intel driver through
aubdumps. Having gone through implementing (& mostly debugging) the
ycbcr extension for anv, I wished I had a better tool than the text
output of aubinator.

This is the current state of about a month and a half of experimenting
with a UI toolkit called ImGui (for Immediate Mode GUI). It turned out
much better than the previous attempts I had made with an HTML UI.

Some of the commits in this series probably won't make it to the
mailing list. You can find the branch here :

https://github.com/djdeath/mesa/tree/aubinator-imgui

I've already sent some of the commits fixing bits of the decoder.
Others are refactoring/abstractions for allowing a UI that doesn't
print out on stdout.

Some screenshots :

https://i.imgur.com/0JTLkTo.png
https://i.imgur.com/ABq31XD.png

Hopefully people find this interesting.

Cheers,

Lionel Landwerlin (33):
  intel: common: expose gen_spec fields
  intel: common: silence compiler warning
  intel: decoder: build sorted linked lists of fields
  intel: common: print out all dword with field spanning multiple dwords
  intel: decoder: reorder iterator init function
  intel: decoder: move field name copy
  intel: decoder: split out getting the next field and decoding it
  intel: decoder: don't read qword outside instruction/struct limit
  aubinator: print number of dwords per instruction
  intel: decoder: expose helper to test header fields
  intel: decoder: add destructor for gen_spec
  intel: decoder: simplify creation of struct when 0-allocated
  intel: decoder: pack iterator variable declarations
  intel: decoder: extract instruction/structs length
  intel: error-decode: implement a rolling window of programs
  intel: decoder: remove unused platform field
  intel: common: make intel utils available from C++
  intel: decoder: simplify field_is_header()
  intel: decoder: rename internal function to free name
  intel: decoder: rename field() to field_value()
  intel: decoder: extract field value computation
  intel: decoder: expose missing find_enum()
  intel: decoder: group enum related declarations
  intel: decoder: enable decoding a single field
  intel: compiler: abstract printing
  intel: genxml: add blitter instructions for gen6->10
  intel: genxml: be consistent about register offset naming
  intel: genxml: rename output urb offset field
  intel: decoder: change group_get_length() to take first dword
  intel: decoder: change find_instruction() to take first dword
  intel: decoder: decouple decoding from memory pointers
  intel: decoder: add function to query shader length
  intel: add aubinator ui

 configure.ac                                   |    16 +
 meson.build                                    |     5 +-
 meson_options.txt                              |     6 +
 src/intel/Makefile.tools.am                    |    58 +
 src/intel/common/gen_decoder.c                 |   519 +-
 src/intel/common/gen_decoder.h                 |    75 +-
 src/intel/common/gen_device_info.h             |     8 +
 src/intel/compiler/brw_compile_clip.c          |     5 +-
 src/intel/compiler/brw_compile_sf.c            |     5 +-
 src/intel/compiler/brw_disasm.c                |   645 +-
 src/intel/compiler/brw_eu.c                    |    39 -
 src/intel/compiler/brw_eu.h                    |    13 +-
 src/intel/compiler/brw_eu_compact.c            |     9 +-
 src/intel/compiler/intel_asm_annotation.c      |     5 +-
 src/intel/compiler/test_eu_compact.cpp         |     7 +-
 src/intel/genxml/gen10.xml                     |   782 +-
 src/intel/genxml/gen4.xml                      |     2 +-
 src/intel/genxml/gen45.xml                     |     2 +-
 src/intel/genxml/gen5.xml                      |     2 +-
 src/intel/genxml/gen6.xml                      |   748 +-
 src/intel/genxml/gen7.xml                      |   750 +-
 src/intel/genxml/gen75.xml                     |   754 +-
 src/intel/genxml/gen8.xml                      |   739 +-
 src/intel/genxml/gen9.xml                      |   782 +-
 src/intel/tools/.gitignore                     |     2 +
 src/intel/tools/aubinator.c                    |    30 +-
 src/intel/tools/aubinator_error_decode.c       |    28 +-
 src/intel/tools/aubinator_imgui_widgets.cpp    |   183 +
 src/intel/tools/aubinator_imgui_widgets.h      |    12 +
 src/intel/tools/aubinator_ui.cpp               |  3116 +++
 src/intel/tools/disasm.c                       |    97 +-
 src/intel/tools/gen_disasm.h                   |    17 +-
 src/intel/tools/imgui/LICENSE.txt              |    21 +
 src/intel/tools/imgui/imconfig.h               |    57 +
 src/intel/tools/imgui/imgui.cpp                | 10725 +++
 src/intel/tools/imgui/imgui.h                  |  1516
 src/intel/tools/imgui/imgui_demo.cpp           |  2827 ++
 src/intel/tools/imgui/imgui_draw.cpp           |  2673 ++
 src/intel/tools/imgui/imgui_impl_gtk3_cogl.cpp |   784 ++
[Mesa-dev] [PATCH 02/33] intel: common: silence compiler warning
Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index 395ff02908a..55e7305117c 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -556,7 +556,7 @@ gen_spec_load(const struct gen_device_info *devinfo)
 {
    struct parser_context ctx;
    void *buf;
-   uint8_t *text_data;
+   uint8_t *text_data = NULL;
    uint32_t text_offset = 0, text_length = 0, total_length;
    uint32_t gen_10 = devinfo_to_gen(devinfo);

-- 
2.15.0.rc2
[Mesa-dev] [PATCH 06/33] intel: decoder: move field name copy
This should be inside the function that actually decodes fields.

Signed-off-by: Lionel Landwerlin
---
 src/intel/common/gen_decoder.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/intel/common/gen_decoder.c b/src/intel/common/gen_decoder.c
index ef39c1c14db..8336269b183 100644
--- a/src/intel/common/gen_decoder.c
+++ b/src/intel/common/gen_decoder.c
@@ -822,10 +822,6 @@ iter_advance_field(struct gen_field_iterator *iter)
    }

    iter->field = iter->field->next;
-   if (iter->field->name)
-      strncpy(iter->name, iter->field->name, sizeof(iter->name));
-   else
-      memset(iter->name, 0, sizeof(iter->name));
    iter->dword = iter_group_offset_bits(iter, iter->group_iter) / 32 +
       iter->field->start / 32;
    iter->struct_desc = NULL;
@@ -844,6 +840,11 @@ gen_field_iterator_next(struct gen_field_iterator *iter)
    if (!iter_advance_field(iter))
       return false;

+   if (iter->field->name)
+      strncpy(iter->name, iter->field->name, sizeof(iter->name));
+   else
+      memset(iter->name, 0, sizeof(iter->name));
+
    if ((iter->field->end - iter->field->start) > 32)
       v.qw = ((uint64_t) iter->p[iter->dword+1] << 32) | iter->p[iter->dword];
    else
-- 
2.15.0.rc2
Re: [Mesa-dev] [Mesa-stable] [PATCH 2/2] radv: Disallow indirect outputs for GS on GFX9 as well.
On Fri, 2017-10-27 at 19:50 +0200, Bas Nieuwenhuizen wrote:
> On Fri, Oct 27, 2017 at 5:03 PM, Andres Gomez wrote:
[...]
> > In any case, I was wondering whether it would be interesting to bring
> > them both to the 17.2 stable queue and whether we would also want
> > Timothy's preceding patch:
> >
> > 087e010b2b3dd83a539f97203909d6c43b5da87c radv: copy indirect lowering
> > settings from radeonsi
>
> Yes, that would be reasonable. Timothy's patch is an optimization
> though, so I'd be happy to send a backport that only generates the
> variable needed for the other two if you'd prefer that.

That would be great.

Thanks, Bas! ☺

-- 
Br,

Andres