Re: [Mesa-dev] [PATCH v2 31/34] i965/state: Account for the element size in emit_buffer_surface_state
On Tue, Jun 28, 2016 at 09:22:49AM +0300, Pohjolainen, Topi wrote:
> On Thu, Jun 23, 2016 at 02:00:30PM -0700, Jason Ekstrand wrote:
> > ---
> >  src/mesa/drivers/dri/i965/brw_wm_surface_state.c  | 11 ++-
> >  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 9 +
> >  src/mesa/drivers/dri/i965/gen8_surface_state.c    | 9 +
> >  3 files changed, 16 insertions(+), 13 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > index 944d64d..29b8976 100644
> > --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > @@ -496,6 +496,7 @@ gen4_emit_buffer_surface_state(struct brw_context *brw,
> >                                 unsigned pitch,
> >                                 bool rw)
> >  {
> > +   unsigned elements = buffer_size / pitch;
>
> Could be const as well as in the two other occurrences further down.

Otherwise patches 31-34 are also:

Reviewed-by: Topi Pohjolainen
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
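For illustration only, a minimal sketch of what the review comment asks for: the element count is derived once from the buffer size and the per-element pitch and never modified afterwards, so it can be const. The helper name buffer_element_count() is hypothetical and is not part of the i965 driver.

```c
#include <assert.h>

/* Hypothetical helper (not in the driver): number of whole elements of
 * `pitch` bytes that fit into a buffer of `buffer_size` bytes. */
static unsigned
buffer_element_count(unsigned buffer_size, unsigned pitch)
{
   /* A zero pitch would divide by zero, so reject it up front. */
   assert(pitch != 0);

   /* const, as suggested in the review: computed once, never changed. */
   const unsigned elements = buffer_size / pitch;

   return elements;
}
```

Integer division drops any trailing partial element, which is presumably the intent when the surface state is programmed in units of elements.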
Re: [Mesa-dev] [PATCH 1/2] glsl: add driconf to zero-init unintialized vars
Rob Clark writes:

> On Tue, Jun 28, 2016 at 11:28 AM, Marek Olšák wrote:
>> On Mon, Jun 27, 2016 at 9:28 PM, Rob Clark wrote:
>>> On Mon, Jun 27, 2016 at 3:06 PM, Kenneth Graunke wrote:
>>>> On Monday, June 27, 2016 11:43:28 AM PDT Matt Turner wrote:
>>>> > On Mon, Jun 27, 2016 at 4:44 AM, Rob Clark wrote:
>>>> > > On Mon, Jun 27, 2016 at 7:13 AM, Alan Swanson wrote:
>>>> > >> On 2016-06-25 13:37, Rob Clark wrote:
>>>> > >>>
>>>> > >>> Some games are sloppy.. perhaps because it is defined behavior for DX or
>>>> > >>> perhaps because nv blob driver defaults things to zero.
>>>> > >>>
>>>> > >>> So add driconf param to force uninitialized variables to default to
>>>> > >>> zero.
>>>> > >>>
>>>> > >>> This issue was observed with rust, from steam store. But has surfaced
>>>> > >>> elsewhere in the past.
>>>> > >>>
>>>> > >>> Signed-off-by: Rob Clark
>>>> > >>> ---
>>>> > >>> Note that I left out the drirc bit, since not entirely sure how to
>>>> > >>> identify this game. (I don't actually have the game, just working off
>>>> > >>> of an apitrace)
>>>> > >>>
>>>> > >>> Possibly worth mentioning that for the shaders using uninitialized vars
>>>> > >>> having zero-initializers lets constant-propagation get rid of a whole
>>>> > >>> lot of instructions. One shader I saw dropped to less than half of
>>>> > >>> its original instruction count.
>>>> > >>
>>>> > >>
>>>> > >> If the default for uninitialised variables is undefined, then with the
>>>> > >> reported shader optimisations why bother with the (DRI) option when
>>>> > >> zeroing could still essentially be classed as undefined?
>>>> > >>
>>>> > >> Cuts the patch down to just the src/compiler/glsl/ast_to_hir.cpp
>>>> > >> change.
>>>> > >
>>>> > > I did suggest that on #dri-devel, but Jason had a theoretical example
>>>> > > where it would hurt.. iirc something like:
>>>> > >
>>>> > >   float maybe_undef;
>>>> > >   for (int i = 0; i < some_uniform_at_least_one; i++)
>>>> > >      maybe_undef = ...
>>>> > >
>>>> > > also, he didn't want to hide shader bugs that app should fix.
>>>> > >
>>>> > > It would be interesting to run shaderdb w/ glsl_zero_init=true and
>>>> > > see what happens, but I didn't get around to that yet.
>>>> >
>>>> > Here's what I get on i965. It's not a clear win.
>>>> >
>>>> > total instructions in shared programs: 5249030 -> 5249002 (-0.00%)
>>>> > instructions in affected programs: 28936 -> 28908 (-0.10%)
>>>> > helped: 66
>>>> > HURT: 132
>>>> >
>>>> > total cycles in shared programs: 57966694 -> 57956306 (-0.02%)
>>>> > cycles in affected programs: 1136118 -> 1125730 (-0.91%)
>>>> > helped: 78
>>>> > HURT: 106
>>>>
>>>> I suspect most of the help is because we're missing undef optimizations,
>>>> such as CSE...while zero could be CSE'd. (I have a patch, but it hurts
>>>> things too...)
>>>
>>> right, I was thinking that treating undef as zero in constant-folding
>>> would have the same effect.. ofc it might make shader bugs less
>>> obvious.
>>>
>>> Btw, does anyone know what fglrx does? Afaiu nv blob treats undef as
>>> zero. If fglrx does the same, I suppose that strengthens the argument
>>> for "just do this unconditionally".
>>
>> No idea what fglrx does, but LLVM does eliminate code with undefined
>> inputs. Initializing everything to 0 might make that worse.
>
> hmm, treating as zero does eliminate a lot.. anyway, I guess we'll
> stick w/ driconf.
>
> fwiw, with some help from the reporter, we figured out that this is
> the bit that I need to squash into drirc:
>
>
>
>

Not knowing a lot about drirc, I suspect you should have a double quote
at the end of glsl_zero_init as well?

eirik

> now, if I could talk somebody into a r-b for this and the i965 fix? ;-)
>
> BR,
> -R
[Mesa-dev] [PATCH 2/2] st/mesa: get max supported number of image samples from driver
Signed-off-by: Ilia Mirkin
---
 src/mesa/state_tracker/st_extensions.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index b87c9ce..b55b2c2 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -443,7 +443,6 @@ void st_init_limits(struct pipe_screen *screen,
       c->Program[MESA_SHADER_COMPUTE].MaxImageUniforms;
    c->MaxCombinedShaderOutputResources += c->MaxCombinedImageUniforms;
    c->MaxImageUnits = MAX_IMAGE_UNITS;
-   c->MaxImageSamples = 0; /* XXX */
    if (c->MaxCombinedImageUniforms) {
       extensions->ARB_shader_image_load_store = GL_TRUE;
       extensions->ARB_shader_image_size = GL_TRUE;
@@ -988,6 +987,11 @@ void st_init_extensions(struct pipe_screen *screen,
                                   color_formats, 16,
                                   PIPE_BIND_RENDER_TARGET);
 
+   consts->MaxImageSamples =
+      get_max_samples_for_formats(screen, ARRAY_SIZE(color_formats),
+                                  color_formats, 16,
+                                  PIPE_BIND_SHADER_IMAGE);
+
    consts->MaxColorTextureSamples =
       get_max_samples_for_formats(screen, ARRAY_SIZE(color_formats),
                                   color_formats, consts->MaxSamples,
-- 
2.7.3
[Mesa-dev] [PATCH 1/2] nvc0: fix up image support for allowing multiple samples
Basically we just have to scale up the coordinates and then add the
relevant sample offset. The code to handle this was already largely
present from Christoph's earlier attempts to pipe images through back
in the dark ages, this just hooks it all up.

Signed-off-by: Ilia Mirkin
---

Only tested on GK208... probably would be good for someone on GF1xx to
give it a shot.

 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 3 +++
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp      | 4 
 src/gallium/drivers/nouveau/nvc0/nvc0_compute.c    | 24 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_context.h    | 2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c    | 20 --
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c     | 20 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_tex.c        | 6 --
 7 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 0fa5aa1..f3d7bee 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -2388,6 +2388,9 @@ Converter::getImageCoords(std::vector<Value *> &coords, int r, int s)
 
    for (int c = 0; c < arg; ++c)
       coords.push_back(fetchSrc(s, c));
+
+   if (t.isMS())
+      coords.push_back(fetchSrc(s, 3));
 }
 
 // For raw loads, granularity is 4 byte.
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index 67bd73b..73b680a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -2044,6 +2044,10 @@ NVC0LoweringPass::processSurfaceCoordsNVC0(TexInstruction *su)
    Value *v;
    Value *ind = NULL;
 
+   bld.setPosition(su, false);
+
+   adjustCoordinatesMS(su);
+
    if (su->tex.rIndirectSrc >= 0) {
       ind = su->getIndirectR();
       if (su->tex.r > 0) {
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
index 7511819..f5f7fd4 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
@@ -113,6 +113,30 @@ nvc0_screen_compute_setup(struct nvc0_screen *screen,
    PUSH_DATA (push, screen->txc->offset + 65536);
    PUSH_DATA (push, NVC0_TSC_MAX_ENTRIES - 1);
 
+   /* MS sample coordinate offsets */
+   BEGIN_NVC0(push, NVC0_CP(CB_SIZE), 3);
+   PUSH_DATA (push, 2048);
+   PUSH_DATAh(push, screen->uniform_bo->offset + NVC0_CB_AUX_INFO(5));
+   PUSH_DATA (push, screen->uniform_bo->offset + NVC0_CB_AUX_INFO(5));
+   BEGIN_1IC0(push, NVC0_CP(CB_POS), 1 + 2 * 8);
+   PUSH_DATA (push, NVC0_CB_AUX_MS_INFO);
+   PUSH_DATA (push, 0); /* 0 */
+   PUSH_DATA (push, 0);
+   PUSH_DATA (push, 1); /* 1 */
+   PUSH_DATA (push, 0);
+   PUSH_DATA (push, 0); /* 2 */
+   PUSH_DATA (push, 1);
+   PUSH_DATA (push, 1); /* 3 */
+   PUSH_DATA (push, 1);
+   PUSH_DATA (push, 2); /* 4 */
+   PUSH_DATA (push, 0);
+   PUSH_DATA (push, 3); /* 5 */
+   PUSH_DATA (push, 0);
+   PUSH_DATA (push, 2); /* 6 */
+   PUSH_DATA (push, 1);
+   PUSH_DATA (push, 3); /* 7 */
+   PUSH_DATA (push, 1);
+
    return 0;
 }
 
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
index 8a965fc..c633ccf 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
@@ -112,7 +112,7 @@
 #define NVC0_CB_AUX_TEX_INFO(i) 0x020 + (i) * 4
 #define NVC0_CB_AUX_TEX_SIZE (32 * 4)
 /* 8 sets of 32-bits coordinate offsets */
-#define NVC0_CB_AUX_MS_INFO 0x0a0 /* CP */
+#define NVC0_CB_AUX_MS_INFO 0x0a0
 #define NVC0_CB_AUX_MS_SIZE (8 * 2 * 4)
 /* block/grid size, at 3 32-bits integers each and gridid */
 #define NVC0_CB_AUX_GRID_INFO 0x0e0 /* CP */
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index aba9511..d75b702 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
@@ -555,29 +555,25 @@ nvc0_program_translate(struct nvc0_program *prog, uint16_t chipset,
    info->io.genUserClip = prog->vp.num_ucps;
    info->io.auxCBSlot = 15;
+   info->io.msInfoCBSlot = 15;
    info->io.ucpBase = NVC0_CB_AUX_UCP_INFO;
    info->io.drawInfoBase = NVC0_CB_AUX_DRAW_INFO;
+   info->io.msInfoBase = NVC0_CB_AUX_MS_INFO;
+   info->io.bufInfoBase = NVC0_CB_AUX_BUF_INFO(0);
+   info->io.suInfoBase = NVC0_CB_AUX_SU_INFO(0);
+   if (chipset >= NVISA_GK104_CHIPSET) {
+      info->io.texBindBase = NVC0_CB_AUX_TEX_INFO(0);
+   }
 
    if (prog->type == PIPE_SHADER_COMPUTE) {
       if (chipset >= NVISA_GK104_CHIPSET) {
          info->io.auxCBSlot = 7;
-         info->io.texBindBase = NVC0_CB_AUX_TEX_INFO(0);
+
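To make the "scale up the coordinates and then add the relevant sample offset" step concrete, here is a rough sketch in plain C rather than the codegen IR. The offset table is copied from the values the patch uploads at NVC0_CB_AUX_MS_INFO. The helper name ms_adjust_coords() and the per-axis shift parameters are assumptions made for illustration; in particular, the mapping from sample count to shifts (for example 4x as 1,1 and 8x as 2,1) is assumed, not taken from the patch.

```c
#include <assert.h>

/* Per-sample (x, y) offsets, in the order pushed by the patch. */
static const unsigned ms_offsets[8][2] = {
   {0, 0}, {1, 0}, {0, 1}, {1, 1},
   {2, 0}, {3, 0}, {2, 1}, {3, 1},
};

/* Hypothetical helper: turn a pixel coordinate plus sample index into a
 * coordinate in the scaled-up surface. log2_ms_x and log2_ms_y are the
 * assumed per-axis scale factors, expressed as shifts. */
static void
ms_adjust_coords(unsigned x, unsigned y, unsigned sample,
                 unsigned log2_ms_x, unsigned log2_ms_y,
                 unsigned *out_x, unsigned *out_y)
{
   assert(sample < 8);

   /* Scale the pixel coordinate, then step to the requested sample. */
   *out_x = (x << log2_ms_x) + ms_offsets[sample][0];
   *out_y = (y << log2_ms_y) + ms_offsets[sample][1];
}
```

With shifts of 1 in both axes, pixel (5, 7) sample 3 lands at (11, 15): each pixel becomes a 2x2 block and sample 3 is its bottom-right entry.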
[Mesa-dev] [PATCH v2] gallium: Force blend color to 16-byte alignment
This aligns the 4-element color float array to 16 byte boundaries. This
should allow compiler vectorizers to generate better optimizations.
Also fixes broken vectorization generated by Intel compiler.

v2: Fixed indentation and added a lengthy comment explaining the reason
for the alignment.

Reported-by: Tim Rowley
Tested-by: Tim Rowley
Signed-off-by: Chuck Atkins
Acked-by: Roland Scheidegger
---
 src/gallium/include/pipe/p_state.h | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/gallium/include/pipe/p_state.h b/src/gallium/include/pipe/p_state.h
index 1543e90..5526c39 100644
--- a/src/gallium/include/pipe/p_state.h
+++ b/src/gallium/include/pipe/p_state.h
@@ -326,7 +326,17 @@ struct pipe_blend_state
 
 struct pipe_blend_color
 {
-   float color[4];
+   /**
+    * Making the color array explicitly 16-byte aligned provides a hint to
+    * compilers to make more efficient auto-vectorization optimizations.
+    * The actual performance gains from vectorizing the blend color array are
+    * fairly minimal, if any, but the alignment is necessary to work around
+    * buggy vectorization in some compilers which fail to generate the correct
+    * unaligned accessors resulting in a segfault. Specifically several
+    * versions of the Intel compiler are known to be affected but it's likely
+    * others are as well.
+    */
+   PIPE_ALIGN_VAR(16) float color[4];
 };
 
-- 
2.5.5
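As a rough illustration of what the alignment buys, the sketch below uses the standard C11 alignas specifier as a stand-in for Mesa's PIPE_ALIGN_VAR macro (that substitution is an assumption made here so the example builds on its own). The struct and helper names are hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>

/* Stand-in for struct pipe_blend_color; C11 alignas replaces
 * PIPE_ALIGN_VAR(16) purely for this self-contained sketch. */
struct blend_color_sketch {
   alignas(16) float color[4];
};

/* Returns nonzero when `p` sits on a 16-byte boundary, which is what
 * aligned SIMD loads and stores require. */
static int
is_16_byte_aligned(const void *p)
{
   return ((uintptr_t)p % 16) == 0;
}
```

Because the member is over-aligned, the struct itself takes on 16-byte alignment, so any instance (on the stack or embedded in another struct) keeps color[] on a boundary that aligned vector accesses can use.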
Re: [Mesa-dev] [PATCH] gallium: Force blend color to 16-byte alignment
Really it's a workaround to fix bad vectorization in the Intel compiler,
but it doesn't hurt for other compilers, even if the performance
difference is marginal if at all, and could only help. If it was
problematic otherwise I'd guard it with an #ifdef __INTEL_COMPILER. I
can update the patch with a comment explaining why it's there in case
other developers stumble on it and think "wtf".

On Jun 28, 2016 6:38 PM, "Roland Scheidegger" wrote:

On 28.06.2016 at 22:45, Chuck Atkins wrote:
> This aligns the 4-element color float array to 16 byte boundaries. This
> should allow compiler vectorizers to generate better optimizations.
> Also fixes broken vectorization generated by Intel compiler.
>
> Reported-by: Tim Rowley
> Signed-off-by: Chuck Atkins
> ---
>  src/gallium/include/pipe/p_state.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/include/pipe/p_state.h b/src/gallium/include/pipe/p_state.h
> index 1543e90..95f140f 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -326,7 +326,7 @@ struct pipe_blend_state
>
>  struct pipe_blend_color
>  {
> -   float color[4];
> +   PIPE_ALIGN_VAR(16) float color[4];
>  };
>

I'm wondering if that's really needed. I have a difficult time to
imagine setting blend color is performance critical. And driver
internal you can obviously still align pipe_blend_color structs
yourself. But OTOH, why not...

Acked-by: Roland Scheidegger
Re: [Mesa-dev] [Intel-gfx] [PATCH 2/2] drm/i915: Removing PCI IDs that are no longer listed as Kabylake.
On Thu, 2016-06-23 at 14:50 -0700, Rodrigo Vivi wrote:
> -	INTEL_VGA_DEVICE(0x5932, info), /* DT GT4 */ \
> -	INTEL_VGA_DEVICE(0x593B, info), /* Halo GT4 */ \
> -	INTEL_VGA_DEVICE(0x593A, info), /* SRV GT4 */ \
> -	INTEL_VGA_DEVICE(0x593D, info) /* WKS GT4

Reviewed-by: Dhinakaran Pandiyan
Re: [Mesa-dev] V3 On disk shader cache for i965 (Now with real world results!)
On Tue, Jun 28, 2016 at 10:53 AM, Timothy Arceri wrote:
> On Mon, 2016-06-27 at 00:46 +1000, Timothy Arceri wrote:
>> On Sun, 2016-06-26 at 16:15 +0300, Grazvydas Ignotas wrote:
>> > Tried this while playing with apitrace and am getting segfaults when
>> > running any trace with a cached (second) run. Not sure if it's "wrong"
>> > traces I've chosen or what, you can take one example from this bug:
>> > https://bugs.freedesktop.org/show_bug.cgi?id=96425
>>
>> Thanks for testing, I'll take a look tomorrow.
>
> The problem is the shaders were being detached after linking so we had
> nothing to fall back to if we had a shader cache miss.
> I've hacked something up and pushed it to the shader-cache19 branch
> that allows the trace to run. Not sure how it relates to real game
> performance but the trace goes from 5FPS to 7FPS on the second run on
> my machine with which looks good :)

Seems to work now and makes things a good deal faster. Nice!
However I have a case of one trace's cache seemingly affecting another
trace, are you interested in that? One of them (the one that gets
broken) is from this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=92229
Unfortunately the other "bad" one is my own and is over a gigabyte
(even compressed), I'll need to trim it I guess.

>> > It would also be a good idea to hide the cache debug messages behind
>> > some env var, or at least send them to stderr and not stdout, as
>> > stdout breaks programs that pipe data through stdout like
>> > qapitrace.
>>
>> Right, that's my next task, I should get this done tomorrow also. As
>> stated below :) "For now I have left in some printf's as the feature is
>> still disabled by default and they are useful for debugging. I intend
>> to fix this soon to hide them behind an environment var."

Yes I have read that (even used your wording in my comment), but
somehow managed to forget it while testing, sorry.

Gražvydas
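A minimal sketch of the request above: debug output is gated on an environment variable and goes to stderr, so tools that read a program's stdout are not disturbed. The variable name SHADER_CACHE_DEBUG_SKETCH and all function names are hypothetical; the thread does not say which name the branch ended up using.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, chosen only for this sketch. */
#define CACHE_DEBUG_ENV "SHADER_CACHE_DEBUG_SKETCH"

/* Pure helper: does this environment value switch debugging on?
 * Unset, empty and "0" all mean off; anything else means on. */
static int
cache_debug_value_enables(const char *value)
{
   return value != NULL && value[0] != '\0' && strcmp(value, "0") != 0;
}

/* Print a cache debug message to stderr, never stdout, and only when
 * the environment variable asks for it. */
static void
cache_debug(const char *msg)
{
   if (cache_debug_value_enables(getenv(CACHE_DEBUG_ENV)))
      fprintf(stderr, "shader cache: %s\n", msg);
}
```

Splitting the decision into a pure helper keeps the environment lookup in one place and makes the on/off rule easy to check in isolation.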
[Mesa-dev] [PATCH 09/16] svga: set render target flag for snorm surfaces
We don't normally support rendering to SNORM surfaces, but with
GL_ARB_copy_image we can copy to them if we treat them as typeless and
use a UNORM surface view.
---
 src/gallium/drivers/svga/svga_resource_texture.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/gallium/drivers/svga/svga_resource_texture.c b/src/gallium/drivers/svga/svga_resource_texture.c
index 14fe220..aa7724a 100644
--- a/src/gallium/drivers/svga/svga_resource_texture.c
+++ b/src/gallium/drivers/svga/svga_resource_texture.c
@@ -963,6 +963,16 @@ svga_texture_create(struct pipe_screen *screen,
                      svga_format_name(typeless), bindings);
          }
+
+         if (svga_format_is_uncompressed_snorm(tex->key.format)) {
+            /* We can't normally render to snorm surfaces, but once we
+             * substitute a typeless format, we can if the rendertarget view
+             * is unorm. This can happen with GL_ARB_copy_image.
+             */
+            tex->key.flags |= SVGA3D_SURFACE_HINT_RENDERTARGET;
+            tex->key.flags |= SVGA3D_SURFACE_BIND_RENDER_TARGET;
+         }
+
          tex->key.format = typeless;
       }
 
-- 
1.9.1
[Mesa-dev] [PATCH 10/16] svga: use vgpu10 CopyRegion command when possible
From: Neha Bhende

Do texture->texture copies host-side with this command when possible.
Use the previous software fallback otherwise.

Reviewed-by: Brian Paul
---
 src/gallium/drivers/svga/svga_pipe_blit.c | 149 +-
 1 file changed, 147 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_pipe_blit.c b/src/gallium/drivers/svga/svga_pipe_blit.c
index 4eec927..564af51 100644
--- a/src/gallium/drivers/svga/svga_pipe_blit.c
+++ b/src/gallium/drivers/svga/svga_pipe_blit.c
@@ -36,12 +36,76 @@
 #define FILE_DEBUG_FLAG DEBUG_BLIT
 
 
+/**
+ * Copy an image between textures with the vgpu10 CopyRegion command.
+ */
+static void
+copy_region_vgpu10(struct svga_context *svga, struct pipe_resource *src_tex,
+                   unsigned src_x, unsigned src_y, unsigned src_z,
+                   unsigned src_level, unsigned src_face,
+                   struct pipe_resource *dst_tex,
+                   unsigned dst_x, unsigned dst_y, unsigned dst_z,
+                   unsigned dst_level, unsigned dst_face,
+                   unsigned width, unsigned height, unsigned depth)
+{
+   enum pipe_error ret;
+   uint32 srcSubResource, dstSubResource;
+   struct svga_texture *dtex, *stex;
+   SVGA3dCopyBox box;
+   int i, num_layers = 1;
+
+   stex = svga_texture(src_tex);
+   dtex = svga_texture(dst_tex);
+
+   box.x = dst_x;
+   box.y = dst_y;
+   box.z = dst_z;
+   box.w = width;
+   box.h = height;
+   box.d = depth;
+   box.srcx = src_x;
+   box.srcy = src_y;
+   box.srcz = src_z;
+
+   if (src_tex->target == PIPE_TEXTURE_1D_ARRAY ||
+       src_tex->target == PIPE_TEXTURE_2D_ARRAY) {
+      /* copy layer by layer */
+      box.z = 0;
+      box.d = 1;
+      box.srcz = 0;
+
+      num_layers = depth;
+      src_face = src_z;
+      dst_face = dst_z;
+   }
+
+   /* loop over array layers */
+   for (i = 0; i < num_layers; i++) {
+      srcSubResource = (src_face + i) * (src_tex->last_level + 1) + src_level;
+      dstSubResource = (dst_face + i) * (dst_tex->last_level + 1) + dst_level;
+
+      ret = SVGA3D_vgpu10_PredCopyRegion(svga->swc,
+                                         dtex->handle, dstSubResource,
+                                         stex->handle, srcSubResource, &box);
+      if (ret != PIPE_OK) {
+         svga_context_flush(svga, NULL);
+         ret = SVGA3D_vgpu10_PredCopyRegion(svga->swc,
+                                            dtex->handle, dstSubResource,
+                                            stex->handle, srcSubResource, &box);
+         assert(ret == PIPE_OK);
+      }
+
+      svga_define_texture_level(dtex, dst_face + i, dst_level);
+   }
+}
+
+
 static void
 svga_resource_copy_region(struct pipe_context *pipe,
-                          struct pipe_resource* dst_tex,
+                          struct pipe_resource *dst_tex,
                           unsigned dst_level,
                           unsigned dstx, unsigned dsty, unsigned dstz,
-                          struct pipe_resource* src_tex,
+                          struct pipe_resource *src_tex,
                           unsigned src_level,
                           const struct pipe_box *src_box)
 {
@@ -100,6 +164,52 @@ svga_resource_copy_region(struct pipe_context *pipe,
 }
 
 
+/**
+ * The state tracker implements some resource copies with blits (for
+ * GL_ARB_copy_image). This function checks if we should really do the blit
+ * with a VGPU10 CopyRegion command or software fallback (for incompatible
+ * src/dst formats).
+ */
+static bool
+can_blit_via_copy_region_vgpu10(struct svga_context *svga,
+                                const struct pipe_blit_info *blit_info)
+{
+   struct svga_texture *dtex, *stex;
+
+   if (!svga_have_vgpu10(svga))
+      return false;
+
+   stex = svga_texture(blit_info->src.resource);
+   dtex = svga_texture(blit_info->dst.resource);
+
+   // can't copy within one resource
+   if (stex->handle == dtex->handle)
+      return false;
+
+   // can't copy between different resource types
+   if (blit_info->src.resource->target != blit_info->dst.resource->target)
+      return false;
+
+   // check that the blit src/dst regions are same size, no flipping, etc.
+   if (blit_info->src.box.width != blit_info->dst.box.width ||
+       blit_info->src.box.height != blit_info->dst.box.height)
+      return false;
+
+   // depth/stencil copies not supported at this time
+   if (blit_info->mask != PIPE_MASK_RGBA)
+      return false;
+
+   if (blit_info->alpha_blend || blit_info->render_condition_enable ||
+       blit_info->scissor_enable)
+      return false;
+
+   // check that src/dst surface formats are compatible for the VGPU device.
+   return util_is_format_compatible(
+      util_format_description(blit_info->src.resource->format),
+      util_format_description(blit_info->dst.resource->format));
+}
+
+
 static void
 svga_blit(struct pipe_context *pipe,
           const struct pipe_blit_info
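The subresource arithmetic in copy_region_vgpu10() can be read in isolation as below. This is only a sketch: subresource_index() is a hypothetical helper, not a driver function, and it just restates the expression `(face + i) * (last_level + 1) + level` from the patch.

```c
#include <assert.h>

/* Hypothetical helper: flatten (layer or face, mipmap level) into one
 * subresource index, with all levels of a layer stored consecutively. */
static unsigned
subresource_index(unsigned layer, unsigned last_level, unsigned level)
{
   /* last_level is the highest valid level, so there is one more level
    * than that value. */
   const unsigned num_levels = last_level + 1;

   assert(level <= last_level);

   return layer * num_levels + level;
}
```

For a texture with four mipmap levels (last_level of 3), level 1 of layer 2 is subresource 2 * 4 + 1 = 9, which is why the loop in the patch only has to bump the layer term on each iteration.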
[Mesa-dev] [PATCH 16/16] svga: use SVGA3D_vgpu10_BufferCopy() for buffer copies
So that we do copies host-side rather than in the guest with map/memcpy.

Tested with piglit arb_copy_buffer-subdata-sync test and new
arb_copy_buffer-intra-buffer-copy test.

Reviewed-by: Charmaine Lee
---
 src/gallium/drivers/svga/svga_pipe_blit.c | 32 +++
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_pipe_blit.c b/src/gallium/drivers/svga/svga_pipe_blit.c
index 4a01c8e..08aa30a 100644
--- a/src/gallium/drivers/svga/svga_pipe_blit.c
+++ b/src/gallium/drivers/svga/svga_pipe_blit.c
@@ -23,10 +23,11 @@
  *
  **/
 
-#include "svga_resource_texture.h"
 #include "svga_context.h"
 #include "svga_debug.h"
 #include "svga_cmd.h"
+#include "svga_resource_buffer.h"
+#include "svga_resource_texture.h"
 #include "svga_surface.h"
 
 //#include "util/u_blit_sw.h"
@@ -117,10 +118,33 @@ svga_resource_copy_region(struct pipe_context *pipe,
     */
    svga_surfaces_flush( svga );
 
-   /* Fallback for buffers. */
    if (dst_tex->target == PIPE_BUFFER && src_tex->target == PIPE_BUFFER) {
-      util_resource_copy_region(pipe, dst_tex, dst_level, dstx, dsty, dstz,
-                                src_tex, src_level, src_box);
+      /* can't copy within the same buffer, unfortunately */
+      if (svga_have_vgpu10(svga) && src_tex != dst_tex) {
+         enum pipe_error ret;
+         struct svga_winsys_surface *src_surf;
+         struct svga_winsys_surface *dst_surf;
+         struct svga_buffer *dbuffer = svga_buffer(dst_tex);
+
+         src_surf = svga_buffer_handle(svga, src_tex);
+         dst_surf = svga_buffer_handle(svga, dst_tex);
+
+         ret = SVGA3D_vgpu10_BufferCopy(svga->swc, src_surf, dst_surf,
+                                        src_box->x, dstx, src_box->width);
+         if (ret != PIPE_OK) {
+            svga_context_flush(svga, NULL);
+            ret = SVGA3D_vgpu10_BufferCopy(svga->swc, src_surf, dst_surf,
+                                           src_box->x, dstx, src_box->width);
+            assert(ret == PIPE_OK);
+         }
+
+         dbuffer->dirty = TRUE;
+      }
+      else {
+         /* use map/memcpy fallback */
+         util_resource_copy_region(pipe, dst_tex, dst_level, dstx,
+                                   dsty, dstz, src_tex, src_level, src_box);
+      }
       return;
    }
 
-- 
1.9.1
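The patch uses a try, flush, retry-once idiom around SVGA3D_vgpu10_BufferCopy(). The toy model below sketches that idiom against a made-up fixed-size command buffer; every type and function name in it is hypothetical and none of it is the svga winsys API. It assumes, as the patch appears to, that the first attempt can only fail because the command buffer is full.

```c
#include <assert.h>

/* Toy command buffer: `used` bytes out of `capacity` are occupied. */
struct cmd_buf_sketch {
   unsigned used;
   unsigned capacity;
   unsigned flushes;   /* how many times the buffer was submitted */
};

enum sketch_error { SKETCH_OK, SKETCH_OUT_OF_SPACE };

/* Reserve `size` bytes, failing if the buffer cannot hold them. */
static enum sketch_error
sketch_try_emit(struct cmd_buf_sketch *cb, unsigned size)
{
   if (size > cb->capacity - cb->used)
      return SKETCH_OUT_OF_SPACE;
   cb->used += size;
   return SKETCH_OK;
}

/* Submit everything queued so far, leaving the buffer empty. */
static void
sketch_flush(struct cmd_buf_sketch *cb)
{
   cb->used = 0;
   cb->flushes++;
}

/* Emit a command; if the buffer is full, flush once and retry. */
static enum sketch_error
sketch_emit_with_retry(struct cmd_buf_sketch *cb, unsigned size)
{
   enum sketch_error ret = sketch_try_emit(cb, size);

   if (ret != SKETCH_OK) {
      sketch_flush(cb);
      ret = sketch_try_emit(cb, size);
   }
   return ret;
}
```

Unlike the driver code, which asserts that the retry succeeds, the sketch returns the error so that a command larger than the whole buffer is reported rather than aborting.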
[Mesa-dev] [PATCH 07/16] svga: adjust sampler view format for RGBX
We previously handled the case of a RGBX sampler view of a RGBA
surface. Add the reverse case too. For GL_ARB_copy_image.
---
 src/gallium/drivers/svga/svga_state_sampler.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/svga/svga_state_sampler.c b/src/gallium/drivers/svga/svga_state_sampler.c
index 6e78825..00e8fc0 100644
--- a/src/gallium/drivers/svga/svga_state_sampler.c
+++ b/src/gallium/drivers/svga/svga_state_sampler.c
@@ -106,12 +106,16 @@ svga_validate_pipe_sampler_view(struct svga_context *svga,
       enum pipe_format pformat = sv->base.format;
 
       /* vgpu10 cannot create a BGRX view for a BGRA resource, so force it to
-       * create a BGRA view.
+       * create a BGRA view (and vice versa).
        */
       if (pformat == PIPE_FORMAT_B8G8R8X8_UNORM &&
           sv->base.texture->format == PIPE_FORMAT_B8G8R8A8_UNORM) {
          pformat = PIPE_FORMAT_B8G8R8A8_UNORM;
       }
+      else if (pformat == PIPE_FORMAT_B8G8R8A8_UNORM &&
+               sv->base.texture->format == PIPE_FORMAT_B8G8R8X8_UNORM) {
+         pformat = PIPE_FORMAT_B8G8R8X8_UNORM;
+      }
 
       format = svga_translate_format(ss, pformat,
                                      PIPE_BIND_SAMPLER_VIEW);
-- 
1.9.1
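The view-format adjustment in this patch boils down to a small symmetric rule, sketched below with a stand-in enum so the example builds without Gallium headers. The enum values and the function name adjust_view_format() are hypothetical.

```c
/* Stand-in formats; the real code uses enum pipe_format values. */
enum sketch_format {
   SKETCH_B8G8R8A8_UNORM,
   SKETCH_B8G8R8X8_UNORM,
   SKETCH_R8G8B8A8_UNORM
};

/* If the requested view format and the resource format differ only in
 * A versus X, use the resource's own format for the view; otherwise
 * keep the requested view format unchanged. */
static enum sketch_format
adjust_view_format(enum sketch_format view, enum sketch_format resource)
{
   if (view == SKETCH_B8G8R8X8_UNORM && resource == SKETCH_B8G8R8A8_UNORM)
      return SKETCH_B8G8R8A8_UNORM;

   if (view == SKETCH_B8G8R8A8_UNORM && resource == SKETCH_B8G8R8X8_UNORM)
      return SKETCH_B8G8R8X8_UNORM;

   return view;
}
```

Both special cases return the resource format, which is one way to read the "(and vice versa)" added to the comment in the patch.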
[Mesa-dev] [PATCH 13/16] svga: enable ARB_copy_image extension in the driver
From: Neha Bhende

Reviewed-by: Brian Paul
---
 src/gallium/drivers/svga/svga_screen.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/svga/svga_screen.c b/src/gallium/drivers/svga/svga_screen.c
index 4c2774d..359a159 100644
--- a/src/gallium/drivers/svga/svga_screen.c
+++ b/src/gallium/drivers/svga/svga_screen.c
@@ -388,6 +388,8 @@ svga_get_param(struct pipe_screen *screen, enum pipe_cap param)
    case PIPE_CAP_VIDEO_MEMORY:
       /* XXX: Query the host ? */
       return 1;
+   case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS:
+      return sws->have_vgpu10;
    case PIPE_CAP_UMA:
    case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
    case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
@@ -398,7 +400,6 @@ svga_get_param(struct pipe_screen *screen, enum pipe_cap param)
    case PIPE_CAP_TGSI_TXQS:
    case PIPE_CAP_FORCE_PERSAMPLE_INTERP:
    case PIPE_CAP_SHAREABLE_SHADERS:
-   case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS:
    case PIPE_CAP_CLEAR_TEXTURE:
    case PIPE_CAP_DRAW_PARAMETERS:
    case PIPE_CAP_TGSI_FS_POSITION_IS_SYSVAL:
-- 
1.9.1
[Mesa-dev] [PATCH 12/16] svga: try blitting with copy region in more cases
We previously could do blits with util_resource_copy_region() when doing
'loose' format checking. Also do blits with util_resource_copy_region()
when the blit src/dst formats (not the underlying resources) exactly
match. Needed for GL_ARB_copy_image.
---
 src/gallium/drivers/svga/svga_pipe_blit.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/svga/svga_pipe_blit.c b/src/gallium/drivers/svga/svga_pipe_blit.c
index ad54dc5..4a01c8e 100644
--- a/src/gallium/drivers/svga/svga_pipe_blit.c
+++ b/src/gallium/drivers/svga/svga_pipe_blit.c
@@ -294,7 +294,13 @@ svga_blit(struct pipe_context *pipe,
       return;
    }
 
-   if (util_try_blit_via_copy_region(pipe, blit_info)) {
+   if (util_can_blit_via_copy_region(blit_info, TRUE) ||
+       util_can_blit_via_copy_region(blit_info, FALSE)) {
+      util_resource_copy_region(pipe, blit_info->dst.resource,
+                                blit_info->dst.level,
+                                blit_info->dst.box.x, blit_info->dst.box.y,
+                                blit_info->dst.box.z, blit_info->src.resource,
+                                blit_info->src.level, &blit_info->src.box);
       return; /* done */
    }
 
-- 
1.9.1
[Mesa-dev] [PATCH 11/16] svga: use copy_region_vgpu10() for region copies when possible
---
 src/gallium/drivers/svga/svga_pipe_blit.c | 42 ---
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_pipe_blit.c b/src/gallium/drivers/svga/svga_pipe_blit.c
index 564af51..ad54dc5 100644
--- a/src/gallium/drivers/svga/svga_pipe_blit.c
+++ b/src/gallium/drivers/svga/svga_pipe_blit.c
@@ -138,7 +138,6 @@ svga_resource_copy_region(struct pipe_context *pipe,
       src_z = src_box->z;
    }
 
-   /* different src/dst type???*/
    if (dst_tex->target == PIPE_TEXTURE_CUBE ||
        dst_tex->target == PIPE_TEXTURE_1D_ARRAY) {
       dst_face_layer = dstz;
@@ -150,14 +149,49 @@ svga_resource_copy_region(struct pipe_context *pipe,
       dst_z = dstz;
    }
 
-   svga_texture_copy_handle(svga,
-                            stex->handle,
+   stex = svga_texture(src_tex);
+   dtex = svga_texture(dst_tex);
+
+   if (svga_have_vgpu10(svga)) {
+      /* vgpu10 */
+      if (util_format_is_compressed(src_tex->format) ==
+          util_format_is_compressed(dst_tex->format) &&
+          !util_format_is_depth_and_stencil(src_tex->format) &&
+          stex->handle != dtex->handle &&
+          src_tex->target == dst_tex->target) {
+         copy_region_vgpu10(svga,
+                            src_tex,
                             src_box->x, src_box->y, src_z,
                             src_level, src_face_layer,
-                            dtex->handle,
+                            dst_tex,
                             dstx, dsty, dst_z,
                             dst_level, dst_face_layer,
                             src_box->width, src_box->height, src_box->depth);
+      }
+      else {
+         util_resource_copy_region(pipe, dst_tex, dst_level, dstx, dsty, dstz,
+                                   src_tex, src_level, src_box);
+      }
+   }
+   else {
+      /* vgpu9 */
+      if (src_tex->format == dst_tex->format) {
+         svga_texture_copy_handle(svga,
+                                  stex->handle,
+                                  src_box->x, src_box->y, src_z,
+                                  src_level, src_face_layer,
+                                  dtex->handle,
+                                  dstx, dsty, dst_z,
+                                  dst_level, dst_face_layer,
+                                  src_box->width, src_box->height,
+                                  src_box->depth);
+         svga_define_texture_level(dtex, dst_face_layer, dst_level);
+      }
+      else {
+         util_resource_copy_region(pipe, dst_tex, dst_level, dstx, dsty, dstz,
+                                   src_tex, src_level, src_box);
+      }
+   }
 
    /* Mark the destination image as being defined */
    svga_define_texture_level(dtex, dst_face_layer, dst_level);
-- 
1.9.1
[Mesa-dev] [PATCH 15/16] svga: add SVGA3D_vgpu10_BufferCopy()
---
 src/gallium/drivers/svga/svga_cmd.h        | 6 ++
 src/gallium/drivers/svga/svga_cmd_vgpu10.c | 24 
 2 files changed, 30 insertions(+)

diff --git a/src/gallium/drivers/svga/svga_cmd.h b/src/gallium/drivers/svga/svga_cmd.h
index 26e4690..06e1b4a 100644
--- a/src/gallium/drivers/svga/svga_cmd.h
+++ b/src/gallium/drivers/svga/svga_cmd.h
@@ -642,4 +642,10 @@ enum pipe_error
 SVGA3D_vgpu10_GenMips(struct svga_winsys_context *swc,
                       const SVGA3dShaderResourceViewId shaderResourceViewId,
                       struct svga_winsys_surface *view);
+
+enum pipe_error
+SVGA3D_vgpu10_BufferCopy(struct svga_winsys_context *swc,
+                         struct svga_winsys_surface *src,
+                         struct svga_winsys_surface *dst,
+                         unsigned srcx, unsigned dstx, unsigned width);
 #endif /* __SVGA3D_H__ */
diff --git a/src/gallium/drivers/svga/svga_cmd_vgpu10.c b/src/gallium/drivers/svga/svga_cmd_vgpu10.c
index 2729655..1f13193 100644
--- a/src/gallium/drivers/svga/svga_cmd_vgpu10.c
+++ b/src/gallium/drivers/svga/svga_cmd_vgpu10.c
@@ -1314,3 +1314,27 @@ SVGA3D_vgpu10_GenMips(struct svga_winsys_context *swc,
    swc->commit(swc);
    return PIPE_OK;
 }
+
+
+enum pipe_error
+SVGA3D_vgpu10_BufferCopy(struct svga_winsys_context *swc,
+                         struct svga_winsys_surface *src,
+                         struct svga_winsys_surface *dst,
+                         unsigned srcx, unsigned dstx, unsigned width)
+{
+   SVGA3dCmdDXBufferCopy *cmd;
+
+   cmd = SVGA3D_FIFOReserve(swc, SVGA_3D_CMD_DX_BUFFER_COPY, sizeof *cmd, 2);
+
+   if (!cmd)
+      return PIPE_ERROR_OUT_OF_MEMORY;
+
+   swc->surface_relocation(swc, &cmd->dest, NULL, dst, SVGA_RELOC_WRITE);
+   swc->surface_relocation(swc, &cmd->src, NULL, src, SVGA_RELOC_READ);
+   cmd->destX = dstx;
+   cmd->srcX = srcx;
+   cmd->width = width;
+
+   swc->commit(swc);
+   return PIPE_OK;
+}
-- 
1.9.1
[Mesa-dev] [PATCH 08/16] svga: add new svga_format_is_uncompressed_snorm() helper
--- src/gallium/drivers/svga/svga_format.c | 20 src/gallium/drivers/svga/svga_format.h | 4 2 files changed, 24 insertions(+) diff --git a/src/gallium/drivers/svga/svga_format.c b/src/gallium/drivers/svga/svga_format.c index 17c3bf9..1b3cebe 100644 --- a/src/gallium/drivers/svga/svga_format.c +++ b/src/gallium/drivers/svga/svga_format.c @@ -2193,3 +2193,23 @@ svga_sampler_format(SVGA3dSurfaceFormat format) return format; } } + + +/** + * Is the given format an uncompressed snorm format? + */ +bool +svga_format_is_uncompressed_snorm(SVGA3dSurfaceFormat format) +{ + switch (format) { + case SVGA3D_R8G8B8A8_SNORM: + case SVGA3D_R8G8_SNORM: + case SVGA3D_R8_SNORM: + case SVGA3D_R16G16B16A16_SNORM: + case SVGA3D_R16G16_SNORM: + case SVGA3D_R16_SNORM: + return true; + default: + return false; + } +} diff --git a/src/gallium/drivers/svga/svga_format.h b/src/gallium/drivers/svga/svga_format.h index 630a86a..e6258179 100644 --- a/src/gallium/drivers/svga/svga_format.h +++ b/src/gallium/drivers/svga/svga_format.h @@ -104,4 +104,8 @@ SVGA3dSurfaceFormat svga_sampler_format(SVGA3dSurfaceFormat format); +bool +svga_format_is_uncompressed_snorm(SVGA3dSurfaceFormat format); + + #endif /* SVGA_FORMAT_H_ */ -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/16] svga: adjust render target view format for RGBX
For GL_ARB_copy_image we may be asked to create an RGBA view of a RGBX surface. Use an RGBX view format for that case. --- src/gallium/drivers/svga/svga_surface.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/svga/svga_surface.c b/src/gallium/drivers/svga/svga_surface.c index a0108d2..e5943cf 100644 --- a/src/gallium/drivers/svga/svga_surface.c +++ b/src/gallium/drivers/svga/svga_surface.c @@ -452,10 +452,22 @@ svga_validate_surface_view(struct svga_context *svga, struct svga_surface *s) ); } else { + SVGA3dSurfaceFormat view_format = s->key.format; + const struct svga_texture *stex = svga_texture(s->base.texture); + + /* Can't create RGBA render target view of a RGBX surface so adjust + * the view format. We do something similar for texture samplers in + * svga_validate_pipe_sampler_view(). + */ + if (view_format == SVGA3D_B8G8R8A8_UNORM && + stex->key.format == SVGA3D_B8G8R8X8_TYPELESS) { +view_format = SVGA3D_B8G8R8X8_UNORM; + } + ret = SVGA3D_vgpu10_DefineRenderTargetView(svga->swc, s->view_id, s->handle, -s->key.format, +view_format, resType, ); } -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 14/16] svga: flush buffers when mapping for reading
With host-side buffer copies (via SVGA3D_vgpu10_BufferCopy()) we have to make sure any pending map-write operations are completed before reading. Otherwise the ReadbackSubResource operation could get stale data from the host buffer. This allows the piglit arb_copy_buffer-subdata-sync test to pass when we start using the SVGA3D_vgpu10_BufferCopy command. --- src/gallium/drivers/svga/svga_resource_buffer.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/svga/svga_resource_buffer.c b/src/gallium/drivers/svga/svga_resource_buffer.c index 9ecb975..3c6ee20 100644 --- a/src/gallium/drivers/svga/svga_resource_buffer.c +++ b/src/gallium/drivers/svga/svga_resource_buffer.c @@ -95,13 +95,25 @@ svga_buffer_transfer_map(struct pipe_context *pipe, transfer->usage = usage; transfer->box = *box; - if ((usage & PIPE_TRANSFER_READ) && sbuf->dirty) { - /* Only need to test for vgpu10 since only vgpu10 features (streamout, - * buffer copy) can modify buffers on the device. - */ - if (svga_have_vgpu10(svga)) { + if (usage & PIPE_TRANSFER_READ) { + if (!sbuf->user) { + (void) svga_buffer_handle(svga, resource); + } + + if (sbuf->dma.pending > 0) { + svga_buffer_upload_flush(svga, sbuf); + svga_context_finish(svga); + } + + if (sbuf->dirty) { enum pipe_error ret; + + /* Host-side buffers can only be dirtied with vgpu10 features + * (streamout and buffer copy). + */ + assert(svga_have_vgpu10(svga)); assert(sbuf->handle); + ret = SVGA3D_vgpu10_ReadbackSubResource(svga->swc, sbuf->handle, 0); if (ret != PIPE_OK) { svga_context_flush(svga, NULL); -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 04/16] svga: use untyped surface formats in most cases
This allows us to do copies between different, but compatible, surface formats such as RGBA8_UNORM, RGBA8_SINT, RGBA8_UINT, etc. for GL_ARB_copy_image. --- src/gallium/drivers/svga/svga_resource_texture.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/svga/svga_resource_texture.c b/src/gallium/drivers/svga/svga_resource_texture.c index 0e21f5e..14fe220 100644 --- a/src/gallium/drivers/svga/svga_resource_texture.c +++ b/src/gallium/drivers/svga/svga_resource_texture.c @@ -949,10 +949,13 @@ svga_texture_create(struct pipe_screen *screen, * formats can be reinterpreted as other formats. For example, * SVGA3D_R8G8B8A8_UNORM_TYPELESS can be interpreted as * SVGA3D_R8G8B8A8_UNORM_SRGB or SVGA3D_R8G8B8A8_UNORM. +* Do not use typeless formats for SHARED, DISPLAY_TARGET or SCANOUT +* buffers. */ - if (svgascreen->sws->have_vgpu10 && - (util_format_is_srgb(template->format) || -format_has_depth(template->format))) { + if (svgascreen->sws->have_vgpu10 + && ((bindings & (PIPE_BIND_SHARED | +PIPE_BIND_DISPLAY_TARGET | +PIPE_BIND_SCANOUT)) == 0)) { SVGA3dSurfaceFormat typeless = svga_typeless_format(tex->key.format); if (0) { debug_printf("Convert resource type %s -> %s (bind 0x%x)\n", -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
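The new condition in this patch is a plain bit-mask test on the bind flags. A minimal sketch of the same test follows; the flag names and values are hypothetical stand-ins (the real PIPE_BIND_* values live in the gallium headers) and can_use_typeless() is not a driver function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bind flags for illustration only; the real PIPE_BIND_*
 * values are defined by gallium and differ from these. */
enum {
   SKETCH_BIND_SAMPLER_VIEW   = 1 << 0,
   SKETCH_BIND_SHARED         = 1 << 1,
   SKETCH_BIND_DISPLAY_TARGET = 1 << 2,
   SKETCH_BIND_SCANOUT        = 1 << 3
};

/* Returns true when a typeless surface format may be used: the device
 * must be vgpu10 and the resource must not be visible outside the
 * driver (shared, display target or scanout). */
static bool
can_use_typeless(bool have_vgpu10, unsigned bindings)
{
   const unsigned external = SKETCH_BIND_SHARED |
                             SKETCH_BIND_DISPLAY_TARGET |
                             SKETCH_BIND_SCANOUT;

   return have_vgpu10 && (bindings & external) == 0;
}
```

A resource bound only as a sampler view passes the test, while adding any one of the three external bindings makes it fall back to its typed format.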
[Mesa-dev] [PATCH 02/16] util: simplify a few things in util_can_blit_via_copy_region()
Since only the src box can have negative dims for flipping, just comparing the src/dst box sizes is enough to detect flips. --- src/gallium/auxiliary/util/u_surface.c | 20 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/src/gallium/auxiliary/util/u_surface.c b/src/gallium/auxiliary/util/u_surface.c index 8d22bcf..e2229bc 100644 --- a/src/gallium/auxiliary/util/u_surface.c +++ b/src/gallium/auxiliary/util/u_surface.c @@ -701,21 +701,20 @@ util_can_blit_via_copy_region(const struct pipe_blit_info *blit) return FALSE; } - /* No masks, no filtering, no scissor. */ + /* No masks, no filtering, no scissor, no blending */ if ((blit->mask & mask) != mask || blit->filter != PIPE_TEX_FILTER_NEAREST || - blit->scissor_enable) { + blit->scissor_enable || + blit->alpha_blend) { return FALSE; } - /* No flipping. */ - if (blit->src.box.width < 0 || - blit->src.box.height < 0 || - blit->src.box.depth < 0) { - return FALSE; - } + /* Only the src box can have negative dims for flipping */ + assert(blit->dst.box.width >= 1); + assert(blit->dst.box.height >= 1); + assert(blit->dst.box.depth >= 1); - /* No scaling. */ + /* No scaling or flipping */ if (blit->src.box.width != blit->dst.box.width || blit->src.box.height != blit->dst.box.height || blit->src.box.depth != blit->dst.box.depth) { @@ -736,9 +735,6 @@ util_can_blit_via_copy_region(const struct pipe_blit_info *blit) return FALSE; } - if (blit->alpha_blend) - return FALSE; - return TRUE; } -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
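To spell out why the separate flip check can go away: once the dst dimensions are known to be positive, a negative (flipped) src dimension can never compare equal to them. A small standalone sketch, assuming a simplified box type (struct box3 and same_size_no_flip() are illustrative names, not gallium API):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical stand-in for the width/height/depth part of
 * a pipe_box. */
struct box3 {
   int width, height, depth;
};

/* Returns true only when src and dst have identical sizes.  Because the
 * dst dims are asserted positive, a negative src dim always fails the
 * equality test, so this one comparison rejects scaling and flipping. */
static bool
same_size_no_flip(const struct box3 *src, const struct box3 *dst)
{
   assert(dst->width >= 1);
   assert(dst->height >= 1);
   assert(dst->depth >= 1);

   return src->width == dst->width &&
          src->height == dst->height &&
          src->depth == dst->depth;
}
```

With dst = {4, 4, 1}, a src of {4, 4, 1} is accepted, while {-4, 4, 1} (flipped in x) and {8, 4, 1} (scaled in x) are both rejected by the same comparison.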
[Mesa-dev] [PATCH 03/16] gallium/util: add tight_format_check param to util_can_blit_via_copy_region()
The VMware driver will use this for implementing GL_ARB_copy_image. --- src/gallium/auxiliary/util/u_surface.c | 38 +- src/gallium/auxiliary/util/u_surface.h | 3 ++- 2 files changed, 30 insertions(+), 11 deletions(-) diff --git a/src/gallium/auxiliary/util/u_surface.c b/src/gallium/auxiliary/util/u_surface.c index e2229bc..e0234f8 100644 --- a/src/gallium/auxiliary/util/u_surface.c +++ b/src/gallium/auxiliary/util/u_surface.c @@ -687,19 +687,37 @@ get_sample_count(const struct pipe_resource *res) } +/** + * Check if a blit() command can be implemented with a resource_copy_region(). + * If tight_format_check is true, only allow the resource_copy_region() if + * the blit src/dst formats are identical, ignoring the resource formats. + * Otherwise, check for format casting and compatibility. + */ boolean -util_can_blit_via_copy_region(const struct pipe_blit_info *blit) +util_can_blit_via_copy_region(const struct pipe_blit_info *blit, + boolean tight_format_check) { - unsigned mask = util_format_get_mask(blit->dst.format); + const struct util_format_description *src_desc, *dst_desc; - /* No format conversions. 
*/ - if (blit->src.resource->format != blit->src.format || - blit->dst.resource->format != blit->dst.format || - !util_is_format_compatible( - util_format_description(blit->src.resource->format), - util_format_description(blit->dst.resource->format))) { - return FALSE; + src_desc = util_format_description(blit->src.resource->format); + dst_desc = util_format_description(blit->dst.resource->format); + + if (tight_format_check) { + /* no format conversions allowed */ + if (blit->src.format != blit->dst.format) { + return FALSE; + } } + else { + /* do loose format compatibility checking */ + if (blit->src.resource->format != blit->src.format || + blit->dst.resource->format != blit->dst.format || + !util_is_format_compatible(src_desc, dst_desc)) { + return FALSE; + } + } + + unsigned mask = util_format_get_mask(blit->dst.format); /* No masks, no filtering, no scissor, no blending */ if ((blit->mask & mask) != mask || @@ -752,7 +770,7 @@ boolean util_try_blit_via_copy_region(struct pipe_context *ctx, const struct pipe_blit_info *blit) { - if (util_can_blit_via_copy_region(blit)) { + if (util_can_blit_via_copy_region(blit, FALSE)) { ctx->resource_copy_region(ctx, blit->dst.resource, blit->dst.level, blit->dst.box.x, blit->dst.box.y, blit->dst.box.z, diff --git a/src/gallium/auxiliary/util/u_surface.h b/src/gallium/auxiliary/util/u_surface.h index bda2e1e..64a685b 100644 --- a/src/gallium/auxiliary/util/u_surface.h +++ b/src/gallium/auxiliary/util/u_surface.h @@ -99,7 +99,8 @@ util_clear_depth_stencil(struct pipe_context *pipe, unsigned width, unsigned height); boolean -util_can_blit_via_copy_region(const struct pipe_blit_info *blit); +util_can_blit_via_copy_region(const struct pipe_blit_info *blit, + boolean tight_format_check); extern boolean util_try_blit_via_copy_region(struct pipe_context *ctx, -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
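The difference between the two modes is easiest to see on a blit whose view formats match but whose resource formats differ. The sketch below models only the format part of the check; the enum, the struct and formats_compatible() are hypothetical simplifications, and the compatibility rule is a toy one rather than what util_is_format_compatible() really implements.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, tiny format list for illustration. */
enum sketch_format {
   SKETCH_RGBA8_UNORM,
   SKETCH_RGBA8_UINT,
   SKETCH_RGBX8_UNORM
};

/* One side of a blit: the format of the underlying resource and the
 * format the blit asks to view it as. */
struct blit_side {
   enum sketch_format resource_format;
   enum sketch_format view_format;
};

/* Toy stand-in for util_is_format_compatible(): identical formats, or
 * RGBA8_UNORM copied into RGBX8_UNORM. */
static bool
formats_compatible(enum sketch_format src, enum sketch_format dst)
{
   return src == dst ||
          (src == SKETCH_RGBA8_UNORM && dst == SKETCH_RGBX8_UNORM);
}

static bool
format_check(const struct blit_side *src, const struct blit_side *dst,
             bool tight_format_check)
{
   if (tight_format_check) {
      /* Only the blit (view) formats matter; resource formats are ignored. */
      return src->view_format == dst->view_format;
   }

   /* Loose mode: no format casting on either side, and the resource
    * formats must be compatible. */
   return src->resource_format == src->view_format &&
          dst->resource_format == dst->view_format &&
          formats_compatible(src->resource_format, dst->resource_format);
}
```

A blit from an RGBA8_UNORM resource viewed as RGBA8_UINT into an RGBA8_UINT resource passes the tight check but fails the loose one, which is the GL_ARB_copy_image style case the commit message refers to.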
[Mesa-dev] [PATCH 01/16] util: new util_try_blit_via_copy_region() function
Pulled out of the util_try_blit_via_copy_region() function. Subsequent changes build on this. --- src/gallium/auxiliary/util/u_surface.c | 44 ++ src/gallium/auxiliary/util/u_surface.h | 3 +++ 2 files changed, 32 insertions(+), 15 deletions(-) diff --git a/src/gallium/auxiliary/util/u_surface.c b/src/gallium/auxiliary/util/u_surface.c index 8408aa8..8d22bcf 100644 --- a/src/gallium/auxiliary/util/u_surface.c +++ b/src/gallium/auxiliary/util/u_surface.c @@ -686,18 +686,9 @@ get_sample_count(const struct pipe_resource *res) return res->nr_samples ? res->nr_samples : 1; } -/** - * Try to do a blit using resource_copy_region. The function calls - * resource_copy_region if the blit description is compatible with it. - * - * It returns TRUE if the blit was done using resource_copy_region. - * - * It returns FALSE otherwise and the caller must fall back to a more generic - * codepath for the blit operation. (e.g. by using u_blitter) - */ + boolean -util_try_blit_via_copy_region(struct pipe_context *ctx, - const struct pipe_blit_info *blit) +util_can_blit_via_copy_region(const struct pipe_blit_info *blit) { unsigned mask = util_format_get_mask(blit->dst.format); @@ -748,9 +739,32 @@ util_try_blit_via_copy_region(struct pipe_context *ctx, if (blit->alpha_blend) return FALSE; - ctx->resource_copy_region(ctx, blit->dst.resource, blit->dst.level, - blit->dst.box.x, blit->dst.box.y, blit->dst.box.z, - blit->src.resource, blit->src.level, - &blit->src.box); return TRUE; } + + +/** + * Try to do a blit using resource_copy_region. The function calls + * resource_copy_region if the blit description is compatible with it. + * + * It returns TRUE if the blit was done using resource_copy_region. + * + * It returns FALSE otherwise and the caller must fall back to a more generic + * codepath for the blit operation. (e.g. by using u_blitter) + */ +boolean +util_try_blit_via_copy_region(struct pipe_context *ctx, + const struct pipe_blit_info *blit) +{ + if (util_can_blit_via_copy_region(blit)) { + ctx->resource_copy_region(ctx, blit->dst.resource, blit->dst.level, +blit->dst.box.x, blit->dst.box.y, +blit->dst.box.z, +blit->src.resource, blit->src.level, +&blit->src.box); + return TRUE; + } + else { + return FALSE; + } +} diff --git a/src/gallium/auxiliary/util/u_surface.h b/src/gallium/auxiliary/util/u_surface.h index bfd8f40..bda2e1e 100644 --- a/src/gallium/auxiliary/util/u_surface.h +++ b/src/gallium/auxiliary/util/u_surface.h @@ -98,6 +98,9 @@ util_clear_depth_stencil(struct pipe_context *pipe, unsigned dstx, unsigned dsty, unsigned width, unsigned height); +boolean +util_can_blit_via_copy_region(const struct pipe_blit_info *blit); + extern boolean util_try_blit_via_copy_region(struct pipe_context *ctx, const struct pipe_blit_info *blit); -- 1.9.1
[Mesa-dev] [PATCH 05/16] svga: don't advertise support for R32G32B32_UINT/SINT surface formats
From: Neha Bhende We want to be able to copy between different 32-bit, 3-channel surface formats for GL_ARB_copy_image, but since we don't have a 3-channel float format, we can't support 32-bit, 3-channel integer formats. The state tracker will choose 4-channel formats instead. Fixes the piglit arb_copy_image-format test for several cases. Note: This change may need to be revisited if/when the texture_view extension is enabled in the driver. Reviewed-by: Brian Paul --- src/gallium/drivers/svga/svga_format.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/svga/svga_format.c b/src/gallium/drivers/svga/svga_format.c index 4662bef..17c3bf9 100644 --- a/src/gallium/drivers/svga/svga_format.c +++ b/src/gallium/drivers/svga/svga_format.c @@ -242,11 +242,11 @@ static const struct vgpu10_format_entry format_conversion_table[] = { PIPE_FORMAT_R16G16B16A16_SINT, SVGA3D_R16G16B16A16_SINT, SVGA3D_R16G16B16A16_SINT,0 }, { PIPE_FORMAT_R32_UINT, SVGA3D_R32_UINT, SVGA3D_R32_UINT, 0 }, { PIPE_FORMAT_R32G32_UINT, SVGA3D_R32G32_UINT, SVGA3D_R32G32_UINT, 0 }, - { PIPE_FORMAT_R32G32B32_UINT,SVGA3D_R32G32B32_UINT, SVGA3D_R32G32B32_UINT, 0 }, + { PIPE_FORMAT_R32G32B32_UINT,SVGA3D_R32G32B32_UINT, SVGA3D_FORMAT_INVALID, 0 }, { PIPE_FORMAT_R32G32B32A32_UINT, SVGA3D_R32G32B32A32_UINT, SVGA3D_R32G32B32A32_UINT,0 }, { PIPE_FORMAT_R32_SINT, SVGA3D_R32_SINT, SVGA3D_R32_SINT, 0 }, { PIPE_FORMAT_R32G32_SINT, SVGA3D_R32G32_SINT, SVGA3D_R32G32_SINT, 0 }, - { PIPE_FORMAT_R32G32B32_SINT,SVGA3D_R32G32B32_SINT, SVGA3D_R32G32B32_SINT, 0 }, + { PIPE_FORMAT_R32G32B32_SINT,SVGA3D_R32G32B32_SINT, SVGA3D_FORMAT_INVALID, 0 }, { PIPE_FORMAT_R32G32B32A32_SINT, SVGA3D_R32G32B32A32_SINT, SVGA3D_R32G32B32A32_SINT,0 }, { PIPE_FORMAT_A8_UINT, SVGA3D_FORMAT_INVALID, SVGA3D_FORMAT_INVALID, 0 }, { PIPE_FORMAT_I8_UINT, SVGA3D_FORMAT_INVALID, SVGA3D_FORMAT_INVALID, 0 }, -- 1.9.1
Re: [Mesa-dev] [PATCH 1/4] radeonsi: use conformant line rasterization
This series is,
Reviewed-by: Edward O'Callaghan

On 06/29/2016 03:53 AM, Marek Olšák wrote:
> From: Marek Olšák
>
> AA lines are not completely correct (see TODO), but everything else
> should be.
>
> + 3 linestipple piglits
> ---
> src/gallium/drivers/radeon/cayman_msaa.c | 12 ++--
> src/gallium/drivers/radeon/r600d_common.h | 6 ++
> src/gallium/drivers/radeonsi/si_state.c | 10 +-
> src/gallium/drivers/radeonsi/si_state_draw.c | 6 --
> 4 files changed, 29 insertions(+), 5 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/cayman_msaa.c
> b/src/gallium/drivers/radeon/cayman_msaa.c
> index a9ec4c3..89c4937 100644
> --- a/src/gallium/drivers/radeon/cayman_msaa.c
> +++ b/src/gallium/drivers/radeon/cayman_msaa.c
> @@ -200,6 +200,14 @@ void cayman_emit_msaa_config(struct radeon_winsys_cs
> *cs, int nr_samples,
> {
> int setup_samples = nr_samples > 1 ? nr_samples :
> overrast_samples > 1 ? overrast_samples : 0;
> + /* Required by OpenGL line rasterization.
> + *
> + * TODO: We should also enable perpendicular endcaps for AA lines,
> + * but that requires implementing line stippling in the pixel
> + * shader. SC can only do line stippling with axis-aligned
> + * endcaps.
> + */
> + unsigned sc_line_cntl = S_028BDC_DX10_DIAMOND_TEST_ENA(1);
>
> if (setup_samples > 1) {
> /* indexed by log2(nr_samples) */
> @@ -215,7 +223,7 @@ void cayman_emit_msaa_config(struct radeon_winsys_cs *cs,
> int nr_samples,
> util_logbase2(util_next_power_of_two(ps_iter_samples));
>
> radeon_set_context_reg_seq(cs, CM_R_028BDC_PA_SC_LINE_CNTL, 2);
> - radeon_emit(cs, S_028BDC_LAST_PIXEL(1) |
> + radeon_emit(cs, sc_line_cntl |
> S_028BDC_EXPAND_LINE_WIDTH(1)); /*
> CM_R_028BDC_PA_SC_LINE_CNTL */
> radeon_emit(cs, S_028BE0_MSAA_NUM_SAMPLES(log_samples) |
> S_028BE0_MAX_SAMPLE_DIST(max_dist[log_samples]) |
> @@ -242,7 +250,7 @@ void cayman_emit_msaa_config(struct radeon_winsys_cs *cs,
> int nr_samples,
> }
> } else {
> radeon_set_context_reg_seq(cs, CM_R_028BDC_PA_SC_LINE_CNTL, 2);
> - radeon_emit(cs, S_028BDC_LAST_PIXEL(1)); /*
> CM_R_028BDC_PA_SC_LINE_CNTL */
> + radeon_emit(cs, sc_line_cntl); /* CM_R_028BDC_PA_SC_LINE_CNTL */
> radeon_emit(cs, 0); /* CM_R_028BE0_PA_SC_AA_CONFIG */
>
> radeon_set_context_reg(cs, CM_R_028804_DB_EQAA,
> diff --git a/src/gallium/drivers/radeon/r600d_common.h
> b/src/gallium/drivers/radeon/r600d_common.h
> index e50de96..6f534b3 100644
> --- a/src/gallium/drivers/radeon/r600d_common.h
> +++ b/src/gallium/drivers/radeon/r600d_common.h
> @@ -203,6 +203,12 @@
> #define S_028BDC_LAST_PIXEL(x) (((unsigned)(x) &
> 0x1) << 10)
> #define G_028BDC_LAST_PIXEL(x) (((x) >> 10) & 0x1)
> #define C_028BDC_LAST_PIXEL 0xFFFFFBFF
> +#define S_028BDC_PERPENDICULAR_ENDCAP_ENA(x) (((unsigned)(x) &
> 0x1) << 11)
> +#define G_028BDC_PERPENDICULAR_ENDCAP_ENA(x) (((x) >> 11) & 0x1)
> +#define C_028BDC_PERPENDICULAR_ENDCAP_ENA 0xFFFFF7FF
> +#define S_028BDC_DX10_DIAMOND_TEST_ENA(x) (((unsigned)(x) &
> 0x1) << 12)
> +#define G_028BDC_DX10_DIAMOND_TEST_ENA(x) (((x) >> 12) & 0x1)
> +#define C_028BDC_DX10_DIAMOND_TEST_ENA 0xFFFFEFFF
> #define CM_R_028BE0_PA_SC_AA_CONFIG 0x28be0
> #define S_028BE0_MSAA_NUM_SAMPLES(x) (((unsigned)(x) &
> 0x7) << 0)
> #define S_028BE0_AA_MASK_CENTROID_DTMN(x) (((unsigned)(x) & 0x1)
> << 4)
> diff --git a/src/gallium/drivers/radeonsi/si_state.c
> b/src/gallium/drivers/radeonsi/si_state.c
> index 0a2fdbf..b21fa5c 100644
> --- a/src/gallium/drivers/radeonsi/si_state.c
> +++ b/src/gallium/drivers/radeonsi/si_state.c
> @@ -3805,7 +3805,15 @@ static void si_init_config(struct si_context *sctx)
> S_028034_BR_X(16384) | S_028034_BR_Y(16384));
>
> si_pm4_set_reg(pm4, R_02820C_PA_SC_CLIPRECT_RULE, 0x);
> - si_pm4_set_reg(pm4, R_028230_PA_SC_EDGERULE, 0x);
> + si_pm4_set_reg(pm4, R_028230_PA_SC_EDGERULE,
> +S_028230_ER_TRI(0xA) |
> +S_028230_ER_POINT(0xA) |
> +S_028230_ER_RECT(0xA) |
> +/* Required by DX10_DIAMOND_TEST_ENA: */
> +S_028230_ER_LINE_LR(0x1A) |
> +S_028230_ER_LINE_RL(0x26) |
> +S_028230_ER_LINE_TB(0xA) |
> +S_028230_ER_LINE_BT(0xA));
> /* PA_SU_HARDWARE_SCREEN_OFFSET must be 0 due to hw bug on SI */
> si_pm4_set_reg(pm4,
Re: [Mesa-dev] [PATCH] gallium: Force blend color to 16-byte alignment
On 28.06.2016 at 22:45, Chuck Atkins wrote:
> This aligns the 4-element color float array to 16 byte boundaries. This
> should allow compiler vectorizers to generate better optimizations.
> Also fixes broken vectorization generated by Intel compiler.
>
> Reported-by: Tim Rowley
> Signed-off-by: Chuck Atkins
> ---
> src/gallium/include/pipe/p_state.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/include/pipe/p_state.h
> b/src/gallium/include/pipe/p_state.h
> index 1543e90..95f140f 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -326,7 +326,7 @@ struct pipe_blend_state
>
> struct pipe_blend_color
> {
> - float color[4];
> + PIPE_ALIGN_VAR(16) float color[4];
> };
>

I'm wondering if that's really needed. I have a hard time imagining
that setting the blend color is performance critical, and inside a
driver you can obviously still align pipe_blend_color structs yourself.
But OTOH, why not...

Acked-by: Roland Scheidegger
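For anyone who wants to see what the alignment request does to the struct layout, here is a small standalone sketch. It uses the C11 alignas specifier in place of Mesa's PIPE_ALIGN_VAR macro (an assumption made so the example compiles on its own), and all struct names are made up for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Same shape as pipe_blend_color after the patch, with C11 alignas
 * standing in for PIPE_ALIGN_VAR(16). */
struct blend_color_aligned {
   alignas(16) float color[4];
};

/* Same shape as pipe_blend_color before the patch. */
struct blend_color_plain {
   float color[4];
};

/* Embedding the aligned struct after a small member shows the padding
 * the compiler inserts to honor the 16-byte requirement. */
struct holder_aligned {
   int tag;
   struct blend_color_aligned bc;
};
```

The member alignment propagates to the enclosing struct, so the compiler places static and stack instances on 16-byte boundaries. Heap allocations are only as aligned as the allocator guarantees, which is worth keeping in mind for code that allocates these structs by hand.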
Re: [Mesa-dev] [Intel-gfx] [PATCH 2/2] i965: Removing PCI IDs that are no longer listed as Kabylake.
On Thu, 2016-06-23 at 14:50 -0700, Rodrigo Vivi wrote:
> This is unusual. Usually IDs listed on early stages of platform
> definition are kept there as reserved for later use.
>
> However these IDs here are not listed anymore in any of steppings
> and devices IDs tables for Kabylake on configurations overview
> section of BSpec.
>
> So it is better removing them before they become used in any
> other future platform.
>
> Signed-off-by: Rodrigo Vivi
> ---
> include/pci_ids/i965_pci_ids.h | 5 -
> 1 file changed, 5 deletions(-)
>
> diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
> index 7a7897f..1566afd 100644
> --- a/include/pci_ids/i965_pci_ids.h
> +++ b/include/pci_ids/i965_pci_ids.h
> @@ -153,12 +153,7 @@ CHIPSET(0x5921, kbl_gt2, "Intel(R) Kabylake GT2F")
> CHIPSET(0x5923, kbl_gt3, "Intel(R) Kabylake GT3")
> CHIPSET(0x5926, kbl_gt3, "Intel(R) Kabylake GT3")
> CHIPSET(0x5927, kbl_gt3, "Intel(R) Kabylake GT3")
> -CHIPSET(0x592A, kbl_gt3, "Intel(R) Kabylake GT3")
> -CHIPSET(0x592B, kbl_gt3, "Intel(R) Kabylake GT3")
> -CHIPSET(0x5932, kbl_gt4, "Intel(R) Kabylake GT4")
> -CHIPSET(0x593A, kbl_gt4, "Intel(R) Kabylake GT4")
> CHIPSET(0x593B, kbl_gt4, "Intel(R) Kabylake GT4")
> -CHIPSET(0x593D, kbl_gt4, "Intel(R) Kabylake GT4")
> CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherrytrail)")
> CHIPSET(0x22B1, chv, "Intel(R) HD Graphics XXX (Braswell)") /*
> Overridden in brw_get_renderer_string */
> CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")

Verified against the spec, lgtm.

Reviewed-by: Dhinakaran Pandiyan
Re: [Mesa-dev] [Intel-gfx] [PATCH 1/2] i965: Add more Kabylake PCI IDs.
On Thu, 2016-06-23 at 14:50 -0700, Rodrigo Vivi wrote:
> The spec has been updated adding new PCI IDs.
>
> Signed-off-by: Rodrigo Vivi
> ---
> include/pci_ids/i965_pci_ids.h | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
> index fce00da..7a7897f 100644
> --- a/include/pci_ids/i965_pci_ids.h
> +++ b/include/pci_ids/i965_pci_ids.h
> @@ -137,6 +137,7 @@ CHIPSET(0x193D, skl_gt4, "Intel(R) Iris Pro Graphics P580
> (Skylake GT4e)")
> CHIPSET(0x5902, kbl_gt1, "Intel(R) Kabylake GT1")
> CHIPSET(0x5906, kbl_gt1, "Intel(R) Kabylake GT1")
> CHIPSET(0x590A, kbl_gt1, "Intel(R) Kabylake GT1")
> +CHIPSET(0x5908, kbl_gt1, "Intel(R) Kabylake GT1")
> CHIPSET(0x590B, kbl_gt1, "Intel(R) Kabylake GT1")
> CHIPSET(0x590E, kbl_gt1, "Intel(R) Kabylake GT1")
> CHIPSET(0x5913, kbl_gt1_5, "Intel(R) Kabylake GT1.5")
> @@ -149,7 +150,9 @@ CHIPSET(0x591B, kbl_gt2, "Intel(R) Kabylake GT2")
> CHIPSET(0x591D, kbl_gt2, "Intel(R) Kabylake GT2")
> CHIPSET(0x591E, kbl_gt2, "Intel(R) Kabylake GT2")
> CHIPSET(0x5921, kbl_gt2, "Intel(R) Kabylake GT2F")
> +CHIPSET(0x5923, kbl_gt3, "Intel(R) Kabylake GT3")
> CHIPSET(0x5926, kbl_gt3, "Intel(R) Kabylake GT3")
> +CHIPSET(0x5927, kbl_gt3, "Intel(R) Kabylake GT3")
> CHIPSET(0x592A, kbl_gt3, "Intel(R) Kabylake GT3")
> CHIPSET(0x592B, kbl_gt3, "Intel(R) Kabylake GT3")
> CHIPSET(0x5932, kbl_gt4, "Intel(R) Kabylake GT4")

Verified against the spec. lgtm.

Reviewed-by: Dhinakaran Pandiyan
[Mesa-dev] [PATCH 2/2 v2] mesa/st: Silence unused variable warning
v2: Use MAYBE_UNUSED Changed commit tag (Suggested by Ian Romanick) Signed-off-by: Gurkirpal Singh --- src/mesa/state_tracker/st_glsl_to_nir.cpp | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp b/src/mesa/state_tracker/st_glsl_to_nir.cpp index a880564..2cdb7b6 100644 --- a/src/mesa/state_tracker/st_glsl_to_nir.cpp +++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp @@ -46,6 +46,8 @@ #include "compiler/glsl/glsl_to_nir.h" #include "compiler/glsl/ir.h" +#include "util/macros.h" + /* Depending on PIPE_CAP_TGSI_TEXCOORD (st->needs_texcoord_semantic) we * may need to fix up varying slots so the glsl->nir path is aligned @@ -169,7 +171,7 @@ st_nir_assign_uniform_locations(struct gl_program *prog, if (uniform->type->is_sampler()) { unsigned val; - bool found = shader_program->UniformHash->get(val, uniform->name); + MAYBE_UNUSED bool found = shader_program->UniformHash->get(val, uniform->name); loc = shaderidx++; assert(found); /* this ensure that nir_lower_samplers looks at the correct -- 2.7.4
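For context on why the warning shows up at all: in release builds NDEBUG turns assert() into nothing, so a variable that is only read inside an assert becomes unused. A minimal standalone sketch of the pattern follows. The macro below is a local stand-in written for GCC/Clang only, not the definition from util/macros.h, and lookup() and get_location() are made-up names.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for Mesa's MAYBE_UNUSED; this definition is an
 * assumption that only covers GCC-compatible compilers. */
#if defined(__GNUC__)
#define SKETCH_MAYBE_UNUSED __attribute__((unused))
#else
#define SKETCH_MAYBE_UNUSED
#endif

/* Hypothetical lookup that reports success through its return value. */
static bool
lookup(int key, unsigned *val)
{
   if (key == 42) {
      *val = 7;
      return true;
   }
   return false;
}

static unsigned
get_location(int key)
{
   unsigned val = 0;

   /* 'found' is only read by assert().  With NDEBUG defined the assert
    * expands to nothing, so without the attribute the compiler warns
    * about an unused variable. */
   SKETCH_MAYBE_UNUSED bool found = lookup(key, &val);
   assert(found);

   return val;
}
```

The call itself still runs in both build types; only the check on its result disappears with NDEBUG, which is why the variable is annotated rather than removed.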
[Mesa-dev] [PATCH 1/2 v2] gallium: Silence unused variable warnings
v2: Use MAYBE_UNUSED as suggested by Ian Romanick Signed-off-by: Gurkirpal Singh--- src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 4 +++- src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 4 +++- src/gallium/drivers/nouveau/nv50/nv98_video.c | 4 +++- src/gallium/drivers/nouveau/nvc0/nvc0_video.c | 8 +--- src/gallium/drivers/softpipe/sp_state_shader.c| 3 ++- src/gallium/state_trackers/xvmc/surface.c | 5 +++-- src/gallium/state_trackers/xvmc/tests/xvmc_bench.c| 8 +--- 7 files changed, 24 insertions(+), 12 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp index 0fe399b..9fc7c5a 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp @@ -23,6 +23,8 @@ #include "codegen/nv50_ir.h" #include "codegen/nv50_ir_target_nv50.h" +#include "util/macros.h" + namespace nv50_ir { #define NV50_OP_ENC_LONG 0 @@ -621,7 +623,7 @@ void CodeEmitterNV50::emitLOAD(const Instruction *i) { DataFile sf = i->src(0).getFile(); - int32_t offset = i->getSrc(0)->reg.data.offset; + MAYBE_UNUSED int32_t offset = i->getSrc(0)->reg.data.offset; switch (sf) { case FILE_SHADER_INPUT: diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 3213188..6a60a7b 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -24,6 +24,8 @@ #include "codegen/nv50_ir_target.h" #include "codegen/nv50_ir_build_util.h" +#include "util/macros.h" + extern "C" { #include "util/u_math.h" } @@ -2963,7 +2965,7 @@ NV50PostRaConstantFolding::visit(BasicBlock *bb) i->setSrc(1, def->getSrc(0)); } else { ImmediateValue val; - bool ret = def->src(0).getImmediate(val); + MAYBE_UNUSED bool ret = def->src(0).getImmediate(val); assert(ret); if (i->getSrc(1)->reg.data.id & 1) val.reg.data.u32 
>>= 16; diff --git a/src/gallium/drivers/nouveau/nv50/nv98_video.c b/src/gallium/drivers/nouveau/nv50/nv98_video.c index 177a7e0..d348807 100644 --- a/src/gallium/drivers/nouveau/nv50/nv98_video.c +++ b/src/gallium/drivers/nouveau/nv50/nv98_video.c @@ -24,6 +24,7 @@ #include "util/u_sampler.h" #include "util/u_format.h" +#include "util/macros.h" #include @@ -40,7 +41,8 @@ nv98_decoder_decode_bitstream(struct pipe_video_codec *decoder, uint32_t comm_seq = ++dec->fence_seq; union pipe_desc desc; - unsigned vp_caps, is_ref, ret; + unsigned vp_caps, is_ref; + MAYBE_UNUSED unsigned ret; struct nouveau_vp3_video_buffer *refs[16] = {}; desc.base = picture; diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_video.c b/src/gallium/drivers/nouveau/nvc0/nvc0_video.c index a9fd1d2..10cb31e 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_video.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_video.c @@ -24,6 +24,7 @@ #include "util/u_sampler.h" #include "util/u_format.h" +#include "util/macros.h" static void nvc0_decoder_begin_frame(struct pipe_video_codec *decoder, @@ -32,7 +33,7 @@ nvc0_decoder_begin_frame(struct pipe_video_codec *decoder, { struct nouveau_vp3_decoder *dec = (struct nouveau_vp3_decoder *)decoder; uint32_t comm_seq = ++dec->fence_seq; - unsigned ret = 0; + MAYBE_UNUSED unsigned ret = 0; assert(dec); assert(target); @@ -53,7 +54,7 @@ nvc0_decoder_decode_bitstream(struct pipe_video_codec *decoder, { struct nouveau_vp3_decoder *dec = (struct nouveau_vp3_decoder *)decoder; uint32_t comm_seq = dec->fence_seq; - unsigned ret = 0; + MAYBE_UNUSED unsigned ret = 0; assert(decoder); @@ -72,7 +73,8 @@ nvc0_decoder_end_frame(struct pipe_video_codec *decoder, uint32_t comm_seq = dec->fence_seq; union pipe_desc desc; - unsigned vp_caps, is_ref, ret; + unsigned vp_caps, is_ref; + MAYBE_UNUSED unsigned ret; struct nouveau_vp3_video_buffer *refs[16] = {}; desc.base = picture; diff --git a/src/gallium/drivers/softpipe/sp_state_shader.c 
b/src/gallium/drivers/softpipe/sp_state_shader.c index a745662..d02727f 100644 --- a/src/gallium/drivers/softpipe/sp_state_shader.c +++ b/src/gallium/drivers/softpipe/sp_state_shader.c @@ -34,6 +34,7 @@ #include "util/u_memory.h" #include "util/u_inlines.h" #include "util/u_pstipple.h" +#include "util/macros.h" #include "draw/draw_context.h" #include "draw/draw_vs.h" #include "draw/draw_gs.h" @@ -420,7 +421,7 @@ static void softpipe_delete_compute_state(struct pipe_context *pipe, void *cs) { - struct softpipe_context *softpipe = softpipe_context(pipe); + MAYBE_UNUSED struct softpipe_context *softpipe =
Re: [Mesa-dev] [PATCH] gallium: Force blend color to 16-byte alignment
On Tue, Jun 28, 2016 at 1:45 PM, Chuck Atkins wrote:
> This aligns the 4-element color float array to 16 byte boundaries. This
> should allow compiler vectorizers to generate better optimizations.
> Also fixes broken vectorization generated by Intel compiler.
>
> Reported-by: Tim Rowley
> Signed-off-by: Chuck Atkins
> ---
> src/gallium/include/pipe/p_state.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/include/pipe/p_state.h
> b/src/gallium/include/pipe/p_state.h
> index 1543e90..95f140f 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -326,7 +326,7 @@ struct pipe_blend_state
>
> struct pipe_blend_color
> {
> - float color[4];
> + PIPE_ALIGN_VAR(16) float color[4];

Looks like you lost a space of indentation. Whoever commits, please fix.
Re: [Mesa-dev] [PATCH] i965: adds gen7_emit_cs_stall_flush on intel_texture_barrier
Alejandro Piñeiro writes:

> Fixes:
> GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass
>
> On Haswell, Broadwell and Skylake (note that in order to execute
> that test, it is needed to override GL and GLSL versions).
>
> I was not able to find a documentation reference that justifies it.
> ---
>
> Having said, I didn't find a documentation reference explicitly
> mention that this is needed.
>
> Initially I thought that a flag was missing when calling
> emit_pipe_control_flush at brw_emit_mi_flush, but it was not the case
> as far as I saw. Then I noted that there is a gen6 workaround on that
> code:
>
>    if (brw->gen == 6) {
>       /* Hardware workaround: SNB B-Spec says:
>        *
>        * [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache
>        * Flush Enable =1, a PIPE_CONTROL with any non-zero
>        * post-sync-op is required.
>        */
>       brw_emit_post_sync_nonzero_flush(brw);
>    }
>
> I tested calling that method for any gen, guessing if the workaround
> was needed also for other gens, and the test got fixed. But looking at
> the documentation of other gens, I didn't find the need for this
> workaround. For that reason I moved to use gen7_emit_cs_stall, that is
> less aggressive and get the test fixed too. It seems that in order to
> get a complete flush you need a cs stall flush with a
> pipe_control_write. But again, I didn't find any reference at the PRMs
> confirming it.
>
> Intuitively, this would be needed on brw_emit_mi_flush or even at
> brw_emit_pipe_control_flush (this one already include some
> gen-specific workarounds), but I preferred to keep it on the only place
> that seems to need it for now.
>
> In addition to solve that CTS test, it also gets it passing for the
> test I recently sent to the piglit list, and not included on master
> yet (acked for now):
> https://lists.freedesktop.org/archives/piglit/2016-June/020055.html
>
> That piglit patch adds 48 parameter combination for the basic
> test. Without this mesa patch 5-6 subtests fails. With this patch all
> of them passes. Tested on Haswell, Broadwell and Skylake too.
>

I believe this test is hitting the same hardware race condition that
most callers of brw_emit_mi_flush() suffer from: The problem with
brw_emit_mi_flush() is that, even though it is supposed to both
invalidate R/O caches (e.g. the sampler caches) and flush R/W caches
(e.g. the render cache), the former happens at the top of the pipeline
(i.e. as soon as the CS processor parses the PIPE_CONTROL command,
irrespective of whether a concurrent rendering workload could pollute a
R/O cache again in parallel), while the latter happens at the bottom of
the pipeline (i.e. after any concurrent rendering completes). The
gen7_emit_cs_stall_flush() call you have introduced seems to fix the
issue because it forces additional serialization with respect to
previous rendering commands before the R/O caches are invalidated, which
is a clear indication that you're hitting the same bug.

The right way to fix it would be to remove the brw_emit_mi_flush() call
for Gen6+ at least (brw_emit_mi_flush() is BTW a pretty big hammer and
causes a bunch of other caches to be flushed which aren't necessarily
relevant to texture barrier), and instead call
brw_emit_pipe_control_flush() twice: The first PIPE_CONTROL command
should have at least RENDER_TARGET_FLUSH and CS_STALL set to initiate a
render cache flush after any concurrent rendering completes and cause
the CS to stop parsing commands until the render cache becomes coherent
with memory (the DEPTH_CACHE_FLUSH bit may also be necessary for some
workloads using depth texturing). The second PIPE_CONTROL should have
TEXTURE_CACHE_INVALIDATE set (and no CS stall) to clean up any stale
data from the sampler caches before rendering continues.

See 0aa4f99f562a05880a779707cbcd46be459863bf for how I addressed the
same problem in the L3 cache partitioning code (where I noticed the
problem originally), or 72473658c51d5e074ce219c1e6385a4cce29f467 for how
Ken fixed the same issue in the draw-time surface validation path.
Incidentally I had written some code just a couple of days ago to
address the same issue in the implementation of glMemoryBarrier (I'll
send it for review soon-ish). There are likely many more instances of
this race condition in the driver, most callers of brw_emit_mi_flush are
suspect...

>  src/mesa/drivers/dri/i965/intel_tex.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_tex.c
> b/src/mesa/drivers/dri/i965/intel_tex.c
> index cac33ac..e7459cd 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex.c
> @@ -362,6 +362,7 @@ intel_texture_barrier(struct gl_context *ctx)
>  {
>     struct brw_context *brw = brw_context(ctx);
>
> +   gen7_emit_cs_stall_flush(brw);
>     brw_emit_mi_flush(brw);
>  }
>
> --
> 2.7.4
>
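As a rough illustration of the two-PIPE_CONTROL ordering suggested in the reply above, here is a minimal sketch in plain C. It is not the i965 code: struct mock_brw and mock_emit_pipe_control_flush() are hypothetical stand-ins for the driver context and brw_emit_pipe_control_flush(), and the flag values are arbitrary bits chosen for the example, not the hardware encoding. It only shows which flags go into which command and in what order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits only; these are NOT the real hardware values. */
#define PIPE_CONTROL_RENDER_TARGET_FLUSH      (1u << 0)
#define PIPE_CONTROL_DEPTH_CACHE_FLUSH        (1u << 1)
#define PIPE_CONTROL_CS_STALL                 (1u << 2)
#define PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE (1u << 3)

/* Hypothetical mock context that records the emitted PIPE_CONTROL flags. */
struct mock_brw {
   uint32_t flushes[4];
   size_t count;
};

/* Hypothetical stand-in for brw_emit_pipe_control_flush(). */
static void
mock_emit_pipe_control_flush(struct mock_brw *brw, uint32_t flags)
{
   assert(brw->count < 4);
   brw->flushes[brw->count++] = flags;
}

/* Sketch of a texture barrier following the ordering described above. */
static void
sketch_texture_barrier(struct mock_brw *brw)
{
   /* 1. Flush the render cache at the bottom of the pipeline and stall
    *    the command streamer until that flush has completed.
    *    (DEPTH_CACHE_FLUSH might be added here for depth texturing.)
    */
   mock_emit_pipe_control_flush(brw,
                                PIPE_CONTROL_RENDER_TARGET_FLUSH |
                                PIPE_CONTROL_CS_STALL);

   /* 2. Only now invalidate the sampler caches, with no CS stall. */
   mock_emit_pipe_control_flush(brw,
                                PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
}
```

The point of splitting the work into two commands is that the invalidation in step 2 cannot be parsed before the stall in step 1 has drained earlier rendering.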
Re: [Mesa-dev] [PATCH 1/2] intel: Add more Kabylake PCI IDs.
On Mon, 2016-06-27 at 17:10 -0700, Rodrigo Vivi wrote:
> The spec has been updated adding new PCI IDs.
>
> v2: Avoid using "H" instead of HALO to keep names uniform - DK.
>
> Cc: Dhinakaran Pandiyan
> Signed-off-by: Rodrigo Vivi
> ---
>  intel/intel_chipset.h | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/intel/intel_chipset.h b/intel/intel_chipset.h
> index e2554c3..6b8d4e9 100644
> --- a/intel/intel_chipset.h
> +++ b/intel/intel_chipset.h
> @@ -194,7 +194,9 @@
>  #define PCI_CHIP_KABYLAKE_ULT_GT2       0x5916
>  #define PCI_CHIP_KABYLAKE_ULT_GT1_5     0x5913
>  #define PCI_CHIP_KABYLAKE_ULT_GT1       0x5906
> -#define PCI_CHIP_KABYLAKE_ULT_GT3       0x5926
> +#define PCI_CHIP_KABYLAKE_ULT_GT3_0     0x5923
> +#define PCI_CHIP_KABYLAKE_ULT_GT3_1     0x5926
> +#define PCI_CHIP_KABYLAKE_ULT_GT3_2     0x5927
>  #define PCI_CHIP_KABYLAKE_ULT_GT2F      0x5921
>  #define PCI_CHIP_KABYLAKE_ULX_GT1_5     0x5915
>  #define PCI_CHIP_KABYLAKE_ULX_GT1       0x590E
> @@ -206,7 +208,8 @@
>  #define PCI_CHIP_KABYLAKE_HALO_GT2      0x591B
>  #define PCI_CHIP_KABYLAKE_HALO_GT4      0x593B
>  #define PCI_CHIP_KABYLAKE_HALO_GT3      0x592B
> -#define PCI_CHIP_KABYLAKE_HALO_GT1      0x590B
> +#define PCI_CHIP_KABYLAKE_HALO_GT1_0    0x5908
> +#define PCI_CHIP_KABYLAKE_HALO_GT1_1    0x590B
>  #define PCI_CHIP_KABYLAKE_SRV_GT2       0x591A
>  #define PCI_CHIP_KABYLAKE_SRV_GT3       0x592A
>  #define PCI_CHIP_KABYLAKE_SRV_GT1       0x590A
> @@ -414,7 +417,8 @@
>                            (devid) == PCI_CHIP_KABYLAKE_ULT_GT1 || \
>                            (devid) == PCI_CHIP_KABYLAKE_ULX_GT1 || \
>                            (devid) == PCI_CHIP_KABYLAKE_DT_GT1 || \
> -                          (devid) == PCI_CHIP_KABYLAKE_HALO_GT1 || \
> +                          (devid) == PCI_CHIP_KABYLAKE_HALO_GT1_0 || \
> +                          (devid) == PCI_CHIP_KABYLAKE_HALO_GT1_1 || \
>                            (devid) == PCI_CHIP_KABYLAKE_SRV_GT1)
>
>  #define IS_KBL_GT2(devid) ((devid) == PCI_CHIP_KABYLAKE_ULT_GT2 || \
> @@ -425,7 +429,9 @@
>                            (devid) == PCI_CHIP_KABYLAKE_SRV_GT2 || \
>                            (devid) == PCI_CHIP_KABYLAKE_WKS_GT2)
>
> -#define IS_KBL_GT3(devid) ((devid) == PCI_CHIP_KABYLAKE_ULT_GT3 || \
> +#define IS_KBL_GT3(devid) ((devid) == PCI_CHIP_KABYLAKE_ULT_GT3_0 || \
> +                          (devid) == PCI_CHIP_KABYLAKE_ULT_GT3_1 || \
> +                          (devid) == PCI_CHIP_KABYLAKE_ULT_GT3_2 || \
>                            (devid) == PCI_CHIP_KABYLAKE_HALO_GT3 || \
>                            (devid) == PCI_CHIP_KABYLAKE_SRV_GT3)
>

Checked against the spec, lgtm.

Reviewed-by: Dhinakaran Pandiyan
[Mesa-dev] [PATCH] mesa/main: handle gl_buffer_index correctly
---
 src/mesa/main/buffers.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/mesa/main/buffers.c b/src/mesa/main/buffers.c
index e8aedde..3ff6061 100644
--- a/src/mesa/main/buffers.c
+++ b/src/mesa/main/buffers.c
@@ -170,7 +170,7 @@ draw_buffer_enum_to_bitmask(const struct gl_context *ctx, GLenum buffer)
  * Helper routine used by glReadBuffer.
  * Given a GLenum naming a color buffer, return the index of the corresponding
  * renderbuffer (a BUFFER_* value).
- * return -1 for an invalid buffer.
+ * return ~0 for an invalid buffer.
  */
 static gl_buffer_index
 read_buffer_enum_to_index(GLenum buffer)
@@ -197,7 +197,7 @@ read_buffer_enum_to_index(GLenum buffer)
    case GL_AUX1:
    case GL_AUX2:
    case GL_AUX3:
-      return BUFFER_COUNT; /* invalid, but not -1 */
+      return BUFFER_COUNT; /* invalid, but not ~0 */
    case GL_COLOR_ATTACHMENT0_EXT:
       return BUFFER_COLOR0;
    case GL_COLOR_ATTACHMENT1_EXT:
@@ -219,7 +219,7 @@ read_buffer_enum_to_index(GLenum buffer)
       if (buffer >= GL_COLOR_ATTACHMENT8 && buffer <= GL_COLOR_ATTACHMENT31)
          return BUFFER_COUNT; /* error */
-      return -1;
+      return ~0;
    }
 }

@@ -722,11 +722,11 @@ read_buffer(struct gl_context *ctx, struct gl_framebuffer *fb,
    else {
       /* general case / window-system framebuffer */
       if (_mesa_is_gles3(ctx) && !is_legal_es3_readbuffer_enum(buffer))
-         srcBuffer = -1;
+         srcBuffer = ~0;
       else
          srcBuffer = read_buffer_enum_to_index(buffer);

-      if (srcBuffer == -1) {
+      if (srcBuffer == ~0u) {
          _mesa_error(ctx, GL_INVALID_ENUM,
                      "%s(invalid buffer %s)", caller,
                      _mesa_enum_to_string(buffer));
--
1.7.9.5
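The patch above distinguishes two kinds of "invalid": an enum that is recognised but unsupported (BUFFER_COUNT) and one that is not a buffer enum at all (the all-ones value). A minimal sketch of that idea follows, assuming a trimmed-down enum; sketch_buffer_index, SKETCH_BUFFER_INVALID and the helper names are hypothetical and only illustrate naming the sentinel once, so that callers never have to choose between spelling it -1 or ~0.

```c
#include <stdbool.h>

/* GL enum values as defined by the OpenGL headers. */
#define GL_FRONT_LEFT 0x0400u
#define GL_BACK_LEFT  0x0402u
#define GL_AUX0       0x0409u

/* Hypothetical, trimmed-down stand-in for gl_buffer_index. */
typedef enum {
   SKETCH_BUFFER_FRONT_LEFT,
   SKETCH_BUFFER_BACK_LEFT,
   SKETCH_BUFFER_COUNT
} sketch_buffer_index;

/* All-ones sentinel, defined in one place for every comparison. */
#define SKETCH_BUFFER_INVALID ((sketch_buffer_index) ~0u)

static sketch_buffer_index
sketch_read_buffer_enum_to_index(unsigned buffer)
{
   switch (buffer) {
   case GL_FRONT_LEFT:
      return SKETCH_BUFFER_FRONT_LEFT;
   case GL_BACK_LEFT:
      return SKETCH_BUFFER_BACK_LEFT;
   case GL_AUX0:
      return SKETCH_BUFFER_COUNT;   /* known enum, but unsupported */
   default:
      return SKETCH_BUFFER_INVALID; /* not a buffer enum at all */
   }
}

static bool
sketch_is_invalid_enum(sketch_buffer_index idx)
{
   return idx == SKETCH_BUFFER_INVALID;
}
```

Because both the return statement and the comparison go through the same macro, the result does not depend on whether the compiler picks a signed or unsigned underlying type for the enum.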
Re: [Mesa-dev] [PATCH 2/2] intel: Removing PCI IDs that are no longer listed as Kabylake.
On Mon, 2016-06-27 at 17:10 -0700, Rodrigo Vivi wrote:
> This is unusual. Usually IDs listed on early stages of platform
> definition are kept there as reserved for later use.
>
> However these IDs here are not listed anymore in any of steppings
> and devices IDs tables for Kabylake on configurations overview
> section of BSpec.
>
> So it is better removing them before they become used in any
> other future platform.
>
> v2: Rebase.
>
> Cc: Dhinakaran Pandiyan
> Signed-off-by: Rodrigo Vivi
> ---
>  intel/intel_chipset.h | 16 +++-
>  1 file changed, 3 insertions(+), 13 deletions(-)
>
> diff --git a/intel/intel_chipset.h b/intel/intel_chipset.h
> index 6b8d4e9..514f659 100644
> --- a/intel/intel_chipset.h
> +++ b/intel/intel_chipset.h
> @@ -204,18 +204,13 @@
>  #define PCI_CHIP_KABYLAKE_DT_GT2        0x5912
>  #define PCI_CHIP_KABYLAKE_DT_GT1_5      0x5917
>  #define PCI_CHIP_KABYLAKE_DT_GT1        0x5902
> -#define PCI_CHIP_KABYLAKE_DT_GT4        0x5932
>  #define PCI_CHIP_KABYLAKE_HALO_GT2      0x591B
>  #define PCI_CHIP_KABYLAKE_HALO_GT4      0x593B
> -#define PCI_CHIP_KABYLAKE_HALO_GT3      0x592B
>  #define PCI_CHIP_KABYLAKE_HALO_GT1_0    0x5908
>  #define PCI_CHIP_KABYLAKE_HALO_GT1_1    0x590B
>  #define PCI_CHIP_KABYLAKE_SRV_GT2       0x591A
> -#define PCI_CHIP_KABYLAKE_SRV_GT3       0x592A
>  #define PCI_CHIP_KABYLAKE_SRV_GT1       0x590A
> -#define PCI_CHIP_KABYLAKE_SRV_GT4       0x593A
>  #define PCI_CHIP_KABYLAKE_WKS_GT2       0x591D
> -#define PCI_CHIP_KABYLAKE_WKS_GT4       0x593D
>
>  #define PCI_CHIP_BROXTON_0              0x0A84
>  #define PCI_CHIP_BROXTON_1              0x1A84
> @@ -431,14 +426,9 @@
>
>  #define IS_KBL_GT3(devid) ((devid) == PCI_CHIP_KABYLAKE_ULT_GT3_0 || \
>                            (devid) == PCI_CHIP_KABYLAKE_ULT_GT3_1 || \
> -                          (devid) == PCI_CHIP_KABYLAKE_ULT_GT3_2 || \
> -                          (devid) == PCI_CHIP_KABYLAKE_HALO_GT3 || \
> -                          (devid) == PCI_CHIP_KABYLAKE_SRV_GT3)
> -
> -#define IS_KBL_GT4(devid) ((devid) == PCI_CHIP_KABYLAKE_DT_GT4 || \
> -                          (devid) == PCI_CHIP_KABYLAKE_HALO_GT4 || \
> -                          (devid) == PCI_CHIP_KABYLAKE_SRV_GT4 || \
> -                          (devid) == PCI_CHIP_KABYLAKE_WKS_GT4)
> +                          (devid) == PCI_CHIP_KABYLAKE_ULT_GT3_2)
> +
> +#define IS_KBL_GT4(devid) ((devid) == PCI_CHIP_KABYLAKE_HALO_GT4)
>
>  #define IS_KABYLAKE(devid) (IS_KBL_GT1(devid) || \
>                             IS_KBL_GT2(devid) || \

Checked against the spec, lgtm.

Reviewed-by: Dhinakaran Pandiyan
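For readers who want to see the net effect of the two Kabylake patches on the GT3 and GT4 checks, here is a small standalone sketch. The IDs and macro names are copied from the quoted diffs; the fragment is cut down to just those two macros and is not the full intel_chipset.h.

```c
/* GT3 and GT4 device IDs that remain after both patches (from the diffs). */
#define PCI_CHIP_KABYLAKE_ULT_GT3_0  0x5923
#define PCI_CHIP_KABYLAKE_ULT_GT3_1  0x5926
#define PCI_CHIP_KABYLAKE_ULT_GT3_2  0x5927
#define PCI_CHIP_KABYLAKE_HALO_GT4   0x593B

#define IS_KBL_GT3(devid) ((devid) == PCI_CHIP_KABYLAKE_ULT_GT3_0 || \
                           (devid) == PCI_CHIP_KABYLAKE_ULT_GT3_1 || \
                           (devid) == PCI_CHIP_KABYLAKE_ULT_GT3_2)

#define IS_KBL_GT4(devid) ((devid) == PCI_CHIP_KABYLAKE_HALO_GT4)
```

With these definitions, an ID that was dropped by the second patch, such as 0x592B (formerly HALO_GT3) or 0x593A (formerly SRV_GT4), no longer matches either macro.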
[Mesa-dev] [PATCH] gallium: Force blend color to 16-byte alignment
This aligns the 4-element color float array to 16 byte boundaries. This
should allow compiler vectorizers to generate better optimizations. Also
fixes broken vectorization generated by Intel compiler.

Reported-by: Tim Rowley
Signed-off-by: Chuck Atkins
---
 src/gallium/include/pipe/p_state.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/include/pipe/p_state.h b/src/gallium/include/pipe/p_state.h
index 1543e90..95f140f 100644
--- a/src/gallium/include/pipe/p_state.h
+++ b/src/gallium/include/pipe/p_state.h
@@ -326,7 +326,7 @@ struct pipe_blend_state

 struct pipe_blend_color
 {
-   float color[4];
+   PIPE_ALIGN_VAR(16) float color[4];
 };

--
2.5.5
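To make the effect of the alignment change concrete, here is a small sketch using standard C11 alignas in place of Mesa's PIPE_ALIGN_VAR macro. That substitution is an assumption made so the example is self-contained; struct sketch_blend_color and the helper are hypothetical names, not the gallium definitions.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Stand-in for struct pipe_blend_color with the member aligned to 16 bytes. */
struct sketch_blend_color {
   alignas(16) float color[4];
};

/* Aligning the member raises the alignment of the whole struct. */
static_assert(alignof(struct sketch_blend_color) == 16,
              "blend color should be 16-byte aligned");

/* Returns non-zero when the pointer sits on a 16-byte boundary. */
static int
sketch_is_16_byte_aligned(const void *p)
{
   return ((uintptr_t) p % 16) == 0;
}
```

With that guarantee a vectorizer can use aligned 128-bit loads and stores for the four floats instead of falling back to unaligned accesses.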
Re: [Mesa-dev] [PATCH 1/3] i965: Refactor intel_get_param()
When I first saw spriv in the last hunk, my brain parsed it as spirv.
That was confusing. :)

Patches 1 and 3 are

Reviewed-by: Ian Romanick

On 06/28/2016 10:07 AM, Chad Versace wrote:
> Replace the function's __DRIscreen parameter with struct intel_screen.
> The callsites feel more natural that way.
> ---
>  src/mesa/drivers/dri/i965/intel_screen.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_screen.c
> b/src/mesa/drivers/dri/i965/intel_screen.c
> index 869119b..b693c45 100644
> --- a/src/mesa/drivers/dri/i965/intel_screen.c
> +++ b/src/mesa/drivers/dri/i965/intel_screen.c
> @@ -970,7 +970,7 @@ static const __DRIextension
> *intelRobustScreenExtensions[] = {
>  };
>
>  static int
> -intel_get_param(__DRIscreen *psp, int param, int *value)
> +intel_get_param(struct intel_screen *screen, int param, int *value)
>  {
>     int ret;
>     struct drm_i915_getparam gp;
> @@ -979,7 +979,8 @@ intel_get_param(__DRIscreen *psp, int param, int *value)
>     gp.param = param;
>     gp.value = value;
>
> -   ret = drmCommandWriteRead(psp->fd, DRM_I915_GETPARAM, &gp, sizeof(gp));
> +   ret = drmCommandWriteRead(screen->driScrnPriv->fd,
> +                             DRM_I915_GETPARAM, &gp, sizeof(gp));
>     if (ret < 0 && ret != -EINVAL)
>        _mesa_warning(NULL, "drm_i915_getparam: %d", ret);
>
> @@ -987,10 +988,10 @@ intel_get_param(__DRIscreen *psp, int param, int *value)
>  }
>
>  static bool
> -intel_get_boolean(__DRIscreen *psp, int param)
> +intel_get_boolean(struct intel_screen *screen, int param)
>  {
>     int value = 0;
> -   return (intel_get_param(psp, param, &value) == 0) && value;
> +   return (intel_get_param(screen, param, &value) == 0) && value;
>  }
>
>  static void
> @@ -1125,12 +1126,12 @@ intel_detect_sseu(struct intel_screen *intelScreen)
>     intelScreen->subslice_total = -1;
>     intelScreen->eu_total = -1;
>
> -   ret = intel_get_param(intelScreen->driScrnPriv, I915_PARAM_SUBSLICE_TOTAL,
> +   ret = intel_get_param(intelScreen, I915_PARAM_SUBSLICE_TOTAL,
>                           &intelScreen->subslice_total);
>     if (ret < 0 && ret != -EINVAL)
>        goto err_out;
>
> -   ret = intel_get_param(intelScreen->driScrnPriv,
> +   ret = intel_get_param(intelScreen,
>                           I915_PARAM_EU_TOTAL, &intelScreen->eu_total);
>     if (ret < 0 && ret != -EINVAL)
>        goto err_out;
> @@ -1167,7 +1168,7 @@ intel_init_bufmgr(struct intel_screen *intelScreen)
>
>     drm_intel_bufmgr_gem_enable_fenced_relocs(intelScreen->bufmgr);
>
> -   if (!intel_get_boolean(spriv, I915_PARAM_HAS_RELAXED_DELTA)) {
> +   if (!intel_get_boolean(intelScreen, I915_PARAM_HAS_RELAXED_DELTA)) {
>        fprintf(stderr, "[%s: %u] Kernel 2.6.39 required.\n", __func__,
> __LINE__);
>        return false;
>     }
>
Re: [Mesa-dev] [PATCH 1/2] glsl: add driconf to zero-init unintialized vars
On Tue, Jun 28, 2016 at 11:28 AM, Marek Olšákwrote: > On Mon, Jun 27, 2016 at 9:28 PM, Rob Clark wrote: >> On Mon, Jun 27, 2016 at 3:06 PM, Kenneth Graunke >> wrote: >>> On Monday, June 27, 2016 11:43:28 AM PDT Matt Turner wrote: On Mon, Jun 27, 2016 at 4:44 AM, Rob Clark wrote: > On Mon, Jun 27, 2016 at 7:13 AM, Alan Swanson > wrote: >> On 2016-06-25 13:37, Rob Clark wrote: >>> >>> Some games are sloppy.. perhaps because it is defined behavior for DX >>> or >>> perhaps because nv blob driver defaults things to zero. >>> >>> So add driconf param to force uninitialized variables to default to >>> zero. >>> >>> This issue was observed with rust, from steam store. But has surfaced >>> elsewhere in the past. >>> >>> Signed-off-by: Rob Clark >>> --- >>> Note that I left out the drirc bit, since not entirely sure how to >>> identify this game. (I don't actually have the game, just working off >>> of an apitrace) >>> >>> Possibly worth mentioning that for the shaders using uninitialized vars >>> having zero-initializers lets constant-propagation get rid of a whole >>> lot of instructions. One shader I saw dropped to less than half of >>> it's original instruction count. >> >> >> If the default for uninitialised variables is undefined, then with the >> reported shader optimisations why bother with the (DRI) option when >> zeroing could still essentially be classed as undefined? >> >> Cuts the patch down to just the src/compiler/glsl/ast_to_hir.cpp change. > > I did suggest that on #dri-devel, but Jason had a theoretical example > where it would hurt.. iirc something like: > > float maybe_undef; > for (int i = 0; i < some_uniform_at_least_one; i++) > maybe_undef = ... > > also, he didn't want to hide shader bugs that app should fix. > > It would be interesting to rush shaderdb w/ glsl_zero_init=true and > see what happens, but I didn't get around to that yet. Here's what I get on i965. It's not a clear win. 
total instructions in shared programs: 5249030 -> 5249002 (-0.00%) instructions in affected programs: 28936 -> 28908 (-0.10%) helped: 66 HURT: 132 total cycles in shared programs: 57966694 -> 57956306 (-0.02%) cycles in affected programs: 1136118 -> 1125730 (-0.91%) helped: 78 HURT: 106 >>> >>> I suspect most of the help is because we're missing undef optimizations, >>> such as CSE...while zero could be CSE'd. (I have a patch, but it hurts >>> things too...) >> >> right, I was thinking that treating undef as zero in constant-folding >> would have the same effect.. ofc it might make shader bugs less >> obvious. >> >> Btw, does anyone know what fglrx does? Afaiu nv blob treats undef as >> zero. If fglrx does the same, I suppose that strengthens the argument >> for "just do this unconditionally". > > No idea what fglrx does, but LLVM does eliminate code with undefined > inputs. Initializing everything to 0 might make that worse. hmm, treating as zero does eliminate a lot.. anyway, I guess we'll stick w/ driconf. fwiw, with some help from the reporter, we figured out that this is the bit that I need to squash into drirc: now, if I could talk somebody into a r-b for this and the i965 fix? ;-) BR, -R ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] mesa: Silence unused variable warning
On 06/28/2016 01:01 PM, Gurkirpal Singh wrote:
> Signed-off-by: Gurkirpal Singh
> ---
>  src/mesa/state_tracker/st_glsl_to_nir.cpp | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp
> b/src/mesa/state_tracker/st_glsl_to_nir.cpp
> index a880564..a914c8d 100644
> --- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
> @@ -172,6 +172,7 @@ st_nir_assign_uniform_locations(struct gl_program *prog,
>           bool found = shader_program->UniformHash->get(val, uniform->name);

There have been some similar patches recently that do

   MAYBE_UNUSED bool found = ...;

Also, the tag should be "mesa/st".

>           loc = shaderidx++;
>           assert(found);
> +         (void) found;
>           /* this ensure that nir_lower_samplers looks at the correct
>            * shader_program->UniformStorage[location]:
>            */
>
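For context on the MAYBE_UNUSED suggestion above: when a variable is only read inside assert(), a release build (NDEBUG defined) compiles the assert away and the compiler warns that the variable is set but unused. The sketch below shows the attribute-based alternative to the "(void) found;" cast. SKETCH_MAYBE_UNUSED is a local stand-in written for this example, assuming GCC/Clang attribute syntax; it is not Mesa's actual macro definition, and the lookup helper is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-in for a MAYBE_UNUSED macro (GCC/Clang syntax assumed). */
#if defined(__GNUC__)
#define SKETCH_MAYBE_UNUSED __attribute__((unused))
#else
#define SKETCH_MAYBE_UNUSED
#endif

/* Hypothetical lookup: knows a single uniform called "color". */
static bool
sketch_lookup(const char *name, unsigned *val)
{
   if (strcmp(name, "color") == 0) {
      *val = 7;
      return true;
   }
   return false;
}

static unsigned
sketch_assign_location(const char *name)
{
   unsigned val = 0;

   /* 'found' is only consumed by assert(); the attribute keeps the
    * compiler quiet when NDEBUG removes that use.
    */
   SKETCH_MAYBE_UNUSED bool found = sketch_lookup(name, &val);
   assert(found);

   return val;
}
```

Annotating the declaration keeps the intent next to the variable, whereas the cast has to be repeated after every assert that is the only reader.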
[Mesa-dev] [PATCH 2/2] mesa: Silence unused variable warning
Signed-off-by: Gurkirpal Singh
---
 src/mesa/state_tracker/st_glsl_to_nir.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp b/src/mesa/state_tracker/st_glsl_to_nir.cpp
index a880564..a914c8d 100644
--- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
@@ -172,6 +172,7 @@ st_nir_assign_uniform_locations(struct gl_program *prog,
          bool found = shader_program->UniformHash->get(val, uniform->name);
          loc = shaderidx++;
          assert(found);
+         (void) found;
          /* this ensure that nir_lower_samplers looks at the correct
           * shader_program->UniformStorage[location]:
           */
--
2.7.4
[Mesa-dev] [PATCH 1/2] gallium: Silence unused variable warnings
Signed-off-by: Gurkirpal Singh
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 2 ++
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp  | 1 +
 src/gallium/drivers/nouveau/nv50/nv98_video.c             | 1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_video.c             | 3 +++
 src/gallium/drivers/softpipe/sp_state_shader.c            | 1 +
 src/gallium/state_trackers/xvmc/surface.c                 | 2 ++
 src/gallium/state_trackers/xvmc/tests/xvmc_bench.c        | 4
 7 files changed, 14 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
index 0fe399b..d5479a7 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
@@ -637,6 +637,7 @@ CodeEmitterNV50::emitLOAD(const Instruction *i)
    case FILE_MEMORY_SHARED:
       if (targ->getChipset() >= 0x84) {
          assert(offset <= (int32_t)(0x3fff * typeSizeof(i->sType)));
+         (void) offset;
          code[0] = 0x1001;
          code[1] = 0x4000;

@@ -646,6 +647,7 @@ CodeEmitterNV50::emitLOAD(const Instruction *i)
          emitLoadStoreSizeCS(i->sType);
       } else {
          assert(offset <= (int32_t)(0x1f * typeSizeof(i->sType)));
+         (void) offset;
          code[0] = 0x1001;
          code[1] = 0x0020 | (i->lanes << 14);
          emitLoadStoreSizeCS(i->sType);
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 3213188..e92cfea 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -2965,6 +2965,7 @@ NV50PostRaConstantFolding::visit(BasicBlock *bb)
             ImmediateValue val;
             bool ret = def->src(0).getImmediate(val);
             assert(ret);
+            (void) ret;
             if (i->getSrc(1)->reg.data.id & 1)
                val.reg.data.u32 >>= 16;
             val.reg.data.u32 &= 0x;
diff --git a/src/gallium/drivers/nouveau/nv50/nv98_video.c b/src/gallium/drivers/nouveau/nv50/nv98_video.c
index 177a7e0..ce86399 100644
--- a/src/gallium/drivers/nouveau/nv50/nv98_video.c
+++ b/src/gallium/drivers/nouveau/nv50/nv98_video.c
@@ -53,6 +53,7 @@ nv98_decoder_decode_bitstream(struct pipe_video_codec *decoder,

    /* did we decode bitstream correctly? */
    assert(ret == 2);
+   (void) ret;

    nv98_decoder_vp(dec, desc, target, comm_seq, vp_caps, is_ref, refs);
    nv98_decoder_ppp(dec, desc, target, comm_seq);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_video.c b/src/gallium/drivers/nouveau/nvc0/nvc0_video.c
index a9fd1d2..d83f2a9 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_video.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_video.c
@@ -41,6 +41,7 @@ nvc0_decoder_begin_frame(struct pipe_video_codec *decoder,
    ret = nvc0_decoder_bsp_begin(dec, comm_seq);

    assert(ret == 2);
+   (void) ret;
 }

 static void
@@ -60,6 +61,7 @@ nvc0_decoder_decode_bitstream(struct pipe_video_codec *decoder,
    ret = nvc0_decoder_bsp_next(dec, comm_seq, num_buffers, data, num_bytes);

    assert(ret == 2);
+   (void) ret;
 }

 static void
@@ -81,6 +83,7 @@ nvc0_decoder_end_frame(struct pipe_video_codec *decoder,

    /* did we decode bitstream correctly? */
    assert(ret == 2);
+   (void) ret;

    nvc0_decoder_vp(dec, desc, target, comm_seq, vp_caps, is_ref, refs);
    nvc0_decoder_ppp(dec, desc, target, comm_seq);
diff --git a/src/gallium/drivers/softpipe/sp_state_shader.c b/src/gallium/drivers/softpipe/sp_state_shader.c
index a745662..d3abd9d 100644
--- a/src/gallium/drivers/softpipe/sp_state_shader.c
+++ b/src/gallium/drivers/softpipe/sp_state_shader.c
@@ -424,6 +424,7 @@ softpipe_delete_compute_state(struct pipe_context *pipe,
    struct sp_compute_shader *state = (struct sp_compute_shader *)cs;

    assert(softpipe->cs != state);
+   (void) softpipe;
    tgsi_free_tokens(state->tokens);
    FREE(state);
 }
diff --git a/src/gallium/state_trackers/xvmc/surface.c b/src/gallium/state_trackers/xvmc/surface.c
index 199712b..8e9e079 100644
--- a/src/gallium/state_trackers/xvmc/surface.c
+++ b/src/gallium/state_trackers/xvmc/surface.c
@@ -270,6 +270,8 @@ Status XvMCRenderSurface(Display *dpy, XvMCContext *context, unsigned int pictur
    assert(target_surface_priv->context == context);
    assert(!past_surface || past_surface_priv->context == context);
    assert(!future_surface || future_surface_priv->context == context);
+   (void) past_surface_priv;
+   (void) future_surface_priv;

    // call end frame on all referenced frames
    if (past_surface)
diff --git a/src/gallium/state_trackers/xvmc/tests/xvmc_bench.c b/src/gallium/state_trackers/xvmc/tests/xvmc_bench.c
index 4dc95ba..ec7ecc8 100644
--- a/src/gallium/state_trackers/xvmc/tests/xvmc_bench.c
+++
Re: [Mesa-dev] [PATCH 2/3] i965: Use drmIoctl for DRM_I915_GETPARAM
On Tue, Jun 28, 2016 at 10:07:10AM -0700, Chad Versace wrote:
> Stop using drmCommandWriteRead for such a simple ioctl.
> ---
>  src/mesa/drivers/dri/i965/intel_screen.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_screen.c
> b/src/mesa/drivers/dri/i965/intel_screen.c
> index b693c45..f7f806e 100644
> --- a/src/mesa/drivers/dri/i965/intel_screen.c
> +++ b/src/mesa/drivers/dri/i965/intel_screen.c
> @@ -979,8 +979,7 @@ intel_get_param(struct intel_screen *screen, int param,
> int *value)
>     gp.param = param;
>     gp.value = value;
>
> -   ret = drmCommandWriteRead(screen->driScrnPriv->fd,
> -                             DRM_I915_GETPARAM, &gp, sizeof(gp));
> +   ret = drmIoctl(screen->driScrnPriv->fd, DRM_IOCTL_I915_GETPARAM, &gp);
>     if (ret < 0 && ret != -EINVAL)

drmIoctl() doesn't return -errno, just -1 and the error code in errno.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
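The review comment above is about return conventions: the quoted caller tests "ret != -EINVAL", which only makes sense if the call returns a negative errno value. One way to keep such a caller unchanged is a thin wrapper that converts the "-1 plus errno" convention into "-errno". The sketch below uses plain ioctl(2) as a stand-in so that it builds without libdrm; sketch_ioctl_neg_errno() is a hypothetical helper, not a libdrm or Mesa function.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical wrapper: turns ioctl()'s "-1 and errno" failure
 * convention into a negative errno return value.
 */
static int
sketch_ioctl_neg_errno(int fd, unsigned long request, void *arg)
{
   int ret = ioctl(fd, request, arg);

   /* Read errno immediately, before any other call can overwrite it. */
   return ret == -1 ? -errno : ret;
}
```

A caller can then keep a check such as "ret < 0 && ret != -EINVAL" as written in the quoted hunk. The other option is to leave the call alone and compare errno against EINVAL after a -1 return.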
[Mesa-dev] [PATCH v4] swr: Refactor checks for compiler feature flags
Encapsulate the test for which flags are needed to get a compiler to
support certain features. Along with this, give various options to try
for AVX and AVX2 support. Ideally we want to use specific instruction
set feature flags, like -mavx2 for instance instead of -march=haswell,
but the flags required for certain compilers are different. This
allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
while the Intel compiler which doesn't support those flags can fall
back to using -march=core-avx2.

This addresses a bug where the Intel compiler will silently ignore the
AVX2 instruction feature flags and then potentially fail to build.

v2: Pass preprocessor-check argument as true-state instead of
    false-state for clarity.
v3: Reduce AVX2 define test to just __AVX2__. Additional defines such
    as __FMA__, __BMI2__, and __F16C__ appear to be inconsistently
    defined w.r.t. their availability.
v4: Fix C++11 flags being added globally and add more logic to
    swr_require_cxx_feature_flags

Cc: Tim Rowley
Signed-off-by: Chuck Atkins
---
 configure.ac                        | 73 +
 src/gallium/drivers/swr/Makefile.am | 4 +-
 2 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/configure.ac b/configure.ac
index cc9bc47..8321e8e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2330,6 +2330,45 @@ swr_llvm_check() {
     fi
 }

+swr_require_cxx_feature_flags() {
+    feature_name="$1"
+    preprocessor_test="$2"
+    option_list="$3"
+    output_var="$4"
+
+    AC_MSG_CHECKING([whether $CXX supports $feature_name])
+    AC_LANG_PUSH([C++])
+    save_CXXFLAGS="$CXXFLAGS"
+    save_IFS="$IFS"
+    IFS=","
+    found=0
+    for opts in $option_list
+    do
+        unset IFS
+        CXXFLAGS="$opts $save_CXXFLAGS"
+        AC_COMPILE_IFELSE(
+            [AC_LANG_PROGRAM(
+                [ #if !($preprocessor_test)
+#error
+#endif
+                ])],
+            [found=1; break],
+            [])
+        IFS=","
+    done
+    IFS="$save_IFS"
+    CXXFLAGS="$save_CXXFLAGS"
+    AC_LANG_POP([C++])
+    if test $found -eq 1; then
+        AC_MSG_RESULT([$opts])
+        eval "$output_var=\$opts"
+        return 0
+    fi
+    AC_MSG_RESULT([no])
+    AC_MSG_ERROR([swr requires $feature_name support])
+    return 1
+}
+
 dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after this block
 if test -n "$with_gallium_drivers"; then
     gallium_drivers=`IFS=', '; echo $with_gallium_drivers`
@@ -2399,31 +2438,19 @@ if test -n "$with_gallium_drivers"; then
         xswr)
             swr_llvm_check "swr"

-            AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2])
-            SWR_AVX_CXXFLAGS="-mavx"
-            SWR_AVX2_CXXFLAGS="-mavx2 -mfma -mbmi2 -mf16c"
-
-            AC_LANG_PUSH([C++])
-            save_CXXFLAGS="$CXXFLAGS"
-            CXXFLAGS="-std=c++11 $CXXFLAGS"
-            AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-                              [AC_MSG_ERROR([c++11 compiler support not detected])])
-            CXXFLAGS="$save_CXXFLAGS"
-
-            save_CXXFLAGS="$CXXFLAGS"
-            CXXFLAGS="$SWR_AVX_CXXFLAGS $CXXFLAGS"
-            AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-                              [AC_MSG_ERROR([AVX compiler support not detected])])
-            CXXFLAGS="$save_CXXFLAGS"
-
-            save_CFLAGS="$CXXFLAGS"
-            CXXFLAGS="$SWR_AVX2_CXXFLAGS $CXXFLAGS"
-            AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-                              [AC_MSG_ERROR([AVX2 compiler support not detected])])
-            CXXFLAGS="$save_CXXFLAGS"
-            AC_LANG_POP([C++])
+            swr_require_cxx_feature_flags "C++11" "__cplusplus >= 201103L" \
+                ",-std=c++11" \
+                SWR_CXX11_CXXFLAGS
+            AC_SUBST([SWR_CXX11_CXXFLAGS])

+            swr_require_cxx_feature_flags "AVX" "defined(__AVX__)" \
+                ",-mavx,-march=core-avx" \
+                SWR_AVX_CXXFLAGS
             AC_SUBST([SWR_AVX_CXXFLAGS])
+
+            swr_require_cxx_feature_flags "AVX2" "defined(__AVX2__)" \
+                ",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2" \
+                SWR_AVX2_CXXFLAGS
             AC_SUBST([SWR_AVX2_CXXFLAGS])

             HAVE_GALLIUM_SWR=yes
diff --git a/src/gallium/drivers/swr/Makefile.am b/src/gallium/drivers/swr/Makefile.am
index d896154..210b203 100644
--- a/src/gallium/drivers/swr/Makefile.am
+++ b/src/gallium/drivers/swr/Makefile.am
@@ -22,7 +22,7 @@
 include Makefile.sources
 include $(top_srcdir)/src/gallium/Automake.inc

-AM_CXXFLAGS = $(GALLIUM_DRIVER_CFLAGS) -std=c++11
+AM_CXXFLAGS = $(GALLIUM_DRIVER_CFLAGS) $(SWR_CXX11_CXXFLAGS)

 noinst_LTLIBRARIES = libmesaswr.la

@@ -31,7 +31,7 @@ libmesaswr_la_SOURCES = $(LOADER_SOURCES)
 COMMON_CXXFLAGS = \
 	$(GALLIUM_DRIVER_CFLAGS) \
 	$(LLVM_CXXFLAGS) \
-
Re: [Mesa-dev] [PATCH] i965: adds gen7_emit_cs_stall_flush on intel_texture_barrier
Hi,

On 28/06/16 18:00, Ilia Mirkin wrote:
> On Tue, Jun 28, 2016 at 11:46 AM, Alejandro Piñeiro wrote:
>> Fixes:
>> GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass
>>
>> On Haswell, Broadwell and Skylake (note that in order to execute
>> that test, it is needed to override GL and GLSL versions).
>>
>> I was not able to find a documentation reference that justifies it.
>> ---
>>
>> Having said, I didn't find a documentation reference explicitly
>> mention that this is needed.
>>
>> Initially I thought that a flag was missing when calling
>> emit_pipe_control_flush at brw_emit_mi_flush, but it was not the case
>> as far as I saw. Then I noted that there is a gen6 workaround on that
>> code:
>>
>>    if (brw->gen == 6) {
>>       /* Hardware workaround: SNB B-Spec says:
>>        *
>>        * [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache
>>        * Flush Enable =1, a PIPE_CONTROL with any non-zero
>>        * post-sync-op is required.
>>        */
>>       brw_emit_post_sync_nonzero_flush(brw);
>>    }
>>
>> I tested calling that method for any gen, guessing if the workaround
>> was needed also for other gens, and the test got fixed. But looking at
>> the documentation of other gens, I didn't find the need for this
>> workaround. For that reason I moved to use gen7_emit_cs_stall, that is
>> less aggressive and get the test fixed too. It seems that in order to
>> get a complete flush you need a cs stall flush with a
>> pipe_control_write. But again, I didn't find any reference at the PRMs
>> confirming it.
>>
>> Intuitively, this would be needed on brw_emit_mi_flush or even at
>> brw_emit_pipe_control_flush (this one already include some
>> gen-specific workarounds), but I preferred to keep it on the only place
>> that seems to need it for now.
>>
>> In addition to solve that CTS test, it also gets it passing for the
>> test I recently sent to the piglit list, and not included on master
>> yet (acked for now):
>> https://lists.freedesktop.org/archives/piglit/2016-June/020055.html
>>
>> That piglit patch adds 48 parameter combination for the basic
>> test. Without this mesa patch 5-6 subtests fails. With this patch all
>> of them passes. Tested on Haswell, Broadwell and Skylake too.
>>
>>  src/mesa/drivers/dri/i965/intel_tex.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/intel_tex.c
>> b/src/mesa/drivers/dri/i965/intel_tex.c
>> index cac33ac..e7459cd 100644
>> --- a/src/mesa/drivers/dri/i965/intel_tex.c
>> +++ b/src/mesa/drivers/dri/i965/intel_tex.c
>> @@ -362,6 +362,7 @@ intel_texture_barrier(struct gl_context *ctx)
>>  {
>>     struct brw_context *brw = brw_context(ctx);
>>
>> +   gen7_emit_cs_stall_flush(brw);
>>     brw_emit_mi_flush(brw);
> Without commenting on exactly what these do, what texture barrier *should* do
> is
>
> (1) wait for all previous draws to complete (since they may be in the
> process of filling caches with "old" data)
> (2) flush texture caches
>
> If you flush caches without waiting first, then a draw currently in
> progress may continue dirtying them with the "bad" data.

Thanks for the detailed answer. It is true that I was forgetting (1)
entirely. I totally focused on the cache flush, and assumed that there
was something missing there.

> As I said, however, I have no idea what either of the above functions
> *really* do, or what forms of parallelism are possible on intel hw.
> Hopefully the above comments will help someone with the proper
> knowledge evaluate whether this or a different change is necessary.

I really think that brw_emit_mi_flush totally fits your (2) (and should
not be modified as I suggested in my previous email). Whether
gen7_emit_cs_stall_flush is the best option for (1) is debatable.
Tomorrow I will keep checking, either to confirm it or to find a better
option. Obviously, if someone with previous knowledge appears, that
will be welcome.

Again, thanks for all the feedback, and the patience.

Best regards
Re: [Mesa-dev] Fwd: [PATCH] st/vdpau: use bicubic filter for scaling
Christian König wrote:
> On 28.06.2016 at 20:11, Nayan Deshmukh wrote:
>> Hi Andy,
>>
>> Thanks for testing the patches.
>>
>> On Tue, Jun 28, 2016 at 11:26 PM, Andy Furniss wrote:
>>> Nayan Deshmukh wrote:
>>>> Hi Christian and Andy,
>>>>
>>>> I have sent new series of patches which takes care of the points
>>>> Christian pointed out.
>>>>
>>>> I have also made some changes to make it more efficient than before.
>>>>
>>>> Also due to a wrong message id, I have sent the messages as a new
>>>> thread instead of replying to this thread.
>>>
>>> With the latest patches the artifacts are gone.
>>
>> Sounds great.
>
> Indeed, if nobody has any more suggestions I'm going to push this version
> upstream tomorrow.

One issue I just tested: it doesn't work with sharpen or denoise; the output is corrupted compared to hqscale=0. I didn't try this before, so I don't know if it ever worked.

It's OK with deint.
Re: [Mesa-dev] Fwd: [PATCH] st/vdpau: use bicubic filter for scaling
Hi Christian,

I will send a new patch in which the calculation is done before using the constant buffer.

Also, Grigori suggested that I use gather4 instead of a sampler to get the textures. I tried using it, but I could not find any existing code to take inspiration from.

Regards,
Nayan.

On Tue, Jun 28, 2016 at 11:51 PM, Christian König wrote:

> On 28.06.2016 at 20:11, Nayan Deshmukh wrote:
>
> Hi Andy,
>
> Thanks for testing the patches.
>
> On Tue, Jun 28, 2016 at 11:26 PM, Andy Furniss wrote:
>
>> Nayan Deshmukh wrote:
>>
>>> Hi Christian and Andy,
>>>
>>> I have sent new series of patches which takes care of the points
>>> Christian pointed out.
>>>
>>> I have also made some changes to make it more efficient than before.
>>>
>>> Also due to a wrong message id, I have sent the messages as a new thread
>>> instead of replying to this thread.
>>
>> With the latest patches the artifacts are gone.
>
> Sounds great.
>
> Indeed, if nobody has any more suggestions I'm going to push this version
> upstream tomorrow.
>
> Regards,
> Christian.
>
>> There is still a slight offset on scaled up vids, this is better than
>> before though, as now there is no offset on unscaled vids.
>
> I also see a slight offset but I am not able to find the reason for this
> offset I have set the viewport similiar to the case of hqscaling=0.
>
> Regards,
> Nayan.
[Mesa-dev] [PATCH v3] swr: Refactor checks for compiler feature flags
Encapsulate the test for which flags are needed to get a compiler to
support certain features.  Along with this, give various options to try
for AVX and AVX2 support.  Ideally we want to use specific instruction
set feature flags, like -mavx2 for instance instead of -march=haswell,
but the flags required for certain compilers are different.  This
allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
while the Intel compiler which doesn't support those flags can fall
back to using -march=core-avx2.

This addresses a bug where the Intel compiler will silently ignore the
AVX2 instruction feature flags and then potentially fail to build.

v2: Pass preprocessor-check argument as true-state instead of
    false-state for clarity.
v3: Reduce AVX2 define test to just __AVX2__.  Additional defines such
    as __FMA__, __BMI2__, and __F16C__ appear to be inconsistently
    defined w.r.t. their availability.

Cc: Tim Rowley
Signed-off-by: Chuck Atkins
---
 configure.ac | 86 +++-
 1 file changed, 62 insertions(+), 24 deletions(-)

diff --git a/configure.ac b/configure.ac
index cc9bc47..92c35e8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2330,6 +2330,39 @@ swr_llvm_check() {
     fi
 }
 
+swr_cxx_feature_flags_check() {
+    preprocessor_test="$1"
+    option_list="$2"
+    unset SWR_CXX_FEATURE_FLAGS
+    AC_LANG_PUSH([C++])
+    save_CXXFLAGS="$CXXFLAGS"
+    save_IFS="$IFS"
+    IFS=","
+    found=0
+    for opts in $option_list
+    do
+        unset IFS
+        CXXFLAGS="$opts $save_CXXFLAGS"
+        AC_COMPILE_IFELSE(
+            [AC_LANG_PROGRAM(
+                [   #if !($preprocessor_test)
+                    #error
+                    #endif
+                ])],
+            [found=1; break],
+            [])
+        IFS=","
+    done
+    IFS="$save_IFS"
+    CXXFLAGS="$save_CXXFLAGS"
+    AC_LANG_POP([C++])
+    if test $found -eq 1; then
+        SWR_CXX_FEATURE_FLAGS="$opts"
+        return 0
+    fi
+    return 1
+}
+
 dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after this block
 if test -n "$with_gallium_drivers"; then
     gallium_drivers=`IFS=', '; echo $with_gallium_drivers`
@@ -2399,31 +2432,36 @@ if test -n "$with_gallium_drivers"; then
         xswr)
             swr_llvm_check "swr"
 
-            AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2])
-            SWR_AVX_CXXFLAGS="-mavx"
-            SWR_AVX2_CXXFLAGS="-mavx2 -mfma -mbmi2 -mf16c"
-
-            AC_LANG_PUSH([C++])
-            save_CXXFLAGS="$CXXFLAGS"
-            CXXFLAGS="-std=c++11 $CXXFLAGS"
-            AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-                              [AC_MSG_ERROR([c++11 compiler support not detected])])
-            CXXFLAGS="$save_CXXFLAGS"
-
-            save_CXXFLAGS="$CXXFLAGS"
-            CXXFLAGS="$SWR_AVX_CXXFLAGS $CXXFLAGS"
-            AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-                              [AC_MSG_ERROR([AVX compiler support not detected])])
-            CXXFLAGS="$save_CXXFLAGS"
-
-            save_CFLAGS="$CXXFLAGS"
-            CXXFLAGS="$SWR_AVX2_CXXFLAGS $CXXFLAGS"
-            AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-                              [AC_MSG_ERROR([AVX2 compiler support not detected])])
-            CXXFLAGS="$save_CXXFLAGS"
-            AC_LANG_POP([C++])
-
+            AC_MSG_CHECKING([whether $CXX supports c++11])
+            if ! swr_cxx_feature_flags_check \
+                "__cplusplus >= 201103L" \
+                ",-std=c++11"; then
+                AC_MSG_RESULT([no])
+                AC_MSG_ERROR([swr requires C++11 support])
+            fi
+            AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
+            CXXFLAGS="$SWR_CXX_FEATURE_FLAGS $CXXFLAGS"
+
+            AC_MSG_CHECKING([whether $CXX supports AVX])
+            if ! swr_cxx_feature_flags_check \
+                "defined(__AVX__)" \
+                ",-mavx,-march=core-avx"; then
+                AC_MSG_RESULT([no])
+                AC_MSG_ERROR([swr requires AVX compiler support])
+            fi
+            AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
+            SWR_AVX_CXXFLAGS="$SWR_CXX_FEATURE_FLAGS"
             AC_SUBST([SWR_AVX_CXXFLAGS])
+
+            AC_MSG_CHECKING([whether $CXX supports AVX2])
+            if ! swr_cxx_feature_flags_check \
+                "defined(__AVX2__)" \
+                ",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2"; then
+                AC_MSG_RESULT([no])
+                AC_MSG_ERROR([swr requires AVX2 compiler support])
+            fi
+            AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
+            SWR_AVX2_CXXFLAGS="$SWR_CXX_FEATURE_FLAGS"
             AC_SUBST([SWR_AVX2_CXXFLAGS])
 
             HAVE_GALLIUM_SWR=yes
-- 
2.5.5
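For readers who want to see what the preprocessor test in the patch above is probing, here is a small C sketch. It is not part of the patch: the function names are made up for illustration, and it only reports which of the compiler's predefined feature macros are visible for whatever flags the file is compiled with. The "strict" variant assumes the pre-v3 intent was to require all four macros; the v3 variant checks __AVX2__ alone, as the commit message describes.

```c
#include <assert.h>
#include <stdio.h>

/* Each probe returns 1 if the compiler predefines the macro for the
 * flags this file was built with, 0 otherwise. */
int have_avx2(void)
{
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}

int have_fma(void)
{
#ifdef __FMA__
    return 1;
#else
    return 0;
#endif
}

int have_bmi2(void)
{
#ifdef __BMI2__
    return 1;
#else
    return 0;
#endif
}

int have_f16c(void)
{
#ifdef __F16C__
    return 1;
#else
    return 0;
#endif
}

/* v3 behaviour: only __AVX2__ decides the outcome. */
int avx2_check_v3(void)
{
    return have_avx2();
}

/* Assumed pre-v3 intent: all four macros must be defined. */
int avx2_check_strict(void)
{
    return have_avx2() && have_fma() && have_bmi2() && have_f16c();
}

/* Prints one line per macro so different compilers/flags can be compared. */
void print_feature_report(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "__AVX2__=%d __FMA__=%d __BMI2__=%d __F16C__=%d\n",
            have_avx2(), have_fma(), have_bmi2(), have_f16c());
}
```

Building this once with the GCC-style flag list and once with the -march fallback, then calling print_feature_report(stdout), is one way to observe the kind of inconsistency the v3 note mentions. The strict check can never pass where the v3 check fails, which is why relaxing it only widens the set of accepted compilers.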
Re: [Mesa-dev] Fwd: [PATCH] st/vdpau: use bicubic filter for scaling
On 28.06.2016 at 20:11, Nayan Deshmukh wrote:
> Hi Andy,
>
> Thanks for testing the patches.
>
> On Tue, Jun 28, 2016 at 11:26 PM, Andy Furniss wrote:
>> Nayan Deshmukh wrote:
>>> Hi Christian and Andy,
>>>
>>> I have sent new series of patches which takes care of the points
>>> Christian pointed out.
>>>
>>> I have also made some changes to make it more efficient than before.
>>>
>>> Also due to a wrong message id, I have sent the messages as a new
>>> thread instead of replying to this thread.
>>
>> With the latest patches the artifacts are gone.
>
> Sounds great.

Indeed, if nobody has any more suggestions I'm going to push this version upstream tomorrow.

Regards,
Christian.

>> There is still a slight offset on scaled up vids, this is better than
>> before though, as now there is no offset on unscaled vids.
>
> I also see a slight offset but I am not able to find the reason for this
> offset I have set the viewport similiar to the case of hqscaling=0.
>
> Regards,
> Nayan.
Re: [Mesa-dev] [PATCH] mesa/st: Include nir.h for nir_shader symbol.
On Tue, Jun 28, 2016 at 7:51 AM, Rob Clark wrote:
> Already half of the world gets recompiled when you touch nir.h, and
> I'd rather not make that worse..

Exactly my thoughts.
Re: [Mesa-dev] [PATCH] swr: Refactor checks for compiler feature flags
The only guaranteed way I can think of to ensure compiler support is to try compiling source that calls one intrinsic from each of the used groups. I can see that being "more correct", but I also can't really think of a situation where checking only the __AVX2__ define would leave us with a failing build.

- Chuck

On Tue, Jun 28, 2016 at 2:10 PM, Chuck Atkins wrote:

> So this seems to be different across versions as well. It looks like
> __AVX__ and __AVX2__ are the only ones we can really count on being there.
> I can drop the second check to just __AVX2__. I think it's redundant by
> chance though that all CPUs that supported AVX2 also seem to support the
> additional 2 instructions.
>
> - Chuck
>
> On Tue, Jun 28, 2016 at 1:52 PM, Rowley, Timothy O <
> timothy.o.row...@intel.com> wrote:
>
>>
>> > On Jun 28, 2016, at 8:24 AM, Chuck Atkins wrote:
>> >
>> > Encapsulate the test for which flags are needed to get a compiler to
>> > support certain features. Along with this, give various options to try
>> > for AVX and AVX2 support. Ideally we want to use specific instruction
>> > set feature flags, like -mavx2 for instance instead of -march=haswell,
>> > but the flags required for certain compilers are different. This
>> > allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
>> > while the Intel compiler which doesn't support those flags can fall
>> > back to using -march=core-avx2.
>> >
>> > This addresses a bug where the Intel compiler will silently ignore the
>> > AVX2 instruction feature flags and then potentially fail to build.
>> > >> > Cc: Tim Rowley >> > Signed-off-by: Chuck Atkins >> > --- >> > configure.ac | 86 >> +++- >> > 1 file changed, 62 insertions(+), 24 deletions(-) >> > >> > diff --git a/configure.ac b/configure.ac >> > index cc9bc47..806850e 100644 >> > --- a/configure.ac >> > +++ b/configure.ac >> > @@ -2330,6 +2330,39 @@ swr_llvm_check() { >> > fi >> > } >> > >> > +swr_cxx_feature_flags_check() { >> > +ifndef_test=$1 >> > +option_list="$2" >> > +unset SWR_CXX_FEATURE_FLAGS >> > +AC_LANG_PUSH([C++]) >> > +save_CXXFLAGS="$CXXFLAGS" >> > +save_IFS="$IFS" >> > +IFS="," >> > +found=0 >> > +for opts in $option_list >> > +do >> > +unset IFS >> > +CXXFLAGS="$opts $save_CXXFLAGS" >> > +AC_COMPILE_IFELSE( >> > +[AC_LANG_PROGRAM( >> > +[ $ifndef_test >> > +#error >> > +#endif >> > +])], >> > +[found=1; break], >> > +[]) >> > +IFS="," >> > +done >> > +IFS="$save_IFS" >> > +CXXFLAGS="$save_CXXFLAGS" >> > +AC_LANG_POP([C++]) >> > +if test $found -eq 1; then >> > +SWR_CXX_FEATURE_FLAGS="$opts" >> > +return 0 >> > +fi >> > +return 1 >> > +} >> > + >> > dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after >> this block >> > if test -n "$with_gallium_drivers"; then >> > gallium_drivers=`IFS=', '; echo $with_gallium_drivers` >> > @@ -2399,31 +2432,36 @@ if test -n "$with_gallium_drivers"; then >> > xswr) >> > swr_llvm_check "swr" >> > >> > -AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2]) >> > -SWR_AVX_CXXFLAGS="-mavx" >> > -SWR_AVX2_CXXFLAGS="-mavx2 -mfma -mbmi2 -mf16c" >> > - >> > -AC_LANG_PUSH([C++]) >> > -save_CXXFLAGS="$CXXFLAGS" >> > -CXXFLAGS="-std=c++11 $CXXFLAGS" >> > -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[], >> > - [AC_MSG_ERROR([c++11 compiler support >> not detected])]) >> > -CXXFLAGS="$save_CXXFLAGS" >> > - >> > -save_CXXFLAGS="$CXXFLAGS" >> > -CXXFLAGS="$SWR_AVX_CXXFLAGS $CXXFLAGS" >> > -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[], >> > - [AC_MSG_ERROR([AVX compiler support not >> detected])]) >> > -CXXFLAGS="$save_CXXFLAGS" >> > - >> > 
-save_CFLAGS="$CXXFLAGS" >> > -CXXFLAGS="$SWR_AVX2_CXXFLAGS $CXXFLAGS" >> > -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[], >> > - [AC_MSG_ERROR([AVX2 compiler support not >> detected])]) >> > -CXXFLAGS="$save_CXXFLAGS" >> > -AC_LANG_POP([C++]) >> > - >> > +AC_MSG_CHECKING([whether $CXX supports c++11]) >> > +if ! swr_cxx_feature_flags_check \ >> > +"#if __cplusplus < 201103L" \ >> > +",-std=c++11"; then >> > +AC_MSG_RESULT([no]) >> > +AC_MSG_ERROR([swr requires C++11 support]) >> > +fi >> > +AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS]) >> > +CXXFLAGS="$SWR_CXX_FEATURE_FLAGS $CXXFLAGS” >> >> We don’t want to
Re: [Mesa-dev] Fwd: [PATCH] st/vdpau: use bicubic filter for scaling
Hi Andy,

Thanks for testing the patches.

On Tue, Jun 28, 2016 at 11:26 PM, Andy Furniss wrote:

> Nayan Deshmukh wrote:
>
>> Hi Christian and Andy,
>>
>> I have sent new series of patches which takes care of the points Christian
>> pointed out.
>>
>> I have also made some changes to make it more efficient than before.
>>
>> Also due to a wrong message id, I have sent the messages as a new thread
>> instead of replying to this thread.
>
> With the latest patches the artifacts are gone.

Sounds great.

> There is still a slight offset on scaled up vids, this is better than
> before though, as now there is no offset on unscaled vids.

I also see a slight offset, but I am not able to find the reason for it. I have set the viewport similar to the hqscaling=0 case.

Regards,
Nayan.
Re: [Mesa-dev] [PATCH] swr: Refactor checks for compiler feature flags
So this seems to be different across versions as well. It looks like __AVX__ and __AVX2__ are the only ones we can really count on being there. I can reduce the second check to just __AVX2__. I think the extra defines are redundant, though only by chance: all CPUs that support AVX2 also seem to support the additional 2 instructions.

- Chuck

On Tue, Jun 28, 2016 at 1:52 PM, Rowley, Timothy O <
timothy.o.row...@intel.com> wrote:

>
> > On Jun 28, 2016, at 8:24 AM, Chuck Atkins wrote:
> >
> > Encapsulate the test for which flags are needed to get a compiler to
> > support certain features. Along with this, give various options to try
> > for AVX and AVX2 support. Ideally we want to use specific instruction
> > set feature flags, like -mavx2 for instance instead of -march=haswell,
> > but the flags required for certain compilers are different. This
> > allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
> > while the Intel compiler which doesn't support those flags can fall
> > back to using -march=core-avx2.
> >
> > This addresses a bug where the Intel compiler will silently ignore the
> > AVX2 instruction feature flags and then potentially fail to build.
> > > > Cc: Tim Rowley > > Signed-off-by: Chuck Atkins > > --- > > configure.ac | 86 > +++- > > 1 file changed, 62 insertions(+), 24 deletions(-) > > > > diff --git a/configure.ac b/configure.ac > > index cc9bc47..806850e 100644 > > --- a/configure.ac > > +++ b/configure.ac > > @@ -2330,6 +2330,39 @@ swr_llvm_check() { > > fi > > } > > > > +swr_cxx_feature_flags_check() { > > +ifndef_test=$1 > > +option_list="$2" > > +unset SWR_CXX_FEATURE_FLAGS > > +AC_LANG_PUSH([C++]) > > +save_CXXFLAGS="$CXXFLAGS" > > +save_IFS="$IFS" > > +IFS="," > > +found=0 > > +for opts in $option_list > > +do > > +unset IFS > > +CXXFLAGS="$opts $save_CXXFLAGS" > > +AC_COMPILE_IFELSE( > > +[AC_LANG_PROGRAM( > > +[ $ifndef_test > > +#error > > +#endif > > +])], > > +[found=1; break], > > +[]) > > +IFS="," > > +done > > +IFS="$save_IFS" > > +CXXFLAGS="$save_CXXFLAGS" > > +AC_LANG_POP([C++]) > > +if test $found -eq 1; then > > +SWR_CXX_FEATURE_FLAGS="$opts" > > +return 0 > > +fi > > +return 1 > > +} > > + > > dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after > this block > > if test -n "$with_gallium_drivers"; then > > gallium_drivers=`IFS=', '; echo $with_gallium_drivers` > > @@ -2399,31 +2432,36 @@ if test -n "$with_gallium_drivers"; then > > xswr) > > swr_llvm_check "swr" > > > > -AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2]) > > -SWR_AVX_CXXFLAGS="-mavx" > > -SWR_AVX2_CXXFLAGS="-mavx2 -mfma -mbmi2 -mf16c" > > - > > -AC_LANG_PUSH([C++]) > > -save_CXXFLAGS="$CXXFLAGS" > > -CXXFLAGS="-std=c++11 $CXXFLAGS" > > -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[], > > - [AC_MSG_ERROR([c++11 compiler support not > detected])]) > > -CXXFLAGS="$save_CXXFLAGS" > > - > > -save_CXXFLAGS="$CXXFLAGS" > > -CXXFLAGS="$SWR_AVX_CXXFLAGS $CXXFLAGS" > > -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[], > > - [AC_MSG_ERROR([AVX compiler support not > detected])]) > > -CXXFLAGS="$save_CXXFLAGS" > > - > > -save_CFLAGS="$CXXFLAGS" > > -CXXFLAGS="$SWR_AVX2_CXXFLAGS $CXXFLAGS" > > 
-AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[], > > - [AC_MSG_ERROR([AVX2 compiler support not > detected])]) > > -CXXFLAGS="$save_CXXFLAGS" > > -AC_LANG_POP([C++]) > > - > > +AC_MSG_CHECKING([whether $CXX supports c++11]) > > +if ! swr_cxx_feature_flags_check \ > > +"#if __cplusplus < 201103L" \ > > +",-std=c++11"; then > > +AC_MSG_RESULT([no]) > > +AC_MSG_ERROR([swr requires C++11 support]) > > +fi > > +AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS]) > > +CXXFLAGS="$SWR_CXX_FEATURE_FLAGS $CXXFLAGS” > > We don’t want to globally override CXXFLAGS; AC_SUBST on a SWR_CXXFLAGS > and using that in swr’s Makefile.am would be better. > > > + > > +AC_MSG_CHECKING([whether $CXX supports AVX]) > > +if ! swr_cxx_feature_flags_check \ > > +"#ifndef __AVX__" \ > > +",-mavx,-march=core-avx"; then > > +AC_MSG_RESULT([no]) > > +AC_MSG_ERROR([swr requires AVX compiler support]) > > +fi > > +AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS]) > > +
Re: [Mesa-dev] Fwd: [PATCH] st/vdpau: use bicubic filter for scaling
Nayan Deshmukh wrote:
> Hi Christian and Andy,
>
> I have sent new series of patches which takes care of the points
> Christian pointed out.
>
> I have also made some changes to make it more efficient than before.
>
> Also due to a wrong message id, I have sent the messages as a new thread
> instead of replying to this thread.

With the latest patches the artifacts are gone.

There is still a slight offset on scaled up vids, this is better than before though, as now there is no offset on unscaled vids.
[Mesa-dev] [PATCH 4/4] radeonsi: enable distributed tess on multi-SE parts only
From: Marek Olšák

ported from Vulkan
---
 src/gallium/drivers/radeonsi/si_pipe.c          | 4 ++++
 src/gallium/drivers/radeonsi/si_pipe.h          | 1 +
 src/gallium/drivers/radeonsi/si_state_draw.c    | 2 +-
 src/gallium/drivers/radeonsi/si_state_shaders.c | 2 +-
 4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
index f38ecc1..633d4bb 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -712,6 +712,10 @@ struct pipe_screen *radeonsi_screen_create(struct radeon_winsys *ws)
         sscreen->tess_offchip_block_dw_size =
                 sscreen->b.family == CHIP_HAWAII ? 4096 : 8192;
 
+        sscreen->has_distributed_tess =
+                sscreen->b.chip_class >= VI &&
+                sscreen->b.info.max_se >= 2;
+
         sscreen->b.has_cp_dma = true;
         sscreen->b.has_streamout = true;
         pipe_mutex_init(sscreen->shader_parts_mutex);
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h
index ee64ecc..3aff0ac 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -83,6 +83,7 @@ struct si_screen {
         struct r600_common_screen       b;
         unsigned                        gs_table_depth;
         unsigned                        tess_offchip_block_dw_size;
+        bool                            has_distributed_tess;
 
         /* Whether shaders are monolithic (1-part) or separate (3-part). */
         bool                            use_monolithic_shaders;
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
index 3558510..ce8def4 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -274,7 +274,7 @@ static unsigned si_get_ia_multi_vgt_param(struct si_context *sctx,
                 partial_vs_wave = true;
 
         /* Needed for 028B6C_DISTRIBUTION_MODE != 0 */
-        if (sctx->b.chip_class >= VI) {
+        if (sctx->screen->has_distributed_tess) {
                 if (sctx->gs_shader.cso)
                         partial_es_wave = true;
                 else
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 9aa4a7c..4bcdeb6 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -300,7 +300,7 @@ static void si_set_tesseval_regs(struct si_screen *sscreen,
         else
                 topology = V_028B6C_OUTPUT_TRIANGLE_CW;
 
-        if (sscreen->b.chip_class >= VI) {
+        if (sscreen->has_distributed_tess) {
                 if (sscreen->b.family == CHIP_FIJI ||
                     sscreen->b.family >= CHIP_POLARIS10)
                         distribution_mode = V_028B6C_DISTRIBUTION_MODE_TRAPEZOIDS;
-- 
2.7.4
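The condition introduced by this patch can be read in isolation as a small predicate. The sketch below is only an illustration: the enum and struct are simplified stand-ins invented here (the real driver types carry far more state), and only the ordering SI < CIK < VI is assumed from how the patches compare chip classes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the driver's chip generations.
 * Only the relative ordering matters for the comparison below. */
enum chip_class { SI, CIK, VI };

/* Hypothetical subset of the screen info used by the check. */
struct screen_info {
    enum chip_class chip_class;
    unsigned max_se; /* number of shader engines */
};

/* Distributed tessellation is enabled only on VI or newer parts that
 * have at least two shader engines, mirroring the patch's condition. */
bool has_distributed_tess(const struct screen_info *info)
{
    assert(info != NULL);
    return info->chip_class >= VI && info->max_se >= 2;
}
```

Computing the flag once at screen creation, as the patch does, keeps the two call sites in si_state_draw.c and si_state_shaders.c from repeating (and possibly diverging on) the same two-part condition.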
Re: [Mesa-dev] [PATCH] swr: Refactor checks for compiler feature flags
> On Jun 28, 2016, at 8:24 AM, Chuck Atkins wrote:
>
> Encapsulate the test for which flags are needed to get a compiler to
> support certain features.  Along with this, give various options to try
> for AVX and AVX2 support.  Ideally we want to use specific instruction
> set feature flags, like -mavx2 for instance instead of -march=haswell,
> but the flags required for certain compilers are different.  This
> allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
> while the Intel compiler which doesn't support those flags can fall
> back to using -march=core-avx2.
>
> This addresses a bug where the Intel compiler will silently ignore the
> AVX2 instruction feature flags and then potentially fail to build.
>
> Cc: Tim Rowley
> Signed-off-by: Chuck Atkins
> ---
> configure.ac | 86 +++-
> 1 file changed, 62 insertions(+), 24 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index cc9bc47..806850e 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -2330,6 +2330,39 @@ swr_llvm_check() {
>      fi
>  }
>
> +swr_cxx_feature_flags_check() {
> +    ifndef_test=$1
> +    option_list="$2"
> +    unset SWR_CXX_FEATURE_FLAGS
> +    AC_LANG_PUSH([C++])
> +    save_CXXFLAGS="$CXXFLAGS"
> +    save_IFS="$IFS"
> +    IFS=","
> +    found=0
> +    for opts in $option_list
> +    do
> +        unset IFS
> +        CXXFLAGS="$opts $save_CXXFLAGS"
> +        AC_COMPILE_IFELSE(
> +            [AC_LANG_PROGRAM(
> +                [   $ifndef_test
> +                    #error
> +                    #endif
> +                ])],
> +            [found=1; break],
> +            [])
> +        IFS=","
> +    done
> +    IFS="$save_IFS"
> +    CXXFLAGS="$save_CXXFLAGS"
> +    AC_LANG_POP([C++])
> +    if test $found -eq 1; then
> +        SWR_CXX_FEATURE_FLAGS="$opts"
> +        return 0
> +    fi
> +    return 1
> +}
> +
>  dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after this block
>  if test -n "$with_gallium_drivers"; then
>      gallium_drivers=`IFS=', '; echo $with_gallium_drivers`
> @@ -2399,31 +2432,36 @@ if test -n "$with_gallium_drivers"; then
>          xswr)
>              swr_llvm_check "swr"
>
> -            AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2])
> -            SWR_AVX_CXXFLAGS="-mavx"
> -            SWR_AVX2_CXXFLAGS="-mavx2 -mfma -mbmi2 -mf16c"
> -
> -            AC_LANG_PUSH([C++])
> -            save_CXXFLAGS="$CXXFLAGS"
> -            CXXFLAGS="-std=c++11 $CXXFLAGS"
> -            AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
> -                              [AC_MSG_ERROR([c++11 compiler support not detected])])
> -            CXXFLAGS="$save_CXXFLAGS"
> -
> -            save_CXXFLAGS="$CXXFLAGS"
> -            CXXFLAGS="$SWR_AVX_CXXFLAGS $CXXFLAGS"
> -            AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
> -                              [AC_MSG_ERROR([AVX compiler support not detected])])
> -            CXXFLAGS="$save_CXXFLAGS"
> -
> -            save_CFLAGS="$CXXFLAGS"
> -            CXXFLAGS="$SWR_AVX2_CXXFLAGS $CXXFLAGS"
> -            AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
> -                              [AC_MSG_ERROR([AVX2 compiler support not detected])])
> -            CXXFLAGS="$save_CXXFLAGS"
> -            AC_LANG_POP([C++])
> -
> +            AC_MSG_CHECKING([whether $CXX supports c++11])
> +            if ! swr_cxx_feature_flags_check \
> +                "#if __cplusplus < 201103L" \
> +                ",-std=c++11"; then
> +                AC_MSG_RESULT([no])
> +                AC_MSG_ERROR([swr requires C++11 support])
> +            fi
> +            AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
> +            CXXFLAGS="$SWR_CXX_FEATURE_FLAGS $CXXFLAGS"

We don't want to globally override CXXFLAGS; AC_SUBST on a SWR_CXXFLAGS and using that in swr's Makefile.am would be better.

> +
> +            AC_MSG_CHECKING([whether $CXX supports AVX])
> +            if ! swr_cxx_feature_flags_check \
> +                "#ifndef __AVX__" \
> +                ",-mavx,-march=core-avx"; then
> +                AC_MSG_RESULT([no])
> +                AC_MSG_ERROR([swr requires AVX compiler support])
> +            fi
> +            AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
> +            SWR_AVX_CXXFLAGS="$SWR_CXX_FEATURE_FLAGS"
>              AC_SUBST([SWR_AVX_CXXFLAGS])
> +
> +            AC_MSG_CHECKING([whether $CXX supports AVX2])
> +            if ! swr_cxx_feature_flags_check \
> +                "#if !(defined(__AVX2__)&(__FMA__)&(__BMI2__)&(__F16C__))" \

Is there any standard that says these are defined if the compiler supports them? With icc 16.0.3, the test falls into the #error path when it tries the fallback test of -march=core-avx2.

> +                ",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2"; then
> +                AC_MSG_RESULT([no])
> +                AC_MSG_ERROR([swr
[Mesa-dev] [PATCH 1/4] radeonsi: use conformant line rasterization
From: Marek OlšákAA lines are not completely correct (see TODO), but everything else should be. + 3 linestipple piglits --- src/gallium/drivers/radeon/cayman_msaa.c | 12 ++-- src/gallium/drivers/radeon/r600d_common.h| 6 ++ src/gallium/drivers/radeonsi/si_state.c | 10 +- src/gallium/drivers/radeonsi/si_state_draw.c | 6 -- 4 files changed, 29 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/radeon/cayman_msaa.c b/src/gallium/drivers/radeon/cayman_msaa.c index a9ec4c3..89c4937 100644 --- a/src/gallium/drivers/radeon/cayman_msaa.c +++ b/src/gallium/drivers/radeon/cayman_msaa.c @@ -200,6 +200,14 @@ void cayman_emit_msaa_config(struct radeon_winsys_cs *cs, int nr_samples, { int setup_samples = nr_samples > 1 ? nr_samples : overrast_samples > 1 ? overrast_samples : 0; + /* Required by OpenGL line rasterization. +* +* TODO: We should also enable perpendicular endcaps for AA lines, +* but that requires implementing line stippling in the pixel +* shader. SC can only do line stippling with axis-aligned +* endcaps. 
+*/ + unsigned sc_line_cntl = S_028BDC_DX10_DIAMOND_TEST_ENA(1); if (setup_samples > 1) { /* indexed by log2(nr_samples) */ @@ -215,7 +223,7 @@ void cayman_emit_msaa_config(struct radeon_winsys_cs *cs, int nr_samples, util_logbase2(util_next_power_of_two(ps_iter_samples)); radeon_set_context_reg_seq(cs, CM_R_028BDC_PA_SC_LINE_CNTL, 2); - radeon_emit(cs, S_028BDC_LAST_PIXEL(1) | + radeon_emit(cs, sc_line_cntl | S_028BDC_EXPAND_LINE_WIDTH(1)); /* CM_R_028BDC_PA_SC_LINE_CNTL */ radeon_emit(cs, S_028BE0_MSAA_NUM_SAMPLES(log_samples) | S_028BE0_MAX_SAMPLE_DIST(max_dist[log_samples]) | @@ -242,7 +250,7 @@ void cayman_emit_msaa_config(struct radeon_winsys_cs *cs, int nr_samples, } } else { radeon_set_context_reg_seq(cs, CM_R_028BDC_PA_SC_LINE_CNTL, 2); - radeon_emit(cs, S_028BDC_LAST_PIXEL(1)); /* CM_R_028BDC_PA_SC_LINE_CNTL */ + radeon_emit(cs, sc_line_cntl); /* CM_R_028BDC_PA_SC_LINE_CNTL */ radeon_emit(cs, 0); /* CM_R_028BE0_PA_SC_AA_CONFIG */ radeon_set_context_reg(cs, CM_R_028804_DB_EQAA, diff --git a/src/gallium/drivers/radeon/r600d_common.h b/src/gallium/drivers/radeon/r600d_common.h index e50de96..6f534b3 100644 --- a/src/gallium/drivers/radeon/r600d_common.h +++ b/src/gallium/drivers/radeon/r600d_common.h @@ -203,6 +203,12 @@ #define S_028BDC_LAST_PIXEL(x) (((unsigned)(x) & 0x1) << 10) #define G_028BDC_LAST_PIXEL(x) (((x) >> 10) & 0x1) #define C_028BDC_LAST_PIXEL 0xFBFF +#define S_028BDC_PERPENDICULAR_ENDCAP_ENA(x) (((unsigned)(x) & 0x1) << 11) +#define G_028BDC_PERPENDICULAR_ENDCAP_ENA(x) (((x) >> 11) & 0x1) +#define C_028BDC_PERPENDICULAR_ENDCAP_ENA0xF7FF +#define S_028BDC_DX10_DIAMOND_TEST_ENA(x)(((unsigned)(x) & 0x1) << 12) +#define G_028BDC_DX10_DIAMOND_TEST_ENA(x)(((x) >> 12) & 0x1) +#define C_028BDC_DX10_DIAMOND_TEST_ENA 0xEFFF #define CM_R_028BE0_PA_SC_AA_CONFIG 0x28be0 #define S_028BE0_MSAA_NUM_SAMPLES(x) (((unsigned)(x) & 0x7) << 0) #define S_028BE0_AA_MASK_CENTROID_DTMN(x)(((unsigned)(x) & 0x1) << 4) diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c index 0a2fdbf..b21fa5c 100644 --- a/src/gallium/drivers/radeonsi/si_state.c +++ b/src/gallium/drivers/radeonsi/si_state.c @@ -3805,7 +3805,15 @@ static void si_init_config(struct si_context *sctx) S_028034_BR_X(16384) | S_028034_BR_Y(16384)); si_pm4_set_reg(pm4, R_02820C_PA_SC_CLIPRECT_RULE, 0x); - si_pm4_set_reg(pm4, R_028230_PA_SC_EDGERULE, 0x); + si_pm4_set_reg(pm4, R_028230_PA_SC_EDGERULE, + S_028230_ER_TRI(0xA) | + S_028230_ER_POINT(0xA) | + S_028230_ER_RECT(0xA) | + /* Required by DX10_DIAMOND_TEST_ENA: */ + S_028230_ER_LINE_LR(0x1A) | + S_028230_ER_LINE_RL(0x26) | + S_028230_ER_LINE_TB(0xA) | + S_028230_ER_LINE_BT(0xA)); /* PA_SU_HARDWARE_SCREEN_OFFSET must be 0 due to hw bug on SI */ si_pm4_set_reg(pm4, R_028234_PA_SU_HARDWARE_SCREEN_OFFSET, 0); si_pm4_set_reg(pm4, R_028820_PA_CL_NANINF_CNTL, 0); diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c index 5f866d5..b9a7c14 100644 ---
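The register field macros added to r600d_common.h in this patch follow the usual S_ (set) / G_ (get) pattern. The sketch below copies the S_ and G_ definitions for two fields from the diff and shows how the PA_SC_LINE_CNTL value changes; the helper functions and clear_bit_field are hypothetical names added for illustration, and the clear mask is computed here instead of using the C_ constants, whose values are not fully legible in this archive copy. A 32-bit unsigned is assumed.

```c
#include <assert.h>

/* Set/get macros as they appear in the patch. */
#define S_028BDC_LAST_PIXEL(x)            (((unsigned)(x) & 0x1) << 10)
#define G_028BDC_LAST_PIXEL(x)            (((x) >> 10) & 0x1)
#define S_028BDC_DX10_DIAMOND_TEST_ENA(x) (((unsigned)(x) & 0x1) << 12)
#define G_028BDC_DX10_DIAMOND_TEST_ENA(x) (((x) >> 12) & 0x1)

/* Value emitted for the non-MSAA path before the patch. */
unsigned line_cntl_before_patch(void)
{
    return S_028BDC_LAST_PIXEL(1);
}

/* Value emitted for the non-MSAA path after the patch: the diamond
 * test bit replaces the last-pixel bit. */
unsigned line_cntl_after_patch(void)
{
    return S_028BDC_DX10_DIAMOND_TEST_ENA(1);
}

/* Hypothetical helper: clears a one-bit field at the given shift,
 * which is the job the C_ masks do in the real header. */
unsigned clear_bit_field(unsigned reg, unsigned shift)
{
    assert(shift < 32);
    return reg & ~(1u << shift);
}
```

Reading a field back with the matching G_ macro is a cheap way to confirm that a combined register value carries the bits one expects before it is handed to radeon_emit.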
[Mesa-dev] [PATCH 3/4] radeonsi: set optimal VGT_HS_OFFCHIP_PARAM
From: Marek Olšákported from Vulkan --- src/gallium/drivers/radeonsi/si_pipe.c | 6 +++ src/gallium/drivers/radeonsi/si_pipe.h | 1 + src/gallium/drivers/radeonsi/si_state.h | 2 - src/gallium/drivers/radeonsi/si_state_draw.c| 5 ++- src/gallium/drivers/radeonsi/si_state_shaders.c | 49 - 5 files changed, 49 insertions(+), 14 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c index d835681..f38ecc1 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.c +++ b/src/gallium/drivers/radeonsi/si_pipe.c @@ -706,6 +706,12 @@ struct pipe_screen *radeonsi_screen_create(struct radeon_winsys *ws) if (!debug_get_bool_option("RADEON_DISABLE_PERFCOUNTERS", false)) si_init_perfcounters(sscreen); + /* Hawaii has a bug with offchip buffers > 256 that can be worked +* around by setting 4K granularity. +*/ + sscreen->tess_offchip_block_dw_size = + sscreen->b.family == CHIP_HAWAII ? 4096 : 8192; + sscreen->b.has_cp_dma = true; sscreen->b.has_streamout = true; pipe_mutex_init(sscreen->shader_parts_mutex); diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h index d181905..ee64ecc 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.h +++ b/src/gallium/drivers/radeonsi/si_pipe.h @@ -82,6 +82,7 @@ struct u_suballocator; struct si_screen { struct r600_common_screen b; unsignedgs_table_depth; + unsignedtess_offchip_block_dw_size; /* Whether shaders are monolithic (1-part) or separate (3-part). 
*/ booluse_monolithic_shaders; diff --git a/src/gallium/drivers/radeonsi/si_state.h b/src/gallium/drivers/radeonsi/si_state.h index 2e4923d..9361849 100644 --- a/src/gallium/drivers/radeonsi/si_state.h +++ b/src/gallium/drivers/radeonsi/si_state.h @@ -40,8 +40,6 @@ #define SI_NUM_IMAGES 16 #define SI_NUM_SHADER_BUFFERS 16 -#define SI_TESS_OFFCHIP_BLOCK_SIZE (8192 * 4) - struct si_screen; struct si_shader; diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c index b9a7c14..3558510 100644 --- a/src/gallium/drivers/radeonsi/si_state_draw.c +++ b/src/gallium/drivers/radeonsi/si_state_draw.c @@ -147,8 +147,9 @@ static void si_emit_derived_tess_state(struct si_context *sctx, output_patch_size)); /* Make sure the output data fits in the offchip buffer */ - *num_patches = MIN2(*num_patches, SI_TESS_OFFCHIP_BLOCK_SIZE / - output_patch_size); + *num_patches = MIN2(*num_patches, + (sctx->screen->tess_offchip_block_dw_size * 4) / + output_patch_size); /* Not necessary for correctness, but improves performance. The * specific value is taken from the proprietary driver. diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c index 89490bd..9aa4a7c 100644 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c @@ -1798,9 +1798,38 @@ static bool si_update_spi_tmpring_size(struct si_context *sctx) static void si_init_tess_factor_ring(struct si_context *sctx) { - unsigned offchip_blocks = sctx->b.chip_class >= CIK ? 256 : 64; - assert(!sctx->tf_ring); + bool double_offchip_buffers = sctx->b.chip_class >= CIK; + unsigned max_offchip_buffers_per_se = double_offchip_buffers ? 
128 : 64; + unsigned max_offchip_buffers = max_offchip_buffers_per_se * + sctx->screen->b.info.max_se; + unsigned offchip_granularity; + + switch (sctx->screen->tess_offchip_block_dw_size) { + default: + assert(0); + /* fall through */ + case 8192: + offchip_granularity = V_03093C_X_8K_DWORDS; + break; + case 4096: + offchip_granularity = V_03093C_X_4K_DWORDS; + break; + } + switch (sctx->b.chip_class) { + case SI: + max_offchip_buffers = MIN2(max_offchip_buffers, 126); + break; + case CIK: + max_offchip_buffers = MIN2(max_offchip_buffers, 508); + break; + case VI: + default: + max_offchip_buffers = MIN2(max_offchip_buffers, 512); + break; + } + + assert(!sctx->tf_ring); sctx->tf_ring = pipe_buffer_create(sctx->b.b.screen, PIPE_BIND_CUSTOM, PIPE_USAGE_DEFAULT, 32768 * sctx->screen->b.info.max_se); @@ -1812,8
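The clamping arithmetic in the patch above can be sketched in isolation. This is only an illustration under stated assumptions: the helper names `clamp_num_patches` and `max_offchip_buffers` are hypothetical and do not exist in radeonsi, and the per-generation cap is passed in as a plain number (126, 508 or 512 in the patch) rather than derived from `chip_class`.

```c
#include <assert.h>

#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical helper: limit the patch count so that the output data fits
 * in one off-chip block. The block size is given in dwords (4 bytes each),
 * mirroring tess_offchip_block_dw_size in the patch. */
unsigned clamp_num_patches(unsigned num_patches,
                           unsigned block_dw_size,
                           unsigned output_patch_size)
{
   assert(output_patch_size > 0);
   return MIN2(num_patches, (block_dw_size * 4) / output_patch_size);
}

/* Hypothetical helper: per-SE buffer limit times the number of shader
 * engines, clamped to a per-generation cap. */
unsigned max_offchip_buffers(int double_offchip_buffers,
                             unsigned max_se,
                             unsigned generation_cap)
{
   unsigned per_se = double_offchip_buffers ? 128 : 64;
   return MIN2(per_se * max_se, generation_cap);
}
```

With an assumed output patch size of 1024 bytes, an 8192-dword block holds 32 patches and the 4096-dword Hawaii block holds 16.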
[Mesa-dev] [PATCH 2/4] radeonsi: enable CU0 in each SE for LS-HS execution
From: Marek OlšákOffchip-only tessellation allows this. --- src/gallium/drivers/radeonsi/si_state.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c index b21fa5c..54febce 100644 --- a/src/gallium/drivers/radeonsi/si_state.c +++ b/src/gallium/drivers/radeonsi/si_state.c @@ -3829,6 +3829,7 @@ static void si_init_config(struct si_context *sctx) si_pm4_set_reg(pm4, R_028408_VGT_INDX_OFFSET, 0); if (sctx->b.chip_class >= CIK) { + si_pm4_set_reg(pm4, R_00B51C_SPI_SHADER_PGM_RSRC3_LS, S_00B51C_CU_EN(0x)); si_pm4_set_reg(pm4, R_00B41C_SPI_SHADER_PGM_RSRC3_HS, 0); si_pm4_set_reg(pm4, R_00B31C_SPI_SHADER_PGM_RSRC3_ES, S_00B31C_CU_EN(0x)); si_pm4_set_reg(pm4, R_00B21C_SPI_SHADER_PGM_RSRC3_GS, S_00B21C_CU_EN(0x)); @@ -3841,7 +3842,6 @@ static void si_init_config(struct si_context *sctx) * * LATE_ALLOC_VS = 2 is the highest safe number. */ - si_pm4_set_reg(pm4, R_00B51C_SPI_SHADER_PGM_RSRC3_LS, S_00B51C_CU_EN(0x)); si_pm4_set_reg(pm4, R_00B118_SPI_SHADER_PGM_RSRC3_VS, S_00B118_CU_EN(0x)); si_pm4_set_reg(pm4, R_00B11C_SPI_SHADER_LATE_ALLOC_VS, S_00B11C_LIMIT(2)); } else { @@ -3850,7 +3850,6 @@ static void si_init_config(struct si_context *sctx) * - VS can't execute on CU0. * - If HS writes outputs to LDS, LS can't execute on CU0. */ - si_pm4_set_reg(pm4, R_00B51C_SPI_SHADER_PGM_RSRC3_LS, S_00B51C_CU_EN(0xfffe)); si_pm4_set_reg(pm4, R_00B118_SPI_SHADER_PGM_RSRC3_VS, S_00B118_CU_EN(0xfffe)); si_pm4_set_reg(pm4, R_00B11C_SPI_SHADER_LATE_ALLOC_VS, S_00B11C_LIMIT(31)); } -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] i965: Refactor intel_get_param()
Replace the function's __DRIscreen parameter with struct intel_screen.
The callsites feel more natural that way.
---
 src/mesa/drivers/dri/i965/intel_screen.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
index 869119b..b693c45 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -970,7 +970,7 @@ static const __DRIextension *intelRobustScreenExtensions[] = {
 };
 
 static int
-intel_get_param(__DRIscreen *psp, int param, int *value)
+intel_get_param(struct intel_screen *screen, int param, int *value)
 {
    int ret;
    struct drm_i915_getparam gp;
@@ -979,7 +979,8 @@ intel_get_param(__DRIscreen *psp, int param, int *value)
    gp.param = param;
    gp.value = value;
 
-   ret = drmCommandWriteRead(psp->fd, DRM_I915_GETPARAM, &gp, sizeof(gp));
+   ret = drmCommandWriteRead(screen->driScrnPriv->fd,
+                             DRM_I915_GETPARAM, &gp, sizeof(gp));
    if (ret < 0 && ret != -EINVAL)
       _mesa_warning(NULL, "drm_i915_getparam: %d", ret);
 
@@ -987,10 +988,10 @@ intel_get_param(__DRIscreen *psp, int param, int *value)
 }
 
 static bool
-intel_get_boolean(__DRIscreen *psp, int param)
+intel_get_boolean(struct intel_screen *screen, int param)
 {
    int value = 0;
-   return (intel_get_param(psp, param, &value) == 0) && value;
+   return (intel_get_param(screen, param, &value) == 0) && value;
 }
 
 static void
@@ -1125,12 +1126,12 @@ intel_detect_sseu(struct intel_screen *intelScreen)
    intelScreen->subslice_total = -1;
    intelScreen->eu_total = -1;
 
-   ret = intel_get_param(intelScreen->driScrnPriv, I915_PARAM_SUBSLICE_TOTAL,
+   ret = intel_get_param(intelScreen, I915_PARAM_SUBSLICE_TOTAL,
                          &intelScreen->subslice_total);
    if (ret < 0 && ret != -EINVAL)
       goto err_out;
 
-   ret = intel_get_param(intelScreen->driScrnPriv,
+   ret = intel_get_param(intelScreen,
                          I915_PARAM_EU_TOTAL, &intelScreen->eu_total);
    if (ret < 0 && ret != -EINVAL)
       goto err_out;
@@ -1167,7 +1168,7 @@ intel_init_bufmgr(struct intel_screen *intelScreen)
 
    drm_intel_bufmgr_gem_enable_fenced_relocs(intelScreen->bufmgr);
 
-   if (!intel_get_boolean(spriv, I915_PARAM_HAS_RELAXED_DELTA)) {
+   if (!intel_get_boolean(intelScreen, I915_PARAM_HAS_RELAXED_DELTA)) {
       fprintf(stderr, "[%s: %u] Kernel 2.6.39 required.\n", __func__, __LINE__);
       return false;
    }
-- 
2.9.0.rc2
[Mesa-dev] [PATCH 0/3] i965: Cleanups for DRM_IOCTL_I915_GETPARAM
I've begun investigating Android sync fds, whose support will be advertised
with a new i915 getparam. While investigating the new feature, I wrote this
little cleanup series.

Chad Versace (3):
  i965: Refactor intel_get_param()
  i965: Use drmIoctl for DRM_I915_GETPARAM
  i965: Use intel_get_param() more often

 src/mesa/drivers/dri/i965/intel_screen.c | 30 --
 1 file changed, 12 insertions(+), 18 deletions(-)

-- 
2.9.0.rc2
[Mesa-dev] [PATCH 2/3] i965: Use drmIoctl for DRM_I915_GETPARAM
Stop using drmCommandWriteRead for such a simple ioctl.
---
 src/mesa/drivers/dri/i965/intel_screen.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
index b693c45..f7f806e 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -979,8 +979,7 @@ intel_get_param(struct intel_screen *screen, int param, int *value)
    gp.param = param;
    gp.value = value;
 
-   ret = drmCommandWriteRead(screen->driScrnPriv->fd,
-                             DRM_I915_GETPARAM, &gp, sizeof(gp));
+   ret = drmIoctl(screen->driScrnPriv->fd, DRM_IOCTL_I915_GETPARAM, &gp);
    if (ret < 0 && ret != -EINVAL)
       _mesa_warning(NULL, "drm_i915_getparam: %d", ret);
 
-- 
2.9.0.rc2
[Mesa-dev] [PATCH 3/3] i965: Use intel_get_param() more often
Replace some open-coded ioctls with intel_get_param().
This is just a cleanup. No change in behavior.
---
 src/mesa/drivers/dri/i965/intel_screen.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
index f7f806e..4194fd6 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1604,12 +1604,10 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp)
          (ret != -1 || errno != EINVAL);
    }
 
-   struct drm_i915_getparam getparam;
-   getparam.param = I915_PARAM_CMD_PARSER_VERSION;
-   getparam.value = &intelScreen->cmd_parser_version;
-   const int ret = drmIoctl(psp->fd, DRM_IOCTL_I915_GETPARAM, &getparam);
-   if (ret == -1)
+   if (intel_get_param(intelScreen, I915_PARAM_CMD_PARSER_VERSION,
+                       &intelScreen->cmd_parser_version) < 0) {
       intelScreen->cmd_parser_version = 0;
+   }
 
    /* Haswell requires command parser version 6 in order to write to the
     * MI_MATH GPR registers, and version 7 in order to use
@@ -1629,12 +1627,8 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp)
    intelScreen->program_id = 1;
 
    if (intelScreen->devinfo->has_resource_streamer) {
-      int val = -1;
-      getparam.param = I915_PARAM_HAS_RESOURCE_STREAMER;
-      getparam.value = &val;
-
-      drmIoctl(psp->fd, DRM_IOCTL_I915_GETPARAM, &getparam);
-      intelScreen->has_resource_streamer = val > 0;
+      intelScreen->has_resource_streamer =
+        intel_get_boolean(intelScreen, I915_PARAM_HAS_RESOURCE_STREAMER);
    }
 
    return (const __DRIconfig**) intel_screen_make_configs(psp);
-- 
2.9.0.rc2
Re: [Mesa-dev] [PATCH] Make single-buffered GLES representation internally consistent
Hi Ilia,

Setting it correctly initially is more messy. At least in my use case, we
know the context type from EGL_RENDERABLE_TYPE before the framebuffer is
created. We would need to add the context information to the visual used by
_mesa_initialize_window_framebuffer. That requires including main/mtypes.h
in the EGL part of the source tree, which nobody else does and leads to
build system issues.

We could also make the change in _mesa_make_current instead of get.c, but
once again we'll be flipping the original value. I'll send a modified patch
shortly unless somebody has any other ideas.

On Mon, Jun 27, 2016 at 7:55 PM, Ilia Mirkin wrote:
> On Mon, Jun 27, 2016 at 6:30 PM, Gurchetan Singh wrote:
> > Hi Ilia,
> >
> > The changes for get.c were prompted by the es3fIntegerStateQueryTests (see
> > modules/gles3/functional/es3fIntegerStateQueryTests.cpp in the dEQP tree).
> > Specifically, these few lines:
> >
> >>> const GLint validInitialValues[] = {GL_BACK, GL_NONE};
> >>> m_verifier->verifyIntegerAnyOf(m_testCtx, GL_READ_BUFFER,
> >>> validInitialValues, DE_LENGTH_OF_ARRAY(validInitialValues));
> >>> expectError(GL_NO_ERROR);
> >
> > We initially set ColorReadBuffer to GL_FRONT in
> > _mesa_initialize_window_framebuffer for single-buffered configs.
>
> So ... could we initialize it to GL_BACK for GLES and avoid this pain?
> Unfortunately I have no idea what the implications of that would be.
>
> >
> > We could also make sure the context is single-buffered in get.c to further
> > avoid bugs. Let me know if that works for you and I'll send a modified
> > patch.
> >
> > I do agree it is a bit hacky ... I'd definitely be interested in alternative
> > solutions.
>
> If you're flipping the value in the getter, you might as well set that
> to be the value from the very beginning. However I don't know what the
> effects of that are.
>
> -ilia
Re: [Mesa-dev] [PATCH] st/mesa: set the new pipe_surface::alpha_one field for RGB surfaces
As I wrote in my other message yesterday, I was going to disable support in our VMware driver for the RGBX8 formats because of difficulties with ARB_copy_image functionality. This led to the issue of blending to RGBA surfaces as if they were RGBX. I guess we could set pipe_surface::format = RGBX for this case. Though this would probably lead to some special-case code in the driver(s) (but probably comparable to what I had done for 'alpha_one'.) The issue is when a driver says it can't support RGBX formats, it would still need to be prepared to handle some of those formats in the pipe_surface::format field. As it is now, when a driver says it can't support a particular format, it really means it and is probably unprepared to see it anywhere. So, if we set pipe_surface::format = RGBX in the state tracker, there's some regression risk across all drivers. The flag I proposed wouldn't have that risk. Anyway, I think I've found work-arounds in our driver to keep RGBX support so that this patch isn't needed after all. I just have to finish more piglit testing. -Brian On 06/28/2016 09:11 AM, Marek Olšák wrote: I guess you need this because your driver doesn't support LUMINANCE and st/mesa selects RGBA, right? In that case, you can just set RGBX in pipe_surface::format and you don't need another flag. It would be better to select RGBX at renderbuffer creation, but doing it later is fine as well. Marek On Fri, Jun 24, 2016 at 4:43 PM, Brian Paulwrote: This indicates the alpha channel of the surface should always be one. Drivers can use this to adjust blending terms when needed. 
v2: also check for R, RG, LUMINANCE surfaces, per Ilia --- src/mesa/state_tracker/st_cb_fbo.c | 9 + 1 file changed, 9 insertions(+) diff --git a/src/mesa/state_tracker/st_cb_fbo.c b/src/mesa/state_tracker/st_cb_fbo.c index 9801b1f..843ff83 100644 --- a/src/mesa/state_tracker/st_cb_fbo.c +++ b/src/mesa/state_tracker/st_cb_fbo.c @@ -216,6 +216,11 @@ st_renderbuffer_alloc_storage(struct gl_context * ctx, return FALSE; u_surface_default_template(_tmpl, strb->texture); + surf_tmpl.alpha_one = (strb->Base._BaseFormat == GL_RGB || + strb->Base._BaseFormat == GL_RG || + strb->Base._BaseFormat == GL_R || + strb->Base._BaseFormat == GL_LUMINANCE); + strb->surface = pipe->create_surface(pipe, strb->texture, _tmpl); @@ -463,6 +468,10 @@ st_update_renderbuffer_surface(struct st_context *st, /* create a new pipe_surface */ struct pipe_surface surf_tmpl; memset(_tmpl, 0, sizeof(surf_tmpl)); + surf_tmpl.alpha_one = (strb->Base._BaseFormat == GL_RGB || + strb->Base._BaseFormat == GL_RG || + strb->Base._BaseFormat == GL_R || + strb->Base._BaseFormat == GL_LUMINANCE); surf_tmpl.format = format; surf_tmpl.u.tex.level = level; surf_tmpl.u.tex.first_layer = first_layer; -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev=CwIBaQ=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=T0t4QG7chq2ZwJo6wilkFznRSFy-8uDKartPGbomVj8=aRQY4-PdtA1sKl095cbVP0IOaCsr4WAgTK9bl_Loek0=vyrVquTCLd-SUurntQWZk5fNrQUvyVGdEzFB8q5kQ_k= ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: adds gen7_emit_cs_stall_flush on intel_texture_barrier
On Tue, Jun 28, 2016 at 11:46 AM, Alejandro Piñeirowrote: > Fixes: > GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass > > On Haswell, Broadwell and Skylake (note that in order to execute > that test, it is needed to override GL and GLSL versions). > > I was not able to find a documentation reference that justifies it. > --- > > Having said, I didn't find a documentation reference explicitly > mention that this is needed. > > Initially I thought that a flag was missing when calling > emit_pipe_control_flush at brw_emit_mi_flush, but it was not the case > as far as I saw. Then I noted that there is a gen6 workaround on that > code: > > if (brw->gen == 6) { > /* Hardware workaround: SNB B-Spec says: > * > * [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache > * Flush Enable =1, a PIPE_CONTROL with any non-zero > * post-sync-op is required. > */ > brw_emit_post_sync_nonzero_flush(brw); > } > > I tested calling that method for any gen, guessing if the workaround > was needed also for other gens, and the test got fixed. But looking at > the documentation of other gens, I didn't find the need for this > workaround. For that reason I moved to use gen7_emit_cs_stall, that is > less agressive and get the test fixed too. It seems that in order to > get a complete flush you need a cs stall flush with a > pipe_control_write. But again, I didn't find any reference at the PRMs > confirming it. > > Intuitively, this would be needed on brw_emit_mi_flush or even at > brw_emit_pipe_control_flush (this one already include some > gen-specific workarounds), but I prefered to keep it on the only place > that seems to need it for now. > > In addition to solve that CTS test, it also gets it passing for the > test I recently sent to the piglit list, and not included on master > yet (acked for now): > https://lists.freedesktop.org/archives/piglit/2016-June/020055.html > > That piglit patch adds 48 parameter combination for the basic > test. Without this mesa patch 5-6 subtests fails. 
With this patch all > of them passes. Tested on Haswell, Broadwell and Skylake too. > > src/mesa/drivers/dri/i965/intel_tex.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/mesa/drivers/dri/i965/intel_tex.c > b/src/mesa/drivers/dri/i965/intel_tex.c > index cac33ac..e7459cd 100644 > --- a/src/mesa/drivers/dri/i965/intel_tex.c > +++ b/src/mesa/drivers/dri/i965/intel_tex.c > @@ -362,6 +362,7 @@ intel_texture_barrier(struct gl_context *ctx) > { > struct brw_context *brw = brw_context(ctx); > > + gen7_emit_cs_stall_flush(brw); > brw_emit_mi_flush(brw); Without commenting on exactly what these do, what texture barrier *should* do is (1) wait for all previous draws to complete (since they may be in the process of filling caches with "old" data) (2) flush texture caches If you flush caches without waiting first, then a draw currently in progress may continue dirtying them with the "bad" data. As I said, however, I have no idea what either of the above functions *really* do, or what forms of parallelism are possible on intel hw. Hopefully the above comments will help someone with the proper knowledge evaluate whether this or a different change is necessary. -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: adds gen7_emit_cs_stall_flush on intel_texture_barrier
Fixes: GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass On Haswell, Broadwell and Skylake (note that in order to execute that test, it is needed to override GL and GLSL versions). I was not able to find a documentation reference that justifies it. --- Having said, I didn't find a documentation reference explicitly mention that this is needed. Initially I thought that a flag was missing when calling emit_pipe_control_flush at brw_emit_mi_flush, but it was not the case as far as I saw. Then I noted that there is a gen6 workaround on that code: if (brw->gen == 6) { /* Hardware workaround: SNB B-Spec says: * * [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache * Flush Enable =1, a PIPE_CONTROL with any non-zero * post-sync-op is required. */ brw_emit_post_sync_nonzero_flush(brw); } I tested calling that method for any gen, guessing if the workaround was needed also for other gens, and the test got fixed. But looking at the documentation of other gens, I didn't find the need for this workaround. For that reason I moved to use gen7_emit_cs_stall, that is less agressive and get the test fixed too. It seems that in order to get a complete flush you need a cs stall flush with a pipe_control_write. But again, I didn't find any reference at the PRMs confirming it. Intuitively, this would be needed on brw_emit_mi_flush or even at brw_emit_pipe_control_flush (this one already include some gen-specific workarounds), but I prefered to keep it on the only place that seems to need it for now. In addition to solve that CTS test, it also gets it passing for the test I recently sent to the piglit list, and not included on master yet (acked for now): https://lists.freedesktop.org/archives/piglit/2016-June/020055.html That piglit patch adds 48 parameter combination for the basic test. Without this mesa patch 5-6 subtests fails. With this patch all of them passes. Tested on Haswell, Broadwell and Skylake too. 
src/mesa/drivers/dri/i965/intel_tex.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/drivers/dri/i965/intel_tex.c b/src/mesa/drivers/dri/i965/intel_tex.c index cac33ac..e7459cd 100644 --- a/src/mesa/drivers/dri/i965/intel_tex.c +++ b/src/mesa/drivers/dri/i965/intel_tex.c @@ -362,6 +362,7 @@ intel_texture_barrier(struct gl_context *ctx) { struct brw_context *brw = brw_context(ctx); + gen7_emit_cs_stall_flush(brw); brw_emit_mi_flush(brw); } -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/mesa: set the new pipe_surface::alpha_one field for RGB surfaces
On Tue, Jun 28, 2016 at 5:16 PM, Ilia Mirkin wrote:
> The main issue is when the st selects an alpha-ful format, but the GL
> wants an alpha-less format. The driver has no way of knowing. This
> gives it a way of knowing.
>
> The alternative is that the driver has to support every format and we
> drop all the fallbacks from st_format. Brian's proposal seems like a
> simpler solution. (This happens on nvc0, for example - RGB10A2 is
> supported, but RGB10X2 isn't. So the st picks RGB10A2 and nvc0 is none
> the wiser - until someone tries to do DST_ALPHA blending.)

Note that no hardware supports RGBX fully as pipe_surface. Radeon also
only supports RGBA and there is a state to force DST_ALPHA to one.
That's enough to pass all tests.

The idea is to treat RGBX as RGBA in all places except blending, and
pipe_surface can already describe that.

Marek
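For hardware without a dedicated "force DST_ALPHA to one" state, the same effect could be approximated by rewriting blend factors. The following is a minimal sketch of that idea only. The enum and the function `fixup_blend_factor` are hypothetical names invented for this illustration, not Gallium's `PIPE_BLENDFACTOR_*` values or any driver's actual code.

```c
#include <assert.h>

/* Hypothetical blend-factor enum for this sketch. */
enum blend_factor {
   FACTOR_ZERO,
   FACTOR_ONE,
   FACTOR_SRC_ALPHA,
   FACTOR_DST_ALPHA,
   FACTOR_INV_DST_ALPHA
};

/* If the API-visible format has no alpha (RGBX) but the surface is stored
 * as RGBA, destination alpha must behave as 1.0 during blending:
 *   DST_ALPHA     -> ONE
 *   INV_DST_ALPHA -> ZERO
 * Every other factor is left untouched. */
enum blend_factor fixup_blend_factor(enum blend_factor factor,
                                     int dst_alpha_is_one)
{
   if (!dst_alpha_is_one)
      return factor;

   switch (factor) {
   case FACTOR_DST_ALPHA:
      return FACTOR_ONE;
   case FACTOR_INV_DST_ALPHA:
      return FACTOR_ZERO;
   default:
      return factor;
   }
}
```

Everything outside blending keeps treating the surface as plain RGBA, which matches the "RGBX as RGBA in all places except blending" approach described in the reply.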
Re: [Mesa-dev] [PATCH 2/2] clover: fix getting struct args api size
On Thu, 2016-06-23 at 18:03 -0700, Francisco Jerez wrote: > Jan Veselywrites: > > > On Wed, 2016-06-22 at 20:22 -0700, Francisco Jerez wrote: > > > Jan Vesely writes: > > > > > > > On Wed, 2016-06-22 at 17:07 -0700, Francisco Jerez wrote: > > > > > Jan Vesely writes: > > > > > > > > > > > On Mon, 2016-06-13 at 17:24 -0700, Francisco Jerez wrote: > > > > > > > Serge Martin writes: > > > > > > > > > > > > > > > This fix getting the size of a struct arg. vec3 types > > > > > > > > still > > > > > > > > work > > > > > > > > ok. > > > > > > > > Only buit-in args need to have power of two alignment, > > > > > > > > getTypeAllocSize > > > > > > > > reports the correct size. > > > > > > > > --- > > > > > > > > src/gallium/state_trackers/clover/llvm/invocation.cpp > > > > > > > > | 3 > > > > > > > > ++- > > > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > > diff --git > > > > > > > > a/src/gallium/state_trackers/clover/llvm/invocation.cpp > > > > > > > > b/src/gallium/state_trackers/clover/llvm/invocation.cpp > > > > > > > > index 03487d6..9af51539 100644 > > > > > > > > --- > > > > > > > > a/src/gallium/state_trackers/clover/llvm/invocation.cpp > > > > > > > > +++ > > > > > > > > b/src/gallium/state_trackers/clover/llvm/invocation.cpp > > > > > > > > @@ -472,7 +472,8 @@ namespace { > > > > > > > > // aligned to the next larger power of > > > > > > > > two". We > > > > > > > > need > > > > > > > > this > > > > > > > > // alignment for three element vectors, which > > > > > > > > have > > > > > > > > // non-power-of-2 store size. > > > > > > > > - const unsigned arg_api_size = > > > > > > > > util_next_power_of_two(arg_store_size); > > > > > > > > + const unsigned arg_api_size = arg_type- > > > > > > > > > isStructTy() > > > > > > > > ? > > > > > > > > + arg_store_size : > > > > > > > > util_next_power_of_two(arg_store_size); > > > > > > > > > > > > > > > Hm... 
Isn't this still going to be broken if you pass a > > > > > > > struct > > > > > > > argument > > > > > > > to a kernel function and the alignment of any of the > > > > > > > struct > > > > > > > members > > > > > > > doesn't match the target-specific data layout? Not sure > > > > > > > we > > > > > > > can > > > > > > > fix > > > > > > > this > > > > > > > sensibly without requiring the target's data layout to > > > > > > > match > > > > > > > the > > > > > > > CL > > > > > > > API > > > > > > > exactly. Any suggestions Tom? > > > > > > > > > > > > according to 6.7.2.1 compilers can arbitrarily insert > > > > > > padding > > > > > > between > > > > > > struct members (except at the beginning). > > > > > > > > > > What spec version are you looking at? My CL spec doesn't > > > > > have > > > > > any > > > > > section labeled 6.7.2.1. > > > > > > > > c99 specs, I did not find anything specific for CLC (it might > > > > be > > > > that I > > > > just need to look harder). CLC 2.0 adds additional constraint > > > > that > > > > you > > > > can't use address space qualifiers. > > > > > > > > > > I'd expect that whatever the CL spec says regarding the memory > > > layout > > > of > > > CLC types (e.g. section 6.1.5 which specifies the usual alignment > > > rules > > > for CL types and section 6.11.1 and 6.11.3 which specify various > > > variable and type declaration attributes giving finer control > > > over > > > the > > > alignment of variable and struct member declarations) fully > > > overrides > > > the C99 spec. > > > > Right, even if we consider that none of the C99 6.7.2.1 apply (and > > at > > least CL2.0 6.5.6 does not make it sound so), it only gives us one > > side, we can check that the CLC struct layout follows what we would > > expect. We don't have means to check and enforce that the host side > > struct layout is compatible. 
> > > > Yes, exactly, the CL spec doesn't have anything to say about the > host-side memory layout, that's up to the host platform's ABI to > define. > > > > > > > > > > > > > > > Even if size/alignment of individual members match CL API > > > > > > exactly, > > > > > > there's no guarantee that the structure layout/size will be > > > > > > the > > > > > > same. > > > > > > > > > > > How can you exchange structured data with a CL kernel then, > > > > > assuming > > > > > that the layout of structure types in memory is fully > > > > > unspecified > > > > > as > > > > > you > > > > > say? > > > > > > > > that is my point. My understanding is that it relies on a > > > > silent > > > > assumption that both CLC and the host compiler will create the > > > > same > > > > structure layout given the same structure elements. > > > > > > > > big endian host can create: > > > > struct foo { > > > > cl_int a; > > > > // 16 bit padding; > > > > cl_short b; > > > > cl_int c; > > > > }; >
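The layout question raised in this thread can be checked on the host side with `offsetof` and `sizeof`. The sketch below is an assumption-laden illustration: it substitutes `int32_t` and `int16_t` for `cl_int` and `cl_short` so that it builds without the OpenCL headers, and the helper `foo_padding` is a made-up name. It shows only what one host compiler chose, not what the device-side CLC compiler will choose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the struct discussed in the thread, with fixed-width
 * types in place of cl_int / cl_short. */
struct foo {
   int32_t a;
   int16_t b;
   int32_t c;
};

/* Bytes of padding this compiler inserted: the struct size minus the
 * sum of the member sizes (4 + 2 + 4). */
size_t foo_padding(void)
{
   return sizeof(struct foo) -
          (sizeof(int32_t) + sizeof(int16_t) + sizeof(int32_t));
}
```

On many common ABIs `c` lands at offset 8 and the struct occupies 12 bytes, but nothing here guarantees that a different compiler or target pads in the same place, which is the point made in the discussion.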
Re: [Mesa-dev] [PATCH 1/2] glsl: add driconf to zero-init unintialized vars
On Mon, Jun 27, 2016 at 9:28 PM, Rob Clarkwrote: > On Mon, Jun 27, 2016 at 3:06 PM, Kenneth Graunke > wrote: >> On Monday, June 27, 2016 11:43:28 AM PDT Matt Turner wrote: >>> On Mon, Jun 27, 2016 at 4:44 AM, Rob Clark wrote: >>> > On Mon, Jun 27, 2016 at 7:13 AM, Alan Swanson >>> > wrote: >>> >> On 2016-06-25 13:37, Rob Clark wrote: >>> >>> >>> >>> Some games are sloppy.. perhaps because it is defined behavior for DX or >>> >>> perhaps because nv blob driver defaults things to zero. >>> >>> >>> >>> So add driconf param to force uninitialized variables to default to >>> >>> zero. >>> >>> >>> >>> This issue was observed with rust, from steam store. But has surfaced >>> >>> elsewhere in the past. >>> >>> >>> >>> Signed-off-by: Rob Clark >>> >>> --- >>> >>> Note that I left out the drirc bit, since not entirely sure how to >>> >>> identify this game. (I don't actually have the game, just working off >>> >>> of an apitrace) >>> >>> >>> >>> Possibly worth mentioning that for the shaders using uninitialized vars >>> >>> having zero-initializers lets constant-propagation get rid of a whole >>> >>> lot of instructions. One shader I saw dropped to less than half of >>> >>> it's original instruction count. >>> >> >>> >> >>> >> If the default for uninitialised variables is undefined, then with the >>> >> reported shader optimisations why bother with the (DRI) option when >>> >> zeroing could still essentially be classed as undefined? >>> >> >>> >> Cuts the patch down to just the src/compiler/glsl/ast_to_hir.cpp change. >>> > >>> > I did suggest that on #dri-devel, but Jason had a theoretical example >>> > where it would hurt.. iirc something like: >>> > >>> > float maybe_undef; >>> > for (int i = 0; i < some_uniform_at_least_one; i++) >>> > maybe_undef = ... >>> > >>> > also, he didn't want to hide shader bugs that app should fix. >>> > >>> > It would be interesting to rush shaderdb w/ glsl_zero_init=true and >>> > see what happens, but I didn't get around to that yet. 
>>> >>> Here's what I get on i965. It's not a clear win. >>> >>> total instructions in shared programs: 5249030 -> 5249002 (-0.00%) >>> instructions in affected programs: 28936 -> 28908 (-0.10%) >>> helped: 66 >>> HURT: 132 >>> >>> total cycles in shared programs: 57966694 -> 57956306 (-0.02%) >>> cycles in affected programs: 1136118 -> 1125730 (-0.91%) >>> helped: 78 >>> HURT: 106 >> >> I suspect most of the help is because we're missing undef optimizations, >> such as CSE...while zero could be CSE'd. (I have a patch, but it hurts >> things too...) > > right, I was thinking that treating undef as zero in constant-folding > would have the same effect.. ofc it might make shader bugs less > obvious. > > Btw, does anyone know what fglrx does? Afaiu nv blob treats undef as > zero. If fglrx does the same, I suppose that strengthens the argument > for "just do this unconditionally". No idea what fglrx does, but LLVM does eliminate code with undefined inputs. Initializing everything to 0 might make that worse. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/mesa: set the new pipe_surface::alpha_one field for RGB surfaces
The main issue is when the st selects an alpha-ful format, but the GL wants an alpha-less format. The driver has no way of knowing. This gives it a way of knowing. The alternative is that the driver has to support every format and we drop all the fallbacks from st_format. Brian's proposal seems like a simpler solution. (This happens on nvc0, for example - RGB10A2 is supported, but RGB10X2 isn't. So the st picks RGB10A2 and nvc0 is none the wiser - until someone tries to do DST_ALPHA blending.) -ilia On Tue, Jun 28, 2016 at 11:11 AM, Marek Olšákwrote: > I guess you need this because your driver doesn't support LUMINANCE > and st/mesa selects RGBA, right? In that case, you can just set RGBX > in pipe_surface::format and you don't need another flag. > > It would be better to select RGBX at renderbuffer creation, but doing > it later is fine as well. > > Marek > > On Fri, Jun 24, 2016 at 4:43 PM, Brian Paul wrote: >> This indicates the alpha channel of the surface should always be one. >> Drivers can use this to adjust blending terms when needed. 
>> >> v2: also check for R, RG, LUMINANCE surfaces, per Ilia >> --- >> src/mesa/state_tracker/st_cb_fbo.c | 9 + >> 1 file changed, 9 insertions(+) >> >> diff --git a/src/mesa/state_tracker/st_cb_fbo.c >> b/src/mesa/state_tracker/st_cb_fbo.c >> index 9801b1f..843ff83 100644 >> --- a/src/mesa/state_tracker/st_cb_fbo.c >> +++ b/src/mesa/state_tracker/st_cb_fbo.c >> @@ -216,6 +216,11 @@ st_renderbuffer_alloc_storage(struct gl_context * ctx, >>return FALSE; >> >> u_surface_default_template(_tmpl, strb->texture); >> + surf_tmpl.alpha_one = (strb->Base._BaseFormat == GL_RGB || >> + strb->Base._BaseFormat == GL_RG || >> + strb->Base._BaseFormat == GL_R || >> + strb->Base._BaseFormat == GL_LUMINANCE); >> + >> strb->surface = pipe->create_surface(pipe, >> strb->texture, >> _tmpl); >> @@ -463,6 +468,10 @@ st_update_renderbuffer_surface(struct st_context *st, >>/* create a new pipe_surface */ >>struct pipe_surface surf_tmpl; >>memset(_tmpl, 0, sizeof(surf_tmpl)); >> + surf_tmpl.alpha_one = (strb->Base._BaseFormat == GL_RGB || >> + strb->Base._BaseFormat == GL_RG || >> + strb->Base._BaseFormat == GL_R || >> + strb->Base._BaseFormat == GL_LUMINANCE); >>surf_tmpl.format = format; >>surf_tmpl.u.tex.level = level; >>surf_tmpl.u.tex.first_layer = first_layer; >> -- >> 1.9.1 >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/mesa: set the new pipe_surface::alpha_one field for RGB surfaces
I guess you need this because your driver doesn't support LUMINANCE and st/mesa selects RGBA, right? In that case, you can just set RGBX in pipe_surface::format and you don't need another flag. It would be better to select RGBX at renderbuffer creation, but doing it later is fine as well. Marek On Fri, Jun 24, 2016 at 4:43 PM, Brian Paulwrote: > This indicates the alpha channel of the surface should always be one. > Drivers can use this to adjust blending terms when needed. > > v2: also check for R, RG, LUMINANCE surfaces, per Ilia > --- > src/mesa/state_tracker/st_cb_fbo.c | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/src/mesa/state_tracker/st_cb_fbo.c > b/src/mesa/state_tracker/st_cb_fbo.c > index 9801b1f..843ff83 100644 > --- a/src/mesa/state_tracker/st_cb_fbo.c > +++ b/src/mesa/state_tracker/st_cb_fbo.c > @@ -216,6 +216,11 @@ st_renderbuffer_alloc_storage(struct gl_context * ctx, >return FALSE; > > u_surface_default_template(_tmpl, strb->texture); > + surf_tmpl.alpha_one = (strb->Base._BaseFormat == GL_RGB || > + strb->Base._BaseFormat == GL_RG || > + strb->Base._BaseFormat == GL_R || > + strb->Base._BaseFormat == GL_LUMINANCE); > + > strb->surface = pipe->create_surface(pipe, > strb->texture, > _tmpl); > @@ -463,6 +468,10 @@ st_update_renderbuffer_surface(struct st_context *st, >/* create a new pipe_surface */ >struct pipe_surface surf_tmpl; >memset(_tmpl, 0, sizeof(surf_tmpl)); > + surf_tmpl.alpha_one = (strb->Base._BaseFormat == GL_RGB || > + strb->Base._BaseFormat == GL_RG || > + strb->Base._BaseFormat == GL_R || > + strb->Base._BaseFormat == GL_LUMINANCE); >surf_tmpl.format = format; >surf_tmpl.u.tex.level = level; >surf_tmpl.u.tex.first_layer = first_layer; > -- > 1.9.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa/st: Include nir.h for nir_shader symbol.
On Mon, Jun 27, 2016 at 10:08 PM, Matt Turner wrote:
> On Mon, Jun 27, 2016 at 6:45 PM, Vinson Lee wrote:
>> Fix this build error with GCC 4.4.
>>
>>   CC state_tracker/st_nir_lower_builtin.lo
>> In file included from state_tracker/st_nir_lower_builtin.c:61:
>> state_tracker/st_nir.h:34: error: redefinition of typedef ‘nir_shader’
>> ../../src/compiler/nir/nir.h:1830: note: previous declaration of ‘nir_shader’ was here
>
> This error seems to imply that nir.h is already being included somehow.
>
> Does just removing the typedef solve the problem? Can we figure out
> how nir.h is already being included and remove that?

nir.h is coming from st_nir_lower_builtin.c, which #includes st_nir.h.

Perhaps the thing to do is drop the typedef, forward-declare 'struct nir_shader', and use 'struct nir_shader' instead of 'nir_shader' in st_nir.h.

Half of the world already gets recompiled when you touch nir.h, and I'd rather not make that worse.

BR,
-R
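The forward-declaration approach suggested in this reply could look roughly like the sketch below. It is only an illustration: `st_example_has_shader` is a made-up helper, not something that exists in st_nir.h, and the real prototypes in that header will differ. The point is that a bare `struct nir_shader;` declaration may be repeated freely, while a repeated typedef is rejected by pre-C11 compilers such as the GCC 4.4 in the report.

```c
#include <stddef.h>

/* Forward declarations only: no typedef and no #include of nir.h.
 * Declaring the same struct tag more than once is legal C, so it does
 * not matter whether nir.h was already pulled in by the .c file. */
struct nir_shader;
struct nir_shader;

/* Hypothetical helper (not part of Mesa): code that only passes
 * pointers around can work with the incomplete type and never needs
 * to know the struct layout. */
static int
st_example_has_shader(const struct nir_shader *nir)
{
   return nir != NULL;
}
```

Any file that actually dereferences the pointer would still include nir.h itself, so touching nir.h would only rebuild the files that truly depend on it.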
Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates
On Tue, Jun 28, 2016 at 10:27 AM, Samuel Pitoiset wrote:
> On 06/28/2016 04:23 PM, Ilia Mirkin wrote:
>> On Tue, Jun 28, 2016 at 10:21 AM, Samuel Pitoiset wrote:
>>> On 06/28/2016 04:15 PM, Ilia Mirkin wrote:
>>>> Again, what problem was this patch trying to solve?
>>>
>>> The problem is that FADD can only emit 19 bits, but longIMMD() will
>>> return false because it only checks the high 12 bits.
>>>
>>> I don't know if you saw my messages on IRC, but I found some other
>>> issues with longIMMD() and emitIMMD().
>>
>> Nope, it will emit 19 bits and then the 20th (high aka sign) bit as
>> well, just to a different location. [And the bottom 12 bits are
>> guaranteed to be 0.]
>>
>> What's a specific example that you think it doesn't emit correctly?
>
> I don't have any shaders which hit that issue, but I think it's similar to
> the fix I did for IMUL32I. The immediate value was 0xf4240 in that specific
> case, and IMUL emitted 0x74240 instead... because the sign bit was used to
> emit the NEG modifier.

Right, which isn't the same thing for ints, but is the same thing for floats. For integer immediates, it's also the low 20 bits, not the high 20 bits. And I believe that the condition should be ensuring that all 12 of the high bits are the same. But perhaps it doesn't properly check that those 12 bits have the same value as the 20th bit?

Anyways... if it ain't broken, don't fix it. Doesn't sound like FADD emission is broken in any way - let's not fix it.
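To make the integer case concrete, here is a minimal sketch of the condition described in this reply: a 32-bit integer immediate fits the short form only if all 12 high bits have the same value as the 20th bit. The helper names are invented for this example and are not the nouveau codegen functions; the sketch only reproduces the arithmetic from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if val is unchanged when stored as a 20-bit signed field,
 * i.e. bits 31..20 are all copies of bit 19. */
static bool
fits_simm20(uint32_t val)
{
   uint32_t high = val >> 19;   /* bits 31..19, 13 bits in total */
   return high == 0 || high == 0x1fff;
}

/* The two pieces an emitter would write: 19 low bits plus bit 19. */
static uint32_t
imm_low19(uint32_t val)
{
   return val & 0x7ffff;
}

static uint32_t
imm_bit19(uint32_t val)
{
   return (val >> 19) & 1;
}
```

With the value from the IMUL32I report, `fits_simm20(0xf4240)` is false because bit 19 is set while the 12 bits above it are clear, and `imm_low19(0xf4240)` is 0x74240, which is the truncated value mentioned in the quoted message.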
Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates
On 06/28/2016 04:38 PM, Ilia Mirkin wrote: On Tue, Jun 28, 2016 at 10:27 AM, Samuel Pitoisetwrote: On 06/28/2016 04:23 PM, Ilia Mirkin wrote: On Tue, Jun 28, 2016 at 10:21 AM, Samuel Pitoiset wrote: On 06/28/2016 04:15 PM, Ilia Mirkin wrote: Again, what problem was this patch trying to solve? The problem is that FADD can only emits 19-bits but longIMMD() will return false because it only checks for the high 12-bits. I don't know if you saw my messages on IRC but I found some other issues with longIMMD() and emitIMMD(). Nope, it will emit 19 bits and then the 20th (high aka sign) bit as well, just to a different location. [And the bottom 12 bits are guaranteed to be 0.] What's a specific example that you think it doesn't emit correctly? I don't have any shaders which hit that issue, but I think it's similar to the fix I did for IMUL32I. The immediate value was 0xf4240 in that specific case, and IMUL emitted 0x74240 instead... because the sign bit was used to emit the NEG modifier. Right, which isn't the same thing for ints, but is the same thing for floats. For integer immediates, it's also the low 20 bits, not the high 20 bits. And I believe that the condition should be ensuring that all 12 of the high bits are the same. But perhaps it doesn't properly check that those 12 bits have the same value as the 20th bit? Yes, it does not do that. Anyways... if it ain't broken, don't fix it. Doesn't sound like FADD emission is broken in any way - let's not fix it. Okay, your call. :-) -- -Samuel ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-stable] [PATCH] mapi: Export all GLES 3.1 functions in libGLESv2.so
On 27 June 2016 at 18:38, Ian Romanickwrote: > On 06/24/2016 09:30 AM, Emil Velikov wrote: >> On 20 June 2016 at 19:14, Ian Romanick wrote: >>> On 06/17/2016 11:15 AM, Emil Velikov wrote: On 17 June 2016 at 18:20, Ian Romanick wrote: > From: Ian Romanick > > Khronos recommends that the GLES 3.1 library also be called libGLESv2. > It also requires that functions be statically linkable from that > library. > > NOTE: Mesa has supported the EGL_KHR_get_all_proc_addresses extension > since at least Mesa 10.5, so applications targeting Linux should use > eglGetProcAddress to avoid problems running binaries on systems with > older, non-GLES 3.1 libGLESv2 libraries. > Fwiw I'm inclined that we should go the "opposite direction". Namely: don't expose new symbols and stick to a predefined version (3.0 being the personal favour of choice). Why, you might ask - for a couple of reasons: - If the list continues to grow programs will have unstable ABI - sort of how libGL ended up. Applications are going to link against 3.1 or later symbols [1], even if they only optionally use them. Thus things will quite hairy and fragile. >>> >>> There are at least two solutions, and piglit uses both. If use of a set >>> of functions is optional, you can still use GetProcAddress (when >>> EGL_KHR_get_all_proc_addresses is available) or you can use dlsym. >>> >>> For me, piglit is where this whole problem actually started. Right now, >>> piglit follows the (unextended) rules and does not attempt to use >>> GetProcAddress on core functions. It uses dlsym. I tried to extend >>> shader_runner for separate shader objects on GLES. Guess what? Since >>> the symbols aren't exported by the library, it didn't work. So... now >>> piglit would need TWO code paths... one that uses dlsym and one that >>> uses eglGetProcAddress... or require an optional extension. >>> >> I've started looking at piglit last night. There should be some fixes >> for it on the list later on today. 
>> >>> If an application requires GLES 3.1 symbols, it should just be able to >>> link with them. As far as I can tell, that's how it works on Android. >>> >> I look at the Android wrapper too closely for the following reasons: >> >> - There is libGLESv3.so which is identical copy of the v2 one. >> - Their libGLESv2/3.so periodically grows new symbols, including GLES >> extensions. >> - Android has tight control what and/how it's run on their platform - >> something that Linux distributions cannot do afaict. >> - Applications using GLES should annotate the version used in the >> manifest, which (haven't checked exactly) could serve as a first line >> of defence for applications e.g. using GLES 3.1 on system/drivers >> supporting GLES 3.0. >> >> That said, there is one very good thing: >> - They use dlsym and then eglGetProcAddress on all symbols. Thus mesa >> will just work. >> - The other desktop GLES* provider NVIDIA does not export even a single GLES 3.1/3.2 entry point (still going through the 3.0 list) in their libGLESv2.so.2 binary. So what to do with GLES (3.0?)/3.1 and later: - tweak the spec so that said version of the API is only supported if the implementation can get core symbols via eglGetProcAddress. Be that props to the EGL_KHR_get_all_proc_addresses extension or EGL 1.5 [2]. >> Any "sounds ok" or "that's a horrible idea" input on this suggestion ? > > That ship has already sailed. OpenGL ES 3.0 and 3.1 have both been > shipping for years. I don't think changing that is how I would use my > time machine. :) > As you guys wish, I won't stir up a hornet's nest. Just a reminder that we did a similar thing on the libGL front, which, imho, was significantly more likely to have actual users that depend on such 'odd' behaviour. A humble request - can we keep an eye open as GLES 3.3 and/or OpenGL 4.6 comes out. Would be great if with those include the proposed suggestion/fix. 
Namely: in order to use these with EGL, one needs to have the EGL_KHR_get_all_proc_addresses extension or EGL 1.5. I'll keep an eye open Collabora being a Khronos member, although it would be great if I'm not the only one. Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
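The "dlsym first, then eglGetProcAddress" lookup order mentioned in this thread can be sketched as below. To keep the example buildable without EGL headers, the two lookups are passed in as function pointers; in a real program they would be thin wrappers around `dlsym()` and `eglGetProcAddress()`. All names in the block (`resolve_gl_proc`, the stub lookups) are hypothetical, and the stubs merely imitate an older libGLESv2 that exports GLES 2.0 symbols but not GLES 3.1 ones.

```c
#include <stddef.h>
#include <string.h>

typedef void (*gl_proc)(void);
typedef gl_proc (*lookup_fn)(const char *name);

/* Try the statically exported symbols first, then fall back to the
 * EGL-style lookup. Either lookup may be NULL if it is unavailable. */
static gl_proc
resolve_gl_proc(const char *name, lookup_fn from_library, lookup_fn from_egl)
{
   gl_proc p = from_library ? from_library(name) : NULL;

   if (p == NULL && from_egl)
      p = from_egl(name);
   return p;
}

/* Stub entry point and stub lookups, for illustration only. */
static void
stub_entry(void)
{
}

/* Imitates a library that exports only an older entry point. */
static gl_proc
stub_library(const char *name)
{
   return strcmp(name, "glDrawArrays") == 0 ? stub_entry : NULL;
}

/* Imitates a GetProcAddress that also knows a GLES 3.1 function. */
static gl_proc
stub_egl(const char *name)
{
   return strcmp(name, "glDispatchCompute") == 0 ? stub_entry : NULL;
}
```

With this ordering an application gets a working pointer whether or not the library exports the newer symbol, as long as the EGL lookup is allowed to return core functions.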
Re: [Mesa-dev] Split gl_shader in two and clean-ups
On Tue, 2016-06-28 at 11:52 +1000, Timothy Arceri wrote:
> There are two distinctly different uses of this struct. The first
> is to store GL shader objects. The second is to store information
> about a shader stage that's been linked.
>
> The only place the new structs overlap is the shader layout fields and
> I intend to split that out into a third struct once this series lands.
>
> Having two well defined structs helps code readability and allows the removal
> of some unreachable code paths that were the result of confusion between
> the two uses.

I think it is a good idea, thanks! I dropped a comment in patch 4; with that fixed, patches 1-4 are:

Reviewed-by: Iago Toral Quiroga

I'll try to review the last 3 patches tomorrow.

Iago
Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates
On 06/28/2016 04:23 PM, Ilia Mirkin wrote:
> On Tue, Jun 28, 2016 at 10:21 AM, Samuel Pitoiset wrote:
>> On 06/28/2016 04:15 PM, Ilia Mirkin wrote:
>>> Again, what problem was this patch trying to solve?
>>
>> The problem is that FADD can only emit 19 bits, but longIMMD() will
>> return false because it only checks the high 12 bits.
>>
>> I don't know if you saw my messages on IRC, but I found some other
>> issues with longIMMD() and emitIMMD().
>
> Nope, it will emit 19 bits and then the 20th (high aka sign) bit as
> well, just to a different location. [And the bottom 12 bits are
> guaranteed to be 0.]
>
> What's a specific example that you think it doesn't emit correctly?

I don't have any shaders which hit that issue, but I think it's similar to the fix I did for IMUL32I. The immediate value was 0xf4240 in that specific case, and IMUL emitted 0x74240 instead... because the sign bit was used to emit the NEG modifier.

>   -ilia

--
-Samuel
Re: [Mesa-dev] [PATCH 4/7] glsl: pass symbols to find_matching_signature() rather than shader
On Tue, 2016-06-28 at 11:52 +1000, Timothy Arceri wrote:
> This will allow us to later split gl_shader into two structs.
> ---
>  src/compiler/glsl/link_functions.cpp | 47 +---
>  1 file changed, 22 insertions(+), 25 deletions(-)
>
> diff --git a/src/compiler/glsl/link_functions.cpp b/src/compiler/glsl/link_functions.cpp
> index 4e10287..c9dacc1 100644
> --- a/src/compiler/glsl/link_functions.cpp
> +++ b/src/compiler/glsl/link_functions.cpp
> @@ -31,8 +31,7 @@
>
>  static ir_function_signature *
>  find_matching_signature(const char *name, const exec_list *actual_parameters,
> -                        gl_shader **shader_list, unsigned num_shaders,
> -                        bool use_builtin);
> +                        glsl_symbol_table *symbols, bool use_builtin);
>
>  namespace {
>
> @@ -78,8 +77,8 @@ public:
>         * final linked shader. If it does, use it as the target of the call.
>         */
>        ir_function_signature *sig =
> -         find_matching_signature(name, &callee->parameters, &linked, 1,
> -                                 ir->use_builtin);
> +         find_matching_signature(name, &callee->parameters, linked->symbols,
> +                                 ir->use_builtin);
>        if (sig != NULL) {
>           ir->callee = sig;
>           return visit_continue;
> @@ -88,8 +87,14 @@ public:
>        /* Try to find the signature in one of the other shaders that is being
>         * linked. If it's not found there, return an error.
>         */
> -      sig = find_matching_signature(name, &ir->actual_parameters, shader_list,
> -                                    num_shaders, ir->use_builtin);
> +      for (unsigned i = 0; i < num_shaders; i++) {
> +         sig = find_matching_signature(name, &ir->actual_parameters,
> +                                       shader_list[i]->symbols,
> +                                       ir->use_builtin);
> +         if (sig)
> +            break;
> +      }
> +
>        if (sig == NULL) {
>           /* FINISHME: Log the full signature of unresolved function.
>            */
> @@ -307,30 +312,22 @@ private:
>   */
>  ir_function_signature *
>  find_matching_signature(const char *name, const exec_list *actual_parameters,
> -                        gl_shader **shader_list, unsigned num_shaders,
> -                        bool use_builtin)
> +                        glsl_symbol_table *symbols, bool use_builtin)
>  {
> -   for (unsigned i = 0; i < num_shaders; i++) {
> -      ir_function *const f = shader_list[i]->symbols->get_function(name);
> -
> -      if (f == NULL)
> -         continue;
> +   ir_function *const f = symbols->get_function(name);
>
> +   if (f) {
>        ir_function_signature *sig =
>           f->matching_signature(NULL, actual_parameters, use_builtin);
>
> -      if ((sig == NULL) ||
> -          (!sig->is_defined && !sig->is_intrinsic))
> -         continue;
> -
> -      /* If this function expects to bind to a built-in function and the
> -       * signature that we found isn't a built-in, keep looking. Also keep
> -       * looking if we expect a non-built-in but found a built-in.
> -       */
> -      if (use_builtin != sig->is_builtin())
> -         continue;
> -
> -      return sig;
> +      if (sig && (sig->is_defined || sig->is_intrinsic)) {
> +         /* If this function expects to bind to a built-in function and the
> +          * signature that we found isn't a built-in, keep looking. Also keep
> +          * looking if we expect a non-built-in but found a built-in.
> +          */
> +         if (use_builtin != sig->is_builtin())
> +            return sig;

The old code did not return sig when this condition was true (it hit "continue"), so I guess you meant:

   if (use_builtin == sig->is_builtin())
      return sig;

Iago

> +      }
>     }
>
>     return NULL;
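The shape of the refactor under review, and the condition the review comment asks for, can be sketched in plain C as follows. This is a simplified stand-in under stated assumptions, not the GLSL linker code: `struct signature`, `struct symbol_table`, `find_in_table` and `find_in_tables` are invented for the example, and a linear scan replaces the real `get_function()` and `matching_signature()` calls.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Invented stand-ins for the linker's signature and symbol table. */
struct signature {
   const char *name;
   bool is_defined;
   bool is_builtin;
};

struct symbol_table {
   const struct signature *sigs;
   size_t count;
};

/* Per-table lookup: accept a match only when the built-in kinds agree.
 * Writing "!=" here would return exactly the mismatches. */
static const struct signature *
find_in_table(const char *name, const struct symbol_table *symbols,
              bool use_builtin)
{
   for (size_t i = 0; i < symbols->count; i++) {
      const struct signature *sig = &symbols->sigs[i];

      if (strcmp(sig->name, name) != 0 || !sig->is_defined)
         continue;
      if (use_builtin == sig->is_builtin)
         return sig;
   }
   return NULL;
}

/* The loop over shaders now lives in the caller, as in the patch. */
static const struct signature *
find_in_tables(const char *name, const struct symbol_table *tables,
               size_t num_tables, bool use_builtin)
{
   for (size_t i = 0; i < num_tables; i++) {
      const struct signature *sig =
         find_in_table(name, &tables[i], use_builtin);
      if (sig)
         return sig;
   }
   return NULL;
}

/* Example data: "foo" is a built-in in the first table and a user
 * function in the second; "bar" is declared but never defined. */
static const struct signature example_a[] = {
   { "foo", true, true },
};
static const struct signature example_b[] = {
   { "foo", true, false },
   { "bar", false, false },
};
static const struct symbol_table example_tables[] = {
   { example_a, 1 },
   { example_b, 2 },
};
```

Looking up "foo" with `use_builtin` false skips the built-in in the first table and returns the user-defined entry from the second, which is the "keep looking" behaviour the original comment describes.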
[Mesa-dev] [PATCH] swr: Refactor checks for compiler feature flags
Encapsulate the test for which flags are needed to get a compiler to support certain features. Along with this, give various options to try for AVX and AVX2 support. Ideally we want to use specific instruction set feature flags, like -mavx2 for instance instead of -march=haswell, but the flags required for certain compilers are different. This allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c while the Intel compiler which doesn't support those flags can fall back to using -march=core-avx2. This addresses a bug where the Intel compiler will silently ignore the AVX2 instruction feature flags and then potentially fail to build. v2: Pass preprocessor-check argument as true-state instead of false-state for clarity. Cc: Tim RowleySigned-off-by: Chuck Atkins --- configure.ac | 86 +++- 1 file changed, 62 insertions(+), 24 deletions(-) diff --git a/configure.ac b/configure.ac index cc9bc47..6082778 100644 --- a/configure.ac +++ b/configure.ac @@ -2330,6 +2330,39 @@ swr_llvm_check() { fi } +swr_cxx_feature_flags_check() { +preprocessor_test="$1" +option_list="$2" +unset SWR_CXX_FEATURE_FLAGS +AC_LANG_PUSH([C++]) +save_CXXFLAGS="$CXXFLAGS" +save_IFS="$IFS" +IFS="," +found=0 +for opts in $option_list +do +unset IFS +CXXFLAGS="$opts $save_CXXFLAGS" +AC_COMPILE_IFELSE( +[AC_LANG_PROGRAM( +[ #if !($preprocessor_test) +#error +#endif +])], +[found=1; break], +[]) +IFS="," +done +IFS="$save_IFS" +CXXFLAGS="$save_CXXFLAGS" +AC_LANG_POP([C++]) +if test $found -eq 1; then +SWR_CXX_FEATURE_FLAGS="$opts" +return 0 +fi +return 1 +} + dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after this block if test -n "$with_gallium_drivers"; then gallium_drivers=`IFS=', '; echo $with_gallium_drivers` @@ -2399,31 +2432,36 @@ if test -n "$with_gallium_drivers"; then xswr) swr_llvm_check "swr" -AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2]) -SWR_AVX_CXXFLAGS="-mavx" -SWR_AVX2_CXXFLAGS="-mavx2 -mfma -mbmi2 -mf16c" - -AC_LANG_PUSH([C++]) 
-save_CXXFLAGS="$CXXFLAGS" -CXXFLAGS="-std=c++11 $CXXFLAGS" -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[], - [AC_MSG_ERROR([c++11 compiler support not detected])]) -CXXFLAGS="$save_CXXFLAGS" - -save_CXXFLAGS="$CXXFLAGS" -CXXFLAGS="$SWR_AVX_CXXFLAGS $CXXFLAGS" -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[], - [AC_MSG_ERROR([AVX compiler support not detected])]) -CXXFLAGS="$save_CXXFLAGS" - -save_CFLAGS="$CXXFLAGS" -CXXFLAGS="$SWR_AVX2_CXXFLAGS $CXXFLAGS" -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[], - [AC_MSG_ERROR([AVX2 compiler support not detected])]) -CXXFLAGS="$save_CXXFLAGS" -AC_LANG_POP([C++]) - +AC_MSG_CHECKING([whether $CXX supports c++11]) +if ! swr_cxx_feature_flags_check \ +"__cplusplus >= 201103L" \ +",-std=c++11"; then +AC_MSG_RESULT([no]) +AC_MSG_ERROR([swr requires C++11 support]) +fi +AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS]) +CXXFLAGS="$SWR_CXX_FEATURE_FLAGS $CXXFLAGS" + +AC_MSG_CHECKING([whether $CXX supports AVX]) +if ! swr_cxx_feature_flags_check \ +"defined(__AVX__)" \ +",-mavx,-march=core-avx"; then +AC_MSG_RESULT([no]) +AC_MSG_ERROR([swr requires AVX compiler support]) +fi +AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS]) +SWR_AVX_CXXFLAGS="$SWR_CXX_FEATURE_FLAGS" AC_SUBST([SWR_AVX_CXXFLAGS]) + +AC_MSG_CHECKING([whether $CXX supports AVX2]) +if ! swr_cxx_feature_flags_check \ + "defined(__AVX2__)&(__FMA__)&(__BMI2__)&(__F16C__)" \ +",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2"; then +AC_MSG_RESULT([no]) +AC_MSG_ERROR([swr requires AVX2 compiler support]) +fi +AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS]) +SWR_AVX2_CXXFLAGS="$SWR_CXX_FEATURE_FLAGS" AC_SUBST([SWR_AVX2_CXXFLAGS]) HAVE_GALLIUM_SWR=yes -- 2.5.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates
On Tue, Jun 28, 2016 at 10:21 AM, Samuel Pitoiset wrote:
> On 06/28/2016 04:15 PM, Ilia Mirkin wrote:
>> Again, what problem was this patch trying to solve?
>
> The problem is that FADD can only emit 19 bits, but longIMMD() will return
> false because it only checks the high 12 bits.
>
> I don't know if you saw my messages on IRC, but I found some other issues
> with longIMMD() and emitIMMD().

Nope, it will emit 19 bits and then the 20th (high aka sign) bit as well, just to a different location. [And the bottom 12 bits are guaranteed to be 0.]

What's a specific example that you think it doesn't emit correctly?

  -ilia
Re: [Mesa-dev] [PATCH] doc: improve INTEL_DEBUG documentation
On Tuesday, June 28, 2016 1:33:21 AM PDT Grazvydas Ignotas wrote: > Remove 'reg' option that does not actually exist, elaborate more about > 'sync' and add the missing options. > > Signed-off-by: Grazvydas Ignotas> --- > no commit access, if this is ok please somebody push > > docs/envvars.html | 12 ++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/docs/envvars.html b/docs/envvars.html > index ed957bd..2d9a289 100644 > --- a/docs/envvars.html > +++ b/docs/envvars.html > @@ -144,11 +144,10 @@ See the Xlib software driver > page for details. > bat - emit batch information > pix - emit messages about pixel operations > buf - emit messages about buffer objects > - reg - emit messages about regions > fbo - emit messages about framebuffers > fs - dump shader assembly for fragment shaders > gs - dump shader assembly for geometry shaders > - sync - emit messages about synchronization > + sync - after sending each batch, emit a message and wait for that > batch to finish rendering > prim - emit messages about drawing primitives > vert - emit messages about vertex assembly > dri - emit messages about the DRI interface > @@ -163,9 +162,18 @@ See the Xlib software driver > page for details. 
> blorp - emit messages about the blorp operations (blits > clears) > nodualobj - suppress generation of dual-object geometry shader > code > optimizer - dump shader assembly to files at each optimization pass > and iteration that make progress > + ann - annotate IR in assembly dumps > + no8 - don't generate SIMD8 fragment shader > vec4 - force vec4 mode in vertex shader > spill_fs - force spilling of all registers in the scalar backend > (useful to debug spilling code) > spill_vec4 - force spilling of all registers in the vec4 backend > (useful to debug spilling code) > + cs - dump shader assembly for compute shaders > + hex - print instruction hex dump with the disassembly > + nocompact - disable instruction compaction > + tcs - dump shader assembly for tessellation control shaders > + tes - dump shader assembly for tessellation evaluation shaders > + l3 - emit messages about the new L3 state during transitions > + do32 - generate compute shader SIMD32 programs even if workgroup size > doesn't exceed the SIMD16 limit > norbc - disable single sampled render buffer compression > > > Reviewed-by: Kenneth Graunke Also pushed: To ssh://git.freedesktop.org/git/mesa/mesa c1dbc56..2343235 master -> master Thanks! signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates
On 06/28/2016 04:15 PM, Ilia Mirkin wrote: On Tue, Jun 28, 2016 at 10:11 AM, Samuel Pitoisetwrote: On 06/28/2016 04:00 PM, Ilia Mirkin wrote: On Tue, Jun 28, 2016 at 4:33 AM, Samuel Pitoiset wrote: On 06/28/2016 05:10 AM, Ilia Mirkin wrote: On Mon, Jun 27, 2016 at 6:08 PM, Samuel Pitoiset wrote: On 06/28/2016 12:06 AM, Ilia Mirkin wrote: On Mon, Jun 27, 2016 at 6:05 PM, Ilia Mirkin wrote: On Mon, Jun 27, 2016 at 6:04 PM, Samuel Pitoiset wrote: On 06/28/2016 12:02 AM, Ilia Mirkin wrote: This loses you saturation. Does the target account for this? No saturate flag for FADD32I. That's not what I asked. Specifically look at this code: bool TargetNVC0::isSatSupported(const Instruction *insn) const { if (insn->op == OP_CVT) return true; if (!(opInfo[insn->op].dstMods & NV50_IR_MOD_SAT)) return false; if (insn->dType == TYPE_U32) return (insn->op == OP_ADD) || (insn->op == OP_MAD); // add f32 LIMM cannot saturate if (insn->op == OP_ADD && insn->sType == TYPE_F32) { if (insn->getSrc(1)->asImm() && insn->getSrc(1)->reg.data.u32 & 0xfff) return false; } Note how it will say that sat is supported for SIMMs with FADD? So the compiler will generate those ops, but then the emitter won't be able to handle it. Okay, I get it. By the way, instead of trying to fight the longIMMD, you should just fix it - /*0008*/ @P0 FADD R0, R0, 1.NEG; /* 0x3858203f8000 */ which corresponds nicely to emitNEG(0x2d, insn->src(1)); The issue is that emitIMMD does if (len == 19) { ... emitField( 56, 1, (val & 0x8) >> 19); emitField(pos, len, (val & 0x7)); So the problem is that the 56 isn't as fixed as the emission code had hoped. I suspect that adjusting it will fix all these silly cases. -ilia /*0010*/ @P0 FADD R0, R0, 0.NEG; /* 0x38582000 */ /*0010*/ @P0 FADD R0, R0, -0; /* 0x3958 */ urgh? So ... what problem were you having again? The thing is: why those 2 instructions use a different position for the neg flag? 
One is setting the high bit of the immediate, the other is applying negation to the argument. Ok. Again, what problem was this patch trying to solve? The problem is that FADD can only emits 19-bits but longIMMD() will return false because it only checks for the high 12-bits. I don't know if you saw my messages on IRC but I found some other issues with longIMMD() and emitIMMD(). -ilia -- -Samuel ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates
On Tue, Jun 28, 2016 at 10:11 AM, Samuel Pitoisetwrote: > > > On 06/28/2016 04:00 PM, Ilia Mirkin wrote: >> >> On Tue, Jun 28, 2016 at 4:33 AM, Samuel Pitoiset >> wrote: >>> >>> >>> >>> On 06/28/2016 05:10 AM, Ilia Mirkin wrote: On Mon, Jun 27, 2016 at 6:08 PM, Samuel Pitoiset wrote: > > > > > On 06/28/2016 12:06 AM, Ilia Mirkin wrote: >> >> >> >> On Mon, Jun 27, 2016 at 6:05 PM, Ilia Mirkin >> wrote: >>> >>> >>> >>> On Mon, Jun 27, 2016 at 6:04 PM, Samuel Pitoiset >>> wrote: On 06/28/2016 12:02 AM, Ilia Mirkin wrote: > > > > > This loses you saturation. Does the target account for this? No saturate flag for FADD32I. >>> >>> >>> >>> >>> That's not what I asked. >> >> >> >> >> Specifically look at this code: >> >> bool >> TargetNVC0::isSatSupported(const Instruction *insn) const >> { >>if (insn->op == OP_CVT) >> return true; >>if (!(opInfo[insn->op].dstMods & NV50_IR_MOD_SAT)) >> return false; >> >>if (insn->dType == TYPE_U32) >> return (insn->op == OP_ADD) || (insn->op == OP_MAD); >> >>// add f32 LIMM cannot saturate >>if (insn->op == OP_ADD && insn->sType == TYPE_F32) { >> if (insn->getSrc(1)->asImm() && >> insn->getSrc(1)->reg.data.u32 & 0xfff) >> return false; >>} >> >> Note how it will say that sat is supported for SIMMs with FADD? So the >> compiler will generate those ops, but then the emitter won't be able >> to handle it. >> > > Okay, I get it. By the way, instead of trying to fight the longIMMD, you should just fix it - /*0008*/ @P0 FADD R0, R0, 1.NEG; /* 0x3858203f8000 */ which corresponds nicely to emitNEG(0x2d, insn->src(1)); The issue is that emitIMMD does if (len == 19) { ... emitField( 56, 1, (val & 0x8) >> 19); emitField(pos, len, (val & 0x7)); So the problem is that the 56 isn't as fixed as the emission code had hoped. I suspect that adjusting it will fix all these silly cases. -ilia >>> >>> /*0010*/ @P0 FADD R0, R0, 0.NEG; /* >>> 0x38582000 */ >>> /*0010*/ @P0 FADD R0, R0, -0; /* >>> 0x3958 */ >>> >>> urgh? >> >> >> So ... 
what problem were you having again? > > > The thing is: why those 2 instructions use a different position for the neg > flag? One is setting the high bit of the immediate, the other is applying negation to the argument. Again, what problem was this patch trying to solve? -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates
On 06/28/2016 04:00 PM, Ilia Mirkin wrote: On Tue, Jun 28, 2016 at 4:33 AM, Samuel Pitoisetwrote: On 06/28/2016 05:10 AM, Ilia Mirkin wrote: On Mon, Jun 27, 2016 at 6:08 PM, Samuel Pitoiset wrote: On 06/28/2016 12:06 AM, Ilia Mirkin wrote: On Mon, Jun 27, 2016 at 6:05 PM, Ilia Mirkin wrote: On Mon, Jun 27, 2016 at 6:04 PM, Samuel Pitoiset wrote: On 06/28/2016 12:02 AM, Ilia Mirkin wrote: This loses you saturation. Does the target account for this? No saturate flag for FADD32I. That's not what I asked. Specifically look at this code: bool TargetNVC0::isSatSupported(const Instruction *insn) const { if (insn->op == OP_CVT) return true; if (!(opInfo[insn->op].dstMods & NV50_IR_MOD_SAT)) return false; if (insn->dType == TYPE_U32) return (insn->op == OP_ADD) || (insn->op == OP_MAD); // add f32 LIMM cannot saturate if (insn->op == OP_ADD && insn->sType == TYPE_F32) { if (insn->getSrc(1)->asImm() && insn->getSrc(1)->reg.data.u32 & 0xfff) return false; } Note how it will say that sat is supported for SIMMs with FADD? So the compiler will generate those ops, but then the emitter won't be able to handle it. Okay, I get it. By the way, instead of trying to fight the longIMMD, you should just fix it - /*0008*/ @P0 FADD R0, R0, 1.NEG; /* 0x3858203f8000 */ which corresponds nicely to emitNEG(0x2d, insn->src(1)); The issue is that emitIMMD does if (len == 19) { ... emitField( 56, 1, (val & 0x8) >> 19); emitField(pos, len, (val & 0x7)); So the problem is that the 56 isn't as fixed as the emission code had hoped. I suspect that adjusting it will fix all these silly cases. -ilia /*0010*/ @P0 FADD R0, R0, 0.NEG; /* 0x38582000 */ /*0010*/ @P0 FADD R0, R0, -0; /* 0x3958 */ urgh? So ... what problem were you having again? The thing is: why those 2 instructions use a different position for the neg flag? An by the way, the bit 56 is fixed for all short immediates. 
-ilia -- -Samuel ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates
On Tue, Jun 28, 2016 at 4:33 AM, Samuel Pitoiset wrote:
> On 06/28/2016 05:10 AM, Ilia Mirkin wrote:
>> On Mon, Jun 27, 2016 at 6:08 PM, Samuel Pitoiset wrote:
>>> On 06/28/2016 12:06 AM, Ilia Mirkin wrote:
>>>> On Mon, Jun 27, 2016 at 6:05 PM, Ilia Mirkin wrote:
>>>>> On Mon, Jun 27, 2016 at 6:04 PM, Samuel Pitoiset wrote:
>>>>>> On 06/28/2016 12:02 AM, Ilia Mirkin wrote:
>>>>>>> This loses you saturation. Does the target account for this?
>>>>>>
>>>>>> No saturate flag for FADD32I.
>>>>>
>>>>> That's not what I asked.
>>>>
>>>> Specifically look at this code:
>>>>
>>>> bool
>>>> TargetNVC0::isSatSupported(const Instruction *insn) const
>>>> {
>>>>    if (insn->op == OP_CVT)
>>>>       return true;
>>>>    if (!(opInfo[insn->op].dstMods & NV50_IR_MOD_SAT))
>>>>       return false;
>>>>
>>>>    if (insn->dType == TYPE_U32)
>>>>       return (insn->op == OP_ADD) || (insn->op == OP_MAD);
>>>>
>>>>    // add f32 LIMM cannot saturate
>>>>    if (insn->op == OP_ADD && insn->sType == TYPE_F32) {
>>>>       if (insn->getSrc(1)->asImm() &&
>>>>           insn->getSrc(1)->reg.data.u32 & 0xfff)
>>>>          return false;
>>>>    }
>>>>
>>>> Note how it will say that sat is supported for SIMMs with FADD? So the
>>>> compiler will generate those ops, but then the emitter won't be able
>>>> to handle it.
>>>
>>> Okay, I get it.
>>
>> By the way, instead of trying to fight the longIMMD, you should just fix it -
>>
>> /*0008*/ @P0 FADD R0, R0, 1.NEG; /* 0x3858203f8000 */
>>
>> which corresponds nicely to
>>
>>    emitNEG(0x2d, insn->src(1));
>>
>> The issue is that emitIMMD does
>>
>>    if (len == 19) {
>>       ...
>>       emitField( 56, 1, (val & 0x80000) >> 19);
>>       emitField(pos, len, (val & 0x7ffff));
>>
>> So the problem is that the 56 isn't as fixed as the emission code had
>> hoped. I suspect that adjusting it will fix all these silly cases.
>>
>>   -ilia
>
> /*0010*/ @P0 FADD R0, R0, 0.NEG; /* 0x38582000 */
> /*0010*/ @P0 FADD R0, R0, -0; /* 0x3958 */
>
> urgh?

So ... what problem were you having again?

  -ilia
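For the float case this thread keeps returning to, the sketch below shows the two facts stated in the messages: a short FADD immediate can only be used when the bottom 12 bits of the float are zero (otherwise the long form, which cannot saturate, is needed), and the remaining 20 bits are emitted as a 19-bit field plus the sign bit. The helper names and the `struct short_imm` type are hypothetical and exist only for this example; it assumes a 32-bit IEEE 754 `float`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its raw 32-bit pattern. */
static uint32_t
float_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof u);
   return u;
}

/* The short form keeps only the top 20 bits of the value, so any set
 * bit in the low 12 forces the long-immediate encoding. */
static bool
f32_needs_long_imm(float f)
{
   return (float_bits(f) & 0xfff) != 0;
}

/* Hypothetical container for the two pieces of a short immediate. */
struct short_imm {
   uint32_t low19;   /* 19-bit field */
   uint32_t sign;    /* the 20th (high aka sign) bit */
};

/* Only meaningful when f32_needs_long_imm(f) is false. */
static struct short_imm
split_short_imm(float f)
{
   uint32_t val = float_bits(f) >> 12;
   struct short_imm s = { val & 0x7ffff, (val >> 19) & 1 };
   return s;
}
```

For 1.0f (bit pattern 0x3f800000) the low 12 bits are zero, the 19-bit field is 0x3f800 and the sign bit is 0; for -1.0f only the sign bit changes. A value such as 0.1f has non-zero low bits and would need the long form.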
[Mesa-dev] [PATCH] swr: Refactor checks for compiler feature flags
Encapsulate the test for which flags are needed to get a compiler to
support certain features. Along with this, give various options to try
for AVX and AVX2 support. Ideally we want to use specific instruction
set feature flags, like -mavx2 for instance instead of -march=haswell,
but the flags required for certain compilers are different. This
allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
while the Intel compiler which doesn't support those flags can fall
back to using -march=core-avx2.

This addresses a bug where the Intel compiler will silently ignore the
AVX2 instruction feature flags and then potentially fail to build.

Cc: Tim Rowley
Signed-off-by: Chuck Atkins
---
 configure.ac | 86 +++-
 1 file changed, 62 insertions(+), 24 deletions(-)

diff --git a/configure.ac b/configure.ac
index cc9bc47..806850e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2330,6 +2330,39 @@ swr_llvm_check() {
     fi
 }
 
+swr_cxx_feature_flags_check() {
+    ifndef_test=$1
+    option_list="$2"
+    unset SWR_CXX_FEATURE_FLAGS
+    AC_LANG_PUSH([C++])
+    save_CXXFLAGS="$CXXFLAGS"
+    save_IFS="$IFS"
+    IFS=","
+    found=0
+    for opts in $option_list
+    do
+        unset IFS
+        CXXFLAGS="$opts $save_CXXFLAGS"
+        AC_COMPILE_IFELSE(
+            [AC_LANG_PROGRAM(
+                [ $ifndef_test
+#error
+#endif
+                ])],
+            [found=1; break],
+            [])
+        IFS=","
+    done
+    IFS="$save_IFS"
+    CXXFLAGS="$save_CXXFLAGS"
+    AC_LANG_POP([C++])
+    if test $found -eq 1; then
+        SWR_CXX_FEATURE_FLAGS="$opts"
+        return 0
+    fi
+    return 1
+}
+
 dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after this block
 if test -n "$with_gallium_drivers"; then
     gallium_drivers=`IFS=', '; echo $with_gallium_drivers`
@@ -2399,31 +2432,36 @@ if test -n "$with_gallium_drivers"; then
         xswr)
             swr_llvm_check "swr"
 
-            AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2])
-            SWR_AVX_CXXFLAGS="-mavx"
-            SWR_AVX2_CXXFLAGS="-mavx2 -mfma -mbmi2 -mf16c"
-
-            AC_LANG_PUSH([C++])
-            save_CXXFLAGS="$CXXFLAGS"
-            CXXFLAGS="-std=c++11 $CXXFLAGS"
-            AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-                              [AC_MSG_ERROR([c++11 compiler support not detected])])
-            CXXFLAGS="$save_CXXFLAGS"
-
-            save_CXXFLAGS="$CXXFLAGS"
-            CXXFLAGS="$SWR_AVX_CXXFLAGS $CXXFLAGS"
-            AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-                              [AC_MSG_ERROR([AVX compiler support not detected])])
-            CXXFLAGS="$save_CXXFLAGS"
-
-            save_CFLAGS="$CXXFLAGS"
-            CXXFLAGS="$SWR_AVX2_CXXFLAGS $CXXFLAGS"
-            AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
-                              [AC_MSG_ERROR([AVX2 compiler support not detected])])
-            CXXFLAGS="$save_CXXFLAGS"
-            AC_LANG_POP([C++])
-
+            AC_MSG_CHECKING([whether $CXX supports c++11])
+            if ! swr_cxx_feature_flags_check \
+                "#if __cplusplus < 201103L" \
+                ",-std=c++11"; then
+                AC_MSG_RESULT([no])
+                AC_MSG_ERROR([swr requires C++11 support])
+            fi
+            AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
+            CXXFLAGS="$SWR_CXX_FEATURE_FLAGS $CXXFLAGS"
+
+            AC_MSG_CHECKING([whether $CXX supports AVX])
+            if ! swr_cxx_feature_flags_check \
+                "#ifndef __AVX__" \
+                ",-mavx,-march=core-avx"; then
+                AC_MSG_RESULT([no])
+                AC_MSG_ERROR([swr requires AVX compiler support])
+            fi
+            AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
+            SWR_AVX_CXXFLAGS="$SWR_CXX_FEATURE_FLAGS"
             AC_SUBST([SWR_AVX_CXXFLAGS])
+
+            AC_MSG_CHECKING([whether $CXX supports AVX2])
+            if ! swr_cxx_feature_flags_check \
+                "#if !(defined(__AVX2__)&(__FMA__)&(__BMI2__)&(__F16C__))" \
+                ",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2"; then
+                AC_MSG_RESULT([no])
+                AC_MSG_ERROR([swr requires AVX2 compiler support])
+            fi
+            AC_MSG_RESULT([$SWR_CXX_FEATURE_FLAGS])
+            SWR_AVX2_CXXFLAGS="$SWR_CXX_FEATURE_FLAGS"
             AC_SUBST([SWR_AVX2_CXXFLAGS])
 
             HAVE_GALLIUM_SWR=yes
-- 
2.5.5
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
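The check in this patch works by compiling a tiny program that hits `#error` unless the compiler, given the candidate flags, defines the expected feature macros. As a rough illustration only (not part of the patch), the C sketch below reports which of those macros the current compile flags enable. The names `swr_feature_mask` and `SWR_HAS_*` are made up for this example.

```c
/* Bit flags for the feature macros the configure test looks at.
 * These names are hypothetical and exist only in this sketch. */
enum {
   SWR_HAS_AVX  = 1 << 0,
   SWR_HAS_AVX2 = 1 << 1,
   SWR_HAS_FMA  = 1 << 2,
   SWR_HAS_BMI2 = 1 << 3,
   SWR_HAS_F16C = 1 << 4
};

/* Returns a mask of the instruction-set macros the compiler defined
 * for this translation unit, which depends on the flags it was given. */
static unsigned swr_feature_mask(void)
{
   unsigned mask = 0;
#ifdef __AVX__
   mask |= SWR_HAS_AVX;
#endif
#ifdef __AVX2__
   mask |= SWR_HAS_AVX2;
#endif
#ifdef __FMA__
   mask |= SWR_HAS_FMA;
#endif
#ifdef __BMI2__
   mask |= SWR_HAS_BMI2;
#endif
#ifdef __F16C__
   mask |= SWR_HAS_F16C;
#endif
   return mask;
}
```

Building this once with `-mavx2 -mfma -mbmi2 -mf16c` and once with no flags should give different masks, which is the same signal the configure test relies on.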
Re: [Mesa-dev] [PATCH 1/2] vl: add a bicubic interpolation filter(v4)
On 2016-06-28 11:25, Nayan Deshmukh wrote: This is a shader based bicubic interpolater which uses cubic Hermite spline algorithm. v2: set dst_area and dst_clip during scaling (Christian) v3: clear the render target before rendering v4: intialize offsets while initializing shaders use a constant buffer to send dst_size to frag shader small changes to reduce calculation in shader Signed-off-by: Nayan Deshmukh--- src/gallium/auxiliary/Makefile.sources | 2 + src/gallium/auxiliary/vl/vl_bicubic_filter.c | 465 +++ src/gallium/auxiliary/vl/vl_bicubic_filter.h | 63 3 files changed, 530 insertions(+) create mode 100644 src/gallium/auxiliary/vl/vl_bicubic_filter.c create mode 100644 src/gallium/auxiliary/vl/vl_bicubic_filter.h diff --git a/src/gallium/auxiliary/Makefile.sources b/src/gallium/auxiliary/Makefile.sources index ab58358..e0311bf 100644 --- a/src/gallium/auxiliary/Makefile.sources +++ b/src/gallium/auxiliary/Makefile.sources @@ -317,6 +317,8 @@ NIR_SOURCES := \ nir/tgsi_to_nir.h VL_SOURCES := \ + vl/vl_bicubic_filter.c \ + vl/vl_bicubic_filter.h \ vl/vl_compositor.c \ vl/vl_compositor.h \ vl/vl_csc.c \ diff --git a/src/gallium/auxiliary/vl/vl_bicubic_filter.c b/src/gallium/auxiliary/vl/vl_bicubic_filter.c new file mode 100644 index 000..396e76d --- /dev/null +++ b/src/gallium/auxiliary/vl/vl_bicubic_filter.c @@ -0,0 +1,465 @@ +/** + * + * Copyright 2016 Nayan Deshmukh. + * All Rights Reserved. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sub license, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial portions + * of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. + * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR + * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ * + **/ + +#include + +#include "pipe/p_context.h" + +#include "tgsi/tgsi_ureg.h" + +#include "util/u_draw.h" +#include "util/u_memory.h" +#include "util/u_math.h" +#include "util/u_rect.h" + +#include "vl_types.h" +#include "vl_vertex_buffers.h" +#include "vl_bicubic_filter.h" + +enum VS_OUTPUT +{ + VS_O_VPOS = 0, + VS_O_VTEX = 0 +}; + +static void * +create_vert_shader(struct vl_bicubic_filter *filter) +{ + struct ureg_program *shader; + struct ureg_src i_vpos; + struct ureg_dst o_vpos, o_vtex; + + shader = ureg_create(PIPE_SHADER_VERTEX); + if (!shader) + return NULL; + + i_vpos = ureg_DECL_vs_input(shader, 0); + o_vpos = ureg_DECL_output(shader, TGSI_SEMANTIC_POSITION, VS_O_VPOS); + o_vtex = ureg_DECL_output(shader, TGSI_SEMANTIC_GENERIC, VS_O_VTEX); + + ureg_MOV(shader, o_vpos, i_vpos); + ureg_MOV(shader, o_vtex, i_vpos); + + ureg_END(shader); + + return ureg_create_shader_and_destroy(shader, filter->pipe); +} + +static void +create_frag_shader_cubic_interpolater(struct ureg_program *shader, struct ureg_src tex_a, + struct ureg_src tex_b, struct ureg_src tex_c, + struct ureg_src tex_d, struct ureg_src t, + struct ureg_dst o_fragment) +{ + struct ureg_dst temp[11]; + struct ureg_dst t_2; + unsigned i; + + for(i = 0; i < 11; ++i) + temp[i] = ureg_DECL_temporary(shader); + t_2 = ureg_DECL_temporary(shader); + + /* +* |temp[0]| | 0 2 0 0 | |tex_a| +* |temp[1]| = | -1 0 1 0 |* |tex_b| +* |temp[2]| | 2 -5 4 -1 | |tex_c| +* |temp[3]| | -1 3 -3 1 | |tex_d| +*/ + ureg_MUL(shader, temp[0], tex_b, ureg_imm1f(shader, 2.0f)); + + ureg_MUL(shader, temp[1], tex_a, ureg_imm1f(shader, -1.0f)); + ureg_MAD(shader, temp[1], tex_c, ureg_imm1f(shader, 1.0f), +ureg_src(temp[1])); + + ureg_MUL(shader, temp[2], tex_a, ureg_imm1f(shader, 2.0f)); + ureg_MAD(shader, temp[2], tex_b, ureg_imm1f(shader, -5.0f), +ureg_src(temp[2])); + ureg_MAD(shader, temp[2], tex_c, ureg_imm1f(shader, 4.0f), +
Re: [Mesa-dev] [PATCH resend] pipe_loader_sw: Fix fd leak when instantiated via pipe_loader_sw_probe_kms
Hi,

On 27-05-16 16:24, Emil Velikov wrote:
> Hi Hans,
>
> On 27 May 2016 at 15:06, Hans de Goede wrote:
>> Make pipe_loader_sw_probe_kms take ownership of the passed in fd,
>> like pipe_loader_drm_probe_fd does. The only caller is
>> dri_kms_init_screen which passes in a dupped fd, just like
>> dri2_init_screen passes in a dupped fd to pipe_loader_drm_probe_fd.
>
> My memory is failing ... I thought I replied to this. The patch is
> correct, so
> Reviewed-by: Emil Velikov

Thanks, unfortunately I was swamped with other stuff, so I did not get
around to pushing it until now. It is pushed now.

> I wonder when I'll get the chance to fold the
> almost-but-not-quite-the-same sw and hw side of the pipe loader. If
> you're interested let me know.

Sorry, -ENOTIME.

Regards,

Hans
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
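For readers unfamiliar with the convention being discussed, here is a minimal C sketch of a probe function that takes ownership of a file descriptor. It is not the Mesa code: `sketch_device`, `sketch_probe` and `sketch_release` are hypothetical names, and closing the fd on the failure path is an assumption about what "taking ownership" implies.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical device object that owns its fd. */
struct sketch_device {
   int fd;
};

/* Takes ownership of fd: on success the device keeps it, on failure it
 * is closed here, so the caller must never close it again.
 * 'fail' only exists to exercise the error path in this sketch. */
static bool sketch_probe(struct sketch_device **out, int fd, bool fail)
{
   struct sketch_device *dev = fail ? NULL : calloc(1, sizeof(*dev));

   if (!dev) {
      close(fd);
      return false;
   }
   dev->fd = fd;
   *out = dev;
   return true;
}

/* Releasing the device closes the fd it owns. */
static void sketch_release(struct sketch_device *dev)
{
   close(dev->fd);
   free(dev);
}
```

With this contract a caller that wants to keep its own descriptor passes in a `dup()` of it, which is what the commit message says the existing callers already do.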
Re: [Mesa-dev] Fwd: [PATCH] st/vdpau: use bicubic filter for scaling
Hi Christian and Andy,

I have sent a new series of patches which takes care of the points
Christian pointed out. I have also made some changes to make it more
efficient than before.

Also, due to a wrong message id, I have sent the messages as a new
thread instead of replying to this thread.

Regards,
Nayan.

On Mon, Jun 27, 2016 at 7:50 PM, Christian König wrote:
> This code fragment:
>
> + /* t = frac(i_vtex*size)
> ...
> + ureg_MUL(shader, t, i_vtex, ureg_imm1f(shader, size));
>
> Probably doesn't do what you expect it to do when the pixel center is at
> 0.5 instead of 0.0.
>
> For the matrix and most other filters the difference doesn't matter
> because you get the same offset on x/y as input you need to apply in the
> texture instructions as well.
>
> Regards,
> Christian.
>
> On 27.06.2016 at 15:51, Nayan Deshmukh wrote:
>
> Hi Christian,
>
> I haven't taken that into account, but how will it affect my
> calculation in any way? I have written the code taking inspiration
> from the way matrix_filter uses offsets.
>
> Regards,
> Nayan.
>
> On Mon, Jun 27, 2016 at 6:55 PM, Christian König wrote:
>
>> Hi guys,
>>
>> Nayan have you taken into account that the pixel center is at 0.5 and not
>> 0.0?
>>
>> Regards,
>> Christian.
>>
>> On 26.06.2016 at 22:30, Andy Furniss wrote:
>>
>>> Nayan Deshmukh wrote:
>>>> Hi Andy,
>>>>
>>>> On Sun, Jun 26, 2016 at 12:25 AM, Andy Furniss <adf.li...@gmail.com> wrote:
>>>>> Nayan Deshmukh wrote:
>>>>>> Hi Andy,
>>>>>>
>>>>>> Thanks for testing the patches.
>>>>>>
>>>>>> Please send me the videos and ratios with which there is corruption.
>>>>>
>>>>> https://drive.google.com/file/d/0BxP5-S1t9VEEaHZEM203RFpyNEE/view?usp=sharing
>>>>>
>>>>> This has no aspect encoded and displayed fullscreen on a 1920x1080
>>>>> monitor shows vertical line artifacts over the first 2/3 of the image.
>>>>>
>>>>> When I say lines they are not lines as such just that the distortion
>>>>> on the pendulum shows as it passes over imaginary lines at fixed
>>>>> points on the screen.
>>>>>
>>>>> with mplayer -aspect 4/3 or 16/9 it doesn't.
>>>>
>>>> I tested the videos and found out that the distortion is because of
>>>> the amount of calculation done in the fragment shader. I tested the
>>>> video with vl_median_filter and it showed no distortion; however,
>>>> with vl_matrix_filter (which requires more calculations than
>>>> vl_median_filter) it showed the same distortion. I'll try to make it
>>>> more efficient. But it still requires a lot of processing for a
>>>> single pixel as it uses 15 neighbouring pixels.
>>>
>>> Seems a bit strange, does the processing needed vary greatly with
>>> similar scale amounts? I have a powerful GPU and can force clocks
>>> high, but it makes no difference.
>>>
>>> Below is a png showing the artifacts I see on pendulum fullscreen
>>> are these what you see?
>>>
>>> If rather than full screen I stretch out the window to scale, there
>>> will be many sizes that don't produce those.
>>>
>>> https://drive.google.com/file/d/0BxP5-S1t9VEEd2hwNVp0ZXRSZTA/view?usp=sharing
>>>
>>>> Also I don't see any offsets with the videos, maybe I am missing
>>>> something. If you could tell me more about the offsets, I'll try to
>>>> debug them.
>>>
>>> https://drive.google.com/file/d/0BxP5-S1t9VEEUGZTbndOMzBNZnM/view?usp=sharing
>>>
>>> Is a default scale, if you download both pngs and use something to
>>> display them both at the same time and line up the windows one on
>>> top of the other then flip between them you can see although the
>>> windows are lined up the images contained are not.
>>>
>>> You can make your own screen/window shots with xwd and display them
>>> with xwud. For me using fluxbox as a desktop it's easy to line up
>>> windows as they snap a bit towards the edge of the screen YMMV.
>>>
>>> ___
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
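To make Christian's point about pixel centers concrete, the small C sketch below compares the interpolation weight `t` computed as `frac(coord * size)` with one that first shifts by half a texel. This is one common way to account for centers at 0.5, offered as an assumption for illustration; it is not taken from the patch, and the function names are hypothetical.

```c
/* Fractional part in [0, 1), also correct for negative inputs. */
static float frac_part(float x)
{
   float f = x - (float)(long)x;   /* the cast truncates toward zero */
   return f < 0.0f ? f + 1.0f : f;
}

/* Weight as in the quoted fragment: treats texel centers as if they
 * sat at integer positions. */
static float weight_naive(float coord, float size)
{
   return frac_part(coord * size);
}

/* Weight with texel centers at +0.5: shift by half a texel first. */
static float weight_centered(float coord, float size)
{
   return frac_part(coord * size - 0.5f);
}
```

For a 4-texel-wide texture, the normalized coordinate 0.125 is exactly the center of texel 0. The naive form gives a weight of 0.5 there, as if the sample were halfway between two texels, while the shifted form gives 0.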
[Mesa-dev] [PATCH 2/2] st/vdpau: use bicubic filter for scaling(v6)
use bicubic filtering as high quality scaling L1. v2: fix a typo and add a newline to code v3: -render the unscaled image on a temporary surface (Christian) -apply noise reduction and sharpness filter on unscaled surface -render the final scaled surface using bicubic interpolation v4: support high quality scaling v5: set dst_area and dst_clip in bicubic filter v6: set buffer layer before setting dst_area Signed-off-by: Nayan Deshmukh--- src/gallium/state_trackers/vdpau/mixer.c | 112 --- src/gallium/state_trackers/vdpau/query.c | 1 + src/gallium/state_trackers/vdpau/vdpau_private.h | 6 ++ 3 files changed, 105 insertions(+), 14 deletions(-) diff --git a/src/gallium/state_trackers/vdpau/mixer.c b/src/gallium/state_trackers/vdpau/mixer.c index 65c3ce2..4dbbdf6 100644 --- a/src/gallium/state_trackers/vdpau/mixer.c +++ b/src/gallium/state_trackers/vdpau/mixer.c @@ -82,7 +82,6 @@ vlVdpVideoMixerCreate(VdpDevice device, switch (features[i]) { /* they are valid, but we doesn't support them */ case VDP_VIDEO_MIXER_FEATURE_DEINTERLACE_TEMPORAL_SPATIAL: - case VDP_VIDEO_MIXER_FEATURE_HIGH_QUALITY_SCALING_L1: case VDP_VIDEO_MIXER_FEATURE_HIGH_QUALITY_SCALING_L2: case VDP_VIDEO_MIXER_FEATURE_HIGH_QUALITY_SCALING_L3: case VDP_VIDEO_MIXER_FEATURE_HIGH_QUALITY_SCALING_L4: @@ -110,6 +109,9 @@ vlVdpVideoMixerCreate(VdpDevice device, vmixer->luma_key.supported = true; break; + case VDP_VIDEO_MIXER_FEATURE_HIGH_QUALITY_SCALING_L1: + vmixer->bicubic.supported = true; + break; default: goto no_params; } } @@ -202,6 +204,11 @@ vlVdpVideoMixerDestroy(VdpVideoMixer mixer) vl_matrix_filter_cleanup(vmixer->sharpness.filter); FREE(vmixer->sharpness.filter); } + + if (vmixer->bicubic.filter) { + vl_bicubic_filter_cleanup(vmixer->bicubic.filter); + FREE(vmixer->bicubic.filter); + } pipe_mutex_unlock(vmixer->device->mutex); DeviceReference(>device, NULL); @@ -230,9 +237,11 @@ VdpStatus vlVdpVideoMixerRender(VdpVideoMixer mixer, VdpLayer const *layers) { enum vl_compositor_deinterlace deinterlace; 
- struct u_rect rect, clip, *prect; + struct u_rect rect, clip, *prect, dirty_area; unsigned i, layer = 0; struct pipe_video_buffer *video_buffer; + struct pipe_sampler_view *sampler_view; + struct pipe_surface *surface; vlVdpVideoMixer *vmixer; vlVdpSurface *surf; @@ -325,7 +334,43 @@ VdpStatus vlVdpVideoMixerRender(VdpVideoMixer mixer, prect = } vl_compositor_set_buffer_layer(>cstate, compositor, layer, video_buffer, prect, NULL, deinterlace); - vl_compositor_set_layer_dst_area(>cstate, layer++, RectToPipe(destination_video_rect, )); + + if(vmixer->bicubic.filter) { + struct pipe_context *pipe; + struct pipe_resource res_tmpl, *res; + struct pipe_sampler_view sv_templ; + struct pipe_surface surf_templ; + + pipe = vmixer->device->context; + memset(_tmpl, 0, sizeof(res_tmpl)); + + res_tmpl.target = PIPE_TEXTURE_2D; + res_tmpl.width0 = surf->templat.width; + res_tmpl.height0 = surf->templat.height; + res_tmpl.format = dst->sampler_view->texture->format; + res_tmpl.depth0 = 1; + res_tmpl.array_size = 1; + res_tmpl.bind = PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_RENDER_TARGET; + res_tmpl.usage = PIPE_USAGE_DEFAULT; + + res = pipe->screen->resource_create(pipe->screen, _tmpl); + + vlVdpDefaultSamplerViewTemplate(_templ, res); + sampler_view = pipe->create_sampler_view(pipe, res, _templ); + + memset(_templ, 0, sizeof(surf_templ)); + surf_templ.format = res->format; + surface = pipe->create_surface(pipe, res, _templ); + + vl_compositor_reset_dirty_area(_area); + pipe_resource_reference(, NULL); + } else { + surface = dst->surface; + sampler_view = dst->sampler_view; + dirty_area = dst->dirty_area; + vl_compositor_set_layer_dst_area(>cstate, layer++, RectToPipe(destination_video_rect, )); + vl_compositor_set_dst_clip(>cstate, RectToPipe(destination_rect, )); + } for (i = 0; i < layer_count; ++i) { vlVdpOutputSurface *src = vlGetDataHTAB(layers->source_surface); @@ -343,22 +388,29 @@ VdpStatus vlVdpVideoMixerRender(VdpVideoMixer mixer, ++layers; } - 
vl_compositor_set_dst_clip(>cstate, RectToPipe(destination_rect, )); - if (!vmixer->noise_reduction.filter && !vmixer->sharpness.filter) + if (!vmixer->noise_reduction.filter && !vmixer->sharpness.filter && !vmixer->bicubic.filter) vlVdpSave4DelayedRendering(vmixer->device, destination_surface, >cstate); else { - vl_compositor_render(>cstate, compositor, dst->surface, >dirty_area, true); + vl_compositor_render(>cstate, compositor, surface, _area, true); - /* applying the noise reduction
[Mesa-dev] [PATCH 1/2] vl: add a bicubic interpolation filter(v4)
This is a shader based bicubic interpolater which uses cubic Hermite spline algorithm. v2: set dst_area and dst_clip during scaling (Christian) v3: clear the render target before rendering v4: intialize offsets while initializing shaders use a constant buffer to send dst_size to frag shader small changes to reduce calculation in shader Signed-off-by: Nayan Deshmukh--- src/gallium/auxiliary/Makefile.sources | 2 + src/gallium/auxiliary/vl/vl_bicubic_filter.c | 465 +++ src/gallium/auxiliary/vl/vl_bicubic_filter.h | 63 3 files changed, 530 insertions(+) create mode 100644 src/gallium/auxiliary/vl/vl_bicubic_filter.c create mode 100644 src/gallium/auxiliary/vl/vl_bicubic_filter.h diff --git a/src/gallium/auxiliary/Makefile.sources b/src/gallium/auxiliary/Makefile.sources index ab58358..e0311bf 100644 --- a/src/gallium/auxiliary/Makefile.sources +++ b/src/gallium/auxiliary/Makefile.sources @@ -317,6 +317,8 @@ NIR_SOURCES := \ nir/tgsi_to_nir.h VL_SOURCES := \ + vl/vl_bicubic_filter.c \ + vl/vl_bicubic_filter.h \ vl/vl_compositor.c \ vl/vl_compositor.h \ vl/vl_csc.c \ diff --git a/src/gallium/auxiliary/vl/vl_bicubic_filter.c b/src/gallium/auxiliary/vl/vl_bicubic_filter.c new file mode 100644 index 000..396e76d --- /dev/null +++ b/src/gallium/auxiliary/vl/vl_bicubic_filter.c @@ -0,0 +1,465 @@ +/** + * + * Copyright 2016 Nayan Deshmukh. + * All Rights Reserved. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sub license, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial portions + * of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. + * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR + * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ * + **/ + +#include + +#include "pipe/p_context.h" + +#include "tgsi/tgsi_ureg.h" + +#include "util/u_draw.h" +#include "util/u_memory.h" +#include "util/u_math.h" +#include "util/u_rect.h" + +#include "vl_types.h" +#include "vl_vertex_buffers.h" +#include "vl_bicubic_filter.h" + +enum VS_OUTPUT +{ + VS_O_VPOS = 0, + VS_O_VTEX = 0 +}; + +static void * +create_vert_shader(struct vl_bicubic_filter *filter) +{ + struct ureg_program *shader; + struct ureg_src i_vpos; + struct ureg_dst o_vpos, o_vtex; + + shader = ureg_create(PIPE_SHADER_VERTEX); + if (!shader) + return NULL; + + i_vpos = ureg_DECL_vs_input(shader, 0); + o_vpos = ureg_DECL_output(shader, TGSI_SEMANTIC_POSITION, VS_O_VPOS); + o_vtex = ureg_DECL_output(shader, TGSI_SEMANTIC_GENERIC, VS_O_VTEX); + + ureg_MOV(shader, o_vpos, i_vpos); + ureg_MOV(shader, o_vtex, i_vpos); + + ureg_END(shader); + + return ureg_create_shader_and_destroy(shader, filter->pipe); +} + +static void +create_frag_shader_cubic_interpolater(struct ureg_program *shader, struct ureg_src tex_a, + struct ureg_src tex_b, struct ureg_src tex_c, + struct ureg_src tex_d, struct ureg_src t, + struct ureg_dst o_fragment) +{ + struct ureg_dst temp[11]; + struct ureg_dst t_2; + unsigned i; + + for(i = 0; i < 11; ++i) + temp[i] = ureg_DECL_temporary(shader); + t_2 = ureg_DECL_temporary(shader); + + /* +* |temp[0]| | 0 2 0 0 | |tex_a| +* |temp[1]| = | -1 0 1 0 |* |tex_b| +* |temp[2]| | 2 -5 4 -1 | |tex_c| +* |temp[3]| | -1 3 -3 1 | |tex_d| +*/ + ureg_MUL(shader, temp[0], tex_b, ureg_imm1f(shader, 2.0f)); + + ureg_MUL(shader, temp[1], tex_a, ureg_imm1f(shader, -1.0f)); + ureg_MAD(shader, temp[1], tex_c, ureg_imm1f(shader, 1.0f), +ureg_src(temp[1])); + + ureg_MUL(shader, temp[2], tex_a, ureg_imm1f(shader, 2.0f)); + ureg_MAD(shader, temp[2], tex_b, ureg_imm1f(shader, -5.0f), +ureg_src(temp[2])); + ureg_MAD(shader, temp[2], tex_c, ureg_imm1f(shader, 4.0f), +ureg_src(temp[2])); + ureg_MAD(shader, temp[2], tex_d,
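The matrix in the comment of `create_frag_shader_cubic_interpolater` matches the usual Catmull-Rom form of a cubic Hermite spline. As a scalar reference for what the shader computes per channel, here is a minimal C sketch. The final factor of 0.5 comes from the standard Catmull-Rom formulation and is an assumption here, since the patch excerpt is cut off before any scaling step; `cubic_hermite` is a name chosen for this example.

```c
/* Interpolate between b and c (with neighbours a and d) at t in [0, 1].
 * c0..c3 are the rows of the matrix from the patch comment applied to
 * (a, b, c, d); the polynomial is evaluated in Horner form. */
static float cubic_hermite(float a, float b, float c, float d, float t)
{
   float c0 = 2.0f * b;
   float c1 = -a + c;
   float c2 = 2.0f * a - 5.0f * b + 4.0f * c - d;
   float c3 = -a + 3.0f * b - 3.0f * c + d;

   return 0.5f * (c0 + t * (c1 + t * (c2 + t * c3)));
}
```

With this scaling the curve passes through `b` at `t = 0` and through `c` at `t = 1`, and it reproduces linear data exactly. A bicubic filter applies this once per row of a 4x4 neighbourhood and then once more across the four row results.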
Re: [Mesa-dev] [PATCH 2/3] st/omx: add support for nouveau / interlaced
Hi Leo,

nice catch, patch is Reviewed-by: Christian König.

> But we still need to fix transcoding issue with interlaced as true.
> Our transcode support tunneling, basic the decode buffer will be used
> directly for encode.

Ah, yes of course. Sorry I was a bit fast with giving my rb on that,
should have thought about it more.

The problem is that the VCE engine can only handle progressive frames
and not the interlaced memory layout.

What we should do is implement interlaced -> progressive conversion in
the omx state tracker tunneling handling when that happens. Then set
interlaced to false in the template and reallocate the video buffer for
the next transcoding round.

The weave filter from the compositor could be used for interlaced ->
progressive conversion.

And btw: We are going to have the same problem with the VA-API state
tracker.

Regards,
Christian.

On 28.06.2016 at 09:01, Julien Isorce wrote:
> Thx Leo. I confirm it works with nouveau driver so your fix is:
> Tested-by: Julien Isorce
>
> On 28 June 2016 at 02:27, Liu, Leo wrote:
>> Hi Julien and Christian,
>>
>> I got a patch attached to fix the "fillout" problem, and please review.
>>
>> But we still need to fix transcoding issue with interlaced as true.
>> Our transcode support tunneling, basic the decode buffer will be used
>> directly for encode.
>>
>> Thanks,
>> Leo
>>
>> *From:* Julien Isorce
>> *Sent:* June 27, 2016 4:54:07 PM
>> *To:* Liu, Leo
>> *Cc:* ML mesa-dev; Gurkirpal Singh; Koenig, Christian
>> *Subject:* Re: [Mesa-dev] [PATCH 2/3] st/omx: add support for nouveau / interlaced
>>
>> Hi Leo,
>>
>> Sorry for the inconvenience, could you let me know how to reproduce
>> the problem ? I have been playing with some gst pipelines and they all
>> work but I can only test with nouveau driver.
>>
>> Cheers
>> Julien
>>
>> On 27 June 2016 at 21:35, Leo Liu wrote:
>>> This patch break omx decode to file, it got seg fault. Will take look
>>> further.
>>>
>>> Regards,
>>> Leo
>>>
>>> On 06/27/2016 04:16 AM, Julien Isorce wrote:
>>>> Signed-off-by: Julien Isorce
>>>> ---
>>>>  src/gallium/state_trackers/omx/vid_dec.c | 51
>>>>  1 file changed, 26 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/src/gallium/state_trackers/omx/vid_dec.c b/src/gallium/state_trackers/omx/vid_dec.c
>>>> index 564ca2f..85ffb88 100644
>>>> --- a/src/gallium/state_trackers/omx/vid_dec.c
>>>> +++ b/src/gallium/state_trackers/omx/vid_dec.c
>>>> @@ -48,6 +48,7 @@
>>>>  #include "pipe/p_video_codec.h"
>>>>  #include "util/u_memory.h"
>>>>  #include "util/u_surface.h"
>>>> +#include "vl/vl_video_buffer.h"
>>>>  #include "vl/vl_vlc.h"
>>>>
>>>>  #include "entrypoint.h"
>>>> @@ -515,34 +516,34 @@ static void vid_dec_FillOutput(vid_dec_PrivateType *priv, struct pipe_video_buff
>>>>     OMX_VIDEO_PORTDEFINITIONTYPE *def = >sPortParam.format.video;
>>>>
>>>>     struct pipe_sampler_view **views;
>>>> -   struct pipe_transfer *transfer;
>>>> -   struct pipe_box box = { };
>>>> -   uint8_t *src, *dst;
>>>> +   unsigned i, j;
>>>> +   unsigned width, height;
>>>>
>>>>     views = buf->get_sampler_view_planes(buf);
>>>>
>>>> -   dst = output->pBuffer;
>>>> -
>>>> -   box.width = def->nFrameWidth;
>>>> -   box.height = def->nFrameHeight;
>>>> -   box.depth = 1;
>>>> -
>>>> -   src = priv->pipe->transfer_map(priv->pipe, views[0]->texture, 0,
>>>> -                                  PIPE_TRANSFER_READ, , );
>>>> -   util_copy_rect(dst, views[0]->texture->format, def->nStride, 0, 0,
>>>> -                  box.width, box.height, src, transfer->stride, 0, 0);
>>>> -   pipe_transfer_unmap(priv->pipe, transfer);
>>>> -
>>>> -   dst = ((uint8_t*)output->pBuffer) + (def->nStride * box.height);
>>>> -
>>>> -   box.width = def->nFrameWidth / 2;
>>>> -   box.height = def->nFrameHeight / 2;
>>>> -
>>>> -   src = priv->pipe->transfer_map(priv->pipe, views[1]->texture, 0,
>>>> -                                  PIPE_TRANSFER_READ, , );
>>>> -   util_copy_rect(dst, views[1]->texture->format, def->nStride, 0, 0,
>>>> -                  box.width, box.height, src, transfer->stride, 0, 0);
>>>> -   pipe_transfer_unmap(priv->pipe, transfer);
>>>> +   for
Re: [Mesa-dev] [PATCH 0/7] mesa: Enable -fstrict-aliasing
On Mon, Jun 27, 2016 at 11:42 PM, Matt Turner wrote:
> Based on work by Davin McCall from last summer.
>
> The biggest change is to exec_list. Previously, the head and tail sentinels
> overlapped, saving the size of a pointer. Unfortunately this is not allowed by
> the aliasing rules.
>
> I have fixed all warnings GCC reports in my normal build. I have not attempted
> to see what else needs to be fixed. I hope that the respective owners of the
> rest of Mesa can look into the remaining warnings.
>
> This series depends on my 4 patch series to glx, and the trivial "[PATCH]
> i965: Simplify foreach_inst_in_block_safe() macro."
>
> Discuss!

I like it. I have some similar patches here:
https://github.com/kusma/mesa/tree/strict-aliasing

I'm not entirely convinced about the endianness-correctness of all of
the memcpy-conversions, though... but I could easily be wrong.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
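As background for the memcpy-conversions mentioned here, this is the usual strict-aliasing-safe way to reinterpret a float as an integer in C. It is a generic sketch, not code from either patch series; `float_to_bits` and `bits_to_float` are names chosen for this example, and it assumes a 32-bit IEEE-754 `float`.

```c
#include <stdint.h>
#include <string.h>

/* Copy the object representation instead of casting pointers, so no
 * object is ever accessed through an incompatible lvalue type. */
static uint32_t float_to_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof(u));
   return u;
}

static float bits_to_float(uint32_t u)
{
   float f;
   memcpy(&f, &u, sizeof(f));
   return f;
}
```

A same-size copy like this gives the same value regardless of byte order as long as floats and integers share the platform's endianness, which is the common case. Conversions that copy only part of an object, or copy between types of different sizes, are the ones where endianness can matter.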
Re: [Mesa-dev] [PATCH 1/2] gm107/ir: make use of FADD32I for all immediates
On 06/28/2016 05:10 AM, Ilia Mirkin wrote:
> On Mon, Jun 27, 2016 at 6:08 PM, Samuel Pitoiset wrote:
>> On 06/28/2016 12:06 AM, Ilia Mirkin wrote:
>>> On Mon, Jun 27, 2016 at 6:05 PM, Ilia Mirkin wrote:
>>>> On Mon, Jun 27, 2016 at 6:04 PM, Samuel Pitoiset wrote:
>>>>> On 06/28/2016 12:02 AM, Ilia Mirkin wrote:
>>>>>> This loses you saturation. Does the target account for this?
>>>>>
>>>>> No saturate flag for FADD32I.
>>>>
>>>> That's not what I asked. Specifically look at this code:
>>>>
>>>> bool
>>>> TargetNVC0::isSatSupported(const Instruction *insn) const
>>>> {
>>>>    if (insn->op == OP_CVT)
>>>>       return true;
>>>>    if (!(opInfo[insn->op].dstMods & NV50_IR_MOD_SAT))
>>>>       return false;
>>>>
>>>>    if (insn->dType == TYPE_U32)
>>>>       return (insn->op == OP_ADD) || (insn->op == OP_MAD);
>>>>
>>>>    // add f32 LIMM cannot saturate
>>>>    if (insn->op == OP_ADD && insn->sType == TYPE_F32) {
>>>>       if (insn->getSrc(1)->asImm() &&
>>>>           insn->getSrc(1)->reg.data.u32 & 0xfff)
>>>>          return false;
>>>>    }
>>>
>>> Note how it will say that sat is supported for SIMMs with FADD? So the
>>> compiler will generate those ops, but then the emitter won't be able
>>> to handle it.
>>
>> Okay, I get it.
>
> By the way, instead of trying to fight the longIMMD, you should just fix it -
>
> /*0008*/ @P0 FADD R0, R0, 1.NEG; /* 0x3858203f8000 */
>
> which corresponds nicely to
>
> emitNEG(0x2d, insn->src(1));
>
> The issue is that emitIMMD does
>
> if (len == 19) {
> ...
> emitField( 56, 1, (val & 0x8) >> 19);
> emitField(pos, len, (val & 0x7));
>
> So the problem is that the 56 isn't as fixed as the emission code had
> hoped. I suspect that adjusting it will fix all these silly cases.
>
> -ilia

/*0010*/ @P0 FADD R0, R0, 0.NEG; /* 0x38582000 */
/*0010*/ @P0 FADD R0, R0, -0; /* 0x3958 */

urgh?

--
-Samuel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
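The `isSatSupported` excerpt quoted in this message rejects saturation for an f32 ADD when the immediate has any of its low 12 bits set. The standalone C sketch below reproduces just that bit test on ordinary floats so the rule is easy to try out. `fadd_imm_sat_ok` is a hypothetical helper, not a function in the nouveau code generator, and the reading that such values need the long immediate form is taken from the comment in the excerpt.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Raw IEEE-754 bits of a float, obtained without pointer casts. */
static uint32_t f32_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof(u));
   return u;
}

/* Mirrors the quoted check: an f32 immediate with any of its low 12
 * bits set is treated as a long immediate, which cannot saturate. */
static bool fadd_imm_sat_ok(float imm)
{
   return (f32_bits(imm) & 0xfff) == 0;
}
```

Values such as 1.0 or 0.5 have zero low mantissa bits and pass, while 0.1 (bits 0x3dcccccd) does not.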
[Mesa-dev] nouveau_drv_video.so ?
nouveau_drv_video.so - what should it be?

https://koji.fedoraproject.org/koji/buildinfo?buildID=722316
...
0.7.4-13
- Revert symlinks - should be handled by mesa rhbz#1271842
  https://bugzilla.redhat.com/show_bug.cgi?id=1271842
...
0.7.4-12
- Add symlinks for radeonsi,r600,nouveau - rhbz#1264499
  https://bugzilla.redhat.com/show_bug.cgi?id=1264499

$ rpm -q libva libva-vdpau-driver mesa-dri-drivers
libva-1.7.1-1.fc24.x86_64
libva-vdpau-driver-0.7.4-14.fc24.x86_64
mesa-dri-drivers-11.2.2-2.20160614.fc24.x86_64

$ rpm -ql libva-vdpau-driver
/usr/lib64/dri/nvidia_drv_video.so
/usr/lib64/dri/s3g_drv_video.so
/usr/lib64/dri/vdpau_drv_video.so
/usr/share/doc/...
...

$ rpm -ql mesa-dri-drivers
/etc/drirc
/usr/lib64/dri/gallium_drv_video.so
/usr/lib64/dri/i915_dri.so
/usr/lib64/dri/i965_dri.so
/usr/lib64/dri/ilo_dri.so
/usr/lib64/dri/kms_swrast_dri.so
/usr/lib64/dri/nouveau_dri.so
/usr/lib64/dri/nouveau_vieux_dri.so
/usr/lib64/dri/r200_dri.so
/usr/lib64/dri/r300_dri.so
/usr/lib64/dri/r600_dri.so
/usr/lib64/dri/radeon_dri.so
/usr/lib64/dri/radeonsi_dri.so
/usr/lib64/dri/swrast_dri.so
/usr/lib64/dri/virtio_gpu_dri.so
/usr/lib64/dri/vmwgfx_dri.so
/usr/lib64/gallium-pipe
/usr/lib64/gallium-pipe/pipe_i965.so
/usr/lib64/gallium-pipe/pipe_nouveau.so
/usr/lib64/gallium-pipe/pipe_r300.so
/usr/lib64/gallium-pipe/pipe_r600.so
/usr/lib64/gallium-pipe/pipe_radeonsi.so
/usr/lib64/gallium-pipe/pipe_swrast.so
/usr/lib64/gallium-pipe/pipe_vmwgfx.so

$ ll /usr/lib64/dri/
... dummy_drv_video.so
... gallium_drv_video.so
... i915_dri.so
... i965_dri.so
... ilo_dri.so
... kms_swrast_dri.so
... nouveau_dri.so
... nouveau_vieux_dri.so
... nvidia_drv_video.so -> vdpau_drv_video.so
... r200_dri.so
... r300_dri.so
... r600_dri.so
... radeon_dri.so
... radeonsi_dri.so
... s3g_drv_video.so -> vdpau_drv_video.so
... swrast_dri.so
... vdpau_drv_video.so
... virtio_gpu_dri.so
... vmwgfx_dri.so

$ icecat
...
libva info: VA-API version 0.39.2
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib64/dri/nouveau_drv_video.so
libva info: va_openDriver() returns -1
libva info: VA-API version 0.39.2
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib64/dri/nouveau_drv_video.so
libva info: va_openDriver() returns -1
libva info: VA-API version 0.39.2
libva info: va_getDriverName() returns 0
libva info: Trying to open /usr/lib64/dri/gallium_drv_video.so
libva info: Found init function __vaDriverInit_0_39
libva info: va_openDriver() returns 0
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] V3 On disk shader cache for i965 (Now with real world results!)
On Mon, 2016-06-27 at 00:46 +1000, Timothy Arceri wrote:
> On Sun, 2016-06-26 at 16:15 +0300, Grazvydas Ignotas wrote:
> > Tried this while playing with apitrace and am getting segfaults when
> > running any trace with a cached (second) run. Not sure if it's "wrong"
> > traces I've chosen or what, you can take one example from this bug:
> > https://bugs.freedesktop.org/show_bug.cgi?id=96425
>
> Thanks for testing I'll take a look tomorrow.

The problem is the shaders were being detached after linking, so we had
nothing to fall back to if we had a shader cache miss.

I've hacked something up and pushed it to the shader-cache19 branch that
allows the trace to run. Not sure how it relates to real game performance
but the trace goes from 5FPS to 7FPS on the second run on my machine,
which looks good :)

Note I would only use that branch for short testing (e.g. running traces),
not in real games, as the hack above will leak memory. I'll be travelling
tomorrow but should have a real fix by Thursday.

Thanks again for testing.

> >
> > It would also be good idea to hide the cache debug messages behind
> > some env var, or at least send them to stderr and not stdout, as
> > stdout breaks programs that pipe data through stdout like qapitrace.
>
> Right, that's my next task, I should get this done tomorrow also. As
> stated below :) "For now I have left in some printf's as the feature is
> still disabled by default and they are useful for debugging. I intend
> to fix this soon to hide them behind an environment var."
>
> Thanks again.
>
> >
> > Gražvydas
> >
> > On Sun, Jun 26, 2016 at 7:16 AM, Timothy Arceri wrote:
> > > I've spent a bunch of time rebasing this series to remove the excess
> > > code churn and I've just pushed the results to the shader-cache
> > > branch mentioned below.
There are no code changes to the end result but > > > I've > > > managed to get the patch count down to 80 (was 96 i think) and > > > things > > > should be much easier to review now. > > > > > > I've also had reports of people testing with additional games > > > such > > > as > > > Dota 2 and seeing good results. > > > > > > > > > On Tue, 2016-06-21 at 16:08 +1000, Timothy Arceri wrote: > > > > Rather than send 90+ patches to the list. Please see the repo > > > > at > > > > the > > > > bottom of this email. > > > > > > > > The big update is I've added all stages but compute and tested > > > > with a > > > > few games and everything seems to be working well so far. > > > > Enabling > > > > shader cache with the Shadow of Mordor benchmark make things > > > > noticeably > > > > smoother and helps consitently keep the min FPS at 15 on my > > > > Skylake, > > > > were as without it can be anywhere between 4-15. > > > > > > > > The elemental demo which Dave pointed out as also doing a bunch > > > > of > > > > compiles during the demo is also smoother especially on the > > > > second > > > > run > > > > but its really slow on my Skylake regardless. Maybe someone > > > > with > > > > a > > > > highend Skylake would like to give it a try. > > > > > > > > > > > > V3: > > > > - add support for geometry and tessellation stages > > > > - cache clip planes > > > > - reserve parameter storage before restoring list > > > > - stop losing buffer blocks on cache fallback > > > > - lots of little fixes I cant remember > > > > > > > > V2: > > > > - rebased on master > > > > - add support for encoding doubles > > > > - renamed skip_cache params to is_cache_fallback, and fix > > > > related > > > > bug > > > > when > > > > disabling shader cache for xfb. > > > > > > > > This series is based on the great work done by Carl, Kristian > > > > and > > > > others. 
> > > > > > > > I've split up Carls original patches for easier review, and > > > > also > > > > merged > > > > a number of fixes and clean-ups into his patches. However there > > > > is a > > > > little more code churn than is ideal as the appoach taken by > > > > the > > > > original patches needed to be modified quite a lot, I'm hoping > > > > its > > > > not > > > > more than people can live with as I'd like to keep some of the > > > > history > > > > rather than just squashing everything. > > > > > > > > For now I have left in some printf's as the feature is still > > > > disabled > > > > by default and they are useful for debugging. I intend to fix > > > > this > > > > soon > > > > to hide them behind an environment var. > > > > > > > > There are no regressions after two runs of piglit with shader > > > > cache > > > > enabled on my Broadwell machine. > > > > > > > > This series enables on disk shader cache for all stage except > > > > compute > > > > programs. For now transform feedback, and SSO programs skip > > > > using > > > > the > > > > cache, these will be added as follow ups. > > > > > > > > My main goal with this series is to land something that > > > > passes piglit there is a number of optimisations
Re: [Mesa-dev] [PATCH 2/3] st/omx: add support for nouveau / interlaced
Thx Leo. I confirm it works with nouveau driver so your fix is:
Tested-by: Julien Isorce

On 28 June 2016 at 02:27, Liu, Leo wrote:
> Hi Julien and Christian,
>
> I got a patch attached to fix the "fillout" problem, and please review.
>
> But we still need to fix the transcoding issue with interlaced as true.
> Our transcode supports tunneling; basically the decode buffer will be
> used directly for encode.
>
> Thanks,
> Leo
>
> --
> *From:* Julien Isorce
> *Sent:* June 27, 2016 4:54:07 PM
> *To:* Liu, Leo
> *Cc:* ML mesa-dev; Gurkirpal Singh; Koenig, Christian
> *Subject:* Re: [Mesa-dev] [PATCH 2/3] st/omx: add support for nouveau /
> interlaced
>
> Hi Leo,
>
> Sorry for the inconvenience, could you let me know how to reproduce the
> problem?
> I have been playing with some gst pipelines and they all work but I can
> only test with nouveau driver.
>
> Cheers
> Julien
>
> On 27 June 2016 at 21:35, Leo Liu wrote:
>
>> This patch breaks omx decode to file, it got a seg fault. Will take a
>> look further.
>> >> Regards, >> Leo >> >> >> >> On 06/27/2016 04:16 AM, Julien Isorce wrote: >> >>> Signed-off-by: Julien Isorce >>> --- >>> src/gallium/state_trackers/omx/vid_dec.c | 51 >>> >>> 1 file changed, 26 insertions(+), 25 deletions(-) >>> >>> diff --git a/src/gallium/state_trackers/omx/vid_dec.c >>> b/src/gallium/state_trackers/omx/vid_dec.c >>> index 564ca2f..85ffb88 100644 >>> --- a/src/gallium/state_trackers/omx/vid_dec.c >>> +++ b/src/gallium/state_trackers/omx/vid_dec.c >>> @@ -48,6 +48,7 @@ >>> #include "pipe/p_video_codec.h" >>> #include "util/u_memory.h" >>> #include "util/u_surface.h" >>> +#include "vl/vl_video_buffer.h" >>> #include "vl/vl_vlc.h" >>> #include "entrypoint.h" >>> @@ -515,34 +516,34 @@ static void vid_dec_FillOutput(vid_dec_PrivateType >>> *priv, struct pipe_video_buff >>> OMX_VIDEO_PORTDEFINITIONTYPE *def = >sPortParam.format.video; >>>struct pipe_sampler_view **views; >>> - struct pipe_transfer *transfer; >>> - struct pipe_box box = { }; >>> - uint8_t *src, *dst; >>> + unsigned i, j; >>> + unsigned width, height; >>>views = buf->get_sampler_view_planes(buf); >>> - dst = output->pBuffer; >>> - >>> - box.width = def->nFrameWidth; >>> - box.height = def->nFrameHeight; >>> - box.depth = 1; >>> - >>> - src = priv->pipe->transfer_map(priv->pipe, views[0]->texture, 0, >>> - PIPE_TRANSFER_READ, , ); >>> - util_copy_rect(dst, views[0]->texture->format, def->nStride, 0, 0, >>> - box.width, box.height, src, transfer->stride, 0, 0); >>> - pipe_transfer_unmap(priv->pipe, transfer); >>> - >>> - dst = ((uint8_t*)output->pBuffer) + (def->nStride * box.height); >>> - >>> - box.width = def->nFrameWidth / 2; >>> - box.height = def->nFrameHeight / 2; >>> - >>> - src = priv->pipe->transfer_map(priv->pipe, views[1]->texture, 0, >>> - PIPE_TRANSFER_READ, , ); >>> - util_copy_rect(dst, views[1]->texture->format, def->nStride, 0, 0, >>> - box.width, box.height, src, transfer->stride, 0, 0); >>> - pipe_transfer_unmap(priv->pipe, transfer); >>> + for (i = 0; i < 2 /* 
NV12 */; i++) { >>> + if (!views[i]) continue; >>> + width = buf->width; >>> + height = buf->height; >>> + vl_video_buffer_adjust_size(, , i, buf->interlaced, >>> buf->chroma_format); >>> + for (j = 0; j < views[i]->texture->array_size; ++j) { >>> + struct pipe_box box = {0, 0, j, width, height, 1}; >>> + struct pipe_transfer *transfer; >>> + uint8_t *map, *dst; >>> + map = priv->pipe->transfer_map(priv->pipe, views[i]->texture, >>> 0, >>> + PIPE_TRANSFER_READ, , ); >>> + if (!map) >>> +return; >>> + >>> + dst = ((uint8_t*)output->pBuffer + output->nOffset) + j * >>> def->nStride + i * buf->width * buf->height; >>> + util_copy_rect(dst, >>> +views[i]->texture->format, >>> +def->nStride * views[i]->texture->array_size, 0, 0, >>> +box.width, box.height, map, transfer->stride, 0, 0); >>> + >>> + pipe_transfer_unmap(priv->pipe, transfer); >>> + } >>> + } >>> } >>> static void vid_dec_FrameDecoded(OMX_COMPONENTTYPE *comp, >>> OMX_BUFFERHEADERTYPE* input, >>> >> >> > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
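As a reading aid for the destination addressing in the patch above (field j starts at row j, and the row pitch is the stride multiplied by the number of array layers), here is a small standalone C sketch. It is not the st/omx code: weave_fields() and nv12_plane_offset() are hypothetical names, and the buffers are plain byte arrays rather than mapped pipe transfers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `layers` separately stored fields into one plane of a progressive
 * output frame. Field j supplies output rows j, j + layers, j + 2*layers...
 * With layers == 1 this degenerates to a plain row-by-row copy. */
static void
weave_fields(uint8_t *dst_plane, size_t dst_stride,
             const uint8_t *const *fields, size_t src_stride,
             unsigned layers, unsigned row_bytes, unsigned field_rows)
{
   assert(layers > 0 && row_bytes <= dst_stride && row_bytes <= src_stride);

   for (unsigned j = 0; j < layers; j++) {
      /* First output row that belongs to field j. */
      uint8_t *dst = dst_plane + (size_t)j * dst_stride;

      for (unsigned y = 0; y < field_rows; y++) {
         memcpy(dst + (size_t)y * dst_stride * layers,
                fields[j] + (size_t)y * src_stride, row_bytes);
      }
   }
}

/* Byte offset of plane `plane` in a tightly packed NV12 frame, mirroring
 * the `i * width * height` term in the patch (0 = luma, 1 = chroma). */
static size_t
nv12_plane_offset(unsigned plane, unsigned width, unsigned height)
{
   return (size_t)plane * width * height;
}
```

For a 4x4 luma plane stored as two 4x2 fields, calling weave_fields() with layers = 2 places the first field on rows 0 and 2 and the second on rows 1 and 3, which is the interleaving the larger row pitch in the patch appears to be aiming for.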
Re: [Mesa-dev] [PATCH 2/2] clover: fix getting struct args api size
Francisco Jerez writes:

> Serge Martin writes:
>
>> This fixes getting the size of a struct arg. vec3 types still work ok.
>> Only built-in args need to have power of two alignment, getTypeAllocSize
>> reports the correct size.
>> ---
>>  src/gallium/state_trackers/clover/llvm/invocation.cpp | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp
>> b/src/gallium/state_trackers/clover/llvm/invocation.cpp
>> index 03487d6..9af51539 100644
>> --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
>> +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
>> @@ -472,7 +472,8 @@ namespace {
>>              // aligned to the next larger power of two". We need this
>>              // alignment for three element vectors, which have
>>              // non-power-of-2 store size.
>> -            const unsigned arg_api_size =
>> util_next_power_of_two(arg_store_size);
>> +            const unsigned arg_api_size = arg_type->isStructTy() ?
>> +               arg_store_size : util_next_power_of_two(arg_store_size);
>>
> Hm... Isn't this still going to be broken if you pass a struct argument
> to a kernel function and the alignment of any of the struct members
> doesn't match the target-specific data layout? Not sure we can fix this
> sensibly without requiring the target's data layout to match the CL API
> exactly. Any suggestions Tom?
>
Unless someone has a better plan, I suggest we roll back to v1.1 of this
patch and call it a back-end data layout bug if the expected alignment or
size of a kernel argument type doesn't match the requirements set by the
CL spec.

>>              llvm::Type *target_type = arg_type->isIntegerTy() ?
>>                 TD.getSmallestLegalIntType(mod->getContext(), arg_store_size
>> * 8)
>> --
>> 2.5.5
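For readers following the size discussion above, a minimal C sketch of the rule the patch proposes: round the store size up to a power of two for built-in types, but keep the store size unchanged for structs. The helper names here are hypothetical and this is not the clover code; next_pow2() only approximates what util_next_power_of_two is used for in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Smallest power of two that is >= x (hypothetical helper). */
static unsigned
next_pow2(unsigned x)
{
   assert(x != 0 && x <= 0x80000000u);
   unsigned p = 1;
   while (p < x)
      p <<= 1;
   return p;
}

/* Size the API side would expect for a kernel argument, following the
 * rule in the patch: structs keep their store size, everything else is
 * padded to the next power of two. */
static unsigned
arg_api_size(unsigned store_size, bool is_struct)
{
   return is_struct ? store_size : next_pow2(store_size);
}
```

Under this rule a three-element float vector with a 12-byte store size is treated as 16 bytes, while a 12-byte struct stays at 12. As Francisco notes, this says nothing about whether member alignment inside the struct matches what the CL API expects, so the sketch only covers the total size.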
Re: [Mesa-dev] [PATCH v2 31/34] i965/state: Account for the element size in emit_buffer_surface_state
On Thu, Jun 23, 2016 at 02:00:30PM -0700, Jason Ekstrand wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c  | 11 ++-
>  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 9 +
>  src/mesa/drivers/dri/i965/gen8_surface_state.c    | 9 +
>  3 files changed, 16 insertions(+), 13 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> index 944d64d..29b8976 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> @@ -496,6 +496,7 @@ gen4_emit_buffer_surface_state(struct brw_context *brw,
>                                 unsigned pitch,
>                                 bool rw)
>  {
> +   unsigned elements = buffer_size / pitch;

Could be const as well as in the two other occurrences further down.

>     uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
>                                      6 * 4, 32, out_offset);
>     memset(surf, 0, 6 * 4);
> @@ -504,9 +505,9 @@ gen4_emit_buffer_surface_state(struct brw_context *brw,
>               surface_format << BRW_SURFACE_FORMAT_SHIFT |
>               (brw->gen >= 6 ? BRW_SURFACE_RC_READ_WRITE : 0);
>     surf[1] = (bo ? bo->offset64 : 0) + buffer_offset; /* reloc */
> -   surf[2] = ((buffer_size - 1) & 0x7f) << BRW_SURFACE_WIDTH_SHIFT |
> -             (((buffer_size - 1) >> 7) & 0x1fff) << BRW_SURFACE_HEIGHT_SHIFT;
> -   surf[3] = (((buffer_size - 1) >> 20) & 0x7f) << BRW_SURFACE_DEPTH_SHIFT |
> +   surf[2] = ((elements - 1) & 0x7f) << BRW_SURFACE_WIDTH_SHIFT |
> +             (((elements - 1) >> 7) & 0x1fff) << BRW_SURFACE_HEIGHT_SHIFT;
> +   surf[3] = (((elements - 1) >> 20) & 0x7f) << BRW_SURFACE_DEPTH_SHIFT |
>               (pitch - 1) << BRW_SURFACE_PITCH_SHIFT;
>
>     /* Emit relocation to surface contents.
The 965 PRM, Volume 4, section > @@ -549,7 +550,7 @@ brw_update_buffer_texture_surface(struct gl_context *ctx, > brw->vtbl.emit_buffer_surface_state(brw, surf_offset, bo, > tObj->BufferOffset, > brw_format, > - size / texel_size, > + size, > texel_size, > false /* rw */); > } > @@ -1480,7 +1481,7 @@ update_image_surface(struct brw_context *brw, > > brw->vtbl.emit_buffer_surface_state( > brw, surf_offset, intel_obj->buffer, obj->BufferOffset, > -format, intel_obj->Base.Size / texel_size, texel_size, > +format, intel_obj->Base.Size, texel_size, > access != GL_READ_ONLY); > > update_buffer_image_param(brw, u, surface_idx, param); > diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c > b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c > index bb94f2d..65a1cb0 100644 > --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c > +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c > @@ -135,6 +135,7 @@ gen7_emit_buffer_surface_state(struct brw_context *brw, > unsigned pitch, > bool rw) > { > + unsigned elements = buffer_size / pitch; > uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE, > 8 * 4, 32, out_offset); > memset(surf, 0, 8 * 4); > @@ -143,12 +144,12 @@ gen7_emit_buffer_surface_state(struct brw_context *brw, > surface_format << BRW_SURFACE_FORMAT_SHIFT | > BRW_SURFACE_RC_READ_WRITE; > surf[1] = (bo ? 
bo->offset64 : 0) + buffer_offset; /* reloc */ > - surf[2] = SET_FIELD((buffer_size - 1) & 0x7f, GEN7_SURFACE_WIDTH) | > - SET_FIELD(((buffer_size - 1) >> 7) & 0x3fff, > GEN7_SURFACE_HEIGHT); > + surf[2] = SET_FIELD((elements - 1) & 0x7f, GEN7_SURFACE_WIDTH) | > + SET_FIELD(((elements - 1) >> 7) & 0x3fff, GEN7_SURFACE_HEIGHT); > if (surface_format == BRW_SURFACEFORMAT_RAW) > - surf[3] = SET_FIELD(((buffer_size - 1) >> 21) & 0x3ff, > BRW_SURFACE_DEPTH); > + surf[3] = SET_FIELD(((elements - 1) >> 21) & 0x3ff, BRW_SURFACE_DEPTH); > else > - surf[3] = SET_FIELD(((buffer_size - 1) >> 21) & 0x3f, > BRW_SURFACE_DEPTH); > + surf[3] = SET_FIELD(((elements - 1) >> 21) & 0x3f, BRW_SURFACE_DEPTH); > surf[3] |= (pitch - 1); > > surf[5] = SET_FIELD(GEN7_MOCS_L3, GEN7_SURFACE_MOCS); > diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c > b/src/mesa/drivers/dri/i965/gen8_surface_state.c > index 00e4c48..9ac8a48 100644 > --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c > +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c > @@ -63,6 +63,7 @@ gen8_emit_buffer_surface_state(struct brw_context *brw, > unsigned pitch, > bool rw) > { > + unsigned