Re: [Mesa-dev] [PATCH 1/2] gallium: add texture gather support to gallium
On 07.02.2014 23:25, Dave Airlie wrote:

Doh, yes, because GL has ARB_texture_gather and then has stuff hidden away in ARB_gpu_shader5, I forgot to add the extra bits, which I suppose we should do. So I've reposted with the component selection in src1 now.

Hmm, seems a bit excessive to use an extra reg for that (gather4, but only in d3d11 form, uses a src_sel on the sampler reg, but that might not work). I realize this is actually more messy than I thought, since the initial ARB_texture_gather had the ability to query if multi-channel formats are allowed, but had no way to select the channel (somewhat relying on ARB_texture_swizzle to do it, though of course you can't issue multiple gathers with the same texture to get different channels that way). But the glsl 4.00 version could select the channel. Is the ARB_texture_gather version actually all that useful or could you merge the two caps? That is, if you have the ability to fetch from multi-channel textures, assume you can also select the channel. The sm4 version of gather4 also has the single-channel format restriction - I guess though some hw really can do 4 channels without channel selection.

Yeah, I think I'll rethink this stuff. It looks like two caps: one for MAX_COMPONENTS for ARB_texture_gather4, and just one cap for TEXTURE_GATHER_SM5 support, which would denote support for all the ARB_gpu_shader5 bits.

Other than that, what about shadow samplers? Gather4 of course can't do it (because the d3d10-style opcodes have different opcodes for shadow comparisons), but the GL style opcodes are usually the same whether or not shadow samplers are used. Maybe you don't want to handle that right now, just saying that if you'd want to use the same opcode you'd be missing a component in case of texture cube arrays... Since this can't be used for fixed function though, I'd guess nothing would stop you from using a different opcode for shadow samplers.

I've gotten shadow samplers to work with the current opcodes, though I still have to check cube arrays, where we may be running out of space to put everything. Also the GPU_shader5 spec has a few more oddities: you have textureGatherOffset, which can take a non-constant set of offset values to apply to all 4 texels, then you have textureGatherOffsets, which only takes constants again, but 4 of them, one per texel. Looking at radeon hw, it appears fglrx decomposes textureGatherOffsets into multiple gather instructions at the hw level, but using the non-constant hw support to do this. So I'm not sure if the gallium interface should just support non-constant for all offsets and just restrict the GL.

Fwiw Fermi+ support 4 different non-constant offsets, since they're passed in a register anyway.

I've reworked the state tracker code already,
http://cgit.freedesktop.org/~airlied/mesa/commit/?h=r600g-texture-gather&id=444bc1c8118d51600a58af8a84088e94d0800b22
but I suspect I've a bit further down the rabbit hole to go.

Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
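For readers following the offset discussion, here is a rough software sketch of the decomposition described above: a gather with four per-texel offsets expressed as four single-offset gathers, keeping only the i0j0 texel of each. Everything in it is an assumption made for illustration (the 4x4 RGBA8 texture, clamp-to-edge addressing, integer footprint coordinates, and the helper names `fetch`, `gather_offset` and `gather_offsets`); it is not what fglrx or any gallium driver actually emits. The result order (i0j1, i1j1, i1j0, i0j0) follows the GLSL textureGather definition.

```c
#include <assert.h>
#include <stdint.h>

#define TEX_W 4
#define TEX_H 4

/* One RGBA8 texel; c[0..3] = R, G, B, A. */
struct texel { uint8_t c[4]; };

static int clampi(int v, int lo, int hi)
{
   return v < lo ? lo : (v > hi ? hi : v);
}

/* Clamp-to-edge fetch of a single component (addressing mode is an assumption). */
static uint8_t fetch(const struct texel *tex, int x, int y, unsigned comp)
{
   x = clampi(x, 0, TEX_W - 1);
   y = clampi(y, 0, TEX_H - 1);
   return tex[y * TEX_W + x].c[comp];
}

/* Gather one component from the 2x2 footprint at (i0, j0), with a single
 * offset applied to all four texels (textureGatherOffset-like). */
static void gather_offset(const struct texel *tex, int i0, int j0,
                          int ox, int oy, unsigned comp, uint8_t out[4])
{
   assert(comp < 4);
   out[0] = fetch(tex, i0 + ox,     j0 + oy + 1, comp); /* i0j1 */
   out[1] = fetch(tex, i0 + ox + 1, j0 + oy + 1, comp); /* i1j1 */
   out[2] = fetch(tex, i0 + ox + 1, j0 + oy,     comp); /* i1j0 */
   out[3] = fetch(tex, i0 + ox,     j0 + oy,     comp); /* i0j0 */
}

/* textureGatherOffsets-like: one offset per result component, decomposed
 * into four single-offset gathers; only the i0j0 texel of each is kept. */
static void gather_offsets(const struct texel *tex, int i0, int j0,
                           const int offs[4][2], unsigned comp, uint8_t out[4])
{
   for (int k = 0; k < 4; k++) {
      uint8_t tmp[4];
      gather_offset(tex, i0, j0, offs[k][0], offs[k][1], comp, tmp);
      out[k] = tmp[3];
   }
}
```

Hardware that takes four offsets in a register (as mentioned for Fermi+) would not need the loop; the sketch only shows why the single-offset form is sufficient to build the four-offset one.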
[Mesa-dev] [PATCH] mesa/st: expose ARB_texture_rgb10_a2ui if R10G10B10A2_UINT is supported v2
---
 src/mesa/state_tracker/st_extensions.c | 4 +++-
 src/mesa/state_tracker/st_format.c     | 6 +++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index 5e4a3b3..8c49e54 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -419,7 +419,9 @@ void st_init_extensions(struct st_context *st)
           PIPE_FORMAT_R16G16B16A16_FLOAT } },
 
       { { o(ARB_texture_rgb10_a2ui) },
-        { PIPE_FORMAT_B10G10R10A2_UINT } },
+        { PIPE_FORMAT_R10G10B10A2_UINT,
+          PIPE_FORMAT_B10G10R10A2_UINT },
+         GL_TRUE }, /* at least one format must be supported */
 
       { { o(EXT_framebuffer_sRGB) },
         { PIPE_FORMAT_A8B8G8R8_SRGB,
diff --git a/src/mesa/state_tracker/st_format.c b/src/mesa/state_tracker/st_format.c
index 6acf983..320d3d4 100644
--- a/src/mesa/state_tracker/st_format.c
+++ b/src/mesa/state_tracker/st_format.c
@@ -359,6 +359,8 @@ st_mesa_format_to_pipe_format(gl_format mesaFormat)
       return PIPE_FORMAT_R11G11B10_FLOAT;
    case MESA_FORMAT_ARGB2101010_UINT:
       return PIPE_FORMAT_B10G10R10A2_UINT;
+   case MESA_FORMAT_ABGR2101010_UINT:
+      return PIPE_FORMAT_R10G10B10A2_UINT;
 
    case MESA_FORMAT_XRGB_UNORM:
       return PIPE_FORMAT_B4G4R4X4_UNORM;
@@ -712,6 +714,8 @@ st_pipe_format_to_mesa_format(enum pipe_format format)
 
    case PIPE_FORMAT_B10G10R10A2_UINT:
       return MESA_FORMAT_ARGB2101010_UINT;
+   case PIPE_FORMAT_R10G10B10A2_UINT:
+      return MESA_FORMAT_ABGR2101010_UINT;
 
    case PIPE_FORMAT_B4G4R4X4_UNORM:
       return MESA_FORMAT_XRGB_UNORM;
@@ -1483,7 +1487,7 @@ static const struct format_mapping format_map[] = {
    },
    {
       { GL_RGB10_A2UI, 0 },
-      { PIPE_FORMAT_B10G10R10A2_UINT, 0 }
+      { PIPE_FORMAT_R10G10B10A2_UINT, PIPE_FORMAT_B10G10R10A2_UINT, 0 }
    },
 };
 
-- 
1.8.1.5
[Mesa-dev] [PATCH 1/3] st/mesa: fix GS varyings for PIPE_CAP_TGSI_TEXCOORD
---
 src/mesa/state_tracker/st_program.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/src/mesa/state_tracker/st_program.c b/src/mesa/state_tracker/st_program.c
index f72122b..f13132e 100644
--- a/src/mesa/state_tracker/st_program.c
+++ b/src/mesa/state_tracker/st_program.c
@@ -944,17 +944,16 @@ st_translate_geometry_program(struct st_context *st,
          case VARYING_SLOT_TEX5:
          case VARYING_SLOT_TEX6:
          case VARYING_SLOT_TEX7:
-            stgp->input_semantic_name[slot] = TGSI_SEMANTIC_GENERIC;
+            stgp->input_semantic_name[slot] = st->needs_texcoord_semantic ?
+               TGSI_SEMANTIC_TEXCOORD : TGSI_SEMANTIC_GENERIC;
             stgp->input_semantic_index[slot] = (attr - VARYING_SLOT_TEX0);
             break;
          case VARYING_SLOT_VAR0:
          default:
             assert(attr >= VARYING_SLOT_VAR0 && attr < VARYING_SLOT_MAX);
             stgp->input_semantic_name[slot] = TGSI_SEMANTIC_GENERIC;
-            stgp->input_semantic_index[slot] = (VARYING_SLOT_VAR0 -
-                                                VARYING_SLOT_TEX0 +
-                                                attr -
-                                                VARYING_SLOT_VAR0);
+            stgp->input_semantic_index[slot] = st->needs_texcoord_semantic ?
+               (attr - VARYING_SLOT_VAR0) : (attr - VARYING_SLOT_TEX0);
             break;
          }
       }
@@ -1036,7 +1035,8 @@ st_translate_geometry_program(struct st_context *st,
          case VARYING_SLOT_TEX5:
          case VARYING_SLOT_TEX6:
          case VARYING_SLOT_TEX7:
-            gs_output_semantic_name[slot] = TGSI_SEMANTIC_GENERIC;
+            gs_output_semantic_name[slot] = st->needs_texcoord_semantic ?
+               TGSI_SEMANTIC_TEXCOORD : TGSI_SEMANTIC_GENERIC;
             gs_output_semantic_index[slot] = (attr - VARYING_SLOT_TEX0);
             break;
          case VARYING_SLOT_VAR0:
@@ -1044,10 +1044,9 @@ st_translate_geometry_program(struct st_context *st,
             assert(slot < Elements(gs_output_semantic_name));
             assert(attr >= VARYING_SLOT_VAR0);
             gs_output_semantic_name[slot] = TGSI_SEMANTIC_GENERIC;
-            gs_output_semantic_index[slot] = (VARYING_SLOT_VAR0 -
-                                              VARYING_SLOT_TEX0 +
-                                              attr -
-                                              VARYING_SLOT_VAR0);
+            gs_output_semantic_index[slot] = st->needs_texcoord_semantic ?
+               (attr - VARYING_SLOT_VAR0) : (attr - VARYING_SLOT_TEX0);
+            break;
          }
       }
    }
-- 
1.8.1.5
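To make the index arithmetic in the patch above easier to check by hand, here is a small stand-alone sketch of the same mapping. The slot numbers and the names `SLOT_*`, `SEM_*` and `map_varying` are hypothetical and chosen only for this illustration; the real values live in Mesa's headers and are not reproduced here. The point it tries to show: when TEXCOORD is a separate semantic, generic varyings can start at index 0, otherwise they are placed after the slots the texcoords occupy in the GENERIC namespace.

```c
#include <assert.h>

/* Hypothetical slot numbering; only the relative layout matters here. */
enum {
   SLOT_TEX0 = 4,
   SLOT_TEX7 = SLOT_TEX0 + 7,
   SLOT_VAR0 = 24,
   SLOT_MAX  = 56
};

enum sem_name { SEM_GENERIC, SEM_TEXCOORD };

struct semantic {
   enum sem_name name;
   int index;
};

/* Map a varying slot to a (semantic name, index) pair the way the patch does. */
static struct semantic map_varying(int attr, int needs_texcoord_semantic)
{
   struct semantic s;

   if (attr >= SLOT_TEX0 && attr <= SLOT_TEX7) {
      /* Texcoords keep their own semantic if the driver asks for it. */
      s.name = needs_texcoord_semantic ? SEM_TEXCOORD : SEM_GENERIC;
      s.index = attr - SLOT_TEX0;
   } else {
      assert(attr >= SLOT_VAR0 && attr < SLOT_MAX);
      s.name = SEM_GENERIC;
      /* Without a separate TEXCOORD semantic, generic indices are counted
       * from TEX0 so they cannot collide with the texcoord indices. */
      s.index = needs_texcoord_semantic ? attr - SLOT_VAR0
                                        : attr - SLOT_TEX0;
   }
   return s;
}
```

With these made-up numbers, VAR0 maps to GENERIC[0] when the driver wants TEXCOORD and to GENERIC[20] when it does not.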
[Mesa-dev] [PATCH 3/3] mesa/st: expose ARB_texture_rgb10_a2ui if R10G10B10A2_UINT is supported
---
 src/mesa/state_tracker/st_extensions.c | 4 +++-
 src/mesa/state_tracker/st_format.c     | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index 5e4a3b3..8c49e54 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -419,7 +419,9 @@ void st_init_extensions(struct st_context *st)
           PIPE_FORMAT_R16G16B16A16_FLOAT } },
 
       { { o(ARB_texture_rgb10_a2ui) },
-        { PIPE_FORMAT_B10G10R10A2_UINT } },
+        { PIPE_FORMAT_R10G10B10A2_UINT,
+          PIPE_FORMAT_B10G10R10A2_UINT },
+         GL_TRUE }, /* at least one format must be supported */
 
       { { o(EXT_framebuffer_sRGB) },
         { PIPE_FORMAT_A8B8G8R8_SRGB,
diff --git a/src/mesa/state_tracker/st_format.c b/src/mesa/state_tracker/st_format.c
index 6acf983..2bb07e7 100644
--- a/src/mesa/state_tracker/st_format.c
+++ b/src/mesa/state_tracker/st_format.c
@@ -813,7 +813,7 @@ static const struct format_mapping format_map[] = {
    },
    {
       { GL_RGB10_A2, 0 },
-      { PIPE_FORMAT_B10G10R10A2_UNORM, DEFAULT_RGBA_FORMATS }
+      { PIPE_FORMAT_R10G10B10A2_UNORM, PIPE_FORMAT_B10G10R10A2_UNORM, DEFAULT_RGBA_FORMATS }
    },
    {
       { 4, GL_RGBA, GL_RGBA8, 0 },
-- 
1.8.1.5
[Mesa-dev] [PATCH 2/3] nv50: add more RGB10A2 formats
---
 src/gallium/drivers/nouveau/nv50/nv50_formats.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_formats.c b/src/gallium/drivers/nouveau/nv50/nv50_formats.c
index 0a7e812..b301890 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_formats.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_formats.c
@@ -202,6 +202,8 @@ const struct nv50_format nv50_format_table[PIPE_FORMAT_COUNT] =
        TBV, 1),
    C4A(R10G10B10A2_SNORM, NONE, C0, C1, C2, C3, SNORM, 10_10_10_2, TV, 0),
    C4A(B10G10R10A2_SNORM, NONE, C2, C1, C0, C3, SNORM, 10_10_10_2, TV, 1),
+   C4A(R10G10B10A2_UINT, RGB10_A2_UINT, C0, C1, C2, C3, UINT, 10_10_10_2, TRV, 0),
+   C4A(B10G10R10A2_UINT, RGB10_A2_UINT, C2, C1, C0, C3, UINT, 10_10_10_2, TV, 0),
 
    F3B(R11G11B10_FLOAT, R11G11B10_FLOAT, C0, C1, C2, xx, FLOAT, 11_11_10, IB),
 
@@ -394,6 +396,11 @@ const struct nv50_format nv50_format_table[PIPE_FORMAT_COUNT] =
    F1A(R16_SSCALED, NONE, C0, xx, xx, xx, SSCALED, 16, V),
    F1A(R16_USCALED, NONE, C0, xx, xx, xx, USCALED, 16, V),
 
+   C4A(R10G10B10A2_USCALED, NONE, C0, C1, C2, C3, USCALED, 10_10_10_2, V, 0),
+   C4A(R10G10B10A2_SSCALED, NONE, C0, C1, C2, C3, SSCALED, 10_10_10_2, V, 0),
+   C4A(B10G10R10A2_USCALED, NONE, C0, C1, C2, C3, USCALED, 10_10_10_2, V, 1),
+   C4A(B10G10R10A2_SSCALED, NONE, C0, C1, C2, C3, SSCALED, 10_10_10_2, V, 1),
+
    C4A(R8G8B8A8_SSCALED, NONE, C0, C1, C2, C3, SSCALED, 8_8_8_8, V, 0),
    C4A(R8G8B8A8_USCALED, NONE, C0, C1, C2, C3, USCALED, 8_8_8_8, V, 0),
    F3A(R8G8B8_UNORM, NONE, C0, C1, C2, xx, UNORM, 8_8_8, V),
-- 
1.8.1.5
Re: [Mesa-dev] [PATCH] nv50: implement multisample textures
On 25.10.2013 20:35, Emil Velikov wrote:
On 21/10/13 23:23, Bryan Cain wrote:

This is a port of 4da54c91d24da (nvc0: implement multisample textures) to nv50. When coupled with the patch to only report 16 texture samplers (to fix crashes), all of the Piglit tests in spec/arb_texture_multisample pass.

Hello Bryan,

Big thanks for your work. As promised here is a quick piglit summary on my nv96:

pass/fail/crash 69/32/27

* dmesg does not spit anything nouveau related during the tests
* any geometry shader related tests were skipped (piglit: info: Failed to create GL 3.2 core context)
* all the crashes are due to the following assert:
  codegen/nv50_ir_emit_nv50.cpp:1393:emitTEX: Assertion `argc <= 4' failed.

I'm not sure how you'd get 4 arguments there (x y layer sample ?). There's no mip maps for multisample textures. But either way you're probably going to have to do things by hand: E.g. MS8 textures contain contiguous 4x2 rectangles of samples for each pixel, so you multiply x by 4 and y by 2 to arrive at the sub-rectangle and then add the correct offsets for the sample id as seen in get_sample_position (store the info in a constant buffer, that has to be updated when texture changes). You might want to use a lookup table like in nve4 compute (look for MS sample coordinate offsets) to map sample id to coordinate offset, that one works for any sample count as long as you don't use the ALT modes (nve4 doesn't need to for textures, but for images/surfaces/UAVs/RATs where the whole VM address calculation is done by hand).

PASS   arb_texture_multisample-*
PASS   fb-completeness/*
FAIL   sample-position/*
FAIL   texelFetch fs sampler2DMS 4*
CRASH  texelFetch fs sampler2DMSArray 4*
FAIL   texelFetch/*-*s-isampler2DMS
CRASH  texelFetch/*-*s-isampler2DMSArray
PASS   textureSize/*

Hope you find this useful :)

No real world apps that use multisample textures were tested, yet.

Cheers
Emil
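As a rough illustration of the "do things by hand" suggestion in this thread, the sketch below converts a pixel coordinate plus sample id into a coordinate in the underlying MS8 storage by scaling x by 4 and y by 2 and adding a per-sample offset from a lookup table. The table contents here are made up for the example: the real sample-to-offset order is hardware specific (the thread points at get_sample_position and the nve4 compute lookup table for it). The names `ms8_offsets` and `ms8_sample_coord` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct ms_offset { uint8_t dx, dy; };
struct ms_coord  { uint32_t x, y; };

/* Hypothetical position of each sample inside the 4x2 block of an MS8
 * pixel. The real ordering depends on the hardware sample layout. */
static const struct ms_offset ms8_offsets[8] = {
   {0, 0}, {1, 0}, {0, 1}, {1, 1},
   {2, 0}, {3, 0}, {2, 1}, {3, 1}
};

/* Pixel (x, y) and sample id -> coordinate in the sample-level surface. */
static struct ms_coord ms8_sample_coord(uint32_t x, uint32_t y, unsigned sample)
{
   struct ms_coord c;

   assert(sample < 8);
   c.x = x * 4 + ms8_offsets[sample].dx; /* each pixel is 4 samples wide */
   c.y = y * 2 + ms8_offsets[sample].dy; /* and 2 samples tall */
   return c;
}
```

In a shader the table would presumably come from a constant buffer, as suggested above, rather than from a static array.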
Re: [Mesa-dev] [PATCH] nv50: implement multisample textures
On 25.10.2013 23:51, Bryan Cain wrote:
On 10/25/2013 04:11 PM, Christoph Bumiller wrote:
On 25.10.2013 20:35, Emil Velikov wrote:
On 21/10/13 23:23, Bryan Cain wrote:

This is a port of 4da54c91d24da (nvc0: implement multisample textures) to nv50. When coupled with the patch to only report 16 texture samplers (to fix crashes), all of the Piglit tests in spec/arb_texture_multisample pass.

Hello Bryan,

Big thanks for your work. As promised here is a quick piglit summary on my nv96:

pass/fail/crash 69/32/27

* dmesg does not spit anything nouveau related during the tests
* any geometry shader related tests were skipped (piglit: info: Failed to create GL 3.2 core context)
* all the crashes are due to the following assert:
  codegen/nv50_ir_emit_nv50.cpp:1393:emitTEX: Assertion `argc <= 4' failed.

I'm not sure how you'd get 4 arguments there (x y layer sample ?). There's no mip maps for multisample textures. But either way you're probably going to have to do things by hand: E.g. MS8 textures contain contiguous 4x2 rectangles of samples for each pixel, so you multiply x by 4 and y by 2 to arrive at the sub-rectangle and then add the correct offsets for the sample id as seen in get_sample_position (store the info in a constant buffer, that has to be updated when texture changes). You might want to use a lookup table like in nve4 compute (look for MS sample coordinate offsets) to map sample id to coordinate offset, that one works for any sample count as long as you don't use the ALT modes (nve4 doesn't need to for textures, but for images/surfaces/UAVs/RATs where the whole VM address calculation is done by hand).

You're probably right. I don't know why MSAA appears to work for me, but there's probably something wrong with the output that I haven't noticed. I'll work on implementing it properly this weekend.

MSAA itself (rendering and resolving) has been working before, the only thing that ARB_texture_multisample adds is texelFetch from MS resources.
Re: [Mesa-dev] [PATCH] nv50: report only 16 texure_samplers
On 12.10.2013 02:47, Emil Velikov wrote:
On 12/10/13 01:25, Roland Scheidegger wrote:
On 12.10.2013 02:02, Brian Paul wrote:
On 10/11/2013 10:44 AM, Emil Velikov wrote:

Current mesa code (cso and drivers) expects and uses only up to 16 texture samplers. Verbatim copy from the nvc0 driver.

Cc 9.1 mesa-sta...@lists.freedesktop.org
Cc 9.2 mesa-sta...@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70212
Reported-by: Aaron Watry awa...@gmail.com
Signed-off-by: Emil Velikov emil.l.veli...@gmail.com
---
 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index f454ec7..3f81cc4 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -249,7 +249,11 @@ nv50_screen_get_shader_param(struct pipe_screen *pscreen, unsigned shader,
    case PIPE_SHADER_CAP_INTEGERS:
       return 1;
    case PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLERS:
+      return 16; /* would be 32 in linked (OpenGL-style) mode */
+      /*
+   case PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLER_VIEWS:
       return 32;
+      */
    default:
       NOUVEAU_ERR("unknown PIPE_SHADER_CAP %d\n", param);
       return 0;

Since PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLER_VIEWS doesn't really exist, I'd rather see it removed entirely.

Actually it doesn't seem to exist at all?

Indeed and afaics it never did :) As the commit says it's a verbatim copy of nvc0, which also started with 32 TEXTURE_SAMPLERS.

If you wanted to future-proof the code you could do return MIN2(32, PIPE_MAX_SAMPLERS); in case we bump PIPE_MAX_SAMPLERS to 32 one of these days.

In any case, Reviewed-by: Brian Paul bri...@vmware.com

Well, I think there is quite some hw out there which can only do 16 samplers but more sampler views, as this is what d3d10/11 wants (16 samplers max per stage, but 128 sampler views). So making it queryable may have some benefits, but OpenGL can't really make any use of it in any case.

I'm not entirely sure what the case is here, as I'm a bit short of knowledge about the hardware, especially with the lack of documentation.

That comment's there as a reminder that gallium should have that cap.

On nv50 you have 2 big tables in VRAM, of texture view (or sampler view in gallium, shader resource view in d3d) descriptors (TIC) and sampler descriptors (TSC). These are mapped to texture and sampler units via a binding table (that happens as a result of calling glBindTexture). In the shader you have to select the units; you have 4 bits for the sampler and 5 bits for the texture unit index.

You can set the hardware to linked mode (LINKED_TSC) so that the sampler unit index automatically corresponds to the texture view unit index. Now you can't select them independently, but you can access 32 bindings of them tied together (which is how OpenGL works). But since gallium requires them to be selectable independently, we can't use that.

With Kepler they removed the binding table and you can select the descriptors directly via a 32-bit register, so you can access (1 << 20) texture views and (1 << 12) samplers; there's no problem there.

FWIW I've intentionally added/copied the SAMPLER_VIEWS, as I feel it's beneficial in the long term. That is, after going through the build system(s), my plan is to jump into the nv50 driver + vdpau st and some of the missing extensions and other things along the way :)

Cheers
Emil

Roland
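The limits discussed in this thread follow directly from the index field widths, combined with the MIN2 clamp suggested in the review. The following is only a back-of-the-envelope sketch: the helper names `units_for_bits` and `max_samplers`, the `linked_tsc` flag and the `api_limit` parameter (standing in for PIPE_MAX_SAMPLERS) are all hypothetical, and nothing here reflects the actual driver code.

```c
#include <assert.h>
#include <stdint.h>

#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Index field widths as described in the thread for nv50 shaders. */
enum {
   NV50_SAMPLER_INDEX_BITS = 4,
   NV50_TEXTURE_INDEX_BITS = 5
};

/* Number of units addressable with an index field of the given width. */
static uint32_t units_for_bits(unsigned bits)
{
   assert(bits < 32);
   return UINT32_C(1) << bits;
}

/* Hypothetical cap computation: in linked mode the sampler index follows
 * the 5-bit texture index, otherwise it is limited by its own 4-bit field.
 * The result is clamped to what the API layer can handle. */
static uint32_t max_samplers(int linked_tsc, uint32_t api_limit)
{
   uint32_t hw = linked_tsc ? units_for_bits(NV50_TEXTURE_INDEX_BITS)
                            : units_for_bits(NV50_SAMPLER_INDEX_BITS);
   return MIN2(hw, api_limit);
}
```

With independent selection this gives 16 regardless of the API limit, and in linked mode it gives 32 only once the API limit is raised to 32, which matches the comment in the patch.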
Re: [Mesa-dev] regression on nvc0 since floating point compare instructions
On 12.09.2013 16:14, Roland Scheidegger wrote:
On 12.09.2013 03:40, Dave Airlie wrote:

Maybe the type isn't set correctly? Looks to me like these instructions end up in mkCmp, which will set both src and dst type but ignore src type and set both according to the same type (which was the dst type).

Roland

Okay I've attached my next attempt at fixing it, fixes the two testcases I had.

No idea what setting type there really does but I guess that looks right :-). Though I'm wondering if U32 vs. S32 would make a difference for dst type since some of the (unsigned) comparisons still would use U32.

It doesn't make a difference, making it signed is unnecessary. If it helped before that was just because it made negative floats be interpreted as negative ints (instead of large ints) which has a slightly better chance of succeeding.

Roland
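The remark about negative floats being read as negative ints versus large ints can be checked on the CPU. The sketch below is plain C and has nothing to do with the nv50 IR itself; the helper names are made up. It compares two floats by (mis)interpreting their IEEE-754 bit patterns as unsigned or signed 32-bit integers, which is roughly what happens when a float compare is given an integer source type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw IEEE-754 bits of a float, viewed as unsigned. */
static uint32_t float_bits_u32(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof u);
   return u;
}

/* The same bits, viewed as a signed two's complement integer. */
static int32_t float_bits_s32(float f)
{
   int32_t s;
   memcpy(&s, &f, sizeof s);
   return s;
}

/* "a < b" evaluated on the bit patterns with an unsigned source type. */
static int lt_as_u32(float a, float b)
{
   return float_bits_u32(a) < float_bits_u32(b);
}

/* "a < b" evaluated on the bit patterns with a signed source type. */
static int lt_as_s32(float a, float b)
{
   return float_bits_s32(a) < float_bits_s32(b);
}
```

For -1.0f < 1.0f the signed view happens to give the right answer while the unsigned view does not, since the sign bit makes -1.0f look like a huge unsigned value. For two negative inputs such as -2.0f < -1.0f the signed view is wrong as well, so neither integer type is a substitute for a float source type.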
Re: [Mesa-dev] regression on nvc0 since floating point compare instructions
On 10.09.2013 06:55, Dave Airlie wrote:
On Tue, Sep 10, 2013 at 12:04 PM, Dave Airlie airl...@gmail.com wrote:
On Tue, Sep 10, 2013 at 11:59 AM, Dave Airlie airl...@gmail.com wrote:

Hey, so virgl stopped working on nouveau the other day and I bisected it to the enable of the floating point compare instructions in the state tracker. I've attached a shader runner file that makes it hang.

As usual, 5 secs after pressing send I had an insight; the attached patch seems to fix it here for me.

Okay, it's a bit weirder than that, found another bunch of regressions.

I just noticed that the handler for the TGSI SET instructions assumes source type == dest type, that should explain it.

My ingenious plan of not having an NV card [plugged in] so that someone would come along to fill the vacuum of nouveau gallium devs doesn't seem to work :/

Here's another shader test that regressed from 9.2 to master on nvc0.

Dave.
Re: [Mesa-dev] [PATCH] r600g/llvm: don't export more colors than the number of CBs
On 24.08.2013 11:44, Christian König wrote:
On 24.08.2013 03:30, Vadim Girlin wrote:

Currently llvm backend always exports at least one color in pixel shader even if no color buffers are enabled. With depth/stencil exports this can result in the following code:

EXPORT       PIXEL 0   R0.xyzw  VPM
EXPORT       PIXEL 61  R1.x___  VPM
EXPORT_DONE  PIXEL 61  R0._x__  VPM  EOP

AFAIU with zero color buffers no memory is reserved for colors in the export ring and all exports in this example actually write to the same location. The code above still works fine in this particular case, because correct values are written last, but reordering can break it (especially with SB which tends to reorder the exports).

Signed-off-by: Vadim Girlin vadimgir...@gmail.com

I vaguely remember that we needed at least one color export, otherwise the GPU might hang, but I'm not 100% sure of that.

If there are no color buffers bound but the original shader writes color 0, you still have to export it to keep the alpha test working ...

Marek and Alex should probably also take a look at this before we commit it.

Christian.

---
This fixes regressions with LLVM+SB, so I consider it as a prerequisite for enabling SB by default. Also it fixes some issues with LLVM backend alone.

Tested on evergreen only (I don't have other hw), needs testing on pre-evergreen GPUs.

 src/gallium/drivers/r600/r600_llvm.c   | 2 +-
 src/gallium/drivers/r600/r600_shader.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_llvm.c b/src/gallium/drivers/r600/r600_llvm.c
index 03a68e4..d2f4aff 100644
--- a/src/gallium/drivers/r600/r600_llvm.c
+++ b/src/gallium/drivers/r600/r600_llvm.c
@@ -333,8 +333,8 @@ static void llvm_emit_epilogue(struct lp_build_tgsi_context * bld_base)
 		} else if (ctx->type == TGSI_PROCESSOR_FRAGMENT) {
 			switch (ctx->r600_outputs[i].name) {
 			case TGSI_SEMANTIC_COLOR:
-				has_color = true;
 				if ( color_count < ctx->color_buffer_count) {
+					has_color = true;
 					LLVMValueRef args[3];
 					args[0] = output;
 					if (ctx->fs_color_all) {
diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index fb766c4..85f8469 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1130,7 +1130,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
 		radeon_llvm_ctx.face_gpr = ctx.face_gpr;
 		radeon_llvm_ctx.r600_inputs = ctx.shader->input;
 		radeon_llvm_ctx.r600_outputs = ctx.shader->output;
-		radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs , 1);
+		radeon_llvm_ctx.color_buffer_count = key.nr_cbufs;
 		radeon_llvm_ctx.chip_class = ctx.bc->chip_class;
 		radeon_llvm_ctx.fs_color_all = shader->fs_write_all && (rscreen->chip_class >= EVERGREEN);
 		radeon_llvm_ctx.stream_outputs = so;
Re: [Mesa-dev] [PATCH 2/4] nv50: implement new float comparison instructions
On 13.08.2013 19:04, srol...@vmware.com wrote:
From: Roland Scheidegger srol...@vmware.com

untested.

Looks like it should work though, thanks.

nv50 only supported u32 result all along and on nvc0 both cases are already handled by the rest of the code, too.

---
 .../drivers/nv50/codegen/nv50_ir_from_tgsi.cpp | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
index 56eccac..a2ad9f4 100644
--- a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
@@ -440,6 +440,11 @@ nv50_ir::DataType Instruction::inferDstType() const
    switch (getOpcode()) {
    case TGSI_OPCODE_F2U: return nv50_ir::TYPE_U32;
    case TGSI_OPCODE_F2I: return nv50_ir::TYPE_S32;
+   case TGSI_OPCODE_FSEQ:
+   case TGSI_OPCODE_FSGE:
+   case TGSI_OPCODE_FSLT:
+   case TGSI_OPCODE_FSNE:
+      return nv50_ir::TYPE_U32;
    case TGSI_OPCODE_I2F:
    case TGSI_OPCODE_U2F:
       return nv50_ir::TYPE_F32;
@@ -456,19 +461,23 @@ nv50_ir::CondCode Instruction::getSetCond() const
    case TGSI_OPCODE_SLT:
    case TGSI_OPCODE_ISLT:
    case TGSI_OPCODE_USLT:
+   case TGSI_OPCODE_FSLT:
       return CC_LT;
    case TGSI_OPCODE_SLE:
       return CC_LE;
    case TGSI_OPCODE_SGE:
    case TGSI_OPCODE_ISGE:
    case TGSI_OPCODE_USGE:
+   case TGSI_OPCODE_FSGE:
       return CC_GE;
    case TGSI_OPCODE_SGT:
       return CC_GT;
    case TGSI_OPCODE_SEQ:
    case TGSI_OPCODE_USEQ:
+   case TGSI_OPCODE_FSEQ:
       return CC_EQ;
    case TGSI_OPCODE_SNE:
+   case TGSI_OPCODE_FSNE:
       return CC_NEU;
    case TGSI_OPCODE_USNE:
       return CC_NE;
@@ -556,6 +565,10 @@ static nv50_ir::operation translateOpcode(uint opcode)
    NV50_IR_OPCODE_CASE(KILL_IF, DISCARD);
 
    NV50_IR_OPCODE_CASE(F2I, CVT);
+   NV50_IR_OPCODE_CASE(FSEQ, SET);
+   NV50_IR_OPCODE_CASE(FSGE, SET);
+   NV50_IR_OPCODE_CASE(FSLT, SET);
+   NV50_IR_OPCODE_CASE(FSNE, SET);
    NV50_IR_OPCODE_CASE(IDIV, DIV);
    NV50_IR_OPCODE_CASE(IMAX, MAX);
    NV50_IR_OPCODE_CASE(IMIN, MIN);
@@ -2354,6 +2367,10 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
    case TGSI_OPCODE_SLE:
    case TGSI_OPCODE_SNE:
    case TGSI_OPCODE_STR:
+   case TGSI_OPCODE_FSEQ:
+   case TGSI_OPCODE_FSGE:
+   case TGSI_OPCODE_FSLT:
+   case TGSI_OPCODE_FSNE:
    case TGSI_OPCODE_ISGE:
    case TGSI_OPCODE_ISLT:
    case TGSI_OPCODE_USEQ:
Re: [Mesa-dev] [RFC]: gallium: add new float comparison opcodes returning integer booleans
On 09.08.2013 20:42, Roland Scheidegger wrote:

This is a proposal for new comparison instructions, as the old ones don't really fit modern (graphic or opencl I guess for that matter) languages well. If you've got objections, think the naming is crazy or whatnot I'm open for suggestions :-). I would think this is not just a much better fit for d3d10/glsl but for hw as well.

I think current hardware can do both, and as for the names, I'm fine with the prefixed ones being the modern opcodes (prefix referring to the source type in both cases) and the ones that are named exactly like the legacy opcodes behaving like the legacy ones.

Otoh newcomers might get confused and think the F prefix means that they should return a float; we had a similar issue with legacy-KIL and KILP-condition-is-predicate-if-any (and I just need to say again I'd have preferred to keep the name KIL and rename KILP to DISCARD), but seriously, the opcodes are documented so it should be no trouble to figure out what they do (ok in practice that doesn't always work since we sometimes like to read what we expect instead of what's actually written).

Roland

On 09.08.2013 20:40, srol...@vmware.com wrote:
From: Roland Scheidegger srol...@vmware.com

The old float comparison opcodes always return floats 0.0 and 1.0 (clarified in docs these were really floats, was always the case) for legacy graphics. But everybody else (opengl,opencl,d3d10) just has to work around their return results (converting the returned float back to int/boolean).
---
 src/gallium/docs/source/tgsi.rst | 84 ++++++++++++++++++++++++++++++--------
 1 file changed, 68 insertions(+), 16 deletions(-)

diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index 949ad89..b7c40cf 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -512,13 +512,13 @@ This instruction replicates its result.
 
 .. math::
 
-  dst.x = (src0.x == src1.x) ? 1 : 0
+  dst.x = (src0.x == src1.x) ? 1.0F : 0.0F
 
-  dst.y = (src0.y == src1.y) ? 1 : 0
+  dst.y = (src0.y == src1.y) ? 1.0F : 0.0F
 
-  dst.z = (src0.z == src1.z) ? 1 : 0
+  dst.z = (src0.z == src1.z) ? 1.0F : 0.0F
 
-  dst.w = (src0.w == src1.w) ? 1 : 0
+  dst.w = (src0.w == src1.w) ? 1.0F : 0.0F
 
 
 .. opcode:: SFL - Set On False
@@ -538,13 +538,13 @@ This instruction replicates its result.
 
 .. math::
 
-  dst.x = (src0.x > src1.x) ? 1 : 0
+  dst.x = (src0.x > src1.x) ? 1.0F : 0.0F
 
-  dst.y = (src0.y > src1.y) ? 1 : 0
+  dst.y = (src0.y > src1.y) ? 1.0F : 0.0F
 
-  dst.z = (src0.z > src1.z) ? 1 : 0
+  dst.z = (src0.z > src1.z) ? 1.0F : 0.0F
 
-  dst.w = (src0.w > src1.w) ? 1 : 0
+  dst.w = (src0.w > src1.w) ? 1.0F : 0.0F
 
 
 .. opcode:: SIN - Sine
@@ -560,26 +560,26 @@ This instruction replicates its result.
 
 .. math::
 
-  dst.x = (src0.x <= src1.x) ? 1 : 0
+  dst.x = (src0.x <= src1.x) ? 1.0F : 0.0F
 
-  dst.y = (src0.y <= src1.y) ? 1 : 0
+  dst.y = (src0.y <= src1.y) ? 1.0F : 0.0F
 
-  dst.z = (src0.z <= src1.z) ? 1 : 0
+  dst.z = (src0.z <= src1.z) ? 1.0F : 0.0F
 
-  dst.w = (src0.w <= src1.w) ? 1 : 0
+  dst.w = (src0.w <= src1.w) ? 1.0F : 0.0F
 
 
 .. opcode:: SNE - Set On Not Equal
 
 .. math::
 
-  dst.x = (src0.x != src1.x) ? 1 : 0
+  dst.x = (src0.x != src1.x) ? 1.0F : 0.0F
 
-  dst.y = (src0.y != src1.y) ? 1 : 0
+  dst.y = (src0.y != src1.y) ? 1.0F : 0.0F
 
-  dst.z = (src0.z != src1.z) ? 1 : 0
+  dst.z = (src0.z != src1.z) ? 1.0F : 0.0F
 
-  dst.w = (src0.w != src1.w) ? 1 : 0
+  dst.w = (src0.w != src1.w) ? 1.0F : 0.0F
 
 
 .. opcode:: STR - Set On True
@@ -1325,6 +1325,19 @@ Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
 
 
 
+.. opcode:: FSLT - Float Set On Less Than (ordered)
+
+.. math::
+
+  dst.x = (src0.x < src1.x) ? ~0 : 0
+
+  dst.y = (src0.y < src1.y) ? ~0 : 0
+
+  dst.z = (src0.z < src1.z) ? ~0 : 0
+
+  dst.w = (src0.w < src1.w) ? ~0 : 0
+
+
 .. opcode:: ISLT - Signed Integer Set On Less Than
 
 .. math::
@@ -1351,6 +1364,19 @@ Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
   dst.w = (src0.w < src1.w) ? ~0 : 0
 
 
+.. opcode:: FSGE - Float Set On Greater Equal Than (ordered)
+
+.. math::
+
+  dst.x = (src0.x >= src1.x) ? ~0 : 0
+
+  dst.y = (src0.y >= src1.y) ? ~0 : 0
+
+  dst.z = (src0.z >= src1.z) ? ~0 : 0
+
+  dst.w = (src0.w >= src1.w) ? ~0 : 0
+
+
 .. opcode:: ISGE - Signed Integer Set On Greater Equal Than
 
 .. math::
@@ -1377,6 +1403,19 @@ Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
   dst.w = (src0.w >= src1.w) ? ~0 : 0
 
 
+.. opcode:: FSEQ - Float Set On Equal (ordered)
+
+.. math::
+
+  dst.x = (src0.x == src1.x) ? ~0 : 0
+
+  dst.y = (src0.y == src1.y) ? ~0 : 0
+
+  dst.z = (src0.z == src1.z) ? ~0 : 0
+
+  dst.w = (src0.w == src1.w) ? ~0 : 0
+
+
 .. opcode:: USEQ - Integer Set On Equal
 
 .. math::
@@
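To spell out the difference between the legacy opcodes and the proposed ones, here is a scalar C model of a few of them. It is only a sketch of the semantics quoted in the patch (1.0F/0.0F versus ~0/0); the function names are invented for this example and C's comparison operators stand in for the hardware compare. In particular, the NaN behaviour shown is simply what C's `<`, `==` and `!=` do, which lines up with the "ordered" wording for FSLT/FSEQ and with the nv50 patch mapping FSNE to an unordered not-equal, but is not taken from the TGSI docs.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Legacy SLT: result is a float, 1.0 or 0.0. */
static float legacy_slt(float a, float b)
{
   return a < b ? 1.0f : 0.0f;
}

/* Proposed FSLT/FSEQ/FSNE: result is an integer mask, all ones or zero. */
static uint32_t fslt(float a, float b)
{
   return a < b ? ~UINT32_C(0) : 0;
}

static uint32_t fseq(float a, float b)
{
   return a == b ? ~UINT32_C(0) : 0;
}

static uint32_t fsne(float a, float b)
{
   return a != b ? ~UINT32_C(0) : 0;
}

/* With a mask result, a select needs no float-to-bool conversion. */
static uint32_t select_u32(uint32_t mask, uint32_t if_true, uint32_t if_false)
{
   return (if_true & mask) | (if_false & ~mask);
}
```

The mask form is what makes the result directly usable as an integer boolean, e.g. as the condition of a bitwise select, without first converting a float 1.0 back to an int.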
Re: [Mesa-dev] [PATCH 4/6] i965/fs: Optimize IF/MOV/ELSE/MOV/ENDIF to SEL when possible.
On 06.08.2013 03:28, Kenneth Graunke wrote: Many GLSL shaders contain code of the form: x = condition ? foo : bar The compiler emits an ir_if tree for this, since each subexpression might be a complex tree that could have side-effects and short-circuit logic operations. However, the common case is to simply pick one of two constants or variable's values---which is exactly what SEL is for. Replacing IF/ELSE with SEL also simplifies the control flow graph, making optimization passes which work on basic blocks more effective. Don't you think something like that should be implemented in common code so that all drivers can profit ? It would be really nice to have more, useful device-independent optimizations or simplifications like this already done instead of requiring each driver to re-implement them (or use llvm). The shader-db statistics: total instructions in shared programs: 1655247 - 1503234 (-9.18%) instructions in affected programs: 949188 - 797175 (-16.02%) 2,970 shaders were helped, none hurt. Gained 181 SIMD16 programs. This helps Valve's Source Engine games (max -41.33%), The Cave (max -33.33%), Serious Sam 3 (max -18.64%), Yo Frankie! (max -30.19%), Zen Bound (max -22.22%), GStreamer (max -6.12%), and GLBenchmark 2.7 (max -1.94%). Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_fs.h | 1 + src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 78 2 files changed, 79 insertions(+) The pattern matching stuff here might be useful to abstract for reuse in other peephole type optimizations; ensuring that the right opcodes exist without accidentally walking the list is tricky to get right. Then again, I'm not sure how many useful peephole optimizations we'll have; it may be more useful in many cases to walk a UD-chain rather than looking at consecutive instructions. 
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 370ab6c..7feb2b6 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -369,6 +369,7 @@ public:
                    fs_reg src0, fs_reg src1);
    bool try_emit_saturate(ir_expression *ir);
    bool try_emit_mad(ir_expression *ir, int mul_arg);
+   void try_replace_with_sel();
    void emit_bool_to_cond_code(ir_rvalue *condition);
    void emit_if_gen6(ir_if *ir);
    void emit_unspill(fs_inst *inst, fs_reg reg, uint32_t spill_offset);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index ee7728c..a36c248 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1842,6 +1842,82 @@ fs_visitor::emit_if_gen6(ir_if *ir)
    inst->predicate = BRW_PREDICATE_NORMAL;
 }
 
+/**
+ * Try to replace IF/MOV/ELSE/MOV/ENDIF with SEL.
+ *
+ * Many GLSL shaders contain the following pattern:
+ *
+ *    x = condition ? foo : bar
+ *
+ * The compiler emits an ir_if tree for this, since each subexpression might be
+ * a complex tree that could have side-effects or short-circuit logic.
+ *
+ * However, the common case is to simply select one of two constants or
+ * variable values---which is exactly what SEL is for.  In this case, the
+ * assembly looks like:
+ *
+ *    (+f0) IF
+ *    MOV dst src0
+ *    ELSE
+ *    MOV dst src1
+ *    ENDIF
+ *
+ * which can be easily translated into:
+ *
+ *    (+f0) SEL dst src0 src1
+ *
+ * If src0 is an immediate value, we promote it to a temporary GRF.
+ */
+void
+fs_visitor::try_replace_with_sel()
+{
+   fs_inst *endif_inst = (fs_inst *) instructions.get_tail();
+   assert(endif_inst->opcode == BRW_OPCODE_ENDIF);
+
+   /* Pattern match in reverse: IF, MOV, ELSE, MOV, ENDIF. */
+   int opcodes[] = {
+      BRW_OPCODE_IF, BRW_OPCODE_MOV, BRW_OPCODE_ELSE, BRW_OPCODE_MOV,
+   };
+
+   fs_inst *match = (fs_inst *) endif_inst->prev;
+   for (int i = 0; i < 4; i++) {
+      if (match->is_head_sentinel() || match->opcode != opcodes[4-i-1])
+         return;
+      match = (fs_inst *) match->prev;
+   }
+
+   /* The opcodes match; it looks like the right sequence of instructions. */
+   fs_inst *else_mov = (fs_inst *) endif_inst->prev;
+   fs_inst *then_mov = (fs_inst *) else_mov->prev->prev;
+   fs_inst *if_inst = (fs_inst *) then_mov->prev;
+
+   /* Check that the MOVs are the right form. */
+   if (then_mov->dst.equals(else_mov->dst) &&
+       !then_mov->is_partial_write() &&
+       !else_mov->is_partial_write()) {
+
+      /* Remove the matched instructions; we'll emit a SEL to replace them. */
+      while (!if_inst->next->is_tail_sentinel())
+         if_inst->next->remove();
+      if_inst->remove();
+
+      /* Only the last source register can be a constant, so if the MOV in
+       * the then clause uses a constant, we need to put it in a temporary.
+       */
+
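For readers following the thread, the reverse pattern match described in the patch above can be sketched in plain C roughly as follows. This is only an illustration under stated assumptions: the opcode enum, struct inst and the flat array are made up for the example and are not the i965 backend's types, and the predicate flag, partial-write checks and the immediate-to-GRF promotion are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode set, for illustration only; not the i965 enum. */
enum opcode { OP_IF, OP_MOV, OP_ELSE, OP_ENDIF, OP_SEL, OP_ADD };

/* Hypothetical flat instruction: one destination, up to two sources. */
struct inst {
    enum opcode op;
    int dst, src0, src1;
};

/*
 * If the last five instructions are IF/MOV/ELSE/MOV/ENDIF and both MOVs
 * write the same destination, rewrite them in place as a single SEL.
 * Returns the new instruction count (unchanged if nothing matched).
 */
size_t try_replace_with_sel(struct inst *insts, size_t count)
{
    static const enum opcode pattern[5] = {
        OP_IF, OP_MOV, OP_ELSE, OP_MOV, OP_ENDIF
    };

    assert(insts != NULL || count == 0);
    if (count < 5)
        return count;

    struct inst *base = &insts[count - 5];

    /* Walk backwards from the ENDIF, as the patch does. */
    for (size_t i = 5; i-- > 0; ) {
        if (base[i].op != pattern[i])
            return count;
    }

    const struct inst then_mov = base[1];
    const struct inst else_mov = base[3];

    /* Both branches must assign the same destination. */
    if (then_mov.dst != else_mov.dst)
        return count;

    /* Overwrite the IF with the SEL and drop the other four instructions. */
    base[0].op = OP_SEL;
    base[0].dst = then_mov.dst;
    base[0].src0 = then_mov.src0;
    base[0].src1 = else_mov.src0;
    return count - 4;
}
```

Calling it on a six-instruction program that ends in the five-instruction pattern should leave two instructions, the second being the SEL; a program whose two MOVs write different destinations is left untouched.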
Re: [Mesa-dev] [PATCH 4/6] i965/fs: Optimize IF/MOV/ELSE/MOV/ENDIF to SEL when possible.
On 06.08.2013 19:19, Matt Turner wrote:

On Tue, Aug 6, 2013 at 4:14 AM, Christoph Bumiller e0425...@student.tuwien.ac.at wrote:

On 06.08.2013 03:28, Kenneth Graunke wrote:

Many GLSL shaders contain code of the form:

    x = condition ? foo : bar

The compiler emits an ir_if tree for this, since each subexpression might be a complex tree that could have side-effects and short-circuit logic operations. However, the common case is to simply pick one of two constants or variable's values---which is exactly what SEL is for. Replacing IF/ELSE with SEL also simplifies the control flow graph, making optimization passes which work on basic blocks more effective.

Don't you think something like that should be implemented in common code so that all drivers can profit?

We would love that. As part of a work in progress, I'm adding conditional-select to the GLSL IR. We planned a few months ago to do this as a step toward SSA at the IR level, but have only laid a little bit of groundwork in that direction (Ian's vector insert/extract series). Looks like your backend already does SSA. Shouldn't that be implemented in common code? :)

Then the code would have to run on GLSL IR as well as my internal IR because the intermediate one, TGSI, shouldn't be in SSA form, and abstracting an IR doesn't sound particularly fun. Also I don't have to handle vectors so it's a bit simpler, actually pretty straightforward if you implement an existing algorithm. As for some other passes that could be shared, I still need them in the backend to be applied to device-specific code sequences; you probably have a similar situation.

It would be really nice to have more useful device-independent optimizations or simplifications like this already done, instead of requiring each driver to re-implement them (or use llvm).

Yes, it definitely would.

___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/34] mesa/st: Add VARYING_SLOT_TEX[1-7] to st_translate_geometry_program().
On 29.07.2013 08:03, Paul Berry wrote:

From: Bryan Cain bryanca...@gmail.com

v2 (Paul Berry stereotype...@gmail.com): Split out to separate patch (previously this was part of glsl: add builtins for geometry shaders.)
---
 src/mesa/state_tracker/st_program.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/mesa/state_tracker/st_program.c b/src/mesa/state_tracker/st_program.c
index 60cc37c..211b879 100644
--- a/src/mesa/state_tracker/st_program.c
+++ b/src/mesa/state_tracker/st_program.c
@@ -911,6 +911,13 @@ st_translate_geometry_program(struct st_context *st,
             stgp->input_semantic_index[slot] = 0;
             break;
          case VARYING_SLOT_TEX0:
+         case VARYING_SLOT_TEX1:
+         case VARYING_SLOT_TEX2:
+         case VARYING_SLOT_TEX3:
+         case VARYING_SLOT_TEX4:
+         case VARYING_SLOT_TEX5:
+         case VARYING_SLOT_TEX6:
+         case VARYING_SLOT_TEX7:
             stgp->input_semantic_name[slot] = TGSI_SEMANTIC_GENERIC;
             stgp->input_semantic_index[slot] = num_generic++;
             break;

This doesn't work, first because the semantic index shouldn't depend on which varyings are present, and second because TEX is required to use TGSI_SEMANTIC_TEXCOORD if the driver has PIPE_CAP_TGSI_TEXCOORD. Please see st_prepare_vertex_program.
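To make the two objections in the review above concrete, here is a minimal sketch of a mapping whose result depends only on the slot itself and on a capability flag, not on a running counter like num_generic++. The enum, struct and function names are hypothetical stand-ins rather than Mesa's, and using the texture unit directly as the generic index is an assumption made for the example; st_prepare_vertex_program remains the reference for what Mesa actually does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the TGSI semantic names. */
enum semantic_name { SEM_GENERIC, SEM_TEXCOORD };

struct semantic {
    enum semantic_name name;
    unsigned index;
};

#define NUM_TEX_SLOTS 8u

/*
 * Map texture coordinate slot 'tex_unit' (0..7) to a semantic.
 * The index is a pure function of the slot, so it stays the same no
 * matter which other varyings a shader happens to read or write.
 */
struct semantic tex_slot_semantic(unsigned tex_unit, bool has_texcoord_cap)
{
    assert(tex_unit < NUM_TEX_SLOTS);

    struct semantic s;
    /* With the cap, TEX slots get the dedicated TEXCOORD semantic. */
    s.name = has_texcoord_cap ? SEM_TEXCOORD : SEM_GENERIC;
    /* Assumption for this sketch: the index is simply the texture unit. */
    s.index = tex_unit;
    return s;
}
```

With a fixed mapping like this, a vertex shader writing only TEX3 and a geometry shader reading only TEX3 agree on the index, which a per-shader counter cannot guarantee.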
Re: [Mesa-dev] [PATCH] st/mesa: expose EXT_framebuffer_multisample_blit_scaled if MSAA is supported
On 17.07.2013 02:05, Marek Olšák wrote:

No, it's not faster, but it's not slower either. Now that I think about it, I can't come up with a good shader-based algorithm for the resolve operation. I don't think Christoph's approach that an MSAA texture can be viewed as a larger single-sample texture is correct, because the physical locations of the samples in memory usually do not correspond to the sample locations the 3D engine used for rasterization, so fetching a texel from the larger texture at (x,y) physical coordinates won't always return the closest rasterized sample at those coordinates. Also the bilinear filter would be horrible in this case, because it only takes 4 samples per pixel.

It can also take 8 samples per pixel, or MS8 resolve wouldn't be possible, so scaling down MS2 should look OK at least. The arrangement of the samples in the texture is ordered according to the physical sample locations (of course the proportions don't match). Besides, it's allowed to look horrible, it only requires a 4-tap linear filter, and depth resolve, should anyone actually do that, uses point/nearest filtering so you can always do it in one pass. Also, it's possible this extension isn't even intended to resolve for direct display. There's one advantage in having the app do it though: you don't have to worry about keeping a temporary surface of an appropriate size around.

Now let's consider implementing the scaled resolve operation in the shader by texelFetch-ing all samples and using a bilinear filter. For Nx MSAA, there would be N*4 texel fetches per pixel; in comparison, separate resolve+blit needs only N+4 texel fetches per pixel. In addition to that, the resolve is a special fixed-function blending operation and the fragment shader is not even executed. See? Separate resolve+blit beats everything.

Marek

On Wed, Jul 17, 2013 at 12:12 AM, Grigori Goronzy g...@chown.ath.cx wrote:

On 16.07.2013 19:26, Marek Olšák wrote:

Surprisingly all drivers supporting MSAA can already do this (r300g and r600g for sure) and I think Christoph wanted to have this feature for his Nouveau drivers anyway.

OK, they can do it, but is it actually any faster than doing a resolve and regular blit afterwards? This is kind of the point of this extension. r600g creates a temporary texture to resolve into and then blits that, which shouldn't be any faster than doing the same from GL.

Grigori
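Marek's fetch-count comparison in the message above is easy to check numerically. The helper names below are invented for this note; the formulas are just the N*4 and N+4 figures from the email, counted per output pixel.

```c
#include <assert.h>

/*
 * Single-pass shader resolve with a bilinear filter: each of the 4
 * bilinear taps reads all N samples of its source pixel.
 */
unsigned fused_resolve_fetches(unsigned samples)
{
    assert(samples > 0);
    return samples * 4u;
}

/*
 * Separate resolve followed by a scaled blit: N sample reads for the
 * resolve, then 4 taps for the bilinear blit of the resolved surface.
 */
unsigned separate_resolve_fetches(unsigned samples)
{
    assert(samples > 0);
    return samples + 4u;
}
```

For 4x MSAA this gives 16 fetches against 8, and for 8x MSAA 32 against 12; the two-step path only loses in the degenerate single-sample case (4 against 5). The count says nothing about the cost of the temporary surface that the two-step path needs.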
[Mesa-dev] Direct3D 9 state tracker
So, about two months ago I had the insane idea to pick up Joakim Sindholt's Direct3D 9 state tracker that he'd started about 3 years ago, with the goal to make it run StarCraft 2 so I could finally play at a reasonable frame rate ...

With help from Joakim and advice from the wine developers, as well as wine's d3d9 tests, things went surprisingly smoothly and my original goal has been achieved and surpassed, hence I thought I'd post a note here in case someone who doesn't yet know about it is interested in trying it out. ...

Now wait, didn't we have a D3D10/11 state tracker already that we kicked out because it was unmaintained and not really useful? Yes, but there are a couple of differences to d3d1x:

- the original author has not vanished [yet] (Luca, if you can hear me: You cannot leave your children out to die like that !)
- it's written in C instead of C++ and not relying on horrific multiple inheritance with templates hacks to make gcc generate COM-compatible vtables (and I'm still not sure if that actually worked)
- gallium wasn't ready for D3D11, and still isn't (at least the pipe drivers aren't), but it is ready for D3D9, and all the features required from the pipe drivers are well tested via OpenGL
- there are no motivating applications using Direct3D 10/11 yet (at least for me)
- and most importantly, contrary to d3d1x, d3d9/st already actually works for real applications !

So far I've tried Skyrim, Civilization 5, Anno 1404 and StarCraft 2 on the nvc0 and r600g drivers, which work pretty well, at up to 2x the fps I get with wined3d (NOTE: no thorough benchmarking done yet). Civilization 4 works, too, but it still has a couple of (not too severe) rendering issues because I didn't pay much attention to the fixed function pipeline and its interaction with the earlier shader versions yet.

If people think it's a good idea to merge it, I'd clean up the few modifications I did to gallium, and, once they've been cleared, merge the state tracker itself.

Unfortunately, for proper window system integration, a few modifications to wine are required (it used to run without them, but fully correct operation isn't possible like that).

Here are the links to the mesa branch containing the state tracker and to a patched version of wine:

https://github.com/chrisbmr/Mesa-3D/tree/gallium-nine
https://github.com/chrisbmr/wine/tree/d3dadapter9-wip

(The wine modifications only affect { d3d9.dll.so, gdi32.dll.so, user32.dll.so, wineps.drv.so and winex11.drv.so }, so you don't have to replace all of it).

Some usage hints:

https://github.com/chrisbmr/Mesa-3D/blob/gallium-nine/src/gallium/state_trackers/nine/README
Re: [Mesa-dev] [PATCH] st/mesa: expose EXT_framebuffer_multisample_blit_scaled if MSAA is supported
On 17.07.2013 00:12, Grigori Goronzy wrote:

On 16.07.2013 19:26, Marek Olšák wrote:

Surprisingly all drivers supporting MSAA can already do this (r300g and r600g for sure) and I think Christoph wanted to have this feature for his Nouveau drivers anyway.

OK, they can do it, but is it actually any faster than doing a resolve and regular blit afterwards? This is kind of the point of this extension. r600g creates a temporary texture to resolve into and then blits that, which shouldn't be any faster than doing the same from GL.

You can implement arbitrary filters for resolve since you're doing it manually using texelFetch from a shader anyway, so yes, you can make it faster (for depth/stencil resolve this is trivial), or at least leave that option open, while if GL apps do it manually you can't do anything about it.

NV50/NVC0 just use a single plain old scaled blit for resolve because a multisample texture's samples are all adjacent in 2D coordinate space; it's no different from downscaling a larger texture, so there it's always going to be faster. Granted, it might look ugly if I can't find a fitting filtering mode, but I'll just ignore that until I see some application using it that relies on SCALED_RESOLVE_NICEST_EXT looking decent.

Grigori
Re: [Mesa-dev] [PATCH 3/3] tgsi: rename the TGSI fragment kill opcodes
On 12.07.2013 16:06, Jose Fonseca wrote:

The tradition has been to use C suffix for conditional opcodes, instead of _IF. That said, I don't feel too strongly either way.

Except the 'C' suffix usually (ok, we only have BREAKC) indicates a single condition value where non-zero means true, while KIL operates on all 4 components and executes if any of them is < 0. I'd still prefer to keep the name KIL instead of KILL_IF and simply rename KILP to DISCARD, which is the name used in GLSL and SM4.

I agree that the current naming is confusing. And I like the fact that the new and old opcodes don't overlap, which means there is no way we inadvertently get the wrong ones when updating out-of-tree state trackers. And it's nice to see this sort of cleanup. I know from experience that they can be time consuming, but I do believe they pay off eventually. I believe Gallium pipe_screen/pipe_context interfaces are quite lean and straightforward these days, but the opcodes are still a big mess, and shaders are one of the most (if not the most) important parts of the interface.

For the series: Reviewed-by: Jose Fonseca jfons...@vmware.com

Jose

- Original Message -

TGSI_OPCODE_KIL and KILP had confusing names. The former was conditional kill (if any src component < 0). The latter was unconditional kill. At one time KILP was supposed to work with NV-style condition codes/predicates but we never had that in TGSI.

This patch renames both opcodes:

TGSI_OPCODE_KIL -> KILL_IF (kill if src.xyzw < 0)
TGSI_OPCODE_KILP -> KILL (unconditional kill)

Note: I didn't just transpose the opcode names to help ensure that I didn't miss updating any code anywhere.

I believe I've updated all the relevant code and comments but I'm not 100% sure that some drivers had this right in the first place. For example, the radeon driver might have llvm.AMDGPU.kill and llvm.AMDGPU.kilp mixed up. Driver authors should review their code.
---
 src/gallium/auxiliary/draw/draw_pipe_aapoint.c |4 ++--
 src/gallium/auxiliary/draw/draw_pipe_pstipple.c |8
 src/gallium/auxiliary/gallivm/lp_bld_flow.c |2 +-
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c |8
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c |6 ++
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 16
 src/gallium/auxiliary/postprocess/pp_mlaa.h |6 +++---
 src/gallium/auxiliary/tgsi/tgsi_exec.c | 14 +++---
 src/gallium/auxiliary/tgsi/tgsi_info.c |4 ++--
 src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h |4 ++--
 src/gallium/auxiliary/tgsi/tgsi_scan.c |4 ++--
 src/gallium/auxiliary/tgsi/tgsi_scan.h |2 +-
 src/gallium/auxiliary/util/u_pstipple.c | 10 +-
 src/gallium/auxiliary/vl/vl_mc.c |2 +-
 src/gallium/docs/source/tgsi.rst |6 --
 src/gallium/drivers/i915/i915_fpc_optimize.c |4 ++--
 src/gallium/drivers/i915/i915_fpc_translate.c|8 +++-
 src/gallium/drivers/ilo/shader/ilo_shader_fs.c |2 +-
 src/gallium/drivers/ilo/shader/toy_tgsi.c| 16
 src/gallium/drivers/nv30/nvfx_fragprog.c |6 +++---
 .../drivers/nv50/codegen/nv50_ir_from_tgsi.cpp | 10 +-
 src/gallium/drivers/r300/compiler/r3xx_fragprog.c|2 +-
 .../drivers/r300/compiler/radeon_program_alu.c | 12 ++--
 .../drivers/r300/compiler/radeon_program_alu.h |2 +-
 src/gallium/drivers/r300/r300_tgsi_to_rc.c |4 ++--
 src/gallium/drivers/r600/r600_shader.c | 14 +++---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c |8
 src/gallium/drivers/softpipe/sp_quad_depth_test.c|2 +-
 src/gallium/drivers/svga/svga_tgsi_insn.c| 18 +-
 src/gallium/include/pipe/p_shader_tokens.h |4 ++--
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp |6 +++---
 src/mesa/state_tracker/st_mesa_to_tgsi.c |5 +++--
 32 files changed, 109 insertions(+), 110 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_aapoint.c b/src/gallium/auxiliary/draw/draw_pipe_aapoint.c
index ec703d0..0d7b88e 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_aapoint.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_aapoint.c
@@ -308,9 +308,9 @@ aa_transform_inst(struct tgsi_transform_context *ctx,
       newInst.Src[1].Register.SwizzleY = TGSI_SWIZZLE_W;
       ctx->emit_instruction(ctx, newInst);
 
-      /* KIL -tmp0.; # if -tmp0.y < 0, KILL */
+      /* KILL_IF -tmp0.; # if -tmp0.y < 0, KILL */
       newInst = tgsi_default_full_instruction();
-      newInst.Instruction.Opcode = TGSI_OPCODE_KIL;
+      newInst.Instruction.Opcode = TGSI_OPCODE_KILL_IF;
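The difference between the two renamed opcodes, as the commit message above describes it, can be written down as a pair of tiny predicates. This is a sketch for one fragment only, with invented function names; it is not code from any TGSI interpreter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * KILL_IF (formerly KIL): discard the fragment if any of the four
 * source components is negative.
 */
bool frag_kill_if(const float src[4])
{
    assert(src != NULL);
    for (int i = 0; i < 4; i++) {
        if (src[i] < 0.0f)
            return true;
    }
    return false;
}

/* KILL (formerly KILP): discard unconditionally, no source operand. */
bool frag_kill(void)
{
    return true;
}
```

Note that this differs from a BREAKC-style condition, where a single non-zero value means true: here all four components are inspected and the test is a sign check, which is the point raised in the reply about the C suffix.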
Re: [Mesa-dev] [PATCH 1/3] gallium: add expand_resource interface
On 11.07.2013 20:15, Marek Olšák wrote:

Hi Roland,

The fast color clear on Radeon doesn't touch the memory of the texture resource. Instead, it changes some GPU meta data that say the resource is cleared (the location of the meta data is stored in pipe_resource). This works fine as long as the gallium pipe_resource structure is used for accessing the resource. That's not the case with the DDX, which is responsible for putting the resource on the screen and it obviously has no idea about the contents of pipe_resource, so it doesn't know that the resource is in a cleared state and a special flush operation must be done to actually write the cleared pixels (which haven't been overwritten by new geometry of course).

If I was mean I would suggest you just associate the information with the bo and have the DDX import that, too.

The easiest way to solve this is to flush the cleared resource in SwapBuffers and where the front buffer is flushed. The Gallium driver can't do it automatically, because it has no notion of front and back buffers nor does it know which resource must be flushed. That's why a new pipe_context function is being proposed, which was originally my idea.

You could cloak the function under a more generic name, then you're less likely to encounter reactions like "hardware details don't belong in the API". First I thought of flush_frontbuffer from pipe_screen, but that seems to have a different (or, no) purpose.

This commit only fixes r600g for st/dri. Any other co-state tracker (like st/egl and st/xlib) will be broken if it's used with r600g.

I think we can ignore st/xlib. Not sure how important st/egl is (not required for EGL under X).

Marek

On Wed, Jul 10, 2013 at 7:32 PM, Roland Scheidegger srol...@vmware.com wrote:

I don't quite understand what this should do, at first sight it looks like an ugly hack (which should really not be part of gallium interface) to make fast color clearing work better with window framebuffers. Seems to go against the idea of resources (which are immutable, well not the contents but the properties). (If anything I wanted an interface to change bind flags for resources after initialization, because they are near impossible to guarantee with OpenGL's (or d3d9 for that matter) distinct texture/fb model, but that would also be quite a hack.) Could you elaborate with some example what that's supposed to do in practice?

Roland

On 10.07.2013 18:20, Grigori Goronzy wrote:

This interface is used to expand fast-cleared window system colorbuffers.
---
 src/gallium/include/pipe/p_context.h                 | 8
 src/gallium/state_trackers/dri/common/dri_drawable.c | 4
 src/gallium/state_trackers/dri/drm/dri2.c            | 8 ++--
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/src/gallium/include/pipe/p_context.h b/src/gallium/include/pipe/p_context.h
index aa18cbf..38d5ee6 100644
--- a/src/gallium/include/pipe/p_context.h
+++ b/src/gallium/include/pipe/p_context.h
@@ -354,6 +354,14 @@ struct pipe_context {
                          unsigned dstx, unsigned dsty,
                          unsigned width, unsigned height);
 
+   /**
+    * Expand a color resource in-place.
+    *
+    * \return TRUE if resource was expanded, FALSE otherwise
+    */
+   boolean (*expand_resource)(struct pipe_context *pipe,
+                              struct pipe_resource *dst);
+
    /** Flush draw commands
     *
     * \param flags bitfield of enum pipe_flush_flags values.
diff --git a/src/gallium/state_trackers/dri/common/dri_drawable.c b/src/gallium/state_trackers/dri/common/dri_drawable.c
index 18d8d89..b67a497 100644
--- a/src/gallium/state_trackers/dri/common/dri_drawable.c
+++ b/src/gallium/state_trackers/dri/common/dri_drawable.c
@@ -448,6 +448,10 @@ dri_flush(__DRIcontext *cPriv,
       }
 
       /* FRONT_LEFT is resolved in drawable->flush_frontbuffer. */
+   } else if (ctx->st->pipe->expand_resource) {
+      /* Expand fast-cleared framebuffer */
+      ctx->st->pipe->expand_resource(ctx->st->pipe,
+                                     drawable->textures[ST_ATTACHMENT_BACK_LEFT]);
    }
 
    dri_postprocessing(ctx, drawable, ST_ATTACHMENT_BACK_LEFT);
diff --git a/src/gallium/state_trackers/dri/drm/dri2.c b/src/gallium/state_trackers/dri/drm/dri2.c
index 1dcc1f7..97784ec 100644
--- a/src/gallium/state_trackers/dri/drm/dri2.c
+++ b/src/gallium/state_trackers/dri/drm/dri2.c
@@ -490,18 +490,22 @@ dri2_flush_frontbuffer(struct dri_context *ctx,
 {
    __DRIdrawable *dri_drawable = drawable->dPriv;
    struct __DRIdri2LoaderExtensionRec *loader = drawable->sPriv->dri2.loader;
+   struct pipe_context *pipe = ctx->st->pipe;
 
    if (statt != ST_ATTACHMENT_FRONT_LEFT)
       return;
 
    if (drawable->stvis.samples > 1) {
-      struct pipe_context *pipe = ctx->st->pipe;
-
       /* Resolve the front buffer. */
       dri_pipe_blit(ctx->st->pipe,
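The fast-clear problem Marek describes in the thread above, and what an expand step has to do about it, can be modelled in a few lines. Everything here is a toy under stated assumptions: the struct and both functions are hypothetical, the clear covers the whole resource, and the real hardware metadata (and pixels overwritten by later rendering) are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resource: pixel storage plus fast-clear metadata. */
struct toy_resource {
    uint32_t *pixels;
    size_t num_pixels;
    bool fast_cleared;     /* metadata says "cleared", memory is stale */
    uint32_t clear_color;
};

/* Fast clear: only the metadata changes, pixel memory is not touched. */
void toy_fast_clear(struct toy_resource *res, uint32_t color)
{
    assert(res != NULL);
    res->fast_cleared = true;
    res->clear_color = color;
}

/*
 * Expand: write the clear color into memory so that a consumer which
 * cannot see the metadata (the DDX in the thread) reads correct pixels.
 * Returns true if the resource was expanded, false otherwise.
 */
bool toy_expand_resource(struct toy_resource *res)
{
    assert(res != NULL);
    if (!res->fast_cleared)
        return false;

    for (size_t i = 0; i < res->num_pixels; i++)
        res->pixels[i] = res->clear_color;
    res->fast_cleared = false;
    return true;
}
```

Between the clear and the expand, anything that reads the pixel array directly still sees the old contents, which is exactly the situation at SwapBuffers that the proposed hook is meant to fix.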
Re: [Mesa-dev] [PATCH 2/3] tgsi: fix-up KILP comments
On 12.07.2013 01:26, Brian Paul wrote:

KILP is really unconditional fragment kill. We've had KIL and KILP transposed forever. I'll fix that next.

I think the 'P' meant to indicate that the condition, if there is any, would be a predicate register, whereas KIL no-P is supposed to represent the KIL/TEXKILL instruction from those old shader languages. So, it's not transposed, it's just an initially confusing name. Maybe just s/KILP/DISCARD instead of swapping them?

---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c |3 +--
 src/gallium/auxiliary/tgsi/tgsi_exec.c |5 ++---
 src/gallium/docs/source/tgsi.rst| 10 +-
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp |1 +
 4 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index 43724e7..43182ee 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -2096,8 +2096,7 @@ emit_kil(
 
 
 /**
- * Predicated fragment kill.
- * XXX Actually, we do an unconditional kill (as in tgsi_exec.c).
+ * Unconditional fragment kill.
  * The only predication is the execution mask which will apply if
  * we're inside a loop or conditional.
  */
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index eaf..035b105 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -1614,8 +1614,7 @@ exec_kil(struct tgsi_exec_machine *mach,
 }
 
 /**
- * Execute NVIDIA-style KIL which is predicated by a condition code.
- * Kill fragment if the condition code is TRUE.
+ * Unconditional fragment kill/discard.
  */
 static void
 exec_kilp(struct tgsi_exec_machine *mach,
@@ -1623,7 +1622,7 @@ exec_kilp(struct tgsi_exec_machine *mach,
 {
    uint kilmask; /* bit 0 = pixel 0, bit 1 = pixel 1, etc */
 
-   /* unconditional kil */
+   /* kill fragment for all fragments currently executing */
    kilmask = mach->ExecMask;
    mach->Temps[TEMP_KILMASK_I].xyzw[TEMP_KILMASK_C].u[0] |= kilmask;
 }
diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index 3f48b51..8c6fec9 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -471,11 +471,6 @@ This instruction replicates its result.
   dst.w = partialy(src.w)
 
 
-.. opcode:: KILP - Predicated Discard
-
-  Not really predicated, just unconditional discard
-
-
 .. opcode:: PK2H - Pack Two 16-bit Floats
 
   TBD
@@ -755,6 +750,11 @@ This instruction replicates its result.
   endif
 
 
+.. opcode:: KILP - Discard
+
+  Unconditional discard.  Allowed in fragment shaders only.
+
+
 .. opcode:: SCS - Sine Cosine
 
 .. math::
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 64e0a8a..9e0a648 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -2978,6 +2978,7 @@ glsl_to_tgsi_visitor::visit(ir_discard *ir)
       this->result.negate = ~this->result.negate;
       emit(ir, TGSI_OPCODE_KIL, undef_dst, this->result);
    } else {
+      /* unconditional kil */
       emit(ir, TGSI_OPCODE_KILP);
    }
 }
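The comment in the patch above, "the only predication is the execution mask", is worth spelling out. The following sketch mirrors the two lines of exec_kilp with a made-up quad_state struct; bit i of each mask stands for fragment i, as in the kilmask comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-quad state; bit i set means fragment i. */
struct quad_state {
    uint32_t exec_mask;  /* fragments active in the current control flow */
    uint32_t kill_mask;  /* fragments discarded so far */
};

/*
 * Unconditional kill: every fragment that is currently executing is
 * added to the kill mask. Fragments masked off by an enclosing
 * conditional or loop are not affected.
 */
void exec_unconditional_kill(struct quad_state *q)
{
    assert(q != NULL);
    q->kill_mask |= q->exec_mask;
}
```

So with exec_mask 0x5 inside a branch and fragment 3 already killed (kill_mask 0x8), the instruction leaves kill_mask at 0xD: fragment 1 survives because it was not executing the branch, not because the opcode tested anything.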
[Mesa-dev] [PATCH] r600g: x/y coordinates must be divided by block dim in dma blit
From: Christoph Bumiller christoph.bumil...@speed.at

---
 src/gallium/drivers/r600/evergreen_state.c | 10 --
 src/gallium/drivers/r600/r600_state.c | 10 --
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
index 0dc4f15..0267d28 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -3740,6 +3740,7 @@ boolean evergreen_dma_blit(struct pipe_context *ctx,
     struct r600_texture *rdst = (struct r600_texture*)dst;
     unsigned dst_pitch, src_pitch, bpp, dst_mode, src_mode, copy_height;
     unsigned src_w, dst_w;
+    unsigned src_x, src_y;
 
     if (rctx->rings.dma.cs == NULL) {
         return FALSE;
@@ -3748,6 +3749,11 @@ boolean evergreen_dma_blit(struct pipe_context *ctx,
         return FALSE;
     }
 
+    src_x = util_format_get_nblocksx(src->format, src_box->x);
+    dst_x = util_format_get_nblocksx(src->format, dst_x);
+    src_y = util_format_get_nblocksy(src->format, src_box->y);
+    dst_y = util_format_get_nblocksy(src->format, dst_y);
+
     bpp = rdst->surface.bpe;
     dst_pitch = rdst->surface.level[dst_level].pitch_bytes;
     src_pitch = rsrc->surface.level[src_level].pitch_bytes;
@@ -3792,7 +3798,7 @@ boolean evergreen_dma_blit(struct pipe_context *ctx,
          */
         src_offset= rsrc->surface.level[src_level].offset;
         src_offset += rsrc->surface.level[src_level].slice_size * src_box->z;
-        src_offset += src_box->y * src_pitch + src_box->x * bpp;
+        src_offset += src_y * src_pitch + src_x * bpp;
         dst_offset = rdst->surface.level[dst_level].offset;
         dst_offset += rdst->surface.level[dst_level].slice_size * dst_z;
         dst_offset += dst_y * dst_pitch + dst_x * bpp;
@@ -3800,7 +3806,7 @@ boolean evergreen_dma_blit(struct pipe_context *ctx,
                     src_box->height * src_pitch);
     } else {
         evergreen_dma_copy_tile(rctx, dst, dst_level, dst_x, dst_y, dst_z,
-                    src, src_level, src_box->x, src_box->y, src_box->z,
+                    src, src_level, src_x, src_y, src_box->z,
                     copy_height, dst_pitch, bpp);
     }
     return TRUE;
diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c
index 301ca88..ac0e0ce 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -3139,6 +3139,7 @@ boolean r600_dma_blit(struct pipe_context *ctx,
     struct r600_texture *rdst = (struct r600_texture*)dst;
     unsigned dst_pitch, src_pitch, bpp, dst_mode, src_mode, copy_height;
     unsigned src_w, dst_w;
+    unsigned src_x, src_y;
 
     if (rctx->rings.dma.cs == NULL) {
         return FALSE;
@@ -3147,6 +3148,11 @@ boolean r600_dma_blit(struct pipe_context *ctx,
         return FALSE;
     }
 
+    src_x = util_format_get_nblocksx(src->format, src_box->x);
+    dst_x = util_format_get_nblocksx(src->format, dst_x);
+    src_y = util_format_get_nblocksy(src->format, src_box->y);
+    dst_y = util_format_get_nblocksy(src->format, dst_y);
+
     bpp = rdst->surface.bpe;
     dst_pitch = rdst->surface.level[dst_level].pitch_bytes;
     src_pitch = rsrc->surface.level[src_level].pitch_bytes;
@@ -3179,7 +3185,7 @@ boolean r600_dma_blit(struct pipe_context *ctx,
          */
         src_offset= rsrc->surface.level[src_level].offset;
         src_offset += rsrc->surface.level[src_level].slice_size * src_box->z;
-        src_offset += src_box->y * src_pitch + src_box->x * bpp;
+        src_offset += src_y * src_pitch + src_x * bpp;
         dst_offset = rdst->surface.level[dst_level].offset;
         dst_offset += rdst->surface.level[dst_level].slice_size * dst_z;
         dst_offset += dst_y * dst_pitch + dst_x * bpp;
@@ -3191,7 +3197,7 @@ boolean r600_dma_blit(struct pipe_context *ctx,
         r600_dma_copy(rctx, dst, src, dst_offset, src_offset, size);
     } else {
         return r600_dma_copy_tile(rctx, dst, dst_level, dst_x, dst_y, dst_z,
-                    src, src_level, src_box->x, src_box->y, src_box->z,
+                    src, src_level, src_x, src_y, src_box->z,
                     copy_height, dst_pitch, bpp);
     }
     return TRUE;
--
1.8.1.5
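Why the coordinates in the patch above have to be converted to block units shows up as soon as the format is block-compressed. The sketch below is self-contained and does not use Mesa's util_format helpers; the block_format struct, the function name and the plain floor division are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format description: block size in pixels and in bytes. */
struct block_format {
    unsigned block_width;
    unsigned block_height;
    unsigned bytes_per_block;
};

/*
 * Byte offset of the block containing pixel (x, y) in a linear surface
 * whose rows of blocks are pitch_bytes apart. For uncompressed formats
 * the block is 1x1 and this reduces to y * pitch + x * bpp.
 */
uint64_t block_offset(const struct block_format *fmt,
                      unsigned x, unsigned y, unsigned pitch_bytes)
{
    assert(fmt != NULL);
    assert(fmt->block_width > 0 && fmt->block_height > 0);

    unsigned bx = x / fmt->block_width;   /* pixel column -> block column */
    unsigned by = y / fmt->block_height;  /* pixel row -> block row */

    return (uint64_t)by * pitch_bytes + (uint64_t)bx * fmt->bytes_per_block;
}
```

Take a 4x4-block format with 8 bytes per block and a 64-pixel-wide level, so the pitch is 16 blocks * 8 bytes = 128 bytes. Pixel (8, 4) sits in block (2, 1), at offset 1 * 128 + 2 * 8 = 144. Using the pixel coordinates directly, as the code did before the patch, would give 4 * 128 + 8 * 8 = 576, which points at the wrong block.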
Re: [Mesa-dev] RFC: more changes to render_condition
On 22.06.2013 16:36, Roland Scheidegger wrote:

We decided to drop predicated transfers already. State tracker can emulate this by using another resource and doing a (predicated) resource_copy_region, might be slightly suboptimal but predicated transfers really sound strange. As for resource_copy_region, I'm fine with a flag indicating if it honors predication or not. You can have that for blit too if you need it (maybe if you implement resource_copy_region as a blit?), I was thinking about it (it is not obvious why blit should behave differently really), but decided against it because d3d10 apparently does not seem to require it (and other apis don't predicate that stuff anyway), unless the docs are wrong (resolve isn't mentioned among the predicated functions).

Roland

You could still go with adding a separate render_condition (non_render_stuff_condition) for transfers, copies and blits, that's how it's done on NV hardware. Adding booleans to all the functions looks ugly.

On 22.06.2013 14:27, Marek Olšák wrote:

I have mixed feelings about this. Some transfers are implemented with pipe_context::blit instead of resource_copy_region, because MSAA resources should be downsampled in transfer_map and upsampled in transfer_unmap, so that ReadPixels and various fallbacks (CopyPixels, CopyTexSubImage, ...) work. If transfers were to honor the render condition, the blit (including resolve) must honor it too.

Adding a boolean flag to resource_copy_region and blit saying whether the render condition should be honored is preferable. This should keep the render-condition disabling in the driver as it is now. Trying to save/restore the render condition before/after all occurrences of resource_copy_region and blit would be prone to regressions and it would also need much more work.

Marek

On Sat, Jun 15, 2013 at 12:01 AM, Roland Scheidegger srol...@vmware.com wrote:

On 14.06.2013 19:49, srol...@vmware.com wrote:

From: Roland Scheidegger srol...@vmware.com

For conditional rendering this makes it possible to skip rendering if either the predicate is true or false, as supported by d3d10 (in fact previously it was sort of implied skip rendering if predicate is false for occlusion predicate, and true for so_overflow predicate). There's no cap bit for this as presumably all drivers could do it trivially (but this patch does not implement it for the drivers using true hw predicates, nvxx, r600, radeonsi, no change is expected for OpenGL functionality).
---

FWIW there's some more changes which would be useful but they are probably more controversial and may require some more thought so here it goes:

diff --git a/src/gallium/docs/source/context.rst b/src/gallium/docs/source/context.rst
index ede89be..59403de 100644
--- a/src/gallium/docs/source/context.rst
+++ b/src/gallium/docs/source/context.rst
@@ -385,7 +385,8 @@ A drawing command can be skipped depending on the outcome of a query
 (typically an occlusion query, or streamout overflow predicate).
 The ``render_condition`` function specifies the query which should be checked
 prior to rendering anything. Functions honoring render_condition include
-(and are limited to) draw_vbo, clear, clear_render_target, clear_depth_stencil.
+(and are limited to) draw_vbo, clear, clear_render_target, clear_depth_stencil,
+resource_copy_region. Transfers may also be affected.
 
 If ``render_condition`` is called with ``query`` = NULL, conditional
 rendering is disabled and drawing takes place normally.
@@ -545,6 +546,13 @@ These flags control the behavior of a transfer object.
   Written ranges will be notified later with :ref:`transfer_flush_region`.
   Cannot be used with ``PIPE_TRANSFER_READ``.
 
+``PIPE_TRANSFER_HONOR_RENDER_CONDITION``
+  The transfer will honor the current render condition. This is only valid
+  essentially for ``transfer_inline_write`` (but since everyone implements
+  this with a fallback to ordinary transfer_map/transfer_unmap it is valid
+  for transfer_map too, however the same restriction apply, the transfer
+  must be write-only with either DISCARD_RANGE or DISCARD_WHOLE_RESOURCE set).
+

The reasoning for this is that d3d10 has CopyResource/CopySubResource and UpdateSubResource predicated. For resource_copy_region if it always honors render_condition, then state trackers not wanting this can simply disable predication when they call it. But the opposite is not possible, if it never honors predication, then a state tracker needing predication will need to wait on the predicate, hence requiring a cpu/gpu sync (if the result isn't available yet). For transfers this is a bit weird I admit it essentially implies a predicated gpu blit from a staging texture (if you implement this fully on hardware). If that's too awkward though this one could be emulated in the state tracker easily enough, if resource_copy_region honors predication (by just creating a temporary
Re: [Mesa-dev] [PATCH 2/2] gallium/draw: add limits to the clip and cull distances
On 12.06.2013 15:57, Jose Fonseca wrote: - Original Message - On 11.06.2013 05:39, Zack Rusin wrote: There are strict limits on those registers. Define the maximums and use them instead of magic numbers. Also allows us to add some extra sanity checks. Suggested by Brian. Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_context.c |2 ++
 src/gallium/auxiliary/draw/draw_gs.c | 10 +-
 src/gallium/auxiliary/draw/draw_gs.h |4 ++--
 src/gallium/auxiliary/draw/draw_vs.c | 10 +-
 src/gallium/auxiliary/draw/draw_vs.h |4 ++--
 src/gallium/docs/source/tgsi.rst | 23 +++
 src/gallium/include/pipe/p_state.h |2 ++
 7 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c
index 0dbddb4..22c0e9b 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -738,6 +738,7 @@ draw_current_shader_clipvertex_output(const struct draw_context *draw)
 uint
 draw_current_shader_clipdistance_output(const struct draw_context *draw, int index)
 {
+   debug_assert(index < PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT);
    if (draw->gs.geometry_shader)
       return draw->gs.geometry_shader->clipdistance_output[index];
    return draw->vs.clipdistance_output[index];
@@ -756,6 +757,7 @@ draw_current_shader_num_written_clipdistances(const struct draw_context *draw)
 uint
 draw_current_shader_culldistance_output(const struct draw_context *draw, int index)
 {
+   debug_assert(index < PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT);
    if (draw->gs.geometry_shader)
       return draw->gs.geometry_shader->culldistance_output[index];
    return draw->vs.vertex_shader->culldistance_output[index];
diff --git a/src/gallium/auxiliary/draw/draw_gs.c b/src/gallium/auxiliary/draw/draw_gs.c
index b762dd6..cd63e2b 100644
--- a/src/gallium/auxiliary/draw/draw_gs.c
+++ b/src/gallium/auxiliary/draw/draw_gs.c
@@ -792,13 +792,13 @@ draw_create_geometry_shader(struct draw_context *draw,
       if (gs->info.output_semantic_name[i] == TGSI_SEMANTIC_VIEWPORT_INDEX)
          gs->viewport_index_output = i;
       if (gs->info.output_semantic_name[i] == TGSI_SEMANTIC_CLIPDIST) {
-         if (gs->info.output_semantic_index[i] == 0)
-            gs->clipdistance_output[0] = i;
-         else
-            gs->clipdistance_output[1] = i;
+         debug_assert(gs->info.output_semantic_index[i] <
+                      PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT);
+         gs->clipdistance_output[gs->info.output_semantic_index[i]] = i;
       }
       if (gs->info.output_semantic_name[i] == TGSI_SEMANTIC_CULLDIST) {
-         debug_assert(gs->info.output_semantic_index[i] < Elements(gs->culldistance_output));
+         debug_assert(gs->info.output_semantic_index[i] <
+                      PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT);
          gs->culldistance_output[gs->info.output_semantic_index[i]] = i;
       }
    }
diff --git a/src/gallium/auxiliary/draw/draw_gs.h b/src/gallium/auxiliary/draw/draw_gs.h
index 05d666d..e279a80 100644
--- a/src/gallium/auxiliary/draw/draw_gs.h
+++ b/src/gallium/auxiliary/draw/draw_gs.h
@@ -67,8 +67,8 @@ struct draw_geometry_shader {
    struct tgsi_shader_info info;
    unsigned position_output;
    unsigned viewport_index_output;
-   unsigned clipdistance_output[2];
-   unsigned culldistance_output[2];
+   unsigned clipdistance_output[PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT];
+   unsigned culldistance_output[PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT];
    unsigned max_output_vertices;
    unsigned primitive_boundary;
diff --git a/src/gallium/auxiliary/draw/draw_vs.c b/src/gallium/auxiliary/draw/draw_vs.c
index a0bebcc..bbccbe4 100644
--- a/src/gallium/auxiliary/draw/draw_vs.c
+++ b/src/gallium/auxiliary/draw/draw_vs.c
@@ -86,12 +86,12 @@ draw_create_vertex_shader(struct draw_context *draw,
             found_clipvertex = TRUE;
             vs->clipvertex_output = i;
          } else if (vs->info.output_semantic_name[i] == TGSI_SEMANTIC_CLIPDIST) {
-            if (vs->info.output_semantic_index[i] == 0)
-               vs->clipdistance_output[0] = i;
-            else
-               vs->clipdistance_output[1] = i;
+            debug_assert(vs->info.output_semantic_index[i] <
+                         PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT);
+            vs->clipdistance_output[vs->info.output_semantic_index[i]] = i;
          } else if (vs->info.output_semantic_name[i] == TGSI_SEMANTIC_CULLDIST) {
-            debug_assert(vs->info.output_semantic_index[i] < Elements(vs->culldistance_output));
+            debug_assert(vs->info.output_semantic_index[i] <
+                         PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT);
Re: [Mesa-dev] [PATCH 03/13] gallium: Introduce 32-bit bytewise format names
On 06.06.2013 10:34, Richard Sandiford wrote: Michel Dänzer mic...@daenzer.net writes: On Die, 2013-06-04 at 10:47 +0100, Richard Sandiford wrote: (2) it uses PIPE_FORMAT_INT_* names with the lsb first rather than the mesa-like ones with msb first. (I'm happy to change the names to something else though.) The patch isn't in a submittable state yet. I just thought it was worth posting because the lsb-first names do make the change look a bit more obvious/less scary :-) I can see the appeal of that, but I also see some danger in that naming scheme: It'll be easy to miss the difference between the two kinds of formats, e.g. when grepping for B8G8R8A8. That's why I'd prefer making the difference more explicit in the naming scheme. Sticking to LSB first, BGRA might already look a little less scary? :) I realise this was probably more a question for Jose, but FWIW: I liked the names you originally suggested for their consistency with mesa and natural number ordering (as you said). The PIPE_FORMAT_INT_* I don't like that _INT_, it could be confused with the SINT/UINT component type postfix, and it's redundant. The distinction provided by R8G8B8A8 vs RGBA is already sufficient. Neither do I like REV, I always have to check what order that actually implies (but then I hardly ever deal with mesa format names). Why not just defined it as RxGyBzAw meaning left to right = lowest address to highest address and RGBAxyzw meaning left to right = least/most (so that it matches the non-REV variant) to most/least significant bit-tuple in a word ? And you can do RG16[_]BG16 if you have 2 words, or R32_G32_B32_A32 for 4 words, but this ugly speciment is equivalent to R32G32B32A32 so it won't ever appear to hurt your eyes. version seemed OK too from the lowest always first perspective. I'm just afraid that if we use BGRA to mean the reverse of what it means in mesa, these patches are going to be cursed by gallium developers for years to come. 
BGRA_REV would be consistent with the mesa names while being lsb-first, and I'd be happy with that too FWIW. It's just that _REV kind of implies that the other order is somehow the canonical one. Having all int formats end in _REV might seem a bit odd. Thanks, Richard
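To make the naming discussion above concrete, here is a minimal sketch of the two kinds of layout being distinguished: an array-style format, where the name lists components from lowest to highest address, and a packed-word format, where the name lists components by bit position within a 32-bit word. The function names are hypothetical and not part of gallium or mesa; they only illustrate why the two layouts coincide on little-endian CPUs and differ on big-endian ones.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Array-style layout (in the spirit of "R8G8B8A8"): components are stored
 * from lowest to highest address, independent of CPU endianness. */
static void store_bytes_rgba(uint8_t dst[4],
                             uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
   dst[0] = r;
   dst[1] = g;
   dst[2] = b;
   dst[3] = a;
}

/* Packed-word layout with R in the least significant bits. How this word
 * appears in memory depends on the byte order of the CPU. */
static uint32_t pack_word_rgba_lsb_first(uint8_t r, uint8_t g,
                                         uint8_t b, uint8_t a)
{
   return (uint32_t)r | ((uint32_t)g << 8) |
          ((uint32_t)b << 16) | ((uint32_t)a << 24);
}

/* Returns 1 when the first byte of a 16-bit 1 is 1, i.e. little endian. */
static int cpu_is_little_endian(void)
{
   uint16_t one = 1;
   uint8_t first;
   memcpy(&first, &one, 1);
   return first == 1;
}

/* Returns 1 when both layouts produce identical bytes in memory. */
static int layouts_match_in_memory(void)
{
   uint8_t as_bytes[4], word_bytes[4];
   uint32_t word = pack_word_rgba_lsb_first(1, 2, 3, 4);

   store_bytes_rgba(as_bytes, 1, 2, 3, 4);
   memcpy(word_bytes, &word, sizeof word);
   return memcmp(as_bytes, word_bytes, 4) == 0;
}
```

On a little-endian machine both functions describe the same bytes, which is exactly why a grep for one name can silently hit the other kind of format.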
Re: [Mesa-dev] [PATCH] gallium: add support for layered rendering
On 01.06.2013 01:02, Alex Deucher wrote: On Fri, May 31, 2013 at 6:54 PM, Roland Scheidegger srol...@vmware.com wrote: On 31.05.2013 23:43, srol...@vmware.com wrote: From: Roland Scheidegger srol...@vmware.com Since pipe_surface already has all the necessary fields no interface changes are necessary except adding a new shader semantic value (TGSI_SEMANTIC_LAYER), though add a pipe capability bit for it as well. (Note that what GL knows as gl_Layer variable d3d10 is naming RENDER_TARGET_ARRAY_INDEX)
---
 src/gallium/docs/source/screen.rst |2 ++
 src/gallium/include/pipe/p_defines.h |3 ++-
 src/gallium/include/pipe/p_shader_tokens.h |3 ++-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
index 683080c..b74b237 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -168,6 +168,8 @@ The integer capabilities:
   since they are linked) a driver can support. Returning 0 is equivalent
   to returning 1 because every driver has to support at least a single
   viewport/scissor combination.
+* ``PIPE_CAP_LAYERED_RENDERING``: Whether rendering to multiple layers is
+  supported using layer selection by the TGSI_SEMANTIC_LAYER shader variable.

 .. _pipe_capf:

diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h
index 8af1a84..c359a9e 100644
--- a/src/gallium/include/pipe/p_defines.h
+++ b/src/gallium/include/pipe/p_defines.h
@@ -508,7 +508,8 @@ enum pipe_cap {
    PIPE_CAP_QUERY_PIPELINE_STATISTICS = 81,
    PIPE_CAP_TEXTURE_BORDER_COLOR_QUIRK = 82,
    PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE = 83,
-   PIPE_CAP_MAX_VIEWPORTS = 84
+   PIPE_CAP_MAX_VIEWPORTS = 84,
+   PIPE_CAP_MULTIPLE_LAYERS = 85
 };

Actually I don't think this is a good name, PIPE_CAP_LAYERED_RENDERING might be better? I'm open to just about any suggestion though :-).

FWIW, I prefer PIPE_CAP_LAYERED_RENDERING as well.

Other colors: PIPE_CAP_RENDER_TARGET_INDEX PIPE_CAP_RENDER_TARGET_ARRAY_INDEX PIPE_CAP_RENDER_TARGET_LAYERS Or PIPE_CAP_GS_LAYER_SELECTION to make it clear that the driver doesn't support GL_AMD_vertex_shader_layer ?

Alex

Roland

 #define PIPE_QUIRK_TEXTURE_BORDER_COLOR_SWIZZLE_NV50 (1 << 0)

diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
index b33cf1d..c984d50 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -165,7 +165,8 @@ struct tgsi_declaration_interp
 #define TGSI_SEMANTIC_TEXCOORD 19 /** texture or sprite coordinates */
 #define TGSI_SEMANTIC_PCOORD 20 /** point sprite coordinate */
 #define TGSI_SEMANTIC_VIEWPORT_INDEX 21 /** viewport index */
-#define TGSI_SEMANTIC_COUNT 22 /** number of semantic values */
+#define TGSI_SEMANTIC_LAYER 22 /** layer (rendertarget index) */
+#define TGSI_SEMANTIC_COUNT 23 /** number of semantic values */

 struct tgsi_declaration_semantic
 {
Re: [Mesa-dev] Instancing support in r300g?
On 18.05.2013 13:05, Lauri Kasanen wrote: Hi, The 'net claims that instancing is a SM3 feature[1] (r500), but also supported on SM2 ATI cards[2] (r300-r400). Yet r300g claims no support for it, and it seems that even Nvidia's

r300_get_param: case PIPE_CAP_VERTEX_ELEMENT_INSTANCE_DIVISOR: return 1; That's ARB_instanced_arrays, which is what d3d9 supports (IDirect3DDevice9::SetStreamSourceFreq).

Windows drivers don't expose ARB_draw_instanced on gf6 and gf7[3]. What's the story here? Does the GL extension use something different than what DX uses? - Lauri
[1] http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter03.html Using the Geometry Instancing API provided by DirectX 9 and fully supported in hardware by GeForce 6 Series GPUs
[2] http://aras-p.info/texts/D3D9GPUHacks.html , http://www.hardwareheaven.com/industry-news/51427-farcy-1-2-withdrawn-patch.html
[3] http://feedback.wildfiregames.com/report/opengl/feature/GL_ARB_draw_instanced
Re: [Mesa-dev] Instancing support in r300g?
On 18.05.2013 17:41, Marek Olšák wrote: ARB_draw_instanced is a DX10 feature. The R300-R500 chipsets do not support instancing at all. ARB_instanced_arrays is emulated with a loop in the driver, so that instancing is supported in Wine/DX9. Modern NV cards still require you to loop in the driver ... the only hardware support for instancing they added is a builtin counter. Marek On Sat, May 18, 2013 at 4:59 PM, Lauri Kasanen c...@gmx.com wrote: On Sat, 18 May 2013 17:46:32 +0300 Lauri Kasanen c...@gmx.com wrote: On Sat, 18 May 2013 13:50:35 +0200 Christoph Bumiller e0425...@student.tuwien.ac.at wrote: r300_get_param: case PIPE_CAP_VERTEX_ELEMENT_INSTANCE_DIVISOR: return 1; That's ARB_instanced_arrays, which is what d3d9 supports (IDirect3DDevice9::SetStreamSourceFreq). Instanced arrays alone, without a way to draw instances is pretty useless, would you say? My mistake, I didn't see the arrays extension adds those draw calls in that situation. - Lauri
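For readers unfamiliar with what "emulated with a loop in the driver" amounts to, here is a rough sketch. It is not r300g code; the names and the callback are hypothetical. It assumes the ARB_instanced_arrays rule that an attribute with a non-zero divisor advances once every `divisor` instances, and that a divisor of 0 means ordinary per-vertex fetching.

```c
#include <assert.h>

/* Which element of an attribute array a vertex reads.
 * divisor == 0: ordinary per-vertex attribute.
 * divisor  > 0: per-instance attribute, advancing every `divisor` instances. */
static unsigned attrib_element_index(unsigned vertex_id, unsigned instance_id,
                                     unsigned divisor)
{
   return divisor ? instance_id / divisor : vertex_id;
}

/* Hypothetical callback standing in for "issue one non-instanced draw". */
typedef void (*draw_one_fn)(unsigned instance_id, void *user);

/* Software emulation: one plain draw per instance. A real driver would
 * re-point the instanced vertex buffers before each draw. */
static void emulate_instanced_draw(unsigned instance_count,
                                   draw_one_fn draw_one, void *user)
{
   unsigned i;
   for (i = 0; i < instance_count; i++)
      draw_one(i, user);
}

/* Example callback: just counts how many draws were issued. */
static void count_draws(unsigned instance_id, void *user)
{
   (void)instance_id;
   ++*(unsigned *)user;
}
```

The cost of this approach is one draw call per instance, which is why it only makes instancing available rather than fast.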
Re: [Mesa-dev] [PATCH] gallium/tgsi: clarify (possibly change) TGSI_OPCODE_UCMP definition
On 08.05.2013 03:48, srol...@vmware.com wrote: From: Roland Scheidegger srol...@vmware.com UCMP while an integer opcode isn't really consistently implemented as having all integer arguments. softpipe will assume all arguments are ints, whereas gallivm has the arguments defined as untyped which means they'll get treated as floats. This means input modifiers will not work the same. Fix this by saying only first arg is an integer, which seems more useful than making all arguments integers - this would be similar to d3d10 movc opcode.
---
 src/gallium/docs/source/tgsi.rst |5 +
 1 file changed, 5 insertions(+)

diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index 3af1fb7..852f8a0 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -1291,6 +1291,11 @@ Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
 .. opcode:: UCMP - Integer Conditional Move

+.. note::
+
+   Only the first source arg is an integer, the 2nd and 3rd ones are
+   considered floats (for input modifier purposes).
+

As long as you patch up all the occurrences of tgsi_opcode_infer_src_type and make it take an argument to identify the source ... I'd rather just forbid modifiers on moves, i.e. MOV and UCMP, since at least MOV returns TGSI_TYPE_UNTYPED and untyped values can't be operated on. For the ordinary MOV we have NEG and ABS, and for UCMP the backend optimizer can take care of merging modifiers into the instruction (nvc0's UCMP (slct u32) doesn't support modifiers).

 .. math::

   dst.x = src0.x ? src1.x : src2.x
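A small sketch of why the typing of the UCMP sources matters for input modifiers. It models a single channel on raw 32-bit values, assuming IEEE 754 single-precision floats; the helper names are made up for illustration and do not correspond to softpipe or gallivm code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One channel of UCMP: src0 is tested as an integer, while src1 and src2
 * are passed through as raw 32-bit values. */
static uint32_t ucmp_channel(uint32_t src0, uint32_t src1, uint32_t src2)
{
   return src0 ? src1 : src2;
}

/* Reinterpret a float as its bit pattern. */
static uint32_t float_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof u);
   return u;
}

/* A float-style negate modifier flips the sign bit ... */
static uint32_t negate_as_float(uint32_t bits)
{
   return bits ^ 0x80000000u;
}

/* ... while an integer-style negate is a two's complement negation. */
static uint32_t negate_as_int(uint32_t bits)
{
   return (uint32_t)0 - bits;
}
```

Two things fall out of this: the same negate modifier yields different bits depending on the assumed source type, and an integer test of the condition treats the bit pattern of -0.0f as true, whereas a float comparison against zero would treat it as false.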
Re: [Mesa-dev] [PATCH] nouveau: emit and flush fence in fence_signalled if needed
On 07.05.2013 19:25, Bryan Cain wrote: The Mesa state tracker expects us to emit the fence even if it doesn't call fence_finish. Notably, this occurs when glClientWaitSync is called with timeout 0. Fixes Portal and Left 4 Dead 2, which were both stalling on startup by repeatedly calling glClientWaitSync with timeout 0 while waiting for commands to complete.
---

I'm not sure I want to do this. pipe_screen::fence_signalled probably shouldn't flush the command buffer, r600g doesn't seem to do it either. They should probably call glFlush() before looping on glClientWaitSync, or, if they don't have anything better to do in the meantime, simply specify an infinite timeout if they're going to loop forever anyway.

 src/gallium/drivers/nouveau/nouveau_fence.c | 36 ++-
 src/gallium/drivers/nouveau/nouveau_fence.h |1 +
 2 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nouveau_fence.c b/src/gallium/drivers/nouveau/nouveau_fence.c
index dea146c..722be01 100644
--- a/src/gallium/drivers/nouveau/nouveau_fence.c
+++ b/src/gallium/drivers/nouveau/nouveau_fence.c
@@ -167,6 +167,25 @@ nouveau_fence_update(struct nouveau_screen *screen, boolean flushed)
    }
 }

+boolean
+nouveau_fence_ensure_flushed(struct nouveau_fence *fence)
+{
+   struct nouveau_screen *screen = fence->screen;
+
+   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED) {
+      nouveau_fence_emit(fence);
+
+      if (fence == screen->fence.current)
+         nouveau_fence_new(screen, &screen->fence.current, FALSE);
+   }
+   if (fence->state < NOUVEAU_FENCE_STATE_FLUSHED) {
+      if (nouveau_pushbuf_kick(screen->pushbuf, screen->pushbuf->channel))
+         return FALSE;
+   }
+
+   return TRUE;
+}
+
 #define NOUVEAU_FENCE_MAX_SPINS (1 << 31)

 boolean
@@ -174,8 +193,9 @@ nouveau_fence_signalled(struct nouveau_fence *fence)
 {
    struct nouveau_screen *screen = fence->screen;

-   if (fence->state >= NOUVEAU_FENCE_STATE_EMITTED)
-      nouveau_fence_update(screen, FALSE);
+   if (!nouveau_fence_ensure_flushed(fence))
+      return FALSE;
+   nouveau_fence_update(screen, FALSE);

    return fence->state == NOUVEAU_FENCE_STATE_SIGNALLED;
 }
@@ -189,16 +209,8 @@ nouveau_fence_wait(struct nouveau_fence *fence)
    /* wtf, someone is waiting on a fence in flush_notify handler? */
    assert(fence->state != NOUVEAU_FENCE_STATE_EMITTING);

-   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED) {
-      nouveau_fence_emit(fence);
-
-      if (fence == screen->fence.current)
-         nouveau_fence_new(screen, &screen->fence.current, FALSE);
-   }
-   if (fence->state < NOUVEAU_FENCE_STATE_FLUSHED) {
-      if (nouveau_pushbuf_kick(screen->pushbuf, screen->pushbuf->channel))
-         return FALSE;
-   }
+   if (!nouveau_fence_ensure_flushed(fence))
+      return FALSE;

    do {
       nouveau_fence_update(screen, FALSE);
diff --git a/src/gallium/drivers/nouveau/nouveau_fence.h b/src/gallium/drivers/nouveau/nouveau_fence.h
index 3984a9a..d497c7f 100644
--- a/src/gallium/drivers/nouveau/nouveau_fence.h
+++ b/src/gallium/drivers/nouveau/nouveau_fence.h
@@ -34,6 +34,7 @@ boolean nouveau_fence_new(struct nouveau_screen *, struct nouveau_fence **,
 boolean nouveau_fence_work(struct nouveau_fence *, void (*)(void *), void *);
 void    nouveau_fence_update(struct nouveau_screen *, boolean flushed);
 void    nouveau_fence_next(struct nouveau_screen *);
+boolean nouveau_fence_ensure_flushed(struct nouveau_fence *);
 boolean nouveau_fence_wait(struct nouveau_fence *);
 boolean nouveau_fence_signalled(struct nouveau_fence *);
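The stall described in the commit message can be reproduced with a toy model. This is only a sketch under simplifying assumptions: the state names are invented and do not match the nouveau enum, and the "GPU" completes submitted work instantly. It shows that polling a fence that was never emitted and flushed can never succeed, while polling after an ensure-flushed step does.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical fence states (ordered, like in the patch). */
enum toy_fence_state {
   TOY_FENCE_AVAILABLE,
   TOY_FENCE_EMITTED,
   TOY_FENCE_FLUSHED,
   TOY_FENCE_SIGNALLED
};

struct toy_fence {
   enum toy_fence_state state;
};

/* Stand-in for GPU progress: only work that was actually submitted
 * (flushed) can ever complete. Completion is instant in this model. */
static void toy_gpu_progress(struct toy_fence *fence)
{
   if (fence->state == TOY_FENCE_FLUSHED)
      fence->state = TOY_FENCE_SIGNALLED;
}

/* Emit the fence and submit the command buffer if not done yet. */
static void toy_fence_ensure_flushed(struct toy_fence *fence)
{
   if (fence->state < TOY_FENCE_EMITTED)
      fence->state = TOY_FENCE_EMITTED;
   if (fence->state < TOY_FENCE_FLUSHED)
      fence->state = TOY_FENCE_FLUSHED;
}

/* Polling without flushing: an unflushed fence stays unsignalled forever. */
static bool toy_fence_signalled_no_flush(struct toy_fence *fence)
{
   toy_gpu_progress(fence);
   return fence->state == TOY_FENCE_SIGNALLED;
}

/* Polling the way the patch proposes: flush first, then check. */
static bool toy_fence_signalled_with_flush(struct toy_fence *fence)
{
   toy_fence_ensure_flushed(fence);
   toy_gpu_progress(fence);
   return fence->state == TOY_FENCE_SIGNALLED;
}
```

The same model also illustrates the alternative raised in the reply: if the application flushes once itself before it starts polling, the no-flush query terminates as well.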
Re: [Mesa-dev] RFC: tgsi opcodes for 32x32 muls with 64bit results
On 03.05.2013 16:32, Jose Fonseca wrote: - Original Message - Am 03.05.2013 06:58, schrieb Jose Fonseca: - Original Message - Currently, there's no way to get the high bits of a 32x32 signed/unsigned integer multiplication with tgsi. However, all of d3d10, OpenGL, and OpenCL support that, so we need it as well. There's essentially two ways how it could be done: - a 2-destination instruction returning both high and low bits (this is how it looks like in d3d10 and glsl) - use the existing umul for the low bits and have another instruction for the high bits (this is how it looks like in opencl) Well there's other possibilities but these looked like they'd match both APIs and HW reasonably (well with the exception of things like sse2 which would prefer 2x2 32bit inputs and return 2x64bit as one reg...). Actually it's two new instructions because unlike for the low bits it matters for the high bits if the source operands are signed or unsigned. Personally I'm favoring two separate instructions for low and high bits to not have to deal with multi-destination instructions, but if someone makes a strong case for one returning both low and high bits I could be convinced otherwise. I think though two instructions matches most hw very well (with the exception of software renderers and possibly intel graphics but then a good backend could certainly recognize this). Roland, I don't know about GPU HW, but I think that what you propose will forever prevent decent SSE code generation with LLVM. Using two separate opcodes for hi/low bits relies on common sub-expression elimination to merge the two multiplication operations back into one. But I strongly doubt that even LLVM's optimization passes will be able to do that. Getting the 64bits results with LLVM will require sign extend the source arguments (http://llvm.org/docs/LangRef.html#mul-instruction ) or SSE intrinsics. 
Eitherway, the expressions for the low and high bit will be radically different, so we'll end with two multiplies in the end -- which I think it is simply inadmissible -- TGSI should not stand in the way of backends generating good code. You can't generate good code either way, this is a deficiency of sse instruction set. As I've outlined in another email, I think the best you can do with sse41 is: - shuffle both src args (put 2nd/4th elements into 1st/3rd slot) - 2xpmuldq/pmuludq for doing the 32x32-64bit mul for both 1st/3rd and 2nd/4th element - shuffle the high bits into place (I think this needs 3 hw shuffle instructions) - shuffle the low bits into place (can benefit from shuffles for high bits, so just one another shuffle) Maybe you can do better with more clever shuffles, but in any case the low bits will always require one (at least) additional shuffle. If you have separate opcodes, everything will be the same, except the last step you'll just ignore that shuffle and instead just use the pmulld instruction, which will do exactly what you need for the low bits. Sure multiplications are more effort for the hw, but hell it even has the same throughput on most cpus compared to a shuffle, just latency is worse. In any case it would be 8 vs 8 instructions, with just one instruction of them very slightly worse. We have much more optimization opportunities elsewhere than that (I agree that with sse2, which lacks pmulld, it would be worse, but we never particularly cared about that). That's the thing -- if we have 32x32-64 opcodes we can fine tune this later. If we stick with separate high bit opcodes then that ability is lost (at least without coming back and changing TGSI again). So I strongly think this is a bad idea. TGSI has support for multiple destinations, though we never made much use of it. I see nothing special about it. If you can prove me wrong -- that LLVM can handle merge the multiplies -- fine. 
But I do think we have bigger fish to fry, so I'd prefer we don't put too much time debating this. No I doubt llvm can merge it (though in theory nothing would prevent it from recognizing the pattern). My guess is it will do scalar extraction, and use the imul/mul instructions (which can return 2x32bit numbers even on 32bit), then combine the vectors back together (most likely element by element). If it actually does it like that, a separate mul for the low bits would be in fact a win, because it would save the 4 reinsertion of the elements at the cost of just one vector mul (llvm uses pmulld just fine). But looking at this that way doesn't really make sense, we need instructions which make sense for everybody and aren't specified to suit one very peculiar implementation. But even if it generates optimal code, fact is that the multiply for getting the low bits is essentially noise in the whole instruction sequence. And who knows maybe intel will one day add some pmulhd/pmulhud instruction (which just makes plain more sense for vector
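For reference, the separate-opcode variant discussed in this thread can be sketched in scalar C as follows. The function names are placeholders, not TGSI opcode names. The sketch only illustrates the point made in the RFC: the low 32 bits are the same for signed and unsigned operands, while the high 32 bits are not, hence two distinct high-bits operations.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of the product; the result is the same bit pattern whether
 * the operands are interpreted as signed or unsigned. */
static uint32_t mul_lo(uint32_t a, uint32_t b)
{
   return (uint32_t)((uint64_t)a * (uint64_t)b);
}

/* High 32 bits of the unsigned 32x32 -> 64 product. */
static uint32_t umul_hi(uint32_t a, uint32_t b)
{
   return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* High 32 bits of the signed 32x32 -> 64 product, returned as raw bits
 * to avoid implementation-defined signed shifts and conversions. */
static uint32_t imul_hi(int32_t a, int32_t b)
{
   uint64_t product = (uint64_t)((int64_t)a * (int64_t)b);
   return (uint32_t)(product >> 32);
}
```

A single two-destination opcode would compute the 64-bit product once and return both halves, which is the form Jose argues keeps more room for backend tuning.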
Re: [Mesa-dev] [PATCH 2/5] gallium: increase the number of available stream output decls
On 25.04.2013 19:22, Roland Scheidegger wrote: On 24.04.2013 00:58, Zack Rusin wrote: There can be more stream output decls than shader outputs because individual components from them can be split and distributed among different so buffers. Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/include/pipe/p_state.h |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/include/pipe/p_state.h b/src/gallium/include/pipe/p_state.h
index c0b2bcd..5830dff 100644
--- a/src/gallium/include/pipe/p_state.h
+++ b/src/gallium/include/pipe/p_state.h
@@ -64,6 +64,7 @@ extern "C" {
 #define PIPE_MAX_SHADER_RESOURCES 32
 #define PIPE_MAX_TEXTURE_LEVELS 16
 #define PIPE_MAX_SO_BUFFERS 4
+#define PIPE_MAX_SO_OUTPUT_COMPONENT_COUNT 128

 struct pipe_reference
@@ -198,7 +199,7 @@ struct pipe_stream_output_info
       unsigned num_components:3; /** 1 to 4 */
       unsigned output_buffer:3; /** 0 to PIPE_MAX_SO_BUFFERS */
       unsigned dst_offset:16; /** offset into the buffer in dwords */
-   } output[PIPE_MAX_SHADER_OUTPUTS];
+   } output[PIPE_MAX_SO_BUFFERS * PIPE_MAX_SO_OUTPUT_COMPONENT_COUNT];
 };

Are you sure this isn't overkill, that is if you have multiple buffers this really increases the total number of attributes you can output? I

Actually yes, we can output 4 * [0 to 128] components on >= Fermi. It's getting a bit large though (2 KiB), so I'd probably switch to not storing that whole struct for each shader then (in the driver) ...

thought this merely allows you to distribute the same number to different buffers. Also I'm not quite convinced with the 128 number. It looks like d3d10 has a limit of 64 components, and it seems like OpenGL would be happy with that as well. Roland
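The "2 KiB" figure mentioned in the reply can be checked with a quick size calculation. This is a sketch, not the real p_state.h: only the last three bitfields appear in the quoted hunk, so the first two fields below (and their widths) are assumptions chosen to fill a 32-bit entry, and bitfield packing is compiler-dependent.

```c
#include <assert.h>
#include <stddef.h>

#define PIPE_MAX_SO_BUFFERS 4
#define PIPE_MAX_SO_OUTPUT_COMPONENT_COUNT 128

/* Sketch of one stream output declaration. */
struct so_output_sketch {
   unsigned register_index:8;  /* assumed field, not shown in the hunk */
   unsigned start_component:2; /* assumed field, not shown in the hunk */
   unsigned num_components:3;  /* 1 to 4 */
   unsigned output_buffer:3;   /* 0 to PIPE_MAX_SO_BUFFERS */
   unsigned dst_offset:16;     /* offset into the buffer in dwords */
};

/* Number of entries in the array after the patch. */
enum {
   SO_OUTPUT_SLOTS = PIPE_MAX_SO_BUFFERS * PIPE_MAX_SO_OUTPUT_COMPONENT_COUNT
};

/* Total size of the output[] array in bytes. */
static size_t so_output_table_bytes(void)
{
   return sizeof(struct so_output_sketch) * SO_OUTPUT_SLOTS;
}
```

With the 32 bits of fields packed into a single unsigned, as common compilers do, each entry takes 4 bytes, and 512 entries come to 2048 bytes per shader.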
Re: [Mesa-dev] [PATCH 2/2] gallium: Replaced gl_rasterization_rules with lower_left_origin and half_pixel_center.
On 23.04.2013 18:28, Jose Fonseca wrote: Ok. I've moved the docs to src/gallium/docs/source/cso/rasterizer.rst , and renamed `lower_left_origin` to `bottom_edge_rule`. Well, that doesn't work for NV, but it's at least less invasive for radeon since you don't have to change the state tracker (using lower_left_origin instead of flipping viewport + bottom_edge_rule) to get things working correctly. /me breathes, tries not to care, too much stuff on my plate already This is how it looks like: http://people.freedesktop.org/~jrfonseca/gl_rasterization_rules/cso/rasterizer.html#other-members Jose - Original Message - Yeah, I was confused when reading the comment and the diagrams. It probably shouldn't mention the screen origin at all and instead should say which one of the top and bottom edges is inclusive and which one is exclusive when determining pixel ownership. Anyway, thank you for fixing this. I would have probably never knew how to fix the triangle rasterization tests if you didn't bring this up. Marek On Sun, Apr 21, 2013 at 7:54 PM, Jose Fonseca jfons...@vmware.com wrote: - Original Message - Some suggestions for the name: lower_left_edge_rule lower_left_rasterization_edge_rule gl_edge_rule gl_rasterization_edge_rule In this case, the name is not as important as the documentation which defines the behavior of the state. On that note, I thought that James' diagrams were pretty good. Maybe the axis is misleading. + /** +* Triangle rasterization always uses a 'top,left' rule for pixel ownership, +* this just alters what we consider to be the top edge for that test. 
+* +* When true, screen coordinates origin is considered to be at bottom-left +* (e.g., OpenGL drawables): +* +* y ^ +*| +*| +=+ - top edge +*| | | +*| | | +*| | | +*| +-+ +*| +* 0 +- +*0x +* +* When false, screen coordinates origin is considered to be at top-left +* (e.g., OpenGL FBOs, D3D): +* +*0x +* 0 +- +*| +*| +=+ - top edge +*| | | +*| | | +*| | | +*| +-+ +*| +* y V * -* Triangle rasterization always uses a 'top,left' rule for pixel -* ownership, this just alters which point we consider the pixel -* center for that test. +* See also: +* - http://www.opengl.org/registry/specs/ARB/fragment_coord_conventions.txt +* - http://msdn.microsoft.com/en-us/library/windows/desktop/cc627092.aspx +* - http://msdn.microsoft.com/en-us/library/windows/desktop/bb147314.aspx */ - unsigned gl_rasterization_rules:1; + unsigned lower_left_origin:1; /** * When true, rasterization is disabled and no pixels are written. Jose ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] gallium: Replaced gl_rasterization_rules with lower_left_origin and half_pixel_center.
On 21.04.2013 14:35, Jose Fonseca wrote: - Original Message - On 21.04.2013 13:18, Jose Fonseca wrote: I think that drivers can just report all 4 CAPs as supported and do the adjustment in the shader themselves (no need for recompilation, just use uniforms, the st already does it like that), provided that the state tracker actually uses the rasterizer origin bit instead of changing the viewport and applies no transformation to the fragment coordinate whatsoever. I'm not sure how much that simplifies in the end. If the drivers need to resort to uniforms to deal with all combinations, then how will making the gl_Fragcoord/viewport transformation depend on lower_left_origin simplify things? Is it really true that for all hardware gl_FragCoord will depend on the lower_left_origin rasterizer state? I don't know about all hardware. R600 doesn't have that origin switch, but the half-integers switch might have an effect. My suggestion about letting the driver modify the coordinate was to avoid having a dependency in the gallium interface between the shader setting, or worse, yet another cap about whether it exists. The only (small) issue is, if a driver does handle the origin switch and compensates for the effect on FragCoord, and the state tracker decides to not use that switch and just flips the viewport, it has to do its own transformation on FragCoord, we get to do 2 transformations. Finally, I think this is precisely what Marek was concerned; so to allow existing drivers to opt out from having to deal with this, we'll need a cap. Which is, I guess, why we have to add both versions depending on a CAP once again, i.e. for some drivers the origin switch in the rasterizer is used (nouveau at least; this should affect the edge rule; I think I looked for an independent switch way back and didn't find one) and for other drivers the viewport is flipped in combination with changing a separate edge rule rasterizer state. 
Maybe some drivers even support both (independent change of edge rule and origin) ... That said, I don't oppose any of this if it makes HW driver implementers' lives easier. But how seriously/quickly are you and other hardware driver maintainers actually aiming at implementing this? I don't wanna go through all that trouble if nobody will care. Well, there's not much code (in terms of lines) to write on the driver side, but code that uglifies things always takes a bit longer to become comfortable with ... Either way, I think that this patch series already is a good improvement over the ugly one-bit-fits-all-needs gl_rasterization_rules state, and should cause no regressions whatsoever. I'd like to tackle the entanglement of lower_left_origin with other bits of state in a follow-on gallium change after there is a clearer understanding/consensus if/how HW will implement this. Jose
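The uniform-based FragCoord adjustment mentioned earlier in this message can be pictured as a scale and bias on the y coordinate. This is a rough sketch with invented names, not the state tracker's actual code; it assumes the flip is a plain mirror around the framebuffer height.

```c
#include <assert.h>

/* Values a driver or state tracker would upload as uniforms. */
struct fragcoord_transform {
   float y_scale;
   float y_bias;
};

/* Build the transform: identity, or a mirror around the framebuffer height. */
static struct fragcoord_transform
fragcoord_transform_for(int flip_y, float fb_height)
{
   struct fragcoord_transform t;
   t.y_scale = flip_y ? -1.0f : 1.0f;
   t.y_bias = flip_y ? fb_height : 0.0f;
   return t;
}

/* What the shader would do with the uniforms: y' = y * scale + bias. */
static float apply_fragcoord_y(struct fragcoord_transform t, float y)
{
   return y * t.y_scale + t.y_bias;
}
```

Applying the flip twice gives back the original coordinate, which is the "2 transformations" situation described above: correct, but wasted work if both the driver and the state tracker compensate.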
Re: [Mesa-dev] [PATCH 2/2] gallium: Replaced gl_rasterization_rules with lower_left_origin and half_pixel_center.
On 21.04.2013 09:36, Jose Fonseca wrote: - Original Message - Do we really need the lower_left_origin state? I think I can't implement it for radeon and it's the kind of stuff that should be taken care of by the state tracker anyway. My understanding is that hardware had switches for this sort of thing. It's really hard to provide fully-conforming rasterization for opengl, dx9 dx10 without it. If your hardware allows to put a negative pitch on rendertargets, then that should also do it. I have a switch for the upside down thing, but maybe it could be framebuffer state instead of rasterizer state (since it's going to either not change (D3D) or only change with the famebuffer, and I have to set WINDOW_OFFSET_Y to 0 / fb height depending on the setting of Y direction (the latter won't work with MRTs, but that's the non-FBO case anyway)) ? R600 seems to have PA_SU_VTX_CNTL.PIX_CENTER but no state to change the window origin / direction ... and I'd rather not have to bother with it myself either. Also, note that this state and the pixel center one might (or maybe I should say will) affect the values of hardware's gl_FragCoord and hence PIPE_CAP_TGSI_FS_COORD_ORIGIN/PIXEL_CENTER*, i.e. the shader transformation of that input must be adjusted according to this state. I'd probably be OK with making this the driver's task. If you know what is the hardware's sub-pixel rasterization resolution, then adding a vertical bias equal to that amount, depending on this state, would give a very close approximation. (This would get the top/bottom edges right, at expense of small inaccuracies on non-horizontal edges) Isn't it sufficient to just set a viewport which is upside down, like we do now? I'm not aware of rasterization top-left rule being affected by the viewport flipping. Do both ./bin/triangle-rasterization -auto ./bin/triangle-rasterization -use_fbo -auto currently work for you? 
If drivers don't provide this state, the only way to work around it that I know of would be to store textures (or drawables?) upside down, and flip them on gl(Get)TexImage and friends. This would be like using a cannon to shoot a fly (a lot of work and a lot of overhead for a small correctness detail). I think the drivers are better equipped to handle this.

And you always have the option of merely ignoring this state. Top-left rule correct rasterization has, after all, been ignored to date, and nobody cared.

For the record, my motivation here is simple: llvmpipe gets the right behavior on GL drawables, and fails on GL FBOs and D3D 9/10. I want to get the right behavior on D3D 9/10 without causing regressions on GL drawables.

BTW, I'd imagine that if hardware rasterizer behavior is hardcoded to anything, it would be to D3D 9/10 behavior. That is, they would get GL FBO right, but drawables wrong.

Jose
Re: [Mesa-dev] [PATCH 2/2] gallium: Replaced gl_rasterization_rules with lower_left_origin and half_pixel_center.
On 21.04.2013 12:34, Dave Airlie wrote:
On Sun, Apr 21, 2013 at 5:36 PM, Jose Fonseca jfons...@vmware.com wrote:
- Original Message -
Do we really need the lower_left_origin state? I think I can't implement it for radeon and it's the kind of stuff that should be taken care of by the state tracker anyway.

My understanding is that hardware had switches for this sort of thing. It's really hard to provide fully-conforming rasterization for opengl, dx9 and dx10 without it. If your hardware allows putting a negative pitch on rendertargets, then that should also do it. If you know what the hardware's sub-pixel rasterization resolution is, then adding a vertical bias equal to that amount, depending on this state, would give a very close approximation. (This would get the top/bottom edges right, at the expense of small inaccuracies on non-horizontal edges.)

Isn't it sufficient to just set a viewport which is upside down, like we do now?

I'm not aware of the rasterization top-left rule being affected by the viewport flipping. Do both

./bin/triangle-rasterization -auto
./bin/triangle-rasterization -use_fbo -auto

currently work for you?

just FYI, on my evergreen, the first fails and the second passes. Maybe someone could try on fglrx; I'd be sorta willing to guess AMD hw just does DX10 :) and I think I've heard some complaints about our rendering offsets being wrong somewhere in the past on r600.

Same on nouveau. On NV blob it's the other way around, it fails for -use_fbo. So clearly, both can work.

Dave.
Re: [Mesa-dev] [PATCH 2/2] gallium: Replaced gl_rasterization_rules with lower_left_origin and half_pixel_center.
On 21.04.2013 13:18, Jose Fonseca wrote: - Original Message - On 21.04.2013 09:36, Jose Fonseca wrote: - Original Message - Do we really need the lower_left_origin state? I think I can't implement it for radeon and it's the kind of stuff that should be taken care of by the state tracker anyway. My understanding is that hardware had switches for this sort of thing. It's really hard to provide fully-conforming rasterization for opengl, dx9 dx10 without it. If your hardware allows to put a negative pitch on rendertargets, then that should also do it. I have a switch for the upside down thing, but maybe it could be framebuffer state instead of rasterizer state (since it's going to either not change (D3D) You're right, they should never change at higher frequency than per-framebuffer. But due to auxiliary modules like u_blit, u_blitter, u_gen_mipmap, this state will eventually change even for D3D state trackers. (This is however fixable, if there are performance implications switching this state, we could enhance these helper modules so that they switch it often. But I doubt this is a problem in practice) or only change with the famebuffer, and I have to set WINDOW_OFFSET_Y to 0 / fb height depending on the setting of Y direction (the latter won't work with MRTs, but that's the non-FBO case anyway)) ? Yes, it could go in theory, and truth is rasterizer state is full of bits that apply to other stages of the pipeline, but the practical hurdle of moving this to pipe_framebuffer is that pipe_framebuffer has no discrete state beyond surfaces so far (it is little more than a tuple of surfaces), so a lot of code would need to be updated to fill, propagate, and consider such state in pipe_framebuffer... I presume your concern is that rasterizer state changes frequently where as framebuffer state changes infrequently, so adding a dependency would cause framebuffer to be processed more often than desired. 
You can avoid that by keeping track of the lower_left_origin state independently at nvc0_rasterizer_state_bind:

diff --git a/src/gallium/drivers/nvc0/nvc0_state.c b/src/gallium/drivers/nvc0/nvc0_state.c
index cba076f..2a6fabf 100644
--- a/src/gallium/drivers/nvc0/nvc0_state.c
+++ b/src/gallium/drivers/nvc0/nvc0_state.c
@@ -324,6 +324,12 @@ nvc0_rasterizer_state_bind(struct pipe_context *pipe, void *hwcso)
     nvc0->rast = hwcso;
     nvc0->dirty |= NVC0_NEW_RASTERIZER;
+
+    if (nvc0->rast &&
+        nvc0->lower_left_origin != nvc0->rast->pipe.lower_left_origin) {
+       nvc0->lower_left_origin = nvc0->rast->pipe.lower_left_origin;
+       nvc0->dirty |= NVC0_NEW_FRAMEBUFFER;
+    }
 }
 static void

This means you won't need to validate the framebuffer any more often than strictly necessary. You could also have a new NVC0_NEW_FRAMEBUFFER_ORIGIN flag, just for tidiness.

R600 seems to have PA_SU_VTX_CNTL.PIX_CENTER but no state to change the window origin / direction ... and I'd rather not have to bother with it myself either.

I need to get this working flawlessly on llvmpipe, but I really don't see much need for hw driver developers to rush and get this handled properly. There are probably much bigger fish to fry. If people care enough to devise a state tracker workaround, we could have this on a PIPE_CAP. I'd be all for it. But even in that case, I think that nudging the coordinates slightly would probably get the most bang for the buck.

Also, note that this state and the pixel center one might (or maybe I should say will) affect the values of hardware's gl_FragCoord and hence PIPE_CAP_TGSI_FS_COORD_ORIGIN/PIXEL_CENTER*, i.e. the shader transformation of that input must be adjusted according to this state. I'd probably be OK with making this the driver's task.

The FS_COORD_PIXEL_CENTER spec in src/gallium/docs/source/tgsi.rst already stated that these are independent:

Note that this does not affect the set of fragments generated by rasterization, which is instead controlled by gl_rasterization_rules in the rasterizer.
And I'm not changing the semantics. That also seems to be the spirit of the GL_ARB_fragment_coord_conventions spec. I wouldn't object to adding to Gallium a dependency between these states if it helps hw driver developers, but I don't see how we could define it in such a way that it would work well for all cases. And I suspect that different hardware probably handles this slightly differently (i.e., what is orthogonal to some is not to others).

I think that drivers can just report all 4 CAPs as supported and do the adjustment in the shader themselves (no need for recompilation, just use uniforms, the st already does it like that), provided that the state tracker actually uses the rasterizer origin bit instead of changing the viewport and applies no transformation to the fragment coordinate whatsoever.

Jose
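To make the "just use uniforms" suggestion concrete, here is a minimal sketch of how a driver could fold the origin flip and the pixel-center shift into one scale/bias pair that is uploaded as constants instead of recompiling the shader. All names (toy_fragcoord_xform, toy_fragcoord_setup, toy_fragcoord_apply) are made up for illustration and are not Mesa or Gallium API; the math assumes the hardware position is first normalized to half-integer centers, then flipped, then shifted to the requested convention.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-draw constants; none of these names exist in Mesa. */
struct toy_fragcoord_xform {
   float x_bias;   /* added to the hardware x coordinate */
   float y_scale;  /* +1 keeps the y direction, -1 flips it */
   float y_bias;   /* added after scaling the hardware y coordinate */
};

/* Builds the constants from what the hardware produces ("hw_*") and what
 * the shader asked for ("want_*"). */
static struct toy_fragcoord_xform
toy_fragcoord_setup(bool hw_lower_left, bool want_lower_left,
                    bool hw_half_center, bool want_half_center,
                    unsigned fb_height)
{
   struct toy_fragcoord_xform t;
   const bool flip = hw_lower_left != want_lower_left;
   /* Shift that turns the hardware value into a half-integer center. */
   const float in_shift = hw_half_center ? 0.0f : 0.5f;
   /* Shift that turns a half-integer center into the requested center. */
   const float out_shift = want_half_center ? 0.0f : 0.5f;

   assert(fb_height > 0);

   t.x_bias = in_shift - out_shift;
   t.y_scale = flip ? -1.0f : 1.0f;
   /* y' = y_scale * (y + in_shift) + (flip ? height : 0) - out_shift */
   t.y_bias = t.y_scale * in_shift + (flip ? (float)fb_height : 0.0f) - out_shift;
   return t;
}

/* What the fragment shader would do with the uploaded constants. */
static void
toy_fragcoord_apply(const struct toy_fragcoord_xform *t,
                    const float hw_xy[2], float out_xy[2])
{
   out_xy[0] = hw_xy[0] + t->x_bias;
   out_xy[1] = hw_xy[1] * t->y_scale + t->y_bias;
}
```

With a 4-pixel-high framebuffer, upper-left hardware and a lower-left request, the top row's center 0.5 maps to 3.5 with half-integer centers and 0 maps to 3 with integer centers, which is why the two shifts have to be applied on either side of the flip rather than as one offset.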
Re: [Mesa-dev] [PATCH 2/5] gallium: document breakc and switch/case/default/endswitch
On 19.04.2013 09:26, Jose Fonseca wrote:
- Original Message -
From: Roland Scheidegger srol...@vmware.com

docs were missing, especially the opcode-from-hell switch however is anything but obvious.
---
 src/gallium/docs/source/tgsi.rst | 57 ++
 1 file changed, 51 insertions(+), 6 deletions(-)

diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index b7180f8..b46347e 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -861,7 +861,18 @@ This instruction replicates its result.
 .. opcode:: BRK - Break
-  TBD
+  Unconditionally moves the point of execution to the instruction after the
+  next endloop or endswitch. The instruction must appear within a loop/endloop
+  or switch/endswitch.
+
+
+.. opcode:: BREAKC - Break Conditional
+
+  Conditionally moves the point of execution to the instruction after the
+  next endloop or endswitch. The instruction must appear within a loop/endloop
+  or switch/endswitch.
+  Condition evaluates to true if src0.x != 0 where src0.x is interpreted
+  as an integer register.

This is fine. But I do wonder if hardware can really benefit from UIF foo; BREAK; ENDIF vs BREAKC foo, or if this is just syntactic sugar that merely burdens developers.

IF; BREAK; ENDIF usually gets optimized into a BREAKC anyway, so, it's just easier on the compiler and people who write shaders in TGSI, and drivers without optimization.
Re: [Mesa-dev] [PATCH 02/14] st/mesa: add a simple path to BufferData if it only discards buffer contents
On 19.04.2013 14:08, Marek Olšák wrote:
That's not true. PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE does not copy data in radeon drivers. It really does what st/mesa does - it creates a new buffer and throws away the old one, which doesn't take any GPU bandwidth. Doing that at a lower level should be faster in theory (+

Moreover, for VRAM buffers it also saves me the reallocation because they always use staging transfers. The only downside is if you do have to reallocate you have to check all binding points and set them dirty if they use the resource in question. I just hope that for the other multithreaded contexts (which you can't really interrupt to set dirty bits) we can assume that the user takes care of any issues ...

drivers have multiple options how to implement the discarding). Only PIPE_TRANSFER_DISCARD_RANGE copies data in radeon drivers, which is not used here.

Marek

On Wed, Apr 17, 2013 at 8:15 PM, Eric Anholt e...@anholt.net wrote:
Marek Olšák mar...@gmail.com writes:
The next patch makes sure _NEW_BUFFER_OBJECT is not needlessly set for this code.

This seems like a pretty dubious optimization -- on UMA systems you're increasing the memory bandwidth usage in the data case, and only trying to eliminate update_array_object_max_element, which also happens with _NEW_PROGRAM (I bet it's true every time that a _NEW_BUFFER_OBJECT was flagged, anyway). In short, for the Mesa core change, I'd like to see some actual performance justification on this one.
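For readers who have not seen the "throw away the old storage" trick, a rough sketch of what a discard-whole-resource path and the binding-point bookkeeping mentioned above might look like follows. This is a toy model, not radeon or nouveau code: toy_buffer, toy_context and toy_discard_whole_resource are invented names, malloc/free stand in for GPU memory management, and a real driver would defer releasing the old storage until the GPU is done with it.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAX_BINDINGS 8

/* Hypothetical stand-ins for a buffer resource and a context. */
struct toy_buffer {
   void *storage;
   size_t size;
};

struct toy_context {
   struct toy_buffer *bound[TOY_MAX_BINDINGS];
   unsigned dirty_bindings;   /* bit i set: slot i must be re-emitted */
};

/* Replaces the backing storage instead of copying or waiting, then flags
 * every binding point that still refers to the buffer.
 * Returns 0 on success, -1 if no new storage could be allocated (in which
 * case the old storage is left untouched). */
static int
toy_discard_whole_resource(struct toy_context *ctx, struct toy_buffer *buf)
{
   void *fresh;
   unsigned i;

   assert(buf->size > 0);

   fresh = malloc(buf->size);
   if (!fresh)
      return -1;

   /* Toy model only: real hardware may still be reading the old storage. */
   free(buf->storage);
   buf->storage = fresh;

   for (i = 0; i < TOY_MAX_BINDINGS; i++) {
      if (ctx->bound[i] == buf)
         ctx->dirty_bindings |= 1u << i;
   }
   return 0;
}
```

The loop at the end is the "check all binding points" cost; it only has to run when the storage was actually replaced.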
[Mesa-dev] [PATCH] st/mesa: optionally apply texture swizzle to border color v2
From: Christoph Bumiller christoph.bumil...@speed.at This is the only sane solution for nv50 and nvc0 (really, trust me), but since on other hardware the border colour is tightly coupled with texture state they'd have to undo the swizzle, so I've added a cap. The dependency of update_sampler on the texture updates was introduced to avoid doing the apply_depthmode to the swizzle twice. v2: Moved swizzling helper to u_format.c, extended the CAP to provide more accurate information. --- src/gallium/auxiliary/util/u_format.c| 34 ++ src/gallium/auxiliary/util/u_format.h| 12 src/gallium/docs/source/cso/sampler.rst |6 ++- src/gallium/docs/source/screen.rst | 11 +++ src/gallium/drivers/freedreno/freedreno_screen.c |1 + src/gallium/drivers/i915/i915_screen.c |1 + src/gallium/drivers/llvmpipe/lp_screen.c |2 + src/gallium/drivers/nv30/nv30_screen.c |1 + src/gallium/drivers/nv50/nv50_screen.c |2 + src/gallium/drivers/nvc0/nvc0_screen.c |2 + src/gallium/drivers/r300/r300_screen.c |1 + src/gallium/drivers/r600/r600_pipe.c |3 ++ src/gallium/drivers/radeonsi/radeonsi_pipe.c |1 + src/gallium/drivers/softpipe/sp_screen.c |2 + src/gallium/drivers/svga/svga_screen.c |2 + src/gallium/include/pipe/p_defines.h |7 - src/mesa/state_tracker/st_atom.c |2 +- src/mesa/state_tracker/st_atom_sampler.c | 27 +++-- src/mesa/state_tracker/st_context.c |3 ++ src/mesa/state_tracker/st_context.h |1 + 20 files changed, 114 insertions(+), 7 deletions(-) diff --git a/src/gallium/auxiliary/util/u_format.c b/src/gallium/auxiliary/util/u_format.c index 1845637..9bdc2ea 100644 --- a/src/gallium/auxiliary/util/u_format.c +++ b/src/gallium/auxiliary/util/u_format.c @@ -632,6 +632,40 @@ void util_format_compose_swizzles(const unsigned char swz1[4], } } +void util_format_apply_color_swizzle(union pipe_color_union *dst, + const union pipe_color_union *src, + const unsigned char swz[4], + const boolean is_integer) +{ + unsigned c; + + if (is_integer) { + for (c = 0; c 4; ++c) { + switch (swz[c]) { + case 
PIPE_SWIZZLE_RED: dst-ui[c] = src-ui[0]; break; + case PIPE_SWIZZLE_GREEN: dst-ui[c] = src-ui[1]; break; + case PIPE_SWIZZLE_BLUE: dst-ui[c] = src-ui[2]; break; + case PIPE_SWIZZLE_ALPHA: dst-ui[c] = src-ui[3]; break; + default: +dst-ui[c] = (swz[c] == PIPE_SWIZZLE_ONE) ? 1 : 0; +break; + } + } + } else { + for (c = 0; c 4; ++c) { + switch (swz[c]) { + case PIPE_SWIZZLE_RED: dst-f[c] = src-f[0]; break; + case PIPE_SWIZZLE_GREEN: dst-f[c] = src-f[1]; break; + case PIPE_SWIZZLE_BLUE: dst-f[c] = src-f[2]; break; + case PIPE_SWIZZLE_ALPHA: dst-f[c] = src-f[3]; break; + default: +dst-f[c] = (swz[c] == PIPE_SWIZZLE_ONE) ? 1.0f : 0.0f; +break; + } + } + } +} + void util_format_swizzle_4f(float *dst, const float *src, const unsigned char swz[4]) { diff --git a/src/gallium/auxiliary/util/u_format.h b/src/gallium/auxiliary/util/u_format.h index ed942fb..e4b9c36 100644 --- a/src/gallium/auxiliary/util/u_format.h +++ b/src/gallium/auxiliary/util/u_format.h @@ -33,6 +33,9 @@ #include pipe/p_format.h #include util/u_debug.h +union pipe_color_union; + + #ifdef __cplusplus extern C { #endif @@ -1117,6 +1120,15 @@ void util_format_compose_swizzles(const unsigned char swz1[4], const unsigned char swz2[4], unsigned char dst[4]); +/* Apply the swizzle provided in \param swz (which is one of PIPE_SWIZZLE_x) + * to \param src and store the result in \param dst. + * \param is_integer determines the value written for PIPE_SWIZZLE_ONE. 
+ */ +void util_format_apply_color_swizzle(union pipe_color_union *dst, + const union pipe_color_union *src, + const unsigned char swz[4], + const boolean is_integer); + void util_format_swizzle_4f(float *dst, const float *src, const unsigned char swz[4]); diff --git a/src/gallium/docs/source/cso/sampler.rst b/src/gallium/docs/source/cso/sampler.rst index 26ffc18..9959793 100644 --- a/src/gallium/docs/source/cso/sampler.rst +++ b/src/gallium/docs/source/cso/sampler.rst @@ -101,7 +101,9 @@ max_lod border_color Color union used for texel coordinates that are outside the [0,width-1], [0, height-1] or [0, depth-1] ranges. Interpreted according
[Mesa-dev] [PATCH] nv50/ir: handle TGSI_OPCODE_IF(float) properly
You can merge this with the original UIF patch if you want.
---
 .../drivers/nv50/codegen/nv50_ir_from_tgsi.cpp     |    7 ++-
 .../drivers/nv50/codegen/nv50_ir_lowering_nv50.cpp |    2 +-
 .../drivers/nvc0/codegen/nv50_ir_lowering_nvc0.cpp |    2 +-
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
index 054c75e..d8abccd 100644
--- a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
@@ -386,6 +386,7 @@ static nv50_ir::TexTarget translateTexture(uint tex)
 nv50_ir::DataType Instruction::inferSrcType() const
 {
    switch (getOpcode()) {
+   case TGSI_OPCODE_UIF:
    case TGSI_OPCODE_AND:
    case TGSI_OPCODE_OR:
    case TGSI_OPCODE_XOR:
@@ -2431,10 +2432,6 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
       mkOp1(op, TYPE_U32, NULL, src0)->fixed = 1;
       break;
    case TGSI_OPCODE_IF:
-      /* XXX: fall-through into UIF, but this might lead to
-       * incorrect behavior on state trackers and auxiliary
-       * modules that emit float bool IFs regardless of
-       * native integer support */
    case TGSI_OPCODE_UIF:
    {
       BasicBlock *ifBB = new BasicBlock(func);
@@ -2443,7 +2440,7 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
       condBBs.push(bb);
       joinBBs.push(bb);
-      mkFlow(OP_BRA, NULL, CC_NOT_P, fetchSrc(0, 0));
+      mkFlow(OP_BRA, NULL, CC_NOT_P, fetchSrc(0, 0))->setType(srcTy);
       setPosition(ifBB, true);
    }
diff --git a/src/gallium/drivers/nv50/codegen/nv50_ir_lowering_nv50.cpp b/src/gallium/drivers/nv50/codegen/nv50_ir_lowering_nv50.cpp
index 20f76f8..03086e3 100644
--- a/src/gallium/drivers/nv50/codegen/nv50_ir_lowering_nv50.cpp
+++ b/src/gallium/drivers/nv50/codegen/nv50_ir_lowering_nv50.cpp
@@ -1011,7 +1011,7 @@ NV50LoweringPreSSA::checkPredicate(Instruction *insn)
       return;
    cdst = bld.getSSA(1, FILE_FLAGS);
-   bld.mkCmp(OP_SET, CC_NEU, TYPE_U32, cdst, bld.loadImm(NULL, 0), pred);
+   bld.mkCmp(OP_SET, CC_NEU, insn->dType, cdst, bld.loadImm(NULL, 0), pred);
    insn->setPredicate(insn->cc, cdst);
 }
diff --git a/src/gallium/drivers/nvc0/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nvc0/codegen/nv50_ir_lowering_nvc0.cpp
index 4d1d372..7676185 100644
--- a/src/gallium/drivers/nvc0/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nvc0/codegen/nv50_ir_lowering_nvc0.cpp
@@ -1490,7 +1490,7 @@ NVC0LoweringPass::checkPredicate(Instruction *insn)
    // CAUTION: don't use pdst->getInsn, the definition might not be unique,
    // delay turning PSET(FSET(x,y),0) into PSET(x,y) to a later pass
-   bld.mkCmp(OP_SET, CC_NEU, TYPE_U32, pdst, bld.mkImm(0), pred);
+   bld.mkCmp(OP_SET, CC_NEU, insn->dType, pdst, bld.mkImm(0), pred);
    insn->setPredicate(insn->cc, pdst);
 }
--
1.7.3.4
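As I read the patch, the point is that the "not equal to zero" test on the branch condition now uses the instruction's own type instead of always TYPE_U32. A small host-side sketch of why that matters: the same 32 bits can be "true" as an integer and "false" as a float. The helper names (toy_cond_as_float, toy_cond_as_uint) are invented for illustration and have nothing to do with the nv50 IR; the sketch assumes an IEEE 754 single-precision float on the host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Condition as a float IF would see it: the bits are a float, and the
 * branch is taken when that float compares unequal to 0.0. */
static bool
toy_cond_as_float(uint32_t bits)
{
   float f;
   assert(sizeof(f) == sizeof(bits));
   memcpy(&f, &bits, sizeof(f));
   return f != 0.0f;
}

/* Condition as an integer UIF would see it: any set bit means true. */
static bool
toy_cond_as_uint(uint32_t bits)
{
   return bits != 0;
}
```

The interesting input is 0x80000000: as a float it is -0.0, which compares equal to zero, while as an integer it is non-zero. For 1.0f (0x3f800000), for zero, and for the all-ones integer boolean (a NaN when read as a float, which compares unequal to everything) both interpretations agree.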
Re: [Mesa-dev] [PATCH] st/mesa: optionally apply texture swizzle to border color
On 14.04.2013 13:44, Jose Fonseca wrote:
- Original Message -
From: Christoph Bumiller christoph.bumil...@speed.at

This is the only sane solution for nv50 and nvc0 (really, trust me), but since on other hardware the border colour is tightly coupled with texture state they'd have to undo the swizzle, so I've added a cap. The name of the cap could be changed to be more descriptive, like PIPE_CAP_TEXTURE_SWIZZLE_AFFECTS_BORDER_COLOR.

Yes, please.

The dependency of update_sampler on the texture updates was introduced to avoid doing the apply_depthmode to the swizzle twice. More detailed explanation of driver situation: No, really, don't suggest doing this in the driver. The driver has elegantly separated texture view and sampler states (which are each a structure in a table in VRAM and should not be updated to avoid performance loss), and tables are bound to the independent (!)

I wonder if this is modeled after D3D10, where sampler state is independent from resource view state. Though as far as I know, D3D10's interpretation of texture border color does not depend on the swizzle...

texture and sampler slots in shaders which must be separately indexable indirectly). So, if I was to do this in the driver, I'd have to add separate sampler state object instances for each texture view with appropriately swizzled border color, and there's only 16 slots, so I'd be limited to 4 texture units. Not to mention the sheer insanity, ugliness and emotional pain incurred when writing that code when it COULD be so easy and simple in the state tracker where you know that textures and samplers are tightly coupled, while in gallium I cannot assume that to be the case.

You wouldn't really need to create all state combinations: if you know that textures and samplers are tightly coupled, then caching the actually used combinations will get you exactly the same behavior, without losing performance or generality. But granted, this would require more effort.
The emphasis being on IF I knew (that they're tightly coupled). If I did, I could switch to linked mode where the card automatically uses the view index as sampler index, ignoring the actual sampler index, and validate them together. However, that only applies to 3D, not to COMPUTE (which means that GL compute shaders will still have the problem), and I'd have to support both variants for state trackers that do not allow the coupling, and we need a way for the state tracker to actually tell us what it wants. All that makes it even quirkier.

Also please spare a thought for other state trackers -- and I'm not even talking about a potential D3D10 state tracker for which your driver would be unusable --, even inside Mesa: it seems like src/gallium/state_trackers/vega uses both texture border and swizzle, probably vl state tracker too, so your driver will be busted on those state trackers. These need to be

It already is busted. It's also busted on r600 where making border color + swizzle work properly isn't even POSSIBLE (according to the radeon guys). Maybe not for vega, it doesn't use a permutational swizzle, it just sets components to PIPE_SWIZZLE_ONE, and incidentally the ZERO/ONE swizzles do affect the border color.

As far as I can tell, it looks something like this (if you're interested; the exact behaviour seems not supposed to be made use of):
===
In the format description (including swizzle), each color component of RGBA (as seen by the shader) gets mapped to a memory component {C0,C1,C2,C3} or {ZERO,ONE_INT,ONE_FLOAT}. When a memory (!) component (Cx) is first encountered when going through RGBA, it is assigned the SAMPLER_BORDER_COLOR component value for that component, and if the memory component is encountered again (because of swizzle), that same value will be used.

So, assuming memory format RGBA and the swizzle 1RBG:
R = ONE
G = C0
B = C2
A = C1
the border colour will be SAMPLER_BORDER_COLOR.1GBA.
The resulting border colour with swizzle applied to the sampler would be (lowercase being user values):
R=1 G=r B=b A=g
resulting in 1rbg, which works out.
===

updated -- maybe the burden of considering this state can be lifted onto some helper functions -- if not, these state trackers should at least be updated to abort/warn when the cap is set. But I'm not really objecting -- as texture border seems fundamentally quirky state.

But before proceeding with this I'd like us to consider another texture border quirk while we are at it. The other quirk is the integer vs float texture border colors. Roland can probably talk a bit more about it as he was the one who came across it. In a few words, the interpretation of texture border color union depends on the format in the sampler view state (whether it's a pure integer format or not). So, I wonder how integer vs float texture border colors will fit in your driver's elegantly separated texture view and sampler states, or any other
Re: [Mesa-dev] [PATCH] st/mesa: optionally apply texture swizzle to border color
On 14.04.2013 13:50, Jose Fonseca wrote:
- Original Message -
Not to mention the sheer insanity, ugliness and emotional pain incurred when writing that code when it COULD be so easy and simple in the state tracker where you know that textures and samplers are tightly coupled, while in gallium I cannot assume that to be the case.

Also, will this still be true when Mesa state tracker implements GL_ARB_texture_view ?

I dare say yes. GL texture views do NOT decouple textures from samplers, they just decouple gallium sampler views from OpenGL textures. There may be an issue if we wanted (and we don't) to use a single sampler for all the OpenGL texture views of a single texture. However, that ONLY works if the shaders are changed as well, and since the texture/sampler combinations are not predictable, this is a very bad idea as it would mean frequent shader recompilations.

As to whether there will ever be an OpenGL extension that adds separation of views and samplers to shaders ... I'm hoping for NV to add some clause to the spec to solve the border colour trouble, like forbidding texture swizzle in such cases (and I'm sure AMD would be inclined to agree).

Jose
Re: [Mesa-dev] [PATCH] st/mesa: optionally apply texture swizzle to border color
On 14.04.2013 14:33, Christoph Bumiller wrote:
===
In the format description (including swizzle), each color component of RGBA (as seen by the shader) gets mapped to a memory component {C0,C1,C2,C3} or {ZERO,ONE_INT,ONE_FLOAT}. When a memory (!) component (Cx) is first encountered when going through RGBA, it is assigned the SAMPLER_BORDER_COLOR component value for that component, and if the memory component is encountered again (because of swizzle), that same value will be used.

So, assuming memory format RGBA and the swizzle 1RBG:
R = ONE
G = C0
B = C2
A = C1
the border colour will be SAMPLER_BORDER_COLOR.1GBA.

The resulting border colour with swizzle applied to the sampler would be (lowercase being user values):
R=1 G=r B=b A=g
resulting in 1rbg, which works out.
===

Sorry, that was a bad example, I feel the need to give a better one:

When a memory component (Cx) is first encountered when going through RGBA, it is assigned the SAMPLER_BORDER_COLOR.R/G/B/A component value, and if the memory component is encountered again (because of swizzle), that same value will be used.

RGBA8 with swizzle G1GB:
R=C1 G=ONE B=C1 A=C2
gets BORDER_COLOR.R1RA.

Maybe that's the same thing that happens on r600 (I just recall something about undoing the swizzle in a weird way)?
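For anyone who wants to play with the rule described above, here is a small model of it as I understand the description: walk R, G, B, A in order, and the first output position that references a given memory component decides which border colour component that memory component receives. The names (toy_swz, toy_border_source) are invented for illustration, and this only models the text above, not verified hardware behaviour.

```c
#include <assert.h>

/* Where an RGBA output component comes from: a memory component or a
 * constant. Hypothetical names, not PIPE_SWIZZLE_* values. */
enum toy_swz { TOY_C0, TOY_C1, TOY_C2, TOY_C3, TOY_ZERO, TOY_ONE };

/* For each output component, writes which SAMPLER_BORDER_COLOR component
 * ('R', 'G', 'B', 'A') or constant ('0', '1') ends up there.
 * 'out' receives a NUL-terminated 4-character string. */
static void
toy_border_source(const enum toy_swz swz[4], char out[5])
{
   static const char names[4] = { 'R', 'G', 'B', 'A' };
   /* Output position at which each memory component was first seen. */
   int first_seen[4] = { -1, -1, -1, -1 };
   int c;

   for (c = 0; c < 4; ++c) {
      assert(swz[c] <= TOY_ONE);
      if (swz[c] == TOY_ZERO) {
         out[c] = '0';
      } else if (swz[c] == TOY_ONE) {
         out[c] = '1';
      } else {
         if (first_seen[swz[c]] < 0)
            first_seen[swz[c]] = c;
         out[c] = names[first_seen[swz[c]]];
      }
   }
   out[4] = '\0';
}
```

Feeding it the two examples from the mail, { ONE, C0, C2, C1 } gives "1GBA" and { C1, ONE, C1, C2 } gives "R1RA", matching the SAMPLER_BORDER_COLOR.1GBA and BORDER_COLOR.R1RA results quoted above.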
Re: [Mesa-dev] [PATCH resend] mesa: Add core support for the GL_AMD_performance_monitor extension.
On 12.04.2013 21:14, Kenneth Graunke wrote:
This provides an interface for applications (and OpenGL-based tools) to access GPU performance counters. Since the exact performance counters available vary between vendors and hardware generations, the extension provides an API the application can use to get the names, types, and minimum/maximum values of all available counters. Counters are also organized into groups.

Applications create performance monitor objects, select the counters they want to track, and Begin/End monitoring, much like OpenGL's query API. Multiple monitors can be in flight simultaneously.

We chose not to implement the similar GL_INTEL_performance_queries extension because Intel has not bothered to publish a specification in the OpenGL registry.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mapi/glapi/gen/AMD_performance_monitor.xml |  87
 src/mapi/glapi/gen/Makefile.am                 |   1 +
 src/mapi/glapi/gen/gl_API.xml                  |   2 +
 src/mapi/glapi/gen/gl_genexec.py               |   1 +
 src/mesa/SConscript                            |   1 +
 src/mesa/main/context.c                        |   2 +
 src/mesa/main/dd.h                             |  22 +
 src/mesa/main/extensions.c                     |   1 +
 src/mesa/main/mtypes.h                         |  84
 src/mesa/main/performance_monitor.c            | 563 +
 src/mesa/main/performance_monitor.h            |  85
 src/mesa/sources.mak                           |   1 +
 12 files changed, 850 insertions(+)
 create mode 100644 src/mapi/glapi/gen/AMD_performance_monitor.xml
 create mode 100644 src/mesa/main/performance_monitor.c
 create mode 100644 src/mesa/main/performance_monitor.h

+/**
+ * A performance monitor as described in AMD_performance_monitor.
+ */
+struct gl_perf_monitor_object
+{
+   GLboolean Active;
+
+   /* Actually BITSET_WORD but we can't #include that here. */
+   GLuint *ActiveCounters;
+};
+

Started to implement this for mesa/st, got a question about ActiveCounters: Does this bitset refer to the counter IDs or the Counters array index ? Do the IDs have to be consecutive ? Do they have to correspond to the array index ?
+
+void GLAPIENTRY
+_mesa_SelectPerfMonitorCountersAMD(GLuint monitor, GLboolean enable,
+                                   GLuint group, GLint numCounters,
+                                   GLuint *counterList)
+{
...
+   if (enable) {
+      /* Enable the counters */
+      for (i = 0; i < numCounters; i++) {
+         BITSET_SET(m->ActiveCounters, counterList[i]);
+      }
+   } else {
+      /* Disable the counters */
+      for (i = 0; i < numCounters; i++) {
+         BITSET_CLEAR(m->ActiveCounters, counterList[i]);
+      }
+   }
+}

counterList is an ID, so this implies ActiveCounters refers to IDs. You also do:

m->ActiveCounters = calloc(ctx->PerfMonitor.NumCounters, sizeof(BITSET_WORD));

So, this implies it refers to the Counters array of size NumCounters (unless the overallocation by 8 * sizeof(BITSET_WORD) bits has some purpose that escapes me). Hence, we cannot freely select IDs, can we ?

I had different generously spaced ranges of gallium query IDs reserved for different counter domains (since I haven't added all possible counters, I don't even know all of them, needs REing), so I guess I have to remap them in the state tracker ...

Anyway, I think this should be mentioned in a comment [that is easy to find].

Thanks,
Christoph
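To spell out the sizing remark: calloc(NumCounters, sizeof(BITSET_WORD)) reserves one whole word per counter, whereas a bitset indexed 0..NumCounters-1 only needs the bit count rounded up to whole words. Below is a minimal sketch of that sizing together with the kind of ID-to-index remap a state tracker with sparse query IDs would need. Everything here (toy_word, toy_words_for_bits, toy_counter_index and friends) is hypothetical and stands in for Mesa's bitset helpers rather than reproducing them.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef unsigned int toy_word;
#define TOY_WORDBITS (sizeof(toy_word) * CHAR_BIT)

/* Number of words needed to hold 'nbits' bits, rounded up. */
static size_t
toy_words_for_bits(size_t nbits)
{
   return (nbits + TOY_WORDBITS - 1) / TOY_WORDBITS;
}

/* Zero-initialized bitset with room for bits 0..nbits-1. */
static toy_word *
toy_bitset_alloc(size_t nbits)
{
   assert(nbits > 0);
   return calloc(toy_words_for_bits(nbits), sizeof(toy_word));
}

static void
toy_bitset_set(toy_word *set, size_t bit)
{
   set[bit / TOY_WORDBITS] |= (toy_word)1 << (bit % TOY_WORDBITS);
}

static int
toy_bitset_test(const toy_word *set, size_t bit)
{
   return (set[bit / TOY_WORDBITS] >> (bit % TOY_WORDBITS)) & 1u;
}

/* Maps a sparse driver-side counter ID to its dense index in the counter
 * array, or returns -1 if the ID is unknown. A linear search is enough for
 * a sketch; a real table would probably be sorted or hashed. */
static int
toy_counter_index(const unsigned *ids, size_t num_ids, unsigned id)
{
   size_t i;
   for (i = 0; i < num_ids; i++) {
      if (ids[i] == id)
         return (int)i;
   }
   return -1;
}
```

With this split, the bitset is always indexed by the dense array index, and the sparse IDs only ever pass through the remap, which is the constraint the question above is getting at.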
[Mesa-dev] [PATCH] st/mesa: optionally apply texture swizzle to border color
From: Christoph Bumiller christoph.bumil...@speed.at This is the only sane solution for nv50 and nvc0 (really, trust me), but since on other hardware the border colour is tightly coupled with texture state they'd have to undo the swizzle, so I've added a cap. The name of the cap could be changed to be more descriptive, like PIPE_CAP_TEXTURE_SWIZZLE_AFFECTS_BORDER_COLOR. The dependency of update_sampler on the texture updates was introduced to avoid doing the apply_depthmode to the swizzle twice. More detailed explanation of driver situation: No, really, don't suggest doing this in the driver. The driver has elegantly separated texture view and sampler states (which are each a structure in a table in VRAM and should not be updated to avoid performance loss), and table are bound to the independent (!) texture and sampler slots in shaders which must be separately indexable indirectly). So, if I was to do this in the driver, I'd have to add separate sampler state object instances for each texture view with appropriately swizzled border color, and there's only 16 slots, so I'd be limited to 4 texture units. Not to mention the sheer insanity, ugliness and emotional pain incurred when writing that code when it COULD be so easy and simple in the state tracker where you know that textures and samplers are tightly coupled, while in gallium I cannot assume that to be the case. 
---
 src/gallium/docs/source/cso/sampler.rst          |    7 ++-
 src/gallium/docs/source/screen.rst               |    2 +
 src/gallium/drivers/freedreno/freedreno_screen.c |    1 +
 src/gallium/drivers/i915/i915_screen.c           |    1 +
 src/gallium/drivers/llvmpipe/lp_screen.c         |    2 +
 src/gallium/drivers/nv30/nv30_screen.c           |    1 +
 src/gallium/drivers/nv50/nv50_screen.c           |    1 +
 src/gallium/drivers/nvc0/nvc0_screen.c           |    1 +
 src/gallium/drivers/r300/r300_screen.c           |    1 +
 src/gallium/drivers/r600/r600_pipe.c             |    1 +
 src/gallium/drivers/radeonsi/radeonsi_pipe.c     |    1 +
 src/gallium/drivers/softpipe/sp_screen.c         |    2 +
 src/gallium/drivers/svga/svga_screen.c           |    2 +
 src/gallium/include/pipe/p_defines.h             |    3 +-
 src/mesa/state_tracker/st_atom.c                 |    2 +-
 src/mesa/state_tracker/st_atom_sampler.c         |   65 +-
 src/mesa/state_tracker/st_context.c              |    2 +
 src/mesa/state_tracker/st_context.h              |    1 +
 18 files changed, 89 insertions(+), 7 deletions(-)

diff --git a/src/gallium/docs/source/cso/sampler.rst b/src/gallium/docs/source/cso/sampler.rst
index 26ffc18..1911cea 100644
--- a/src/gallium/docs/source/cso/sampler.rst
+++ b/src/gallium/docs/source/cso/sampler.rst
@@ -101,7 +101,10 @@ max_lod
 border_color
     Color union used for texel coordinates that are outside the [0,width-1],
     [0, height-1] or [0, depth-1] ranges. Interpreted according to sampler
-    view format.
+    view format, unless the driver reports
+    PIPE_CAP_BORDER_COLOR_QUIRK, in which case this value is substituted for
+    the texture color exactly as specified, the sampler view format and swizzle
+    have no effect on it.
 max_anisotropy
     Maximum anistropy ratio to use when sampling from textures. For example,
     if max_anistropy=4, a region of up to 1 by 4 texels will be sampled.
@@ -111,4 +114,4 @@ max_anisotropy
 seamless_cube_map
     If set, the bilinear filter of a cube map may take samples from adjacent
     cube map faces when sampled near a texture border to produce a seamless
-    look.
\ No newline at end of file
+    look.
diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
index 4b01d77..495398b 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -151,6 +151,8 @@ The integer capabilities:
   dedicated memory should return 1 and all software rasterizers should return 0.
 * ``PIPE_CAP_QUERY_PIPELINE_STATISTICS``: Whether PIPE_QUERY_PIPELINE_STATISTICS
   is supported.
+* ``PIPE_CAP_BORDER_COLOR_QUIRK``: Whether the sampler view's format and swizzle
+  affect the border color.


 .. _pipe_capf:
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c
index 283d07f..5b60401 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -200,6 +200,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
 	case PIPE_CAP_USER_VERTEX_BUFFERS:
 	case PIPE_CAP_USER_INDEX_BUFFERS:
 	case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
+	case PIPE_CAP_BORDER_COLOR_QUIRK:
 		return 0;

 	/* Stream output. */
diff --git a/src/gallium/drivers/i915/i915_screen.c b/src/gallium/drivers/i915/i915_screen.c
index 54b2154..4c3d52f 100644
--- a/src/gallium/drivers/i915/i915_screen.c
+++ b/src/gallium/drivers/i915/i915_screen.c
@@ -213,6 +213,7
Re: [Mesa-dev] [PATCH resend] mesa: Add core support for the GL_AMD_performance_monitor extension.
On 12.04.2013 21:14, Kenneth Graunke wrote:
> This provides an interface for applications (and OpenGL-based tools) to
> access GPU performance counters. Since the exact performance counters
> available vary between vendors and hardware generations, the extension
> provides an API the application can use to get the names, types, and
> minimum/maximum values of all available counters. Counters are also
> organized into groups.

> +   /**
> +    * \name Performance monitors
> +    */
> +   /*@{*/
> +   struct gl_perf_monitor_object * (*NewPerfMonitor)(void);
> +   void (*DeletePerfMonitor)(struct gl_perf_monitor_object *m);

Could we get a gl_context for these as well? It might be useful, since if we want to allocate or (more likely) destroy gallium objects we'll need a context. NewQueryObject has a context argument as well.

I could save the context from the Begin/End calls, but if there's no reason not to pass a context to New/Delete, having it as an argument would be preferable.

Regards,
Christoph
Re: [Mesa-dev] [PATCH 2/2] radeonsi: add support for compressed texture
On 08.04.2013 12:03, Marek Olšák wrote:
> On Mon, Apr 8, 2013 at 11:29 AM, Michel Dänzer <mic...@daenzer.net> wrote:
>> On Fri, 2013-04-05 at 17:36 -0400, j.gli...@gmail.com wrote:
>>> From: Jerome Glisse <jgli...@redhat.com>
>>>
>>> Most test pass, issue are with border color and swizzle.
>>
>> FWIW, those issues are there with non-compressed formats as well. I'm
>> afraid we might need to change the hardware border colour depending on
>> the swizzle.
>
> I don't think so. The issue with the swizzled border color seems to be a
> bad hardware design decision present since r600 rather than a hardware
> bug. I tried fixing it for older chipsets with no success. I doubt the hw
> designers fixed this for SI.
>
> The problem is the hardware tries to guess what the border color swizzle
> is from the combined pipe_format+sampler view swizzle combination. You
> need 2 texture swizzle states in the texture unit for the border color to
> be swizzled correctly, because texels must be swizzled by the pipe_format
> swizzle and sampler view swizzle, but the border color must be swizzled
> by the sampler view only. The main problem is that the hardware
> internally tries to undo the pipe_format swizzle in a way that just
> doesn't work.
>
> I don't remember the exact swizzles being used by hardware, but I got
> crazy cases like if I set texture swizzle to ywzx, the border color will
> be ywyy. There is no way to access those zx components of the border
> color for that specific swizzling. For some cases, the hardware succeeds
> in guessing what the border color should be, e.g. if I set texture
> swizzle to .zyxw, the returned border color will be .xyzw (and that would
> be correct if the swizzle came from pipe_format, and incorrect if the
> swizzle came from sampler view).
>
> It was easy with r300, because I could just undo pipe_format swizzling
> before passing the border color to the hardware.

Ah yes, border colour swizzle, it's a problem on NV, too. Because the border colour isn't getting swizzled at all [as far as we know].

The main issue is the separation of samplers and textures in gallium. If that wasn't the case, samplers and textures would be coupled and the sampler state could be set according to texture view state (if it's just OpenGL; and if it's just D3D there's no swizzle).

So, I just leave it broken. I can't destroy the elegant separation because of such an unimportant detail, that hurts too much.

(Also, if someone was to use multiple samplers and views in gallium and index them dynamically, I'd have to set up all combinations of textures and samplers, which is simply ridiculous. And now I'm going to look for some secret sampler setup bit that says "swizzle according to texture view state". Maybe, looking into the future of OpenGL, someone's been wise enough to add that. But then, I'd have the same problem as you. An intensity texture simply doesn't have separate values for R,G,B,A.)

Possible solution:
Maybe the state tracker could just do the swizzling, because it knows that samplers and views are coupled, and it knows the swizzle?

> Marek
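The "possible solution" above, swizzling the border color in the state tracker before it reaches the driver, might look roughly like the sketch below. It is a minimal illustration under stated assumptions: the `enum swz` selectors and `swizzle_border_color` are hypothetical names made up for this note, not gallium's PIPE_SWIZZLE_* values or any existing state tracker function, and only float colors are handled.

```c
#include <assert.h>

/* Hypothetical swizzle selectors, for illustration only. They mirror the
 * usual X/Y/Z/W/0/1 choices but are not gallium's PIPE_SWIZZLE_* values. */
enum swz { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_ZERO, SWZ_ONE };

/* Apply a sampler view swizzle to a float border color:
 * dst[i] takes the source component selected by swizzle[i],
 * or the constant 0.0 / 1.0 for the ZERO / ONE selectors. */
static void
swizzle_border_color(const float src[4], const enum swz swizzle[4],
                     float dst[4])
{
   for (int i = 0; i < 4; i++) {
      switch (swizzle[i]) {
      case SWZ_ZERO:
         dst[i] = 0.0f;
         break;
      case SWZ_ONE:
         dst[i] = 1.0f;
         break;
      default:
         /* SWZ_X..SWZ_W have the values 0..3, so they index src directly. */
         dst[i] = src[swizzle[i]];
         break;
      }
   }
}
```

Because the state tracker knows which view is paired with which sampler, it could run something like this once per sampler update and hand the driver an already swizzled color.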
Re: [Mesa-dev] [PATCH 2/4] gallium: add PIPE_BIND_COMMAND_BUFFER
On 04.04.2013 21:53, Christoph Bumiller wrote:
> On 04.04.2013 21:44, Jose Fonseca wrote:
>> I think that PIPE_BIND_INDIRECT_BUFFER would be more self-descriptive.
>
> Marek suggested PIPE_BIND_DRAW_INDIRECT_BUFFER, but I think that's too
> specific because there's also a DISPATCH_INDIRECT buffer for compute
> shaders. And just INDIRECT_BUFFER without the _DRAW_ doesn't seem so
> self-descriptive if you're not thinking in the right context.
>
> I'd like to stick with BIND_COMMAND_BUFFER, or maybe
> BIND_COMMAND_ARGS_BUFFER ...
>
>> Or do you envision other uses of such buffer?
>
> It's possible that at some point we add a mechanism to let the driver
> store arbitrary commands into a buffer created by the st, or have
> resources used as arguments for conditional rendering ...
> Lots of possibilities, but nothing concrete, and for the command lists
> like with D3D's deferred contexts we'd probably return opaque objects
> that can contain more auxiliary data.

I'd like it to be more generic, but then it could turn out that there are different requirements on these command source buffers in the future ... I'm undecided now.

>> Jose
>>
>> - Original Message -
>>> Intended for use with GL_ARB_draw_indirect's DRAW_INDIRECT_BUFFER
>>> target or for D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS.
>>> ---
>>>  src/gallium/docs/source/screen.rst   |    2 ++
>>>  src/gallium/include/pipe/p_defines.h |    1 +
>>>  2 files changed, 3 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
>>> index c1a3c0b..f8cdded 100644
>>> --- a/src/gallium/docs/source/screen.rst
>>> +++ b/src/gallium/docs/source/screen.rst
>>> @@ -306,6 +306,8 @@ resources might be created and handled quite differently.
>>>    bound to the graphics pipeline as a shader resource.
>>>  * ``PIPE_BIND_COMPUTE_RESOURCE``: A buffer or texture that can be
>>>    bound to the compute program as a shader resource.
>>> +* ``PIPE_BIND_COMMAND_BUFFER``: A buffer or that may be sourced by the
>>> +  GPU command processor, like with indirect drawing.
>>>
>>>  .. _pipe_usage:
>>>
>>> diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h
>>> index 5b00acc..2b79f2a 100644
>>> --- a/src/gallium/include/pipe/p_defines.h
>>> +++ b/src/gallium/include/pipe/p_defines.h
>>> @@ -315,6 +315,7 @@ enum pipe_flush_flags {
>>>  #define PIPE_BIND_GLOBAL               (1 << 18) /* set_global_binding */
>>>  #define PIPE_BIND_SHADER_RESOURCE      (1 << 19) /* set_shader_resources */
>>>  #define PIPE_BIND_COMPUTE_RESOURCE     (1 << 20) /* set_compute_resources */
>>> +#define PIPE_BIND_COMMAND_BUFFER       (1 << 21) /* pipe_draw_info.indirect */
>>>
>>>  /* The first two flags above were previously part of the amorphous
>>>   * TEXTURE_USAGE, most of which are now descriptions of the ways a
>>> --
>>> 1.7.3.4
Re: [Mesa-dev] [PATCH 1/4] mesa: implement GL_ARB_draw_indirect
On 04.04.2013 21:17, Brian Paul wrote:
> I just did a quick skim and found a few minor things.
>
> First, the subject might be "mesa: implement GL_ARB_draw_indirect and
> GL_ARB_multi_draw_indirect"
>
> This is a big patch and I think it could have been broken down into
> smaller pieces, but I know it's a PITA to redo. Next time.
>
>> +static void GLAPIENTRY
>> +save_DrawArraysIndirect(GLenum mode, const GLvoid *indirect)
>> +{
>> +   GET_CURRENT_CONTEXT(ctx);
>> +   _mesa_error(ctx, GL_INVALID_OPERATION,
>> +               "glDrawArraysIndirect() during display list compile");
>> +}
>
> Is this specified in the spec? IIRC, if a command isn't supposed to get
> compiled into a dlist it's just immediately executed.

Not that I can see. But I figured that since DrawElementsInstancedBaseVertex returns GL_INVALID_OPERATION, the indirect version would behave the same: it is essentially the same command except for the place it takes its arguments from, especially in compatibility mode, where the indirect argument is a user pointer. (That mode is not implemented; the extension is only exposed in core profile for now, which means it won't ever be used with display lists anyway until compatibility support for it is implemented.)

I also don't get why DrawElements non-instanced non-baseVertex is specified to execute immediately instead of returning INVALID_OPERATION. Surely if I can draw one instance I can draw 2 instances if it's executed immediately anyway.

The NV binary driver seems to simply execute it immediately, fwiw.
Re: [Mesa-dev] [PATCH 1/4] mesa: implement GL_ARB_draw_indirect
On 05.04.2013 13:32, Christoph Bumiller wrote:
> On 04.04.2013 21:17, Brian Paul wrote:
>> I just did a quick skim and found a few minor things.
>>
>> First, the subject might be "mesa: implement GL_ARB_draw_indirect and
>> GL_ARB_multi_draw_indirect"
>>
>> This is a big patch and I think it could have been broken down into
>> smaller pieces, but I know it's a PITA to redo. Next time.
>>
>>> +static void GLAPIENTRY
>>> +save_DrawArraysIndirect(GLenum mode, const GLvoid *indirect)
>>> +{
>>> +   GET_CURRENT_CONTEXT(ctx);
>>> +   _mesa_error(ctx, GL_INVALID_OPERATION,
>>> +               "glDrawArraysIndirect() during display list compile");
>>> +}
>>
>> Is this specified in the spec? IIRC, if a command isn't supposed to get
>> compiled into a dlist it's just immediately executed.
>
> Not that I can see. But I figured that since
> DrawElementsInstancedBaseVertex returns GL_INVALID_OPERATION, the
> indirect version would behave the same: it is essentially the same
> command except for the place it takes its arguments from, especially in
> compatibility mode, where the indirect argument is a user pointer. (That
> mode is not implemented; the extension is only exposed in core profile
> for now, which means it won't ever be used with display lists anyway
> until compatibility support for it is implemented.)
>
> I also don't get why DrawElements non-instanced non-baseVertex is
> specified to execute immediately instead of returning INVALID_OPERATION.
> Surely if I can draw one instance I can draw 2 instances if it's executed
> immediately anyway.

Never mind this paragraph.

> The NV binary driver seems to simply execute it immediately, fwiw.
[Mesa-dev] [PATCH 3/5] gallium: add PIPE_BIND_COMMAND_ARGS_BUFFER
Intended for use with GL_ARB_draw_indirect's DRAW_INDIRECT_BUFFER target or for D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS.
---
 src/gallium/docs/source/screen.rst   |    3 +++
 src/gallium/include/pipe/p_defines.h |    1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
index c1a3c0b..d8cfb97 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -306,6 +306,9 @@ resources might be created and handled quite differently.
   bound to the graphics pipeline as a shader resource.
 * ``PIPE_BIND_COMPUTE_RESOURCE``: A buffer or texture that can be
   bound to the compute program as a shader resource.
+* ``PIPE_BIND_COMMAND_ARGS_BUFFER``: A buffer that may be sourced by the
+  GPU command processor. It can contain, for example, the arguments to
+  indirect draw calls.

 .. _pipe_usage:

diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h
index 5b00acc..4c6b1f1 100644
--- a/src/gallium/include/pipe/p_defines.h
+++ b/src/gallium/include/pipe/p_defines.h
@@ -315,6 +315,7 @@ enum pipe_flush_flags {
 #define PIPE_BIND_GLOBAL               (1 << 18) /* set_global_binding */
 #define PIPE_BIND_SHADER_RESOURCE      (1 << 19) /* set_shader_resources */
 #define PIPE_BIND_COMPUTE_RESOURCE     (1 << 20) /* set_compute_resources */
+#define PIPE_BIND_COMMAND_ARGS_BUFFER  (1 << 21) /* pipe_draw_info.indirect */

 /* The first two flags above were previously part of the amorphous
  * TEXTURE_USAGE, most of which are now descriptions of the ways a
--
1.7.3.4
[Mesa-dev] [PATCH 1/5] mesa: add indirect drawing buffer parameter to draw functions
Split from patch implementing ARB_draw_indirect.

v2: Const-qualify the struct gl_buffer_object *indirect argument.
---
 src/mesa/drivers/dri/i965/brw_draw.c         |    3 ++-
 src/mesa/drivers/dri/i965/brw_draw.h         |    3 ++-
 src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c |    9 ++---
 src/mesa/state_tracker/st_cb_rasterpos.c     |    2 +-
 src/mesa/state_tracker/st_draw.c             |    3 ++-
 src/mesa/state_tracker/st_draw.h             |    6 --
 src/mesa/state_tracker/st_draw_feedback.c    |    3 ++-
 src/mesa/tnl/tnl.h                           |    3 ++-
 src/mesa/vbo/vbo.h                           |    5 -
 src/mesa/vbo/vbo_exec_array.c                |    8
 src/mesa/vbo/vbo_exec_draw.c                 |    2 +-
 src/mesa/vbo/vbo_primitive_restart.c         |    4 ++--
 src/mesa/vbo/vbo_rebase.c                    |    2 +-
 src/mesa/vbo/vbo_save_draw.c                 |    2 +-
 src/mesa/vbo/vbo_split_copy.c                |    2 +-
 src/mesa/vbo/vbo_split_inplace.c             |    2 +-
 16 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index 809bcc5..9212eb1 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -548,7 +548,8 @@ void brw_draw_prims( struct gl_context *ctx,
 		     GLboolean index_bounds_valid,
 		     GLuint min_index,
 		     GLuint max_index,
-		     struct gl_transform_feedback_object *tfb_vertcount )
+		     struct gl_transform_feedback_object *tfb_vertcount,
+		     const struct gl_buffer_object *indirect )
 {
    struct intel_context *intel = intel_context(ctx);
    const struct gl_client_array **arrays = ctx->Array._DrawArrays;
diff --git a/src/mesa/drivers/dri/i965/brw_draw.h b/src/mesa/drivers/dri/i965/brw_draw.h
index d86a9e7..8f0c768 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.h
+++ b/src/mesa/drivers/dri/i965/brw_draw.h
@@ -41,7 +41,8 @@ void brw_draw_prims( struct gl_context *ctx,
 		     GLboolean index_bounds_valid,
 		     GLuint min_index,
 		     GLuint max_index,
-		     struct gl_transform_feedback_object *tfb_vertcount );
+		     struct gl_transform_feedback_object *tfb_vertcount,
+		     const struct gl_buffer_object *indirect );

 void brw_draw_init( struct brw_context *brw );
 void brw_draw_destroy( struct brw_context *brw );
diff --git a/src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c b/src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c
index 436db32..69f30e2 100644
--- a/src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c
+++ b/src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c
@@ -222,7 +222,8 @@ TAG(vbo_render_prims)(struct gl_context *ctx,
 		      const struct _mesa_index_buffer *ib,
 		      GLboolean index_bounds_valid,
 		      GLuint min_index, GLuint max_index,
-		      struct gl_transform_feedback_object *tfb_vertcount);
+		      struct gl_transform_feedback_object *tfb_vertcount,
+		      const struct gl_buffer_object *indirect);

 static GLboolean
 vbo_maybe_split(struct gl_context *ctx, const struct gl_client_array **arrays,
@@ -453,7 +454,8 @@ TAG(vbo_render_prims)(struct gl_context *ctx,
 		      const struct _mesa_index_buffer *ib,
 		      GLboolean index_bounds_valid,
 		      GLuint min_index, GLuint max_index,
-		      struct gl_transform_feedback_object *tfb_vertcount)
+		      struct gl_transform_feedback_object *tfb_vertcount,
+		      const struct gl_buffer_object *indirect)
 {
 	struct nouveau_render_state *render = to_render_state(ctx);
 	const struct gl_client_array **arrays = ctx->Array._DrawArrays;
@@ -489,7 +491,8 @@ TAG(vbo_check_render_prims)(struct gl_context *ctx,
 			    const struct _mesa_index_buffer *ib,
 			    GLboolean index_bounds_valid,
 			    GLuint min_index, GLuint max_index,
-			    struct gl_transform_feedback_object *tfb_vertcount)
+			    struct gl_transform_feedback_object *tfb_vertcount,
+			    const struct gl_buffer_object *indirect)
 {
 	struct nouveau_context *nctx = to_nouveau_context(ctx);

diff --git a/src/mesa/state_tracker/st_cb_rasterpos.c b/src/mesa/state_tracker/st_cb_rasterpos.c
index 4731f26..778218a1 100644
--- a/src/mesa/state_tracker/st_cb_rasterpos.c
+++ b/src/mesa/state_tracker/st_cb_rasterpos.c
@@ -255,7 +255,7 @@ st_RasterPos(struct gl_context *ctx, const GLfloat v[4])
     * st_feedback_draw_vbo doesn't check for that flag. */
    ctx->Array._DrawArrays = rs->arrays;
    st_feedback_draw_vbo(ctx, &rs->prim, 1, NULL, GL_TRUE, 0, 1,
-                        NULL);
+                        NULL,
[Mesa-dev] [PATCH 5/5] st/mesa: add support for indirect drawing
---
 src/mesa/state_tracker/st_cb_bufferobjects.c |    3 +++
 src/mesa/state_tracker/st_draw.c             |   11 ++-
 src/mesa/state_tracker/st_extensions.c       |    3 ++-
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_bufferobjects.c b/src/mesa/state_tracker/st_cb_bufferobjects.c
index 8ff32c8..2e719cc 100644
--- a/src/mesa/state_tracker/st_cb_bufferobjects.c
+++ b/src/mesa/state_tracker/st_cb_bufferobjects.c
@@ -205,6 +205,9 @@ st_bufferobj_data(struct gl_context *ctx,
    case GL_UNIFORM_BUFFER:
       bind = PIPE_BIND_CONSTANT_BUFFER;
       break;
+   case GL_DRAW_INDIRECT_BUFFER:
+      bind = PIPE_BIND_COMMAND_ARGS_BUFFER;
+      break;
    default:
       bind = 0;
    }
diff --git a/src/mesa/state_tracker/st_draw.c b/src/mesa/state_tracker/st_draw.c
index 82a4bcd..3c74c50 100644
--- a/src/mesa/state_tracker/st_draw.c
+++ b/src/mesa/state_tracker/st_draw.c
@@ -256,6 +256,14 @@ st_draw_vbo(struct gl_context *ctx,
       }
    }

+   if (indirect) {
+      info.indirect = st_buffer_object(indirect)->buffer;
+
+      /* Primitive restart is not handled by the VBO module in this case. */
+      info.primitive_restart = ctx->Array._PrimitiveRestart;
+      info.restart_index = ctx->Array._RestartIndex;
+   }
+
    /* do actual drawing */
    for (i = 0; i < nr_prims; i++) {
       info.mode = translate_prim( ctx, prims[i].mode );
@@ -268,6 +276,7 @@ st_draw_vbo(struct gl_context *ctx,
          info.min_index = info.start;
          info.max_index = info.start + info.count - 1;
       }
+      info.indirect_offset = prims[i].indirect_offset;

       if (ST_DEBUG & DEBUG_DRAW) {
          debug_printf("st/draw: mode %s start %u count %u indexed %d\n",
@@ -277,7 +286,7 @@ st_draw_vbo(struct gl_context *ctx,
                       info.indexed);
       }

-      if (info.count_from_stream_output) {
+      if (info.count_from_stream_output || info.indirect) {
          cso_draw_vbo(st->cso_context, &info);
       }
       else if (info.primitive_restart) {
diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index 11db9d3..0488755 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -398,7 +398,8 @@ void st_init_extensions(struct st_context *st)
       { o(MESA_texture_array), PIPE_CAP_MAX_TEXTURE_ARRAY_LAYERS },

       { o(OES_standard_derivatives), PIPE_CAP_SM3 },
-      { o(ARB_texture_cube_map_array), PIPE_CAP_CUBE_MAP_ARRAY }
+      { o(ARB_texture_cube_map_array), PIPE_CAP_CUBE_MAP_ARRAY },
+      { o(ARB_draw_indirect), PIPE_CAP_DRAW_INDIRECT },
    };

    /* Required: render target and sampler support */
--
1.7.3.4
[Mesa-dev] [PATCH 2/5] mesa: implement GL_ARB_draw_indirect and GL_ARB_multi_draw_indirect
v2: Removed some stray extern qualifiers.
Documented use of Draw*IndirectCommand sizes.
Removed separate extension enable flag for ARB_multi_draw_indirect since this can always be supported by looping.
Kept generation of GL_INVALID_OPERATION in display list compile. The spec doesn't say anything about them, but all the direct drawing commands that support instancing do the same.
---
 src/mapi/glapi/gen/Makefile.am          |    1 +
 src/mapi/glapi/gen/gl_API.xml           |    4 +-
 src/mesa/main/api_validate.c            |  153 +++
 src/mesa/main/api_validate.h            |   26
 src/mesa/main/bufferobj.c               |    9 +
 src/mesa/main/dd.h                      |   12 ++
 src/mesa/main/dlist.c                   |   41 +
 src/mesa/main/extensions.c              |    2 +
 src/mesa/main/get.c                     |    5 +
 src/mesa/main/get_hash_params.py        |    2 +
 src/mesa/main/mtypes.h                  |    3 +
 src/mesa/main/tests/dispatch_sanity.cpp |    8 +-
 src/mesa/main/vtxfmt.c                  |    7 +
 src/mesa/vbo/vbo_exec_array.c           |  249 +++
 src/mesa/vbo/vbo_save_api.c             |   53 +++
 15 files changed, 570 insertions(+), 5 deletions(-)

diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index 36e47e2..243c148 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -96,6 +96,7 @@ API_XML = \
 	ARB_depth_clamp.xml \
 	ARB_draw_buffers_blend.xml \
 	ARB_draw_elements_base_vertex.xml \
+	ARB_draw_indirect.xml \
 	ARB_draw_instanced.xml \
 	ARB_ES2_compatibility.xml \
 	ARB_ES3_compatibility.xml \
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index df95924..f22fdac 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8240,6 +8240,8 @@

 <!-- ARB extensions #86...#93 -->

+<xi:include href="ARB_draw_indirect.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
+
 <category name="GL_ARB_transform_feedback3" number="94">
   <enum name="MAX_TRANSFORM_FEEDBACK_BUFFERS" value="0x8E70"/>
   <enum name="MAX_VERTEX_STREAMS" value="0x8E71"/>
@@ -8317,7 +8319,7 @@

 <xi:include href="ARB_invalidate_subdata.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>

-<!-- ARB extensions #133...#138 -->
+<!-- ARB extensions #134...#138 -->

 <xi:include href="ARB_texture_buffer_range.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index 53b0021..e875c5d 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -737,3 +737,156 @@ _mesa_validate_DrawTransformFeedback(struct gl_context *ctx,

    return GL_TRUE;
 }
+
+static GLboolean
+valid_draw_indirect(struct gl_context *ctx,
+                    GLenum mode, const GLvoid *indirect,
+                    GLsizei size, const char *name)
+{
+   const GLsizeiptr end = (GLsizeiptr)indirect + size;
+
+   if (!_mesa_valid_prim_mode(ctx, mode, name))
+      return GL_FALSE;
+
+   if ((GLsizeiptr)indirect & (sizeof(GLuint) - 1)) {
+      _mesa_error(ctx, GL_INVALID_OPERATION,
+                  "%s(indirect is not aligned)", name);
+      return GL_FALSE;
+   }
+
+   if (_mesa_is_bufferobj(ctx->DrawIndirectBuffer)) {
+      if (_mesa_bufferobj_mapped(ctx->DrawIndirectBuffer)) {
+         _mesa_error(ctx, GL_INVALID_OPERATION,
+                     "%s(DRAW_INDIRECT_BUFFER is mapped)", name);
+         return GL_FALSE;
+      }
+      if (ctx->DrawIndirectBuffer->Size < end) {
+         _mesa_error(ctx, GL_INVALID_OPERATION,
+                     "%s(DRAW_INDIRECT_BUFFER too small)", name);
+         return GL_FALSE;
+      }
+   } else {
+      if (ctx->API != API_OPENGL_COMPAT) {
+         _mesa_error(ctx, GL_INVALID_OPERATION,
+                     "%s: no buffer bound to DRAW_INDIRECT_BUFFER", name);
+         return GL_FALSE;
+      }
+   }
+
+   if (!check_valid_to_render(ctx, name))
+      return GL_FALSE;
+
+   return GL_TRUE;
+}
+
+static inline GLboolean
+valid_draw_indirect_elements(struct gl_context *ctx,
+                             GLenum mode, GLenum type, const GLvoid *indirect,
+                             GLsizeiptr size, const char *name)
+{
+   if (!valid_elements_type(ctx, type, name))
+      return GL_FALSE;
+
+   /*
+    * Unlike regular DrawElementsInstancedBaseVertex commands, the indices
+    * may not come from a client array and must come from an index buffer.
+    * If no element array buffer is bound, an INVALID_OPERATION error is
+    * generated.
+    */
+   if (!_mesa_is_bufferobj(ctx->Array.ArrayObj->ElementArrayBufferObj)) {
+      _mesa_error(ctx, GL_INVALID_OPERATION,
+                  "%s(no buffer bound to GL_ELEMENT_ARRAY_BUFFER)", name);
+      return GL_FALSE;
+   }
+
+   return valid_draw_indirect(ctx, mode, indirect, size, name);
+}
+
+static inline GLboolean
+valid_draw_indirect_multi(struct gl_context *ctx,
+
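The alignment and range checks in valid_draw_indirect above can be restated as a small standalone sketch. This is not the Mesa code: `indirect_range_ok` is a hypothetical helper written for this note, it works on plain byte offsets and sizes instead of GL types, and it uses an overflow-safe form of the "offset + size fits in the buffer" comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: returns true if a command of
 * cmd_size bytes starting at byte offset `offset` is 4-byte aligned and
 * lies entirely inside a buffer of buf_size bytes. */
static bool
indirect_range_ok(size_t offset, size_t cmd_size, size_t buf_size)
{
   /* The offset must be a multiple of sizeof(uint32_t), i.e. 4 bytes. */
   if (offset & (sizeof(uint32_t) - 1))
      return false;

   /* A command larger than the buffer can never fit. */
   if (cmd_size > buf_size)
      return false;

   /* Equivalent to offset + cmd_size <= buf_size, but cannot overflow. */
   return offset <= buf_size - cmd_size;
}
```

For a non-indexed draw the command would be four 32-bit words (16 bytes) and for an indexed draw five (20 bytes), matching the parameter counts used elsewhere in this series.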
[Mesa-dev] [PATCH 4/5] gallium: add facilities for indirect drawing
v2: Added comments to util_draw_indirect, clarified and fixed map size. Removed unlikely().
---
 src/gallium/auxiliary/util/u_draw.c              |   43 ++
 src/gallium/auxiliary/util/u_draw.h              |    8
 src/gallium/auxiliary/util/u_dump_state.c        |    3 ++
 src/gallium/docs/source/screen.rst               |    3 ++
 src/gallium/drivers/freedreno/freedreno_screen.c |    1 +
 src/gallium/drivers/i915/i915_screen.c           |    1 +
 src/gallium/drivers/llvmpipe/lp_draw_arrays.c    |    5 +++
 src/gallium/drivers/llvmpipe/lp_screen.c         |    2 +
 src/gallium/drivers/nv30/nv30_screen.c           |    1 +
 src/gallium/drivers/nv50/nv50_screen.c           |    2 +
 src/gallium/drivers/r300/r300_screen.c           |    1 +
 src/gallium/drivers/r600/r600_pipe.c             |    1 +
 src/gallium/drivers/radeonsi/radeonsi_pipe.c     |    1 +
 src/gallium/drivers/softpipe/sp_draw_arrays.c    |    6 +++
 src/gallium/drivers/softpipe/sp_screen.c         |    2 +
 src/gallium/drivers/svga/svga_screen.c           |    1 +
 src/gallium/drivers/trace/tr_dump_state.c        |    3 ++
 src/gallium/include/pipe/p_defines.h             |    3 +-
 src/gallium/include/pipe/p_state.h               |   22 +++
 19 files changed, 108 insertions(+), 1 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_draw.c b/src/gallium/auxiliary/util/u_draw.c
index 83d9284..b9f8fcd 100644
--- a/src/gallium/auxiliary/util/u_draw.c
+++ b/src/gallium/auxiliary/util/u_draw.c
@@ -27,6 +27,7 @@


 #include "util/u_debug.h"
+#include "util/u_inlines.h"
 #include "util/u_math.h"
 #include "util/u_format.h"
 #include "util/u_draw.h"
@@ -123,3 +124,45 @@ util_draw_max_index(

    return max_index + 1;
 }
+
+
+/* This extracts the draw arguments from the info_in->indirect resource,
+ * puts them into a new instance of pipe_draw_info, and calls draw_vbo on it.
+ */
+void
+util_draw_indirect(struct pipe_context *pipe,
+                   const struct pipe_draw_info *info_in)
+{
+   struct pipe_draw_info info;
+   struct pipe_transfer *transfer;
+   uint32_t *params;
+   const unsigned num_params = info_in->indexed ? 5 : 4;
+
+   assert(info_in->indirect);
+   assert(!info_in->count_from_stream_output);
+
+   memcpy(&info, info_in, sizeof(info));
+
+   params = (uint32_t *)
+      pipe_buffer_map_range(pipe,
+                            info_in->indirect,
+                            info_in->indirect_offset,
+                            num_params * sizeof(uint32_t),
+                            PIPE_TRANSFER_READ,
+                            &transfer);
+   if (!transfer) {
+      debug_printf("%s: failed to map indirect buffer\n", __FUNCTION__);
+      return;
+   }
+
+   info.count = params[0];
+   info.instance_count = params[1];
+   info.start = params[2];
+   info.index_bias = info_in->indexed ? params[3] : 0;
+   info.start_instance = info_in->indexed ? params[4] : params[3];
+   info.indirect = NULL;
+
+   pipe_buffer_unmap(pipe, transfer);
+
+   pipe->draw_vbo(pipe, &info);
+}
diff --git a/src/gallium/auxiliary/util/u_draw.h b/src/gallium/auxiliary/util/u_draw.h
index 3dc6918..1dd6b51 100644
--- a/src/gallium/auxiliary/util/u_draw.h
+++ b/src/gallium/auxiliary/util/u_draw.h
@@ -142,6 +142,14 @@ util_draw_range_elements(struct pipe_context *pipe,
 }


+/* This converts an indirect draw into a direct draw by mapping the indirect
+ * buffer, extracting its arguments, and calling pipe->draw_vbo.
+ */
+void
+util_draw_indirect(struct pipe_context *pipe,
+                   const struct pipe_draw_info *info);
+
+
 unsigned
 util_draw_max_index(
       const struct pipe_vertex_buffer *vertex_buffers,
diff --git a/src/gallium/auxiliary/util/u_dump_state.c b/src/gallium/auxiliary/util/u_dump_state.c
index 2f28f3c..21b6044 100644
--- a/src/gallium/auxiliary/util/u_dump_state.c
+++ b/src/gallium/auxiliary/util/u_dump_state.c
@@ -758,6 +758,9 @@ util_dump_draw_info(FILE *stream, const struct pipe_draw_info *state)

    util_dump_member(stream, ptr, state, count_from_stream_output);

+   util_dump_member(stream, ptr, state, indirect);
+   util_dump_member(stream, uint, state, indirect_offset);
+
    util_dump_struct_end(stream);
 }

diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
index d8cfb97..96f316a 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -151,6 +151,9 @@ The integer capabilities:
   dedicated memory should return 1 and all software rasterizers should return 0.
 * ``PIPE_CAP_QUERY_PIPELINE_STATISTICS``: Whether PIPE_QUERY_PIPELINE_STATISTICS
   is supported.
+* ``PIPE_CAP_DRAW_INDIRECT``: Whether the driver supports taking draw arguments
+  { count, instance_count, start, index_bias } from a PIPE_BUFFER resource.
+  See pipe_draw_info.


 .. _pipe_capf:
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c
index 283d07f..2b13e29 100644
---
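The parameter decoding done by util_draw_indirect above can be shown in isolation. The sketch below is a simplified illustration, not gallium code: `struct draw_params` and `decode_indirect` are hypothetical names, and the word layout (count, instance count, start, then either start instance, or index bias followed by start instance) simply mirrors the assignments in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the fields the patch fills in;
 * this is not gallium's struct pipe_draw_info. */
struct draw_params {
   uint32_t count;
   uint32_t instance_count;
   uint32_t start;
   int32_t index_bias;
   uint32_t start_instance;
};

/* Decode the 32-bit words of an indirect draw command.
 * Non-indexed draws use 4 words: count, instance_count, start, start_instance.
 * Indexed draws use 5 words: count, instance_count, start, index_bias,
 * start_instance. */
static struct draw_params
decode_indirect(const uint32_t *params, bool indexed)
{
   struct draw_params d;

   d.count = params[0];
   d.instance_count = params[1];
   d.start = params[2];
   /* The index bias is a signed value stored in a 32-bit word. */
   d.index_bias = indexed ? (int32_t)params[3] : 0;
   d.start_instance = indexed ? params[4] : params[3];
   return d;
}
```

A driver without hardware support could decode the mapped buffer this way and then issue an ordinary direct draw, which is the fallback path the patch provides.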
[Mesa-dev] [PATCH] st/mesa: add support for indirect drawing v2
v2: Fix for constness of indirect buffer argument.
Remove separate extension enable for multi_draw_indirect.
---
 src/mesa/state_tracker/st_cb_bufferobjects.c |    3 +++
 src/mesa/state_tracker/st_cb_bufferobjects.h |    6 ++
 src/mesa/state_tracker/st_draw.c             |   11 ++-
 src/mesa/state_tracker/st_extensions.c       |    3 ++-
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_bufferobjects.c b/src/mesa/state_tracker/st_cb_bufferobjects.c
index 8ff32c8..2e719cc 100644
--- a/src/mesa/state_tracker/st_cb_bufferobjects.c
+++ b/src/mesa/state_tracker/st_cb_bufferobjects.c
@@ -205,6 +205,9 @@ st_bufferobj_data(struct gl_context *ctx,
    case GL_UNIFORM_BUFFER:
       bind = PIPE_BIND_CONSTANT_BUFFER;
       break;
+   case GL_DRAW_INDIRECT_BUFFER:
+      bind = PIPE_BIND_COMMAND_ARGS_BUFFER;
+      break;
    default:
       bind = 0;
    }
diff --git a/src/mesa/state_tracker/st_cb_bufferobjects.h b/src/mesa/state_tracker/st_cb_bufferobjects.h
index 1c991d2..05cc0fa 100644
--- a/src/mesa/state_tracker/st_cb_bufferobjects.h
+++ b/src/mesa/state_tracker/st_cb_bufferobjects.h
@@ -54,6 +54,12 @@ st_buffer_object(struct gl_buffer_object *obj)
    return (struct st_buffer_object *) obj;
 }

+static INLINE const struct st_buffer_object *
+st_const_buffer_object(const struct gl_buffer_object *obj)
+{
+   return (const struct st_buffer_object *) obj;
+}
+

 extern void
 st_bufferobj_validate_usage(struct st_context *st,
diff --git a/src/mesa/state_tracker/st_draw.c b/src/mesa/state_tracker/st_draw.c
index 82a4bcd..a07f8be 100644
--- a/src/mesa/state_tracker/st_draw.c
+++ b/src/mesa/state_tracker/st_draw.c
@@ -256,6 +256,14 @@ st_draw_vbo(struct gl_context *ctx,
       }
    }

+   if (indirect) {
+      info.indirect = st_const_buffer_object(indirect)->buffer;
+
+      /* Primitive restart is not handled by the VBO module in this case. */
+      info.primitive_restart = ctx->Array._PrimitiveRestart;
+      info.restart_index = ctx->Array._RestartIndex;
+   }
+
    /* do actual drawing */
    for (i = 0; i < nr_prims; i++) {
       info.mode = translate_prim( ctx, prims[i].mode );
@@ -268,6 +276,7 @@ st_draw_vbo(struct gl_context *ctx,
          info.min_index = info.start;
          info.max_index = info.start + info.count - 1;
       }
+      info.indirect_offset = prims[i].indirect_offset;

       if (ST_DEBUG & DEBUG_DRAW) {
          debug_printf("st/draw: mode %s start %u count %u indexed %d\n",
@@ -277,7 +286,7 @@ st_draw_vbo(struct gl_context *ctx,
                       info.indexed);
       }

-      if (info.count_from_stream_output) {
+      if (info.count_from_stream_output || info.indirect) {
          cso_draw_vbo(st->cso_context, &info);
       }
       else if (info.primitive_restart) {
diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index 11db9d3..0488755 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -398,7 +398,8 @@ void st_init_extensions(struct st_context *st)
       { o(MESA_texture_array), PIPE_CAP_MAX_TEXTURE_ARRAY_LAYERS },

       { o(OES_standard_derivatives), PIPE_CAP_SM3 },
-      { o(ARB_texture_cube_map_array), PIPE_CAP_CUBE_MAP_ARRAY }
+      { o(ARB_texture_cube_map_array), PIPE_CAP_CUBE_MAP_ARRAY },
+      { o(ARB_draw_indirect), PIPE_CAP_DRAW_INDIRECT },
    };

    /* Required: render target and sampler support */
--
1.7.3.4
[Mesa-dev] [PATCH] mesa/draw_indirect: fix index bounds
(Will be merged into the original patches.)

Calculating the actual limits is impossible, and softpipe drops
vertices that lie outside the specified range.
---
 src/gallium/auxiliary/util/u_draw.c |    4 ++++
 src/mesa/state_tracker/st_draw.c    |    3 +++
 src/mesa/vbo/vbo_exec_array.c       |    8 ++++----
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_draw.c b/src/gallium/auxiliary/util/u_draw.c
index b9f8fcd..d13ccd4 100644
--- a/src/gallium/auxiliary/util/u_draw.c
+++ b/src/gallium/auxiliary/util/u_draw.c
@@ -161,6 +161,10 @@ util_draw_indirect(struct pipe_context *pipe,
    info.index_bias = info_in->indexed ? params[3] : 0;
    info.start_instance = info_in->indexed ? params[4] : params[3];
    info.indirect = NULL;
+   if (!info_in->indexed) {
+      info.min_index = info.start;
+      info.max_index = info.start + info.count - 1;
+   }

    pipe_buffer_unmap(pipe, transfer);

diff --git a/src/mesa/state_tracker/st_draw.c b/src/mesa/state_tracker/st_draw.c
index a07f8be..64470f7 100644
--- a/src/mesa/state_tracker/st_draw.c
+++ b/src/mesa/state_tracker/st_draw.c
@@ -273,6 +273,9 @@ st_draw_vbo(struct gl_context *ctx,
       info.instance_count = prims[i].num_instances;
       info.index_bias = prims[i].basevertex;
       if (!ib) {
+         /* NOTE: For indirect drawing, max_index correctly evaluates to ~0,
+          * since start and count will be 0.
+          */
          info.min_index = info.start;
          info.max_index = info.start + info.count - 1;
       }
diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c
index 75fda00..ba70b5b 100644
--- a/src/mesa/vbo/vbo_exec_array.c
+++ b/src/mesa/vbo/vbo_exec_array.c
@@ -1382,7 +1382,7 @@ vbo_validated_drawarraysindirect(struct gl_context *ctx,

    check_buffers_are_unmapped(exec->array.inputs);
    vbo->draw_prims(ctx, prim, 1,
-                   NULL, GL_TRUE, 0, 0,
+                   NULL, GL_TRUE, 0, ~0,
                    NULL,
                    ctx->DrawIndirectBuffer);

@@ -1422,7 +1422,7 @@ vbo_validated_multidrawarraysindirect(struct gl_context *ctx,

    check_buffers_are_unmapped(exec->array.inputs);
    vbo->draw_prims(ctx, prim, primcount,
-                   NULL, GL_TRUE, 0, 0,
+                   NULL, GL_TRUE, 0, ~0,
                    NULL,
                    ctx->DrawIndirectBuffer);

@@ -1458,7 +1458,7 @@ vbo_validated_drawelementsindirect(struct gl_context *ctx,

    check_buffers_are_unmapped(exec->array.inputs);
    vbo->draw_prims(ctx, prim, 1,
-                   ib, GL_TRUE, 0, 0,
+                   ib, GL_TRUE, 0, ~0,
                    NULL,
                    ctx->DrawIndirectBuffer);

@@ -1507,7 +1507,7 @@ vbo_validated_multidrawelementsindirect(struct gl_context *ctx,

    check_buffers_are_unmapped(exec->array.inputs);
    vbo->draw_prims(ctx, prim, primcount,
-                   ib, GL_TRUE, 0, 0,
+                   ib, GL_TRUE, 0, ~0,
                    NULL,
                    ctx->DrawIndirectBuffer);

--
1.7.3.4
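To illustrate the NOTE in the st_draw.c hunk: with start and count both 0, the unsigned expression start + count - 1 wraps around to ~0. A minimal sketch, assuming 32-bit unsigned fields; sketch_max_index is a hypothetical helper for illustration only, not part of the patch:

```c
#include <stdint.h>

/* Hypothetical helper (not Mesa code): mirrors the
 * "max_index = start + count - 1" computation for non-indexed draws.
 */
static uint32_t
sketch_max_index(uint32_t start, uint32_t count)
{
   /* Unsigned arithmetic is modulo 2^32, so 0 + 0 - 1 wraps to 0xffffffff. */
   return (uint32_t)(start + count - 1u);
}
```

For an ordinary draw, sketch_max_index(10, 5) gives 14; for the indirect case, sketch_max_index(0, 0) gives UINT32_MAX, i.e. an unbounded index range.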
Re: [Mesa-dev] [PATCH] tgsi: Add a conditional move instruction
On 04.04.2013 03:45, Zack Rusin wrote:
> It's part of SM4 (http://goo.gl/4IpeK). It's also fairly painful to
> emulate without branching. Most hardware supports it natively and even
> llvm has a 'select' opcode which can handle it without too much hassle.
>
> diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
> index 28308cb..6c5a02b 100644
> --- a/src/gallium/docs/source/tgsi.rst
> +++ b/src/gallium/docs/source/tgsi.rst
> @@ -72,6 +72,17 @@ used.
>    dst.w = src.w
>
> +.. opcode:: MOVC - Conditional move
> +
> +.. math::
> +
> +  dst.x = src0.x ? src1.x : src2.x
> +
> +  dst.y = src0.y ? src1.y : src2.y
> +
> +  dst.z = src0.z ? src1.z : src2.z
> +
> +  dst.w = src0.w ? src1.w : src2.w

I think we already have that:

.. opcode:: UCMP - Integer Conditional Move

.. math::

  dst.x = src0.x ? src1.x : src2.x

  dst.y = src0.y ? src1.y : src2.y

  dst.z = src0.z ? src1.z : src2.z

  dst.w = src0.w ? src1.w : src2.w

No difference apart from the source ordering (the integer just implies
that any non-zero value counts as true, i.e. also inf, nan and -0).

And if you want more conditional ops, in theory we also have predication,
albeit support for that depends on the driver (PIPE_SHADER_CAP_MAX_PREDS).
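The remark that any non-zero value counts as true can be sketched in C as follows. This is only an illustration, assuming 32-bit channels and IEEE-754 floats; float_bits and sketch_ucmp are made-up names, not TGSI or Mesa functions:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the bits of a float as a 32-bit word. */
static uint32_t
float_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof u);
   return u;
}

/* One channel of UCMP as documented: the condition is tested as an
 * integer, so every non-zero bit pattern selects the first value.
 */
static uint32_t
sketch_ucmp(uint32_t cond, uint32_t if_true, uint32_t if_false)
{
   return cond ? if_true : if_false;
}
```

With this reading, sketch_ucmp(float_bits(-0.0f), a, b) returns a, because -0.0f has only its sign bit set, while sketch_ucmp(float_bits(0.0f), a, b) returns b.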
Re: [Mesa-dev] [PATCH] tgsi: Add a conditional move instruction
On 04.04.2013 16:53, Zack Rusin wrote:
>> On 04.04.2013 03:45, Zack Rusin wrote:
>>> It's part of SM4 (http://goo.gl/4IpeK). It's also fairly painful to
>>> emulate without branching. Most hardware supports it natively and even
>>> llvm has a 'select' opcode which can handle it without too much hassle.
>>>
>>> diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
>>> index 28308cb..6c5a02b 100644
>>> --- a/src/gallium/docs/source/tgsi.rst
>>> +++ b/src/gallium/docs/source/tgsi.rst
>>> @@ -72,6 +72,17 @@ used.
>>>    dst.w = src.w
>>>
>>> +.. opcode:: MOVC - Conditional move
>>> +
>>> +.. math::
>>> +
>>> +  dst.x = src0.x ? src1.x : src2.x
>>> +
>>> +  dst.y = src0.y ? src1.y : src2.y
>>> +
>>> +  dst.z = src0.z ? src1.z : src2.z
>>> +
>>> +  dst.w = src0.w ? src1.w : src2.w
>>
>> I think we already have that:
>>
>> .. opcode:: UCMP - Integer Conditional Move
>>
>> .. math::
>>
>>   dst.x = src0.x ? src1.x : src2.x
>>
>>   dst.y = src0.y ? src1.y : src2.y
>>
>>   dst.z = src0.z ? src1.z : src2.z
>>
>>   dst.w = src0.w ? src1.w : src2.w
>>
>> No difference apart from the source ordering (the integer just implies
>> that any non-zero value counts as true, i.e. also inf, nan and -0).
>
> That's really broken. UCMP needs to be an unsigned version of the CMP
> instruction which does

Did you mean signed version ?

Would you mind doing an s/UCMP/ICMP in TGSI and then changing all the
UCMPs in other code to MOVC ?

You're right, it would make more sense like this, though you might want
to call it IMOVC so the condition register isn't interpreted as a float
... or is it supposed to be ?

> dst.chan = (src0.chan < 0) ? src1.chan : src2.chan
> not a whole new instruction. It's what everyone implements anyway. So if
> st_glsl_to_tgsi needs a conditional move we need to add the above patch
> and change it to use it.
>
>> And if you want more conditional ops, in theory we also have predication,
>> albeit support for that depends on the driver (PIPE_SHADER_CAP_MAX_PREDS).
>
> No, that's a completely different thing.
>
> z
Re: [Mesa-dev] [PATCH] tgsi: Add a conditional move instruction
On 04.04.2013 17:01, Jose Fonseca wrote:
> ----- Original Message -----
>>> On 04.04.2013 03:45, Zack Rusin wrote:
>>>> It's part of SM4 (http://goo.gl/4IpeK). It's also fairly painful to
>>>> emulate without branching. Most hardware supports it natively and even
>>>> llvm has a 'select' opcode which can handle it without too much hassle.
>>>>
>>>> diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
>>>> index 28308cb..6c5a02b 100644
>>>> --- a/src/gallium/docs/source/tgsi.rst
>>>> +++ b/src/gallium/docs/source/tgsi.rst
>>>> @@ -72,6 +72,17 @@ used.
>>>>    dst.w = src.w
>>>>
>>>> +.. opcode:: MOVC - Conditional move
>>>> +
>>>> +.. math::
>>>> +
>>>> +  dst.x = src0.x ? src1.x : src2.x
>>>> +
>>>> +  dst.y = src0.y ? src1.y : src2.y
>>>> +
>>>> +  dst.z = src0.z ? src1.z : src2.z
>>>> +
>>>> +  dst.w = src0.w ? src1.w : src2.w
>>>
>>> I think we already have that:
>>>
>>> .. opcode:: UCMP - Integer Conditional Move
>>>
>>> .. math::
>>>
>>>   dst.x = src0.x ? src1.x : src2.x
>>>
>>>   dst.y = src0.y ? src1.y : src2.y
>>>
>>>   dst.z = src0.z ? src1.z : src2.z
>>>
>>>   dst.w = src0.w ? src1.w : src2.w
>>>
>>> No difference apart from the source ordering (the integer just implies
>>> that any non-zero value counts as true, i.e. also inf, nan and -0).
>>
>> That's really broken. UCMP needs to be an unsigned version of the CMP
>> instruction which does
>> dst.chan = (src0.chan < 0) ? src1.chan : src2.chan
>> not a whole new instruction. It's what everyone implements anyway. So if
>> st_glsl_to_tgsi needs a conditional move we need to add the above patch
>> and change it to use it.
>
> Yes, it doesn't seem that any of the TGSI_OPCODE_UCMP implementations
> does what the spec says it supposedly does -- it seems everybody
> implements it as an unsigned version of CMP. That is, it seems UCMP's
> description needs to be fixed.

Erm, unsigned < 0 doesn't make sense.

Definitely what the description says:

static void
micro_ucmp(union tgsi_exec_channel *dst,
           const union tgsi_exec_channel *src0,
           const union tgsi_exec_channel *src1,
           const union tgsi_exec_channel *src2)
{
   dst->u[0] = src0->u[0] ? src1->u[0] : src2->u[0];
   dst->u[1] = src0->u[1] ? src1->u[1] : src2->u[1];
   dst->u[2] = src0->u[2] ? src1->u[2] : src2->u[2];
   dst->u[3] = src0->u[3] ? src1->u[3] : src2->u[3];
}

or

   case TGSI_OPCODE_UCMP:
   case TGSI_OPCODE_CMP:
      FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
         src0 = fetchSrc(0, c);
         src1 = fetchSrc(1, c);
         src2 = fetchSrc(2, c);
         if (src1 == src2)
            mkMov(dst0[c], src1);
         else
            mkCmp(OP_SLCT, (srcTy == TYPE_F32) ? CC_LT(less than 0) : CC_NE(not equal 0),
                  srcTy, dst0[c], src1, src2, src0);
      }

> Jose
Re: [Mesa-dev] [PATCH] tgsi: Add a conditional move instruction
On 04.04.2013 17:23, Jose Fonseca wrote:
> ----- Original Message -----
>> On 04.04.2013 17:01, Jose Fonseca wrote:
>>> ----- Original Message -----
>>>>> On 04.04.2013 03:45, Zack Rusin wrote:
>>>>>> It's part of SM4 (http://goo.gl/4IpeK). It's also fairly painful to
>>>>>> emulate without branching. Most hardware supports it natively and even
>>>>>> llvm has a 'select' opcode which can handle it without too much hassle.
>>>>>>
>>>>>> diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
>>>>>> index 28308cb..6c5a02b 100644
>>>>>> --- a/src/gallium/docs/source/tgsi.rst
>>>>>> +++ b/src/gallium/docs/source/tgsi.rst
>>>>>> @@ -72,6 +72,17 @@ used.
>>>>>>    dst.w = src.w
>>>>>>
>>>>>> +.. opcode:: MOVC - Conditional move
>>>>>> +
>>>>>> +.. math::
>>>>>> +
>>>>>> +  dst.x = src0.x ? src1.x : src2.x
>>>>>> +
>>>>>> +  dst.y = src0.y ? src1.y : src2.y
>>>>>> +
>>>>>> +  dst.z = src0.z ? src1.z : src2.z
>>>>>> +
>>>>>> +  dst.w = src0.w ? src1.w : src2.w
>>>>>
>>>>> I think we already have that:
>>>>>
>>>>> .. opcode:: UCMP - Integer Conditional Move
>>>>>
>>>>> .. math::
>>>>>
>>>>>   dst.x = src0.x ? src1.x : src2.x
>>>>>
>>>>>   dst.y = src0.y ? src1.y : src2.y
>>>>>
>>>>>   dst.z = src0.z ? src1.z : src2.z
>>>>>
>>>>>   dst.w = src0.w ? src1.w : src2.w
>>>>>
>>>>> No difference apart from the source ordering (the integer just implies
>>>>> that any non-zero value counts as true, i.e. also inf, nan and -0).
>>>>
>>>> That's really broken. UCMP needs to be an unsigned version of the CMP
>>>> instruction which does
>>>> dst.chan = (src0.chan < 0) ? src1.chan : src2.chan
>>>> not a whole new instruction. It's what everyone implements anyway. So if
>>>> st_glsl_to_tgsi needs a conditional move we need to add the above patch
>>>> and change it to use it.
>>>
>>> Yes, it doesn't seem that any of the TGSI_OPCODE_UCMP implementations
>>> does what the spec says it supposedly does -- it seems everybody
>>> implements it as an unsigned version of CMP. That is, it seems UCMP's
>>> description needs to be fixed.
>>
>> Erm, unsigned < 0 doesn't make sense.
>
> Ah indeed!
>
>> Definitely what the description says:
>>
>> static void
>> micro_ucmp(union tgsi_exec_channel *dst,
>>            const union tgsi_exec_channel *src0,
>>            const union tgsi_exec_channel *src1,
>>            const union tgsi_exec_channel *src2)
>> {
>>    dst->u[0] = src0->u[0] ? src1->u[0] : src2->u[0];
>>    dst->u[1] = src0->u[1] ? src1->u[1] : src2->u[1];
>>    dst->u[2] = src0->u[2] ? src1->u[2] : src2->u[2];
>>    dst->u[3] = src0->u[3] ? src1->u[3] : src2->u[3];
>> }
>>
>> or
>>
>>    case TGSI_OPCODE_UCMP:
>>    case TGSI_OPCODE_CMP:
>>       FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
>>          src0 = fetchSrc(0, c);
>>          src1 = fetchSrc(1, c);
>>          src2 = fetchSrc(2, c);
>>          if (src1 == src2)
>>             mkMov(dst0[c], src1);
>>          else
>>             mkCmp(OP_SLCT, (srcTy == TYPE_F32) ? CC_LT(less than 0) : CC_NE(not equal 0),
>>                   srcTy, dst0[c], src1, src2, src0);
>>       }
>
> But odd enough, the implementations I happened to look at seemed to do
> foo >= 0:

Well, some people can't read documentation ... or they rely on the
condition value always being a glsl-to-tgsi boolean which is only either
0 or ~0/-1.

> src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c has:
>
> static void emit_ucmp(
>    const struct lp_build_tgsi_action * action,
>    struct lp_build_tgsi_context * bld_base,
>    struct lp_build_emit_data * emit_data)
> {
>    LLVMBuilderRef builder = bld_base->base.gallivm->builder;
>
>    LLVMValueRef v = LLVMBuildFCmp(builder, LLVMRealUGE,
>          emit_data->args[0], lp_build_const_float(bld_base->base.gallivm, 0.), "");
>
>    emit_data->output[emit_data->chan] = LLVMBuildSelect(builder, v,
>          emit_data->args[2], emit_data->args[1], "");
> }
>
> (it doesn't even seem to do integers at all)
>
> src/gallium/drivers/r600/r600_shader.c:
>
> static int tgsi_ucmp(struct r600_shader_ctx *ctx)
> {
>    struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
>    struct r600_bytecode_alu alu;
>    int i, r;
>    int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
>
>    for (i = 0; i < lasti + 1; i++) {
>       if (!(inst->Dst[0].Register.WriteMask & (1 << i)))
>          continue;
>
>       memset(&alu, 0, sizeof(struct r600_bytecode_alu));
>       alu.op = ALU_OP3_CNDGE_INT;
>       r600_bytecode_src(&alu.src[0], &ctx->src[0], i);
>       r600_bytecode_src(&alu.src[1], &ctx->src[2], i);
>       r600_bytecode_src(&alu.src[2], &ctx->src[1], i);
>       tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
>       alu.dst.chan = i;
>       alu.dst.write = 1;
>       alu.is_op3 = 1;
>       if (i == lasti)
>          alu.last = 1;
>       r = r600_bytecode_add_alu(ctx->bc, &alu);
>       if (r)
>          return r;
>    }
>    return 0;
> }
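Why a swapped-operand "condition >= 0" select only works for glsl-to-tgsi booleans can be sketched as below. The sketch assumes (as the r600 code above suggests, but which is not verified here) that CNDGE_INT picks its second source when the first is >= 0 as a signed integer; both function names are hypothetical:

```c
#include <stdint.h>

/* UCMP as the TGSI description states it: any non-zero condition is true. */
static uint32_t
sketch_ucmp_doc(uint32_t cond, uint32_t if_true, uint32_t if_false)
{
   return cond != 0u ? if_true : if_false;
}

/* Assumed model of the r600 lowering: a signed ">= 0" test with the two
 * value operands swapped. "Signed >= 0" is expressed as "sign bit clear"
 * to keep the arithmetic well-defined.
 */
static uint32_t
sketch_ucmp_cndge(uint32_t cond, uint32_t if_true, uint32_t if_false)
{
   int non_negative = (cond & 0x80000000u) == 0u;
   return non_negative ? if_false : if_true;
}
```

Both functions agree for cond == 0 and cond == ~0u, the only values a glsl-to-tgsi boolean takes, but they disagree for a condition such as 1, where the documented UCMP selects the first value and the ">= 0" variant selects the second.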
[Mesa-dev] [PATCH 1/4] mesa: implement GL_ARB_draw_indirect
---
 src/mapi/glapi/gen/Makefile.am               |    1 +
 src/mapi/glapi/gen/gl_API.xml                |    4 +-
 src/mesa/drivers/dri/i965/brw_draw.c         |    3 +-
 src/mesa/drivers/dri/i965/brw_draw.h         |    3 +-
 src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c |    9 +-
 src/mesa/main/api_validate.c                 |  159
 src/mesa/main/api_validate.h                 |   26 +++
 src/mesa/main/bufferobj.c                    |    9 +
 src/mesa/main/dd.h                           |   12 ++
 src/mesa/main/dlist.c                        |   41
 src/mesa/main/extensions.c                   |    2 +
 src/mesa/main/get.c                          |    5 +
 src/mesa/main/get_hash_params.py             |    2 +
 src/mesa/main/mtypes.h                       |    4 +
 src/mesa/main/tests/dispatch_sanity.cpp      |    8 +-
 src/mesa/main/vtxfmt.c                       |    7 +
 src/mesa/state_tracker/st_cb_rasterpos.c     |    2 +-
 src/mesa/state_tracker/st_draw.c             |    3 +-
 src/mesa/state_tracker/st_draw.h             |    6 +-
 src/mesa/state_tracker/st_draw_feedback.c    |    3 +-
 src/mesa/tnl/tnl.h                           |    3 +-
 src/mesa/vbo/vbo.h                           |    5 +-
 src/mesa/vbo/vbo_exec_array.c                |  255 +-
 src/mesa/vbo/vbo_exec_draw.c                 |    2 +-
 src/mesa/vbo/vbo_primitive_restart.c         |    4 +-
 src/mesa/vbo/vbo_rebase.c                    |    2 +-
 src/mesa/vbo/vbo_save_api.c                  |   53 ++
 src/mesa/vbo/vbo_save_draw.c                 |    2 +-
 src/mesa/vbo/vbo_split_copy.c                |    2 +-
 src/mesa/vbo/vbo_split_inplace.c             |    2 +-
 30 files changed, 611 insertions(+), 28 deletions(-)

diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index 36e47e2..243c148 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -96,6 +96,7 @@ API_XML = \
 	ARB_depth_clamp.xml \
 	ARB_draw_buffers_blend.xml \
 	ARB_draw_elements_base_vertex.xml \
+	ARB_draw_indirect.xml \
 	ARB_draw_instanced.xml \
 	ARB_ES2_compatibility.xml \
 	ARB_ES3_compatibility.xml \
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index df95924..f22fdac 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8240,6 +8240,8 @@

 <!-- ARB extensions #86...#93 -->

+<xi:include href="ARB_draw_indirect.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
+
 <category name="GL_ARB_transform_feedback3" number="94">
   <enum name="MAX_TRANSFORM_FEEDBACK_BUFFERS" value="0x8E70"/>
   <enum name="MAX_VERTEX_STREAMS" value="0x8E71"/>
@@ -8317,7 +8319,7 @@

 <xi:include href="ARB_invalidate_subdata.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>

-<!-- ARB extensions #133...#138 -->
+<!-- ARB extensions #134...#138 -->

 <xi:include href="ARB_texture_buffer_range.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index 809bcc5..d0c8415 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -548,7 +548,8 @@ void brw_draw_prims( struct gl_context *ctx,
 		     GLboolean index_bounds_valid,
 		     GLuint min_index,
 		     GLuint max_index,
-		     struct gl_transform_feedback_object *tfb_vertcount )
+		     struct gl_transform_feedback_object *tfb_vertcount,
+		     struct gl_buffer_object *indirect )
 {
    struct intel_context *intel = intel_context(ctx);
    const struct gl_client_array **arrays = ctx->Array._DrawArrays;
diff --git a/src/mesa/drivers/dri/i965/brw_draw.h b/src/mesa/drivers/dri/i965/brw_draw.h
index d86a9e7..3dfac2e 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.h
+++ b/src/mesa/drivers/dri/i965/brw_draw.h
@@ -41,7 +41,8 @@ void brw_draw_prims( struct gl_context *ctx,
 		     GLboolean index_bounds_valid,
 		     GLuint min_index,
 		     GLuint max_index,
-		     struct gl_transform_feedback_object *tfb_vertcount );
+		     struct gl_transform_feedback_object *tfb_vertcount,
+		     struct gl_buffer_object *indirect );

 void brw_draw_init( struct brw_context *brw );
 void brw_draw_destroy( struct brw_context *brw );
diff --git a/src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c b/src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c
index 436db32..4dee0b8 100644
--- a/src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c
+++ b/src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c
@@ -222,7 +222,8 @@ TAG(vbo_render_prims)(struct gl_context *ctx,
 		      const struct _mesa_index_buffer *ib,
 		      GLboolean index_bounds_valid,
 		      GLuint min_index, GLuint max_index,
-		      struct gl_transform_feedback_object *tfb_vertcount);
+
[Mesa-dev] [PATCH 4/4] st/mesa: add support for indirect drawing
---
 src/mesa/state_tracker/st_cb_bufferobjects.c |    3 +++
 src/mesa/state_tracker/st_draw.c             |   11 ++++++++++-
 src/mesa/state_tracker/st_extensions.c       |    4 +++-
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_bufferobjects.c b/src/mesa/state_tracker/st_cb_bufferobjects.c
index 8ff32c8..5a44bf2 100644
--- a/src/mesa/state_tracker/st_cb_bufferobjects.c
+++ b/src/mesa/state_tracker/st_cb_bufferobjects.c
@@ -205,6 +205,9 @@ st_bufferobj_data(struct gl_context *ctx,
    case GL_UNIFORM_BUFFER:
       bind = PIPE_BIND_CONSTANT_BUFFER;
       break;
+   case GL_DRAW_INDIRECT_BUFFER:
+      bind = PIPE_BIND_COMMAND_BUFFER;
+      break;
    default:
       bind = 0;
    }
diff --git a/src/mesa/state_tracker/st_draw.c b/src/mesa/state_tracker/st_draw.c
index ee1c902..f1379ab 100644
--- a/src/mesa/state_tracker/st_draw.c
+++ b/src/mesa/state_tracker/st_draw.c
@@ -256,6 +256,14 @@ st_draw_vbo(struct gl_context *ctx,
       }
    }

+   if (indirect) {
+      info.indirect = st_buffer_object(indirect)->buffer;
+
+      /* Primitive restart is not handled by the VBO module in this case. */
+      info.primitive_restart = ctx->Array._PrimitiveRestart;
+      info.restart_index = ctx->Array._RestartIndex;
+   }
+
    /* do actual drawing */
    for (i = 0; i < nr_prims; i++) {
       info.mode = translate_prim( ctx, prims[i].mode );
@@ -268,6 +276,7 @@ st_draw_vbo(struct gl_context *ctx,
          info.min_index = info.start;
          info.max_index = info.start + info.count - 1;
       }
+      info.indirect_offset = prims[i].indirect_offset;

       if (ST_DEBUG & DEBUG_DRAW) {
          debug_printf("st/draw: mode %s start %u count %u indexed %d\n",
@@ -277,7 +286,7 @@ st_draw_vbo(struct gl_context *ctx,
                       info.indexed);
       }

-      if (info.count_from_stream_output) {
+      if (info.count_from_stream_output || info.indirect) {
          cso_draw_vbo(st->cso_context, &info);
       }
       else if (info.primitive_restart) {
diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index 11db9d3..c021cda 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -398,7 +398,9 @@ void st_init_extensions(struct st_context *st)
       { o(MESA_texture_array), PIPE_CAP_MAX_TEXTURE_ARRAY_LAYERS },

       { o(OES_standard_derivatives), PIPE_CAP_SM3 },
-      { o(ARB_texture_cube_map_array), PIPE_CAP_CUBE_MAP_ARRAY }
+      { o(ARB_texture_cube_map_array), PIPE_CAP_CUBE_MAP_ARRAY },
+      { o(ARB_draw_indirect),PIPE_CAP_DRAW_INDIRECT },
+      { o(ARB_multi_draw_indirect), PIPE_CAP_DRAW_INDIRECT }
    };

    /* Required: render target and sampler support */
--
1.7.3.4
[Mesa-dev] [PATCH 3/4] gallium: add facilities for indirect drawing
---
 src/gallium/auxiliary/util/u_draw.c              |   39 ++
 src/gallium/auxiliary/util/u_draw.h              |    5 +++
 src/gallium/auxiliary/util/u_dump_state.c        |    3 ++
 src/gallium/docs/source/screen.rst               |    3 ++
 src/gallium/drivers/freedreno/freedreno_screen.c |    1 +
 src/gallium/drivers/i915/i915_screen.c           |    1 +
 src/gallium/drivers/llvmpipe/lp_draw_arrays.c    |    5 +++
 src/gallium/drivers/llvmpipe/lp_screen.c         |    2 +
 src/gallium/drivers/nv30/nv30_screen.c           |    1 +
 src/gallium/drivers/nv50/nv50_screen.c           |    2 +
 src/gallium/drivers/r300/r300_screen.c           |    1 +
 src/gallium/drivers/r600/r600_pipe.c             |    1 +
 src/gallium/drivers/radeonsi/radeonsi_pipe.c     |    1 +
 src/gallium/drivers/softpipe/sp_draw_arrays.c    |    6 +++
 src/gallium/drivers/softpipe/sp_screen.c         |    2 +
 src/gallium/drivers/svga/svga_screen.c           |    1 +
 src/gallium/drivers/trace/tr_dump_state.c        |    3 ++
 src/gallium/include/pipe/p_defines.h             |    3 +-
 src/gallium/include/pipe/p_state.h               |   22
 19 files changed, 101 insertions(+), 1 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_draw.c b/src/gallium/auxiliary/util/u_draw.c
index 83d9284..7a28cf1 100644
--- a/src/gallium/auxiliary/util/u_draw.c
+++ b/src/gallium/auxiliary/util/u_draw.c
@@ -27,6 +27,7 @@


 #include "util/u_debug.h"
+#include "util/u_inlines.h"
 #include "util/u_math.h"
 #include "util/u_format.h"
 #include "util/u_draw.h"
@@ -123,3 +124,41 @@ util_draw_max_index(

    return max_index + 1;
 }
+
+
+void
+util_draw_indirect(struct pipe_context *pipe,
+                   const struct pipe_draw_info *_info)
+{
+   struct pipe_draw_info info;
+   struct pipe_transfer *transfer;
+   uint32_t *params;
+
+   assert(_info->indirect);
+   assert(!_info->count_from_stream_output);
+
+   memcpy(&info, _info, sizeof(info));
+
+   params = (uint32_t *)
+      pipe_buffer_map_range(pipe,
+                            _info->indirect,
+                            _info->indirect_offset,
+                            _info->indexed ? (4 * 4) : (3 * 4),
+                            PIPE_TRANSFER_READ,
+                            &transfer);
+   if (!transfer) {
+      debug_printf("%s: failed to map indirect buffer\n", __FUNCTION__);
+      return;
+   }
+
+   info.count = params[0];
+   info.instance_count = params[1];
+   info.start = params[2];
+   info.index_bias = _info->indexed ? params[3] : 0;
+   info.start_instance = _info->indexed ? params[4] : params[3];
+   info.indirect = NULL;
+
+   pipe_buffer_unmap(pipe, transfer);
+
+   pipe->draw_vbo(pipe, &info);
+}
diff --git a/src/gallium/auxiliary/util/u_draw.h b/src/gallium/auxiliary/util/u_draw.h
index 3dc6918..acec56e 100644
--- a/src/gallium/auxiliary/util/u_draw.h
+++ b/src/gallium/auxiliary/util/u_draw.h
@@ -142,6 +142,11 @@ util_draw_range_elements(struct pipe_context *pipe,
 }


+void
+util_draw_indirect(struct pipe_context *pipe,
+                   const struct pipe_draw_info *info);
+
+
 unsigned
 util_draw_max_index(
       const struct pipe_vertex_buffer *vertex_buffers,
diff --git a/src/gallium/auxiliary/util/u_dump_state.c b/src/gallium/auxiliary/util/u_dump_state.c
index 2f28f3c..21b6044 100644
--- a/src/gallium/auxiliary/util/u_dump_state.c
+++ b/src/gallium/auxiliary/util/u_dump_state.c
@@ -758,6 +758,9 @@ util_dump_draw_info(FILE *stream, const struct pipe_draw_info *state)

    util_dump_member(stream, ptr, state, count_from_stream_output);

+   util_dump_member(stream, ptr, state, indirect);
+   util_dump_member(stream, uint, state, indirect_offset);
+
    util_dump_struct_end(stream);
 }

diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
index f8cdded..ed4749d 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -151,6 +151,9 @@ The integer capabilities:
   dedicated memory should return 1 and all software rasterizers should return 0.
 * ``PIPE_CAP_QUERY_PIPELINE_STATISTICS``: Whether PIPE_QUERY_PIPELINE_STATISTICS
   is supported.
+* ``PIPE_CAP_DRAW_INDIRECT``: Whether the driver supports taking draw arguments
+  { count, instance_count, start, index_bias } from a PIPE_BUFFER resource.
+  See pipe_draw_info.

 .. _pipe_capf:

diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c
index 283d07f..2b13e29 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -200,6 +200,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
 	case PIPE_CAP_USER_VERTEX_BUFFERS:
 	case PIPE_CAP_USER_INDEX_BUFFERS:
 	case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
+	case PIPE_CAP_DRAW_INDIRECT:
 		return 0;

 	/* Stream output. */
diff --git a/src/gallium/drivers/i915/i915_screen.c
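As a rough sketch of the parameter layout that util_draw_indirect reads: the struct and function names below are hypothetical, and the sketch assumes the buffer holds five 32-bit words for indexed draws and four otherwise, which is what the params[] indices in the patch imply.

```c
#include <stdint.h>

/* Hypothetical container for the decoded draw arguments (not a Mesa type).
 * index_bias is kept as the raw 32-bit word read from the buffer.
 */
struct sketch_draw_args {
   uint32_t count;
   uint32_t instance_count;
   uint32_t start;
   uint32_t index_bias;
   uint32_t start_instance;
};

/* Decode the words of an indirect draw command, following the same
 * index mapping as the params[] reads in util_draw_indirect.
 * Assumption: params points at 5 words if indexed, 4 words otherwise.
 */
static struct sketch_draw_args
sketch_decode_indirect(const uint32_t *params, int indexed)
{
   struct sketch_draw_args args;

   args.count = params[0];
   args.instance_count = params[1];
   args.start = params[2];
   /* Only indexed draws carry an index bias; it occupies word 3. */
   args.index_bias = indexed ? params[3] : 0u;
   /* The start instance follows the bias, or takes its place if absent. */
   args.start_instance = indexed ? params[4] : params[3];
   return args;
}
```

For a non-indexed command { 6, 2, 10, 3 } this yields count 6, instance_count 2, start 10, index_bias 0 and start_instance 3; for an indexed command { 6, 2, 10, 7, 3 } the bias becomes 7 and the start instance stays 3.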
[Mesa-dev] [PATCH] mesa: implement GL_ARB_draw_indirect (added missing ARB_draw_indirect.xml)
---
 src/mapi/glapi/gen/ARB_draw_indirect.xml     |   45 +
 src/mapi/glapi/gen/Makefile.am               |    1 +
 src/mapi/glapi/gen/gl_API.xml                |    4 +-
 src/mesa/drivers/dri/i965/brw_draw.c         |    3 +-
 src/mesa/drivers/dri/i965/brw_draw.h         |    3 +-
 src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c |    9 +-
 src/mesa/main/api_validate.c                 |  159
 src/mesa/main/api_validate.h                 |   26 +++
 src/mesa/main/bufferobj.c                    |    9 +
 src/mesa/main/dd.h                           |   12 ++
 src/mesa/main/dlist.c                        |   41
 src/mesa/main/extensions.c                   |    2 +
 src/mesa/main/get.c                          |    5 +
 src/mesa/main/get_hash_params.py             |    2 +
 src/mesa/main/mtypes.h                       |    4 +
 src/mesa/main/tests/dispatch_sanity.cpp      |    8 +-
 src/mesa/main/vtxfmt.c                       |    7 +
 src/mesa/state_tracker/st_cb_rasterpos.c     |    2 +-
 src/mesa/state_tracker/st_draw.c             |    3 +-
 src/mesa/state_tracker/st_draw.h             |    6 +-
 src/mesa/state_tracker/st_draw_feedback.c    |    3 +-
 src/mesa/tnl/tnl.h                           |    3 +-
 src/mesa/vbo/vbo.h                           |    5 +-
 src/mesa/vbo/vbo_exec_array.c                |  255 +-
 src/mesa/vbo/vbo_exec_draw.c                 |    2 +-
 src/mesa/vbo/vbo_primitive_restart.c         |    4 +-
 src/mesa/vbo/vbo_rebase.c                    |    2 +-
 src/mesa/vbo/vbo_save_api.c                  |   53 ++
 src/mesa/vbo/vbo_save_draw.c                 |    2 +-
 src/mesa/vbo/vbo_split_copy.c                |    2 +-
 src/mesa/vbo/vbo_split_inplace.c             |    2 +-
 31 files changed, 656 insertions(+), 28 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_draw_indirect.xml

diff --git a/src/mapi/glapi/gen/ARB_draw_indirect.xml b/src/mapi/glapi/gen/ARB_draw_indirect.xml
new file mode 100644
index 0000000..7de03cd
--- /dev/null
+++ b/src/mapi/glapi/gen/ARB_draw_indirect.xml
@@ -0,0 +1,45 @@
+<?xml version="1.0"?>
+<!DOCTYPE OpenGLAPI SYSTEM "gl_API.dtd">
+
+<OpenGLAPI>
+
+<category name="GL_ARB_draw_indirect" number="87">
+
+    <enum name="DRAW_INDIRECT_BUFFER" value="0x8F3F"/>
+    <enum name="DRAW_INDIRECT_BUFFER_BINDING" value="0x8F43"/>
+
+    <function name="DrawArraysIndirect" offset="assign" exec="dynamic">
+        <param name="mode" type="GLenum"/>
+        <param name="indirect" type="const GLvoid *"/>
+    </function>
+
+    <function name="DrawElementsIndirect" offset="assign" exec="dynamic">
+        <param name="mode" type="GLenum"/>
+        <param name="type" type="GLenum"/>
+        <param name="indirect" type="const GLvoid *"/>
+    </function>
+
+</category>
+
+
+<category name="GL_ARB_multi_draw_indirect" number="133">
+
+    <function name="MultiDrawArraysIndirect" offset="assign" exec="dynamic">
+        <param name="mode" type="GLenum"/>
+        <param name="indirect" type="const GLvoid *"/>
+        <param name="primcount" type="GLsizei"/>
+        <param name="stride" type="GLsizei"/>
+    </function>
+
+    <function name="MultiDrawElementsIndirect" offset="assign" exec="dynamic">
+        <param name="mode" type="GLenum"/>
+        <param name="type" type="GLenum"/>
+        <param name="indirect" type="const GLvoid *"/>
+        <param name="primcount" type="GLsizei"/>
+        <param name="stride" type="GLsizei"/>
+    </function>
+
+</category>
+
+
+</OpenGLAPI>
diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index 36e47e2..243c148 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -96,6 +96,7 @@ API_XML = \
 	ARB_depth_clamp.xml \
 	ARB_draw_buffers_blend.xml \
 	ARB_draw_elements_base_vertex.xml \
+	ARB_draw_indirect.xml \
 	ARB_draw_instanced.xml \
 	ARB_ES2_compatibility.xml \
 	ARB_ES3_compatibility.xml \
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index df95924..f22fdac 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8240,6 +8240,8 @@

 <!-- ARB extensions #86...#93 -->

+<xi:include href="ARB_draw_indirect.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
+
 <category name="GL_ARB_transform_feedback3" number="94">
   <enum name="MAX_TRANSFORM_FEEDBACK_BUFFERS" value="0x8E70"/>
   <enum name="MAX_VERTEX_STREAMS" value="0x8E71"/>
@@ -8317,7 +8319,7 @@

 <xi:include href="ARB_invalidate_subdata.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>

-<!-- ARB extensions #133...#138 -->
+<!-- ARB extensions #134...#138 -->

 <xi:include href="ARB_texture_buffer_range.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index 809bcc5..d0c8415 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -548,7 +548,8 @@ void brw_draw_prims( struct gl_context *ctx,
 		     GLboolean index_bounds_valid,
 		     GLuint min_index,
Re: [Mesa-dev] [PATCH 2/4] gallium: add PIPE_BIND_COMMAND_BUFFER
On 04.04.2013 21:44, Jose Fonseca wrote:
> I think that PIPE_BIND_INDIRECT_BUFFER would be more self-descriptive.
> Or do you envision other uses of such buffer?

It's possible that at some point we add a mechanism to let the driver
store arbitrary commands into a buffer created by the st, or have
resources used as arguments for conditional rendering ...

Lots of possibilities, but nothing concrete, and for the command lists
like with D3D's deferred contexts we'd probably return opaque objects
that can contain more auxiliary data.

I like it to be more generic, but then it could turn out that there are
different requirements on these command source buffers in the future ...

I'm undecided now.

> Jose
>
> ----- Original Message -----
>> Intended for use with GL_ARB_draw_indirect's DRAW_INDIRECT_BUFFER
>> target or for D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS.
>> ---
>>  src/gallium/docs/source/screen.rst   |    2 ++
>>  src/gallium/include/pipe/p_defines.h |    1 +
>>  2 files changed, 3 insertions(+), 0 deletions(-)
>>
>> diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
>> index c1a3c0b..f8cdded 100644
>> --- a/src/gallium/docs/source/screen.rst
>> +++ b/src/gallium/docs/source/screen.rst
>> @@ -306,6 +306,8 @@ resources might be created and handled quite differently.
>>    bound to the graphics pipeline as a shader resource.
>>  * ``PIPE_BIND_COMPUTE_RESOURCE``: A buffer or texture that can be
>>    bound to the compute program as a shader resource.
>> +* ``PIPE_BIND_COMMAND_BUFFER``: A buffer that may be sourced by the
>> +  GPU command processor, like with indirect drawing.
>>
>>  .. _pipe_usage:
>>
>> diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h
>> index 5b00acc..2b79f2a 100644
>> --- a/src/gallium/include/pipe/p_defines.h
>> +++ b/src/gallium/include/pipe/p_defines.h
>> @@ -315,6 +315,7 @@ enum pipe_flush_flags {
>>  #define PIPE_BIND_GLOBAL               (1 << 18) /* set_global_binding */
>>  #define PIPE_BIND_SHADER_RESOURCE      (1 << 19) /* set_shader_resources */
>>  #define PIPE_BIND_COMPUTE_RESOURCE     (1 << 20) /* set_compute_resources */
>> +#define PIPE_BIND_COMMAND_BUFFER       (1 << 21) /* pipe_draw_info.indirect */
>>
>>  /* The first two flags above were previously part of the amorphous
>>   * TEXTURE_USAGE, most of which are now descriptions of the ways a
>> --
>> 1.7.3.4
Re: [Mesa-dev] [PATCH] st/mesa: fix bitmap, drawpix, drawtex for PIPE_CAP_TGSI_TEXCOORD
On 02.04.2013 16:39, Brian Paul wrote:
> On 03/30/2013 08:11 AM, Christoph Bumiller wrote:
>> NOTE: Changed the semantic index for the drawtex coordinate to be the texture unit index instead of always 0. Not sure if this is correct, but since the value seems to depend on the unit it would make sense to use different varying slots.
>
> Tested-by: Brian Paul <bri...@vmware.com>

Thanks!

Just to be sure, you're referring to the part that changes the semantic index so that TEX0..7 (max units) is used instead of always TEX0, right? I'll push that as a separate patch then.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] st/mesa: fix bitmap, drawpix, drawtex for PIPE_CAP_TGSI_TEXCOORD
NOTE: Changed the semantic index for the drawtex coordinate to be the texture unit index instead of always 0. Not sure if this is correct, but since the value seems to depend on the unit it would make sense to use different varying slots.
---
 src/mesa/state_tracker/st_cb_bitmap.c     |    1 +
 src/mesa/state_tracker/st_cb_drawpixels.c |    5 ++++-
 src/mesa/state_tracker/st_cb_drawtex.c    |    5 +++--
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_bitmap.c b/src/mesa/state_tracker/st_cb_bitmap.c
index bae9ff8..0513814 100644
--- a/src/mesa/state_tracker/st_cb_bitmap.c
+++ b/src/mesa/state_tracker/st_cb_bitmap.c
@@ -766,6 +766,7 @@ st_Bitmap(struct gl_context *ctx, GLint x, GLint y,
       /* create pass-through vertex shader now */
       const uint semantic_names[] = { TGSI_SEMANTIC_POSITION,
                                       TGSI_SEMANTIC_COLOR,
+        st->needs_texcoord_semantic ? TGSI_SEMANTIC_TEXCOORD :
                                       TGSI_SEMANTIC_GENERIC };
       const uint semantic_indexes[] = { 0, 0, 0 };
       st->bitmap.vs = util_make_vertex_passthrough_shader(st->pipe, 3,
diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c
index f0baa34..b25b776 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -294,6 +294,9 @@ static void *
 make_passthrough_vertex_shader(struct st_context *st,
                                GLboolean passColor)
 {
+   const unsigned texcoord_semantic = st->needs_texcoord_semantic ?
+      TGSI_SEMANTIC_TEXCOORD : TGSI_SEMANTIC_GENERIC;
+
    if (!st->drawpix.vert_shaders[passColor]) {
       struct ureg_program *ureg = ureg_create( TGSI_PROCESSOR_VERTEX );

@@ -307,7 +310,7 @@ make_passthrough_vertex_shader(struct st_context *st,

       /* MOV result.texcoord0, vertex.attr[1]; */
       ureg_MOV(ureg,
-               ureg_DECL_output( ureg, TGSI_SEMANTIC_GENERIC, 0 ),
+               ureg_DECL_output( ureg, texcoord_semantic, 0 ),
                ureg_DECL_vs_input( ureg, 1 ));

       if (passColor) {
diff --git a/src/mesa/state_tracker/st_cb_drawtex.c b/src/mesa/state_tracker/st_cb_drawtex.c
index a8806c9..fc1cb7d 100644
--- a/src/mesa/state_tracker/st_cb_drawtex.c
+++ b/src/mesa/state_tracker/st_cb_drawtex.c
@@ -209,8 +209,9 @@ st_DrawTex(struct gl_context *ctx, GLfloat x, GLfloat y, GLfloat z,
             SET_ATTRIB(2, attr, s1, t1, 0.0f, 1.0f);  /* upper right */
             SET_ATTRIB(3, attr, s0, t1, 0.0f, 1.0f);  /* upper left */

-            semantic_names[attr] = TGSI_SEMANTIC_GENERIC;
-            semantic_indexes[attr] = 0;
+            semantic_names[attr] = st->needs_texcoord_semantic ?
+               TGSI_SEMANTIC_TEXCOORD : TGSI_SEMANTIC_GENERIC;
+            semantic_indexes[attr] = i;

             attr++;
          }
--
1.7.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
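The selection this patch repeats in three places can be sketched in isolation as follows. This is only a minimal illustration: the enum values and the names `example_semantic`, `example_output` and `example_texcoord_output` are made up for the sketch (the real semantics live in p_shader_tokens.h), and the flag is assumed to mirror what the driver reports for PIPE_CAP_TGSI_TEXCOORD.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; NOT the real TGSI_SEMANTIC_x numbering. */
enum example_semantic {
   EXAMPLE_SEMANTIC_GENERIC,
   EXAMPLE_SEMANTIC_TEXCOORD
};

/* Hypothetical (semantic name, semantic index) pair for one output. */
struct example_output {
   enum example_semantic name;
   unsigned index;
};

/* Pick the semantic for the texture coordinate of texture unit 'unit'. */
static struct example_output
example_texcoord_output(bool needs_texcoord_semantic, unsigned unit)
{
   struct example_output out;

   assert(unit < 8); /* assumption: at most 8 units, as in TEX0..7 */
   out.name = needs_texcoord_semantic ? EXAMPLE_SEMANTIC_TEXCOORD
                                      : EXAMPLE_SEMANTIC_GENERIC;
   out.index = unit; /* per-unit index, as in the drawtex change */
   return out;
}
```

Using the unit as the semantic index gives each unit its own varying slot, which is the behaviour the NOTE asks reviewers to confirm.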
Re: [Mesa-dev] [PATCH] r600g: fix range handling for tgsi input/output declarations
On 29.03.2013 10:56, Christian König wrote:
> On 28.03.2013 20:34, Vadim Girlin wrote:
>> On 03/28/2013 01:01 PM, Christian König wrote:
>>> On 27.03.2013 20:37, Vadim Girlin wrote:
>>>> Signed-off-by: Vadim Girlin <vadimgir...@gmail.com>
>>>> ---
>>>>  src/gallium/drivers/r600/r600_shader.c | 19 +++++++++++++++----
>>>>  1 file changed, 15 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
>>>> index 29facf7..d4c9c03 100644
>>>> --- a/src/gallium/drivers/r600/r600_shader.c
>>>> +++ b/src/gallium/drivers/r600/r600_shader.c
>>>> @@ -874,12 +874,12 @@ static int select_twoside_color(struct r600_shader_ctx *ctx, int front, int back
>>>>  static int tgsi_declaration(struct r600_shader_ctx *ctx)
>>>>  {
>>>>  	struct tgsi_full_declaration *d = &ctx->parse.FullToken.FullDeclaration;
>>>> -	unsigned i;
>>>> -	int r;
>>>> +	int r, i, j, count = d->Range.Last - d->Range.First + 1;
>>>>
>>>>  	switch (d->Declaration.File) {
>>>>  	case TGSI_FILE_INPUT:
>>>> -		i = ctx->shader->ninput++;
>>>> +		i = ctx->shader->ninput;
>>>> +		ctx->shader->ninput += count;
>>>>  		ctx->shader->input[i].name = d->Semantic.Name;
>>>>  		ctx->shader->input[i].sid = d->Semantic.Index;
>>>>  		ctx->shader->input[i].interpolate = d->Interp.Interpolate;
>>>> @@ -903,9 +903,15 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
>>>>  				return r;
>>>>  			}
>>>>  		}
>>>> +		for (j = 1; j < count; ++j) {
>>>> +			memcpy(&ctx->shader->input[i + j], &ctx->shader->input[i],
>>>> +			       sizeof(struct r600_shader_io));
>>>
>>> Instead of memcpy, shouldn't an assignment do the trick here as well?
>>
>> Yes, assignment should work fine; I just tend to use memcpy in such cases for some reason. I'll replace memcpy with assignment.
>>
>> Also I think the second part (outputs handling) can be dropped for now. Currently we only need to handle the inputs (for HUD shaders), and later, when array declarations for inputs/outputs are implemented in TGSI, we'll probably need to update the parser in r600g anyway. I'm just not sure yet how the semantic indices should be handled for input/output arrays.

The semantic indices are sequential, obviously. It gets more complex with scalar arrays, but you don't have to worry about that in r600 because I'd probably add a cap for those.

Example:
If you declare an
  out float a[8] layout(location = k)
in GLSL (as per ARB_separate_shader_objects), the 8 values are counted as consuming 8 consecutive vec4 slots (here the user is responsible for packing, nice!). The location will be communicated via the semantic index. You'd get
  DCL OUT[n..n+7] GENERIC[k]
(or k+some_constant_offset because of st/mesa's allocation policy).
If the consuming shader declares
  in float b[4] layout(location = k+4)
you'd get
  DCL IN[m..m+3] GENERIC[k+4]
and this has to link with the upper 4 components of the a[8] output.

> Yeah, the uncertainty about semantic IDs was one of the reasons I didn't want to do input/output arrays in the initial array implementation.
>
> If those changes are the only ones, then v2 of the patch is:
>
> Reviewed-by: Christian König <christian.koe...@amd.com>
>
> Christian.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
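The review outcome in this thread (expand a ranged declaration into one entry per register, copying by struct assignment rather than memcpy) can be sketched as below. This is a simplified stand-in, not the r600g code: `example_io`, `example_shader` and `example_declare_inputs` are hypothetical, and giving each copy a consecutive semantic index is an assumption taken from the remark that the indices are sequential, not something the quoted patch does.

```c
#include <assert.h>

#define EXAMPLE_MAX_IO 32

/* Hypothetical, trimmed-down version of a shader input record. */
struct example_io {
   unsigned name; /* semantic name */
   unsigned sid;  /* semantic index */
};

struct example_shader {
   unsigned ninput;
   struct example_io input[EXAMPLE_MAX_IO];
};

/* Append entries for registers first..last; returns the first slot used. */
static unsigned
example_declare_inputs(struct example_shader *sh, unsigned first,
                       unsigned last, unsigned name, unsigned sid)
{
   unsigned count, i, j;

   assert(last >= first);
   count = last - first + 1;
   i = sh->ninput;
   assert(i + count <= EXAMPLE_MAX_IO);
   sh->ninput += count;

   sh->input[i].name = name;
   sh->input[i].sid = sid;

   for (j = 1; j < count; ++j) {
      /* Plain struct assignment copies every field, like the memcpy did. */
      sh->input[i + j] = sh->input[i];
      /* Assumption: consecutive registers get consecutive indices. */
      sh->input[i + j].sid = sid + j;
   }
   return i;
}
```

With sequential indices, a consumer declaring `GENERIC[k+4]` lines up with the fifth register of a producer's `OUT[n..n+7] GENERIC[k]` range, which is the linking case described in the example.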
[Mesa-dev] [PATCH] gallium: add PIPE_CAP_QUERY_PIPELINE_STATISTICS
---
 src/gallium/docs/source/screen.rst               |    2 ++
 src/gallium/drivers/freedreno/freedreno_screen.c |    1 +
 src/gallium/drivers/i915/i915_screen.c           |    1 +
 src/gallium/drivers/llvmpipe/lp_screen.c         |    2 ++
 src/gallium/drivers/nv30/nv30_screen.c           |    1 +
 src/gallium/drivers/nv50/nv50_screen.c           |    2 ++
 src/gallium/drivers/nvc0/nvc0_screen.c           |    1 +
 src/gallium/drivers/r300/r300_screen.c           |    1 +
 src/gallium/drivers/r600/r600_pipe.c             |    1 +
 src/gallium/drivers/radeonsi/radeonsi_pipe.c     |    1 +
 src/gallium/drivers/softpipe/sp_screen.c         |    2 ++
 src/gallium/include/pipe/p_defines.h             |    3 ++-
 12 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
index 8c7e86e..c1a3c0b 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -149,6 +149,8 @@ The integer capabilities:
   to use a blit to implement a texture transfer which needs format conversions
   and swizzling in state trackers. Generally, all hardware drivers with
   dedicated memory should return 1 and all software rasterizers should return 0.
+* ``PIPE_CAP_QUERY_PIPELINE_STATISTICS``: Whether PIPE_QUERY_PIPELINE_STATISTICS
+  is supported.


 .. _pipe_capf:
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c
index 79eef5e..283d07f 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -199,6 +199,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
 	case PIPE_CAP_VERTEX_COLOR_CLAMPED:
 	case PIPE_CAP_USER_VERTEX_BUFFERS:
 	case PIPE_CAP_USER_INDEX_BUFFERS:
+	case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
 		return 0;

 	/* Stream output. */
diff --git a/src/gallium/drivers/i915/i915_screen.c b/src/gallium/drivers/i915/i915_screen.c
index 13aa91c..54b2154 100644
--- a/src/gallium/drivers/i915/i915_screen.c
+++ b/src/gallium/drivers/i915/i915_screen.c
@@ -210,6 +210,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap cap)
    case PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION:
    case PIPE_CAP_START_INSTANCE:
    case PIPE_CAP_QUERY_TIMESTAMP:
+   case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
    case PIPE_CAP_TEXTURE_MULTISAMPLE:
    case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
       return 0;
diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c b/src/gallium/drivers/llvmpipe/lp_screen.c
index e8c6ab1..6700887 100644
--- a/src/gallium/drivers/llvmpipe/lp_screen.c
+++ b/src/gallium/drivers/llvmpipe/lp_screen.c
@@ -130,6 +130,8 @@ llvmpipe_get_param(struct pipe_screen *screen, enum pipe_cap param)
       return 0;
    case PIPE_CAP_QUERY_TIMESTAMP:
       return 1;
+   case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
+      return 0;
    case PIPE_CAP_TEXTURE_MIRROR_CLAMP:
       return 1;
    case PIPE_CAP_TEXTURE_SHADOW_MAP:
diff --git a/src/gallium/drivers/nv30/nv30_screen.c b/src/gallium/drivers/nv30/nv30_screen.c
index 4084869..e33710e 100644
--- a/src/gallium/drivers/nv30/nv30_screen.c
+++ b/src/gallium/drivers/nv30/nv30_screen.c
@@ -122,6 +122,7 @@ nv30_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
    case PIPE_CAP_TEXTURE_BUFFER_OBJECTS:
    case PIPE_CAP_TEXTURE_BUFFER_OFFSET_ALIGNMENT:
+   case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
       return 0;
    case PIPE_CAP_VERTEX_BUFFER_OFFSET_4BYTE_ALIGNED_ONLY:
    case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY:
diff --git a/src/gallium/drivers/nv50/nv50_screen.c b/src/gallium/drivers/nv50/nv50_screen.c
index 0a20ae3..53eeeb6 100644
--- a/src/gallium/drivers/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nv50/nv50_screen.c
@@ -189,6 +189,8 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
       return 0;
    case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
       return 1;
+   case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
+      return 0;
    default:
       NOUVEAU_ERR("unknown PIPE_CAP %d\n", param);
       return 0;
diff --git a/src/gallium/drivers/nvc0/nvc0_screen.c b/src/gallium/drivers/nvc0/nvc0_screen.c
index 5b9385a..3a32539 100644
--- a/src/gallium/drivers/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nvc0/nvc0_screen.c
@@ -136,6 +136,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_QUERY_TIME_ELAPSED:
    case PIPE_CAP_OCCLUSION_QUERY:
    case PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME:
+   case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
       return 1;
    case PIPE_CAP_MAX_STREAM_OUTPUT_BUFFERS:
       return 4;
diff --git a/src/gallium/drivers/r300/r300_screen.c b/src/gallium/drivers/r300/r300_screen.c
index bd16c3b..3175b3b 100644
--- a/src/gallium/drivers/r300/r300_screen.c
+++ b/src/gallium/drivers/r300/r300_screen.c
@@ -135,6
[Mesa-dev] [PATCH] gallium/hud: add support for PIPE_QUERY_PIPELINE_STATISTICS
Also, renamed "pixels-rendered" to "samples-passed" because the occlusion counter increments even if colour and depth writes are disabled, or (on some implementations) for killed fragments that passed the depth test when early_fragment_tests has been set for the PS.
---
 src/gallium/auxiliary/hud/hud_context.c      |   45 +++--
 src/gallium/auxiliary/hud/hud_cpu.c          |    6 ++-
 src/gallium/auxiliary/hud/hud_driver_query.c |    8 +++--
 src/gallium/auxiliary/hud/hud_private.h      |    1 +
 4 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/src/gallium/auxiliary/hud/hud_context.c b/src/gallium/auxiliary/hud/hud_context.c
index 60355ca..cfb58a8 100644
--- a/src/gallium/auxiliary/hud/hud_context.c
+++ b/src/gallium/auxiliary/hud/hud_context.c
@@ -90,6 +90,10 @@ struct hud_context {
       unsigned max_num_vertices;
       unsigned num_vertices;
    } text, bg, whitelines;
+
+   struct {
+      boolean query_pipeline_statistics;
+   } cap;
 };


@@ -716,15 +720,45 @@ hud_parse_env_var(struct hud_context *hud, const char *env)
          else if (sscanf(name, "cpu%u%s", &i, s) == 1) {
             hud_cpu_graph_install(pane, i);
          }
-         else if (strcmp(name, "pixels-rendered") == 0 &&
+         else if (strcmp(name, "samples-passed") == 0 &&
                   has_occlusion_query(hud->pipe->screen)) {
-            hud_pipe_query_install(pane, hud->pipe, "pixels-rendered",
-                                   PIPE_QUERY_OCCLUSION_COUNTER, 0, FALSE);
+            hud_pipe_query_install(pane, hud->pipe, "samples-passed",
+                                   PIPE_QUERY_OCCLUSION_COUNTER, 0, 0, FALSE);
          }
          else if (strcmp(name, "primitives-generated") == 0 &&
                   has_streamout(hud->pipe->screen)) {
             hud_pipe_query_install(pane, hud->pipe, "primitives-generated",
-                                   PIPE_QUERY_PRIMITIVES_GENERATED, 0, FALSE);
+                                   PIPE_QUERY_PRIMITIVES_GENERATED, 0, 0, FALSE);
+         }
+         else if (strncmp(name, "pipeline-statistics-", 20) == 0) {
+            if (hud->cap.query_pipeline_statistics) {
+               static const char *pipeline_statistics_names[] =
+               {
+                  "ia_vertices",
+                  "ia_primitives",
+                  "vs_invocations",
+                  "gs_invocations",
+                  "gs_primitives",
+                  "c_invocationd",
+                  "c_primitives",
+                  "ps_invocations",
+                  "hs_invocations",
+                  "ds_invocations",
+                  "cs_invocations"
+               };
+               for (i = 0; i < Elements(pipeline_statistics_names); ++i)
+                  if (strcmp(&name[20], pipeline_statistics_names[i]) == 0)
+                     break;
+               if (i < Elements(pipeline_statistics_names))
+                  hud_pipe_query_install(pane, hud->pipe, &name[20],
+                                         PIPE_QUERY_PIPELINE_STATISTICS, i,
+                                         0, FALSE);
+               else
+                  fprintf(stderr, "gallium_hud: invalid pipeline-statistics-*\n");
+            } else {
+               fprintf(stderr, "gallium_hud: PIPE_QUERY_PIPELINE_STATISTICS "
+                       "not supported by the driver\n");
+            }
          }
          else {
             if (!hud_driver_query_install(pane, hud->pipe, name)){
@@ -963,6 +997,9 @@ hud_create(struct pipe_context *pipe, struct cso_context *cso)

    LIST_INITHEAD(&hud->pane_list);

+   hud->cap.query_pipeline_statistics =
+      pipe->screen->get_param(pipe->screen, PIPE_CAP_QUERY_PIPELINE_STATISTICS);
+
    hud_parse_env_var(hud, env);
    return hud;
 }
diff --git a/src/gallium/auxiliary/hud/hud_cpu.c b/src/gallium/auxiliary/hud/hud_cpu.c
index dfd9f68..ce98115 100644
--- a/src/gallium/auxiliary/hud/hud_cpu.c
+++ b/src/gallium/auxiliary/hud/hud_cpu.c
@@ -32,6 +32,7 @@
 #include "os/os_time.h"
 #include "util/u_memory.h"
 #include <stdio.h>
+#include <inttypes.h>

 static boolean
 get_cpu_stats(unsigned cpu_index, uint64_t *busy_time, uint64_t *total_time)
@@ -55,8 +56,9 @@ get_cpu_stats(unsigned cpu_index, uint64_t *busy_time, uint64_t *total_time)
          int i, num;

          num = sscanf(line,
-                      "%s %llu %llu %llu %llu %llu %llu %llu %llu %llu "
-                      "%llu %llu %llu",
+                      "%s %"PRIu64" %"PRIu64" %"PRIu64" %"PRIu64" %"PRIu64
+                      " %"PRIu64" %"PRIu64" %"PRIu64" %"PRIu64" %"PRIu64
+                      " %"PRIu64" %"PRIu64"",
                       cpuname, &v[0], &v[1], &v[2], &v[3], &v[4], &v[5],
                       &v[6], &v[7], &v[8], &v[9], &v[10], &v[11]);
          if (num < 5) {
diff --git a/src/gallium/auxiliary/hud/hud_driver_query.c b/src/gallium/auxiliary/hud/hud_driver_query.c
index 798da50..413059c 100644
--- a/src/gallium/auxiliary/hud/hud_driver_query.c
+++ b/src/gallium/auxiliary/hud/hud_driver_query.c
@@ -42,6 +42,7 @@ struct query_info {
    struct pipe_context *pipe;
    unsigned query_type;
+   unsigned result_index; /* unit depends on query_type */

    /* Ring of queries. If a query is busy, we use
[Mesa-dev] [PATCH] gallium/docs: fix definition of PIPE_QUERY_SO_STATISTICS
---
 src/gallium/docs/source/context.rst |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/gallium/docs/source/context.rst b/src/gallium/docs/source/context.rst
index 9e57930..2cc1848 100644
--- a/src/gallium/docs/source/context.rst
+++ b/src/gallium/docs/source/context.rst
@@ -335,15 +335,17 @@ The result is a 64-bit integer specifying the timer resolution in Hz,
 followed by a boolean value indicating whether the timer has incremented.

 ``PIPE_QUERY_PRIMITIVES_GENERATED`` returns a 64-bit integer indicating
-the number of primitives processed by the pipeline.
+the number of primitives processed by the pipeline (regardless of whether
+stream output is active or not).

 ``PIPE_QUERY_PRIMITIVES_EMITTED`` returns a 64-bit integer indicating
 the number of primitives written to stream output buffers.

 ``PIPE_QUERY_SO_STATISTICS`` returns 2 64-bit integers corresponding to
-the results of
+the result of
 ``PIPE_QUERY_PRIMITIVES_EMITTED`` and
-``PIPE_QUERY_PRIMITIVES_GENERATED``, in this order.
+the number of primitives that would have been written to stream output buffers
+if they had infinite space available (primitives_storage_needed), in this order.

 ``PIPE_QUERY_SO_OVERFLOW_PREDICATE`` returns a boolean value indicating
 whether the stream output targets have overflowed as a result of the
--
1.7.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
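To make the corrected wording concrete: with the second value defined as the storage that would have been needed, the two values of PIPE_QUERY_SO_STATISTICS differ exactly when the stream output buffers ran out of space. The sketch below assumes a hypothetical result struct, `example_so_statistics`, that merely mirrors the two values in the documented order; it is not claimed to be the actual Gallium result type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result layout: the two 64-bit values, in documented order. */
struct example_so_statistics {
   uint64_t num_primitives_written;    /* same as PRIMITIVES_EMITTED */
   uint64_t primitives_storage_needed; /* written if space were infinite */
};

/* Overflow happened iff more storage was needed than was actually written. */
static bool
example_so_overflowed(const struct example_so_statistics *s)
{
   assert(s != NULL);
   return s->primitives_storage_needed > s->num_primitives_written;
}
```

Under the old wording (second value equal to PRIMITIVES_GENERATED) this comparison would not be reliable, since primitives are counted there regardless of whether stream output is active.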
Re: [Mesa-dev] [PATCH 0/5] Head-up display for Gallium DRI2 drivers
On 26.03.2013 12:18, Vadim Girlin wrote:
> On 03/26/2013 02:00 AM, Marek Olšák wrote:
>> On Mon, Mar 25, 2013 at 10:38 PM, Ondrej Holecek <aaa...@gmail.com> wrote:
>>> On Saturday 23 of March 2013 00:50:59 Marek Olšák wrote:
>>>> Hi everyone,
>>>>
>>>> one image is better than a thousand words: ...
>>>
>>> Hi,
>>>
>>> I tried your patches and hit a few problems. First, they do not apply cleanly on master, as they expect another patch of yours, "cso: add constant buffer save/restore feature for postprocessing", to be present. But I guess you are aware of that.
>>
>> Yes, I sent the patch to mesa-dev earlier.
>>
>>> The second problem is that when I build Mesa with the HUD on my 32-bit virtual machine, the HUD works (with a 32-bit app, of course). When I build it on 64-bit (both run the same up-to-date OS, openSUSE 12.3), the HUD does not work (with a 64-bit app). I managed to track it down to failed IMM instruction parsing during the hud_create function. It appears that the translate_ctx structure in tgsi_text_translate (file src/gallium/auxiliary/tgsi/tgsi_text.c) is not initialized to zeros on my 64-bit system; instead ctx.num_immediates is equal to 1 and hence triggers the "Immediates must be sorted" error. The following fixes the HUD for me (note that I really don't know whether I am breaking something else in Mesa here):
>>>
>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
>>> index 6b97bee..247ec75 100644
>>> --- a/src/gallium/auxiliary/tgsi/tgsi_text.c
>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
>>> @@ -1577,6 +1577,7 @@ tgsi_text_translate(
>>>     ctx.tokens = tokens;
>>>     ctx.tokens_cur = tokens;
>>>     ctx.tokens_end = tokens + num_tokens;
>>> +   ctx.num_immediates = 0;
>>>
>>>     if (!translate( &ctx ))
>>>        return FALSE;
>>
>> I've sent a fix for this a couple of days ago:
>> http://www.mail-archive.com/mesa-dev@lists.freedesktop.org/msg36038.html
>>
>>> The third issue is that on both the 32-bit and 64-bit builds, fonts are not displayed in the HUD. I see graphs and transparent background rectangles for text, but no text is visible. This one I have not solved yet.
>>
>> Your driver must support the I8_UNORM texture format.
>
> I think this may also be related to a TGSI declaration of vertex shader inputs that some drivers do not expect:
>
> DCL IN[0..1]

But this is in no way invalid; any driver that doesn't handle it is broken. Moreover, ideally, IN/OUT should follow the same array declaration and access semantics as TEMP. That's just not implemented yet because it's a bit more involved (WIP).

> At least r600g expects a separate declaration for each input, though fortunately it still works in this case because the parsed declarations of VS inputs aren't really used in r600g. I noticed exactly the same issue (missing text) with my r600-sb branch because it relies on the number of parsed inputs from r600g's tgsi translator. It's 1 in this case instead of 2, so the second input register is considered undefined and optimized away. I suspect that some other drivers may also handle this declaration incorrectly, and this may explain the issue.
>
> Vadim
>
>>> One last thought: is it intentional that when a wrong query is entered, the HUD graph is displayed but empty? Maybe some text like "wrong query XXX" would be a good hint. I know it is printed on stdout, but looking for warnings in chatty apps like openarena is a little tricky.
>>
>> Yes, it's intentional. I guess I can at least make it not draw an empty pane.
>>
>> Marek

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
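The 32-bit versus 64-bit difference reported in this thread is the usual symptom of a stack-allocated context struct whose fields are only partially assigned. A minimal sketch of the more defensive pattern, zeroing the whole struct before filling in the known fields, follows. `example_translate_ctx` and `example_ctx_init` are hypothetical and much smaller than the real translate_ctx in tgsi_text.c; the fix actually posted may look different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down parser context. */
struct example_translate_ctx {
   const char *text;
   const char *cur;
   unsigned num_immediates; /* must start at 0 for the sort check */
};

/* Zero every field first, so members added later also start out defined. */
static void
example_ctx_init(struct example_translate_ctx *ctx, const char *text)
{
   assert(ctx != NULL);
   memset(ctx, 0, sizeof(*ctx));
   ctx->text = text;
   ctx->cur = text;
}
```

Assigning `num_immediates = 0` alone, as in the quoted diff, fixes the observed failure; clearing the whole struct additionally covers any other member that is read before being written.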
[Mesa-dev] RFC: TGSI scalar arrays
Sorry, this has become longer than I anticipated ...

I've been toying with adding support for TGSI_FILE_INPUT/OUTPUT arrays because, since I cannot allocate varyings in the same order that the register index specifies, I need it:

=== EXAMPLE:
OUT[0], CLIPDIST[1], must be allocated at address 0x2c0 in hardware output space
OUT[1], CLIPDIST[0], 0x2d0
OUT[2], GENERIC[0], between 0x80 and 0x280
OUT[3], GENERIC[1], between 0x80 and 0x280

And without array specification
MOV OUT[TEMP[0].x-1], IMM[0]
would leave me no clue as to whether to use 0x80 or 0x2c0 as base address.
===

Now that I'm on it, I'm considering going a step further, which is adding indirect scalar/component access. This is motivated by float gl_ClipDistance[], which, if accessed indirectly, currently leaves us no choice but to generate code like this:

if ((index & 3) == 0) access x component; else
if ((index & 3) == 1) access y component; ...

This is undesirable and the hardware can do better (it actually supports accessing individual components, since address registers contain an address in bytes and we can do scalar read/write).

A second motivation is varying packing, which is required by the GL spec and may lead to the use of TEMP arrays, which, albeit improved now, will impair performance when used (on nv50 they go to uncached memory, which is very slow). That case occurs if, for instance, a varying float[8] is accessed indirectly and has to be packed into
OUT[0..1].xyzw, GENERIC[0..1]
instead of
OUT[0..7].x, GENERIC[0..7]

So far I've come up with 2 choices (all available only if the driver supports e.g. PIPE_CAP_TGSI_SCALAR_REGISTERS):

1. SCALAR DECLARATIONS

Using float gl_ClipDistance[8] as an example, it could be declared as:
OUT[0..7].x, CLIPDIST, ARRAY(1)
where the .x now means that it's a single component per OUT[index].

Now this obviously means that a single OUT[i] doesn't always consume 16 bytes / 4 components anymore, which may be somewhat disturbing, since the address of an output can't be directly inferred solely from its index anymore. However, that doesn't really constitute a problem if all access is either direct or comes with an ARRAY() reference.

For varying packing, which happens only for user-defined variables, and hence TGSI_SEMANTIC_GENERIC, it gets a bit uglier:

(NOTE: GL requires us to be able to support exactly the number of components we report; failing due to alignment is not allowed. Hence the GLSL compiler may put some variables at unaligned locations, see ir_variable.location_frac.)

A GENERIC semantic index should always cover 4 components so that a fixed location can be assigned for it (drivers usually do this since it makes an extra dynamic linkage pass when shaders are changed unnecessary, as intended by GL_ARB_separate_shader_objects).

So, this would be valid:
OUT[0..3].x, GENERIC[0]
OUT[4..5].xy, GENERIC[1]
OUT[6], GENERIC[2]
Note how 3 OUT[indices] only consume 1 GENERIC[index].

If we instead allocated a semantic index per register index instead of per 4 components, we would have:
OUT[0..3].x, GENERIC[0]
OUT[4..5].xy, GENERIC[4]
OUT[6], GENERIC[6]
This would waste space, since GENERIC[4,6] would have to go to output_space[addresses 0x40, 0x60] so it could link with
IN[6], GENERIC[6]
where we have no information about the size of GENERIC[0 .. 5], and wasting space like that means the advertised number of varying components cannot be satisfied.

And as a last step, if varyings are placed at non-vec4 boundaries, we would have to be able to specify fractional semantic indices, like this:
OUT[0..2].x, GENERIC[0].x
OUT[3].x, GENERIC[0].w

2. SCALAR ADDRESS REGISTER VALUES

All this can be avoided by always declaring full vec4s and adding the possibility of doing indirect addressing on a per-component basis:

varying float a[4] becomes:
OUT[0].xyzw, ARRAY(1)
uniform int i; a[i+5] = 999 becomes:
UARL_SCALAR ADDR[0].x, CONST[0].xxxx
MOV OUT(array 1)[ADDR[0].x+1].y, IMM[0].xxxx

The only difficulty with this is that we have to split TGSI instructions accessing unaligned vectors:
(NOTE: this can always be avoided with TGSI_FILE_TEMPORARY, but varyings may have to be packed.)

With suggestion (1), 2 packed (and hence unaligned) vec3 arrays and a single vec2 would look like this:
OUT[0..3].xyz, GENERIC[0].x
OUT[4..5].xyz, GENERIC[3].x
OUT[6].xy, GENERIC[4].zw
and we could still do:
ADD OUT[5].xyz, TEMP[0], TEMP[1]

Now, these would have to be merged and declared as:
OUT[0..4].xyzw
and the 2nd vec3 would be { OUT[0].w, OUT[1].xyz } instead of simply OUT[1].xyz.

A problem with this is that the GLSL compiler, while it can do the packing into vec4s and splitting up access, cannot, iirc, access individual components of a vec4 indirectly like TGSI would be able to. To avoid TEMP arrays we'd have to disable the last phase of varying packing (that actually converts the code to using vec4s). It would still be able to assign fractional locations to guarantee that linkage works, but glsl-to-tgsi would likely have
Re: [Mesa-dev] RFC: TGSI scalar arrays
On 20.03.2013 18:30, Roland Scheidegger wrote:
> On 20.03.2013 17:46, Christoph Bumiller wrote:
>> On 20.03.2013 17:05, Roland Scheidegger wrote:
>>> Not sure I fully understand this, but I'm thinking whenever in doubt, use something close to what dx10 does, since that's likely going to work reasonably with different hw. Maybe declaring those special values differently (not just as output reg) would help?
>>
>> What DX10 does is make indirect access of varyings illegal. That's not possible with OpenGL ...
>
> Hmm, I thought dcl_indexRange would be used for indirect access of varyings?

Interesting ... when I last tried that, back when working on d3d1x, the compiler didn't like it, and I remember something about indexRange existing only for debugging (and I remember finding that strange).

Also, d3d11 doesn't have the annoying limit that GLSL has, so there is no need for it to pack varyings. When I use floats[3] + SV_POSITION, I get "vs_5_0 output limit (32) exceeded, shader uses 33 outputs", but float4[28] works just fine.

For indirect access I still get:
error X3500: array reference cannot be used as an l-value; not natively addressable
for

struct IA2VS {
    float4 position : POSITION;
    float4 color : COLOR;
};
struct VS2PS {
    float4 position : SV_POSITION;
    float4 color[2] : WHATEVER;
};
VS2PS vs(IA2VS input)
{
    VS2PS result;
    int i = int(input.position.x);
    result.position = input.position;
    result.color[i] = input.color;
    return result;
}
float4 ps(VS2PS input) : SV_TARGET
{
    return input.color[0];
}

> Roland

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
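For the per-component addressing that this RFC thread is about, the arithmetic is small enough to sketch: a scalar element index splits into a vec4 register and a component, and on hardware whose address registers hold byte addresses it maps straight to an offset. The helpers below are hypothetical illustrations (`example_scalar_location`, `example_scalar_byte_address`), assuming 4-byte components packed four to a register; they do not describe any particular driver.

```c
#include <assert.h>

/* Register/component pair addressed by a scalar element index. */
struct example_location {
   unsigned reg;  /* vec4 register offset within the array */
   unsigned comp; /* 0 = x, 1 = y, 2 = z, 3 = w */
};

/* Split a scalar index: four components per vec4 register. */
static struct example_location
example_scalar_location(unsigned index)
{
   struct example_location loc;

   loc.reg = index >> 2;
   loc.comp = index & 3;
   return loc;
}

/* Byte address of element 'index' in an array starting at 'base'. */
static unsigned
example_scalar_byte_address(unsigned base, unsigned index)
{
   assert((base & 3) == 0); /* assumption: arrays start dword-aligned */
   return base + index * 4; /* assumption: 4 bytes per component */
}
```

With byte addressing the `(index & 3)` comparison chain is unnecessary, since one scalar store at the computed address replaces four conditional component writes.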
Re: [Mesa-dev] [PATCH] gallium: add TGSI_SEMANTIC_TEXCOORD, PCOORD v3
On 15.03.2013 22:16, Christoph Bumiller wrote:

This makes it possible to identify gl_TexCoord and gl_PointCoord for drivers where sprite coordinate replacement is restricted.

The new PIPE_CAP_TGSI_TEXCOORD decides whether these varyings should be hidden behind the GENERIC semantic or not. With this patch only nvc0 and nv30 will request that they be used.

v2: introduce a CAP so other drivers don't have to bother with the new semantic
v3: adapt to introduction gl_varying_slot enum

I would push this soon if there are no objections ...

---
 src/gallium/auxiliary/draw/draw_pipe_wide_point.c | 46 +
 src/gallium/auxiliary/tgsi/tgsi_dump.c | 1 +
 src/gallium/auxiliary/tgsi/tgsi_strings.c | 4 +-
 src/gallium/docs/source/cso/rasterizer.rst | 5 ++
 src/gallium/docs/source/screen.rst | 8
 src/gallium/docs/source/tgsi.rst | 29 +
 src/gallium/drivers/freedreno/freedreno_screen.c | 2 +
 src/gallium/drivers/i915/i915_screen.c | 2 +
 src/gallium/drivers/llvmpipe/lp_screen.c | 1 +
 src/gallium/drivers/nv30/nv30_screen.c | 1 +
 src/gallium/drivers/nv30/nvfx_fragprog.c | 42 ++-
 src/gallium/drivers/nv30/nvfx_vertprog.c | 7 +++-
 src/gallium/drivers/nv50/codegen/nv50_ir_driver.h | 2 -
 src/gallium/drivers/nv50/nv50_screen.c | 1 +
 src/gallium/drivers/nv50/nv50_surface.c | 5 +-
 src/gallium/drivers/nvc0/nvc0_program.c | 37 +---
 src/gallium/drivers/nvc0/nvc0_screen.c | 1 +
 src/gallium/drivers/r300/r300_screen.c | 2 +
 src/gallium/drivers/r600/r600_pipe.c | 2 +
 src/gallium/drivers/radeonsi/radeonsi_pipe.c | 2 +
 src/gallium/drivers/softpipe/sp_screen.c | 2 +
 src/gallium/drivers/svga/svga_screen.c | 2 +
 src/gallium/include/pipe/p_defines.h | 3 +-
 src/gallium/include/pipe/p_shader_tokens.h | 4 +-
 src/gallium/include/pipe/p_state.h | 2 +-
 src/mesa/state_tracker/st_context.c | 3 +
 src/mesa/state_tracker/st_context.h | 2 +
 src/mesa/state_tracker/st_program.c | 45 +++-
 28 files changed, 171 insertions(+), 92 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_wide_point.c b/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
index 8e0a117..0d3fee4 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
@@ -52,6 +52,7 @@
  */
 
+#include "pipe/p_screen.h"
 #include "pipe/p_context.h"
 #include "util/u_math.h"
 #include "util/u_memory.h"
@@ -74,6 +75,9 @@ struct widepoint_stage {
    uint num_texcoord_gen;
    uint texcoord_gen_slot[PIPE_MAX_SHADER_OUTPUTS];
 
+   /* TGSI_SEMANTIC to which sprite_coord_enable applies */
+   unsigned sprite_coord_semantic;
+
    int psize_slot;
 };
 
@@ -233,28 +237,29 @@ widepoint_first_point(struct draw_stage *stage,
       wide->num_texcoord_gen = 0;
 
-      /* Loop over fragment shader inputs looking for generic inputs
+      /* Loop over fragment shader inputs looking for the PCOORD input or inputs
        * for which bit 'k' in sprite_coord_enable is set.
        */
       for (i = 0; i < fs->info.num_inputs; i++) {
-         if (fs->info.input_semantic_name[i] == TGSI_SEMANTIC_GENERIC) {
-            const int generic_index = fs->info.input_semantic_index[i];
-            /* Note that sprite_coord enable is a bitfield of
-             * PIPE_MAX_SHADER_OUTPUTS bits.
-             */
-            if (generic_index < PIPE_MAX_SHADER_OUTPUTS &&
-                (rast->sprite_coord_enable & (1 << generic_index))) {
-               /* OK, this generic attribute needs to be replaced with a
-                * texcoord (see above).
-                */
-               int slot = draw_alloc_extra_vertex_attrib(draw,
-                                                         TGSI_SEMANTIC_GENERIC,
-                                                         generic_index);
-
-               /* add this slot to the texcoord-gen list */
-               wide->texcoord_gen_slot[wide->num_texcoord_gen++] = slot;
-            }
+         int slot;
+         const unsigned sn = fs->info.input_semantic_name[i];
+         const unsigned si = fs->info.input_semantic_index[i];
+
+         if (sn == wide->sprite_coord_semantic) {
+            /* Note that sprite_coord_enable is a bitfield of 32 bits. */
+            if (si >= 32 || !(rast->sprite_coord_enable & (1 << si)))
+               continue;
+         } else if (sn != TGSI_SEMANTIC_PCOORD) {
+            continue;
          }
+
+         /* OK, this generic attribute needs to be replaced with a
+          * sprite coord (see above).
+          */
+         slot
Re: [Mesa-dev] [PATCH 9/9] tgsi: add ArrayID documentation
On 17.03.2013 16:30, Christian König wrote:
On 15.03.2013 18:58, Christoph Bumiller wrote:
On 15.03.2013 13:08, Christian König wrote:
On 14.03.2013 15:53, Christoph Bumiller wrote:
On 14.03.2013 15:20, Christian König wrote:

From: Christian König christian.koe...@amd.com

Signed-off-by: Christian König christian.koe...@amd.com
---
 src/gallium/docs/source/tgsi.rst | 16
 1 file changed, 16 insertions(+)

diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index d9a7fe9..27fe039 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -1833,6 +1833,22 @@ If Interpolate flag is set to 1, a Declaration Interpolate token follows.
 If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows.
+If Array flag is set to 1, a Declaration Array token follows.
+
+Array Declaration
+
+
+Declarations can optional have an ArrayID attribute which can be referred by
+indirect addressing operands. An ArrayID of zero is reserved and treaded as
+if no ArrayID is specified.
+
+If an indirect addressing operand refers to an specific declaration by using

s/an/a

Thx, fixed.

+an ArrayID only the registers in this declaration are guaranteed to be
+accessed, accessing any register outside this declaration results in undefined
+behavior.
+

Note that the effective index is zero-based and not relative to the specified declaration.
XXX: Is it ? Should it be ?

Yes for compatibility reasons, otherwise we would need to change all drivers at once.

+
+If no ArrayID is specified with an indirect addressing operand the whole
+register file might be accessed by this operand.
+

A practice which is strongly discouraged. Don't do this if you have more than 1 declaration for the file in question ! It will prevent packing of scalar/vec2 arrays and effective memory alias analysis.

A bit shortened, but in general added the remark.

Packing ? Yes ! We can pack arrays if they're declared as e.g.

  TEMP[0-3].xyzw
  TEMP[4-31].x

And the caches will be very very thankful that we don't just access every 4th element of our 4 times larger than it needs to be buffer !!! And if your card can't do that, pleeease be nice and still make it possible for other drivers. :o3

It is probably possible with the new information to do so, but not a priority for me because I primarily need it for our LLVM backend.

At some point you'll be able to make use of the info in your backend, too, and then you'll regret having to refamiliarize with this code just because you didn't add the extra (estimated) 2 lines to set the UsageMask.

I think you misunderstood me here, you don't need the UsageMask to generate that information. It is possible by just scanning the shader to figure out which channels are used and which aren't.

For temporaries that may be true ... and inputs/outputs are always vec4 sized to guarantee linkage, packing for GENERIC ones is handled at the mesa level.

In addition to that I'm not convinced that using the UsageMask for this is 100% correct, to me it looks more like UsageMask is something we need for outputs to distinguish between not writing to an output channel (and so still having the default) and not having an output channel at all.

Actually, for gl_ClipDistance[] we use the UsageMask to specify if the clip distance was declared in the source (and thus should be enabled) instead of whether it's been written or not. I wanted to be able to distinguish between float gl_ClipDistance[8] or vec4 mesa_ClipDistance[2] with the UsageMask but I guess OUT[0..1].x, CLIPDIST might just as well mean that gl_ClipDistance[0 and 4] are being used ...

Hm, we'll need a cap for that anyway to tell st if it should lower ClipDistance to vec4s or not, and just assume that TGSI corresponds to what the cap says. And since this is the only case for IN/OUT where the driver's backend has to decide whether to pack or not ... ok, I'll just infer array width myself, too, and you can ignore the UsageMask.

Also, NAK from me until array access/declarations for the other files follows suit. Sorry for being so ... pesky, but I'd really like this change to be 100% complete. Come on, doesn't it nag on your conscience if this is left to remain only a few small steps from perfection ?

Declaring and accessing arrays for inputs/outputs are not so much of a problem, figuring out how to get this information to glsl_to_tgsi is the real problem. For temporaries changing the glsl_to_tgsi pass is pretty much sufficient, but for inputs and outputs you need to dig into the mesa state tracker, and I definitely don't intend to do so.

Fine, then I'll have a look at that myself. * flaming eyes *

Christian.
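The packing argument in this thread comes down to simple arithmetic on declaration ranges and channel masks. The following is a rough sketch in C, with hypothetical type and function names rather than Mesa code, that compares the storage needed when every register is a full vec4 with the storage needed when only the channels named in a usage mask are kept.

```c
#include <assert.h>

/* Hypothetical description of one declaration range (not a Mesa type). */
struct decl_range {
   unsigned first, last;   /* inclusive register range */
   unsigned usage_mask;    /* bit 0 = x, bit 1 = y, bit 2 = z, bit 3 = w */
};

/* Number of channels enabled in a 4-bit usage mask. */
static unsigned
count_channels(unsigned mask)
{
   unsigned n = 0;

   for (mask &= 0xf; mask; mask >>= 1)
      n += mask & 1;
   return n;
}

/* Floats needed if only the used channels of each register are stored. */
static unsigned
packed_floats(const struct decl_range *d, unsigned num)
{
   unsigned i, total = 0;

   for (i = 0; i < num; i++) {
      assert(d[i].first <= d[i].last);
      total += (d[i].last - d[i].first + 1) * count_channels(d[i].usage_mask);
   }
   return total;
}

/* Floats needed if every register occupies a full vec4. */
static unsigned
unpacked_floats(const struct decl_range *d, unsigned num)
{
   unsigned i, total = 0;

   for (i = 0; i < num; i++) {
      assert(d[i].first <= d[i].last);
      total += (d[i].last - d[i].first + 1) * 4;
   }
   return total;
}
```

For the TEMP[0-3].xyzw plus TEMP[4-31].x example, the packed layout needs 4*4 + 28*1 = 44 floats against 32*4 = 128 unpacked, which is where most of the "4 times larger than it needs to be" complaint comes from.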
Re: [Mesa-dev] [PATCH 9/9] tgsi: add ArrayID documentation
On 17.03.2013 18:04, Christoph Bumiller wrote:
On 17.03.2013 16:30, Christian König wrote:
On 15.03.2013 18:58, Christoph Bumiller wrote:
On 15.03.2013 13:08, Christian König wrote:
On 14.03.2013 15:53, Christoph Bumiller wrote:
On 14.03.2013 15:20, Christian König wrote:

From: Christian König christian.koe...@amd.com

Signed-off-by: Christian König christian.koe...@amd.com
---
 src/gallium/docs/source/tgsi.rst | 16
 1 file changed, 16 insertions(+)

diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index d9a7fe9..27fe039 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -1833,6 +1833,22 @@ If Interpolate flag is set to 1, a Declaration Interpolate token follows.
 If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows.
+If Array flag is set to 1, a Declaration Array token follows.
+
+Array Declaration
+
+
+Declarations can optional have an ArrayID attribute which can be referred by
+indirect addressing operands. An ArrayID of zero is reserved and treaded as
+if no ArrayID is specified.
+
+If an indirect addressing operand refers to an specific declaration by using

s/an/a

Thx, fixed.

+an ArrayID only the registers in this declaration are guaranteed to be
+accessed, accessing any register outside this declaration results in undefined
+behavior.
+

Note that the effective index is zero-based and not relative to the specified declaration.
XXX: Is it ? Should it be ?

Yes for compatibility reasons, otherwise we would need to change all drivers at once.

+
+If no ArrayID is specified with an indirect addressing operand the whole
+register file might be accessed by this operand.
+

A practice which is strongly discouraged. Don't do this if you have more than 1 declaration for the file in question ! It will prevent packing of scalar/vec2 arrays and effective memory alias analysis.

A bit shortened, but in general added the remark.

Packing ? Yes ! We can pack arrays if they're declared as e.g.

  TEMP[0-3].xyzw
  TEMP[4-31].x

And the caches will be very very thankful that we don't just access every 4th element of our 4 times larger than it needs to be buffer !!! And if your card can't do that, pleeease be nice and still make it possible for other drivers. :o3

It is probably possible with the new information to do so, but not a priority for me because I primarily need it for our LLVM backend.

At some point you'll be able to make use of the info in your backend, too, and then you'll regret having to refamiliarize with this code just because you didn't add the extra (estimated) 2 lines to set the UsageMask.

I think you misunderstood me here, you don't need the UsageMask to generate that information. It is possible by just scanning the shader to figure out which channels are used and which aren't.

For temporaries that may be true ... and inputs/outputs are always vec4 sized to guarantee linkage, packing for GENERIC ones is handled at the mesa level.

In addition to that I'm not convinced that using the UsageMask for this is 100% correct, to me it looks more like UsageMask is something we need for outputs to distinguish between not writing to an output channel (and so still having the default) and not having an output channel at all.

Actually, for gl_ClipDistance[] we use the UsageMask to specify if the clip distance was declared in the source (and thus should be enabled) instead of whether it's been written or not. I wanted to be able to distinguish between float gl_ClipDistance[8] or vec4 mesa_ClipDistance[2] with the UsageMask but I guess OUT[0..1].x, CLIPDIST might just as well mean that gl_ClipDistance[0 and 4] are being used ...

Hm, we'll need a cap for that anyway to tell st if it should lower ClipDistance to vec4s or not, and just assume that TGSI corresponds to what the cap says. And since this is the only case for IN/OUT where the driver's backend has to decide whether to pack or not ... ok, I'll just infer array width myself, too, and you can ignore the UsageMask.

Also, NAK from me until array access/declarations for the other files follows suit. Sorry for being so ... pesky, but I'd really like this change to be 100% complete. Come on, doesn't it nag on your conscience if this is left to remain only a few small steps from perfection ?

Declaring and accessing arrays for inputs/outputs are not so much of a problem, figuring out how to get this information to glsl_to_tgsi is the real problem. For temporaries changing the glsl_to_tgsi pass is pretty much sufficient, but for inputs and outputs you need to dig into the mesa state tracker, and I definitely don't intend to do so.

Fine, then I'll have a look at that myself. * flaming eyes *

Ok, had a look, had enough. It hurts. At least if you have never touched glsl-to-tgsi and go about it expecting it to be easy
Re: [Mesa-dev] [PATCH 9/9] tgsi: add ArrayID documentation
On 15.03.2013 13:08, Christian König wrote:
On 14.03.2013 15:53, Christoph Bumiller wrote:
On 14.03.2013 15:20, Christian König wrote:

From: Christian König christian.koe...@amd.com

Signed-off-by: Christian König christian.koe...@amd.com
---
 src/gallium/docs/source/tgsi.rst | 16
 1 file changed, 16 insertions(+)

diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index d9a7fe9..27fe039 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -1833,6 +1833,22 @@ If Interpolate flag is set to 1, a Declaration Interpolate token follows.
 If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows.
+If Array flag is set to 1, a Declaration Array token follows.
+
+Array Declaration
+
+
+Declarations can optional have an ArrayID attribute which can be referred by
+indirect addressing operands. An ArrayID of zero is reserved and treaded as
+if no ArrayID is specified.
+
+If an indirect addressing operand refers to an specific declaration by using

s/an/a

Thx, fixed.

+an ArrayID only the registers in this declaration are guaranteed to be
+accessed, accessing any register outside this declaration results in undefined
+behavior.
+

Note that the effective index is zero-based and not relative to the specified declaration.
XXX: Is it ? Should it be ?

Yes for compatibility reasons, otherwise we would need to change all drivers at once.

+
+If no ArrayID is specified with an indirect addressing operand the whole
+register file might be accessed by this operand.
+

A practice which is strongly discouraged. Don't do this if you have more than 1 declaration for the file in question ! It will prevent packing of scalar/vec2 arrays and effective memory alias analysis.

A bit shortened, but in general added the remark.

Packing ? Yes ! We can pack arrays if they're declared as e.g.

  TEMP[0-3].xyzw
  TEMP[4-31].x

And the caches will be very very thankful that we don't just access every 4th element of our 4 times larger than it needs to be buffer !!! And if your card can't do that, pleeease be nice and still make it possible for other drivers. :o3

It is probably possible with the new information to do so, but not a priority for me because I primarily need it for our LLVM backend.

At some point you'll be able to make use of the info in your backend, too, and then you'll regret having to refamiliarize with this code just because you didn't add the extra (estimated) 2 lines to set the UsageMask.

Also, NAK from me until array access/declarations for the other files follows suit. Sorry for being so ... pesky, but I'd really like this change to be 100% complete. Come on, doesn't it nag on your conscience if this is left to remain only a few small steps from perfection ?

Christian.
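As a rough illustration of the ArrayID rules quoted from the patch, the sketch below resolves an indirect operand against a set of array declarations. It is written in C with hypothetical names and is not the tgsi.rst wording or any driver's code. It assumes the effective index is absolute within the register file (as the reply about compatibility states), that ArrayID 0 means "no array specified", and it models the "undefined behavior" case by returning -1, which is purely a choice made for this example.

```c
#include <assert.h>

/* Hypothetical record of one array declaration. */
struct array_decl {
   unsigned array_id;      /* nonzero; 0 is reserved for "no ArrayID" */
   unsigned first, last;   /* inclusive register range of the declaration */
};

/* Return the absolute register index an indirect operand touches, or -1 if
 * the access falls outside the declaration named by array_id.
 */
static int
resolve_indirect(const struct array_decl *decls, unsigned num_decls,
                 unsigned array_id, int addr, int imm)
{
   /* The effective index is relative to the start of the file,
    * not to the start of the declaration.
    */
   int index = addr + imm;
   unsigned i;

   if (index < 0)
      return -1;

   /* Without an ArrayID the whole register file may be accessed. */
   if (array_id == 0)
      return index;

   for (i = 0; i < num_decls; i++) {
      if (decls[i].array_id != array_id)
         continue;
      assert(decls[i].first <= decls[i].last);
      if ((unsigned)index >= decls[i].first && (unsigned)index <= decls[i].last)
         return index;
      return -1;
   }
   return -1;   /* unknown ArrayID */
}
```

A backend that knows the declaration can use the range for alias analysis; with ArrayID 0 it has to assume any register in the file may be touched, which is the case the review discourages.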
[Mesa-dev] [PATCH] gallium: add TGSI_SEMANTIC_TEXCOORD,PCOORD v3
This makes it possible to identify gl_TexCoord and gl_PointCoord for drivers where sprite coordinate replacement is restricted.

The new PIPE_CAP_TGSI_TEXCOORD decides whether these varyings should be hidden behind the GENERIC semantic or not. With this patch only nvc0 and nv30 will request that they be used.

v2: introduce a CAP so other drivers don't have to bother with the new semantic
v3: adapt to introduction gl_varying_slot enum
---
 src/gallium/auxiliary/draw/draw_pipe_wide_point.c | 46 +
 src/gallium/auxiliary/tgsi/tgsi_dump.c | 1 +
 src/gallium/auxiliary/tgsi/tgsi_strings.c | 4 +-
 src/gallium/docs/source/cso/rasterizer.rst | 5 ++
 src/gallium/docs/source/screen.rst | 8
 src/gallium/docs/source/tgsi.rst | 29 +
 src/gallium/drivers/freedreno/freedreno_screen.c | 2 +
 src/gallium/drivers/i915/i915_screen.c | 2 +
 src/gallium/drivers/llvmpipe/lp_screen.c | 1 +
 src/gallium/drivers/nv30/nv30_screen.c | 1 +
 src/gallium/drivers/nv30/nvfx_fragprog.c | 42 ++-
 src/gallium/drivers/nv30/nvfx_vertprog.c | 7 +++-
 src/gallium/drivers/nv50/codegen/nv50_ir_driver.h | 2 -
 src/gallium/drivers/nv50/nv50_screen.c | 1 +
 src/gallium/drivers/nv50/nv50_surface.c | 5 +-
 src/gallium/drivers/nvc0/nvc0_program.c | 37 +---
 src/gallium/drivers/nvc0/nvc0_screen.c | 1 +
 src/gallium/drivers/r300/r300_screen.c | 2 +
 src/gallium/drivers/r600/r600_pipe.c | 2 +
 src/gallium/drivers/radeonsi/radeonsi_pipe.c | 2 +
 src/gallium/drivers/softpipe/sp_screen.c | 2 +
 src/gallium/drivers/svga/svga_screen.c | 2 +
 src/gallium/include/pipe/p_defines.h | 3 +-
 src/gallium/include/pipe/p_shader_tokens.h | 4 +-
 src/gallium/include/pipe/p_state.h | 2 +-
 src/mesa/state_tracker/st_context.c | 3 +
 src/mesa/state_tracker/st_context.h | 2 +
 src/mesa/state_tracker/st_program.c | 45 +++-
 28 files changed, 171 insertions(+), 92 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_wide_point.c b/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
index 8e0a117..0d3fee4 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
@@ -52,6 +52,7 @@
  */
 
+#include "pipe/p_screen.h"
 #include "pipe/p_context.h"
 #include "util/u_math.h"
 #include "util/u_memory.h"
@@ -74,6 +75,9 @@ struct widepoint_stage {
    uint num_texcoord_gen;
    uint texcoord_gen_slot[PIPE_MAX_SHADER_OUTPUTS];
 
+   /* TGSI_SEMANTIC to which sprite_coord_enable applies */
+   unsigned sprite_coord_semantic;
+
    int psize_slot;
 };
 
@@ -233,28 +237,29 @@ widepoint_first_point(struct draw_stage *stage,
       wide->num_texcoord_gen = 0;
 
-      /* Loop over fragment shader inputs looking for generic inputs
+      /* Loop over fragment shader inputs looking for the PCOORD input or inputs
        * for which bit 'k' in sprite_coord_enable is set.
        */
       for (i = 0; i < fs->info.num_inputs; i++) {
-         if (fs->info.input_semantic_name[i] == TGSI_SEMANTIC_GENERIC) {
-            const int generic_index = fs->info.input_semantic_index[i];
-            /* Note that sprite_coord enable is a bitfield of
-             * PIPE_MAX_SHADER_OUTPUTS bits.
-             */
-            if (generic_index < PIPE_MAX_SHADER_OUTPUTS &&
-                (rast->sprite_coord_enable & (1 << generic_index))) {
-               /* OK, this generic attribute needs to be replaced with a
-                * texcoord (see above).
-                */
-               int slot = draw_alloc_extra_vertex_attrib(draw,
-                                                         TGSI_SEMANTIC_GENERIC,
-                                                         generic_index);
-
-               /* add this slot to the texcoord-gen list */
-               wide->texcoord_gen_slot[wide->num_texcoord_gen++] = slot;
-            }
+         int slot;
+         const unsigned sn = fs->info.input_semantic_name[i];
+         const unsigned si = fs->info.input_semantic_index[i];
+
+         if (sn == wide->sprite_coord_semantic) {
+            /* Note that sprite_coord_enable is a bitfield of 32 bits. */
+            if (si >= 32 || !(rast->sprite_coord_enable & (1 << si)))
+               continue;
+         } else if (sn != TGSI_SEMANTIC_PCOORD) {
+            continue;
          }
+
+         /* OK, this generic attribute needs to be replaced with a
+          * sprite coord (see above).
+          */
+         slot = draw_alloc_extra_vertex_attrib(draw, sn, si);
+
+         /* add this slot to the texcoord-gen list */
+         wide->texcoord_gen_slot[wide->num_texcoord_gen++] = slot;
       }
    }
 
@@ -326,6 +331,11 @@ struct
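The decision made inside the rewritten loop of this patch can be isolated as a small predicate. The sketch below is standalone C for illustration only: the enum is a stand-in with placeholder values (the real TGSI_SEMANTIC_* constants live in p_shader_tokens.h), and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder semantics; values do not match the real TGSI enums. */
enum semantic { SEM_GENERIC, SEM_TEXCOORD, SEM_PCOORD };

/* Should fragment shader input (sn, si) be replaced by sprite coordinates?
 *
 * Inputs of the semantic selected for sprite_coord_enable are replaced only
 * if their bit is set in the 32-bit mask; a PCOORD input is always replaced;
 * everything else is left alone.
 */
static bool
wants_sprite_coord(enum semantic sn, unsigned si,
                   enum semantic sprite_coord_semantic,
                   uint32_t sprite_coord_enable)
{
   if (sn == sprite_coord_semantic)
      return si < 32 && (sprite_coord_enable & (UINT32_C(1) << si)) != 0;

   return sn == SEM_PCOORD;
}
```

Note the explicit `si < 32` guard before the shift: shifting a 32-bit value by 32 or more is undefined in C, so the range check has to come first, as it does in the patch.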
[Mesa-dev] [PATCH] gallium: add TGSI_SEMANTIC_TEXCOORD, PCOORD (CAP variant)
---
 src/gallium/auxiliary/draw/draw_pipe_wide_point.c | 46 +--
 src/gallium/auxiliary/tgsi/tgsi_dump.c | 1 +
 src/gallium/auxiliary/tgsi/tgsi_strings.c | 4 +-
 src/gallium/docs/source/cso/rasterizer.rst | 5 ++
 src/gallium/docs/source/screen.rst | 8 +++
 src/gallium/docs/source/tgsi.rst | 29 +
 src/gallium/drivers/freedreno/freedreno_screen.c | 2 +
 src/gallium/drivers/i915/i915_screen.c | 2 +
 src/gallium/drivers/llvmpipe/lp_screen.c | 1 +
 src/gallium/drivers/nv30/nv30_screen.c | 1 +
 src/gallium/drivers/nv30/nvfx_fragprog.c | 39 ++--
 src/gallium/drivers/nv50/codegen/nv50_ir_driver.h | 2 -
 src/gallium/drivers/nv50/nv50_screen.c | 1 +
 src/gallium/drivers/nv50/nv50_surface.c | 5 +-
 src/gallium/drivers/nvc0/nvc0_program.c | 37 +--
 src/gallium/drivers/nvc0/nvc0_screen.c | 1 +
 src/gallium/drivers/r300/r300_screen.c | 2 +
 src/gallium/drivers/r600/r600_pipe.c | 2 +
 src/gallium/drivers/radeonsi/radeonsi_pipe.c | 2 +
 src/gallium/drivers/softpipe/sp_screen.c | 2 +
 src/gallium/drivers/svga/svga_screen.c | 2 +
 src/gallium/include/pipe/p_defines.h | 3 +-
 src/gallium/include/pipe/p_shader_tokens.h | 4 +-
 src/gallium/include/pipe/p_state.h | 2 +-
 src/mesa/state_tracker/st_atom_rasterizer.c | 5 +-
 src/mesa/state_tracker/st_context.c | 3 +
 src/mesa/state_tracker/st_context.h | 2 +
 src/mesa/state_tracker/st_program.c | 68 +++--
 28 files changed, 181 insertions(+), 100 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_wide_point.c b/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
index 8e0a117..0d3fee4 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
@@ -52,6 +52,7 @@
  */
 
+#include "pipe/p_screen.h"
 #include "pipe/p_context.h"
 #include "util/u_math.h"
 #include "util/u_memory.h"
@@ -74,6 +75,9 @@ struct widepoint_stage {
    uint num_texcoord_gen;
    uint texcoord_gen_slot[PIPE_MAX_SHADER_OUTPUTS];
 
+   /* TGSI_SEMANTIC to which sprite_coord_enable applies */
+   unsigned sprite_coord_semantic;
+
    int psize_slot;
 };
 
@@ -233,28 +237,29 @@ widepoint_first_point(struct draw_stage *stage,
       wide->num_texcoord_gen = 0;
 
-      /* Loop over fragment shader inputs looking for generic inputs
+      /* Loop over fragment shader inputs looking for the PCOORD input or inputs
        * for which bit 'k' in sprite_coord_enable is set.
        */
       for (i = 0; i < fs->info.num_inputs; i++) {
-         if (fs->info.input_semantic_name[i] == TGSI_SEMANTIC_GENERIC) {
-            const int generic_index = fs->info.input_semantic_index[i];
-            /* Note that sprite_coord enable is a bitfield of
-             * PIPE_MAX_SHADER_OUTPUTS bits.
-             */
-            if (generic_index < PIPE_MAX_SHADER_OUTPUTS &&
-                (rast->sprite_coord_enable & (1 << generic_index))) {
-               /* OK, this generic attribute needs to be replaced with a
-                * texcoord (see above).
-                */
-               int slot = draw_alloc_extra_vertex_attrib(draw,
-                                                         TGSI_SEMANTIC_GENERIC,
-                                                         generic_index);
-
-               /* add this slot to the texcoord-gen list */
-               wide->texcoord_gen_slot[wide->num_texcoord_gen++] = slot;
-            }
+         int slot;
+         const unsigned sn = fs->info.input_semantic_name[i];
+         const unsigned si = fs->info.input_semantic_index[i];
+
+         if (sn == wide->sprite_coord_semantic) {
+            /* Note that sprite_coord_enable is a bitfield of 32 bits. */
+            if (si >= 32 || !(rast->sprite_coord_enable & (1 << si)))
+               continue;
+         } else if (sn != TGSI_SEMANTIC_PCOORD) {
+            continue;
          }
+
+         /* OK, this generic attribute needs to be replaced with a
+          * sprite coord (see above).
+          */
+         slot = draw_alloc_extra_vertex_attrib(draw, sn, si);
+
+         /* add this slot to the texcoord-gen list */
+         wide->texcoord_gen_slot[wide->num_texcoord_gen++] = slot;
       }
    }
 
@@ -326,6 +331,11 @@ struct draw_stage *draw_wide_point_stage( struct draw_context *draw )
    if (!draw_alloc_temp_verts( &wide->stage, 4 ))
       goto fail;
 
+   wide->sprite_coord_semantic =
+      draw->pipe->screen->get_param(draw->pipe->screen, PIPE_CAP_TGSI_TEXCOORD)
+      ?
+      TGSI_SEMANTIC_TEXCOORD : TGSI_SEMANTIC_GENERIC;
+
    return &wide->stage;
 
  fail:
diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c b/src/gallium/auxiliary/tgsi/tgsi_dump.c
index 3e6f76a..8f16f2d 100644
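The last hunk of this variant picks the semantic once, at stage creation, based on a capability query. A reduced model of that pattern in C follows; the `screen` struct, the `CAP_TGSI_TEXCOORD` token, the enum and the two stub drivers are all hypothetical stand-ins for this example and do not mirror the real pipe_screen interface beyond the idea of a `get_param` callback.

```c
/* Placeholder semantics and capability token (not the real Gallium enums). */
enum semantic { SEM_GENERIC, SEM_TEXCOORD, SEM_PCOORD };
enum cap { CAP_TGSI_TEXCOORD };

/* Minimal stand-in for a screen object with a capability query. */
struct screen {
   int (*get_param)(struct screen *screen, enum cap param);
};

/* Stub driver that asks for the dedicated TEXCOORD/PCOORD semantics. */
static int
restricted_get_param(struct screen *screen, enum cap param)
{
   (void)screen;
   return param == CAP_TGSI_TEXCOORD;
}

/* Stub driver that keeps everything behind GENERIC. */
static int
generic_get_param(struct screen *screen, enum cap param)
{
   (void)screen;
   (void)param;
   return 0;
}

/* Choose which semantic sprite_coord_enable applies to for this screen. */
static enum semantic
pick_sprite_coord_semantic(struct screen *screen)
{
   return screen->get_param(screen, CAP_TGSI_TEXCOORD) ? SEM_TEXCOORD
                                                       : SEM_GENERIC;
}
```

Querying once and caching the result keeps the per-point loop free of capability checks, and drivers that do not advertise the cap see no change in the semantics they receive.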
[Mesa-dev] [PATCH] gallium: add TGSI_SEMANTIC_TEXCOORD,PCOORD
Second attempt, 2 years ago no one replied or cared ...

We really need to know about these on nvc0 because there are only 8 fixed hardware locations that can be overwritten by sprite coordinates, and one location that represents gl_PointCoord and unconditionally returns sprite coordinates.

So far this was solved via a hack, which works since the locations the state tracker picks aren't dynamic (and likely will never be, to facilitate ARB_separate_shader_objects), but it still isn't nice to do it this way.

It looks like nv30 was using a hack, too, since it had a check for Semantic.Index == 9, which is what mesa uses for PointCoord.

Implementing a safe, non-mesa-dependent way without these SEMANTICs would be jumping through hoops and doing expensive shader recompilations just because we like to destroy information at the gallium threshold, and that's unacceptable.

I started to (try to) fix up the other drivers, but maybe we just want a CAP for this instead, since the default solution - if this is TEXCOORD then treat it as GENERIC with semantic index += MAX_TEXCOORDS - doesn't really look much nicer either.

E.g. if PIPE_CAP_RESTRICTED_SPRITE_COORDS is advertised, the state tracker should use the TEXCOORD and PCOORD semantics, otherwise it should just use GENERICs as before.
---
 src/gallium/auxiliary/draw/draw_pipe_wide_point.c | 39
 src/gallium/auxiliary/tgsi/tgsi_dump.c | 1 +
 src/gallium/auxiliary/tgsi/tgsi_strings.c | 2 +
 src/gallium/docs/source/cso/rasterizer.rst | 2 +-
 src/gallium/docs/source/tgsi.rst | 23 +-
 src/gallium/drivers/freedreno/freedreno_compiler.c | 2 +
 src/gallium/drivers/i915/i915_fpc_translate.c | 2 +
 src/gallium/drivers/i915/i915_state_derived.c | 4 ++
 src/gallium/drivers/llvmpipe/lp_setup_point.c | 29 ++--
 src/gallium/drivers/nv30/nvfx_fragprog.c | 39
 src/gallium/drivers/nv50/nv50_shader_state.c | 8 +--
 src/gallium/drivers/nv50/nv50_surface.c | 5 +-
 src/gallium/drivers/nvc0/nvc0_program.c | 37 +--
 src/gallium/drivers/r300/r300_fs.c | 2 +
 src/gallium/drivers/r300/r300_shader_semantics.h | 3 +-
 src/gallium/drivers/r300/r300_vs.c | 2 +
 src/gallium/drivers/r600/evergreen_state.c | 7 ++-
 src/gallium/drivers/r600/r600_shader.c | 3 +-
 src/gallium/drivers/r600/r600_state.c | 7 ++-
 src/gallium/drivers/radeonsi/radeonsi_shader.c | 1 +
 src/gallium/drivers/radeonsi/si_state.c | 2 +-
 src/gallium/drivers/radeonsi/si_state_draw.c | 5 +-
 src/gallium/include/pipe/p_shader_tokens.h | 36 +--
 src/gallium/include/pipe/p_state.h | 2 +-
 src/mesa/state_tracker/st_atom_rasterizer.c | 6 +--
 src/mesa/state_tracker/st_program.c | 48 +--
 26 files changed, 162 insertions(+), 155 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_wide_point.c b/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
index 8e0a117..d4ed0f7 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
@@ -233,28 +233,29 @@ widepoint_first_point(struct draw_stage *stage,
       wide->num_texcoord_gen = 0;
 
-      /* Loop over fragment shader inputs looking for generic inputs
-       * for which bit 'k' in sprite_coord_enable is set.
+      /* Loop over fragment shader inputs looking for the PCOORD input or
+       * TEXCOORD inputs for which bit 'k' in sprite_coord_enable is set.
        */
       for (i = 0; i < fs->info.num_inputs; i++) {
-         if (fs->info.input_semantic_name[i] == TGSI_SEMANTIC_GENERIC) {
-            const int generic_index = fs->info.input_semantic_index[i];
-            /* Note that sprite_coord enable is a bitfield of
-             * PIPE_MAX_SHADER_OUTPUTS bits.
-             */
-            if (generic_index < PIPE_MAX_SHADER_OUTPUTS &&
-                (rast->sprite_coord_enable & (1 << generic_index))) {
-               /* OK, this generic attribute needs to be replaced with a
-                * texcoord (see above).
-                */
-               int slot = draw_alloc_extra_vertex_attrib(draw,
-                                                         TGSI_SEMANTIC_GENERIC,
-                                                         generic_index);
-
-               /* add this slot to the texcoord-gen list */
-               wide->texcoord_gen_slot[wide->num_texcoord_gen++] = slot;
-            }
+         int slot;
+         const unsigned sn = fs->info.input_semantic_name[i];
+         const unsigned si = fs->info.input_semantic_index[i];
+
+         if (sn == TGSI_SEMANTIC_TEXCOORD) {
+            /* Note that sprite_coord enable is a bitfield of 8 bits. */
+            if (si >= 8 || !(rast->sprite_coord_enable & (1 << si)))
+
Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
On 12.03.2013 10:31, Christian König wrote: Am 12.03.2013 02:48, schrieb Marek Olšák: On Mon, Mar 11, 2013 at 1:44 PM, Christian König deathsim...@vodafone.de wrote: Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. The solutions proposed or implemented so far all more or less incomplete, so this approach was designed in mind with both completeness and compatibility with existing code. Over all it's just an implementation of what Tom Stellard named solution #4 in this eMail thread: http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html Hi Christian, this is definitely not the solution #4. According to the TGSI dump Christoph posted, it looks more like #3. Well, for me the main difference between proposal #3 and #4 is that #3 tries to identify the declaration to use with the supplied offset, while #4 uses a completely distinct identifier for that. The solution #4 completely changes the temporary file such that it becomes two-dimensional with the first index being a literal and the second index being either a literal or ADDR[literal], and it would always be like that regardless of whether drivers support that or not. One-dimensional indexing of TEMP is not allowed. For backward compatibility, the drivers that do not support it would only get a single array declaration TEMP[0][0..n] and TEMP[0][...] would be everywhere in the code. Ok, then I misunderstood you a bit, but I don't think the difference is so much. What I'm proposing is that we have an optional ArrayID attached to each declaration and refer to this ArrayID in the indirect addressing operand. 
To sum it up declarations should look something like this: DCL TEMP[0..3]// normal registers DCL TEMP[1][4..11]// indirectly accessed array DCL TEMP[2][12..15]// another indirectly accessed array DCL TEMP[16..17] LOCAL// local registers While an indirect operand might look like this: MOV TEMP[16], TEMP[1][ADDR[0].x-13] On the pro side for this approach is that it is compatible with all the existing state trackers and driver, and we don't need to generate different code depending on weather or not the driver supports this. I don't know much about TGSI internals, so I can't review this. I'd just like to say that TGSI dumps should make sense (2D indexing should be only allowed with 2D declarations) and tgsi_text_translate should be able to do the reverse - convert the dumps back to TGSI tokens. Completely agree with that, and beside writing documentation testing this is still one of the todos with this patchset. I have to admit that your approach looks a bit cleaner from the high above view. The problem with it is that it requires this additional 2D index on every operand, and we just don't have enough bits left for this. Even with my approach I need to make room for this ArrayID in the indirect addressing operand token, and this additional token is only there if the operand uses indirect adressing. Do you think we can live with my approach or is there any major downside I currently don't see? I can live with it. I think ... (I hope I don't regret this later; seems like this doesn't contain less information, then it's ok.) If the placement of the hint index offends someone, just write it as MOV TEMP[16], TEMP(1)[ADDR[0].x-13] or ... TEMP[ADDR[0].x-13 : 1] or TEMP[ADDR[0].x-13 supposedToBeIn [4,11]] or something ... nicer. Actually ... 
if TEMP[0] is placed at mem[0] and TEMP[4..11] is placed at, say, mem[0x1000] (in bytes), do I have to load $register mem[$addr - 0xd0] (no, this can't work), or load $register mem[$addr - 0xd0 + 0x1000] (if you didn't adjust the offset), or load $register mem[$addr - 0xd0 + 0x1000 - 0x40] (if you already added the base TEMP to the immediate offset)? This needs to be documented as well. Thanks for the clarification, Christian.
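To make the three candidate address computations above concrete, here is a minimal sketch of the third one, under assumptions that are not confirmed anywhere in this thread: each TEMP register is a 16-byte vec4, the immediate in the operand (-13) is relative to the flat TEMP file, and the driver has relocated the declared array to its own base address. The struct and function names are made up for illustration and are not part of TGSI or any driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one TEMP register is a 16-byte vec4. */
#define VEC4_BYTES 16u

/* Hypothetical description of one declared array, e.g. DCL TEMP[1][4..11]. */
struct temp_array {
   uint32_t first;    /* first register of the declared range (4)       */
   uint32_t last;     /* last register of the declared range (11)       */
   uint32_t mem_base; /* byte address the driver picked, e.g. 0x1000    */
};

/*
 * addr_x is the runtime value of ADDR[0].x (in registers) and imm is the
 * immediate from the operand, e.g. -13 in TEMP[1][ADDR[0].x-13].
 * Returns the byte address of the register being accessed.
 */
static uint32_t
temp_array_address(const struct temp_array *arr, int32_t addr_x, int32_t imm)
{
   /* Register index in the flat TEMP file. */
   int32_t reg = addr_x + imm;

   /* The ArrayID promises the access stays inside the declared range. */
   assert(reg >= (int32_t)arr->first && reg <= (int32_t)arr->last);

   /* Rebase onto the array: subtract its first register, add its base. */
   return arr->mem_base + ((uint32_t)reg - arr->first) * VEC4_BYTES;
}
```

With first = 4 and mem_base = 0x1000 this is the same as addr_x * 16 - 0xd0 + 0x1000 - 0x40, i.e. the last of the three forms. If the immediate were instead already relative to the start of the array, the "- first" term would have to go, which is exactly why the convention needs to be written down.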
Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
On 12.03.2013 12:10, Christoph Bumiller wrote: On 12.03.2013 10:31, Christian König wrote: Am 12.03.2013 02:48, schrieb Marek Olšák: On Mon, Mar 11, 2013 at 1:44 PM, Christian König deathsim...@vodafone.de wrote: Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. The solutions proposed or implemented so far all more or less incomplete, so this approach was designed in mind with both completeness and compatibility with existing code. Over all it's just an implementation of what Tom Stellard named solution #4 in this eMail thread: http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html Hi Christian, this is definitely not the solution #4. According to the TGSI dump Christoph posted, it looks more like #3. Well, for me the main difference between proposal #3 and #4 is that #3 tries to identify the declaration to use with the supplied offset, while #4 uses a completely distinct identifier for that. The solution #4 completely changes the temporary file such that it becomes two-dimensional with the first index being a literal and the second index being either a literal or ADDR[literal], and it would always be like that regardless of whether drivers support that or not. One-dimensional indexing of TEMP is not allowed. For backward compatibility, the drivers that do not support it would only get a single array declaration TEMP[0][0..n] and TEMP[0][...] would be everywhere in the code. Ok, then I misunderstood you a bit, but I don't think the difference is so much. What I'm proposing is that we have an optional ArrayID attached to each declaration and refer to this ArrayID in the indirect addressing operand. 
To sum it up, declarations should look something like this:
DCL TEMP[0..3] // normal registers
DCL TEMP[1][4..11] // indirectly accessed array
DCL TEMP[2][12..15] // another indirectly accessed array
DCL TEMP[16..17] LOCAL // local registers
While an indirect operand might look like this:
MOV TEMP[16], TEMP[1][ADDR[0].x-13]
The advantage of this approach is that it is compatible with all the existing state trackers and drivers, and we don't need to generate different code depending on whether or not the driver supports this. I don't know much about TGSI internals, so I can't review this. I'd just like to say that TGSI dumps should make sense (2D indexing should only be allowed with 2D declarations) and tgsi_text_translate should be able to do the reverse - convert the dumps back to TGSI tokens. Completely agree with that, and besides writing documentation, testing this is still one of the todos for this patchset. I have to admit that your approach looks a bit cleaner from a high-level view. The problem with it is that it requires this additional 2D index on every operand, and we just don't have enough bits left for this. Even with my approach I need to make room for this ArrayID in the indirect addressing operand token, and this additional token is only there if the operand uses indirect addressing. Do you think we can live with my approach or is there any major downside I currently don't see? One more thing. While you're at it (i.e. are familiar with the code), could you set the UsageMask in the TGSI declaration so we can pack scalar or vec2 arrays? Also, you could then declare gl_ClipDistance outputs as DCL OUT[0..7].x, CLIPDIST so we can actually index clip distances properly? With DCL OUT[0..1].xyzw, CLIPDIST we can't really index the individual components, which leads to if ((index & 3) == 0) MOV OUT[index / 4].x = value else if ((index & 3) == 1) MOV OUT[index / 4].y = value, which is unnecessary on some hardware. I can live with it. I think ... 
(I hope I don't regret this later; seems like this doesn't contain less information, then it's ok.) If the placement of the hint index offends someone, just write it as MOV TEMP[16], TEMP(1)[ADDR[0].x-13] or ... TEMP[ADDR[0].x-13 : 1] or TEMP[ADDR[0].x-13 supposedToBeIn [4,11]] or something ... nicer. Actually ... if TEMP[0] is placed at mem[0] and TEMP[4..11] is placed at, say, mem[0x1000] (in bytes), do I have to load $register mem[$addr - 0xd0] (no, this can't work), or load $register mem[$addr - 0xd0 + 0x1000] (if you didn't adjust the offset), or load $register mem[$addr - 0xd0 + 0x1000 - 0x40] (if you already added the base TEMP to the immediate offset)? This needs to be documented as well. Thanks for the clarification, Christian.
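The gl_ClipDistance request earlier in this message comes down to how a dynamic clip-distance index maps onto output registers. The following is only a sketch of that mapping for the two declaration styles mentioned (packed OUT[0..1].xyzw versus scalar OUT[0..7].x); the struct and function names are hypothetical and do not exist in Mesa.

```c
#include <assert.h>

/* Hypothetical result type: which OUT register and which component. */
struct clipdist_slot {
   unsigned reg;  /* index into the OUT[] range of the declaration */
   unsigned comp; /* 0 = x, 1 = y, 2 = z, 3 = w                     */
};

/* Packed layout, DCL OUT[0..1].xyzw, CLIPDIST: four distances per vec4. */
static struct clipdist_slot
clipdist_packed(unsigned index)
{
   struct clipdist_slot s;

   assert(index < 8);
   s.reg = index / 4;  /* which vec4 output holds the distance */
   s.comp = index & 3; /* which component inside that vec4     */
   return s;
}

/* Scalar layout, DCL OUT[0..7].x, CLIPDIST: one distance per register. */
static struct clipdist_slot
clipdist_scalar(unsigned index)
{
   struct clipdist_slot s;

   assert(index < 8);
   s.reg = index; /* the index selects the register directly */
   s.comp = 0;    /* always the .x component                 */
   return s;
}
```

In the packed case the component depends on the runtime index, which TGSI cannot express as a write mask, hence the if/else chain. In the scalar case a single indirect register write would be enough.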
Re: [Mesa-dev] [PATCH 1/2] d3d1x: Remove.
On 11.03.2013 11:26, Jose Fonseca wrote: First email was too long, so re-sending just the interesting bits) Please tell me removing this came to mind because you're going to release a better D3D9,10/11 state tracker :) (Nah I guess it would be too much trouble if there's no users for it ...) This one *did* kind of work, notably also with wine, but it still has loads of bugs and I just don't have the time to improve it; and then add those missing bits like deferred contexts, virtual functions, compute shader or UAV support. Also gallium's still not completely able to support everything properly. It did acquire some of the missing parts though since last time I touched it. I had succeeded in making Unigine Heaven run (taking a little shortcut with sm4 to nv50, extending the gallium interface for features like tessellation that are still years ahead for all the other parties would not have been well received at that time, at least I had that impression), but all the more complex games I tested crashed somewhere and I wasn't going to try to debug binary blobs (most of them seemed to require those missing features, too). Anyway, just meant to say, it *could* have been useful had someone finished it ... if only with wine. So I'm fine with removing it since I don't expect anyone to get back to it. Trying to decide between farewell and good riddance for all the pain its bugs caused me. From: José Fonseca jfons...@vmware.com Unused/unmaintained. 
--- configure.ac | 21 - src/gallium/docs/source/context.rst|2 +- src/gallium/state_trackers/d3d1x/.gitignore| 20 - src/gallium/state_trackers/d3d1x/Makefile | 11 - src/gallium/state_trackers/d3d1x/Makefile.inc | 19 - .../state_trackers/d3d1x/d3d1xshader/Makefile | 16 - .../d3d1x/d3d1xshader/defs/files.txt | 41 - .../d3d1x/d3d1xshader/defs/interpolations.txt |8 - .../d3d1x/d3d1xshader/defs/opcodes.txt | 207 -- .../d3d1x/d3d1xshader/defs/operand_compnums.txt|5 - .../d3d1x/d3d1xshader/defs/operand_index_reprs.txt |5 - .../d3d1x/d3d1xshader/defs/operand_modes.txt |4 - .../d3d1x/d3d1xshader/defs/shortfiles.txt | 41 - .../state_trackers/d3d1x/d3d1xshader/defs/svs.txt | 23 - .../d3d1x/d3d1xshader/defs/targets.txt | 13 - .../defs/token_instruction_extended_types.txt |4 - .../defs/token_operand_extended_types.txt |2 - .../state_trackers/d3d1x/d3d1xshader/gen-header.sh | 13 - .../state_trackers/d3d1x/d3d1xshader/gen-text.sh | 11 - .../d3d1x/d3d1xshader/include/dxbc.h | 125 - .../d3d1x/d3d1xshader/include/le32.h | 45 - .../state_trackers/d3d1x/d3d1xshader/include/sm4.h | 416 .../d3d1x/d3d1xshader/src/dxbc_assemble.cpp| 59 - .../d3d1x/d3d1xshader/src/dxbc_dump.cpp| 43 - .../d3d1x/d3d1xshader/src/dxbc_parse.cpp | 87 - .../d3d1x/d3d1xshader/src/sm4_analyze.cpp | 122 - .../d3d1x/d3d1xshader/src/sm4_dump.cpp | 222 -- .../d3d1x/d3d1xshader/src/sm4_parse.cpp| 445 .../state_trackers/d3d1x/d3d1xshader/src/utils.h | 45 - .../d3d1x/d3d1xshader/tools/fxdis.cpp | 75 - .../state_trackers/d3d1x/d3d1xstutil/Makefile |5 - .../d3d1x/d3d1xstutil/include/d3d1xstutil.h| 1110 - .../d3d1x/d3d1xstutil/src/d3d_sm4_enums.cpp| 42 - .../d3d1x/d3d1xstutil/src/dxgi_enums.cpp | 165 -- .../state_trackers/d3d1x/d3d1xstutil/src/guids.cpp |6 - src/gallium/state_trackers/d3d1x/d3dapi/Makefile |4 - src/gallium/state_trackers/d3d1x/d3dapi/d3d10.idl | 1554 .../state_trackers/d3d1x/d3dapi/d3d10_1.idl| 191 -- .../state_trackers/d3d1x/d3dapi/d3d10misc.h| 47 - .../state_trackers/d3d1x/d3dapi/d3d10shader.idl| 269 --- 
src/gallium/state_trackers/d3d1x/d3dapi/d3d11.idl | 2492 .../state_trackers/d3d1x/d3dapi/d3d11shader.idl| 287 --- .../state_trackers/d3d1x/d3dapi/d3dcommon.idl | 704 -- src/gallium/state_trackers/d3d1x/d3dapi/dxgi.idl | 470 .../state_trackers/d3d1x/d3dapi/dxgiformat.idl | 129 - .../state_trackers/d3d1x/d3dapi/dxgitype.idl | 84 - src/gallium/state_trackers/d3d1x/docs/Makefile |5 - .../state_trackers/d3d1x/docs/coding_style.txt | 84 - .../d3d1x/docs/module_dependencies.dot | 25 - .../state_trackers/d3d1x/docs/source_layout.txt| 17 - src/gallium/state_trackers/d3d1x/dxgi/Makefile | 17 - .../state_trackers/d3d1x/dxgi/src/dxgi_loader.cpp | 206 -- .../state_trackers/d3d1x/dxgi/src/dxgi_native.cpp | 1514 .../state_trackers/d3d1x/dxgi/src/dxgi_private.h | 49 - .../state_trackers/d3d1x/dxgid3d10/Makefile|4 -
Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
On 11.03.2013 13:44, Christian König wrote: Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. Nice, finally someone implements a proper solution. However, it seems like this isn't used for arrays in the IN and OUT files (varyings). Would it be much more work to use it there, too ? Fragment Shader inputs seem to be read with if (index == 0) return in[0] else if (index == 1) ... sequences. And I may have spotted a bug in the following shader: in vec4 vertex[2]; in vec4 color; out vec4 value[4]; uniform int i, j; void main() { gl_Position = vertex[i]; value[0] = vertex[0]; value[1] = vertex[1]; value[2] = vec4(0.0); value[3] = vec4(0.0); value[j] = color; } gives me DCL IN[0] DCL IN[1] DCL IN[2] DCL OUT[0], POSITION DCL OUT[1], GENERIC[12] DCL OUT[2], GENERIC[13] DCL OUT[3], GENERIC[14] DCL OUT[4], GENERIC[15] DCL CONST[0..1] DCL TEMP[0..3], LOCAL DCL TEMP[4], LOCAL DCL ADDR[0] IMM[0] FLT32 {0., 0., 0., 0.} 0: UARL ADDR[0].x, CONST[1]. 1: MOV TEMP[4], IN[ADDR[0].x] (not the bug) but this is invalid as there is no IN array, just single ones 2: MOV TEMP[0], IN[0] 3: MOV TEMP[1], IN[1] 4: MOV TEMP[2], IMM[0]. 5: MOV TEMP[3], IMM[0]. 6: UARL ADDR[0].x, CONST[0]. 7: MOV TEMP[1][ADDR[0].x], IN[2] why is this TEMP[1][] ? The array seems to be the first declaration ... 8: MOV OUT[1], TEMP[0] 9: MOV OUT[2], TEMP[1] 10: MOV OUT[3], TEMP[2] 11: MOV OUT[4], TEMP[3] 12: MOV OUT[0], TEMP[4] 13: END Ideally this would not use TEMP arrays at all though, but output arrays (I vaguely recall some radeon card doesn't support this though. Is that just outputs or also inputs ?). The solutions proposed or implemented so far all more or less incomplete, so this approach was designed in mind with both completeness and compatibility with existing code. 
Overall, it's just an implementation of what Tom Stellard named solution #4 in this email thread: http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html Please review; as usual, comments are welcome. Christian.
Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
On 11.03.2013 15:38, Christian König wrote: On 11.03.2013 14:47, Christoph Bumiller wrote: On 11.03.2013 13:44, Christian König wrote: Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. Nice, finally someone implements a proper solution. However, it seems like this isn't used for arrays in the IN and OUT files (varyings). Would it be much more work to use it there, too? Shouldn't be too much of a problem, but I just wanted to solve temporaries first and, when that's working, look at all the rest. Fragment shader inputs seem to be read with if (index == 0) return in[0] else if (index == 1) ... sequences. Well, as said before, it only handles temp arrays for now. That looks like the code that's generated if the driver reports that it has no indirect support; do you know offhand where exactly that's handled? The glsl_to_tgsi code is unfortunately hard to read at best. Apologies, I didn't remember that I didn't advertise indirect support for fragment shaders; indirect inputs would be supported though. The reason why I really want array support for inputs, too, is that the input space location depends on the semantic, and thus doesn't necessarily correspond to the TGSI order. Treatment of arrays should be consistent in the end; right now it looks like we're having, if you read this like C code: float temp0[4]; temp0[i] = x; but float in0, in1, in2, in3; x = in[i]; why is this TEMP[1][] ? The array seems to be the first declaration ... I numbered the declarations starting with 1 (and not 0), so I could use 0 as the SPECIAL case saying that we want to address the whole range of registers and not just one declaration. I did this just for compatibility reasons, so I could look at handling temps only and not bother too much with inputs/outputs. 
Well, so far the patchset is just an RFC, so I want to let the list see the patches before either implementing inputs/outputs as well or fully documenting such quirks/hacks. Ah, good to know. This should be documented (maybe it is and I missed it?). At least in the comment above struct tgsi_ind_register's definition, which is what I'd look at first. Thanks again for doing this.
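The numbering quirk explained above (ArrayID 0 means "the whole register file", IDs from 1 upwards name one declaration) could be handled on the driver side roughly as follows. This is a sketch under assumptions: the table of declared ranges and all names here are invented for illustration and are not taken from the patchset or from struct tgsi_ind_register.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive register range, e.g. 4..11 for DCL TEMP[1][4..11]. */
struct reg_range {
   unsigned first;
   unsigned last;
};

/*
 * Hypothetical lookup: arrays[] holds the declared ranges in declaration
 * order, so ArrayID n lives at arrays[n - 1]. ArrayID 0 is the special
 * case that falls back to the whole file, 0..file_max.
 */
static struct reg_range
indirect_range(const struct reg_range *arrays, size_t num_arrays,
               unsigned array_id, unsigned file_max)
{
   struct reg_range whole = { 0, file_max };

   if (array_id == 0)
      return whole; /* no hint: any register may be accessed */

   assert(array_id <= num_arrays);
   return arrays[array_id - 1];
}
```

A driver that cannot use the hint can simply ignore the ID and behave as if it were always 0, which is presumably where the backward compatibility comes from.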
Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem
On 11.03.2013 19:33, Brian Paul wrote: On 03/11/2013 06:44 AM, Christian König wrote: Hi everybody, this problem has been open for quite some time now, with a bunch of different opinions and sometimes even patches floating on the list. The solutions proposed or implemented so far all more or less incomplete, so this approach was designed in mind with both completeness and compatibility with existing code. Over all it's just an implementation of what Tom Stellard named solution #4 in this eMail thread: http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html Please review and as usual comments are welcome, I still don't quite get what's going on here. In Christoph's reply, it seems he tested your patch and got TGSI code that looks like this: DCL IN[0] DCL IN[1] DCL IN[2] DCL OUT[0], POSITION DCL OUT[1], GENERIC[12] DCL OUT[2], GENERIC[13] DCL OUT[3], GENERIC[14] DCL OUT[4], GENERIC[15] DCL CONST[0..1] DCL TEMP[0..3], LOCAL DCL TEMP[4], LOCAL DCL ADDR[0] IMM[0] FLT32 {0., 0., 0., 0.} 0: UARL ADDR[0].x, CONST[1]. 1: MOV TEMP[4], IN[ADDR[0].x] (not the bug) but this is invalid as there is no IN array, just single ones 2: MOV TEMP[0], IN[0] 3: MOV TEMP[1], IN[1] 4: MOV TEMP[2], IMM[0]. 5: MOV TEMP[3], IMM[0]. 6: UARL ADDR[0].x, CONST[0]. 7: MOV TEMP[1][ADDR[0].x], IN[2] What exactly does LOCAL mean on the temp declarations? That the register isn't used for parameter passing between subroutines. Has been introduced a long time ago. See commit 2644952bd4dfa3b75112dee8dfd287a12d770705. But in Jose's example, he wrote: DCL TEMP[1][0..70] DCL TEMP[2][0..7] MOV OUT[1], TEMP[1][ADDR[0].x] In this code, each chunk of temporaries has an explicit name as Marek suggested in his comments to the #4 proposal. The point is that TEMP (and all other spaces likewise) are still a single space, i.e. without duplicate indices. The only real change is that an indirect access is supplied with the index of the declaration of which the range will be accessed. 
What exactly is your proposal doing? Can you please provide some more sample TGSI code to illustrate what you're doing? And, how it would be extended for inputs/outputs? Thanks. -Brian
Re: [Mesa-dev] RFC: enforcing gallium resource bind flags
On 01.03.2013 11:30, Jose Fonseca wrote: - Original Message - On Fri, Mar 1, 2013 at 12:31 AM, Roland Scheidegger srol...@vmware.com wrote: Hi, there is some sloppy usage of bind flags in the opengl state tracker (that is, resources get used for things which they didn't have the bind flag set). We'd really like to enforce these flags to be honored but it doesn't really work (ok llvmpipe so far would only really care about sampler view, color render target, depth/stencil - see also c8eb2d0e829d0d2aea6a982620da0d3cfb5982e2). Currently it looks like there's at least two issues with those bind flags in the opengl state tracker (for these bind flags only, there are almost certainly more). 1) for textures, the state tracker will always try to allocate resources with both sampler_view and render_target (or depth/stencil) bind flags. However it will drop these flags for resources where this isn't supported. This is all right, however when we try to render to such resources, the surface will be created regardless (but it won't get used as it will fail framebuffer validation which checks the attachments and specifically tests if the format is a renderable format). I guess this could be fixed (seems a bit backward, it might be possible to just look at the resource bind flags to decide if we create a surface or not, and we shouldn't need to check the format later - if we've got the bind flag we know we can create a surface and hence render to). 2) a far more difficult problem seem to be buffers. While piglit doesn't hit it (I modified the tbo test to hit this) it is possible to create buffers with any target and later bind to anything. So the state tracker has no knowledge at all what a buffer will eventually get used for (other than the hint when it was first created), and it seems unreasonable to just set all possible bind flags all the time. 
But then still enforcing bind flags later would require the state tracker to recreate the resource (with more bind flags) and copy over the old contents, which sounds very bad too. So any ideas? In my opinion, the bind flags are useless, because they cannot be determined for OpenGL resources exactly. The only exceptions are: - PIPE_BIND_CONSTANT_BUFFER, which is set correctly for the default non-UBO constant buffer. - PIPE_BIND_SCANOUT for the DDX and DRM state trackers. - PIPE_BIND_GLOBAL for OpenCL. The radeon drivers ignore the bind flags entirely except SCANOUT and GLOBAL, and r300g also checks for CONSTANT_BUFFER. The OpenGL buffer API doesn't have any bind flags. It only has binding points, and any buffer can be bound to any binding point. Textures are just as fun. You can create a texture or a renderbuffer, but if you use CopyTexSubImage, the roles are swapped - what was a texture is suddenly a renderbuffer and what was a renderbuffer is suddenly a texture. If we didn't need the 3 bind flags mentioned above, I would be for removing pipe_resource::bind, because it's not that useful. API other than OpenGL have clear and stricter binding rules, which drivers can rely upon to make real optimizations. I honestly don't see the what's the difficulty here, the semantics are clear: - If Mesa state tracker doesn't care about BIND flags, that's fine, just let Mesa request as much BIND flags as the driver advertises. - If a gallium driver doesn't care about BIND flags, that's fine, just advertise ~0 bind flags. You can still use the them as a hint for optimization. For example, if an application binds a buffer to GL_ARRAY_BUFFER on creation, it is rather likely to be used mainly as a vertex buffer. So I wouldn't want mesa/st to just simply set all of them and discard that bit of information about the user's intentions. 
Rather than setting all flags, I'd add an additional PIPE_BIND_UNKNOWN to signal that the binding specified should not be taken too seriously, and that the resource should be bindable to all points possible for the given resource. We could also make it possible to add new bind flags as you go, via an additional gallium API function, where the driver can, for instance, migrate a resource if necessary. But I don't quite like that because the state tracker would have to keep checking the bind flags all the time, which is rather ugly. It's really as simple as that. If the state tracker cares about some but not all flags you can easily extrapolate from the above what should be done. But let's not get carried away and throw the baby out with the bathwater. Eliminating bind flags from Gallium does nothing to improve Mesa performance, and will hamper other gallium-based graphics stacks. Agreed. I'm strongly against removing the bind flags. There are still other state trackers that could really make use of them. Jose
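The two policies discussed in this message (request every bind flag the driver advertises, or keep the creation-time hint and mark it as non-binding) can be compared in a small sketch. Everything below is illustrative: the EX_BIND_* values are not the ones in p_defines.h, PIPE_BIND_UNKNOWN is only a proposal in this thread, and the helper functions are hypothetical.

```c
#include <assert.h>

/* Illustrative flag values only; NOT the real PIPE_BIND_* values. */
#define EX_BIND_VERTEX_BUFFER   (1u << 0)
#define EX_BIND_INDEX_BUFFER    (1u << 1)
#define EX_BIND_CONSTANT_BUFFER (1u << 2)
#define EX_BIND_SAMPLER_VIEW    (1u << 3)
/* Hypothetical "treat the other flags as a hint" flag from the proposal. */
#define EX_BIND_UNKNOWN         (1u << 31)

/* Policy A: the state tracker asks for everything the driver advertises. */
static unsigned
bind_request_all(unsigned driver_supported)
{
   return driver_supported;
}

/* Policy B: keep the first-use hint, but flag it as not authoritative. */
static unsigned
bind_request_hint(unsigned creation_hint)
{
   return creation_hint | EX_BIND_UNKNOWN;
}

/* Driver-side check: may a resource created with these flags be used so? */
static int
bind_allowed(unsigned resource_bind, unsigned use)
{
   assert(use != 0);
   return (resource_bind & EX_BIND_UNKNOWN) != 0 ||
          (resource_bind & use) == use;
}
```

Under policy B a buffer first bound to GL_ARRAY_BUFFER would still carry the vertex-buffer hint for placement decisions, while remaining bindable elsewhere. Under policy A that information is lost.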
Re: [Mesa-dev] [PATCH] llvmpipe: support rendering to buffer render targets.
On 27.02.2013 10:44, Jose Fonseca wrote: - Original Message - What is this good for? Is it for UAVs? (unordered access views) No, it is just a standard D3D10 feature: http://msdn.microsoft.com/en-gb/library/windows/desktop/bb204897.aspx Not sure if there's a particular use case for it (e.g, maybe DirectCompute uses this extensively), or just a matter of symmetry in the API (ie., if one can sample from buffer textures, then why not render into them?) I can think of rendering to vertex buffers. It's just annoying that there are no alignment restrictions on the range that is bound (worst case you have to render to a temporary buffer and copy stuff around); but at least it has to be = 8192 bytes (or elements, not sure) in D3D10. For UAVs, I think there is ARB_shader_storage_buffer_object and pipe_context::set_shader_resources. Yeah, D3D11 UAVs are also supposed to be bound separately in the pipeline. Jose Marek On Wed, Feb 27, 2013 at 3:18 AM, srol...@vmware.com wrote: From: Roland Scheidegger srol...@vmware.com Unfortunately not usable from OpenGL, and no cap bit. Pretty similar to a 1d texture, though allows specifying a start element. The util code for handling clears also needs adjustments (and fix a bug causing crashes for handling pure integer formats there too). 
--- src/gallium/auxiliary/util/u_surface.c | 55 +++ src/gallium/drivers/llvmpipe/lp_rast.c | 25 ++-- src/gallium/drivers/llvmpipe/lp_rast_priv.h |4 +- src/gallium/drivers/llvmpipe/lp_scene.c | 35 +++-- src/gallium/drivers/llvmpipe/lp_texture.c | 44 +++-- 5 files changed, 108 insertions(+), 55 deletions(-) diff --git a/src/gallium/auxiliary/util/u_surface.c b/src/gallium/auxiliary/util/u_surface.c index b948b46..fba0798 100644 --- a/src/gallium/auxiliary/util/u_surface.c +++ b/src/gallium/auxiliary/util/u_surface.c @@ -323,20 +323,59 @@ util_clear_render_target(struct pipe_context *pipe, if (!dst->texture) return; /* XXX: should handle multiple layers */ - dst_map = pipe_transfer_map(pipe, - dst->texture, - dst->u.tex.level, - dst->u.tex.first_layer, - PIPE_TRANSFER_WRITE, - dstx, dsty, width, height, &dst_trans); + + if (dst->texture->target == PIPE_BUFFER) { + /* + * The fill naturally works on the surface format, however + * the transfer uses resource format which is just bytes for buffers. + */ + unsigned dx, w; + unsigned pixstride = util_format_get_blocksize(dst->format); + dx = dstx * pixstride; + w = width * pixstride; + dst_map = pipe_transfer_map(pipe, + dst->texture, + 0, 0, + PIPE_TRANSFER_WRITE, + dx, 0, w, 1, + &dst_trans); + dst_map = (uint8_t *)dst_map + dst->u.buf.first_element * pixstride; + } + else { + /* XXX: should handle multiple layers */ + dst_map = pipe_transfer_map(pipe, + dst->texture, + dst->u.tex.level, + dst->u.tex.first_layer, + PIPE_TRANSFER_WRITE, + dstx, dsty, width, height, &dst_trans); + + } assert(dst_map); if (dst_map) { + enum pipe_format format = dst->format; assert(dst_trans->stride > 0); - util_pack_color(color->f, dst->texture->format, &uc); - util_fill_rect(dst_map, dst->texture->format, + if (util_format_is_pure_integer(format)) { + /* + * We expect int/uint clear values here, though some APIs + * might disagree (but in any case util_pack_color() + * couldn't handle it)... 
+ */ + if (util_format_is_pure_sint(format)) { +util_format_write_4i(format, color->i, 0, &uc, 0, 0, 0, 1, 1); + } + else { +assert(util_format_is_pure_uint(format)); +util_format_write_4ui(format, color->ui, 0, &uc, 0, 0, 0, 1, 1); + } + } + else { + util_pack_color(color->f, dst->format, &uc); + } + util_fill_rect(dst_map, dst->format, dst_trans->stride, 0, 0, width, height, &uc); diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c b/src/gallium/drivers/llvmpipe/lp_rast.c index b5e5da6..6183f41 100644 --- a/src/gallium/drivers/llvmpipe/lp_rast.c +++ b/src/gallium/drivers/llvmpipe/lp_rast.c @@ -165,32 +165,13 @@ lp_rast_clear_color(struct lp_rasterizer_task *task, for (i = 0; i < scene->fb.nr_cbufs; i++) { enum pipe_format format =
Re: [Mesa-dev] [PATCH] gallium: fix tgsi SAMPLE_L opcode to use separate source for explicit lod
On 11.02.2013 20:47, srol...@vmware.com wrote: From: Roland Scheidegger srol...@vmware.com It looks like using coord.w as explicit lod value is a mistake, most likely because some dx10 docs had it specified that way. Seems this was changed though: http://msdn.microsoft.com/en-us/library/windows/desktop/hh447229%28v=vs.85%29.aspx - let's just hope it doesn't depend on runtime build version or something. Not only would this need translation (so go against the stated goal these opcodes should be close to dx10 semantics) but it would prevent usage of this opcode with cube arrays, which is apparently possible: http://msdn.microsoft.com/en-us/library/windows/desktop/bb509699%28v=vs.85%29.aspx (Note not only does this show cube arrays using explicit lod, but also the confusion with this opcode: it lists an explicit lod parameter value, but then states last component of location is used as lod). (For true hw drivers, only nv50 had code to handle it, and it appears the code was already right for the new semantics, though fix up the seemingly wrong c/d arguments while there.) 
--- src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c |5 + src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c|2 +- src/gallium/auxiliary/tgsi/tgsi_exec.c |2 +- src/gallium/auxiliary/tgsi/tgsi_info.c |2 +- src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h |2 +- src/gallium/docs/source/tgsi.rst | 12 ++-- .../drivers/nv50/codegen/nv50_ir_from_tgsi.cpp |2 +- .../state_trackers/d3d1x/gd3d1x/sm4_to_tgsi.cpp|9 ++--- 8 files changed, 14 insertions(+), 22 deletions(-) diff --git a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp index 5078eb4..acec623 100644 --- a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp @@ -2065,7 +2065,7 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn) case TGSI_OPCODE_SAMPLE_L: case TGSI_OPCODE_SAMPLE_C: case TGSI_OPCODE_SAMPLE_C_LZ: - handleTEX(dst0, 1, 2, 0x30, 0x31, 0x40, 0x50); + handleTEX(dst0, 1, 2, 0x30, 0x30, 0x30, 0x40); Thanks, this looks good. It was probably completely wrong before. break; case TGSI_OPCODE_TXF: case TGSI_OPCODE_LOAD: ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/4] gallium: add facilities for indirect drawing
On 04.02.2013 08:27, Michel Dänzer wrote: On Fre, 2013-02-01 at 22:50 +0100, Christoph Bumiller wrote: diff --git a/src/gallium/drivers/r300/r300_screen.c b/src/gallium/drivers/r300/r300_screen.c index d0f0070..7ae9dd6 100644 --- a/src/gallium/drivers/r300/r300_screen.c +++ b/src/gallium/drivers/r300/r300_screen.c @@ -155,6 +155,7 @@ static int r300_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_TEXTURE_MULTISAMPLE: case PIPE_CAP_CUBE_MAP_ARRAY: case PIPE_CAP_TEXTURE_BUFFER_OBJECTS: +case PIPE_CAP_DRAW_INDIRECT: return 0; /* SWTCL-only features. */ Thanks for adding the cap to r300g, but what about r600g and radeonsi? For r300, nv30 and nv50 it was clear that it's not going to be supported. For sp and lp I just used the helper function because there's probably no better way to do it there. For r600 and radeonsi I thought you'd set the cap together with a patch that implements the feature, there's probably plenty of time until this has gone through review :) (Can't tell if it's easy to do or not since I can't find EG+ docs, but on nvc0 it was rather simple.) But if no one feels like doing that until indirect drawing can be merged, I'll add the return 0's for you as well. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] mesa: implement GL_ARB_draw_indirect
On 02.02.2013 08:32, Adrian M Negreanu wrote:
> On Fri, Feb 1, 2013 at 11:50 PM, Christoph Bumiller e0425...@student.tuwien.ac.at wrote:
>> I have 1 piglit test to check drawing with several combinations of
>> parameters (using transform feedback to write the commands), but will
>> make some more tests for various things like interaction with
>> PrimitiveRestart or error conditions.
>> (http://people.freedesktop.org/~chrisbmr/0001-arb_draw_indirect-add-initial-test.patch)
>>
>> The gallium interface specifies a start_instance parameter that the GL
>> extension doesn't have (it's reservedMustBeZero instead, but, seriously,
>> why ? D3D does have it. Because making yet another extension will be so
>> much fun ?)
>>
>> Not sure if we want to expose this with the compatibility profile.
>
> Hi, I have tested your changes on Android and Linux but it fails for Android.

Oops, thanks, it's a copy-paste error; next time I shall try to remember building all drivers ...

> Tested the patch(es) on top of the following commits:
> ==
> 6c7e95c intel: implement create image from texture
> 8e2454c intel: Account for mt-offset in intel_miptree_map
> 11f5c82 intel: Create a miptree using offsets in intel_set_texture_image_region
> 45a28a9 i965: Account for offsets when updating SURFACE_STATE.
> 163b35e intel: add pixel offset calculator for miptree levels
> 7014df0 intel: Expose intel_miptree_create_internal as intel_miptree_create_layout.
> f9e4e5f intel: expose dimensions and offsets of a miptree level in DRIImage
>
> Failed to build for android
> 6c7e95c intel: implement create image from texture
> 8e2454c intel: Account for mt-offset in intel_miptree_map
> 11f5c82 intel: Create a miptree using offsets in intel_set_texture_image_region
> 45a28a9 i965: Account for offsets when updating SURFACE_STATE.
> 163b35e intel: add pixel offset calculator for miptree levels
> 7014df0 intel: Expose intel_miptree_create_internal as intel_miptree_create_layout.
> f9e4e5f intel: expose dimensions and offsets of a miptree level in DRIImage
>
> src/mesa/drivers/dri/i965/intel_buffer_objects.c:375:46: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
> src/mesa/drivers/dri/i965/intel_fbo.c: In function 'intel_map_renderbuffer':
> src/mesa/drivers/dri/i965/intel_fbo.c:146:11: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
> src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_miptree_map_gtt':
> src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1123:58: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
> src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1136:23: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
> src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1136:41: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
> src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_miptree_unmap_etc':
> src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1344:17: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
> src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1345:17: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
> src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_miptree_alloc_mcs':
> src/mesa/drivers/dri/i965/intel_mipmap_tree.c:305:4: warning: 'format' may be used uninitialized in this function [-Wuninitialized]
> src/mesa/drivers/dri/i965/intel_mipmap_tree.c:814:14: note: 'format' was declared here
> src/mesa/drivers/dri/i965/intel_tex_subimage.c: In function 'intel_texsubimage_tiled_memcpy':
> src/mesa/drivers/dri/i965/intel_tex_subimage.c:301:29: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
> src/mesa/drivers/dri/i965/intel_tex_subimage.c:302:17: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
> src/mesa/drivers/dri/i965/intel_tex_validate.c: In function 'intel_tex_map_image_for_swrast':
> src/mesa/drivers/dri/i965/intel_tex_validate.c:189:73: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
> src/mesa/drivers/dri/i965/intel_tex_validate.c:190:15: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
> In file included from src/mesa/drivers/dri/i965/brw_context.c:44:0:
> src/mesa/drivers/dri/i965/brw_draw.h:45:33: error: conflicting types for 'tfb_vertcount'
> src/mesa/drivers/dri/i965/brw_draw.h:44:45: note: previous definition of 'tfb_vertcount' was here
> make: *** [out/target/product/samsungxe700t/obj/SHARED_LIBRARIES/i965_dri_intermediates/brw_context.o] Error 1
> FAILURE
>
> Successfully built configuration linux, no issues
>
> --
> Regards! http://groleo.wordpress.com

___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org
[Mesa-dev] [PATCH] gallium: add PIPE_BIND_COMMAND_BUFFER
Intend to merge this into the previous ARB_draw_indirect patches. Just in case there's any complaints ...

Needed to add this so the DRAW_INDIRECT_BUFFER doesn't get placed into a non-GPU accessible domain. Besides, this seems reasonable, and D3D11 has it, too (albeit a specialized version, called D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS).
---
 src/gallium/docs/source/screen.rst           |    2 ++
 src/gallium/include/pipe/p_defines.h         |    1 +
 src/mesa/state_tracker/st_cb_bufferobjects.c |    3 +++
 3 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
index c94d87d..6bf0b3a 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -295,6 +295,8 @@ resources might be created and handled quite differently.
   bound to the graphics pipeline as a shader resource.
 * ``PIPE_BIND_COMPUTE_RESOURCE``: A buffer or texture that can be
   bound to the compute program as a shader resource.
+* ``PIPE_BIND_COMMAND_BUFFER``: A buffer that may be sourced by the
+  GPU command processor, like with indirect drawing.
 
 .. _pipe_usage:
 
diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h
index 1aea9f4..4fb91cf 100644
--- a/src/gallium/include/pipe/p_defines.h
+++ b/src/gallium/include/pipe/p_defines.h
@@ -315,6 +315,7 @@ enum pipe_flush_flags {
 #define PIPE_BIND_GLOBAL (1 << 18) /* set_global_binding */
 #define PIPE_BIND_SHADER_RESOURCE (1 << 19) /* set_shader_resources */
 #define PIPE_BIND_COMPUTE_RESOURCE (1 << 20) /* set_compute_resources */
+#define PIPE_BIND_COMMAND_BUFFER (1 << 21) /* pipe_draw_info.indirect */
 
 /* The first two flags above were previously part of the amorphous
  * TEXTURE_USAGE, most of which are now descriptions of the ways a
diff --git a/src/mesa/state_tracker/st_cb_bufferobjects.c b/src/mesa/state_tracker/st_cb_bufferobjects.c
index d516735..265f758 100644
--- a/src/mesa/state_tracker/st_cb_bufferobjects.c
+++ b/src/mesa/state_tracker/st_cb_bufferobjects.c
@@ -205,6 +205,9 @@ st_bufferobj_data(struct gl_context *ctx,
    case GL_UNIFORM_BUFFER:
       bind = PIPE_BIND_CONSTANT_BUFFER;
       break;
+   case GL_DRAW_INDIRECT_BUFFER:
+      bind = PIPE_BIND_COMMAND_BUFFER;
+      break;
    default:
       bind = 0;
    }
-- 
1.7.3.4
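For readers skimming the patch, here is a minimal standalone sketch of what the new case in st_bufferobj_data() amounts to: a GL buffer target is translated into a bind flag, and a driver can later test that flag when deciding where to place the buffer. Everything prefixed with sketch_/SKETCH_ is made up for illustration and does not come from Mesa's headers. Only the enum value 0x8F3F and bit 21 are taken from the patches in this thread; the bit used for the constant-buffer flag is an arbitrary placeholder.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in constants for this sketch only (not Mesa's headers). */
#define SKETCH_GL_UNIFORM_BUFFER        0x8A11u
#define SKETCH_GL_DRAW_INDIRECT_BUFFER  0x8F3Fu    /* value from the XML in patch 1/4 */

#define SKETCH_BIND_CONSTANT_BUFFER     (1u << 6)  /* placeholder bit position */
#define SKETCH_BIND_COMMAND_BUFFER      (1u << 21) /* bit proposed in this patch */

/* Mirrors the shape of the switch in the patch: pick a bind flag per target. */
static uint32_t sketch_bind_for_target(uint32_t target)
{
   switch (target) {
   case SKETCH_GL_UNIFORM_BUFFER:
      return SKETCH_BIND_CONSTANT_BUFFER;
   case SKETCH_GL_DRAW_INDIRECT_BUFFER:
      return SKETCH_BIND_COMMAND_BUFFER;
   default:
      return 0;
   }
}

/* A driver could test the flag to keep the buffer in a GPU-accessible domain. */
static int sketch_needs_gpu_domain(uint32_t bind)
{
   return (bind & SKETCH_BIND_COMMAND_BUFFER) != 0;
}

/* Usage: an indirect buffer is flagged, a uniform buffer is not. */
void sketch_bind_demo(void)
{
   assert(sketch_needs_gpu_domain(sketch_bind_for_target(SKETCH_GL_DRAW_INDIRECT_BUFFER)));
   assert(!sketch_needs_gpu_domain(sketch_bind_for_target(SKETCH_GL_UNIFORM_BUFFER)));
}
```

How a real driver reacts to the flag is driver specific; the predicate above only shows that the information is now available at resource creation time.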
Re: [Mesa-dev] [PATCH 1/5] gallium: add SQRT shader opcode
On 01.02.2013 19:29, Brian Paul wrote:
> The glsl-to-tgsi translator will emit SQRT to implement GLSL's sqrt()
> and distance() functions if the PIPE_SHADER_CAP_TGSI_SQRT_SUPPORTED
> query says it's supported by the driver.
>
> Otherwise, sqrt(x) is implemented with x*rsq(x). The problem with
> this is sqrt(0) must be handled specially because rsq(0) might be
> Inf/NaN/undefined (and then 0*rsq(0) is Inf/NaN/undefined). In the

That's why we do rcp(rsq(x)), that works correctly.

I'm not sure we really need a cap for this though ... except to avoid modifying drivers ;)
I'll advertise the cap anyway, I prefer to be able to handle it internally.

But I like this change, lowering SQRT (or not) is device specific and shouldn't be done unconditionally just because the API can't represent it.

> glsl-to-tgsi code we use an extra CMP to check if x is zero and then
> replace the result of x*rsq(x) with zero.
>
> In the end, this makes sqrt() generate much more reasonable code for
> drivers that can do square roots.
>
> Note that many of piglit's generated shader tests use the GLSL
> distance() function.
> ---
>  src/gallium/docs/source/tgsi.rst           |    9 +
>  src/gallium/include/pipe/p_defines.h       |    3 ++-
>  src/gallium/include/pipe/p_shader_tokens.h |    2 +-
>  3 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
> index 548a9a3..5f03f32 100644
> --- a/src/gallium/docs/source/tgsi.rst
> +++ b/src/gallium/docs/source/tgsi.rst
> @@ -89,6 +89,15 @@ This instruction replicates its result.
>    dst = \frac{1}{\sqrt{|src.x|}}
>  
>  
> +.. opcode:: SQRT - Square Root
> +
> +This instruction replicates its result.
> +
> +.. math::
> +
> +  dst = {\sqrt{src.x}}
> +
> +
>  .. opcode:: EXP - Approximate Exponential Base 2
>  
>  .. math::
> diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h
> index d0db5e4..fdf6e7f 100644
> --- a/src/gallium/include/pipe/p_defines.h
> +++ b/src/gallium/include/pipe/p_defines.h
> @@ -542,7 +542,8 @@ enum pipe_shader_cap
>     PIPE_SHADER_CAP_SUBROUTINES = 16, /* BGNSUB, ENDSUB, CAL, RET */
>     PIPE_SHADER_CAP_INTEGERS = 17,
>     PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLERS = 18,
> -   PIPE_SHADER_CAP_PREFERRED_IR = 19
> +   PIPE_SHADER_CAP_PREFERRED_IR = 19,
> +   PIPE_SHADER_CAP_TGSI_SQRT_SUPPORTED = 20
>  };
>  
>  /**
> diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
> index 3fb12fb..a9fb6aa 100644
> --- a/src/gallium/include/pipe/p_shader_tokens.h
> +++ b/src/gallium/include/pipe/p_shader_tokens.h
> @@ -275,7 +275,7 @@ struct tgsi_property_data {
>  #define TGSI_OPCODE_SUB 17
>  #define TGSI_OPCODE_LRP 18
>  #define TGSI_OPCODE_CND 19
> -/* gap */
> +#define TGSI_OPCODE_SQRT 20
>  #define TGSI_OPCODE_DP2A 21
>  /* gap */
>  #define TGSI_OPCODE_FRC 24
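To make the sqrt(0) discussion concrete, here is a small CPU-side sketch of the three lowerings mentioned in this thread: plain x*rsq(x), x*rsq(x) with an extra compare against zero, and rcp(rsq(x)). It assumes IEEE 754 float behaviour (1/0 = +Inf, 0*Inf = NaN, 1/Inf = 0), which the commit message notes is not guaranteed on real hardware where rsq(0) may be undefined. All sketch_ names are invented for illustration, and sketch_sqrt() is only a Newton-iteration stand-in so the example does not depend on libm.

```c
#include <assert.h>

/* Stand-in square root (Newton steps y = (y + x/y) / 2) so the sketch needs
 * no libm. Negative inputs are not modelled and simply return 0. */
static float sketch_sqrt(float x)
{
   if (x <= 0.0f)
      return 0.0f;
   float y = x > 1.0f ? x : 1.0f;
   for (int i = 0; i < 100; i++)
      y = 0.5f * (y + x / y);
   return y;
}

/* Models of the RSQ and RCP instructions under IEEE 754 rules. */
static float sketch_rsq(float x) { return 1.0f / sketch_sqrt(x); } /* rsq(0) = +Inf */
static float sketch_rcp(float x) { return 1.0f / x; }

/* NaN is the only value that compares unequal to itself. */
static int sketch_is_nan(float v) { return v != v; }

/* Lowering 1: x * rsq(x). For x == 0 this is 0 * Inf = NaN. */
static float sketch_sqrt_mul_rsq(float x) { return x * sketch_rsq(x); }

/* Lowering 2: the same, plus the extra compare that forces sqrt(0) to 0. */
static float sketch_sqrt_mul_rsq_cmp(float x)
{
   return x == 0.0f ? 0.0f : x * sketch_rsq(x);
}

/* Lowering 3: rcp(rsq(x)). For x == 0 this is 1 / Inf = 0, no compare needed. */
static float sketch_sqrt_rcp_rsq(float x) { return sketch_rcp(sketch_rsq(x)); }

/* Usage: only the first lowering gets sqrt(0) wrong. */
void sketch_sqrt_demo(void)
{
   assert(sketch_is_nan(sketch_sqrt_mul_rsq(0.0f)));
   assert(sketch_sqrt_mul_rsq_cmp(0.0f) == 0.0f);
   assert(sketch_sqrt_rcp_rsq(0.0f) == 0.0f);
}
```

Both workarounds cost an extra instruction compared with a native SQRT, which is the argument for letting the driver decide via the cap.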
[Mesa-dev] [PATCH 1/4] mesa: implement GL_ARB_draw_indirect
I have 1 piglit test to check drawing with several combinations of parameters (using transform feedback to write the commands), but will make some more tests for various things like interaction with PrimitiveRestart or error conditions.
(http://people.freedesktop.org/~chrisbmr/0001-arb_draw_indirect-add-initial-test.patch)

The gallium interface specifies a start_instance parameter that the GL extension doesn't have (it's reservedMustBeZero instead, but, seriously, why ? D3D does have it. Because making yet another extension will be so much fun ?)

Not sure if we want to expose this with the compatibility profile.
---
 src/mapi/glapi/gen/ARB_draw_indirect.xml     |  45 +
 src/mapi/glapi/gen/Makefile.am               |   1 +
 src/mapi/glapi/gen/gl_API.xml                |   4 +-
 src/mesa/drivers/dri/i965/brw_draw.c         |   3 +-
 src/mesa/drivers/dri/i965/brw_draw.h         |   3 +-
 src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c |   9 +-
 src/mesa/main/api_validate.c                 | 159
 src/mesa/main/api_validate.h                 |  26 +++
 src/mesa/main/bufferobj.c                    |   9 +
 src/mesa/main/dd.h                           |  12 ++
 src/mesa/main/dlist.c                        |  41 +
 src/mesa/main/extensions.c                   |   2 +
 src/mesa/main/get.c                          |   5 +
 src/mesa/main/get_hash_params.py             |   2 +
 src/mesa/main/mtypes.h                       |   4 +
 src/mesa/main/tests/dispatch_sanity.cpp      |   8 +-
 src/mesa/main/vtxfmt.c                       |   7 +
 src/mesa/state_tracker/st_cb_rasterpos.c     |   2 +-
 src/mesa/state_tracker/st_draw.c             |   3 +-
 src/mesa/state_tracker/st_draw.h             |   6 +-
 src/mesa/state_tracker/st_draw_feedback.c    |   3 +-
 src/mesa/tnl/tnl.h                           |   3 +-
 src/mesa/vbo/vbo.h                           |   5 +-
 src/mesa/vbo/vbo_exec_array.c                | 251 +-
 src/mesa/vbo/vbo_exec_draw.c                 |   2 +-
 src/mesa/vbo/vbo_primitive_restart.c         |   4 +-
 src/mesa/vbo/vbo_rebase.c                    |   2 +-
 src/mesa/vbo/vbo_save_api.c                  |  53 ++
 src/mesa/vbo/vbo_save_draw.c                 |   2 +-
 src/mesa/vbo/vbo_split_copy.c                |   2 +-
 src/mesa/vbo/vbo_split_inplace.c             |   2 +-
 31 files changed, 652 insertions(+), 28 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_draw_indirect.xml

diff --git a/src/mapi/glapi/gen/ARB_draw_indirect.xml b/src/mapi/glapi/gen/ARB_draw_indirect.xml
new file mode 100644
index 000..7de03cd
--- /dev/null
+++ b/src/mapi/glapi/gen/ARB_draw_indirect.xml
@@ -0,0 +1,45 @@
+<?xml version="1.0"?>
+<!DOCTYPE OpenGLAPI SYSTEM "gl_API.dtd">
+
+<OpenGLAPI>
+
+<category name="GL_ARB_draw_indirect" number="87">
+
+    <enum name="DRAW_INDIRECT_BUFFER" value="0x8F3F"/>
+    <enum name="DRAW_INDIRECT_BUFFER_BINDING" value="0x8F43"/>
+
+    <function name="DrawArraysIndirect" offset="assign" exec="dynamic">
+        <param name="mode" type="GLenum"/>
+        <param name="indirect" type="const GLvoid *"/>
+    </function>
+
+    <function name="DrawElementsIndirect" offset="assign" exec="dynamic">
+        <param name="mode" type="GLenum"/>
+        <param name="type" type="GLenum"/>
+        <param name="indirect" type="const GLvoid *"/>
+    </function>
+
+</category>
+
+
+<category name="GL_ARB_multi_draw_indirect" number="133">
+
+    <function name="MultiDrawArraysIndirect" offset="assign" exec="dynamic">
+        <param name="mode" type="GLenum"/>
+        <param name="indirect" type="const GLvoid *"/>
+        <param name="primcount" type="GLsizei"/>
+        <param name="stride" type="GLsizei"/>
+    </function>
+
+    <function name="MultiDrawElementsIndirect" offset="assign" exec="dynamic">
+        <param name="mode" type="GLenum"/>
+        <param name="type" type="GLenum"/>
+        <param name="indirect" type="const GLvoid *"/>
+        <param name="primcount" type="GLsizei"/>
+        <param name="stride" type="GLsizei"/>
+    </function>
+
+</category>
+
+
+</OpenGLAPI>
diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index 4d51bbc..37fdea1 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -96,6 +96,7 @@ API_XML = \
 	ARB_depth_clamp.xml \
 	ARB_draw_buffers_blend.xml \
 	ARB_draw_elements_base_vertex.xml \
+	ARB_draw_indirect.xml \
 	ARB_draw_instanced.xml \
 	ARB_ES2_compatibility.xml \
 	ARB_ES3_compatibility.xml \
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index 4cbd724..bb6034f 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8239,6 +8239,8 @@
 
 <!-- ARB extensions #86...#93 -->
 
+<xi:include href="ARB_draw_indirect.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
+
 <category name="GL_ARB_transform_feedback3" number="94">
     <enum name="MAX_TRANSFORM_FEEDBACK_BUFFERS" value="0x8E70"/>
     <enum name="MAX_VERTEX_STREAMS" value="0x8E71"/>
@@ -8316,7
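As background for the reservedMustBeZero remark above, this is a sketch of the command layouts the `indirect` pointer refers to, following the structures described in the ARB_draw_indirect specification, with GLuint/GLint written as fixed-width types so the snippet compiles without GL headers. The helper sketch_multi_draw_offset() is hypothetical and not part of the patch. It only illustrates the stride rule from ARB_multi_draw_indirect, where a stride of zero means the commands are tightly packed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command read from the buffer by DrawArraysIndirect. */
typedef struct {
   uint32_t count;
   uint32_t primCount;
   uint32_t first;
   uint32_t reservedMustBeZero;  /* where gallium's start_instance would go */
} DrawArraysIndirectCommand;

/* Command read from the buffer by DrawElementsIndirect. */
typedef struct {
   uint32_t count;
   uint32_t primCount;
   uint32_t firstIndex;
   int32_t  baseVertex;
   uint32_t reservedMustBeZero;
} DrawElementsIndirectCommand;

/* Hypothetical helper: byte offset of the i-th command for the MultiDraw
 * variants. A stride of 0 means tightly packed commands. */
static size_t sketch_multi_draw_offset(size_t base, size_t i,
                                       size_t stride, size_t cmd_size)
{
   return base + i * (stride ? stride : cmd_size);
}

/* Usage: the third tightly packed DrawArrays command starts at byte 32. */
void sketch_indirect_demo(void)
{
   assert(sketch_multi_draw_offset(0, 2, 0,
                                   sizeof(DrawArraysIndirectCommand)) == 32);
}
```

The per-primitive offsets that patch 3/4 passes on as indirect_offset would be computed along these lines, though the actual code in vbo_exec_array.c is not shown in this thread.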
[Mesa-dev] [PATCH 3/4] st/mesa: add support for indirect drawing
---
 src/mesa/state_tracker/st_draw.c       |    7 ++-
 src/mesa/state_tracker/st_extensions.c |    4 +++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_draw.c b/src/mesa/state_tracker/st_draw.c
index 0f3aae7..f9fbd32 100644
--- a/src/mesa/state_tracker/st_draw.c
+++ b/src/mesa/state_tracker/st_draw.c
@@ -256,6 +256,10 @@ st_draw_vbo(struct gl_context *ctx,
       }
    }
 
+   if (indirect) {
+      info.indirect = st_buffer_object(indirect)->buffer;
+   }
+
    /* do actual drawing */
    for (i = 0; i < nr_prims; i++) {
       info.mode = translate_prim( ctx, prims[i].mode );
@@ -268,6 +272,7 @@ st_draw_vbo(struct gl_context *ctx,
          info.min_index = info.start;
          info.max_index = info.start + info.count - 1;
       }
+      info.indirect_offset = prims[i].indirect_offset;
 
       if (ST_DEBUG & DEBUG_DRAW) {
          debug_printf("st/draw: mode %s start %u count %u indexed %d\n",
@@ -277,7 +282,7 @@ st_draw_vbo(struct gl_context *ctx,
                       info.indexed);
       }
 
-      if (info.count_from_stream_output) {
+      if (info.count_from_stream_output || info.indirect) {
          cso_draw_vbo(st->cso_context, &info);
       }
       else if (info.primitive_restart) {
diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index 214588f..548bab2 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -398,7 +398,9 @@ void st_init_extensions(struct st_context *st)
       { o(MESA_texture_array), PIPE_CAP_MAX_TEXTURE_ARRAY_LAYERS },
 
       { o(OES_standard_derivatives), PIPE_CAP_SM3 },
 
-      { o(ARB_texture_cube_map_array), PIPE_CAP_CUBE_MAP_ARRAY }
+      { o(ARB_texture_cube_map_array), PIPE_CAP_CUBE_MAP_ARRAY },
+      { o(ARB_draw_indirect),PIPE_CAP_DRAW_INDIRECT },
+      { o(ARB_multi_draw_indirect), PIPE_CAP_DRAW_INDIRECT }
    };
 
    /* Required: render target and sampler support */
-- 
1.7.3.4