Re: [Mesa-dev] [PATCH 6/8] i965: Implement HiZ resolves on Broadwell.
On 02/18/2014 01:38 PM, Eric Anholt wrote: Kenneth Graunke kenn...@whitecape.org writes: [snip] diff --git a/src/mesa/drivers/dri/i965/gen8_depth_state.c b/src/mesa/drivers/dri/i965/gen8_depth_state.c index f30ff28..3fa20c8 100644 --- a/src/mesa/drivers/dri/i965/gen8_depth_state.c +++ b/src/mesa/drivers/dri/i965/gen8_depth_state.c @@ -203,3 +203,108 @@ gen8_emit_depth_stencil_hiz(struct brw_context *brw, brw-depthstencil.stencil_offset, hiz, width, height, depth, lod, min_array_element); } + +/** + * Emit packets to perform a depth/HiZ resolve or fast depth/stencil clear. + * + * See the Optimized Depth Buffer Clear and/or Stencil Buffer Clear section + * of the hardware documentation for details. + */ +void +gen8_hiz_exec(struct brw_context *brw, struct intel_mipmap_tree *mt, + unsigned int level, unsigned int layer, enum gen6_hiz_op op) +{ + if (op == GEN6_HIZ_OP_NONE) + return; + + assert(mt-first_level == 0); + + struct intel_mipmap_level *miplevel = mt-level[level]; + + /* The basic algorithm is: +* - If needed, emit 3DSTATE_{DEPTH,HIER_DEPTH,STENCIL}_BUFFER and +* 3DSTATE_CLEAR_PARAMS packets to set up the relevant buffers. +* - If needed, emit 3DSTATE_DRAWING_RECTANGLE. +* - Emit 3DSTATE_WM_HZ_OP with a bit set for the particular operation. +* - Do a special PIPE_CONTROL to trigger an implicit rectangle primitive. +* - Emit 3DSTATE_WM_HZ_OP with no bits set to return to normal rendering. +*/ + emit_depth_packets(brw, mt, + brw_depth_format(brw, mt-format), + BRW_SURFACE_2D, + true, /* depth writes */ + NULL, false, 0, /* no stencil for now */ + true, /* hiz */ + mt-logical_width0, + mt-logical_height0, + MAX2(mt-logical_depth0, 1), Is logical_depth0 ever 0? That seems like a bug. No, I guess it isn't. It looks like I copy and pasted this from BLORP, or was being overly cautious for some reason. Dropped. 
+ level, + layer); /* min_array_element */ + + BEGIN_BATCH(4); + OUT_BATCH(_3DSTATE_DRAWING_RECTANGLE 16 | (4 - 2)); + OUT_BATCH(0); + OUT_BATCH(((mt-logical_width0 - 1) 0x) | + ((mt-logical_height0 - 1) 16)); + OUT_BATCH(0); + ADVANCE_BATCH(); The drawing rectangle should be using the level's size, not the level 0 size. Yes, this makes sense...we bind a specific miplevel of the depth buffer, so presumably the (0, 0) origin is the start of that miplevel, not the start of the whole tree. I'll change that. Since the drawing rectangle is just the bounds of where you can draw, and not actually the clear/resolve rectangle, I think specifying one that's too large shouldn't be harmful. But specifying the right value is trivial, so I agree we should do it. + uint32_t sample_mask = 0x; + if (mt-num_samples 0) { + dw1 |= SET_FIELD(ffs(mt-num_samples) - 1, GEN8_WM_HZ_NUM_SAMPLES); + sample_mask = gen6_determine_sample_mask(brw); + } I don't think we want the user-set sample mask stuff to change the samples affected by our hiz/depth resolves. I think you can just drop the if block. Good point, whatever the user specified is probably unrelated to our values. I've dropped the sample_mask variable and just stuffed 0x in the packet. I kept the if-block for the dw1 |= ...num_samples... line. + + BEGIN_BATCH(5); + OUT_BATCH(_3DSTATE_WM_HZ_OP 16 | (5 - 2)); + OUT_BATCH(dw1); + OUT_BATCH(0); + OUT_BATCH(SET_FIELD(miplevel-width, GEN8_WM_HZ_CLEAR_RECTANGLE_X_MAX) | + SET_FIELD(miplevel-height, GEN8_WM_HZ_CLEAR_RECTANGLE_Y_MAX)); + OUT_BATCH(SET_FIELD(sample_mask, GEN8_WM_HZ_SAMPLE_MASK)); + ADVANCE_BATCH(); I think now the miplevel-width should be minify(mt-logical_width0, level). Hope that helped Yes, that's much nicer - and correct for MSAA buffers! 
I'm unclear whether we need to do: ALIGN(minify(mt-logical_width0, level), 8) ALIGN(minify(mt-logical_height0, level), 4) (both here and in the drawing rectangle) I've read seemingly contradictory information...it sounds like it might be necessary for depth resolves, but not otherwise...but I could be misinterpreting it. It seems to be working... + + /* Emit a PIPE_CONTROL with Post-Sync Operation set to Write Immediate +* Data, and no other bits set. This causes 3DSTATE_WM_HZ_OP's state to +* take effect, and spawns a rectangle primitive. +*/ + brw_emit_pipe_control_write(brw, + PIPE_CONTROL_WRITE_IMMEDIATE, + brw-batch.workaround_bo, 0, 0, 0); + + /* Emit 3DSTATE_WM_HZ_OP again to disable the state overrides. */ + BEGIN_BATCH(5); + OUT_BATCH(_3DSTATE_WM_HZ_OP 16 | (5
[Mesa-dev] [PATCH 01/13] i965: Simplify Broadwell's 3DSTATE_MULTISAMPLE sample count handling.
These enumerations are simply log2 of the number of multisamples shifted
by a bit, so we can calculate them using ffs() in a lot less code.

Suggested by Eric Anholt.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/gen8_multisample_state.c | 26 +++---
 1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_multisample_state.c b/src/mesa/drivers/dri/i965/gen8_multisample_state.c
index 64c7208..bfe0d5b 100644
--- a/src/mesa/drivers/dri/i965/gen8_multisample_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_multisample_state.c
@@ -33,33 +33,13 @@
 void
 gen8_emit_3dstate_multisample(struct brw_context *brw, unsigned num_samples)
 {
-   uint32_t number_of_multisamples = 0;
+   assert(num_samples <= 16);
 
-   switch (num_samples) {
-   case 0:
-   case 1:
-      number_of_multisamples = MS_NUMSAMPLES_1;
-      break;
-   case 2:
-      number_of_multisamples = MS_NUMSAMPLES_2;
-      break;
-   case 4:
-      number_of_multisamples = MS_NUMSAMPLES_4;
-      break;
-   case 8:
-      number_of_multisamples = MS_NUMSAMPLES_8;
-      break;
-   case 16:
-      number_of_multisamples = MS_NUMSAMPLES_16;
-      break;
-   default:
-      assert(!"Unrecognized num_samples in gen8_emit_3dstate_multisample");
-      break;
-   }
+   unsigned log2_samples = ffs(MAX2(num_samples, 1)) - 1;
 
    BEGIN_BATCH(2);
    OUT_BATCH(GEN8_3DSTATE_MULTISAMPLE << 16 | (2 - 2));
-   OUT_BATCH(MS_PIXEL_LOCATION_CENTER | number_of_multisamples);
+   OUT_BATCH(MS_PIXEL_LOCATION_CENTER | log2_samples << 1);
    ADVANCE_BATCH();
 }
--
1.8.4.2

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
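For anyone who wants to check the ffs() trick outside the driver, here is a minimal standalone sketch. The helper name encode_log2_samples and its shift parameter are hypothetical (they are not Mesa functions); the sketch assumes a POSIX ffs() from <strings.h> and power-of-two sample counts, with 0 treated the same as 1.

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h>   /* POSIX ffs() */

/* Hypothetical helper, not part of Mesa: encode a sample count as
 * log2(count) << shift.
 */
static uint32_t
encode_log2_samples(unsigned num_samples, unsigned shift)
{
   /* Treat 0 ("not multisampled") like 1 so ffs() never sees 0. */
   unsigned n = num_samples > 1 ? num_samples : 1;

   /* Only powers of two have a well-defined encoding here. */
   assert((n & (n - 1)) == 0);

   /* ffs() returns the 1-based index of the lowest set bit, so for a
    * power of two, ffs(n) - 1 is log2(n).
    */
   return (uint32_t)(ffs((int)n) - 1) << shift;
}
```

With a shift of 1 this matches the "shifted by a bit" encoding described in the commit message; a different shift covers fields that place the same log2 value elsewhere in a dword.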
[Mesa-dev] [PATCH 05/13] i965/fs: Implement FS_OPCODE_SET_SAMPLE_ID on Broadwell.
Largely cut and paste from Gen7; it works the same way.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_fs.h              | 4
 src/mesa/drivers/dri/i965/gen8_fs_generator.cpp | 29 -
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index b5fb0eb..99c6298 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -704,6 +704,10 @@ private:
                                         struct brw_reg index,
                                         struct brw_reg offset);
    void generate_mov_dispatch_to_flags(fs_inst *ir);
+   void generate_set_sample_id(fs_inst *ir,
+                               struct brw_reg dst,
+                               struct brw_reg src0,
+                               struct brw_reg src1);
    void generate_set_simd4x2_offset(fs_inst *ir,
                                     struct brw_reg dst,
                                     struct brw_reg offset);
diff --git a/src/mesa/drivers/dri/i965/gen8_fs_generator.cpp b/src/mesa/drivers/dri/i965/gen8_fs_generator.cpp
index e19d960..0078228 100644
--- a/src/mesa/drivers/dri/i965/gen8_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/gen8_fs_generator.cpp
@@ -698,6 +698,33 @@ gen8_fs_generator::generate_set_simd4x2_offset(fs_inst *ir,
 }
 
 /**
+ * Do a special ADD with vstride=1, width=4, hstride=0 for src1.
+ */
+void
+gen8_fs_generator::generate_set_sample_id(fs_inst *ir,
+                                          struct brw_reg dst,
+                                          struct brw_reg src0,
+                                          struct brw_reg src1)
+{
+   assert(dst.type == BRW_REGISTER_TYPE_D || dst.type == BRW_REGISTER_TYPE_UD);
+   assert(src0.type == BRW_REGISTER_TYPE_D || src0.type == BRW_REGISTER_TYPE_UD);
+
+   struct brw_reg reg = retype(stride(src1, 1, 4, 0), BRW_REGISTER_TYPE_UW);
+
+   unsigned save_exec_size = default_state.exec_size;
+   default_state.exec_size = BRW_EXECUTE_8;
+
+   gen8_instruction *add = ADD(dst, src0, reg);
+   gen8_set_mask_control(add, BRW_MASK_DISABLE);
+   if (dispatch_width == 16) {
+      add = ADD(offset(dst, 1), offset(src0, 1), suboffset(reg, 2));
+      gen8_set_mask_control(add, BRW_MASK_DISABLE);
+   }
+
+   default_state.exec_size = save_exec_size;
+}
+
+/**
  * Change the register's data type from UD to HF, doubling the strides in order
  * to compensate for halving the data type width.
  */
@@ -1148,7 +1175,7 @@ gen8_fs_generator::generate_code(exec_list *instructions)
       break;
 
    case FS_OPCODE_SET_SAMPLE_ID:
-      assert(!"XXX: Missing Gen8 scalar support for SET_SAMPLE_ID");
+      generate_set_sample_id(ir, dst, src[0], src[1]);
       break;
 
    case FS_OPCODE_PACK_HALF_2x16_SPLIT:
--
1.8.4.2
[Mesa-dev] [PATCH 08/13] i965: Set Position XY Offset Select bits in 3DSTATE_PS on Broadwell.
Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/gen8_ps_state.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c b/src/mesa/drivers/dri/i965/gen8_ps_state.c
index e93668e..57bf053 100644
--- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
@@ -187,6 +187,24 @@ upload_ps_state(struct brw_context *brw)
    if (brw->wm.prog_data->prog_offset_16)
       dw6 |= GEN7_PS_16_DISPATCH_ENABLE;
 
+   /* From the documentation for this packet:
+    * If the PS kernel does not need the Position XY Offsets to
+    * compute a Position Value, then this field should be programmed
+    * to POSOFFSET_NONE.
+    *
+    * SW Recommendation: If the PS kernel needs the Position Offsets
+    * to compute a Position XY value, this field should match Position
+    * ZW Interpolation Mode to ensure a consistent position.xyzw
+    * computation.
+    *
+    * We only require XY sample offsets. So, this recommendation doesn't
+    * look useful at the moment. We might need this in future.
+    */
+   if (brw->wm.prog_data->uses_pos_offset)
+      dw6 |= GEN7_PS_POSOFFSET_SAMPLE;
+   else
+      dw6 |= GEN7_PS_POSOFFSET_NONE;
+
    dw7 |=
       brw->wm.prog_data->first_curbe_grf << GEN7_PS_DISPATCH_START_GRF_SHIFT_0 |
       brw->wm.prog_data->first_curbe_grf_16 << GEN7_PS_DISPATCH_START_GRF_SHIFT_2;
--
1.8.4.2
[Mesa-dev] [PATCH 09/13] i965: Only use the SIMD16 program for per-sample shading on Broadwell.
This is a straight port from gen7_wm_state.c; I haven't looked into
whether we can do both.

v2: Actually do it right.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/gen8_ps_state.c | 38 ---
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c b/src/mesa/drivers/dri/i965/gen8_ps_state.c
index 57bf053..a834b85 100644
--- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
@@ -183,10 +183,6 @@ upload_ps_state(struct brw_context *brw)
    if (brw->wm.prog_data->nr_params > 0)
       dw6 |= GEN7_PS_PUSH_CONSTANT_ENABLE;
 
-   dw6 |= GEN7_PS_8_DISPATCH_ENABLE;
-   if (brw->wm.prog_data->prog_offset_16)
-      dw6 |= GEN7_PS_16_DISPATCH_ENABLE;
-
    /* From the documentation for this packet:
     * If the PS kernel does not need the Position XY Offsets to
     * compute a Position Value, then this field should be programmed
@@ -205,13 +201,39 @@ upload_ps_state(struct brw_context *brw)
    else
       dw6 |= GEN7_PS_POSOFFSET_NONE;
 
-   dw7 |=
-      brw->wm.prog_data->first_curbe_grf << GEN7_PS_DISPATCH_START_GRF_SHIFT_0 |
-      brw->wm.prog_data->first_curbe_grf_16 << GEN7_PS_DISPATCH_START_GRF_SHIFT_2;
+   /* In case of non 1x per sample shading, only one of SIMD8 and SIMD16
+    * should be enabled. We do 'SIMD16 only' dispatch if a SIMD16 shader
+    * is successfully compiled. In majority of the cases that bring us
+    * better performance than 'SIMD8 only' dispatch.
+    */
+   int min_invocations_per_fragment =
+      _mesa_get_min_invocations_per_fragment(ctx, brw->fragment_program, false);
+   assert(min_invocations_per_fragment >= 1);
+
+   if (brw->wm.prog_data->prog_offset_16) {
+      dw6 |= GEN7_PS_16_DISPATCH_ENABLE;
+      if (min_invocations_per_fragment == 1) {
+         dw6 |= GEN7_PS_8_DISPATCH_ENABLE;
+         dw7 |= (brw->wm.prog_data->first_curbe_grf <<
+                 GEN7_PS_DISPATCH_START_GRF_SHIFT_0);
+         dw7 |= (brw->wm.prog_data->first_curbe_grf_16 <<
+                 GEN7_PS_DISPATCH_START_GRF_SHIFT_2);
+      } else {
+         dw7 |= (brw->wm.prog_data->first_curbe_grf_16 <<
+                 GEN7_PS_DISPATCH_START_GRF_SHIFT_0);
+      }
+   } else {
+      dw6 |= GEN7_PS_8_DISPATCH_ENABLE;
+      dw7 |= (brw->wm.prog_data->first_curbe_grf <<
+              GEN7_PS_DISPATCH_START_GRF_SHIFT_0);
+   }
 
    BEGIN_BATCH(12);
    OUT_BATCH(_3DSTATE_PS << 16 | (12 - 2));
-   OUT_BATCH(brw->wm.base.prog_offset);
+   if (brw->wm.prog_data->prog_offset_16 && min_invocations_per_fragment > 1)
+      OUT_BATCH(brw->wm.base.prog_offset + brw->wm.prog_data->prog_offset_16);
+   else
+      OUT_BATCH(brw->wm.base.prog_offset);
    OUT_BATCH(0);
    OUT_BATCH(dw3);
    if (brw->wm.prog_data->total_scratch) {
--
1.8.4.2
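The dispatch decision in this patch boils down to a small truth table, which may be easier to review in isolation. The sketch below is only an illustration: struct ps_dispatch and choose_ps_dispatch are hypothetical names, and the three booleans stand in for the dw6 enable bits and for the choice of kernel start pointer made in the patch.

```c
#include <stdbool.h>

/* Hypothetical summary of the state the patch programs. */
struct ps_dispatch {
   bool simd8;               /* 8-wide dispatch enabled */
   bool simd16;              /* 16-wide dispatch enabled */
   bool start_at_simd16;     /* kernel pointer 0 points at the SIMD16 program */
};

static struct ps_dispatch
choose_ps_dispatch(bool have_simd16_program, int min_invocations_per_fragment)
{
   struct ps_dispatch d = { false, false, false };

   if (have_simd16_program) {
      d.simd16 = true;
      if (min_invocations_per_fragment == 1) {
         /* Not per-sample shading: both widths may be enabled. */
         d.simd8 = true;
      } else {
         /* Per-sample shading: SIMD16 only, and it becomes kernel 0. */
         d.start_at_simd16 = true;
      }
   } else {
      /* No SIMD16 program was compiled: fall back to SIMD8 only. */
      d.simd8 = true;
   }

   return d;
}
```

Only the per-sample case with a SIMD16 program changes which program offset and starting GRF go into slot 0; the other two cases keep the SIMD8 program there.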
[Mesa-dev] [PATCH 10/13] i965: Thwack multisample enable bit in 3DSTATE_RASTER.
The meaning and effects of this bit are surprisingly complicated.
See Rasterization > Windower > Multisampling > Multisample ModesState.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_defines.h   | 1 +
 src/mesa/drivers/dri/i965/gen8_sf_state.c | 4
 2 files changed, 5 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index dea0940..1cbbe67 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1707,6 +1707,7 @@ enum brw_message_target {
 # define GEN8_RASTER_CULL_FRONT                    (2 << 16)
 # define GEN8_RASTER_CULL_BACK                     (3 << 16)
 # define GEN8_RASTER_SMOOTH_POINT_ENABLE           (1 << 13)
+# define GEN8_RASTER_API_MULTISAMPLE_ENABLE        (1 << 12)
 # define GEN8_RASTER_LINE_AA_ENABLE                (1 << 2)
 # define GEN8_RASTER_SCISSOR_ENABLE                (1 << 1)
 # define GEN8_RASTER_VIEWPORT_Z_CLIP_TEST_ENABLE   (1 << 0)
diff --git a/src/mesa/drivers/dri/i965/gen8_sf_state.c b/src/mesa/drivers/dri/i965/gen8_sf_state.c
index a5cd9f8..b31b17e 100644
--- a/src/mesa/drivers/dri/i965/gen8_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_sf_state.c
@@ -209,6 +209,9 @@ upload_raster(struct brw_context *brw)
    if (ctx->Point.SmoothFlag)
       dw1 |= GEN8_RASTER_SMOOTH_POINT_ENABLE;
 
+   if (ctx->Multisample._Enabled)
+      dw1 |= GEN8_RASTER_API_MULTISAMPLE_ENABLE;
+
    if (ctx->Polygon.OffsetFill)
       dw1 |= GEN6_SF_GLOBAL_DEPTH_OFFSET_SOLID;
 
@@ -274,6 +277,7 @@ const struct brw_tracked_state gen8_raster_state = {
    .dirty = {
       .mesa = _NEW_BUFFERS |
               _NEW_LINE |
+              _NEW_MULTISAMPLE |
               _NEW_POINT |
               _NEW_POLYGON |
               _NEW_SCISSOR |
--
1.8.4.2
[Mesa-dev] [PATCH 11/13] i965: Enable smooth points when multisampling without point sprites.
According to the Point Multisample Rasterization section of the OpenGL
specification (3.0 or later), smooth points are supposed to be enabled
implicitly when multisampling, regardless of the GL_POINT_SMOOTH flag.

However, if GL_POINT_SPRITE is enabled, you get square points no matter
what. Core contexts always enable point sprites, so this effectively
makes smooth points go away, even in the case of multisampling.

Fixes Piglit's EXT_framebuffer_multisample/point-smooth tests.
(Yes, that's right folks, we actually have Piglit tests for this.)

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/gen8_sf_state.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_sf_state.c b/src/mesa/drivers/dri/i965/gen8_sf_state.c
index b31b17e..0693fee 100644
--- a/src/mesa/drivers/dri/i965/gen8_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_sf_state.c
@@ -139,8 +139,11 @@ upload_sf(struct brw_context *brw)
    if (!(ctx->VertexProgram.PointSizeEnabled || ctx->Point._Attenuated))
       dw3 |= GEN6_SF_USE_STATE_POINT_WIDTH;
 
-   if (ctx->Point.SmoothFlag)
+   /* _NEW_POINT | _NEW_MULTISAMPLE */
+   if ((ctx->Point.SmoothFlag || ctx->Multisample._Enabled) &&
+       !ctx->Point.PointSprite) {
       dw3 |= GEN8_SF_SMOOTH_POINT_ENABLE;
+   }
 
    dw3 |= GEN6_SF_LINE_AA_MODE_TRUE;
 
@@ -166,6 +169,7 @@ const struct brw_tracked_state gen8_sf_state = {
       .mesa = _NEW_LIGHT |
               _NEW_PROGRAM |
               _NEW_LINE |
+              _NEW_MULTISAMPLE |
               _NEW_POINT,
       .brw = BRW_NEW_CONTEXT,
       .cache = 0,
--
1.8.4.2
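The rule in the commit message can be written as a single predicate, sketched below for illustration. The function name want_smooth_points and its plain boolean parameters are hypothetical; in the driver the inputs come from the GL context fields named in the diff.

```c
#include <stdbool.h>

/* Hypothetical predicate mirroring the condition in the patch:
 * smooth points are wanted when GL_POINT_SMOOTH is set or when
 * multisampling is enabled, unless point sprites are enabled.
 */
static bool
want_smooth_points(bool point_smooth, bool multisample_enabled,
                   bool point_sprite)
{
   return (point_smooth || multisample_enabled) && !point_sprite;
}
```

Because core contexts always have point sprites enabled, the predicate is false for them regardless of the other two inputs, which is the behaviour the commit message describes.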
[Mesa-dev] [PATCH 07/13] i965: Add missing sample shading bits to Gen8's 3DSTATE_PS_EXTRA.
v2: Also set the oMask Present to Render Target bit, which is required
    for shaders that write oMask. Otherwise the hardware won't expect
    the extra data.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/gen8_ps_state.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c b/src/mesa/drivers/dri/i965/gen8_ps_state.c
index e0a1c9b..e93668e 100644
--- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
@@ -22,6 +22,7 @@
  */
 
 #include <stdbool.h>
+#include "program/program.h"
 #include "brw_state.h"
 #include "brw_defines.h"
 #include "intel_batchbuffer.h"
@@ -29,6 +30,7 @@
 static void
 upload_ps_extra(struct brw_context *brw)
 {
+   struct gl_context *ctx = &brw->ctx;
    /* BRW_NEW_FRAGMENT_PROGRAM */
    const struct brw_fragment_program *fp =
       brw_fragment_program_const(brw->fragment_program);
@@ -63,6 +65,18 @@ upload_ps_extra(struct brw_context *brw)
    if (fp->program.Base.InputsRead & VARYING_BIT_POS)
       dw1 |= GEN8_PSX_USES_SOURCE_DEPTH | GEN8_PSX_USES_SOURCE_W;
 
+   /* _NEW_BUFFERS */
+   bool multisampled_fbo = ctx->DrawBuffer->Visual.samples > 1;
+   if (multisampled_fbo &&
+       _mesa_get_min_invocations_per_fragment(ctx, &fp->program, false) > 1)
+      dw1 |= GEN8_PSX_SHADER_IS_PER_SAMPLE;
+
+   if (fp->program.Base.SystemValuesRead & SYSTEM_BIT_SAMPLE_MASK_IN)
+      dw1 |= GEN8_PSX_SHADER_USES_INPUT_COVERAGE_MASK;
+
+   if (brw->wm.prog_data->uses_omask)
+      dw1 |= GEN8_PSX_OMASK_TO_RENDER_TARGET;
+
    BEGIN_BATCH(2);
    OUT_BATCH(_3DSTATE_PS_EXTRA << 16 | (2 - 2));
    OUT_BATCH(dw1);
@@ -71,7 +85,7 @@ upload_ps_extra(struct brw_context *brw)
 
 const struct brw_tracked_state gen8_ps_extra = {
    .dirty = {
-      .mesa = 0,
+      .mesa = _NEW_BUFFERS,
       .brw = BRW_NEW_CONTEXT | BRW_NEW_FRAGMENT_PROGRAM,
       .cache = 0,
    },
--
1.8.4.2
[Mesa-dev] [PATCH 02/13] i965: Use ffs() for sample counting in gen7_surface_msaa_bits().
The enumerations are just log2(num_samples) shifted by 3, which we can
easily compute via ffs(). This also makes it reusable for Broadwell,
which has 2x MSAA.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
index 12d0fa9..154a0fd 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
@@ -82,12 +82,10 @@ gen7_surface_msaa_bits(unsigned num_samples, enum intel_msaa_layout layout)
 {
    uint32_t ss4 = 0;
 
-   if (num_samples > 4)
-      ss4 |= GEN7_SURFACE_MULTISAMPLECOUNT_8;
-   else if (num_samples > 1)
-      ss4 |= GEN7_SURFACE_MULTISAMPLECOUNT_4;
-   else
-      ss4 |= GEN7_SURFACE_MULTISAMPLECOUNT_1;
+   assert(num_samples <= 8);
+
+   /* The SURFACE_MULTISAMPLECOUNT_X enums are simply log2(num_samples) << 3. */
+   ss4 |= (ffs(MAX2(num_samples, 1)) - 1) << 3;
 
    if (layout == INTEL_MSAA_LAYOUT_IMS)
       ss4 |= GEN7_SURFACE_MSFMT_DEPTH_STENCIL;
--
1.8.4.2
[Mesa-dev] [PATCH 06/13] i965/fs: Implement FS_OPCODE_SET_OMASK on Broadwell.
I made a few changes which I think simplify the code a bit compared to
the Gen7 implementation, but which are largely pointless.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_fs.h              | 3 +++
 src/mesa/drivers/dri/i965/gen8_fs_generator.cpp | 36 -
 2 files changed, 38 insertions(+), 1 deletion(-)

Apologies for the differences between Gen7 and Gen8 code. I think this
is cleaner, and as long as I'm reimplementing it...

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 99c6298..00ac577 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -704,6 +704,9 @@ private:
                                         struct brw_reg index,
                                         struct brw_reg offset);
    void generate_mov_dispatch_to_flags(fs_inst *ir);
+   void generate_set_omask(fs_inst *ir,
+                           struct brw_reg dst,
+                           struct brw_reg sample_mask);
    void generate_set_sample_id(fs_inst *ir,
                                struct brw_reg dst,
                                struct brw_reg src0,
diff --git a/src/mesa/drivers/dri/i965/gen8_fs_generator.cpp b/src/mesa/drivers/dri/i965/gen8_fs_generator.cpp
index 0078228..106c7f4 100644
--- a/src/mesa/drivers/dri/i965/gen8_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/gen8_fs_generator.cpp
@@ -698,6 +698,40 @@ gen8_fs_generator::generate_set_simd4x2_offset(fs_inst *ir,
 }
 
 /**
+ * Sets vstride=16, width=8, hstride=2 or vstride=0, width=1, hstride=0
+ * (when mask is passed as a uniform) of register mask before moving it
+ * to register dst.
+ */
+void
+gen8_fs_generator::generate_set_omask(fs_inst *inst,
+                                      struct brw_reg dst,
+                                      struct brw_reg mask)
+{
+   assert(dst.type == BRW_REGISTER_TYPE_UW);
+
+   if (dispatch_width == 16)
+      dst = vec16(dst);
+
+   if (mask.vstride == BRW_VERTICAL_STRIDE_8 &&
+       mask.width == BRW_WIDTH_8 &&
+       mask.hstride == BRW_HORIZONTAL_STRIDE_1) {
+      mask = stride(mask, 16, 8, 2);
+   } else {
+      assert(mask.vstride == BRW_VERTICAL_STRIDE_0 &&
+             mask.width == BRW_WIDTH_1 &&
+             mask.hstride == BRW_HORIZONTAL_STRIDE_0);
+   }
+
+   unsigned save_exec_size = default_state.exec_size;
+   default_state.exec_size = BRW_EXECUTE_8;
+
+   gen8_instruction *mov = MOV(dst, retype(mask, dst.type));
+   gen8_set_mask_control(mov, BRW_MASK_DISABLE);
+
+   default_state.exec_size = save_exec_size;
+}
+
+/**
  * Do a special ADD with vstride=1, width=4, hstride=0 for src1.
  */
 void
@@ -1171,7 +1205,7 @@ gen8_fs_generator::generate_code(exec_list *instructions)
       break;
 
    case FS_OPCODE_SET_OMASK:
-      assert(!"XXX: Missing Gen8 scalar support for SET_OMASK");
+      generate_set_omask(ir, dst, src[0]);
       break;
 
    case FS_OPCODE_SET_SAMPLE_ID:
--
1.8.4.2
[Mesa-dev] [PATCH 04/13] Hack: Disable MCS on Broadwell for now.
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 6 ++
 1 file changed, 6 insertions(+)

I'm mostly sending this out as a placeholder. Ultimately, we want to get
MCS working. I'm not sure whether it would be valuable to push this
(with a proper commit message) in the meantime.

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index e604b70..43f51fc 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -84,6 +84,12 @@ compute_msaa_layout(struct brw_context *brw, mesa_format format, GLenum target)
    case GL_DEPTH_STENCIL:
       return INTEL_MSAA_LAYOUT_IMS;
    default:
+      /* Disable MCS on Broadwell for now. We can enable it once things
+       * are working without it.
+       */
+      if (brw->gen >= 8)
+         return INTEL_MSAA_LAYOUT_UMS;
+
       /* From the Ivy Bridge PRM, Vol4 Part1 p77 (MCS Enable):
        *
        * This field must be set to 0 for all SINT MSRTs when all RT channels
--
1.8.4.2
[Mesa-dev] [PATCH 13/13] i965: Actually claim to support MSAA on Broadwell.
We need to advertise 8x, 4x, and 2x multisamples. Previously, we only
claimed to support 0/1 samples.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_context.c  | 6 ++
 src/mesa/drivers/dri/i965/intel_screen.c | 5 -
 2 files changed, 10 insertions(+), 1 deletion(-)

This also makes WebGL work.

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index bb194a7..5800092 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -79,6 +79,12 @@ brw_query_samples_for_format(struct gl_context *ctx, GLenum target,
    (void) target;
 
    switch (brw->gen) {
+   case 8:
+      samples[0] = 8;
+      samples[1] = 4;
+      samples[2] = 2;
+      return 3;
+
    case 7:
       samples[0] = 8;
       samples[1] = 4;
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
index ba22971..b5b0294 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -,11 +,14 @@ intel_detect_swizzling(struct intel_screen *screen)
 const int*
 intel_supported_msaa_modes(const struct intel_screen *screen)
 {
+   static const int gen8_modes[] = {8, 4, 2, 0, -1};
    static const int gen7_modes[] = {8, 4, 0, -1};
    static const int gen6_modes[] = {4, 0, -1};
    static const int gen4_modes[] = {0, -1};
 
-   if (screen->devinfo->gen >= 7) {
+   if (screen->devinfo->gen >= 8) {
+      return gen8_modes;
+   } else if (screen->devinfo->gen >= 7) {
       return gen7_modes;
    } else if (screen->devinfo->gen == 6) {
       return gen6_modes;
--
1.8.4.2
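As a quick way to sanity-check the mode tables, here is a standalone sketch of the selection logic. It takes a bare generation number instead of a struct intel_screen, the names supported_msaa_modes and count_msaa_modes are hypothetical, and the final fallback to the {0, -1} table for generations below 6 is an assumption, since that branch lies outside the diff context.

```c
/* Hypothetical standalone version of the table lookup in the patch.
 * Each table lists sample counts in descending order, terminated by -1.
 */
static const int *
supported_msaa_modes(int gen)
{
   static const int gen8_modes[] = {8, 4, 2, 0, -1};
   static const int gen7_modes[] = {8, 4, 0, -1};
   static const int gen6_modes[] = {4, 0, -1};
   static const int gen4_modes[] = {0, -1};

   if (gen >= 8)
      return gen8_modes;
   else if (gen >= 7)
      return gen7_modes;
   else if (gen == 6)
      return gen6_modes;
   else
      return gen4_modes;   /* assumed fallback, not shown in the diff */
}

/* Count the entries before the -1 terminator. */
static int
count_msaa_modes(const int *modes)
{
   int n = 0;
   while (modes[n] != -1)
      n++;
   return n;
}
```

The order of the comparisons matters: the gen >= 8 test has to come before the gen >= 7 test, otherwise Broadwell would still get the table without 2x.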
[Mesa-dev] [PATCH 03/13] i965: Use gen7_surface_msaa_bits in Broadwell SURFACE_STATE code.
We already set the number of samples, but were missing the MSAA layout
mode. Reusing gen7_surface_msaa_bits makes it easy to set both.

This also lets us drop the Gen8 surface_num_multisamples function.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/gen8_surface_state.c | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c
index 22ffa78..594e531 100644
--- a/src/mesa/drivers/dri/i965/gen8_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c
@@ -83,18 +83,6 @@ horizontal_alignment(struct intel_mipmap_tree *mt)
    }
 }
 
-static uint32_t
-surface_num_multisamples(unsigned num_samples)
-{
-   assert(num_samples >= 0 && num_samples <= 16);
-
-   if (num_samples == 0)
-      return GEN7_SURFACE_MULTISAMPLECOUNT_1;
-
-   /* The SURFACE_MULTISAMPLECOUNT_X enums are simply log2(num_samples) << 3. */
-   return (ffs(num_samples) - 1) << 3;
-}
-
 static void
 gen8_emit_buffer_surface_state(struct brw_context *brw,
                                uint32_t *out_offset,
@@ -180,7 +168,7 @@ gen8_update_texture_surface(struct gl_context *ctx,
    surf[3] = SET_FIELD(mt->logical_depth0 - 1, BRW_SURFACE_DEPTH) |
              (mt->region->pitch - 1);
 
-   surf[4] = surface_num_multisamples(mt->num_samples);
+   surf[4] = gen7_surface_msaa_bits(mt->num_samples, mt->msaa_layout);
 
    surf[5] = SET_FIELD(tObj->BaseLevel - mt->first_level, GEN7_SURFACE_MIN_LOD) |
              (intelObj->_MaxLevel - tObj->BaseLevel); /* mip count */
@@ -322,7 +310,7 @@ gen8_update_renderbuffer_surface(struct brw_context *brw,
    surf[3] = (depth - 1) << BRW_SURFACE_DEPTH_SHIFT |
              (region->pitch - 1); /* Surface Pitch */
 
-   surf[4] = surface_num_multisamples(mt->num_samples) |
+   surf[4] = gen7_surface_msaa_bits(mt->num_samples, mt->msaa_layout) |
              min_array_element << GEN7_SURFACE_MIN_ARRAY_ELEMENT_SHIFT |
              (depth - 1) << GEN7_SURFACE_RENDER_TARGET_VIEW_EXTENT_SHIFT;
 
--
1.8.4.2
[Mesa-dev] [PATCH 12/13] i965: Update physical width/height munging for 2x IMS MSAA.
I can't find any documentation to explain what ought to be done here, so
I simply guessed based on the pattern I observed in the 4x/8x cases.
It appears to work, but it could be totally wrong.

I was able to find the Sandybridge PRM quote from the comments in the
latest documentation: Shared Functions > 3D Sampler > Multisampled
Surface Behavior. However, it only mentions 4x MSAA - not even 8x.

After a substantial amount more digging, I was able to find a second
page (incorrectly tagged) which confirmed the formulas in our code for
8x MSAA. However, that page didn't mention 2x MSAA at all.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 43f51fc..07308dc 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -311,6 +311,11 @@ intel_miptree_create_layout(struct brw_context *brw,
           * sample 3 is in that bottom right 2x2 block.
           */
          switch (num_samples) {
+         case 2:
+            assert(brw->gen >= 8);
+            width0 = ALIGN(width0, 2) * 2;
+            height0 = ALIGN(height0, 2);
+            break;
          case 4:
             width0 = ALIGN(width0, 2) * 2;
             height0 = ALIGN(height0, 2) * 2;
@@ -320,7 +325,7 @@ intel_miptree_create_layout(struct brw_context *brw,
             height0 = ALIGN(height0, 2) * 2;
             break;
          default:
-            /* num_samples should already have been quantized to 0, 1, 4, or
+            /* num_samples should already have been quantized to 0, 1, 2, 4, or
              * 8.
              */
             assert(false);
--
1.8.4.2
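To make the guessed 2x formula easier to compare against the 4x one, here is a small standalone sketch of the size computation. The names ims_physical_size and ALIGN_POT are hypothetical, and only the 2x and 4x cases are covered because those are the two whose width and height lines both appear in the diff; like the patch itself, the 2x case is a guess rather than documented behaviour.

```c
#include <stdbool.h>

/* Round v up to a multiple of a, where a is a power of two. */
#define ALIGN_POT(v, a) (((v) + (a) - 1) & ~((a) - 1))

/* Hypothetical helper: convert a logical size to the physical size of
 * an interleaved (IMS) multisampled surface, in place.  Returns false
 * for sample counts this sketch does not cover.
 */
static bool
ims_physical_size(unsigned num_samples, unsigned *width, unsigned *height)
{
   switch (num_samples) {
   case 2:
      /* Guess from the patch: samples interleaved horizontally only. */
      *width = ALIGN_POT(*width, 2u) * 2;
      *height = ALIGN_POT(*height, 2u);
      return true;
   case 4:
      /* Each pixel expands into a 2x2 block of samples. */
      *width = ALIGN_POT(*width, 2u) * 2;
      *height = ALIGN_POT(*height, 2u) * 2;
      return true;
   default:
      return false;
   }
}
```

For a 5x3 logical surface this gives 12x4 at 2x and 12x8 at 4x, so under this guess the 2x layout doubles the width only.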
[Mesa-dev] [PATCH] i965: Use MOV, not OR for setting URB write channel enables on Gen8+.
On Broadwell, g0.5 contains the Scratch Space Pointer; using OR puts
some bits of that into ignored sections of our message header. While
this doesn't hurt, it's also not terribly /useful/.

Using MOV is sufficient to set the only interesting bits in this part
of the message header.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/gen8_vec4_generator.cpp | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen8_vec4_generator.cpp b/src/mesa/drivers/dri/i965/gen8_vec4_generator.cpp
index d0f574a..7ed5d2a 100644
--- a/src/mesa/drivers/dri/i965/gen8_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/gen8_vec4_generator.cpp
@@ -173,11 +173,8 @@ gen8_vec4_generator::generate_urb_write(vec4_instruction *ir, bool vs)
    if (!(ir->urb_write_flags & BRW_URB_WRITE_USE_CHANNEL_MASKS)) {
       /* Enable Channel Masks in the URB_WRITE_OWORD message header */
       default_state.access_mode = BRW_ALIGN_1;
-      inst = OR(retype(brw_vec1_grf(GEN7_MRF_HACK_START + ir->base_mrf, 5),
-                       BRW_REGISTER_TYPE_UD),
-                retype(brw_vec1_grf(0, 5), BRW_REGISTER_TYPE_UD),
-                brw_imm_ud(0xff00));
-      gen8_set_mask_control(inst, BRW_MASK_DISABLE);
+      MOV_RAW(brw_vec1_grf(GEN7_MRF_HACK_START + ir->base_mrf, 5),
+              brw_imm_ud(0xff00));
       default_state.access_mode = BRW_ALIGN_16;
    }
 
--
1.8.4.2
[Mesa-dev] [PATCH] mesa: add missing DebugMessageControl types
Signed-off-by: Timothy Arceri <t_arc...@yahoo.com.au>
---
 src/mesa/main/errors.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/mesa/main/errors.c b/src/mesa/main/errors.c
index 5f4eac6..c00c796 100644
--- a/src/mesa/main/errors.c
+++ b/src/mesa/main/errors.c
@@ -575,6 +575,11 @@ validate_params(struct gl_context *ctx, unsigned caller,
       /* this value is only valid for GL_KHR_debug functions */
       if (caller == CONTROL || caller == INSERT)
          break;
+   case GL_DEBUG_TYPE_PUSH_GROUP:
+   case GL_DEBUG_TYPE_POP_GROUP:
+      /* this value is only valid for GL_KHR_debug */
+      if (caller == CONTROL)
+         break;
    case GL_DONT_CARE:
       if (caller == CONTROL || caller == CONTROL_ARB)
          break;
--
1.8.5.3
[Mesa-dev] [Bug 75212] New: Mesa selects wrong DRI driver
https://bugs.freedesktop.org/show_bug.cgi?id=75212

          Priority: medium
            Bug ID: 75212
                CC: e...@anholt.net, lem...@gmail.com
          Assignee: mesa-dev@lists.freedesktop.org
           Summary: Mesa selects wrong DRI driver
          Severity: normal
    Classification: Unclassified
                OS: Linux (All)
          Reporter: eero.t.tammi...@intel.com
          Hardware: x86 (IA32)
            Status: NEW
           Version: git
         Component: GLX
           Product: Mesa

Test environment:
- HSW GT3e
- Up to date Ubuntu 13.10
- Latest versions of libdrm and mesa built (from today's git master),
  with only i965 driver enabled

Steps to reproduce:
1. Run glxgears with latest mesa

Expected output:
- i965 driver loaded and HW acceleration used

Actual output:
- Mesa complains that it doesn't find i915 nor swrast
- glxgears runs really slowly

Bisecting Mesa identified this commit as culprit:
--
commit 7bd95ec437a5b1052fa17780a9d66677ec1fdc35
Author: Eric Anholt <e...@anholt.net>
Date:   Thu Jan 23 10:21:09 2014 -0800

    dri2: Trust our own driver name lookup over the server's.

    This allows Mesa to choose to rename driver .sos (or split drivers),
    without needing a flag day with the corresponding 2D driver.

    v2: Undo the loader-only-for-dri3 change.
--

If Mesa needs some new dependency or specific version of that to
identify on which HW it's running, it should check for that in
configure.

--
You are receiving this mail because:
You are the assignee for the bug.
[Mesa-dev] [Bug 75212] Mesa selects wrong DRI driver
https://bugs.freedesktop.org/show_bug.cgi?id=75212

--- Comment #1 from Emil Velikov emil.l.veli...@gmail.com ---
There are a couple of solutions for this [1] [2]. I would prefer the latter, as I've never been a fan of black/white listing.

Eric, are you leaning towards either solution? Can we get an ack, or your thoughts if you are not keen on either one?

Thanks

[1] http://lists.freedesktop.org/archives/mesa-dev/2014-January/052888.html
[2] http://lists.freedesktop.org/archives/mesa-dev/2014-January/052981.html
[Mesa-dev] [Bug 75098] OpenGL ES2 with fbdev - link error
https://bugs.freedesktop.org/show_bug.cgi?id=75098

--- Comment #4 from Christian Prochaska christian.procha...@genode-labs.com ---
(In reply to comment #3)
> Created attachment 94319 [details] [review]
> configure: use shared-glapi when more than one gl* API is used
>
> Hmm forcing shared-glapi whenever more than one gl* api is used seems like
> the only sensible thing to do imho.
> This patch fixes the problem by convering the dri dependency gl*.
>
> Feel free to give the patch a try.

This patch works for me. Thanks.
[Mesa-dev] [PATCH] st/omx/enc: add multi scaling buffers for performance improvement
From: Leo Liu leo@amd.com

Signed-off-by: Leo Liu leo@amd.com
---
 src/gallium/state_trackers/omx/vid_enc.c | 38
 src/gallium/state_trackers/omx/vid_enc.h |  7
 2 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/src/gallium/state_trackers/omx/vid_enc.c b/src/gallium/state_trackers/omx/vid_enc.c
index 6e65274..3f1d01c 100644
--- a/src/gallium/state_trackers/omx/vid_enc.c
+++ b/src/gallium/state_trackers/omx/vid_enc.c
@@ -273,8 +273,9 @@ static OMX_ERRORTYPE vid_enc_Destructor(OMX_COMPONENTTYPE *comp)
       vl_compositor_cleanup_state(&priv->cstate);
       vl_compositor_cleanup(&priv->compositor);
 
-      if (priv->scale_buffer)
-         priv->scale_buffer->destroy(priv->scale_buffer);
+      for (i = 0; i < OMX_VID_ENC_NUM_SCALING_BUFFERS; ++i)
+         if (priv->scale_buffer[i])
+            priv->scale_buffer[i]->destroy(priv->scale_buffer[i]);
 
       if (priv->s_pipe)
          priv->s_pipe->destroy(priv->s_pipe);
@@ -447,7 +448,8 @@ static OMX_ERRORTYPE vid_enc_SetConfig(OMX_HANDLETYPE handle, OMX_INDEXTYPE idx,
    OMX_COMPONENTTYPE *comp = handle;
    vid_enc_PrivateType *priv = comp->pComponentPrivate;
    OMX_ERRORTYPE r;
-
+   int i;
+
    if (!config)
       return OMX_ErrorBadParameter;
 
@@ -473,9 +475,11 @@ static OMX_ERRORTYPE vid_enc_SetConfig(OMX_HANDLETYPE handle, OMX_INDEXTYPE idx,
       if (scale->xWidth < 176 || scale->xHeight < 144)
          return OMX_ErrorBadParameter;
 
-      if (priv->scale_buffer) {
-         priv->scale_buffer->destroy(priv->scale_buffer);
-         priv->scale_buffer = NULL;
+      for (i = 0; i < OMX_VID_ENC_NUM_SCALING_BUFFERS; ++i) {
+         if (priv->scale_buffer[i]) {
+            priv->scale_buffer[i]->destroy(priv->scale_buffer[i]);
+            priv->scale_buffer[i] = NULL;
+         }
       }
 
       priv->scale = *scale;
@@ -487,9 +491,11 @@ static OMX_ERRORTYPE vid_enc_SetConfig(OMX_HANDLETYPE handle, OMX_INDEXTYPE idx,
          templat.width = priv->scale.xWidth;
          templat.height = priv->scale.xHeight;
          templat.interlaced = false;
-         priv->scale_buffer = priv->s_pipe->create_video_buffer(priv->s_pipe, &templat);
-         if (!priv->scale_buffer)
-            return OMX_ErrorInsufficientResources;
+         for (i = 0; i < OMX_VID_ENC_NUM_SCALING_BUFFERS; ++i) {
+            priv->scale_buffer[i] = priv->s_pipe->create_video_buffer(priv->s_pipe, &templat);
+            if (!priv->scale_buffer[i])
+               return OMX_ErrorInsufficientResources;
+         }
       }
 
       break;
@@ -545,8 +551,10 @@ static OMX_ERRORTYPE vid_enc_MessageHandler(OMX_COMPONENTTYPE* comp, internalReq
          templat.profile = PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE;
          templat.entrypoint = PIPE_VIDEO_ENTRYPOINT_ENCODE;
          templat.chroma_format = PIPE_VIDEO_CHROMA_FORMAT_420;
-         templat.width = priv->scale_buffer ? priv->scale.xWidth : port->sPortParam.format.video.nFrameWidth;
-         templat.height = priv->scale_buffer ? priv->scale.xHeight : port->sPortParam.format.video.nFrameHeight;
+         templat.width = priv->scale_buffer[priv->current_scale_buffer] ?
+            priv->scale.xWidth : port->sPortParam.format.video.nFrameWidth;
+         templat.height = priv->scale_buffer[priv->current_scale_buffer] ?
+            priv->scale.xHeight : port->sPortParam.format.video.nFrameHeight;
          templat.max_references = 1;
          priv->codec = priv->s_pipe->create_video_codec(priv->s_pipe, &templat);
 
@@ -736,7 +744,7 @@ static OMX_ERRORTYPE vid_enc_EncodeFrame(omx_base_PortType *port, OMX_BUFFERHEAD
 
    /* -- scale input image - */
 
-   if (priv->scale_buffer) {
+   if (priv->scale_buffer[priv->current_scale_buffer]) {
       struct vl_compositor *compositor = &priv->compositor;
       struct vl_compositor_state *s = &priv->cstate;
       struct pipe_sampler_view **views;
@@ -744,7 +752,8 @@ static OMX_ERRORTYPE vid_enc_EncodeFrame(omx_base_PortType *port, OMX_BUFFERHEAD
       unsigned i;
 
       views = vbuf->get_sampler_view_planes(vbuf);
-      dst_surface = priv->scale_buffer->get_surfaces(priv->scale_buffer);
+      dst_surface = priv->scale_buffer[priv->current_scale_buffer]->get_surfaces
+                    (priv->scale_buffer[priv->current_scale_buffer]);
       vl_compositor_clear_layers(s);
 
       for (i = 0; i < VL_MAX_SURFACES; ++i) {
@@ -768,7 +777,8 @@ static OMX_ERRORTYPE vid_enc_EncodeFrame(omx_base_PortType *port, OMX_BUFFERHEAD
       }
 
       size = priv->scale.xWidth * priv->scale.xHeight * 2;
-      vbuf = priv->scale_buffer;
+      vbuf = priv->scale_buffer[priv->current_scale_buffer++];
+      priv->current_scale_buffer %= OMX_VID_ENC_NUM_SCALING_BUFFERS;
    }
 
    priv->s_pipe->flush(priv->s_pipe, NULL, 0);
diff --git a/src/gallium/state_trackers/omx/vid_enc.h b/src/gallium/state_trackers/omx/vid_enc.h
index 431ca91..a3fdfae 100644
--- a/src/gallium/state_trackers/omx/vid_enc.h
+++
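The core of the change is a round-robin over a small pool of scaling buffers, so that the buffer handed to the encoder for one frame is not the scaling target of the next frame. A minimal sketch of that rotation, outside of OMX, might look like the following. `NUM_SCALING_BUFFERS`, `struct scaler` and `next_scale_buffer()` are hypothetical names, and the pool size of 4 is an assumption since the value of OMX_VID_ENC_NUM_SCALING_BUFFERS is not visible in the quoted part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed pool size; the real OMX_VID_ENC_NUM_SCALING_BUFFERS value is
 * defined in vid_enc.h, which is cut off in the posted diff.
 */
#define NUM_SCALING_BUFFERS 4

struct scaler {
   void *scale_buffer[NUM_SCALING_BUFFERS];
   unsigned current_scale_buffer;
};

/* Hand out the current buffer and advance the index, wrapping around at
 * the end of the pool (same post-increment plus modulo as in the patch).
 */
static void *
next_scale_buffer(struct scaler *s)
{
   void *buf = s->scale_buffer[s->current_scale_buffer++];
   s->current_scale_buffer %= NUM_SCALING_BUFFERS;
   return buf;
}
```

With a pool of N buffers, a given buffer is only reused after N - 1 other frames have been scaled, which is presumably where the performance improvement in the subject line comes from.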
[Mesa-dev] [PATCH] st/omx: fix prevFrameNumOffset handling
From: Christian König christian.koe...@amd.com

Signed-off-by: Christian König christian.koe...@amd.com
---
 src/gallium/state_trackers/omx/vid_dec_h264.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/state_trackers/omx/vid_dec_h264.c b/src/gallium/state_trackers/omx/vid_dec_h264.c
index 5f4a261..7f1c2fa 100644
--- a/src/gallium/state_trackers/omx/vid_dec_h264.c
+++ b/src/gallium/state_trackers/omx/vid_dec_h264.c
@@ -765,6 +765,8 @@ static void slice_header(vid_dec_PrivateType *priv, struct vl_rbsp *rbsp,
          else
             FrameNumOffset = priv->codec_data.h264.prevFrameNumOffset;
 
+         priv->codec_data.h264.prevFrameNumOffset = FrameNumOffset;
+
          if (sps->num_ref_frames_in_pic_order_cnt_cycle != 0)
             absFrameNum = FrameNumOffset + frame_num;
          else
@@ -814,6 +816,8 @@ static void slice_header(vid_dec_PrivateType *priv, struct vl_rbsp *rbsp,
          else
             FrameNumOffset = priv->codec_data.h264.prevFrameNumOffset;
 
+         priv->codec_data.h264.prevFrameNumOffset = FrameNumOffset;
+
          if (IdrPicFlag)
             tempPicOrderCnt = 0;
          else if (nal_ref_idc == 0)
-- 
1.8.3.2
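To see why the missing store matters, here is a simplified sketch of the FrameNumOffset derivation as I understand it from the H.264 picture order count rules: the offset grows by MaxFrameNum every time frame_num wraps, so the previous offset has to be carried from picture to picture. `struct h264_state` and `frame_num_offset()` are illustrative names, and the sketch updates prevFrameNum in the same helper for brevity, which is not necessarily where the state tracker does it.

```c
#include <assert.h>
#include <stdbool.h>

struct h264_state {
   unsigned prevFrameNum;
   unsigned prevFrameNumOffset;
};

/* Derive FrameNumOffset for the current picture and remember it for the
 * next one.  Without the store at the end, every wrap of frame_num would
 * be computed from a stale prevFrameNumOffset.
 */
static unsigned
frame_num_offset(struct h264_state *st, bool IdrPicFlag,
                 unsigned frame_num, unsigned MaxFrameNum)
{
   unsigned FrameNumOffset;

   if (IdrPicFlag)
      FrameNumOffset = 0;
   else if (st->prevFrameNum > frame_num)
      FrameNumOffset = st->prevFrameNumOffset + MaxFrameNum;
   else
      FrameNumOffset = st->prevFrameNumOffset;

   /* The step the patch adds: carry the offset forward. */
   st->prevFrameNumOffset = FrameNumOffset;
   st->prevFrameNum = frame_num;
   return FrameNumOffset;
}
```

With MaxFrameNum = 16, the second wrap of frame_num should yield an offset of 32. If the offset is never stored, it stays stuck at 16.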
[Mesa-dev] [PATCH] build: Fix FTBFS bug introduced by ee55500c22
The referenced commit set the with_dri_drivers variable to yes in the auto case, which is an unknown classic DRI driver and leads to a FTBFS.

CC: Emil Velikov emil.l.veli...@gmail.com
Signed-off-by: Kai Wasserbäch k...@dev.carbon-project.org
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 8390d27..ad00d93 100644
--- a/configure.ac
+++ b/configure.ac
@@ -955,7 +955,7 @@ no) ;;
 auto)
     # classic DRI drivers
     if test x$enable_opengl = xyes; then
-with_dri_drivers=yes
+with_dri_drivers=swrast,i915,i965,radeon,r200,nouveau
     fi
     ;;
 esac
-- 
1.8.5.3
Re: [Mesa-dev] [PATCH-RFC] i965: do not advertise MESA_FORMAT_Z_UNORM16 support
Chia-I Wu olva...@gmail.com writes:

> Since 73bc6061f5c3b6a3bb7a8114bb2e1ab77d23cfdb, Z16 support is not
> advertised for OpenGL ES contexts due to the terrible performance. It
> is still enabled for desktop GL because it was believed GL 3.0+ requires
> Z16.
>
> It turns out only GL 3.0 requires Z16, and that is corrected in later GL
> versions. In light of that, and per Ian's suggestion, stop advertising
> Z16 support by default, and add a drirc option, gl30_sized_format_rules,
> so that users can override.

I don't think having the driconf option will help anybody. Let's just unconditionally drop Z16.
Re: [Mesa-dev] [Mesa-stable] [PATCH 07/19] svga: update shader code for GBS
This patch didn't apply to the 10.1 branch. I've picked most things to the 10.1 branch except this series. Could you put a branch up somewhere and send me a pull request? I'm sure you'd like to have these in the release, and I don't want to mess them up. :)

2e0c90847f16a9cf2a40436beacb65c65535fa4a svga: split / update svga3d header files
024711385ec5333976b124d33a030c30f1345ed1 svga: update dumping code with new GBS commands, etc
d993ada50cf2f112bfff2bd7fbb5a6c25ca00306 svga: update svga_winsys interface for GBS
823fbfdca7165ac11eab2a7e168960f5874ebdc3 svga: add new GBS commands
31dfefc47f9f12c49fd3cfb27ba4fe384cb60380 svga: add svga_have_gb_objects/dma() functions
2f1fc8db108eb771414aa5440d4c439f63f4e7c1 svga: update constant buffer code for GBS
f84c830b144fd4d53f862fc6ad05541e5bf60a3b svga: update shader code for GBS
c1e60a61e8ca3bdac0530ad1aeb3c751f273b73d svga: add helpers for tracking rendering to textures
d0c22a6d53a9cce2d40006f3d4d7dd7e2f63aca9 svga: track which textures are rendered to
f8bbd8261d297be11f1f2eaf768c2a8ace0cb69d svga: adjust adjustment for point coordinates
6476bcbc5005b76e1494a201f92f3c76bd8e9727 svga: remove a couple unneeded assertions
e0a6fb09bdfde40253b924b6c9d1fdf3f16fed21 svga: add new helper functions for GBS buffers
72b0e959fc38cf4f01d8aaeabe7336cc88588f90 svga: update buffer code for GBS
3d1fd6df5315cfa4b9c8b1332f5078a89abc3ed8 svga: update texture code for GBS
c9e9b1862b472b2671b8d3b339f9f7624a272073 pipebuffer, winsys: Add a size match parameter to the cached buffer manager
8af358d8bc9f7563cd76313b16d7b149197a4b2c gallium/pipebuffer: Add a cache buffer manager bypass mask
59e7c596215155b556ba8cf06233b621b88f49c6 gallium/util: Add flush/map debug utility code
fe6a854477c2ed30c37c200668a4dc86512120f7 svga/winsys: implement GBS support
141e39a8936a7b19fd857a35ea2d200daf1777c7 svga/winsys: Propagate surface shared information to the winsys
e4a5a9fd2fdd5b5ae8b85ac743a228f409a21a70 gallium/pipebuffer: change pb_cache_manager_create() size_factor to float

On 02/13/2014 05:20 PM, Brian Paul wrote:
> Reviewed-by: Thomas Hellstrom thellst...@vmware.com
> Cc: 10.1 mesa-sta...@lists.freedesktop.org
> ---
>  src/gallium/drivers/svga/svga_context.c  |  4
>  src/gallium/drivers/svga/svga_context.h  |  2
>  src/gallium/drivers/svga/svga_draw.c     | 14
>  src/gallium/drivers/svga/svga_shader.c   | 21
>  src/gallium/drivers/svga/svga_state.h    |  4
>  src/gallium/drivers/svga/svga_state_fs.c | 58
>  src/gallium/drivers/svga/svga_state_vs.c | 56
>  src/gallium/drivers/svga/svga_tgsi.h     |  3
>  8 files changed, 142 insertions(+), 20 deletions(-)
>
> diff --git a/src/gallium/drivers/svga/svga_context.c b/src/gallium/drivers/svga/svga_context.c
> index de769ca..4da9a65 100644
> --- a/src/gallium/drivers/svga/svga_context.c
> +++ b/src/gallium/drivers/svga/svga_context.c
> @@ -197,6 +197,10 @@ void svga_context_flush( struct svga_context *svga,
>      */
>     svga->rebind.rendertargets = TRUE;
>     svga->rebind.texture_samplers = TRUE;
> +   if (svga_have_gb_objects(svga)) {
> +      svga->rebind.vs = TRUE;
> +      svga->rebind.fs = TRUE;
> +   }
>  
>     if (SVGA_DEBUG & DEBUG_SYNC) {
>        if (fence)
> diff --git a/src/gallium/drivers/svga/svga_context.h b/src/gallium/drivers/svga/svga_context.h
> index 71a8eea..0daab0b 100644
> --- a/src/gallium/drivers/svga/svga_context.h
> +++ b/src/gallium/drivers/svga/svga_context.h
> @@ -374,6 +374,8 @@ struct svga_context
>     struct {
>        unsigned rendertargets:1;
>        unsigned texture_samplers:1;
> +      unsigned vs:1;
> +      unsigned fs:1;
>     } rebind;
>  
>     struct svga_hwtnl *hwtnl;
> diff --git a/src/gallium/drivers/svga/svga_draw.c b/src/gallium/drivers/svga/svga_draw.c
> index 80dbc35..fa0cac4 100644
> --- a/src/gallium/drivers/svga/svga_draw.c
> +++ b/src/gallium/drivers/svga/svga_draw.c
> @@ -213,6 +213,20 @@ svga_hwtnl_flush(struct svga_hwtnl *hwtnl)
>        }
>     }
>  
> +   if (svga->rebind.vs) {
> +      ret = svga_reemit_vs_bindings(svga);
> +      if (ret != PIPE_OK) {
> +         return ret;
> +      }
> +   }
> +
> +   if (svga->rebind.fs) {
> +      ret = svga_reemit_fs_bindings(svga);
> +      if (ret != PIPE_OK) {
> +         return ret;
> +      }
> +   }
> +
>     SVGA_DBG(DEBUG_DMA, "draw to sid %p, %d prims\n",
>              svga->curr.framebuffer.cbufs[0] ?
>              svga_surface(svga->curr.framebuffer.cbufs[0])->handle : NULL,
> diff --git a/src/gallium/drivers/svga/svga_shader.c b/src/gallium/drivers/svga/svga_shader.c
> index 88877b2..6b6b441 100644
> --- a/src/gallium/drivers/svga/svga_shader.c
> +++ b/src/gallium/drivers/svga/svga_shader.c
> @@ -43,7 +43,17 @@ svga_define_shader(struct svga_context *svga,
>  {
>     unsigned codeLen = variant->nr_tokens * sizeof(variant->tokens[0]);
>  
> -   {
> +   if (svga_have_gb_objects(svga)) {
> +      struct svga_winsys_screen *sws =
Re: [Mesa-dev] [PATCH 1/3] glcpp: Only warn for macro names containing __
I'm hoping that Tapani or Darius will verify that this patch actually fixes the problem. That's why people CC other people on patches. :)

On 02/18/2014 10:19 AM, Ian Romanick wrote:
> From: Ian Romanick ian.d.roman...@intel.com
>
> Section 3.3 (Preprocessor) of the GLSL 1.30 spec (and later) and the
> GLSL ES spec (all versions) say:
>
>     "All macro names containing two consecutive underscores ( __ )
>     are reserved for future use as predefined macro names. All
>     macro names prefixed with "GL_" ("GL" followed by a single
>     underscore) are also reserved."
>
> The intention is that names containing __ are reserved for internal use
> by the implementation, and names prefixed with GL_ are reserved for use
> by Khronos. Since every extension adds a name prefixed with GL_ (i.e.,
> the name of the extension), that should be an error. Names simply
> containing __ are dangerous to use, but should be allowed. In similar
> cases, the C++ preprocessor specification says, "no diagnostic is
> required."
>
> Per the Khronos bug mentioned below, a future version of the GLSL
> specification will clarify this.
>
> Signed-off-by: Ian Romanick ian.d.roman...@intel.com
> Cc: 9.2 10.0 10.1 mesa-sta...@lists.freedesktop.org
> Cc: Tapani Pälli lem...@gmail.com
> Cc: Kenneth Graunke kenn...@whitecape.org
> Cc: Darius Spitznagel d.spitzna...@goodbytez.de
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
> Bugzilla: Khronos #11702
> ---
>  src/glsl/glcpp/glcpp-parse.y                       | 22 +++--- 
>  .../tests/086-reserved-macro-names.c.expected      |  4 ++--
>  2 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y
> index 5bb2891..bdc598f 100644
> --- a/src/glsl/glcpp/glcpp-parse.y
> +++ b/src/glsl/glcpp/glcpp-parse.y
> @@ -1770,11 +1770,27 @@ static void
>  _check_for_reserved_macro_name (glcpp_parser_t *parser, YYLTYPE *loc,
>                                  const char *identifier)
>  {
> -        /* According to the GLSL specification, macro names starting with "__"
> -         * or "GL_" are reserved for future use.  So, don't allow them.
> +        /* Section 3.3 (Preprocessor) of the GLSL 1.30 spec (and later) and
> +         * the GLSL ES spec (all versions) say:
> +         *
> +         *     "All macro names containing two consecutive underscores ( __ )
> +         *     are reserved for future use as predefined macro names. All
> +         *     macro names prefixed with "GL_" ("GL" followed by a single
> +         *     underscore) are also reserved."
> +         *
> +         * The intention is that names containing __ are reserved for internal
> +         * use by the implementation, and names prefixed with GL_ are reserved
> +         * for use by Khronos.  Since every extension adds a name prefixed
> +         * with GL_ (i.e., the name of the extension), that should be an
> +         * error.  Names simply containing __ are dangerous to use, but should
> +         * be allowed.
> +         *
> +         * A future version of the GLSL specification will clarify this.
>           */
>          if (strstr(identifier, "__")) {
> -                glcpp_error (loc, parser, "Macro names containing \"__\" are reserved.\n");
> +                glcpp_warning(loc, parser,
> +                              "Macro names containing \"__\" are reserved "
> +                              "for use by the implementation.\n");
>          }
>          if (strncmp(identifier, "GL_", 3) == 0) {
>                  glcpp_error (loc, parser, "Macro names starting with \"GL_\" are reserved.\n");
> diff --git a/src/glsl/glcpp/tests/086-reserved-macro-names.c.expected b/src/glsl/glcpp/tests/086-reserved-macro-names.c.expected
> index d8aa9f0..5ca42a9 100644
> --- a/src/glsl/glcpp/tests/086-reserved-macro-names.c.expected
> +++ b/src/glsl/glcpp/tests/086-reserved-macro-names.c.expected
> @@ -1,8 +1,8 @@
> -0:1(10): preprocessor error: Macro names containing "__" are reserved.
> +0:1(10): preprocessor warning: Macro names containing "__" are reserved for use by the implementation.
>  0:2(9): preprocessor error: Macro names starting with "GL_" are reserved.
> -0:3(9): preprocessor error: Macro names containing "__" are reserved.
> +0:3(9): preprocessor warning: Macro names containing "__" are reserved for use by the implementation.
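The resulting policy can be tried outside of glcpp with a few lines of C. This is only a sketch of the two string checks from the patch; `enum diag` and `classify_macro_name()` are invented for the example, whereas the real code reports through glcpp_warning() and glcpp_error() rather than returning a value.

```c
#include <assert.h>
#include <string.h>

enum diag { DIAG_NONE, DIAG_WARNING, DIAG_ERROR };

/* Classify a macro name the way the patched check does: a "__" anywhere
 * in the name only warrants a warning, while a "GL_" prefix is an error.
 * When both apply, the error wins.
 */
static enum diag
classify_macro_name(const char *identifier)
{
   enum diag result = DIAG_NONE;

   if (strstr(identifier, "__"))
      result = DIAG_WARNING;

   if (strncmp(identifier, "GL_", 3) == 0)
      result = DIAG_ERROR;

   return result;
}
```

Note that strncmp() only matches "GL_" at the start of the name, so something like MY_GL_THING is not flagged, while strstr() catches "__" at any position.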
Re: [Mesa-dev] [PATCH 6/8] i965: Implement HiZ resolves on Broadwell.
Kenneth Graunke kenn...@whitecape.org writes: On 02/18/2014 01:38 PM, Eric Anholt wrote: Kenneth Graunke kenn...@whitecape.org writes: [snip] diff --git a/src/mesa/drivers/dri/i965/gen8_depth_state.c b/src/mesa/drivers/dri/i965/gen8_depth_state.c index f30ff28..3fa20c8 100644 --- a/src/mesa/drivers/dri/i965/gen8_depth_state.c +++ b/src/mesa/drivers/dri/i965/gen8_depth_state.c @@ -203,3 +203,108 @@ gen8_emit_depth_stencil_hiz(struct brw_context *brw, brw-depthstencil.stencil_offset, hiz, width, height, depth, lod, min_array_element); } + +/** + * Emit packets to perform a depth/HiZ resolve or fast depth/stencil clear. + * + * See the Optimized Depth Buffer Clear and/or Stencil Buffer Clear section + * of the hardware documentation for details. + */ +void +gen8_hiz_exec(struct brw_context *brw, struct intel_mipmap_tree *mt, + unsigned int level, unsigned int layer, enum gen6_hiz_op op) +{ + if (op == GEN6_HIZ_OP_NONE) + return; + + assert(mt-first_level == 0); + + struct intel_mipmap_level *miplevel = mt-level[level]; + + /* The basic algorithm is: +* - If needed, emit 3DSTATE_{DEPTH,HIER_DEPTH,STENCIL}_BUFFER and +* 3DSTATE_CLEAR_PARAMS packets to set up the relevant buffers. +* - If needed, emit 3DSTATE_DRAWING_RECTANGLE. +* - Emit 3DSTATE_WM_HZ_OP with a bit set for the particular operation. +* - Do a special PIPE_CONTROL to trigger an implicit rectangle primitive. +* - Emit 3DSTATE_WM_HZ_OP with no bits set to return to normal rendering. +*/ + emit_depth_packets(brw, mt, + brw_depth_format(brw, mt-format), + BRW_SURFACE_2D, + true, /* depth writes */ + NULL, false, 0, /* no stencil for now */ + true, /* hiz */ + mt-logical_width0, + mt-logical_height0, + MAX2(mt-logical_depth0, 1), Is logical_depth0 ever 0? That seems like a bug. No, I guess it isn't. It looks like I copy and pasted this from BLORP, or was being overly cautious for some reason. Dropped. 
+ level, + layer); /* min_array_element */ + + BEGIN_BATCH(4); + OUT_BATCH(_3DSTATE_DRAWING_RECTANGLE 16 | (4 - 2)); + OUT_BATCH(0); + OUT_BATCH(((mt-logical_width0 - 1) 0x) | + ((mt-logical_height0 - 1) 16)); + OUT_BATCH(0); + ADVANCE_BATCH(); The drawing rectangle should be using the level's size, not the level 0 size. Yes, this makes sense...we bind a specific miplevel of the depth buffer, so presumably the (0, 0) origin is the start of that miplevel, not the start of the whole tree. I'll change that. Since the drawing rectangle is just the bounds of where you can draw, and not actually the clear/resolve rectangle, I think specifying one that's too large shouldn't be harmful. But specifying the right value is trivial, so I agree we should do it. + uint32_t sample_mask = 0x; + if (mt-num_samples 0) { + dw1 |= SET_FIELD(ffs(mt-num_samples) - 1, GEN8_WM_HZ_NUM_SAMPLES); + sample_mask = gen6_determine_sample_mask(brw); + } I don't think we want the user-set sample mask stuff to change the samples affected by our hiz/depth resolves. I think you can just drop the if block. Good point, whatever the user specified is probably unrelated to our values. I've dropped the sample_mask variable and just stuffed 0x in the packet. I kept the if-block for the dw1 |= ...num_samples... line. + + BEGIN_BATCH(5); + OUT_BATCH(_3DSTATE_WM_HZ_OP 16 | (5 - 2)); + OUT_BATCH(dw1); + OUT_BATCH(0); + OUT_BATCH(SET_FIELD(miplevel-width, GEN8_WM_HZ_CLEAR_RECTANGLE_X_MAX) | + SET_FIELD(miplevel-height, GEN8_WM_HZ_CLEAR_RECTANGLE_Y_MAX)); + OUT_BATCH(SET_FIELD(sample_mask, GEN8_WM_HZ_SAMPLE_MASK)); + ADVANCE_BATCH(); I think now the miplevel-width should be minify(mt-logical_width0, level). Hope that helped Yes, that's much nicer - and correct for MSAA buffers! 
> I'm unclear whether we need to do:
>
>     ALIGN(minify(mt->logical_width0, level), 8)
>     ALIGN(minify(mt->logical_height0, level), 4)
>
> (both here and in the drawing rectangle)
>
> I've read seemingly contradictory information...it sounds like it might
> be necessary for depth resolves, but not otherwise...but I could be
> misinterpreting it. It seems to be working...

Yeah, I'm not clear on how this ought to work for resolves. For clears, the strategy explained for the previous gens made sense: Round down your coords to get 8x4 alignment, then do slow clears on the remaining strips (if any). But for a resolve, what else do you do?
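To put numbers on the question, here is a small sketch of the arithmetic being discussed. It assumes minify() halves the size once per level and clamps at 1, and that ALIGN() rounds up to a power-of-two multiple. The helpers are reimplemented under different names (`minify_dim`, `align_pot`, `round_down_pot`) so they are not mistaken for the Mesa macros.

```c
#include <assert.h>

/* Size of a mip level: halve once per level, never going below 1. */
static unsigned
minify_dim(unsigned value, unsigned levels)
{
   unsigned v = value >> levels;
   return v > 1 ? v : 1;
}

/* Round 'value' up to a multiple of 'alignment' (power of two only). */
static unsigned
align_pot(unsigned value, unsigned alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Round 'value' down to a multiple of 'alignment' (power of two only),
 * as in the "fast clear the aligned part, slow clear the rest" strategy.
 */
static unsigned
round_down_pot(unsigned value, unsigned alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return value & ~(alignment - 1);
}
```

For a 100x100 level-0 surface, level 1 is 50x50. Rounding up to 8x4 gives a 56x52 rectangle that extends past the level, while rounding down gives 48x48 and leaves a 2-pixel strip on each axis. That strip is what a clear can handle slowly, and it is exactly the part that has no obvious equivalent for a resolve.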
Re: [Mesa-dev] [PATCH 13/23] i965/fs: Take into account reg_offset consistently for MRF regs.
Paul Berry stereotype...@gmail.com writes:

> On 15 January 2014 14:01, Francisco Jerez curroje...@riseup.net wrote:
>
>> Paul Berry stereotype...@gmail.com writes:
>>
>>> On 2 December 2013 11:31, Francisco Jerez curroje...@riseup.net wrote:
>>>
>>>> Until now it was only being taken into account in the VEC4 back-end
>>>> but not in the FS back-end.  Do it in both cases.
>>>> ---
>>>>  src/mesa/drivers/dri/i965/brw_fs.h             |  2 +-
>>>>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 10
>>>>  src/mesa/drivers/dri/i965/brw_shader.h         |  7
>>>>  3 files changed, 11 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
>>>> index 2c36d9f..f918f7e 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_fs.h
>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
>>>> @@ -615,4 +615,4 @@ bool brw_do_channel_expressions(struct exec_list *instructions);
>>>>  bool brw_do_vector_splitting(struct exec_list *instructions);
>>>>  bool brw_fs_precompile(struct gl_context *ctx, struct gl_shader_program *prog);
>>>>  
>>>> -struct brw_reg brw_reg_from_fs_reg(fs_reg *reg);
>>>> +struct brw_reg brw_reg_from_fs_reg(fs_reg *reg, unsigned dispatch_width);
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>>> index 8d310a1..1de59eb 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>>> @@ -981,8 +981,9 @@ static uint32_t brw_file_from_reg(fs_reg *reg)
>>>>  }
>>>>  
>>>>  struct brw_reg
>>>> -brw_reg_from_fs_reg(fs_reg *reg)
>>>> +brw_reg_from_fs_reg(fs_reg *reg, unsigned dispatch_width)
>>>>  {
>>>> +   const int reg_size = 4 * dispatch_width;
>>>
>>> What happens when reg.type is UW and dispatch_width is 16? In that
>>> case, we would compute reg_size == 64, but the correct value seems like
>>> it's actually 32 in this case. Are we perhaps relying on reg.type being
>>> a 32-bit type? If so, maybe we should add an assertion:
>>>
>>>     assert(type_sz(reg.type) == 4);
>>
>> Nope, reg_size is supposed to be the size in bytes of a ::reg_offset
>> unit, i.e. one hardware register in SIMD8 mode and two hardware
>> registers in SIMD16 mode as the comment at the definition of
>> ::reg_offset explains. The fixed factor of four is intentional and
>> correct no matter what the register type is.
>>
>> Thanks.
>
> Ok, I see. It appears that both this function *and* the comment above
> reg_offset are assuming that the data type is 32 bit. The comment above
> reg_offset says:
>
>     * For pre-register-allocation GRFs and MRFs, this is in units of a
>     * float per pixel (1 hardware register for SIMD8 or SIMD4x2 mode,
>     * or 2 registers for SIMD16 mode). For uniforms, this is in units
>     * of 1 float.
>
> but when we get around to adding support for double-precision floats (a
> feature of GL 4.0), this will no longer work; for double precision types
> we'll need reg_offset to be measured in units of at least 4 hardware
> registers in SIMD16 mode to avoid overlap. Similarly, for types that are
> 16 bits, if we consider reg_offset to be measured in units of 2 hardware
> registers in SIMD16 mode, we're actually wasting registers, since all 16
> values actually fit in a single hardware register. That's not really a
> big deal right now, since we use 16-bit types so rarely in the FS
> back-end.

I see what you mean, but it seems rather problematic to me to have the unit of reg_offset depend on the register data type. E.g. bit-casting the contents of a register to a type of different size would involve non-trivial algebra on reg_offset. IMHO the ideal solution would be to settle on a fixed unit (e.g. bytes, and remove the subreg_offset field completely) and use a helper function to get the array indexing that seems to be the main use case of reg_offset (e.g. 'index(base_reg, i)' that would take into account the type size of 'base_reg' to calculate the byte offset of element 'i').

> In fact, I bet we never use a nonzero reg_offset on a 16-bit type.

Not sure if we do already, but my surface packing/unpacking code might in some situations.

> It still seems to me that it would be worth putting an assertion here to
> help alert us to what needs to change when we add double precision
> support (or if someday we have hardware that supports half float
> computation). I'm not 100% sure what the assertion could be.
> assert(type_sz(reg->type) == 4); was an optimistic guess. We might have
> to do assert(type_sz(reg->type) == 4 || reg->reg_offset == 0); in order
> to avoid tripping on the rare cases where we currently use 16-bit types
> in fragment shaders.

I think that such an assertion would break my homogeneous packing/unpacking code if it ever gets an argument with reg_offset != 0, because it casts its argument into a smaller type (either 8 or 16 bits) and then uses 'subreg_offset' to select the individual components of the packed vector.

P.S.: Sorry for the late reply.
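The two positions in this thread can be compared with a bit of arithmetic. The sketch below assumes a 32-byte hardware register and contrasts the fixed reg_offset unit (4 bytes per channel, regardless of type) with a type-aware byte offset along the lines of the suggested index(base_reg, i) helper. All function names here are hypothetical, and this is not how brw_reg_from_fs_reg() is actually written.

```c
#include <assert.h>

/* Assumed size in bytes of one hardware GRF/MRF register. */
#define HW_REG_SIZE 32

/* Size in bytes of one reg_offset unit: one 32-bit value per channel,
 * i.e. 1 hardware register in SIMD8 and 2 in SIMD16.
 */
static unsigned
reg_offset_unit_bytes(unsigned dispatch_width)
{
   return 4 * dispatch_width;
}

/* Hardware register number reached from 'base_reg' after applying
 * 'reg_offset' in the fixed unit above.
 */
static unsigned
hw_reg_from_offset(unsigned base_reg, unsigned reg_offset,
                   unsigned dispatch_width)
{
   return base_reg +
          reg_offset * reg_offset_unit_bytes(dispatch_width) / HW_REG_SIZE;
}

/* Byte offset of element 'i' when the stride follows the type size, as a
 * byte-based index() helper would compute it.
 */
static unsigned
index_byte_offset(unsigned type_size, unsigned dispatch_width, unsigned i)
{
   return i * type_size * dispatch_width;
}
```

In SIMD16 a 16-bit element needs 32 bytes (one register) and a 64-bit element needs 128 bytes (four registers), which matches the two cases Paul raises, whereas the fixed unit always steps by 64 bytes.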
Re: [Mesa-dev] [PATCH] build: Fix FTBFS bug introduced by ee55500c22
On 19/02/14 18:12, Kai Wasserbäch wrote:
> The referenced commit set the with_dri_drivers variable to yes in the
> auto case, which is an unknown classic DRI driver and leads to a FTBFS.

Thanks for the patch Kai

The issue has been reported already [1] and a slightly more appropriate patch has been suggested [2]. It will resolve a few more build cases than the one you have in mind.

Feel free to give it a try.

-Emil

[1] https://bugs.freedesktop.org/show_bug.cgi?id=75126
[2] http://patchwork.freedesktop.org/patch/20467/

> CC: Emil Velikov emil.l.veli...@gmail.com
> Signed-off-by: Kai Wasserbäch k...@dev.carbon-project.org
> ---
>  configure.ac | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/configure.ac b/configure.ac
> index 8390d27..ad00d93 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -955,7 +955,7 @@ no) ;;
>  auto)
>      # classic DRI drivers
>      if test x$enable_opengl = xyes; then
> -with_dri_drivers=yes
> +with_dri_drivers=swrast,i915,i965,radeon,r200,nouveau
>      fi
>      ;;
>  esac
Re: [Mesa-dev] [PATCH 07/13] i965: Add missing sample shading bits to Gen8's 3DSTATE_PS_EXTRA.
Kenneth Graunke kenn...@whitecape.org writes:

> v2: Also set the "oMask Present to Render Target" bit, which is required
> for shaders that write oMask. Otherwise the hardware won't expect the
> extra data.
>
> Signed-off-by: Kenneth Graunke kenn...@whitecape.org
> ---
>  src/mesa/drivers/dri/i965/gen8_ps_state.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> index e0a1c9b..e93668e 100644
> --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> @@ -22,6 +22,7 @@
>   */
>  
>  #include <stdbool.h>
> +#include "program/program.h"
>  #include "brw_state.h"
>  #include "brw_defines.h"
>  #include "intel_batchbuffer.h"
> @@ -29,6 +30,7 @@
>  static void
>  upload_ps_extra(struct brw_context *brw)
>  {
> +   struct gl_context *ctx = &brw->ctx;
>     /* BRW_NEW_FRAGMENT_PROGRAM */
>     const struct brw_fragment_program *fp =
>        brw_fragment_program_const(brw->fragment_program);
> @@ -63,6 +65,18 @@ upload_ps_extra(struct brw_context *brw)
>     if (fp->program.Base.InputsRead & VARYING_BIT_POS)
>        dw1 |= GEN8_PSX_USES_SOURCE_DEPTH | GEN8_PSX_USES_SOURCE_W;
>  
> +   /* _NEW_BUFFERS */

_mesa_get_min_invocations_per_fragment also depends on _NEW_MULTISAMPLE (for its test of Multisample.Enabled).

> +   bool multisampled_fbo = ctx->DrawBuffer->Visual.samples > 1;
> +   if (multisampled_fbo &&
> +       _mesa_get_min_invocations_per_fragment(ctx, &fp->program, false) > 1)
> +      dw1 |= GEN8_PSX_SHADER_IS_PER_SAMPLE;
> +
> +   if (fp->program.Base.SystemValuesRead & SYSTEM_BIT_SAMPLE_MASK_IN)
> +      dw1 |= GEN8_PSX_SHADER_USES_INPUT_COVERAGE_MASK;
> +
> +   if (brw->wm.prog_data->uses_omask)
> +      dw1 |= GEN8_PSX_OMASK_TO_RENDER_TARGET;
> +
>     BEGIN_BATCH(2);
>     OUT_BATCH(_3DSTATE_PS_EXTRA << 16 | (2 - 2));
>     OUT_BATCH(dw1);
> @@ -71,7 +85,7 @@ upload_ps_extra(struct brw_context *brw)
>  
>  const struct brw_tracked_state gen8_ps_extra = {
>     .dirty = {
> -      .mesa  = 0,
> +      .mesa  = _NEW_BUFFERS,
>        .brw   = BRW_NEW_CONTEXT | BRW_NEW_FRAGMENT_PROGRAM,
>        .cache = 0,
>     },
> -- 
> 1.8.4.2
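The review comment is about dirty-bit bookkeeping: a state atom is only re-uploaded when one of the flags it lists has been raised, so every piece of GL state read by the upload function has to appear in that list. A toy model of the idea is sketched below. The flag values and the `needs_reemit()` helper are made up for illustration and do not correspond to the real _NEW_* or BRW_NEW_* definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define NEW_BUFFERS     (1u << 0)
#define NEW_MULTISAMPLE (1u << 1)
#define NEW_COLOR       (1u << 2)

/* Minimal stand-in for the .dirty.mesa field of a tracked state atom. */
struct tracked_state {
   uint32_t mesa;
};

/* An atom is re-emitted only if it listens to at least one of the flags
 * that are currently dirty.
 */
static bool
needs_reemit(const struct tracked_state *atom, uint32_t dirty)
{
   return (atom->mesa & dirty) != 0;
}
```

In this model, an atom that lists only NEW_BUFFERS is skipped when just the multisample state changes, so the per-sample bit could go stale. Adding NEW_MULTISAMPLE to the mask closes that gap.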
Re: [Mesa-dev] [PATCH 09/13] i965: Only use the SIMD16 program for per-sample shading on Broadwell.
Kenneth Graunke kenn...@whitecape.org writes:

> This is a straight port from gen7_wm_state.c; I haven't looked into
> whether we can do both.
>
> v2: Actually do it right.
>
> Signed-off-by: Kenneth Graunke kenn...@whitecape.org

> @@ -205,13 +201,39 @@ upload_ps_state(struct brw_context *brw)
>     else
>        dw6 |= GEN7_PS_POSOFFSET_NONE;
>  
> -   dw7 |=
> -      brw->wm.prog_data->first_curbe_grf << GEN7_PS_DISPATCH_START_GRF_SHIFT_0 |
> -      brw->wm.prog_data->first_curbe_grf_16 << GEN7_PS_DISPATCH_START_GRF_SHIFT_2;
> +   /* In case of non 1x per sample shading, only one of SIMD8 and SIMD16
> +    * should be enabled. We do 'SIMD16 only' dispatch if a SIMD16 shader
> +    * is successfully compiled. In majority of the cases that bring us
> +    * better performance than 'SIMD8 only' dispatch.
> +    */
> +   int min_invocations_per_fragment =
> +      _mesa_get_min_invocations_per_fragment(ctx, brw->fragment_program, false);
> +   assert(min_invocations_per_fragment >= 1);

Same _NEW_MULTISAMPLE comment.
Re: [Mesa-dev] [PATCH 04/13] Hack: Disable MCS on Broadwell for now.
Kenneth Graunke kenn...@whitecape.org writes:

> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> I'm mostly sending this out as a placeholder. Ultimately, we want to get
> MCS working. I'm not sure whether it would be valuable to push this
> (with a proper commit message) in the meantime.

Seems reasonable for now. Drop a perf_debug() in here to remind us, and fix the _NEW_MULTISAMPLE comments in two later patches, and this series is:

Reviewed-by: Eric Anholt e...@anholt.net
Re: [Mesa-dev] [PATCH-RFC] i965: do not advertise MESA_FORMAT_Z_UNORM16 support
On 02/18/2014 09:48 PM, Chia-I Wu wrote:
> Since 73bc6061f5c3b6a3bb7a8114bb2e1ab77d23cfdb, Z16 support is not
> advertised for OpenGL ES contexts due to the terrible performance. It is
> still enabled for desktop GL because it was believed GL 3.0+ requires Z16.
>
> It turns out only GL 3.0 requires Z16, and that is corrected in later GL
> versions. In light of that, and per Ian's suggestion, stop advertising Z16
> support by default, and add a drirc option, gl30_sized_format_rules, so
> that users can override.

I actually don't think that GL 3.0 requires Z16, either.

In glspec30.20080923.pdf, page 180, it says:

"[...] memory allocation per texture component is assigned by the GL to
match the allocations listed in tables 3.16-3.18 as closely as possible.
[...]

Required Texture Formats
[...]
In addition, implementations are required to support the following sized
internal formats. Requesting one of these internal formats for any
texture type will allocate exactly the internal component sizes and types
shown for that format in tables 3.16-3.17:"

Notably, however, GL_DEPTH_COMPONENT16 does /not/ appear in table 3.16 or
table 3.17. It appears in table 3.18, where the "exact" rule doesn't
apply, and thus we fall back to the "closely as possible" rule.

The confusing part is that the ordering of the tables in the PDF is:

Table 3.16 (pages 182-184)
Table 3.18 (bottom of page 184 to top of 185)
Table 3.17 (page 185)

I'm guessing that people saw table 3.16, then saw the one after with
DEPTH_COMPONENT* formats, and assumed it was 3.17. But it's not.

I think we should just drop Z16 support entirely, and I think we should
remove the requirement from the Piglit test.

> This regresses required-sized-texture-formats on GL 3.0.
> Signed-off-by: Chia-I Wu <o...@lunarg.com>
> Cc: Ian Romanick <ian.d.roman...@intel.com>
> ---
>  src/mesa/drivers/dri/i965/brw_context.c         | 3 +++
>  src/mesa/drivers/dri/i965/brw_context.h         | 1 +
>  src/mesa/drivers/dri/i965/brw_surface_formats.c | 7 ++++---
>  src/mesa/drivers/dri/i965/intel_screen.c        | 4 ++++
>  4 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
> index ffbdb94..8ecf80b 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -553,6 +553,9 @@ brw_process_driconf_options(struct brw_context *brw)
>     brw->disable_derivative_optimization =
>        driQueryOptionb(&brw->optionCache, "disable_derivative_optimization");
>
> +   brw->enable_z16 =
> +      driQueryOptionb(&brw->optionCache, "gl30_sized_format_rules");
> +
>     brw->precompile = driQueryOptionb(&brw->optionCache, "shader_precompile");
>
>     ctx->Const.ForceGLSLExtensionsWarn =
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index 98e90e2..fd10884 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1093,6 +1093,7 @@ struct brw_context
>     bool disable_throttling;
>     bool precompile;
>     bool disable_derivative_optimization;
> +   bool enable_z16;
>
>     driOptionCache optionCache;
>     /** @} */
> diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> index 6a7e00a..1d5f044 100644
> --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
> +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> @@ -623,10 +623,11 @@ brw_init_surface_formats(struct brw_context *brw)
>      * increased depth stalls from a cacheline-based heuristic for detecting
>      * depth stalls.
>      *
> -    * However, desktop GL 3.0+ require that you get exactly 16 bits when
> -    * asking for DEPTH_COMPONENT16, so we have to respect that.
> +    * However, desktop GL 3.0, and no other version, requires that you get
> +    * exactly 16 bits when asking for DEPTH_COMPONENT16, so we have an drirc
> +    * option to decide whether to respect that or not.
>      */
> -   if (_mesa_is_desktop_gl(ctx))
> +   if (brw->enable_z16)
>        ctx->TextureFormatSupported[MESA_FORMAT_Z_UNORM16] = true;
>
>     /* On hardware that lacks support for ETC1, we map ETC1 to RGBX
> diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c
> index ba22971..087fc3c 100644
> --- a/src/mesa/drivers/dri/i965/intel_screen.c
> +++ b/src/mesa/drivers/dri/i965/intel_screen.c
> @@ -64,6 +64,10 @@ DRI_CONF_BEGIN
>        DRI_CONF_OPT_BEGIN_B(disable_derivative_optimization, "false")
>           DRI_CONF_DESC(en, "Derivatives with finer granularity by default")
>        DRI_CONF_OPT_END
> +
> +      DRI_CONF_OPT_BEGIN_B(gl30_sized_format_rules, "false")
> +         DRI_CONF_DESC(en, "Honor GL 3.0 specific rules for sized formats")
> +      DRI_CONF_OPT_END
>     DRI_CONF_SECTION_END
>
>     DRI_CONF_SECTION_QUALITY
Re: [Mesa-dev] [PATCH 07/15] mesa/sso: Add pipeline container/state
On Fri, Feb 7, 2014 at 10:00 PM, Ian Romanick <i...@freedesktop.org> wrote:
> From: Gregory Hainaut <gregory.hain...@gmail.com>
>
> V1:
> * Extend gl_shader_state as pipeline object state
> * Add a new container gl_pipeline_shader_state that contains
>   binding point of the previous object
> * Update mesa init/free shader state due to the extension of
>   the attribute
> * Add an init/free pipeline function for the context
>
> V2:
> * Rename gl_shader_state to gl_pipeline_object
> * Rename Pipeline.PipelineObj to Pipeline.Current
> * Formatting improvement
>
> V3 (idr):
> * Split out from previous uber patch.
> * Remove '#if 0' debug printfs.
>
> Reviewed-by: Ian Romanick <ian.d.roman...@intel.com>
> ---
>  src/mesa/main/context.c     |   3 +
>  src/mesa/main/mtypes.h      |  22 +-
>  src/mesa/main/pipelineobj.c | 161
>  src/mesa/main/pipelineobj.h |  25 +++
>  4 files changed, 209 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
> index 8421a25..fe072ab 100644
> --- a/src/mesa/main/context.c
> +++ b/src/mesa/main/context.c
> @@ -106,6 +106,7 @@
>  #include "matrix.h"
>  #include "multisample.h"
>  #include "performance_monitor.h"
> +#include "pipelineobj.h"
>  #include "pixel.h"
>  #include "pixelstore.h"
>  #include "points.h"
> @@ -814,6 +815,7 @@ init_attrib_groups(struct gl_context *ctx)
>     _mesa_init_matrix( ctx );
>     _mesa_init_multisample( ctx );
>     _mesa_init_performance_monitors( ctx );
> +   _mesa_init_pipeline( ctx );
>     _mesa_init_pixel( ctx );
>     _mesa_init_pixelstore( ctx );
>     _mesa_init_point( ctx );
> @@ -1219,6 +1221,7 @@ _mesa_free_context_data( struct gl_context *ctx )
>     _mesa_free_texture_data( ctx );
>     _mesa_free_matrix_data( ctx );
>     _mesa_free_viewport_data( ctx );
> +   _mesa_free_pipeline_data(ctx);
>     _mesa_free_program_data(ctx);
>     _mesa_free_shader_state(ctx);
>     _mesa_free_queryobj_data(ctx);
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index 52aeb15..4b8749a 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -2746,9 +2746,15 @@ struct gl_shader_program
>
>  /**
>   * Context state for GLSL vertex/fragment shaders.
> + * Extended to support pipeline object
>   */
> -struct gl_shader_state
> +struct gl_pipeline_object
>  {
> +   /** Name of the pipeline object as received from glGenProgramPipelines.
> +    * It would be 0 for shaders without separate shader objects.
> +    */
> +   GLuint Name;
> +
>     GLint RefCount;
>     _glthread_Mutex Mutex;
>
> @@ -2774,6 +2780,17 @@ struct gl_shader_state
>     GLbitfield Flags;    /**< Mask of GLSL_x flags */
>  };
>
> +/**
> + * Context state for GLSL pipeline shaders.
> + */
> +struct gl_pipeline_shader_state
> +{
> +   /** Currently bound pipeline object. See _mesa_BindProgramPipeline() */
> +   struct gl_pipeline_object *Current;
> +
> +   /** Pipeline objects */
> +   struct _mesa_HashTable *Objects;
> +};
>
>  /**
>   * Compiler options for a single GLSL shaders type
> @@ -4075,7 +4092,8 @@ struct gl_context
>     struct gl_geometry_program_state GeometryProgram;
>     struct gl_ati_fragment_shader_state ATIFragmentShader;
>
> -   struct gl_shader_state Shader; /**< GLSL shader object state */
> +   struct gl_pipeline_shader_state Pipeline; /**< GLSL pipeline shader object state */
> +   struct gl_pipeline_object Shader; /**< GLSL shader object state */
>     struct gl_shader_compiler_options ShaderCompilerOptions[MESA_SHADER_STAGES];
>
>     struct gl_query_state Query;  /**< occlusion, timer queries */
> diff --git a/src/mesa/main/pipelineobj.c b/src/mesa/main/pipelineobj.c
> index 7454619..a82e3ed 100644
> --- a/src/mesa/main/pipelineobj.c
> +++ b/src/mesa/main/pipelineobj.c
> @@ -30,6 +30,9 @@
>   * Implementation of pipeline object related API functions. Based on
>   * GL_ARB_separate_shader_objects extension.
>   *
> + * \todo
> + * Do we need to create CreatePipelineObject and DeletePipelineObject driver
> + * functions?
>   */

I don't know. Another question .. do we need this todo comment? :)

>
>  #include "main/glheader.h"
> @@ -50,6 +53,164 @@
>  #include "../glsl/glsl_parser_extras.h"
>  #include "../glsl/ir_uniform.h"
>
> +/**
> + * Delete a pipeline object.
> + */
> +void
> +_mesa_delete_pipeline_object(struct gl_context *ctx,
> +                             struct gl_pipeline_object *obj)
> +{
> +   unsigned i;
> +
> +   _mesa_reference_shader_program(ctx, &obj->_CurrentFragmentProgram, NULL);
> +
> +   for (i = 0; i < MESA_SHADER_STAGES; i++)
> +      _mesa_reference_shader_program(ctx, &obj->CurrentProgram[i], NULL);
> +
> +   _mesa_reference_shader_program(ctx, &obj->ActiveProgram, NULL);
> +   _glthread_DESTROY_MUTEX(obj->Mutex);
> +   ralloc_free(obj);
> +}
> +
> +/**
> + * Allocate and initialize a new pipeline object.
> + */
> +static struct gl_pipeline_object *
> +_mesa_new_pipeline_object(struct gl_context *ctx, GLuint name)
> +{
> +   struct gl_pipeline_object *obj = rzalloc(NULL, struct gl_pipeline_object);
> +   if (obj) {
> +      obj->Name = name;
> +
Re: [Mesa-dev] [PATCH 6/8] i965: Implement HiZ resolves on Broadwell.
On 02/19/2014 11:12 AM, Eric Anholt wrote:
> Kenneth Graunke <kenn...@whitecape.org> writes:
>> On 02/18/2014 01:38 PM, Eric Anholt wrote:
>>> Kenneth Graunke <kenn...@whitecape.org> writes:
>> [snip]
>>>> diff --git a/src/mesa/drivers/dri/i965/gen8_depth_state.c b/src/mesa/drivers/dri/i965/gen8_depth_state.c
>>>> index f30ff28..3fa20c8 100644
>>>> --- a/src/mesa/drivers/dri/i965/gen8_depth_state.c
>>>> +++ b/src/mesa/drivers/dri/i965/gen8_depth_state.c
>>>> @@ -203,3 +203,108 @@ gen8_emit_depth_stencil_hiz(struct brw_context *brw,
>>>>                        brw->depthstencil.stencil_offset,
>>>>                        hiz, width, height, depth, lod, min_array_element);
>>>>  }
>>>> +
>>>> +/**
>>>> + * Emit packets to perform a depth/HiZ resolve or fast depth/stencil clear.
>>>> + *
>>>> + * See the "Optimized Depth Buffer Clear and/or Stencil Buffer Clear" section
>>>> + * of the hardware documentation for details.
>>>> + */
>>>> +void
>>>> +gen8_hiz_exec(struct brw_context *brw, struct intel_mipmap_tree *mt,
>>>> +              unsigned int level, unsigned int layer, enum gen6_hiz_op op)
>>>> +{
>>>> +   if (op == GEN6_HIZ_OP_NONE)
>>>> +      return;
>>>> +
>>>> +   assert(mt->first_level == 0);
>>>> +
>>>> +   struct intel_mipmap_level *miplevel = &mt->level[level];
>>>> +
>>>> +   /* The basic algorithm is:
>>>> +    * - If needed, emit 3DSTATE_{DEPTH,HIER_DEPTH,STENCIL}_BUFFER and
>>>> +    *   3DSTATE_CLEAR_PARAMS packets to set up the relevant buffers.
>>>> +    * - If needed, emit 3DSTATE_DRAWING_RECTANGLE.
>>>> +    * - Emit 3DSTATE_WM_HZ_OP with a bit set for the particular operation.
>>>> +    * - Do a special PIPE_CONTROL to trigger an implicit rectangle primitive.
>>>> +    * - Emit 3DSTATE_WM_HZ_OP with no bits set to return to normal rendering.
>>>> +    */
>>>> +   emit_depth_packets(brw, mt,
>>>> +                      brw_depth_format(brw, mt->format),
>>>> +                      BRW_SURFACE_2D,
>>>> +                      true, /* depth writes */
>>>> +                      NULL, false, 0, /* no stencil for now */
>>>> +                      true, /* hiz */
>>>> +                      mt->logical_width0,
>>>> +                      mt->logical_height0,
>>>> +                      MAX2(mt->logical_depth0, 1),
>>>
>>> Is logical_depth0 ever 0? That seems like a bug.
>>
>> No, I guess it isn't. It looks like I copy and pasted this from BLORP,
>> or was being overly cautious for some reason. Dropped.
>>
>>>> +                      level,
>>>> +                      layer); /* min_array_element */
>>>> +
>>>> +   BEGIN_BATCH(4);
>>>> +   OUT_BATCH(_3DSTATE_DRAWING_RECTANGLE << 16 | (4 - 2));
>>>> +   OUT_BATCH(0);
>>>> +   OUT_BATCH(((mt->logical_width0 - 1) & 0xffff) |
>>>> +             ((mt->logical_height0 - 1) << 16));
>>>> +   OUT_BATCH(0);
>>>> +   ADVANCE_BATCH();
>>>
>>> The drawing rectangle should be using the level's size, not the level 0
>>> size.
>>
>> Yes, this makes sense...we bind a specific miplevel of the depth buffer,
>> so presumably the (0, 0) origin is the start of that miplevel, not the
>> start of the whole tree. I'll change that.
>>
>> Since the drawing rectangle is just the bounds of where you can draw,
>> and not actually the clear/resolve rectangle, I think specifying one
>> that's too large shouldn't be harmful. But specifying the right value
>> is trivial, so I agree we should do it.
>>
>>>> +   uint32_t sample_mask = 0xffff;
>>>> +   if (mt->num_samples > 0) {
>>>> +      dw1 |= SET_FIELD(ffs(mt->num_samples) - 1, GEN8_WM_HZ_NUM_SAMPLES);
>>>> +      sample_mask = gen6_determine_sample_mask(brw);
>>>> +   }
>>>
>>> I don't think we want the user-set sample mask stuff to change the
>>> samples affected by our hiz/depth resolves. I think you can just drop
>>> the if block.
>>
>> Good point, whatever the user specified is probably unrelated to our
>> values. I've dropped the sample_mask variable and just stuffed 0xffff
>> in the packet. I kept the if-block for the dw1 |= ...num_samples...
>> line.
>>
>>>> +
>>>> +   BEGIN_BATCH(5);
>>>> +   OUT_BATCH(_3DSTATE_WM_HZ_OP << 16 | (5 - 2));
>>>> +   OUT_BATCH(dw1);
>>>> +   OUT_BATCH(0);
>>>> +   OUT_BATCH(SET_FIELD(miplevel->width, GEN8_WM_HZ_CLEAR_RECTANGLE_X_MAX) |
>>>> +             SET_FIELD(miplevel->height, GEN8_WM_HZ_CLEAR_RECTANGLE_Y_MAX));
>>>> +   OUT_BATCH(SET_FIELD(sample_mask, GEN8_WM_HZ_SAMPLE_MASK));
>>>> +   ADVANCE_BATCH();
>>>
>>> I think now the miplevel->width should be minify(mt->logical_width0,
>>> level). Hope that helped
>>
>> Yes, that's much nicer - and correct for MSAA buffers!
>> I'm unclear whether we need to do:
>>
>>    ALIGN(minify(mt->logical_width0, level), 8)
>>    ALIGN(minify(mt->logical_height0, level), 4)
>>
>> (both here and in the drawing rectangle)
>>
>> I've read seemingly contradictory information...it sounds like it might
>> be necessary for depth resolves, but not otherwise...but I could be
>> misinterpreting it. It seems to be working...
>
> Yeah, I'm not clear on how this ought to work for resolves. For clears,
> the strategy explained for the previous gens made sense: Round down your
> coords to get 8x4 alignment, then do slow clears on the remaining strips
> (if any). But for a resolve, what else do you do?

We only enable HiZ for miplevels that align to 8x4 boundaries.

--Ken
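Editorial sketch (not part of the original thread): the 8x4 alignment question above can be illustrated in plain C. This assumes minify() halves the level-0 size once per mip level and clamps at 1, and that ALIGN() rounds up to a power-of-two multiple; the helper names align_up() and level_is_8x4_aligned() are hypothetical and are not the driver's own functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Size of a mip level derived from the level-0 size, never below 1.
 * Assumption: this mirrors what the thread calls minify(). */
static unsigned
minify(unsigned size, unsigned level)
{
   unsigned s = size >> level;
   return s > 0 ? s : 1;
}

/* Round value up to a multiple of alignment (alignment must be a power
 * of two). Hypothetical stand-in for the ALIGN() macro. */
static unsigned
align_up(unsigned value, unsigned alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}

/* True when the level's width is a multiple of 8 and its height is a
 * multiple of 4, i.e. aligning it would not change its size. */
static bool
level_is_8x4_aligned(unsigned width0, unsigned height0, unsigned level)
{
   unsigned w = minify(width0, level);
   unsigned h = minify(height0, level);
   return align_up(w, 8) == w && align_up(h, 4) == h;
}
```

Under these assumptions, a 1024x768 surface at level 2 is 256x192 and passes the check, while a 100x100 surface at level 1 is 50x50 and does not.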
Re: [Mesa-dev] [PATCH 1/3] glcpp: Only warn for macro names containing __
On 02/19/2014 11:09 AM, Ian Romanick wrote:
> I'm hoping that Tapani or Darius will verify that this patch actually
> fixes the problem. That's why people CC other people on patches. :)

I have the game, and I can confirm that the lighting is awfully broken
with master, and looks correct after this patch.

Patches 1 and 2 are:
Tested-by: Kenneth Graunke <kenn...@whitecape.org>
[Mesa-dev] OpenCL Supported extensions for R600/SI ?
Currently clGetDeviceInfo() returns an empty string when queried for
CL_DEVICE_EXTENSIONS.

Looking through both the Mesa and LLVM/Clang code I see references to the
following extensions:

cl_khr_fp16
cl_khr_fp64
cl_khr_int64_base_atomics
cl_khr_int64_extended_atomics
cl_khr_gl_sharing
cl_khr_gl_event
cl_khr_d3d10_sharing
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store
cl_khr_3d_image_writes

So are any of these extensions supported within Mesa for the R600 or SI
implementation?

I'm not finding the Khronos OpenCL spec to be completely clear on this,
but it seems that extensions that are possible, even if not enabled,
should be returned by clGetDeviceInfo().

Can anyone shed some light on this for me?

Thanks!

Al Dorrington
Software Engineer Sr
Lockheed Martin, Mission Systems and Training
Re: [Mesa-dev] [PATCH 01/13] i965: Simplify Broadwell's 3DSTATE_MULTISAMPLE sample count handling.
On Wed, Feb 19, 2014 at 2:04 AM, Kenneth Graunke <kenn...@whitecape.org> wrote:
> These enumerations are simply log2 of the number of multisamples shifted
> by a bit, so we can calculate them using ffs() in a lot less code.
>
> Suggested by Eric Anholt.
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
> ---
>  src/mesa/drivers/dri/i965/gen8_multisample_state.c | 26 +++-----------------------
>  1 file changed, 3 insertions(+), 23 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/gen8_multisample_state.c b/src/mesa/drivers/dri/i965/gen8_multisample_state.c
> index 64c7208..bfe0d5b 100644
> --- a/src/mesa/drivers/dri/i965/gen8_multisample_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_multisample_state.c
> @@ -33,33 +33,13 @@
>  void
>  gen8_emit_3dstate_multisample(struct brw_context *brw, unsigned num_samples)
>  {
> -   uint32_t number_of_multisamples = 0;
> +   assert(num_samples <= 16);
>
> -   switch (num_samples) {
> -   case 0:
> -   case 1:
> -      number_of_multisamples = MS_NUMSAMPLES_1;
> -      break;
> -   case 2:
> -      number_of_multisamples = MS_NUMSAMPLES_2;
> -      break;
> -   case 4:
> -      number_of_multisamples = MS_NUMSAMPLES_4;
> -      break;
> -   case 8:
> -      number_of_multisamples = MS_NUMSAMPLES_8;
> -      break;
> -   case 16:
> -      number_of_multisamples = MS_NUMSAMPLES_16;
> -      break;
> -   default:
> -      assert(!"Unrecognized num_samples in gen8_emit_3dstate_multisample");
> -      break;
> -   }
> +   unsigned log2_samples = ffs(MAX2(num_samples, 1)) - 1;
>
>     BEGIN_BATCH(2);
>     OUT_BATCH(GEN8_3DSTATE_MULTISAMPLE << 16 | (2 - 2));
> -   OUT_BATCH(MS_PIXEL_LOCATION_CENTER | number_of_multisamples);
> +   OUT_BATCH(MS_PIXEL_LOCATION_CENTER | log2_samples << 1);
>     ADVANCE_BATCH();
>  }
> --
> 1.8.4.2

This series is:
Reviewed-by: Anuj Phogat <anuj.pho...@gmail.com>
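Editorial sketch (not part of the original thread): the ffs() trick from the patch above can be tried in isolation. The function name encode_num_samples() is hypothetical; it only reproduces the arithmetic described in the commit message (log2 of the sample count, shifted left by one bit) and assumes the input is 0 or a power of two no larger than 16.

```c
#include <assert.h>
#include <strings.h>   /* ffs() */

/* Hypothetical helper, not from the patch: returns the value that would
 * be OR'd into the packet dword, i.e. log2(num_samples) << 1. */
static unsigned
encode_num_samples(unsigned num_samples)
{
   assert(num_samples <= 16);

   /* Treat 0 (no multisampling) the same as 1 sample. */
   unsigned n = num_samples > 1 ? num_samples : 1;

   /* ffs() returns the 1-based index of the lowest set bit, so for a
    * power of two it equals log2(n) + 1. */
   unsigned log2_samples = (unsigned)ffs((int)n) - 1;

   return log2_samples << 1;
}
```

For a count that is not a power of two, ffs() picks the lowest set bit rather than the true log2, so the shortcut relies on callers passing only 0, 1, 2, 4, 8 or 16, which is what the removed switch statement accepted.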
Re: [Mesa-dev] [PATCH 09/13] i965: Only use the SIMD16 program for per-sample shading on Broadwell.
On Wed, Feb 19, 2014 at 2:04 AM, Kenneth Graunke <kenn...@whitecape.org> wrote:
> This is a straight port from gen7_wm_state.c; I haven't looked into
> whether we can do both.

Verified that restriction still holds true in BDW. See
3D Pipeline Stages > Pixel > Pixel Shader Thread Generation >
Pixel Grouping (Dispatch Size) Control

> v2: Actually do it right.
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
> ---
>  src/mesa/drivers/dri/i965/gen8_ps_state.c | 38 ---
>  1 file changed, 30 insertions(+), 8 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> index 57bf053..a834b85 100644
> --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> @@ -183,10 +183,6 @@ upload_ps_state(struct brw_context *brw)
>     if (brw->wm.prog_data->nr_params > 0)
>        dw6 |= GEN7_PS_PUSH_CONSTANT_ENABLE;
>
> -   dw6 |= GEN7_PS_8_DISPATCH_ENABLE;
> -   if (brw->wm.prog_data->prog_offset_16)
> -      dw6 |= GEN7_PS_16_DISPATCH_ENABLE;
> -
>     /* From the documentation for this packet:
>      * "If the PS kernel does not need the Position XY Offsets to
>      *  compute a Position Value, then this field should be programmed
> @@ -205,13 +201,39 @@ upload_ps_state(struct brw_context *brw)
>     else
>        dw6 |= GEN7_PS_POSOFFSET_NONE;
>
> -   dw7 |=
> -      brw->wm.prog_data->first_curbe_grf << GEN7_PS_DISPATCH_START_GRF_SHIFT_0 |
> -      brw->wm.prog_data->first_curbe_grf_16 << GEN7_PS_DISPATCH_START_GRF_SHIFT_2;
> +   /* In case of non 1x per sample shading, only one of SIMD8 and SIMD16
> +    * should be enabled. We do 'SIMD16 only' dispatch if a SIMD16 shader
> +    * is successfully compiled. In majority of the cases that bring us
> +    * better performance than 'SIMD8 only' dispatch.
> +    */
> +   int min_invocations_per_fragment =
> +      _mesa_get_min_invocations_per_fragment(ctx, brw->fragment_program, false);
> +   assert(min_invocations_per_fragment >= 1);
> +
> +   if (brw->wm.prog_data->prog_offset_16) {
> +      dw6 |= GEN7_PS_16_DISPATCH_ENABLE;
> +      if (min_invocations_per_fragment == 1) {
> +         dw6 |= GEN7_PS_8_DISPATCH_ENABLE;
> +         dw7 |= (brw->wm.prog_data->first_curbe_grf <<
> +                 GEN7_PS_DISPATCH_START_GRF_SHIFT_0);
> +         dw7 |= (brw->wm.prog_data->first_curbe_grf_16 <<
> +                 GEN7_PS_DISPATCH_START_GRF_SHIFT_2);
> +      } else {
> +         dw7 |= (brw->wm.prog_data->first_curbe_grf_16 <<
> +                 GEN7_PS_DISPATCH_START_GRF_SHIFT_0);
> +      }
> +   } else {
> +      dw6 |= GEN7_PS_8_DISPATCH_ENABLE;
> +      dw7 |= (brw->wm.prog_data->first_curbe_grf <<
> +              GEN7_PS_DISPATCH_START_GRF_SHIFT_0);
> +   }
>
>     BEGIN_BATCH(12);
>     OUT_BATCH(_3DSTATE_PS << 16 | (12 - 2));
> -   OUT_BATCH(brw->wm.base.prog_offset);
> +   if (brw->wm.prog_data->prog_offset_16 && min_invocations_per_fragment > 1)
> +      OUT_BATCH(brw->wm.base.prog_offset + brw->wm.prog_data->prog_offset_16);
> +   else
> +      OUT_BATCH(brw->wm.base.prog_offset);
>     OUT_BATCH(0);
>     OUT_BATCH(dw3);
>     if (brw->wm.prog_data->total_scratch) {
> --
> 1.8.4.2
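Editorial sketch (not part of the original thread): the dispatch selection in the patch above reduces to a small decision table. The struct and function names below are hypothetical; the logic follows the quoted hunk, where a compiled SIMD16 program is used alone for per-sample shading, both widths are enabled for ordinary per-pixel shading, and SIMD8 is the fallback when no SIMD16 program exists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the packet setup ends up enabling. */
struct dispatch_choice {
   bool simd8;                /* SIMD8 dispatch enabled */
   bool simd16;               /* SIMD16 dispatch enabled */
   bool start_at_simd16;      /* kernel pointer must point at the SIMD16 program */
};

static struct dispatch_choice
choose_dispatch(bool has_simd16_program, int min_invocations_per_fragment)
{
   struct dispatch_choice c = { false, false, false };

   assert(min_invocations_per_fragment >= 1);

   if (has_simd16_program) {
      c.simd16 = true;
      if (min_invocations_per_fragment == 1) {
         /* Per-pixel shading: both programs may be dispatched. */
         c.simd8 = true;
      } else {
         /* Per-sample shading: SIMD16 only, so the SIMD16 program
          * becomes the primary kernel. */
         c.start_at_simd16 = true;
      }
   } else {
      /* No SIMD16 program was compiled: SIMD8 only. */
      c.simd8 = true;
   }

   return c;
}
```

Calling choose_dispatch(true, 4) models the per-sample case the commit message is about: only SIMD16 is enabled and the kernel start pointer is offset to the SIMD16 program.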
Re: [Mesa-dev] [PATCH 14/15] mesa/sso: Implement _mesa_ActiveShaderProgram
On Fri, Feb 7, 2014 at 10:00 PM, Ian Romanick <i...@freedesktop.org> wrote:
> From: Gregory Hainaut <gregory.hain...@gmail.com>
>
> This was originally included in another patch, but it was split out by
> Ian Romanick.
>
> Reviewed-by: Ian Romanick <ian.d.roman...@intel.com>
> ---
>  src/mesa/main/pipelineobj.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/src/mesa/main/pipelineobj.c b/src/mesa/main/pipelineobj.c
> index b47dc7a..6e490bd 100644
> --- a/src/mesa/main/pipelineobj.c
> +++ b/src/mesa/main/pipelineobj.c
> @@ -227,6 +227,30 @@ _mesa_UseProgramStages(GLuint pipeline, GLbitfield stages, GLuint program)
>  void GLAPIENTRY
>  _mesa_ActiveShaderProgram(GLuint pipeline, GLuint program)
>  {
> +   GET_CURRENT_CONTEXT(ctx);
> +   struct gl_shader_program *shProg = (program != 0)
> +      ? _mesa_lookup_shader_program_err(ctx, program, "glActiveShaderProgram(program)")
> +      : NULL;

Seems like if/else would be more clear for this part.

If _mesa_lookup_shader_program_err returns NULL, should we exit early?

-Jordan

> +   struct gl_pipeline_object *pipe = lookup_pipeline_object(ctx, pipeline);
> +
> +   if (!pipe) {
> +      _mesa_error(ctx, GL_INVALID_OPERATION, "glActiveShaderProgram(pipeline)");
> +      return;
> +   }
> +
> +   /* Object is created by any Pipeline call but glGenProgramPipelines,
> +    * glIsProgramPipeline and GetProgramPipelineInfoLog
> +    */
> +   pipe->EverBound = GL_TRUE;
> +
> +   if ((shProg != NULL) && !shProg->LinkStatus) {
> +      _mesa_error(ctx, GL_INVALID_OPERATION,
> +            "glActiveShaderProgram(program %u not linked)", shProg->Name);
> +      return;
> +   }
> +
> +   _mesa_reference_shader_program(ctx, &pipe->ActiveProgram, shProg);
>  }
>
>  /**
> --
> 1.8.1.4
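Editorial sketch (not part of the original thread): the review comment above asks for an if/else with an early exit when the program lookup fails. The following self-contained C shows that control flow with made-up types; struct program, struct pipeline, lookup_program() and the result codes are all hypothetical stand-ins and not Mesa's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real program and pipeline objects. */
struct program  { unsigned name; int linked; };
struct pipeline { int ever_bound; struct program *active; };

enum result { RESULT_OK, ERR_BAD_PROGRAM, ERR_BAD_PIPELINE, ERR_NOT_LINKED };

/* Linear search over a small table; returns NULL when name is unknown. */
static struct program *
lookup_program(struct program *table, size_t count, unsigned name)
{
   for (size_t i = 0; i < count; i++) {
      if (table[i].name == name)
         return &table[i];
   }
   return NULL;
}

static enum result
active_shader_program(struct pipeline *pipe, struct program *table,
                      size_t count, unsigned program_name)
{
   struct program *prog = NULL;

   /* if/else form of the lookup, with the early exit the review asks about. */
   if (program_name != 0) {
      prog = lookup_program(table, count, program_name);
      if (prog == NULL)
         return ERR_BAD_PROGRAM;
   }

   if (pipe == NULL)
      return ERR_BAD_PIPELINE;

   pipe->ever_bound = 1;

   if (prog != NULL && !prog->linked)
      return ERR_NOT_LINKED;

   pipe->active = prog;
   return RESULT_OK;
}
```

With the early return, an unknown non-zero program name no longer falls through and silently clears the pipeline's active program, which is the behavior the question in the review appears to be probing.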
Re: [Mesa-dev] [PATCH 00/15] The first half of GL_ARB_separate_shader_objects
I replied to 7 & 14.

Series: Reviewed-by: Jordan Justen <jordan.l.jus...@intel.com>

On Fri, Feb 7, 2014 at 10:00 PM, Ian Romanick <i...@freedesktop.org> wrote:
> I'm taking a patch from Paul's notebook, and I'm going to try to land a
> giant patch series in a small number of more manageable chunks.
> GL_ARB_separate_shader_objects has been work-in-progress for about 10
> months. This represents about half of the final patch series. The next
> block of patches will be about half of the remaining bits, and the third
> patch series should be the rest.
>
> The current state of things is also in the sso5 branch of
> git://people.freedesktop.org/~idr/mesa. There are some smurf commits at
> the end, and there's still some work to be done, obviously.
>
> This is the easy half. This series adds:
>
>  - Extension tracking
>
>  - Parser and compiler front-end support for the layout qualifiers added
>    by the extension.
>
>  - Plumbing for shader pipeline objects.
>
>  - The bulk of the API.
>
> I don't think there should be anything controversial here... that's all
> in the next batch.
>
> All of this has been pretty well tested by piglit, and at least one ISV
> has been playing with it too.
Re: [Mesa-dev] [PATCH-RFC] i965: do not advertise MESA_FORMAT_Z_UNORM16 support
Kenneth Graunke <kenn...@whitecape.org> writes:

> On 02/18/2014 09:48 PM, Chia-I Wu wrote:
>> Since 73bc6061f5c3b6a3bb7a8114bb2e1ab77d23cfdb, Z16 support is not
>> advertised for OpenGL ES contexts due to the terrible performance. It is
>> still enabled for desktop GL because it was believed GL 3.0+ requires Z16.
>>
>> It turns out only GL 3.0 requires Z16, and that is corrected in later GL
>> versions. In light of that, and per Ian's suggestion, stop advertising Z16
>> support by default, and add a drirc option, gl30_sized_format_rules, so
>> that users can override.
>
> I actually don't think that GL 3.0 requires Z16, either.
>
> In glspec30.20080923.pdf, page 180, it says:
>
> "[...] memory allocation per texture component is assigned by the GL to
> match the allocations listed in tables 3.16-3.18 as closely as possible.
> [...]
>
> Required Texture Formats
> [...]
> In addition, implementations are required to support the following sized
> internal formats. Requesting one of these internal formats for any
> texture type will allocate exactly the internal component sizes and types
> shown for that format in tables 3.16-3.17:"
>
> Notably, however, GL_DEPTH_COMPONENT16 does /not/ appear in table 3.16 or
> table 3.17. It appears in table 3.18, where the "exact" rule doesn't
> apply, and thus we fall back to the "closely as possible" rule.
>
> The confusing part is that the ordering of the tables in the PDF is:
>
> Table 3.16 (pages 182-184)
> Table 3.18 (bottom of page 184 to top of 185)
> Table 3.17 (page 185)

ffs. Yeah, let's just drop Z16 from the driver and the sized depth stuff
from the 3.0 piglit test.
Re: [Mesa-dev] OpenCL Supported extensions for R600/SI ?
On Wed, Feb 19, 2014 at 09:20:22PM +0000, Dorrington, Albert wrote:
> Currently clGetDeviceInfo() returns an empty string when queried for
> CL_DEVICE_EXTENSIONS.
>
> Looking through both the Mesa and LLVM/Clang code I see references to the
> following extensions:
>
> cl_khr_fp64
> cl_khr_int64_base_atomics
> cl_khr_int64_extended_atomics
> cl_khr_gl_sharing
> cl_khr_gl_event
> cl_khr_d3d10_sharing
> cl_khr_global_int32_base_atomics
> cl_khr_global_int32_extended_atomics
> cl_khr_local_int32_base_atomics
> cl_khr_local_int32_extended_atomics

These two are partially supported.

> cl_khr_byte_addressable_store

This one is supported, but the clGetDeviceInfo implementation has not been
updated to reflect this.

> cl_khr_3d_image_writes
>
> So are any of these extensions supported within Mesa for the R600 or SI
> implementation?
>
> I'm not finding the Khronos OpenCL spec to be completely clear on this,
> but it seems that extensions that are possible, even if not enabled,
> should be returned by clGetDeviceInfo().
>
> Can anyone shed some light on this for me?

The clGetDeviceInfo() implementation in clover is incomplete. There are a
lot of values that are hard-coded which need to be replaced with driver
queries.

-Tom

> Thanks!
>
> Al Dorrington
> Software Engineer Sr
> Lockheed Martin, Mission Systems and Training
Re: [Mesa-dev] [PATCH] clover: Don't call pipe_loader_release() when deleting a device
On Tue, Feb 18, 2014 at 05:50:19PM +0100, Francisco Jerez wrote:
> Tom Stellard <t...@stellard.net> writes:
>
>> From: Tom Stellard <thomas.stell...@amd.com>
>>
>> After pipe_loader_release() is called, if any of the pipe_* objects
>> try to call into the gallium API the application will segfault.
>>
>> The only time devices are deleted is when the global _clover_platform
>> object is deleted by the static destructor. However, since application
>> objects that are deleted by the static destructor *after*
>> _clover_platform might try to make CL API calls from their
>> destructor, it is never safe to call pipe_loader_release().
>
> Please have a look at the clover-internal-ref-counting branch [1] of my
> mesa tree, it should fix a number of memory management-related bugs,
> possibly the one you've encountered too, without the negative side
> effects of dropping the call to pipe_loader_release().

I came across one regression, but I'm still looking into whether or not it
is a bug in clover or an application bug.

-Tom

> Thanks.
>
> [1] http://cgit.freedesktop.org/~currojerez/mesa/log/?h=clover-internal-ref-counting
>
>> ---
>>  src/gallium/state_trackers/clover/core/device.cpp | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/src/gallium/state_trackers/clover/core/device.cpp b/src/gallium/state_trackers/clover/core/device.cpp
>> index 76a49d0..2290366 100644
>> --- a/src/gallium/state_trackers/clover/core/device.cpp
>> +++ b/src/gallium/state_trackers/clover/core/device.cpp
>> @@ -48,8 +48,6 @@ device::device(clover::platform &platform, pipe_loader_device *ldev) :
>>  device::~device() {
>>     if (pipe)
>>        pipe->destroy(pipe);
>> -   if (ldev)
>> -      pipe_loader_release(&ldev, 1);
>>  }
>>
>>  bool
>> --
>> 1.8.1.4
Re: [Mesa-dev] [PATCH-RFC] i965: do not advertise MESA_FORMAT_Z_UNORM16 support
On 02/19/2014 02:27 PM, Ian Romanick wrote:
> On 02/19/2014 12:08 PM, Kenneth Graunke wrote:
>> On 02/18/2014 09:48 PM, Chia-I Wu wrote:
>>> Since 73bc6061f5c3b6a3bb7a8114bb2e1ab77d23cfdb, Z16 support is
>>> not advertised for OpenGL ES contexts due to the terrible
>>> performance. It is still enabled for desktop GL because it was
>>> believed GL 3.0+ requires Z16.
>>>
>>> It turns out only GL 3.0 requires Z16, and that is corrected in
>>> later GL versions. In light of that, and per Ian's suggestion,
>>> stop advertising Z16 support by default, and add a drirc option,
>>> gl30_sized_format_rules, so that users can override.
>>
>> I actually don't think that GL 3.0 requires Z16, either.
>>
>> In glspec30.20080923.pdf, page 180, it says:
>>
>> "[...] memory allocation per texture component is assigned by the GL
>> to match the allocations listed in tables 3.16-3.18 as closely as
>> possible. [...]
>>
>> Required Texture Formats
>> [...]
>> In addition, implementations are required to support the following
>> sized internal formats. Requesting one of these internal formats for
>> any texture type will allocate exactly the internal component sizes
>> and types shown for that format in tables 3.16-3.17:"
>>
>> Notably, however, GL_DEPTH_COMPONENT16 does /not/ appear in table
>> 3.16 or table 3.17. It appears in table 3.18, where the "exact" rule
>> doesn't apply, and thus we fall back to the "closely as possible"
>> rule.
>>
>> The confusing part is that the ordering of the tables in the PDF is:
>>
>> Table 3.16 (pages 182-184)
>> Table 3.18 (bottom of page 184 to top of 185)
>> Table 3.17 (page 185)
>>
>> I'm guessing that people saw table 3.16, then saw the one after with
>> DEPTH_COMPONENT* formats, and assumed it was 3.17. But it's not.
>
> Yay latex! Thank you for putting things in random order because it
> fit better. :(
>
>> I think we should just drop Z16 support entirely, and I think we
>> should remove the requirement from the Piglit test.
>
> If the test is wrong, and it sounds like it is, then I'm definitely in
> favor of changing it.
>
> The reason to have Z16 is low-bandwidth GPUs in resource constrained
> environments. If an app specifically asks for Z16, then there's a
> non-zero (though possibly infinitesimal) probability they're doing it
> for a reason. For at least some platforms, isn't there just a
> work-around to implement to fix the performance issue? Doesn't the
> performance issue only affect some platforms to begin with?
>
> Maybe just change the check to
>
>    ctx->TextureFormatSupported[MESA_FORMAT_Z_UNORM16] =
>       ! platform has z16 performance issues;

Currently, all platforms have Z16 performance issues. On Haswell and
later, we could potentially implement the PMA stall optimization, which I
believe would reduce(?) the problem. I'm not sure if it would eliminate
it though.

I think the best course of action is:
1. Fix the Piglit test to not require precise depth formats.
2. Disable Z16 on all generations.
3. Add a to do item for implementing the HSW+ PMA stall optimization.
4. Add a to do item for re-evaluating Z16 on HSW+ once that's done.

--Ken
Re: [Mesa-dev] [PATCH 1/3] glsl: Compile error if fs defines conflicting qualifiers for gl_FragCoord
On Tue, Feb 18, 2014 at 5:28 PM, Ian Romanick i...@freedesktop.org wrote: On 02/18/2014 03:36 PM, Anuj Phogat wrote: On Tue, Feb 18, 2014 at 11:01 AM, Ian Romanick i...@freedesktop.org wrote: On 02/10/2014 05:29 PM, Anuj Phogat wrote: GLSL 1.50 spec says: If gl_FragCoord is redeclared in any fragment shader in a program, it must be redeclared in all the fragment shaders in that program that have a static use gl_FragCoord. All redeclarations of gl_FragCoord in all fragment shaders in a single program must have the same set of qualifiers. This patch makes the glsl compiler to generate an error if we have a fragment shader defined with conflicting layout qualifier declarations for gl_FragCoord. For example: layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord; layout(pixel_center_integer) in vec4 gl_FragCoord; void main() { gl_FragColor = gl_FragCoord.xyzz; } Cc: mesa-sta...@lists.freedesktop.org Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/glsl/ast_to_hir.cpp | 39 +++ src/glsl/glsl_parser_extras.cpp | 3 +++ src/glsl/glsl_parser_extras.h | 10 ++ 3 files changed, 52 insertions(+) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index c89a26b..7d7d89b 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -2374,6 +2374,45 @@ apply_type_qualifier_to_variable(const struct ast_type_qualifier *qual, qual_string); } + /* Make sure all gl_FragCoord redeclarations specify the same layout +* qualifier type. +*/ + bool conflicting_pixel_center_integer = + state-fs_pixel_center_integer + !qual-flags.q.pixel_center_integer; + + bool conflicting_origin_upper_left = + state-fs_origin_upper_left + !qual-flags.q.origin_upper_left; I don't think this catches all the cases. What about layout(origin_upper_left ) in vec4 gl_FragCoord; layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord; Nice catch. I'll update my patch to include this case. What do you think about following two cases? 
case 1:
in vec4 gl_FragCoord;
layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord;
AMD produces no compilation error. This patch matches the behavior.

case 2:
layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord;
in vec4 gl_FragCoord;
AMD produces compilation error. This patch matches the behavior.

I don't think that's right. I think they should both produce an error. The spec says, "All redeclarations of gl_FragCoord in all fragment shaders in a single program must have the same set of qualifiers." I can't see any reason to give an error for case 2 but not for case 1.

I agree. We should also check NVIDIA. NVIDIA driver produces compilation error in both cases. I'll add a check to handle them correctly in mesa.
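For reference, the rule under discussion (every redeclaration must carry the same set of qualifiers, regardless of the order in which the declarations appear) can be sketched in plain C as below. This is only an illustration under stated assumptions: struct frag_coord_quals and frag_coord_redeclarations_match() are hypothetical names, not anything in ast_to_hir.cpp, and the patch itself tracks the qualifiers in the parser state rather than in an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the layout qualifiers attached to one
 * gl_FragCoord redeclaration.  Not Mesa's ast_type_qualifier. */
struct frag_coord_quals {
   bool origin_upper_left;
   bool pixel_center_integer;
};

/* Returns true when every redeclaration in decls[] carries exactly the
 * same set of qualifiers as the first one.  Comparing against the first
 * entry makes the result independent of declaration order, so "case 1"
 * and "case 2" from the thread are treated alike. */
bool
frag_coord_redeclarations_match(const struct frag_coord_quals *decls,
                                size_t count)
{
   assert(decls != NULL || count == 0);

   for (size_t i = 1; i < count; i++) {
      if (decls[i].origin_upper_left != decls[0].origin_upper_left ||
          decls[i].pixel_center_integer != decls[0].pixel_center_integer)
         return false;
   }
   return true;
}
```

With this formulation, a plain "in vec4 gl_FragCoord;" followed by a fully qualified redeclaration is rejected just like the reverse order, which is the behavior argued for above.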
[Mesa-dev] [Bug 71870] Metro: Last Light rendering issues
https://bugs.freedesktop.org/show_bug.cgi?id=71870 Ian Romanick i...@freedesktop.org changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #45 from Ian Romanick i...@freedesktop.org --- The __ issue should be fixed by the following commits on Mesa master. These are scheduled to inclusion in 10.1 and 10.0.4. commit 2c85fd5a964a78c9f7a93994fb79f1723c6f45b5 Author: Ian Romanick ian.d.roman...@intel.com Date: Tue Feb 18 09:36:08 2014 -0800 glsl: Only warn for macro names containing __ From page 14 (page 20 of the PDF) of the GLSL 1.10 spec: In addition, all identifiers containing two consecutive underscores (__) are reserved as possible future keywords. The intention is that names containing __ are reserved for internal use by the implementation, and names prefixed with GL_ are reserved for use by Khronos. Names simply containing __ are dangerous to use, but should be allowed. Per the Khronos bug mentioned below, a future version of the GLSL specification will clarify this. Signed-off-by: Ian Romanick ian.d.roman...@intel.com Cc: 9.2 10.0 10.1 mesa-sta...@lists.freedesktop.org Reviewed-by: Kenneth Graunke kenn...@whitecape.org Tested-by: Kenneth Graunke kenn...@whitecape.org Reviewed-by: Anuj Phogat anuj.pho...@gmail.com Tested-by: Darius Spitznagel d.spitzna...@goodbytez.de Cc: Tapani Pälli lem...@gmail.com Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870 Bugzilla: Khronos #11702 commit 0bd78926304e72ef3566e977d0cb5a959d86b809 Author: Ian Romanick ian.d.roman...@intel.com Date: Tue Feb 18 09:10:36 2014 -0800 glcpp: Only warn for macro names containing __ Section 3.3 (Preprocessor) of the GLSL 1.30 spec (and later) and the GLSL ES spec (all versions) say: All macro names containing two consecutive underscores ( __ ) are reserved for future use as predefined macro names. All macro names prefixed with GL_ (GL followed by a single underscore) are also reserved. 
The intention is that names containing __ are reserved for internal use by the implementation, and names prefixed with GL_ are reserved for use by Khronos. Since every extension adds a name prefixed with GL_ (i.e., the name of the extension), that should be an error. Names simply containing __ are dangerous to use, but should be allowed. In similar cases, the C++ preprocessor specification says, "no diagnostic is required."

Per the Khronos bug mentioned below, a future version of the GLSL specification will clarify this.

Signed-off-by: Ian Romanick ian.d.roman...@intel.com
Cc: 9.2 10.0 10.1 mesa-sta...@lists.freedesktop.org
Reviewed-by: Kenneth Graunke kenn...@whitecape.org
Tested-by: Kenneth Graunke kenn...@whitecape.org
Reviewed-by: Anuj Phogat anuj.pho...@gmail.com
Tested-by: Darius Spitznagel d.spitzna...@goodbytez.de
Cc: Tapani Pälli lem...@gmail.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
Bugzilla: Khronos #11702
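A rough sketch of the policy described in the two commit messages: a macro name prefixed with GL_ is rejected, a name that merely contains __ only draws a warning, and anything else is accepted. classify_macro_name() and the enum below are hypothetical names used for illustration; they are not taken from glcpp.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical diagnostic levels for a user-defined macro name. */
enum macro_name_diag {
   MACRO_NAME_OK,
   MACRO_NAME_WARNING,
   MACRO_NAME_ERROR
};

/* Classify a macro name according to the policy in the commit message:
 * the "GL_" prefix is an error, two consecutive underscores anywhere in
 * the name is only a warning. */
enum macro_name_diag
classify_macro_name(const char *name)
{
   assert(name != NULL);

   /* The GL_ prefix is checked first, so it takes precedence over the
    * double-underscore rule. */
   if (strncmp(name, "GL_", 3) == 0)
      return MACRO_NAME_ERROR;

   /* Names containing "__" are dangerous but allowed. */
   if (strstr(name, "__") != NULL)
      return MACRO_NAME_WARNING;

   return MACRO_NAME_OK;
}
```

Checking the GL_ prefix first is a design choice of this sketch, so a name such as GL__FOO is reported as an error rather than a warning.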
Re: [Mesa-dev] [PATCH-RFC] i965: do not advertise MESA_FORMAT_Z_UNORM16 support
On 02/19/2014 03:03 PM, Kenneth Graunke wrote:
Currently, all platforms have Z16 performance issues. On Haswell and later, we could potentially implement the PMA stall optimization, which I believe would reduce(?) the problem. I'm not sure if it would eliminate it though.

I think the best course of action is:
1. Fix the Piglit test to not require precise depth formats.
2. Disable Z16 on all generations.
3. Add a to do item for implementing the HSW+ PMA stall optimization.
4. Add a to do item for re-evaluating Z16 on HSW+ once that's done.

I didn't realize all platforms had Z16 issues. I thought it was just HiZ platforms (ILK+). Sounds like a good plan to me.
Re: [Mesa-dev] [PATCH 09/13] i965: Only use the SIMD16 program for per-sample shading on Broadwell.
On 02/19/2014 01:32 PM, Anuj Phogat wrote:
On Wed, Feb 19, 2014 at 2:04 AM, Kenneth Graunke kenn...@whitecape.org wrote:
This is a straight port from gen7_wm_state.c; I haven't looked into whether we can do both.

Verified that restriction still holds true in BDW. See 3D Pipeline Stages Pixel Pixel Shader Thread Generation Pixel Grouping (Dispatch Size) Control

Thanks for looking this up!
[Mesa-dev] [PATCH 10/12] geom-outlining-150: Use a vbo.
Use a vbo for vertex data instead of client-side arrays. Also bind a vertex array object. This is necessary for the switch to a core profile context.

Signed-off-by: Fabian Bieler fabianbie...@fastmail.fm
Reviewed-by: Brian Paul bri...@vmware.com
Reviewed-by: Ian Romanick ian.d.roman...@intel.com
---
 src/glsl/geom-outlining-150.c | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/src/glsl/geom-outlining-150.c b/src/glsl/geom-outlining-150.c
index 5c2b3c9..0bc20f0 100644
--- a/src/glsl/geom-outlining-150.c
+++ b/src/glsl/geom-outlining-150.c
@@ -23,6 +23,7 @@
 static GLint WinWidth = 500, WinHeight = 500;
 static GLint Win = 0;
 static GLuint VertShader, GeomShader, FragShader, Program;
+static GLuint vao, vbo;
 static GLboolean Anim = GL_TRUE;
 static int uViewportSize = -1, uModelViewProj = -1, uColor = -1;

@@ -112,11 +113,6 @@ mat_multiply(GLfloat product[16], const GLfloat a[16], const GLfloat b[16])
 static void
 Redisplay(void)
 {
-   static const GLfloat verts[3][2] = {
-      { -1, -1 },
-      { 1, -1 },
-      { 0, 1 }
-   };
    GLfloat rot[4][4];
    GLfloat trans[16], mvp[16];

@@ -131,8 +127,6 @@ Redisplay(void)
    glUniformMatrix4fv(uModelViewProj, 1, GL_FALSE, (float *) mvp);

    /* Draw */
-   glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, verts);
-   glEnableVertexAttribArray(0);
    glDrawArrays(GL_TRIANGLES, 0, 3);

    glutSwapBuffers();
@@ -217,6 +211,8 @@ CleanUp(void)
    glDeleteShader(VertShader);
    glDeleteShader(GeomShader);
    glDeleteProgram(Program);
+   glDeleteVertexArrays(1, &vao);
+   glDeleteBuffers(1, &vbo);
    glutDestroyWindow(Win);
 }

@@ -304,6 +300,11 @@ Init(void)
    float m = min(d0, min(d1, d2)); \n
    FragColor = Color * smoothstep(0.0, LINE_WIDTH, m); \n
    } \n;
+   static const GLfloat verts[3][2] = {
+      { -1, -1 },
+      { 1, -1 },
+      { 0, 1 }
+   };

    if (!ShadersSupported())
       exit(1);
@@ -351,6 +352,16 @@ Init(void)

    glUniform4fv(uColor, 1, Orange);

+   glGenBuffers(1, &vbo);
+   glBindBuffer(GL_ARRAY_BUFFER, vbo);
+   glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);
+
+   glGenVertexArrays(1, &vao);
+   glBindVertexArray(vao);
+
+   glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, NULL);
+   glEnableVertexAttribArray(0);
+
    glClearColor(0.3f, 0.3f, 0.3f, 0.0f);
    glEnable(GL_DEPTH_TEST);

--
1.8.3.2
[Mesa-dev] [PATCH 04/12] glsl/gsraytrace: Don't create new Buffer objects everytime the window is resized.
Signed-off-by: Fabian Bieler fabianbie...@fastmail.fm
Reviewed-by: Brian Paul bri...@vmware.com
Reviewed-by: Ian Romanick ian.d.roman...@intel.com
---
 src/glsl/gsraytrace.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/glsl/gsraytrace.cpp b/src/glsl/gsraytrace.cpp
index c21c667..f156fdc 100644
--- a/src/glsl/gsraytrace.cpp
+++ b/src/glsl/gsraytrace.cpp
@@ -776,7 +776,6 @@ Reshape(int width, int height)
    {
       size_t nElem = WinWidth*WinHeight*nRayGens;
-      glGenBuffers(1, &dst);
       glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER_NV, dst);
       glBufferData(GL_TRANSFORM_FEEDBACK_BUFFER_NV, nElem*sizeof(GSRay), 0, GL_STREAM_DRAW);
       GSRay* d = (GSRay*)glMapBuffer(GL_TRANSFORM_FEEDBACK_BUFFER_NV, GL_READ_WRITE);
@@ -790,7 +789,6 @@ Reshape(int width, int height)
    }

    {
-      glGenBuffers(1, &eyeRaysAsPoints);
       glBindBuffer(GL_ARRAY_BUFFER, eyeRaysAsPoints);
       glBufferData(GL_ARRAY_BUFFER, WinWidth*WinHeight*sizeof(GSRay), 0, GL_STATIC_DRAW);
       GSRay* d = (GSRay*)glMapBuffer(GL_ARRAY_BUFFER, GL_READ_WRITE);
@@ -919,6 +917,8 @@ Init(void)
    }

    glGenQueries(1, &pgQuery);
+   glGenBuffers(1, &dst);
+   glGenBuffers(1, &eyeRaysAsPoints);

    printf("\nESC = exit demo\nleft mouse + drag = rotate camera\n\n");
 }
--
1.8.3.2
[Mesa-dev] [PATCH 00/12] DEMOS Use core profile in two GS demos (v3).
Hello!

As mesa only supports geometry shaders in core profile contexts, this patchset adjusts the gsraytrace and the geom-outlining-150 demos to use the core profile.

This is v3 with some comments by Ian Romanick addressed. The series is reviewed by Brian Paul and Ian Romanick. As I don't have git access, I'd appreciate it if someone could commit these patches.

Thanks,
Fabian

Fabian Bieler (12):
  configure.ac: Check for freeglut.
  glut_wrapper: Include freeglut.h if available.
  glsl/gsraytrace: Use __LINE__ macro to set line numbers in GLSL source strings.
  glsl/gsraytrace: Don't create new Buffer objects everytime the window is resized.
  glsl/gsraytrace: Bind transform feedback buffer.
  glsl/gsraytrace: Use core GL3.0 transform feedback
  glsl/gsraytrace: Use GLSL 1.5 instead of 1.2.
  glsl/gsraytrace: Use core geometry shaders.
  glsl/gsraytrace: Switch to core profile.
  geom-outlining-150: Use a vbo.
  geom-outlining-150: Use core geometry shaders.
  geom-outlining-150: Switch to core profile.

 configure.ac                  |   6 ++
 src/glsl/geom-outlining-150.c |  64 +--
 src/glsl/gsraytrace.cpp       | 185 ++
 src/util/glut_wrap.h          |   4 +-
 4 files changed, 147 insertions(+), 112 deletions(-)

--
1.8.3.2
[Mesa-dev] [PATCH 02/12] glut_wrapper: Include freeglut.h if available.
The freeglut header only defines the extensions to request an OpenGL core profile context if freeglut.h (rather than glut.h) is included. Note that the header is installed to include/GL/freeglut.h on OS X, too.

Signed-off-by: Fabian Bieler fabianbie...@fastmail.fm
Reviewed-by: Brian Paul bri...@vmware.com
Reviewed-by: Ian Romanick ian.d.roman...@intel.com
---
 src/util/glut_wrap.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/util/glut_wrap.h b/src/util/glut_wrap.h
index a48a9e8..fa1b8f9 100644
--- a/src/util/glut_wrap.h
+++ b/src/util/glut_wrap.h
@@ -1,7 +1,9 @@
 #ifndef GLUT_WRAP_H
 #define GLUT_WRAP_H

-#ifdef __APPLE__
+#ifdef HAVE_FREEGLUT
+# include <GL/freeglut.h>
+#elif defined __APPLE__
 # include <GLUT/glut.h>
 #else
 # include <GL/glut.h>
--
1.8.3.2
[Mesa-dev] [PATCH 12/12] geom-outlining-150: Switch to core profile.
Signed-off-by: Fabian Bieler fabianbie...@fastmail.fm
Reviewed-by: Brian Paul bri...@vmware.com
Reviewed-by: Ian Romanick ian.d.roman...@intel.com
---
 src/glsl/geom-outlining-150.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/glsl/geom-outlining-150.c b/src/glsl/geom-outlining-150.c
index 3dffa16..2e2a54a 100644
--- a/src/glsl/geom-outlining-150.c
+++ b/src/glsl/geom-outlining-150.c
@@ -364,9 +364,22 @@ main(int argc, char *argv[])
 {
    glutInit(&argc, argv);
    glutInitWindowSize(WinWidth, WinHeight);
+#ifdef HAVE_FREEGLUT
+   glutInitContextVersion(3, 2);
+   glutInitContextProfile(GLUT_CORE_PROFILE);
    glutInitDisplayMode(GLUT_RGB | GLUT_DEPTH | GLUT_DOUBLE);
+#elif defined __APPLE__
+   glutInitDisplayMode(GLUT_3_2_CORE_PROFILE | GLUT_RGB | GLUT_DEPTH | GLUT_DOUBLE);
+#else
+   glutInitDisplayMode(GLUT_RGB | GLUT_DEPTH | GLUT_DOUBLE);
+#endif
    Win = glutCreateWindow(argv[0]);
+   /* glewInit requires glewExperimental set to true for core profiles.
+    * Depending on the glew version it also generates a GL_INVALID_ENUM.
+    */
+   glewExperimental = GL_TRUE;
    glewInit();
+   glGetError();
    glutReshapeFunc(Reshape);
    glutKeyboardFunc(Key);
    glutDisplayFunc(Redisplay);
--
1.8.3.2
[Mesa-dev] [PATCH 05/12] glsl/gsraytrace: Bind transform feedback buffer.
Bind the transform feedback buffer before drawing into it and unbind it afterwards.

Signed-off-by: Fabian Bieler fabianbie...@fastmail.fm
Reviewed-by: Brian Paul bri...@vmware.com
Reviewed-by: Ian Romanick ian.d.roman...@intel.com
---
 src/glsl/gsraytrace.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/glsl/gsraytrace.cpp b/src/glsl/gsraytrace.cpp
index f156fdc..015bfcd 100644
--- a/src/glsl/gsraytrace.cpp
+++ b/src/glsl/gsraytrace.cpp
@@ -628,6 +628,7 @@ Draw(void)
    printf("%d\n", i);
    //gs.fpwQuery->beginQuery();
    //gs.pgQuery->beginQuery();
+   glBindBufferBaseNV(GL_TRANSFORM_FEEDBACK_BUFFER_NV, 0, dst);
    glBeginQuery(GL_PRIMITIVES_GENERATED_NV, pgQuery);
    glBeginTransformFeedbackNV(GL_POINTS);
    //gs.eyeRaysAsPoints->bindAs(ARRAY);
@@ -675,7 +676,7 @@ Draw(void)

    swap(src, dst);

-   glBindBufferOffsetNV(GL_TRANSFORM_FEEDBACK_BUFFER_NV, 0, dst->getID(), 0); pso_gl_check();
+   glBindBufferBaseNV(GL_TRANSFORM_FEEDBACK_BUFFER_NV, 0, 0);

    clear();

--
1.8.3.2
[Mesa-dev] [PATCH 03/12] glsl/gsraytrace: Use __LINE__ macro to set line numbers in GLSL source strings.
The hardcoded numbers are a few lines off at the moment. Keeping track of the numbers through further modifications is inconvenient. The __LINE__ constant takes care of this automatically.

v2: Don't set source-string-number to line number.

Signed-off-by: Fabian Bieler fabianbie...@fastmail.fm
Reviewed-by: Brian Paul bri...@vmware.com
Reviewed-by: Ian Romanick ian.d.roman...@intel.com
---
 src/glsl/gsraytrace.cpp | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/glsl/gsraytrace.cpp b/src/glsl/gsraytrace.cpp
index 62a584d..c21c667 100644
--- a/src/glsl/gsraytrace.cpp
+++ b/src/glsl/gsraytrace.cpp
@@ -37,6 +37,10 @@
 // TODO: use GL_EXT_transform_feedback or GL3 equivalent
 // TODO: port to piglit too

+#define STRINGIFY_(x) #x
+#define STRINGIFY(x) STRINGIFY_(x)
+#define S__LINE__ STRINGIFY(__LINE__)
+
 static const float INF=.9F;

 static int Win;
@@ -67,7 +71,7 @@ float rot[9] = {1,0,0, 0,1,0, 0,0,1};
 static const char* vsSource =
 \n
 #version 120 \n
-#line 63 63 \n
+#line S__LINE__ \n
 #define SHADOWS \n
 #define RECURSION \n
 \n
@@ -249,7 +253,7 @@ static const char* vsSource =

 static const char* gsSource =
 #version 120 \n
-#line 245 245\n
+#line S__LINE__ \n
 #extension GL_ARB_geometry_shader4: require \n
 \n
 #define SHADOWS \n
@@ -388,7 +392,7 @@ static const char* gsSource =

 static const char* fsSource =
 #version 120 \n
-#line 384 384\n
+#line S__LINE__ \n
 \n
 #define SHADOWS \n
 #define RECURSION\n
--
1.8.3.2
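A small standalone sketch of why the patch uses two macro levels: the # operator stringifies its argument without expanding it, so __LINE__ has to pass through one extra macro first. The three macros are the ones from the patch; is_decimal() and example_source are made up for the illustration, and the sketch assumes the shader text is assembled by adjacent string literal concatenation, since the quoting is not visible in this archived copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)
#define S__LINE__ STRINGIFY(__LINE__)

/* Returns 1 if s is a non-empty string made only of decimal digits. */
int
is_decimal(const char *s)
{
   assert(s != NULL);

   if (*s == '\0')
      return 0;

   for (; *s != '\0'; s++) {
      if (*s < '0' || *s > '9')
         return 0;
   }
   return 1;
}

/* Hypothetical shader-like string: S__LINE__ expands to a string literal
 * holding the current C source line, which the compiler then joins with
 * the neighbouring literals. */
const char *example_source =
   "#version 120\n"
   "#line " S__LINE__ "\n"
   "void main() { }\n";
```

Using STRINGIFY_(__LINE__) directly would yield the text "__LINE__" instead of a number, which is why the extra STRINGIFY() indirection is needed.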
[Mesa-dev] [PATCH 08/12] glsl/gsraytrace: Use core geometry shaders.
v2: Don't remove ShaderSupported() test. It sets up some function pointers for the CompileShader framework. Signed-off-by: Fabian Bieler fabianbie...@fastmail.fm Reviewed-by: Brian Paul bri...@vmware.com Reviewed-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/gsraytrace.cpp | 24 +--- 1 file changed, 9 insertions(+), 15 deletions(-) diff --git a/src/glsl/gsraytrace.cpp b/src/glsl/gsraytrace.cpp index f9e708f..6df6543 100644 --- a/src/glsl/gsraytrace.cpp +++ b/src/glsl/gsraytrace.cpp @@ -255,7 +255,8 @@ static const char* vsSource = static const char* gsSource = #version 150 \n #line S__LINE__ \n -#extension GL_ARB_geometry_shader4: require \n +layout(points) in; \n +layout(points, max_vertices = 3) out;\n \n #define SHADOWS \n #define RECURSION\n @@ -337,7 +338,7 @@ static const char* gsSource = return; \n \n // emitPassThrough(); \n - gl_Position = gl_PositionIn[0]; \n + gl_Position = gl_in[0].gl_Position; \n orig_t2 = orig_t1[0]; \n dir_idx2 = dir_idx1[0];\n uv_state2.xyw= uv_state1[0].xyw; \n @@ -362,7 +363,7 @@ static const char* gsSource = type = 1;\n \n //emitShadowRay(); \n - gl_Position = gl_PositionIn[0]; \n + gl_Position = gl_in[0].gl_Position; \n orig_t2.xyz = shadowRay.orig; \n orig_t2.w= shadowHit.t; \n dir_idx2.xyz = shadowRay.dir;\n @@ -379,7 +380,7 @@ static const char* gsSource = type = -1; \n \n //emitReflRay(); \n - gl_Position = gl_PositionIn[0]; \n + gl_Position = gl_in[0].gl_Position; \n orig_t2.xyz = reflRay.orig; \n orig_t2.w= reflHit.t; \n dir_idx2.xyz = reflRay.dir; \n @@ -844,24 +845,17 @@ Init(void) exit(-1); } - if (!GLEW_ARB_geometry_shader4) + if (!GLEW_VERSION_3_2) { - fprintf(stderr, GS Shaders are not supported!\n); - exit(-1); - } - - if (!GLEW_VERSION_3_0) - { - fprintf(stderr, OpenGL 3.0 (needed for transform feedback) not - supported!\n); + fprintf(stderr, OpenGL 3.2 (needed for transform feedback and + gemoetry shaders) not supported!\n); exit(-1); } vertShader = CompileShaderText(GL_VERTEX_SHADER, vsSource); geomShader 
= CompileShaderText(GL_GEOMETRY_SHADER_ARB, gsSource); fragShader = CompileShaderText(GL_FRAGMENT_SHADER, fsSource); - program = LinkShaders3WithGeometryInfo(vertShader, geomShader, fragShader, - 3, GL_POINTS, GL_POINTS); + program = LinkShaders3(vertShader, geomShader, fragShader); const char *varyings[] = { gl_Position, -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/12] glsl/gsraytrace: Use core GL3.0 transform feedback
NV_transform_feedback is not supported by mesa. Use transform feedback from core OpenGL 3.0. This necessitates binding the transform feedback varyings before linking the shader. Signed-off-by: Fabian Bieler fabianbie...@fastmail.fm Reviewed-by: Brian Paul bri...@vmware.com Reviewed-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/gsraytrace.cpp | 72 + 1 file changed, 31 insertions(+), 41 deletions(-) diff --git a/src/glsl/gsraytrace.cpp b/src/glsl/gsraytrace.cpp index 015bfcd..ef67643 100644 --- a/src/glsl/gsraytrace.cpp +++ b/src/glsl/gsraytrace.cpp @@ -34,7 +34,6 @@ #include math.h #include stddef.h // offsetof -// TODO: use GL_EXT_transform_feedback or GL3 equivalent // TODO: port to piglit too #define STRINGIFY_(x) #x @@ -604,33 +603,12 @@ Draw(void) dir_idxAttribLoc = glGetAttribLocation(program, dir_idx); uv_stateAttribLoc = glGetAttribLocation(program, uv_state); - posVaryingLoc = glGetVaryingLocationNV(program, gl_Position); - orig_tVaryingLoc = glGetVaryingLocationNV(program, orig_t2); - dir_idxVaryingLoc = glGetVaryingLocationNV(program, dir_idx2); - uv_stateVaryingLoc = glGetVaryingLocationNV(program, uv_state2); - //gs.gs-getVaryingLocation(gl_Position, gs.posVaryingLoc); - //gs.gs-getVaryingLocation(orig_t2, gs.orig_tVaryingLoc); - //gs.gs-getVaryingLocation(dir_idx2, gs.dir_idxVaryingLoc); - //gs.gs-getVaryingLocation(uv_state2, gs.uv_stateVaryingLoc); - - - glBindBufferOffsetNV(GL_TRANSFORM_FEEDBACK_BUFFER_NV, 0, dst, 0); - GLint varyings[4]= { - posVaryingLoc, - orig_tVaryingLoc, - dir_idxVaryingLoc, - uv_stateVaryingLoc - }; - // I think it will be a performance win to use multiple buffer objects to write to - // instead of using the interleaved mode. 
- glTransformFeedbackVaryingsNV(program, 4, varyings, GL_INTERLEAVED_ATTRIBS_NV); - printf(%d\n, i); //gs.fpwQuery-beginQuery(); //gs.pgQuery-beginQuery(); - glBindBufferBaseNV(GL_TRANSFORM_FEEDBACK_BUFFER_NV, 0, dst); - glBeginQuery(GL_PRIMITIVES_GENERATED_NV, pgQuery); - glBeginTransformFeedbackNV(GL_POINTS); + glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, dst); + glBeginQuery(GL_PRIMITIVES_GENERATED, pgQuery); + glBeginTransformFeedback(GL_POINTS); //gs.eyeRaysAsPoints-bindAs(ARRAY); glBindBuffer(GL_ARRAY_BUFFER, eyeRaysAsPoints); { @@ -653,9 +631,9 @@ Draw(void) //gs.gs-set_uniform(emitNoMore, 1, 0); glUniform1i(glGetUniformLocation(program, emitNoMore), 0); - //glEnable(GL_RASTERIZER_DISCARD_NV); + //glEnable(GL_RASTERIZER_DISCARD); glDrawArrays(GL_POINTS, 0, WinWidth*WinHeight); - //glDisable(GL_RASTERIZER_DISCARD_NV); + //glDisable(GL_RASTERIZER_DISCARD); glDisableVertexAttribArray(uv_stateAttribLoc); @@ -667,16 +645,16 @@ Draw(void) } //gs.eyeRaysAsPoints-unbindAs(ARRAY); glBindBuffer(GL_ARRAY_BUFFER, 0); - glEndTransformFeedbackNV(); + glEndTransformFeedback(); //gs.pgQuery-endQuery(); - glEndQuery(GL_PRIMITIVES_GENERATED_NV); + glEndQuery(GL_PRIMITIVES_GENERATED); //gs.fpwQuery-endQuery(); psoLog(LOG_RAW) 1st: gs.fpwQuery-getQueryResult() , gs.pgQuery-getQueryResult() \n; swap(src, dst); - glBindBufferBaseNV(GL_TRANSFORM_FEEDBACK_BUFFER_NV, 0, 0); + glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, 0); clear(); @@ -777,15 +755,15 @@ Reshape(int width, int height) { size_t nElem = WinWidth*WinHeight*nRayGens; - glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER_NV, dst); - glBufferData(GL_TRANSFORM_FEEDBACK_BUFFER_NV, nElem*sizeof(GSRay), 0, GL_STREAM_DRAW); - GSRay* d = (GSRay*)glMapBuffer(GL_TRANSFORM_FEEDBACK_BUFFER_NV, GL_READ_WRITE); + glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER, dst); + glBufferData(GL_TRANSFORM_FEEDBACK_BUFFER, nElem*sizeof(GSRay), 0, GL_STREAM_DRAW); + GSRay* d = (GSRay*)glMapBuffer(GL_TRANSFORM_FEEDBACK_BUFFER, GL_READ_WRITE); for 
(size_t i = 0; i nElem; i++) { d[i].dir_idx = vec4(0.0F, 0.0F, 0.0F, -1.0F); } - glUnmapBuffer(GL_TRANSFORM_FEEDBACK_BUFFER_NV); - glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER_NV, 0); + glUnmapBuffer(GL_TRANSFORM_FEEDBACK_BUFFER); + glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER, 0); //printf(Ping-pong VBO size 2x%d Kbytes.\n, (int)nElem*sizeof(GSRay)/1024); } @@ -866,12 +844,30 @@ Init(void) exit(-1); } + if (!GLEW_VERSION_3_0) + { + fprintf(stderr, OpenGL 3.0 (needed for transform feedback) not + supported!\n); + exit(-1); + } + vertShader = CompileShaderText(GL_VERTEX_SHADER, vsSource); geomShader = CompileShaderText(GL_GEOMETRY_SHADER_ARB, gsSource); fragShader = CompileShaderText(GL_FRAGMENT_SHADER, fsSource); program = LinkShaders3WithGeometryInfo(vertShader, geomShader, fragShader, 3,
[Mesa-dev] [PATCH 11/12] geom-outlining-150: Use core geometry shaders.
Signed-off-by: Fabian Bieler fabianbie...@fastmail.fm Reviewed-by: Brian Paul bri...@vmware.com Reviewed-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/geom-outlining-150.c | 26 -- 1 file changed, 8 insertions(+), 18 deletions(-) diff --git a/src/glsl/geom-outlining-150.c b/src/glsl/geom-outlining-150.c index 0bc20f0..3dffa16 100644 --- a/src/glsl/geom-outlining-150.c +++ b/src/glsl/geom-outlining-150.c @@ -256,7 +256,8 @@ Init(void) } \n; static const char *geomShaderText = #version 150 \n - #extension GL_ARB_geometry_shader4: enable \n + layout(triangles) in; \n + layout(triangle_strip, max_vertices = 3) out; \n uniform vec2 ViewportSize; \n out vec2 Vert0, Vert1, Vert2; \n \n @@ -271,11 +272,11 @@ Init(void) Vert0 = vpxform(gl_in[0].gl_Position); \n Vert1 = vpxform(gl_in[1].gl_Position); \n Vert2 = vpxform(gl_in[2].gl_Position); \n - gl_Position = gl_PositionIn[0]; \n + gl_Position = gl_in[0].gl_Position; \n EmitVertex(); \n - gl_Position = gl_PositionIn[1]; \n + gl_Position = gl_in[1].gl_Position; \n EmitVertex(); \n - gl_Position = gl_PositionIn[2]; \n + gl_Position = gl_in[2].gl_Position; \n EmitVertex(); \n } \n; static const char *fragShaderText = @@ -309,15 +310,14 @@ Init(void) if (!ShadersSupported()) exit(1); - version = glGetString(GL_VERSION); - if (version[0] * 10 + version[2] 32) { + if (!GLEW_VERSION_3_2) { fprintf(stderr, Sorry, OpenGL 3.2 or later required.\n); exit(1); } VertShader = CompileShaderText(GL_VERTEX_SHADER, vertShaderText); FragShader = CompileShaderText(GL_FRAGMENT_SHADER, fragShaderText); - GeomShader = CompileShaderText(GL_GEOMETRY_SHADER_ARB, geomShaderText); + GeomShader = CompileShaderText(GL_GEOMETRY_SHADER, geomShaderText); Program = LinkShaders3(VertShader, GeomShader, FragShader); assert(Program); @@ -326,18 +326,8 @@ Init(void) glBindAttribLocation(Program, 0, Vertex); glBindFragDataLocation(Program, 0, FragColor); - /* -* The geometry shader will receive and emit triangles. 
-*/ - glProgramParameteriARB(Program, GL_GEOMETRY_INPUT_TYPE_ARB, - GL_TRIANGLES); - glProgramParameteriARB(Program, GL_GEOMETRY_OUTPUT_TYPE_ARB, - GL_TRIANGLE_STRIP); - glProgramParameteriARB(Program,GL_GEOMETRY_VERTICES_OUT_ARB, 3); - CheckError(__LINE__); - /* relink */ - glLinkProgramARB(Program); + glLinkProgram(Program); assert(glIsProgram(Program)); assert(glIsShader(FragShader)); -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
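As a side note on the removed version test: assuming the lost comparison operator was "<", the expression version[0] * 10 + version[2] works on character codes rather than digit values, so in ASCII it can never fall below 32. The sketch below contrasts that arithmetic with a numeric parse. Both helper names are hypothetical and neither is part of the demo, which simply switches to GLEW_VERSION_3_2.

```c
#include <assert.h>
#include <stdio.h>

/* Same arithmetic as the removed line, applied to raw character codes.
 * For "2.1" in ASCII this is '2' * 10 + '1' = 50 * 10 + 49 = 549. */
int
version_from_chars(const char *version)
{
   assert(version != NULL);
   return version[0] * 10 + version[2];
}

/* Parse a "major.minor..." string numerically and return
 * major * 10 + minor, or -1 if the string does not start with two
 * dot-separated integers. */
int
version_from_numbers(const char *version)
{
   int major, minor;

   if (version == NULL || sscanf(version, "%d.%d", &major, &minor) != 2)
      return -1;

   return major * 10 + minor;
}
```

With the numeric parse, a "2.1" context is reported as 21 and would fail a "< 32" test as intended, whereas the character-based value is always far above 32.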
[Mesa-dev] [PATCH 07/12] glsl/gsraytrace: Use GLSL 1.5 instead of 1.2.
This commit prepares the transition from extension to core geometry shaders. (Core geometry shaders require GLSL version 1.5 or later.) This includes using generic vertex attributes instead of built-ins. Signed-off-by: Fabian Bieler fabianbie...@fastmail.fm Reviewed-by: Brian Paul bri...@vmware.com Reviewed-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/gsraytrace.cpp | 58 +++-- 1 file changed, 32 insertions(+), 26 deletions(-) diff --git a/src/glsl/gsraytrace.cpp b/src/glsl/gsraytrace.cpp index ef67643..f9e708f 100644 --- a/src/glsl/gsraytrace.cpp +++ b/src/glsl/gsraytrace.cpp @@ -56,6 +56,7 @@ static GLuint pgQuery; static GLuint dst; static GLuint eyeRaysAsPoints; +int posAttribLoc; int orig_tAttribLoc; int dir_idxAttribLoc; int uv_stateAttribLoc; @@ -69,7 +70,7 @@ float rot[9] = {1,0,0, 0,1,0, 0,0,1}; static const char* vsSource = \n -#version 120 \n +#version 150 \n #line S__LINE__ \n #define SHADOWS \n #define RECURSION \n @@ -83,9 +84,10 @@ static const char* vsSource = uniform vec4 backgroundColor; \n uniform int emitNoMore; \n \n -attribute vec4 orig_t;\n -attribute vec4 dir_idx; \n -attribute vec4 uv_state; \n +in vec4 pos; \n +in vec4 orig_t; \n +in vec4 dir_idx; \n +in vec4 uv_state; \n // uv_state.z = state \n // uv_state.w = type (ray generation) \n \n @@ -98,9 +100,9 @@ static const char* vsSource = //0: not shadow ray, eye ray \n //1: shadow ray \n \n -varying vec4 orig_t1; \n -varying vec4 dir_idx1;\n -varying vec4 uv_state1; \n +out vec4 orig_t1; \n +out vec4 dir_idx1;\n +out vec4 uv_state1; \n \n \n //\n @@ -224,7 +226,7 @@ static const char* vsSource = if (state == 0) \n { \n // generate eye rays\n -ray = Ray(cameraPos, normalize(vec3(gl_Vertex.x, gl_Vertex.y, -1.0) * rot3));\n +ray = Ray(cameraPos, normalize(vec3(pos.x, pos.y, -1.0) * rot3)); \n isec.t = INF;\n isec.idx = -1;\n state = 1;\n @@ -240,7 +242,7 @@ static const char* vsSource = //else state == 3 \n \n //outVS();\n - gl_Position = gl_Vertex; \n + gl_Position = pos; \n 
orig_t1.xyz = ray.orig; \n orig_t1.w= isec.t;\n dir_idx1.xyz = ray.dir; \n @@ -251,7 +253,7 @@ static const char* vsSource = static const char* gsSource = -#version 120 \n +#version 150 \n #line S__LINE__ \n #extension GL_ARB_geometry_shader4: require \n \n @@ -310,13 +312,13 @@ static const char* gsSource = return isec; \n }\n \n -varying in vec4 orig_t1[1];
[Mesa-dev] [PATCH 09/12] glsl/gsraytrace: Switch to core profile.
v2: Remove redundant 'core' in GLSL version statement. Signed-off-by: Fabian Bieler fabianbie...@fastmail.fm Reviewed-by: Brian Paul bri...@vmware.com Reviewed-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/gsraytrace.cpp | 34 ++ 1 file changed, 26 insertions(+), 8 deletions(-) diff --git a/src/glsl/gsraytrace.cpp b/src/glsl/gsraytrace.cpp index 6df6543..44f2674 100644 --- a/src/glsl/gsraytrace.cpp +++ b/src/glsl/gsraytrace.cpp @@ -408,6 +408,7 @@ static const char* fsSource = uniform vec4 backgroundColor;\n uniform int emitNoMore; \n \n +out vec4 frag_color; \n \n //---\n \n @@ -493,7 +494,7 @@ static const char* fsSource = Isec eyeHit = isec;\n if (eyeHit.idx == -1)\n {\n - gl_FragColor = vec4(backgroundColor.rgb, 0.0);\n + frag_color = vec4(backgroundColor.rgb, 0.0);\n return;\n }\n vec3 eyeHitPosition = eyeRay.orig + eyeRay.dir * eyeHit.t;\n @@ -503,7 +504,7 @@ static const char* fsSource = vec3 L = normalize(lightVec); \n float NdotL = max(dot(N, L), 0.0); \n vec3 diffuse = idx2color(eyeHit.idx); // material color of the visible point\n -gl_FragColor = vec4(diffuse * NdotL, 1.0); \n +frag_color = vec4(diffuse * NdotL, 1.0); \n return;\n }\n #ifdef SHADOWS \n @@ -514,7 +515,7 @@ static const char* fsSource = { \n discard;\n } \n -gl_FragColor = vec4(-1,-1,-1, 0.0); \n +frag_color = vec4(-1,-1,-1, 0.0); \n return; \n } \n #endif\n @@ -534,7 +535,7 @@ static const char* fsSource = vec3 L = normalize(lightVec); \n float NdotL = max(dot(N, L), 0.0); \n vec3 diffuse = idx2color(reflHit.idx);\n -gl_FragColor = vec4(diffuse * NdotL * 0.25, 1.0); // material color of the visible point\n +frag_color = vec4(diffuse * NdotL * 0.25, 1.0); // material color of the visible point\n return; \n } \n #endif\n @@ -608,6 +609,8 @@ Draw(void) dir_idxAttribLoc = glGetAttribLocation(program, dir_idx); uv_stateAttribLoc = glGetAttribLocation(program, uv_state); + glBindFragDataLocation(program, 0, frag_color); + printf(%d\n, i); //gs.fpwQuery-beginQuery(); 
//gs.pgQuery-beginQuery(); @@ -755,10 +758,6 @@ Reshape(int width, int height) WinWidth = width; WinHeight = height; glViewport(0, 0, width, height); - glMatrixMode(GL_PROJECTION); - glLoadIdentity(); - glMatrixMode(GL_MODELVIEW); - glLoadIdentity(); { size_t nElem = WinWidth*WinHeight*nRayGens; @@ -911,6 +910,10 @@ Init(void) glGenBuffers(1, dst); glGenBuffers(1, eyeRaysAsPoints); + GLuint vao; + glGenVertexArrays(1, vao); + glBindVertexArray(vao); + printf(\nESC = exit demo\nleft mouse + drag = rotate camera\n\n); } @@ -920,9 +923,24 @@ main(int argc, char *argv[]) { glutInitWindowSize(WinWidth, WinHeight); glutInit(argc, argv); + +#ifdef HAVE_FREEGLUT + glutInitContextVersion(3, 2); + glutInitContextProfile(GLUT_CORE_PROFILE); glutInitDisplayMode(GLUT_RGB | GLUT_DOUBLE | GLUT_DEPTH); +#elif defined __APPLE__ + glutInitDisplayMode(GLUT_3_2_CORE_PROFILE | GLUT_RGB | GLUT_DOUBLE | GLUT_DEPTH); +#else + glutInitDisplayMode(GLUT_RGB | GLUT_DOUBLE | GLUT_DEPTH); +#endif Win = glutCreateWindow(argv[0]); + + // glewInit requires glewExperimentel set to true for core profiles. + // Depending on the glew version it also generates GL_INVALID_ENUM. + glewExperimental = GL_TRUE; glewInit(); + glGetError(); + glutReshapeFunc(Reshape);
[Mesa-dev] [PATCH 01/12] configure.ac: Check for freeglut.
To get an OpenGL core profile context, freeglut 2.6 or later is required.
Note that, in spite of its name, HAVE_FREEGLUT is only defined if freeglut 2.6
(released in 2009) or later is found.

Signed-off-by: Fabian Bieler fabianbie...@fastmail.fm
Reviewed-by: Brian Paul bri...@vmware.com
Reviewed-by: Ian Romanick ian.d.roman...@intel.com
---
 configure.ac | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/configure.ac b/configure.ac
index 0c38f4d..cd523c1 100644
--- a/configure.ac
+++ b/configure.ac
@@ -83,6 +83,12 @@ AC_CHECK_LIB([glut], [], [glut_enabled=no])
 
+dnl Check for FreeGLUT 2.6 or later
+AC_EGREP_HEADER([glutInitContextProfile],
+                [GL/freeglut.h],
+                [AC_DEFINE(HAVE_FREEGLUT)],
+                [])
+
 dnl Check for GLEW
 PKG_CHECK_MODULES(GLEW, [glew >= 1.5.4])
 DEMO_CFLAGS="$DEMO_CFLAGS $GLEW_CFLAGS"
--
1.8.3.2

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] util: Add util_bswap64()
---
 src/gallium/auxiliary/util/u_math.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/gallium/auxiliary/util/u_math.h b/src/gallium/auxiliary/util/u_math.h
index b5e0663..49f8bda 100644
--- a/src/gallium/auxiliary/util/u_math.h
+++ b/src/gallium/auxiliary/util/u_math.h
@@ -741,6 +741,16 @@ util_bswap32(uint32_t n)
 #endif
 }
 
+/**
+ * Reverse byte order of a 64bit word.
+ */
+static INLINE uint64_t
+util_bswap64(uint64_t n)
+{
+   return ((uint64_t)util_bswap32(n & 0xffffffff) << 32) |
+          util_bswap32((n >> 32));
+}
+
 /**
  * Reverse byte order of a 16 bit word.
--
1.8.1.4
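For anyone who wants to try the idea outside the tree, here is a minimal standalone sketch of a 64-bit swap built from two 32-bit swaps. The names bswap32_sketch and bswap64_sketch are hypothetical stand-ins, not the u_math.h helpers, and the 32-bit swap is open-coded here as an assumption (the real util_bswap32 may be implemented differently).

```c
#include <stdint.h>

/* Hypothetical stand-in for util_bswap32(): open-coded 32-bit byte swap. */
static inline uint32_t
bswap32_sketch(uint32_t n)
{
   return (n >> 24) |
          ((n >> 8) & 0x0000ff00u) |
          ((n << 8) & 0x00ff0000u) |
          (n << 24);
}

/* Swap the bytes inside each 32-bit half, then exchange the two halves:
 * the swapped low half becomes the new high half and vice versa.
 */
static inline uint64_t
bswap64_sketch(uint64_t n)
{
   return ((uint64_t)bswap32_sketch((uint32_t)(n & 0xffffffffu)) << 32) |
          bswap32_sketch((uint32_t)(n >> 32));
}
```

The cast to uint64_t before the shift matters: without it the left shift by 32 would be applied to a 32-bit value.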
[Mesa-dev] [PATCH 3/3] clover: Pass buffer offsets to the driver in set_global_binding() v2
The offsets will be stored in the handles parameter. This makes it possible to use sub-buffers. v2: - Style fixes - Add support for constant sub-buffers - Store handles in device byte order --- src/gallium/drivers/r600/evergreen_compute.c | 10 +- src/gallium/drivers/radeonsi/si_compute.c | 6 ++ src/gallium/include/pipe/p_context.h | 13 - src/gallium/state_trackers/clover/core/kernel.cpp | 16 +--- 4 files changed, 36 insertions(+), 9 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c index 70efe5c..efd7143 100644 --- a/src/gallium/drivers/r600/evergreen_compute.c +++ b/src/gallium/drivers/r600/evergreen_compute.c @@ -662,10 +662,18 @@ static void evergreen_set_global_binding( for (int i = 0; i n; i++) { + uint32_t buffer_offset; + uint32_t handle; assert(resources[i]-target == PIPE_BUFFER); assert(resources[i]-bind PIPE_BIND_GLOBAL); - *(handles[i]) = buffers[i]-chunk-start_in_dw * 4; + buffer_offset = util_le32_to_cpu(*(handles[i])); + handle = buffer_offset + buffers[i]-chunk-start_in_dw * 4; + if (R600_BIG_ENDIAN) { + handle = util_bswap32(handle); + } + + *(handles[i]) = handle; } evergreen_set_rat(ctx-cs_shader_state.shader, 0, pool-bo, 0, pool-size_in_dw * 4); diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c index a7f49e7..43d521b 100644 --- a/src/gallium/drivers/radeonsi/si_compute.c +++ b/src/gallium/drivers/radeonsi/si_compute.c @@ -107,8 +107,14 @@ static void si_set_global_binding( for (i = first; i first + n; i++) { uint64_t va; + uint32_t offset; program-global_buffers[i] = resources[i]; va = r600_resource_va(ctx-screen, resources[i]); + offset = util_le32_to_cpu(*handles[i]); + va += offset; + if (SI_BIG_ENDIAN) { + va = util_bswap64(va); + } memcpy(handles[i], va, sizeof(va)); } } diff --git a/src/gallium/include/pipe/p_context.h b/src/gallium/include/pipe/p_context.h index 8ef6e27..209ec9e 100644 --- 
a/src/gallium/include/pipe/p_context.h +++ b/src/gallium/include/pipe/p_context.h @@ -460,11 +460,14 @@ struct pipe_context { * unless it's NULL, in which case no new * resources will be bound. * \param handlesarray of pointers to the memory locations that -* will be filled with the respective base -* addresses each buffer will be mapped to. It -* should contain at least \a count elements, -* unless \a resources is NULL in which case \a -* handles should be NULL as well. +* will be updated with the address each buffer +* will be mapped to. The base memory address of +* each of the buffers will be added to the value +* pointed to by its corresponding handle to form +* the final address argument. It should contain +* at least \a count elements, unless \a +* resources is NULL in which case \a handles +* should be NULL as well. * * Note that the driver isn't required to make any guarantees about * the contents of the \a handles array being valid anytime except diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp b/src/gallium/state_trackers/clover/core/kernel.cpp index 6d894cd..b4d555c 100644 --- a/src/gallium/state_trackers/clover/core/kernel.cpp +++ b/src/gallium/state_trackers/clover/core/kernel.cpp @@ -337,8 +337,17 @@ kernel::global_argument::bind(exec_context ctx, align(ctx.input, marg.target_align); if (buf) { - ctx.g_handles.push_back(allocate(ctx.input, marg.target_size)); - ctx.g_buffers.push_back(buf-resource(*ctx.q).pipe); + const resource r = buf-resource(*ctx.q); + ctx.g_handles.push_back(ctx.input.size()); + ctx.g_buffers.push_back(r.pipe); + + // How to handle multi-demensional offsets? + // We don't need to. Buffer offsets are always + // one-dimensional. + auto v = bytes(r.offset[0]); + extend(v, marg.ext_type, marg.target_size); + byteswap(v, ctx.q-dev.endianness()); + insert(ctx.input, v); } else { // Null pointer. 
allocate(ctx.input, marg.target_size); @@ -395,7 +404,8 @@ kernel::constant_argument::bind(exec_context ctx, align(ctx.input, marg.target_align); if (buf) { - auto v = bytes(ctx.resources.size() 24); + const resource r =
[Mesa-dev] [PATCH 2/3] radeonsi: Use SI_BIG_ENDIAN now that it exists
---
 src/gallium/drivers/radeonsi/si_shader.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 54270cd..9fed751 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -2333,7 +2333,7 @@ int si_compile_llvm(struct si_context *sctx, struct si_pipe_shader *shader,
    }

    ptr = (uint32_t*)sctx->b.ws->buffer_map(shader->bo->cs_buf, sctx->b.rings.gfx.cs, PIPE_TRANSFER_WRITE);
-   if (0 /*SI_BIG_ENDIAN*/) {
+   if (SI_BIG_ENDIAN) {
       for (i = 0; i < binary.code_size / 4; ++i) {
          ptr[i] = util_bswap32(*(uint32_t*)(binary.code + i*4));
       }
--
1.8.1.4
Re: [Mesa-dev] [PATCH 14/15] mesa/sso: Implement _mesa_ActiveShaderProgram
On 02/19/2014 01:44 PM, Jordan Justen wrote:
> On Fri, Feb 7, 2014 at 10:00 PM, Ian Romanick i...@freedesktop.org wrote:
>> From: Gregory Hainaut gregory.hain...@gmail.com
>>
>> This was originally included in another patch, but it was split out by
>> Ian Romanick.
>>
>> Reviewed-by: Ian Romanick ian.d.roman...@intel.com
>> ---
>>  src/mesa/main/pipelineobj.c | 24 ++++++++++++++++++++++++
>>  1 file changed, 24 insertions(+)
>>
>> diff --git a/src/mesa/main/pipelineobj.c b/src/mesa/main/pipelineobj.c
>> index b47dc7a..6e490bd 100644
>> --- a/src/mesa/main/pipelineobj.c
>> +++ b/src/mesa/main/pipelineobj.c
>> @@ -227,6 +227,30 @@ _mesa_UseProgramStages(GLuint pipeline, GLbitfield stages, GLuint program)
>>  void GLAPIENTRY
>>  _mesa_ActiveShaderProgram(GLuint pipeline, GLuint program)
>>  {
>> +   GET_CURRENT_CONTEXT(ctx);
>> +   struct gl_shader_program *shProg = (program != 0)
>> +      ? _mesa_lookup_shader_program_err(ctx, program, "glActiveShaderProgram(program)")
>> +      : NULL;
>
> Seems like if/else would be more clear for this part.
>
> If _mesa_lookup_shader_program_err returns NULL, should we exit early?

Yes. Good catch.

We should also have a piglit test for this. I don't think there is one
already.

 - Bind a valid program to the pipeline.
 - Try to bind a non-existent, non-zero program.
 - Verify the error is generated.
 - Verify that old program is still bound to the pipeline.

> -Jordan
>
>> +   struct gl_pipeline_object *pipe = lookup_pipeline_object(ctx, pipeline);
>> +
>> +   if (!pipe) {
>> +      _mesa_error(ctx, GL_INVALID_OPERATION, "glActiveShaderProgram(pipeline)");
>> +      return;
>> +   }
>> +
>> +   /* Object is created by any Pipeline call but glGenProgramPipelines,
>> +    * glIsProgramPipeline and GetProgramPipelineInfoLog
>> +    */
>> +   pipe->EverBound = GL_TRUE;
>> +
>> +   if ((shProg != NULL) && !shProg->LinkStatus) {
>> +      _mesa_error(ctx, GL_INVALID_OPERATION,
>> +                  "glActiveShaderProgram(program %u not linked)", shProg->Name);
>> +      return;
>> +   }
>> +
>> +   _mesa_reference_shader_program(ctx, &pipe->ActiveProgram, shProg);
>>  }
>>
>>  /**
>> --
>> 1.8.1.4
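To make the early-exit point from the review concrete, here is a small self-contained model of the control flow, written with if/else as suggested. Everything in it is hypothetical: the sk_ types, the result codes and the table-based lookup are stand-ins for illustration and are not Mesa's structures or error reporting. It only shows that a failed lookup of a non-zero name returns before the pipeline's active program is touched.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the program and pipeline objects. */
struct sk_program { unsigned name; bool link_status; };
struct sk_pipeline { const struct sk_program *active_program; bool ever_bound; };

enum sk_result { SK_OK, SK_INVALID_PROGRAM, SK_INVALID_PIPELINE, SK_NOT_LINKED };

/* Hypothetical lookup: returns NULL when no program has the given name. */
static const struct sk_program *
sk_lookup_program(const struct sk_program *table, size_t count, unsigned name)
{
   for (size_t i = 0; i < count; i++) {
      if (table[i].name == name)
         return &table[i];
   }
   return NULL;
}

static enum sk_result
sk_active_shader_program(struct sk_pipeline *pipe,
                         const struct sk_program *table, size_t count,
                         unsigned program)
{
   const struct sk_program *prog = NULL;

   if (program != 0) {
      prog = sk_lookup_program(table, count, program);
      if (prog == NULL)
         return SK_INVALID_PROGRAM; /* exit early, binding left unchanged */
   }

   if (pipe == NULL)
      return SK_INVALID_PIPELINE;

   pipe->ever_bound = true;

   if (prog != NULL && !prog->link_status)
      return SK_NOT_LINKED;

   /* program == 0 reaches this point with prog == NULL and unbinds. */
   pipe->active_program = prog;
   return SK_OK;
}
```

The test steps listed above map directly onto this model: bind a valid program, try a non-existent non-zero name, check the error, and check that the earlier binding survived.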
Re: [Mesa-dev] [PATCH 1/3] util: Add util_bswap64()
On Wed, Feb 19, 2014 at 6:09 PM, Tom Stellard thomas.stell...@amd.com wrote:
> ---
>  src/gallium/auxiliary/util/u_math.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)

For the series:

Reviewed-by: Alex Deucher alexander.deuc...@amd.com

> diff --git a/src/gallium/auxiliary/util/u_math.h b/src/gallium/auxiliary/util/u_math.h
> index b5e0663..49f8bda 100644
> --- a/src/gallium/auxiliary/util/u_math.h
> +++ b/src/gallium/auxiliary/util/u_math.h
> @@ -741,6 +741,16 @@ util_bswap32(uint32_t n)
>  #endif
>  }
>
> +/**
> + * Reverse byte order of a 64bit word.
> + */
> +static INLINE uint64_t
> +util_bswap64(uint64_t n)
> +{
> +   return ((uint64_t)util_bswap32(n & 0xffffffff) << 32) |
> +          util_bswap32((n >> 32));
> +}
> +
>  /**
>   * Reverse byte order of a 16 bit word.
> --
> 1.8.1.4
Re: [Mesa-dev] [PATCH 1/3] util: Add util_bswap64()
On Wed, Feb 19, 2014 at 6:09 PM, Tom Stellard thomas.stell...@amd.com wrote:
> ---
>  src/gallium/auxiliary/util/u_math.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/src/gallium/auxiliary/util/u_math.h b/src/gallium/auxiliary/util/u_math.h
> index b5e0663..49f8bda 100644
> --- a/src/gallium/auxiliary/util/u_math.h
> +++ b/src/gallium/auxiliary/util/u_math.h
> @@ -741,6 +741,16 @@ util_bswap32(uint32_t n)
>  #endif
>  }
>
> +/**
> + * Reverse byte order of a 64bit word.
> + */
> +static INLINE uint64_t
> +util_bswap64(uint64_t n)
> +{
> +   return ((uint64_t)util_bswap32(n & 0xffffffff) << 32) |
> +          util_bswap32((n >> 32));

Perhaps use __builtin_bswap64 if it's available? Not sure when it
became available though.

> +}
> +
>  /**
>   * Reverse byte order of a 16 bit word.
> --
> 1.8.1.4
Re: [Mesa-dev] [PATCH 1/3] util: Add util_bswap64()
On Wed, Feb 19, 2014 at 3:32 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
> On Wed, Feb 19, 2014 at 6:09 PM, Tom Stellard thomas.stell...@amd.com wrote:
>> +/**
>> + * Reverse byte order of a 64bit word.
>> + */
>> +static INLINE uint64_t
>> +util_bswap64(uint64_t n)
>> +{
>> +   return ((uint64_t)util_bswap32(n & 0xffffffff) << 32) |
>> +          util_bswap32((n >> 32));
>
> Perhaps use __builtin_bswap64 if it's available? Not sure when it
> became available though.

When I fixed up bswap stuff in the X server a few years ago, I
discovered that gcc was really good at detecting open-coded bswap, and
less good at recognizing when it could constant fold __builtin_bswap.

Do some experiments, but make sure your experiments include not using
__builtin_bswap32 in util_bswap32.
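One way to set up the experiment suggested above is to keep an open-coded variant and a builtin-backed variant side by side and compare the generated code. The sketch below is only a starting point under stated assumptions: the function names and the SKETCH_HAVE_BUILTIN_BSWAP64 macro are hypothetical, and the compiler check is deliberately crude (it assumes any GCC-compatible compiler is recent enough to provide __builtin_bswap64; a real build would probe for the builtin at configure time).

```c
#include <stdint.h>

/* Assumption: treat every GCC-compatible compiler as having the builtin.
 * A real build system would test for it explicitly.
 */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_HAVE_BUILTIN_BSWAP64 1
#else
#define SKETCH_HAVE_BUILTIN_BSWAP64 0
#endif

/* Open-coded 64-bit byte swap that does not rely on any builtin,
 * so the compiler's pattern recognition can be checked in isolation.
 */
static inline uint64_t
bswap64_open_coded(uint64_t n)
{
   return ((n & 0x00000000000000ffull) << 56) |
          ((n & 0x000000000000ff00ull) << 40) |
          ((n & 0x0000000000ff0000ull) << 24) |
          ((n & 0x00000000ff000000ull) <<  8) |
          ((n & 0x000000ff00000000ull) >>  8) |
          ((n & 0x0000ff0000000000ull) >> 24) |
          ((n & 0x00ff000000000000ull) >> 40) |
          ((n & 0xff00000000000000ull) >> 56);
}

/* Uses the builtin when the macro above says it is available,
 * otherwise falls back to the open-coded version.
 */
static inline uint64_t
bswap64_maybe_builtin(uint64_t n)
{
#if SKETCH_HAVE_BUILTIN_BSWAP64
   return __builtin_bswap64(n);
#else
   return bswap64_open_coded(n);
#endif
}
```

Compiling both with optimization enabled and inspecting the assembly, once with a runtime argument and once with a constant argument, would show whether either form is recognized and whether the constant case is folded.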
Re: [Mesa-dev] [PATCH 1/3] util: Add util_bswap64()
Tom Stellard thomas.stell...@amd.com writes:
> ---
>  src/gallium/auxiliary/util/u_math.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/src/gallium/auxiliary/util/u_math.h b/src/gallium/auxiliary/util/u_math.h
> index b5e0663..49f8bda 100644
> --- a/src/gallium/auxiliary/util/u_math.h
> +++ b/src/gallium/auxiliary/util/u_math.h
> @@ -741,6 +741,16 @@ util_bswap32(uint32_t n)
>  #endif
>  }
>
> +/**
> + * Reverse byte order of a 64bit word.
> + */
> +static INLINE uint64_t
> +util_bswap64(uint64_t n)
> +{
> +   return ((uint64_t)util_bswap32(n & 0xffffffff) << 32) |
> +          util_bswap32((n >> 32));
> +}
> +
>  /**
>   * Reverse byte order of a 16 bit word.
> --
> 1.8.1.4

Reviewed-by: Francisco Jerez curroje...@riseup.net
Re: [Mesa-dev] [PATCH 3/3] clover: Pass buffer offsets to the driver in set_global_binding() v2
Tom Stellard thomas.stell...@amd.com writes: The offsets will be stored in the handles parameter. This makes it possible to use sub-buffers. v2: - Style fixes - Add support for constant sub-buffers - Store handles in device byte order --- src/gallium/drivers/r600/evergreen_compute.c | 10 +- src/gallium/drivers/radeonsi/si_compute.c | 6 ++ src/gallium/include/pipe/p_context.h | 13 - src/gallium/state_trackers/clover/core/kernel.cpp | 16 +--- 4 files changed, 36 insertions(+), 9 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c index 70efe5c..efd7143 100644 --- a/src/gallium/drivers/r600/evergreen_compute.c +++ b/src/gallium/drivers/r600/evergreen_compute.c @@ -662,10 +662,18 @@ static void evergreen_set_global_binding( for (int i = 0; i n; i++) { + uint32_t buffer_offset; + uint32_t handle; assert(resources[i]-target == PIPE_BUFFER); assert(resources[i]-bind PIPE_BIND_GLOBAL); - *(handles[i]) = buffers[i]-chunk-start_in_dw * 4; + buffer_offset = util_le32_to_cpu(*(handles[i])); + handle = buffer_offset + buffers[i]-chunk-start_in_dw * 4; + if (R600_BIG_ENDIAN) { + handle = util_bswap32(handle); + } + + *(handles[i]) = handle; I guess you could just do *(handles[i]) = util_cpu_to_le32(handle)? Oh, right, there isn't such a function -- though it would be trivial to implement. 
} evergreen_set_rat(ctx-cs_shader_state.shader, 0, pool-bo, 0, pool-size_in_dw * 4); diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c index a7f49e7..43d521b 100644 --- a/src/gallium/drivers/radeonsi/si_compute.c +++ b/src/gallium/drivers/radeonsi/si_compute.c @@ -107,8 +107,14 @@ static void si_set_global_binding( for (i = first; i first + n; i++) { uint64_t va; + uint32_t offset; program-global_buffers[i] = resources[i]; va = r600_resource_va(ctx-screen, resources[i]); + offset = util_le32_to_cpu(*handles[i]); + va += offset; + if (SI_BIG_ENDIAN) { + va = util_bswap64(va); + } memcpy(handles[i], va, sizeof(va)); } } diff --git a/src/gallium/include/pipe/p_context.h b/src/gallium/include/pipe/p_context.h index 8ef6e27..209ec9e 100644 --- a/src/gallium/include/pipe/p_context.h +++ b/src/gallium/include/pipe/p_context.h @@ -460,11 +460,14 @@ struct pipe_context { * unless it's NULL, in which case no new * resources will be bound. * \param handlesarray of pointers to the memory locations that -* will be filled with the respective base -* addresses each buffer will be mapped to. It -* should contain at least \a count elements, -* unless \a resources is NULL in which case \a -* handles should be NULL as well. +* will be updated with the address each buffer +* will be mapped to. The base memory address of +* each of the buffers will be added to the value +* pointed to by its corresponding handle to form +* the final address argument. It should contain +* at least \a count elements, unless \a +* resources is NULL in which case \a handles +* should be NULL as well. 
* * Note that the driver isn't required to make any guarantees about * the contents of the \a handles array being valid anytime except diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp b/src/gallium/state_trackers/clover/core/kernel.cpp index 6d894cd..b4d555c 100644 --- a/src/gallium/state_trackers/clover/core/kernel.cpp +++ b/src/gallium/state_trackers/clover/core/kernel.cpp @@ -337,8 +337,17 @@ kernel::global_argument::bind(exec_context ctx, align(ctx.input, marg.target_align); if (buf) { - ctx.g_handles.push_back(allocate(ctx.input, marg.target_size)); - ctx.g_buffers.push_back(buf-resource(*ctx.q).pipe); + const resource r = buf-resource(*ctx.q); + ctx.g_handles.push_back(ctx.input.size()); + ctx.g_buffers.push_back(r.pipe); + + // How to handle multi-demensional offsets? + // We don't need to. Buffer offsets are always + // one-dimensional. + auto v = bytes(r.offset[0]); + extend(v, marg.ext_type, marg.target_size); + byteswap(v, ctx.q-dev.endianness()); + insert(ctx.input, v); } else { // Null pointer.
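On the review remark earlier in this message that a util_cpu_to_le32() helper does not exist but would be trivial to implement: a minimal sketch of such a helper could look like the following. The names are hypothetical and this is not the gallium implementation; in particular it detects host byte order at run time, which is an assumption made to keep the example self-contained (a driver would more likely use a compile-time endianness macro).

```c
#include <stdint.h>
#include <string.h>

/* Returns nonzero when the host stores the least significant byte first. */
static inline int
sketch_host_is_little_endian(void)
{
   const uint32_t probe = 1;
   uint8_t first;

   memcpy(&first, &probe, 1);
   return first == 1;
}

/* Open-coded 32-bit byte swap. */
static inline uint32_t
sketch_bswap32(uint32_t n)
{
   return (n >> 24) |
          ((n >> 8) & 0x0000ff00u) |
          ((n << 8) & 0x00ff0000u) |
          (n << 24);
}

/* Convert a CPU-order value to little-endian storage order. */
static inline uint32_t
sketch_cpu_to_le32(uint32_t n)
{
   return sketch_host_is_little_endian() ? n : sketch_bswap32(n);
}

/* The conversion is its own inverse, so the reverse direction is the
 * same operation under a different name.
 */
static inline uint32_t
sketch_le32_to_cpu(uint32_t n)
{
   return sketch_cpu_to_le32(n);
}
```

With a helper of this shape, the explicit "if big endian then swap" block in the r600 hunk could collapse into a single store of the converted handle.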
[Mesa-dev] [PATCH] i965: Implement a CS stall workaround on Broadwell.
According to the latest documentation, any PIPE_CONTROL with the
Command Streamer Stall bit set must also have another bit set,
with five different options:

 - Render Target Cache Flush
 - Depth Cache Flush
 - Stall at Pixel Scoreboard
 - Post-Sync Operation
 - Depth Stall

I chose Stall at Pixel Scoreboard since we've used it effectively
in the past, but the choice is fairly arbitrary.

Implementing this in the PIPE_CONTROL emit helpers ensures that the
workaround will always take effect when it ought to.

Apparently, this workaround may be necessary on older hardware as well;
for now I've only added it to Broadwell as it's absolutely necessary
there. Subsequent patches could add it to older platforms, provided
someone tests it there.

v2: Only flag Stall at Pixel Scoreboard when none of the other bits
    are set (suggested by Ian Romanick).

Cc: Ian Romanick i...@freedesktop.org
Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

Sure, that seems reasonable, Ian. I've updated the patch to only add
stall at scoreboard when one of the other bits isn't already present.

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 4624268..bdb7b6b 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -432,6 +432,38 @@ intel_batchbuffer_data(struct brw_context *brw,
 }

 /**
+ * According to the latest documentation, any PIPE_CONTROL with the
+ * Command Streamer Stall bit set must also have another bit set,
+ * with five different options:
+ *
+ *  - Render Target Cache Flush
+ *  - Depth Cache Flush
+ *  - Stall at Pixel Scoreboard
+ *  - Post-Sync Operation
+ *  - Depth Stall
+ *
+ * I chose Stall at Pixel Scoreboard since we've used it effectively
+ * in the past, but the choice is fairly arbitrary.
+ */
+static void
+add_cs_stall_workaround_bits(uint32_t *flags)
+{
+   uint32_t wa_bits = PIPE_CONTROL_WRITE_FLUSH |
+                      PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+                      PIPE_CONTROL_WRITE_IMMEDIATE |
+                      PIPE_CONTROL_WRITE_DEPTH_COUNT |
+                      PIPE_CONTROL_WRITE_TIMESTAMP |
+                      PIPE_CONTROL_STALL_AT_SCOREBOARD |
+                      PIPE_CONTROL_DEPTH_STALL;
+
+   /* If we're doing a CS stall, and don't already have one of the
+    * workaround bits set, add Stall at Pixel Scoreboard.
+    */
+   if ((*flags & PIPE_CONTROL_CS_STALL) != 0 && (*flags & wa_bits) == 0)
+      *flags |= PIPE_CONTROL_STALL_AT_SCOREBOARD;
+}
+
+/**
  * Emit a PIPE_CONTROL with various flushing flags.
  *
  * The caller is responsible for deciding what flags are appropriate for the
@@ -441,6 +473,8 @@ void
 brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
 {
    if (brw->gen >= 8) {
+      add_cs_stall_workaround_bits(&flags);
+
       BEGIN_BATCH(6);
       OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2));
       OUT_BATCH(flags);
@@ -481,6 +515,8 @@ brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
                             uint32_t imm_lower, uint32_t imm_upper)
 {
    if (brw->gen >= 8) {
+      add_cs_stall_workaround_bits(&flags);
+
       BEGIN_BATCH(6);
       OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2));
       OUT_BATCH(flags);
--
1.8.4.2
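The v2 rule is easy to exercise in isolation. The following standalone sketch mirrors the logic of the helper in the patch, with two deliberate differences that are assumptions of this example: the SK_ flag values are made up for illustration (the real PIPE_CONTROL_* bit positions are defined by the driver and are not reproduced here), and the three post-sync write flags are collapsed into a single hypothetical SK_POST_SYNC_OP bit. It also returns the adjusted flags instead of updating them through a pointer.

```c
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define SK_RENDER_TARGET_FLUSH  (1u << 0)
#define SK_DEPTH_CACHE_FLUSH    (1u << 1)
#define SK_STALL_AT_SCOREBOARD  (1u << 2)
#define SK_POST_SYNC_OP         (1u << 3)
#define SK_DEPTH_STALL          (1u << 4)
#define SK_CS_STALL             (1u << 5)

/* If a CS stall is requested and none of the companion bits is set,
 * add Stall at Pixel Scoreboard; otherwise leave the flags untouched.
 */
static uint32_t
sketch_add_cs_stall_workaround(uint32_t flags)
{
   const uint32_t wa_bits = SK_RENDER_TARGET_FLUSH |
                            SK_DEPTH_CACHE_FLUSH |
                            SK_STALL_AT_SCOREBOARD |
                            SK_POST_SYNC_OP |
                            SK_DEPTH_STALL;

   if ((flags & SK_CS_STALL) != 0 && (flags & wa_bits) == 0)
      flags |= SK_STALL_AT_SCOREBOARD;

   return flags;
}
```

A CS stall on its own picks up the scoreboard stall, while a CS stall that already carries, say, a depth stall passes through unchanged, which is the behavior the v2 note describes.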
Re: [Mesa-dev] [PATCH 1/3] glcpp: Only warn for macro names containing __
On 19.02.2014 20:09, Ian Romanick wrote:
> I'm hoping that Tapani or Darius will verify that this patch actually
> fixes the problem. That's why people CC other people on patches. :)
>
> On 02/18/2014 10:19 AM, Ian Romanick wrote:
>> From: Ian Romanick ian.d.roman...@intel.com
>>
>> Section 3.3 (Preprocessor) of the GLSL 1.30 spec (and later) and the
>> GLSL ES spec (all versions) say:
>>
>>     All macro names containing two consecutive underscores ( __ )
>>     are reserved for future use as predefined macro names. All
>>     macro names prefixed with GL_ (GL followed by a single
>>     underscore) are also reserved.
>>
>> The intention is that names containing __ are reserved for internal use
>> by the implementation, and names prefixed with GL_ are reserved for use
>> by Khronos. Since every extension adds a name prefixed with GL_ (i.e.,
>> the name of the extension), that should be an error. Names simply
>> containing __ are dangerous to use, but should be allowed. In similar
>> cases, the C++ preprocessor specification says, "no diagnostic is
>> required."
>>
>> Per the Khronos bug mentioned below, a future version of the GLSL
>> specification will clarify this.
>>
>> Signed-off-by: Ian Romanick ian.d.roman...@intel.com
>> Cc: 9.2 10.0 10.1 mesa-sta...@lists.freedesktop.org
>> Cc: Tapani Pälli lem...@gmail.com
>> Cc: Kenneth Graunke kenn...@whitecape.org
>> Cc: Darius Spitznagel d.spitzna...@goodbytez.de
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
>> Bugzilla: Khronos #11702
>> ---
>>  src/glsl/glcpp/glcpp-parse.y                      | 22 +++++++++++++++++++---
>>  .../tests/086-reserved-macro-names.c.expected     |  4 ++--
>>  2 files changed, 21 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y
>> index 5bb2891..bdc598f 100644
>> --- a/src/glsl/glcpp/glcpp-parse.y
>> +++ b/src/glsl/glcpp/glcpp-parse.y
>> @@ -1770,11 +1770,27 @@ static void
>>  _check_for_reserved_macro_name (glcpp_parser_t *parser, YYLTYPE *loc,
>>                                  const char *identifier)
>>  {
>> -   /* According to the GLSL specification, macro names starting with __
>> -    * or GL_ are reserved for future use. So, don't allow them.
>> +   /* Section 3.3 (Preprocessor) of the GLSL 1.30 spec (and later) and
>> +    * the GLSL ES spec (all versions) say:
>> +    *
>> +    *     All macro names containing two consecutive underscores ( __ )
>> +    *     are reserved for future use as predefined macro names. All
>> +    *     macro names prefixed with GL_ (GL followed by a single
>> +    *     underscore) are also reserved.
>> +    *
>> +    * The intention is that names containing __ are reserved for internal
>> +    * use by the implementation, and names prefixed with GL_ are reserved
>> +    * for use by Khronos. Since every extension adds a name prefixed
>> +    * with GL_ (i.e., the name of the extension), that should be an
>> +    * error. Names simply containing __ are dangerous to use, but should
>> +    * be allowed.
>> +    *
>> +    * A future version of the GLSL specification will clarify this.
>>      */
>>     if (strstr(identifier, "__")) {
>> -      glcpp_error (loc, parser, "Macro names containing \"__\" are reserved.\n");
>> +      glcpp_warning(loc, parser,
>> +                    "Macro names containing \"__\" are reserved "
>> +                    "for use by the implementation.\n");
>>     }
>>     if (strncmp(identifier, "GL_", 3) == 0) {
>>        glcpp_error (loc, parser, "Macro names starting with \"GL_\" are reserved.\n");
>> diff --git a/src/glsl/glcpp/tests/086-reserved-macro-names.c.expected b/src/glsl/glcpp/tests/086-reserved-macro-names.c.expected
>> index d8aa9f0..5ca42a9 100644
>> --- a/src/glsl/glcpp/tests/086-reserved-macro-names.c.expected
>> +++ b/src/glsl/glcpp/tests/086-reserved-macro-names.c.expected
>> @@ -1,8 +1,8 @@
>> -0:1(10): preprocessor error: Macro names containing "__" are reserved.
>> +0:1(10): preprocessor warning: Macro names containing "__" are reserved for use by the implementation.
>>  0:2(9): preprocessor error: Macro names starting with "GL_" are reserved.
>> -0:3(9): preprocessor error: Macro names containing "__" are reserved.
>> +0:3(9): preprocessor warning: Macro names containing "__" are reserved for use by the implementation.

All three patches worked as expected with mesa-10.0.3, which I use right
now. Thank you.

Kind regards
Darius
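For readers who want to check the new rule against their own macro names, the classification can be sketched as a small standalone function. This is not the glcpp code: the enum and function name are hypothetical, and where the patch emits the warning and the error independently, this sketch simply returns the most severe outcome.

```c
#include <string.h>

enum macro_name_diag {
   MACRO_NAME_OK,
   MACRO_NAME_WARNING, /* contains "__": reserved for the implementation */
   MACRO_NAME_ERROR    /* starts with "GL_": reserved for Khronos */
};

static enum macro_name_diag
classify_macro_name(const char *identifier)
{
   /* The GL_ prefix is checked first so that the error wins when a
    * name both starts with GL_ and contains two underscores.
    */
   if (strncmp(identifier, "GL_", 3) == 0)
      return MACRO_NAME_ERROR;

   if (strstr(identifier, "__") != NULL)
      return MACRO_NAME_WARNING;

   return MACRO_NAME_OK;
}
```

Note that only a leading GL_ is rejected; a name such as MY_GL_FOO is accepted, while FOO__BAR draws a warning rather than an error.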
[Mesa-dev] [PATCH][RFC] dri3: Add support for the GLX_EXT_buffer_age extension
Hi, The attached patch adds support for the GLX_EXT_buffer_age extension, which is mostly used by compositors for efficient sub screen updates. The extension should not be reported as supported when running DRI2 but it seems to show up when I try to disable it with LIBGL_DRI3_DISABLE ... not sure why suggestions welcome. P.S: Please CC me when replying as I am not subscribed to the list. From: Adel Gadllah adel.gadl...@gmail.com Date: Sun, 16 Feb 2014 13:40:42 +0100 Subject: [PATCH] dri3: Add GLX_EXT_buffer_age support --- include/GL/glx.h | 5 + include/GL/glxext.h | 5 + src/glx/dri2_glx.c| 1 + src/glx/dri3_glx.c| 17 + src/glx/dri3_priv.h | 2 ++ src/glx/glx_pbuffer.c | 7 +++ src/glx/glxclient.h | 1 + src/glx/glxextensions.c | 1 + src/glx/glxextensions.h | 1 + src/mesa/drivers/x11/glxapi.c | 3 +++ 10 files changed, 43 insertions(+) diff --git a/include/GL/glx.h b/include/GL/glx.h index 234abc0..b8b4d75 100644 --- a/include/GL/glx.h +++ b/include/GL/glx.h @@ -161,6 +161,11 @@ extern C { #define GLX_SAMPLES 0x186a1 /*11*/ +/* + * GLX_EXT_buffer_age + */ +#define GLX_BACK_BUFFER_AGE_EXT 0x20F4 + typedef struct __GLXcontextRec *GLXContext; typedef XID GLXPixmap; diff --git a/include/GL/glxext.h b/include/GL/glxext.h index 8c642f3..36e92dc 100644 --- a/include/GL/glxext.h +++ b/include/GL/glxext.h @@ -383,6 +383,11 @@ void glXReleaseTexImageEXT (Display *dpy, GLXDrawable drawable, int buffer); #define GLX_FLIP_COMPLETE_INTEL 0x8182 #endif /* GLX_INTEL_swap_event */ +#ifndef GLX_EXT_buffer_age +#define GLX_EXT_buffer_age 1 +#define GLX_BACK_BUFFER_AGE_EXT 0x20F4 +#endif /* GLX_EXT_buffer_age */ + #ifndef GLX_MESA_agp_offset #define GLX_MESA_agp_offset 1 typedef unsigned int ( *PFNGLXGETAGPOFFSETMESAPROC) (const void *pointer); diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c index 67fe9c1..007f449 100644 --- a/src/glx/dri2_glx.c +++ b/src/glx/dri2_glx.c @@ -1288,6 +1288,7 @@ dri2CreateScreen(int screen, struct glx_display * priv) psp-waitForSBC = NULL; 
psp-setSwapInterval = NULL; psp-getSwapInterval = NULL; + psp-queryBufferAge = NULL; if (pdp-driMinor = 2) { psp-getDrawableMSC = dri2DrawableGetMSC; diff --git a/src/glx/dri3_glx.c b/src/glx/dri3_glx.c index 70ec057..07120e1 100644 --- a/src/glx/dri3_glx.c +++ b/src/glx/dri3_glx.c @@ -1345,6 +1345,8 @@ dri3_swap_buffers(__GLXDRIdrawable *pdraw, int64_t target_msc, int64_t divisor, target_msc = priv-msc + priv-swap_interval * (priv-send_sbc - priv-recv_sbc); priv-buffers[buf_id]-busy = 1; + priv-buffers[buf_id]-last_swap = priv-swap_count; + xcb_present_pixmap(c, priv-base.xDrawable, priv-buffers[buf_id]-pixmap, @@ -1379,11 +1381,23 @@ dri3_swap_buffers(__GLXDRIdrawable *pdraw, int64_t target_msc, int64_t divisor, xcb_flush(c); if (priv-stamp) ++(*priv-stamp); + + priv-swap_count++; } return ret; } +static int +dri3_query_buffer_age(__GLXDRIdrawable *pdraw) +{ + struct dri3_drawable *priv = (struct dri3_drawable *) pdraw; + int buf_id = DRI3_BACK_ID(priv-cur_back); + if (!priv-buffers[buf_id]-last_swap) +return 0; + return priv-swap_count - priv-buffers[buf_id]-last_swap; +} + /** dri3_open * * Wrapper around xcb_dri3_open @@ -1742,6 +1756,9 @@ dri3_create_screen(int screen, struct glx_display * priv) psp-copySubBuffer = dri3_copy_sub_buffer; __glXEnableDirectExtension(psc-base, GLX_MESA_copy_sub_buffer); + psp-queryBufferAge = dri3_query_buffer_age; + __glXEnableDirectExtension(psc-base, GLX_EXT_buffer_age); + free(driverName); free(deviceName); diff --git a/src/glx/dri3_priv.h b/src/glx/dri3_priv.h index 1d124f8..d00440a 100644 --- a/src/glx/dri3_priv.h +++ b/src/glx/dri3_priv.h @@ -97,6 +97,7 @@ struct dri3_buffer { uint32_t cpp; uint32_t flags; uint32_t width, height; + uint32_t last_swap; enum dri3_buffer_typebuffer_type; }; @@ -184,6 +185,7 @@ struct dri3_drawable { struct dri3_buffer *buffers[DRI3_NUM_BUFFERS]; int cur_back; int num_back; + uint32_t swap_count; uint32_t *stamp; diff --git a/src/glx/glx_pbuffer.c b/src/glx/glx_pbuffer.c index 411d6e5..a87a0a4 
100644 --- a/src/glx/glx_pbuffer.c +++ b/src/glx/glx_pbuffer.c @@ -365,6 +365,13 @@ GetDrawableAttribute(Display * dpy, GLXDrawable drawable, #if defined(GLX_DIRECT_RENDERING) !defined(GLX_USE_APPLEGL) { __GLXDRIdrawable *pdraw = GetGLXDRIDrawable(dpy, drawable); +struct glx_screen *psc = pdraw-psc; + +if (attribute == GLX_BACK_BUFFER_AGE_EXT pdraw != NULL +psc-driScreen-queryBufferAge != NULL) { + +*value = psc-driScreen-queryBufferAge (pdraw); +} if
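The bookkeeping in the dri3 hunks above (record the swap counter in the buffer at swap time, then report the difference on query) can be modelled in a few lines. This is a sketch under assumptions, not the GLX code: the sketch_ names are hypothetical, the drawable has exactly two buffers, and the back buffer is rotated round-robin, whereas the real driver selects its next back buffer by its own rules.

```c
#include <stdint.h>

#define SKETCH_NUM_BUFFERS 2

struct sketch_drawable {
   uint32_t swap_count;                    /* number of swaps so far */
   uint32_t last_swap[SKETCH_NUM_BUFFERS]; /* counter value at each buffer's last swap */
   int cur_back;                           /* index of the current back buffer */
};

/* Present the current back buffer: stamp it with the counter,
 * bump the counter, then move on to the next buffer.
 */
static void
sketch_swap(struct sketch_drawable *d)
{
   d->last_swap[d->cur_back] = d->swap_count;
   d->swap_count++;
   d->cur_back = (d->cur_back + 1) % SKETCH_NUM_BUFFERS;
}

/* Age of the current back buffer, with 0 meaning "never swapped". */
static int
sketch_buffer_age(const struct sketch_drawable *d)
{
   if (d->last_swap[d->cur_back] == 0)
      return 0;
   return (int)(d->swap_count - d->last_swap[d->cur_back]);
}
```

One thing this model makes visible: because the counter starts at zero, the buffer presented by the very first swap is stamped with 0 and is therefore reported as age 0 the next time it becomes the back buffer. Since an age of 0 tells the application to treat the contents as undefined, that looks conservative rather than incorrect, but it may be worth confirming that it is intended.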
Re: [Mesa-dev] [PATCH-RFC] i965: do not advertise MESA_FORMAT_Z_UNORM16 support
On 02/19/2014 12:08 PM, Kenneth Graunke wrote:
> On 02/18/2014 09:48 PM, Chia-I Wu wrote:
>> Since 73bc6061f5c3b6a3bb7a8114bb2e1ab77d23cfdb, Z16 support is not
>> advertised for OpenGL ES contexts due to the terrible performance. It is
>> still enabled for desktop GL because it was believed GL 3.0+ requires Z16.
>>
>> It turns out only GL 3.0 requires Z16, and that is corrected in later GL
>> versions. In light of that, and per Ian's suggestion, stop advertising Z16
>> support by default, and add a drirc option, gl30_sized_format_rules, so that
>> users can override.
>
> I actually don't think that GL 3.0 requires Z16, either.
>
> In glspec30.20080923.pdf, page 180, it says:
>
>     [...] memory allocation per texture component is assigned by the GL
>     to match the allocations listed in tables 3.16-3.18 as closely as
>     possible. [...]
>
>     Required Texture Formats
>     [...]
>     In addition, implementations are required to support the following
>     sized internal formats. Requesting one of these internal formats for
>     any texture type will allocate exactly the internal component sizes
>     and types shown for that format in tables 3.16-3.17:
>
> Notably, however, GL_DEPTH_COMPONENT16 does /not/ appear in table 3.16
> or table 3.17. It appears in table 3.18, where the exact rule doesn't
> apply, and thus we fall back to the "closely as possible" rule.
>
> The confusing part is that the ordering of the tables in the PDF is:
>
>     Table 3.16 (pages 182-184)
>     Table 3.18 (bottom of page 184 to top of 185)
>     Table 3.17 (page 185)
>
> I'm guessing that people saw table 3.16, then saw the one after with
> DEPTH_COMPONENT* formats, and assumed it was 3.17. But it's not.

Yay latex! Thank you for putting things in random order because it fit
better. :(

> I think we should just drop Z16 support entirely, and I think we should
> remove the requirement from the Piglit test.

If the test is wrong, and it sounds like it is, then I'm definitely in
favor of changing it.
The reason to have Z16 is low-bandwidth GPUs in resource constrained environments. If an app specifically asks for Z16, then there's a non-zero (though possibly infinitesimal) probability they're doing it for a reason. For at least some platforms, isn't there just a work-around to implement to fix the performance issue? Doesn't the performance issue only affect some platforms to begin with? Maybe just change the check to ctx-TextureFormatSupported[MESA_FORMAT_Z_UNORM16] = ! platform has z16 performance issues; This regresses required-sized-texture-formats on GL 3.0. Signed-off-by: Chia-I Wu o...@lunarg.com Cc: Ian Romanick ian.d.roman...@intel.com --- src/mesa/drivers/dri/i965/brw_context.c | 3 +++ src/mesa/drivers/dri/i965/brw_context.h | 1 + src/mesa/drivers/dri/i965/brw_surface_formats.c | 7 --- src/mesa/drivers/dri/i965/intel_screen.c| 4 4 files changed, 12 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index ffbdb94..8ecf80b 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -553,6 +553,9 @@ brw_process_driconf_options(struct brw_context *brw) brw-disable_derivative_optimization = driQueryOptionb(brw-optionCache, disable_derivative_optimization); + brw-enable_z16 = + driQueryOptionb(brw-optionCache, gl30_sized_format_rules); + brw-precompile = driQueryOptionb(brw-optionCache, shader_precompile); ctx-Const.ForceGLSLExtensionsWarn = diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 98e90e2..fd10884 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -1093,6 +1093,7 @@ struct brw_context bool disable_throttling; bool precompile; bool disable_derivative_optimization; + bool enable_z16; driOptionCache optionCache; /** @} */ diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c b/src/mesa/drivers/dri/i965/brw_surface_formats.c index 
6a7e00a..1d5f044 100644 --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c @@ -623,10 +623,11 @@ brw_init_surface_formats(struct brw_context *brw) * increased depth stalls from a cacheline-based heuristic for detecting * depth stalls. * -* However, desktop GL 3.0+ require that you get exactly 16 bits when -* asking for DEPTH_COMPONENT16, so we have to respect that. +* However, desktop GL 3.0, and no other version, requires that you get + * exactly 16 bits when asking for DEPTH_COMPONENT16, so we have an drirc +* option to decide whether to respect that or not. */ - if (_mesa_is_desktop_gl(ctx)) + if (brw-enable_z16) ctx-TextureFormatSupported[MESA_FORMAT_Z_UNORM16] = true; /* On hardware that lacks support for ETC1, we map ETC1 to RGBX diff --git a/src/mesa/drivers/dri/i965/intel_screen.c
[Mesa-dev] [PATCH] st/omx/enc: add multi scaling buffers for performance improvement
From: Leo Liu leoxs...@gmail.com Signed-off-by: Leo Liu leoxs...@gmail.com --- src/gallium/state_trackers/omx/vid_enc.c | 39 src/gallium/state_trackers/omx/vid_enc.h | 7 -- 2 files changed, 29 insertions(+), 17 deletions(-) diff --git a/src/gallium/state_trackers/omx/vid_enc.c b/src/gallium/state_trackers/omx/vid_enc.c index 6e65274..fcdb305 100644 --- a/src/gallium/state_trackers/omx/vid_enc.c +++ b/src/gallium/state_trackers/omx/vid_enc.c @@ -273,8 +273,9 @@ static OMX_ERRORTYPE vid_enc_Destructor(OMX_COMPONENTTYPE *comp) vl_compositor_cleanup_state(priv-cstate); vl_compositor_cleanup(priv-compositor); - if (priv-scale_buffer) - priv-scale_buffer-destroy(priv-scale_buffer); + for (i = 0; i OMX_VID_ENC_NUM_SCALING_BUFFERS; ++i) + if (priv-scale_buffer[i]) + priv-scale_buffer[i]-destroy(priv-scale_buffer[i]); if (priv-s_pipe) priv-s_pipe-destroy(priv-s_pipe); @@ -447,7 +448,8 @@ static OMX_ERRORTYPE vid_enc_SetConfig(OMX_HANDLETYPE handle, OMX_INDEXTYPE idx, OMX_COMPONENTTYPE *comp = handle; vid_enc_PrivateType *priv = comp-pComponentPrivate; OMX_ERRORTYPE r; - + int i; + if (!config) return OMX_ErrorBadParameter; @@ -473,11 +475,12 @@ static OMX_ERRORTYPE vid_enc_SetConfig(OMX_HANDLETYPE handle, OMX_INDEXTYPE idx, if (scale-xWidth 176 || scale-xHeight 144) return OMX_ErrorBadParameter; - if (priv-scale_buffer) { - priv-scale_buffer-destroy(priv-scale_buffer); - priv-scale_buffer = NULL; + for (i = 0; i OMX_VID_ENC_NUM_SCALING_BUFFERS; ++i) { + if (priv-scale_buffer[i]) { +priv-scale_buffer[i]-destroy(priv-scale_buffer[i]); +priv-scale_buffer[i] = NULL; + } } - priv-scale = *scale; if (priv-scale.xWidth != 0x priv-scale.xHeight != 0x) { struct pipe_video_buffer templat = {}; @@ -487,9 +490,11 @@ static OMX_ERRORTYPE vid_enc_SetConfig(OMX_HANDLETYPE handle, OMX_INDEXTYPE idx, templat.width = priv-scale.xWidth; templat.height = priv-scale.xHeight; templat.interlaced = false; - priv-scale_buffer = priv-s_pipe-create_video_buffer(priv-s_pipe, templat); - if 
(!priv-scale_buffer) -return OMX_ErrorInsufficientResources; + for (i = 0; i OMX_VID_ENC_NUM_SCALING_BUFFERS; ++i) { +priv-scale_buffer[i] = priv-s_pipe-create_video_buffer(priv-s_pipe, templat); +if (!priv-scale_buffer[i]) + return OMX_ErrorInsufficientResources; + } } break; @@ -545,8 +550,10 @@ static OMX_ERRORTYPE vid_enc_MessageHandler(OMX_COMPONENTTYPE* comp, internalReq templat.profile = PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE; templat.entrypoint = PIPE_VIDEO_ENTRYPOINT_ENCODE; templat.chroma_format = PIPE_VIDEO_CHROMA_FORMAT_420; - templat.width = priv-scale_buffer ? priv-scale.xWidth : port-sPortParam.format.video.nFrameWidth; - templat.height = priv-scale_buffer ? priv-scale.xHeight : port-sPortParam.format.video.nFrameHeight; + templat.width = priv-scale_buffer[priv-current_scale_buffer] ? +priv-scale.xWidth : port-sPortParam.format.video.nFrameWidth; + templat.height = priv-scale_buffer[priv-current_scale_buffer] ? +priv-scale.xHeight : port-sPortParam.format.video.nFrameHeight; templat.max_references = 1; priv-codec = priv-s_pipe-create_video_codec(priv-s_pipe, templat); @@ -736,7 +743,7 @@ static OMX_ERRORTYPE vid_enc_EncodeFrame(omx_base_PortType *port, OMX_BUFFERHEAD /* -- scale input image - */ - if (priv-scale_buffer) { + if (priv-scale_buffer[priv-current_scale_buffer]) { struct vl_compositor *compositor = priv-compositor; struct vl_compositor_state *s = priv-cstate; struct pipe_sampler_view **views; @@ -744,7 +751,8 @@ static OMX_ERRORTYPE vid_enc_EncodeFrame(omx_base_PortType *port, OMX_BUFFERHEAD unsigned i; views = vbuf-get_sampler_view_planes(vbuf); - dst_surface = priv-scale_buffer-get_surfaces(priv-scale_buffer); + dst_surface = priv-scale_buffer[priv-current_scale_buffer]-get_surfaces + (priv-scale_buffer[priv-current_scale_buffer]); vl_compositor_clear_layers(s); for (i = 0; i VL_MAX_SURFACES; ++i) { @@ -768,7 +776,8 @@ static OMX_ERRORTYPE vid_enc_EncodeFrame(omx_base_PortType *port, OMX_BUFFERHEAD } size = priv-scale.xWidth * 
priv-scale.xHeight * 2; - vbuf = priv-scale_buffer; + vbuf = priv-scale_buffer[priv-current_scale_buffer++]; + priv-current_scale_buffer %= OMX_VID_ENC_NUM_SCALING_BUFFERS; } priv-s_pipe-flush(priv-s_pipe, NULL, 0); diff --git a/src/gallium/state_trackers/omx/vid_enc.h
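The buffer rotation in this patch can be sketched in isolation. This is only an illustration of the round-robin indexing: the buffer count, the struct, and the ex_ names are hypothetical, since the real OMX_VID_ENC_NUM_SCALING_BUFFERS value is defined in vid_enc.h, which is cut off above.

```c
#include <assert.h>

/* Hypothetical count; the patch defines the real value in vid_enc.h. */
#define EX_NUM_SCALING_BUFFERS 4

struct ex_encoder {
   int scale_buffer[EX_NUM_SCALING_BUFFERS]; /* stand-ins for video buffers */
   unsigned current_scale_buffer;            /* next slot to hand out */
};

/* Return the current buffer and advance the index, wrapping at the end.
 * This follows the "use slot, post-increment, modulo" shape of the patch. */
int
ex_next_scale_buffer(struct ex_encoder *enc)
{
   int buf = enc->scale_buffer[enc->current_scale_buffer++];
   enc->current_scale_buffer %= EX_NUM_SCALING_BUFFERS;
   return buf;
}

void
ex_scale_buffer_demo(void)
{
   struct ex_encoder enc = { { 10, 11, 12, 13 }, 0 };

   assert(ex_next_scale_buffer(&enc) == 10);
   assert(ex_next_scale_buffer(&enc) == 11);
   assert(ex_next_scale_buffer(&enc) == 12);
   assert(ex_next_scale_buffer(&enc) == 13);
   assert(ex_next_scale_buffer(&enc) == 10); /* wrapped around */
}
```

Rotating through several buffers presumably lets a new frame be scaled while an earlier one is still being encoded, which would be where the performance gain in the subject line comes from.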
Re: [Mesa-dev] [PATCH][RFC] dri3: Add support for the GLX_EXT_buffer_age extension
On Wed, Feb 19, 2014 at 5:49 PM, Adel Gadllah adel.gadl...@gmail.com wrote: Hi, The attached patch adds support for the GLX_EXT_buffer_age extension, which is mostly used by compositors for efficient sub screen updates. The extension should not be reported as supported when running DRI2 but it seems to show up when I try to disable it with LIBGL_DRI3_DISABLE ... not sure why suggestions welcome. P.S: Please CC me when replying as I am not subscribed to the list. From: Adel Gadllah adel.gadl...@gmail.com Date: Sun, 16 Feb 2014 13:40:42 +0100 Subject: [PATCH] dri3: Add GLX_EXT_buffer_age support --- include/GL/glx.h | 5 + include/GL/glxext.h | 5 + src/glx/dri2_glx.c| 1 + src/glx/dri3_glx.c| 17 + src/glx/dri3_priv.h | 2 ++ src/glx/glx_pbuffer.c | 7 +++ src/glx/glxclient.h | 1 + src/glx/glxextensions.c | 1 + src/glx/glxextensions.h | 1 + src/mesa/drivers/x11/glxapi.c | 3 +++ 10 files changed, 43 insertions(+) diff --git a/include/GL/glx.h b/include/GL/glx.h index 234abc0..b8b4d75 100644 --- a/include/GL/glx.h +++ b/include/GL/glx.h @@ -161,6 +161,11 @@ extern C { #define GLX_SAMPLES 0x186a1 /*11*/ +/* + * GLX_EXT_buffer_age + */ +#define GLX_BACK_BUFFER_AGE_EXT 0x20F4 + typedef struct __GLXcontextRec *GLXContext; typedef XID GLXPixmap; diff --git a/include/GL/glxext.h b/include/GL/glxext.h index 8c642f3..36e92dc 100644 --- a/include/GL/glxext.h +++ b/include/GL/glxext.h @@ -383,6 +383,11 @@ void glXReleaseTexImageEXT (Display *dpy, GLXDrawable drawable, int buffer); #define GLX_FLIP_COMPLETE_INTEL 0x8182 #endif /* GLX_INTEL_swap_event */ +#ifndef GLX_EXT_buffer_age +#define GLX_EXT_buffer_age 1 +#define GLX_BACK_BUFFER_AGE_EXT 0x20F4 +#endif /* GLX_EXT_buffer_age */ + #ifndef GLX_MESA_agp_offset #define GLX_MESA_agp_offset 1 typedef unsigned int ( *PFNGLXGETAGPOFFSETMESAPROC) (const void *pointer); diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c index 67fe9c1..007f449 100644 --- a/src/glx/dri2_glx.c +++ b/src/glx/dri2_glx.c @@ -1288,6 +1288,7 @@ 
dri2CreateScreen(int screen, struct glx_display * priv) psp-waitForSBC = NULL; psp-setSwapInterval = NULL; psp-getSwapInterval = NULL; + psp-queryBufferAge = NULL; if (pdp-driMinor = 2) { psp-getDrawableMSC = dri2DrawableGetMSC; diff --git a/src/glx/dri3_glx.c b/src/glx/dri3_glx.c index 70ec057..07120e1 100644 --- a/src/glx/dri3_glx.c +++ b/src/glx/dri3_glx.c @@ -1345,6 +1345,8 @@ dri3_swap_buffers(__GLXDRIdrawable *pdraw, int64_t target_msc, int64_t divisor, target_msc = priv-msc + priv-swap_interval * (priv-send_sbc - priv-recv_sbc); priv-buffers[buf_id]-busy = 1; + priv-buffers[buf_id]-last_swap = priv-swap_count; + xcb_present_pixmap(c, priv-base.xDrawable, priv-buffers[buf_id]-pixmap, @@ -1379,11 +1381,23 @@ dri3_swap_buffers(__GLXDRIdrawable *pdraw, int64_t target_msc, int64_t divisor, xcb_flush(c); if (priv-stamp) ++(*priv-stamp); + + priv-swap_count++; } return ret; } +static int +dri3_query_buffer_age(__GLXDRIdrawable *pdraw) +{ + struct dri3_drawable *priv = (struct dri3_drawable *) pdraw; + int buf_id = DRI3_BACK_ID(priv-cur_back); + if (!priv-buffers[buf_id]-last_swap) +return 0; + return priv-swap_count - priv-buffers[buf_id]-last_swap; +} + /** dri3_open * * Wrapper around xcb_dri3_open @@ -1742,6 +1756,9 @@ dri3_create_screen(int screen, struct glx_display * priv) psp-copySubBuffer = dri3_copy_sub_buffer; __glXEnableDirectExtension(psc-base, GLX_MESA_copy_sub_buffer); + psp-queryBufferAge = dri3_query_buffer_age; + __glXEnableDirectExtension(psc-base, GLX_EXT_buffer_age); + free(driverName); free(deviceName); diff --git a/src/glx/dri3_priv.h b/src/glx/dri3_priv.h index 1d124f8..d00440a 100644 --- a/src/glx/dri3_priv.h +++ b/src/glx/dri3_priv.h @@ -97,6 +97,7 @@ struct dri3_buffer { uint32_t cpp; uint32_t flags; uint32_t width, height; + uint32_t last_swap; enum dri3_buffer_typebuffer_type; }; @@ -184,6 +185,7 @@ struct dri3_drawable { struct dri3_buffer *buffers[DRI3_NUM_BUFFERS]; int cur_back; int num_back; + uint32_t swap_count; uint32_t *stamp; 
diff --git a/src/glx/glx_pbuffer.c b/src/glx/glx_pbuffer.c index 411d6e5..a87a0a4 100644 --- a/src/glx/glx_pbuffer.c +++ b/src/glx/glx_pbuffer.c @@ -365,6 +365,13 @@ GetDrawableAttribute(Display * dpy, GLXDrawable drawable, #if defined(GLX_DIRECT_RENDERING) !defined(GLX_USE_APPLEGL) { __GLXDRIdrawable *pdraw = GetGLXDRIDrawable(dpy, drawable); +struct glx_screen *psc = pdraw-psc; + +if
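The age bookkeeping in the dri3 part of this patch can be sketched on its own. This is a reading of the flattened diff above, not the driver code: the struct, the buffer count, and the ex_ names are hypothetical, and the counter is assumed to start at zero.

```c
#include <assert.h>
#include <stdint.h>

#define EX_NUM_BUFFERS 3 /* hypothetical; stands in for DRI3_NUM_BUFFERS */

struct ex_drawable {
   uint32_t last_swap[EX_NUM_BUFFERS]; /* swap_count when buffer was presented */
   uint32_t swap_count;                /* total presents so far */
};

/* Record the present, then bump the counter (same order as the patch). */
void
ex_present(struct ex_drawable *d, int buf)
{
   d->last_swap[buf] = d->swap_count;
   d->swap_count++;
}

/* Age 0 means "contents undefined"; age N means "N presents ago". */
int
ex_buffer_age(const struct ex_drawable *d, int buf)
{
   if (!d->last_swap[buf])
      return 0;
   return (int)(d->swap_count - d->last_swap[buf]);
}

void
ex_buffer_age_demo(void)
{
   struct ex_drawable d = { { 0, 0, 0 }, 0 };

   ex_present(&d, 0);
   ex_present(&d, 1);
   ex_present(&d, 2);

   assert(ex_buffer_age(&d, 2) == 1); /* presented last */
   assert(ex_buffer_age(&d, 1) == 2);
   /* Buffer 0 was presented while the counter was still 0, so in this
    * sketch it cannot be told apart from a never-presented buffer. */
   assert(ex_buffer_age(&d, 0) == 0);
}
```

In this sketch a zero last_swap doubles as the "never presented" marker, so the very first present is reported as age 0. Whether that matters for the real patch depends on how swap_count is initialized, which the text above does not show.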
Re: [Mesa-dev] [PATCH] i965: Implement a CS stall workaround on Broadwell.
Reviewed-by: Ian Romanick ian.d.roman...@intel.com On 02/19/2014 04:28 PM, Kenneth Graunke wrote: According to the latest documentation, any PIPE_CONTROL with the Command Streamer Stall bit set must also have another bit set, with five different options: - Render Target Cache Flush - Depth Cache Flush - Stall at Pixel Scoreboard - Post-Sync Operation - Depth Stall I chose Stall at Pixel Scoreboard since we've used it effectively in the past, but the choice is fairly arbitrary. Implementing this in the PIPE_CONTROL emit helpers ensures that the workaround will always take effect when it ought to. Apparently, this workaround may be necessary on older hardware as well; for now I've only added it to Broadwell as it's absolutely necessary there. Subsequent patches could add it to older platforms, provided someone tests it there. v2: Only flag Stall at Pixel Scoreboard when none of the other bits are set (suggested by Ian Romanick). Cc: Ian Romanick i...@freedesktop.org Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 36 +++ 1 file changed, 36 insertions(+) Sure, that seems reasonable, Ian. I've updated the patch to only add stall at scoreboard when one of the other bits isn't already present. 
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 4624268..bdb7b6b 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -432,6 +432,38 @@ intel_batchbuffer_data(struct brw_context *brw,
 }
 
 /**
+ * According to the latest documentation, any PIPE_CONTROL with the
+ * Command Streamer Stall bit set must also have another bit set,
+ * with five different options:
+ *
+ *  - Render Target Cache Flush
+ *  - Depth Cache Flush
+ *  - Stall at Pixel Scoreboard
+ *  - Post-Sync Operation
+ *  - Depth Stall
+ *
+ * I chose Stall at Pixel Scoreboard since we've used it effectively
+ * in the past, but the choice is fairly arbitrary.
+ */
+static void
+add_cs_stall_workaround_bits(uint32_t *flags)
+{
+   uint32_t wa_bits = PIPE_CONTROL_WRITE_FLUSH |
+                      PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+                      PIPE_CONTROL_WRITE_IMMEDIATE |
+                      PIPE_CONTROL_WRITE_DEPTH_COUNT |
+                      PIPE_CONTROL_WRITE_TIMESTAMP |
+                      PIPE_CONTROL_STALL_AT_SCOREBOARD |
+                      PIPE_CONTROL_DEPTH_STALL;
+
+   /* If we're doing a CS stall, and don't already have one of the
+    * workaround bits set, add Stall at Pixel Scoreboard.
+    */
+   if ((*flags & PIPE_CONTROL_CS_STALL) != 0 && (*flags & wa_bits) == 0)
+      *flags |= PIPE_CONTROL_STALL_AT_SCOREBOARD;
+}
+
+/**
  * Emit a PIPE_CONTROL with various flushing flags.
  *
  * The caller is responsible for deciding what flags are appropriate for the
@@ -441,6 +473,8 @@ void
 brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
 {
    if (brw->gen >= 8) {
+      add_cs_stall_workaround_bits(&flags);
+
       BEGIN_BATCH(6);
       OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2));
       OUT_BATCH(flags);
@@ -481,6 +515,8 @@ brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
                             uint32_t imm_lower, uint32_t imm_upper)
 {
    if (brw->gen >= 8) {
+      add_cs_stall_workaround_bits(&flags);
+
       BEGIN_BATCH(6);
       OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2));
       OUT_BATCH(flags);
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
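The rule described in this patch (a CS stall needs at least one companion bit, and Stall at Pixel Scoreboard is added only when none is present) can be tried out standalone. The sketch below is not the driver code: the bit positions and the ex_ names are made up for the example, and the real PIPE_CONTROL_* definitions live in the i965 headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments for illustration only. */
#define EX_CS_STALL             (1u << 0)
#define EX_STALL_AT_SCOREBOARD  (1u << 1)
#define EX_DEPTH_STALL          (1u << 2)
#define EX_DEPTH_CACHE_FLUSH    (1u << 3)
#define EX_RT_CACHE_FLUSH       (1u << 4)
#define EX_POST_SYNC_OP         (1u << 5)

/* Any one of these bits satisfies the workaround on its own. */
#define EX_WA_BITS (EX_STALL_AT_SCOREBOARD | EX_DEPTH_STALL | \
                    EX_DEPTH_CACHE_FLUSH | EX_RT_CACHE_FLUSH | \
                    EX_POST_SYNC_OP)

/* Return flags with Stall at Pixel Scoreboard added when a CS stall is
 * requested and no other workaround bit is already set. */
uint32_t
ex_add_cs_stall_workaround(uint32_t flags)
{
   if ((flags & EX_CS_STALL) != 0 && (flags & EX_WA_BITS) == 0)
      flags |= EX_STALL_AT_SCOREBOARD;
   return flags;
}

void
ex_cs_stall_demo(void)
{
   /* Bare CS stall: the scoreboard stall gets added. */
   assert(ex_add_cs_stall_workaround(EX_CS_STALL) ==
          (EX_CS_STALL | EX_STALL_AT_SCOREBOARD));
   /* CS stall that already has a companion bit: left alone (the v2 change). */
   assert(ex_add_cs_stall_workaround(EX_CS_STALL | EX_DEPTH_STALL) ==
          (EX_CS_STALL | EX_DEPTH_STALL));
   /* No CS stall: nothing to work around. */
   assert(ex_add_cs_stall_workaround(EX_RT_CACHE_FLUSH) == EX_RT_CACHE_FLUSH);
}
```

Returning the adjusted value instead of writing through a pointer is a choice made for the sketch; the patch modifies the caller's flags in place.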
[Mesa-dev] [PATCH 6/6] meta: Add support for integer blits.
Compared to i965, the code generated doesn't use the AVG instruction. But I'm not sure that multisampled integer resolves are really that important to worry about. --- src/mesa/drivers/common/meta.h | 10 ++ src/mesa/drivers/common/meta_blit.c | 68 + 2 files changed, 71 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h index c7a21fc..fcf45c4 100644 --- a/src/mesa/drivers/common/meta.h +++ b/src/mesa/drivers/common/meta.h @@ -221,9 +221,19 @@ struct blit_shader_table { struct blit_shader sampler_cubemap_array; }; +/** + * Indices in the blit_state-msaa_shaders[] array + * + * Note that setup_glsl_msaa_blit_shader() assumes that the _INT enums are one + * more than the non-_INT version and _UINT is one beyond that. + */ enum blit_msaa_shader { BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE, + BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE_INT, + BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE_UINT, BLIT_MSAA_SHADER_2D_MULTISAMPLE_COPY, + BLIT_MSAA_SHADER_2D_MULTISAMPLE_COPY_INT, + BLIT_MSAA_SHADER_2D_MULTISAMPLE_COPY_UINT, BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH_RESOLVE, BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH_COPY, BLIT_MSAA_SHADER_COUNT, diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c index 7f5416d..34b58d9 100644 --- a/src/mesa/drivers/common/meta_blit.c +++ b/src/mesa/drivers/common/meta_blit.c @@ -95,9 +95,24 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, enum blit_msaa_shader shader_index; const char *samplers[] = { [BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE] = sampler2DMS, + [BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE_INT] = isampler2DMS, + [BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE_UINT] = usampler2DMS, [BLIT_MSAA_SHADER_2D_MULTISAMPLE_COPY] = sampler2DMS, + [BLIT_MSAA_SHADER_2D_MULTISAMPLE_COPY_INT] = isampler2DMS, + [BLIT_MSAA_SHADER_2D_MULTISAMPLE_COPY_UINT] = usampler2DMS, }; bool dst_is_msaa = false; + GLenum src_datatype; + const char *vec4_prefix; + + if (src_rb) { + src_datatype = 
_mesa_get_format_datatype(src_rb-Format); + } else { + /* depth-or-color glCopyTexImage fallback path that passes a NULL rb and + * doesn't handle integer. + */ + src_datatype = GL_UNSIGNED_NORMALIZED; + } if (ctx-DrawBuffer-Visual.samples 1) { /* If you're calling meta_BlitFramebuffer with the destination @@ -135,6 +150,21 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, shader_index = BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE; } + /* We rely on the enum being sorted this way. */ + STATIC_ASSERT(BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE_INT == + BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE + 1); + STATIC_ASSERT(BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE_UINT == + BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE + 2); + if (src_datatype == GL_INT) { + shader_index++; + vec4_prefix = i; + } else if (src_datatype == GL_UNSIGNED_INT) { + shader_index += 2; + vec4_prefix = u; + } else { + vec4_prefix = ; + } + if (blit-msaa_shaders[shader_index]) { _mesa_UseProgram(blit-msaa_shaders[shader_index]); return; @@ -199,11 +229,25 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, int samples = MAX2(src_rb-NumSamples, 1); char *sample_resolve; const char *arb_sample_shading_extension_string; + const char *merge_function; if (dst_is_msaa) { arb_sample_shading_extension_string = #extension GL_ARB_sample_shading : enable; sample_resolve = ralloc_asprintf(mem_ctx,out_color = texelFetch(texSampler, ivec2(texCoords), gl_SampleID);); + merge_function = ; } else { + if (src_datatype == GL_INT) { +merge_function = + ivec4 merge(ivec4 a, ivec4 b) { return (a ivec4(1)) + (b ivec4(1)) + (a b ivec4(1)); }\n; + } else if (src_datatype == GL_UNSIGNED_INT) { +merge_function = + uvec4 merge(uvec4 a, uvec4 b) { return (a uvec4(1)) + (b uvec4(1)) + (a b uvec4(1)); }\n; + } else { +/* The divide will happen at the end for floats. 
*/ +merge_function = + vec4 merge(vec4 a, vec4 b) { return (a + b); }\n; + } + arb_sample_shading_extension_string = ; /* We're assuming power of two samples for this resolution procedure. @@ -218,8 +262,8 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, sample_resolve = rzalloc_size(mem_ctx, 1); for (int i = 0; i samples; i++) { ralloc_asprintf_append(sample_resolve, - vec4 sample_1_%d = texelFetch(texSampler, ivec2(texCoords), %d);\n, - i, i); + %svec4 sample_1_%d = texelFetch(texSampler,
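The integer merge() in this patch lost its operators in the text above. It appears to be the usual overflow-free average, (a >> 1) + (b >> 1) + (a & b & 1), applied pairwise. The sketch below shows that reading in plain C on unsigned scalars; the ex_ names are hypothetical, and the four-sample resolve is only one possible pairing order.

```c
#include <assert.h>
#include <stdint.h>

/* Average two values without computing a + b, which could overflow.
 * Each operand is halved first; the final term restores the 1 that is
 * lost when both operands are odd. The result is floor((a + b) / 2). */
uint32_t
ex_merge_u32(uint32_t a, uint32_t b)
{
   return (a >> 1) + (b >> 1) + (a & b & 1u);
}

/* Resolve four samples by merging pairs, then merging the results.
 * This assumes a power-of-two sample count, as the patch does. */
uint32_t
ex_resolve4_u32(const uint32_t s[4])
{
   return ex_merge_u32(ex_merge_u32(s[0], s[1]), ex_merge_u32(s[2], s[3]));
}

void
ex_merge_demo(void)
{
   const uint32_t samples[4] = { 10, 20, 30, 40 };

   assert(ex_merge_u32(3, 5) == 4);
   assert(ex_merge_u32(3, 4) == 3);                    /* rounds down */
   assert(ex_merge_u32(UINT32_MAX, UINT32_MAX) == UINT32_MAX); /* no wrap */
   assert(ex_resolve4_u32(samples) == 25);
}
```

The float path in the patch simply sums and divides at the end, since floats do not have the wraparound problem that motivates this form.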
[Mesa-dev] [PATCH 5/6] meta: Add support for doing MSAA to MSAA blits.
These are non-stretched, non-resolving blits, so it's just a matter of sampling once from our gl_SampleID and storing that to our color/depth. --- src/mesa/drivers/common/meta.h | 6 +- src/mesa/drivers/common/meta_blit.c | 147 2 files changed, 104 insertions(+), 49 deletions(-) diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h index 7d4474e..c7a21fc 100644 --- a/src/mesa/drivers/common/meta.h +++ b/src/mesa/drivers/common/meta.h @@ -222,8 +222,10 @@ struct blit_shader_table { }; enum blit_msaa_shader { - BLIT_MSAA_SHADER_2D_MULTISAMPLE, - BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH, + BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE, + BLIT_MSAA_SHADER_2D_MULTISAMPLE_COPY, + BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH_RESOLVE, + BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH_COPY, BLIT_MSAA_SHADER_COUNT, }; diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c index 65e2692..7f5416d 100644 --- a/src/mesa/drivers/common/meta_blit.c +++ b/src/mesa/drivers/common/meta_blit.c @@ -35,6 +35,7 @@ #include main/fbobject.h #include main/macros.h #include main/matrix.h +#include main/multisample.h #include main/readpix.h #include main/shaderapi.h #include main/texobj.h @@ -93,22 +94,45 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, void *mem_ctx; enum blit_msaa_shader shader_index; const char *samplers[] = { - [BLIT_MSAA_SHADER_2D_MULTISAMPLE] = sampler2DMS, + [BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE] = sampler2DMS, + [BLIT_MSAA_SHADER_2D_MULTISAMPLE_COPY] = sampler2DMS, }; + bool dst_is_msaa = false; + + if (ctx-DrawBuffer-Visual.samples 1) { + /* If you're calling meta_BlitFramebuffer with the destination + * multisampled, this is the only path that will work -- swrast and + * CopyTexImage won't work on it either. 
+ */ + assert(ctx-Extensions.ARB_sample_shading); + + dst_is_msaa = true; + + /* We need shader invocation per sample, not per pixel */ + _mesa_set_enable(ctx, GL_MULTISAMPLE, GL_TRUE); + _mesa_set_enable(ctx, GL_SAMPLE_SHADING, GL_TRUE); + _mesa_MinSampleShading(1.0); + } switch (target) { case GL_TEXTURE_2D_MULTISAMPLE: if (src_rb-_BaseFormat == GL_DEPTH_COMPONENT || src_rb-_BaseFormat == GL_DEPTH_STENCIL) { - shader_index = BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH; + if (dst_is_msaa) +shader_index = BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH_COPY; + else +shader_index = BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH_RESOLVE; } else { - shader_index = BLIT_MSAA_SHADER_2D_MULTISAMPLE; + if (dst_is_msaa) +shader_index = BLIT_MSAA_SHADER_2D_MULTISAMPLE_COPY; + else +shader_index = BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE; } break; default: _mesa_problem(ctx, Unkown texture target %s\n, _mesa_lookup_enum_by_nr(target)); - shader_index = BLIT_MSAA_SHADER_2D_MULTISAMPLE; + shader_index = BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE; } if (blit-msaa_shaders[shader_index]) { @@ -118,17 +142,32 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, mem_ctx = ralloc_context(NULL); - if (shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH) { - /* From the GL 4.3 spec: - * - * If there is a multisample buffer (the value of SAMPLE_BUFFERS is - * one), then values are obtained from the depth samples in this - * buffer. It is recommended that the depth value of the centermost - * sample be used, though implementations may choose any function - * of the depth sample values at each pixel. - * - * We're slacking and instead of choosing centermost, we've got 0. 
- */ + if (shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH_RESOLVE || + shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH_COPY) { + char *sample_index; + const char *arb_sample_shading_extension_string; + + if (dst_is_msaa) { + arb_sample_shading_extension_string = #extension GL_ARB_sample_shading : enable; + sample_index = gl_SampleID; + } else { + /* Don't need that extension, since we're drawing to a single-sampled + * destination. + */ + arb_sample_shading_extension_string = ; + /* From the GL 4.3 spec: + * + * If there is a multisample buffer (the value of SAMPLE_BUFFERS + * is one), then values are obtained from the depth samples in + * this buffer. It is recommended that the depth value of the + * centermost sample be used, though implementations may choose + * any function of the depth sample values at each pixel. + * + * We're slacking and instead of choosing centermost, we've got 0. + */
[Mesa-dev] [PATCH 2/6] meta: Add support for doing multisample resolves.
Note that this doesn't handle GL_EXT_multisample_scaled_blit yet. The i965 code for that extension bakes in knowledge of the sample positions (well, knowledge of the sample positions aligned to a lower-resolution grid), which we would have to do at runtime somehow for meta. --- src/mesa/drivers/common/meta.h | 7 ++ src/mesa/drivers/common/meta_blit.c | 202 +--- 2 files changed, 197 insertions(+), 12 deletions(-) diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h index 822bfa1..5d79253 100644 --- a/src/mesa/drivers/common/meta.h +++ b/src/mesa/drivers/common/meta.h @@ -221,6 +221,12 @@ struct blit_shader_table { struct blit_shader sampler_cubemap_array; }; +enum blit_msaa_shader { + BLIT_MSAA_SHADER_2D_MULTISAMPLE, + BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH, + BLIT_MSAA_SHADER_COUNT, +}; + /** * State for glBlitFramebufer() */ @@ -230,6 +236,7 @@ struct blit_state GLuint VBO; GLuint DepthFP; struct blit_shader_table shaders; + GLuint msaa_shaders[BLIT_MSAA_SHADER_COUNT]; struct temp_texture depthTex; }; diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c index a2b284b..be91247 100644 --- a/src/mesa/drivers/common/meta_blit.c +++ b/src/mesa/drivers/common/meta_blit.c @@ -31,6 +31,7 @@ #include main/condrender.h #include main/depth.h #include main/enable.h +#include main/enums.h #include main/fbobject.h #include main/macros.h #include main/matrix.h @@ -81,8 +82,158 @@ init_blit_depth_pixels(struct gl_context *ctx) } static void +setup_glsl_msaa_blit_shader(struct gl_context *ctx, +struct blit_state *blit, +struct gl_renderbuffer *src_rb, +GLenum target) +{ + const char *vs_source; + char *fs_source; + GLuint vs, fs; + void *mem_ctx; + enum blit_msaa_shader shader_index; + const char *samplers[] = { + [BLIT_MSAA_SHADER_2D_MULTISAMPLE] = sampler2DMS, + }; + + switch (target) { + case GL_TEXTURE_2D_MULTISAMPLE: + if (src_rb-_BaseFormat == GL_DEPTH_COMPONENT || + src_rb-_BaseFormat == GL_DEPTH_STENCIL) { + 
shader_index = BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH; + } else { + shader_index = BLIT_MSAA_SHADER_2D_MULTISAMPLE; + } + break; + default: + _mesa_problem(ctx, Unkown texture target %s\n, +_mesa_lookup_enum_by_nr(target)); + shader_index = BLIT_MSAA_SHADER_2D_MULTISAMPLE; + } + + if (blit-msaa_shaders[shader_index]) { + _mesa_UseProgram(blit-msaa_shaders[shader_index]); + return; + } + + mem_ctx = ralloc_context(NULL); + + if (shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE_DEPTH) { + /* From the GL 4.3 spec: + * + * If there is a multisample buffer (the value of SAMPLE_BUFFERS is + * one), then values are obtained from the depth samples in this + * buffer. It is recommended that the depth value of the centermost + * sample be used, though implementations may choose any function + * of the depth sample values at each pixel. + * + * We're slacking and instead of choosing centermost, we've got 0. + */ + vs_source = ralloc_asprintf(mem_ctx, + #version 130\n + in vec2 position;\n + in vec2 textureCoords;\n + out vec2 texCoords;\n + void main()\n + {\n + texCoords = textureCoords;\n + gl_Position = vec4(position, 0.0, 1.0);\n + }\n); + fs_source = ralloc_asprintf(mem_ctx, + #version 130\n + #extension GL_ARB_texture_multisample : enable\n + uniform sampler2DMS texSampler;\n + in vec2 texCoords;\n + out vec4 out_color;\n + \n + void main()\n + {\n + gl_FragDepth = texelFetch(texSampler, ivec2(texCoords), 0).r;\n + }\n); + } else if (shader_index == BLIT_MSAA_SHADER_2D_MULTISAMPLE) { + char *sample_resolve; + /* You can create 2D_MULTISAMPLE textures with 0 sample count (meaning 1 + * sample). Yes, this is ridiculous. + */ + int samples = MAX2(src_rb-NumSamples, 1); + + /* We're assuming power of two samples for this resolution procedure. + * + * To avoid losing any floating point precision if the samples all + * happen to have the same value, we merge pairs of values at a time (so + * the
[Mesa-dev] [PATCH 4/6] meta: Save and restore a bunch of MSAA state.
We're disabling GL_MULTISAMPLE, so we didn't need to worry about a lot of
that state. But to do MSAA to MSAA blits, we need to start handling more
state.

v2: Fix pasteo caught by Kenneth.
---
 src/mesa/drivers/common/meta.c | 40 +++++++++++++++++++++++++++++++++++++---
 src/mesa/drivers/common/meta.h |  2 +-
 2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
index a0613f2..2dec2c3 100644
--- a/src/mesa/drivers/common/meta.c
+++ b/src/mesa/drivers/common/meta.c
@@ -51,6 +51,7 @@
 #include "main/macros.h"
 #include "main/matrix.h"
 #include "main/mipmap.h"
+#include "main/multisample.h"
 #include "main/pixel.h"
 #include "main/pbo.h"
 #include "main/polygon.h"
@@ -719,9 +720,20 @@ _mesa_meta_begin(struct gl_context *ctx, GLbitfield state)
    }
 
    if (state & MESA_META_MULTISAMPLE) {
-      save->MultisampleEnabled = ctx->Multisample.Enabled;
+      save->Multisample = ctx->Multisample; /* struct copy */
+
       if (ctx->Multisample.Enabled)
          _mesa_set_multisample(ctx, GL_FALSE);
+      if (ctx->Multisample.SampleCoverage)
+         _mesa_set_enable(ctx, GL_SAMPLE_COVERAGE, GL_FALSE);
+      if (ctx->Multisample.SampleAlphaToCoverage)
+         _mesa_set_enable(ctx, GL_SAMPLE_ALPHA_TO_COVERAGE, GL_FALSE);
+      if (ctx->Multisample.SampleAlphaToOne)
+         _mesa_set_enable(ctx, GL_SAMPLE_ALPHA_TO_ONE, GL_FALSE);
+      if (ctx->Multisample.SampleShading)
+         _mesa_set_enable(ctx, GL_SAMPLE_SHADING, GL_FALSE);
+      if (ctx->Multisample.SampleMask)
+         _mesa_set_enable(ctx, GL_SAMPLE_MASK, GL_FALSE);
    }
 
    if (state & MESA_META_FRAMEBUFFER_SRGB) {
@@ -1059,8 +1071,30 @@ _mesa_meta_end(struct gl_context *ctx)
    }
 
    if (state & MESA_META_MULTISAMPLE) {
-      if (ctx->Multisample.Enabled != save->MultisampleEnabled)
-         _mesa_set_multisample(ctx, save->MultisampleEnabled);
+      struct gl_multisample_attrib *ctx_ms = &ctx->Multisample;
+      struct gl_multisample_attrib *save_ms = &save->Multisample;
+
+      if (ctx_ms->Enabled != save_ms->Enabled)
+         _mesa_set_multisample(ctx, save_ms->Enabled);
+      if (ctx_ms->SampleCoverage != save_ms->SampleCoverage)
+         _mesa_set_enable(ctx, GL_SAMPLE_COVERAGE, save_ms->SampleCoverage);
+      if (ctx_ms->SampleAlphaToCoverage != save_ms->SampleAlphaToCoverage)
+         _mesa_set_enable(ctx, GL_SAMPLE_ALPHA_TO_COVERAGE, save_ms->SampleAlphaToCoverage);
+      if (ctx_ms->SampleAlphaToOne != save_ms->SampleAlphaToOne)
+         _mesa_set_enable(ctx, GL_SAMPLE_ALPHA_TO_ONE, save_ms->SampleAlphaToOne);
+      if (ctx_ms->SampleCoverageValue != save_ms->SampleCoverageValue ||
+          ctx_ms->SampleCoverageInvert != save_ms->SampleCoverageInvert) {
+         _mesa_SampleCoverage(save_ms->SampleCoverageValue,
+                              save_ms->SampleCoverageInvert);
+      }
+      if (ctx_ms->SampleShading != save_ms->SampleShading)
+         _mesa_set_enable(ctx, GL_SAMPLE_SHADING, save_ms->SampleShading);
+      if (ctx_ms->SampleMask != save_ms->SampleMask)
+         _mesa_set_enable(ctx, GL_SAMPLE_MASK, save_ms->SampleMask);
+      if (ctx_ms->SampleMaskValue != save_ms->SampleMaskValue)
+         _mesa_SampleMaski(0, save_ms->SampleMaskValue);
+      if (ctx_ms->MinSampleShadingValue != save_ms->MinSampleShadingValue)
+         _mesa_MinSampleShading(save_ms->MinSampleShadingValue);
    }
 
    if (state & MESA_META_FRAMEBUFFER_SRGB) {
diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h
index 5d79253..7d4474e 100644
--- a/src/mesa/drivers/common/meta.h
+++ b/src/mesa/drivers/common/meta.h
@@ -168,7 +168,7 @@ struct save_state
    struct gl_feedback Feedback;
 
    /** MESA_META_MULTISAMPLE */
-   GLboolean MultisampleEnabled;
+   struct gl_multisample_attrib Multisample;
 
    /** MESA_META_FRAMEBUFFER_SRGB */
    GLboolean sRGBEnabled;
-- 
1.9.rc1
[Mesa-dev] [PATCH 3/6] meta: Try to do blending of sRGB values in linear colorspace.
Blending of values would occur when doing GL_LINEAR filtering with scaling,
and in an upcoming commit when doing MSAA resolves.
---
 src/mesa/drivers/common/meta_blit.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c
index be91247..65e2692 100644
--- a/src/mesa/drivers/common/meta_blit.c
+++ b/src/mesa/drivers/common/meta_blit.c
@@ -375,13 +375,33 @@ blitframebuffer_texture(struct gl_context *ctx,
    _mesa_SamplerParameteri(sampler, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    _mesa_SamplerParameteri(sampler, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
 
-   /* Always do our blits with no sRGB decode or encode.  Note that
-    * GL_FRAMEBUFFER_SRGB has already been disabled by
-    * _mesa_meta_begin().
+   /* Always do our blits with no net sRGB decode or encode.
+    *
+    * However, if both the src and dst can be srgb decode/encoded, enable them
+    * so that we do any blending (from scaling or from MSAA resolves) in the
+    * right colorspace.
+    *
+    * Our choice of not doing any net encode/decode is from the GL 3.0
+    * specification:
+    *
+    *     "Blit operations bypass the fragment pipeline. The only fragment
+    *      operations which affect a blit are the pixel ownership test and the
+    *      scissor test."
+    *
+    * The GL 4.4 specification disagrees and says that the sRGB part of the
+    * fragment pipeline applies, but this was found to break applications.
     */
    if (ctx->Extensions.EXT_texture_sRGB_decode) {
-      _mesa_SamplerParameteri(sampler, GL_TEXTURE_SRGB_DECODE_EXT,
-                              GL_SKIP_DECODE_EXT);
+      if (_mesa_get_format_color_encoding(rb->Format) == GL_SRGB &&
+          ctx->DrawBuffer->Visual.sRGBCapable) {
+         _mesa_SamplerParameteri(sampler, GL_TEXTURE_SRGB_DECODE_EXT,
+                                 GL_DECODE_EXT);
+         _mesa_set_framebuffer_srgb(ctx, GL_TRUE);
+      } else {
+         _mesa_SamplerParameteri(sampler, GL_TEXTURE_SRGB_DECODE_EXT,
+                                 GL_SKIP_DECODE_EXT);
+         /* set_framebuffer_srgb was set by _mesa_meta_begin(). */
+      }
    }
 
    if (!glsl_version) {
-- 
1.9.rc1
[Mesa-dev] [PATCH 1/6] i965: Fix miptree matching for multisampled, non-interleaved miptrees.
We haven't been executing this code before the meta-blit case, because
we've been flagging the miptree as validated at texstorage time, and never
having to revalidate.
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 15 ++++++++++++++-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  2 ++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 5461562..355f7cd 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -876,13 +876,26 @@ intel_miptree_match_image(struct intel_mipmap_tree *mt,
    if (mt->target == GL_TEXTURE_CUBE_MAP)
       depth = 6;
 
+   int level_depth = mt->level[level].depth;
+   if (mt->num_samples > 1) {
+      switch (mt->msaa_layout) {
+      case INTEL_MSAA_LAYOUT_NONE:
+      case INTEL_MSAA_LAYOUT_IMS:
+         break;
+      case INTEL_MSAA_LAYOUT_UMS:
+      case INTEL_MSAA_LAYOUT_CMS:
+         level_depth /= mt->num_samples;
+         break;
+      }
+   }
+
    /* Test image dimensions against the base level image adjusted for
     * minification.  This will also catch images not present in the
     * tree, changed targets, etc.
     */
    if (width != minify(mt->logical_width0, level) ||
        height != minify(mt->logical_height0, level) ||
-       depth != mt->level[level].depth) {
+       depth != level_depth) {
       return false;
    }
 
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index 6c45cfd..c274994 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -115,6 +115,8 @@ struct intel_mipmap_level
     *    - For GL_TEXTURE_3D, it is the texture's depth at this miplevel. Its
     *      value, like width and height, varies with miplevel.
     *    - For other texture types, depth is 1.
+    *    - Additionally, for UMS and CMS miptrees, depth is multiplied by
+    *      sample count.
     */
    GLuint depth;
 
-- 
1.9.rc1
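The depth adjustment this patch makes before comparing against the image depth can be sketched without the driver. The enum and function below are hypothetical stand-ins with ex_ names, and only mirror the switch in the diff: UMS and CMS layouts store one slice per sample, so the stored depth is divided by the sample count, while NONE and IMS are left alone.

```c
#include <assert.h>

/* Hypothetical mirror of the driver's MSAA layout enum. */
enum ex_msaa_layout {
   EX_MSAA_LAYOUT_NONE,
   EX_MSAA_LAYOUT_IMS, /* interleaved: samples do not add slices */
   EX_MSAA_LAYOUT_UMS, /* uncompressed: one slice per sample */
   EX_MSAA_LAYOUT_CMS  /* compressed: one slice per sample */
};

/* Convert the depth stored for a miplevel back to the logical depth
 * that an image's own depth should be compared against. */
unsigned
ex_logical_level_depth(unsigned stored_depth, unsigned num_samples,
                       enum ex_msaa_layout layout)
{
   if (num_samples > 1) {
      switch (layout) {
      case EX_MSAA_LAYOUT_NONE:
      case EX_MSAA_LAYOUT_IMS:
         break;
      case EX_MSAA_LAYOUT_UMS:
      case EX_MSAA_LAYOUT_CMS:
         stored_depth /= num_samples;
         break;
      }
   }
   return stored_depth;
}

void
ex_level_depth_demo(void)
{
   /* A 2-layer, 4x MSAA texture stored as 8 slices under UMS or CMS. */
   assert(ex_logical_level_depth(8, 4, EX_MSAA_LAYOUT_UMS) == 2);
   assert(ex_logical_level_depth(8, 4, EX_MSAA_LAYOUT_CMS) == 2);
   /* IMS and single-sampled trees are compared as stored. */
   assert(ex_logical_level_depth(8, 4, EX_MSAA_LAYOUT_IMS) == 8);
   assert(ex_logical_level_depth(1, 0, EX_MSAA_LAYOUT_NONE) == 1);
}
```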
[Mesa-dev] [PATCH] i965: Create a hardware context before initializing state module.
brw_init_state() calls brw_upload_initial_gpu_state(). If hardware
contexts are enabled (brw->hw_ctx != NULL), this will upload some
initial invariant state for the GPU. Without hardware contexts, we rely
on this state being uploaded via atoms that subscribe to the
BRW_NEW_CONTEXT bit.

Commit 46d3c2bf4ddd227193b98861f1e632498fe547d8 accidentally moved the
call to brw_init_state() before creating a hardware context. This meant
brw_upload_initial_gpu_state would always early return. Except on Gen6+,
we stopped uploading the initial GPU state via state atoms, so it never
happened.

Fixes a regression since 46d3c2bf4ddd227193b98861f1e632498fe547d8.

Cc: "10.0 10.1" <mesa-sta...@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_context.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

The diff looks weird - I actually moved the hardware context
initialization block up a few lines. Diff instead decided that I moved
these three lines down below it. Which is equivalent, but...odd.

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index 5800092..9791a49 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -700,12 +700,6 @@ brwCreateContext(gl_api api,
 
    intel_batchbuffer_init(brw);
 
-   brw_init_state(brw);
-
-   intelInitExtensions(ctx);
-
-   intel_fbo_init(brw);
-
    if (brw->gen >= 6) {
       /* Create a new hardware context.  Using a hardware context means that
        * our GPU state will be saved/restored on context switch, allowing us
@@ -723,6 +717,12 @@ brwCreateContext(gl_api api,
       }
    }
 
+   brw_init_state(brw);
+
+   intelInitExtensions(ctx);
+
+   intel_fbo_init(brw);
+
    brw_init_surface_formats(brw);
 
    if (brw->is_g4x || brw->gen >= 5) {
-- 
1.8.4.2
Re: [Mesa-dev] [PATCH][RFC] dri3: Add support for the GLX_EXT_buffer_age extension
On 02/19/2014 02:49 PM, Adel Gadllah wrote: Hi, The attached patch adds support for the GLX_EXT_buffer_age extension, which is mostly used by compositors for efficient sub screen updates. The extension should not be reported as supported when running DRI2 but it seems to show up when I try to disable it with LIBGL_DRI3_DISABLE ... not sure why suggestions welcome. P.S: Please CC me when replying as I am not subscribed to the list. You'll need to fix that. :) You didn't send this patch with git-send-email. Whatever you used to send it also mangled it, so it won't apply. From: Adel Gadllah adel.gadl...@gmail.com Date: Sun, 16 Feb 2014 13:40:42 +0100 Subject: [PATCH] dri3: Add GLX_EXT_buffer_age support --- include/GL/glx.h | 5 + include/GL/glxext.h | 5 + src/glx/dri2_glx.c| 1 + src/glx/dri3_glx.c| 17 + src/glx/dri3_priv.h | 2 ++ src/glx/glx_pbuffer.c | 7 +++ src/glx/glxclient.h | 1 + src/glx/glxextensions.c | 1 + src/glx/glxextensions.h | 1 + src/mesa/drivers/x11/glxapi.c | 3 +++ 10 files changed, 43 insertions(+) diff --git a/include/GL/glx.h b/include/GL/glx.h index 234abc0..b8b4d75 100644 --- a/include/GL/glx.h +++ b/include/GL/glx.h @@ -161,6 +161,11 @@ extern C { #define GLX_SAMPLES 0x186a1 /*11*/ +/* + * GLX_EXT_buffer_age + */ +#define GLX_BACK_BUFFER_AGE_EXT 0x20F4 + typedef struct __GLXcontextRec *GLXContext; typedef XID GLXPixmap; diff --git a/include/GL/glxext.h b/include/GL/glxext.h index 8c642f3..36e92dc 100644 --- a/include/GL/glxext.h +++ b/include/GL/glxext.h @@ -383,6 +383,11 @@ void glXReleaseTexImageEXT (Display *dpy, GLXDrawable drawable, int buffer); #define GLX_FLIP_COMPLETE_INTEL 0x8182 #endif /* GLX_INTEL_swap_event */ +#ifndef GLX_EXT_buffer_age +#define GLX_EXT_buffer_age 1 +#define GLX_BACK_BUFFER_AGE_EXT 0x20F4 +#endif /* GLX_EXT_buffer_age */ + We get glxext.h directly from Khronos, so it should not be modified... except to import new versions from upstream. 
It looks like the upstream glxext.h has this, so the first patch in the series should be glx: Update glxext.h to revision 25407. And drop the change to glx.h. #ifndef GLX_MESA_agp_offset #define GLX_MESA_agp_offset 1 typedef unsigned int ( *PFNGLXGETAGPOFFSETMESAPROC) (const void *pointer); diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c index 67fe9c1..007f449 100644 --- a/src/glx/dri2_glx.c +++ b/src/glx/dri2_glx.c @@ -1288,6 +1288,7 @@ dri2CreateScreen(int screen, struct glx_display * priv) psp-waitForSBC = NULL; psp-setSwapInterval = NULL; psp-getSwapInterval = NULL; + psp-queryBufferAge = NULL; if (pdp-driMinor = 2) { psp-getDrawableMSC = dri2DrawableGetMSC; diff --git a/src/glx/dri3_glx.c b/src/glx/dri3_glx.c index 70ec057..07120e1 100644 --- a/src/glx/dri3_glx.c +++ b/src/glx/dri3_glx.c @@ -1345,6 +1345,8 @@ dri3_swap_buffers(__GLXDRIdrawable *pdraw, int64_t target_msc, int64_t divisor, target_msc = priv-msc + priv-swap_interval * (priv-send_sbc - priv-recv_sbc); priv-buffers[buf_id]-busy = 1; + priv-buffers[buf_id]-last_swap = priv-swap_count; + xcb_present_pixmap(c, priv-base.xDrawable, priv-buffers[buf_id]-pixmap, @@ -1379,11 +1381,23 @@ dri3_swap_buffers(__GLXDRIdrawable *pdraw, int64_t target_msc, int64_t divisor, xcb_flush(c); if (priv-stamp) ++(*priv-stamp); + + priv-swap_count++; } return ret; } +static int +dri3_query_buffer_age(__GLXDRIdrawable *pdraw) +{ + struct dri3_drawable *priv = (struct dri3_drawable *) pdraw; + int buf_id = DRI3_BACK_ID(priv-cur_back); Blank line here. Also maybe use dri3_back_buffer instead? const struct dri3_buffer *const back = dri3_back_buffer(priv); if (back-last_swap != 0) return 0; else return priv-swap_count - back-last_swap; + if (!priv-buffers[buf_id]-last_swap) +return 0; And here. 
+ return priv-swap_count - priv-buffers[buf_id]-last_swap; +} + /** dri3_open * * Wrapper around xcb_dri3_open @@ -1742,6 +1756,9 @@ dri3_create_screen(int screen, struct glx_display * priv) psp-copySubBuffer = dri3_copy_sub_buffer; __glXEnableDirectExtension(psc-base, GLX_MESA_copy_sub_buffer); + psp-queryBufferAge = dri3_query_buffer_age; + __glXEnableDirectExtension(psc-base, GLX_EXT_buffer_age); + free(driverName); free(deviceName); diff --git a/src/glx/dri3_priv.h b/src/glx/dri3_priv.h index 1d124f8..d00440a 100644 --- a/src/glx/dri3_priv.h +++ b/src/glx/dri3_priv.h @@ -97,6 +97,7 @@ struct dri3_buffer { uint32_t cpp; uint32_t flags; uint32_t width, height; + uint32_t last_swap; enum dri3_buffer_typebuffer_type;
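The age computation discussed in this review can be sketched on its own. This is a minimal illustration under stated assumptions, not the dri3 code: the struct and function names are hypothetical, and it follows the patch's version of the check (return 0 when last_swap is zero), treating 0 as "age unknown".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced version of the per-buffer bookkeeping in the patch. */
struct buffer_sketch {
   uint32_t last_swap;   /* value of the swap counter when last presented */
};

/* Age of the back buffer in frames.  A last_swap of 0 is read as "no swap
 * recorded", and the function then reports 0 (age unknown). */
static int
buffer_age_sketch(uint32_t swap_count, const struct buffer_sketch *back)
{
   assert(back != NULL);

   if (back->last_swap == 0)
      return 0;

   return (int)(swap_count - back->last_swap);
}
```

Note that the condition here matches the submitted patch (`!last_swap` returns 0); the `!= 0` form in the inline suggestion would invert it.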
Re: [Mesa-dev] [Mesa-stable] [PATCH] i965: Create a hardware context before initializing state module.
On 02/19/2014 05:40 PM, Kenneth Graunke wrote:
> brw_init_state() calls brw_upload_initial_gpu_state(). If hardware
> contexts are enabled (brw->hw_ctx != NULL), this will upload some
> initial invariant state for the GPU. Without hardware contexts, we rely
> on this state being uploaded via atoms that subscribe to the
> BRW_NEW_CONTEXT bit.
>
> Commit 46d3c2bf4ddd227193b98861f1e632498fe547d8 accidentally moved the
> call to brw_init_state() before creating a hardware context. This meant
> brw_upload_initial_gpu_state would always early return. Except on Gen6+,
> we stopped uploading the initial GPU state via state atoms, so it never
> happened.
>
> Fixes a regression since 46d3c2bf4ddd227193b98861f1e632498fe547d8.

This seems like a rational change... but why didn't 46d3c2b blow up the
world on IVB and HSW? ...and only cause heisenbugs on BDW?

> Cc: "10.0 10.1" <mesa-sta...@lists.freedesktop.org>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>

Either way,

Reviewed-by: Ian Romanick <ian.d.roman...@intel.com>

> ---
>  src/mesa/drivers/dri/i965/brw_context.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> The diff looks weird - I actually moved the hardware context
> initialization block up a few lines. Diff instead decided that I moved
> these three lines down below it. Which is equivalent, but...odd.
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
> index 5800092..9791a49 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -700,12 +700,6 @@ brwCreateContext(gl_api api,
>  
>     intel_batchbuffer_init(brw);
>  
> -   brw_init_state(brw);
> -
> -   intelInitExtensions(ctx);
> -
> -   intel_fbo_init(brw);
> -
>     if (brw->gen >= 6) {
>        /* Create a new hardware context.  Using a hardware context means that
>         * our GPU state will be saved/restored on context switch, allowing us
> @@ -723,6 +717,12 @@ brwCreateContext(gl_api api,
>        }
>     }
>  
> +   brw_init_state(brw);
> +
> +   intelInitExtensions(ctx);
> +
> +   intel_fbo_init(brw);
> +
>     brw_init_surface_formats(brw);
>  
>     if (brw->is_g4x || brw->gen >= 5) {
Re: [Mesa-dev] [PATCH 1/3] util: Add util_bswap64()
On Wed, 2014-02-19 at 15:09 -0800, Tom Stellard wrote:
> ---
>  src/gallium/auxiliary/util/u_math.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/src/gallium/auxiliary/util/u_math.h b/src/gallium/auxiliary/util/u_math.h
> index b5e0663..49f8bda 100644
> --- a/src/gallium/auxiliary/util/u_math.h
> +++ b/src/gallium/auxiliary/util/u_math.h
> @@ -741,6 +741,16 @@ util_bswap32(uint32_t n)
>  #endif
>  }
>  
> +/**
> + * Reverse byte order of a 64bit word.
> + */
> +static INLINE uint64_t
> +util_bswap64(uint64_t n)
> +{

Please use __builtin_bswap64() when available, as per util_bswap32().

> +   return ((uint64_t)util_bswap32(n & 0x) << 32) |

There is no need for & 0x.

> +          util_bswap32((n >> 32));
> +}

-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
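Both review comments can be illustrated with a small standalone sketch. The names below are hypothetical and this is not the u_math.h code; it assumes a GCC- or Clang-compatible compiler for the builtin path and falls back to portable shifts otherwise. The explicit mask on the low half is unnecessary because the conversion to uint32_t already truncates.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word. */
static uint32_t
bswap32_sketch(uint32_t n)
{
#if defined(__GNUC__) || defined(__clang__)
   return __builtin_bswap32(n);
#else
   return (n >> 24) |
          ((n >> 8) & 0x0000ff00u) |
          ((n << 8) & 0x00ff0000u) |
          (n << 24);
#endif
}

/* Reverse the byte order of a 64-bit word. */
static uint64_t
bswap64_sketch(uint64_t n)
{
#if defined(__GNUC__) || defined(__clang__)
   return __builtin_bswap64(n);
#else
   /* Swap each half and exchange them; the casts drop the unwanted bits,
    * so no mask is required. */
   return ((uint64_t)bswap32_sketch((uint32_t)n) << 32) |
          bswap32_sketch((uint32_t)(n >> 32));
#endif
}
```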
Re: [Mesa-dev] [PATCH 3/3] clover: Pass buffer offsets to the driver in set_global_binding() v2
On Thu, 2014-02-20 at 00:53 +0100, Francisco Jerez wrote:
> Tom Stellard <thomas.stell...@amd.com> writes:
>
>> diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
>> index 70efe5c..efd7143 100644
>> --- a/src/gallium/drivers/r600/evergreen_compute.c
>> +++ b/src/gallium/drivers/r600/evergreen_compute.c
>> @@ -662,10 +662,18 @@ static void evergreen_set_global_binding(
>>  
>>  	for (int i = 0; i < n; i++)
>>  	{
>> +		uint32_t buffer_offset;
>> +		uint32_t handle;
>>  		assert(resources[i]->target == PIPE_BUFFER);
>>  		assert(resources[i]->bind & PIPE_BIND_GLOBAL);
>>  
>> -		*(handles[i]) = buffers[i]->chunk->start_in_dw * 4;
>> +		buffer_offset = util_le32_to_cpu(*(handles[i]));
>> +		handle = buffer_offset + buffers[i]->chunk->start_in_dw * 4;
>> +		if (R600_BIG_ENDIAN) {
>> +			handle = util_bswap32(handle);
>> +		}
>> +
>> +		*(handles[i]) = handle;
>
> I guess you could just do *(handles[i]) = util_cpu_to_le32(handle)?
> Oh, right, there isn't such a function -- though it would be trivial to
> implement.

Right, just add:

#define util_cpu_to_le64 util_le64_to_cpu [0]
#define util_cpu_to_le32 util_le32_to_cpu
#define util_cpu_to_le16 util_le16_to_cpu

[0] and add util_le64_to_cpu in the first place :)

-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
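The reason the #define trick works is that converting between CPU order and little-endian is the same operation in both directions: a no-op on little-endian hosts and a byte swap on big-endian ones. The sketch below illustrates this with hypothetical names (it is not the Gallium util code) and shows the handle update from the patch written with the conversion helpers instead of an explicit endianness check.

```c
#include <stdint.h>
#include <string.h>

/* Interpret the in-memory bytes of v as a little-endian 32-bit value.
 * On a little-endian host this returns v unchanged; on a big-endian host
 * it is a byte swap.  No compile-time endianness macro is needed. */
static uint32_t
le32_to_cpu_sketch(uint32_t v)
{
   unsigned char b[4];

   memcpy(b, &v, sizeof(b));
   return (uint32_t)b[0] |
          ((uint32_t)b[1] << 8) |
          ((uint32_t)b[2] << 16) |
          ((uint32_t)b[3] << 24);
}

/* The reverse conversion is the same operation, as suggested in the thread. */
#define cpu_to_le32_sketch le32_to_cpu_sketch

/* Hypothetical helper: add the buffer's base (in bytes) to an offset that
 * is stored in little-endian form, and store the result back the same way. */
static uint32_t
patch_handle_sketch(uint32_t stored_le, uint32_t start_in_dw)
{
   uint32_t buffer_offset = le32_to_cpu_sketch(stored_le);
   uint32_t handle = buffer_offset + start_in_dw * 4;

   return cpu_to_le32_sketch(handle);
}
```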
Re: [Mesa-dev] [PATCH 2/3] radeonsi: Use SI_BIG_ENDIAN now that it exists
On Wed, 2014-02-19 at 15:09 -0800, Tom Stellard wrote:
> ---
>  src/gallium/drivers/radeonsi/si_shader.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> index 54270cd..9fed751 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -2333,7 +2333,7 @@ int si_compile_llvm(struct si_context *sctx, struct si_pipe_shader *shader,
>  	}
>  
>  	ptr = (uint32_t*)sctx->b.ws->buffer_map(shader->bo->cs_buf, sctx->b.rings.gfx.cs, PIPE_TRANSFER_WRITE);
> -	if (0 /*SI_BIG_ENDIAN*/) {
> +	if (SI_BIG_ENDIAN) {
>  		for (i = 0; i < binary.code_size / 4; ++i) {
>  			ptr[i] = util_bswap32(*(uint32_t*)(binary.code + i*4));
>  		}

I would prefer using util_cpu_to_le32() here as well.

-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
Re: [Mesa-dev] [PATCH 11/13] i965: Enable smooth points when multisampling without point sprites.
On 19.02.2014 11:04, Kenneth Graunke wrote:
> According to the "Point Multisample Rasterization" section of the OpenGL
> specification (3.0 or later), smooth points are supposed to be enabled
> implicitly when multisampling, regardless of the GL_POINT_SMOOTH flag.
>
> However, if GL_POINT_SPRITE is enabled, you get square points no matter
> what. Core contexts always enable point sprites, so this effectively
> makes smooth points go away, even in the case of multisampling.
>
> Fixes Piglit's EXT_framebuffer_multisample/point-smooth tests.
> (Yes, that's right folks, we actually have Piglit tests for this.)
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
> ---
>  src/mesa/drivers/dri/i965/gen8_sf_state.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/gen8_sf_state.c b/src/mesa/drivers/dri/i965/gen8_sf_state.c
> index b31b17e..0693fee 100644
> --- a/src/mesa/drivers/dri/i965/gen8_sf_state.c
> +++ b/src/mesa/drivers/dri/i965/gen8_sf_state.c
> @@ -139,8 +139,11 @@ upload_sf(struct brw_context *brw)
>     if (!(ctx->VertexProgram.PointSizeEnabled || ctx->Point._Attenuated))
>        dw3 |= GEN6_SF_USE_STATE_POINT_WIDTH;
>  
> -   if (ctx->Point.SmoothFlag)
> +   /* _NEW_POINT | _NEW_MULTISAMPLE */
> +   if ((ctx->Point.SmoothFlag || ctx->Multisample._Enabled) &&
> +       !ctx->Point.PointSprite) {
>        dw3 |= GEN8_SF_SMOOTH_POINT_ENABLE;
> +   }
>  
>     dw3 |= GEN6_SF_LINE_AA_MODE_TRUE;
>  
> @@ -166,6 +169,7 @@ const struct brw_tracked_state gen8_sf_state = {
>        .mesa  = _NEW_LIGHT |
>                 _NEW_PROGRAM |
>                 _NEW_LINE |
> +               _NEW_MULTISAMPLE |
>                 _NEW_POINT,
>        .brw   = BRW_NEW_CONTEXT,
>        .cache = 0,

Wow, your hw can rasterize round points directly? At least ten years too
late to be useful, but that's a slick feature!

In any case the logic looks right to me. I don't know much about the hw;
do you need to set the matching raster bit
(GEN8_RASTER_SMOOTH_POINT_ENABLE) for this too?

Roland
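The condition introduced by the patch reduces to a three-input predicate, which may be easier to check in isolation. The function below is a hypothetical standalone restatement, not the driver code; its parameters stand in for ctx->Point.SmoothFlag, ctx->Multisample._Enabled and ctx->Point.PointSprite.

```c
#include <stdbool.h>

/* Smooth points are requested either explicitly or implicitly by
 * multisampling, but point sprites always win and give square points. */
static bool
smooth_points_enabled(bool smooth_flag, bool multisample_enabled,
                      bool point_sprite)
{
   return (smooth_flag || multisample_enabled) && !point_sprite;
}
```

In a core context, where point sprites are always enabled, this predicate is false for every combination of the other two inputs, which matches the behavior described in the commit message.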
[Mesa-dev] [Bug 71870] Metro: Last Light rendering issues
https://bugs.freedesktop.org/show_bug.cgi?id=71870

Tapani Pälli <lem...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |VERIFIED

--- Comment #46 from Tapani Pälli <lem...@gmail.com> ---
I've verified with git master that issues 1, 3 and 4 are gone. 2 and 5
still hold but let's handle those separately from this bug.

-- 
You are receiving this mail because:
You are the assignee for the bug.
Re: [Mesa-dev] [PATCH-RFC] i965: do not advertise MESA_FORMAT_Z_UNORM16 support
On Thu, Feb 20, 2014 at 7:03 AM, Kenneth Graunke kenn...@whitecape.org wrote: On 02/19/2014 02:27 PM, Ian Romanick wrote: On 02/19/2014 12:08 PM, Kenneth Graunke wrote: On 02/18/2014 09:48 PM, Chia-I Wu wrote: Since 73bc6061f5c3b6a3bb7a8114bb2e1ab77d23cfdb, Z16 support is not advertised for OpenGL ES contexts due to the terrible performance. It is still enabled for desktop GL because it was believed GL 3.0+ requires Z16. It turns out only GL 3.0 requires Z16, and that is corrected in later GL versions. In light of that, and per Ian's suggestion, stop advertising Z16 support by default, and add a drirc option, gl30_sized_format_rules, so that users can override. I actually don't think that GL 3.0 requires Z16, either. In glspec30.20080923.pdf, page 180, it says: [...] memory allocation per texture component is assigned by the GL to match the allocations listed in tables 3.16-3.18 as closely as possible. [...] Required Texture Formats [...] In addition, implementations are required to support the following sized internal formats. Requesting one of these internal formats for any texture type will allocate exactly the internal component sizes and types shown for that format in tables 3.16-3.17: Notably, however, GL_DEPTH_COMPONENT16 does /not/ appear in table 3.16 or table 3.17. It appears in table 3.18, where the exact rule doesn't apply, and thus we fall back to the closely as possible rule. The confusing part is that the ordering of the tables in the PDF is: Table 3.16 (pages 182-184) Table 3.18 (bottom of page 184 to top of 185) Table 3.17 (page 185) I'm guessing that people saw table 3.16, then saw the one after with DEPTH_COMPONENT* formats, and assumed it was 3.17. But it's not. Yay latex! Thank you for putting things in random order because it fit better. :( I think we should just drop Z16 support entirely, and I think we should remove the requirement from the Piglit test. 
If the test is wrong, and it sounds like it is, then I'm definitely in favor of changing it. The reason to have Z16 is low-bandwidth GPUs in resource constrained environments. If an app specifically asks for Z16, then there's a non-zero (though possibly infinitesimal) probability they're doing it for a reason. For at least some platforms, isn't there just a work-around to implement to fix the performance issue? Doesn't the performance issue only affect some platforms to begin with? Maybe just change the check to ctx-TextureFormatSupported[MESA_FORMAT_Z_UNORM16] = ! platform has z16 performance issues; Currently, all platforms have Z16 performance issues. On Haswell and later, we could potentially implement the PMA stall optimization, which I believe would reduce(?) the problem. I'm not sure if it would eliminate it though. I think the best course of action is: 1. Fix the Piglit test to not require precise depth formats. 2. Disable Z16 on all generations. 3. Add a to do item for implementing the HSW+ PMA stall optimization. 4. Add a to do item for re-evaluating Z16 on HSW+ once that's done. I've sent a fix for the piglit test. What is PMA stall optimization? I could not find any reference to it in the public docs. --Ken -- o...@lunarg.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-stable] [PATCH] i965: Create a hardware context before initializing state module.
On 02/19/2014 06:05 PM, Ian Romanick wrote:
> On 02/19/2014 05:40 PM, Kenneth Graunke wrote:
>> brw_init_state() calls brw_upload_initial_gpu_state(). If hardware
>> contexts are enabled (brw->hw_ctx != NULL), this will upload some
>> initial invariant state for the GPU. Without hardware contexts, we rely
>> on this state being uploaded via atoms that subscribe to the
>> BRW_NEW_CONTEXT bit.
>>
>> Commit 46d3c2bf4ddd227193b98861f1e632498fe547d8 accidentally moved the
>> call to brw_init_state() before creating a hardware context. This meant
>> brw_upload_initial_gpu_state would always early return. Except on Gen6+,
>> we stopped uploading the initial GPU state via state atoms, so it never
>> happened.
>>
>> Fixes a regression since 46d3c2bf4ddd227193b98861f1e632498fe547d8.
>
> This seems like a rational change... but why didn't 46d3c2b blow up the
> world on IVB and HSW? ...and only cause heisenbugs on BDW?

Presumably because it doesn't do much. On Gen6+, all it does is:
- PIPELINE_SELECT (probably already render, unless you're doing
  media/gpgpu)
- STATE_SIP (basically only matters if you hit breakpoints or invalid
  operations)
- VF_STATISTICS (we don't use the counters anyway)

On Broadwell, it also uploads 3DSTATE_SAMPLE_PATTERN, which meant that
any Piglit test that relied on legitimate sample positions would fail.
That is, until I ran with a branch that actually emitted
3DSTATE_SAMPLE_PATTERN---after that, the sample positions remained
programmed, and tests continued to work fine until reboot.

>> Cc: "10.0 10.1" <mesa-sta...@lists.freedesktop.org>
>> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
>
> Either way,
>
> Reviewed-by: Ian Romanick <ian.d.roman...@intel.com>
[Mesa-dev] [PATCH] nv50: enable txg where supported
Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- This applies on top of Dave Airlie's r600g-texture-gather branch. Ran piglit with -t gather, passed all 1057 tests. Can't say I fully understand what all the arguments to handleTEX in the Coverter are but... seems to work. Will probably require some care for nvc0 support which should have SM5 caps. src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 3 ++- src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 4 src/gallium/drivers/nouveau/nv50/nv50_screen.c| 3 ++- 3 files changed, 8 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp index bef103f..e2f93bb 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp @@ -1447,7 +1447,7 @@ CodeEmitterNV50::emitTEX(const TexInstruction *i) code[0] |= 0x0100; break; case OP_TXG: - code[0] = 0x0100; + code[0] |= 0x0100; code[1] = 0x8000; break; default: @@ -1790,6 +1790,7 @@ CodeEmitterNV50::emitInstruction(Instruction *insn) case OP_TXB: case OP_TXL: case OP_TXF: + case OP_TXG: emitTEX(insn-asTex()); break; case OP_TXQ: diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index d226d0c..ccddb9a 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -558,6 +558,7 @@ static nv50_ir::operation translateOpcode(uint opcode) NV50_IR_OPCODE_CASE(SAD, SAD); NV50_IR_OPCODE_CASE(TXF, TXF); NV50_IR_OPCODE_CASE(TXQ, TXQ); + NV50_IR_OPCODE_CASE(TG4, TXG); NV50_IR_OPCODE_CASE(EMIT, EMIT); NV50_IR_OPCODE_CASE(ENDPRIM, RESTART); @@ -2434,6 +2435,9 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn) case TGSI_OPCODE_TXD: handleTEX(dst0, 3, 3, 0x03, 0x0f, 0x10, 0x20); break; + case TGSI_OPCODE_TG4: + 
handleTEX(dst0, 2, 2, 0x03, 0x0f, 0x00, 0x00); + break; case TGSI_OPCODE_TEX2: handleTEX(dst0, 2, 2, 0x03, 0x10, 0x00, 0x00); break; diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c index 488642a..9aafe94 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c @@ -193,11 +193,12 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_ENDIANNESS: return PIPE_ENDIAN_LITTLE; case PIPE_CAP_TGSI_VS_LAYER: - case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: case PIPE_CAP_TEXTURE_GATHER_SM5: return 0; case PIPE_CAP_MAX_VIEWPORTS: return NV50_MAX_VIEWPORTS; + case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: + return (class_3d = NVA3_3D_CLASS) ? 4 : 0; default: NOUVEAU_ERR(unknown PIPE_CAP %d\n, param); return 0; -- 1.8.3.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev