Re: [Mesa-dev] [PATCH 5/8] i965 gen6: Initial implementation of transform feedback.

2011-12-19 Thread Paul Berry
On 17 December 2011 11:50, Eric Anholt e...@anholt.net wrote:

 On Tue, 13 Dec 2011 15:35:18 -0800, Paul Berry stereotype...@gmail.com
 wrote:
  diff --git a/src/mesa/drivers/dri/i965/brw_gs.c
 b/src/mesa/drivers/dri/i965/brw_gs.c
  index f5d5898..c323a73 100644
  --- a/src/mesa/drivers/dri/i965/brw_gs.c
  +++ b/src/mesa/drivers/dri/i965/brw_gs.c
  @@ -183,7 +183,31 @@ static void populate_key( struct brw_context *brw,
  } else if (intel-gen == 6) {
 /* On Gen6, GS is used for transform feedback. */
 /* _NEW_TRANSFORM_FEEDBACK */
  -  key-need_gs_prog = ctx-TransformFeedback.CurrentObject-Active;
  +  if (ctx-TransformFeedback.CurrentObject-Active) {
  + const struct gl_shader_program *shaderprog =
  +ctx-Shader.CurrentVertexProgram;
  + const struct gl_transform_feedback_info *linked_xfb_info =
  +shaderprog-LinkedTransformFeedback;
  + int i;
  +
  + /* Make sure that the VUE slots won't overflow the unsigned
 chars in
  +  * key-transform_feedback_bindings[].
  +  */
  + assert (BRW_VERT_RESULT_MAX = 256);

 We generally don't put a space between assert and the opening paren.


I prefer to put a space before the opening paren to highlight whenever the
thing I am doing is not a function call, as a reminder that flow control or
argument evaluation may not be as expected--we follow this convention
fairly consistently in i965 for if, for, switch, and while.  I think we
should do it for assert as well, since assert has both flow control
implications (it may abort) and argument evaluation implications (it won't
evaluate its argument under non-debug builds).



 Also, this assert is really weird to me.  Neither of the two fields in
 the key are related to 256 -- there's a 7-bit field, and a
 BRW_MAX_SOL_BINDINGS field.  (Also, for compile-time checks like this,
 we now have STATIC_ASSERT.  Hooray!)


The BRW_MAX_SOL_BINDINGS field (key-transform_feedback_bindings[]) is
related to 256 because it's an array of unsigned chars.  Each entry in the
array of unsigned chars stores a gl_vert_result, so the assertion is to
make sure that all possible gl_vert_result values can fit in an unsigned
char without overflow.  In retrospect, the assertion should read
(VERT_RESULT_MAX = 256), since the brw-specific extensions to
gl_vert_result will never appear in key-transform_feedback_bindings[].
I'll fix that and change the assertion to a STATIC_ASSERT.



  @@ -107,12 +119,27 @@ static void brw_gs_overwrite_header_dw2(struct
 brw_gs_compile *c,
* of DWORD 2.  URB_WRITE messages need the primitive type in bits 6:2
 of
* DWORD 2.  So this function extracts the primitive type field,
 bitshifts it
* appropriately, and stores it in c-reg.header.
  + *
  + * Also, if num_verts is 3, it converts primitive type
  + * _3DPRIM_TRISTRIP_REVERSE to _3DPRIM_TRISTRIP, so that odd numbered
  + * triangles in a triangle strip will render correctly.
*/
  -static void brw_gs_overwrite_header_dw2_from_r0(struct brw_gs_compile
 *c)
  +static void brw_gs_overwrite_header_dw2_from_r0(struct brw_gs_compile
 *c,
  +unsigned num_verts)
   {
  struct brw_compile *p = c-func;
  brw_AND(p, get_element_ud(c-reg.header, 2),
 get_element_ud(c-reg.R0, 2),
  brw_imm_ud(0x1f));
  +   if (num_verts == 3) {
  +  brw_CMP(p, vec1(brw_null_reg()), BRW_CONDITIONAL_EQ,
  +  get_element_ud(c-reg.header, 2),
  +  brw_imm_ud(_3DPRIM_TRISTRIP_REVERSE));
  +  {
  + brw_MOV(p, get_element_ud(c-reg.header, 2),
  + brw_imm_ud(_3DPRIM_TRISTRIP));
  + brw_set_predicate_control(p, BRW_PREDICATE_NONE);
  +  }
  +   }

 This doesn't mess up normal rendering of tristrips in any way?  It looks
 like it could be a separate commit to me.


It doesn't mess up normal rendering of tristrips because we also set the
GEN6_GS_REORDER bit of gen6_gs_state.c--this causes alternate triangles in
a tristrip to be reordered so that they all have the same orientation, so
in the geometry shader we need to map _3DPRIM_TRISTRIP_REVERSE to
_3DPRIM_TRISTRIP to prevent future pipeline stages from trying to undo the
reordering.

Having said that, though, I'm going to revert this change, because the
reordering performed by GEN6_GS_REORDER is not the reordering we need for
transform feedback.  For a tristrip with vertices A, B, C, D, and E (in
that order), the hardware normally generates triangles in the order ABC,
BCD, CDE (which is wrong because BCD is improperly oriented).  With
reordering it generates ABC, BDC, CDE.  However, to ensure that the
transform feedback output will render properly when flatshading, we need
the order to be ABC, CBD, CDE.

So consider this part of the patch NAK'd.  I want to wait until we are
implementing transform feedback on Gen7 to fix ordering issues anyhow
(since on Gen7 we have less control over the ordering, and 

Re: [Mesa-dev] [PATCH 5/8] i965 gen6: Initial implementation of transform feedback.

2011-12-19 Thread Paul Berry
On 16 December 2011 19:50, Kenneth Graunke kenn...@whitecape.org wrote:

 On 12/13/2011 03:35 PM, Paul Berry wrote:
 [snip]
  +static void
  +brw_update_sol_surface(struct brw_context *brw, drm_intel_bo *bo,
  +   uint32_t *out_offset, unsigned
 num_vector_components,
  +   unsigned stride_dwords, unsigned offset_dwords,
  +   uint32_t buffer_size_minus_1)
  +{
  +   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE, 6 *
 4, 32,
  +out_offset);
  +   uint32_t width = buffer_size_minus_1  0x7f;
  +   uint32_t height = (buffer_size_minus_1  0xfff80)  7;
  +   uint32_t depth = (buffer_size_minus_1  0x7f0)  20;
  +   uint32_t pitch_minus_1 = 4*stride_dwords - 1;
  +   uint32_t surface_format;
  +   uint32_t offset_bytes = 4 * offset_dwords;
  +   switch (num_vector_components) {
  +   case 1:
  +  surface_format = BRW_SURFACEFORMAT_R32_FLOAT;
  +  break;
  +   case 2:
  +  surface_format = BRW_SURFACEFORMAT_R32G32_FLOAT;
  +  break;
  +   case 3:
  +  surface_format = BRW_SURFACEFORMAT_R32G32B32_FLOAT;
  +  break;
  +   case 4:
  +  surface_format = BRW_SURFACEFORMAT_R32G32B32A32_FLOAT;
  +  break;
  +   default:
  +  assert (!Invalid vector size for transform feedback output);
  +  surface_format = BRW_SURFACEFORMAT_R32_FLOAT;
  +  break;
  +   }

 Is it possible to have integer transform feedback outputs?  If so, we'd
 need to adjust this.  But we can do that later (if at all).


Yes, integer transform feedback outputs are possible, but I believe that it
won't be necessary to adjust this because the data port doesn't do any
format conversions on streamed vertex buffer write messages.  But I'll
make sure we have the necessary piglit tests to verify this, and if I'm
wrong I'll do a follow-up patch.



 Looks good, Paul.

 Reviewed-by: Kenneth Graunke kenn...@whitecape.org

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/8] i965 gen6: Initial implementation of transform feedback.

2011-12-17 Thread Eric Anholt
On Tue, 13 Dec 2011 15:35:18 -0800, Paul Berry stereotype...@gmail.com wrote:
 diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
 b/src/mesa/drivers/dri/i965/brw_gs.c
 index f5d5898..c323a73 100644
 --- a/src/mesa/drivers/dri/i965/brw_gs.c
 +++ b/src/mesa/drivers/dri/i965/brw_gs.c
 @@ -183,7 +183,31 @@ static void populate_key( struct brw_context *brw,
 } else if (intel-gen == 6) {
/* On Gen6, GS is used for transform feedback. */
/* _NEW_TRANSFORM_FEEDBACK */
 -  key-need_gs_prog = ctx-TransformFeedback.CurrentObject-Active;
 +  if (ctx-TransformFeedback.CurrentObject-Active) {
 + const struct gl_shader_program *shaderprog =
 +ctx-Shader.CurrentVertexProgram;
 + const struct gl_transform_feedback_info *linked_xfb_info =
 +shaderprog-LinkedTransformFeedback;
 + int i;
 +
 + /* Make sure that the VUE slots won't overflow the unsigned chars in
 +  * key-transform_feedback_bindings[].
 +  */
 + assert (BRW_VERT_RESULT_MAX = 256);

We generally don't put a space between assert and the opening paren.

Also, this assert is really weird to me.  Neither of the two fields in
the key are related to 256 -- there's a 7-bit field, and a
BRW_MAX_SOL_BINDINGS field.  (Also, for compile-time checks like this,
we now have STATIC_ASSERT.  Hooray!)

 @@ -107,12 +119,27 @@ static void brw_gs_overwrite_header_dw2(struct 
 brw_gs_compile *c,
   * of DWORD 2.  URB_WRITE messages need the primitive type in bits 6:2 of
   * DWORD 2.  So this function extracts the primitive type field, bitshifts it
   * appropriately, and stores it in c-reg.header.
 + *
 + * Also, if num_verts is 3, it converts primitive type
 + * _3DPRIM_TRISTRIP_REVERSE to _3DPRIM_TRISTRIP, so that odd numbered
 + * triangles in a triangle strip will render correctly.
   */
 -static void brw_gs_overwrite_header_dw2_from_r0(struct brw_gs_compile *c)
 +static void brw_gs_overwrite_header_dw2_from_r0(struct brw_gs_compile *c,
 +unsigned num_verts)
  {
 struct brw_compile *p = c-func;
 brw_AND(p, get_element_ud(c-reg.header, 2), get_element_ud(c-reg.R0, 2),
 brw_imm_ud(0x1f));
 +   if (num_verts == 3) {
 +  brw_CMP(p, vec1(brw_null_reg()), BRW_CONDITIONAL_EQ,
 +  get_element_ud(c-reg.header, 2),
 +  brw_imm_ud(_3DPRIM_TRISTRIP_REVERSE));
 +  {
 + brw_MOV(p, get_element_ud(c-reg.header, 2),
 + brw_imm_ud(_3DPRIM_TRISTRIP));
 + brw_set_predicate_control(p, BRW_PREDICATE_NONE);
 +  }
 +   }

This doesn't mess up normal rendering of tristrips in any way?  It looks
like it could be a separate commit to me.

 diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
 b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
 index f9b0b71..4ec9ef5 100644
 --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
 +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c

 +/**
 + * Set up a binding table entry for use by stream output logic (transform
 + * feedback).
 + *
 + * buffer_size_minus_1 must me less than BRW_MAX_NUM_BUFFER_ENTRIES.
 + */
 +static void
 +brw_update_sol_surface(struct brw_context *brw, drm_intel_bo *bo,
 +   uint32_t *out_offset, unsigned num_vector_components,
 +   unsigned stride_dwords, unsigned offset_dwords,
 +   uint32_t buffer_size_minus_1)
 +{
 +   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE, 6 * 4, 32,
 +out_offset);
 +   uint32_t width = buffer_size_minus_1  0x7f;
 +   uint32_t height = (buffer_size_minus_1  0xfff80)  7;
 +   uint32_t depth = (buffer_size_minus_1  0x7f0)  20;
 +   uint32_t pitch_minus_1 = 4*stride_dwords - 1;
 +   uint32_t surface_format;
 +   uint32_t offset_bytes = 4 * offset_dwords;

Could I get just a little more whitespace in your code?  Between the
declaration and the switch, after the switch, etc.?

 +   surf[1] = bo-offset + offset_bytes; /* reloc */
 +   surf[2] = (width  BRW_SURFACE_WIDTH_SHIFT |
 +   height  BRW_SURFACE_HEIGHT_SHIFT);
 +   surf[3] = (depth  BRW_SURFACE_DEPTH_SHIFT |
 +  pitch_minus_1  BRW_SURFACE_PITCH_SHIFT);

This looks to me like the hardware width/heigth/depth fields are ending
up programmed as 1 larger than you intend.  Compare to
brw_create_constant_surface().

 +   /* Emit relocation to surface contents.  Section 5.1.1 of the gen4
 +* bspec (Data Cache) says that the data cache does not exist as
 +* a separate cache and is just the sampler cache.
 +*/
 +   drm_intel_bo_emit_reloc(brw-intel.batch.bo,
 +*out_offset + 4,
 +bo, offset_bytes,
 +I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER);
 +}

If it's the sampler cache, that would be (I915_GEM_DOMAIN_SAMPLER, 0).
But it obviously isn't, since we're reading and writing.

 diff --git 

Re: [Mesa-dev] [PATCH 5/8] i965 gen6: Initial implementation of transform feedback.

2011-12-16 Thread Kenneth Graunke
On 12/13/2011 03:35 PM, Paul Berry wrote:
[snip]
 +static void
 +brw_update_sol_surface(struct brw_context *brw, drm_intel_bo *bo,
 +   uint32_t *out_offset, unsigned num_vector_components,
 +   unsigned stride_dwords, unsigned offset_dwords,
 +   uint32_t buffer_size_minus_1)
 +{
 +   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE, 6 * 4, 32,
 +out_offset);
 +   uint32_t width = buffer_size_minus_1  0x7f;
 +   uint32_t height = (buffer_size_minus_1  0xfff80)  7;
 +   uint32_t depth = (buffer_size_minus_1  0x7f0)  20;
 +   uint32_t pitch_minus_1 = 4*stride_dwords - 1;
 +   uint32_t surface_format;
 +   uint32_t offset_bytes = 4 * offset_dwords;
 +   switch (num_vector_components) {
 +   case 1:
 +  surface_format = BRW_SURFACEFORMAT_R32_FLOAT;
 +  break;
 +   case 2:
 +  surface_format = BRW_SURFACEFORMAT_R32G32_FLOAT;
 +  break;
 +   case 3:
 +  surface_format = BRW_SURFACEFORMAT_R32G32B32_FLOAT;
 +  break;
 +   case 4:
 +  surface_format = BRW_SURFACEFORMAT_R32G32B32A32_FLOAT;
 +  break;
 +   default:
 +  assert (!Invalid vector size for transform feedback output);
 +  surface_format = BRW_SURFACEFORMAT_R32_FLOAT;
 +  break;
 +   }

Is it possible to have integer transform feedback outputs?  If so, we'd
need to adjust this.  But we can do that later (if at all).

Looks good, Paul.

Reviewed-by: Kenneth Graunke kenn...@whitecape.org
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/8] i965 gen6: Initial implementation of transform feedback.

2011-12-13 Thread Paul Berry
This patch adds basic transform feedback capability for Gen6 hardware.
This consists of several related pieces of functionality:

(1) In gen6_sol.c, we set up binding table entries for use by
transform feedback.  We use one binding table entry per transform
feedback varying (this allows us to avoid doing pointer arithmetic in
the shader, since we can set up the binding table entries with the
appropriate offsets and surface pitches to place each varying at the
correct address).

(2) In brw_context.c, we advertise the hardware capabilities, which
are as follows:

   MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS 64
   MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS4
   MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS16

OpenGL 3.0 requires these values to be at least 64, 4, and 4,
respectively.  The reason we advertise a larger value than required
for MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS is that we have already
set aside 64 binding table entries, so we might as well make them all
available in both separate attribs and interleaved modes.

(3) We set aside a single SVBI (streamed vertex buffer index) for
use by transform feedback.  The hardware supports four independent
SVBI's, but we only need one, since vertices are added to all
transform feedback buffers at the same rate.  Note: at the moment this
index is reset to 0 only when the driver is initialized.  It needs to
be reset to 0 whenever BeginTransformFeedback() is called, and
otherwise preserved.

(4) In brw_gs_emit.c and brw_gs.c, we modify the geometry shader
program to output transform feedback data as a side effect.

(5) In gen6_gs_state.c, we configure the geometry shader stage to
handle the SVBI pointer correctly.
---
 src/mesa/drivers/dri/i965/Makefile.sources|1 +
 src/mesa/drivers/dri/i965/brw_context.c   |   20 
 src/mesa/drivers/dri/i965/brw_context.h   |   42 -
 src/mesa/drivers/dri/i965/brw_defines.h   |6 +
 src/mesa/drivers/dri/i965/brw_eu.h|7 ++
 src/mesa/drivers/dri/i965/brw_eu_emit.c   |   39 +++
 src/mesa/drivers/dri/i965/brw_gs.c|   26 +-
 src/mesa/drivers/dri/i965/brw_gs.h|   20 
 src/mesa/drivers/dri/i965/brw_gs_emit.c   |  112 +++--
 src/mesa/drivers/dri/i965/brw_misc_state.c|2 +-
 src/mesa/drivers/dri/i965/brw_state.h |1 +
 src/mesa/drivers/dri/i965/brw_state_upload.c  |1 +
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c  |   61 +++
 src/mesa/drivers/dri/i965/gen6_gs_state.c |9 ++-
 src/mesa/drivers/dri/i965/gen6_sol.c  |  102 +++
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |1 +
 src/mesa/drivers/dri/intel/intel_context.h|7 ++
 17 files changed, 445 insertions(+), 12 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/gen6_sol.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index cd6a8f4..e50f9c3 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -93,6 +93,7 @@ i965_C_SOURCES := \
gen6_sampler_state.c \
gen6_scissor_state.c \
gen6_sf_state.c \
+gen6_sol.c \
gen6_urb.c \
gen6_viewport_state.c \
gen6_vs_state.c \
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index faa02bf..eb68bf4 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -178,6 +178,26 @@ brwCreateContext(int api,

ctx-Const.MaxTextureMaxAnisotropy = 16.0;
 
+   /* Hardware only supports a limited number of transform feedback buffers.
+* So we need to override the Mesa default (which is based only on software
+* limits).
+*/
+   ctx-Const.MaxTransformFeedbackSeparateAttribs = BRW_MAX_SOL_BUFFERS;
+
+   /* On Gen6, in the worst case, we use up one binding table entry per
+* transform feedback component (see comments above the definition of
+* BRW_MAX_SOL_BINDINGS, in brw_context.h), so we need to advertise a value
+* for MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS equal to
+* BRW_MAX_SOL_BINDINGS.
+*
+* In separate components mode, we need to divide this value by
+* BRW_MAX_SOL_BUFFERS, so that the total number of binding table entries
+* used up by all buffers will not exceed BRW_MAX_SOL_BINDINGS.
+*/
+   ctx-Const.MaxTransformFeedbackInterleavedComponents = BRW_MAX_SOL_BINDINGS;
+   ctx-Const.MaxTransformFeedbackSeparateComponents =
+  BRW_MAX_SOL_BINDINGS / BRW_MAX_SOL_BUFFERS;
+
/* if conformance mode is set, swrast can handle any size AA point */
ctx-Const.MaxPointSizeAA = 255.0;
 
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 70a45c7..da1de02 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++