Re: [Mesa-dev] [PATCH 08/10] i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.

2018-05-08 Thread Scott D Phillips
Kenneth Graunke  writes:

> We'd like to start using soft-pin to assign BO addresses up front, and
> never move them again.  Our previous plan for dealing with 48-bit VF
> cache bugs was to relocate vertex buffers to the low 4GB, so we'd never
> have addresses that alias in the low 32 bits.  But that requires moving
> buffers dynamically.
>
> This patch tracks the last seen BO address for each vertex/index buffer,
> and emits a VF cache invalidate if the high bits change.  (Ideally, we
> won't hit this case very often.)  This should work for the soft-pin
> case, but unfortunately won't work in the relocation case, as we don't
> actually know the addresses.  So, we have to use both methods.
> ---
>  src/mesa/drivers/dri/i965/brw_context.h   |  6 ++
>  src/mesa/drivers/dri/i965/genX_state_upload.c | 62 +++
>  2 files changed, 68 insertions(+)
>
> Migration is nice at times, but keeping everything static is also really
> nice for pre-baking states...hard to know exactly what to do here.  It
> would be nice if we could allocate all GL buffer objects that might be
> VBOs out of the same 4GB, but that's...difficult to know, and might be
> too limiting.  *shrug*
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index fb637e22281..7d6aa1a9c51 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -983,6 +983,9 @@ struct brw_context
>  
>/* For the initial pushdown, keep the list of vbo inputs. */
>struct vbo_inputs draw_arrays;
> +
> +  /* High bits of the last seen vertex buffer address (for workarounds). 
> */
> +  uint16_t last_bo_high_bits[33];

Maybe it's time to add a #define for 33 being the maximum number of
vertex buffers.

> } vb;
>  
> struct {
> @@ -1003,6 +1006,9 @@ struct brw_context
> * referencing the same index buffer.
> */
>unsigned int start_vertex_offset;
> +
> +  /* High bits of the last seen index buffer address (for workarounds). 
> */
> +  uint16_t last_bo_high_bits;
> } ib;
>  
> /* Active vertex program:
> diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c 
> b/src/mesa/drivers/dri/i965/genX_state_upload.c
> index b1867c1a1cc..e517b91de93 100644
> --- a/src/mesa/drivers/dri/i965/genX_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
> @@ -480,6 +480,64 @@ upload_format_size(uint32_t upload_format)
> }
>  }
>  
> +static UNUSED uint16_t
> +pinned_bo_high_bits(struct brw_bo *bo)
> +{
> +   return (bo->kflags & EXEC_OBJECT_PINNED) ? bo->gtt_offset >> 32ull : 0;
> +}
> +
> +/* The VF cache designers apparently cut corners, and made the cache
> + * only consider the bottom 32 bits of memory addresses.

You mentioned it in the thread here, but I think it's important to
mention in the comment too that the cache is considering the tuple of
the bottom 32 bits and the vertex buffer's index in the state
message. Otherwise it would look like an individual set of vertices
could be self-conflicting between indices.

with those,

Reviewed-by: Scott D Phillips 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 08/10] i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.

2018-05-04 Thread Chris Wilson
Quoting Kenneth Graunke (2018-05-04 09:06:49)
> On Thursday, May 3, 2018 11:49:51 PM PDT Chris Wilson wrote:
> > Quoting Kenneth Graunke (2018-05-04 02:12:38)
> > > We'd like to start using soft-pin to assign BO addresses up front, and
> > > never move them again.  Our previous plan for dealing with 48-bit VF
> > > cache bugs was to relocate vertex buffers to the low 4GB, so we'd never
> > > have addresses that alias in the low 32 bits.  But that requires moving
> > > buffers dynamically.
> > > 
> > > This patch tracks the last seen BO address for each vertex/index buffer,
> > > and emits a VF cache invalidate if the high bits change.  (Ideally, we
> > > won't hit this case very often.)  This should work for the soft-pin
> > > case, but unfortunately won't work in the relocation case, as we don't
> > > actually know the addresses.  So, we have to use both methods.
> > 
> > It can only be moved between batches; and between batches there is a
> > PIPE_CONTROL_VF_CACHE_INVALIDATE. Is there more to this w/a, or is it
> > truly unnecessary? See gen8_emit_flush_render() and double check that is
> > sufficient.
> > -Chris
> 
> In the old approach, the moment a batch used a buffer as a vertex
> buffer, we would pin it to 4GB.  If it had been at some other address,
> it would be relocated there for the entire batch.  This ensures that
> all vertex buffers have a 32-bit address.
> 
> Now that we're fixing an address for BO's up front, those buffers might
> have high address bits.  We don't relocate and can't move them at all.
> The VF only considers the low 32 bits of the address, plus the VB index.
> 
> So, if the VF contained data for VB[5] at 0x1000 and then we bound a new
> VB[5] at 0x11000, the cache would give us bogus data from the first
> buffer when it ought to miss and read the second.  To work around this,
> this patch introduces code to do a VF cache invalidate if the top bits
> for a particular VB[i] ever change.

I forgot it was the aliasing effect between vertex buffers in the same
batch. Thanks for clearing that up. (Could you throw in the phase
"within the same batch" just to prevent me asking the same question
again in future ;)
-Chris
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 08/10] i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.

2018-05-04 Thread Kenneth Graunke
On Thursday, May 3, 2018 11:49:51 PM PDT Chris Wilson wrote:
> Quoting Kenneth Graunke (2018-05-04 02:12:38)
> > We'd like to start using soft-pin to assign BO addresses up front, and
> > never move them again.  Our previous plan for dealing with 48-bit VF
> > cache bugs was to relocate vertex buffers to the low 4GB, so we'd never
> > have addresses that alias in the low 32 bits.  But that requires moving
> > buffers dynamically.
> > 
> > This patch tracks the last seen BO address for each vertex/index buffer,
> > and emits a VF cache invalidate if the high bits change.  (Ideally, we
> > won't hit this case very often.)  This should work for the soft-pin
> > case, but unfortunately won't work in the relocation case, as we don't
> > actually know the addresses.  So, we have to use both methods.
> 
> It can only be moved between batches; and between batches there is a
> PIPE_CONTROL_VF_CACHE_INVALIDATE. Is there more to this w/a, or is it
> truly unnecessary? See gen8_emit_flush_render() and double check that is
> sufficient.
> -Chris

In the old approach, the moment a batch used a buffer as a vertex
buffer, we would pin it to 4GB.  If it had been at some other address,
it would be relocated there for the entire batch.  This ensures that
all vertex buffers have a 32-bit address.

Now that we're fixing an address for BO's up front, those buffers might
have high address bits.  We don't relocate and can't move them at all.
The VF only considers the low 32 bits of the address, plus the VB index.

So, if the VF contained data for VB[5] at 0x1000 and then we bound a new
VB[5] at 0x11000, the cache would give us bogus data from the first
buffer when it ought to miss and read the second.  To work around this,
this patch introduces code to do a VF cache invalidate if the top bits
for a particular VB[i] ever change.


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 08/10] i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.

2018-05-04 Thread Chris Wilson
Quoting Kenneth Graunke (2018-05-04 02:12:38)
> We'd like to start using soft-pin to assign BO addresses up front, and
> never move them again.  Our previous plan for dealing with 48-bit VF
> cache bugs was to relocate vertex buffers to the low 4GB, so we'd never
> have addresses that alias in the low 32 bits.  But that requires moving
> buffers dynamically.
> 
> This patch tracks the last seen BO address for each vertex/index buffer,
> and emits a VF cache invalidate if the high bits change.  (Ideally, we
> won't hit this case very often.)  This should work for the soft-pin
> case, but unfortunately won't work in the relocation case, as we don't
> actually know the addresses.  So, we have to use both methods.

It can only be moved between batches; and between batches there is a
PIPE_CONTROL_VF_CACHE_INVALIDATE. Is there more to this w/a, or is it
truly unnecessary? See gen8_emit_flush_render() and double check that is
sufficient.
-Chris
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 08/10] i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.

2018-05-03 Thread Kenneth Graunke
We'd like to start using soft-pin to assign BO addresses up front, and
never move them again.  Our previous plan for dealing with 48-bit VF
cache bugs was to relocate vertex buffers to the low 4GB, so we'd never
have addresses that alias in the low 32 bits.  But that requires moving
buffers dynamically.

This patch tracks the last seen BO address for each vertex/index buffer,
and emits a VF cache invalidate if the high bits change.  (Ideally, we
won't hit this case very often.)  This should work for the soft-pin
case, but unfortunately won't work in the relocation case, as we don't
actually know the addresses.  So, we have to use both methods.
---
 src/mesa/drivers/dri/i965/brw_context.h   |  6 ++
 src/mesa/drivers/dri/i965/genX_state_upload.c | 62 +++
 2 files changed, 68 insertions(+)

Migration is nice at times, but keeping everything static is also really
nice for pre-baking states...hard to know exactly what to do here.  It
would be nice if we could allocate all GL buffer objects that might be
VBOs out of the same 4GB, but that's...difficult to know, and might be
too limiting.  *shrug*

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index fb637e22281..7d6aa1a9c51 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -983,6 +983,9 @@ struct brw_context
 
   /* For the initial pushdown, keep the list of vbo inputs. */
   struct vbo_inputs draw_arrays;
+
+  /* High bits of the last seen vertex buffer address (for workarounds). */
+  uint16_t last_bo_high_bits[33];
} vb;
 
struct {
@@ -1003,6 +1006,9 @@ struct brw_context
* referencing the same index buffer.
*/
   unsigned int start_vertex_offset;
+
+  /* High bits of the last seen index buffer address (for workarounds). */
+  uint16_t last_bo_high_bits;
} ib;
 
/* Active vertex program:
diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c 
b/src/mesa/drivers/dri/i965/genX_state_upload.c
index b1867c1a1cc..e517b91de93 100644
--- a/src/mesa/drivers/dri/i965/genX_state_upload.c
+++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
@@ -480,6 +480,64 @@ upload_format_size(uint32_t upload_format)
}
 }
 
+static UNUSED uint16_t
+pinned_bo_high_bits(struct brw_bo *bo)
+{
+   return (bo->kflags & EXEC_OBJECT_PINNED) ? bo->gtt_offset >> 32ull : 0;
+}
+
+/* The VF cache designers apparently cut corners, and made the cache
+ * only consider the bottom 32 bits of memory addresses.  If you happen
+ * to have two vertex buffers which get placed exactly 4 GiB apart and
+ * use them in back-to-back draw calls, you can get collisions.
+ *
+ * In the soft-pin world, we'd like to assign addresses up front, and never
+ * move buffers.  So, we need to do a VF cache invalidate if the buffer for
+ * a particular VB slot has different [48:32] address bits than the last one.
+ *
+ * In the relocation world, we have no idea what the addresses will be, so
+ * we can't apply this workaround.  Instead, we tell the kernel to move it
+ * to the low 4GB regardless.
+ */
+static void
+vf_invalidate_for_vb_48bit_transitions(struct brw_context *brw)
+{
+#if GEN_GEN >= 8
+   bool need_invalidate = true;
+   unsigned i;
+
+   for (i = 0; i < brw->vb.nr_buffers; i++) {
+  uint16_t high_bits = pinned_bo_high_bits(brw->vb.buffers[i].bo);
+
+  if (high_bits != brw->vb.last_bo_high_bits[i]) {
+ need_invalidate = true;
+ brw->vb.last_bo_high_bits[i] = high_bits;
+  }
+   }
+
+   /* Don't bother with draw parameter buffers - those are generated by
+* the driver so we can select a consistent memory zone.
+*/
+
+   if (need_invalidate) {
+  brw_emit_pipe_control_flush(brw, PIPE_CONTROL_VF_CACHE_INVALIDATE);
+   }
+#endif
+}
+
+static void
+vf_invalidate_for_ib_48bit_transition(struct brw_context *brw)
+{
+#if GEN_GEN >= 8
+   uint16_t high_bits = pinned_bo_high_bits(brw->ib.bo);
+
+   if (high_bits != brw->ib.last_bo_high_bits) {
+  brw_emit_pipe_control_flush(brw, PIPE_CONTROL_VF_CACHE_INVALIDATE);
+  brw->ib.last_bo_high_bits = high_bits;
+   }
+#endif
+}
+
 static void
 genX(emit_vertices)(struct brw_context *brw)
 {
@@ -595,6 +653,8 @@ genX(emit_vertices)(struct brw_context *brw)
const unsigned nr_buffers = brw->vb.nr_buffers +
   uses_draw_params + uses_derived_draw_params;
 
+   vf_invalidate_for_vb_48bit_transitions(brw);
+
if (nr_buffers) {
   assert(nr_buffers <= (GEN_GEN >= 6 ? 33 : 17));
 
@@ -890,6 +950,8 @@ genX(emit_index_buffer)(struct brw_context *brw)
if (index_buffer == NULL)
   return;
 
+   vf_invalidate_for_ib_48bit_transition(brw);
+
brw_batch_emit(brw, GENX(3DSTATE_INDEX_BUFFER), ib) {
 #if GEN_GEN < 8 && !GEN_IS_HASWELL
   ib.CutIndexEnable = brw->prim_restart.enable_cut_index;
-- 
2.17.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org