Re: [Mesa-dev] [PATCH 08/10] i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.
Kenneth Graunkewrites: > We'd like to start using soft-pin to assign BO addresses up front, and > never move them again. Our previous plan for dealing with 48-bit VF > cache bugs was to relocate vertex buffers to the low 4GB, so we'd never > have addresses that alias in the low 32 bits. But that requires moving > buffers dynamically. > > This patch tracks the last seen BO address for each vertex/index buffer, > and emits a VF cache invalidate if the high bits change. (Ideally, we > won't hit this case very often.) This should work for the soft-pin > case, but unfortunately won't work in the relocation case, as we don't > actually know the addresses. So, we have to use both methods. > --- > src/mesa/drivers/dri/i965/brw_context.h | 6 ++ > src/mesa/drivers/dri/i965/genX_state_upload.c | 62 +++ > 2 files changed, 68 insertions(+) > > Migration is nice at times, but keeping everything static is also really > nice for pre-baking states...hard to know exactly what to do here. It > would be nice if we could allocate all GL buffer objects that might be > VBOs out of the same 4GB, but that's...difficult to know, and might be > too limiting. *shrug* > > diff --git a/src/mesa/drivers/dri/i965/brw_context.h > b/src/mesa/drivers/dri/i965/brw_context.h > index fb637e22281..7d6aa1a9c51 100644 > --- a/src/mesa/drivers/dri/i965/brw_context.h > +++ b/src/mesa/drivers/dri/i965/brw_context.h > @@ -983,6 +983,9 @@ struct brw_context > >/* For the initial pushdown, keep the list of vbo inputs. */ >struct vbo_inputs draw_arrays; > + > + /* High bits of the last seen vertex buffer address (for workarounds). > */ > + uint16_t last_bo_high_bits[33]; Maybe it's time to add a #define for 33 being the maximum number of vertex buffers. > } vb; > > struct { > @@ -1003,6 +1006,9 @@ struct brw_context > * referencing the same index buffer. > */ >unsigned int start_vertex_offset; > + > + /* High bits of the last seen index buffer address (for workarounds). > */ > + uint16_t last_bo_high_bits; > } ib; > > /* Active vertex program: > diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c > b/src/mesa/drivers/dri/i965/genX_state_upload.c > index b1867c1a1cc..e517b91de93 100644 > --- a/src/mesa/drivers/dri/i965/genX_state_upload.c > +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c > @@ -480,6 +480,64 @@ upload_format_size(uint32_t upload_format) > } > } > > +static UNUSED uint16_t > +pinned_bo_high_bits(struct brw_bo *bo) > +{ > + return (bo->kflags & EXEC_OBJECT_PINNED) ? bo->gtt_offset >> 32ull : 0; > +} > + > +/* The VF cache designers apparently cut corners, and made the cache > + * only consider the bottom 32 bits of memory addresses. You mentioned it in the thread here, but I think it's important to mention in the comment too that the cache is considering the tuple of the bottom 32 bits and the vertex buffer's index in the state message. Otherwise it would look like an individual set of vertices could be self-conflicting between indices. with those, Reviewed-by: Scott D Phillips ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/10] i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.
Quoting Kenneth Graunke (2018-05-04 09:06:49) > On Thursday, May 3, 2018 11:49:51 PM PDT Chris Wilson wrote: > > Quoting Kenneth Graunke (2018-05-04 02:12:38) > > > We'd like to start using soft-pin to assign BO addresses up front, and > > > never move them again. Our previous plan for dealing with 48-bit VF > > > cache bugs was to relocate vertex buffers to the low 4GB, so we'd never > > > have addresses that alias in the low 32 bits. But that requires moving > > > buffers dynamically. > > > > > > This patch tracks the last seen BO address for each vertex/index buffer, > > > and emits a VF cache invalidate if the high bits change. (Ideally, we > > > won't hit this case very often.) This should work for the soft-pin > > > case, but unfortunately won't work in the relocation case, as we don't > > > actually know the addresses. So, we have to use both methods. > > > > It can only be moved between batches; and between batches there is a > > PIPE_CONTROL_VF_CACHE_INVALIDATE. Is there more to this w/a, or is it > > truly unnecessary? See gen8_emit_flush_render() and double check that is > > sufficient. > > -Chris > > In the old approach, the moment a batch used a buffer as a vertex > buffer, we would pin it to 4GB. If it had been at some other address, > it would be relocated there for the entire batch. This ensures that > all vertex buffers have a 32-bit address. > > Now that we're fixing an address for BO's up front, those buffers might > have high address bits. We don't relocate and can't move them at all. > The VF only considers the low 32 bits of the address, plus the VB index. > > So, if the VF contained data for VB[5] at 0x1000 and then we bound a new > VB[5] at 0x11000, the cache would give us bogus data from the first > buffer when it ought to miss and read the second. To work around this, > this patch introduces code to do a VF cache invalidate if the top bits > for a particular VB[i] ever change. I forgot it was the aliasing effect between vertex buffers in the same batch. Thanks for clearing that up. (Could you throw in the phase "within the same batch" just to prevent me asking the same question again in future ;) -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/10] i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.
On Thursday, May 3, 2018 11:49:51 PM PDT Chris Wilson wrote: > Quoting Kenneth Graunke (2018-05-04 02:12:38) > > We'd like to start using soft-pin to assign BO addresses up front, and > > never move them again. Our previous plan for dealing with 48-bit VF > > cache bugs was to relocate vertex buffers to the low 4GB, so we'd never > > have addresses that alias in the low 32 bits. But that requires moving > > buffers dynamically. > > > > This patch tracks the last seen BO address for each vertex/index buffer, > > and emits a VF cache invalidate if the high bits change. (Ideally, we > > won't hit this case very often.) This should work for the soft-pin > > case, but unfortunately won't work in the relocation case, as we don't > > actually know the addresses. So, we have to use both methods. > > It can only be moved between batches; and between batches there is a > PIPE_CONTROL_VF_CACHE_INVALIDATE. Is there more to this w/a, or is it > truly unnecessary? See gen8_emit_flush_render() and double check that is > sufficient. > -Chris In the old approach, the moment a batch used a buffer as a vertex buffer, we would pin it to 4GB. If it had been at some other address, it would be relocated there for the entire batch. This ensures that all vertex buffers have a 32-bit address. Now that we're fixing an address for BO's up front, those buffers might have high address bits. We don't relocate and can't move them at all. The VF only considers the low 32 bits of the address, plus the VB index. So, if the VF contained data for VB[5] at 0x1000 and then we bound a new VB[5] at 0x11000, the cache would give us bogus data from the first buffer when it ought to miss and read the second. To work around this, this patch introduces code to do a VF cache invalidate if the top bits for a particular VB[i] ever change. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/10] i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.
Quoting Kenneth Graunke (2018-05-04 02:12:38) > We'd like to start using soft-pin to assign BO addresses up front, and > never move them again. Our previous plan for dealing with 48-bit VF > cache bugs was to relocate vertex buffers to the low 4GB, so we'd never > have addresses that alias in the low 32 bits. But that requires moving > buffers dynamically. > > This patch tracks the last seen BO address for each vertex/index buffer, > and emits a VF cache invalidate if the high bits change. (Ideally, we > won't hit this case very often.) This should work for the soft-pin > case, but unfortunately won't work in the relocation case, as we don't > actually know the addresses. So, we have to use both methods. It can only be moved between batches; and between batches there is a PIPE_CONTROL_VF_CACHE_INVALIDATE. Is there more to this w/a, or is it truly unnecessary? See gen8_emit_flush_render() and double check that is sufficient. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/10] i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.
We'd like to start using soft-pin to assign BO addresses up front, and never move them again. Our previous plan for dealing with 48-bit VF cache bugs was to relocate vertex buffers to the low 4GB, so we'd never have addresses that alias in the low 32 bits. But that requires moving buffers dynamically. This patch tracks the last seen BO address for each vertex/index buffer, and emits a VF cache invalidate if the high bits change. (Ideally, we won't hit this case very often.) This should work for the soft-pin case, but unfortunately won't work in the relocation case, as we don't actually know the addresses. So, we have to use both methods. --- src/mesa/drivers/dri/i965/brw_context.h | 6 ++ src/mesa/drivers/dri/i965/genX_state_upload.c | 62 +++ 2 files changed, 68 insertions(+) Migration is nice at times, but keeping everything static is also really nice for pre-baking states...hard to know exactly what to do here. It would be nice if we could allocate all GL buffer objects that might be VBOs out of the same 4GB, but that's...difficult to know, and might be too limiting. *shrug* diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index fb637e22281..7d6aa1a9c51 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -983,6 +983,9 @@ struct brw_context /* For the initial pushdown, keep the list of vbo inputs. */ struct vbo_inputs draw_arrays; + + /* High bits of the last seen vertex buffer address (for workarounds). */ + uint16_t last_bo_high_bits[33]; } vb; struct { @@ -1003,6 +1006,9 @@ struct brw_context * referencing the same index buffer. */ unsigned int start_vertex_offset; + + /* High bits of the last seen index buffer address (for workarounds). */ + uint16_t last_bo_high_bits; } ib; /* Active vertex program: diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c b/src/mesa/drivers/dri/i965/genX_state_upload.c index b1867c1a1cc..e517b91de93 100644 --- a/src/mesa/drivers/dri/i965/genX_state_upload.c +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c @@ -480,6 +480,64 @@ upload_format_size(uint32_t upload_format) } } +static UNUSED uint16_t +pinned_bo_high_bits(struct brw_bo *bo) +{ + return (bo->kflags & EXEC_OBJECT_PINNED) ? bo->gtt_offset >> 32ull : 0; +} + +/* The VF cache designers apparently cut corners, and made the cache + * only consider the bottom 32 bits of memory addresses. If you happen + * to have two vertex buffers which get placed exactly 4 GiB apart and + * use them in back-to-back draw calls, you can get collisions. + * + * In the soft-pin world, we'd like to assign addresses up front, and never + * move buffers. So, we need to do a VF cache invalidate if the buffer for + * a particular VB slot has different [48:32] address bits than the last one. + * + * In the relocation world, we have no idea what the addresses will be, so + * we can't apply this workaround. Instead, we tell the kernel to move it + * to the low 4GB regardless. + */ +static void +vf_invalidate_for_vb_48bit_transitions(struct brw_context *brw) +{ +#if GEN_GEN >= 8 + bool need_invalidate = true; + unsigned i; + + for (i = 0; i < brw->vb.nr_buffers; i++) { + uint16_t high_bits = pinned_bo_high_bits(brw->vb.buffers[i].bo); + + if (high_bits != brw->vb.last_bo_high_bits[i]) { + need_invalidate = true; + brw->vb.last_bo_high_bits[i] = high_bits; + } + } + + /* Don't bother with draw parameter buffers - those are generated by +* the driver so we can select a consistent memory zone. +*/ + + if (need_invalidate) { + brw_emit_pipe_control_flush(brw, PIPE_CONTROL_VF_CACHE_INVALIDATE); + } +#endif +} + +static void +vf_invalidate_for_ib_48bit_transition(struct brw_context *brw) +{ +#if GEN_GEN >= 8 + uint16_t high_bits = pinned_bo_high_bits(brw->ib.bo); + + if (high_bits != brw->ib.last_bo_high_bits) { + brw_emit_pipe_control_flush(brw, PIPE_CONTROL_VF_CACHE_INVALIDATE); + brw->ib.last_bo_high_bits = high_bits; + } +#endif +} + static void genX(emit_vertices)(struct brw_context *brw) { @@ -595,6 +653,8 @@ genX(emit_vertices)(struct brw_context *brw) const unsigned nr_buffers = brw->vb.nr_buffers + uses_draw_params + uses_derived_draw_params; + vf_invalidate_for_vb_48bit_transitions(brw); + if (nr_buffers) { assert(nr_buffers <= (GEN_GEN >= 6 ? 33 : 17)); @@ -890,6 +950,8 @@ genX(emit_index_buffer)(struct brw_context *brw) if (index_buffer == NULL) return; + vf_invalidate_for_ib_48bit_transition(brw); + brw_batch_emit(brw, GENX(3DSTATE_INDEX_BUFFER), ib) { #if GEN_GEN < 8 && !GEN_IS_HASWELL ib.CutIndexEnable = brw->prim_restart.enable_cut_index; -- 2.17.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org