Re: [Mesa-dev] [PATCH 1/2] r600g: rework flusing and synchronization pattern v4

2012-12-08 Thread Marek Olšák
Hi Jerome,

I'm okay with the simplification of r600_flush_emit, I'm not so okay
with some other things. There's also some cruft unrelated to flushing.

1) R600_CONTEXT_FLUSH could have a better name, because it's not clear
what it does. (it looks like it only flushed read-only bindings)

2) Don't use magic numbers when setting cp_coher_cntl unless you want
to hide something from us / obfuscating the code. :)

3) The definition of R600_MAX_FLUSH_CS_DWORDS should be updated.

4) SURFACE_BASE_UPDATE is emitted twice in emit_framebuffer_state. I
don't think splitting one packet into two packets doing the same thing
is needed.

5) RS780 and RS880 don't need SURFACE_BASE_UPDATE for streamout. Their
streamout hardware was actually copied from R700. Doing  CHIP_RS780
instead of  CHIP_RV770 was correct. The same for r600_flush_emit.

6) In r600_context_flush, don't remove the comment about flushing
framebuffer caches, because it's still done there.

7) Masking out R600_CONTEXT_FLUSH in r600_context_emit_fence is not
correct. We should still flush the caches later if they're dirty and
even if the fence was emitted. You can't see this regression in
piglit, because we don't have a test for that.

8) There's some inconsistent flushing between graphics and compute
colorbuffer bindings. For graphics, you use (WAIT_IDLE |
FLUSH_AND_INV), which makes sense. For compute, you use
R600_CONTEXT_FLUSH (which is used for vertex buffers and the like
elsewhere, but not colorbuffers).

And one question:

Why do you use set both FLUSH_AND_INV and STREAMOUT_FLUSH on
Evergreen, while r600 only gets FLUSH_AND_INV? Did you overlook this?

Marek

On Thu, Dec 6, 2012 at 8:51 PM,  j.gli...@gmail.com wrote:
 From: Jerome Glisse jgli...@redhat.com

 This bring r600g allmost inline with closed source driver when
 it comes to flushing and synchronization pattern.

 Signed-off-by: Jerome Glisse jgli...@redhat.com
 ---
  src/gallium/drivers/r600/evergreen_compute.c   |   8 +-
  .../drivers/r600/evergreen_compute_internal.c  |   4 +-
  src/gallium/drivers/r600/evergreen_state.c |   4 +-
  src/gallium/drivers/r600/r600.h|  16 +--
  src/gallium/drivers/r600/r600_hw_context.c | 154 
 -
  src/gallium/drivers/r600/r600_state.c  |  18 ++-
  src/gallium/drivers/r600/r600_state_common.c   |  19 ++-
  7 files changed, 61 insertions(+), 162 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
 b/src/gallium/drivers/r600/evergreen_compute.c
 index 44831a7..33a5910 100644
 --- a/src/gallium/drivers/r600/evergreen_compute.c
 +++ b/src/gallium/drivers/r600/evergreen_compute.c
 @@ -98,7 +98,7 @@ static void evergreen_cs_set_vertex_buffer(

 /* The vertex instructions in the compute shaders use the texture 
 cache,
  * so we need to invalidate it. */
 -   rctx-flags |= R600_CONTEXT_TEX_FLUSH;
 +   rctx-flags |= R600_CONTEXT_FLUSH;
 state-enabled_mask |= 1  vb_index;
 state-dirty_mask |= 1  vb_index;
 state-atom.dirty = true;
 @@ -329,7 +329,7 @@ static void compute_emit_cs(struct r600_context *ctx, 
 const uint *block_layout,
  */
 r600_emit_command_buffer(ctx-cs, ctx-start_compute_cs_cmd);

 -   ctx-flags |= R600_CONTEXT_CB_FLUSH;
 +   ctx-flags |= R600_CONTEXT_FLUSH;
 r600_flush_emit(ctx);

 /* Emit colorbuffers. */
 @@ -409,7 +409,7 @@ static void compute_emit_cs(struct r600_context *ctx, 
 const uint *block_layout,

 /* XXX evergreen_flush_emit() hardcodes the CP_COHER_SIZE to 
 0x
  */
 -   ctx-flags |= R600_CONTEXT_CB_FLUSH;
 +   ctx-flags |= R600_CONTEXT_FLUSH;
 r600_flush_emit(ctx);

  #if 0
 @@ -468,7 +468,7 @@ void evergreen_emit_cs_shader(
 r600_write_value(cs, r600_context_bo_reloc(rctx, kernel-code_bo,
 RADEON_USAGE_READ));

 -   rctx-flags |= R600_CONTEXT_SHADERCONST_FLUSH;
 +   rctx-flags |= R600_CONTEXT_FLUSH;
  }

  static void evergreen_launch_grid(
 diff --git a/src/gallium/drivers/r600/evergreen_compute_internal.c 
 b/src/gallium/drivers/r600/evergreen_compute_internal.c
 index 7bc7fb4..187bcf1 100644
 --- a/src/gallium/drivers/r600/evergreen_compute_internal.c
 +++ b/src/gallium/drivers/r600/evergreen_compute_internal.c
 @@ -538,7 +538,7 @@ void evergreen_set_tex_resource(
  
 util_format_get_blockwidth(tmp-resource.b.b.format) *
  view-base.texture-width0*height*depth;

 -   pipe-ctx-flags |= R600_CONTEXT_TEX_FLUSH;
 +   pipe-ctx-flags |= R600_CONTEXT_FLUSH;

 evergreen_emit_force_reloc(res);
 evergreen_emit_force_reloc(res);
 @@ -597,7 +597,7 @@ void evergreen_set_const_cache(
 res-usage = RADEON_USAGE_READ;
 res-coher_bo_size = size;

 -   pipe-ctx-flags |= R600_CONTEXT_SHADERCONST_FLUSH;
 +   pipe-ctx-flags |= R600_CONTEXT_FLUSH;
  }

  

Re: [Mesa-dev] [PATCH 1/2] r600g: rework flusing and synchronization pattern v4

2012-12-08 Thread Jerome Glisse
On Sat, Dec 8, 2012 at 7:27 PM, Marek Olšák mar...@gmail.com wrote:
 Hi Jerome,

 I'm okay with the simplification of r600_flush_emit, I'm not so okay
 with some other things. There's also some cruft unrelated to flushing.

 1) R600_CONTEXT_FLUSH could have a better name, because it's not clear
 what it does. (it looks like it only flushed read-only bindings)

GPU_FLUSH ?

 2) Don't use magic numbers when setting cp_coher_cntl unless you want
 to hide something from us / obfuscating the code. :)

 3) The definition of R600_MAX_FLUSH_CS_DWORDS should be updated.

Yes i haven't recomputed worst case

 4) SURFACE_BASE_UPDATE is emitted twice in emit_framebuffer_state. I
 don't think splitting one packet into two packets doing the same thing
 is needed.

It's need couple r6xx/r7xx gpu will lockup after couple hour of
stressing, wasn't seeing lockup with it.

 5) RS780 and RS880 don't need SURFACE_BASE_UPDATE for streamout. Their
 streamout hardware was actually copied from R700. Doing  CHIP_RS780
 instead of  CHIP_RV770 was correct. The same for r600_flush_emit.

fglrx mostly do the same on r7xx and r6xx for streamout as i am not
sure i have any stressing test for that i side on fglrx side.

 6) In r600_context_flush, don't remove the comment about flushing
 framebuffer caches, because it's still done there.

 7) Masking out R600_CONTEXT_FLUSH in r600_context_emit_fence is not
 correct. We should still flush the caches later if they're dirty and
 even if the fence was emitted. You can't see this regression in
 piglit, because we don't have a test for that.
True
 8) There's some inconsistent flushing between graphics and compute
 colorbuffer bindings. For graphics, you use (WAIT_IDLE |
 FLUSH_AND_INV), which makes sense. For compute, you use
 R600_CONTEXT_FLUSH (which is used for vertex buffers and the like
 elsewhere, but not colorbuffers).

I haven't paid much attention to compute side, i should probably look at it.

 And one question:

 Why do you use set both FLUSH_AND_INV and STREAMOUT_FLUSH on
 Evergreen, while r600 only gets FLUSH_AND_INV? Did you overlook this?

No, just matching fglrx pattern, i don't think i tested without that
change, but it definitly match fglrx.

Cheers,
Jerome

 Marek

 On Thu, Dec 6, 2012 at 8:51 PM,  j.gli...@gmail.com wrote:
 From: Jerome Glisse jgli...@redhat.com

 This bring r600g allmost inline with closed source driver when
 it comes to flushing and synchronization pattern.

 Signed-off-by: Jerome Glisse jgli...@redhat.com
 ---
  src/gallium/drivers/r600/evergreen_compute.c   |   8 +-
  .../drivers/r600/evergreen_compute_internal.c  |   4 +-
  src/gallium/drivers/r600/evergreen_state.c |   4 +-
  src/gallium/drivers/r600/r600.h|  16 +--
  src/gallium/drivers/r600/r600_hw_context.c | 154 
 -
  src/gallium/drivers/r600/r600_state.c  |  18 ++-
  src/gallium/drivers/r600/r600_state_common.c   |  19 ++-
  7 files changed, 61 insertions(+), 162 deletions(-)

 diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
 b/src/gallium/drivers/r600/evergreen_compute.c
 index 44831a7..33a5910 100644
 --- a/src/gallium/drivers/r600/evergreen_compute.c
 +++ b/src/gallium/drivers/r600/evergreen_compute.c
 @@ -98,7 +98,7 @@ static void evergreen_cs_set_vertex_buffer(

 /* The vertex instructions in the compute shaders use the texture 
 cache,
  * so we need to invalidate it. */
 -   rctx-flags |= R600_CONTEXT_TEX_FLUSH;
 +   rctx-flags |= R600_CONTEXT_FLUSH;
 state-enabled_mask |= 1  vb_index;
 state-dirty_mask |= 1  vb_index;
 state-atom.dirty = true;
 @@ -329,7 +329,7 @@ static void compute_emit_cs(struct r600_context *ctx, 
 const uint *block_layout,
  */
 r600_emit_command_buffer(ctx-cs, ctx-start_compute_cs_cmd);

 -   ctx-flags |= R600_CONTEXT_CB_FLUSH;
 +   ctx-flags |= R600_CONTEXT_FLUSH;
 r600_flush_emit(ctx);

 /* Emit colorbuffers. */
 @@ -409,7 +409,7 @@ static void compute_emit_cs(struct r600_context *ctx, 
 const uint *block_layout,

 /* XXX evergreen_flush_emit() hardcodes the CP_COHER_SIZE to 
 0x
  */
 -   ctx-flags |= R600_CONTEXT_CB_FLUSH;
 +   ctx-flags |= R600_CONTEXT_FLUSH;
 r600_flush_emit(ctx);

  #if 0
 @@ -468,7 +468,7 @@ void evergreen_emit_cs_shader(
 r600_write_value(cs, r600_context_bo_reloc(rctx, kernel-code_bo,
 RADEON_USAGE_READ));

 -   rctx-flags |= R600_CONTEXT_SHADERCONST_FLUSH;
 +   rctx-flags |= R600_CONTEXT_FLUSH;
  }

  static void evergreen_launch_grid(
 diff --git a/src/gallium/drivers/r600/evergreen_compute_internal.c 
 b/src/gallium/drivers/r600/evergreen_compute_internal.c
 index 7bc7fb4..187bcf1 100644
 --- a/src/gallium/drivers/r600/evergreen_compute_internal.c
 +++ b/src/gallium/drivers/r600/evergreen_compute_internal.c
 @@ -538,7 +538,7 @@ void 

[Mesa-dev] [PATCH 1/2] r600g: rework flusing and synchronization pattern v4

2012-12-06 Thread j . glisse
From: Jerome Glisse jgli...@redhat.com

This bring r600g allmost inline with closed source driver when
it comes to flushing and synchronization pattern.

Signed-off-by: Jerome Glisse jgli...@redhat.com
---
 src/gallium/drivers/r600/evergreen_compute.c   |   8 +-
 .../drivers/r600/evergreen_compute_internal.c  |   4 +-
 src/gallium/drivers/r600/evergreen_state.c |   4 +-
 src/gallium/drivers/r600/r600.h|  16 +--
 src/gallium/drivers/r600/r600_hw_context.c | 154 -
 src/gallium/drivers/r600/r600_state.c  |  18 ++-
 src/gallium/drivers/r600/r600_state_common.c   |  19 ++-
 7 files changed, 61 insertions(+), 162 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
b/src/gallium/drivers/r600/evergreen_compute.c
index 44831a7..33a5910 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -98,7 +98,7 @@ static void evergreen_cs_set_vertex_buffer(
 
/* The vertex instructions in the compute shaders use the texture cache,
 * so we need to invalidate it. */
-   rctx-flags |= R600_CONTEXT_TEX_FLUSH;
+   rctx-flags |= R600_CONTEXT_FLUSH;
state-enabled_mask |= 1  vb_index;
state-dirty_mask |= 1  vb_index;
state-atom.dirty = true;
@@ -329,7 +329,7 @@ static void compute_emit_cs(struct r600_context *ctx, const 
uint *block_layout,
 */
r600_emit_command_buffer(ctx-cs, ctx-start_compute_cs_cmd);
 
-   ctx-flags |= R600_CONTEXT_CB_FLUSH;
+   ctx-flags |= R600_CONTEXT_FLUSH;
r600_flush_emit(ctx);
 
/* Emit colorbuffers. */
@@ -409,7 +409,7 @@ static void compute_emit_cs(struct r600_context *ctx, const 
uint *block_layout,
 
/* XXX evergreen_flush_emit() hardcodes the CP_COHER_SIZE to 0x
 */
-   ctx-flags |= R600_CONTEXT_CB_FLUSH;
+   ctx-flags |= R600_CONTEXT_FLUSH;
r600_flush_emit(ctx);
 
 #if 0
@@ -468,7 +468,7 @@ void evergreen_emit_cs_shader(
r600_write_value(cs, r600_context_bo_reloc(rctx, kernel-code_bo,
RADEON_USAGE_READ));
 
-   rctx-flags |= R600_CONTEXT_SHADERCONST_FLUSH;
+   rctx-flags |= R600_CONTEXT_FLUSH;
 }
 
 static void evergreen_launch_grid(
diff --git a/src/gallium/drivers/r600/evergreen_compute_internal.c 
b/src/gallium/drivers/r600/evergreen_compute_internal.c
index 7bc7fb4..187bcf1 100644
--- a/src/gallium/drivers/r600/evergreen_compute_internal.c
+++ b/src/gallium/drivers/r600/evergreen_compute_internal.c
@@ -538,7 +538,7 @@ void evergreen_set_tex_resource(
 
util_format_get_blockwidth(tmp-resource.b.b.format) *
 view-base.texture-width0*height*depth;
 
-   pipe-ctx-flags |= R600_CONTEXT_TEX_FLUSH;
+   pipe-ctx-flags |= R600_CONTEXT_FLUSH;
 
evergreen_emit_force_reloc(res);
evergreen_emit_force_reloc(res);
@@ -597,7 +597,7 @@ void evergreen_set_const_cache(
res-usage = RADEON_USAGE_READ;
res-coher_bo_size = size;
 
-   pipe-ctx-flags |= R600_CONTEXT_SHADERCONST_FLUSH;
+   pipe-ctx-flags |= R600_CONTEXT_FLUSH;
 }
 
 struct r600_resource* r600_compute_buffer_alloc_vram(
diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 9b898cb..7bc4772 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -1557,14 +1557,14 @@ static void evergreen_set_framebuffer_state(struct 
pipe_context *ctx,
uint32_t i, log_samples;
 
if (rctx-framebuffer.state.nr_cbufs) {
-   rctx-flags |= R600_CONTEXT_CB_FLUSH;
+   rctx-flags |= R600_CONTEXT_WAIT_IDLE | 
R600_CONTEXT_FLUSH_AND_INV;
 
if (rctx-framebuffer.state.cbufs[0]-texture-nr_samples  1) {
rctx-flags |= R600_CONTEXT_FLUSH_AND_INV_CB_META;
}
}
if (rctx-framebuffer.state.zsbuf) {
-   rctx-flags |= R600_CONTEXT_DB_FLUSH;
+   rctx-flags |= R600_CONTEXT_WAIT_IDLE | 
R600_CONTEXT_FLUSH_AND_INV;
}
 
util_copy_framebuffer_state(rctx-framebuffer.state, state);
diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h
index 7d43416..4060672 100644
--- a/src/gallium/drivers/r600/r600.h
+++ b/src/gallium/drivers/r600/r600.h
@@ -180,17 +180,11 @@ struct r600_so_target {
unsignedso_index;
 };
 
-#define R600_CONTEXT_PS_PARTIAL_FLUSH  (1  0)
-#define R600_CONTEXT_CB_FLUSH  (1  1)
-#define R600_CONTEXT_DB_FLUSH  (1  2)
-#define R600_CONTEXT_SHADERCONST_FLUSH (1  3)
-#define R600_CONTEXT_TEX_FLUSH (1  4)
-#define R600_CONTEXT_VTX_FLUSH (1  5)
-#define R600_CONTEXT_STREAMOUT_FLUSH   (1  6)
-#define R600_CONTEXT_WAIT_IDLE (1  7)
-#define