Re: [Mesa-dev] [PATCH] r600g: remove one pointless flush

2011-10-29 Thread Vadim Girlin
On Sat, 2011-10-29 at 02:32 +0200, Marek Olšák wrote:
 On Sat, Oct 29, 2011 at 12:58 AM, Vadim Girlin vadimgir...@gmail.com wrote:
  On Sat, 2011-10-29 at 01:29 +0400, Vadim Girlin wrote:
  On Fri, 2011-10-28 at 23:16 +0200, Marek Olšák wrote:
   On Fri, Oct 28, 2011 at 10:52 PM, Vadim Girlin vadimgir...@gmail.com 
   wrote:
On Fri, 2011-10-28 at 19:47 +0200, Marek Olšák wrote:
I've got no idea what the flush was good for, but it's useless from
the looks of it. The rest of the patch is just a cleanup resulting
from some of the variables being no longer used for anything useful.
   
There are no piglit regressions.
   
It was intended to handle multiple interleaved query and conditional
render calls with single query object (in this case in theory we may
have multiple outstanding queries in current CS and separate data block
in the buffer for each query, with possible buffer overflow). I wasn't
  
   Do you mean this?
  
   while (1) {
 begin_query(q);
 draw();
 end_query(q);
 render_condition(q);
 draw();
 render_condition(NULL);
   }
  
   begin_query always resets query results to 0 anyway, so in theory, we
   could re-use the same data block over and over again.
 
  I think it's possible to run this loop without flushes, so we'll have
  multiple queries queued in current cs. From the driver point of view
  these queries will be executed simultaneously after flush, that's why we
  need to reserve and initialize separate data blocks for them.
 
 
  Probably we also need to use PIPE_TRANSFER_UNSYCHRONIZED to avoid the
  flush when mapping the buffer for data block initialization in the
  r600_query_begin. It seems I missed this, or the mapping semantics were
  changed with winsys change.
 
 Ah I see. I entirely missed the map/unmap part. Of course there is an
 implicit sync in the winsys (the flush is not so expensive, but the
 sync is). UNSYCHRONIZED would be dangerous in this particular case,
 because the GPU may still use the previous buffer data. UNSYCHRONIZED
 can only be used if we're 100% sure the GPU doesn't use the buffer
 range we're going to change. I see only two ways out of this:
 
 1) If the buffer is full, we can allocate another one and use that. I
 don't think we have any other choice with current Mesa master.
 
 2) We can program the GPU to memset the buffer. This would be very
 easy with transform feedback.
 
 I prefer (2).
 

(2) looks interesting, though currently I know next to nothing about
transform feedback on radeon hw.

I also have some ideas about possible improvements for this code, but I
never found the time to try them, and probably it will require more work
than your way. I was thinking about the following:

1) We can pre-initialize all data blocks in the buffer at once, e.g.
with memcpy, to avoid map/init/unmap overhead for every data block. 

2) We can use common data buffer(s) for all occlusion queries to reduce
allocation overhead and to avoid unnecessary initialization of multiple
data blocks for one-shot queries. Common buffer will be used
sequentially, similar to existing code.

3) We can use two big enough common buffers in turn, to reduce/avoid
sync overhead. With high probability (depending on the size) one buffer
will be completely processed by the GPU while we are emitting queries
with another. When current buffer is over, we'll initialize and use
another one. Probably having e.g. 32Kb buffers should be enough to avoid
waiting for GPU completely in most cases.

4) We can offload next buffer initialization to another thread.

5) We can adjust the size/number of the buffers dynamically, depending
on the usage.

Anyway, I'm OK with your solution.

Vadim


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: remove one pointless flush

2011-10-28 Thread Vadim Girlin
On Fri, 2011-10-28 at 19:47 +0200, Marek Olšák wrote:
 I've got no idea what the flush was good for, but it's useless from
 the looks of it. The rest of the patch is just a cleanup resulting
 from some of the variables being no longer used for anything useful.
 
 There are no piglit regressions.

It was intended to handle multiple interleaved query and conditional
render calls with single query object (in this case in theory we may
have multiple outstanding queries in current CS and separate data block
in the buffer for each query, with possible buffer overflow). I wasn't
sure if this is possible to do in one CS, so I used this additional
check to be on the safe side. Probably such situation is not possible
currently due to the flushes from the other paths, but might be possible
in the future after removing all unnecessary flushes. If you think it
won't be possible at all, I'm OK with this patch. Though I don't like
existing solution in any case, so even if we'll need this check later,
probably it should be done in some better way. 

IIRR this case isn't covered by the piglit. Probably I'll try to
create/modify some test for this.

Vadim

 ---
  src/gallium/drivers/r600/r600.h|6 +-
  src/gallium/drivers/r600/r600_blit.c   |2 +-
  src/gallium/drivers/r600/r600_hw_context.c |   21 ++---
  3 files changed, 4 insertions(+), 25 deletions(-)
 
 diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h
 index f58549a..9367651 100644
 --- a/src/gallium/drivers/r600/r600.h
 +++ b/src/gallium/drivers/r600/r600.h
 @@ -176,10 +176,6 @@ struct r600_query {
   unsignedresults_end;
   /* Size of the result */
   unsignedresult_size;
 - /* Count of new queries started in one stream without flushing */
 - unsignedqueries_emitted;
 - /* State flags */
 - boolean flushed;
   /* The buffer where query results are stored. It's used as a ring,
* data blocks for current query are stored sequentially from
* results_start to results_end, with wrapping on the buffer end */
 @@ -258,7 +254,7 @@ boolean r600_context_query_result(struct r600_context 
 *ctx,
  void r600_query_begin(struct r600_context *ctx, struct r600_query *query);
  void r600_query_end(struct r600_context *ctx, struct r600_query *query);
  void r600_context_queries_suspend(struct r600_context *ctx);
 -void r600_context_queries_resume(struct r600_context *ctx, boolean flushed);
 +void r600_context_queries_resume(struct r600_context *ctx);
  void r600_query_predication(struct r600_context *ctx, struct r600_query 
 *query, int operation,
   int flag_wait);
  void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource 
 *fence,
 diff --git a/src/gallium/drivers/r600/r600_blit.c 
 b/src/gallium/drivers/r600/r600_blit.c
 index 3eba0ad..9326dc6 100644
 --- a/src/gallium/drivers/r600/r600_blit.c
 +++ b/src/gallium/drivers/r600/r600_blit.c
 @@ -96,7 +96,7 @@ static void r600_blitter_end(struct pipe_context *ctx)
  rctx-saved_render_cond_mode);
   rctx-saved_render_cond = NULL;
   }
 - r600_context_queries_resume(rctx-ctx, FALSE);
 + r600_context_queries_resume(rctx-ctx);
  }
  
  static unsigned u_num_layers(struct pipe_resource *r, unsigned level)
 diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
 b/src/gallium/drivers/r600/r600_hw_context.c
 index a7d7ce6..1332748 100644
 --- a/src/gallium/drivers/r600/r600_hw_context.c
 +++ b/src/gallium/drivers/r600/r600_hw_context.c
 @@ -1521,7 +1521,7 @@ void r600_context_flush(struct r600_context *ctx, 
 unsigned flags)
   r600_init_cs(ctx);
  
   /* resume queries */
 - r600_context_queries_resume(ctx, TRUE);
 + r600_context_queries_resume(ctx);
  
   /* set all valid group as dirty so they get reemited on
* next draw command
 @@ -1619,18 +1619,6 @@ void r600_query_begin(struct r600_context *ctx, struct 
 r600_query *query)
   r600_context_flush(ctx, RADEON_FLUSH_ASYNC);
   }
  
 - if (query-type == PIPE_QUERY_OCCLUSION_COUNTER) {
 - /* Count queries emitted without flushes, and flush if more than
 -  * half of buffer used, to avoid overwriting results which may 
 be
 -  * still in use. */
 - if (query-flushed) {
 - query-queries_emitted = 1;
 - } else {
 - if (++query-queries_emitted  
 query-buffer-b.b.b.width0 / query-result_size / 2)
 - r600_context_flush(ctx, RADEON_FLUSH_ASYNC);
 - }
 - }
 -
   new_results_end = query-results_end + query-result_size;
   if (new_results_end = query-buffer-b.b.b.width0)
   new_results_end = 0;
 @@ -1713,8 +1701,6 @@ void 

Re: [Mesa-dev] [PATCH] r600g: remove one pointless flush

2011-10-28 Thread Marek Olšák
On Fri, Oct 28, 2011 at 10:52 PM, Vadim Girlin vadimgir...@gmail.com wrote:
 On Fri, 2011-10-28 at 19:47 +0200, Marek Olšák wrote:
 I've got no idea what the flush was good for, but it's useless from
 the looks of it. The rest of the patch is just a cleanup resulting
 from some of the variables being no longer used for anything useful.

 There are no piglit regressions.

 It was intended to handle multiple interleaved query and conditional
 render calls with single query object (in this case in theory we may
 have multiple outstanding queries in current CS and separate data block
 in the buffer for each query, with possible buffer overflow). I wasn't

Do you mean this?

while (1) {
  begin_query(q);
  draw();
  end_query(q);
  render_condition(q);
  draw();
  render_condition(NULL);
}

begin_query always resets query results to 0 anyway, so in theory, we
could re-use the same data block over and over again. I think the only
case where buffer overflow may happen is:

begin_query(q);
while (1) {
  draw();
  suspend_queries(); // flush or u_blitter
  resume_queries();
}
end_query(q);

Correct?

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: remove one pointless flush

2011-10-28 Thread Vadim Girlin
On Fri, 2011-10-28 at 23:16 +0200, Marek Olšák wrote:
 On Fri, Oct 28, 2011 at 10:52 PM, Vadim Girlin vadimgir...@gmail.com wrote:
  On Fri, 2011-10-28 at 19:47 +0200, Marek Olšák wrote:
  I've got no idea what the flush was good for, but it's useless from
  the looks of it. The rest of the patch is just a cleanup resulting
  from some of the variables being no longer used for anything useful.
 
  There are no piglit regressions.
 
  It was intended to handle multiple interleaved query and conditional
  render calls with single query object (in this case in theory we may
  have multiple outstanding queries in current CS and separate data block
  in the buffer for each query, with possible buffer overflow). I wasn't
 
 Do you mean this?
 
 while (1) {
   begin_query(q);
   draw();
   end_query(q);
   render_condition(q);
   draw();
   render_condition(NULL);
 }
 
 begin_query always resets query results to 0 anyway, so in theory, we
 could re-use the same data block over and over again. 

I think it's possible to run this loop without flushes, so we'll have
multiple queries queued in current cs. From the driver point of view
these queries will be executed simultaneously after flush, that's why we
need to reserve and initialize separate data blocks for them.

Vadim

 I think the only
 case where buffer overflow may happen is:
 
 begin_query(q);
 while (1) {
   draw();
   suspend_queries(); // flush or u_blitter
   resume_queries();
 }
 end_query(q);
 
 Correct?
 
 Marek



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: remove one pointless flush

2011-10-28 Thread Vadim Girlin
On Sat, 2011-10-29 at 01:29 +0400, Vadim Girlin wrote:
 On Fri, 2011-10-28 at 23:16 +0200, Marek Olšák wrote:
  On Fri, Oct 28, 2011 at 10:52 PM, Vadim Girlin vadimgir...@gmail.com 
  wrote:
   On Fri, 2011-10-28 at 19:47 +0200, Marek Olšák wrote:
   I've got no idea what the flush was good for, but it's useless from
   the looks of it. The rest of the patch is just a cleanup resulting
   from some of the variables being no longer used for anything useful.
  
   There are no piglit regressions.
  
   It was intended to handle multiple interleaved query and conditional
   render calls with single query object (in this case in theory we may
   have multiple outstanding queries in current CS and separate data block
   in the buffer for each query, with possible buffer overflow). I wasn't
  
  Do you mean this?
  
  while (1) {
begin_query(q);
draw();
end_query(q);
render_condition(q);
draw();
render_condition(NULL);
  }
  
  begin_query always resets query results to 0 anyway, so in theory, we
  could re-use the same data block over and over again. 
 
 I think it's possible to run this loop without flushes, so we'll have
 multiple queries queued in current cs. From the driver point of view
 these queries will be executed simultaneously after flush, that's why we
 need to reserve and initialize separate data blocks for them.
 

Probably we also need to use PIPE_TRANSFER_UNSYCHRONIZED to avoid the
flush when mapping the buffer for data block initialization in the
r600_query_begin. It seems I missed this, or the mapping semantics were
changed with winsys change. 

Vadim

 Vadim
 
  I think the only
  case where buffer overflow may happen is:
  
  begin_query(q);
  while (1) {
draw();
suspend_queries(); // flush or u_blitter
resume_queries();
  }
  end_query(q);
  
  Correct?
  
  Marek
 
 
 



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: remove one pointless flush

2011-10-28 Thread Marek Olšák
On Sat, Oct 29, 2011 at 12:58 AM, Vadim Girlin vadimgir...@gmail.com wrote:
 On Sat, 2011-10-29 at 01:29 +0400, Vadim Girlin wrote:
 On Fri, 2011-10-28 at 23:16 +0200, Marek Olšák wrote:
  On Fri, Oct 28, 2011 at 10:52 PM, Vadim Girlin vadimgir...@gmail.com 
  wrote:
   On Fri, 2011-10-28 at 19:47 +0200, Marek Olšák wrote:
   I've got no idea what the flush was good for, but it's useless from
   the looks of it. The rest of the patch is just a cleanup resulting
   from some of the variables being no longer used for anything useful.
  
   There are no piglit regressions.
  
   It was intended to handle multiple interleaved query and conditional
   render calls with single query object (in this case in theory we may
   have multiple outstanding queries in current CS and separate data block
   in the buffer for each query, with possible buffer overflow). I wasn't
 
  Do you mean this?
 
  while (1) {
    begin_query(q);
    draw();
    end_query(q);
    render_condition(q);
    draw();
    render_condition(NULL);
  }
 
  begin_query always resets query results to 0 anyway, so in theory, we
  could re-use the same data block over and over again.

 I think it's possible to run this loop without flushes, so we'll have
 multiple queries queued in current cs. From the driver point of view
 these queries will be executed simultaneously after flush, that's why we
 need to reserve and initialize separate data blocks for them.


 Probably we also need to use PIPE_TRANSFER_UNSYCHRONIZED to avoid the
 flush when mapping the buffer for data block initialization in the
 r600_query_begin. It seems I missed this, or the mapping semantics were
 changed with winsys change.

Ah I see. I entirely missed the map/unmap part. Of course there is an
implicit sync in the winsys (the flush is not so expensive, but the
sync is). UNSYCHRONIZED would be dangerous in this particular case,
because the GPU may still use the previous buffer data. UNSYCHRONIZED
can only be used if we're 100% sure the GPU doesn't use the buffer
range we're going to change. I see only two ways out of this:

1) If the buffer is full, we can allocate another one and use that. I
don't think we have any other choice with current Mesa master.

2) We can program the GPU to memset the buffer. This would be very
easy with transform feedback.

I prefer (2).

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev