[Mesa-dev] Problem getting a 32-bit X11 visual when porting from GLX to EGL

2013-11-06 Thread Simon Farnsworth
Hello all,

I'm attempting to port our GLX based code to EGL/X11, as a first step towards 
being ready for Wayland, and I've hit a snag. We use an X11 compositor to get 
translucent windows blending one atop the other, and I carefully select a 32 
bit RGBA visual (ID 0x64 on my system) instead of a 24 bit RGB visual (IDs 
0x21 and 0x22 on my system) to make this work.

I've got the code working on EGL/X11, but I can't find an EGLConfig that's 
based 
on visual 0x64 - they're all based on 0x21 (TrueColor) and 0x22 (DirectColor).

Looking at 
http://cgit.freedesktop.org/mesa/mesa/tree/src/egl/drivers/dri2/platform_x11.c#n671
 
suggests this is deliberate. I've tried the stupid patch below, but that makes 
all the TrueColor EGLConfigs into the RGBA visual 0x64, which is clearly not 
wanted based on the comment.

How should I proceed here? I can obviously go back to GLX, and migrate to EGL 
when I migrate to Wayland, but I'd prefer to move to EGL now while I'm still 
X11 based.

diff --git a/src/egl/drivers/dri2/platform_x11.c 
b/src/egl/drivers/dri2/platform_x11.c
index a518db1..18fb81d 100644
--- a/src/egl/drivers/dri2/platform_x11.c
+++ b/src/egl/drivers/dri2/platform_x11.c
@@ -649,6 +649,37 @@ dri2_add_configs_for_visuals(struct dri2_egl_display 
*dri2_dpy,
   EGL_SWAP_BEHAVIOR_PRESERVED_BIT;
 
while (d.rem > 0) {
+  if (d.data->depth != 32) {
+ xcb_depth_next(&d);
+ continue;
+  }
+  EGLBoolean class_added[6] = { 0, };
+
+  visuals = xcb_depth_visuals(d.data);
+  for (i = 0; i < xcb_depth_visuals_length(d.data); i++) {
+if (class_added[visuals[i]._class])
+   continue;
+
+class_added[visuals[i]._class] = EGL_TRUE;
+for (j = 0; dri2_dpy->driver_configs[j]; j++) {
+config_attrs[1] = visuals[i].visual_id;
+config_attrs[3] = visuals[i]._class;
+
+rgba_masks[0] = visuals[i].red_mask;
+rgba_masks[1] = visuals[i].green_mask;
+rgba_masks[2] = visuals[i].blue_mask;
+rgba_masks[3] =
+   ~(rgba_masks[0] | rgba_masks[1] | rgba_masks[2]);
+   dri2_add_config(disp, dri2_dpy->driver_configs[j], id++,
+   surface_type, config_attrs, rgba_masks);
+}
+  }
+
+  xcb_depth_next(&d);
+   }
+
+   d = xcb_screen_allowed_depths_iterator(s.data);
+   while (d.rem > 0) {
   EGLBoolean class_added[6] = { 0, };
 
   visuals = xcb_depth_visuals(d.data);


-- 
Simon Farnsworth
Software Engineer
ONELAN Ltd
http://www.onelan.com


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCHv4] r600g: Use a fake reloc to sleep for fences

2012-02-14 Thread Simon Farnsworth
r300g is able to sleep until a fence completes rather than busywait because
it creates a special buffer object and relocation that stays busy until the
CS containing the fence is finished.

Copy the idea into r600g, and use it to sleep if the user asked for an
infinite wait, falling back to busywaiting if the user provided a timeout.

Signed-off-by: Simon Farnsworth 
---
This is now ready to apply. We might also want to backport it to the 8.0
branch, as without this or an equivalent, a GPU reset can result in
applications busywaiting forever.

v4: Rebase against master - cleanups to r600g had stopped it applying.

At Marek's suggestion, move the sleep BO into r600_pipe.c, instead of
r600_hw_context.c

v3: At Dave Airlie's suggestion on dri-devel, make use of the BO to give us
a fighting chance of recovering after a GPU reset.

We know that the fence will never be signalled by hardware if the dummy BO
has gone idle - we use that information to escape the loop. This might be a
useful addition to the 8.0 branch.

v2: Address Michel's review comments.

The fake bo is now unconditional, and is unreferenced when the fence is
moved to the fence pool by fence_reference. This entailed shifting to
r600_resource, which cleaned the code up a little to boot.

 src/gallium/drivers/r600/r600_pipe.c |   25 +++--
 src/gallium/drivers/r600/r600_pipe.h |1 +
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index bfb01f5..7d2dd20 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -47,6 +47,7 @@
 #include "r600_resource.h"
 #include "r600_shader.h"
 #include "r600_pipe.h"
+#include "r600_hw_context_priv.h"
 
 /*
  * pipe_context
@@ -116,6 +117,14 @@ static struct r600_fence *r600_create_fence(struct 
r600_context *rctx)
 
rscreen->fences.data[fence->index] = 0;
r600_context_emit_fence(rctx, rscreen->fences.bo, fence->index, 1);
+
+   /* Create a dummy BO so that fence_finish without a timeout can sleep 
waiting for completion */
+   fence->sleep_bo = (struct r600_resource*)
+   pipe_buffer_create(&rctx->screen->screen, 
PIPE_BIND_CUSTOM,
+  PIPE_USAGE_STAGING, 1);
+   /* Add the fence as a dummy relocation. */
+   r600_context_bo_reloc(rctx, fence->sleep_bo, RADEON_USAGE_READWRITE);
+
 out:
pipe_mutex_unlock(rscreen->fences.mutex);
return fence;
@@ -584,6 +593,7 @@ static void r600_fence_reference(struct pipe_screen 
*pscreen,
if (pipe_reference(&(*oldf)->reference, &newf->reference)) {
struct r600_screen *rscreen = (struct r600_screen *)pscreen;
pipe_mutex_lock(rscreen->fences.mutex);
+   pipe_resource_reference((struct 
pipe_resource**)&(*oldf)->sleep_bo, NULL);
LIST_ADDTAIL(&(*oldf)->head, &rscreen->fences.pool);
pipe_mutex_unlock(rscreen->fences.mutex);
}
@@ -617,6 +627,17 @@ static boolean r600_fence_finish(struct pipe_screen 
*pscreen,
}
 
while (rscreen->fences.data[rfence->index] == 0) {
+   /* Special-case infinite timeout - wait for the dummy BO to 
become idle */
+   if (timeout == PIPE_TIMEOUT_INFINITE) {
+   rscreen->ws->buffer_wait(rfence->sleep_bo->buf, 
RADEON_USAGE_READWRITE);
+   break;
+   }
+
+   /* The dummy BO will be busy until the CS including the fence 
has completed, or
+* the GPU is reset. Don't bother continuing to spin when the 
BO is idle. */
+   if (!rscreen->ws->buffer_is_busy(rfence->sleep_bo->buf, 
RADEON_USAGE_READWRITE))
+   break;
+
if (++spins % 256)
continue;
 #ifdef PIPE_OS_UNIX
@@ -626,11 +647,11 @@ static boolean r600_fence_finish(struct pipe_screen 
*pscreen,
 #endif
if (timeout != PIPE_TIMEOUT_INFINITE &&
os_time_get() - start_time >= timeout) {
-   return FALSE;
+   break;
}
}
 
-   return TRUE;
+   return rscreen->fences.data[rfence->index] != 0;
 }
 
 static int r600_interpret_tiling(struct r600_screen *rscreen, uint32_t 
tiling_config)
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index f130617..d8e396a 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -204,6 +204,7 @@ struct r600_textures_info {
 struct r600_fence {
struct pipe_reference   reference;
unsignedindex; /* in the shared bo */
+   struct 

Re: [Mesa-dev] [PATCHv3] r600g: Use a fake reloc to sleep for fences

2012-02-10 Thread Simon Farnsworth
On Thursday 9 February 2012 20:31:18 Marek Olšák wrote:
> 2012/2/9 Simon Farnsworth :
> > On Wednesday 8 February 2012 18:28:05 Michel Dänzer wrote:
> >> On Fre, 2012-02-03 at 17:32 +, Simon Farnsworth wrote:
> >> > --- a/src/gallium/drivers/r600/r600_hw_context.c
> >> > +++ b/src/gallium/drivers/r600/r600_hw_context.c
> >> > @@ -1623,6 +1623,13 @@ void r600_context_emit_fence(struct
> >> > +
> >> > +   /* Create a dummy BO so that fence_finish without a timeout
> >> > can sleep waiting for completion */ + *sleep_bo = (struct
> >> > r600_resource*) +
> >> > pipe_buffer_create(&ctx->screen->screen, PIPE_BIND_CUSTOM, +
> >> >  PIPE_USAGE_STAGING, 1); +   /*
> >> > Add the fence as a dummy relocation. */
> >> > +   r600_context_bo_reloc(ctx, *sleep_bo, RADEON_USAGE_READWRITE);
> >>
> >> Sorry for only thinking of this now, but what's the advantage of doing
> >> this here, rather than in r600_create_fence()? Seems like that would
> >> be
> >> simpler and cleaner.
> >
> > I've done it here because this is the bit of code that deals with all
> > the
> > hardware-related stuff, and it already knows about writing out
> > relocations. I have no particular opinion either way, though, and
> > wouldn't get upset about moving it. Thoughts?
>
> FWIW, the "hw_context" files are there for historical reasons only. There is
> nothing wrong with adding relocations and emitting commands from the other
> places of the driver. Eventually, r600.h, r600_pipe.h, and
> r600_hw_context_priv.h will be merged together.
>
OK, so when I rebase this onto master (which will be v4), I should move the
sleep_bo into r600_create_fence.

Is there anything else that needs to be done before this can be considered
for merging to master?
--
Simon Farnsworth
Software Engineer
ONELAN Ltd
http://www.onelan.com


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCHv3] r600g: Use a fake reloc to sleep for fences

2012-02-09 Thread Simon Farnsworth
On Wednesday 8 February 2012 18:28:05 Michel Dänzer wrote:
> On Fre, 2012-02-03 at 17:32 +0000, Simon Farnsworth wrote:
> > r300g is able to sleep until a fence completes rather than busywait
> > because it creates a special buffer object and relocation that stays
> > busy until the CS containing the fence is finished.
> >
> > Copy the idea into r600g, and use it to sleep if the user asked for an
> > infinite wait, falling back to busywaiting if the user provided a
> > timeout.
> >
> > Signed-off-by: Simon Farnsworth 
> > ---
> >
> > v3: At Dave Airlie's suggestion on dri-devel, make use of the BO to give
> > us a fighting chance of recovering after a GPU reset.
> >
> > We know that the fence will never be signalled by hardware if the dummy
> > BO has gone idle - we use that information to escape the loop. This
> > might be a useful addition to the 8.0 branch.
>
> That's a nifty trick. :)
>
It's just a shame that the rest of recovering from a GPU reset isn't as simple
- at least this stops us busywaiting eternally after the kernel's detected a
CP stall.

> > diff --git a/src/gallium/drivers/r600/r600_hw_context.c
> > b/src/gallium/drivers/r600/r600_hw_context.c index 8eb8e6d..acfa494
> > 100644
> > --- a/src/gallium/drivers/r600/r600_hw_context.c
> > +++ b/src/gallium/drivers/r600/r600_hw_context.c
> > @@ -1623,6 +1623,13 @@ void r600_context_emit_fence(struct r600_context
> > *ctx, struct r600_resource *fen>
> > ctx->pm4[ctx->pm4_cdwords++] = 0;   /* DATA_HI
> > */
> > ctx->pm4[ctx->pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0);
> > ctx->pm4[ctx->pm4_cdwords++] = r600_context_bo_reloc(ctx, fence_bo,
> > RADEON_USAGE_WRITE);>
> > +
> > +   /* Create a dummy BO so that fence_finish without a timeout can sleep
> > waiting for completion */ + *sleep_bo = (struct r600_resource*)
> > +   pipe_buffer_create(&ctx->screen->screen, 
> > PIPE_BIND_CUSTOM,
> > +  PIPE_USAGE_STAGING, 1);
> > +   /* Add the fence as a dummy relocation. */
> > +   r600_context_bo_reloc(ctx, *sleep_bo, RADEON_USAGE_READWRITE);
>
> Sorry for only thinking of this now, but what's the advantage of doing
> this here, rather than in r600_create_fence()? Seems like that would be
> simpler and cleaner.

I've done it here because this is the bit of code that deals with all the
hardware-related stuff, and it already knows about writing out relocations. I
have no particular opinion either way, though, and wouldn't get upset about
moving it. Thoughts?
--
Simon Farnsworth
Software Engineer
ONELAN Ltd
http://www.onelan.com


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCHv3] r600g: Use a fake reloc to sleep for fences

2012-02-03 Thread Simon Farnsworth
r300g is able to sleep until a fence completes rather than busywait because
it creates a special buffer object and relocation that stays busy until the
CS containing the fence is finished.

Copy the idea into r600g, and use it to sleep if the user asked for an
infinite wait, falling back to busywaiting if the user provided a timeout.

Signed-off-by: Simon Farnsworth 
---

v3: At Dave Airlie's suggestion on dri-devel, make use of the BO to give us
a fighting chance of recovering after a GPU reset.

We know that the fence will never be signalled by hardware if the dummy BO
has gone idle - we use that information to escape the loop. This might be a
useful addition to the 8.0 branch.

v2: Address Michel's review comments.

The fake bo is now unconditional, and is unreferenced when the fence is
moved to the fence pool by fence_reference. This entailed shifting to
r600_resource, which cleaned the code up a little to boot.

 src/gallium/drivers/r600/r600.h|2 +-
 src/gallium/drivers/r600/r600_hw_context.c |9 -
 src/gallium/drivers/r600/r600_pipe.c   |   18 +++---
 src/gallium/drivers/r600/r600_pipe.h   |1 +
 4 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h
index baf09c1..c5813c2 100644
--- a/src/gallium/drivers/r600/r600.h
+++ b/src/gallium/drivers/r600/r600.h
@@ -284,7 +284,7 @@ void r600_context_queries_resume(struct r600_context *ctx);
 void r600_query_predication(struct r600_context *ctx, struct r600_query 
*query, int operation,
int flag_wait);
 void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource 
*fence,
- unsigned offset, unsigned value);
+ unsigned offset, unsigned value, struct 
r600_resource **sleep_bo);
 void r600_context_flush_all(struct r600_context *ctx, unsigned flush_flags);
 void r600_context_flush_dest_caches(struct r600_context *ctx);
 
diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 8eb8e6d..acfa494 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -1603,7 +1603,7 @@ void r600_context_flush(struct r600_context *ctx, 
unsigned flags)
}
 }
 
-void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource 
*fence_bo, unsigned offset, unsigned value)
+void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource 
*fence_bo, unsigned offset, unsigned value, struct r600_resource **sleep_bo)
 {
uint64_t va;
 
@@ -1623,6 +1623,13 @@ void r600_context_emit_fence(struct r600_context *ctx, 
struct r600_resource *fen
ctx->pm4[ctx->pm4_cdwords++] = 0;   /* DATA_HI */
ctx->pm4[ctx->pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0);
ctx->pm4[ctx->pm4_cdwords++] = r600_context_bo_reloc(ctx, fence_bo, 
RADEON_USAGE_WRITE);
+
+   /* Create a dummy BO so that fence_finish without a timeout can sleep 
waiting for completion */
+   *sleep_bo = (struct r600_resource*)
+   pipe_buffer_create(&ctx->screen->screen, 
PIPE_BIND_CUSTOM,
+  PIPE_USAGE_STAGING, 1);
+   /* Add the fence as a dummy relocation. */
+   r600_context_bo_reloc(ctx, *sleep_bo, RADEON_USAGE_READWRITE);
 }
 
 static unsigned r600_query_read_result(char *map, unsigned start_index, 
unsigned end_index,
diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index c38fbc5..9b6015c 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -115,7 +115,7 @@ static struct r600_fence *r600_create_fence(struct 
r600_pipe_context *ctx)
pipe_reference_init(&fence->reference, 1);
 
rscreen->fences.data[fence->index] = 0;
-   r600_context_emit_fence(&ctx->ctx, rscreen->fences.bo, fence->index, 1);
+   r600_context_emit_fence(&ctx->ctx, rscreen->fences.bo, fence->index, 1, 
&fence->sleep_bo);
 out:
pipe_mutex_unlock(rscreen->fences.mutex);
return fence;
@@ -572,6 +572,7 @@ static void r600_fence_reference(struct pipe_screen 
*pscreen,
if (pipe_reference(&(*oldf)->reference, &newf->reference)) {
struct r600_screen *rscreen = (struct r600_screen *)pscreen;
pipe_mutex_lock(rscreen->fences.mutex);
+   pipe_resource_reference((struct 
pipe_resource**)&(*oldf)->sleep_bo, NULL);
LIST_ADDTAIL(&(*oldf)->head, &rscreen->fences.pool);
pipe_mutex_unlock(rscreen->fences.mutex);
}
@@ -605,6 +606,17 @@ static boolean r600_fence_finish(struct pipe_screen 
*pscreen,
}
 
while (rscreen->fences.data[rfence->index] == 0) {
+ 

[Mesa-dev] [PATCH] r600g: Use a fake reloc to sleep for fences

2012-02-02 Thread Simon Farnsworth
r300g is able to sleep until a fence completes rather than busywait because
it creates a special buffer object and relocation that stays busy until the
CS containing the fence is finished.

Copy the idea into r600g, and use it to sleep if the user asked for an
infinite wait, falling back to busywaiting if the user provided a timeout.

Signed-off-by: Simon Farnsworth 
---

v2: Address Michel's review comments.

The fake bo is now unconditional, and is unreferenced when the fence is
moved to the fence pool by fence_reference. This entailed shifting to
r600_resource, which cleaned the code up a little to boot.

 src/gallium/drivers/r600/r600.h|2 +-
 src/gallium/drivers/r600/r600_hw_context.c |9 -
 src/gallium/drivers/r600/r600_pipe.c   |9 -
 src/gallium/drivers/r600/r600_pipe.h   |1 +
 4 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h
index baf09c1..c5813c2 100644
--- a/src/gallium/drivers/r600/r600.h
+++ b/src/gallium/drivers/r600/r600.h
@@ -284,7 +284,7 @@ void r600_context_queries_resume(struct r600_context *ctx);
 void r600_query_predication(struct r600_context *ctx, struct r600_query 
*query, int operation,
int flag_wait);
 void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource 
*fence,
- unsigned offset, unsigned value);
+ unsigned offset, unsigned value, struct 
r600_resource **sleep_bo);
 void r600_context_flush_all(struct r600_context *ctx, unsigned flush_flags);
 void r600_context_flush_dest_caches(struct r600_context *ctx);
 
diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 8eb8e6d..acfa494 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -1603,7 +1603,7 @@ void r600_context_flush(struct r600_context *ctx, 
unsigned flags)
}
 }
 
-void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource 
*fence_bo, unsigned offset, unsigned value)
+void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource 
*fence_bo, unsigned offset, unsigned value, struct r600_resource **sleep_bo)
 {
uint64_t va;
 
@@ -1623,6 +1623,13 @@ void r600_context_emit_fence(struct r600_context *ctx, 
struct r600_resource *fen
ctx->pm4[ctx->pm4_cdwords++] = 0;   /* DATA_HI */
ctx->pm4[ctx->pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0);
ctx->pm4[ctx->pm4_cdwords++] = r600_context_bo_reloc(ctx, fence_bo, 
RADEON_USAGE_WRITE);
+
+   /* Create a dummy BO so that fence_finish without a timeout can sleep 
waiting for completion */
+   *sleep_bo = (struct r600_resource*)
+   pipe_buffer_create(&ctx->screen->screen, 
PIPE_BIND_CUSTOM,
+  PIPE_USAGE_STAGING, 1);
+   /* Add the fence as a dummy relocation. */
+   r600_context_bo_reloc(ctx, *sleep_bo, RADEON_USAGE_READWRITE);
 }
 
 static unsigned r600_query_read_result(char *map, unsigned start_index, 
unsigned end_index,
diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index c38fbc5..748869a 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -115,7 +115,7 @@ static struct r600_fence *r600_create_fence(struct 
r600_pipe_context *ctx)
pipe_reference_init(&fence->reference, 1);
 
rscreen->fences.data[fence->index] = 0;
-   r600_context_emit_fence(&ctx->ctx, rscreen->fences.bo, fence->index, 1);
+   r600_context_emit_fence(&ctx->ctx, rscreen->fences.bo, fence->index, 1, 
&fence->sleep_bo);
 out:
pipe_mutex_unlock(rscreen->fences.mutex);
return fence;
@@ -572,6 +572,7 @@ static void r600_fence_reference(struct pipe_screen 
*pscreen,
if (pipe_reference(&(*oldf)->reference, &newf->reference)) {
struct r600_screen *rscreen = (struct r600_screen *)pscreen;
pipe_mutex_lock(rscreen->fences.mutex);
+   pipe_resource_reference((struct 
pipe_resource**)&(*oldf)->sleep_bo, NULL);
LIST_ADDTAIL(&(*oldf)->head, &rscreen->fences.pool);
pipe_mutex_unlock(rscreen->fences.mutex);
}
@@ -605,6 +606,12 @@ static boolean r600_fence_finish(struct pipe_screen 
*pscreen,
}
 
while (rscreen->fences.data[rfence->index] == 0) {
+   /* Special-case infinite timeout */
+   if (timeout == PIPE_TIMEOUT_INFINITE) {
+   rscreen->ws->buffer_wait(rfence->sleep_bo->buf, 
RADEON_USAGE_READWRITE);
+   continue;
+   }
+
if (++spins % 256)
continue;

Re: [Mesa-dev] [PATCH] r600g: Use a fake reloc to sleep for fences

2012-02-01 Thread Simon Farnsworth
On Wednesday 1 February 2012, Michel Dänzer  wrote:
> On Mit, 2012-02-01 at 15:01 +0000, Simon Farnsworth wrote: 
> > +   if (sleep_bo) {
> > +   unsigned reloc_index;
> > +   /* Create a dummy BO so that fence_finish without a timeout can 
> > sleep waiting for completion */
> > +   *sleep_bo = ctx->ws->buffer_create(ctx->ws, 1, 1,
> > +  PIPE_BIND_CUSTOM,
> > +  RADEON_DOMAIN_GTT);
> > +   /* Add the fence as a dummy relocation. */
> > +   reloc_index = ctx->ws->cs_add_reloc(ctx->cs,
> > +   
> > ctx->ws->buffer_get_cs_handle(*sleep_bo),
> > +   RADEON_USAGE_READWRITE, 
> > RADEON_DOMAIN_GTT);
> > +   if (reloc_index >= ctx->creloc)
> > +   ctx->creloc = reloc_index+1;
> > +   }
> 
> Is there a point in making sleep_bo optional?
>
I can't think of a reason to make it optional; I'll remove that in v2.
> 
> > diff --git a/src/gallium/drivers/r600/r600_pipe.c 
> > b/src/gallium/drivers/r600/r600_pipe.c
> > index c38fbc5..71e31b1 100644
> > --- a/src/gallium/drivers/r600/r600_pipe.c
> > +++ b/src/gallium/drivers/r600/r600_pipe.c
> > @@ -605,6 +605,14 @@ static boolean r600_fence_finish(struct pipe_screen 
> > *pscreen,
> > }
> >  
> > while (rscreen->fences.data[rfence->index] == 0) {
> > +   /* Special-case infinite timeout */
> > +   if (timeout == PIPE_TIMEOUT_INFINITE &&
> > +   rfence->sleep_bo) {
> > +   rscreen->ws->buffer_wait(rfence->sleep_bo, 
> > RADEON_USAGE_READWRITE);
> > +   pb_reference(&rfence->sleep_bo, NULL);
> > +   continue;
> > +   }
> 
> I think rfence->sleep_bo should only be unreferenced in
> r600_fence_reference() when the fence is recycled, otherwise it'll be
> leaked if r600_fence_finish() is never called for some reason.
>
I'll fix this in v2.

> If r600_fence_finish() only ever called os_time_sleep(), never
> sched_yield() (like r300_fence_finish()), would that avoid your problem
> even with a finite timeout?
>
I experimented with that - depending on the specific workload, I need the
timeout to vary, otherwise I can see the impact of the loop in terms of bad
latency behaviour (resulting in occasional dropped frames). For the
workloads I tried, I needed the sleep to vary between 1 usec (for
low-complexity workloads) and 100 usec (for high complexity
workloads). Recompiling Mesa for each workload is obviously not an option.

I did try an adaptive spin - essentially removing the "if (spins++ % 256)
continue", and adding:

if (spins < 40)
os_sleep_time(1);
else if (spins < 100)
os_sleep_time(10);
else
os_sleep_time(100);

But I felt this was ugly, when the core problem is that I want to sleep
until completion, the hardware has support for sleeping until completion,
and the only reason I can't is deficiencies in the driver stack.

Fundamentally, I suspect that the reason I'm seeing pain from this and other
people aren't is that I'm comparing an AMD E-350 to an Intel Atom D510, and
I've tuned my software stack on the D510 to within an inch of its life.

My expectation is that the better GPU in the E-350 will make my 2D
graphics-intensive workload (OpenGL compositing of 2D movies) perform about
as well as it did on the D510 - sleep-based waiting for fence completion
gets in the way, as the D510 has slightly more CPU power than the E-350, and
I'm not (yet) fully exploiting the E-350's GPU.
-- 
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Use a fake reloc to sleep for fences

2012-02-01 Thread Simon Farnsworth
On Wednesday 1 February 2012, Simon Farnsworth  
wrote:
> This is a userspace only fix for my problem, as suggested by Mark Olšák. It

My apologies, Marek; my typing appears to have failed me, and renamed you to
Mark. I shall try not to make that mistake again.
-- 
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] r600g: Use a fake reloc to sleep for fences

2012-02-01 Thread Simon Farnsworth
r300g is able to sleep until a fence completes rather than busywait because
it creates a special buffer object and relocation that stays busy until the
CS containing the fence is finished.

Copy the idea into r600g, and use it to sleep if the user asked for an
infinite wait, falling back to busywaiting if the user provided a timeout.

Signed-off-by: Simon Farnsworth 
---
This is a userspace only fix for my problem, as suggested by Mark Olšák. It
doesn't fix the case with a timeout (which doesn't matter to me), but works
on existing kernels.

Given that my previous suggested fix opened a can of worms (mostly because I
don't fully understand modern Radeon hardware), this is probably enough of a
fix to apply for now, leaving a proper fix for someone with more
understanding of the way multiple rings interact.

 src/gallium/drivers/r600/r600.h|2 +-
 src/gallium/drivers/r600/r600_hw_context.c |   16 +++-
 src/gallium/drivers/r600/r600_pipe.c   |   10 +-
 src/gallium/drivers/r600/r600_pipe.h   |1 +
 4 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h
index baf09c1..bac4890 100644
--- a/src/gallium/drivers/r600/r600.h
+++ b/src/gallium/drivers/r600/r600.h
@@ -284,7 +284,7 @@ void r600_context_queries_resume(struct r600_context *ctx);
 void r600_query_predication(struct r600_context *ctx, struct r600_query 
*query, int operation,
int flag_wait);
 void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource 
*fence,
- unsigned offset, unsigned value);
+ unsigned offset, unsigned value, struct pb_buffer 
**sleep_bo);
 void r600_context_flush_all(struct r600_context *ctx, unsigned flush_flags);
 void r600_context_flush_dest_caches(struct r600_context *ctx);
 
diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 8eb8e6d..cc81c48 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -1603,7 +1603,7 @@ void r600_context_flush(struct r600_context *ctx, 
unsigned flags)
}
 }
 
-void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource 
*fence_bo, unsigned offset, unsigned value)
+void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource 
*fence_bo, unsigned offset, unsigned value, struct pb_buffer **sleep_bo)
 {
uint64_t va;
 
@@ -1623,6 +1623,20 @@ void r600_context_emit_fence(struct r600_context *ctx, 
struct r600_resource *fen
ctx->pm4[ctx->pm4_cdwords++] = 0;   /* DATA_HI */
ctx->pm4[ctx->pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0);
ctx->pm4[ctx->pm4_cdwords++] = r600_context_bo_reloc(ctx, fence_bo, 
RADEON_USAGE_WRITE);
+
+   if (sleep_bo) {
+   unsigned reloc_index;
+   /* Create a dummy BO so that fence_finish without a timeout can 
sleep waiting for completion */
+   *sleep_bo = ctx->ws->buffer_create(ctx->ws, 1, 1,
+  PIPE_BIND_CUSTOM,
+  RADEON_DOMAIN_GTT);
+   /* Add the fence as a dummy relocation. */
+   reloc_index = ctx->ws->cs_add_reloc(ctx->cs,
+   
ctx->ws->buffer_get_cs_handle(*sleep_bo),
+   RADEON_USAGE_READWRITE, 
RADEON_DOMAIN_GTT);
+   if (reloc_index >= ctx->creloc)
+   ctx->creloc = reloc_index+1;
+   }
 }
 
 static unsigned r600_query_read_result(char *map, unsigned start_index, 
unsigned end_index,
diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index c38fbc5..71e31b1 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -115,7 +115,7 @@ static struct r600_fence *r600_create_fence(struct 
r600_pipe_context *ctx)
pipe_reference_init(&fence->reference, 1);
 
rscreen->fences.data[fence->index] = 0;
-   r600_context_emit_fence(&ctx->ctx, rscreen->fences.bo, fence->index, 1);
+   r600_context_emit_fence(&ctx->ctx, rscreen->fences.bo, fence->index, 1, 
&fence->sleep_bo);
 out:
pipe_mutex_unlock(rscreen->fences.mutex);
return fence;
@@ -605,6 +605,14 @@ static boolean r600_fence_finish(struct pipe_screen 
*pscreen,
}
 
while (rscreen->fences.data[rfence->index] == 0) {
+   /* Special-case infinite timeout */
+   if (timeout == PIPE_TIMEOUT_INFINITE &&
+   rfence->sleep_bo) {
+   rscreen->ws->buffer_wait(rfence->sleep_bo, 
RADEON_USAGE_READWRITE);
+

[Mesa-dev] [PATCHv2] drm/radeon: Add support for userspace fence waits

2012-02-01 Thread Simon Farnsworth
Userspace currently busywaits for fences to complete; on my workload, this
busywait consumes 10% of the available CPU time.

Provide an ioctl so that userspace can wait for an EOP interrupt that
corresponds to a previous EVENT_WRITE_EOP.

Signed-off-by: Simon Farnsworth 
---
There's debate ongoing about whether this is a sensible interface; I've
cleaned up in accordance with Dave Airlie's review comments, so that there's
a nicer starting point if the debate ends up agreeing that this is actually
the sort of interface we want.

In the meantime, I'm going to leave this interface alone, and work on
porting r300g's fence mechanism to r600g, using it to sleep instead of
spinning when fences are issued with an infinite timeout, but using the
existing busywait mechanism if a timeout is provided.

 drivers/gpu/drm/radeon/radeon.h   |3 ++
 drivers/gpu/drm/radeon/radeon_drv.c   |3 +-
 drivers/gpu/drm/radeon/radeon_fence.c |2 +
 drivers/gpu/drm/radeon/radeon_gem.c   |   63 +
 drivers/gpu/drm/radeon/radeon_kms.c   |1 +
 include/drm/radeon_drm.h  |   31 
 6 files changed, 102 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 2859406..00c187b 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -217,6 +217,7 @@ struct radeon_fence_driver {
unsigned long   last_jiffies;
unsigned long   last_timeout;
wait_queue_head_t   queue;
+   wait_queue_head_t   userspace_queue;
struct list_headcreated;
struct list_heademitted;
struct list_headsignaled;
@@ -1348,6 +1349,8 @@ int radeon_gem_set_tiling_ioctl(struct drm_device *dev, 
void *data,
struct drm_file *filp);
 int radeon_gem_get_tiling_ioctl(struct drm_device *dev, void *data,
struct drm_file *filp);
+int radeon_gem_wait_user_fence_ioctl(struct drm_device *dev, void *data,
+struct drm_file *filp);
 
 /* VRAM scratch page for HDP bug, default vram page */
 struct r600_vram_scratch {
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c 
b/drivers/gpu/drm/radeon/radeon_drv.c
index 4ae2e1d..9f82fa9 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -56,9 +56,10 @@
  *   2.12.0 - RADEON_CS_KEEP_TILING_FLAGS
  *   2.13.0 - virtual memory support
  *   2.14.0 - add evergreen tiling informations
+ *   2.15.0 - gem_wait_user_fence ioctl
  */
 #define KMS_DRIVER_MAJOR   2
-#define KMS_DRIVER_MINOR   14
+#define KMS_DRIVER_MINOR   15
 #define KMS_DRIVER_PATCHLEVEL  0
 int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags);
 int radeon_driver_unload_kms(struct drm_device *dev);
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c 
b/drivers/gpu/drm/radeon/radeon_fence.c
index 64ea3dd..d86bc28 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -356,6 +356,7 @@ void radeon_fence_process(struct radeon_device *rdev, int 
ring)
if (wake) {
wake_up_all(&rdev->fence_drv[ring].queue);
}
+   wake_up_interruptible_all(&rdev->fence_drv[ring].userspace_queue);
 }
 
 int radeon_fence_count_emitted(struct radeon_device *rdev, int ring)
@@ -421,6 +422,7 @@ static void radeon_fence_driver_init_ring(struct 
radeon_device *rdev, int ring)
INIT_LIST_HEAD(&rdev->fence_drv[ring].emitted);
INIT_LIST_HEAD(&rdev->fence_drv[ring].signaled);
init_waitqueue_head(&rdev->fence_drv[ring].queue);
+   init_waitqueue_head(&rdev->fence_drv[ring].userspace_queue);
rdev->fence_drv[ring].initialized = false;
 }
 
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c 
b/drivers/gpu/drm/radeon/radeon_gem.c
index 7337850..765fc6d 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -531,3 +531,66 @@ int radeon_mode_dumb_destroy(struct drm_file *file_priv,
 {
return drm_gem_handle_delete(file_priv, handle);
 }
+
+int radeon_gem_wait_user_fence_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *filp)
+{
+   struct drm_radeon_gem_wait_user_fence *args = data;
+   struct radeon_device *rdev = dev->dev_private;
+   struct drm_gem_object *gobj;
+   struct radeon_bo *robj;
+   void *buffer_data;
+   uint32_t *fence_data;
+   int r = 0;
+   long timeout;
+   int ring = RADEON_RING_TYPE_GFX_INDEX;
+
+   /* If you're implementing this for other rings, you'll want to share
+  code with radeon_cs_get_ring in radeon_cs.c */
+   if (args->ring != RADEON_CS_RING_GFX) {
+   return -EINVAL;
+   

[Mesa-dev] [PATCHv2] r600g: Use new kernel interface to wait for fences

2012-02-01 Thread Simon Farnsworth
Instead of busywaiting for the GPU to finish a fence, use the new kernel
interface to wait for fence completion.

If the new kernel interface is unavailable, fall back to busywaiting.

Signed-off-by: Simon Farnsworth 
---
This is simply addressing Michel's review comments against the v1 patch.

There is ongoing debate on the dri-devel list about the required kernel
interface to make this patch useful; until that discussion is resolved, this
patch should probably not be applied.

 src/gallium/drivers/r600/r600_hw_context.c|2 +-
 src/gallium/drivers/r600/r600_pipe.c  |   14 +++--
 src/gallium/winsys/radeon/drm/radeon_drm_bo.c |   34 +
 src/gallium/winsys/radeon/drm/radeon_winsys.h |   16 +++
 4 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 8eb8e6d..35a57a7 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -1618,7 +1618,7 @@ void r600_context_emit_fence(struct r600_context *ctx, 
struct r600_resource *fen
ctx->pm4[ctx->pm4_cdwords++] = 
EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_TS_EVENT) | EVENT_INDEX(5);
ctx->pm4[ctx->pm4_cdwords++] = va & 0xUL;   /* ADDRESS_LO */
/* DATA_SEL | INT_EN | ADDRESS_HI */
-   ctx->pm4[ctx->pm4_cdwords++] = (1 << 29) | (0 << 24) | ((va >> 32UL) & 
0xFF);
+   ctx->pm4[ctx->pm4_cdwords++] = (1 << 29) | (2 << 24) | ((va >> 32UL) & 
0xFF);
ctx->pm4[ctx->pm4_cdwords++] = value;   /* DATA_LO */
ctx->pm4[ctx->pm4_cdwords++] = 0;   /* DATA_HI */
ctx->pm4[ctx->pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0);
diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index c38fbc5..c03a176 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -595,7 +595,6 @@ static boolean r600_fence_finish(struct pipe_screen 
*pscreen,
struct r600_screen *rscreen = (struct r600_screen *)pscreen;
struct r600_fence *rfence = (struct r600_fence*)fence;
int64_t start_time = 0;
-   unsigned spins = 0;
 
if (timeout != PIPE_TIMEOUT_INFINITE) {
start_time = os_time_get();
@@ -605,16 +604,13 @@ static boolean r600_fence_finish(struct pipe_screen 
*pscreen,
}
 
while (rscreen->fences.data[rfence->index] == 0) {
-   if (++spins % 256)
-   continue;
-#ifdef PIPE_OS_UNIX
-   sched_yield();
-#else
-   os_time_sleep(10);
-#endif
+   rscreen->ws->buffer_wait_fence(rscreen->fences.bo->buf,
+  rfence->index << 2,
+  0,
+  timeout);
if (timeout != PIPE_TIMEOUT_INFINITE &&
os_time_get() - start_time >= timeout) {
-   return FALSE;
+   return rscreen->fences.data[rfence->index] == 0 ? FALSE 
: TRUE;
}
}
 
diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c 
b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
index 143dcf9..b83791d 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
@@ -879,6 +879,36 @@ static uint64_t radeon_winsys_bo_va(struct pb_buffer 
*buffer)
 return bo->va;
 }
 
+/* No kernel support for doing this faster - just spin */
+static void radeon_winsys_bo_wait_fence_nokernel(struct pb_buffer *buf,
+unsigned offset,
+uint32_t value,
+uint64_t timeout)
+{
+#ifdef PIPE_OS_UNIX
+sched_yield();
+#else
+os_time_sleep(10);
+#endif
+}
+
+static void radeon_winsys_bo_wait_fence(struct pb_buffer *_buf,
+   unsigned offset,
+   uint32_t value,
+   uint64_t timeout)
+{
+struct radeon_bo *bo = get_radeon_bo(_buf);
+struct drm_radeon_gem_wait_user_fence args;
+memset(&args, 0, sizeof(args));
+args.handle = bo->handle;
+args.ring = RADEON_CS_RING_GFX;
+args.offset = offset;
+args.value = value;
+args.timeout_usec = timeout;
+while (drmCommandWrite(bo->rws->fd, DRM_RADEON_GEM_WAIT_USER_FENCE,
+   &args, sizeof(args)) == -EBUSY);
+}
+
 void radeon_bomgr_init_functions(struct radeon_drm_winsys *ws)
 {
 ws->base.buffer_get_cs_handle = radeon_drm_get_cs_handle;
@@ -892,4 +922,8 @@ void radeon_bomgr_init_functions(stru

Re: [Mesa-dev] [PATCH] r600g: Use new kernel interface to wait for fences

2012-02-01 Thread Simon Farnsworth
On Tuesday 31 January 2012, Michel Dänzer  wrote:
> On Die, 2012-01-31 at 17:02 +0000, Simon Farnsworth wrote: 
> > Instead of busywaiting for the GPU to finish a fence, use the new kernel
> > interface to wait for fence completion.
> > 
> > If the new kernel interface is unavailable, fall back to busywaiting.
> > 
> > if (timeout != PIPE_TIMEOUT_INFINITE &&
> > os_time_get() - start_time >= timeout) {
> > return FALSE;
> 
> Maybe add something like
> 
> if (rscreen->fences.data[rfence->index])
> return TRUE;
> 
> before the timeout check? Otherwise there may be a false negative if the
> fence signalled just before the timeout.

I'll fix this - I think I'd prefer to use a ternary operator in the return.

> > diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c 
> > b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> > index 143dcf9..487fc58 100644
> > --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> > +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> > @@ -892,4 +922,9 @@ void radeon_bomgr_init_functions(struct 
> > radeon_drm_winsys *ws)
> >  ws->base.buffer_from_handle = radeon_winsys_bo_from_handle;
> >  ws->base.buffer_get_handle = radeon_winsys_bo_get_handle;
> >  ws->base.buffer_get_virtual_address = radeon_winsys_bo_va;
> > +if (ws->info.drm_major > 2 ||
> > +(ws->info.drm_major == 2 && ws->info.drm_minor >= 15))
> > +ws->base.buffer_wait_fence = radeon_winsys_bo_wait_fence;
> > +else
> > +ws->base.buffer_wait_fence = radeon_winsys_bo_wait_fence_nokernel;
> >  }
>
> We have no idea what kind of API a hypothetical major version > 2 might
> have, so I think it's better to only check for (ws->info.drm_minor >=
> 15) here.
>
I'll make that change as well and respin.
-- 
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] Do not use sched_yield() on Linux

2012-01-31 Thread Simon Farnsworth
On Tuesday 31 January 2012, Alan Swanson  wrote:
> As discussed back in 2003, sched_yield() on Linux is no longer
> equivalent to other POSIX variations. From a LWN article; "This call
> used to simply move the process to the end of the run queue; now it
> moves the process to the "expired" queue, effectively cancelling the
> rest of the process's time slice. So a process calling sched_yield() now
> must wait until all other runnable processes in the system have used up
> their time slices before it will get the processor again."
> 
> However its use on Linux has sneaked back in causing suboptimal
> performance such as reported by Simon Farnsworth on r600g. Use sleep on
> Linux instead.
> 
> Signed-off-by: Alan Swanson 
>
I should note that I'm running my 3D app real-time (SCHED_RR) - the
performance pain I was facing is that I got to hog one CPU core waiting for
the GPU, while having several (three or four, typically) other processes
(SCHED_OTHER priority) waiting for free CPU time. As the E-350 is not
overblessed with CPU time in the first place, this hurt.

I've fixed my problem by a different route (in other patches sent to the
list); I no longer spin at all when waiting for fences to complete, instead
waiting for an interrupt from the GPU.

Having said that, I'll rebase your patch on top of my changes, and confirm
that it doesn't break things for me.
-- 
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] r600g: Use new kernel interface to wait for fences

2012-01-31 Thread Simon Farnsworth
Instead of busywaiting for the GPU to finish a fence, use the new kernel
interface to wait for fence completion.

If the new kernel interface is unavailable, fall back to busywaiting.

Signed-off-by: Simon Farnsworth 
---
This builds on top of the kernel interface I add in Message-Id:
<1328029169-12729-1-git-send-email-simon.farnswo...@onelan.co.uk> and
depends on libdrm's copy of radeon_drm.h being synced with the changes made
in that patch.

As with the kernel patch, I'm working atop Jerome's 2D tiling work, so this
may not apply to any current tree. I'm happy to rebase against another tree
on request, though.

 src/gallium/drivers/r600/r600_hw_context.c|2 +-
 src/gallium/drivers/r600/r600_pipe.c  |   12 +++-
 src/gallium/winsys/radeon/drm/radeon_drm_bo.c |   35 +
 src/gallium/winsys/radeon/drm/radeon_winsys.h |   16 +++
 4 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 8eb8e6d..35a57a7 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -1618,7 +1618,7 @@ void r600_context_emit_fence(struct r600_context *ctx, 
struct r600_resource *fen
ctx->pm4[ctx->pm4_cdwords++] = 
EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_TS_EVENT) | EVENT_INDEX(5);
ctx->pm4[ctx->pm4_cdwords++] = va & 0xUL;   /* ADDRESS_LO */
/* DATA_SEL | INT_EN | ADDRESS_HI */
-   ctx->pm4[ctx->pm4_cdwords++] = (1 << 29) | (0 << 24) | ((va >> 32UL) & 
0xFF);
+   ctx->pm4[ctx->pm4_cdwords++] = (1 << 29) | (2 << 24) | ((va >> 32UL) & 
0xFF);
ctx->pm4[ctx->pm4_cdwords++] = value;   /* DATA_LO */
ctx->pm4[ctx->pm4_cdwords++] = 0;   /* DATA_HI */
ctx->pm4[ctx->pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0);
diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index c38fbc5..12c5bf5 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -595,7 +595,6 @@ static boolean r600_fence_finish(struct pipe_screen 
*pscreen,
struct r600_screen *rscreen = (struct r600_screen *)pscreen;
struct r600_fence *rfence = (struct r600_fence*)fence;
int64_t start_time = 0;
-   unsigned spins = 0;
 
if (timeout != PIPE_TIMEOUT_INFINITE) {
start_time = os_time_get();
@@ -605,13 +604,10 @@ static boolean r600_fence_finish(struct pipe_screen 
*pscreen,
}
 
while (rscreen->fences.data[rfence->index] == 0) {
-   if (++spins % 256)
-   continue;
-#ifdef PIPE_OS_UNIX
-   sched_yield();
-#else
-   os_time_sleep(10);
-#endif
+   rscreen->ws->buffer_wait_fence(rscreen->fences.bo->buf,
+  rfence->index << 2,
+  0,
+  timeout);
if (timeout != PIPE_TIMEOUT_INFINITE &&
os_time_get() - start_time >= timeout) {
return FALSE;
diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c 
b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
index 143dcf9..487fc58 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
@@ -879,6 +879,36 @@ static uint64_t radeon_winsys_bo_va(struct pb_buffer 
*buffer)
 return bo->va;
 }
 
+/* No kernel support for doing this faster - just spin */
+static void radeon_winsys_bo_wait_fence_nokernel(struct pb_buffer *buf,
+unsigned offset,
+uint32_t value,
+uint64_t timeout)
+{
+#ifdef PIPE_OS_UNIX
+sched_yield();
+#else
+os_time_sleep(10);
+#endif
+}
+
+static void radeon_winsys_bo_wait_fence(struct pb_buffer *_buf,
+   unsigned offset,
+   uint32_t value,
+   uint64_t timeout)
+{
+struct radeon_bo *bo = get_radeon_bo(_buf);
+struct drm_radeon_gem_wait_user_fence args;
+memset(&args, 0, sizeof(args));
+args.handle = bo->handle;
+args.ring = RADEON_CS_RING_GFX;
+args.offset = offset;
+args.value = value;
+args.timeout_usec = timeout;
+while (drmCommandWrite(bo->rws->fd, DRM_RADEON_GEM_WAIT_USER_FENCE,
+   &args, sizeof(args)) == -EBUSY);
+}
+
 void radeon_bomgr_init_functions(struct radeon_drm_winsys *ws)
 {
 ws->base.buffer_get_cs_handle = radeon_drm_get_cs_handle;
@@ -892,4 +922,9 @@ void radeon_bomgr_init

Re: [Mesa-dev] [RFC] r600-r800 2D tiling

2012-01-16 Thread Simon Farnsworth
(resending due to my inability to work my e-mail client - I neither cc'd
Jerome, nor used the correct identity, so the original appears to be held in
moderation).

On Thursday 12 January 2012, Jerome Glisse  wrote:
> Hi,
> 
> I don't cross post as i am pretty sure all interested people are reading
> this mailing-list.
> 
> Attached is kernel, libdrm, ddx, mesa/r600g patches to enable 2D tiling
> on r600 to cayman. I haven't yet done a full regression testing but 2D
> tiling seems to work ok. I would like to get feedback on 2 things :
> 
> - the kernel API

I notice that you don't expose all the available Evergreen parameters to
user control (TILE_SPLIT_BYTES, NUM_BANKS are both currently fixed by the
kernel). Is this deliberate?

It looks like it's leftovers from a previous attempt to force Evergreen's
flexible 2D tiling to behave like R600's fixed-by-hardware 2D tiling.

> - using libdrm/radeon as common place for surface allocation
> 
> The second question especialy impact the layering/abstraction of gallium
> btw winsys as it make libdrm/radeon_surface API a part of the winsys.
> The ddx doesn't need as much knowledge as mesa (pretty much the whole
> mipmap tree is pointless to the ddx). So anyone have strong feeling
> about moving the whole mipmap tree computation to this common code ?
>
I'm in favour - it means that all the code relating to the details of how
modern Radeons tile surfaces is in one place.

I've looked at the API you introduce to handle this, and it should be very
easy to port to a non-libdrm platform - the only element of the API that's
currently tied to libdrm is radeon_surface_manager_new, so a new platform
shouldn't struggle to adapt it.

I do have one question; how are you intending to handle passing the tiling
parameters from the DDX to Mesa for GLX_EXT_texture_from_pixmap? Right now,
it works because the DDX uses the surface manager's defaults for tiling, as
does Mesa; I would expect Mesa to read out the parameters as set in the
kernel and use those.

At a future date, I can envisage the DDX wanting to choose a different
tiling layout for DRI2 buffers, or XComposite backing pixmaps (e.g. because
someone's benchmarked it and found that choosing something beyond the bare
minimum that meets constraints improves performance); it would be a shame if
we can't do this because Mesa's not flexible enough.

-- 
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Fix tiling alignment to match docs

2011-12-13 Thread Simon Farnsworth
On Tuesday 13 December 2011, Simon Farnsworth  
wrote:
> R600 and Evergreen have different tiling requirements. Fix Mesa to match the
> documented requirements.
> 
> Signed-off-by: Simon Farnsworth 
> ---
> 
> This doesn't fix my problems with enabling macro tiling, but it does help
> somewhat. I also need to fix tile shape in the kernel and alignment in the
> DDX to avoid misrendering, at which point I see CP lockups instead.
> 
> I'm therefore sending this, the DDX patch, and the kernel patch out in order
> to avoid getting stuck in a world where I have an 80% fix, someone else has
> an 80% fix, and if only we talked, we'd have a 100% fix.
> 
Jerome has told me that he's got closer to fixing 2D tiling than I have -
I'm going to collaborate with him on fixing his remaining bugs rather than
continue with these patches.
-- 
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] r600g: Fix tiling alignment to match docs

2011-12-13 Thread Simon Farnsworth
R600 and Evergreen have different tiling requirements. Fix Mesa to match the
documented requirements.

Signed-off-by: Simon Farnsworth 
---

This doesn't fix my problems with enabling macro tiling, but it does help
somewhat. I also need to fix tile shape in the kernel and alignment in the
DDX to avoid misrendering, at which point I see CP lockups instead.

I'm therefore sending this, the DDX patch, and the kernel patch out in order
to avoid getting stuck in a world where I have an 80% fix, someone else has
an 80% fix, and if only we talked, we'd have a 100% fix.

 src/gallium/drivers/r600/evergreen_state.c |2 +-
 src/gallium/drivers/r600/r600_texture.c|  140 
 2 files changed, 103 insertions(+), 39 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index d0c02d5..d6b97da 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -1447,7 +1447,7 @@ static void evergreen_cb(struct r600_pipe_context *rctx, 
struct r600_pipe_state
offset >> 8, 0x, &rtex->resource, 
RADEON_USAGE_READWRITE);
r600_pipe_state_add_reg(rstate,
R_028C78_CB_COLOR0_DIM + cb * 0x3C,
-   0x0, 0x, NULL, 0);
+   S_028C78_WIDTH_MAX(surf->base.width) | 
S_028C78_HEIGHT_MAX(surf->base.height), 0x, NULL, 0);
r600_pipe_state_add_reg(rstate,
R_028C70_CB_COLOR0_INFO + cb * 0x3C,
color_info, 0x, &rtex->resource, 
RADEON_USAGE_READWRITE);
diff --git a/src/gallium/drivers/r600/r600_texture.c 
b/src/gallium/drivers/r600/r600_texture.c
index 6143133..16ffe23 100644
--- a/src/gallium/drivers/r600/r600_texture.c
+++ b/src/gallium/drivers/r600/r600_texture.c
@@ -88,23 +88,42 @@ static unsigned r600_get_block_alignment(struct pipe_screen 
*screen,
unsigned pixsize = util_format_get_blocksize(format);
int p_align;
 
-   switch(array_mode) {
-   case V_038000_ARRAY_2D_TILED_THIN1:
-   p_align = MAX2(rscreen->tiling_info.num_banks,
-  (((rscreen->tiling_info.group_bytes / 8) / 
pixsize) * rscreen->tiling_info.num_banks)) * 8;
-   /* further restrictions for scanout */
-   p_align = MAX2(rscreen->tiling_info.num_banks * 8, p_align);
-   break;
-   case V_038000_ARRAY_1D_TILED_THIN1:
-   p_align = MAX2(8, (rscreen->tiling_info.group_bytes / (8 * 
pixsize)));
-   /* further restrictions for scanout */
-   p_align = MAX2((rscreen->tiling_info.group_bytes / pixsize), 
p_align);
-   break;
-   case V_038000_ARRAY_LINEAR_ALIGNED:
-   case V_038000_ARRAY_LINEAR_GENERAL:
-   default:
-   p_align = MAX2(64, rscreen->tiling_info.group_bytes / pixsize);
-   break;
+   if (rscreen->chip_class >= EVERGREEN)
+   {
+   switch(array_mode) {
+   case V_038000_ARRAY_2D_TILED_THIN1:
+   /* Kernel uses bank width of 8, macro tile aspect of 1 
*/
+   p_align = rscreen->tiling_info.num_channels * 8 * 1 * 8;
+   break;
+   case V_038000_ARRAY_1D_TILED_THIN1:
+   p_align = MAX2(8, rscreen->tiling_info.group_bytes /
+  (8 * 1 * pixsize * 1));
+   break;
+   case V_038000_ARRAY_LINEAR_ALIGNED:
+   p_align = MAX2(64, rscreen->tiling_info.group_bytes / 
pixsize);
+   break;
+   case V_038000_ARRAY_LINEAR_GENERAL:
+   default:
+   p_align = 8;
+   }
+   }
+   else
+   {
+   switch(array_mode) {
+   case V_038000_ARRAY_2D_TILED_THIN1:
+   p_align = MAX2(rscreen->tiling_info.num_banks * 8,
+  ((rscreen->tiling_info.group_bytes / 8) 
/ pixsize) * rscreen->tiling_info.num_banks);
+   break;
+   case V_038000_ARRAY_1D_TILED_THIN1:
+   p_align = MAX2(8, (rscreen->tiling_info.group_bytes / 
(8 * pixsize)));
+   break;
+   case V_038000_ARRAY_LINEAR_ALIGNED:
+   p_align = MAX2(64, rscreen->tiling_info.group_bytes / 
pixsize);
+   break;
+   case V_038000_ARRAY_LINEAR_GENERAL:
+   default:
+   p_align = 1;
+   }
}
return p_align;
 }
@@ -115,16 +134,36 @@ static unsigned r600_get_height_alignment(struct 
pipe_screen *screen,
struct r600_screen* rscreen = (st

[Mesa-dev] [PATCH 2/2] r600g: Set tile_type correctly when textures are created

2011-12-01 Thread Simon Farnsworth
>From discussion on IRC, tile_type should be 1 for depth or stencil textures.

On Evergreen, it should also be 1 for some color buffers, but that's handled
in the evergreen_cb function when required.

Signed-off-by: Simon Farnsworth 
---

Doesn't fix my tiled scanout fun, but is needed for correctness.

 src/gallium/drivers/r600/r600_texture.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_texture.c 
b/src/gallium/drivers/r600/r600_texture.c
index 5ac39aa..6143133 100644
--- a/src/gallium/drivers/r600/r600_texture.c
+++ b/src/gallium/drivers/r600/r600_texture.c
@@ -391,6 +391,7 @@ r600_texture_create_object(struct pipe_screen *screen,
resource->b.b.b.screen = screen;
rtex->pitch_override = pitch_in_bytes_override;
rtex->real_format = base->format;
+   rtex->tile_type = util_format_is_depth_or_stencil(base->format) ? 1 : 0;
 
/* We must split depth and stencil into two separate buffers on 
Evergreen. */
if (!(base->flags & R600_RESOURCE_FLAG_TRANSFER) &&
-- 
1.7.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] r600g: Make tiling alignment restrictions match the DDX

2011-12-01 Thread Simon Farnsworth
It's unclear exactly what the alignment requirements are for Radeon tiled
surfaces, but they differ between the DDX and Mesa. Make Mesa match the DDX.

Signed-off-by: Simon Farnsworth 
---
This basically copies across the DDX versions of the restrictions. They
differ, and I'm not sure which one's right.

Someone from AMD should review this before it's applied.

 src/gallium/drivers/r600/r600_texture.c |   22 ++
 1 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_texture.c 
b/src/gallium/drivers/r600/r600_texture.c
index 2d041b0..5ac39aa 100644
--- a/src/gallium/drivers/r600/r600_texture.c
+++ b/src/gallium/drivers/r600/r600_texture.c
@@ -89,21 +89,21 @@ static unsigned r600_get_block_alignment(struct pipe_screen 
*screen,
int p_align;
 
switch(array_mode) {
-   case V_038000_ARRAY_1D_TILED_THIN1:
-   p_align = MAX2(8,
-  ((rscreen->tiling_info.group_bytes / 8 / 
pixsize)));
-   break;
case V_038000_ARRAY_2D_TILED_THIN1:
p_align = MAX2(rscreen->tiling_info.num_banks,
-  (((rscreen->tiling_info.group_bytes / 8 / 
pixsize)) *
-   rscreen->tiling_info.num_banks)) * 8;
+  (((rscreen->tiling_info.group_bytes / 8) / 
pixsize) * rscreen->tiling_info.num_banks)) * 8;
+   /* further restrictions for scanout */
+   p_align = MAX2(rscreen->tiling_info.num_banks * 8, p_align);
break;
-   case V_038000_ARRAY_LINEAR_ALIGNED:
-   p_align = MAX2(64, rscreen->tiling_info.group_bytes / pixsize);
+   case V_038000_ARRAY_1D_TILED_THIN1:
+   p_align = MAX2(8, (rscreen->tiling_info.group_bytes / (8 * 
pixsize)));
+   /* further restrictions for scanout */
+   p_align = MAX2((rscreen->tiling_info.group_bytes / pixsize), 
p_align);
break;
+   case V_038000_ARRAY_LINEAR_ALIGNED:
case V_038000_ARRAY_LINEAR_GENERAL:
default:
-   p_align = rscreen->tiling_info.group_bytes / pixsize;
+   p_align = MAX2(64, rscreen->tiling_info.group_bytes / pixsize);
break;
}
return p_align;
@@ -121,11 +121,9 @@ static unsigned r600_get_height_alignment(struct 
pipe_screen *screen,
break;
case V_038000_ARRAY_1D_TILED_THIN1:
case V_038000_ARRAY_LINEAR_ALIGNED:
-   h_align = 8;
-   break;
case V_038000_ARRAY_LINEAR_GENERAL:
default:
-   h_align = 1;
+   h_align = 8;
break;
}
return h_align;
-- 
1.7.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] gallium: Set renderbuffer's InternalFormat when rendering to texture

2011-09-29 Thread Simon Farnsworth
When an FBO is rendering to a texture (rather than a renderbuffer),
Gallium sets up an internal renderbuffer to handle the rendering, and
copies over enough texture state to make this work.

InternalFormat was missed out, causing glTexCopyImage to take a slow
path unnecessarily.

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=41263

Signed-off-by: Simon Farnsworth 
---
 src/mesa/state_tracker/st_cb_fbo.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_fbo.c 
b/src/mesa/state_tracker/st_cb_fbo.c
index 05139ec..4d32158 100644
--- a/src/mesa/state_tracker/st_cb_fbo.c
+++ b/src/mesa/state_tracker/st_cb_fbo.c
@@ -381,6 +381,7 @@ st_render_texture(struct gl_context *ctx,
rb->Width = texImage->Width2;
rb->Height = texImage->Height2;
rb->_BaseFormat = texImage->_BaseFormat;
+   rb->InternalFormat = texImage->InternalFormat;
/*printf("* render to texture level %d: %d x %d\n", att->TextureLevel, 
rb->Width, rb->Height);*/
 
/*printf("* pipe texture %d x %d\n", pt->width0, pt->height0);*/
-- 
1.7.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Regression in i965 Mesa driver caused by commit c7c64d97836c71eaf2ee3fc6d384877170b8c844

2010-06-22 Thread Simon Farnsworth
Hello Kristian,

Your commit:
commit c7c64d97836c71eaf2ee3fc6d384877170b8c844
Author: Kristian Høgsberg 
Date:   Tue Jun 1 14:33:43 2010 -0400

intel: Fallback to meta if we're asked to CopyTexImage2D from RGB to RGBA

The pixel transfer rules state that we must set alpha to 1.0 in this case
which we can't easily do with the blitter.  We can do to passes: one that
sets the alpha to 0xff and one that copies the RGB bits or we can just
use the 3D engine.  Neither approach seems worth it for this case.

has broken my use of Mesa to snapshot the display using an FBO. In my code, I 
do (where width and height are the size of the screen, snapshot_width and 
snapshot_height are the target size of my snapshot):
glBindFramebufferEXT( GL_FRAMEBUFFER_EXT, 
  framebuffer_object );
glBindRenderbufferEXT( GL_RENDERBUFFER_EXT,
   renderbuffer );
glRenderbufferStorageEXT( GL_RENDERBUFFER_EXT,
  GL_RGB8,
  snapshot_width,
  snapshot_height );
glFramebufferRenderbufferEXT( GL_FRAMEBUFFER_EXT,
  GL_COLOR_ATTACHMENT0_EXT,
  GL_RENDERBUFFER_EXT,
  renderbuffer );
glBindFramebufferEXT( GL_READ_FRAMEBUFFER_EXT,
  0 );
glBlitFramebufferEXT( 0, 0,
  width, height,
  0, snapshot_height,
  snapshot_width, 0,
  GL_COLOR_BUFFER_BIT,
  GL_LINEAR );
glBindFramebufferEXT( GL_FRAMEBUFFER_EXT,
  0 );

to get a snapshot of the current screen contents in my FBO. I then have a 
separate thread (using its own GL context) get the pixel contents of the FBO 
with glReadPixels() and turn it into a PNG file.

With current Mesa master (eb7ef433bbbeabda963e74adf0ef61c47883f292), the 
glBlitFramebufferEXT() call causes my fullscreen application to get stuck - the 
front buffer and back buffer at the time of the glBlitFramebufferEXT() call 
swap 
back and forth as I call glXSwapBuffers(), but no further changes to screen 
contents occur. The FBO appears to have the correct contents the first time I 
do this, but not on future attempts to snapshot the screen.

git bisect fingered this commit as the cause - reverting just this commit makes 
things work as they used to. Is there a better fix (either to my code or to 
Mesa) that I should work on; I can't glReadPixels() directly from the main 
framebuffer, as (not entirely surprisingly) this causes my rendering thread to 
stall for a short but visible period - and while I can stall the snapshotting 
thread indefinitely, I'm not permitted to stall the rendering thread.

I've tried using both GL_RGBA8 and GL_RGB8 renderbuffers, which hasn't helped - 
but it looks from the code like it's the format of the source buffer (the 
system framebuffer) that matters, not the format of the destination buffer.
-- 
Thanks in advance for any advice you can offer,

Simon Farnsworth
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev