Re: [Mesa-dev] [PATCH 1/2] gallium: add texture gather support to gallium

2014-02-08 Thread Christoph Bumiller
On 07.02.2014 23:25, Dave Airlie wrote:
 Doh, yes because GL has ARB_texture_gather then has stuff hidden away
 in ARB_gpu_shader5 I forgot to add the extra bits which I suppose we should 
 do.

 So I've reposted with the component selection in src1 now.
 Hmm seems a bit excessive to use an extra reg for that (gather4 but only
 in d3d11 form uses a src_sel on the sampler reg, but that might not work).
 I realize this is actually more messy than I thought, since the initial
 ARB_texture_gather had the ability to query if multi-channel formats are
 allowed, but had no way to select the channel (somewhat relying on
 ARB_texture_swizzle to do it, though of course you can't issue multiple
 gathers with the same texture to get different channels that way).
 But glsl 4.00 version could select the channel.
 Is the ARB_texture_gather version actually all that useful or could you
 merge the two caps? That is, if you have the ability to fetch from
 multi-channel textures, assume you can also select the channel. The sm4
 version of gather4 also has the single-channel format restriction - I
 guess though some hw really can do 4 channels without channel selection.
 Yeah I think I'll rethink this stuff, it looks like two caps, one for
 MAX_COMPONENTS for ARB_texture_gather4, and just one cap for
 TEXTURE_GATHER_SM5 support which would denote support for all the
 ARB_GPU_shader5 bits.

 Other than that, what about shadow samplers? Gather4 of course can't do
 it (because the d3d10-style opcodes have different opcodes for shadow
 comparisons), but the GL style opcodes are usually the same if shadow
 samplers or not are used. Maybe you don't want to handle that right now,
 just saying that if you'd want to use the same opcode you'd be missing a
 component in case of texture cube arrays... Since this can't be used for
 fixed function though I'd guess nothing would stop you from using a
 different opcode for shadow samplers.

 I've gotten shadow samplers to work with the current opcodes, though I
 still have to look at cube arrays, where we are running out of space to
 put everything.

 Also the GPU_shader5 spec has a few more oddities, so you have
 textureGatherOffset which can take a non-constant set of offset values
 to apply to all 4 texels, then you have textureGatherOffsets which
 only takes constants again, but 4 of them, one per texel. Looking at
 radeon hw it appears fglrx decomposes textureGatherOffsets into
 multiple gather instructions at the hw level but using the
 non-constant hw support to do this. So I'm not sure if the gallium
 interface should just support non-constant for all offsets and just
 restrict the GL.

Fwiw Fermi+ support 4 different non-constant offsets, since they're
passed in a register anyway.
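
To make the decomposition discussed above concrete, here is a minimal CPU-side sketch in C of a textureGatherOffsets-style fetch built from four single-offset gathers. Everything named toy_* is hypothetical and not part of gallium or any driver. The sketch assumes integer texel base coordinates, clamp-to-edge addressing, a single-channel texture, the result order (i0,j1), (i1,j1), (i1,j0), (i0,j0), and that each per-texel offset selects the (i0,j0) texel of its own footprint. The last two points are my reading of the GL spec and should be checked against it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-channel texture, row-major, for illustration only. */
struct toy_tex {
   int width, height;
   const uint32_t *texels;
};

struct toy_offset { int x, y; };

static int toy_clamp(int v, int max)
{
   return v < 0 ? 0 : (v > max ? max : v);
}

/* Fetch one texel with clamp-to-edge addressing. */
static uint32_t toy_fetch(const struct toy_tex *t, int x, int y)
{
   assert(t->width > 0 && t->height > 0);
   x = toy_clamp(x, t->width - 1);
   y = toy_clamp(y, t->height - 1);
   return t->texels[y * t->width + x];
}

/* One gather with a single offset applied to the whole 2x2 footprint.
 * out[] uses the order (i0,j1), (i1,j1), (i1,j0), (i0,j0) (assumed). */
static void toy_gather_offset(const struct toy_tex *t, int x, int y,
                              struct toy_offset off, uint32_t out[4])
{
   int i0 = x + off.x, j0 = y + off.y;

   out[0] = toy_fetch(t, i0,     j0 + 1);
   out[1] = toy_fetch(t, i0 + 1, j0 + 1);
   out[2] = toy_fetch(t, i0 + 1, j0);
   out[3] = toy_fetch(t, i0,     j0);
}

/* Four per-texel offsets, decomposed into four single-offset gathers.
 * Each gather contributes only its (i0,j0) texel, i.e. its out[3]. */
static void toy_gather_offsets(const struct toy_tex *t, int x, int y,
                               const struct toy_offset offs[4],
                               uint32_t out[4])
{
   for (int k = 0; k < 4; k++) {
      uint32_t tmp[4];

      toy_gather_offset(t, x, y, offs[k], tmp);
      out[k] = tmp[3];
   }
}
```

Hardware that takes four independent offsets in a register could do the second function in one instruction; the loop only shows what a driver would have to emit when the hardware accepts a single offset per gather.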

 I've reworked the state tracker code already,
  
 http://cgit.freedesktop.org/~airlied/mesa/commit/?h=r600g-texture-gather&id=444bc1c8118d51600a58af8a84088e94d0800b22

 but I suspect I've a bit further down the rabbit hole to go.

 Dave.
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev



[Mesa-dev] [PATCH] mesa/st: expose ARB_texture_rgb10_a2ui if R10G10B10A2_UINT is supported v2

2013-12-26 Thread Christoph Bumiller
---
 src/mesa/state_tracker/st_extensions.c | 4 +++-
 src/mesa/state_tracker/st_format.c | 6 +++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index 5e4a3b3..8c49e54 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -419,7 +419,9 @@ void st_init_extensions(struct st_context *st)
   PIPE_FORMAT_R16G16B16A16_FLOAT } },
 
   { { o(ARB_texture_rgb10_a2ui) },
-{ PIPE_FORMAT_B10G10R10A2_UINT } },
+{ PIPE_FORMAT_R10G10B10A2_UINT,
+  PIPE_FORMAT_B10G10R10A2_UINT },
+ GL_TRUE }, /* at least one format must be supported */
 
   { { o(EXT_framebuffer_sRGB) },
 { PIPE_FORMAT_A8B8G8R8_SRGB,
diff --git a/src/mesa/state_tracker/st_format.c 
b/src/mesa/state_tracker/st_format.c
index 6acf983..320d3d4 100644
--- a/src/mesa/state_tracker/st_format.c
+++ b/src/mesa/state_tracker/st_format.c
@@ -359,6 +359,8 @@ st_mesa_format_to_pipe_format(gl_format mesaFormat)
   return PIPE_FORMAT_R11G11B10_FLOAT;
case MESA_FORMAT_ARGB2101010_UINT:
   return PIPE_FORMAT_B10G10R10A2_UINT;
+   case MESA_FORMAT_ABGR2101010_UINT:
+  return PIPE_FORMAT_R10G10B10A2_UINT;
 
case MESA_FORMAT_XRGB_UNORM:
   return PIPE_FORMAT_B4G4R4X4_UNORM;
@@ -712,6 +714,8 @@ st_pipe_format_to_mesa_format(enum pipe_format format)
 
case PIPE_FORMAT_B10G10R10A2_UINT:
   return MESA_FORMAT_ARGB2101010_UINT;
+   case PIPE_FORMAT_R10G10B10A2_UINT:
+  return MESA_FORMAT_ABGR2101010_UINT;
 
case PIPE_FORMAT_B4G4R4X4_UNORM:
   return MESA_FORMAT_XRGB_UNORM;
@@ -1483,7 +1487,7 @@ static const struct format_mapping format_map[] = {
},
{
   { GL_RGB10_A2UI, 0 },
-  { PIPE_FORMAT_B10G10R10A2_UINT, 0 }
+  { PIPE_FORMAT_R10G10B10A2_UINT, PIPE_FORMAT_B10G10R10A2_UINT, 0 }
},
 };
 
-- 
1.8.1.5



[Mesa-dev] [PATCH 1/3] st/mesa: fix GS varyings for PIPE_CAP_TGSI_TEXCOORD

2013-12-25 Thread Christoph Bumiller
---
 src/mesa/state_tracker/st_program.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/src/mesa/state_tracker/st_program.c 
b/src/mesa/state_tracker/st_program.c
index f72122b..f13132e 100644
--- a/src/mesa/state_tracker/st_program.c
+++ b/src/mesa/state_tracker/st_program.c
@@ -944,17 +944,16 @@ st_translate_geometry_program(struct st_context *st,
  case VARYING_SLOT_TEX5:
  case VARYING_SLOT_TEX6:
  case VARYING_SLOT_TEX7:
-stgp->input_semantic_name[slot] = TGSI_SEMANTIC_GENERIC;
+stgp->input_semantic_name[slot] = st->needs_texcoord_semantic ?
+   TGSI_SEMANTIC_TEXCOORD : TGSI_SEMANTIC_GENERIC;
 stgp->input_semantic_index[slot] = (attr - VARYING_SLOT_TEX0);
 break;
  case VARYING_SLOT_VAR0:
  default:
 assert(attr >= VARYING_SLOT_VAR0 && attr < VARYING_SLOT_MAX);
 stgp->input_semantic_name[slot] = TGSI_SEMANTIC_GENERIC;
-stgp->input_semantic_index[slot] = (VARYING_SLOT_VAR0 -
-VARYING_SLOT_TEX0 +
-attr -
-VARYING_SLOT_VAR0);
+stgp->input_semantic_index[slot] = st->needs_texcoord_semantic ?
+   (attr - VARYING_SLOT_VAR0) : (attr - VARYING_SLOT_TEX0);
  break;
  }
   }
@@ -1036,7 +1035,8 @@ st_translate_geometry_program(struct st_context *st,
  case VARYING_SLOT_TEX5:
  case VARYING_SLOT_TEX6:
  case VARYING_SLOT_TEX7:
-gs_output_semantic_name[slot] = TGSI_SEMANTIC_GENERIC;
+gs_output_semantic_name[slot] = st->needs_texcoord_semantic ?
+   TGSI_SEMANTIC_TEXCOORD : TGSI_SEMANTIC_GENERIC;
 gs_output_semantic_index[slot] = (attr - VARYING_SLOT_TEX0);
 break;
  case VARYING_SLOT_VAR0:
@@ -1044,10 +1044,9 @@ st_translate_geometry_program(struct st_context *st,
 assert(slot < Elements(gs_output_semantic_name));
 assert(attr >= VARYING_SLOT_VAR0);
 gs_output_semantic_name[slot] = TGSI_SEMANTIC_GENERIC;
-gs_output_semantic_index[slot] = (VARYING_SLOT_VAR0 -
-  VARYING_SLOT_TEX0 +
-  attr - 
-  VARYING_SLOT_VAR0);
+gs_output_semantic_index[slot] = st->needs_texcoord_semantic ?
+   (attr - VARYING_SLOT_VAR0) : (attr - VARYING_SLOT_TEX0);
+ break;
  }
   }
}
-- 
1.8.1.5



[Mesa-dev] [PATCH 3/3] mesa/st: expose ARB_texture_rgb10_a2ui if R10G10B10A2_UINT is supported

2013-12-25 Thread Christoph Bumiller
---
 src/mesa/state_tracker/st_extensions.c | 4 +++-
 src/mesa/state_tracker/st_format.c | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index 5e4a3b3..8c49e54 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -419,7 +419,9 @@ void st_init_extensions(struct st_context *st)
   PIPE_FORMAT_R16G16B16A16_FLOAT } },
 
   { { o(ARB_texture_rgb10_a2ui) },
-{ PIPE_FORMAT_B10G10R10A2_UINT } },
+{ PIPE_FORMAT_R10G10B10A2_UINT,
+  PIPE_FORMAT_B10G10R10A2_UINT },
+ GL_TRUE }, /* at least one format must be supported */
 
   { { o(EXT_framebuffer_sRGB) },
 { PIPE_FORMAT_A8B8G8R8_SRGB,
diff --git a/src/mesa/state_tracker/st_format.c 
b/src/mesa/state_tracker/st_format.c
index 6acf983..2bb07e7 100644
--- a/src/mesa/state_tracker/st_format.c
+++ b/src/mesa/state_tracker/st_format.c
@@ -813,7 +813,7 @@ static const struct format_mapping format_map[] = {
},
{
   { GL_RGB10_A2, 0 },
-  { PIPE_FORMAT_B10G10R10A2_UNORM, DEFAULT_RGBA_FORMATS }
+  { PIPE_FORMAT_R10G10B10A2_UNORM, PIPE_FORMAT_B10G10R10A2_UNORM, DEFAULT_RGBA_FORMATS }
},
{
   { 4, GL_RGBA, GL_RGBA8, 0 },
-- 
1.8.1.5



[Mesa-dev] [PATCH 2/3] nv50: add more RGB10A2 formats

2013-12-25 Thread Christoph Bumiller
---
 src/gallium/drivers/nouveau/nv50/nv50_formats.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_formats.c 
b/src/gallium/drivers/nouveau/nv50/nv50_formats.c
index 0a7e812..b301890 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_formats.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_formats.c
@@ -202,6 +202,8 @@ const struct nv50_format 
nv50_format_table[PIPE_FORMAT_COUNT] =
TBV, 1),
C4A(R10G10B10A2_SNORM, NONE, C0, C1, C2, C3, SNORM, 10_10_10_2, TV, 0),
C4A(B10G10R10A2_SNORM, NONE, C2, C1, C0, C3, SNORM, 10_10_10_2, TV, 1),
+   C4A(R10G10B10A2_UINT, RGB10_A2_UINT, C0, C1, C2, C3, UINT, 10_10_10_2, TRV, 0),
+   C4A(B10G10R10A2_UINT, RGB10_A2_UINT, C2, C1, C0, C3, UINT, 10_10_10_2, TV, 0),
 
F3B(R11G11B10_FLOAT, R11G11B10_FLOAT, C0, C1, C2, xx, FLOAT, 11_11_10, IB),
 
@@ -394,6 +396,11 @@ const struct nv50_format 
nv50_format_table[PIPE_FORMAT_COUNT] =
F1A(R16_SSCALED, NONE, C0, xx, xx, xx, SSCALED, 16, V),
F1A(R16_USCALED, NONE, C0, xx, xx, xx, USCALED, 16, V),
 
+   C4A(R10G10B10A2_USCALED, NONE, C0, C1, C2, C3, USCALED, 10_10_10_2, V, 0),
+   C4A(R10G10B10A2_SSCALED, NONE, C0, C1, C2, C3, SSCALED, 10_10_10_2, V, 0),
+   C4A(B10G10R10A2_USCALED, NONE, C0, C1, C2, C3, USCALED, 10_10_10_2, V, 1),
+   C4A(B10G10R10A2_SSCALED, NONE, C0, C1, C2, C3, SSCALED, 10_10_10_2, V, 1),
+
C4A(R8G8B8A8_SSCALED, NONE, C0, C1, C2, C3, SSCALED, 8_8_8_8, V, 0),
C4A(R8G8B8A8_USCALED, NONE, C0, C1, C2, C3, USCALED, 8_8_8_8, V, 0),
F3A(R8G8B8_UNORM, NONE, C0, C1, C2, xx, UNORM, 8_8_8, V),
-- 
1.8.1.5



Re: [Mesa-dev] [PATCH] nv50: implement multisample textures

2013-10-25 Thread Christoph Bumiller
On 25.10.2013 20:35, Emil Velikov wrote:
 On 21/10/13 23:23, Bryan Cain wrote:
 This is a port of 4da54c91d24da (nvc0: implement multisample textures) to
 nv50.

 When coupled with the patch to only report 16 texture samplers (to fix
 crashes), all of the Piglit tests in spec/arb_texture_multisample pass.

 Hello Bryan,

 Big thanks for your work. As promised here is a quick piglit summary on
 my nv96

 pass/fail/crash
 69/32/27

 * dmesg does not spit anything nouveau related during the tests
 * any geometry shader related tests were skipped
 (piglit: info: Failed to create GL 3.2 core context)
 * all the crashes are due to the following assert
 codegen/nv50_ir_emit_nv50.cpp:1393:emitTEX: Assertion `argc <= 4' failed.

I'm not sure how you'd get > 4 arguments there (x y layer sample ?).
There are no mip maps for multisample textures.

But either way you're probably going to have to do things by hand:
E.g. MS8 textures contain contiguous 4x2 rectangles of samples for each
pixel, so you multiply x by 4 and y by 2 to arrive at the sub-rectangle
and then add the correct offsets for the sample id as seen in
get_sample_position (store the info in a constant buffer, that has to be
updated when texture changes).

You might want to use a lookup table like in nve4 compute (look for MS
sample coordinate offsets) to map sample id to coordinate offset, that
one works for any sample count as long as you don't use the ALT modes
(nve4 doesn't need to for textures, but for images/surfaces/UAVs/RATs
where the whole VM address calculation is done by hand).
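
As a rough illustration of the "by hand" coordinate calculation for MS8 described above, here is a small sketch in C. The function and table names are made up for this example, and the per-sample offsets in the table are an assumption; the real order has to match what get_sample_position reports for the hardware layout.

```c
#include <assert.h>
#include <stdint.h>

struct ms_texel { uint32_t x, y; };

/* Illustrative per-sample (x, y) offsets inside the 4x2 block of one MS8
 * pixel. This particular order is an assumption made for the sketch. */
static const uint8_t ms8_offset[8][2] = {
   { 0, 0 }, { 1, 0 }, { 0, 1 }, { 1, 1 },
   { 2, 0 }, { 3, 0 }, { 2, 1 }, { 3, 1 },
};

/* Map a pixel coordinate plus sample id to the coordinate of that sample
 * in the underlying (4x wider, 2x taller) storage. */
static struct ms_texel ms8_sample_coord(uint32_t x, uint32_t y,
                                        uint32_t sample)
{
   struct ms_texel t;

   assert(sample < 8);
   t.x = x * 4 + ms8_offset[sample][0];
   t.y = y * 2 + ms8_offset[sample][1];
   return t;
}
```

In a shader the table would live in a constant buffer and the multiply/add would be emitted before the fetch; other sample counts would only change the scale factors and the table.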

 PASS    arb_texture_multisample-*
 PASS    fb-completeness/*
 FAIL    sample-position/*
 FAIL    texelFetch fs sampler2DMS 4*
 CRASH   texelFetch fs sampler2DMSArray 4*
 FAIL    texelFetch/*-*s-isampler2DMS
 CRASH   texelFetch/*-*s-isampler2DMSArray
 PASS    textureSize/*


 Hope you find this useful :)
 No real world apps that use multisample textures were tested, yet.

 Cheers
 Emil


Re: [Mesa-dev] [PATCH] nv50: implement multisample textures

2013-10-25 Thread Christoph Bumiller
On 25.10.2013 23:51, Bryan Cain wrote:
 On 10/25/2013 04:11 PM, Christoph Bumiller wrote:
 On 25.10.2013 20:35, Emil Velikov wrote:
 On 21/10/13 23:23, Bryan Cain wrote:
 This is a port of 4da54c91d24da (nvc0: implement multisample textures) to
 nv50.

 When coupled with the patch to only report 16 texture samplers (to fix
 crashes), all of the Piglit tests in spec/arb_texture_multisample pass.

 Hello Bryan,

 Big thanks for your work. As promised here is a quick piglit summary on
 my nv96

 pass/fail/crash
 69/32/27

 * dmesg does not spit anything nouveau related during the tests
 * any geometry shader related tests were skipped
 (piglit: info: Failed to create GL 3.2 core context)
 * all the crashes are due to the following assert
 codegen/nv50_ir_emit_nv50.cpp:1393:emitTEX: Assertion `argc <= 4' failed.
 I'm not sure how you'd get > 4 arguments there (x y layer sample ?).
 There are no mip maps for multisample textures.

 But either way you're probably going to have to do things by hand:
 E.g. MS8 textures contain contiguous 4x2 rectangles of samples for each
 pixel, so you multiply x by 4 and y by 2 to arrive at the sub-rectangle
 and then add the correct offsets for the sample id as seen in
 get_sample_position (store the info in a constant buffer, that has to be
 updated when texture changes).

 You might want to use a lookup table like in nve4 compute (look for MS
 sample coordinate offsets) to map sample id to coordinate offset, that
 one works for any sample count as long as you don't use the ALT modes
 (nve4 doesn't need to for textures, but for images/surfaces/UAVs/RATs
 where the whole VM address calculation is done by hand).
 You're probably right.  I don't know why MSAA appears to work for me,
 but there's probably something wrong with the output that I haven't
 noticed.  I'll work on implementing it properly this weekend.

MSAA itself (rendering and resolving) has been working before, the only
thing that ARB_texture_multisample adds is texelFetch from MS resources.




Re: [Mesa-dev] [PATCH] nv50: report only 16 texure_samplers

2013-10-12 Thread Christoph Bumiller
On 12.10.2013 02:47, Emil Velikov wrote:
 On 12/10/13 01:25, Roland Scheidegger wrote:
 Am 12.10.2013 02:02, schrieb Brian Paul:
 On 10/11/2013 10:44 AM, Emil Velikov wrote:
 Current mesa code(cso and drivers) expect and use only up-to 16
 texture samplers.

 Verbatim copy from the nvc0 driver.

 Cc 9.1 mesa-sta...@lists.freedesktop.org
 Cc 9.2 mesa-sta...@lists.freedesktop.org
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70212
 Reported-by: Aaron Watry awa...@gmail.com
 Signed-off-by: Emil Velikov emil.l.veli...@gmail.com
 ---
   src/gallium/drivers/nouveau/nv50/nv50_screen.c | 4 ++++
   1 file changed, 4 insertions(+)

 diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
 b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
 index f454ec7..3f81cc4 100644
 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
 +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
 @@ -249,7 +249,11 @@ nv50_screen_get_shader_param(struct pipe_screen
 *pscreen, unsigned shader,
  case PIPE_SHADER_CAP_INTEGERS:
 return 1;
  case PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLERS:
 +  return 16; /* would be 32 in linked (OpenGL-style) mode */
 +  /*
 +   case PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLER_VIEWS:
 return 32;
 +  */
  default:
    NOUVEAU_ERR("unknown PIPE_SHADER_CAP %d\n", param);
 return 0;

 Since PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLER_VIEWS doesn't really exist,
 I'd rather see it removed entirely.
 Actually it doesn't seem to exist at all?

 Indeed and afaics it never did :)

 As the commit says it's a verbatim copy of nvc0, which also started with
 32 TEXTURE_SAMPLERS.

 If you wanted to future-proof the code you could do
return MIN2(32, PIPE_MAX_SAMPLERS);

 in case we bump PIPE_MAX_SAMPLERS to 32 one of these days.


 In any case, Reviewed-by: Brian Paul bri...@vmware.com


 Well I think there is quite some hw out there which can only do 16
 samplers but more sampler views as this is what d3d10/11 wants (16
 samplers max per stage, but 128 sampler views). So making it queryable
 may have some benefits, but OpenGL can't really make any use of it in
 any case.

 I'm not entirely sure what is the case in here, as I'm a bit short of
 knowledge about the hardware, especially with the lack of documentation.

That comment's there as a reminder that gallium should have that cap.

On nv50 we have 2 big tables in VRAM: texture view (sampler view in
gallium, shader resource view in d3d) descriptors (TIC) and sampler
descriptors (TSC).
These are mapped to texture and sampler units via a binding table
(that happens as a result of calling glBindTexture).
In the shader you have to select the units; you have 4 bits for the
sampler and 5 bits for the texture unit index.

You can set the hardware to linked mode (LINKED_TSC) so that the sampler
unit index automatically corresponds to the texture view unit index.
Then you can't select them independently, but you can access 32 bindings
of them tied together (which is how OpenGL works).
But since gallium requires them to be selectable independently, we can't
use that.

With Kepler they removed the binding table and you can select the
descriptors directly via a 32-bit register, so you can access (1 << 20)
texture views and (1 << 12) samplers; there's no problem there.
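
To illustrate the Kepler-style scheme, here is a minimal sketch in C of packing a texture view index and a sampler index into one 32-bit value. The names are hypothetical, and the bit placement (texture view in the low 20 bits, sampler in the high 12) is an assumption for the example; the text above only states the two field widths.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TIC_BITS 20u   /* texture view descriptors: 1 << 20 entries */
#define TOY_TSC_BITS 12u   /* sampler descriptors:      1 << 12 entries */

/* Pack both indices into a single 32-bit handle (assumed layout). */
static uint32_t toy_tex_handle(uint32_t tic, uint32_t tsc)
{
   assert(tic < (1u << TOY_TIC_BITS));
   assert(tsc < (1u << TOY_TSC_BITS));
   return (tsc << TOY_TIC_BITS) | tic;
}

/* Extract the texture view index again. */
static uint32_t toy_handle_tic(uint32_t handle)
{
   return handle & ((1u << TOY_TIC_BITS) - 1u);
}

/* Extract the sampler index again. */
static uint32_t toy_handle_tsc(uint32_t handle)
{
   return handle >> TOY_TIC_BITS;
}
```

The point is only that both indices stay independently selectable, which is what gallium needs, whereas the nv50 linked mode forces the two indices to be equal.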

 FWIW I've intentionally added/copied the SAMPLER_VIEWS, as I feel it's
 beneficial in the long term. That is after going through the build
 system(s), my plan is to jump into the nv50 driver + vdpau st and some
 of the missing extensions and other things along the way :)

 Cheers
 Emil

 Roland


Re: [Mesa-dev] regression on nvc0 since floating point compare instructions

2013-09-12 Thread Christoph Bumiller
On 12.09.2013 16:14, Roland Scheidegger wrote:
 Am 12.09.2013 03:40, schrieb Dave Airlie:
 Maybe the type isn't set correctly? Looks to me like these instructions
 end up in mkCmp, which will set both src and dst type but ignore src
 type and set both according to the same type (which was the dst type).

 Roland
 Okay I've attached my next attempt at fixing it, fixes the two testcases I 
 had.

 No idea what setting type there really does but I guess that looks right
 :-). Though I'm wondering if U32 vs. S32 would make a difference for dst
 type since some of the (unsigned) comparisons still would use U32.

It doesn't make a difference, making it signed is unnecessary.
If it helped before that was just because it made negative floats be
interpreted as negative ints (instead of large ints) which has a
slightly better chance of succeeding.
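
A small sketch in C of what happens when float operands are compared as if they were integers, which is the effect described above. The helper names are made up for this example; it assumes 32-bit IEEE 754 floats and uses memcpy to reinterpret the bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as an unsigned 32-bit integer. */
static uint32_t float_as_u32(float f)
{
   uint32_t u;

   assert(sizeof f == sizeof u);
   memcpy(&u, &f, sizeof u);
   return u;
}

/* Reinterpret the bits of a float as a signed 32-bit integer. */
static int32_t float_as_s32(float f)
{
   int32_t i;

   assert(sizeof f == sizeof i);
   memcpy(&i, &f, sizeof i);
   return i;
}

/* "a < b" evaluated on the raw bits with a signed integer compare. */
static int bits_less_signed(float a, float b)
{
   return float_as_s32(a) < float_as_s32(b);
}

/* "a < b" evaluated on the raw bits with an unsigned integer compare. */
static int bits_less_unsigned(float a, float b)
{
   return float_as_u32(a) < float_as_u32(b);
}
```

With these helpers, -1.0f has the bit pattern 0xBF800000, so bits_less_signed(-1.0f, 1.0f) is true while bits_less_unsigned(-1.0f, 1.0f) is false. The signed variant is still wrong in general: bits_less_signed(-2.0f, -1.0f) is false, because two negative floats order the other way round as integers.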

 Roland



Re: [Mesa-dev] regression on nvc0 since floating point compare instructions

2013-09-10 Thread Christoph Bumiller
On 10.09.2013 06:55, Dave Airlie wrote:
 On Tue, Sep 10, 2013 at 12:04 PM, Dave Airlie airl...@gmail.com wrote:
 On Tue, Sep 10, 2013 at 11:59 AM, Dave Airlie airl...@gmail.com wrote:
 Hey,

 so virgl stopped working on nouveau the other day and I bisected it to
 the enable of the floating point compare instructions in the state
 tracker,

 I've attached a shader runner file that makes it hang,

 As usual 5 secs after pressing send I had an insight,

 the attached patch seems to fix it here for me.
 
 Okay, it's a bit weirder than that; I found another bunch of regressions,
 

I just noticed that the handler for the TGSI SET instructions assumes
source type == dest type, that should explain it.

My ingenious plan of not having an NV card [plugged in] so that someone
would come along to fill the vacuum of nouveau gallium devs doesn't seem
to work :/

 Here's another shader test that regressed from 9.2 to master on nvc0.
 
 Dave.
 
 
 


Re: [Mesa-dev] [PATCH] r600g/llvm: don't export more colors than the number of CBs

2013-08-24 Thread Christoph Bumiller
On 24.08.2013 11:44, Christian König wrote:
 Am 24.08.2013 03:30, schrieb Vadim Girlin:
 Currently llvm backend always exports at least one color in pixel
 shader even if no color buffers are enabled. With depth/stencil exports
 this can result in the following code:

 EXPORT PIXEL 0 R0.xyzw  VPM
 EXPORT PIXEL 61R1.x___  VPM
 EXPORT_DONEPIXEL 61R0._x__  VPM  EOP

 AFAIU with zero color buffers no memory is reserved for colors in the
 export
 ring and all exports in this example actually write to the same
 location.
 The code above still works fine in this particular case, because correct
 values are written last, but reordering can break it (especially with SB
 which tends to reorder the exports).

 Signed-off-by: Vadim Girlin vadimgir...@gmail.com

 I briefly remember that we needed at least one color export otherwise
 the GPU might hang, but I'm not 100% sure of that.


If there are no color buffers bound but the original shader writes color
0, you still have to export it to keep the alpha test working ...

 Marek and Alex should probably also take a look on this before we
 commit it.

 Christian.

 ---

 This fixes regressions with LLVM+SB, so I consider it as a prerequisite
 for enabling SB by default. Also it fixes some issues with LLVM
 backend alone.
 Tested on evergreen only (I don't have other hw), needs testing on
 pre-evergreen GPUs.

   src/gallium/drivers/r600/r600_llvm.c   | 2 +-
   src/gallium/drivers/r600/r600_shader.c | 2 +-
   2 files changed, 2 insertions(+), 2 deletions(-)

 diff --git a/src/gallium/drivers/r600/r600_llvm.c
 b/src/gallium/drivers/r600/r600_llvm.c
 index 03a68e4..d2f4aff 100644
 --- a/src/gallium/drivers/r600/r600_llvm.c
 +++ b/src/gallium/drivers/r600/r600_llvm.c
 @@ -333,8 +333,8 @@ static void llvm_emit_epilogue(struct
 lp_build_tgsi_context * bld_base)
   } else if (ctx->type == TGSI_PROCESSOR_FRAGMENT) {
   switch (ctx->r600_outputs[i].name) {
   case TGSI_SEMANTIC_COLOR:
 -has_color = true;
   if ( color_count < ctx->color_buffer_count) {
 +has_color = true;
   LLVMValueRef args[3];
   args[0] = output;
   if (ctx->fs_color_all) {
 diff --git a/src/gallium/drivers/r600/r600_shader.c
 b/src/gallium/drivers/r600/r600_shader.c
 index fb766c4..85f8469 100644
 --- a/src/gallium/drivers/r600/r600_shader.c
 +++ b/src/gallium/drivers/r600/r600_shader.c
 @@ -1130,7 +1130,7 @@ static int r600_shader_from_tgsi(struct
 r600_screen *rscreen,
   radeon_llvm_ctx.face_gpr = ctx.face_gpr;
   radeon_llvm_ctx.r600_inputs = ctx.shader->input;
   radeon_llvm_ctx.r600_outputs = ctx.shader->output;
 -radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs , 1);
 +radeon_llvm_ctx.color_buffer_count = key.nr_cbufs;
   radeon_llvm_ctx.chip_class = ctx.bc->chip_class;
   radeon_llvm_ctx.fs_color_all = shader->fs_write_all &&
 (rscreen->chip_class >= EVERGREEN);
   radeon_llvm_ctx.stream_outputs = so;



Re: [Mesa-dev] [PATCH 2/4] nv50: implement new float comparison instructions

2013-08-13 Thread Christoph Bumiller
On 13.08.2013 19:04, srol...@vmware.com wrote:
 From: Roland Scheidegger srol...@vmware.com

 untested.

Looks like it should work though, thanks.
nv50 only supported u32 result all along and on nvc0 both cases are
already handled by the rest of the code, too.

 ---
  .../drivers/nv50/codegen/nv50_ir_from_tgsi.cpp |   17 +++++++++++++++++
  1 file changed, 17 insertions(+)

 diff --git a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp 
 b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
 index 56eccac..a2ad9f4 100644
 --- a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
 +++ b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
 @@ -440,6 +440,11 @@ nv50_ir::DataType Instruction::inferDstType() const
 switch (getOpcode()) {
 case TGSI_OPCODE_F2U: return nv50_ir::TYPE_U32;
 case TGSI_OPCODE_F2I: return nv50_ir::TYPE_S32;
 +   case TGSI_OPCODE_FSEQ:
 +   case TGSI_OPCODE_FSGE:
 +   case TGSI_OPCODE_FSLT:
 +   case TGSI_OPCODE_FSNE:
 +  return nv50_ir::TYPE_U32;
 case TGSI_OPCODE_I2F:
 case TGSI_OPCODE_U2F:
return nv50_ir::TYPE_F32;
 @@ -456,19 +461,23 @@ nv50_ir::CondCode Instruction::getSetCond() const
 case TGSI_OPCODE_SLT:
 case TGSI_OPCODE_ISLT:
 case TGSI_OPCODE_USLT:
 +   case TGSI_OPCODE_FSLT:
return CC_LT;
 case TGSI_OPCODE_SLE:
return CC_LE;
 case TGSI_OPCODE_SGE:
 case TGSI_OPCODE_ISGE:
 case TGSI_OPCODE_USGE:
 +   case TGSI_OPCODE_FSGE:
return CC_GE;
 case TGSI_OPCODE_SGT:
return CC_GT;
 case TGSI_OPCODE_SEQ:
 case TGSI_OPCODE_USEQ:
 +   case TGSI_OPCODE_FSEQ:
return CC_EQ;
 case TGSI_OPCODE_SNE:
 +   case TGSI_OPCODE_FSNE:
return CC_NEU;
 case TGSI_OPCODE_USNE:
return CC_NE;
 @@ -556,6 +565,10 @@ static nv50_ir::operation translateOpcode(uint opcode)
 NV50_IR_OPCODE_CASE(KILL_IF, DISCARD);
  
 NV50_IR_OPCODE_CASE(F2I, CVT);
 +   NV50_IR_OPCODE_CASE(FSEQ, SET);
 +   NV50_IR_OPCODE_CASE(FSGE, SET);
 +   NV50_IR_OPCODE_CASE(FSLT, SET);
 +   NV50_IR_OPCODE_CASE(FSNE, SET);
 NV50_IR_OPCODE_CASE(IDIV, DIV);
 NV50_IR_OPCODE_CASE(IMAX, MAX);
 NV50_IR_OPCODE_CASE(IMIN, MIN);
 @@ -2354,6 +2367,10 @@ Converter::handleInstruction(const struct 
 tgsi_full_instruction *insn)
 case TGSI_OPCODE_SLE:
 case TGSI_OPCODE_SNE:
 case TGSI_OPCODE_STR:
 +   case TGSI_OPCODE_FSEQ:
 +   case TGSI_OPCODE_FSGE:
 +   case TGSI_OPCODE_FSLT:
 +   case TGSI_OPCODE_FSNE:
 case TGSI_OPCODE_ISGE:
 case TGSI_OPCODE_ISLT:
 case TGSI_OPCODE_USEQ:



Re: [Mesa-dev] [RFC]: gallium: add new float comparison opcodes returning integer booleans

2013-08-09 Thread Christoph Bumiller
On 09.08.2013 20:42, Roland Scheidegger wrote:
 This is a proposal for new comparison instructions, as the old ones
 don't really fit modern (graphic or opencl I guess for that matter)
 languages well.
 If you've got objections, think the naming is crazy or whatnot I'm open
 for suggestions :-). I would think this is not just a much better fit
 for d3d10/glsl but for hw as well.

I think current hardware can do both, and as for the names, I'm fine
with the prefixed ones being the modern opcodes (prefix referring to
the source type in both cases) and the ones that are named exactly like
the legacy opcodes behaving like the legacy ones.

Otoh newcomers might get confused and think the F prefix means that
they return a float. We had a similar issue with legacy-KIL and
KILP-condition-is-predicate-if-any (and I just need to say again I'd
have preferred to keep the name KIL and rename KILP to DISCARD). But
seriously, the opcodes are documented, so it should be no trouble to
figure out what they do (ok, in practice that doesn't always work, since
we sometimes like to read what we expect instead of what's actually
written).
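
To spell out the difference between the two opcode families, here is a per-channel sketch in C. The function names are made up for this example and are not gallium code; the semantics follow the proposal (legacy SLT yields the floats 1.0/0.0, FSLT yields the integers ~0/0 from float sources).

```c
#include <assert.h>
#include <stdint.h>

/* Legacy SLT: float sources, float result (1.0 or 0.0). */
static float legacy_slt(float a, float b)
{
   return a < b ? 1.0f : 0.0f;
}

/* Proposed FSLT: float sources, integer boolean result (~0 or 0).
 * The C '<' is an ordered compare, so a NaN operand gives 0. */
static uint32_t fslt(float a, float b)
{
   return a < b ? UINT32_MAX : 0u;
}

/* An integer boolean can be used directly as a bit mask, with no
 * float-to-int conversion in between. */
static uint32_t select_bits(uint32_t mask, uint32_t x, uint32_t y)
{
   assert(mask == 0u || mask == UINT32_MAX);
   return (x & mask) | (y & ~mask);
}
```

So the F prefix names the source type, and the result of fslt() feeds straight into integer logic such as select_bits(), whereas the result of legacy_slt() would first have to be converted back to a boolean.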

 Roland

 Am 09.08.2013 20:40, schrieb srol...@vmware.com:
 From: Roland Scheidegger srol...@vmware.com

 The old float comparison opcodes always return floats 0.0 and 1.0 (clarified
 in docs these were really floats, was always the case) for legacy graphics.
 But everybody else (opengl,opencl,d3d10) just has to work around their
 return results (converting the returned float back to int/boolean).
 ---
  src/gallium/docs/source/tgsi.rst |   84 
 ++
  1 file changed, 68 insertions(+), 16 deletions(-)

 diff --git a/src/gallium/docs/source/tgsi.rst 
 b/src/gallium/docs/source/tgsi.rst
 index 949ad89..b7c40cf 100644
 --- a/src/gallium/docs/source/tgsi.rst
 +++ b/src/gallium/docs/source/tgsi.rst
 @@ -512,13 +512,13 @@ This instruction replicates its result.
  
  .. math::
  
 -  dst.x = (src0.x == src1.x) ? 1 : 0
 +  dst.x = (src0.x == src1.x) ? 1.0F : 0.0F
  
 -  dst.y = (src0.y == src1.y) ? 1 : 0
 +  dst.y = (src0.y == src1.y) ? 1.0F : 0.0F
  
 -  dst.z = (src0.z == src1.z) ? 1 : 0
 +  dst.z = (src0.z == src1.z) ? 1.0F : 0.0F
  
 -  dst.w = (src0.w == src1.w) ? 1 : 0
 +  dst.w = (src0.w == src1.w) ? 1.0F : 0.0F
  
  
  .. opcode:: SFL - Set On False
 @@ -538,13 +538,13 @@ This instruction replicates its result.
  
  .. math::
  
 -  dst.x = (src0.x > src1.x) ? 1 : 0
 +  dst.x = (src0.x > src1.x) ? 1.0F : 0.0F
  
 -  dst.y = (src0.y > src1.y) ? 1 : 0
 +  dst.y = (src0.y > src1.y) ? 1.0F : 0.0F
  
 -  dst.z = (src0.z > src1.z) ? 1 : 0
 +  dst.z = (src0.z > src1.z) ? 1.0F : 0.0F
  
 -  dst.w = (src0.w > src1.w) ? 1 : 0
 +  dst.w = (src0.w > src1.w) ? 1.0F : 0.0F
  
  
  .. opcode:: SIN - Sine
 @@ -560,26 +560,26 @@ This instruction replicates its result.
  
  .. math::
  
 -  dst.x = (src0.x <= src1.x) ? 1 : 0
 +  dst.x = (src0.x <= src1.x) ? 1.0F : 0.0F
  
 -  dst.y = (src0.y <= src1.y) ? 1 : 0
 +  dst.y = (src0.y <= src1.y) ? 1.0F : 0.0F
  
 -  dst.z = (src0.z <= src1.z) ? 1 : 0
 +  dst.z = (src0.z <= src1.z) ? 1.0F : 0.0F
  
 -  dst.w = (src0.w <= src1.w) ? 1 : 0
 +  dst.w = (src0.w <= src1.w) ? 1.0F : 0.0F
  
  
  .. opcode:: SNE - Set On Not Equal
  
  .. math::
  
 -  dst.x = (src0.x != src1.x) ? 1 : 0
 +  dst.x = (src0.x != src1.x) ? 1.0F : 0.0F
  
 -  dst.y = (src0.y != src1.y) ? 1 : 0
 +  dst.y = (src0.y != src1.y) ? 1.0F : 0.0F
  
 -  dst.z = (src0.z != src1.z) ? 1 : 0
 +  dst.z = (src0.z != src1.z) ? 1.0F : 0.0F
  
 -  dst.w = (src0.w != src1.w) ? 1 : 0
 +  dst.w = (src0.w != src1.w) ? 1.0F : 0.0F
  
  
  .. opcode:: STR - Set On True
 @@ -1325,6 +1325,19 @@ Support for these opcodes indicated by 
 PIPE_SHADER_CAP_INTEGERS (all of them?)
  
  
  
 +.. opcode:: FSLT - Float Set On Less Than (ordered)
 +
 +.. math::
 +
 +  dst.x = (src0.x < src1.x) ? ~0 : 0
 +
 +  dst.y = (src0.y < src1.y) ? ~0 : 0
 +
 +  dst.z = (src0.z < src1.z) ? ~0 : 0
 +
 +  dst.w = (src0.w < src1.w) ? ~0 : 0
 +
 +
  .. opcode:: ISLT - Signed Integer Set On Less Than
  
  .. math::
 @@ -1351,6 +1364,19 @@ Support for these opcodes indicated by 
 PIPE_SHADER_CAP_INTEGERS (all of them?)
    dst.w = (src0.w < src1.w) ? ~0 : 0
  
  
 +.. opcode:: FSGE - Float Set On Greater Equal Than (ordered)
 +
 +.. math::
 +
 +  dst.x = (src0.x >= src1.x) ? ~0 : 0
 +
 +  dst.y = (src0.y >= src1.y) ? ~0 : 0
 +
 +  dst.z = (src0.z >= src1.z) ? ~0 : 0
 +
 +  dst.w = (src0.w >= src1.w) ? ~0 : 0
 +
 +
  .. opcode:: ISGE - Signed Integer Set On Greater Equal Than
  
  .. math::
 @@ -1377,6 +1403,19 @@ Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
   dst.w = (src0.w >= src1.w) ? ~0 : 0
  
  
 +.. opcode:: FSEQ - Float Set On Equal (ordered)
 +
 +.. math::
 +
 +  dst.x = (src0.x == src1.x) ? ~0 : 0
 +
 +  dst.y = (src0.y == src1.y) ? ~0 : 0
 +
 +  dst.z = (src0.z == src1.z) ? ~0 : 0
 +
 +  dst.w = (src0.w == src1.w) ? ~0 : 0
 +
 +
  .. opcode:: USEQ - Integer Set On Equal
  
  .. math::
 @@ 
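
The quoted hunks differ only in what "true" looks like: the legacy float compares produce 1.0F or 0.0F, while FSLT/FSGE/FSEQ produce an integer mask of ~0 or 0. A minimal sketch in C for a single channel follows; the helper names `slt_f` and `fslt_u` are made up for illustration and are not Mesa functions.

```c
#include <stdint.h>

/* Legacy SLT-style compare: the result is a float, 1.0f or 0.0f. */
static float slt_f(float a, float b)
{
    return (a < b) ? 1.0f : 0.0f;
}

/* FSLT-style compare: the result is an integer mask, all ones or zero.
 * C's < is an ordered compare, so this is false if either input is NaN. */
static uint32_t fslt_u(float a, float b)
{
    return (a < b) ? 0xFFFFFFFFu : 0u;
}
```

The mask form can feed integer AND/OR directly, which is presumably why the integer-capable opcodes use it.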

Re: [Mesa-dev] [PATCH 4/6] i965/fs: Optimize IF/MOV/ELSE/MOV/ENDIF to SEL when possible.

2013-08-06 Thread Christoph Bumiller
On 06.08.2013 03:28, Kenneth Graunke wrote:
 Many GLSL shaders contain code of the form:
 
x = condition ? foo : bar
 
 The compiler emits an ir_if tree for this, since each subexpression
 might be a complex tree that could have side-effects and short-circuit
 logic operations.
 
 However, the common case is to simply pick one of two constants or
 variable's values---which is exactly what SEL is for.  Replacing IF/ELSE
 with SEL also simplifies the control flow graph, making optimization
 passes which work on basic blocks more effective.
 

Don't you think something like that should be implemented in common code
so that all drivers can profit?

It would be really nice to have more, useful device-independent
optimizations or simplifications like this already done instead of
requiring each driver to re-implement them (or use llvm).

 The shader-db statistics:
 
total instructions in shared programs: 1655247 -> 1503234 (-9.18%)
instructions in affected programs: 949188 -> 797175 (-16.02%)
 
2,970 shaders were helped, none hurt.  Gained 181 SIMD16 programs.
 
 This helps Valve's Source Engine games (max -41.33%), The Cave
 (max -33.33%), Serious Sam 3 (max -18.64%), Yo Frankie! (max -30.19%),
 Zen Bound (max -22.22%), GStreamer (max -6.12%), and GLBenchmark 2.7
 (max -1.94%).
 
 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/drivers/dri/i965/brw_fs.h   |  1 +
  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 78 
 
  2 files changed, 79 insertions(+)
 
 The pattern matching stuff here might be useful to abstract for reuse in
 other peephole type optimizations; ensuring that the right opcodes exist
 without accidentally walking the list is tricky to get right.
 
 Then again, I'm not sure how many useful peephole optimizations we'll have;
 it may be more useful in many cases to walk a UD-chain rather than looking
 at consecutive instructions.
 
 diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
 b/src/mesa/drivers/dri/i965/brw_fs.h
 index 370ab6c..7feb2b6 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.h
 +++ b/src/mesa/drivers/dri/i965/brw_fs.h
 @@ -369,6 +369,7 @@ public:
  fs_reg src0, fs_reg src1);
 bool try_emit_saturate(ir_expression *ir);
 bool try_emit_mad(ir_expression *ir, int mul_arg);
 +   void try_replace_with_sel();
 void emit_bool_to_cond_code(ir_rvalue *condition);
 void emit_if_gen6(ir_if *ir);
 void emit_unspill(fs_inst *inst, fs_reg reg, uint32_t spill_offset);
 diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 index ee7728c..a36c248 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 @@ -1842,6 +1842,82 @@ fs_visitor::emit_if_gen6(ir_if *ir)
 inst->predicate = BRW_PREDICATE_NORMAL;
  }
  
 +/**
 + * Try to replace IF/MOV/ELSE/MOV/ENDIF with SEL.
 + *
 + * Many GLSL shaders contain the following pattern:
 + *
 + *x = condition ? foo : bar
 + *
 + * The compiler emits an ir_if tree for this, since each subexpression might be
 + * a complex tree that could have side-effects or short-circuit logic.
 + *
 + * However, the common case is to simply select one of two constants or
 + * variable values---which is exactly what SEL is for.  In this case, the
 + * assembly looks like:
 + *
 + *(+f0) IF
 + *MOV dst src0
 + *ELSE
 + *MOV dst src1
 + *ENDIF
 + *
 + * which can be easily translated into:
 + *
 + *(+f0) SEL dst src0 src1
 + *
 + * If src0 is an immediate value, we promote it to a temporary GRF.
 + */
 +void
 +fs_visitor::try_replace_with_sel()
 +{
 +   fs_inst *endif_inst = (fs_inst *) instructions.get_tail();
 +   assert(endif_inst->opcode == BRW_OPCODE_ENDIF);
 +
 +   /* Pattern match in reverse: IF, MOV, ELSE, MOV, ENDIF. */
 +   int opcodes[] = {
 +      BRW_OPCODE_IF, BRW_OPCODE_MOV, BRW_OPCODE_ELSE, BRW_OPCODE_MOV,
 +   };
 +
 +   fs_inst *match = (fs_inst *) endif_inst->prev;
 +   for (int i = 0; i < 4; i++) {
 +      if (match->is_head_sentinel() || match->opcode != opcodes[4-i-1])
 +         return;
 +      match = (fs_inst *) match->prev;
 +   }
 +
 +   /* The opcodes match; it looks like the right sequence of instructions. */
 +   fs_inst *else_mov = (fs_inst *) endif_inst->prev;
 +   fs_inst *then_mov = (fs_inst *) else_mov->prev->prev;
 +   fs_inst *if_inst = (fs_inst *) then_mov->prev;
 +
 +   /* Check that the MOVs are the right form. */
 +   if (then_mov->dst.equals(else_mov->dst) &&
 +       !then_mov->is_partial_write() &&
 +       !else_mov->is_partial_write()) {
 +
 +      /* Remove the matched instructions; we'll emit a SEL to replace them. */
 +      while (!if_inst->next->is_tail_sentinel())
 +         if_inst->next->remove();
 +      if_inst->remove();
 +
 +  /* Only the last source register can be a constant, so if the MOV in
 +   * the then clause uses a constant, we need to put it in a temporary.
 +   */
 +  
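
The matching step of the quoted patch can be sketched outside the driver. This is a simplified illustration in plain C that assumes a flat array of opcodes rather than Mesa's instruction list; the enum `op` and the function `ends_with_if_mov_else_mov_endif` are hypothetical names, not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode set, just large enough for the example. */
enum op { OP_IF, OP_MOV, OP_ELSE, OP_ENDIF, OP_ADD };

/* Returns true if the last five opcodes are IF, MOV, ELSE, MOV, ENDIF. */
static bool ends_with_if_mov_else_mov_endif(const enum op *ops, size_t n)
{
    static const enum op pattern[] = {
        OP_IF, OP_MOV, OP_ELSE, OP_MOV, OP_ENDIF,
    };
    const size_t len = sizeof(pattern) / sizeof(pattern[0]);

    if (n < len)
        return false; /* too short: never read before the start */

    for (size_t i = 0; i < len; i++) {
        if (ops[n - len + i] != pattern[i])
            return false;
    }
    return true;
}
```

Checking the length up front plays the role of the head-sentinel test in the patch: it keeps the matcher from walking off the front of the instruction stream.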

Re: [Mesa-dev] [PATCH 4/6] i965/fs: Optimize IF/MOV/ELSE/MOV/ENDIF to SEL when possible.

2013-08-06 Thread Christoph Bumiller
On 06.08.2013 19:19, Matt Turner wrote:
 On Tue, Aug 6, 2013 at 4:14 AM, Christoph Bumiller
 e0425...@student.tuwien.ac.at wrote:
 On 06.08.2013 03:28, Kenneth Graunke wrote:
 Many GLSL shaders contain code of the form:

x = condition ? foo : bar

 The compiler emits an ir_if tree for this, since each subexpression
 might be a complex tree that could have side-effects and short-circuit
 logic operations.

 However, the common case is to simply pick one of two constants or
 variable's values---which is exactly what SEL is for.  Replacing IF/ELSE
 with SEL also simplifies the control flow graph, making optimization
 passes which work on basic blocks more effective.

 Don't you think something like that should be implemented in common code
 so that all drivers can profit ?
 We would love that. As part of a work in progress, I'm adding
 conditional-select to the GLSL IR. We planned a few months ago to do
 this as a step toward SSA at the IR level, but have only laid a little
 bit of groundwork in that direction (Ian's vector insert/extract
 series).

 Looks like your backend already does SSA. Shouldn't that be
 implemented in common code? :)

Then the code would have to run on GLSL IR as well as on my internal IR,
because the intermediate one, TGSI, shouldn't be in SSA form, and
abstracting an IR doesn't sound particularly fun.
Also, I don't have to handle vectors, so it's a bit simpler; it's actually
pretty straightforward if you implement an existing algorithm.
As for some other passes that could be shared, I still need them in the
backend so they can be applied to device-specific code sequences; you
probably have a similar situation.

 It would be really nice to have more, useful device-independent
 optimizations or simplifications like this already done instead of
 requiring each driver to re-implement them (or use llvm).
 Yes, it definitely would.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 08/34] mesa/st: Add VARYING_SLOT_TEX[1-7] to st_translate_geometry_program().

2013-07-30 Thread Christoph Bumiller
On 29.07.2013 08:03, Paul Berry wrote:
 From: Bryan Cain bryanca...@gmail.com

 v2 (Paul Berry stereotype...@gmail.com: Split out to separate patch
 (previously this was part of glsl: add builtins for geometry
 shaders.)
 ---
  src/mesa/state_tracker/st_program.c | 7 +++
  1 file changed, 7 insertions(+)

 diff --git a/src/mesa/state_tracker/st_program.c 
 b/src/mesa/state_tracker/st_program.c
 index 60cc37c..211b879 100644
 --- a/src/mesa/state_tracker/st_program.c
 +++ b/src/mesa/state_tracker/st_program.c
 @@ -911,6 +911,13 @@ st_translate_geometry_program(struct st_context *st,
  stgp->input_semantic_index[slot] = 0;
  break;
   case VARYING_SLOT_TEX0:
 + case VARYING_SLOT_TEX1:
 + case VARYING_SLOT_TEX2:
 + case VARYING_SLOT_TEX3:
 + case VARYING_SLOT_TEX4:
 + case VARYING_SLOT_TEX5:
 + case VARYING_SLOT_TEX6:
 + case VARYING_SLOT_TEX7:
  stgp->input_semantic_name[slot] = TGSI_SEMANTIC_GENERIC;
  stgp->input_semantic_index[slot] = num_generic++;
  break;

This doesn't work, first because the semantic index shouldn't depend on
which varyings are present, and second because TEX is required to use
TGSI_SEMANTIC_TEXCOORD if the driver has PIPE_CAP_TGSI_TEXCOORD. Please
see st_prepare_vertex_program.
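
To illustrate both objections, here is a minimal sketch in C of a mapping where the index is a pure function of the slot and the semantic name depends on the driver cap. The enums, their values, and `tex_slot_semantic` are hypothetical stand-ins, not the Mesa definitions, and this is not necessarily what st_prepare_vertex_program does.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the Mesa enums; the values are illustrative. */
enum varying_slot { SLOT_TEX0 = 4, SLOT_TEX7 = 11 };
enum semantic { SEM_GENERIC, SEM_TEXCOORD };

struct semantic_info {
    enum semantic name;
    unsigned index;
};

/* Map a TEXn slot to a semantic. The index is derived from the slot alone
 * (slot - SLOT_TEX0), not from a running counter over the varyings that
 * happen to be present, so producer and consumer always agree on it. */
static struct semantic_info tex_slot_semantic(unsigned slot,
                                              bool has_texcoord_cap)
{
    struct semantic_info s;
    s.name = has_texcoord_cap ? SEM_TEXCOORD : SEM_GENERIC;
    s.index = slot - SLOT_TEX0;
    return s;
}
```

With a counter such as `num_generic++`, TEX3 would get index 0 in a shader that only reads TEX3 but index 3 in one that reads TEX0 through TEX3, so two shader stages could disagree.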


Re: [Mesa-dev] [PATCH] st/mesa: expose EXT_framebuffer_multisample_blit_scaled if MSAA is supported

2013-07-17 Thread Christoph Bumiller
On 17.07.2013 02:05, Marek Olšák wrote:
 No, it's not faster, but it's not slower either.

 Now that I think about it, I can't come up with a good shader-based
 algorithm for the resolve operation.

 I don't think Christoph's approach that an MSAA texture can be viewed
 as a larger single-sample texture is correct, because the physical
 locations of the samples in memory usually do not correspond to the
 sample locations the 3D engine used for rasterization, so fetching a
 texel from the larger texture at (x,y) physical coordinates won't
 always return the closest rasterized sample at those coordinates. Also
 the bilinear filter would be horrible in this case, because it only
 takes 4 samples per pixel.

It can also take 8 samples per pixel, or MS8 resolve wouldn't be
possible, so scaling down MS2 should look OK at least.
The samples in the texture are arranged according to the physical
sample locations (though of course the proportions don't match).

Besides, it's allowed to look horrible: it only requires a 4-tap linear
filter, and depth resolve, should anyone actually do that, uses
point/nearest filtering, so you can always do it in one pass.
Also, it's possible this extension isn't even intended to resolve for
direct display.

There's one advantage in having the app do it, though: you don't have to
worry about keeping a temporary surface of an appropriate size around.

 Now let's consider implementing the scaled resolve operation in the
 shader by texelFetch-ing all samples and using a bilinear filter. For
 Nx MSAA, there would be N*4 texel fetches per pixel; in comparison,
 separate resolve+blit needs only N+4 texel fetches per pixel. In
 addition to that, the resolve is a special fixed-function blending
 operation and the fragment shader is not even executed. See? Separate
 resolve+blit beats everything.

 Marek
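
The fetch counts in the quoted estimate can be checked with a few lines of C. This only restates the arithmetic from the paragraph above (N*4 versus N+4 fetches per pixel); the function names are made up for illustration.

```c
/* Texel fetches per output pixel for Nx MSAA when resolve and bilinear
 * scaling are done in one shader pass: N samples for each of 4 taps. */
static unsigned fetches_single_pass(unsigned n_samples)
{
    return n_samples * 4;
}

/* Texel fetches per output pixel for a separate resolve (N samples)
 * followed by a regular bilinear blit (4 taps). */
static unsigned fetches_resolve_then_blit(unsigned n_samples)
{
    return n_samples + 4;
}
```

For 8x MSAA that is 32 fetches against 12; the separate path needs fewer fetches for any N of 2 or more.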

 On Wed, Jul 17, 2013 at 12:12 AM, Grigori Goronzy g...@chown.ath.cx wrote:
 On 16.07.2013 19:26, Marek Olšák wrote:
 Surprisingly all drivers supporting MSAA can already do this (r300g and
 r600g
 for sure) and I think Christoph wanted to have this feature for his
 Nouveau
 drivers anyway.

 OK, they can do it, but is it actually any faster than doing a resolve and
 regular blit afterwards? This is kind of the point of this extension. r600g
 creates a temporary texture to resolve into and then blits that, which
 shouldn't be any faster than doing the same from GL.

 Grigori


[Mesa-dev] Direct3D 9 state tracker

2013-07-16 Thread Christoph Bumiller
So, about two months ago I had the insane idea to pick up Joakim
Sindholt's Direct3D 9 state tracker that he'd started about 3 years ago
with the goal to make it run StarCraft 2 so I could finally play at a
reasonable frame rate ...

With help from Joakim and advice from the wine developers, as well as
wine's d3d9 tests, things went surprisingly smoothly and my original goal
has been achieved and surpassed, hence I thought I'd post a note here in
case someone who doesn't yet know about it is interested in trying it out.

... Now wait, didn't we have a D3D10/11 state tracker already that we
kicked out because it was unmaintained and not really useful?
Yes, but there are a couple of differences to d3d1x:

- the original author has not vanished [yet] (Luca, if you can hear me:
You cannot leave your children out to die like that!)
- it's written in C instead of C++ and doesn't rely on horrific multiple
inheritance with template hacks to make gcc generate COM-compatible
vtables (and I'm still not sure if that actually worked)
- gallium wasn't ready for D3D11, and still isn't (at least the pipe
drivers aren't), but it is ready for D3D9, and all the features required
from the pipe drivers are well tested via OpenGL
- there are no motivating applications using Direct3D 10/11 yet (at
least for me)
- and most importantly, contrary to d3d1x, d3d9/st already actually
works for real applications!

So far I've tried Skyrim, Civilization 5, Anno 1404 and StarCraft 2 on
the nvc0 and r600g drivers, which work pretty well, at up to 2x the fps
I get with wined3d (NOTE: no thorough benchmarking done yet).
Civilization 4 works, too, but it still has a couple of (not too severe)
rendering issues because I haven't yet paid much attention to the fixed
function pipeline and its interaction with the earlier shader versions.

If people think it's a good idea to merge it, I'd clean up the few
modifications I did to gallium, and, once they've been cleared, merge
the state tracker itself.
Unfortunately, for proper window system integration, a few modifications
to wine are required (it used to run without them, but fully correct
operation isn't possible like that).

Here are the links to the mesa branch containing the state tracker and to
a patched version of wine:
https://github.com/chrisbmr/Mesa-3D/tree/gallium-nine
https://github.com/chrisbmr/wine/tree/d3dadapter9-wip
(The wine modifications only affect { d3d9.dll.so, gdi32.dll.so,
user32.dll.so, wineps.drv.so and winex11.drv.so }, so you don't have to
replace all of it).

Some usage hints:
https://github.com/chrisbmr/Mesa-3D/blob/gallium-nine/src/gallium/state_trackers/nine/README



Re: [Mesa-dev] [PATCH] st/mesa: expose EXT_framebuffer_multisample_blit_scaled if MSAA is supported

2013-07-16 Thread Christoph Bumiller
On 17.07.2013 00:12, Grigori Goronzy wrote:
 On 16.07.2013 19:26, Marek Olšák wrote:
 Surprisingly all drivers supporting MSAA can already do this (r300g
 and r600g
 for sure) and I think Christoph wanted to have this feature for his
 Nouveau
 drivers anyway.

 OK, they can do it, but is it actually any faster than doing a resolve
 and regular blit afterwards? This is kind of the point of this
 extension. r600g creates a temporary texture to resolve into and then
 blits that, which shouldn't be any faster than doing the same from GL.


You can implement arbitrary filters for resolve since you're doing it
manually using texelFetch from a shader anyway, so yes, you can make it
faster (for depth/stencil resolve this is trivial), or at least leave
that option open, whereas if GL apps do it manually you can't do anything
about it.

NV50/NVC0 just use a single plain old scaled blit for resolve, because a
multisample texture's samples are all adjacent in 2D coordinate space.
It's no different from downscaling a larger texture, so there it's
always going to be faster.
Granted, it might look ugly if I can't find a fitting filtering mode,
but I'll just ignore that until I see some application using it that
relies on SCALED_RESOLVE_NICEST_EXT looking decent.

 Grigori


Re: [Mesa-dev] [PATCH 3/3] tgsi: rename the TGSI fragment kill opcodes

2013-07-12 Thread Christoph Bumiller
On 12.07.2013 16:06, Jose Fonseca wrote:
 The tradition has been to use C suffix for conditional opcodes, instead of 
 _IF.  That said, I don't feel too strongly either way.
 

Except the 'C' suffix usually (ok, we only have BREAKC) indicates a
single condition value where non-zero means true, while KIL operates on
all 4 components and executes if any of them is < 0.

I'd still prefer to keep the name KIL instead of KILL_IF and simply
rename KILP to DISCARD, which is the name used in GLSL and SM4.
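
The difference between the two kinds of condition can be sketched in C as follows. This is only an illustration of the semantics described above; `kill_if` and `cond_c` are hypothetical helpers, not TGSI interpreter code.

```c
#include <stdbool.h>

/* KIL-style condition (KILL_IF in the proposed renaming): looks at all
 * four source components and fires if any of them is negative. */
static bool kill_if(const float src[4])
{
    for (int i = 0; i < 4; i++) {
        if (src[i] < 0.0f)
            return true;
    }
    return false;
}

/* 'C'-suffix condition as in BREAKC: a single value, non-zero means true. */
static bool cond_c(unsigned value)
{
    return value != 0;
}
```

Note that a source of all zeros does not kill, whereas a 'C'-style opcode would treat zero as false and any non-zero value, including a positive one, as true.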

 I agree that the current naming is confusing. And I like the fact that the 
 new and old opcodes don't overlap, which means there is no way we 
 inadvertently get the wrong ones when updating out-of-tree state trackers.
 
 And it's nice to see these sorts of cleanups. I know from experience that
 they can be time consuming, but I do believe they pay off eventually. I 
 believe Gallium pipe_screen/pipe_context interfaces are quite lean and 
 straightforward these days, but the opcodes are still a big mess, and shaders 
 are one of the most (if not the most) important parts of the interface.
 
 For the series:
 Reviewed-by: Jose Fonseca jfons...@vmware.com
 
 Jose
 
 - Original Message -
 TGSI_OPCODE_KIL and KILP had confusing names.  The former was conditional
 kill (if any src component < 0).  The latter was unconditional kill.
 At one time KILP was supposed to work with NV-style condition
 codes/predicates but we never had that in TGSI.

 This patch renames both opcodes:
   TGSI_OPCODE_KIL -> KILL_IF   (kill if src.xyzw < 0)
   TGSI_OPCODE_KILP -> KILL (unconditional kill)

 Note: I didn't just transpose the opcode names to help ensure that I
 didn't miss updating any code anywhere.

 I believe I've updated all the relevant code and comments but I'm
 not 100% sure that some drivers had this right in the first place.
 For example, the radeon driver might have llvm.AMDGPU.kill and
 llvm.AMDGPU.kilp mixed up.  Driver authors should review their code.
 ---
  src/gallium/auxiliary/draw/draw_pipe_aapoint.c   |4 ++--
  src/gallium/auxiliary/draw/draw_pipe_pstipple.c  |8 
  src/gallium/auxiliary/gallivm/lp_bld_flow.c  |2 +-
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c   |8 
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c  |6 ++
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c  |   16 
  src/gallium/auxiliary/postprocess/pp_mlaa.h  |6 +++---
  src/gallium/auxiliary/tgsi/tgsi_exec.c   |   14 +++---
  src/gallium/auxiliary/tgsi/tgsi_info.c   |4 ++--
  src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h |4 ++--
  src/gallium/auxiliary/tgsi/tgsi_scan.c   |4 ++--
  src/gallium/auxiliary/tgsi/tgsi_scan.h   |2 +-
  src/gallium/auxiliary/util/u_pstipple.c  |   10 +-
  src/gallium/auxiliary/vl/vl_mc.c |2 +-
  src/gallium/docs/source/tgsi.rst |6 --
  src/gallium/drivers/i915/i915_fpc_optimize.c |4 ++--
  src/gallium/drivers/i915/i915_fpc_translate.c|8 +++-
  src/gallium/drivers/ilo/shader/ilo_shader_fs.c   |2 +-
  src/gallium/drivers/ilo/shader/toy_tgsi.c|   16 
  src/gallium/drivers/nv30/nvfx_fragprog.c |6 +++---
  .../drivers/nv50/codegen/nv50_ir_from_tgsi.cpp   |   10 +-
  src/gallium/drivers/r300/compiler/r3xx_fragprog.c|2 +-
  .../drivers/r300/compiler/radeon_program_alu.c   |   12 ++--
  .../drivers/r300/compiler/radeon_program_alu.h   |2 +-
  src/gallium/drivers/r300/r300_tgsi_to_rc.c   |4 ++--
  src/gallium/drivers/r600/r600_shader.c   |   14 +++---
  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c  |8 
  src/gallium/drivers/softpipe/sp_quad_depth_test.c|2 +-
  src/gallium/drivers/svga/svga_tgsi_insn.c|   18
  +-
  src/gallium/include/pipe/p_shader_tokens.h   |4 ++--
  src/mesa/state_tracker/st_glsl_to_tgsi.cpp   |6 +++---
  src/mesa/state_tracker/st_mesa_to_tgsi.c |5 +++--
  32 files changed, 109 insertions(+), 110 deletions(-)

 diff --git a/src/gallium/auxiliary/draw/draw_pipe_aapoint.c
 b/src/gallium/auxiliary/draw/draw_pipe_aapoint.c
 index ec703d0..0d7b88e 100644
 --- a/src/gallium/auxiliary/draw/draw_pipe_aapoint.c
 +++ b/src/gallium/auxiliary/draw/draw_pipe_aapoint.c
 @@ -308,9 +308,9 @@ aa_transform_inst(struct tgsi_transform_context *ctx,
newInst.Src[1].Register.SwizzleY = TGSI_SWIZZLE_W;
ctx->emit_instruction(ctx, newInst);
  
 -  /* KIL -tmp0.;   # if -tmp0.y < 0, KILL */
 +  /* KILL_IF -tmp0.;   # if -tmp0.y < 0, KILL */
newInst = tgsi_default_full_instruction();
 -  newInst.Instruction.Opcode = TGSI_OPCODE_KIL;
 +  newInst.Instruction.Opcode = TGSI_OPCODE_KILL_IF;

Re: [Mesa-dev] [PATCH 1/3] gallium: add expand_resource interface

2013-07-11 Thread Christoph Bumiller
On 11.07.2013 20:15, Marek Olšák wrote:
 Hi Roland,

 The fast color clear on Radeon doesn't touch the memory of the texture
 resource. Instead, it changes some GPU meta data that say the resource
 is cleared (the location of the meta data is stored in pipe_resource).
 This works fine as long as the gallium pipe_resource structure is used
 for accessing the resource. That's not the case with the DDX, which is
 responsible for putting the resource on the screen and it obviously
 has no idea about the contents of pipe_resource, so it doesn't know
 that the resource is in a cleared state and a special flush
 operation must be done to actually write the cleared pixels (which
 haven't been overwritten by new geometry of course).

If I was mean I would suggest you just associate the information with
the bo and have the DDX import that, too.

 The easiest way to solve this is to flush the cleared resource in
 SwapBuffers and where the front buffer is flushed. The Gallium driver
 can't do it automatically, because it has no notion of front and back
 buffers nor does it know which resource must be flushed. That's why
 a new pipe_context function is being proposed, which was originally my
 idea.

You could cloak the function under a more generic name, then you're less
likely to encounter reactions like "hardware details don't belong in the
API".

First I thought of flush_frontbuffer from pipe_screen, but that seems
to have a different (or, no) purpose.

 This commit only fixes r600g for st/dri. Any other co-state tracker
 (like st/egl and st/xlib) will be broken if it's used with r600g. I
 think we can ignore st/xlib. Not sure how important st/egl is (not
 required for EGL under X).

 Marek

 On Wed, Jul 10, 2013 at 7:32 PM, Roland Scheidegger srol...@vmware.com 
 wrote:
 I don't quite understand what this should do, at first sight it looks
 like an ugly hack (which should really not be part of gallium interface)
 to make fast color clearing work better with window framebuffers.
 Seems to go against the idea of resources (which are immutable, well not
 the contents but the properties).
 (If anything I wanted an interface to change bind flags for resources
 after initialization, because they are near impossible to guarantee with
 OpenGL's (or d3d9 for that matter) distinct texture/fb model, but that
 would also be quite a hack.)
 Could you elaborate with some example what that's supposed to do in
 practice?

 Roland


 On 10.07.2013 18:20, Grigori Goronzy wrote:
 This interface is used to expand fast-cleared window system
 colorbuffers.
 ---
  src/gallium/include/pipe/p_context.h | 8 
  src/gallium/state_trackers/dri/common/dri_drawable.c | 4 
  src/gallium/state_trackers/dri/drm/dri2.c| 8 ++--
  3 files changed, 18 insertions(+), 2 deletions(-)

 diff --git a/src/gallium/include/pipe/p_context.h 
 b/src/gallium/include/pipe/p_context.h
 index aa18cbf..38d5ee6 100644
 --- a/src/gallium/include/pipe/p_context.h
 +++ b/src/gallium/include/pipe/p_context.h
 @@ -354,6 +354,14 @@ struct pipe_context {
 unsigned dstx, unsigned dsty,
 unsigned width, unsigned height);

 +   /**
 +* Expand a color resource in-place.
 +*
 +* \return TRUE if resource was expanded, FALSE otherwise
 +*/
 +   boolean (*expand_resource)(struct pipe_context *pipe,
 +  struct pipe_resource *dst);
 +
 /** Flush draw commands
  *
  * \param flags  bitfield of enum pipe_flush_flags values.
 diff --git a/src/gallium/state_trackers/dri/common/dri_drawable.c 
 b/src/gallium/state_trackers/dri/common/dri_drawable.c
 index 18d8d89..b67a497 100644
 --- a/src/gallium/state_trackers/dri/common/dri_drawable.c
 +++ b/src/gallium/state_trackers/dri/common/dri_drawable.c
 @@ -448,6 +448,10 @@ dri_flush(__DRIcontext *cPriv,
   }

   /* FRONT_LEFT is resolved in drawable->flush_frontbuffer. */
 +  } else if (ctx->st->pipe->expand_resource) {
 + /* Expand fast-cleared framebuffer */
 + ctx->st->pipe->expand_resource(ctx->st->pipe,
 +   drawable->textures[ST_ATTACHMENT_BACK_LEFT]);
}

dri_postprocessing(ctx, drawable, ST_ATTACHMENT_BACK_LEFT);
 diff --git a/src/gallium/state_trackers/dri/drm/dri2.c 
 b/src/gallium/state_trackers/dri/drm/dri2.c
 index 1dcc1f7..97784ec 100644
 --- a/src/gallium/state_trackers/dri/drm/dri2.c
 +++ b/src/gallium/state_trackers/dri/drm/dri2.c
 @@ -490,18 +490,22 @@ dri2_flush_frontbuffer(struct dri_context *ctx,
  {
 __DRIdrawable *dri_drawable = drawable->dPriv;
 struct __DRIdri2LoaderExtensionRec *loader = drawable->sPriv->dri2.loader;
 +   struct pipe_context *pipe = ctx->st->pipe;

 if (statt != ST_ATTACHMENT_FRONT_LEFT)
return;

 if (drawable->stvis.samples > 1) {
 -  struct pipe_context *pipe = ctx->st->pipe;
 -
/* Resolve the front buffer. */
dri_pipe_blit(ctx->st->pipe,
   

Re: [Mesa-dev] [PATCH 2/3] tgsi: fix-up KILP comments

2013-07-11 Thread Christoph Bumiller
On 12.07.2013 01:26, Brian Paul wrote:
 KILP is really unconditional fragment kill.

 We've had KIL and KILP transposed forever.  I'll fix that next.

I think the 'P' was meant to indicate that the condition, if there is any,
would be a predicate register, whereas KIL without the P is supposed to
represent the KIL/TEXKILL instruction from those old shader languages.
So it's not transposed, it's just an initially confusing name. Maybe
just s/KILP/DISCARD/ instead of swapping them?

 ---
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c |3 +--
  src/gallium/auxiliary/tgsi/tgsi_exec.c  |5 ++---
  src/gallium/docs/source/tgsi.rst|   10 +-
  src/mesa/state_tracker/st_glsl_to_tgsi.cpp  |1 +
  4 files changed, 9 insertions(+), 10 deletions(-)

 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c 
 b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 index 43724e7..43182ee 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 @@ -2096,8 +2096,7 @@ emit_kil(
  
  
  /**
 - * Predicated fragment kill.
 - * XXX Actually, we do an unconditional kill (as in tgsi_exec.c).
 + * Unconditional fragment kill.
   * The only predication is the execution mask which will apply if
   * we're inside a loop or conditional.
   */
 diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
 b/src/gallium/auxiliary/tgsi/tgsi_exec.c
 index eaf..035b105 100644
 --- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
 +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
 @@ -1614,8 +1614,7 @@ exec_kil(struct tgsi_exec_machine *mach,
  }
  
  /**
 - * Execute NVIDIA-style KIL which is predicated by a condition code.
 - * Kill fragment if the condition code is TRUE.
 + * Unconditional fragment kill/discard.
   */
  static void
  exec_kilp(struct tgsi_exec_machine *mach,
 @@ -1623,7 +1622,7 @@ exec_kilp(struct tgsi_exec_machine *mach,
  {
 uint kilmask; /* bit 0 = pixel 0, bit 1 = pixel 1, etc */
  
 -   /* unconditional kil */
 +   /* kill fragment for all fragments currently executing */
 kilmask = mach->ExecMask;
 mach->Temps[TEMP_KILMASK_I].xyzw[TEMP_KILMASK_C].u[0] |= kilmask;
  }
 diff --git a/src/gallium/docs/source/tgsi.rst 
 b/src/gallium/docs/source/tgsi.rst
 index 3f48b51..8c6fec9 100644
 --- a/src/gallium/docs/source/tgsi.rst
 +++ b/src/gallium/docs/source/tgsi.rst
 @@ -471,11 +471,6 @@ This instruction replicates its result.
dst.w = partialy(src.w)
  
  
 -.. opcode:: KILP - Predicated Discard
 -
 -  Not really predicated, just unconditional discard
 -
 -
  .. opcode:: PK2H - Pack Two 16-bit Floats
  
TBD
 @@ -755,6 +750,11 @@ This instruction replicates its result.
endif
  
  
 +.. opcode:: KILP - Discard
 +
 +  Unconditional discard.  Allowed in fragment shaders only.
 +
 +
  .. opcode:: SCS - Sine Cosine
  
  .. math::
 diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
 b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
 index 64e0a8a..9e0a648 100644
 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
 +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
 @@ -2978,6 +2978,7 @@ glsl_to_tgsi_visitor::visit(ir_discard *ir)
this->result.negate = ~this->result.negate;
emit(ir, TGSI_OPCODE_KIL, undef_dst, this->result);
 } else {
 +  /* unconditional kil */
emit(ir, TGSI_OPCODE_KILP);
 }
  }



[Mesa-dev] [PATCH] r600g: x/y coordinates must be divided by block dim in dma blit

2013-07-05 Thread Christoph Bumiller
From: Christoph Bumiller christoph.bumil...@speed.at

---
 src/gallium/drivers/r600/evergreen_state.c | 10 --
 src/gallium/drivers/r600/r600_state.c  | 10 --
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 0dc4f15..0267d28 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -3740,6 +3740,7 @@ boolean evergreen_dma_blit(struct pipe_context *ctx,
struct r600_texture *rdst = (struct r600_texture*)dst;
unsigned dst_pitch, src_pitch, bpp, dst_mode, src_mode, copy_height;
unsigned src_w, dst_w;
+   unsigned src_x, src_y;
 
if (rctx->rings.dma.cs == NULL) {
return FALSE;
@@ -3748,6 +3749,11 @@ boolean evergreen_dma_blit(struct pipe_context *ctx,
return FALSE;
}
 
+   src_x = util_format_get_nblocksx(src->format, src_box->x);
+   dst_x = util_format_get_nblocksx(src->format, dst_x);
+   src_y = util_format_get_nblocksy(src->format, src_box->y);
+   dst_y = util_format_get_nblocksy(src->format, dst_y);
+
bpp = rdst->surface.bpe;
dst_pitch = rdst->surface.level[dst_level].pitch_bytes;
src_pitch = rsrc->surface.level[src_level].pitch_bytes;
@@ -3792,7 +3798,7 @@ boolean evergreen_dma_blit(struct pipe_context *ctx,
 */
src_offset= rsrc->surface.level[src_level].offset;
src_offset += rsrc->surface.level[src_level].slice_size * src_box->z;
-   src_offset += src_box->y * src_pitch + src_box->x * bpp;
+   src_offset += src_y * src_pitch + src_x * bpp;
dst_offset = rdst->surface.level[dst_level].offset;
dst_offset += rdst->surface.level[dst_level].slice_size * dst_z;
dst_offset += dst_y * dst_pitch + dst_x * bpp;
@@ -3800,7 +3806,7 @@ boolean evergreen_dma_blit(struct pipe_context *ctx,
src_box->height * src_pitch);
} else {
evergreen_dma_copy_tile(rctx, dst, dst_level, dst_x, dst_y, dst_z,
-   src, src_level, src_box->x, src_box->y, src_box->z,
+   src, src_level, src_x, src_y, src_box->z,
copy_height, dst_pitch, bpp);
}
return TRUE;
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index 301ca88..ac0e0ce 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -3139,6 +3139,7 @@ boolean r600_dma_blit(struct pipe_context *ctx,
struct r600_texture *rdst = (struct r600_texture*)dst;
unsigned dst_pitch, src_pitch, bpp, dst_mode, src_mode, copy_height;
unsigned src_w, dst_w;
+   unsigned src_x, src_y;
 
if (rctx->rings.dma.cs == NULL) {
return FALSE;
@@ -3147,6 +3148,11 @@ boolean r600_dma_blit(struct pipe_context *ctx,
return FALSE;
}
 
+   src_x = util_format_get_nblocksx(src->format, src_box->x);
+   dst_x = util_format_get_nblocksx(src->format, dst_x);
+   src_y = util_format_get_nblocksy(src->format, src_box->y);
+   dst_y = util_format_get_nblocksy(src->format, dst_y);
+
bpp = rdst->surface.bpe;
dst_pitch = rdst->surface.level[dst_level].pitch_bytes;
src_pitch = rsrc->surface.level[src_level].pitch_bytes;
@@ -3179,7 +3185,7 @@ boolean r600_dma_blit(struct pipe_context *ctx,
 */
src_offset= rsrc->surface.level[src_level].offset;
src_offset += rsrc->surface.level[src_level].slice_size * src_box->z;
-   src_offset += src_box->y * src_pitch + src_box->x * bpp;
+   src_offset += src_y * src_pitch + src_x * bpp;
dst_offset = rdst->surface.level[dst_level].offset;
dst_offset += rdst->surface.level[dst_level].slice_size * dst_z;
dst_offset += dst_y * dst_pitch + dst_x * bpp;
@@ -3191,7 +3197,7 @@ boolean r600_dma_blit(struct pipe_context *ctx,
r600_dma_copy(rctx, dst, src, dst_offset, src_offset, size);
} else {
return r600_dma_copy_tile(rctx, dst, dst_level, dst_x, dst_y, dst_z,
-   src, src_level, src_box->x, src_box->y, src_box->z,
+   src, src_level, src_x, src_y, src_box->z,
copy_height, dst_pitch, bpp);
}
return TRUE;
-- 
1.8.1.5
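
A note on the arithmetic in the patch above: the pitch and bpe values describe blocks rather than pixels for block-compressed formats, so pixel coordinates have to be converted to block counts before they go into a byte offset. Below is a minimal sketch of that calculation. The struct and function names are made up for illustration and are not the util_format API; the sketch assumes the conversion is "pixels divided by block dimension, rounded up".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a format's block layout (not a Mesa type). */
struct block_layout {
   unsigned block_width;   /* pixels per block in x, 1 for plain formats */
   unsigned block_height;  /* pixels per block in y, 1 for plain formats */
   unsigned block_bytes;   /* bytes per block (what the patch calls bpp) */
};

/* Number of blocks needed to cover 'pixels', rounding up. */
static unsigned nblocks(unsigned pixels, unsigned block_dim)
{
   assert(block_dim > 0);
   return (pixels + block_dim - 1) / block_dim;
}

/* Byte offset into a linear surface for a pixel position (x, y) that is
 * aligned to the block size. pitch_bytes is the size of one row of blocks. */
static uint64_t linear_offset(const struct block_layout *fmt,
                              unsigned x, unsigned y, unsigned pitch_bytes)
{
   unsigned bx = nblocks(x, fmt->block_width);
   unsigned by = nblocks(y, fmt->block_height);

   return (uint64_t)by * pitch_bytes + (uint64_t)bx * fmt->block_bytes;
}
```

With 4x4 blocks of 8 bytes, pixel (8, 4) maps to block (2, 1); feeding the raw pixel coordinates into the same formula would overshoot by a factor of the block size in each direction.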

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] RFC: more changes to render_condition

2013-06-22 Thread Christoph Bumiller
On 22.06.2013 16:36, Roland Scheidegger wrote:
 We decided to drop predicated transfers already. State tracker can
 emulate this by using another resource and doing a (predicated)
 resource_copy_region, might be slightly suboptimal but predicated
 transfers really sound strange.
 As for resource_copy_region, I'm fine with a flag indicating if it
 honors predication or not. You can have that for blit too if you need it
 (maybe if you implement resource_copy_region as a blit?), I was thinking
 about it (it is not obvious why blit should behave differently really),
 but decided against it because d3d10 apparently does not seem to require
 it (and other apis don't predicate that stuff anyway), unless the docs
 are wrong (resolve isn't mentioned among the predicated functions).

 Roland

You could still go with adding a separate render_condition
(non_render_stuff_condition) for transfers, copies and blits, that's how
it's done on NV hardware.
Adding booleans to all the functions looks ugly.
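
To make the alternative concrete, here is a toy model of keeping a second condition for copies, blits and transfers next to the one used for draws. It is only a sketch of the idea; every type and function name below is invented for illustration and none of it is the gallium interface or a description of how NV hardware is programmed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are gallium types. */
struct toy_query {
   bool result;
};

struct toy_condition {
   const struct toy_query *query;  /* NULL means no condition is set */
   bool skip_if;                   /* skip when query result == skip_if */
};

struct toy_context {
   struct toy_condition render;  /* consulted by draws and clears */
   struct toy_condition copy;    /* consulted by copies, blits, transfers */
};

/* Returns true when an operation guarded by 'c' should be executed. */
static bool condition_passes(const struct toy_condition *c)
{
   assert(c != NULL);
   if (c->query == NULL)
      return true;
   return c->query->result != c->skip_if;
}

static bool toy_should_draw(const struct toy_context *ctx)
{
   return condition_passes(&ctx->render);
}

static bool toy_should_copy(const struct toy_context *ctx)
{
   return condition_passes(&ctx->copy);
}
```

In this model a state tracker that wants predicated copies sets both conditions, and one that does not simply leaves the second one unset, so no per-call boolean is needed.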


 On 22.06.2013 14:27, Marek Olšák wrote:
 I have mixed feelings about this.

 Some transfers are implemented with pipe_context::blit instead of
 resource_copy_region, because MSAA resources should be downsampled in
 transfer_map and upsampled in transfer_unmap, so that ReadPixels and
 various fallbacks (CopyPixels, CopyTexSubImage, ...) work. If
 transfers were to honor the render condition, the blit (including
 resolve) must honor it too.

 Adding a boolean flag to resource_copy_region and blit saying whether
 the render condition should be honored is preferable. This should keep
 the render-condition disabling in the driver as it is now. Trying to
 save/restore the render condition before/after all occurrences of
 resource_copy_region and blit would be prone to regressions and it
 would also need much more work.

 Marek


 On Sat, Jun 15, 2013 at 12:01 AM, Roland Scheidegger srol...@vmware.com 
 wrote:
 On 14.06.2013 19:49, srol...@vmware.com wrote:
 From: Roland Scheidegger srol...@vmware.com

 For conditional rendering this makes it possible to skip rendering
 if either the predicate is true or false, as supported by d3d10
 (in fact previously it was sort of implied skip rendering if predicate
 is false for occlusion predicate, and true for so_overflow predicate).
 There's no cap bit for this as presumably all drivers could do it trivially
 (but this patch does not implement it for the drivers using true
 hw predicates, nvxx, r600, radeonsi, no change is expected for OpenGL
 functionality).
 ---

 FWIW there's some more changes which would be useful but they are probably
 more controversial and may require some more thought so here it goes:


 diff --git a/src/gallium/docs/source/context.rst 
 b/src/gallium/docs/source/context.rst
 index ede89be..59403de 100644
 --- a/src/gallium/docs/source/context.rst
 +++ b/src/gallium/docs/source/context.rst
 @@ -385,7 +385,8 @@ A drawing command can be skipped depending on the 
 outcome of a query
  (typically an occlusion query, or streamout overflow predicate).
  The ``render_condition`` function specifies the query which should be 
 checked
  prior to rendering anything. Functions honoring render_condition include
 -(and are limited to) draw_vbo, clear, clear_render_target, 
 clear_depth_stencil.
 +(and are limited to) draw_vbo, clear, clear_render_target, 
 clear_depth_stencil,
 +resource_copy_region. Transfers may also be affected.

  If ``render_condition`` is called with ``query`` = NULL, conditional
  rendering is disabled and drawing takes place normally.
 @@ -545,6 +546,13 @@ These flags control the behavior of a transfer object.
Written ranges will be notified later with :ref:`transfer_flush_region`.
Cannot be used with ``PIPE_TRANSFER_READ``.

 +``PIPE_TRANSFER_HONOR_RENDER_CONDITION``
 +  The transfer will honor the current render condition. This is only valid
 +  essentially for ``transfer_inline_write`` (but since everyone implements
 +  this with a fallback to ordinary transfer_map/transfer_unmap it is valid
 +  for transfer_map too, however the same restriction apply, the transfer
 +  must be write-only with either DISCARD_RANGE or DISCARD_WHOLE_RESOURCE 
 set).
 +

 The reasoning for this is that d3d10 has CopyResource/CopySubResource
 and UpdateSubResource predicated.
 For resource_copy_region if it always honors render_condition,
 then state trackers not wanting this can simply disable predication
 when they call it. But the opposite is not possible, if it never
 honors predication, then a state tracker needing predication will
 need to wait on the predicate, hence requiring a cpu/gpu sync (if
 the result isn't available yet).
 For transfers this is a bit weird, I admit; it essentially implies
 a predicated gpu blit from a staging texture (if you implement this
 fully on hardware). If that's too awkward though this one could be
 emulated in the state tracker easily enough, if resource_copy_region
 honors predication (by just creating a temporary 

Re: [Mesa-dev] [PATCH 2/2] gallium/draw: add limits to the clip and cull distances

2013-06-12 Thread Christoph Bumiller
On 12.06.2013 15:57, Jose Fonseca wrote:
 
 
 - Original Message -
 On 11.06.2013 05:39, Zack Rusin wrote:
 There are strict limits on those registers. Define the maximums
 and use them instead of magic numbers. Also allows us to add
 some extra sanity checks.
 Suggested by Brian.

 Signed-off-by: Zack Rusin za...@vmware.com
 ---
  src/gallium/auxiliary/draw/draw_context.c |2 ++
  src/gallium/auxiliary/draw/draw_gs.c  |   10 +-
  src/gallium/auxiliary/draw/draw_gs.h  |4 ++--
  src/gallium/auxiliary/draw/draw_vs.c  |   10 +-
  src/gallium/auxiliary/draw/draw_vs.h  |4 ++--
  src/gallium/docs/source/tgsi.rst  |   23 +++
  src/gallium/include/pipe/p_state.h|2 ++
  7 files changed, 41 insertions(+), 14 deletions(-)

 diff --git a/src/gallium/auxiliary/draw/draw_context.c
 b/src/gallium/auxiliary/draw/draw_context.c
 index 0dbddb4..22c0e9b 100644
 --- a/src/gallium/auxiliary/draw/draw_context.c
 +++ b/src/gallium/auxiliary/draw/draw_context.c
 @@ -738,6 +738,7 @@ draw_current_shader_clipvertex_output(const struct
 draw_context *draw)
  uint
  draw_current_shader_clipdistance_output(const struct draw_context *draw,
  int index)
  {
 +   debug_assert(index < PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT);
 if (draw->gs.geometry_shader)
return draw->gs.geometry_shader->clipdistance_output[index];
 return draw->vs.clipdistance_output[index];
 @@ -756,6 +757,7 @@ draw_current_shader_num_written_clipdistances(const
 struct draw_context *draw)
  uint
  draw_current_shader_culldistance_output(const struct draw_context *draw,
  int index)
  {
 +   debug_assert(index < PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT);
 if (draw->gs.geometry_shader)
return draw->gs.geometry_shader->culldistance_output[index];
 return draw->vs.vertex_shader->culldistance_output[index];
 diff --git a/src/gallium/auxiliary/draw/draw_gs.c
 b/src/gallium/auxiliary/draw/draw_gs.c
 index b762dd6..cd63e2b 100644
 --- a/src/gallium/auxiliary/draw/draw_gs.c
 +++ b/src/gallium/auxiliary/draw/draw_gs.c
 @@ -792,13 +792,13 @@ draw_create_geometry_shader(struct draw_context
 *draw,
if (gs->info.output_semantic_name[i] ==
TGSI_SEMANTIC_VIEWPORT_INDEX)
   gs->viewport_index_output = i;
if (gs->info.output_semantic_name[i] == TGSI_SEMANTIC_CLIPDIST) {
 - if (gs->info.output_semantic_index[i] == 0)
 -gs->clipdistance_output[0] = i;
 - else
 -gs->clipdistance_output[1] = i;
 + debug_assert(gs->info.output_semantic_index[i] <
 +  PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT);
 + gs->clipdistance_output[gs->info.output_semantic_index[i]] = i;
}
if (gs->info.output_semantic_name[i] == TGSI_SEMANTIC_CULLDIST) {
 - debug_assert(gs->info.output_semantic_index[i] <
 Elements(gs->culldistance_output));
 + debug_assert(gs->info.output_semantic_index[i] <
 +  PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT);
   gs->culldistance_output[gs->info.output_semantic_index[i]] = i;
}
 }
 diff --git a/src/gallium/auxiliary/draw/draw_gs.h
 b/src/gallium/auxiliary/draw/draw_gs.h
 index 05d666d..e279a80 100644
 --- a/src/gallium/auxiliary/draw/draw_gs.h
 +++ b/src/gallium/auxiliary/draw/draw_gs.h
 @@ -67,8 +67,8 @@ struct draw_geometry_shader {
 struct tgsi_shader_info info;
 unsigned position_output;
 unsigned viewport_index_output;
 -   unsigned clipdistance_output[2];
 -   unsigned culldistance_output[2];
 +   unsigned
 clipdistance_output[PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT];
 +   unsigned
 culldistance_output[PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT];
  
 unsigned max_output_vertices;
 unsigned primitive_boundary;
 diff --git a/src/gallium/auxiliary/draw/draw_vs.c
 b/src/gallium/auxiliary/draw/draw_vs.c
 index a0bebcc..bbccbe4 100644
 --- a/src/gallium/auxiliary/draw/draw_vs.c
 +++ b/src/gallium/auxiliary/draw/draw_vs.c
 @@ -86,12 +86,12 @@ draw_create_vertex_shader(struct draw_context *draw,
  found_clipvertex = TRUE;
 vs->clipvertex_output = i;
   } else if (vs->info.output_semantic_name[i] ==
   TGSI_SEMANTIC_CLIPDIST) {
 -if (vs->info.output_semantic_index[i] == 0)
 -   vs->clipdistance_output[0] = i;
 -else
 -   vs->clipdistance_output[1] = i;
 +debug_assert(vs->info.output_semantic_index[i] <
 + PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT);
 +vs->clipdistance_output[vs->info.output_semantic_index[i]] = i;
   } else if (vs->info.output_semantic_name[i] ==
   TGSI_SEMANTIC_CULLDIST) {
 -debug_assert(vs->info.output_semantic_index[i] <
 Elements(vs->culldistance_output));
 +debug_assert(vs->info.output_semantic_index[i] <
 + PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT);
  

Re: [Mesa-dev] [PATCH 03/13] gallium: Introduce 32-bit bytewise format names

2013-06-06 Thread Christoph Bumiller
On 06.06.2013 10:34, Richard Sandiford wrote:
 Michel Dänzer mic...@daenzer.net writes:
 On Tue, 2013-06-04 at 10:47 +0100, Richard Sandiford wrote:
 (2) it uses PIPE_FORMAT_INT_* names with the lsb first rather than the
 mesa-like ones with msb first.  (I'm happy to change the names to
 something else though.)

 The patch isn't in a submittable state yet.  I just thought it was worth
 posting because the lsb-first names do make the change look a bit more
 obvious/less scary :-)
 I can see the appeal of that, but I also see some danger in that naming
 scheme: It'll be easy to miss the difference between the two kinds of
 formats, e.g. when grepping for B8G8R8A8. That's why I'd prefer making
 the difference more explicit in the naming scheme. Sticking to LSB
 first, BGRA might already look a little less scary? :)
 I realise this was probably more a question for Jose, but FWIW:
 I liked the names you originally suggested for their consistency with
 mesa and natural number ordering (as you said).  The PIPE_FORMAT_INT_*

I don't like that _INT_, it could be confused with the SINT/UINT
component type postfix, and it's redundant. The distinction provided by
R8G8B8A8 vs RGBA is already sufficient. Neither do I like REV, I
always have to check what order that actually implies (but then I hardly
ever deal with mesa format names).

Why not just define it as RxGyBzAw meaning left to right = lowest
address to highest address, and RGBAxyzw meaning left to right =
least/most (so that it matches the non-REV variant) to most/least
significant bit-tuple in a word? And you can do RG16[_]BG16 if you have
2 words, or R32_G32_B32_A32 for 4 words, but this ugly specimen is
equivalent to R32G32B32A32, so it won't ever appear to hurt your eyes.
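
The difference between the two naming schemes can be sketched in a few lines of C. This is only an illustration with hypothetical helper names: the first function follows the "lowest address to highest address" reading, the second assumes the lsb-first reading of the packed name (the mail leaves open whether lsb-first or msb-first is chosen).

```c
#include <assert.h>
#include <stdint.h>

/* "R8G8B8A8"-style name: components listed from lowest to highest memory
 * address, independent of CPU endianness. */
static void store_r8g8b8a8(uint8_t dst[4],
                           uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
   dst[0] = r;
   dst[1] = g;
   dst[2] = b;
   dst[3] = a;
}

/* "RGBA8888"-style name, lsb-first reading (an assumption): components
 * listed from least to most significant bits of one 32-bit word. */
static uint32_t pack_rgba8888_lsb_first(uint8_t r, uint8_t g,
                                        uint8_t b, uint8_t a)
{
   return (uint32_t)r |
          ((uint32_t)g << 8) |
          ((uint32_t)b << 16) |
          ((uint32_t)a << 24);
}
```

On a little-endian CPU, writing the packed word to memory produces the same four bytes as the array form; on a big-endian CPU the byte order is reversed, which is why the two kinds of names need to stay distinguishable.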

 version seemed OK too from the lowest always first perspective.
 I'm just afraid that if we use BGRA to mean the reverse of what
 it means in mesa, these patches are going to be cursed by gallium
 developers for years to come.

 BGRA_REV would be consistent with the mesa names while being
 lsb-first, and I'd be happy with that too FWIW.  It's just that
 _REV kind of implies that the other order is somehow the canonical one.
 Having all int formats end in _REV might seem a bit odd.

 Thanks,
 Richard



Re: [Mesa-dev] [PATCH] gallium: add support for layered rendering

2013-05-31 Thread Christoph Bumiller
On 01.06.2013 01:02, Alex Deucher wrote:
 On Fri, May 31, 2013 at 6:54 PM, Roland Scheidegger srol...@vmware.com 
 wrote:
 On 31.05.2013 23:43, srol...@vmware.com wrote:
 From: Roland Scheidegger srol...@vmware.com

 Since pipe_surface already has all the necessary fields no interface
 changes are necessary except adding a new shader semantic value
 (TGSI_SEMANTIC_LAYER), though add a pipe capability bit for it as well.
 (Note that what GL knows as the gl_Layer variable, d3d10 calls
 RENDER_TARGET_ARRAY_INDEX.)
 ---
  src/gallium/docs/source/screen.rst |2 ++
  src/gallium/include/pipe/p_defines.h   |3 ++-
  src/gallium/include/pipe/p_shader_tokens.h |3 ++-
  3 files changed, 6 insertions(+), 2 deletions(-)

 diff --git a/src/gallium/docs/source/screen.rst 
 b/src/gallium/docs/source/screen.rst
 index 683080c..b74b237 100644
 --- a/src/gallium/docs/source/screen.rst
 +++ b/src/gallium/docs/source/screen.rst
 @@ -168,6 +168,8 @@ The integer capabilities:
since they are linked) a driver can support. Returning 0 is equivalent
to returning 1 because every driver has to support at least a single
viewport/scissor combination.
 +* ``PIPE_CAP_LAYERED_RENDERING``: Whether rendering to multiple layers is
 +  supported using layer selection by the TGSI_SEMANTIC_LAYER shader 
 variable.


  .. _pipe_capf:
 diff --git a/src/gallium/include/pipe/p_defines.h 
 b/src/gallium/include/pipe/p_defines.h
 index 8af1a84..c359a9e 100644
 --- a/src/gallium/include/pipe/p_defines.h
 +++ b/src/gallium/include/pipe/p_defines.h
 @@ -508,7 +508,8 @@ enum pipe_cap {
 PIPE_CAP_QUERY_PIPELINE_STATISTICS = 81,
 PIPE_CAP_TEXTURE_BORDER_COLOR_QUIRK = 82,
 PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE = 83,
 -   PIPE_CAP_MAX_VIEWPORTS = 84
 +   PIPE_CAP_MAX_VIEWPORTS = 84,
 +   PIPE_CAP_MULTIPLE_LAYERS = 85
  };
 Actually I don't think this is a good name, PIPE_CAP_LAYERED_RENDERING might
 be better?
 I'm open to just about any suggestion though :-).
 FWIW, I prefer PIPE_CAP_LAYERED_RENDERING as well.  Other colors:

 PIPE_CAP_RENDER_TARGET_INDEX
 PIPE_CAP_RENDER_TARGET_ARRAY_INDEX
 PIPE_CAP_RENDER_TARGET_LAYERS

Or PIPE_CAP_GS_LAYER_SELECTION to make it clear that the driver doesn't
support GL_AMD_vertex_shader_layer ?

 Alex

 Roland


  #define PIPE_QUIRK_TEXTURE_BORDER_COLOR_SWIZZLE_NV50 (1 << 0)
 diff --git a/src/gallium/include/pipe/p_shader_tokens.h 
 b/src/gallium/include/pipe/p_shader_tokens.h
 index b33cf1d..c984d50 100644
 --- a/src/gallium/include/pipe/p_shader_tokens.h
 +++ b/src/gallium/include/pipe/p_shader_tokens.h
 @@ -165,7 +165,8 @@ struct tgsi_declaration_interp
  #define TGSI_SEMANTIC_TEXCOORD   19 /** texture or sprite coordinates */
  #define TGSI_SEMANTIC_PCOORD 20 /** point sprite coordinate */
  #define TGSI_SEMANTIC_VIEWPORT_INDEX 21 /** viewport index */
 -#define TGSI_SEMANTIC_COUNT  22 /** number of semantic values */
 +#define TGSI_SEMANTIC_LAYER  22 /** layer (rendertarget index) */
 +#define TGSI_SEMANTIC_COUNT  23 /** number of semantic values */

  struct tgsi_declaration_semantic
  {



Re: [Mesa-dev] Instancing support in r300g?

2013-05-18 Thread Christoph Bumiller
On 18.05.2013 13:05, Lauri Kasanen wrote:
 Hi,

 The 'net claims that instancing is a SM3 feature[1] (r500), but also
 supported on SM2 ATI cards[2] (r300-r400).

 Yet r300g claims no support for it, and it seems that even Nvidia's

r300_get_param:
case PIPE_CAP_VERTEX_ELEMENT_INSTANCE_DIVISOR: return 1;

That's ARB_instanced_arrays, which is what d3d9 supports
(IDirect3DDevice9::SetStreamSourceFreq).

 Windows drivers don't expose ARB_draw_instanced on gf6 and gf7[3].

 What's the story here? Does the GL extension use something different
 than what DX uses?

 - Lauri

 [1] http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter03.html
 Using the Geometry Instancing API provided by DirectX 9 and fully
 supported in hardware by GeForce 6 Series GPUs

 [2] http://aras-p.info/texts/D3D9GPUHacks.html ,
 http://www.hardwareheaven.com/industry-news/51427-farcy-1-2-withdrawn-patch.html

 [3]
 http://feedback.wildfiregames.com/report/opengl/feature/GL_ARB_draw_instanced


Re: [Mesa-dev] Instancing support in r300g?

2013-05-18 Thread Christoph Bumiller
On 18.05.2013 17:41, Marek Olšák wrote:
 ARB_draw_instanced is a DX10 feature.

 The R300-R500 chipsets do not support instancing at all.
 ARB_instanced_arrays is emulated with a loop in the driver, so that
 instancing is supported in Wine/DX9.

Modern NV cards still require you to loop in the driver ... the only
hardware support for instancing they added is a builtin counter.
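
As a rough sketch of what "looping in the driver" amounts to: one ordinary draw is issued per instance, and for each of them the per-instance attribute element is picked from the instance number and the divisor. The function names are hypothetical and this is not r300g or nouveau code; the instance number plays the role of the builtin counter mentioned above.

```c
#include <assert.h>

/* Element of a per-instance attribute array used by 'instance', for an
 * ARB_instanced_arrays-style divisor. A divisor of 0 (plain per-vertex
 * attribute) is expected to be handled by the caller. */
static unsigned instance_element(unsigned instance, unsigned divisor)
{
   assert(divisor > 0);
   return instance / divisor;
}

/* Plan an emulated instanced draw: one non-instanced draw per instance.
 * Writes the attribute element for each draw into elements_out, which
 * must have room for instance_count entries, and returns the number of
 * draws that would be issued. */
static unsigned plan_instanced_draws(unsigned instance_count,
                                     unsigned divisor,
                                     unsigned *elements_out)
{
   unsigned i;

   for (i = 0; i < instance_count; i++)
      elements_out[i] = instance_element(i, divisor);

   return i;
}
```

With five instances and a divisor of 2, the five draws fetch elements 0, 0, 1, 1 and 2.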

 Marek

 On Sat, May 18, 2013 at 4:59 PM, Lauri Kasanen c...@gmx.com wrote:
 On Sat, 18 May 2013 17:46:32 +0300
 Lauri Kasanen c...@gmx.com wrote:

 On Sat, 18 May 2013 13:50:35 +0200
 Christoph Bumiller e0425...@student.tuwien.ac.at wrote:

 r300_get_param:
 case PIPE_CAP_VERTEX_ELEMENT_INSTANCE_DIVISOR: return 1;

 That's ARB_instanced_arrays, which is what d3d9 supports
 (IDirect3DDevice9::SetStreamSourceFreq).
 Instanced arrays alone, without a way to draw instances is pretty
 useless, would you say?
 My mistake, I didn't see the arrays extension adds those draw calls in
 that situation.

 - Lauri


Re: [Mesa-dev] [PATCH] gallium/tgsi: clarify (possibly change) TGSI_OPCODE_UCMP definition

2013-05-08 Thread Christoph Bumiller
On 08.05.2013 03:48, srol...@vmware.com wrote:
 From: Roland Scheidegger srol...@vmware.com
 
 UCMP while an integer opcode isn't really consistently implemented as
 having all integer arguments. softpipe will assume all arguments are
 ints, whereas gallivm has the arguments defined as untyped which
 means they'll get treated as floats. This means input modifiers will
 not work the same. Fix this by saying only first arg is an integer,
 which seems more useful than making all arguments integers - this would
 be similar to d3d10 movc opcode.
 ---
  src/gallium/docs/source/tgsi.rst |5 +
  1 file changed, 5 insertions(+)
 
 diff --git a/src/gallium/docs/source/tgsi.rst 
 b/src/gallium/docs/source/tgsi.rst
 index 3af1fb7..852f8a0 100644
 --- a/src/gallium/docs/source/tgsi.rst
 +++ b/src/gallium/docs/source/tgsi.rst
 @@ -1291,6 +1291,11 @@ Support for these opcodes indicated by 
 PIPE_SHADER_CAP_INTEGERS (all of them?)
  
  .. opcode:: UCMP - Integer Conditional Move
  
 +.. note::
 +
 +   Only the first source arg is an integer, the 2nd and 3rd ones are
 +   considered floats (for input modifier purposes).
 +

As long as you patch up all the occurrences of
tgsi_opcode_infer_src_type and make it take an argument to identify the
source ...

I'd rather just forbid modifiers on moves, i.e. MOV and UCMP, since at
least MOV returns TGSI_TYPE_UNTYPED and untyped values can't be operated on.
For the ordinary MOV we have NEG and ABS, and for UCMP the backend
optimizer can take care of merging modifiers into the instruction
(nvc0's UCMP (slct u32) doesn't support modifiers).

  .. math::
  
dst.x = src0.x ? src1.x : src2.x
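
Why the operand types matter for modifiers can be shown with plain bit patterns. The sketch below is not TGSI code: the function names are invented, and it assumes IEEE 754 single-precision floats. It shows the select itself, which never interprets src1/src2, and the two possible meanings of a negate modifier.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Per-channel select: src0 is tested as an integer; src1 and src2 are
 * passed through as raw 32-bit patterns without interpretation. */
static uint32_t ucmp(uint32_t src0, uint32_t src1, uint32_t src2)
{
   return src0 ? src1 : src2;
}

/* Negate modifier applied to a float operand: flips the sign bit. */
static uint32_t neg_as_float(uint32_t bits)
{
   return bits ^ 0x80000000u;
}

/* Negate modifier applied to an integer operand: two's complement. */
static uint32_t neg_as_int(uint32_t bits)
{
   return 0u - bits;
}

/* Reinterpret a float as its 32-bit pattern. */
static uint32_t float_bits(float f)
{
   uint32_t u;

   memcpy(&u, &f, sizeof u);
   return u;
}
```

The same register contents give different results depending on which negate is applied, so softpipe (all-integer) and gallivm (untyped, treated as float) can disagree as soon as a modifier is present. The condition is also type-sensitive: the bit pattern of -0.0f is non-zero as an integer, so it selects src1.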
 



Re: [Mesa-dev] [PATCH] nouveau: emit and flush fence in fence_signalled if needed

2013-05-07 Thread Christoph Bumiller
On 07.05.2013 19:25, Bryan Cain wrote:
 The Mesa state tracker expects us to emit the fence even if it doesn't call
 fence_finish.  Notably, this occurs when glClientWaitSync is called with
 timeout 0.
 
 Fixes Portal and Left 4 Dead 2, which were both stalling on startup by
 repeatedly calling glClientWaitSync with timeout 0 while waiting for commands
 to complete.
 ---

I'm not sure I want to do this.
pipe_screen::fence_signalled probably shouldn't flush the command
buffer, r600g doesn't seem to do it either.

They should probably call glFlush() before looping on glClientWaitSync,
or, if they don't have anything better to do in the meantime, simply
specify an infinite timeout if they're going to loop forever anyway.
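
The stall can be reproduced with a toy model: a fence that is still sitting in the CPU-side command buffer can never be seen as signalled by a poll that does not submit anything. The code below is only that model, with invented names; it is not the nouveau fence code, and the "GPU" completes work instantly once it is submitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy command stream; not the nouveau or gallium fence code. */
struct toy_stream {
   bool fence_queued;     /* fence command sits in the CPU-side buffer */
   bool fence_submitted;  /* fence command has reached the "GPU" */
};

/* Queue a fence without submitting it. */
static void toy_insert_fence(struct toy_stream *s)
{
   s->fence_queued = true;
   s->fence_submitted = false;
}

/* Plays the role of glFlush(): submits whatever is queued. */
static void toy_flush(struct toy_stream *s)
{
   if (s->fence_queued) {
      s->fence_queued = false;
      s->fence_submitted = true;
   }
}

/* Plays the role of a zero-timeout poll that does not flush: it can only
 * report completion for work that was already submitted. */
static bool toy_poll_signalled(const struct toy_stream *s)
{
   return s->fence_submitted;
}
```

Polling in a loop without the flush never terminates in this model, while a single flush before the loop lets the first poll succeed. The GL sync API also offers GL_SYNC_FLUSH_COMMANDS_BIT as a flag to glClientWaitSync for this purpose; that flag is not modelled here.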

  src/gallium/drivers/nouveau/nouveau_fence.c |   36 
 ++-
  src/gallium/drivers/nouveau/nouveau_fence.h |1 +
  2 files changed, 25 insertions(+), 12 deletions(-)
 
 diff --git a/src/gallium/drivers/nouveau/nouveau_fence.c 
 b/src/gallium/drivers/nouveau/nouveau_fence.c
 index dea146c..722be01 100644
 --- a/src/gallium/drivers/nouveau/nouveau_fence.c
 +++ b/src/gallium/drivers/nouveau/nouveau_fence.c
 @@ -167,6 +167,25 @@ nouveau_fence_update(struct nouveau_screen *screen, 
 boolean flushed)
 }
  }
  
 +boolean
 +nouveau_fence_ensure_flushed(struct nouveau_fence *fence)
 +{
 +   struct nouveau_screen *screen = fence->screen;
 +
 +   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED) {
 +  nouveau_fence_emit(fence);
 +
 +  if (fence == screen->fence.current)
 + nouveau_fence_new(screen, &screen->fence.current, FALSE);
 +   }
 +   if (fence->state < NOUVEAU_FENCE_STATE_FLUSHED) {
 +  if (nouveau_pushbuf_kick(screen->pushbuf, screen->pushbuf->channel))
 + return FALSE;
 +   }
 +
 +   return TRUE;
 +}
 +
  #define NOUVEAU_FENCE_MAX_SPINS (1 << 31)
  
  boolean
 @@ -174,8 +193,9 @@ nouveau_fence_signalled(struct nouveau_fence *fence)
  {
 struct nouveau_screen *screen = fence->screen;
  
 -   if (fence->state >= NOUVEAU_FENCE_STATE_EMITTED)
 -  nouveau_fence_update(screen, FALSE);
 +   if (!nouveau_fence_ensure_flushed(fence))
 +  return FALSE;
 +   nouveau_fence_update(screen, FALSE);
  
 return fence->state == NOUVEAU_FENCE_STATE_SIGNALLED;
  }
 @@ -189,16 +209,8 @@ nouveau_fence_wait(struct nouveau_fence *fence)
 /* wtf, someone is waiting on a fence in flush_notify handler? */
 assert(fence->state != NOUVEAU_FENCE_STATE_EMITTING);
  
 -   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED) {
 -  nouveau_fence_emit(fence);
 -
 -  if (fence == screen->fence.current)
 - nouveau_fence_new(screen, &screen->fence.current, FALSE);
 -   }
 -   if (fence->state < NOUVEAU_FENCE_STATE_FLUSHED) {
 -  if (nouveau_pushbuf_kick(screen->pushbuf, screen->pushbuf->channel))
 - return FALSE;
 -   }
 +   if (!nouveau_fence_ensure_flushed(fence))
 +  return FALSE;
  
 do {
nouveau_fence_update(screen, FALSE);
 diff --git a/src/gallium/drivers/nouveau/nouveau_fence.h 
 b/src/gallium/drivers/nouveau/nouveau_fence.h
 index 3984a9a..d497c7f 100644
 --- a/src/gallium/drivers/nouveau/nouveau_fence.h
 +++ b/src/gallium/drivers/nouveau/nouveau_fence.h
 @@ -34,6 +34,7 @@ boolean nouveau_fence_new(struct nouveau_screen *, struct 
 nouveau_fence **,
  boolean nouveau_fence_work(struct nouveau_fence *, void (*)(void *), void *);
  void nouveau_fence_update(struct nouveau_screen *, boolean flushed);
  void nouveau_fence_next(struct nouveau_screen *);
 +boolean nouveau_fence_ensure_flushed(struct nouveau_fence *);
  boolean nouveau_fence_wait(struct nouveau_fence *);
  boolean nouveau_fence_signalled(struct nouveau_fence *);
  
 



Re: [Mesa-dev] RFC: tgsi opcodes for 32x32 muls with 64bit results

2013-05-03 Thread Christoph Bumiller
On 03.05.2013 16:32, Jose Fonseca wrote:

 - Original Message -
 On 03.05.2013 06:58, Jose Fonseca wrote:

 - Original Message -
 Currently, there's no way to get the high bits of a 32x32
 signed/unsigned integer multiplication with tgsi. However, all of
 d3d10, OpenGL, and OpenCL support that, so we need it as well.
 There's essentially two ways how it could be done:
 - a 2-destination instruction returning both high and low bits (this
   is how it looks like in d3d10 and glsl)
 - use the existing umul for the low bits and have another instruction
   for the high bits (this is how it looks like in opencl)

 Well there's other possibilities but these looked like they'd match
 both APIs and HW reasonably (well with the exception of things like
 sse2 which would prefer 2x2 32bit inputs and return 2x64bit as one
 reg...).

 Actually it's two new instructions because unlike for the low bits
 it matters for the high bits if the source operands are signed or
 unsigned.
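
The point about signedness is easy to check with 64-bit arithmetic. The helpers below are a sketch with made-up names (they are not the proposed TGSI opcodes) and return raw 32-bit patterns: the low half of the product is the same for signed and unsigned operands, the high half is not.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of the product; identical for signed and unsigned inputs
 * with the same bit patterns. */
static uint32_t mul_lo(uint32_t a, uint32_t b)
{
   return (uint32_t)((uint64_t)a * b);
}

/* High 32 bits of the unsigned 32x32->64 product. */
static uint32_t umul_hi(uint32_t a, uint32_t b)
{
   return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* High 32 bits of the signed 32x32->64 product, returned as a bit
 * pattern to avoid shifting a negative value. */
static uint32_t imul_hi(int32_t a, int32_t b)
{
   uint64_t wide = (uint64_t)((int64_t)a * (int64_t)b);

   return (uint32_t)(wide >> 32);
}
```

Taking the bit pattern 0xFFFFFFFF times 2: the low word is 0xFFFFFFFE either way, but the unsigned high word is 1 while the signed one (-1 * 2 = -2) is 0xFFFFFFFF.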

 Personally I'm favoring two separate instructions for low and high
 bits to not have to deal with multi-destination instructions, but
 if someone makes a strong case for one returning both low and high
 bits I could be convinced otherwise. I think though two
 instructions matches most hw very well (with the exception of
 software renderers and possibly intel graphics but then a good
 backend could certainly recognize this).
 Roland,

 I don't know about GPU HW, but I think that what you propose will
 forever prevent decent SSE code generation with LLVM.

 Using two separate opcodes for hi/low bits relies on common
 sub-expression elimination to merge the two multiplication operations
 back into one.  But I strongly doubt that even LLVM's optimization
 passes will be able to do that.

 Getting the 64-bit results with LLVM will require sign-extending the
 source arguments (http://llvm.org/docs/LangRef.html#mul-instruction )
 or SSE intrinsics. Either way, the expressions for the low and high
 bits will be radically different, so we'll end up with two multiplies
 in the end -- which I think is simply inadmissible -- TGSI should not
 stand in the way of backends generating good code.
 You can't generate good code either way, this is a deficiency of sse
 instruction set.
 As I've outlined in another email, I think the best you can do with
 sse41 is:
 - shuffle both src args (put 2nd/4th elements into 1st/3rd slot)
 - 2xpmuldq/pmuludq for doing the 32x32->64bit mul for both 1st/3rd and
 2nd/4th element
 - shuffle the high bits into place (I think this needs 3 hw shuffle
 instructions)
 - shuffle the low bits into place (can benefit from shuffles for high
 bits, so just one more shuffle)

 Maybe you can do better with more clever shuffles, but in any case the
 low bits will always require one (at least) additional shuffle.

 If you have separate opcodes, everything will be the same, except the
 last step you'll just ignore that shuffle and instead just use the
 pmulld instruction, which will do exactly what you need for the low
 bits. Sure multiplications are more effort for the hw, but hell it even
 has the same throughput on most cpus compared to a shuffle, just latency
 is worse. In any case it would be 8 vs 8 instructions, with just one
 instruction of them very slightly worse. We have much more optimization
 opportunities elsewhere than that (I agree that with sse2, which lacks
 pmulld, it would be worse, but we never particularly cared about that).
 That's the thing -- if we have 32x32->64 opcodes we can fine tune this later. 
 If we stick with separate high bit opcodes then that ability is lost (at 
 least without coming back and changing TGSI again).

 So I strongly think this is a bad idea. TGSI has support for multiple
 destinations, though we never made much use of it. I see nothing
 special about it.

 If you can prove me wrong -- that LLVM can handle merging the
 multiplies -- fine.  But I do think we have bigger fish to fry, so
 I'd prefer we don't put too much time debating this.
 No I doubt llvm can merge it (though in theory nothing would prevent it
 from recognizing the pattern). My guess is it will do scalar extraction,
 and use the imul/mul instructions (which can return 2x32bit numbers even
 on 32bit), then combine the vectors back together (most likely element
 by element). If it actually does it like that, a separate mul for the
 low bits would be in fact a win, because it would save the 4 reinsertion
 of the elements at the cost of just one vector mul (llvm uses pmulld
 just fine). But looking at this that way doesn't really make sense, we
 need instructions which make sense for everybody and aren't specified to
 suit one very peculiar implementation.
 But even if it generates optimal code, fact is that the multiply for
 getting the low bits is essentially noise in the whole instruction
 sequence. And who knows maybe intel will one day add some pmulhd/pmulhud
 instruction (which just makes plain more sense for vector 

Re: [Mesa-dev] [PATCH 2/5] gallium: increase the number of available stream output decls

2013-04-25 Thread Christoph Bumiller
On 25.04.2013 19:22, Roland Scheidegger wrote:
 On 24.04.2013 00:58, Zack Rusin wrote:
 There can be more stream output decls than shader outputs because
 individual components from them can be split and distributed
 among different so buffers.

 Signed-off-by: Zack Rusin za...@vmware.com
 ---
  src/gallium/include/pipe/p_state.h |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/src/gallium/include/pipe/p_state.h 
 b/src/gallium/include/pipe/p_state.h
 index c0b2bcd..5830dff 100644
 --- a/src/gallium/include/pipe/p_state.h
 +++ b/src/gallium/include/pipe/p_state.h
 @@ -64,6 +64,7 @@ extern "C" {
  #define PIPE_MAX_SHADER_RESOURCES 32
  #define PIPE_MAX_TEXTURE_LEVELS   16
  #define PIPE_MAX_SO_BUFFERS4
 +#define PIPE_MAX_SO_OUTPUT_COMPONENT_COUNT 128
  
  
  struct pipe_reference
 @@ -198,7 +199,7 @@ struct pipe_stream_output_info
unsigned num_components:3;  /** 1 to 4 */
unsigned output_buffer:3;   /** 0 to PIPE_MAX_SO_BUFFERS */
unsigned dst_offset:16; /** offset into the buffer in dwords */
 -   } output[PIPE_MAX_SHADER_OUTPUTS];
 +   } output[PIPE_MAX_SO_BUFFERS * PIPE_MAX_SO_OUTPUT_COMPONENT_COUNT];
  };
  
  

 Are you sure this isn't overkill, that is if you have multiple buffers
 this really increases the total number of attributes you can output? I

Actually yes, we can output 4 * [0 to 128] components on >= Fermi.

It's getting a bit large though (2 KiB), so I'd probably switch to not
storing that whole struct for each shader then (in the driver) ...
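
The 2 KiB figure follows directly from the new array bound. A minimal sketch of the arithmetic, assuming each output entry's bitfields pack into a single 32-bit word (the two limits are taken from the quoted patch; the other names are made up):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SO_BUFFERS                 4    /* from the quoted patch */
#define MAX_SO_OUTPUT_COMPONENT_COUNT  128  /* from the quoted patch */

/* Assumption: one output entry occupies exactly one 32-bit word. */
typedef uint32_t so_output_entry;

/* Size in bytes of the output[] array with the new bound. */
static size_t so_output_table_bytes(void)
{
   return sizeof(so_output_entry) *
          MAX_SO_BUFFERS * MAX_SO_OUTPUT_COMPONENT_COUNT;
}
```

That is 4 * 128 = 512 entries of 4 bytes each, i.e. 2048 bytes per stream output info, which is what makes keeping a full copy per shader unattractive.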

 thought this merely allows you to distribute the same number to
 different buffers.
 Also I'm not quite convinced with the 128 number. It looks like d3d10
 has a limit of 64 components, and it seems like OpenGL would be happy
 with that as well.

 Roland
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] gallium: Replaced gl_rasterization_rules with lower_left_origin and half_pixel_center.

2013-04-23 Thread Christoph Bumiller
On 23.04.2013 18:28, Jose Fonseca wrote:
 Ok. I've moved the docs to src/gallium/docs/source/cso/rasterizer.rst , and 
 renamed `lower_left_origin` to `bottom_edge_rule`.
 

Well, that doesn't work for NV, but it's at least less invasive for
radeon since you don't have to change the state tracker (using
lower_left_origin instead of flipping viewport + bottom_edge_rule) to
get things working correctly.

/me breathes, tries not to care, too much stuff on my plate already

 This is what it looks like:
 
   
 http://people.freedesktop.org/~jrfonseca/gl_rasterization_rules/cso/rasterizer.html#other-members
 
 Jose
 
 - Original Message -
 Yeah, I was confused when reading the comment and the diagrams. It
 probably shouldn't mention the screen origin at all and instead should
 say which one of the top and bottom edges is inclusive and which one
 is exclusive when determining pixel ownership.

 Anyway, thank you for fixing this. I would probably never have known
 how to fix the triangle rasterization tests if you didn't bring this
 up.

 Marek

 On Sun, Apr 21, 2013 at 7:54 PM, Jose Fonseca jfons...@vmware.com wrote:
 - Original Message -
 Some suggestions for the name:

 lower_left_edge_rule
 lower_left_rasterization_edge_rule
 gl_edge_rule
 gl_rasterization_edge_rule

 In this case, the name is not as important as the documentation which
 defines the behavior of the state.

 On that note, I thought that James' diagrams were pretty good.  Maybe the
 axis is misleading.


 +   /**
 +* Triangle rasterization always uses a 'top,left' rule for pixel
 ownership,
 +* this just alters what we consider to be the top edge for that test.
 +*
 +* When true, screen coordinates origin is considered to be at
 bottom-left
 +* (e.g., OpenGL drawables):
 +*
 +*  y ^
 +*|
 +*|  +=+ - top edge
 +*|  | |
 +*|  | |
 +*|  | |
 +*|  +-+
 +*|
 +*  0 +-
 +*0x
 +*
 +*  When false, screen coordinates origin is considered to be at
 top-left
 +*  (e.g., OpenGL FBOs, D3D):
 +*
 +*0x
 +*  0 +-
 +*|
 +*|  +=+ - top edge
 +*|  | |
 +*|  | |
 +*|  | |
 +*|  +-+
 +*|
 +*  y V
  *
 -* Triangle rasterization always uses a 'top,left' rule for pixel
 -* ownership, this just alters which point we consider the pixel
 -* center for that test.
 +* See also:
 +* -
 http://www.opengl.org/registry/specs/ARB/fragment_coord_conventions.txt
 +* -
 http://msdn.microsoft.com/en-us/library/windows/desktop/cc627092.aspx
 +* -
 http://msdn.microsoft.com/en-us/library/windows/desktop/bb147314.aspx
  */
 -   unsigned gl_rasterization_rules:1;
 +   unsigned lower_left_origin:1;

 /**
  * When true, rasterization is disabled and no pixels are written.

 Jose




Re: [Mesa-dev] [PATCH 2/2] gallium: Replaced gl_rasterization_rules with lower_left_origin and half_pixel_center.

2013-04-22 Thread Christoph Bumiller
On 21.04.2013 14:35, Jose Fonseca wrote:
 
 
 - Original Message -
 On 21.04.2013 13:18, Jose Fonseca wrote:

 I think that drivers can just report all 4 CAPs as supported and do the
 adjustment in the shader themselves (no need for recompilation, just use
 uniforms, the st already does it like that), provided that the state
 tracker actually uses the rasterizer origin bit instead of changing the
 viewport and applies no transformation to the fragment coordinate
 whatsoever.
 
 I'm not sure how much that simplifies in the end. If the drivers need to 
 resort to uniforms to deal with all combinations, then how will making the 
 gl_Fragcoord/viewport transformation depend on lower_left_origin simplify 
 things? 
 
 Is it really true that for all hardware gl_FragCoord will depend on the 
 lower_left_origin rasterizer state?
 

I don't know about all hardware. R600 doesn't have that origin switch,
but the half-integers switch might have an effect.

My suggestion about letting the driver modify the coordinate was to
avoid having a dependency in the gallium interface between the shader
setting, or worse, yet another cap about whether it exists.

The only (small) issue is, if a driver does handle the origin switch and
compensates for the effect on FragCoord, and the state tracker decides
to not use that switch and just flips the viewport, it has to do its own
transformation on FragCoord, we get to do 2 transformations.

 Finally, I think this is precisely what Marek was concerned about; so to allow 
 existing drivers to opt out from having to deal with this, we'll need a cap.

Which is, I guess, why we have to add both versions depending on a CAP
once again, i.e. for some drivers the origin switch in the rasterizer is
used (nouveau at least; this should affect the edge rule; I think I
looked for an independent switch way back and didn't find one) and for
other drivers the viewport is flipped in combination with changing a
separate edge rule rasterizer state.
Maybe some drivers even support both (independent change of edge rule
and origin) ...

 
 
 That said, I don't oppose any of this if it makes HW driver implementers' 
 lives easier.
 
 But how seriously/quickly are you and other hardware drivers maintainers 
 actually aiming at implementing this? I don't wanna go through all that 
 trouble if nobody will care.
 

Well, there's not much code (in terms of lines) to write on the driver
side, but code that uglifies things always takes a bit longer to become
comfortable with ...

 
 Either way, I think that this patch series already is a good improvement over 
 the ugly one-bit-fit-all-needs gl_rasterization_rules state, and should 
 cause no regressions whatsoever.  I'd like to tackle the entanglement of 
 lower_left_origin with other bits of state in a follow-on gallium change 
 after there is a clearer understanding/consensus if/how will HW implement 
 this.
 
 Jose
 



Re: [Mesa-dev] [PATCH 2/2] gallium: Replaced gl_rasterization_rules with lower_left_origin and half_pixel_center.

2013-04-21 Thread Christoph Bumiller
On 21.04.2013 09:36, Jose Fonseca wrote:
 - Original Message -
 Do we really need the lower_left_origin state? I think I can't
 implement it for radeon and it's the kind of stuff that should be
 taken care of by the state tracker anyway. 
 My understanding is that hardware had switches for this sort of thing. It's 
 really hard to provide fully-conforming rasterization for opengl, dx9 & dx10 
 without it.

 If your hardware allows to put a negative pitch on rendertargets, then that 
 should also do it.

I have a switch for the upside down thing, but maybe it could be
framebuffer state instead of rasterizer state (since it's going to
either not change (D3D) or only change with the framebuffer, and I have
to set WINDOW_OFFSET_Y to 0 / fb height depending on the setting of Y
direction (the latter won't work with MRTs, but that's the non-FBO case
anyway)) ?

R600 seems to have PA_SU_VTX_CNTL.PIX_CENTER but no state to change the
window origin / direction ... and I'd rather not have to bother with it
myself either.

Also, note that this state and the pixel center one might (or maybe I
should say will) affect the values of hardware's gl_FragCoord and hence
PIPE_CAP_TGSI_FS_COORD_ORIGIN/PIXEL_CENTER*, i.e. the shader
transformation of that input must be adjusted according to this state.
I'd probably be OK with making this the driver's task.

 If you know what is the hardware's sub-pixel rasterization resolution, then 
 adding a vertical bias equal to that amount, depending on this state, would 
 give a very close approximation. (This would get the top/bottom edges right, 
 at expense of small inaccuracies on non-horizontal edges)

 Isn't it sufficient to just
 set a viewport which is upside down, like we do now?
 I'm not aware of rasterization top-left rule being affected by the viewport 
 flipping.

 Do both

  ./bin/triangle-rasterization -auto
  ./bin/triangle-rasterization -use_fbo -auto

 currently work for you?


 If drivers don't provide this state, the only way to workaround it I know 
 would be to store textures (or drawables?) up-side down, and flip them on 
 gl(Get)TexImage & friends.  This would be like using a cannon to shoot a fly 
 (a lot of work and a lot of overheads for a small correctness detail).  I 
 think the drivers are better equipped to handle this.

 And you always have the option of merely ignoring this state.  Top-left rule 
 correct rasterization has, after all, been ignored till date, and nobody 
 cared.


 For the record, my motivation here is simple: llvmpipe gets the right 
 behavior on GL drawables, and fails on GL FBOs & D3D 9/10. I want to get the 
 right behavior on D3D 9/10 without causing regressions on GL drawables.

 BTW, I'd imagine that if hardware rasterizer behavior is hardcoded to 
 anything, it would be to D3D 9/10 behavior. That is, they would get GL FBO 
 right, but drawables wrong.


 Jose



Re: [Mesa-dev] [PATCH 2/2] gallium: Replaced gl_rasterization_rules with lower_left_origin and half_pixel_center.

2013-04-21 Thread Christoph Bumiller
On 21.04.2013 12:34, Dave Airlie wrote:
 On Sun, Apr 21, 2013 at 5:36 PM, Jose Fonseca jfons...@vmware.com wrote:
 - Original Message -
 Do we really need the lower_left_origin state? I think I can't
 implement it for radeon and it's the kind of stuff that should be
 taken care of by the state tracker anyway.
 My understanding is that hardware had switches for this sort of thing. It's 
 really hard to provide fully-conforming rasterization for opengl, dx9 & dx10 
 without it.

 If your hardware allows to put a negative pitch on rendertargets, then that 
 should also do it.

 If you know what is the hardware's sub-pixel rasterization resolution, then 
 adding a vertical bias equal to that amount, depending on this state, would 
 give a very close approximation. (This would get the top/bottom edges right, 
 at expense of small inaccuracies on non-horizontal edges)

 Isn't it sufficient to just
 set a viewport which is upside down, like we do now?
 I'm not aware of rasterization top-left rule being affected by the viewport 
 flipping.

 Do both

  ./bin/triangle-rasterization -auto
  ./bin/triangle-rasterization -use_fbo -auto

 currently work for you?

 just FYI, on my evergreen, the first fails the second passes, maybe
 someone could try on fglrx, I'd be sorta willing to guess AMD hw just
 does DX10 :)

 and I think I've heard some complaints about our rendering offsetting
 being wrong somewhere in the past on r600.

Same on nouveau. On NV blob it's the other way around, it fails for
-use_fbo. So clearly, both can work.

 Dave.



Re: [Mesa-dev] [PATCH 2/2] gallium: Replaced gl_rasterization_rules with lower_left_origin and half_pixel_center.

2013-04-21 Thread Christoph Bumiller
On 21.04.2013 13:18, Jose Fonseca wrote:

 - Original Message -
 On 21.04.2013 09:36, Jose Fonseca wrote:
 - Original Message -
 Do we really need the lower_left_origin state? I think I can't
 implement it for radeon and it's the kind of stuff that should be
 taken care of by the state tracker anyway.
 My understanding is that hardware had switches for this sort of thing. It's
 really hard to provide fully-conforming rasterization for opengl, dx9 &
 dx10 without it.

 If your hardware allows to put a negative pitch on rendertargets, then that
 should also do it.
 I have a switch for the upside down thing, but maybe it could be
 framebuffer state instead of rasterizer state (since it's going to
 either not change (D3D) 
 You're right, they should never change at higher frequency than 
 per-framebuffer.

 But due to auxiliary modules like u_blit, u_blitter, u_gen_mipmap, this state 
 will eventually change even for D3D state trackers.  (This is however 
 fixable, if there are performance implications switching this state, we could 
 enhance these helper modules so that they switch it less often. But I doubt this 
 is a problem in practice)

 or only change with the framebuffer, and I have
 to set WINDOW_OFFSET_Y to 0 / fb height depending on the setting of Y
 direction (the latter won't work with MRTs, but that's the non-FBO case
 anyway)) ?
 Yes, it could go in theory, and truth is rasterizer state is full of bits 
 that apply to other stages of the pipeline, but the practical hurdle of 
 moving this to pipe_framebuffer is that pipe_framebuffer has no discrete 
 state beyond surfaces so far (it is little more than a tuple of surfaces), so 
 a lot of code would need to be updated to fill, propagate, and consider such 
 state in pipe_framebuffer...

 I presume your concern is that rasterizer state changes frequently where as 
 framebuffer state changes infrequently, so adding a dependency would cause 
 framebuffer to be processed more often than desired.  You can avoid that by 
 keeping track of the lower_left_origin state independently at 
 nvc0_rasterizer_state_bind:

 diff --git a/src/gallium/drivers/nvc0/nvc0_state.c 
 b/src/gallium/drivers/nvc0/nvc0_state.c
 index cba076f..2a6fabf 100644
 --- a/src/gallium/drivers/nvc0/nvc0_state.c
 +++ b/src/gallium/drivers/nvc0/nvc0_state.c
 @@ -324,6 +324,12 @@ nvc0_rasterizer_state_bind(struct pipe_context *pipe, 
 void *hwcso)
  
 nvc0->rast = hwcso;
 nvc0->dirty |= NVC0_NEW_RASTERIZER;
 +
 +   if (nvc0->rast &&
 +   nvc0->lower_left_origin != nvc0->rast->pipe.lower_left_origin) {
 +  nvc0->lower_left_origin = nvc0->rast->pipe.lower_left_origin;
 +  nvc0->dirty |= NVC0_NEW_FRAMEBUFFER;
 +   }
  }

  static void

 This means you won't need to validate framebuffer any more often than strictly 
 necessary. You could also have a new NVC0_NEW_FRAMEBUFFER_ORIGIN flag, just 
 for tidiness.

 R600 seems to have PA_SU_VTX_CNTL.PIX_CENTER but no state to change the
 window origin / direction ... and I'd rather not have to bother with it
 myself either.
 I need to get this working flawlessly on llvmpipe, but I really don't see much 
 need for hw driver developers to rush and get this handled properly.  There 
 is probably much bigger fish to fry.

 If people care enough to devise a state tracker workaround, we could have 
 this on a PIPE_CAP.  I'd be all for it.  But even in that case, I think that 
 nudging the coordinates slightly would probably get the most bang for buck.

 Also, note that this state and the pixel center one might (or maybe I
 should say will) affect the values of hardware's gl_FragCoord and hence
 PIPE_CAP_TGSI_FS_COORD_ORIGIN/PIXEL_CENTER*, i.e. the shader
 transformation of that input must be adjusted according to this state.
 I'd probably be OK with making this the driver's task.
 The FS_COORD_PIXEL_CENTER spec in src/gallium/docs/source/tgsi.rst already 
 stated that these are independent: 

   Note that this does not affect the set of fragments generated by
   rasterization, which is instead controlled by gl_rasterization_rules in the
   rasterizer.

 And I'm not changing the semantics.  That also seems the spirit of 
 GL_ARB_fragment_coord_conventions spec. 

 I wouldn't object to adding to Gallium a dependency between these states if it 
 helps hw driver developers, but I don't see how we could define it in such 
 way that it would work well for all cases. And I suspect that different 
 hardware probably handles this slightly differently (ie, what is orthogonal 
 to some is not to others).

I think that drivers can just report all 4 CAPs as supported and do the
adjustment in the shader themselves (no need for recompilation, just use
uniforms, the st already does it like that), provided that the state
tracker actually uses the rasterizer origin bit instead of changing the
viewport and applies no transformation to the fragment coordinate
whatsoever.
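
A rough sketch of what such a uniform-driven adjustment could look like,
written as plain C rather than shader code. All names here are hypothetical,
and the assumption is that the driver knows both the convention its hardware
produces and the one the shader asked for:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants a driver could upload as uniforms. */
struct fragcoord_adjust {
   float x_bias;   /* shift between integer and half-integer centers */
   float y_scale;  /* +1 keeps the Y direction, -1 flips it */
   float y_bias;   /* center shift, plus the fb height when flipping */
};

static struct fragcoord_adjust
fragcoord_adjust_for(bool hw_lower_left, bool want_lower_left,
                     bool hw_half_integer, bool want_half_integer,
                     float fb_height)
{
   /* Offsets that move each convention to/from half-integer centers. */
   const float a = hw_half_integer ? 0.0f : 0.5f;
   const float b = want_half_integer ? 0.0f : 0.5f;
   struct fragcoord_adjust adj;

   assert(fb_height > 0.0f);
   adj.x_bias = a - b;
   if (hw_lower_left != want_lower_left) {
      /* Flip around the framebuffer height in half-integer space. */
      adj.y_scale = -1.0f;
      adj.y_bias = fb_height - a - b;
   } else {
      adj.y_scale = 1.0f;
      adj.y_bias = a - b;
   }
   return adj;
}

/* What the shader prologue would do: one ADD and one MAD. */
static void
fragcoord_apply(const struct fragcoord_adjust *adj, float xy[2])
{
   xy[0] = xy[0] + adj->x_bias;
   xy[1] = xy[1] * adj->y_scale + adj->y_bias;
}
```

Since only the constants change when the rasterizer state changes, nothing
here requires a shader recompile.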

 Jose


Re: [Mesa-dev] [PATCH 2/5] gallium: document breakc and switch/case/default/endswitch

2013-04-19 Thread Christoph Bumiller
On 19.04.2013 09:26, Jose Fonseca wrote:

 - Original Message -
 From: Roland Scheidegger srol...@vmware.com

 docs were missing, especially the opcode-from-hell switch however is anything
 but obvious.
 ---
  src/gallium/docs/source/tgsi.rst |   57
  ++
  1 file changed, 51 insertions(+), 6 deletions(-)

 diff --git a/src/gallium/docs/source/tgsi.rst
 b/src/gallium/docs/source/tgsi.rst
 index b7180f8..b46347e 100644
 --- a/src/gallium/docs/source/tgsi.rst
 +++ b/src/gallium/docs/source/tgsi.rst
 @@ -861,7 +861,18 @@ This instruction replicates its result.
  
  .. opcode:: BRK - Break
  
 -  TBD
 +  Unconditionally moves the point of execution to the instruction after the
 +  next endloop or endswitch. The instruction must appear within a
 loop/endloop
 +  or switch/endswitch.
 +
 +
 +.. opcode:: BREAKC - Break Conditional
 +
 +  Conditionally moves the point of execution to the instruction after the
 +  next endloop or endswitch. The instruction must appear within a
 loop/endloop
 +  or switch/endswitch.
 +  Condition evaluates to true if src0.x != 0 where src0.x is interpreted
 +  as an integer register.
 This is fine. But I do wonder if hardware can really benefit from UIF foo; 
 BREAK; ENDIF vs BREAKC foo, or if this is just syntactic sugar that merely 
 burdens developers. 

IF; BREAK; ENDIF usually gets optimized into a BREAKC anyway, so, it's
just easier on the compiler and people who write shaders in TGSI, and
drivers without optimization.
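
To illustrate the kind of folding meant here, a toy peephole pass over a
made-up instruction array (hypothetical types and names; this is neither
real TGSI tokens nor the nv50 IR):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature instruction stream. */
enum toy_opcode { TOY_UIF, TOY_BRK, TOY_ENDIF, TOY_BREAKC, TOY_OTHER };

struct toy_insn {
   enum toy_opcode op;
   int src;  /* register holding the condition, if any */
};

/* Rewrites every "UIF c; BRK; ENDIF" into "BREAKC c" in place and
 * returns the new instruction count. */
static size_t
toy_fold_breakc(struct toy_insn *insns, size_t count)
{
   size_t in = 0, out = 0;

   assert(insns != NULL || count == 0);
   while (in < count) {
      if (in + 2 < count &&
          insns[in].op == TOY_UIF &&
          insns[in + 1].op == TOY_BRK &&
          insns[in + 2].op == TOY_ENDIF) {
         const int cond = insns[in].src;
         insns[out].op = TOY_BREAKC;
         insns[out].src = cond;
         out++;
         in += 3;
      } else {
         insns[out++] = insns[in++];
      }
   }
   return out;
}
```

A driver without such a pass has to emit the full branch, which is where a
native BREAKC in the input helps.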




Re: [Mesa-dev] [PATCH 02/14] st/mesa: add a simple path to BufferData if it only discards buffer contents

2013-04-19 Thread Christoph Bumiller
On 19.04.2013 14:08, Marek Olšák wrote:
 That's not true. PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE does not copy
 data in radeon drivers. It really does what st/mesa does - it creates
 a new buffer and throws away the old one, which doesn't take any GPU
 bandwidth. Doing that at a lower level should be faster in theory (+

Moreover, for VRAM buffers it also saves me the reallocation because
they always use staging transfers.

The only downside is if you do have to reallocate you have to check all
binding points and set them dirty if they use the resource in question.

I just hope that for the other multithreaded contexts (which you can't
really interrupt to set dirty bits) we can assume that the user takes
care of any issues ...
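
The reallocation case can be sketched roughly as follows. The types are toys
invented for illustration, not the actual nouveau structures, and a real
driver would defer freeing the old storage until the GPU is done with it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_MAX_BINDINGS 8

struct toy_buffer {
   void *storage;
   size_t size;
};

struct toy_context {
   struct toy_buffer *bound[TOY_MAX_BINDINGS];
   unsigned dirty;  /* bit i set: binding point i must be re-emitted */
};

/* Swaps in fresh storage and flags every binding point that uses buf.
 * Returns false, leaving the buffer untouched, if allocation fails. */
static bool
toy_discard_whole_resource(struct toy_context *ctx, struct toy_buffer *buf)
{
   void *fresh;
   unsigned i;

   assert(ctx != NULL && buf != NULL);
   fresh = malloc(buf->size);
   if (!fresh)
      return false;

   free(buf->storage);  /* toy only: no GPU synchronization here */
   buf->storage = fresh;

   for (i = 0; i < TOY_MAX_BINDINGS; i++) {
      if (ctx->bound[i] == buf)
         ctx->dirty |= 1u << i;
   }
   return true;
}
```

The loop over binding points is the downside mentioned above; it is cheap,
but it has to run in the context that owns those bindings.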

 drivers have multiple options how to implement the discarding).

 Only PIPE_TRANSFER_DISCARD_RANGE copies data in radeon drivers, which
 is not used here.

 Marek

 On Wed, Apr 17, 2013 at 8:15 PM, Eric Anholt e...@anholt.net wrote:
 Marek Olšák mar...@gmail.com writes:

 The next patch makes sure _NEW_BUFFER_OBJECT is not needlessly set
 for this code.
 This seems like a pretty dubious optimization -- on UMA systems you're
 increasing the memory bandwidth usage in the  data case, and only
 trying to eliminate update_array_object_max_element, which also happens
 with _NEW_PROGRAM (I bet it's true every time that a _NEW_BUFFER_OBJECT
 was flagged, anyway).

 In short, for the Mesa core change, I'd like to see some actual
 performance justification on this one.




[Mesa-dev] [PATCH] st/mesa: optionally apply texture swizzle to border color v2

2013-04-15 Thread Christoph Bumiller
From: Christoph Bumiller christoph.bumil...@speed.at

This is the only sane solution for nv50 and nvc0 (really, trust me),
but since on other hardware the border colour is tightly coupled with
texture state they'd have to undo the swizzle, so I've added a cap.

The dependency of update_sampler on the texture updates was
introduced to avoid doing the apply_depthmode to the swizzle twice.

v2: Moved swizzling helper to u_format.c, extended the CAP to
provide more accurate information.
---
 src/gallium/auxiliary/util/u_format.c|   34 ++
 src/gallium/auxiliary/util/u_format.h|   12 
 src/gallium/docs/source/cso/sampler.rst  |6 ++-
 src/gallium/docs/source/screen.rst   |   11 +++
 src/gallium/drivers/freedreno/freedreno_screen.c |1 +
 src/gallium/drivers/i915/i915_screen.c   |1 +
 src/gallium/drivers/llvmpipe/lp_screen.c |2 +
 src/gallium/drivers/nv30/nv30_screen.c   |1 +
 src/gallium/drivers/nv50/nv50_screen.c   |2 +
 src/gallium/drivers/nvc0/nvc0_screen.c   |2 +
 src/gallium/drivers/r300/r300_screen.c   |1 +
 src/gallium/drivers/r600/r600_pipe.c |3 ++
 src/gallium/drivers/radeonsi/radeonsi_pipe.c |1 +
 src/gallium/drivers/softpipe/sp_screen.c |2 +
 src/gallium/drivers/svga/svga_screen.c   |2 +
 src/gallium/include/pipe/p_defines.h |7 -
 src/mesa/state_tracker/st_atom.c |2 +-
 src/mesa/state_tracker/st_atom_sampler.c |   27 +++--
 src/mesa/state_tracker/st_context.c  |3 ++
 src/mesa/state_tracker/st_context.h  |1 +
 20 files changed, 114 insertions(+), 7 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_format.c 
b/src/gallium/auxiliary/util/u_format.c
index 1845637..9bdc2ea 100644
--- a/src/gallium/auxiliary/util/u_format.c
+++ b/src/gallium/auxiliary/util/u_format.c
@@ -632,6 +632,40 @@ void util_format_compose_swizzles(const unsigned char 
swz1[4],
}
 }
 
+void util_format_apply_color_swizzle(union pipe_color_union *dst,
+ const union pipe_color_union *src,
+ const unsigned char swz[4],
+ const boolean is_integer)
+{
+   unsigned c;
+
+   if (is_integer) {
+  for (c = 0; c < 4; ++c) {
+ switch (swz[c]) {
+ case PIPE_SWIZZLE_RED:   dst->ui[c] = src->ui[0]; break;
+ case PIPE_SWIZZLE_GREEN: dst->ui[c] = src->ui[1]; break;
+ case PIPE_SWIZZLE_BLUE:  dst->ui[c] = src->ui[2]; break;
+ case PIPE_SWIZZLE_ALPHA: dst->ui[c] = src->ui[3]; break;
+ default:
+dst->ui[c] = (swz[c] == PIPE_SWIZZLE_ONE) ? 1 : 0;
+break;
+ }
+  }
+   } else {
+  for (c = 0; c < 4; ++c) {
+ switch (swz[c]) {
+ case PIPE_SWIZZLE_RED:   dst->f[c] = src->f[0]; break;
+ case PIPE_SWIZZLE_GREEN: dst->f[c] = src->f[1]; break;
+ case PIPE_SWIZZLE_BLUE:  dst->f[c] = src->f[2]; break;
+ case PIPE_SWIZZLE_ALPHA: dst->f[c] = src->f[3]; break;
+ default:
+dst->f[c] = (swz[c] == PIPE_SWIZZLE_ONE) ? 1.0f : 0.0f;
+break;
+ }
+  }
+   }
+}
+
 void util_format_swizzle_4f(float *dst, const float *src,
 const unsigned char swz[4])
 {
diff --git a/src/gallium/auxiliary/util/u_format.h 
b/src/gallium/auxiliary/util/u_format.h
index ed942fb..e4b9c36 100644
--- a/src/gallium/auxiliary/util/u_format.h
+++ b/src/gallium/auxiliary/util/u_format.h
@@ -33,6 +33,9 @@
 #include "pipe/p_format.h"
 #include "util/u_debug.h"
 
+union pipe_color_union;
+
+
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -1117,6 +1120,15 @@ void util_format_compose_swizzles(const unsigned char 
swz1[4],
   const unsigned char swz2[4],
   unsigned char dst[4]);
 
+/* Apply the swizzle provided in \param swz (which is one of PIPE_SWIZZLE_x)
+ * to \param src and store the result in \param dst.
+ * \param is_integer determines the value written for PIPE_SWIZZLE_ONE.
+ */
+void util_format_apply_color_swizzle(union pipe_color_union *dst,
+ const union pipe_color_union *src,
+ const unsigned char swz[4],
+ const boolean is_integer);
+
 void util_format_swizzle_4f(float *dst, const float *src,
 const unsigned char swz[4]);
 
diff --git a/src/gallium/docs/source/cso/sampler.rst 
b/src/gallium/docs/source/cso/sampler.rst
index 26ffc18..9959793 100644
--- a/src/gallium/docs/source/cso/sampler.rst
+++ b/src/gallium/docs/source/cso/sampler.rst
@@ -101,7 +101,9 @@ max_lod
 border_color
 Color union used for texel coordinates that are outside the [0,width-1],
 [0, height-1] or [0, depth-1] ranges. Interpreted according

[Mesa-dev] [PATCH] nv50/ir: handle TGSI_OPCODE_IF(float) properly

2013-04-14 Thread Christoph Bumiller
You can merge this with the original UIF patch if you want.
---
 .../drivers/nv50/codegen/nv50_ir_from_tgsi.cpp |7 ++-
 .../drivers/nv50/codegen/nv50_ir_lowering_nv50.cpp |2 +-
 .../drivers/nvc0/codegen/nv50_ir_lowering_nvc0.cpp |2 +-
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
index 054c75e..d8abccd 100644
--- a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
@@ -386,6 +386,7 @@ static nv50_ir::TexTarget translateTexture(uint tex)
 nv50_ir::DataType Instruction::inferSrcType() const
 {
switch (getOpcode()) {
+   case TGSI_OPCODE_UIF:
case TGSI_OPCODE_AND:
case TGSI_OPCODE_OR:
case TGSI_OPCODE_XOR:
@@ -2431,10 +2432,6 @@ Converter::handleInstruction(const struct 
tgsi_full_instruction *insn)
   mkOp1(op, TYPE_U32, NULL, src0)->fixed = 1;
   break;
case TGSI_OPCODE_IF:
-  /* XXX: fall-through into UIF, but this might lead to
-   * incorrect behavior on state trackers and auxiliary
-   * modules that emit float bool IFs regardless of
-   * native integer support */
case TGSI_OPCODE_UIF:
{
   BasicBlock *ifBB = new BasicBlock(func);
@@ -2443,7 +2440,7 @@ Converter::handleInstruction(const struct 
tgsi_full_instruction *insn)
   condBBs.push(bb);
   joinBBs.push(bb);
 
-  mkFlow(OP_BRA, NULL, CC_NOT_P, fetchSrc(0, 0));
+  mkFlow(OP_BRA, NULL, CC_NOT_P, fetchSrc(0, 0))->setType(srcTy);
 
   setPosition(ifBB, true);
}
diff --git a/src/gallium/drivers/nv50/codegen/nv50_ir_lowering_nv50.cpp 
b/src/gallium/drivers/nv50/codegen/nv50_ir_lowering_nv50.cpp
index 20f76f8..03086e3 100644
--- a/src/gallium/drivers/nv50/codegen/nv50_ir_lowering_nv50.cpp
+++ b/src/gallium/drivers/nv50/codegen/nv50_ir_lowering_nv50.cpp
@@ -1011,7 +1011,7 @@ NV50LoweringPreSSA::checkPredicate(Instruction *insn)
   return;
cdst = bld.getSSA(1, FILE_FLAGS);
 
-   bld.mkCmp(OP_SET, CC_NEU, TYPE_U32, cdst, bld.loadImm(NULL, 0), pred);
+   bld.mkCmp(OP_SET, CC_NEU, insn->dType, cdst, bld.loadImm(NULL, 0), pred);
 
    insn->setPredicate(insn->cc, cdst);
 }
diff --git a/src/gallium/drivers/nvc0/codegen/nv50_ir_lowering_nvc0.cpp 
b/src/gallium/drivers/nvc0/codegen/nv50_ir_lowering_nvc0.cpp
index 4d1d372..7676185 100644
--- a/src/gallium/drivers/nvc0/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nvc0/codegen/nv50_ir_lowering_nvc0.cpp
@@ -1490,7 +1490,7 @@ NVC0LoweringPass::checkPredicate(Instruction *insn)
    // CAUTION: don't use pdst->getInsn, the definition might not be unique,
//  delay turning PSET(FSET(x,y),0) into PSET(x,y) to a later pass
 
-   bld.mkCmp(OP_SET, CC_NEU, TYPE_U32, pdst, bld.mkImm(0), pred);
+   bld.mkCmp(OP_SET, CC_NEU, insn->dType, pdst, bld.mkImm(0), pred);
 
    insn->setPredicate(insn->cc, pdst);
 }
-- 
1.7.3.4



Re: [Mesa-dev] [PATCH] st/mesa: optionally apply texture swizzle to border color

2013-04-14 Thread Christoph Bumiller
On 14.04.2013 13:44, Jose Fonseca wrote:
 - Original Message -
 From: Christoph Bumiller christoph.bumil...@speed.at

 This is the only sane solution for nv50 and nvc0 (really, trust me),
 but since on other hardware the border colour is tightly coupled with
 texture state they'd have to undo the swizzle, so I've added a cap.

 The name of the cap could be changed to be more descriptive, like
 PIPE_CAP_TEXTURE_SWIZZLE_AFFECTS_BORDER_COLOR.
 Yes, please.
  
 The dependency of update_sampler on the texture updates was
 introduced to avoid doing the apply_depthmode to the swizzle twice.

 More detailed explanation of driver situation:

 No, really, don't suggest doing this in the driver. The driver has
 elegantly separated texture view and sampler states (which are each
 a structure in a table in VRAM and should not be updated to avoid
 performance loss), and the tables are bound to the independent (!) 
 I wonder if this is modeled after D3D10, where sampler state is independent 
 from resource view state. Though as far as I know, D3D10's interpretation of 
 texture border color does not depend on the swizzle...

 texture
 and sampler slots in shaders which must be separately indexable
 indirectly).
 So, if I was to do this in the driver, I'd have to add separate sampler
 state object instances for each texture view with appropriately swizzled
 border color, and there's only 16 slots, so I'd be limited to 4 texture
 units.
 Not to mention the sheer insanity, ugliness and emotional pain incurred
 when writing that code when it COULD be so easy and simple in the state
 tracker where you know that textures and samplers are tightly coupled,
 while in gallium I cannot assume that to be the case.
 You wouldn't really need to create all state combinations: if you know that 
 textures and samplers are tightly coupled, then caching the actually used 
 combinations will get you exactly the same behavior, without losing 
 performance or generality.  But granted, this would require more effort.

The emphasis being on IF I knew (that they're tightly coupled). If I
did, I could switch to linked mode where the card automatically uses the
view index as sampler index, ignoring the actual sampler index, and
validate them together.
However, that only applies to 3D, not to COMPUTE (which means that GL
compute shaders will still have the problem), and I'd have to support
both variants for state trackers that do not allow the coupling, and we
need a way for the state tracker to actually tell us what it wants. All
that makes it even quirkier.

 Also please spare a thought for other state trackers -- and I'm not even 
 talking about a potential D3D10 state tracker for which your driver would be 
 unusable --, even inside Mesa: it seems like src/gallium/state_trackers/vega 
 uses both texture border and swizzle, probably vl state tracker too, so your 
 driver will be busted on those state trackers. These need to be

It already is busted. It's also busted on r600 where making border color
+ swizzle work properly isn't even POSSIBLE (according to the radeon guys).

Maybe not for vega, it doesn't use a permutational swizzle, it just sets
components to PIPE_SWIZZLE_ONE, and incidentally the ZERO/ONE swizzles
do affect the border color. As far as I can tell, it looks something
like this (if you're interested; the exact behaviour seems not supposed
to be made use of):

===
In the format description (including swizzle), each color component of
RGBA (as seen by the shader) gets mapped a memory component
{C0,C1,C2,C3} or {ZERO,ONE_INT,ONE_FLOAT}.

When a memory (!) component (Cx) is first encountered when going through
RGBA, it is assigned the SAMPLER_BORDER_COLOR component value for that
component, and if the memory component is encountered again (because of
swizzle), that same value will be used.

So, assuming memory format RGBA and the swizzle 1RBG:
R = ONE
G = C0
B = C2
A = C1
the border colour will be SAMPLER_BORDER_COLOR.1GBA.

The resulting border colour with swizzle applied to the sampler would be
(lowercase being user values):
R=1
G=r
B=b
A=g

resulting in 1rbg, which works out.
===
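
The same walk-through as a small C model, in case it helps. This is only my
reading of the behaviour described above, reduced to the float case; the
enum and function names are made up for illustration:

```c
#include <assert.h>

/* What each shader-visible channel (R, G, B, A) is mapped to. */
enum toy_map { MAP_C0, MAP_C1, MAP_C2, MAP_C3, MAP_ZERO, MAP_ONE };

/* Model of the hardware: walking RGBA, a memory component latches the
 * SAMPLER_BORDER_COLOR value of the channel where it is first seen. */
static void
toy_hw_border(const enum toy_map map[4], const float sampler_border[4],
              float out[4])
{
   float latched[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
   int seen[4] = { 0, 0, 0, 0 };
   int c;

   for (c = 0; c < 4; c++) {
      switch (map[c]) {
      case MAP_ZERO: out[c] = 0.0f; break;
      case MAP_ONE:  out[c] = 1.0f; break;
      default:
         if (!seen[map[c]]) {
            latched[map[c]] = sampler_border[c];
            seen[map[c]] = 1;
         }
         out[c] = latched[map[c]];
         break;
      }
   }
}

/* What the state tracker would do: swizzle the user's border colour
 * (memory format RGBA assumed, so C0..C3 index r, g, b, a). */
static void
toy_swizzle_border(const enum toy_map map[4], const float user[4],
                   float out[4])
{
   int c;

   for (c = 0; c < 4; c++) {
      assert(map[c] >= MAP_C0 && map[c] <= MAP_ONE);
      if (map[c] == MAP_ZERO)
         out[c] = 0.0f;
      else if (map[c] == MAP_ONE)
         out[c] = 1.0f;
      else
         out[c] = user[map[c]];
   }
}
```

With map = { ONE, C0, C2, C1 } (the 1RBG case), feeding the output of
toy_swizzle_border into toy_hw_border returns it unchanged, i.e. 1rbg, while
feeding the unswizzled user colour gives 1gba.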

  updated -- maybe the burden of considering this state can be lifted onto 
 some helper functions -- if not, these state trackers should at least be 
 updated to abort/warn when the cap is set. 

 But I'm not really objecting -- as texture border seems fundamentally quirky 
 state.  But before proceeding with this I'd like us to consider another 
 texture border quirk while we are at it.

 The other quirk is the integer vs float texture border colors.  Roland can 
 probably talk a bit more about it as he was the one who came across it.  In a 
 few words, the interpretation of texture border color union depends on the 
 format in the sampler view state (whether it's a pure integer format or not).

 So, I wonder how integer vs float texture border colors will fit in your 
 driver's elegantly separated texture view and sampler states, or any other

Re: [Mesa-dev] [PATCH] st/mesa: optionally apply texture swizzle to border color

2013-04-14 Thread Christoph Bumiller
On 14.04.2013 13:50, Jose Fonseca wrote:
 - Original Message -

 Not to mention the sheer insanity, ugliness and emotional pain incurred
 when writing that code when it COULD be so easy and simple in the state
 tracker where you know that textures and samplers are tightly coupled,
 while in gallium I cannot assume that to be the case.
 Also, will this still be true when Mesa state tracker implements 
 GL_ARB_texture_view ?

I dare say yes. GL texture views do NOT decouple textures from samplers,
they just decouple gallium sampler views from OpenGL textures.

There may be an issue if we wanted (and we don't) to use a single
sampler for all the OpenGL texture views of a single texture. However,
that ONLY works if the shaders are changed as well, and since the
texture/sampler combinations are not predictable, this is a very bad
idea as it would mean frequent shader recompilations.

As to whether there will ever be an OpenGL extension that adds
separation of views and samplers to shaders ... I'm hoping for NV to add
some clause to the spec to solve the border colour trouble, like
forbidding texture swizzle in such cases (and I'm sure AMD would be
inclined to agree).

 Jose

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] st/mesa: optionally apply texture swizzle to border color

2013-04-14 Thread Christoph Bumiller
On 14.04.2013 14:33, Christoph Bumiller wrote:
  
 ===
 In the format description (including swizzle), each color component of
 RGBA (as seen by the shader) gets mapped a memory component
 {C0,C1,C2,C3} or {ZERO,ONE_INT,ONE_FLOAT}.

 When a memory (!) component (Cx) is first encountered when going through
 RGBA, it is assigned the SAMPLER_BORDER_COLOR component value for that
 component, and if the memory component is encountered again (because of
 swizzle), that same value will be used.

 So, assuming memory format RGBA and the swizzle 1RBG:
 R = ONE
 G = C0
 B = C2
 A = C1
 the border colour will be SAMPLER_BORDER_COLOR.1GBA.

 The resulting border colour with swizzle applied to the sampler would be
 (lowercase being user values):
 R=1
 G=r
 B=b
 A=g

 resulting in 1rbg, which works out.
 ===


Sorry, that was a bad example, I feel the need to give a better one:

When a memory component (Cx) is first encountered when going through RGBA, it 
is assigned the SAMPLER_BORDER_COLOR.R/G/B/A component value, and if the memory 
component is encountered again (because of swizzle), that same value will be 
used.

RGBA8 with swizzle G1GB:
R=C1
G=ONE
B=C1
A=C2

gets BORDER_COLOR.R1RA.

Maybe that's the same thing that happens on r600 (I just recall it
undoing the swizzle in a weird way)?
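
For what it's worth, the rule above can be written down as a tiny C model.
This is only a sketch of the behaviour as I described it, not driver code;
the enum and function names are made up for illustration. It reports, for
each shader channel, which SAMPLER_BORDER_COLOR component (or constant)
ends up there.

```c
#include <assert.h>
#include <string.h>

/* Where each shader-visible channel (R, G, B, A) takes its value from.
 * Hypothetical names, local to this sketch. */
enum chan_src { SRC_C0, SRC_C1, SRC_C2, SRC_C3, SRC_ZERO, SRC_ONE };

/*
 * map[i] is the source of shader channel i (0 = R .. 3 = A).
 * out receives four characters naming the border colour component
 * ('R', 'G', 'B', 'A') or constant ('0', '1') seen in each channel,
 * followed by a terminating NUL.
 */
void
model_border_color(const enum chan_src map[4], char out[5])
{
   static const char chan_name[4] = { 'R', 'G', 'B', 'A' };
   char first_seen[4];
   int i;

   memset(first_seen, 0, sizeof(first_seen));

   for (i = 0; i < 4; i++) {
      assert(map[i] <= SRC_ONE);
      if (map[i] == SRC_ZERO) {
         out[i] = '0';
      } else if (map[i] == SRC_ONE) {
         out[i] = '1';
      } else {
         /* The first encounter binds this memory component to the border
          * colour component of the current shader channel; later
          * encounters reuse that binding. */
         if (!first_seen[map[i]])
            first_seen[map[i]] = chan_name[i];
         out[i] = first_seen[map[i]];
      }
   }
   out[4] = '\0';
}
```

Feeding it { C1, ONE, C1, C2 } (the G1GB case) gives "R1RA", and
{ ONE, C0, C2, C1 } (the 1RBG case) gives "1GBA", matching both examples.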




Re: [Mesa-dev] [PATCH resend] mesa: Add core support for the GL_AMD_performance_monitor extension.

2013-04-13 Thread Christoph Bumiller
On 12.04.2013 21:14, Kenneth Graunke wrote:
 This provides an interface for applications (and OpenGL-based tools) to
 access GPU performance counters.  Since the exact performance counters
 available vary between vendors and hardware generations, the extension
 provides an API the application can use to get the names, types, and
 minimum/maximum values of all available counters.  Counters are also
 organized into groups.

 Applications create performance monitor objects, select the counters
 they want to track, and Begin/End monitoring, much like OpenGL's query
 API.  Multiple monitors can be in flight simultaneously.

 We chose not to implement the similar GL_INTEL_performance_queries
 extension because Intel has not bothered to publish a specification in
 the OpenGL registry.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mapi/glapi/gen/AMD_performance_monitor.xml |  87 
  src/mapi/glapi/gen/Makefile.am |   1 +
  src/mapi/glapi/gen/gl_API.xml  |   2 +
  src/mapi/glapi/gen/gl_genexec.py   |   1 +
  src/mesa/SConscript|   1 +
  src/mesa/main/context.c|   2 +
  src/mesa/main/dd.h |  22 +
  src/mesa/main/extensions.c |   1 +
  src/mesa/main/mtypes.h |  84 
  src/mesa/main/performance_monitor.c| 563 
 +
  src/mesa/main/performance_monitor.h|  85 
  src/mesa/sources.mak   |   1 +
  12 files changed, 850 insertions(+)
  create mode 100644 src/mapi/glapi/gen/AMD_performance_monitor.xml
  create mode 100644 src/mesa/main/performance_monitor.c
  create mode 100644 src/mesa/main/performance_monitor.h
  
  /**
 + * A performance monitor as described in AMD_performance_monitor.
 + */
 +struct gl_perf_monitor_object
 +{
 +   GLboolean Active;
 +
 +   /* Actually BITSET_WORD but we can't #include that here. */
 +   GLuint *ActiveCounters;
 +};
 +

Started to implement this for mesa/st, got a question about ActiveCounters:

Does this bitset refer to the counter IDs or the Counters array index ?
Do the IDs have to be consecutive ? Do they have to correspond to the
array index ?

 +
 +void GLAPIENTRY
 +_mesa_SelectPerfMonitorCountersAMD(GLuint monitor, GLboolean enable,
 +   GLuint group, GLint numCounters,
 +   GLuint *counterList)
 +{
...
 +   if (enable) {
 +  /* Enable the counters */
 +  for (i = 0; i < numCounters; i++) {
 + BITSET_SET(m->ActiveCounters, counterList[i]);
 +  }
 +   } else {
 +  /* Disable the counters */
 +  for (i = 0; i < numCounters; i++) {
 + BITSET_CLEAR(m->ActiveCounters, counterList[i]);
 +  }
 +   }
 +}

counterList holds counter IDs, so this implies ActiveCounters is indexed by ID.

You also do:

m->ActiveCounters = calloc(ctx->PerfMonitor.NumCounters, sizeof(BITSET_WORD));

So, this implies it refers to the Counters array of size NumCounters
(unless the overallocation by 8 * sizeof(BITSET_WORD) bits has some
purpose that escapes me).
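
To illustrate the sizing point: a bitset for N counters only needs
ceil(N / bits-per-word) words, not N words. A minimal sketch, using local
stand-in macros (these are assumptions for the example, not Mesa's actual
BITSET_* definitions):

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Local stand-ins for this sketch only. */
typedef unsigned int bitset_word;
#define WORD_BITS        (sizeof(bitset_word) * CHAR_BIT)
#define WORDS_FOR(nbits) (((nbits) + WORD_BITS - 1) / WORD_BITS)

/* Allocate a zeroed bitset with room for exactly num_counters bits,
 * rounded up to whole words. */
bitset_word *
alloc_counter_bitset(unsigned num_counters)
{
   return calloc(WORDS_FOR(num_counters), sizeof(bitset_word));
}

/* Mark counter `id` as active; the ID must be a valid bit index. */
void
counter_set(bitset_word *set, unsigned num_counters, unsigned id)
{
   assert(id < num_counters);
   set[id / WORD_BITS] |= 1u << (id % WORD_BITS);
}

/* Return 1 if counter `id` is active, 0 otherwise. */
int
counter_test(const bitset_word *set, unsigned num_counters, unsigned id)
{
   assert(id < num_counters);
   return (set[id / WORD_BITS] >> (id % WORD_BITS)) & 1u;
}
```

Note that with this sizing the IDs really do have to be dense indices in
[0, NumCounters), which is the constraint I'm asking about.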

Hence, we cannot freely select IDs, can we ?
I had different, generously spaced ranges of gallium query IDs reserved
for different counter domains (since I haven't added all possible
counters, and I don't even know all of them yet, that needs REing), so I
guess I have to remap them in the state tracker ...
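
Such a remap could be as simple as a table from dense GL counter index to
sparse driver query ID. A sketch under that assumption (the ID values and
function names below are invented for the example, they are not the real
nouveau query IDs):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, sparsely numbered driver-side query IDs. */
static const unsigned driver_query_ids[] = { 0x100, 0x101, 0x200, 0x240 };
#define NUM_COUNTERS (sizeof(driver_query_ids) / sizeof(driver_query_ids[0]))

/* GL-visible counter IDs are the dense indices 0 .. NUM_COUNTERS-1. */
unsigned
gl_counter_to_driver_query(unsigned gl_id)
{
   assert(gl_id < NUM_COUNTERS);
   return driver_query_ids[gl_id];
}

/* Reverse lookup; returns -1 if the driver ID is not exposed to GL. */
int
driver_query_to_gl_counter(unsigned query_id)
{
   size_t i;

   for (i = 0; i < NUM_COUNTERS; i++) {
      if (driver_query_ids[i] == query_id)
         return (int)i;
   }
   return -1;
}
```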

Anyway, I think this should be mentioned in a comment [that is easy to
find].

Thanks,
Christoph


[Mesa-dev] [PATCH] st/mesa: optionally apply texture swizzle to border color

2013-04-13 Thread Christoph Bumiller
From: Christoph Bumiller christoph.bumil...@speed.at

This is the only sane solution for nv50 and nvc0 (really, trust me),
but since on other hardware the border colour is tightly coupled with
texture state they'd have to undo the swizzle, so I've added a cap.

The name of the cap could be changed to be more descriptive, like
PIPE_CAP_TEXTURE_SWIZZLE_AFFECTS_BORDER_COLOR.

The dependency of update_sampler on the texture updates was
introduced to avoid doing the apply_depthmode to the swizzle twice.

More detailed explanation of driver situation:

No, really, don't suggest doing this in the driver. The driver has
elegantly separated texture view and sampler states (each is a
structure in a table in VRAM and should not be updated, to avoid
performance loss), and the table entries are bound to the independent (!)
texture and sampler slots in shaders (which must be separately
indexable indirectly).
So, if I was to do this in the driver, I'd have to add separate sampler
state object instances for each texture view with appropriately swizzled
border color, and there's only 16 slots, so I'd be limited to 4 texture
units.
Not to mention the sheer insanity, ugliness and emotional pain incurred
when writing that code when it COULD be so easy and simple in the state
tracker where you know that textures and samplers are tightly coupled,
while in gallium I cannot assume that to be the case.
---
 src/gallium/docs/source/cso/sampler.rst  |7 ++-
 src/gallium/docs/source/screen.rst   |2 +
 src/gallium/drivers/freedreno/freedreno_screen.c |1 +
 src/gallium/drivers/i915/i915_screen.c   |1 +
 src/gallium/drivers/llvmpipe/lp_screen.c |2 +
 src/gallium/drivers/nv30/nv30_screen.c   |1 +
 src/gallium/drivers/nv50/nv50_screen.c   |1 +
 src/gallium/drivers/nvc0/nvc0_screen.c   |1 +
 src/gallium/drivers/r300/r300_screen.c   |1 +
 src/gallium/drivers/r600/r600_pipe.c |1 +
 src/gallium/drivers/radeonsi/radeonsi_pipe.c |1 +
 src/gallium/drivers/softpipe/sp_screen.c |2 +
 src/gallium/drivers/svga/svga_screen.c   |2 +
 src/gallium/include/pipe/p_defines.h |3 +-
 src/mesa/state_tracker/st_atom.c |2 +-
 src/mesa/state_tracker/st_atom_sampler.c |   65 +-
 src/mesa/state_tracker/st_context.c  |2 +
 src/mesa/state_tracker/st_context.h  |1 +
 18 files changed, 89 insertions(+), 7 deletions(-)

diff --git a/src/gallium/docs/source/cso/sampler.rst 
b/src/gallium/docs/source/cso/sampler.rst
index 26ffc18..1911cea 100644
--- a/src/gallium/docs/source/cso/sampler.rst
+++ b/src/gallium/docs/source/cso/sampler.rst
@@ -101,7 +101,10 @@ max_lod
 border_color
 Color union used for texel coordinates that are outside the [0,width-1],
 [0, height-1] or [0, depth-1] ranges. Interpreted according to sampler
-view format.
+view format, unless the driver reports
+PIPE_CAP_BORDER_COLOR_QUIRK, in which case this value is substituted for
+the texture color exactly as specified, the sampler view format and swizzle
+have no effect on it.
 max_anisotropy
 Maximum anistropy ratio to use when sampling from textures.  For example,
 if max_anistropy=4, a region of up to 1 by 4 texels will be sampled.
@@ -111,4 +114,4 @@ max_anisotropy
 seamless_cube_map
 If set, the bilinear filter of a cube map may take samples from adjacent
 cube map faces when sampled near a texture border to produce a seamless
-look.
\ No newline at end of file
+look.
diff --git a/src/gallium/docs/source/screen.rst 
b/src/gallium/docs/source/screen.rst
index 4b01d77..495398b 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -151,6 +151,8 @@ The integer capabilities:
   dedicated memory should return 1 and all software rasterizers should return 
0.
 * ``PIPE_CAP_QUERY_PIPELINE_STATISTICS``: Whether 
PIPE_QUERY_PIPELINE_STATISTICS
   is supported.
+* ``PIPE_CAP_BORDER_COLOR_QUIRK``: Whether the sampler view's format and 
swizzle
+  affect the border color.
 
 
 .. _pipe_capf:
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index 283d07f..5b60401 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -200,6 +200,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_USER_VERTEX_BUFFERS:
case PIPE_CAP_USER_INDEX_BUFFERS:
case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
+   case PIPE_CAP_BORDER_COLOR_QUIRK:
return 0;
 
/* Stream output. */
diff --git a/src/gallium/drivers/i915/i915_screen.c 
b/src/gallium/drivers/i915/i915_screen.c
index 54b2154..4c3d52f 100644
--- a/src/gallium/drivers/i915/i915_screen.c
+++ b/src/gallium/drivers/i915/i915_screen.c
@@ -213,6 +213,7

Re: [Mesa-dev] [PATCH resend] mesa: Add core support for the GL_AMD_performance_monitor extension.

2013-04-13 Thread Christoph Bumiller
On 12.04.2013 21:14, Kenneth Graunke wrote:
 This provides an interface for applications (and OpenGL-based tools) to
 access GPU performance counters.  Since the exact performance counters
 available vary between vendors and hardware generations, the extension
 provides an API the application can use to get the names, types, and
 minimum/maximum values of all available counters.  Counters are also
 organized into groups.
  
 +   /**
 +* \name Performance monitors
 +*/
 +   /*@{*/
 +   struct gl_perf_monitor_object * (*NewPerfMonitor)(void);
 +   void (*DeletePerfMonitor)(struct gl_perf_monitor_object *m);

Could we get a gl_context for these as well? It might be useful, since
if we want to allocate or (more likely) destroy gallium objects we'll need
a context.
NewQueryObject has a context argument as well.

I could save the context from the Begin/End calls, but if there's no
reason not to pass a context to New/Delete, having it as arg would be
preferable.

Regards,
Christoph


Re: [Mesa-dev] [PATCH 2/2] radeonsi: add support for compressed texture

2013-04-08 Thread Christoph Bumiller
On 08.04.2013 12:03, Marek Olšák wrote:
 On Mon, Apr 8, 2013 at 11:29 AM, Michel Dänzer mic...@daenzer.net wrote:

 On Fri, 2013-04-05 at 17:36 -0400, j.gli...@gmail.com wrote:
  From: Jerome Glisse jgli...@redhat.com
  
  Most tests pass, the issues are with border color and swizzle.

 FWIW, those issues are there with non-compressed formats as well. I'm
 afraid we might need to change the hardware border colour depending on
 the swizzle.


 I don't think so. The issue with the swizzled border color seems to be
 a bad hardware design decision present since r600 rather than a
 hardware bug. I tried fixing it for older chipsets with no success. I
 doubt the hw designers fixed this for SI. The problem is the hardware
 tries to guess what the border color swizzle is from the combined
 pipe_format+sampler view swizzle combination. You need 2 texture
 swizzle states in the texture unit for the border color to be swizzled
 correctly, because texels must be swizzled by the pipe_format swizzle
 and sampler view swizzle, but the border color must be swizzled by the
 sampler view only. The main problem is that the hardware internally
 tries to undo the pipe_format swizzle in a way that just doesn't work.
 I don't remember the exact swizzles being used by hardware, but I got
 crazy cases like if I set texture swizzle to ywzx, the border color
 will be ywyy. There is no way to access those zx components of the
 border color for that specific swizzling. For some cases, the hardware
 succeeds in guessing what the border color should be, e.g. if I set
 texture swizzle to .zyxw, the returned border color will be .xyzw (and
 that would be correct if the swizzle came from pipe_format, and
 incorrect if the swizzle came from sampler view).

 It was easy with r300, because I could just undo pipe_format swizzling
 before passing the border color to the hardware.


Ah yes, border colour swizzle, it's a problem on NV, too, because the
border colour isn't getting swizzled at all [as far as we know].
The main issue is the separation of samplers and textures in gallium. If
that wasn't the case, samplers and textures would be coupled and the
sampler state could be set according to texture view state (if it's just
OpenGL; and if it's just D3D there's no swizzle).
So, I just leave it broken. I can't bring myself to destroy the elegant
separation because of such an unimportant detail, that hurts too much.

(Also, if someone was to use multiple samplers and views in gallium and
index them dynamically, I'd have to set up all combinations of textures
and samplers, which is simply ridiculous.
And now I'm going to look for some secret sampler setup bit that says
"swizzle according to texture view state". Maybe, looking into the future
of OpenGL, someone's been wise enough to add that. But then, I'd have the
same problem as you: an intensity texture simply doesn't have separate
values for R,G,B,A.)

Possible solution:
Maybe the state tracker could just do the swizzling, because it knows
that samplers and views are coupled, and it knows the swizzle ?
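
Roughly what I have in mind, as a sketch only: the state tracker applies the
view swizzle to the border colour before it creates the sampler state. The
enum and function names here are local to the example (not the gallium
PIPE_SWIZZLE_* definitions), and it only covers float border colours; pure
integer formats would need an integer 1 instead.

```c
#include <assert.h>

/* Stand-ins for the swizzle selectors; values are local to this sketch. */
enum swz { SWZ_R, SWZ_G, SWZ_B, SWZ_A, SWZ_ZERO, SWZ_ONE };

/*
 * Produce the border colour as the shader should see it: channel i of
 * the result takes the user's component selected by view_swz[i], or a
 * constant for the ZERO/ONE selectors.
 */
void
swizzle_border_color(const float border[4], const enum swz view_swz[4],
                     float out[4])
{
   int i;

   for (i = 0; i < 4; i++) {
      assert(view_swz[i] <= SWZ_ONE);
      switch (view_swz[i]) {
      case SWZ_ZERO:
         out[i] = 0.0f;
         break;
      case SWZ_ONE:
         out[i] = 1.0f;
         break;
      default:
         out[i] = border[view_swz[i]];
         break;
      }
   }
}
```

The driver would then get a border colour that needs no further swizzling,
which is all hardware like ours can do anyway.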

 Marek




Re: [Mesa-dev] [PATCH 2/4] gallium: add PIPE_BIND_COMMAND_BUFFER

2013-04-05 Thread Christoph Bumiller
On 04.04.2013 21:53, Christoph Bumiller wrote:
 On 04.04.2013 21:44, Jose Fonseca wrote:
 I think that PIPE_BIND_INDIRECT_BUFFER would be more self-descriptive.

Marek suggested PIPE_BIND_DRAW_INDIRECT_BUFFER, but I think that's too
specific because there's also a DISPATCH_INDIRECT buffer for compute
shaders.

And just INDIRECT_BUFFER without the _DRAW_ doesn't seem so
self-descriptive if you're not thinking in the right context.

I'd like to stick with BIND_COMMAND_BUFFER, or maybe
BIND_COMMAND_ARGS_BUFFER ...

 Or do you envision other uses of such buffer?
 It's possible that at some point we add a mechanism to let the driver
 store arbitrary commands into a buffer created by the st, or have
 resources used as arguments for conditional rendering ...
 Lots of possibilities, but nothing concrete, and for the command lists
 like with D3D's deferred contexts we'd probably return opaque objects
 that can contain more auxiliary data.
 I'd like it to be more generic, but then it could turn out that there
 will be different requirements on these command source buffers in the
 future ... I'm undecided now.


 Jose

 - Original Message -
 Intended for use with GL_ARB_draw_indirect's DRAW_INDIRECT_BUFFER
 target or for D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS.
 ---
  src/gallium/docs/source/screen.rst   |2 ++
  src/gallium/include/pipe/p_defines.h |1 +
  2 files changed, 3 insertions(+), 0 deletions(-)

 diff --git a/src/gallium/docs/source/screen.rst
 b/src/gallium/docs/source/screen.rst
 index c1a3c0b..f8cdded 100644
 --- a/src/gallium/docs/source/screen.rst
 +++ b/src/gallium/docs/source/screen.rst
 @@ -306,6 +306,8 @@ resources might be created and handled quite 
 differently.
bound to the graphics pipeline as a shader resource.
  * ``PIPE_BIND_COMPUTE_RESOURCE``: A buffer or texture that can be
bound to the compute program as a shader resource.
 +* ``PIPE_BIND_COMMAND_BUFFER``: A buffer or that may be sourced by the
 +  GPU command processor, like with indirect drawing.
  
  .. _pipe_usage:
  
 diff --git a/src/gallium/include/pipe/p_defines.h
 b/src/gallium/include/pipe/p_defines.h
 index 5b00acc..2b79f2a 100644
 --- a/src/gallium/include/pipe/p_defines.h
 +++ b/src/gallium/include/pipe/p_defines.h
 @@ -315,6 +315,7 @@ enum pipe_flush_flags {
  #define PIPE_BIND_GLOBAL   (1 << 18) /* set_global_binding */
  #define PIPE_BIND_SHADER_RESOURCE  (1 << 19) /* set_shader_resources */
  #define PIPE_BIND_COMPUTE_RESOURCE (1 << 20) /* set_compute_resources */
 +#define PIPE_BIND_COMMAND_BUFFER   (1 << 21) /* pipe_draw_info.indirect */
  
  /* The first two flags above were previously part of the amorphous
   * TEXTURE_USAGE, most of which are now descriptions of the ways a
 --
 1.7.3.4



Re: [Mesa-dev] [PATCH 1/4] mesa: implement GL_ARB_draw_indirect

2013-04-05 Thread Christoph Bumiller
On 04.04.2013 21:17, Brian Paul wrote:
 I just did a quick skim and found a few minor things.

 First, the subject might be mesa: implement GL_ARB_draw_indirect and
 GL_ARB_multi_draw_indirect

 This is a big patch and I think it could have been broken down into
 smaller pieces, but I know it's a PITA to redo.  Next time.

 +static void GLAPIENTRY
 +save_DrawArraysIndirect(GLenum mode, const GLvoid *indirect)
 +{
 +   GET_CURRENT_CONTEXT(ctx);
 +   _mesa_error(ctx, GL_INVALID_OPERATION,
 +   "glDrawArraysIndirect() during display list compile");
 +}

 Is this specified in the spec?  IIRC, if a command isn't supposed to
 get compiled into a dlist it's just immediately executed.



Not that I can see.
But I figured since DrawElementsInstancedBaseVertex returns
GL_INVALID_OPERATION, the indirect version would behave the same: it is
essentially the same except for the place it takes its arguments from
(especially in compatibility mode, where the indirect argument is a user
pointer). That mode is not implemented, though; I'm only exposing the
extension in core profile for now, which means it won't ever be used with
display lists anyway until compatibility for it is implemented.

I also don't get why DrawElements non-instanced non-baseVertex is
specified to execute immediately instead of returning INVALID_OPERATION.
Surely if I can draw one instance I can draw 2 instances if it's
executed immediately anyway.

The NV binary driver seems to simply execute it immediately fwiw.


Re: [Mesa-dev] [PATCH 1/4] mesa: implement GL_ARB_draw_indirect

2013-04-05 Thread Christoph Bumiller
On 05.04.2013 13:32, Christoph Bumiller wrote:
 On 04.04.2013 21:17, Brian Paul wrote:
 I just did a quick skim and found a few minor things.

 First, the subject might be mesa: implement GL_ARB_draw_indirect and
 GL_ARB_multi_draw_indirect

 This is a big patch and I think it could have been broken down into
 smaller pieces, but I know it's a PITA to redo.  Next time.
 +static void GLAPIENTRY
 +save_DrawArraysIndirect(GLenum mode, const GLvoid *indirect)
 +{
 +   GET_CURRENT_CONTEXT(ctx);
 +   _mesa_error(ctx, GL_INVALID_OPERATION,
 +   "glDrawArraysIndirect() during display list compile");
 +}
 Is this specified in the spec?  IIRC, if a command isn't supposed to
 get compiled into a dlist it's just immediately executed.


 Not that I can see.
 But I figured since DrawElementsInstancedBaseVertex return
 GL_INVALID_OPERATION, the indirect version, which is essentially the
 same except for the place it takes its arguments from (especially in
 compatibility mode, where the indirect argument is a user pointer (this
 is not implemented, only exposing the extension in core profile for now,
 which means it won't ever be used with display lists anyway until
 compatibility for it is implemented), would behave the same.


 I also don't get why DrawElements non-instanced non-baseVertex is
 specified to execute immediately instead of returning INVALID_OPERATION.
 Surely if I can draw one instance I can draw 2 instances if it's
 executed immediately anyway.
Nevermind this paragraph.

 The NV binary driver seems to simply execute it immediately fwiw.


[Mesa-dev] [PATCH 3/5] gallium: add PIPE_BIND_COMMAND_ARGS_BUFFER

2013-04-05 Thread Christoph Bumiller
Intended for use with GL_ARB_draw_indirect's DRAW_INDIRECT_BUFFER
target or for D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS.
---
 src/gallium/docs/source/screen.rst   |3 +++
 src/gallium/include/pipe/p_defines.h |1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/src/gallium/docs/source/screen.rst 
b/src/gallium/docs/source/screen.rst
index c1a3c0b..d8cfb97 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -306,6 +306,9 @@ resources might be created and handled quite differently.
   bound to the graphics pipeline as a shader resource.
 * ``PIPE_BIND_COMPUTE_RESOURCE``: A buffer or texture that can be
   bound to the compute program as a shader resource.
+* ``PIPE_BIND_COMMAND_ARGS_BUFFER``: A buffer that may be sourced by the
+  GPU command processor. It can contain, for example, the arguments to
+  indirect draw calls.
 
 .. _pipe_usage:
 
diff --git a/src/gallium/include/pipe/p_defines.h 
b/src/gallium/include/pipe/p_defines.h
index 5b00acc..4c6b1f1 100644
--- a/src/gallium/include/pipe/p_defines.h
+++ b/src/gallium/include/pipe/p_defines.h
@@ -315,6 +315,7 @@ enum pipe_flush_flags {
 #define PIPE_BIND_GLOBAL   (1 << 18) /* set_global_binding */
 #define PIPE_BIND_SHADER_RESOURCE  (1 << 19) /* set_shader_resources */
 #define PIPE_BIND_COMPUTE_RESOURCE (1 << 20) /* set_compute_resources */
+#define PIPE_BIND_COMMAND_ARGS_BUFFER  (1 << 21) /* pipe_draw_info.indirect */
 
 /* The first two flags above were previously part of the amorphous
  * TEXTURE_USAGE, most of which are now descriptions of the ways a
-- 
1.7.3.4



[Mesa-dev] [PATCH 1/5] mesa: add indirect drawing buffer parameter to draw functions

2013-04-05 Thread Christoph Bumiller
Split from patch implementing ARB_draw_indirect.

v2:
Const-qualify the struct gl_buffer_object *indirect argument.
---
 src/mesa/drivers/dri/i965/brw_draw.c |3 ++-
 src/mesa/drivers/dri/i965/brw_draw.h |3 ++-
 src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c |9 ++---
 src/mesa/state_tracker/st_cb_rasterpos.c |2 +-
 src/mesa/state_tracker/st_draw.c |3 ++-
 src/mesa/state_tracker/st_draw.h |6 --
 src/mesa/state_tracker/st_draw_feedback.c|3 ++-
 src/mesa/tnl/tnl.h   |3 ++-
 src/mesa/vbo/vbo.h   |5 -
 src/mesa/vbo/vbo_exec_array.c|8 
 src/mesa/vbo/vbo_exec_draw.c |2 +-
 src/mesa/vbo/vbo_primitive_restart.c |4 ++--
 src/mesa/vbo/vbo_rebase.c|2 +-
 src/mesa/vbo/vbo_save_draw.c |2 +-
 src/mesa/vbo/vbo_split_copy.c|2 +-
 src/mesa/vbo/vbo_split_inplace.c |2 +-
 16 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index 809bcc5..9212eb1 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -548,7 +548,8 @@ void brw_draw_prims( struct gl_context *ctx,
 GLboolean index_bounds_valid,
 GLuint min_index,
 GLuint max_index,
-struct gl_transform_feedback_object *tfb_vertcount )
+struct gl_transform_feedback_object *tfb_vertcount,
+const struct gl_buffer_object *indirect )
 {
struct intel_context *intel = intel_context(ctx);
const struct gl_client_array **arrays = ctx->Array._DrawArrays;
diff --git a/src/mesa/drivers/dri/i965/brw_draw.h 
b/src/mesa/drivers/dri/i965/brw_draw.h
index d86a9e7..8f0c768 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.h
+++ b/src/mesa/drivers/dri/i965/brw_draw.h
@@ -41,7 +41,8 @@ void brw_draw_prims( struct gl_context *ctx,
 GLboolean index_bounds_valid,
 GLuint min_index,
 GLuint max_index,
-struct gl_transform_feedback_object *tfb_vertcount );
+struct gl_transform_feedback_object *tfb_vertcount,
+const struct gl_buffer_object *indirect );
 
 void brw_draw_init( struct brw_context *brw );
 void brw_draw_destroy( struct brw_context *brw );
diff --git a/src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c 
b/src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c
index 436db32..69f30e2 100644
--- a/src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c
+++ b/src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c
@@ -222,7 +222,8 @@ TAG(vbo_render_prims)(struct gl_context *ctx,
  const struct _mesa_index_buffer *ib,
  GLboolean index_bounds_valid,
  GLuint min_index, GLuint max_index,
- struct gl_transform_feedback_object *tfb_vertcount);
+ struct gl_transform_feedback_object *tfb_vertcount,
+ const struct gl_buffer_object *indirect);
 
 static GLboolean
 vbo_maybe_split(struct gl_context *ctx, const struct gl_client_array **arrays,
@@ -453,7 +454,8 @@ TAG(vbo_render_prims)(struct gl_context *ctx,
  const struct _mesa_index_buffer *ib,
  GLboolean index_bounds_valid,
  GLuint min_index, GLuint max_index,
- struct gl_transform_feedback_object *tfb_vertcount)
+ struct gl_transform_feedback_object *tfb_vertcount,
+ const struct gl_buffer_object *indirect)
 {
struct nouveau_render_state *render = to_render_state(ctx);
const struct gl_client_array **arrays = ctx->Array._DrawArrays;
@@ -489,7 +491,8 @@ TAG(vbo_check_render_prims)(struct gl_context *ctx,
const struct _mesa_index_buffer *ib,
GLboolean index_bounds_valid,
GLuint min_index, GLuint max_index,
-   struct gl_transform_feedback_object *tfb_vertcount)
+   struct gl_transform_feedback_object *tfb_vertcount,
+   const struct gl_buffer_object *indirect)
 {
struct nouveau_context *nctx = to_nouveau_context(ctx);
 
diff --git a/src/mesa/state_tracker/st_cb_rasterpos.c 
b/src/mesa/state_tracker/st_cb_rasterpos.c
index 4731f26..778218a1 100644
--- a/src/mesa/state_tracker/st_cb_rasterpos.c
+++ b/src/mesa/state_tracker/st_cb_rasterpos.c
@@ -255,7 +255,7 @@ st_RasterPos(struct gl_context *ctx, const GLfloat v[4])
 * st_feedback_draw_vbo doesn't check for that flag. */
ctx->Array._DrawArrays = rs->arrays;
st_feedback_draw_vbo(ctx, &rs->prim, 1, NULL, GL_TRUE, 0, 1,
-NULL);
+NULL, 

[Mesa-dev] [PATCH 5/5] st/mesa: add support for indirect drawing

2013-04-05 Thread Christoph Bumiller
---
 src/mesa/state_tracker/st_cb_bufferobjects.c |3 +++
 src/mesa/state_tracker/st_draw.c |   11 ++-
 src/mesa/state_tracker/st_extensions.c   |3 ++-
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_bufferobjects.c 
b/src/mesa/state_tracker/st_cb_bufferobjects.c
index 8ff32c8..2e719cc 100644
--- a/src/mesa/state_tracker/st_cb_bufferobjects.c
+++ b/src/mesa/state_tracker/st_cb_bufferobjects.c
@@ -205,6 +205,9 @@ st_bufferobj_data(struct gl_context *ctx,
case GL_UNIFORM_BUFFER:
   bind = PIPE_BIND_CONSTANT_BUFFER;
   break;
+   case GL_DRAW_INDIRECT_BUFFER:
+  bind = PIPE_BIND_COMMAND_ARGS_BUFFER;
+  break;
default:
   bind = 0;
}
diff --git a/src/mesa/state_tracker/st_draw.c b/src/mesa/state_tracker/st_draw.c
index 82a4bcd..3c74c50 100644
--- a/src/mesa/state_tracker/st_draw.c
+++ b/src/mesa/state_tracker/st_draw.c
@@ -256,6 +256,14 @@ st_draw_vbo(struct gl_context *ctx,
   }
}
 
+   if (indirect) {
+  info.indirect = st_buffer_object(indirect)->buffer;
+
+  /* Primitive restart is not handled by the VBO module in this case. */
+  info.primitive_restart = ctx->Array._PrimitiveRestart;
+  info.restart_index = ctx->Array._RestartIndex;
+   }
+
/* do actual drawing */
for (i = 0; i < nr_prims; i++) {
   info.mode = translate_prim( ctx, prims[i].mode );
@@ -268,6 +276,7 @@ st_draw_vbo(struct gl_context *ctx,
  info.min_index = info.start;
  info.max_index = info.start + info.count - 1;
   }
+  info.indirect_offset = prims[i].indirect_offset;
 
   if (ST_DEBUG & DEBUG_DRAW) {
  debug_printf("st/draw: mode %s  start %u  count %u  indexed %d\n",
@@ -277,7 +286,7 @@ st_draw_vbo(struct gl_context *ctx,
   info.indexed);
   }
 
-  if (info.count_from_stream_output) {
+  if (info.count_from_stream_output || info.indirect) {
  cso_draw_vbo(st->cso_context, &info);
   }
   else if (info.primitive_restart) {
diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index 11db9d3..0488755 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -398,7 +398,8 @@ void st_init_extensions(struct st_context *st)
   { o(MESA_texture_array),   PIPE_CAP_MAX_TEXTURE_ARRAY_LAYERS },
 
   { o(OES_standard_derivatives), PIPE_CAP_SM3  },
-  { o(ARB_texture_cube_map_array),   PIPE_CAP_CUBE_MAP_ARRAY   }
+  { o(ARB_texture_cube_map_array),   PIPE_CAP_CUBE_MAP_ARRAY   },
+  { o(ARB_draw_indirect),PIPE_CAP_DRAW_INDIRECT},
};
 
/* Required: render target and sampler support */
-- 
1.7.3.4



[Mesa-dev] [PATCH 2/5] mesa: implement GL_ARB_draw_indirect and GL_ARB_multi_draw_indirect

2013-04-05 Thread Christoph Bumiller
v2:
Removed some stray extern qualifiers.

Documented use of Draw*IndirectCommand sizes.

Removed separate extension enable flag for ARB_multi_draw_indirect
since this can always be supported by looping.

Kept generation of GL_INVALID_OPERATION in display list compile.
The spec doesn't say anything about them, but all the direct drawing
commands that support instancing do the same.
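
To illustrate the note above that ARB_multi_draw_indirect can always be supported by looping, here is a minimal sketch, not the code from this patch. The struct layout follows the ARB_draw_indirect spec; the helper name multi_draw_offsets and its parameters are made up for illustration. It only computes the buffer offset of each command, which a fallback would then hand to the single-draw path one at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Command layout from the ARB_draw_indirect spec (non-indexed variant). */
typedef struct {
   uint32_t count;
   uint32_t primCount;
   uint32_t first;
   uint32_t reservedMustBeZero;
} DrawArraysIndirectCommand;

/*
 * Hypothetical helper: compute the buffer offset of each of the primcount
 * commands, as a looping fallback would before issuing one single
 * indirect draw per offset. A stride of 0 means tightly packed.
 * Returns the number of offsets written.
 */
static size_t
multi_draw_offsets(size_t indirect, size_t primcount, size_t stride,
                   size_t *offsets, size_t max_offsets)
{
   size_t i;

   if (stride == 0)
      stride = sizeof(DrawArraysIndirectCommand);

   /* Commands hold 32-bit words, so the stride must keep them aligned. */
   assert(stride % sizeof(uint32_t) == 0);

   for (i = 0; i < primcount && i < max_offsets; i++)
      offsets[i] = indirect + i * stride;

   return i;
}
```

With a stride of 0 and a start offset of 16, three commands land at offsets 16, 32 and 48, since one non-indexed command is four 32-bit words.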
---
 src/mapi/glapi/gen/Makefile.am  |1 +
 src/mapi/glapi/gen/gl_API.xml   |4 +-
 src/mesa/main/api_validate.c|  153 +++
 src/mesa/main/api_validate.h|   26 
 src/mesa/main/bufferobj.c   |9 +
 src/mesa/main/dd.h  |   12 ++
 src/mesa/main/dlist.c   |   41 +
 src/mesa/main/extensions.c  |2 +
 src/mesa/main/get.c |5 +
 src/mesa/main/get_hash_params.py|2 +
 src/mesa/main/mtypes.h  |3 +
 src/mesa/main/tests/dispatch_sanity.cpp |8 +-
 src/mesa/main/vtxfmt.c  |7 +
 src/mesa/vbo/vbo_exec_array.c   |  249 +++
 src/mesa/vbo/vbo_save_api.c |   53 +++
 15 files changed, 570 insertions(+), 5 deletions(-)

diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index 36e47e2..243c148 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -96,6 +96,7 @@ API_XML = \
ARB_depth_clamp.xml \
ARB_draw_buffers_blend.xml \
ARB_draw_elements_base_vertex.xml \
+   ARB_draw_indirect.xml \
ARB_draw_instanced.xml \
ARB_ES2_compatibility.xml \
ARB_ES3_compatibility.xml \
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index df95924..f22fdac 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8240,6 +8240,8 @@
 
 <!-- ARB extensions #86...#93 -->
 
+<xi:include href="ARB_draw_indirect.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
+
 <category name="GL_ARB_transform_feedback3" number="94">
   <enum name="MAX_TRANSFORM_FEEDBACK_BUFFERS" value="0x8E70"/>
   <enum name="MAX_VERTEX_STREAMS" value="0x8E71"/>
@@ -8317,7 +8319,7 @@
 
 <xi:include href="ARB_invalidate_subdata.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
 
-<!-- ARB extensions #133...#138 -->
+<!-- ARB extensions #134...#138 -->
 
 <xi:include href="ARB_texture_buffer_range.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
 
diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index 53b0021..e875c5d 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -737,3 +737,156 @@ _mesa_validate_DrawTransformFeedback(struct gl_context 
*ctx,
 
return GL_TRUE;
 }
+
+static GLboolean
+valid_draw_indirect(struct gl_context *ctx,
+GLenum mode, const GLvoid *indirect,
+GLsizei size, const char *name)
+{
+   const GLsizeiptr end = (GLsizeiptr)indirect + size;
+
+   if (!_mesa_valid_prim_mode(ctx, mode, name))
+  return GL_FALSE;
+
+   if ((GLsizeiptr)indirect & (sizeof(GLuint) - 1)) {
+  _mesa_error(ctx, GL_INVALID_OPERATION,
+  "%s(indirect is not aligned)", name);
+  return GL_FALSE;
+   }
+
+   if (_mesa_is_bufferobj(ctx->DrawIndirectBuffer)) {
+  if (_mesa_bufferobj_mapped(ctx->DrawIndirectBuffer)) {
+ _mesa_error(ctx, GL_INVALID_OPERATION,
+ "%s(DRAW_INDIRECT_BUFFER is mapped)", name);
+ return GL_FALSE;
+  }
+  if (ctx->DrawIndirectBuffer->Size < end) {
+ _mesa_error(ctx, GL_INVALID_OPERATION,
+ "%s(DRAW_INDIRECT_BUFFER too small)", name);
+ return GL_FALSE;
+  }
+   } else {
+  if (ctx->API != API_OPENGL_COMPAT) {
+ _mesa_error(ctx, GL_INVALID_OPERATION,
+ "%s: no buffer bound to DRAW_INDIRECT_BUFFER", name);
+ return GL_FALSE;
+  }
+   }
+
+   if (!check_valid_to_render(ctx, name))
+  return GL_FALSE;
+
+   return GL_TRUE;
+}
+
+static inline GLboolean
+valid_draw_indirect_elements(struct gl_context *ctx,
+ GLenum mode, GLenum type, const GLvoid *indirect,
+ GLsizeiptr size, const char *name)
+{
+   if (!valid_elements_type(ctx, type, name))
+  return GL_FALSE;
+
+   /*
+* Unlike regular DrawElementsInstancedBaseVertex commands, the indices
+* may not come from a client array and must come from an index buffer.
+* If no element array buffer is bound, an INVALID_OPERATION error is
+* generated.
+*/
+   if (!_mesa_is_bufferobj(ctx->Array.ArrayObj->ElementArrayBufferObj)) {
+  _mesa_error(ctx, GL_INVALID_OPERATION,
+  "%s(no buffer bound to GL_ELEMENT_ARRAY_BUFFER)", name);
+  return GL_FALSE;
+   }
+
+   return valid_draw_indirect(ctx, mode, indirect, size, name);
+}
+
+static inline GLboolean
+valid_draw_indirect_multi(struct gl_context *ctx,
+  

[Mesa-dev] [PATCH 4/5] gallium: add facilities for indirect drawing

2013-04-05 Thread Christoph Bumiller
v2:
Added comments to util_draw_indirect, clarified and fixed map size.
Removed unlikely().
---
 src/gallium/auxiliary/util/u_draw.c  |   43 ++
 src/gallium/auxiliary/util/u_draw.h  |8 
 src/gallium/auxiliary/util/u_dump_state.c|3 ++
 src/gallium/docs/source/screen.rst   |3 ++
 src/gallium/drivers/freedreno/freedreno_screen.c |1 +
 src/gallium/drivers/i915/i915_screen.c   |1 +
 src/gallium/drivers/llvmpipe/lp_draw_arrays.c|5 +++
 src/gallium/drivers/llvmpipe/lp_screen.c |2 +
 src/gallium/drivers/nv30/nv30_screen.c   |1 +
 src/gallium/drivers/nv50/nv50_screen.c   |2 +
 src/gallium/drivers/r300/r300_screen.c   |1 +
 src/gallium/drivers/r600/r600_pipe.c |1 +
 src/gallium/drivers/radeonsi/radeonsi_pipe.c |1 +
 src/gallium/drivers/softpipe/sp_draw_arrays.c|6 +++
 src/gallium/drivers/softpipe/sp_screen.c |2 +
 src/gallium/drivers/svga/svga_screen.c   |1 +
 src/gallium/drivers/trace/tr_dump_state.c|3 ++
 src/gallium/include/pipe/p_defines.h |3 +-
 src/gallium/include/pipe/p_state.h   |   22 +++
 19 files changed, 108 insertions(+), 1 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_draw.c 
b/src/gallium/auxiliary/util/u_draw.c
index 83d9284..b9f8fcd 100644
--- a/src/gallium/auxiliary/util/u_draw.c
+++ b/src/gallium/auxiliary/util/u_draw.c
@@ -27,6 +27,7 @@
 
 
 #include "util/u_debug.h"
+#include "util/u_inlines.h"
 #include "util/u_math.h"
 #include "util/u_format.h"
 #include "util/u_draw.h"
@@ -123,3 +124,45 @@ util_draw_max_index(
 
return max_index + 1;
 }
+
+
+/* This extracts the draw arguments from the info_in->indirect resource,
+ * puts them into a new instance of pipe_draw_info, and calls draw_vbo on it.
+ */
+void
+util_draw_indirect(struct pipe_context *pipe,
+   const struct pipe_draw_info *info_in)
+{
+   struct pipe_draw_info info;
+   struct pipe_transfer *transfer;
+   uint32_t *params;
+   const unsigned num_params = info_in->indexed ? 5 : 4;
+
+   assert(info_in->indirect);
+   assert(!info_in->count_from_stream_output);
+
+   memcpy(&info, info_in, sizeof(info));
+
+   params = (uint32_t *)
+  pipe_buffer_map_range(pipe,
+info_in->indirect,
+info_in->indirect_offset,
+num_params * sizeof(uint32_t),
+PIPE_TRANSFER_READ,
+&transfer);
+   if (!transfer) {
+  debug_printf("%s: failed to map indirect buffer\n", __FUNCTION__);
+  return;
+   }
+
+   info.count = params[0];
+   info.instance_count = params[1];
+   info.start = params[2];
+   info.index_bias = info_in->indexed ? params[3] : 0;
+   info.start_instance = info_in->indexed ? params[4] : params[3];
+   info.indirect = NULL;
+
+   pipe_buffer_unmap(pipe, transfer);
+
+   pipe->draw_vbo(pipe, &info);
+}
diff --git a/src/gallium/auxiliary/util/u_draw.h 
b/src/gallium/auxiliary/util/u_draw.h
index 3dc6918..1dd6b51 100644
--- a/src/gallium/auxiliary/util/u_draw.h
+++ b/src/gallium/auxiliary/util/u_draw.h
@@ -142,6 +142,14 @@ util_draw_range_elements(struct pipe_context *pipe,
 }
 
 
+/* This converts an indirect draw into a direct draw by mapping the indirect
+ * buffer, extracting its arguments, and calling pipe->draw_vbo.
+ */
+void
+util_draw_indirect(struct pipe_context *pipe,
+   const struct pipe_draw_info *info);
+
+
 unsigned
 util_draw_max_index(
   const struct pipe_vertex_buffer *vertex_buffers,
diff --git a/src/gallium/auxiliary/util/u_dump_state.c 
b/src/gallium/auxiliary/util/u_dump_state.c
index 2f28f3c..21b6044 100644
--- a/src/gallium/auxiliary/util/u_dump_state.c
+++ b/src/gallium/auxiliary/util/u_dump_state.c
@@ -758,6 +758,9 @@ util_dump_draw_info(FILE *stream, const struct 
pipe_draw_info *state)
 
util_dump_member(stream, ptr, state, count_from_stream_output);
 
+   util_dump_member(stream, ptr, state, indirect);
+   util_dump_member(stream, uint, state, indirect_offset);
+
util_dump_struct_end(stream);
 }
 
diff --git a/src/gallium/docs/source/screen.rst 
b/src/gallium/docs/source/screen.rst
index d8cfb97..96f316a 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -151,6 +151,9 @@ The integer capabilities:
   dedicated memory should return 1 and all software rasterizers should return 
0.
 * ``PIPE_CAP_QUERY_PIPELINE_STATISTICS``: Whether 
PIPE_QUERY_PIPELINE_STATISTICS
   is supported.
+* ``PIPE_CAP_DRAW_INDIRECT``: Whether the driver supports taking draw arguments
+  { count, instance_count, start, index_bias } from a PIPE_BUFFER resource.
+  See pipe_draw_info.
 
 
 .. _pipe_capf:
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index 283d07f..2b13e29 100644
--- 

[Mesa-dev] [PATCH] st/mesa: add support for indirect drawing v2

2013-04-05 Thread Christoph Bumiller
v2:
Fix for constness of indirect buffer argument.
Remove separate extension enable for multi_draw_indirect.
---
 src/mesa/state_tracker/st_cb_bufferobjects.c |3 +++
 src/mesa/state_tracker/st_cb_bufferobjects.h |6 ++
 src/mesa/state_tracker/st_draw.c |   11 ++-
 src/mesa/state_tracker/st_extensions.c   |3 ++-
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_bufferobjects.c 
b/src/mesa/state_tracker/st_cb_bufferobjects.c
index 8ff32c8..2e719cc 100644
--- a/src/mesa/state_tracker/st_cb_bufferobjects.c
+++ b/src/mesa/state_tracker/st_cb_bufferobjects.c
@@ -205,6 +205,9 @@ st_bufferobj_data(struct gl_context *ctx,
case GL_UNIFORM_BUFFER:
   bind = PIPE_BIND_CONSTANT_BUFFER;
   break;
+   case GL_DRAW_INDIRECT_BUFFER:
+  bind = PIPE_BIND_COMMAND_ARGS_BUFFER;
+  break;
default:
   bind = 0;
}
diff --git a/src/mesa/state_tracker/st_cb_bufferobjects.h 
b/src/mesa/state_tracker/st_cb_bufferobjects.h
index 1c991d2..05cc0fa 100644
--- a/src/mesa/state_tracker/st_cb_bufferobjects.h
+++ b/src/mesa/state_tracker/st_cb_bufferobjects.h
@@ -54,6 +54,12 @@ st_buffer_object(struct gl_buffer_object *obj)
return (struct st_buffer_object *) obj;
 }
 
+static INLINE const struct st_buffer_object *
+st_const_buffer_object(const struct gl_buffer_object *obj)
+{
+   return (const struct st_buffer_object *) obj;
+}
+
 
 extern void
 st_bufferobj_validate_usage(struct st_context *st,
diff --git a/src/mesa/state_tracker/st_draw.c b/src/mesa/state_tracker/st_draw.c
index 82a4bcd..a07f8be 100644
--- a/src/mesa/state_tracker/st_draw.c
+++ b/src/mesa/state_tracker/st_draw.c
@@ -256,6 +256,14 @@ st_draw_vbo(struct gl_context *ctx,
   }
}
 
+   if (indirect) {
+  info.indirect = st_const_buffer_object(indirect)->buffer;
+
+  /* Primitive restart is not handled by the VBO module in this case. */
+  info.primitive_restart = ctx->Array._PrimitiveRestart;
+  info.restart_index = ctx->Array._RestartIndex;
+   }
+
/* do actual drawing */
for (i = 0; i < nr_prims; i++) {
   info.mode = translate_prim( ctx, prims[i].mode );
@@ -268,6 +276,7 @@ st_draw_vbo(struct gl_context *ctx,
  info.min_index = info.start;
  info.max_index = info.start + info.count - 1;
   }
+  info.indirect_offset = prims[i].indirect_offset;
 
   if (ST_DEBUG & DEBUG_DRAW) {
  debug_printf("st/draw: mode %s  start %u  count %u  indexed %d\n",
@@ -277,7 +286,7 @@ st_draw_vbo(struct gl_context *ctx,
   info.indexed);
   }
 
-  if (info.count_from_stream_output) {
+  if (info.count_from_stream_output || info.indirect) {
  cso_draw_vbo(st->cso_context, &info);
   }
   else if (info.primitive_restart) {
diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index 11db9d3..0488755 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -398,7 +398,8 @@ void st_init_extensions(struct st_context *st)
   { o(MESA_texture_array),   PIPE_CAP_MAX_TEXTURE_ARRAY_LAYERS 
},
 
   { o(OES_standard_derivatives), PIPE_CAP_SM3  
},
-  { o(ARB_texture_cube_map_array),   PIPE_CAP_CUBE_MAP_ARRAY   
}
+  { o(ARB_texture_cube_map_array),   PIPE_CAP_CUBE_MAP_ARRAY   
},
+  { o(ARB_draw_indirect),PIPE_CAP_DRAW_INDIRECT
},
};
 
/* Required: render target and sampler support */
-- 
1.7.3.4



[Mesa-dev] [PATCH] mesa/draw_indirect: fix index bounds

2013-04-05 Thread Christoph Bumiller
(Will be merged into the original patches.)

Calculating the actual limits is impossible, and softpipe drops
vertices that lie outside the specified range.
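
A small sketch of the arithmetic behind this, for reference: with unsigned 32-bit fields, start = 0 and count = 0 make start + count - 1 wrap around to ~0, so the advertised range becomes [0, ~0] and nothing is dropped. The helper names max_index_for and check_index_bounds are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression st_draw_vbo uses for non-indexed draws. */
static uint32_t
max_index_for(uint32_t start, uint32_t count)
{
   /* Unsigned arithmetic wraps modulo 2^32, so 0 + 0 - 1 is 0xffffffff. */
   return start + count - 1;
}

/* Self-check of the direct and the indirect case. */
static void
check_index_bounds(void)
{
   assert(max_index_for(10, 3) == 12);        /* direct: last vertex read */
   assert(max_index_for(0, 0) == UINT32_MAX); /* indirect: range [0, ~0] */
}
```

This relies on the fields being unsigned; with signed integers the same expression would give -1 instead.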
---
 src/gallium/auxiliary/util/u_draw.c |4 
 src/mesa/state_tracker/st_draw.c|3 +++
 src/mesa/vbo/vbo_exec_array.c   |8 
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_draw.c 
b/src/gallium/auxiliary/util/u_draw.c
index b9f8fcd..d13ccd4 100644
--- a/src/gallium/auxiliary/util/u_draw.c
+++ b/src/gallium/auxiliary/util/u_draw.c
@@ -161,6 +161,10 @@ util_draw_indirect(struct pipe_context *pipe,
info.index_bias = info_in->indexed ? params[3] : 0;
info.start_instance = info_in->indexed ? params[4] : params[3];
info.indirect = NULL;
+   if (!info_in->indexed) {
+  info.min_index = info.start;
+  info.max_index = info.start + info.count - 1;
+   }
 
pipe_buffer_unmap(pipe, transfer);
 
diff --git a/src/mesa/state_tracker/st_draw.c b/src/mesa/state_tracker/st_draw.c
index a07f8be..64470f7 100644
--- a/src/mesa/state_tracker/st_draw.c
+++ b/src/mesa/state_tracker/st_draw.c
@@ -273,6 +273,9 @@ st_draw_vbo(struct gl_context *ctx,
   info.instance_count = prims[i].num_instances;
   info.index_bias = prims[i].basevertex;
   if (!ib) {
+ /* NOTE: For indirect drawing, max_index correctly evaluates to ~0,
+  * since start and count will be 0.
+  */
  info.min_index = info.start;
  info.max_index = info.start + info.count - 1;
   }
diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c
index 75fda00..ba70b5b 100644
--- a/src/mesa/vbo/vbo_exec_array.c
+++ b/src/mesa/vbo/vbo_exec_array.c
@@ -1382,7 +1382,7 @@ vbo_validated_drawarraysindirect(struct gl_context *ctx,
 
check_buffers_are_unmapped(exec->array.inputs);
vbo->draw_prims(ctx, prim, 1,
-   NULL, GL_TRUE, 0, 0,
+   NULL, GL_TRUE, 0, ~0,
NULL,
ctx->DrawIndirectBuffer);
 
@@ -1422,7 +1422,7 @@ vbo_validated_multidrawarraysindirect(struct gl_context 
*ctx,
 
check_buffers_are_unmapped(exec->array.inputs);
vbo->draw_prims(ctx, prim, primcount,
-   NULL, GL_TRUE, 0, 0,
+   NULL, GL_TRUE, 0, ~0,
NULL,
ctx->DrawIndirectBuffer);
 
@@ -1458,7 +1458,7 @@ vbo_validated_drawelementsindirect(struct gl_context *ctx,
 
check_buffers_are_unmapped(exec->array.inputs);
vbo->draw_prims(ctx, prim, 1,
-   ib, GL_TRUE, 0, 0,
+   ib, GL_TRUE, 0, ~0,
NULL,
ctx->DrawIndirectBuffer);
 
@@ -1507,7 +1507,7 @@ vbo_validated_multidrawelementsindirect(struct gl_context 
*ctx,
 
check_buffers_are_unmapped(exec->array.inputs);
vbo->draw_prims(ctx, prim, primcount,
-   ib, GL_TRUE, 0, 0,
+   ib, GL_TRUE, 0, ~0,
NULL,
ctx->DrawIndirectBuffer);
 
-- 
1.7.3.4



Re: [Mesa-dev] [PATCH] tgsi: Add a conditional move instruction

2013-04-04 Thread Christoph Bumiller
On 04.04.2013 03:45, Zack Rusin wrote:
 It's part of SM4 (http://goo.gl/4IpeK). It's also fairly
 painful to emulate without branching. Most hardware
 supports it natively and even llvm has a 'select' opcode
 which can handle it without too much hassle.

 diff --git a/src/gallium/docs/source/tgsi.rst 
 b/src/gallium/docs/source/tgsi.rst
 index 28308cb..6c5a02b 100644
 --- a/src/gallium/docs/source/tgsi.rst
 +++ b/src/gallium/docs/source/tgsi.rst
 @@ -72,6 +72,17 @@ used.
  
dst.w = src.w
  
 +.. opcode:: MOVC - Conditional move
 +
 +.. math::
 +
 +  dst.x = src0.x ? src1.x : src2.x
 +
 +  dst.y = src0.y ? src1.y : src2.y
 +
 +  dst.z = src0.z ? src1.z : src2.z
 +
 +  dst.w = src0.w ? src1.w : src2.w
  

I think we already have that:

.. opcode:: UCMP - Integer Conditional Move

.. math::

  dst.x = src0.x ? src1.x : src2.x

  dst.y = src0.y ? src1.y : src2.y

  dst.z = src0.z ? src1.z : src2.z

  dst.w = src0.w ? src1.w : src2.w


No difference apart from the source ordering (the integer just implies
that any non-zero value counts as true, i.e. also inf, nan and -0).

And if you want more conditional ops, in theory we also have
predication, albeit support for that depends on the driver
(PIPE_SHADER_CAP_MAX_PREDS).
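
To make the inf/nan/-0 remark concrete, here is a rough sketch in plain C, assuming IEEE 754 single precision floats. It compares the CMP condition (float src0 < 0) with the UCMP condition as documented (any bit set in src0). The helper names float_bits, cmp_takes_src1 and ucmp_takes_src1 are made up for this example and do not exist in Mesa.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its raw 32-bit pattern. */
static uint32_t
float_bits(float f)
{
   uint32_t u;

   assert(sizeof(u) == sizeof(f));
   memcpy(&u, &f, sizeof(u));
   return u;
}

/* CMP: floating-point test, src0 < 0 selects src1. */
static bool
cmp_takes_src1(float src0)
{
   return src0 < 0.0f;
}

/* UCMP as documented: integer test, any non-zero bit pattern selects src1. */
static bool
ucmp_takes_src1(float src0)
{
   return float_bits(src0) != 0;
}
```

For -0.0f, NAN and INFINITY the UCMP test is true while the CMP test is false; for +0.0f both are false, and for -1.0f both are true.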



Re: [Mesa-dev] [PATCH] tgsi: Add a conditional move instruction

2013-04-04 Thread Christoph Bumiller
On 04.04.2013 16:53, Zack Rusin wrote:
 On 04.04.2013 03:45, Zack Rusin wrote:
 It's part of SM4 (http://goo.gl/4IpeK). It's also fairly
 painful to emulate without branching. Most hardware
 supports it natively and even llvm has a 'select' opcode
 which can handle it without too much hassle.

 diff --git a/src/gallium/docs/source/tgsi.rst
 b/src/gallium/docs/source/tgsi.rst
 index 28308cb..6c5a02b 100644
 --- a/src/gallium/docs/source/tgsi.rst
 +++ b/src/gallium/docs/source/tgsi.rst
 @@ -72,6 +72,17 @@ used.
  
dst.w = src.w
  
 +.. opcode:: MOVC - Conditional move
 +
 +.. math::
 +
 +  dst.x = src0.x ? src1.x : src2.x
 +
 +  dst.y = src0.y ? src1.y : src2.y
 +
 +  dst.z = src0.z ? src1.z : src2.z
 +
 +  dst.w = src0.w ? src1.w : src2.w
  
 I think we already have that:

 .. opcode:: UCMP - Integer Conditional Move

 .. math::

   dst.x = src0.x ? src1.x : src2.x

   dst.y = src0.y ? src1.y : src2.y

   dst.z = src0.z ? src1.z : src2.z

   dst.w = src0.w ? src1.w : src2.w


 No difference apart from the source ordering (the integer just implies
 that any non-zero value counts as true, i.e. also inf, nan and -0).
 That's really broken. UCMP needs to be an unsigned version of the CMP
 instruction which does
Did you mean signed version?
Would you mind doing an s/UCMP/ICMP in TGSI and then changing all the
UCMPs in other code to MOVC?
You're right, it would make more sense like this, though you might want
to call it IMOVC so the condition register isn't interpreted as a float
... or is it supposed to be?

 dst.chan = (src0.chan < 0) ? src1.chan : src2.chan
 not a whole new instruction. It's what everyone implements anyway. So if
 st_glsl_to_tgsi needs
 a conditional move we need to add the above patch and change it to use it.

 And if you want more conditional ops, in theory we also have
 predication, albeit support for that depends on the driver
 (PIPE_SHADER_CAP_MAX_PREDS).
 No, that's a completely different thing. 

 z



Re: [Mesa-dev] [PATCH] tgsi: Add a conditional move instruction

2013-04-04 Thread Christoph Bumiller
On 04.04.2013 17:01, Jose Fonseca wrote:

 - Original Message -
 On 04.04.2013 03:45, Zack Rusin wrote:
 It's part of SM4 (http://goo.gl/4IpeK). It's also fairly
 painful to emulate without branching. Most hardware
 supports it natively and even llvm has a 'select' opcode
 which can handle it without too much hassle.

 diff --git a/src/gallium/docs/source/tgsi.rst
 b/src/gallium/docs/source/tgsi.rst
 index 28308cb..6c5a02b 100644
 --- a/src/gallium/docs/source/tgsi.rst
 +++ b/src/gallium/docs/source/tgsi.rst
 @@ -72,6 +72,17 @@ used.
  
dst.w = src.w
  
 +.. opcode:: MOVC - Conditional move
 +
 +.. math::
 +
 +  dst.x = src0.x ? src1.x : src2.x
 +
 +  dst.y = src0.y ? src1.y : src2.y
 +
 +  dst.z = src0.z ? src1.z : src2.z
 +
 +  dst.w = src0.w ? src1.w : src2.w
  
 I think we already have that:

 .. opcode:: UCMP - Integer Conditional Move

 .. math::

   dst.x = src0.x ? src1.x : src2.x

   dst.y = src0.y ? src1.y : src2.y

   dst.z = src0.z ? src1.z : src2.z

   dst.w = src0.w ? src1.w : src2.w


 No difference apart from the source ordering (the integer just implies
 that any non-zero value counts as true, i.e. also inf, nan and -0).
 That's really broken. UCMP needs to be an unsigned version of the CMP
 instruction which does
 dst.chan = (src0.chan < 0) ? src1.chan : src2.chan
 not a whole new instruction. It's what everyone implements anyway. So if
 st_glsl_to_tgsi needs
 a conditional move we need to add the above patch and change it to use it.
 Yes, it doesn't seem that any of the TGSI_OPCODE_UCMP implementations do
 what the spec says it supposedly does -- it seems everybody implements it as
 an unsigned version of CMP. That is, it seems UCMP's description needs to be
 fixed.

Erm, unsigned < 0 doesn't make sense.

Definitely what the description says:
static void
micro_ucmp(union tgsi_exec_channel *dst,
   const union tgsi_exec_channel *src0,
   const union tgsi_exec_channel *src1,
   const union tgsi_exec_channel *src2)
{
   dst->u[0] = src0->u[0] ? src1->u[0] : src2->u[0];
   dst->u[1] = src0->u[1] ? src1->u[1] : src2->u[1];
   dst->u[2] = src0->u[2] ? src1->u[2] : src2->u[2];
   dst->u[3] = src0->u[3] ? src1->u[3] : src2->u[3];
}

or

   case TGSI_OPCODE_UCMP:
   case TGSI_OPCODE_CMP:
  FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
 src0 = fetchSrc(0, c);
 src1 = fetchSrc(1, c);
 src2 = fetchSrc(2, c);
 if (src1 == src2)
mkMov(dst0[c], src1);
 else
mkCmp(OP_SLCT, (srcTy == TYPE_F32) ? CC_LT(less than 0) :
CC_NE(not equal 0),
  srcTy, dst0[c], src1, src2, src0);
  }


 Jose




Re: [Mesa-dev] [PATCH] tgsi: Add a conditional move instruction

2013-04-04 Thread Christoph Bumiller
On 04.04.2013 17:23, Jose Fonseca wrote:

 - Original Message -
 On 04.04.2013 17:01, Jose Fonseca wrote:
 - Original Message -
 On 04.04.2013 03:45, Zack Rusin wrote:
 It's part of SM4 (http://goo.gl/4IpeK). It's also fairly
 painful to emulate without branching. Most hardware
 supports it natively and even llvm has a 'select' opcode
 which can handle it without too much hassle.

 diff --git a/src/gallium/docs/source/tgsi.rst
 b/src/gallium/docs/source/tgsi.rst
 index 28308cb..6c5a02b 100644
 --- a/src/gallium/docs/source/tgsi.rst
 +++ b/src/gallium/docs/source/tgsi.rst
 @@ -72,6 +72,17 @@ used.
  
dst.w = src.w
  
 +.. opcode:: MOVC - Conditional move
 +
 +.. math::
 +
 +  dst.x = src0.x ? src1.x : src2.x
 +
 +  dst.y = src0.y ? src1.y : src2.y
 +
 +  dst.z = src0.z ? src1.z : src2.z
 +
 +  dst.w = src0.w ? src1.w : src2.w
  
 I think we already have that:

 .. opcode:: UCMP - Integer Conditional Move

 .. math::

   dst.x = src0.x ? src1.x : src2.x

   dst.y = src0.y ? src1.y : src2.y

   dst.z = src0.z ? src1.z : src2.z

   dst.w = src0.w ? src1.w : src2.w


 No difference apart from the source ordering (the integer just implies
 that any non-zero value counts as true, i.e. also inf, nan and -0).
 That's really broken. UCMP needs to be an unsigned version of the CMP
 instruction which does
 dst.chan = (src0.chan < 0) ? src1.chan : src2.chan
 not a whole new instruction. It's what everyone implements anyway. So if
 st_glsl_to_tgsi needs
 a conditional move we need to add the above patch and change it to use it.
 Yes, it doesn't seem that any of the TGSI_OPCODE_UCMP implementations do
 what the spec says it supposedly does -- it seems everybody implements it
 as an unsigned version of CMP. That is, it seems UCMP's description needs
 to be fixed.
 Erm, unsigned < 0 doesn't make sense.
 Ah indeed!

 Definitely what the description says:
 static void
 micro_ucmp(union tgsi_exec_channel *dst,
            const union tgsi_exec_channel *src0,
            const union tgsi_exec_channel *src1,
            const union tgsi_exec_channel *src2)
 {
    dst->u[0] = src0->u[0] ? src1->u[0] : src2->u[0];
    dst->u[1] = src0->u[1] ? src1->u[1] : src2->u[1];
    dst->u[2] = src0->u[2] ? src1->u[2] : src2->u[2];
    dst->u[3] = src0->u[3] ? src1->u[3] : src2->u[3];
 }

 or

case TGSI_OPCODE_UCMP:
case TGSI_OPCODE_CMP:
   FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
  src0 = fetchSrc(0, c);
  src1 = fetchSrc(1, c);
  src2 = fetchSrc(2, c);
  if (src1 == src2)
 mkMov(dst0[c], src1);
  else
 mkCmp(OP_SLCT, (srcTy == TYPE_F32) ? CC_LT(less than 0) :
 CC_NE(not equal 0),
   srcTy, dst0[c], src1, src2, src0);
   }

 But oddly enough, the implementations I happened to look at seemed to do
 foo >= 0:

Well, some people can't read documentation ... or they rely on the
condition value always being a glsl-to-tgsi boolean which is only either
0 or ~0/-1.

 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c has:

 static void emit_ucmp(
 const struct lp_build_tgsi_action * action,
 struct lp_build_tgsi_context * bld_base,
 struct lp_build_emit_data * emit_data)
 {
 LLVMBuilderRef builder = bld_base->base.gallivm->builder;

 LLVMValueRef v = LLVMBuildFCmp(builder, LLVMRealUGE,
 emit_data->args[0],
 lp_build_const_float(bld_base->base.gallivm, 0.), "");

 emit_data->output[emit_data->chan] = LLVMBuildSelect(builder, v,
 emit_data->args[2], emit_data->args[1], "");
 }

 (it doesn't even seem to do integers at all)

 src/gallium/drivers/r600/r600_shader.c:

 static int tgsi_ucmp(struct r600_shader_ctx *ctx)
 {
    struct tgsi_full_instruction *inst =
       &ctx->parse.FullToken.FullInstruction;
    struct r600_bytecode_alu alu;
    int i, r;
    int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);

    for (i = 0; i < lasti + 1; i++) {
       if (!(inst->Dst[0].Register.WriteMask & (1 << i)))
          continue;

       memset(&alu, 0, sizeof(struct r600_bytecode_alu));
       alu.op = ALU_OP3_CNDGE_INT;
       r600_bytecode_src(&alu.src[0], &ctx->src[0], i);
       r600_bytecode_src(&alu.src[1], &ctx->src[2], i);
       r600_bytecode_src(&alu.src[2], &ctx->src[1], i);
       tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
       alu.dst.chan = i;
       alu.dst.write = 1;
       alu.is_op3 = 1;
       if (i == lasti)
          alu.last = 1;
       r = r600_bytecode_add_alu(ctx->bc, &alu);
       if (r)
          return r;
    }
    return 0;
 }
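
A rough sketch of why the two readings only differ for non-boolean conditions, assuming (as the r600 code quoted above appears to do with CNDGE_INT and swapped sources) that the hardware variant tests the condition as a signed integer less than zero. The function names ucmp_nonzero, ucmp_signed_lt and variants_agree are hypothetical; the cast to int32_t assumes the usual two's complement conversion.

```c
#include <assert.h>
#include <stdint.h>

/* UCMP as documented: any non-zero condition selects a. */
static uint32_t
ucmp_nonzero(uint32_t cond, uint32_t a, uint32_t b)
{
   return cond != 0 ? a : b;
}

/* Variant that treats the condition as signed and tests "< 0". */
static uint32_t
ucmp_signed_lt(uint32_t cond, uint32_t a, uint32_t b)
{
   return (int32_t)cond < 0 ? a : b;
}

/* Returns 1 if both variants pick the same operand for this condition. */
static int
variants_agree(uint32_t cond)
{
   const uint32_t a = 1, b = 2;

   return ucmp_nonzero(cond, a, b) == ucmp_signed_lt(cond, a, b);
}

/* Self-check: booleans 0 and ~0 agree, a plain 1 does not. */
static void
check_boolean_conditions(void)
{
   assert(variants_agree(0));
   assert(variants_agree(~(uint32_t)0));
   assert(!variants_agree(1));
}
```

So as long as the condition is always a glsl-to-tgsi boolean (0 or ~0), both variants give the same result; any other non-zero value with the sign bit clear exposes the difference.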




[Mesa-dev] [PATCH 1/4] mesa: implement GL_ARB_draw_indirect

2013-04-04 Thread Christoph Bumiller
---
 src/mapi/glapi/gen/Makefile.am   |1 +
 src/mapi/glapi/gen/gl_API.xml|4 +-
 src/mesa/drivers/dri/i965/brw_draw.c |3 +-
 src/mesa/drivers/dri/i965/brw_draw.h |3 +-
 src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c |9 +-
 src/mesa/main/api_validate.c |  159 
 src/mesa/main/api_validate.h |   26 +++
 src/mesa/main/bufferobj.c|9 +
 src/mesa/main/dd.h   |   12 ++
 src/mesa/main/dlist.c|   41 
 src/mesa/main/extensions.c   |2 +
 src/mesa/main/get.c  |5 +
 src/mesa/main/get_hash_params.py |2 +
 src/mesa/main/mtypes.h   |4 +
 src/mesa/main/tests/dispatch_sanity.cpp  |8 +-
 src/mesa/main/vtxfmt.c   |7 +
 src/mesa/state_tracker/st_cb_rasterpos.c |2 +-
 src/mesa/state_tracker/st_draw.c |3 +-
 src/mesa/state_tracker/st_draw.h |6 +-
 src/mesa/state_tracker/st_draw_feedback.c|3 +-
 src/mesa/tnl/tnl.h   |3 +-
 src/mesa/vbo/vbo.h   |5 +-
 src/mesa/vbo/vbo_exec_array.c|  255 +-
 src/mesa/vbo/vbo_exec_draw.c |2 +-
 src/mesa/vbo/vbo_primitive_restart.c |4 +-
 src/mesa/vbo/vbo_rebase.c|2 +-
 src/mesa/vbo/vbo_save_api.c  |   53 ++
 src/mesa/vbo/vbo_save_draw.c |2 +-
 src/mesa/vbo/vbo_split_copy.c|2 +-
 src/mesa/vbo/vbo_split_inplace.c |2 +-
 30 files changed, 611 insertions(+), 28 deletions(-)

diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index 36e47e2..243c148 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -96,6 +96,7 @@ API_XML = \
ARB_depth_clamp.xml \
ARB_draw_buffers_blend.xml \
ARB_draw_elements_base_vertex.xml \
+   ARB_draw_indirect.xml \
ARB_draw_instanced.xml \
ARB_ES2_compatibility.xml \
ARB_ES3_compatibility.xml \
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index df95924..f22fdac 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8240,6 +8240,8 @@
 
 <!-- ARB extensions #86...#93 -->
 
+<xi:include href="ARB_draw_indirect.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
+
 <category name="GL_ARB_transform_feedback3" number="94">
   <enum name="MAX_TRANSFORM_FEEDBACK_BUFFERS" value="0x8E70"/>
   <enum name="MAX_VERTEX_STREAMS" value="0x8E71"/>
@@ -8317,7 +8319,7 @@
 
 <xi:include href="ARB_invalidate_subdata.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
 
-<!-- ARB extensions #133...#138 -->
+<!-- ARB extensions #134...#138 -->
 
 <xi:include href="ARB_texture_buffer_range.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
 
diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index 809bcc5..d0c8415 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -548,7 +548,8 @@ void brw_draw_prims( struct gl_context *ctx,
 GLboolean index_bounds_valid,
 GLuint min_index,
 GLuint max_index,
-struct gl_transform_feedback_object *tfb_vertcount )
+struct gl_transform_feedback_object *tfb_vertcount,
+struct gl_buffer_object *indirect )
 {
struct intel_context *intel = intel_context(ctx);
const struct gl_client_array **arrays = ctx->Array._DrawArrays;
diff --git a/src/mesa/drivers/dri/i965/brw_draw.h 
b/src/mesa/drivers/dri/i965/brw_draw.h
index d86a9e7..3dfac2e 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.h
+++ b/src/mesa/drivers/dri/i965/brw_draw.h
@@ -41,7 +41,8 @@ void brw_draw_prims( struct gl_context *ctx,
 GLboolean index_bounds_valid,
 GLuint min_index,
 GLuint max_index,
-struct gl_transform_feedback_object *tfb_vertcount );
+struct gl_transform_feedback_object *tfb_vertcount,
+struct gl_buffer_object *indirect );
 
 void brw_draw_init( struct brw_context *brw );
 void brw_draw_destroy( struct brw_context *brw );
diff --git a/src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c 
b/src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c
index 436db32..4dee0b8 100644
--- a/src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c
+++ b/src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c
@@ -222,7 +222,8 @@ TAG(vbo_render_prims)(struct gl_context *ctx,
  const struct _mesa_index_buffer *ib,
  GLboolean index_bounds_valid,
  GLuint min_index, GLuint max_index,
- struct gl_transform_feedback_object *tfb_vertcount);
+

[Mesa-dev] [PATCH 4/4] st/mesa: add support for indirect drawing

2013-04-04 Thread Christoph Bumiller
---
 src/mesa/state_tracker/st_cb_bufferobjects.c |3 +++
 src/mesa/state_tracker/st_draw.c |   11 ++-
 src/mesa/state_tracker/st_extensions.c   |4 +++-
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_bufferobjects.c 
b/src/mesa/state_tracker/st_cb_bufferobjects.c
index 8ff32c8..5a44bf2 100644
--- a/src/mesa/state_tracker/st_cb_bufferobjects.c
+++ b/src/mesa/state_tracker/st_cb_bufferobjects.c
@@ -205,6 +205,9 @@ st_bufferobj_data(struct gl_context *ctx,
case GL_UNIFORM_BUFFER:
   bind = PIPE_BIND_CONSTANT_BUFFER;
   break;
+   case GL_DRAW_INDIRECT_BUFFER:
+  bind = PIPE_BIND_COMMAND_BUFFER;
+  break;
default:
   bind = 0;
}
diff --git a/src/mesa/state_tracker/st_draw.c b/src/mesa/state_tracker/st_draw.c
index ee1c902..f1379ab 100644
--- a/src/mesa/state_tracker/st_draw.c
+++ b/src/mesa/state_tracker/st_draw.c
@@ -256,6 +256,14 @@ st_draw_vbo(struct gl_context *ctx,
   }
}
 
+   if (indirect) {
+  info.indirect = st_buffer_object(indirect)->buffer;
+
+  /* Primitive restart is not handled by the VBO module in this case. */
+  info.primitive_restart = ctx->Array._PrimitiveRestart;
+  info.restart_index = ctx->Array._RestartIndex;
+   }
+
/* do actual drawing */
for (i = 0; i < nr_prims; i++) {
   info.mode = translate_prim( ctx, prims[i].mode );
@@ -268,6 +276,7 @@ st_draw_vbo(struct gl_context *ctx,
  info.min_index = info.start;
  info.max_index = info.start + info.count - 1;
   }
+  info.indirect_offset = prims[i].indirect_offset;
 
   if (ST_DEBUG  DEBUG_DRAW) {
  debug_printf(st/draw: mode %s  start %u  count %u  indexed %d\n,
@@ -277,7 +286,7 @@ st_draw_vbo(struct gl_context *ctx,
   info.indexed);
   }
 
-  if (info.count_from_stream_output) {
+  if (info.count_from_stream_output || info.indirect) {
  cso_draw_vbo(st->cso_context, &info);
   }
   else if (info.primitive_restart) {
diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index 11db9d3..c021cda 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -398,7 +398,9 @@ void st_init_extensions(struct st_context *st)
   { o(MESA_texture_array),   PIPE_CAP_MAX_TEXTURE_ARRAY_LAYERS },
 
   { o(OES_standard_derivatives), PIPE_CAP_SM3 },
-  { o(ARB_texture_cube_map_array),   PIPE_CAP_CUBE_MAP_ARRAY }
+  { o(ARB_texture_cube_map_array),   PIPE_CAP_CUBE_MAP_ARRAY },
+  { o(ARB_draw_indirect),PIPE_CAP_DRAW_INDIRECT },
+  { o(ARB_multi_draw_indirect),  PIPE_CAP_DRAW_INDIRECT }
};
 
/* Required: render target and sampler support */
-- 
1.7.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] gallium: add facilities for indirect drawing

2013-04-04 Thread Christoph Bumiller
---
 src/gallium/auxiliary/util/u_draw.c  |   39 ++
 src/gallium/auxiliary/util/u_draw.h  |5 +++
 src/gallium/auxiliary/util/u_dump_state.c|3 ++
 src/gallium/docs/source/screen.rst   |3 ++
 src/gallium/drivers/freedreno/freedreno_screen.c |1 +
 src/gallium/drivers/i915/i915_screen.c   |1 +
 src/gallium/drivers/llvmpipe/lp_draw_arrays.c|5 +++
 src/gallium/drivers/llvmpipe/lp_screen.c |2 +
 src/gallium/drivers/nv30/nv30_screen.c   |1 +
 src/gallium/drivers/nv50/nv50_screen.c   |2 +
 src/gallium/drivers/r300/r300_screen.c   |1 +
 src/gallium/drivers/r600/r600_pipe.c |1 +
 src/gallium/drivers/radeonsi/radeonsi_pipe.c |1 +
 src/gallium/drivers/softpipe/sp_draw_arrays.c|6 +++
 src/gallium/drivers/softpipe/sp_screen.c |2 +
 src/gallium/drivers/svga/svga_screen.c   |1 +
 src/gallium/drivers/trace/tr_dump_state.c|3 ++
 src/gallium/include/pipe/p_defines.h |3 +-
 src/gallium/include/pipe/p_state.h   |   22 
 19 files changed, 101 insertions(+), 1 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_draw.c 
b/src/gallium/auxiliary/util/u_draw.c
index 83d9284..7a28cf1 100644
--- a/src/gallium/auxiliary/util/u_draw.c
+++ b/src/gallium/auxiliary/util/u_draw.c
@@ -27,6 +27,7 @@
 
 
 #include "util/u_debug.h"
+#include "util/u_inlines.h"
 #include "util/u_math.h"
 #include "util/u_format.h"
 #include "util/u_draw.h"
@@ -123,3 +124,41 @@ util_draw_max_index(
 
return max_index + 1;
 }
+
+
+void
+util_draw_indirect(struct pipe_context *pipe,
+   const struct pipe_draw_info *_info)
+{
+   struct pipe_draw_info info;
+   struct pipe_transfer *transfer;
+   uint32_t *params;
+
+   assert(_info->indirect);
+   assert(!_info->count_from_stream_output);
+
+   memcpy(&info, _info, sizeof(info));
+
+   params = (uint32_t *)
+  pipe_buffer_map_range(pipe,
+_info->indirect,
+_info->indirect_offset,
+_info->indexed ? (5 * 4) : (4 * 4),
+PIPE_TRANSFER_READ,
+&transfer);
+   if (!transfer) {
+  debug_printf("%s: failed to map indirect buffer\n", __FUNCTION__);
+  return;
+   }
+
+   info.count = params[0];
+   info.instance_count = params[1];
+   info.start = params[2];
+   info.index_bias = _info->indexed ? params[3] : 0;
+   info.start_instance = _info->indexed ? params[4] : params[3];
+   info.indirect = NULL;
+
+   pipe_buffer_unmap(pipe, transfer);
+
+   pipe->draw_vbo(pipe, &info);
+}
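
For reference, the way util_draw_indirect() picks draw arguments out of the indirect buffer can be sketched on its own. This is only a minimal standalone illustration, assuming the GL command layouts with a trailing baseInstance word (five 32-bit words for indexed draws, four for non-indexed ones); the names draw_params, indirect_params_size and decode_indirect_params are hypothetical and not part of Mesa or Gallium.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for decoded draw arguments (not a Mesa type). */
struct draw_params {
   uint32_t count;
   uint32_t instance_count;
   uint32_t start;
   int32_t  index_bias;
   uint32_t start_instance;
};

/* Number of bytes that have to be readable in the indirect buffer.
 * Indexed:     count, instanceCount, firstIndex, baseVertex, baseInstance.
 * Non-indexed: count, instanceCount, first, baseInstance. */
static unsigned
indirect_params_size(int indexed)
{
   return (indexed ? 5u : 4u) * (unsigned)sizeof(uint32_t);
}

/* Copy the raw words into named fields. Non-indexed draws have no
 * baseVertex word, so index_bias is forced to 0 for them. */
static void
decode_indirect_params(const uint32_t *params, int indexed,
                       struct draw_params *out)
{
   assert(params && out);

   out->count = params[0];
   out->instance_count = params[1];
   out->start = params[2];
   out->index_bias = indexed ? (int32_t)params[3] : 0;
   out->start_instance = indexed ? params[4] : params[3];
}
```

A caller would map indirect_params_size(indexed) bytes starting at the indirect offset and pass the mapped pointer to decode_indirect_params(), so that the last word read always lies inside the mapped range.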
diff --git a/src/gallium/auxiliary/util/u_draw.h 
b/src/gallium/auxiliary/util/u_draw.h
index 3dc6918..acec56e 100644
--- a/src/gallium/auxiliary/util/u_draw.h
+++ b/src/gallium/auxiliary/util/u_draw.h
@@ -142,6 +142,11 @@ util_draw_range_elements(struct pipe_context *pipe,
 }
 
 
+void
+util_draw_indirect(struct pipe_context *pipe,
+   const struct pipe_draw_info *info);
+
+
 unsigned
 util_draw_max_index(
   const struct pipe_vertex_buffer *vertex_buffers,
diff --git a/src/gallium/auxiliary/util/u_dump_state.c 
b/src/gallium/auxiliary/util/u_dump_state.c
index 2f28f3c..21b6044 100644
--- a/src/gallium/auxiliary/util/u_dump_state.c
+++ b/src/gallium/auxiliary/util/u_dump_state.c
@@ -758,6 +758,9 @@ util_dump_draw_info(FILE *stream, const struct 
pipe_draw_info *state)
 
util_dump_member(stream, ptr, state, count_from_stream_output);
 
+   util_dump_member(stream, ptr, state, indirect);
+   util_dump_member(stream, uint, state, indirect_offset);
+
util_dump_struct_end(stream);
 }
 
diff --git a/src/gallium/docs/source/screen.rst 
b/src/gallium/docs/source/screen.rst
index f8cdded..ed4749d 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -151,6 +151,9 @@ The integer capabilities:
   dedicated memory should return 1 and all software rasterizers should return 
0.
 * ``PIPE_CAP_QUERY_PIPELINE_STATISTICS``: Whether 
PIPE_QUERY_PIPELINE_STATISTICS
   is supported.
+* ``PIPE_CAP_DRAW_INDIRECT``: Whether the driver supports taking draw arguments
+  { count, instance_count, start, index_bias } from a PIPE_BUFFER resource.
+  See pipe_draw_info.
 
 
 .. _pipe_capf:
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index 283d07f..2b13e29 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -200,6 +200,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_USER_VERTEX_BUFFERS:
case PIPE_CAP_USER_INDEX_BUFFERS:
case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
+   case PIPE_CAP_DRAW_INDIRECT:
return 0;
 
/* Stream output. */
diff --git a/src/gallium/drivers/i915/i915_screen.c 

[Mesa-dev] [PATCH] mesa: implement GL_ARB_draw_indirect (added missing ARB_draw_indirect.xml)

2013-04-04 Thread Christoph Bumiller
---
 src/mapi/glapi/gen/ARB_draw_indirect.xml |   45 +
 src/mapi/glapi/gen/Makefile.am   |1 +
 src/mapi/glapi/gen/gl_API.xml|4 +-
 src/mesa/drivers/dri/i965/brw_draw.c |3 +-
 src/mesa/drivers/dri/i965/brw_draw.h |3 +-
 src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c |9 +-
 src/mesa/main/api_validate.c |  159 
 src/mesa/main/api_validate.h |   26 +++
 src/mesa/main/bufferobj.c|9 +
 src/mesa/main/dd.h   |   12 ++
 src/mesa/main/dlist.c|   41 
 src/mesa/main/extensions.c   |2 +
 src/mesa/main/get.c  |5 +
 src/mesa/main/get_hash_params.py |2 +
 src/mesa/main/mtypes.h   |4 +
 src/mesa/main/tests/dispatch_sanity.cpp  |8 +-
 src/mesa/main/vtxfmt.c   |7 +
 src/mesa/state_tracker/st_cb_rasterpos.c |2 +-
 src/mesa/state_tracker/st_draw.c |3 +-
 src/mesa/state_tracker/st_draw.h |6 +-
 src/mesa/state_tracker/st_draw_feedback.c|3 +-
 src/mesa/tnl/tnl.h   |3 +-
 src/mesa/vbo/vbo.h   |5 +-
 src/mesa/vbo/vbo_exec_array.c|  255 +-
 src/mesa/vbo/vbo_exec_draw.c |2 +-
 src/mesa/vbo/vbo_primitive_restart.c |4 +-
 src/mesa/vbo/vbo_rebase.c|2 +-
 src/mesa/vbo/vbo_save_api.c  |   53 ++
 src/mesa/vbo/vbo_save_draw.c |2 +-
 src/mesa/vbo/vbo_split_copy.c|2 +-
 src/mesa/vbo/vbo_split_inplace.c |2 +-
 31 files changed, 656 insertions(+), 28 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_draw_indirect.xml

diff --git a/src/mapi/glapi/gen/ARB_draw_indirect.xml 
b/src/mapi/glapi/gen/ARB_draw_indirect.xml
new file mode 100644
index 000..7de03cd
--- /dev/null
+++ b/src/mapi/glapi/gen/ARB_draw_indirect.xml
@@ -0,0 +1,45 @@
+<?xml version="1.0"?>
+<!DOCTYPE OpenGLAPI SYSTEM "gl_API.dtd">
+
+<OpenGLAPI>
+
+<category name="GL_ARB_draw_indirect" number="87">
+
+    <enum name="DRAW_INDIRECT_BUFFER"           value="0x8F3F"/>
+    <enum name="DRAW_INDIRECT_BUFFER_BINDING"   value="0x8F43"/>
+
+    <function name="DrawArraysIndirect" offset="assign" exec="dynamic">
+        <param name="mode" type="GLenum"/>
+        <param name="indirect" type="const GLvoid *"/>
+    </function>
+
+    <function name="DrawElementsIndirect" offset="assign" exec="dynamic">
+        <param name="mode" type="GLenum"/>
+        <param name="type" type="GLenum"/>
+        <param name="indirect" type="const GLvoid *"/>
+    </function>
+
+</category>
+
+
+<category name="GL_ARB_multi_draw_indirect" number="133">
+
+    <function name="MultiDrawArraysIndirect" offset="assign" exec="dynamic">
+        <param name="mode" type="GLenum"/>
+        <param name="indirect" type="const GLvoid *"/>
+        <param name="primcount" type="GLsizei"/>
+        <param name="stride" type="GLsizei"/>
+    </function>
+
+    <function name="MultiDrawElementsIndirect" offset="assign" exec="dynamic">
+        <param name="mode" type="GLenum"/>
+        <param name="type" type="GLenum"/>
+        <param name="indirect" type="const GLvoid *"/>
+        <param name="primcount" type="GLsizei"/>
+        <param name="stride" type="GLsizei"/>
+    </function>
+
+</category>
+
+
+</OpenGLAPI>
diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index 36e47e2..243c148 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -96,6 +96,7 @@ API_XML = \
ARB_depth_clamp.xml \
ARB_draw_buffers_blend.xml \
ARB_draw_elements_base_vertex.xml \
+   ARB_draw_indirect.xml \
ARB_draw_instanced.xml \
ARB_ES2_compatibility.xml \
ARB_ES3_compatibility.xml \
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index df95924..f22fdac 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8240,6 +8240,8 @@
 
 <!-- ARB extensions #86...#93 -->
 
+<xi:include href="ARB_draw_indirect.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
+
 <category name="GL_ARB_transform_feedback3" number="94">
   <enum name="MAX_TRANSFORM_FEEDBACK_BUFFERS" value="0x8E70"/>
   <enum name="MAX_VERTEX_STREAMS" value="0x8E71"/>
@@ -8317,7 +8319,7 @@
 
 <xi:include href="ARB_invalidate_subdata.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
 
-<!-- ARB extensions #133...#138 -->
+<!-- ARB extensions #134...#138 -->
 
 <xi:include href="ARB_texture_buffer_range.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
 
diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index 809bcc5..d0c8415 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -548,7 +548,8 @@ void brw_draw_prims( struct gl_context *ctx,
 GLboolean index_bounds_valid,
 GLuint min_index,
   

Re: [Mesa-dev] [PATCH 2/4] gallium: add PIPE_BIND_COMMAND_BUFFER

2013-04-04 Thread Christoph Bumiller
On 04.04.2013 21:44, Jose Fonseca wrote:
 I think that PIPE_BIND_INDIRECT_BUFFER would be more self-descriptive.

 Or do you envision other uses of such buffer?

It's possible that at some point we add a mechanism to let the driver
store arbitrary commands into a buffer created by the st, or have
resources used as arguments for conditional rendering ...
Lots of possibilities, but nothing concrete, and for command lists
like with D3D's deferred contexts we'd probably return opaque objects
that can contain more auxiliary data.
I'd like it to be more generic, but then it could turn out that there
are different requirements on these command source buffers in the future
... I'm undecided now.



 Jose

 - Original Message -
 Intended for use with GL_ARB_draw_indirect's DRAW_INDIRECT_BUFFER
 target or for D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS.
 ---
  src/gallium/docs/source/screen.rst   |2 ++
  src/gallium/include/pipe/p_defines.h |1 +
  2 files changed, 3 insertions(+), 0 deletions(-)

 diff --git a/src/gallium/docs/source/screen.rst
 b/src/gallium/docs/source/screen.rst
 index c1a3c0b..f8cdded 100644
 --- a/src/gallium/docs/source/screen.rst
 +++ b/src/gallium/docs/source/screen.rst
 @@ -306,6 +306,8 @@ resources might be created and handled quite differently.
bound to the graphics pipeline as a shader resource.
  * ``PIPE_BIND_COMPUTE_RESOURCE``: A buffer or texture that can be
bound to the compute program as a shader resource.
 +* ``PIPE_BIND_COMMAND_BUFFER``: A buffer that may be sourced by the
 +  GPU command processor, like with indirect drawing.
  
  .. _pipe_usage:
  
 diff --git a/src/gallium/include/pipe/p_defines.h
 b/src/gallium/include/pipe/p_defines.h
 index 5b00acc..2b79f2a 100644
 --- a/src/gallium/include/pipe/p_defines.h
 +++ b/src/gallium/include/pipe/p_defines.h
 @@ -315,6 +315,7 @@ enum pipe_flush_flags {
  #define PIPE_BIND_GLOBAL   (1 << 18) /* set_global_binding */
  #define PIPE_BIND_SHADER_RESOURCE  (1 << 19) /* set_shader_resources */
  #define PIPE_BIND_COMPUTE_RESOURCE (1 << 20) /* set_compute_resources */
 +#define PIPE_BIND_COMMAND_BUFFER   (1 << 21) /* pipe_draw_info.indirect */
  
  /* The first two flags above were previously part of the amorphous
   * TEXTURE_USAGE, most of which are now descriptions of the ways a
 --
 1.7.3.4


Re: [Mesa-dev] [PATCH] st/mesa: fix bitmap, drawpix, drawtex for PIPE_CAP_TGSI_TEXCOORD

2013-04-02 Thread Christoph Bumiller
On 02.04.2013 16:39, Brian Paul wrote:
 On 03/30/2013 08:11 AM, Christoph Bumiller wrote:
 NOTE: Changed the semantic index for the drawtex coordinate to
 be the texture unit index instead of always 0.
 Not sure if this is correct but since the value seems to depend
 on the unit it would make sense to use different varying slots.

 Tested-by: Brian Paul bri...@vmware.com
Thanks !
Just to be sure, you're referring to the part that changes the semantic
index so that TEX0..7(max units) is used instead of always TEX0, right ?
I'll push that as a separate patch then.


[Mesa-dev] [PATCH] st/mesa: fix bitmap, drawpix, drawtex for PIPE_CAP_TGSI_TEXCOORD

2013-03-30 Thread Christoph Bumiller
NOTE: Changed the semantic index for the drawtex coordinate to
be the texture unit index instead of always 0.
Not sure if this is correct but since the value seems to depend
on the unit it would make sense to use different varying slots.
---
 src/mesa/state_tracker/st_cb_bitmap.c |1 +
 src/mesa/state_tracker/st_cb_drawpixels.c |5 -
 src/mesa/state_tracker/st_cb_drawtex.c|5 +++--
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_bitmap.c 
b/src/mesa/state_tracker/st_cb_bitmap.c
index bae9ff8..0513814 100644
--- a/src/mesa/state_tracker/st_cb_bitmap.c
+++ b/src/mesa/state_tracker/st_cb_bitmap.c
@@ -766,6 +766,7 @@ st_Bitmap(struct gl_context *ctx, GLint x, GLint y,
   /* create pass-through vertex shader now */
   const uint semantic_names[] = { TGSI_SEMANTIC_POSITION,
   TGSI_SEMANTIC_COLOR,
+st->needs_texcoord_semantic ? TGSI_SEMANTIC_TEXCOORD :
   TGSI_SEMANTIC_GENERIC };
   const uint semantic_indexes[] = { 0, 0, 0 };
   st->bitmap.vs = util_make_vertex_passthrough_shader(st->pipe, 3,
diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c 
b/src/mesa/state_tracker/st_cb_drawpixels.c
index f0baa34..b25b776 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -294,6 +294,9 @@ static void *
 make_passthrough_vertex_shader(struct st_context *st, 
GLboolean passColor)
 {
+   const unsigned texcoord_semantic = st->needs_texcoord_semantic ?
+  TGSI_SEMANTIC_TEXCOORD : TGSI_SEMANTIC_GENERIC;
+
if (!st->drawpix.vert_shaders[passColor]) {
   struct ureg_program *ureg = ureg_create( TGSI_PROCESSOR_VERTEX );
 
@@ -307,7 +310,7 @@ make_passthrough_vertex_shader(struct st_context *st,
   
   /* MOV result.texcoord0, vertex.attr[1]; */
   ureg_MOV(ureg, 
-   ureg_DECL_output( ureg, TGSI_SEMANTIC_GENERIC, 0 ),
+   ureg_DECL_output( ureg, texcoord_semantic, 0 ),
ureg_DECL_vs_input( ureg, 1 ));
   
   if (passColor) {
diff --git a/src/mesa/state_tracker/st_cb_drawtex.c 
b/src/mesa/state_tracker/st_cb_drawtex.c
index a8806c9..fc1cb7d 100644
--- a/src/mesa/state_tracker/st_cb_drawtex.c
+++ b/src/mesa/state_tracker/st_cb_drawtex.c
@@ -209,8 +209,9 @@ st_DrawTex(struct gl_context *ctx, GLfloat x, GLfloat y, 
GLfloat z,
 SET_ATTRIB(2, attr, s1, t1, 0.0f, 1.0f);  /* upper right */
 SET_ATTRIB(3, attr, s0, t1, 0.0f, 1.0f);  /* upper left */
 
-semantic_names[attr] = TGSI_SEMANTIC_GENERIC;
-semantic_indexes[attr] = 0;
+semantic_names[attr] = st->needs_texcoord_semantic ?
+   TGSI_SEMANTIC_TEXCOORD : TGSI_SEMANTIC_GENERIC;
+semantic_indexes[attr] = i;
 
 attr++;
  }
-- 
1.7.3.4


Re: [Mesa-dev] [PATCH] r600g: fix range handling for tgsi input/output declarations

2013-03-29 Thread Christoph Bumiller
On 29.03.2013 10:56, Christian König wrote:
 Am 28.03.2013 20:34, schrieb Vadim Girlin:
 On 03/28/2013 01:01 PM, Christian König wrote:
 Am 27.03.2013 20:37, schrieb Vadim Girlin:
 Signed-off-by: Vadim Girlin vadimgir...@gmail.com
 ---
   src/gallium/drivers/r600/r600_shader.c | 19 +++
   1 file changed, 15 insertions(+), 4 deletions(-)

 diff --git a/src/gallium/drivers/r600/r600_shader.c
 b/src/gallium/drivers/r600/r600_shader.c
 index 29facf7..d4c9c03 100644
 --- a/src/gallium/drivers/r600/r600_shader.c
 +++ b/src/gallium/drivers/r600/r600_shader.c
 @@ -874,12 +874,12 @@ static int select_twoside_color(struct
 r600_shader_ctx *ctx, int front, int back
   static int tgsi_declaration(struct r600_shader_ctx *ctx)
   {
   struct tgsi_full_declaration *d =
 &ctx->parse.FullToken.FullDeclaration;
 -unsigned i;
 -int r;
 +int r, i, j, count = d->Range.Last - d->Range.First + 1;
   switch (d->Declaration.File) {
   case TGSI_FILE_INPUT:
 -i = ctx->shader->ninput++;
 +i = ctx->shader->ninput;
 +ctx->shader->ninput += count;
   ctx->shader->input[i].name = d->Semantic.Name;
   ctx->shader->input[i].sid = d->Semantic.Index;
   ctx->shader->input[i].interpolate = d->Interp.Interpolate;
 @@ -903,9 +903,15 @@ static int tgsi_declaration(struct
 r600_shader_ctx *ctx)
   return r;
   }
   }
 +for (j = 1; j < count; ++j) {
 +memcpy(&ctx->shader->input[i + j],
 &ctx->shader->input[i],
 +   sizeof(struct r600_shader_io));

 Instead of memcpy, shouldn't an assignment do the trick here as well?

 Yes, assignment should work fine, I just used to use memcpy in such
 cases for some reason. I'll replace memcpy with assignment.

 Also I think second part (outputs handling) can be dropped for now -
 currently we only need to handle the inputs (for HUD shaders), and
 later when array declarations for inputs/outputs will be implemented
 in TGSI probably we'll need to update the parser in r600g anyway -
 I'm just not sure yet how the semantic indices should be handled for
 input/output arrays.

The semantic indices are sequential, obviously. It gets more complex
with scalar arrays, but you don't have to worry about that in r600
because I'd probably add a cap for those.

Example: If you declare an out float a[8] layout(location = k) in GLSL
(as per ARB_separate_shader_objects), the 8 values are counted as
consuming 8 consecutive vec4 slots (here the user is responsible for
packing, nice !).
The location will be communicated via the semantic index. You'd get DCL
OUT[n..n+7] GENERIC[k] (or k+some_constant_offset because of st/mesa's
allocation policy).
If the consuming shader declares in float b[4] layout(location = k+4),
you'd get DCL IN[m..m+3] GENERIC[k+4], and this has to link with the
upper 4 elements of the a[8] output.
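
The linking rule in this example can be sketched in C. This is a minimal illustration under the assumption that semantic indices advance by one per register across a declared range; struct ranged_decl and both helpers are hypothetical and do not correspond to any existing TGSI or Mesa structure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of "DCL FILE[first..last], GENERIC[semantic_index]". */
struct ranged_decl {
   unsigned first;          /* first register index of the range */
   unsigned last;           /* last register index, inclusive */
   unsigned semantic_index; /* semantic index of register 'first' */
};

/* Semantic index of one register inside the range, assuming the
 * indices are sequential across the range. */
static unsigned
decl_semantic_index(const struct ranged_decl *d, unsigned reg)
{
   assert(reg >= d->first && reg <= d->last);
   return d->semantic_index + (reg - d->first);
}

/* Find the output register that feeds input register 'in_reg'.
 * Returns false if the producer does not write that semantic index. */
static bool
link_input(const struct ranged_decl *out, const struct ranged_decl *in,
           unsigned in_reg, unsigned *out_reg)
{
   unsigned sem = decl_semantic_index(in, in_reg);
   unsigned count = out->last - out->first + 1;

   if (sem < out->semantic_index || sem - out->semantic_index >= count)
      return false;

   *out_reg = out->first + (sem - out->semantic_index);
   return true;
}
```

With OUT[2..9] GENERIC[10] on the producer side and IN[1..4] GENERIC[14] on the consumer side (k = 10, n = 2, m = 1), IN[1] resolves to OUT[6] and IN[4] to OUT[9], i.e. the upper half of the 8-element output.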



 Yeah, the uncertainty about semantic IDs was one of the reasons I
 didn't want to do Input/Output arrays in the initial array
 implementation.

 If those changes are the only ones, then v2 of the patch is:

 Reviewed-by: Christian König christian.koe...@amd.com

 Christian.

[Mesa-dev] [PATCH] gallium: add PIPE_CAP_QUERY_PIPELINE_STATISTICS

2013-03-29 Thread Christoph Bumiller
---
 src/gallium/docs/source/screen.rst   |2 ++
 src/gallium/drivers/freedreno/freedreno_screen.c |1 +
 src/gallium/drivers/i915/i915_screen.c   |1 +
 src/gallium/drivers/llvmpipe/lp_screen.c |2 ++
 src/gallium/drivers/nv30/nv30_screen.c   |1 +
 src/gallium/drivers/nv50/nv50_screen.c   |2 ++
 src/gallium/drivers/nvc0/nvc0_screen.c   |1 +
 src/gallium/drivers/r300/r300_screen.c   |1 +
 src/gallium/drivers/r600/r600_pipe.c |1 +
 src/gallium/drivers/radeonsi/radeonsi_pipe.c |1 +
 src/gallium/drivers/softpipe/sp_screen.c |2 ++
 src/gallium/include/pipe/p_defines.h |3 ++-
 12 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/src/gallium/docs/source/screen.rst 
b/src/gallium/docs/source/screen.rst
index 8c7e86e..c1a3c0b 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -149,6 +149,8 @@ The integer capabilities:
   to use a blit to implement a texture transfer which needs format conversions
   and swizzling in state trackers. Generally, all hardware drivers with
   dedicated memory should return 1 and all software rasterizers should return 
0.
+* ``PIPE_CAP_QUERY_PIPELINE_STATISTICS``: Whether 
PIPE_QUERY_PIPELINE_STATISTICS
+  is supported.
 
 
 .. _pipe_capf:
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index 79eef5e..283d07f 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -199,6 +199,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_VERTEX_COLOR_CLAMPED:
case PIPE_CAP_USER_VERTEX_BUFFERS:
case PIPE_CAP_USER_INDEX_BUFFERS:
+   case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
return 0;
 
/* Stream output. */
diff --git a/src/gallium/drivers/i915/i915_screen.c 
b/src/gallium/drivers/i915/i915_screen.c
index 13aa91c..54b2154 100644
--- a/src/gallium/drivers/i915/i915_screen.c
+++ b/src/gallium/drivers/i915/i915_screen.c
@@ -210,6 +210,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap 
cap)
case PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION:
case PIPE_CAP_START_INSTANCE:
case PIPE_CAP_QUERY_TIMESTAMP:
+   case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
case PIPE_CAP_TEXTURE_MULTISAMPLE:
case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
   return 0;
diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c 
b/src/gallium/drivers/llvmpipe/lp_screen.c
index e8c6ab1..6700887 100644
--- a/src/gallium/drivers/llvmpipe/lp_screen.c
+++ b/src/gallium/drivers/llvmpipe/lp_screen.c
@@ -130,6 +130,8 @@ llvmpipe_get_param(struct pipe_screen *screen, enum 
pipe_cap param)
   return 0;
case PIPE_CAP_QUERY_TIMESTAMP:
   return 1;
+   case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
+  return 0;
case PIPE_CAP_TEXTURE_MIRROR_CLAMP:
   return 1;
case PIPE_CAP_TEXTURE_SHADOW_MAP:
diff --git a/src/gallium/drivers/nv30/nv30_screen.c 
b/src/gallium/drivers/nv30/nv30_screen.c
index 4084869..e33710e 100644
--- a/src/gallium/drivers/nv30/nv30_screen.c
+++ b/src/gallium/drivers/nv30/nv30_screen.c
@@ -122,6 +122,7 @@ nv30_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
case PIPE_CAP_TEXTURE_BUFFER_OBJECTS:
case PIPE_CAP_TEXTURE_BUFFER_OFFSET_ALIGNMENT:
+   case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
   return 0;
case PIPE_CAP_VERTEX_BUFFER_OFFSET_4BYTE_ALIGNED_ONLY:
case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY:
diff --git a/src/gallium/drivers/nv50/nv50_screen.c 
b/src/gallium/drivers/nv50/nv50_screen.c
index 0a20ae3..53eeeb6 100644
--- a/src/gallium/drivers/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nv50/nv50_screen.c
@@ -189,6 +189,8 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
   return 0;
case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
   return 1;
+   case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
+  return 0;
default:
   NOUVEAU_ERR("unknown PIPE_CAP %d\n", param);
   return 0;
diff --git a/src/gallium/drivers/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nvc0/nvc0_screen.c
index 5b9385a..3a32539 100644
--- a/src/gallium/drivers/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nvc0/nvc0_screen.c
@@ -136,6 +136,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_QUERY_TIME_ELAPSED:
case PIPE_CAP_OCCLUSION_QUERY:
case PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME:
+   case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
   return 1;
case PIPE_CAP_MAX_STREAM_OUTPUT_BUFFERS:
   return 4;
diff --git a/src/gallium/drivers/r300/r300_screen.c 
b/src/gallium/drivers/r300/r300_screen.c
index bd16c3b..3175b3b 100644
--- a/src/gallium/drivers/r300/r300_screen.c
+++ b/src/gallium/drivers/r300/r300_screen.c
@@ -135,6 

[Mesa-dev] [PATCH] gallium/hud: add support for PIPE_QUERY_PIPELINE_STATISTICS

2013-03-29 Thread Christoph Bumiller
Also, renamed pixels-rendered to samples-passed because the
occlusion counter increments even if colour and depth writes are
disabled, or (on some implementations) for killed fragments that passed
the depth test when early_fragment_tests has been set for the PS.
---
 src/gallium/auxiliary/hud/hud_context.c  |   45 +++--
 src/gallium/auxiliary/hud/hud_cpu.c  |6 ++-
 src/gallium/auxiliary/hud/hud_driver_query.c |8 +++--
 src/gallium/auxiliary/hud/hud_private.h  |1 +
 4 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/src/gallium/auxiliary/hud/hud_context.c 
b/src/gallium/auxiliary/hud/hud_context.c
index 60355ca..cfb58a8 100644
--- a/src/gallium/auxiliary/hud/hud_context.c
+++ b/src/gallium/auxiliary/hud/hud_context.c
@@ -90,6 +90,10 @@ struct hud_context {
   unsigned max_num_vertices;
   unsigned num_vertices;
} text, bg, whitelines;
+
+   struct {
+  boolean query_pipeline_statistics;
+   } cap;
 };
 
 
@@ -716,15 +720,45 @@ hud_parse_env_var(struct hud_context *hud, const char 
*env)
   else if (sscanf(name, "cpu%u%s", &i, s) == 1) {
  hud_cpu_graph_install(pane, i);
   }
-  else if (strcmp(name, "pixels-rendered") == 0 &&
+  else if (strcmp(name, "samples-passed") == 0 &&
has_occlusion_query(hud->pipe->screen)) {
- hud_pipe_query_install(pane, hud->pipe, "pixels-rendered",
-PIPE_QUERY_OCCLUSION_COUNTER, 0, FALSE);
+ hud_pipe_query_install(pane, hud->pipe, "samples-passed",
+PIPE_QUERY_OCCLUSION_COUNTER, 0, 0, FALSE);
   }
   else if (strcmp(name, "primitives-generated") == 0 &&
has_streamout(hud->pipe->screen)) {
  hud_pipe_query_install(pane, hud->pipe, "primitives-generated",
-PIPE_QUERY_PRIMITIVES_GENERATED, 0, FALSE);
+PIPE_QUERY_PRIMITIVES_GENERATED, 0, 0, FALSE);
+  }
+  else if (strncmp(name, "pipeline-statistics-", 20) == 0) {
+ if (hud->cap.query_pipeline_statistics) {
+static const char *pipeline_statistics_names[] =
+{
+   "ia_vertices",
+   "ia_primitives",
+   "vs_invocations",
+   "gs_invocations",
+   "gs_primitives",
+   "c_invocations",
+   "c_primitives",
+   "ps_invocations",
+   "hs_invocations",
+   "ds_invocations",
+   "cs_invocations"
+};
+for (i = 0; i < Elements(pipeline_statistics_names); ++i)
+   if (strcmp(&name[20], pipeline_statistics_names[i]) == 0)
+  break;
+if (i < Elements(pipeline_statistics_names))
+   hud_pipe_query_install(pane, hud->pipe, &name[20],
+  PIPE_QUERY_PIPELINE_STATISTICS, i,
+  0, FALSE);
+else
+   fprintf(stderr, "gallium_hud: invalid pipeline-statistics-*\n");
+ } else {
+fprintf(stderr, "gallium_hud: PIPE_QUERY_PIPELINE_STATISTICS "
+"not supported by the driver\n");
+ }
   }
   else {
  if (!hud_driver_query_install(pane, hud->pipe, name)){
@@ -963,6 +997,9 @@ hud_create(struct pipe_context *pipe, struct cso_context 
*cso)
 
LIST_INITHEAD(&hud->pane_list);
 
+   hud->cap.query_pipeline_statistics =
+  pipe->screen->get_param(pipe->screen, PIPE_CAP_QUERY_PIPELINE_STATISTICS);
+
hud_parse_env_var(hud, env);
return hud;
 }
diff --git a/src/gallium/auxiliary/hud/hud_cpu.c 
b/src/gallium/auxiliary/hud/hud_cpu.c
index dfd9f68..ce98115 100644
--- a/src/gallium/auxiliary/hud/hud_cpu.c
+++ b/src/gallium/auxiliary/hud/hud_cpu.c
@@ -32,6 +32,7 @@
 #include "os/os_time.h"
 #include "util/u_memory.h"
 #include <stdio.h>
+#include <inttypes.h>
 
 static boolean
 get_cpu_stats(unsigned cpu_index, uint64_t *busy_time, uint64_t *total_time)
@@ -55,8 +56,9 @@ get_cpu_stats(unsigned cpu_index, uint64_t *busy_time, 
uint64_t *total_time)
  int i, num;
 
  num = sscanf(line,
-  "%s %llu %llu %llu %llu %llu %llu %llu %llu %llu "
-  "%llu %llu %llu",
+  "%s %"PRIu64" %"PRIu64" %"PRIu64" %"PRIu64" %"PRIu64
+  " %"PRIu64" %"PRIu64" %"PRIu64" %"PRIu64" %"PRIu64
+  " %"PRIu64" %"PRIu64,
   cpuname, &v[0], &v[1], &v[2], &v[3], &v[4], &v[5],
   &v[6], &v[7], &v[8], &v[9], &v[10], &v[11]);
  if (num < 5) {
diff --git a/src/gallium/auxiliary/hud/hud_driver_query.c 
b/src/gallium/auxiliary/hud/hud_driver_query.c
index 798da50..413059c 100644
--- a/src/gallium/auxiliary/hud/hud_driver_query.c
+++ b/src/gallium/auxiliary/hud/hud_driver_query.c
@@ -42,6 +42,7 @@
 struct query_info {
struct pipe_context *pipe;
unsigned query_type;
+   unsigned result_index; /* unit depends on query_type */
 
/* Ring of queries. If a query is busy, we use 

[Mesa-dev] [PATCH] gallium/docs: fix definition of PIPE_QUERY_SO_STATISTICS

2013-03-29 Thread Christoph Bumiller
---
 src/gallium/docs/source/context.rst |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/gallium/docs/source/context.rst 
b/src/gallium/docs/source/context.rst
index 9e57930..2cc1848 100644
--- a/src/gallium/docs/source/context.rst
+++ b/src/gallium/docs/source/context.rst
@@ -335,15 +335,17 @@ The result is a 64-bit integer specifying the timer 
resolution in Hz,
 followed by a boolean value indicating whether the timer has incremented.
 
 ``PIPE_QUERY_PRIMITIVES_GENERATED`` returns a 64-bit integer indicating
-the number of primitives processed by the pipeline.
+the number of primitives processed by the pipeline (regardless of whether
+stream output is active or not).
 
 ``PIPE_QUERY_PRIMITIVES_EMITTED`` returns a 64-bit integer indicating
 the number of primitives written to stream output buffers.
 
 ``PIPE_QUERY_SO_STATISTICS`` returns 2 64-bit integers corresponding to
-the results of
+the result of
 ``PIPE_QUERY_PRIMITIVES_EMITTED`` and
-``PIPE_QUERY_PRIMITIVES_GENERATED``, in this order.
+the number of primitives that would have been written to stream output buffers
+if they had infinite space available (primitives_storage_needed), in this 
order.
 
 ``PIPE_QUERY_SO_OVERFLOW_PREDICATE`` returns a boolean value indicating
 whether the stream output targets have overflowed as a result of the
-- 
1.7.3.4


Re: [Mesa-dev] [PATCH 0/5] Head-up display for Gallium DRI2 drivers

2013-03-26 Thread Christoph Bumiller
On 26.03.2013 12:18, Vadim Girlin wrote:
 On 03/26/2013 02:00 AM, Marek Olšák wrote:
 On Mon, Mar 25, 2013 at 10:38 PM, Ondrej Holecek aaa...@gmail.com
 wrote:
 On Saturday 23 of March 2013 00:50:59 Marek Olšák wrote:
 Hi everyone, one image is better than a thousand words:
 ...

 Hi,

 I tried your patches and hit a few problems. First, they do not apply
 cleanly on master, as they expect another patch of yours, "cso: add
 constant buffer save/restore feature for postprocessing", to be
 present. But I guess you are aware of that.

 Yes, I sent the patch to mesa-dev earlier.


 The second problem is that when I build mesa with HUD on my 32bit
 virtual machine, HUD works (with a 32bit app of course). When I build
 it on 64bit (both are the same up-to-date OS, openSUSE 12.3), HUD is
 not working (with a 64bit app). I managed to track it down to failed
 IMM instruction parsing during the hud_create function. It appears
 that the translate_ctx structure in tgsi_text_translate (file
 src/gallium/auxiliary/tgsi/tgsi_text.c) is not initialized to zeros
 on my 64bit system; instead ctx.num_immediates is equal to 1 and hence
 triggers the "Immediates must be sorted" error.
 The following fixes HUD for me (note that I really don't know if I am
 breaking something in mesa here):

 diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c
 b/src/gallium/auxiliary/tgsi/tgsi_text.c
 index 6b97bee..247ec75 100644
 --- a/src/gallium/auxiliary/tgsi/tgsi_text.c
 +++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
 @@ -1577,6 +1577,7 @@ tgsi_text_translate(
  ctx.tokens = tokens;
  ctx.tokens_cur = tokens;
  ctx.tokens_end = tokens + num_tokens;
 +   ctx.num_immediates = 0;

  if (!translate( &ctx ))
 return FALSE;

 I've sent a fix for this a couple of days ago:

 http://www.mail-archive.com/mesa-dev@lists.freedesktop.org/msg36038.html
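
The failure mode described above (a stack-allocated context whose counter
is never set) can be avoided by zeroing the whole structure before filling
in the known fields. The following is only a minimal sketch: struct
parse_ctx and parse_ctx_init are hypothetical stand-ins, not the actual
translate_ctx code from tgsi_text.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a parser context; the real structure in
 * tgsi_text.c has more fields. */
struct parse_ctx {
   const char *text;
   unsigned num_immediates;
};

/* Zero the whole context first, then fill in what is known, so that no
 * field depends on whatever happened to be on the stack. */
void parse_ctx_init(struct parse_ctx *ctx, const char *text)
{
   assert(ctx != NULL);
   memset(ctx, 0, sizeof(*ctx));
   ctx->text = text;
}
```

Zeroing everything up front also covers fields that get added to the
structure later, which a single assignment to one field would not.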


 The third issue is that in both the 32-bit and the 64-bit build, fonts
 are not displayed in the HUD. I see graphs and transparent background
 rectangles for the text, but no text is visible. I have not solved this
 one yet.

 Your driver must support the I8_UNORM texture format.

 I think this may also be related to a TGSI declaration of vertex shader
 inputs that some drivers do not expect:

 DCL IN[0..1]


But this is in no way invalid, any driver that doesn't handle it is broken.

Moreover, ideally, IN/OUT should follow the same array declaration and
access semantics as TEMP, that's just not implemented yet because it's a
bit more involved (WIP).

 At least r600g expects the separate declaration for each input, though
 fortunately it still works in this case because parsed declarations of
 VS inputs aren't really used in r600g. I noticed exactly the same
 issue (missing text) with my r600-sb branch because it relies on the
 number of the parsed inputs from r600g's tgsi translator. It's 1 in
 this case instead of 2, so second input register is considered
 undefined and optimized away.

 I suspect that some other drivers may also handle this declaration
 incorrectly and this may explain the issue.

 Vadim



 One last thought: is it intentional that, when a wrong query is
 entered, the HUD graph is displayed but empty? Maybe some text like
 "wrong query XXX" would be a good hint. I know it is printed on stdout,
 but looking for warnings in chatty apps like openarena is a little
 tricky.

 Yes, it's intentional. I guess I can at least make it not draw an
 empty pane.

 Marek


[Mesa-dev] RFC: TGSI scalar arrays

2013-03-20 Thread Christoph Bumiller
Sorry, this has become longer than I anticipated ...

I've been toying with adding support for TGSI_FILE_INPUT/OUTPUT arrays
because, since I cannot allocate varyings in the same order that the
register index specifies, I need it:

===
EXAMPLE:
OUT[0], CLIPDIST[1], must be allocated at address 0x2c0 in hardware
output space
OUT[1], CLIPDIST[0], 0x2d0
OUT[2], GENERIC[0], between 0x80 and 0x280
OUT[3], GENERIC[1], between 0x80 and 0x280

And without array specification
MOV OUT[TEMP[0].x-1], IMM[0]
would leave me no clue as to whether use 0x80 or 0x2c0 as base address.
===

Now that I'm on it, I'm considering going a step further, which is
adding indirect scalar/component access.
This is motivated by float gl_ClipDistance[], which, if accessed
indirectly, currently leaves us no choice but to generate code like this:

if ((index & 3) == 0) access x component; else
if ((index & 3) == 1) access y component; ...

This is undesirable and the hardware can do better (as it actually
supports accessing individual components since address registers contain
an address in bytes and we can do scalar read/write).
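
As a rough illustration of what scalar addressing buys: with a byte
address, a scalar index maps straight to a location, and the vec4
register/component pair only matters for bookkeeping. The helper names
below are made up for this sketch, and it assumes 4-byte components and an
array laid out contiguously from some base address.

```c
#include <assert.h>

/* Location of a scalar inside a vec4-organized register file. */
struct scalar_ref {
   unsigned reg;   /* vec4 register index */
   unsigned comp;  /* 0 = x, 1 = y, 2 = z, 3 = w */
};

/* Split a scalar array index into register and component. */
struct scalar_ref scalar_to_vec4(unsigned index)
{
   struct scalar_ref r;
   r.reg = index >> 2;
   r.comp = index & 3;
   return r;
}

/* Component number to its TGSI-style letter. */
char component_name(unsigned comp)
{
   assert(comp < 4);
   return "xyzw"[comp];
}

/* Byte address of scalar 'index', assuming 4-byte components and a
 * contiguous array starting at 'base' (both are assumptions). */
unsigned scalar_byte_address(unsigned base, unsigned index)
{
   return base + 4u * index;
}
```

With a computed byte address the if/else chain over (index & 3) is not
needed; the component selection is folded into the address itself.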

A second motivation is varying packing, which is required by the GL
spec, and may lead to use of TEMP arrays, which, albeit improved now,
will impair performance when used (on nv50 they go to uncached memory
which is very slow).

That case occurs if, for instance, a varying float[8] is accessed
indirectly and has to be packed into
OUT[0..1].xyzw, GENERIC[0..1]
instead of
OUT[0..7].x, GENERIC[0..7]

So far I've come up with 2 choices (all available only if the driver
supports e.g. PIPE_CAP_TGSI_SCALAR_REGISTERS):


1. SCALAR DECLARATIONS

Using float gl_ClipDistance[8] as example, it could be declared as:

OUT[0..7].x, CLIPDIST, ARRAY(1) where the .x now means that it's a
single component per OUT[index]

Now this obviously means that a single OUT[i] doesn't always consume 16
bytes / 4 components anymore, which may be somewhat disturbing, since
the address of an output can't be directly inferred solely from its
index anymore.
However, that doesn't really constitute a problem if all access is
either direct or comes with an ARRAY() reference.

For varying packing, which happens only for user defined variables, and
hence TGSI_SEMANTIC_GENERIC, it gets a bit uglier:

(NOTE: GL requires us to be able to support exactly the amount of
components we report, failing due to alignment is not allowed. Hence the
GLSL compiler may put some variables at unaligned locations, see
ir_variable.location_frac):

A GENERIC semantic index should always cover 4 components so that a
fixed location can be assigned for it (drivers usually do this since it
makes an extra dynamic linkage pass when shaders are changed
unnecessary, as intended by GL_ARB_separate_shader_objects).

So, this would be valid:
OUT[0..3].x, GENERIC[0]
OUT[4..5].xy, GENERIC[1]
OUT[6], GENERIC[2]
Note how 3 OUT[indices] only consume 1 GENERIC[index].

If we, instead, allocated semantic index per register index instead of
per 4 components, we would have:
OUT[0..3].x, GENERIC[0]
OUT[4..5].xy, GENERIC[4]
OUT[6], GENERIC[6]
This would waste space, since GENERIC[4,6] would have to go to
output_space[addresses 0x40, 0x60] so it could link with
IN[6], GENERIC[6]
where we have no information about the size of GENERIC[0 .. 5], and
wasting space like that means the advertised number of varying
components cannot be satisfied.
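
The two numbering schemes can be compared with a small sketch. The struct
and function names are hypothetical, and the sketch assumes each
declaration starts on a fresh GENERIC index (no fractional indices yet).

```c
#include <assert.h>

/* One output declaration: OUT[first..last] with ncomp components each. */
struct out_decl {
   unsigned first;
   unsigned last;
   unsigned ncomp;
};

/* Scheme A: one GENERIC index per 4 components. The index of
 * declaration 'which' is the number of vec4 slots used before it. */
unsigned generic_index_packed(const struct out_decl *d, unsigned which)
{
   unsigned index = 0;
   unsigned i;
   for (i = 0; i < which; i++) {
      unsigned comps = (d[i].last - d[i].first + 1) * d[i].ncomp;
      assert(d[i].ncomp >= 1 && d[i].ncomp <= 4);
      index += (comps + 3) / 4;
   }
   return index;
}

/* Scheme B: the GENERIC index simply follows the register index. */
unsigned generic_index_per_register(const struct out_decl *d, unsigned which)
{
   return d[which].first;
}
```

For OUT[0..3].x, OUT[4..5].xy and OUT[6], scheme A yields GENERIC[0],
GENERIC[1], GENERIC[2], while scheme B yields GENERIC[0], GENERIC[4],
GENERIC[6], leaving gaps in the output space.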


And as a last step, if varyings are placed at non-vec4 boundaries, we
would have to be able to specify fractional semantic indices, like this:
OUT[0..2].x, GENERIC[0].x
OUT[3].x, GENERIC[0].w



2. SCALAR ADDRESS REGISTER VALUES

All this can be avoided by always declaring full vec4s, and adding the
possibility of doing indirect addressing on a per-component basis:

varying float a[4] becomes:
uniform int i;
a[i+5] = 999 becomes:

OUT[0].xyzw, ARRAY(1)
UARL_SCALAR ADDR[0].x, CONST[0].
MOV OUT(array 1)[ADDR[0].x+1].y, IMM[0].
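
To make the arithmetic behind that operand explicit: assuming the proposed
UARL_SCALAR loads a count of scalars (not vec4s) into ADDR, the operand
[ADDR[0].x+1].y selects scalar ADDR + 4*1 + 1, which is i + 5 for
ADDR = i. The function name below is made up for this sketch.

```c
#include <assert.h>

/* Scalar index selected by an operand [ADDR + reg_offset].comp, assuming
 * ADDR counts scalars and each register holds 4 of them. */
unsigned scalar_operand_index(unsigned addr, unsigned reg_offset,
                              unsigned comp)
{
   assert(comp < 4);
   return addr + 4u * reg_offset + comp;
}
```

The register offset and the component swizzle together act as a constant
scalar offset that the driver can fold into the byte address.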

The only difficulty with this is that we have to split up TGSI
instructions accessing unaligned vectors:
(NOTE: this can always be avoided with TGSI_FILE_TEMPORARY, but varyings
may have to be packed).

With suggestion (1), 2 packed (and hence unaligned) vec3 arrays and a
single vec2 would look like this:
OUT[0..3].xyz, GENERIC[0].x
OUT[4..5].xyz, GENERIC[3].x
OUT[6].xy, GENERIC[4].zw
and we could still do:
ADD OUT[5].xyz, TEMP[0], TEMP[1]

Now, these would have to be merged and declared as:
OUT[0..4].xyzw

and the 2nd vec3 would be { OUT[0].w, OUT[1].xy }

instead of simply OUT[1].xyz
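
Where the components of a packed element end up can be worked out by
counting scalars. This is only a sketch with hypothetical names; it assumes
tight packing with no padding, starting at scalar 0 of OUT[0].

```c
#include <assert.h>

/* Position of one component in the merged vec4 declaration. */
struct comp_loc {
   unsigned reg;   /* index into OUT[] */
   unsigned comp;  /* 0 = x, 1 = y, 2 = z, 3 = w */
};

/* Component 'c' of element 'elem' in an array of 'width'-component
 * vectors whose first scalar sits at offset 'first_scalar'. */
struct comp_loc packed_loc(unsigned first_scalar, unsigned width,
                           unsigned elem, unsigned c)
{
   struct comp_loc loc;
   unsigned s;
   assert(c < width && width <= 4);
   s = first_scalar + elem * width + c;
   loc.reg = s / 4;
   loc.comp = s % 4;
   return loc;
}
```

By this count the second vec3 of the first array occupies scalars 3, 4 and
5, the second array begins at scalar 12 (register 3, .x), and the vec2
begins at scalar 18 (register 4, .zw), consistent with the GENERIC[3].x
and GENERIC[4].zw locations listed for suggestion (1).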

A problem with this is that the GLSL compiler, while it can do the
packing into vec4s and splitting up access, cannot, iirc, access
individual components of a vec4 indirectly like TGSI would be able to.
To avoid TEMP arrays we'd have to disable the last phase of varying
packing (that actually converts the code to using vec4s).
It would still be able to assign fractional locations to guarantee that
linkage works, but glsl-to-tgsi would likely have 

Re: [Mesa-dev] RFC: TGSI scalar arrays

2013-03-20 Thread Christoph Bumiller
On 20.03.2013 17:05, Roland Scheidegger wrote:
 On 20.03.2013 15:41, Christoph Bumiller wrote:
 Sorry, this has become longer than I anticipated ...

 I've been toying with adding support for TGSI_FILE_INPUT/OUTPUT arrays
 because, since I cannot allocate varyings in the same order that the
 register index specifies, I need it:

 ===
 EXAMPLE:
 OUT[0], CLIPDIST[1], must be allocated at address 0x2c0 in hardware
 output space
 OUT[1], CLIPDIST[0], 0x2d0
 OUT[2], GENERIC[0], between 0x80 and 0x280
 OUT[3], GENERIC[1], between 0x80 and 0x280

 And without array specification
 MOV OUT[TEMP[0].x-1], IMM[0]
 would leave me no clue as to whether use 0x80 or 0x2c0 as base address.
 ===

 Now that I'm on it, I'm considering going a step further, which is
 adding indirect scalar/component access.
 This is motivated by float gl_ClipDistance[], which, if accessed
 indirectly, currently leaves us no choice but to generate code like this:

 if ((index & 3) == 0) access x component; else
 if ((index & 3) == 1) access y component; ...

 This is undesirable and the hardware can do better (as it actually
 supports accessing individual components since address registers contain
 an address in bytes and we can do scalar read/write).

 A second motivation is varying packing, which is required by the GL
 spec, and may lead to use of TEMP arrays, which, albeit improved now,
 will impair performance when used (on nv50 they go to uncached memory
 which is very slow).

 That case occurs if, for instance, a varying float[8] is accessed
 indirectly and has to be packed into
 OUT[0..1].xyzw, GENERIC[0..1]
 instead of
 OUT[0..7].x, GENERIC[0..7]

 So far I've come up with 2 choices (all available only if the driver
 supports e.g. PIPE_CAP_TGSI_SCALAR_REGISTERS):


 1. SCALAR DECLARATIONS

 Using float gl_ClipDistance[8] as example, it could be declared as:

 OUT[0..7].x, CLIPDIST, ARRAY(1) where the .x now means that it's a
 single component per OUT[index]

 Now this obviously means that a single OUT[i] doesn't always consume 16
 bytes / 4 components anymore, which may be somewhat disturbing, since
 the address of an output can't be directly inferred solely from its
 index anymore.
 However, that doesn't really constitute a problem if all access is
 either direct or comes with an ARRAY() reference.

 For varying packing, which happens only for user defined variables, and
 hence TGSI_SEMANTIC_GENERIC, it gets a bit uglier:

 (NOTE: GL requires us to be able to support exactly the amount of
 components we report, failing due to alignment is not allowed. Hence the
 GLSL compiler may put some variables at unaligned locations, see
 ir_variable.location_frac):

 A GENERIC semantic index should always cover 4 components so that a
 fixed location can be assigned for it (drivers usually do this since it
 makes an extra dynamic linkage pass when shaders are changed
 unnecessary, as intended by GL_ARB_separate_shader_objects).

 So, this would be valid:
 OUT[0..3].x, GENERIC[0]
 OUT[4..5].xy, GENERIC[1]
 OUT[6], GENERIC[2]
 Note how 3 OUT[indices] only consume 1 GENERIC[index].

 If we, instead, allocated semantic index per register index instead of
 per 4 components, we would have:
 OUT[0..3].x, GENERIC[0]
 OUT[4..5].xy, GENERIC[4]
 OUT[6], GENERIC[6]
 This would waste space, since GENERIC[4,6] would have to go to
 output_space[addresses 0x40, 0x60] so it could link with
 IN[6], GENERIC[6]
 where we have no information about the size of GENERIC[0 .. 5], and
 wasting space like that means the advertised number of varying
 components cannot be satisfied.


 And as a last step, if varyings are placed at non-vec4 boundaries, we
 would have to be able to specify fractional semantic indices, like this:
 OUT[0..2].x, GENERIC[0].x
 OUT[3].x, GENERIC[0].w



 2. SCALAR ADDRESS REGISTER VALUES

 All this can be avoided by always declaring full vec4s, and adding the
 possibility of doing indirect addressing on a per-component basis:

 varying float a[4] becomes:
 uniform int i;
 a[i+5] = 999 becomes:

 OUT[0].xyzw, ARRAY(1)
 UARL_SCALAR ADDR[0].x, CONST[0].
 MOV OUT(array 1)[ADDR[0].x+1].y, IMM[0].

 The only difficulty with this is that we have to split up TGSI
 instructions accessing unaligned vectors:
 (NOTE: this can always be avoided with TGSI_FILE_TEMPORARY, but varyings
 may have to be packed).

 With suggestion (1), 2 packed (and hence unaligned) vec3 arrays and a
 single vec2 would look like this:
 OUT[0..3].xyz, GENERIC[0].x
 OUT[4..5].xyz, GENERIC[3].x
 OUT[6].xy, GENERIC[4].zw
 and we could still do:
 ADD OUT[5].xyz, TEMP[0], TEMP[1]

 Now, these would have to be merged and declared as:
 OUT[0..4].xyzw

 and the 2nd vec3 would be { OUT[0].w, OUT[1].xy }

 instead of simply OUT[1].xyz

 A problem with this is that the GLSL compiler, while it can do the
 packing into vec4s and splitting up access, cannot, iirc, access
 individual components of a vec4 indirectly like TGSI would be able to.
 To avoid TEMP arrays we'd have to disable

Re: [Mesa-dev] RFC: TGSI scalar arrays

2013-03-20 Thread Christoph Bumiller
On 20.03.2013 18:30, Roland Scheidegger wrote:
 On 20.03.2013 17:46, Christoph Bumiller wrote:
 On 20.03.2013 17:05, Roland Scheidegger wrote:

 Not sure I fully understand this, but I'm thinking whenever in doubt,
 use something close to what dx10 does since that's likely going to work
 reasonably well with different hw. Maybe declaring those special values
 differently (not just as output reg) would help?
 What DX10 does is making indirect access of varyings illegal. That's not
 possible with OpenGL ...
 Hmm I thought dcl_indexRange would be used for indirect access of varyings?

Interesting ... when last I tried that back when working on d3d1x, the
compiler didn't like it, and I remember something about indexRange
existing only for debugging (and I remember finding that strange).

Also, d3d11 doesn't have the annoying limit that GLSL has so there is no
need for it to pack varyings.
When I use floats[3] + SV_POSITION, I get vs_5_0 output limit (32)
exceeded, shader uses 33 outputs, but float4[28] works just fine.

For indirect access I still get:
error X3500: array reference cannot be used as an l-value; not natively
addressable

for

struct IA2VS
{
    float4 position : POSITION;
    float4 color : COLOR;
};

struct VS2PS
{
    float4 position : SV_POSITION;
    float4 color[2] : WHATEVER;
};

VS2PS vs(IA2VS input)
{
    VS2PS result;
    int i = int(input.position.x);
    result.position = input.position;
    result.color[i] = input.color;
    return result;
}

float4 ps(VS2PS input) : SV_TARGET
{
    return input.color[0];
}

 Roland



Re: [Mesa-dev] [PATCH] gallium: add TGSI_SEMANTIC_TEXCOORD, PCOORD v3

2013-03-19 Thread Christoph Bumiller
On 15.03.2013 22:16, Christoph Bumiller wrote:
 This makes it possible to identify gl_TexCoord and gl_PointCoord
 for drivers where sprite coordinate replacement is restricted.

 The new PIPE_CAP_TGSI_TEXCOORD decides whether these varyings
 should be hidden behind the GENERIC semantic or not.

 With this patch only nvc0 and nv30 will request that they be used.

 v2: introduce a CAP so other drivers don't have to bother with
 the new semantic

 v3: adapt to the introduction of the gl_varying_slot enum

I would push this soon if there are no objections ...

 ---
  src/gallium/auxiliary/draw/draw_pipe_wide_point.c |   46 
 +
  src/gallium/auxiliary/tgsi/tgsi_dump.c|1 +
  src/gallium/auxiliary/tgsi/tgsi_strings.c |4 +-
  src/gallium/docs/source/cso/rasterizer.rst|5 ++
  src/gallium/docs/source/screen.rst|8 
  src/gallium/docs/source/tgsi.rst  |   29 +
  src/gallium/drivers/freedreno/freedreno_screen.c  |2 +
  src/gallium/drivers/i915/i915_screen.c|2 +
  src/gallium/drivers/llvmpipe/lp_screen.c  |1 +
  src/gallium/drivers/nv30/nv30_screen.c|1 +
  src/gallium/drivers/nv30/nvfx_fragprog.c  |   42 ++-
  src/gallium/drivers/nv30/nvfx_vertprog.c  |7 +++-
  src/gallium/drivers/nv50/codegen/nv50_ir_driver.h |2 -
  src/gallium/drivers/nv50/nv50_screen.c|1 +
  src/gallium/drivers/nv50/nv50_surface.c   |5 +-
  src/gallium/drivers/nvc0/nvc0_program.c   |   37 +---
  src/gallium/drivers/nvc0/nvc0_screen.c|1 +
  src/gallium/drivers/r300/r300_screen.c|2 +
  src/gallium/drivers/r600/r600_pipe.c  |2 +
  src/gallium/drivers/radeonsi/radeonsi_pipe.c  |2 +
  src/gallium/drivers/softpipe/sp_screen.c  |2 +
  src/gallium/drivers/svga/svga_screen.c|2 +
  src/gallium/include/pipe/p_defines.h  |3 +-
  src/gallium/include/pipe/p_shader_tokens.h|4 +-
  src/gallium/include/pipe/p_state.h|2 +-
  src/mesa/state_tracker/st_context.c   |3 +
  src/mesa/state_tracker/st_context.h   |2 +
  src/mesa/state_tracker/st_program.c   |   45 +++-
  28 files changed, 171 insertions(+), 92 deletions(-)

 diff --git a/src/gallium/auxiliary/draw/draw_pipe_wide_point.c 
 b/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
 index 8e0a117..0d3fee4 100644
 --- a/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
 +++ b/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
 @@ -52,6 +52,7 @@
   */
  
  
 +#include "pipe/p_screen.h"
  #include "pipe/p_context.h"
  #include "util/u_math.h"
  #include "util/u_memory.h"
 @@ -74,6 +75,9 @@ struct widepoint_stage {
 uint num_texcoord_gen;
 uint texcoord_gen_slot[PIPE_MAX_SHADER_OUTPUTS];
  
 +   /* TGSI_SEMANTIC to which sprite_coord_enable applies */
 +   unsigned sprite_coord_semantic;
 +
 int psize_slot;
  };
  
 @@ -233,28 +237,29 @@ widepoint_first_point(struct draw_stage *stage,
  
        wide->num_texcoord_gen = 0;
  
 -      /* Loop over fragment shader inputs looking for generic inputs
 +      /* Loop over fragment shader inputs looking for the PCOORD input or inputs
         * for which bit 'k' in sprite_coord_enable is set.
         */
        for (i = 0; i < fs->info.num_inputs; i++) {
 -         if (fs->info.input_semantic_name[i] == TGSI_SEMANTIC_GENERIC) {
 -            const int generic_index = fs->info.input_semantic_index[i];
 -            /* Note that sprite_coord enable is a bitfield of
 -             * PIPE_MAX_SHADER_OUTPUTS bits.
 -             */
 -            if (generic_index < PIPE_MAX_SHADER_OUTPUTS &&
 -                (rast->sprite_coord_enable & (1 << generic_index))) {
 -               /* OK, this generic attribute needs to be replaced with a
 -                * texcoord (see above).
 -                */
 -               int slot = draw_alloc_extra_vertex_attrib(draw,
 -                                                         TGSI_SEMANTIC_GENERIC,
 -                                                         generic_index);
 -
 -               /* add this slot to the texcoord-gen list */
 -               wide->texcoord_gen_slot[wide->num_texcoord_gen++] = slot;
 -            }
 +         int slot;
 +         const unsigned sn = fs->info.input_semantic_name[i];
 +         const unsigned si = fs->info.input_semantic_index[i];
 +
 +         if (sn == wide->sprite_coord_semantic) {
 +            /* Note that sprite_coord_enable is a bitfield of 32 bits. */
 +            if (si >= 32 || !(rast->sprite_coord_enable & (1 << si)))
 +               continue;
 +         } else if (sn != TGSI_SEMANTIC_PCOORD) {
 +            continue;
           }
 +
 +         /* OK, this generic attribute needs to be replaced with a
 +          * sprite coord (see above).
 +          */
 + slot

Re: [Mesa-dev] [PATCH 9/9] tgsi: add ArrayID documentation

2013-03-17 Thread Christoph Bumiller
On 17.03.2013 16:30, Christian König wrote:
 On 15.03.2013 18:58, Christoph Bumiller wrote:
 On 15.03.2013 13:08, Christian König wrote:
 On 14.03.2013 15:53, Christoph Bumiller wrote:
 On 14.03.2013 15:20, Christian König wrote:
 From: Christian König christian.koe...@amd.com

 Signed-off-by: Christian König christian.koe...@amd.com
 ---
src/gallium/docs/source/tgsi.rst |   16 
1 file changed, 16 insertions(+)

 diff --git a/src/gallium/docs/source/tgsi.rst
 b/src/gallium/docs/source/tgsi.rst
 index d9a7fe9..27fe039 100644
 --- a/src/gallium/docs/source/tgsi.rst
 +++ b/src/gallium/docs/source/tgsi.rst
 @@ -1833,6 +1833,22 @@ If Interpolate flag is set to 1, a
 Declaration Interpolate token follows.
  If file is TGSI_FILE_RESOURCE, a Declaration Resource token
 follows.
+If Array flag is set to 1, a Declaration Array token follows.
 +
 +Array Declaration
 +
 +
 +Declarations can optional have an ArrayID attribute which can be
 referred by
 +indirect addressing operands. An ArrayID of zero is reserved and
 treaded as
 +if no ArrayID is specified.
 +
 +If an indirect addressing operand refers to an specific declaration
 by using
 s/an/a
 Thx, fixed.

 +an ArrayID only the registers in this declaration are guaranteed
 to be
 +accessed, accessing any register outside this declaration results
 in undefined
 +behavior.
 + Note that the effective index is zero-based and not relative to the
 specified declaration. XXX: Is it ? Should it be ?
 Yes for compatibility reasons, otherwise we would need to change all
 drivers at once.
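
Read that way, the effective index of an indirect operand is absolute
within the register file, and the ArrayID only narrows the range it may
legally fall into. A minimal sketch of that rule follows; the struct and
function names are hypothetical and not part of TGSI.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of a declaration carrying an ArrayID. */
struct array_decl {
   unsigned array_id;  /* 0 would mean "no ArrayID" */
   unsigned first;     /* first register of the declaration */
   unsigned last;      /* last register of the declaration */
};

/* For an operand FILE(array_id)[ADDR + base], the effective index is
 * base + addr counted from register 0 of the file, not from d->first.
 * Anything outside [first, last] is undefined behavior per the patch. */
bool indirect_access_in_bounds(const struct array_decl *d, int base,
                               int addr)
{
   int index = base + addr;
   assert(d->first <= d->last);
   return index >= (int)d->first && index <= (int)d->last;
}
```

For a declaration covering registers 4..31, an operand with base 4 and an
address value of 0 hits the first element, while base 0 with address 3
falls outside the declaration.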

 +
 +If no ArrayID is specified with an indirect addressing operand the
 whole
 +register file might be accessed by this operand.

 + A practice which is strongly discouraged. Don't do this if you have
 more than 1 declaration for the file in question ! It will prevent
 packing of scalar/vec2 arrays and effective memory alias analysis.
 A bit shortened, but in general added the remark.

 Packing ? Yes !
 We can pack arrays if they're declared as e.g.
 TEMP[0-3].xyzw
 TEMP[4-31].x

 And the caches will be very very thankful that we don't just access
 every 4th element of our 4 times larger than it needs to be buffer !!!
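
The "4 times larger" figure follows from simple counting. A small sketch,
assuming 4-byte components and a made-up helper name:

```c
#include <assert.h>

/* Bytes needed for num_regs registers of ncomp components each,
 * assuming 4-byte components. */
unsigned array_bytes(unsigned num_regs, unsigned ncomp)
{
   assert(ncomp >= 1 && ncomp <= 4);
   return num_regs * ncomp * 4u;
}
```

For TEMP[0-3].xyzw plus TEMP[4-31].x this gives 64 + 112 = 176 bytes when
packed, against 512 bytes if all 32 registers are stored as full vec4s;
the scalar array alone shrinks from 448 to 112 bytes.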

 And if your card can't do that, pleeease be nice and still make it
 possible for other drivers. :o3
 It is probably possible to do so with the new information, but it is
 not a priority for me because I primarily need it for our LLVM backend.

 At some point you'll be able to make use of the info in your backend,
 too, and then you'll regret having to refamiliarize with this code just
 because you didn't add the extra (estimated) 2 lines to set the
 UsageMask.

 I think you misunderstood me here, you don't need the UsageMask to
 generate that information. It is possible by just scanning the
 shader to figure out which channels are used and which aren't.

For temporaries that may be true ... and inputs/outputs are always vec4
sized to guarantee linkage, packing for GENERIC ones is handled at the
mesa level.

 In addition to that, I'm not convinced that using the UsageMask for this
 is 100% correct, to me it looks more like UsageMask is something we
 need for outputs to distinguish between not writing to an output channel
 (and so still having the default) and not having an output channel at
 all.

Actually, for gl_ClipDistance[] we use the UsageMask to specify if the
clip distance was declared in the source (and thus should be enabled)
instead of whether it's been written or not.

I wanted to be able to distinguish between
float gl_ClipDistance[8] or
vec4 mesa_ClipDistance[2] with the UsageMask but I guess

OUT[0..1].x, CLIPDIST might just as well mean that gl_ClipDistance[0 and
4] are being used ...

Hm, we'll need a cap for that anyway to tell st if it should lower
ClipDistance to vec4s or not, and just assume that TGSI corresponds to
what the cap says.
And since this is the only case for IN/OUT where the driver's backend
has to decide whether to pack or not ... ok, I'll just infer array width
myself, too, and you can ignore the UsageMask.

 Also, NAK from me until array access/declarations for the other files
 follows suit.
 Sorry for being so ... pesky, but I'd really like this change to be 100%
 complete. Come on, doesn't it nag on your conscience if this is left to
 remain only a few small steps from perfection ?

 Declaring and accessing arrays for inputs/outputs are not so much of a
 problem, figuring out how to get this information to glsl_to_tgsi is
 the real problem. For temporaries changing the glsl_to_tgsi pass is
 pretty much sufficient, but for inputs and outputs you need to dig
 into the mesa state tracker, and I definitely don't intend to do so.

Fine, then I'll have a look at that myself. * flaming eyes *

 Christian.



Re: [Mesa-dev] [PATCH 9/9] tgsi: add ArrayID documentation

2013-03-17 Thread Christoph Bumiller
On 17.03.2013 18:04, Christoph Bumiller wrote:
 On 17.03.2013 16:30, Christian König wrote:
 On 15.03.2013 18:58, Christoph Bumiller wrote:
 On 15.03.2013 13:08, Christian König wrote:
 On 14.03.2013 15:53, Christoph Bumiller wrote:
 On 14.03.2013 15:20, Christian König wrote:
 From: Christian König christian.koe...@amd.com

 Signed-off-by: Christian König christian.koe...@amd.com
 ---
src/gallium/docs/source/tgsi.rst |   16 
1 file changed, 16 insertions(+)

 diff --git a/src/gallium/docs/source/tgsi.rst
 b/src/gallium/docs/source/tgsi.rst
 index d9a7fe9..27fe039 100644
 --- a/src/gallium/docs/source/tgsi.rst
 +++ b/src/gallium/docs/source/tgsi.rst
 @@ -1833,6 +1833,22 @@ If Interpolate flag is set to 1, a
 Declaration Interpolate token follows.
  If file is TGSI_FILE_RESOURCE, a Declaration Resource token
 follows.
+If Array flag is set to 1, a Declaration Array token follows.
 +
 +Array Declaration
 +
 +
 +Declarations can optional have an ArrayID attribute which can be
 referred by
 +indirect addressing operands. An ArrayID of zero is reserved and
 treaded as
 +if no ArrayID is specified.
 +
 +If an indirect addressing operand refers to an specific declaration
 by using
 s/an/a
 Thx, fixed.

 +an ArrayID only the registers in this declaration are guaranteed
 to be
 +accessed, accessing any register outside this declaration results
 in undefined
 +behavior.
 + Note that the effective index is zero-based and not relative to the
 specified declaration. XXX: Is it ? Should it be ?
 Yes for compatibility reasons, otherwise we would need to change all
 drivers at once.

 +
 +If no ArrayID is specified with an indirect addressing operand the
 whole
 +register file might be accessed by this operand.

 + A practice which is strongly discouraged. Don't do this if you have
 more than 1 declaration for the file in question ! It will prevent
 packing of scalar/vec2 arrays and effective memory alias analysis.
 A bit shortened, but in general added the remark.

 Packing ? Yes !
 We can pack arrays if they're declared as e.g.
 TEMP[0-3].xyzw
 TEMP[4-31].x

 And the caches will be very very thankful that we don't just access
 every 4th element of our 4 times larger than it needs to be buffer !!!

 And if your card can't do that, pleeease be nice and still make it
 possible for other drivers. :o3
 It is probably possible to do so with the new information, but it is
 not a priority for me because I primarily need it for our LLVM backend.

 At some point you'll be able to make use of the info in your backend,
 too, and then you'll regret having to refamiliarize with this code just
 because you didn't add the extra (estimated) 2 lines to set the
 UsageMask.
 I think you misunderstood me here, you don't need the UsageMask to
 generate that information. It is possible by just scanning the
 shader to figure out which channels are used and which aren't.

 For temporaries that may be true ... and inputs/outputs are always vec4
 sized to guarantee linkage, packing for GENERIC ones is handled at the
 mesa level.

 In addition to that, I'm not convinced that using the UsageMask for this
 is 100% correct, to me it looks more like UsageMask is something we
 need for outputs to distinguish between not writing to an output channel
 (and so still having the default) and not having an output channel at
 all.

 Actually, for gl_ClipDistance[] we use the UsageMask to specify if the
 clip distance was declared in the source (and thus should be enabled)
 instead of whether it's been written or not.

 I wanted to be able to distinguish between
 float gl_ClipDistance[8] or
 vec4 mesa_ClipDistance[2] with the UsageMask but I guess

 OUT[0..1].x, CLIPDIST might just as well mean that gl_ClipDistance[0 and
 4] are being used ...

 Hm, we'll need a cap for that anyway to tell st if it should lower
 ClipDistance to vec4s or not, and just assume that TGSI corresponds to
 what the cap says.
 And since this is the only case for IN/OUT where the driver's backend
 has to decide whether to pack or not ... ok, I'll just infer array width
 myself, too, and you can ignore the UsageMask.

 Also, NAK from me until array access/declarations for the other files
 follows suit.
 Sorry for being so ... pesky, but I'd really like this change to be 100%
 complete. Come on, doesn't it nag on your conscience if this is left to
 remain only a few small steps from perfection ?
 Declaring and accessing arrays for inputs/outputs are not so much of a
 problem, figuring out how to get this information to glsl_to_tgsi is
 the real problem. For temporaries changing the glsl_to_tgsi pass is
 pretty much sufficient, but for inputs and outputs you need to dig
 into the mesa state tracker, and I definitely don't intend to do so.

 Fine, then I'll have a look at that myself. * flaming eyes *

Ok, had a look, had enough. It hurts. At least if you have never touched
glsl-to-tgsi and go about it expecting it to be easy

Re: [Mesa-dev] [PATCH 9/9] tgsi: add ArrayID documentation

2013-03-15 Thread Christoph Bumiller
On 15.03.2013 13:08, Christian König wrote:
 On 14.03.2013 15:53, Christoph Bumiller wrote:
 On 14.03.2013 15:20, Christian König wrote:
 From: Christian König christian.koe...@amd.com

 Signed-off-by: Christian König christian.koe...@amd.com
 ---
   src/gallium/docs/source/tgsi.rst |   16 
   1 file changed, 16 insertions(+)

 diff --git a/src/gallium/docs/source/tgsi.rst
 b/src/gallium/docs/source/tgsi.rst
 index d9a7fe9..27fe039 100644
 --- a/src/gallium/docs/source/tgsi.rst
 +++ b/src/gallium/docs/source/tgsi.rst
 @@ -1833,6 +1833,22 @@ If Interpolate flag is set to 1, a
 Declaration Interpolate token follows.
 If file is TGSI_FILE_RESOURCE, a Declaration Resource token
 follows.
   +If Array flag is set to 1, a Declaration Array token follows.
 +
 +Array Declaration
 +
 +
 +Declarations can optional have an ArrayID attribute which can be
 referred by
 +indirect addressing operands. An ArrayID of zero is reserved and
 treaded as
 +if no ArrayID is specified.
 +
 +If an indirect addressing operand refers to an specific declaration
 by using
 s/an/a

 Thx, fixed.


 +an ArrayID only the registers in this declaration are guaranteed to be
 +accessed, accessing any register outside this declaration results
 in undefined
 +behavior.
 + Note that the effective index is zero-based and not relative to the
 specified declaration. XXX: Is it ? Should it be ?

 Yes for compatibility reasons, otherwise we would need to change all
 drivers at once.


 +
 +If no ArrayID is specified with an indirect addressing operand the
 whole
 +register file might be accessed by this operand.
   
 + A practice which is strongly discouraged. Don't do this if you have
 more than 1 declaration for the file in question ! It will prevent
 packing of scalar/vec2 arrays and effective memory alias analysis.

 A bit shortened, but in general added the remark.

 Packing ? Yes !
 We can pack arrays if they're declared as e.g.
 TEMP[0-3].xyzw
 TEMP[4-31].x

 And the caches will be very very thankful that we don't just access
 every 4th element of our 4 times larger than it needs to be buffer !!!

 And if your card can't do that, pleeease be nice and still make it
 possible for other drivers. :o3

 It is probably possible to do so with the new information, but it is
 not a priority for me because I primarily need it for our LLVM backend.

At some point you'll be able to make use of the info in your backend,
too, and then you'll regret having to refamiliarize with this code just
because you didn't add the extra (estimated) 2 lines to set the UsageMask.

Also, NAK from me until array access/declarations for the other files
follows suit.
Sorry for being so ... pesky, but I'd really like this change to be 100%
complete. Come on, doesn't it nag on your conscience if this is left to
remain only a few small steps from perfection ?

 Christian.



[Mesa-dev] [PATCH] gallium: add TGSI_SEMANTIC_TEXCOORD,PCOORD v3

2013-03-15 Thread Christoph Bumiller
This makes it possible to identify gl_TexCoord and gl_PointCoord
for drivers where sprite coordinate replacement is restricted.

The new PIPE_CAP_TGSI_TEXCOORD decides whether these varyings
should be hidden behind the GENERIC semantic or not.

With this patch only nvc0 and nv30 will request that they be used.

v2: introduce a CAP so other drivers don't have to bother with
the new semantic

v3: adapt to the introduction of the gl_varying_slot enum
---
 src/gallium/auxiliary/draw/draw_pipe_wide_point.c |   46 +
 src/gallium/auxiliary/tgsi/tgsi_dump.c|1 +
 src/gallium/auxiliary/tgsi/tgsi_strings.c |4 +-
 src/gallium/docs/source/cso/rasterizer.rst|5 ++
 src/gallium/docs/source/screen.rst|8 
 src/gallium/docs/source/tgsi.rst  |   29 +
 src/gallium/drivers/freedreno/freedreno_screen.c  |2 +
 src/gallium/drivers/i915/i915_screen.c|2 +
 src/gallium/drivers/llvmpipe/lp_screen.c  |1 +
 src/gallium/drivers/nv30/nv30_screen.c|1 +
 src/gallium/drivers/nv30/nvfx_fragprog.c  |   42 ++-
 src/gallium/drivers/nv30/nvfx_vertprog.c  |7 +++-
 src/gallium/drivers/nv50/codegen/nv50_ir_driver.h |2 -
 src/gallium/drivers/nv50/nv50_screen.c|1 +
 src/gallium/drivers/nv50/nv50_surface.c   |5 +-
 src/gallium/drivers/nvc0/nvc0_program.c   |   37 +---
 src/gallium/drivers/nvc0/nvc0_screen.c|1 +
 src/gallium/drivers/r300/r300_screen.c|2 +
 src/gallium/drivers/r600/r600_pipe.c  |2 +
 src/gallium/drivers/radeonsi/radeonsi_pipe.c  |2 +
 src/gallium/drivers/softpipe/sp_screen.c  |2 +
 src/gallium/drivers/svga/svga_screen.c|2 +
 src/gallium/include/pipe/p_defines.h  |3 +-
 src/gallium/include/pipe/p_shader_tokens.h|4 +-
 src/gallium/include/pipe/p_state.h|2 +-
 src/mesa/state_tracker/st_context.c   |3 +
 src/mesa/state_tracker/st_context.h   |2 +
 src/mesa/state_tracker/st_program.c   |   45 +++-
 28 files changed, 171 insertions(+), 92 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_wide_point.c 
b/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
index 8e0a117..0d3fee4 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
@@ -52,6 +52,7 @@
  */
 
 
+#include "pipe/p_screen.h"
 #include "pipe/p_context.h"
 #include "util/u_math.h"
 #include "util/u_memory.h"
@@ -74,6 +75,9 @@ struct widepoint_stage {
uint num_texcoord_gen;
uint texcoord_gen_slot[PIPE_MAX_SHADER_OUTPUTS];
 
+   /* TGSI_SEMANTIC to which sprite_coord_enable applies */
+   unsigned sprite_coord_semantic;
+
int psize_slot;
 };
 
@@ -233,28 +237,29 @@ widepoint_first_point(struct draw_stage *stage,
 
       wide->num_texcoord_gen = 0;
 
-      /* Loop over fragment shader inputs looking for generic inputs
+      /* Loop over fragment shader inputs looking for the PCOORD input or inputs
        * for which bit 'k' in sprite_coord_enable is set.
        */
       for (i = 0; i < fs->info.num_inputs; i++) {
-         if (fs->info.input_semantic_name[i] == TGSI_SEMANTIC_GENERIC) {
-            const int generic_index = fs->info.input_semantic_index[i];
-            /* Note that sprite_coord enable is a bitfield of
-             * PIPE_MAX_SHADER_OUTPUTS bits.
-             */
-            if (generic_index < PIPE_MAX_SHADER_OUTPUTS &&
-                (rast->sprite_coord_enable & (1 << generic_index))) {
-               /* OK, this generic attribute needs to be replaced with a
-                * texcoord (see above).
-                */
-               int slot = draw_alloc_extra_vertex_attrib(draw,
-                                                         TGSI_SEMANTIC_GENERIC,
-                                                         generic_index);
-
-               /* add this slot to the texcoord-gen list */
-               wide->texcoord_gen_slot[wide->num_texcoord_gen++] = slot;
-            }
+         int slot;
+         const unsigned sn = fs->info.input_semantic_name[i];
+         const unsigned si = fs->info.input_semantic_index[i];
+
+         if (sn == wide->sprite_coord_semantic) {
+            /* Note that sprite_coord_enable is a bitfield of 32 bits. */
+            if (si >= 32 || !(rast->sprite_coord_enable & (1 << si)))
+               continue;
+         } else if (sn != TGSI_SEMANTIC_PCOORD) {
+            continue;
          }
+
+         /* OK, this generic attribute needs to be replaced with a
+          * sprite coord (see above).
+          */
+         slot = draw_alloc_extra_vertex_attrib(draw, sn, si);
+
+         /* add this slot to the texcoord-gen list */
+         wide->texcoord_gen_slot[wide->num_texcoord_gen++] = slot;
       }
}
 
@@ -326,6 +331,11 @@ struct 

[Mesa-dev] [PATCH] gallium: add TGSI_SEMANTIC_TEXCOORD, PCOORD (CAP variant)

2013-03-14 Thread Christoph Bumiller
---
 src/gallium/auxiliary/draw/draw_pipe_wide_point.c |   46 +--
 src/gallium/auxiliary/tgsi/tgsi_dump.c|1 +
 src/gallium/auxiliary/tgsi/tgsi_strings.c |4 +-
 src/gallium/docs/source/cso/rasterizer.rst|5 ++
 src/gallium/docs/source/screen.rst|8 +++
 src/gallium/docs/source/tgsi.rst  |   29 +
 src/gallium/drivers/freedreno/freedreno_screen.c  |2 +
 src/gallium/drivers/i915/i915_screen.c|2 +
 src/gallium/drivers/llvmpipe/lp_screen.c  |1 +
 src/gallium/drivers/nv30/nv30_screen.c|1 +
 src/gallium/drivers/nv30/nvfx_fragprog.c  |   39 ++--
 src/gallium/drivers/nv50/codegen/nv50_ir_driver.h |2 -
 src/gallium/drivers/nv50/nv50_screen.c|1 +
 src/gallium/drivers/nv50/nv50_surface.c   |5 +-
 src/gallium/drivers/nvc0/nvc0_program.c   |   37 +--
 src/gallium/drivers/nvc0/nvc0_screen.c|1 +
 src/gallium/drivers/r300/r300_screen.c|2 +
 src/gallium/drivers/r600/r600_pipe.c  |2 +
 src/gallium/drivers/radeonsi/radeonsi_pipe.c  |2 +
 src/gallium/drivers/softpipe/sp_screen.c  |2 +
 src/gallium/drivers/svga/svga_screen.c|2 +
 src/gallium/include/pipe/p_defines.h  |3 +-
 src/gallium/include/pipe/p_shader_tokens.h|4 +-
 src/gallium/include/pipe/p_state.h|2 +-
 src/mesa/state_tracker/st_atom_rasterizer.c   |5 +-
 src/mesa/state_tracker/st_context.c   |3 +
 src/mesa/state_tracker/st_context.h   |2 +
 src/mesa/state_tracker/st_program.c   |   68 +++--
 28 files changed, 181 insertions(+), 100 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_wide_point.c 
b/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
index 8e0a117..0d3fee4 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
@@ -52,6 +52,7 @@
  */
 
 
+#include "pipe/p_screen.h"
 #include "pipe/p_context.h"
 #include "util/u_math.h"
 #include "util/u_memory.h"
@@ -74,6 +75,9 @@ struct widepoint_stage {
uint num_texcoord_gen;
uint texcoord_gen_slot[PIPE_MAX_SHADER_OUTPUTS];
 
+   /* TGSI_SEMANTIC to which sprite_coord_enable applies */
+   unsigned sprite_coord_semantic;
+
int psize_slot;
 };
 
@@ -233,28 +237,29 @@ widepoint_first_point(struct draw_stage *stage,
 
       wide->num_texcoord_gen = 0;
 
-      /* Loop over fragment shader inputs looking for generic inputs
+      /* Loop over fragment shader inputs looking for the PCOORD input or inputs
        * for which bit 'k' in sprite_coord_enable is set.
        */
       for (i = 0; i < fs->info.num_inputs; i++) {
-         if (fs->info.input_semantic_name[i] == TGSI_SEMANTIC_GENERIC) {
-            const int generic_index = fs->info.input_semantic_index[i];
-            /* Note that sprite_coord enable is a bitfield of
-             * PIPE_MAX_SHADER_OUTPUTS bits.
-             */
-            if (generic_index < PIPE_MAX_SHADER_OUTPUTS &&
-                (rast->sprite_coord_enable & (1 << generic_index))) {
-               /* OK, this generic attribute needs to be replaced with a
-                * texcoord (see above).
-                */
-               int slot = draw_alloc_extra_vertex_attrib(draw,
-                                                         TGSI_SEMANTIC_GENERIC,
-                                                         generic_index);
-
-               /* add this slot to the texcoord-gen list */
-               wide->texcoord_gen_slot[wide->num_texcoord_gen++] = slot;
-            }
+         int slot;
+         const unsigned sn = fs->info.input_semantic_name[i];
+         const unsigned si = fs->info.input_semantic_index[i];
+
+         if (sn == wide->sprite_coord_semantic) {
+            /* Note that sprite_coord_enable is a bitfield of 32 bits. */
+            if (si >= 32 || !(rast->sprite_coord_enable & (1 << si)))
+               continue;
+         } else if (sn != TGSI_SEMANTIC_PCOORD) {
+            continue;
          }
+
+         /* OK, this generic attribute needs to be replaced with a
+          * sprite coord (see above).
+          */
+         slot = draw_alloc_extra_vertex_attrib(draw, sn, si);
+
+         /* add this slot to the texcoord-gen list */
+         wide->texcoord_gen_slot[wide->num_texcoord_gen++] = slot;
       }
}
 
@@ -326,6 +331,11 @@ struct draw_stage *draw_wide_point_stage( struct draw_context *draw )
    if (!draw_alloc_temp_verts( &wide->stage, 4 ))
       goto fail;
 
+   wide->sprite_coord_semantic =
+      draw->pipe->screen->get_param(draw->pipe->screen, PIPE_CAP_TGSI_TEXCOORD)
+      ?
+      TGSI_SEMANTIC_TEXCOORD : TGSI_SEMANTIC_GENERIC;
+
    return &wide->stage;
 
  fail:
diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c 
b/src/gallium/auxiliary/tgsi/tgsi_dump.c
index 3e6f76a..8f16f2d 100644

[Mesa-dev] [PATCH] gallium: add TGSI_SEMANTIC_TEXCOORD,PCOORD

2013-03-13 Thread Christoph Bumiller
Second attempt, 2 years ago no one replied or cared ...

We really need to know about these on nvc0 because there are only 8
fixed hardware locations that can be overwritten by sprite coordinates,
and one location that represents gl_PointCoord and unconditionally
returns sprite coordinates.

So far this was solved via a hack, which works since the locations the
state tracker picks aren't dynamic (and likely will never be, to facilitate
ARB_separate_shader_objects), but it still isn't nice to do it this way.

It looks like nv30 was using a hack, too, since it had a check for
Semantic.Index == 9, which is what mesa uses for PointCoord.

Implementing a safe, non-mesa-dependent way without these SEMANTICs would
be jumping through hoops and doing expensive shader recompilations just
because we like to destroy information at the gallium threshold, and that's
unacceptable.

I started to (try to) fix up the other drivers, but maybe we just want a CAP
for this instead, since the default solution - if this is TEXCOORD then
treat it as GENERIC with semantic index += MAX_TEXCOORDS - doesn't really
look much nicer either.
E.g. if PIPE_CAP_RESTRICTED_SPRITE_COORDS is advertised, the state tracker
should use the TEXCOORD and PCOORD semantics, otherwise it should just use
GENERICs as before.
---
 src/gallium/auxiliary/draw/draw_pipe_wide_point.c  |   39 
 src/gallium/auxiliary/tgsi/tgsi_dump.c |1 +
 src/gallium/auxiliary/tgsi/tgsi_strings.c  |2 +
 src/gallium/docs/source/cso/rasterizer.rst |2 +-
 src/gallium/docs/source/tgsi.rst   |   23 +-
 src/gallium/drivers/freedreno/freedreno_compiler.c |2 +
 src/gallium/drivers/i915/i915_fpc_translate.c  |2 +
 src/gallium/drivers/i915/i915_state_derived.c  |4 ++
 src/gallium/drivers/llvmpipe/lp_setup_point.c  |   29 ++--
 src/gallium/drivers/nv30/nvfx_fragprog.c   |   39 
 src/gallium/drivers/nv50/nv50_shader_state.c   |8 +--
 src/gallium/drivers/nv50/nv50_surface.c|5 +-
 src/gallium/drivers/nvc0/nvc0_program.c|   37 +--
 src/gallium/drivers/r300/r300_fs.c |2 +
 src/gallium/drivers/r300/r300_shader_semantics.h   |3 +-
 src/gallium/drivers/r300/r300_vs.c |2 +
 src/gallium/drivers/r600/evergreen_state.c |7 ++-
 src/gallium/drivers/r600/r600_shader.c |3 +-
 src/gallium/drivers/r600/r600_state.c  |7 ++-
 src/gallium/drivers/radeonsi/radeonsi_shader.c |1 +
 src/gallium/drivers/radeonsi/si_state.c|2 +-
 src/gallium/drivers/radeonsi/si_state_draw.c   |5 +-
 src/gallium/include/pipe/p_shader_tokens.h |   36 +--
 src/gallium/include/pipe/p_state.h |2 +-
 src/mesa/state_tracker/st_atom_rasterizer.c|6 +--
 src/mesa/state_tracker/st_program.c|   48 +--
 26 files changed, 162 insertions(+), 155 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_wide_point.c 
b/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
index 8e0a117..d4ed0f7 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_wide_point.c
@@ -233,28 +233,29 @@ widepoint_first_point(struct draw_stage *stage,
 
       wide->num_texcoord_gen = 0;
 
-      /* Loop over fragment shader inputs looking for generic inputs
-       * for which bit 'k' in sprite_coord_enable is set.
+      /* Loop over fragment shader inputs looking for the PCOORD input or
+       * TEXCOORD inputs for which bit 'k' in sprite_coord_enable is set.
        */
       for (i = 0; i < fs->info.num_inputs; i++) {
-         if (fs->info.input_semantic_name[i] == TGSI_SEMANTIC_GENERIC) {
-            const int generic_index = fs->info.input_semantic_index[i];
-            /* Note that sprite_coord enable is a bitfield of
-             * PIPE_MAX_SHADER_OUTPUTS bits.
-             */
-            if (generic_index < PIPE_MAX_SHADER_OUTPUTS &&
-                (rast->sprite_coord_enable & (1 << generic_index))) {
-               /* OK, this generic attribute needs to be replaced with a
-                * texcoord (see above).
-                */
-               int slot = draw_alloc_extra_vertex_attrib(draw,
-                                                         TGSI_SEMANTIC_GENERIC,
-                                                         generic_index);
-
-               /* add this slot to the texcoord-gen list */
-               wide->texcoord_gen_slot[wide->num_texcoord_gen++] = slot;
-            }
+         int slot;
+         const unsigned sn = fs->info.input_semantic_name[i];
+         const unsigned si = fs->info.input_semantic_index[i];
+
+         if (sn == TGSI_SEMANTIC_TEXCOORD) {
+            /* Note that sprite_coord enable is a bitfield of 8 bits. */
+            if (si >= 8 || !(rast->sprite_coord_enable & (1 << si)))
+  

Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-12 Thread Christoph Bumiller
On 12.03.2013 10:31, Christian König wrote:
 Am 12.03.2013 02:48, schrieb Marek Olšák:
 On Mon, Mar 11, 2013 at 1:44 PM, Christian König
 deathsim...@vodafone.de wrote:
 Hi everybody,

 this problem has been open for quite some time now, with a bunch of
 different
 opinions and sometimes even patches floating on the list.

 The solutions proposed or implemented so far are all more or less
 incomplete, so this approach was designed with both completeness and
 compatibility with existing code in mind.

 Over all it's just an implementation of what Tom Stellard named
 solution #4 in
 this eMail thread:
 http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html
 Hi Christian,

 this is definitely not the solution #4. According to the TGSI dump
 Christoph posted, it looks more like #3.

 Well, for me the main difference between proposal #3 and #4 is that #3
 tries to identify the declaration to use with the supplied offset,
 while #4 uses a completely distinct identifier for that.

 The solution #4 completely changes the temporary file such that it
 becomes two-dimensional with the first index being a literal and the
 second index being either a literal or ADDR[literal], and it would
 always be like that regardless of whether drivers support that or not.
 One-dimensional indexing of TEMP is not allowed. For backward
 compatibility, the drivers that do not support it would only get a
 single array declaration TEMP[0][0..n] and TEMP[0][...] would be
 everywhere in the code.

 Ok, then I misunderstood you a bit, but I don't think the difference
 is so much.

 What I'm proposing is that we have an optional ArrayID attached to
 each declaration and refer to this ArrayID in the indirect
 addressing operand. To sum it up declarations should look something
 like this:

 DCL TEMP[0..3]// normal registers
 DCL TEMP[1][4..11]// indirectly accessed array
 DCL TEMP[2][12..15]// another indirectly accessed array
 DCL TEMP[16..17] LOCAL// local registers

 While an indirect operand might look like this:

 MOV TEMP[16], TEMP[1][ADDR[0].x-13]

 On the pro side, this approach is compatible with all
 the existing state trackers and drivers, and we don't need to generate
 different code depending on whether or not the driver supports this.

 I don't know much about TGSI internals, so I can't review this. I'd
 just like to say that TGSI dumps should make sense (2D indexing should
 be only allowed with 2D declarations) and tgsi_text_translate should
 be able to do the reverse - convert the dumps back to TGSI tokens.

 Completely agree with that, and besides writing documentation, testing
 this is still one of the todos with this patchset.

 I have to admit that your approach looks a bit cleaner from a high-level
 view. The problem with it is that it requires this additional 2D
 index on every operand, and we just don't have enough bits left for
 this. Even with my approach I need to make room for this ArrayID in
 the indirect addressing operand token, and this additional token is
 only there if the operand uses indirect addressing.

 Do you think we can live with my approach or is there any major
 downside I currently don't see?


I can live with it. I think ... (I hope I don't regret this later; seems
like this doesn't contain less information, then it's ok.)
If the placement of the hint index offends someone, just write it as
MOV TEMP[16], TEMP(1)[ADDR[0].x-13] or ...
TEMP[ADDR[0].x-13 : 1] or
TEMP[ADDR[0].x-13 supposedToBeIn [4,11]] or
something ... nicer.

Actually ...
if TEMP[0] is placed at mem[0]
and TEMP[4..11] is placed at, say, mem[0x1000 in bytes]

do I have to
load $register mem[$addr - 0xd0] (no this can't work) or
load $register mem[$addr - 0xd0 + 0x1000] (if you didn't adjust the
offset) or
load $register mem[$addr - 0xd0 + 0x1000 - 0x40] (if you already added
the base TEMP to the immediate offset)

This needs to be documented as well.
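
To make the three readings above concrete, here is a minimal C sketch.
Everything in it is an assumption for illustration: struct array_decl and
the function names are hypothetical, registers are taken to be 16 bytes
(which matches 0xd0 = 13 * 16 and 0x40 = 4 * 16), and the address register
is given in register units rather than bytes.

```c
#include <assert.h>
#include <stdint.h>

#define REG_SIZE 16 /* bytes per vec4 register (assumption) */

/* Hypothetical description of one indirectly accessed array. */
struct array_decl {
   unsigned first;    /* first TEMP index of the array, e.g. 4 */
   unsigned last;     /* last TEMP index of the array, e.g. 11 */
   uint32_t mem_base; /* byte address the array was placed at */
};

/* Reading 1: ignore the array placement entirely ("can't work"). */
static int64_t addr_flat(int32_t addr_reg, int32_t imm)
{
   return ((int64_t)addr_reg + imm) * REG_SIZE;
}

/* Reading 2: the immediate is already relative to the array start. */
static int64_t addr_array_relative(const struct array_decl *a,
                                   int32_t addr_reg, int32_t imm)
{
   return (int64_t)a->mem_base + ((int64_t)addr_reg + imm) * REG_SIZE;
}

/* Reading 3: the immediate is relative to the whole TEMP file, so the
 * index of the array's first register has to be subtracted again.
 */
static int64_t addr_file_relative(const struct array_decl *a,
                                  int32_t addr_reg, int32_t imm)
{
   assert(a->first <= a->last);
   return (int64_t)a->mem_base +
          ((int64_t)addr_reg + imm - (int64_t)a->first) * REG_SIZE;
}
```

For TEMP[1][ADDR[0].x-13] with ADDR[0].x = 17, only the third reading lands
on the first element of the array at 0x1000; the second is off by exactly
0x40, which is why the convention has to be written down.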

 Thanks for the clarification,
 Christian.


Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-12 Thread Christoph Bumiller
On 12.03.2013 12:10, Christoph Bumiller wrote:
 On 12.03.2013 10:31, Christian König wrote:
 Am 12.03.2013 02:48, schrieb Marek Olšák:
 On Mon, Mar 11, 2013 at 1:44 PM, Christian König
 deathsim...@vodafone.de wrote:
 Hi everybody,

 this problem has been open for quite some time now, with a bunch of
 different
 opinions and sometimes even patches floating on the list.

 The solutions proposed or implemented so far are all more or less
 incomplete, so this approach was designed with both completeness and
 compatibility with existing code in mind.

 Over all it's just an implementation of what Tom Stellard named
 solution #4 in
 this eMail thread:
 http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html
 Hi Christian,

 this is definitely not the solution #4. According to the TGSI dump
 Christoph posted, it looks more like #3.
 Well, for me the main difference between proposal #3 and #4 is that #3
 tries to identify the declaration to use with the supplied offset,
 while #4 uses a completely distinct identifier for that.

 The solution #4 completely changes the temporary file such that it
 becomes two-dimensional with the first index being a literal and the
 second index being either a literal or ADDR[literal], and it would
 always be like that regardless of whether drivers support that or not.
 One-dimensional indexing of TEMP is not allowed. For backward
 compatibility, the drivers that do not support it would only get a
 single array declaration TEMP[0][0..n] and TEMP[0][...] would be
 everywhere in the code.
 Ok, then I misunderstood you a bit, but I don't think the difference
 is so much.

 What I'm proposing is that we have an optional ArrayID attached to
 each declaration and refer to this ArrayID in the indirect
 addressing operand. To sum it up declarations should look something
 like this:

 DCL TEMP[0..3]// normal registers
 DCL TEMP[1][4..11]// indirectly accessed array
 DCL TEMP[2][12..15]// another indirectly accessed array
 DCL TEMP[16..17] LOCAL// local registers

 While an indirect operand might look like this:

 MOV TEMP[16], TEMP[1][ADDR[0].x-13]

 On the pro side, this approach is compatible with all
 the existing state trackers and drivers, and we don't need to generate
 different code depending on whether or not the driver supports this.

 I don't know much about TGSI internals, so I can't review this. I'd
 just like to say that TGSI dumps should make sense (2D indexing should
 be only allowed with 2D declarations) and tgsi_text_translate should
 be able to do the reverse - convert the dumps back to TGSI tokens.
 Completely agree with that, and besides writing documentation, testing
 this is still one of the todos with this patchset.

 I have to admit that your approach looks a bit cleaner from a high-level
 view. The problem with it is that it requires this additional 2D
 index on every operand, and we just don't have enough bits left for
 this. Even with my approach I need to make room for this ArrayID in
 the indirect addressing operand token, and this additional token is
 only there if the operand uses indirect addressing.

 Do you think we can live with my approach or is there any major
 downside I currently don't see?


One more thing. While you're at it (i.e. are familiar with the code),
could you set the UsageMask in the TGSI declaration so we can pack
scalar or vec2 arrays ?
Also, you could then declare gl_ClipDistance outputs as

DCL OUT[0..7].x, CLIPDIST

so we can actually index clip distances properly ?

With
DCL OUT[0..1].xyzw, CLIPDIST we can't really index the individual
components which leads to
if ((index & 3) == 0)
   MOV OUT[index / 4].x = value
else if ((index & 3) == 1)
   MOV OUT[index / 4].y = value

which is unnecessary on some hardware.
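
Read as C, the difference between the two declarations looks roughly like
the sketch below. This is only an illustration under assumptions: struct
vec4 and both function names are invented here, and the switch stands in
for the if/else chain above, since a component cannot be selected with a
dynamic index.

```c
#include <assert.h>

struct vec4 { float x, y, z, w; };

/* DCL OUT[0..1].xyzw, CLIPDIST: the register can be indexed, but the
 * component has to be picked with a chain of compares.
 */
static void store_clipdist_packed(struct vec4 out[2], unsigned index,
                                  float value)
{
   struct vec4 *reg;

   assert(index < 8);
   reg = &out[index / 4];
   switch (index & 3) {
   case 0:  reg->x = value; break;
   case 1:  reg->y = value; break;
   case 2:  reg->z = value; break;
   default: reg->w = value; break;
   }
}

/* DCL OUT[0..7].x, CLIPDIST: one scalar per register, so the index can
 * be used directly.
 */
static void store_clipdist_scalar(float out[8], unsigned index, float value)
{
   assert(index < 8);
   out[index] = value;
}
```

Both variants store the same eight values; the scalar declaration simply
lets hardware that can index outputs do so without the compare chain.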

 I can live with it. I think ... (I hope I don't regret this later; seems
 like this doesn't contain less information, then it's ok.)
 If the placement of the hint index offends someone, just write it as
 MOV TEMP[16], TEMP(1)[ADDR[0].x-13] or ...
 TEMP[ADDR[0].x-13 : 1] or
 TEMP[ADDR[0].x-13 supposedToBeIn [4,11]] or
 something ... nicer.

 Actually ...
 if TEMP[0] is placed at mem[0]
 and TEMP[4..11] is placed at, say, mem[0x1000 in bytes]

 do I have to
 load $register mem[$addr - 0xd0] (no this can't work) or
 load $register mem[$addr - 0xd0 + 0x1000] (if you didn't adjust the
 offset) or
 load $register mem[$addr - 0xd0 + 0x1000 - 0x40] (if you already added
 the base TEMP to the immediate offset)

 This needs to be documented as well.

 Thanks for the clarification,
 Christian.

Re: [Mesa-dev] [PATCH 1/2] d3d1x: Remove.

2013-03-11 Thread Christoph Bumiller
On 11.03.2013 11:26, Jose Fonseca wrote:
 (First email was too long, so re-sending just the interesting bits)

Please tell me removing this came to mind because you're going to
release a better D3D9,10/11 state tracker :)
(Nah I guess it would be too much trouble if there's no users for it ...)

This one *did* kind of work, notably also with wine, but it still has
loads of bugs and I just don't have the time to improve it; and then add
those missing bits like deferred contexts, virtual functions, compute
shader or UAV support.
Also gallium's still not completely able to support everything properly.
It did acquire some of the missing parts though since last time I
touched it.

I had succeeded in making Unigine Heaven run (taking a little shortcut
with sm4 to nv50, extending the gallium interface for features like
tessellation that are still years ahead for all the other parties would
not have been well received at that time, at least I had that
impression), but all the more complex games I tested crashed somewhere
and I wasn't going to try to debug binary blobs (most of them seemed to
require those missing features, too).

Anyway, just meant to say, it *could* have been useful had someone
finished it ... if only with wine.
So I'm fine with removing it since I don't expect anyone to get back to
it. Trying to decide between farewell and good riddance for all the
pain its bugs caused me.

 From: José Fonseca jfons...@vmware.com

 Unused/unmaintained.
 ---
  configure.ac   |   21 -
  src/gallium/docs/source/context.rst|2 +-
  src/gallium/state_trackers/d3d1x/.gitignore|   20 -
  src/gallium/state_trackers/d3d1x/Makefile  |   11 -
  src/gallium/state_trackers/d3d1x/Makefile.inc  |   19 -
  .../state_trackers/d3d1x/d3d1xshader/Makefile  |   16 -
  .../d3d1x/d3d1xshader/defs/files.txt   |   41 -
  .../d3d1x/d3d1xshader/defs/interpolations.txt  |8 -
  .../d3d1x/d3d1xshader/defs/opcodes.txt |  207 --
  .../d3d1x/d3d1xshader/defs/operand_compnums.txt|5 -
  .../d3d1x/d3d1xshader/defs/operand_index_reprs.txt |5 -
  .../d3d1x/d3d1xshader/defs/operand_modes.txt   |4 -
  .../d3d1x/d3d1xshader/defs/shortfiles.txt  |   41 -
  .../state_trackers/d3d1x/d3d1xshader/defs/svs.txt  |   23 -
  .../d3d1x/d3d1xshader/defs/targets.txt |   13 -
  .../defs/token_instruction_extended_types.txt  |4 -
  .../defs/token_operand_extended_types.txt  |2 -
  .../state_trackers/d3d1x/d3d1xshader/gen-header.sh |   13 -
  .../state_trackers/d3d1x/d3d1xshader/gen-text.sh   |   11 -
  .../d3d1x/d3d1xshader/include/dxbc.h   |  125 -
  .../d3d1x/d3d1xshader/include/le32.h   |   45 -
  .../state_trackers/d3d1x/d3d1xshader/include/sm4.h |  416 
  .../d3d1x/d3d1xshader/src/dxbc_assemble.cpp|   59 -
  .../d3d1x/d3d1xshader/src/dxbc_dump.cpp|   43 -
  .../d3d1x/d3d1xshader/src/dxbc_parse.cpp   |   87 -
  .../d3d1x/d3d1xshader/src/sm4_analyze.cpp  |  122 -
  .../d3d1x/d3d1xshader/src/sm4_dump.cpp |  222 --
  .../d3d1x/d3d1xshader/src/sm4_parse.cpp|  445 
  .../state_trackers/d3d1x/d3d1xshader/src/utils.h   |   45 -
  .../d3d1x/d3d1xshader/tools/fxdis.cpp  |   75 -
  .../state_trackers/d3d1x/d3d1xstutil/Makefile  |5 -
  .../d3d1x/d3d1xstutil/include/d3d1xstutil.h| 1110 -
  .../d3d1x/d3d1xstutil/src/d3d_sm4_enums.cpp|   42 -
  .../d3d1x/d3d1xstutil/src/dxgi_enums.cpp   |  165 --
  .../state_trackers/d3d1x/d3d1xstutil/src/guids.cpp |6 -
  src/gallium/state_trackers/d3d1x/d3dapi/Makefile   |4 -
  src/gallium/state_trackers/d3d1x/d3dapi/d3d10.idl  | 1554 
  .../state_trackers/d3d1x/d3dapi/d3d10_1.idl|  191 --
  .../state_trackers/d3d1x/d3dapi/d3d10misc.h|   47 -
  .../state_trackers/d3d1x/d3dapi/d3d10shader.idl|  269 ---
  src/gallium/state_trackers/d3d1x/d3dapi/d3d11.idl  | 2492 
 
  .../state_trackers/d3d1x/d3dapi/d3d11shader.idl|  287 ---
  .../state_trackers/d3d1x/d3dapi/d3dcommon.idl  |  704 --
  src/gallium/state_trackers/d3d1x/d3dapi/dxgi.idl   |  470 
  .../state_trackers/d3d1x/d3dapi/dxgiformat.idl |  129 -
  .../state_trackers/d3d1x/d3dapi/dxgitype.idl   |   84 -
  src/gallium/state_trackers/d3d1x/docs/Makefile |5 -
  .../state_trackers/d3d1x/docs/coding_style.txt |   84 -
  .../d3d1x/docs/module_dependencies.dot |   25 -
  .../state_trackers/d3d1x/docs/source_layout.txt|   17 -
  src/gallium/state_trackers/d3d1x/dxgi/Makefile |   17 -
  .../state_trackers/d3d1x/dxgi/src/dxgi_loader.cpp  |  206 --
  .../state_trackers/d3d1x/dxgi/src/dxgi_native.cpp  | 1514 
  .../state_trackers/d3d1x/dxgi/src/dxgi_private.h   |   49 -
  .../state_trackers/d3d1x/dxgid3d10/Makefile|4 -
  

Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-11 Thread Christoph Bumiller
On 11.03.2013 13:44, Christian König wrote:
 Hi everybody,

 this problem has been open for quite some time now, with a bunch of different
 opinions and sometimes even patches floating on the list.
Nice, finally someone implements a proper solution.
However, it seems like this isn't used for arrays in the IN and OUT
files (varyings). Would it be much more work to use it there, too ?

Fragment Shader inputs seem to be read with if (index == 0) return
in[0] else if (index == 1) ... sequences.

And I may have spotted a bug in the following shader:

in vec4 vertex[2];
in vec4 color;
out vec4 value[4];

uniform int i, j;

void main()
{
gl_Position = vertex[i];

value[0] = vertex[0];
value[1] = vertex[1];
value[2] = vec4(0.0);
value[3] = vec4(0.0);
value[j] = color;
}

gives me

DCL IN[0]
DCL IN[1]
DCL IN[2]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[12]
DCL OUT[2], GENERIC[13]
DCL OUT[3], GENERIC[14]
DCL OUT[4], GENERIC[15]
DCL CONST[0..1]
DCL TEMP[0..3], LOCAL
DCL TEMP[4], LOCAL
DCL ADDR[0]
IMM[0] FLT32 {0., 0., 0., 0.}
  0: UARL ADDR[0].x, CONST[1].
  1: MOV TEMP[4], IN[ADDR[0].x]  (not the bug) but this is invalid as
there is no IN array, just single ones
  2: MOV TEMP[0], IN[0]
  3: MOV TEMP[1], IN[1]
  4: MOV TEMP[2], IMM[0].
  5: MOV TEMP[3], IMM[0].
  6: UARL ADDR[0].x, CONST[0].
  7: MOV TEMP[1][ADDR[0].x], IN[2]

why is this TEMP[1][] ? The array seems to be the first declaration ...

  8: MOV OUT[1], TEMP[0]
  9: MOV OUT[2], TEMP[1]
 10: MOV OUT[3], TEMP[2]
 11: MOV OUT[4], TEMP[3]
 12: MOV OUT[0], TEMP[4]
 13: END

Ideally this would not use TEMP arrays at all though, but output arrays
(I vaguely recall some radeon card doesn't support this though. Is that
just outputs or also inputs ?).

 The solutions proposed or implemented so far are all more or less incomplete, so
 this approach was designed with both completeness and compatibility
 with existing code in mind.

 Over all it's just an implementation of what Tom Stellard named solution #4 in
 this eMail thread: 
 http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html

 Please review and as usual comments are welcome,
 Christian.



Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-11 Thread Christoph Bumiller
On 11.03.2013 15:38, Christian König wrote:
 Am 11.03.2013 14:47, schrieb Christoph Bumiller:
 On 11.03.2013 13:44, Christian König wrote:
 Hi everybody,

 this problem has been open for quite some time now, with a bunch of
 different
 opinions and sometimes even patches floating on the list.
 Nice, finally someone implements a proper solution.
 However, it seems like this isn't used for arrays in the IN and OUT
 files (varyings). Would it be much more work to use it there, too ?

 Shouldn't be too much of a problem, but I just wanted to solve
 temporaries first and when that's working look at all the rest.

 Fragment Shader inputs seem to be read with if (index == 0) return
 in[0] else if (index == 1) ... sequences.

 Well as said before it only handles temp arrays for now. That looks
 like the code that's generated if the driver reports to not have
 indirect support, do you know off hand where exactly that's handled?
 The glsl_to_tgsi code is unfortunately hard to read at best.


Apologies, I didn't remember that I didn't advertise indirect support
for fragment shaders; indirect inputs would be supported though.
The reason why I really want array support for inputs, too, is that
input space location depends on semantic, and thus doesn't necessarily
correspond to the TGSI order.

Treatment of arrays should be consistent in the end, right now it looks
like we're having, if you read this like C code:
float temp0[4];
temp0[i] = x;

but
float in0, in1, in2, in3;
x = in[i];

 why is this TEMP[1][] ? The array seems to be the first declaration ...

 I numbered the declarations starting with 1 (and not 0), so I could
 use 0 as the special case saying that we want to address the whole
 range of registers and not just one declaration. I did this just for
 compatibility reasons, so I could look at handling temps only and
 not bother too much with inputs/outputs.

 Well so far the patchset is just an RFC, and so I want to let the list
 see the patches before either implementing inputs/outputs as well or
 fully document such quirks/hacks.


Ah, good to know. This should be documented (maybe it is and I missed it
?). At least in the comment above struct tgsi_ind_register's definition,
which is what I'd look at first.

Thanks again for doing this.


Re: [Mesa-dev] [RFC] Solving the TGSI indirect addressing optimization problem

2013-03-11 Thread Christoph Bumiller
On 11.03.2013 19:33, Brian Paul wrote:
 On 03/11/2013 06:44 AM, Christian König wrote:
 Hi everybody,

 this problem has been open for quite some time now, with a bunch of
 different
 opinions and sometimes even patches floating on the list.

 The solutions proposed or implemented so far are all more or less
 incomplete, so this approach was designed with both completeness and
 compatibility with existing code in mind.

 Over all it's just an implementation of what Tom Stellard named
 solution #4 in
 this eMail thread:
 http://lists.freedesktop.org/archives/mesa-dev/2013-January/033264.html

 Please review and as usual comments are welcome,

 I still don't quite get what's going on here.

 In Christoph's reply, it seems he tested your patch and got TGSI code
 that looks like this:

 DCL IN[0]
 DCL IN[1]
 DCL IN[2]
 DCL OUT[0], POSITION
 DCL OUT[1], GENERIC[12]
 DCL OUT[2], GENERIC[13]
 DCL OUT[3], GENERIC[14]
 DCL OUT[4], GENERIC[15]
 DCL CONST[0..1]
 DCL TEMP[0..3], LOCAL
 DCL TEMP[4], LOCAL
 DCL ADDR[0]
 IMM[0] FLT32 {0., 0., 0., 0.}
   0: UARL ADDR[0].x, CONST[1].
   1: MOV TEMP[4], IN[ADDR[0].x]  (not the bug) but this is invalid as
 there is no IN array, just single ones
   2: MOV TEMP[0], IN[0]
   3: MOV TEMP[1], IN[1]
   4: MOV TEMP[2], IMM[0].
   5: MOV TEMP[3], IMM[0].
   6: UARL ADDR[0].x, CONST[0].
   7: MOV TEMP[1][ADDR[0].x], IN[2]

 What exactly does LOCAL mean on the temp declarations?

It means that the register isn't used for parameter passing between
subroutines. It was introduced a long time ago;
see commit 2644952bd4dfa3b75112dee8dfd287a12d770705.

 But in Jose's example, he wrote:

   DCL TEMP[1][0..70]
   DCL TEMP[2][0..7]
   MOV OUT[1], TEMP[1][ADDR[0].x]

 In this code, each chunk of temporaries has an explicit name as Marek
 suggested in his comments to the #4 proposal.


The point is that TEMP (and likewise all other spaces) is still a
single space, i.e. without duplicate indices. The only real change is
that an indirect access is supplied with the index of the declaration
whose range will be accessed.
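
To make the addressing model concrete, here is a minimal sketch in C. It
is only an illustration under stated assumptions: `decl_range` and
`resolve_indirect` are hypothetical names that appear neither in the
patch set nor in TGSI, and clamping an out-of-range offset is a choice
made for the sketch, not something the proposal specifies.

```c
#include <assert.h>

/* Hypothetical model of one "DCL TEMP[first..last]" declaration. */
struct decl_range {
   unsigned first; /* first register index covered by the declaration */
   unsigned last;  /* last register index covered (inclusive) */
};

/*
 * Resolve TEMP[base + addr] for an indirect operand that also carries the
 * index of the declaration it may touch. Register indices stay in one flat
 * TEMP space; the declaration only bounds the access.
 * Clamping to the range is an assumption of this sketch.
 */
static unsigned
resolve_indirect(const struct decl_range *decls, unsigned num_decls,
                 unsigned decl_index, unsigned base, int addr)
{
   assert(decl_index < num_decls);

   const struct decl_range *d = &decls[decl_index];
   long idx = (long)base + addr;

   if (idx < (long)d->first)
      idx = (long)d->first;
   if (idx > (long)d->last)
      idx = (long)d->last;
   return (unsigned)idx;
}
```

With the declarations from the dump above (TEMP[0..3] and TEMP[4]), an
access through declaration 0 can only land in registers 0 to 3, so a
driver could keep TEMP[4] out of the indirectly addressed set.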

 What exactly is your proposal doing?

 Can you please provide some more sample TGSI code to illustrate what
 you're doing?  And, how it would be extended for inputs/outputs?

 Thanks.

 -Brian




Re: [Mesa-dev] RFC: enforcing gallium resource bind flags

2013-03-01 Thread Christoph Bumiller
On 01.03.2013 11:30, Jose Fonseca wrote:
 - Original Message -
 On Fri, Mar 1, 2013 at 12:31 AM, Roland Scheidegger srol...@vmware.com
 wrote:
 Hi,

 there is some sloppy usage of bind flags in the opengl state tracker
 (that is, resources get used for things which they didn't have the bind
 flag set).
 We'd really like to enforce these flags to be honored but it doesn't
 really work (ok llvmpipe so far would only really care about sampler
 view, color render target, depth/stencil - see also
 c8eb2d0e829d0d2aea6a982620da0d3cfb5982e2).

 Currently it looks like there are at least two issues with those bind
 flags in the opengl state tracker (for these bind flags only, there are
 almost certainly more).
 1) for textures, the state tracker will always try to allocate resources
 with both sampler_view and render_target (or depth/stencil) bind flags.
 However it will drop these flags for resources where this isn't
 supported. This is all right, however when we try to render to such
 resources, the surface will be created regardless (but it won't get used
 as it will fail framebuffer validation which checks the attachments and
 specifically tests if the format is a renderable format). I guess this
 could be fixed (seems a bit backward, it might be possible to just look
 at the resource bind flags to decide if we create a surface or not, and
 we shouldn't need to check the format later - if we've got the bind flag
 we know we can create a surface and hence render to).

 2) a far more difficult problem seems to be buffers. While piglit doesn't
 hit it (I modified the tbo test to hit this) it is possible to create
 buffers with any target and later bind to anything. So the state tracker
 has no knowledge at all what a buffer will eventually get used for
 (other than the hint when it was first created), and it seems
 unreasonable to just set all possible bind flags all the time. But then
 still enforcing bind flags later would require the state tracker to
 recreate the resource (with more bind flags) and copy over the old
 contents, which sounds very bad too.

 So any ideas?
 In my opinion, the bind flags are useless, because they cannot be
 determined for OpenGL resources exactly. The only exceptions are:
 - PIPE_BIND_CONSTANT_BUFFER, which is set correctly for the default
 non-UBO constant buffer.
 - PIPE_BIND_SCANOUT for the DDX and DRM state trackers.
 - PIPE_BIND_GLOBAL for OpenCL.

 The radeon drivers ignore the bind flags entirely except SCANOUT and
 GLOBAL, and r300g also checks for CONSTANT_BUFFER.

 The OpenGL buffer API doesn't have any bind flags. It only has binding
 points, and any buffer can be bound to any binding point. Textures are
 just as fun. You can create a texture or a renderbuffer, but if you
 use CopyTexSubImage, the roles are swapped - what was a texture is
 suddenly a renderbuffer and what was a renderbuffer is suddenly a
 texture.

 If we didn't need the 3 bind flags mentioned above, I would be for
 removing pipe_resource::bind, because it's not that useful.
 APIs other than OpenGL have clearer and stricter binding rules, which drivers
 can rely upon to make real optimizations.

 I honestly don't see what the difficulty is here; the semantics are clear:

 - If the Mesa state tracker doesn't care about BIND flags, that's fine, just
 let Mesa request as many BIND flags as the driver advertises.

 - If a gallium driver doesn't care about BIND flags, that's fine, just 
 advertise ~0 bind flags.

You can still use them as a hint for optimization. For example, if
an application binds a buffer to GL_ARRAY_BUFFER on creation, it is
rather likely to be used mainly as a vertex buffer.
So I wouldn't want mesa/st to just simply set all of them and discard
that bit of information about the user's intentions.
Rather than setting all flags, I'd add an additional PIPE_BIND_UNKNOWN
to signal that the binding specified should not be taken too seriously,
and that the resource should be bindable to all points possible for the
given resource.
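
A rough sketch in C of how a driver might interpret such a flag follows.
Everything here is hypothetical: PIPE_BIND_UNKNOWN does not exist in
gallium, the SKETCH_* values are made up for illustration (the real
PIPE_BIND_* bits live in p_defines.h), and the split into "required" and
"preferred" bindings is just one possible reading of the idea.

```c
#include <assert.h>

/* Illustrative bit values only, not the real gallium definitions. */
#define SKETCH_BIND_VERTEX_BUFFER   (1u << 0)
#define SKETCH_BIND_INDEX_BUFFER    (1u << 1)
#define SKETCH_BIND_CONSTANT_BUFFER (1u << 2)
#define SKETCH_BIND_UNKNOWN         (1u << 31) /* hypothetical new flag */

#define SKETCH_BIND_ALL_BUFFER \
   (SKETCH_BIND_VERTEX_BUFFER | SKETCH_BIND_INDEX_BUFFER | \
    SKETCH_BIND_CONSTANT_BUFFER)

/* Bindings the resource must support: everything if UNKNOWN is set. */
static unsigned
sketch_required_binds(unsigned bind)
{
   /* The hint flag must not alias a real binding point. */
   assert((SKETCH_BIND_ALL_BUFFER & SKETCH_BIND_UNKNOWN) == 0);

   if (bind & SKETCH_BIND_UNKNOWN)
      return SKETCH_BIND_ALL_BUFFER;
   return bind;
}

/* Bindings the driver may optimize placement for: the original hint. */
static unsigned
sketch_preferred_binds(unsigned bind)
{
   return bind & ~SKETCH_BIND_UNKNOWN;
}
```

A buffer first bound to GL_ARRAY_BUFFER would then be created with the
vertex buffer bit plus the unknown bit: the driver still learns about the
likely use, but has to keep the resource bindable everywhere.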

We could also make it possible to add new bind flags as you go, via an
additional gallium API function, where the driver can, for instance,
migrate a resource if necessary.
But I don't quite like that because the state tracker would have to keep
checking the bind flags all the time which is rather ugly.

 It's really as simple as that. If the state tracker cares about some but not 
 all flags you can easily extrapolate from the above what should be done.

 But let's not get carried away and throw the baby out with the bathwater.
 Eliminating bind flags from Gallium does nothing to improve Mesa performance,
 and will hamper other gallium-based graphics stacks.

Agreed. I'm strongly against removing the bind flags. There are still
other state trackers that could really make use of them.

 Jose


Re: [Mesa-dev] [PATCH] llvmpipe: support rendering to buffer render targets.

2013-02-27 Thread Christoph Bumiller
On 27.02.2013 10:44, Jose Fonseca wrote:
 - Original Message -
 What is this good for? Is it for UAVs? (unordered access views)
 No, it is just a standard D3D10 feature: 
 http://msdn.microsoft.com/en-gb/library/windows/desktop/bb204897.aspx

 Not sure if there's a particular use case for it (e.g., maybe DirectCompute
 uses this extensively), or just a matter of symmetry in the API (i.e., if one
 can sample from buffer textures, then why not render into them?)
I can think of rendering to vertex buffers. It's just annoying that
there are no alignment restrictions on the range that is bound (worst
case you have to render to a temporary buffer and copy stuff around);
but at least it has to be = 8192 bytes (or elements, not sure) in D3D10.

 For UAVs, I think there is ARB_shader_storage_buffer_object and
 pipe_context::set_shader_resources.
 Yeah, D3D11 UAVs are also supposed to be bound separately in the pipeline.

 Jose

 Marek

 On Wed, Feb 27, 2013 at 3:18 AM,  srol...@vmware.com wrote:
 From: Roland Scheidegger srol...@vmware.com

 Unfortunately not usable from OpenGL, and no cap bit.
 Pretty similar to a 1d texture, though allows specifying a start element.
 The util code for handling clears also needs adjustments (and fix
 a bug causing crashes for handling pure integer formats there too).
 ---
  src/gallium/auxiliary/util/u_surface.c  |   55
  +++
  src/gallium/drivers/llvmpipe/lp_rast.c  |   25 ++--
  src/gallium/drivers/llvmpipe/lp_rast_priv.h |4 +-
  src/gallium/drivers/llvmpipe/lp_scene.c |   35 +++--
  src/gallium/drivers/llvmpipe/lp_texture.c   |   44 +++--
  5 files changed, 108 insertions(+), 55 deletions(-)

 diff --git a/src/gallium/auxiliary/util/u_surface.c
 b/src/gallium/auxiliary/util/u_surface.c
 index b948b46..fba0798 100644
 --- a/src/gallium/auxiliary/util/u_surface.c
 +++ b/src/gallium/auxiliary/util/u_surface.c
 @@ -323,20 +323,59 @@ util_clear_render_target(struct pipe_context *pipe,
     if (!dst->texture)
        return;
     /* XXX: should handle multiple layers */
 -   dst_map = pipe_transfer_map(pipe,
 -                               dst->texture,
 -                               dst->u.tex.level,
 -                               dst->u.tex.first_layer,
 -                               PIPE_TRANSFER_WRITE,
 -                               dstx, dsty, width, height, &dst_trans);
 +
 +   if (dst->texture->target == PIPE_BUFFER) {
 +      /*
 +       * The fill naturally works on the surface format, however
 +       * the transfer uses resource format which is just bytes for buffers.
 +       */
 +      unsigned dx, w;
 +      unsigned pixstride = util_format_get_blocksize(dst->format);
 +      dx = dstx * pixstride;
 +      w = width * pixstride;
 +      dst_map = pipe_transfer_map(pipe,
 +                                  dst->texture,
 +                                  0, 0,
 +                                  PIPE_TRANSFER_WRITE,
 +                                  dx, 0, w, 1,
 +                                  &dst_trans);
 +      dst_map = (uint8_t *)dst_map + dst->u.buf.first_element * pixstride;
 +   }
 +   else {
 +      /* XXX: should handle multiple layers */
 +      dst_map = pipe_transfer_map(pipe,
 +                                  dst->texture,
 +                                  dst->u.tex.level,
 +                                  dst->u.tex.first_layer,
 +                                  PIPE_TRANSFER_WRITE,
 +                                  dstx, dsty, width, height, &dst_trans);
 +
 +   }

     assert(dst_map);

     if (dst_map) {
 +      enum pipe_format format = dst->format;
        assert(dst_trans->stride > 0);

 -      util_pack_color(color->f, dst->texture->format, &uc);
 -      util_fill_rect(dst_map, dst->texture->format,
 +      if (util_format_is_pure_integer(format)) {
 +         /*
 +          * We expect int/uint clear values here, though some APIs
 +          * might disagree (but in any case util_pack_color()
 +          * couldn't handle it)...
 +          */
 +         if (util_format_is_pure_sint(format)) {
 +            util_format_write_4i(format, color->i, 0, &uc, 0, 0, 0, 1, 1);
 +         }
 +         else {
 +            assert(util_format_is_pure_uint(format));
 +            util_format_write_4ui(format, color->ui, 0, &uc, 0, 0, 0, 1, 1);
 +         }
 +      }
 +      else {
 +         util_pack_color(color->f, dst->format, &uc);
 +      }
 +      util_fill_rect(dst_map, dst->format,
                       dst_trans->stride,
                       0, 0, width, height, &uc);

 diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c
 b/src/gallium/drivers/llvmpipe/lp_rast.c
 index b5e5da6..6183f41 100644
 --- a/src/gallium/drivers/llvmpipe/lp_rast.c
 +++ b/src/gallium/drivers/llvmpipe/lp_rast.c
 @@ -165,32 +165,13 @@ lp_rast_clear_color(struct lp_rasterizer_task *task,

    for (i = 0; i < scene->fb.nr_cbufs; i++) {
  enum pipe_format format = 

Re: [Mesa-dev] [PATCH] gallium: fix tgsi SAMPLE_L opcode to use separate source for explicit lod

2013-02-12 Thread Christoph Bumiller
On 11.02.2013 20:47, srol...@vmware.com wrote:
 From: Roland Scheidegger srol...@vmware.com
 
 It looks like using coord.w as explicit lod value is a mistake, most likely
 because some dx10 docs had it specified that way. Seems this was changed 
 though:
 http://msdn.microsoft.com/en-us/library/windows/desktop/hh447229%28v=vs.85%29.aspx
 - let's just hope it doesn't depend on runtime build version or something.
 Not only would this need translation (so go against the stated goal these
 opcodes should be close to dx10 semantics) but it would prevent usage of this
 opcode with cube arrays, which is apparently possible:
 http://msdn.microsoft.com/en-us/library/windows/desktop/bb509699%28v=vs.85%29.aspx
 (Note not only does this show cube arrays using explicit lod, but also the
 confusion with this opcode: it lists an explicit lod parameter value, but then
 states last component of location is used as lod).
 (For true hw drivers, only nv50 had code to handle it, and it appears the
 code was already right for the new semantics, though fix up the seemingly
 wrong c/d arguments while there.)
 ---
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c   |5 +
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c|2 +-
  src/gallium/auxiliary/tgsi/tgsi_exec.c |2 +-
  src/gallium/auxiliary/tgsi/tgsi_info.c |2 +-
  src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h   |2 +-
  src/gallium/docs/source/tgsi.rst   |   12 ++--
  .../drivers/nv50/codegen/nv50_ir_from_tgsi.cpp |2 +-
  .../state_trackers/d3d1x/gd3d1x/sm4_to_tgsi.cpp|9 ++---
  8 files changed, 14 insertions(+), 22 deletions(-)
 

 diff --git a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp 
 b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
 index 5078eb4..acec623 100644
 --- a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
 +++ b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp
 @@ -2065,7 +2065,7 @@ Converter::handleInstruction(const struct 
 tgsi_full_instruction *insn)
 case TGSI_OPCODE_SAMPLE_L:
 case TGSI_OPCODE_SAMPLE_C:
 case TGSI_OPCODE_SAMPLE_C_LZ:
 -  handleTEX(dst0, 1, 2, 0x30, 0x31, 0x40, 0x50);
 +  handleTEX(dst0, 1, 2, 0x30, 0x30, 0x30, 0x40);

Thanks, this looks good. It was probably completely wrong before.

break;
 case TGSI_OPCODE_TXF:
 case TGSI_OPCODE_LOAD:




Re: [Mesa-dev] [PATCH 2/4] gallium: add facilities for indirect drawing

2013-02-04 Thread Christoph Bumiller
On 04.02.2013 08:27, Michel Dänzer wrote:
 On Fri, 2013-02-01 at 22:50 +0100, Christoph Bumiller wrote:
 diff --git a/src/gallium/drivers/r300/r300_screen.c 
 b/src/gallium/drivers/r300/r300_screen.c
 index d0f0070..7ae9dd6 100644
 --- a/src/gallium/drivers/r300/r300_screen.c
 +++ b/src/gallium/drivers/r300/r300_screen.c
 @@ -155,6 +155,7 @@ static int r300_get_param(struct pipe_screen* pscreen, 
 enum pipe_cap param)
  case PIPE_CAP_TEXTURE_MULTISAMPLE:
  case PIPE_CAP_CUBE_MAP_ARRAY:
  case PIPE_CAP_TEXTURE_BUFFER_OBJECTS:
 +case PIPE_CAP_DRAW_INDIRECT:
  return 0;
  
  /* SWTCL-only features. */
 Thanks for adding the cap to r300g, but what about r600g and radeonsi?


For r300, nv30 and nv50 it was clear that it's not going to be supported.
For sp and lp I just used the helper function because there's probably
no better way to do it there.
For r600 and radeonsi I thought you'd set the cap together with a patch
that implements the feature, there's probably plenty of time until this
has gone through review :)
(Can't tell if it's easy to do or not since I can't find EG+ docs, but
on nvc0 it was rather simple.)
But if no one feels like doing that until indirect drawing can be
merged, I'll add the return 0's for you as well.


Re: [Mesa-dev] [PATCH 1/4] mesa: implement GL_ARB_draw_indirect

2013-02-02 Thread Christoph Bumiller
On 02.02.2013 08:32, Adrian M Negreanu wrote:
 On Fri, Feb 1, 2013 at 11:50 PM, Christoph Bumiller
 e0425...@student.tuwien.ac.at wrote:
 I have 1 piglit test to check drawing with several combinations of
 parameters (using transform feedback to write the commands), but
 will make some more tests for various things like interaction with
 PrimitiveRestart or error conditions.

 (http://people.freedesktop.org/~chrisbmr/0001-arb_draw_indirect-add-initial-test.patch)

 The gallium interface specifies a start_instance parameter that the
 GL extension doesn't have (it's reservedMustBeZero instead, but,
 seriously, why? D3D does have it. Because making yet another
 extension will be so much fun?)

 Not sure if we want to expose this with the compatibility profile.

 Hi,

 I have tested your changes on Android and Linux but it fails for Android.

Oops, thanks, it's a copy-paste error; next time I shall try to remember
to build all drivers ...


 Tested the patch(es) on top of the following commits:
 ==
 6c7e95c intel: implement create image from texture
 8e2454c intel: Account for mt-offset in intel_miptree_map
 11f5c82 intel: Create a miptree using offsets in 
 intel_set_texture_image_region
 45a28a9 i965: Account for offsets when updating SURFACE_STATE.
 163b35e intel: add pixel offset calculator for miptree levels
 7014df0 intel: Expose intel_miptree_create_internal as
 intel_miptree_create_layout.
 f9e4e5f intel: expose dimensions and offsets of a miptree level in DRIImage



 Failed to build for android
 
 6c7e95c intel: implement create image from texture
 8e2454c intel: Account for mt-offset in intel_miptree_map
 11f5c82 intel: Create a miptree using offsets in 
 intel_set_texture_image_region
 45a28a9 i965: Account for offsets when updating SURFACE_STATE.
 163b35e intel: add pixel offset calculator for miptree levels
 7014df0 intel: Expose intel_miptree_create_internal as
 intel_miptree_create_layout.
 f9e4e5f intel: expose dimensions and offsets of a miptree level in DRIImage
 src/mesa/drivers/dri/i965/intel_buffer_objects.c:375:46: warning:
 pointer of type 'void *' used in arithmetic [-Wpointer-arith]
 src/mesa/drivers/dri/i965/intel_fbo.c: In function 'intel_map_renderbuffer':
 src/mesa/drivers/dri/i965/intel_fbo.c:146:11: warning: pointer of type
 'void *' used in arithmetic [-Wpointer-arith]
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function
 'intel_miptree_map_gtt':
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1123:58: warning:
 pointer of type 'void *' used in arithmetic [-Wpointer-arith]
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1136:23: warning:
 pointer of type 'void *' used in arithmetic [-Wpointer-arith]
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1136:41: warning:
 pointer of type 'void *' used in arithmetic [-Wpointer-arith]
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function
 'intel_miptree_unmap_etc':
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1344:17: warning:
 pointer of type 'void *' used in arithmetic [-Wpointer-arith]
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1345:17: warning:
 pointer of type 'void *' used in arithmetic [-Wpointer-arith]
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function
 'intel_miptree_alloc_mcs':
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c:305:4: warning: 'format'
 may be used uninitialized in this function [-Wuninitialized]
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c:814:14: note: 'format'
 was declared here
 src/mesa/drivers/dri/i965/intel_tex_subimage.c: In function
 'intel_texsubimage_tiled_memcpy':
 src/mesa/drivers/dri/i965/intel_tex_subimage.c:301:29: warning:
 pointer of type 'void *' used in arithmetic [-Wpointer-arith]
 src/mesa/drivers/dri/i965/intel_tex_subimage.c:302:17: warning:
 pointer of type 'void *' used in arithmetic [-Wpointer-arith]
 src/mesa/drivers/dri/i965/intel_tex_validate.c: In function
 'intel_tex_map_image_for_swrast':
 src/mesa/drivers/dri/i965/intel_tex_validate.c:189:73: warning:
 pointer of type 'void *' used in arithmetic [-Wpointer-arith]
 src/mesa/drivers/dri/i965/intel_tex_validate.c:190:15: warning:
 pointer of type 'void *' used in arithmetic [-Wpointer-arith]
 In file included from src/mesa/drivers/dri/i965/brw_context.c:44:0:
 src/mesa/drivers/dri/i965/brw_draw.h:45:33: error: conflicting types
 for 'tfb_vertcount'
 src/mesa/drivers/dri/i965/brw_draw.h:44:45: note: previous definition
 of 'tfb_vertcount' was here
 make: *** 
 [out/target/product/samsungxe700t/obj/SHARED_LIBRARIES/i965_dri_intermediates/brw_context.o]
 Error 1
 FAILURE



 Successfully built configuration linux, no issues
 


 --
 Regards!
 http://groleo.wordpress.com


[Mesa-dev] [PATCH] gallium: add PIPE_BIND_COMMAND_BUFFER

2013-02-02 Thread Christoph Bumiller
I intend to merge this into the previous ARB_draw_indirect patches.
Posting it just in case there are any complaints ...

Needed to add this so the DRAW_INDIRECT_BUFFER doesn't get placed
into a non-GPU accessible domain. Besides, this seems reasonable,
and D3D11 has it, too (albeit a specialized version, called
D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS).
---
 src/gallium/docs/source/screen.rst   |2 ++
 src/gallium/include/pipe/p_defines.h |1 +
 src/mesa/state_tracker/st_cb_bufferobjects.c |3 +++
 3 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/src/gallium/docs/source/screen.rst 
b/src/gallium/docs/source/screen.rst
index c94d87d..6bf0b3a 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -295,6 +295,8 @@ resources might be created and handled quite differently.
   bound to the graphics pipeline as a shader resource.
 * ``PIPE_BIND_COMPUTE_RESOURCE``: A buffer or texture that can be
   bound to the compute program as a shader resource.
+* ``PIPE_BIND_COMMAND_BUFFER``: A buffer that may be sourced by the
+  GPU command processor, like with indirect drawing.
 
 .. _pipe_usage:
 
diff --git a/src/gallium/include/pipe/p_defines.h 
b/src/gallium/include/pipe/p_defines.h
index 1aea9f4..4fb91cf 100644
--- a/src/gallium/include/pipe/p_defines.h
+++ b/src/gallium/include/pipe/p_defines.h
@@ -315,6 +315,7 @@ enum pipe_flush_flags {
 #define PIPE_BIND_GLOBAL           (1 << 18) /* set_global_binding */
 #define PIPE_BIND_SHADER_RESOURCE  (1 << 19) /* set_shader_resources */
 #define PIPE_BIND_COMPUTE_RESOURCE (1 << 20) /* set_compute_resources */
+#define PIPE_BIND_COMMAND_BUFFER   (1 << 21) /* pipe_draw_info.indirect */
 
 /* The first two flags above were previously part of the amorphous
  * TEXTURE_USAGE, most of which are now descriptions of the ways a
diff --git a/src/mesa/state_tracker/st_cb_bufferobjects.c 
b/src/mesa/state_tracker/st_cb_bufferobjects.c
index d516735..265f758 100644
--- a/src/mesa/state_tracker/st_cb_bufferobjects.c
+++ b/src/mesa/state_tracker/st_cb_bufferobjects.c
@@ -205,6 +205,9 @@ st_bufferobj_data(struct gl_context *ctx,
case GL_UNIFORM_BUFFER:
   bind = PIPE_BIND_CONSTANT_BUFFER;
   break;
+   case GL_DRAW_INDIRECT_BUFFER:
+  bind = PIPE_BIND_COMMAND_BUFFER;
+  break;
default:
   bind = 0;
}
-- 
1.7.3.4



Re: [Mesa-dev] [PATCH 1/5] gallium: add SQRT shader opcode

2013-02-01 Thread Christoph Bumiller
On 01.02.2013 19:29, Brian Paul wrote:
 The glsl-to-tgsi translator will emit SQRT to implement GLSL's sqrt()
 and distance() functions if the PIPE_SHADER_CAP_TGSI_SQRT_SUPPORTED
 query says it's supported by the driver.

 Otherwise, sqrt(x) is implemented with x*rsq(x).  The problem with
 this is sqrt(0) must be handled specially because rsq(0) might be
 Inf/NaN/undefined (and then 0*rsq(0) is Inf/NaN/undefined).  In the
That's why we do rcp(rsq(x)), that works correctly.
I'm not sure we really need a cap for this though ... except to avoid
modifying drivers ;)

I'll advertise the cap anyway, I prefer to be able to handle it internally.
But I like this change, lowering SQRT (or not) is device specific and
shouldn't be done unconditionally just because the API can't represent it.
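
The difference between the two lowerings can be seen in a small C
sketch. It assumes IEEE-style arithmetic where rsq(0) is +Inf (real
hardware may differ), and the `rsq` helper below is only a software
stand-in for a hardware RSQ unit, written without libm so that it is
self-contained; none of these function names come from Mesa.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Software stand-in for a hardware reciprocal square root.
 * Assumption of this sketch: rsq(0) yields +Inf.
 */
static float
rsq(float x)
{
   assert(x >= 0.0f);
   if (x == 0.0f)
      return 1.0f / x; /* +Inf for x == +0 */

   /* Bit-level initial guess, refined by Newton-Raphson steps. */
   uint32_t bits;
   float y;
   memcpy(&bits, &x, sizeof(bits));
   bits = 0x5f3759dfu - (bits >> 1);
   memcpy(&y, &bits, sizeof(y));
   for (int i = 0; i < 4; i++)
      y = y * (1.5f - 0.5f * x * y * y);
   return y;
}

/* sqrt(x) lowered as x * rsq(x): 0 * Inf gives NaN at x == 0. */
static float
sqrt_via_mul(float x)
{
   return x * rsq(x);
}

/* sqrt(x) lowered as rcp(rsq(x)): 1 / Inf gives 0 at x == 0. */
static float
sqrt_via_rcp(float x)
{
   return 1.0f / rsq(x);
}
```

Under those assumptions the multiply form needs the extra CMP described
in the patch, while the reciprocal form handles zero without any special
case.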

 glsl-to-tgsi code we use an extra CMP to check if x is zero and then
 replace the result of x*rsq(x) with zero.

 In the end, this makes sqrt() generate much more reasonable code for
 drivers that can do square roots.

 Note that many of piglit's generated shader tests use the GLSL
 distance() function.
 ---
  src/gallium/docs/source/tgsi.rst   |9 +
  src/gallium/include/pipe/p_defines.h   |3 ++-
  src/gallium/include/pipe/p_shader_tokens.h |2 +-
  3 files changed, 12 insertions(+), 2 deletions(-)

 diff --git a/src/gallium/docs/source/tgsi.rst 
 b/src/gallium/docs/source/tgsi.rst
 index 548a9a3..5f03f32 100644
 --- a/src/gallium/docs/source/tgsi.rst
 +++ b/src/gallium/docs/source/tgsi.rst
 @@ -89,6 +89,15 @@ This instruction replicates its result.
dst = \frac{1}{\sqrt{|src.x|}}
  
  
 +.. opcode:: SQRT - Square Root
 +
 +This instruction replicates its result.
 +
 +.. math::
 +
 +  dst = {\sqrt{src.x}}
 +
 +
  .. opcode:: EXP - Approximate Exponential Base 2
  
  .. math::
 diff --git a/src/gallium/include/pipe/p_defines.h 
 b/src/gallium/include/pipe/p_defines.h
 index d0db5e4..fdf6e7f 100644
 --- a/src/gallium/include/pipe/p_defines.h
 +++ b/src/gallium/include/pipe/p_defines.h
 @@ -542,7 +542,8 @@ enum pipe_shader_cap
 PIPE_SHADER_CAP_SUBROUTINES = 16, /* BGNSUB, ENDSUB, CAL, RET */
 PIPE_SHADER_CAP_INTEGERS = 17,
 PIPE_SHADER_CAP_MAX_TEXTURE_SAMPLERS = 18,
 -   PIPE_SHADER_CAP_PREFERRED_IR = 19
 +   PIPE_SHADER_CAP_PREFERRED_IR = 19,
 +   PIPE_SHADER_CAP_TGSI_SQRT_SUPPORTED = 20
  };
  
  /**
 diff --git a/src/gallium/include/pipe/p_shader_tokens.h 
 b/src/gallium/include/pipe/p_shader_tokens.h
 index 3fb12fb..a9fb6aa 100644
 --- a/src/gallium/include/pipe/p_shader_tokens.h
 +++ b/src/gallium/include/pipe/p_shader_tokens.h
 @@ -275,7 +275,7 @@ struct tgsi_property_data {
  #define TGSI_OPCODE_SUB 17
  #define TGSI_OPCODE_LRP 18
  #define TGSI_OPCODE_CND 19
 -/* gap */
 +#define TGSI_OPCODE_SQRT20
  #define TGSI_OPCODE_DP2A21
  /* gap */
  #define TGSI_OPCODE_FRC 24

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/4] mesa: implement GL_ARB_draw_indirect

2013-02-01 Thread Christoph Bumiller
I have 1 piglit test to check drawing with several combinations of
parameters (using transform feedback to write the commands), but
will make some more tests for various things like interaction with
PrimitiveRestart or error conditions.

(http://people.freedesktop.org/~chrisbmr/0001-arb_draw_indirect-add-initial-test.patch)

The gallium interface specifies a start_instance parameter that the
GL extension doesn't have (it's reservedMustBeZero instead, but,
seriously, why? D3D does have it. Because making yet another
extension will be so much fun?)

Not sure if we want to expose this with the compatibility profile.
---
 src/mapi/glapi/gen/ARB_draw_indirect.xml |   45 +
 src/mapi/glapi/gen/Makefile.am   |1 +
 src/mapi/glapi/gen/gl_API.xml|4 +-
 src/mesa/drivers/dri/i965/brw_draw.c |3 +-
 src/mesa/drivers/dri/i965/brw_draw.h |3 +-
 src/mesa/drivers/dri/nouveau/nouveau_vbo_t.c |9 +-
 src/mesa/main/api_validate.c |  159 
 src/mesa/main/api_validate.h |   26 +++
 src/mesa/main/bufferobj.c|9 +
 src/mesa/main/dd.h   |   12 ++
 src/mesa/main/dlist.c|   41 +
 src/mesa/main/extensions.c   |2 +
 src/mesa/main/get.c  |5 +
 src/mesa/main/get_hash_params.py |2 +
 src/mesa/main/mtypes.h   |4 +
 src/mesa/main/tests/dispatch_sanity.cpp  |8 +-
 src/mesa/main/vtxfmt.c   |7 +
 src/mesa/state_tracker/st_cb_rasterpos.c |2 +-
 src/mesa/state_tracker/st_draw.c |3 +-
 src/mesa/state_tracker/st_draw.h |6 +-
 src/mesa/state_tracker/st_draw_feedback.c|3 +-
 src/mesa/tnl/tnl.h   |3 +-
 src/mesa/vbo/vbo.h   |5 +-
 src/mesa/vbo/vbo_exec_array.c|  251 +-
 src/mesa/vbo/vbo_exec_draw.c |2 +-
 src/mesa/vbo/vbo_primitive_restart.c |4 +-
 src/mesa/vbo/vbo_rebase.c|2 +-
 src/mesa/vbo/vbo_save_api.c  |   53 ++
 src/mesa/vbo/vbo_save_draw.c |2 +-
 src/mesa/vbo/vbo_split_copy.c|2 +-
 src/mesa/vbo/vbo_split_inplace.c |2 +-
 31 files changed, 652 insertions(+), 28 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_draw_indirect.xml

diff --git a/src/mapi/glapi/gen/ARB_draw_indirect.xml 
b/src/mapi/glapi/gen/ARB_draw_indirect.xml
new file mode 100644
index 000..7de03cd
--- /dev/null
+++ b/src/mapi/glapi/gen/ARB_draw_indirect.xml
@@ -0,0 +1,45 @@
+<?xml version="1.0"?>
+<!DOCTYPE OpenGLAPI SYSTEM "gl_API.dtd">
+
+<OpenGLAPI>
+
+<category name="GL_ARB_draw_indirect" number="87">
+
+    <enum name="DRAW_INDIRECT_BUFFER"           value="0x8F3F"/>
+    <enum name="DRAW_INDIRECT_BUFFER_BINDING"   value="0x8F43"/>
+
+    <function name="DrawArraysIndirect" offset="assign" exec="dynamic">
+        <param name="mode" type="GLenum"/>
+        <param name="indirect" type="const GLvoid *"/>
+    </function>
+
+    <function name="DrawElementsIndirect" offset="assign" exec="dynamic">
+        <param name="mode" type="GLenum"/>
+        <param name="type" type="GLenum"/>
+        <param name="indirect" type="const GLvoid *"/>
+    </function>
+
+</category>
+
+
+<category name="GL_ARB_multi_draw_indirect" number="133">
+
+    <function name="MultiDrawArraysIndirect" offset="assign" exec="dynamic">
+        <param name="mode" type="GLenum"/>
+        <param name="indirect" type="const GLvoid *"/>
+        <param name="primcount" type="GLsizei"/>
+        <param name="stride" type="GLsizei"/>
+    </function>
+
+    <function name="MultiDrawElementsIndirect" offset="assign" exec="dynamic">
+        <param name="mode" type="GLenum"/>
+        <param name="type" type="GLenum"/>
+        <param name="indirect" type="const GLvoid *"/>
+        <param name="primcount" type="GLsizei"/>
+        <param name="stride" type="GLsizei"/>
+    </function>
+
+</category>
+
+
+</OpenGLAPI>
diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index 4d51bbc..37fdea1 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -96,6 +96,7 @@ API_XML = \
ARB_depth_clamp.xml \
ARB_draw_buffers_blend.xml \
ARB_draw_elements_base_vertex.xml \
+   ARB_draw_indirect.xml \
ARB_draw_instanced.xml \
ARB_ES2_compatibility.xml \
ARB_ES3_compatibility.xml \
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index 4cbd724..bb6034f 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8239,6 +8239,8 @@
 
 <!-- ARB extensions #86...#93 -->
 
+<xi:include href="ARB_draw_indirect.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
+
 <category name="GL_ARB_transform_feedback3" number="94">
   <enum name="MAX_TRANSFORM_FEEDBACK_BUFFERS" value="0x8E70"/>
   <enum name="MAX_VERTEX_STREAMS" value="0x8E71"/>
@@ -8316,7 

[Mesa-dev] [PATCH 3/4] st/mesa: add support for indirect drawing

2013-02-01 Thread Christoph Bumiller
---
 src/mesa/state_tracker/st_draw.c   |7 ++-
 src/mesa/state_tracker/st_extensions.c |4 +++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_draw.c b/src/mesa/state_tracker/st_draw.c
index 0f3aae7..f9fbd32 100644
--- a/src/mesa/state_tracker/st_draw.c
+++ b/src/mesa/state_tracker/st_draw.c
@@ -256,6 +256,10 @@ st_draw_vbo(struct gl_context *ctx,
   }
}
 
+   if (indirect) {
+      info.indirect = st_buffer_object(indirect)->buffer;
+   }
+
/* do actual drawing */
    for (i = 0; i < nr_prims; i++) {
   info.mode = translate_prim( ctx, prims[i].mode );
@@ -268,6 +272,7 @@ st_draw_vbo(struct gl_context *ctx,
  info.min_index = info.start;
  info.max_index = info.start + info.count - 1;
   }
+  info.indirect_offset = prims[i].indirect_offset;
 
       if (ST_DEBUG & DEBUG_DRAW) {
          debug_printf("st/draw: mode %s  start %u  count %u  indexed %d\n",
@@ -277,7 +282,7 @@ st_draw_vbo(struct gl_context *ctx,
   info.indexed);
   }
 
-  if (info.count_from_stream_output) {
+  if (info.count_from_stream_output || info.indirect) {
          cso_draw_vbo(st->cso_context, &info);
   }
   else if (info.primitive_restart) {
diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index 214588f..548bab2 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -398,7 +398,9 @@ void st_init_extensions(struct st_context *st)
       { o(MESA_texture_array),           PIPE_CAP_MAX_TEXTURE_ARRAY_LAYERS },
 
       { o(OES_standard_derivatives),     PIPE_CAP_SM3 },
-      { o(ARB_texture_cube_map_array),   PIPE_CAP_CUBE_MAP_ARRAY }
+      { o(ARB_texture_cube_map_array),   PIPE_CAP_CUBE_MAP_ARRAY },
+      { o(ARB_draw_indirect),            PIPE_CAP_DRAW_INDIRECT },
+      { o(ARB_multi_draw_indirect),      PIPE_CAP_DRAW_INDIRECT }
};
 
/* Required: render target and sampler support */
-- 
1.7.3.4


