date:20140814

Re: [Mesa-dev] [PATCH 1/5] mesa: add ARB_derivative_control extension bit

2014-08-14 Thread Matt Turner

On Wed, Aug 13, 2014 at 9:52 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
 ---
  src/mesa/main/extensions.c | 1 +
  src/mesa/main/mtypes.h | 1 +
  2 files changed, 2 insertions(+)

 diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
 index 8658ca8..3dcb199 100644
 --- a/src/mesa/main/extensions.c
 +++ b/src/mesa/main/extensions.c
 @@ -101,6 +101,7 @@ static const struct extension extension_table[] = {
 { GL_ARB_depth_buffer_float,  
 o(ARB_depth_buffer_float),  GL, 2008 },
 { GL_ARB_depth_clamp, o(ARB_depth_clamp),   
   GL, 2003 },
 { GL_ARB_depth_texture,   o(ARB_depth_texture), 
   GLL,2001 },
 +   { GL_ARB_derivative_control,  
 o(ARB_derivative_control),  GLC,2014 },

No reason to be core-only that I can see.

With s/GLC/GL/ this is

Reviewed-by: Matt Turner matts...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 2/5] glsl: add ARB_derivative control support

2014-08-14 Thread Matt Turner

Reviewed-by: Matt Turner matts...@gmail.com

Re: [Mesa-dev] [PATCH 0/5] Add ARB_derivative_control support

2014-08-14 Thread Matt Turner

On Wed, Aug 13, 2014 at 9:52 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
 I left all the variants as separate operations in the glsl ir. However for
 gallium I only added the fine version, as it seems like DDX can do pretty much
 whatever it wants. I was on the fence about adding coarse versions as well and
 then using the FragmentShaderDerivative hint to select one or the other in the
 glsl -> tgsi conversion.

 In the case of nv50/nvc0, doing the fine version is pretty much the only
 (easy) way of doing derivatives. I haven't traced the blob to see how it
 handles things yet. In any case, on nv50/nvc0 all this is completely moot, at
 least for now. Curious about what the situation with other hardware is.

i965 already implements coarse and fine derivatives, selectable by the
derivatives hint, coarse default.

The calculation of the derivative itself isn't faster for coarse
derivatives, but it was discovered that if all of the samples of a
sample_d are from the same LOD, it's a bunch faster on Haswell at
least. See commit 848c0e72. And with coarse derivatives they are.

Maybe other hardware has similar optimizations?
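
To make the coarse/fine distinction above concrete, here is a minimal sketch, assuming a 2x2 pixel quad stored as [top-left, top-right, bottom-left, bottom-right]. The names Quad, ddx_fine and ddx_coarse are hypothetical and are not Mesa or i965 code. The coarse variant shown simply reuses the top-row difference for the whole quad, which is one behaviour the extension allows, not necessarily what any given hardware does.

```cpp
#include <array>

// Values of some quantity at the four pixels of a 2x2 quad:
// index 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
using Quad = std::array<float, 4>;

// Fine: each row of the quad uses its own horizontal difference.
Quad ddx_fine(const Quad &v)
{
   const float top = v[1] - v[0];
   const float bottom = v[3] - v[2];
   return Quad{{top, top, bottom, bottom}};
}

// Coarse (one permitted choice): a single horizontal difference is shared
// by all four pixels of the quad.
Quad ddx_coarse(const Quad &v)
{
   const float d = v[1] - v[0];
   return Quad{{d, d, d, d}};
}
```

With the coarse variant every pixel in the quad sees the same derivative, which is consistent with the observation above that all samples of a sample_d then come from the same LOD.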

 Also, the extension spec claims to require GLSL 4.00, which seems a little
 extreme. Instead I restrict it to core contexts. Let me know if I should
 change this.

Making it core-only doesn't help, nor does it satisfy the GLSL >= 4.0
requirement in the spec. I'm not sure if we have a way to arbitrarily
limit an extension to being exposed under certain GLSL versions... ?

Re: [Mesa-dev] [PATCH 1/5] mesa: add ARB_derivative_control extension bit

2014-08-14 Thread Matt Turner

On Wed, Aug 13, 2014 at 11:44 PM, Matt Turner matts...@gmail.com wrote:
 On Wed, Aug 13, 2014 at 9:52 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
 ---
  src/mesa/main/extensions.c | 1 +
  src/mesa/main/mtypes.h | 1 +
  2 files changed, 2 insertions(+)

 diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
 index 8658ca8..3dcb199 100644
 --- a/src/mesa/main/extensions.c
 +++ b/src/mesa/main/extensions.c
 @@ -101,6 +101,7 @@ static const struct extension extension_table[] = {
 { GL_ARB_depth_buffer_float,  
 o(ARB_depth_buffer_float),  GL, 2008 },
 { GL_ARB_depth_clamp, o(ARB_depth_clamp),  
GL, 2003 },
 { GL_ARB_depth_texture,   o(ARB_depth_texture),
GLL,2001 },
 +   { GL_ARB_derivative_control,  
 o(ARB_derivative_control),  GLC,2014 },

 No reason to be core-only that I can see.

I guess we can just leave it up to the drivers to turn on the
extension if GLSL >= 4.00? Seems ugly. Also seems like a pretty
arbitrary requirement.

Re: [Mesa-dev] [PATCH 1/9] glsl: Optimize min/max expression trees

2014-08-14 Thread Abdiel Janulgue



On 14.08.2014 04:33, Ian Romanick wrote:
 On 07/29/2014 02:36 AM, Petri Latvala wrote:
 Add an optimization pass that drops min/max expression operands that
 can be proven to not contribute to the final result. The algorithm is
 similar to alpha-beta pruning on a minmax search, from the field of
 AI.

 This optimization pass can optimize min/max expressions where operands
 are min/max expressions. Such code can appear in shaders by itself, or
 as the result of clamp() or AMD_shader_trinary_minmax functions.

 This optimization pass improves the generated code for piglit's
 AMD_shader_trinary_minmax tests as follows:

 total instructions in shared programs: 75 -> 67 (-10.67%)
 instructions in affected programs: 60 -> 52 (-13.33%)
 GAINED:0
 LOST:  0

 All tests (max3, min3, mid3) improved.
 
 And I assume no piglit regressions?
 
 Also... have you tried this in combination with Abdiel's related work on
 saturates?
 


Petteri,

What is your plan on this particular pass? I have a similar patch that
drops the min/max expression but using a different approach. Do you want
to push for this particular optimization or do you want to take over the
series?


Re: [Mesa-dev] [PATCH 1/9] glsl: Optimize min/max expression trees

2014-08-14 Thread Connor Abbott

On Tue, Jul 29, 2014 at 2:36 AM, Petri Latvala petri.latv...@intel.com wrote:
 Add an optimization pass that drops min/max expression operands that
 can be proven to not contribute to the final result. The algorithm is
 similar to alpha-beta pruning on a minmax search, from the field of
 AI.

 This optimization pass can optimize min/max expressions where operands
 are min/max expressions. Such code can appear in shaders by itself, or
 as the result of clamp() or AMD_shader_trinary_minmax functions.

 This optimization pass improves the generated code for piglit's
 AMD_shader_trinary_minmax tests as follows:

 total instructions in shared programs: 75 -> 67 (-10.67%)
 instructions in affected programs: 60 -> 52 (-13.33%)
 GAINED:0
 LOST:  0

 All tests (max3, min3, mid3) improved.

 A full shader-db run:

 total instructions in shared programs: 4293603 -> 4293575 (-0.00%)
 instructions in affected programs: 1188 -> 1160 (-2.36%)
 GAINED:0
 LOST:  0

 Improvements happen in Guacamelee and Serious Sam 3. One shader from
 Dungeon Defenders is hurt by shader-db metrics (26 -> 28), because of
 dropping of a (constant float (0.0)) operand, which was
 compiled to a saturate modifier.

 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76861
 Signed-off-by: Petri Latvala petri.latv...@intel.com
 ---
  src/glsl/Makefile.sources   |   1 +
  src/glsl/glsl_parser_extras.cpp |   1 +
  src/glsl/ir_optimization.h  |   1 +
  src/glsl/opt_minmax.cpp | 395 
 
  4 files changed, 398 insertions(+)
  create mode 100644 src/glsl/opt_minmax.cpp

 diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
 index b54eae7..1ee80a3 100644
 --- a/src/glsl/Makefile.sources
 +++ b/src/glsl/Makefile.sources
 @@ -95,6 +95,7 @@ LIBGLSL_FILES = \
 $(GLSL_SRCDIR)/opt_flip_matrices.cpp \
 $(GLSL_SRCDIR)/opt_function_inlining.cpp \
 $(GLSL_SRCDIR)/opt_if_simplification.cpp \
 +   $(GLSL_SRCDIR)/opt_minmax.cpp \
 $(GLSL_SRCDIR)/opt_noop_swizzle.cpp \
 $(GLSL_SRCDIR)/opt_rebalance_tree.cpp \
 $(GLSL_SRCDIR)/opt_redundant_jumps.cpp \
 diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
 index 890123a..9f57ef3 100644
 --- a/src/glsl/glsl_parser_extras.cpp
 +++ b/src/glsl/glsl_parser_extras.cpp
 @@ -1561,6 +1561,7 @@ do_common_optimization(exec_list *ir, bool linked,
 else
progress = do_constant_variable_unlinked(ir) || progress;
 progress = do_constant_folding(ir) || progress;
 +   progress = do_minmax_prune(ir) || progress;
 progress = do_cse(ir) || progress;
 progress = do_rebalance_tree(ir) || progress;
 progress = do_algebraic(ir, native_integers, options) || progress;
 diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
 index b83c225..9d22585 100644
 --- a/src/glsl/ir_optimization.h
 +++ b/src/glsl/ir_optimization.h
 @@ -98,6 +98,7 @@ bool opt_flatten_nested_if_blocks(exec_list *instructions);
  bool do_discard_simplification(exec_list *instructions);
  bool lower_if_to_cond_assign(exec_list *instructions, unsigned max_depth = 
 0);
  bool do_mat_op_to_vec(exec_list *instructions);
 +bool do_minmax_prune(exec_list *instructions);
  bool do_noop_swizzle(exec_list *instructions);
  bool do_structure_splitting(exec_list *instructions);
  bool do_swizzle_swizzle(exec_list *instructions);
 diff --git a/src/glsl/opt_minmax.cpp b/src/glsl/opt_minmax.cpp
 new file mode 100644
 index 000..5656059
 --- /dev/null
 +++ b/src/glsl/opt_minmax.cpp
 @@ -0,0 +1,395 @@
 +/*
 + * Copyright © 2014 Intel Corporation
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the "Software"),
 + * to deal in the Software without restriction, including without limitation
 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the next
 + * paragraph) shall be included in all copies or substantial portions of the
 + * Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 + * DEALINGS IN THE SOFTWARE.
 + */
 +
 +/**
 + * \file opt_minmax.cpp
 + *
 + * Drop operands from an expression tree of only

Re: [Mesa-dev] [PATCH] egl_dri2: fix EXT_image_dma_buf_import fds

2014-08-14 Thread Pekka Paalanen

On Wed, 13 Aug 2014 19:46:40 +0300
Pohjolainen, Topi topi.pohjolai...@intel.com wrote:

 On Fri, Aug 08, 2014 at 05:28:59PM +0300, Pekka Paalanen wrote:
  From: Pekka Paalanen pekka.paala...@collabora.co.uk
  
  The EGL_EXT_image_dma_buf_import specification was revised (according to
  its revision history) on Dec 5th, 2013, for EGL to not take ownership of
  the file descriptors.
  
  Do not close the file descriptors passed in to eglCreateImageKHR with
  EGL_LINUX_DMA_BUF_EXT target.
  
  It is assumed that the drivers, which ultimately process the file
  descriptors, do not close or modify them in any way either. This avoids
  the need to dup(), as it seems we would only need to just close the
  dup'd file descriptors right after.
  
  Signed-off-by: Pekka Paalanen pekka.paala...@collabora.co.uk
 
 I wrote the current logic based on the older version, and at least to me this
 is the right thing to do. Thanks for fixing it as well as taking care of the
 piglit test.
 
 Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com
 
 I would be happier though if someone else gave his/her approval as well.

Thank you, I have added your R-b, and will wait some more. I think I
want the piglit patch landed first before I try to push this, anyway.

Thanks for the piglit review too, I sent a new version with your R-b
and the comment fix.


- pq

 
  ---
  
  Hi,
  
  the corresponding Piglit fix has already been sent to the piglit mailing
  list. Both this and that need to be applied to not regress Mesa's piglit run
  by one test (ext_image_dma_buf_import-ownership_transfer).
  
  This patch fixes my test case on heavily modified Weston.
  
  Thanks,
  pq
  ---
   src/egl/drivers/dri2/egl_dri2.c | 37 ++---
   1 file changed, 6 insertions(+), 31 deletions(-)
  
  diff --git a/src/egl/drivers/dri2/egl_dri2.c 
  b/src/egl/drivers/dri2/egl_dri2.c
  index 5602ec3..cd85fd3 100644
  --- a/src/egl/drivers/dri2/egl_dri2.c
  +++ b/src/egl/drivers/dri2/egl_dri2.c
  @@ -1678,36 +1678,13 @@ dri2_check_dma_buf_format(const _EGLImageAttribs 
  *attrs)
   /**
* The spec says:
*
  - * If eglCreateImageKHR is successful for a EGL_LINUX_DMA_BUF_EXT target,
  - *  the EGL takes ownership of the file descriptor and is responsible for
  - *  closing it, which it may do at any time while the EGLDisplay is
  - *  initialized.
  + * If eglCreateImageKHR is successful for a EGL_LINUX_DMA_BUF_EXT target, 
  the
  + *  EGL will take a reference to the dma_buf(s) which it will release at 
  any
  + *  time while the EGLDisplay is initialized. It is the responsibility of 
  the
  + *  application to close the dma_buf file descriptors.
  + *
  + * Therefore we must never close or otherwise modify the file descriptors.
*/
  -static void
  -dri2_take_dma_buf_ownership(const int *fds, unsigned num_fds)
  -{
  -   int already_closed[num_fds];
  -   unsigned num_closed = 0;
  -   unsigned i, j;
  -
  -   for (i = 0; i < num_fds; ++i) {
  -  /**
  -   * The same file descriptor can be referenced multiple times in case 
  more
  -   * than one plane is found in the same buffer, just with a different
  -   * offset.
  -   */
  -  for (j = 0; j < num_closed; ++j) {
  - if (already_closed[j] == fds[i])
  -break;
  -  }
  -
  -  if (j == num_closed) {
  - close(fds[i]);
  - already_closed[num_closed++] = fds[i];
  -  }
  -   }
  -}
  -
   static _EGLImage *
   dri2_create_image_dma_buf(_EGLDisplay *disp, _EGLContext *ctx,
EGLClientBuffer buffer, const EGLint *attr_list)
  @@ -1770,8 +1747,6 @@ dri2_create_image_dma_buf(_EGLDisplay *disp, 
  _EGLContext *ctx,
 return EGL_NO_IMAGE_KHR;
   
  res = dri2_create_image_from_dri(disp, dri_image);
  -   if (res)
  -  dri2_take_dma_buf_ownership(fds, num_fds);
   
  return res;
   }
  -- 
  1.8.5.5

Re: [Mesa-dev] [PATCH] squash! glsl: Optimize min/max expression trees

2014-08-14 Thread Connor Abbott

On Wed, Aug 13, 2014 at 9:04 PM, Matt Turner matts...@gmail.com wrote:
 ---
 I'd squash this in at minimum. The changes are

  - Whitespace
  - Removal of unnecessary destructor
  - Renaming one and two to a and b (one->value.u[c0] < two->value.u[c0]...)
  - continue -> break
  - assert(!...) -> unreachable
  - Not doing assignments in if conditionals
  - Marking swizzle_if_required as static

 I also think less_all_components should just return an enum like
 { MIXED, EQUAL, LESS, GREATER }, rather than setting a variable in
 the class. It, as well as smaller/larger_constant, can then be
 static functions outside of the visitor.

I agree. Also, I realized that in the only place where we care about
the valid variable,

Another thing I'd like to see is to change minmax_range to call things
low and high instead of range[0] and range[1]. This helps
readability, and the tricks with indirect addressing that having an
array lets you do are things we really shouldn't be doing anyways
because it's hard to follow.
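
The enum-returning comparison suggested above could be sketched as follows. This is only an illustration under stated assumptions: the names compare_result and compare_components are hypothetical, plain float vectors of equal length stand in for ir_constant, and the exact meaning of each enum value (strictly less or greater in every component, otherwise MIXED) is an assumption that the thread does not spell out.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum compare_result { MIXED, EQUAL, LESS, GREATER };

// Component-wise comparison of two constant vectors.
// Assumed semantics: LESS or GREATER only when every component compares
// strictly that way, EQUAL when all components match, MIXED otherwise.
compare_result
compare_components(const std::vector<float> &a, const std::vector<float> &b)
{
   std::size_t less = 0, greater = 0, equal = 0;
   const std::size_t n = std::min(a.size(), b.size());

   for (std::size_t i = 0; i < n; i++) {
      if (a[i] < b[i])
         less++;
      else if (a[i] > b[i])
         greater++;
      else if (a[i] == b[i])
         equal++;
   }

   if (equal == n)
      return EQUAL;
   if (less == n)
      return LESS;
   if (greater == n)
      return GREATER;
   return MIXED;
}
```

Because the result is returned rather than stored in the visitor, a function like this needs no class state and can live outside the visitor as a static helper.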

As I mentioned before, swizzle_if_required() should probably use the
ir_builder swizzle helpers.

I'm still not convinced that the algorithm is the best way to go about
it. Right now, AFAICT, we do something like:

- Pass in a base range, which is what the min's and max's above us
in the tree will clamp the value we return to
- Get the ranges for each subexpression (this is a recursive call)
- Check and see if each operand is unnecessary (i.e. its range is
strictly greater than the base range or strictly greater than the
other argument for mins, the other way around for max's)

As another thing, the logic for this part could be made a *lot*
clearer by rearranging the code and commenting. I'd do something like:

bool is_redundant = false; /* whether this operand will never affect
                              the final value of the min-max tree */

if (is_min) {
   /* if this operand will always be greater than the other one, it's
      redundant */
   if (limit[i].low > limit[1 - i].high)
      is_redundant = true;

   /* if this operand is always greater than baserange, then even if
      it's smaller than the other one it'll get clamped so it's redundant */
   if (limit[i].low > baserange.high)
      is_redundant = true;
} else {
   ... the exact same logic mirrored ...
}

- Recurse into the subexpressions, computing the new baserange.

What I think we should do instead is change prune_expression() to also
return the range for the expression (it's now returning two things, so
one would have to be passed via a class variable), so it would look
like:

- Pass in the base range
- If this is a constant, return ourself and the range with low == high
- Recurse into both subexpressions, setting both the range (limits[i])
and the new subexpression
- If one of the subexpressions is redundant, return the other
subexpression and its range
- Otherwise, return ourself and the combination of the ranges

This will allow us to do the recursion only once, instead of once in
get_range() and once in prune_expression(), which will make things
simpler and faster.
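
A rough sketch of the single-recursion variant described above, under several assumptions: scalar floats stand in for ir_constant, and Range, Expr, Pruned and narrow_base are hypothetical names, not the patch's code. One simplification is mine rather than the thread's: an operand's base range is narrowed by its sibling only when that sibling is a constant, so that no second traversal is needed to learn the sibling's range.

```cpp
#include <algorithm>
#include <limits>

struct Range { float low, high; };

static Range unbounded()
{
   const float inf = std::numeric_limits<float>::infinity();
   return {-inf, inf};
}

enum Kind { CONSTANT, VARIABLE, MIN, MAX };

struct Expr {
   Kind kind;
   float value;   // only meaningful for CONSTANT
   Expr *ops[2];  // only meaningful for MIN and MAX
};

struct Pruned {
   Expr *expr;    // the (possibly replaced) expression
   Range range;   // the values it can take
};

// Narrow the base range seen by one operand using its sibling, but only
// when the sibling is a constant (simplifying assumption).
static Range narrow_base(Range base, const Expr *sibling, bool is_min)
{
   if (sibling->kind == CONSTANT) {
      if (is_min)
         base.high = std::min(base.high, sibling->value);
      else
         base.low = std::max(base.low, sibling->value);
   }
   return base;
}

// Returns the pruned expression together with its range in one traversal.
Pruned prune_expression(Expr *expr, Range base)
{
   if (expr->kind == CONSTANT)
      return {expr, Range{expr->value, expr->value}};
   if (expr->kind == VARIABLE)
      return {expr, unbounded()};

   const bool is_min = expr->kind == MIN;
   Pruned sub[2];
   for (int i = 0; i < 2; i++) {
      sub[i] = prune_expression(expr->ops[i],
                                narrow_base(base, expr->ops[1 - i], is_min));
      expr->ops[i] = sub[i].expr;
   }

   for (int i = 0; i < 2; i++) {
      const Range &me = sub[i].range;
      const Range &other = sub[1 - i].range;
      const bool redundant = is_min
         ? (me.low > other.high || me.low > base.high)
         : (me.high < other.low || me.high < base.low);
      if (redundant)
         return sub[1 - i];  // drop this operand, keep the other
   }

   Range r;
   if (is_min)
      r = {std::min(sub[0].range.low, sub[1].range.low),
           std::min(sub[0].range.high, sub[1].range.high)};
   else
      r = {std::max(sub[0].range.low, sub[1].range.low),
           std::max(sub[0].range.high, sub[1].range.high)};
   return {expr, r};
}
```

For example, with this sketch min(max(x, 2), 1) collapses to the constant 1, and min(min(x, 3), 2) becomes min(x, 2), while a clamp-like max(min(x, 1), 0) is left alone.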


 I think the algorithm itself looks correct.

  src/glsl/opt_minmax.cpp | 145 
 +---
  1 file changed, 63 insertions(+), 82 deletions(-)

 diff --git a/src/glsl/opt_minmax.cpp b/src/glsl/opt_minmax.cpp
 index 5656059..b987386 100644
 --- a/src/glsl/opt_minmax.cpp
 +++ b/src/glsl/opt_minmax.cpp
 @@ -37,12 +37,10 @@
  #include "glsl_types.h"
  #include "main/macros.h"

 -namespace
 -{
 -class minmax_range
 -{
 -public:
 +namespace {

 +class minmax_range {
 +public:
 minmax_range(ir_constant *low = NULL, ir_constant *high = NULL)
 {
range[0] = low;
 @@ -60,60 +58,45 @@ public:
  class ir_minmax_visitor : public ir_rvalue_enter_visitor {
  public:
 ir_minmax_visitor()
 -  : progress(false)
 -  , valid(true)
 -   {
 -   }
 -
 -   virtual ~ir_minmax_visitor()
 +  : progress(false), valid(true)
 {
 }

 -   bool
 -   less_all_components(ir_constant *one, ir_constant *two);
 -
 -   ir_constant *
 -   smaller_constant(ir_constant *one, ir_constant *two);
 -
 -   ir_constant *
 -   larger_constant(ir_constant *one, ir_constant *two);
 +   bool less_all_components(ir_constant *a, ir_constant *b);
 +   ir_constant *smaller_constant(ir_constant *a, ir_constant *b);
 +   ir_constant *larger_constant(ir_constant *a, ir_constant *b);

 -   minmax_range
 -   combine_range(minmax_range r0, minmax_range r1, bool ismin);
 +   minmax_range combine_range(minmax_range r0, minmax_range r1, bool ismin);

 -   minmax_range
 -   range_intersection(minmax_range r0, minmax_range r1);
 +   minmax_range range_intersection(minmax_range r0, minmax_range r1);

 -   minmax_range
 -   get_range(ir_rvalue *rval);
 +   minmax_range get_range(ir_rvalue *rval);

 -   ir_rvalue *
 -   prune_expression(ir_expression *expr, minmax_range baserange);
 +   ir_rvalue *prune_expression(ir_expression *expr, minmax_range baserange);

 -   void
 -   handle_rvalue(ir_rvalue

[Mesa-dev] [PATCH] glsl: Fixed vectorize pass vs. texture lookups

2014-08-14 Thread Aras Pranckevicius

Attached patch fixes GLSL vectorization optimization going wrong on some
texture lookups, see https://bugs.freedesktop.org/show_bug.cgi?id=82574


-- 
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info
From 9c592e2d0216e1b17f303be3ae1505b209abd5b3 Mon Sep 17 00:00:00 2001
From: Aras Pranckevicius a...@unity3d.com
Date: Wed, 13 Aug 2014 20:40:05 +0300
Subject: [PATCH] glsl: Fixed vectorize pass vs. texture lookups
 https://bugs.freedesktop.org/show_bug.cgi?id=82574

---
 src/glsl/opt_vectorize.cpp   | 13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/src/glsl/opt_vectorize.cpp b/src/glsl/opt_vectorize.cpp
index 826de5f..aa24043 100644
--- a/src/glsl/opt_vectorize.cpp
+++ b/src/glsl/opt_vectorize.cpp
@@ -86,6 +86,7 @@ public:
virtual ir_visitor_status visit_enter(ir_expression *);
virtual ir_visitor_status visit_enter(ir_if *);
virtual ir_visitor_status visit_enter(ir_loop *);
+   virtual ir_visitor_status visit_enter(ir_texture *);
 
virtual ir_visitor_status visit_leave(ir_assignment *);
 
@@ -354,6 +355,18 @@ ir_vectorize_visitor::visit_enter(ir_loop *ir)
 }
 
 /**
+ * Upon entering an ir_texture, remove the current assignment from
+ * further consideration. Vectorizing multiple texture lookups into one
+ * is wrong.
+ */
+ir_visitor_status
+ir_vectorize_visitor::visit_enter(ir_texture *)
+{
+   this->current_assignment = NULL;
+   return visit_continue_with_parent;
+}
+
+/**
  * Upon leaving an ir_assignment, save a pointer to it in ::assignment[] if
  * the swizzle mask(s) found were appropriate. Also save a pointer in
  * ::last_assignment so that we can compare future assignments with it.
-- 
1.8.4.2


[Mesa-dev] [PATCH 0/5] Enable ARB_derivative_control for i965/Gen7+

2014-08-14 Thread Chris Forbes

Since i965 already had derivative control via hints & driconf, this was
too trivial to pass up.


[Mesa-dev] [PATCH 2/5] i965/vec4: Assert that fine/coarse derivative ops don't appear

2014-08-14 Thread Chris Forbes

Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 1b46850..5a13094 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1348,7 +1348,11 @@ vec4_visitor::visit(ir_expression *ir)
   break;
 
case ir_unop_dFdx:
+   case ir_unop_dFdx_coarse:
+   case ir_unop_dFdx_fine:
case ir_unop_dFdy:
+   case ir_unop_dFdy_coarse:
+   case ir_unop_dFdy_fine:
   unreachable("derivatives not valid in vertex shader");
 
case ir_unop_bitfield_reverse:
-- 
2.0.4


[Mesa-dev] [PATCH 3/5] i965/fs: Support fine/coarse derivative opcodes

2014-08-14 Thread Chris Forbes

The quality level (fine/coarse/dont-care) is plumbed through to the
generator as a constant in src1.

Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/brw_defines.h|  6 ++
 src/mesa/drivers/dri/i965/brw_fs.h |  4 ++--
 .../dri/i965/brw_fs_channel_expressions.cpp|  4 
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 24 --
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 16 +--
 5 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 3564041..1322ed2 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1004,6 +1004,12 @@ enum opcode {
GS_OPCODE_GET_INSTANCE_ID,
 };
 
+enum brw_derivative_quality {
+   BRW_DERIVATIVE_BY_HINT = 0,
+   BRW_DERIVATIVE_FINE = 1,
+   BRW_DERIVATIVE_COARSE = 2,
+};
+
 enum brw_urb_write_flags {
BRW_URB_WRITE_NO_FLAGS = 0,
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 5cad504..a838e74 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -604,9 +604,9 @@ private:
void generate_math_g45(fs_inst *inst,
  struct brw_reg dst,
  struct brw_reg src);
-   void generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src);
+   void generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src, 
struct brw_reg quality);
void generate_ddy(fs_inst *inst, struct brw_reg dst, struct brw_reg src,
- bool negate_value);
+ struct brw_reg quality, bool negate_value);
void generate_scratch_write(fs_inst *inst, struct brw_reg src);
void generate_scratch_read(fs_inst *inst, struct brw_reg dst);
void generate_scratch_read_gen7(fs_inst *inst, struct brw_reg dst);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp
index 4113f47..d98b7eb 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_channel_expressions.cpp
@@ -237,7 +237,11 @@ ir_channel_expressions_visitor::visit_leave(ir_assignment 
*ir)
case ir_unop_sin_reduced:
case ir_unop_cos_reduced:
case ir_unop_dFdx:
+   case ir_unop_dFdx_coarse:
+   case ir_unop_dFdx_fine:
case ir_unop_dFdy:
+   case ir_unop_dFdy_coarse:
+   case ir_unop_dFdy_fine:
case ir_unop_bitfield_reverse:
case ir_unop_bit_count:
case ir_unop_find_msb:
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 1190f1f..6efd41c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -644,11 +644,17 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg 
dst, struct brw_reg src
  * appropriate swizzling.
  */
 void
-fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg 
src)
+fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg 
src,
+   struct brw_reg quality)
 {
unsigned vstride, width;
+   assert(quality.file == BRW_IMMEDIATE_VALUE);
+   assert(quality.type == BRW_REGISTER_TYPE_D);
 
-   if (key->high_quality_derivatives) {
+   int quality_value = quality.dw1.d;
+
+   if (quality_value == BRW_DERIVATIVE_FINE ||
+      (key->high_quality_derivatives && quality_value != BRW_DERIVATIVE_COARSE)) {
   /* produce accurate derivatives */
   vstride = BRW_VERTICAL_STRIDE_2;
   width = BRW_WIDTH_2;
@@ -680,9 +686,15 @@ fs_generator::generate_ddx(fs_inst *inst, struct brw_reg 
dst, struct brw_reg src
  */
 void
 fs_generator::generate_ddy(fs_inst *inst, struct brw_reg dst, struct brw_reg 
src,
- bool negate_value)
+ struct brw_reg quality, bool negate_value)
 {
-   if (key->high_quality_derivatives) {
+   assert(quality.file == BRW_IMMEDIATE_VALUE);
+   assert(quality.type == BRW_REGISTER_TYPE_D);
+
+   int quality_value = quality.dw1.d;
+
+   if (quality_value == BRW_DERIVATIVE_FINE ||
+      (key->high_quality_derivatives && quality_value != BRW_DERIVATIVE_COARSE)) {
   /* From the Ivy Bridge PRM, volume 4 part 3, section 3.3.9 (Register
* Region Restrictions):
*
@@ -1655,14 +1667,14 @@ fs_generator::generate_code(exec_list *instructions)
 generate_tex(inst, dst, src[0], src[1]);
 break;
   case FS_OPCODE_DDX:
-generate_ddx(inst, dst, src[0]);
+generate_ddx(inst, dst, src[0], src[1]);
 break;
   case FS_OPCODE_DDY:
  /* Make sure fp->UsesDFdy flag got set (otherwise there's no
   * guarantee that key->render_to_fbo is set).
   */
  assert(fp->UsesDFdy);
-generate_ddy(inst, dst, src[0], key->render_to_fbo);
+

[Mesa-dev] [PATCH 5/5] docs: Mark off ARB_derivative_control for i965.

2014-08-14 Thread Chris Forbes

Also update 10.3 relnotes to match, and note nv50/nvc0 support there.

Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 docs/GL3.txt| 2 +-
 docs/relnotes/10.3.html | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 0631c72..1c3567e 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -188,7 +188,7 @@ GL 4.5, GLSL 4.50:
   GL_ARB_clip_control  not started
   GL_ARB_conditional_render_inverted   not started
   GL_ARB_cull_distance not started
-  GL_ARB_derivative_controlDONE (nv50, nvc0)
+  GL_ARB_derivative_controlDONE (i965, nv50, nvc0)
   GL_ARB_direct_state_access   not started
   GL_ARB_get_texture_sub_image started (Brian Paul)
   GL_ARB_shader_texture_image_samples  not started
diff --git a/docs/relnotes/10.3.html b/docs/relnotes/10.3.html
index a297106..3c33150 100644
--- a/docs/relnotes/10.3.html
+++ b/docs/relnotes/10.3.html
@@ -46,6 +46,7 @@ Note: some of the new features are only available with 
certain drivers.
 <ul>
 <li>GL_ARB_ES3_compatibility on nv50, nvc0, r600, radeonsi, softpipe, llvmpipe</li>
 <li>GL_ARB_compressed_texture_pixel_storage on all drivers</li>
+<li>GL_ARB_derivative_control on i965, nv50, nvc0</li>
 <li>GL_ARB_draw_indirect on nvc0, radeonsi</li>
 <li>GL_ARB_explicit_uniform_location (all drivers that support GLSL)</li>
 <li>GL_ARB_multi_draw_indirect on nvc0, radeonsi</li>
-- 
2.0.4


[Mesa-dev] [PATCH 1/5] glsl: Mark program as using dFdy if coarse/fine variant is used

2014-08-14 Thread Chris Forbes

Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/glsl/ir_set_program_inouts.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/glsl/ir_set_program_inouts.cpp 
b/src/glsl/ir_set_program_inouts.cpp
index 5163eb2..97ead75 100644
--- a/src/glsl/ir_set_program_inouts.cpp
+++ b/src/glsl/ir_set_program_inouts.cpp
@@ -306,7 +306,9 @@ ir_visitor_status
 ir_set_program_inouts_visitor::visit_enter(ir_expression *ir)
 {
    if (this->shader_stage == MESA_SHADER_FRAGMENT &&
-       ir->operation == ir_unop_dFdy) {
+       (ir->operation == ir_unop_dFdy ||
+        ir->operation == ir_unop_dFdy_coarse ||
+        ir->operation == ir_unop_dFdy_fine)) {
       gl_fragment_program *fprog = (gl_fragment_program *) prog;
       fprog->UsesDFdy = true;
}
-- 
2.0.4


[Mesa-dev] [PATCH 4/5] i965: Enable ARB_derivative_control on Gen7+.

2014-08-14 Thread Chris Forbes

The extension says GL 4.0 is required. We'll meet the spirit
of that restriction by enabling on just those generations which will
soon support GL 4.0 (Gen7+), although it's technically supportable on
all generations.

Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index e134cd9..c672044 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -305,6 +305,7 @@ intelInitExtensions(struct gl_context *ctx)
   }
 
       ctx->Extensions.ARB_texture_compression_bptc = true;
+      ctx->Extensions.ARB_derivative_control = true;
    }
 
    if (brw->gen >= 8) {
-- 
2.0.4


Re: [Mesa-dev] [PATCH 2/5] glsl: add ARB_derivative control support

2014-08-14 Thread Chris Forbes

Reviewed-by: Chris Forbes chr...@ijw.co.nz

On Thu, Aug 14, 2014 at 4:52 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
 ---
  src/glsl/builtin_functions.cpp  | 48 
 +
  src/glsl/glcpp/glcpp-parse.y|  3 +++
  src/glsl/glsl_parser_extras.cpp |  1 +
  src/glsl/glsl_parser_extras.h   |  2 ++
  src/glsl/ir.h   |  4 
  src/glsl/ir_validate.cpp|  4 
  6 files changed, 62 insertions(+)

 diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
 index 185fe98..c882ec8 100644
 --- a/src/glsl/builtin_functions.cpp
 +++ b/src/glsl/builtin_functions.cpp
 @@ -318,6 +318,14 @@ fs_oes_derivatives(const _mesa_glsl_parse_state *state)
  }

  static bool
 +fs_derivative_control(const _mesa_glsl_parse_state *state)
 +{
 +   return state->stage == MESA_SHADER_FRAGMENT &&
 +  (state->is_version(450, 0) ||
 +   state->ARB_derivative_control_enable);
 +}
 +
 +static bool
  tex1d_lod(const _mesa_glsl_parse_state *state)
  {
    return !state->es_shader && lod_exists_in_stage(state);
 @@ -618,6 +626,12 @@ private:
 B1(dFdx);
 B1(dFdy);
 B1(fwidth);
 +   B1(dFdxCoarse);
 +   B1(dFdyCoarse);
 +   B1(fwidthCoarse);
 +   B1(dFdxFine);
 +   B1(dFdyFine);
 +   B1(fwidthFine);
 B1(noise1);
 B1(noise2);
 B1(noise3);
 @@ -2148,6 +2162,12 @@ builtin_builder::create_builtins()
 F(dFdx)
 F(dFdy)
 F(fwidth)
 +   F(dFdxCoarse)
 +   F(dFdyCoarse)
 +   F(fwidthCoarse)
 +   F(dFdxFine)
 +   F(dFdyFine)
 +   F(fwidthFine)
 F(noise1)
 F(noise2)
 F(noise3)
 @@ -4010,7 +4030,11 @@ builtin_builder::_textureQueryLevels(const glsl_type 
 *sampler_type)
  }

  UNOP(dFdx, ir_unop_dFdx, fs_oes_derivatives)
 +UNOP(dFdxCoarse, ir_unop_dFdx_coarse, fs_derivative_control)
 +UNOP(dFdxFine, ir_unop_dFdx_fine, fs_derivative_control)
  UNOP(dFdy, ir_unop_dFdy, fs_oes_derivatives)
 +UNOP(dFdyCoarse, ir_unop_dFdy_coarse, fs_derivative_control)
 +UNOP(dFdyFine, ir_unop_dFdy_fine, fs_derivative_control)

  ir_function_signature *
  builtin_builder::_fwidth(const glsl_type *type)
 @@ -4024,6 +4048,30 @@ builtin_builder::_fwidth(const glsl_type *type)
  }

  ir_function_signature *
 +builtin_builder::_fwidthCoarse(const glsl_type *type)
 +{
 +   ir_variable *p = in_var(type, "p");
 +   MAKE_SIG(type, fs_derivative_control, 1, p);
 +
 +   body.emit(ret(add(abs(expr(ir_unop_dFdx_coarse, p)),
 + abs(expr(ir_unop_dFdy_coarse, p)))));
 +
 +   return sig;
 +}
 +
 +ir_function_signature *
 +builtin_builder::_fwidthFine(const glsl_type *type)
 +{
 +   ir_variable *p = in_var(type, "p");
 +   MAKE_SIG(type, fs_derivative_control, 1, p);
 +
 +   body.emit(ret(add(abs(expr(ir_unop_dFdx_fine, p)),
 + abs(expr(ir_unop_dFdy_fine, p)))));
 +
 +   return sig;
 +}
 +
 +ir_function_signature *
  builtin_builder::_noise1(const glsl_type *type)
  {
 return unop(v110, ir_unop_noise, glsl_type::float_type, type);
 diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y
 index a616973..f1119eb 100644
 --- a/src/glsl/glcpp/glcpp-parse.y
 +++ b/src/glsl/glcpp/glcpp-parse.y
 @@ -2469,6 +2469,9 @@ _glcpp_parser_handle_version_declaration(glcpp_parser_t 
 *parser, intmax_t versio

   if (extensions->ARB_shader_image_load_store)
  add_builtin_define(parser, "GL_ARB_shader_image_load_store", 1);
 +
 +  if (extensions->ARB_derivative_control)
 + add_builtin_define(parser, "GL_ARB_derivative_control", 1);
}
 }

 diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
 index ad91c46..490c3c8 100644
 --- a/src/glsl/glsl_parser_extras.cpp
 +++ b/src/glsl/glsl_parser_extras.cpp
 @@ -514,6 +514,7 @@ static const _mesa_glsl_extension 
 _mesa_glsl_supported_extensions[] = {
 EXT(ARB_arrays_of_arrays,   true,  false, 
 ARB_arrays_of_arrays),
 EXT(ARB_compute_shader, true,  false, ARB_compute_shader),
 EXT(ARB_conservative_depth, true,  false, 
 ARB_conservative_depth),
 +   EXT(ARB_derivative_control, true,  false, 
 ARB_derivative_control),
 EXT(ARB_draw_buffers,   true,  false, dummy_true),
 EXT(ARB_draw_instanced, true,  false, ARB_draw_instanced),
 EXT(ARB_explicit_attrib_location,   true,  false, 
 ARB_explicit_attrib_location),
 diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h
 index ce66e2f..c8b9478 100644
 --- a/src/glsl/glsl_parser_extras.h
 +++ b/src/glsl/glsl_parser_extras.h
 @@ -393,6 +393,8 @@ struct _mesa_glsl_parse_state {
 bool ARB_compute_shader_warn;
 bool ARB_conservative_depth_enable;
 bool ARB_conservative_depth_warn;
 +   bool ARB_derivative_control_enable;
 +   bool ARB_derivative_control_warn;
 bool ARB_draw_buffers_enable;
 bool ARB_draw_buffers_warn;
 bool
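
For readers unfamiliar with the extension, the difference between the Fine and Coarse builtins added in the quoted patch can be sketched on a single 2x2 pixel quad. This is only an illustrative model, not Mesa code: the `Quad` layout and the `dfdx_*`, `dfdy_*` and `fwidth_*` helper names are assumptions made for the example, and an implementation is free to pick its coarse sample differently. The `fwidth_*` helpers follow the `abs(dFdx) + abs(dFdy)` shape of `_fwidthCoarse` and `_fwidthFine`.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical model of one 2x2 pixel quad:
// index 0 = (x0, y0), 1 = (x1, y0), 2 = (x0, y1), 3 = (x1, y1).
using Quad = std::array<double, 4>;

// Fine: each pixel uses the difference along its own row.
double dfdx_fine(const Quad &q, int pixel)
{
   assert(pixel >= 0 && pixel < 4);
   const int row = pixel / 2;
   return q[row * 2 + 1] - q[row * 2];
}

// Fine: each pixel uses the difference along its own column.
double dfdy_fine(const Quad &q, int pixel)
{
   assert(pixel >= 0 && pixel < 4);
   const int col = pixel % 2;
   return q[2 + col] - q[col];
}

// Coarse: one value shared by the whole quad. Taking it from the first
// row/column is an assumption of this sketch.
double dfdx_coarse(const Quad &q) { return q[1] - q[0]; }
double dfdy_coarse(const Quad &q) { return q[2] - q[0]; }

// fwidth variants: |dFdx| + |dFdy| using the matching derivative flavour.
double fwidth_fine(const Quad &q, int pixel)
{
   return std::fabs(dfdx_fine(q, pixel)) + std::fabs(dfdy_fine(q, pixel));
}

double fwidth_coarse(const Quad &q)
{
   return std::fabs(dfdx_coarse(q)) + std::fabs(dfdy_coarse(q));
}
```

With a quad whose rows have different slopes, the fine variants report a per-pixel result while the coarse variants report one value for all four pixels.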

[Mesa-dev] [PATCH 2/2] vl/compositor: set the scissor before clearing the render target

2014-08-14 Thread Christian König

From: Christian König christian.koe...@amd.com

Otherwise we clear areas that shouldn't be cleared.

Signed-off-by: Christian König christian.koe...@amd.com
---
 src/gallium/auxiliary/vl/vl_compositor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/vl/vl_compositor.c 
b/src/gallium/auxiliary/vl/vl_compositor.c
index 839fd27..6bd1a88 100644
--- a/src/gallium/auxiliary/vl/vl_compositor.c
+++ b/src/gallium/auxiliary/vl/vl_compositor.c
@@ -1060,6 +1060,7 @@ vl_compositor_render(struct vl_compositor_state *s,
   s->scissor.maxx = dst_surface->width;
   s->scissor.maxy = dst_surface->height;
}
+   c->pipe->set_scissor_states(c->pipe, 0, 1, &s->scissor);
 
gen_vertex_data(c, s, dirty_area);
 
@@ -1072,7 +1073,6 @@ vl_compositor_render(struct vl_compositor_state *s,
   dirty_area->x1 = dirty_area->y1 = MIN_DIRTY;
}
 
-   c->pipe->set_scissor_states(c->pipe, 0, 1, &s->scissor);
c->pipe->set_framebuffer_state(c->pipe, &c->fb_state);
c->pipe->bind_vs_state(c->pipe, c->vs);
c->pipe->set_vertex_buffers(c->pipe, 0, 1, &c->vertex_buf);
-- 
1.9.1

[Mesa-dev] [PATCH 1/2] st/vdpau: fix vlVdpOutputSurfaceRender(Output|Bitmap)Surface

2014-08-14 Thread Christian König

From: Christian König christian.koe...@amd.com

Correctly handle the fact that source_surface is optional.

Signed-off-by: Christian König christian.koe...@amd.com
---
 src/gallium/state_trackers/vdpau/device.c| 43 +++-
 src/gallium/state_trackers/vdpau/output.c| 42 +++
 src/gallium/state_trackers/vdpau/vdpau_private.h |  1 +
 3 files changed, 71 insertions(+), 15 deletions(-)

diff --git a/src/gallium/state_trackers/vdpau/device.c 
b/src/gallium/state_trackers/vdpau/device.c
index 9c5ec60..efc1fde 100644
--- a/src/gallium/state_trackers/vdpau/device.c
+++ b/src/gallium/state_trackers/vdpau/device.c
@@ -42,6 +42,8 @@ vdp_imp_device_create_x11(Display *display, int screen, 
VdpDevice *device,
   VdpGetProcAddress **get_proc_address)
 {
struct pipe_screen *pscreen;
+   struct pipe_resource *res, res_tmpl;
+   struct pipe_sampler_view sv_tmpl;
vlVdpDevice *dev = NULL;
VdpStatus ret;
 
@@ -79,6 +81,43 @@ vdp_imp_device_create_x11(Display *display, int screen, 
VdpDevice *device,
   goto no_context;
}
 
+   memset(&res_tmpl, 0, sizeof(res_tmpl));
+
+   res_tmpl.target = PIPE_TEXTURE_2D;
+   res_tmpl.format = PIPE_FORMAT_R8G8B8A8_UNORM;
+   res_tmpl.width0 = 1;
+   res_tmpl.height0 = 1;
+   res_tmpl.depth0 = 1;
+   res_tmpl.array_size = 1;
+   res_tmpl.bind = PIPE_BIND_SAMPLER_VIEW;
+   res_tmpl.usage = PIPE_USAGE_DEFAULT;
+
+   if (!CheckSurfaceParams(pscreen, &res_tmpl)) {
+  ret = VDP_STATUS_NO_IMPLEMENTATION;
+  goto no_resource;
+   }
+
+   res = pscreen->resource_create(pscreen, &res_tmpl);
+   if (!res) {
+  ret = VDP_STATUS_RESOURCES;
+  goto no_resource;
+   }
+
+   memset(&sv_tmpl, 0, sizeof(sv_tmpl));
+   u_sampler_view_default_template(&sv_tmpl, res, res->format);
+
+   sv_tmpl.swizzle_r = PIPE_SWIZZLE_ONE;
+   sv_tmpl.swizzle_g = PIPE_SWIZZLE_ONE;
+   sv_tmpl.swizzle_b = PIPE_SWIZZLE_ONE;
+   sv_tmpl.swizzle_a = PIPE_SWIZZLE_ONE;
+
+   dev->dummy_sv = dev->context->create_sampler_view(dev->context, res, &sv_tmpl);
+   pipe_resource_reference(&res, NULL);
+   if (!dev->dummy_sv) {
+  ret = VDP_STATUS_RESOURCES;
+  goto no_resource;
+   }
+
*device = vlAddDataHTAB(dev);
if (*device == 0) {
   ret = VDP_STATUS_ERROR;
@@ -93,8 +132,9 @@ vdp_imp_device_create_x11(Display *display, int screen, 
VdpDevice *device,
return VDP_STATUS_OK;
 
 no_handle:
+   pipe_sampler_view_reference(&dev->dummy_sv, NULL);
+no_resource:
dev->context->destroy(dev->context);
-   /* Destroy vscreen */
 no_context:
vl_screen_destroy(dev->vscreen);
 no_vscreen:
@@ -185,6 +225,7 @@ vlVdpDeviceFree(vlVdpDevice *dev)
 {
pipe_mutex_destroy(dev->mutex);
vl_compositor_cleanup(&dev->compositor);
+   pipe_sampler_view_reference(&dev->dummy_sv, NULL);
dev->context->destroy(dev->context);
vl_screen_destroy(dev->vscreen);
FREE(dev);
diff --git a/src/gallium/state_trackers/vdpau/output.c 
b/src/gallium/state_trackers/vdpau/output.c
index caae50f..3248f76 100644
--- a/src/gallium/state_trackers/vdpau/output.c
+++ b/src/gallium/state_trackers/vdpau/output.c
@@ -624,9 +624,9 @@ vlVdpOutputSurfaceRenderOutputSurface(VdpOutputSurface 
destination_surface,
   uint32_t flags)
 {
vlVdpOutputSurface *dst_vlsurface;
-   vlVdpOutputSurface *src_vlsurface;
 
struct pipe_context *context;
+   struct pipe_sampler_view *src_sv;
struct vl_compositor *compositor;
struct vl_compositor_state *cstate;
 
@@ -639,12 +639,19 @@ vlVdpOutputSurfaceRenderOutputSurface(VdpOutputSurface 
destination_surface,
if (!dst_vlsurface)
   return VDP_STATUS_INVALID_HANDLE;
 
-   src_vlsurface = vlGetDataHTAB(source_surface);
-   if (!src_vlsurface)
-  return VDP_STATUS_INVALID_HANDLE;
+   if (source_surface == VDP_INVALID_HANDLE) {
+  src_sv = dst_vlsurface->device->dummy_sv;
+
+   } else {
+  vlVdpOutputSurface *src_vlsurface = vlGetDataHTAB(source_surface);
+  if (!src_vlsurface)
+ return VDP_STATUS_INVALID_HANDLE;
 
-   if (dst_vlsurface->device != src_vlsurface->device)
-  return VDP_STATUS_HANDLE_DEVICE_MISMATCH;
+  if (dst_vlsurface->device != src_vlsurface->device)
+ return VDP_STATUS_HANDLE_DEVICE_MISMATCH;
+
+  src_sv = src_vlsurface->sampler_view;
+   }
 
pipe_mutex_lock(dst_vlsurface->device->mutex);
vlVdpResolveDelayedRendering(dst_vlsurface->device, NULL, NULL);
@@ -657,7 +664,7 @@ vlVdpOutputSurfaceRenderOutputSurface(VdpOutputSurface 
destination_surface,
 
vl_compositor_clear_layers(cstate);
vl_compositor_set_layer_blend(cstate, 0, blend, false);
-   vl_compositor_set_rgba_layer(cstate, compositor, 0, src_vlsurface->sampler_view,
+   vl_compositor_set_rgba_layer(cstate, compositor, 0, src_sv,
 RectToPipe(source_rect, &src_rect), NULL,
 ColorsToPipe(colors, flags, vlcolors));
STATIC_ASSERT(VL_COMPOSITOR_ROTATE_0 == VDP_OUTPUT_SURFACE_RENDER_ROTATE_0);

[Mesa-dev] [PATCH 08/11] glsl: enable/disable certain lowering passes for doubles

2014-08-14 Thread Dave Airlie

We want to restrict some lowering passes to floats only,
and enable others for doubles.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/glsl/lower_instructions.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/glsl/lower_instructions.cpp b/src/glsl/lower_instructions.cpp
index 176070c..eced619 100644
--- a/src/glsl/lower_instructions.cpp
+++ b/src/glsl/lower_instructions.cpp
@@ -290,7 +290,7 @@ lower_instructions_visitor::mod_to_fract(ir_expression *ir)
/* Don't generate new IR that would need to be lowered in an additional
 * pass.
 */
-   if (lowering(DIV_TO_MUL_RCP))
+   if (lowering(DIV_TO_MUL_RCP) && ir->type->is_float())
   div_to_mul_rcp(div_expr);
 
ir_rvalue *expr = new(ir) ir_expression(ir_unop_fract,
@@ -511,7 +511,7 @@ lower_instructions_visitor::visit_leave(ir_expression *ir)
   break;
 
case ir_binop_mod:
-  if (lowering(MOD_TO_FRACT) && ir->type->is_float())
+  if (lowering(MOD_TO_FRACT) && (ir->type->is_float() || ir->type->is_double()))
 mod_to_fract(ir);
   break;
 
@@ -526,7 +526,7 @@ lower_instructions_visitor::visit_leave(ir_expression *ir)
   break;
 
case ir_binop_ldexp:
-  if (lowering(LDEXP_TO_ARITH))
+  if (lowering(LDEXP_TO_ARITH) && ir->type->is_float())
  ldexp_to_arith(ir);
   break;
 
-- 
1.9.3
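
As a rough illustration of what enabling MOD_TO_FRACT for doubles means, the pass rewrites `mod(x, y)` in terms of `fract`, using the identity `mod(x, y) = y * fract(x / y)`. The sketch below only demonstrates that arithmetic in plain C++; the helper names `fract` and `mod_via_fract` are made up for the example and are not Mesa functions.

```cpp
#include <cassert>
#include <cmath>

// GLSL-style fract(): the fractional part, always in [0, 1).
double fract(double x)
{
   return x - std::floor(x);
}

// mod(x, y) expressed through fract(), the shape the lowering produces.
double mod_via_fract(double x, double y)
{
   assert(y != 0.0);
   return y * fract(x / y);
}
```

For example, `mod_via_fract(7.5, 2.0)` evaluates `2.0 * fract(3.75)`, and a negative `x` still yields a result with the sign of `y`, as GLSL's `mod` requires.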

[Mesa-dev] [PATCH 07/11] glsl: add double support

2014-08-14 Thread Dave Airlie

This adds the guts of the fp64 implementation to the GLSL compiler.

- builtin double types
- double constant support
- lexer parsing for double types (lf, LF)
- enforcing flat on double fs inputs
- double operations (d2f, f2d, pack/unpack, frexp - in 2 parts)
- ir builder bits.
- double constant expression handling

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/glsl/ast.h |   2 +
 src/glsl/ast_function.cpp  |  36 ++
 src/glsl/ast_to_hir.cpp|  28 -
 src/glsl/builtin_type_macros.h |  16 +++
 src/glsl/builtin_types.cpp |  30 +
 src/glsl/glsl_lexer.ll |  42 ++-
 src/glsl/glsl_parser.yy|  33 +-
 src/glsl/glsl_parser_extras.cpp|   4 +
 src/glsl/glsl_types.cpp|  74 ++--
 src/glsl/glsl_types.h  |  18 ++-
 src/glsl/ir.cpp|  90 +-
 src/glsl/ir.h  |  17 +++
 src/glsl/ir_builder.cpp|  11 ++
 src/glsl/ir_builder.h  |   3 +
 src/glsl/ir_clone.cpp  |   1 +
 src/glsl/ir_constant_expression.cpp| 207 -
 src/glsl/ir_print_visitor.cpp  |  11 ++
 src/glsl/ir_set_program_inouts.cpp |  24 +++-
 src/glsl/ir_validate.cpp   |  45 ++-
 src/glsl/link_uniform_initializers.cpp |   4 +
 src/glsl/link_uniforms.cpp |   2 +
 src/glsl/link_varyings.cpp |   3 +-
 src/mesa/program/ir_to_mesa.cpp|   6 +
 23 files changed, 641 insertions(+), 66 deletions(-)

diff --git a/src/glsl/ast.h b/src/glsl/ast.h
index 15bf086..99274ed 100644
--- a/src/glsl/ast.h
+++ b/src/glsl/ast.h
@@ -189,6 +189,7 @@ enum ast_operators {
ast_uint_constant,
ast_float_constant,
ast_bool_constant,
+   ast_double_constant,
 
ast_sequence,
ast_aggregate
@@ -236,6 +237,7 @@ public:
   float float_constant;
   unsigned uint_constant;
   int bool_constant;
+  double double_constant;
} primary_expression;
 
 
diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp
index 39c7bee..6169ae6 100644
--- a/src/glsl/ast_function.cpp
+++ b/src/glsl/ast_function.cpp
@@ -570,6 +570,10 @@ convert_component(ir_rvalue *src, const glsl_type 
*desired_type)
 result = new(ctx) ir_expression(ir_unop_i2u,
  new(ctx) ir_expression(ir_unop_b2i, src));
 break;
+  case GLSL_TYPE_DOUBLE:
+result = new(ctx) ir_expression(ir_unop_f2u,
+  new(ctx) ir_expression(ir_unop_d2f, src));
+break;
   }
   break;
case GLSL_TYPE_INT:
@@ -583,6 +587,10 @@ convert_component(ir_rvalue *src, const glsl_type 
*desired_type)
   case GLSL_TYPE_BOOL:
 result = new(ctx) ir_expression(ir_unop_b2i, src);
 break;
+  case GLSL_TYPE_DOUBLE:
+result = new(ctx) ir_expression(ir_unop_f2i,
+  new(ctx) ir_expression(ir_unop_d2f, src));
+break;
   }
   break;
case GLSL_TYPE_FLOAT:
@@ -596,6 +604,9 @@ convert_component(ir_rvalue *src, const glsl_type 
*desired_type)
   case GLSL_TYPE_BOOL:
 result = new(ctx) ir_expression(ir_unop_b2f, desired_type, src, NULL);
 break;
+  case GLSL_TYPE_DOUBLE:
+result = new(ctx) ir_expression(ir_unop_d2f, desired_type, src, NULL);
+break;
   }
   break;
case GLSL_TYPE_BOOL:
@@ -610,8 +621,30 @@ convert_component(ir_rvalue *src, const glsl_type 
*desired_type)
   case GLSL_TYPE_FLOAT:
 result = new(ctx) ir_expression(ir_unop_f2b, desired_type, src, NULL);
 break;
+  case GLSL_TYPE_DOUBLE:
+result = new(ctx) ir_expression(ir_unop_f2b,
+  new(ctx) ir_expression(ir_unop_d2f, src));
+break;
   }
   break;
+   case GLSL_TYPE_DOUBLE:
+  switch (b) {
+  case GLSL_TYPE_INT:
+ result = new(ctx) ir_expression(ir_unop_f2d,
+  new(ctx) ir_expression(ir_unop_i2f, src));
+break;
+  case GLSL_TYPE_UINT:
+ result = new(ctx) ir_expression(ir_unop_f2d,
+  new(ctx) ir_expression(ir_unop_u2f, src));
+break;
+  case GLSL_TYPE_BOOL:
+ result = new(ctx) ir_expression(ir_unop_f2d,
+  new(ctx) ir_expression(ir_unop_b2f, src));
+break;
+  case GLSL_TYPE_FLOAT:
+result = new(ctx) ir_expression(ir_unop_f2d, desired_type, src, NULL);
+break;
+  }
}
 
assert(result != NULL);
@@ -1009,6 +1042,9 @@ emit_inline_vector_constructor(const glsl_type *type,
   case GLSL_TYPE_FLOAT:
  data.f[i + base_component] = c->get_float_component(i);
  break;
+  case GLSL_TYPE_DOUBLE:
+ data.d[i + base_component] = c->get_double_component(i);
+ break;
   case GLSL_TYPE_BOOL:
  data.b[i + base_component] =
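
The `convert_component` hunks above build double-to-integer conversions by chaining two existing operations, `d2f` followed by `f2u` or `f2i`. A small sketch of what that chaining implies numerically is given below. The functions are plain C++ stand-ins named after the IR opcodes, not Mesa code, and `double_to_uint_direct` is included purely for comparison; it does not correspond to anything in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for ir_unop_d2f: narrow a double to a float.
float d2f(double d)
{
   return static_cast<float>(d);
}

// Stand-in for ir_unop_f2u: convert an in-range float to an unsigned int.
uint32_t f2u(float f)
{
   assert(f >= 0.0f && f < 4294967296.0f);
   return static_cast<uint32_t>(f);
}

// The two-step route used for GLSL_TYPE_DOUBLE -> GLSL_TYPE_UINT.
uint32_t double_to_uint_via_float(double d)
{
   return f2u(d2f(d));
}

// Hypothetical single-step conversion, shown only for comparison.
uint32_t double_to_uint_direct(double d)
{
   assert(d >= 0.0 && d < 4294967296.0);
   return static_cast<uint32_t>(d);
}
```

Because a float carries a 24-bit significand, integers above 2^24 may be rounded on the intermediate `d2f` step, so the two routes can disagree for large values even though they match for small ones.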

[Mesa-dev] [PATCH 09/11] glsl/lower_instructions: add double lowering passes

2014-08-14 Thread Dave Airlie

This lowers double dot product and lrp to fma.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/glsl/lower_instructions.cpp | 83 +
 1 file changed, 83 insertions(+)

diff --git a/src/glsl/lower_instructions.cpp b/src/glsl/lower_instructions.cpp
index eced619..f737556 100644
--- a/src/glsl/lower_instructions.cpp
+++ b/src/glsl/lower_instructions.cpp
@@ -107,6 +107,7 @@
  */
 
 #include "main/core.h" /* for M_LOG2E */
+#include "program/prog_instruction.h" /* for swizzle */
 #include "glsl_types.h"
 #include "ir.h"
 #include "ir_builder.h"
@@ -139,6 +140,8 @@ private:
void ldexp_to_arith(ir_expression *);
void carry_to_arith(ir_expression *);
void borrow_to_arith(ir_expression *);
+   void double_dot_to_fma(ir_expression *);
+   void double_lrp(ir_expression *);
 };
 
 } /* anonymous namespace */
@@ -484,10 +487,90 @@ lower_instructions_visitor::borrow_to_arith(ir_expression 
*ir)
this->progress = true;
 }
 
+void
+lower_instructions_visitor::double_dot_to_fma(ir_expression *ir)
+{
+   ir_variable *temp = new(ir) ir_variable(ir->operands[0]->type->get_base_type(), "dot_res",
+  ir_var_temporary);
+   this->base_ir->insert_before(temp);
+
+   int nc = ir->operands[0]->type->components();
+   for (int i = nc - 1; i >= 1; i--) {
+  ir_assignment *assig;
+  if (i == (nc - 1)) {
+ assig = assign(temp, mul(swizzle(ir->operands[0]->clone(ir, NULL), i, 1),
+  swizzle(ir->operands[1]->clone(ir, NULL), i, 1)));
+  } else {
+ assig = assign(temp, fma(swizzle(ir->operands[0]->clone(ir, NULL), i, 1),
+  swizzle(ir->operands[1]->clone(ir, NULL), i, 1),
+  temp));
+  }
+  this->base_ir->insert_before(assig);
+   }
+
+   ir->operation = ir_triop_fma;
+   ir->operands[0] = swizzle(ir->operands[0], 0, 1);
+   ir->operands[1] = swizzle(ir->operands[1], 0, 1);
+   ir->operands[2] = new(ir) ir_dereference_variable(temp);
+
+   this->progress = true;
+
+}
+
+void
+lower_instructions_visitor::double_lrp(ir_expression *ir)
+{
+   ir_assignment *assig;
+   ir_constant *one = new(ir) ir_constant(1.0, ir->operands[2]->type->vector_elements);
+   ir_variable *temp = new(ir) ir_variable(ir->operands[0]->type, "lrp_res",
+  ir_var_temporary);
+   ir_variable *t2 = new(ir) ir_variable(ir->operands[0]->type, "aval",
+  ir_var_temporary);
+   int swizval;
+   this->base_ir->insert_before(temp);
+   this->base_ir->insert_before(t2);
+
+   assig = assign(temp, mul(sub(one, ir->operands[2]), ir->operands[0]));
+   this->base_ir->insert_before(assig);
+
+   switch (ir->operands[2]->type->vector_elements) {
+   case 1:
+  swizval = SWIZZLE_XXXX;
+  break;
+   case 2:
+  swizval = MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_Y, SWIZZLE_X, SWIZZLE_X);
+  break;
+   case 3:
+  swizval = MAKE_SWIZZLE4(SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_X);
+  break;
+   case 4:
+   default:
+  swizval = SWIZZLE_XYZW;
+  break;
+   }
+   assig = assign(t2, swizzle(ir->operands[2], swizval, ir->operands[0]->type->vector_elements));
+   this->base_ir->insert_before(assig);
+
+   ir->operation = ir_triop_fma;
+   ir->operands[0] = new(ir) ir_dereference_variable(t2);
+   ir->operands[1] = ir->operands[1];
+   ir->operands[2] = new(ir) ir_dereference_variable(temp);
+
+   this->progress = true;
+}
+
 ir_visitor_status
 lower_instructions_visitor::visit_leave(ir_expression *ir)
 {
switch (ir->operation) {
+   case ir_binop_dot:
+  if (ir->operands[0]->type->is_double())
+ double_dot_to_fma(ir);
+  break;
+   case ir_triop_lrp:
+  if (ir->operands[0]->type->is_double())
+ double_lrp(ir);
+  break;
case ir_binop_sub:
   if (lowering(SUB_TO_ADD_NEG))
 sub_to_add_neg(ir);
-- 
1.9.3
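
The arithmetic behind the two lowerings in this patch can be sketched outside the IR. The code below is a plain C++ approximation under stated assumptions: `dot_via_fma` and `lrp_via_fma` are names invented for the example, vectors are modelled with `std::vector<double>`, and `std::fma` stands in for `ir_triop_fma`. The evaluation order follows `double_dot_to_fma` (last component multiplied first, component 0 folded in last) and `double_lrp` (`(1 - a) * x` computed into a temporary, then `fma(a, y, temp)`).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// dot(a, b) as a chain of fused multiply-adds, highest component first.
double dot_via_fma(const std::vector<double> &a, const std::vector<double> &b)
{
   assert(a.size() == b.size() && a.size() >= 2);
   const int nc = static_cast<int>(a.size());

   // The last component starts the accumulation with a plain multiply.
   double temp = a[nc - 1] * b[nc - 1];

   // Components nc-2 .. 1 are folded in with fma.
   for (int i = nc - 2; i >= 1; i--)
      temp = std::fma(a[i], b[i], temp);

   // Component 0 becomes the final fma that replaces the dot expression.
   return std::fma(a[0], b[0], temp);
}

// lrp(x, y, a) = x * (1 - a) + y * a, with the second term fused.
double lrp_via_fma(double x, double y, double a)
{
   const double temp = (1.0 - a) * x;
   return std::fma(a, y, temp);
}
```

For a three-component dot product this performs one multiply and two fused multiply-adds, and the interpolation performs one subtract, one multiply and one fused multiply-add.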

[Mesa-dev] [RFC] initial ARB_gpu_shader_fp64 posting

2014-08-14 Thread Dave Airlie

This is just the mesa and glsl compiler portions of the ARB_gpu_shader_fp64
extension that I've been slowly iterating over the past few months.

All in 
http://cgit.freedesktop.org/~airlied/mesa/log/?h=arb_gpu_shader_fp64-submit but 
underneath the gallium + softpipe + mesa/st development, which all
need further cleaning and docs.

The biggest bits of this are the builtin generator, constant expression 
handling and uniform interfaces. I suspect there are chunks in some patches 
that might need to be in others, and the uniform patches are probably not very 
well explained, mostly because I can't remember exactly why I did what I did in 
a few places.

Dave.
[Mesa-dev] [PATCH 05/11] mesa: add double uniform support.

2014-08-14 Thread Dave Airlie

From: Dave Airlie airl...@redhat.com

This adds support for the new uniform interfaces
from ARB_gpu_shader_fp64.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/mesa/main/uniform_query.cpp   | 50 +
 src/mesa/main/uniforms.c  | 91 +++
 src/mesa/main/uniforms.h  |  3 +-
 src/mesa/program/ir_to_mesa.cpp   | 17 +++-
 src/mesa/program/prog_parameter.c | 16 ---
 5 files changed, 143 insertions(+), 34 deletions(-)

diff --git a/src/mesa/main/uniform_query.cpp b/src/mesa/main/uniform_query.cpp
index 7e630e6..d7024cb 100644
--- a/src/mesa/main/uniform_query.cpp
+++ b/src/mesa/main/uniform_query.cpp
@@ -449,6 +449,9 @@ log_uniform(const void *values, enum glsl_base_type 
basicType,
   case GLSL_TYPE_FLOAT:
 printf("%g ", v[i].f);
 break;
+  case GLSL_TYPE_DOUBLE:
+printf("%g ", *(double* )&v[i * 2].f);
+break;
   default:
 assert(!"Should not get here.");
 break;
@@ -509,11 +512,11 @@ _mesa_propagate_uniforms_to_driver_storage(struct 
gl_uniform_storage *uni,
 */
const unsigned components = MAX2(1, uni->type->vector_elements);
const unsigned vectors = MAX2(1, uni->type->matrix_columns);
-
+   const int dmul = uni->type->base_type == GLSL_TYPE_DOUBLE ? 2 : 1;
/* Store the data in the driver's requested type in the driver's storage
 * areas.
 */
-   unsigned src_vector_byte_stride = components * 4;
+   unsigned src_vector_byte_stride = components * 4 * dmul;
 
for (i = 0; i < uni->num_driver_storage; i++) {
   struct gl_uniform_driver_storage *const store = &uni->driver_storage[i];
@@ -612,6 +615,7 @@ _mesa_uniform(struct gl_context *ctx, struct 
gl_shader_program *shProg,
unsigned components;
unsigned src_components;
enum glsl_base_type basicType;
+   int size_mul = 1;
 
struct gl_uniform_storage *const uni =
   validate_uniform_parameters(ctx, shProg, location, count,
@@ -670,6 +674,26 @@ _mesa_uniform(struct gl_context *ctx, struct 
gl_shader_program *shProg,
   basicType = GLSL_TYPE_INT;
   src_components = 4;
   break;
+   case GL_DOUBLE:
+  basicType = GLSL_TYPE_DOUBLE;
+  src_components = 1;
+  size_mul = 2;
+  break;
+   case GL_DOUBLE_VEC2:
+  basicType = GLSL_TYPE_DOUBLE;
+  src_components = 2;
+  size_mul = 2;
+  break;
+   case GL_DOUBLE_VEC3:
+  basicType = GLSL_TYPE_DOUBLE;
+  src_components = 3;
+  size_mul = 2;
+  break;
+   case GL_DOUBLE_VEC4:
+  basicType = GLSL_TYPE_DOUBLE;
+  src_components = 4;
+  size_mul = 2;
+  break;
case GL_BOOL:
case GL_BOOL_VEC2:
case GL_BOOL_VEC3:
@@ -683,6 +707,15 @@ _mesa_uniform(struct gl_context *ctx, struct 
gl_shader_program *shProg,
case GL_FLOAT_MAT4x2:
case GL_FLOAT_MAT4x3:
case GL_FLOAT_MAT4:
+   case GL_DOUBLE_MAT2:
+   case GL_DOUBLE_MAT2x3:
+   case GL_DOUBLE_MAT2x4:
+   case GL_DOUBLE_MAT3x2:
+   case GL_DOUBLE_MAT3:
+   case GL_DOUBLE_MAT3x4:
+   case GL_DOUBLE_MAT4x2:
+   case GL_DOUBLE_MAT4x3:
+   case GL_DOUBLE_MAT4:
default:
   _mesa_problem(NULL, "Invalid type in %s", __func__);
   return;
@@ -789,7 +822,7 @@ _mesa_uniform(struct gl_context *ctx, struct 
gl_shader_program *shProg,
 */
if (!uni->type->is_boolean()) {
   memcpy(&uni->storage[components * offset], values,
-sizeof(uni->storage[0]) * components * count);
+sizeof(uni->storage[0]) * components * count * size_mul);
} else {
   const union gl_constant_value *src =
 (const union gl_constant_value *) values;
@@ -892,13 +925,14 @@ extern "C" void
 _mesa_uniform_matrix(struct gl_context *ctx, struct gl_shader_program *shProg,
 GLuint cols, GLuint rows,
  GLint location, GLsizei count,
- GLboolean transpose, const GLfloat *values)
+ GLboolean transpose,
+ const GLvoid *values, GLenum type)
 {
unsigned offset;
unsigned vectors;
unsigned components;
unsigned elements;
-
+   int size_mul = mesa_type_is_double(type) ? 2 : 1;
struct gl_uniform_storage *const uni =
   validate_uniform_parameters(ctx, shProg, location, count,
   &offset, "glUniformMatrix", false);
@@ -936,7 +970,7 @@ _mesa_uniform_matrix(struct gl_context *ctx, struct 
gl_shader_program *shProg,
}
 
if (ctx->_Shader->Flags & GLSL_UNIFORMS) {
-  log_uniform(values, GLSL_TYPE_FLOAT, components, vectors, count,
+  log_uniform(values, uni->type->base_type, components, vectors, count,
  bool(transpose), shProg, location, uni);
}
 
@@ -963,11 +997,11 @@ _mesa_uniform_matrix(struct gl_context *ctx, struct 
gl_shader_program *shProg,
 
if (!transpose) {
   memcpy(&uni->storage[elements * offset], values,
-sizeof(uni->storage[0]) * elements * count);
+sizeof(uni->storage[0]) * elements * count * size_mul);
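
The `size_mul` and `dmul` factors in this patch reflect that uniform storage is an array of 32-bit slots, so each double occupies two consecutive slots. The following is a minimal sketch of that layout, assuming a hypothetical `Slot` union as a stand-in for Mesa's `gl_constant_value`; `store_doubles` and `load_double` are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one 32-bit uniform storage slot.
union Slot {
   float f;
   int32_t i;
   uint32_t u;
};
static_assert(sizeof(Slot) == 4, "a slot is expected to be 32 bits");
static_assert(sizeof(double) == 2 * sizeof(Slot), "a double spans two slots");

// Copy `count` doubles into the slot array, two slots per value.
void store_doubles(std::vector<Slot> &storage, std::size_t first_slot,
                   const double *values, std::size_t count)
{
   const std::size_t size_mul = 2; // doubles take twice the slots of floats
   if (count == 0)
      return;
   assert(first_slot + count * size_mul <= storage.size());
   std::memcpy(&storage[first_slot], values, sizeof(Slot) * count * size_mul);
}

// Read back the double that starts at `first_slot`.
double load_double(const std::vector<Slot> &storage, std::size_t first_slot)
{
   assert(first_slot + 2 <= storage.size());
   double d;
   std::memcpy(&d, &storage[first_slot], sizeof(d));
   return d;
}
```

The sketch reads values back with `memcpy` rather than a pointer cast; that is a choice made for the example to sidestep aliasing questions, not a statement about how the patch itself reads the data.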

[Mesa-dev] [PATCH 03/11] mesa: add mesa_type_is_double helper function

2014-08-14 Thread Dave Airlie

This is a helper that returns whether a type is based on a double.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/mesa/program/prog_parameter.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/src/mesa/program/prog_parameter.h 
b/src/mesa/program/prog_parameter.h
index 6b3b3c2..9ee0f5e 100644
--- a/src/mesa/program/prog_parameter.h
+++ b/src/mesa/program/prog_parameter.h
@@ -151,6 +151,28 @@ _mesa_lookup_parameter_constant(const struct 
gl_program_parameter_list *list,
 const gl_constant_value v[], GLuint vSize,
 GLint *posOut, GLuint *swizzleOut);
 
+static INLINE GLboolean mesa_type_is_double(int dataType)
+{
+   switch (dataType) {
+   case GL_DOUBLE:
+   case GL_DOUBLE_VEC2:
+   case GL_DOUBLE_VEC3:
+   case GL_DOUBLE_VEC4:
+   case GL_DOUBLE_MAT2:
+   case GL_DOUBLE_MAT2x3:
+   case GL_DOUBLE_MAT2x4:
+   case GL_DOUBLE_MAT3:
+   case GL_DOUBLE_MAT3x2:
+   case GL_DOUBLE_MAT3x4:
+   case GL_DOUBLE_MAT4:
+   case GL_DOUBLE_MAT4x2:
+   case GL_DOUBLE_MAT4x3:
+  return GL_TRUE;
+   default:
+  return GL_FALSE;
+   }
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.9.3

[Mesa-dev] [PATCH 01/11] glapi: add ARB_gpu_shader_fp64

2014-08-14 Thread Dave Airlie

From: Dave Airlie airl...@redhat.com

Just add the xml file covering this extension,
and dummy interface files in mesa, and fix up
sanity tests.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/mapi/glapi/gen/ARB_gpu_shader_fp64.xml | 143 +
 src/mapi/glapi/gen/Makefile.am |   1 +
 src/mapi/glapi/gen/gl_API.xml  |   2 +
 src/mesa/main/tests/dispatch_sanity.cpp|  36 
 src/mesa/main/uniforms.c   |  95 +++
 src/mesa/main/uniforms.h   |  43 +
 6 files changed, 302 insertions(+), 18 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_gpu_shader_fp64.xml

diff --git a/src/mapi/glapi/gen/ARB_gpu_shader_fp64.xml 
b/src/mapi/glapi/gen/ARB_gpu_shader_fp64.xml
new file mode 100644
index 000..4f860ef
--- /dev/null
+++ b/src/mapi/glapi/gen/ARB_gpu_shader_fp64.xml
@@ -0,0 +1,143 @@
+<?xml version="1.0"?>
+<!DOCTYPE OpenGLAPI SYSTEM "gl_API.dtd">
+
+<OpenGLAPI>
+
+<category name="GL_ARB_gpu_shader_fp64" number="89">
+
+    <function name="Uniform1d" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="x" type="GLdouble"/>
+    </function>
+
+    <function name="Uniform2d" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="x" type="GLdouble"/>
+        <param name="y" type="GLdouble"/>
+    </function>
+
+    <function name="Uniform3d" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="x" type="GLdouble"/>
+        <param name="y" type="GLdouble"/>
+        <param name="z" type="GLdouble"/>
+    </function>
+
+    <function name="Uniform4d" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="x" type="GLdouble"/>
+        <param name="y" type="GLdouble"/>
+        <param name="z" type="GLdouble"/>
+        <param name="w" type="GLdouble"/>
+    </function>
+
+    <function name="Uniform1dv" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="count" type="GLsizei"/>
+        <param name="value" type="const GLdouble *"/>
+    </function>
+
+    <function name="Uniform2dv" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="count" type="GLsizei"/>
+        <param name="value" type="const GLdouble *"/>
+    </function>
+
+    <function name="Uniform3dv" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="count" type="GLsizei"/>
+        <param name="value" type="const GLdouble *"/>
+    </function>
+
+    <function name="Uniform4dv" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="count" type="GLsizei"/>
+        <param name="value" type="const GLdouble *"/>
+    </function>
+
+    <function name="UniformMatrix2dv" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="count" type="GLsizei"/>
+        <param name="transpose" type="GLboolean"/>
+        <param name="value" type="const GLdouble *"/>
+    </function>
+
+    <function name="UniformMatrix3dv" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="count" type="GLsizei"/>
+        <param name="transpose" type="GLboolean"/>
+        <param name="value" type="const GLdouble *"/>
+    </function>
+
+    <function name="UniformMatrix4dv" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="count" type="GLsizei"/>
+        <param name="transpose" type="GLboolean"/>
+        <param name="value" type="const GLdouble *"/>
+    </function>
+
+    <function name="UniformMatrix2x3dv" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="count" type="GLsizei"/>
+        <param name="transpose" type="GLboolean"/>
+        <param name="value" type="const GLdouble *"/>
+    </function>
+
+    <function name="UniformMatrix2x4dv" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="count" type="GLsizei"/>
+        <param name="transpose" type="GLboolean"/>
+        <param name="value" type="const GLdouble *"/>
+    </function>
+
+    <function name="UniformMatrix3x2dv" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="count" type="GLsizei"/>
+        <param name="transpose" type="GLboolean"/>
+        <param name="value" type="const GLdouble *"/>
+    </function>
+
+    <function name="UniformMatrix3x4dv" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="count" type="GLsizei"/>
+        <param name="transpose" type="GLboolean"/>
+        <param name="value" type="const GLdouble *"/>
+    </function>
+
+    <function name="UniformMatrix4x2dv" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="count" type="GLsizei"/>
+        <param name="transpose" type="GLboolean"/>
+        <param name="value" type="const GLdouble *"/>
+    </function>
+
+    <function name="UniformMatrix4x3dv" offset="assign">
+        <param name="location" type="GLint"/>
+        <param name="count" type="GLsizei"/>
+        <param name="transpose" type="GLboolean"/>
+        <param name="value" type="const GLdouble *"/>
+    </function>
+
+    <function name="GetUniformdv" offset="assign">
+        <param name="program" type="GLuint"/>
+        <param name="location" type="GLint"/>
+        <param name="params" type="GLdouble *"/>
+    </function>
+
+enum name=DOUBLE_VEC2

[Mesa-dev] [PATCH 04/11] glsl: add double type

2014-08-14 Thread Dave Airlie

From: Dave Airlie airl...@redhat.com

This just adds a placeholder for the GLSL_TYPE_DOUBLE.

This causes a lot of warnings about unchecked type in
switch statements - fix them later.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/glsl/glsl_types.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/glsl/glsl_types.h b/src/glsl/glsl_types.h
index d545533..e00a3e0 100644
--- a/src/glsl/glsl_types.h
+++ b/src/glsl/glsl_types.h
@@ -51,6 +51,7 @@ enum glsl_base_type {
GLSL_TYPE_UINT = 0,
GLSL_TYPE_INT,
GLSL_TYPE_FLOAT,
+   GLSL_TYPE_DOUBLE,
GLSL_TYPE_BOOL,
GLSL_TYPE_SAMPLER,
GLSL_TYPE_IMAGE,
-- 
1.9.3

[Mesa-dev] [PATCH 06/11] glsl: add ARB_gpu_shader_fp64 to the glsl extensions.

2014-08-14 Thread Dave Airlie

From: Dave Airlie airl...@redhat.com

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/glsl/glsl_parser_extras.cpp | 1 +
 src/glsl/glsl_parser_extras.h   | 2 ++
 src/glsl/standalone_scaffolding.cpp | 1 +
 3 files changed, 4 insertions(+)

diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index ad91c46..53fbb25 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -521,6 +521,7 @@ static const _mesa_glsl_extension 
_mesa_glsl_supported_extensions[] = {
EXT(ARB_fragment_coord_conventions, true,  false, 
ARB_fragment_coord_conventions),
EXT(ARB_fragment_layer_viewport,true,  false, 
ARB_fragment_layer_viewport),
EXT(ARB_gpu_shader5,true,  false, ARB_gpu_shader5),
+   EXT(ARB_gpu_shader_fp64,true,  false, ARB_gpu_shader_fp64),
EXT(ARB_sample_shading, true,  false, ARB_sample_shading),
EXT(ARB_separate_shader_objects,true,  false, dummy_true),
EXT(ARB_shader_atomic_counters, true,  false, 
ARB_shader_atomic_counters),
diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h
index ce66e2f..6f5c0b1 100644
--- a/src/glsl/glsl_parser_extras.h
+++ b/src/glsl/glsl_parser_extras.h
@@ -407,6 +407,8 @@ struct _mesa_glsl_parse_state {
bool ARB_fragment_layer_viewport_warn;
bool ARB_gpu_shader5_enable;
bool ARB_gpu_shader5_warn;
+   bool ARB_gpu_shader_fp64_enable;
+   bool ARB_gpu_shader_fp64_warn;
bool ARB_sample_shading_enable;
bool ARB_sample_shading_warn;
bool ARB_separate_shader_objects_enable;
diff --git a/src/glsl/standalone_scaffolding.cpp 
b/src/glsl/standalone_scaffolding.cpp
index 2b76dd1..63e3cde 100644
--- a/src/glsl/standalone_scaffolding.cpp
+++ b/src/glsl/standalone_scaffolding.cpp
@@ -100,6 +100,7 @@ void initialize_context_to_defaults(struct gl_context *ctx, 
gl_api api)
    ctx->Extensions.ARB_fragment_coord_conventions = true;
    ctx->Extensions.ARB_fragment_layer_viewport = true;
    ctx->Extensions.ARB_gpu_shader5 = true;
+   ctx->Extensions.ARB_gpu_shader_fp64 = true;
    ctx->Extensions.ARB_sample_shading = true;
    ctx->Extensions.ARB_shader_bit_encoding = true;
    ctx->Extensions.ARB_shader_stencil_export = true;
-- 
1.9.3


[Mesa-dev] [PATCH 11/11] glsl: lower double optional passes

2014-08-14 Thread Dave Airlie

These lowering passes are optional; a backend has to request them. Currently
the TGSI softpipe backend, and most likely the r600g backend, would want to use
these passes as is. They aim to map the standard rounding/truncation functions
onto the existing gallium opcodes.
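
For readers following along, the arithmetic behind the lowering can be sketched on the CPU. This is a minimal sketch, not the patch's IR code: `dfrac` stands in for the backend's fract opcode (assumed here to be `x - floor(x)`, as in GLSL), and the `*_via_frac` names are hypothetical. The floor, ceil and roundEven formulas follow the comments in the patch below.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for the native double "fract" opcode (assumption: x - floor(x)).
static double dfrac(double x) { return x - std::floor(x); }

// floor(x) = x - frac(x)
static double floor_via_frac(double x) { return x - dfrac(x); }

// ceil(x) = (x - frac(x)) + (frac(x) != 0 ? 1 : 0)
static double ceil_via_frac(double x)
{
   const double fr = dfrac(x);
   return (x - fr) + (fr != 0.0 ? 1.0 : 0.0);
}

// roundEven(x): round half-way cases to the nearest even integer.
static double round_even_via_frac(double x)
{
   const double temp = x + 0.5;
   const double t2 = temp - dfrac(temp);        // floor(x + 0.5)
   if (dfrac(x) == 0.5)                         // exactly half-way
      return dfrac(t2 * 0.5) == 0.0 ? t2 : t2 - 1.0;  // keep t2 only if even
   return t2;
}
```

For example, `round_even_via_frac(2.5)` first computes `t2 = 3`, sees that `3 * 0.5` has a non-zero fraction (so 3 is odd), and steps down to 2.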

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/glsl/ir_optimization.h  |   1 +
 src/glsl/lower_instructions.cpp | 209 
 2 files changed, 210 insertions(+)

diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
index b83c225..72ac3a9 100644
--- a/src/glsl/ir_optimization.h
+++ b/src/glsl/ir_optimization.h
@@ -40,6 +40,7 @@
 #define LDEXP_TO_ARITH 0x100
 #define CARRY_TO_ARITH 0x200
 #define BORROW_TO_ARITH0x400
+#define DOPS_TO_DFRAC  0x800
 
 /**
  * \see class lower_packing_builtins_visitor
diff --git a/src/glsl/lower_instructions.cpp b/src/glsl/lower_instructions.cpp
index f737556..6da144e 100644
--- a/src/glsl/lower_instructions.cpp
+++ b/src/glsl/lower_instructions.cpp
@@ -41,6 +41,7 @@
  * - BITFIELD_INSERT_TO_BFM_BFI
  * - CARRY_TO_ARITH
  * - BORROW_TO_ARITH
+ * - DOPS_TO_DFRAC
  *
  * SUB_TO_ADD_NEG:
  * ---
@@ -104,6 +105,9 @@
  * 
  * Converts ir_borrow into (x < y).
  *
+ * DOPS_TO_DFRAC:
+ * --
+ * Converts double trunc, ceil, floor, round to fract
  */
 
 #include "main/core.h" /* for M_LOG2E */
@@ -142,6 +146,11 @@ private:
void borrow_to_arith(ir_expression *);
void double_dot_to_fma(ir_expression *);
void double_lrp(ir_expression *);
+   void dceil_to_dfrac(ir_expression *);
+   void dfloor_to_dfrac(ir_expression *);
+   void dround_even_to_dfrac(ir_expression *);
+   void dtrunc_to_dfrac(ir_expression *);
+   void dsign_to_csel(ir_expression *);
 };
 
 } /* anonymous namespace */
@@ -559,6 +568,182 @@ lower_instructions_visitor::double_lrp(ir_expression *ir)
    this->progress = true;
 }
 
+void
+lower_instructions_visitor::dceil_to_dfrac(ir_expression *ir)
+{
+   /*
+    * frtemp = frac(x);
+    * temp = sub(x, frtemp);
+    * result = temp + ((frtemp != 0.0) ? 1.0 : 0.0);
+    */
+   ir_instruction &i = *base_ir;
+   ir_constant *zero = new(ir) ir_constant(0.0, ir->operands[0]->type->vector_elements);
+   ir_constant *one = new(ir) ir_constant(1.0, ir->operands[0]->type->vector_elements);
+   ir_variable *frtemp = new(ir) ir_variable(ir->operands[0]->type, "frtemp",
+                                             ir_var_temporary);
+   ir_variable *temp = new(ir) ir_variable(ir->operands[0]->type, "temp",
+                                           ir_var_temporary);
+   ir_variable *t2 = new(ir) ir_variable(ir->operands[0]->type, "t2",
+                                         ir_var_temporary);
+
+   i.insert_before(frtemp);
+   i.insert_before(assign(frtemp, fract(ir->operands[0])));
+
+   i.insert_before(temp);
+   i.insert_before(assign(temp, sub(ir->operands[0]->clone(ir, NULL), frtemp)));
+
+   i.insert_before(t2);
+   i.insert_before(assign(t2, csel(nequal(frtemp, zero), one, zero->clone(ir, NULL))));
+   ir->operation = ir_binop_add;
+   ir->operands[0] = new(ir) ir_dereference_variable(temp);
+   ir->operands[1] = new(ir) ir_dereference_variable(t2);
+}
+
+void
+lower_instructions_visitor::dfloor_to_dfrac(ir_expression *ir)
+{
+   /*
+    * frtemp = frac(x);
+    * result = sub(x, frtemp);
+    */
+   ir_instruction &i = *base_ir;
+   ir_variable *frtemp = new(ir) ir_variable(ir->operands[0]->type, "frtemp",
+                                             ir_var_temporary);
+
+   i.insert_before(frtemp);
+   i.insert_before(assign(frtemp, fract(ir->operands[0]->clone(ir, NULL))));
+
+   ir->operation = ir_binop_sub;
+   ir->operands[1] = new(ir) ir_dereference_variable(frtemp);
+}
+void
+lower_instructions_visitor::dround_even_to_dfrac(ir_expression *ir)
+{
+   /*
+    * insane but works
+    * temp = x + 0.5;
+    * frtemp = frac(temp);
+    * t2 = sub(temp, frtemp);
+    * if (frac(x) == 0.5)
+    *     result = frac(t2 * 0.5) == 0 ? t2 : t2 - 1;
+    *  else
+    *     result = t2;
+
+    */
+   const unsigned vec_elem = ir->type->vector_elements;
+   const glsl_type *bvec = glsl_type::get_instance(GLSL_TYPE_BOOL, vec_elem, 1);
+   ir_instruction &i = *base_ir;
+   ir_variable *frtemp = new(ir) ir_variable(ir->operands[0]->type, "frtemp",
+                                             ir_var_temporary);
+   ir_variable *temp = new(ir) ir_variable(ir->operands[0]->type, "temp",
+                                           ir_var_temporary);
+   ir_variable *t2 = new(ir) ir_variable(ir->operands[0]->type, "t2",
+                                         ir_var_temporary);
+   ir_variable *t3 = new(ir) ir_variable(bvec, "t3",
+                                         ir_var_temporary);
+   ir_variable *t4 = new(ir) ir_variable(bvec, "t4",
+                                         ir_var_temporary);
+   ir_variable *t5 = new(ir) ir_variable(ir->operands[0]->type, "t5",
+                                         ir_var_temporary);
+   ir_constant *p5 =

[Mesa-dev] [PATCH 10/11] glsl: implement double builtin functions

2014-08-14 Thread Dave Airlie

This implements the bulk of the builtin functions for fp64 support.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/glsl/builtin_functions.cpp | 751 +++--
 1 file changed, 492 insertions(+), 259 deletions(-)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index 185fe98..b190fcd 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -373,6 +373,12 @@ gs_streams(const _mesa_glsl_parse_state *state)
    return gpu_shader5(state) && gs_only(state);
 }
 
+static bool
+fp64(const _mesa_glsl_parse_state *state)
+{
+   return state->is_version(400, 0) || state->ARB_gpu_shader_fp64_enable;
+}
+
 /** @} */
 
 
/**/
@@ -428,6 +434,7 @@ private:
ir_constant *imm(float f, unsigned vector_elements=1);
ir_constant *imm(int i, unsigned vector_elements=1);
ir_constant *imm(unsigned u, unsigned vector_elements=1);
+   ir_constant *imm(double d, unsigned vector_elements=1);
    ir_constant *imm(const glsl_type *type, const ir_constant_data &);
ir_dereference_variable *var_ref(ir_variable *var);
ir_dereference_array *array_ref(ir_variable *var, int i);
@@ -517,29 +524,29 @@ private:
B1(log)
B1(exp2)
B1(log2)
-   B1(sqrt)
-   B1(inversesqrt)
-   B1(abs)
-   B1(sign)
-   B1(floor)
-   B1(trunc)
-   B1(round)
-   B1(roundEven)
-   B1(ceil)
-   B1(fract)
+   BA1(sqrt)
+   BA1(inversesqrt)
+   BA1(abs)
+   BA1(sign)
+   BA1(floor)
+   BA1(trunc)
+   BA1(round)
+   BA1(roundEven)
+   BA1(ceil)
+   BA1(fract)
B2(mod)
-   B1(modf)
+   BA1(modf)
BA2(min)
BA2(max)
BA2(clamp)
-   B2(mix_lrp)
+   BA2(mix_lrp)
ir_function_signature *_mix_sel(builtin_available_predicate avail,
const glsl_type *val_type,
const glsl_type *blend_type);
-   B2(step)
-   B2(smoothstep)
-   B1(isnan)
-   B1(isinf)
+   BA2(step)
+   BA2(smoothstep)
+   BA1(isnan)
+   BA1(isinf)
B1(floatBitsToInt)
B1(floatBitsToUint)
B1(intBitsToFloat)
@@ -554,24 +561,27 @@ private:
ir_function_signature *_unpackSnorm4x8(builtin_available_predicate avail);
ir_function_signature *_packHalf2x16(builtin_available_predicate avail);
ir_function_signature *_unpackHalf2x16(builtin_available_predicate avail);
-   B1(length)
-   B1(distance);
-   B1(dot);
-   B1(cross);
-   B1(normalize);
+   ir_function_signature *_packDouble2x32(builtin_available_predicate avail);
+   ir_function_signature *_unpackDouble2x32(builtin_available_predicate avail);
+
+   BA1(length)
+   BA1(distance);
+   BA1(dot);
+   BA1(cross);
+   BA1(normalize);
B0(ftransform);
-   B1(faceforward);
-   B1(reflect);
-   B1(refract);
-   B1(matrixCompMult);
-   B1(outerProduct);
-   B0(determinant_mat2);
-   B0(determinant_mat3);
-   B0(determinant_mat4);
-   B0(inverse_mat2);
-   B0(inverse_mat3);
-   B0(inverse_mat4);
-   B1(transpose);
+   BA1(faceforward);
+   BA1(reflect);
+   BA1(refract);
+   BA1(matrixCompMult);
+   BA1(outerProduct);
+   BA1(determinant_mat2);
+   BA1(determinant_mat3);
+   BA1(determinant_mat4);
+   BA1(inverse_mat2);
+   BA1(inverse_mat3);
+   BA1(inverse_mat4);
+   BA1(transpose);
BA1(lessThan);
BA1(lessThanEqual);
BA1(greaterThan);
@@ -629,9 +639,10 @@ private:
B1(bitCount)
B1(findLSB)
B1(findMSB)
-   B1(fma)
+   BA1(fma)
B2(ldexp)
B2(frexp)
+   B2(dfrexp)
B1(uaddCarry)
B1(usubBorrow)
B1(mulExtended)
@@ -800,6 +811,42 @@ builtin_builder::create_builtins()
 _##NAME(glsl_type::vec4_type),  \
 NULL);
 
+#define FD(NAME) \
+   add_function(#NAME,  \
+_##NAME(always_available, glsl_type::float_type), \
+_##NAME(always_available, glsl_type::vec2_type),  \
+_##NAME(always_available, glsl_type::vec3_type),  \
+_##NAME(always_available, glsl_type::vec4_type),  \
+_##NAME(fp64, glsl_type::double_type),  \
+_##NAME(fp64, glsl_type::dvec2_type),\
+_##NAME(fp64, glsl_type::dvec3_type), \
+_##NAME(fp64, glsl_type::dvec4_type),  \
+NULL);
+
+#define FD130(NAME) \
+   add_function(#NAME,  \
+_##NAME(v130, glsl_type::float_type), \
+_##NAME(v130, glsl_type::vec2_type),  \
+_##NAME(v130, glsl_type::vec3_type),  \
+_##NAME(v130, glsl_type::vec4_type),  \
+_##NAME(fp64, glsl_type::double_type),  \
+_##NAME(fp64, glsl_type::dvec2_type),\
+_##NAME(fp64, glsl_type::dvec3_type), \
+_##NAME(fp64, glsl_type::dvec4_type),  \
+NULL);
+
+#define FDGS5(NAME) \
+

[Mesa-dev] [PATCH 02/11] mesa: add ARB_gpu_shader_fp64 extension info

2014-08-14 Thread Dave Airlie

From: Dave Airlie airl...@redhat.com

This just adds the entries to extensions.c and mtypes.h

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/mesa/main/extensions.c | 1 +
 src/mesa/main/mtypes.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index 4f322d0..1445a9d 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -117,6 +117,7 @@ static const struct extension extension_table[] = {
{ GL_ARB_framebuffer_sRGB,o(EXT_framebuffer_sRGB),
GL, 1998 },
{ GL_ARB_get_program_binary,  o(dummy_true),  
GL, 2010 },
{ GL_ARB_gpu_shader5, o(ARB_gpu_shader5), 
GL, 2010 },
+   { GL_ARB_gpu_shader_fp64, o(ARB_gpu_shader_fp64), 
GL, 2010 },
{ GL_ARB_half_float_pixel,o(dummy_true),  
GL, 2003 },
{ GL_ARB_half_float_vertex,   o(ARB_half_float_vertex),   
GL, 2008 },
{ GL_ARB_instanced_arrays,o(ARB_instanced_arrays),
GL, 2008 },
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 742ce3e..121f2ea 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3572,6 +3572,7 @@ struct gl_extensions
GLboolean ARB_explicit_uniform_location;
GLboolean ARB_geometry_shader4;
GLboolean ARB_gpu_shader5;
+   GLboolean ARB_gpu_shader_fp64;
GLboolean ARB_half_float_vertex;
GLboolean ARB_instanced_arrays;
GLboolean ARB_internalformat_query;
-- 
1.9.3


Re: [Mesa-dev] [PATCH 3/5] gallium: add opcodes/cap for fine derivative support

2014-08-14 Thread Marek Olšák

On Thu, Aug 14, 2014 at 6:52 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
 ---
  src/gallium/auxiliary/tgsi/tgsi_info.c   |  3 +++
  src/gallium/auxiliary/tgsi/tgsi_util.c   |  2 ++
  src/gallium/docs/source/screen.rst   |  2 ++
  src/gallium/docs/source/tgsi.rst | 12 ++--
  src/gallium/drivers/freedreno/freedreno_screen.c |  1 +
  src/gallium/drivers/i915/i915_screen.c   |  1 +
  src/gallium/drivers/ilo/ilo_screen.c |  1 +
  src/gallium/drivers/llvmpipe/lp_screen.c |  1 +
  src/gallium/drivers/nouveau/nv30/nv30_screen.c   |  1 +
  src/gallium/drivers/nouveau/nv50/nv50_screen.c   |  1 +
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   |  1 +
  src/gallium/drivers/r300/r300_screen.c   |  1 +
  src/gallium/drivers/r600/r600_pipe.c |  1 +
  src/gallium/drivers/radeonsi/si_pipe.c   |  1 +
  src/gallium/drivers/softpipe/sp_screen.c |  1 +
  src/gallium/drivers/svga/svga_screen.c   |  1 +
  src/gallium/drivers/vc4/vc4_screen.c |  1 +
  src/gallium/include/pipe/p_defines.h |  1 +
  src/gallium/include/pipe/p_shader_tokens.h   |  5 -
  19 files changed, 35 insertions(+), 3 deletions(-)

 diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c 
 b/src/gallium/auxiliary/tgsi/tgsi_info.c
 index e24348f..35f9747 100644
 --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
 +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
 @@ -235,6 +235,9 @@ static const struct tgsi_opcode_info 
 opcode_info[TGSI_OPCODE_LAST] =
 { 1, 1, 0, 0, 0, 0, OTHR, INTERP_CENTROID, TGSI_OPCODE_INTERP_CENTROID 
 },
 { 1, 2, 0, 0, 0, 0, OTHR, INTERP_SAMPLE, TGSI_OPCODE_INTERP_SAMPLE },
 { 1, 2, 0, 0, 0, 0, OTHR, INTERP_OFFSET, TGSI_OPCODE_INTERP_OFFSET },
 +
 +   { 1, 1, 0, 0, 0, 0, COMP, DDX_FINE, TGSI_OPCODE_DDX_FINE },
 +   { 1, 1, 0, 0, 0, 0, COMP, DDY_FINE, TGSI_OPCODE_DDY_FINE },

It would be nice to fill in some of the unused slots, e.g. 79 and 80.
Other than that:

Reviewed-by: Marek Olšák marek.ol...@amd.com

Marek

  };

  const struct tgsi_opcode_info *
 diff --git a/src/gallium/auxiliary/tgsi/tgsi_util.c 
 b/src/gallium/auxiliary/tgsi/tgsi_util.c
 index e48159c..e1cba95 100644
 --- a/src/gallium/auxiliary/tgsi/tgsi_util.c
 +++ b/src/gallium/auxiliary/tgsi/tgsi_util.c
 @@ -245,6 +245,8 @@ tgsi_util_get_inst_usage_mask(const struct 
 tgsi_full_instruction *inst,
 case TGSI_OPCODE_USNE:
 case TGSI_OPCODE_IMUL_HI:
 case TGSI_OPCODE_UMUL_HI:
 +   case TGSI_OPCODE_DDX_FINE:
 +   case TGSI_OPCODE_DDY_FINE:
/* Channel-wise operations */
read_mask = write_mask;
break;
 diff --git a/src/gallium/docs/source/screen.rst 
 b/src/gallium/docs/source/screen.rst
 index 814e3ae..6fecc15 100644
 --- a/src/gallium/docs/source/screen.rst
 +++ b/src/gallium/docs/source/screen.rst
 @@ -213,6 +213,8 @@ The integer capabilities:
  * ``PIPE_CAP_DRAW_INDIRECT``: Whether the driver supports taking draw 
 arguments
{ count, instance_count, start, index_bias } from a PIPE_BUFFER resource.
See pipe_draw_info.
 +* ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE``: Whether the fragment shader supports
 +  the FINE versions of DDX/DDY.


  .. _pipe_capf:
 diff --git a/src/gallium/docs/source/tgsi.rst 
 b/src/gallium/docs/source/tgsi.rst
 index ac0ea54..7d5918f 100644
 --- a/src/gallium/docs/source/tgsi.rst
 +++ b/src/gallium/docs/source/tgsi.rst
 @@ -433,7 +433,11 @@ This instruction replicates its result.
dst = \cos{src.x}


 -.. opcode:: DDX - Derivative Relative To X
 +.. opcode:: DDX, DDX_FINE - Derivative Relative To X
 +
 +The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
 +advertised. When it is, the fine version guarantees one derivative per row
 +while DDX is allowed to be the same for the entire 2x2 quad.

  .. math::

 @@ -446,7 +450,11 @@ This instruction replicates its result.
dst.w = partialx(src.w)


 -.. opcode:: DDY - Derivative Relative To Y
 +.. opcode:: DDY, DDY_FINE - Derivative Relative To Y
 +
 +The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
 +advertised. When it is, the fine version guarantees one derivative per column
 +while DDY is allowed to be the same for the entire 2x2 quad.

  .. math::

 diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
 b/src/gallium/drivers/freedreno/freedreno_screen.c
 index de69b14..b156d8b 100644
 --- a/src/gallium/drivers/freedreno/freedreno_screen.c
 +++ b/src/gallium/drivers/freedreno/freedreno_screen.c
 @@ -216,6 +216,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum 
 pipe_cap param)
 case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
 case PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION:
 case PIPE_CAP_DRAW_INDIRECT:
 +   case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
 return 0;

 /* Stream output. */
 diff --git a/src/gallium/drivers/i915/i915_screen.c

Re: [Mesa-dev] [PATCH 4/5] mesa/st: add support for emitting fine derivative opcodes

2014-08-14 Thread Marek Olšák

Reviewed-by: Marek Olšák marek.ol...@amd.com

Marek

On Thu, Aug 14, 2014 at 6:52 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
 ---
  src/mesa/state_tracker/st_extensions.c | 3 ++-
  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 9 -
  2 files changed, 10 insertions(+), 2 deletions(-)

 diff --git a/src/mesa/state_tracker/st_extensions.c 
 b/src/mesa/state_tracker/st_extensions.c
 index eace321..24e886c 100644
 --- a/src/mesa/state_tracker/st_extensions.c
 +++ b/src/mesa/state_tracker/st_extensions.c
 @@ -458,7 +458,8 @@ void st_init_extensions(struct pipe_screen *screen,
{ o(ARB_texture_multisample),  PIPE_CAP_TEXTURE_MULTISAMPLE
   },
{ o(ARB_texture_query_lod),PIPE_CAP_TEXTURE_QUERY_LOD  
   },
{ o(ARB_sample_shading),   PIPE_CAP_SAMPLE_SHADING 
   },
 -  { o(ARB_draw_indirect),PIPE_CAP_DRAW_INDIRECT  
   }
 +  { o(ARB_draw_indirect),PIPE_CAP_DRAW_INDIRECT  
   },
 +  { o(ARB_derivative_control),   
 PIPE_CAP_TGSI_FS_FINE_DERIVATIVE  },
 };

 /* Required: render target and sampler support */
 diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
 b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
 index 4898166..84bdc4f 100644
 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
 +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
 @@ -1462,9 +1462,15 @@ glsl_to_tgsi_visitor::visit(ir_expression *ir)
break;

 case ir_unop_dFdx:
 +   case ir_unop_dFdx_coarse:
emit(ir, TGSI_OPCODE_DDX, result_dst, op[0]);
break;
 +   case ir_unop_dFdx_fine:
 +  emit(ir, TGSI_OPCODE_DDX_FINE, result_dst, op[0]);
 +  break;
 case ir_unop_dFdy:
 +   case ir_unop_dFdy_coarse:
 +   case ir_unop_dFdy_fine:
 {
/* The X component contains 1 or -1 depending on whether the 
 framebuffer
 * is a FBO or the window system buffer, respectively.
 @@ -1485,7 +1491,8 @@ glsl_to_tgsi_visitor::visit(ir_expression *ir)
st_src_reg temp = get_temp(glsl_type::vec4_type);

emit(ir, TGSI_OPCODE_MUL, st_dst_reg(temp), transform_y, op[0]);
 -  emit(ir, TGSI_OPCODE_DDY, result_dst, temp);
 +  emit(ir, ir-operation == ir_unop_dFdy_fine ?
 +   TGSI_OPCODE_DDY_FINE : TGSI_OPCODE_DDY, result_dst, temp);
break;
 }

 --
 1.8.5.5


Re: [Mesa-dev] [PATCH 5/5] nv50, nvc0: add support for fine derivatives

2014-08-14 Thread Marek Olšák

Are you gonna update the release notes too?

Marek

On Thu, Aug 14, 2014 at 6:52 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
 The quadop-based method we currently use on all chipsets already
 provides the fine version of the derivatives.
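
To make the fine/coarse distinction concrete, here is a minimal sketch of derivatives over a single 2x2 pixel quad. The `Quad` type and function names are hypothetical, and the choice of which row or column the coarse variant samples is an assumption: the TGSI documentation in patch 3/5 only says that plain DDX/DDY may be the same for the entire quad, while the fine variants guarantee one derivative per row (DDX) or per column (DDY).

```cpp
#include <cassert>

// One 2x2 pixel quad, indexed as v[y][x]. Hypothetical type for illustration.
struct Quad { float v[2][2]; };

// Fine d/dx: a separate horizontal difference for each row of the quad.
static float ddx_fine(const Quad &q, int y) { return q.v[y][1] - q.v[y][0]; }

// Fine d/dy: a separate vertical difference for each column of the quad.
static float ddy_fine(const Quad &q, int x) { return q.v[1][x] - q.v[0][x]; }

// Coarse variants: one value shared by all four pixels. Assumption: taken
// from the top row / left column; an implementation may choose differently.
static float ddx_coarse(const Quad &q) { return q.v[0][1] - q.v[0][0]; }
static float ddy_coarse(const Quad &q) { return q.v[1][0] - q.v[0][0]; }
```

With the quad `{{1, 3}, {10, 20}}`, the fine d/dx is 2 on the top row and 10 on the bottom row, whereas this coarse variant reports 2 for every pixel.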

 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
 ---
  docs/GL3.txt  | 2 +-
  src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 4 
  src/gallium/drivers/nouveau/nv50/nv50_screen.c| 2 +-
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c| 2 +-
  4 files changed, 7 insertions(+), 3 deletions(-)

 diff --git a/docs/GL3.txt b/docs/GL3.txt
 index 89529fe..0a40e23 100644
 --- a/docs/GL3.txt
 +++ b/docs/GL3.txt
 @@ -189,7 +189,7 @@ GL 4.5, GLSL 4.50:
GL_ARB_clip_control  not started
GL_ARB_conditional_render_inverted   not started
GL_ARB_cull_distance not started
 -  GL_ARB_derivative_controlnot started
 +  GL_ARB_derivative_controlDONE (nv50, nvc0)
GL_ARB_direct_state_access   not started
GL_ARB_get_texture_sub_image started (Brian Paul)
GL_ARB_shader_texture_image_samples  not started
 diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
 b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
 index 14b6d68..456efcb 100644
 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
 +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
 @@ -531,7 +531,9 @@ static nv50_ir::operation translateOpcode(uint opcode)

 NV50_IR_OPCODE_CASE(COS, COS);
 NV50_IR_OPCODE_CASE(DDX, DFDX);
 +   NV50_IR_OPCODE_CASE(DDX_FINE, DFDX);
 NV50_IR_OPCODE_CASE(DDY, DFDY);
 +   NV50_IR_OPCODE_CASE(DDY_FINE, DFDY);
 NV50_IR_OPCODE_CASE(KILL, DISCARD);

 NV50_IR_OPCODE_CASE(SEQ, SET);
 @@ -2327,6 +2329,8 @@ Converter::handleInstruction(const struct 
 tgsi_full_instruction *insn)
 case TGSI_OPCODE_NOT:
 case TGSI_OPCODE_DDX:
 case TGSI_OPCODE_DDY:
 +   case TGSI_OPCODE_DDX_FINE:
 +   case TGSI_OPCODE_DDY_FINE:
FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi)
   mkOp1(op, dstTy, dst0[c], fetchSrc(0, c));
break;
 diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
 b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
 index 34cca3d..8a9a40e 100644
 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
 +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
 @@ -169,6 +169,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum 
 pipe_cap param)
 case PIPE_CAP_USER_VERTEX_BUFFERS:
 case PIPE_CAP_TEXTURE_MULTISAMPLE:
 case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
 +   case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
return 1;
 case PIPE_CAP_SEAMLESS_CUBE_MAP:
return 1; /* class_3d = NVA0_3D_CLASS; */
 @@ -200,7 +201,6 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum 
 pipe_cap param)
 case PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION:
 case PIPE_CAP_COMPUTE:
 case PIPE_CAP_DRAW_INDIRECT:
 -   case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
return 0;
 }

 diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
 b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
 index 17aee63..c6d9b91 100644
 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
 +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
 @@ -167,6 +167,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
 pipe_cap param)
 case PIPE_CAP_SAMPLE_SHADING:
 case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
 case PIPE_CAP_TEXTURE_GATHER_SM5:
 +   case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
return 1;
 case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
return (class_3d = NVE4_3D_CLASS) ? 1 : 0;
 @@ -184,7 +185,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
 pipe_cap param)
 case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT:
 case PIPE_CAP_FAKE_SW_MSAA:
 case PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION:
 -   case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
return 0;
 }

 --
 1.8.5.5


[Mesa-dev] [PATCH 01/37] i965/gs: Use single dispatch mode as fallback to dual object mode when possible.

2014-08-14 Thread Iago Toral Quiroga

Currently, when a geometry shader can't use dual object mode, we fall back to
dual instance mode. However, when invocations == 1, single dispatch mode is
more performant and equally efficient in terms of register pressure.

Single dispatch mode requires that the driver can handle interleaving of
registers, but this is already supported (dual instance mode has the same
requirement).
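
The selection logic described above can be sketched as follows. This is a minimal sketch under stated assumptions: the enum is a hypothetical stand-in for the GEN7_GS_DISPATCH_MODE_* values (not the hardware encodings), `dual_object_ok` is a hypothetical flag meaning DUAL_OBJECT compilation succeeded and was not disabled for debugging, and the comparison is assumed to be `invocations <= 1`.

```cpp
#include <cassert>

// Hypothetical stand-ins for GEN7_GS_DISPATCH_MODE_*; not hardware encodings.
enum class GsDispatchMode { DualObject, DualInstance, Single };

// Prefer DUAL_OBJECT when possible; otherwise fall back to SINGLE for a
// single invocation and to DUAL_INSTANCE when there are several.
static GsDispatchMode choose_dispatch_mode(int invocations, bool dual_object_ok)
{
   if (invocations <= 1 && dual_object_ok)
      return GsDispatchMode::DualObject;
   return invocations <= 1 ? GsDispatchMode::Single
                           : GsDispatchMode::DualInstance;
}

// Both fallback modes interleave attributes, so one register holds two
// attribute slots; DUAL_OBJECT keeps one slot per register.
static int attributes_per_reg(GsDispatchMode mode)
{
   return mode == GsDispatchMode::DualObject ? 1 : 2;
}
```

Storing the mode itself, rather than a boolean, is what lets the state upload code pass the value through without a conditional.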
---
 src/mesa/drivers/dri/i965/brw_context.h   |  8 ---
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 26 +++
 src/mesa/drivers/dri/i965/gen7_gs_state.c |  4 +---
 src/mesa/drivers/dri/i965/gen8_gs_state.c |  4 +---
 4 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 1bbcf46..7439da1 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -587,10 +587,12 @@ struct brw_gs_prog_data
int invocations;
 
/**
-* True if the thread should be dispatched in DUAL_INSTANCE mode, false if
-* it should be dispatched in DUAL_OBJECT mode.
+* Dispatch mode, can be any of:
+* GEN7_GS_DISPATCH_MODE_DUAL_OBJECT
+* GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE
+* GEN7_GS_DISPATCH_MODE_SINGLE
 */
-   bool dual_instanced_dispatch;
+   int dispatch_mode;
 };
 
 /** Number of texture sampler units */
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index b7995ad..c2a4892 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -101,10 +101,11 @@ vec4_gs_visitor::setup_payload()
 {
int attribute_map[BRW_VARYING_SLOT_COUNT * MAX_GS_INPUT_VERTICES];
 
-   /* If we are in dual instanced mode, then attributes are going to be
-* interleaved, so one register contains two attribute slots.
+   /* If we are in dual instanced or single mode, then attributes are going
+* to be interleaved, so one register contains two attribute slots.
 */
-   int attributes_per_reg = c->prog_data.dual_instanced_dispatch ? 2 : 1;
+   int attributes_per_reg =
+      c->prog_data.dispatch_mode == GEN7_GS_DISPATCH_MODE_DUAL_OBJECT ? 1 : 2;
 
/* If a geometry shader tries to read from an input that wasn't written by
 * the vertex shader, that produces undefined results, but it shouldn't
@@ -129,8 +130,7 @@ vec4_gs_visitor::setup_payload()
 
reg = setup_varying_inputs(reg, attribute_map, attributes_per_reg);
 
-   lower_attributes_to_hw_regs(attribute_map,
-                               c->prog_data.dual_instanced_dispatch);
+   lower_attributes_to_hw_regs(attribute_map, attributes_per_reg > 1);
 
this-first_non_payload_grf = reg;
 }
@@ -640,7 +640,7 @@ brw_gs_emit(struct brw_context *brw,
 */
    if (c->prog_data.invocations <= 1 &&
        likely(!(INTEL_DEBUG & DEBUG_NO_DUAL_OBJECT_GS))) {
-      c->prog_data.dual_instanced_dispatch = false;
+      c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_DUAL_OBJECT;
 
   vec4_gs_visitor v(brw, c, prog, mem_ctx, true /* no_spills */);
   if (v.run()) {
@@ -652,15 +652,15 @@ brw_gs_emit(struct brw_context *brw,
 
/* Either we failed to compile in DUAL_OBJECT mode (probably because it
 * would have required spilling) or DUAL_OBJECT mode is disabled.  So fall
-* back to DUAL_INSTANCED mode, which consumes fewer registers.
+* back to DUAL_INSTANCED or SINGLE mode, which consumes fewer registers.
 *
-* FIXME: In an ideal world we'd fall back to SINGLE mode, which would
-* allow us to interleave general purpose registers (resulting in even less
-* likelihood of spilling).  But at the moment, the vec4 generator and
-* visitor classes don't have the infrastructure to interleave general
-* purpose registers, so DUAL_INSTANCED is the best we can do.
+    * SINGLE mode is more performant when invocations == 1 and DUAL_INSTANCE
+    * mode is more performant when invocations > 1.
     */
-   c->prog_data.dual_instanced_dispatch = true;
+   if (c->prog_data.invocations <= 1)
+      c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_SINGLE;
+   else
+      c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE;
 
vec4_gs_visitor v(brw, c, prog, mem_ctx, false /* no_spills */);
if (!v.run()) {
diff --git a/src/mesa/drivers/dri/i965/gen7_gs_state.c 
b/src/mesa/drivers/dri/i965/gen7_gs_state.c
index 93f48f6..b3b4ee6 100644
--- a/src/mesa/drivers/dri/i965/gen7_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_gs_state.c
@@ -145,9 +145,7 @@ upload_gs_state(struct brw_context *brw)
   GEN7_GS_CONTROL_DATA_HEADER_SIZE_SHIFT) |
  ((brw-gs.prog_data-invocations - 1) 
   GEN7_GS_INSTANCE_CONTROL_SHIFT) |
- (brw-gs.prog_data-dual_instanced_dispatch ?
-  GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE :
-  GEN7_GS_DISPATCH_MODE_DUAL_OBJECT) |
+ brw-gs.prog_data-dispatch_mode |

[Mesa-dev] [PATCH 02/37] i965/gen6/gs: refactor gen6_gs_state

2014-08-14 Thread Iago Toral Quiroga

From: Samuel Iglesias Gonsalvez sigles...@igalia.com

Currently, gen6 only uses geometry shaders for transform feedback, so the state
we emit is not suitable to accommodate general-purpose, user-provided geometry
shaders. This patch paves the way to add this support and the needed
3DSTATE_GS packet modifications for it.

The previous code that emitted state to implement transform feedback in gen6
moves to upload_gs_state_for_tf().

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/gen6_gs_state.c | 105 ++
 1 file changed, 94 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_gs_state.c 
b/src/mesa/drivers/dri/i965/gen6_gs_state.c
index 9648fb7..e132959 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_gs_state.c
@@ -31,7 +31,7 @@
 #include "intel_batchbuffer.h"
 
 static void
-upload_gs_state(struct brw_context *brw)
+upload_gs_state_for_tf(struct brw_context *brw)
 {
/* Disable all the constant buffers. */
BEGIN_BATCH(5);
@@ -49,11 +49,11 @@ upload_gs_state(struct brw_context *brw)
   OUT_BATCH(GEN6_GS_SPF_MODE | GEN6_GS_VECTOR_MASK_ENABLE);
   OUT_BATCH(0); /* no scratch space */
   OUT_BATCH((2  GEN6_GS_DISPATCH_START_GRF_SHIFT) |
-   (brw-ff_gs.prog_data-urb_read_length  
GEN6_GS_URB_READ_LENGTH_SHIFT));
+(brw-ff_gs.prog_data-urb_read_length  
GEN6_GS_URB_READ_LENGTH_SHIFT));
   OUT_BATCH(((brw-max_gs_threads - 1)  GEN6_GS_MAX_THREADS_SHIFT) |
-   GEN6_GS_STATISTICS_ENABLE |
-   GEN6_GS_SO_STATISTICS_ENABLE |
-   GEN6_GS_RENDERING_ENABLE);
+GEN6_GS_STATISTICS_ENABLE |
+GEN6_GS_SO_STATISTICS_ENABLE |
+GEN6_GS_RENDERING_ENABLE);
   OUT_BATCH(GEN6_GS_SVBI_PAYLOAD_ENABLE |
 GEN6_GS_SVBI_POSTINCREMENT_ENABLE |
 (brw-ff_gs.prog_data-svbi_postincrement_value 
@@ -65,24 +65,107 @@ upload_gs_state(struct brw_context *brw)
   OUT_BATCH(_3DSTATE_GS  16 | (7 - 2));
   OUT_BATCH(0); /* prog_bo */
   OUT_BATCH((0  GEN6_GS_SAMPLER_COUNT_SHIFT) |
-   (0  GEN6_GS_BINDING_TABLE_ENTRY_COUNT_SHIFT));
+(0  GEN6_GS_BINDING_TABLE_ENTRY_COUNT_SHIFT));
   OUT_BATCH(0); /* scratch space base offset */
   OUT_BATCH((1  GEN6_GS_DISPATCH_START_GRF_SHIFT) |
-   (0  GEN6_GS_URB_READ_LENGTH_SHIFT) |
-   (0  GEN6_GS_URB_ENTRY_READ_OFFSET_SHIFT));
+(0  GEN6_GS_URB_READ_LENGTH_SHIFT) |
+(0  GEN6_GS_URB_ENTRY_READ_OFFSET_SHIFT));
   OUT_BATCH((0  GEN6_GS_MAX_THREADS_SHIFT) |
-   GEN6_GS_STATISTICS_ENABLE |
-   GEN6_GS_RENDERING_ENABLE);
+GEN6_GS_STATISTICS_ENABLE |
+GEN6_GS_RENDERING_ENABLE);
+  OUT_BATCH(0);
+  ADVANCE_BATCH();
+   }
+}
+
+static void
+upload_gs_state(struct brw_context *brw)
+{
+   /* BRW_NEW_GEOMETRY_PROGRAM */
+   bool active = brw->geometry_program;
+   /* CACHE_NEW_GS_PROG */
+   const struct brw_vec4_prog_data *prog_data = &brw->gs.prog_data->base;
+   const struct brw_stage_state *stage_state = &brw->gs.base;
+
+   if (active) {
+  /* FIXME: enable constant buffers */
+  BEGIN_BATCH(5);
+  OUT_BATCH(_3DSTATE_CONSTANT_GS  16 | (5 - 2));
+  OUT_BATCH(0);
+  OUT_BATCH(0);
   OUT_BATCH(0);
+  OUT_BATCH(0);
+  ADVANCE_BATCH();
+
+  BEGIN_BATCH(7);
+  OUT_BATCH(_3DSTATE_GS  16 | (7 - 2));
+  OUT_BATCH(stage_state-prog_offset);
+
+  /* GEN6_GS_SPF_MODE and GEN6_GS_VECTOR_MASK_ENABLE are enabled as it
+   * was previously done for gen6.
+   *
+   * TODO: test with both disabled to see if the HW is behaving
+   * as expected, like in gen7.
+   */
+  OUT_BATCH(GEN6_GS_SPF_MODE | GEN6_GS_VECTOR_MASK_ENABLE |
+((ALIGN(stage_state-sampler_count, 4)/4) 
+ GEN6_GS_SAMPLER_COUNT_SHIFT) |
+((prog_data-base.binding_table.size_bytes / 4) 
+ GEN6_GS_BINDING_TABLE_ENTRY_COUNT_SHIFT));
+
+  if (prog_data-total_scratch) {
+ OUT_RELOC(stage_state-scratch_bo,
+   I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER,
+   ffs(prog_data-total_scratch) - 11);
+  } else {
+ OUT_BATCH(0); /* no scratch space */
+  }
+
+  OUT_BATCH((prog_data-urb_read_length 
+ GEN6_GS_URB_READ_LENGTH_SHIFT) |
+(0  GEN6_GS_URB_ENTRY_READ_OFFSET_SHIFT) |
+(prog_data-base.dispatch_grf_start_reg 
+ GEN6_GS_DISPATCH_START_GRF_SHIFT));
+
+  OUT_BATCH(((brw-max_gs_threads - 1)  GEN6_GS_MAX_THREADS_SHIFT) |
+GEN6_GS_STATISTICS_ENABLE |
+GEN6_GS_SO_STATISTICS_ENABLE |
+GEN6_GS_RENDERING_ENABLE);
+
+  /* FIXME: Enable SVBI payload only when TF is enable in SNB for
+   * user-provided GS.
+

[Mesa-dev] [PATCH 27/37] i965/gen6/gs: Add an additional parameter to the FF_SYNC opcode.

2014-08-14 Thread Iago Toral Quiroga

From: Samuel Iglesias Gonsalvez sigles...@igalia.com

We will use this parameter in later patches to provide information relevant
to transform feedback that needs to be set as part of the FF_SYNC message.

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
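As a rough illustration of what the extra parameter changes, the sketch below
models the FF_SYNC message header as an array of eight dwords. The
`ff_sync_header` type and the `setup_ff_sync_header` helper are hypothetical
and exist only for this note; the real generator emits `brw_MOV` instructions
rather than plain C assignments.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a message header register: eight 32-bit dwords. */
struct ff_sync_header {
   uint32_t dw[8];
};

/* Before this patch dword 0 was always cleared to 0. With the new
 * parameter it is taken from src2, while dword 1 still carries the
 * number of primitives written (src1).
 */
static void
setup_ff_sync_header(struct ff_sync_header *hdr,
                     const struct ff_sync_header *r0,
                     uint32_t prim_count,
                     uint32_t m0_0)
{
   memcpy(hdr, r0, sizeof(*hdr)); /* the header starts as a copy of R0 */
   hdr->dw[0] = m0_0;             /* src2: value to hold in M0.0 */
   hdr->dw[1] = prim_count;       /* src1: number of primitives written */
}
```

Passing 0 as the last argument reproduces the previous behaviour, which is
what the visitor does in this patch by emitting `brw_imm_ud(0u)`.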
 src/mesa/drivers/dri/i965/brw_defines.h  |  4 
 src/mesa/drivers/dri/i965/brw_vec4.h |  3 ++-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 16 +---
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp|  3 ++-
 4 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 6e8b998..b0d6d9f 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1030,6 +1030,10 @@ enum opcode {
 *   FF_SYNC operation.
 *
 * - src1 is the number of primitives written.
+*
+* - src2 is the value to hold in M0.0: number of SO vertices to write
+*   and number of SO primitives needed. Its value will be overwritten
+*   with the SVBI values if transform feedback is enabled.
 */
GS_OPCODE_FF_SYNC,
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 763cb23..58a5aac 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -679,7 +679,8 @@ private:
struct brw_reg src2);
void generate_gs_ff_sync(struct brw_reg dst,
 struct brw_reg src0,
-struct brw_reg src1);
+struct brw_reg src1,
+struct brw_reg src2);
void generate_gs_set_primitive_id(struct brw_reg dst);
void generate_oword_dual_block_offsets(struct brw_reg m1,
  struct brw_reg index);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index d4554f5..c69b305 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -734,7 +734,8 @@ vec4_generator::generate_gs_ff_sync_set_primitives(struct 
brw_reg dst,
 void
 vec4_generator::generate_gs_ff_sync(struct brw_reg dst,
 struct brw_reg src0,
-struct brw_reg src1)
+struct brw_reg src1,
+struct brw_reg src2)
 {
/* We use dst to setup the ff_sync header, so we expect it to be
 * initialized to R0 by the caller. Here we overwrite dword 0 (cleared
@@ -744,7 +745,7 @@ vec4_generator::generate_gs_ff_sync(struct brw_reg dst,
brw_push_insn_state(p);
brw_set_default_mask_control(p, BRW_MASK_DISABLE);
brw_set_default_access_mode(p, BRW_ALIGN_1);
-   brw_MOV(p, get_element_ud(dst, 0), brw_imm_ud(0));
+   brw_MOV(p, get_element_ud(dst, 0), get_element_ud(src2, 0));
brw_MOV(p, get_element_ud(dst, 1), get_element_ud(src1, 0));
brw_set_default_access_mode(p, BRW_ALIGN_16);
brw_pop_insn_state(p);
@@ -763,6 +764,15 @@ vec4_generator::generate_gs_ff_sync(struct brw_reg dst,
brw_set_default_access_mode(p, BRW_ALIGN_1);
brw_set_default_mask_control(p, BRW_MASK_DISABLE);
brw_MOV(p, get_element_ud(dst, 0), get_element_ud(src0, 0));
+
+   /* src2 is not an immediate when we use transform feedback */
+   if (src2.file != BRW_IMMEDIATE_VALUE) {
+  brw_MOV(p, suboffset(vec1(src2), 0), suboffset(vec1(src0), 1));
+  brw_MOV(p, suboffset(vec1(src2), 1), suboffset(vec1(src0), 2));
+  brw_MOV(p, suboffset(vec1(src2), 2), suboffset(vec1(src0), 3));
+  brw_MOV(p, suboffset(vec1(src2), 3), suboffset(vec1(src0), 4));
+   }
+
brw_set_default_access_mode(p, BRW_ALIGN_16);
brw_pop_insn_state(p);
 }
@@ -1374,7 +1384,7 @@ 
vec4_generator::generate_vec4_instruction(vec4_instruction *instruction,
   break;
 
case GS_OPCODE_FF_SYNC:
-  generate_gs_ff_sync(dst, src[0], src[1]);
+  generate_gs_ff_sync(dst, src[0], src[1], src[2]);
   break;
 
case GS_OPCODE_FF_SYNC_SET_PRIMITIVES:
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp 
b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
index b45c381..c1cfe75 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -331,7 +331,8 @@ gen6_gs_visitor::emit_thread_end()
{
   this->current_annotation = "gen6 thread end: ff_sync";
   emit(GS_OPCODE_FF_SYNC,
-   dst_reg(MRF, base_mrf), this->temp, this->prim_count);
+   dst_reg(MRF, base_mrf), this->temp, this->prim_count,
+   brw_imm_ud(0u));
 
   /* Loop over all buffered vertices and emit URB write messages */
   this->current_annotation = "gen6 thread end: urb writes init";
-- 
1.9.1


[Mesa-dev] [PATCH 05/37] i965/gen6/gs: Setup constant push buffers for gen6 geometry shaders.

2014-08-14 Thread Iago Toral Quiroga

---
 src/mesa/drivers/dri/i965/brw_state.h|  1 +
 src/mesa/drivers/dri/i965/brw_state_upload.c |  1 +
 src/mesa/drivers/dri/i965/gen6_gs_state.c| 59 ++--
 3 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index abead18..95dc411 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -101,6 +101,7 @@ extern const struct brw_tracked_state gen6_clip_vp;
 extern const struct brw_tracked_state gen6_color_calc_state;
 extern const struct brw_tracked_state gen6_depth_stencil_state;
 extern const struct brw_tracked_state gen6_gs_state;
+extern const struct brw_tracked_state gen6_gs_push_constants;
 extern const struct brw_tracked_state gen6_gs_binding_table;
 extern const struct brw_tracked_state gen6_multisample_state;
 extern const struct brw_tracked_state gen6_renderbuffer_surfaces;
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
b/src/mesa/drivers/dri/i965/brw_state_upload.c
index 086956d..0481790 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -128,6 +128,7 @@ static const struct brw_tracked_state *gen6_atoms[] =
&gen6_depth_stencil_state,  /* must do before cc unit */
 
&gen6_vs_push_constants, /* Before vs_state */
+   &gen6_gs_push_constants, /* Before gs_state */
&gen6_wm_push_constants, /* Before wm_state */
 
/* Surface state setup.  Must come before the VS/WM unit.  The binding
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_state.c 
b/src/mesa/drivers/dri/i965/gen6_gs_state.c
index e132959..987b7d2 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_gs_state.c
@@ -31,17 +31,36 @@
 #include "intel_batchbuffer.h"
 
 static void
-upload_gs_state_for_tf(struct brw_context *brw)
+gen6_upload_gs_push_constants(struct brw_context *brw)
 {
-   /* Disable all the constant buffers. */
-   BEGIN_BATCH(5);
-   OUT_BATCH(_3DSTATE_CONSTANT_GS << 16 | (5 - 2));
-   OUT_BATCH(0);
-   OUT_BATCH(0);
-   OUT_BATCH(0);
-   OUT_BATCH(0);
-   ADVANCE_BATCH();
+   /* BRW_NEW_GEOMETRY_PROGRAM */
+   const struct brw_geometry_program *gp =
+  (struct brw_geometry_program *) brw->geometry_program;
+
+   if (gp) {
+  /* CACHE_NEW_GS_PROG */
+  struct brw_stage_state *stage_state = &brw->gs.base;
+  struct brw_stage_prog_data *prog_data = &brw->gs.prog_data->base.base;
+
+  gen6_upload_push_constants(brw, &gp->program.Base, prog_data,
+ stage_state, AUB_TRACE_VS_CONSTANTS);
+   }
+}
 
+const struct brw_tracked_state gen6_gs_push_constants = {
+   .dirty = {
+  .mesa  = _NEW_TRANSFORM | _NEW_PROGRAM_CONSTANTS,
+  .brw   = (BRW_NEW_BATCH |
+BRW_NEW_GEOMETRY_PROGRAM |
+BRW_NEW_PUSH_CONSTANT_ALLOCATION),
+  .cache = CACHE_NEW_GS_PROG,
+   },
+   .emit = gen6_upload_gs_push_constants,
+};
+
+static void
+upload_gs_state_for_tf(struct brw_context *brw)
+{
if (brw->ff_gs.prog_active) {
   BEGIN_BATCH(7);
   OUT_BATCH(_3DSTATE_GS << 16 | (7 - 2));
@@ -87,8 +106,8 @@ upload_gs_state(struct brw_context *brw)
const struct brw_vec4_prog_data *prog_data = &brw->gs.prog_data->base;
const struct brw_stage_state *stage_state = &brw->gs.base;
 
-   if (active) {
-  /* FIXME: enable constant buffers */
+   if (!active || stage_state->push_const_size == 0) {
+  /* Disable the push constant buffers. */
   BEGIN_BATCH(5);
   OUT_BATCH(_3DSTATE_CONSTANT_GS << 16 | (5 - 2));
   OUT_BATCH(0);
@@ -96,7 +115,23 @@ upload_gs_state(struct brw_context *brw)
   OUT_BATCH(0);
   OUT_BATCH(0);
   ADVANCE_BATCH();
+   } else {
+  BEGIN_BATCH(5);
+  OUT_BATCH(_3DSTATE_CONSTANT_GS << 16 |
+   GEN6_CONSTANT_BUFFER_0_ENABLE |
+   (5 - 2));
+  /* Pointer to the GS constant buffer.  Covered by the set of
+   * state flags from gen6_upload_vs_constants
+   */
+  OUT_BATCH(stage_state->push_const_offset +
+stage_state->push_const_size - 1);
+  OUT_BATCH(0);
+  OUT_BATCH(0);
+  OUT_BATCH(0);
+  ADVANCE_BATCH();
+   }
 
+   if (active) {
   BEGIN_BATCH(7);
   OUT_BATCH(_3DSTATE_GS << 16 | (7 - 2));
   OUT_BATCH(stage_state->prog_offset);
@@ -163,7 +198,7 @@ upload_gs_state(struct brw_context *brw)
 
 const struct brw_tracked_state gen6_gs_state = {
.dirty = {
-  .mesa  = _NEW_TRANSFORM,
+  .mesa  = _NEW_TRANSFORM | _NEW_PROGRAM_CONSTANTS,
   .brw   = BRW_NEW_CONTEXT | BRW_NEW_PUSH_CONSTANT_ALLOCATION,
   .cache = (CACHE_NEW_GS_PROG | CACHE_NEW_FF_GS_PROG)
},
-- 
1.9.1


[Mesa-dev] [PATCH 04/37] i965/gen6/gs: Set brw->gs.enabled to FALSE in gen6_blorp_emit_gs_disable()

2014-08-14 Thread Iago Toral Quiroga

From: Samuel Iglesias Gonsalvez sigles...@igalia.com

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/gen6_blorp.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.cpp 
b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
index 1cab8b7..34b4331 100644
--- a/src/mesa/drivers/dri/i965/gen6_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
@@ -575,6 +575,7 @@ gen6_blorp_emit_gs_disable(struct brw_context *brw,
OUT_BATCH(0);
OUT_BATCH(0);
ADVANCE_BATCH();
+   brw->gs.enabled = false;
 }
 
 
-- 
1.9.1


[Mesa-dev] [PATCH 06/37] i965/gs: Reuse gen6 constant push buffers setup code in gen7+.

2014-08-14 Thread Iago Toral Quiroga

From: Samuel Iglesias Gonsalvez sigles...@igalia.com

The code required for gen6 and gen7+ is almost the same, so reuse it.

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/brw_state_upload.c |  4 ++--
 src/mesa/drivers/dri/i965/gen6_gs_state.c|  6 -
 src/mesa/drivers/dri/i965/gen7_gs_state.c| 33 
 3 files changed, 7 insertions(+), 36 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
b/src/mesa/drivers/dri/i965/brw_state_upload.c
index 0481790..a52a8f4 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -197,7 +197,7 @@ static const struct brw_tracked_state *gen7_atoms[] =
&gen6_depth_stencil_state,  /* must do before cc unit */
 
&gen6_vs_push_constants, /* Before vs_state */
-   &gen7_gs_push_constants, /* Before gs_state */
+   &gen6_gs_push_constants, /* Before gs_state */
&gen6_wm_push_constants, /* Before wm_surfaces and constant_buffer */
 
/* Surface state setup.  Must come before the VS/WM unit.  The binding
@@ -271,7 +271,7 @@ static const struct brw_tracked_state *gen8_atoms[] =
&gen6_color_calc_state,
 
&gen6_vs_push_constants, /* Before vs_state */
-   &gen7_gs_push_constants, /* Before gs_state */
+   &gen6_gs_push_constants, /* Before gs_state */
&gen6_wm_push_constants, /* Before wm_surfaces and constant_buffer */
 
/* Surface state setup.  Must come before the VS/WM unit.  The binding
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_state.c 
b/src/mesa/drivers/dri/i965/gen6_gs_state.c
index 987b7d2..e3256e2 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_gs_state.c
@@ -33,18 +33,22 @@
 static void
 gen6_upload_gs_push_constants(struct brw_context *brw)
 {
+   struct brw_stage_state *stage_state = &brw->gs.base;
+
/* BRW_NEW_GEOMETRY_PROGRAM */
const struct brw_geometry_program *gp =
   (struct brw_geometry_program *) brw->geometry_program;
 
if (gp) {
   /* CACHE_NEW_GS_PROG */
-  struct brw_stage_state *stage_state = &brw->gs.base;
   struct brw_stage_prog_data *prog_data = &brw->gs.prog_data->base.base;
 
   gen6_upload_push_constants(brw, &gp->program.Base, prog_data,
  stage_state, AUB_TRACE_VS_CONSTANTS);
}
+
+   if (brw->gen >= 7)
+  gen7_upload_constant_state(brw, stage_state, gp, _3DSTATE_CONSTANT_GS);
 }
 
 const struct brw_tracked_state gen6_gs_push_constants = {
diff --git a/src/mesa/drivers/dri/i965/gen7_gs_state.c 
b/src/mesa/drivers/dri/i965/gen7_gs_state.c
index b3b4ee6..2a9955f 100644
--- a/src/mesa/drivers/dri/i965/gen7_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_gs_state.c
@@ -26,39 +26,6 @@
 #include "brw_defines.h"
 #include "intel_batchbuffer.h"
 
-
-static void
-gen7_upload_gs_push_constants(struct brw_context *brw)
-{
-   const struct brw_stage_state *stage_state = &brw->gs.base;
-   /* BRW_NEW_GEOMETRY_PROGRAM */
-   const struct brw_geometry_program *gp =
-  (struct brw_geometry_program *) brw->geometry_program;
-
-   if (gp) {
-  /* CACHE_NEW_GS_PROG */
-  const struct brw_stage_prog_data *prog_data = &brw->gs.prog_data->base.base;
-  struct brw_stage_state *stage_state = &brw->gs.base;
-
-  gen6_upload_push_constants(brw, &gp->program.Base, prog_data,
- stage_state, AUB_TRACE_VS_CONSTANTS);
-   }
-
-   gen7_upload_constant_state(brw, stage_state, gp, _3DSTATE_CONSTANT_GS);
-}
-
-const struct brw_tracked_state gen7_gs_push_constants = {
-   .dirty = {
-  .mesa  = _NEW_TRANSFORM | _NEW_PROGRAM_CONSTANTS,
-  .brw   = (BRW_NEW_BATCH |
-BRW_NEW_GEOMETRY_PROGRAM |
-BRW_NEW_PUSH_CONSTANT_ALLOCATION),
-  .cache = CACHE_NEW_GS_PROG,
-   },
-   .emit = gen7_upload_gs_push_constants,
-};
-
-
 static void
 upload_gs_state(struct brw_context *brw)
 {
-- 
1.9.1


[Mesa-dev] [PATCH 09/37] i965/gen6/gs: Add instruction URB flags to geometry shaders EOT message.

2014-08-14 Thread Iago Toral Quiroga

Gen6 seems to require that EOT messages include the complete flag too, or else
the GPU hangs. We will add this flag to the instruction when we emit the
thread end opcode.
---
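A minimal sketch of the flag combination this patch introduces. The enum
values below are illustrative assumptions made for this note (the real
enum brw_urb_write_flags is defined in brw_defines.h and may use different
values), and `thread_end_flags` is a hypothetical helper.

```c
#include <stdbool.h>

/* Illustrative values only; not copied from brw_defines.h. */
enum urb_write_flags {
   URB_WRITE_NO_FLAGS = 0,
   URB_WRITE_COMPLETE = 1 << 0,
   URB_WRITE_EOT      = 1 << 1,
};

/* The thread end message always carries EOT; any flags already set on
 * the instruction (for example COMPLETE on gen6) are OR'ed in.
 */
static enum urb_write_flags
thread_end_flags(enum urb_write_flags inst_flags)
{
   return (enum urb_write_flags)(URB_WRITE_EOT | inst_flags);
}

/* True when every bit of 'flag' is present in 'flags'. */
static bool
has_flag(enum urb_write_flags flags, enum urb_write_flags flag)
{
   return (flags & flag) == flag;
}
```

With no instruction flags the result is plain EOT, so callers that never set
urb_write_flags should see no change in behaviour.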
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 8ef0c34..9cb47b2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -462,7 +462,7 @@ vec4_generator::generate_gs_thread_end(vec4_instruction 
*inst)
  brw_null_reg(), /* dest */
  inst->base_mrf, /* starting mrf reg nr */
  src,
- BRW_URB_WRITE_EOT,
+ BRW_URB_WRITE_EOT | inst->urb_write_flags,
  brw->gen >= 8 ? 2 : 1,/* message len */
  0,  /* response len */
  0,  /* urb destination offset */
-- 
1.9.1


[Mesa-dev] [PATCH 03/37] i965/gen6/gs: use brw_gs_prog atom instead of brw_ff_gs_prog

2014-08-14 Thread Iago Toral Quiroga

From: Samuel Iglesias Gonsalvez sigles...@igalia.com

This is needed to support user-provided geometry shaders, since the
brw_ff_gs_prog atom in gen6 only takes care of implementing transform feedback
for vertex shaders.

If there is no user-provided geometry shader the implementation falls back to
the original code.

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
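The fallback described above can be sketched as a small decision function.
This is a simplified model under stated assumptions: `gs_ctx`,
`gs_upload_path` and `choose_gs_upload_path` are hypothetical names for this
note, and the single boolean stands in for the BRW_NEW_TRANSFORM_FEEDBACK
dirty bit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the paths brw_upload_gs_prog() can take. */
enum gs_upload_path {
   GS_PATH_USER_PROGRAM,   /* compile/upload the user-provided GS */
   GS_PATH_GEN6_FF_GS,     /* gen6: fixed-function GS for VS transform feedback */
   GS_PATH_GEN6_NOTHING,   /* gen6: no user GS and nothing relevant is dirty */
   GS_PATH_PASSTHROUGH,    /* gen7+: vertex data passes straight through */
};

/* Hypothetical stand-in for the parts of brw_context that matter here. */
struct gs_ctx {
   int gen;
   const void *geometry_program;
   bool transform_feedback_dirty;
};

static enum gs_upload_path
choose_gs_upload_path(const struct gs_ctx *ctx)
{
   if (ctx->geometry_program != NULL)
      return GS_PATH_USER_PROGRAM;

   if (ctx->gen == 6) {
      return ctx->transform_feedback_dirty ? GS_PATH_GEN6_FF_GS
                                           : GS_PATH_GEN6_NOTHING;
   }

   return GS_PATH_PASSTHROUGH;
}
```

The point of the design is that gen6 keeps its original transform feedback
code whenever the application supplies no geometry shader.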
 src/mesa/drivers/dri/i965/brw_gs.c   |  4 
 src/mesa/drivers/dri/i965/brw_gs.h   |  1 +
 src/mesa/drivers/dri/i965/brw_state_upload.c |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_gs.c  | 11 ++-
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
b/src/mesa/drivers/dri/i965/brw_gs.c
index fbd728f..c0c4c13 100644
--- a/src/mesa/drivers/dri/i965/brw_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_gs.c
@@ -243,6 +243,10 @@ brw_upload_ff_gs_prog(struct brw_context *brw)
}
 }
 
+void gen6_brw_upload_ff_gs_prog(struct brw_context *brw)
+{
+   brw_upload_ff_gs_prog(brw);
+}
 
 const struct brw_tracked_state brw_ff_gs_prog = {
.dirty = {
diff --git a/src/mesa/drivers/dri/i965/brw_gs.h 
b/src/mesa/drivers/dri/i965/brw_gs.h
index f8f430c..a538948 100644
--- a/src/mesa/drivers/dri/i965/brw_gs.h
+++ b/src/mesa/drivers/dri/i965/brw_gs.h
@@ -110,5 +110,6 @@ void brw_ff_gs_lines(struct brw_ff_gs_compile *c);
 void gen6_sol_program(struct brw_ff_gs_compile *c,
   struct brw_ff_gs_prog_key *key,
   unsigned num_verts, bool check_edge_flag);
+void gen6_brw_upload_ff_gs_prog(struct brw_context *brw);
 
 #endif
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
b/src/mesa/drivers/dri/i965/brw_state_upload.c
index 3a452c3..086956d 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -108,7 +108,7 @@ static const struct brw_tracked_state *gen4_atoms[] =
 static const struct brw_tracked_state *gen6_atoms[] =
 {
&brw_vs_prog, /* must do before state base address */
-   &brw_ff_gs_prog, /* must do before state base address */
+   &brw_gs_prog, /* must do before state base address */
&brw_wm_prog, /* must do before state base address */
 
&gen6_clip_vp,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs.c 
b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
index 6428291..2d9e8c2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
@@ -31,6 +31,7 @@
 #include "brw_context.h"
 #include "brw_vec4_gs_visitor.h"
 #include "brw_state.h"
+#include "brw_gs.h"
 
 
 static bool
@@ -270,6 +271,12 @@ brw_upload_gs_prog(struct brw_context *brw)
   (struct brw_geometry_program *) brw->geometry_program;
 
if (gp == NULL) {
+  if (brw->gen == 6) {
+ if (brw->state.dirty.brw & BRW_NEW_TRANSFORM_FEEDBACK)
+gen6_brw_upload_ff_gs_prog(brw);
+ return;
+  }
+
   /* No geometry shader.  Vertex data just passes straight through. */
   if (brw->state.dirty.brw & BRW_NEW_VUE_MAP_VS) {
  brw->vue_map_geom_out = brw->vue_map_vs;
@@ -325,7 +332,9 @@ brw_upload_gs_prog(struct brw_context *brw)
 const struct brw_tracked_state brw_gs_prog = {
.dirty = {
   .mesa  = (_NEW_LIGHT | _NEW_BUFFERS | _NEW_TEXTURE),
-  .brw   = BRW_NEW_GEOMETRY_PROGRAM | BRW_NEW_VUE_MAP_VS,
+  .brw   = (BRW_NEW_GEOMETRY_PROGRAM |
+BRW_NEW_VUE_MAP_VS |
+BRW_NEW_TRANSFORM_FEEDBACK),
},
.emit = brw_upload_gs_prog
 };
-- 
1.9.1


[Mesa-dev] [PATCH 33/37] i965/gen6/gs: Enable transform feedback support in geometry shaders

2014-08-14 Thread Iago Toral Quiroga

From: Samuel Iglesias Gonsalvez sigles...@igalia.com

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/brw_vec4_gs.c   |  6 ++
 src/mesa/drivers/dri/i965/gen6_gs_state.c | 13 +
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs.c 
b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
index f735cf3..53b0a2f 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
@@ -105,6 +105,12 @@ do_gs_prog(struct brw_context *brw,
} else {
   /* There are no control data bits in gen6. */
   c.control_data_bits_per_vertex = 0;
+
+  /* If it is using transform feedback, enable it */
+  if (prog->TransformFeedback.NumVarying)
+ c.prog_data.gen6_xfb_enabled = true;
+  else
+ c.prog_data.gen6_xfb_enabled = false;
}
c.control_data_header_size_bits =
   gp->program.VerticesOut * c.control_data_bits_per_vertex;
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_state.c 
b/src/mesa/drivers/dri/i965/gen6_gs_state.c
index e3256e2..f2eed19 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_gs_state.c
@@ -171,10 +171,7 @@ upload_gs_state(struct brw_context *brw)
 GEN6_GS_SO_STATISTICS_ENABLE |
 GEN6_GS_RENDERING_ENABLE);
 
-  /* FIXME: Enable SVBI payload only when TF is enable in SNB for
-   * user-provided GS.
-   */
-  if (0) {
+  if (brw->gs.prog_data->gen6_xfb_enabled) {
  /* GEN6_GS_REORDER is equivalent to GEN7_GS_REORDER_TRAILING
   * in gen7. SNB and IVB specs are the same regarding the reordering of
   * TRISTRIP/TRISTRIP_REV vertices and triangle orientation, so we do
@@ -183,9 +180,6 @@ upload_gs_state(struct brw_context *brw)
   */
  OUT_BATCH(GEN6_GS_REORDER |
GEN6_GS_SVBI_PAYLOAD_ENABLE |
-   GEN6_GS_SVBI_POSTINCREMENT_ENABLE |
-   /* FIXME: prog_data->svbi_postincrement_value instead of 0 */
-   (0 << GEN6_GS_SVBI_POSTINCREMENT_VALUE_SHIFT) |
GEN6_GS_ENABLE);
   } else {
  OUT_BATCH(GEN6_GS_REORDER | GEN6_GS_ENABLE);
@@ -203,7 +197,10 @@ upload_gs_state(struct brw_context *brw)
 const struct brw_tracked_state gen6_gs_state = {
.dirty = {
   .mesa  = _NEW_TRANSFORM | _NEW_PROGRAM_CONSTANTS,
-  .brw   = BRW_NEW_CONTEXT | BRW_NEW_PUSH_CONSTANT_ALLOCATION,
+  .brw   = (BRW_NEW_CONTEXT |
+BRW_NEW_PUSH_CONSTANT_ALLOCATION |
+BRW_NEW_GEOMETRY_PROGRAM |
+BRW_NEW_BATCH),
   .cache = (CACHE_NEW_GS_PROG | CACHE_NEW_FF_GS_PROG)
},
.emit = upload_gs_state,
-- 
1.9.1


[Mesa-dev] [PATCH 24/37] i965/gen6/gs: implement GS_OPCODE_SVB_WRITE opcode

2014-08-14 Thread Iago Toral Quiroga

From: Samuel Iglesias Gonsalvez sigles...@igalia.com

This opcode will be used when sending SVB WRITE messages to save
transform feedback outputs into Streamed Vertex Buffers.

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/brw_defines.h  | 12 +++
 src/mesa/drivers/dri/i965/brw_shader.cpp |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4.h |  7 
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 41 
 4 files changed, 62 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index b30a095..83011d6 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1040,6 +1040,18 @@ enum opcode {
 * - dst is the GRF where PrimitiveID information will be moved.
 */
GS_OPCODE_SET_PRIMITIVE_ID,
+
+   /**
+* Write transform feedback data to the SVB by sending a SVB WRITE message.
+* Used in gen6.
+*
+* - dst is the MRF register containing the message header.
+*
+* - src0 is the register where the vertex data is going to be copied from.
+*
+* - src1 is the destination register when write commit occurs.
+*/
+   GS_OPCODE_SVB_WRITE,
 };
 
 enum brw_urb_write_flags {
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index fc3146c..8698b75 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -536,6 +536,8 @@ brw_instruction_name(enum opcode op)
   return "ff_sync";
case GS_OPCODE_SET_PRIMITIVE_ID:
   return "set_primitive_id";
+   case GS_OPCODE_SVB_WRITE:
+  return "gs_svb_write";
 
default:
   /* Yes, this leaks.  It's in debug code, it should never occur, and if
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 6e0da6d..e8456ce 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -220,6 +220,9 @@ public:
enum brw_urb_write_flags urb_write_flags;
bool header_present;
 
+   unsigned sol_binding; /**< gen6: SOL binding table index */
+   bool sol_final_write; /**< gen6: send commit message */
+
bool is_send_from_grf();
bool can_reswizzle_dst(int dst_writemask, int swizzle, int swizzle_mask);
void reswizzle_dst(int dst_writemask, int swizzle);
@@ -657,6 +660,10 @@ private:
  struct brw_reg src1);
void generate_gs_set_vertex_count(struct brw_reg dst,
  struct brw_reg src);
+   void generate_gs_svb_write(vec4_instruction *inst,
+  struct brw_reg dst,
+  struct brw_reg src0,
+  struct brw_reg src1);
void generate_gs_set_dword_2_immed(struct brw_reg dst, struct brw_reg src);
void generate_gs_set_dword_2(struct brw_reg dst, struct brw_reg src);
void generate_gs_prepare_channel_masks(struct brw_reg dst);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 8293f60..1728790 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -536,6 +536,44 @@ vec4_generator::generate_gs_set_vertex_count(struct 
brw_reg dst,
 }
 
 void
+vec4_generator::generate_gs_svb_write(vec4_instruction *inst,
+  struct brw_reg dst,
+  struct brw_reg src0,
+  struct brw_reg src1)
+{
+   int binding = inst->sol_binding;
+   bool final_write = inst->sol_final_write;
+
+   brw_push_insn_state(p);
+   /* Copy Vertex data into M0.x */
+   brw_MOV(p, stride(dst, 4, 4, 1),
+   stride(retype(src0, BRW_REGISTER_TYPE_UD), 4, 4, 1));
+
+   /* Send SVB Write */
+   brw_svb_write(p,
+ final_write ? src1 : brw_null_reg(), /* dest == src1 */
+ 1, /* msg_reg_nr */
+ dst, /* src0 == previous dst */
+ SURF_INDEX_GEN6_SOL_BINDING(binding), /* binding_table_index */
+ final_write); /* send_commit_msg */
+
+   /* Finally, wait for the write commit to occur so that we can proceed to
+* other things safely.
+*
+* From the Sandybridge PRM, Volume 4, Part 1, Section 3.3:
+*
+*   The write commit does not modify the destination register, but
+*   merely clears the dependency associated with the destination
+*   register. Thus, a simple “mov” instruction using the register as a
+*   source is sufficient to wait for the write commit to occur.
+*/
+   if (final_write) {
+  brw_MOV(p, src1, src1);
+   }
+   brw_pop_insn_state(p);
+}
+
+void
 vec4_generator::generate_gs_set_dword_2_immed(struct brw_reg dst,
   struct brw_reg src)
 {
@@ -1272,6 +1310,9 @@

[Mesa-dev] [PATCH 34/37] i965/gen6/gs: upload ubo and pull constants surfaces.

2014-08-14 Thread Iago Toral Quiroga

Uniforms declared in uniform blocks are stored in UBO surfaces and need to
be pulled by the geometry shader program, so make sure we upload those
surfaces first, and do the same for pull constants.

This fixes all piglit tests that use uniform blocks:
bin/shader_runner tests/spec/glsl-1.50/uniform_buffer/gs-*
---
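The atom lists are walked in array order, so where an atom sits in the list
decides when its state gets uploaded. The toy model below illustrates that
idea only; `toy_atom`, `toy_context` and `toy_upload_state` are hypothetical
names and deliberately much simpler than the driver's brw_tracked_state
machinery (no emit callbacks, a single dirty word).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_EMITTED 8

/* Hypothetical atom: an id plus the dirty bits it reacts to. */
struct toy_atom {
   int id;
   uint32_t dirty;
};

/* Hypothetical context: pending dirty bits and a log of emitted atoms. */
struct toy_context {
   uint32_t dirty;
   int emitted[TOY_MAX_EMITTED];
   size_t num_emitted;
};

/* Walk the atoms in list order and record every atom whose dirty mask
 * intersects the pending dirty bits, then clear the pending bits.
 */
static void
toy_upload_state(struct toy_context *ctx,
                 const struct toy_atom *atoms, size_t count)
{
   for (size_t i = 0; i < count; i++) {
      if (ctx->dirty & atoms[i].dirty) {
         assert(ctx->num_emitted < TOY_MAX_EMITTED);
         ctx->emitted[ctx->num_emitted++] = atoms[i].id;
      }
   }
   ctx->dirty = 0;
}
```

In this model, placing the GS pull constant and UBO atoms before the binding
table atom is what guarantees the surfaces exist by the time they are
referenced.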
 src/mesa/drivers/dri/i965/brw_state_upload.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
b/src/mesa/drivers/dri/i965/brw_state_upload.c
index b0d78ab..af19a4c 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -136,6 +136,8 @@ static const struct brw_tracked_state *gen6_atoms[] =
 */
&brw_vs_pull_constants,
&brw_vs_ubo_surfaces,
+   &brw_gs_pull_constants,
+   &brw_gs_ubo_surfaces,
&brw_wm_pull_constants,
&brw_wm_ubo_surfaces,
&gen6_renderbuffer_surfaces,
-- 
1.9.1


[Mesa-dev] [PATCH 00/37] Geometry shader support in Sandy Bridge

2014-08-14 Thread Iago Toral Quiroga

Hi,

this series brings support for geometry shaders in Sandy Bridge (gen6) and is
combined work from Samuel and myself. A few notes:

1.- Some patches have been based on original work by Ilia Mirkin, specifically
the idea of using arrays to buffer the output of the GS, subclassing the
vec4_gs_visitor for gen6 and generalizing emit_urb_slot().

2.- Geometry shaders were already being used in gen6 to implement transform
feedback support for vertex shaders. We have not changed this. These patches
focus on adding support for user-provided geometry shaders and transform
feedback support for the geometry shader stage. In the future it probably
makes sense to merge transform feedback support for the vertex shader stage
in our implementation so there is only one code path for geometry shaders
in gen6, but it is probably better to tackle that at a later moment, once we
have merged this work.

3.- On Ivy Bridge there are no piglit regressions.

4.- On Sandy Bridge we get these results after enabling OpenGL 3.2 and
GLSL 1.50 (*1):

  crash:    +0
  fail:    +15 (*2)
  pass:  +3265
  skip:  -3280

(*1) Including Jordan's patches from the series
"Gen6 render surface state changes", since these are required to enable
layered rendering in geometry shaders. The numbers were obtained by comparing
master with Jordan's patches on top (OpenGL 3.1, GLSL 1.40) against master
with these and Jordan's patches on top (OpenGL 3.2, GLSL 1.50).

(*2) These are mostly tests that either fail on Ivy Bridge too, are GS
variants of tests that also fail for the VS/FS stages, or relate to other
aspects of OpenGL 3.2 that are unrelated to geometry shaders.

5.- With these patches, the following piglit test hangs:
bin/glsl-1.50-geometry-primitive-id-restart GL_TRIANGLE_STRIP_ADJACENCY

This problem seems to be unrelated to our implementation, since the hang
happens only for that primitive type, only when using glDrawElements()
(so glDrawArrays works fine), and only in specific cases where the list
of indices provided includes repeated indices with a certain pattern. Actually,
this test hangs even if we have a geometry shader that does nothing (i.e. an
empty main function), where the code we generate is trivial and works with
any other primitive type. Based on this, I conclude that this is a problem
originating somewhere else, I think probably a hardware bug. Because of this,
piglit runs with these patches should exclude this test by including
-x primitive-id-restart. The offending piglit test can be trivially reworked
to avoid repeating indices in the call to glDrawElements() too. I'll
develop this issue further in another thread so we can decide what to do about
this problem.

I'll be on holidays for the next two weeks, starting tomorrow, but Samuel will
be around from Tuesday next week, so he can start acting on the review feedback
we get.

A quick summary of the patches:

- Patch 1: is actually about gen7, but since gen6's dispatch mode for geometry
  shaders is equivalent to gen7's SINGLE mode it makes sense to do this first.
- Patches 2-4 refactor 3DSTATE_GS to accommodate the code path for user-provided
  geometry shaders while keeping the original code that handles TF support
  in vertex shaders.
- Patches 5-13 implement generator opcodes, configure state packets and 
  handle required URB space.
- Patches 14-15 generalize emit_urb_slot() so we can reuse that code.
- Patches 16-19 are the gen6 geometry shader visitor implementation.
- Patches 20-21 implement gl_PrimitiveIDIn.
- Patch 22 makes sure we compute the right VUE map for user-provided GS.
- Patch 23 enables texture related functions in the GS stage.
- Patches 24-33 mostly implement transform feedback
- Patch 34 handles uploading of ubo and pull constant surfaces
- Patch 35 makes gen6 use this implementation of geometry shaders
- Patches 36-37 enable GLSL 1.5 and OpenGL 3.2 in gen6

Iago Toral Quiroga (23):
  i965/gs: Use single dispatch mode as fallback to dual object mode when
possible.
  i965/gen6/gs: Setup constant push buffers for gen6 geometry shaders.
  i965/gen6/gs: Implement GS_OPCODE_FF_SYNC.
  i965/gen6/gs: Implement GS_OPCODE_URB_WRITE_ALLOCATE.
  i965/gen6/gs: Add instruction URB flags to geometry shaders EOT
message.
  i965/gen6/gs: Compute URB entry size for user-provided geometry
shaders.
  i965/gen6/gs: Enable URB space for user-provided geometry shaders.
  i965/gen6/gs: Upload binding table for user-provided geometry shaders.
  i965/gen6/gs: Implement GS_OPCODE_SET_DWORD_2.
  i965: Provide means to create registers of a given size.
  i965: Generalize emit_urb_slot() to emit to any dst_reg.
  i965/gen6/gs: Add initial implementation for a gen6 geometry shader
visitor.
  i965/gen6/gs: Implement geometry shaders for outputs other than
points.
  i965/gen6/gs: Make sure we complete the last primitive.
  i965/gen6/gs: Handle the case where a geometry shader emits no output.
  i965/gen6/gs: Implement GS_OPCODE_SET_PRIMITIVE_ID.
  i965/gen6/gs:

[Mesa-dev] [PATCH 13/37] i965/gen6/gs: Implement GS_OPCODE_SET_DWORD_2.

2014-08-14 Thread Iago Toral Quiroga

We have GS_OPCODE_SET_DWORD_2_IMMED but this requires its source argument to be
an immediate. In gen6 we need to set dword 2 of the URB write message header
from values stored in a separate register, so we need something more flexible.
---
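A rough C sketch of the difference between the two opcodes, assuming the
message header is modelled as eight 32-bit dwords. The msg_header type and
both helper names are hypothetical and only illustrate the data movement;
they are not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a message header: eight 32-bit dwords. */
typedef struct {
    uint32_t dw[8];
} msg_header;

/* Immediate form: the value is already known when code is generated. */
static void set_dword_2_immed(msg_header *hdr, uint32_t imm)
{
    hdr->dw[2] = imm;
}

/* Register form: the value is read from dword 0 of another register at
 * run time, which is what a MOV from src.0 to dst.2 does. */
static void set_dword_2(msg_header *hdr, const uint32_t *src_reg)
{
    hdr->dw[2] = src_reg[0];
}
```

Only dword 2 is touched in either form; the rest of the header keeps
whatever the caller put there.
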
 src/mesa/drivers/dri/i965/brw_defines.h  |  8 
 src/mesa/drivers/dri/i965/brw_shader.cpp |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4.h |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 15 +++
 4 files changed, 26 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index a2b40fb..f6bdaeb 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -979,6 +979,14 @@ enum opcode {
GS_OPCODE_SET_DWORD_2_IMMED,
 
/**
+* Same as above but can take the DWORD 2 value from any general purpose
+* register, not necessarily an immediate. Used by geometry shaders in gen6
+* which need to set DWORD 2 of the URB write message header with vertex
+* flags that we have buffered in a separate register.
+*/
+   GS_OPCODE_SET_DWORD_2,
+
+   /**
 * Prepare the dst register for storage in the Channel Mask fields of a
 * URB_WRITE message header.
 *
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 69d16a7..b927601 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -524,6 +524,8 @@ brw_instruction_name(enum opcode op)
   return "set_vertex_count";
case GS_OPCODE_SET_DWORD_2_IMMED:
   return "set_dword_2_immed";
+   case GS_OPCODE_SET_DWORD_2:
+  return "set_dword_2";
case GS_OPCODE_PREPARE_CHANNEL_MASKS:
   return "prepare_channel_masks";
case GS_OPCODE_SET_CHANNEL_MASKS:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index c1daf54..5403f5a 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -657,6 +657,7 @@ private:
void generate_gs_set_vertex_count(struct brw_reg dst,
  struct brw_reg src);
void generate_gs_set_dword_2_immed(struct brw_reg dst, struct brw_reg src);
+   void generate_gs_set_dword_2(struct brw_reg dst, struct brw_reg src);
void generate_gs_prepare_channel_masks(struct brw_reg dst);
void generate_gs_set_channel_masks(struct brw_reg dst, struct brw_reg src);
void generate_gs_get_instance_id(struct brw_reg dst);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 9cb47b2..2bf2b67 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -550,6 +550,17 @@ vec4_generator::generate_gs_set_dword_2_immed(struct 
brw_reg dst,
 }
 
 void
+vec4_generator::generate_gs_set_dword_2(struct brw_reg dst, struct brw_reg src)
+{
+   brw_push_insn_state(p);
+   brw_set_default_access_mode(p, BRW_ALIGN_1);
+   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
+   brw_MOV(p, suboffset(vec1(dst), 2), suboffset(vec1(src), 0));
+   brw_set_default_access_mode(p, BRW_ALIGN_16);
+   brw_pop_insn_state(p);
+}
+
+void
 vec4_generator::generate_gs_prepare_channel_masks(struct brw_reg dst)
 {
/* We want to left shift just DWORD 4 (the x component belonging to the
@@ -1252,6 +1263,10 @@ 
vec4_generator::generate_vec4_instruction(vec4_instruction *instruction,
   generate_gs_set_dword_2_immed(dst, src[0]);
   break;
 
+   case GS_OPCODE_SET_DWORD_2:
+  generate_gs_set_dword_2(dst, src[0]);
+  break;
+
case GS_OPCODE_PREPARE_CHANNEL_MASKS:
   generate_gs_prepare_channel_masks(dst);
   break;
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 07/37] i965/gen6/gs: Implement GS_OPCODE_FF_SYNC.

2014-08-14 Thread Iago Toral Quiroga

This implements the FF_SYNC message required in gen6 geometry shaders to
get the initial URB handle.
---
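The header handling done by the generator can be sketched in plain C as
follows. This is only a model under stated assumptions: reg256 stands in for
a 256-bit register, and fake_ff_sync is a made-up stand-in for the hardware
message that simply returns a caller-chosen handle in dword 0 of the
writeback register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 256-bit register: eight 32-bit dwords. */
typedef struct {
    uint32_t dw[8];
} reg256;

/* Made-up stand-in for the FF_SYNC message: the "hardware" answers with
 * a URB handle in dword 0 of the writeback register. */
static void fake_ff_sync(const reg256 *header, reg256 *writeback,
                         uint32_t handle)
{
    (void)header;
    writeback->dw[0] = handle;
}

/* Mirrors the three steps of the generator: fill the header, send the
 * message, copy the returned handle into dst.0. */
static void gs_ff_sync(reg256 *dst, reg256 *tmp, uint32_t prim_count,
                       uint32_t handle)
{
    dst->dw[0] = 0;          /* cleared: no transform feedback yet */
    dst->dw[1] = prim_count; /* number of primitives written */
    fake_ff_sync(dst, tmp, handle);
    dst->dw[0] = tmp->dw[0]; /* allocated URB handle ends up in dst.0 */
}
```
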
 src/mesa/drivers/dri/i965/brw_defines.h  | 14 +
 src/mesa/drivers/dri/i965/brw_shader.cpp |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4.h |  3 ++
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 40 
 4 files changed, 59 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 3564041..125d728 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1002,6 +1002,20 @@ enum opcode {
 * - dst is the GRF for gl_InvocationID.
 */
GS_OPCODE_GET_INSTANCE_ID,
+
+   /**
+* Send a FF_SYNC message to allocate initial URB handles (gen6).
+*
+* - dst will hold the newly allocated VUE handle. It is expected to be
+*   initialized so that it can be used as the FF_SYNC message header
+*   (that is, it won't do an implied move from R0).
+*
+* - src0 is a temporary that will be used as writeback register for the
+*   FF_SYNC operation.
+*
+* - src1 is the number of primitives written.
+*/
+   GS_OPCODE_FF_SYNC,
 };
 
 enum brw_urb_write_flags {
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 0033135..5749061 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -528,6 +528,8 @@ brw_instruction_name(enum opcode op)
   return "set_channel_masks";
case GS_OPCODE_GET_INSTANCE_ID:
   return "get_instance_id";
+   case GS_OPCODE_FF_SYNC:
+  return "ff_sync";
 
default:
   /* Yes, this leaks.  It's in debug code, it should never occur, and if
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 67132c0..72fabdd 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -659,6 +659,9 @@ private:
void generate_gs_prepare_channel_masks(struct brw_reg dst);
void generate_gs_set_channel_masks(struct brw_reg dst, struct brw_reg src);
void generate_gs_get_instance_id(struct brw_reg dst);
+   void generate_gs_ff_sync(struct brw_reg dst,
+struct brw_reg src0,
+struct brw_reg src1);
void generate_oword_dual_block_offsets(struct brw_reg m1,
  struct brw_reg index);
void generate_scratch_write(vec4_instruction *inst,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index c63b47a..05f4892 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -621,6 +621,42 @@ vec4_generator::generate_gs_get_instance_id(struct brw_reg 
dst)
 }
 
 void
+vec4_generator::generate_gs_ff_sync(struct brw_reg dst,
+struct brw_reg src0,
+struct brw_reg src1)
+{
+   /* We use dst to setup the ff_sync header, so we expect it to be
+* initialized to R0 by the caller. Here we overwrite dword 0 (cleared
+* for now since we are not doing transform feedback) and dword 1
+* (to hold the number of primitives written).
+*/
+   brw_push_insn_state(p);
+   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
+   brw_set_default_access_mode(p, BRW_ALIGN_1);
+   brw_MOV(p, get_element_ud(dst, 0), brw_imm_ud(0));
+   brw_MOV(p, get_element_ud(dst, 1), get_element_ud(src1, 0));
+   brw_set_default_access_mode(p, BRW_ALIGN_16);
+   brw_pop_insn_state(p);
+
+   /* Write allocated URB handle to temporary passed in src0 */
+   brw_ff_sync(p,
+   src0,
+   0,
+   dst,
+   1, /* allocate */
+   1, /* response length */
+   0 /* eot */);
+
+   /* Now put allocated urb handle in dst.0 */
+   brw_push_insn_state(p);
+   brw_set_default_access_mode(p, BRW_ALIGN_1);
+   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
+   brw_MOV(p, get_element_ud(dst, 0), get_element_ud(src0, 0));
+   brw_set_default_access_mode(p, BRW_ALIGN_16);
+   brw_pop_insn_state(p);
+}
+
+void
 vec4_generator::generate_oword_dual_block_offsets(struct brw_reg m1,
   struct brw_reg index)
 {
@@ -1198,6 +1234,10 @@ 
vec4_generator::generate_vec4_instruction(vec4_instruction *instruction,
   generate_gs_get_instance_id(dst);
   break;
 
+   case GS_OPCODE_FF_SYNC:
+  generate_gs_ff_sync(dst, src[0], src[1]);
+  break;
+
case SHADER_OPCODE_SHADER_TIME_ADD:
   brw_shader_time_add(p, src[0],
   prog_data->base.binding_table.shader_time_start);
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org

[Mesa-dev] [PATCH 17/37] i965/gen6/gs: Implement geometry shaders for outputs other than points.

2014-08-14 Thread Iago Toral Quiroga

---
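The flag bookkeeping this patch adds for non-point outputs can be modelled
in a few lines of C. This is a sketch, not the visitor code: the gs_model
type, the helper names and the numeric flag values are placeholders chosen
for the example (the real definitions live in brw_defines.h), and the
empty-buffer guard in gs_end_primitive is an addition of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for this sketch only. */
#define PRIM_END        0x1u
#define PRIM_START      0x2u
#define PRIM_TYPE_SHIFT 2
#define MAX_VERTS       16

typedef struct {
    uint32_t flags[MAX_VERTS]; /* buffered per-vertex flags */
    unsigned vertex_count;     /* vertices buffered so far */
    unsigned prim_count;       /* primitives completed so far */
    uint32_t first_vertex;     /* PRIM_START or 0 */
    uint32_t topology;         /* output primitive type */
} gs_model;

static void gs_init(gs_model *s, uint32_t topology)
{
    s->vertex_count = 0;
    s->prim_count = 0;
    s->first_vertex = PRIM_START;
    s->topology = topology;
}

/* EmitVertex(): only PrimStart can be decided at this point. */
static void gs_emit_vertex(gs_model *s)
{
    if (s->vertex_count >= MAX_VERTS)
        return;
    s->flags[s->vertex_count++] =
        s->first_vertex | (s->topology << PRIM_TYPE_SHIFT);
    s->first_vertex = 0;
}

/* EndPrimitive(): go back and tag the previous vertex with PrimEnd. */
static void gs_end_primitive(gs_model *s)
{
    if (s->vertex_count == 0) /* guard added for the sketch */
        return;
    s->flags[s->vertex_count - 1] |= PRIM_END;
    s->prim_count++;
    s->first_vertex = PRIM_START;
}
```

The prim_count kept here is the value that gets handed to FF_SYNC at thread
end, in place of the vertex count used before this patch.
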
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 72 ---
 src/mesa/drivers/dri/i965/gen6_gs_visitor.h   |  2 +
 2 files changed, 67 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp 
b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
index b78c55e..5123bd7 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -79,6 +79,21 @@ gen6_gs_visitor::emit_prolog()
 * and URB_WRITE messages.
 */
this->temp = src_reg(this, glsl_type::uint_type);
+
+   /* This will be used to know when we are processing the first vertex of
+* a primitive. We will set this to URB_WRITE_PRIM_START only when we know
+* that we are processing the first vertex in the primitive and to zero
+* otherwise. This way we can use its value directly in the URB write
+* headers.
+*/
+   this->first_vertex = src_reg(this, glsl_type::uint_type);
+   emit(MOV(dst_reg(this->first_vertex), URB_WRITE_PRIM_START));
+
+   /* The FF_SYNC message needs the number of primitives generated,
+* so keep a counter for this.
+*/
+   this->prim_count = src_reg(this, glsl_type::uint_type);
+   emit(MOV(dst_reg(this->prim_count), 0u));
 }
 
 void
@@ -109,18 +124,26 @@ gen6_gs_visitor::visit(ir_emit_vertex *)
   this->vertex_output_offset, 1u));
   }
 
-  /* Now buffer flags for this vertex (we only support point output
-   * for now).
-   */
+  /* Now buffer flags for this vertex */
   dst_reg dst(this->vertex_output);
   dst.reladdr = ralloc(mem_ctx, src_reg);
   memcpy(dst.reladdr, &this->vertex_output_offset, sizeof(src_reg));
-  /* If we are outputting points, then every vertex has PrimStart and
-   * PrimEnd set.
-   */
   if (c->gp->program.OutputType == GL_POINTS) {
+ /* If we are outputting points, then every vertex has PrimStart and
+  * PrimEnd set.
+  */
  emit(MOV(dst, (_3DPRIM_POINTLIST << URB_WRITE_PRIM_TYPE_SHIFT) |
   URB_WRITE_PRIM_START | URB_WRITE_PRIM_END));
+ emit(ADD(dst_reg(this->prim_count), this->prim_count, 1u));
+  } else {
+ /* Otherwise, we can only set the PrimStart flag, which we have stored
+  * in the first_vertex register. We will have to wait until we execute
+  * EndPrimitive() or we end the thread to set the PrimEnd flag on a
+  * vertex.
+  */
+ emit(OR(dst, this->first_vertex,
+ (c->prog_data.output_topology << URB_WRITE_PRIM_TYPE_SHIFT)));
+ emit(MOV(dst_reg(this->first_vertex), 0u));
   }
   emit(ADD(dst_reg(this->vertex_output_offset),
this->vertex_output_offset, 1u));
@@ -140,6 +163,41 @@ gen6_gs_visitor::visit(ir_end_primitive *)
 */
if (c->gp->program.OutputType == GL_POINTS)
   return;
+
+   /* Otherwise we know that the last vertex we have processed was the last
+* vertex in the primitive and we need to set its PrimEnd flag, so do this
+* unless we haven't emitted that vertex at all.
+*
+* Notice that we have already incremented vertex_count when we processed
+* the last emit_vertex, so we need to take that into account in the
+* comparison below (hence the num_output_vertices + 1 in the comparison
+* below).
+*/
+   unsigned num_output_vertices = c->gp->program.VerticesOut;
+   emit(CMP(dst_null_d(), this->vertex_count, src_reg(num_output_vertices + 1),
+BRW_CONDITIONAL_L));
+   emit(IF(BRW_PREDICATE_NORMAL));
+   {
+  /* vertex_output_offset is already pointing at the first entry of the
+   * next vertex. So subtract 1 to modify the flags for the previous
+   * vertex.
+   */
+  src_reg offset(this, glsl_type::uint_type);
+  emit(ADD(dst_reg(offset), this->vertex_output_offset, brw_imm_d(-1)));
+
+  src_reg dst(this->vertex_output);
+  dst.reladdr = ralloc(mem_ctx, src_reg);
+  memcpy(dst.reladdr, &offset, sizeof(src_reg));
+
+  emit(OR(dst_reg(dst), dst, URB_WRITE_PRIM_END));
+  emit(ADD(dst_reg(this->prim_count), this->prim_count, 1u));
+
+  /* Set the first vertex flag to indicate that the next vertex will start
+   * a primitive.
+   */
+  emit(MOV(dst_reg(this->first_vertex), URB_WRITE_PRIM_START));
+   }
+   emit(BRW_OPCODE_ENDIF);
 }
 
 void
@@ -234,7 +292,7 @@ gen6_gs_visitor::emit_thread_end()
/* Issue the FF_SYNC message and obtain the initial VUE handle. */
this->current_annotation = "gen6 thread end: ff_sync";
emit(GS_OPCODE_FF_SYNC,
-dst_reg(MRF, base_mrf), this->temp, this->vertex_count);
+dst_reg(MRF, base_mrf), this->temp, this->prim_count);
 
/* Loop over all buffered vertices and emit URB write messages */
this->current_annotation = "gen6 thread end: urb writes init";
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.h 
b/src/mesa/drivers/dri/i965/gen6_gs_visitor.h
index 6dd3a19..68fe88d 100644
---

[Mesa-dev] [PATCH 36/37] i965/gen6: enable GLSL 1.50

2014-08-14 Thread Iago Toral Quiroga

From: Samuel Iglesias Gonsalvez sigles...@igalia.com

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
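For reference, the version selection after this change amounts to the
following mapping. The function name is made up for the sketch, and the
later override step (_mesa_override_glsl_version) is left out.

```c
#include <assert.h>

/* Hypothetical helper: GLSL version advertised per hardware generation,
 * following the if/else chain touched by this patch. */
static unsigned glsl_version_for_gen(int gen)
{
    if (gen >= 7)
        return 330;
    if (gen >= 6)
        return 150; /* was 140 before this patch */
    return 120;
}
```
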
 src/mesa/drivers/dri/i965/intel_extensions.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index e134cd9..9875b7c 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -246,7 +246,7 @@ intelInitExtensions(struct gl_context *ctx)
if (brw->gen >= 7)
   ctx->Const.GLSLVersion = 330;
else if (brw->gen >= 6)
-  ctx->Const.GLSLVersion = 140;
+  ctx->Const.GLSLVersion = 150;
else
   ctx->Const.GLSLVersion = 120;
_mesa_override_glsl_version(&ctx->Const);
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 26/37] i965/gen6/gs: implement GS_OPCODE_FF_SYNC_SET_PRIMITIVES opcode

2014-08-14 Thread Iago Toral Quiroga

From: Samuel Iglesias Gonsalvez sigles...@igalia.com

This opcode will be used when filling FF_SYNC header before
emitting vertices and their data.

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
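The AND/SHL/AND/OR sequence in the generator is a plain 16:16 bit pack. A
small C sketch of the same arithmetic, assuming both counts are meant to be
truncated to 16 bits (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 16-bit counts into one dword: the number of SO vertices to
 * write goes in bits 16:31, the number of SO primitives needed goes in
 * bits 0:15. Anything above 16 bits in either input is dropped. */
static uint32_t pack_ff_sync_dword0(uint32_t num_so_verts,
                                    uint32_t num_so_prims)
{
    uint32_t hi = (num_so_verts & 0xffffu) << 16;
    uint32_t lo = num_so_prims & 0xffffu;
    return hi | lo;
}
```

For example, 3 vertices and 1 primitive pack to 0x00030001.
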
 src/mesa/drivers/dri/i965/brw_defines.h  | 15 +++
 src/mesa/drivers/dri/i965/brw_shader.cpp |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4.h |  4 
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 24 
 4 files changed, 45 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 7095c39..6e8b998 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1061,6 +1061,21 @@ enum opcode {
 * - src is the register that holds the destination indices value.
 */
GS_OPCODE_SVB_SET_DST_INDEX,
+
+   /**
+* Prepare Mx.0 subregister for being used in the FF_SYNC message header.
+* Used in gen6 for transform feedback.
+*
+* - dst will hold the register with the final Mx.0 value.
+*
+* - src0 has the number of vertices emitted in SO (NumSOVertsToWrite)
+*
+* - src1 has the number of needed primitives for SO (NumSOPrimsNeeded)
+*
+* - src2 is the value to hold in M0: number of SO vertices to write
+*   and number of SO primitives needed.
+*/
+   GS_OPCODE_FF_SYNC_SET_PRIMITIVES,
 };
 
 enum brw_urb_write_flags {
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index bf625a5..7328fdc 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -540,6 +540,8 @@ brw_instruction_name(enum opcode op)
   return "gs_svb_write";
case GS_OPCODE_SVB_SET_DST_INDEX:
   return "gs_svb_set_dst_index";
+   case GS_OPCODE_FF_SYNC_SET_PRIMITIVES:
+  return "gs_ff_sync_set_primitives";
 
default:
   /* Yes, this leaks.  It's in debug code, it should never occur, and if
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index ea3967d..763cb23 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -673,6 +673,10 @@ private:
void generate_gs_prepare_channel_masks(struct brw_reg dst);
void generate_gs_set_channel_masks(struct brw_reg dst, struct brw_reg src);
void generate_gs_get_instance_id(struct brw_reg dst);
+   void generate_gs_ff_sync_set_primitives(struct brw_reg dst,
+   struct brw_reg src0,
+   struct brw_reg src1,
+   struct brw_reg src2);
void generate_gs_ff_sync(struct brw_reg dst,
 struct brw_reg src0,
 struct brw_reg src1);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index d914a52..d4554f5 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -712,6 +712,26 @@ vec4_generator::generate_gs_get_instance_id(struct brw_reg 
dst)
 }
 
 void
+vec4_generator::generate_gs_ff_sync_set_primitives(struct brw_reg dst,
+  struct brw_reg src0,
+  struct brw_reg src1,
+  struct brw_reg src2)
+{
+   brw_push_insn_state(p);
+   brw_set_default_access_mode(p, BRW_ALIGN_1);
+   /* Save src0 data in 16:31 bits of dst.0 */
+   brw_AND(p, suboffset(vec1(dst), 0), suboffset(vec1(src0), 0), 
brw_imm_ud(0xffffu));
+   brw_SHL(p, suboffset(vec1(dst), 0), suboffset(vec1(dst), 0), 
brw_imm_ud(16));
+   /* Save src1 data in 0:15 bits of dst.0 */
+   brw_AND(p, suboffset(vec1(src2), 0), suboffset(vec1(src1), 0), 
brw_imm_ud(0xffffu));
+   brw_OR(p, suboffset(vec1(dst), 0),
+  suboffset(vec1(dst), 0),
+  suboffset(vec1(src2), 0));
+   brw_set_default_access_mode(p, BRW_ALIGN_16);
+   brw_pop_insn_state(p);
+}
+
+void
 vec4_generator::generate_gs_ff_sync(struct brw_reg dst,
 struct brw_reg src0,
 struct brw_reg src1)
@@ -1357,6 +1377,10 @@ 
vec4_generator::generate_vec4_instruction(vec4_instruction *instruction,
   generate_gs_ff_sync(dst, src[0], src[1]);
   break;
 
+   case GS_OPCODE_FF_SYNC_SET_PRIMITIVES:
+  generate_gs_ff_sync_set_primitives(dst, src[0], src[1], src[2]);
+  break;
+
case GS_OPCODE_SET_PRIMITIVE_ID:
   generate_gs_set_primitive_id(dst);
   break;
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 30/37] i965/gen6/gs: Buffer PSIZ/flags vertex data in gen6_gs_visitor

2014-08-14 Thread Iago Toral Quiroga

From: Samuel Iglesias Gonsalvez sigles...@igalia.com

Since geometry shaders can alter the value of varyings packed in the first
output VUE slot (PSIZ), we need to buffer it together with all the other
vertex data so we can emit the right value for each vertex when we do the
URB writes.

This fixes the following piglit test in gen6:
tests/spec/glsl-1.50/execution/redeclare-pervertex-out-subset-gs.shader_test

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 79 ++-
 1 file changed, 41 insertions(+), 38 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp 
b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
index b8eaa58..fca7536 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -178,16 +178,33 @@ gen6_gs_visitor::visit(ir_emit_vertex *)
 
   /* Buffer all output slots for this vertex in vertex_output */
   for (int slot = 0; slot < prog_data->vue_map.num_slots; ++slot) {
- /* We will handle PSIZ for each vertex at thread end time since it
-  * is not computed by the GS algorithm and requires specific handling.
-  */
  int varying = prog_data->vue_map.slot_to_varying[slot];
  if (varying != VARYING_SLOT_PSIZ) {
 dst_reg dst(this->vertex_output);
 dst.reladdr = ralloc(mem_ctx, src_reg);
 memcpy(dst.reladdr, &this->vertex_output_offset, sizeof(src_reg));
 emit_urb_slot(dst, varying);
+ } else {
+/* The PSIZ slot can pack multiple varyings in different channels
+ * and emit_urb_slot() will produce a MOV instruction for each of
+ * them. Since we are writing to an array, that will translate to
+ * possibly multiple MOV instructions with an array destination and
+ * each will generate a scratch write with the same offset into
+ * scratch space (thus, each one overwriting the previous). This is
+ * not what we want. What we will do instead is emit PSIZ to a
+ * regular temporary register, then move that register into the
+ * array. This way we only have one instruction with an array
+ * destination and we only produce a single scratch write.
+ */
+dst_reg tmp = dst_reg(src_reg(this, glsl_type::uvec4_type));
+emit_urb_slot(tmp, varying);
+dst_reg dst(this->vertex_output);
+dst.reladdr = ralloc(mem_ctx, src_reg);
+memcpy(dst.reladdr, &this->vertex_output_offset, sizeof(src_reg));
+vec4_instruction *inst = emit(MOV(dst, src_reg(tmp)));
+inst->force_writemask_all = true;
  }
+
  emit(ADD(dst_reg(this->vertex_output_offset),
   this->vertex_output_offset, 1u));
   }
@@ -427,17 +444,12 @@ gen6_gs_visitor::emit_thread_end()
memcpy(data.reladdr, &this->vertex_output_offset,
   sizeof(src_reg));
 
-   if (varying == VARYING_SLOT_PSIZ) {
-  /* We did not buffer PSIZ, emit it directly here */
-  emit_urb_slot(dst_reg(MRF, mrf), varying);
-   } else {
-  /* Copy this slot to the appropriate message register */
-  dst_reg reg = dst_reg(MRF, mrf);
-  reg.type = output_reg[varying].type;
-  data.type = reg.type;
-  vec4_instruction *inst = emit(MOV(reg, data));
-  inst->force_writemask_all = true;
-   }
+   /* Copy this slot to the appropriate message register */
+   dst_reg reg = dst_reg(MRF, mrf);
+   reg.type = output_reg[varying].type;
+   data.type = reg.type;
+   vec4_instruction *inst = emit(MOV(reg, data));
+   inst->force_writemask_all = true;
 
mrf++;
emit(ADD(dst_reg(this->vertex_output_offset),
@@ -585,22 +597,19 @@ gen6_gs_visitor::xfb_buffer_output()
/* Buffer all TF outputs for this vertex in xfb_output */
for (int binding = 0; binding < prog_data->num_transform_feedback_bindings;
 binding++) {
-  /* We will handle PSIZ for each vertex at thread end time since it
-   * is not computed by the GS algorithm and requires specific handling.
-   */
   unsigned varying =
  prog_data->transform_feedback_bindings[binding];
-  if (varying != VARYING_SLOT_PSIZ) {
- dst_reg dst(this->xfb_output);
- dst.reladdr = ralloc(mem_ctx, src_reg);
- memcpy(dst.reladdr, &this->xfb_output_offset, sizeof(src_reg));
- dst.type = output_reg[varying].type;
+  dst_reg dst(this->xfb_output);
+  dst.reladdr = ralloc(mem_ctx, src_reg);
+  memcpy(dst.reladdr, &this->xfb_output_offset, sizeof(src_reg));
+  dst.type = output_reg[varying].type;
+
+

[Mesa-dev] [PATCH 11/37] i965/gen6/gs: Enable URB space for user-provided geometry shaders.

2014-08-14 Thread Iago Toral Quiroga

---
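The entry-count arithmetic in gen6_upload_urb() can be sketched like this,
assuming entry sizes are expressed in 128-byte units (as the "* 128" in the
code suggests). The urb_split type and split_urb name are invented for the
example, the no-GS branch is an assumption since that part of the function
is not in this diff, and any hardware limits on entry counts are left out.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    unsigned nr_vs_entries;
    unsigned nr_gs_entries;
} urb_split;

/* Split the URB between VS and GS. Sizes are in 128-byte units. */
static urb_split split_urb(unsigned urb_size_kb, unsigned vs_size,
                           unsigned gs_size, bool gs_present)
{
    unsigned total = urb_size_kb * 1024; /* in bytes */
    urb_split r;

    assert(vs_size >= 1 && gs_size >= 1);
    if (gs_present) {
        /* Each stage gets half of the URB. */
        r.nr_vs_entries = (total / 2) / (vs_size * 128);
        r.nr_gs_entries = (total / 2) / (gs_size * 128);
    } else {
        /* Assumed: with no GS of either kind the VS gets everything. */
        r.nr_vs_entries = total / (vs_size * 128);
        r.nr_gs_entries = 0;
    }
    return r;
}
```

With a 64 KB URB, vs_size = 2 and gs_size = 4, this gives 128 VS entries
and 64 GS entries when a GS is present. A user-provided GS with larger
outputs than the VS therefore gets fewer entries than the old
gs_size = vs_size assumption would have computed.
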
 src/mesa/drivers/dri/i965/gen6_urb.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_urb.c 
b/src/mesa/drivers/dri/i965/gen6_urb.c
index b694f5d..7af1f37 100644
--- a/src/mesa/drivers/dri/i965/gen6_urb.c
+++ b/src/mesa/drivers/dri/i965/gen6_urb.c
@@ -52,19 +52,29 @@ gen6_upload_urb( struct brw_context *brw )
int nr_vs_entries, nr_gs_entries;
int total_urb_size = brw->urb.size * 1024; /* in bytes */
 
+   bool gs_present = brw->ff_gs.prog_active || brw->geometry_program;
+
/* CACHE_NEW_VS_PROG */
unsigned vs_size = MAX2(brw->vs.prog_data->base.urb_entry_size, 1);
 
-   /* We use the same VUE layout for VS outputs and GS outputs (as it's what
-* the SF and Clipper expect), so we can simply make the GS URB entry size
-* the same as for the VS.  This may technically be too large in cases
-* where we have few vertex attributes and a lot of varyings, since the VS
-* size is determined by the larger of the two.  For now, it's safe.
+   /* When using GS to do transform feedback only we use the same VUE layout for
+* VS outputs and GS outputs (as it's what the SF and Clipper expect), so we
+* can simply make the GS URB entry size the same as for the VS.  This may
+* technically be too large in cases where we have few vertex attributes and
+* a lot of varyings, since the VS size is determined by the larger of the
+* two. For now, it's safe.
+*
+* For user-provided GS the assumption above does not hold since the GS
+* outputs can be different from the VS outputs.
 */
unsigned gs_size = vs_size;
+   if (brw->geometry_program) {
+  gs_size = brw->gs.prog_data->base.urb_entry_size;
+  assert(gs_size >= 1);
+   }
 
/* Calculate how many entries fit in each stage's section of the URB */
-   if (brw->ff_gs.prog_active) {
+   if (gs_present) {
   nr_vs_entries = (total_urb_size/2) / (vs_size * 128);
   nr_gs_entries = (total_urb_size/2) / (gs_size * 128);
} else {
@@ -109,16 +119,16 @@ gen6_upload_urb( struct brw_context *brw )
 * doesn't exist on Gen6).  So for now we just do a full pipeline flush as
 * a workaround.
 */
-   if (brw->urb.gen6_gs_previously_active && !brw->ff_gs.prog_active)
+   if (brw->urb.gen6_gs_previously_active && !gs_present)
   intel_batchbuffer_emit_mi_flush(brw);
-   brw->urb.gen6_gs_previously_active = brw->ff_gs.prog_active;
+   brw->urb.gen6_gs_previously_active = gs_present;
 }
 
 const struct brw_tracked_state gen6_urb = {
.dirty = {
   .mesa = 0,
-  .brw = BRW_NEW_CONTEXT,
-  .cache = (CACHE_NEW_VS_PROG | CACHE_NEW_FF_GS_PROG),
+  .brw = (BRW_NEW_CONTEXT | BRW_NEW_GEOMETRY_PROGRAM),
+  .cache = (CACHE_NEW_VS_PROG | CACHE_NEW_GS_PROG | CACHE_NEW_FF_GS_PROG),
},
.emit = gen6_upload_urb,
 };
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 37/37] i965/gen6: enable OpenGL 3.2

2014-08-14 Thread Iago Toral Quiroga

From: Samuel Iglesias Gonsalvez sigles...@igalia.com

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/intel_screen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index ea0fc58..83101a5 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1273,7 +1273,7 @@ set_max_gl_versions(struct intel_screen *screen)
   psp->max_gl_es2_version = 30;
   break;
case 6:
-  psp->max_gl_core_version = 31;
+  psp->max_gl_core_version = 32;
   psp->max_gl_compat_version = 30;
   psp->max_gl_es1_version = 11;
   psp->max_gl_es2_version = 30;
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 25/37] i965/gen6/gs: implement GS_OPCODE_SVB_SET_DST_INDEX opcode

2014-08-14 Thread Iago Toral Quiroga

From: Samuel Iglesias Gonsalvez sigles...@igalia.com

This opcode generates code to copy the specified destination index
into subregister 5 of the MRF message header.

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
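In plain C terms the generated MOV picks one element of the source register
and stores it in dword 5 of the header. A small model, where svb_reg and the
function name are hypothetical and registers are treated as eight dwords:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 256-bit register: eight 32-bit dwords. */
typedef struct {
    uint32_t dw[8];
} svb_reg;

/* Copy the destination index for the given vertex into M0.5. The vertex
 * number plays the role of sol_vertex and selects the source element. */
static void svb_set_dst_index(svb_reg *header, const svb_reg *indices,
                              unsigned vertex)
{
    assert(vertex < 8);
    header->dw[5] = indices->dw[vertex];
}
```
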
 src/mesa/drivers/dri/i965/brw_defines.h  |  9 +
 src/mesa/drivers/dri/i965/brw_shader.cpp |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4.h |  4 
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 20 
 4 files changed, 35 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 83011d6..7095c39 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1052,6 +1052,15 @@ enum opcode {
 * - src1 is the destination register when write commit occurs.
 */
GS_OPCODE_SVB_WRITE,
+
+   /**
+* Set destination index in the SVB write message payload (M0.5). Used
+* in gen6 for transform feedback.
+*
+* - dst is the header to save the destination indices for SVB WRITE.
+* - src is the register that holds the destination indices value.
+*/
+   GS_OPCODE_SVB_SET_DST_INDEX,
 };
 
 enum brw_urb_write_flags {
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 8698b75..bf625a5 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -538,6 +538,8 @@ brw_instruction_name(enum opcode op)
   return "set_primitive_id";
case GS_OPCODE_SVB_WRITE:
   return "gs_svb_write";
+   case GS_OPCODE_SVB_SET_DST_INDEX:
+  return "gs_svb_set_dst_index";
 
default:
   /* Yes, this leaks.  It's in debug code, it should never occur, and if
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index e8456ce..ea3967d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -222,6 +222,7 @@ public:
 
unsigned sol_binding; /**< gen6: SOL binding table index */
bool sol_final_write; /**< gen6: send commit message */
+   unsigned sol_vertex; /**< gen6: used for setting dst index in SVB header */
 
bool is_send_from_grf();
bool can_reswizzle_dst(int dst_writemask, int swizzle, int swizzle_mask);
@@ -664,6 +665,9 @@ private:
   struct brw_reg dst,
   struct brw_reg src0,
   struct brw_reg src1);
+   void generate_gs_svb_set_destination_index(vec4_instruction *inst,
+  struct brw_reg dst,
+  struct brw_reg src);
void generate_gs_set_dword_2_immed(struct brw_reg dst, struct brw_reg src);
void generate_gs_set_dword_2(struct brw_reg dst, struct brw_reg src);
void generate_gs_prepare_channel_masks(struct brw_reg dst);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 1728790..d914a52 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -574,6 +574,22 @@ vec4_generator::generate_gs_svb_write(vec4_instruction 
*inst,
 }
 
 void
+vec4_generator::generate_gs_svb_set_destination_index(vec4_instruction *inst,
+  struct brw_reg dst,
+  struct brw_reg src)
+{
+
+   int vertex = inst->sol_vertex;
+   brw_push_insn_state(p);
+   brw_set_default_access_mode(p, BRW_ALIGN_1);
+   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
+   brw_MOV(p, get_element_ud(dst, 5),
+   get_element_ud(src, vertex));
+   brw_set_default_access_mode(p, BRW_ALIGN_16);
+   brw_pop_insn_state(p);
+}
+
+void
 vec4_generator::generate_gs_set_dword_2_immed(struct brw_reg dst,
   struct brw_reg src)
 {
@@ -1313,6 +1329,10 @@ 
vec4_generator::generate_vec4_instruction(vec4_instruction *instruction,
case GS_OPCODE_SVB_WRITE:
   generate_gs_svb_write(inst, dst, src[0], src[1]);
 
+   case GS_OPCODE_SVB_SET_DST_INDEX:
+  generate_gs_svb_set_destination_index(inst, dst, src[0]);
+  break;
+
case GS_OPCODE_SET_DWORD_2_IMMED:
   generate_gs_set_dword_2_immed(dst, src[0]);
   break;
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 22/37] i965/gen6/gs: Assign geometry shader VUE map properly.

2014-08-14 Thread Iago Toral Quiroga

So far in gen6 we only used geometry shaders to implement transform feedback
in vertex shaders, so we assumed that the VUE map for the geometry shader
stage was always the same as for the vertex shader stage. This is no longer
true now that we support user provided geometry shaders in gen6 too.
---
 src/mesa/drivers/dri/i965/brw_vec4_gs.c | 12 ++--
 src/mesa/drivers/dri/i965/brw_vs.c  |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs.c 
b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
index a445174..f735cf3 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
@@ -296,18 +296,18 @@ brw_upload_gs_prog(struct brw_context *brw)
   (struct brw_geometry_program *) brw->geometry_program;
 
if (gp == NULL) {
-  if (brw->gen == 6) {
- if (brw->state.dirty.brw & BRW_NEW_TRANSFORM_FEEDBACK)
-gen6_brw_upload_ff_gs_prog(brw);
- return;
-  }
-
   /* No geometry shader.  Vertex data just passes straight through. */
   if (brw->state.dirty.brw & BRW_NEW_VUE_MAP_VS) {
  brw->vue_map_geom_out = brw->vue_map_vs;
  brw->state.dirty.brw |= BRW_NEW_VUE_MAP_GEOM_OUT;
   }
 
+  if (brw->gen == 6 &&
+  (brw->state.dirty.brw & BRW_NEW_TRANSFORM_FEEDBACK)) {
+ gen6_brw_upload_ff_gs_prog(brw);
+ return;
+  }
+
   /* Other state atoms had better not try to access prog_data, since
* there's no GS program.
*/
diff --git a/src/mesa/drivers/dri/i965/brw_vs.c 
b/src/mesa/drivers/dri/i965/brw_vs.c
index 19b1d3b..3ea7681 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -495,7 +495,7 @@ static void brw_upload_vs_prog(struct brw_context *brw)
   sizeof(brw->vue_map_geom_out)) != 0) {
   brw->vue_map_vs = brw->vs.prog_data->base.vue_map;
   brw->state.dirty.brw |= BRW_NEW_VUE_MAP_VS;
-  if (brw->gen < 7) {
+  if (brw->gen < 6) {
  /* No geometry shader support, so the VS VUE map is the VUE map for
   * the output of the geometry portion of the pipeline.
   */
-- 
1.9.1


[Mesa-dev] [PATCH 15/37] i965: Generalize emit_urb_slot() to emit to any dst_reg.

2014-08-14 Thread Iago Toral Quiroga

In gen7+ we emit vertices as they come, however in gen6 geometry shaders we
have to buffer vertex data for all vertices and then emit it all in one go
at the end. To achieve this we need to generalize emit_urb_slot() to store
vertex data in general purpose registers and not only MRF registers.
---
 src/mesa/drivers/dri/i965/brw_vec4.h   |  4 ++--
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 30 +++---
 2 files changed, 20 insertions(+), 14 deletions(-)
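
The key change in emit_psiz_and_flags() is that a channel is now selected
by copying the dst_reg and setting its writemask, rather than going through
brw_writemask() on a hardware brw_reg. A minimal sketch of that idea,
assuming a hypothetical DstReg struct and MASK_* constants (the real dst_reg
has more fields):

```cpp
#include <cassert>

// Hypothetical reduced register description.
enum RegFile { GRF, MRF };
enum RegType { TYPE_F, TYPE_D };
constexpr unsigned MASK_X = 1, MASK_Y = 2, MASK_Z = 4, MASK_W = 8;
constexpr unsigned MASK_XYZW = MASK_X | MASK_Y | MASK_Z | MASK_W;

struct DstReg {
   RegFile file;
   int nr;
   RegType type;
   unsigned writemask;
};

// Returns a copy of 'reg' restricted to the given channels and retyped.
// Because 'reg' is taken by value, the caller's register is left untouched,
// and nothing here depends on the register living in the MRF file.
DstReg with_channel(DstReg reg, unsigned mask, RegType type)
{
   reg.writemask = mask;
   reg.type = type;
   return reg;
}

// Usage example: select the Y channel of a general purpose register.
void example()
{
   DstReg grf{GRF, 12, TYPE_F, MASK_XYZW};
   DstReg y = with_channel(grf, MASK_Y, TYPE_D);
   assert(y.file == GRF && y.nr == 12);
   assert(y.writemask == MASK_Y && y.type == TYPE_D);
   assert(grf.writemask == MASK_XYZW);
}
```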

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index d95b58d..ad3a77f 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -532,10 +532,10 @@ public:
void swizzle_result(ir_texture *ir, src_reg orig_val, uint32_t sampler);
 
void emit_ndc_computation();
-   void emit_psiz_and_flags(struct brw_reg reg);
+   void emit_psiz_and_flags(dst_reg reg);
void emit_clip_distances(dst_reg reg, int offset);
void emit_generic_urb_slot(dst_reg reg, int varying);
-   void emit_urb_slot(int mrf, int varying);
+   void emit_urb_slot(dst_reg reg, int varying);
 
void emit_shader_time_begin();
void emit_shader_time_end();
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index e1fbcbc..d6ace29 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -2806,7 +2806,7 @@ vec4_visitor::emit_ndc_computation()
 }
 
 void
-vec4_visitor::emit_psiz_and_flags(struct brw_reg reg)
+vec4_visitor::emit_psiz_and_flags(dst_reg reg)
 {
if (brw->gen < 6 &&
((prog_data->vue_map.slots_valid & VARYING_BIT_PSIZ) ||
@@ -2866,16 +2866,21 @@ vec4_visitor::emit_psiz_and_flags(struct brw_reg reg)
} else {
   emit(MOV(retype(reg, BRW_REGISTER_TYPE_D), src_reg(0)));
   if (prog_data->vue_map.slots_valid & VARYING_BIT_PSIZ) {
- emit(MOV(brw_writemask(reg, WRITEMASK_W),
-  src_reg(output_reg[VARYING_SLOT_PSIZ])));
+ dst_reg reg_w = reg;
+ reg_w.writemask = WRITEMASK_W;
+ emit(MOV(reg_w, src_reg(output_reg[VARYING_SLOT_PSIZ])));
   }
   if (prog_data->vue_map.slots_valid & VARYING_BIT_LAYER) {
- emit(MOV(retype(brw_writemask(reg, WRITEMASK_Y), BRW_REGISTER_TYPE_D),
-  src_reg(output_reg[VARYING_SLOT_LAYER])));
+ dst_reg reg_y = reg;
+ reg_y.writemask = WRITEMASK_Y;
+ reg_y.type = BRW_REGISTER_TYPE_D;
+ emit(MOV(reg_y, src_reg(output_reg[VARYING_SLOT_LAYER])));
   }
   if (prog_data->vue_map.slots_valid & VARYING_BIT_VIEWPORT) {
- emit(MOV(retype(brw_writemask(reg, WRITEMASK_Z), BRW_REGISTER_TYPE_D),
-  src_reg(output_reg[VARYING_SLOT_VIEWPORT])));
+ dst_reg reg_z = reg;
+ reg_z.writemask = WRITEMASK_Z;
+ reg_z.type = BRW_REGISTER_TYPE_D;
+ emit(MOV(reg_z, src_reg(output_reg[VARYING_SLOT_VIEWPORT])));
   }
}
 }
@@ -2928,18 +2933,18 @@ vec4_visitor::emit_generic_urb_slot(dst_reg reg, int varying)
 }
 
 void
-vec4_visitor::emit_urb_slot(int mrf, int varying)
+vec4_visitor::emit_urb_slot(dst_reg reg, int varying)
 {
-   struct brw_reg hw_reg = brw_message_reg(mrf);
-   dst_reg reg = dst_reg(MRF, mrf);
reg.type = BRW_REGISTER_TYPE_F;
 
switch (varying) {
case VARYING_SLOT_PSIZ:
+   {
   /* PSIZ is always in slot 0, and is coupled with other flags. */
   current_annotation = "indices, point width, clip flags";
-  emit_psiz_and_flags(hw_reg);
+  emit_psiz_and_flags(reg);
   break;
+   }
case BRW_VARYING_SLOT_NDC:
   current_annotation = "NDC";
   emit(MOV(reg, src_reg(output_reg[BRW_VARYING_SLOT_NDC])));
@@ -3047,7 +3052,8 @@ vec4_visitor::emit_vertex()
 
   mrf = base_mrf + 1;
   for (; slot < prog_data->vue_map.num_slots; ++slot) {
- emit_urb_slot(mrf++, prog_data->vue_map.slot_to_varying[slot]);
+ emit_urb_slot(dst_reg(MRF, mrf++),
+   prog_data->vue_map.slot_to_varying[slot]);
 
  /* If this was max_usable_mrf, we can't fit anything more into this
   * URB WRITE.
-- 
1.9.1


[Mesa-dev] [PATCH 20/37] i965/gen6/gs: Implement GS_OPCODE_SET_PRIMITIVE_ID.

2014-08-14 Thread Iago Toral Quiroga

In gen6 the geometry shader payload includes the PrimitiveID information in
r0.1. When the shader code uses gl_PrimitiveIDIn we will have to move this to
a separate hardware register where we can map this attribute. This opcode
takes the selected destination register and moves r0.1 there.
---
 src/mesa/drivers/dri/i965/brw_defines.h  |  8 
 src/mesa/drivers/dri/i965/brw_shader.cpp |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4.h |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 17 +
 4 files changed, 28 insertions(+)
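
What the new opcode does can be modelled in a few lines. This is a sketch
only: Reg is a hypothetical model of one hardware register as eight dwords,
and set_primitive_id() stands in for the single MOV that
generate_gs_set_primitive_id() emits.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: one register viewed as 8 unsigned dwords.
using Reg = std::array<uint32_t, 8>;

// Copies dword 1 of the payload register r0 (where the commit message says
// gen6 delivers PrimitiveID) into dword 0 of the destination register.
// All other dwords of 'dst' are left as they were.
void set_primitive_id(Reg &dst, const Reg &r0)
{
   dst[0] = r0[1];
}

// Usage example.
void example()
{
   Reg r0{};            // zero-initialised payload
   r0[1] = 7;           // PrimitiveID as delivered in r0.1
   Reg dst{};
   dst[3] = 99;         // unrelated data that must survive
   set_primitive_id(dst, r0);
   assert(dst[0] == 7);
   assert(dst[3] == 99);
}
```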

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index f6bdaeb..b30a095 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1032,6 +1032,14 @@ enum opcode {
 * - src1 is the number of primitives written.
 */
GS_OPCODE_FF_SYNC,
+
+   /**
+* Move r0.1 (which holds PrimitiveID information in gen6) to a separate
+* register.
+*
+* - dst is the GRF where PrimitiveID information will be moved.
+*/
+   GS_OPCODE_SET_PRIMITIVE_ID,
 };
 
 enum brw_urb_write_flags {
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index b927601..fc3146c 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -534,6 +534,8 @@ brw_instruction_name(enum opcode op)
   return "get_instance_id";
case GS_OPCODE_FF_SYNC:
   return "ff_sync";
+   case GS_OPCODE_SET_PRIMITIVE_ID:
+  return "set_primitive_id";
 
default:
   /* Yes, this leaks.  It's in debug code, it should never occur, and if
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index ad3a77f..6e0da6d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -665,6 +665,7 @@ private:
void generate_gs_ff_sync(struct brw_reg dst,
 struct brw_reg src0,
 struct brw_reg src1);
+   void generate_gs_set_primitive_id(struct brw_reg dst);
void generate_oword_dual_block_offsets(struct brw_reg m1,
  struct brw_reg index);
void generate_scratch_write(vec4_instruction *inst,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 2bf2b67..8293f60 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -694,6 +694,19 @@ vec4_generator::generate_gs_ff_sync(struct brw_reg dst,
 }
 
 void
+vec4_generator::generate_gs_set_primitive_id(struct brw_reg dst)
+{
+   /* In gen6, PrimitiveID is delivered in R0.1 of the payload */
+   struct brw_reg src = brw_vec8_grf(0, 0);
+   brw_push_insn_state(p);
+   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
+   brw_set_default_access_mode(p, BRW_ALIGN_1);
+   brw_MOV(p, get_element_ud(dst, 0), get_element_ud(src, 1));
+   brw_set_default_access_mode(p, BRW_ALIGN_16);
+   brw_pop_insn_state(p);
+}
+
+void
 vec4_generator::generate_oword_dual_block_offsets(struct brw_reg m1,
   struct brw_reg index)
 {
@@ -1283,6 +1296,10 @@ vec4_generator::generate_vec4_instruction(vec4_instruction *instruction,
   generate_gs_ff_sync(dst, src[0], src[1]);
   break;
 
+   case GS_OPCODE_SET_PRIMITIVE_ID:
+  generate_gs_set_primitive_id(dst);
+  break;
+
case SHADER_OPCODE_SHADER_TIME_ADD:
   brw_shader_time_add(p, src[0],
   prog_data->base.binding_table.shader_time_start);
-- 
1.9.1


[Mesa-dev] [PATCH 31/37] i965/gen6/gs: Avoid buffering transform feedback varyings twice.

2014-08-14 Thread Iago Toral Quiroga

Currently we buffer transform feedback varyings separately. This patch makes
it so that we reuse the values we have already buffered for all the output
varyings of the geometry shader instead.
---
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 181 --
 src/mesa/drivers/dri/i965/gen6_gs_visitor.h   |   8 +-
 2 files changed, 83 insertions(+), 106 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp 
b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
index fca7536..8b7b8fd 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -98,30 +98,6 @@ gen6_gs_visitor::emit_prolog()
emit(MOV(dst_reg(this->prim_count), 0u));
 
if (c->prog_data.gen6_xfb_enabled) {
-  const struct gl_transform_feedback_info *linked_xfb_info =
- &this->shader_prog->LinkedTransformFeedback;
-
-  /* Gen6 geometry shaders are required to ask for Streamed Vertex Buffer
-   * Indices values via FF_SYNC message, when Transform Feedback is
-   * enabled.
-   *
-   * To achieve this we buffer the Transform feedback outputs for each
-   * emitted vertex in xfb_output during operation. Then, when we have
-   * processed the last vertex (that is, at thread end time), we know all
-   * the required data for the FF_SYNC message header in order to receive
-   * the SVBI in the writeback.
-   *
-   * For each emitted vertex, xfb_output will hold
-   * num_transform_feedback_bindings data items plus one, which will
-   * indicate the end of the primitive. Next vertex's data comes right
-   * after.
-   */
-  this->xfb_output = src_reg(this,
- glsl_type::uint_type,
- linked_xfb_info->NumOutputs *
- c->gp->program.VerticesOut);
-  this->xfb_output_offset = src_reg(this, glsl_type::uint_type);
-  emit(MOV(dst_reg(this->xfb_output_offset), src_reg(0u)));
   /* Create a virtual register to hold destination indices in SOL */
   this->destination_indices = src_reg(this, glsl_type::uvec4_type);
   /* Create a virtual register to hold temporal values in SOL */
@@ -134,6 +110,8 @@ gen6_gs_visitor::emit_prolog()
   this->max_svbi = src_reg(this, glsl_type::uvec4_type);
   emit(MOV(dst_reg(this->max_svbi),
src_reg(retype(brw_vec1_grf(1, 4), BRW_REGISTER_TYPE_UD))));
+
+  xfb_setup();
}
 
/* PrimitveID is delivered in r0.1 of the thread payload. If the program
@@ -173,9 +151,6 @@ gen6_gs_visitor::visit(ir_emit_vertex *)
 BRW_CONDITIONAL_L));
emit(IF(BRW_PREDICATE_NORMAL));
{
-  if (c->prog_data.gen6_xfb_enabled)
- xfb_buffer_output();
-
   /* Buffer all output slots for this vertex in vertex_output */
   for (int slot = 0; slot < prog_data->vue_map.num_slots; ++slot) {
  int varying = prog_data->vue_map.slot_to_varying[slot];
@@ -557,7 +532,7 @@ gen6_gs_visitor::setup_payload()
 }
 
 void
-gen6_gs_visitor::xfb_buffer_output()
+gen6_gs_visitor::xfb_setup()
 {
static const unsigned swizzle_for_offset[4] = {
   BRW_SWIZZLE4(0, 1, 2, 3),
@@ -569,48 +544,27 @@ gen6_gs_visitor::xfb_buffer_output()
struct brw_gs_prog_data *prog_data =
   (struct brw_gs_prog_data *) &c->prog_data;
 
-   if (!prog_data->num_transform_feedback_bindings) {
-  const struct gl_transform_feedback_info *linked_xfb_info =
- &this->shader_prog->LinkedTransformFeedback;
-  int i;
-
-  /* Make sure that the VUE slots won't overflow the unsigned chars in
-   * prog_data->transform_feedback_bindings[].
-   */
-  STATIC_ASSERT(BRW_VARYING_SLOT_COUNT <= 256);
-
-  /* Make sure that we don't need more binding table entries than we've
-   * set aside for use in transform feedback.  (We shouldn't, since we
-   * set aside enough binding table entries to have one per component).
-   */
-  assert(linked_xfb_info->NumOutputs <= BRW_MAX_SOL_BINDINGS);
-
-  prog_data->num_transform_feedback_bindings = linked_xfb_info->NumOutputs;
-  for (i = 0; i < prog_data->num_transform_feedback_bindings; i++) {
- prog_data->transform_feedback_bindings[i] =
-linked_xfb_info->Outputs[i].OutputRegister;
- prog_data->transform_feedback_swizzles[i] =
-swizzle_for_offset[linked_xfb_info->Outputs[i].ComponentOffset];
-  }
-   }
-
-   /* Buffer all TF outputs for this vertex in xfb_output */
-   for (int binding = 0; binding < prog_data->num_transform_feedback_bindings;
-binding++) {
-  unsigned varying =
- prog_data->transform_feedback_bindings[binding];
-  dst_reg dst(this->xfb_output);
-  dst.reladdr = ralloc(mem_ctx, src_reg);
-  memcpy(dst.reladdr, &this->xfb_output_offset, sizeof(src_reg));
-  dst.type = output_reg[varying].type;
+   const struct gl_transform_feedback_info *linked_xfb_info =
+  &this->shader_prog->LinkedTransformFeedback;

[Mesa-dev] [PATCH 28/37] i965/gen6/gs: implement transform feedback support in gen6_gs_visitor

2014-08-14 Thread Iago Toral Quiroga

From: Samuel Iglesias Gonsalvez sigles...@igalia.com

This takes care of generating code required to handle transform feedback.
Notice that transform feedback isn't enabled yet, since that requires
additional setups in other parts of the code that will come in later patches.

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/brw_context.h   | 113 ++
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 309 +-
 src/mesa/drivers/dri/i965/gen6_gs_visitor.h   |  14 ++
 3 files changed, 391 insertions(+), 45 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 7439da1..3418b76 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -553,48 +553,6 @@ struct brw_vs_prog_data {
bool uses_vertexid;
 };
 
-
-/* Note: brw_gs_prog_data_compare() must be updated when adding fields to
- * this struct!
- */
-struct brw_gs_prog_data
-{
-   struct brw_vec4_prog_data base;
-
-   /**
-* Size of an output vertex, measured in HWORDS (32 bytes).
-*/
-   unsigned output_vertex_size_hwords;
-
-   unsigned output_topology;
-
-   /**
-* Size of the control data (cut bits or StreamID bits), in hwords (32
-* bytes).  0 if there is no control data.
-*/
-   unsigned control_data_header_size_hwords;
-
-   /**
-* Format of the control data (either GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_SID
-* if the control data is StreamID bits, or
-* GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_CUT if the control data is cut bits).
-* Ignored if control_data_header_size is 0.
-*/
-   unsigned control_data_format;
-
-   bool include_primitive_id;
-
-   int invocations;
-
-   /**
-* Dispatch mode, can be any of:
-* GEN7_GS_DISPATCH_MODE_DUAL_OBJECT
-* GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE
-* GEN7_GS_DISPATCH_MODE_SINGLE
-*/
-   int dispatch_mode;
-};
-
 /** Number of texture sampler units */
 #define BRW_MAX_TEX_UNIT 32
 
@@ -641,6 +599,77 @@ struct brw_gs_prog_data
 #define SURF_INDEX_GEN6_SOL_BINDING(t) (t)
 #define BRW_MAX_GEN6_GS_SURFACES   SURF_INDEX_GEN6_SOL_BINDING(BRW_MAX_SOL_BINDINGS)
 
+/* Note: brw_gs_prog_data_compare() must be updated when adding fields to
+ * this struct!
+ */
+struct brw_gs_prog_data
+{
+   struct brw_vec4_prog_data base;
+
+   /**
+* Size of an output vertex, measured in HWORDS (32 bytes).
+*/
+   unsigned output_vertex_size_hwords;
+
+   unsigned output_topology;
+
+   /**
+* Size of the control data (cut bits or StreamID bits), in hwords (32
+* bytes).  0 if there is no control data.
+*/
+   unsigned control_data_header_size_hwords;
+
+   /**
+* Format of the control data (either GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_SID
+* if the control data is StreamID bits, or
+* GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_CUT if the control data is cut bits).
+* Ignored if control_data_header_size is 0.
+*/
+   unsigned control_data_format;
+
+   bool include_primitive_id;
+
+   int invocations;
+
+   /**
+* Dispatch mode, can be any of:
+* GEN7_GS_DISPATCH_MODE_DUAL_OBJECT
+* GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE
+* GEN7_GS_DISPATCH_MODE_SINGLE
+*/
+   int dispatch_mode;
+
+   /**
+* Gen6 transform feedback enabled flag.
+*/
+   bool gen6_xfb_enabled;
+
+   /**
+* Gen6: Provoking vertex convention for odd-numbered triangles
+* in tristrips.
+*/
+   GLuint pv_first:1;
+
+   /**
+* Gen6: Number of varyings that are output to transform feedback.
+*/
+   GLuint num_transform_feedback_bindings:7; /* 0-BRW_MAX_SOL_BINDINGS */
+
+   /**
+* Gen6: Map from the index of a transform feedback binding table entry to the
+* gl_varying_slot that should be streamed out through that binding table
+* entry.
+*/
+   unsigned char transform_feedback_bindings[BRW_MAX_SOL_BINDINGS];
+
+   /**
+* Gen6: Map from the index of a transform feedback binding table entry to the
+* swizzles that should be used when streaming out data through that
+* binding table entry.
+*/
+   unsigned char transform_feedback_swizzles[BRW_MAX_SOL_BINDINGS];
+};
+
 /**
  * Stride in bytes between shader_time entries.
  *
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp 
b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
index c1cfe75..b8eaa58 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -97,6 +97,45 @@ gen6_gs_visitor::emit_prolog()
this->prim_count = src_reg(this, glsl_type::uint_type);
emit(MOV(dst_reg(this->prim_count), 0u));
 
+   if (c->prog_data.gen6_xfb_enabled) {
+  const struct gl_transform_feedback_info *linked_xfb_info =
+ &this->shader_prog->LinkedTransformFeedback;
+
+  /* Gen6 geometry shaders are required to ask for Streamed Vertex Buffer
+   * Indices values via FF_SYNC message, when Transform Feedback is
+   * enabled.
+

[Mesa-dev] [PATCH 16/37] i965/gen6/gs: Add initial implementation for a gen6 geometry shader visitor.

2014-08-14 Thread Iago Toral Quiroga

Geometry shaders in gen6 are significantly different from gen7+ so it is better
to have them implemented in a different file rather than adding gen6 branching
paths all over brw_vec4_gs_visitor.cpp.

This commit adds an initial implementation that only handles point output, which
is the simplest case.
---
 src/mesa/drivers/dri/i965/Makefile.sources  |   1 +
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h |   2 +-
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp   | 345 
 src/mesa/drivers/dri/i965/gen6_gs_visitor.h |  67 +
 4 files changed, 414 insertions(+), 1 deletion(-)
 create mode 100644 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
 create mode 100644 src/mesa/drivers/dri/i965/gen6_gs_visitor.h
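
The layout of the vertex_output buffer described in the emit_prolog()
comment of this patch (num_slots data items per vertex followed by one
flags item) can be sketched as index arithmetic. Everything named here is
hypothetical: VertexBufferLayout is not a driver type, and max_vertices is
an assumed second factor for the buffer size, since the allocation
expression is cut off in this copy of the patch.

```cpp
#include <cassert>

// Hypothetical helper describing the buffered-output layout.
struct VertexBufferLayout {
   int num_slots;     // number of VUE slots per vertex (vue_map.num_slots)
   int max_vertices;  // assumption: upper bound on emitted vertices

   // Each vertex stores its data items plus one extra item for the flags.
   int items_per_vertex() const { return num_slots + 1; }

   // Total number of items the buffer has to hold.
   int total_items() const { return items_per_vertex() * max_vertices; }

   // Index of data item 'slot' of vertex 'vertex'.
   int slot_offset(int vertex, int slot) const
   {
      assert(vertex >= 0 && vertex < max_vertices);
      assert(slot >= 0 && slot < num_slots);
      return vertex * items_per_vertex() + slot;
   }

   // Index of the flags item, which comes right after the vertex's data.
   int flags_offset(int vertex) const
   {
      assert(vertex >= 0 && vertex < max_vertices);
      return vertex * items_per_vertex() + num_slots;
   }
};

// Usage example: 4 slots per vertex, up to 3 vertices.
void example()
{
   VertexBufferLayout l{4, 3};
   assert(l.total_items() == 15);
   assert(l.flags_offset(0) == 4);   // flags follow vertex 0's four items
   assert(l.slot_offset(1, 0) == 5); // vertex 1 starts right after them
}
```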

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 3fb647b..deada5f 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -121,6 +121,7 @@ i965_FILES = \
gen6_clip_state.c \
gen6_depthstencil.c \
gen6_gs_state.c \
+   gen6_gs_visitor.cpp \
 gen6_multisample_state.c \
gen6_queryobj.c \
gen6_sampler_state.c \
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h 
b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
index 0be7559..8bf11fa 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
@@ -95,7 +95,7 @@ protected:
virtual void visit(ir_emit_vertex *);
virtual void visit(ir_end_primitive *);
 
-private:
+protected:
int setup_varying_inputs(int payload_reg, int *attribute_map,
 int attributes_per_reg);
void emit_control_data_bits();
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp 
b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
new file mode 100644
index 000..b78c55e
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -0,0 +1,345 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * This code is based on original work by Ilia Mirkin.
+ */
+
+/**
+ * \file gen6_gs_visitor.cpp
+ *
+ * Gen6 geometry shader implementation
+ */
+
+#include "gen6_gs_visitor.h"
+
+namespace brw {
+
+void
+gen6_gs_visitor::emit_prolog()
+{
+   vec4_gs_visitor::emit_prolog();
+
+   /* Gen6 geometry shaders require to allocate an initial VUE handle via
+* FF_SYNC message, however the documentation remarks that only one thread
+* can write to the URB simultaneously and the FF_SYNC message provides the
+* synchronization mechanism for this, so using this message effectively
+* stalls the thread until it is its turn to write to the URB. Because of
+* this, the best way to implement geometry shader algorithms in gen6 is to
+* execute the algorithm before the FF_SYNC message to maximize parallelism.
+*
+* To achieve this we buffer the geometry shader outputs for each emitted
+* vertex in vertex_output during operation. Then, when we have processed
+* the last vertex (that is, at thread end time), we send the FF_SYNC
+* message to allocate the initial VUE handle and write all buffered vertex
+* data to the URB in one go.
+*
+* For each emitted vertex, vertex_output will hold vue_map.num_slots
+* data items plus one additional item to hold required flags
+* (PrimType, PrimStart, PrimEnd, as expected by the URB_WRITE message)
+* which come right after the data items for that vertex. Vertex data and
+* flags for the next vertex come right after the data items and flags for
+* the previous vertex.
+*/
+   this->current_annotation = "gen6 prolog";
+   this->vertex_output = src_reg(this,
+ glsl_type::uint_type,
+ (prog_data->vue_map.num_slots + 1) *
+

[Mesa-dev] [PATCH 08/37] i965/gen6/gs: Implement GS_OPCODE_URB_WRITE_ALLOCATE.

2014-08-14 Thread Iago Toral Quiroga

Gen6 geometry shaders need to allocate URB handles for each new vertex they
emit after the first (the URB handle for the first vertex is obtained via the
FF_SYNC message).

This opcode adds the URB allocation mechanism to regular URB writes.
---
 src/mesa/drivers/dri/i965/brw_defines.h  |  8 +++
 src/mesa/drivers/dri/i965/brw_shader.cpp |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4.cpp   |  1 +
 src/mesa/drivers/dri/i965/brw_vec4.h |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 30 
 5 files changed, 42 insertions(+)
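
The handle flow described above (first handle from FF_SYNC, every later
handle from the previous URB write) can be sketched with a toy model. The
Urb struct and its integer handles are hypothetical, and the assumption
that the last vertex is written without requesting another handle is mine;
the commit message does not state it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical URB model: handles are just increasing integers.
struct Urb {
   int next_handle = 100;
   std::vector<int> written;  // handles that received vertex data, in order

   // Models FF_SYNC: returns the initial handle.
   int ff_sync() { return next_handle++; }

   // Models a write with allocation: writes to 'handle' and returns a new
   // handle for the next vertex.
   int write_allocate(int handle)
   {
      written.push_back(handle);
      return next_handle++;
   }

   // Models a plain write: no new handle is requested.
   void write(int handle) { written.push_back(handle); }
};

// Emits 'count' vertices and returns the handles they were written to.
std::vector<int> emit_vertices(Urb &urb, int count)
{
   assert(count > 0);
   int handle = urb.ff_sync();
   for (int i = 0; i < count; ++i) {
      if (i + 1 < count)
         handle = urb.write_allocate(handle);
      else
         urb.write(handle);
   }
   return urb.written;
}
```

In this model three vertices consume exactly three handles: one from
ff_sync() and two from write_allocate().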

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 125d728..60b3846 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -929,6 +929,14 @@ enum opcode {
GS_OPCODE_URB_WRITE,
 
/**
+* Write geometry shader output data to the URB and request a new URB
+* handle (gen6).
+*
+* This opcode doesn't do an implied move from R0 to the first MRF.
+*/
+   GS_OPCODE_URB_WRITE_ALLOCATE,
+
+   /**
 * Terminate the geometry shader thread by doing an empty URB write.
 *
 * This opcode doesn't do an implied move from R0 to the first MRF.  This
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 5749061..69d16a7 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -514,6 +514,8 @@ brw_instruction_name(enum opcode op)
 
case GS_OPCODE_URB_WRITE:
   return "gs_urb_write";
+   case GS_OPCODE_URB_WRITE_ALLOCATE:
+  return "gs_urb_write_allocate";
case GS_OPCODE_THREAD_END:
   return "gs_thread_end";
case GS_OPCODE_SET_WRITE_OFFSET:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index b572b61..e413a05 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -274,6 +274,7 @@ vec4_visitor::implied_mrf_writes(vec4_instruction *inst)
case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
   return 3;
case GS_OPCODE_URB_WRITE:
+   case GS_OPCODE_URB_WRITE_ALLOCATE:
case GS_OPCODE_THREAD_END:
   return 0;
case SHADER_OPCODE_SHADER_TIME_ADD:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 72fabdd..c1daf54 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -649,6 +649,7 @@ private:
 
void generate_vs_urb_write(vec4_instruction *inst);
void generate_gs_urb_write(vec4_instruction *inst);
+   void generate_gs_urb_write_allocate(vec4_instruction *inst);
void generate_gs_thread_end(vec4_instruction *inst);
void generate_gs_set_write_offset(struct brw_reg dst,
  struct brw_reg src0,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 05f4892..8ef0c34 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -429,6 +429,32 @@ vec4_generator::generate_gs_urb_write(vec4_instruction *inst)
 }
 
 void
+vec4_generator::generate_gs_urb_write_allocate(vec4_instruction *inst)
+{
+   struct brw_reg src = brw_message_reg(inst->base_mrf);
+
+   /* We pass the temporary passed in src0 as the writeback register */
+   brw_urb_WRITE(p,
+ inst->get_src(this->prog_data, 0), /* dest */
+ inst->base_mrf, /* starting mrf reg nr */
+ src,
+ BRW_URB_WRITE_ALLOCATE_COMPLETE,
+ inst->mlen,
+ 1, /* response len */
+ inst->offset,  /* urb destination offset */
+ BRW_URB_SWIZZLE_INTERLEAVE);
+
+   /* Now put allocated urb handle in dst.0 */
+   brw_push_insn_state(p);
+   brw_set_default_access_mode(p, BRW_ALIGN_1);
+   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
+   brw_MOV(p, get_element_ud(inst->get_dst(), 0),
+   get_element_ud(inst->get_src(this->prog_data, 0), 0));
+   brw_set_default_access_mode(p, BRW_ALIGN_16);
+   brw_pop_insn_state(p);
+}
+
+void
 vec4_generator::generate_gs_thread_end(vec4_instruction *inst)
 {
struct brw_reg src = brw_message_reg(inst->base_mrf);
@@ -1206,6 +1232,10 @@ vec4_generator::generate_vec4_instruction(vec4_instruction *instruction,
   generate_gs_urb_write(inst);
   break;
 
+   case GS_OPCODE_URB_WRITE_ALLOCATE:
+  generate_gs_urb_write_allocate(inst);
+  break;
+
case GS_OPCODE_THREAD_END:
   generate_gs_thread_end(inst);
   break;
-- 
1.9.1


[Mesa-dev] [PATCH 29/37] i965/gen6/gs: Setup SOL surfaces for user-provided geometry shaders

2014-08-14 Thread Iago Toral Quiroga

From: Samuel Iglesias Gonsalvez sigles...@igalia.com

Update gen6_gs_binding_table and gen6_sol_surface to use user-provided
geometry program information when present. This is necessary to implement
transform feedback support.

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/mesa/drivers/dri/i965/brw_context.h |   2 +-
 src/mesa/drivers/dri/i965/gen6_sol.c| 119 ++--
 2 files changed, 82 insertions(+), 39 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 3418b76..82f32af 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -914,7 +914,7 @@ struct brw_stage_state
uint32_t push_const_offset; /* Offset in the batchbuffer */
int push_const_size; /* in 256-bit register increments */
 
-   /* Binding table: pointers to SURFACE_STATE entries. */
+   /** Binding table: pointers to SURFACE_STATE entries. */
uint32_t bind_bo_offset;
uint32_t surf_offset[BRW_MAX_SURFACES];
 
diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c 
b/src/mesa/drivers/dri/i965/gen6_sol.c
index e1c1b3c..d21a010 100644
--- a/src/mesa/drivers/dri/i965/gen6_sol.c
+++ b/src/mesa/drivers/dri/i965/gen6_sol.c
@@ -41,13 +41,21 @@ gen6_update_sol_surfaces(struct brw_context *brw)
/* BRW_NEW_TRANSFORM_FEEDBACK */
struct gl_transform_feedback_object *xfb_obj =
   ctx->TransformFeedback.CurrentObject;
-   /* BRW_NEW_VERTEX_PROGRAM */
-   const struct gl_shader_program *shaderprog =
-  ctx->_Shader->CurrentProgram[MESA_SHADER_VERTEX];
-   const struct gl_transform_feedback_info *linked_xfb_info =
-  &shaderprog->LinkedTransformFeedback;
+   const struct gl_shader_program *shaderprog;
+   const struct gl_transform_feedback_info *linked_xfb_info;
int i;
 
+   if (brw->geometry_program) {
+  /* BRW_NEW_GEOMETRY_PROGRAM */
+  shaderprog =
+ ctx->_Shader->CurrentProgram[MESA_SHADER_GEOMETRY];
+   } else {
+  /* BRW_NEW_VERTEX_PROGRAM */
+  shaderprog =
+ ctx->_Shader->CurrentProgram[MESA_SHADER_VERTEX];
+   }
+   linked_xfb_info = &shaderprog->LinkedTransformFeedback;
+
for (i = 0; i < BRW_MAX_SOL_BINDINGS; ++i) {
   const int surf_index = SURF_INDEX_GEN6_SOL_BINDING(i);
   if (_mesa_is_xfb_active_and_unpaused(ctx) &&
@@ -56,12 +64,24 @@ gen6_update_sol_surfaces(struct brw_context *brw)
  unsigned buffer_offset =
 xfb_obj->Offset[buffer] / 4 +
 linked_xfb_info->Outputs[i].DstOffset;
- brw_update_sol_surface(
-brw, xfb_obj->Buffers[buffer], &brw->ff_gs.surf_offset[surf_index],
-linked_xfb_info->Outputs[i].NumComponents,
-linked_xfb_info->BufferStride[buffer], buffer_offset);
+ if (brw->geometry_program) {
+brw_update_sol_surface(
+   brw, xfb_obj->Buffers[buffer],
+   &brw->gs.base.surf_offset[surf_index],
+   linked_xfb_info->Outputs[i].NumComponents,
+   linked_xfb_info->BufferStride[buffer], buffer_offset);
+ } else {
+brw_update_sol_surface(
+   brw, xfb_obj->Buffers[buffer],
+   &brw->ff_gs.surf_offset[surf_index],
+   linked_xfb_info->Outputs[i].NumComponents,
+   linked_xfb_info->BufferStride[buffer], buffer_offset);
+ }
   } else {
- brw->ff_gs.surf_offset[surf_index] = 0;
+ if (!brw->geometry_program)
+brw->ff_gs.surf_offset[surf_index] = 0;
+ else
+brw->gs.base.surf_offset[surf_index] = 0;
   }
}
 
@@ -73,6 +93,7 @@ const struct brw_tracked_state gen6_sol_surface = {
   .mesa = 0,
   .brw = (BRW_NEW_BATCH |
   BRW_NEW_VERTEX_PROGRAM |
+  BRW_NEW_GEOMETRY_PROGRAM |
   BRW_NEW_TRANSFORM_FEEDBACK),
   .cache = 0
},
@@ -86,38 +107,50 @@ const struct brw_tracked_state gen6_sol_surface = {
 static void
 brw_gs_upload_binding_table(struct brw_context *brw)
 {
-   struct gl_context *ctx = &brw->ctx;
-   /* BRW_NEW_VERTEX_PROGRAM */
-   const struct gl_shader_program *shaderprog =
-  ctx->_Shader->CurrentProgram[MESA_SHADER_VERTEX];
-   bool has_surfaces = false;
uint32_t *bind;
 
-   if (shaderprog) {
-  const struct gl_transform_feedback_info *linked_xfb_info =
-&shaderprog->LinkedTransformFeedback;
-  /* Currently we only ever upload surfaces for SOL. */
-  has_surfaces = linked_xfb_info->NumOutputs != 0;
-   }
+   if (!brw->geometry_program) {
+  struct gl_context *ctx = &brw->ctx;
+  /* BRW_NEW_VERTEX_PROGRAM */
+  const struct gl_shader_program *shaderprog =
+ ctx->_Shader->CurrentProgram[MESA_SHADER_VERTEX];
+  bool has_surfaces = false;
+
+  if (shaderprog) {
+ const struct gl_transform_feedback_info *linked_xfb_info =
+&shaderprog->LinkedTransformFeedback;
+ /* Currently we only ever upload surfaces for SOL. */
+ has_surfaces =

[Mesa-dev] [PATCH 35/37] i965/gen6/gs: Use a specific implementation of geometry shaders for gen6.

2014-08-14 Thread Iago Toral Quiroga

In gen6 we will use the geometry shader implementation from gen6_gs_visitor.cpp
and keep the implementation in brw_vec4_gs_visitor.cpp for gen7+. Notice that
gen6_gs_visitor inherits from brw_vec4_gs_visitor, so it is not a completely
separate implementation of geometry shaders.

Also, gen6 does not support multiple dispatch modes. Its default operation mode
is equivalent to gen7's SINGLE mode, so select that in gen6 for consistency.
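
The selection policy described above can be sketched roughly as follows. This
is only an illustration: GsDispatchMode, select_dispatch_mode() and the
dual_object_fits parameter are hypothetical names, not driver API, and the
INTEL_DEBUG override is left out.

```cpp
#include <cassert>

// Hypothetical stand-ins for the GEN7_GS_DISPATCH_MODE_* constants.
enum class GsDispatchMode { Single, DualInstance, DualObject };

// Sketch of the policy: DUAL_OBJECT is only tried on gen7+ with a single
// invocation; gen6 always ends up in SINGLE mode.
// `dual_object_fits` stands for "the DUAL_OBJECT compile did not spill".
GsDispatchMode select_dispatch_mode(int gen, int invocations,
                                    bool dual_object_fits)
{
   if (gen >= 7) {
      if (invocations <= 1 && dual_object_fits)
         return GsDispatchMode::DualObject;
      return invocations <= 1 ? GsDispatchMode::Single
                              : GsDispatchMode::DualInstance;
   }
   // Gen6 has one mode of operation, equivalent to gen7's SINGLE.
   return GsDispatchMode::Single;
}
```

Keeping the gen6 case inside the same SINGLE/DUAL_INSTANCE decision is what
lets the rest of brw_gs_emit() stay shared between generations.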
---
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 56 ++-
 1 file changed, 35 insertions(+), 21 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index c2a4892..d9f658e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -28,6 +28,7 @@
  */
 
 #include "brw_vec4_gs_visitor.h"
+#include "gen6_gs_visitor.h"
 
 const unsigned MAX_GS_INPUT_VERTICES = 6;
 
@@ -634,19 +635,21 @@ brw_gs_emit(struct brw_context *brw,
   brw_dump_ir(brw, "geometry", prog, &shader->base, NULL);
}
 
-   /* Compile the geometry shader in DUAL_OBJECT dispatch mode, if we can do
-* so without spilling. If the GS invocations count > 1, then we can't use
-* dual object mode.
-*/
-   if (c->prog_data.invocations <= 1 &&
-   likely(!(INTEL_DEBUG & DEBUG_NO_DUAL_OBJECT_GS))) {
-  c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_DUAL_OBJECT;
-
-  vec4_gs_visitor v(brw, c, prog, mem_ctx, true /* no_spills */);
-  if (v.run()) {
- return generate_assembly(brw, prog, &c->gp->program.Base,
-  &c->prog_data.base, mem_ctx, &v.instructions,
-  final_assembly_size);
+   if (brw->gen >= 7) {
+  /* Compile the geometry shader in DUAL_OBJECT dispatch mode, if we can do
+   * so without spilling. If the GS invocations count > 1, then we can't use
+   * dual object mode.
+   */
+  if (c->prog_data.invocations <= 1 &&
+  likely(!(INTEL_DEBUG & DEBUG_NO_DUAL_OBJECT_GS))) {
+ c->prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_DUAL_OBJECT;
+
+ vec4_gs_visitor v(brw, c, prog, mem_ctx, true /* no_spills */);
+ if (v.run()) {
+return generate_assembly(brw, prog, &c->gp->program.Base,
+ &c->prog_data.base, mem_ctx,
+ &v.instructions, final_assembly_size);
+ }
   }
}
 
@@ -655,22 +658,33 @@ brw_gs_emit(struct brw_context *brw,
 * back to DUAL_INSTANCED or SINGLE mode, which consumes fewer registers.
 *
 * SINGLE mode is more performant when invocations == 1 and DUAL_INSTANCE
-* mode is more performant when invocations > 1.
+* mode is more performant when invocations > 1. Gen6 only supports
+* SINGLE mode.
 */
-   if (c->prog_data.invocations <= 1)
+   if (c->prog_data.invocations <= 1 || brw->gen < 7)
   c-prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_SINGLE;
else
   c-prog_data.dispatch_mode = GEN7_GS_DISPATCH_MODE_DUAL_INSTANCE;
 
-   vec4_gs_visitor v(brw, c, prog, mem_ctx, false /* no_spills */);
-   if (!v.run()) {
+   vec4_gs_visitor *gs = NULL;
+   const unsigned *ret = NULL;
+
+   if (brw->gen >= 7)
+  gs = new vec4_gs_visitor(brw, c, prog, mem_ctx, false /* no_spills */);
+   else
+  gs = new gen6_gs_visitor(brw, c, prog, mem_ctx, false /* no_spills */);
+
+   if (!gs->run()) {
   prog->LinkStatus = false;
-  ralloc_strcat(&prog->InfoLog, v.fail_msg);
-  return NULL;
+  ralloc_strcat(&prog->InfoLog, gs->fail_msg);
+   } else {
+  ret = generate_assembly(brw, prog, &c->gp->program.Base,
+  &c->prog_data.base, mem_ctx, &gs->instructions,
+  final_assembly_size);
}
 
-   return generate_assembly(brw, prog, &c->gp->program.Base, &c->prog_data.base,
-mem_ctx, &v.instructions, final_assembly_size);
+   delete gs;
+   return ret;
 }
 
 
-- 
1.9.1


[Mesa-dev] [PATCH 19/37] i965/gen6/gs: Handle the case where a geometry shader emits no output.

2014-08-14 Thread Iago Toral Quiroga

In gen6 we need to end the thread differently depending on whether we have
emitted at least one vertex or not. If we did, the EOT message must always
include the COMPLETE flag or else the GPU hangs. If we have not produced any
output, however, we can't use the COMPLETE flag.

Handling both cases separately would force us to end the program with an ENDIF
opcode, which we want to avoid (and which is not actually permitted, since it
hits an assertion). Instead, we always request a new VUE handle every time we
do a URB WRITE, even for the last vertex we emit. This way, whether we have
emitted at least one vertex or none at all, we always finish the thread without
writing to the URB, and setting the COMPLETE and UNUSED flags in the EOT
message works for both cases.
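
The reworked emit_urb_write_opcode() in this patch also derives the message
length from the MRF range instead of taking it as a parameter. A minimal
standalone sketch of that arithmetic, assuming the length counts one header
register plus the data registers (padded_urb_write_mlen() is a hypothetical
name used only for illustration):

```cpp
#include <cassert>

// Hypothetical helper mirroring the padding rule in the patch: the URB data
// (everything except the message header register) must be a multiple of two
// registers, so the total message length has to be odd.
int padded_urb_write_mlen(int base_mrf, int last_mrf)
{
   assert(last_mrf > base_mrf);
   int mlen = last_mrf - base_mrf;   // header + data registers
   if ((mlen % 2) != 1)
      mlen++;                        // pad the data to an even register count
   return mlen;
}
```

For example, a header plus three data registers (mlen 4) is padded to 5, while
a header plus two data registers (mlen 3) is left alone.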
---
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 237 +-
 src/mesa/drivers/dri/i965/gen6_gs_visitor.h   |   3 +-
 2 files changed, 118 insertions(+), 122 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
index 252e585..4a440eb 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -166,7 +166,7 @@ gen6_gs_visitor::visit(ir_end_primitive *)
 
/* Otheriwse we know that the last vertex we have processed was the last
 * vertex in the primitive and we need to set its PrimEnd flag, so do this
-* unless we haven't emitted that vertex at all.
+* unless we haven't emitted that vertex at all (vertex_count != 0).
 *
 * Notice that we have already incremented vertex_count when we processed
 * the last emit_vertex, so we need to take that into account in the
@@ -176,6 +176,10 @@ gen6_gs_visitor::visit(ir_end_primitive *)
unsigned num_output_vertices = c->gp->program.VerticesOut;
emit(CMP(dst_null_d(), this->vertex_count, src_reg(num_output_vertices + 1),
 BRW_CONDITIONAL_L));
+   vec4_instruction *inst = emit(CMP(dst_null_d(),
+ this->vertex_count, 0u,
+ BRW_CONDITIONAL_NEQ));
+   inst->predicate = BRW_PREDICATE_NORMAL;
emit(IF(BRW_PREDICATE_NORMAL));
{
   /* vertex_output_offset is already pointing at the first entry of the
@@ -224,47 +228,40 @@ gen6_gs_visitor::emit_urb_write_header(int mrf)
 }
 
 void
-gen6_gs_visitor::emit_urb_write_opcode(bool complete, src_reg vertex,
-   int base_mrf, int mlen, int urb_offset)
+gen6_gs_visitor::emit_urb_write_opcode(bool complete, int base_mrf,
+   int last_mrf, int urb_offset)
 {
vec4_instruction *inst = NULL;
 
-   /* If the vertex is not complete we don't have to do anything special */
if (!complete) {
+  /* If the vertex is not complete we don't have to do anything special */
   inst = emit(GS_OPCODE_URB_WRITE);
   inst->urb_write_flags = BRW_URB_WRITE_NO_FLAGS;
-  inst->base_mrf = base_mrf;
-  inst->mlen = mlen;
-  inst->offset = urb_offset;
-  return;
-   }
-
-   /* Otherwise, if this is not the last vertex we are going to write,
-* we have to request a new VUE handle for the next vertex.
-*
-* Notice that the vertex parameter has been pre-incremented in
-* emit_thread_end() to make this comparison easier.
-*/
-   emit(CMP(dst_null_d(), vertex, this->vertex_count, BRW_CONDITIONAL_L));
-   emit(IF(BRW_PREDICATE_NORMAL));
-   {
+   } else {
+  /* Otherwise we always request to allocate a new VUE handle. If this is
+   * the last write before the EOT message and the new handle never gets
+   * used it will be dereferenced when we send the EOT message. This is
+   * necessary to avoid different setups for the EOT message (one for the
+   * case when there is no output and another for the case when there is)
+   * which would require to end the program with an IF/ELSE/ENDIF block,
+   * something we do not want.
+   */
   inst = emit(GS_OPCODE_URB_WRITE_ALLOCATE);
   inst->urb_write_flags = BRW_URB_WRITE_COMPLETE;
-  inst->base_mrf = base_mrf;
-  inst->mlen = mlen;
-  inst->offset = urb_offset;
   inst->dst = dst_reg(MRF, base_mrf);
   inst->src[0] = this->temp;
}
-   emit(BRW_OPCODE_ELSE);
-   {
-  inst = emit(GS_OPCODE_URB_WRITE);
-  inst->urb_write_flags = BRW_URB_WRITE_COMPLETE;
-  inst->base_mrf = base_mrf;
-  inst->mlen = mlen;
-  inst->offset = urb_offset;
-   }
-   emit(BRW_OPCODE_ENDIF);
+
+   inst->base_mrf = base_mrf;
+   /* URB data written (does not include the message header reg) must
+* be a multiple of 256 bits, or 2 VS registers.  See vol5c.5,
+* section 5.4.3.2.2: URB_INTERLEAVED.
+*/
+   int mlen = last_mrf - base_mrf;
+   if ((mlen % 2) != 1)
+  mlen++;
+   inst->mlen = mlen;
+   inst->offset = urb_offset;
 }
 
 void
@@ -303,112 +300,112 @@ gen6_gs_visitor::emit_thread_end()
int max_usable_mrf =

[Mesa-dev] [PATCH 12/37] i965/gen6/gs: Upload binding table for user-provided geometry shaders.

2014-08-14 Thread Iago Toral Quiroga

---
 src/mesa/drivers/dri/i965/brw_binding_tables.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c b/src/mesa/drivers/dri/i965/brw_binding_tables.c
index 30a54ef..709cb9c 100644
--- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
+++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
@@ -215,7 +215,10 @@ gen6_upload_binding_table_pointers(struct brw_context *brw)
  GEN6_BINDING_TABLE_MODIFY_PS |
  (4 - 2));
OUT_BATCH(brw->vs.base.bind_bo_offset); /* vs */
-   OUT_BATCH(brw->ff_gs.bind_bo_offset); /* gs */
+   if (brw->ff_gs.prog_active)
+  OUT_BATCH(brw->ff_gs.bind_bo_offset); /* gs */
+   else
+  OUT_BATCH(brw->gs.base.bind_bo_offset); /* gs */
OUT_BATCH(brw->wm.base.bind_bo_offset); /* wm/ps */
ADVANCE_BATCH();
 }
-- 
1.9.1


[Mesa-dev] [PATCH 14/37] i965: Provide means to create registers of a given size.

2014-08-14 Thread Iago Toral Quiroga

Implemented by Ilia Mirkin imir...@alum.mit.edu.
---
 src/mesa/drivers/dri/i965/brw_vec4.h   |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 14 ++
 2 files changed, 15 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 5403f5a..d95b58d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -107,6 +107,7 @@ public:
bool equals(const src_reg r) const;
 
src_reg(class vec4_visitor *v, const struct glsl_type *type);
+   src_reg(class vec4_visitor *v, const struct glsl_type *type, int size);
 
explicit src_reg(dst_reg reg);
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 1b46850..e1fbcbc 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -639,6 +639,20 @@ src_reg::src_reg(class vec4_visitor *v, const struct glsl_type *type)
this->type = brw_type_for_base_type(type);
 }
 
+src_reg::src_reg(class vec4_visitor *v, const struct glsl_type *type, int size)
+{
+   assert(size > 0);
+
+   init();
+
+   this->file = GRF;
+   this->reg = v->virtual_grf_alloc(type_size(type) * size);
+
+   this->swizzle = BRW_SWIZZLE_NOOP;
+
+   this->type = brw_type_for_base_type(type);
+}
+
 dst_reg::dst_reg(class vec4_visitor *v, const struct glsl_type *type)
 {
init();
-- 
1.9.1


[Mesa-dev] [PATCH 32/37] i965/gen6/gs: Fix binding table clash between TF surfaces and textures.

2014-08-14 Thread Iago Toral Quiroga

For gen6 geometry shaders we use the first BRW_MAX_SOL_BINDINGS entries of the
binding table for transform feedback surfaces. However, vec4_visitor will
set up the binding table so that textures use the same space in the binding
table. This is done when calling assign_common_binding_table_offsets(0) as
part of its run() method.

To fix this clash we add a virtual method to the vec4_visitor hierarchy to
assign the binding table offsets, so that we can change this behavior
specifically for gen6 geometry shaders by mapping textures right after the
first BRW_MAX_SOL_BINDINGS entries.

Also, when there is no user-provided geometry shader, we only need to upload
the binding table if we have transform feedback. In the case of a
user-provided geometry shader, however, we can't look only at transform
feedback to make that decision.

This fixes multiple piglit tests for textureSize() and texelFetch() when these
functions are called from a geometry shader in gen6, like these:

bin/textureSize gs sampler2D -fbo -auto
bin/texelFetch gs usampler2D -fbo -auto
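
A stripped-down sketch of the virtual-method approach described above, for
illustration only. The *Sketch classes and BindingTableSketch are hypothetical
stand-ins for the real visitor hierarchy, and kMaxSolBindings = 64 is an
assumed value for BRW_MAX_SOL_BINDINGS:

```cpp
#include <cassert>

// Assumed value, for illustration; the real constant is BRW_MAX_SOL_BINDINGS.
constexpr unsigned kMaxSolBindings = 64;

// Hypothetical reduced binding table: only tracks where textures start.
struct BindingTableSketch {
   unsigned texture_start = 0;
};

class Vec4VisitorSketch {
public:
   virtual ~Vec4VisitorSketch() = default;
   BindingTableSketch table;

   // run() no longer hardcodes offset 0; it defers to the virtual hook.
   void run() { assign_binding_table_offsets(); }

protected:
   void assign_common_binding_table_offsets(unsigned start)
   {
      table.texture_start = start;
   }

   // Default behavior: textures start at the beginning of the table.
   virtual void assign_binding_table_offsets()
   {
      assign_common_binding_table_offsets(0);
   }
};

class Gen6GsVisitorSketch : public Vec4VisitorSketch {
protected:
   // Gen6 GS: skip the entries reserved for transform feedback surfaces.
   void assign_binding_table_offsets() override
   {
      assign_common_binding_table_offsets(kMaxSolBindings);
   }
};
```

With this shape only the gen6 geometry shader visitor changes where textures
land; every other vec4 visitor keeps the previous layout.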
---
 src/mesa/drivers/dri/i965/brw_context.h   |  8 ++-
 src/mesa/drivers/dri/i965/brw_vec4.cpp|  8 ++-
 src/mesa/drivers/dri/i965/brw_vec4.h  |  1 +
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp |  9 +++
 src/mesa/drivers/dri/i965/gen6_gs_visitor.h   |  1 +
 src/mesa/drivers/dri/i965/gen6_sol.c  | 80 ++-
 6 files changed, 78 insertions(+), 29 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 82f32af..aad7033 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -597,7 +597,6 @@ struct brw_vs_prog_data {
 2 /* shader time, pull constants */)
 
 #define SURF_INDEX_GEN6_SOL_BINDING(t) (t)
-#define BRW_MAX_GEN6_GS_SURFACES   SURF_INDEX_GEN6_SOL_BINDING(BRW_MAX_SOL_BINDINGS)
 
 /* Note: brw_gs_prog_data_compare() must be updated when adding fields to
  * this struct!
@@ -1240,7 +1239,12 @@ struct brw_context
   uint32_t state_offset;
 
   uint32_t bind_bo_offset;
-  uint32_t surf_offset[BRW_MAX_GEN6_GS_SURFACES];
+  /**
+   * Surface offsets for the binding table. We only need surfaces to
+   * implement transform feedback so BRW_MAX_SOL_BINDINGS is all that we
+   * need in this case.
+   */
+  uint32_t surf_offset[BRW_MAX_SOL_BINDINGS];
} ff_gs;
 
struct {
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index e413a05..5307861 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1529,6 +1529,12 @@ vec4_vs_visitor::setup_payload(void)
this->first_non_payload_grf = reg;
 }
 
+void
+vec4_visitor::assign_binding_table_offsets()
+{
+   assign_common_binding_table_offsets(0);
+}
+
 src_reg
 vec4_visitor::get_timestamp()
 {
@@ -1628,7 +1634,7 @@ vec4_visitor::run()
if (INTEL_DEBUG & DEBUG_SHADER_TIME)
   emit_shader_time_begin();
 
-   assign_common_binding_table_offsets(0);
+   assign_binding_table_offsets();
 
emit_prolog();
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 58a5aac..531ec68 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -589,6 +589,7 @@ protected:
void setup_payload_interference(struct ra_graph *g, int first_payload_node,
int reg_node_count);
virtual dst_reg *make_reg_for_system_value(ir_variable *ir) = 0;
+   virtual void assign_binding_table_offsets();
virtual void setup_payload() = 0;
virtual void emit_prolog() = 0;
virtual void emit_program_code() = 0;
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
index 8b7b8fd..8285efb 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -36,6 +36,15 @@ const unsigned MAX_GS_INPUT_VERTICES = 6;
 namespace brw {
 
 void
+gen6_gs_visitor::assign_binding_table_offsets()
+{
+   /* In gen6 we reserve the first BRW_MAX_SOL_BINDINGS entries for transform
+* feedback surfaces.
+*/
+   assign_common_binding_table_offsets(BRW_MAX_SOL_BINDINGS);
+}
+
+void
 gen6_gs_visitor::emit_prolog()
 {
vec4_gs_visitor::emit_prolog();
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.h b/src/mesa/drivers/dri/i965/gen6_gs_visitor.h
index db65f81..3a67fe4 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.h
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.h
@@ -43,6 +43,7 @@ public:
   vec4_gs_visitor(brw, c, prog, mem_ctx, no_spills) {}
 
 protected:
+   virtual void assign_binding_table_offsets();
virtual void emit_prolog();
virtual void emit_thread_end();
virtual void visit(ir_emit_vertex *);
diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c b/src/mesa/drivers/dri/i965/gen6_sol.c
index

[Mesa-dev] [PATCH 10/37] i965/gen6/gs: Compute URB entry size for user-provided geometry shaders.

2014-08-14 Thread Iago Toral Quiroga

---
 src/mesa/drivers/dri/i965/brw_defines.h |  8 ++-
 src/mesa/drivers/dri/i965/brw_vec4_gs.c | 87 +
 2 files changed, 62 insertions(+), 33 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 60b3846..a2b40fb 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1576,10 +1576,14 @@ enum brw_message_target {
 # define GEN7_URB_ENTRY_SIZE_SHIFT  16
 # define GEN7_URB_STARTING_ADDRESS_SHIFT25
 
-/* GS URB Entry Allocation Size is a U9-1 field, so the maximum gs_size
+/* Gen7 GS URB Entry Allocation Size is a U9-1 field, so the maximum gs_size
  * is 2^9, or 512.  It's counted in multiples of 64 bytes.
  */
-#define GEN7_MAX_GS_URB_ENTRY_SIZE_BYTES   (512*64)
+#define GEN7_MAX_GS_URB_ENTRY_SIZE_BYTES        (512*64)
+/* Gen6 GS URB Entry Allocation Size is defined as a number of 1024-bit
+ * (128 bytes) URB rows and the maximum allowed value is 5 rows.
+ */
+#define GEN6_MAX_GS_URB_ENTRY_SIZE_BYTES        (5*128)
 
 #define _3DSTATE_PUSH_CONSTANT_ALLOC_VS 0x7912 /* GEN7+ */
 #define _3DSTATE_PUSH_CONSTANT_ALLOC_GS 0x7915 /* GEN7+ */
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs.c b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
index 2d9e8c2..a445174 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
@@ -75,31 +75,36 @@ do_gs_prog(struct brw_context *brw,
 */
c.prog_data.base.base.nr_params = ALIGN(param_count, 4) / 4 + gs->num_samplers;
 
-   if (gp->program.OutputType == GL_POINTS) {
-  /* When the output type is points, the geometry shader may output data
-   * to multiple streams, and EndPrimitive() has no effect.  So we
-   * configure the hardware to interpret the control data as stream ID.
-   */
-  c.prog_data.control_data_format = GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_SID;
-
-  /* We only have to emit control bits if we are using streams */
-  if (prog->Geom.UsesStreams)
- c.control_data_bits_per_vertex = 2;
-  else
- c.control_data_bits_per_vertex = 0;
+   if (brw->gen >= 7) {
+  if (gp->program.OutputType == GL_POINTS) {
+ /* When the output type is points, the geometry shader may output data
+  * to multiple streams, and EndPrimitive() has no effect.  So we
+  * configure the hardware to interpret the control data as stream ID.
+  */
+ c.prog_data.control_data_format = GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_SID;
+
+ /* We only have to emit control bits if we are using streams */
+ if (prog->Geom.UsesStreams)
+c.control_data_bits_per_vertex = 2;
+ else
+c.control_data_bits_per_vertex = 0;
+  } else {
+ /* When the output type is triangle_strip or line_strip, EndPrimitive()
+  * may be used to terminate the current strip and start a new one
+  * (similar to primitive restart), and outputting data to multiple
+  * streams is not supported.  So we configure the hardware to interpret
+  * the control data as EndPrimitive information (a.k.a. cut bits).
+  */
+ c.prog_data.control_data_format = GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_CUT;
+
+ /* We only need to output control data if the shader actually calls
+  * EndPrimitive().
+  */
+ c.control_data_bits_per_vertex = gp->program.UsesEndPrimitive ? 1 : 0;
+  }
} else {
-  /* When the output type is triangle_strip or line_strip, EndPrimitive()
-   * may be used to terminate the current strip and start a new one
-   * (similar to primitive restart), and outputting data to multiple
-   * streams is not supported.  So we configure the hardware to interpret
-   * the control data as EndPrimitive information (a.k.a. cut bits).
-   */
-  c.prog_data.control_data_format = GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_CUT;
-
-  /* We only need to output control data if the shader actually calls
-   * EndPrimitive().
-   */
-  c.control_data_bits_per_vertex = gp->program.UsesEndPrimitive ? 1 : 0;
+  /* There are no control data bits in gen6. */
+  c.control_data_bits_per_vertex = 0;
}
c.control_data_header_size_bits =
   gp-program.VerticesOut * c.control_data_bits_per_vertex;
@@ -170,7 +175,8 @@ do_gs_prog(struct brw_context *brw,
 *
 */
unsigned output_vertex_size_bytes = c.prog_data.base.vue_map.num_slots * 16;
-   assert(output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES);
+   assert(brw->gen == 6 ||
+  output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES);
c.prog_data.output_vertex_size_hwords =
   ALIGN(output_vertex_size_bytes, 32) / 32;
 
@@ -200,10 +206,20 @@ do_gs_prog(struct brw_context *brw,
 * the above figures are all worst-case, and most of them

[Mesa-dev] [PATCH 18/37] i965/gen6/gs: Make sure we complete the last primitive.

2014-08-14 Thread Iago Toral Quiroga

Just in case the GS algorithm does not call EndPrimitive() for the last
primitive produced. This is relevant only for non-point outputs, since for
points we are already setting the PrimEnd flag on each vertex we emit.
---
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
index 5123bd7..252e585 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -270,6 +270,19 @@ gen6_gs_visitor::emit_urb_write_opcode(bool complete, src_reg vertex,
 void
 gen6_gs_visitor::emit_thread_end()
 {
+   /* Make sure the current primitive is ended: we know it is not ended when
+* first_vertex is not zero. This is only relevant for outputs other than
+* points because in the point case we set PrimEnd on all vertices.
+*/
+   if (c->gp->program.OutputType != GL_POINTS) {
+  emit(CMP(dst_null_d(), this->first_vertex, 0u, BRW_CONDITIONAL_Z));
+  emit(IF(BRW_PREDICATE_NORMAL));
+  {
+ visit((ir_end_primitive *) NULL);
+  }
+  emit(BRW_OPCODE_ENDIF);
+   }
+
/* Here we have to:
 * 1) Emit an FF_SYNC messsage to obtain an initial VUE handle.
 * 2) Loop over all buffered vertex data and write it to corresponding
-- 
1.9.1


[Mesa-dev] [PATCH 23/37] i965/gen6/gs: Enable texture units and upload sampler state.

2014-08-14 Thread Iago Toral Quiroga

---
 src/mesa/drivers/dri/i965/brw_context.c| 2 +-
 src/mesa/drivers/dri/i965/brw_state_upload.c   | 1 +
 src/mesa/drivers/dri/i965/gen6_sampler_state.c | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index bf2aedb..bc0e1dd 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -300,7 +300,7 @@ brw_initialize_context_constants(struct brw_context *brw)
   MIN2(ctx->Const.MaxTextureCoordUnits,
ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxTextureImageUnits);
ctx->Const.Program[MESA_SHADER_VERTEX].MaxTextureImageUnits = max_samplers;
-   if (brw->gen >= 7)
+   if (brw->gen >= 6)
   ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxTextureImageUnits = max_samplers;
else
   ctx->Const.Program[MESA_SHADER_GEOMETRY].MaxTextureImageUnits = 0;
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index a52a8f4..b0d78ab 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -147,6 +147,7 @@ static const struct brw_tracked_state *gen6_atoms[] =
 
&brw_fs_samplers,
&brw_vs_samplers,
+   &brw_gs_samplers,
&gen6_sampler_state,
&gen6_multisample_state,
 
diff --git a/src/mesa/drivers/dri/i965/gen6_sampler_state.c b/src/mesa/drivers/dri/i965/gen6_sampler_state.c
index 981e98f..9c6c508 100644
--- a/src/mesa/drivers/dri/i965/gen6_sampler_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sampler_state.c
@@ -40,7 +40,7 @@ upload_sampler_state_pointers(struct brw_context *brw)
 PS_SAMPLER_STATE_CHANGE |
 (4 - 2));
OUT_BATCH(brw->vs.base.sampler_offset); /* VS */
-   OUT_BATCH(0); /* GS */
+   OUT_BATCH(brw->gs.base.sampler_offset); /* GS */
OUT_BATCH(brw->wm.base.sampler_offset);
ADVANCE_BATCH();
 }
-- 
1.9.1


[Mesa-dev] [PATCH 21/37] i965/gen6/gs: Implement support for gl_PrimitiveIdIn.

2014-08-14 Thread Iago Toral Quiroga

For this we will need to move PrimitiveID information, delivered in the thread
payload in r0.1, to a separate register (we use GS_OPCODE_SET_PRIMITIVE_ID
for this), then map the corresponding varying slot to that register in the
setup_payload() method.

Notice that we cannot use a virtual register as the destination for the
PrimitiveID because we need to map all input attributes to hardware registers
in setup_payload(), which happens before virtual registers are mapped to
hardware registers. We could work around that issue if we were able to compute
the first non-payload register in emit_prolog() and move the PrimitiveID
information to that register, but we can't because at that point we still
don't know the final number of uniforms that will be included in the payload.

So, what we do is place PrimitiveID information in r1, which is always
delivered as part of the payload but is only populated with data
relevant for transform feedback when we set GEN6_GS_SVBI_PAYLOAD_ENABLE
in the 3DSTATE_GS state packet.

When we implement transform feedback, we will make sure to move the value of r1
to another register before we overwrite it with the PrimitiveID.
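
As a rough illustration of the fixed part of the payload described above (r0
always present, r1 always delivered and reused for the PrimitiveID when the
shader needs it), here is a small sketch. Gen6GsPayloadSketch and
layout_fixed_payload() are hypothetical names, and treating the register after
r1 as the next free one is an assumption, not something this message states:

```cpp
#include <cassert>

// Hypothetical summary of the fixed registers in the gen6 GS thread payload.
struct Gen6GsPayloadSketch {
   int primitive_id_reg;   // hardware register holding PrimitiveID, or -1
   int next_free_reg;      // first register after the fixed payload part
};

Gen6GsPayloadSketch layout_fixed_payload(bool include_primitive_id)
{
   int reg = 0;

   // r0 always carries payload data (PrimitiveID arrives in r0.1).
   reg++;

   // r1 is always delivered; the PrimitiveID is copied there when needed.
   int primitive_id_reg = include_primitive_id ? reg : -1;
   reg++;

   return { primitive_id_reg, reg };
}
```

Using a fixed hardware register sidesteps the ordering problem: the location
is known before the number of uniforms in the payload has been settled.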
---
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 69 ++-
 src/mesa/drivers/dri/i965/gen6_gs_visitor.h   |  2 +
 2 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
index 4a440eb..b45c381 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
@@ -31,6 +31,8 @@
 
 #include "gen6_gs_visitor.h"
 
+const unsigned MAX_GS_INPUT_VERTICES = 6;
+
 namespace brw {
 
 void
@@ -38,6 +40,7 @@ gen6_gs_visitor::emit_prolog()
 {
vec4_gs_visitor::emit_prolog();
 
+   this->current_annotation = "gen6 prolog";
/* Gen6 geometry shaders require to allocate an initial VUE handle via
 * FF_SYNC message, however the documentation remarks that only one thread
 * can write to the URB simultaneously and the FF_SYNC message provides the
@@ -59,7 +62,6 @@ gen6_gs_visitor::emit_prolog()
 * flags for the next vertex come right after the data items and flags for
 * the previous vertex.
 */
-   this->current_annotation = "gen6 prolog";
this->vertex_output = src_reg(this,
  glsl_type::uint_type,
  (prog_data->vue_map.num_slots + 1) *
@@ -94,6 +96,30 @@ gen6_gs_visitor::emit_prolog()
 */
this->prim_count = src_reg(this, glsl_type::uint_type);
emit(MOV(dst_reg(this->prim_count), 0u));
+
+   /* PrimitveID is delivered in r0.1 of the thread payload. If the program
+* needs it we have to move it to a separate register where we can map
+* the atttribute.
+*
+* Notice that we cannot use a virtual register for this, because we need to
+* map all input attributes to hardware registers in setup_payload(),
+* which happens before virtual registers are mapped to hardware registers.
+* We could work around that issue if we were able to compute the first
+* non-payload register here and move the PrimitiveID information to that
+* register, but we can't because at this point we don't know the final
+* number uniforms that will be included in the payload.
+*
+* So, what we do is to place PrimitiveID information in r1, which is always
+* delivered as part of the payload, but its only populated with data
+* relevant for transform feedback when we set GEN6_GS_SVBI_PAYLOAD_ENABLE
+* in the 3DSTATE_GS state packet. That information can be obtained by other
+* means though, so we can safely use r1 for this purpose.
+*/
+   if (c->prog_data.include_primitive_id) {
+  this->primitive_id =
+ src_reg(retype(brw_vec8_grf(1, 0), BRW_REGISTER_TYPE_UD));
+  emit(GS_OPCODE_SET_PRIMITIVE_ID, dst_reg(this->primitive_id));
+   }
 }
 
 void
@@ -410,4 +436,45 @@ gen6_gs_visitor::emit_thread_end()
inst->mlen = 1;
 }
 
+void
+gen6_gs_visitor::setup_payload()
+{
+   int attribute_map[BRW_VARYING_SLOT_COUNT * MAX_GS_INPUT_VERTICES];
+
+   /* Attributes are going to be interleaved, so one register contains two
+* attribute slots.
+*/
+   int attributes_per_reg = 2;
+
+   /* If a geometry shader tries to read from an input that wasn't written by
+* the vertex shader, that produces undefined results, but it shouldn't
+* crash anything.  So initialize attribute_map to zeros--that ensures that
+* these undefined results are read from r0.
+*/
+   memset(attribute_map, 0, sizeof(attribute_map));
+
+   int reg = 0;
+
+   /* The payload always contains important data in r0. */
+   reg++;
+
+   /* r1 is always part of the payload and it holds information relevant
+* for transform feedback when we set the GEN6_GS_SVBI_PAYLOAD_ENABLE bit in
+* the 3DSTATE_GS packet. We will overwrite it with the PrimitiveID
+* information (and move the original

Re: [Mesa-dev] [PATCH 5/5] nv50, nvc0: add support for fine derivatives

2014-08-14 Thread Chris Forbes

I've included an appropriate release notes change covering nv50 and
nvc0 in my i965 follow-up series, on the assumption that these two
will land more or less together.

On Thu, Aug 14, 2014 at 11:00 PM, Marek Olšák mar...@gmail.com wrote:
 Are you gonna update the release notes too?

 Marek

 On Thu, Aug 14, 2014 at 6:52 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
 The quadop-based method we currently use on all chipsets already
 provides the fine version of the derivatives.

 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
 ---
  docs/GL3.txt  | 2 +-
  src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 4 
  src/gallium/drivers/nouveau/nv50/nv50_screen.c| 2 +-
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c| 2 +-
  4 files changed, 7 insertions(+), 3 deletions(-)

 diff --git a/docs/GL3.txt b/docs/GL3.txt
 index 89529fe..0a40e23 100644
 --- a/docs/GL3.txt
 +++ b/docs/GL3.txt
 @@ -189,7 +189,7 @@ GL 4.5, GLSL 4.50:
GL_ARB_clip_control  not started
GL_ARB_conditional_render_inverted   not started
GL_ARB_cull_distance not started
 -  GL_ARB_derivative_controlnot started
 +  GL_ARB_derivative_controlDONE (nv50, nvc0)
GL_ARB_direct_state_access   not started
GL_ARB_get_texture_sub_image started (Brian Paul)
GL_ARB_shader_texture_image_samples  not started
 diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
 b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
 index 14b6d68..456efcb 100644
 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
 +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
 @@ -531,7 +531,9 @@ static nv50_ir::operation translateOpcode(uint opcode)

 NV50_IR_OPCODE_CASE(COS, COS);
 NV50_IR_OPCODE_CASE(DDX, DFDX);
 +   NV50_IR_OPCODE_CASE(DDX_FINE, DFDX);
 NV50_IR_OPCODE_CASE(DDY, DFDY);
 +   NV50_IR_OPCODE_CASE(DDY_FINE, DFDY);
 NV50_IR_OPCODE_CASE(KILL, DISCARD);

 NV50_IR_OPCODE_CASE(SEQ, SET);
 @@ -2327,6 +2329,8 @@ Converter::handleInstruction(const struct 
 tgsi_full_instruction *insn)
 case TGSI_OPCODE_NOT:
 case TGSI_OPCODE_DDX:
 case TGSI_OPCODE_DDY:
 +   case TGSI_OPCODE_DDX_FINE:
 +   case TGSI_OPCODE_DDY_FINE:
FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi)
   mkOp1(op, dstTy, dst0[c], fetchSrc(0, c));
break;
 diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
 b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
 index 34cca3d..8a9a40e 100644
 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
 +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
 @@ -169,6 +169,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum 
 pipe_cap param)
 case PIPE_CAP_USER_VERTEX_BUFFERS:
 case PIPE_CAP_TEXTURE_MULTISAMPLE:
 case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
 +   case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
return 1;
 case PIPE_CAP_SEAMLESS_CUBE_MAP:
   return 1; /* class_3d >= NVA0_3D_CLASS; */
 @@ -200,7 +201,6 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum 
 pipe_cap param)
 case PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION:
 case PIPE_CAP_COMPUTE:
 case PIPE_CAP_DRAW_INDIRECT:
 -   case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
return 0;
 }

 diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
 b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
 index 17aee63..c6d9b91 100644
 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
 +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
 @@ -167,6 +167,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
 pipe_cap param)
 case PIPE_CAP_SAMPLE_SHADING:
 case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
 case PIPE_CAP_TEXTURE_GATHER_SM5:
 +   case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
return 1;
 case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
       return (class_3d >= NVE4_3D_CLASS) ? 1 : 0;
 @@ -184,7 +185,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
 pipe_cap param)
 case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT:
 case PIPE_CAP_FAKE_SW_MSAA:
 case PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION:
 -   case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
return 0;
 }

 --
 1.8.5.5

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [RFC] initial ARB_gpu_shader_fp64 posting

2014-08-14 Thread Tapani Pälli

Hi;

On 08/14/2014 01:52 PM, Dave Airlie wrote:
 This is just the mesa and glsl compiler portions of the ARB_gpu_shader_fp64
 extension that I've been slowly iterating over the past few months.

 All in 
 http://cgit.freedesktop.org/~airlied/mesa/log/?h=arb_gpu_shader_fp64-submit 
 but underneath the gallium + softpipe + mesa/st development, which all
 need further cleaning and docs.

I have some fixes/changes to this which I rebased on top of your latest
tree, these are available here:

http://cgit.freedesktop.org/~tpalli/mesa/log/?h=fp64_fixes

Notably, the last one (i965 changes) is very experimental and should
probably be ignored for now; the others should be useful and fix the fp64
tests I've been sending to Piglit.

I introduced 'i2d' and 'u2d'. I'm not sure if this is wanted, but it makes
the implicit conversions in ast_to_hir.cpp cleaner; the other option would
be to refactor the implicit conversion code a bit. Let me know your
thoughts; I can go for the refactor if these are not wanted.

Thanks;

 The biggest bits of this are the builtin generator, constant expression 
 handling and uniform interfaces. I suspect there are chunks in some patches 
 that might need to be in others, and the uniform patches are probably not very 
 well explained, mostly because I can't remember why exactly I did what I did 
 in a few places.

 Dave.

// Tapani

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] SandyBridge not handling GL_TRIANGLE_STRIP_ADJACENCY with repeating vertex indices correctly

2014-08-14 Thread Iago Toral Quiroga

On mar, 2014-07-29 at 10:12 +0200, Iago Toral Quiroga wrote:
 Hi,
 
 running the piglit tests on my implementation of geometry shaders for
 Sandy Bridge produces a GPU hang for the following test:
 
 ./glsl-1.50-geometry-primitive-id-restart GL_TRIANGLE_STRIP_ADJACENCY
 ffs
 
 That test checks primitive restarts but the hang seems to be unrelated
 to that, since it happens also when primitive restart is not enabled.
 The problem, which only affects GL_TRIANGLE_STRIP_ADJACENCY and no other
 primitive type (with or without adjacency), is in this loop that the
 test uses to setup the indices for the vertices:
 
 elements = glMapBuffer(GL_ELEMENT_ARRAY_BUFFER, GL_READ_WRITE);
 num_elements = 0;
 for (i = 1; i <= LONGEST_INPUT_SEQUENCE; i++) {
    for (j = 0; j < i; j++) {
       /* Every element that isn't the primitive
        * restart index can just be element 0, since
        * we don't care about the actual vertex data.
        */
       elements[num_elements++] = 0;
    }
    elements[num_elements++] = prim_restart_index;
 }
 glUnmapBuffer(GL_ELEMENT_ARRAY_BUFFER);
 
 Setting all elements to the same index (0 in this case) is the one thing
 that causes the hang for GL_TRIANGLE_STRIP_ADJACENCY. A simple change
 like this removes the hang:
 -  elements[num_elements++] = 0;
 +  elements[num_elements++] = j != prim_restart_index ? j : j + 1;
 
 Skimming through the docs I have not seen any references to this being a
 known problem. In fact, I don't see any references to
 GL_TRIANGLE_STRIP_ADJACENCY being special in any way and it seems that
 this is not a problem in IvyBridge, since the test runs correctly there.
 
 Does this sound like a hardware bug specific to SandyBridge's handling
 of GL_TRIANGLE_STRIP_ADJACENCY or is there something else I should check
 before arriving at that conclusion?
 
 If it is a hardware bug, I guess we want a workaround for it, at least
 to prevent the hang. I am not sure what the best option would be here.
 I think the only option for the driver would be to scan the list of
 indices provided when this primitive type is used and, when we hit this
 scenario (I'd have to test how many repeating indices we need for it to
 hang), error out and not execute the drawing command. Any other
 suggestions?

This is what I found so far:

1. the problem is specific to glDrawElements. glDrawArrays works well
even if all the vertices used have the same coordinates. To me this
suggests that the problem should not be in our implementation of GS,
since using glDrawArrays or glDrawElements is handled elsewhere and
should be transparent to the implementation of the GS stage.

2. The problem does not happen in all situations, only when we repeat
values in the indices we use with glDrawElements. In particular, I found
that the pattern that leads to the hang seems to be:
  - There are only 8 indices and all of them are the same.
  - There are more than 8 indices and there is at least one subset of 9
consecutive indices where at least 8 indices are the same (they do not
need to be consecutive within the group of 9).

3. The problem is specific to GL_TRIANGLE_STRIP_ADJACENCY. It does not
hang for any other primitive. In fact, other primitives work well and
produce the expected results. I have not seen specific requirements for
this primitive type in the docs that could justify something like this.
Even GL_TRIANGLE_STRIP_ADJACENCY seems to work well except when there
are repeating vertices with that specific pattern in glDrawElements.

4. The problem seems to be independent of the code we generate in the GS
stage, although this should not be surprising considering 1).
Particularly, the hang persists even in the case of an empty main()
function in the geometry shader (where we generate trivial code that of
course works for any other primitive type).

Based on this my conclusion is that this is very likely a hardware
issue. That, or some very obscure problem in the implementation of the
index buffer in gen6 that I have not seen and that only affects
GL_TRIANGLE_STRIP_ADJACENCY for some reason.

At this point I'd like to hear suggestions for things we could try next
to confirm whether this is a hardware problem or a driver problem, or,
if we agree that this is enough evidence that this must be a hardware
problem, how we can limit its impact, starting, probably, by rewriting
the piglit test so that we don't alter its purpose but avoid the hang on
gen6. We should also discuss if there is a way to work around this
problem so that at least developers running into it (as unlikely as that
may be) don't hang their systems.
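
As a rough illustration of the driver-side check discussed above, the index
pattern from item 2 could be detected before issuing the draw. This is only a
sketch: has_hang_prone_index_run() is a hypothetical helper, not existing Mesa
code, and treating index lists shorter than 8 as safe is an untested
assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count how often `value` occurs in win[0..len). */
static size_t count_equal(const unsigned *win, size_t len, unsigned value)
{
   size_t n = 0;
   for (size_t i = 0; i < len; i++)
      n += (win[i] == value);
   return n;
}

/* Hypothetical helper: returns true if `indices` matches the pattern
 * observed to hang gen6 with GL_TRIANGLE_STRIP_ADJACENCY:
 *  - exactly 8 indices, all equal, or
 *  - more than 8 indices with some window of 9 consecutive indices in
 *    which at least 8 are equal.
 * Lists shorter than 8 are assumed safe (untested assumption).
 */
static bool has_hang_prone_index_run(const unsigned *indices, size_t count)
{
   if (count < 8)
      return false;
   if (count == 8)
      return count_equal(indices, 8, indices[0]) == 8;

   for (size_t start = 0; start + 9 <= count; start++) {
      const unsigned *win = indices + start;
      /* If 8 of 9 entries share a value, at most one entry differs, so
       * the shared value must be win[0] or win[1]. */
      if (count_equal(win, 9, win[0]) >= 8 ||
          count_equal(win, 9, win[1]) >= 8)
         return true;
   }
   return false;
}
```

A check like this costs one pass over the index buffer per draw, so it would
probably only be worth running for this primitive type on gen6.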

I am going to be on holidays starting tomorrow and will have difficult
and limited Internet access for the most part, but Samuel (in the CC)
will be available next week to try any suggestions you may have.

Iago

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org

[Mesa-dev] [Bug 81680] [r600g] Firefox crashes with hardware acceleration turned on

2014-08-14 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=81680

--- Comment #28 from Eugene ken20...@ukr.net ---
(In reply to comment #27)
 (In reply to comment #17)
  Program received signal SIGSEGV, Segmentation fault.
  PatchJump (label=..., jump=...) at
 
 When it says 'PatchJump (label=..., jump=...) at [...]', it's not a crash
 but normal JavaScript JIT operation. Run 'continue' in that case.
 
 
  Program received signal SIGSEGV, Segmentation fault.
  0x in ?? ()
 
 Only when it says '0x in ?? ()' is it the crash you're
 looking for. Run 'bt full' in that case and attach the output here.

As I already said, bt / bt full gives nothing:


Program received signal SIGSEGV, Segmentation fault.
0x7fffe09f1d89 in ?? ()
(gdb) bt full
#0  0x7fffe09f1d89 in ?? ()
No symbol table info available.
#1  0x0500 in ?? ()
No symbol table info available.
#2  0x7fffb86d3900 in ?? ()
No symbol table info available.
#3  0x0003 in ?? ()
No symbol table info available.
#4  0xfffbb86d0f00 in ?? ()
No symbol table info available.
#5  0xfffadd4a9720 in ?? ()
No symbol table info available.

Any suggestions ?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 79629] [dri3] piglit glx_GLX_ARB_create_context_current_with_no_framebuffer fails

2014-08-14 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=79629

Eero Tamminen eero.t.tammi...@intel.com changed:

   What|Removed |Added

 CC||eero.t.tammi...@intel.com

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 37/37] i965/gen6: enable OpenGL 3.2

2014-08-14 Thread Iago Toral

Hi Mike, I don't really know, probably someone from Intel should confirm
this.

Iago

On Thu, 2014-08-14 at 14:14 +0100, Mike Lothian wrote:
 Isn't everything already added for GL 3.3?
 
 On 14 Aug 2014 12:13, Iago Toral Quiroga ito...@igalia.com wrote:
 From: Samuel Iglesias Gonsalvez sigles...@igalia.com
 
 Signed-off-by: Samuel Iglesias Gonsalvez
 sigles...@igalia.com
 ---
  src/mesa/drivers/dri/i965/intel_screen.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/src/mesa/drivers/dri/i965/intel_screen.c
 b/src/mesa/drivers/dri/i965/intel_screen.c
 index ea0fc58..83101a5 100644
 --- a/src/mesa/drivers/dri/i965/intel_screen.c
 +++ b/src/mesa/drivers/dri/i965/intel_screen.c
 @@ -1273,7 +1273,7 @@ set_max_gl_versions(struct intel_screen
 *screen)
       psp->max_gl_es2_version = 30;
       break;
  case 6:
 -  psp->max_gl_core_version = 31;
 +  psp->max_gl_core_version = 32;
       psp->max_gl_compat_version = 30;
       psp->max_gl_es1_version = 11;
       psp->max_gl_es2_version = 30;
 --
 1.9.1
 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 37/37] i965/gen6: enable OpenGL 3.2

2014-08-14 Thread Mike Lothian

I think everything that's required for GL 3.3 has already been added. Can we
jump directly there?
On 14 Aug 2014 12:13, Iago Toral Quiroga ito...@igalia.com wrote:

 From: Samuel Iglesias Gonsalvez sigles...@igalia.com

 Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
 ---
  src/mesa/drivers/dri/i965/intel_screen.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/src/mesa/drivers/dri/i965/intel_screen.c
 b/src/mesa/drivers/dri/i965/intel_screen.c
 index ea0fc58..83101a5 100644
 --- a/src/mesa/drivers/dri/i965/intel_screen.c
 +++ b/src/mesa/drivers/dri/i965/intel_screen.c
 @@ -1273,7 +1273,7 @@ set_max_gl_versions(struct intel_screen *screen)
       psp->max_gl_es2_version = 30;
       break;
  case 6:
 -  psp->max_gl_core_version = 31;
 +  psp->max_gl_core_version = 32;
       psp->max_gl_compat_version = 30;
       psp->max_gl_es1_version = 11;
       psp->max_gl_es2_version = 30;
 --
 1.9.1


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 36/37] i965/gen6: enable GLSL 1.50

2014-08-14 Thread Mike Lothian

We can probably just change this to check for gen >= 6 and expose 3.30
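
To make the suggestion concrete, here is a minimal sketch of the two selection
policies. The helper names are hypothetical; only the version numbers and gen
thresholds come from the quoted patch, and whether gen6 can really expose 3.30
is exactly the open question.

```c
/* GLSL version per hardware generation as set by the quoted patch. */
static int glsl_version_patch(int gen)
{
   if (gen >= 7)
      return 330;
   else if (gen >= 6)
      return 150;
   else
      return 120;
}

/* Suggested alternative: expose 3.30 on gen6 and later. */
static int glsl_version_suggested(int gen)
{
   return gen >= 6 ? 330 : 120;
}
```

The two functions differ only for gen == 6.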
On 14 Aug 2014 12:13, Iago Toral Quiroga ito...@igalia.com wrote:

 From: Samuel Iglesias Gonsalvez sigles...@igalia.com

 Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
 ---
  src/mesa/drivers/dri/i965/intel_extensions.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c
 b/src/mesa/drivers/dri/i965/intel_extensions.c
 index e134cd9..9875b7c 100644
 --- a/src/mesa/drivers/dri/i965/intel_extensions.c
 +++ b/src/mesa/drivers/dri/i965/intel_extensions.c
 @@ -246,7 +246,7 @@ intelInitExtensions(struct gl_context *ctx)
  if (brw->gen >= 7)
     ctx->Const.GLSLVersion = 330;
  else if (brw->gen >= 6)
 -  ctx->Const.GLSLVersion = 140;
 +  ctx->Const.GLSLVersion = 150;
  else
     ctx->Const.GLSLVersion = 120;
  _mesa_override_glsl_version(&ctx->Const);
 --
 1.9.1


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/1] configure.ac: Fix build with git-svn llvm version string

2014-08-14 Thread Tom Stellard

On Wed, Aug 13, 2014 at 04:46:56PM -0400, Jan Vesely wrote:
 Signed-off-by: Jan Vesely jan.ves...@rutgers.edu
 ---
 
 My llvm-config --version is
 3.6.0git-svn-r215564-cd35a3b3
 
 This patch assumes that the interesting part consists of only digits and dots.
 
 
  configure.ac | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/configure.ac b/configure.ac
 index 4ff87eb..dc5117e 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -1697,7 +1697,7 @@ if test x$enable_gallium_llvm = xyes; then
  fi
  
  if test x$LLVM_CONFIG != xno; then
 -LLVM_VERSION=`$LLVM_CONFIG --version | sed 's/svn.*//g'`
 +LLVM_VERSION=`$LLVM_CONFIG --version | sed 's/[[^0-9.]].*//g'`

As long as we are changing this.  I think it would be simpler to use grep:

`$LLVM_CONFIG --version | grep -o '^[[0-9.]]\+'`

-Tom
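
For readers more at home in C than in shell, the "keep only the leading digits
and dots" rule that both the sed and grep variants implement can be sketched
with strspn(). The function name is made up for this illustration; it is not
part of the build system.

```c
#include <stddef.h>
#include <string.h>

/* Copy the leading run of digits and dots from `version` into `out`
 * (at most out_size - 1 characters) and NUL-terminate it.
 * Returns the number of characters copied. */
static size_t llvm_version_prefix(const char *version, char *out,
                                  size_t out_size)
{
   size_t len;

   if (out_size == 0)
      return 0;

   /* Length of the initial segment made up only of digits and dots. */
   len = strspn(version, "0123456789.");
   if (len >= out_size)
      len = out_size - 1;

   memcpy(out, version, len);
   out[len] = '\0';
   return len;
}
```

With the version string from the patch description,
"3.6.0git-svn-r215564-cd35a3b3", this yields "3.6.0".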

  LLVM_LDFLAGS=`$LLVM_CONFIG --ldflags`
  LLVM_BINDIR=`$LLVM_CONFIG --bindir`
  LLVM_CPPFLAGS=`strip_unwanted_llvm_flags $LLVM_CONFIG --cppflags`
 -- 
 1.9.3
 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/5] gallium: add opcodes/cap for fine derivative support

2014-08-14 Thread Roland Scheidegger

Reviewed-by: Roland Scheidegger srol...@vmware.com

llvmpipe also already does the fine version. A coarse version (which we
do indeed use when derivatives are computed implicitly for sampling, though
with some other changes) might be minimally simpler, though I'm not even
sure (it might save a shuffle instruction somewhere), but it's probably not
worth it. Plus, d3d10 sm4 had deriv_rtx and sm5 has
deriv_rtx_coarse/deriv_rtx_fine, but the sm4 versions correspond to the
fine versions, so this was required.
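
The fine/coarse distinction can be illustrated on a single 2x2 quad. This is a
sketch only, not llvmpipe code: the function names and the quad layout are
assumptions, and the coarse variant shows just one of the results a driver is
allowed to return.

```c
/* Quad layout (assumed for this sketch):
 *   q[0] = top-left,    q[1] = top-right,
 *   q[2] = bottom-left, q[3] = bottom-right
 */

/* Fine: one horizontal derivative per row of the quad. */
static void quad_ddx_fine(const float q[4], float out[4])
{
   out[0] = out[1] = q[1] - q[0];
   out[2] = out[3] = q[3] - q[2];
}

/* Coarse: a single derivative may be reused for the whole quad; here the
 * top row is used, which is only one of the permitted choices. */
static void quad_ddx_coarse(const float q[4], float out[4])
{
   const float d = q[1] - q[0];
   out[0] = out[1] = out[2] = out[3] = d;
}
```

For q = {0, 1, 10, 14} the fine version gives 1 for the top row and 4 for the
bottom row, while this coarse version gives 1 for all four pixels.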

Roland

Am 14.08.2014 06:52, schrieb Ilia Mirkin:
 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
 ---
  src/gallium/auxiliary/tgsi/tgsi_info.c   |  3 +++
  src/gallium/auxiliary/tgsi/tgsi_util.c   |  2 ++
  src/gallium/docs/source/screen.rst   |  2 ++
  src/gallium/docs/source/tgsi.rst | 12 ++--
  src/gallium/drivers/freedreno/freedreno_screen.c |  1 +
  src/gallium/drivers/i915/i915_screen.c   |  1 +
  src/gallium/drivers/ilo/ilo_screen.c |  1 +
  src/gallium/drivers/llvmpipe/lp_screen.c |  1 +
  src/gallium/drivers/nouveau/nv30/nv30_screen.c   |  1 +
  src/gallium/drivers/nouveau/nv50/nv50_screen.c   |  1 +
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   |  1 +
  src/gallium/drivers/r300/r300_screen.c   |  1 +
  src/gallium/drivers/r600/r600_pipe.c |  1 +
  src/gallium/drivers/radeonsi/si_pipe.c   |  1 +
  src/gallium/drivers/softpipe/sp_screen.c |  1 +
  src/gallium/drivers/svga/svga_screen.c   |  1 +
  src/gallium/drivers/vc4/vc4_screen.c |  1 +
  src/gallium/include/pipe/p_defines.h |  1 +
  src/gallium/include/pipe/p_shader_tokens.h   |  5 -
  19 files changed, 35 insertions(+), 3 deletions(-)
 
 diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c 
 b/src/gallium/auxiliary/tgsi/tgsi_info.c
 index e24348f..35f9747 100644
 --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
 +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
 @@ -235,6 +235,9 @@ static const struct tgsi_opcode_info 
 opcode_info[TGSI_OPCODE_LAST] =
 { 1, 1, 0, 0, 0, 0, OTHR, INTERP_CENTROID, TGSI_OPCODE_INTERP_CENTROID 
 },
 { 1, 2, 0, 0, 0, 0, OTHR, INTERP_SAMPLE, TGSI_OPCODE_INTERP_SAMPLE },
 { 1, 2, 0, 0, 0, 0, OTHR, INTERP_OFFSET, TGSI_OPCODE_INTERP_OFFSET },
 +
 +   { 1, 1, 0, 0, 0, 0, COMP, DDX_FINE, TGSI_OPCODE_DDX_FINE },
 +   { 1, 1, 0, 0, 0, 0, COMP, DDY_FINE, TGSI_OPCODE_DDY_FINE },
  };
  
  const struct tgsi_opcode_info *
 diff --git a/src/gallium/auxiliary/tgsi/tgsi_util.c 
 b/src/gallium/auxiliary/tgsi/tgsi_util.c
 index e48159c..e1cba95 100644
 --- a/src/gallium/auxiliary/tgsi/tgsi_util.c
 +++ b/src/gallium/auxiliary/tgsi/tgsi_util.c
 @@ -245,6 +245,8 @@ tgsi_util_get_inst_usage_mask(const struct 
 tgsi_full_instruction *inst,
 case TGSI_OPCODE_USNE:
 case TGSI_OPCODE_IMUL_HI:
 case TGSI_OPCODE_UMUL_HI:
 +   case TGSI_OPCODE_DDX_FINE:
 +   case TGSI_OPCODE_DDY_FINE:
/* Channel-wise operations */
read_mask = write_mask;
break;
 diff --git a/src/gallium/docs/source/screen.rst 
 b/src/gallium/docs/source/screen.rst
 index 814e3ae..6fecc15 100644
 --- a/src/gallium/docs/source/screen.rst
 +++ b/src/gallium/docs/source/screen.rst
 @@ -213,6 +213,8 @@ The integer capabilities:
  * ``PIPE_CAP_DRAW_INDIRECT``: Whether the driver supports taking draw 
 arguments
{ count, instance_count, start, index_bias } from a PIPE_BUFFER resource.
See pipe_draw_info.
 +* ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE``: Whether the fragment shader supports
 +  the FINE versions of DDX/DDY.
  
  
  .. _pipe_capf:
 diff --git a/src/gallium/docs/source/tgsi.rst 
 b/src/gallium/docs/source/tgsi.rst
 index ac0ea54..7d5918f 100644
 --- a/src/gallium/docs/source/tgsi.rst
 +++ b/src/gallium/docs/source/tgsi.rst
 @@ -433,7 +433,11 @@ This instruction replicates its result.
dst = \cos{src.x}
  
  
 -.. opcode:: DDX - Derivative Relative To X
 +.. opcode:: DDX, DDX_FINE - Derivative Relative To X
 +
 +The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
 +advertised. When it is, the fine version guarantees one derivative per row
 +while DDX is allowed to be the same for the entire 2x2 quad.
  
  .. math::
  
 @@ -446,7 +450,11 @@ This instruction replicates its result.
dst.w = partialx(src.w)
  
  
 -.. opcode:: DDY - Derivative Relative To Y
 +.. opcode:: DDY, DDY_FINE - Derivative Relative To Y
 +
 +The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
 +advertised. When it is, the fine version guarantees one derivative per column
 +while DDY is allowed to be the same for the entire 2x2 quad.
  
  .. math::
  
 diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
 b/src/gallium/drivers/freedreno/freedreno_screen.c
 index de69b14..b156d8b 100644
 --- a/src/gallium/drivers/freedreno/freedreno_screen.c
 +++ b/src/gallium/drivers/freedreno/freedreno_screen.c
 @@ -216,6 +216,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum 
 pipe_cap

[Mesa-dev] [PATCH] i965/blorp_clear: Use memcpy instead of assignment to copy clear value

2014-08-14 Thread Neil Roberts

Hi,

After looking at the problem in bug 81150 I was wondering if we
have the same problem when using glClear with integer values. Sure
enough I can trigger a similar bug on 32-bit builds with optimisations
using a piglit test which I've posted here:

http://lists.freedesktop.org/archives/piglit/2014-August/012144.html

- Neil

--- >8 --- (use git am --scissors to automatically chop here)

Similar to the problem described in 2c50212b14da27de4e3, copying the clear
value with a regular assignment goes through a floating-point value. If an
integer clear value happens to have the bit pattern of a signalling NaN, it
gets converted to a quiet NaN when stored via the x87 floating-point
registers, which corrupts the integer value. Instead we should use memcpy to
ensure the exact bit representation is preserved.

This bug can be triggered on 32-bit builds with optimisations by using an
integer clear color with a value like 0x7f817f81.
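
The idea can be shown in isolation with a minimal sketch. The struct and
function names below are hypothetical stand-ins, not the blorp types; the
point is only that memcpy moves bytes and never loads the value into a
floating-point register.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical stand-in for the push-constant storage, which is typed
 * as float even when the clear color is an integer. */
struct clear_consts {
   float color[4];
};

/* Copy the raw bits of an integer clear color into float-typed storage.
 * memcpy moves bytes, so no floating-point conversion can alter them. */
static void store_clear_bits(struct clear_consts *dst, const uint32_t src[4])
{
   memcpy(dst->color, src, sizeof(dst->color));
}

/* Read one channel back as raw bits, again without a float load. */
static uint32_t load_clear_bits(const struct clear_consts *src, int channel)
{
   uint32_t bits;
   memcpy(&bits, &src->color[channel], sizeof(bits));
   return bits;
}
```

Storing 0x7f817f81 this way and reading it back returns the same bits,
whereas an assignment through a float lvalue may not on x87.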
---
 src/mesa/drivers/dri/i965/brw_blorp_clear.cpp | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
index ffbcd1a..8db0837 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
@@ -202,12 +202,7 @@ brw_blorp_clear_params::brw_blorp_clear_params(struct 
brw_context *brw,
       y1 = rb->Height - fb->_Ymin;
    }
 
-   float *push_consts = (float *)&wm_push_consts;
-
-   push_consts[0] = ctx->Color.ClearColor.f[0];
-   push_consts[1] = ctx->Color.ClearColor.f[1];
-   push_consts[2] = ctx->Color.ClearColor.f[2];
-   push_consts[3] = ctx->Color.ClearColor.f[3];
+   memcpy(&wm_push_consts.dst_x0, ctx->Color.ClearColor.f, sizeof(float) * 4);
 
use_wm_prog = true;
 
@@ -250,7 +245,7 @@ brw_blorp_clear_params::brw_blorp_clear_params(struct 
brw_context *brw,
    if (irb->mt->fast_clear_state != INTEL_FAST_CLEAR_STATE_NO_MCS &&
        !partial_clear && wm_prog_key.use_simd16_replicated_data &&
        is_color_fast_clear_compatible(brw, format, &ctx->Color.ClearColor)) {
-      memset(push_consts, 0xff, 4*sizeof(float));
+      memset(&wm_push_consts, 0xff, 4*sizeof(float));
   fast_clear_op = GEN7_FAST_CLEAR_OP_FAST_CLEAR;
 
   /* Figure out what the clear rectangle needs to be aligned to, and how
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] egl/main: use separate LIBEGL_C_FILES and LIBEGL_H_FILES to fix SCons build

2014-08-14 Thread Brian Paul

The linker was trying to process .h files and failing.
---
 src/egl/main/Makefile.am  |3 ++-
 src/egl/main/Makefile.sources |   36 
 2 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/src/egl/main/Makefile.am b/src/egl/main/Makefile.am
index 6746bcc..06f6a05 100644
--- a/src/egl/main/Makefile.am
+++ b/src/egl/main/Makefile.am
@@ -34,7 +34,8 @@ AM_CFLAGS = \
 lib_LTLIBRARIES = libEGL.la
 
 libEGL_la_SOURCES = \
-   ${LIBEGL_C_FILES}
+   ${LIBEGL_C_FILES} \
+   ${LIBEGL_H_FILES}
 
 libEGL_la_LIBADD = \
$(EGL_LIB_DEPS)
diff --git a/src/egl/main/Makefile.sources b/src/egl/main/Makefile.sources
index 6a917e2..3573004 100644
--- a/src/egl/main/Makefile.sources
+++ b/src/egl/main/Makefile.sources
@@ -1,38 +1,42 @@
 LIBEGL_C_FILES := \
eglapi.c \
-   eglapi.h \
eglarray.c \
+   eglconfig.c \
+   eglcontext.c \
+   eglcurrent.c \
+   egldisplay.c \
+   egldriver.c \
+   eglfallbacks.c \
+   eglglobals.c \
+   eglimage.c \
+   egllog.c \
+   eglmisc.c \
+   eglmode.c \
+   eglscreen.c \
+   eglstring.c \
+   eglsurface.c \
+   eglsync.c
+
+
+LIBEGL_H_FILES := \
+   eglapi.h \
eglarray.h \
eglcompiler.h \
-   eglconfig.c \
eglconfig.h \
-   eglcontext.c \
eglcontext.h \
-   eglcurrent.c \
eglcurrent.h \
egldefines.h \
-   egldisplay.c \
egldisplay.h \
-   egldriver.c \
egldriver.h \
-   eglfallbacks.c \
-   eglglobals.c \
eglglobals.h \
-   eglimage.c \
eglimage.h \
-   egllog.c \
egllog.h \
-   eglmisc.c \
eglmisc.h \
-   eglmode.c \
eglmode.h \
eglmutex.h \
-   eglscreen.c \
eglscreen.h \
-   eglstring.c \
eglstring.h \
-   eglsurface.c \
eglsurface.h \
-   eglsync.c \
eglsync.h \
egltypedefs.h
+
-- 
1.7.10.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/5] gallium: add opcodes/cap for fine derivative support

2014-08-14 Thread Ilia Mirkin

I guess a question is whether we should even bother with the fine
version at all then? Just map everything to DDX/DDY... Although I
guess if llvmpipe does the coarse version sometimes, at least the fine
version is warranted.

On Thu, Aug 14, 2014 at 10:12 AM, Roland Scheidegger srol...@vmware.com wrote:
 Reviewed-by: Roland Scheidegger srol...@vmware.com

 llvmpipe also already does the fine version. A coarse version (which we
 indeed do when used implicitly for sampling though with some other
 changes) might be minimally simpler though not even sure (might save a
 shuffle instruction somewhere), but probably not worth it (plus, d3d10
 sm4 had deriv_rtx and sm5 deriv_rtx_coarse/deriv_rtx_fine but the sm4
 versions correspond to the fine versions so this was required).

 Roland

 Am 14.08.2014 06:52, schrieb Ilia Mirkin:
 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
 ---
  src/gallium/auxiliary/tgsi/tgsi_info.c   |  3 +++
  src/gallium/auxiliary/tgsi/tgsi_util.c   |  2 ++
  src/gallium/docs/source/screen.rst   |  2 ++
  src/gallium/docs/source/tgsi.rst | 12 ++--
  src/gallium/drivers/freedreno/freedreno_screen.c |  1 +
  src/gallium/drivers/i915/i915_screen.c   |  1 +
  src/gallium/drivers/ilo/ilo_screen.c |  1 +
  src/gallium/drivers/llvmpipe/lp_screen.c |  1 +
  src/gallium/drivers/nouveau/nv30/nv30_screen.c   |  1 +
  src/gallium/drivers/nouveau/nv50/nv50_screen.c   |  1 +
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   |  1 +
  src/gallium/drivers/r300/r300_screen.c   |  1 +
  src/gallium/drivers/r600/r600_pipe.c |  1 +
  src/gallium/drivers/radeonsi/si_pipe.c   |  1 +
  src/gallium/drivers/softpipe/sp_screen.c |  1 +
  src/gallium/drivers/svga/svga_screen.c   |  1 +
  src/gallium/drivers/vc4/vc4_screen.c |  1 +
  src/gallium/include/pipe/p_defines.h |  1 +
  src/gallium/include/pipe/p_shader_tokens.h   |  5 -
  19 files changed, 35 insertions(+), 3 deletions(-)

 diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c 
 b/src/gallium/auxiliary/tgsi/tgsi_info.c
 index e24348f..35f9747 100644
 --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
 +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
 @@ -235,6 +235,9 @@ static const struct tgsi_opcode_info 
 opcode_info[TGSI_OPCODE_LAST] =
 { 1, 1, 0, 0, 0, 0, OTHR, INTERP_CENTROID, TGSI_OPCODE_INTERP_CENTROID 
 },
 { 1, 2, 0, 0, 0, 0, OTHR, INTERP_SAMPLE, TGSI_OPCODE_INTERP_SAMPLE },
 { 1, 2, 0, 0, 0, 0, OTHR, INTERP_OFFSET, TGSI_OPCODE_INTERP_OFFSET },
 +
 +   { 1, 1, 0, 0, 0, 0, COMP, DDX_FINE, TGSI_OPCODE_DDX_FINE },
 +   { 1, 1, 0, 0, 0, 0, COMP, DDY_FINE, TGSI_OPCODE_DDY_FINE },
  };

  const struct tgsi_opcode_info *
 diff --git a/src/gallium/auxiliary/tgsi/tgsi_util.c 
 b/src/gallium/auxiliary/tgsi/tgsi_util.c
 index e48159c..e1cba95 100644
 --- a/src/gallium/auxiliary/tgsi/tgsi_util.c
 +++ b/src/gallium/auxiliary/tgsi/tgsi_util.c
 @@ -245,6 +245,8 @@ tgsi_util_get_inst_usage_mask(const struct 
 tgsi_full_instruction *inst,
 case TGSI_OPCODE_USNE:
 case TGSI_OPCODE_IMUL_HI:
 case TGSI_OPCODE_UMUL_HI:
 +   case TGSI_OPCODE_DDX_FINE:
 +   case TGSI_OPCODE_DDY_FINE:
/* Channel-wise operations */
read_mask = write_mask;
break;
 diff --git a/src/gallium/docs/source/screen.rst 
 b/src/gallium/docs/source/screen.rst
 index 814e3ae..6fecc15 100644
 --- a/src/gallium/docs/source/screen.rst
 +++ b/src/gallium/docs/source/screen.rst
 @@ -213,6 +213,8 @@ The integer capabilities:
  * ``PIPE_CAP_DRAW_INDIRECT``: Whether the driver supports taking draw 
 arguments
{ count, instance_count, start, index_bias } from a PIPE_BUFFER resource.
See pipe_draw_info.
 +* ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE``: Whether the fragment shader supports
 +  the FINE versions of DDX/DDY.


  .. _pipe_capf:
 diff --git a/src/gallium/docs/source/tgsi.rst 
 b/src/gallium/docs/source/tgsi.rst
 index ac0ea54..7d5918f 100644
 --- a/src/gallium/docs/source/tgsi.rst
 +++ b/src/gallium/docs/source/tgsi.rst
 @@ -433,7 +433,11 @@ This instruction replicates its result.
dst = \cos{src.x}


 -.. opcode:: DDX - Derivative Relative To X
 +.. opcode:: DDX, DDX_FINE - Derivative Relative To X
 +
 +The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
 +advertised. When it is, the fine version guarantees one derivative per row
 +while DDX is allowed to be the same for the entire 2x2 quad.

  .. math::

 @@ -446,7 +450,11 @@ This instruction replicates its result.
dst.w = partialx(src.w)


 -.. opcode:: DDY - Derivative Relative To Y
 +.. opcode:: DDY, DDY_FINE - Derivative Relative To Y
 +
 +The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
 +advertised. When it is, the fine version guarantees one derivative per 
 column
 +while DDY is allowed to be the same for the entire 2x2 quad.

  .. math::

 diff --git

Re: [Mesa-dev] [PATCH] egl/main: use separate LIBEGL_C_FILES and LIBEGL_H_FILES to fix SCons build

2014-08-14 Thread Emil Velikov

On 14/08/14 15:38, Brian Paul wrote:
 The linker was trying to process .h files and failing.
Hi Brian, which linker do you have in mind? Is it the same issue as reported
here [1]? If so, I've just pushed Jose's patch, which explicitly handles scons.

commit d4a1f3fd270001b2fb0684dc981340391df8fb64
Author: Jose Fonseca jfons...@vmware.com
Date:   Wed Aug 13 20:33:35 2014 +0100

scons: do not include headers from the sources lists


-Emil

[1] https://bugs.freedesktop.org/show_bug.cgi?id=82534

 ---
  src/egl/main/Makefile.am  |3 ++-
  src/egl/main/Makefile.sources |   36 
  2 files changed, 22 insertions(+), 17 deletions(-)
 
 diff --git a/src/egl/main/Makefile.am b/src/egl/main/Makefile.am
 index 6746bcc..06f6a05 100644
 --- a/src/egl/main/Makefile.am
 +++ b/src/egl/main/Makefile.am
 @@ -34,7 +34,8 @@ AM_CFLAGS = \
  lib_LTLIBRARIES = libEGL.la
  
  libEGL_la_SOURCES = \
 - ${LIBEGL_C_FILES}
 + ${LIBEGL_C_FILES} \
 + ${LIBEGL_H_FILES}
  
  libEGL_la_LIBADD = \
   $(EGL_LIB_DEPS)
 diff --git a/src/egl/main/Makefile.sources b/src/egl/main/Makefile.sources
 index 6a917e2..3573004 100644
 --- a/src/egl/main/Makefile.sources
 +++ b/src/egl/main/Makefile.sources
 @@ -1,38 +1,42 @@
  LIBEGL_C_FILES := \
   eglapi.c \
 - eglapi.h \
   eglarray.c \
 + eglconfig.c \
 + eglcontext.c \
 + eglcurrent.c \
 + egldisplay.c \
 + egldriver.c \
 + eglfallbacks.c \
 + eglglobals.c \
 + eglimage.c \
 + egllog.c \
 + eglmisc.c \
 + eglmode.c \
 + eglscreen.c \
 + eglstring.c \
 + eglsurface.c \
 + eglsync.c
 +
 +
 +LIBEGL_H_FILES := \
 + eglapi.h \
   eglarray.h \
   eglcompiler.h \
 - eglconfig.c \
   eglconfig.h \
 - eglcontext.c \
   eglcontext.h \
 - eglcurrent.c \
   eglcurrent.h \
   egldefines.h \
 - egldisplay.c \
   egldisplay.h \
 - egldriver.c \
   egldriver.h \
 - eglfallbacks.c \
 - eglglobals.c \
   eglglobals.h \
 - eglimage.c \
   eglimage.h \
 - egllog.c \
   egllog.h \
 - eglmisc.c \
   eglmisc.h \
 - eglmode.c \
   eglmode.h \
   eglmutex.h \
 - eglscreen.c \
   eglscreen.h \
 - eglstring.c \
   eglstring.h \
 - eglsurface.c \
   eglsurface.h \
 - eglsync.c \
   eglsync.h \
   egltypedefs.h
 +
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 82534] src\egl\main\eglapi.h : fatal error LNK1107: invalid or corrupt file: cannot read at 0x2E02

2014-08-14 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=82534

Emil Velikov emil.l.veli...@gmail.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Emil Velikov emil.l.veli...@gmail.com ---
Pushed to master

commit d4a1f3fd270001b2fb0684dc981340391df8fb64
Author: Jose Fonseca jfons...@vmware.com
Date:   Wed Aug 13 20:33:35 2014 +0100

scons: do not include headers from the sources lists

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [Mesa-dev] [PATCH] egl/main: use separate LIBEGL_C_FILES and LIBEGL_H_FILES to fix SCons build

2014-08-14 Thread Brian Paul


I guess I missed that patch/discussion.  That fixes things for me too.

-Brian

On 08/14/2014 08:49 AM, Emil Velikov wrote:

On 14/08/14 15:38, Brian Paul wrote:

The linker was trying to process .h files and failing.

Hi Brian, what linker do you have in mind ? Is it the same issue as reported
here [1] ? If so I've just pushed Jose's patch which explicitly handles scons.

commit d4a1f3fd270001b2fb0684dc981340391df8fb64
Author: Jose Fonseca jfons...@vmware.com
Date:   Wed Aug 13 20:33:35 2014 +0100

 scons: do not include headers from the sources lists


-Emil

[1] https://bugs.freedesktop.org/show_bug.cgi?id=82534


---
  src/egl/main/Makefile.am  |3 ++-
  src/egl/main/Makefile.sources |   36 
  2 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/src/egl/main/Makefile.am b/src/egl/main/Makefile.am
index 6746bcc..06f6a05 100644
--- a/src/egl/main/Makefile.am
+++ b/src/egl/main/Makefile.am
@@ -34,7 +34,8 @@ AM_CFLAGS = \
  lib_LTLIBRARIES = libEGL.la

  libEGL_la_SOURCES = \
-   ${LIBEGL_C_FILES}
+   ${LIBEGL_C_FILES} \
+   ${LIBEGL_H_FILES}

  libEGL_la_LIBADD = \
$(EGL_LIB_DEPS)
diff --git a/src/egl/main/Makefile.sources b/src/egl/main/Makefile.sources
index 6a917e2..3573004 100644
--- a/src/egl/main/Makefile.sources
+++ b/src/egl/main/Makefile.sources
@@ -1,38 +1,42 @@
  LIBEGL_C_FILES := \
eglapi.c \
-   eglapi.h \
eglarray.c \
+   eglconfig.c \
+   eglcontext.c \
+   eglcurrent.c \
+   egldisplay.c \
+   egldriver.c \
+   eglfallbacks.c \
+   eglglobals.c \
+   eglimage.c \
+   egllog.c \
+   eglmisc.c \
+   eglmode.c \
+   eglscreen.c \
+   eglstring.c \
+   eglsurface.c \
+   eglsync.c
+
+
+LIBEGL_H_FILES := \
+   eglapi.h \
eglarray.h \
eglcompiler.h \
-   eglconfig.c \
eglconfig.h \
-   eglcontext.c \
eglcontext.h \
-   eglcurrent.c \
eglcurrent.h \
egldefines.h \
-   egldisplay.c \
egldisplay.h \
-   egldriver.c \
egldriver.h \
-   eglfallbacks.c \
-   eglglobals.c \
eglglobals.h \
-   eglimage.c \
eglimage.h \
-   egllog.c \
egllog.h \
-   eglmisc.c \
eglmisc.h \
-   eglmode.c \
eglmode.h \
eglmutex.h \
-   eglscreen.c \
eglscreen.h \
-   eglstring.c \
eglstring.h \
-   eglsurface.c \
eglsurface.h \
-   eglsync.c \
eglsync.h \
egltypedefs.h
+






[Mesa-dev] [Bug 82536] u_current.h:72: undefined reference to `impglapi_Dispatch'

2014-08-14 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=82536

Emil Velikov emil.l.veli...@gmail.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Emil Velikov emil.l.veli...@gmail.com ---
The offending commit has been reverted.

commit 957a28e63c8a205d01c48cb8fa03c3c1abe4b499
Author: Emil Velikov emil.l.veli...@gmail.com
Date:   Wed Aug 13 17:55:39 2014 +0100

Revert "configure: Fix --enable-XX-bit flags by moving LT_INIT where it should"

This reverts commit 2af28040d639dddbb7c258981a00eaf3dfcbcf03.


[Mesa-dev] [Bug 82546] [regression] libOSMesa build failure

2014-08-14 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=82546

Emil Velikov emil.l.veli...@gmail.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Emil Velikov emil.l.veli...@gmail.com ---
The offending commit has been reverted.

commit 957a28e63c8a205d01c48cb8fa03c3c1abe4b499
Author: Emil Velikov emil.l.veli...@gmail.com
Date:   Wed Aug 13 17:55:39 2014 +0100

Revert "configure: Fix --enable-XX-bit flags by moving LT_INIT where it should"

This reverts commit 2af28040d639dddbb7c258981a00eaf3dfcbcf03.


[Mesa-dev] [Bug 82539] vmw_screen_dri.lo In file included from vmw_screen_dri.c:41: vmwgfx_drm.h:32:17: error: drm.h: No such file or directory

2014-08-14 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=82539

--- Comment #8 from Emil Velikov emil.l.veli...@gmail.com ---
The revert is already in master, so a fetch/rebase & test should suffice.
Thank you


[Mesa-dev] [Bug 50754] Building 32 bit mesa on 64 bit OS fails since change for automake

2014-08-14 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=50754

--- Comment #29 from Emil Velikov emil.l.veli...@gmail.com ---
Hello gents,

While this patch looks correct at first sight, it caused quite a few issues with
other parts of mesa. As such I've reverted it, removed the hacky
--enable-32,64-bit options, and documented (docs/autoconf.html) a reasonable
approach towards multilib/cross-compile builds.


Re: [Mesa-dev] [PATCH 1/1] configure.ac: Fix build with git-svn llvm version string

2014-08-14 Thread Jan Vesely

On Thu, 2014-08-14 at 06:35 -0700, Tom Stellard wrote:
 On Wed, Aug 13, 2014 at 04:46:56PM -0400, Jan Vesely wrote:
  Signed-off-by: Jan Vesely jan.ves...@rutgers.edu
  ---
  
  My llvm-config --version is
  3.6.0git-svn-r215564-cd35a3b3
  
  This patch assumes that the interesting part consists of only digits and 
  dots.
  
  
   configure.ac | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
  
  diff --git a/configure.ac b/configure.ac
  index 4ff87eb..dc5117e 100644
  --- a/configure.ac
  +++ b/configure.ac
  @@ -1697,7 +1697,7 @@ if test x$enable_gallium_llvm = xyes; then
   fi
   
   if test x$LLVM_CONFIG != xno; then
  -LLVM_VERSION=`$LLVM_CONFIG --version | sed 's/svn.*//g'`
  +LLVM_VERSION=`$LLVM_CONFIG --version | sed 's/[[^0-9.]].*//g'`
 
 As long as we are changing this.  I think it would be simpler to use grep:
 
 `$LLVM_CONFIG --version | grep -o '^[[0-9.]]\+'`

I agree. I didn't know about grep -o.
It fixes my issue.
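
For illustration only, here is a minimal C sketch of what both the sed and the grep variants compute for a string such as 3.6.0git-svn-r215564-cd35a3b3: the leading run of digits and dots. The helper name llvm_version_prefix is hypothetical and is not part of Mesa, LLVM, or configure.ac.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the leading run of digits and dots from `version` into `out`
 * (at most out_size - 1 bytes, always NUL-terminated when out_size > 0).
 * Returns the number of bytes copied.
 *
 * Hypothetical helper for illustration; not part of Mesa or LLVM.
 */
static size_t
llvm_version_prefix(const char *version, char *out, size_t out_size)
{
   size_t n;

   if (out_size == 0)
      return 0;

   /* Length of the initial segment made only of '0'-'9' and '.'. */
   n = strspn(version, "0123456789.");
   if (n >= out_size)
      n = out_size - 1;

   memcpy(out, version, n);
   out[n] = '\0';
   return n;
}
```

Like the grep -o form, this keeps only what matches at the start of the string, so any suffix (svn, git-svn-r..., or something not yet seen) is dropped without having to name it.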

Reviewed-and-tested-by: Jan Vesely jan.ves...@rutgers.edu

jan

 
 -Tom
 
   LLVM_LDFLAGS=`$LLVM_CONFIG --ldflags`
   LLVM_BINDIR=`$LLVM_CONFIG --bindir`
   LLVM_CPPFLAGS=`strip_unwanted_llvm_flags $LLVM_CONFIG --cppflags`
  -- 
  1.9.3
  

-- 
Jan Vesely jan.ves...@rutgers.edu



Re: [Mesa-dev] [PATCH 2/2] vl/compositor: set the scissor before clearing the render target

2014-08-14 Thread Ilia Mirkin

Series is

Reviewed-by: Ilia Mirkin imir...@alum.mit.edu

On Thu, Aug 14, 2014 at 5:59 AM, Christian König
deathsim...@vodafone.de wrote:
 From: Christian König christian.koe...@amd.com

 Otherwise we clear areas that shouldn't be cleared.

 Signed-off-by: Christian König christian.koe...@amd.com
 ---
  src/gallium/auxiliary/vl/vl_compositor.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/src/gallium/auxiliary/vl/vl_compositor.c 
 b/src/gallium/auxiliary/vl/vl_compositor.c
 index 839fd27..6bd1a88 100644
 --- a/src/gallium/auxiliary/vl/vl_compositor.c
 +++ b/src/gallium/auxiliary/vl/vl_compositor.c
 @@ -1060,6 +1060,7 @@ vl_compositor_render(struct vl_compositor_state *s,
        s->scissor.maxx = dst_surface->width;
        s->scissor.maxy = dst_surface->height;
     }
 +   c->pipe->set_scissor_states(c->pipe, 0, 1, &s->scissor);

     gen_vertex_data(c, s, dirty_area);

 @@ -1072,7 +1073,6 @@ vl_compositor_render(struct vl_compositor_state *s,
        dirty_area->x1 = dirty_area->y1 = MIN_DIRTY;
     }

 -   c->pipe->set_scissor_states(c->pipe, 0, 1, &s->scissor);
     c->pipe->set_framebuffer_state(c->pipe, &c->fb_state);
     c->pipe->bind_vs_state(c->pipe, c->vs);
     c->pipe->set_vertex_buffers(c->pipe, 0, 1, &c->vertex_buf);
 --
 1.9.1
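
A toy model of why the ordering in this patch matters, assuming (as the commit message implies) that the clear is limited by whatever scissor is current when it is issued. Nothing here is the gallium API: toy_ctx, toy_set_scissor and toy_clear are hypothetical names, and the 4x4 "framebuffer" is only for illustration.

```c
#include <string.h>

#define FB_W 4
#define FB_H 4

/* Hypothetical state: a current scissor rectangle plus a tiny framebuffer. */
struct toy_ctx {
   int minx, miny, maxx, maxy;      /* scissor, max edges exclusive */
   unsigned char fb[FB_H][FB_W];
};

/* Start with "old content" (1) everywhere and a full-surface scissor. */
static void
toy_init(struct toy_ctx *c)
{
   memset(c->fb, 1, sizeof(c->fb));
   c->minx = 0;
   c->miny = 0;
   c->maxx = FB_W;
   c->maxy = FB_H;
}

static void
toy_set_scissor(struct toy_ctx *c, int minx, int miny, int maxx, int maxy)
{
   c->minx = minx;
   c->miny = miny;
   c->maxx = maxx;
   c->maxy = maxy;
}

/* The clear honours the scissor that is current at the time of the call. */
static void
toy_clear(struct toy_ctx *c, unsigned char value)
{
   for (int y = c->miny; y < c->maxy; y++)
      for (int x = c->minx; x < c->maxx; x++)
         c->fb[y][x] = value;
}

/* Count pixels holding `value`. */
static int
toy_count(const struct toy_ctx *c, unsigned char value)
{
   int n = 0;

   for (int y = 0; y < FB_H; y++)
      for (int x = 0; x < FB_W; x++)
         n += (c->fb[y][x] == value);
   return n;
}
```

In this model, setting a 2x2 scissor and then clearing touches 4 pixels, while clearing first and setting the scissor afterwards wipes all 16.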


Re: [Mesa-dev] [PATCH 0/5] Enable ARB_derivative_control for i965/Gen7+

2014-08-14 Thread Matt Turner

Nice. Series is

Reviewed-by: Matt Turner matts...@gmail.com

Re: [Mesa-dev] [PATCH 37/37] i965/gen6: enable OpenGL 3.2

2014-08-14 Thread Matt Turner

On Thu, Aug 14, 2014 at 4:12 AM, Iago Toral Quiroga ito...@igalia.com wrote:
 From: Samuel Iglesias Gonsalvez sigles...@igalia.com

 Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
 ---

I'd squash the last two patches together. I think it's likely we can
go to GL 3.3 on Sandybridge, but we'd probably like to take a look at
the piglit results first, so this patch that increases it to 3.2 seems
fine.

Both of these (squashed together) are

Reviewed-by: Matt Turner matts...@gmail.com

Re: [Mesa-dev] [PATCH] i965/blorp_clear: Use memcpy instead of assignment to copy clear value

2014-08-14 Thread Matt Turner

Reviewed-by: Matt Turner matts...@gmail.com

Re: [Mesa-dev] [PATCH] egl_dri2: fix EXT_image_dma_buf_import fds

2014-08-14 Thread Matt Turner

On Thu, Aug 14, 2014 at 12:24 AM, Pekka Paalanen ppaala...@gmail.com wrote:
 On Wed, 13 Aug 2014 19:46:40 +0300
 Pohjolainen, Topi topi.pohjolai...@intel.com wrote:

 On Fri, Aug 08, 2014 at 05:28:59PM +0300, Pekka Paalanen wrote:
  From: Pekka Paalanen pekka.paala...@collabora.co.uk
 
  The EGL_EXT_image_dma_buf_import specification was revised (according to
  its revision history) on Dec 5th, 2013, for EGL to not take ownership of
  the file descriptors.
 
  Do not close the file descriptors passed in to eglCreateImageKHR with
  EGL_LINUX_DMA_BUF_EXT target.
 
  It is assumed, that the drivers, which ultimately process the file
  descriptors, do not close or modify them in any way either. This avoids
  the need to dup(), as it seems we would only need to just close the
  dup'd file descriptors right after.
 
  Signed-off-by: Pekka Paalanen pekka.paala...@collabora.co.uk

 I wrote the current logic based on the older version, and at least to me this
 is the right thing to do. Thanks for fixing it as well as taking care of the
 piglit test.

 Reviewed-by: Topi Pohjolainen topi.pohjolai...@intel.com

 I would be happier though if someone else gave his/her approval as well.

 Thank you, I have added your R-b, and will wait some more. I think I
 want the piglit patch landed first before I try to push this, anyway.

 Thanks for the piglit review too, I sent a new version with your R-b
 and the comment fix.

The plan is to make the 10.3 branch tomorrow, so don't wait too long. :)
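
As a rough sketch of the ownership rule described in the quoted commit message (the importer does not take ownership, so the caller still closes its own descriptors), assuming a POSIX system. import_buffer_borrowing_fd is a hypothetical stand-in and not eglCreateImageKHR or any driver entry point; a pipe is used only because it is an easy way to obtain a valid fd.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for an import entry point. It inspects the
 * descriptor but neither closes nor dup()s it: ownership stays with
 * the caller.
 */
static bool
import_buffer_borrowing_fd(int fd)
{
   struct stat st;

   return fstat(fd, &st) == 0;
}

/* True if `fd` still refers to an open file description. */
static bool
fd_is_open(int fd)
{
   return fcntl(fd, F_GETFD) != -1;
}

/*
 * Caller-side pattern: create the fd, lend it to the importer, then
 * close it ourselves. Returns true if every step behaved as expected.
 */
static bool
caller_keeps_ownership(void)
{
   int fds[2];
   bool ok;

   if (pipe(fds) != 0)
      return false;

   ok = import_buffer_borrowing_fd(fds[0]);
   ok = ok && fd_is_open(fds[0]);    /* importer must not have closed it */

   close(fds[0]);
   close(fds[1]);
   ok = ok && !fd_is_open(fds[0]);   /* closed now, by its owner */
   return ok;
}
```

Under the older reading of the spec, the importer would have closed the descriptor itself and the caller's later close() would fail or, worse, hit an unrelated fd that reused the number.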

[Mesa-dev] [PATCH] i965/gen8: Allow 16k viewport when blitting stencil

2014-08-14 Thread Topi Pohjolainen

From: Topi Pohjolainen topi.pohjolai...@gmail.com

Fixes gles3 conformance tests:

framebuffer_blit_functionality_negative_height_blit
framebuffer_blit_functionality_negative_width_blit
framebuffer_blit_functionality_negative_dimensions_blit
framebuffer_blit_functionality_magnifying_blit
framebuffer_blit_functionality_multisampled_to_singlesampled_blit

Signed-off-by: Topi Pohjolainen topi.pohjolai...@gmail.com
---
 src/mesa/drivers/dri/i965/gen8_viewport_state.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen8_viewport_state.c 
b/src/mesa/drivers/dri/i965/gen8_viewport_state.c
index 9c89532..eda9aad 100644
--- a/src/mesa/drivers/dri/i965/gen8_viewport_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_viewport_state.c
@@ -94,6 +94,13 @@ gen8_upload_sf_clip_viewport(struct brw_context *brw)
   float gbx = maximum_guardband_extent / ctx->ViewportArray[i].Width;
   float gby = maximum_guardband_extent / ctx->ViewportArray[i].Height;
 
+  /**
+   * Stencil blits require W-tiled to be treated as Y-tiled needing in
+   * turn width to be programmed twice the original.
+   */
+  if (brw->meta_in_progress)
+ gbx *= 2;
+
   /* _NEW_VIEWPORT: Guardband Clipping */
   vp[8]  = -gbx; /* x-min */
   vp[9]  =  gbx; /* x-max */
-- 
1.9.1
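
The arithmetic in the hunk above can be sketched in isolation as follows. This is a simplified illustration: guardband_x is a hypothetical helper, and the parameters only mirror the names used in the patch (maximum_guardband_extent, the viewport width, brw->meta_in_progress). No particular hardware limit is assumed.

```c
#include <stdbool.h>

/*
 * Guardband extent along X, expressed as a multiple of the viewport
 * width. Hypothetical helper; the parameters mirror names in the patch.
 */
static float
guardband_x(float max_extent, float viewport_width, bool meta_in_progress)
{
   float gbx = max_extent / viewport_width;

   /* W-tiled stencil treated as Y-tiled: width is programmed doubled. */
   if (meta_in_progress)
      gbx *= 2;

   return gbx;
}
```

With an illustrative extent of 8192 and a 1024-wide viewport this gives 8 normally and 16 during a meta operation; the vp[8]/vp[9] entries are then -gbx and +gbx.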


Re: [Mesa-dev] [PATCH v2 4/5] mesa: add ARB_texture_barrier support

2014-08-14 Thread Ilia Mirkin

Any chance this can get reviewed before the 10.3 cutoff tomorrow? I
copied one of the existing nv_texture_barrier piglits and made use of
glTextureBarrier() instead, and it still passed.

On Mon, Aug 11, 2014 at 4:01 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
 This extension is identical to NV_texture_barrier. Alias
 glTextureBarrier to the existing glTextureBarrierNV and use the existing
 NV_texture_barrier extension bit.

 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
 ---

 v1 - v2:
  - Add the actual extension string
  - Remove separate (and missing dlist bits) TextureBarrier implementation
in favor of aliasing approach.

  src/mapi/glapi/gen/ARB_texture_barrier.xml | 13 +
  src/mapi/glapi/gen/Makefile.am |  1 +
  src/mapi/glapi/gen/gl_API.xml  |  4 
  src/mesa/main/extensions.c |  1 +
  4 files changed, 19 insertions(+)
  create mode 100644 src/mapi/glapi/gen/ARB_texture_barrier.xml

 diff --git a/src/mapi/glapi/gen/ARB_texture_barrier.xml 
 b/src/mapi/glapi/gen/ARB_texture_barrier.xml
 new file mode 100644
 index 000..7119732
 --- /dev/null
 +++ b/src/mapi/glapi/gen/ARB_texture_barrier.xml
 @@ -0,0 +1,13 @@
 +<?xml version="1.0"?>
 +<!DOCTYPE OpenGLAPI SYSTEM "gl_API.dtd">
 +
 +<!-- Note: no GLX protocol info yet. -->
 +
 +
 +<OpenGLAPI>
 +
 +<category name="GL_ARB_texture_barrier" number="167">
 +    <function name="TextureBarrier" alias="TextureBarrierNV" />
 +</category>
 +
 +</OpenGLAPI>
 diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
 index 212731f..2cc2752 100644
 --- a/src/mapi/glapi/gen/Makefile.am
 +++ b/src/mapi/glapi/gen/Makefile.am
 @@ -144,6 +144,7 @@ API_XML = \
 ARB_shader_atomic_counters.xml \
 ARB_shader_image_load_store.xml \
 ARB_sync.xml \
 +   ARB_texture_barrier.xml \
 ARB_texture_buffer_object.xml \
 ARB_texture_buffer_range.xml \
 ARB_texture_compression_rgtc.xml \
 diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
 index e011509..ccf3b9a 100644
 --- a/src/mapi/glapi/gen/gl_API.xml
 +++ b/src/mapi/glapi/gen/gl_API.xml
 @@ -8364,6 +8364,10 @@

  <xi:include href="ARB_multi_bind.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>

 +<!-- ARB extensions 148 - 166 -->
 +
 +<xi:include href="ARB_texture_barrier.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
 +
  <!-- Non-ARB extensions sorted by extension number. -->

  <category name="GL_EXT_blend_color" number="2">
 diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
 index 9ac8377..311f6ce 100644
 --- a/src/mesa/main/extensions.c
 +++ b/src/mesa/main/extensions.c
 @@ -151,6 +151,7 @@ static const struct extension extension_table[] = {
     { "GL_ARB_shadow",                      o(ARB_shadow),                      GLL, 2001 },
     { "GL_ARB_stencil_texturing",           o(ARB_stencil_texturing),           GL,  2012 },
     { "GL_ARB_sync",                        o(ARB_sync),                        GL,  2003 },
 +   { "GL_ARB_texture_barrier",             o(NV_texture_barrier),              GL,  2014 },
     { "GL_ARB_texture_border_clamp",        o(ARB_texture_border_clamp),        GLL, 2000 },
     { "GL_ARB_texture_buffer_object",       o(ARB_texture_buffer_object),       GLC, 2008 },
     { "GL_ARB_texture_buffer_object_rgb32", o(ARB_texture_buffer_object_rgb32), GLC, 2009 },
 --
 1.8.5.5


Re: [Mesa-dev] [PATCH 3/5] gallium: add opcodes/cap for fine derivative support

2014-08-14 Thread Roland Scheidegger

Am 14.08.2014 16:39, schrieb Ilia Mirkin:
 I guess a question is whether we should even bother with the fine
 version at all then? Just map everything to DDX/DDY... Although I
 guess if llvmpipe does the coarse version sometimes, at least the fine
 version is warranted.
I think it's nice to have both versions. llvmpipe only does the coarse
version for its internal use.
If a shader would do a ddx and ddy and then use the values for a texture
instruction with explicit derivatives, some slower path is used for
sampling (which can handle different mip levels in a quad), though this
currently depends a lot on debug vars such as no_quad_lod. The
problem is that even if you'd do a coarse_ddx, we still would fall back
to that slower path anyway, because (unlike intel hw, where it really
matters if the actual lod values are different) we won't detect that
there is in fact just one lod per quad, so right now there would not
really be a benefit. Obviously, if you do the derivatives calculations
as part of the sampling itself, this is not a problem. FWIW the slow
path isn't actually all THAT much more complicated than the per-quad lod
path: strides, mip image offsets etc. need to be looked up per pixel
rather than per quad, plus some slowness comes from the fact that stupid
sse/avx (only avx2 does) doesn't have a true vector shift... There's also
the fact that the tex filter may be different per pixel too (with
different min/mag filter), though since we do texture sampling for
multiple quads at once (in some cases at least, with avx), this is
something which needs to be handled in any case. I suspect hw being
slower with different effective lods per pixel has similar reasons:
there's just more work to be done.

Roland
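
To make the fine/coarse distinction concrete, here is a small sketch over a single 2x2 quad. It shows one possible interpretation only: the function names are hypothetical, the pixel order within the quad is an assumption, and a coarse implementation is free to pick its single difference differently from the top row and left column used here.

```c
/*
 * Assumed pixel order within a 2x2 quad:
 *   0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
 * All names below are hypothetical, for illustration only.
 */

/* Fine: each row gets its own horizontal difference. */
static void
ddx_fine(const float v[4], float out[4])
{
   out[0] = out[1] = v[1] - v[0];
   out[2] = out[3] = v[3] - v[2];
}

/* Fine: each column gets its own vertical difference. */
static void
ddy_fine(const float v[4], float out[4])
{
   out[0] = out[2] = v[2] - v[0];
   out[1] = out[3] = v[3] - v[1];
}

/* Coarse: one horizontal difference (top row here) for the whole quad. */
static void
ddx_coarse(const float v[4], float out[4])
{
   out[0] = out[1] = out[2] = out[3] = v[1] - v[0];
}

/* Coarse: one vertical difference (left column here) for the whole quad. */
static void
ddy_coarse(const float v[4], float out[4])
{
   out[0] = out[1] = out[2] = out[3] = v[2] - v[0];
}
```

For the quad values {0, 1, 10, 13}, the fine ddx differs between the rows (1 and 3) while the coarse ddx is 1 everywhere, which is why a coarse result guarantees a single lod per quad and a fine one does not.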



 
 On Thu, Aug 14, 2014 at 10:12 AM, Roland Scheidegger srol...@vmware.com 
 wrote:
 Reviewed-by: Roland Scheidegger srol...@vmware.com

 llvmpipe also already does the fine version. A coarse version (which we
 indeed do when used implicitly for sampling though with some other
 changes) might be minimally simpler though not even sure (might save a
 shuffle instruction somewhere), but probably not worth it (plus, d3d10
 sm4 had deriv_rtx and sm5 deriv_rtx_coarse/deriv_rtx_fine but the sm4
 versions correspond to the fine versions so this was required).

 Roland

 Am 14.08.2014 06:52, schrieb Ilia Mirkin:
 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
 ---
  src/gallium/auxiliary/tgsi/tgsi_info.c   |  3 +++
  src/gallium/auxiliary/tgsi/tgsi_util.c   |  2 ++
  src/gallium/docs/source/screen.rst   |  2 ++
  src/gallium/docs/source/tgsi.rst | 12 ++--
  src/gallium/drivers/freedreno/freedreno_screen.c |  1 +
  src/gallium/drivers/i915/i915_screen.c   |  1 +
  src/gallium/drivers/ilo/ilo_screen.c |  1 +
  src/gallium/drivers/llvmpipe/lp_screen.c |  1 +
  src/gallium/drivers/nouveau/nv30/nv30_screen.c   |  1 +
  src/gallium/drivers/nouveau/nv50/nv50_screen.c   |  1 +
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   |  1 +
  src/gallium/drivers/r300/r300_screen.c   |  1 +
  src/gallium/drivers/r600/r600_pipe.c |  1 +
  src/gallium/drivers/radeonsi/si_pipe.c   |  1 +
  src/gallium/drivers/softpipe/sp_screen.c |  1 +
  src/gallium/drivers/svga/svga_screen.c   |  1 +
  src/gallium/drivers/vc4/vc4_screen.c |  1 +
  src/gallium/include/pipe/p_defines.h |  1 +
  src/gallium/include/pipe/p_shader_tokens.h   |  5 -
  19 files changed, 35 insertions(+), 3 deletions(-)

 diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c 
 b/src/gallium/auxiliary/tgsi/tgsi_info.c
 index e24348f..35f9747 100644
 --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
 +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
 @@ -235,6 +235,9 @@ static const struct tgsi_opcode_info 
 opcode_info[TGSI_OPCODE_LAST] =
 { 1, 1, 0, 0, 0, 0, OTHR, INTERP_CENTROID, 
 TGSI_OPCODE_INTERP_CENTROID },
 { 1, 2, 0, 0, 0, 0, OTHR, INTERP_SAMPLE, TGSI_OPCODE_INTERP_SAMPLE },
 { 1, 2, 0, 0, 0, 0, OTHR, INTERP_OFFSET, TGSI_OPCODE_INTERP_OFFSET },
 +
 +   { 1, 1, 0, 0, 0, 0, COMP, DDX_FINE, TGSI_OPCODE_DDX_FINE },
 +   { 1, 1, 0, 0, 0, 0, COMP, DDY_FINE, TGSI_OPCODE_DDY_FINE },
  };

  const struct tgsi_opcode_info *
 diff --git a/src/gallium/auxiliary/tgsi/tgsi_util.c 
 b/src/gallium/auxiliary/tgsi/tgsi_util.c
 index e48159c..e1cba95 100644
 --- a/src/gallium/auxiliary/tgsi/tgsi_util.c
 +++ b/src/gallium/auxiliary/tgsi/tgsi_util.c
 @@ -245,6 +245,8 @@ tgsi_util_get_inst_usage_mask(const struct 
 tgsi_full_instruction *inst,
 case TGSI_OPCODE_USNE:
 case TGSI_OPCODE_IMUL_HI:
 case TGSI_OPCODE_UMUL_HI:
 +   case TGSI_OPCODE_DDX_FINE:
 +   case TGSI_OPCODE_DDY_FINE:
/* Channel-wise operations */
read_mask = write_mask;
break;
 diff --git a/src/gallium/docs/source/screen.rst 
 b/src/gallium/docs/source/screen.rst
 index 814e3ae..6fecc15 100644
 ---
