Re: [Mesa-dev] [PATCH V3] glapi: add GL_OES_geometry_shader extension

2016-01-13 Thread Tapani Pälli

On 01/13/2016 10:29 AM, Lofstedt, Marta wrote:



-----Original Message-----
From: ibmir...@gmail.com [mailto:ibmir...@gmail.com] On Behalf Of Ilia
Mirkin
Sent: Tuesday, January 12, 2016 7:09 PM
To: Marta Lofstedt
Cc: mesa-dev@lists.freedesktop.org; Romanick, Ian D; Lofstedt, Marta
Subject: Re: [PATCH V3] glapi: add GL_OES_geometry_shader extension

On Tue, Jan 12, 2016 at 9:13 AM, Marta Lofstedt
 wrote:

From: Marta Lofstedt 

Add xml definitions for the GL_OES_geometry_shader extension and
expose the extension for OpenGL ES 3.1.

V3: Added dependency to OES_shader_io_blocks and updated to correct
Khronos extension number.

May I ask why you did this? OES_shader_io_blocks is a purely shader
compiler/linker feature, I expect it will be enabled whenever GLES 3.1 is
enabled, no? Why would it be tied to geometry shaders? Sure, geometry
shaders require it to work, but just because you have OES_shader_io_blocks
doesn't necessarily mean you also have geometry shaders...


My intention was to address the co-dependency between oes_geometry_shader and 
oes_shader_io_block.
But as always, you are right Ilia. The dependency issue needs to be fixed in the 
driver.

So, please disregard this V3, I will push the V2 with the changes suggested by 
Ilia in the comments.

FYI here are quotes from the oes_geometry_shader specification:
" OES_shader_io_blocks or EXT_shader_io_blocks is required."


IMO, according to this it looks like OES_shader_io_blocks is a valid 
requirement, as that functionality is not part of OpenGL ES 3.1.



" This extension relies on the OES_shader_io_blocks extension to provide
 the required functionality for declaring input and output blocks and
 interfacing between shaders."

" If the OES_geometry_shader extension is enabled, the
 OES_shader_io_blocks extension is also implicitly enabled.



In practical terms, there's a non-trivial chance that A4xx will get tessellation
before geometry shaders, which would also require OES_shader_io_blocks
to be exposed.

Yes, but for desktop we already have the "shader_io_block" functionality.
So, the dependency is a GLES issue, and tessellation is not yet exposed under 
GLES.


   -ilia


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/8] gallium/radeon: implement PIPE_CAP_INVALIDATE_BUFFER

2016-01-13 Thread Fredrik Höglund
On Tuesday 12 January 2016, Nicolai Hähnle wrote:
> On 12.01.2016 13:41, Fredrik Höglund wrote:
> > On Tuesday 12 January 2016, Nicolai Hähnle wrote:
> >> From: Nicolai Hähnle 
> >>
> >> ---
> >>   src/gallium/drivers/r600/r600_pipe.c|  2 +-
> >>   src/gallium/drivers/radeon/r600_buffer_common.c | 23 
> >> ---
> >>   src/gallium/drivers/radeon/r600_pipe_common.c   |  1 +
> >>   src/gallium/drivers/radeon/r600_pipe_common.h   |  3 +++
> >>   src/gallium/drivers/radeonsi/si_pipe.c  |  2 +-
> >>   5 files changed, 22 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/src/gallium/drivers/r600/r600_pipe.c 
> >> b/src/gallium/drivers/r600/r600_pipe.c
> >> index a8805f6..569f77c 100644
> >> --- a/src/gallium/drivers/r600/r600_pipe.c
> >> +++ b/src/gallium/drivers/r600/r600_pipe.c
> >> @@ -278,6 +278,7 @@ static int r600_get_param(struct pipe_screen* pscreen, 
> >> enum pipe_cap param)
> >>case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
> >>case PIPE_CAP_TGSI_TXQS:
> >>case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS:
> >> +  case PIPE_CAP_INVALIDATE_BUFFER:
> >>return 1;
> >>
> >>case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
> >> @@ -355,7 +356,6 @@ static int r600_get_param(struct pipe_screen* pscreen, 
> >> enum pipe_cap param)
> >>case PIPE_CAP_TGSI_FS_POSITION_IS_SYSVAL:
> >>case PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL:
> >>case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT:
> >> -  case PIPE_CAP_INVALIDATE_BUFFER:
> >>return 0;
> >>
> >>case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS:
> >> diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
> >> b/src/gallium/drivers/radeon/r600_buffer_common.c
> >> index aeb9a20..09755e0 100644
> >> --- a/src/gallium/drivers/radeon/r600_buffer_common.c
> >> +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
> >> @@ -209,6 +209,21 @@ static void r600_buffer_destroy(struct pipe_screen 
> >> *screen,
> >>FREE(rbuffer);
> >>   }
> >>
> >> +void r600_invalidate_resource(struct pipe_context *ctx,
> >> +struct pipe_resource *resource)
> >> +{
> >> +  struct r600_common_context *rctx = (struct r600_common_context*)ctx;
> >> +struct r600_resource *rbuffer = r600_resource(resource);
> >> +
> >> +  /* Check if mapping this buffer would cause waiting for the GPU. */
> >> +  if (r600_rings_is_buffer_referenced(rctx, rbuffer->buf, 
> >> RADEON_USAGE_READWRITE) ||
> >> +  !rctx->ws->buffer_wait(rbuffer->buf, 0, RADEON_USAGE_READWRITE)) {
> >> +  rctx->invalidate_buffer(&rctx->b, &rbuffer->b.b);
> >> +  } else {
> >> +  util_range_set_empty(&rbuffer->valid_buffer_range);
> >> +  }
> >
> > This implementation does not exactly comply with the specification.
> >
> > The point of InvalidateBuffer is to tell the driver that it may discard the
> > contents of the buffer if, for example, the buffer needs to be evicted.
> >
> > Calling InvalidateBuffer is not equivalent to calling MapBufferRange
> > with GL_MAP_INVALIDATE_BUFFER_BIT, since the former should invalidate
> > the buffer regardless of whether it is busy or not.
> 
> Can you back this with a quote from the spec? Given that no-op seems to 
> be a correct implementation of InvalidateBuffer, I find what you write 
> rather hard to believe.

The overview says:

"GL implementations often include several memory spaces, each with
 distinct performance characteristics, and the implementations
 transparently move allocations between memory spaces. With this
 extension, an application can tell the GL that the contents of a
 texture or buffer are no longer needed, and the implementation can
 avoid transferring the data unnecessarily."

This to me makes the intent pretty clear.  The implementation is of
course free to do what it wants with this information, including nothing
at all.  My objection here is that your implementation only helps
applications that are using the extension incorrectly.  But it is still an
improvement over doing nothing at all.
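
Illustration (not part of the patch): the branch structure of the quoted
r600_invalidate_resource hunk can be modelled with a small standalone sketch.
Everything here (struct toy_buffer, toy_invalidate, the storage_id counter) is
hypothetical and only mirrors the "busy: swap in new storage, idle: forget the
valid range" decision; it is not the radeon code, and treating fresh storage as
having an empty valid range is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver buffer; not a Mesa structure. */
struct toy_buffer {
   bool gpu_busy;        /* would mapping have to wait for the GPU? */
   unsigned storage_id;  /* identifies the backing allocation */
   size_t valid_start;   /* valid data lives in [valid_start, valid_end) */
   size_t valid_end;
};

static unsigned next_storage_id = 1;

static void toy_mark_range_empty(struct toy_buffer *buf)
{
   buf->valid_start = 0;
   buf->valid_end = 0;
}

/* Busy buffer: replace the storage so nobody waits on the GPU.
 * Idle buffer: keep the storage and only forget its contents. */
static void toy_invalidate(struct toy_buffer *buf)
{
   assert(buf != NULL);

   if (buf->gpu_busy) {
      buf->storage_id = next_storage_id++;
      buf->gpu_busy = false;
      /* Assumption of this sketch: fresh storage holds no valid data. */
      toy_mark_range_empty(buf);
   } else {
      toy_mark_range_empty(buf);
   }
}
```

In this model the contents are discarded in both branches; the only difference
is whether the backing storage is replaced. Whether that matches the intent of
InvalidateBufferData is exactly what the thread is debating.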

> Part of the problem may be that the spec talks about "invalidating" 
> without - as far as I can tell - ever defining what that means. In any 
> case, I see no reason why the behavior should be different from 
> GL_MAP_INVALIDATE_BUFFER_BIT.
>
> Thanks,
> Nicolai
> 
> >
> >> +}
> >> +
> >>   static void *r600_buffer_get_transfer(struct pipe_context *ctx,
> >>  struct pipe_resource *resource,
> >> unsigned level,
> >> @@ -276,13 +291,7 @@ static void *r600_buffer_transfer_map(struct 
> >> pipe_context *ctx,
> >>!(usage & PIPE_TRANSFER_UNSYNCHRONIZED)) {
> >>assert(usage & PIPE_TRANSFER_WRITE);
> >>
> >> -  /* Check if mapping this buffer would cause waiting for the 
> >> GPU. */
> >> -  if (r600_rings_is_buffer_referenced(rctx, rbuffer->buf, 
> >> RADEON_USAGE_READWRITE) ||
> >> -  

Re: [Mesa-dev] [PATCH] radeonsi: don't print a warning for unhandled registers returned by LLVM

2016-01-13 Thread Marek Olšák
On Wed, Jan 13, 2016 at 4:25 AM, Michel Dänzer  wrote:
> On 13.01.2016 03:44, Marek Olšák wrote:
>> From: Marek Olšák 
>>
>> We don't want apps to flood stderr. New LLVM + old Mesa is a perfectly
>> valid combination (if it doesn't fail to build, of course).
>
> Actually it's not, in general.

Why not?

Marek


Re: [Mesa-dev] [PATCH 1/2] nir: Add a fquantize2f16 opcode

2016-01-13 Thread Ian Romanick
On 01/12/2016 05:41 PM, Matt Turner wrote:
> On Tue, Jan 12, 2016 at 4:10 PM, Jason Ekstrand  wrote:
>> On Tue, Jan 12, 2016 at 3:52 PM, Matt Turner  wrote:
>>>
>>> On Tue, Jan 12, 2016 at 3:35 PM, Jason Ekstrand 
>>> wrote:
>>>> This opcode simply takes a 32-bit floating-point value and reduces its
>>>> effective precision to 16 bits.
>>>> ---
>>>
>>> What's it supposed to do for values not representable in half-precision?
>>
>>
>> If they're in-range, round.  If they're out-of-range, the appropriate
>> infinity.
> 
> Are you sure that's the behavior hardware has? And by "are you sure" I
> mean "have you tested it"
> 
> The conversion table in the f32to16 documentation in the IVB PRM says:
> 
> single precision -> half precision
> 
> -finite -> -finite/-denorm/-0
> +finite -> +finite/+denorm/+0
> 
>> https://www.khronos.org/registry/spir-v/specs/1.0/SPIRV.html#OpQuantizeToF16
> 
>> Quantize a floating-point value to what is expressible by a 16-bit 
>> floating-point value.
> 
> Erf, anyway,
> 
> ... and the "convert too-large values to inf" isn't the behavior of
> other languages like C [1] (and I don't think GLSL either, but I can't
> find anything on the matter in the spec) or OpenCL C [2].

Some background may either clarify or further muddy things.

Right now applications sprinkle mediump and lowp all over the place in
GLSL ES shaders.  Many vertex shader implementations, even on mobile
devices, do everything in single precision.  Many devices will only use
f16 part of the time because some instructions may not have f16
versions.  When we finally implement f16 in the i965 driver, we'll be in
this boat too.

As a result, people think that their mediump-decorated code is fine...
until it actually runs on a device that really does mediump.  Then they
report a bug to the vendor of that hardware.  Sound like a familiar
situation?

From this problem the OpQuantizeToF16 SPIR-V instruction was born.  The
intention is that people could compile their code in a way that mediump
gives you mediump precision on every device.  While you probably
wouldn't want to ship such code, this at least makes it possible to test
it without having to find a device that will really do native mediump
calculations all the time.

IIRC, GLSL doesn't require Inf in mediump.  I don't recall what SPIR-V
says.  I believe that GLSL allows saturating to the maximum magnitude
representable value.  What we want is for an expression tree like

OpQuantizeToF16(OpQuantizeToF16(x) + OpQuantizeToF16(y))

to produce the same value that 'x + y' would produce in "real" f16 mediump.

The SPIR-V +/-Inf requirement doesn't completely jibe with my
recollection of the discussions... but there was a lot of
back-and-forth, and it was quite a few months ago at this point.  I
think we may have picked just one possible answer instead of allowing
both choices just for consistency.  I don't have any memory whether
anyone strongly wanted the +/-Inf behavior or if it was just a coin toss.
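
Illustration (not from the patch): the two out-of-range policies discussed here
can be compared with a small software quantizer. This is a sketch under stated
assumptions, not the NIR lowering and not what any hardware does: it assumes
IEEE 754 binary32 floats and the default round-to-nearest-even mode, and the
names quantize_to_f16 and f16_add_emulated are made up for the example. The
"saturate" flag selects clamping to the largest finite half value (65504)
instead of producing +/-Inf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Round a float to the nearest value representable in half precision and
 * return it as a float. Out-of-range results become +/-Inf, or +/-65504
 * when 'saturate' is set. Hypothetical helper for illustration only. */
static float quantize_to_f16(float x, bool saturate)
{
   uint32_t bits;
   memcpy(&bits, &x, sizeof bits);
   const uint32_t sign = bits & 0x80000000u;
   uint32_t mag = bits & 0x7fffffffu;

   if (mag >= 0x7f800000u)
      return x;                         /* Inf and NaN pass through */

   if (mag < 0x38800000u) {             /* |x| < 2^-14: f16 denormal range */
      /* Values in [0.5, 1) are spaced 2^-24 apart, the same spacing as f16
       * denormals, so adding and subtracting 0.5 rounds to that grid. */
      float a;
      memcpy(&a, &mag, sizeof a);
      volatile float t = a + 0.5f;
      a = t - 0.5f;
      memcpy(&mag, &a, sizeof mag);
   } else {
      /* Keep 10 mantissa bits, rounding to nearest with ties to even. */
      mag += 0x0fffu + ((mag >> 13) & 1u);
      mag &= ~0x1fffu;
   }

   if (mag > 0x477fe000u)               /* larger than 65504 */
      mag = saturate ? 0x477fe000u : 0x7f800000u;

   bits = sign | mag;
   memcpy(&x, &bits, sizeof x);
   return x;
}

/* The expression tree from the mail: quantize inputs, add, quantize again.
 * The float add rounds once and the final quantize rounds a second time, so
 * this only approximates a native f16 add. */
static float f16_add_emulated(float x, float y, bool saturate)
{
   return quantize_to_f16(quantize_to_f16(x, saturate) +
                          quantize_to_f16(y, saturate), saturate);
}
```

With this sketch, quantize_to_f16(100000.0f, false) gives +Inf while
quantize_to_f16(100000.0f, true) gives 65504.0f, which is the difference
between the SPIR-V wording and the saturating behavior guessed at from the
IVB PRM table.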

> Section 8.3.2 of the OpenCL C 2.0 spec is also relevant, but doesn't
> touch directly on the issue at hand.
> 
> I'm worried that what is specified is not implementable via a round
> trip through half-precision, because it's not the behavior other
> languages implement.
> 
> If I had to guess, given the table in the IVB PRM and section 8.3.2,
> out-of-range single-precision floats are converted to the
> half-precision value with the largest magnitude.

You are correct, we should test it to be sure what the hardware really
does. This is not intended to be a performance operation. If we need to
use a different, more expensive expansion to meet the requirements, we
shouldn't lose any sleep over it.

> [1] C99 spec, 6.3.1.5 says "If the value being converted is outside
> the range of values that can be represented, the behavior is
> undefined."
> [2] OpenCL C 2.0 spec 6.2.3.3 says to refer to C99 spec section 6.3.


[Mesa-dev] [PATCH V2 19/28] glsl: add support for explicit components to frag outputs

2016-01-13 Thread Timothy Arceri
V2: fix error checking for arrays and components. V1 was
only taking into account all the array elements and all the
components of one of the varyings during the comparison
and treating the other as a single slot/component.

Cc: Anuj Phogat 
---
 src/glsl/linker.cpp | 72 +
 1 file changed, 62 insertions(+), 10 deletions(-)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index b81bfba..c66dcc4 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -2411,7 +2411,12 @@ assign_attribute_or_color_locations(gl_shader_program 
*prog,
   }
} to_assign[16];
 
+   /* Temporary array for the set of attributes that have locations assigned.
+*/
+   ir_variable *assigned[16];
+
unsigned num_attr = 0;
+   unsigned assigned_attr = 0;
 
foreach_in_list(ir_instruction, node, sh->ir) {
   ir_variable *const var = node->as_variable();
@@ -2573,18 +2578,62 @@ assign_attribute_or_color_locations(gl_shader_program 
*prog,
 * attribute overlaps any previously allocated bits.
 */
if ((~(use_mask << attr) & used_locations) != used_locations) {
-   if (target_index == MESA_SHADER_FRAGMENT ||
-   (prog->IsES && prog->Version >= 300)) {
-  linker_error(prog,
-   "overlapping location is assigned "
-   "to %s `%s' %d %d %d\n", string,
-   var->name, used_locations, use_mask, attr);
+   if (target_index == MESA_SHADER_FRAGMENT && !prog->IsES) {
+  /* From section 4.4.2 (Output Layout Qualifiers) of the GLSL
+   * 4.40 spec:
+   *
+   *"Additionally, for fragment shader outputs, if two
+   *variables are placed within the same location, they
+   *must have the same underlying type (floating-point or
+   *integer). No component aliasing of output variables or
+   *members is allowed.
+   */
+  for (unsigned i = 0; i < assigned_attr; i++) {
+ unsigned assigned_slots =
+assigned[i]->type->count_attribute_slots(false);
+unsigned assig_attr =
+assigned[i]->data.location - generic_base;
+unsigned assigned_use_mask = (1 << assigned_slots) - 1;
+
+ if ((assigned_use_mask << assig_attr) &
+ (use_mask << attr)) {
+
+const glsl_type *assigned_type =
+   assigned[i]->type->without_array();
+const glsl_type *type = var->type->without_array();
+if (assigned_type->base_type != type->base_type) {
+   linker_error(prog, "types do not match for aliased"
+" %ss %s and %s\n", string,
+assigned[i]->name, var->name);
+   return false;
+}
+
+unsigned assigned_component_mask =
+   ((1 << assigned_type->vector_elements) - 1) <<
+   assigned[i]->data.location_frac;
+unsigned component_mask =
+   ((1 << type->vector_elements) - 1) <<
+   var->data.location_frac;
+if (assigned_component_mask & component_mask) {
+   linker_error(prog, "overlapping component is "
+"assigned to %ss %s and %s "
+"(component=%d)\n",
+string, assigned[i]->name, var->name,
+var->data.location_frac);
+   return false;
+}
+ }
+  }
+   } else if (target_index == MESA_SHADER_FRAGMENT ||
+  (prog->IsES && prog->Version >= 300)) {
+  linker_error(prog, "overlapping location is assigned "
+   "to %s `%s' %d %d %d\n", string, var->name,
+   used_locations, use_mask, attr);
   return false;
} else {
-  linker_warning(prog,
- "overlapping location is assigned "
- "to %s `%s' %d %d %d\n", string,
- var->name, used_locations, use_mask, attr);
+  linker_warning(prog, "overlapping location is assigned "
+ "to %s `%s' %d %d %d\n", string, var->name,
+ used_locations, use_mask, attr);

Re: [Mesa-dev] [PATCH 19/28] glsl: add support for explicit components to frag outputs

2016-01-13 Thread Timothy Arceri
On Tue, 2016-01-12 at 16:36 -0800, Anuj Phogat wrote:
> On Mon, Dec 28, 2015 at 9:00 PM, Timothy Arceri
>  wrote:
> > ---
> >  src/glsl/linker.cpp | 56
> > -
> >  1 file changed, 55 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> > index 41ff057..44dd7f0 100644
> > --- a/src/glsl/linker.cpp
> > +++ b/src/glsl/linker.cpp
> > @@ -2411,7 +2411,12 @@
> > assign_attribute_or_color_locations(gl_shader_program *prog,
> >}
> > } to_assign[16];
> > 
> > +   /* Temporary array for the set of attributes that have
> > locations assigned.
> > +*/
> > +   ir_variable *assigned[16];
> > +
> > unsigned num_attr = 0;
> > +   unsigned assigned_attr = 0;
> > 
> > foreach_in_list(ir_instruction, node, sh->ir) {
> >ir_variable *const var = node->as_variable();
> > @@ -2573,7 +2578,53 @@
> > assign_attribute_or_color_locations(gl_shader_program *prog,
> >  * attribute overlaps any previously allocated bits.
> >  */
> > if ((~(use_mask << attr) & used_locations) !=
> > used_locations) {
> > -   if (target_index == MESA_SHADER_FRAGMENT ||
> > +   if (target_index == MESA_SHADER_FRAGMENT && !prog
> > ->IsES) {
> > +  /* From section 4.4.2 (Output Layout Qualifiers)
> > of the GLSL
> > +   * 4.40 spec:
> > +   *
> > +   *"Additionally, for fragment shader
> > outputs, if two
> > +   *variables are placed within the same
> > location, they
> > +   *must have the same underlying type
> > (floating-point or
> > +   *integer). No component aliasing of output
> > variables or
> > +   *members is allowed.
> > +   */
> > +  int frag_out_end_loc = (var->type->is_array() ?
> > + var->type->arrays_of_arrays_size() : 1) +
> > + var->data.location;
> > +
> > +  for (unsigned i = 0; i < assigned_attr; i++) {
> > + for (int j = var->data.location; j <
> > frag_out_end_loc;
> > +  j++) {
> > +if (assigned[i]->data.location == j) {
> I find assigned[i]->data.location == var->data.location more
> readable.

This comment got me looking at this code and the piglit tests again and
I seem to be missing a piglit test for overlapping array output from the
fragment shader.

... 20 minutes later, after writing the tests, it seems that both error
checks for arrays and components are only half working correctly with
this patch.

I'm about to send a V2 and have already sent the piglit tests:

http://patchwork.freedesktop.org/patch/70351/
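
Illustration (not part of either patch version): the component comparison
quoted further down only looks in one direction, while a pair of bit masks
treats both outputs symmetrically. The sketch below is a simplified standalone
model; struct toy_output and all function names are hypothetical, and it
ignores everything the real linker tracks beyond location, slot count, first
component and component count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified description of one fragment shader output. */
struct toy_output {
   unsigned location;        /* first location occupied */
   unsigned slots;           /* consecutive locations used (array size) */
   unsigned location_frac;   /* first component, 0..3 */
   unsigned vector_elements; /* number of components, 1..4 */
};

/* One bit per occupied location. */
static unsigned location_mask(const struct toy_output *o)
{
   assert(o->slots > 0 && o->location + o->slots < 32);
   return ((1u << o->slots) - 1u) << o->location;
}

/* One bit per occupied component within a location. */
static unsigned component_mask(const struct toy_output *o)
{
   assert(o->vector_elements > 0 && o->location_frac + o->vector_elements <= 4);
   return ((1u << o->vector_elements) - 1u) << o->location_frac;
}

/* Symmetric check: the outputs alias if they share a location and a component. */
static bool outputs_alias(const struct toy_output *a, const struct toy_output *b)
{
   return (location_mask(a) & location_mask(b)) != 0 &&
          (component_mask(a) & component_mask(b)) != 0;
}

/* The one-directional component comparison from the V1 hunk, for contrast. */
static bool one_sided_component_check(const struct toy_output *assigned,
                                      const struct toy_output *var)
{
   return assigned->location_frac == var->location_frac ||
          (assigned->location_frac < var->location_frac &&
           assigned->location_frac + assigned->vector_elements >
              var->location_frac);
}
```

For example, with a vec2 already assigned at component 2 and a new vec3 at
component 0 of the same location, the masks are 0b1100 and 0b0111 and overlap
in component 2, but the one-sided comparison reports no conflict because the
new variable starts before the assigned one.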

> 
> > +   if (assigned[i]->type->without_array()
> > ->base_type !=
> > +   var->type->without_array()
> > ->base_type) {
> > +  linker_error(prog,
> > +   "types do not match for
> > aliased"
> > +   " %ss %s and %s\n",
> > string,
> > +   assigned[i]->name, var
> > ->name);
> > +  return false;
> > +   }
> > +
> > +   if ((assigned[i]->data.location_frac ==
> > +var->data.location_frac) ||
> > +  ((assigned[i]->data.location_frac <
> > +var->data.location_frac) &&
> > +((assigned[i]->data.location_frac
> > +
> > +  assigned[i]->type
> > ->vector_elements) >
> > + var->data.location_frac))) {
> > +  linker_error(prog,
> > +   "overlapping component
> > is "
> > +   "assigned to %ss %s and
> > %s "
> > +   "(component=%d)\n",
> > +   string, assigned[i]
> > ->name,
> > +   var->name,
> > +   var
> > ->data.location_frac);
> > +  return false;
> > +   }
> > +}
> > + }
> > +  }
> > +   } else if (target_index == MESA_SHADER_FRAGMENT ||
> > (prog->IsES && prog->Version >= 300)) {
> >linker_error(prog,
> > "overlapping location is assigned "
> > @@ -2614,6 +2665,9 @@
> > assign_attribute_or_color_locations(gl_shader_program *prog,
> > double_storage_locations |= (use_mask << attr);
> > 

Re: [Mesa-dev] [PATCH demos] configure.ac: Fix default behavior of AC_ARG_WITH(glut) if glut isn't available

2016-01-13 Thread Andreas Boll
ping

2015-12-10 16:32 GMT+01:00 Andreas Boll :
> Fixes a regression introduced in
> 406248811eb0dfabf75ae9495b54529ec59cce66
>
> It wrongly sets glut_enabled=yes if glut isn't available and neither
> option --with-glut nor --without-glut was given.
>
> The default behavior in that case should be to enable glut if it is
> available and to disable it otherwise.
>
> To fix this the default value of glut_enabled is set back to yes and in
> case --without-glut was given glut_enabled is set to no.
>
> Cc: Ross Burton 
> Signed-off-by: Andreas Boll 
> ---
>  configure.ac | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index 0525b09..ddc68b5 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -67,7 +67,7 @@ DEMO_CFLAGS="$DEMO_CFLAGS $GL_CFLAGS"
>  DEMO_LIBS="$DEMO_LIBS $GL_LIBS"
>
>  dnl Check for GLUT
> -glut_enabled=no
> +glut_enabled=yes
>  AC_ARG_WITH([glut],
> [AS_HELP_STRING([--with-glut=DIR],
> [glut install directory])],
> @@ -83,9 +83,8 @@ AS_IF([test "x$with_glut" != xno],
> AC_CHECK_LIB([glut],
>  [glutInit],
>  [],
> -[glut_enabled=no])
> -   glut_enabled=yes
> -])
> +[glut_enabled=no])],
> +  [glut_enabled=no])
>
>  dnl Check for FreeGLUT 2.6 or later
>  AC_EGREP_HEADER([glutInitContextProfile],
> --
> 2.1.4
>


Re: [Mesa-dev] [PATCH V3] glapi: add GL_OES_geometry_shader extension

2016-01-13 Thread Lofstedt, Marta


> -----Original Message-----
> From: ibmir...@gmail.com [mailto:ibmir...@gmail.com] On Behalf Of Ilia
> Mirkin
> Sent: Tuesday, January 12, 2016 7:09 PM
> To: Marta Lofstedt
> Cc: mesa-dev@lists.freedesktop.org; Romanick, Ian D; Lofstedt, Marta
> Subject: Re: [PATCH V3] glapi: add GL_OES_geometry_shader extension
> 
> On Tue, Jan 12, 2016 at 9:13 AM, Marta Lofstedt
>  wrote:
> > From: Marta Lofstedt 
> >
> > Add xml definitions for the GL_OES_geometry_shader extension and
> > expose the extension for OpenGL ES 3.1.
> >
> > V3: Added dependency to OES_shader_io_blocks and updated to correct
> > Khronos extension number.
> 
> May I ask why you did this? OES_shader_io_blocks is a purely shader
> compiler/linker feature, I expect it will be enabled whenever GLES 3.1 is
> enabled, no? Why would it be tied to geometry shaders? Sure, geometry
> shaders require it to work, but just because you have OES_shader_io_blocks
> doesn't necessarily mean you also have geometry shaders...
> 

My intention was to address the co-dependency between oes_geometry_shader and 
oes_shader_io_block. 
But as always, you are right Ilia. The dependency issue needs to be fixed in the 
driver.

So, please disregard this V3, I will push the V2 with the changes suggested by 
Ilia in the comments.

FYI here are quotes from the oes_geometry_shader specification:
" OES_shader_io_blocks or EXT_shader_io_blocks is required."

" This extension relies on the OES_shader_io_blocks extension to provide
the required functionality for declaring input and output blocks and
interfacing between shaders."

" If the OES_geometry_shader extension is enabled, the
OES_shader_io_blocks extension is also implicitly enabled.


> In practical terms, there's a non-trivial chance that A4xx will get 
> tessellation
> before geometry shaders, which would also require OES_shader_io_blocks
> to be exposed.
Yes, but for desktop we already have the "shader_io_block" functionality.
So, the dependency is a GLES issue, and tessellation is not yet exposed under 
GLES.

> 
>   -ilia


[Mesa-dev] [Bug 92687] Add support for ARB_internalformat_query2

2016-01-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92687

--- Comment #3 from Eduardo Lima Mitev  ---
(In reply to Eduardo Lima Mitev from comment #2)
> 
> Following, there are some initial issues/questions we have been gathering:
> 

Independently of feedback on the branch we posted, it would be very useful to
get insights on the issues/questions above, most of which are generic.

In any case, we plan to send the branch as an RFC series to the mesa-dev list
soon (e.g., by the end of this week).



Re: [Mesa-dev] [PATCH] i965/meta-fast-clear: Convert the clear color through the surf format

2016-01-13 Thread Neil Roberts
Bump. Anyone fancy reviewing this small patch? I think it would be good
to have because it makes the code a bit simpler as well as fixing a
corner case and making it more robust.

- Neil

Neil Roberts  writes:

> When programming the fast clear color there was previously a chunk of
> code to try to make the color match the constraints of the surface
> format such as by filling in missing components and handling luminance
> formats. These cases are not handled by the hardware. There are some
> additional possible restrictions that the hardware does seem to
> handle, such as clamping to [0,1] for normalised formats. However for
> whatever reason it doesn't clamp to [0,∞] for the special float
> formats that don't have a sign bit. Rather than adding yet another
> special case for this format this patch makes it instead convert the
> color to the actual surface format and back again so that we can be
> sure it will have all of the possible restrictions. Additionally this
> would avoid some other potential surprises such as getting more
> precision for the clear color when fast clears are used.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93338
> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 57 
> -
>  1 file changed, 27 insertions(+), 30 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> index cf0e56b..29ae6f0 100644
> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> @@ -37,6 +37,8 @@
>  #include "main/uniforms.h"
>  #include "main/fbobject.h"
>  #include "main/texobj.h"
> +#include "main/format_unpack.h"
> +#include "main/format_pack.h"
>  
>  #include "main/api_validate.h"
>  #include "main/state.h"
> @@ -397,45 +399,40 @@ set_fast_clear_color(struct brw_context *brw,
>   struct intel_mipmap_tree *mt,
>   const union gl_color_union *color)
>  {
> +   mesa_format linear_format = _mesa_get_srgb_format_linear(mt->format);
> union gl_color_union override_color = *color;
> -
> -   /* The sampler doesn't look at the format of the surface when the fast
> -* clear color is used so we need to implement luminance, intensity and
> -* missing components manually.
> -*/
> -   switch (_mesa_get_format_base_format(mt->format)) {
> -   case GL_INTENSITY:
> -  override_color.ui[3] = override_color.ui[0];
> -  /* flow through */
> -   case GL_LUMINANCE:
> -   case GL_LUMINANCE_ALPHA:
> -  override_color.ui[1] = override_color.ui[0];
> -  override_color.ui[2] = override_color.ui[0];
> -  break;
> -   default:
> -  for (int i = 0; i < 3; i++) {
> - if (!_mesa_format_has_color_component(mt->format, i))
> -override_color.ui[i] = 0;
> -  }
> -  break;
> -   }
> -
> -   if (!_mesa_format_has_color_component(mt->format, 3)) {
> -  if (_mesa_is_format_integer_color(mt->format))
> - override_color.ui[3] = 1;
> -  else
> - override_color.f[3] = 1.0f;
> -   }
> +   union gl_color_union tmp_color;
>  
> /* Handle linear→SRGB conversion */
> -   if (brw->ctx.Color.sRGBEnabled &&
> -   _mesa_get_srgb_format_linear(mt->format) != mt->format) {
> +   if (brw->ctx.Color.sRGBEnabled && linear_format != mt->format) {
>for (int i = 0; i < 3; i++) {
>   override_color.f[i] =
>  util_format_linear_to_srgb_float(override_color.f[i]);
>}
> }
>  
> +   /* Convert the clear color to the surface format and back so that the 
> color
> +* returned when sampling is guaranteed to be a value that could be stored
> +* in the surface. For example if the surface is a luminance format and we
> +* clear to 0.5,0.75,0.1,0.2 we want the color to come back as
> +* 0.5,0.5,0.5,1.0. In general the hardware doesn't seem to look at the
> +* surface format when returning the clear color so we need to do this to
> +* implement luminance, intensity and missing components. However it does
> +* seem to look at it in some cases such as to clamp to the range [0,1] 
> for
> +* unorm formats. Surprisingly however it doesn't clamp to [0,∞] for the
> +* special float formats that don't have a sign bit.
> +*/
> +   if (!_mesa_is_format_integer_color(linear_format)) {
> +  _mesa_pack_float_rgba_row(linear_format,
> +1, /* n_pixels */
> +(const GLfloat (*)[4]) override_color.f,
> +&tmp_color);
> +  _mesa_unpack_rgba_row(linear_format,
> +1, /* n_pixels */
> +&tmp_color,
> +(GLfloat (*)[4]) override_color.f);
> +   }
> +
> if (brw->gen >= 9) {
>mt->gen9_fast_clear_color = override_color;
> } else {
> -- 
> 1.9.3
>
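
Illustration (not part of the patch): the "convert to the surface format and
back" idea can be shown with a toy 8-bit unorm luminance format. The helper
names below are hypothetical and do not use Mesa's pack/unpack functions; the
sketch only demonstrates that a round trip applies the format's restrictions
(clamping, precision loss, replicated luminance, alpha forced to 1) in one step.

```c
#include <assert.h>
#include <stdint.h>

/* Pack an RGBA color into a toy 8-bit unorm luminance texel: only the red
 * channel is stored, clamped to [0, 1]. Hypothetical helper. */
static uint8_t toy_pack_l8_unorm(const float rgba[4])
{
   float r = rgba[0];
   if (!(r > 0.0f))   /* also catches NaN */
      r = 0.0f;
   if (r > 1.0f)
      r = 1.0f;
   return (uint8_t)(r * 255.0f + 0.5f);
}

/* Unpack the texel the way a sampler would see a luminance surface:
 * luminance replicated to RGB, alpha fixed at 1. */
static void toy_unpack_l8_unorm(uint8_t texel, float rgba[4])
{
   const float l = (float)texel / 255.0f;
   rgba[0] = l;
   rgba[1] = l;
   rgba[2] = l;
   rgba[3] = 1.0f;
}

/* Replace a clear color with the value that could actually be stored. */
static void toy_round_trip_clear_color(float rgba[4])
{
   const uint8_t texel = toy_pack_l8_unorm(rgba);
   toy_unpack_l8_unorm(texel, rgba);
}
```

Clearing to 0.5,0.75,0.1,0.2 in this toy format comes back as roughly
0.502,0.502,0.502,1.0: the same shape as the 0.5,0.5,0.5,1.0 example in the
patch comment, plus the 8-bit rounding.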

[Mesa-dev] Mesa 11.1.1

2016-01-13 Thread Emil Velikov
Mesa 11.1.1 is now available.

With this release we have a significant number of fixes - from radeonsi
(Fiji, Hyper-Z), r600 (geom. shaders), nouveau (ir), freedreno (piglits),
i965 (UBOs) and a few patches for "GRID Autosport" (i965 and glsl).
Additionally I've included the PCI IDs for Intel's KabyLake devices.

Last but not least - a few more BSD related build fixes are included :-)


Brian Paul (1):
  st/mesa: check state->mesa in early return check in st_validate_state()

Dave Airlie (6):
  mesa/varray: set double arrays to non-normalised.
  mesa/shader: return correct attribute location for double matrix arrays
  glsl: pass stage into mark function
  glsl/fp64: add helper for dual slot double detection.
  glsl: fix count_attribute_slots to allow for different 64-bit handling
  glsl: only update doubles inputs for vertex inputs.

Emil Velikov (5):
  docs: add sha256 checksums for 11.0.1
  cherry-ignore: drop the "re-enable" DCC on Stoney
  cherry-ignore: don't pick a specific i965 formats patch
  Update version to 11.1.1
  docs: add release notes for 11.1.1

Eric Anholt (2):
  vc4: Warn instead of abort()ing on exec ioctl failures.
  vc4: Keep sample mask writes from being reordered after TLB writes

Grazvydas Ignotas (1):
  r600: fix constant buffer size programming

Ian Romanick (1):
  meta/generate_mipmap: Work-around GLES 1.x problem with 
GL_DRAW_FRAMEBUFFER

Ilia Mirkin (9):
  nv50/ir: can't have predication and immediates
  gk104/ir: simplify and fool-proof texbar algorithm
  glsl: assign varying locations to tess shaders when doing SSO
  glx/dri3: a drawable might not be bound at wait time
  nvc0: don't forget to reset VTX_TMP bufctx slot after blit completion
  nv50/ir: float(s32 & 0xff) = float(u8), not s8
  nv50,nvc0: make sure there's pushbuf space and that we ref the bo early
  nv50,nvc0: fix crash when increasing bsp bo size for h264
  nvc0: scale up inter_bo size so that it's 16M for a 4K video

Jonathan Gray (2):
  configure.ac: use pkg-config for libelf
  configure: check for python2.7 for PYTHON2

Kenneth Graunke (5):
  ralloc: Fix ralloc_adopt() to the old context's last child's parent.
  drirc: Disable ARB_blend_func_extended for Heaven 4.0/Valley 1.0.
  glsl: Fix varying struct locations when varying packing is disabled.
  nvc0: Set winding order regardless of domain.
  nir: Add a lower_fdiv option, turn fdiv into fmul/frcp.

Marek Olšák (7):
  tgsi/scan: add flag colors_written
  r600g: write all MRTs only if there is exactly one output (fixes a hang)
  radeonsi: don't call of u_prims_for_vertices for patches and rectangles
  radeonsi: apply the streamout workaround to Fiji as well
  gallium/radeon: fix Hyper-Z hangs by programming PA_SC_MODE_CNTL_1 
correctly
  program: add _mesa_reserve_parameter_storage
  st/mesa: fix GLSL uniform updates for glBitmap & glDrawPixels (v2)

Miklós Máté (1):
  mesa: Don't leak ATIfs instructions in DeleteFragmentShader

Neil Roberts (3):
  i965: Add MESA_FORMAT_B8G8R8X8_SRGB to brw_format_for_mesa_format
  i965: Add B8G8R8X8_SRGB to the alpha format override
  i965: Fix crash when calling glViewport with no surface bound

Nicolai Hähnle (2):
  gallium/radeon: only dispose locally created target machine in 
radeon_llvm_compile
  gallium/radeon: fix regression in a number of driver queries

Oded Gabbay (1):
  configura.ac: fix test for SSE4.1 assembler support

Patrick Rudolph (2):
  nv50,nvc0: fix use-after-free when vertex buffers are unbound
  gallium/util: return correct number of bound vertex buffers

Rob Herring (1):
  freedreno/ir3: fix 32-bit builds with pointer-to-int-cast error enabled

Samuel Pitoiset (3):
  nvc0: free memory allocated by the prog which reads MP perf counters
  nv50,nvc0: free memory allocated by performance metrics
  nv50: free memory allocated by the prog which reads MP perf counters

Sarah Sharp (1):
  mesa: Add KBL PCI IDs and platform information.


git tag: mesa-11.1.1

ftp://ftp.freedesktop.org/pub/mesa/11.1.1/mesa-11.1.1.tar.gz
MD5: f0f6df1bd436fd2ccf2dec9b4d583638  mesa-11.1.1.tar.gz
SHA1: 98351f58e5ba906cb9ed2311c5b07832c756ca22  mesa-11.1.1.tar.gz
SHA256: b15089817540ba0bffd0aad323ecf3a8ff6779568451827c7274890b4a269d58  mesa-11.1.1.tar.gz
PGP: ftp://ftp.freedesktop.org/pub/mesa/11.1.1/mesa-11.1.1.tar.gz.sig

ftp://ftp.freedesktop.org/pub/mesa/11.1.1/mesa-11.1.1.tar.xz
MD5: 1043dfb907beecb2a761272455960427  mesa-11.1.1.tar.xz
SHA1: 77eeb75660e8d0851457151ef18c87540c6fd6bc  mesa-11.1.1.tar.xz
SHA256: 64db074fc514136b5fb3890111f0d50604db52f0b1e94ba3fcb0fe8668a7fd20  mesa-11.1.1.tar.xz
PGP: ftp://ftp.freedesktop.org/pub/mesa/11.1.1/mesa-11.1.1.tar.xz.sig

--
-Emil




Re: [Mesa-dev] [PATCH 7/8] gallium/radeon: implement PIPE_CAP_INVALIDATE_BUFFER

2016-01-13 Thread Marek Olšák
On Wed, Jan 13, 2016 at 11:41 AM, Fredrik Höglund  wrote:
> On Tuesday 12 January 2016, Nicolai Hähnle wrote:
>> On 12.01.2016 13:41, Fredrik Höglund wrote:
>> > On Tuesday 12 January 2016, Nicolai Hähnle wrote:
>> >> From: Nicolai Hähnle 
>> >>
>> >> ---
>> >>   src/gallium/drivers/r600/r600_pipe.c|  2 +-
>> >>   src/gallium/drivers/radeon/r600_buffer_common.c | 23 
>> >> ---
>> >>   src/gallium/drivers/radeon/r600_pipe_common.c   |  1 +
>> >>   src/gallium/drivers/radeon/r600_pipe_common.h   |  3 +++
>> >>   src/gallium/drivers/radeonsi/si_pipe.c  |  2 +-
>> >>   5 files changed, 22 insertions(+), 9 deletions(-)
>> >>
>> >> diff --git a/src/gallium/drivers/r600/r600_pipe.c 
>> >> b/src/gallium/drivers/r600/r600_pipe.c
>> >> index a8805f6..569f77c 100644
>> >> --- a/src/gallium/drivers/r600/r600_pipe.c
>> >> +++ b/src/gallium/drivers/r600/r600_pipe.c
>> >> @@ -278,6 +278,7 @@ static int r600_get_param(struct pipe_screen* 
>> >> pscreen, enum pipe_cap param)
>> >>case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
>> >>case PIPE_CAP_TGSI_TXQS:
>> >>case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS:
>> >> +  case PIPE_CAP_INVALIDATE_BUFFER:
>> >>return 1;
>> >>
>> >>case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
>> >> @@ -355,7 +356,6 @@ static int r600_get_param(struct pipe_screen* 
>> >> pscreen, enum pipe_cap param)
>> >>case PIPE_CAP_TGSI_FS_POSITION_IS_SYSVAL:
>> >>case PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL:
>> >>case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT:
>> >> -  case PIPE_CAP_INVALIDATE_BUFFER:
>> >>return 0;
>> >>
>> >>case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS:
>> >> diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
>> >> b/src/gallium/drivers/radeon/r600_buffer_common.c
>> >> index aeb9a20..09755e0 100644
>> >> --- a/src/gallium/drivers/radeon/r600_buffer_common.c
>> >> +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
>> >> @@ -209,6 +209,21 @@ static void r600_buffer_destroy(struct pipe_screen 
>> >> *screen,
>> >>FREE(rbuffer);
>> >>   }
>> >>
>> >> +void r600_invalidate_resource(struct pipe_context *ctx,
>> >> +struct pipe_resource *resource)
>> >> +{
>> >> +  struct r600_common_context *rctx = (struct r600_common_context*)ctx;
>> >> +struct r600_resource *rbuffer = r600_resource(resource);
>> >> +
>> >> +  /* Check if mapping this buffer would cause waiting for the GPU. */
>> >> +  if (r600_rings_is_buffer_referenced(rctx, rbuffer->buf, 
>> >> RADEON_USAGE_READWRITE) ||
>> >> +  !rctx->ws->buffer_wait(rbuffer->buf, 0, RADEON_USAGE_READWRITE)) {
>> >> +  rctx->invalidate_buffer(&rctx->b, &rbuffer->b.b);
>> >> +  } else {
>> >> +  util_range_set_empty(&rbuffer->valid_buffer_range);
>> >> +  }
>> >
>> > This implementation does not exactly comply with the specification.
>> >
>> > The point of InvalidateBuffer is to tell the driver that it may discard the
>> > contents of the buffer if, for example, the buffer needs to be evicted.
>> >
>> > Calling InvalidateBuffer is not equivalent to calling MapBufferRange
>> > with GL_MAP_INVALIDATE_BUFFER_BIT, since the former should invalidate
>> > the buffer regardless of whether it is busy or not.
>>
>> Can you back this with a quote from the spec? Given that no-op seems to
>> be a correct implementation of InvalidateBuffer, I find what you write
>> rather hard to believe.
>
> The overview says:
>
> "GL implementations often include several memory spaces, each with
>  distinct performance characteristics, and the implementations
>  transparently move allocations between memory spaces. With this
>  extension, an application can tell the GL that the contents of a
>  texture or buffer are no longer needed, and the implementation can
>  avoid transferring the data unnecessarily."
>
> This to me makes the intent pretty clear.  The implementation is of
> course free to do what it wants with this information, including nothing
> at all.  My objection here is that your implementation only helps
> applications that are using the extension incorrectly.  But it is still an
> improvement over doing nothing at all.

I wouldn't worry about the spec overview too much. It's just a
motivating introduction to the spec.

However, immediately before InvalidateBufferData, there is this sentence:

"After this command, data in the specified range have undefined values."

That's a very clear definition of the behavior, and this patch seems
to do the right thing.
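
To make the two paths in the patch concrete, here is a minimal sketch using a mock buffer. The names `mock_buffer`, `mock_invalidate` and all of the fields are hypothetical and are not Mesa API; the sketch only assumes that "busy" means a later map would stall, and that an empty valid range is represented as start == end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver buffer; not a Mesa type. */
struct mock_buffer {
    bool busy;            /* GPU still references the current storage */
    int storage_id;       /* identifies the backing allocation */
    unsigned valid_start; /* valid data lives in [valid_start, valid_end) */
    unsigned valid_end;
};

static int next_storage_id = 100;

/* After this call the buffer contents are undefined. */
static void mock_invalidate(struct mock_buffer *buf)
{
    assert(buf != NULL);
    if (buf->busy) {
        /* Busy: swap in fresh storage so a later map does not wait. */
        buf->storage_id = next_storage_id++;
        buf->busy = false;
    }
    /* In both cases no byte of the buffer holds defined data any more. */
    buf->valid_start = 0;
    buf->valid_end = 0;
}
```

Either branch leaves the data undefined, which is all the quoted sentence requires; the busy check only decides whether new storage is worth allocating.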

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v3] glapi: Build glapi_gentable.c only on Darwin

2016-01-13 Thread Andreas Boll
Removes the public symbol _glapi_create_table_from_handle from
libGL.so.1 on all platforms except Darwin.

Since the symbol is not used on other platforms it makes sense to
build glapi_gentable.c only on Darwin.

A little bit of history:

_glapi_create_table_from_handle was introduced in

commit 85937f4c0d4a78d3a11e3c1fa6148640f2a9ad7b
Author: Jeremy Huddleston 
Date:   Thu Jun 9 16:59:49 2011 -0700

glapi: Add API that can create a _glapi_table from a dlfcn handle

Example usage:

void *handle = dlopen(opengl_library_path, RTLD_LOCAL);
struct _glapi_table *disp = _glapi_create_table_from_handle(handle,
"gl");

Signed-off-by: Jeremy Huddleston 

and the only user in mesa was added in

commit f35913b96e743c5014e99220b1a1c5532a894d69
Author: Jeremy Huddleston 
Date:   Thu Jun 9 17:29:51 2011 -0700

apple: Use _glapi_create_table_from_handle to initialize our
dispatch table

Signed-off-by: Jeremy Huddleston 

gl_gentable.py was also used for XQuartz in xserver 1.11 - 1.14.

v2: Fix typos in commit message
Add missing XORG_GLAPI_OUTPUTS += \ into src/mapi/glapi/gen/Makefile.am
Add glapi_gentable.c to EXTRA_DIST for inclusion in the release
tarball

v3: Fix commit message: s/gl_gentable.c/glapi_gentable.c/

Cc: Jeremy Huddleston 
Signed-off-by: Andreas Boll 
---
 src/mapi/Makefile.am   |  6 +-
 src/mapi/glapi/gen/Makefile.am | 14 +++---
 src/mapi/glapi/glapi.h |  2 ++
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/src/mapi/Makefile.am b/src/mapi/Makefile.am
index 307e05d..ddd3daa 100644
--- a/src/mapi/Makefile.am
+++ b/src/mapi/Makefile.am
@@ -106,12 +106,16 @@ if HAVE_SPARC_ASM
 GLAPI_ASM_SOURCES = glapi/glapi_sparc.S
 endif
 
-glapi_libglapi_la_SOURCES = glapi/glapi_gentable.c
+glapi_libglapi_la_SOURCES =
 glapi_libglapi_la_CPPFLAGS = \
$(AM_CPPFLAGS) \
-I$(top_srcdir)/src/mapi/glapi \
-I$(top_srcdir)/src/mesa
 
+if HAVE_APPLEDRI
+glapi_libglapi_la_SOURCES += glapi/glapi_gentable.c
+endif
+
 if HAVE_SHARED_GLAPI
 glapi_libglapi_la_SOURCES += $(MAPI_BRIDGE_FILES) glapi/glapi_mapi_tmp.h
 glapi_libglapi_la_CPPFLAGS += \
diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index 900b61a..3f3e0b9 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -27,8 +27,11 @@ MESA_GLAPI_OUTPUTS = \
$(MESA_GLAPI_DIR)/glapi_mapi_tmp.h \
$(MESA_GLAPI_DIR)/glprocs.h \
$(MESA_GLAPI_DIR)/glapitemp.h \
-   $(MESA_GLAPI_DIR)/glapitable.h \
-   $(MESA_GLAPI_DIR)/glapi_gentable.c
+   $(MESA_GLAPI_DIR)/glapitable.h
+
+if HAVE_APPLEDRI
+MESA_GLAPI_OUTPUTS += $(MESA_GLAPI_DIR)/glapi_gentable.c
+endif
 
 MESA_GLAPI_ASM_OUTPUTS =
 if HAVE_X86_ASM
@@ -57,6 +60,7 @@ BUILT_SOURCES = \
$(MESA_GLX_DIR)/indirect_size.c
 EXTRA_DIST= \
$(BUILT_SOURCES) \
+   $(MESA_GLAPI_DIR)/glapi_gentable.c \
$(MESA_GLAPI_DIR)/glapi_x86.S \
$(MESA_GLAPI_DIR)/glapi_x86-64.S \
$(MESA_GLAPI_DIR)/glapi_sparc.S \
@@ -88,8 +92,12 @@ XORG_GLAPI_DIR = $(XORG_BASE)/glx
 XORG_GLAPI_OUTPUTS = \
$(XORG_GLAPI_DIR)/glprocs.h \
$(XORG_GLAPI_DIR)/glapitable.h \
-   $(XORG_GLAPI_DIR)/dispatch.h \
+   $(XORG_GLAPI_DIR)/dispatch.h
+
+if HAVE_APPLEDRI
+XORG_GLAPI_OUTPUTS += \
$(XORG_GLAPI_DIR)/glapi_gentable.c
+endif
 
 XORG_OUTPUTS = \
$(XORG_GLAPI_OUTPUTS) \
diff --git a/src/mapi/glapi/glapi.h b/src/mapi/glapi/glapi.h
index f269b17..3593c88 100644
--- a/src/mapi/glapi/glapi.h
+++ b/src/mapi/glapi/glapi.h
@@ -158,8 +158,10 @@ _GLAPI_EXPORT const char *
 _glapi_get_proc_name(unsigned int offset);
 
 
+#ifdef GLX_USE_APPLEGL
 _GLAPI_EXPORT struct _glapi_table *
 _glapi_create_table_from_handle(void *handle, const char *symbol_prefix);
+#endif
 
 
 _GLAPI_EXPORT void
-- 
2.1.4



Re: [Mesa-dev] [android-x86-devel] Re: need-help: how to change to newest mesa in android-x86?

2016-01-13 Thread Rob Clark
On Wed, Jan 13, 2016 at 12:54 PM, Rob Herring  wrote:
> On Tue, Jan 12, 2016 at 8:06 PM, Chih-Wei Huang  
> wrote:
>> 2016-01-13 6:29 GMT+08:00 Rob Herring :
>>> On Tue, Jan 12, 2016 at 7:05 AM, Chih-Wei Huang  
>>> wrote:
>>>> 2016-01-12 19:55 GMT+08:00 陈渝 :
>>>>> hi, Rob, Dave, Zhiwei:
>>>>>  Thank you all!
>>>>>
>>>>> Next I need to update other parts should be in the user level.
>>>>>
>>>>> I need to update drm_gralloc? Do I need to update drm_hwcomposer or 
>>>>> libdrm?
>>>>> Are there any other things I need notice?
>>>>
>>>> libdrm in marshmallow-x86 is 2.4.66 which is newer enough
>>>> to support it, I think.
>>>>
>>>> drm_hwcomposer is not been used in marshmallow-x86 yet.
>>>> So don't worry about it.
>>>>
>>>> The keypoint is to implement the gralloc_drm_virgil3d.c.
>>>> You may look at other gralloc_drm_*.c as examples.
>>>
>>> Nope, virgl is a pipe driver and support is already there for the most part.
>>
>> Rob, thank you very much for the input.
>> It's the first time I heard the usage of
>> AOSP's drm_gralloc & drm_hwcomposer from others.
>> Great!
>>
>> Indeed we have tried to enable AOSP's drm_gralloc & drm_hwcomposer
>> in the beginning of marshmallow-x86 porting but failed.
>
> How far did you get?
>
>> So we keep to use our implementation.
>> (AOSP's drm_gralloc was forked from our lollipop-x86 branch
>> since about Jan 2015)
>>
>> I'm excited to know you have succeeded to use them.
>> Could you please guide us how to enable them correctly?
>
> Well, things are not completely working. I've got 1 device config
> which can build for x86 or arm64 (any arch in theory) and runs on x86
> KVM, Dragonboard 410c or arm64 QEMU. x86 KVM seems to work the best. I
> can boot and navigate around a bit until it dies. There's at least 2
> problems.
>
> After a little bit of navigating around I get errors like this from
> virglrenderer:
> vrend_set_single_sampler_view: context error reported 6
> "ndroid.contacts" Illegal handle 112
> vrend_set_single_sampler_view: context error reported 6
> "ndroid.contacts" Illegal handle 113
> vrend_set_single_sampler_view: context error reported 6
> "ndroid.contacts" Illegal handle 114
> vrend_set_single_sampler_view: context error reported 6
> "ndroid.contacts" Illegal handle 115
> vrend_set_single_sampler_view: context error reported 6
> "ndroid.contacts" Illegal handle 116
> vrend_set_single_sampler_view: context error reported 6
> "ndroid.contacts" Illegal handle 117
> vrend_set_single_sampler_view: context error reported 6
> "ndroid.contacts" Illegal handle 118
> vrend_set_framebuffer_state: context error reported 5
> "ndroid.systemui" Illegal surface 63
>
> Usually the screen gets flipped and drawn in about 1/4 of the original
> screen size after these errors. I can capture a screenshot if
> interested.

I wonder a bit if refcnt'ing imbalance or something like that..
accidentally free'ing the last reference to buffer, and then numeric
handle getting re-used on a different unrelated buffer (for example)
can cause all sorts of fun.
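
A toy sketch of that failure mode, assuming a small handle table where freed slots are recycled. Everything here (`mock_object`, `handle_create`, `handle_unref`, `handle_owner`) is hypothetical and unrelated to the real virglrenderer code:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLES 4

/* Hypothetical object table; a refcount of 0 means the slot is free. */
struct mock_object {
    const char *owner;
    int refcount;
};

static struct mock_object table[MAX_HANDLES];

/* Hand out the lowest free slot as a 1-based handle, or 0 if full. */
static int handle_create(const char *owner)
{
    for (int i = 0; i < MAX_HANDLES; i++) {
        if (table[i].refcount == 0) {
            table[i].owner = owner;
            table[i].refcount = 1;
            return i + 1;
        }
    }
    return 0;
}

/* Drop one reference; the slot becomes reusable when the count hits 0. */
static void handle_unref(int handle)
{
    assert(handle >= 1 && handle <= MAX_HANDLES);
    struct mock_object *obj = &table[handle - 1];
    if (obj->refcount > 0 && --obj->refcount == 0)
        obj->owner = NULL;
}

/* Look up who currently owns the slot behind a handle. */
static const char *handle_owner(int handle)
{
    assert(handle >= 1 && handle <= MAX_HANDLES);
    return table[handle - 1].owner;
}
```

If one user drops a reference that another user still relies on, the slot is freed, the next `handle_create()` returns the same number, and the stale handle silently resolves to an unrelated object.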

> The 2nd problem is the screen fade to black shader program crashes on
> linking. Seems to have a NULL function name from the stack trace, but
> I've not debugged it further. This triggers whenever the screen off
> timeout triggers.

iirc, android-x86 had something to comment out this shader.  I do
remember it causing a segfault in mesa in the shader compiler.  I did
attempt to reproduce this w/ same shader in a test program (where I
could debug w/ sane gdb env, no java, etc), but no luck.

(If anyone knows how to apitrace android "java stuff".. that might be
a way to get something I could debug.)

I can't find the link to the android-x86 patch anymore, since the git
servers moved (and not even sure if that still applies to more recent
android)

BR,
-R

> Freedreno seems to have some additional problem I haven't fully characterized.
>
>> Any necessary changes?
>
> Yes, my changes are all pushed into my github acct[1]. Instructions
> are here[2]. The changes are largely build fixes, virtio-gpu support,
> freedreno/dmabuf support from Rob Clark's tree, and hacks around
> issues I've found.
>
>> Especially, is the vanilla kernel 4.4 ready to use them?
>
> What I'm using is pretty close to stock 4.4[3]. There's a couple of
> Android patches and some virtio-gpu related changes. The virtio-gpu
> changes are mainly adding atomic support for virtio-gpu. There's some
> others for fb mmap and panning support as well.
>
> Rob
>
> [1] https://github.com/robherring?tab=repositories
> [2] 
> https://github.com/robherring/generic_device/wiki/Android-with-DRM-mesa-graphics
> [3] 
> https://git.linaro.org/people/rob.herring/linux.git/shortlog/refs/heads/android-4.4


Re: [Mesa-dev] [PATCH 7/8] gallium/radeon: implement PIPE_CAP_INVALIDATE_BUFFER

2016-01-13 Thread Nicolai Hähnle

On 13.01.2016 05:41, Fredrik Höglund wrote:

On Tuesday 12 January 2016, Nicolai Hähnle wrote:

On 12.01.2016 13:41, Fredrik Höglund wrote:

On Tuesday 12 January 2016, Nicolai Hähnle wrote:

From: Nicolai Hähnle 

---
   src/gallium/drivers/r600/r600_pipe.c|  2 +-
   src/gallium/drivers/radeon/r600_buffer_common.c | 23 ---
   src/gallium/drivers/radeon/r600_pipe_common.c   |  1 +
   src/gallium/drivers/radeon/r600_pipe_common.h   |  3 +++
   src/gallium/drivers/radeonsi/si_pipe.c  |  2 +-
   5 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index a8805f6..569f77c 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -278,6 +278,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
case PIPE_CAP_TGSI_TXQS:
case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS:
+   case PIPE_CAP_INVALIDATE_BUFFER:
return 1;

case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
@@ -355,7 +356,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
case PIPE_CAP_TGSI_FS_POSITION_IS_SYSVAL:
case PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL:
case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT:
-   case PIPE_CAP_INVALIDATE_BUFFER:
return 0;

case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS:
diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
b/src/gallium/drivers/radeon/r600_buffer_common.c
index aeb9a20..09755e0 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -209,6 +209,21 @@ static void r600_buffer_destroy(struct pipe_screen *screen,
FREE(rbuffer);
   }

+void r600_invalidate_resource(struct pipe_context *ctx,
+ struct pipe_resource *resource)
+{
+   struct r600_common_context *rctx = (struct r600_common_context*)ctx;
+struct r600_resource *rbuffer = r600_resource(resource);
+
+   /* Check if mapping this buffer would cause waiting for the GPU. */
+   if (r600_rings_is_buffer_referenced(rctx, rbuffer->buf, 
RADEON_USAGE_READWRITE) ||
+   !rctx->ws->buffer_wait(rbuffer->buf, 0, RADEON_USAGE_READWRITE)) {
+   rctx->invalidate_buffer(&rctx->b, &rbuffer->b.b);
+   } else {
+   util_range_set_empty(&rbuffer->valid_buffer_range);
+   }


This implementation does not exactly comply with the specification.

The point of InvalidateBuffer is to tell the driver that it may discard the
contents of the buffer if, for example, the buffer needs to be evicted.

Calling InvalidateBuffer is not equivalent to calling MapBufferRange
with GL_MAP_INVALIDATE_BUFFER_BIT, since the former should invalidate
the buffer regardless of whether it is busy or not.


Can you back this with a quote from the spec? Given that no-op seems to
be a correct implementation of InvalidateBuffer, I find what you write
rather hard to believe.


The overview says:

"GL implementations often include several memory spaces, each with
 distinct performance characteristics, and the implementations
 transparently move allocations between memory spaces. With this
 extension, an application can tell the GL that the contents of a
 texture or buffer are no longer needed, and the implementation can
 avoid transferring the data unnecessarily."

This to me makes the intent pretty clear.  The implementation is of
course free to do what it wants with this information, including nothing
at all.  My objection here is that your implementation only helps
applications that are using the extension incorrectly.  But it is still an
improvement over doing nothing at all.


This implementation helps applications that use glInvalidateBufferData 
to invalidate a buffer that they use in a streaming fashion. It seems to 
me that that is a correct use.


Perhaps you could give an example of what you think a correct use is, 
and how it isn't helped by this patch?


Thanks,
Nicolai




Part of the problem may be that the spec talks about "invalidating"
without - as far as I can tell - ever defining what that means. In any
case, I see no reason why the behavior should be different from
GL_MAP_INVALIDATE_BUFFER_BIT.

Thanks,
Nicolai




+}
+
   static void *r600_buffer_get_transfer(struct pipe_context *ctx,
  struct pipe_resource *resource,
 unsigned level,
@@ -276,13 +291,7 @@ static void *r600_buffer_transfer_map(struct pipe_context 
*ctx,
!(usage & PIPE_TRANSFER_UNSYNCHRONIZED)) {
assert(usage & PIPE_TRANSFER_WRITE);

-   /* Check if mapping this buffer would cause waiting for the 
GPU. */
-   if 

[Mesa-dev] [PATCH] radeonsi: don't miss changes to SPI_TMPRING_SIZE

2016-01-13 Thread Marek Olšák
From: Marek Olšák 

I'm not sure about the consequences of this bug, but it's definitely
dangerous.

This applies to SI, CIK, VI.

Cc: 11.0 11.1 
---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 35b226f..8ff70b4 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1317,6 +1317,7 @@ static bool si_update_spi_tmpring_size(struct si_context 
*sctx)
si_get_max_scratch_bytes_per_wave(sctx);
unsigned scratch_needed_size = scratch_bytes_per_wave *
sctx->scratch_waves;
+   unsigned spi_tmpring_size;
int r;
 
if (scratch_needed_size > 0) {
@@ -1386,8 +1387,12 @@ static bool si_update_spi_tmpring_size(struct si_context 
*sctx)
assert((scratch_needed_size & ~0x3FF) == scratch_needed_size &&
"scratch size should already be aligned correctly.");
 
-   sctx->spi_tmpring_size = S_0286E8_WAVES(sctx->scratch_waves) |
-   S_0286E8_WAVESIZE(scratch_bytes_per_wave >> 10);
+   spi_tmpring_size = S_0286E8_WAVES(sctx->scratch_waves) |
+  S_0286E8_WAVESIZE(scratch_bytes_per_wave >> 10);
+   if (spi_tmpring_size != sctx->spi_tmpring_size) {
+   sctx->spi_tmpring_size = spi_tmpring_size;
+   sctx->emit_scratch_reloc = true;
+   }
return true;
 }
 
-- 
2.1.4
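
The pattern in the patch above, reduced to a minimal sketch: compute the new register value, compare it with the cached copy, and flag a re-emit only when it differs. The type `mock_ctx` and the function `mock_update_reg` are hypothetical names for illustration; they are not radeonsi code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context holding a cached register value. */
struct mock_ctx {
    uint32_t reg_shadow; /* last value stored for the register */
    bool needs_emit;     /* set when the register must be re-emitted */
};

/* Store the new value and request a re-emit only if it changed. */
static void mock_update_reg(struct mock_ctx *ctx, uint32_t new_value)
{
    assert(ctx != NULL);
    if (new_value != ctx->reg_shadow) {
        ctx->reg_shadow = new_value;
        ctx->needs_emit = true;
    }
}
```

Without the comparison, a changed value could be stored in the shadow copy without anything telling the emit path to write it out again.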



Re: [Mesa-dev] [PATCH] radeonsi: don't miss changes to SPI_TMPRING_SIZE

2016-01-13 Thread Nicolai Hähnle

Good catch.

Reviewed-by: Nicolai Hähnle 

On 13.01.2016 13:32, Marek Olšák wrote:

From: Marek Olšák 

I'm not sure about the consequences of this bug, but it's definitely
dangerous.

This applies to SI, CIK, VI.

Cc: 11.0 11.1 
---
  src/gallium/drivers/radeonsi/si_state_shaders.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 35b226f..8ff70b4 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1317,6 +1317,7 @@ static bool si_update_spi_tmpring_size(struct si_context 
*sctx)
si_get_max_scratch_bytes_per_wave(sctx);
unsigned scratch_needed_size = scratch_bytes_per_wave *
sctx->scratch_waves;
+   unsigned spi_tmpring_size;
int r;

if (scratch_needed_size > 0) {
@@ -1386,8 +1387,12 @@ static bool si_update_spi_tmpring_size(struct si_context 
*sctx)
assert((scratch_needed_size & ~0x3FF) == scratch_needed_size &&
"scratch size should already be aligned correctly.");

-   sctx->spi_tmpring_size = S_0286E8_WAVES(sctx->scratch_waves) |
-   S_0286E8_WAVESIZE(scratch_bytes_per_wave >> 10);
+   spi_tmpring_size = S_0286E8_WAVES(sctx->scratch_waves) |
+  S_0286E8_WAVESIZE(scratch_bytes_per_wave >> 10);
+   if (spi_tmpring_size != sctx->spi_tmpring_size) {
+   sctx->spi_tmpring_size = spi_tmpring_size;
+   sctx->emit_scratch_reloc = true;
+   }
return true;
  }





Re: [Mesa-dev] NIR, SCons, and Gallium

2016-01-13 Thread Jose Fonseca

On 11/01/16 14:21, Jose Fonseca wrote:

FWIW, I updated SCons to build NIR, both with GCC and MSVC:

   http://cgit.freedesktop.org/~jrfonseca/mesa/log/?h=scons-nir

It was actually simpler than I anticipated.

But I hit a wall -- there's actually no way to get NIR used with
softpipe/llvmpipe, not even as an intermediate IR somewhere between GLSL
IR and TGSI, is there?

Without this I can't actually test it.  And I'm afraid the scons
integration will rot again unless it is used.


I know other gallium drivers already use NIR, but IIUC, they use NIR
internally, ie., TGSI -> NIR-> HW.


So what is exactly the long term plan for NIR in Mesa general, and
Gallium in particular?
- replace GLSL IR completely?
- use NIR as intermediate IR between GLSL IR and TGSI, and run
optimizations in there?
- use NIR instead of TGSI at the gallium interface?
- be only used internally by drivers?
- something else?


Jose



Thanks for all the replies.


So IIUC, there's a NIR -> TGSI pass in progress; it's not ready for 
production, but there are several parties interested in having it as an option.



It's still not crystal clear to me whether building NIR with SCons and 
MSVC will:


- accelerate synergy (e.g. make it easier to use more NIR code in more 
places without risking build failures due to missing headers/symbols)


- or cause more trouble (i.e. make MSVC builds fail even more often)

I don't think there's any way to find out other than trying it.  So I'm going 
to polish my patches, post them for review and get them committed.


And if it turns out that keeping NIR in a buildable state with MSVC ends 
up causing more problems for everybody than it solves, we can take a 
step back then (e.g, add a switch to not build NIR on MSVC, and set it 
off by default.)


Jose


[Mesa-dev] [PATCH 1/2] nir: Handle <bits>=32 case in bitfield_insert lowering.

2016-01-13 Thread Matt Turner
The OpenGL specification for bitfieldInsert() says:

   The result will be undefined if <offset> or <bits> is negative, or if
   the sum of <offset> and <bits> is greater than the number of bits
   used to store the operand.

Therefore passing bits=32, offset=0 is legal and defined in GLSL.

But the earlier SM5 bfi opcode is specified to accept a bitfield width
ranging from 0-31. As such, Intel and AMD instructions read only the low
5 bits of the width operand, making them not able to implement the
GLSL-specified behavior directly.

This commit fixes the lowering of bitfield_insert to handle the trivial
case of <bits> = 32 as

   bitfieldInsert:
  bits > 31 ? insert : bfi(bfm(bits, offset), insert, base)
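
As a sanity check of that formula, here is a small C model. It is only a sketch: `hw_bfm`, `hw_bfi` and `lowered_bitfield_insert` are illustrative names, and masking the offset to 5 bits is an assumption made here for symmetry (the text above only states it for the width operand).

```c
#include <assert.h>
#include <stdint.h>

/* bfm modelled with only the low 5 bits of each operand being read. */
static uint32_t hw_bfm(uint32_t bits, uint32_t offset)
{
    return ((1u << (bits & 31)) - 1u) << (offset & 31);
}

/* bfi: align insert with the lowest set mask bit, then merge under mask. */
static uint32_t hw_bfi(uint32_t mask, uint32_t insert, uint32_t base)
{
    if (mask == 0)
        return base;
    uint32_t tmp = mask;
    while (!(tmp & 1)) {
        tmp >>= 1;
        insert <<= 1;
    }
    return (base & ~mask) | (insert & mask);
}

/* The lowering from this patch: bits == 32 bypasses bfm/bfi entirely. */
static uint32_t lowered_bitfield_insert(uint32_t base, uint32_t insert,
                                        int offset, int bits)
{
    assert(offset >= 0 && bits >= 0 && offset + bits <= 32);
    return bits > 31 ? insert
                     : hw_bfi(hw_bfm((uint32_t)bits, (uint32_t)offset),
                              insert, base);
}
```

In this model `hw_bfm(32, 0)` yields an empty mask, so the unguarded `bfi` would hand back `base` unchanged; the `bits > 31` select is what makes the full-width insert return `insert`.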

Fixes:
   ES31-CTS.shader_bitfield_operation.bitfieldInsert.uint_2
   ES31-CTS.shader_bitfield_operation.bitfieldInsert.uvec4_3
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
---
These two patches replace 8/9 and 9/9 of the previous series.
The first 7 patches from it have been reviewed and committed.

 src/glsl/nir/nir_opcodes.py   | 1 +
 src/glsl/nir/nir_opt_algebraic.py | 6 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py
index 1c65def..3e43438 100644
--- a/src/glsl/nir/nir_opcodes.py
+++ b/src/glsl/nir/nir_opcodes.py
@@ -558,6 +558,7 @@ triop("fcsel", tfloat, "(src0 != 0.0f) ? src1 : src2")
 opcode("bcsel", 0, tuint, [0, 0, 0],
   [tbool, tuint, tuint], "", "src0 ? src1 : src2")
 
+# SM5 bfi assembly
 triop("bfi", tuint, """
 unsigned mask = src0, insert = src1, base = src2;
 if (mask == 0) {
diff --git a/src/glsl/nir/nir_opt_algebraic.py 
b/src/glsl/nir/nir_opt_algebraic.py
index 1eb044a..0d31e39 100644
--- a/src/glsl/nir/nir_opt_algebraic.py
+++ b/src/glsl/nir/nir_opt_algebraic.py
@@ -225,9 +225,13 @@ optimizations = [
 
# Misc. lowering
(('fmod', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b)))), 'options->lower_fmod'),
-   (('bitfield_insert', a, b, c, d), ('bfi', ('bfm', d, c), b, a), 
'options->lower_bitfield_insert'),
(('uadd_carry', a, b), ('b2i', ('ult', ('iadd', a, b), a)), 
'options->lower_uadd_carry'),
(('usub_borrow', a, b), ('b2i', ('ult', a, b)), 
'options->lower_usub_borrow'),
+
+   (('bitfield_insert', 'base', 'insert', 'offset', 'bits'),
+('bcsel', ('ilt', 31, 'bits'), 'insert',
+  ('bfi', ('bfm', 'bits', 'offset'), 'insert', 'base')),
+'options->lower_bitfield_insert'),
 ]
 
 # Add optimizations to handle the case where the result of a ternary is
-- 
2.4.9



[Mesa-dev] [PATCH 2/2] nir: Lower bitfield_extract.

2016-01-13 Thread Matt Turner
The OpenGL specification for bitfieldExtract() says:

   The result will be undefined if <offset> or <bits> is negative, or if
   the sum of <offset> and <bits> is greater than the number of bits
   used to store the operand.

Therefore passing bits=32, offset=0 is legal and defined in GLSL.

But the earlier SM5 ubfe/ibfe opcodes are specified to accept a bitfield width
ranging from 0-31. As such, Intel and AMD instructions read only the low 5 bits
of the width operand, making them not able to implement the GLSL-specified
behavior directly.

This commit adds ubfe/ibfe operations from SM5 and a lowering pass for
bitfield_extract to handle the trivial case of <bits> = 32 as

   bitfieldExtract:
  bits > 31 ? value : bfe(value, offset, bits)
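
The unsigned case can be modelled the same way. This is a sketch under the assumption that the hardware sees only the low 5 bits of both operands; `hw_ubfe` and `lowered_ubitfield_extract` are illustrative names, not NIR or driver functions.

```c
#include <assert.h>
#include <stdint.h>

/* ubfe modelled with 5-bit offset and width operands. */
static uint32_t hw_ubfe(uint32_t value, int offset, int bits)
{
    uint32_t o = (uint32_t)offset & 31;
    uint32_t b = (uint32_t)bits & 31;
    if (b == 0)
        return 0;
    if (o + b < 32)
        return (value << (32 - b - o)) >> (32 - b);
    return value >> o;
}

/* The lowering from this patch: bits == 32 returns the value as-is. */
static uint32_t lowered_ubitfield_extract(uint32_t value, int offset, int bits)
{
    assert(offset >= 0 && bits >= 0 && offset + bits <= 32);
    return bits > 31 ? value : hw_ubfe(value, offset, bits);
}
```

With a width of 32 the modelled instruction sees a width of 0 and returns 0, which is why the select on `bits > 31` is needed to get the whole value back.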

Fixes:
   ES31-CTS.shader_bitfield_operation.bitfieldExtract.uvec3_0
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
---
 src/glsl/nir/nir.h |  1 +
 src/glsl/nir/nir_opcodes.py| 31 ++
 src/glsl/nir/nir_opt_algebraic.py  | 10 ++
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   |  3 +++
 src/mesa/drivers/dri/i965/brw_shader.cpp   |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp |  3 +++
 6 files changed, 49 insertions(+)

diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 23aec69..11add65 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1447,6 +1447,7 @@ typedef struct nir_shader_compiler_options {
bool lower_fsat;
bool lower_fsqrt;
bool lower_fmod;
+   bool lower_bitfield_extract;
bool lower_bitfield_insert;
bool lower_uadd_carry;
bool lower_usub_borrow;
diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py
index 3e43438..e79810c 100644
--- a/src/glsl/nir/nir_opcodes.py
+++ b/src/glsl/nir/nir_opcodes.py
@@ -573,6 +573,37 @@ if (mask == 0) {
 }
 """)
 
+# SM5 ubfe/ibfe assembly
+opcode("ubfe", 0, tuint,
+   [0, 0, 0], [tuint, tint, tint], "", """
+unsigned base = src0;
+int offset = src1, bits = src2;
+if (bits == 0) {
+   dst = 0;
+} else if (bits < 0 || offset < 0) {
+   dst = 0; /* undefined */
+} else if (offset + bits < 32) {
+   dst = (base << (32 - bits - offset)) >> (32 - bits);
+} else {
+   dst = base >> offset;
+}
+""")
+opcode("ibfe", 0, tint,
+   [0, 0, 0], [tint, tint, tint], "", """
+int base = src0;
+int offset = src1, bits = src2;
+if (bits == 0) {
+   dst = 0;
+} else if (bits < 0 || offset < 0) {
+   dst = 0; /* undefined */
+} else if (offset + bits < 32) {
+   dst = (base << (32 - bits - offset)) >> (32 - bits);
+} else {
+   dst = base >> offset;
+}
+""")
+
+# GLSL bitfieldExtract()
 opcode("ubitfield_extract", 0, tuint,
[0, 0, 0], [tuint, tint, tint], "", """
 unsigned base = src0;
diff --git a/src/glsl/nir/nir_opt_algebraic.py 
b/src/glsl/nir/nir_opt_algebraic.py
index 0d31e39..7745b76 100644
--- a/src/glsl/nir/nir_opt_algebraic.py
+++ b/src/glsl/nir/nir_opt_algebraic.py
@@ -232,6 +232,16 @@ optimizations = [
 ('bcsel', ('ilt', 31, 'bits'), 'insert',
   ('bfi', ('bfm', 'bits', 'offset'), 'insert', 'base')),
 'options->lower_bitfield_insert'),
+
+   (('ibitfield_extract', 'value', 'offset', 'bits'),
+('bcsel', ('ilt', 31, 'bits'), 'value',
+  ('ibfe', 'value', 'offset', 'bits')),
+'options->lower_bitfield_extract'),
+
+   (('ubitfield_extract', 'value', 'offset', 'bits'),
+('bcsel', ('ult', 31, 'bits'), 'value',
+  ('ubfe', 'value', 'offset', 'bits')),
+'options->lower_bitfield_extract'),
 ]
 
 # Add optimizations to handle the case where the result of a ternary is
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 8740925..d7bcc1c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -1027,6 +1027,9 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
 
case nir_op_ubitfield_extract:
case nir_op_ibitfield_extract:
+  unreachable("should have been lowered");
+   case nir_op_ubfe:
+   case nir_op_ibfe:
   bld.BFE(result, op[2], op[1], op[0]);
   break;
case nir_op_bfm:
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 0ac3f4a..3a69c23 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -106,6 +106,7 @@ brw_compiler_create(void *mem_ctx, const struct 
brw_device_info *devinfo)
nir_options->lower_fdiv = true;
nir_options->lower_scmp = true;
nir_options->lower_fmod = true;
+   nir_options->lower_bitfield_extract = true;
nir_options->lower_bitfield_insert = true;
nir_options->lower_uadd_carry = true;
nir_options->lower_usub_borrow = true;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index ecca166..0ae723f 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1385,6 +1385,9 

Re: [Mesa-dev] [PATCH v3] glapi: Build glapi_gentable.c only on Darwin

2016-01-13 Thread Matt Turner
glxgears still works for me, and libGL goes from 4.2M to 3.3M.

Reviewed-by: Matt Turner 

We should also include some mention of Arlie's contribution, since he
identified this and sent the initial patch:

Reported-by: Arlie Davis 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92850] Segfault loading War Thunder

2016-01-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92850

--- Comment #57 from Ernst Sjöstrand  ---
With current git I get a crash like this:

0x726ff349 in glsl_to_tgsi_visitor::visit (this=0x7fffa2725600,
ir=0x7fffa271baf8) at state_tracker/st_glsl_to_tgsi.cpp:3161
3161   const glsl_type *sampler_type = ir->sampler->type;


(gdb) bt full
#0  0x726ff349 in glsl_to_tgsi_visitor::visit (this=0x7fffa2725600,
ir=0x7fffa271baf8) at state_tracker/st_glsl_to_tgsi.cpp:3161
result_src = {file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle
= 0, negate = 0, type = 13, reladdr = 0x0, reladdr2 = ,
has_index2 = , 
  double_reg2 = , array_id = ,
is_double_vertex_input = }
coord = {file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle = 0,
negate = 0, type = 13, reladdr = 0x0, reladdr2 = , has_index2 =
, 
  double_reg2 = , array_id = ,
is_double_vertex_input = }
cube_sc = {file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle =
0, negate = 0, type = 13, reladdr = 0x0, reladdr2 = , has_index2
= , 
  double_reg2 = , array_id = ,
is_double_vertex_input = }
lod_info = {file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle =
0, negate = 0, type = 13, reladdr = 0x0, reladdr2 = , has_index2
= , 
  double_reg2 = , array_id = ,
is_double_vertex_input = }
projector = {file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle
= 0, negate = 0, type = 13, reladdr = 0x0, reladdr2 = ,
has_index2 = , 
  double_reg2 = , array_id = ,
is_double_vertex_input = }
dx = {file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle = 0,
negate = 0, type = 13, reladdr = 0x0, reladdr2 = , has_index2 =
, 
  double_reg2 = , array_id = ,
is_double_vertex_input = }
dy = {file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle = 0,
negate = 0, type = 13, reladdr = 0x0, reladdr2 = , has_index2 =
, 
  double_reg2 = , array_id = ,
is_double_vertex_input = }
offset = {{file = PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle =
0, negate = 0, type = 13, reladdr = 0x0, reladdr2 = 0x0, has_index2 = false,
double_reg2 = false, 
array_id = 0, is_double_vertex_input = false}, {file =
PROGRAM_UNDEFINED, index = 0, index2D = 0, swizzle = 0, negate = 0, type = 13,
reladdr = 0x0, reladdr2 = 0x0, 
has_index2 = false, double_reg2 = false, array_id = 0,
is_double_vertex_input = false}, {file = PROGRAM_UNDEFINED, index = 0, index2D
= 0, swizzle = 0, negate = 0, 
type = 13, reladdr = 0x0, reladdr2 = 0x0, has_index2 = false,
double_reg2 = false, array_id = 0, is_double_vertex_input = false}, {file =
4067491840, index = 0, 
index2D = 0, swizzle = 89, negate = 12, type = 6, reladdr =
0x4000, reladdr2 = 0x0, has_index2 = false, double_reg2 = false, array_id =
0, 
is_double_vertex_input = false}}
sample_index = 
component = 
levels_src = 
result_dst = 
coord_dst = 
cube_sc_dst = 
inst = 
opcode = 
sampler_type = 
sampler_index = 
is_cube_array = 
i = 


(gdb) p *(ir->sampler)
$8 = { = { = { = {next = 0x0, prev =
0x0}, _vptr.ir_instruction = 0x72e23b90 , 
  ir_type = ir_type_dereference_variable}, type = 0x72e3fcc0
}, }



Re: [Mesa-dev] [PATCH V2 19/28] glsl: add support for explicit components to frag outputs

2016-01-13 Thread Anuj Phogat
On Wed, Jan 13, 2016 at 1:19 AM, Timothy Arceri
 wrote:
> V2: fix error checking for arrays and components. V1 was
> only taking into account all the array elements and all the
> components of one of the varyings during the comparison
> and treating the other as a single slot/component.
>
> Cc: Anuj Phogat 
> ---
>  src/glsl/linker.cpp | 72 
> +
>  1 file changed, 62 insertions(+), 10 deletions(-)
>
> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> index b81bfba..c66dcc4 100644
> --- a/src/glsl/linker.cpp
> +++ b/src/glsl/linker.cpp
> @@ -2411,7 +2411,12 @@ assign_attribute_or_color_locations(gl_shader_program 
> *prog,
>}
> } to_assign[16];
>
> +   /* Temporary array for the set of attributes that have locations assigned.
> +*/
> +   ir_variable *assigned[16];
> +
> unsigned num_attr = 0;
> +   unsigned assigned_attr = 0;
>
> foreach_in_list(ir_instruction, node, sh->ir) {
>ir_variable *const var = node->as_variable();
> @@ -2573,18 +2578,62 @@ assign_attribute_or_color_locations(gl_shader_program 
> *prog,
>  * attribute overlaps any previously allocated bits.
>  */
> if ((~(use_mask << attr) & used_locations) != used_locations) {
> -   if (target_index == MESA_SHADER_FRAGMENT ||
> -   (prog->IsES && prog->Version >= 300)) {
> -  linker_error(prog,
> -   "overlapping location is assigned "
> -   "to %s `%s' %d %d %d\n", string,
> -   var->name, used_locations, use_mask, attr);
> +   if (target_index == MESA_SHADER_FRAGMENT && !prog->IsES) {
> +  /* From section 4.4.2 (Output Layout Qualifiers) of the 
> GLSL
> +   * 4.40 spec:
> +   *
> +   *"Additionally, for fragment shader outputs, if two
> +   *variables are placed within the same location, they
> +   *must have the same underlying type (floating-point or
> +   *integer). No component aliasing of output variables 
> or
> +   *members is allowed.
> +   */
> +  for (unsigned i = 0; i < assigned_attr; i++) {
> + unsigned assigned_slots =
> +assigned[i]->type->count_attribute_slots(false);
> +unsigned assig_attr =
> +assigned[i]->data.location - generic_base;
> +unsigned assigned_use_mask = (1 << assigned_slots) - 1;
> +
> + if ((assigned_use_mask << assig_attr) &
> + (use_mask << attr)) {
> +
> +const glsl_type *assigned_type =
> +   assigned[i]->type->without_array();
> +const glsl_type *type = var->type->without_array();
> +if (assigned_type->base_type != type->base_type) {
> +   linker_error(prog, "types do not match for 
> aliased"
> +" %ss %s and %s\n", string,
> +assigned[i]->name, var->name);
> +   return false;
> +}
> +
> +unsigned assigned_component_mask =
> +   ((1 << assigned_type->vector_elements) - 1) <<
> +   assigned[i]->data.location_frac;
> +unsigned component_mask =
> +   ((1 << type->vector_elements) - 1) <<
> +   var->data.location_frac;
> +if (assigned_component_mask & component_mask) {
> +   linker_error(prog, "overlapping component is "
> +"assigned to %ss %s and %s "
> +"(component=%d)\n",
> +string, assigned[i]->name, var->name,
> +var->data.location_frac);
> +   return false;
> +}
> + }
> +  }
> +   } else if (target_index == MESA_SHADER_FRAGMENT ||
> +  (prog->IsES && prog->Version >= 300)) {
> +  linker_error(prog, "overlapping location is assigned "
> +   "to %s `%s' %d %d %d\n", string, var->name,
> +   used_locations, use_mask, attr);
>return false;
> } else {
> -  linker_warning(prog,
> - "overlapping location is assigned "
> - "to %s `%s' %d %d %d\n", string,
> -  
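
As a rough illustration of the check the quoted hunk adds, the sketch below builds the same two bit masks (one over location slots, one over components within a slot) and applies the tests in the same order: slot overlap first, then base type, then component overlap. The OutputVar struct and the Alias result type are hypothetical simplifications, not the linker's data structures, and the base type is reduced to a float/integer flag.

```cpp
#include <cassert>

// Hypothetical, simplified view of a fragment shader output.
struct OutputVar {
   unsigned location;        // first slot, already relative to generic_base
   unsigned slots;           // number of consecutive slots occupied
   unsigned components;      // vector_elements, 1..4
   unsigned first_component; // location_frac, 0..3
   bool is_float;            // underlying type class
};

enum class Alias { NoOverlap, TypeMismatch, ComponentOverlap, SharedOk };

Alias check_alias(const OutputVar &a, const OutputVar &b)
{
   assert(a.slots >= 1 && a.location + a.slots < 32);
   assert(b.slots >= 1 && b.location + b.slots < 32);
   assert(a.first_component + a.components <= 4);
   assert(b.first_component + b.components <= 4);

   // One bit per location slot the variable covers.
   const unsigned a_slots = ((1u << a.slots) - 1u) << a.location;
   const unsigned b_slots = ((1u << b.slots) - 1u) << b.location;
   if ((a_slots & b_slots) == 0)
      return Alias::NoOverlap;

   // Variables sharing a location must have the same underlying type.
   if (a.is_float != b.is_float)
      return Alias::TypeMismatch;

   // One bit per component (x, y, z, w) the variable covers.
   const unsigned a_comps = ((1u << a.components) - 1u) << a.first_component;
   const unsigned b_comps = ((1u << b.components) - 1u) << b.first_component;
   if (a_comps & b_comps)
      return Alias::ComponentOverlap;

   return Alias::SharedOk;
}
```

With these definitions, a vec2 at component 0 and a vec2 at component 2 of the same location share the slot without error, while a vec3 at component 0 collides with the second vec2 on the z component.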

[Mesa-dev] [PATCH] texobj: Check completeness with InternalFormat rather than Mesa format

2016-01-13 Thread Neil Roberts
The internal Mesa format used for a texture might not match the one
requested in the internalFormat when the texture was created, for
example if the driver is internally remapping RGB textures to RGBA.
Otherwise it can cause false positives for completeness if one mipmap
image is created as RGBA and the other as RGB because they would both
have an RGBA Mesa format. If we check the InternalFormat instead then
we are directly checking the API usage which I think better matches
the intention of the check.

https://bugs.freedesktop.org/show_bug.cgi?id=93700
---
 src/mesa/main/texobj.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/texobj.c b/src/mesa/main/texobj.c
index 547055e..b107a8f 100644
--- a/src/mesa/main/texobj.c
+++ b/src/mesa/main/texobj.c
@@ -835,7 +835,7 @@ _mesa_test_texobj_completeness( const struct gl_context 
*ctx,
   incomplete(t, MIPMAP, "TexImage[%d] is missing", i);
   return;
}
-   if (img->TexFormat != baseImage->TexFormat) {
+   if (img->InternalFormat != baseImage->InternalFormat) {
   incomplete(t, MIPMAP, "Format[i] != Format[baseLevel]");
   return;
}
-- 
2.5.0



[Mesa-dev] [PATCH] ttn: use writemask for store_var

2016-01-13 Thread Rob Clark
From: Rob Clark 

Only user is freedreno, and after array-rework it can cope.  Avoids
generating loads for a store.

Signed-off-by: Rob Clark 
---
Note: I need to finish some array re-work in ir3 in order to be able
to deal w/ the writemasks, so I intend to push this as part of that
series once I've debugged a few last things.  It doesn't affect vc4
which doesn't handle TEMP arrays.   But wanted to send it to list
now for review.

 src/gallium/auxiliary/nir/tgsi_to_nir.c | 28 ++--
 1 file changed, 2 insertions(+), 26 deletions(-)

diff --git a/src/gallium/auxiliary/nir/tgsi_to_nir.c 
b/src/gallium/auxiliary/nir/tgsi_to_nir.c
index 94d992b..46c9297 100644
--- a/src/gallium/auxiliary/nir/tgsi_to_nir.c
+++ b/src/gallium/auxiliary/nir/tgsi_to_nir.c
@@ -673,10 +673,6 @@ ttn_get_dest(struct ttn_compile *c, struct 
tgsi_full_dst_register *tgsi_fdst)
 
if (tgsi_dst->File == TGSI_FILE_TEMPORARY) {
   if (c->temp_regs[index].var) {
-  nir_builder *b = &c->build;
-  nir_intrinsic_instr *load;
-  struct tgsi_ind_register *indirect =
-tgsi_dst->Indirect ? &tgsi_fdst->Indirect : NULL;
   nir_register *reg;
 
  /* this works, because TGSI will give us a base offset
@@ -690,26 +686,6 @@ ttn_get_dest(struct ttn_compile *c, struct 
tgsi_full_dst_register *tgsi_fdst)
  reg->num_components = 4;
  dest.dest.reg.reg = reg;
  dest.dest.reg.base_offset = 0;
-
- /* since the alu op might not write to all components
-  * of the temporary, we must first do a load_var to
-  * get the previous array elements into the register.
-  * This is one area that NIR could use a bit of
-  * improvement (or opt pass to clean up the mess
-  * once things are scalarized)
-  */
-
- load = nir_intrinsic_instr_create(c->build.shader,
-   nir_intrinsic_load_var);
- load->num_components = 4;
- load->variables[0] =
-   ttn_array_deref(c, load, c->temp_regs[index].var,
-   c->temp_regs[index].offset,
-   indirect);
-
- load->dest = nir_dest_for_reg(reg);
-
- nir_builder_instr_insert(b, &load->instr);
   } else {
  assert(!tgsi_dst->Indirect);
  dest.dest.reg.reg = c->temp_regs[index].reg;
@@ -1886,7 +1862,7 @@ ttn_emit_instruction(struct ttn_compile *c)
   ttn_move_dest(b, dest, nir_fsat(b, ttn_src_for_dest(b, &dest)));
}
 
-   /* if the dst has a matching var, append store_global to move
+   /* if the dst has a matching var, append store_var to move
 * output from reg to var
 */
nir_variable *var = ttn_get_var(c, tgsi_dst);
@@ -1899,7 +1875,7 @@ ttn_emit_instruction(struct ttn_compile *c)
_dst->Indirect : NULL;
 
   store->num_components = 4;
-  store->const_index[0] = 0xf;
+  store->const_index[0] = dest.write_mask;
   store->variables[0] = ttn_array_deref(c, store, var, offset, indirect);
   store->src[0] = nir_src_for_reg(dest.dest.reg.reg);
 
-- 
2.5.0
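
The reasoning behind dropping the load can be sketched in a few lines of C++. The sketch treats a temporary array element as four floats, with bit i of the writemask enabling component i. Both function names are hypothetical and only model the data movement, not the NIR instructions themselves.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;

// Old scheme: load the previous contents into the register, let the ALU op
// overwrite the enabled components, then store all four components back.
void store_after_load(Vec4 &var, const Vec4 &alu_result, unsigned write_mask)
{
   assert(write_mask <= 0xfu);
   Vec4 reg = var;                      // the extra load
   for (unsigned i = 0; i < 4; i++)
      if (write_mask & (1u << i))
         reg[i] = alu_result[i];
   var = reg;                           // store with mask 0xf
}

// New scheme: no load, the store itself only touches enabled components.
void store_masked(Vec4 &var, const Vec4 &reg, unsigned write_mask)
{
   assert(write_mask <= 0xfu);
   for (unsigned i = 0; i < 4; i++)
      if (write_mask & (1u << i))
         var[i] = reg[i];
}
```

Both paths leave the variable in the same state. The masked store simply never needs the old values in the register.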



Re: [Mesa-dev] [PATCH V2 19/28] glsl: add support for explicit components to frag outputs

2016-01-13 Thread Timothy Arceri
On Wed, 2016-01-13 at 12:02 -0800, Anuj Phogat wrote:
> Timothy, Do you have a branch somewhere with the latest patches?

https://github.com/tarceri/Mesa_arrays_of_arrays.git explicit_offset

Contains the latest for component, offset, and align qualifiers all of
which have now been sent to the list.


> 
> On Wed, Jan 13, 2016 at 10:58 AM, Anuj Phogat 
> wrote:
> > On Wed, Jan 13, 2016 at 1:19 AM, Timothy Arceri
> >  wrote:
> > > V2: fix error checking for arrays and components. V1 was
> > > only taking into account all the array elements and all the
> > > > components of one of the varyings during the comparison
> > > and treating the other as a single slot/component.
> > > 
> > > Cc: Anuj Phogat 
> > > ---
> > >  src/glsl/linker.cpp | 72
> > > +
> > >  1 file changed, 62 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> > > index b81bfba..c66dcc4 100644
> > > --- a/src/glsl/linker.cpp
> > > +++ b/src/glsl/linker.cpp
> > > @@ -2411,7 +2411,12 @@
> > > assign_attribute_or_color_locations(gl_shader_program *prog,
> > >}
> > > } to_assign[16];
> > > 
> > > +   /* Temporary array for the set of attributes that have
> > > locations assigned.
> > > +*/
> > > +   ir_variable *assigned[16];
> > > +
> > > unsigned num_attr = 0;
> > > +   unsigned assigned_attr = 0;
> > > 
> > > foreach_in_list(ir_instruction, node, sh->ir) {
> > >ir_variable *const var = node->as_variable();
> > > @@ -2573,18 +2578,62 @@
> > > assign_attribute_or_color_locations(gl_shader_program *prog,
> > >  * attribute overlaps any previously allocated bits.
> > >  */
> > > if ((~(use_mask << attr) & used_locations) !=
> > > used_locations) {
> > > -   if (target_index == MESA_SHADER_FRAGMENT ||
> > > -   (prog->IsES && prog->Version >= 300)) {
> > > -  linker_error(prog,
> > > -   "overlapping location is assigned
> > > "
> > > -   "to %s `%s' %d %d %d\n", string,
> > > -   var->name, used_locations,
> > > use_mask, attr);
> > > +   if (target_index == MESA_SHADER_FRAGMENT && !prog
> > > ->IsES) {
> > > +  /* From section 4.4.2 (Output Layout
> > > Qualifiers) of the GLSL
> > > +   * 4.40 spec:
> > > +   *
> > > +   *"Additionally, for fragment shader
> > > outputs, if two
> > > +   *variables are placed within the same
> > > location, they
> > > +   *must have the same underlying type
> > > (floating-point or
> > > +   *integer). No component aliasing of
> > > output variables or
> > > +   *members is allowed.
> > > +   */
> > > +  for (unsigned i = 0; i < assigned_attr; i++) {
> > > + unsigned assigned_slots =
> > > +assigned[i]->type
> > > ->count_attribute_slots(false);
> > > +unsigned assig_attr =
> > > +assigned[i]->data.location -
> > > generic_base;
> > > +unsigned assigned_use_mask = (1 <<
> > > assigned_slots) - 1;
> > > +
> > > + if ((assigned_use_mask << assig_attr) &
> > > + (use_mask << attr)) {
> > > +
> > > +const glsl_type *assigned_type =
> > > +   assigned[i]->type->without_array();
> > > +const glsl_type *type = var->type
> > > ->without_array();
> > > +if (assigned_type->base_type != type
> > > ->base_type) {
> > > +   linker_error(prog, "types do not
> > > match for aliased"
> > > +" %ss %s and %s\n",
> > > string,
> > > +assigned[i]->name, var
> > > ->name);
> > > +   return false;
> > > +}
> > > +
> > > +unsigned assigned_component_mask =
> > > +   ((1 << assigned_type
> > > ->vector_elements) - 1) <<
> > > +   assigned[i]->data.location_frac;
> > > +unsigned component_mask =
> > > +   ((1 << type->vector_elements) - 1) <<
> > > +   var->data.location_frac;
> > > +if (assigned_component_mask &
> > > component_mask) {
> > > +   linker_error(prog, "overlapping
> > > component is "
> > > +"assigned to %ss %s and
> > > %s "
> > > +"(component=%d)\n",
> > > +string, 

Re: [Mesa-dev] [PATCH 1/2] nir: Add a fquantize2f16 opcode

2016-01-13 Thread Jason Ekstrand
On Wed, Jan 13, 2016 at 2:14 PM, Matt Turner  wrote:

> On Wed, Jan 13, 2016 at 1:46 PM, Jason Ekstrand 
> wrote:
> > On Wed, Jan 13, 2016 at 2:01 AM, Ian Romanick 
> wrote:
> >> On 01/12/2016 05:41 PM, Matt Turner wrote:
> >> > Section 8.3.2 of the OpenCL C 2.0 spec is also relevant, but doesn't
> >> > touch directly on the issue at hand.
> >> >
> >> > I'm worried that what is specified is not implementable via a round
> >> > trip through half-precision, because it's not the behavior other
> >> > languages implement.
> >> >
> >> > If I had to guess, given the table in the IVB PRM and section 8.3.2,
> >> > out-of-range single-precision floats are converted to the
> >> > half-precision value with the largest magnitude.
> >>
> >> You are correct, we should test it to be sure what the hardware really
> >> does. This is not intended to be a performance operation. If we need to
> >> use a different, more expensive expansion to meet the requirements, we
> >> shouldn't lose any sleep over it.
> >
> >
> > I haven't looked at it in bit-for-bit detail, but I did run it through
> a
> > set of tests which explicitly hits denorms and the out-of-bounds cases in
> > both directions.  The tests seem to indicate that the hardware does what
> the
> > opcode claims.
>
> I checked out the tests you mention, and none of the cases touch on
> what I'm saying (and this has nothing to do with denormal values). Let
> me explain again.
>

Right.  Thanks for looking at it.  I guess it only checks the explicit
infinity case.


> The largest representable value in half-precision is
>
>65504 == 2.0**15 * (1.0 + 1023.0 / 2.0**10)
>
> and the distance between representable integers at this range is 32.
> Converting 65505.0f through 65519.0f (i.e., one less than half the
> interval more than the largest representable value) to half-precision
> should round to 65504.0. 65520.0f and larger should round to infinity.
>
> This is what piglit tests
> (generated_tests/gen_builtin_packing_tests.py) and since we pass those
> tests I believe this is what the hardware does.
>
> This is, unfortunately, *not* what the documentation you've cited
> says. I expect that that's an oversight more than intentional
> behavior. Maybe tomorrow we can figure out how to submit changes to
> the spec and test suite?
>

Yeah, we can look at that tomorrow.  The objective of the opcode is to get
the behavior that Ian mentioned where if you sprinkle enough of them in,
you can emulate half-float precision.  What happens if you do FLOAT_MAX +
FLOAT_MAX?  Maybe infinity is what's wanted.  If that's the case, then
we'll have to do some sort of absolute value range-check.  It doesn't have
to be efficient.
--Jason
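
The rounding behaviour Matt describes for the top of the half-precision range can be checked with a short C++ sketch. It only handles finite inputs with magnitude of at least 32768, where representable half values are 32 apart, and it assumes the default round-to-nearest-even floating-point mode. The function name is hypothetical. This models the round-trip he expects from the hardware, not what the cited documentation specifies.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Round a finite float with |x| >= 2^15 to the nearest half-precision value,
// returned as a float. Ties go to the value with an even mantissa.
float quantize_top_binade(float x)
{
   assert(std::isfinite(x) && std::fabs(x) >= 32768.0f);

   const float ulp = 32.0f;        // 2^15 * 2^-10, spacing in [2^15, 2^16)
   const float max_half = 65504.0f; // 2^15 * (1 + 1023 / 2^10)

   // Dividing by a power of two is exact, so nearbyint sees the true ratio.
   float rounded = std::nearbyint(std::fabs(x) / ulp) * ulp;
   if (rounded > max_half)
      rounded = std::numeric_limits<float>::infinity();

   return std::copysign(rounded, x);
}
```

65519.0f is less than half an interval above 65504, so it rounds down. 65520.0f is an exact tie, and the even neighbour is 65536, which is out of range and therefore becomes infinity.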


[Mesa-dev] [PATCH] glsl: restrict consumer stage condition to modify interpolation type

2016-01-13 Thread Samuel Iglesias Gonsálvez
Only modify interpolation type for integer-based varyings or when the
consumer is known and different than fragment shader.

If we are linking separate shader programs and the consumer is unknown,
the consumer could be added later and be a fragment shader. If we
modify the interpolation type in this case, we could read wrong
values in the fragment shader inputs, as shown in bug 93320.

Fixes the following CTS test:
   ES31-CTS.vertex_attrib_binding.advanced-bindingUpdate

Fixes the following dEQP tests:

dEQP-GLES31.functional.separate_shader.random.102
dEQP-GLES31.functional.separate_shader.random.111
dEQP-GLES31.functional.separate_shader.random.115
dEQP-GLES31.functional.separate_shader.random.17
dEQP-GLES31.functional.separate_shader.random.22
dEQP-GLES31.functional.separate_shader.random.23
dEQP-GLES31.functional.separate_shader.random.3
dEQP-GLES31.functional.separate_shader.random.32
dEQP-GLES31.functional.separate_shader.random.39
dEQP-GLES31.functional.separate_shader.random.64
dEQP-GLES31.functional.separate_shader.random.73
dEQP-GLES31.functional.separate_shader.random.91

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93320
Signed-off-by: Samuel Iglesias Gonsálvez 
---

This patch adds 2 regressions in dEQP:

dEQP-GLES31.functional.separate_shader.random.49
dEQP-GLES31.functional.separate_shader.random.106

The failure is returned by validation_io() because the number of inputs
and outputs does not match with this patch applied.

As the interpolation type is not modified in
varying_matches::record() when we don't know the consumer
(consumer_stage == -1, for example in some separate shader objects), we
don't pack them together because its packing class does not match.

As a result, some output packed varyings are in different varying slots
than the input ones. Due to that, we have a mismatch in the number of
inputs and outputs because we don't check how many varyings we have
inside of each varying slot ("packed:var0,var1...") nor their type.

The validation of packed varyings doesn't seem to be trivial
and this is a different issue than the one this patch fixes.

 src/glsl/link_varyings.cpp | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
index 7cc5880..09f80d0 100644
--- a/src/glsl/link_varyings.cpp
+++ b/src/glsl/link_varyings.cpp
@@ -968,10 +968,12 @@ varying_matches::record(ir_variable *producer_var, 
ir_variable *consumer_var)
}
 
if ((consumer_var == NULL && producer_var->type->contains_integer()) ||
-   consumer_stage != MESA_SHADER_FRAGMENT) {
+   (consumer_stage != -1 && consumer_stage != MESA_SHADER_FRAGMENT)) {
   /* Since this varying is not being consumed by the fragment shader, its
-   * interpolation type varying cannot possibly affect rendering.  Also,
-   * this variable is non-flat and is (or contains) an integer.
+   * interpolation type varying cannot possibly affect rendering.
+   * Also, this variable is non-flat and is (or contains) an integer.
+   * If the consumer stage is unknown, don't modify the interpolation
+   * type as it could affect rendering later with separate shaders.
*
* lower_packed_varyings requires all integer varyings to flat,
* regardless of where they appear.  We can trivially satisfy that
-- 
2.5.0



Re: [Mesa-dev] [PATCH V3] glapi: add GL_OES_geometry_shader extension

2016-01-13 Thread Tapani Pälli



On 01/14/2016 01:11 AM, Ilia Mirkin wrote:

On Wed, Jan 13, 2016 at 3:55 AM, Tapani Pälli  wrote:

On 01/13/2016 10:29 AM, Lofstedt, Marta wrote:




-Original Message-
From: ibmir...@gmail.com [mailto:ibmir...@gmail.com] On Behalf Of Ilia
Mirkin
Sent: Tuesday, January 12, 2016 7:09 PM
To: Marta Lofstedt
Cc: mesa-dev@lists.freedesktop.org; Romanick, Ian D; Lofstedt, Marta
Subject: Re: [PATCH V3] glapi: add GL_OES_geometry_shader extension

On Tue, Jan 12, 2016 at 9:13 AM, Marta Lofstedt
 wrote:


From: Marta Lofstedt 

Add xml definitions for the GL_OES_geometry_shader extension and
expose the extension for OpenGL ES 3.1.

V3: Added dependency to OES_shader_io_blocks and updated to correct
Khronos extension number.


May I ask why you did this? OES_shader_io_blocks is a purely shader
compiler/linker feature, I expect it will be enabled whenever GLES 3.1 is
enabled, no? Why would it be tied to geometry shaders? Sure, geometry
shaders require it to work, but just because you have
OES_shader_io_blocks
doesn't necessarily mean you also have geometry shaders...


My intention was to address the co-dependency between oes_geometry_shader
and oes_shader_io_block.
But as always, you are right Ilia. The dependency issue needs to be fixed
in the driver.

So, please disregard this V3, I will push the V2 with the changes
suggested by Ilia in the comments.

FYI here are quotes from the oes_geometry_shader specification:
" OES_shader_io_blocks or EXT_shader_io_blocks is required."



IMO, according to this, it looks like OES_shader_io_blocks is a valid
requirement, as that functionality is not part of OpenGL ES 3.1.


Sure. But that has little bearing on the discussion here --

OES_shader_io_blocks is a compiler feature, not a backend feature. In
order for any backend to expose OES_geometry_shader, the
OES_shader_io_blocks ext needs to be done. But just because it is done
doesn't mean you have geometry shaders.


Right, that is correct. For a moment there I forgot how this table works :)
There need to be separate enable bits for these.



So you have to make sure that not only does the backend support
geometry shaders, but the core supports OES_shader_io_blocks before
you expose OES_geometry_shader. That doesn't seem too onerous.

   -ilia



// Tapani


[Mesa-dev] [PATCH] glsl: allow duplicate layout-qualifier-names

2016-01-13 Thread Timothy Arceri
The special case for detecting stream duplicates is also
removed, as testing never triggered this error.

From the ARB_shading_language_420pack spec:

   "More than one layout qualifier may appear in a single
   declaration. If the same layout-qualifier-name occurs in
   multiple layout qualifiers for the same declaration, the
   last one overrides the former ones."

While the extension spec is talking about multiple layout
qualifiers we interpret that to mean layout-qualifier-names
can also occur multiple times within a single layout qualifier.

In Section 4.4 (Layout Qualifiers) of the GLSL 4.40 spec it
clarifies this:

   "More than one layout qualifier may appear in a single
   declaration. Additionally, the same layout-qualifier-name
   can occur multiple times within a layout qualifier or across
   multiple layout qualifiers in the same declaration"
---

 The Nvidia driver allows this for GLSL 4.20 but not for the
 extension.

 Piglit tests:

 http://patchwork.freedesktop.org/patch/70459/

 src/glsl/ast_type.cpp | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/src/glsl/ast_type.cpp b/src/glsl/ast_type.cpp
index f4e51b8..afae687 100644
--- a/src/glsl/ast_type.cpp
+++ b/src/glsl/ast_type.cpp
@@ -158,7 +158,8 @@ ast_type_qualifier::merge_qualifier(YYLTYPE *loc,
   allowed_duplicates_mask.flags.i |=
  stream_layout_mask.flags.i;
 
-   if ((this->flags.i & q.flags.i & ~allowed_duplicates_mask.flags.i) != 0) {
+   if (!state->has_420pack() &&
+   (this->flags.i & q.flags.i & ~allowed_duplicates_mask.flags.i) != 0) {
   _mesa_glsl_error(loc, state,
   "duplicate layout qualifiers used");
   return false;
@@ -209,11 +210,6 @@ ast_type_qualifier::merge_qualifier(YYLTYPE *loc,
 this->flags.q.stream = 1;
 this->stream = state->out_qualifier->stream;
  }
-  } else {
- if (q.flags.q.explicit_stream) {
-_mesa_glsl_error(loc, state,
- "duplicate layout `stream' qualifier");
- }
   }
}
 
-- 
2.4.3



Re: [Mesa-dev] [PATCH] ttn: add missing writemask on store_output

2016-01-13 Thread Kenneth Graunke
On Wednesday, January 13, 2016 6:40:59 PM PST Rob Clark wrote:
> From: Rob Clark 
> 
> Signed-off-by: Rob Clark 
> ---
>  src/gallium/auxiliary/nir/tgsi_to_nir.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/gallium/auxiliary/nir/tgsi_to_nir.c
> b/src/gallium/auxiliary/nir/tgsi_to_nir.c index 46c9297..e127174 100644
> --- a/src/gallium/auxiliary/nir/tgsi_to_nir.c
> +++ b/src/gallium/auxiliary/nir/tgsi_to_nir.c
> @@ -1908,6 +1908,7 @@ ttn_add_output_stores(struct ttn_compile *c)
>   store->src[0].reg.reg = c->output_regs[loc].reg;
>   store->src[0].reg.base_offset = c->output_regs[loc].offset;
>   store->const_index[0] = loc;
> + store->const_index[1] = 0xf;  /* writemask */
>   store->src[1] = nir_src_for_ssa(nir_imm_int(b, 0));
>   nir_builder_instr_insert(b, &store->instr);
>}


Oops...sorry :(

Reviewed-by: Kenneth Graunke 




[Mesa-dev] [PATCH] nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem space

2016-01-13 Thread Ilia Mirkin
Reduces local memory usage in a lot of Metro 2033 Redux and a few KSP
shaders:

total local used in shared programs   : 54116 -> 30372 (-43.88%)

Probably modest advantage to execution, but it's an important
prerequisite to dropping some of the TGSI optimizations done by the
state tracker.

Signed-off-by: Ilia Mirkin 
---

Seems like there ought to be a simpler way of doing this... oh well.

 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 64 +-
 1 file changed, 50 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 0e1c332..2085978 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -841,6 +841,11 @@ public:
std::set locals;
 
std::set<int> indirectTempArrays;
+   struct TempBase {
+      int oldBase, newBase;
+   };
+   std::map<int, TempBase> indirectTempBases;
+   std::map<int, std::pair<int, int> > tempArrayInfo;
std::vector<int> tempArrayId;
 
int clipVertexOutput;
@@ -949,9 +954,19 @@ bool Source::scanSource()
}
tgsi_parse_free();
 
-   // TODO: Compute based on relevant array sizes
-   if (indirectTempArrays.size())
-  info->bin.tlsSpace += (scan.file_max[TGSI_FILE_TEMPORARY] + 1) * 16;
+   if (indirectTempArrays.size()) {
+  int tempBase = 0;
+  for (std::set<int>::const_iterator it = indirectTempArrays.begin();
+   it != indirectTempArrays.end(); ++it) {
+ std::pair<int, int>& info = tempArrayInfo[*it];
+ TempBase base;
+ base.oldBase = info.first;
+ base.newBase = tempBase;
+ indirectTempBases.insert(std::make_pair(*it, base));
+ tempBase += info.second;
+  }
+  info->bin.tlsSpace += tempBase * 16;
+   }
 
if (info->io.genUserClip > 0) {
   info->io.clipDistances = info->io.genUserClip;
@@ -1208,6 +1223,9 @@ bool Source::scanDeclaration(const struct 
tgsi_full_declaration *decl)
case TGSI_FILE_TEMPORARY:
   for (i = first; i <= last; ++i)
  tempArrayId[i] = arrayId;
+  if (arrayId)
+ tempArrayInfo.insert(std::make_pair(arrayId, std::make_pair(
+   first, last - first + 1)));
   break;
case TGSI_FILE_NULL:
case TGSI_FILE_ADDRESS:
@@ -1374,6 +1392,7 @@ private:
void storeDst(const tgsi::Instruction::DstRegister dst, int c,
  Value *val, Value *ptr);
 
+   void adjustTempIndex(int arrayId, int &idx, int &idx2d) const;
Value *applySrcMod(Value *, int s, int c);
 
Symbol *makeSym(uint file, int fileIndex, int idx, int c, uint32_t addr);
@@ -1679,11 +1698,23 @@ Converter::shiftAddress(Value *index)
return mkOp2v(OP_SHL, TYPE_U32, getSSA(4, FILE_ADDRESS), index, mkImm(4));
 }
 
+void
+Converter::adjustTempIndex(int arrayId, int &idx, int &idx2d) const
+{
+   std::map<int, TempBase>::const_iterator it =
+  code->indirectTempBases.find(arrayId);
+   if (it == code->indirectTempBases.end())
+  return;
+
+   idx2d = 1;
+   idx += it->second.newBase - it->second.oldBase;
+}
+
 Value *
 Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
 {
int idx2d = src.is2D() ? src.getIndex(1) : 0;
-   const int idx = src.getIndex(0);
+   int idx = src.getIndex(0);
const int swz = src.getSwizzle(c);
Instruction *ld;
 
@@ -1728,8 +1759,7 @@ Converter::fetchSrc(tgsi::Instruction::SrcRegister src, 
int c, Value *ptr)
   int arrayid = src.getArrayId();
   if (!arrayid)
  arrayid = code->tempArrayId[idx];
-  idx2d = (code->indirectTempArrays.find(arrayid) !=
-   code->indirectTempArrays.end());
+  adjustTempIndex(arrayid, idx, idx2d);
}
   /* fallthrough */
default:
@@ -1743,7 +1773,7 @@ Converter::acquireDst(int d, int c)
 {
const tgsi::Instruction::DstRegister dst = tgsi.getDst(d);
const unsigned f = dst.getFile();
-   const int idx = dst.getIndex(0);
+   int idx = dst.getIndex(0);
int idx2d = dst.is2D() ? dst.getIndex(1) : 0;
 
if (dst.isMasked(c) || f == TGSI_FILE_BUFFER || f == TGSI_FILE_IMAGE)
@@ -1754,9 +1784,12 @@ Converter::acquireDst(int d, int c)
(f == TGSI_FILE_OUTPUT && prog->getType() != Program::TYPE_FRAGMENT))
   return getScratch();
 
-   if (f == TGSI_FILE_TEMPORARY)
-  idx2d = code->indirectTempArrays.find(code->tempArrayId[idx]) !=
- code->indirectTempArrays.end();
+   if (f == TGSI_FILE_TEMPORARY) {
+  int arrayid = dst.getArrayId();
+  if (!arrayid)
+ arrayid = code->tempArrayId[idx];
+  adjustTempIndex(arrayid, idx, idx2d);
+   }
 
return getArrayForFile(f, idx2d)-> acquire(sub.cur->values, idx, c);
 }
@@ -1789,7 +1822,7 @@ Converter::storeDst(const tgsi::Instruction::DstRegister 
dst, int c,
 Value *val, Value *ptr)
 {
const unsigned f = dst.getFile();
-   const int idx = 

[Mesa-dev] [PATCH] st/mesa: add check for color logicop in blit_copy_pixels()

2016-01-13 Thread Brian Paul
We check that a bunch of raster operations are disabled in
blit_copy_pixels().  We also need to check that color logicop is
disabled.
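
For illustration only, here is a minimal sketch of the added condition. It assumes a stand-in struct rather than the real gl_context; `color_state` and `logicop_allows_blit` are hypothetical names, and the GL enums are defined locally with their standard values. A logic op of GL_COPY writes the source unchanged, so it should be as harmless to the blit path as having logic ops disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Standard OpenGL enum values for two logic ops. */
#define GL_COPY 0x1503
#define GL_XOR  0x1506

/* Hypothetical stand-in for the ctx->Color fields the patch reads. */
struct color_state {
   bool ColorLogicOpEnabled;
   unsigned LogicOp;
};

/* Returns true when the logic-op state leaves the copied pixels unchanged. */
static bool
logicop_allows_blit(const struct color_state *color)
{
   assert(color != NULL);
   return !color->ColorLogicOpEnabled || color->LogicOp == GL_COPY;
}
```

With this shape, an enabled GL_XOR rejects the blit path, while a disabled logic op or an enabled GL_COPY keeps it.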
---
 src/mesa/state_tracker/st_cb_drawpixels.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c 
b/src/mesa/state_tracker/st_cb_drawpixels.c
index 7ed52dd..04a9de0 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -1302,6 +1302,7 @@ blit_copy_pixels(struct gl_context *ctx, GLint srcx, 
GLint srcy,
ctx->_ImageTransferState == 0x0 &&
!ctx->Color.BlendEnabled &&
!ctx->Color.AlphaEnabled &&
+   (!ctx->Color.ColorLogicOpEnabled || ctx->Color.LogicOp == GL_COPY) &&
!ctx->Depth.Test &&
!ctx->Fog.Enabled &&
!ctx->Stencil.Enabled &&
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH V3] glapi: add GL_OES_geometry_shader extension

2016-01-13 Thread Ilia Mirkin
On Wed, Jan 13, 2016 at 3:55 AM, Tapani Pälli  wrote:
> On 01/13/2016 10:29 AM, Lofstedt, Marta wrote:
>>
>>
>>> -Original Message-
>>> From: ibmir...@gmail.com [mailto:ibmir...@gmail.com] On Behalf Of Ilia
>>> Mirkin
>>> Sent: Tuesday, January 12, 2016 7:09 PM
>>> To: Marta Lofstedt
>>> Cc: mesa-dev@lists.freedesktop.org; Romanick, Ian D; Lofstedt, Marta
>>> Subject: Re: [PATCH V3] glapi: add GL_OES_geometry_shader extension
>>>
>>> On Tue, Jan 12, 2016 at 9:13 AM, Marta Lofstedt
>>>  wrote:

>>>> From: Marta Lofstedt
>>>>
>>>> Add xml definitions for the GL_OES_geometry_shader extension and
>>>> expose the extension for OpenGL ES 3.1.
>>>>
>>>> V3: Added dependency to OES_shader_io_blocks and updated to correct
>>>> Khronos extension number.
>>>
>>> May I ask why you did this? OES_shader_io_blocks is a purely shader
>>> compiler/linker feature, I expect it will be enabled whenever GLES 3.1 is
>>> enabled, no? Why would it be tied to geometry shaders? Sure, geometry
>>> shaders require it to work, but just because you have
>>> OES_shader_io_blocks
>>> doesn't necessarily mean you also have geometry shaders...
>>>
>> My intention was to address the co-dependency between oes_geometry_shader
>> and oes_shader_io_block.
>> But as always, you are right Ilia. The dependency issue needs to be fixed
>> in the driver.
>>
>> So, please disregard this V3, I will push the V2 with the changes
>> suggested by Ilia in the comments.
>>
>> FYI here are quotes from the oes_geometry_shader specification:
>> " OES_shader_io_blocks or EXT_shader_io_blocks is required."
>
>
> IMO according to this it looks like OES_shader_io_blocks is a valid
> requirement as that functionality is not part of OpenGL ES 3.1.

Sure. But that has little bearing on the discussion here --

OES_shader_io_blocks is a compiler feature, not a backend feature. In
order for any backend to expose OES_geometry_shader, the
OES_shader_io_blocks ext needs to be done. But just because it is done
doesn't mean you have geometry shaders.

So you have to make sure that not only does the backend support
geometry shaders, but the core supports OES_shader_io_blocks before
you expose OES_geometry_shader. That doesn't seem too onerous.
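
Roughly, as a sketch (the struct and function names here are made up for illustration and are not Mesa's extension table):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability flags; not Mesa's real extension bookkeeping. */
struct ext_caps {
   bool backend_has_geometry_shaders;  /* driver/backend capability */
   bool core_has_shader_io_blocks;     /* compiler/linker capability */
};

/* io_blocks depends only on the core compiler support. */
static bool
expose_oes_shader_io_blocks(const struct ext_caps *caps)
{
   assert(caps != NULL);
   return caps->core_has_shader_io_blocks;
}

/* Geometry shaders need both: io_blocks is necessary but not sufficient. */
static bool
expose_oes_geometry_shader(const struct ext_caps *caps)
{
   assert(caps != NULL);
   return caps->backend_has_geometry_shaders &&
          caps->core_has_shader_io_blocks;
}
```

A backend without geometry shader support would still get OES_shader_io_blocks under this scheme, which is the asymmetry being discussed.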

  -ilia


Re: [Mesa-dev] [PATCH V3] glapi: add GL_OES_geometry_shader extension

2016-01-13 Thread Ian Romanick
On 01/13/2016 12:55 AM, Tapani Pälli wrote:
> On 01/13/2016 10:29 AM, Lofstedt, Marta wrote:
>>
>>> -Original Message-
>>> From: ibmir...@gmail.com [mailto:ibmir...@gmail.com] On Behalf Of Ilia
>>> Mirkin
>>> Sent: Tuesday, January 12, 2016 7:09 PM
>>> To: Marta Lofstedt
>>> Cc: mesa-dev@lists.freedesktop.org; Romanick, Ian D; Lofstedt, Marta
>>> Subject: Re: [PATCH V3] glapi: add GL_OES_geometry_shader extension
>>>
>>> On Tue, Jan 12, 2016 at 9:13 AM, Marta Lofstedt
>>>  wrote:
>>>> From: Marta Lofstedt
>>>>
>>>> Add xml definitions for the GL_OES_geometry_shader extension and
>>>> expose the extension for OpenGL ES 3.1.
>>>>
>>>> V3: Added dependency to OES_shader_io_blocks and updated to correct
>>>> Khronos extension number.
>>> May I ask why you did this? OES_shader_io_blocks is a purely shader
>>> compiler/linker feature, I expect it will be enabled whenever GLES
>>> 3.1 is
>>> enabled, no? Why would it be tied to geometry shaders? Sure, geometry
>>> shaders require it to work, but just because you have
>>> OES_shader_io_blocks
>>> doesn't necessarily mean you also have geometry shaders...
>>>
>> My intention was to address the co-dependency between
>> oes_geometry_shader and oes_shader_io_block.
>> But as always, you are right Ilia. The dependency issue needs to be
>> fixed in the driver.
>>
>> So, please disregard this V3, I will push the V2 with the changes
>> suggested by Ilia in the comments.
>>
>> FYI here are quotes from the oes_geometry_shader specification:
>> " OES_shader_io_blocks or EXT_shader_io_blocks is required."
> 
> IMO according to this it looks like OES_shader_io_blocks is a valid
> requirement as that functionality is not part of OpenGL ES 3.1.

True.  Any driver that enables OES_geometry_shader but does not also
enable OES_shader_io_blocks has a bug.  OES_shader_io_blocks is
necessary but not sufficient.




[Mesa-dev] [RFC] nir: const_index sanity

2016-01-13 Thread Rob Clark
From: Rob Clark 

---
An idea for how to bring some sanity to the wild-west of intrinsic
const_index[] usage.  Also w/ nir_print support, which could be
split into other patch, but makes the nir_print output a bit nicer:

  intrinsic store_output (ssa_210, ssa_66) () (0, 15) /* base=0 wrmask=xyzw */

(and already made me realize that ttn was neglecting to set wrmask on
store_output's)

Probably I'd add "setter" functions too, and then in follow-on patches,
update the gazillion places where const_index[] access is open-coded.

But first, before big conflicty changes like that, I figured I'd see what
others thought.  The other variation of the idea is to simply drop the
const_index[] field and replace w/ 'unsigned wrmask' and 'int base'.
Although that would be a bigger more flag-day sort of patch.
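
To make the indirection concrete, here is a standalone sketch of the lookup scheme. It uses simplified, hypothetical types (`intrinsic_info`, `intrinsic_instr`, `get_index`, `set_index`) rather than the real NIR structs, and the setter is only a guess at what the "setter" functions might look like:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONST_INDEX 3

/* Hypothetical flags naming what a const_index slot holds. */
enum index_flag {
   INDEX_BASE = 1,
   INDEX_WRMASK = 2,
   NUM_INDEX_FLAGS
};

struct intrinsic_info {
   /* index_map[flag] is the slot number plus one; zero means "not used". */
   unsigned index_map[NUM_INDEX_FLAGS];
};

struct intrinsic_instr {
   const struct intrinsic_info *info;
   int const_index[MAX_CONST_INDEX];
};

/* Read the slot that the per-intrinsic table assigns to this flag. */
static int
get_index(const struct intrinsic_instr *instr, enum index_flag flag)
{
   unsigned slot_plus_one = instr->info->index_map[flag];
   assert(slot_plus_one > 0 && slot_plus_one <= MAX_CONST_INDEX);
   return instr->const_index[slot_plus_one - 1];
}

/* Write the slot that the per-intrinsic table assigns to this flag. */
static void
set_index(struct intrinsic_instr *instr, enum index_flag flag, int value)
{
   unsigned slot_plus_one = instr->info->index_map[flag];
   assert(slot_plus_one > 0 && slot_plus_one <= MAX_CONST_INDEX);
   instr->const_index[slot_plus_one - 1] = value;
}
```

Callers then ask for "base" or "wrmask" by name, and an intrinsic that does not carry that index trips the assert instead of silently reading an unrelated slot.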

BR,
-R

 src/glsl/nir/nir.h|  48 +++-
 src/glsl/nir/nir_intrinsics.c |  11 ++-
 src/glsl/nir/nir_intrinsics.h | 178 +-
 src/glsl/nir/nir_print.c  |  30 ---
 4 files changed, 166 insertions(+), 101 deletions(-)

diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index bedcc0d..2235154 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -786,7 +786,7 @@ typedef struct {
 } nir_call_instr;
 
 #define INTRINSIC(name, num_srcs, src_components, has_dest, dest_components, \
-  num_variables, num_indices, flags) \
+  num_variables, num_indices, idx0, idx1, idx2, flags) \
nir_intrinsic_##name,
 
 #define LAST_INTRINSIC(name) nir_last_intrinsic = nir_intrinsic_##name,
@@ -799,6 +799,8 @@ typedef enum {
 #undef INTRINSIC
 #undef LAST_INTRINSIC
 
+#define NIR_INTRINSIC_MAX_CONST_INDEX 3
+
 /** Represents an intrinsic
  *
  * An intrinsic is an instruction type for handling things that are
@@ -842,7 +844,7 @@ typedef struct {
 */
uint8_t num_components;
 
-   int const_index[3];
+   int const_index[NIR_INTRINSIC_MAX_CONST_INDEX];
 
nir_deref_var *variables[2];
 
@@ -871,6 +873,29 @@ typedef enum {
NIR_INTRINSIC_CAN_REORDER = (1 << 1),
 } nir_intrinsic_semantic_flag;
 
+/**
+ * \name NIR intrinsics const-index flag
+ *
+ * Indicates the usage of a const_index slot.
+ *
+ * \sa nir_intrinsic_info::index_map
+ */
+typedef enum {
+   /**
+* Generally, instructions that take an offset src argument can encode
+* a constant 'base' value which is added to the offset.
+*/
+   NIR_INTRINSIC_BASE = 1,
+
+   /**
+* For store instructions, a writemask for the store.
+*/
+   NIR_INTRINSIC_WRMASK = 2,
+
+   NIR_INTRINSIC_NUM_INDEX_FLAGS,
+
+} nir_intrinsic_index_flag;
+
 #define NIR_INTRINSIC_MAX_INPUTS 4
 
 typedef struct {
@@ -900,12 +925,31 @@ typedef struct {
/** the number of constant indices used by the intrinsic */
unsigned num_indices;
 
+   /** indicates the usage of intr->const_index[n] */
+   unsigned index_map[NIR_INTRINSIC_NUM_INDEX_FLAGS];
+
/** semantic flags for calls to this intrinsic */
nir_intrinsic_semantic_flag flags;
 } nir_intrinsic_info;
 
 extern const nir_intrinsic_info nir_intrinsic_infos[nir_num_intrinsics];
 
+static inline unsigned
+nir_intrinsic_write_mask(nir_intrinsic_instr *instr)
+{
+   const nir_intrinsic_info *info = &nir_intrinsic_infos[instr->intrinsic];
+   assert(info->index_map[NIR_INTRINSIC_WRMASK] > 0);
+   return instr->const_index[info->index_map[NIR_INTRINSIC_WRMASK] - 1];
+}
+
+static inline int
+nir_intrinsic_base(nir_intrinsic_instr *instr)
+{
+   const nir_intrinsic_info *info = &nir_intrinsic_infos[instr->intrinsic];
+   assert(info->index_map[NIR_INTRINSIC_BASE] > 0);
+   return instr->const_index[info->index_map[NIR_INTRINSIC_BASE] - 1];
+}
+
 /**
  * \group texture information
  *
diff --git a/src/glsl/nir/nir_intrinsics.c b/src/glsl/nir/nir_intrinsics.c
index a7c868c..7dddc70 100644
--- a/src/glsl/nir/nir_intrinsics.c
+++ b/src/glsl/nir/nir_intrinsics.c
@@ -30,7 +30,8 @@
 #define OPCODE(name) nir_intrinsic_##name
 
 #define INTRINSIC(_name, _num_srcs, _src_components, _has_dest, \
-  _dest_components, _num_variables, _num_indices, _flags) \
+  _dest_components, _num_variables, _num_indices, \
+  idx0, idx1, idx2, _flags) \
 { \
.name = #_name, \
.num_srcs = _num_srcs, \
@@ -39,9 +40,15 @@
.dest_components = _dest_components, \
.num_variables = _num_variables, \
.num_indices = _num_indices, \
-   .flags = _flags \
+   .index_map = { \
+  [NIR_INTRINSIC_ ## idx0] = 1, \
+  [NIR_INTRINSIC_ ## idx1] = 2, \
+  [NIR_INTRINSIC_ ## idx2] = 3, \
+   }, \
 },
 
+#define NIR_INTRINSIC_xx 0
+
 #define LAST_INTRINSIC(name)
 
 const nir_intrinsic_info nir_intrinsic_infos[nir_num_intrinsics] = {
diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
index 62eead4..fd46692 100644
--- a/src/glsl/nir/nir_intrinsics.h
+++ b/src/glsl/nir/nir_intrinsics.h
@@ -30,7 +30,7 @@
  * expands to a list of macros of the form:
  *
  * 

Re: [Mesa-dev] [PATCH 1/2] nir: Add a fquantize2f16 opcode

2016-01-13 Thread Jason Ekstrand
On Wed, Jan 13, 2016 at 2:01 AM, Ian Romanick  wrote:

> On 01/12/2016 05:41 PM, Matt Turner wrote:
> > On Tue, Jan 12, 2016 at 4:10 PM, Jason Ekstrand 
> wrote:
> >> On Tue, Jan 12, 2016 at 3:52 PM, Matt Turner 
> wrote:
> >>>
> >>> On Tue, Jan 12, 2016 at 3:35 PM, Jason Ekstrand 
> >>> wrote:
> >>>> This opcode simply takes a 32-bit floating-point value and reduces its
> >>>> effective precision to 16 bits.
> >>>> ---
> >>>
> >>> What's it supposed to do for values not representable in
> half-precision?
> >>
> >>
> >> If they're in-range, round.  If they're out-of-range, the appropriate
> >> infinity.
> >
> > Are you sure that's the behavior hardware has? And by "are you sure" I
> > mean "have you tested it"
> >
> > The conversion table in the f32to16 documentation in the IVB PRM says:
> >
> > single precision -> half precision
> > 
> > -finite -> -finite/-denorm/-0
> > +finite -> +finite/+denorm/+0
> >
> >>
> https://www.khronos.org/registry/spir-v/specs/1.0/SPIRV.html#OpQuantizeToF16
> >
> >> Quantize a floating-point value to what is expressible by a 16-bit
> floating-point value.
> >
> > Erf, anyway,
> >
> > ... and the "convert too-large values to inf" isn't the behavior of
> > other languages like C [1] (and I don't think GLSL either, but I can't
> > find anything on the matter in the spec) or OpenCL C [2].
>
> Some background may either clarify or further muddy things.
>
> Right now applications sprinkle mediump and lowp all over the place in
> GLSL ES shaders.  Many vertex shader implementations, even on mobile
> devices, do everything in single precision.  Many devices will only use
> f16 part of the time because some instructions may not have f16
> versions.  When we finally implement f16 in the i965 driver, we'll be in
> this boat too.
>
> As a result, people think that their mediump-decorated code is fine...
> until it actually runs on a device that really does mediump.  Then they
> report a bug to the vendor of that hardware.  Sound like a familiar
> situation?
>
> From this problem the OpQuantizeToF16 SPIR-V instruction was born.  The
> intention is that people could compile their code in a way that mediump
> gives you mediump precision on every device.  While you probably
> wouldn't want to ship such code, this at least makes it possible to test
> it without having to find a device that will really do native mediump
> calculations all the time.
>
> IIRC, GLSL doesn't require Inf in mediump.  I don't recall what SPIR-V
> says.  I believe that GLSL allows saturating to the maximum magnitude
> representable value.  What we want is for an expression tree like
>
> OpQuantizeToF16(OpQuantizeToF16(x) + OpQuantizeToF16(y))
>
> to produce the same value that 'x + y' would produce in "real" f16 mediump.
>

Right.  This is exactly why the opcode was created.


>
> The SPIR-V +/-Inf requirement doesn't completely jibe with my
> recollection of the discussions... but there was a lot of
> back-and-forth, and it was quite a few months ago at this point.  I
> think we may have picked just one possible answer instead of allowing
> both choices just for consistency.  I don't have any memory whether
> anyone strongly wanted the +/-Inf behavior or if it was just a coin toss.
>

For OpQuantizeF16, the spec does currently


>
> > Section 8.3.2 of the OpenCL C 2.0 spec is also relevant, but doesn't
> > touch directly on the issue at hand.
> >
> > I'm worried that what is specified is not implementable via a round
> > trip through half-precision, because it's not the behavior other
> > languages implement.
> >
> > If I had to guess, given the table in the IVB PRM and section 8.3.2,
> > out-of-range single-precision floats are converted to the
> > half-precision value with the largest magnitude.
>
> You are correct, we should test it to be sure what the hardware really
> does. This is not intended to be a performance operation. If we need to
> use a different, more expensive expansion to meet the requirements, we
> shouldn't lose any sleep over it.
>

I haven't looked at it in bit-for-bit detail, but I did run it through a
set of tests which explicitly hits denorms and the out-of-bounds cases in
both directions.  The tests seem to indicate that the hardware does what
the opcode claims.

--Jason


[Mesa-dev] [PATCH] ttn: add missing writemask on store_output

2016-01-13 Thread Rob Clark
From: Rob Clark 

Signed-off-by: Rob Clark 
---
 src/gallium/auxiliary/nir/tgsi_to_nir.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/auxiliary/nir/tgsi_to_nir.c 
b/src/gallium/auxiliary/nir/tgsi_to_nir.c
index 46c9297..e127174 100644
--- a/src/gallium/auxiliary/nir/tgsi_to_nir.c
+++ b/src/gallium/auxiliary/nir/tgsi_to_nir.c
@@ -1908,6 +1908,7 @@ ttn_add_output_stores(struct ttn_compile *c)
  store->src[0].reg.reg = c->output_regs[loc].reg;
  store->src[0].reg.base_offset = c->output_regs[loc].offset;
  store->const_index[0] = loc;
+ store->const_index[1] = 0xf;  /* writemask */
  store->src[1] = nir_src_for_ssa(nir_imm_int(b, 0));
  nir_builder_instr_insert(b, &store->instr);
   }
-- 
2.5.0



Re: [Mesa-dev] [PATCH 1/2] nir: Add a fquantize2f16 opcode

2016-01-13 Thread Matt Turner
On Wed, Jan 13, 2016 at 1:46 PM, Jason Ekstrand  wrote:
> On Wed, Jan 13, 2016 at 2:01 AM, Ian Romanick  wrote:
>> On 01/12/2016 05:41 PM, Matt Turner wrote:
>> > Section 8.3.2 of the OpenCL C 2.0 spec is also relevant, but doesn't
>> > touch directly on the issue at hand.
>> >
>> > I'm worried that what is specified is not implementable via a round
>> > trip through half-precision, because it's not the behavior other
>> > languages implement.
>> >
>> > If I had to guess, given the table in the IVB PRM and section 8.3.2,
>> > out-of-range single-precision floats are converted to the
>> > half-precision value with the largest magnitude.
>>
>> You are correct, we should test it to be sure what the hardware really
>> does. This is not intended to be a performance operation. If we need to
>> use a different, more expensive expansion to meet the requirements, we
>> shouldn't lose any sleep over it.
>
>
> I haven't looked at it in bit-for-bit detail, but I did run it through a
> set of tests which explicitly hits denorms and the out-of-bounds cases in
> both directions.  The tests seem to indicate that the hardware does what the
> opcode claims.

I checked out the tests you mention, and none of the cases touch on
what I'm saying (and this has nothing to do with denormal values). Let
me explain again.

The largest representable value in half-precision is

   65504 == 2.0**15 * (1.0 + 1023.0 / 2.0**10)

and the distance between representable integers at this range is 32.
Converting 65505.0f through 65519.0f (i.e., one less than half the
interval more than the largest representable value) to half-precision
should round to 65504.0. 65520.0f and larger should round to infinity.
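
To pin down the rounding I mean, here is a software-only sketch of a round trip through half precision with round-to-nearest-even. It is an illustration, not what the hardware, NIR, or any test suite implements; `quantize_to_f16` and the two bit-cast helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static float
bits_to_float(uint32_t bits)
{
   float f;
   memcpy(&f, &bits, sizeof(f));
   return f;
}

static uint32_t
float_to_bits(float f)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof(bits));
   return bits;
}

/* Round x to the nearest half-precision value (ties to even) and return it
 * as a float.  Results that round past 65504 become infinity.
 */
static float
quantize_to_f16(float x)
{
   const uint32_t bits = float_to_bits(x);
   const uint32_t sign = bits & 0x80000000u;
   const uint32_t biased_exp = (bits >> 23) & 0xffu;

   if (biased_exp == 0xffu)   /* Inf and NaN pass through. */
      return x;
   if (biased_exp == 0)       /* Zero and float denormals round to zero. */
      return bits_to_float(sign);

   const int exp = (int)biased_exp - 127;
   /* log2 of the spacing between half values near x (exponent floor -14). */
   const int quantum_exp = (exp > -14 ? exp : -14) - 10;
   /* Number of low significand bits that half precision cannot hold. */
   const int shift = quantum_exp - (exp - 23);
   assert(shift >= 13);
   if (shift > 24)            /* Below half of the smallest half denormal. */
      return bits_to_float(sign);

   const uint32_t significand = (bits & 0x007fffffu) | 0x00800000u;
   uint32_t q = significand >> shift;
   const uint32_t rem = significand & ((1u << shift) - 1u);
   const uint32_t half = 1u << (shift - 1);
   if (rem > half || (rem == half && (q & 1u)))
      q++;                    /* Round up; exact ties go to the even value. */

   const float scale = bits_to_float((uint32_t)(quantum_exp + 127) << 23);
   const float mag = (float)q * scale;
   const uint32_t result = mag > 65504.0f ? 0x7f800000u : float_to_bits(mag);
   return bits_to_float(sign | result);
}
```

With this, 65519.0f comes back as 65504.0f, while 65520.0f sits exactly on the tie, rounds to the even neighbour 65536, and is therefore reported as infinity.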

This is what piglit tests
(generated_tests/gen_builtin_packing_tests.py) and since we pass those
tests I believe this is what the hardware does.

This is, unfortunately, *not* what the documentation you've cited
says. I expect that that's an oversight more than intentional
behavior. Maybe tomorrow we can figure out how to submit changes to
the spec and test suite?

(And thanks to Chad for writing a significantly better quality test
than what I found from Khronos)


Re: [Mesa-dev] [PATCH] texobj: Check completeness with InternalFormat rather than Mesa format

2016-01-13 Thread Anuj Phogat
On Wed, Jan 13, 2016 at 11:28 AM, Neil Roberts  wrote:
> The internal Mesa format used for a texture might not match the one
> requested in the internalFormat when the texture was created, for
> example if the driver is internally remapping RGB textures to RGBA.
> Otherwise it can cause false positives for completeness if one mipmap
> image is created as RGBA and the other as RGB because they would both
> have an RGBA Mesa format. If we check the InternalFormat instead then
> we are directly checking the API usage which I think better matches
> the intention of the check.
>
> https://bugs.freedesktop.org/show_bug.cgi?id=93700
> ---
>  src/mesa/main/texobj.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/texobj.c b/src/mesa/main/texobj.c
> index 547055e..b107a8f 100644
> --- a/src/mesa/main/texobj.c
> +++ b/src/mesa/main/texobj.c
> @@ -835,7 +835,7 @@ _mesa_test_texobj_completeness( const struct gl_context 
> *ctx,
>incomplete(t, MIPMAP, "TexImage[%d] is missing", i);
>return;
> }
> -   if (img->TexFormat != baseImage->TexFormat) {
> +   if (img->InternalFormat != baseImage->InternalFormat) {
>incomplete(t, MIPMAP, "Format[i] != Format[baseLevel]");
>return;
> }
> --
> 2.5.0
>

LGTM.
Reviewed-by: Anuj Phogat 
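
For anyone following along, a small sketch of the false positive being fixed. The types are hypothetical stand-ins (the real fields live in struct gl_texture_image), and it assumes a driver that stores both RGB and RGBA textures in one RGBA hardware format:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the API-level and driver-level formats. */
enum api_format { API_RGB, API_RGBA };   /* what the application requested */
enum hw_format { HW_RGBA8888 };          /* what the driver allocated */

struct tex_image {
   enum api_format InternalFormat;
   enum hw_format TexFormat;
};

/* Old check: compares driver formats, so RGB stored as RGBA looks consistent. */
static bool
formats_match_old(const struct tex_image *img, const struct tex_image *base)
{
   assert(img != NULL && base != NULL);
   return img->TexFormat == base->TexFormat;
}

/* New check: compares the formats the API caller actually asked for. */
static bool
formats_match_new(const struct tex_image *img, const struct tex_image *base)
{
   assert(img != NULL && base != NULL);
   return img->InternalFormat == base->InternalFormat;
}
```

An RGBA base level with an RGB level 1 passes the old check and fails the new one, which is the behaviour the patch is after.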


[Mesa-dev] [PATCH 6/7] i965/vec4/gs: Stop munging the ATTR containing gl_PointSize.

2016-01-13 Thread Kenneth Graunke
gl_PointSize is delivered in the .w component of the VUE header, while
the language expects it to be a float (and thus in the .x component).

Previously, we emitted MOVs to copy it over to the .x component.
But this is silly - we can just use a .wwww swizzle and access it
without copying anything or clobbering the value stored at .x
(which admittedly is useless).

Removes the last use of ATTR destinations.
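
As a toy model of the idea (hypothetical types, not the backend's src_reg or swizzle encoding): reading through a swizzle that replicates .w hands the consumer the point size in .x without writing to the source register.

```c
#include <assert.h>

/* Hypothetical vec4 register and swizzle; not the i965 backend's types. */
struct vec4 { float c[4]; };
struct swizzle { unsigned char chan[4]; };   /* source channel for x, y, z, w */

static const struct swizzle SWZ_XYZW = { { 0, 1, 2, 3 } };
static const struct swizzle SWZ_WWWW = { { 3, 3, 3, 3 } };

/* Return the value a consumer sees when reading src through a swizzle. */
static struct vec4
read_swizzled(const struct vec4 *src, struct swizzle swz)
{
   struct vec4 out;
   for (int i = 0; i < 4; i++) {
      assert(swz.chan[i] < 4);
      out.c[i] = src->c[swz.chan[i]];
   }
   return out;
}
```

The source register is only read, so nothing has to be an instruction destination.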

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp |  4 
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 23 ---
 2 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp
index 6f66978..90aa54e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp
@@ -73,6 +73,10 @@ vec4_gs_visitor::nir_emit_intrinsic(nir_intrinsic_instr 
*instr)
   src = src_reg(ATTR, BRW_VARYING_SLOT_COUNT * vertex->u[0] +
   instr->const_index[0] + offset->u[0],
 type);
+  /* gl_PointSize is passed in the .w component of the VUE header */
+  if (instr->const_index[0] == VARYING_SLOT_PSIZ)
+ src.swizzle = BRW_SWIZZLE_WWWW;
+
   dest = get_nir_dest(instr->dest, src.type);
   dest.writemask = brw_writemask_for_size(instr->num_components);
   emit(MOV(dest, src));
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index b13d36e..374b1a7 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -182,29 +182,6 @@ vec4_gs_visitor::emit_prolog()
   }
}
 
-   /* If the geometry shader uses the gl_PointSize input, we need to fix it up
-* to account for the fact that the vertex shader stored it in the w
-* component of VARYING_SLOT_PSIZ.
-*/
-   if (nir->info.inputs_read & VARYING_BIT_PSIZ) {
-  this->current_annotation = "swizzle gl_PointSize input";
-  for (int vertex = 0; vertex < (int)nir->info.gs.vertices_in; vertex++) {
- dst_reg dst(ATTR,
- BRW_VARYING_SLOT_COUNT * vertex + VARYING_SLOT_PSIZ);
- dst.type = BRW_REGISTER_TYPE_F;
- src_reg src(dst);
- dst.writemask = WRITEMASK_X;
- src.swizzle = BRW_SWIZZLE_WWWW;
- inst = emit(MOV(dst, src));
-
- /* In dual instanced dispatch mode, dst has a width of 4, so we need
-  * to make sure the MOV happens regardless of which channels are
-  * enabled.
-  */
- inst->force_writemask_all = true;
-  }
-   }
-
this->current_annotation = NULL;
 }
 
-- 
2.7.0



[Mesa-dev] [PATCH 4/7] i965: Apply add_const_offset_to_base for vec4 VS inputs too.

2016-01-13 Thread Kenneth Graunke
This shouldn't hurt anything, and I'm about to introduce a pass that
will want it.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_nir.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index 55ba732..935529a 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -220,6 +220,11 @@ brw_nir_lower_inputs(nir_shader *nir,
*/
   nir_lower_io(nir, nir_var_shader_in, type_size_vec4);
 
+  /* This pass needs actual constants */
+  nir_opt_constant_folding(nir);
+
+  add_const_offset_to_base(nir, nir_var_shader_in);
+
   if (is_scalar) {
  /* Finally, translate VERT_ATTRIB_* values into the actual registers.
   *
@@ -229,11 +234,6 @@ brw_nir_lower_inputs(nir_shader *nir,
   */
  GLbitfield64 inputs_read = nir->info.inputs_read;
 
- /* This pass needs actual constants */
- nir_opt_constant_folding(nir);
-
- add_const_offset_to_base(nir, nir_var_shader_in);
-
  nir_foreach_function(nir, function) {
 if (function->impl) {
nir_foreach_block(function->impl, remap_vs_attrs, &inputs_read);
-- 
2.7.0



[Mesa-dev] [PATCH 1/7] nir/builder: Add a nir_build_ivec4() convenience helper.

2016-01-13 Thread Kenneth Graunke
nir_build_ivec4 is more readable and succinct than using nir_build_imm
directly, even if you have C99.

Signed-off-by: Kenneth Graunke 
---
 src/glsl/nir/nir_builder.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/glsl/nir/nir_builder.h b/src/glsl/nir/nir_builder.h
index cfaaf8e..88ba3a1 100644
--- a/src/glsl/nir/nir_builder.h
+++ b/src/glsl/nir/nir_builder.h
@@ -121,6 +121,20 @@ nir_imm_int(nir_builder *build, int x)
 }
 
 static inline nir_ssa_def *
+nir_imm_ivec4(nir_builder *build, int x, int y, int z, int w)
+{
+   nir_const_value v;
+
+   memset(&v, 0, sizeof(v));
+   v.i[0] = x;
+   v.i[1] = y;
+   v.i[2] = z;
+   v.i[3] = w;
+
+   return nir_build_imm(build, 4, v);
+}
+
+static inline nir_ssa_def *
 nir_build_alu(nir_builder *build, nir_op op, nir_ssa_def *src0,
   nir_ssa_def *src1, nir_ssa_def *src2, nir_ssa_def *src3)
 {
-- 
2.7.0



[Mesa-dev] [PATCH 7/7] i965/vec4: Drop support for ATTR as an instruction destination.

2016-01-13 Thread Kenneth Graunke
This is no longer necessary...and it doesn't make much sense to
have inputs as destinations.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 16 
 1 file changed, 16 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index d2c27ff..4b3f2af 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1522,22 +1522,6 @@ vec4_visitor::lower_attributes_to_hw_regs(const int 
*attribute_map,
   bool interleaved)
 {
foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
-  /* We have to support ATTR as a destination for GL_FIXED fixup. */
-  if (inst->dst.file == ATTR) {
- int grf = attribute_map[inst->dst.nr + inst->dst.reg_offset];
-
- /* All attributes used in the shader need to have been assigned a
-  * hardware register by the caller
-  */
- assert(grf != 0);
-
-struct brw_reg reg = attribute_to_hw_reg(grf, interleaved);
-reg.type = inst->dst.type;
-reg.writemask = inst->dst.writemask;
-
- inst->dst = reg;
-  }
-
   for (int i = 0; i < 3; i++) {
 if (inst->src[i].file != ATTR)
continue;
-- 
2.7.0



[Mesa-dev] [PATCH 3/7] i965: Make add_const_offset_to_base() work at the shader level.

2016-01-13 Thread Kenneth Graunke
This makes it a pass, hiding the parameter structs and block callbacks
so it's simpler to work with.
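
The shape of the refactor, as a standalone sketch with made-up types (`shader`, `function`, `block`, `fold_params` are hypothetical and much simpler than the NIR equivalents): the per-block callback and its closure struct stay private, and callers see a single shader-level entry point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified IR. */
struct block { int base; int const_offset; };
struct function { struct block *blocks; size_t num_blocks; }; /* NULL blocks: no impl */
struct shader { struct function *functions; size_t num_functions; };

/* Closure state for the block callback; callers never see this. */
struct fold_params { int blocks_visited; };

/* Per-block worker: fold the constant offset into the base. */
static bool
fold_block(struct block *block, void *closure)
{
   struct fold_params *params = closure;
   block->base += block->const_offset;
   block->const_offset = 0;
   params->blocks_visited++;
   return true;
}

/* Shader-level pass: walks every implemented function and block. */
static int
fold_offsets(struct shader *shader)
{
   struct fold_params params = { 0 };

   assert(shader != NULL);
   for (size_t f = 0; f < shader->num_functions; f++) {
      struct function *func = &shader->functions[f];
      if (func->blocks == NULL)
         continue;
      for (size_t b = 0; b < func->num_blocks; b++)
         fold_block(&func->blocks[b], &params);
   }
   return params.blocks_visited;
}
```

Each call site then shrinks to one line, and the parameter struct no longer has to be declared in every caller.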

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_nir.c | 38 -
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index f8b258b..55ba732 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -60,7 +60,7 @@ struct add_const_offset_to_base_params {
 };
 
 static bool
-add_const_offset_to_base(nir_block *block, void *closure)
+add_const_offset_to_base_block(nir_block *block, void *closure)
 {
struct add_const_offset_to_base_params *params = closure;
nir_builder *b = &params->b;
@@ -85,7 +85,19 @@ add_const_offset_to_base(nir_block *block, void *closure)
   }
}
return true;
+}
+
+static void
+add_const_offset_to_base(nir_shader *nir, nir_variable_mode mode)
+{
+   struct add_const_offset_to_base_params params = { .mode = mode };
 
+   nir_foreach_function(nir, f) {
+  if (f->impl) {
+ nir_builder_init(&params.b, f->impl);
+ nir_foreach_block(f->impl, add_const_offset_to_base_block, &params);
+  }
+   }
 }
 
 static bool
@@ -195,10 +207,6 @@ brw_nir_lower_inputs(nir_shader *nir,
  const struct brw_device_info *devinfo,
  bool is_scalar)
 {
-   struct add_const_offset_to_base_params params = {
-  .mode = nir_var_shader_in
-   };
-
switch (nir->stage) {
case MESA_SHADER_VERTEX:
   /* Start with the location of the variable's base. */
@@ -224,10 +232,10 @@ brw_nir_lower_inputs(nir_shader *nir,
  /* This pass needs actual constants */
  nir_opt_constant_folding(nir);
 
+ add_const_offset_to_base(nir, nir_var_shader_in);
+
  nir_foreach_function(nir, function) {
 if (function->impl) {
-   nir_builder_init(&params.b, function->impl);
-   nir_foreach_block(function->impl, add_const_offset_to_base, &params);
nir_foreach_block(function->impl, remap_vs_attrs, &inputs_read);
 }
  }
@@ -270,10 +278,10 @@ brw_nir_lower_inputs(nir_shader *nir,
  /* This pass needs actual constants */
  nir_opt_constant_folding(nir);
 
+ add_const_offset_to_base(nir, nir_var_shader_in);
+
  nir_foreach_function(nir, function) {
 if (function->impl) {
-   nir_builder_init(&params.b, function->impl);
-   nir_foreach_block(function->impl, add_const_offset_to_base, &params);
nir_foreach_block(function->impl, remap_inputs_with_vue_map,
  _vue_map);
 }
@@ -296,10 +304,10 @@ brw_nir_lower_inputs(nir_shader *nir,
   /* This pass needs actual constants */
   nir_opt_constant_folding(nir);
 
+  add_const_offset_to_base(nir, nir_var_shader_in);
+
   nir_foreach_function(nir, function) {
  if (function->impl) {
-nir_builder_init(&params.b, function->impl);
-nir_foreach_block(function->impl, add_const_offset_to_base, &params);
 nir_builder_init(, function->impl);
 nir_foreach_block(function->impl, remap_patch_urb_offsets, );
  }
@@ -339,10 +347,6 @@ brw_nir_lower_outputs(nir_shader *nir,
   }
   break;
case MESA_SHADER_TESS_CTRL: {
-  struct add_const_offset_to_base_params params = {
- .mode = nir_var_shader_out
-  };
-
   struct remap_patch_urb_offsets_state state;
   brw_compute_tess_vue_map(_map, nir->info.outputs_written,
nir->info.patch_outputs_written);
@@ -356,10 +360,10 @@ brw_nir_lower_outputs(nir_shader *nir,
   /* This pass needs actual constants */
   nir_opt_constant_folding(nir);
 
+  add_const_offset_to_base(nir, nir_var_shader_out);
+
   nir_foreach_function(nir, function) {
  if (function->impl) {
-nir_builder_init(&params.b, function->impl);
-nir_foreach_block(function->impl, add_const_offset_to_base, &params);
 nir_builder_init(, function->impl);
 nir_foreach_block(function->impl, remap_patch_urb_offsets, );
  }
-- 
2.7.0



[Mesa-dev] [PATCH 5/7] i965: Apply VS attribute workarounds in NIR.

2016-01-13 Thread Kenneth Graunke
This patch re-implements the pre-Haswell VS attribute workarounds.
Instead of emitting shader code in the vec4 backend, we now simply
call a NIR pass to emit the necessary code.

This simplifies the vec4 backend.  Beyond deleting code, it removes
the primary use of ATTR as a destination.  It also eliminates the
requirement that the vec4 VS backend express the ATTR file in terms
of VERT_ATTRIB_* locations, giving us a bit more flexibility.

This approach is a little different: rather than munging the attributes
at the top, we emit code to fix them up when they're accessed.  However,
we run the optimizer afterwards, so CSE should eliminate the redundant
math.  It may even be able to fuse it with other calculations based on
the input value.

shader-db does not handle non-default NOS settings, so I have no
statistics about this patch.

Note that the scalar backend does not implement VS attribute
workarounds, as they are unnecessary on hardware which allows SIMD8 VS.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/Makefile.sources |   1 +
 src/mesa/drivers/dri/i965/brw_nir.c|  19 ++-
 src/mesa/drivers/dri/i965/brw_nir.h|   7 +-
 .../dri/i965/brw_nir_attribute_workarounds.c   | 178 +
 src/mesa/drivers/dri/i965/brw_shader.cpp   |   2 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp |   3 +
 src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp |   2 +-
 src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp  | 109 -
 8 files changed, 204 insertions(+), 117 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_nir_attribute_workarounds.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 5aeeca5..c654f94 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -42,6 +42,7 @@ i965_compiler_FILES = \
brw_nir.h \
brw_nir.c \
brw_nir_analyze_boolean_resolves.c \
+   brw_nir_attribute_workarounds.c \
brw_nir_opt_peephole_ffma.c \
brw_nir_uniforms.cpp \
brw_packed_float.c \
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index 935529a..cdecc3d 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -205,7 +205,9 @@ remap_patch_urb_offsets(nir_block *block, void *closure)
 static void
 brw_nir_lower_inputs(nir_shader *nir,
  const struct brw_device_info *devinfo,
- bool is_scalar)
+ bool is_scalar,
+ bool use_legacy_snorm_formula,
+ const uint8_t *vs_attrib_wa_flags)
 {
switch (nir->stage) {
case MESA_SHADER_VERTEX:
@@ -225,6 +227,9 @@ brw_nir_lower_inputs(nir_shader *nir,
 
   add_const_offset_to_base(nir, nir_var_shader_in);
 
+  brw_nir_apply_attribute_workarounds(nir, use_legacy_snorm_formula,
+  vs_attrib_wa_flags);
+
   if (is_scalar) {
  /* Finally, translate VERT_ATTRIB_* values into the actual registers.
   *
@@ -497,12 +502,15 @@ brw_preprocess_nir(nir_shader *nir, bool is_scalar)
 nir_shader *
 brw_nir_lower_io(nir_shader *nir,
  const struct brw_device_info *devinfo,
- bool is_scalar)
+ bool is_scalar,
+ bool use_legacy_snorm_formula,
+ const uint8_t *vs_attrib_wa_flags)
 {
bool progress; /* Written by OPT and OPT_V */
(void)progress;
 
-   OPT_V(brw_nir_lower_inputs, devinfo, is_scalar);
+   OPT_V(brw_nir_lower_inputs, devinfo, is_scalar,
+ use_legacy_snorm_formula, vs_attrib_wa_flags);
OPT_V(brw_nir_lower_outputs, devinfo, is_scalar);
OPT_V(nir_lower_io, nir_var_all, is_scalar ? type_size_scalar : 
type_size_vec4);
 
@@ -613,9 +621,10 @@ brw_create_nir(struct brw_context *brw,
   OPT_V(nir_lower_atomics, shader_prog);
}
 
-   if (nir->stage != MESA_SHADER_TESS_CTRL &&
+   if (nir->stage != MESA_SHADER_VERTEX &&
+   nir->stage != MESA_SHADER_TESS_CTRL &&
nir->stage != MESA_SHADER_TESS_EVAL) {
-  nir = brw_nir_lower_io(nir, devinfo, is_scalar);
+  nir = brw_nir_lower_io(nir, devinfo, is_scalar, false, NULL);
}
 
return nir;
diff --git a/src/mesa/drivers/dri/i965/brw_nir.h 
b/src/mesa/drivers/dri/i965/brw_nir.h
index 78b139b..5bfe40f 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.h
+++ b/src/mesa/drivers/dri/i965/brw_nir.h
@@ -84,11 +84,16 @@ nir_shader *brw_create_nir(struct brw_context *brw,
 nir_shader *brw_preprocess_nir(nir_shader *nir, bool is_scalar);
 nir_shader *brw_nir_lower_io(nir_shader *nir,
 const struct brw_device_info *devinfo,
-bool is_scalar);
+bool is_scalar,
+bool use_legacy_snorm_formula,
+  

[Mesa-dev] [PATCH 2/7] i965: Make an is_scalar boolean in brw_compile_vs().

2016-01-13 Thread Kenneth Graunke
Shorter than compiler->scalar_stage[MESA_SHADER_VERTEX], which can
help with line-wrapping.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index c6a52c5..ca27066 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1988,11 +1988,11 @@ brw_compile_vs(const struct brw_compiler *compiler, 
void *log_data,
unsigned *final_assembly_size,
char **error_str)
 {
+   const bool is_scalar = compiler->scalar_stage[MESA_SHADER_VERTEX];
nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
shader = brw_nir_apply_sampler_key(shader, compiler->devinfo, >tex,
-  
compiler->scalar_stage[MESA_SHADER_VERTEX]);
-   shader = brw_postprocess_nir(shader, compiler->devinfo,
-compiler->scalar_stage[MESA_SHADER_VERTEX]);
+  is_scalar);
+   shader = brw_postprocess_nir(shader, compiler->devinfo, is_scalar);
 
const unsigned *assembly = NULL;
 
@@ -2018,7 +2018,7 @@ brw_compile_vs(const struct brw_compiler *compiler, void 
*log_data,
 * Read Length" as 1 in vec4 mode, and 0 in SIMD8 mode.  Empirically, in
 * vec4 mode, the hardware appears to wedge unless we read something.
 */
-   if (compiler->scalar_stage[MESA_SHADER_VERTEX])
+   if (is_scalar)
   prog_data->base.urb_read_length = DIV_ROUND_UP(nr_attributes, 2);
else
   prog_data->base.urb_read_length = DIV_ROUND_UP(MAX2(nr_attributes, 1), 
2);
@@ -2037,7 +2037,7 @@ brw_compile_vs(const struct brw_compiler *compiler, void 
*log_data,
else
   prog_data->base.urb_entry_size = DIV_ROUND_UP(vue_entries, 4);
 
-   if (compiler->scalar_stage[MESA_SHADER_VERTEX]) {
+   if (is_scalar) {
   prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
 
   fs_visitor v(compiler, log_data, mem_ctx, key, _data->base.base,
-- 
2.7.0



Re: [Mesa-dev] [PATCH 1/2] nir: Handle <bits>=32 case in bitfield_insert lowering.

2016-01-13 Thread Connor Abbott
Both are

Reviewed-by: Connor Abbott 

On Wed, Jan 13, 2016 at 2:25 PM, Matt Turner  wrote:
> The OpenGL specification for bitfieldInsert() says:
>
>    The result will be undefined if <offset> or <bits> is negative, or if
>    the sum of <offset> and <bits> is greater than the number of bits
>    used to store the operand.
>
> Therefore passing bits=32, offset=0 is legal and defined in GLSL.
>
> But the earlier SM5 bfi opcode is specified to accept a bitfield width
> ranging from 0-31. As such, Intel and AMD instructions read only the low
> 5 bits of the width operand, making them not able to implement the
> GLSL-specified behavior directly.
>
> This commit fixes the lowering of bitfield_insert to handle the trivial
> case of <bits> = 32 as
>
>bitfieldInsert:
>   bits > 31 ? insert : bfi(bfm(bits, offset), insert, base)
>
> Fixes:
>ES31-CTS.shader_bitfield_operation.bitfieldInsert.uint_2
>ES31-CTS.shader_bitfield_operation.bitfieldInsert.uvec4_3
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
> ---
> These two patches replace 8/9 and 9/9 of the previous series.
> The first 7 patches from it have been reviewed and committed.
>
>  src/glsl/nir/nir_opcodes.py   | 1 +
>  src/glsl/nir/nir_opt_algebraic.py | 6 +-
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py
> index 1c65def..3e43438 100644
> --- a/src/glsl/nir/nir_opcodes.py
> +++ b/src/glsl/nir/nir_opcodes.py
> @@ -558,6 +558,7 @@ triop("fcsel", tfloat, "(src0 != 0.0f) ? src1 : src2")
>  opcode("bcsel", 0, tuint, [0, 0, 0],
>[tbool, tuint, tuint], "", "src0 ? src1 : src2")
>
> +# SM5 bfi assembly
>  triop("bfi", tuint, """
>  unsigned mask = src0, insert = src1, base = src2;
>  if (mask == 0) {
> diff --git a/src/glsl/nir/nir_opt_algebraic.py 
> b/src/glsl/nir/nir_opt_algebraic.py
> index 1eb044a..0d31e39 100644
> --- a/src/glsl/nir/nir_opt_algebraic.py
> +++ b/src/glsl/nir/nir_opt_algebraic.py
> @@ -225,9 +225,13 @@ optimizations = [
>
> # Misc. lowering
> (('fmod', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b, 
> 'options->lower_fmod'),
> -   (('bitfield_insert', a, b, c, d), ('bfi', ('bfm', d, c), b, a), 
> 'options->lower_bitfield_insert'),
> (('uadd_carry', a, b), ('b2i', ('ult', ('iadd', a, b), a)), 
> 'options->lower_uadd_carry'),
> (('usub_borrow', a, b), ('b2i', ('ult', a, b)), 
> 'options->lower_usub_borrow'),
> +
> +   (('bitfield_insert', 'base', 'insert', 'offset', 'bits'),
> +('bcsel', ('ilt', 31, 'bits'), 'insert',
> +  ('bfi', ('bfm', 'bits', 'offset'), 'insert', 'base')),
> +'options->lower_bitfield_insert'),
>  ]
>
>  # Add optimizations to handle the case where the result of a ternary is
> --
> 2.4.9
>
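
The lowering in the quoted patch can be modelled in a few lines of Python.
This is only a sketch: bfm() and bfi() here are simplified stand-ins that
assume the hardware behaviour described in the commit message (only the low
5 bits of the width and offset are read), and the *_old / *_new helper names
are invented for the comparison.

```python
MASK32 = 0xFFFFFFFF


def bfm(bits, offset):
    """Build a mask of `bits` ones at `offset`, reading only the low 5 bits of each."""
    bits &= 0x1F
    offset &= 0x1F
    return (((1 << bits) - 1) << offset) & MASK32


def bfi(mask, insert, base):
    """Shift `insert` up to the mask's lowest set bit and merge it into `base`."""
    if mask == 0:
        return base
    shift = (mask & -mask).bit_length() - 1  # index of the lowest set bit
    return ((base & ~mask) | ((insert << shift) & mask)) & MASK32


def bitfield_insert_old(base, insert, offset, bits):
    """Lowering before the patch: bits == 32 wraps to a zero-width mask."""
    return bfi(bfm(bits, offset), insert, base)


def bitfield_insert_new(base, insert, offset, bits):
    """Lowering after the patch: bits > 31 selects `insert` directly."""
    if bits > 31:
        return insert & MASK32
    return bfi(bfm(bits, offset), insert, base)


print(hex(bitfield_insert_old(0x12345678, 0xDEADBEEF, 0, 32)))
print(hex(bitfield_insert_new(0x12345678, 0xDEADBEEF, 0, 32)))
```

With bits = 32 and offset = 0 the old form leaves base untouched because the
5-bit width wraps to zero, while the new form returns insert, which is the
GLSL-defined result.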


Re: [Mesa-dev] [RFC] nir: const_index sanity

2016-01-13 Thread Jason Ekstrand
On Jan 13, 2016 4:03 PM, "Rob Clark"  wrote:
>
> From: Rob Clark 
>
> ---
> An idea for how to bring some sanity to the wild-west of intrinsic
> const_index[] usage.  Also w/ nir_print support, which could be
> split into other patch, but makes the nir_print output a bit nicer:
>
>   intrinsic store_output (ssa_210, ssa_66) () (0, 15) /* base=0 wrmask=xyzw */
>
> (and already made me realize that ttn was neglecting to set wrmask on
> store_output's)
>
> Probably I'd add "setter" functions too, and then in follow-on patches,
> update the gazillion places where const_index[] access is open-coded.
>
> But first, before big conflicty changes like that, I figured I'd see what
> others thought.  The other variation of the idea is to simply drop the
> const_index[] field and replace it w/ 'unsigned wrmask' and 'int base'.
> Although that would be a bigger, more flag-day sort of patch.

We really need to do something here, and what you've done is a pretty clever
way to handle the problem.  I'll have to give it a bit more thought before
I wholeheartedly endorse it, but at first brush it looks pretty good.
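
As a rough model of the index_map idea in the quoted patch: each intrinsic
records, per flag, which const_index[] slot holds that value, stored as
slot + 1 so that 0 can mean "not present". The accessors below mirror
nir_intrinsic_base() and nir_intrinsic_write_mask() from the patch; how
index_map gets filled from idx0/idx1/idx2 is cut off in the quote, so
make_index_map() is purely an assumption.

```python
# Flag values as in the nir_intrinsic_index_flag enum of the quoted patch.
NIR_INTRINSIC_BASE = 1
NIR_INTRINSIC_WRMASK = 2
NIR_INTRINSIC_NUM_INDEX_FLAGS = 3


def make_index_map(*slot_flags):
    """Hypothetical builder: slot_flags[n] says what const_index[n] holds (0 = unused)."""
    index_map = [0] * NIR_INTRINSIC_NUM_INDEX_FLAGS
    for slot, flag in enumerate(slot_flags):
        if flag:
            index_map[flag] = slot + 1  # store slot + 1 so 0 means "absent"
    return index_map


def get_index(index_map, const_index, flag):
    """Look up the const_index slot for `flag`, as the inline accessors do."""
    slot_plus_one = index_map[flag]
    assert slot_plus_one > 0, "this intrinsic does not carry that index"
    return const_index[slot_plus_one - 1]


# store_output from the example print: const_index = (0, 15), base=0 wrmask=xyzw
store_output_map = make_index_map(NIR_INTRINSIC_BASE, NIR_INTRINSIC_WRMASK)
const_index = [0, 15, 0]
print(get_index(store_output_map, const_index, NIR_INTRINSIC_BASE))
print(get_index(store_output_map, const_index, NIR_INTRINSIC_WRMASK))
```

The point of the indirection is that callers ask for "the write mask" rather
than hard-coding const_index[1], so slots could be rearranged per intrinsic
without touching every user.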

A few minor comments below.

> BR,
> -R
>
>  src/glsl/nir/nir.h|  48 +++-
>  src/glsl/nir/nir_intrinsics.c |  11 ++-
>  src/glsl/nir/nir_intrinsics.h | 178
+-
>  src/glsl/nir/nir_print.c  |  30 ---
>  4 files changed, 166 insertions(+), 101 deletions(-)
>
> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> index bedcc0d..2235154 100644
> --- a/src/glsl/nir/nir.h
> +++ b/src/glsl/nir/nir.h
> @@ -786,7 +786,7 @@ typedef struct {
>  } nir_call_instr;
>
>  #define INTRINSIC(name, num_srcs, src_components, has_dest,
dest_components, \
> -  num_variables, num_indices, flags) \
> +  num_variables, num_indices, idx0, idx1, idx2, flags) \
> nir_intrinsic_##name,
>
>  #define LAST_INTRINSIC(name) nir_last_intrinsic = nir_intrinsic_##name,
> @@ -799,6 +799,8 @@ typedef enum {
>  #undef INTRINSIC
>  #undef LAST_INTRINSIC
>
> +#define NIR_INTRINSIC_MAX_CONST_INDEX 3
> +
>  /** Represents an intrinsic
>   *
>   * An intrinsic is an instruction type for handling things that are
> @@ -842,7 +844,7 @@ typedef struct {
>  */
> uint8_t num_components;
>
> -   int const_index[3];
> +   int const_index[NIR_INTRINSIC_MAX_CONST_INDEX];
>
> nir_deref_var *variables[2];
>
> @@ -871,6 +873,29 @@ typedef enum {
> NIR_INTRINSIC_CAN_REORDER = (1 << 1),
>  } nir_intrinsic_semantic_flag;
>
> +/**
> + * \name NIR intrinsics const-index flag
> + *
> + * Indicates the usage of a const_index slot.
> + *
> + * \sa nir_intrinsic_info::index_map
> + */
> +typedef enum {
> +   /**
> +* Generally instructions that take a offset src argument, can encode
> +* a constant 'base' value which is added to the offset.
> +*/
> +   NIR_INTRINSIC_BASE = 1,
> +
> +   /**
> +* For store instructions, a writemask for the store.
> +*/
> +   NIR_INTRINSIC_WRMASK = 2,
> +
> +   NIR_INTRINSIC_NUM_INDEX_FLAGS,
> +
> +} nir_intrinsic_index_flag;
> +
>  #define NIR_INTRINSIC_MAX_INPUTS 4
>
>  typedef struct {
> @@ -900,12 +925,31 @@ typedef struct {
> /** the number of constant indices used by the intrinsic */
> unsigned num_indices;
>
> +   /** indicates the usage of intr->const_index[n] */
> +   unsigned index_map[NIR_INTRINSIC_NUM_INDEX_FLAGS];
> +
> /** semantic flags for calls to this intrinsic */
> nir_intrinsic_semantic_flag flags;
>  } nir_intrinsic_info;
>
>  extern const nir_intrinsic_info nir_intrinsic_infos[nir_num_intrinsics];
>
> +static inline unsigned
> +nir_intrinsic_write_mask(nir_intrinsic_instr *instr)
> +{
> +   const nir_intrinsic_info *info = &nir_intrinsic_infos[instr->intrinsic];
> +   assert(info->index_map[NIR_INTRINSIC_WRMASK] > 0);
> +   return instr->const_index[info->index_map[NIR_INTRINSIC_WRMASK] - 1];
> +}
> +
> +static inline int
> +nir_intrinsic_base(nir_intrinsic_instr *instr)
> +{
> +   const nir_intrinsic_info *info = &nir_intrinsic_infos[instr->intrinsic];
> +   assert(info->index_map[NIR_INTRINSIC_BASE] > 0);
> +   return instr->const_index[info->index_map[NIR_INTRINSIC_BASE] - 1];
> +}
> +
>  /**
>   * \group texture information
>   *
> diff --git a/src/glsl/nir/nir_intrinsics.c b/src/glsl/nir/nir_intrinsics.c
> index a7c868c..7dddc70 100644
> --- a/src/glsl/nir/nir_intrinsics.c
> +++ b/src/glsl/nir/nir_intrinsics.c
> @@ -30,7 +30,8 @@
>  #define OPCODE(name) nir_intrinsic_##name
>
>  #define INTRINSIC(_name, _num_srcs, _src_components, _has_dest, \
> -  _dest_components, _num_variables, _num_indices,
_flags) \
> +  _dest_components, _num_variables, _num_indices, \
> +  idx0, idx1, idx2, _flags) \
>  { \
> .name = #_name, \
> .num_srcs = _num_srcs, \
> @@ -39,9 +40,15 @@
> .dest_components = _dest_components, \
> .num_variables = _num_variables, \
> .num_indices = 

Re: [Mesa-dev] [PATCH] radeonsi: don't print a warning for unhandled registers returned by LLVM

2016-01-13 Thread Michel Dänzer
On 13.01.2016 20:23, Marek Olšák wrote:
> On Wed, Jan 13, 2016 at 4:25 AM, Michel Dänzer  wrote:
>> On 13.01.2016 03:44, Marek Olšák wrote:
>>> From: Marek Olšák 
>>>
>>> We don't want apps to flood stderr. New LLVM + old Mesa is a perfectly
>>> valid combination (if it doesn't fail to build, of course).
>>
>> Actually it's not, in general.
> 
> Why not?

LLVM (at least outside of the C APIs it exposes) only guarantees
compatibility between minor releases of the same major release branch.
So, using Mesa with an SVN snapshot or major release of LLVM which is
newer than the Git snapshot or release of Mesa isn't guaranteed to work,
even if it happens to build.

I think this message should still be printed at least once, as an
indication that something might be wrong with the setup.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


[Mesa-dev] [PATCH] nv50/ir: only use FILE_LOCAL_MEMORY for temp arrays that use indirection

2016-01-13 Thread Ilia Mirkin
Previously we were treating any indirect temp array usage to mean that
everything should end up in lmem. The MemoryOpt pass would clean a lot
of that up later, but in the meanwhile we would lose a lot of
opportunity for optimization.

This helps a lot of Metro 2033 Redux and a handful of KSP shaders:

total instructions in shared programs : 6288373 -> 6261517 (-0.43%)
total gprs used in shared programs: 944051 -> 945131 (0.11%)
total local used in shared programs   : 54116 -> 54116 (0.00%)
total bytes used in shared programs   : 50306984 -> 50092136 (-0.43%)

A typical case is for register usage to double and for instructions to
halve. A future commit can also reduce local memory usage through better
packing.
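
The scanning scheme can be sketched as follows, assuming (as the subject line
suggests) that only temporaries belonging to an indirectly accessed array end
up in local memory. The tuple-based input format and the classify_temps()
name are invented for illustration; the real code works on TGSI declarations
and instructions.

```python
def classify_temps(declarations, indirect_array_ids):
    """Split temp indices into (local memory, registers).

    declarations: iterable of (first, last, array_id) ranges; array_id 0
    means the range is not a declared array.
    indirect_array_ids: array ids seen on indirectly addressed operands.
    """
    # Step 1: remember which array each temporary belongs to.
    temp_array_id = {}
    for first, last, array_id in declarations:
        for i in range(first, last + 1):
            temp_array_id[i] = array_id

    # Step 2: only arrays that were accessed indirectly need local memory.
    indirect = set(indirect_array_ids)
    in_lmem = sorted(i for i, a in temp_array_id.items() if a in indirect)
    in_regs = sorted(i for i, a in temp_array_id.items() if a not in indirect)
    return in_lmem, in_regs


decls = [(0, 3, 0), (4, 7, 1), (8, 11, 2)]
print(classify_temps(decls, [2]))
```

In this example only array 2 is addressed indirectly, so temps 8..11 go to
local memory while the plain temps and array 1 stay in registers.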

Signed-off-by: Ilia Mirkin 
---
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 65 +-
 1 file changed, 50 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 7e3b093..507749d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -96,6 +96,13 @@ public:
  return tgsi_util_get_src_register_swizzle(, chan);
   }
 
+  int getArrayId() const
+  {
+ if (isIndirect(0))
+return fsr->Indirect.ArrayID;
+ return 0;
+  }
+
   nv50_ir::Modifier getMod(int chan) const;
 
   SrcRegister getIndirect(int dim) const
@@ -155,6 +162,13 @@ public:
  return SrcRegister(fdr->Indirect);
   }
 
+  int getArrayId() const
+  {
+ if (isIndirect(0))
+return fdr->Indirect.ArrayID;
+ return 0;
+  }
+
private:
   const struct tgsi_dst_register reg;
   const struct tgsi_full_dst_register *fdr;
@@ -826,7 +840,8 @@ public:
// these registers are per-subroutine, cannot be used for parameter passing
std::set locals;
 
-   bool mainTempsInLMem;
+   std::set indirectTempArrays;
+   std::vector tempArrayId;
 
int clipVertexOutput;
 
@@ -859,8 +874,6 @@ Source::Source(struct nv50_ir_prog_info *prog) : info(prog)
 
if (prog->dbgFlags & NV50_IR_DEBUG_BASIC)
   tgsi_dump(tokens, 0);
-
-   mainTempsInLMem = false;
 }
 
 Source::~Source()
@@ -890,6 +903,7 @@ bool Source::scanSource()
 
textureViews.resize(scan.file_max[TGSI_FILE_SAMPLER_VIEW] + 1);
resources.resize(scan.file_max[TGSI_FILE_IMAGE] + 1);
+   tempArrayId.resize(scan.file_max[TGSI_FILE_TEMPORARY] + 1);
 
info->immd.bufSize = 0;
 
@@ -935,7 +949,8 @@ bool Source::scanSource()
}
tgsi_parse_free();
 
-   if (mainTempsInLMem)
+   // TODO: Compute based on relevant array sizes
+   if (indirectTempArrays.size())
   info->bin.tlsSpace += (scan.file_max[TGSI_FILE_TEMPORARY] + 1) * 16;
 
if (info->io.genUserClip > 0) {
@@ -1046,6 +1061,7 @@ bool Source::scanDeclaration(const struct 
tgsi_full_declaration *decl)
unsigned sn = TGSI_SEMANTIC_GENERIC;
unsigned si = 0;
const unsigned first = decl->Range.First, last = decl->Range.Last;
+   const int arrayId = decl->Array.ArrayID;
 
if (decl->Declaration.Semantic) {
   sn = decl->Semantic.Name;
@@ -1189,8 +1205,11 @@ bool Source::scanDeclaration(const struct 
tgsi_full_declaration *decl)
   for (i = first; i <= last; ++i)
  textureViews[i].target = decl->SamplerView.Resource;
   break;
-   case TGSI_FILE_NULL:
case TGSI_FILE_TEMPORARY:
+  for (i = first; i <= last; ++i)
+ tempArrayId[i] = arrayId;
+  break;
+   case TGSI_FILE_NULL:
case TGSI_FILE_ADDRESS:
case TGSI_FILE_CONSTANT:
case TGSI_FILE_IMMEDIATE:
@@ -1241,7 +1260,7 @@ bool Source::scanInstruction(const struct 
tgsi_full_instruction *inst)
   } else
   if (insn.getDst(0).getFile() == TGSI_FILE_TEMPORARY) {
  if (insn.getDst(0).isIndirect(0))
-mainTempsInLMem = true;
+indirectTempArrays.insert(insn.getDst(0).getArrayId());
   } else
   if (insn.getDst(0).getFile() == TGSI_FILE_BUFFER) {
  info->io.globalAccess |= 0x2;
@@ -1252,7 +1271,7 @@ bool Source::scanInstruction(const struct 
tgsi_full_instruction *inst)
   Instruction::SrcRegister src = insn.getSrc(s);
   if (src.getFile() == TGSI_FILE_TEMPORARY) {
  if (src.isIndirect(0))
-mainTempsInLMem = true;
+indirectTempArrays.insert(src.getArrayId());
   } else
   if (src.getFile() == TGSI_FILE_BUFFER) {
  info->io.globalAccess |= (insn.getOpcode() == TGSI_OPCODE_LOAD) ?
@@ -1434,6 +1453,7 @@ private:
DataType srcTy;
 
DataArray tData; // TGSI_FILE_TEMPORARY
+   DataArray lData; // TGSI_FILE_TEMPORARY, for indirect arrays
DataArray aData; // TGSI_FILE_ADDRESS
DataArray pData; // TGSI_FILE_PREDICATE
DataArray oData; // TGSI_FILE_OUTPUT (if outputs in registers)
@@ -1637,7 +1657,7 @@ 

Re: [Mesa-dev] [PATCH] radeonsi: enable late VS export memory allocation

2016-01-13 Thread Axel Davy

>
> Axel Davy benchmarked this briefly. We may need more benchmarks though.
>
> Marek
>

I confirm that setting this register gains a few % in Heaven.

There was also another register, for killing color exports early during a
depth-only pass, that helped by a few % (but less).

Axel



[Mesa-dev] [Bug 93686] Performance improvement: Please consider hardware ɢᴘᴜ rendering in llvmpipe

2016-01-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=93686

--- Comment #2 from Roland Scheidegger  ---
I'm not sure whether this exact proposal has come up before. We have seen
some people, though, asking whether we could combine llvmpipe with less capable
GPUs to make a driver offering more features, that is, executing with llvmpipe
whatever the GPU can't do (but no, we really can't in any meaningful way).
This proposal sounds even more ambitious in some ways, and I certainly agree we
can't make it happen. With Vulkan, it may be the developer's choice, if multiple
GPUs are available, which one to use for what, so theoretically there might be
some way to make something like that happen, but I really have no idea there
(plus, unless you're looking at something like an at least 5-year-old
low-end GPU vs. a current high-end 8-core CPU, there would still be no benefit
even if that could be made to work). There is one thing llvmpipe is "reasonably
good" at compared to GPUs, which is shader arithmetic (at least for pixel
shaders; vertex shaders do not run in parallel, and there are tons of gotchas,
as we don't currently even optimize empty branches away), but there's just no
way to separate that out.



[Mesa-dev] [PATCH] android: enable building static version of libdrm

2016-01-13 Thread Rob Herring
From: Sumit Semwal 

Android needs libdrm built statically for recovery;
enable that as well.

Signed-off-by: Sumit Semwal 
Signed-off-by: Rob Herring 
Cc: Chih-Wei Huang 
Cc: Emil Velikov 
---
 Android.mk | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/Android.mk b/Android.mk
index 90cdcb3..1d8cd65 100644
--- a/Android.mk
+++ b/Android.mk
@@ -27,6 +27,8 @@ include $(CLEAR_VARS)
 # Import variables LIBDRM_{,H_,INCLUDE_H_,INCLUDE_VMWGFX_H_}FILES
 include $(LOCAL_PATH)/Makefile.sources
 
+#static library for the device (recovery)
+include $(CLEAR_VARS)
 LOCAL_MODULE := libdrm
 LOCAL_MODULE_TAGS := optional
 
@@ -41,7 +43,24 @@ LOCAL_C_INCLUDES := \
 LOCAL_CFLAGS := \
-DHAVE_VISIBILITY=1 \
-DHAVE_LIBDRM_ATOMIC_PRIMITIVES=1
+include $(BUILD_STATIC_LIBRARY)
+
+# Shared library for the device
+include $(CLEAR_VARS)
+LOCAL_MODULE := libdrm
+LOCAL_MODULE_TAGS := optional
 
+LOCAL_SRC_FILES := $(LIBDRM_FILES)
+LOCAL_EXPORT_C_INCLUDE_DIRS := \
+$(LOCAL_PATH) \
+$(LOCAL_PATH)/include/drm
+
+LOCAL_C_INCLUDES := \
+$(LOCAL_PATH)/include/drm
+
+LOCAL_CFLAGS := \
+-DHAVE_VISIBILITY=1 \
+-DHAVE_LIBDRM_ATOMIC_PRIMITIVES=1
 include $(BUILD_SHARED_LIBRARY)
 
 include $(call all-makefiles-under,$(LOCAL_PATH))
-- 
2.5.0



Re: [Mesa-dev] [PATCH] gallivm: merge two identical LLVM version checks

2016-01-13 Thread Roland Scheidegger
Am 13.01.2016 um 05:41 schrieb Evangelos Foutras:
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp 
> b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> index 3ee708f..b119a93 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
> @@ -515,12 +515,9 @@ 
> lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
> MAttrs.push_back(util_cpu_caps.has_ssse3  ? "+ssse3"  : "-ssse3" );
>  #if HAVE_LLVM >= 0x0304
> MAttrs.push_back(util_cpu_caps.has_sse4_1 ? "+sse4.1" : "-sse4.1");
> -#else
> -   MAttrs.push_back(util_cpu_caps.has_sse4_1 ? "+sse41"  : "-sse41" );
> -#endif
> -#if HAVE_LLVM >= 0x0304
> MAttrs.push_back(util_cpu_caps.has_sse4_2 ? "+sse4.2" : "-sse4.2");
>  #else
> +   MAttrs.push_back(util_cpu_caps.has_sse4_1 ? "+sse41"  : "-sse41" );
> MAttrs.push_back(util_cpu_caps.has_sse4_2 ? "+sse42"  : "-sse42" );
>  #endif
> /*
> 

Reviewed-by: Roland Scheidegger 


Re: [Mesa-dev] Where to find MAPI_TABLE_NUM_STATIC & MAPI_TABLE_NUM_DYNAMIC

2016-01-13 Thread Brian Paul


Looks like both symbols are defined in 
src/mapi/shared-glapi/glapi_mapi_tmp.h which is generated at compile 
time by the src/mapi/mapi_abi.py script.


-Brian


On 01/13/2016 08:40 AM, Jouk Jansen wrote:

Hi all,

I'm trying to create a fresh compilation for my OpenVMS system using the
sources I extracted using git today.
At some point the compilation fails because MAPI_TABLE_NUM_STATIC and
MAPI_TABLE_NUM_DYNAMIC are not defined. In a version compiled some time ago I
found the definitions in the file mapi/vgapi/vgapi_tmp.h, a file generated
during the compilation. However, that directory is now obsolete. It seems that
mapi/glapi/glapi_mapi_tmp.h is the replacement, but that file contains
neither MAPI_TABLE_NUM_STATIC nor MAPI_TABLE_NUM_DYNAMIC.
Where am I supposed to find the definitions?

Regards
Jouk



Pax, vel iniusta, utilior est quam iustissimum bellum.
 (free after Marcus Tullius Cicero (106 b.Chr.-46 b.Chr.)
  Epistularum ad Atticum 7.1.4.3)


Touch not the cat bot a glove


--<


   Jouk Jansen

   jo...@hrem.nano.tudelft.nl

   Technische Universiteit Delft
   Kavli Institute of Nanoscience
   Nationaal centrum voor HREM
   Lorentzweg 1
   2628 CJ Delft
   Nederland
   tel. 31-15-2782272


--<




Re: [Mesa-dev] [PATCH 8/8] gallium/radeon: do not reallocate user memory buffers

2016-01-13 Thread Marek Olšák
Patches 3-8:

Reviewed-by: Marek Olšák 

Marek
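
A minimal sketch of the control flow in the quoted patch: invalidation is
refused for buffers created from a user pointer, and the map path only treats
the buffer as idle when invalidation actually happened. The Buffer class, the
flag values and the "reallocated" stand-in for swapping in fresh storage are
all assumptions made for illustration, not the driver's real data structures.

```python
# Hypothetical flag values; the real PIPE_TRANSFER_* constants differ.
TRANSFER_WRITE = 1 << 0
TRANSFER_DISCARD_WHOLE = 1 << 1
TRANSFER_UNSYNCHRONIZED = 1 << 2


class Buffer:
    def __init__(self, is_user_ptr=False, busy=False):
        self.is_user_ptr = is_user_ptr
        self.busy = busy
        self.reallocated = False


def do_invalidate(buf):
    """Return True only if the buffer is known to be idle afterwards."""
    if buf.is_user_ptr:
        # Keep the link between the buffer object and the user memory.
        return False
    if buf.busy:
        # Stand-in for giving the buffer fresh storage so mapping need not wait.
        buf.reallocated = True
        buf.busy = False
    return True


def map_usage(buf, usage):
    """Adjust transfer flags for a whole-buffer discard map."""
    if usage & TRANSFER_DISCARD_WHOLE and not usage & TRANSFER_UNSYNCHRONIZED:
        if do_invalidate(buf):
            usage |= TRANSFER_UNSYNCHRONIZED
    return usage


pinned = Buffer(is_user_ptr=True, busy=True)
normal = Buffer(busy=True)
flags = TRANSFER_WRITE | TRANSFER_DISCARD_WHOLE
print(bool(map_usage(pinned, flags) & TRANSFER_UNSYNCHRONIZED))
print(bool(map_usage(normal, flags) & TRANSFER_UNSYNCHRONIZED))
```

For the pinned buffer the map stays synchronized and the storage is left
alone; for the ordinary buffer the storage is replaced and the map can
proceed without waiting.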

On Tue, Jan 12, 2016 at 5:06 PM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> The whole point of AMD_pinned_memory is that applications don't have to map
> buffers via OpenGL - but they're still allowed to, so make sure we don't break
> the link between buffer object and user memory unless explicitly instructed
> to.
> ---
>  src/gallium/drivers/radeon/r600_buffer_common.c | 31 
> ++---
>  src/gallium/drivers/radeon/radeon_winsys.h  |  8 +++
>  src/gallium/winsys/amdgpu/drm/amdgpu_bo.c   |  6 +
>  src/gallium/winsys/radeon/drm/radeon_drm_bo.c   |  6 +
>  4 files changed, 43 insertions(+), 8 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
> b/src/gallium/drivers/radeon/r600_buffer_common.c
> index 09755e0..6592c5b 100644
> --- a/src/gallium/drivers/radeon/r600_buffer_common.c
> +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
> @@ -209,11 +209,15 @@ static void r600_buffer_destroy(struct pipe_screen 
> *screen,
> FREE(rbuffer);
>  }
>
> -void r600_invalidate_resource(struct pipe_context *ctx,
> - struct pipe_resource *resource)
> +static bool
> +r600_do_invalidate_resource(struct r600_common_context *rctx,
> +   struct r600_resource *rbuffer)
>  {
> -   struct r600_common_context *rctx = (struct r600_common_context*)ctx;
> -struct r600_resource *rbuffer = r600_resource(resource);
> +   /* In AMD_pinned_memory, the user pointer association only gets
> +* broken when the buffer is explicitly re-allocated.
> +*/
> +   if (rctx->ws->buffer_is_user_ptr(rbuffer->buf))
> +   return false;
>
> /* Check if mapping this buffer would cause waiting for the GPU. */
> if (r600_rings_is_buffer_referenced(rctx, rbuffer->buf, 
> RADEON_USAGE_READWRITE) ||
> @@ -222,6 +226,17 @@ void r600_invalidate_resource(struct pipe_context *ctx,
> } else {
> util_range_set_empty(>valid_buffer_range);
> }
> +
> +   return true;
> +}
> +
> +void r600_invalidate_resource(struct pipe_context *ctx,
> + struct pipe_resource *resource)
> +{
> +   struct r600_common_context *rctx = (struct r600_common_context*)ctx;
> +   struct r600_resource *rbuffer = r600_resource(resource);
> +
> +   (void)r600_do_invalidate_resource(rctx, rbuffer);
>  }
>
>  static void *r600_buffer_get_transfer(struct pipe_context *ctx,
> @@ -291,10 +306,10 @@ static void *r600_buffer_transfer_map(struct 
> pipe_context *ctx,
> !(usage & PIPE_TRANSFER_UNSYNCHRONIZED)) {
> assert(usage & PIPE_TRANSFER_WRITE);
>
> -   r600_invalidate_resource(ctx, resource);
> -
> -   /* At this point, the buffer is always idle. */
> -   usage |= PIPE_TRANSFER_UNSYNCHRONIZED;
> +   if (r600_do_invalidate_resource(rctx, rbuffer)) {
> +   /* At this point, the buffer is always idle. */
> +   usage |= PIPE_TRANSFER_UNSYNCHRONIZED;
> +   }
> }
> else if ((usage & PIPE_TRANSFER_DISCARD_RANGE) &&
>  !(usage & PIPE_TRANSFER_UNSYNCHRONIZED) &&
> diff --git a/src/gallium/drivers/radeon/radeon_winsys.h 
> b/src/gallium/drivers/radeon/radeon_winsys.h
> index 4af6a18..ad30474 100644
> --- a/src/gallium/drivers/radeon/radeon_winsys.h
> +++ b/src/gallium/drivers/radeon/radeon_winsys.h
> @@ -530,6 +530,14 @@ struct radeon_winsys {
>   void *pointer, unsigned size);
>
>  /**
> + * Whether the buffer was created from a user pointer.
> + *
> + * \param buf   A winsys buffer object
> + * \return  whether \p buf was created via buffer_from_ptr
> + */
> +bool (*buffer_is_user_ptr)(struct pb_buffer *buf);
> +
> +/**
>   * Get a winsys handle from a winsys buffer. The internal structure
>   * of the handle is platform-specific and only a winsys should access it.
>   *
> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c 
> b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> index a844773..82c803b 100644
> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> @@ -686,6 +686,11 @@ error:
>  return NULL;
>  }
>
> +static bool amdgpu_bo_is_user_ptr(struct pb_buffer *buf)
> +{
> +   return ((struct amdgpu_winsys_bo*)buf)->user_ptr != NULL;
> +}
> +
>  static uint64_t amdgpu_bo_get_va(struct pb_buffer *buf)
>  {
> return ((struct amdgpu_winsys_bo*)buf)->va;
> @@ -701,6 +706,7 @@ void amdgpu_bo_init_functions(struct amdgpu_winsys *ws)
> ws->base.buffer_create = amdgpu_bo_create;
> ws->base.buffer_from_handle = amdgpu_bo_from_handle;
> ws->base.buffer_from_ptr = amdgpu_bo_from_ptr;
> +   ws->base.buffer_is_user_ptr = 

[Mesa-dev] Where to find MAPI_TABLE_NUM_STATIC & MAPI_TABLE_NUM_DYNAMIC

2016-01-13 Thread Jouk Jansen
Hi all,

I'm trying to create a fresh compilation for my OpenVMS system using the
sources I extracted using git today.
At some point the compilation fails because MAPI_TABLE_NUM_STATIC and
MAPI_TABLE_NUM_DYNAMIC are not defined. In a version compiled some time ago I
found the definitions in the file mapi/vgapi/vgapi_tmp.h, a file generated
during the compilation. However, that directory is now obsolete. It seems that
mapi/glapi/glapi_mapi_tmp.h is the replacement, but that file contains
neither MAPI_TABLE_NUM_STATIC nor MAPI_TABLE_NUM_DYNAMIC.
Where am I supposed to find the definitions?

   Regards
Jouk



Pax, vel iniusta, utilior est quam iustissimum bellum.
(free after Marcus Tullius Cicero (106 b.Chr.-46 b.Chr.)
 Epistularum ad Atticum 7.1.4.3)


   Touch not the cat bot a glove

>--<

  Jouk Jansen
 
  jo...@hrem.nano.tudelft.nl

  Technische Universiteit Delft
  Kavli Institute of Nanoscience
  Nationaal centrum voor HREM
  Lorentzweg 1
  2628 CJ Delft
  Nederland
  tel. 31-15-2782272

>--<



Re: [Mesa-dev] [PATCH 2/2] i965: Implement nir_op_fquantize2f16

2016-01-13 Thread Michael Schellenberger Costa
Hi Jason

Am 13/01/2016 um 00:35 schrieb Jason Ekstrand:
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 13 +
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 10 ++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 6213378..ffb8059 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -943,6 +943,19 @@ fs_visitor::nir_emit_alu(const fs_builder , 
> nir_alu_instr *instr)
>inst->saturate = instr->dest.saturate;
>break;
>  
> +   case nir_op_fquantize2f16: {
> +  fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_D);
> +
> +  /* The destination stride must be at least as big as the source 
> stride. */
> +  tmp.type = BRW_REGISTER_TYPE_W;
> +  tmp.stride = 2;
After a comment like 'at least as big' one would normally expect some
check to ensure that. Maybe add a "So set it to 2" or whatever.

-Michael
> +
> +  bld.emit(BRW_OPCODE_F32TO16, tmp, op[0]);
> +  inst = bld.emit(BRW_OPCODE_F16TO32, result, tmp);
> +  inst->saturate = instr->dest.saturate;
> +  break;
> +   }
> +
> case nir_op_fmin:
> case nir_op_imin:
> case nir_op_umin:
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> index 37f517d..77a2f8b 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> @@ -1177,6 +1177,16 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
>inst->saturate = instr->dest.saturate;
>break;
>  
> +   case nir_op_fquantize2f16: {
> +  /* See also vec4_visitor::emit_pack_half_2x16() */
> +  src_reg tmp = src_reg(this, glsl_type::uvec4_type);
> +
> +  emit(F32TO16(dst_reg(tmp), op[0]));
> +  inst = emit(F16TO32(dst, tmp));
> +  inst->saturate = instr->dest.saturate;
> +  break;
> +   }
> +
> case nir_op_fmin:
> case nir_op_imin:
> case nir_op_umin:
> 


Re: [Mesa-dev] [PATCH] radeonsi: enable late VS export memory allocation

2016-01-13 Thread Marek Olšák
On Wed, Jan 13, 2016 at 4:35 PM, Axel Davy  wrote:
>
>>
>> Axel Davy benchmarked this briefly. We may need more benchmarks though.
>>
>> Marek
>>
>
> I confirm setting this register helps get a few % with heaven.
>
> There was also another register to kill color exports early when doing
> depth only pass that helped a few % (but less).

Do you remember which register it was?

The hardware should not execute PS when doing depth-only rendering and
the shader doesn't use KILL.

Marek