Re: [Mesa-dev] [PATCH 2/7] i965/gen9: Don't call tr_mode_vertical_texture_alignment() for 1D textures

2015-10-20 Thread Pohjolainen, Topi
On Mon, Oct 19, 2015 at 02:29:04PM -0700, Anuj Phogat wrote:
> On Thu, Aug 13, 2015 at 2:51 PM, Anuj Phogat  wrote:
> > Vertical alignment is not applicable to 1D textures.
> >
> > Signed-off-by: Anuj Phogat 
> > ---
> >  src/mesa/drivers/dri/i965/brw_tex_layout.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c 
> > b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> > index 4e44b15..edd7518 100644
> > --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> > +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> > @@ -277,7 +277,9 @@ intel_vertical_texture_alignment_unit(struct 
> > brw_context *brw,
> > if (mt->format == MESA_FORMAT_S_UINT8)
> >return brw->gen >= 7 ? 8 : 4;
> >
> > -   if (mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE) {
> > +   if (mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE &&
> > +   mt->target != GL_TEXTURE_1D &&
> > +   mt->target != GL_TEXTURE_1D_ARRAY) {

There is the exact same assertion in tr_mode_vertical_texture_alignment(),
and therefore this makes sense.

> >uint32_t align = tr_mode_vertical_texture_alignment(brw, mt);
> >/* XY_FAST_COPY_BLT doesn't support vertical alignment < 64 */
> >return align < 64 ? 64 : align;
> > --
> > 2.4.3
> >
> 
> Patches 2, 4, 5-7 of this series are waiting for review. These patches are 
> doing
> simple changes and should be easy to review. Here is a patchwork link to the
> list of patches:
> http://patchwork.freedesktop.org/project/mesa/patches/?submitter=10862

Along with patch 2, number 5 is clear improvment also. The replacament of the
static alignment table with a multiplier (targeting another alignment table)
in patch 4 wasn't at first a clear improvement to me. But combined with
patch 6 I can see them reducing the number of lines of code and therefore I'm
in favor of commiting them as well.
In patch 4 you could call "multiplier" as "multiplier_ys" as it is
specifically used to derive YS alignment from YF alignment.
In patch 6 (or later in patch 7) you could declare the variable "i" as const
and initialize it right away with the correct value.

Using "mt->cpp" instead of "_mesa_get_format_bytes(mt->format) * 8" in patch
7 looks better to me also.

So 2 and 4-7 are:

Reviewed-by: Topi Pohjolainen 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] glsl: Implement a SSBO load optimization pass

2015-10-20 Thread Iago Toral Quiroga
This allows us to re-use the results of previous ssbo loads in situations
that are safe (i.e. when there are no stores, atomic operations or
memory barriers in between).

This is particularly useful for things like matrix multiplications, where
for a mat4 buffer variable we cut the number of loads from 16 (4 reads of
each column) down to 4 (1 read of each column).

The pass can only cache ssbo loads that involve constant blocks and
offsets, but could be extended to compare sub-expressions for these
as well, similar to a CSE pass.

The way the cache works is simple: ssbo loads with constant block/offset
are included in a cache as they are seen. Stores invalidate cache entries.
Stores with non-constant offset invalidate all cached loads for the block
and stores with non-constant block invalidate all cache entries. There is
room to improve this by using the actual variable name we are accessing to
limit the entries that should be invalidated. We also need to invalidate
cache entries when we exit the block in which they have been defined
(i.e. inside if/else blocks or loops).

The cache optimization is built as a separate pass, instead of merging it
inside the lower_ubo_reference pass for a number of reasons:

1) The way we process assignments in visitors is that the LHS is
processed before the RHS. This creates a problem for an optimization
such as this when we do things like a = a + 1, since we would see the
store before the read when the actual execution order is reversed.
This could be fixed by re-implementing the logic in the visit_enter
method for ir_assignment in lower_ubo_reference and then returning
visit_continue_with_parent.

2) Some writes/reads need to be split into multiple smaller
writes/reads, and we need to handle caching for each one. This happens
deep inside the code that handles the lowering and some
of the information we need to do this is not available. This could also
be fixed by passing more data into the corresponding functions or by
making this data available as class members, but the current implementation
is already complex enough and  this would only contribute to the complexity.

3) We can have ssbo loads in the LHS too (i.e. a[a[0]] = ..). In these cases
the current code in lower_uo_reference would see the store before the read.
Probably fixable, but again would add more complexity to the lowering.

On the other hand, a separate pass that runs after the lowering sees
all the individal loads and stores in the correct order (so we don't need
to do any tricks) and it allows us to sepearate the lowering logic (which
is already complex) from the caching logic. It also gives us a chance to
run it after other optimization passes have run and turned constant
expressions for block/offset into constants, enabling more opportunities
for caching.
---
 src/glsl/Makefile.sources  |   1 +
 src/glsl/ir_optimization.h |   1 +
 src/glsl/opt_ssbo_load.cpp | 338 +
 3 files changed, 340 insertions(+)
 create mode 100644 src/glsl/opt_ssbo_load.cpp

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index ca87036..73c7514 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -201,6 +201,7 @@ LIBGLSL_FILES = \
opt_noop_swizzle.cpp \
opt_rebalance_tree.cpp \
opt_redundant_jumps.cpp \
+   opt_ssbo_load.cpp \
opt_structure_splitting.cpp \
opt_swizzle_swizzle.cpp \
opt_tree_grafting.cpp \
diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
index ce5c492..26677d7 100644
--- a/src/glsl/ir_optimization.h
+++ b/src/glsl/ir_optimization.h
@@ -125,6 +125,7 @@ bool lower_clip_distance(gl_shader *shader);
 void lower_output_reads(unsigned stage, exec_list *instructions);
 bool lower_packing_builtins(exec_list *instructions, int op_mask);
 void lower_ubo_reference(struct gl_shader *shader, exec_list *instructions);
+bool opt_ssbo_loads(struct gl_shader *shader, exec_list *instructions);
 void lower_packed_varyings(void *mem_ctx,
unsigned locations_used, ir_variable_mode mode,
unsigned gs_input_vertices, gl_shader *shader);
diff --git a/src/glsl/opt_ssbo_load.cpp b/src/glsl/opt_ssbo_load.cpp
new file mode 100644
index 000..5404907
--- /dev/null
+++ b/src/glsl/opt_ssbo_load.cpp
@@ -0,0 +1,338 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all 

[Mesa-dev] [PATCH] i965/fs: Disable CSE optimization for untyped & typed surface reads

2015-10-20 Thread Jordan Justen
An untyped surface read is volatile because it might be affected by a
write.

In the ES31-CTS.compute_shader.resources-max test, two back to back
read/modify/writes of an SSBO variable looked something like this:

  r1 = untyped_surface_read(ssbo_float)
  r2 = r1 + 1
  untyped_surface_write(ssbo_float, r2)
  r3 = untyped_surface_read(ssbo_float)
  r4 = r3 + 1
  untyped_surface_write(ssbo_float, r4)

And after CSE, we had:

  r1 = untyped_surface_read(ssbo_float)
  r2 = r1 + 1
  untyped_surface_write(ssbo_float, r2)
  r4 = r1 + 1
  untyped_surface_write(ssbo_float, r4)

Signed-off-by: Jordan Justen 
---
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp |  3 ++-
 src/mesa/drivers/dri/i965/brw_shader.cpp | 14 ++
 src/mesa/drivers/dri/i965/brw_shader.h   |  6 ++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
index c7628dc..3a28c8d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
@@ -93,7 +93,8 @@ is_expression(const fs_visitor *v, const fs_inst *const inst)
case SHADER_OPCODE_LOAD_PAYLOAD:
   return !inst->is_copy_payload(v->alloc);
default:
-  return inst->is_send_from_grf() && !inst->has_side_effects();
+  return inst->is_send_from_grf() && !inst->has_side_effects() &&
+ !inst->is_volatile();
}
 }
 
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 2324b56..be911ed 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -969,6 +969,20 @@ backend_instruction::has_side_effects() const
}
 }
 
+bool
+backend_instruction::is_volatile() const
+{
+   switch (opcode) {
+   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
+   case SHADER_OPCODE_UNTYPED_SURFACE_READ_LOGICAL:
+   case SHADER_OPCODE_TYPED_SURFACE_READ:
+   case SHADER_OPCODE_TYPED_SURFACE_READ_LOGICAL:
+  return true;
+   default:
+  return false;
+   }
+}
+
 #ifndef NDEBUG
 static bool
 inst_is_in_block(const bblock_t *block, const backend_instruction *inst)
diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
b/src/mesa/drivers/dri/i965/brw_shader.h
index b33b08f..35ee210 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -115,6 +115,12 @@ struct backend_instruction : public exec_node {
 * optimize these out unless you know what you are doing.
 */
bool has_side_effects() const;
+
+   /**
+* True if the instruction might be affected by side effects of other
+* instructions.
+*/
+   bool is_volatile() const;
 #else
 struct backend_instruction {
struct exec_node link;
-- 
2.5.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/9] i965: Add devinfo parameter to brw_compact_inst_* funcs.

2015-10-20 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Mon, 2015-10-19 at 21:09 -0700, Matt Turner wrote:
> The next commit will add assertions dependent on devinfo->gen.
> 
> Use compact()/uncompact() macros where possible, like the 3-src code
> does.
> ---
>  src/mesa/drivers/dri/i965/brw_eu_compact.c | 118 
> ++---
>  src/mesa/drivers/dri/i965/brw_inst.h   |  30 +---
>  2 files changed, 91 insertions(+), 57 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_compact.c 
> b/src/mesa/drivers/dri/i965/brw_eu_compact.c
> index b798931..facf3cd 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_compact.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_compact.c
> @@ -690,7 +690,7 @@ set_control_index(const struct brw_device_info *devinfo,
>  
> for (int i = 0; i < 32; i++) {
>if (control_index_table[i] == uncompacted) {
> - brw_compact_inst_set_control_index(dst, i);
> + brw_compact_inst_set_control_index(devinfo, dst, i);
>return true;
>}
> }
> @@ -711,7 +711,7 @@ set_datatype_index(const struct brw_device_info *devinfo, 
> brw_compact_inst *dst,
>  
> for (int i = 0; i < 32; i++) {
>if (datatype_table[i] == uncompacted) {
> - brw_compact_inst_set_datatype_index(dst, i);
> + brw_compact_inst_set_datatype_index(devinfo, dst, i);
>return true;
>}
> }
> @@ -732,7 +732,7 @@ set_subreg_index(const struct brw_device_info *devinfo, 
> brw_compact_inst *dst,
>  
> for (int i = 0; i < 32; i++) {
>if (subreg_table[i] == uncompacted) {
> - brw_compact_inst_set_subreg_index(dst, i);
> + brw_compact_inst_set_subreg_index(devinfo, dst, i);
>return true;
>}
> }
> @@ -764,7 +764,7 @@ set_src0_index(const struct brw_device_info *devinfo,
> if (!get_src_index(uncompacted, ))
>return false;
>  
> -   brw_compact_inst_set_src0_index(dst, compacted);
> +   brw_compact_inst_set_src0_index(devinfo, dst, compacted);
>  
> return true;
>  }
> @@ -784,7 +784,7 @@ set_src1_index(const struct brw_device_info *devinfo, 
> brw_compact_inst *dst,
>   return false;
> }
>  
> -   brw_compact_inst_set_src1_index(dst, compacted);
> +   brw_compact_inst_set_src1_index(devinfo, dst, compacted);
>  
> return true;
>  }
> @@ -804,7 +804,7 @@ set_3src_control_index(const struct brw_device_info 
> *devinfo,
>  
> for (unsigned i = 0; i < ARRAY_SIZE(gen8_3src_control_index_table); i++) {
>if (gen8_3src_control_index_table[i] == uncompacted) {
> - brw_compact_inst_set_3src_control_index(dst, i);
> + brw_compact_inst_set_3src_control_index(devinfo, dst, i);
>return true;
>}
> }
> @@ -838,7 +838,7 @@ set_3src_source_index(const struct brw_device_info 
> *devinfo,
>  
> for (unsigned i = 0; i < ARRAY_SIZE(gen8_3src_source_index_table); i++) {
>if (gen8_3src_source_index_table[i] == uncompacted) {
> - brw_compact_inst_set_3src_source_index(dst, i);
> + brw_compact_inst_set_3src_source_index(devinfo, dst, i);
>return true;
>}
> }
> @@ -909,7 +909,7 @@ brw_try_compact_3src_instruction(const struct 
> brw_device_info *devinfo,
>return false;
>  
>  #define compact(field) \
> -   brw_compact_inst_set_3src_##field(dst, brw_inst_3src_##field(devinfo, 
> src))
> +   brw_compact_inst_set_3src_##field(devinfo, dst, 
> brw_inst_3src_##field(devinfo, src))
>  
> compact(opcode);
>  
> @@ -921,7 +921,7 @@ brw_try_compact_3src_instruction(const struct 
> brw_device_info *devinfo,
>  
> compact(dst_reg_nr);
> compact(src0_rep_ctrl);
> -   brw_compact_inst_set_3src_cmpt_control(dst, true);
> +   brw_compact_inst_set_3src_cmpt_control(devinfo, dst, true);
> compact(debug_control);
> compact(saturate);
> compact(src1_rep_ctrl);
> @@ -1003,36 +1003,47 @@ brw_try_compact_instruction(const struct 
> brw_device_info *devinfo,
>  
> memset(, 0, sizeof(temp));
>  
> -   brw_compact_inst_set_opcode(, brw_inst_opcode(devinfo, src));
> -   brw_compact_inst_set_debug_control(, brw_inst_debug_control(devinfo, 
> src));
> +#define compact(field) \
> +   brw_compact_inst_set_##field(devinfo, , brw_inst_##field(devinfo, 
> src))
> +
> +   compact(opcode);
> +   compact(debug_control);
> +
> if (!set_control_index(devinfo, , src))
>return false;
> if (!set_datatype_index(devinfo, , src))
>return false;
> if (!set_subreg_index(devinfo, , src, is_immediate))
>return false;
> -   brw_compact_inst_set_acc_wr_control(,
> -   brw_inst_acc_wr_control(devinfo, 
> src));
> -   brw_compact_inst_set_cond_modifier(,
> -  brw_inst_cond_modifier(devinfo, src));
> +
> +   compact(acc_wr_control);
> +   compact(cond_modifier);
> +
> if (devinfo->gen <= 6)
> -  brw_compact_inst_set_flag_subreg_nr(,
> -  

Re: [Mesa-dev] [PATCH 2/2] vc4: Use nir_foreach_variable

2015-10-20 Thread Eric Anholt
Boyan Ding  writes:

> Signed-off-by: Boyan Ding 
> ---
>  src/gallium/drivers/vc4/vc4_nir_lower_blend.c | 2 +-
>  src/gallium/drivers/vc4/vc4_nir_lower_io.c| 4 ++--
>  src/gallium/drivers/vc4/vc4_program.c | 8 
>  3 files changed, 7 insertions(+), 7 deletions(-)

Pushed the vc4 patch.  I'll leave freedreno to Rob.

Thanks!


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] i965: Remove block arg from foreach_inst_in_block_*_starting_from

2015-10-20 Thread Neil Roberts
Since 49374fab5d793 these macros no longer actually use the block
argument. I think this is worth doing to make the macros easier to use
because they already have really long names and a confusing set of
arguments.
---
 src/mesa/drivers/dri/i965/brw_cfg.h   | 4 ++--
 src/mesa/drivers/dri/i965/brw_fs.cpp  | 6 +++---
 src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp | 3 +--
 src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp | 2 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp| 2 +-
 5 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h 
b/src/mesa/drivers/dri/i965/brw_cfg.h
index a094917..a06b0aa 100644
--- a/src/mesa/drivers/dri/i965/brw_cfg.h
+++ b/src/mesa/drivers/dri/i965/brw_cfg.h
@@ -327,12 +327,12 @@ struct cfg_t {
 #define foreach_inst_in_block_reverse_safe(__type, __inst, __block) \
foreach_in_list_reverse_safe(__type, __inst, &(__block)->instructions)
 
-#define foreach_inst_in_block_starting_from(__type, __scan_inst, __inst, 
__block) \
+#define foreach_inst_in_block_starting_from(__type, __scan_inst, __inst) \
for (__type *__scan_inst = (__type *)__inst->next;  \
 !__scan_inst->is_tail_sentinel();  \
 __scan_inst = (__type *)__scan_inst->next)
 
-#define foreach_inst_in_block_reverse_starting_from(__type, __scan_inst, 
__inst, __block) \
+#define foreach_inst_in_block_reverse_starting_from(__type, __scan_inst, 
__inst) \
for (__type *__scan_inst = (__type *)__inst->prev;  \
 !__scan_inst->is_head_sentinel();  \
 __scan_inst = (__type *)__scan_inst->prev)
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 49323eb..97d7fd7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2456,7 +2456,7 @@ fs_visitor::compute_to_mrf()
   /* Found a move of a GRF to a MRF.  Let's see if we can go
* rewrite the thing that made this GRF to write into the MRF.
*/
-  foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst, 
block) {
+  foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) {
 if (scan_inst->dst.file == GRF &&
 scan_inst->dst.reg == inst->src[0].reg) {
/* Found the last thing to write our reg we want to turn
@@ -2789,7 +2789,7 @@ 
fs_visitor::insert_gen4_pre_send_dependency_workarounds(bblock_t *block,
 * we assume that there are no outstanding dependencies on entry to the
 * program.
 */
-   foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst, 
block) {
+   foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) {
   /* If we hit control flow, assume that there *are* outstanding
* dependencies, and force their cleanup before our instruction.
*/
@@ -2855,7 +2855,7 @@ 
fs_visitor::insert_gen4_post_send_dependency_workarounds(bblock_t *block, fs_ins
/* Walk forwards looking for writes to registers we're writing which aren't
 * read before being written.
 */
-   foreach_inst_in_block_starting_from(fs_inst, scan_inst, inst, block) {
+   foreach_inst_in_block_starting_from(fs_inst, scan_inst, inst) {
   /* If we hit control flow, force resolve all remaining dependencies. */
   if (block->end() == scan_inst) {
  for (int i = 0; i < write_len; i++) {
diff --git a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
index 469f2ea..883e8d2 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
@@ -87,8 +87,7 @@ opt_cmod_propagation_local(bblock_t *block)
  continue;
 
   bool read_flag = false;
-  foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst,
-  block) {
+  foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) {
  if (scan_inst->overwrites_reg(inst->src[0])) {
 if (scan_inst->is_partial_write() ||
 scan_inst->dst.reg_offset != inst->src[0].reg_offset)
diff --git a/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
index 8792a8c..862e324 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
@@ -64,7 +64,7 @@ opt_saturate_propagation_local(fs_visitor *v, bblock_t *block)
   int src_end_ip = v->live_intervals->end[src_var];
 
   bool interfered = false;
-  foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst, 
block) {
+  foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) {
  if (scan_inst->overwrites_reg(inst->src[0])) {
 if 

Re: [Mesa-dev] [PATCH 4/9] i965: Compact acc_wr_control only on Gen6+.

2015-10-20 Thread Iago Toral
On Mon, 2015-10-19 at 21:09 -0700, Matt Turner wrote:
> It only exists on Gen6+, and the next patches will add compaction
> support for the (unused) field in the same location on earlier
> platforms.

The docs say that this exists also in ILK at least. See Page 131 of:
https://01.org/sites/default/files/documentation/ilk_ihd_os_vol4_part2_july_28_10_0.pdf

However, I see some places in the i965 code where dealing with this is
surrounded by if (gen >= 6)... 

Is this a bug in the ILK documentation?

Iago

> ---
>  src/mesa/drivers/dri/i965/brw_eu_compact.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_compact.c 
> b/src/mesa/drivers/dri/i965/brw_eu_compact.c
> index facf3cd..b122dec 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_compact.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_compact.c
> @@ -1016,7 +1016,10 @@ brw_try_compact_instruction(const struct 
> brw_device_info *devinfo,
> if (!set_subreg_index(devinfo, , src, is_immediate))
>return false;
>  
> -   compact(acc_wr_control);
> +   if (devinfo->gen >= 6) {
> +  compact(acc_wr_control);
> +   }
> +
> compact(cond_modifier);
>  
> if (devinfo->gen <= 6)
> @@ -1224,7 +1227,10 @@ brw_uncompact_instruction(const struct brw_device_info 
> *devinfo, brw_inst *dst,
>  
> set_uncompacted_subreg(devinfo, dst, src);
>  
> -   uncompact(acc_wr_control);
> +   if (devinfo->gen >= 6) {
> +  uncompact(acc_wr_control);
> +   }
> +
> uncompact(cond_modifier);
>  
> if (devinfo->gen <= 6)


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] i965/fs: Improve search for the argument source in opt_zero_samples

2015-10-20 Thread Neil Roberts
The opt_zero_samples instruction tries to find the corresponding
load_payload instruction for each sample instruction. However it was
previously only looking at the previous instruction. This patch makes
it search back within the block to whatever was the last instruction
to write to each individual argument to the send message. There are
two reasons two do this:

On Gen<=6 load_payload isn't used and there is a separate message
register file. This version of the optimisation also finds MOVs into
the MRF registers so it now also works on SNB. Unfortunately this
doesn't show up in a shader-db report because the dead code eliminator
doesn't do anything for instructions writing to MRF registers so it
can't remove the redundant MOVs. However if I hack Mesa to report the
message lengths instead of the instruction counts then it shows this:

total mlen in shared programs: 2600373 -> 2574663 (-0.99%)
mlen in affected programs: 237077 -> 211367 (-10.84%)
helped:3508
HURT:  0

I haven't tested whether reducing the message length without
decreasing the instruction count is actually a performance benefit but
it's hard to imagine that it could possibly be a disadvantage. It also
paves the way to reduce the instruction count later if someone
improves the dead code eliminator.

Secondly it could help on other gens because sometimes the
load_payload instruction can become separated from the corresponding
send instruction and the old version wouldn't work in those cases.
Currently this doesn't seem to make any difference in practice because
the register coalescer is run after this optimisation. However it
seems like this version is more robust.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 54 
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 97d7fd7..f87a5a7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2150,6 +2150,41 @@ fs_visitor::opt_algebraic()
return progress;
 }
 
+static bool
+last_texture_source_is_zero(const fs_inst *send_inst)
+{
+   int reg_offset = send_inst->mlen - send_inst->exec_size / 8;
+   fs_reg src;
+
+   /* Get the last argument of the texture instruction */
+   if (send_inst->is_send_from_grf())
+  src = byte_offset(send_inst->src[0], reg_offset * 32);
+   else
+  src = fs_reg(MRF, send_inst->base_mrf + reg_offset);
+
+   /* Look for the last instruction that writes to the source */
+   foreach_inst_in_block_reverse_starting_from(const fs_inst, inst, send_inst) 
{
+  if (inst->overwrites_reg(src)) {
+ if (inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD) {
+const int src_num = ((send_inst->mlen - send_inst->header_size) /
+ (inst->exec_size / 8) +
+ inst->header_size - 1);
+return inst->src[src_num].is_zero();
+ } else if (inst->opcode == BRW_OPCODE_MOV) {
+if (inst->is_partial_write() || !inst->dst.equals(src))
+   return false;
+
+return inst->src[0].is_zero();
+ }
+
+ /* Something unknown is writing to the src */
+ break;
+  }
+   }
+
+   return false;
+}
+
 /**
  * Optimize sample messages that have constant zero values for the trailing
  * texture coordinates. We can just reduce the message length for these
@@ -2173,12 +2208,6 @@ fs_visitor::opt_zero_samples()
   if (!inst->is_tex())
  continue;
 
-  fs_inst *load_payload = (fs_inst *) inst->prev;
-
-  if (load_payload->is_head_sentinel() ||
-  load_payload->opcode != SHADER_OPCODE_LOAD_PAYLOAD)
- continue;
-
   /* We don't want to remove the message header or the first parameter.
* Removing the first parameter is not allowed, see the Haswell PRM
* volume 7, page 149:
@@ -2186,12 +2215,13 @@ fs_visitor::opt_zero_samples()
* "Parameter 0 is required except for the sampleinfo message, which
*  has no parameter 0"
*/
-  while (inst->mlen > inst->header_size + inst->exec_size / 8 &&
- load_payload->src[(inst->mlen - inst->header_size) /
-   (inst->exec_size / 8) +
-   inst->header_size - 1].is_zero()) {
- inst->mlen -= inst->exec_size / 8;
- progress = true;
+  while (inst->mlen > inst->header_size + inst->exec_size / 8) {
+ if (last_texture_source_is_zero(inst)) {
+inst->mlen -= inst->exec_size / 8;
+progress = true;
+ } else {
+break;
+ }
   }
}
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/9] i965/vec4: Initialize LOD to 0.0f.

2015-10-20 Thread Matt Turner
On Mon, Oct 19, 2015 at 9:09 PM, Matt Turner  wrote:
> We implement textureQueryLevels (which takes no arguments, save the
> sampler) using the resinfo message (which takes an argument of LOD).
> Without initializing it, we'd generate a MOV from the null register to
> load the LOD argument.
> ---
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> index ea1e3e7..c942c80 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> @@ -1579,7 +1579,7 @@ vec4_visitor::nir_emit_texture(nir_tex_instr *instr)
> const glsl_type *coord_type = NULL;
> src_reg shadow_comparitor;
> src_reg offset_value;
> -   src_reg lod, lod2;
> +   src_reg lod(0.0f), lod2;
> src_reg sample_index;
> src_reg mcs;

The fs backend handles this differently --

   if (op == ir_query_levels) {
  /* textureQueryLevels() is implemented in terms of TXS so we need to
   * pass a valid LOD argument.
   */
  assert(lod.file == BAD_FILE);
  lod = fs_reg(0u);
   }

That kinda seems worse -- but it does indicate that I should probably
initialize lod with 0u instead, though it doesn't seem to matter...

mov(8)  g10<1>F [0F, 0F, 0F, 0F]VF
send(8) g3<1>UW g10<8,8,1>F
 sampler resinfo SIMD8 Surface = 1 Sampler = 0 mlen 1 rlen 4

Opinions?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] glsl: Implement a SSBO load optimization pass

2015-10-20 Thread Iago Toral
On Tue, 2015-10-20 at 13:22 +0300, Francisco Jerez wrote:
> Iago Toral Quiroga  writes:
> 
> > This allows us to re-use the results of previous ssbo loads in situations
> > that are safe (i.e. when there are no stores, atomic operations or
> > memory barriers in between).
> >
> > This is particularly useful for things like matrix multiplications, where
> > for a mat4 buffer variable we cut the number of loads from 16 (4 reads of
> > each column) down to 4 (1 read of each column).
> >
> > The pass can only cache ssbo loads that involve constant blocks and
> > offsets, but could be extended to compare sub-expressions for these
> > as well, similar to a CSE pass.
> >
> > The way the cache works is simple: ssbo loads with constant block/offset
> > are included in a cache as they are seen. Stores invalidate cache entries.
> > Stores with non-constant offset invalidate all cached loads for the block
> > and stores with non-constant block invalidate all cache entries. There is
> > room to improve this by using the actual variable name we are accessing to
> > limit the entries that should be invalidated. We also need to invalidate
> > cache entries when we exit the block in which they have been defined
> > (i.e. inside if/else blocks or loops).
> >
> > The cache optimization is built as a separate pass, instead of merging it
> > inside the lower_ubo_reference pass for a number of reasons:
> >
> > 1) The way we process assignments in visitors is that the LHS is
> > processed before the RHS. This creates a problem for an optimization
> > such as this when we do things like a = a + 1, since we would see the
> > store before the read when the actual execution order is reversed.
> > This could be fixed by re-implementing the logic in the visit_enter
> > method for ir_assignment in lower_ubo_reference and then returning
> > visit_continue_with_parent.
> >
> > 2) Some writes/reads need to be split into multiple smaller
> > writes/reads, and we need to handle caching for each one. This happens
> > deep inside the code that handles the lowering and some
> > of the information we need to do this is not available. This could also
> > be fixed by passing more data into the corresponding functions or by
> > making this data available as class members, but the current implementation
> > is already complex enough and  this would only contribute to the complexity.
> >
> > 3) We can have ssbo loads in the LHS too (i.e. a[a[0]] = ..). In these cases
> > the current code in lower_uo_reference would see the store before the read.
> > Probably fixable, but again would add more complexity to the lowering.
> >
> > On the other hand, a separate pass that runs after the lowering sees
> > all the individal loads and stores in the correct order (so we don't need
> > to do any tricks) and it allows us to sepearate the lowering logic (which
> > is already complex) from the caching logic. It also gives us a chance to
> > run it after other optimization passes have run and turned constant
> > expressions for block/offset into constants, enabling more opportunities
> > for caching.
> 
> Seems like a restricted form of CSE that only handles SSBO loads, and
> only the ones with constant arguments.  Why don't we CSE these? (and
> other memory access operations like image loads)

There is not a CSE pass in GLSL IR any more so we would have to do it in
NIR and some drivers would lose the optimization. Doing it at GLSL IR
level seemed like a win from this perspective.

Then there is the fact that we cannot just CSE these. We need to make
sure that we only CSE them when it is safe to do so (i.e. no
stores/atomics to the same offsets in between, no memory barriers, etc).
The current CSE pass in NIR does not support this as far as I can see. I
suppose that we could look into changing the pass to accommodate
restrictions such as this if we think is worth it.

Iago

> 
> > ---
> >  src/glsl/Makefile.sources  |   1 +
> >  src/glsl/ir_optimization.h |   1 +
> >  src/glsl/opt_ssbo_load.cpp | 338 
> > +
> >  3 files changed, 340 insertions(+)
> >  create mode 100644 src/glsl/opt_ssbo_load.cpp
> >
> > diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> > index ca87036..73c7514 100644
> > --- a/src/glsl/Makefile.sources
> > +++ b/src/glsl/Makefile.sources
> > @@ -201,6 +201,7 @@ LIBGLSL_FILES = \
> > opt_noop_swizzle.cpp \
> > opt_rebalance_tree.cpp \
> > opt_redundant_jumps.cpp \
> > +   opt_ssbo_load.cpp \
> > opt_structure_splitting.cpp \
> > opt_swizzle_swizzle.cpp \
> > opt_tree_grafting.cpp \
> > diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
> > index ce5c492..26677d7 100644
> > --- a/src/glsl/ir_optimization.h
> > +++ b/src/glsl/ir_optimization.h
> > @@ -125,6 +125,7 @@ bool lower_clip_distance(gl_shader *shader);
> >  void lower_output_reads(unsigned stage, exec_list *instructions);
> >  bool lower_packing_builtins(exec_list 

[Mesa-dev] [PATCH 2/3] glsl: do not try to reserve explicit locations for buffer variables

2015-10-20 Thread Tapani Pälli
Explicit locations are only used with uniform variables.

Signed-off-by: Tapani Pälli 
---
 src/glsl/linker.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 25ca928..481481b 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -3117,8 +3117,8 @@ check_explicit_uniform_locations(struct gl_context *ctx,
 
   foreach_in_list(ir_instruction, node, sh->ir) {
  ir_variable *var = node->as_variable();
- if (var && (var->data.mode == ir_var_uniform || var->data.mode == 
ir_var_shader_storage) &&
- var->data.explicit_location) {
+ if (var && (var->data.mode == ir_var_uniform &&
+ var->data.explicit_location)) {
 bool ret;
 if (var->type->is_subroutine())
ret = reserve_subroutine_explicit_locations(prog, sh, var);
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] glsl: skip buffer variables when filling UniformRemapTable

2015-10-20 Thread Tapani Pälli
UniformRemapTable is used only for remapping user specified uniform
locations to driver internally used ones, shader storage buffer
variables should not utilize uniform locations.

Signed-off-by: Tapani Pälli 
---
 src/glsl/link_uniforms.cpp | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp
index fe00aa3..f7b87a1 100644
--- a/src/glsl/link_uniforms.cpp
+++ b/src/glsl/link_uniforms.cpp
@@ -1180,7 +1180,8 @@ link_assign_uniform_locations(struct gl_shader_program 
*prog,
 
/* Reserve all the explicit locations of the active uniforms. */
for (unsigned i = 0; i < num_uniforms; i++) {
-  if (uniforms[i].type->is_subroutine())
+  if (uniforms[i].type->is_subroutine() ||
+  uniforms[i].is_shader_storage)
  continue;
 
   if (uniforms[i].remap_location != UNMAPPED_UNIFORM_LOC) {
@@ -1200,8 +1201,10 @@ link_assign_uniform_locations(struct gl_shader_program 
*prog,
/* Reserve locations for rest of the uniforms. */
for (unsigned i = 0; i < num_uniforms; i++) {
 
-  if (uniforms[i].type->is_subroutine())
+  if (uniforms[i].type->is_subroutine() ||
+  uniforms[i].is_shader_storage)
  continue;
+
   /* Built-in uniforms should not get any location. */
   if (uniforms[i].builtin)
  continue;
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] glsl: fix record type detection in explicit location assign

2015-10-20 Thread Tapani Pälli
Check current_var directly instead of using the passed in record_type.

This fixes following failing CTS test:
ES31-CTS.explicit_uniform_location.uniform-loc-types-structs

No Piglit regressions.

Signed-off-by: Tapani Pälli 
---
 src/glsl/link_uniforms.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp
index f7b87a1..6efde5c 100644
--- a/src/glsl/link_uniforms.cpp
+++ b/src/glsl/link_uniforms.cpp
@@ -763,7 +763,7 @@ private:
   /* Assign explicit locations. */
   if (current_var->data.explicit_location) {
  /* Set sequential locations for struct fields. */
- if (record_type != NULL) {
+ if (current_var->type->without_array()->is_record()) {
 const unsigned entries = MAX2(1, 
this->uniforms[id].array_elements);
 this->uniforms[id].remap_location =
this->explicit_location + field_counter;
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

2015-10-20 Thread Francisco Jerez
Kristian Høgsberg  writes:

> On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez  
> wrote:
>> Neil Roberts  writes:
>>
>>> Just a thought, would it be better to move this check into the
>>> eliminate_find_live_channel optimisation? That way it could catch
>>> sources that become immediates through later optimisations. One problem
>>> with this that I've seen before is that eliminating the
>>> FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be
>>> eliminated because the copy propagation doesn't work.
>>
>> I believe in this particular case the BROADCAST instruction should
>> already be eliminated (by opt_algebraic), because its first source is an
>> immediate, so this optimization would in fact be redundant with
>> opt_algebraic+constant propagation if it weren't because the latter is
>> unable to handle scalar copies.
>
> Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets
> eliminated because it's outside control flow
> (fs_visitor::eliminate_find_live_channel). Then the broadcast gets
> reduced to a MOV in opt_algebraic because src0 is uniform (immediate
> constant). The problem then is the dst of the MOV has stride 0 which
> can_propagate_from() then bails on.
>
>> Another possibility that would likely
>> be more general than this (because it would handle cases in which, as
>> Neil said, the argument becomes an immediate only after optimization,
>> and because it would also avoid the issue with liveness analysis
>> extending the live ranges of scalar values incorrectly [1]) would be to
>> extend the destination register of the BROADCAST instructions to a
>> vector [2] and then add a case to the switch statement of
>> try_constant_propagate() so that the immediate MOV resulting from
>> opt_algebraic is propagated into surface instructions (see attachment).
>
> I wrote exactly this code to make constant propagate work, plus adding
> the extra opcodes to the switch statement. It works and I could
> certainly send that out after this, but
>
> 1) This doesn't mean we shouldn't do the if (src.file == IMM)
> shortcut. It saves the compiler a bit of work in the very common case
> of
> non-indirect buffer access.
>
> 2) I'm not even sure it makes sense to extend copy-propagation to do
> this (which is why I went back to just the IMM test). Anything that
> would be an immediate at this point should be an immediate, if not
> we're missing something in nir.
>
Still this doesn't address the root of the problem, which is that
emit_uniformize() emits scalar code that the rest of the compiler is not
able to handle properly.  Re-implementing a special case of the
optimization already done by opt_algebraic() might have helped in this
specific case, but it won't help when the argument is of some other kind
of uniform value which are also frequently encountered in practice and
could also be copy-propagated, and it won't help avoid the liveness
analysis bug [1] in cases where the argument is not an immediate (the
bug is reported to break some thousands SSBO dEQP tests, although I
don't know what fraction of them actually use non-constant indexing).
The alternative solution I provided patches for (and you seem to have
implemented yourself independently) would address all these issues at
once.

> Kristian
>
>>> I made this patch a while ago but I never posted it anywhere because
>>> it's a of a kludge and it would probably be better to fix the copy
>>> propagation:
>>>
>>> https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa
>>>
>> Heh, yeah, I'd rather fix copy propagation instead, which I believe will
>> become much easier with the use-def-chain analysis pass I'm working on.
>>
>>> Either way though I don't think it would do any harm to have Kristian's
>>> patch as well even if we did improve elimintate_find_live_channel so it
>>> is:
>>>
>>> Reviewed-by: Neil Roberts 
>>>
>>> - Neil
>>>
>>> Kristian Høgsberg Kristensen  writes:
>>>
 An immdiate is already uniform so just return it up front. Without this,
 brw_fs_surface_builder ends up passing immediate surface indices through
 SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't
 constant propagate out of, and further, we don't constant propagate into
 the typed/untype read/write opcodes at all.  The end result is that all
 typed/untyped read/write/atomics end up as indirect sends.

 Code generation should always produce either an immediate or an actual
 indirect surface index, so we can fix this by just special casing
 immediates in emit_uniformize.

 Signed-off-by: Kristian Høgsberg Kristensen 
 ---
  src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++
  1 file changed, 3 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
 b/src/mesa/drivers/dri/i965/brw_fs_builder.h
 index df10a9d..98ce71e 

Re: [Mesa-dev] [PATCH 1/2] glsl: Implement a SSBO load optimization pass

2015-10-20 Thread Francisco Jerez
Iago Toral Quiroga  writes:

> This allows us to re-use the results of previous ssbo loads in situations
> that are safe (i.e. when there are no stores, atomic operations or
> memory barriers in between).
>
> This is particularly useful for things like matrix multiplications, where
> for a mat4 buffer variable we cut the number of loads from 16 (4 reads of
> each column) down to 4 (1 read of each column).
>
> The pass can only cache ssbo loads that involve constant blocks and
> offsets, but could be extended to compare sub-expressions for these
> as well, similar to a CSE pass.
>
> The way the cache works is simple: ssbo loads with constant block/offset
> are included in a cache as they are seen. Stores invalidate cache entries.
> Stores with non-constant offset invalidate all cached loads for the block
> and stores with non-constant block invalidate all cache entries. There is
> room to improve this by using the actual variable name we are accessing to
> limit the entries that should be invalidated. We also need to invalidate
> cache entries when we exit the block in which they have been defined
> (i.e. inside if/else blocks or loops).
>
> The cache optimization is built as a separate pass, instead of merging it
> inside the lower_ubo_reference pass for a number of reasons:
>
> 1) The way we process assignments in visitors is that the LHS is
> processed before the RHS. This creates a problem for an optimization
> such as this when we do things like a = a + 1, since we would see the
> store before the read when the actual execution order is reversed.
> This could be fixed by re-implementing the logic in the visit_enter
> method for ir_assignment in lower_ubo_reference and then returning
> visit_continue_with_parent.
>
> 2) Some writes/reads need to be split into multiple smaller
> writes/reads, and we need to handle caching for each one. This happens
> deep inside the code that handles the lowering and some
> of the information we need to do this is not available. This could also
> be fixed by passing more data into the corresponding functions or by
> making this data available as class members, but the current implementation
> is already complex enough and  this would only contribute to the complexity.
>
> 3) We can have ssbo loads in the LHS too (i.e. a[a[0]] = ..). In these cases
> the current code in lower_uo_reference would see the store before the read.
> Probably fixable, but again would add more complexity to the lowering.
>
> On the other hand, a separate pass that runs after the lowering sees
> all the individal loads and stores in the correct order (so we don't need
> to do any tricks) and it allows us to sepearate the lowering logic (which
> is already complex) from the caching logic. It also gives us a chance to
> run it after other optimization passes have run and turned constant
> expressions for block/offset into constants, enabling more opportunities
> for caching.

Seems like a restricted form of CSE that only handles SSBO loads, and
only the ones with constant arguments.  Why don't we CSE these? (and
other memory access operations like image loads)


> ---
>  src/glsl/Makefile.sources  |   1 +
>  src/glsl/ir_optimization.h |   1 +
>  src/glsl/opt_ssbo_load.cpp | 338 
> +
>  3 files changed, 340 insertions(+)
>  create mode 100644 src/glsl/opt_ssbo_load.cpp
>
> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> index ca87036..73c7514 100644
> --- a/src/glsl/Makefile.sources
> +++ b/src/glsl/Makefile.sources
> @@ -201,6 +201,7 @@ LIBGLSL_FILES = \
>   opt_noop_swizzle.cpp \
>   opt_rebalance_tree.cpp \
>   opt_redundant_jumps.cpp \
> + opt_ssbo_load.cpp \
>   opt_structure_splitting.cpp \
>   opt_swizzle_swizzle.cpp \
>   opt_tree_grafting.cpp \
> diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
> index ce5c492..26677d7 100644
> --- a/src/glsl/ir_optimization.h
> +++ b/src/glsl/ir_optimization.h
> @@ -125,6 +125,7 @@ bool lower_clip_distance(gl_shader *shader);
>  void lower_output_reads(unsigned stage, exec_list *instructions);
>  bool lower_packing_builtins(exec_list *instructions, int op_mask);
>  void lower_ubo_reference(struct gl_shader *shader, exec_list *instructions);
> +bool opt_ssbo_loads(struct gl_shader *shader, exec_list *instructions);
>  void lower_packed_varyings(void *mem_ctx,
> unsigned locations_used, ir_variable_mode mode,
> unsigned gs_input_vertices, gl_shader *shader);
> diff --git a/src/glsl/opt_ssbo_load.cpp b/src/glsl/opt_ssbo_load.cpp
> new file mode 100644
> index 000..5404907
> --- /dev/null
> +++ b/src/glsl/opt_ssbo_load.cpp
> @@ -0,0 +1,338 @@
> +/*
> + * Copyright © 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the 

Re: [Mesa-dev] [PATCH 1/3] glsl: skip buffer variables when filling UniformRemapTable

2015-10-20 Thread Tapani Pälli



On 10/20/2015 01:11 PM, Lofstedt, Marta wrote:


If you also remove the uniforms[i].array_elements in the same manner, the 
assert in:
ES31-CTS.explicit_uniform_location.uniform-loc-arrays-of-arrays
Is also avoided.


Are you referring to one of the loops iterating uniforms in this same 
place the patch is touching?


For me it looks like the problem with arrays-of-arrays test is that it 
will assign overlapping locations currently for uniforms if we have 
'arrays of arrays' situation going on.


For example with

layout(location = 0) uniform float uni[2][2]

uni[0] gets location 0, but uni[1] has gets location 0 too. I haven't 
quite figured out yet where the issue is but the location counter gets 
reset for each array item and starts again from 0 while it should 
actually maintain its count ..




-Original Message-
From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
Behalf Of Tapani Pälli
Sent: Tuesday, October 20, 2015 11:24 AM
To: mesa-dev@lists.freedesktop.org
Subject: [Mesa-dev] [PATCH 1/3] glsl: skip buffer variables when filling
UniformRemapTable

UniformRemapTable is used only for remapping user specified uniform
locations to driver internally used ones, shader storage buffer variables
should not utilize uniform locations.

Signed-off-by: Tapani Pälli 
---
  src/glsl/link_uniforms.cpp | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp index
fe00aa3..f7b87a1 100644
--- a/src/glsl/link_uniforms.cpp
+++ b/src/glsl/link_uniforms.cpp
@@ -1180,7 +1180,8 @@ link_assign_uniform_locations(struct
gl_shader_program *prog,

 /* Reserve all the explicit locations of the active uniforms. */
 for (unsigned i = 0; i < num_uniforms; i++) {
-  if (uniforms[i].type->is_subroutine())
+  if (uniforms[i].type->is_subroutine() ||
+  uniforms[i].is_shader_storage)
   continue;

if (uniforms[i].remap_location != UNMAPPED_UNIFORM_LOC) { @@ -
1200,8 +1201,10 @@ link_assign_uniform_locations(struct
gl_shader_program *prog,
 /* Reserve locations for rest of the uniforms. */
 for (unsigned i = 0; i < num_uniforms; i++) {

-  if (uniforms[i].type->is_subroutine())
+  if (uniforms[i].type->is_subroutine() ||
+  uniforms[i].is_shader_storage)
   continue;
+
/* Built-in uniforms should not get any location. */
if (uniforms[i].builtin)
   continue;
--
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeonsi: add support for ARB_texture_view

2015-10-20 Thread Marek Olšák
On Tue, Oct 20, 2015 at 9:42 AM, Michel Dänzer  wrote:
> On 17.10.2015 22:09, Marek Olšák wrote:
>> From: Marek Olšák 
>>
>> All tests pass. We don't need to do much - just set CUBE if the view
>> target is CUBE or CUBE_ARRAY, otherwise set the resource target.
>>
>> The reason this is so simple can be that texture instructions
>> have a greater effect on the target than the sampler view.
>
> I think you meant to swap "is" and "can be": "The reason this can be so
> simple is that texture instructions have a greater effect on the target
> than the sampler view."
>
> BTW, is that statement really accurate? Isn't it rather because the
> format and mip level clamping was already handled via the sampler view
> before?

I didn't consider the format and clamping to be anything new here.

The only thing that is new with ARB_texture_view is that the texture
target can be overriden by sampler views. The thing is the hardware
doesn't care much :) setting the correct target in the shader already
had the desired effect for all types except Cube.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] glsl: skip buffer variables when filling UniformRemapTable

2015-10-20 Thread Lofstedt, Marta
> -Original Message-
> From: Palli, Tapani
> Sent: Tuesday, October 20, 2015 12:25 PM
> To: Lofstedt, Marta; mesa-dev@lists.freedesktop.org
> Subject: Re: [Mesa-dev] [PATCH 1/3] glsl: skip buffer variables when filling
> UniformRemapTable
> 
> 
> 
> On 10/20/2015 01:11 PM, Lofstedt, Marta wrote:
> >
> > If you also remove the uniforms[i].array_elements in the same manner,
> the assert in:
> > ES31-CTS.explicit_uniform_location.uniform-loc-arrays-of-arrays
> > Is also avoided.
> 
> Are you referring to one of the loops iterating uniforms in this same place 
> the
> patch is touching?

Yes, I hit the same assert in "uniform-loc-structs" and 
"uniform-loc-arrays-of-arrays", with your patchset I no longer hit the assert 
for the "uniform-loc-structs" tests. 

If I also remove array_elements I also avoid the assert in 
"uniform-loc-arrays-of-arrays". This is just an observation, not a suggestion 
of a fix.

> 
> For me it looks like the problem with arrays-of-arrays test is that it will 
> assign
> overlapping locations currently for uniforms if we have 'arrays of arrays'
> situation going on.
> 
> For example with
> 
> layout(location = 0) uniform float uni[2][2]
> 
> uni[0] gets location 0, but uni[1] has gets location 0 too. I haven't quite
> figured out yet where the issue is but the location counter gets reset for
> each array item and starts again from 0 while it should actually maintain its
> count ..
> 
> 
> >> -Original Message-
> >> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
> >> Behalf Of Tapani Pälli
> >> Sent: Tuesday, October 20, 2015 11:24 AM
> >> To: mesa-dev@lists.freedesktop.org
> >> Subject: [Mesa-dev] [PATCH 1/3] glsl: skip buffer variables when
> >> filling UniformRemapTable
> >>
> >> UniformRemapTable is used only for remapping user specified uniform
> >> locations to driver internally used ones, shader storage buffer
> >> variables should not utilize uniform locations.
> >>
> >> Signed-off-by: Tapani Pälli 
> >> ---
> >>   src/glsl/link_uniforms.cpp | 7 +--
> >>   1 file changed, 5 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp
> >> index
> >> fe00aa3..f7b87a1 100644
> >> --- a/src/glsl/link_uniforms.cpp
> >> +++ b/src/glsl/link_uniforms.cpp
> >> @@ -1180,7 +1180,8 @@ link_assign_uniform_locations(struct
> >> gl_shader_program *prog,
> >>
> >>  /* Reserve all the explicit locations of the active uniforms. */
> >>  for (unsigned i = 0; i < num_uniforms; i++) {
> >> -  if (uniforms[i].type->is_subroutine())
> >> +  if (uniforms[i].type->is_subroutine() ||
> >> +  uniforms[i].is_shader_storage)
> >>continue;
> >>
> >> if (uniforms[i].remap_location != UNMAPPED_UNIFORM_LOC) { @@
> >> -
> >> 1200,8 +1201,10 @@ link_assign_uniform_locations(struct
> >> gl_shader_program *prog,
> >>  /* Reserve locations for rest of the uniforms. */
> >>  for (unsigned i = 0; i < num_uniforms; i++) {
> >>
> >> -  if (uniforms[i].type->is_subroutine())
> >> +  if (uniforms[i].type->is_subroutine() ||
> >> +  uniforms[i].is_shader_storage)
> >>continue;
> >> +
> >> /* Built-in uniforms should not get any location. */
> >> if (uniforms[i].builtin)
> >>continue;
> >> --
> >> 2.4.3
> >>
> >> ___
> >> mesa-dev mailing list
> >> mesa-dev@lists.freedesktop.org
> >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] glsl: skip buffer variables when filling UniformRemapTable

2015-10-20 Thread Tapani Pälli



On 10/20/2015 01:40 PM, Lofstedt, Marta wrote:

-Original Message-
From: Palli, Tapani
Sent: Tuesday, October 20, 2015 12:25 PM
To: Lofstedt, Marta; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 1/3] glsl: skip buffer variables when filling
UniformRemapTable



On 10/20/2015 01:11 PM, Lofstedt, Marta wrote:


If you also remove the uniforms[i].array_elements in the same manner,

the assert in:

ES31-CTS.explicit_uniform_location.uniform-loc-arrays-of-arrays
Is also avoided.


Are you referring to one of the loops iterating uniforms in this same place the
patch is touching?


Yes, I hit the same assert in "uniform-loc-structs" and "uniform-loc-arrays-of-arrays", 
with your patchset I no longer hit the assert for the "uniform-loc-structs" tests.

If I also remove array_elements I also avoid the assert in 
"uniform-loc-arrays-of-arrays". This is just an observation, not a suggestion 
of a fix.


Right, I was not sure at first what you meant by 'removing 
array_elements'. I was hoping arrays-of-arrays to be the same case as 
with loc-structs but unfortunately seems quite different, we just lack 
some support for arrays of arrays here. Will dig more!





For me it looks like the problem with arrays-of-arrays test is that it will 
assign
overlapping locations currently for uniforms if we have 'arrays of arrays'
situation going on.

For example with

layout(location = 0) uniform float uni[2][2]

uni[0] gets location 0, but uni[1] has gets location 0 too. I haven't quite
figured out yet where the issue is but the location counter gets reset for
each array item and starts again from 0 while it should actually maintain its
count ..



-Original Message-
From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
Behalf Of Tapani Pälli
Sent: Tuesday, October 20, 2015 11:24 AM
To: mesa-dev@lists.freedesktop.org
Subject: [Mesa-dev] [PATCH 1/3] glsl: skip buffer variables when
filling UniformRemapTable

UniformRemapTable is used only for remapping user specified uniform
locations to driver internally used ones, shader storage buffer
variables should not utilize uniform locations.

Signed-off-by: Tapani Pälli 
---
   src/glsl/link_uniforms.cpp | 7 +--
   1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp
index
fe00aa3..f7b87a1 100644
--- a/src/glsl/link_uniforms.cpp
+++ b/src/glsl/link_uniforms.cpp
@@ -1180,7 +1180,8 @@ link_assign_uniform_locations(struct
gl_shader_program *prog,

  /* Reserve all the explicit locations of the active uniforms. */
  for (unsigned i = 0; i < num_uniforms; i++) {
-  if (uniforms[i].type->is_subroutine())
+  if (uniforms[i].type->is_subroutine() ||
+  uniforms[i].is_shader_storage)
continue;

 if (uniforms[i].remap_location != UNMAPPED_UNIFORM_LOC) { @@
-
1200,8 +1201,10 @@ link_assign_uniform_locations(struct
gl_shader_program *prog,
  /* Reserve locations for rest of the uniforms. */
  for (unsigned i = 0; i < num_uniforms; i++) {

-  if (uniforms[i].type->is_subroutine())
+  if (uniforms[i].type->is_subroutine() ||
+  uniforms[i].is_shader_storage)
continue;
+
 /* Built-in uniforms should not get any location. */
 if (uniforms[i].builtin)
continue;
--
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] glsl: Implement a SSBO load optimization pass

2015-10-20 Thread Francisco Jerez
Iago Toral  writes:

> On Tue, 2015-10-20 at 13:22 +0300, Francisco Jerez wrote:
>> Iago Toral Quiroga  writes:
>> 
>> > This allows us to re-use the results of previous ssbo loads in situations
>> > that are safe (i.e. when there are no stores, atomic operations or
>> > memory barriers in between).
>> >
>> > This is particularly useful for things like matrix multiplications, where
>> > for a mat4 buffer variable we cut the number of loads from 16 (4 reads of
>> > each column) down to 4 (1 read of each column).
>> >
>> > The pass can only cache ssbo loads that involve constant blocks and
>> > offsets, but could be extended to compare sub-expressions for these
>> > as well, similar to a CSE pass.
>> >
>> > The way the cache works is simple: ssbo loads with constant block/offset
>> > are included in a cache as they are seen. Stores invalidate cache entries.
>> > Stores with non-constant offset invalidate all cached loads for the block
>> > and stores with non-constant block invalidate all cache entries. There is
>> > room to improve this by using the actual variable name we are accessing to
>> > limit the entries that should be invalidated. We also need to invalidate
>> > cache entries when we exit the block in which they have been defined
>> > (i.e. inside if/else blocks or loops).
>> >
>> > The cache optimization is built as a separate pass, instead of merging it
>> > inside the lower_ubo_reference pass for a number of reasons:
>> >
>> > 1) The way we process assignments in visitors is that the LHS is
>> > processed before the RHS. This creates a problem for an optimization
>> > such as this when we do things like a = a + 1, since we would see the
>> > store before the read when the actual execution order is reversed.
>> > This could be fixed by re-implementing the logic in the visit_enter
>> > method for ir_assignment in lower_ubo_reference and then returning
>> > visit_continue_with_parent.
>> >
>> > 2) Some writes/reads need to be split into multiple smaller
>> > writes/reads, and we need to handle caching for each one. This happens
>> > deep inside the code that handles the lowering and some
>> > of the information we need to do this is not available. This could also
>> > be fixed by passing more data into the corresponding functions or by
>> > making this data available as class members, but the current implementation
>> > is already complex enough and  this would only contribute to the 
>> > complexity.
>> >
>> > 3) We can have ssbo loads in the LHS too (i.e. a[a[0]] = ..). In these 
>> > cases
>> > the current code in lower_uo_reference would see the store before the read.
>> > Probably fixable, but again would add more complexity to the lowering.
>> >
>> > On the other hand, a separate pass that runs after the lowering sees
>> > all the individal loads and stores in the correct order (so we don't need
>> > to do any tricks) and it allows us to sepearate the lowering logic (which
>> > is already complex) from the caching logic. It also gives us a chance to
>> > run it after other optimization passes have run and turned constant
>> > expressions for block/offset into constants, enabling more opportunities
>> > for caching.
>> 
>> Seems like a restricted form of CSE that only handles SSBO loads, and
>> only the ones with constant arguments.  Why don't we CSE these? (and
>> other memory access operations like image loads)
>
> There is not a CSE pass in GLSL IR any more so we would have to do it in
> NIR and some drivers would lose the optimization. Doing it at GLSL IR
> level seemed like a win from this perspective.
>
> Then there is the fact that we cannot just CSE these. We need to make
> sure that we only CSE them when it is safe to do so (i.e. no
> stores/atomics to the same offsets in between, no memory barriers, etc).
> The current CSE pass in NIR does not support this as far as I can see. I
> suppose that we could look into changing the pass to accommodate
> restrictions such as this if we think is worth it.
>
Not really sure if the NIR CSE pass would be adequate, but at least at
the i965 back-end level this could be handled easily in the CSE pass
(for typed and untyped surface read opcodes in general) in roughly the
same way that source variable interference is handled -- Just kill
potentially overlapping entries from the AEB whenever an atomic or write
instruction for the same surface is seen.

> Iago
>
>> 
>> > ---
>> >  src/glsl/Makefile.sources  |   1 +
>> >  src/glsl/ir_optimization.h |   1 +
>> >  src/glsl/opt_ssbo_load.cpp | 338 
>> > +
>> >  3 files changed, 340 insertions(+)
>> >  create mode 100644 src/glsl/opt_ssbo_load.cpp
>> >
>> > diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
>> > index ca87036..73c7514 100644
>> > --- a/src/glsl/Makefile.sources
>> > +++ b/src/glsl/Makefile.sources
>> > @@ -201,6 +201,7 @@ LIBGLSL_FILES = \
>> >opt_noop_swizzle.cpp \
>> >

[Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-20 Thread Neil Roberts
In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
message types because I found by experimentation that it doesn't work.
I wrote in the comment that I couldn't find any documentation for this
problem. However I've now found the documentation and it has
additional restrictions on further message types so this patch updates
the comment and adds the others.
---

That paragraph in the spec also mentions further restrictions that we
should probably worry about like that the shader shouldn't combine
this optimisation with any other render target data port read/writes.

It also has a fairly pessimistic note saying the optimisation is only
really good for large polygons in a GUI-like workload. I wonder
whether we should be doing some more benchmarking to decide whether
it's really a good idea to enable this as a general optimisation even
for games.

 src/mesa/drivers/dri/i965/brw_fs.cpp | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 49323eb..bf9ff84 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2238,13 +2238,14 @@ fs_visitor::opt_sampler_eot()
if (unlikely(tex_inst->is_head_sentinel()) || !tex_inst->is_tex())
   return false;
 
-   /* This optimisation doesn't seem to work for textureGather for some
-* reason. I can't find any documentation or known workarounds to indicate
-* that this is expected, but considering that it is probably pretty
-* unlikely that a shader would directly write out the results from
-* textureGather we might as well just disable it.
+   /* 3D Sampler » Messages » Message Format
+*
+* “Response Length of zero is allowed on all SIMD8* and SIMD16* sampler
+*  messages except sample+killpix, resinfo, sampleinfo, LOD, and gather4*”
 */
-   if (tex_inst->opcode == SHADER_OPCODE_TG4 ||
+   if (tex_inst->opcode == SHADER_OPCODE_TXS ||
+   tex_inst->opcode == SHADER_OPCODE_LOD ||
+   tex_inst->opcode == SHADER_OPCODE_TG4 ||
tex_inst->opcode == SHADER_OPCODE_TG4_OFFSET)
   return false;
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i915/aa: fixing anti-aliasing bug for thinnest width lines

2015-10-20 Thread Ville Syrjälä
On Tue, Oct 20, 2015 at 08:15:32AM +, Predut, Marius wrote:
> > -Original Message-
> > From: Ville Syrjälä [mailto:ville.syrj...@linux.intel.com]
> > Sent: Monday, October 19, 2015 6:04 PM
> > To: Predut, Marius
> > Cc: mesa-dev@lists.freedesktop.org; Iago Toral Quiroga
> > Subject: Re: [Mesa-dev] [PATCH] i915/aa: fixing anti-aliasing bug for 
> > thinnest
> > width lines
> > 
> > On Thu, Oct 15, 2015 at 06:03:34PM +0300, Ville Syrjälä wrote:
> > > On Thu, Oct 15, 2015 at 02:19:09PM +, Predut, Marius wrote:
> > > > > -Original Message-
> > > > > From: Ville Syrjälä [mailto:ville.syrj...@linux.intel.com]
> > > > > Sent: Wednesday, October 07, 2015 1:53 PM
> > > > > To: Predut, Marius
> > > > > Cc: mesa-dev@lists.freedesktop.org
> > > > > Subject: Re: [Mesa-dev] [PATCH] i915/aa: fixing anti-aliasing bug
> > > > > for thinnest width lines
> > > > >
> > > > > On Mon, Oct 05, 2015 at 07:55:24PM +0300, Marius Predut wrote:
> > > > > > On PNV platform, for 1 pixel line thickness or less, the general
> > > > > > anti-aliasing algorithm gives up, and a garbage line is generated.
> > > > > > Setting a Line Width of 0.0 specifies the rasterization of the
> > > > > > "thinnest" (one-pixel-wide), non-antialiased lines.
> > > > > > Lines rendered with zero Line Width are rasterized using Grid
> > > > > > Intersection Quantization rules as specified by
> > > > > > 2.8.4.1 Zero-Width (Cosmetic) Line Rasterization from volume 1f
> > > > > > of the
> > > > > > GEN3 docs.
> > > > > > The patch was tested on Intel Atom CPU N455.
> > > > > >
> > > > > > This patch follow the same rules as patches fixing the
> > > > > > https://bugs.freedesktop.org/show_bug.cgi?id=28832
> > > > > > bug.
> > > > > >
> > > > > > v1: Eduardo Lima Mitev:  Wrong indentation inside the if clause.
> > > > > > v2: Ian Romanick: comments fix.
> > > > > >
> > > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90367
> > > > > >
> > > > > > Signed-off-by: Marius Predut 
> > > > > > ---
> > > > > >  src/mesa/drivers/dri/i915/i915_state.c | 15 +++
> > > > > >  1 file changed, 15 insertions(+)
> > > > > >
> > > > > > diff --git a/src/mesa/drivers/dri/i915/i915_state.c
> > > > > > b/src/mesa/drivers/dri/i915/i915_state.c
> > > > > > index 4c83073..897eb59 100644
> > > > > > --- a/src/mesa/drivers/dri/i915/i915_state.c
> > > > > > +++ b/src/mesa/drivers/dri/i915/i915_state.c
> > > > > > @@ -599,6 +599,21 @@ i915LineWidth(struct gl_context * ctx,
> > > > > > GLfloat
> > > > > > widthf)
> > > > > >
> > > > > > width = (int) (widthf * 2);
> > > > > > width = CLAMP(width, 1, 0xf);
> > > > > > +
> > > > > > +   if (ctx->Line.Width < 1.5 || widthf < 1.5) {
> > > > > > + /* For 1 pixel line thickness or less, the general
> > > > > > +  * anti-aliasing algorithm gives up, and a garbage line is
> > > > > > +  * generated.  Setting a Line Width of 0.0 specifies the
> > > > > > +  * rasterization of the "thinnest" (one-pixel-wide),
> > > > > > +  * non-antialiased lines.
> > > > > > +  *
> > > > > > +  * Lines rendered with zero Line Width are rasterized using
> > > > > > +  * Grid Intersection Quantization rules as specified by
> > > > > > +  * volume 1f of the GEN3 docs,
> > > > > > +  * 2.8.4.1 Zero-Width (Cosmetic) Line Rasterization.
> > > > > > +  */
> > > > > > +  width = 0;
> > > > > > +   }
> > > > >
> > > > > I went to do some spec reading, and while I can't confirm the AA
> > > > > <= 1.0 problem (no mention in the spec about such things), I can
> > > > > see this fix alone isn't sufficient to satisfy the spec (we lack
> > > > > the round to nearest integer for non-aa for instance).
> > > >
> > > > Ville ,Thanks for review!
> > > > On this seem not too much docs, here can use experiments or docs for 
> > > > next
> > GEN+.
> > > >
> > > > >
> > > > > I think what we'd want is a small helper. i965 has one, although
> > > > > that one looks quite messy. I think this is how I'd write the
> > > > > helper for
> > > > > i915:
> > > > >
> > > > > unsigned intel_line_width(ctx)
> > > > > {
> > > > >   float line_width = ctx->Line.Width;
> > > > >
> > > > >   if (ctx->Line.SmoothFlag)
> > > > >   line_width = CLAMP(line_width, MinAA, MaxAA);
> > > > >   else
> > > > >   line_width = CLAMP(roundf(line_width), Min, Max);
> > > > >
> > > > >   /*
> > > > >* blah
> > > > >*/
> > > > >   if (line_width < 1.5f)
> > > > >   line_width = 0.0f
> > > > >
> > > > >   return U_FIXED(line_width, 1);
> > > > > }
> > > > >
> > > > > and then use it for both gen2 and gen3 state setup.
> > > >
> > > > Do you used this and it works for you? (I mean if you did a test on
> > > > your PNV platform)
> > >
> > > Didn't do any actual testing yet. I've been meaning to, but just been
> > > too busy with other stuff. I can try to test on pnv today, and maybe
> > > on
> > > 830 and 85x on the 

Re: [Mesa-dev] [PATCH 1/2] glsl: Implement a SSBO load optimization pass

2015-10-20 Thread Iago Toral
On Tue, 2015-10-20 at 14:18 +0300, Francisco Jerez wrote:
> Iago Toral  writes:
> 
> > On Tue, 2015-10-20 at 13:22 +0300, Francisco Jerez wrote:
> >> Iago Toral Quiroga  writes:
> >> 
> >> > This allows us to re-use the results of previous ssbo loads in situations
> >> > that are safe (i.e. when there are no stores, atomic operations or
> >> > memory barriers in between).
> >> >
> >> > This is particularly useful for things like matrix multiplications, where
> >> > for a mat4 buffer variable we cut the number of loads from 16 (4 reads of
> >> > each column) down to 4 (1 read of each column).
> >> >
> >> > The pass can only cache ssbo loads that involve constant blocks and
> >> > offsets, but could be extended to compare sub-expressions for these
> >> > as well, similar to a CSE pass.
> >> >
> >> > The way the cache works is simple: ssbo loads with constant block/offset
> >> > are included in a cache as they are seen. Stores invalidate cache 
> >> > entries.
> >> > Stores with non-constant offset invalidate all cached loads for the block
> >> > and stores with non-constant block invalidate all cache entries. There is
> >> > room to improve this by using the actual variable name we are accessing 
> >> > to
> >> > limit the entries that should be invalidated. We also need to invalidate
> >> > cache entries when we exit the block in which they have been defined
> >> > (i.e. inside if/else blocks or loops).
> >> >
> >> > The cache optimization is built as a separate pass, instead of merging it
> >> > inside the lower_ubo_reference pass for a number of reasons:
> >> >
> >> > 1) The way we process assignments in visitors is that the LHS is
> >> > processed before the RHS. This creates a problem for an optimization
> >> > such as this when we do things like a = a + 1, since we would see the
> >> > store before the read when the actual execution order is reversed.
> >> > This could be fixed by re-implementing the logic in the visit_enter
> >> > method for ir_assignment in lower_ubo_reference and then returning
> >> > visit_continue_with_parent.
> >> >
> >> > 2) Some writes/reads need to be split into multiple smaller
> >> > writes/reads, and we need to handle caching for each one. This happens
> >> > deep inside the code that handles the lowering and some
> >> > of the information we need to do this is not available. This could also
> >> > be fixed by passing more data into the corresponding functions or by
> >> > making this data available as class members, but the current 
> >> > implementation
> >> > is already complex enough and  this would only contribute to the 
> >> > complexity.
> >> >
> >> > 3) We can have ssbo loads in the LHS too (i.e. a[a[0]] = ..). In these 
> >> > cases
> >> > the current code in lower_uo_reference would see the store before the 
> >> > read.
> >> > Probably fixable, but again would add more complexity to the lowering.
> >> >
> >> > On the other hand, a separate pass that runs after the lowering sees
> >> > all the individal loads and stores in the correct order (so we don't need
> >> > to do any tricks) and it allows us to sepearate the lowering logic (which
> >> > is already complex) from the caching logic. It also gives us a chance to
> >> > run it after other optimization passes have run and turned constant
> >> > expressions for block/offset into constants, enabling more opportunities
> >> > for caching.
> >> 
> >> Seems like a restricted form of CSE that only handles SSBO loads, and
> >> only the ones with constant arguments.  Why don't we CSE these? (and
> >> other memory access operations like image loads)
> >
> > There is not a CSE pass in GLSL IR any more so we would have to do it in
> > NIR and some drivers would lose the optimization. Doing it at GLSL IR
> > level seemed like a win from this perspective.
> >
> > Then there is the fact that we cannot just CSE these. We need to make
> > sure that we only CSE them when it is safe to do so (i.e. no
> > stores/atomics to the same offsets in between, no memory barriers, etc).
> > The current CSE pass in NIR does not support this as far as I can see. I
> > suppose that we could look into changing the pass to accommodate
> > restrictions such as this if we think is worth it.
> >
> Not really sure if the NIR CSE pass would be adequate, but at least at
> the i965 back-end level this could be handled easily in the CSE pass
> (for typed and untyped surface read opcodes in general) in roughly the
> same way that source variable interference is handled -- Just kill
> potentially overlapping entries from the AEB whenever an atomic or write
> instruction for the same surface is seen.

I didn't think of solving this for i965 only, but if we can get
non-constant block/offset handled easily it is definitely worth a try.
I'll have a go at it.

Even if we do that for i965, I wonder if we still want to have something
like this at GLSL IR though.

Iago

> > Iago
> >
> >> 
> >> > ---
> >> >  

Re: [Mesa-dev] [PATCH 1/3] glsl: skip buffer variables when filling UniformRemapTable

2015-10-20 Thread Lofstedt, Marta

If you also remove the uniforms[i].array_elements in the same manner, the 
assert in:
ES31-CTS.explicit_uniform_location.uniform-loc-arrays-of-arrays 
Is also avoided.

> -Original Message-
> From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
> Behalf Of Tapani Pälli
> Sent: Tuesday, October 20, 2015 11:24 AM
> To: mesa-dev@lists.freedesktop.org
> Subject: [Mesa-dev] [PATCH 1/3] glsl: skip buffer variables when filling
> UniformRemapTable
> 
> UniformRemapTable is used only for remapping user specified uniform
> locations to driver internally used ones, shader storage buffer variables
> should not utilize uniform locations.
> 
> Signed-off-by: Tapani Pälli 
> ---
>  src/glsl/link_uniforms.cpp | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp index
> fe00aa3..f7b87a1 100644
> --- a/src/glsl/link_uniforms.cpp
> +++ b/src/glsl/link_uniforms.cpp
> @@ -1180,7 +1180,8 @@ link_assign_uniform_locations(struct
> gl_shader_program *prog,
> 
> /* Reserve all the explicit locations of the active uniforms. */
> for (unsigned i = 0; i < num_uniforms; i++) {
> -  if (uniforms[i].type->is_subroutine())
> +  if (uniforms[i].type->is_subroutine() ||
> +  uniforms[i].is_shader_storage)
>   continue;
> 
>if (uniforms[i].remap_location != UNMAPPED_UNIFORM_LOC) { @@ -
> 1200,8 +1201,10 @@ link_assign_uniform_locations(struct
> gl_shader_program *prog,
> /* Reserve locations for rest of the uniforms. */
> for (unsigned i = 0; i < num_uniforms; i++) {
> 
> -  if (uniforms[i].type->is_subroutine())
> +  if (uniforms[i].type->is_subroutine() ||
> +  uniforms[i].is_shader_storage)
>   continue;
> +
>/* Built-in uniforms should not get any location. */
>if (uniforms[i].builtin)
>   continue;
> --
> 2.4.3
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFCv0 0/8] gallium: add support for NIR as alternate IR

2015-10-20 Thread Marek Olšák
On Tue, Oct 20, 2015 at 3:51 AM, Rob Clark  wrote:
> On Mon, Oct 19, 2015 at 8:53 PM, Marek Olšák  wrote:
>> Yes, drivers need to "dup" tokens if they want to keep them. Their
>> lifetime is the create function and that's it. st/mesa only keeps the
>> tokens for vertex shaders, because it needs for emulating some
>> features with the Draw module.
>
> It is something that I think would be worth revisiting, since the
> common case is that drivers have to dup the tokens to do their own
> variant mgmt.. so that seems like the better case to optimize for.  I
> think the only one that isn't doing it's own variant mgmt is nouveau,
> but Ilia mentioned at some point that there where some edge cases
> where they really do need a variant.

This would add more cruft into state trackers and modules. Doing "dup"
in the driver is the easiest way to handle this.

Also, drivers should be able to release any incoming IR if they don't
need it. Including NIR. Having a requirement that some IR must be in
memory for the lifetime of shaders is just overall a bad idea.

>
> it would be a lot of churn to change this for the tgsi case, but I
> still think worthwhile.  Either way, I think we should at least
> consider this while introducing the possibility for other IR's (since
> cloning llvm or nir, for example, would be more expensive than cloning
> tgsi)

For more complex IRs and *if* drivers want to keep them (some really
don't), you need reference counting on the whole IR, so that state
trackers can unreference it and drivers can keep it.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] glsl: skip buffer variables when filling UniformRemapTable

2015-10-20 Thread Samuel Iglesias Gonsálvez
Series is

Reviewed-by: Samuel Iglesias Gonsálvez 

Thanks,

Sam

On 20/10/15 11:24, Tapani Pälli wrote:
> UniformRemapTable is used only for remapping user specified uniform
> locations to driver internally used ones, shader storage buffer
> variables should not utilize uniform locations.
> 
> Signed-off-by: Tapani Pälli 
> ---
>  src/glsl/link_uniforms.cpp | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp
> index fe00aa3..f7b87a1 100644
> --- a/src/glsl/link_uniforms.cpp
> +++ b/src/glsl/link_uniforms.cpp
> @@ -1180,7 +1180,8 @@ link_assign_uniform_locations(struct gl_shader_program 
> *prog,
>  
> /* Reserve all the explicit locations of the active uniforms. */
> for (unsigned i = 0; i < num_uniforms; i++) {
> -  if (uniforms[i].type->is_subroutine())
> +  if (uniforms[i].type->is_subroutine() ||
> +  uniforms[i].is_shader_storage)
>   continue;
>  
>if (uniforms[i].remap_location != UNMAPPED_UNIFORM_LOC) {
> @@ -1200,8 +1201,10 @@ link_assign_uniform_locations(struct gl_shader_program 
> *prog,
> /* Reserve locations for rest of the uniforms. */
> for (unsigned i = 0; i < num_uniforms; i++) {
>  
> -  if (uniforms[i].type->is_subroutine())
> +  if (uniforms[i].type->is_subroutine() ||
> +  uniforms[i].is_shader_storage)
>   continue;
> +
>/* Built-in uniforms should not get any location. */
>if (uniforms[i].builtin)
>   continue;
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 9/9] i965: Mark compacted 3-src instructions as Gen8+.

2015-10-20 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Mon, 2015-10-19 at 21:09 -0700, Matt Turner wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_inst.h | 32 
>  1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_inst.h 
> b/src/mesa/drivers/dri/i965/brw_inst.h
> index 524a4fb..db05a8a 100644
> --- a/src/mesa/drivers/dri/i965/brw_inst.h
> +++ b/src/mesa/drivers/dri/i965/brw_inst.h
> @@ -804,24 +804,24 @@ F(opcode,6,  0) /* Same location as 
> brw_inst */
>   * (Gen8+) Compacted three-source instructions:
>   *  @{
>   */
> -F(3src_src2_reg_nr,63, 57)
> -F(3src_src1_reg_nr,56, 50)
> -F(3src_src0_reg_nr,49, 43)
> -F(3src_src2_subreg_nr, 42, 40)
> -F(3src_src1_subreg_nr, 39, 37)
> -F(3src_src0_subreg_nr, 36, 34)
> -F(3src_src2_rep_ctrl,  33, 33)
> -F(3src_src1_rep_ctrl,  32, 32)
> -F(3src_saturate,   31, 31)
> -F(3src_debug_control,  30, 30)
> -F(3src_cmpt_control,   29, 29)
> -F(3src_src0_rep_ctrl,  28, 28)
> +FC(3src_src2_reg_nr,63, 57, devinfo->gen >= 8)
> +FC(3src_src1_reg_nr,56, 50, devinfo->gen >= 8)
> +FC(3src_src0_reg_nr,49, 43, devinfo->gen >= 8)
> +FC(3src_src2_subreg_nr, 42, 40, devinfo->gen >= 8)
> +FC(3src_src1_subreg_nr, 39, 37, devinfo->gen >= 8)
> +FC(3src_src0_subreg_nr, 36, 34, devinfo->gen >= 8)
> +FC(3src_src2_rep_ctrl,  33, 33, devinfo->gen >= 8)
> +FC(3src_src1_rep_ctrl,  32, 32, devinfo->gen >= 8)
> +FC(3src_saturate,   31, 31, devinfo->gen >= 8)
> +FC(3src_debug_control,  30, 30, devinfo->gen >= 8)
> +FC(3src_cmpt_control,   29, 29, devinfo->gen >= 8)
> +FC(3src_src0_rep_ctrl,  28, 28, devinfo->gen >= 8)
>  /* Reserved */
> -F(3src_dst_reg_nr, 18, 12)
> -F(3src_source_index,   11, 10)
> -F(3src_control_index,   9,  8)
> +FC(3src_dst_reg_nr, 18, 12, devinfo->gen >= 8)
> +FC(3src_source_index,   11, 10, devinfo->gen >= 8)
> +FC(3src_control_index,   9,  8, devinfo->gen >= 8)
>  /* Bit 7 is Reserved (for future Opcode expansion) */
> -F(3src_opcode,  6,  0)
> +FC(3src_opcode,  6,  0, devinfo->gen >= 8)
>  /** @} */
>  
>  #undef F


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 8/9] i965: Add const to brw_compact_inst_bits.

2015-10-20 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga 

On Mon, 2015-10-19 at 21:09 -0700, Matt Turner wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_inst.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_inst.h 
> b/src/mesa/drivers/dri/i965/brw_inst.h
> index 819ce59..524a4fb 100644
> --- a/src/mesa/drivers/dri/i965/brw_inst.h
> +++ b/src/mesa/drivers/dri/i965/brw_inst.h
> @@ -739,7 +739,7 @@ typedef struct {
>   * Bits indices range from 0..63.
>   */
>  static inline unsigned
> -brw_compact_inst_bits(brw_compact_inst *inst, unsigned high, unsigned low)
> +brw_compact_inst_bits(const brw_compact_inst *inst, unsigned high, unsigned 
> low)
>  {
> const uint64_t mask = (1ull << (high - low + 1)) - 1;
>  
> @@ -774,7 +774,7 @@ brw_compact_inst_set_##name(const struct brw_device_info 
> *devinfo, \
>  }  \
>  static inline unsigned \
>  brw_compact_inst_##name(const struct brw_device_info *devinfo, \
> -brw_compact_inst *inst)\
> +const brw_compact_inst *inst)  \
>  {  \
> assert(assertions); \
> (void) devinfo; \


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFCv0 0/8] gallium: add support for NIR as alternate IR

2015-10-20 Thread Rob Clark
On Tue, Oct 20, 2015 at 6:49 AM, Marek Olšák  wrote:
> On Tue, Oct 20, 2015 at 3:51 AM, Rob Clark  wrote:
>> On Mon, Oct 19, 2015 at 8:53 PM, Marek Olšák  wrote:
>>> Yes, drivers need to "dup" tokens if they want to keep them. Their
>>> lifetime is the create function and that's it. st/mesa only keeps the
>>> tokens for vertex shaders, because it needs for emulating some
>>> features with the Draw module.
>>
>> It is something that I think would be worth revisiting, since the
>> common case is that drivers have to dup the tokens to do their own
>> variant mgmt.. so that seems like the better case to optimize for.  I
>> think the only one that isn't doing it's own variant mgmt is nouveau,
>> but Ilia mentioned at some point that there where some edge cases
>> where they really do need a variant.
>
> This would add more cruft into state trackers and modules. Doing "dup"
> in the driver is the easiest way to handle this.
>
> Also, drivers should be able to release any incoming IR if they don't
> need it. Including NIR. Having a requirement that some IR must be in
> memory for the lifetime of shaders is just overall a bad idea.

If you need to create variants, you need to hang on to the IR until
all possible variants are created (ie. forever)

But if we are changing things, we could also just redefine things that
the driver takes ownership of the IR and frees it whenever it feels
like it.

>>
>> it would be a lot of churn to change this for the tgsi case, but I
>> still think worthwhile.  Either way, I think we should at least
>> consider this while introducing the possibility for other IR's (since
>> cloning llvm or nir, for example, would be more expensive than cloning
>> tgsi)
>
> For more complex IRs and *if* drivers want to keep them (some really
> don't), you need reference counting on the whole IR, so that state
> trackers can unreference it and drivers can keep it.

It could be an idea..  we'd have to track down and clean up all the
various different ways that tgsi is freed
(tgsi_free_tokens()/ureg_free_tokens().. maybe some places are just
calling free() directly?).  For NIR, everyone just calls ralloc_free()
directly, but there aren't so many users yet so I guess not too hard
to track down.

I guess at least for NIR (since the IR is mutable) some minor smarts
would be needed when driver wants a writeable copy to see if there is
only a single outstanding reference to the IR (and clone otherwise).

BR,
-R
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] glsl: skip buffer variables when filling UniformRemapTable

2015-10-20 Thread Lofstedt, Marta
> -Original Message-
> From: Palli, Tapani
> Sent: Tuesday, October 20, 2015 12:55 PM
> To: Lofstedt, Marta; mesa-dev@lists.freedesktop.org
> Subject: Re: [Mesa-dev] [PATCH 1/3] glsl: skip buffer variables when filling
> UniformRemapTable
> 
> 
> 
> On 10/20/2015 01:40 PM, Lofstedt, Marta wrote:
> >> -Original Message-
> >> From: Palli, Tapani
> >> Sent: Tuesday, October 20, 2015 12:25 PM
> >> To: Lofstedt, Marta; mesa-dev@lists.freedesktop.org
> >> Subject: Re: [Mesa-dev] [PATCH 1/3] glsl: skip buffer variables when
> >> filling UniformRemapTable
> >>
> >>
> >>
> >> On 10/20/2015 01:11 PM, Lofstedt, Marta wrote:
> >>>
> >>> If you also remove the uniforms[i].array_elements in the same
> >>> manner,
> >> the assert in:
> >>> ES31-CTS.explicit_uniform_location.uniform-loc-arrays-of-arrays
> >>> Is also avoided.
> >>
> >> Are you referring to one of the loops iterating uniforms in this same
> >> place the patch is touching?
> >
> > Yes, I hit the same assert in "uniform-loc-structs" and "uniform-loc-arrays-
> of-arrays", with your patchset I no longer hit the assert for the 
> "uniform-loc-
> structs" tests.
> >
> > If I also remove array_elements I also avoid the assert in "uniform-loc-
> arrays-of-arrays". This is just an observation, not a suggestion of a fix.
> 
> Right, I was not sure at first what you meant by 'removing array_elements'. I
> was hoping arrays-of-arrays to be the same case as with loc-structs but
> unfortunately seems quite different, we just lack some support for arrays of
> arrays here. Will dig more!

The arrays test is another problem...
This patch-set solves the problem it set out to solve, so I consider the whole 
patchset:

Reviewed-by: Marta Lofstedt 

> 
> 
> >>
> >> For me it looks like the problem with arrays-of-arrays test is that
> >> it will assign overlapping locations currently for uniforms if we have 
> >> 'arrays
> of arrays'
> >> situation going on.
> >>
> >> For example with
> >>
> >> layout(location = 0) uniform float uni[2][2]
> >>
> >> uni[0] gets location 0, but uni[1] has gets location 0 too. I haven't
> >> quite figured out yet where the issue is but the location counter
> >> gets reset for each array item and starts again from 0 while it
> >> should actually maintain its count ..
> >>
> >>
>  -Original Message-
>  From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On
>  Behalf Of Tapani Pälli
>  Sent: Tuesday, October 20, 2015 11:24 AM
>  To: mesa-dev@lists.freedesktop.org
>  Subject: [Mesa-dev] [PATCH 1/3] glsl: skip buffer variables when
>  filling UniformRemapTable
> 
>  UniformRemapTable is used only for remapping user specified uniform
>  locations to driver internally used ones, shader storage buffer
>  variables should not utilize uniform locations.
> 
>  Signed-off-by: Tapani Pälli 
>  ---
> src/glsl/link_uniforms.cpp | 7 +--
> 1 file changed, 5 insertions(+), 2 deletions(-)
> 
>  diff --git a/src/glsl/link_uniforms.cpp
>  b/src/glsl/link_uniforms.cpp index
>  fe00aa3..f7b87a1 100644
>  --- a/src/glsl/link_uniforms.cpp
>  +++ b/src/glsl/link_uniforms.cpp
>  @@ -1180,7 +1180,8 @@ link_assign_uniform_locations(struct
>  gl_shader_program *prog,
> 
>    /* Reserve all the explicit locations of the active uniforms. */
>    for (unsigned i = 0; i < num_uniforms; i++) {
>  -  if (uniforms[i].type->is_subroutine())
>  +  if (uniforms[i].type->is_subroutine() ||
>  +  uniforms[i].is_shader_storage)
>  continue;
> 
>   if (uniforms[i].remap_location != UNMAPPED_UNIFORM_LOC) {
>  @@
>  -
>  1200,8 +1201,10 @@ link_assign_uniform_locations(struct
>  gl_shader_program *prog,
>    /* Reserve locations for rest of the uniforms. */
>    for (unsigned i = 0; i < num_uniforms; i++) {
> 
>  -  if (uniforms[i].type->is_subroutine())
>  +  if (uniforms[i].type->is_subroutine() ||
>  +  uniforms[i].is_shader_storage)
>  continue;
>  +
>   /* Built-in uniforms should not get any location. */
>   if (uniforms[i].builtin)
>  continue;
>  --
>  2.4.3
> 
>  ___
>  mesa-dev mailing list
>  mesa-dev@lists.freedesktop.org
>  http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92552] [softpipe] piglit egl-create-context-valid-flag-forward-compatible-gl regression

2015-10-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92552

Bug ID: 92552
   Summary: [softpipe] piglit
egl-create-context-valid-flag-forward-compatible-gl
regression
   Product: Mesa
   Version: git
  Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
  Keywords: bisected, regression
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: v...@freedesktop.org
QA Contact: mesa-dev@lists.freedesktop.org
CC: chad.vers...@intel.com, emil.l.veli...@gmail.com,
stu_...@126.com, ystree...@gmail.com

mesa: 86ccb2a16f6d21be29cd99d38831eab6079ce107 (master 11.1.0-devel)

$ ./bin/egl-create-context-valid-flag-forward-compatible-gl -auto
Unexpected EGL error: EGL_BAD_ATTRIBUTE 0x3004
Expected EGL error: EGL_BAD_MATCH 0x3009
PIGLIT: {"result": "fail" }

11cabc45b7124e51d5ead42db6dceb5a3755266b is the first bad commit
commit 11cabc45b7124e51d5ead42db6dceb5a3755266b
Author: Matthew Waters 
Date:   Mon Sep 14 18:35:45 2015 +0100

egl: rework handling EGL_CONTEXT_FLAGS

As of version 15 of the EGL_KHR_create_context spec, debug contexts
are allowed for ES contexts.  We should allow creation instead of
erroring.

While we're here provide a more comprehensive checking for the other two
flags - ROBUST_ACCESS_BIT_KHR and FORWARD_COMPATIBLE_BIT_KHR

v2 [Emil Velikov] Rebase. Minor tweak in commit message.

Cc: Boyan Ding 
Cc: Chad Versace 
Cc: "10.6 11.0" 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91044
Signed-off-by: Matthew Waters 
Signed-off-by: Emil Velikov 

:04 04 b134e418a0c11e0fecde8623388be853cddd62c7
217dd9b7d9475918f9ac20a3592bffc399a87319 Msrc
bisect run success

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeonsi: add support for ARB_texture_view

2015-10-20 Thread Michel Dänzer
On 17.10.2015 22:09, Marek Olšák wrote:
> From: Marek Olšák 
> 
> All tests pass. We don't need to do much - just set CUBE if the view
> target is CUBE or CUBE_ARRAY, otherwise set the resource target.
> 
> The reason this is so simple can be that texture instructions
> have a greater effect on the target than the sampler view.

I think you meant to swap "is" and "can be": "The reason this can be so
simple is that texture instructions have a greater effect on the target
than the sampler view."

BTW, is that statement really accurate? Isn't it rather because the
format and mip level clamping was already handled via the sampler view
before?


Anyway, if you fix at least the switcheroo above,

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Disable CSE optimization for untyped & typed surface reads

2015-10-20 Thread Iago Toral
On Tue, 2015-10-20 at 00:12 -0700, Jordan Justen wrote:
> An untyped surface read is volatile because it might be affected by a
> write.
> 
> In the ES31-CTS.compute_shader.resources-max test, two back to back
> read/modify/writes of an SSBO variable looked something like this:
> 
>   r1 = untyped_surface_read(ssbo_float)
>   r2 = r1 + 1
>   untyped_surface_write(ssbo_float, r2)
>   r3 = untyped_surface_read(ssbo_float)
>   r4 = r3 + 1
>   untyped_surface_write(ssbo_float, r4)
> 
> And after CSE, we had:
> 
>   r1 = untyped_surface_read(ssbo_float)
>   r2 = r1 + 1
>   untyped_surface_write(ssbo_float, r2)
>   r4 = r1 + 1
>   untyped_surface_write(ssbo_float, r4)

Yeah, we cannot do CSE with SSBO loads. Patch looks good to me, but we
should do the same in the vec4 CSE pass.

Iago

> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_cse.cpp |  3 ++-
>  src/mesa/drivers/dri/i965/brw_shader.cpp | 14 ++
>  src/mesa/drivers/dri/i965/brw_shader.h   |  6 ++
>  3 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
> index c7628dc..3a28c8d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
> @@ -93,7 +93,8 @@ is_expression(const fs_visitor *v, const fs_inst *const 
> inst)
> case SHADER_OPCODE_LOAD_PAYLOAD:
>return !inst->is_copy_payload(v->alloc);
> default:
> -  return inst->is_send_from_grf() && !inst->has_side_effects();
> +  return inst->is_send_from_grf() && !inst->has_side_effects() &&
> + !inst->is_volatile();
> }
>  }
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
> b/src/mesa/drivers/dri/i965/brw_shader.cpp
> index 2324b56..be911ed 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> @@ -969,6 +969,20 @@ backend_instruction::has_side_effects() const
> }
>  }
>  
> +bool
> +backend_instruction::is_volatile() const
> +{
> +   switch (opcode) {
> +   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
> +   case SHADER_OPCODE_UNTYPED_SURFACE_READ_LOGICAL:
> +   case SHADER_OPCODE_TYPED_SURFACE_READ:
> +   case SHADER_OPCODE_TYPED_SURFACE_READ_LOGICAL:
> +  return true;
> +   default:
> +  return false;
> +   }
> +}
> +
>  #ifndef NDEBUG
>  static bool
>  inst_is_in_block(const bblock_t *block, const backend_instruction *inst)
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
> b/src/mesa/drivers/dri/i965/brw_shader.h
> index b33b08f..35ee210 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.h
> +++ b/src/mesa/drivers/dri/i965/brw_shader.h
> @@ -115,6 +115,12 @@ struct backend_instruction : public exec_node {
>  * optimize these out unless you know what you are doing.
>  */
> bool has_side_effects() const;
> +
> +   /**
> +* True if the instruction might be affected by side effects of other
> +* instructions.
> +*/
> +   bool is_volatile() const;
>  #else
>  struct backend_instruction {
> struct exec_node link;


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] i965: Don't use message headers for typed/untyped reads

2015-10-20 Thread Jordan Justen
On 2015-10-19 12:23:27, Kristian Høgsberg wrote:
> On Mon, Oct 19, 2015 at 3:19 AM, Francisco Jerez  
> wrote:
> > Hey Kristian,
> >
> > Kristian Høgsberg Kristensen  writes:
> >
> >> We always set the mask to 0x, which is what it defaults to when no
> >> header is present. Let's drop the header instead.
> >>
> >> Signed-off-by: Kristian Høgsberg Kristensen 
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_eu_emit.c | 3 +--
> >>  src/mesa/drivers/dri/i965/brw_fs.cpp| 4 ++--
> >>  2 files changed, 3 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
> >> b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> >> index bf2fee9..ebd811f 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> >> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> >> @@ -2906,11 +2906,10 @@ brw_untyped_surface_read(struct brw_codegen *p,
> >> const unsigned sfid = (devinfo->gen >= 8 || devinfo->is_haswell ?
> >>HSW_SFID_DATAPORT_DATA_CACHE_1 :
> >>GEN7_SFID_DATAPORT_DATA_CACHE);
> >> -   const bool align1 = (brw_inst_access_mode(devinfo, p->current) == 
> >> BRW_ALIGN_1);
> >> struct brw_inst *insn = brw_send_indirect_surface_message(
> >>p, sfid, dst, payload, surface, msg_length,
> >>brw_surface_payload_size(p, num_channels, true, true),
> >> -  align1);
> >> +  false);
> >>
> >> brw_set_dp_untyped_surface_read_message(
> >>p, insn, num_channels);
> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> >> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> index a2fd441..457bf59 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> @@ -4032,7 +4032,7 @@ fs_visitor::lower_logical_sends()
> >>case SHADER_OPCODE_UNTYPED_SURFACE_READ_LOGICAL:
> >>   lower_surface_logical_send(ibld, inst,
> >>  SHADER_OPCODE_UNTYPED_SURFACE_READ,
> >> -fs_reg(0x));
> >> +fs_reg());
> >>   break;
> >>
> >>case SHADER_OPCODE_UNTYPED_SURFACE_WRITE_LOGICAL:
> >> @@ -4050,7 +4050,7 @@ fs_visitor::lower_logical_sends()
> >>case SHADER_OPCODE_TYPED_SURFACE_READ_LOGICAL:
> >>   lower_surface_logical_send(ibld, inst,
> >>  SHADER_OPCODE_TYPED_SURFACE_READ,
> >> -fs_reg(0x));
> >> +fs_reg());
> >>   break;
> >>
> >
> > Whatever you do here must be in agreement with the generator and whether
> > it sets the header-present bit in the message descriptor or not,
> > otherwise you're likely to get a hang or misrendering (I guess Jenkins
> > would've caught this).
> >
> > However, according to my hardware docs "Typed messages (which go to the
> > render cache data port) must include the header.".  There's no mention
> > of which generation that restriction applies to, but I believe it
> > applies to IVB/VLV *only*, which is the only generation in which typed
> > surface messages are handled by the render cache instead of by the data
> > cache -- The parenthesized comment doesn't quite make sense on newer
> > gens.  A quick test shows no piglit regressions on HSW after removing
> > the header from typed surface read messages.
> >
> > I guess there are two things you could do, I'm okay with either:
> >
> >  - Just drop this hunk in order to stick to the letter of the BSpec and
> >always pass a header together with typed surface read messages.  I
> >have the strong suspicion though that the docs are just being
> >inaccurate and the header is in fact unnecessary on HSW+.  No need to
> >resend if you decide to go down this path, if you just drop the
> >change for typed reads feel free to include my:
> >
> >Reviewed-by: Francisco Jerez 
> 
> You're right, that should not have been there. I did get some image
> load/store regressions from this (on BDW as well) that I only noticed
> this morning. Backing out this change fixes it, thanks.

With this change:

Reviewed-by: Jordan Justen 

and, at least with CS,

Tested-by: Jordan Justen 

> >  - Pass 'gen == 7 ? fs_reg(0x) : fs_reg()' as sample_mask argument
> >to lower_surface_logical_send() in the TYPED_SURFACE_READ case.  For
> >this to work you'll also have to change the generator code to pass
> >the fs_inst::header_size field down to brw_typed_surface_read() so it
> >knows whether the header is present or not.
> >
> >>case SHADER_OPCODE_TYPED_SURFACE_WRITE_LOGICAL:
> >> --
> >> 2.6.2
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/9] i965/vec4: Don't emit MOVs for unused URB slots.

2015-10-20 Thread Iago Toral
On Mon, 2015-10-19 at 21:08 -0700, Matt Turner wrote:
> Otherwise we'd emit a MOV from the null register (which isn't allowed).
> 
> Helps 24 programs in shader-db (the geometry shaders in GSCloth):
> 
> instructions in affected programs: 302 -> 262 (-13.25%)

Makes sense to me,

Reviewed-by: Iago Toral Quiroga 

Out of curiosity, does moving from the null register have any actual
consequences?

Iago

> ---
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp| 18 +-
>  src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp |  2 +-
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index f891910..c39f97e 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -1221,6 +1221,9 @@ vec4_visitor::emit_untyped_surface_read(unsigned 
> surf_index, dst_reg dst,
>  void
>  vec4_visitor::emit_ndc_computation()
>  {
> +   if (output_reg[VARYING_SLOT_POS].file == BAD_FILE)
> +  return;
> +
> /* Get the position */
> src_reg pos = src_reg(output_reg[VARYING_SLOT_POS]);
>  
> @@ -1286,7 +1289,8 @@ vec4_visitor::emit_psiz_and_flags(dst_reg reg)
> * Later, clipping will detect ucp[6] and ensure the primitive is
> * clipped against all fixed planes.
> */
> -  if (devinfo->has_negative_rhw_bug) {
> +  if (devinfo->has_negative_rhw_bug &&
> +  output_reg[BRW_VARYING_SLOT_NDC].file != BAD_FILE) {
>   src_reg ndc_w = src_reg(output_reg[BRW_VARYING_SLOT_NDC]);
>   ndc_w.swizzle = BRW_SWIZZLE_;
>   emit(CMP(dst_null_f(), ndc_w, src_reg(0.0f), BRW_CONDITIONAL_L));
> @@ -1334,8 +1338,10 @@ vec4_visitor::emit_generic_urb_slot(dst_reg reg, int 
> varying)
> assert(varying < VARYING_SLOT_MAX);
> assert(output_reg[varying].type == reg.type);
> current_annotation = output_reg_annotation[varying];
> -   /* Copy the register, saturating if necessary */
> -   return emit(MOV(reg, src_reg(output_reg[varying])));
> +   if (output_reg[varying].file != BAD_FILE)
> +  return emit(MOV(reg, src_reg(output_reg[varying])));
> +   else
> +  return NULL;
>  }
>  
>  void
> @@ -1354,11 +1360,13 @@ vec4_visitor::emit_urb_slot(dst_reg reg, int varying)
> }
> case BRW_VARYING_SLOT_NDC:
>current_annotation = "NDC";
> -  emit(MOV(reg, src_reg(output_reg[BRW_VARYING_SLOT_NDC])));
> +  if (output_reg[BRW_VARYING_SLOT_NDC].file != BAD_FILE)
> + emit(MOV(reg, src_reg(output_reg[BRW_VARYING_SLOT_NDC])));
>break;
> case VARYING_SLOT_POS:
>current_annotation = "gl_Position";
> -  emit(MOV(reg, src_reg(output_reg[VARYING_SLOT_POS])));
> +  if (output_reg[VARYING_SLOT_POS].file != BAD_FILE)
> + emit(MOV(reg, src_reg(output_reg[VARYING_SLOT_POS])));
>break;
> case VARYING_SLOT_EDGE:
>/* This is present when doing unfilled polygons.  We're supposed to 
> copy
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
> index 485a80e..5dd4f98 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
> @@ -217,7 +217,7 @@ vec4_vs_visitor::emit_urb_slot(dst_reg reg, int varying)
> * shader.
> */
>vec4_instruction *inst = emit_generic_urb_slot(reg, varying);
> -  if (key->clamp_vertex_color)
> +  if (inst && key->clamp_vertex_color)
>   inst->saturate = true;
>break;
> }


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] radeonsi: link ES-GS just like LS-HS

2015-10-20 Thread Michel Dänzer
On 19.10.2015 02:13, Marek Olšák wrote:
> From: Marek Olšák 
> 
> This reduces the shader key for ES.
> 
> Use a fixed attrib location based on (semantic name,  index).
> 
> The ESGS item size is determined by the physical index of the highest ES
> output, so it's almost always larger than before, but I think that
> shouldn't matter as long as the ESGS ring buffer is large enough.

The series is

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/9] i965/vec4: Initialize LOD to 0.0f.

2015-10-20 Thread Iago Toral
On Mon, 2015-10-19 at 23:11 -0700, Matt Turner wrote:
> On Mon, Oct 19, 2015 at 9:09 PM, Matt Turner  wrote:
> > We implement textureQueryLevels (which takes no arguments, save the
> > sampler) using the resinfo message (which takes an argument of LOD).
> > Without initializing it, we'd generate a MOV from the null register to
> > load the LOD argument.
> > ---
> >  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
> > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > index ea1e3e7..c942c80 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > @@ -1579,7 +1579,7 @@ vec4_visitor::nir_emit_texture(nir_tex_instr *instr)
> > const glsl_type *coord_type = NULL;
> > src_reg shadow_comparitor;
> > src_reg offset_value;
> > -   src_reg lod, lod2;
> > +   src_reg lod(0.0f), lod2;
> > src_reg sample_index;
> > src_reg mcs;
> 
> The fs backend handles this differently --
> 
>if (op == ir_query_levels) {
>   /* textureQueryLevels() is implemented in terms of TXS so we need to
>* pass a valid LOD argument.
>*/
>   assert(lod.file == BAD_FILE);
>   lod = fs_reg(0u);
>}
> 
> That kinda seems worse -- but it does indicate that I should probably
> initialize lod with 0u instead, though it doesn't seem to matter...


I don't care much either way, the good thing about the code in the fs
backend is that it makes things more explicit. I think I kind of like
that a bit better but works for me either way. Whatever we do, we should
probably use the same solution in both backends though.

> mov(8)  g10<1>F [0F, 0F, 0F, 0F]VF
> send(8) g3<1>UW g10<8,8,1>F
>  sampler resinfo SIMD8 Surface = 1 Sampler = 0 mlen 1 rlen 4
> 
> Opinions?

isn't LOD a float value by definition? Also, there are other parts of
the FS code that initialize it to 0.0, like this (from
lower_sampler_logical_send_gen7):

   if (bld.shader->stage != MESA_SHADER_FRAGMENT &&
   op == SHADER_OPCODE_TEX) {
  op = SHADER_OPCODE_TXL;
  lod = fs_reg(0.0f);
   }

In any case, in that same function we do:

   for (unsigned i = 0; i < ARRAY_SIZE(sources); i++)
  sources[i] = bld.vgrf(BRW_REGISTER_TYPE_F);

and then later we do:

bld.MOV(sources[length], lod)

so even if we initialize lod to 0u it is going to end up as a float,
which explains the result you obtained, so I think float is probably
what we want.

Iago

> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Fuzz testing the stand alone glsl compiler

2015-10-20 Thread Eric Anholt
Steve Lynch  writes:

> Hi,
>
> I've been using afl (http://lcamtuf.coredump.cx/afl/) on the standalone
> glsl compiler.
>
> It found four different crashes in the latest code in master and I have
> minimised the test cases that cause the crashes. I spent a couple of hours
> poking around but haven't managed to fix any of the issues.
>
> Is any one interested in the generated test data set?
>
> I haven't filed the defects yet but from what I can see some of the tests
> give control over a pointer that gets dereferenced. I've got no idea if
> they are exploitable but thought I should check that these should still go
> on the public bug list.

That's great!  Fuzzing the compiler is something I've wished someone had
the time for for a long time.

Public bug list sounds fine to me -- we don't embargo other segfaults
(nor do I think we should).  The best way to report it would be to make
piglit tests out of them -- check out something like
tests/spec/glsl-1.10/compiler/version-macro.frag (compile only) or
tests/spec/glsl-1.10/execution/fs-bool-less-compare-false.shader_test
(compile, link, and draw) for examples to work from.


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/2] Add a SSBO load optimization pass

2015-10-20 Thread Iago Toral Quiroga
The general idea of this optimization is explained in the commit log of
the first patch, but it basically allows to re-use the results of previous
SSBO loads in situations that should be safe instead of issuing a new
SSBO load every time.

Makes the following shader:

buffer SSBO {
mat4 sm4;
};

uniform mat4 um4;

void main() {
sm4 *= um4;
}

Go from 16 SSBO loads to only 4.

I have not seen any regressions in piglit or dEQP's SSBO functional
tests.

Jordan: hopefully this is enough to fix that CTS test that was producing
spilling issues for you. Notice that this does not address the core problem
there (hopefully Curro's patches will address that), but it helps and if
nothing else, it should improve SSBO performance.

Iago Toral Quiroga (2):
  glsl: Implement a SSBO load optimization pass
  i965: Use the ssbo load cache optimization

 src/glsl/Makefile.sources  |   1 +
 src/glsl/ir_optimization.h |   1 +
 src/glsl/opt_ssbo_load.cpp | 338 +
 src/mesa/drivers/dri/i965/brw_link.cpp |   5 +
 4 files changed, 345 insertions(+)
 create mode 100644 src/glsl/opt_ssbo_load.cpp

-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] i965: Use the ssbo load cache optimization

2015-10-20 Thread Iago Toral Quiroga
We only really want to do this once since no passes other than
lower_ubo_reference can inject ssbo loads in the IR. We want to
run this late after other optimizations have kicked in to maximize
the chances of seeing constants for block and offset arguments to
the ssbo loads (since these are the ones we can re-use at the
moment).
---
 src/mesa/drivers/dri/i965/brw_link.cpp | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_link.cpp 
b/src/mesa/drivers/dri/i965/brw_link.cpp
index fc9bee4..df3d0ec 100644
--- a/src/mesa/drivers/dri/i965/brw_link.cpp
+++ b/src/mesa/drivers/dri/i965/brw_link.cpp
@@ -159,6 +159,7 @@ process_glsl_ir(gl_shader_stage stage,
 
lower_ubo_reference(shader, shader->ir);
 
+   bool first_pass = true;
bool progress;
do {
   progress = false;
@@ -176,6 +177,10 @@ process_glsl_ir(gl_shader_stage stage,
 
   progress = do_common_optimization(shader->ir, true, true,
 options, ctx->Const.NativeIntegers) || 
progress;
+  if (first_pass) {
+ progress = opt_ssbo_loads(shader, shader->ir) || progress;
+ first_pass = false;
+  }
} while (progress);
 
validate_ir_tree(shader->ir);
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

2015-10-20 Thread Jordan Justen
I wrote a similar patch a while back because I was annoyed by how this
was causing the send disassembly to not be as nice. :) I dropped it
from my branch at some point, but I still think it is a good idea.

Reviewed-by: Jordan Justen 

On 2015-10-18 21:31:43, Kristian Høgsberg Kristensen wrote:
> An immdiate is already uniform so just return it up front. Without this,
> brw_fs_surface_builder ends up passing immediate surface indices through
> SHADER_OPCODE_BROADCAST. This writes to a stride 0 dst, which we can't
> constant propagate out of, and further, we don't constant propagate into
> the typed/untype read/write opcodes at all.  The end result is that all
> typed/untyped read/write/atomics end up as indirect sends.
> 
> Code generation should always produce either an immediate or an actual
> indirect surface index, so we can fix this by just special casing
> immediates in emit_uniformize.
> 
> Signed-off-by: Kristian Høgsberg Kristensen 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
> b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> index df10a9d..98ce71e 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> @@ -394,6 +394,9 @@ namespace brw {
>   const dst_reg chan_index = component(vgrf(BRW_REGISTER_TYPE_UD), 0);
>   const dst_reg dst = component(vgrf(src.type), 0);
>  
> + if (src.file == IMM)
> +return src;
> +
>   ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
>   ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index);
>  
> -- 
> 2.6.2
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i915/aa: fixing anti-aliasing bug for thinnest width lines

2015-10-20 Thread Predut, Marius
> -Original Message-
> From: Ville Syrjälä [mailto:ville.syrj...@linux.intel.com]
> Sent: Monday, October 19, 2015 6:04 PM
> To: Predut, Marius
> Cc: mesa-dev@lists.freedesktop.org; Iago Toral Quiroga
> Subject: Re: [Mesa-dev] [PATCH] i915/aa: fixing anti-aliasing bug for thinnest
> width lines
> 
> On Thu, Oct 15, 2015 at 06:03:34PM +0300, Ville Syrjälä wrote:
> > On Thu, Oct 15, 2015 at 02:19:09PM +, Predut, Marius wrote:
> > > > -Original Message-
> > > > From: Ville Syrjälä [mailto:ville.syrj...@linux.intel.com]
> > > > Sent: Wednesday, October 07, 2015 1:53 PM
> > > > To: Predut, Marius
> > > > Cc: mesa-dev@lists.freedesktop.org
> > > > Subject: Re: [Mesa-dev] [PATCH] i915/aa: fixing anti-aliasing bug
> > > > for thinnest width lines
> > > >
> > > > On Mon, Oct 05, 2015 at 07:55:24PM +0300, Marius Predut wrote:
> > > > > On PNV platform, for 1 pixel line thickness or less, the general
> > > > > anti-aliasing algorithm gives up, and a garbage line is generated.
> > > > > Setting a Line Width of 0.0 specifies the rasterization of the
> > > > > "thinnest" (one-pixel-wide), non-antialiased lines.
> > > > > Lines rendered with zero Line Width are rasterized using Grid
> > > > > Intersection Quantization rules as specified by
> > > > > 2.8.4.1 Zero-Width (Cosmetic) Line Rasterization from volume 1f
> > > > > of the
> > > > > GEN3 docs.
> > > > > The patch was tested on Intel Atom CPU N455.
> > > > >
> > > > > This patch follow the same rules as patches fixing the
> > > > > https://bugs.freedesktop.org/show_bug.cgi?id=28832
> > > > > bug.
> > > > >
> > > > > v1: Eduardo Lima Mitev:  Wrong indentation inside the if clause.
> > > > > v2: Ian Romanick: comments fix.
> > > > >
> > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90367
> > > > >
> > > > > Signed-off-by: Marius Predut 
> > > > > ---
> > > > >  src/mesa/drivers/dri/i915/i915_state.c | 15 +++
> > > > >  1 file changed, 15 insertions(+)
> > > > >
> > > > > diff --git a/src/mesa/drivers/dri/i915/i915_state.c
> > > > > b/src/mesa/drivers/dri/i915/i915_state.c
> > > > > index 4c83073..897eb59 100644
> > > > > --- a/src/mesa/drivers/dri/i915/i915_state.c
> > > > > +++ b/src/mesa/drivers/dri/i915/i915_state.c
> > > > > @@ -599,6 +599,21 @@ i915LineWidth(struct gl_context * ctx,
> > > > > GLfloat
> > > > > widthf)
> > > > >
> > > > > width = (int) (widthf * 2);
> > > > > width = CLAMP(width, 1, 0xf);
> > > > > +
> > > > > +   if (ctx->Line.Width < 1.5 || widthf < 1.5) {
> > > > > + /* For 1 pixel line thickness or less, the general
> > > > > +  * anti-aliasing algorithm gives up, and a garbage line is
> > > > > +  * generated.  Setting a Line Width of 0.0 specifies the
> > > > > +  * rasterization of the "thinnest" (one-pixel-wide),
> > > > > +  * non-antialiased lines.
> > > > > +  *
> > > > > +  * Lines rendered with zero Line Width are rasterized using
> > > > > +  * Grid Intersection Quantization rules as specified by
> > > > > +  * volume 1f of the GEN3 docs,
> > > > > +  * 2.8.4.1 Zero-Width (Cosmetic) Line Rasterization.
> > > > > +  */
> > > > > +  width = 0;
> > > > > +   }
> > > >
> > > > I went to do some spec reading, and while I can't confirm the AA
> > > > <= 1.0 problem (no mention in the spec about such things), I can
> > > > see this fix alone isn't sufficient to satisfy the spec (we lack
> > > > the round to nearest integer for non-aa for instance).
> > >
> > > Ville ,Thanks for review!
> > > On this seem not too much docs, here can use experiments or docs for next
> GEN+.
> > >
> > > >
> > > > I think what we'd want is a small helper. i965 has one, although
> > > > that one looks quite messy. I think this is how I'd write the
> > > > helper for
> > > > i915:
> > > >
> > > > unsigned intel_line_width(ctx)
> > > > {
> > > > float line_width = ctx->Line.Width;
> > > >
> > > > if (ctx->Line.SmoothFlag)
> > > > line_width = CLAMP(line_width, MinAA, MaxAA);
> > > > else
> > > > line_width = CLAMP(roundf(line_width), Min, Max);
> > > >
> > > > /*
> > > >  * blah
> > > >  */
> > > > if (line_width < 1.5f)
> > > > line_width = 0.0f
> > > >
> > > > return U_FIXED(line_width, 1);
> > > > }
> > > >
> > > > and then use it for both gen2 and gen3 state setup.
> > >
> > > Do you used this and it works for you? (I mean if you did a test on
> > > your PNV platform)
> >
> > Didn't do any actual testing yet. I've been meaning to, but just been
> > too busy with other stuff. I can try to test on pnv today, and maybe
> > on
> > 830 and 85x on the weekend. Hmm, I wonder if the test even works on
> > gl1?
> 
> Finally got around to trying your patch it on pnv. Sadly it still fails,
> although it does seem a bit happier about the results.
> 
> old:
> $ DISPLAY=:0 ./bin/line-aa-width -auto
> Line from 0,2-30,3 had bad thickness 

[Mesa-dev] [PATCH 0/4] Updates to Draw Indirect for OpenGL ES 3.1

2015-10-20 Thread Marta Lofstedt
This patchset fixes 7 API related tests of the OpenGL ES 3.1 CTS.

I would consider squashing these into one patch, but to avoid
confusion when we discuss my solution I preferr to keep the discussions
separated for each patch for now.

Marta Lofstedt (4):
  mesa: Draw indirect is not allowed if the default VAO is bound.
  mesa: Draw Indirect is not allowed when no vertex array binding
exists.
  mesa: Draw Indirect return wrong error code on unalinged
  mesa: Draw indirect is not allowed when xfb is active and unpaused

 src/mesa/main/api_validate.c | 49 ++--
 1 file changed, 47 insertions(+), 2 deletions(-)

-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/4] mesa: Draw indirect is not allowed if the default VAO is bound.

2015-10-20 Thread Marta Lofstedt
From: Marta Lofstedt 

From OpenGL ES 3.1 specification, section 10.5:
"DrawArraysIndirect requires that all data sourced for the
command, including the DrawArraysIndirectCommand
structure,  be in buffer objects,  and may not be called when
the default vertex array object is bound."

Signed-off-by: Marta Lofstedt 
---
 src/mesa/main/api_validate.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index a46c194..c5628f5 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -698,6 +698,19 @@ valid_draw_indirect(struct gl_context *ctx,
 {
const GLsizeiptr end = (GLsizeiptr)indirect + size;
 
+   /*
+* OpenGL ES 3.1 spec. section 10.5:
+* "DrawArraysIndirect requires that all data sourced for the
+* command, including the DrawArraysIndirectCommand
+* structure,  be in buffer objects,  and may not be called when
+* the default vertex array object is bound."
+*/
+   if (_mesa_is_gles31(ctx) && (ctx->Array.VAO == ctx->Array.DefaultVAO)) {
+  _mesa_error(ctx, GL_INVALID_OPERATION,
+  "%s(The default VAO is bound)", name);
+  return GL_FALSE;
+   }
+
if (!_mesa_valid_prim_mode(ctx, mode, name))
   return GL_FALSE;
 
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/4] mesa: Draw Indirect is not allowed when no vertex array binding exists.

2015-10-20 Thread Marta Lofstedt
From: Marta Lofstedt 

OpenGL ES 3.1 spec. section 10.5:
"An INVALID_OPERATION error is generated if zero is bound
to VERTEX_ARRAY_BINDING, DRAW_INDIRECT_BUFFER or to
any enabled vertex array."

Signed-off-by: Marta Lofstedt 
---
 src/mesa/main/api_validate.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index c5628f5..7062cbd 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -711,6 +711,20 @@ valid_draw_indirect(struct gl_context *ctx,
   return GL_FALSE;
}
 
+   /*
+* OpenGL ES 3.1 spec. section 10.5:
+* "An INVALID_OPERATION error is generated if zero is bound to
+* VERTEX_ARRAY_BINDING, DRAW_INDIRECT_BUFFER or to any enabled
+* vertex array."
+* OpenGL 4.5 spec. section 10.4:
+* "An INVALID_OPERATION error is generated if  zero is bound to
+* DRAW_INDIRECT_BUFFER, or if  no element array buffer is bound"
+*/
+   if (!_mesa_is_bufferobj(ctx->Array.ArrayBufferObj)) {
+  _mesa_error(ctx, GL_INVALID_OPERATION,
+  "%s(No VBO is bound)", name);
+   }
+
if (!_mesa_valid_prim_mode(ctx, mode, name))
   return GL_FALSE;
 
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 03/21] mesa/extensions: Wrap array entries in macros

2015-10-20 Thread Erik Faye-Lund
On Tue, Oct 20, 2015 at 12:36 AM, Nanley Chery  wrote:
> From: Nanley Chery 
>
> -   { "GL_SUN_multi_draw_arrays",   o(dummy_true),
>   GLL,1999 },
> +#define EXT(name_str, driver_cap, api_flags, ) \
> +{ .name = "GL_" #name_str, .offset = o(driver_cap), .api_set = 
> api_flags, .year = },

I suspect that using designated initializers will make the MSVC build unhappy.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/4] gallium: add new properties for clip and cull distance usage

2015-10-20 Thread Roland Scheidegger
Am 20.10.2015 um 15:40 schrieb Marek Olšák:
> On Tue, Oct 20, 2015 at 3:25 PM, Roland Scheidegger  
> wrote:
>> Am 19.10.2015 um 23:44 schrieb Marek Olšák:
>>> On Mon, Oct 19, 2015 at 11:31 PM, Roland Scheidegger  
>>> wrote:
 Yes, but I still don't see much change from getting this information
 from the property rather than how tgsi_scan does it now, which is by
 just using the usage mask from the output declaration. So the writes
 shouldn't have to be analyzed.
 (There's also a slight change in patch 4/4, namely these outputs
 absolutely must be in order (xyzw) now as usage mask is determined
 solely by the number of values. That might already have been the case at
 least for some drivers and is probably ok for other state trackers too,
 it wasn't in the docs however.)
>>>
>>> DCL OUT[1..2], ARRAY(1), CLIPDIST
>>>
>>> CLIPDIST became an array declaration recently, so the usage mask isn't
>>> useful unless it's extended to 8 bits.
>>>
>>> Also, st/mesa doesn't set the usage mask for anything, so I wonder
>>> whether we need it.
>>>
>>> Marek
>>>
>>
>> Actually thinking about this some more, the problem here really is that
>> we have vec4 only, but clip/culldist are scalar outputs. Thus, an array
>> size of 2 doesn't give you the information how many elements the array
>> actually has and needs to be put somewhere else. Can you actually
>> dynamically index into such an array? You'd have to translate the scalar
>> index to vec4 array index + component.
> 
> Yes, I believe that's what the GLSL compiler does - it divides the
> index by 4 and does some kind of vec4-extract-element operation.
> 
>> I guess another possibility instead of properties would be to encode
>> this scalar array information directly into the declaration instead,
>> tgsi_declaration for instance would have spare bits, and
>> tgsi_declaration_array even more so.
>> Albeit properties should be fine for now, it doesn't feel very clean (as
>> I can't really see a reason why this is "side-channel" information
>> rather than directly part of the register declaration).
> 
> Well, do we really need the usage mask? There are no real users (maybe
> nine), vec4 backends typically don't care about usage masks and
> optimizing scalar backends can walk the shader and check writemasks on
> the outputs.
> 

Honestly I don't see much point in removing that information. If the
state tracker doesn't emit it, that's really the fault of the state
tracker - some drivers do something with that information.
Also, this is really a somewhat orthogonal issue to missing the scalar
array information for clip/cull dist (except that of course for non-vec4
arrays UsageMask doesn't make sense).


Roland


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] mesa: Draw Indirect return wrong error code on unalinged

2015-10-20 Thread Marta Lofstedt
From: Marta Lofstedt 

From OpenGL 4.4 specification, section 10.4 and
Open GL Es 3.1 section 10.5:
"An INVALID_VALUE error is generated if indirect is not a multiple
of the size, in basic machine units, of uint."

However, the current code follow the ARB_draw_indirect:
https://www.opengl.org/registry/specs/ARB/draw_indirect.txt
"INVALID_OPERATION is generated by DrawArraysIndirect and
DrawElementsIndirect if commands source data beyond the end
of a buffer object or if  is not word aligned."

Signed-off-by: Marta Lofstedt 
---
 src/mesa/main/api_validate.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index 7062cbd..a084362 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -732,10 +732,19 @@ valid_draw_indirect(struct gl_context *ctx,
/* From the ARB_draw_indirect specification:
 * "An INVALID_OPERATION error is generated [...] if  is no
 *  word aligned."
+* However, from OpenGL version 4.4. section 10.5
+* and OpenGL ES 3.1, section 10.6:
+* "An INVALID_VALUE error is generated if indirect is not a multiple
+* of the size, in basic machine units, of uint."
 */
if ((GLsizeiptr)indirect & (sizeof(GLuint) - 1)) {
-  _mesa_error(ctx, GL_INVALID_OPERATION,
-  "%s(indirect is not aligned)", name);
+  if ((_mesa_is_desktop_gl(ctx) && ctx->Version >= 44) ||
+  _mesa_is_gles31(ctx))
+ _mesa_error(ctx, GL_INVALID_VALUE,
+ "%s(indirect is not aligned)", name);
+  else
+ _mesa_error(ctx, GL_INVALID_OPERATION,
+ "%s(indirect is not aligned)", name);
   return GL_FALSE;
}
 
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/4] mesa: Draw indirect is not allowed when xfb is active and unpaused

2015-10-20 Thread Marta Lofstedt
From: Marta Lofstedt 

OpenGL ES 3.1 specification, section 10.5:
"An INVALID_OPERATION error is generated if
transform feedback is active and not paused."

Signed-off-by: Marta Lofstedt 
---
 src/mesa/main/api_validate.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index a084362..cb6112f 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -711,6 +711,15 @@ valid_draw_indirect(struct gl_context *ctx,
   return GL_FALSE;
}
 
+   /* OpenGL ES 3.1 specification, section 10.5:
+* "An INVALID_OPERATION error is generated if
+* transform feedback is active and not paused."
+*/
+   if (_mesa_is_gles31(ctx) && _mesa_is_xfb_active_and_unpaused(ctx)) {
+  _mesa_error(ctx, GL_INVALID_OPERATION,
+  "%s(TransformFeedback is active but not paused)", name);
+   }
+
/*
 * OpenGL ES 3.1 spec. section 10.5:
 * "An INVALID_OPERATION error is generated if zero is bound to
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92552] [softpipe] piglit egl-create-context-valid-flag-forward-compatible-gl regression

2015-10-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92552

Boyan Ding  changed:

   What|Removed |Added

 CC||ramix.ben.hass...@intel.com

--- Comment #1 from Boyan Ding  ---
*** Bug 92536 has been marked as a duplicate of this bug. ***

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-20 Thread Matt Turner
On Tue, Oct 20, 2015 at 2:56 AM, Neil Roberts  wrote:
> In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
> message types because I found by experimentation that it doesn't work.
> I wrote in the comment that I couldn't find any documentation for this
> problem. However I've now found the documentation and it has
> additional restrictions on further message types so this patch updates
> the comment and adds the others.
> ---
>
> That paragraph in the spec also mentions further restrictions that we
> should probably worry about like that the shader shouldn't combine
> this optimisation with any other render target data port read/writes.
>
> It also has a fairly pessimistic note saying the optimisation is only
> really good for large polygons in a GUI-like workload. I wonder
> whether we should be doing some more benchmarking to decide whether
> it's really a good idea to enable this as a general optimisation even
> for games.

That's frustrating. I wish the documentation would contain notes like
this from the beginning, but maybe I should just be thankful that it
includes that text at all.

Both patches are

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Introducing OpenSWR: High performance software rasterizer

2015-10-20 Thread Roland Scheidegger
Am 20.10.2015 um 19:11 schrieb Rowley, Timothy O:
> Hi.  I'd like to introduce the Mesa3D community to a software project
> that we hope to upstream.  We're a small team at Intel working on
> software defined visualization (http://sdvis.org/), and have
> opensource projects in both the raytracing (Embree, OSPRay) and
> rasterization (OpenSWR) realms.
> 
> We're a different Intel team from that of i965 fame, with a different
> type of customer and workloads.  Our customers have large clusters of
> compute nodes that for various reasons do not have GPUs, and are
> working with extremely large geometry models.
> 
> We've been working on a high performance, highly scalable rasterizer
> and driver to interface with Mesa3D.  Our rasterizer functions as a
> "software gpu", relying on the mature well-supported Mesa3D to provide
> API and state tracking layers.
> 
> We would like to contribute this code to Mesa3D and continue doing
> active development in your source repository.  We welcome discussion
> about how this will happen and questions about the project itself.
> Below are some answers to what we think might be frequently asked
> questions.
> 
> Bruce and I will be the public contacts for this project, but this
> project isn't solely our work - there's a dedicated group of people
> working on the core SWR code.
> 
>   Tim Rowley
>   Bruce Cherniak
> 
>   Intel Corporation
> 
> Why another software rasterizer?
> 
> 
> Good question, given there are already three (swrast, softpipe,
> llvmpipe) in the Mesa3D tree. Two important reasons for this:
> 
>  * Architecture - given our focus on scientific visualization, our
>workloads are much different than the typical game; we have heavy
>vertex load and relatively simple shaders.  In addition, the core
>counts of machines we run on are much higher.  These parameters led
>to design decisions much different than llvmpipe.
> 
>  * Historical - Intel had developed a high performance software
>graphics stack for internal purposes.  Later we adapted this
>graphics stack for use in visualization and decided to move forward
>with Mesa3D to provide a high quality API layer while at the same
>time benefiting from the excellent performance the software
>rasterizerizer gives us.
> 
> What's the architecture?
> 
> 
> SWR is a tile based immediate mode renderer with a sort-free threading
> model which is arranged as a ring of queues.  Each entry in the ring
> represents a draw context that contains all of the draw state and work
> queues.  An API thread sets up each draw context and worker threads
> will execute both the frontend (vertex/geometry processing) and
> backend (fragment) work as required.  The ring allows for backend
> threads to pull work in order.  Large draws are split into chunks to
> allow vertex processing to happen in parallel, with the backend work
> pickup preserving draw ordering.
> 
> Our pipeline uses just-in-time compiled code for the fetch shader that
> does vertex attribute gathering and AOS to SOA conversions, the vertex
> shader and fragment shaders, streamout, and fragment blending. SWR
> core also supports geometry and compute shaders but we haven't exposed
> them through our driver yet. The fetch shader, streamout, and blend is
> built internally to swr core using LLVM directly, while for the vertex
> and pixel shaders we reuse bits of llvmpipe from
> gallium/auxiliary/gallivm to build the kernels, which we wrap
> differently than llvmpipe's auxiliary/draw code.
> 
> What's the performance?
> ---
> 
> For the types of high-geometry workloads we're interested in, we are
> significantly faster than llvmpipe.  This is to be expected, as
> llvmpipe only threads the fragment processing and not the geometry
> frontend.
> 
> The linked slide below shows some performance numbers from a benchmark
> dataset and application.  On a 36 total core dual E5-2699v3 we see
> performance 29x to 51x that of llvmpipe.  
> 
>   http://openswr.org/slides/SWR_Sept15.pdf
> 
> While our current performance is quite good, we know there is more
> potential in this architecture.  When we switched from a prototype
> OpenGL driver to Mesa we regressed performance severely, some due to
> interface issues that need tuning, some differences in shader code
> generation, and some due to conformance and feature additions to the
> core swr.  We are looking to recovering most of this performance back.
> 
> What's the conformance?
> ---
> 
> The major applications we are targeting are all based on the
> Visualization Toolkit (VTK), and as such our development efforts have
> been focused on making sure these work as best as possible.  Our
> current code passes vtk's rendering tests with their new "OpenGL2"
> (really OpenGL 3.2) backend at 99%.
> 
> piglit testing shows a much lower pass rate, roughly 80% at the time
> of writing.  Core SWR undergoes 

Re: [Mesa-dev] Introducing OpenSWR: High performance software rasterizer

2015-10-20 Thread Ilia Mirkin
[re-adding mesa-dev, dropped accidentally]

On Tue, Oct 20, 2015 at 1:43 PM, Ilia Mirkin  wrote:
> On Tue, Oct 20, 2015 at 1:11 PM, Rowley, Timothy O
>  wrote:
>> Does one build work on both AVX and AVX2?
>> -
>>
>>  * Unfortunately, no.  The architecture support is fixed at compile
>>time.  While the AVX version of course will run on AVX2 machines
>>and the jitted code will use AVX2, the overall performance will
>>suffer relative to a full AVX2 build.
>>
>>  * There is some idea that if we move some code from the driver back
>>to SWR core, we could build two versions of libSWR and dynamically
>>load the correct version at runtime.  Unfortunately this mechanism
>>would not work with AVX512, as some of the SWR state structures
>>would change size.
>
> Without commenting on any of the other issues, I believe one of your
> stated goals is to ease distribution to your end-users. If you expect
> them to build their own code, that's no problem. However if you're
> thinking of relying on distros to include your driver and have end
> users use that, then you should consider some solution that enables
> runtime selection of this stuff (even if that's building 3 versions of
> the driver -- swr-avx, swr-avx2, swr-avx512, and having e.g. loader
> magic determine which the right one is for the current CPU).
>
> Cheers,
>
>   -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

2015-10-20 Thread Kristian Høgsberg
On Tue, Oct 20, 2015 at 3:16 AM, Francisco Jerez  wrote:
> Kristian Høgsberg  writes:
>
>> On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez  
>> wrote:
>>> Neil Roberts  writes:
>>>
 Just a thought, would it be better to move this check into the
 eliminate_find_live_channel optimisation? That way it could catch
 sources that become immediates through later optimisations. One problem
 with this that I've seen before is that eliminating the
 FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be
 eliminated because the copy propagation doesn't work.
>>>
>>> I believe in this particular case the BROADCAST instruction should
>>> already be eliminated (by opt_algebraic), because its first source is an
>>> immediate, so this optimization would in fact be redundant with
>>> opt_algebraic+constant propagation if it weren't because the latter is
>>> unable to handle scalar copies.
>>
>> Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets
>> eliminated because it's outside control flow
>> (fs_visitor::eliminate_find_live_channel). Then the broadcast gets
>> reduced to a MOV in opt_algebraic because src0 is uniform (immediate
>> constant). The problem then is the dst of the MOV has stride 0 which
>> can_propagate_from() then bails on.
>>
>>> Another possibility that would likely
>>> be more general than this (because it would handle cases in which, as
>>> Neil said, the argument becomes an immediate only after optimization,
>>> and because it would also avoid the issue with liveness analysis
>>> extending the live ranges of scalar values incorrectly [1]) would be to
>>> extend the destination register of the BROADCAST instructions to a
>>> vector [2] and then add a case to the switch statement of
>>> try_constant_propagate() so that the immediate MOV resulting from
>>> opt_algebraic is propagated into surface instructions (see attachment).
>>
>> I wrote exactly this code to make constant propagate work, plus adding
>> the extra opcodes to the switch statement. It works and I could
>> certainly send that out after this, but
>>
>> 1) This doesn't mean we shouldn't do the if (src.file == IMM)
>> shortcut. It saves the compiler a bit of work in the very common case
>> of
>> non-indirect buffer access.
>>
>> 2) I'm not even sure it makes sense to extend copy-propagation to do
>> this (which is why I went back to just the IMM test). Anything that
>> would be an immediate at this point should be an immediate, if not
>> we're missing something in nir.
>>
> Still this doesn't address the root of the problem, which is that
> emit_uniformize() emits scalar code that the rest of the compiler is not
> able to handle properly.

This is not a case of papering over the root cause, it's about not
creating the root cause in the first place. The output of
emit_uniformize() always ends up as either a surface or a sampler
index, which we only look at later in the generator. There are no
other cases where the result of emit_uniformize() might be part of an
expression that we can copy propagate or otherwise optimize. If the
input to emit_uniformize() isn't an immediate where it could be, nir
optimization needs fixing. So if we add these two lines to
emit_uniformize() to pass immediates straight through, we avoid
generating code that we have to extend the copy prop pass to handle.

Kristian

> Re-implementing a special case of the
> optimization already done by opt_algebraic() might have helped in this
> specific case, but it won't help when the argument is of some other kind
> of uniform value which are also frequently encountered in practice and
> could also be copy-propagated, and it won't help avoid the liveness
> analysis bug [1] in cases where the argument is not an immediate (the
> bug is reported to break some thousands SSBO dEQP tests, although I
> don't know what fraction of them actually use non-constant indexing).
> The alternative solution I provided patches for (and you seem to have
> implemented yourself independently) would address all these issues at
> once.
>
>> Kristian
>>
 I made this patch a while ago but I never posted it anywhere because
 it's a of a kludge and it would probably be better to fix the copy
 propagation:

 https://github.com/bpeel/mesa/commit/e4c3286075f891f466fb8558106d2aaa

>>> Heh, yeah, I'd rather fix copy propagation instead, which I believe will
>>> become much easier with the use-def-chain analysis pass I'm working on.
>>>
 Either way though I don't think it would do any harm to have Kristian's
 patch as well even if we did improve elimintate_find_live_channel so it
 is:

 Reviewed-by: Neil Roberts 

 - Neil

 Kristian Høgsberg Kristensen  writes:

> An immdiate is already uniform so just return it up front. Without this,
> brw_fs_surface_builder ends up 

Re: [Mesa-dev] [PATCH 2/3] i965/fs: Make emit_uniformize a no-op for immediates

2015-10-20 Thread Francisco Jerez
Kristian Høgsberg  writes:

> On Tue, Oct 20, 2015 at 3:16 AM, Francisco Jerez  
> wrote:
>> Kristian Høgsberg  writes:
>>
>>> On Mon, Oct 19, 2015 at 4:19 AM, Francisco Jerez  
>>> wrote:
 Neil Roberts  writes:

> Just a thought, would it be better to move this check into the
> eliminate_find_live_channel optimisation? That way it could catch
> sources that become immediates through later optimisations. One problem
> with this that I've seen before is that eliminating the
> FIND_LIVE_CHANNEL doesn't cause the subsequent BROADCAST to be
> eliminated because the copy propagation doesn't work.

 I believe in this particular case the BROADCAST instruction should
 already be eliminated (by opt_algebraic), because its first source is an
 immediate, so this optimization would in fact be redundant with
 opt_algebraic+constant propagation if it weren't because the latter is
 unable to handle scalar copies.
>>>
>>> Yes, is happens as you say: first the FIND_LIVE_CHANNEL gets
>>> eliminated because it's outside control flow
>>> (fs_visitor::eliminate_find_live_channel). Then the broadcast gets
>>> reduced to a MOV in opt_algebraic because src0 is uniform (immediate
>>> constant). The problem then is the dst of the MOV has stride 0 which
>>> can_propagate_from() then bails on.
>>>
 Another possibility that would likely
 be more general than this (because it would handle cases in which, as
 Neil said, the argument becomes an immediate only after optimization,
 and because it would also avoid the issue with liveness analysis
 extending the live ranges of scalar values incorrectly [1]) would be to
 extend the destination register of the BROADCAST instructions to a
 vector [2] and then add a case to the switch statement of
 try_constant_propagate() so that the immediate MOV resulting from
 opt_algebraic is propagated into surface instructions (see attachment).
>>>
>>> I wrote exactly this code to make constant propagate work, plus adding
>>> the extra opcodes to the switch statement. It works and I could
>>> certainly send that out after this, but
>>>
>>> 1) This doesn't mean we shouldn't do the if (src.file == IMM)
>>> shortcut. It saves the compiler a bit of work in the very common case
>>> of
>>> non-indirect buffer access.
>>>
>>> 2) I'm not even sure it makes sense to extend copy-propagation to do
>>> this (which is why I went back to just the IMM test). Anything that
>>> would be an immediate at this point should be an immediate, if not
>>> we're missing something in nir.
>>>
>> Still this doesn't address the root of the problem, which is that
>> emit_uniformize() emits scalar code that the rest of the compiler is not
>> able to handle properly.
>
> This is not a case of papering over the root cause, it's about not
> creating the root cause in the first place. The output of
> emit_uniformize() always ends up as either a surface or a sampler
> index, which we only look at later in the generator. There are no
> other cases where the result of emit_uniformize() might be part of an
> expression that we can copy propagate or otherwise optimize. If the
> input to emit_uniformize() isn't an immediate where it could be, nir
> optimization needs fixing. So if we add these two lines to
> emit_uniformize() to pass immediates straight through, we avoid
> generating code that we have to extend the copy prop pass to handle.
>

Kristian, there are legitimate uses of emit_uniformize() in which the
argument is not an immediate but still can be optimized out later on --
E.g. for images it will frequently be a uniform register, or a
non-constant expression calculated within uniform control flow (in which
case the FIND_LIVE_CHANNEL instruction emitted here will be reduced to a
constant MOV by eliminate_find_live_channel()).  In such cases we still
want copy-propagation to kick in, but it wont because of the problem I
was talking about with scalar writes.  Even if the instructions emitted
by emit_uniformize() cannot be optimized out, liveness analysis will
overestimate the live ranges of the temporaries used by
emit_uniformize() for the same reason, potentially causing the register
allocator to spill or run out of registers.

Anyway don't take me wrong, I'm not NAK-ing your patch or anything, I
just have the feeling that by fixing this more generally you could've
saved us [or most likely me ;)] work in the near future.

> Kristian
>
>> Re-implementing a special case of the
>> optimization already done by opt_algebraic() might have helped in this
>> specific case, but it won't help when the argument is of some other kind
>> of uniform value which are also frequently encountered in practice and
>> could also be copy-propagated, and it won't help avoid the liveness
>> analysis bug [1] in cases where the argument is not an immediate (the
>> 

Re: [Mesa-dev] [PATCHv4 4/7] nir: add shader_clock intrinsic

2015-10-20 Thread Connor Abbott
On Tue, Oct 20, 2015 at 12:55 PM, Emil Velikov  wrote:
> v2: Add flags and inline comment/description.
> v3: None of the input/outputs are variables
> v4: Drop clockARB reference, relate code motion barrier comment wrt
> intrinsic flag.
>
> Signed-off-by: Emil Velikov 
> ---
>
> Matt,
>
> Hopefully the updated comment makes more sense (the clockARB
> reference was an unintentional left over). If not I'll just nuke it as
> you suggested.
>
> -Emil
>
>
>  src/glsl/nir/glsl_to_nir.cpp  | 6 ++
>  src/glsl/nir/nir_intrinsics.h | 9 +
>  2 files changed, 15 insertions(+)
>
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index cf5bb93..9dd3d07 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -685,6 +685,8 @@ nir_visitor::visit(ir_call *ir)
>   op = nir_intrinsic_ssbo_atomic_exchange;
>} else if (strcmp(ir->callee_name(), 
> "__intrinsic_ssbo_atomic_comp_swap_internal") == 0) {
>   op = nir_intrinsic_ssbo_atomic_comp_swap;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_shader_clock") == 0) 
> {
> + op = nir_intrinsic_shader_clock;
>} else {
>   unreachable("not reached");
>}
> @@ -789,6 +791,10 @@ nir_visitor::visit(ir_call *ir)
>case nir_intrinsic_memory_barrier:
>   nir_instr_insert_after_cf_list(this->cf_node_list, >instr);
>   break;
> +  case nir_intrinsic_shader_clock:
> + nir_ssa_dest_init(>instr, >dest, 1, NULL);
> + nir_instr_insert_after_cf_list(this->cf_node_list, >instr);
> + break;
>case nir_intrinsic_store_ssbo: {
>   exec_node *param = ir->actual_parameters.get_head();
>   ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
> diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
> index 49bf3b2..7485070 100644
> --- a/src/glsl/nir/nir_intrinsics.h
> +++ b/src/glsl/nir/nir_intrinsics.h
> @@ -83,6 +83,15 @@ BARRIER(discard)
>   */
>  BARRIER(memory_barrier)
>
> +/*
> + * Shader clock intrinsic with semantics analogous to the clock2x32ARB()
> + * GLSL intrinsic.
> + * The latter can be used as code motion barrier, which is currently not
> + * feasible with NIR, thus we can eliminate the intrinsic when the return
> + * value is unused.

Just a small bikeshedding thing: technically, even if we were to make
shader_clock a code motion barrier like the spec asks for, we could
still delete it if its result is unused because then nobody will
notice if we move code around it. Get rid of the "thus we can
eliminate..." bit and this is

Reviewed-by: Connor Abbott 

> + */
> +INTRINSIC(shader_clock, 0, ARR(), true, 1, 0, 0, NIR_INTRINSIC_CAN_ELIMINATE)
> +
>  /** A conditional discard, with a single boolean source. */
>  INTRINSIC(discard_if, 1, ARR(1), false, 0, 0, 0, 0)
>
> --
> 2.6.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92552] [softpipe] piglit egl-create-context-valid-flag-forward-compatible-gl regression

2015-10-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92552

Boyan Ding  changed:

   What|Removed |Added

  Component|Mesa core   |EGL

--- Comment #4 from Boyan Ding  ---
(In reply to Matthew Waters from comment #2)
> Created attachment 119006 [details] [review]
> egl: distnguish between unsupported api vs capabilities for EGL_CONTEXT_FLAGS
> 
> This fixes that issue for me.

Well, your patch does seem to eliminate the problem, but there is a subtle
difference between how things works now and before, which I think may be
questionable.

The spec of EGL_KHR_create_context says:
requesting a forward-compatible context for OpenGL versions less
than 3.0 will generate an error

The code now takes "OpenGL version" above as version requested with EGL (1.0 by
default). However, the OpenGL version actually provided can be up to 3.0. In
such case, your patch will fail to create forward-compatible context, giving
the "appropriate" error, while mesa before 86ccb2a1 will succeed. Though both
behaviors pass piglit.

Any ideas?

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/7] ARB_shader_clock support

2015-10-20 Thread Emil Velikov
On 19 October 2015 at 19:39, Matt Turner  wrote:
> On Mon, Oct 19, 2015 at 7:45 AM, Emil Velikov  
> wrote:
>> Hi all,
>>
>> This is a resent of the previous RFC series with a few minor changes
>>  - fs_reg:;smear() has been moved out of get_timestamp()
>>  - clock2x32ARB() has proper return value
>>  - the nir intrinsic no longer claims to have a variable
>>
>> I'm still rocking a SNB hardware so I did not really got the chance to
>> test this properly. Any reviews and testers will be greatly appreciated.
>
> Assuming I didn't miss something in my comment on 6/7, the series looks 
> correct.
>
> Reviewed-by: Matt Turner 
>
Thanks Matt. I've resent the few problematic patches, and will add
your r-b if you still think they're ok.

> But let's wait until we have piglit tests in place (for both VS and
> FS) before committing it. I'd like to do some testing myself as well.
>
> Thanks Emil. Hopefully we'll get you some hardware you can test on. :)
Hope so too. It has been a bit of stab in the dark with these I so
hopefully I haven't butchered things too much.

The updated series can be found over at
https://github.com/evelikov/Mesa/tree/for-upstream/arb-shader-clock

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92552] [softpipe] piglit egl-create-context-valid-flag-forward-compatible-gl regression

2015-10-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92552

--- Comment #6 from Emil Velikov  ---
Personally I think that one can interpret the return value in either way -
GL_BAD_ATTRIBUTE or EGL_BAD_MATCH. I'd vote for consistency so I'll check what
the other GL test suites are doing.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/9] i965: Compact acc_wr_control only on Gen6+.

2015-10-20 Thread Matt Turner
On Tue, Oct 20, 2015 at 1:51 AM, Iago Toral  wrote:
> On Mon, 2015-10-19 at 21:09 -0700, Matt Turner wrote:
>> It only exists on Gen6+, and the next patches will add compaction
>> support for the (unused) field in the same location on earlier
>> platforms.
>
> The docs say that this exists also in ILK at least. See Page 131 of:
> https://01.org/sites/default/files/documentation/ilk_ihd_os_vol4_part2_july_28_10_0.pdf
>
> However, I see some places in the i965 code where dealing with this is
> surrounded by if (gen >= 6)...
>
> Is this a bug in the ILK documentation?

Yes. The ILK docs are terrible and contain more SNB documentation than
ILK documentation. :(
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCHv2 6/7] i965: Implement nir_intrinsic_shader_clock

2015-10-20 Thread Connor Abbott
On Tue, Oct 20, 2015 at 1:02 PM, Emil Velikov  wrote:
> v2:
>  - Add a few const qualifiers for good measure.
>  - Drop unneeded retype()s (Matt)
>  - Convert timestamp to SIMD8/16, as fs_visitor::get_timestamp() returns
> SIMD4 (Connor)
>
> Signed-off-by: Emil Velikov 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 11 +++
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 10 ++
>  2 files changed, 21 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 792663f..976b4fd 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -1309,6 +1309,17 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
> nir_intrinsic_instr *instr
>break;
> }
>
> +   case nir_intrinsic_shader_clock: {
> +  /* We cannot do anything if there is an event, so ignore it for now */
> +  fs_reg shader_clock = get_timestamp(bld);
> +  const fs_reg srcs[] = { shader_clock.set_smear(0), 
> shader_clock.set_smear(1) };
> +  const fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_UD, ARRAY_SIZE(srcs));

No need to create a temporary here -- you can just do

bld.LOAD_PAYLOAD(dest, srcs, ARRAY_SIZE(srcs), 0);

with that, this is

Reviewed-by: Connor Abbott 

> +
> +  bld.LOAD_PAYLOAD(tmp, srcs, ARRAY_SIZE(srcs), 0);
> +  bld.MOV(dest, tmp);
> +  break;
> +   }
> +
> case nir_intrinsic_image_size: {
>/* Get the referenced image variable and type. */
>const nir_variable *var = instr->variables[0]->var;
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> index ea1e3e7..c401212 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> @@ -806,6 +806,16 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr 
> *instr)
>break;
> }
>
> +   case nir_intrinsic_shader_clock: {
> +  /* We cannot do anything if there is an event, so ignore it for now */
> +  const src_reg shader_clock = get_timestamp();
> +  const enum brw_reg_type type = 
> brw_type_for_base_type(glsl_type::uvec2_type);
> +
> +  dest = get_nir_dest(instr->dest, type);
> +  emit(MOV(dest, shader_clock));
> +  break;
> +   }
> +
> default:
>unreachable("Unknown intrinsic");
> }
> --
> 2.6.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-20 Thread Matt Turner
On Tue, Oct 20, 2015 at 11:57 AM, Matt Turner  wrote:
> On Tue, Oct 20, 2015 at 2:56 AM, Neil Roberts  wrote:
>> In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
>> message types because I found by experimentation that it doesn't work.
>> I wrote in the comment that I couldn't find any documentation for this
>> problem. However I've now found the documentation and it has
>> additional restrictions on further message types so this patch updates
>> the comment and adds the others.
>> ---
>>
>> That paragraph in the spec also mentions further restrictions that we
>> should probably worry about like that the shader shouldn't combine
>> this optimisation with any other render target data port read/writes.
>>
>> It also has a fairly pessimistic note saying the optimisation is only
>> really good for large polygons in a GUI-like workload. I wonder
>> whether we should be doing some more benchmarking to decide whether
>> it's really a good idea to enable this as a general optimisation even
>> for games.
>
> That's frustrating. I wish the documentation would contain notes like
> this from the beginning, but maybe I should just be thankful that it
> includes that text at all.
>
> Both patches are
>
> Reviewed-by: Matt Turner 

Just this one actually. I thought it was part of a series.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] i965: Remove block arg from foreach_inst_in_block_*_starting_from

2015-10-20 Thread Matt Turner
Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/7] i965/fs: move the fs_reg::smear() from get_timestamp() to the callers

2015-10-20 Thread Connor Abbott
On Tue, Oct 20, 2015 at 1:00 PM, Emil Velikov  wrote:
> We're about to reuse get_timestamp() for the nir_intrinsic_shader_clock.
> In the latter the generalisation does not apply, so move the smear()
> where needed. This also makes the function analogous to the vec4 one.
>
> v2: Tweak the comment - The caller -> We (Matt, Connor).
>
> Signed-off-by: Emil Velikov 
> ---
>
> Connor,
>
> The diff might be a bit hard to read, but the patch does remove the
> comment from get_timestamp(), If you guys prefer I can also drop it from
> shader_time_begin() and(?) delay the smear until emit_shader_time_end().

You can't delay it until emit_shader_time_end() -- both functions are
using the smear, so both need to use it. I think we don't need to give
the rationale for why we only use the low 32 bits more than once,
though. See below for what I'd do wrt the comments.

>
> -Emil
>
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 32 ++--
>  1 file changed, 22 insertions(+), 10 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index a2fd441..93bb55d 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -521,7 +521,15 @@ fs_visitor::get_timestamp(const fs_builder )
>  */
> bld.group(4, 0).exec_all().MOV(dst, ts);
>
> -   /* The caller wants the low 32 bits of the timestamp.  Since it's running
> +   return dst;
> +}
> +
> +void
> +fs_visitor::emit_shader_time_begin()
> +{
> +   shader_start_time = get_timestamp(bld.annotate("shader time start"));
> +
> +   /* We want only the low 32 bits of the timestamp.  Since it's running
>  * at the GPU clock rate of ~1.2ghz, it will roll over every ~3 seconds,
>  * which is plenty of time for our purposes.  It is identical across the
>  * EUs, but since it's tracking GPU core speed it will increment at a
> @@ -531,15 +539,7 @@ fs_visitor::get_timestamp(const fs_builder )
>  * else that might disrupt timing) by setting smear to 2 and checking if
>  * that field is != 0.
>  */
> -   dst.set_smear(0);
> -
> -   return dst;
> -}
> -
> -void
> -fs_visitor::emit_shader_time_begin()
> -{
> -   shader_start_time = get_timestamp(bld.annotate("shader time start"));
> +   shader_start_time.set_smear(0);
>  }
>
>  void
> @@ -553,6 +553,18 @@ fs_visitor::emit_shader_time_end()
>
> fs_reg shader_end_time = get_timestamp(ibld);
>
> +   /* We want only the low 32 bits of the timestamp.  Since it's running
> +* at the GPU clock rate of ~1.2ghz, it will roll over every ~3 seconds,
> +* which is plenty of time for our purposes.  It is identical across the
> +* EUs, but since it's tracking GPU core speed it will increment at a
> +* varying rate as render P-states change.
> +*
> +* The caller could also check if render P-states have changed (or 
> anything

The caller => We

Also, this sentence is only relevant for emit_shader_time_end(), since
we only care if the measurement was invalidated during the period
we're measuring, so we can delete it from emit_shader_time_begin(). In
turn, you can summarize the first paragraph here with something like
"We only use the low 32 bits of the timestamp (see
emit_shader_time_begin())". With that bikeshedding aside,

Reviewed-by: Connor Abbott 

> +* else that might disrupt timing) by setting smear to 2 and checking if
> +* that field is != 0.
> +*/
> +   shader_end_time.set_smear(0);
> +
> /* Check that there weren't any timestamp reset events (assuming these
>  * were the only two timestamp reads that happened).
>  */
> --
> 2.6.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/4] mesa: Draw Indirect is not allowed when no vertex array binding exists.

2015-10-20 Thread Marek Olšák
On Tue, Oct 20, 2015 at 4:19 PM, Marta Lofstedt
 wrote:
> From: Marta Lofstedt 
>
> OpenGL ES 3.1 spec. section 10.5:
> "An INVALID_OPERATION error is generated if zero is bound
> to VERTEX_ARRAY_BINDING, DRAW_INDIRECT_BUFFER or to
> any enabled vertex array."
>
> Signed-off-by: Marta Lofstedt 
> ---
>  src/mesa/main/api_validate.c | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
> index c5628f5..7062cbd 100644
> --- a/src/mesa/main/api_validate.c
> +++ b/src/mesa/main/api_validate.c
> @@ -711,6 +711,20 @@ valid_draw_indirect(struct gl_context *ctx,
>return GL_FALSE;
> }
>
> +   /*
> +* OpenGL ES 3.1 spec. section 10.5:
> +* "An INVALID_OPERATION error is generated if zero is bound to
> +* VERTEX_ARRAY_BINDING, DRAW_INDIRECT_BUFFER or to any enabled
> +* vertex array."
> +* OpenGL 4.5 spec. section 10.4:
> +* "An INVALID_OPERATION error is generated if  zero is bound to
> +* DRAW_INDIRECT_BUFFER, or if  no element array buffer is bound"
> +*/
> +   if (!_mesa_is_bufferobj(ctx->Array.ArrayBufferObj)) {
> +  _mesa_error(ctx, GL_INVALID_OPERATION,
> +  "%s(No VBO is bound)", name);
> +   }

NAK.

VERTEX_ARRAY_BINDING is a VAO. Array.ArrayBufferObj is from glBindBuffer.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/4] mesa: Draw Indirect return wrong error code on unalinged

2015-10-20 Thread Ilia Mirkin
On Tue, Oct 20, 2015 at 10:19 AM, Marta Lofstedt
 wrote:
> From: Marta Lofstedt 
>
> From OpenGL 4.4 specification, section 10.4 and
> Open GL Es 3.1 section 10.5:
> "An INVALID_VALUE error is generated if indirect is not a multiple
> of the size, in basic machine units, of uint."
>
> However, the current code follow the ARB_draw_indirect:
> https://www.opengl.org/registry/specs/ARB/draw_indirect.txt
> "INVALID_OPERATION is generated by DrawArraysIndirect and
> DrawElementsIndirect if commands source data beyond the end
> of a buffer object or if  is not word aligned."
>
> Signed-off-by: Marta Lofstedt 
> ---
>  src/mesa/main/api_validate.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
> index 7062cbd..a084362 100644
> --- a/src/mesa/main/api_validate.c
> +++ b/src/mesa/main/api_validate.c
> @@ -732,10 +732,19 @@ valid_draw_indirect(struct gl_context *ctx,
> /* From the ARB_draw_indirect specification:
>  * "An INVALID_OPERATION error is generated [...] if  is no
>  *  word aligned."
> +* However, from OpenGL version 4.4. section 10.5

4.4,

I double-checked and you're right -- it was INVALID_OPERATION in GL
4.3, but INVALID_VALUE in GL 4.4. Weird.

Reviewed-by: Ilia Mirkin 

Should probably fix up the piglit test as well, if any.

> +* and OpenGL ES 3.1, section 10.6:
> +* "An INVALID_VALUE error is generated if indirect is not a multiple
> +* of the size, in basic machine units, of uint."
>  */
> if ((GLsizeiptr)indirect & (sizeof(GLuint) - 1)) {
> -  _mesa_error(ctx, GL_INVALID_OPERATION,
> -  "%s(indirect is not aligned)", name);
> +  if ((_mesa_is_desktop_gl(ctx) && ctx->Version >= 44) ||
> +  _mesa_is_gles31(ctx))
> + _mesa_error(ctx, GL_INVALID_VALUE,
> + "%s(indirect is not aligned)", name);
> +  else
> + _mesa_error(ctx, GL_INVALID_OPERATION,
> + "%s(indirect is not aligned)", name);
>return GL_FALSE;
> }
>
> --
> 2.1.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] draw: fix splitting of line loops

2015-10-20 Thread Brian Paul

On 10/19/2015 08:29 AM, Brian Paul wrote:

On 10/19/2015 04:53 AM, Roland Scheidegger wrote:

I thought just changing the prim type would cause the loop to not get
closed in the end, albeit I looked at it only briefly (I thought the
DRAW_SPLIT_AFTER/BEFORE flags were supposed to be able to deal with this
but couldn't figure out how). But if this works correctly,


Yeah, everything I've tested looks good now with this change.  That
said, I don't know all the ins and outs of that code either and it's
possible there's more to do.  But it's certainly better than before.


Grrr, I found another issue with this.  I'm testing a different patch 
and will try to post a v2 later.


-Brian

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Introducing OpenSWR: High performance software rasterizer

2015-10-20 Thread Rowley, Timothy O

> On Oct 20, 2015, at 12:44 PM, Ilia Mirkin  wrote:
> 
> On Tue, Oct 20, 2015 at 1:43 PM, Ilia Mirkin  wrote:
>> On Tue, Oct 20, 2015 at 1:11 PM, Rowley, Timothy O
>>  wrote:
>>> Does one build work on both AVX and AVX2?
>>> -
>>> 
>>> * Unfortunately, no.  The architecture support is fixed at compile
>>>   time.  While the AVX version of course will run on AVX2 machines
>>>   and the jitted code will use AVX2, the overall performance will
>>>   suffer relative to a full AVX2 build.
>>> 
>>> * There is some idea that if we move some code from the driver back
>>>   to SWR core, we could build two versions of libSWR and dynamically
>>>   load the correct version at runtime.  Unfortunately this mechanism
>>>   would not work with AVX512, as some of the SWR state structures
>>>   would change size.
>> 
>> Without commenting on any of the other issues, I believe one of your
>> stated goals is to ease distribution to your end-users. If you expect
>> them to build their own code, that's no problem. However if you're
>> thinking of relying on distros to include your driver and have end
>> users use that, then you should consider some solution that enables
>> runtime selection of this stuff (even if that's building 3 versions of
>> the driver -- swr-avx, swr-avx2, swr-avx512, and having e.g. loader
>> magic determine which the right one is for the current CPU).

We’ve found that the large clusters tend to roll their own user environment 
specific to their system configuration, so this problem of binary support 
hasn’t been an immediate concern for the initial users.  We hadn’t considered 
building complete driver/core-swr combinations behind a loader; we’ll consider 
this as a possibility for avx512.

Most of the code movement to make runtime selection at the interface layer 
between core SWR and the driver has been done; we would need to verify any 
stray AVX/AVX2 architecture differences in the driver and add loader logic.

-Tim

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for more message types

2015-10-20 Thread Ilia Mirkin
On Tue, Oct 20, 2015 at 5:56 AM, Neil Roberts  wrote:
> In bfdae9149e0 I disabled the opt_sampler_eot optimisation for TG4
> message types because I found by experimentation that it doesn't work.
> I wrote in the comment that I couldn't find any documentation for this
> problem. However I've now found the documentation and it has
> additional restrictions on further message types so this patch updates
> the comment and adds the others.
> ---
>
> That paragraph in the spec also mentions further restrictions that we
> should probably worry about like that the shader shouldn't combine
> this optimisation with any other render target data port read/writes.
>
> It also has a fairly pessimistic note saying the optimisation is only
> really good for large polygons in a GUI-like workload. I wonder
> whether we should be doing some more benchmarking to decide whether
> it's really a good idea to enable this as a general optimisation even
> for games.
>
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 49323eb..bf9ff84 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2238,13 +2238,14 @@ fs_visitor::opt_sampler_eot()
> if (unlikely(tex_inst->is_head_sentinel()) || !tex_inst->is_tex())
>return false;
>
> -   /* This optimisation doesn't seem to work for textureGather for some
> -* reason. I can't find any documentation or known workarounds to indicate
> -* that this is expected, but considering that it is probably pretty
> -* unlikely that a shader would directly write out the results from
> -* textureGather we might as well just disable it.
> +   /* 3D Sampler » Messages » Message Format
> +*
> +* “Response Length of zero is allowed on all SIMD8* and SIMD16* sampler
> +*  messages except sample+killpix, resinfo, sampleinfo, LOD, and 
> gather4*”
>  */
> -   if (tex_inst->opcode == SHADER_OPCODE_TG4 ||
> +   if (tex_inst->opcode == SHADER_OPCODE_TXS ||
> +   tex_inst->opcode == SHADER_OPCODE_LOD ||
> +   tex_inst->opcode == SHADER_OPCODE_TG4 ||
> tex_inst->opcode == SHADER_OPCODE_TG4_OFFSET)

Do you also need to include SHADER_OPCODE_SAMPLEINFO?

>return false;
>
> --
> 1.9.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 03/21] mesa/extensions: Wrap array entries in macros

2015-10-20 Thread Marek Olšák
Also, the FIXME comment should be on its own line.

Marek

On Tue, Oct 20, 2015 at 5:14 PM, Marek Olšák  wrote:
> On Tue, Oct 20, 2015 at 4:31 PM, Erik Faye-Lund  wrote:
>> On Tue, Oct 20, 2015 at 12:36 AM, Nanley Chery  wrote:
>>> From: Nanley Chery 
>>>
>>> -   { "GL_SUN_multi_draw_arrays",   o(dummy_true),  
>>> GLL,1999 },
>>> +#define EXT(name_str, driver_cap, api_flags, ) \
>>> +{ .name = "GL_" #name_str, .offset = o(driver_cap), .api_set = 
>>> api_flags, .year = },
>>
>> I suspect that using designated initializers will make the MSVC build 
>> unhappy.
>
> Yes, this syntax isn't allowed in core Mesa. Only drivers can use it.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92552] [softpipe] piglit egl-create-context-valid-flag-forward-compatible-gl regression

2015-10-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92552

--- Comment #3 from Matthew Waters  ---
Note: I don't have commit access so if this is good, then someone else has to
push ;).

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 16/21] mesa: Fix ARB_texture_compression_bptc functionality leaks

2015-10-20 Thread Marek Olšák
NAK. I'd like this extension in compatibility contexts. The fact the
spec requires OpenGL 3.1 was just authors' laziness.

Marek

On Tue, Oct 20, 2015 at 12:44 AM, Nanley Chery  wrote:
> From: Nanley Chery 
>
> Stop leaks into the following contexts:
>* GLES in _mesa_target_can_be_compressed().
>* Pre-3.1 GL legacy versions in all uses.
>
> The extension spec lists OpenGL 3.1 as required, so update the extension
> table accordingly.
>
> Signed-off-by: Nanley Chery 
> ---
>  src/mesa/main/extensions_table.h | 2 +-
>  src/mesa/main/glformats.c| 3 +--
>  src/mesa/main/teximage.c | 2 +-
>  3 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/main/extensions_table.h 
> b/src/mesa/main/extensions_table.h
> index cf6eed7..62e0804 100644
> --- a/src/mesa/main/extensions_table.h
> +++ b/src/mesa/main/extensions_table.h
> @@ -93,7 +93,7 @@ EXT(ARB_texture_buffer_object   , 
> ARB_texture_buffer_object
>  EXT(ARB_texture_buffer_object_rgb32 , 
> ARB_texture_buffer_object_rgb32,  x , GLC,  x ,  x , 2009)
>  EXT(ARB_texture_buffer_range, ARB_texture_buffer_range   
> ,  x , GLC,  x ,  x , 2012)
>  EXT(ARB_texture_compression , dummy_true 
> , GLL,  x ,  x ,  x , 2000)
> -EXT(ARB_texture_compression_bptc, ARB_texture_compression_bptc   
> , GLL, GLC,  x ,  x , 2010)
> +EXT(ARB_texture_compression_bptc, ARB_texture_compression_bptc   
> ,  31, GLC,  x ,  x , 2010)
>  EXT(ARB_texture_compression_rgtc, ARB_texture_compression_rgtc   
> , GLL, GLC,  x ,  x , 2004)
>  EXT(ARB_texture_cube_map, ARB_texture_cube_map   
> , GLL,  x ,  x ,  x , 1999)
>  EXT(ARB_texture_cube_map_array  , ARB_texture_cube_map_array 
> , GLL, GLC,  x ,  x , 2009)
> diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c
> index 9134d7d..542dcfc 100644
> --- a/src/mesa/main/glformats.c
> +++ b/src/mesa/main/glformats.c
> @@ -1327,8 +1327,7 @@ _mesa_is_compressed_format(const struct gl_context 
> *ctx, GLenum format)
> case MESA_FORMAT_LAYOUT_ETC2:
>return _mesa_is_gles3(ctx) || ctx->Extensions.ARB_ES3_compatibility;
> case MESA_FORMAT_LAYOUT_BPTC:
> -  return _mesa_is_desktop_gl(ctx) &&
> - ctx->Extensions.ARB_texture_compression_bptc;
> +  return _mesa_has_ARB_texture_compression_bptc(ctx);
> case MESA_FORMAT_LAYOUT_ASTC:
>return _mesa_has_KHR_texture_compression_astc_ldr(ctx);
> default:
> diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
> index c14f941..7910fe7 100644
> --- a/src/mesa/main/teximage.c
> +++ b/src/mesa/main/teximage.c
> @@ -1387,7 +1387,7 @@ _mesa_target_can_be_compressed(const struct gl_context 
> *ctx, GLenum target,
>  return write_error(error, GL_INVALID_OPERATION);
>   break;
>case MESA_FORMAT_LAYOUT_BPTC:
> - target_can_be_compresed = 
> ctx->Extensions.ARB_texture_compression_bptc;
> + target_can_be_compresed = 
> _mesa_has_ARB_texture_compression_bptc(ctx);
>   break;
>case MESA_FORMAT_LAYOUT_ASTC:
>   target_can_be_compresed =
> --
> 2.6.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/7] st/va: properly defines VAImageFormat formats and improve VaCreateImage

2015-10-20 Thread Julien Isorce
On 17 October 2015 at 00:25, Ilia Mirkin  wrote:

> Not sure how VA specifies things, but if the RGBA8 stuff is supposed
> to be in CPU-endian as packed 32-bit ints, I think you're meant to use
> PIPE_FORMAT_RGBA_UNORM and so on. However if it's always supposed
> to be little-endian or array-based, then the way you have it is fine.
>

Great question.

>From what I can read, I think in VA there is no fixed assumption,  for
example VA_FOURCC('A','B','G','R') exists, I think currently mesa st/va
just only supports VA_LSB_FIRST (but VA_MSB_FIRST exists in libva headers).
Initially I just extended existing code:

case VA_FOURCC('B','G','R','A'):
   return PIPE_FORMAT_B8G8R8A8_UNORM;

I also compared with intel vaapi driver:
http://cgit.freedesktop.org/vaapi/intel-driver/tree/src/i965_drv_video.c#n291
It seems they only use LSB case. Though MSB is handled in unmaintained
vaapi driver bridge for vdpau:
http://cgit.freedesktop.org/vaapi/vdpau-driver/tree/src/vdpau_image.c#n57

As you raised the question, I decided to ask on libva irc channel to poke
some maintainers. But actually you replied to me :) (for the record, you
replied there is not a ton of big endian intel cpu).

Well in any case libva defines for example VA_FOURCC('A','B','G','R') and
VA_FOURCC('B','G','R','A'), so I guess it means I should not use
PIPE_FORMAT_RGBA_UNORM-likes so I guess the patch is ok.

I think for now mesa st/va does not work on big endian cpu when it is about
using "RGBs" formats. And after my patch it still does not.

Though I can add them if you want but it will be untested :)

Julien



>
>   -ilia
>
> On Fri, Oct 16, 2015 at 7:14 PM, Julien Isorce 
> wrote:
> > Also add RGBA, RGBX and BGRX.
> > Also extend ChromaToPipe and implement PipeToYCbCr.
> >
> > Note that gstreamer-vaapi check all the VAImageFormat fields.
> >
> > Signed-off-by: Julien Isorce 
> > ---
> >  src/gallium/state_trackers/va/image.c  | 10 ++--
> >  src/gallium/state_trackers/va/va_private.h | 38
> +-
> >  2 files changed, 45 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/gallium/state_trackers/va/image.c
> b/src/gallium/state_trackers/va/image.c
> > index 022240d..c7fbe1a 100644
> > --- a/src/gallium/state_trackers/va/image.c
> > +++ b/src/gallium/state_trackers/va/image.c
> > @@ -44,7 +44,10 @@ static const VAImageFormat
> formats[VL_VA_MAX_IMAGE_FORMATS] =
> > {VA_FOURCC('Y','V','1','2')},
> > {VA_FOURCC('Y','U','Y','V')},
> > {VA_FOURCC('U','Y','V','Y')},
> > -   {VA_FOURCC('B','G','R','A')}
> > +   {.fourcc = VA_FOURCC('B','G','R','A'), .byte_order = VA_LSB_FIRST,
> 32, 32, 0x00ff, 0xff00, 0x00ff, 0xff00},
> > +   {.fourcc = VA_FOURCC('R','G','B','A'), .byte_order = VA_LSB_FIRST,
> 32, 32, 0x00ff, 0xff00, 0x00ff, 0xff00},
> > +   {.fourcc = VA_FOURCC('B','G','R','X'), .byte_order = VA_LSB_FIRST,
> 32, 24, 0x00ff, 0xff00, 0x00ff, 0x},
> > +   {.fourcc = VA_FOURCC('R','G','B','X'), .byte_order = VA_LSB_FIRST,
> 32, 24, 0x00ff, 0xff00, 0x00ff, 0x}
> >  };
> >
> >  static void
> > @@ -116,7 +119,7 @@ vlVaCreateImage(VADriverContextP ctx, VAImageFormat
> *format, int width, int heig
> > img->width = width;
> > img->height = height;
> > w = align(width, 2);
> > -   h = align(width, 2);
> > +   h = align(height, 2);
> >
> > switch (format->fourcc) {
> > case VA_FOURCC('N','V','1','2'):
> > @@ -149,6 +152,9 @@ vlVaCreateImage(VADriverContextP ctx, VAImageFormat
> *format, int width, int heig
> >break;
> >
> > case VA_FOURCC('B','G','R','A'):
> > +   case VA_FOURCC('R','G','B','A'):
> > +   case VA_FOURCC('B','G','R','X'):
> > +   case VA_FOURCC('R','G','B','X'):
> >img->num_planes = 1;
> >img->pitches[0] = w * 4;
> >img->offsets[0] = 0;
> > diff --git a/src/gallium/state_trackers/va/va_private.h
> b/src/gallium/state_trackers/va/va_private.h
> > index 1ea7be7..3479156 100644
> > --- a/src/gallium/state_trackers/va/va_private.h
> > +++ b/src/gallium/state_trackers/va/va_private.h
> > @@ -46,7 +46,7 @@
> >  #define VL_VA_DRIVER(ctx) ((vlVaDriver *)ctx->pDriverData)
> >  #define VL_VA_PSCREEN(ctx) (VL_VA_DRIVER(ctx)->vscreen->pscreen)
> >
> > -#define VL_VA_MAX_IMAGE_FORMATS 6
> > +#define VL_VA_MAX_IMAGE_FORMATS 9
> >
> >  static inline enum pipe_video_chroma_format
> >  ChromaToPipe(int format)
> > @@ -58,6 +58,8 @@ ChromaToPipe(int format)
> >return PIPE_VIDEO_CHROMA_FORMAT_422;
> > case VA_RT_FORMAT_YUV444:
> >return PIPE_VIDEO_CHROMA_FORMAT_444;
> > +   case VA_RT_FORMAT_RGB32:
> > +   return 0;
> > default:
> >assert(0);
> >return PIPE_VIDEO_CHROMA_FORMAT_420;
> > @@ -80,12 +82,46 @@ YCbCrToPipe(unsigned format)
> >return PIPE_FORMAT_UYVY;
> > case VA_FOURCC('B','G','R','A'):
> >return PIPE_FORMAT_B8G8R8A8_UNORM;
> > +   case 

Re: [Mesa-dev] [PATCH] radeonsi: add support for ARB_texture_view

2015-10-20 Thread Marek Olšák
On Tue, Oct 20, 2015 at 5:26 PM, Ilia Mirkin  wrote:
> On Sat, Oct 17, 2015 at 9:09 AM, Marek Olšák  wrote:
>> +   /* This is not needed if state trackers set last_layer correctly. */
>> +   if (state->target == PIPE_TEXTURE_1D ||
>> +   state->target == PIPE_TEXTURE_2D ||
>> +   state->target == PIPE_TEXTURE_RECT ||
>> +   state->target == PIPE_TEXTURE_CUBE)
>> +   last_layer = state->u.tex.first_layer;
>> +
>
> Did you observe state trackers messing this one up, or just being
> defensive? Also for cube, normally last layer would be first_layer +
> 5...

I'm just being defensive.

The hardware does the right thing if target == CUBE. first_layer ==
last_layer has no effect on cubemaps.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 21/21] mesa/extensions: Declare _mesa_init_extensions() static inline

2015-10-20 Thread Nanley Chery
On Tue, Oct 20, 2015 at 8:57 AM, Marek Olšák  wrote:

> On Tue, Oct 20, 2015 at 12:44 AM, Nanley Chery 
> wrote:
> > From: Nanley Chery 
> >
> > Avoid the function call overhead for this one-liner.
>
> Can you measure the overhead? If not, there is no reason for this change.
>
>
The .text size does decrease by 48 bytes:
 text   data bss   dec hex filename
5190616  209856   27712 5428184  52d3d8 lib/i965_dri.so (before)
5190568  209856   27712 5428136  52d3a8 lib/i965_dri.so (after)

Is this sufficient? If so, I can add the snippet above to the git comment.

Regards,
Nanley

Marek
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 09/21] mesa: Generate a helper function for each extension

2015-10-20 Thread Marek Olšák
On Tue, Oct 20, 2015 at 12:44 AM, Nanley Chery  wrote:
> From: Nanley Chery 
>
> Generate functions which determine if an extension is supported in the
> current context. Initially, enums were going to be explicitly used with
> _mesa_extension_supported(). The idea to embed the function and enums
> into generated helper functions was suggested by Kristian Høgsberg.
>
> For performance, the function body no longer uses
> _mesa_extension_supported() and, as suggested by Chad Versace, the
> functions are also declared static inline.
>
> Signed-off-by: Nanley Chery 
> ---
>  src/mesa/main/context.h|  1 +
>  src/mesa/main/extensions.c | 22 +-
>  src/mesa/main/extensions.h | 39 +++
>  3 files changed, 41 insertions(+), 21 deletions(-)
>
> diff --git a/src/mesa/main/context.h b/src/mesa/main/context.h
> index 1e7a12c..4798b1f 100644
> --- a/src/mesa/main/context.h
> +++ b/src/mesa/main/context.h
> @@ -50,6 +50,7 @@
>
>
>  #include "imports.h"
> +#include "extensions.h"
>  #include "mtypes.h"
>  #include "vbo/vbo.h"
>
> diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
> index 9451085..136313f 100644
> --- a/src/mesa/main/extensions.c
> +++ b/src/mesa/main/extensions.c
> @@ -42,26 +42,6 @@ struct gl_extensions _mesa_extension_override_disables;
>  static char *extra_extensions = NULL;
>  static char *cant_disable_extensions = NULL;
>
> -/**
> - * \brief An element of the \c extension_table.
> - */
> -struct extension {
> -   /** Name of extension, such as "GL_ARB_depth_clamp". */
> -   const char *name;
> -
> -   /** Offset (in bytes) of the corresponding member in struct 
> gl_extensions. */
> -   size_t offset;
> -
> -   /** Minimum version the extension requires for the given API
> -* (see gl_api defined in mtypes.h)
> -*/
> -   GLuint version[API_OPENGL_LAST + 1];
> -
> -   /** Year the extension was proposed or approved.  Used to sort the
> -* extension string chronologically. */
> -   uint16_t year;
> -};
> -
>
>  /**
>   * Given a member \c x of struct gl_extensions, return offset of
> @@ -73,7 +53,7 @@ struct extension {
>  /**
>   * \brief Table of supported OpenGL extensions for all API's.
>   */
> -static const struct extension extension_table[] = {
> +const struct extension extension_table[] = {
>  #define EXT(name_str, driver_cap, gll_ver, glc_ver, gles_ver, gles2_ver, 
> ) \
>  { .name = "GL_" #name_str, .offset = o(driver_cap), \
>.version = { \
> diff --git a/src/mesa/main/extensions.h b/src/mesa/main/extensions.h
> index 595512a..24cc04d 100644
> --- a/src/mesa/main/extensions.h
> +++ b/src/mesa/main/extensions.h
> @@ -55,6 +55,45 @@ _mesa_get_extension_count(struct gl_context *ctx);
>  extern const GLubyte *
>  _mesa_get_enabled_extension(struct gl_context *ctx, GLuint index);
>
> +
> +/**
> + * \brief An element of the \c extension_table.
> + */
> +struct extension {
> +   /** Name of extension, such as "GL_ARB_depth_clamp". */
> +   const char *name;
> +
> +   /** Offset (in bytes) of the corresponding member in struct 
> gl_extensions. */
> +   size_t offset;
> +
> +   /** Minimum version the extension requires for the given API
> +* (see gl_api defined in mtypes.h)
> +*/
> +   GLuint version[API_OPENGL_LAST + 1];
> +
> +   /** Year the extension was proposed or approved.  Used to sort the
> +* extension string chronologically. */
> +   uint16_t year;
> +} extern const extension_table[];

Global variables should have a proper prefix.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeonsi: add support for ARB_texture_view

2015-10-20 Thread Ilia Mirkin
On Sat, Oct 17, 2015 at 9:09 AM, Marek Olšák  wrote:
> +   /* This is not needed if state trackers set last_layer correctly. */
> +   if (state->target == PIPE_TEXTURE_1D ||
> +   state->target == PIPE_TEXTURE_2D ||
> +   state->target == PIPE_TEXTURE_RECT ||
> +   state->target == PIPE_TEXTURE_CUBE)
> +   last_layer = state->u.tex.first_layer;
> +

Did you observe state trackers messing this one up, or just being
defensive? Also for cube, normally last layer would be first_layer +
5...

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/7] st/va: add headless support, i.e. VA_DISPLAY_DRM

2015-10-20 Thread Julien Isorce
On 19 October 2015 at 17:16, Emil Velikov  wrote:

> On 17 October 2015 at 00:14, Julien Isorce 
> wrote:
> > This patch allows to use gallium vaapi without requiring
> > a X server running for your second graphic card.
> >
> I've sent a lengthy series which should mitigate the need of some hunks.
>

Ok I'll wait for your patches to land before going further on this patch.
Should I expect vl_winsy_drm.c in your patches ? Not sure do understood
that part. Actually I though about having "vl_screen_create_drm" and
renames vl_screen_create to vl_screen_create_x11 (because it takes XDisplay
in params) but then I got confused because vl_winsys.h includes Xlib.h.
Should future vl_screen_create_drm be in another header, vl_drm.h ?

Thx
Julien


> > Signed-off-by: Julien Isorce 
> > ---
> >  src/gallium/state_trackers/va/Makefile.am |  9 ++
> >  src/gallium/state_trackers/va/context.c   | 49
> +++
> >  2 files changed, 53 insertions(+), 5 deletions(-)
> >
> > diff --git a/src/gallium/state_trackers/va/Makefile.am
> b/src/gallium/state_trackers/va/Makefile.am
> > index 2a93a90..348cfe1 100644
> > --- a/src/gallium/state_trackers/va/Makefile.am
> > +++ b/src/gallium/state_trackers/va/Makefile.am
> > @@ -30,6 +30,15 @@ AM_CFLAGS = \
> > $(VA_CFLAGS) \
> > -DVA_DRIVER_INIT_FUNC="__vaDriverInit_$(VA_MAJOR)_$(VA_MINOR)"
> >
> > +AM_CFLAGS += \
> > +   $(GALLIUM_PIPE_LOADER_DEFINES) \
> > +   -DPIPE_SEARCH_DIR=\"$(libdir)/gallium-pipe\"
> > +
> > +if HAVE_GALLIUM_STATIC_TARGETS
> > +AM_CFLAGS += \
> > +   -DGALLIUM_STATIC_TARGETS=1
> > +endif
> > +
> Like this one.
>
> >  AM_CPPFLAGS = \
> > -I$(top_srcdir)/include
> >
> > diff --git a/src/gallium/state_trackers/va/context.c
> b/src/gallium/state_trackers/va/context.c
> > index ddc863b..9ab2710 100644
> > --- a/src/gallium/state_trackers/va/context.c
> > +++ b/src/gallium/state_trackers/va/context.c
> > @@ -28,7 +28,8 @@
> >
> >  #include "pipe/p_screen.h"
> >  #include "pipe/p_video_codec.h"
> > -
> > +#include "pipe-loader/pipe_loader.h"
> > +#include "state_tracker/drm_driver.h"
> >  #include "util/u_memory.h"
> >  #include "util/u_handle_table.h"
> >  #include "util/u_video.h"
> > @@ -98,7 +99,7 @@ static struct VADriverVTableVPP vtable_vpp =
> >  PUBLIC VAStatus
> >  VA_DRIVER_INIT_FUNC(VADriverContextP ctx)
> >  {
> > -   vlVaDriver *drv;
> > +   vlVaDriver *drv = NULL;
> Unnecessary change.
>
> >
> > if (!ctx)
> >return VA_STATUS_ERROR_INVALID_CONTEXT;
> > @@ -107,8 +108,40 @@ VA_DRIVER_INIT_FUNC(VADriverContextP ctx)
> > if (!drv)
> >return VA_STATUS_ERROR_ALLOCATION_FAILED;
> >
> > -   drv->vscreen = vl_screen_create(ctx->native_dpy, ctx->x11_screen);
> > -   if (!drv->vscreen)
> > +   drv->vscreen = NULL;
> DItto.
>
> > +
> > +   switch (ctx->display_type) {
> > +   case VA_DISPLAY_X11:
> > +  drv->vscreen = vl_screen_create(ctx->native_dpy, ctx->x11_screen);
> > +  if (!drv->vscreen)
> > + goto error_screen;
> > +  break;
> > +
> > +   case VA_DISPLAY_DRM:
> > +   case VA_DISPLAY_DRM_RENDERNODES: {
> > +  struct drm_state *drm_info = (struct drm_state *) ctx->drm_state;
> > +  if (!drm_info)
> > + goto error_screen;
> > +
> > +  drv->vscreen = CALLOC_STRUCT(vl_screen);
> > +
> > +#if GALLIUM_STATIC_TARGETS
> > +  drv->vscreen->pscreen = dd_create_screen(drm_info->fd);
> > +#else
> > +  int loader_fd = dup(drm_info->fd);
> > +  if (loader_fd == -1)
> > + goto error_screen;
> > +
> > +  if (pipe_loader_drm_probe_fd(>dev, loader_fd))
> > + drv->vscreen->pscreen = pipe_loader_create_screen(drv->dev,
> PIPE_SEARCH_DIR);
> > +#endif
> And much of this.
>
> Having this around feels rather abusive. I'm leaning that ideally we
> need vl_winsy_drm.c, but I'll let others decide. At the very least the
> error paths looks quite funky. Please use the same approach as in
> vlVaTerminate()
>
> -Emil
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 4/7] st/va: implement VaCreateSurfaces2 and VaQuerySurfaceAttributes

2015-10-20 Thread Julien Isorce
Inspired from http://cgit.freedesktop.org/vaapi/intel-driver/
especially src/i965_drv_video.c::i965_CreateSurfaces2.

This patch is mainly to support gstreamer-vaapi and tools that uses
this newer libva API. The first advantage of using VaCreateSurfaces2
over existing VaCreateSurfaces, is that it is possible to select which
the pixel format for the surface. Indeed with the simple VaCreateSurfaces
function it is only possible to create a NV12 surface. It can be useful
to create a RGBA surface to use with video post processing.

The avaible pixel formats can be query with VaQuerySurfaceAttributes.

Signed-off-by: Julien Isorce 
---
 src/gallium/state_trackers/va/context.c|   5 +-
 src/gallium/state_trackers/va/surface.c| 294 -
 src/gallium/state_trackers/va/va_private.h |   6 +-
 3 files changed, 253 insertions(+), 52 deletions(-)

diff --git a/src/gallium/state_trackers/va/context.c 
b/src/gallium/state_trackers/va/context.c
index 8b003ae..9be9085 100644
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -81,7 +81,10 @@ static struct VADriverVTable vtable =
,
,
,
-   
+   ,
+   NULL, /* DEPRECATED VaGetSurfaceAttributes */
+   ,
+   
 };
 
 PUBLIC VAStatus
diff --git a/src/gallium/state_trackers/va/surface.c 
b/src/gallium/state_trackers/va/surface.c
index 8d4487b..62fdf3c 100644
--- a/src/gallium/state_trackers/va/surface.c
+++ b/src/gallium/state_trackers/va/surface.c
@@ -36,6 +36,7 @@
 #include "util/u_surface.h"
 
 #include "vl/vl_compositor.h"
+#include "vl/vl_video_buffer.h"
 #include "vl/vl_winsys.h"
 
 #include "va_private.h"
@@ -44,56 +45,8 @@ VAStatus
 vlVaCreateSurfaces(VADriverContextP ctx, int width, int height, int format,
int num_surfaces, VASurfaceID *surfaces)
 {
-   struct pipe_video_buffer templat = {};
-   struct pipe_screen *pscreen;
-   vlVaDriver *drv;
-   int i;
-
-   if (!ctx)
-  return VA_STATUS_ERROR_INVALID_CONTEXT;
-
-   if (!(width && height))
-  return VA_STATUS_ERROR_INVALID_IMAGE_FORMAT;
-
-   drv = VL_VA_DRIVER(ctx);
-   pscreen = VL_VA_PSCREEN(ctx);
-
-   templat.buffer_format = pscreen->get_video_param
-   (
-  pscreen,
-  PIPE_VIDEO_PROFILE_UNKNOWN,
-  PIPE_VIDEO_ENTRYPOINT_BITSTREAM,
-  PIPE_VIDEO_CAP_PREFERED_FORMAT
-   );
-   templat.chroma_format = ChromaToPipe(format);
-   templat.width = width;
-   templat.height = height;
-   templat.interlaced = pscreen->get_video_param
-   (
-  pscreen,
-  PIPE_VIDEO_PROFILE_UNKNOWN,
-  PIPE_VIDEO_ENTRYPOINT_BITSTREAM,
-  PIPE_VIDEO_CAP_PREFERS_INTERLACED
-   );
-
-   for (i = 0; i < num_surfaces; ++i) {
-  vlVaSurface *surf = CALLOC(1, sizeof(vlVaSurface));
-  if (!surf)
- goto no_res;
-
-  surf->templat = templat;
-  surf->buffer = drv->pipe->create_video_buffer(drv->pipe, );
-  util_dynarray_init(>subpics);
-  surfaces[i] = handle_table_add(drv->htab, surf);
-   }
-
-   return VA_STATUS_SUCCESS;
-
-no_res:
-   if (i)
-  vlVaDestroySurfaces(ctx, surfaces, i);
-
-   return VA_STATUS_ERROR_ALLOCATION_FAILED;
+   return vlVaCreateSurfaces2(ctx, format, width, height, surfaces, 
num_surfaces,
+  NULL, 0);
 }
 
 VAStatus
@@ -349,3 +302,244 @@ vlVaUnlockSurface(VADriverContextP ctx, VASurfaceID 
surface)
 
return VA_STATUS_ERROR_UNIMPLEMENTED;
 }
+
+VAStatus
+vlVaQuerySurfaceAttributes(VADriverContextP ctx, VAConfigID config,
+   VASurfaceAttrib *attrib_list, unsigned int 
*num_attribs)
+{
+vlVaDriver *drv;
+VASurfaceAttrib *attribs;
+struct pipe_screen *pscreen;
+int i;
+
+if (config == VA_INVALID_ID)
+return VA_STATUS_ERROR_INVALID_CONFIG;
+
+if (!attrib_list && !num_attribs)
+return VA_STATUS_ERROR_INVALID_PARAMETER;
+
+if (!attrib_list) {
+*num_attribs = VASurfaceAttribCount;
+return VA_STATUS_SUCCESS;
+}
+
+if (!ctx)
+   return VA_STATUS_ERROR_INVALID_CONTEXT;
+
+drv = VL_VA_DRIVER(ctx);
+
+if (!drv)
+return VA_STATUS_ERROR_INVALID_CONTEXT;
+
+pscreen = VL_VA_PSCREEN(ctx);
+
+if (!pscreen)
+   return VA_STATUS_ERROR_INVALID_CONTEXT;
+
+attribs = CALLOC(VASurfaceAttribCount, sizeof(VASurfaceAttrib));
+
+if (!attribs)
+return VA_STATUS_ERROR_ALLOCATION_FAILED;
+
+i = 0;
+
+if (config == PIPE_VIDEO_PROFILE_UNKNOWN) {
+   /* Assume VAEntrypointVideoProc for now. */
+   attribs[i].type = VASurfaceAttribPixelFormat;
+   attribs[i].value.type = VAGenericValueTypeInteger;
+   attribs[i].flags = VA_SURFACE_ATTRIB_GETTABLE | 
VA_SURFACE_ATTRIB_SETTABLE;
+   attribs[i].value.value.i = VA_FOURCC_BGRA;
+   i++;
+
+   attribs[i].type = VASurfaceAttribPixelFormat;
+   attribs[i].value.type = VAGenericValueTypeInteger;
+   attribs[i].flags = VA_SURFACE_ATTRIB_GETTABLE | 
VA_SURFACE_ATTRIB_SETTABLE;
+   

Re: [Mesa-dev] [RFCv0 0/8] gallium: add support for NIR as alternate IR

2015-10-20 Thread Ilia Mirkin
On Tue, Oct 20, 2015 at 9:10 AM, Marek Olšák  wrote:
> On Tue, Oct 20, 2015 at 2:37 PM, Rob Clark  wrote:
>> On Tue, Oct 20, 2015 at 6:49 AM, Marek Olšák  wrote:
>>> On Tue, Oct 20, 2015 at 3:51 AM, Rob Clark  wrote:
 On Mon, Oct 19, 2015 at 8:53 PM, Marek Olšák  wrote:
> Yes, drivers need to "dup" tokens if they want to keep them. Their
> lifetime is the create function and that's it. st/mesa only keeps the
> tokens for vertex shaders, because it needs for emulating some
> features with the Draw module.

 It is something that I think would be worth revisiting, since the
 common case is that drivers have to dup the tokens to do their own
 variant mgmt.. so that seems like the better case to optimize for.  I
 think the only one that isn't doing it's own variant mgmt is nouveau,
 but Ilia mentioned at some point that there where some edge cases
 where they really do need a variant.
>>>
>>> This would add more cruft into state trackers and modules. Doing "dup"
>>> in the driver is the easiest way to handle this.
>>>
>>> Also, drivers should be able to release any incoming IR if they don't
>>> need it. Including NIR. Having a requirement that some IR must be in
>>> memory for the lifetime of shaders is just overall a bad idea.
>>
>> If you need to create variants, you need to hang on to the IR until
>> all possible variants are created (ie. forever)
>
> Not true. You can convert to another IR, run lowering passes, and keep
> the result of that.

On nvc0, I'm just patching up the binary code directly. I may or may
not be able to get away with doing that on nv50... my latest thinking
is that I can, but haven't implemented yet. Which will mean no
variants for nouveau *and* fully working logic for flat shading :)

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 01/21] mesa/texcompress: Restrict FXT1 format to desktop GL subset

2015-10-20 Thread Nanley Chery
On Tue, Oct 20, 2015 at 6:26 AM, Ian Romanick  wrote:

> On 10/19/2015 05:58 PM, Nanley Chery wrote:
> >
> >
> > On Mon, Oct 19, 2015 at 3:47 PM, Ilia Mirkin  > > wrote:
> >
> > On Mon, Oct 19, 2015 at 6:36 PM, Nanley Chery  > > wrote:
> > > From: Nanley Chery  > >
> > >
> > > In agreement with the extension spec and commit
> > > dd0eb004874645135b9aaac3ebbd0aaf274079ea, filter FXT1 formats to
> the
> > > desktop GL profiles. Now we no longer advertise such formats as
> > supported
> > > in an ES context and then throw an INVALID_ENUM error when the
> client
> > > tries to use such formats with CompressedTexImage2D.
> > >
> > > Fixes the following 26 dEQP tests:
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_x
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_y
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_z
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_x
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_y
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_z
> > > *
> >
>  dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_size
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_level_max_cube_pos
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_level_max_tex2d
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_level_cube
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_level_tex2d
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_x
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_y
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_z
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_x
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_y
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_z
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_tex2d
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_x
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_y
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_z
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_x
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_y
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_z
> > > *
> >
>  
> dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_tex2d
> > >
> > > Cc: Ian Romanick  > >
> > > Cc: Mark Janes  > >
> > > Cc: "11.0"  > >
> > > Signed-off-by: Nanley Chery  > >
> > > ---
> > >  src/mesa/main/texcompress.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/src/mesa/main/texcompress.c
> b/src/mesa/main/texcompress.c
> > > index feaf061..1615f4f 100644
> > > --- a/src/mesa/main/texcompress.c
> > > +++ b/src/mesa/main/texcompress.c
> > > @@ -286,7 +286,9 @@ GLuint
> > >  _mesa_get_compressed_formats(struct gl_context *ctx, GLint
> *formats)
> > >  {
> > > GLuint n = 0;
> > > -   if (ctx->Extensions.TDFX_texture_compression_FXT1) {
> > > +   if (ctx->Extensions.TDFX_texture_compression_FXT1 &&
> > > +  (ctx->API == API_OPENGL_CORE ||
> > > +  (ctx->API == API_OPENGL_COMPAT && ctx->Version > 10))) {
> >
> > Not sure if you get rid of such helpers 

Re: [Mesa-dev] [RFC 21/21] mesa/extensions: Declare _mesa_init_extensions() static inline

2015-10-20 Thread Nanley Chery
On Tue, Oct 20, 2015 at 9:31 AM, Marek Olšák  wrote:

> On Tue, Oct 20, 2015 at 6:25 PM, Nanley Chery 
> wrote:
> >
> >
> > On Tue, Oct 20, 2015 at 8:57 AM, Marek Olšák  wrote:
> >>
> >> On Tue, Oct 20, 2015 at 12:44 AM, Nanley Chery 
> >> wrote:
> >> > From: Nanley Chery 
> >> >
> >> > Avoid the function call overhead for this one-liner.
> >>
> >> Can you measure the overhead? If not, there is no reason for this
> change.
> >>
> >
> > The .text size does decrease by 48 bytes:
> >  text   data bss   dec hex filename
> > 5190616  209856   27712 5428184  52d3d8 lib/i965_dri.so (before)
> > 5190568  209856   27712 5428136  52d3a8 lib/i965_dri.so (after)
> >
> > Is this sufficient? If so, I can add the snippet above to the git
> comment.
>
> I don't think so. :)
>
>
:) Okay. I suppose that the function isn't called often enough to warrant
the loss of encapsulation.

Thanks,
Nanley


> Marek
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCHv2 6/7] i965: Implement nir_intrinsic_shader_clock

2015-10-20 Thread Emil Velikov
v2:
 - Add a few const qualifiers for good measure.
 - Drop unneeded retype()s (Matt)
 - Convert timestamp to SIMD8/16, as fs_visitor::get_timestamp() returns
SIMD4 (Connor)

Signed-off-by: Emil Velikov 
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 11 +++
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 10 ++
 2 files changed, 21 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 792663f..976b4fd 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -1309,6 +1309,17 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
   break;
}
 
+   case nir_intrinsic_shader_clock: {
+  /* We cannot do anything if there is an event, so ignore it for now */
+  fs_reg shader_clock = get_timestamp(bld);
+  const fs_reg srcs[] = { shader_clock.set_smear(0), 
shader_clock.set_smear(1) };
+  const fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_UD, ARRAY_SIZE(srcs));
+
+  bld.LOAD_PAYLOAD(tmp, srcs, ARRAY_SIZE(srcs), 0);
+  bld.MOV(dest, tmp);
+  break;
+   }
+
case nir_intrinsic_image_size: {
   /* Get the referenced image variable and type. */
   const nir_variable *var = instr->variables[0]->var;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index ea1e3e7..c401212 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -806,6 +806,16 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr 
*instr)
   break;
}
 
+   case nir_intrinsic_shader_clock: {
+  /* We cannot do anything if there is an event, so ignore it for now */
+  const src_reg shader_clock = get_timestamp();
+  const enum brw_reg_type type = 
brw_type_for_base_type(glsl_type::uvec2_type);
+
+  dest = get_nir_dest(instr->dest, type);
+  emit(MOV(dest, shader_clock));
+  break;
+   }
+
default:
   unreachable("Unknown intrinsic");
}
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] mesa: Draw indirect is not allowed if the default VAO is bound.

2015-10-20 Thread Ilia Mirkin
On Tue, Oct 20, 2015 at 10:19 AM, Marta Lofstedt
 wrote:
> From: Marta Lofstedt 
>
> From OpenGL ES 3.1 specification, section 10.5:
> "DrawArraysIndirect requires that all data sourced for the
> command, including the DrawArraysIndirectCommand
> structure,  be in buffer objects,  and may not be called when
> the default vertex array object is bound."

Is it possible to do this with desktop GL? AFAIK ARB_draw_indirect is
only enabled for core profiles, and you can't draw at all in core
without a VAO bound. So I think you can remove the _mesa_is_gles31
check. [Might want to wait on confirmation for that, or double-check
my claim yourself.]

>
> Signed-off-by: Marta Lofstedt 
> ---
>  src/mesa/main/api_validate.c | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
> index a46c194..c5628f5 100644
> --- a/src/mesa/main/api_validate.c
> +++ b/src/mesa/main/api_validate.c
> @@ -698,6 +698,19 @@ valid_draw_indirect(struct gl_context *ctx,
>  {
> const GLsizeiptr end = (GLsizeiptr)indirect + size;
>
> +   /*
> +* OpenGL ES 3.1 spec. section 10.5:
> +* "DrawArraysIndirect requires that all data sourced for the
> +* command, including the DrawArraysIndirectCommand
> +* structure,  be in buffer objects,  and may not be called when
> +* the default vertex array object is bound."
> +*/
> +   if (_mesa_is_gles31(ctx) && (ctx->Array.VAO == ctx->Array.DefaultVAO)) {
> +  _mesa_error(ctx, GL_INVALID_OPERATION,
> +  "%s(The default VAO is bound)", name);
> +  return GL_FALSE;
> +   }
> +
> if (!_mesa_valid_prim_mode(ctx, mode, name))
>return GL_FALSE;
>
> --
> 2.1.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 21/21] mesa/extensions: Declare _mesa_init_extensions() static inline

2015-10-20 Thread Marek Olšák
On Tue, Oct 20, 2015 at 12:44 AM, Nanley Chery  wrote:
> From: Nanley Chery 
>
> Avoid the function call overhead for this one-liner.

Can you measure the overhead? If not, there is no reason for this change.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 5/7] st/va: implement dmabuf import for VaCreateSurfaces2

2015-10-20 Thread Julien Isorce
For now it is limited to RGBA, BGRA, RGBX, BGRX surfaces.

Signed-off-by: Julien Isorce 
---
 src/gallium/state_trackers/va/surface.c | 97 -
 1 file changed, 96 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/va/surface.c 
b/src/gallium/state_trackers/va/surface.c
index 62fdf3c..6b5a1f5 100644
--- a/src/gallium/state_trackers/va/surface.c
+++ b/src/gallium/state_trackers/va/surface.c
@@ -29,6 +29,8 @@
 #include "pipe/p_screen.h"
 #include "pipe/p_video_codec.h"
 
+#include "state_tracker/drm_driver.h"
+
 #include "util/u_memory.h"
 #include "util/u_handle_table.h"
 #include "util/u_rect.h"
@@ -41,6 +43,8 @@
 
 #include "va_private.h"
 
+#include 
+
 VAStatus
 vlVaCreateSurfaces(VADriverContextP ctx, int width, int height, int format,
int num_surfaces, VASurfaceID *surfaces)
@@ -368,7 +372,8 @@ vlVaQuerySurfaceAttributes(VADriverContextP ctx, VAConfigID 
config,
 attribs[i].type = VASurfaceAttribMemoryType;
 attribs[i].value.type = VAGenericValueTypeInteger;
 attribs[i].flags = VA_SURFACE_ATTRIB_GETTABLE | VA_SURFACE_ATTRIB_SETTABLE;
-attribs[i].value.value.i = VA_SURFACE_ATTRIB_MEM_TYPE_VA;
+attribs[i].value.value.i = VA_SURFACE_ATTRIB_MEM_TYPE_VA |
+VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME;
 i++;
 
 attribs[i].type = VASurfaceAttribExternalBufferDescriptor;
@@ -402,6 +407,83 @@ vlVaQuerySurfaceAttributes(VADriverContextP ctx, 
VAConfigID config,
 return VA_STATUS_SUCCESS;
 }
 
+static VAStatus
+suface_from_external_memory(VADriverContextP ctx, vlVaSurface *surface,
+VASurfaceAttribExternalBuffers *memory_attibute,
+int index, VASurfaceID *surfaces,
+struct pipe_video_buffer *templat)
+{
+vlVaDriver *drv;
+struct pipe_screen *pscreen;
+struct pipe_resource *resource;
+struct pipe_resource res_templ;
+struct winsys_handle whandle;
+struct pipe_resource *resources[VL_NUM_COMPONENTS];
+
+if (!ctx)
+return VA_STATUS_ERROR_INVALID_PARAMETER;
+
+pscreen = VL_VA_PSCREEN(ctx);
+drv = VL_VA_DRIVER(ctx);
+
+if (!memory_attibute || !memory_attibute->buffers ||
+index > memory_attibute->num_buffers)
+return VA_STATUS_ERROR_INVALID_PARAMETER;
+
+if (surface->templat.width != memory_attibute->width ||
+surface->templat.height != memory_attibute->height ||
+memory_attibute->num_planes < 1)
+return VA_STATUS_ERROR_INVALID_PARAMETER;
+
+switch (memory_attibute->pixel_format) {
+case VA_FOURCC_RGBA:
+case VA_FOURCC_RGBX:
+case VA_FOURCC_BGRA:
+case VA_FOURCC_BGRX:
+if (memory_attibute->num_planes != 1)
+return VA_STATUS_ERROR_INVALID_PARAMETER;
+break;
+default:
+return VA_STATUS_ERROR_INVALID_PARAMETER;
+}
+
+memset(_templ, 0, sizeof(res_templ));
+res_templ.target = PIPE_TEXTURE_2D;
+res_templ.last_level = 0;
+res_templ.depth0 = 1;
+res_templ.array_size = 1;
+res_templ.width0 = memory_attibute->width;
+res_templ.height0 = memory_attibute->height;
+res_templ.format = surface->templat.buffer_format;
+res_templ.bind = PIPE_BIND_SAMPLER_VIEW;
+res_templ.usage = PIPE_USAGE_DEFAULT;
+
+memset(, 0, sizeof(struct winsys_handle));
+whandle.type = DRM_API_HANDLE_TYPE_FD;
+whandle.handle = memory_attibute->buffers[index];
+whandle.stride = memory_attibute->pitches[index];
+
+resource = pscreen->resource_from_handle(pscreen, _templ, );
+
+if (!resource)
+   return VA_STATUS_ERROR_ALLOCATION_FAILED;
+
+memset(resources, 0, sizeof resources);
+resources[0] = resource;
+
+surface->buffer = vl_video_buffer_create_ex2(drv->pipe, templat, 
resources);
+if (!surface->buffer)
+return VA_STATUS_ERROR_ALLOCATION_FAILED;
+
+util_dynarray_init(>subpics);
+surfaces[index] = handle_table_add(drv->htab, surface);
+
+if (!surfaces[index])
+  return VA_STATUS_ERROR_ALLOCATION_FAILED;
+
+return VA_STATUS_SUCCESS;
+}
+
 VAStatus
 vlVaCreateSurfaces2(VADriverContextP ctx, unsigned int format,
 unsigned int width, unsigned int height,
@@ -415,6 +497,7 @@ vlVaCreateSurfaces2(VADriverContextP ctx, unsigned int 
format,
 int i;
 int memory_type;
 int expected_fourcc;
+VAStatus vaStatus;
 
 if (!ctx)
return VA_STATUS_ERROR_INVALID_CONTEXT;
@@ -453,6 +536,7 @@ vlVaCreateSurfaces2(VADriverContextP ctx, unsigned int 
format,
 
 switch (attrib_list[i].value.value.i) {
 case VA_SURFACE_ATTRIB_MEM_TYPE_VA:
+case VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME:
memory_type = attrib_list[i].value.value.i;
break;
 default:
@@ -481,6 +565,12 @@ vlVaCreateSurfaces2(VADriverContextP ctx, unsigned int 
format,
 if (memory_attibute)

[Mesa-dev] [PATCH v2 2/7] st/va: properly defines VAImageFormat formats and improve VaCreateImage

2015-10-20 Thread Julien Isorce
Also add RGBA, RGBX and BGRX.
Also extend ChromaToPipe and implement PipeToYCbCr.

Note that gstreamer-vaapi check all the VAImageFormat fields.

Signed-off-by: Julien Isorce 
---
 src/gallium/state_trackers/va/image.c  | 18 +++---
 src/gallium/state_trackers/va/va_private.h | 38 +-
 2 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/src/gallium/state_trackers/va/image.c 
b/src/gallium/state_trackers/va/image.c
index b37a971..84d94c8 100644
--- a/src/gallium/state_trackers/va/image.c
+++ b/src/gallium/state_trackers/va/image.c
@@ -37,14 +37,21 @@
 
 #include "va_private.h"
 
-static const VAImageFormat formats[VL_VA_MAX_IMAGE_FORMATS] =
+static const VAImageFormat formats[] =
 {
{VA_FOURCC('N','V','1','2')},
{VA_FOURCC('I','4','2','0')},
{VA_FOURCC('Y','V','1','2')},
{VA_FOURCC('Y','U','Y','V')},
{VA_FOURCC('U','Y','V','Y')},
-   {VA_FOURCC('B','G','R','A')}
+   {.fourcc = VA_FOURCC('B','G','R','A'), .byte_order = VA_LSB_FIRST, 32, 32,
+0x00ff, 0xff00, 0x00ff, 0xff00},
+   {.fourcc = VA_FOURCC('R','G','B','A'), .byte_order = VA_LSB_FIRST, 32, 32,
+0x00ff, 0xff00, 0x00ff, 0xff00},
+   {.fourcc = VA_FOURCC('B','G','R','X'), .byte_order = VA_LSB_FIRST, 32, 24,
+0x00ff, 0xff00, 0x00ff, 0x},
+   {.fourcc = VA_FOURCC('R','G','B','X'), .byte_order = VA_LSB_FIRST, 32, 24,
+0x00ff, 0xff00, 0x00ff, 0x}
 };
 
 static void
@@ -72,6 +79,8 @@ vlVaQueryImageFormats(VADriverContextP ctx, VAImageFormat 
*format_list, int *num
enum pipe_format format;
int i;
 
+   STATIC_ASSERT(ARRAY_SIZE(formats) == VL_VA_MAX_IMAGE_FORMATS);
+
if (!ctx)
   return VA_STATUS_ERROR_INVALID_CONTEXT;
 
@@ -80,7 +89,7 @@ vlVaQueryImageFormats(VADriverContextP ctx, VAImageFormat 
*format_list, int *num
 
*num_formats = 0;
pscreen = VL_VA_PSCREEN(ctx);
-   for (i = 0; i < VL_VA_MAX_IMAGE_FORMATS; ++i) {
+   for (i = 0; i < ARRAY_SIZE(formats); ++i) {
   format = YCbCrToPipe(formats[i].fourcc);
   if (pscreen->is_video_format_supported(pscreen, format,
   PIPE_VIDEO_PROFILE_UNKNOWN,
@@ -149,6 +158,9 @@ vlVaCreateImage(VADriverContextP ctx, VAImageFormat 
*format, int width, int heig
   break;
 
case VA_FOURCC('B','G','R','A'):
+   case VA_FOURCC('R','G','B','A'):
+   case VA_FOURCC('B','G','R','X'):
+   case VA_FOURCC('R','G','B','X'):
   img->num_planes = 1;
   img->pitches[0] = w * 4;
   img->offsets[0] = 0;
diff --git a/src/gallium/state_trackers/va/va_private.h 
b/src/gallium/state_trackers/va/va_private.h
index 1ea7be7..303b0c6 100644
--- a/src/gallium/state_trackers/va/va_private.h
+++ b/src/gallium/state_trackers/va/va_private.h
@@ -46,12 +46,14 @@
 #define VL_VA_DRIVER(ctx) ((vlVaDriver *)ctx->pDriverData)
 #define VL_VA_PSCREEN(ctx) (VL_VA_DRIVER(ctx)->vscreen->pscreen)
 
-#define VL_VA_MAX_IMAGE_FORMATS 6
+#define VL_VA_MAX_IMAGE_FORMATS 9
 
 static inline enum pipe_video_chroma_format
 ChromaToPipe(int format)
 {
switch (format) {
+   case VA_RT_FORMAT_YUV400:
+  return PIPE_VIDEO_CHROMA_FORMAT_400;
case VA_RT_FORMAT_YUV420:
   return PIPE_VIDEO_CHROMA_FORMAT_420;
case VA_RT_FORMAT_YUV422:
@@ -80,12 +82,46 @@ YCbCrToPipe(unsigned format)
   return PIPE_FORMAT_UYVY;
case VA_FOURCC('B','G','R','A'):
   return PIPE_FORMAT_B8G8R8A8_UNORM;
+   case VA_FOURCC('R','G','B','A'):
+  return PIPE_FORMAT_R8G8B8A8_UNORM;
+   case VA_FOURCC('B','G','R','X'):
+  return PIPE_FORMAT_B8G8R8X8_UNORM;
+   case VA_FOURCC('R','G','B','X'):
+  return PIPE_FORMAT_R8G8B8X8_UNORM;
default:
   assert(0);
   return PIPE_FORMAT_NONE;
}
 }
 
+static inline unsigned
+PipeToYCbCr(enum pipe_format p_format)
+{
+   switch (p_format) {
+   case PIPE_FORMAT_NV12:
+  return VA_FOURCC('N','V','1','2');
+   case PIPE_FORMAT_IYUV:
+  return VA_FOURCC('I','4','2','0');
+   case PIPE_FORMAT_YV12:
+  return VA_FOURCC('Y','V','1','2');
+   case PIPE_FORMAT_UYVY:
+  return VA_FOURCC('U','Y','V','Y');
+   case PIPE_FORMAT_YUYV:
+  return VA_FOURCC('Y','U','Y','V');
+   case PIPE_FORMAT_B8G8R8A8_UNORM:
+  return VA_FOURCC('B','G','R','A');
+   case PIPE_FORMAT_R8G8B8A8_UNORM:
+  return VA_FOURCC('R','G','B','A');
+   case PIPE_FORMAT_B8G8R8X8_UNORM:
+  return VA_FOURCC('B','G','R','X');
+   case PIPE_FORMAT_R8G8B8X8_UNORM:
+  return VA_FOURCC('R','G','B','X');
+   default:
+  assert(0);
+  return -1;
+   }
+}
+
 static inline VAProfile
 PipeToProfile(enum pipe_video_profile profile)
 {
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 3/7] st/va: do not destroy old buffer when new one failed

2015-10-20 Thread Julien Isorce
If formats are not the same vlVaPutImage re-creates the video
buffer with the right format. But if the creation of this new
video buffer fails then the surface looses its current buffer.
Let's just destroy the previous buffer on success.

Signed-off-by: Julien Isorce 
---
 src/gallium/state_trackers/va/image.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/gallium/state_trackers/va/image.c 
b/src/gallium/state_trackers/va/image.c
index 84d94c8..8e64673 100644
--- a/src/gallium/state_trackers/va/image.c
+++ b/src/gallium/state_trackers/va/image.c
@@ -346,13 +346,20 @@ vlVaPutImage(VADriverContextP ctx, VASurfaceID surface, 
VAImageID image,
if (format == PIPE_FORMAT_NONE)
   return VA_STATUS_ERROR_OPERATION_FAILED;
 
-   if (surf->buffer == NULL || format != surf->buffer->buffer_format) {
-  if (surf->buffer)
- surf->buffer->destroy(surf->buffer);
+   if (format != surf->buffer->buffer_format) {
+  struct pipe_video_buffer *tmp_buf;
+  enum pipe_format old_surf_format = surf->templat.buffer_format;
+
   surf->templat.buffer_format = format;
-  surf->buffer = drv->pipe->create_video_buffer(drv->pipe, >templat);
-  if (!surf->buffer)
- return VA_STATUS_ERROR_ALLOCATION_FAILED;
+  tmp_buf = drv->pipe->create_video_buffer(drv->pipe, >templat);
+
+  if (!tmp_buf) {
+  surf->templat.buffer_format = old_surf_format;
+  return VA_STATUS_ERROR_ALLOCATION_FAILED;
+  }
+
+  surf->buffer->destroy(surf->buffer);
+  surf->buffer = tmp_buf;
}
 
views = surf->buffer->get_sampler_view_planes(surf->buffer);
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92552] [softpipe] piglit egl-create-context-valid-flag-forward-compatible-gl regression

2015-10-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92552

--- Comment #2 from Matthew Waters  ---
Created attachment 119006
  --> https://bugs.freedesktop.org/attachment.cgi?id=119006=edit
egl: distnguish between unsupported api vs capabilities for EGL_CONTEXT_FLAGS

This fixes that issue for me.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 15/21] mesa: Fix EXT_texture_sRGB functionality leaks

2015-10-20 Thread Marek Olšák
On Tue, Oct 20, 2015 at 12:44 AM, Nanley Chery  wrote:
> From: Nanley Chery 
>
> Stop leaks into the following contexts:
>* GLES in _mesa_base_tex_format() and lookup_view_class().
>* Pre-1.1 GL legacy contexts in all uses.

Fixing OpenGL 1.0? Seriously? Come on. :)

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 03/21] mesa/extensions: Wrap array entries in macros

2015-10-20 Thread Nanley Chery
On Tue, Oct 20, 2015 at 8:14 AM, Marek Olšák  wrote:

> On Tue, Oct 20, 2015 at 4:31 PM, Erik Faye-Lund 
> wrote:
> > On Tue, Oct 20, 2015 at 12:36 AM, Nanley Chery 
> wrote:
> >> From: Nanley Chery 
> >>
> >> -   { "GL_SUN_multi_draw_arrays",   o(dummy_true),
> GLL,1999 },
> >> +#define EXT(name_str, driver_cap, api_flags, ) \
> >> +{ .name = "GL_" #name_str, .offset = o(driver_cap), .api_set =
> api_flags, .year = },
> >
> > I suspect that using designated initializers will make the MSVC build
> unhappy.
>
> Yes, this syntax isn't allowed in core Mesa. Only drivers can use it.
>

I meant to look into this, but forgot. Thanks for catching this.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 03/21] mesa/extensions: Wrap array entries in macros

2015-10-20 Thread Nanley Chery
On Tue, Oct 20, 2015 at 8:16 AM, Marek Olšák  wrote:

> Also, the FIXME comment should be on its own line.
>
>
I moved it aside to make editing the table easier. However, since the
formatting of the
table is unlikely to change much after this series, I agree that I should
move it back to
its original position.

Thanks,
Nanley

Marek
>
> On Tue, Oct 20, 2015 at 5:14 PM, Marek Olšák  wrote:
> > On Tue, Oct 20, 2015 at 4:31 PM, Erik Faye-Lund 
> wrote:
> >> On Tue, Oct 20, 2015 at 12:36 AM, Nanley Chery 
> wrote:
> >>> From: Nanley Chery 
> >>>
> >>> -   { "GL_SUN_multi_draw_arrays",   o(dummy_true),
>   GLL,1999 },
> >>> +#define EXT(name_str, driver_cap, api_flags, ) \
> >>> +{ .name = "GL_" #name_str, .offset = o(driver_cap), .api_set
> = api_flags, .year = },
> >>
> >> I suspect that using designated initializers will make the MSVC build
> unhappy.
> >
> > Yes, this syntax isn't allowed in core Mesa. Only drivers can use it.
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 15/21] mesa: Fix EXT_texture_sRGB functionality leaks

2015-10-20 Thread Nanley Chery
On Tue, Oct 20, 2015 at 8:32 AM, Marek Olšák  wrote:

> On Tue, Oct 20, 2015 at 12:44 AM, Nanley Chery 
> wrote:
> > From: Nanley Chery 
> >
> > Stop leaks into the following contexts:
> >* GLES in _mesa_base_tex_format() and lookup_view_class().
> >* Pre-1.1 GL legacy contexts in all uses.
>
> Fixing OpenGL 1.0? Seriously? Come on. :)
>
>
:) I was just notified (in the first patch's thread) that 1.2 is the
minimum version advertised in Mesa.
I'll have to update the comments in other such patches which mention pre
1.2 versions of GL.


> Marek
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/7] st/va: add headless support, i.e. VA_DISPLAY_DRM

2015-10-20 Thread Emil Velikov
On 20 October 2015 at 17:06, Julien Isorce  wrote:
>
>
> On 19 October 2015 at 17:16, Emil Velikov  wrote:
>>
>> On 17 October 2015 at 00:14, Julien Isorce 
>> wrote:
>> > This patch allows to use gallium vaapi without requiring
>> > a X server running for your second graphic card.
>> >
>> I've sent a lengthy series which should mitigate the need of some hunks.
>
>
> Ok I'll wait for your patches to land before going further on this patch.
> Should I expect vl_winsy_drm.c in your patches ? Not sure do understood that
> part. Actually I though about having "vl_screen_create_drm" and renames
> vl_screen_create to vl_screen_create_x11 (because it takes XDisplay in
> params) but then I got confused because vl_winsys.h includes Xlib.h. Should
> future vl_screen_create_drm be in another header, vl_drm.h ?
>
My series flattens the if GALLIUM_STATIC_TARGETS spaghetti. Although
it's more of a FYI rather than "wait until they land".

On the winsys_dri vs winsys_drm side - I'm not planning to do any work
there, neither I did notice the Xlib.h dependency in vl_winsys.h.

What I'm pondering is about having a 'proper' drm backend, although
admittedly I haven't looked exactly what libva{-intel-driver,}'s
definition of that is. I'd assume that moving the non-winsys specifics
(from vl_winsys_dri.c) to vl_winsys.h and adding a
vl_screen_texture_from_drawable() equivalent for drm (amongst others).
As you can tell much of this is guesswork, so if you don't have the
time and others are happy with the approach as is, feel free to
ignore.

Cheers,
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 1/7] nvc0: fix crash when nv50_miptree_from_handle fails

2015-10-20 Thread Julien Isorce
Signed-off-by: Julien Isorce 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_resource.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c
index 12b5a02..15c803c 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c
@@ -26,7 +26,8 @@ nvc0_resource_from_handle(struct pipe_screen * screen,
} else {
   struct pipe_resource *res = nv50_miptree_from_handle(screen,
templ, whandle);
-  nv04_resource(res)->vtbl = _miptree_vtbl;
+  if (res)
+ nv04_resource(res)->vtbl = _miptree_vtbl;
   return res;
}
 }
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/7] i965/fs: move the fs_reg::smear() from get_timestamp() to the callers

2015-10-20 Thread Emil Velikov
We're about to reuse get_timestamp() for the nir_intrinsic_shader_clock.
In the latter the generalisation does not apply, so move the smear()
where needed. This also makes the function analogous to the vec4 one.

v2: Tweak the comment - The caller -> We (Matt, Connor).

Signed-off-by: Emil Velikov 
---

Connor,

The diff might be a bit hard to read, but the patch does remove the 
comment from get_timestamp(), If you guys prefer I can also drop it from 
shader_time_begin() and(?) delay the smear until emit_shader_time_end().

-Emil

 src/mesa/drivers/dri/i965/brw_fs.cpp | 32 ++--
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index a2fd441..93bb55d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -521,7 +521,15 @@ fs_visitor::get_timestamp(const fs_builder )
 */
bld.group(4, 0).exec_all().MOV(dst, ts);
 
-   /* The caller wants the low 32 bits of the timestamp.  Since it's running
+   return dst;
+}
+
+void
+fs_visitor::emit_shader_time_begin()
+{
+   shader_start_time = get_timestamp(bld.annotate("shader time start"));
+
+   /* We want only the low 32 bits of the timestamp.  Since it's running
 * at the GPU clock rate of ~1.2ghz, it will roll over every ~3 seconds,
 * which is plenty of time for our purposes.  It is identical across the
 * EUs, but since it's tracking GPU core speed it will increment at a
@@ -531,15 +539,7 @@ fs_visitor::get_timestamp(const fs_builder )
 * else that might disrupt timing) by setting smear to 2 and checking if
 * that field is != 0.
 */
-   dst.set_smear(0);
-
-   return dst;
-}
-
-void
-fs_visitor::emit_shader_time_begin()
-{
-   shader_start_time = get_timestamp(bld.annotate("shader time start"));
+   shader_start_time.set_smear(0);
 }
 
 void
@@ -553,6 +553,18 @@ fs_visitor::emit_shader_time_end()
 
fs_reg shader_end_time = get_timestamp(ibld);
 
+   /* We want only the low 32 bits of the timestamp.  Since it's running
+* at the GPU clock rate of ~1.2ghz, it will roll over every ~3 seconds,
+* which is plenty of time for our purposes.  It is identical across the
+* EUs, but since it's tracking GPU core speed it will increment at a
+* varying rate as render P-states change.
+*
+* The caller could also check if render P-states have changed (or anything
+* else that might disrupt timing) by setting smear to 2 and checking if
+* that field is != 0.
+*/
+   shader_end_time.set_smear(0);
+
/* Check that there weren't any timestamp reset events (assuming these
 * were the only two timestamp reads that happened).
 */
-- 
2.6.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 18/21] mesa/extensions: Remove extra memsets on gl_extensions

2015-10-20 Thread Nanley Chery
On Tue, Oct 20, 2015 at 8:49 AM, Marek Olšák  wrote:

> On Tue, Oct 20, 2015 at 12:44 AM, Nanley Chery 
> wrote:
> > From: Nanley Chery 
> >
> > Aside from those modified in this commit, all gl_extensions structs are
> > zero-initialized by default. There is therefore no need to memset the
> > structs to 0. Also, remove the open-coded memset in
> > _mesa_init_extensions().
> >
> > Signed-off-by: Nanley Chery 
> > ---
> >  src/mesa/main/extensions.c | 18 --
> >  1 file changed, 4 insertions(+), 14 deletions(-)
> >
> > diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
> > index 365e7ed..6cd2b27 100644
> > --- a/src/mesa/main/extensions.c
> > +++ b/src/mesa/main/extensions.c
> > @@ -37,8 +37,8 @@
> >  #include "macros.h"
> >  #include "mtypes.h"
> >
> > -struct gl_extensions _mesa_extension_override_enables;
> > -struct gl_extensions _mesa_extension_override_disables;
> > +struct gl_extensions _mesa_extension_override_enables = {0};
> > +struct gl_extensions _mesa_extension_override_disables = {0};
>
> This looks good. Note that global variables are always
> zero-initialized, thus the initializer is useless there. This hunk
> could be dropped, but it's not a big deal.
>
>
Thanks for the feedback. My git comment incorrectly says that
those two variables are not zero-initialized. So I'll have to drop
this hunk and update the comment.

Nanley

> Marek
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 1/7] nvc0: fix crash when nv50_miptree_from_handle fails

2015-10-20 Thread samuel.pitoiset
Is there a particular situation where nv50_miptree_from_handle() fails? 
And did you check nv50?


Anyway, this patch is:
Reviewed-by: Samuel Pitoiset 

On 20/10/2015 18:34, Julien Isorce wrote:

Signed-off-by: Julien Isorce 
---
  src/gallium/drivers/nouveau/nvc0/nvc0_resource.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c
index 12b5a02..15c803c 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_resource.c
@@ -26,7 +26,8 @@ nvc0_resource_from_handle(struct pipe_screen * screen,
 } else {
struct pipe_resource *res = nv50_miptree_from_handle(screen,
 templ, whandle);
-  nv04_resource(res)->vtbl = _miptree_vtbl;
+  if (res)
+ nv04_resource(res)->vtbl = _miptree_vtbl;
return res;
 }
  }


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Introducing OpenSWR: High performance software rasterizer

2015-10-20 Thread Rowley, Timothy O
Hi.  I'd like to introduce the Mesa3D community to a software project
that we hope to upstream.  We're a small team at Intel working on
software defined visualization (http://sdvis.org/), and have
opensource projects in both the raytracing (Embree, OSPRay) and
rasterization (OpenSWR) realms.

We're a different Intel team from that of i965 fame, with a different
type of customer and workloads.  Our customers have large clusters of
compute nodes that for various reasons do not have GPUs, and are
working with extremely large geometry models.

We've been working on a high performance, highly scalable rasterizer
and driver to interface with Mesa3D.  Our rasterizer functions as a
"software gpu", relying on the mature well-supported Mesa3D to provide
API and state tracking layers.

We would like to contribute this code to Mesa3D and continue doing
active development in your source repository.  We welcome discussion
about how this will happen and questions about the project itself.
Below are some answers to what we think might be frequently asked
questions.

Bruce and I will be the public contacts for this project, but this
project isn't solely our work - there's a dedicated group of people
working on the core SWR code.

  Tim Rowley
  Bruce Cherniak

  Intel Corporation

Why another software rasterizer?


Good question, given there are already three (swrast, softpipe,
llvmpipe) in the Mesa3D tree. Two important reasons for this:

 * Architecture - given our focus on scientific visualization, our
   workloads are much different than the typical game; we have heavy
   vertex load and relatively simple shaders.  In addition, the core
   counts of machines we run on are much higher.  These parameters led
   to design decisions much different than llvmpipe.

 * Historical - Intel had developed a high performance software
   graphics stack for internal purposes.  Later we adapted this
   graphics stack for use in visualization and decided to move forward
   with Mesa3D to provide a high quality API layer while at the same
   time benefiting from the excellent performance the software
   rasterizerizer gives us.

What's the architecture?


SWR is a tile based immediate mode renderer with a sort-free threading
model which is arranged as a ring of queues.  Each entry in the ring
represents a draw context that contains all of the draw state and work
queues.  An API thread sets up each draw context and worker threads
will execute both the frontend (vertex/geometry processing) and
backend (fragment) work as required.  The ring allows for backend
threads to pull work in order.  Large draws are split into chunks to
allow vertex processing to happen in parallel, with the backend work
pickup preserving draw ordering.

Our pipeline uses just-in-time compiled code for the fetch shader that
does vertex attribute gathering and AOS to SOA conversions, the vertex
shader and fragment shaders, streamout, and fragment blending. SWR
core also supports geometry and compute shaders but we haven't exposed
them through our driver yet. The fetch shader, streamout, and blend is
built internally to swr core using LLVM directly, while for the vertex
and pixel shaders we reuse bits of llvmpipe from
gallium/auxiliary/gallivm to build the kernels, which we wrap
differently than llvmpipe's auxiliary/draw code.

What's the performance?
---

For the types of high-geometry workloads we're interested in, we are
significantly faster than llvmpipe.  This is to be expected, as
llvmpipe only threads the fragment processing and not the geometry
frontend.

The linked slide below shows some performance numbers from a benchmark
dataset and application.  On a 36 total core dual E5-2699v3 we see
performance 29x to 51x that of llvmpipe.  

http://openswr.org/slides/SWR_Sept15.pdf

While our current performance is quite good, we know there is more
potential in this architecture.  When we switched from a prototype
OpenGL driver to Mesa we regressed performance severely, some due to
interface issues that need tuning, some differences in shader code
generation, and some due to conformance and feature additions to the
core swr.  We are looking to recovering most of this performance back.

What's the conformance?
---

The major applications we are targeting are all based on the
Visualization Toolkit (VTK), and as such our development efforts have
been focused on making sure these work as best as possible.  Our
current code passes vtk's rendering tests with their new "OpenGL2"
(really OpenGL 3.2) backend at 99%.

piglit testing shows a much lower pass rate, roughly 80% at the time
of writing.  Core SWR undergoes rigorous unit testing and we are quite
confident in the rasterizer, and understand the areas where it
currently has issues (example: line rendering is done with triangles,
so doesn't match the strict line rendering rules).  The majority of
the piglit failures 

Re: [Mesa-dev] [RFCv0 0/8] gallium: add support for NIR as alternate IR

2015-10-20 Thread Rob Clark
On Tue, Oct 20, 2015 at 11:21 AM, Marek Olšák  wrote:
> On Tue, Oct 20, 2015 at 5:17 PM, Ilia Mirkin  wrote:
>> On Tue, Oct 20, 2015 at 9:10 AM, Marek Olšák  wrote:
>>> On Tue, Oct 20, 2015 at 2:37 PM, Rob Clark  wrote:
 On Tue, Oct 20, 2015 at 6:49 AM, Marek Olšák  wrote:
> On Tue, Oct 20, 2015 at 3:51 AM, Rob Clark  wrote:
>> On Mon, Oct 19, 2015 at 8:53 PM, Marek Olšák  wrote:
>>> Yes, drivers need to "dup" tokens if they want to keep them. Their
>>> lifetime is the create function and that's it. st/mesa only keeps the
>>> tokens for vertex shaders, because it needs for emulating some
>>> features with the Draw module.
>>
>> It is something that I think would be worth revisiting, since the
>> common case is that drivers have to dup the tokens to do their own
>> variant mgmt.. so that seems like the better case to optimize for.  I
>> think the only one that isn't doing it's own variant mgmt is nouveau,
>> but Ilia mentioned at some point that there where some edge cases
>> where they really do need a variant.
>
> This would add more cruft into state trackers and modules. Doing "dup"
> in the driver is the easiest way to handle this.
>
> Also, drivers should be able to release any incoming IR if they don't
> need it. Including NIR. Having a requirement that some IR must be in
> memory for the lifetime of shaders is just overall a bad idea.

 If you need to create variants, you need to hang on to the IR until
 all possible variants are created (ie. forever)
>>>
>>> Not true. You can convert to another IR, run lowering passes, and keep
>>> the result of that.
>>
>> On nvc0, I'm just patching up the binary code directly. I may or may
>> not be able to get away with doing that on nv50... my latest thinking
>> is that I can, but haven't implemented yet. Which will mean no
>> variants for nouveau *and* fully working logic for flat shading :)
>
> Yes, patching shader bytecode is the best way to deal with variants.
> I'd strongly recommend all drivers to do that, but it's not easy if
> the modifications are non-trivial.
>


and not easy if your isa requires instruction scheduling :-(

BR,
-R
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


  1   2   >