date:20150104

Re: [Mesa-dev] [RFC PATCH 28/40] i965/fs: Append uniform variables to the gather table

2015-01-04 Thread Kenneth Graunke

On Sunday, January 04, 2015 04:04:42 PM Abdiel Janulgue wrote:
> Signed-off-by: Abdiel Janulgue 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index bd9345e..2f592c9 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -180,6 +180,7 @@ fs_visitor::visit(ir_variable *ir)
>reg = new(this->mem_ctx) fs_reg(UNIFORM, param_index);
>reg->type = brw_type_for_base_type(ir->type);
>  
> +  stage_prog_data->gather_table[stage_prog_data->nr_gather_table++].reg 
> = reg->reg;
> } else if (ir->data.mode == ir_var_system_value) {
>switch (ir->data.location) {
>case SYSTEM_VALUE_BASE_VERTEX:
> 

For v2...please squash the FS and Vec4 patches which do the same thing into a
single logical patch.

It's hard to review 40 patches where many of them only add one line - it
becomes difficult to see the big picture or where things are going.

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [RFC PATCH 23/40] i965/fs: Associate the uniform location for the fragment shader

2015-01-04 Thread Kenneth Graunke

On Sunday, January 04, 2015 04:04:37 PM Abdiel Janulgue wrote:
> Signed-off-by: Abdiel Janulgue 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 3639ed2..0f2c2c4 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -1144,6 +1144,7 @@ fs_visitor::setup_uniform_values(ir_variable *ir)
>   continue;
>}
>  
> +  brw->uniformstagemap[u] |= _NEW_FRAGMENT_CONSTANTS;
>unsigned slots = storage->type->component_slots();
>if (storage->array_elements)
>   slots *= storage->array_elements;
> 

You can't just put a giant array in brw_context and index it by uniform
numbers like this.  Every shader program has its own set of uniforms.

Let's say I use two shader programs:

Program A is:

   [vertex shader]
   uniform mat4 mvp;
   void main() { gl_Position = mvp * gl_Vertex; }

   [fragment shader]
   void main() { gl_FragColor = vec4(1.0); }

Program B is:

   [vertex shader]
   void main() { gl_Position = gl_Vertex; }

   [fragment shader]
   uniform vec4 color;
   void main() { gl_FragColor = color; }

In program A, "mvp" will be uniform 0.  In program B, "color" will be
uniform 0.  Your single global map will contain:

brw->uniformstagemap[0] == _NEW_VERTEX_CONSTANTS | _NEW_FRAGMENT_CONSTANTS

which is wrong - each program has distinct uniforms that are each only used
in a single stage.

I think the approach I recommended in my reply to patch 19 should solve this
without the need for the global table.
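
A minimal sketch of the collision described above, assuming hypothetical
names throughout (the SKETCH_* bits and sketch_* helpers are illustrative
stand-ins, not Mesa definitions). It records uniform 0 of programs A and B
first in one shared table, then in a table owned by program B alone:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real _NEW_* bits; values are illustrative. */
#define SKETCH_NEW_VERTEX_CONSTANTS   (UINT64_C(1) << 0)
#define SKETCH_NEW_FRAGMENT_CONSTANTS (UINT64_C(1) << 1)
#define SKETCH_MAX_UNIFORMS 8

/* A stage map indexed by uniform number.  The patch keeps a single one in
 * the context; the alternative is one per linked program.
 */
struct sketch_stage_map {
   uint64_t stages[SKETCH_MAX_UNIFORMS];
};

/* Record that uniform number u is used by the stage named by stage_bit. */
static void
sketch_mark(struct sketch_stage_map *map, unsigned u, uint64_t stage_bit)
{
   assert(u < SKETCH_MAX_UNIFORMS);
   map->stages[u] |= stage_bit;
}

/* Programs A and B share one map: returns the merged entry for uniform 0. */
static uint64_t
sketch_shared_map_entry(void)
{
   struct sketch_stage_map shared = { { 0 } };
   sketch_mark(&shared, 0, SKETCH_NEW_VERTEX_CONSTANTS);   /* A: "mvp"   */
   sketch_mark(&shared, 0, SKETCH_NEW_FRAGMENT_CONSTANTS); /* B: "color" */
   return shared.stages[0];
}

/* Program B with its own map: uniform 0 only refers to the fragment stage. */
static uint64_t
sketch_program_b_entry(void)
{
   struct sketch_stage_map prog_b = { { 0 } };
   sketch_mark(&prog_b, 0, SKETCH_NEW_FRAGMENT_CONSTANTS);
   return prog_b.stages[0];
}
```

With the shared table, updating "color" in program B would also flag the
vertex constants as dirty, even though no vertex shader reads it.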


Re: [Mesa-dev] [RFC PATCH 22/40] i965: Implement fine-grained uniform uploads

2015-01-04 Thread Kenneth Graunke

On Sunday, January 04, 2015 04:04:36 PM Abdiel Janulgue wrote:
> Determine which shader stage changed their uniforms and only upload
> uniforms which belong to it.
> 
> Signed-off-by: Abdiel Janulgue 
> ---
>  src/mesa/drivers/dri/i965/brw_context.h   | 2 ++
>  src/mesa/drivers/dri/i965/brw_program.c   | 9 +
>  src/mesa/drivers/dri/i965/gen7_vs_state.c | 4 
>  3 files changed, 15 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index f384008..6706b4a 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1357,6 +1357,8 @@ struct brw_context
>uint32_t next_offset;
> } constants;
>  
> +   uint64_t uniformstagemap[MAX_UNIFORMS];
> +
> struct {
>uint32_t state_offset;
>uint32_t blend_state_offset;
> diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
> b/src/mesa/drivers/dri/i965/brw_program.c
> index d9a3f05..c1eec8a 100644
> --- a/src/mesa/drivers/dri/i965/brw_program.c
> +++ b/src/mesa/drivers/dri/i965/brw_program.c
> @@ -179,6 +179,14 @@ brwProgramStringNotify(struct gl_context *ctx,
> return true;
>  }
>  
> +static void
> +brw_uniform_update(struct gl_context *ctx, GLint location)
> +{
> +   struct brw_context *brw = brw_context(ctx);
> +
> +   brw->state.dirty.mesa |= brw->uniformstagemap[location];
> +}
> +
>  void
>  brw_add_texrect_params(struct gl_program *prog)
>  {
> @@ -236,6 +244,7 @@ void brwInitFragProgFuncs( struct dd_function_table 
> *functions )
>  
> functions->NewShader = brw_new_shader;
> functions->LinkShader = brw_link_shader;
> +   functions->UniformUpdate = brw_uniform_update;
>  }
>  
>  void
> diff --git a/src/mesa/drivers/dri/i965/gen7_vs_state.c 
> b/src/mesa/drivers/dri/i965/gen7_vs_state.c
> index 85bd56f..269612b 100644
> --- a/src/mesa/drivers/dri/i965/gen7_vs_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_vs_state.c
> @@ -70,6 +70,10 @@ gen7_upload_constant_buffer_data(struct brw_context* brw,
>_NEW_FRAGMENT_CONSTANTS
> };
>  
> +   if (!(brw->state.dirty.brw & BRW_NEW_BATCH) &&
> +   (!prog_data->nr_params || !(brw->state.dirty.mesa & 
> const_state_stage[stage_state->stage])))
> +  return;
> +
> /* If current constant data does not fit in current constant buffer bank,
>  * move to next slot. 
>  */
> 

I'm not a huge fan of how i965-centric this is, nor of having a driver hook on
every uniform update.

I think we should just make core Mesa track whether a uniform exists in a
particular stage or not.  My attempt to implement that is here:

http://cgit.freedesktop.org/~kwg/mesa/commit/?id=e4777f6b1baa7b1e32bc376ba140a3e827fb9808

Then in ctx->DriverFlags, I created a new field:

   uint64_t NewUniforms[MESA_SHADER_STAGES];

In update_program_constants, I flagged
   ctx->NewDriverState |= ctx->DriverFlags.NewUniforms[MESA_SHADER_FRAGMENT];
and so on...

In _mesa_uniform, I added:

   for (int i = 0; i < MESA_SHADER_STAGES; i++) {
      if (uni->active[i])
         ctx->NewDriverState |= ctx->DriverFlags.NewUniforms[i];
   }

That may be incomplete, but I think you get the idea.  This is useful generic
infrastructure to have for all drivers, IMO.
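
To make the per-stage flagging concrete, here is a self-contained sketch
under stated assumptions: the sketch_* types are simplified stand-ins for
gl_context and the uniform storage, the stage enum is cut down to three
entries, and the flag values are arbitrary. Only the loop mirrors the
snippet above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stage list; real Mesa has its own enum. */
enum { SKETCH_VERTEX, SKETCH_GEOMETRY, SKETCH_FRAGMENT, SKETCH_STAGES };

/* Stand-in for per-uniform storage: which stages reference this uniform. */
struct sketch_uniform {
   bool active[SKETCH_STAGES];
};

/* Stand-in for the context: dirty bits plus the driver-chosen flag values. */
struct sketch_context {
   uint64_t NewDriverState;
   struct {
      uint64_t NewUniforms[SKETCH_STAGES];
   } DriverFlags;
};

/* Flag only the stages that actually use the uniform being updated. */
static void
sketch_uniform_changed(struct sketch_context *ctx,
                       const struct sketch_uniform *uni)
{
   for (int i = 0; i < SKETCH_STAGES; i++) {
      if (uni->active[i])
         ctx->NewDriverState |= ctx->DriverFlags.NewUniforms[i];
   }
}

/* Update a fragment-only uniform and return the resulting dirty bits. */
static uint64_t
sketch_fragment_only_update(void)
{
   struct sketch_context ctx = { 0 };
   ctx.DriverFlags.NewUniforms[SKETCH_VERTEX] = UINT64_C(1) << 10;
   ctx.DriverFlags.NewUniforms[SKETCH_FRAGMENT] = UINT64_C(1) << 11;

   struct sketch_uniform color = { { false, false, true } };
   sketch_uniform_changed(&ctx, &color);
   return ctx.NewDriverState;
}
```

In this model only the fragment bit ends up set, so a driver could skip
re-uploading vertex constants.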

--Ken


Re: [Mesa-dev] [RFC PATCH 19/40] mesa: Change internal state flag to a 64-bits

2015-01-04 Thread Kenneth Graunke

On Sunday, January 04, 2015 04:04:33 PM Abdiel Janulgue wrote:
> Existing state flag cannot publish additional values.

That's not quite true - there are actually two available bits.

However, I don't think we should be adding _NEW_WHATEVER flags.  These are
basically for Mesa internals, and we've been moving drivers away from them.

Instead, you should consider using ctx->DriverFlags.

1. Create a new field, uint64_t ctx->DriverFlags.NewVSUniforms.
2. Create a new BRW_NEW_VS_UNIFORMS flag, and in brw_init_state(),
   set ctx->DriverFlags.NewVSUniforms = BRW_NEW_VS_UNIFORMS.
3. When you want to signal the new flag, do:
   ctx->NewDriverState |= ctx->DriverFlags.NewVSUniforms.

This causes BRW_NEW_VS_UNIFORMS to get flagged at the appropriate time.
Other drivers can set the ctx->DriverFlags field to whatever flag they
wish to be signalled.
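
The three steps can be sketched as follows. This is an illustration only:
sketch_context is a cut-down stand-in for gl_context, the bit value chosen
for SKETCH_BRW_NEW_VS_UNIFORMS is arbitrary, and the sketch_* function names
are hypothetical.

```c
#include <stdint.h>

/* Step 1: a new field in the driver flags (stand-in for gl_context). */
struct sketch_context {
   uint64_t NewDriverState;
   struct {
      uint64_t NewVSUniforms;
   } DriverFlags;
};

/* Hypothetical driver-private bit; the real value would be driver-defined. */
#define SKETCH_BRW_NEW_VS_UNIFORMS (UINT64_C(1) << 5)

/* Step 2: at init time the driver picks which of its bits to associate. */
static void
sketch_driver_init(struct sketch_context *ctx, uint64_t driver_bit)
{
   ctx->NewDriverState = 0;
   ctx->DriverFlags.NewVSUniforms = driver_bit;
}

/* Step 3: core code signals the change without knowing the driver's bit. */
static void
sketch_signal_vs_uniforms(struct sketch_context *ctx)
{
   ctx->NewDriverState |= ctx->DriverFlags.NewVSUniforms;
}

/* Run steps 2 and 3 and return the resulting dirty bits. */
static uint64_t
sketch_dirty_after_signal(uint64_t driver_bit)
{
   struct sketch_context ctx;
   sketch_driver_init(&ctx, driver_bit);
   sketch_signal_vs_uniforms(&ctx);
   return ctx.NewDriverState;
}
```

A driver that leaves the field at zero simply never sees the flag, which is
what keeps the core code driver-agnostic.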

I like the idea of per-stage uniform flags.  I'd started implementing
that myself, but never got it all hooked up right.

> Signed-off-by: Abdiel Janulgue 
> ---
>  src/mesa/main/dd.h | 2 +-
>  src/mesa/main/mtypes.h | 3 ++-
>  src/mesa/main/state.c  | 6 +++---
>  3 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
> index 2f40915..8c737e0 100644
> --- a/src/mesa/main/dd.h
> +++ b/src/mesa/main/dd.h
> @@ -91,7 +91,7 @@ struct dd_function_table {
>  * This is in addition to any state change callbacks Mesa may already have
>  * made.
>  */
> -   void (*UpdateState)( struct gl_context *ctx, GLbitfield new_state );
> +   void (*UpdateState)( struct gl_context *ctx, GLbitfield64 new_state );
>  
> /**
>  * Resize the given framebuffer to the given size.
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index b95dfb9..12ab3e8 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -3932,6 +3932,7 @@ struct gl_matrix_stack
>  #define _NEW_FRAG_CLAMP(1 << 29)
>  /* gap, re-use for core Mesa state only; use ctx->DriverFlags otherwise */
>  #define _NEW_VARYING_VP_INPUTS (1 << 31) /**< gl_context::varying_vp_inputs 
> */
> +
>  #define _NEW_ALL ~0
>  /*@}*/
>  
> @@ -4399,7 +4400,7 @@ struct gl_context
> struct gl_debug_state *Debug;
>  
> GLenum RenderMode;/**< either GL_RENDER, GL_SELECT, GL_FEEDBACK */
> -   GLbitfield NewState;  /**< bitwise-or of _NEW_* flags */
> +   GLbitfield64 NewState;  /**< bitwise-or of _NEW_* flags */
> uint64_t NewDriverState;  /**< bitwise-or of flags from DriverFlags */
>  
> struct gl_driver_flags DriverFlags;
> diff --git a/src/mesa/main/state.c b/src/mesa/main/state.c
> index 45bce78..ccf60de 100644
> --- a/src/mesa/main/state.c
> +++ b/src/mesa/main/state.c
> @@ -349,9 +349,9 @@ update_twoside(struct gl_context *ctx)
>  void
>  _mesa_update_state_locked( struct gl_context *ctx )
>  {
> -   GLbitfield new_state = ctx->NewState;
> -   GLbitfield prog_flags = _NEW_PROGRAM;
> -   GLbitfield new_prog_state = 0x0;
> +   GLbitfield64 new_state = ctx->NewState;
> +   GLbitfield64 prog_flags = _NEW_PROGRAM;
> +   GLbitfield64 new_prog_state = 0x0;
>  
> if (new_state == _NEW_CURRENT_ATTRIB) 
>goto out;


Re: [Mesa-dev] [PATCH 101/133] nir: Add gpu_shader5 interpolation intrinsics

2015-01-04 Thread Connor Abbott

On Tue, Dec 16, 2014 at 1:12 AM, Jason Ekstrand 
wrote:

> ---
>  src/glsl/nir/nir_intrinsics.h | 32 +++-
>  src/glsl/nir/nir_lower_io.c   | 16 ++--
>  2 files changed, 21 insertions(+), 27 deletions(-)
>
> diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
> index 75bd12f..e66273d 100644
> --- a/src/glsl/nir/nir_intrinsics.h
> +++ b/src/glsl/nir/nir_intrinsics.h
> @@ -47,6 +47,21 @@ INTRINSIC(store_var, 1, ARR(0), false, 0, 1, 0, 0)
>  INTRINSIC(copy_var, 0, ARR(), false, 0, 2, 0, 0)
>
>  /*
> + * Interpolation of input.  The interp_var_at* intrinsics are similar to
> the
> + * load_var intrinsic acting an a shader input except that they
> interpolate
> + * the input differently.  The at_sample and at_offset intrinsics take an
> + * aditional source that is a integer sample id or a vec2 position offset
> + * respectively.
> + */
> +
> +INTRINSIC(interp_var_at_centroid, 0, ARR(0), true, 0, 1, 0,
> +  NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)
> +INTRINSIC(interp_var_at_sample, 1, ARR(1), true, 0, 1, 0,
> +  NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)
> +INTRINSIC(interp_var_at_offset, 1, ARR(2), true, 0, 1, 0,
> +  NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)
> +
> +/*
>   * a barrier is an intrinsic with no inputs/outputs but which can't be
> moved
>   * around/optimized in general
>   */
> @@ -110,23 +125,6 @@ LOAD(input, 2, NIR_INTRINSIC_CAN_REORDER)
>  /* LOAD(ssbo, 2, 0) */
>
>  /*
> - * Interpolation of input.  These are similar to the load_input*
> intrinsics
> - * except they interpolate differently.  The interp_at_offset* and
> - * interp_at_offset* intrinsics take a second source that is either a
> - * sample id or a vec2 position offset.
> - */
> -
> -#define INTERP(name, num_srcs, src_comps) \
> -   INTRINSIC(interp_##name, num_srcs, ARR(src_comps), true, \
> - 0, 0, 2, NIR_INTRINSIC_CAN_ELIMINATE |
> NIR_INTRINSIC_CAN_REORDER) \
> -   INTRINSIC(interp_##name##_indirect, 1 + num_srcs, ARR(1, src_comps),
> true, \
> - 0, 0, 2, NIR_INTRINSIC_CAN_ELIMINATE |
> NIR_INTRINSIC_CAN_REORDER)
> -
> -INTERP(at_centroid, 0, 0)
> -INTERP(at_sample, 1, 1)
> -INTERP(at_offset, 1, 1)
> -
> -/*
>   * Stores work the same way as loads, except now the first register input
> is
>   * the value or array to store and the optional second input is the
> indirect
>   * offset.
> diff --git a/src/glsl/nir/nir_lower_io.c b/src/glsl/nir/nir_lower_io.c
> index ed3ce81..1ab0400 100644
> --- a/src/glsl/nir/nir_lower_io.c
> +++ b/src/glsl/nir/nir_lower_io.c
>

The changes here are unrelated, so they should get separated out and
probably squashed into the vectorizing intrinsics commit.


> @@ -205,25 +205,21 @@ nir_lower_io_block(nir_block *block, void
> *void_state)
>
>   bool has_indirect = deref_has_indirect(intrin->variables[0]);
>
> + /* Figure out the opcode */
>   nir_intrinsic_op load_op;
>   switch (mode) {
>   case nir_var_shader_in:
> -if (has_indirect) {
> -   load_op = nir_intrinsic_load_input_indirect;
> -} else {
> -   load_op = nir_intrinsic_load_input;
> -}
> +load_op = has_indirect ? nir_intrinsic_load_input_indirect :
> + nir_intrinsic_load_input;
>  break;
>   case nir_var_uniform:
> -if (has_indirect) {
> -   load_op = nir_intrinsic_load_uniform_indirect;
> -} else {
> -   load_op = nir_intrinsic_load_uniform;
> -}
> +load_op = has_indirect ? nir_intrinsic_load_uniform_indirect :
> + nir_intrinsic_load_uniform;
>  break;
>   default:
>  unreachable("Unknown variable mode");
>   }
> +
>   nir_intrinsic_instr *load =
> nir_intrinsic_instr_create(state->mem_ctx,
>  load_op);
>   load->num_components = intrin->num_components;
> --
> 2.2.0
>

Re: [Mesa-dev] [PATCH 105/133] i965/fs_nir: Implement the ARB_gpu_shader5 interpolation intrinsics

2015-01-04 Thread Connor Abbott

This is a general question for the interpolation support:

Why are we using the variable-based intrinsics directly, instead of
lowering them to something index-based in the lower_io pass just like we do
for normal inputs?

On Tue, Dec 16, 2014 at 1:12 AM, Jason Ekstrand 
wrote:

> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 102
> +++
>  1 file changed, 102 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 46d855d..67714ec 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -1410,6 +1410,108 @@ fs_visitor::nir_emit_intrinsic(nir_intrinsic_instr
> *instr)
>break;
> }
>
> +   case nir_intrinsic_interp_var_at_centroid:
> +   case nir_intrinsic_interp_var_at_sample:
> +   case nir_intrinsic_interp_var_at_offset: {
> +  /* in SIMD16 mode, the pixel interpolator returns coords interleaved
> +   * 8 channels at a time, same as the barycentric coords presented in
> +   * the FS payload. this requires a bit of extra work to support.
> +   */
> +  no16("interpolate_at_* not yet supported in SIMD16 mode.");
> +
> +  fs_reg dst_x(GRF, virtual_grf_alloc(2), BRW_REGISTER_TYPE_F);
> +  fs_reg dst_y = offset(dst_x, 1);
> +
> +  /* For most messages, we need one reg of ignored data; the hardware
> +   * requires mlen==1 even when there is no payload. in the per-slot
> +   * offset case, we'll replace this with the proper source data.
> +   */
> +  fs_reg src(this, glsl_type::float_type);
> +  int mlen = 1; /* one reg unless overriden */
> +  fs_inst *inst;
> +
> +  switch (instr->intrinsic) {
> +  case nir_intrinsic_interp_var_at_centroid:
> + inst = emit(FS_OPCODE_INTERPOLATE_AT_CENTROID, dst_x, src,
> fs_reg(0u));
> + break;
> +
> +  case nir_intrinsic_interp_var_at_sample: {
> + /* XXX: We should probably handle non-constant sample id's */
> + nir_const_value *const_sample =
> nir_src_as_const_value(instr->src[0]);
> + assert(const_sample);
> + unsigned msg_data = const_sample ? const_sample->i[0] << 4 : 0;
> + inst = emit(FS_OPCODE_INTERPOLATE_AT_SAMPLE, dst_x, src,
> + fs_reg(msg_data));
> + break;
> +  }
> +
> +  case nir_intrinsic_interp_var_at_offset: {
> + nir_const_value *const_offset =
> nir_src_as_const_value(instr->src[0]);
> +
> + if (const_offset) {
> +unsigned off_x = MIN2((int)(const_offset->f[0] * 16), 7) &
> 0xf;
> +unsigned off_y = MIN2((int)(const_offset->f[1] * 16), 7) &
> 0xf;
> +
> +inst = emit(FS_OPCODE_INTERPOLATE_AT_SHARED_OFFSET, dst_x,
> src,
> +fs_reg(off_x | (off_y << 4)));
> + } else {
> +src = fs_reg(this, glsl_type::ivec2_type);
> +fs_reg offset_src = retype(get_nir_src(instr->src[0]),
> +   BRW_REGISTER_TYPE_F);
> +for (int i = 0; i < 2; i++) {
> +   fs_reg temp(this, glsl_type::float_type);
> +   emit(MUL(temp, offset(offset_src, i), fs_reg(16.0f)));
> +   fs_reg itemp(this, glsl_type::int_type);
> +   emit(MOV(itemp, temp));  /* float to int */
> +
> +   /* Clamp the upper end of the range to +7/16.
> +* ARB_gpu_shader5 requires that we support a maximum
> offset
> +* of +0.5, which isn't representable in a S0.4 value -- if
> +* we didn't clamp it, we'd end up with -8/16, which is the
> +* opposite of what the shader author wanted.
> +*
> +* This is legal due to ARB_gpu_shader5's quantization
> +* rules:
> +*
> +* "Not all values of  may be supported; x and y
> +* offsets may be rounded to fixed-point values with the
> +* number of fraction bits given by the
> +* implementation-dependent constant
> +* FRAGMENT_INTERPOLATION_OFFSET_BITS"
> +*/
> +
> +   emit(BRW_OPCODE_SEL, offset(src, i), itemp, fs_reg(7))
> +   ->conditional_mod = BRW_CONDITIONAL_L; /* min(src2, 7)
> */
> +}
> +
> +mlen = 2;
> +inst = emit(FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET, dst_x,
> src,
> +fs_reg(0u));
> + }
> + break;
> +  }
> +
> +  default:
> + unreachable("Invalid intrinsic");
> +  }
> +
> +  inst->mlen = mlen;
> +  inst->regs_written = 2; /* 2 floats per slot returned */
> +  inst->pi_noperspective =
> instr->variables[0]->var->data.interpolation ==
> +   INTERP_QUALIFIER_NOPERSPECTIVE;
> +
> +  for (unsigned j = 0; j < instr->num_components; j++) {
> + fs_reg src = in

Re: [Mesa-dev] [PATCH 098/133] nir: Remove the old variable lowering code

2015-01-04 Thread Connor Abbott

Reviewed-by: Connor Abbott 

On Tue, Dec 16, 2014 at 1:11 AM, Jason Ekstrand 
wrote:

> ---
>  src/glsl/Makefile.sources |1 -
>  src/glsl/nir/nir_lower_variables_scalar.c | 1249
> -
>  2 files changed, 1250 deletions(-)
>  delete mode 100644 src/glsl/nir/nir_lower_variables_scalar.c
>
> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> index 3f5c0bd..3cee2e0 100644
> --- a/src/glsl/Makefile.sources
> +++ b/src/glsl/Makefile.sources
> @@ -28,7 +28,6 @@ NIR_FILES = \
> $(GLSL_SRCDIR)/nir/nir_lower_samplers.cpp \
> $(GLSL_SRCDIR)/nir/nir_lower_system_values.c \
> $(GLSL_SRCDIR)/nir/nir_lower_variables.c \
> -   $(GLSL_SRCDIR)/nir/nir_lower_variables_scalar.c \
> $(GLSL_SRCDIR)/nir/nir_lower_vec_to_movs.c \
> $(GLSL_SRCDIR)/nir/nir_metadata.c \
> $(GLSL_SRCDIR)/nir/nir_opcodes.c \
> diff --git a/src/glsl/nir/nir_lower_variables_scalar.c
> b/src/glsl/nir/nir_lower_variables_scalar.c
> deleted file mode 100644
> index 1cb79c0..000
> --- a/src/glsl/nir/nir_lower_variables_scalar.c
> +++ /dev/null
> @@ -1,1249 +0,0 @@
> -/*
> - * Copyright © 2014 Intel Corporation
> - *
> - * Permission is hereby granted, free of charge, to any person obtaining a
> - * copy of this software and associated documentation files (the
> "Software"),
> - * to deal in the Software without restriction, including without
> limitation
> - * the rights to use, copy, modify, merge, publish, distribute,
> sublicense,
> - * and/or sell copies of the Software, and to permit persons to whom the
> - * Software is furnished to do so, subject to the following conditions:
> - *
> - * The above copyright notice and this permission notice (including the
> next
> - * paragraph) shall be included in all copies or substantial portions of
> the
> - * Software.
> - *
> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
> SHALL
> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> OTHER
> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS
> - * IN THE SOFTWARE.
> - *
> - * Authors:
> - *Connor Abbott (cwabbo...@gmail.com)
> - *
> - */
> -
> -/*
> - * This lowering pass converts references to variables with loads/stores
> to
> - * registers or inputs/outputs. We assume that structure splitting has
> already
> - * been run, or else structures with indirect references can't be split.
> We
> - * also assume that this pass will be consumed by a scalar backend, so we
> pack
> - * things more tightly.
> - */
> -
> -#include "nir.h"
> -
> -static unsigned
> -type_size(const struct glsl_type *type)
> -{
> -   unsigned int size, i;
> -
> -   switch (glsl_get_base_type(type)) {
> -   case GLSL_TYPE_UINT:
> -   case GLSL_TYPE_INT:
> -   case GLSL_TYPE_FLOAT:
> -   case GLSL_TYPE_BOOL:
> -  return glsl_get_components(type);
> -   case GLSL_TYPE_ARRAY:
> -  return type_size(glsl_get_array_element(type)) *
> glsl_get_length(type);
> -   case GLSL_TYPE_STRUCT:
> -  size = 0;
> -  for (i = 0; i < glsl_get_length(type); i++) {
> - size += type_size(glsl_get_struct_field(type, i));
> -  }
> -  return size;
> -   case GLSL_TYPE_SAMPLER:
> -  return 0;
> -   case GLSL_TYPE_ATOMIC_UINT:
> -  return 0;
> -   case GLSL_TYPE_INTERFACE:
> -  return 0;
> -   case GLSL_TYPE_IMAGE:
> -  return 0;
> -   case GLSL_TYPE_VOID:
> -   case GLSL_TYPE_ERROR:
> -  unreachable("not reached");
> -   }
> -
> -   return 0;
> -}
> -
> -/*
> - * for inputs, outputs, and uniforms, assigns starting locations for
> variables
> - */
> -
> -static void
> -assign_var_locations(struct hash_table *ht, unsigned *size)
> -{
> -   unsigned location = 0;
> -
> -   struct hash_entry *entry;
> -   hash_table_foreach(ht, entry) {
> -  nir_variable *var = (nir_variable *) entry->data;
> -
> -  /*
> -   * UBO's have their own address spaces, so don't count them towards
> the
> -   * number of global uniforms
> -   */
> -  if (var->data.mode == nir_var_uniform && var->interface_type !=
> NULL)
> - continue;
> -
> -  var->data.driver_location = location;
> -  location += type_size(var->type);
> -   }
> -
> -   *size = location;
> -}
> -
> -static void
> -assign_var_locations_shader(nir_shader *shader)
> -{
> -   assign_var_locations(shader->inputs, &shader->num_inputs);
> -   assign_var_locations(shader->outputs, &shader->num_outputs);
> -   assign_var_locations(shader->uniforms, &shader->num_uniforms);
> -}
> -
> -static void
> -init_reg(nir_variable *var, nir_register *reg, struct hash_table *ht,
> - bool add_names)
> -{
> -   if (!glsl_type_is_scalar(var->type) &&
> -   !glsl_typ

Re: [Mesa-dev] [PATCH 099/133] nir: Vectorize intrinsics

2015-01-04 Thread Connor Abbott

Reviewed-by: Connor Abbott 

Nice to see that this idea worked out well!

On Tue, Dec 16, 2014 at 1:11 AM, Jason Ekstrand 
wrote:

> We used to have the number of components built into the intrinsic.  This
> meant that all of our load/store intrinsics had vec1, vec2, vec3, and vec4
> variants.  This lead to piles of switch statements to generate the correct
> texture names, and introspection to figure out the number of components.
>

This doesn't touch textures; I think you can just delete "texture" and
it'll make more sense.
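
For readers skimming the thread, the idea in the commit message can be
sketched in miniature. Everything below is hypothetical (a toy enum and
struct, not the NIR definitions): the old scheme needs a switch to pick a
per-width opcode, while the vectorized scheme stores the width in a field.

```c
#include <assert.h>

/* Old scheme: the component count is baked into the opcode. */
enum sketch_old_op {
   SKETCH_OLD_LOAD_VEC1,
   SKETCH_OLD_LOAD_VEC2,
   SKETCH_OLD_LOAD_VEC3,
   SKETCH_OLD_LOAD_VEC4
};

/* New scheme: one opcode, with the width carried alongside it. */
enum sketch_new_op { SKETCH_NEW_LOAD };

struct sketch_intrinsic {
   enum sketch_new_op op;
   unsigned num_components;
};

/* Old scheme: every call site needs a switch like this one. */
static enum sketch_old_op
sketch_old_load_op(unsigned components)
{
   switch (components) {
   case 1: return SKETCH_OLD_LOAD_VEC1;
   case 2: return SKETCH_OLD_LOAD_VEC2;
   case 3: return SKETCH_OLD_LOAD_VEC3;
   case 4: return SKETCH_OLD_LOAD_VEC4;
   default:
      assert(!"Invalid number of components");
      return SKETCH_OLD_LOAD_VEC1;
   }
}

/* New scheme: the width is just data on the instruction. */
static struct sketch_intrinsic
sketch_new_load(unsigned components)
{
   assert(components >= 1 && components <= 4);
   struct sketch_intrinsic instr = { SKETCH_NEW_LOAD, components };
   return instr;
}
```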


> We can make things much nicer by allowing "vectorized" intrinsics.
> ---
>  src/glsl/nir/glsl_to_nir.cpp |  60 
>  src/glsl/nir/nir.h   |  15 +++-
>  src/glsl/nir/nir_intrinsics.h|  79 +++--
>  src/glsl/nir/nir_lower_io.c  | 115
> +++
>  src/glsl/nir/nir_lower_locals_to_regs.c  |  18 ++---
>  src/glsl/nir/nir_lower_system_values.c   |   3 +-
>  src/glsl/nir/nir_lower_variables.c   |  74 +++-
>  src/glsl/nir/nir_validate.c  |  10 +--
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp |  64 +
>  9 files changed, 123 insertions(+), 315 deletions(-)
>
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index f85b50e..088a8e9 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -629,7 +629,8 @@ nir_visitor::visit(ir_call *ir)
>nir_instr_insert_after_cf_list(this->cf_node_list, &instr->instr);
>
>nir_intrinsic_instr *store_instr =
> - nir_intrinsic_instr_create(shader, nir_intrinsic_store_var_vec1);
> + nir_intrinsic_instr_create(shader, nir_intrinsic_store_var);
> +  store_instr->num_components = 1;
>
>ir->return_deref->accept(this);
>store_instr->variables[0] = this->deref_head;
> @@ -704,17 +705,9 @@ nir_visitor::visit(ir_assignment *ir)
> * back into the LHS. Copy propagation should get rid of the mess.
> */
>
> -  nir_intrinsic_op load_op;
> -  switch (ir->lhs->type->vector_elements) {
> - case 1: load_op = nir_intrinsic_load_var_vec1; break;
> - case 2: load_op = nir_intrinsic_load_var_vec2; break;
> - case 3: load_op = nir_intrinsic_load_var_vec3; break;
> - case 4: load_op = nir_intrinsic_load_var_vec4; break;
> - default: unreachable("Invalid number of components"); break;
> -  }
> -
> -  nir_intrinsic_instr *load = nir_intrinsic_instr_create(this->shader,
> - load_op);
> +  nir_intrinsic_instr *load =
> + nir_intrinsic_instr_create(this->shader, nir_intrinsic_load_var);
> +  load->num_components = ir->lhs->type->vector_elements;
>load->dest.is_ssa = true;
>nir_ssa_def_init(&load->instr, &load->dest.ssa,
> num_components, NULL);
> @@ -759,17 +752,9 @@ nir_visitor::visit(ir_assignment *ir)
>src.ssa = &vec->dest.dest.ssa;
> }
>
> -   nir_intrinsic_op store_op;
> -   switch (ir->lhs->type->vector_elements) {
> -  case 1: store_op = nir_intrinsic_store_var_vec1; break;
> -  case 2: store_op = nir_intrinsic_store_var_vec2; break;
> -  case 3: store_op = nir_intrinsic_store_var_vec3; break;
> -  case 4: store_op = nir_intrinsic_store_var_vec4; break;
> -  default: unreachable("Invalid number of components"); break;
> -   }
> -
> -   nir_intrinsic_instr *store = nir_intrinsic_instr_create(this->shader,
> -   store_op);
> +   nir_intrinsic_instr *store =
> +  nir_intrinsic_instr_create(this->shader, nir_intrinsic_store_var);
> +   store->num_components = ir->lhs->type->vector_elements;
> nir_deref *store_deref = nir_copy_deref(this->shader,
> &lhs_deref->deref);
> store->variables[0] = nir_deref_as_var(store_deref);
> store->src[0] = src;
> @@ -848,17 +833,9 @@ nir_visitor::evaluate_rvalue(ir_rvalue* ir)
> * must emit a variable load.
> */
>
> -  nir_intrinsic_op load_op;
> -  switch (ir->type->vector_elements) {
> -  case 1: load_op = nir_intrinsic_load_var_vec1; break;
> -  case 2: load_op = nir_intrinsic_load_var_vec2; break;
> -  case 3: load_op = nir_intrinsic_load_var_vec3; break;
> -  case 4: load_op = nir_intrinsic_load_var_vec4; break;
> -  default: unreachable("Invalid number of components");
> -  }
> -
>nir_intrinsic_instr *load_instr =
> - nir_intrinsic_instr_create(this->shader, load_op);
> + nir_intrinsic_instr_create(this->shader, nir_intrinsic_load_var);
> +  load_instr->num_components = ir->type->vector_elements;
>load_instr->variables[0] = this->deref_head;
>add_instr(&load_instr->instr, ir->type->vector_elements);
> }
> @@ -917,23 +894,12 @@ nir_visitor::visit(ir_expression *ir)
>
>nir_intrinsic_op op;
>if (c

Re: [Mesa-dev] [PATCH 096/133] i965/fs_nir: Use the new variable lowering code

2015-01-04 Thread Connor Abbott

Reviewed-by: Connor Abbott 

Nice job getting this variable lowering stuff all done!

On Tue, Dec 16, 2014 at 1:11 AM, Jason Ekstrand 
wrote:

> This commit switches us over to the new variable lowering code which is
> capable of properly handling lowering indirects as we go.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 44
> ++--
>  1 file changed, 25 insertions(+), 19 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 0cd8fca..dbb2470 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -21,6 +21,8 @@
>   * IN THE SOFTWARE.
>   */
>
> +#include "glsl/ir.h"
> +#include "glsl/ir_optimization.h"
>  #include "glsl/nir/glsl_to_nir.h"
>  #include "brw_fs.h"
>
> @@ -28,35 +30,21 @@ void
>  fs_visitor::emit_nir_code()
>  {
> /* first, lower the GLSL IR shader to NIR */
> +   lower_output_reads(shader->base.ir);
> nir_shader *nir = glsl_to_nir(shader->base.ir, NULL, true);
> nir_validate_shader(nir);
>
> -   /* lower some of the GLSL-isms into NIR-isms - after this point, we no
> -* longer have to deal with variables inside the shader
> -*/
> -
> -   nir_lower_variables_scalar(nir, true, true, true, true);
> -   nir_validate_shader(nir);
> -
> -   nir_lower_samplers(nir, shader_prog, shader->base.Program);
> +   nir_lower_global_vars_to_local(nir);
> nir_validate_shader(nir);
>
> -   nir_lower_system_values(nir);
> -   nir_validate_shader(nir);
> -
> -   nir_lower_atomics(nir);
> -   nir_validate_shader(nir);
> -
> -   nir_remove_dead_variables(nir);
> -   nir_opt_global_to_local(nir);
> -   nir_validate_shader(nir);
> -
> -   nir_convert_to_ssa(nir);
> +   nir_split_var_copies(nir);
> nir_validate_shader(nir);
>
> bool progress;
> do {
>progress = false;
> +  nir_lower_variables(nir);
> +  nir_validate_shader(nir);
>progress |= nir_copy_prop(nir);
>nir_validate_shader(nir);
>progress |= nir_opt_dce(nir);
> @@ -69,11 +57,29 @@ fs_visitor::emit_nir_code()
>nir_validate_shader(nir);
> } while (progress);
>
> +   /* Lower a bunch of stuff */
> +   nir_lower_io(nir);
> +   nir_validate_shader(nir);
> +
> +   nir_lower_locals_to_regs(nir);
> +   nir_validate_shader(nir);
> +
> +   nir_remove_dead_variables(nir);
> +   nir_validate_shader(nir);
> nir_convert_from_ssa(nir);
> nir_validate_shader(nir);
> nir_lower_vec_to_movs(nir);
> nir_validate_shader(nir);
>
> +   nir_lower_samplers(nir, shader_prog, shader->base.Program);
> +   nir_validate_shader(nir);
> +
> +   nir_lower_system_values(nir);
> +   nir_validate_shader(nir);
> +
> +   nir_lower_atomics(nir);
> +   nir_validate_shader(nir);
> +
> /* emit the arrays used for inputs and outputs - load/store intrinsics
> will
>  * be converted to reads/writes of these arrays
>  */
> --
> 2.2.0
>

Re: [Mesa-dev] [PATCH 097/133] nir/validate: Ensure that outputs are write-only and inputs are read-only

2015-01-04 Thread Connor Abbott

I'm not so sure how I feel about checking that outputs are write-only...
eventually we'll want to do lower_output_reads in NIR itself, at which point
we'll need to remove that part from the validator. At the same time, for
now this is somewhat useful. I'm just not sure if it's worth it (making
sure inputs are read-only definitely is, though).
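
One way to picture the trade-off, as a sketch with hypothetical names (the
enum and helpers below are not the NIR validator): keep the input and
uniform rule unconditional, and make the output rule something that can be
switched off once output reads are lowered later in the pipeline.

```c
#include <stdbool.h>

/* Hypothetical variable modes, loosely modelled on the ones in the patch. */
enum sketch_var_mode {
   SKETCH_SHADER_IN,
   SKETCH_SHADER_OUT,
   SKETCH_UNIFORM,
   SKETCH_LOCAL
};

/* Loads: outputs are rejected only while the write-only rule is enabled. */
static bool
sketch_load_allowed(enum sketch_var_mode mode, bool outputs_write_only)
{
   return !(outputs_write_only && mode == SKETCH_SHADER_OUT);
}

/* Stores: inputs and uniforms are always read-only. */
static bool
sketch_store_allowed(enum sketch_var_mode mode)
{
   return mode != SKETCH_SHADER_IN && mode != SKETCH_UNIFORM;
}
```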

On Tue, Dec 16, 2014 at 1:11 AM, Jason Ekstrand 
wrote:

> ---
>  src/glsl/nir/nir_validate.c | 23 +++
>  1 file changed, 23 insertions(+)
>
> diff --git a/src/glsl/nir/nir_validate.c b/src/glsl/nir/nir_validate.c
> index 80faa15..b8ef802 100644
> --- a/src/glsl/nir/nir_validate.c
> +++ b/src/glsl/nir/nir_validate.c
> @@ -337,6 +337,29 @@ validate_intrinsic_instr(nir_intrinsic_instr *instr,
> validate_state *state)
>validate_deref_var(instr->variables[i], state);
> }
>
> +   switch (instr->intrinsic) {
> +   case nir_intrinsic_load_var_vec1:
> +   case nir_intrinsic_load_var_vec2:
> +   case nir_intrinsic_load_var_vec3:
> +   case nir_intrinsic_load_var_vec4:
> +  assert(instr->variables[0]->var->data.mode != nir_var_shader_out);
> +  break;
> +   case nir_intrinsic_store_var_vec1:
> +   case nir_intrinsic_store_var_vec2:
> +   case nir_intrinsic_store_var_vec3:
> +   case nir_intrinsic_store_var_vec4:
> +  assert(instr->variables[0]->var->data.mode != nir_var_shader_in &&
> + instr->variables[0]->var->data.mode != nir_var_uniform);
> +  break;
> +   case nir_intrinsic_copy_var:
> +  assert(instr->variables[0]->var->data.mode != nir_var_shader_in &&
> + instr->variables[0]->var->data.mode != nir_var_uniform);
> +  assert(instr->variables[1]->var->data.mode != nir_var_shader_out);
> +  break;
> +   default:
> +  break;
> +   }
> +
> if (instr->has_predicate)
>validate_src(&instr->predicate, state);
>  }
> --
> 2.2.0
>

Re: [Mesa-dev] [PATCH 095/133] nir/glsl: Generate SSA NIR

2015-01-04 Thread Connor Abbott

Except for the minor stale comment and assuming you checked that we don't
call nir_local_reg_create() anymore,

Reviewed-by: Connor Abbott 

On Tue, Dec 16, 2014 at 1:11 AM, Jason Ekstrand 
wrote:

> With this commit, the GLSL IR -> NIR pass generates NIR in more-or-less SSA
> form.  It's SSA in the sense that it doesn't have any registers, but it
> isn't really useful SSA because it still has a pile of load/store
> intrinsics that we will need to get rid of.
> ---
>  src/glsl/nir/glsl_to_nir.cpp | 246
> ---
>  1 file changed, 117 insertions(+), 129 deletions(-)
>
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index 6870bd2..f85b50e 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -618,15 +618,13 @@ nir_visitor::visit(ir_call *ir)
>   assert(0);
>}
>
> -  nir_register *reg = nir_local_reg_create(impl);
> -  reg->num_components = 1;
> -
>nir_intrinsic_instr *instr = nir_intrinsic_instr_create(shader, op);
>ir_dereference *param =
>   (ir_dereference *) ir->actual_parameters.get_head();
>param->accept(this);
>instr->variables[0] = this->deref_head;
> -  instr->dest.reg.reg = reg;
> +  instr->dest.is_ssa = true;
> +  nir_ssa_def_init(&instr->instr, &instr->dest.ssa, 1, NULL);
>
>nir_instr_insert_after_cf_list(this->cf_node_list, &instr->instr);
>
> @@ -635,7 +633,8 @@ nir_visitor::visit(ir_call *ir)
>
>ir->return_deref->accept(this);
>store_instr->variables[0] = this->deref_head;
> -  store_instr->src[0].reg.reg = reg;
> +  store_instr->src[0].is_ssa = true;
> +  store_instr->src[0].ssa = &instr->dest.ssa;
>
>nir_instr_insert_after_cf_list(this->cf_node_list,
> &store_instr->instr);
>
> @@ -665,121 +664,124 @@ nir_visitor::visit(ir_call *ir)
>  void
>  nir_visitor::visit(ir_assignment *ir)
>  {
> -   if (ir->write_mask != (1 << ir->lhs->type->vector_elements) - 1 &&
> -   ir->write_mask != 0) {
> +   unsigned num_components = ir->lhs->type->vector_elements;
> +
> +   if ((ir->rhs->as_dereference() || ir->rhs->as_constant()) &&
> +   (ir->write_mask == (1 << num_components) - 1 || ir->write_mask ==
> 0)) {
> +  /* We're doing a plane-as-can-be copy, so emit a copy_var */
>

plain-as-can-be


> +  nir_intrinsic_instr *copy =
> + nir_intrinsic_instr_create(this->shader, nir_intrinsic_copy_var);
> +
> +  ir->lhs->accept(this);
> +  copy->variables[0] = this->deref_head;
> +
> +  ir->rhs->accept(this);
> +  copy->variables[1] = this->deref_head;
> +
> +
> +  if (ir->condition) {
> + nir_if *if_stmt = nir_if_create(this->shader);
> + if_stmt->condition = evaluate_rvalue(ir->condition);
> + nir_cf_node_insert_end(this->cf_node_list, &if_stmt->cf_node);
> + nir_instr_insert_after_cf_list(&if_stmt->then_list,
> &copy->instr);
> +  } else {
> + nir_instr_insert_after_cf_list(this->cf_node_list, &copy->instr);
> +  }
> +  return;
> +   }
> +
> +   assert(ir->rhs->type->is_scalar() || ir->rhs->type->is_vector());
> +
> +   ir->lhs->accept(this);
> +   nir_deref_var *lhs_deref = this->deref_head;
> +   nir_src src = evaluate_rvalue(ir->rhs);
> +
> +   if (ir->write_mask != (1 << num_components) - 1 && ir->write_mask !=
> 0) {
>/*
> * We have no good way to update only part of a variable, so just
> load
> -   * the LHS into a register, do a writemasked move, and then store it
> +   * the LHS and do a vec operation to combine the old with the new,
> and
> +   * then store it
> * back into the LHS. Copy propagation should get rid of the mess.
> */
>
> -  ir->lhs->accept(this);
> -  nir_deref_var *lhs_deref = this->deref_head;
> -  nir_register *reg = nir_local_reg_create(this->impl);
> -  reg->num_components = ir->lhs->type->vector_elements;
> -
> -  nir_intrinsic_op op;
> +  nir_intrinsic_op load_op;
>switch (ir->lhs->type->vector_elements) {
> - case 1: op = nir_intrinsic_load_var_vec1; break;
> - case 2: op = nir_intrinsic_load_var_vec2; break;
> - case 3: op = nir_intrinsic_load_var_vec3; break;
> - case 4: op = nir_intrinsic_load_var_vec4; break;
> - default: assert(0); break;
> + case 1: load_op = nir_intrinsic_load_var_vec1; break;
> + case 2: load_op = nir_intrinsic_load_var_vec2; break;
> + case 3: load_op = nir_intrinsic_load_var_vec3; break;
> + case 4: load_op = nir_intrinsic_load_var_vec4; break;
> + default: unreachable("Invalid number of components"); break;
>}
>
> -  nir_intrinsic_instr *load =
> nir_intrinsic_instr_create(this->shader, op);
> -  load->dest.reg.reg = reg;
> +  nir_intrinsic_instr *load = nir_intrinsic_instr_create(this->shader,
> + load_op);
> +  l

Re: [Mesa-dev] [PATCH 093/133] nir: Add a pass for lowering input/output loads/stores

2015-01-04 Thread Connor Abbott

I'd also like to rename this pass, or at least add a note that it is a
scalar-only thing for now... otherwise,

Reviewed-by: Connor Abbott 

On Tue, Dec 16, 2014 at 1:11 AM, Jason Ekstrand 
wrote:

> ---
>  src/glsl/Makefile.sources   |   1 +
>  src/glsl/nir/nir.h  |   2 +
>  src/glsl/nir/nir_lower_io.c | 388
> 
>  3 files changed, 391 insertions(+)
>  create mode 100644 src/glsl/nir/nir_lower_io.c
>
> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> index 6230f49..53c3e98 100644
> --- a/src/glsl/Makefile.sources
> +++ b/src/glsl/Makefile.sources
> @@ -23,6 +23,7 @@ NIR_FILES = \
> $(GLSL_SRCDIR)/nir/nir_live_variables.c \
> $(GLSL_SRCDIR)/nir/nir_lower_atomics.c \
> $(GLSL_SRCDIR)/nir/nir_lower_locals_to_regs.c \
> +   $(GLSL_SRCDIR)/nir/nir_lower_io.c \
> $(GLSL_SRCDIR)/nir/nir_lower_samplers.cpp \
> $(GLSL_SRCDIR)/nir/nir_lower_system_values.c \
> $(GLSL_SRCDIR)/nir/nir_lower_variables.c \
> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> index 7d7aec7..ec9ce07 100644
> --- a/src/glsl/nir/nir.h
> +++ b/src/glsl/nir/nir.h
> @@ -1360,6 +1360,8 @@ void nir_split_var_copies(nir_shader *shader);
>
>  void nir_lower_locals_to_regs(nir_shader *shader);
>
> +void nir_lower_io(nir_shader *shader);
> +
>  void nir_lower_variables(nir_shader *shader);
>
>  void nir_lower_variables_scalar(nir_shader *shader, bool lower_globals,
> diff --git a/src/glsl/nir/nir_lower_io.c b/src/glsl/nir/nir_lower_io.c
> new file mode 100644
> index 000..a3b8186
> --- /dev/null
> +++ b/src/glsl/nir/nir_lower_io.c
> @@ -0,0 +1,388 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the
> "Software"),
> + * to deal in the Software without restriction, including without
> limitation
> + * the rights to use, copy, modify, merge, publish, distribute,
> sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the
> next
> + * paragraph) shall be included in all copies or substantial portions of
> the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
> SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + *Connor Abbott (cwabbo...@gmail.com)
> + *Jason Ekstrand (ja...@jlekstrand.net)
> + *
> + */
> +
> +/*
> + * This lowering pass converts references to input/output variables with
> + * loads/stores to actual input/output intrinsics.
> + */
> +
> +#include "nir.h"
> +
> +struct lower_io_state {
> +   void *mem_ctx;
> +};
> +
> +static unsigned
> +type_size(const struct glsl_type *type)
> +{
> +   unsigned int size, i;
> +
> +   switch (glsl_get_base_type(type)) {
> +   case GLSL_TYPE_UINT:
> +   case GLSL_TYPE_INT:
> +   case GLSL_TYPE_FLOAT:
> +   case GLSL_TYPE_BOOL:
> +  return glsl_get_components(type);
> +   case GLSL_TYPE_ARRAY:
> +  return type_size(glsl_get_array_element(type)) *
> glsl_get_length(type);
> +   case GLSL_TYPE_STRUCT:
> +  size = 0;
> +  for (i = 0; i < glsl_get_length(type); i++) {
> + size += type_size(glsl_get_struct_field(type, i));
> +  }
> +  return size;
> +   case GLSL_TYPE_SAMPLER:
> +  return 0;
> +   case GLSL_TYPE_ATOMIC_UINT:
> +  return 0;
> +   case GLSL_TYPE_INTERFACE:
> +  return 0;
> +   case GLSL_TYPE_IMAGE:
> +  return 0;
> +   case GLSL_TYPE_VOID:
> +   case GLSL_TYPE_ERROR:
> +  unreachable("not reached");
> +   }
> +
> +   return 0;
> +}
> +
> +static void
> +assign_var_locations(struct hash_table *ht, unsigned *size)
> +{
> +   unsigned location = 0;
> +
> +   struct hash_entry *entry;
> +   hash_table_foreach(ht, entry) {
> +  nir_variable *var = (nir_variable *) entry->data;
> +
> +  /*
> +   * UBO's have their own address spaces, so don't count them towards
> the
> +   * number of global uniforms
> +   */
> +  if (var->data.mode == nir_var_uniform && var->interface_type !=
> NULL)
> + continue;
> +
> +  var->data.driver_location = location;
> +  location += type_size(var->type);
> +   }
> +
> +   *size = location;
> +}
> +
> +static void
> +assign_var_locations_shader(nir_shader *shader)
> +{
> +   assign_var_locations(shader->inputs, &shader->num_inputs);
> +   assign_var_locations(shader->outputs, &shader-

Re: [Mesa-dev] [PATCH 092/133] nir: Add a pass to lower local variables to registers

2015-01-04 Thread Connor Abbott

Oh, and I forgot... can we rename this to lower_local_to_regs_scalar or at
least add a note that this won't work for vec4 backends yet due to the
different indexing?


On Tue, Dec 16, 2014 at 1:11 AM, Jason Ekstrand 
wrote:

> ---
>  src/glsl/Makefile.sources   |   1 +
>  src/glsl/nir/nir.h  |   2 +
>  src/glsl/nir/nir_lower_locals_to_regs.c | 313
> 
>  3 files changed, 316 insertions(+)
>  create mode 100644 src/glsl/nir/nir_lower_locals_to_regs.c
>
> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> index 1d3b049..6230f49 100644
> --- a/src/glsl/Makefile.sources
> +++ b/src/glsl/Makefile.sources
> @@ -22,6 +22,7 @@ NIR_FILES = \
> $(GLSL_SRCDIR)/nir/nir_intrinsics.h \
> $(GLSL_SRCDIR)/nir/nir_live_variables.c \
> $(GLSL_SRCDIR)/nir/nir_lower_atomics.c \
> +   $(GLSL_SRCDIR)/nir/nir_lower_locals_to_regs.c \
> $(GLSL_SRCDIR)/nir/nir_lower_samplers.cpp \
> $(GLSL_SRCDIR)/nir/nir_lower_system_values.c \
> $(GLSL_SRCDIR)/nir/nir_lower_variables.c \
> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> index b3abfb9..7d7aec7 100644
> --- a/src/glsl/nir/nir.h
> +++ b/src/glsl/nir/nir.h
> @@ -1358,6 +1358,8 @@ void nir_dump_cfg(nir_shader *shader, FILE *fp);
>
>  void nir_split_var_copies(nir_shader *shader);
>
> +void nir_lower_locals_to_regs(nir_shader *shader);
> +
>  void nir_lower_variables(nir_shader *shader);
>
>  void nir_lower_variables_scalar(nir_shader *shader, bool lower_globals,
> diff --git a/src/glsl/nir/nir_lower_locals_to_regs.c
> b/src/glsl/nir/nir_lower_locals_to_regs.c
> new file mode 100644
> index 000..caf1c29
> --- /dev/null
> +++ b/src/glsl/nir/nir_lower_locals_to_regs.c
> @@ -0,0 +1,313 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the
> "Software"),
> + * to deal in the Software without restriction, including without
> limitation
> + * the rights to use, copy, modify, merge, publish, distribute,
> sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the
> next
> + * paragraph) shall be included in all copies or substantial portions of
> the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
> SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + *Jason Ekstrand (ja...@jlekstrand.net)
> + *
> + */
> +
> +#include "nir.h"
> +
> +struct locals_to_regs_state {
> +   void *mem_ctx;
> +   nir_function_impl *impl;
> +
> +   /* A hash table mapping derefs to registers */
> +   struct hash_table *regs_table;
> +};
> +
> +/* The following two functions implement a hash and equality check for
> + * variable dereferences.  When the hash or equality function encounters an
> + * array, it ignores the offset and whether it is direct or indirect
> + * entirely.
> + */
> +static uint32_t
> +hash_deref(const void *void_deref)
> +{
> +   const nir_deref *deref = void_deref;
> +
> +   uint32_t hash;
> +   if (deref->child) {
> +  hash = hash_deref(deref->child);
> +   } else {
> +  hash = 2166136261ul;
> +   }
> +
> +   switch (deref->deref_type) {
> +   case nir_deref_type_var:
> +  hash ^= _mesa_hash_pointer(nir_deref_as_var(deref)->var);
> +  break;
> +   case nir_deref_type_array: {
> +  hash ^= 268435183;
> +  break;
> +   }
> +   case nir_deref_type_struct:
> +  hash ^= nir_deref_as_struct(deref)->index;
> +  break;
> +   }
> +
> +   return hash * 0x01000193;
> +}
> +
> +static bool
> +derefs_equal(const void *void_a, const void *void_b)
> +{
> +   const nir_deref *a = void_a;
> +   const nir_deref *b = void_b;
> +
> +   if (a->deref_type != b->deref_type)
> +  return false;
> +
> +   switch (a->deref_type) {
> +   case nir_deref_type_var:
> +  if (nir_deref_as_var(a)->var != nir_deref_as_var(b)->var)
> + return false;
> +  break;
> +   case nir_deref_type_array:
> +  /* Do nothing.  All array derefs are the same */
> +  break;
> +   case nir_deref_type_struct:
> +  if (nir_deref_as_struct(a)->index != nir_deref_as_struct(b)->index)
> + return false;
> +  break;
> +   default:
> +  unreachable("Invalid dreference type");
> +   }
> +
> +   assert((a->child == NULL) == (b->child == NULL));
> +   if (a->c

Re: [Mesa-dev] [PATCH 092/133] nir: Add a pass to lower local variables to registers

2015-01-04 Thread Connor Abbott

On Tue, Dec 16, 2014 at 1:11 AM, Jason Ekstrand 
wrote:

> ---
>  src/glsl/Makefile.sources   |   1 +
>  src/glsl/nir/nir.h  |   2 +
>  src/glsl/nir/nir_lower_locals_to_regs.c | 313
> 
>  3 files changed, 316 insertions(+)
>  create mode 100644 src/glsl/nir/nir_lower_locals_to_regs.c
>
> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> index 1d3b049..6230f49 100644
> --- a/src/glsl/Makefile.sources
> +++ b/src/glsl/Makefile.sources
> @@ -22,6 +22,7 @@ NIR_FILES = \
> $(GLSL_SRCDIR)/nir/nir_intrinsics.h \
> $(GLSL_SRCDIR)/nir/nir_live_variables.c \
> $(GLSL_SRCDIR)/nir/nir_lower_atomics.c \
> +   $(GLSL_SRCDIR)/nir/nir_lower_locals_to_regs.c \
> $(GLSL_SRCDIR)/nir/nir_lower_samplers.cpp \
> $(GLSL_SRCDIR)/nir/nir_lower_system_values.c \
> $(GLSL_SRCDIR)/nir/nir_lower_variables.c \
> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> index b3abfb9..7d7aec7 100644
> --- a/src/glsl/nir/nir.h
> +++ b/src/glsl/nir/nir.h
> @@ -1358,6 +1358,8 @@ void nir_dump_cfg(nir_shader *shader, FILE *fp);
>
>  void nir_split_var_copies(nir_shader *shader);
>
> +void nir_lower_locals_to_regs(nir_shader *shader);
> +
>  void nir_lower_variables(nir_shader *shader);
>
>  void nir_lower_variables_scalar(nir_shader *shader, bool lower_globals,
> diff --git a/src/glsl/nir/nir_lower_locals_to_regs.c
> b/src/glsl/nir/nir_lower_locals_to_regs.c
> new file mode 100644
> index 000..caf1c29
> --- /dev/null
> +++ b/src/glsl/nir/nir_lower_locals_to_regs.c
> @@ -0,0 +1,313 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the
> "Software"),
> + * to deal in the Software without restriction, including without
> limitation
> + * the rights to use, copy, modify, merge, publish, distribute,
> sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the
> next
> + * paragraph) shall be included in all copies or substantial portions of
> the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
> SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + *Jason Ekstrand (ja...@jlekstrand.net)
> + *
> + */
> +
> +#include "nir.h"
> +
> +struct locals_to_regs_state {
> +   void *mem_ctx;
> +   nir_function_impl *impl;
> +
> +   /* A hash table mapping derefs to registers */
> +   struct hash_table *regs_table;
> +};
> +
> +/* The following two functions implement a hash and equality check for
> + * variable dereferences.  When the hash or equality function encounters an
> + * array, it ignores the offset and whether it is direct or indirect
> + * entirely.
> + */
> +static uint32_t
> +hash_deref(const void *void_deref)
> +{
> +   const nir_deref *deref = void_deref;
> +
> +   uint32_t hash;
> +   if (deref->child) {
> +  hash = hash_deref(deref->child);
> +   } else {
> +  hash = 2166136261ul;
> +   }
> +
> +   switch (deref->deref_type) {
> +   case nir_deref_type_var:
> +  hash ^= _mesa_hash_pointer(nir_deref_as_var(deref)->var);
> +  break;
> +   case nir_deref_type_array: {
> +  hash ^= 268435183;
> +  break;
> +   }
> +   case nir_deref_type_struct:
> +  hash ^= nir_deref_as_struct(deref)->index;
> +  break;
> +   }
> +
> +   return hash * 0x01000193;
> +}
>

Same comment here about using FNV-1a instead.
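
For reference, a minimal sketch of byte-wise 32-bit FNV-1a: each input byte is
XORed into the hash, which is then multiplied by the FNV prime. The helper name
fnv1a_accumulate is made up for illustration and is not an existing Mesa
function. The offset basis and prime are the standard 32-bit FNV constants,
the same two values the patch already uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit FNV parameters. */
#define FNV32_OFFSET_BASIS 2166136261u
#define FNV32_PRIME        16777619u   /* 0x01000193 */

/* Hypothetical helper: fold `size` bytes into a running FNV-1a hash.
 * FNV-1a XORs the byte in first and multiplies afterwards.
 */
static uint32_t
fnv1a_accumulate(uint32_t hash, const void *data, size_t size)
{
   const uint8_t *bytes = data;

   for (size_t i = 0; i < size; i++) {
      hash ^= bytes[i];
      hash *= FNV32_PRIME;
   }

   return hash;
}
```

Presumably hash_deref would then start from FNV32_OFFSET_BASIS and feed the
variable pointer or the struct index through such a helper, rather than XORing
a whole 32-bit value in one step. That reading of the suggestion is an
assumption.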


> +
> +static bool
> +derefs_equal(const void *void_a, const void *void_b)
> +{
> +   const nir_deref *a = void_a;
> +   const nir_deref *b = void_b;
> +
> +   if (a->deref_type != b->deref_type)
> +  return false;
> +
> +   switch (a->deref_type) {
> +   case nir_deref_type_var:
> +  if (nir_deref_as_var(a)->var != nir_deref_as_var(b)->var)
> + return false;
> +  break;
>

Again, we could split this out of the loop since it's only going to be used
once.


> +   case nir_deref_type_array:
> +  /* Do nothing.  All array derefs are the same */
> +  break;
> +   case nir_deref_type_struct:
> +  if (nir_deref_as_struct(a)->index != nir_deref_as_struct(b)->index)
> + return false;
> +  break;
> +   default:
> +  unreachable("Invalid dreference type");
> +   }
> +
> +   assert((a->child == NULL) == (b->child == NULL));
> +   if (a->child)
> +  return derefs_e

Re: [Mesa-dev] [RFC PATCH 10/40] i965/blorp: Update hw-binding table entries for blorp.

2015-01-04 Thread Kenneth Graunke

On Sunday, January 04, 2015 04:04:24 PM Abdiel Janulgue wrote:
> Update the hw-generated binding table for blorp SURFACE_STATE entries.
> 
> Signed-off-by: Abdiel Janulgue 
> ---
>  src/mesa/drivers/dri/i965/gen6_blorp.cpp | 35 
> 
>  1 file changed, 26 insertions(+), 9 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.cpp 
> b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
> index d4aa955..8e78450 100644
> --- a/src/mesa/drivers/dri/i965/gen6_blorp.cpp
> +++ b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
> @@ -428,15 +428,32 @@ gen6_blorp_emit_binding_table(struct brw_context *brw,
>uint32_t wm_surf_offset_texture)
>  {
> uint32_t wm_bind_bo_offset;
> -   uint32_t *bind = (uint32_t *)
> -  brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
> -  sizeof(uint32_t) *
> -  BRW_BLORP_NUM_BINDING_TABLE_ENTRIES,
> -  32, /* alignment */
> -  &wm_bind_bo_offset);
> -   bind[BRW_BLORP_RENDERBUFFER_BINDING_TABLE_INDEX] =
> -  wm_surf_offset_renderbuffer;
> -   bind[BRW_BLORP_TEXTURE_BINDING_TABLE_INDEX] = wm_surf_offset_texture;
> +   uint32_t *bind;
> +
> +   if (brw->hw_bt_pool.bo && brw->has_resource_streamer) {

It seems like checking brw->hw_bt_pool.bo != NULL should be sufficient.
If there's no resource streamer, we shouldn't allocate the buffer.

But I don't think we want to touch BLORP, so it may be moot...
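
To spell out the first point: if the pool is only ever allocated when the
resource streamer is present, a non-NULL pool pointer already implies the
resource streamer, and the second check is redundant. A rough sketch with
stand-in types follows. Every name prefixed with fake_ is hypothetical and only
mirrors the shape of the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for drm_intel_bo and brw_context; not the real driver types. */
struct fake_bo { int unused; };

struct fake_ctx {
   bool has_resource_streamer;
   struct fake_bo *hw_bt_pool_bo;
};

static struct fake_bo fake_pool_storage;

/* Allocate the pool only when the resource streamer exists. */
static void
fake_enable_hw_binding_tables(struct fake_ctx *ctx)
{
   if (!ctx->has_resource_streamer)
      return;

   if (!ctx->hw_bt_pool_bo)
      ctx->hw_bt_pool_bo = &fake_pool_storage;
}

/* With the invariant above, the pointer check alone is enough. */
static bool
fake_use_hw_binding_tables(const struct fake_ctx *ctx)
{
   return ctx->hw_bt_pool_bo != NULL;
}

/* Small driver for the two cases. */
static bool
fake_check(bool has_resource_streamer)
{
   struct fake_ctx ctx = { has_resource_streamer, NULL };

   fake_enable_hw_binding_tables(&ctx);
   return fake_use_hw_binding_tables(&ctx);
}
```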

> +  BEGIN_BATCH(4);
> +  OUT_BATCH(_3DSTATE_BINDING_TABLE_EDIT_PS << 16 | (4 - 2));
> +  OUT_BATCH(0x3);
> +  {
> + OUT_BATCH(BRW_BLORP_RENDERBUFFER_BINDING_TABLE_INDEX << 16 |
> +   (wm_surf_offset_renderbuffer >> 5));
> + OUT_BATCH(BRW_BLORP_TEXTURE_BINDING_TABLE_INDEX << 16 |
> +   (wm_surf_offset_texture >> 5));
> +  }
> +  ADVANCE_BATCH();
> +  wm_bind_bo_offset = brw->hw_bt_pool.next_offset;
> +  brw->hw_bt_pool.next_offset += (256 * sizeof(uint16_t));
> +   } else {
> +  bind = (uint32_t *)
> + brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
> + sizeof(uint32_t) *
> + BRW_BLORP_NUM_BINDING_TABLE_ENTRIES,
> + 32, /* alignment */
> + &wm_bind_bo_offset);
> +  bind[BRW_BLORP_RENDERBUFFER_BINDING_TABLE_INDEX] =
> + wm_surf_offset_renderbuffer;
> +  bind[BRW_BLORP_TEXTURE_BINDING_TABLE_INDEX] = wm_surf_offset_texture;
> +   }
>  
> return wm_bind_bo_offset;
>  }
> 



Re: [Mesa-dev] [RFC PATCH 06/40] i965/gen7.5: Enable hardware-generated binding tables in blorp path

2015-01-04 Thread Kenneth Graunke

On Sunday, January 04, 2015 04:04:20 PM Abdiel Janulgue wrote:
> Signed-off-by: Abdiel Janulgue 
> ---
>  src/mesa/drivers/dri/i965/gen7_blorp.cpp | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp 
> b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
> index 206a6ff..3d5c7df 100644
> --- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
> +++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
> @@ -824,6 +824,8 @@ gen7_blorp_exec(struct brw_context *brw,
> if (params->use_wm_prog) {
>uint32_t wm_surf_offset_renderbuffer;
>uint32_t wm_surf_offset_texture = 0;
> +  
> +  gen7_enable_hw_binding_tables(brw);
>wm_push_const_offset = gen6_blorp_emit_wm_constants(brw, params);
>intel_miptree_used_for_rendering(params->dst.mt);
>wm_surf_offset_renderbuffer =

I can't imagine there's any benefit to using the RS in BLORP.

BLORP's binding tables only contain 1-2 entries: the destination renderbuffer
and (optional) source texture.  Each BLORP operation most likely operates on
different buffers, so we may as well just re-emit the whole two entry binding
table.

BLORP also doesn't use pull constants, so gather constants won't help
either.

Unless the point is to not switch back and forth due to some cost?


Re: [Mesa-dev] [RFC PATCH 05/40] i965/gen7.5: Enable hardware-generated binding tables on render path.

2015-01-04 Thread Kenneth Graunke

On Sunday, January 04, 2015 04:04:19 PM Abdiel Janulgue wrote:
> This patch implements the binding table enable command, which is also
> used to allocate a binding table pool into which hardware-generated
> binding table entries are flushed.
> 
> Each binding table offset in the binding table pool is unique per
> each shader stage that are enabled within a batch.
> 
> In addition, this change inserts the required brw_tracked_state objects
> to enable hw-generated binding tables in normal render path.
> 
> Signed-off-by: Abdiel Janulgue 
> ---
>  src/mesa/drivers/dri/i965/brw_binding_tables.c | 67 
> ++
>  src/mesa/drivers/dri/i965/brw_context.c|  2 +
>  src/mesa/drivers/dri/i965/brw_context.h|  5 ++
>  src/mesa/drivers/dri/i965/brw_defines.h|  3 ++
>  src/mesa/drivers/dri/i965/brw_state.h  | 12 +
>  src/mesa/drivers/dri/i965/brw_state_upload.c   |  2 +
>  6 files changed, 91 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c 
> b/src/mesa/drivers/dri/i965/brw_binding_tables.c
> index ea82e71..3807301 100644
> --- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
> +++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
> @@ -44,6 +44,7 @@
>  #include "brw_state.h"
>  #include "intel_batchbuffer.h"
>  
> +static const int bt_size = 256 * sizeof(uint16_t);

Is this just an arbitrarily chosen size?

>  /**
>   * Upload a shader stage's binding table as indirect state.
>   *
> @@ -161,6 +162,72 @@ const struct brw_tracked_state brw_gs_binding_table = {
> .emit = brw_gs_upload_binding_table,
>  };
>  
> +/**
> + * Hardware-generated binding tables for the resource streamer
> + */
> +void
> +gen7_disable_hw_binding_tables(struct brw_context *brw)
> +{
> +   BEGIN_BATCH(3);
> +   OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC << 16 | (3 - 2));
> +   OUT_BATCH(3 << 5); /* only in HSW */

#defines for magic values, please.

> +   OUT_BATCH(0);
> +   ADVANCE_BATCH();
> +
> +   /* Pipe control workaround */
> +   BEGIN_BATCH(4);
> +   OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
> +   OUT_BATCH(PIPE_CONTROL_STATE_CACHE_INVALIDATE);
> +   OUT_BATCH(0); /* address */
> +   OUT_BATCH(0); /* write data */
> +   ADVANCE_BATCH();
> +}
> +
> +void
> +gen7_enable_hw_binding_tables(struct brw_context *brw)
> +{
> +   if (!brw->has_resource_streamer) {
> +  gen7_disable_hw_binding_tables(brw);
> +  return;
> +   }
> +
> +   if (!brw->hw_bt_pool.bo) {
> +  brw->hw_bt_pool.bo = drm_intel_bo_alloc(brw->bufmgr, "hw_bt",
> +  131072, 4096);

Perhaps use bt_size here, not a hardcoded 128kB?
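
Something along these lines, as a sketch only. 131072 happens to equal
256 * bt_size (bt_size being 256 * sizeof(uint16_t) = 512 bytes), so the pool
size could be derived from named constants. Whether "256 tables per pool" is
the actual intent is an assumption, and all of the HW_BT_* names and
hw_bt_pool_advance are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One hardware binding table: 256 entries of 16 bits each (from the patch). */
#define HW_BT_ENTRIES      256
#define HW_BT_SIZE         ((uint32_t)(HW_BT_ENTRIES * sizeof(uint16_t)))

/* Assumed table count; 256 reproduces the hardcoded 131072 bytes. */
#define HW_BT_POOL_TABLES  256
#define HW_BT_POOL_SIZE    ((uint32_t)(HW_BT_POOL_TABLES * HW_BT_SIZE))

/* Hypothetical helper: given the offset of the current table, return the
 * offset of the next one, or 0 if the pool has no room for another table.
 */
static uint32_t
hw_bt_pool_advance(uint32_t offset)
{
   uint32_t next = offset + HW_BT_SIZE;

   return (next + HW_BT_SIZE <= HW_BT_POOL_SIZE) ? next : 0;
}
```

Deriving the size this way would also make it obvious when next_offset is
about to run past the end of the buffer.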

> +  brw->hw_bt_pool.next_offset = bt_size;
> +   }
> +
> +   BEGIN_BATCH(3);
> +   OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC << 16 | (3 - 2));
> +   OUT_RELOC(brw->hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0,
> + HSW_BINDING_TABLE_ALLOC_OFFSET | GEN7_MOCS_L3 << 7);
> +   OUT_RELOC(brw->hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0,
> + brw->hw_bt_pool.bo->size);
> +   ADVANCE_BATCH();
> +
> +   /* Pipe control workaround */
> +   BEGIN_BATCH(4);
> +   OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
> +   OUT_BATCH(PIPE_CONTROL_STATE_CACHE_INVALIDATE);
> +   OUT_BATCH(0); /* address */
> +   OUT_BATCH(0); /* write data */
> +   ADVANCE_BATCH();

brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE);

> +}
> +
> +void
> +gen7_reset_rs_pool_offsets(struct brw_context *brw)
> +{
> +   brw->hw_bt_pool.next_offset = bt_size;
> +}
> +
> +const struct brw_tracked_state gen7_hw_binding_tables = {
> +   .dirty = {
> +  .mesa = 0,
> +  .brw = BRW_NEW_BATCH,
> +   },
> +   .emit = gen7_enable_hw_binding_tables
> +};
> +
>  /** @} */
>  
>  /**
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
> b/src/mesa/drivers/dri/i965/brw_context.c
> index 59f190b..b962103 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -852,6 +852,8 @@ brwCreateContext(gl_api api,
> if ((flags & __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS) != 0)
>ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_ROBUST_ACCESS_BIT_ARB;
>  
> +   brw->hw_bt_pool.bo = 0;

You don't need this - brw_context is rzalloc'd, so fields are already zeroed.

> +
> if (INTEL_DEBUG & DEBUG_SHADER_TIME)
>brw_init_shader_time(brw);
>  
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index dd8e730..17fea5b 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1334,6 +1334,11 @@ struct brw_context
>uint32_t fast_clear_op;
> } wm;
>  
> +   /* RS hardware binding table */
> +   struct {
> +  drm_intel_bo *bo;
> +  uint32_t next_offset;
> +   } hw_bt_pool;
>  
> struct {
>uint32_t state_offset;
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 28e398d..ba62811 100644

Re: [Mesa-dev] [PATCH 1/4] vbo: ignore primitive restart if FixedIndex is enabled in DrawArrays

2015-01-04 Thread Kenneth Graunke

On Sunday, January 04, 2015 10:44:17 PM Marek Olšák wrote:
> From: Marek Olšák 
> 
> From GL 4.4 Core profile:
> 
>   If both PRIMITIVE_RESTART and PRIMITIVE_RESTART_FIXED_INDEX are
>   enabled, the index value determined by PRIMITIVE_RESTART_FIXED_INDEX is
>   used. If PRIMITIVE_RESTART_FIXED_INDEX is enabled, primitive restart is not
>   performed for array elements transferred by any drawing command not taking a
>   type parameter, including all of the *Draw* commands other than *DrawEle-
>   ments*.
> 
> Cc: 10.2 10.3 10.4 
> ---
>  src/mesa/vbo/vbo_exec_array.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c
> index 6eac841..95193f2 100644
> --- a/src/mesa/vbo/vbo_exec_array.c
> +++ b/src/mesa/vbo/vbo_exec_array.c
> @@ -596,7 +596,8 @@ vbo_draw_arrays(struct gl_context *ctx, GLenum mode, 
> GLint start,
> prim[0].is_indirect = 0;
>  
> /* Implement the primitive restart index */
> -   if (ctx->Array.PrimitiveRestart && ctx->Array.RestartIndex < count) {
> +   if (ctx->Array.PrimitiveRestart && !ctx->Array.PrimitiveRestartFixedIndex 
> &&
> +   ctx->Array.RestartIndex < count) {
>GLuint primCount = 0;
>  
>if (ctx->Array.RestartIndex == start) {
> 

Patches 1-2 are:
Reviewed-by: Kenneth Graunke 


Re: [Mesa-dev] [PATCH 094/133] nir: Add a pass to lower global variables to local variables

2015-01-04 Thread Connor Abbott

On Tue, Dec 16, 2014 at 1:11 AM, Jason Ekstrand 
wrote:

> ---
>  src/glsl/Makefile.sources |   1 +
>  src/glsl/nir/nir.h|   2 +
>  src/glsl/nir/nir_lower_global_vars_to_local.c | 107
> ++
>  3 files changed, 110 insertions(+)
>  create mode 100644 src/glsl/nir/nir_lower_global_vars_to_local.c
>
> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> index 53c3e98..3f5c0bd 100644
> --- a/src/glsl/Makefile.sources
> +++ b/src/glsl/Makefile.sources
> @@ -22,6 +22,7 @@ NIR_FILES = \
> $(GLSL_SRCDIR)/nir/nir_intrinsics.h \
> $(GLSL_SRCDIR)/nir/nir_live_variables.c \
> $(GLSL_SRCDIR)/nir/nir_lower_atomics.c \
> +   $(GLSL_SRCDIR)/nir/nir_lower_global_vars_to_local.c \
> $(GLSL_SRCDIR)/nir/nir_lower_locals_to_regs.c \
> $(GLSL_SRCDIR)/nir/nir_lower_io.c \
> $(GLSL_SRCDIR)/nir/nir_lower_samplers.cpp \
> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> index ec9ce07..30146d6 100644
> --- a/src/glsl/nir/nir.h
> +++ b/src/glsl/nir/nir.h
> @@ -1358,6 +1358,8 @@ void nir_dump_cfg(nir_shader *shader, FILE *fp);
>
>  void nir_split_var_copies(nir_shader *shader);
>
> +void nir_lower_global_vars_to_local(nir_shader *shader);
> +
>  void nir_lower_locals_to_regs(nir_shader *shader);
>
>  void nir_lower_io(nir_shader *shader);
> diff --git a/src/glsl/nir/nir_lower_global_vars_to_local.c
> b/src/glsl/nir/nir_lower_global_vars_to_local.c
> new file mode 100644
> index 000..ec23a0a
> --- /dev/null
> +++ b/src/glsl/nir/nir_lower_global_vars_to_local.c
> @@ -0,0 +1,107 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the
> "Software"),
> + * to deal in the Software without restriction, including without
> limitation
> + * the rights to use, copy, modify, merge, publish, distribute,
> sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the
> next
> + * paragraph) shall be included in all copies or substantial portions of
> the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
> SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + *Jason Ekstrand (ja...@jlekstrand.net)
> + *
> + */
> +
> +/*
> + * This lowering pass detects when a global variable is only being used by
> + * one function and makes it local to that function
> + */
> +
> +#include "nir.h"
> +
> +struct global_to_local_state {
> +   nir_function_impl *impl;
> +   /* A hash table keyed on variable pointers that stores the unique
> +* nir_function_impl that uses the given variable.  If a variable is
> +* used in multiple functions, the data for the given key will be NULL.
> +*/
> +   struct hash_table *var_func_table;
> +};
> +
> +static bool
> +mark_global_var_uses_block(nir_block *block, void *void_state)
> +{
> +   struct global_to_local_state *state = void_state;
> +
> +   nir_foreach_instr(block, instr) {
> +  if (instr->type != nir_instr_type_intrinsic)
> + continue;
> +
> +  nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
> +  unsigned num_vars =
> nir_intrinsic_infos[intrin->intrinsic].num_variables;
> +
> +  for (unsigned i = 0; i < num_vars; i++) {
> + nir_variable *var = intrin->variables[i]->var;
> + if (var->data.mode != nir_var_global)
> +continue;
> +
> + uint32_t hash = _mesa_hash_pointer(var);
> + struct hash_entry *entry =
> +_mesa_hash_table_search(state->var_func_table, hash, var);
> +
> + if (entry) {
> +if (entry->data != state->impl)
> +   entry->data = NULL;
> + }
> +
> + _mesa_hash_table_insert(state->var_func_table, hash, var,
> state->impl);
>

I think you should be doing:

if (entry) {
   ...
} else {
   _mesa_hash_table_insert(state->var_func_table, hash, var, state->impl);
}
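
The point of the suggested else branch, as far as it can be read from the quoted hunk, is that an unconditional insert would overwrite the NULL marker that was just stored for a variable used by two functions. A minimal sketch of the intended bookkeeping follows. It assumes a tiny linear table instead of Mesa's hash table, and every name in it (owner_table, mark_var_use, and so on) is hypothetical:

```c
#include <stddef.h>

/* Hypothetical stand-in for the pass's hash table: maps a variable pointer
 * to the single function that uses it, or to NULL once a second user has
 * been seen. */
#define MAX_VARS 16

struct owner_entry {
   const void *var;
   const void *impl;   /* NULL means "used by more than one function" */
};

struct owner_table {
   struct owner_entry entries[MAX_VARS];
   size_t count;
};

static struct owner_entry *
owner_table_search(struct owner_table *t, const void *var)
{
   for (size_t i = 0; i < t->count; i++) {
      if (t->entries[i].var == var)
         return &t->entries[i];
   }
   return NULL;
}

/* Record that `impl` uses `var`.  Returns 0 on success, -1 if the table
 * is full. */
static int
mark_var_use(struct owner_table *t, const void *var, const void *impl)
{
   struct owner_entry *entry = owner_table_search(t, var);

   if (entry) {
      /* A second, different user demotes the variable to "shared". */
      if (entry->impl != impl)
         entry->impl = NULL;
   } else {
      /* Only insert when the variable has not been seen before, so an
       * existing NULL marker is never replaced. */
      if (t->count == MAX_VARS)
         return -1;
      t->entries[t->count].var = var;
      t->entries[t->count].impl = impl;
      t->count++;
   }
   return 0;
}
```

With this shape, a variable first seen in one function and later in another stays marked as shared no matter how many further uses are visited.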


> +  }
> +   }
> +
> +   return true;
> +}
> +
> +void
> +nir_lower_global_vars_to_local(nir_shader *shader)
> +{
> +   struct global_to_local_state state;
> +
> +   state.var_func_table = _mesa_hash_table_create(NULL,
> +
> _mesa_key_pointer_equal);
> +
> +   nir_foreach_overload(shader, overload) {
> +  if (overload->impl) {
> + state.impl = overload->impl;
> + ni

Re: [Mesa-dev] [PATCH 1/2] gallium/st: Clean up Haiku depth mapping, fix colorspace errors

2015-01-04 Thread kallisti5


On 2014-12-27 11:41, Ilia Mirkin wrote:

On Sat, Dec 27, 2014 at 1:13 AM, Alexander von Gluck IV
 wrote:

---
 src/gallium/state_trackers/hgl/hgl.c |   48 
+

 1 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/src/gallium/state_trackers/hgl/hgl.c 
b/src/gallium/state_trackers/hgl/hgl.c

index 4d7c479..0b30290 100644
--- a/src/gallium/state_trackers/hgl/hgl.c
+++ b/src/gallium/state_trackers/hgl/hgl.c
@@ -232,9 +232,10 @@ hgl_create_st_visual(ulong options)
const GLboolean alphaFlag   = ((options & BGL_ALPHA) == 
BGL_ALPHA);
const GLboolean dblFlag = ((options & BGL_DOUBLE) == 
BGL_DOUBLE);

const GLboolean stereoFlag  = false;
-   const GLint depth   = (options & BGL_DEPTH) ? 24 : 0;
-   const GLint stencil = (options & BGL_STENCIL) ? 8 : 0;
-   const GLint accum   = (options & BGL_ACCUM) ? 16 : 0;
+   const GLboolean depthFlag   = ((options & BGL_DEPTH) == 
BGL_DEPTH);
+   const GLboolean stencilFlag = ((options & BGL_STENCIL) == 
BGL_STENCIL);
+   const GLboolean accumFlag   = ((options & BGL_ACCUM) == 
BGL_ACCUM);

+
const GLint red = rgbFlag ? 8 : 5;
const GLint green   = rgbFlag ? 8 : 5;
const GLint blue= rgbFlag ? 8 : 5;
@@ -244,9 +245,9 @@ hgl_create_st_visual(ulong options)
TRACE("alpha:\t%d\n", (bool)alphaFlag);
TRACE("dbl  :\t%d\n", (bool)dblFlag);
TRACE("stereo   :\t%d\n", (bool)stereoFlag);
-   TRACE("depth:\t%d\n", depth);
-   TRACE("stencil  :\t%d\n", stencil);
-   TRACE("accum:\t%d\n", accum);
+   TRACE("depth:\t%d\n", (bool)depthFlag);
+   TRACE("stencil  :\t%d\n", (bool)stencilFlag);
+   TRACE("accum:\t%d\n", (bool)accumFlag);
TRACE("red  :\t%d\n", red);
TRACE("green:\t%d\n", green);
TRACE("blue :\t%d\n", blue);
@@ -254,34 +255,23 @@ hgl_create_st_visual(ulong options)

// Determine color format
if (red == 8) {
+   // Color format
if (alpha == 8)
-   visual->color_format = 
PIPE_FORMAT_A8R8G8B8_UNORM;
+   visual->color_format = 
PIPE_FORMAT_B8G8R8A8_UNORM;

else
-   visual->color_format = 
PIPE_FORMAT_X8R8G8B8_UNORM;
+   visual->color_format = 
PIPE_FORMAT_B8G8R8X8_UNORM;

+
+   // Depth buffer
+   if (depthFlag)
+   visual->depth_stencil_format = 
PIPE_FORMAT_Z32_UNORM;


I guess you only work with llvmpipe which supports whatever, but I
don't think a lot of hw drivers support Z32_UNORM. Z24 is much more
common. Some hardware also supports Z16 and Z32_FLOAT (and
Z32_FLOAT_S8X24_UNORM for depth/stencil combined version).


Thanks :-).  If Z24 is a lot more common I'll go with that one for now.

I'm interested to see if we could figure out something with dri3 and C++ in Haiku
for hardware support, but that seems pretty far off.


Further you appear to have dropped the stencil format here entirely.
If that's expected, perhaps get rid of the stencilFlag above?


Yeah, I mostly left it there in case we wanted to set something based on it
in the future; BGL_STENCIL may not be obvious to non-Haiku people :-)


Here is the list of flags our OpenGL Kit accepts for new GL contexts:

BGL_RGB      Use RGB graphics instead of indexed color (8-bit). This is the
             default if neither BGL_RGB nor BGL_INDEX is specified.
BGL_INDEX    Use indexed color (8-bit graphics). Not currently supported.
BGL_SINGLE   Use single-buffering; all rendering is done directly to the
             display. This is not currently supported by the BeOS
             implementation of OpenGL. This is the default.
BGL_DOUBLE   Use double-buffered graphics. All rendering is done to an
             off-screen buffer and only becomes visible when the
             SwapBuffers() function is called.
BGL_ACCUM    Requests that the view have an accumulation buffer.
BGL_ALPHA    Requests that the view's color buffer include an alpha
             component.
BGL_DEPTH    Requests that the view have a depth buffer.
BGL_STENCIL  Requests that the view have a stencil buffer.
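
The review comments above (prefer Z24, and do not silently drop stencil) could be sketched roughly as follows. This is only an illustration: the EX_BGL_* bit values are placeholders rather than the real constants from Haiku's GLView.h, and the enum is a local stand-in named after gallium depth/stencil formats, not the real PIPE_FORMAT_* definitions.

```c
#include <stdbool.h>

/* Placeholder flag bits for illustration only; the real BGL_* values come
 * from Haiku's GLView.h and may differ. */
enum {
   EX_BGL_DEPTH   = 1 << 0,
   EX_BGL_STENCIL = 1 << 1
};

/* Local stand-ins named after gallium depth/stencil formats. */
enum ex_zs_format {
   EX_FORMAT_NONE,
   EX_FORMAT_Z24X8_UNORM,
   EX_FORMAT_Z24_UNORM_S8_UINT
};

/* Pick a depth/stencil format from the requested option bits.  A 24-bit
 * depth format is used instead of Z32_UNORM, and a stencil request selects
 * the packed depth/stencil variant. */
static enum ex_zs_format
ex_choose_zs_format(unsigned long options)
{
   const bool depth = (options & EX_BGL_DEPTH) != 0;
   const bool stencil = (options & EX_BGL_STENCIL) != 0;

   if (stencil)
      return EX_FORMAT_Z24_UNORM_S8_UINT;
   if (depth)
      return EX_FORMAT_Z24X8_UNORM;
   return EX_FORMAT_NONE;
}
```

A real state tracker would presumably also ask the driver which of these formats it supports before committing to one; that query is left out of the sketch.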

Right now I'm trying to get rid of the last of the _mesa_* calls from the
gallium target as they're cheating a bit.

 -- Alex

Re: [Mesa-dev] [PATCH] swrast: Build fix for darwin

2015-01-04 Thread kallisti5


On 2015-01-03 10:00, Emil Velikov wrote:

On 02/01/15 04:14, Jeremy Huddleston Sequoia wrote:
This is certainly not the best solution to the problem, so I'm just 
sending this patch to the list to get the discussion started on the 
best way to solve this problem.  Currently, any platform that does not 
support _SC_PHYS_PAGES and _SC_PAGESIZE will fail to build swrast.  
_SC_PHYS_PAGES is not POSIX and thus there are many platforms out 
there that don't support it (such as OS X).



With the indentation fixed and a guard around the new includes this
should be safe to go as is. Pretty sure the latter don't exist on all of
mesa's supported platforms. Cc'ing Alexander (Haiku).
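
For readers following along, the kind of guard being discussed might look like the sketch below. It is not the patch under review: the function name is hypothetical, and the Darwin branch using the "hw.memsize" sysctl is an assumption about one possible fallback, not something taken from the thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Returns the amount of physical memory in bytes, or 0 if it cannot be
 * determined on this platform. */
static uint64_t
ex_physical_memory_bytes(void)
{
#if defined(_SC_PHYS_PAGES) && defined(_SC_PAGESIZE)
   /* Both names are available, so the sysconf() path can be compiled. */
   const long pages = sysconf(_SC_PHYS_PAGES);
   const long page_size = sysconf(_SC_PAGESIZE);

   if (pages > 0 && page_size > 0)
      return (uint64_t)pages * (uint64_t)page_size;
   return 0;
#elif defined(__APPLE__)
   /* Assumed fallback for OS X: query the hw.memsize sysctl. */
   uint64_t mem = 0;
   size_t len = sizeof(mem);

   if (sysctlbyname("hw.memsize", &mem, &len, NULL, 0) == 0)
      return mem;
   return 0;
#else
   /* Unknown platform: let the caller pick a conservative default. */
   return 0;
#endif
}
```

Returning 0 for "unknown" keeps the build working everywhere and leaves the choice of default to the caller.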



Actually, Haiku is good to go here :-)

~> grep -R "_SC_PHYS_PAGES" /boot/system/develop/headers/
/boot/system/develop/headers/posix/unistd.h:#define _SC_PHYS_PAGES	40

~> grep -R "_SC_PAGESIZE" /boot/system/develop/headers/
/boot/system/develop/headers/posix/unistd.h:#define _SC_PAGESIZE			_SC_PAGE_SIZE

~> grep -R "sysconf" /boot/system/develop/headers/
.
.
/boot/system/develop/headers/posix/unistd.h:/* sysconf() constants */
/boot/system/develop/headers/posix/unistd.h:extern long		sysconf(int name);
/boot/system/develop/headers/private/libroot/unistd_private.h:long	__sysconf_beos(int name);
/boot/system/develop/headers/private/libroot/unistd_private.h:long	__sysconf(int name);
.
.

Thanks!

 -- Alex

Re: [Mesa-dev] [PATCH 091/133] nir: Add a pass to lower local variable accesses to SSA values

2015-01-04 Thread Connor Abbott

Ok, I'm going to try reviewing this again. I'm pasting the latest version
of the file from review/nir-v1 and replying to that so that I won't get
confused between all the various changes and reorganizing things. Here we
go!

> /*
>  * Copyright © 2014 Intel Corporation
>  *
>  * Permission is hereby granted, free of charge, to any person obtaining a
>  * copy of this software and associated documentation files (the
"Software"),
>  * to deal in the Software without restriction, including without
limitation
>  * the rights to use, copy, modify, merge, publish, distribute,
sublicense,
>  * and/or sell copies of the Software, and to permit persons to whom the
>  * Software is furnished to do so, subject to the following conditions:
>  *
>  * The above copyright notice and this permission notice (including the
next
>  * paragraph) shall be included in all copies or substantial portions of
the
>  * Software.
>  *
>  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR
>  * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY,
>  * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
SHALL
>  * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER
>  * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>  * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS
>  * IN THE SOFTWARE.
>  *
>  * Authors:
>  *Jason Ekstrand (ja...@jlekstrand.net)
>  *
>  */
>
> #include "nir.h"
>
> struct deref_node {
>struct deref_node *parent;
>const struct glsl_type *type;
>
>bool lower_to_ssa;
>
>struct set *loads;
>struct set *stores;
>struct set *copies;
>
>nir_ssa_def **def_stack;
>nir_ssa_def **def_stack_tail;
>
>struct deref_node *wildcard;
>struct deref_node *indirect;
>struct deref_node *children[0];
> };
>
> struct lower_variables_state {
>void *mem_ctx;
>void *dead_ctx;
>nir_function_impl *impl;
>
>/* A hash table mapping variables to deref_node data */
>struct hash_table *deref_var_nodes;
>
>/* A hash table mapping dereference leaves to deref_node data.  A deref
> * is considered a leaf if it is fully-qualified (no wildcards) and
> * direct.  In short, these are the derefs we can actually consider
> * lowering to SSA values.
> */

I was misled by the "leaf" name the first time I reviewed this; having a
comment explaining what it does helps, but I still think that it's a pretty
misleading name. "Leaf," at least to me, implies that it's a leaf of the
dereference tree, which in this case isn't true unless I'm missing
something here. I think "direct" is the right term here. You already say
that dereference leaves are "fully qualified (no wild cards) and direct" --
I would just call something potentially with wildcards + direct references
"not indirect," and then what you're calling "leaf" becomes just "direct."

Also, it was only after reading through the entire thing and thinking about
it that I realized why we have this hash table -- it's because we can only
figure out if direct dereferences alias indirect dereferences, so we only
consider lowering them. Of course, we may have to expand some wildcard
copies since they're copying the direct dereference we're trying to lower
to SSA, but otherwise we don't care about them. Idk if you want to explain
this here.

>struct hash_table *deref_leaves;
>
>/* A hash table mapping phi nodes to deref_state data */
>struct hash_table *phi_table;
> };
>
> /* The following two functions implement a hash and equality check for
>  * variable dreferences.  When the hash or equality function encounters an
>  * array, all indirects are treated as equal and are never equal to a
>  * direct dereference or a wildcard.
>  *
>  * Some of the magic numbers here were taken from _mesa_hash_data and one
>  * was just a big prime I found on the internet.
>  */
> static uint32_t
> hash_deref(const void *void_deref)
> {
>const nir_deref *deref = void_deref;
>
>uint32_t hash;
>if (deref->child) {
>   hash = hash_deref(deref->child);
>} else {
>   hash = 2166136261ul;
>}
>
>switch (deref->deref_type) {
>case nir_deref_type_var:
>   hash ^= _mesa_hash_pointer(nir_deref_as_var(deref)->var);
>   break;
>case nir_deref_type_array: {
>   nir_deref_array *array = nir_deref_as_array(deref);
>   hash += 268435183 * array->deref_array_type;
>   if (array->deref_array_type == nir_deref_array_type_direct)
>  hash ^= array->base_offset; /* Some prime */
>   break;
>}
>case nir_deref_type_struct:
>   hash ^= nir_deref_as_struct(deref)->index;
>   break;
>}
>
>return hash * 0x01000193;
> }

I know I complained about this already but... could we just use
_mesa_hash_data directly here, or at least properly implement FNV-1a here?
It should be the same or less code either way (FNV-1a is really, really
easy), and
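
For reference, a plain 32-bit FNV-1a over a byte buffer is only a few lines. The sketch below is a generic illustration with a hypothetical function name; it is not the code from the series and not Mesa's _mesa_hash_data. The two constants are the standard 32-bit FNV offset basis and prime, the same values that appear in the quoted hash_deref.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: for each byte, XOR it into the hash, then multiply by
 * the FNV prime. */
static uint32_t
ex_fnv1a_32(const void *data, size_t size)
{
   const uint8_t *bytes = data;
   uint32_t hash = 2166136261u;      /* FNV offset basis (32-bit) */

   for (size_t i = 0; i < size; i++) {
      hash ^= bytes[i];
      hash *= 0x01000193u;            /* FNV prime (32-bit) */
   }
   return hash;
}
```

The XOR-then-multiply order is what distinguishes FNV-1a from FNV-1, which multiplies first.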

Re: [Mesa-dev] [PATCH 3/3] glx/dri3: Request non-vsynced Present for swapinterval zero.

2015-01-04 Thread Aaron Plattner


On 12/16/2014 11:22 PM, Mathias Fröhlich wrote:


Hi,

On Tuesday, December 16, 2014 19:30:04 Mario Kleiner wrote:

Hmm. For benchmarking i think i'd consider that a mild form of cheating.
You get higher fps because you skip processing like the whole gpu blit
overhead and host processing overhead for queuing / validating /
processing the copy command in the command stream, so the benchmark
numbers don't translate very well anymore in how the system would behave
in a non-benchmark situation?

Agree, sounds somehow like cheating.
Nevertheless, I think I have observed the nvidia blob doing the same and
probably even more under some circumstance. I could get framerates


Really?  Do you have a link to a report of this, or some description of 
how this happened?  I could see it happen if you're running a composite 
manager, but that's only because the application can render to its 
redirected window as fast as it wants, but it's up to the compositor to 
sample those frames at its own pace.  If that's the scenario you're 
thinking of, then it's just a side effect of the non-Present compositor 
design.



there that I could not explain by issuing the same draw as before with
the buffer swap only being scheduled every n frames.
So, if you think about this, it sounds like cheating even more.
There is not only the savings that you mentioned with the swap not happening.
That's what the test application in this case already did to trigger the effect.
Looking at this I thought that if a buffer swap is scheduled, even
the draws that are not yet executed on the gpu and belonging to previous swaps
that are now hidden by the now scheduled swap are skipped.
Sure, if doing this you need to be careful if something else that must be
measurable by the application like reading back buffer data happens in between.
I have never tried to drive a verification of this interpretation to the
end, so I may be wrong.
But I always thought that this sounds like the main reason why some gl drivers can draw
so much more gears a second than others can - just don't draw the ones that are already
proven not to be observable anymore by the application or user.
So, yes, this is kind of cheating. ... probably not the only 'cheat' for a GL
driver to be fast in a lot of 'benchmarks'.

Greetings

Mathias



Re: [Mesa-dev] [PATCH 1/2] gallium/st: Clean up Haiku depth mapping, fix colorspace errors

2015-01-04 Thread kallisti5


On 2014-12-29 16:55, Roland Scheidegger wrote:

Am 27.12.2014 um 18:41 schrieb Ilia Mirkin:

On Sat, Dec 27, 2014 at 1:13 AM, Alexander von Gluck IV
 wrote:

---
 src/gallium/state_trackers/hgl/hgl.c |   48 
+

 1 files changed, 19 insertions(+), 29 deletions(-)

@@ -244,9 +245,9 @@ hgl_create_st_visual(ulong options)
TRACE("alpha:\t%d\n", (bool)alphaFlag);
TRACE("dbl  :\t%d\n", (bool)dblFlag);
TRACE("stereo   :\t%d\n", (bool)stereoFlag);
-   TRACE("depth:\t%d\n", depth);
-   TRACE("stencil  :\t%d\n", stencil);
-   TRACE("accum:\t%d\n", accum);
+   TRACE("depth:\t%d\n", (bool)depthFlag);
+   TRACE("stencil  :\t%d\n", (bool)stencilFlag);
+   TRACE("accum:\t%d\n", (bool)accumFlag);
TRACE("red  :\t%d\n", red);
TRACE("green:\t%d\n", green);
TRACE("blue :\t%d\n", blue);
@@ -254,34 +255,23 @@ hgl_create_st_visual(ulong options)

// Determine color format
if (red == 8) {
+   // Color format
if (alpha == 8)
-   visual->color_format = 
PIPE_FORMAT_A8R8G8B8_UNORM;
+   visual->color_format = 
PIPE_FORMAT_B8G8R8A8_UNORM;

else
-   visual->color_format = 
PIPE_FORMAT_X8R8G8B8_UNORM;
+   visual->color_format = 
PIPE_FORMAT_B8G8R8X8_UNORM;

+
+   // Depth buffer
+   if (depthFlag)
+   visual->depth_stencil_format = 
PIPE_FORMAT_Z32_UNORM;


I guess you only work with llvmpipe which supports whatever, but I
don't think a lot of hw drivers support Z32_UNORM. Z24 is much more
common. Some hardware also supports Z16 and Z32_FLOAT (and
Z32_FLOAT_S8X24_UNORM for depth/stencil combined version).
FWIW llvmpipe (and softpipe) do not really support z32_unorm either, it should
never be used. (The reason is just like hw most things are done with
floats, so you've got only 24bit mantissa bits to work with really). So,
while it may work, the precision is probably not what you expected and
if you rely on some specific accuracy (for instance for depth offsets)
the results may be somewhat bogus. We actually wanted to drop the format
entirely at some point, could still do it at some point. Similar things
can be said about the other 32bit snorm/unorm formats though IIRC these
are sort of necessary for supporting gl vertex attribs which can be
32bit normalized, but they should not be needed as render target /
texture support.
So indeed z24 (with or without stencil), z16_unorm if that's good enough
or z32_float should be used.
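
The precision argument above can be checked with a small, self-contained experiment. The helper below is hypothetical and assumes IEEE 754 single precision for float; it reports whether a 32-bit integer depth code survives conversion to a 32-bit float unchanged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the 32-bit value z is exactly representable as a 32-bit
 * float.  The volatile store forces rounding to float precision, and the
 * comparison is done in double, which holds every uint32_t exactly. */
static bool
ex_survives_float32(uint32_t z)
{
   volatile float f = (float)z;

   return (double)f == (double)z;
}
```

Every value up to 2^24 passes this check, while larger values pass only if their low bits happen to be zero, which is why a pipeline that computes depth in floats cannot distinguish all 32-bit UNORM codes.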



I actually just rewrote pretty much all of our state_tracker code. I was cheating
in a few places and using _mesa_* calls vs using stamps :-)

The patch can be found here:
http://unixzen.com/patchwork/0001-gallium-state_tracker-Rewrite-Haiku-s-state-tracker.patch

This isn't final, so i'm not attaching it to the ML just yet.

My remaining issue is when I have an OpenGL window open, and resize it, the
rendered item remains in the bottom left corner of the viewport at the original size.

This didn't happen pre this patch. (attached)

I can say that:
  * The winsys bitmap resizes with the window
  * The state_tracker frame buffer gets destroyed and recreated at the proper size.


So no idea what i'm missing :-)

Feedback very, very, welcome.

 -- Alex

[Mesa-dev] [Bug 86837] kodi segfault since auxiliary/vl: rework the build of the VL code

2015-01-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=86837

--- Comment #18 from bgunte...@gmail.com ---
(In reply to bgunteriv from comment #17)
> (In reply to Andy Furniss from comment #13)
> > (In reply to Emil Velikov from comment #12)
> > > Seems like Christian dropped the link with the tentative fix.
> > > http://patchwork.freedesktop.org/patch/39400/
> > > 
> > > Guys can you test this please ?
> > 
> > Works OK for me testing with Kodi.
> 
> I spoke too soon.
> When trying to build for Kodi, my display works, but my videos do not play.
> 
> @Andy Furniss, what is your command line for building mesa?
> I'm using this -- ./autogen.sh --prefix=/opt/xorg
> --with-gallium-drivers=r600 --with-dri-drivers=radeon --enable-glx-tls
> 
> 
> am I missing something?

okay, realizing that maybe I don't have the patches included previous to this
for Kodi.


Re: [Mesa-dev] [PATCH] i965: Implement WaCsStallAtEveryFourthPipecontrol on IVB/BYT.

2015-01-04 Thread Kenneth Graunke

On Sunday, January 04, 2015 12:03:01 PM Ben Widawsky wrote:
> On Wed, Nov 12, 2014 at 11:17:55AM -0800, Kenneth Graunke wrote:
> > According to the documentation, we need to do a CS stall on every fourth
> > PIPE_CONTROL command to avoid GPU hangs.  The kernel does a CS stall
> > between batches, so we only need to count the PIPE_CONTROLs in our batches.
> > 
> > v2: Get the generation check right (caught by Chris Wilson),
> > combine the ++ with the check (suggested by Daniel Vetter).
> > 
> > Signed-off-by: Kenneth Graunke 
> > Reviewed-by: Daniel Vetter 
> 
> Ken, did you want to push this patch?
> 
> [snip]

Yup, thanks.  There was some debate about whether IVB, BYT, or HSW needed it.
We found pretty convincing documentation that IVB needs it but HSW does not.

We also found documentation saying "VLV" needs it (but not VLVT?), but with
all the renaming of abbreviations, I'm inclined to assume the worst and apply
the workaround.
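
As a rough illustration of the workaround described in the quoted commit message, the counting could be sketched like this. The struct and function names are hypothetical and are not the i965 driver's real ones; the sketch only shows the "combine the ++ with the check" idea and the per-batch reset that the kernel's own stall between batches makes possible.

```c
#include <stdbool.h>

struct ex_batch {
   bool needs_cs_stall_workaround;      /* set for the affected hardware */
   unsigned pipe_controls_since_stall;  /* PIPE_CONTROLs since last stall */
};

/* Call once for each PIPE_CONTROL about to be emitted.  Returns true when
 * this one should carry a CS stall, which is every fourth call. */
static bool
ex_pipe_control_needs_stall(struct ex_batch *batch)
{
   if (!batch->needs_cs_stall_workaround)
      return false;

   /* Increment and test in one step. */
   if (++batch->pipe_controls_since_stall == 4) {
      batch->pipe_controls_since_stall = 0;
      return true;
   }
   return false;
}

/* Call when a new batch starts; the stall between batches means the count
 * can begin again from zero. */
static void
ex_batch_reset(struct ex_batch *batch)
{
   batch->pipe_controls_since_stall = 0;
}
```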

Pushed:

To ssh://git.freedesktop.org/git/mesa/mesa
   3793a1b..d41cf9f  master -> master


[Mesa-dev] [PATCH 2/3] gallium: Plumb the swap INVALIDATE_ANCILLARY flag through more layers.

2015-01-04 Thread Eric Anholt

v2: Instead of telling the driver that the window system ancillaries have
been invalidated (when the driver doesn't know which of its buffers
are the window system's!), introduce a method for invalidating
specific surfaces.
---
 src/gallium/include/pipe/p_context.h  | 11 +++
 src/gallium/state_trackers/dri/dri_drawable.c |  6 ++
 2 files changed, 17 insertions(+)

diff --git a/src/gallium/include/pipe/p_context.h 
b/src/gallium/include/pipe/p_context.h
index af5674f..a4cae8e 100644
--- a/src/gallium/include/pipe/p_context.h
+++ b/src/gallium/include/pipe/p_context.h
@@ -551,6 +551,17 @@ struct pipe_context {
 */
void (*flush_resource)(struct pipe_context *ctx,
   struct pipe_resource *resource);
+
+   /**
+* Invalidate the contents of the resource.
+*
+* This is used to implement EGL's semantic of undefined depth/stencil
+* contenst after a swapbuffers.  This allows a tiled renderer (for
+* example) to not store the depth buffer.
+*/
+   void (*invalidate_resource)(struct pipe_context *ctx,
+   struct pipe_resource *resource);
+
 };
 
 
diff --git a/src/gallium/state_trackers/dri/dri_drawable.c 
b/src/gallium/state_trackers/dri/dri_drawable.c
index b7df053..eda2d52 100644
--- a/src/gallium/state_trackers/dri/dri_drawable.c
+++ b/src/gallium/state_trackers/dri/dri_drawable.c
@@ -484,6 +484,12 @@ dri_flush(__DRIcontext *cPriv,
   }
 
   pipe->flush_resource(pipe, drawable->textures[ST_ATTACHMENT_BACK_LEFT]);
+
+  if (pipe->invalidate_resource &&
+  (flags & __DRI2_FLUSH_INVALIDATE_ANCILLARY)) {
+ pipe->invalidate_resource(pipe, 
drawable->textures[ST_ATTACHMENT_DEPTH_STENCIL]);
+ pipe->invalidate_resource(pipe, 
drawable->msaa_textures[ST_ATTACHMENT_DEPTH_STENCIL]);
+  }
}
 
flush_flags = 0;
-- 
2.1.3


[Mesa-dev] [PATCH 1/3] egl: Inform the client API when ancillary buffers may become undefined.

2015-01-04 Thread Eric Anholt

This is part of the EGL spec, and is useful for a tiled renderer to avoid
the memory bandwidth cost of storing the depth/stencil buffers.
---
 include/GL/internal/dri_interface.h |  1 +
 src/egl/drivers/dri2/egl_dri2.c | 36 +
 src/egl/drivers/dri2/egl_dri2.h |  3 +++
 src/egl/drivers/dri2/platform_android.c |  2 +-
 src/egl/drivers/dri2/platform_drm.c |  2 +-
 src/egl/drivers/dri2/platform_wayland.c | 12 +--
 src/egl/drivers/dri2/platform_x11.c |  3 +--
 7 files changed, 44 insertions(+), 15 deletions(-)

diff --git a/include/GL/internal/dri_interface.h 
b/include/GL/internal/dri_interface.h
index 8c5ceb9..1d670b1 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -279,6 +279,7 @@ struct __DRItexBufferExtensionRec {
 
 #define __DRI2_FLUSH_DRAWABLE (1 << 0) /* the drawable should be flushed. */
 #define __DRI2_FLUSH_CONTEXT  (1 << 1) /* glFlush should be called */
+#define __DRI2_FLUSH_INVALIDATE_ANCILLARY (1 << 2)
 
 enum __DRI2throttleReason {
__DRI2_THROTTLE_SWAPBUFFER,
diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index 2a6811c..86e5f24 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -1087,6 +1087,42 @@ dri2_swap_interval(_EGLDriver *drv, _EGLDisplay *dpy, 
_EGLSurface *surf,
return dri2_dpy->vtbl->swap_interval(drv, dpy, surf, interval);
 }
 
+/**
+ * Asks the client API to flush any rendering to the drawable so that we can
+ * do our swapbuffers.
+ */
+void
+dri2_flush_drawable_for_swapbuffers(_EGLDisplay *disp, _EGLSurface *draw)
+{
+   struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
+   struct dri2_egl_surface *dri2_surf = dri2_egl_surface(draw);
+
+   if (dri2_dpy->flush) {
+  if (dri2_dpy->flush->base.version >= 4) {
+ /* We know there's a current context because:
+  *
+  * "If surface is not bound to the calling thread’s current
+  *  context, an EGL_BAD_SURFACE error is generated."
+ */
+ _EGLContext *ctx = _eglGetCurrentContext();
+ struct dri2_egl_context *dri2_ctx = dri2_egl_context(ctx);
+
+ /* From the EGL 1.4 spec (page 52):
+  *
+  * "The contents of ancillary buffers are always undefined
+  *  after calling eglSwapBuffers."
+  */
+ dri2_dpy->flush->flush_with_flags(dri2_ctx->dri_context,
+   dri2_surf->dri_drawable,
+   __DRI2_FLUSH_DRAWABLE |
+   __DRI2_FLUSH_INVALIDATE_ANCILLARY,
+   __DRI2_THROTTLE_SWAPBUFFER);
+  } else {
+ dri2_dpy->flush->flush(dri2_surf->dri_drawable);
+  }
+   }
+}
+
 static EGLBoolean
 dri2_swap_buffers(_EGLDriver *drv, _EGLDisplay *dpy, _EGLSurface *surf)
 {
diff --git a/src/egl/drivers/dri2/egl_dri2.h b/src/egl/drivers/dri2/egl_dri2.h
index 52f05fb..9efe1f7 100644
--- a/src/egl/drivers/dri2/egl_dri2.h
+++ b/src/egl/drivers/dri2/egl_dri2.h
@@ -332,4 +332,7 @@ dri2_initialize_wayland(_EGLDriver *drv, _EGLDisplay *disp);
 EGLBoolean
 dri2_initialize_android(_EGLDriver *drv, _EGLDisplay *disp);
 
+void
+dri2_flush_drawable_for_swapbuffers(_EGLDisplay *disp, _EGLSurface *draw);
+
 #endif /* EGL_DRI2_INCLUDED */
diff --git a/src/egl/drivers/dri2/platform_android.c 
b/src/egl/drivers/dri2/platform_android.c
index 61a99ba..f482526 100644
--- a/src/egl/drivers/dri2/platform_android.c
+++ b/src/egl/drivers/dri2/platform_android.c
@@ -311,7 +311,7 @@ droid_swap_buffers(_EGLDriver *drv, _EGLDisplay *disp, 
_EGLSurface *draw)
  dri2_drv->glFlush();
}
 
-   (*dri2_dpy->flush->flush)(dri2_surf->dri_drawable);
+   dri2_flush_drawable_for_swapbuffers(disp, draw);
 
if (dri2_surf->buffer)
   droid_window_enqueue_buffer(dri2_surf);
diff --git a/src/egl/drivers/dri2/platform_drm.c 
b/src/egl/drivers/dri2/platform_drm.c
index 753c60f..02e87f7 100644
--- a/src/egl/drivers/dri2/platform_drm.c
+++ b/src/egl/drivers/dri2/platform_drm.c
@@ -431,7 +431,7 @@ dri2_drm_swap_buffers(_EGLDriver *drv, _EGLDisplay *disp, 
_EGLSurface *draw)
  dri2_surf->back = NULL;
   }
 
-  (*dri2_dpy->flush->flush)(dri2_surf->dri_drawable);
+  dri2_flush_drawable_for_swapbuffers(disp, draw);
   (*dri2_dpy->flush->invalidate)(dri2_surf->dri_drawable);
}
 
diff --git a/src/egl/drivers/dri2/platform_wayland.c 
b/src/egl/drivers/dri2/platform_wayland.c
index ba0eb10..e8b4413 100644
--- a/src/egl/drivers/dri2/platform_wayland.c
+++ b/src/egl/drivers/dri2/platform_wayland.c
@@ -649,17 +649,7 @@ dri2_wl_swap_buffers_with_damage(_EGLDriver *drv,
   }
}
 
-   if (dri2_dpy->flush->base.version >= 4) {
-  ctx = _eglGetCurrentContext();
-  dri2_ctx = dri2_egl_context(ctx);
-  (*dri2_dpy->flush->flush_with_flags)(dri2_ctx->dri_context,
-

[Mesa-dev] [PATCH 3/3] vc4: Skip storing the Z/S contents when it's invalidated.

2015-01-04 Thread Eric Anholt

Improves framerate of 5 seconds of es2gears by 1.57473% +/- 0.669409%
(n=67).
---
 src/gallium/drivers/vc4/vc4_context.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/gallium/drivers/vc4/vc4_context.c 
b/src/gallium/drivers/vc4/vc4_context.c
index 62f77b3..4c84bd3 100644
--- a/src/gallium/drivers/vc4/vc4_context.c
+++ b/src/gallium/drivers/vc4/vc4_context.c
@@ -467,6 +467,16 @@ vc4_cl_references_bo(struct pipe_context *pctx, struct 
vc4_bo *bo)
 }
 
 static void
+vc4_invalidate_resource(struct pipe_context *pctx, struct pipe_resource *prsc)
+{
+struct vc4_context *vc4 = vc4_context(pctx);
+struct pipe_surface *zsurf = vc4->framebuffer.zsbuf;
+
+if (zsurf && zsurf->texture == prsc)
+vc4->resolve &= ~(PIPE_CLEAR_DEPTH | PIPE_CLEAR_STENCIL);
+}
+
+static void
 vc4_context_destroy(struct pipe_context *pctx)
 {
 struct vc4_context *vc4 = vc4_context(pctx);
@@ -510,6 +520,7 @@ vc4_context_create(struct pipe_screen *pscreen, void *priv)
 pctx->priv = priv;
 pctx->destroy = vc4_context_destroy;
 pctx->flush = vc4_pipe_flush;
+pctx->invalidate_resource = vc4_invalidate_resource;
 
 vc4_draw_init(pctx);
 vc4_state_init(pctx);
-- 
2.1.3
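
Taken together, patches 2/3 and 3/3 describe an optional driver hook plus a tiled driver that skips writing back depth/stencil once those buffers are invalidated. A stripped-down sketch of that flow, using hypothetical ex_* types in place of the gallium and vc4 structures, might look like:

```c
#include <stddef.h>

enum {
   EX_BUFFER_COLOR   = 1 << 0,
   EX_BUFFER_DEPTH   = 1 << 1,
   EX_BUFFER_STENCIL = 1 << 2
};

struct ex_resource {
   int id;
};

struct ex_context {
   struct ex_resource *zsbuf;   /* currently bound depth/stencil buffer */
   unsigned resolve;            /* buffers to store back to memory at flush */

   /* Optional hook: drivers that cannot benefit leave it NULL. */
   void (*invalidate_resource)(struct ex_context *ctx,
                               struct ex_resource *rsc);
};

/* Driver side: if the invalidated resource is the bound depth/stencil
 * buffer, stop scheduling it for write-back. */
static void
ex_invalidate_resource(struct ex_context *ctx, struct ex_resource *rsc)
{
   if (ctx->zsbuf && ctx->zsbuf == rsc)
      ctx->resolve &= ~(unsigned)(EX_BUFFER_DEPTH | EX_BUFFER_STENCIL);
}

/* Caller side: only invoke the hook when the driver provides one. */
static void
ex_swap_invalidate(struct ex_context *ctx, struct ex_resource *rsc)
{
   if (ctx->invalidate_resource)
      ctx->invalidate_resource(ctx, rsc);
}
```

Keeping the hook optional means drivers that store every buffer unconditionally need no changes, which matches the NULL check in the dri_flush hunk of patch 2/3.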


[Mesa-dev] [PATCH 10/13] radeonsi: remove flatshade from the shader key

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_pipe.h  |  1 +
 src/gallium/drivers/radeonsi/si_shader.h|  1 -
 src/gallium/drivers/radeonsi/si_state_shaders.c | 12 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index 3632929..ba305e7 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -134,6 +134,7 @@ struct si_context {
struct si_cs_shader_state   cs_shader_state;
/* shader information */
unsignedsprite_coord_enable;
+   boolflatshade;
struct si_descriptors   vertex_buffers;
struct si_buffer_resources  const_buffers[SI_NUM_SHADERS];
struct si_buffer_resources  rw_buffers[SI_NUM_SHADERS];
diff --git a/src/gallium/drivers/radeonsi/si_shader.h 
b/src/gallium/drivers/radeonsi/si_shader.h
index 124615e..21692f0 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -123,7 +123,6 @@ union si_shader_key {
unsignedlast_cbuf:3;
unsignedcolor_two_side:1;
unsignedalpha_func:3;
-   unsignedflatshade:1;
unsignedalpha_to_one:1;
} ps;
struct {
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 355f8aa..de12b4e 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -362,7 +362,6 @@ static INLINE void si_shader_selector_key(struct pipe_context *ctx,
 
if (sctx->queued.named.rasterizer) {
key->ps.color_two_side = sctx->queued.named.rasterizer->two_side;
-   key->ps.flatshade = sctx->queued.named.rasterizer->flatshade;
 
if (sctx->queued.named.blend) {
key->ps.alpha_to_one = sctx->queued.named.blend->alpha_to_one &&
@@ -632,10 +631,8 @@ bcolor:
tmp = 0;
 
if (interpolate == TGSI_INTERPOLATE_CONSTANT ||
-   (interpolate == TGSI_INTERPOLATE_COLOR &&
-ps->key.ps.flatshade)) {
+   (interpolate == TGSI_INTERPOLATE_COLOR && sctx->flatshade))
tmp |= S_028644_FLAT_SHADE(1);
-   }
 
if (name == TGSI_SEMANTIC_GENERIC &&
sctx->sprite_coord_enable & (1 << index)) {
@@ -711,6 +708,7 @@ static void si_init_gs_rings(struct si_context *sctx)
 void si_update_shaders(struct si_context *sctx)
 {
struct pipe_context *ctx = (struct pipe_context*)sctx;
+   struct si_state_rasterizer *rs = sctx->queued.named.rasterizer;
 
if (sctx->gs_shader) {
si_shader_select(ctx, sctx->gs_shader);
@@ -776,8 +774,10 @@ void si_update_shaders(struct si_context *sctx)
si_pm4_bind_state(sctx, ps, sctx->ps_shader->current->pm4);
 
if (si_pm4_state_changed(sctx, ps) || si_pm4_state_changed(sctx, vs) ||
-   sctx->sprite_coord_enable != sctx->queued.named.rasterizer->sprite_coord_enable) {
-   sctx->sprite_coord_enable = sctx->queued.named.rasterizer->sprite_coord_enable;
+   sctx->sprite_coord_enable != rs->sprite_coord_enable ||
+   sctx->flatshade != rs->flatshade) {
+   sctx->sprite_coord_enable = rs->sprite_coord_enable;
+   sctx->flatshade = rs->flatshade;
si_update_spi_map(sctx);
}
 
-- 
2.1.0

[Mesa-dev] [PATCH 03/13] radeonsi: remove unused and not useful variables

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_pipe.h  | 2 --
 src/gallium/drivers/radeonsi/si_state.c | 3 +--
 src/gallium/drivers/radeonsi/si_state.h | 2 --
 3 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h
index 83d046e..3632929 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -145,8 +145,6 @@ struct si_context {
struct r600_atommsaa_config;
int ps_iter_samples;
 
-   unsigned default_ps_gprs, default_vs_gprs;
-
/* Vertex and index buffers. */
boolvertex_buffers_dirty;
struct pipe_index_buffer index_buffer;
diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c
index b6b4091..c9997b3 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -835,10 +835,9 @@ static void *si_create_dsa_state(struct pipe_context *ctx,
/* alpha */
if (state->alpha.enabled) {
dsa->alpha_func = state->alpha.func;
-   dsa->alpha_ref = state->alpha.ref_value;
 
si_pm4_set_reg(pm4, R_00B030_SPI_SHADER_USER_DATA_PS_0 +
-  SI_SGPR_ALPHA_REF * 4, fui(dsa->alpha_ref));
+  SI_SGPR_ALPHA_REF * 4, fui(state->alpha.ref_value));
} else {
dsa->alpha_func = PIPE_FUNC_ALWAYS;
}
diff --git a/src/gallium/drivers/radeonsi/si_state.h b/src/gallium/drivers/radeonsi/si_state.h
index 504b428..8927e50 100644
--- a/src/gallium/drivers/radeonsi/si_state.h
+++ b/src/gallium/drivers/radeonsi/si_state.h
@@ -64,7 +64,6 @@ struct si_state_rasterizer {
unsignedpa_sc_line_stipple;
unsignedpa_su_sc_mode_cntl;
unsignedpa_cl_clip_cntl;
-   unsignedpa_cl_vs_out_cntl;
unsignedclip_plane_enable;
float   offset_units;
float   offset_scale;
@@ -72,7 +71,6 @@ struct si_state_rasterizer {
 
 struct si_state_dsa {
struct si_pm4_state pm4;
-   float   alpha_ref;
unsignedalpha_func;
uint8_t valuemask[2];
uint8_t writemask[2];
-- 
2.1.0

[Mesa-dev] [PATCH 09/13] radeonsi: remove special handling of TGSI_INTERPOLATE_COLOR in shader codegen

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

It doesn't do anything useful. And colors are floating-point, so we can use
fs.interp, remove "flatshade" from the shader key, and rely on the FLAT_SHADE
state only (in the next patch).
---
 src/gallium/drivers/radeonsi/si_shader.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index e9c1a7f..89099e2 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -452,12 +452,8 @@ static void declare_input_fs(
else
interp_param = LLVMGetParam(main_fn, SI_PARAM_LINEAR_CENTER);
break;
-   case TGSI_INTERPOLATE_COLOR:
-   if (si_shader_ctx->shader->key.ps.flatshade) {
-   interp_param = 0;
-   break;
-   }
/* fall through to perspective */
+   case TGSI_INTERPOLATE_COLOR:
case TGSI_INTERPOLATE_PERSPECTIVE:
if (decl->Interp.Location == TGSI_INTERPOLATE_LOC_SAMPLE)
interp_param = LLVMGetParam(main_fn, SI_PARAM_PERSP_SAMPLE);
@@ -471,9 +467,18 @@ static void declare_input_fs(
return;
}
 
+   /* fs.constant returns the param from the middle vertex, so it's not
+* really useful for flat shading. It's meant to be used for custom
+* interpolation (but the intrinsic can't fetch from the other two
+* vertices).
+*
+* Luckily, it doesn't matter, because we rely on the FLAT_SHADE state
+* to do the right thing. The only reason we use fs.constant is that
+* fs.interp cannot be used on integers, because they can be equal
+* to NaN.
+*/
intr_name = interp_param ? "llvm.SI.fs.interp" : "llvm.SI.fs.constant";
 
-   /* XXX: Could there be more than TGSI_NUM_CHANNELS (4) ? */
if (decl->Semantic.Name == TGSI_SEMANTIC_COLOR &&
si_shader_ctx->shader->key.ps.color_two_side) {
LLVMValueRef args[4];
-- 
2.1.0

[Mesa-dev] [PATCH 07/13] radeonsi: fix VertexID for OpenGL

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

This fixes all failing piglit VertexID tests.

Cc: 10.4 
---
 src/gallium/drivers/radeonsi/si_shader.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index aa051cb..fb479d5 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -591,8 +591,11 @@ static void declare_system_value(
break;
 
case TGSI_SEMANTIC_VERTEXID:
-   value = LLVMGetParam(radeon_bld->main_fn,
-si_shader_ctx->param_vertex_id);
+   value = LLVMBuildAdd(gallivm->builder,
+LLVMGetParam(radeon_bld->main_fn,
+ si_shader_ctx->param_vertex_id),
+LLVMGetParam(radeon_bld->main_fn,
+ SI_PARAM_BASE_VERTEX), "");
break;
 
case TGSI_SEMANTIC_SAMPLEID:
-- 
2.1.0

[Mesa-dev] [PATCH 13/13] radeonsi: only set BC_OPTIMIZE_DISABLE when necessary

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

SPI_PS_IN_CONTROL is moved into the SPI mapping state.
---
 src/gallium/drivers/radeonsi/si_pipe.h  |  1 +
 src/gallium/drivers/radeonsi/si_state_shaders.c | 20 ++--
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h
index 00825c1..04385b0 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -136,6 +136,7 @@ struct si_context {
unsignedsprite_coord_enable;
boolflatshade;
boolcolor_two_side;
+   boolbc_optimize_disable;
struct si_descriptors   vertex_buffers;
struct si_buffer_resources  const_buffers[SI_NUM_SHADERS];
struct si_buffer_resources  rw_buffers[SI_NUM_SHADERS];
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 282fcf2..0456d14 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -232,7 +232,7 @@ static void si_shader_ps(struct si_shader *shader)
 {
struct tgsi_shader_info *info = &shader->selector->info;
struct si_pm4_state *pm4;
-   unsigned i, spi_ps_in_control;
+   unsigned i;
unsigned num_sgprs, num_user_sgprs;
unsigned spi_baryc_cntl = 0, spi_ps_input_ena;
uint64_t va;
@@ -267,9 +267,6 @@ static void si_shader_ps(struct si_shader *shader)
}
}
 
-   spi_ps_in_control = S_0286D8_NUM_INTERP(shader->nparam) |
-   S_0286D8_BC_OPTIMIZE_DISABLE(1);
-
si_pm4_set_reg(pm4, R_0286E0_SPI_BARYC_CNTL, spi_baryc_cntl);
spi_ps_input_ena = shader->spi_ps_input_ena;
/* we need to enable at least one of them, otherwise we hang the GPU */
@@ -284,7 +281,6 @@ static void si_shader_ps(struct si_shader *shader)
 
si_pm4_set_reg(pm4, R_0286CC_SPI_PS_INPUT_ENA, spi_ps_input_ena);
si_pm4_set_reg(pm4, R_0286D0_SPI_PS_INPUT_ADDR, spi_ps_input_ena);
-   si_pm4_set_reg(pm4, R_0286D8_SPI_PS_IN_CONTROL, spi_ps_in_control);
 
si_pm4_set_reg(pm4, R_028710_SPI_SHADER_Z_FORMAT, shader->spi_shader_z_format);
si_pm4_set_reg(pm4, R_028714_SPI_SHADER_COL_FORMAT,
@@ -666,6 +662,10 @@ bcolor:
}
}
 
+   si_pm4_set_reg(pm4, R_0286D8_SPI_PS_IN_CONTROL,
+  S_0286D8_NUM_INTERP(ps->nparam) |
+  S_0286D8_BC_OPTIMIZE_DISABLE(sctx->bc_optimize_disable));
+
si_pm4_set_state(sctx, spi, pm4);
 }
 
@@ -711,6 +711,7 @@ void si_update_shaders(struct si_context *sctx)
 {
struct pipe_context *ctx = (struct pipe_context*)sctx;
struct si_state_rasterizer *rs = sctx->queued.named.rasterizer;
+   bool bc_optimize_disable;
 
if (sctx->gs_shader) {
si_shader_select(ctx, sctx->gs_shader);
@@ -775,13 +776,20 @@ void si_update_shaders(struct si_context *sctx)
 
si_pm4_bind_state(sctx, ps, sctx->ps_shader->current->pm4);
 
+   /* Whether CENTER != CENTROID. */
+   bc_optimize_disable = sctx->framebuffer.nr_samples > 1 &&
+ rs->multisample_enable &&
+ sctx->ps_shader->info.uses_centroid;
+
if (si_pm4_state_changed(sctx, ps) || si_pm4_state_changed(sctx, vs) ||
sctx->sprite_coord_enable != rs->sprite_coord_enable ||
sctx->flatshade != rs->flatshade ||
-   sctx->color_two_side != rs->two_side) {
+   sctx->color_two_side != rs->two_side ||
+   sctx->bc_optimize_disable != bc_optimize_disable) {
sctx->sprite_coord_enable = rs->sprite_coord_enable;
sctx->flatshade = rs->flatshade;
sctx->color_two_side = rs->two_side;
+   sctx->bc_optimize_disable = bc_optimize_disable;
si_update_spi_map(sctx);
}
 
-- 
2.1.0

[Mesa-dev] [PATCH 04/13] radeonsi: enable LLVM optimizations that assume no NaNs for non-compute shaders

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeon/radeon_llvm_emit.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c b/src/gallium/drivers/radeon/radeon_llvm_emit.c
index dc871d7..e3be72c 100644
--- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
+++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
@@ -83,6 +83,7 @@ void radeon_llvm_shader_type(LLVMValueRef F, unsigned type)
 
if (type != TGSI_PROCESSOR_COMPUTE) {
LLVMAddTargetDependentFunctionAttr(F, "unsafe-fp-math", "true");
+   LLVMAddTargetDependentFunctionAttr(F, "enable-no-nans-fp-math", "true");
}
 }
 
-- 
2.1.0

[Mesa-dev] [PATCH 12/13] radeonsi: remove color_two_side from the shader key

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

This can be done using the SPI mapping only. If two_side is disabled,
VS COLOR is loaded to both PS COLOR and PS BCOLOR inputs.

The disadvantage is that the PS always chooses the color according to FACE
even though two_side is disabled.

Since PS color inputs can only be used in the GL compatibility profile, only
legacy apps should be affected, which is acceptable.

The PS shader key now only contains states for PS exports. The key can be
eliminated completely by implementing a pixel shader export subroutine,
so that we can stop compiling PS on demand. (it's also necessary for taking
advantage of all the SPI color and Z formats)
---
 src/gallium/drivers/radeonsi/si_pipe.h  |  1 +
 src/gallium/drivers/radeonsi/si_shader.c|  3 +--
 src/gallium/drivers/radeonsi/si_shader.h|  1 -
 src/gallium/drivers/radeonsi/si_state_shaders.c | 27 ++---
 4 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h
index ba305e7..00825c1 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -135,6 +135,7 @@ struct si_context {
/* shader information */
unsignedsprite_coord_enable;
boolflatshade;
+   boolcolor_two_side;
struct si_descriptors   vertex_buffers;
struct si_buffer_resources  const_buffers[SI_NUM_SHADERS];
struct si_buffer_resources  rw_buffers[SI_NUM_SHADERS];
diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 89099e2..1073723 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -479,8 +479,7 @@ static void declare_input_fs(
 */
intr_name = interp_param ? "llvm.SI.fs.interp" : "llvm.SI.fs.constant";
 
-   if (decl->Semantic.Name == TGSI_SEMANTIC_COLOR &&
-   si_shader_ctx->shader->key.ps.color_two_side) {
+   if (decl->Semantic.Name == TGSI_SEMANTIC_COLOR) {
LLVMValueRef args[4];
LLVMValueRef face, is_face_positive;
LLVMValueRef back_attr_number =
diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h
index 21692f0..93461be 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -121,7 +121,6 @@ union si_shader_key {
struct {
unsignedexport_16bpc:8;
unsignedlast_cbuf:3;
-   unsignedcolor_two_side:1;
unsignedalpha_func:3;
unsignedalpha_to_one:1;
} ps;
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 437dd95..282fcf2 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -360,15 +360,12 @@ static INLINE void si_shader_selector_key(struct pipe_context *ctx,
key->ps.last_cbuf = MAX2(sctx->framebuffer.state.nr_cbufs, 1) - 1;
key->ps.export_16bpc = sctx->framebuffer.export_16bpc;
 
-   if (sctx->queued.named.rasterizer) {
-   key->ps.color_two_side = sctx->queued.named.rasterizer->two_side;
-
-   if (sctx->queued.named.blend) {
-   key->ps.alpha_to_one = sctx->queued.named.blend->alpha_to_one &&
-  sctx->queued.named.rasterizer->multisample_enable &&
-  !sctx->framebuffer.cb0_is_integer;
-   }
+   if (sctx->queued.named.rasterizer && sctx->queued.named.blend) {
+   key->ps.alpha_to_one = sctx->queued.named.blend->alpha_to_one &&
+  sctx->queued.named.rasterizer->multisample_enable &&
+  !sctx->framebuffer.cb0_is_integer;
}
+
if (sctx->queued.named.dsa) {
key->ps.alpha_func = sctx->queued.named.dsa->alpha_func;
 
@@ -622,6 +619,7 @@ static void si_update_spi_map(struct si_context *sctx)
unsigned index = psinfo->input_semantic_index[i];
unsigned interpolate = psinfo->input_interpolate[i];
unsigned param_offset = ps->ps_input_param_offset[i];
+   bool is_bcolor = false;
 
if (name == TGSI_SEMANTIC_POSITION ||
name == TGSI_SEMANTIC_FACE)
@@ -657,10 +655,13 @@ bcolor:
   R_028644_SPI_PS_INPUT_CNTL_0 + param_offset * 4,
   tmp);
 
-   if (name == TGSI_SEMANTIC_COLOR &&
-
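
A rough C sketch of the idea in the commit message above, under stated assumptions: `struct color`, `spi_map_bcolor` and `ps_select_color` are hypothetical names for illustration, not radeonsi code. It models the SPI mapping deciding what feeds the PS BCOLOR input while the pixel shader always selects by FACE.

```c
#include <stdbool.h>

/* Hypothetical RGBA color; not a radeonsi type. */
struct color { float r, g, b, a; };

/* Models the SPI mapping: which VS output feeds the PS BCOLOR input.
 * With two_side disabled, VS COLOR is routed to BCOLOR as well. */
static struct color spi_map_bcolor(bool two_side, struct color vs_color,
                                   struct color vs_bcolor)
{
    return two_side ? vs_bcolor : vs_color;
}

/* Models the pixel shader, which always selects by FACE after this patch. */
static struct color ps_select_color(float face, struct color color_in,
                                    struct color bcolor_in)
{
    return face > 0.0f ? color_in : bcolor_in;
}
```

With two_side disabled both inputs carry the same value, so the FACE select is redundant but harmless; that is the extra cost the commit message mentions.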

[Mesa-dev] [PATCH 06/13] radeonsi: clarify a hw bug in shader exports

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_shader.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index eb762c0..aa051cb 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -59,6 +59,7 @@ struct si_shader_context
struct tgsi_parse_context parse;
struct tgsi_token * tokens;
struct si_shader *shader;
+   struct si_screen *screen;
unsigned type; /* TGSI_PROCESSOR_* specifies the type of shader. */
int param_streamout_config;
int param_streamout_write_index;
@@ -1400,10 +1401,7 @@ static void si_llvm_emit_fs_epilogue(struct lp_build_tgsi_context * bld_base)
if (stencil_index >= 0) {
out_ptr = si_shader_ctx->radeon_bld.soa.outputs[stencil_index][1];
args[6] = LLVMBuildLoad(base->gallivm->builder, out_ptr, "");
-   /* Only setting the stencil component bit (0x2) here
-* breaks some stencil piglit tests
-*/
-   mask |= 0x3;
+   mask |= 0x2;
si_shader_ctx->shader->db_shader_control |=
S_02880C_STENCIL_TEST_VAL_EXPORT_ENABLE(1);
}
@@ -1411,10 +1409,16 @@ static void si_llvm_emit_fs_epilogue(struct lp_build_tgsi_context * bld_base)
if (samplemask_index >= 0) {
out_ptr = si_shader_ctx->radeon_bld.soa.outputs[samplemask_index][0];
args[7] = LLVMBuildLoad(base->gallivm->builder, out_ptr, "");
-   mask |= 0xf; /* Set all components. */
+   mask |= 0x4;
si_shader_ctx->shader->db_shader_control |= S_02880C_MASK_EXPORT_ENABLE(1);
}
 
+   /* SI (except OLAND) has a bug that it only looks
+* at the X writemask component. */
+   if (si_shader_ctx->screen->b.chip_class == SI &&
+   si_shader_ctx->screen->b.family != CHIP_OLAND)
+   mask |= 0x1;
+
if (samplemask_index >= 0)
si_shader_ctx->shader->spi_shader_z_format = V_028710_SPI_SHADER_32_ABGR;
else if (stencil_index >= 0)
@@ -2740,6 +2744,7 @@ int si_shader_create(struct si_screen *sscreen, struct si_shader *shader)
tgsi_parse_init(&si_shader_ctx.parse, si_shader_ctx.tokens);
si_shader_ctx.shader = shader;
si_shader_ctx.type = si_shader_ctx.parse.FullHeader.Processor.Processor;
+   si_shader_ctx.screen = sscreen;
 
switch (si_shader_ctx.type) {
case TGSI_PROCESSOR_VERTEX:
-- 
2.1.0
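
The writemask logic in the patch above can be sketched roughly as follows. This is an illustration only: `z_export_mask` and its parameters are hypothetical, the depth bit (0x1) is assumed because that hunk is not shown in the diff, and applying the workaround only when something is exported is likewise an assumption.

```c
#include <stdbool.h>

/* Component bits of the Z export writemask. Y and Z come from the patch;
 * X = depth is an assumption, since that hunk is not part of the diff. */
enum { MASK_X = 0x1, MASK_Y = 0x2, MASK_Z = 0x4 };

/* hw_checks_only_x models "SI (except OLAND) only looks at the X component". */
static unsigned z_export_mask(bool has_depth, bool has_stencil,
                              bool has_samplemask, bool hw_checks_only_x)
{
    unsigned mask = 0;

    if (has_depth)
        mask |= MASK_X;
    if (has_stencil)
        mask |= MASK_Y;
    if (has_samplemask)
        mask |= MASK_Z;

    /* Workaround: force X on so affected chips do not drop the export. */
    if (mask && hw_checks_only_x)
        mask |= MASK_X;

    return mask;
}
```

On unaffected chips a stencil-only export keeps the minimal mask 0x2, which is the clarification the patch makes over the old unconditional 0x3.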

[Mesa-dev] [PATCH 05/13] radeonsi: use ordered compares for SSG and face selection

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

Ordered compares are what you have in C. Unordered compares are the result
of negating ordered compares (they return true if either argument is NaN).

That special NaN behavior is completely useless here, and unordered
compares produce horrible code with all stable LLVM versions.
(I think that has been fixed in LLVM git)
---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 4 ++--
 src/gallium/drivers/radeonsi/si_shader.c| 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index c30a9d0..dce5b55 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -1107,9 +1107,9 @@ static void emit_ssg(
cmp = LLVMBuildICmp(builder, LLVMIntSGE, val, bld_base->int_bld.zero, "");
val = LLVMBuildSelect(builder, cmp, val, LLVMConstInt(bld_base->int_bld.elem_type, -1, true), "");
} else { // float SSG
-   cmp = LLVMBuildFCmp(builder, LLVMRealUGT, emit_data->args[0], bld_base->base.zero, "");
+   cmp = LLVMBuildFCmp(builder, LLVMRealOGT, emit_data->args[0], bld_base->base.zero, "");
val = LLVMBuildSelect(builder, cmp, bld_base->base.one, emit_data->args[0], "");
-   cmp = LLVMBuildFCmp(builder, LLVMRealUGE, val, bld_base->base.zero, "");
+   cmp = LLVMBuildFCmp(builder, LLVMRealOGE, val, bld_base->base.zero, "");
val = LLVMBuildSelect(builder, cmp, val, LLVMConstReal(bld_base->base.elem_type, -1), "");
}
 
diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index ce59f0e..eb762c0 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -484,7 +484,7 @@ static void declare_input_fs(
face = LLVMGetParam(main_fn, SI_PARAM_FRONT_FACE);
 
is_face_positive = LLVMBuildFCmp(gallivm->builder,
-LLVMRealUGT, face,
+LLVMRealOGT, face,
 lp_build_const_float(gallivm, 0.0f),
 "");
 
-- 
2.1.0
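
To make the ordered/unordered distinction from the commit message concrete, here is a minimal C sketch. The helper names `cmp_ogt` and `cmp_ugt` are made up for this illustration; they only mirror the semantics of LLVM's OGT and UGT predicates and assume IEEE 754 behaviour (no -ffast-math).

```c
#include <math.h>
#include <stdbool.h>

/* Ordered greater-than (like LLVM OGT): false if either operand is NaN.
 * This is what the C '>' operator does. */
static bool cmp_ogt(float a, float b)
{
    return a > b;
}

/* Unordered greater-than (like LLVM UGT): the negation of the ordered
 * '<=', so it is true when a > b or when either operand is NaN. */
static bool cmp_ugt(float a, float b)
{
    return !(a <= b);
}
```

The two only disagree when a NaN is involved: `cmp_ogt(NAN, 0.0f)` is false while `cmp_ugt(NAN, 0.0f)` is true.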

[Mesa-dev] [PATCH 11/13] radeonsi: do not define FACE as an ordinary PS input

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index de12b4e..437dd95 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -623,7 +623,8 @@ static void si_update_spi_map(struct si_context *sctx)
unsigned interpolate = psinfo->input_interpolate[i];
unsigned param_offset = ps->ps_input_param_offset[i];
 
-   if (name == TGSI_SEMANTIC_POSITION)
+   if (name == TGSI_SEMANTIC_POSITION ||
+   name == TGSI_SEMANTIC_FACE)
/* Read from preloaded VGPRs, not parameters */
continue;
 
-- 
2.1.0

[Mesa-dev] [PATCH 01/13] radeonsi: reduce the size of si_pm4_state

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

- the relocs array is unused, remove it
- ndw is at most 115 (init), set 140 as the maximum
- compute needs 4 buffers per state, graphics only needs 1; set 4 as the maximum
---
 src/gallium/drivers/radeonsi/si_pm4.c | 6 +-
 src/gallium/drivers/radeonsi/si_pm4.h | 9 ++---
 2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pm4.c b/src/gallium/drivers/radeonsi/si_pm4.c
index 954eb6e..21ab9f2 100644
--- a/src/gallium/drivers/radeonsi/si_pm4.c
+++ b/src/gallium/drivers/radeonsi/si_pm4.c
@@ -145,17 +145,13 @@ unsigned si_pm4_dirty_dw(struct si_context *sctx)
 void si_pm4_emit(struct si_context *sctx, struct si_pm4_state *state)
 {
struct radeon_winsys_cs *cs = sctx->b.rings.gfx.cs;
+
for (int i = 0; i < state->nbo; ++i) {
r600_context_bo_reloc(&sctx->b, &sctx->b.rings.gfx, state->bo[i],
  state->bo_usage[i], state->bo_priority[i]);
}
 
memcpy(&cs->buf[cs->cdw], state->pm4, state->ndw * 4);
-
-   for (int i = 0; i < state->nrelocs; ++i) {
-   cs->buf[cs->cdw + state->relocs[i]] += cs->cdw << 2;
-   }
-
cs->cdw += state->ndw;
 
 #if SI_TRACE_CS
diff --git a/src/gallium/drivers/radeonsi/si_pm4.h b/src/gallium/drivers/radeonsi/si_pm4.h
index 8680a9e..388bb4b 100644
--- a/src/gallium/drivers/radeonsi/si_pm4.h
+++ b/src/gallium/drivers/radeonsi/si_pm4.h
@@ -29,9 +29,8 @@
 
 #include "radeon/drm/radeon_winsys.h"
 
-#define SI_PM4_MAX_DW  256
-#define SI_PM4_MAX_BO  32
-#define SI_PM4_MAX_RELOCS  4
+#define SI_PM4_MAX_DW  140
+#define SI_PM4_MAX_BO  4
 
 // forward defines
 struct si_context;
@@ -54,10 +53,6 @@ struct si_pm4_state
enum radeon_bo_usagebo_usage[SI_PM4_MAX_BO];
enum radeon_bo_priority bo_priority[SI_PM4_MAX_BO];
 
-   /* relocs for shader data */
-   unsignednrelocs;
-   unsignedrelocs[SI_PM4_MAX_RELOCS];
-
bool compute_pkt;
 };
 
-- 
2.1.0

[Mesa-dev] [PATCH 02/13] radeonsi: remove init config from states

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

It really doesn't do anything there.
---
 src/gallium/drivers/radeonsi/si_hw_context.c | 3 +--
 src/gallium/drivers/radeonsi/si_pipe.c   | 1 +
 src/gallium/drivers/radeonsi/si_pipe.h   | 1 +
 src/gallium/drivers/radeonsi/si_pm4.c| 1 -
 src/gallium/drivers/radeonsi/si_state.c  | 2 +-
 src/gallium/drivers/radeonsi/si_state.h  | 1 -
 6 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_hw_context.c b/src/gallium/drivers/radeonsi/si_hw_context.c
index 983a097..5ebd0be 100644
--- a/src/gallium/drivers/radeonsi/si_hw_context.c
+++ b/src/gallium/drivers/radeonsi/si_hw_context.c
@@ -142,8 +142,7 @@ void si_begin_new_cs(struct si_context *ctx)
si_pm4_reset_emitted(ctx);
 
/* The CS initialization should be emitted before everything else. */
-   si_pm4_emit(ctx, ctx->queued.named.init);
-   ctx->emitted.named.init = ctx->queued.named.init;
+   si_pm4_emit(ctx, ctx->init_config);
 
ctx->clip_regs.dirty = true;
ctx->framebuffer.atom.dirty = true;
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c
index afb6364..4b66499 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -43,6 +43,7 @@ static void si_destroy_context(struct pipe_context *context)
pipe_resource_reference(&sctx->null_const_buf.buffer, NULL);
r600_resource_reference(&sctx->border_color_table, NULL);
 
+   si_pm4_free_state(sctx, sctx->init_config, ~0);
si_pm4_delete_state(sctx, gs_rings, sctx->gs_rings);
si_pm4_delete_state(sctx, gs_onoff, sctx->gs_on);
si_pm4_delete_state(sctx, gs_onoff, sctx->gs_off);
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h
index 9ba4970..83d046e 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -98,6 +98,7 @@ struct si_context {
void*custom_blend_decompress;
void*custom_blend_fastclear;
struct si_screen*screen;
+   struct si_pm4_state *init_config;
 
union {
struct {
diff --git a/src/gallium/drivers/radeonsi/si_pm4.c b/src/gallium/drivers/radeonsi/si_pm4.c
index 21ab9f2..2729346 100644
--- a/src/gallium/drivers/radeonsi/si_pm4.c
+++ b/src/gallium/drivers/radeonsi/si_pm4.c
@@ -169,7 +169,6 @@ void si_pm4_emit_dirty(struct si_context *sctx)
if (!state || sctx->emitted.array[i] == state)
continue;
 
-   assert(state != sctx->queued.named.init);
si_pm4_emit(sctx, state);
sctx->emitted.array[i] = state;
}
diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c
index 5a417b0..b6b4091 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -3085,5 +3085,5 @@ void si_init_config(struct si_context *sctx)
si_pm4_set_reg(pm4, R_00B01C_SPI_SHADER_PGM_RSRC3_PS, 
S_00B01C_CU_EN(0x));
}
 
-   si_pm4_set_state(sctx, init, pm4);
+   sctx->init_config = pm4;
 }
diff --git a/src/gallium/drivers/radeonsi/si_state.h b/src/gallium/drivers/radeonsi/si_state.h
index 0e06767..504b428 100644
--- a/src/gallium/drivers/radeonsi/si_state.h
+++ b/src/gallium/drivers/radeonsi/si_state.h
@@ -88,7 +88,6 @@ struct si_vertex_element
 
 union si_state {
struct {
-   struct si_pm4_state *init;
struct si_state_blend   *blend;
struct si_pm4_state *blend_color;
struct si_pm4_state *clip;
-- 
2.1.0

[Mesa-dev] [PATCH 08/13] radeonsi: implement VERTEXID_NOBASE and BASEVERTEX system values

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

Only done for completeness. Not used by anything yet.

Tested by advertising PIPE_CAP_VERTEXID_NOBASE.
---
 src/gallium/drivers/radeonsi/si_shader.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index fb479d5..e9c1a7f 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -598,6 +598,16 @@ static void declare_system_value(
  SI_PARAM_BASE_VERTEX), "");
break;
 
+   case TGSI_SEMANTIC_VERTEXID_NOBASE:
+   value = LLVMGetParam(radeon_bld->main_fn,
+si_shader_ctx->param_vertex_id);
+   break;
+
+   case TGSI_SEMANTIC_BASEVERTEX:
+   value = LLVMGetParam(radeon_bld->main_fn,
+SI_PARAM_BASE_VERTEX);
+   break;
+
case TGSI_SEMANTIC_SAMPLEID:
value = get_sample_id(radeon_bld);
break;
-- 
2.1.0
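
As a small illustration of how the three system values relate in the VertexID patches (07 and 08), assuming as those diffs suggest that the hardware vertex ID parameter does not include the base vertex. `struct vertex_ids` and `compute_vertex_ids` are hypothetical names, not driver code.

```c
#include <stdint.h>

/* Hypothetical container for the three TGSI system values. */
struct vertex_ids {
    int32_t vertexid;        /* TGSI_SEMANTIC_VERTEXID */
    int32_t vertexid_nobase; /* TGSI_SEMANTIC_VERTEXID_NOBASE */
    int32_t basevertex;      /* TGSI_SEMANTIC_BASEVERTEX */
};

/* hw_vertex_id stands for param_vertex_id, base_vertex for
 * SI_PARAM_BASE_VERTEX in the patches. */
static struct vertex_ids compute_vertex_ids(int32_t hw_vertex_id,
                                            int32_t base_vertex)
{
    struct vertex_ids ids;

    ids.vertexid_nobase = hw_vertex_id;
    ids.basevertex = base_vertex;
    /* The OpenGL-visible VertexID includes the base vertex. */
    ids.vertexid = hw_vertex_id + base_vertex;
    return ids;
}
```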

[Mesa-dev] [PATCH] nv50/ir: fix texture offsets in release builds

2015-01-04 Thread Ilia Mirkin

asserts get compiled out in release builds, so they can't be relied
upon to perform logic.

Reported-by: Pierre Moreau 
Signed-off-by: Ilia Mirkin 
Cc: "10.2 10.3 10.4" 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp | 3 ++-
 src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
index e283424..0d7612e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
@@ -772,7 +772,8 @@ NV50LoweringPreSSA::handleTEX(TexInstruction *i)
if (i->tex.useOffsets) {
   for (int c = 0; c < 3; ++c) {
  ImmediateValue val;
- assert(i->offset[0][c].getImmediate(val));
+ if (!i->offset[0][c].getImmediate(val))
+assert(!"non-immediate offset");
  i->tex.offset[c] = val.reg.data.u32;
  i->offset[0][c].set(NULL);
   }
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index e279ba7..ff48e94 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
@@ -755,7 +755,8 @@ NVC0LoweringPass::handleTEX(TexInstruction *i)
  assert(i->tex.useOffsets == 1);
  for (c = 0; c < 3; ++c) {
 ImmediateValue val;
-assert(i->offset[0][c].getImmediate(val));
+if (!i->offset[0][c].getImmediate(val))
+   assert(!"non-immediate offset passed to non-TXG");
 imm |= (val.reg.data.u32 & 0xf) << (c * 4);
  }
  if (i->op == OP_TXD && chipset >= NVISA_GK104_CHIPSET) {
-- 
2.0.5
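
The failure mode fixed by the patch above can be reproduced in a few lines of plain C. This is a sketch, not nouveau code: `get_immediate` is a hypothetical stand-in for `getImmediate()`, and NDEBUG is defined inside the file only to simulate a release build.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for getImmediate(): stores the value, reports success. */
static bool get_immediate(uint32_t src, uint32_t *out)
{
    *out = src;
    return true;
}

#define NDEBUG          /* simulate a release build */
#include <assert.h>

/* Broken: with NDEBUG the whole assert expression, call included, vanishes. */
static uint32_t offset_broken(uint32_t src)
{
    uint32_t val = 0;
    assert(get_immediate(src, &val));
    return val;
}

/* Fixed: the call always runs; only the diagnostic is compiled out. */
static uint32_t offset_fixed(uint32_t src)
{
    uint32_t val = 0;
    if (!get_immediate(src, &val))
        assert(!"non-immediate offset");
    return val;
}

#undef NDEBUG
#include <assert.h>     /* re-enable assert for any code that follows */
```

In this simulated release build `offset_broken(7)` returns 0 because the call never happens, while `offset_fixed(7)` returns 7.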

Re: [Mesa-dev] [PATCH 3/4] st/mesa: ignore primitive restart if FixedIndex is enabled in DrawArraysIndirect

2015-01-04 Thread Marek Olšák

Nice. BTW this patch is unrelated to the test. This patch tries to fix
the fixed index case, while the test only validates normal primitive
restart.

Marek

On Sun, Jan 4, 2015 at 10:54 PM, Ilia Mirkin  wrote:
> FWIW the piglit you posted recently
> (arb_draw_indirect-draw-arrays-prim-restart) works with nvc0 with
> upstream mesa as-is. (But fails on llvmpipe/softpipe.)
>
> On Sun, Jan 4, 2015 at 4:44 PM, Marek Olšák  wrote:
>> From: Marek Olšák 
>>
>> From GL 4.4 Core profile:
>>
>>   If both PRIMITIVE_RESTART and PRIMITIVE_RESTART_FIXED_INDEX are
>>   enabled, the index value determined by PRIMITIVE_RESTART_FIXED_INDEX is
>>   used. If PRIMITIVE_RESTART_FIXED_INDEX is enabled, primitive restart is not
>>   performed for array elements transferred by any drawing command not taking
>>   a type parameter, including all of the *Draw* commands other than
>>   *DrawElements*.
>>
>> If only I had a driver where primitive restart works with DrawArraysIndirect.
>> I can't test this, sorry.
>> ---
>>  src/mesa/state_tracker/st_draw.c | 12 +---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/mesa/state_tracker/st_draw.c b/src/mesa/state_tracker/st_draw.c
>> index b6ccdd7..9e5a5a9 100644
>> --- a/src/mesa/state_tracker/st_draw.c
>> +++ b/src/mesa/state_tracker/st_draw.c
>> @@ -248,9 +248,15 @@ st_draw_vbo(struct gl_context *ctx,
>> if (indirect) {
>>info.indirect = st_buffer_object(indirect)->buffer;
>>
>> -  /* Primitive restart is not handled by the VBO module in this case. */
>> -  info.primitive_restart = ctx->Array._PrimitiveRestart;
>> -  info.restart_index = ctx->Array.RestartIndex;
>> +  /* Primitive restart for DrawArrays is not handled by the VBO module
>> +   * in this case.
>> +   *
>> +   * If PrimitiveRestartFixedIndex is enabled, primitive_restart must
>> +   * be disabled for DrawArrays. DrawElements is handled above. */
>> +  if (!ib && !ctx->Array.PrimitiveRestartFixedIndex) {
>> + info.primitive_restart = ctx->Array.PrimitiveRestart;
>> + info.restart_index = ctx->Array.RestartIndex;
>> +  }
>> }
>>
>> /* do actual drawing */
>> --
>> 2.1.0
>>

Re: [Mesa-dev] [PATCH 3/4] st/mesa: ignore primitive restart if FixedIndex is enabled in DrawArraysIndirect

2015-01-04 Thread Ilia Mirkin

FWIW the piglit you posted recently
(arb_draw_indirect-draw-arrays-prim-restart) works with nvc0 with
upstream mesa as-is. (But fails on llvmpipe/softpipe.)

On Sun, Jan 4, 2015 at 4:44 PM, Marek Olšák  wrote:
> From: Marek Olšák 
>
> From GL 4.4 Core profile:
>
>   If both PRIMITIVE_RESTART and PRIMITIVE_RESTART_FIXED_INDEX are
>   enabled, the index value determined by PRIMITIVE_RESTART_FIXED_INDEX is
>   used. If PRIMITIVE_RESTART_FIXED_INDEX is enabled, primitive restart is not
>   performed for array elements transferred by any drawing command not taking a
>   type parameter, including all of the *Draw* commands other than
>   *DrawElements*.
>
> If only I had a driver where primitive restart works with DrawArraysIndirect.
> I can't test this, sorry.
> ---
>  src/mesa/state_tracker/st_draw.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_draw.c b/src/mesa/state_tracker/st_draw.c
> index b6ccdd7..9e5a5a9 100644
> --- a/src/mesa/state_tracker/st_draw.c
> +++ b/src/mesa/state_tracker/st_draw.c
> @@ -248,9 +248,15 @@ st_draw_vbo(struct gl_context *ctx,
> if (indirect) {
>info.indirect = st_buffer_object(indirect)->buffer;
>
> -  /* Primitive restart is not handled by the VBO module in this case. */
> -  info.primitive_restart = ctx->Array._PrimitiveRestart;
> -  info.restart_index = ctx->Array.RestartIndex;
> +  /* Primitive restart for DrawArrays is not handled by the VBO module
> +   * in this case.
> +   *
> +   * If PrimitiveRestartFixedIndex is enabled, primitive_restart must
> +   * be disabled for DrawArrays. DrawElements is handled above. */
> +  if (!ib && !ctx->Array.PrimitiveRestartFixedIndex) {
> + info.primitive_restart = ctx->Array.PrimitiveRestart;
> + info.restart_index = ctx->Array.RestartIndex;
> +  }
> }
>
> /* do actual drawing */
> --
> 2.1.0
>

[Mesa-dev] [PATCH 2/4] st/mesa: fix GL_PRIMITIVE_RESTART_FIXED_INDEX

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

Cc: 10.2 10.3 10.4 
---
 src/mesa/state_tracker/st_draw.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_draw.c b/src/mesa/state_tracker/st_draw.c
index 64d6ef5..b6ccdd7 100644
--- a/src/mesa/state_tracker/st_draw.c
+++ b/src/mesa/state_tracker/st_draw.c
@@ -40,6 +40,7 @@
 #include "main/image.h"
 #include "main/bufferobj.h"
 #include "main/macros.h"
+#include "main/varray.h"
 
 #include "vbo/vbo.h"
 
@@ -234,7 +235,7 @@ st_draw_vbo(struct gl_context *ctx,
* so we only set these fields for indexed drawing:
*/
   info.primitive_restart = ctx->Array._PrimitiveRestart;
-  info.restart_index = ctx->Array.RestartIndex;
+  info.restart_index = _mesa_primitive_restart_index(ctx, ib->type);
}
else {
   /* Transform feedback drawing is always non-indexed. */
-- 
2.1.0
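
A minimal sketch of the idea behind this change, assuming the fixed restart index is the largest value representable by the index type (2^N - 1 for N-bit indices). The helper name and its byte-size parameter are hypothetical; this is not the signature of _mesa_primitive_restart_index(), which takes the context and the index type:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pick the restart index for an index buffer whose
 * elements are index_size bytes wide (1, 2 or 4).
 */
static uint32_t
sketch_restart_index(bool fixed_index_enabled, uint32_t user_restart_index,
                     unsigned index_size)
{
   assert(index_size == 1 || index_size == 2 || index_size == 4);

   /* Without the fixed index, the application-provided value is used. */
   if (!fixed_index_enabled)
      return user_restart_index;

   /* With the fixed index, the value depends only on the index width. */
   switch (index_size) {
   case 1:
      return 0xffu;
   case 2:
      return 0xffffu;
   default:
      return 0xffffffffu;
   }
}
```

Under these assumptions, a 16-bit index buffer restarts on 0xffff whenever the fixed index is enabled, regardless of the application-set restart index.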


[Mesa-dev] [PATCH 3/4] st/mesa: ignore primitive restart if FixedIndex is enabled in DrawArraysIndirect

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

From GL 4.4 Core profile:

  If both PRIMITIVE_RESTART and PRIMITIVE_RESTART_FIXED_INDEX are
  enabled, the index value determined by PRIMITIVE_RESTART_FIXED_INDEX is
  used. If PRIMITIVE_RESTART_FIXED_INDEX is enabled, primitive restart is not
  performed for array elements transferred by any drawing command not taking a
  type parameter, including all of the *Draw* commands other than
  *DrawElements*.

If only I had a driver where primitive restart works with DrawArraysIndirect.
I can't test this, sorry.
---
 src/mesa/state_tracker/st_draw.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/mesa/state_tracker/st_draw.c b/src/mesa/state_tracker/st_draw.c
index b6ccdd7..9e5a5a9 100644
--- a/src/mesa/state_tracker/st_draw.c
+++ b/src/mesa/state_tracker/st_draw.c
@@ -248,9 +248,15 @@ st_draw_vbo(struct gl_context *ctx,
if (indirect) {
   info.indirect = st_buffer_object(indirect)->buffer;
 
-  /* Primitive restart is not handled by the VBO module in this case. */
-  info.primitive_restart = ctx->Array._PrimitiveRestart;
-  info.restart_index = ctx->Array.RestartIndex;
+  /* Primitive restart for DrawArrays is not handled by the VBO module
+   * in this case.
+   *
+   * If PrimitiveRestartFixedIndex is enabled, primitive_restart must
+   * be disabled for DrawArrays. DrawElements is handled above. */
+  if (!ib && !ctx->Array.PrimitiveRestartFixedIndex) {
+ info.primitive_restart = ctx->Array.PrimitiveRestart;
+ info.restart_index = ctx->Array.RestartIndex;
+  }
}
 
/* do actual drawing */
-- 
2.1.0


[Mesa-dev] [PATCH 4/4] tgsi: add uses_centroid into tgsi_shader_info

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/auxiliary/tgsi/tgsi_scan.c | 3 +++
 src/gallium/auxiliary/tgsi/tgsi_scan.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.c b/src/gallium/auxiliary/tgsi/tgsi_scan.c
index eb313e4..6210ebd 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_scan.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_scan.c
@@ -191,6 +191,9 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
   info->input_cylindrical_wrap[reg] = (ubyte)fulldecl->Interp.CylindricalWrap;
   info->num_inputs++;
 
+  if (fulldecl->Interp.Location == TGSI_INTERPOLATE_LOC_CENTROID)
+ info->uses_centroid = TRUE;
+
   if (semName == TGSI_SEMANTIC_PRIMID)
  info->uses_primid = TRUE;
   else if (procType == TGSI_PROCESSOR_FRAGMENT) {
diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.h b/src/gallium/auxiliary/tgsi/tgsi_scan.h
index 375f75a..bacb4ab 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_scan.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_scan.h
@@ -72,6 +72,7 @@ struct tgsi_shader_info
boolean writes_stencil; /**< does fragment shader write stencil value? */
boolean writes_edgeflag; /**< vertex shader outputs edgeflag */
boolean uses_kill;  /**< KILL or KILL_IF instruction used? */
+   boolean uses_centroid;
boolean uses_instanceid;
boolean uses_vertexid;
boolean uses_vertexid_nobase;
-- 
2.1.0


[Mesa-dev] [PATCH 1/4] vbo: ignore primitive restart if FixedIndex is enabled in DrawArrays

2015-01-04 Thread Marek Olšák

From: Marek Olšák 

From GL 4.4 Core profile:

  If both PRIMITIVE_RESTART and PRIMITIVE_RESTART_FIXED_INDEX are
  enabled, the index value determined by PRIMITIVE_RESTART_FIXED_INDEX is
  used. If PRIMITIVE_RESTART_FIXED_INDEX is enabled, primitive restart is not
  performed for array elements transferred by any drawing command not taking a
  type parameter, including all of the *Draw* commands other than
  *DrawElements*.

Cc: 10.2 10.3 10.4 
---
 src/mesa/vbo/vbo_exec_array.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c
index 6eac841..95193f2 100644
--- a/src/mesa/vbo/vbo_exec_array.c
+++ b/src/mesa/vbo/vbo_exec_array.c
@@ -596,7 +596,8 @@ vbo_draw_arrays(struct gl_context *ctx, GLenum mode, GLint start,
prim[0].is_indirect = 0;
 
/* Implement the primitive restart index */
-   if (ctx->Array.PrimitiveRestart && ctx->Array.RestartIndex < count) {
+   if (ctx->Array.PrimitiveRestart && !ctx->Array.PrimitiveRestartFixedIndex &&
+   ctx->Array.RestartIndex < count) {
   GLuint primCount = 0;
 
   if (ctx->Array.RestartIndex == start) {
-- 
2.1.0
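
The patched condition can be read as a small predicate. The following is only a sketch that mirrors the check in the hunk above; the function and parameter names are hypothetical and do not exist in Mesa:

```c
#include <stdbool.h>

/* Hypothetical predicate mirroring the patched check: a non-indexed draw
 * gets restart handling only when plain PRIMITIVE_RESTART is enabled,
 * PRIMITIVE_RESTART_FIXED_INDEX is disabled, and the restart index is
 * below count.
 */
static bool
sketch_draw_arrays_needs_restart(bool primitive_restart, bool fixed_index,
                                 unsigned restart_index, unsigned count)
{
   return primitive_restart && !fixed_index && restart_index < count;
}
```

With both enables set, the predicate is false, which matches the spec text quoted in the commit message: the fixed index takes precedence and does not apply to draws without a type parameter.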


Re: [Mesa-dev] [PATCH 4/4] i965: Implement a tiled fast-path for glReadPixels and glGetTexImage

2015-01-04 Thread Ben Widawsky

I just did a very cursory review. I assume someone smarter than me will do a
real review, but if not, feel free to ping me.

I think all the comments apply to both functions.

On Sat, Jan 03, 2015 at 11:54:15AM -0800, Jason Ekstrand wrote:
> From: Sisinty Sasmita Patra 
> 
> Added intel_readpixels_tiled_memcpy and intel_gettexsubimage_tiled_memcpy
> functions. These are the fast paths for glReadPixels and glGetTexImage.
> 
> v2: Jason Ekstrand 
>- Refactor to make the functions look more like the old
>  intel_tex_subimage_tiled_memcpy
>- Don't export the readpixels_tiled_memcpy function
>    - Fix some pointer arithmetic bugs in partial image downloads (using
>  ReadPixels with a non-zero x or y offset)
>- Fix a bug when ReadPixels is performed on an FBO wrapping a texture
>  miplevel other than zero.
> 
> Signed-off-by: Jason Ekstrand 
> ---
>  src/mesa/drivers/dri/i965/intel_pixel_read.c | 134 +-
>  src/mesa/drivers/dri/i965/intel_tex.h|   9 ++
>  src/mesa/drivers/dri/i965/intel_tex_image.c  | 137 ++-
>  3 files changed, 277 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_pixel_read.c b/src/mesa/drivers/dri/i965/intel_pixel_read.c
> index beb3152..d1e7798 100644
> --- a/src/mesa/drivers/dri/i965/intel_pixel_read.c
> +++ b/src/mesa/drivers/dri/i965/intel_pixel_read.c
> @@ -38,14 +38,16 @@
>  
>  #include "brw_context.h"
>  #include "intel_screen.h"
> +#include "intel_batchbuffer.h"
>  #include "intel_blit.h"
>  #include "intel_buffers.h"
>  #include "intel_fbo.h"
>  #include "intel_mipmap_tree.h"
>  #include "intel_pixel.h"
>  #include "intel_buffer_objects.h"
> +#include "intel_tiled_memcpy.h"
>  
> -#define FILE_DEBUG_FLAG DEBUG_PIXEL
> +#define FILE_DEBUG_FLAG DEBUG_TEXTURE
>  
>  /* For many applications, the new ability to pull the source buffers
>   * back out of the GTT and then do the packing/conversion operations
> @@ -161,17 +163,147 @@ do_blit_readpixels(struct gl_context * ctx,
> return true;
>  }
>  
> +/**
> + * \brief A fast path for glReadPixels
> + *
> + * This fast path is taken when the source format is BGRA, RGBA,
> + * A or L and when the texture memory is X- or Y-tiled.  It downloads
> + * the source data by mapping the memory without a GTT fence, thus
> + * acquiring a linear view of the memory.

That last sentence is confusing since you're using linear differently than just
about everywhere else (though the statement is accurate).

How about something like, "It maps the source data with a CPU mapping which then
needs to be de-tiled [by the CPU] before presenting linear data back to the
user."
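
To make the de-tiling step concrete, here is a rough sketch of how a byte offset inside an X-tiled surface could be computed. It assumes 4 KiB tiles that are 512 bytes wide and 8 rows tall, a pitch that is a multiple of 512 bytes, and no address swizzling. The function name is hypothetical and this is not the code from intel_tiled_memcpy:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of (x_bytes, y) in an X-tiled buffer.
 * Assumes 512-byte x 8-row tiles stored row-major, tiles laid out left to
 * right and then top to bottom, and no bit-6 swizzling.
 */
static size_t
sketch_xtile_offset(size_t x_bytes, size_t y, size_t pitch_bytes)
{
   const size_t tile_width = 512;   /* bytes per tile row */
   const size_t tile_height = 8;    /* rows per tile */
   const size_t tile_size = tile_width * tile_height;

   assert(pitch_bytes % tile_width == 0);
   assert(x_bytes < pitch_bytes);

   size_t tiles_per_row = pitch_bytes / tile_width;
   size_t tile_index = (y / tile_height) * tiles_per_row +
                       (x_bytes / tile_width);

   return tile_index * tile_size +
          (y % tile_height) * tile_width +
          (x_bytes % tile_width);
}
```

A CPU de-tiling copy would then walk the destination rows linearly and read each span from the offset computed this way.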

> + *
> + * This is a performance win over the conventional texture download path.
> + * In the conventional texture download path,
> + *
> + */
> +
> +static bool
> +intel_readpixels_tiled_memcpy(struct gl_context * ctx,
> +  GLint xoffset, GLint yoffset,
> +  GLsizei width, GLsizei height,
> +  GLenum format, GLenum type,
> +  GLvoid * pixels,
> +  const struct gl_pixelstore_attrib *pack)
> +{
> +   struct brw_context *brw = brw_context(ctx);
> +   struct gl_renderbuffer *rb = ctx->ReadBuffer->_ColorReadBuffer;
> +
> +   /* This path supports reading from color buffers only */
> +   if (rb == NULL)
> +  return false;
> +
> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);
> +   int dst_pitch;
> +
> +   /* The miptree's buffer. */
> +   drm_intel_bo *bo;
> +
> +   int error = 0;
> +
> +   uint32_t cpp;
> +   mem_copy_fn mem_copy = NULL;
> +
> +   /* This fastpath is restricted to specific renderbuffer types:
> +* a 2D BGRA, RGBA, L8 or A8 texture. It could be generalized to support
> +* more types.
> +*/
> +   if (!brw->has_llc ||
> +   !(type == GL_UNSIGNED_BYTE || type == GL_UNSIGNED_INT_8_8_8_8_REV) ||
> +   pixels == NULL ||
> +   _mesa_is_bufferobj(pack->BufferObj) ||
> +   pack->Alignment > 4 ||
> +   pack->SkipPixels > 0 ||
> +   pack->SkipRows > 0 ||
> +   (pack->RowLength != 0 && pack->RowLength != width) ||
> +   pack->SwapBytes ||
> +   pack->LsbFirst ||
> +   pack->Invert)
> +  return false;
> +
> +   if (!intel_get_memcpy(rb->Format, format, type, &mem_copy, &cpp))
> +  return false;
> +
> +   if (!irb->mt ||
> +   (irb->mt->tiling != I915_TILING_X &&
> +   irb->mt->tiling != I915_TILING_Y)) {
> +  /* The algorithm is written only for X- or Y-tiled memory. */
> +  return false;
> +   }
> +
> +   /* Since we are going to read raw data to the miptree, we need to resolve
> +* any pending fast color clears before we start.
> +*/
> +   intel_miptree_resolve_color(brw, irb->mt);
> +
> +   bo = irb->mt->bo;
> +
> +   if (drm_intel_bo_references(brw->batch.bo, bo)) {
> +  perf_debug("Flushing before mapping a referenced bo.\n"

Re: [Mesa-dev] Submitting more shaders to shader-db?

2015-01-04 Thread Aras Pranckevicius

On Sun, Jan 4, 2015 at 10:20 AM, Kenneth Graunke wrote:
> On Sunday 04 January 2015 09:36:40 Aras Pranckevicius wrote:
> > Is it possible to submit more shaders into whatever shader-db is typically
> > used by Mesa developers to test compiler optimizations on? I could package
> > up some built-in shaders from Unity 4 / Unity 5 (which end up being used in
> > many, many Unity games), in the format expected by shader-db, and under
> > whatever license is needed.
> > Would that be useful, and if so, what are the steps to get there?
>
> Absolutely!  We'd love to expand it.
> You can just run a program that compiles the shaders with MESA_GLSL=dump and
> email me the output.  (MESA_GLSL=dump ./yourapp &> log_to_be_emailed)

Since I don't even have a Linux box nearby, it would be way easier for me
to change Unity source code to dump the shaders into the right format
directly, and do it from a Mac (Linux Unity game builds use exactly the
same shaders anyway).


> If you'd like them included in the public repository, I'll also need a license
> header saying that the shaders are freely redistributable.

MIT or BSD would be fine with me; are they ok for public shader-db?



--
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info

Re: [Mesa-dev] [PATCH 3/3] glx/dri3: Request non-vsynced Present for swapinterval zero.

2015-01-04 Thread Emil Velikov

On 17/12/14 05:20, Mario Kleiner wrote:
> On 12/17/2014 05:49 AM, Keith Packard wrote:
>> Mario Kleiner  writes:
>>
>>> It's just that i need access to both, the old behaviour i described, and
>>> the new "drop frame" behaviour, and i need a way to select what i want
>>> at runtime via api without the need for easily overwhelmed and confused
>>> users to change config files or environment variables. I also always
>>> need meaningful and trustworthy feedback, at least for page-flipped
>>> presents, about what really happened for a presented frame - was it
>>> flipped, copied, exchanged, skipped, or did some error happen?
>> Present reports precisely what it did with each frame; flipped, copied,
>> or skipped.
>>
>>> That's why i'd like to have an extension to INTEL_swap_events to also
>>> report some new completion type "skipped" and "error" and that one patch
>>> 5/5 of mine for mesa reviewed and included, to make sure the swap_events
>>> don't fall apart so easily.
>> You can use Present events on the target drawable; they're generated to
>> whoever requests them, so you don't need to rely on the intel swap
>> events alone.
> 
> Never thought about that. Could you show me some short example snippet
> of XLib/GLX code how i reliably detect at runtime if Present is present,
> and then enable this? That would probably do for the moment and at the
> same time solve the problem that i don't know how to reliably detect at
> runtime if i'm on DRI2 or DRI3/Present. Making good use of this will
> require separate code-path and a way to select the right one.
> 
> It's still important to fix that wraparound handling bug from my patch
> 5/5 for INTEL_swap_events.
> 
>>> As some kind of stop gap measure one could also think about defining a
>>> new vblank_mode to enable the new behaviour instead of the old one.
>> I really don't have a good suggestion here, given that we have such a
>> limited API available.
>>
> 
> I thought i just made a suggestion how we could wiggle through it within
> existing api?
> And we can define new one for future extensions?
> 
Adding Mathias and Frank Binns to the Cc list.

So taking into account the discussion so far, including Mathias's input
that the official nvidia driver does the same/similar form of cheating:
 - Should we revert on master until a decision (or an alternative solution)
is available?
 - Curious if we can have some form of consensus on what the next steps
would be.



Keith,
Any plans on taking a look at patch 4/5 [1] and 5/5 [2] from rev 3 of
the series ? The former is reviewed by Eric while the latter lacks any
comments :'(


Thanks
Emil

[1] http://lists.freedesktop.org/archives/mesa-dev/2014-December/072078.html
[2] http://lists.freedesktop.org/archives/mesa-dev/2014-December/072079.html


Re: [Mesa-dev] [PATCH:mesa] Bracket arguments to tr so they work with Solaris tr

2015-01-04 Thread Alan Coopersmith


On 01/ 4/15 11:34 AM, Emil Velikov wrote:
> Hi Alan,
> On 03/01/15 22:28, Alan Coopersmith wrote:
>> https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Limitations-of-Usual-Tools.html#index-g_t_0040command_007btr_007d-1842
>>
>> Without this fix, egl fails to build on Solaris, with the error:
>>
>> :0:22: error: '_EGL_PLATFORM_x11' undeclared (first use in this function)
>> egldisplay.c:207:31: note: in expansion of macro '_EGL_NATIVE_PLATFORM'
>>   native_platform = _EGL_NATIVE_PLATFORM;
>> ^
>>
> Trivial note - the sed command has picked up the missing -e parameter ;)

Oh, whoops.  That was left from my failed attempt to replace tr with a -e 'y/..'
flag to sed.

> I'm thinking about using $SED consistently throughout (and add the
> missing -e) in a separate patch(es). Any objections ?

Sounds good to me.

> Reviewed-by: Emil Velikov 
>
> If you're planning to push this please add the following line.
> Alternatively I'll add it as I push it in a couple of days.
>
> Cc: 10.3 10.4 

It's been a couple years since I've pushed to mesa, so I'm happy to let
you do the push so it's done right.

Thanks,

-alan-


[Mesa-dev] [Bug 86837] kodi segfault since auxiliary/vl: rework the build of the VL code

2015-01-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=86837

--- Comment #17 from bgunte...@gmail.com ---
(In reply to Andy Furniss from comment #13)
> (In reply to Emil Velikov from comment #12)
> > Seems like Christian dropped the link with the tentative fix.
> > http://patchwork.freedesktop.org/patch/39400/
> > 
> > Guys can you test this please ?
> 
> Works OK for me testing with Kodi.

I spoke too soon.
When trying to build for Kodi, my display works, but my videos do not play.

@Andy Furniss, what is your command line for building mesa?
I'm using this: ./autogen.sh --prefix=/opt/xorg --with-gallium-drivers=r600 --with-dri-drivers=radeon --enable-glx-tls

Am I missing something?

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [Mesa-dev] [PATCH] i965: Implement WaCsStallAtEveryFourthPipecontrol on IVB/BYT.

2015-01-04 Thread Ben Widawsky

On Wed, Nov 12, 2014 at 11:17:55AM -0800, Kenneth Graunke wrote:
> According to the documentation, we need to do a CS stall on every fourth
> PIPE_CONTROL command to avoid GPU hangs.  The kernel does a CS stall
> between batches, so we only need to count the PIPE_CONTROLs in our batches.
> 
> v2: Get the generation check right (caught by Chris Wilson),
> combine the ++ with the check (suggested by Daniel Vetter).
> 
> Signed-off-by: Kenneth Graunke 
> Reviewed-by: Daniel Vetter 

Ken, did you want to push this patch?

[snip]
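
For readers without the snipped patch at hand, the counting scheme described in the commit message could look roughly like the sketch below. The struct, field, and function names are hypothetical, and the sketch leaves out the generation check mentioned in the v2 notes:

```c
#include <stdbool.h>

/* Hypothetical per-batch state: how many PIPE_CONTROLs have been emitted
 * since the last one that carried a CS stall.
 */
struct sketch_batch {
   unsigned pipe_controls_since_last_cs_stall;
};

/* The kernel stalls between batches, so the count restarts per batch. */
static void
sketch_batch_reset(struct sketch_batch *batch)
{
   batch->pipe_controls_since_last_cs_stall = 0;
}

/* Returns true when the PIPE_CONTROL about to be emitted is the fourth in
 * a row and should therefore carry a CS stall. The increment is combined
 * with the check, as suggested in the v2 notes.
 */
static bool
sketch_needs_cs_stall(struct sketch_batch *batch)
{
   if (++batch->pipe_controls_since_last_cs_stall == 4) {
      batch->pipe_controls_since_last_cs_stall = 0;
      return true;
   }
   return false;
}
```

Under these assumptions, the first three PIPE_CONTROLs in a batch pass through unchanged and every fourth one gets the stall bit added.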

Re: [Mesa-dev] [PATCH:mesa] Bracket arguments to tr so they work with Solaris tr

2015-01-04 Thread Emil Velikov

Hi Alan,
On 03/01/15 22:28, Alan Coopersmith wrote:
> https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Limitations-of-Usual-Tools.html#index-g_t_0040command_007btr_007d-1842
> 
> Without this fix, egl fails to build on Solaris, with the error:
> 
> :0:22: error: '_EGL_PLATFORM_x11' undeclared (first use in this function)
> egldisplay.c:207:31: note: in expansion of macro '_EGL_NATIVE_PLATFORM'
>  native_platform = _EGL_NATIVE_PLATFORM;
>^
> 
Trivial note - the sed command has picked up the missing -e parameter ;)
I'm thinking about using $SED consistently throughout (and add the
missing -e) in a separate patch(es). Any objections ?

Reviewed-by: Emil Velikov 

If you're planning to push this please add the following line.
Alternatively I'll add it as I push it in a couple of days.

Cc: 10.3 10.4 

Thanks
Emil

> Signed-off-by: Alan Coopersmith 
> ---
>  configure.ac |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/configure.ac b/configure.ac
> index b5805f6..a008cbf 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1584,7 +1584,7 @@ done
>  # libEGL wants to default to the first platform specified in
>  # ./configure.  parse that here.
>  if test "x$egl_platforms" != "x"; then
> -FIRST_PLATFORM_CAPS=`echo $egl_platforms | sed 's| .*||' | tr 'a-z' 'A-Z'`
> +FIRST_PLATFORM_CAPS=`echo $egl_platforms | sed -e 's| .*||' | tr '[[a-z]]' '[[A-Z]]'`
>  EGL_NATIVE_PLATFORM="_EGL_PLATFORM_$FIRST_PLATFORM_CAPS"
>  else
>  EGL_NATIVE_PLATFORM="_EGL_INVALID_PLATFORM"
> 


[Mesa-dev] [Bug 86837] kodi segfault since auxiliary/vl: rework the build of the VL code

2015-01-04 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=86837

--- Comment #16 from bgunte...@gmail.com ---
I can also confirm that this patch works.

Running OpenGL version:2.1 Mesa 10.4.0(git-fb3f7c0)

great work!


Re: [Mesa-dev] [Nouveau] [PATCH 2/2] nvc0: regenerate rnndb headers

2015-01-04 Thread Emil Velikov

On 31/12/14 03:42, Ilia Mirkin wrote:
> The headers hadn't been regenerated in a long time and had seen a number
> of manual modifications. A few changes:
>  - remove nvc0_2d entirely, use the nv50 header which has the nvc0
>values too
>  - remove 3ddefs, it's identical to the nv50 file
>  - move macros out into a separate file
> 
> Also the upstream rnndb changed the overall chip naming convention; this
> was fixed up manually in the generated files until a better solution is
> determined.
> 
> Signed-off-by: Ilia Mirkin 
> ---
>  src/gallium/drivers/nouveau/nvc0/nvc0_2d.xml.h |  380 ---
>  src/gallium/drivers/nouveau/nvc0/nvc0_3d.xml.h | 1115 
>  src/gallium/drivers/nouveau/nvc0/nvc0_3ddefs.xml.h |   98 --
>  .../drivers/nouveau/nvc0/nvc0_compute.xml.h|   67 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_context.h|5 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_m2mf.xml.h   |   67 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_macros.h |   32 +
>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |6 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_state.c  |6 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_surface.c|   22 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_tex.c|   12 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c   |8 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_winsys.h |4 +-
>  .../drivers/nouveau/nvc0/nve4_compute.xml.h|   61 +-
>  src/gallium/drivers/nouveau/nvc0/nve4_p2mf.xml.h   |  102 +-
>  15 files changed, 1153 insertions(+), 832 deletions(-)
>  delete mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_2d.xml.h
>  delete mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_3ddefs.xml.h
>  create mode 100644 src/gallium/drivers/nouveau/nvc0/nvc0_macros.h
> 
Hi Ilia,

Please squash the following before pushing. Thanks.

-Emil

diff --git a/src/gallium/drivers/nouveau/Makefile.sources b/src/gallium/drivers/nouveau/Makefile.sources
index 64f9608..3fae3bc 100644
--- a/src/gallium/drivers/nouveau/Makefile.sources
+++ b/src/gallium/drivers/nouveau/Makefile.sources
@@ -137,8 +137,6 @@ NVC0_CODEGEN_SOURCES := \
codegen/nv50_ir_target_nvc0.h

 NVC0_C_SOURCES := \
-   nvc0/nvc0_2d.xml.h \
-   nvc0/nvc0_3ddefs.xml.h \
nvc0/nvc0_3d.xml.h \
nvc0/nvc0_compute.c \
nvc0/nvc0_compute.h \
@@ -147,6 +145,7 @@ NVC0_C_SOURCES := \
nvc0/nvc0_context.h \
nvc0/nvc0_formats.c \
nvc0/nvc0_m2mf.xml.h \
+   nvc0/nvc0_macros.h \
nvc0/nvc0_miptree.c \
nvc0/nvc0_program.c \
nvc0/nvc0_program.h \


[Mesa-dev] [RFC PATCH 29/40] i965/vec4: Append uniform variables to the gather table

2015-01-04 Thread Abdiel Janulgue

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 0f22829..8dee915 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1093,6 +1093,7 @@ vec4_visitor::visit(ir_variable *ir)
   } else {
 setup_uniform_values(ir);
   }
+  stage_prog_data->gather_table[stage_prog_data->nr_gather_table++].reg = reg->reg;
   break;
 
case ir_var_system_value:
-- 
1.9.1


[Mesa-dev] [RFC PATCH 31/40] i965/fs: Set limitation for amount of UBO push constant entries

2015-01-04 Thread Abdiel Janulgue

Apply the same 16-register limit used in assign_constant_locations()
when assigning UBOs as push constants. Beyond that limit, fall back to
pull constant loads.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index c0499b6..8e092fb 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1083,7 +1083,8 @@ fs_visitor::visit(ir_expression *ir)
*/
   bool use_gather = (brw->has_resource_streamer && brw->use_gather_constants);
   int param_index = uniforms + ubo_uniforms;
-  if (use_gather && const_uniform_block && const_offset) {
+  if (use_gather && const_uniform_block && const_offset &&
+  (param_index < 128)) {
 
  fs_reg reg(UNIFORM, param_index);
  reg.type = brw_type_for_base_type(ir->type);
-- 
1.9.1


[Mesa-dev] [RFC PATCH 38/40] i965/fs: Update curb_read_length to include ubo uniforms

2015-01-04 Thread Abdiel Janulgue

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 8a03581..904c51b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1662,7 +1662,8 @@ fs_visitor::assign_curb_setup()
   prog_data->dispatch_grf_start_reg_16 = payload.num_regs;
}
 
-   prog_data->curb_read_length = ALIGN(stage_prog_data->nr_params, 8) / 8;
+   prog_data->curb_read_length = ALIGN(stage_prog_data->nr_params + stage_prog_data->nr_ubo_params,
+   8) / 8;
 
/* Map the offsets in the UNIFORM file to fixed HW regs. */
foreach_block_and_inst(block, fs_inst, inst, cfg) {
-- 
1.9.1
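
As a worked example of the arithmetic in this hunk, here is a small sketch. It assumes ALIGN(x, 8) rounds x up to the next multiple of 8 and that one register holds 8 components, which is consistent with the 16 * 8 push limit used in assign_constant_locations(). Both helper names are hypothetical:

```c
/* Hypothetical stand-in for ALIGN(): round value up to a power-of-two
 * alignment.
 */
static unsigned
sketch_align(unsigned value, unsigned alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Number of 8-component registers needed for the regular uniforms plus
 * the UBO uniforms that were promoted to push constants.
 */
static unsigned
sketch_curb_read_length(unsigned nr_params, unsigned nr_ubo_params)
{
   return sketch_align(nr_params + nr_ubo_params, 8) / 8;
}
```

For instance, 10 regular components alone need 2 registers, while 10 regular plus 9 UBO components round up from 19 to 24 and need 3.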


[Mesa-dev] [RFC PATCH 33/40] i965/fs: Include ubo registers when assigning push_constant locations

2015-01-04 Thread Abdiel Janulgue

When assigning a block of registers to normal uniforms, pack the UBO
uniform registers next to it.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index d62050e..9a73691 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1670,7 +1670,7 @@ fs_visitor::assign_curb_setup()
 if (inst->src[i].file == UNIFORM) {
 int uniform_nr = inst->src[i].reg + inst->src[i].reg_offset;
 int constant_nr;
-if (uniform_nr >= 0 && uniform_nr < (int) uniforms) {
+if (uniform_nr >= 0 && uniform_nr < (int) (uniforms + ubo_uniforms)) {
constant_nr = push_constant_loc[uniform_nr];
 } else {
/* Section 5.11 of the OpenGL 4.1 spec says:
@@ -2105,8 +2105,9 @@ fs_visitor::move_uniform_array_access_to_pull_constants()
if (dispatch_width != 8)
   return;
 
-   pull_constant_loc = ralloc_array(mem_ctx, int, uniforms);
-   memset(pull_constant_loc, -1, sizeof(pull_constant_loc[0]) * uniforms);
+   unsigned int total_uniforms = uniforms + ubo_uniforms;
+   pull_constant_loc = ralloc_array(mem_ctx, int, total_uniforms);
+   memset(pull_constant_loc, -1, sizeof(pull_constant_loc[0]) * total_uniforms);
 
/* Walk through and find array access of uniforms.  Put a copy of that
 * uniform in the pull constant buffer.
@@ -2156,9 +2157,10 @@ fs_visitor::assign_constant_locations()
if (dispatch_width != 8)
   return;
 
+   unsigned int total_uniforms = uniforms + ubo_uniforms;
/* Find which UNIFORM registers are still in use. */
-   bool is_live[uniforms];
-   for (unsigned int i = 0; i < uniforms; i++) {
+   bool is_live[total_uniforms];
+   for (unsigned int i = 0; i < total_uniforms; i++) {
   is_live[i] = false;
}
 
@@ -2168,7 +2170,7 @@ fs_visitor::assign_constant_locations()
 continue;
 
  int constant_nr = inst->src[i].reg + inst->src[i].reg_offset;
- if (constant_nr >= 0 && constant_nr < (int) uniforms)
+ if (constant_nr >= 0 && constant_nr < (int) total_uniforms)
 is_live[constant_nr] = true;
   }
}
@@ -2184,9 +2186,9 @@ fs_visitor::assign_constant_locations()
unsigned int max_push_components = 16 * 8;
unsigned int num_push_constants = 0;
 
-   push_constant_loc = ralloc_array(mem_ctx, int, uniforms);
+   push_constant_loc = ralloc_array(mem_ctx, int, total_uniforms);
 
-   for (unsigned int i = 0; i < uniforms; i++) {
+   for (unsigned int i = 0; i < total_uniforms; i++) {
   if (!is_live[i] || pull_constant_loc[i] != -1) {
  /* This UNIFORM register is either dead, or has already been demoted
   * to a pull const.  Mark it as no longer living in the param[] array.
@@ -2210,7 +2212,8 @@ fs_visitor::assign_constant_locations()
   }
}
 
-   stage_prog_data->nr_params = num_push_constants;
+   stage_prog_data->nr_params = 0;
+   stage_prog_data->nr_ubo_params = ubo_uniforms;
 
/* Up until now, the param[] array has been indexed by reg + reg_offset
 * of UNIFORM registers.  Condense it to only contain the uniforms we
@@ -2224,6 +2227,7 @@ fs_visitor::assign_constant_locations()
 
   assert(remapped <= (int)i);
   stage_prog_data->param[remapped] = stage_prog_data->param[i];
+  stage_prog_data->nr_params++;
}
 }
 
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [RFC PATCH 32/40] i965/fs: Pack a uniform register next to UBO uniforms

2015-01-04 Thread Abdiel Janulgue

And vice versa. This allows us to combine UBOs and uniform registers
as push constants.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 1 +
 src/mesa/drivers/dri/i965/brw_fs.h   | 3 +++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 ++-
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 0f2c2c4..d62050e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1114,6 +1114,7 @@ fs_visitor::import_uniforms(fs_visitor *v)
this->push_constant_loc = v->push_constant_loc;
this->pull_constant_loc = v->pull_constant_loc;
this->uniforms = v->uniforms;
+   this->ubo_uniforms = v->ubo_uniforms;
this->param_size = v->param_size;
 }
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 06575a5..cec615e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -625,6 +625,9 @@ public:
/** Number of uniform variable components visited. */
unsigned uniforms;
 
+   /** Number of ubo uniform variable components visited. */
+   unsigned ubo_uniforms;
+
/** Byte-offset for the next available spot in the scratch space buffer. */
unsigned last_scratch;
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 8e092fb..597e125 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -151,7 +151,7 @@ fs_visitor::visit(ir_variable *ir)
 }
   }
} else if (ir->data.mode == ir_var_uniform) {
-  int param_index = uniforms;
+  int param_index = uniforms + ubo_uniforms;
 
   /* Thanks to the lower_ubo_reference pass, we will see only
* ir_binop_ubo_load expressions and not ir_dereference_variable for UBO
@@ -3864,6 +3864,7 @@ fs_visitor::init()
this->regs_live_at_ip = NULL;
 
this->uniforms = 0;
+   this->ubo_uniforms = 0;
this->last_scratch = 0;
this->pull_constant_loc = NULL;
this->push_constant_loc = NULL;
-- 
1.9.1


[Mesa-dev] [RFC PATCH 34/40] i965/fs: Specify which channels are enabled per gather constant entry

2015-01-04 Thread Abdiel Janulgue

A gather push constant table entry is able to fetch in 128-bit
increments from the constant buffer. A channel mask is provided to
narrow down which channels are loaded in that entry. This patch
generates the mask for enabled entries only.

The ir_swizzle visitor, which runs prior to this function, determines
which registers are loaded in the push constant array. This function
walks the live registers and adds the live entries to the channel
mask.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 9a73691..8a03581 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2170,6 +2170,25 @@ fs_visitor::assign_constant_locations()
 continue;
 
  int constant_nr = inst->src[i].reg + inst->src[i].reg_offset;
+ for (unsigned int p = 0; p < stage_prog_data->nr_gather_table; p++) {
+if (stage_prog_data->gather_table[p].reg == inst->src[i].reg) {
+   /* Is the constant a uniform or a ubo? */
+   unsigned offset = (constant_nr < (int) uniforms) ?
+  (constant_nr % 4): inst->src[i].reg_offset;
+   /* Generate the channel mask to determine which entries starting from
+* the offset above should be packed into the 16-byte entry. If the
+* offset is aligned to a 16-byte boundary, just set the position based on
+* the reg_offset. Otherwise, set the mask based on the position of the offset
+* from the boundary.
+*/
+   unsigned mask = ((prog_data->gather_table[p].const_offset % 16) == 0) ?
+  1 << offset : 1 << ((prog_data->gather_table[p].const_offset % 16) / 4);
+
+   stage_prog_data->gather_table[p].channel_mask |= mask;
+   break;
+}
+ }
+
  if (constant_nr >= 0 && constant_nr < (int) total_uniforms)
 is_live[constant_nr] = true;
   }
-- 
1.9.1
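
The mask rule in the comment of patch 34 can be sketched in isolation as
below. This is only an illustration under stated assumptions: the helper
names sketch_gather_channel_bit() and sketch_gather_channel_mask() are
hypothetical and not part of the series, and callers are assumed to pass a
component index in the range 0..3.

```c
#include <stddef.h>

/* Hypothetical helper: the single channel bit that one 32-bit component
 * contributes to a gather entry. const_offset is a byte offset into the
 * constant buffer; component is assumed to be in the range 0..3.
 */
static unsigned
sketch_gather_channel_bit(unsigned const_offset, unsigned component)
{
   unsigned misalignment = const_offset % 16;

   /* Aligned to a 16-byte boundary: the position comes from the component. */
   if (misalignment == 0)
      return 1u << component;

   /* Otherwise the position is the dword distance from the boundary. */
   return 1u << (misalignment / 4);
}

/* Hypothetical helper: OR together the bits of every live component,
 * mirroring the "channel_mask |= mask" accumulation in the patch.
 */
static unsigned
sketch_gather_channel_mask(unsigned const_offset,
                           const unsigned *components, size_t count)
{
   unsigned mask = 0;

   for (size_t i = 0; i < count; i++)
      mask |= sketch_gather_channel_bit(const_offset, components[i]);

   return mask;
}
```

For an aligned entry with live components 0 and 2 this yields 0x5; for an
entry that starts 4 bytes past a boundary the second channel is selected
regardless of the component index.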


[Mesa-dev] [RFC PATCH 35/40] i965: Upload UBO surfaces before emitting constant state packet

2015-01-04 Thread Abdiel Janulgue

Now that UBOs are uploaded as push constants, we need to obtain and
append the number of push constant entries generated by the UBO entry
fetches to the 3DSTATE_CONSTANT_* packets.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_state_upload.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index 612638e..30da446 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -200,6 +200,10 @@ static const struct brw_tracked_state *gen7_atoms[] =
 
&gen7_hw_binding_tables, /* Enable hw-generated binding tables for Haswell */
 
+   &brw_vs_ubo_surfaces,
+   &brw_gs_ubo_surfaces,
+   &brw_wm_ubo_surfaces,
+
&gen6_vs_push_constants, /* Before vs_state */
&gen6_gs_push_constants, /* Before gs_state */
&gen6_wm_push_constants, /* Before wm_surfaces and constant_buffer */
@@ -208,13 +212,10 @@ static const struct brw_tracked_state *gen7_atoms[] =
 * table upload must be last.
 */
&brw_vs_pull_constants,
-   &brw_vs_ubo_surfaces,
&brw_vs_abo_surfaces,
&brw_gs_pull_constants,
-   &brw_gs_ubo_surfaces,
&brw_gs_abo_surfaces,
&brw_wm_pull_constants,
-   &brw_wm_ubo_surfaces,
&brw_wm_abo_surfaces,
&gen6_renderbuffer_surfaces,
&brw_texture_surfaces,
-- 
1.9.1


[Mesa-dev] [RFC PATCH 37/40] i965: Assign hw-binding table entries for each ubo block.

2015-01-04 Thread Abdiel Janulgue

Cover the UBO blocks with binding table entries. Note that the resource
streamer is able to fetch the constant buffers referred to by the gather
table only if they are referenced by the hw-binding table generator.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_context.h  | 3 +++
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 6 ++
 2 files changed, 9 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 9877126..8d9adf6 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -668,6 +668,9 @@ struct brw_vs_prog_data {
 /** Start of hardware binding table index for uniform gather constant entries */
 #define BRW_UNIFORM_GATHER_INDEX_START 16
 
+/** Start of hardware binding table index for UBO gather constant entries */
+#define BRW_UBO_GATHER_INDEX_START (BRW_UNIFORM_GATHER_INDEX_START + 8)
+
 /* Note: brw_gs_prog_data_compare() must be updated when adding fields to
  * this struct!
  */
diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index 85a08d5..558a816 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -853,6 +853,7 @@ brw_upload_ubo_surfaces(struct brw_context *brw,
 
uint32_t *surf_offsets =
   &stage_state->surf_offset[prog_data->binding_table.ubo_start];
+   bool use_gather = (brw->gather_pool.bo != NULL);
 
for (int i = 0; i < shader->NumUniformBlocks; i++) {
   struct gl_uniform_buffer_binding *binding;
@@ -873,6 +874,11 @@ brw_upload_ubo_surfaces(struct brw_context *brw,
   bo->size - binding->Offset,
   &surf_offsets[i],
   dword_pitch);
+  if (use_gather) {
+ int bt_idx = BRW_UBO_GATHER_INDEX_START + i;
+ gen7_update_binding_table(brw, MESA_SHADER_FRAGMENT,
+   bt_idx, surf_offsets[i]);
+  }
}
 
if (shader->NumUniformBlocks)
-- 
1.9.1


[Mesa-dev] [RFC PATCH 36/40] i965/fs: Make SIMD16 work for UBO gather push constants

2015-01-04 Thread Abdiel Janulgue

Gather table entries were generated previously in the SIMD8 pass.
Just reuse those entries for SIMD16 so we don't generate a duplicate
set of registers.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 597e125..18626af 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1086,6 +1086,19 @@ fs_visitor::visit(ir_expression *ir)
   if (use_gather && const_uniform_block && const_offset &&
   (param_index < 128)) {
 
+ if (dispatch_width == 16) {
+for (int i = 0; i < (int) stage_prog_data->nr_gather_table; i++) {
+   if ((stage_prog_data->gather_table[i].const_block ==
+const_uniform_block->value.u[0]) &&
+   (stage_prog_data->gather_table[i].const_offset ==
+const_offset->value.u[0])) {
+  fs_reg reg(UNIFORM, stage_prog_data->gather_table[i].reg);
+  result = reg;
+  return;
+   }
+}
+ }
+
  fs_reg reg(UNIFORM, param_index);
  reg.type = brw_type_for_base_type(ir->type);
 
-- 
1.9.1
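
The SIMD16 reuse in patch 36 amounts to a linear search of the gather table
by (block, offset). A minimal sketch follows, assuming a standalone copy of
the entry layout from the series; the struct name sketch_gather_entry and
the function sketch_find_gather_reg() are hypothetical, and the -1 return
for "not found" is a convention of this sketch only.

```c
/* Standalone copy of the gather table entry fields used in the series. */
struct sketch_gather_entry {
   int reg;
   unsigned channel_mask;
   unsigned const_block;
   unsigned const_offset;
};

/* Hypothetical helper: return the UNIFORM register already assigned to the
 * given UBO block and byte offset, or -1 if no entry matches.
 */
static int
sketch_find_gather_reg(const struct sketch_gather_entry *table,
                       unsigned count, unsigned block, unsigned offset)
{
   for (unsigned i = 0; i < count; i++) {
      if (table[i].const_block == block && table[i].const_offset == offset)
         return table[i].reg;
   }

   return -1;
}
```

A caller in the SIMD16 pass would use the returned register when it is
non-negative and otherwise take whatever fallback path applies.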


[Mesa-dev] [RFC PATCH 39/40] i965: Generate separate gather entries for UBOs

2015-01-04 Thread Abdiel Janulgue

Now that we are able to use a gather table for fetching UBOs,
make a gather entry based on the table generated by the ir_binop_ubo_load
and uniform loads combined. At the moment, we separate this entry from
the previous uniform-only gather table because the current approach
to pack the uniform and ubo gather entries doesn't work if the
uniform entry size is greater than a vec4.

Ideally, we should only generate a single gather table for the uniforms
and the UBOs combined. But this will do for now.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/gen7_vs_state.c | 61 +--
 1 file changed, 42 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_vs_state.c b/src/mesa/drivers/dri/i965/gen7_vs_state.c
index 30ebec8..78dfc13 100644
--- a/src/mesa/drivers/dri/i965/gen7_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_vs_state.c
@@ -36,26 +36,49 @@ gen7_submit_gather_table(struct brw_context* brw,
  unsigned gather_opcode)
 {
uint32_t gather_dwords = 0;
-   /* Generate gather entry only for uniforms */
-   int num_consts = ALIGN(prog_data->nr_params, 4) / 4;
-   gather_dwords = 3 + num_consts;
-
-   /* Fetch the entries in 128-bit units. If the offset in the constant
-* buffer pointing to the entry is > 4096 bytes, round it to the next
-* gather bank slot. gen7_upload_constant_buffer_data() should have
-* made sure that the entries are uploaded in the correct slots.
-*/
-   unsigned bo_offset = (stage_state->const_bo_offset / 16) % 256;
-   unsigned bti = stage_state->const_bo_offset / 4096;
-
-   BEGIN_BATCH(gather_dwords);
-   OUT_BATCH(gather_opcode << 16 | (gather_dwords - 2));
-   OUT_BATCH(0x << 16 | 1 << 12);
-   OUT_BATCH(stage_state->push_const_offset);
-   for (int i = 0; i < num_consts; i++) {
-  OUT_BATCH((bo_offset + i) << 8 | 0xF << 4 | bti);
+   if (!prog_data->nr_ubo_params) {
+  /* Generate gather entry only for uniforms */
+  int num_consts = ALIGN(prog_data->nr_params, 4) / 4;
+  gather_dwords = 3 + num_consts;
+
+  /* Fetch the entries in 128-bit units. If the offset in the constant
+   * buffer pointing to the entry is > 4096 bytes, round it to the next
+   * gather bank slot. gen7_upload_constant_buffer_data() should have
+   * made sure that the entries are uploaded in the correct slots.
+   */
+  unsigned bo_offset = (stage_state->const_bo_offset / 16) % 256;
+  unsigned bti = stage_state->const_bo_offset / 4096;
+
+  BEGIN_BATCH(gather_dwords);
+  OUT_BATCH(gather_opcode << 16 | (gather_dwords - 2));
+  OUT_BATCH(0x << 16 | 1 << 12);
+  OUT_BATCH(stage_state->push_const_offset);
+  for (int i = 0; i < num_consts; i++) {
+ OUT_BATCH((bo_offset + i) << 8 | 0xF << 4 | bti);
+  }
+  ADVANCE_BATCH();
+   } else {
+  /* Generate gather entry for UBOs + uniforms combined */
+  gather_dwords = 3 + prog_data->nr_gather_table;
+
+  BEGIN_BATCH(gather_dwords);
+  OUT_BATCH(gather_opcode << 16 | (gather_dwords - 2));
+  OUT_BATCH(0x << 16 | 1 << 12);
+  OUT_BATCH(stage_state->push_const_offset);
+  for (int i = 0; i < prog_data->nr_gather_table; i++) {
+ /* Which bo are we referring to? The uniform constant buffer or
+  * the UBO block?
+  */
+ bool is_uniform = prog_data->gather_table[i].reg < prog_data->nr_params;
+ int cb_offset = (is_uniform ?
+  stage_state->const_bo_offset :
+  prog_data->gather_table[i].const_offset) / 16;
+ int bt_offset = is_uniform ? 0 : prog_data->gather_table[i].const_block + 8;
+
+ OUT_BATCH(cb_offset << 8 | prog_data->gather_table[i].channel_mask << 4 | bt_offset);
+  }
+  ADVANCE_BATCH();
}
-   ADVANCE_BATCH();
 }
 
 void
-- 
1.9.1
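
Each gather entry dword emitted in patch 39 is built from three fields. The
sketch below packs them the same way; the field widths are an assumption
inferred from the shifts and the "% 256" in the patch, not taken from
hardware documentation, and the helper name sketch_pack_gather_entry() is
hypothetical. Unlike the patch, the sketch masks each field so that an
out-of-range value cannot spill into its neighbour.

```c
#include <stdint.h>

/* Hypothetical helper: pack one gather entry.
 *   bits  8..15  constant buffer offset in 16-byte units (assumed width)
 *   bits  4..7   channel mask
 *   bits  0..3   binding table index (assumed width)
 */
static uint32_t
sketch_pack_gather_entry(unsigned byte_offset, unsigned channel_mask,
                         unsigned bt_index)
{
   uint32_t cb_offset = (byte_offset / 16) & 0xFF;

   return cb_offset << 8 | (channel_mask & 0xF) << 4 | (bt_index & 0xF);
}
```

For example, a full vec4 at byte offset 32 in binding table slot 0 packs to
0x2F0.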


[Mesa-dev] [RFC PATCH 28/40] i965/fs: Append uniform variables to the gather table

2015-01-04 Thread Abdiel Janulgue

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index bd9345e..2f592c9 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -180,6 +180,7 @@ fs_visitor::visit(ir_variable *ir)
   reg = new(this->mem_ctx) fs_reg(UNIFORM, param_index);
   reg->type = brw_type_for_base_type(ir->type);
 
+  stage_prog_data->gather_table[stage_prog_data->nr_gather_table++].reg = reg->reg;
} else if (ir->data.mode == ir_var_system_value) {
   switch (ir->data.location) {
   case SYSTEM_VALUE_BASE_VERTEX:
-- 
1.9.1


[Mesa-dev] [RFC PATCH 40/40] i965: Enable push constants for UBOs

2015-01-04 Thread Abdiel Janulgue

Switches on push constants whenever we have UBO entries.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/gen7_wm_state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c b/src/mesa/drivers/dri/i965/gen7_wm_state.c
index 923414e..1dfe697 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c
@@ -152,7 +152,7 @@ upload_ps_state(struct brw_context *brw)
 
dw4 |= (brw->max_wm_threads - 1) << max_threads_shift;
 
-   if (prog_data->base.nr_params > 0)
+   if (prog_data->base.nr_params > 0 || prog_data->base.nr_ubo_params > 0)
   dw4 |= GEN7_PS_PUSH_CONSTANT_ENABLE;
 
/* From the IVB PRM, volume 2 part 1, page 287:
-- 
1.9.1


[Mesa-dev] [RFC PATCH 24/40] i965/vec4: Associate the uniform location with either geometry or vertex stage

2015-01-04 Thread Abdiel Janulgue

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 09d79c8..0f22829 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -707,6 +707,8 @@ vec4_visitor::setup_uniform_values(ir_variable *ir)
   unsigned vector_count = (MAX2(storage->array_elements, 1) *
storage->type->matrix_columns);
 
+  brw->uniformstagemap[u] |= (stage == MESA_SHADER_GEOMETRY ?
+  _NEW_GEOMETRY_CONSTANTS : _NEW_VERTEX_CONSTANTS);
   for (unsigned s = 0; s < vector_count; s++) {
  assert(uniforms < uniform_array_size);
  uniform_vector_size[uniforms] = storage->type->vector_elements;
-- 
1.9.1


[Mesa-dev] [RFC PATCH 30/40] i965/fs: Append ir_binop_ubo_load entries to the gather table

2015-01-04 Thread Abdiel Janulgue

At the moment, this is only possible if the const block and offset are
immediate values (constants). Otherwise, just fall back to the previous
method of uploading the UBO constant data to GRF using pull constants.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_context.h  |  3 +++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 24 
 2 files changed, 27 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 8eddc54..9877126 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -347,6 +347,7 @@ struct brw_stage_prog_data {
 
GLuint nr_params;   /**< number of float params/constants */
GLuint nr_pull_params;
+   GLuint nr_ubo_params;
GLuint nr_gather_table;
 
unsigned curb_read_length;
@@ -371,6 +372,8 @@ struct brw_stage_prog_data {
struct {
   int reg;
   unsigned channel_mask;
+  unsigned const_block;
+  unsigned const_offset;
} gather_table[128]; /** equal to max_push_components */
 };
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 2f592c9..c0499b6 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1077,6 +1077,30 @@ fs_visitor::visit(ir_expression *ir)
*/
   ir_constant *const_uniform_block = ir->operands[0]->as_constant();
   ir_constant *const_offset = ir->operands[1]->as_constant();
+
+  /* Use gather push constants if at all possible, otherwise just
+   * fall back to pull constants for UBOs
+   */
+  bool use_gather = (brw->has_resource_streamer && brw->use_gather_constants);
+  int param_index = uniforms + ubo_uniforms;
+  if (use_gather && const_uniform_block && const_offset) {
+
+ fs_reg reg(UNIFORM, param_index);
+ reg.type = brw_type_for_base_type(ir->type);
+
+ result = reg;
+ ubo_uniforms += ir->type->vector_elements;
+
+ int gather = stage_prog_data->nr_gather_table++;
+ stage_prog_data->gather_table[gather].reg = reg.reg;
+ stage_prog_data->gather_table[gather].const_block =
+const_uniform_block->value.u[0];
+ stage_prog_data->gather_table[gather].const_offset =
+const_offset->value.u[0];
+
+ break;
+  }
+
   fs_reg surf_index;
 
   if (const_uniform_block) {
-- 
1.9.1
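
The bookkeeping that patch 30 adds for a constant-indexed UBO load can be
sketched as an append to a fixed-size table. This is an illustration only:
the type and function names are hypothetical, and the capacity check is an
assumption of the sketch (this patch itself does not check the bound; a
"param_index < 128" test appears later in the series).

```c
#define SKETCH_MAX_GATHER 128 /* mirrors gather_table[128] in the series */

struct sketch_ubo_gather_entry {
   int reg;
   unsigned channel_mask;
   unsigned const_block;
   unsigned const_offset;
};

struct sketch_ubo_gather_table {
   unsigned count;
   struct sketch_ubo_gather_entry entries[SKETCH_MAX_GATHER];
};

/* Hypothetical helper: record a UBO load with constant block and offset.
 * Returns the new entry's index, or -1 when the table is full so that the
 * caller can fall back to pull constants.
 */
static int
sketch_append_ubo_entry(struct sketch_ubo_gather_table *table, int reg,
                        unsigned block, unsigned offset)
{
   if (table->count >= SKETCH_MAX_GATHER)
      return -1;

   int index = (int) table->count++;
   table->entries[index].reg = reg;
   table->entries[index].channel_mask = 0; /* filled in by a later pass */
   table->entries[index].const_block = block;
   table->entries[index].const_offset = offset;

   return index;
}
```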


[Mesa-dev] [RFC PATCH 22/40] i965: Implement fine-grained uniform uploads

2015-01-04 Thread Abdiel Janulgue

Determine which shader stages changed their uniforms and upload only
the uniforms that belong to those stages.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_context.h   | 2 ++
 src/mesa/drivers/dri/i965/brw_program.c   | 9 +
 src/mesa/drivers/dri/i965/gen7_vs_state.c | 4 
 3 files changed, 15 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index f384008..6706b4a 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1357,6 +1357,8 @@ struct brw_context
   uint32_t next_offset;
} constants;
 
+   uint64_t uniformstagemap[MAX_UNIFORMS];
+
struct {
   uint32_t state_offset;
   uint32_t blend_state_offset;
diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
index d9a3f05..c1eec8a 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -179,6 +179,14 @@ brwProgramStringNotify(struct gl_context *ctx,
return true;
 }
 
+static void
+brw_uniform_update(struct gl_context *ctx, GLint location)
+{
+   struct brw_context *brw = brw_context(ctx);
+
+   brw->state.dirty.mesa |= brw->uniformstagemap[location];
+}
+
 void
 brw_add_texrect_params(struct gl_program *prog)
 {
@@ -236,6 +244,7 @@ void brwInitFragProgFuncs( struct dd_function_table *functions )
 
functions->NewShader = brw_new_shader;
functions->LinkShader = brw_link_shader;
+   functions->UniformUpdate = brw_uniform_update;
 }
 
 void
diff --git a/src/mesa/drivers/dri/i965/gen7_vs_state.c b/src/mesa/drivers/dri/i965/gen7_vs_state.c
index 85bd56f..269612b 100644
--- a/src/mesa/drivers/dri/i965/gen7_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_vs_state.c
@@ -70,6 +70,10 @@ gen7_upload_constant_buffer_data(struct brw_context* brw,
   _NEW_FRAGMENT_CONSTANTS
};
 
+   if (!(brw->state.dirty.brw & BRW_NEW_BATCH) &&
+   (!prog_data->nr_params || !(brw->state.dirty.mesa & const_state_stage[stage_state->stage])))
+  return;
+
/* If current constant data does not fit in current constant buffer bank,
 * move to next slot. 
 */
-- 
1.9.1
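
The flow in patch 22 is: each uniform location carries a mask of the stages
that use it, a uniform update ORs that mask into the dirty state, and each
stage uploads only when its bit is set or a new batch has started. A minimal
standalone sketch, assuming hypothetical names throughout (sketch_ctx,
SKETCH_*), with flag values chosen to mirror the 64-bit flags of patch 20;
the bounds check on location is an addition of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_UNIFORMS 8
#define SKETCH_NEW_VERTEX_CONSTANTS   (1ULL << 32)
#define SKETCH_NEW_GEOMETRY_CONSTANTS (1ULL << 33)
#define SKETCH_NEW_FRAGMENT_CONSTANTS (1ULL << 34)

struct sketch_ctx {
   uint64_t stage_map[SKETCH_MAX_UNIFORMS]; /* stages using each location */
   uint64_t dirty;                          /* accumulated dirty flags */
};

/* Counterpart of brw_uniform_update(): flag the stages that use the
 * updated location. Out-of-range locations are ignored in this sketch.
 */
static void
sketch_uniform_update(struct sketch_ctx *ctx, int location)
{
   if (location >= 0 && location < SKETCH_MAX_UNIFORMS)
      ctx->dirty |= ctx->stage_map[location];
}

/* Counterpart of the early return added to
 * gen7_upload_constant_buffer_data(): upload on a new batch, or when the
 * stage has params and its constants are dirty.
 */
static bool
sketch_stage_needs_upload(const struct sketch_ctx *ctx, uint64_t stage_flag,
                          unsigned nr_params, bool new_batch)
{
   if (new_batch)
      return true;

   return nr_params > 0 && (ctx->dirty & stage_flag) != 0;
}
```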


[Mesa-dev] [RFC PATCH 27/40] i965: Build a dynamic gather table for UBO push constant entries

2015-01-04 Thread Abdiel Janulgue

The resource streamer is able to gather and pack sparsely-located
constant data from any buffer object representing a UBO block.
This patch adds support for keeping track of these constant data
fetches in a gather table.

We only allocate a maximum of 128 entries. This limitation is taken
from a comment in assign_constant_locations() that we allow 16
registers (128 uniform components) as push constants.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_context.h | 5 +
 src/mesa/drivers/dri/i965/brw_program.c | 3 +++
 src/mesa/drivers/dri/i965/brw_wm.c  | 3 +++
 3 files changed, 11 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 0337bfd..8eddc54 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -347,6 +347,7 @@ struct brw_stage_prog_data {
 
GLuint nr_params;   /**< number of float params/constants */
GLuint nr_pull_params;
+   GLuint nr_gather_table;
 
unsigned curb_read_length;
unsigned total_scratch;
@@ -367,6 +368,10 @@ struct brw_stage_prog_data {
 */
const gl_constant_value **param;
const gl_constant_value **pull_param;
+   struct {
+  int reg;
+  unsigned channel_mask;
+   } gather_table[128]; /** equal to max_push_components */
 };
 
 /* Data about a particular attempt to compile a program.  Note that
diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
index c1eec8a..2c3d374 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -558,6 +558,9 @@ brw_stage_prog_data_compare(const struct brw_stage_prog_data *a,
if (memcmp(a->pull_param, b->pull_param, a->nr_pull_params * sizeof(void *)))
   return false;
 
+   if (memcmp(a->gather_table, b->gather_table, sizeof(a->gather_table)))
+  return false;
+
return true;
 }
 
diff --git a/src/mesa/drivers/dri/i965/brw_wm.c b/src/mesa/drivers/dri/i965/brw_wm.c
index e7939f0..1be93c3 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.c
+++ b/src/mesa/drivers/dri/i965/brw_wm.c
@@ -204,6 +204,9 @@ bool do_wm_prog(struct brw_context *brw,
   rzalloc_array(NULL, const gl_constant_value *, param_count);
prog_data.base.nr_params = param_count;
 
+   prog_data.base.nr_gather_table = 0;
+   memset(prog_data.base.gather_table, 0, sizeof(prog_data.base.gather_table));
+
prog_data.barycentric_interp_modes =
   brw_compute_barycentric_interp_modes(brw, key->flat_shade,
key->persample_shading,
-- 
1.9.1


[Mesa-dev] [RFC PATCH 23/40] i965/fs: Associate the uniform location for the fragment shader

2015-01-04 Thread Abdiel Janulgue

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 3639ed2..0f2c2c4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1144,6 +1144,7 @@ fs_visitor::setup_uniform_values(ir_variable *ir)
  continue;
   }
 
+  brw->uniformstagemap[u] |= _NEW_FRAGMENT_CONSTANTS;
   unsigned slots = storage->type->component_slots();
   if (storage->array_elements)
  slots *= storage->array_elements;
-- 
1.9.1


[Mesa-dev] [RFC PATCH 25/40] i965: Disable gather push constants for null constants

2015-01-04 Thread Abdiel Janulgue

Programming null constants with gather constant tables seems to
be unsupported and results in a GPU lockup even with the prescribed
GPU workarounds in the bspec. I found out by trial and error that
disabling the gather constant feature for null constants is the only
way to work around the issue.

This patch batches the null push constant state commands if possible so
we don't unnecessarily send the gather push constants disable command
every time we send a null constant state in the pipeline.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_context.c   |  1 +
 src/mesa/drivers/dri/i965/brw_context.h   |  1 +
 src/mesa/drivers/dri/i965/gen7_disable.c  |  4 
 src/mesa/drivers/dri/i965/gen7_vs_state.c | 20 
 4 files changed, 26 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index a6e73ce..175a7c8 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -861,6 +861,7 @@ brwCreateContext(gl_api api,
brw->hw_bt_pool.bo = 0;
brw->gather_pool.bo = 0;
brw->constants.bo = 0;
+   brw->enabled_stage_const = 0x7;
 
if (INTEL_DEBUG & DEBUG_SHADER_TIME)
   brw_init_shader_time(brw);
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 6706b4a..0337bfd 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1358,6 +1358,7 @@ struct brw_context
} constants;
 
uint64_t uniformstagemap[MAX_UNIFORMS];
+   GLbitfield enabled_stage_const;
 
struct {
   uint32_t state_offset;
diff --git a/src/mesa/drivers/dri/i965/gen7_disable.c b/src/mesa/drivers/dri/i965/gen7_disable.c
index 2c43cd7..ba7fbf8 100644
--- a/src/mesa/drivers/dri/i965/gen7_disable.c
+++ b/src/mesa/drivers/dri/i965/gen7_disable.c
@@ -29,6 +29,8 @@
 static void
 disable_stages(struct brw_context *brw)
 {
+   gen7_toggle_gather_constants(brw, false);
+
/* Disable the HS Unit */
BEGIN_BATCH(7);
OUT_BATCH(_3DSTATE_CONSTANT_HS << 16 | (7 - 2));
@@ -87,6 +89,8 @@ disable_stages(struct brw_context *brw)
OUT_BATCH(_3DSTATE_BINDING_TABLE_POINTERS_DS << 16 | (2 - 2));
OUT_BATCH(0);
ADVANCE_BATCH();
+
+   gen7_toggle_gather_constants(brw, true);
 }
 
 const struct brw_tracked_state gen7_disable_stages = {
diff --git a/src/mesa/drivers/dri/i965/gen7_vs_state.c b/src/mesa/drivers/dri/i965/gen7_vs_state.c
index 269612b..30ebec8 100644
--- a/src/mesa/drivers/dri/i965/gen7_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_vs_state.c
@@ -116,6 +116,22 @@ gen7_upload_constant_state(struct brw_context *brw,
int const_loc = use_gather ? 16 : 0;
int dwords = brw->gen >= 8 ? 11 : 7;
 
+   /* Disable gather constants when zeroing constant states */
+   bool gather_switched_off = false;
+   if (use_gather) {
+  if (active) {
+ brw->enabled_stage_const |= (1 << stage_state->stage);
+  } else {
+ if (brw->enabled_stage_const & (1 << stage_state->stage)) {
+gen7_toggle_gather_constants(brw, false);
+gather_switched_off = true;
+brw->enabled_stage_const &= ~(1 << stage_state->stage);
+ } else {
+return;
+ }
+  }
+   }
+
struct brw_stage_prog_data *prog_data = stage_state->prog_data;
if (prog_data && use_gather && active) {
   gen7_submit_gather_table(brw, stage_state, prog_data, gather_opcode);
@@ -145,6 +161,10 @@ gen7_upload_constant_state(struct brw_context *brw,
}
 
ADVANCE_BATCH();
+
+   /* Re-enable gather again if required */
+   if (gather_switched_off)
+  gen7_toggle_gather_constants(brw, true);
 }
 
 
-- 
1.9.1
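
The per-stage tracking in patch 25 is a small state machine over the
enabled_stage_const bitmask. The sketch below isolates that decision for the
case where gather constants are in use; the enum and function names are
hypothetical and only model which of the three paths in the patch is taken.

```c
#include <stdbool.h>

enum sketch_const_action {
   SKETCH_EMIT,            /* active stage: emit constant state normally */
   SKETCH_EMIT_GATHER_OFF, /* first null state: emit with gather disabled */
   SKETCH_SKIP             /* null state already sent: nothing to emit */
};

/* Hypothetical helper: decide what to do for one stage and update the
 * bitmask of stages that currently have non-null constants programmed.
 */
static enum sketch_const_action
sketch_const_state_action(unsigned *enabled_stage_const, unsigned stage,
                          bool active)
{
   unsigned bit = 1u << stage;

   if (active) {
      *enabled_stage_const |= bit;
      return SKETCH_EMIT;
   }

   if (*enabled_stage_const & bit) {
      *enabled_stage_const &= ~bit;
      return SKETCH_EMIT_GATHER_OFF;
   }

   return SKETCH_SKIP;
}
```

Starting from the initial value 0x7 set in brwCreateContext(), the first
null state for a stage takes the gather-off path and later ones are skipped
until the stage becomes active again.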


[Mesa-dev] [RFC PATCH 26/40] i965: Allocate space on the gather pool for UBO entries

2015-01-04 Thread Abdiel Janulgue

In addition, append the UBO entries to stage_state->push_const_size

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/gen6_vs_state.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_vs_state.c b/src/mesa/drivers/dri/i965/gen6_vs_state.c
index 5e71a44..9b32ec2 100644
--- a/src/mesa/drivers/dri/i965/gen6_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_vs_state.c
@@ -59,7 +59,9 @@ gen6_upload_push_constants(struct brw_context *brw,
struct gl_context *ctx = &brw->ctx;
 
if (prog_data->nr_params == 0) {
-  stage_state->push_const_size = 0;
+  if (prog_data->nr_ubo_params == 0) {
+ stage_state->push_const_size = 0;
+  }
} else {
   /* Updates the ParamaterValues[i] pointers for all parameters of the
* basic type of PROGRAM_STATE_VAR.
@@ -130,6 +132,14 @@ gen6_upload_push_constants(struct brw_context *brw,
  stage_state->push_const_offset = brw->gather_pool.next_offset;
  brw->gather_pool.next_offset += (ALIGN(num_consts, 4) / 4) * 64;
   }
+
+  if (prog_data->nr_ubo_params > 0) {
+ stage_state->push_const_size += ALIGN(prog_data->nr_ubo_params, 8) / 8;
+
+ uint32_t num_constants = ALIGN(prog_data->nr_ubo_params, 4) / 4;
+ stage_state->push_const_offset = brw->gather_pool.next_offset;
+ brw->gather_pool.next_offset += (ALIGN(num_constants, 4) / 4) * 64;
+  }
}
 }
 
-- 
1.9.1
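
The size arithmetic added in patch 26 can be checked by hand with a small
standalone sketch. SKETCH_ALIGN is a local stand-in for the driver's ALIGN
and assumes a power-of-two alignment; the two function names are
hypothetical.

```c
/* Round value up to a power-of-two alignment (stand-in for ALIGN). */
#define SKETCH_ALIGN(value, alignment) \
   (((value) + (alignment) - 1u) & ~((alignment) - 1u))

/* Amount added to push_const_size: UBO components in units of 8. */
static unsigned
sketch_ubo_push_size(unsigned nr_ubo_params)
{
   return SKETCH_ALIGN(nr_ubo_params, 8u) / 8u;
}

/* Bytes by which gather_pool.next_offset advances: components are grouped
 * into vec4 constants, and every four constants take 64 bytes.
 */
static unsigned
sketch_ubo_pool_bytes(unsigned nr_ubo_params)
{
   unsigned num_constants = SKETCH_ALIGN(nr_ubo_params, 4u) / 4u;

   return (SKETCH_ALIGN(num_constants, 4u) / 4u) * 64u;
}
```

With 17 UBO components, for instance, push_const_size grows by 3 and the
pool offset advances by 128 bytes.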


[Mesa-dev] [RFC PATCH 18/40] i965: Don't use gather push constants in BLORP

2015-01-04 Thread Abdiel Janulgue

Switch off gather push constants in BLORP. BLORP requires only a
set of simple constants, so there is no need for the extra complexity
of programming a gather table entry into the pipeline.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/gen7_blorp.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
index f6fb904..da21a55 100644
--- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
@@ -847,6 +847,7 @@ gen7_blorp_exec(struct brw_context *brw,
wm_surf_offset_texture);
   sampler_offset = gen6_blorp_emit_sampler_state(brw, params);
}
+   gen7_toggle_gather_constants(brw, false);
gen7_blorp_emit_vs_disable(brw, params);
gen7_blorp_emit_hs_disable(brw, params);
gen7_blorp_emit_te_disable(brw, params);
-- 
1.9.1


[Mesa-dev] [RFC PATCH 20/40] mesa: Publish uniform update state flags

2015-01-04 Thread Abdiel Janulgue

Add a UniformUpdate driver hook and per-stage constant state flags, and
trigger the hook when uniforms are updated.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/main/dd.h  |  2 ++
 src/mesa/main/mtypes.h  |  3 +++
 src/mesa/main/state.c   | 10 +-
 src/mesa/main/uniform_query.cpp |  6 ++
 4 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
index 8c737e0..1be51fc 100644
--- a/src/mesa/main/dd.h
+++ b/src/mesa/main/dd.h
@@ -592,6 +592,8 @@ struct dd_function_table {
void (*TexParameter)(struct gl_context *ctx,
 struct gl_texture_object *texObj,
 GLenum pname, const GLfloat *params);
+   void (*UniformUpdate)(struct gl_context *ctx,
+ GLint location);
/** Set the viewport */
void (*Viewport)(struct gl_context *ctx);
/*@}*/
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 12ab3e8..e762199 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3933,6 +3933,9 @@ struct gl_matrix_stack
 /* gap, re-use for core Mesa state only; use ctx->DriverFlags otherwise */
 #define _NEW_VARYING_VP_INPUTS (1 << 31) /**< gl_context::varying_vp_inputs */
 
+#define _NEW_VERTEX_CONSTANTS   (1ULL << 32)
+#define _NEW_GEOMETRY_CONSTANTS (1ULL << 33)
+#define _NEW_FRAGMENT_CONSTANTS (1ULL << 34)
 #define _NEW_ALL ~0
 /*@}*/
 
diff --git a/src/mesa/main/state.c b/src/mesa/main/state.c
index ccf60de..553ea8a 100644
--- a/src/mesa/main/state.c
+++ b/src/mesa/main/state.c
@@ -232,16 +232,16 @@ update_program(struct gl_context *ctx)
 /**
  * Examine shader constants and return either _NEW_PROGRAM_CONSTANTS or 0.
  */
-static GLbitfield
+static GLbitfield64
 update_program_constants(struct gl_context *ctx)
 {
-   GLbitfield new_state = 0x0;
+   GLbitfield64 new_state = 0x0;
 
if (ctx->FragmentProgram._Current) {
   const struct gl_program_parameter_list *params =
  ctx->FragmentProgram._Current->Base.Parameters;
   if (params && params->StateFlags & ctx->NewState) {
- new_state |= _NEW_PROGRAM_CONSTANTS;
+ new_state |= (_NEW_PROGRAM_CONSTANTS | _NEW_FRAGMENT_CONSTANTS);
   }
}
 
@@ -251,7 +251,7 @@ update_program_constants(struct gl_context *ctx)
   /*FIXME: StateFlags is always 0 because we have unnamed constant
*   not state changes */
   if (params /*&& params->StateFlags & ctx->NewState*/) {
- new_state |= _NEW_PROGRAM_CONSTANTS;
+ new_state |= (_NEW_PROGRAM_CONSTANTS | _NEW_GEOMETRY_CONSTANTS);
   }
}
 
@@ -259,7 +259,7 @@ update_program_constants(struct gl_context *ctx)
   const struct gl_program_parameter_list *params =
  ctx->VertexProgram._Current->Base.Parameters;
   if (params && params->StateFlags & ctx->NewState) {
- new_state |= _NEW_PROGRAM_CONSTANTS;
+ new_state |= (_NEW_PROGRAM_CONSTANTS | _NEW_VERTEX_CONSTANTS);
   }
}
 
diff --git a/src/mesa/main/uniform_query.cpp b/src/mesa/main/uniform_query.cpp
index 32870d0..14837ec 100644
--- a/src/mesa/main/uniform_query.cpp
+++ b/src/mesa/main/uniform_query.cpp
@@ -700,6 +700,9 @@ _mesa_uniform(struct gl_context *ctx, struct gl_shader_program *shProg,
   count = MIN2(count, (int) (uni->array_elements - offset));
}
 
+   if (ctx->Driver.UniformUpdate)
+  ctx->Driver.UniformUpdate(ctx, location);
+
FLUSH_VERTICES(ctx, _NEW_PROGRAM_CONSTANTS);
 
/* Store the data in the "actual type" backing storage for the uniform.
@@ -866,6 +869,9 @@ _mesa_uniform_matrix(struct gl_context *ctx, struct gl_shader_program *shProg,
   count = MIN2(count, (int) (uni->array_elements - offset));
}
 
+   if (ctx->Driver.UniformUpdate)
+  ctx->Driver.UniformUpdate(ctx, location);
+
FLUSH_VERTICES(ctx, _NEW_PROGRAM_CONSTANTS);
 
/* Store the data in the "actual type" backing storage for the uniform.
-- 
1.9.1
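
The patch above widens the return type of update_program_constants() to
GLbitfield64 because the new per-stage flags sit at bit 32 and above. A minimal
sketch of why that matters, assuming a 32-bit legacy bitfield; the macro names
ending in _EX and both helper functions are hypothetical and only illustrate
the truncation, they are not part of Mesa:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values. The low flag stands in for an existing 32-bit
 * state bit; its exact position here is an assumption. */
#define NEW_PROGRAM_CONSTANTS_EX  (1ULL << 27)
#define NEW_FRAGMENT_CONSTANTS_EX (1ULL << 34)

/* Model of storing a 64-bit state mask into a 32-bit bitfield. */
static uint32_t narrow_flags(uint64_t flags)
{
   return (uint32_t)flags; /* bits 32..63 are discarded */
}

/* Returns nonzero if 'flag' is set in 'state'. */
static int has_flag64(uint64_t state, uint64_t flag)
{
   assert(flag != 0);
   return (state & flag) != 0;
}
```

With a 32-bit variable the stage-specific bit silently disappears while the
generic bit survives, which is why every variable on the path from
update_program_constants() to the driver needs to be 64 bits wide.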

[Mesa-dev] [RFC PATCH 11/40] i965/gen7.5: Flush on-chip binding table to pool

2015-01-04 Thread Abdiel Janulgue

Normally, the CS will just consume the binding table pointer commands
as pipelined state. When the RS is enabled, however, the RS flushes whatever
edited surface state entries of our on-chip binding table to the binding
table pool before passing the command on to the CS.

Note that when the resource streamer is enabled, the binding table pointer
offset is relative to the binding table pool base address instead of the
surface state base address.

In addition, 3DSTATE_BINDING_TABLE_POINTERS_* expects btp offsets of up to 64k
when resource streamer hardware binding tables are enabled. However, the bt
entry within the command only allows offsets up to 32k. Therefore, ensure that
the offset fits within the highest bit of the command.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_binding_tables.c | 3 ++-
 src/mesa/drivers/dri/i965/gen7_blorp.cpp   | 4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c b/src/mesa/drivers/dri/i965/brw_binding_tables.c
index d97b3d9..03e7a4a 100644
--- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
+++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
@@ -106,7 +106,8 @@ brw_upload_binding_table(struct brw_context *brw,
 
   BEGIN_BATCH(2);
   OUT_BATCH(packet_name << 16 | (2 - 2));
-  OUT_BATCH(stage_state->bind_bo_offset);
+  OUT_BATCH(brw->hw_bt_pool.bo ? stage_state->bind_bo_offset >> 1
+: stage_state->bind_bo_offset);
   ADVANCE_BATCH();
 
   if (brw->has_resource_streamer)
diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
index 3d5c7df..f6fb904 100644
--- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
@@ -560,7 +560,9 @@ gen7_blorp_emit_binding_table_pointers_ps(struct brw_context *brw,
 {
BEGIN_BATCH(2);
OUT_BATCH(_3DSTATE_BINDING_TABLE_POINTERS_PS << 16 | (2 - 2));
-   OUT_BATCH(wm_bind_bo_offset);
+   /* For RS: fit maximum 64k binding table offset within high bits */
+   OUT_BATCH(brw->hw_bt_pool.bo ? wm_bind_bo_offset >> 1
+ : wm_bind_bo_offset);
ADVANCE_BATCH();
 }
 
-- 
1.9.1
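
The offset handling described in this patch can be sketched as a small helper.
This is only an illustration of the shift applied in the diff; btp_dword() is a
hypothetical name, and the 64k bound is taken from the commit message rather
than from hardware documentation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value written to the second dword of 3DSTATE_BINDING_TABLE_POINTERS_*.
 * With hardware binding tables the pool offset may reach 64k, so it is
 * halved to fit a field that otherwise only covers 32k. */
static uint32_t btp_dword(uint32_t bind_bo_offset, bool hw_bt_enabled)
{
   if (hw_bt_enabled) {
      assert(bind_bo_offset < 65536); /* bound stated in the commit message */
      return bind_bo_offset >> 1;
   }
   return bind_bo_offset;
}
```

For example, a pool offset of 0xFFC0 becomes 0x7FE0 when hardware binding
tables are active and is passed through unchanged otherwise.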

[Mesa-dev] [RFC PATCH 14/40] i965: Initialize and partition a 32k constant buffer for uniforms.

2015-01-04 Thread Abdiel Janulgue

Uniforms are uploaded to this buffer instead of the space allocated from
the dynamic state base address.

This buffer is sliced into eight 4k-sized banks, each accessible
by SURFACE_STATE entries. These banks are laid out in such a way that all
shader stages can upload to whatever next free bank is available. This way,
we avoid generating numerous SURFACE_STATE entries every time a uniform is
uploaded.

Using the gather table, we can refer to the constant data using the
hw-binding table index plus the constant buffer offset.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_binding_tables.c | 31 ++
 src/mesa/drivers/dri/i965/brw_context.c|  1 +
 src/mesa/drivers/dri/i965/brw_context.h|  9 
 3 files changed, 41 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c b/src/mesa/drivers/dri/i965/brw_binding_tables.c
index b91e5d9..4138509 100644
--- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
+++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
@@ -233,6 +233,33 @@ gen7_init_gather_pool(struct brw_context *brw)
131072, 4096);
   brw->gather_pool.next_offset = 0;
}
+
+   if (!brw->constants.bo) {
+  static const int cb_bank_size = 4096;
+  static const int num_cb_banks = 8;
+
+  brw->constants.bo = drm_intel_bo_alloc(brw->bufmgr, "constants_bo",
+ 32768, 4096);
+  drm_intel_gem_bo_map_gtt(brw->constants.bo);
+  brw->constants.next_offset = 0;
+
+  assert(is_power_of_two(cb_bank_size));
+
+  uint32_t cb_offset = 0;
+  uint32_t surf_offset = 0;
+  for (int i = 0; i < num_cb_banks; i++) {
+ brw_create_constant_surface(brw, brw->constants.bo, cb_offset,
+ cb_bank_size,
+ &surf_offset, false);
+ cb_offset += cb_bank_size;
+ unsigned index = BRW_UNIFORM_GATHER_INDEX_START + i;
+
+ gen7_update_binding_table(brw, MESA_SHADER_VERTEX, index,
+   surf_offset);
+ gen7_update_binding_table(brw, MESA_SHADER_FRAGMENT, index,
+   surf_offset);
+  }
+   }
 }
 
 void
@@ -313,7 +340,11 @@ void
 gen7_reset_rs_pool_offsets(struct brw_context *brw)
 {
brw->hw_bt_pool.next_offset = bt_size;
+   brw->constants.next_offset = 0;
brw->gather_pool.next_offset = 0;
+
+   drm_intel_bo_unreference(brw->constants.bo);
+   brw->constants.bo = 0;
 }
 
 const struct brw_tracked_state gen7_hw_binding_tables = {
diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index 32bbdc2..d18733d 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -860,6 +860,7 @@ brwCreateContext(gl_api api,
 
brw->hw_bt_pool.bo = 0;
brw->gather_pool.bo = 0;
+   brw->constants.bo = 0;
 
if (INTEL_DEBUG & DEBUG_SHADER_TIME)
   brw_init_shader_time(brw);
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index b205773..e2a6415 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -657,6 +657,9 @@ struct brw_vs_prog_data {
 
 #define SURF_INDEX_GEN6_SOL_BINDING(t) (t)
 
+/** Start of hardware binding table index for uniform gather constant entries */
+#define BRW_UNIFORM_GATHER_INDEX_START 16
+
 /* Note: brw_gs_prog_data_compare() must be updated when adding fields to
  * this struct!
  */
@@ -1347,6 +1350,12 @@ struct brw_context
   uint32_t next_offset;
} gather_pool;
 
+   /* Constant data shared by the shader stages */
+   struct {
+  drm_intel_bo *bo;
+  uint32_t next_offset;
+   } constants;
+
struct {
   uint32_t state_offset;
   uint32_t blend_state_offset;
-- 
1.9.1
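
The bank layout set up by this patch can be sketched as a mapping from a byte
offset in the 32k constant buffer to a binding table index and an offset within
that bank. The constants mirror the values in the diff (4096-byte banks, eight
banks, start index 16); struct cb_location and locate_constant() are
hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

enum {
   CB_BANK_SIZE = 4096,      /* cb_bank_size in the patch */
   NUM_CB_BANKS = 8,         /* num_cb_banks in the patch */
   GATHER_INDEX_START = 16   /* BRW_UNIFORM_GATHER_INDEX_START */
};

struct cb_location {
   uint32_t bt_index;     /* hardware binding table index of the bank */
   uint32_t bank_offset;  /* byte offset inside that bank */
};

/* Translate a byte offset in the constant buffer into (bank, offset). */
static struct cb_location locate_constant(uint32_t byte_offset)
{
   struct cb_location loc;

   assert(byte_offset < CB_BANK_SIZE * NUM_CB_BANKS);
   loc.bt_index = GATHER_INDEX_START + byte_offset / CB_BANK_SIZE;
   loc.bank_offset = byte_offset % CB_BANK_SIZE;
   return loc;
}
```

Under these assumptions, byte offset 4100 falls in the second bank, so it is
addressed as binding table index 17 with offset 4.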

[Mesa-dev] Resource streamer redux: Enable gather push constants

2015-01-04 Thread Abdiel Janulgue

I sent previous patches enabling hardware-generated binding tables. By
themselves, hw-binding tables gave no performance improvement; they are just
a means to an end. The real meat of the RS hardware is the optimized ability
to map constants to the GRF.

Gather push constants is basically an optimized way of programming push
constants. What it gives us is the ability to gather and pack constant data that
may reside in a non-contiguous block of any arbitrary buffer object without
incurring additional overhead. The goal of this series is to allow registers
representing combined UBO blocks and uniforms to be sequentially allocated and
packed tightly without holes, thus (1) reducing register pressure and
(2) minimizing the use of pull constant loads.

To achieve the same results without the resource streamer, the driver may have
to manually rearrange, reformat, and repack the entries within the already
uploaded UBO block and any uniform buffer that may be present so that the
entries carefully match the layout of the allocated GRFs, all of which
would happen every frame. It gets even worse if a shader fetches its constants
from two or more different constant buffer blocks.

The resource streamer achieves this hardware packing of GRF entries by parsing
a gather table containing hardware-binding table indices, offsets, and channel
masks to gather the sparsely-located constant data.
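
As a rough software model of what a gather table expresses, the sketch below
packs selected channels from several source buffers into one contiguous run.
It is not the hardware's entry format or packing rule; struct gather_entry,
gather_pack(), and gather_example() are hypothetical names, and the vec4
granularity of the offset is an assumption made for the illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct gather_entry {
   uint32_t buffer_index;  /* stands in for the binding table index */
   uint32_t offset;        /* source offset, in vec4 (4-dword) units */
   uint8_t channel_mask;   /* bit i set: copy channel i of the vec4 */
};

/* Copy the selected channels of every entry into 'out', back to back.
 * Returns the number of dwords written. */
static size_t gather_pack(const uint32_t *const *buffers,
                          const struct gather_entry *table, size_t n,
                          uint32_t *out)
{
   size_t written = 0;

   assert(buffers != NULL && table != NULL && out != NULL);
   for (size_t i = 0; i < n; i++) {
      const uint32_t *src =
         buffers[table[i].buffer_index] + table[i].offset * 4;
      for (int ch = 0; ch < 4; ch++) {
         if (table[i].channel_mask & (1u << ch))
            out[written++] = src[ch];
      }
   }
   return written;
}

/* Two sparse sources (a UBO and a uniform block) packed without holes. */
static size_t gather_example(uint32_t out[8])
{
   static const uint32_t ubo[8] = {10, 11, 12, 13, 20, 21, 22, 23};
   static const uint32_t uniforms[4] = {30, 31, 32, 33};
   const uint32_t *const buffers[2] = {ubo, uniforms};
   const struct gather_entry table[2] = {
      {0, 1, 0x3},  /* second vec4 of the UBO, channels x and y */
      {1, 0, 0x8},  /* first vec4 of the uniforms, channel w */
   };

   return gather_pack(buffers, table, 2, out);
}
```

The point of the model is only that the table, not the driver, describes where
each value lives, so the sources never need to be rearranged on the CPU.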

I promised some folks that I would send this out in a coherent state before
the holidays. Unfortunately, I didn't make it in time, but I hope the current 
state should be enough to demonstrate my approach and make reviews possible.

I still lack real-world benchmarks. But consider this simple piglit testcase:
tests/spec/glsl-1.40/uniform_buffer/fs-struct-copy.shader_test.

With the existing method of fetching the ubo entries:

SIMD16 shader: 15 instructions. 0 loops. Compacted 240 to 176 bytes (27%)
mov(1)  g16<1>UD0x000cUD
mov(1)  g18<1>UD0xUD  
mov(1)  g20<1>UD0x0004UD 
send(4) g2<1>F  g16<0,1,0>F
sampler (1, 0, 7, 0) mlen 1 rlen 1 
send(4) g4<1>F  g18<0,1,0>F
sampler (1, 0, 7, 0) mlen 1 rlen 1  
send(4) g6<1>F  g20<0,1,0>F
sampler (1, 0, 7, 0) mlen 1 rlen 1  
add(16) g8<1>F  g4<0,1,0>F  g6<0,1,0>F   
add(16) g10<1>F g4.1<0,1,0>Fg6.1<0,1,0>F   
add(16) g12<1>F g4.2<0,1,0>Fg6.2<0,1,0>F   
add(16) g14<1>F g4.3<0,1,0>Fg6.3<0,1,0>F  
add(16) g120<1>Fg8<8,8,1>F  g2<0,1,0>F   
add(16) g122<1>Fg10<8,8,1>F g2.1<0,1,0>F   
add(16) g124<1>Fg12<8,8,1>F g2.2<0,1,0>F   
add(16) g126<1>Fg14<8,8,1>F g2.3<0,1,0>F 
sendc(16)   nullg120<8,8,1>F

Compare with gather constants enabled:

SIMD16 shader: 9 instructions. 0 loops. Compacted 144 to 112 bytes (22%)
add(16) g4<1>F  g2.4<0,1,0>Fg3<0,1,0>F
add(16) g6<1>F  g2.5<0,1,0>Fg3.1<0,1,0>F  
add(16) g8<1>F  g2.6<0,1,0>Fg3.2<0,1,0>F 
add(16) g10<1>F g2.7<0,1,0>Fg3.3<0,1,0>F  
add(16) g120<1>Fg4<8,8,1>F  g2<0,1,0>F
add(16) g122<1>Fg6<8,8,1>F  g2.1<0,1,0>F 
add(16) g124<1>Fg8<8,8,1>F  g2.2<0,1,0>F  
add(16) g126<1>Fg10<8,8,1>F g2.3<0,1,0>F
nop ;
sendc(16)   nullg120<8,8,1>F


Current Status
--------------

What works: 
- FS, VS uniforms piglit tests pass
- Fragment shader UBOs without mixed uniforms pass
- Fragment shader UBOs mixed with uniforms entries sized vec4 or less pass

What doesn't work yet:
- Fragment shader UBOs with bools
- VS and GS UBOs

Vec4 backend support is not yet done. Once I complete it, I hope to publish
comprehensive benchmark scores.

Patch Summary
-------------

Series lives here: 
http://cgit.freedesktop.org/~abj/mesa/log/?h=rs_gather_constants0 
 
Patches 1  - 11: Enables hardware-generated binding tables, which is a
                 requirement for gather push constants.
Patches 12 - 18: Enables gather push constant support for ordinary uniforms.
Patches 19 - 24: Implements fine-grained uniform uploads.
Patches 26 - 40: Adds FS-backend compiler support to use UBOs as push
                 constants.

I'm not particularly happy about having to do patch 19. My goal was to make
the driver able to tell which stage actually modified its uniforms. With that
information, uniform uploads happen only when there is a change, which makes
the gather table generation more efficient for ordinary uniforms. Ideally, if
there is any way to let the driver accept additional state flags without making
the type size of the state flag variable bigger, I would be more than ha

[Mesa-dev] [RFC PATCH 12/40] i965: Add gather push constants opcodes

2015-01-04 Thread Abdiel Janulgue

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_defines.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 3f31a6f..d0b1eab 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -2179,6 +2179,14 @@ enum brw_wm_barycentric_interp_mode {
 #define _3DSTATE_CONSTANT_HS  0x7819 /* GEN7+ */
 #define _3DSTATE_CONSTANT_DS  0x781A /* GEN7+ */
 
+/* Resource streamer gather constants */
+#define _3DSTATE_GATHER_POOL_ALLOC0x791A /* GEN7.5+ */
+#define _3DSTATE_GATHER_CONSTANT_VS   0x7834
+#define _3DSTATE_GATHER_CONSTANT_GS   0x7835
+#define _3DSTATE_GATHER_CONSTANT_HS   0x7836
+#define _3DSTATE_GATHER_CONSTANT_DS   0x7837
+#define _3DSTATE_GATHER_CONSTANT_PS   0x7838
+
 #define _3DSTATE_STREAMOUT0x781e /* GEN7+ */
 /* DW1 */
 # define SO_FUNCTION_ENABLE(1 << 31)
-- 
1.9.1

[Mesa-dev] [RFC PATCH 02/40] i965/gen7.5: Introduce INTEL_RESOURCE_STREAMER to toggle resource streamer

2015-01-04 Thread Abdiel Janulgue

export INTEL_RESOURCE_STREAMER={0,1} to switch the resource streamer on or off.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_context.c | 6 ++
 src/mesa/drivers/dri/i965/brw_context.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index 860ee22..59f190b 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -704,6 +704,12 @@ brwCreateContext(gl_api api,
 
brw->must_use_separate_stencil = screen->hw_must_use_separate_stencil;
brw->has_swizzling = screen->hw_has_swizzling;
+   
+   if (getenv("INTEL_RESOURCE_STREAMER")) {
+  brw->has_resource_streamer = true;
+   } else {
+  brw->has_resource_streamer = false;
+   }
 
brw->vs.base.stage = MESA_SHADER_VERTEX;
brw->gs.base.stage = MESA_SHADER_GEOMETRY;
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index a63c483..dd8e730 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1080,6 +1080,7 @@ struct brw_context
bool no_simd8;
bool use_rep_send;
bool scalar_vs;
+   bool has_resource_streamer;
 
/**
 * Some versions of Gen hardware don't do centroid interpolation correctly
-- 
1.9.1
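
Note that the check in this patch tests only whether the variable is present,
so INTEL_RESOURCE_STREAMER=0 would also enable the resource streamer. A minimal
sketch of a value-aware check, assuming "0" and the empty string should mean
off; parse_bool_setting() and env_enabled() are hypothetical helpers, not
existing Mesa functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Interpret an environment value: unset, empty, or "0" means disabled. */
static bool parse_bool_setting(const char *value)
{
   if (value == NULL || value[0] == '\0')
      return false;
   return strcmp(value, "0") != 0;
}

/* Look up 'name' in the environment and interpret it as a boolean. */
static bool env_enabled(const char *name)
{
   assert(name != NULL);
   return parse_bool_setting(getenv(name));
}
```

With such a helper the assignment in brwCreateContext() could reduce to a
single line such as
brw->has_resource_streamer = env_enabled("INTEL_RESOURCE_STREAMER");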

[Mesa-dev] [RFC PATCH 01/40] i965/gen7.5: Implement resource streamer control opcodes

2015-01-04 Thread Abdiel Janulgue

Used to toggle the resource streamer within a batchbuffer

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/intel_reg.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/intel_reg.h b/src/mesa/drivers/dri/i965/intel_reg.h
index 5ac0180..d41adaa 100644
--- a/src/mesa/drivers/dri/i965/intel_reg.h
+++ b/src/mesa/drivers/dri/i965/intel_reg.h
@@ -47,6 +47,9 @@
 /* Load a value from memory into a register.  Only available on Gen7+. */
 #define GEN7_MI_LOAD_REGISTER_MEM  (CMD_MI | (0x29 << 23))
 # define MI_LOAD_REGISTER_MEM_USE_GGTT (1 << 22)
+/* Haswell RS control */
+#define MI_RS_CONTROL   (CMD_MI | (0x6 << 23))
+#define MI_RS_STORE_DATA_IMM(CMD_MI | (0x2b << 23))
 
 /** @{
  *
-- 
1.9.1

[Mesa-dev] [RFC PATCH 04/40] i965/gen7.5: Implement MI_RS_STORE_DATA_IMM workaround for 3DPRIMITIVE commands

2015-01-04 Thread Abdiel Janulgue

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_draw.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index c581cc0..d48128d 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -254,6 +254,20 @@ static void brw_emit_prim(struct brw_context *brw,
 
 
if (brw->gen >= 7) {
+  /* If resource streamer is enabled, an MI_RS_STORE_DATA_IMM with Resource
+   * Streamer Flush set must be programmed prior to a 3DPRIMITIVE command.
+   */
+  if (brw->has_resource_streamer) {
+ BEGIN_BATCH(4);
+ OUT_BATCH(MI_RS_STORE_DATA_IMM |
+   (1 << 21) |  /* rs flush */
+   (4 - 2));
+ OUT_BATCH(0);
+ OUT_BATCH(0);
+ OUT_BATCH(0);
+ ADVANCE_BATCH();
+  }
+  
   BEGIN_BATCH(7);
   OUT_BATCH(CMD_3D_PRIM << 16 | (7 - 2) | indirect_flag);
   OUT_BATCH(hw_prim | vertex_access_type);
-- 
1.9.1
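
For reference, the first dword emitted by the workaround can be computed by
hand. The sketch below reuses the opcode (0x2b << 23) and flush bit (1 << 21)
from patches 01 and 04; the value of the MI command type is an assumption
(taken to be zero), and the EX_ macros and rs_flush_header() are hypothetical
names:

```c
#include <assert.h>
#include <stdint.h>

#define EX_CMD_MI               0u  /* assumption: MI command type bits are 0 */
#define EX_MI_RS_STORE_DATA_IMM (EX_CMD_MI | (0x2bu << 23))
#define EX_RS_FLUSH             (1u << 21)

/* Header dword for an MI_RS_STORE_DATA_IMM packet of 'total_dwords' dwords
 * with the resource streamer flush bit set. The length field holds the
 * packet length minus two, as in the "(4 - 2)" of the patch. */
static uint32_t rs_flush_header(uint32_t total_dwords)
{
   assert(total_dwords >= 2);
   return EX_MI_RS_STORE_DATA_IMM | EX_RS_FLUSH | (total_dwords - 2);
}
```

Under that assumption the four-dword packet in the patch starts with
0x15A00002, followed by three zero dwords.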

[Mesa-dev] [RFC PATCH 03/40] i965/gen7.5: Pass resource streamer enable flags on batchbuffer start

2015-01-04 Thread Abdiel Janulgue

This is passed on to the kernel to set the resource streamer enable bit
on MI_BATCHBUFFER_START.

v3: Use I915_EXEC_RESOURCE_STREAMER. Kernel folks want the batchbuffer
flags to be more concise.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 2bd11d7..1150e3d 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -33,6 +33,10 @@
 #include "intel_fbo.h"
 #include "brw_context.h"
 
+#ifndef I915_EXEC_RESOURCE_STREAMER
+#define I915_EXEC_RESOURCE_STREAMER (1<<13)
+#endif
+
 static void
 intel_batchbuffer_reset(struct brw_context *brw);
 
@@ -257,7 +261,8 @@ do_flush_locked(struct brw_context *brw)
   if (brw->gen >= 6 && batch->ring == BLT_RING) {
  flags = I915_EXEC_BLT;
   } else {
- flags = I915_EXEC_RENDER;
+ flags = I915_EXEC_RENDER |
+(brw->has_resource_streamer ? I915_EXEC_RESOURCE_STREAMER : 0);
   }
   if (batch->needs_sol_reset)
 flags |= I915_EXEC_GEN7_SOL_RESET;
-- 
1.9.1

[Mesa-dev] [RFC PATCH 16/40] i965: Allocate space on the gather pool for every push constant state

2015-01-04 Thread Abdiel Janulgue

Reserve space in the gather pool where the resource streamer will flush
its gather constant data.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/gen6_vs_state.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen6_vs_state.c b/src/mesa/drivers/dri/i965/gen6_vs_state.c
index c1950d1..4fd3ea2 100644
--- a/src/mesa/drivers/dri/i965/gen6_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_vs_state.c
@@ -123,6 +123,14 @@ gen6_upload_push_constants(struct brw_context *brw,
*/
   assert(stage_state->push_const_size <= 32);
}
+   /* Allocate gather pool space for uniform and UBO entries in 512-bit chunks */
+   if (brw->gather_pool.bo != NULL) {
+  if (prog_data->nr_params > 0) {
+ int num_consts = ALIGN(prog_data->nr_params, 4) / 4;
+ stage_state->push_const_offset = brw->gather_pool.next_offset;
+ brw->gather_pool.next_offset += (ALIGN(num_consts, 4) / 4) * 64;
+  }
+   }
 }
 
 static void
-- 
1.9.1
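
The size arithmetic in this patch rounds twice: parameters are grouped into
vec4 slots, and vec4 slots are grouped into 512-bit (64-byte) chunks. A minimal
sketch of that calculation, assuming each parameter is one 32-bit value;
align_u32() stands in for Mesa's ALIGN macro and gather_pool_bytes() is a
hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Round 'v' up to a multiple of 'a', where 'a' is a power of two. */
static uint32_t align_u32(uint32_t v, uint32_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (v + a - 1) & ~(a - 1);
}

/* Bytes reserved in the gather pool for 'nr_params' 32-bit constants:
 * params -> vec4 slots -> groups of four vec4s (64 bytes each). */
static uint32_t gather_pool_bytes(uint32_t nr_params)
{
   uint32_t num_consts = align_u32(nr_params, 4) / 4;

   return (align_u32(num_consts, 4) / 4) * 64;
}
```

For instance, 1 to 16 parameters all reserve one 64-byte chunk, while 17
parameters need five vec4 slots and therefore two chunks (128 bytes).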

[Mesa-dev] [RFC PATCH 10/40] i965/blorp: Update hw-binding table entries for blorp.

2015-01-04 Thread Abdiel Janulgue

Update the hw-generated binding table for blorp SURFACE_STATE entries.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/gen6_blorp.cpp | 35 
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.cpp b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
index d4aa955..8e78450 100644
--- a/src/mesa/drivers/dri/i965/gen6_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
@@ -428,15 +428,32 @@ gen6_blorp_emit_binding_table(struct brw_context *brw,
   uint32_t wm_surf_offset_texture)
 {
uint32_t wm_bind_bo_offset;
-   uint32_t *bind = (uint32_t *)
-  brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
-  sizeof(uint32_t) *
-  BRW_BLORP_NUM_BINDING_TABLE_ENTRIES,
-  32, /* alignment */
-  &wm_bind_bo_offset);
-   bind[BRW_BLORP_RENDERBUFFER_BINDING_TABLE_INDEX] =
-  wm_surf_offset_renderbuffer;
-   bind[BRW_BLORP_TEXTURE_BINDING_TABLE_INDEX] = wm_surf_offset_texture;
+   uint32_t *bind;
+
+   if (brw->hw_bt_pool.bo && brw->has_resource_streamer) {
+  BEGIN_BATCH(4);
+  OUT_BATCH(_3DSTATE_BINDING_TABLE_EDIT_PS << 16 | (4 - 2));
+  OUT_BATCH(0x3);
+  {
+ OUT_BATCH(BRW_BLORP_RENDERBUFFER_BINDING_TABLE_INDEX << 16 |
+   (wm_surf_offset_renderbuffer >> 5));
+ OUT_BATCH(BRW_BLORP_TEXTURE_BINDING_TABLE_INDEX << 16 |
+   (wm_surf_offset_texture >> 5));
+  }
+  ADVANCE_BATCH();
+  wm_bind_bo_offset = brw->hw_bt_pool.next_offset;
+  brw->hw_bt_pool.next_offset += (256 * sizeof(uint16_t));
+   } else {
+  bind = (uint32_t *)
+ brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
+ sizeof(uint32_t) *
+ BRW_BLORP_NUM_BINDING_TABLE_ENTRIES,
+ 32, /* alignment */
+ &wm_bind_bo_offset);
+  bind[BRW_BLORP_RENDERBUFFER_BINDING_TABLE_INDEX] =
+ wm_surf_offset_renderbuffer;
+  bind[BRW_BLORP_TEXTURE_BINDING_TABLE_INDEX] = wm_surf_offset_texture;
+   }
 
return wm_bind_bo_offset;
 }
-- 
1.9.1
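
Each entry dword of the 3DSTATE_BINDING_TABLE_EDIT_PS packet in this patch
combines a binding table index with a shifted surface state offset. The sketch
below reproduces only that bit arithmetic; bt_edit_entry() is a hypothetical
helper, and both the 256-entry limit (inferred from the 256 * sizeof(uint16_t)
stride in the diff) and the 32-byte alignment check are assumptions:

```c
#include <assert.h>
#include <stdint.h>

/* Build one binding table edit entry: the index goes in the upper half,
 * the surface state offset in 32-byte units goes in the lower half. */
static uint32_t bt_edit_entry(uint32_t bt_index, uint32_t surf_offset)
{
   assert(bt_index < 256);            /* assumed table size */
   assert((surf_offset & 31) == 0);   /* assumed 32-byte alignment */
   return bt_index << 16 | (surf_offset >> 5);
}
```

For example, index 1 with a surface state at offset 0x40 encodes as 0x00010002.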

[Mesa-dev] [RFC PATCH 13/40] i965: Enable gather push constants

2015-01-04 Thread Abdiel Janulgue

The 3DSTATE_GATHER_POOL_ALLOC is used to enable or disable the gather
push constants feature within a context. This patch provides the toggle
functionality of using gather push constants to program constant data
within a batch.

In addition, using gather push constants requires that a gather pool be
allocated so that the resource streamer can flush the packed constants it
gathered. The pool is later referenced by the 3DSTATE_CONSTANT_* command
to program the push constant data. This patch initializes the gather
pool as well.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_binding_tables.c | 40 +-
 src/mesa/drivers/dri/i965/brw_context.c|  7 +
 src/mesa/drivers/dri/i965/brw_context.h|  7 +
 src/mesa/drivers/dri/i965/brw_state.h  |  1 +
 4 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c b/src/mesa/drivers/dri/i965/brw_binding_tables.c
index 03e7a4a..b91e5d9 100644
--- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
+++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
@@ -222,9 +222,44 @@ gen7_update_binding_table_from_array(struct brw_context *brw,
ADVANCE_BATCH();
 }
 
+static void
+gen7_init_gather_pool(struct brw_context *brw)
+{
+   if (!brw->has_resource_streamer || !brw->use_gather_constants)
+  return;
+
+   if (!brw->gather_pool.bo) {
+  brw->gather_pool.bo = drm_intel_bo_alloc(brw->bufmgr, "gather_pool",
+   131072, 4096);
+  brw->gather_pool.next_offset = 0;
+   }
+}
+
+void
+gen7_toggle_gather_constants(struct brw_context *brw, bool enable)
+{
+   if (enable && (!brw->has_resource_streamer || !brw->use_gather_constants))
+  return;
+
+   BEGIN_BATCH(3);
+   OUT_BATCH(_3DSTATE_GATHER_POOL_ALLOC << 16 | (3 - 2));
+   if (enable) {
+  OUT_RELOC(brw->gather_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0,
+(1 << 11) | (3 << 4) | GEN7_MOCS_L3);
+  OUT_RELOC(brw->gather_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0,
+brw->gather_pool.bo->size);
+   } else {
+  OUT_BATCH((3 << 4));
+  OUT_BATCH(0);
+   }
+   ADVANCE_BATCH();
+}
+
 void
 gen7_disable_hw_binding_tables(struct brw_context *brw)
 {
+   gen7_toggle_gather_constants(brw, false);
+
BEGIN_BATCH(3);
OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC << 16 | (3 - 2));
OUT_BATCH(3 << 5); /* only in HSW */
@@ -262,6 +297,9 @@ gen7_enable_hw_binding_tables(struct brw_context *brw)
  brw->hw_bt_pool.bo->size);
ADVANCE_BATCH();
 
+   gen7_init_gather_pool(brw);
+   gen7_toggle_gather_constants(brw, true);
+
/* Pipe control workaround */
BEGIN_BATCH(4);
OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
@@ -275,6 +313,7 @@ void
 gen7_reset_rs_pool_offsets(struct brw_context *brw)
 {
brw->hw_bt_pool.next_offset = bt_size;
+   brw->gather_pool.next_offset = 0;
 }
 
 const struct brw_tracked_state gen7_hw_binding_tables = {
@@ -358,5 +397,4 @@ const struct brw_tracked_state gen6_binding_table_pointers = {
},
.emit = gen6_upload_binding_table_pointers,
 };
-
 /** @} */
diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index b962103..32bbdc2 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -711,6 +711,12 @@ brwCreateContext(gl_api api,
   brw->has_resource_streamer = false;
}
 
+   if (getenv("INTEL_GATHER")) {
+  brw->use_gather_constants = brw->has_resource_streamer;
+   } else {
+  brw->use_gather_constants = false;
+   }
+
brw->vs.base.stage = MESA_SHADER_VERTEX;
brw->gs.base.stage = MESA_SHADER_GEOMETRY;
brw->wm.base.stage = MESA_SHADER_FRAGMENT;
@@ -853,6 +859,7 @@ brwCreateContext(gl_api api,
   ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_ROBUST_ACCESS_BIT_ARB;
 
brw->hw_bt_pool.bo = 0;
+   brw->gather_pool.bo = 0;
 
if (INTEL_DEBUG & DEBUG_SHADER_TIME)
   brw_init_shader_time(brw);
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 17fea5b..b205773 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1081,6 +1081,7 @@ struct brw_context
bool use_rep_send;
bool scalar_vs;
bool has_resource_streamer;
+   bool use_gather_constants;
 
/**
 * Some versions of Gen hardware don't do centroid interpolation correctly
@@ -1340,6 +1341,12 @@ struct brw_context
   uint32_t next_offset;
} hw_bt_pool;
 
+   /* Internal storage used by the resource streamer to flush and refer to constant data */
+   struct {
+  drm_intel_bo *bo;
+  uint32_t next_offset;
+   } gather_pool;
+
struct {
   uint32_t state_offset;
   uint32_t blend_state_offset;
diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h
index f985a3a..16355bd 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/

[Mesa-dev] [RFC PATCH 07/40] i965/gen7.5: Reset resource streamer pool offsets on batch flush

2015-01-04 Thread Abdiel Janulgue

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 1150e3d..df4a0f2 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -32,6 +32,7 @@
 #include "intel_buffers.h"
 #include "intel_fbo.h"
 #include "brw_context.h"
+#include "brw_state.h"
 
 #ifndef I915_EXEC_RESOURCE_STREAMER
 #define I915_EXEC_RESOURCE_STREAMER (1<<13)
@@ -339,6 +340,9 @@ _intel_batchbuffer_flush(struct brw_context *brw,
   drm_intel_bo_wait_rendering(brw->batch.bo);
}
 
+   if (brw->gen >= 7)
+  gen7_reset_rs_pool_offsets(brw);
+
/* Start a new batch buffer. */
brw_new_batch(brw);
 
-- 
1.9.1

[Mesa-dev] [RFC PATCH 15/40] i965: Upload uniforms to the constant buffer

2015-01-04 Thread Abdiel Janulgue

When uploading uniform constants to the uniform constant buffer, this
patch aligns each entry to a 4k-sized boundary so that the gather table
is able to refer to the individual bank using the hw-binding table index
and the constant buffer offset to fetch the constant entry.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_context.h   |  1 +
 src/mesa/drivers/dri/i965/brw_state.h |  3 +++
 src/mesa/drivers/dri/i965/gen6_vs_state.c | 29 ---
 src/mesa/drivers/dri/i965/gen7_vs_state.c | 39 +++
 4 files changed, 59 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index e2a6415..e0a1759 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -932,6 +932,7 @@ struct brw_stage_state
 
uint32_t push_const_offset; /* Offset in the batchbuffer */
int push_const_size; /* in 256-bit register increments */
+   uint32_t const_bo_offset;  /* Offset within the constant buffer */
 
/* Binding table: pointers to SURFACE_STATE entries. */
uint32_t bind_bo_offset;
diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h
index 16355bd..6e506a4 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -310,6 +310,9 @@ void gen7_enable_hw_binding_tables(struct brw_context *brw);
 void gen7_disable_hw_binding_tables(struct brw_context *brw);
 void gen7_reset_rs_pool_offsets(struct brw_context *brw);
 void gen7_toggle_gather_constants(struct brw_context *brw, bool enable);
+void gen7_upload_constant_buffer_data(struct brw_context* brw,
+  struct brw_stage_state *stage_state,
+  const struct brw_stage_prog_data *prog_data);
 
 #ifdef __cplusplus
 }
diff --git a/src/mesa/drivers/dri/i965/gen6_vs_state.c b/src/mesa/drivers/dri/i965/gen6_vs_state.c
index e365cc6..c1950d1 100644
--- a/src/mesa/drivers/dri/i965/gen6_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_vs_state.c
@@ -70,20 +70,23 @@ gen6_upload_push_constants(struct brw_context *brw,
   gl_constant_value *param;
   int i;
 
-  param = brw_state_batch(brw, type,
- prog_data->nr_params * sizeof(gl_constant_value),
- 32, &stage_state->push_const_offset);
+  if (brw->gather_pool.bo != NULL) {
+ gen7_upload_constant_buffer_data(brw, stage_state, prog_data);
+  } else {
+ param = brw_state_batch(brw, type,
+ prog_data->nr_params * sizeof(gl_constant_value),
+ 32, &stage_state->push_const_offset);
 
-  STATIC_ASSERT(sizeof(gl_constant_value) == sizeof(float));
-
-  /* _NEW_PROGRAM_CONSTANTS
-   *
-   * Also _NEW_TRANSFORM -- we may reference clip planes other than as a
-   * side effect of dereferencing uniforms, so _NEW_PROGRAM_CONSTANTS
-   * wouldn't be set for them.
-  */
-  for (i = 0; i < prog_data->nr_params; i++) {
- param[i] = *prog_data->param[i];
+ STATIC_ASSERT(sizeof(gl_constant_value) == sizeof(float));
+ /* _NEW_PROGRAM_CONSTANTS
+  *
+  * Also _NEW_TRANSFORM -- we may reference clip planes other than as a
+  * side effect of dereferencing uniforms, so _NEW_PROGRAM_CONSTANTS
+  * wouldn't be set for them.
+  */
+ for (i = 0; i < prog_data->nr_params; i++) {
+param[i] = *prog_data->param[i];
+ }
   }
 
   if (0) {
diff --git a/src/mesa/drivers/dri/i965/gen7_vs_state.c 
b/src/mesa/drivers/dri/i965/gen7_vs_state.c
index 404dd20..5f8e8b0 100644
--- a/src/mesa/drivers/dri/i965/gen7_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_vs_state.c
@@ -31,6 +31,45 @@
 
 
 void
+gen7_upload_constant_buffer_data(struct brw_context* brw,
+ struct brw_stage_state *stage_state,
+ const struct brw_stage_prog_data *prog_data)
+{
+   static const uint64_t const_state_stage[MESA_SHADER_FRAGMENT + 1] =
+   {
+  _NEW_VERTEX_CONSTANTS,
+  _NEW_GEOMETRY_CONSTANTS,
+  _NEW_FRAGMENT_CONSTANTS
+   };
+
+   /* If current constant data does not fit in current constant buffer bank,
+* move to next slot. 
+*/
+   uint32_t alloc_size = brw->constants.next_offset + (prog_data->nr_params * sizeof(gl_constant_value));
+   uint32_t next_bank = ALIGN(brw->constants.next_offset + 1, 4096);
+   if (alloc_size > next_bank ) {
+  brw->constants.next_offset = next_bank;
+   }
+   alloc_size = brw->constants.next_offset + (prog_data->nr_params * sizeof(gl_constant_value));
+
+   if (alloc_size > brw->constants.bo->size) {
+  gen7_reset_rs_pool_offsets(brw);
+  gen7_upload_constant_buffer_data(brw, stage_state, prog_data);
+   } else {
+  int i;
+  gl_constant_value *par
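
The bank check in gen7_upload_constant_buffer_data() above can be read
in isolation as follows. This is only a sketch of the placement rule,
assuming 4096-byte banks as in the patch; align_up() and
place_in_bank() are hypothetical names, not functions from the series,
and data larger than one bank is not handled.

```c
#include <assert.h>
#include <stdint.h>

#define BANK_SIZE 4096u  /* bank granularity used by the patch */

/* Round v up to the next multiple of a (a must be a power of two). */
static inline uint32_t align_up(uint32_t v, uint32_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (v + a - 1) & ~(a - 1);
}

/* Return the offset at which `size` bytes should start so that the
 * data does not straddle a bank boundary. Mirrors the check in the
 * patch: if the data would run past the end of the current bank,
 * start at the next bank instead.
 */
static inline uint32_t place_in_bank(uint32_t next_offset, uint32_t size)
{
   uint32_t next_bank = align_up(next_offset + 1, BANK_SIZE);

   if (next_offset + size > next_bank)
      return next_bank;
   return next_offset;
}
```

With next_offset at 4000, a 200-byte upload moves to offset 4096, while
a 96-byte upload still fits and stays at 4000.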

[Mesa-dev] [RFC PATCH 19/40] mesa: Change internal state flag to a 64-bits

2015-01-04 Thread Abdiel Janulgue

The existing 32-bit state flag cannot publish additional values.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/main/dd.h | 2 +-
 src/mesa/main/mtypes.h | 3 ++-
 src/mesa/main/state.c  | 6 +++---
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
index 2f40915..8c737e0 100644
--- a/src/mesa/main/dd.h
+++ b/src/mesa/main/dd.h
@@ -91,7 +91,7 @@ struct dd_function_table {
 * This is in addition to any state change callbacks Mesa may already have
 * made.
 */
-   void (*UpdateState)( struct gl_context *ctx, GLbitfield new_state );
+   void (*UpdateState)( struct gl_context *ctx, GLbitfield64 new_state );
 
/**
 * Resize the given framebuffer to the given size.
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index b95dfb9..12ab3e8 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3932,6 +3932,7 @@ struct gl_matrix_stack
 #define _NEW_FRAG_CLAMP (1 << 29)
 /* gap, re-use for core Mesa state only; use ctx->DriverFlags otherwise */
 #define _NEW_VARYING_VP_INPUTS (1 << 31) /**< gl_context::varying_vp_inputs */
+
 #define _NEW_ALL ~0
 /*@}*/
 
@@ -4399,7 +4400,7 @@ struct gl_context
struct gl_debug_state *Debug;
 
GLenum RenderMode;/**< either GL_RENDER, GL_SELECT, GL_FEEDBACK */
-   GLbitfield NewState;  /**< bitwise-or of _NEW_* flags */
+   GLbitfield64 NewState;  /**< bitwise-or of _NEW_* flags */
uint64_t NewDriverState;  /**< bitwise-or of flags from DriverFlags */
 
struct gl_driver_flags DriverFlags;
diff --git a/src/mesa/main/state.c b/src/mesa/main/state.c
index 45bce78..ccf60de 100644
--- a/src/mesa/main/state.c
+++ b/src/mesa/main/state.c
@@ -349,9 +349,9 @@ update_twoside(struct gl_context *ctx)
 void
 _mesa_update_state_locked( struct gl_context *ctx )
 {
-   GLbitfield new_state = ctx->NewState;
-   GLbitfield prog_flags = _NEW_PROGRAM;
-   GLbitfield new_prog_state = 0x0;
+   GLbitfield64 new_state = ctx->NewState;
+   GLbitfield64 prog_flags = _NEW_PROGRAM;
+   GLbitfield64 new_prog_state = 0x0;
 
if (new_state == _NEW_CURRENT_ATTRIB) 
   goto out;
-- 
1.9.1
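
For readers following the widening, here is a small sketch of what
changes when flags move from 32 to 64 bits. It is an illustration
only, not code from the patch: flag_bit64() and widen_signed_flag()
are hypothetical helpers. The second one shows a general C pitfall, a
flag value held in a signed 32-bit int with the top bit set
sign-extends when converted to a 64-bit type. Whether any existing
define is affected in practice depends on how it is used.

```c
#include <assert.h>
#include <stdint.h>

/* Build flag bit n as a 64-bit value; valid for n in [0, 63]. */
static inline uint64_t flag_bit64(unsigned n)
{
   assert(n < 64);
   return UINT64_C(1) << n;
}

/* Widen a flag that was held in a signed 32-bit int. A negative value
 * (top bit set) sign-extends, which also sets bits 32..63.
 */
static inline uint64_t widen_signed_flag(int32_t flag)
{
   return (uint64_t)flag;
}
```

Building new flags from a 64-bit constant, as flag_bit64() does, avoids
both the shift overflow above bit 31 and the sign extension.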


[Mesa-dev] [RFC PATCH 05/40] i965/gen7.5: Enable hardware-generated binding tables on render path.

2015-01-04 Thread Abdiel Janulgue

This patch implements the binding table enable command, which is also
used to allocate a binding table pool into which hardware-generated
binding table entries are flushed.

Each binding table offset in the binding table pool is unique to each
shader stage that is enabled within a batch.

In addition, this change inserts the required brw_tracked_state objects
to enable hw-generated binding tables in the normal render path.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_binding_tables.c | 67 ++
 src/mesa/drivers/dri/i965/brw_context.c|  2 +
 src/mesa/drivers/dri/i965/brw_context.h|  5 ++
 src/mesa/drivers/dri/i965/brw_defines.h|  3 ++
 src/mesa/drivers/dri/i965/brw_state.h  | 12 +
 src/mesa/drivers/dri/i965/brw_state_upload.c   |  2 +
 6 files changed, 91 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c 
b/src/mesa/drivers/dri/i965/brw_binding_tables.c
index ea82e71..3807301 100644
--- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
+++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
@@ -44,6 +44,7 @@
 #include "brw_state.h"
 #include "intel_batchbuffer.h"
 
+static const int bt_size = 256 * sizeof(uint16_t);
 /**
  * Upload a shader stage's binding table as indirect state.
  *
@@ -161,6 +162,72 @@ const struct brw_tracked_state brw_gs_binding_table = {
.emit = brw_gs_upload_binding_table,
 };
 
+/**
+ * Hardware-generated binding tables for the resource streamer
+ */
+void
+gen7_disable_hw_binding_tables(struct brw_context *brw)
+{
+   BEGIN_BATCH(3);
+   OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC << 16 | (3 - 2));
+   OUT_BATCH(3 << 5); /* only in HSW */
+   OUT_BATCH(0);
+   ADVANCE_BATCH();
+
+   /* Pipe control workaround */
+   BEGIN_BATCH(4);
+   OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
+   OUT_BATCH(PIPE_CONTROL_STATE_CACHE_INVALIDATE);
+   OUT_BATCH(0); /* address */
+   OUT_BATCH(0); /* write data */
+   ADVANCE_BATCH();
+}
+
+void
+gen7_enable_hw_binding_tables(struct brw_context *brw)
+{
+   if (!brw->has_resource_streamer) {
+  gen7_disable_hw_binding_tables(brw);
+  return;
+   }
+
+   if (!brw->hw_bt_pool.bo) {
+  brw->hw_bt_pool.bo = drm_intel_bo_alloc(brw->bufmgr, "hw_bt",
+  131072, 4096);
+  brw->hw_bt_pool.next_offset = bt_size;
+   }
+
+   BEGIN_BATCH(3);
+   OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC << 16 | (3 - 2));
+   OUT_RELOC(brw->hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0,
+ HSW_BINDING_TABLE_ALLOC_OFFSET | GEN7_MOCS_L3 << 7);
+   OUT_RELOC(brw->hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0,
+ brw->hw_bt_pool.bo->size);
+   ADVANCE_BATCH();
+
+   /* Pipe control workaround */
+   BEGIN_BATCH(4);
+   OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
+   OUT_BATCH(PIPE_CONTROL_STATE_CACHE_INVALIDATE);
+   OUT_BATCH(0); /* address */
+   OUT_BATCH(0); /* write data */
+   ADVANCE_BATCH();
+}
+
+void
+gen7_reset_rs_pool_offsets(struct brw_context *brw)
+{
+   brw->hw_bt_pool.next_offset = bt_size;
+}
+
+const struct brw_tracked_state gen7_hw_binding_tables = {
+   .dirty = {
+  .mesa = 0,
+  .brw = BRW_NEW_BATCH,
+   },
+   .emit = gen7_enable_hw_binding_tables
+};
+
 /** @} */
 
 /**
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 59f190b..b962103 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -852,6 +852,8 @@ brwCreateContext(gl_api api,
if ((flags & __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS) != 0)
   ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_ROBUST_ACCESS_BIT_ARB;
 
+   brw->hw_bt_pool.bo = 0;
+
if (INTEL_DEBUG & DEBUG_SHADER_TIME)
   brw_init_shader_time(brw);
 
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index dd8e730..17fea5b 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1334,6 +1334,11 @@ struct brw_context
   uint32_t fast_clear_op;
} wm;
 
+   /* RS hardware binding table */
+   struct {
+  drm_intel_bo *bo;
+  uint32_t next_offset;
+   } hw_bt_pool;
 
struct {
   uint32_t state_offset;
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 28e398d..ba62811 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1572,6 +1572,9 @@ enum brw_message_target {
 #define _3DSTATE_BINDING_TABLE_POINTERS_GS 0x7829 /* GEN7+ */
 #define _3DSTATE_BINDING_TABLE_POINTERS_PS 0x782A /* GEN7+ */
 
+#define _3DSTATE_BINDING_TABLE_POOL_ALLOC   0x7919 /* GEN7.5+ */
+# define HSW_BINDING_TABLE_ALLOC_OFFSET 0x860  /* GEN7.5+ */
+
 #define _3DSTATE_SAMPLER_STATE_POINTERS 0x7802 /* GEN6+ */
 # define PS_SAMPLER_STATE_CHANGE   (1 << 12)
 # define GS_SAMPLER_STATE_CHANGE

[Mesa-dev] [RFC PATCH 21/40] i965: Make sure that we receive 64-bit state flags

2015-01-04 Thread Abdiel Janulgue

Match core Mesa state flags to receive correct notifications.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_context.c | 2 +-
 src/mesa/drivers/dri/i965/brw_context.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index d18733d..a6e73ce 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -154,7 +154,7 @@ intel_viewport(struct gl_context *ctx)
 }
 
 static void
-intel_update_state(struct gl_context * ctx, GLuint new_state)
+intel_update_state(struct gl_context * ctx, GLbitfield64 new_state)
 {
struct brw_context *brw = brw_context(ctx);
struct intel_texture_object *tex_obj;
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index e0a1759..f384008 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -284,7 +284,7 @@ enum brw_state_id {
 
 struct brw_state_flags {
/** State update flags signalled by mesa internals */
-   GLuint mesa;
+   uint64_t mesa;
/**
 * State update flags signalled as the result of brw_tracked_state updates
 */
@@ -1096,7 +1096,7 @@ struct brw_context
 */
bool needs_unlit_centroid_workaround;
 
-   GLuint NewGLState;
+   uint64_t NewGLState;
struct {
   struct brw_state_flags dirty;
} state;
-- 
1.9.1


[Mesa-dev] [RFC PATCH 17/40] i965: Program the push constants state using the gather table

2015-01-04 Thread Abdiel Janulgue

Use the gather table generated from the uniform uploads to gather
and pack the constants into the gather pool. This changes the
3DSTATE_CONSTANT_* bits to refer to the gather pool instead of the
constant buffer pointed to by an offset from the dynamic state base
address.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_state.h |  2 +-
 src/mesa/drivers/dri/i965/gen6_gs_state.c |  2 +-
 src/mesa/drivers/dri/i965/gen6_vs_state.c |  2 +-
 src/mesa/drivers/dri/i965/gen6_wm_state.c |  2 +-
 src/mesa/drivers/dri/i965/gen7_vs_state.c | 50 ---
 5 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index 6e506a4..de99c7a 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -288,7 +288,7 @@ brw_upload_pull_constants(struct brw_context *brw,
 void
 gen7_upload_constant_state(struct brw_context *brw,
const struct brw_stage_state *stage_state,
-   bool active, unsigned opcode);
+   bool active, unsigned opcode, unsigned gather_op);
 
 /* gen8_vs_state.c */
 void
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_state.c 
b/src/mesa/drivers/dri/i965/gen6_gs_state.c
index eb4c586..79a899e 100644
--- a/src/mesa/drivers/dri/i965/gen6_gs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_gs_state.c
@@ -48,7 +48,7 @@ gen6_upload_gs_push_constants(struct brw_context *brw)
}
 
if (brw->gen >= 7)
-  gen7_upload_constant_state(brw, stage_state, gp, _3DSTATE_CONSTANT_GS);
+  gen7_upload_constant_state(brw, stage_state, gp, _3DSTATE_CONSTANT_GS, _3DSTATE_GATHER_CONSTANT_GS);
 }
 
 const struct brw_tracked_state gen6_gs_push_constants = {
diff --git a/src/mesa/drivers/dri/i965/gen6_vs_state.c 
b/src/mesa/drivers/dri/i965/gen6_vs_state.c
index 4fd3ea2..5e71a44 100644
--- a/src/mesa/drivers/dri/i965/gen6_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_vs_state.c
@@ -152,7 +152,7 @@ gen6_upload_vs_push_constants(struct brw_context *brw)
  gen7_emit_vs_workaround_flush(brw);
 
   gen7_upload_constant_state(brw, stage_state, true /* active */,
- _3DSTATE_CONSTANT_VS);
+ _3DSTATE_CONSTANT_VS, _3DSTATE_GATHER_CONSTANT_VS);
}
 }
 
diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c 
b/src/mesa/drivers/dri/i965/gen6_wm_state.c
index e57b7f6..e741388 100644
--- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
@@ -50,7 +50,7 @@ gen6_upload_wm_push_constants(struct brw_context *brw)
 
if (brw->gen >= 7) {
   gen7_upload_constant_state(brw, &brw->wm.base, true,
- _3DSTATE_CONSTANT_PS);
+ _3DSTATE_CONSTANT_PS, _3DSTATE_GATHER_CONSTANT_PS);
}
 }
 
diff --git a/src/mesa/drivers/dri/i965/gen7_vs_state.c 
b/src/mesa/drivers/dri/i965/gen7_vs_state.c
index 5f8e8b0..85bd56f 100644
--- a/src/mesa/drivers/dri/i965/gen7_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_vs_state.c
@@ -29,6 +29,34 @@
 #include "program/prog_statevars.h"
 #include "intel_batchbuffer.h"
 
+static void
+gen7_submit_gather_table(struct brw_context* brw,
+ const struct brw_stage_state *stage_state,
+ const struct brw_stage_prog_data *prog_data,
+ unsigned gather_opcode)
+{
+   uint32_t gather_dwords = 0;
+   /* Generate gather entry only for uniforms */
+   int num_consts = ALIGN(prog_data->nr_params, 4) / 4;
+   gather_dwords = 3 + num_consts;
+
+   /* Fetch the entries in 128-bit units. If the offset in the constant
+* buffer pointing to the entry is > 4096 bytes, round it to the next
+* gather bank slot. gen7_upload_constant_buffer_data() should have
+* made sure that the entries are uploaded in the correct slots.
+*/
+   unsigned bo_offset = (stage_state->const_bo_offset / 16) % 256;
+   unsigned bti = stage_state->const_bo_offset / 4096;
+
+   BEGIN_BATCH(gather_dwords);
+   OUT_BATCH(gather_opcode << 16 | (gather_dwords - 2));
+   OUT_BATCH(0x << 16 | 1 << 12);
+   OUT_BATCH(stage_state->push_const_offset);
+   for (int i = 0; i < num_consts; i++) {
+  OUT_BATCH((bo_offset + i) << 8 | 0xF << 4 | bti);
+   }
+   ADVANCE_BATCH();
+}
 
 void
 gen7_upload_constant_buffer_data(struct brw_context* brw,
@@ -72,23 +100,37 @@ gen7_upload_constant_buffer_data(struct brw_context* brw,
 void
 gen7_upload_constant_state(struct brw_context *brw,
const struct brw_stage_state *stage_state,
-   bool active, unsigned opcode)
+   bool active, unsigned opcode, unsigned gather_opcode)
 {
uint32_t mocs = brw->gen < 8 ? GEN7_MOCS_L3 : 0;
 
/* Disable if the shader stage is inactive or there are no push constants. 
*/
active = active && stage_state->push_con
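
The arithmetic in gen7_submit_gather_table() above can be pulled out
into a few pure functions to make the packing easier to check. This is
a sketch only: the helper names are hypothetical, and the 0xF field is
copied from the patch without knowing its documented meaning
(presumably a channel mask).

```c
#include <assert.h>
#include <stdint.h>

/* Number of 128-bit gather entries needed for nr_params dwords. */
static inline uint32_t gather_num_entries(uint32_t nr_params)
{
   return (nr_params + 3) / 4;
}

/* Total command length in dwords: 3 header dwords plus one per entry. */
static inline uint32_t gather_num_dwords(uint32_t nr_params)
{
   return 3 + gather_num_entries(nr_params);
}

/* Pack gather entry i for data that starts const_bo_offset bytes into
 * the constant buffer.
 */
static inline uint32_t gather_entry(uint32_t const_bo_offset, uint32_t i)
{
   uint32_t bo_offset = (const_bo_offset / 16) % 256; /* 16-byte units in bank */
   uint32_t bti = const_bo_offset / 4096;             /* one index per 4 KiB bank */

   return (bo_offset + i) << 8 | 0xFu << 4 | bti;
}
```

For example, data at byte offset 4128 lands in bank 1 at 16-byte slot 2,
so its second entry (i = 1) packs to 0x3F1.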

[Mesa-dev] [RFC PATCH 09/40] i965/gen7.5: Upload binding tables as hw-generated binding table format.

2015-01-04 Thread Abdiel Janulgue

When hardware-generated binding tables are enabled, use the hw-generated
binding table format when uploading binding table state.

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_binding_tables.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c 
b/src/mesa/drivers/dri/i965/brw_binding_tables.c
index e853aac..d97b3d9 100644
--- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
+++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
@@ -71,7 +71,12 @@ brw_upload_binding_table(struct brw_context *brw,
  return;
 
   stage_state->bind_bo_offset = 0;
-   } else {
+   }
+
+   /* If resource streamer is enabled, skip manual binding table upload */
+   if (!brw->hw_bt_pool.bo) {
+  /* CACHE_NEW_*_PROG */
+
   /* Upload a new binding table. */
   if (INTEL_DEBUG & DEBUG_SHADER_TIME) {
  brw->vtbl.create_raw_surface(
@@ -86,15 +91,26 @@ brw_upload_binding_table(struct brw_context *brw,
   /* BRW_NEW_SURFACES and BRW_NEW_*_CONSTBUF */
   memcpy(bind, stage_state->surf_offset,
  prog_data->binding_table.size_bytes);
+   } else {
+  gen7_update_binding_table_from_array(brw, stage_state->stage,
+   stage_state->surf_offset,
+   prog_data->binding_table.size_bytes / 4);
}
 
brw->state.dirty.brw |= brw_new_binding_table;
 
if (brw->gen >= 7) {
+
+  if (brw->has_resource_streamer)
+ stage_state->bind_bo_offset = brw->hw_bt_pool.next_offset;
+
   BEGIN_BATCH(2);
   OUT_BATCH(packet_name << 16 | (2 - 2));
   OUT_BATCH(stage_state->bind_bo_offset);
   ADVANCE_BATCH();
+
+  if (brw->has_resource_streamer)
+ brw->hw_bt_pool.next_offset += bt_size;
}
 }
 
-- 
1.9.1
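
The offset bookkeeping this patch adds around hw_bt_pool.next_offset
can be sketched on its own. The struct and function names below are
hypothetical. The sizes are the ones used in the series (bt_size of
256 * sizeof(uint16_t) and a 131072-byte pool), and the capacity assert
is an addition of this sketch that the patch itself does not perform.

```c
#include <assert.h>
#include <stdint.h>

#define BT_SIZE   ((uint32_t)(256 * sizeof(uint16_t))) /* bytes per table */
#define POOL_SIZE 131072u                              /* pool bo size */

struct bt_pool {
   uint32_t next_offset;
};

/* Start handing out tables again from the first slot after offset 0. */
static inline void bt_pool_reset(struct bt_pool *pool)
{
   pool->next_offset = BT_SIZE;
}

/* Hand out the current slot to a stage and advance to the next one. */
static inline uint32_t bt_pool_reserve(struct bt_pool *pool)
{
   uint32_t offset = pool->next_offset;

   assert(offset + BT_SIZE <= POOL_SIZE); /* not checked in the patch */
   pool->next_offset += BT_SIZE;
   return offset;
}
```

Each stage that uploads a binding table in a batch therefore gets its
own 512-byte slot, starting at offset 512.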


[Mesa-dev] [RFC PATCH 08/40] i965/gen7.5: Implement opcodes for the hw-generated binding table EDIT commands

2015-01-04 Thread Abdiel Janulgue

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/brw_binding_tables.c | 40 ++
 src/mesa/drivers/dri/i965/brw_defines.h|  5 
 src/mesa/drivers/dri/i965/brw_state.h  |  9 ++
 3 files changed, 54 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c 
b/src/mesa/drivers/dri/i965/brw_binding_tables.c
index 3807301..e853aac 100644
--- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
+++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
@@ -45,6 +45,11 @@
 #include "intel_batchbuffer.h"
 
 static const int bt_size = 256 * sizeof(uint16_t);
+static const GLuint stage_to_bt_edit[MESA_SHADER_FRAGMENT + 1] = {
+   _3DSTATE_BINDING_TABLE_EDIT_VS,
+   _3DSTATE_BINDING_TABLE_EDIT_GS,
+   _3DSTATE_BINDING_TABLE_EDIT_PS,
+};
 /**
  * Upload a shader stage's binding table as indirect state.
  *
@@ -166,6 +171,41 @@ const struct brw_tracked_state brw_gs_binding_table = {
  * Hardware-generated binding tables for the resource streamer
  */
 void
+gen7_update_binding_table(struct brw_context *brw,
+  gl_shader_stage stage,
+  uint32_t index,
+  uint32_t surf_offset)
+{
+   assert(stage <= MESA_SHADER_FRAGMENT);
+
+   BEGIN_BATCH(3);
+   OUT_BATCH(stage_to_bt_edit[stage] << 16 | (3 - 2));
+   OUT_BATCH(0x3);
+   OUT_BATCH(index << 16 | (surf_offset >> 5));
+   ADVANCE_BATCH();
+}
+
+/**
+ * Hardware-generated binding tables for the resource streamer
+ */
+void
+gen7_update_binding_table_from_array(struct brw_context *brw,
+ gl_shader_stage stage,
+ const uint32_t* binding_table,
+ int size)
+{
+   assert(stage <= MESA_SHADER_FRAGMENT);
+
+   BEGIN_BATCH(size + 2);
+   OUT_BATCH(stage_to_bt_edit[stage] << 16 | size);
+   OUT_BATCH(0x3);
+   for (int i = 0; i < size; i++) {
+  OUT_BATCH(i << 16 | binding_table[i] >> 5);
+   }
+   ADVANCE_BATCH();
+}
+
+void
 gen7_disable_hw_binding_tables(struct brw_context *brw)
 {
BEGIN_BATCH(3);
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index ba62811..3f31a6f 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1574,6 +1574,11 @@ enum brw_message_target {
 
 #define _3DSTATE_BINDING_TABLE_POOL_ALLOC   0x7919 /* GEN7.5+ */
 # define HSW_BINDING_TABLE_ALLOC_OFFSET 0x860  /* GEN7.5+ */
+#define _3DSTATE_BINDING_TABLE_EDIT_VS  0x7843 /* GEN7.5 */
+#define _3DSTATE_BINDING_TABLE_EDIT_GS  0x7844 /* GEN7.5 */
+#define _3DSTATE_BINDING_TABLE_EDIT_HS  0x7845 /* GEN7.5 */
+#define _3DSTATE_BINDING_TABLE_EDIT_DS  0x7846 /* GEN7.5 */
+#define _3DSTATE_BINDING_TABLE_EDIT_PS  0x7847 /* GEN7.5 */
 
 #define _3DSTATE_SAMPLER_STATE_POINTERS 0x7802 /* GEN6+ */
 # define PS_SAMPLER_STATE_CHANGE   (1 << 12)
diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index bbbf4a4..f985a3a 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -297,6 +297,15 @@ gen8_upload_constant_state(struct brw_context *brw,
bool active, unsigned opcode);
 /* gen7_misc_state.c */
 void gen7_rs_control(struct brw_context *brw, int enable);
+
+void gen7_update_binding_table(struct brw_context *brw,
+   gl_shader_stage stage,
+   uint32_t index,
+   uint32_t surf_offset);
+void gen7_update_binding_table_from_array(struct brw_context *brw,
+  gl_shader_stage stage,
+  const uint32_t* binding_table,
+  int size);
 void gen7_enable_hw_binding_tables(struct brw_context *brw);
 void gen7_disable_hw_binding_tables(struct brw_context *brw);
 void gen7_reset_rs_pool_offsets(struct brw_context *brw);
-- 
1.9.1
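
Since this patch has no description, here is a sketch of the dword
packing the two new functions perform. The helper names are
hypothetical. The shift by 5 suggests surface offsets are expressed in
32-byte units; the alignment assert encodes that assumption and is not
in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Header dword. The command is num_entries + 2 dwords long and the
 * length field holds the total minus 2, which is num_entries.
 */
static inline uint32_t bt_edit_header(uint32_t opcode, uint32_t num_entries)
{
   return opcode << 16 | num_entries;
}

/* One edit entry: binding table index in the high half, surface state
 * offset in 32-byte units in the low half.
 */
static inline uint32_t bt_edit_entry(uint32_t index, uint32_t surf_offset)
{
   assert((surf_offset & 31) == 0); /* assumption: 32-byte aligned */
   return index << 16 | surf_offset >> 5;
}
```

This matches both call sites: the single-entry version emits a header
with (3 - 2), and the array version emits a header with size for a
command of size + 2 dwords.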


[Mesa-dev] [RFC PATCH 06/40] i965/gen7.5: Enable hardware-generated binding tables in blorp path

2015-01-04 Thread Abdiel Janulgue

Signed-off-by: Abdiel Janulgue 
---
 src/mesa/drivers/dri/i965/gen7_blorp.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp 
b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
index 206a6ff..3d5c7df 100644
--- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
@@ -824,6 +824,8 @@ gen7_blorp_exec(struct brw_context *brw,
if (params->use_wm_prog) {
   uint32_t wm_surf_offset_renderbuffer;
   uint32_t wm_surf_offset_texture = 0;
+  
+  gen7_enable_hw_binding_tables(brw);
   wm_push_const_offset = gen6_blorp_emit_wm_constants(brw, params);
   intel_miptree_used_for_rendering(params->dst.mt);
   wm_surf_offset_renderbuffer =
-- 
1.9.1


[Mesa-dev] [PATCH] tgsi: keep track of read vs written indirects

2015-01-04 Thread Rob Clark

From: Rob Clark 

At least temporarily, I still need to fall back to the old compiler for
relative dest (for freedreno), but I can do relative src temp.  This is
only a temporary situation, but it seems easy/reasonable for tgsi-scan
to track this.

Signed-off-by: Rob Clark 
---
I could always keep temp indirect support on a branch until the RA can
also handle indirect dest.  But I would at least like to keep things
split up into smaller patches (and still bisectable).

 src/gallium/auxiliary/tgsi/tgsi_scan.c | 2 ++
 src/gallium/auxiliary/tgsi/tgsi_scan.h | 6 ++
 2 files changed, 8 insertions(+)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.c 
b/src/gallium/auxiliary/tgsi/tgsi_scan.c
index eb313e4..2b44271 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_scan.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_scan.c
@@ -130,6 +130,7 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
/* check for indirect register reads */
if (src->Register.Indirect) {
   info->indirect_files |= (1 << src->Register.File);
+  info->indirect_files_read |= (1 << src->Register.File);
}
 
/* MSAA samplers */
@@ -150,6 +151,7 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
const struct tgsi_full_dst_register *dst = &fullinst->Dst[i];
if (dst->Register.Indirect) {
   info->indirect_files |= (1 << dst->Register.File);
+  info->indirect_files_written |= (1 << dst->Register.File);
}
 }
 
diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.h 
b/src/gallium/auxiliary/tgsi/tgsi_scan.h
index 375f75a..93a9898 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_scan.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_scan.h
@@ -93,6 +93,12 @@ struct tgsi_shader_info
 * indirect addressing.  The bits are (1 << TGSI_FILE_x), etc.
 */
unsigned indirect_files;
+   /**
+* Bitmask indicating which register files are read / written with
+* indirect addressing.  The bits are (1 << TGSI_FILE_x).
+*/
+   unsigned indirect_files_read;
+   unsigned indirect_files_written;
 
unsigned properties[TGSI_PROPERTY_COUNT]; /* index with TGSI_PROPERTY_ */
 };
-- 
2.1.0
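
A driver could use the split masks along these lines. This is a
standalone sketch, not freedreno code: the struct, the helpers, and the
FILE_TEMP value are hypothetical stand-ins for tgsi_shader_info and the
TGSI_FILE_x enums.

```c
#include <assert.h>
#include <stdbool.h>

enum { FILE_TEMP = 4, FILE_COUNT = 16 }; /* hypothetical stand-ins */

struct indirect_info {
   unsigned files;         /* any indirect access */
   unsigned files_read;    /* indirect source operands */
   unsigned files_written; /* indirect destination operands */
};

/* Record an indirectly addressed source register. */
static inline void note_indirect_src(struct indirect_info *info, unsigned file)
{
   assert(file < FILE_COUNT);
   info->files |= 1u << file;
   info->files_read |= 1u << file;
}

/* Record an indirectly addressed destination register. */
static inline void note_indirect_dst(struct indirect_info *info, unsigned file)
{
   assert(file < FILE_COUNT);
   info->files |= 1u << file;
   info->files_written |= 1u << file;
}

/* True if the shader writes `file` through indirect addressing. */
static inline bool has_indirect_dst(const struct indirect_info *info,
                                    unsigned file)
{
   return (info->files_written & (1u << file)) != 0;
}
```

A fallback decision like the one described above would then test only
the written mask, so shaders with relative src temps alone can stay on
the new path.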

