[Mesa-dev] [Bug 84145] UE4: Realistic Rendering Demo render blue

2014-09-26 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84145

Ilia Mirkin imir...@alum.mit.edu changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #21 from Ilia Mirkin imir...@alum.mit.edu ---
I've pushed this out. Thanks for bisecting and testing!

commit 9d2e298dd4159651323cac54dbc43527e7fd6d16
Author: Ilia Mirkin imir...@alum.mit.edu
Date:   Wed Sep 24 00:58:07 2014 -0400

mesa/st: NumLayers is only valid for array textures

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 84355] New: texture2DProjLod and textureCubeLod are not supported when using GLES.

2014-09-26 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84355

  Priority: medium
Bug ID: 84355
  Assignee: mesa-dev@lists.freedesktop.org
   Summary: texture2DProjLod and textureCubeLod are not supported
when using GLES.
  Severity: normal
Classification: Unclassified
OS: All
  Reporter: kondapallykalyancontrib...@gmail.com
  Hardware: Other
Status: NEW
   Version: 10.2
 Component: Mesa core
   Product: Mesa

Created attachment 106901
  --> https://bugs.freedesktop.org/attachment.cgi?id=106901&action=edit
patch.

According to the GLES spec (i.e. 1.0 and above), textureCubeLod and
texture2DProjLod are built-in functions. We seem to disable support
for these functions with GLES.

The following WebGL conformance test fails when running the Chromium web
browser with Wayland (https://github.com/01org/ozone-wayland).

Test case:
https://www.khronos.org/registry/webgl/sdk/tests/conformance/glsl/samplers/glsl-function-texture2dprojlod.html

Attached is a patch which fixes this.



Re: [Mesa-dev] [PATCH 2/4] i965: Issue performance warnings on MapBufferRange stalls.

2014-09-26 Thread Kenneth Graunke
On Friday, August 29, 2014 11:10:48 PM Kenneth Graunke wrote:
> This is easy: we just need to use brw_bo_map instead of mapping it
> directly.
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
> ---
>  src/mesa/drivers/dri/i965/intel_buffer_objects.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
> index 96dacde..fb806dc 100644
> --- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c
> +++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
> @@ -421,8 +421,8 @@ intel_bufferobj_map_range(struct gl_context * ctx,
>                                                intel_obj->map_extra[index],
>                                                alignment);
>        if (brw->has_llc) {
> -         drm_intel_bo_map(intel_obj->range_map_bo[index],
> -                          (access & GL_MAP_WRITE_BIT) != 0);
> +         brw_bo_map(brw, intel_obj->range_map_bo[index],
> +                    (access & GL_MAP_WRITE_BIT) != 0, "range-map");
>        } else {
>           drm_intel_gem_bo_map_gtt(intel_obj->range_map_bo[index]);
>        }
> @@ -438,7 +438,8 @@ intel_bufferobj_map_range(struct gl_context * ctx,
>        drm_intel_gem_bo_map_gtt(intel_obj->buffer);
>        intel_bufferobj_mark_inactive(intel_obj);
>     } else {
> -      drm_intel_bo_map(intel_obj->buffer, (access & GL_MAP_WRITE_BIT) != 0);
> +      brw_bo_map(brw, intel_obj->buffer, (access & GL_MAP_WRITE_BIT) != 0,
> +                 "MapBufferRange");
>        intel_bufferobj_mark_inactive(intel_obj);
>     }

It's been a month and patches 2-4 haven't received any review.  Could someone 
take a look?

Thanks,
--Ken
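
For readers skimming the thread, here is a minimal sketch of the pattern the patch relies on: route every map through one wrapper that warns when the buffer is still busy. Everything in it (struct sketch_bo, sketch_bo_map, the warning counter) is hypothetical and only stands in for the driver's real types; it is not the i965 implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a GPU buffer object; not the libdrm type. */
struct sketch_bo {
   int busy;     /* nonzero while the GPU is still using the buffer */
   int mapped;   /* nonzero once the CPU holds a mapping */
};

/* Count of warnings issued so far, kept only so the sketch is testable. */
static int sketch_stall_warnings = 0;

/* Map a buffer, warning first if the map would have to wait for the GPU. */
static void
sketch_bo_map(struct sketch_bo *bo, const char *bo_name)
{
   assert(bo != NULL && bo_name != NULL);

   if (bo->busy) {
      sketch_stall_warnings++;
      fprintf(stderr, "perf: mapping busy %s buffer, CPU will stall\n",
              bo_name);
      bo->busy = 0;   /* a real map call would block here until idle */
   }
   bo->mapped = 1;
}
```

The point of the design is that callers only pass a name; the stall check lives in one place, so new call sites get the warning for free.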



[Mesa-dev] [Bug 84355] texture2DProjLod and textureCubeLod are not supported when using GLES.

2014-09-26 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84355

kalyank kondapallykalyancontrib...@gmail.com changed:

   What|Removed |Added

   Hardware|Other   |x86 (IA32)



[Mesa-dev] [Bug 84355] texture2DProjLod and textureCubeLod are not supported when using GLES.

2014-09-26 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=84355

Kenneth Graunke kenn...@whitecape.org changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
           Assignee|mesa-dev@lists.freedesktop.org |i...@freedesktop.org
         QA Contact|                               |intel-3d-bugs@lists.freedesktop.org
          Component|Mesa core                      |glsl-compiler

--- Comment #1 from Kenneth Graunke kenn...@whitecape.org ---
Hi Kalyan,

Please use git-send-email to send patches to mesa-dev.

Thanks!



Re: [Mesa-dev] [Mesa-stable] [PATCH] configure.ac: Compute LLVM_VERSION_PATCH using llvm-config

2014-09-26 Thread Jonathan Gray
On Thu, Sep 25, 2014 at 12:55:40PM -0700, Tom Stellard wrote:
> This is the only guaranteed way to get the patch level for llvm,
> since the define cannot always be found in config.h depending
> on the version of llvm or the build system used.
>
> CC: <mesa-sta...@lists.freedesktop.org>

Reviewed-by: Jonathan Gray <j...@jsg.id.au>

> ---
>  configure.ac | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index bad1528..a097a5c 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1704,11 +1704,10 @@ if test "x$enable_gallium_llvm" = xyes; then
>          AC_COMPUTE_INT([LLVM_VERSION_MINOR], [LLVM_VERSION_MINOR],
>              [#include "${LLVM_INCLUDEDIR}/llvm/Config/llvm-config.h"])
>
> -        dnl In LLVM 3.4.1 patch level was defined in config.h and not
> -        dnl llvm-config.h
> -        AC_COMPUTE_INT([LLVM_VERSION_PATCH], [LLVM_VERSION_PATCH],
> -            [#include "${LLVM_INCLUDEDIR}/llvm/Config/config.h"],
> -            LLVM_VERSION_PATCH=0) dnl Default if LLVM_VERSION_PATCH not found
> +        LLVM_VERSION_PATCH=`echo $LLVM_VERSION | cut -d. -f3 | egrep -o '^[[0-9]]+'`
> +        if test -z "$LLVM_VERSION_PATCH"; then
> +            LLVM_VERSION_PATCH=0
> +        fi
>
>          if test -n "${LLVM_VERSION_MAJOR}"; then
>              LLVM_VERSION_INT="${LLVM_VERSION_MAJOR}0${LLVM_VERSION_MINOR}"
> --
> 1.8.3.1
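
As an illustration of what the cut/egrep pipeline in the patch computes, here is a rough C equivalent: take the third dot-separated field of the version string, keep only its leading digits, and fall back to 0. The function name sketch_llvm_patch_level is made up for this example, and it assumes a version string with at least one dot (cut behaves differently when there is no delimiter at all).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return the patch level of a dotted version string, or 0 if absent.
 * Hypothetical helper; mirrors `cut -d. -f3 | egrep -o '^[0-9]+'`. */
static int
sketch_llvm_patch_level(const char *version)
{
   int dots = 0;
   int patch = 0;

   assert(version != NULL);

   /* Advance to the character after the second '.', i.e. the third field. */
   while (*version != '\0' && dots < 2) {
      if (*version == '.')
         dots++;
      version++;
   }

   /* Accumulate only the leading digits of that field. */
   while (isdigit((unsigned char)*version)) {
      patch = patch * 10 + (*version - '0');
      version++;
   }

   return patch;
}
```

With this, "3.4.2svn" yields 2 while "3.5" and "3.6.0svn" both yield 0, which matches the default the configure fragment applies when the third field is empty.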


Re: [Mesa-dev] [PATCH 1/4] radeonsi/compute: directly emit CONTEXT_CONTROL

2014-09-26 Thread Christian König
How about assuming for each CS that it can use the compute ring and as 
soon as we submit a PM4 command that can only be executed on the 
graphics ring note that this CS needs to be executed on the graphics ring?


Just an idea,
Christian.
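
To make the idea above concrete, here is a very rough sketch: each command stream starts out assuming the compute ring is sufficient and is flagged for the graphics ring the first time a graphics-only packet is recorded. All names (sketch_cs, sketch_cs_emit, sketch_cs_pick_ring) are hypothetical; this is not the radeon winsys API and it ignores fences and synchronization entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_ring { SKETCH_RING_COMPUTE, SKETCH_RING_GFX };

/* Hypothetical command stream; not the radeon winsys structure. */
struct sketch_cs {
   unsigned num_packets;
   bool needs_gfx;   /* set once any graphics-only packet is recorded */
};

static void
sketch_cs_init(struct sketch_cs *cs)
{
   assert(cs != NULL);
   cs->num_packets = 0;
   cs->needs_gfx = false;   /* optimistic default: compute ring suffices */
}

/* Record one packet; gfx_only_packet says whether the compute ring
 * would be unable to execute it. */
static void
sketch_cs_emit(struct sketch_cs *cs, bool gfx_only_packet)
{
   assert(cs != NULL);
   cs->num_packets++;
   if (gfx_only_packet)
      cs->needs_gfx = true;   /* sticky until the CS is re-initialized */
}

/* Decide at submit time which ring the whole CS has to go to. */
static enum sketch_ring
sketch_cs_pick_ring(const struct sketch_cs *cs)
{
   assert(cs != NULL);
   return cs->needs_gfx ? SKETCH_RING_GFX : SKETCH_RING_COMPUTE;
}
```

The decision is deferred to submit time, so no new context-creation parameter is needed in this sketch.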

Am 25.09.2014 um 21:02 schrieb Tom Stellard:

On Mon, Sep 22, 2014 at 09:48:43PM +0200, Marek Olšák wrote:

No, we cannot detect compute-only contexts yet. We need to add a new
parameter to pipe_context::context_create which says that a context is
compute-only. That should be OpenCL but not OpenGL.

Also, some code paths like resource_copy_region use the graphics
engine for copying, which cannot be used with compute rings and must
be implemented with either DMA or compute-based blits. DMA isn't
flexible enough, so some additional work for compute-based blits might
be needed. We can also use the graphics ring for copying only and the
compute ring for compute stuff.


If possible, I think I would prefer continuing to use the graphics ring
for blits and only submit compute-specific packets to the compute ring.
I'm a little concerned that adding a compute flag to context create
might make it harder to share code between compute and graphics, which
I think is important.

What are the downsides of using both rings at once?  Will we need to add
synchronization code for the two rings?  I think the last time I
looked into doing this, the biggest problem was that fences were
submitted via the graphics ring even though they were meant for jobs
on the compute ring.  Is there a good solution to this?

-Tom


Marek

On Mon, Sep 22, 2014 at 8:03 PM, Niels Ole Salscheider
niels_...@salscheider-online.de wrote:

On Monday 22 September 2014, 12:16:13, Alex Deucher wrote:

On Sat, Sep 20, 2014 at 6:11 AM, Marek Olšák mar...@gmail.com wrote:

From: Marek Olšák marek.ol...@amd.com

Looks good.  Tom should probably take a look as well.  As a further
improvement, it would be nice to be able to use the compute rings for
compute rather than gfx, but I'm not sure how much additional effort
it would take to clean that up.

This is completely untested but now that we can detect compute contexts
something like the attached patches might be sufficient...


Reviewed-by: Alex Deucher alexander.deuc...@amd.com


---
 src/gallium/drivers/radeonsi/si_compute.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c
index 4b2662d..3ad9182 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -168,6 +168,7 @@ static void si_launch_grid(
                uint32_t pc, const void *input)
 {
        struct si_context *sctx = (struct si_context*)ctx;
+       struct radeon_winsys_cs *cs = sctx->b.rings.gfx.cs;
        struct si_compute *program = sctx->cs_shader_state.program;
        struct si_pm4_state *pm4 = CALLOC_STRUCT(si_pm4_state);
        struct r600_resource *input_buffer = program->input_buffer;
@@ -184,8 +185,11 @@ static void si_launch_grid(
        unsigned lds_blocks;
        unsigned num_waves_for_scratch;

+       radeon_emit(cs, PKT3(PKT3_CONTEXT_CONTROL, 1, 0) | PKT3_SHADER_TYPE_S(1));
+       radeon_emit(cs, 0x80000000);
+       radeon_emit(cs, 0x80000000);
+
        pm4->compute_pkt = true;
-       si_cmd_context_control(pm4);

        si_pm4_cmd_begin(pm4, PKT3_EVENT_WRITE);
        si_pm4_cmd_add(pm4, EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH) |
--
1.9.1



Re: [Mesa-dev] [PATCH 2/6] st/va: skeleton VAAPI state tracker

2014-09-26 Thread Emil Velikov
Hi Leo,

On 25/09/14 15:21, Liu, Leo wrote:
> Hi Gwenole and Emil,
>
[...]
> the reason for $(LIBVA_LIBS) is for xcb lib, from configure.ac
> +PKG_CHECK_MODULES([LIBVA], [libva >= 0.35.0 x11-xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
>
> I will separate them, and remove libva for link.
>
I'd completely forgotten that the patch that splits them out did not land.

The easiest/shortest thing you can do is (based on vdpau):

PKG_CHECK_MODULES([LIBVA], [libva >= 0.35.0 x11-xcb xcb-dri2 >= $XCBDRI2_REQUIRED],
[LIBVA_LIBS=`$PKG_CONFIG --libs x11-xcb xcb-dri2 >= $XCBDRI2_REQUIRED`])


Pardon for the bashing,
Emil
> Thanks,
> Leo



[Mesa-dev] [PATCH v2] glsl: Optimize min/max expression trees

2014-09-26 Thread Iago Toral Quiroga
Original patch by Petri Latvala <petri.latv...@intel.com>:

Add an optimization pass that drops min/max expression operands that
can be proven to not contribute to the final result. The algorithm is
similar to alpha-beta pruning on a minimax search, from the field of
AI.

This optimization pass can optimize min/max expressions where operands
are min/max expressions. Such code can appear in shaders by itself, or
as the result of clamp() or AMD_shader_trinary_minmax functions.
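
To illustrate the kind of pruning described above, here is a small, self-contained sketch. Note that it is a simplified bounds-based variant written for this explanation, not the algorithm in opt_minmax.cpp: it computes a [low, high] interval bottom-up for each node and drops an operand of min/max when the other operand's interval proves it can never win. All names are hypothetical, and NaN handling and vector components are ignored.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

enum sketch_kind { SK_VAR, SK_CONST, SK_MIN, SK_MAX };

struct sketch_expr {
   enum sketch_kind kind;
   float value;                 /* used by SK_CONST only */
   struct sketch_expr *a, *b;   /* used by SK_MIN and SK_MAX only */
};

struct sketch_range { float low, high; };

static float sketch_minf(float x, float y) { return x < y ? x : y; }
static float sketch_maxf(float x, float y) { return x > y ? x : y; }

/* Conservative bounds of an expression; variables are unbounded. */
static struct sketch_range
sketch_bounds(const struct sketch_expr *e)
{
   struct sketch_range r = { -FLT_MAX, FLT_MAX };

   if (e->kind == SK_CONST) {
      r.low = r.high = e->value;
   } else if (e->kind == SK_MIN || e->kind == SK_MAX) {
      struct sketch_range ra = sketch_bounds(e->a);
      struct sketch_range rb = sketch_bounds(e->b);
      if (e->kind == SK_MIN) {
         r.low = sketch_minf(ra.low, rb.low);
         r.high = sketch_minf(ra.high, rb.high);
      } else {
         r.low = sketch_maxf(ra.low, rb.low);
         r.high = sketch_maxf(ra.high, rb.high);
      }
   }
   return r;
}

/* Return an equivalent expression with non-contributing operands dropped. */
static struct sketch_expr *
sketch_prune(struct sketch_expr *e)
{
   struct sketch_range ra, rb;

   assert(e != NULL);
   if (e->kind != SK_MIN && e->kind != SK_MAX)
      return e;

   /* Prune the children first, then look at this node. */
   e->a = sketch_prune(e->a);
   e->b = sketch_prune(e->b);
   ra = sketch_bounds(e->a);
   rb = sketch_bounds(e->b);

   if (e->kind == SK_MAX) {
      if (ra.high <= rb.low) return e->b;   /* a can never exceed b */
      if (rb.high <= ra.low) return e->a;   /* b can never exceed a */
   } else {
      if (ra.low >= rb.high) return e->b;   /* a can never be below b */
      if (rb.low >= ra.high) return e->a;   /* b can never be below a */
   }
   return e;
}
```

For example, max(min(x, 0.25), 0.5) collapses to the constant 0.5, while a clamp such as min(max(x, 0.0), 1.0) is left untouched because neither operand can be ruled out.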

This optimization pass improves the generated code for piglit's
AMD_shader_trinary_minmax tests as follows:

total instructions in shared programs: 75 -> 67 (-10.67%)
instructions in affected programs:     60 -> 52 (-13.33%)
GAINED:                                0
LOST:                                  0

All tests (max3, min3, mid3) improved.

A full shader-db run:

total instructions in shared programs: 4293603 -> 4293575 (-0.00%)
instructions in affected programs:     1188 -> 1160 (-2.36%)
GAINED:                                0
LOST:                                  0

Improvements happen in Guacamelee and Serious Sam 3. One shader from
Dungeon Defenders is hurt by shader-db metrics (26 -> 28), because of
dropping of a (constant float (0.0)) operand, which was
compiled to a saturate modifier.

Version 2 by Iago Toral Quiroga <ito...@igalia.com>:

Changes from review feedback:
- Squashed various cosmetic changes sent by Matt Turner.
- Make less_all_components return an enum rather than setting a class member.
  (Suggested by Matt Turner). Also, renamed it to compare_components.
- Make less_all_components, smaller_constant and larger_constant static.
  (Suggested by Matt Turner)
- Change minmax_range to call its limits low and high instead of
  range[0] and range[1]. (Suggested by Connor Abbott).
- Use ir_builder swizzle helpers in swizzle_if_required(). (Suggested by
  Connor Abbott).
- Make the logic clearer by rearranging the code and commenting.
  (Suggested by Connor Abbott).
- Added comment to explain why we need to recurse twice. (Suggested by
  Connor Abbott).
- If we cannot prune an expression, do not return early. Instead, attempt
  to prune its children. (Suggested by Connor Abbott).

Other changes:
- Instead of having a global valid visitor member, let the various functions
  that can determine this status return a boolean and check for its value
  to decide what to do in each case. This is more flexible and allows
  recursing into children of parents that could not be pruned due to invalid
  ranges (so related to the last bullet in the review feedback).
- Make sure we always check if a range is valid before working with it. Since
  any use of get_range, combine_range or range_intersection can invalidate
  a range we should check for this situation every time we use any of these
  functions.

No piglit regressions observed with Version 2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76861
---

Version 2 also passes all unit tests sent by Petri in the original series.

 src/glsl/Makefile.sources   |   1 +
 src/glsl/glsl_parser_extras.cpp |   1 +
 src/glsl/ir_optimization.h  |   1 +
 src/glsl/opt_minmax.cpp | 457 
 4 files changed, 460 insertions(+)
 create mode 100644 src/glsl/opt_minmax.cpp

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index cb8d5a6..1c08697 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -95,6 +95,7 @@ LIBGLSL_FILES = \
$(GLSL_SRCDIR)/opt_flip_matrices.cpp \
$(GLSL_SRCDIR)/opt_function_inlining.cpp \
$(GLSL_SRCDIR)/opt_if_simplification.cpp \
+   $(GLSL_SRCDIR)/opt_minmax.cpp \
$(GLSL_SRCDIR)/opt_noop_swizzle.cpp \
$(GLSL_SRCDIR)/opt_rebalance_tree.cpp \
$(GLSL_SRCDIR)/opt_redundant_jumps.cpp \
diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index 490c3c8..ae19ce4 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -1586,6 +1586,7 @@ do_common_optimization(exec_list *ir, bool linked,
else
   progress = do_constant_variable_unlinked(ir) || progress;
progress = do_constant_folding(ir) || progress;
+   progress = do_minmax_prune(ir) || progress;
progress = do_cse(ir) || progress;
progress = do_rebalance_tree(ir) || progress;
progress = do_algebraic(ir, native_integers, options) || progress;
diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
index 369dcd1..8fbd992 100644
--- a/src/glsl/ir_optimization.h
+++ b/src/glsl/ir_optimization.h
@@ -99,6 +99,7 @@ bool opt_flatten_nested_if_blocks(exec_list *instructions);
 bool do_discard_simplification(exec_list *instructions);
 bool lower_if_to_cond_assign(exec_list *instructions, unsigned max_depth = 0);
 bool do_mat_op_to_vec(exec_list *instructions);
+bool do_minmax_prune(exec_list *instructions);
 bool do_noop_swizzle(exec_list *instructions);
 bool do_structure_splitting(exec_list 

Re: [Mesa-dev] [PATCH 1/4] radeonsi/compute: directly emit CONTEXT_CONTROL

2014-09-26 Thread Alex Deucher
On Thu, Sep 25, 2014 at 3:02 PM, Tom Stellard t...@stellard.net wrote:
 On Mon, Sep 22, 2014 at 09:48:43PM +0200, Marek Olšák wrote:
 No, we cannot detect compute-only contexts yet. We need to add a new
 parameter to pipe_context::context_create which says that a context is
 compute-only. That should be OpenCL but not OpenGL.

 Also, some code paths like resource_copy_region use the graphics
 engine for copying, which cannot be used with compute rings and must
 be implemented with either DMA or compute-based blits. DMA isn't
 flexible enough, so some additional work for compute-based blits might
 be needed. We can also use the graphics ring for copying only and the
 compute ring for compute stuff.


 If possible, I think I would prefer continuing to use the graphics ring
 for blits and only submit compute-specific packets to the compute ring.
 I'm a little concerned that adding a compute flag to context create
 might make it harder to share code between compute and graphics, which
 I think is important.

 What are the downsides of using both rings at once?  Will we need to add
 synchronization code for the two rings?  I think the last time I
 looked into doing this, the biggest problem was that fences were
 submitted via the graphics ring even though they were meant for jobs
 on the compute ring.  Is there a good solution to this?

It would be nice to not have any dependencies on the gfx ring.  That
way compute jobs can run on the compute rings without requiring the
gfx ring which should avoid any latency issues with desktop gfx jobs.

Alex


 -Tom

 Marek

 On Mon, Sep 22, 2014 at 8:03 PM, Niels Ole Salscheider
 niels_...@salscheider-online.de wrote:
  On Monday 22 September 2014, 12:16:13, Alex Deucher wrote:
  On Sat, Sep 20, 2014 at 6:11 AM, Marek Olšák mar...@gmail.com wrote:
   From: Marek Olšák marek.ol...@amd.com
 
  Looks good.  Tom should probably take a look as well.  As a further
  improvement, it would be nice to be able to use the compute rings for
  compute rather than gfx, but I'm not sure how much additional effort
  it would take to clean that up.
 
  This is completely untested but now that we can detect compute contexts
  something like the attached patches might be sufficient...
 
  Reviewed-by: Alex Deucher alexander.deuc...@amd.com
 
   ---
  
src/gallium/drivers/radeonsi/si_compute.c | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
  
   diff --git a/src/gallium/drivers/radeonsi/si_compute.c
   b/src/gallium/drivers/radeonsi/si_compute.c index 4b2662d..3ad9182 
   100644
   --- a/src/gallium/drivers/radeonsi/si_compute.c
   +++ b/src/gallium/drivers/radeonsi/si_compute.c
   @@ -168,6 +168,7 @@ static void si_launch_grid(
  
   uint32_t pc, const void *input)
  
{
  
   struct si_context *sctx = (struct si_context*)ctx;
  
   +   struct radeon_winsys_cs *cs = sctx-b.rings.gfx.cs;
  
   struct si_compute *program = sctx-cs_shader_state.program;
   struct si_pm4_state *pm4 = CALLOC_STRUCT(si_pm4_state);
   struct r600_resource *input_buffer = program-input_buffer;
  
   @@ -184,8 +185,11 @@ static void si_launch_grid(
  
   unsigned lds_blocks;
   unsigned num_waves_for_scratch;
  
   +   radeon_emit(cs, PKT3(PKT3_CONTEXT_CONTROL, 1, 0) |
   PKT3_SHADER_TYPE_S(1)); +   radeon_emit(cs, 0x8000);
   +   radeon_emit(cs, 0x8000);
   +
  
   pm4-compute_pkt = true;
  
   -   si_cmd_context_control(pm4);
  
   si_pm4_cmd_begin(pm4, PKT3_EVENT_WRITE);
   si_pm4_cmd_add(pm4, EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH) |
  
   --
   1.9.1
  


Re: [Mesa-dev] Mesa (master): glsl: Make sure fields after small structs have correct padding

2014-09-26 Thread Ian Romanick
Okay... I screwed up this morning.  I pushed a set of four patches
without adding Jordan's Reviewed-by.  Realizing the error, I quickly
added the R-b to each commit and force-pushed the changes.

If you pushed something in the intervening 2 minutes, it got lost.

On 09/26/2014 08:00 AM, Ian Romanick wrote:
> Module: Mesa
> Branch: master
> Commit: 8e01c66da6c780601f941aa5b9939962c219fdbd
> URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=8e01c66da6c780601f941aa5b9939962c219fdbd
>
> Author: Ian Romanick <ian.d.roman...@intel.com>
> Date:   Mon Sep  8 12:23:39 2014 -0700
>
> glsl: Make sure fields after small structs have correct padding
>
> Previously the linker would correctly calculate the layout, but the
> lower_ubo_reference pass would not apply correct alignment to fields
> following small (less than 16-byte) nested structures.
>
> Signed-off-by: Ian Romanick <ian.d.roman...@intel.com>
> Reviewed-by: Jordan Justen <jordan.l.jus...@intel.com>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83533
> Cc: mesa-sta...@lists.freedesktop.org
>
> ---
>
>  src/glsl/lower_ubo_reference.cpp |   22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/src/glsl/lower_ubo_reference.cpp b/src/glsl/lower_ubo_reference.cpp
> index 3cdfc04..4ae1aac 100644
> --- a/src/glsl/lower_ubo_reference.cpp
> +++ b/src/glsl/lower_ubo_reference.cpp
> @@ -327,6 +327,15 @@ lower_ubo_reference_visitor::handle_rvalue(ir_rvalue **rvalue)
>           const glsl_type *struct_type = deref_record->record->type;
>           unsigned intra_struct_offset = 0;
>
> +         /* glsl_type::std140_base_alignment doesn't grok interfaces.  Use
> +          * 16-bytes for the alignment because that is the general minimum of
> +          * std140.
> +          */
> +         const unsigned struct_alignment = struct_type->is_interface()
> +            ? 16
> +            : struct_type->std140_base_alignment(row_major);
> +
> +
>           for (unsigned int i = 0; i < struct_type->length; i++) {
>              const glsl_type *type = struct_type->fields.structure[i].type;
>
> @@ -346,6 +355,19 @@ lower_ubo_reference_visitor::handle_rvalue(ir_rvalue **rvalue)
>                          deref_record->field) == 0)
>                 break;
>              intra_struct_offset += type->std140_size(field_row_major);
> +
> +            /* If the field just examined was itself a structure, apply rule
> +             * #9:
> +             *
> +             * "The structure may have padding at the end; the base offset
> +             * of the member following the sub-structure is rounded up to
> +             * the next multiple of the base alignment of the structure."
> +             */
> +            if (type->without_array()->is_record()) {
> +               intra_struct_offset = glsl_align(intra_struct_offset,
> +                                                struct_alignment);
> +
> +            }
>           }
>
>           const_offset += intra_struct_offset;
 
 ___
 mesa-commit mailing list
 mesa-com...@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-commit
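
For anyone unfamiliar with rule #9 quoted in the commit, the arithmetic can be sketched as below. This is only an illustration: sketch_align and sketch_offset_after_struct are hypothetical helpers, not Mesa's glsl_align or the lowering pass, and alignments are assumed to be powers of two.

```c
#include <assert.h>

/* Round offset up to the next multiple of alignment (a power of two). */
static unsigned
sketch_align(unsigned offset, unsigned alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (offset + alignment - 1) & ~(alignment - 1);
}

/* Offset of the member that follows a nested struct whose last byte of
 * data ends at struct_end.  The offset is first rounded up to the
 * struct's base alignment (rule #9), then to the member's own alignment. */
static unsigned
sketch_offset_after_struct(unsigned struct_end,
                           unsigned struct_alignment,
                           unsigned member_alignment)
{
   unsigned offset = sketch_align(struct_end, struct_alignment);
   return sketch_align(offset, member_alignment);
}
```

As a worked case: a nested struct holding a single float ends at byte 4, but its std140 base alignment is 16, so a float declared right after it lands at offset 16 rather than 4. Skipping the first rounding step is the kind of mistake the commit describes for small nested structures.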
 



Re: [Mesa-dev] [PATCH 2/4] i965: Issue performance warnings on MapBufferRange stalls.

2014-09-26 Thread Kristian Høgsberg
On Fri, Aug 29, 2014 at 11:10:48PM -0700, Kenneth Graunke wrote:
> This is easy: we just need to use brw_bo_map instead of mapping it
> directly.
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>

Reviewed-by: Kristian Høgsberg <k...@bitplanet.net>

> ---
>  src/mesa/drivers/dri/i965/intel_buffer_objects.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
> index 96dacde..fb806dc 100644
> --- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c
> +++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
> @@ -421,8 +421,8 @@ intel_bufferobj_map_range(struct gl_context * ctx,
>                                                intel_obj->map_extra[index],
>                                                alignment);
>        if (brw->has_llc) {
> -         drm_intel_bo_map(intel_obj->range_map_bo[index],
> -                          (access & GL_MAP_WRITE_BIT) != 0);
> +         brw_bo_map(brw, intel_obj->range_map_bo[index],
> +                    (access & GL_MAP_WRITE_BIT) != 0, "range-map");
>        } else {
>           drm_intel_gem_bo_map_gtt(intel_obj->range_map_bo[index]);
>        }
> @@ -438,7 +438,8 @@ intel_bufferobj_map_range(struct gl_context * ctx,
>        drm_intel_gem_bo_map_gtt(intel_obj->buffer);
>        intel_bufferobj_mark_inactive(intel_obj);
>     } else {
> -      drm_intel_bo_map(intel_obj->buffer, (access & GL_MAP_WRITE_BIT) != 0);
> +      brw_bo_map(brw, intel_obj->buffer, (access & GL_MAP_WRITE_BIT) != 0,
> +                 "MapBufferRange");
>        intel_bufferobj_mark_inactive(intel_obj);
>     }
>
> --
> 2.1.0
 


Re: [Mesa-dev] [PATCH 3/4] i965: Issue performance warnings for program cache related stalls.

2014-09-26 Thread Kristian Høgsberg
On Fri, Aug 29, 2014 at 11:10:49PM -0700, Kenneth Graunke wrote:
> We don't really want extra buffer copying or stalls when mapping,
> so it'd be nice to know when it's happening.
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>

Reviewed-by: Kristian Høgsberg <k...@bitplanet.net>

> ---
>  src/mesa/drivers/dri/i965/brw_state_cache.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/i965/brw_state_cache.c
> index b0986ea..b9bb0fc 100644
> --- a/src/mesa/drivers/dri/i965/brw_state_cache.c
> +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
> @@ -175,7 +175,7 @@ brw_cache_new_bo(struct brw_cache *cache, uint32_t new_size)
>
>     /* Copy any existing data that needs to be saved. */
>     if (cache->next_offset != 0) {
> -      drm_intel_bo_map(cache->bo, false);
> +      brw_bo_map(brw, cache->bo, false, "program cache");
>        drm_intel_bo_subdata(new_bo, 0, cache->next_offset, cache->bo->virtual);
>        drm_intel_bo_unmap(cache->bo);
>     }
> @@ -200,6 +200,7 @@ brw_try_upload_using_copy(struct brw_cache *cache,
>                           const void *data,
>                           const void *aux)
>  {
> +   struct brw_context *brw = cache->brw;
>     int i;
>     struct brw_cache_item *item;
>
> @@ -221,7 +222,7 @@ brw_try_upload_using_copy(struct brw_cache *cache,
>           continue;
>        }
>
> -      drm_intel_bo_map(cache->bo, false);
> +      brw_bo_map(brw, cache->bo, false, "program cache");
>        ret = memcmp(cache->bo->virtual + item->offset, data, item->size);
>        drm_intel_bo_unmap(cache->bo);
>        if (ret)
> @@ -241,6 +242,8 @@ brw_upload_item_data(struct brw_cache *cache,
>                      struct brw_cache_item *item,
>                      const void *data)
>  {
> +   struct brw_context *brw = cache->brw;
> +
>     /* Allocate space in the cache BO for our new program. */
>     if (cache->next_offset + item->size > cache->bo->size) {
>        uint32_t new_size = cache->bo->size * 2;
> @@ -255,6 +258,7 @@ brw_upload_item_data(struct brw_cache *cache,
>      * recreate it.
>      */
>     if (cache->bo_used_by_gpu) {
> +      perf_debug("Copying busy program cache buffer.\n");
>        brw_cache_new_bo(cache, cache->bo->size);
>     }
>
> --
> 2.1.0
 


Re: [Mesa-dev] [PATCH 3/4] i965: Issue performance warnings for program cache related stalls.

2014-09-26 Thread Chris Wilson
On Fri, Sep 26, 2014 at 08:36:39AM -0700, Kristian Høgsberg wrote:
> On Fri, Aug 29, 2014 at 11:10:49PM -0700, Kenneth Graunke wrote:
> > We don't really want extra buffer copying or stalls when mapping,
> > so it'd be nice to know when it's happening.
> >
> > Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
>
> Reviewed-by: Kristian Høgsberg <k...@bitplanet.net>

This warns if the program cache is currently being read by the GPU
(expected), but a read-read (as used here) does not incur a stall.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[Mesa-dev] [PATCH] glsl: improve accuracy of atan()

2014-09-26 Thread Erik Faye-Lund
Our current atan()-approximation is pretty inaccurate at 1.0, so
let's try to improve the situation by doing a direct approximation
without going through asin.

This new implementation uses an 11th degree polynomial to approximate
atan in the [-1..1] range, and the following identity to reduce the
entire range to [-1..1]:

atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x)

This range-reduction idea is taken from the paper "Fast computation
of Arctangent Functions for Embedded Applications: A Comparative
Analysis" (Ukil et al. 2011).

The polynomial that approximates atan(x) is:

x   * 0.9999793128310355 - x^3  * 0.3326756418091246 +
x^5 * 0.1938924977115610 - x^7  * 0.1173503194786851 +
x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444

This polynomial was found with the following GNU Octave script:

x = linspace(0, 1);
y = atan(x);
n = [1, 3, 5, 7, 9, 11];
format long;
polyfitc(x, y, n)

The polyfitc function is not built-in, but too long to include here.
It can be downloaded from the following URL:

http://www.mathworks.com/matlabcentral/fileexchange/47851-constraint-polynomial-fit/content/polyfitc.m
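
As a plain-C illustration of the scheme described above (range reduction, odd polynomial, then sign fixup), here is a sketch that can be run outside the compiler. It is not the IR-building code from the patch: sketch_atan is a hypothetical name, it works on a scalar double, and the leading coefficient 0.9999793128310355 is my assumption of the intended value (it is the one that makes the polynomial come out at roughly pi/4 for x = 1).

```c
#include <assert.h>

/* Approximate atan(y_over_x) with an 11th degree odd polynomial on
 * [0, 1] plus the identity atan(a) = pi/2 - atan(1/a) for a > 1. */
static double
sketch_atan(double y_over_x)
{
   const double half_pi = 1.5707963267948966;
   double a, x, t, p;

   assert(y_over_x == y_over_x);   /* NaN input is not handled here */

   /* Range reduction: x = a if a <= 1, otherwise 1 / a. */
   a = y_over_x < 0.0 ? -y_over_x : y_over_x;
   x = (a < 1.0 ? a : 1.0) / (a > 1.0 ? a : 1.0);

   /* Horner evaluation in t = x^2, highest degree term first. */
   t = x * x;
   p = -0.0121323213173444;
   p = p * t + 0.0536813784310406;
   p = p * t - 0.1173503194786851;
   p = p * t + 0.1938924977115610;
   p = p * t - 0.3326756418091246;
   p = p * t + 0.9999793128310355;   /* assumed leading coefficient */
   p = p * x;

   /* Range-reduction fixup, then restore the sign of the input. */
   if (a > 1.0)
      p = half_pi - p;
   return y_over_x < 0.0 ? -p : p;
}
```

Evaluating the polynomial with Horner's rule in x^2 keeps it to six multiply-add steps, which is the same shape as the nested mul/add/sub expression in the patch.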

This fixes the following piglit test:
shaders/glsl-const-folding-01

Signed-off-by: Erik Faye-Lund kusmab...@gmail.com
Reviewed-by: Ian Romanick ian.d.roman...@intel.com
---
 src/glsl/builtin_functions.cpp | 65 +++---
 1 file changed, 55 insertions(+), 10 deletions(-)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index 9be7f6d..c126b60 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -442,6 +442,7 @@ private:
    ir_swizzle *matrix_elt(ir_variable *var, int col, int row);
 
    ir_expression *asin_expr(ir_variable *x);
+   void do_atan(ir_factory &body, const glsl_type *type, ir_variable *res, operand y_over_x);
 
    /**
     * Call function \param f with parameters specified as the linked
@@ -2684,11 +2685,7 @@ builtin_builder::_atan2(const glsl_type *type)
       ir_factory outer_then(&outer_if->then_instructions, mem_ctx);
 
       /* Then...call atan(y/x) */
-      ir_variable *y_over_x = outer_then.make_temp(glsl_type::float_type, "y_over_x");
-      outer_then.emit(assign(y_over_x, div(y, x)));
-      outer_then.emit(assign(r, mul(y_over_x, rsq(add(mul(y_over_x, y_over_x),
-                                                      imm(1.0f))))));
-      outer_then.emit(assign(r, asin_expr(r)));
+      do_atan(body, glsl_type::float_type, r, div(y, x));
 
       /* ...and fix it up: */
       ir_if *inner_if = new(mem_ctx) ir_if(less(x, imm(0.0f)));
@@ -2711,17 +2708,65 @@ builtin_builder::_atan2(const glsl_type *type)
    return sig;
 }
 
+void
+builtin_builder::do_atan(ir_factory &body, const glsl_type *type, ir_variable *res, operand y_over_x)
+{
+   /*
+    * range-reduction, first step:
+    *
+    *      / y_over_x         if |y_over_x| <= 1.0;
+    * x = <
+    *      \ 1.0 / y_over_x   otherwise
+    */
+   ir_variable *x = body.make_temp(type, "atan_x");
+   body.emit(assign(x, div(min2(abs(y_over_x),
+                                imm(1.0f)),
+                           max2(abs(y_over_x),
+                                imm(1.0f)))));
+
+   /*
+    * approximate atan by evaluating polynomial:
+    *
+    * x   * 0.9999793128310355 - x^3  * 0.3326756418091246 +
+    * x^5 * 0.1938924977115610 - x^7  * 0.1173503194786851 +
+    * x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444
+    */
+   ir_variable *tmp = body.make_temp(type, "atan_tmp");
+   body.emit(assign(tmp, mul(x, x)));
+   body.emit(assign(tmp, mul(add(mul(sub(mul(add(mul(sub(mul(add(mul(imm(-0.0121323213173444f),
+                                                                     tmp),
+                                                                 imm(0.0536813784310406f)),
+                                                             tmp),
+                                                         imm(0.1173503194786851f)),
+                                                     tmp),
+                                                 imm(0.1938924977115610f)),
+                                             tmp),
+                                         imm(0.3326756418091246f)),
+                                     tmp),
+                                 imm(0.9999793128310355f)),
+                             x)));
+
+   /* range-reduction fixup */
+   body.emit(assign(tmp, add(tmp,
+                             mul(b2f(greater(abs(y_over_x),
+                                          imm(1.0f, type->components()))),
+                                  add(mul(tmp,
+                                          imm(-2.0f)),
+                                      imm(M_PI_2f))))));
+
+   /* sign fixup */
+   body.emit(assign(res, mul(tmp, sign(y_over_x))));
+}
+
 ir_function_signature *
 builtin_builder::_atan(const glsl_type *type)
 {
ir_variable *y_over_x = in_var(type, y_over_x);
MAKE_SIG(type, always_available, 1, 

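The range reduction and polynomial used by do_atan() in the patch above can be sketched in standalone C++ as follows. This is only an illustration under stated assumptions, not the Mesa IR-builder code: the name atan_poly is hypothetical, it works on scalar doubles rather than GLSL vector types, and the leading coefficient 0.9999793128310355 is an assumption, chosen because it makes the polynomial come out near pi/4 at x = 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical scalar sketch of the atan approximation in the patch.
// Assumption: the leading polynomial coefficient is 0.9999793128310355.
double atan_poly(double y_over_x)
{
    const double half_pi = 1.5707963267948966;
    const double a = std::fabs(y_over_x);

    // Range reduction: x = |v| when |v| <= 1, otherwise 1 / |v|.
    const double x = std::min(a, 1.0) / std::max(a, 1.0);
    const double t = x * x;

    // Horner evaluation of the odd polynomial in x (even powers via t).
    double r = -0.0121323213173444;
    r = r * t + 0.0536813784310406;
    r = r * t - 0.1173503194786851;
    r = r * t + 0.1938924977115610;
    r = r * t - 0.3326756418091246;
    r = r * t + 0.9999793128310355;
    r *= x;

    // Range-reduction fixup: atan(v) = pi/2 - atan(1/v) for v > 1.
    // The patch writes this branch-free as r + b2f(|v| > 1) * (pi/2 - 2r).
    if (a > 1.0)
        r = half_pi - r;

    // Sign fixup: atan is odd, so multiply by sign(v).
    const double s = (y_over_x > 0.0) - (y_over_x < 0.0);
    return r * s;
}
```

At x = 1 the polynomial sums to about 0.785395, against pi/4 of about 0.785398, which is why the assumed leading coefficient looks plausible.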
Re: [Mesa-dev] [PATCH 4/4] i965: Use unsynchronized maps for the program cache on LLC platforms.

2014-09-26 Thread Kristian Høgsberg
On Fri, Aug 29, 2014 at 11:10:50PM -0700, Kenneth Graunke wrote:
 There's no reason to stall on pwrite - the CPU always appends to the
 buffer and never modifies existing contents, and the GPU never writes
 it.  Further, the CPU always appends new data before submitting a batch
 that requires it.
 
 This code predates the unsynchronized mapping feature, so we simply
 didn't have the option when it was written.
 
 Ideally, we would do this for non-LLC platforms too, but unsynchronized
 mapping support only exists for LLC systems.
 
 Saves repeated 0.001ms stalls on program upload.
 
 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/drivers/dri/i965/brw_state_cache.c | 34 
 +++--
  1 file changed, 27 insertions(+), 7 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c 
 b/src/mesa/drivers/dri/i965/brw_state_cache.c
 index b9bb0fc..1d2d32f 100644
 --- a/src/mesa/drivers/dri/i965/brw_state_cache.c
 +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
 @@ -172,14 +172,23 @@ brw_cache_new_bo(struct brw_cache *cache, uint32_t new_size)
     drm_intel_bo *new_bo;
  
     new_bo = drm_intel_bo_alloc(brw->bufmgr, "program cache", new_size, 64);
 +   if (brw->has_llc)
 +      drm_intel_gem_bo_map_unsynchronized(new_bo);
  
     /* Copy any existing data that needs to be saved. */
     if (cache->next_offset != 0) {
 -      brw_bo_map(brw, cache->bo, false, "program cache");
 -      drm_intel_bo_subdata(new_bo, 0, cache->next_offset, cache->bo->virtual);
 -      drm_intel_bo_unmap(cache->bo);
 +      if (brw->has_llc) {
 +         memcpy(new_bo->virtual, cache->bo->virtual, cache->next_offset);

Move the drm_intel_gem_bo_map_unsynchronized() and drm_intel_bo_unmap()
calls into this block so they bracket the memcpy as for the subdata case
below?

Other than that,

Reviewed-by: Kristian Høgsberg k...@bitplanet.net

 +      } else {
 +         brw_bo_map(brw, cache->bo, false, "program cache");
 +         drm_intel_bo_subdata(new_bo, 0, cache->next_offset,
 +                              cache->bo->virtual);
 +         drm_intel_bo_unmap(cache->bo);
 +      }
     }
  
 +   if (brw->has_llc)
 +      drm_intel_bo_unmap(cache->bo);
     drm_intel_bo_unreference(cache->bo);
     cache->bo = new_bo;
     cache->bo_used_by_gpu = false;
 @@ -222,9 +231,11 @@ brw_try_upload_using_copy(struct brw_cache *cache,
             continue;
          }
  
 -         brw_bo_map(brw, cache->bo, false, "program cache");
 +         if (!brw->has_llc)
 +            brw_bo_map(brw, cache->bo, false, "program cache");
          ret = memcmp(cache->bo->virtual + item->offset, data, item->size);
 -         drm_intel_bo_unmap(cache->bo);
 +         if (!brw->has_llc)
 +            drm_intel_bo_unmap(cache->bo);
          if (ret)
             continue;
  
 @@ -257,7 +268,7 @@ brw_upload_item_data(struct brw_cache *cache,
     /* If we would block on writing to an in-use program BO, just
      * recreate it.
      */
 -   if (cache->bo_used_by_gpu) {
 +   if (!brw->has_llc && cache->bo_used_by_gpu) {
        perf_debug("Copying busy program cache buffer.\n");
        brw_cache_new_bo(cache, cache->bo->size);
     }
 @@ -280,6 +291,7 @@ brw_upload_cache(struct brw_cache *cache,
                   uint32_t *out_offset,
                   void *out_aux)
  {
 +   struct brw_context *brw = cache->brw;
     struct brw_cache_item *item = CALLOC_STRUCT(brw_cache_item);
     GLuint hash;
     void *tmp;
 @@ -320,7 +332,11 @@ brw_upload_cache(struct brw_cache *cache,
     cache->n_items++;
  
     /* Copy data to the buffer */
 -   drm_intel_bo_subdata(cache->bo, item->offset, data_size, data);
 +   if (brw->has_llc) {
 +      memcpy((char *) cache->bo->virtual + item->offset, data, data_size);
 +   } else {
 +      drm_intel_bo_subdata(cache->bo, item->offset, data_size, data);
 +   }
  
     *out_offset = item->offset;
     *(void **)out_aux = (void *)((char *)item->key + item->key_size);
 @@ -342,6 +358,8 @@ brw_init_caches(struct brw_context *brw)
     cache->bo = drm_intel_bo_alloc(brw->bufmgr,
                                    "program cache",
                                    4096, 64);
 +   if (brw->has_llc)
 +      drm_intel_gem_bo_map_unsynchronized(cache->bo);
  
     cache->aux_compare[BRW_VS_PROG] = brw_vs_prog_data_compare;
     cache->aux_compare[BRW_GS_PROG] = brw_gs_prog_data_compare;
 @@ -408,6 +426,8 @@ brw_destroy_cache(struct brw_context *brw, struct brw_cache *cache)
  
     DBG("%s\n", __FUNCTION__);
  
 +   if (brw->has_llc)
 +      drm_intel_bo_unmap(cache->bo);
     drm_intel_bo_unreference(cache->bo);
     cache->bo = NULL;
     brw_clear_cache(brw, cache);
 -- 
 2.1.0
 

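The argument in the commit message above (the CPU only ever appends, existing bytes are never modified, so there is nothing to wait for) can be illustrated with a small standalone sketch. Everything here is hypothetical: AppendOnlyCache and its members are stand-ins invented for illustration, a std::vector plays the role of the persistently mapped buffer object, and no libdrm calls are made.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a buffer object that stays mapped for its
// whole lifetime, as the patch arranges on LLC platforms.
struct AppendOnlyCache {
    std::vector<std::uint8_t> storage;  // plays the role of the mapped BO
    std::size_t next_offset = 0;        // bytes below this are never rewritten

    explicit AppendOnlyCache(std::size_t size) : storage(size) {}

    // Append `size` bytes and return the offset they were written at.
    std::size_t upload(const void *data, std::size_t size)
    {
        assert(data != nullptr || size == 0);
        if (next_offset + size > storage.size()) {
            // Out of space: allocate a larger buffer and carry the old
            // contents across, loosely analogous to brw_cache_new_bo().
            std::vector<std::uint8_t> bigger(
                std::max(storage.size() * 2, next_offset + size));
            if (next_offset > 0)
                std::memcpy(bigger.data(), storage.data(), next_offset);
            storage.swap(bigger);
        }
        // Plain store into the mapping. Earlier readers only look at
        // bytes below next_offset, so this write cannot disturb them.
        if (size > 0)
            std::memcpy(storage.data() + next_offset, data, size);
        const std::size_t offset = next_offset;
        next_offset += size;
        return offset;
    }
};
```

The sketch leaves out offset alignment and the non-LLC fallback path; it only shows why an append-only writer does not need to synchronize with consumers of earlier data.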

Re: [Mesa-dev] [PATCH 2/4] i965: Issue performance warnings on MapBufferRange stalls.

2014-09-26 Thread Kristian Høgsberg
On Fri, Sep 26, 2014 at 12:38 AM, Kenneth Graunke kenn...@whitecape.org wrote:
 On Friday, August 29, 2014 11:10:48 PM Kenneth Graunke wrote:
 This is easy: we just need to use brw_bo_map instead of mapping it
 directly.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/drivers/dri/i965/intel_buffer_objects.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/intel_buffer_objects.c 
 b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
 index 96dacde..fb806dc 100644
 --- a/src/mesa/drivers/dri/i965/intel_buffer_objects.c
 +++ b/src/mesa/drivers/dri/i965/intel_buffer_objects.c
 @@ -421,8 +421,8 @@ intel_bufferobj_map_range(struct gl_context * ctx,
                                                           intel_obj->map_extra[index],
                                                           alignment);
        if (brw->has_llc) {
 -         drm_intel_bo_map(intel_obj->range_map_bo[index],
 -                          (access & GL_MAP_WRITE_BIT) != 0);
 +         brw_bo_map(brw, intel_obj->range_map_bo[index],
 +                    (access & GL_MAP_WRITE_BIT) != 0, "range-map");
        } else {
           drm_intel_gem_bo_map_gtt(intel_obj->range_map_bo[index]);
        }
 @@ -438,7 +438,8 @@ intel_bufferobj_map_range(struct gl_context * ctx,
        drm_intel_gem_bo_map_gtt(intel_obj->buffer);
        intel_bufferobj_mark_inactive(intel_obj);
     } else {
 -      drm_intel_bo_map(intel_obj->buffer, (access & GL_MAP_WRITE_BIT) != 0);
 +      brw_bo_map(brw, intel_obj->buffer, (access & GL_MAP_WRITE_BIT) != 0,
 +                 "MapBufferRange");
        intel_bufferobj_mark_inactive(intel_obj);
     }

 It's been a month and patches 2-4 haven't received any review.  Could someone 
 take a look?

Sorry, I saw them go by but didn't review.  I like 4/4 a lot.

Kristian

 Thanks,
 --Ken

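The idea behind routing the map through a wrapper, so that a stall on a busy buffer produces a performance warning, can be sketched roughly as below. This is a hedged illustration only: FakeBo, map_with_perf_warning and the warning text are hypothetical names made up for this sketch, and the "map" is simulated rather than calling into libdrm.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

// Hypothetical buffer object: `busy` means the GPU is still using it.
struct FakeBo {
    bool busy = false;
    bool mapped = false;
};

// Map the buffer, recording a warning when the map had to wait for a
// busy buffer. Returns true once the buffer is mapped.
bool map_with_perf_warning(FakeBo &bo, const std::string &name,
                           std::vector<std::string> &warnings)
{
    const bool was_busy = bo.busy;
    const auto start = std::chrono::steady_clock::now();

    // Simulated blocking map: a real driver would wait for the GPU here.
    bo.busy = false;
    bo.mapped = true;

    if (was_busy) {
        const std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - start;
        warnings.push_back("CPU mapping a busy " + name + " BO stalled for " +
                           std::to_string(elapsed.count()) + " ms");
    }
    return bo.mapped;
}
```

Passing a short name for the buffer, as the patch does with its extra string argument, is what lets the warning say which map stalled.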

Re: [Mesa-dev] [PATCH v2] glsl: Optimize min/max expression trees

2014-09-26 Thread Connor Abbott
On Fri, Sep 26, 2014 at 9:02 AM, Iago Toral Quiroga ito...@igalia.com wrote:
 Original patch by Petri Latvala petri.latv...@intel.com:

 Add an optimization pass that drops min/max expression operands that
 can be proven to not contribute to the final result. The algorithm is
 similar to alpha-beta pruning on a minmax search, from the field of
 AI.

 This optimization pass can optimize min/max expressions where operands
 are min/max expressions. Such code can appear in shaders by itself, or
 as the result of clamp() or AMD_shader_trinary_minmax functions.

 This optimization pass improves the generated code for piglit's
 AMD_shader_trinary_minmax tests as follows:

 total instructions in shared programs: 75 -> 67 (-10.67%)
 instructions in affected programs: 60 -> 52 (-13.33%)
 GAINED:0
 LOST:  0

 All tests (max3, min3, mid3) improved.

 A full shader-db run:

 total instructions in shared programs: 4293603 -> 4293575 (-0.00%)
 instructions in affected programs: 1188 -> 1160 (-2.36%)
 GAINED:0
 LOST:  0

 Improvements happen in Guacamelee and Serious Sam 3. One shader from
 Dungeon Defenders is hurt by shader-db metrics (26 -> 28), because of
 dropping of a (constant float (0.0)) operand, which was
 compiled to a saturate modifier.

 Version 2 by Iago Toral Quiroga ito...@igalia.com:

 Changes from review feedback:
 - Squashed various cosmetic changes sent by Matt Turner.
 - Make less_all_components return an enum rather than setting a class member.
   (Suggested by Matt Turner). Also, renamed it to compare_components.
 - Make less_all_components, smaller_constant and larger_constant static.
   (Suggested by Matt Turner)
 - Change minmax_range to call its limits low and high instead of
   range[0] and range[1]. (Suggested by Connor Abbott).
 - Use ir_builder swizzle helpers in swizzle_if_required(). (Suggested by
   Connor Abbott).
 - Make the logic clearer by rearranging the code and commenting.
   (Suggested by Connor Abbott).
 - Added comment to explain why we need to recurse twice. (Suggested by
   Connor Abbott).
 - If we cannot prune an expression, do not return early. Instead, attempt
   to prune its children. (Suggested by Connor Abbott).

 Other changes:
 - Instead of having a global valid visitor member, let the various functions
   that can determine this status return a boolean and check for its value
   to decide what to do in each case. This is more flexible and allows us to
   recurse into children of parents that could not be pruned due to invalid
   ranges (so related to the last bullet in the review feedback).
 - Make sure we always check if a range is valid before working with it. Since
   any use of get_range, combine_range or range_intersection can invalidate
   a range we should check for this situation every time we use any of these
   functions.

 No piglit regressions observed with Version 2.

 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76861
 ---

 Version 2 also passes all unit tests sent by Petri in the original series.

  src/glsl/Makefile.sources   |   1 +
  src/glsl/glsl_parser_extras.cpp |   1 +
  src/glsl/ir_optimization.h  |   1 +
  src/glsl/opt_minmax.cpp | 457 
 
  4 files changed, 460 insertions(+)
  create mode 100644 src/glsl/opt_minmax.cpp

 diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
 index cb8d5a6..1c08697 100644
 --- a/src/glsl/Makefile.sources
 +++ b/src/glsl/Makefile.sources
 @@ -95,6 +95,7 @@ LIBGLSL_FILES = \
 $(GLSL_SRCDIR)/opt_flip_matrices.cpp \
 $(GLSL_SRCDIR)/opt_function_inlining.cpp \
 $(GLSL_SRCDIR)/opt_if_simplification.cpp \
 +   $(GLSL_SRCDIR)/opt_minmax.cpp \
 $(GLSL_SRCDIR)/opt_noop_swizzle.cpp \
 $(GLSL_SRCDIR)/opt_rebalance_tree.cpp \
 $(GLSL_SRCDIR)/opt_redundant_jumps.cpp \
 diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
 index 490c3c8..ae19ce4 100644
 --- a/src/glsl/glsl_parser_extras.cpp
 +++ b/src/glsl/glsl_parser_extras.cpp
 @@ -1586,6 +1586,7 @@ do_common_optimization(exec_list *ir, bool linked,
 else
progress = do_constant_variable_unlinked(ir) || progress;
 progress = do_constant_folding(ir) || progress;
 +   progress = do_minmax_prune(ir) || progress;
 progress = do_cse(ir) || progress;
 progress = do_rebalance_tree(ir) || progress;
 progress = do_algebraic(ir, native_integers, options) || progress;
 diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
 index 369dcd1..8fbd992 100644
 --- a/src/glsl/ir_optimization.h
 +++ b/src/glsl/ir_optimization.h
 @@ -99,6 +99,7 @@ bool opt_flatten_nested_if_blocks(exec_list *instructions);
  bool do_discard_simplification(exec_list *instructions);
  bool lower_if_to_cond_assign(exec_list *instructions, unsigned max_depth = 
 0);
  bool 

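The kind of pruning the commit message describes can be illustrated with a much simpler, bottom-up variant: compute a [low, high] range for every subtree and drop a min/max operand whenever the ranges prove it can never win. This is explicitly a simplification and not the patch's algorithm, which also pushes limits down the tree in alpha-beta fashion. All names below (Expr, Range, prune and the helper constructors) are hypothetical and unrelated to the GLSL IR classes.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <memory>
#include <utility>

struct Expr {
    enum Kind { Const, Var, Min, Max } kind;
    double value;                    // used by Const only
    std::shared_ptr<Expr> lhs, rhs;  // used by Min/Max only
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr constant(double v) { return std::make_shared<Expr>(Expr{Expr::Const, v, nullptr, nullptr}); }
ExprPtr variable() { return std::make_shared<Expr>(Expr{Expr::Var, 0.0, nullptr, nullptr}); }
ExprPtr min2(ExprPtr a, ExprPtr b) { return std::make_shared<Expr>(Expr{Expr::Min, 0.0, std::move(a), std::move(b)}); }
ExprPtr max2(ExprPtr a, ExprPtr b) { return std::make_shared<Expr>(Expr{Expr::Max, 0.0, std::move(a), std::move(b)}); }

struct Range { double low, high; };

// Conservative value range of an expression; variables are unbounded.
Range range_of(const ExprPtr &e)
{
    const double inf = std::numeric_limits<double>::infinity();
    if (e->kind == Expr::Const)
        return {e->value, e->value};
    if (e->kind == Expr::Var)
        return {-inf, inf};
    const Range a = range_of(e->lhs), b = range_of(e->rhs);
    if (e->kind == Expr::Min)
        return {std::min(a.low, b.low), std::min(a.high, b.high)};
    return {std::max(a.low, b.low), std::max(a.high, b.high)};
}

// Drop operands that the ranges prove can never be selected.
ExprPtr prune(const ExprPtr &e)
{
    if (e->kind != Expr::Min && e->kind != Expr::Max)
        return e;
    const ExprPtr a = prune(e->lhs), b = prune(e->rhs);
    const Range ra = range_of(a), rb = range_of(b);
    if (e->kind == Expr::Min) {
        if (ra.high <= rb.low) return a;  // a is never larger than b
        if (rb.high <= ra.low) return b;  // b is never larger than a
        return min2(a, b);
    }
    if (ra.low >= rb.high) return a;      // a is never smaller than b
    if (rb.low >= ra.high) return b;      // b is never smaller than a
    return max2(a, b);
}
```

For example, min(max(min(x, 1), 0), 2) loses its outer min because the inner clamp already lies in [0, 1], and min(max(x, 3), 2) collapses to the constant 2.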
[Mesa-dev] [PATCH v2 40/41] i965/fs: Use the GRF for FB writes on gen >= 7

2014-09-26 Thread Jason Ekstrand
On gen 7, the MRF was removed and we gained the ability to do send
instructions directly from the GRF.  This commit enables that functionality
for FB writes.

v2: Make handling of components more sane.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp |   4 +
 src/mesa/drivers/dri/i965/brw_fs.h   |   1 +
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 167 +--
 src/mesa/drivers/dri/i965/brw_shader.cpp |   1 +
 4 files changed, 136 insertions(+), 37 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index b43032b..143b590 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -514,6 +514,8 @@ fs_inst::is_send_from_grf() const
   return true;
case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
   return src[1].file == GRF;
+   case FS_OPCODE_FB_WRITE:
+  return src[0].file == GRF;
default:
   if (is_tex())
  return src[0].file == GRF;
@@ -917,6 +919,8 @@ fs_inst::regs_read(fs_visitor *v, int arg) const
 {
if (is_tex()  arg == 0  src[0].file == GRF) {
   return mlen;
+   } else if (opcode == FS_OPCODE_FB_WRITE  arg == 0) {
+  return mlen;
} else if (opcode == SHADER_OPCODE_UNTYPED_ATOMIC  arg == 0) {
   return mlen;
} else if (opcode == SHADER_OPCODE_UNTYPED_SURFACE_READ  arg == 0) {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 7500e8e..a91bf9f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -521,6 +521,7 @@ public:
 fs_reg dst, fs_reg src0, fs_reg src1, fs_reg one);
 
void emit_color_write(fs_reg color, int index, int first_color_mrf);
+   int setup_color_payload(fs_reg *dst, fs_reg color, unsigned components);
void emit_alpha_test();
fs_inst *emit_single_fb_write(fs_reg color1, fs_reg color2,
  fs_reg src0_alpha, unsigned components);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 8e38315..e72fb62 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -3005,6 +3005,82 @@ fs_visitor::emit_color_write(fs_reg color, int index, 
int first_color_mrf)
}
 }
 
+int
+fs_visitor::setup_color_payload(fs_reg *dst, fs_reg color, unsigned components)
+{
+   fs_inst *inst;
+
+   if (color.file == BAD_FILE) {
+      return 4 * (dispatch_width / 8);
+   }
+
+   uint8_t colors_enabled;
+   if (components == 0) {
+      /* We want to write one component to the alpha channel */
+      colors_enabled = 0x8;
+   } else {
+      /* Enable the first components-many channels */
+      colors_enabled = (1 << components) - 1;
+   }
+
+   if (dispatch_width == 8 || brw->gen >= 6) {
+      /* SIMD8 write looks like:
+       * m + 0: r0
+       * m + 1: r1
+       * m + 2: g0
+       * m + 3: g1
+       *
+       * gen6 SIMD16 DP write looks like:
+       * m + 0: r0
+       * m + 1: r1
+       * m + 2: g0
+       * m + 3: g1
+       * m + 4: b0
+       * m + 5: b1
+       * m + 6: a0
+       * m + 7: a1
+       */
+      int len = 0;
+      for (unsigned i = 0; i < 4; ++i) {
+         if (colors_enabled & (1 << i)) {
+            dst[len] = fs_reg(GRF, virtual_grf_alloc(color.width / 8),
+                              color.type, color.width);
+            inst = emit(MOV(dst[len], offset(color, i)));
+            inst->saturate = key->clamp_fragment_color;
+         } else if (color.width == 16) {
+            /* We need two BAD_FILE slots for a 16-wide color */
+            len++;
+         }
+         len++;
+      }
+      return len;
+   } else {
+      /* pre-gen6 SIMD16 single source DP write looks like:
+       * m + 0: r0
+       * m + 1: g0
+       * m + 2: b0
+       * m + 3: a0
+       * m + 4: r1
+       * m + 5: g1
+       * m + 6: b1
+       * m + 7: a1
+       */
+      for (unsigned i = 0; i < 4; ++i) {
+         if (colors_enabled & (1 << i)) {
+            dst[i] = fs_reg(GRF, virtual_grf_alloc(1), color.type);
+            inst = emit(MOV(dst[i], half(offset(color, i), 0)));
+            inst->saturate = key->clamp_fragment_color;
+
+            dst[i + 4] = fs_reg(GRF, virtual_grf_alloc(1), color.type);
+            inst = emit(MOV(dst[i + 4], half(offset(color, i), 1)));
+            inst->saturate = key->clamp_fragment_color;
+            inst->force_sechalf = true;
+         }
+      }
+      return 8;
+   }
+}
+
 static enum brw_conditional_mod
 cond_for_alpha_func(GLenum func)
 {
@@ -3063,12 +3139,13 @@ fs_visitor::emit_single_fb_write(fs_reg color0, fs_reg 
color1,
 {
this-current_annotation = FB write header;
bool header_present = true;
+   int reg_size = dispatch_width / 8;
+
/* We can potentially have a message length of up to 15, so we have to set
 * base_mrf to either 0 or 1 in order to fit in m0..m15.
 */
-   int base_mrf = 1;
-   int nr = base_mrf;
-   

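The channel-mask and slot-counting arithmetic in setup_color_payload() can be checked in isolation with a small sketch. The function names color_channel_mask and payload_slots are hypothetical, and payload_slots mirrors only the counting of the SIMD8 / gen6+ branch as it reads in the patch above (one slot per enabled channel, one or two empty slots per disabled channel depending on width); it allocates no registers and emits no instructions.

```cpp
#include <cassert>
#include <cstdint>

// Bitmask of enabled colour channels (bit 0 = r ... bit 3 = a).
// components == 0 is the special "alpha only" case from the patch.
std::uint8_t color_channel_mask(unsigned components)
{
    assert(components <= 4);
    if (components == 0)
        return 0x8;
    return static_cast<std::uint8_t>((1u << components) - 1u);
}

// Number of payload slots the SIMD8 / gen6+ loop would report for a
// colour of the given width (8 or 16).
int payload_slots(std::uint8_t mask, unsigned width)
{
    assert(width == 8 || width == 16);
    int len = 0;
    for (unsigned i = 0; i < 4; ++i) {
        const bool enabled = (mask & (1u << i)) != 0;
        if (!enabled && width == 16)
            len++;  // a disabled 16-wide channel leaves two empty slots
        len++;
    }
    return len;
}
```

A fully enabled colour takes four slots at either width, while an alpha-only 16-wide colour takes seven: three disabled channels at two slots each plus one for alpha.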
[Mesa-dev] [PATCH 42/41] i965: Fix widths on gen5 math instructions.

2014-09-26 Thread Jason Ekstrand
This commit uses a 16-wide MRF instead of a hardware register when setting
up math instructions and properly sets the base_mrf on the second emitted
instruction.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 2 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 143b590..af9736b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1648,7 +1648,7 @@ fs_visitor::emit_math(enum opcode opcode, fs_reg dst, 
fs_reg src0, fs_reg src1)
   fs_reg op0 = is_int_div ? src1 : src0;
   fs_reg op1 = is_int_div ? src0 : src1;
 
-  emit(BRW_OPCODE_MOV, fs_reg(MRF, base_mrf + 1, op1.type), op1);
+  emit(MOV(fs_reg(MRF, base_mrf + 1, op1.type, dispatch_width), op1));
   inst = emit(opcode, dst, op0, reg_null_f);
 
       inst->base_mrf = base_mrf;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 59c7e7c..485c050 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -346,7 +346,7 @@ fs_generator::generate_math_gen4(fs_inst *inst,
   brw_set_default_compression_control(p, BRW_COMPRESSION_NONE);
   gen4_math(p, firsthalf(dst),
op,
-                inst->base_mrf + 1, firsthalf(src),
+                inst->base_mrf, firsthalf(src),
BRW_MATH_DATA_VECTOR,
BRW_MATH_PRECISION_FULL);
   brw_set_default_compression_control(p, BRW_COMPRESSION_2NDHALF);
-- 
2.1.0



[Mesa-dev] [PATCH 06.5/41] SQUASH: i965/fs: Always 2-align registers SIMD16 for gen <= 5

2014-09-26 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 61 ++-
 1 file changed, 48 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index 567f8e2..8d96906 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -117,7 +117,21 @@ brw_alloc_reg_set(struct intel_screen *screen, int 
reg_width)
/* Compute the total number of registers across all classes. */
int ra_reg_count = 0;
for (int i = 0; i  class_count; i++) {
-  ra_reg_count += base_reg_count - (class_sizes[i] - 1);
+  if (devinfo-gen = 5  reg_width == 2) {
+ /* From the GM5 PRM:
+  *
+  * In order to reduce the hardware complexity, the following
+  * rules and restrictions apply to the compressed instruction:
+  * ...
+  * * Operand Alignment Rule: With the exceptions listed below, a
+  *   source/destination operand in general should be aligned to
+  *   even 256-bit physical register with a region size equal to
+  *   two 256-bit physical register
+  */
+ ra_reg_count += (base_reg_count - (class_sizes[i] - 1)) / 2;
+  } else {
+ ra_reg_count += base_reg_count - (class_sizes[i] - 1);
+  }
}
 
uint8_t *ra_reg_to_grf = ralloc_array(screen, uint8_t, ra_reg_count);
@@ -134,27 +148,48 @@ brw_alloc_reg_set(struct intel_screen *screen, int 
reg_width)
int pairs_base_reg = 0;
int pairs_reg_count = 0;
for (int i = 0; i  class_count; i++) {
-  int class_reg_count = base_reg_count - (class_sizes[i] - 1);
+  int class_reg_count;
+  if (devinfo-gen = 5  reg_width == 2) {
+ class_reg_count = (base_reg_count - (class_sizes[i] - 1)) / 2;
+  } else {
+ class_reg_count = base_reg_count - (class_sizes[i] - 1);
+  }
   classes[i] = ra_alloc_reg_class(regs);
 
   /* Save this off for the aligned pair class at the end. */
   if (class_sizes[i] == 2) {
-pairs_base_reg = reg;
-pairs_reg_count = class_reg_count;
+ pairs_base_reg = reg;
+ pairs_reg_count = class_reg_count;
   }
 
-  for (int j = 0; j  class_reg_count; j++) {
-ra_class_add_reg(regs, classes[i], reg);
+  if (devinfo-gen = 5  reg_width == 2) {
+ for (int j = 0; j  class_reg_count; j++) {
+ra_class_add_reg(regs, classes[i], reg);
 
-ra_reg_to_grf[reg] = j;
+ra_reg_to_grf[reg] = j * 2;
 
-for (int base_reg = j;
- base_reg  j + class_sizes[i];
- base_reg++) {
-   ra_add_transitive_reg_conflict(regs, base_reg, reg);
-}
+for (int base_reg = j * 2;
+ base_reg  j * 2 + class_sizes[i];
+ base_reg++) {
+   ra_add_transitive_reg_conflict(regs, base_reg, reg);
+}
 
-reg++;
+reg++;
+ }
+  } else {
+ for (int j = 0; j  class_reg_count; j++) {
+ra_class_add_reg(regs, classes[i], reg);
+
+ra_reg_to_grf[reg] = j;
+
+for (int base_reg = j;
+ base_reg  j + class_sizes[i];
+ base_reg++) {
+   ra_add_transitive_reg_conflict(regs, base_reg, reg);
+}
+
+reg++;
+ }
   }
}
assert(reg == ra_reg_count);
-- 
2.1.0


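The effect of the 2-alignment change on the set of allocatable base registers can be sketched with plain integers. The helper class_base_regs and its parameters are hypothetical; it mirrors only the counting and the j * 2 mapping visible in the patch above, not the register-allocator data structures.

```cpp
#include <cassert>
#include <vector>

// Base GRF number for each allocatable register in a class whose members
// span `class_size` contiguous registers out of `base_reg_count`.
// With pair_aligned set, only even base registers are produced and the
// count is halved, mirroring the arithmetic in the patch.
std::vector<int> class_base_regs(int base_reg_count, int class_size,
                                 bool pair_aligned)
{
    assert(class_size >= 1 && class_size <= base_reg_count);
    int count = base_reg_count - (class_size - 1);
    if (pair_aligned)
        count /= 2;

    std::vector<int> bases;
    for (int j = 0; j < count; ++j)
        bases.push_back(pair_aligned ? j * 2 : j);
    return bases;
}
```

With 8 registers and a class size of 2, the unaligned case yields bases 0 through 6, while the aligned case yields 0, 2 and 4.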

[Mesa-dev] [PATCH 10.5/41] SQUASH: i965/fs: Properly set writemasks in LOAD_PAYLOAD

2014-09-26 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 56 +++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 444cc32..4d97594 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2865,10 +2865,44 @@ fs_visitor::lower_load_payload()
 {
bool progress = false;
 
+   int vgrf_to_reg[virtual_grf_count];
+   int reg_count = 16; /* Leave room for MRF */
+   for (int i = 0; i  virtual_grf_count; ++i) {
+  vgrf_to_reg[i] = reg_count;
+  reg_count += virtual_grf_sizes[i];
+   }
+
+   struct {
+  bool written:1; /* Whether this register has ever been written */
+  bool force_writemask_all:1;
+  bool force_sechalf:1;
+   } metadata[reg_count];
+   memset(metadata, 0, sizeof(metadata));
+
calculate_cfg();
 
foreach_block_and_inst_safe (block, fs_inst, inst, cfg) {
+  int dst_reg;
+  if (inst-dst.file == MRF) {
+ dst_reg = inst-dst.reg;
+  } else if (inst-dst.file == GRF) {
+ dst_reg = vgrf_to_reg[inst-dst.reg];
+  }
+
+  if (inst-dst.file == MRF || inst-dst.file == GRF) {
+ bool force_sechalf = inst-force_sechalf;
+ bool toggle_sechalf = inst-dst.width == 16 
+   type_sz(inst-dst.type) == 4;
+ for (int i = 0; i  inst-regs_written; ++i) {
+metadata[dst_reg + i].written = true;
+metadata[dst_reg + i].force_sechalf = force_sechalf;
+metadata[dst_reg + i].force_writemask_all = 
inst-force_writemask_all;
+force_sechalf = (toggle_sechalf != force_sechalf);
+ }
+  }
+
   if (inst-opcode == SHADER_OPCODE_LOAD_PAYLOAD) {
+ assert(inst-dst.file == MRF || inst-dst.file == GRF);
  fs_reg dst = inst-dst;
 
  for (int i = 0; i  inst-sources; i++) {
@@ -2879,7 +2913,27 @@ fs_visitor::lower_load_payload()
/* Do nothing but otherwise increment as normal */
 } else {
fs_inst *mov = MOV(dst, inst-src[i]);
-   mov-force_writemask_all = true;
+   if (inst-src[i].file == GRF) {
+  int src_reg = vgrf_to_reg[inst-src[i].reg] +
+inst-src[i].reg_offset;
+  mov-force_sechalf = metadata[src_reg].force_sechalf;
+  mov-force_writemask_all = 
metadata[src_reg].force_writemask_all;
+  metadata[dst_reg] = metadata[src_reg];
+  if (dst.width * type_sz(dst.type)  32) {
+ assert((!metadata[src_reg].written ||
+ !metadata[src_reg].force_sechalf) 
+(!metadata[src_reg + 1].written ||
+ metadata[src_reg + 1].force_sechalf));
+ metadata[dst_reg + 1] = metadata[src_reg + 1];
+  }
+   } else {
+  metadata[dst_reg].force_writemask_all = false;
+  metadata[dst_reg].force_sechalf = false;
+  if (dst.width == 16) {
+ metadata[dst_reg + 1].force_writemask_all = false;
+ metadata[dst_reg + 1].force_sechalf = true;
+  }
+   }
inst-insert_before(block, mov);
 }
 
-- 
2.1.0



[Mesa-dev] [PATCH 14/12] i965/fs: Copy propagate partial reads.

2014-09-26 Thread Jason Ekstrand
This commit reworks copy propagation a bit to support propagating copies
of partial registers.  This comes up every time we have pull constants,
because we do a pull constant read immediately followed by a move to splat
the one component out to 8 or 16-wide.  This allows us to eliminate the
copy and simply use the one component of the register.

Shader DB results:

total instructions in shared programs: 5044937 -> 5044428 (-0.01%)
instructions in affected programs: 66112 -> 65603 (-0.77%)
GAINED:0
LOST:  0

Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
---
 src/mesa/drivers/dri/i965/brw_fs.h |  1 +
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 83 --
 2 files changed, 64 insertions(+), 20 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 50b5fc1..9b63114 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -337,6 +337,7 @@ public:
bool opt_cse_local(bblock_t *block);
bool opt_copy_propagate();
bool try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry);
+   bool try_constant_propagate(fs_inst *inst, acp_entry *entry);
bool opt_copy_propagate_local(void *mem_ctx, bblock_t *block,
  exec_list *acp);
void opt_drop_redundant_mov_to_flags();
diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index e5816df..a97dc04 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -277,24 +277,30 @@ is_logic_op(enum opcode opcode)
 bool
 fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
 {
+   if (inst-src[arg].file != GRF)
+  return false;
+
if (entry-src.file == IMM)
   return false;
+   assert(entry-src.file == GRF || entry-src.file == UNIFORM);
 
if (entry-opcode == SHADER_OPCODE_LOAD_PAYLOAD 
inst-opcode == SHADER_OPCODE_LOAD_PAYLOAD)
   return false;
 
-   /* Bail if inst is reading more than entry is writing. */
-   if ((inst-regs_read(this, arg) * inst-src[arg].stride *
-type_sz(inst-src[arg].type))  type_sz(entry-dst.type))
+   assert(entry-dst.file == GRF);
+   if (inst-src[arg].reg != entry-dst.reg)
   return false;
 
-   if (inst-src[arg].file != entry-dst.file ||
-   inst-src[arg].reg != entry-dst.reg ||
-   inst-src[arg].reg_offset != entry-dst.reg_offset ||
-   inst-src[arg].subreg_offset != entry-dst.subreg_offset) {
+   /* Bail if inst is reading a range that isn't contained in the range
+    * that entry is writing.
+    */
+   int reg_size = dispatch_width * sizeof(float);
+   if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
+       (inst->src[arg].reg_offset * reg_size + inst->src[arg].subreg_offset +
+        inst->regs_read(this, arg) * inst->src[arg].stride * reg_size) >
+       (entry->dst.reg_offset + 1) * reg_size)
       return false;
-   }
 
/* See resolve_ud_negate() and comment in brw_fs_emit.cpp. */
if (inst-conditional_mod 
@@ -361,11 +367,39 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, 
acp_entry *entry)
 
inst-src[arg].file = entry-src.file;
inst-src[arg].reg = entry-src.reg;
-   inst-src[arg].reg_offset = entry-src.reg_offset;
-   inst-src[arg].subreg_offset = entry-src.subreg_offset;
inst-src[arg].stride *= entry-src.stride;
inst-saturate = inst-saturate || entry-saturate;
 
+   switch (entry-src.file) {
+   case BAD_FILE:
+   case HW_REG:
+   case UNIFORM:
+  inst-src[arg].reg_offset = entry-src.reg_offset;
+  inst-src[arg].subreg_offset = entry-src.subreg_offset;
+  break;
+   case GRF:
+  {
+ /* In this case, we have to deal with mapping parts of vgrfs to
+  * other parts of vgrfs so we have to do some reg_offset magic.
+  */
+
+ /* Compute the offset of inst-src[arg] relative to inst-dst */
+ assert(entry-dst.subreg_offset == 0);
+ int rel_offset = inst-src[arg].reg_offset - entry-dst.reg_offset;
+ int rel_suboffset = inst-src[arg].subreg_offset;
+
+ /* Compute the final register offset (in bytes) */
+ int offset = entry-src.reg_offset * reg_size + 
entry-src.subreg_offset;
+ offset += rel_offset * reg_size + rel_suboffset;
+ inst-src[arg].reg_offset = offset / reg_size;
+ inst-src[arg].subreg_offset = offset % reg_size;
+  }
+  break;
+   default:
+  unreachable(Invalid register file);
+  break;
+   }
+
if (!inst-src[arg].abs) {
   inst-src[arg].abs = entry-src.abs;
   inst-src[arg].negate ^= entry-src.negate;
@@ -375,9 +409,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, 
acp_entry *entry)
 }
 
 
-static bool
-try_constant_propagate(struct brw_context *brw, fs_inst *inst,
-   

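The offset arithmetic in the GRF case of try_copy_propagate() above can be worked through with a standalone sketch. RegRegion and remap_read are hypothetical names; reg_size is assumed to be the size of one virtual register in bytes (dispatch_width * sizeof(float) in the patch), and the sketch covers only the remapping, not the containment checks.

```cpp
#include <cassert>

// Position inside a virtual GRF: whole registers plus a byte offset.
struct RegRegion {
    int reg_offset;
    int subreg_offset;
};

// Given a read at `read` from the destination of a copy (copy_dst) whose
// source was copy_src, compute where the read lands in the source.
RegRegion remap_read(RegRegion read, RegRegion copy_dst, RegRegion copy_src,
                     int reg_size)
{
    assert(reg_size > 0);
    assert(copy_dst.subreg_offset == 0);

    // Offset of the read relative to the start of the copy destination.
    const int rel_offset = read.reg_offset - copy_dst.reg_offset;
    assert(rel_offset >= 0);

    // Final offset in bytes, then split back into register + remainder.
    int offset = copy_src.reg_offset * reg_size + copy_src.subreg_offset;
    offset += rel_offset * reg_size + read.subreg_offset;
    return {offset / reg_size, offset % reg_size};
}
```

With reg_size = 32 (SIMD8 floats), a read at register 2, byte 4 of a copy whose destination starts at register 2 and whose source starts at register 5, byte 8 lands at register 5, byte 12.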
[Mesa-dev] [PATCH 39.2/41] i965/fs: Handle COMPR4 in LOAD_PAYLOAD

2014-09-26 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 15 +++
 src/mesa/drivers/dri/i965/brw_fs.h   | 22 +-
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 97b21e3..b43032b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2988,6 +2988,21 @@ fs_visitor::lower_load_payload()
 
             if (inst->src[i].file == BAD_FILE) {
                /* Do nothing but otherwise increment as normal */
+            } else if (dst.file == MRF &&
+                       dst.width == 8 &&
+                       brw->has_compr4 &&
+                       i + 4 < inst->sources &&
+                       inst->src[i + 4].equals(horiz_offset(inst->src[i], 8))) {
+               fs_reg compr4_dst = dst;
+               compr4_dst.reg += BRW_MRF_COMPR4;
+               compr4_dst.width = 16;
+               fs_reg compr4_src = inst->src[i];
+               compr4_src.width = 16;
+               fs_inst *mov = MOV(compr4_dst, compr4_src);
+               mov->force_writemask_all = true;
+               inst->insert_before(block, mov);
+               /* Mark i+4 as BAD_FILE so we don't emit a MOV for it */
+               inst->src[i + 4].file = BAD_FILE;
             } else {
                fs_inst *mov = MOV(dst, inst->src[i]);
                if (inst->src[i].file == GRF) {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 14bbac2..7500e8e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -143,6 +143,26 @@ byte_offset(fs_reg reg, unsigned delta)
 }
 
 static inline fs_reg
+horiz_offset(fs_reg reg, unsigned delta)
+{
+   switch (reg.file) {
+   case BAD_FILE:
+   case UNIFORM:
+   case IMM:
+  /* These only have a single component that is implicitly splatted.  A
+   * horizontal offset should be a harmless no-op.
+   */
+  break;
+   case GRF:
+   case MRF:
+  return byte_offset(reg, delta * reg.stride * type_sz(reg.type));
+   default:
+  assert(delta == 0);
+   }
+   return reg;
+}
+
+static inline fs_reg
 offset(fs_reg reg, unsigned delta)
 {
    assert(reg.stride > 0);
@@ -183,7 +203,7 @@ half(fs_reg reg, unsigned idx)
    assert(idx == 0 || (reg.file != HW_REG && reg.file != IMM));
assert(reg.width == 16);
reg.width = 8;
-   return byte_offset(reg, 8 * idx * reg.stride * type_sz(reg.type));
+   return horiz_offset(reg, 8 * idx);
 }
 
 static const fs_reg reg_undef;
-- 
2.1.0


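The byte arithmetic behind horiz_offset() in the patch above can be tried out with a reduced model. The File enum and Reg struct here are hypothetical simplifications of fs_reg that track only a byte offset, a stride and a type size; the HW_REG case, which the patch guards with an assertion, is not modelled.

```cpp
#include <cassert>

enum class File { BadFile, Uniform, Imm, Grf, Mrf };

// Reduced, hypothetical model of a register region.
struct Reg {
    File file;
    unsigned byte_offset;
    unsigned stride;     // horizontal stride in elements
    unsigned type_size;  // element size in bytes
};

// Advance a region by `delta` channels. Files that hold a single,
// implicitly splatted component are returned unchanged.
Reg horiz_offset(Reg reg, unsigned delta)
{
    switch (reg.file) {
    case File::Grf:
    case File::Mrf:
        reg.byte_offset += delta * reg.stride * reg.type_size;
        return reg;
    default:
        return reg;
    }
}
```

With 4-byte elements and stride 1, stepping 8 channels moves 32 bytes, which is the second half of a 16-wide register, the case half() relies on in the patch.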

[Mesa-dev] [PATCH 45/63] i965/fs: Make fs_reg::effective_width take fs_inst* instead of fs_visitor*

2014-09-26 Thread Jason Ekstrand
Now that we have execution sizes, we can use that instead of the dispatch
width.  This way it also works for 8-wide instructions in SIMD16.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp  | 10 +-
 src/mesa/drivers/dri/i965/brw_fs.h|  4 ++--
 src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp |  4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 551bc2b..ffbfdbd 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -354,7 +354,7 @@ fs_visitor::LOAD_PAYLOAD(const fs_reg &dst, fs_reg *src, int sources)
        * dealing with whole registers.  If this ever changes, we can deal
        * with it later.
        */
-      int size = src[i].effective_width(this) * type_sz(src[i].type);
+      int size = src[i].effective_width(inst) * type_sz(src[i].type);
       assert(size % 32 == 0);
       inst->regs_written += (size + 31) / 32;
    }
@@ -583,7 +583,7 @@ fs_reg::equals(const fs_reg &r) const
 }
 
 uint8_t
-fs_reg::effective_width(const fs_visitor *v) const
+fs_reg::effective_width(const fs_inst *inst) const
 {
    switch (this->file) {
    case BAD_FILE:
@@ -591,10 +591,10 @@ fs_reg::effective_width(const fs_visitor *v) const
    case UNIFORM:
    case IMM:
       assert(this->width == 1);
-      return v->dispatch_width;
+      return inst->exec_size;
    case GRF:
    case HW_REG:
-      assert(this->width > 1 && this->width <= v->dispatch_width);
+      assert(this->width > 1 && this->width <= inst->exec_size);
       assert(this->width % 8 == 0);
       return this->width;
    case MRF:
@@ -2994,7 +2994,7 @@ fs_visitor::lower_load_payload()
          fs_reg dst = inst->dst;
 
          for (int i = 0; i < inst->sources; i++) {
-            dst.width = inst->src[i].effective_width(this);
+            dst.width = inst->src[i].effective_width(inst);
             dst.type = inst->src[i].type;
 
             if (inst->src[i].file == BAD_FILE) {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 4ffbec8..c282b5e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -62,7 +62,7 @@ namespace brw {
class fs_live_variables;
 }
 
-class fs_visitor;
+class fs_inst;
 
 class fs_reg : public backend_reg {
 public:
@@ -110,7 +110,7 @@ public:
 * effectively take on the width of the instruction in which they are
 * used.
 */
-   uint8_t effective_width(const fs_visitor *v) const;
+   uint8_t effective_width(const fs_inst *inst) const;
 
/** Register region horizontal stride */
uint8_t stride;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index aafc49b..73a196d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -640,13 +640,13 @@ fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block,
                  inst->dst.file == GRF) {
          int offset = 0;
          for (int i = 0; i < inst->sources; i++) {
-            int regs_written = ((inst->src[i].effective_width(this) *
+            int regs_written = ((inst->src[i].effective_width(inst) *
                                  type_sz(inst->src[i].type)) + 31) / 32;
             if (inst->src[i].file == GRF) {
                acp_entry *entry = ralloc(copy_prop_ctx, acp_entry);
                entry->dst = inst->dst;
                entry->dst.reg_offset = offset;
-               entry->dst.width = inst->src[i].effective_width(this);
+               entry->dst.width = inst->src[i].effective_width(inst);
                entry->src = inst->src[i];
                entry->regs_written = regs_written;
                entry->opcode = inst->opcode;
-- 
2.1.0
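
The size computation that recurs in this patch can be tried in isolation. The sketch below is only a model under stated assumptions: effective_width_sketch and regs_for_source are hypothetical helpers, and "width 1" stands in for the UNIFORM/IMM/BAD_FILE cases that take on the instruction's execution size.

```c
#include <assert.h>

/* Hypothetical model of fs_reg::effective_width(): a splatted source
 * (register width 1) takes the width of the instruction using it;
 * anything wider keeps its own width. */
static unsigned
effective_width_sketch(unsigned reg_width, unsigned exec_size)
{
   return reg_width == 1 ? exec_size : reg_width;
}

/* Number of 32-byte hardware registers covered by `width` channels of
 * `type_size` bytes each, rounded up. */
static unsigned
regs_for_source(unsigned width, unsigned type_size)
{
   return (width * type_size + 31) / 32;
}
```

For example, a uniform float used by an 8-wide instruction inside a SIMD16 program covers one register, not two. That case is why the patch switches from the dispatch width to the execution size.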



[Mesa-dev] [PATCH 39.1/41] i965/fs: Constant propagate into LOAD_PAYLOAD

2014-09-26 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index 7dfed6e..6b7ec79 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -456,6 +456,7 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
 
       switch (inst->opcode) {
       case BRW_OPCODE_MOV:
+      case SHADER_OPCODE_LOAD_PAYLOAD:
          inst->src[i] = val;
          progress = true;
          break;
-- 
2.1.0



[Mesa-dev] [PATCH v2 41/41] SQUASH: i965/fs: Force a high register for the final FB write

2014-09-26 Thread Jason Ekstrand
v2: Renamed the array for the range mappings and added a comment.
---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 34 ++-
 src/mesa/drivers/dri/i965/intel_screen.h  | 10 +++
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index 246d27c..477efe1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -113,6 +113,10 @@ brw_alloc_reg_set(struct intel_screen *screen, int reg_width)
       class_sizes[class_count++] = 8;
    }
 
+   memset(screen->wm_reg_sets[index].class_to_ra_reg_range, 0,
+          sizeof(screen->wm_reg_sets[index].class_to_ra_reg_range));
+   int *class_to_ra_reg_range = screen->wm_reg_sets[index].class_to_ra_reg_range;
+
    /* Compute the total number of registers across all classes. */
    int ra_reg_count = 0;
    for (int i = 0; i < class_count; i++) {
@@ -131,6 +135,14 @@ brw_alloc_reg_set(struct intel_screen *screen, int reg_width)
       } else {
          ra_reg_count += base_reg_count - (class_sizes[i] - 1);
       }
+      /* Mark the last register. We'll fill in the beginnings later. */
+      class_to_ra_reg_range[class_sizes[i]] = ra_reg_count;
+   }
+
+   /* Fill out the rest of the range markers */
+   for (int i = 1; i < 17; ++i) {
+      if (class_to_ra_reg_range[i] == 0)
+         class_to_ra_reg_range[i] = class_to_ra_reg_range[i-1];
    }
 
 
uint8_t *ra_reg_to_grf = ralloc_array(screen, uint8_t, ra_reg_count);
@@ -504,9 +516,29 @@ fs_visitor::assign_regs(bool allow_spilling)
    }
 
    setup_payload_interference(g, payload_node_count, first_payload_node);
-   if (brw->gen >= 7)
+   if (brw->gen >= 7) {
       setup_mrf_hack_interference(g, first_mrf_hack_node);
 
+      foreach_in_list(fs_inst, inst, instructions) {
+         /* When we do send-from-GRF for FB writes, we need to ensure that
+          * the last write instruction sends from a high register.  This is
+          * because the vertex fetcher wants to start filling the low
+          * payload registers while the pixel data port is still working on
+          * writing out the memory.  If we don't do this, we get rendering
+          * artifacts.
+          *
+          * We could just do something high.  Instead, we just pick the
+          * highest register that works.
+          */
+         if (inst->opcode == FS_OPCODE_FB_WRITE && inst->eot) {
+            int size = virtual_grf_sizes[inst->src[0].reg];
+            int reg = screen->wm_reg_sets[rsi].class_to_ra_reg_range[size] - 1;
+            ra_set_node_reg(g, inst->src[0].reg, reg);
+            break;
+         }
+      }
+   }
+
    if (dispatch_width > 8) {
       /* In 16-wide dispatch we have an issue where a compressed
        * instruction is actually two instructions executed simultaneiously.
diff --git a/src/mesa/drivers/dri/i965/intel_screen.h b/src/mesa/drivers/dri/i965/intel_screen.h
index 945f6f5..88a84a2 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.h
+++ b/src/mesa/drivers/dri/i965/intel_screen.h
@@ -90,6 +90,16 @@ struct intel_screen
   int classes[16];
 
   /**
+   * Mapping from classes to ra_reg ranges.  Each of the per-size
+   * classes corresponds to a range of ra_reg nodes.  This array stores
+   * those ranges in the form of first ra_reg in each class and the
+   * total number of ra_reg elements in the last array element.  This
+   * way the range of the i'th class is given by:
+   * [ class_to_ra_reg_range[i], class_to_ra_reg_range[i+1] )
+   */
+  int class_to_ra_reg_range[17];
+
+  /**
* Mapping for register-allocated objects in *regs to the first
* GRF for that object.
*/
-- 
2.1.0
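
To make the two loops in brw_alloc_reg_set() easier to follow, here is a small standalone model of the range table. It is a sketch only: fill_class_ranges is a hypothetical name, it covers only the non-aligned-pairs branch shown in the hunk, and the register counts in the usage note are made up.

```c
#include <assert.h>
#include <string.h>

#define MAX_CLASS_SIZE 16

/* Hypothetical model: range[s] ends up holding the running ra_reg
 * count after the class of size s has been added.  Sizes that have no
 * class of their own inherit the marker of the next smaller size. */
static void
fill_class_ranges(const int *class_sizes, int class_count,
                  int base_reg_count, int range[MAX_CLASS_SIZE + 1])
{
   int ra_reg_count = 0;

   memset(range, 0, sizeof(int) * (MAX_CLASS_SIZE + 1));

   for (int i = 0; i < class_count; i++) {
      /* A class of size n can start at any of base - (n - 1) GRFs. */
      ra_reg_count += base_reg_count - (class_sizes[i] - 1);
      /* Mark the end of this class. */
      range[class_sizes[i]] = ra_reg_count;
   }

   /* Fill out the remaining markers. */
   for (int i = 1; i <= MAX_CLASS_SIZE; ++i) {
      if (range[i] == 0)
         range[i] = range[i - 1];
   }
}
```

With classes of size 1, 2 and 4 over 8 base registers, the markers come out as 8, 15 and 20, so the highest ra_reg for a size-4 node is range[4] - 1 = 19. That is the value assign_regs() picks for the final FB write.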



[Mesa-dev] [PATCH v2 17/41] SQUASH: i965/fs: Properly handle widths in copy propagation

2014-09-26 Thread Jason Ekstrand
v2: Account for register ranges due to the rebase on top of the patch to
propagate subsets of copied registers

---
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 40 ++
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index cfb17bf..01113f3 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -42,6 +42,7 @@ namespace { /* avoid conflict with opt_copy_propagation_elements */
 struct acp_entry : public exec_node {
fs_reg dst;
fs_reg src;
+   uint8_t regs_written;
enum opcode opcode;
bool saturate;
 };
@@ -295,11 +296,10 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
    /* Bail if inst is reading a range that isn't contained in the range
     * that entry is writing.
     */
-   int reg_size = dispatch_width * sizeof(float);
    if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
-       (inst->src[arg].reg_offset * reg_size + inst->src[arg].subreg_offset +
-        inst->regs_read(this, arg) * inst->src[arg].stride * reg_size) >
-       (entry->dst.reg_offset + 1) * reg_size)
+       (inst->src[arg].reg_offset * 32 + inst->src[arg].subreg_offset +
+        inst->regs_read(this, arg) * inst->src[arg].stride * 32) >
+       (entry->dst.reg_offset + entry->regs_written) * 32)
       return false;
 
    /* See resolve_ud_negate() and comment in brw_fs_emit.cpp. */
@@ -371,16 +371,25 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
    inst->saturate = inst->saturate || entry->saturate;
 
    switch (entry->src.file) {
+   case UNIFORM:
+      assert(entry->src.width == 1);
    case BAD_FILE:
    case HW_REG:
-   case UNIFORM:
+      inst->src[arg].width = entry->src.width;
       inst->src[arg].reg_offset = entry->src.reg_offset;
       inst->src[arg].subreg_offset = entry->src.subreg_offset;
       break;
    case GRF:
       {
-         /* In this case, we have to deal with mapping parts of vgrfs to
-          * other parts of vgrfs so we have to do some reg_offset magic.
+         assert(entry->src.width % inst->src[arg].width == 0);
+         /* In this case, we'll just leave the width alone.  The source
+          * register could have different widths depending on how it is
+          * being used.  For instance, if only half of the register was
+          * used then we want to preserve that and continue to only use
+          * half.
+          *
+          * Also, we have to deal with mapping parts of vgrfs to other
+          * parts of vgrfs so we have to do some reg_offset magic.
           */
 
          /* Compute the offset of inst->src[arg] relative to inst->dst */
@@ -389,10 +398,10 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
          int rel_suboffset = inst->src[arg].subreg_offset;
 
          /* Compute the final register offset (in bytes) */
-         int offset = entry->src.reg_offset * reg_size + entry->src.subreg_offset;
-         offset += rel_offset * reg_size + rel_suboffset;
-         inst->src[arg].reg_offset = offset / reg_size;
-         inst->src[arg].subreg_offset = offset % reg_size;
+         int offset = entry->src.reg_offset * 32 + entry->src.subreg_offset;
+         offset += rel_offset * 32 + rel_suboffset;
+         inst->src[arg].reg_offset = offset / 32;
+         inst->src[arg].subreg_offset = offset % 32;
       }
       break;
    default:
@@ -429,11 +438,10 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
       /* Bail if inst is reading a range that isn't contained in the range
        * that entry is writing.
        */
-      int reg_size = dispatch_width * sizeof(float);
       if (inst->src[i].reg_offset < entry->dst.reg_offset ||
-          (inst->src[i].reg_offset * reg_size + inst->src[i].subreg_offset +
-           inst->regs_read(this, i) * inst->src[i].stride * reg_size) >
-          (entry->dst.reg_offset + 1) * reg_size)
+          (inst->src[i].reg_offset * 32 + inst->src[i].subreg_offset +
+           inst->regs_read(this, i) * inst->src[i].stride * 32) >
+          (entry->dst.reg_offset + entry->regs_written) * 32)
          continue;
 
       /* Don't bother with cases that should have been taken care of by the
@@ -623,6 +631,7 @@ fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block,
          acp_entry *entry = ralloc(copy_prop_ctx, acp_entry);
          entry->dst = inst->dst;
          entry->src = inst->src[0];
+         entry->regs_written = inst->regs_written;
          entry->opcode = inst->opcode;
          entry->saturate = inst->saturate;
          acp[entry->dst.reg % ACP_HASH_SIZE].push_tail(entry);
@@ -638,6 +647,7 @@ fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block,
                entry->dst.reg_offset = offset;
                entry->dst.width = inst->src[i].effective_width(this);
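
The containment test that this patch rewrites in both try_copy_propagate() and try_constant_propagate() can be modelled on plain integers. This is a sketch under assumptions, not the driver code: read_is_contained is a hypothetical helper, and the field values are passed as bare arguments rather than read from fs_inst/acp_entry.

```c
#include <assert.h>
#include <stdbool.h>

#define REG_SIZE 32   /* bytes per hardware register */

/* Hypothetical helper: true when the bytes read by a source fall
 * entirely within the range the ACP entry wrote. */
static bool
read_is_contained(unsigned src_reg_offset, unsigned src_subreg_offset,
                  unsigned regs_read, unsigned stride,
                  unsigned dst_reg_offset, unsigned regs_written)
{
   if (src_reg_offset < dst_reg_offset)
      return false;

   unsigned read_end = src_reg_offset * REG_SIZE + src_subreg_offset +
                       regs_read * stride * REG_SIZE;
   unsigned write_end = (dst_reg_offset + regs_written) * REG_SIZE;

   return read_end <= write_end;
}
```

The difference from the old check is the write_end term: instead of assuming every entry wrote exactly one dispatch-width register, it uses the entry's recorded regs_written.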
  

Re: [Mesa-dev] [PATCH 0.1/2] mesa: Add new variables in gl_context to store sample number layout

2014-09-26 Thread Jordan Justen
On Tue, Sep 23, 2014 at 5:38 PM, Anuj Phogat anuj.pho...@gmail.com wrote:
> Variables are used in later patches to implement
> EXT_framebuffer_multisample_blit_scaled extension.
>
> Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
> ---
>  src/mesa/main/mtypes.h | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index 0d50be8..1cb3461 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -3608,6 +3608,15 @@ struct gl_constants
>     GLint MaxDepthTextureSamples;
>     GLint MaxIntegerSamples;
>
> +   /**
> +    * Layout of sample numbers in a rectangular grid roughly corresponding
> +    * to real sample locations within a pixel. Used by
> +    * GL_EXT_texture_multisample_blit_scaled implementation.
> +    */
> +   GLchar* sample_map_2x;
> +   GLchar* sample_map_4x;
> +   GLchar* sample_map_8x;

I think this would be better:
   uint8_t SampleMap2x[2];

Using a string here seems confusing. The meta code can use asprintf to
build the string.
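
One way the suggestion could look in practice, as a rough sketch only: the constants live in a small uint8_t array and the string is produced on demand. format_sample_map is a hypothetical helper, the comma-separated output format is an assumption, and plain snprintf is used here instead of asprintf so the example does not depend on a GNU extension. The sample numbers in the usage note are placeholders, not real hardware layouts.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the entries of `map` into `buf` as a
 * comma-separated list.  Returns the string length, or -1 if the
 * buffer is too small. */
static int
format_sample_map(char *buf, size_t size, const uint8_t *map, unsigned count)
{
   size_t used = 0;

   if (size == 0)
      return -1;
   buf[0] = '\0';

   for (unsigned i = 0; i < count; i++) {
      int n = snprintf(buf + used, size - used, "%s%u",
                       i ? ", " : "", (unsigned) map[i]);
      /* snprintf reports the length it wanted; treat truncation as failure. */
      if (n < 0 || (size_t) n >= size - used)
         return -1;
      used += (size_t) n;
   }
   return (int) used;
}
```

A driver would then fill something like SampleMap2x[2] = { 0, 1 } once, and the meta code would call the helper when it assembles its shader source.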

The CamelCase name seems to follow the convention of this structure.

uint8_t doesn't follow the convention of the structure. :) (But, Ian
seems to often try to move us away from GL types when not API facing.)

Do you think the comment could be improved to help drivers understand
the purpose of the constants? The comment in the 0.2 patch was pretty
clear, but it is i965 specific.

If you agree to my suggestions, then you should probably send out all
4 patches as a series.

-Jordan

>     /** GL_ARB_shader_atomic_counters */
>     GLuint MaxAtomicBufferBindings;
>     GLuint MaxAtomicBufferSize;
> --
> 1.9.3



Re: [Mesa-dev] [PATCH 0.1/2] mesa: Add new variables in gl_context to store sample number layout

2014-09-26 Thread Anuj Phogat
On Fri, Sep 26, 2014 at 12:50 PM, Jordan Justen jljus...@gmail.com wrote:

> On Tue, Sep 23, 2014 at 5:38 PM, Anuj Phogat anuj.pho...@gmail.com wrote:
> > Variables are used in later patches to implement
> > EXT_framebuffer_multisample_blit_scaled extension.
> >
> > Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
> > ---
> >  src/mesa/main/mtypes.h | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> > index 0d50be8..1cb3461 100644
> > --- a/src/mesa/main/mtypes.h
> > +++ b/src/mesa/main/mtypes.h
> > @@ -3608,6 +3608,15 @@ struct gl_constants
> >     GLint MaxDepthTextureSamples;
> >     GLint MaxIntegerSamples;
> >
> > +   /**
> > +    * Layout of sample numbers in a rectangular grid roughly corresponding
> > +    * to real sample locations within a pixel. Used by
> > +    * GL_EXT_texture_multisample_blit_scaled implementation.
> > +    */
> > +   GLchar* sample_map_2x;
> > +   GLchar* sample_map_4x;
> > +   GLchar* sample_map_8x;
>
> I think this would be better:
>    uint8_t SampleMap2x[2];
>
>
> Using a string here seems confusing. The meta code can use asprintf to
> build the string.

Yes, I had this thought earlier but wasn't sure. Will fix it now.



> The CamelCase name seems to follow the convention of this structure.
>
> uint8_t doesn't follow the convention of the structure. :) (But, Ian
> seems to often try to move us away from GL types when not API facing.)
>
> Do you think the comment could be improved to help drivers understand
> the purpose of the constants? The comment in the 0.2 patch was pretty
> clear, but it is i965 specific.
>
> If you agree to my suggestions, then you should probably send out all
> 4 patches as a series.

I agree. I'll soon send out the series. Thanks.



> -Jordan
>
> >     /** GL_ARB_shader_atomic_counters */
> >     GLuint MaxAtomicBufferBindings;
> >     GLuint MaxAtomicBufferSize;
> > --
> > 1.9.3
 


[Mesa-dev] [PATCH v2 3/6] st/va: implement vlVa(Create|Destroy|Query|Get)Config

2014-09-26 Thread Leo Liu
From: Christian König christian.koe...@amd.com

This patch lets applications query the configuration,
such as profiles, entrypoints, and attributes.

v2: fix missing profile with query

Signed-off-by: Michael Varga michael.va...@amd.com
Signed-off-by: Christian König christian.koe...@amd.com
Signed-off-by: Leo Liu leo@amd.com
---
 src/gallium/state_trackers/va/config.c | 78 --
 src/gallium/state_trackers/va/context.c|  2 +-
 src/gallium/state_trackers/va/va_private.h | 68 ++
 3 files changed, 143 insertions(+), 5 deletions(-)

diff --git a/src/gallium/state_trackers/va/config.c b/src/gallium/state_trackers/va/config.c
index d548780..cfb0b25 100644
--- a/src/gallium/state_trackers/va/config.c
+++ b/src/gallium/state_trackers/va/config.c
@@ -26,16 +26,32 @@
  *
  **/
 
+#include "pipe/p_screen.h"
+
+#include "vl/vl_winsys.h"
+
 #include "va_private.h"
 
 VAStatus
 vlVaQueryConfigProfiles(VADriverContextP ctx, VAProfile *profile_list, int *num_profiles)
 {
+   struct pipe_screen *pscreen;
+   enum pipe_video_profile p;
+   VAProfile vap;
+
    if (!ctx)
       return VA_STATUS_ERROR_INVALID_CONTEXT;
 
    *num_profiles = 0;
 
+   pscreen = VL_VA_PSCREEN(ctx);
+   for (p = PIPE_VIDEO_PROFILE_MPEG2_SIMPLE; p <= PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH; ++p)
+      if (pscreen->get_video_param(pscreen, p, PIPE_VIDEO_ENTRYPOINT_BITSTREAM, PIPE_VIDEO_CAP_SUPPORTED)) {
+         vap = PipeToProfile(p);
+         if (vap != VAProfileNone)
+            profile_list[(*num_profiles)++] = vap;
+      }
+
    return VA_STATUS_SUCCESS;
 }
 
@@ -43,11 +59,24 @@ VAStatus
 vlVaQueryConfigEntrypoints(VADriverContextP ctx, VAProfile profile,
                            VAEntrypoint *entrypoint_list, int *num_entrypoints)
 {
+   struct pipe_screen *pscreen;
+   enum pipe_video_profile p;
+
    if (!ctx)
       return VA_STATUS_ERROR_INVALID_CONTEXT;
 
    *num_entrypoints = 0;
 
+   p = ProfileToPipe(profile);
+   if (p == PIPE_VIDEO_PROFILE_UNKNOWN)
+      return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
+
+   pscreen = VL_VA_PSCREEN(ctx);
+   if (!pscreen->get_video_param(pscreen, p, PIPE_VIDEO_ENTRYPOINT_BITSTREAM, PIPE_VIDEO_CAP_SUPPORTED))
+      return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
+
+   entrypoint_list[(*num_entrypoints)++] = VAEntrypointVLD;
+
    return VA_STATUS_SUCCESS;
 }
 
@@ -55,20 +84,54 @@ VAStatus
 vlVaGetConfigAttributes(VADriverContextP ctx, VAProfile profile, VAEntrypoint entrypoint,
                         VAConfigAttrib *attrib_list, int num_attribs)
 {
+   int i;
+
    if (!ctx)
       return VA_STATUS_ERROR_INVALID_CONTEXT;
 
-   return VA_STATUS_ERROR_UNIMPLEMENTED;
+   for (i = 0; i < num_attribs; ++i) {
+      unsigned int value;
+      switch (attrib_list[i].type) {
+      case VAConfigAttribRTFormat:
+         value = VA_RT_FORMAT_YUV420;
+         break;
+      case VAConfigAttribRateControl:
+         value = VA_RC_NONE;
+         break;
+      default:
+         value = VA_ATTRIB_NOT_SUPPORTED;
+         break;
+      }
+      attrib_list[i].value = value;
+   }
+
+   return VA_STATUS_SUCCESS;
 }
 
 VAStatus
 vlVaCreateConfig(VADriverContextP ctx, VAProfile profile, VAEntrypoint entrypoint,
                  VAConfigAttrib *attrib_list, int num_attribs, VAConfigID *config_id)
 {
+   struct pipe_screen *pscreen;
+   enum pipe_video_profile p;
+
    if (!ctx)
       return VA_STATUS_ERROR_INVALID_CONTEXT;
 
-   return VA_STATUS_ERROR_UNIMPLEMENTED;
+   p = ProfileToPipe(profile);
+   if (p == PIPE_VIDEO_PROFILE_UNKNOWN)
+      return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
+
+   pscreen = VL_VA_PSCREEN(ctx);
+   if (!pscreen->get_video_param(pscreen, p, PIPE_VIDEO_ENTRYPOINT_BITSTREAM, PIPE_VIDEO_CAP_SUPPORTED))
+      return VA_STATUS_ERROR_UNSUPPORTED_PROFILE;
+
+   if (entrypoint != VAEntrypointVLD)
+      return VA_STATUS_ERROR_UNSUPPORTED_ENTRYPOINT;
+
+   *config_id = p;
+
+   return VA_STATUS_SUCCESS;
 }
 
 VAStatus
@@ -77,7 +140,7 @@ vlVaDestroyConfig(VADriverContextP ctx, VAConfigID config_id)
if (!ctx)
   return VA_STATUS_ERROR_INVALID_CONTEXT;
 
-   return VA_STATUS_ERROR_UNIMPLEMENTED;
+   return VA_STATUS_SUCCESS;
 }
 
 VAStatus
@@ -87,5 +150,12 @@ vlVaQueryConfigAttributes(VADriverContextP ctx, VAConfigID config_id, VAProfile
if (!ctx)
   return VA_STATUS_ERROR_INVALID_CONTEXT;
 
-   return VA_STATUS_ERROR_UNIMPLEMENTED;
+   *profile = PipeToProfile(config_id);
+   *entrypoint = VAEntrypointVLD;
+
+   *num_attribs = 1;
+   attrib_list[0].type = VAConfigAttribRTFormat;
+   attrib_list[0].value = VA_RT_FORMAT_YUV420;
+
+   return VA_STATUS_SUCCESS;
 }
diff --git a/src/gallium/state_trackers/va/context.c b/src/gallium/state_trackers/va/context.c
index 71651aa..048c3f2 100644
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -104,7 +104,7 @@ VA_DRIVER_INIT_FUNC(VADriverContextP ctx)

[Mesa-dev] [PATCH v2 6/6] st/va: implement vlVa(Query|Create|Get|Put|Destroy)Image

2014-09-26 Thread Leo Liu
This patch implements the image functions, which copy
data between video surfaces and user buffers. This
supports SW decode and other video output.

v2: fix buffer size for odd-sized image case
expose I420 format as well

Signed-off-by: Leo Liu leo@amd.com
---
 src/gallium/state_trackers/va/context.c|   2 +-
 src/gallium/state_trackers/va/image.c  | 254 -
 src/gallium/state_trackers/va/va_private.h |  22 +++
 3 files changed, 269 insertions(+), 9 deletions(-)

diff --git a/src/gallium/state_trackers/va/context.c b/src/gallium/state_trackers/va/context.c
index 1819ec5..ae87d3b 100644
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -121,7 +121,7 @@ VA_DRIVER_INIT_FUNC(VADriverContextP ctx)
    ctx->max_profiles = PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH - PIPE_VIDEO_PROFILE_UNKNOWN;
    ctx->max_entrypoints = 1;
    ctx->max_attributes = 1;
-   ctx->max_image_formats = 1;
+   ctx->max_image_formats = VL_VA_MAX_IMAGE_FORMATS;
    ctx->max_subpic_formats = 1;
    ctx->max_display_attributes = 1;
    ctx->str_vendor = "mesa gallium vaapi";
diff --git a/src/gallium/state_trackers/va/image.c b/src/gallium/state_trackers/va/image.c
index 8aaa29c..d3c9f20 100644
--- a/src/gallium/state_trackers/va/image.c
+++ b/src/gallium/state_trackers/va/image.c
@@ -26,18 +26,66 @@
  *
  **/
 
+#include "pipe/p_screen.h"
+
+#include "util/u_memory.h"
+#include "util/u_handle_table.h"
+#include "util/u_surface.h"
+#include "util/u_video.h"
+
+#include "vl/vl_winsys.h"
+
 #include "va_private.h"
 
+static const VAImageFormat formats[VL_VA_MAX_IMAGE_FORMATS] =
+{
+   {VA_FOURCC('N','V','1','2')},
+   {VA_FOURCC('I','4','2','0')},
+   {VA_FOURCC('Y','V','1','2')},
+   {VA_FOURCC('Y','U','Y','V')},
+   {VA_FOURCC('U','Y','V','Y')},
+};
+
+static void
+vlVaVideoSurfaceSize(vlVaSurface *p_surf, int component,
+                     unsigned *width, unsigned *height)
+{
+   *width = p_surf->templat.width;
+   *height = p_surf->templat.height;
+
+   if (component > 0) {
+      if (p_surf->templat.chroma_format == PIPE_VIDEO_CHROMA_FORMAT_420) {
+         *width /= 2;
+         *height /= 2;
+      } else if (p_surf->templat.chroma_format == PIPE_VIDEO_CHROMA_FORMAT_422)
+         *width /= 2;
+   }
+   if (p_surf->templat.interlaced)
+      *height /= 2;
+}
+
 VAStatus
 vlVaQueryImageFormats(VADriverContextP ctx, VAImageFormat *format_list, int *num_formats)
 {
+   struct pipe_screen *pscreen;
+   enum pipe_format format;
+   int i;
+
    if (!ctx)
       return VA_STATUS_ERROR_INVALID_CONTEXT;
 
    if (!(format_list && num_formats))
-      return VA_STATUS_ERROR_UNKNOWN;
+      return VA_STATUS_ERROR_INVALID_PARAMETER;
 
    *num_formats = 0;
+   pscreen = VL_VA_PSCREEN(ctx);
+   for (i = 0; i < VL_VA_MAX_IMAGE_FORMATS; ++i) {
+      format = YCbCrToPipe(formats[i].fourcc);
+      if (pscreen->is_video_format_supported(pscreen, format,
+          PIPE_VIDEO_PROFILE_UNKNOWN,
+          PIPE_VIDEO_ENTRYPOINT_BITSTREAM))
+         format_list[(*num_formats)++] = formats[i];
+   }
 
    return VA_STATUS_SUCCESS;
 }
@@ -45,16 +93,61 @@ vlVaQueryImageFormats(VADriverContextP ctx, VAImageFormat *format_list, int *num
 VAStatus
 vlVaCreateImage(VADriverContextP ctx, VAImageFormat *format, int width, int height, VAImage *image)
 {
+   vlVaDriver *drv;
+   int w, h;
+
    if (!ctx)
       return VA_STATUS_ERROR_INVALID_CONTEXT;
 
-   if(!format)
-      return VA_STATUS_ERROR_UNKNOWN;
+   if (!(format && image && width && height))
+      return VA_STATUS_ERROR_INVALID_PARAMETER;
+
+   drv = VL_VA_DRIVER(ctx);
 
-   if (!(width && height))
+   image->image_id = handle_table_add(drv->htab, image);
+   image->format = *format;
+   image->width = width;
+   image->height = height;
+   w = align(width, 2);
+   h = align(width, 2);
+
+   switch (format->fourcc) {
+   case VA_FOURCC('N','V','1','2'):
+      image->num_planes = 2;
+      image->pitches[0] = w;
+      image->offsets[0] = 0;
+      image->pitches[1] = w;
+      image->offsets[1] = w * h;
+      image->data_size  = w * h * 3 / 2;
+      break;
+
+   case VA_FOURCC('I','4','2','0'):
+   case VA_FOURCC('Y','V','1','2'):
+      image->num_planes = 3;
+      image->pitches[0] = w;
+      image->offsets[0] = 0;
+      image->pitches[1] = w / 2;
+      image->offsets[1] = w * h;
+      image->pitches[2] = w / 2;
+      image->offsets[2] = w * h * 5 / 4;
+      image->data_size  = w * h * 3 / 2;
+      break;
+
+   case VA_FOURCC('U','Y','V','Y'):
+   case VA_FOURCC('Y','U','Y','V'):
+      image->num_planes = 1;
+      image->pitches[0] = w * 4;
+      image->offsets[0] = 0;
+      image->data_size  = w * h * 4;
+      break;
+
+   default:
       return VA_STATUS_ERROR_INVALID_IMAGE_FORMAT;
+   }
 
-   return VA_STATUS_ERROR_UNIMPLEMENTED;
+   return vlVaCreateBuffer(ctx, 0, VAImageBufferType,
+                           align(image->data_size, 16),
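
The plane layout arithmetic in vlVaCreateImage() can be checked on its own with a small model. The sketch below makes some assumptions: plane_layout, align2, nv12_layout and i420_layout are hypothetical names, only the 4:2:0 cases are covered, and h is rounded up from the height. The hunk as posted derives h from width, which looks unintended, so the sketch does not reproduce that.

```c
#include <assert.h>

/* Hypothetical container for the fields the patch fills in VAImage. */
struct plane_layout {
   unsigned num_planes;
   unsigned pitches[3];
   unsigned offsets[3];
   unsigned data_size;
};

/* Round up to an even value so the half-resolution chroma planes
 * cover odd-sized images. */
static unsigned
align2(unsigned v)
{
   return (v + 1) & ~1u;
}

/* NV12: full-size luma plane followed by one interleaved CbCr plane. */
static struct plane_layout
nv12_layout(unsigned width, unsigned height)
{
   unsigned w = align2(width), h = align2(height);
   struct plane_layout l = { 2, { w, w, 0 }, { 0, w * h, 0 }, w * h * 3 / 2 };
   return l;
}

/* I420 / YV12: luma plane followed by two quarter-size chroma planes. */
static struct plane_layout
i420_layout(unsigned width, unsigned height)
{
   unsigned w = align2(width), h = align2(height);
   struct plane_layout l = { 3, { w, w / 2, w / 2 },
                             { 0, w * h, w * h * 5 / 4 }, w * h * 3 / 2 };
   return l;
}
```

For a 5x3 NV12 image this gives w = 6 and h = 4, so the chroma plane starts at byte 24 and the buffer needs 36 bytes before the final 16-byte alignment.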
+  

[Mesa-dev] [PATCH v3 2/6] st/va: skeleton VAAPI state tracker

2014-09-26 Thread Leo Liu
From: Christian König christian.koe...@amd.com

This patch adds a skeleton VA-API state tracker,
which is filled with life in the subsequent patches.

v2: fixes in configure.ac and va state_tracker Makefile.am
v3: configure.ac:
   generate a macro for linking to xcb
   auto-detecting VA version
   rebase with upstream changes
state-trackers/va/Makefile.am:
   pass symbol for auto-detecting VA version
targets/va/Makefile.am
   rebase with omx/Makefile.am
use macro VA_DRIVER_INIT_FUNC for auto-detect

Signed-off-by: Christian König christian.koe...@amd.com
Signed-off-by: Leo Liu leo@amd.com
---
 configure.ac   |  34 ++
 src/gallium/Makefile.am|   4 +
 src/gallium/state_trackers/va/Makefile.am  |  37 ++
 src/gallium/state_trackers/va/Makefile.sources |  10 ++
 src/gallium/state_trackers/va/buffer.c |  87 ++
 src/gallium/state_trackers/va/config.c |  91 +++
 src/gallium/state_trackers/va/context.c| 151 +
 src/gallium/state_trackers/va/display.c|  61 ++
 src/gallium/state_trackers/va/image.c  | 106 +
 src/gallium/state_trackers/va/picture.c|  56 +
 src/gallium/state_trackers/va/subpicture.c | 115 +++
 src/gallium/state_trackers/va/surface.c| 111 ++
 src/gallium/state_trackers/va/va_private.h | 116 +++
 src/gallium/targets/va/Makefile.am |  58 ++
 src/gallium/targets/va/target.c|   1 +
 15 files changed, 1038 insertions(+)
 create mode 100644 src/gallium/state_trackers/va/Makefile.am
 create mode 100644 src/gallium/state_trackers/va/Makefile.sources
 create mode 100644 src/gallium/state_trackers/va/buffer.c
 create mode 100644 src/gallium/state_trackers/va/config.c
 create mode 100644 src/gallium/state_trackers/va/context.c
 create mode 100644 src/gallium/state_trackers/va/display.c
 create mode 100644 src/gallium/state_trackers/va/image.c
 create mode 100644 src/gallium/state_trackers/va/picture.c
 create mode 100644 src/gallium/state_trackers/va/subpicture.c
 create mode 100644 src/gallium/state_trackers/va/surface.c
 create mode 100644 src/gallium/state_trackers/va/va_private.h
 create mode 100644 src/gallium/targets/va/Makefile.am
 create mode 100644 src/gallium/targets/va/target.c

diff --git a/configure.ac b/configure.ac
index 52f8a52..9cd7f4b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -673,6 +673,11 @@ AC_ARG_ENABLE([omx],
  [enable OpenMAX library @:@default=no@:@])],
[enable_omx=$enableval],
[enable_omx=no])
+AC_ARG_ENABLE([va],
+   [AS_HELP_STRING([--enable-va],
+ [enable va library @:@default=auto@:@])],
+   [enable_va=$enableval],
+   [enable_va=auto])
 AC_ARG_ENABLE([opencl],
[AS_HELP_STRING([--enable-opencl],
  [enable OpenCL library @:@default=no@:@])],
@@ -744,6 +749,7 @@ if test x$enable_opengl = xno -a \
 x$enable_xvmc = xno -a \
 x$enable_vdpau = xno -a \
 x$enable_omx = xno -a \
+x$enable_va = xno -a \
 x$enable_opencl = xno; then
 AC_MSG_ERROR([at least one API should be enabled])
 fi
@@ -1404,6 +1410,10 @@ if test -n $with_gallium_drivers -a x$with_gallium_drivers != xswrast; then
 if test x$enable_omx = xauto; then
PKG_CHECK_EXISTS([libomxil-bellagio], [enable_omx=yes], [enable_omx=no])
 fi
+
+if test x$enable_va = xauto; then
+PKG_CHECK_EXISTS([libva], [enable_va=yes], [enable_va=no])
+fi
 fi
 
 if test x$enable_xvmc = xyes; then
@@ -1425,6 +1435,16 @@ if test x$enable_omx = xyes; then
 fi
 AM_CONDITIONAL(HAVE_ST_OMX, test x$enable_omx = xyes)
 
+if test x$enable_va = xyes; then
+    PKG_CHECK_MODULES([VA], [libva >= 0.35.0 x11-xcb xcb-dri2 >= $XCBDRI2_REQUIRED],
+                      [VA_LIBS=`$PKG_CONFIG --libs x11-xcb xcb-dri2`])
+    VA_DRIVER_INIT_FUNC=`$PKG_CONFIG --modversion libva|sed -n 's/\(.*\)\.\(.*\)\..*$/__vaDriverInit_\1_\2/p'`
+    AC_SUBST([VA_DRIVER_INIT_FUNC])
+    GALLIUM_STATE_TRACKERS_DIRS="$GALLIUM_STATE_TRACKERS_DIRS va"
+    enable_gallium_loader=$enable_shared_pipe_drivers
+fi
+AM_CONDITIONAL(HAVE_ST_VA, test x$enable_va = xyes)
+
 dnl
 dnl OpenCL configuration
 dnl
@@ -1796,6 +1816,15 @@ AC_ARG_WITH([omx-libdir],
 [OMX_LIB_INSTALL_DIR=$OMX_LIB_INSTALL_DIR_DEFAULT])
 AC_SUBST([OMX_LIB_INSTALL_DIR])
 
+dnl Directory for VA libs
+
+AC_ARG_WITH([va-libdir],
+[AS_HELP_STRING([--with-va-libdir=DIR],
+        [directory for the VA libraries @:@default=`pkg-config libva --variable=driverdir`@:@])],
+[VA_LIB_INSTALL_DIR=$withval],
+[VA_LIB_INSTALL_DIR=`pkg-config libva --variable=driverdir`])
+AC_SUBST([VA_LIB_INSTALL_DIR])
+
 dnl Directory for OpenCL libs
 AC_ARG_WITH([opencl-libdir],
 [AS_HELP_STRING([--with-opencl-libdir=DIR],
@@ -1829,6 +1858,9 @@ gallium_require_drm_loader() {
 fi
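
For reference, the sed expression that derives VA_DRIVER_INIT_FUNC from the libva version turns "MAJOR.MINOR.MICRO" into "__vaDriverInit_MAJOR_MINOR". The same mapping is sketched below in C, purely as an illustration: va_init_func_name is a hypothetical helper and is not part of the build, and it assumes purely numeric version components.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: maps a version such as "0.35.0" to the init
 * symbol name "__vaDriverInit_0_35".  Like the sed expression, it
 * requires a second dot after the minor number.  Returns 0 on success
 * and -1 on a malformed version or a too-small buffer. */
static int
va_init_func_name(const char *version, char *buf, size_t size)
{
   unsigned major, minor;
   int consumed = 0;

   /* %n is only stored if the literal '.' after the minor matched. */
   if (sscanf(version, "%u.%u.%n", &major, &minor, &consumed) != 2 ||
       consumed == 0)
      return -1;

   int n = snprintf(buf, size, "__vaDriverInit_%u_%u", major, minor);
   return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

The v3 notes say the resulting name is passed to the state tracker through Makefile.am, and the driver entry point is written as VA_DRIVER_INIT_FUNC(VADriverContextP ctx).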
 

Re: [Mesa-dev] [PATCH v2 6/6] st/va: implement vlVa(Query|Create|Get|Put|Destroy)Image

2014-09-26 Thread Ilia Mirkin
On Fri, Sep 26, 2014 at 4:30 PM, Leo Liu leo@amd.com wrote:
> This patch implements the image functions, which copy
> data between video surfaces and user buffers. This
> supports SW decode and other video output.
>
> v2: fix buffer size for odd-sized image case
> expose I420 format as well
>
> Signed-off-by: Leo Liu leo@amd.com
> ---
>  src/gallium/state_trackers/va/context.c|   2 +-
>  src/gallium/state_trackers/va/image.c  | 254 -
>  src/gallium/state_trackers/va/va_private.h |  22 +++
>  3 files changed, 269 insertions(+), 9 deletions(-)
>
> diff --git a/src/gallium/state_trackers/va/context.c b/src/gallium/state_trackers/va/context.c
> index 1819ec5..ae87d3b 100644
> --- a/src/gallium/state_trackers/va/context.c
> +++ b/src/gallium/state_trackers/va/context.c
> @@ -121,7 +121,7 @@ VA_DRIVER_INIT_FUNC(VADriverContextP ctx)
>     ctx->max_profiles = PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH - PIPE_VIDEO_PROFILE_UNKNOWN;
>     ctx->max_entrypoints = 1;
>     ctx->max_attributes = 1;
> -   ctx->max_image_formats = 1;
> +   ctx->max_image_formats = VL_VA_MAX_IMAGE_FORMATS;
>     ctx->max_subpic_formats = 1;
>     ctx->max_display_attributes = 1;
>     ctx->str_vendor = "mesa gallium vaapi";
> diff --git a/src/gallium/state_trackers/va/image.c b/src/gallium/state_trackers/va/image.c
> index 8aaa29c..d3c9f20 100644
> --- a/src/gallium/state_trackers/va/image.c
> +++ b/src/gallium/state_trackers/va/image.c
> @@ -26,18 +26,66 @@
>    *
>    **/
>
 +#include pipe/p_screen.h
 +
 +#include util/u_memory.h
 +#include util/u_handle_table.h
 +#include util/u_surface.h
 +#include util/u_video.h
 +
 +#include vl/vl_winsys.h
 +
  #include va_private.h

 +static const VAImageFormat formats[VL_VA_MAX_IMAGE_FORMATS] =
 +{
 +   {VA_FOURCC('N','V','1','2')},
 +   {VA_FOURCC('I','4','2','0')},
 +   {VA_FOURCC('Y','V','1','2')},
 +   {VA_FOURCC('Y','U','Y','V')},
 +   {VA_FOURCC('U','Y','V','Y')},
 +};
 +
 +static void
 +vlVaVideoSurfaceSize(vlVaSurface *p_surf, int component,
 + unsigned *width, unsigned *height)
 +{
 +   *width = p_surf->templat.width;
 +   *height = p_surf->templat.height;
 +
 +   if (component > 0) {
 +  if (p_surf->templat.chroma_format == PIPE_VIDEO_CHROMA_FORMAT_420) {
 + *width /= 2;
 + *height /= 2;
 +  } else if (p_surf->templat.chroma_format == PIPE_VIDEO_CHROMA_FORMAT_422)
 + *width /= 2;
 +   }
 +   if (p_surf->templat.interlaced)
 +  *height /= 2;
 +}
 +
  VAStatus
  vlVaQueryImageFormats(VADriverContextP ctx, VAImageFormat *format_list, int *num_formats)
  {
 +   struct pipe_screen *pscreen;
 +   enum pipe_format format;
 +   int i;
 +
 if (!ctx)
return VA_STATUS_ERROR_INVALID_CONTEXT;

 if (!(format_list && num_formats))
 -  return VA_STATUS_ERROR_UNKNOWN;
 +  return VA_STATUS_ERROR_INVALID_PARAMETER;

 *num_formats = 0;
 +   pscreen = VL_VA_PSCREEN(ctx);
 +   for (i = 0; i < VL_VA_MAX_IMAGE_FORMATS; ++i) {
 +  format = YCbCrToPipe(formats[i].fourcc);
 +  if (pscreen->is_video_format_supported(pscreen, format,
 +  PIPE_VIDEO_PROFILE_UNKNOWN,
 +  PIPE_VIDEO_ENTRYPOINT_BITSTREAM))
 + format_list[(*num_formats)++] = formats[i];
 +   }

 return VA_STATUS_SUCCESS;
  }
 @@ -45,16 +93,61 @@ vlVaQueryImageFormats(VADriverContextP ctx, VAImageFormat *format_list, int *num
  VAStatus
  vlVaCreateImage(VADriverContextP ctx, VAImageFormat *format, int width, int height, VAImage *image)
  {
 +   vlVaDriver *drv;
 +   int w, h;
 +
 if (!ctx)
return VA_STATUS_ERROR_INVALID_CONTEXT;

 -   if(!format)
 -  return VA_STATUS_ERROR_UNKNOWN;
 +   if (!(format && image && width && height))
 +  return VA_STATUS_ERROR_INVALID_PARAMETER;
 +
 +   drv = VL_VA_DRIVER(ctx);

 -   if (!(width && height))
 +   image->image_id = handle_table_add(drv->htab, image);
 +   image->format = *format;
 +   image->width = width;
 +   image->height = height;
 +   w = align(width, 2);
 +   h = align(width, 2);
 +
 +   switch (format->fourcc) {
 +   case VA_FOURCC('N','V','1','2'):
 +  image->num_planes = 2;
 +  image->pitches[0] = w;
 +  image->offsets[0] = 0;
 +  image->pitches[1] = w;
 +  image->offsets[1] = w * h;
 +  image->data_size  = w * h * 3 / 2;
 +  break;
 +
 +   case VA_FOURCC('I','4','2','0'):
 +   case VA_FOURCC('Y','V','1','2'):
 +  image->num_planes = 3;
 +  image->pitches[0] = w;
 +  image->offsets[0] = 0;
 +  image->pitches[1] = w / 2;
 +  image->offsets[1] = w * h;
 +  image->pitches[2] = w / 2;
 +  image->offsets[2] = w * h * 5 / 4;
 +  image->data_size  = w * h * 3 / 2;
 +  break;
 +
 +   case VA_FOURCC('U','Y','V','Y'):
 +   case VA_FOURCC('Y','U','Y','V'):
 +  image->num_planes = 1;
 +  image->pitches[0] = w * 4;
 +  image->offsets[0] = 0;
 +  image->data_size  = w * h * 4;

Is this right? YUYV/UYVY stores 2 pixels in 4 
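
The review question above concerns the pitch of the packed formats. As a rough, standalone illustration (not the patch's code, and not a proposed fix), the sketch below computes plane layouts under the assumption of tightly packed rows, with packed 4:2:2 taking 4 bytes per 2 pixels, i.e. 2 bytes per pixel. The names yuv_layout, image_layout, align2 and compute_layout are hypothetical. Note that the sketch derives the aligned height from the height argument.

```c
#include <stdint.h>

/* Hypothetical helper types, for illustration only. */
enum yuv_layout { YUV_NV12, YUV_I420, YUV_YUYV };

struct image_layout {
   unsigned num_planes;
   uint32_t pitches[3];
   uint32_t offsets[3];
   uint32_t data_size;
};

/* Round up to the next multiple of 2. */
static uint32_t align2(uint32_t v)
{
   return (v + 1u) & ~1u;
}

/* Compute pitches, offsets and total size, assuming tightly packed rows. */
static struct image_layout
compute_layout(enum yuv_layout fmt, uint32_t width, uint32_t height)
{
   struct image_layout l = {0};
   uint32_t w = align2(width);
   uint32_t h = align2(height);

   switch (fmt) {
   case YUV_NV12:
      /* Full-size Y plane followed by one interleaved UV plane. */
      l.num_planes = 2;
      l.pitches[0] = w;
      l.offsets[0] = 0;
      l.pitches[1] = w;
      l.offsets[1] = w * h;
      l.data_size = w * h * 3 / 2;
      break;
   case YUV_I420:
      /* Full-size Y plane followed by two quarter-size chroma planes. */
      l.num_planes = 3;
      l.pitches[0] = w;
      l.offsets[0] = 0;
      l.pitches[1] = w / 2;
      l.offsets[1] = w * h;
      l.pitches[2] = w / 2;
      l.offsets[2] = w * h * 5 / 4;
      l.data_size = w * h * 3 / 2;
      break;
   case YUV_YUYV:
      /* Packed 4:2:2: Y0 U Y1 V, 4 bytes for every 2 pixels. */
      l.num_planes = 1;
      l.pitches[0] = w * 2;
      l.offsets[0] = 0;
      l.data_size = w * h * 2;
      break;
   }
   return l;
}
```

Under these assumptions a 4x2 YUYV image has a pitch of 8 bytes and a total size of 16 bytes, half of what a 4-bytes-per-pixel layout would allocate.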

[Mesa-dev] [PATCH 2/3] driconf: Update Spanish translation

2014-09-26 Thread Alex Henrie
---
 src/mesa/drivers/dri/common/xmlpool/es.po | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/common/xmlpool/es.po 
b/src/mesa/drivers/dri/common/xmlpool/es.po
index 1733b76..a68c329 100644
--- a/src/mesa/drivers/dri/common/xmlpool/es.po
+++ b/src/mesa/drivers/dri/common/xmlpool/es.po
@@ -10,7 +10,7 @@ msgstr ""
 "Project-Id-Version: es\n"
 "Report-Msgid-Bugs-To: \n"
 "POT-Creation-Date: 2014-09-25 22:29-0600\n"
-"PO-Revision-Date: 2014-01-15 10:34-0700\n"
+"PO-Revision-Date: 2014-09-26 14:22-0700\n"
 "Last-Translator: Alex Henrie alexhenri...@gmail.com\n"
 "Language-Team: Spanish e...@li.org\n"
 "Language: es\n"
@@ -18,7 +18,7 @@ msgstr ""
 "Content-Type: text/plain; charset=UTF-8\n"
 "Content-Transfer-Encoding: 8bit\n"
 "Plural-Forms: nplurals=2; plural=(n != 1);\n"
-"X-Generator: Poedit 1.5.4\n"
+"X-Generator: Poedit 1.6.9\n"
 
 #: t_options.h:56
 msgid "Debugging"
@@ -72,7 +72,7 @@ msgstr ""
 
 #: t_options.h:110
 msgid "Allow GLSL #extension directives in the middle of shaders"
-msgstr ""
+msgstr "Permite directivas #extension GLSL en medio de los shaders"
 
 #: t_options.h:120
 msgid "Image Quality"
@@ -309,8 +309,8 @@ msgstr "Crear todos los visuales con buffer de profundidad"
 
 #: t_options.h:337
 msgid "Initialization"
-msgstr ""
+msgstr "Inicialización"
 
 #: t_options.h:341
 msgid "Define the graphic device to use if possible"
-msgstr ""
+msgstr "Define el dispositivo de gráficos que usar si es posible"
-- 
2.1.1
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] driconf: Synchronize po files

2014-09-26 Thread Alex Henrie
---
 src/mesa/drivers/dri/common/xmlpool/ca.po | 119 --
 src/mesa/drivers/dri/common/xmlpool/de.po | 118 -
 src/mesa/drivers/dri/common/xmlpool/es.po | 118 -
 src/mesa/drivers/dri/common/xmlpool/fr.po | 118 -
 src/mesa/drivers/dri/common/xmlpool/nl.po | 118 -
 src/mesa/drivers/dri/common/xmlpool/sv.po | 118 -
 6 files changed, 390 insertions(+), 319 deletions(-)

diff --git a/src/mesa/drivers/dri/common/xmlpool/ca.po 
b/src/mesa/drivers/dri/common/xmlpool/ca.po
index c0cf7f6..1db9703 100644
--- a/src/mesa/drivers/dri/common/xmlpool/ca.po
+++ b/src/mesa/drivers/dri/common/xmlpool/ca.po
@@ -21,12 +21,11 @@
 # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 # IN THE SOFTWARE.
-
 msgid ""
 msgstr ""
 "Project-Id-Version: Mesa 10.1.0-devel\n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2014-01-13 22:30-0700\n"
+"POT-Creation-Date: 2014-09-25 22:29-0600\n"
 "PO-Revision-Date: 2014-01-15 10:37-0700\n"
 "Last-Translator: Alex Henrie alexhenri...@gmail.com\n"
 "Language-Team: Catalan c...@li.org\n"
@@ -87,108 +86,112 @@ msgstr ""
 "Força una versió GLSL per defecte en els shaders als quals falta una línia "
 "#version explícita"
 
-#: t_options.h:115
+#: t_options.h:110
+msgid "Allow GLSL #extension directives in the middle of shaders"
+msgstr ""
+
+#: t_options.h:120
 msgid "Image Quality"
 msgstr "Qualitat d'Imatge"
 
-#: t_options.h:128
+#: t_options.h:133
 msgid "Texture color depth"
 msgstr "Profunditat de color de textura"
 
-#: t_options.h:129
+#: t_options.h:134
 msgid "Prefer frame buffer color depth"
 msgstr "Prefereix profunditat de color del framebuffer"
 
-#: t_options.h:130
+#: t_options.h:135
 msgid "Prefer 32 bits per texel"
 msgstr "Prefereix 32 bits per texel"
 
-#: t_options.h:131
+#: t_options.h:136
 msgid "Prefer 16 bits per texel"
 msgstr "Prefereix 16 bits per texel"
 
-#: t_options.h:132
+#: t_options.h:137
 msgid "Force 16 bits per texel"
 msgstr "Força 16 bits per texel"
 
-#: t_options.h:138
+#: t_options.h:143
 msgid "Initial maximum value for anisotropic texture filtering"
 msgstr "Valor màxim inicial per a la filtració de textura anisòtropa"
 
-#: t_options.h:143
+#: t_options.h:148
 msgid "Forbid negative texture LOD bias"
 msgstr ""
 "Prohibeix una parcialitat negativa del Nivell de Detalle (LOD) de les "
 "textures"
 
-#: t_options.h:148
+#: t_options.h:153
 msgid ""
 "Enable S3TC texture compression even if software support is not available"
 msgstr ""
 "Habilitar la compressió de textures S3TC encara que el suport de programari "
 "no estigui disponible"
 
-#: t_options.h:155
+#: t_options.h:160
 msgid "Initial color reduction method"
 msgstr "Mètode inicial de reducció de color"
 
-#: t_options.h:156
+#: t_options.h:161
 msgid "Round colors"
 msgstr "Colors arrodonits"
 
-#: t_options.h:157
+#: t_options.h:162
 msgid "Dither colors"
 msgstr "Colors tramats"
 
-#: t_options.h:165
+#: t_options.h:170
 msgid "Color rounding method"
 msgstr "Mètode d'arrodoniment de color"
 
-#: t_options.h:166
+#: t_options.h:171
 msgid "Round color components downward"
 msgstr "Arrondeix els components de color a baix"
 
-#: t_options.h:167
+#: t_options.h:172
 msgid "Round to nearest color"
 msgstr "Arrondeix al color més proper"
 
-#: t_options.h:176
+#: t_options.h:181
 msgid "Color dithering method"
 msgstr "Mètode de tramat de color"
 
-#: t_options.h:177
+#: t_options.h:182
 msgid "Horizontal error diffusion"
 msgstr "Difusió d'error horitzontal"
 
-#: t_options.h:178
+#: t_options.h:183
 msgid "Horizontal error diffusion, reset error at line start"
 msgstr "Difusió d'error horitzontal, reinicia l'error a l'inici de la línia"
 
-#: t_options.h:179
+#: t_options.h:184
 msgid "Ordered 2D color dithering"
 msgstr "Tramat de color 2D ordenat"
 
-#: t_options.h:185
+#: t_options.h:190
 msgid "Floating point depth buffer"
 msgstr "Buffer de profunditat de punt flotant"
 
-#: t_options.h:190
+#: t_options.h:195
 msgid "A post-processing filter to cel-shade the output"
 msgstr "Un filtre de postprocessament per a aplicar cel shading a la sortida"
 
-#: t_options.h:195
+#: t_options.h:200
 msgid "A post-processing filter to remove the red channel"
 msgstr "Un filtre de postprocessament per a treure el canal vermell"
 
-#: t_options.h:200
+#: t_options.h:205
 msgid "A post-processing filter to remove the green channel"
 msgstr "Un filtre de postprocessament per a treure el canal verd"
 
-#: t_options.h:205
+#: t_options.h:210
 msgid "A post-processing filter to remove the blue channel"
 msgstr "Un filtre de postprocessament per a treure el canal blau"
 
-#: t_options.h:210
+#: t_options.h:215
 msgid ""
 "Morphological anti-aliasing based on Jimenez\\' MLAA. 0 to disable, 8 for "
 "default quality"
@@ -196,7 +199,7 @@ msgstr ""
 "Antialiàsing morfològic basat en el MLAA de Jimenez. 0 per deshabilitar, 8 "
 "per qualitat per defecte"
 
-#: t_options.h:215
+#: 

[Mesa-dev] [PATCH 3/3] driconf: Correct and update Catalan translation

2014-09-26 Thread Alex Henrie
---
 src/mesa/drivers/dri/common/xmlpool/ca.po | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/common/xmlpool/ca.po 
b/src/mesa/drivers/dri/common/xmlpool/ca.po
index 1db9703..23e9f42 100644
--- a/src/mesa/drivers/dri/common/xmlpool/ca.po
+++ b/src/mesa/drivers/dri/common/xmlpool/ca.po
@@ -26,14 +26,14 @@ msgstr ""
 "Project-Id-Version: Mesa 10.1.0-devel\n"
 "Report-Msgid-Bugs-To: \n"
 "POT-Creation-Date: 2014-09-25 22:29-0600\n"
-"PO-Revision-Date: 2014-01-15 10:37-0700\n"
+"PO-Revision-Date: 2014-09-26 14:43-0700\n"
 "Last-Translator: Alex Henrie alexhenri...@gmail.com\n"
 "Language-Team: Catalan c...@li.org\n"
 "Language: ca\n"
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=UTF-8\n"
 "Content-Transfer-Encoding: 8bit\n"
-"X-Generator: Poedit 1.5.4\n"
+"X-Generator: Poedit 1.6.9\n"
 
 #: t_options.h:56
 msgid "Debugging"
@@ -72,8 +72,8 @@ msgstr "Deshabilita la barreja de font dual"
 #: t_options.h:95
 msgid "Disable backslash-based line continuations in GLSL source"
 msgstr ""
-"Deshabilitar les continuacions de línia basades en barra invertida en la "
-"font GLSL"
+"Deshabilita les continuacions de línia basades en barra invertida en la font "
+"GLSL"
 
 #: t_options.h:100
 msgid "Disable GL_ARB_shader_bit_encoding"
@@ -88,7 +88,7 @@ msgstr ""
 
 #: t_options.h:110
 msgid "Allow GLSL #extension directives in the middle of shaders"
-msgstr ""
+msgstr "Permet les directives #extension GLSL en el mitjà dels shaders"
 
 #: t_options.h:120
 msgid "Image Quality"
@@ -128,7 +128,7 @@ msgstr ""
 msgid ""
 "Enable S3TC texture compression even if software support is not available"
 msgstr ""
-"Habilitar la compressió de textures S3TC encara que el suport de programari "
+"Habilita la compressió de textures S3TC encara que el suport de programari "
 "no estigui disponible"
 
 #: t_options.h:160
@@ -325,8 +325,8 @@ msgstr "Crea tots els visuals amb buffer de profunditat"
 
 #: t_options.h:337
 msgid "Initialization"
-msgstr ""
+msgstr "Inicialització"
 
 #: t_options.h:341
 msgid "Define the graphic device to use if possible"
-msgstr ""
+msgstr "Defineix el dispositiu de gràfics que usar si és possible"
-- 
2.1.1


[Mesa-dev] [PATCH 1/2] mesa: remove last DJGPP remains

2014-09-26 Thread Emil Velikov
Signed-off-by: Emil Velikov emil.l.veli...@gmail.com
---
 src/mapi/glapi/gen/gl_x86_asm.py  | 2 +-
 src/mesa/main/dlopen.h| 7 ---
 src/mesa/main/texcompress_s3tc.c  | 2 --
 src/mesa/x86/assyntax.h   | 6 +++---
 src/mesa/x86/read_rgba_span_x86.S | 4 ++--
 5 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/src/mapi/glapi/gen/gl_x86_asm.py b/src/mapi/glapi/gen/gl_x86_asm.py
index 919bbc0..d87d0bd 100644
--- a/src/mapi/glapi/gen/gl_x86_asm.py
+++ b/src/mapi/glapi/gen/gl_x86_asm.py
@@ -72,7 +72,7 @@ class PrintGenericStubs(gl_XML.gl_print_base):
 print ''
 print '#define GL_OFFSET(x) CODEPTR(REGOFF(4 * x, EAX))'
 print ''
-print '#if defined(GNU_ASSEMBLER) && !defined(__DJGPP__) && !defined(__MINGW32__) && !defined(__APPLE__)'
+print '#if defined(GNU_ASSEMBLER) && !defined(__MINGW32__) && !defined(__APPLE__)'
 print '#define GLOBL_FN(x) GLOBL x ; .type x, @function'
 print '#else'
 print '#define GLOBL_FN(x) GLOBL x'
diff --git a/src/mesa/main/dlopen.h b/src/mesa/main/dlopen.h
index 55a56f0..3754ec1 100644
--- a/src/mesa/main/dlopen.h
+++ b/src/mesa/main/dlopen.h
@@ -73,13 +73,6 @@ _mesa_dlsym(void *handle, const char *fname)
} u;
 #if defined(__blrts)
u.v = NULL;
-#elif defined(__DJGPP__)
-   /* need '_' prefix on symbol names */
-   char fname2[1000];
-   fname2[0] = '_';
-   strncpy(fname2 + 1, fname, 998);
-   fname2[999] = 0;
-   u.v = dlsym(handle, fname2);
 #elif defined(HAVE_DLOPEN)
u.v = dlsym(handle, fname);
 #elif defined(__MINGW32__)
diff --git a/src/mesa/main/texcompress_s3tc.c b/src/mesa/main/texcompress_s3tc.c
index 5b275ef..254f84e 100644
--- a/src/mesa/main/texcompress_s3tc.c
+++ b/src/mesa/main/texcompress_s3tc.c
@@ -51,8 +51,6 @@
 #define DXTN_LIBNAME "dxtn.dll"
 #define RTLD_LAZY 0
 #define RTLD_GLOBAL 0
-#elif defined(__DJGPP__)
-#define DXTN_LIBNAME "dxtn.dxe"
 #else
 #define DXTN_LIBNAME "libtxc_dxtn.so"
 #endif
diff --git a/src/mesa/x86/assyntax.h b/src/mesa/x86/assyntax.h
index fa7d92e..67867bd 100644
--- a/src/mesa/x86/assyntax.h
+++ b/src/mesa/x86/assyntax.h
@@ -255,7 +255,7 @@
 #endif /* ACK_ASSEMBLER */
 
 
-#if defined(__QNX__) || defined(Lynx) || (defined(SYSV) || defined(SVR4)) && !defined(ACK_ASSEMBLER) || defined(__ELF__) || defined(__GNU__) || defined(__GNUC__) && !defined(__DJGPP__) && !defined(__MINGW32__)
+#if defined(__QNX__) || defined(Lynx) || (defined(SYSV) || defined(SVR4)) && !defined(ACK_ASSEMBLER) || defined(__ELF__) || defined(__GNU__) || defined(__GNUC__) && !defined(__MINGW32__)
 #define GLNAME(a)  a
 #else
 #define GLNAME(a)  CONCAT(_,a)
@@ -1727,9 +1727,9 @@
  * If we build with gcc's -fvisibility=hidden flag, we'll need to change
  * the symbol visibility mode to 'default'.
  */
-#if defined(GNU_ASSEMBLER) && !defined(__DJGPP__) && !defined(__MINGW32__) && !defined(__APPLE__)
+#if defined(GNU_ASSEMBLER) && !defined(__MINGW32__) && !defined(__APPLE__)
 #  define HIDDEN(x) .hidden x
-#elif defined(__GNUC__) && !defined(__DJGPP__) && !defined(__MINGW32__) && !defined(__APPLE__)
+#elif defined(__GNUC__) && !defined(__MINGW32__) && !defined(__APPLE__)
 #  pragma GCC visibility push(default)
 #  define HIDDEN(x) .hidden x
 #else
diff --git a/src/mesa/x86/read_rgba_span_x86.S 
b/src/mesa/x86/read_rgba_span_x86.S
index 8177299..5def1f8 100644
--- a/src/mesa/x86/read_rgba_span_x86.S
+++ b/src/mesa/x86/read_rgba_span_x86.S
@@ -31,7 +31,7 @@
  */
 
    .file   "read_rgba_span_x86.S"
-#if !defined(__DJGPP__) && !defined(__MINGW32__) && !defined(__APPLE__) /* this one cries for assyntax.h */
+#if !defined(__MINGW32__) && !defined(__APPLE__) /* this one cries for assyntax.h */
 /* Kevin F. Quinn 2nd July 2006
  * Replaced data segment constants with text-segment instructions.
  */
@@ -671,7 +671,7 @@ _generic_read_RGBA_span_RGB565_MMX:
emms
 #endif
ret
-#endif /* !defined(__DJGPP__) && !defined(__MINGW32__) && !defined(__APPLE__) */
+#endif /* !defined(__MINGW32__) && !defined(__APPLE__) */

 #if defined (__ELF__) && defined (__linux__)
    .section .note.GNU-stack,"",%progbits
-- 
2.1.0



[Mesa-dev] [PATCH 2/2] Remove Bluegene/L wrappers

2014-09-26 Thread Emil Velikov
Added back in 2009, with osmesa/GLU in mind. Unlikely to be working
any more since the removal of the static makefiles.

Cc: Brian Paul bri...@vmware.com
Signed-off-by: Emil Velikov emil.l.veli...@gmail.com
---
 src/mesa/main/compiler.h |  2 +-
 src/mesa/main/dlopen.h   | 12 +++-
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/src/mesa/main/compiler.h b/src/mesa/main/compiler.h
index 185c911..34671dc 100644
--- a/src/mesa/main/compiler.h
+++ b/src/mesa/main/compiler.h
@@ -150,7 +150,7 @@ extern "C" {
 #elif defined(__APPLE__)
 #include <CoreFoundation/CFByteOrder.h>
 #define CPU_TO_LE32( x )   CFSwapInt32HostToLittle( x )
-#elif (defined(_AIX) || defined(__blrts))
+#elif (defined(_AIX))
 static inline GLuint CPU_TO_LE32(GLuint x)
 {
    return (((x & 0x00ff) << 24) |
diff --git a/src/mesa/main/dlopen.h b/src/mesa/main/dlopen.h
index 3754ec1..1e77849 100644
--- a/src/mesa/main/dlopen.h
+++ b/src/mesa/main/dlopen.h
@@ -47,9 +47,7 @@ typedef void (*GenericFunc)(void);
 static inline void *
 _mesa_dlopen(const char *libname, int flags)
 {
-#if defined(__blrts)
-   return NULL;
-#elif defined(HAVE_DLOPEN)
+#if defined(HAVE_DLOPEN)
flags = RTLD_LAZY | RTLD_GLOBAL; /* Overriding flags at this time */
return dlopen(libname, flags);
 #elif defined(__MINGW32__)
@@ -71,9 +69,7 @@ _mesa_dlsym(void *handle, const char *fname)
   void *v;
   GenericFunc f;
} u;
-#if defined(__blrts)
-   u.v = NULL;
-#elif defined(HAVE_DLOPEN)
+#if defined(HAVE_DLOPEN)
u.v = dlsym(handle, fname);
 #elif defined(__MINGW32__)
u.v = (void *) GetProcAddress(handle, fname);
@@ -89,9 +85,7 @@ _mesa_dlsym(void *handle, const char *fname)
 static inline void
 _mesa_dlclose(void *handle)
 {
-#if defined(__blrts)
-   (void) handle;
-#elif defined(HAVE_DLOPEN)
+#if defined(HAVE_DLOPEN)
dlclose(handle);
 #elif defined(__MINGW32__)
FreeLibrary(handle);
-- 
2.1.0



Re: [Mesa-dev] [PATCH 4/4] i965: Use unsynchronized maps for the program cache on LLC platforms.

2014-09-26 Thread Kenneth Graunke
On Friday, September 26, 2014 09:22:31 AM Kristian Høgsberg wrote:
 On Fri, Aug 29, 2014 at 11:10:50PM -0700, Kenneth Graunke wrote:
  There's no reason to stall on pwrite - the CPU always appends to the
  buffer and never modifies existing contents, and the GPU never writes
  it.  Further, the CPU always appends new data before submitting a batch
  that requires it.
  
  This code predates the unsynchronized mapping feature, so we simply
  didn't have the option when it was written.
  
  Ideally, we would do this for non-LLC platforms too, but unsynchronized
  mapping support only exists for LLC systems.
  
  Saves repeated 0.001ms stalls on program upload.
  
  Signed-off-by: Kenneth Graunke kenn...@whitecape.org
  ---
   src/mesa/drivers/dri/i965/brw_state_cache.c | 34 
  +++--
   1 file changed, 27 insertions(+), 7 deletions(-)
  
  diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/i965/brw_state_cache.c
  index b9bb0fc..1d2d32f 100644
  --- a/src/mesa/drivers/dri/i965/brw_state_cache.c
  +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
  @@ -172,14 +172,23 @@ brw_cache_new_bo(struct brw_cache *cache, uint32_t new_size)
  drm_intel_bo *new_bo;
   
  new_bo = drm_intel_bo_alloc(brw->bufmgr, "program cache", new_size, 64);
  +   if (brw->has_llc)
  +  drm_intel_gem_bo_map_unsynchronized(new_bo);
   
  /* Copy any existing data that needs to be saved. */
  if (cache->next_offset != 0) {
  -  brw_bo_map(brw, cache->bo, false, "program cache");
  -  drm_intel_bo_subdata(new_bo, 0, cache->next_offset, cache->bo->virtual);
  -  drm_intel_bo_unmap(cache->bo);
  +  if (brw->has_llc) {
  + memcpy(new_bo->virtual, cache->bo->virtual, cache->next_offset);
 
 Move the drm_intel_gem_bo_map_unsynchronized() and drm_intel_bo_unmap()
 calls into this block so they bracket the memcpy as for the subdata case
 below?
 
 Other than that,
 
 Reviewed-by: Kristian Høgsberg k...@bitplanet.net

That won't work. The point is to map new_bo and leave it mapped, and to unmap
the old BO before throwing it away. If I moved the map call into the
if (cache->next_offset != 0) block, then the initial mapping would never occur.

  +  } else {
  + brw_bo_map(brw, cache->bo, false, "program cache");
  + drm_intel_bo_subdata(new_bo, 0, cache->next_offset,
  +  cache->bo->virtual);
  + drm_intel_bo_unmap(cache->bo);
  +  }
  }
   
  +   if (brw->has_llc)
  +  drm_intel_bo_unmap(cache->bo);
  drm_intel_bo_unreference(cache->bo);
  cache->bo = new_bo;
  cache->bo_used_by_gpu = false;
  @@ -222,9 +231,11 @@ brw_try_upload_using_copy(struct brw_cache *cache,
  continue;
   }
   
  -brw_bo_map(brw, cache->bo, false, "program cache");
  + if (!brw->has_llc)
  +brw_bo_map(brw, cache->bo, false, "program cache");
   ret = memcmp(cache->bo->virtual + item->offset, data, item->size);
  -drm_intel_bo_unmap(cache->bo);
  + if (!brw->has_llc)
  +drm_intel_bo_unmap(cache->bo);
   if (ret)
  continue;
   
  @@ -257,7 +268,7 @@ brw_upload_item_data(struct brw_cache *cache,
  /* If we would block on writing to an in-use program BO, just
   * recreate it.
   */
  -   if (cache->bo_used_by_gpu) {
  +   if (!brw->has_llc && cache->bo_used_by_gpu) {
 perf_debug("Copying busy program cache buffer.\n");
 brw_cache_new_bo(cache, cache->bo->size);
  }
  @@ -280,6 +291,7 @@ brw_upload_cache(struct brw_cache *cache,
   uint32_t *out_offset,
   void *out_aux)
   {
  +   struct brw_context *brw = cache->brw;
  struct brw_cache_item *item = CALLOC_STRUCT(brw_cache_item);
  GLuint hash;
  void *tmp;
  @@ -320,7 +332,11 @@ brw_upload_cache(struct brw_cache *cache,
  cache->n_items++;
   
  /* Copy data to the buffer */
  -   drm_intel_bo_subdata(cache->bo, item->offset, data_size, data);
  +   if (brw->has_llc) {
  +  memcpy((char *) cache->bo->virtual + item->offset, data, data_size);
  +   } else {
  +  drm_intel_bo_subdata(cache->bo, item->offset, data_size, data);
  +   }
   
  *out_offset = item->offset;
  *(void **)out_aux = (void *)((char *)item->key + item->key_size);
  @@ -342,6 +358,8 @@ brw_init_caches(struct brw_context *brw)
  cache->bo = drm_intel_bo_alloc(brw->bufmgr,
"program cache",
4096, 64);
  +   if (brw->has_llc)
  +  drm_intel_gem_bo_map_unsynchronized(cache->bo);
   
  cache->aux_compare[BRW_VS_PROG] = brw_vs_prog_data_compare;
  cache->aux_compare[BRW_GS_PROG] = brw_gs_prog_data_compare;
  @@ -408,6 +426,8 @@ brw_destroy_cache(struct brw_context *brw, struct brw_cache *cache)
   
  DBG("%s\n", __FUNCTION__);
   
  +   if (brw->has_llc)
  +  drm_intel_bo_unmap(cache->bo);
  drm_intel_bo_unreference(cache->bo);
  cache->bo = NULL;
  brw_clear_cache(brw, cache);
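
The mapping lifetime described in the reply (map the new buffer once, keep it mapped, copy old contents only if there are any, then release the old buffer) can be sketched without any driver code. This is a toy model only: plain malloc'd memory stands in for a persistently mapped buffer object, and every name here (toy_cache and its functions) is hypothetical rather than part of i965 or libdrm.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an append-only program cache. */
struct toy_cache {
   char *map;           /* stays "mapped" for the buffer's whole lifetime */
   size_t size;         /* capacity of the current buffer */
   size_t next_offset;  /* append cursor; existing data is never rewritten */
};

/* Allocate the first buffer; returns 0 on success, -1 on failure. */
static int toy_cache_init(struct toy_cache *c, size_t size)
{
   c->map = malloc(size);
   c->size = size;
   c->next_offset = 0;
   return c->map ? 0 : -1;
}

/* Replace the buffer with a larger one, preserving existing contents. */
static int toy_cache_grow(struct toy_cache *c, size_t new_size)
{
   /* The new buffer is "mapped" unconditionally, even when empty. */
   char *new_map = malloc(new_size);
   if (!new_map)
      return -1;

   /* Copy only if there is existing data to save. */
   if (c->next_offset != 0)
      memcpy(new_map, c->map, c->next_offset);

   /* Release the old buffer before switching over. */
   free(c->map);
   c->map = new_map;
   c->size = new_size;
   return 0;
}

/* Append data; returns the offset it was written at, or -1 on failure. */
static long toy_cache_append(struct toy_cache *c, const void *data, size_t len)
{
   while (c->next_offset + len > c->size) {
      size_t new_size = c->size ? c->size * 2 : 64;
      if (toy_cache_grow(c, new_size) != 0)
         return -1;
   }

   size_t offset = c->next_offset;
   memcpy(c->map + offset, data, len);
   c->next_offset += len;
   return (long) offset;
}

/* Free the buffer and reset the cache to an empty state. */
static void toy_cache_destroy(struct toy_cache *c)
{
   free(c->map);
   c->map = NULL;
   c->size = 0;
   c->next_offset = 0;
}
```

If the allocation in toy_cache_grow were moved inside the next_offset check, an empty cache would never get a buffer at all, which is the same objection raised above about the initial mapping.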
 



Re: [Mesa-dev] [PATCH 4/5] i965/fs: Don't invalidate live intervals in saturate propagation.

2014-09-26 Thread Jason Ekstrand
Patches 2-4 are
Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com

I'll have to think more about patch 1

On Mon, Sep 8, 2014 at 12:21 PM, Matt Turner matts...@gmail.com wrote:

 ---
  src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
 b/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
 index 6f7fb6c..347a78e 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp
 @@ -95,8 +95,7 @@ fs_visitor::opt_saturate_propagation()
progress = opt_saturate_propagation_local(this, block) || progress;
 }

 -   if (progress)
 -  invalidate_live_intervals();
 +   /* Live intervals are still valid. */

 return progress;
  }
 --
 1.8.5.5



Re: [Mesa-dev] [PATCH 3/4] i965: Issue performance warnings for program cache related stalls.

2014-09-26 Thread Kenneth Graunke
On Friday, September 26, 2014 04:41:14 PM Chris Wilson wrote:
 On Fri, Sep 26, 2014 at 08:36:39AM -0700, Kristian Høgsberg wrote:
  On Fri, Aug 29, 2014 at 11:10:49PM -0700, Kenneth Graunke wrote:
   We don't really want extra buffer copying or stalls when mapping,
   so it'd be nice to know when it's happening.
   
   Signed-off-by: Kenneth Graunke kenn...@whitecape.org
  
  Reviewed-by: Kristian Høgsberg k...@bitplanet.net
 
  This warns if the program cache is currently being read by the GPU
  (expected), but a read-read (as used here) does not incur a stall.
 -Chris

Good catch!  Since we're doing a read-only mapping, and all of our relocations 
to this buffer have 0 for the write domains, GEM knows that nobody is altering 
it, so there shouldn't be a stall.  Even though i915_gem_set_domain_ioctl calls 
i915_gem_object_wait_rendering__nonblocking, it shouldn't actually wait.

Thanks for spotting this.  I'll drop this hunk.

I suppose this is a problem with my stall-warning code in general... 
drm_intel_bo_busy() == true does not necessarily imply that there will be a 
stall when mapping it.  I hadn't considered that.

It sounds like patch 4 (using unsynchronized mappings) is still useful though, 
as drm_intel_bo_subdata/pwrite doesn't know that it's safe to let the CPU write 
the buffer even while the GPU is reading it.

--Ken



Re: [Mesa-dev] [PATCH 1/2] mesa: remove last DJGPP remains

2014-09-26 Thread Ian Romanick
And I was just going to start working on the Mesa software rasterizer
for DOS.  Oh well.

Reviewed-by: Ian Romanick ian.d.roman...@intel.com




Re: [Mesa-dev] [PATCH 2/2] Remove Bluegene/L wrappers

2014-09-26 Thread Ian Romanick
On 09/26/2014 02:14 PM, Emil Velikov wrote:
 Added back in 2009, with osmesa/GLU in mind. Unlikely to be working
 any more since the removal of the static makefiles.
 
 Cc: Brian Paul bri...@vmware.com
 Signed-off-by: Emil Velikov emil.l.veli...@gmail.com

Reviewed-by: Ian Romanick ian.d.roman...@intel.com

In dlopen.h, the code will be the same... the defined(__blrts) paths are
the same as the last #else paths.




[Mesa-dev] [PATCH 3/5] i965: Use 1ull instead of 1 in BRW_NEW_* defines.

2014-09-26 Thread Kenneth Graunke
Now that the bitfield is a uint64_t, we should use 1ull.  Currently, we
only have 32 entries, so 1 works fine, but it's not future-proof.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_context.h | 64 -
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 3efd582..317724f 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -185,43 +185,43 @@ enum brw_state_id {
BRW_NUM_STATE_BITS
 };
 
-#define BRW_NEW_URB_FENCE                (1 << BRW_STATE_URB_FENCE)
-#define BRW_NEW_FRAGMENT_PROGRAM         (1 << BRW_STATE_FRAGMENT_PROGRAM)
-#define BRW_NEW_GEOMETRY_PROGRAM         (1 << BRW_STATE_GEOMETRY_PROGRAM)
-#define BRW_NEW_VERTEX_PROGRAM           (1 << BRW_STATE_VERTEX_PROGRAM)
-#define BRW_NEW_CURBE_OFFSETS            (1 << BRW_STATE_CURBE_OFFSETS)
-#define BRW_NEW_REDUCED_PRIMITIVE        (1 << BRW_STATE_REDUCED_PRIMITIVE)
-#define BRW_NEW_PRIMITIVE                (1 << BRW_STATE_PRIMITIVE)
-#define BRW_NEW_CONTEXT                  (1 << BRW_STATE_CONTEXT)
-#define BRW_NEW_PSP                      (1 << BRW_STATE_PSP)
-#define BRW_NEW_SURFACES                 (1 << BRW_STATE_SURFACES)
-#define BRW_NEW_VS_BINDING_TABLE         (1 << BRW_STATE_VS_BINDING_TABLE)
-#define BRW_NEW_GS_BINDING_TABLE         (1 << BRW_STATE_GS_BINDING_TABLE)
-#define BRW_NEW_PS_BINDING_TABLE         (1 << BRW_STATE_PS_BINDING_TABLE)
-#define BRW_NEW_INDICES                  (1 << BRW_STATE_INDICES)
-#define BRW_NEW_VERTICES                 (1 << BRW_STATE_VERTICES)
+#define BRW_NEW_URB_FENCE                (1ull << BRW_STATE_URB_FENCE)
+#define BRW_NEW_FRAGMENT_PROGRAM         (1ull << BRW_STATE_FRAGMENT_PROGRAM)
+#define BRW_NEW_GEOMETRY_PROGRAM         (1ull << BRW_STATE_GEOMETRY_PROGRAM)
+#define BRW_NEW_VERTEX_PROGRAM           (1ull << BRW_STATE_VERTEX_PROGRAM)
+#define BRW_NEW_CURBE_OFFSETS            (1ull << BRW_STATE_CURBE_OFFSETS)
+#define BRW_NEW_REDUCED_PRIMITIVE        (1ull << BRW_STATE_REDUCED_PRIMITIVE)
+#define BRW_NEW_PRIMITIVE                (1ull << BRW_STATE_PRIMITIVE)
+#define BRW_NEW_CONTEXT                  (1ull << BRW_STATE_CONTEXT)
+#define BRW_NEW_PSP                      (1ull << BRW_STATE_PSP)
+#define BRW_NEW_SURFACES                 (1ull << BRW_STATE_SURFACES)
+#define BRW_NEW_VS_BINDING_TABLE         (1ull << BRW_STATE_VS_BINDING_TABLE)
+#define BRW_NEW_GS_BINDING_TABLE         (1ull << BRW_STATE_GS_BINDING_TABLE)
+#define BRW_NEW_PS_BINDING_TABLE         (1ull << BRW_STATE_PS_BINDING_TABLE)
+#define BRW_NEW_INDICES                  (1ull << BRW_STATE_INDICES)
+#define BRW_NEW_VERTICES                 (1ull << BRW_STATE_VERTICES)
 /**
  * Used for any batch entry with a relocated pointer that will be used
  * by any 3D rendering.
  */
-#define BRW_NEW_BATCH                    (1 << BRW_STATE_BATCH)
+#define BRW_NEW_BATCH                    (1ull << BRW_STATE_BATCH)
 /** \see brw.state.depth_region */
-#define BRW_NEW_INDEX_BUFFER             (1 << BRW_STATE_INDEX_BUFFER)
-#define BRW_NEW_VS_CONSTBUF              (1 << BRW_STATE_VS_CONSTBUF)
-#define BRW_NEW_GS_CONSTBUF              (1 << BRW_STATE_GS_CONSTBUF)
-#define BRW_NEW_PROGRAM_CACHE            (1 << BRW_STATE_PROGRAM_CACHE)
-#define BRW_NEW_STATE_BASE_ADDRESS       (1 << BRW_STATE_STATE_BASE_ADDRESS)
-#define BRW_NEW_VUE_MAP_VS               (1 << BRW_STATE_VUE_MAP_VS)
-#define BRW_NEW_VUE_MAP_GEOM_OUT         (1 << BRW_STATE_VUE_MAP_GEOM_OUT)
-#define BRW_NEW_TRANSFORM_FEEDBACK       (1 << BRW_STATE_TRANSFORM_FEEDBACK)
-#define BRW_NEW_RASTERIZER_DISCARD       (1 << BRW_STATE_RASTERIZER_DISCARD)
-#define BRW_NEW_STATS_WM                 (1 << BRW_STATE_STATS_WM)
-#define BRW_NEW_UNIFORM_BUFFER           (1 << BRW_STATE_UNIFORM_BUFFER)
-#define BRW_NEW_ATOMIC_BUFFER            (1 << BRW_STATE_ATOMIC_BUFFER)
-#define BRW_NEW_META_IN_PROGRESS         (1 << BRW_STATE_META_IN_PROGRESS)
-#define BRW_NEW_INTERPOLATION_MAP        (1 << BRW_STATE_INTERPOLATION_MAP)
-#define BRW_NEW_PUSH_CONSTANT_ALLOCATION (1 << BRW_STATE_PUSH_CONSTANT_ALLOCATION)
-#define BRW_NEW_NUM_SAMPLES              (1 << BRW_STATE_NUM_SAMPLES)
+#define BRW_NEW_INDEX_BUFFER             (1ull << BRW_STATE_INDEX_BUFFER)
+#define BRW_NEW_VS_CONSTBUF              (1ull << BRW_STATE_VS_CONSTBUF)
+#define BRW_NEW_GS_CONSTBUF              (1ull << BRW_STATE_GS_CONSTBUF)
+#define BRW_NEW_PROGRAM_CACHE            (1ull << BRW_STATE_PROGRAM_CACHE)
+#define BRW_NEW_STATE_BASE_ADDRESS       (1ull << BRW_STATE_STATE_BASE_ADDRESS)
+#define BRW_NEW_VUE_MAP_VS               (1ull << BRW_STATE_VUE_MAP_VS)
+#define BRW_NEW_VUE_MAP_GEOM_OUT         (1ull << BRW_STATE_VUE_MAP_GEOM_OUT)
+#define BRW_NEW_TRANSFORM_FEEDBACK       (1ull << BRW_STATE_TRANSFORM_FEEDBACK)
+#define BRW_NEW_RASTERIZER_DISCARD       (1ull << BRW_STATE_RASTERIZER_DISCARD)
+#define BRW_NEW_STATS_WM                 (1ull << BRW_STATE_STATS_WM)
+#define BRW_NEW_UNIFORM_BUFFER           (1ull << BRW_STATE_UNIFORM_BUFFER)
+#define BRW_NEW_ATOMIC_BUFFER   (1ull  
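
The reason for the 1ull suffix can be shown with a small stand-alone sketch.
It is not taken from the patch: the enum, macro and function names below are
hypothetical stand-ins for enum brw_state_id and the BRW_NEW_* defines. With a
plain int constant, a shift by 32 or more is undefined where int is 32 bits
wide, whereas an unsigned long long operand makes shifts up to 63 well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state IDs, standing in for enum brw_state_id. */
enum example_state_id {
   EXAMPLE_STATE_FIRST = 0,
   EXAMPLE_STATE_BIT_31 = 31,
   EXAMPLE_STATE_BIT_32 = 32,   /* first ID that no longer fits in 32 bits */
   EXAMPLE_NUM_STATE_BITS
};

/* 1ull is at least 64 bits wide, so shifting it by 32..63 is well defined.
 * Writing (1 << (id)) instead would shift a 32-bit int on common targets.
 */
#define EXAMPLE_NEW(id) (1ull << (id))

/* Return the 64-bit dirty flag for one state ID. */
static inline uint64_t
example_flag(enum example_state_id id)
{
   assert(id < EXAMPLE_NUM_STATE_BITS);
   return EXAMPLE_NEW(id);
}
```

Calling example_flag(EXAMPLE_STATE_BIT_32) yields a value with only bit 32 set,
which is the case the commit message describes as not yet needed but worth
being ready for.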

[Mesa-dev] [PATCH 5/5] i965: Drop brwBindProgram driver hook.

2014-09-26 Thread Kenneth Graunke
This function flagged BRW_NEW_*_PROGRAM

When ctx->{Vertex,Geometry,Fragment}Program._Current changes, core Mesa
calls the BindProgram driver hook, which flagged BRW_NEW_*_PROGRAM.

However, brw_upload_state also checks for that changing, sets the same
flags, and also updates brw->fragment_program and so on.  So, this looks
to be entirely redundant.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_program.c | 20 
 1 file changed, 20 deletions(-)

Tested with Piglit and a manual inspection of an apitrace of Shadowrun Returns,
which uses a variety of ARB programs.

diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
b/src/mesa/drivers/dri/i965/brw_program.c
index d782b4f..b37da4e 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -54,25 +54,6 @@ get_new_program_id(struct intel_screen *screen)
return id;
 }
 
-static void brwBindProgram( struct gl_context *ctx,
-   GLenum target,
-   struct gl_program *prog )
-{
-   struct brw_context *brw = brw_context(ctx);
-
-   switch (target) {
-   case GL_VERTEX_PROGRAM_ARB:
-  brw->state.dirty.brw |= BRW_NEW_VERTEX_PROGRAM;
-  break;
-   case MESA_GEOMETRY_PROGRAM:
-  brw->state.dirty.brw |= BRW_NEW_GEOMETRY_PROGRAM;
-  break;
-   case GL_FRAGMENT_PROGRAM_ARB:
-  brw->state.dirty.brw |= BRW_NEW_FRAGMENT_PROGRAM;
-  break;
-   }
-}
-
 static struct gl_program *brwNewProgram( struct gl_context *ctx,
  GLenum target,
  GLuint id )
@@ -250,7 +231,6 @@ void brwInitFragProgFuncs( struct dd_function_table *functions )
 {
    assert(functions->ProgramStringNotify == _tnl_program_string);
 
-   functions->BindProgram = brwBindProgram;
    functions->NewProgram = brwNewProgram;
    functions->DeleteProgram = brwDeleteProgram;
    functions->IsProgramNative = brwIsProgramNative;
-- 
2.1.0
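
The redundancy argument above can be sketched as follows. This is only an
illustration under assumptions, not the driver's code: struct example_brw,
EXAMPLE_NEW_FRAGMENT_PROGRAM and example_check_fragment_program are
hypothetical names. The idea is that a check at state-upload time, which
compares the cached program pointer against the current one and flags the
dirty bit itself, makes a separate bind-time hook that sets the same bit
unnecessary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dirty bit, standing in for BRW_NEW_FRAGMENT_PROGRAM. */
#define EXAMPLE_NEW_FRAGMENT_PROGRAM (1ull << 1)

/* Hypothetical, heavily reduced driver context. */
struct example_brw {
   const void *fragment_program;  /* program seen at the last upload */
   uint64_t dirty;                /* accumulated dirty flags */
};

/* Upload-time check: flag the state only when the current program differs
 * from the one remembered from the previous upload, then remember it.
 */
static inline void
example_check_fragment_program(struct example_brw *brw, const void *current)
{
   assert(brw != NULL);
   if (brw->fragment_program != current) {
      brw->fragment_program = current;
      brw->dirty |= EXAMPLE_NEW_FRAGMENT_PROGRAM;
   }
}
```

Because the comparison runs on every upload, binding the same program twice
flags nothing, and binding a different one is always caught.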



[Mesa-dev] [PATCH 1/5] i965: Delete CACHE_NEW_BLORP_CONST_COLOR_PROG.

2014-09-26 Thread Kenneth Graunke
Unused since krh rewrote fast clears to use meta.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_context.h  | 2 --
 src/mesa/drivers/dri/i965/brw_state_upload.c | 1 -
 2 files changed, 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 377853e..3efd582 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -685,7 +685,6 @@ enum brw_cache_id {
BRW_CC_UNIT,
BRW_WM_PROG,
BRW_BLORP_BLIT_PROG,
-   BRW_BLORP_CONST_COLOR_PROG,
BRW_SAMPLER,
BRW_WM_UNIT,
BRW_SF_PROG,
@@ -780,7 +779,6 @@ enum shader_time_shader_type {
 #define CACHE_NEW_CC_UNIT                (1<<BRW_CC_UNIT)
 #define CACHE_NEW_WM_PROG                (1<<BRW_WM_PROG)
 #define CACHE_NEW_BLORP_BLIT_PROG        (1<<BRW_BLORP_BLIT_PROG)
-#define CACHE_NEW_BLORP_CONST_COLOR_PROG (1<<BRW_BLORP_CONST_COLOR_PROG)
 #define CACHE_NEW_SAMPLER                (1<<BRW_SAMPLER)
 #define CACHE_NEW_WM_UNIT                (1<<BRW_WM_UNIT)
 #define CACHE_NEW_SF_PROG                (1<<BRW_SF_PROG)
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
b/src/mesa/drivers/dri/i965/brw_state_upload.c
index dd0ceb6..f4b0475 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -520,7 +520,6 @@ static struct dirty_bit_map cache_bits[] = {
DEFINE_BIT(CACHE_NEW_CC_UNIT),
DEFINE_BIT(CACHE_NEW_WM_PROG),
DEFINE_BIT(CACHE_NEW_BLORP_BLIT_PROG),
-   DEFINE_BIT(CACHE_NEW_BLORP_CONST_COLOR_PROG),
DEFINE_BIT(CACHE_NEW_SAMPLER),
DEFINE_BIT(CACHE_NEW_WM_UNIT),
DEFINE_BIT(CACHE_NEW_SF_PROG),
-- 
2.1.0



[Mesa-dev] [PATCH 2/5] i965: Update dirty_bit_map::bit to be a uint64_t.

2014-09-26 Thread Kenneth Graunke
This will keep INTEL_DEBUG=state working when we add BRW_NEW_* bits
beyond 1 << 31.  We missed doing this when widening the driver flags
from uint32_t to uint64_t.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_state_upload.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
b/src/mesa/drivers/dri/i965/brw_state_upload.c
index f4b0475..b2d1bdf 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -438,7 +438,7 @@ static void xor_states( struct brw_state_flags *result,
 }
 
 struct dirty_bit_map {
-   uint32_t bit;
+   uint64_t bit;
char *name;
uint32_t count;
 };
@@ -560,7 +560,7 @@ brw_print_dirty_count(struct dirty_bit_map *bit_map)
   if (bit_map[i].bit == 0)
 return;
 
-  fprintf(stderr, "0x%08x: %12d (%s)\n",
+  fprintf(stderr, "0x%08lx: %12d (%s)\n",
  bit_map[i].bit, bit_map[i].count, bit_map[i].name);
}
 }
-- 
2.1.0
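
A side note on the format string: %lx matches uint64_t only where long is 64
bits wide. A portable alternative, which is not what the patch does, is the
PRIx64 macro from <inttypes.h>. The sketch below assumes a hypothetical helper
name, example_format_bit, and formats into a caller-supplied buffer so the
result is easy to inspect.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit flag as "0x" followed by 16 hex digits.
 * PRIx64 expands to the correct length modifier for uint64_t on the target,
 * so the call is valid whether long is 32 or 64 bits wide.
 * Returns snprintf's result (the number of characters needed).
 */
static inline int
example_format_bit(char *buf, size_t size, uint64_t bit)
{
   assert(buf != NULL);
   return snprintf(buf, size, "0x%016" PRIx64, bit);
}
```

For instance, formatting 1ull << 32 into a 32-byte buffer produces an 18
character string with the single set bit in the upper half.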



[Mesa-dev] [PATCH 4/5] i965: Add missing /* BRW_NEW_FRAGMENT_PROGRAM */ comments.

2014-09-26 Thread Kenneth Graunke
I had to dig a bit to figure out why this was necessary.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/gen6_sf_state.c | 5 +++--
 src/mesa/drivers/dri/i965/gen7_sf_state.c | 4 ++--
 src/mesa/drivers/dri/i965/gen8_sf_state.c | 4 ++--
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c 
b/src/mesa/drivers/dri/i965/gen6_sf_state.c
index 843507e..d0411b0 100644
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
@@ -155,6 +155,7 @@ calculate_attr_overrides(const struct brw_context *brw,
memset(attr_overrides, 0, 16*sizeof(*attr_overrides));
 
    for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
+      /* BRW_NEW_FRAGMENT_PROGRAM */
       enum glsl_interp_qualifier interp_qualifier =
          brw->fragment_program->InterpQualifier[attr];
       bool is_gl_Color = attr == VARYING_SLOT_COL0 || attr == VARYING_SLOT_COL1;
@@ -369,8 +370,8 @@ upload_sf_state(struct brw_context *brw)
          (1 << GEN6_SF_TRIFAN_PROVOKE_SHIFT);
}
 
-   /* BRW_NEW_VUE_MAP_GEOM_OUT | _NEW_POINT | _NEW_LIGHT | _NEW_PROGRAM |
-* CACHE_NEW_WM_PROG
+   /* BRW_NEW_VUE_MAP_GEOM_OUT | BRW_NEW_FRAGMENT_PROGRAM |
+* _NEW_POINT | _NEW_LIGHT | _NEW_PROGRAM | CACHE_NEW_WM_PROG
 */
uint32_t urb_entry_read_length;
calculate_attr_overrides(brw, attr_overrides, point_sprite_enables,
diff --git a/src/mesa/drivers/dri/i965/gen7_sf_state.c 
b/src/mesa/drivers/dri/i965/gen7_sf_state.c
index 4badc82..67e4448 100644
--- a/src/mesa/drivers/dri/i965/gen7_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sf_state.c
@@ -60,8 +60,8 @@ upload_sbe_state(struct brw_context *brw)
}
dw1 |= point_sprite_origin;
 
-   /* BRW_NEW_VUE_MAP_GEOM_OUT | _NEW_POINT | _NEW_LIGHT | _NEW_PROGRAM |
-* CACHE_NEW_WM_PROG
+   /* BRW_NEW_VUE_MAP_GEOM_OUT | BRW_NEW_FRAGMENT_PROGRAM
+* _NEW_POINT | _NEW_LIGHT | _NEW_PROGRAM | CACHE_NEW_WM_PROG
 */
uint32_t urb_entry_read_length;
calculate_attr_overrides(brw, attr_overrides, point_sprite_enables,
diff --git a/src/mesa/drivers/dri/i965/gen8_sf_state.c 
b/src/mesa/drivers/dri/i965/gen8_sf_state.c
index 4263eaf..555e6a8 100644
--- a/src/mesa/drivers/dri/i965/gen8_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_sf_state.c
@@ -57,8 +57,8 @@ upload_sbe(struct brw_context *brw)
else
   dw1 |= GEN6_SF_POINT_SPRITE_UPPERLEFT;
 
-   /* BRW_NEW_VUE_MAP_GEOM_OUT | _NEW_POINT | _NEW_LIGHT | _NEW_PROGRAM |
-* CACHE_NEW_WM_PROG
+   /* BRW_NEW_VUE_MAP_GEOM_OUT | BRW_NEW_FRAGMENT_PROGRAM |
+* _NEW_POINT | _NEW_LIGHT | _NEW_PROGRAM | CACHE_NEW_WM_PROG
 */
calculate_attr_overrides(brw, attr_overrides,
 point_sprite_enables,
-- 
2.1.0



[Mesa-dev] [PATCH] i965/fs: Recalculate cfg in emit_curb_setup

2014-09-26 Thread Jason Ekstrand
Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index ffe8ba8..95af5ab 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1477,6 +1477,8 @@ fs_visitor::assign_curb_setup()
 
    prog_data->curb_read_length = ALIGN(stage_prog_data->nr_params, 8) / 8;
 
+   calculate_cfg();
+
    /* Map the offsets in the UNIFORM file to fixed HW regs. */
    foreach_block_and_inst(block, fs_inst, inst, cfg) {
       for (unsigned int i = 0; i < inst->sources; i++) {
-- 
2.1.0



Re: [Mesa-dev] [PATCH 4/4] i965: Use unsynchronized maps for the program cache on LLC platforms.

2014-09-26 Thread Kristian Høgsberg
On Fri, Sep 26, 2014 at 2:21 PM, Kenneth Graunke kenn...@whitecape.org wrote:
 On Friday, September 26, 2014 09:22:31 AM Kristian Høgsberg wrote:
 On Fri, Aug 29, 2014 at 11:10:50PM -0700, Kenneth Graunke wrote:
  There's no reason to stall on pwrite - the CPU always appends to the
  buffer and never modifies existing contents, and the GPU never writes
  it.  Further, the CPU always appends new data before submitting a batch
  that requires it.
 
  This code predates the unsynchronized mapping feature, so we simply
  didn't have the option when it was written.
 
  Ideally, we would do this for non-LLC platforms too, but unsynchronized
  mapping support only exists for LLC systems.
 
  Saves repeated 0.001ms stalls on program upload.
 
  Signed-off-by: Kenneth Graunke kenn...@whitecape.org
  ---
   src/mesa/drivers/dri/i965/brw_state_cache.c | 34 
  +++--
   1 file changed, 27 insertions(+), 7 deletions(-)
 
  diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c 
  b/src/mesa/drivers/dri/i965/brw_state_cache.c
  index b9bb0fc..1d2d32f 100644
  --- a/src/mesa/drivers/dri/i965/brw_state_cache.c
  +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
  @@ -172,14 +172,23 @@ brw_cache_new_bo(struct brw_cache *cache, uint32_t new_size)
  drm_intel_bo *new_bo;
 
  new_bo = drm_intel_bo_alloc(brw->bufmgr, "program cache", new_size, 64);
  +   if (brw->has_llc)
  +  drm_intel_gem_bo_map_unsynchronized(new_bo);
 
  /* Copy any existing data that needs to be saved. */
  if (cache->next_offset != 0) {
  -  brw_bo_map(brw, cache->bo, false, "program cache");
  -  drm_intel_bo_subdata(new_bo, 0, cache->next_offset, cache->bo->virtual);
  -  drm_intel_bo_unmap(cache->bo);
  +  if (brw->has_llc) {
  + memcpy(new_bo->virtual, cache->bo->virtual, cache->next_offset);

 Move the drm_intel_gem_bo_map_unsynchronized() and drm_intel_bo_unmap()
 calls into this block so they bracket the memcpy as for the subdata case
 below?

 Other than that,

 Reviewed-by: Kristian Høgsberg k...@bitplanet.net

 That won't work: the point is to map new_bo and leave it mapped, and to unmap
 the old BO before throwing it away.  If I moved the map call into the
 if (cache->next_offset != 0) block, then the initial mapping would never occur.

Yup, that makes sense.

Kristian


  +  } else {
  + brw_bo_map(brw, cache->bo, false, "program cache");
  + drm_intel_bo_subdata(new_bo, 0, cache->next_offset,
  +  cache->bo->virtual);
  + drm_intel_bo_unmap(cache->bo);
  +  }
  }
 
  +   if (brw->has_llc)
  +  drm_intel_bo_unmap(cache->bo);
  drm_intel_bo_unreference(cache->bo);
  cache->bo = new_bo;
  cache->bo_used_by_gpu = false;
  @@ -222,9 +231,11 @@ brw_try_upload_using_copy(struct brw_cache *cache,
  continue;
   }
 
  -brw_bo_map(brw, cache->bo, false, "program cache");
  + if (!brw->has_llc)
  +brw_bo_map(brw, cache->bo, false, "program cache");
   ret = memcmp(cache->bo->virtual + item->offset, data, item->size);
  -drm_intel_bo_unmap(cache->bo);
  + if (!brw->has_llc)
  +drm_intel_bo_unmap(cache->bo);
   if (ret)
  continue;
 
  @@ -257,7 +268,7 @@ brw_upload_item_data(struct brw_cache *cache,
  /* If we would block on writing to an in-use program BO, just
   * recreate it.
   */
  -   if (cache->bo_used_by_gpu) {
  +   if (!brw->has_llc && cache->bo_used_by_gpu) {
     perf_debug("Copying busy program cache buffer.\n");
     brw_cache_new_bo(cache, cache->bo->size);
  }
  @@ -280,6 +291,7 @@ brw_upload_cache(struct brw_cache *cache,
   uint32_t *out_offset,
   void *out_aux)
   {
  +   struct brw_context *brw = cache->brw;
  struct brw_cache_item *item = CALLOC_STRUCT(brw_cache_item);
  GLuint hash;
  void *tmp;
  @@ -320,7 +332,11 @@ brw_upload_cache(struct brw_cache *cache,
  cache->n_items++;
 
  /* Copy data to the buffer */
  -   drm_intel_bo_subdata(cache->bo, item->offset, data_size, data);
  +   if (brw->has_llc) {
  +  memcpy((char *) cache->bo->virtual + item->offset, data, data_size);
  +   } else {
  +  drm_intel_bo_subdata(cache->bo, item->offset, data_size, data);
  +   }
 
  *out_offset = item->offset;
  *(void **)out_aux = (void *)((char *)item->key + item->key_size);
  @@ -342,6 +358,8 @@ brw_init_caches(struct brw_context *brw)
  cache->bo = drm_intel_bo_alloc(brw->bufmgr,
"program cache",
4096, 64);
  +   if (brw->has_llc)
  +  drm_intel_gem_bo_map_unsynchronized(cache->bo);
 
  cache->aux_compare[BRW_VS_PROG] = brw_vs_prog_data_compare;
  cache->aux_compare[BRW_GS_PROG] = brw_gs_prog_data_compare;
  @@ -408,6 +426,8 @@ brw_destroy_cache(struct brw_context *brw, struct brw_cache *cache)
 
  DBG("%s\n", __FUNCTION__);
 
  +   if (brw->has_llc)
  +  drm_intel_bo_unmap(cache->bo);
  

[Mesa-dev] [PATCH 2/2] mesa: Avoid flagging _NEW_VIEWPORT on redundant viewport updates.

2014-09-26 Thread Kenneth Graunke
Cuts the number of i965 color calculator viewport uploads by 100x
(11017983 -> 113385) in 'x11perf -gc' with Glamor in Xephyr.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/main/viewport.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/main/viewport.c b/src/mesa/main/viewport.c
index 6545bf6..222ae30 100644
--- a/src/mesa/main/viewport.c
+++ b/src/mesa/main/viewport.c
@@ -58,6 +58,12 @@ set_viewport_no_notify(struct gl_context *ctx, unsigned idx,
 ctx->Const.ViewportBounds.Min, ctx->Const.ViewportBounds.Max);
    }
 
+   if (ctx->ViewportArray[idx].X == x &&
+       ctx->ViewportArray[idx].Width == width &&
+       ctx->ViewportArray[idx].Y == y &&
+       ctx->ViewportArray[idx].Height == height)
+      return;
+
    ctx->ViewportArray[idx].X = x;
    ctx->ViewportArray[idx].Width = width;
    ctx->ViewportArray[idx].Y = y;
-- 
2.1.0
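
The early return added by this patch follows a common pattern: compare the
incoming values with the stored ones and skip both the store and the dirty
flag when nothing changes. A minimal stand-alone sketch of that pattern is
given below. The struct layout, the EXAMPLE_NEW_VIEWPORT bit and the function
name are assumptions made for illustration and do not mirror Mesa's gl_context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dirty bit, standing in for _NEW_VIEWPORT. */
#define EXAMPLE_NEW_VIEWPORT (1u << 0)

struct example_viewport {
   float x, y, width, height;
};

/* Hypothetical, heavily reduced context. */
struct example_ctx {
   struct example_viewport vp;
   uint32_t new_state;   /* accumulated dirty flags */
};

/* Store a new viewport and flag it dirty, unless it equals the current one.
 * Returns true when the state actually changed.
 */
static inline bool
example_set_viewport(struct example_ctx *ctx,
                     float x, float y, float width, float height)
{
   assert(ctx != NULL);

   /* Redundant update: leave the state and the dirty flags untouched. */
   if (ctx->vp.x == x && ctx->vp.width == width &&
       ctx->vp.y == y && ctx->vp.height == height)
      return false;

   ctx->vp.x = x;
   ctx->vp.y = y;
   ctx->vp.width = width;
   ctx->vp.height = height;
   ctx->new_state |= EXAMPLE_NEW_VIEWPORT;
   return true;
}
```

A caller that sets the same 640x480 viewport every frame would flag the state
once and then skip every later call, which is the effect the x11perf numbers
in the commit message measure.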



[Mesa-dev] [PATCH 1/2] i965: Drop CACHE_NEW_VS_PROG from the gen7_sf_state atom.

2014-09-26 Thread Kenneth Graunke
I believe when I wrote this code, gen6_sf_state used CACHE_NEW_VS_PROG,
which has since been replaced by BRW_NEW_VUE_MAP_GEOM_OUT.  It's not
needed here anyway - only SBE needs it.  Just a copy and paste mistake.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/gen7_sf_state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_sf_state.c 
b/src/mesa/drivers/dri/i965/gen7_sf_state.c
index 67e4448..150a4d3 100644
--- a/src/mesa/drivers/dri/i965/gen7_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sf_state.c
@@ -254,7 +254,7 @@ const struct brw_tracked_state gen7_sf_state = {
_NEW_POINT |
 _NEW_MULTISAMPLE),
   .brw   = BRW_NEW_CONTEXT,
-  .cache = CACHE_NEW_VS_PROG
+  .cache = 0,
},
.emit = upload_sf_state,
 };
-- 
2.1.0



Re: [Mesa-dev] [PATCH] i965/fs: Recalculate cfg in emit_curb_setup

2014-09-26 Thread Matt Turner
On Fri, Sep 26, 2014 at 2:59 PM, Jason Ekstrand ja...@jlekstrand.net wrote:
 Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
 ---
  src/mesa/drivers/dri/i965/brw_fs.cpp | 2 ++
  1 file changed, 2 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs.cpp
 index ffe8ba8..95af5ab 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
 @@ -1477,6 +1477,8 @@ fs_visitor::assign_curb_setup()

     prog_data->curb_read_length = ALIGN(stage_prog_data->nr_params, 8) / 8;

 +   calculate_cfg();
 +
     /* Map the offsets in the UNIFORM file to fixed HW regs. */
     foreach_block_and_inst(block, fs_inst, inst, cfg) {
        for (unsigned int i = 0; i < inst->sources; i++) {
 --
 2.1.0

This shouldn't be necessary. We never invalidate the cfg after
calculating it the first time.

Something's wrong.


Re: [Mesa-dev] [PATCH 1/4] radeonsi/compute: directly emit CONTEXT_CONTROL

2014-09-26 Thread Marek Olšák
On Fri, Sep 26, 2014 at 3:04 PM, Alex Deucher alexdeuc...@gmail.com wrote:
 On Thu, Sep 25, 2014 at 3:02 PM, Tom Stellard t...@stellard.net wrote:
 On Mon, Sep 22, 2014 at 09:48:43PM +0200, Marek Olšák wrote:
 No, we cannot detect compute-only contexts yet. We need to add a new
 parameter to pipe_context::context_create which says that a context is
 compute-only. That should be OpenCL but not OpenGL.

 Also, some code paths like resource_copy_region use the graphics
 engine for copying, which cannot be used with compute rings and must
 be implemented with either DMA or compute-based blits. DMA isn't
 flexible enough, so some additional work for compute-based blits might
 be needed. We can also use the graphics ring for copying only and the
 compute ring for compute stuff.


 If possible, I think I would prefer continuing to use the graphic ring
 for blits and only submit compute specific packets to the compute ring.
 I'm a little concerned that adding a compute-flag to context create
 might make it harder to share code between compute and graphics, which
 I think is important.

 What are the downsides of using both rings at once?  Will we need to add
 synchronization code for the two rings?  I think the last time I
 looked into doing this, the biggest problem was that fences were
 submitted via the graphics ring even though they were meant for jobs
 on the compute ring.  Is there a good solution to this?

 It would be nice to not have any dependencies on the gfx ring.  That
 way compute jobs can run on the compute rings without requiring the
 gfx ring which should avoid any latency issues with desktop gfx jobs.

In that case we have to rewrite resource_copy_region and make it use
compute shaders only. The ideal time for that would be after
ARB_compute_shader (from GL4.3) has been implemented.

Marek


Re: [Mesa-dev] [PATCH 2/2] mesa: Avoid flagging _NEW_VIEWPORT on redundant viewport updates.

2014-09-26 Thread Brian Paul

On 09/26/2014 04:13 PM, Kenneth Graunke wrote:

Cuts the number of i965 color calculator viewport uploads by 100x
(11017983 -> 113385) in 'x11perf -gc' with Glamor in Xephyr.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
  src/mesa/main/viewport.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/src/mesa/main/viewport.c b/src/mesa/main/viewport.c
index 6545bf6..222ae30 100644
--- a/src/mesa/main/viewport.c
+++ b/src/mesa/main/viewport.c
@@ -58,6 +58,12 @@ set_viewport_no_notify(struct gl_context *ctx, unsigned idx,
  ctx->Const.ViewportBounds.Min, ctx->Const.ViewportBounds.Max);
    }

+   if (ctx->ViewportArray[idx].X == x &&
+       ctx->ViewportArray[idx].Width == width &&
+       ctx->ViewportArray[idx].Y == y &&
+       ctx->ViewportArray[idx].Height == height)
+      return;
+
    ctx->ViewportArray[idx].X = x;
    ctx->ViewportArray[idx].Width = width;
    ctx->ViewportArray[idx].Y = y;



Reviewed-by: Brian Paul bri...@vmware.com



[Mesa-dev] [PATCH v2 3/5] i965: Fix INTEL_DEBUG=state to work with 64-bit dirty bits.

2014-09-26 Thread Kenneth Graunke
This will keep INTEL_DEBUG=state working when we add BRW_NEW_* bits
beyond 1 << 31.  We missed doing this when widening the driver flags
from uint32_t to uint64_t.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_state_upload.c | 23 +++
 1 file changed, 7 insertions(+), 16 deletions(-)

NAK on i965: Update dirty_bit_map::bit to be a uint64_t.
It wasn't sufficient to keep this working.  I've now actually created
bits 32 and 33, and verified that they are counted and printed correctly.

diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
b/src/mesa/drivers/dri/i965/brw_state_upload.c
index f4b0475..e124ce4 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -438,7 +438,7 @@ static void xor_states( struct brw_state_flags *result,
 }
 
 struct dirty_bit_map {
-   uint32_t bit;
+   uint64_t bit;
char *name;
uint32_t count;
 };
@@ -475,7 +475,8 @@ static struct dirty_bit_map mesa_bits[] = {
DEFINE_BIT(_NEW_PROGRAM_CONSTANTS),
DEFINE_BIT(_NEW_BUFFER_OBJECT),
DEFINE_BIT(_NEW_FRAG_CLAMP),
-   DEFINE_BIT(_NEW_VARYING_VP_INPUTS),
+   /* Avoid sign extension problems. */
+   {(unsigned) _NEW_VARYING_VP_INPUTS, "_NEW_VARYING_VP_INPUTS", 0},
{0, 0, 0}
 };
 
@@ -538,14 +539,9 @@ static struct dirty_bit_map cache_bits[] = {
 
 
 static void
-brw_update_dirty_count(struct dirty_bit_map *bit_map, int32_t bits)
+brw_update_dirty_count(struct dirty_bit_map *bit_map, uint64_t bits)
 {
-   int i;
-
-   for (i = 0; i < 32; i++) {
-      if (bit_map[i].bit == 0)
-         return;
-
+   for (int i = 0; bit_map[i].bit != 0; i++) {
       if (bit_map[i].bit & bits)
          bit_map[i].count++;
    }
@@ -554,13 +550,8 @@ brw_update_dirty_count(struct dirty_bit_map *bit_map, int32_t bits)
 static void
 brw_print_dirty_count(struct dirty_bit_map *bit_map)
 {
-   int i;
-
-   for (i = 0; i < 32; i++) {
-      if (bit_map[i].bit == 0)
-         return;
-
-      fprintf(stderr, "0x%08x: %12d (%s)\n",
+   for (int i = 0; bit_map[i].bit != 0; i++) {
+      fprintf(stderr, "0x%016lx: %12d (%s)\n",
               bit_map[i].bit, bit_map[i].count, bit_map[i].name);
}
 }
-- 
2.1.0
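
The loop shape this revision switches to walks the table until it reaches the
entry whose bit is 0, rather than stopping after a fixed 32 entries. A reduced
sketch of that sentinel-terminated walk with 64-bit flags is shown below. The
struct and function names are hypothetical and only loosely modelled on
dirty_bit_map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one named 64-bit flag and a hit counter. */
struct example_bit_map {
   uint64_t bit;
   const char *name;
   uint32_t count;
};

/* Increment the counter of every table entry whose flag is set in `bits`.
 * The table must end with an entry whose bit is 0; that sentinel, not a
 * fixed entry count, terminates the loop, so flags above bit 31 are handled.
 */
static inline void
example_update_dirty_count(struct example_bit_map *map, uint64_t bits)
{
   assert(map != NULL);
   for (int i = 0; map[i].bit != 0; i++) {
      if (map[i].bit & bits)
         map[i].count++;
   }
}
```

Taking `bits` as uint64_t matters as much as the loop bound: a 32-bit
parameter would drop any flag above bit 31 before the comparison runs.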



[Mesa-dev] [PATCH 3.5/5] i965: Use ~0ull when flagging all BRW_NEW_* dirty flags.

2014-09-26 Thread Kenneth Graunke
~0 is 0xFFFFFFFF, which only covers the first 32 bits.  We need all 64.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_blorp.cpp  | 2 +-
 src/mesa/drivers/dri/i965/brw_state_cache.c  | 2 +-
 src/mesa/drivers/dri/i965/brw_state_upload.c | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

I think Jordan/Paul fixed this with macros, but we reverted that patch.
This fixes it in the minimal way; we can think about adding macros later.

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp.cpp
index 2c00bce..20ce7b7 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp
@@ -276,7 +276,7 @@ retry:
/* We've smashed all state compared to what the normal 3D pipeline
 * rendering tracks for GL.
 */
-   brw->state.dirty.brw = ~0;
+   brw->state.dirty.brw = ~0ull;
    brw->state.dirty.cache = ~0;
    brw->no_depth_or_stencil = false;
    brw->ib.type = -1;
diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c 
b/src/mesa/drivers/dri/i965/brw_state_cache.c
index 882d131..62e03b1 100644
--- a/src/mesa/drivers/dri/i965/brw_state_cache.c
+++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
@@ -379,7 +379,7 @@ brw_clear_cache(struct brw_context *brw, struct brw_cache *cache)
  * any offsets leftover in brw_context will no longer be valid.
  */
    brw->state.dirty.mesa |= ~0;
-   brw->state.dirty.brw |= ~0;
+   brw->state.dirty.brw |= ~0ull;
    brw->state.dirty.cache |= ~0;
intel_batchbuffer_flush(brw);
 }
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
b/src/mesa/drivers/dri/i965/brw_state_upload.c
index e124ce4..9e3cfb8 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -388,7 +388,7 @@ void brw_init_state( struct brw_context *brw )
brw_upload_initial_gpu_state(brw);
 
    brw->state.dirty.mesa = ~0;
-   brw->state.dirty.brw = ~0;
+   brw->state.dirty.brw = ~0ull;
 
    /* Make sure that brw->state.dirty.brw has enough bits to hold all possible
 * dirty flags.
@@ -575,7 +575,7 @@ void brw_upload_state(struct brw_context *brw)
if (0) {
   /* Always re-emit all state. */
       state->mesa |= ~0;
-      state->brw |= ~0;
+      state->brw |= ~0ull;
       state->cache |= ~0;
}
 
-- 
2.1.0
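
A small C sketch of how the different spellings convert when stored into a
64-bit field may help here. The function names are hypothetical. One point is
worth noting: under C's integer conversion rules the plain int expression ~0
(the value -1 on two's-complement targets) also becomes all ones when
converted to uint64_t. It is the unsigned 32-bit form ~0u that leaves the
upper half clear. Writing ~0ull states the intent explicitly instead of
relying on that conversion.

```c
#include <assert.h>
#include <stdint.h>

/* int -1 is converted to uint64_t modulo 2^64, so all 64 bits end up set. */
static inline uint64_t
example_all_from_int(void)
{
   return ~0;
}

/* Assuming a 32-bit unsigned int, ~0u is 0xFFFFFFFF and zero-extends,
 * so only the low 32 bits are set.
 */
static inline uint64_t
example_all_from_unsigned(void)
{
   assert(sizeof(unsigned) == 4);
   return ~0u;
}

/* The explicit 64-bit spelling: all 64 bits set, no conversion involved. */
static inline uint64_t
example_all_from_ull(void)
{
   return ~0ull;
}
```

The same reasoning is behind the "(unsigned)" cast on _NEW_VARYING_VP_INPUTS in
the v2 3/5 patch: a negative int widened to 64 bits sets the upper bits too,
which is unwanted for a single-flag table entry but harmless for an all-ones
mask.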



[Mesa-dev] [PATCH] i965/fs: Properly calculate the number of instructions in calculate_register_pressure

2014-09-26 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index b9bd94c..97b39e1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3400,7 +3400,9 @@ fs_visitor::calculate_register_pressure()
invalidate_live_intervals();
calculate_live_intervals();
 
-   unsigned num_instructions = instructions.length();
+   unsigned num_instructions = 0;
+   foreach_block(block, cfg)
+  num_instructions = block->instructions.length();
 
regs_live_at_ip = rzalloc_array(mem_ctx, int, num_instructions);
 
-- 
2.1.0



Re: [Mesa-dev] [PATCH] i965/fs: Properly calculate the number of instructions in calculate_register_pressure

2014-09-26 Thread Ilia Mirkin
On Fri, Sep 26, 2014 at 7:09 PM, Jason Ekstrand ja...@jlekstrand.net wrote:
 ---
  src/mesa/drivers/dri/i965/brw_fs.cpp | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs.cpp
 index b9bd94c..97b39e1 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
 @@ -3400,7 +3400,9 @@ fs_visitor::calculate_register_pressure()
 invalidate_live_intervals();
 calculate_live_intervals();

 -   unsigned num_instructions = instructions.length();
 +   unsigned num_instructions = 0;
 +   foreach_block(block, cfg)
 +  num_instructions = block->instructions.length();

This seems odd. Did you mean += perchance?


 regs_live_at_ip = rzalloc_array(mem_ctx, int, num_instructions);

 --
 2.1.0



Re: [Mesa-dev] [PATCH] i965/fs: Properly calculate the number of instructions in calculate_register_pressure

2014-09-26 Thread Matt Turner
On Fri, Sep 26, 2014 at 4:09 PM, Jason Ekstrand ja...@jlekstrand.net wrote:
 ---
  src/mesa/drivers/dri/i965/brw_fs.cpp | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs.cpp
 index b9bd94c..97b39e1 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
 @@ -3400,7 +3400,9 @@ fs_visitor::calculate_register_pressure()
 invalidate_live_intervals();
 calculate_live_intervals();

 -   unsigned num_instructions = instructions.length();
 +   unsigned num_instructions = 0;
 +   foreach_block(block, cfg)
 +  num_instructions = block->instructions.length();

+=


 regs_live_at_ip = rzalloc_array(mem_ctx, int, num_instructions);

 --
 2.1.0

Oh, yeah. Nice find.

Reviewed-by: Matt Turner matts...@gmail.com

We should get rid of the instructions member entirely to avoid (my)
mistakes like this.
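
The difference between `=` and `+=` in the loop above can be seen in a minimal sketch. `Block` and both helper functions are hypothetical stand-ins invented for illustration, not the i965 types: with plain assignment only the last block's count survives, while accumulation gives the total needed to size the array.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a basic block; only the instruction count matters.
struct Block {
    unsigned num_instructions;
};

// Plain assignment: each iteration overwrites the previous value,
// so the result is the count of the last block only.
unsigned count_with_assign(const std::vector<Block> &cfg) {
    unsigned n = 0;
    for (const Block &b : cfg)
        n = b.num_instructions;
    return n;
}

// Accumulation: the result is the total over all blocks.
unsigned count_with_accumulate(const std::vector<Block> &cfg) {
    unsigned n = 0;
    for (const Block &b : cfg)
        n += b.num_instructions;
    return n;
}
```

For blocks holding 3, 5 and 2 instructions, the first form yields 2 and the second yields 10, so an array sized from the first form would be too small.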


Re: [Mesa-dev] [PATCH 1/2] i965: Drop CACHE_NEW_VS_PROG from the gen7_sf_state atom.

2014-09-26 Thread Ian Romanick
Series is

Reviewed-by: Ian Romanick ian.d.roman...@intel.com

On 09/26/2014 03:13 PM, Kenneth Graunke wrote:
 I believe when I wrote this code, gen6_sf_state used CACHE_NEW_VS_PROG,
 which has since been replaced by BRW_NEW_VUE_MAP_GEOM_OUT.  It's not
 needed here anyway - only SBE needs it.  Just a copy and paste mistake.
 
 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/drivers/dri/i965/gen7_sf_state.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/src/mesa/drivers/dri/i965/gen7_sf_state.c 
 b/src/mesa/drivers/dri/i965/gen7_sf_state.c
 index 67e4448..150a4d3 100644
 --- a/src/mesa/drivers/dri/i965/gen7_sf_state.c
 +++ b/src/mesa/drivers/dri/i965/gen7_sf_state.c
 @@ -254,7 +254,7 @@ const struct brw_tracked_state gen7_sf_state = {
   _NEW_POINT |
  _NEW_MULTISAMPLE),
.brw   = BRW_NEW_CONTEXT,
 -  .cache = CACHE_NEW_VS_PROG
 +  .cache = 0,
 },
 .emit = upload_sf_state,
  };
 



Re: [Mesa-dev] [PATCH 1/5] i965: Delete CACHE_NEW_BLORP_CONST_COLOR_PROG.

2014-09-26 Thread Ian Romanick
Series is

Reviewed-by: Ian Romanick ian.d.roman...@intel.com

On 09/26/2014 02:53 PM, Kenneth Graunke wrote:
 Unused since krh rewrote fast clears to use meta.
 
 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/drivers/dri/i965/brw_context.h  | 2 --
  src/mesa/drivers/dri/i965/brw_state_upload.c | 1 -
  2 files changed, 3 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
 b/src/mesa/drivers/dri/i965/brw_context.h
 index 377853e..3efd582 100644
 --- a/src/mesa/drivers/dri/i965/brw_context.h
 +++ b/src/mesa/drivers/dri/i965/brw_context.h
 @@ -685,7 +685,6 @@ enum brw_cache_id {
 BRW_CC_UNIT,
 BRW_WM_PROG,
 BRW_BLORP_BLIT_PROG,
 -   BRW_BLORP_CONST_COLOR_PROG,
 BRW_SAMPLER,
 BRW_WM_UNIT,
 BRW_SF_PROG,
 @@ -780,7 +779,6 @@ enum shader_time_shader_type {
  #define CACHE_NEW_CC_UNIT (1<<BRW_CC_UNIT)
  #define CACHE_NEW_WM_PROG (1<<BRW_WM_PROG)
  #define CACHE_NEW_BLORP_BLIT_PROG (1<<BRW_BLORP_BLIT_PROG)
 -#define CACHE_NEW_BLORP_CONST_COLOR_PROG (1<<BRW_BLORP_CONST_COLOR_PROG)
  #define CACHE_NEW_SAMPLER (1<<BRW_SAMPLER)
  #define CACHE_NEW_WM_UNIT (1<<BRW_WM_UNIT)
  #define CACHE_NEW_SF_PROG (1<<BRW_SF_PROG)
 diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
 b/src/mesa/drivers/dri/i965/brw_state_upload.c
 index dd0ceb6..f4b0475 100644
 --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
 +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
 @@ -520,7 +520,6 @@ static struct dirty_bit_map cache_bits[] = {
 DEFINE_BIT(CACHE_NEW_CC_UNIT),
 DEFINE_BIT(CACHE_NEW_WM_PROG),
 DEFINE_BIT(CACHE_NEW_BLORP_BLIT_PROG),
 -   DEFINE_BIT(CACHE_NEW_BLORP_CONST_COLOR_PROG),
 DEFINE_BIT(CACHE_NEW_SAMPLER),
 DEFINE_BIT(CACHE_NEW_WM_UNIT),
 DEFINE_BIT(CACHE_NEW_SF_PROG),
 



Re: [Mesa-dev] [PATCH 3.5/5] i965: Use ~0ull when flagging all BRW_NEW_* dirty flags.

2014-09-26 Thread Matt Turner
On Fri, Sep 26, 2014 at 4:09 PM, Kenneth Graunke kenn...@whitecape.org wrote:
 ~0 is 0xFFFFFFFF, which only covers the first 32 bits.  We need all 64.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/drivers/dri/i965/brw_blorp.cpp  | 2 +-
  src/mesa/drivers/dri/i965/brw_state_cache.c  | 2 +-
  src/mesa/drivers/dri/i965/brw_state_upload.c | 4 ++--
  3 files changed, 4 insertions(+), 4 deletions(-)

 I think Jordan/Paul fixed this with macros, but we reverted that patch.
 This fixes it in the minimal way; we can think about adding macros later.

 diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp 
 b/src/mesa/drivers/dri/i965/brw_blorp.cpp
 index 2c00bce..20ce7b7 100644
 --- a/src/mesa/drivers/dri/i965/brw_blorp.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp
 @@ -276,7 +276,7 @@ retry:
 /* We've smashed all state compared to what the normal 3D pipeline
  * rendering tracks for GL.
  */
 -   brw->state.dirty.brw = ~0;
 +   brw->state.dirty.brw = ~0ull;
 brw->state.dirty.cache = ~0;
 brw->no_depth_or_stencil = false;
 brw->ib.type = -1;
 diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c 
 b/src/mesa/drivers/dri/i965/brw_state_cache.c
 index 882d131..62e03b1 100644
 --- a/src/mesa/drivers/dri/i965/brw_state_cache.c
 +++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
 @@ -379,7 +379,7 @@ brw_clear_cache(struct brw_context *brw, struct brw_cache 
 *cache)
  * any offsets leftover in brw_context will no longer be valid.
  */
 brw->state.dirty.mesa |= ~0;
 -   brw->state.dirty.brw |= ~0;
 +   brw->state.dirty.brw |= ~0ull;
 brw->state.dirty.cache |= ~0;
 intel_batchbuffer_flush(brw);
  }
 diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
 b/src/mesa/drivers/dri/i965/brw_state_upload.c
 index e124ce4..9e3cfb8 100644
 --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
 +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
 @@ -388,7 +388,7 @@ void brw_init_state( struct brw_context *brw )
 brw_upload_initial_gpu_state(brw);

 brw->state.dirty.mesa = ~0;
 -   brw->state.dirty.brw = ~0;
 +   brw->state.dirty.brw = ~0ull;

 /* Make sure that brw->state.dirty.brw has enough bits to hold all 
 possible
  * dirty flags.
 @@ -575,7 +575,7 @@ void brw_upload_state(struct brw_context *brw)
 if (0) {
/* Always re-emit all state. */
state->mesa |= ~0;
 -  state->brw |= ~0;
 +  state->brw |= ~0ull;
state->cache |= ~0;

Something stupid about ORing with a field-width set of 1s, but that's
how the code is.

Looks good to me. The whole series is

Reviewed-by: Matt Turner matts...@gmail.com
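
As a side note on literal widths, here is a small sketch of how three spellings of "all ones" behave when stored in a 64-bit word. The helper names are hypothetical, and the sketch assumes a platform where `int` and `unsigned` are 32 bits wide. In this sketch a plain `~0` is the signed int -1 and sign-extends on conversion, `~0u` zero-extends and leaves the upper half clear, and `~0ull` is 64 bits wide from the start, which states the intent without relying on conversion rules.

```cpp
#include <cassert>
#include <cstdint>

// Explicit 64-bit literal: every bit set, no conversion involved.
inline uint64_t all_ones_ull() { return ~0ull; }

// Unsigned 32-bit literal (assuming 32-bit unsigned): zero-extends,
// so only the low 32 bits end up set.
inline uint64_t all_ones_u() { return ~0u; }

// Plain int literal: ~0 is -1, and converting -1 to uint64_t
// yields the maximum value, i.e. all 64 bits set.
inline uint64_t all_ones_int() { return static_cast<uint64_t>(~0); }
```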


[Mesa-dev] [PATCH] i965/compaction: Avoid (unexpected) unsigned division.

2014-09-26 Thread Matt Turner
... which leads to incorrect results on 32-bit x86.

Reported-by: Mark Janes mark.a.ja...@intel.com
---
I tried writing up a nice commit message that explained what was going
on and why this worked on 64-bit, but then I realized that it was taking
orders of magnitude longer than the fix itself and probably no one would
care anyway.

 src/mesa/drivers/dri/i965/brw_eu_compact.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_compact.c 
b/src/mesa/drivers/dri/i965/brw_eu_compact.c
index 114d18f..3f655ac 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_compact.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_compact.c
@@ -1445,8 +1445,8 @@ brw_compact_instructions(struct brw_compile *p, int 
start_offset,
 assert(brw_inst_src1_reg_file(brw, insn) == BRW_IMMEDIATE_VALUE);
 
 int jump = brw_inst_imm_d(brw, insn);
-int jump_compacted = jump / sizeof(brw_compact_inst);
-int jump_uncompacted = jump / sizeof(brw_inst);
+int jump_compacted = jump / (int)sizeof(brw_compact_inst);
+int jump_uncompacted = jump / (int)sizeof(brw_inst);
 
 target_old_ip = this_old_ip + jump_uncompacted;
 target_compacted_count = compacted_counts[target_old_ip];
-- 
1.8.5.5
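
A rough sketch of why dividing a negative int by a `sizeof` result can differ between 32-bit and 64-bit builds. The helper names and the size of 16 are illustrative assumptions, not taken from the driver, and the width of `size_t` is simulated with fixed-width types so the sketch behaves the same everywhere. The narrowing casts back to `int32_t` rely on the wrap-around behaviour that mainstream compilers provide (guaranteed only from C++20).

```cpp
#include <cassert>
#include <cstdint>

// Models `jump / sizeof(T)` with a 32-bit size_t: the signed dividend
// is converted to a 32-bit unsigned value before the division.
inline int32_t div_unsigned32(int32_t jump, uint32_t size) {
    return static_cast<int32_t>(static_cast<uint32_t>(jump) / size);
}

// Models the same expression with a 64-bit size_t: the dividend is
// converted to 64-bit unsigned, divided, then truncated back to 32 bits.
inline int32_t div_unsigned64(int32_t jump, uint64_t size) {
    uint64_t q = static_cast<uint64_t>(static_cast<int64_t>(jump)) / size;
    return static_cast<int32_t>(static_cast<uint32_t>(q));
}

// Models the patched form: casting the size to int keeps the division signed.
inline int32_t div_signed(int32_t jump, uint32_t size) {
    return jump / static_cast<int32_t>(size);
}
```

With a backward jump of -32 and a size of 16, the signed form gives -2. The 64-bit model also lands on -2 after truncation, because the bits shifted in by the division fall outside the low 32 bits. The 32-bit model gives 268435454, which matches the report that the problem shows up only on 32-bit x86.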



[Mesa-dev] [PATCH 2/3] glsl: replace while loop with without_array function

2014-09-26 Thread Timothy Arceri
Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
 src/glsl/ast_to_hir.cpp | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 5ec1614..1c1815b 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -3560,9 +3560,7 @@ ast_declarator_list::hir(exec_list *instructions,
  *vectors. Vertex shader inputs cannot be arrays or
  *structures.
  */
-const glsl_type *check_type = var->type;
-while (check_type->is_array())
-   check_type = check_type->element_type();
+const glsl_type *check_type = var->type->without_array();
 
 switch (check_type->base_type) {
 case GLSL_TYPE_FLOAT:
-- 
1.9.3



[Mesa-dev] [PATCH 3/3] glsl: simplify varying lowering check

2014-09-26 Thread Timothy Arceri
This adds support for arrays of arrays and simplifies the check for gs and ts.

Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
 src/glsl/lower_packed_varyings.cpp | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/src/glsl/lower_packed_varyings.cpp 
b/src/glsl/lower_packed_varyings.cpp
index 7801483..60b06f4 100644
--- a/src/glsl/lower_packed_varyings.cpp
+++ b/src/glsl/lower_packed_varyings.cpp
@@ -590,14 +590,7 @@ lower_packed_varyings_visitor::needs_lowering(ir_variable 
*var)
if (var->data.explicit_location)
   return false;
 
-   const glsl_type *type = var->type;
-   if (this->gs_input_vertices != 0) {
-  assert(type->is_array());
-  type = type->element_type();
-   }
-   if (type->is_array())
-  type = type->fields.array;
-   if (type->vector_elements == 4)
+   if (var->type->without_array()->vector_elements == 4)
   return false;
return true;
 }
-- 
1.9.3



[Mesa-dev] [PATCH 1/3] glsl: add arrays of arrays support to without_array function

2014-09-26 Thread Timothy Arceri
Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
 src/glsl/glsl_types.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/glsl/glsl_types.h b/src/glsl/glsl_types.h
index eeb14c2..f1d578e 100644
--- a/src/glsl/glsl_types.h
+++ b/src/glsl/glsl_types.h
@@ -505,7 +505,12 @@ struct glsl_type {
 */
const glsl_type *without_array() const
{
-  return this->is_array() ? this->fields.array : this;
+  const glsl_type *t = this;
+
+  while (t->is_array())
+ t = t->fields.array;
+
+  return t;
}
 
/**
-- 
1.9.3
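
The loop in this patch can be tried in isolation with a minimal sketch. `Type` is a hypothetical stand-in invented for illustration, not the real `glsl_type`: an array type points at its element type, and a non-array type has a null `element`. The old single-step form would strip only one array level, while the loop keeps peeling until a non-array type remains.

```cpp
#include <cassert>

// Hypothetical, minimal type node: element is non-null only for arrays.
struct Type {
    const Type *element;
    int vector_elements;

    bool is_array() const { return element != nullptr; }

    // Peel every array level, so an array of arrays of vec4 yields vec4.
    const Type *without_array() const {
        const Type *t = this;
        while (t->is_array())
            t = t->element;
        return t;
    }
};
```

Calling `without_array()` on a non-array type returns the type itself, so callers such as the varying check in patch 3/3 need no separate array test.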



[Mesa-dev] [PATCH] Allow texture2DProjLod and textureCubeLod with Gles.

2014-09-26 Thread Kalyan Kondapally
According to GLES (i.e. 1.0 and above) spec textureCubeLod and
texture2DProjLod are built in functions. We seem to disable support
for these functions with GLES. This patch enables the support.

Signed-off-by: Kalyan Kondapally kalyan.kondapa...@intel.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84355
---
 src/glsl/builtin_functions.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index 9be7f6d..5a024cb 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -1882,8 +1882,8 @@ builtin_builder::create_builtins()
 NULL);
 
add_function(texture2DProjLod,
-_texture(ir_txl, v110_lod, glsl_type::vec4_type,  
glsl_type::sampler2D_type, glsl_type::vec3_type, TEX_PROJECT),
-_texture(ir_txl, v110_lod, glsl_type::vec4_type,  
glsl_type::sampler2D_type, glsl_type::vec4_type, TEX_PROJECT),
+_texture(ir_txl, lod_exists_in_stage, glsl_type::vec4_type,  
glsl_type::sampler2D_type, glsl_type::vec3_type, TEX_PROJECT),
+_texture(ir_txl, lod_exists_in_stage, glsl_type::vec4_type,  
glsl_type::sampler2D_type, glsl_type::vec4_type, TEX_PROJECT),
 NULL);
 
add_function(texture3D,
@@ -1910,7 +1910,7 @@ builtin_builder::create_builtins()
 NULL);
 
add_function(textureCubeLod,
-_texture(ir_txl, v110_lod, glsl_type::vec4_type,  
glsl_type::samplerCube_type, glsl_type::vec3_type),
+_texture(ir_txl, lod_exists_in_stage, glsl_type::vec4_type,  
glsl_type::samplerCube_type, glsl_type::vec3_type),
 NULL);
 
add_function(texture2DRect,
-- 
1.9.1



Re: [Mesa-dev] [PATCH] Allow texture2DProjLod and textureCubeLod with Gles.

2014-09-26 Thread Matt Turner
On Fri, Sep 26, 2014 at 7:44 PM, Kalyan Kondapally
kondapallykalyancontrib...@gmail.com wrote:
 According to GLES (i.e. 1.0 and above) spec textureCubeLod and
 texture2DProjLod are built in functions. We seem to disable support
 for these functions with GLES. This patch enables the support.

 Signed-off-by: Kalyan Kondapally kalyan.kondapa...@intel.com
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84355

Change the subject to

glsl: Allow texture2DProjLod and textureCubeLod in GL ES.

Reviewed-by: Matt Turner matts...@gmail.com