date:20141113

https://bugs.freedesktop.org/show_bug.cgi?id=86195

--- Comment #2 from Iaroslav Andrusyak pontost...@gmail.com ---
Created attachment 109393
  --> https://bugs.freedesktop.org/attachment.cgi?id=109393&action=edit
stderr

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 86195] Lightswork video editor segfaults

https://bugs.freedesktop.org/show_bug.cgi?id=86195

--- Comment #3 from Michel Dänzer mic...@daenzer.net ---
(In reply to Iaroslav Andrusyak from comment #2)
 stderr

Did that crash as well? There's only one LLVM dump in there, and no immediate
sign of a crash. If it did crash, can you try again with R600_DEBUG=vs,gs,ps?

Also, did you try DRAW_USE_LLVM=0?


Re: [Mesa-dev] [PATCH 3/4] i965/vec4: Combine all the math emitters.

2014-11-13 Thread Kenneth Graunke

On Wednesday, November 12, 2014 09:57:30 PM Matt Turner wrote:
 On Wed, Nov 12, 2014 at 9:35 PM, Kenneth Graunke kenn...@whitecape.org 
wrote:
  +vec4_visitor::emit_math(enum opcode opcode,
  +   dst_reg dst, src_reg src0, src_reg src1)
 
 I think you can make the arguments const references too?

Yeah.  I've changed the prototype to:

void emit_math(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
               const src_reg &src1 = src_reg());

It also meant changing the first few lines to:

   vec4_instruction *math =
      emit(opcode, dst, fix_math_operand(src0), fix_math_operand(src1))

since src0 = fix_math_operand(src0) doesn't work with const src_reg &.
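
To illustrate the const reference point, here is a minimal sketch. The types and helpers are hypothetical stand-ins, not the real i965 classes. It shows that a parameter declared as const src_reg & cannot be assigned to inside the function, so the fixed-up operands are passed straight through:

```cpp
#include <cassert>

// Hypothetical stand-in for the i965 src_reg class; only what the sketch needs.
struct src_reg {
   int reg;
   bool fixed;
   explicit src_reg(int r = -1) : reg(r), fixed(false) {}
};

// Hypothetical stand-in for fix_math_operand(): returns an adjusted copy
// and leaves its argument untouched.
static src_reg fix_math_operand(const src_reg &src)
{
   src_reg copy = src;
   copy.fixed = true;
   return copy;
}

// Stands in for emit(): counts how many operands arrived already fixed.
static int emit(const src_reg &src0, const src_reg &src1)
{
   return (src0.fixed ? 1 : 0) + (src1.fixed ? 1 : 0);
}

static int emit_math(const src_reg &src0, const src_reg &src1 = src_reg())
{
   // src0 = fix_math_operand(src0);   // would not compile: src0 is const
   return emit(fix_math_operand(src0), fix_math_operand(src1));
}
```

Calling emit_math(src_reg(1)) exercises the defaulted second operand, and the caller's registers are never modified.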

  +   if (brw->gen == 6 && dst.writemask != WRITEMASK_XYZW) {
  +      /* MATH on Gen6 must be align1, so we can't do writemasks. */
  +      math->dst = dst_reg(this, glsl_type::vec4_type);
  +      math->dst.type = dst.type;
  +      math->dst.writemask = WRITEMASK_XYZW;
 
 I don't think you need to set the writemask (XYZW is the default).

I do, actually - it's guaranteed to not be XYZW at this point.  The caller 
passed us a destination register with some writemask set.  We create the 
math instruction using dst, so it inherits that writemask.  This block 
executes when dst.writemask != WRITEMASK_XYZW.

The point is to override it back to XYZW, since it isn't.

  +      emit(MOV(dst, src_reg(math->dst)));
  +   } else if (brw->gen < 6) {
  +      math->base_mrf = 1;
  +      math->mlen = src1.file == BAD_FILE ? 1 : 2;
  }
   }
 
 Series is
 
 Reviewed-by: Matt Turner matts...@gmail.com

Thanks!


[Mesa-dev] [Bug 86195] Lightswork video editor segfaults

https://bugs.freedesktop.org/show_bug.cgi?id=86195

--- Comment #4 from Iaroslav Andrusyak pontost...@gmail.com ---
DRAW_USE_LLVM=0 does not help, and there is no console output from LW;
Lightswork is totally silent. The only logs I have are in the Lightswork folder:

StdErr.log and error.log


[Mesa-dev] [Bug 86195] Lightswork video editor segfaults

https://bugs.freedesktop.org/show_bug.cgi?id=86195

--- Comment #5 from Iaroslav Andrusyak pontost...@gmail.com ---
Created attachment 109395
  --> https://bugs.freedesktop.org/attachment.cgi?id=109395&action=edit
error


Re: [Mesa-dev] [PATCH 3/3] gk20a: use NOUVEAU_BO_GART as VRAM domain

2014-11-13 Thread Alexandre Courbot


On 10/30/2014 12:29 AM, Ilia Mirkin wrote:

On Mon, Oct 27, 2014 at 6:34 AM, Alexandre Courbot acour...@nvidia.com wrote:

GK20A does not have dedicated VRAM, therefore allocating in VRAM can be
sub-optimal and sometimes even harmful. Set its VRAM domain to
NOUVEAU_BO_GART so all objects are allocated in system memory.

Signed-off-by: Alexandre Courbot acour...@nvidia.com
---
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index ac5823e4a8d5..ad143cd9a140 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -620,6 +620,16 @@ nvc0_screen_create(struct nouveau_device *dev)
       return NULL;
    pscreen = &screen->base.base;

+   /* Recognize chipsets with no VRAM */
+   switch (dev->chipset) {
+   /* GK20A */
+   case 0xea:
+      screen->base.vram_domain = NOUVEAU_BO_GART;


I think you also want to set vidmem_bindings = 0... although
potentially after the |= that's done below. Although I guess that
constbuf + command args buf need to be |='d into the sysmem_bindings
for this to work out well. That said, we don't really handle explicit
migration well right now, and those PIPE_BIND_* are *incredibly*
misleading and don't actually necessarily reflect the current usage.
[I have some patches to improve the situation, but you don't really
have to worry about that.]


In the light of that it could be that the vram_domain member I am 
introducing is completely useless - if we set NV_VRAM_DOMAIN to be the 
following:


#define NV_VRAM_DOMAIN(screen) ((screen)->vidmem_bindings == 0 ? \
                                NOUVEAU_BO_GART : NOUVEAU_BO_VRAM)


then I suspect we can just live without it. I tested quickly and it 
seems to work. Ilia, do you agree? Or could we imagine having GPUs with 
VRAM for which none of the PIPE_BIND_* targets should reside in VRAM?


I am also wondering: before setting vidmem_bindings to 0, shouldn't we 
first do a sysmem_bindings |= vidmem_bindings to make sure all the set 
bindings are tracked somewhere?
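
The two ideas above can be sketched together as follows. This is only an illustration under assumptions: the struct, the SKETCH_* names and the flag values are hypothetical placeholders, not the real nouveau definitions.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder values; the real NOUVEAU_BO_* flags are defined by libdrm.
enum : uint32_t { SKETCH_BO_VRAM = 1u << 0, SKETCH_BO_GART = 1u << 1 };

// Hypothetical reduction of the screen structure to the two binding masks.
struct sketch_screen {
   uint32_t vidmem_bindings;
   uint32_t sysmem_bindings;
};

// Same shape as the proposed NV_VRAM_DOMAIN macro.
#define SKETCH_VRAM_DOMAIN(screen) \
   ((screen)->vidmem_bindings == 0 ? SKETCH_BO_GART : SKETCH_BO_VRAM)

// For a chipset without dedicated VRAM: fold the video memory bindings
// into the system memory ones first, then clear them.
static void sketch_disable_vram(sketch_screen *screen)
{
   screen->sysmem_bindings |= screen->vidmem_bindings;
   screen->vidmem_bindings = 0;
}
```

With this ordering no binding bit is dropped, and the domain macro switches to GART as soon as vidmem_bindings is empty.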


Re: [Mesa-dev] [PATCH] llvmpipe: Avoid deadlock when unloading opengl32.dll

Hi Tom,

That's peculiar. It looks like pthreads got into a weird state somehow, though 
I don't precisely understand how.  Maybe there's a race inside 
pipe_semaphore_signal() with the destruction of the semaphore.

I think the best thing for now is to revert to the old behavior on non-Windows 
platforms:

diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c b/src/gallium/drivers/llvmpipe/lp_rast.c
index 6b54d43..e168766 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.c
+++ b/src/gallium/drivers/llvmpipe/lp_rast.c
@@ -800,7 +800,9 @@ static PIPE_THREAD_ROUTINE( thread_function, init_data )
       pipe_semaphore_signal(&task->work_done);
    }
 
+#ifdef _WIN32
    pipe_semaphore_signal(&task->work_done);
+#endif
 
    return 0;
 }
@@ -891,7 +893,11 @@ void lp_rast_destroy( struct lp_rasterizer *rast )
     * We don't actually call pipe_thread_wait to avoid dead lock on Windows
     * per https://bugs.freedesktop.org/show_bug.cgi?id=76252 */
    for (i = 0; i < rast->num_threads; i++) {
+#ifdef _WIN32
       pipe_semaphore_wait(&rast->tasks[i].work_done);
+#else
+      pipe_thread_wait(rast->threads[i]);
+#endif
    }
 
    /* Clean up per-thread data */


Because I don't think that the Windows deadlock ever happens on Linux.

Jose



From: Tom Stellard t...@stellard.net
Sent: 13 November 2014 01:45
To: Jose Fonseca
Cc: mesa-dev@lists.freedesktop.org; Roland Scheidegger
Subject: Re: [Mesa-dev] [PATCH] llvmpipe: Avoid deadlock when unloading 
opengl32.dll

On Fri, Nov 07, 2014 at 04:52:25PM +, jfons...@vmware.com wrote:
 From: José Fonseca jfons...@vmware.com


Hi Jose,

This patch is causing random segfaults with OpenCL programs on radeonsi.
I haven't been able to figure out exactly what is happening, so I was
hoping you could help.

I think the problem has something to do with the fact that when clover
probes the hardware for OpenCL devices, the pipe_loader creates an
llvmpipe screen, checks the value of PIPE_CAP_COMPUTE, and then destroys
the screen since PIPE_CAP_COMPUTE is 0.

The only way I can reproduce this bug is by running the piglit OpenCL
tests concurrently.  If it helps, here are the stack traces
from one of the core dumps I captured from a piglit run:

(gdb) thread 1
[Switching to thread 1 (Thread 0x7f6d53cdf700 (LWP 18653))]
#0  0x7f6d53e56d2d in ?? ()
(gdb) bt
#0  0x7f6d53e56d2d in ?? ()
#1  0x in ?? ()
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f6d5495f700 (LWP 18652))]
#0  0x7f6d5aacd44c in pthread_cond_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x7f6d5aacd44c in pthread_cond_wait () from /lib64/libpthread.so.0
#1  0x7f6d54c71dbb in mtx_init (mtx=0x7f6d54c71dbb <mtx_init+97>,type=0) at 
../../../../../include/c11/threads_posix.h:182
#2  0x7f6d54c72157 in radeon_set_fd_access 
(applier=0x61e828,owner=0x61e800, mutex=0x7f6d54c71dbb <mtx_init+97>, 
request=0,request_name=0x0, enable=238 '\356') at radeon_drm_winsys.c:70
#3  0x7f6d54c7ad30 in radeon_drm_cs_emit_ioctl (param=0x61e4f0) at 
radeon_drm_winsys.c:598
#4  0x7f6d54c71ce0 in cnd_wait (cond=0x61e4f0, mtx=0x7f6d54c7ad07 
<radeon_drm_cs_emit_ioctl+168>) at 
../../../../../include/c11/threads_posix.h:152
#5  0x7f6d5aac91da in start_thread () from /lib64/libpthread.so.0
#6  0x7f6d5afd5d7d in clone () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f6d5c20c740 (LWP 18649))]
#0  0x7f6d5afae73e in re_node_set_insert_last () from /lib64/libc.so.6
(gdb) bt
#0  0x7f6d5afae73e in re_node_set_insert_last () from /lib64/libc.so.6
#1  0x7f6d5afae7fe in register_state () from /lib64/libc.so.6
#2  0x7f6d5afb1d39 in re_acquire_state_context () from /lib64/libc.so.6
#3  0x7f6d5afbaa95 in re_compile_internal () from /lib64/libc.so.6
#4  0x7f6d5afbb603 in regcomp () from /lib64/libc.so.6
#5  0x00403e9b in regex_get_matches (src=0x63e6c0 "float", 
pattern=0x40b940 "^ulong|ulong2|ulong3|ulong4|ulong8|ulong16$", pmatch=0x0, 
size=0, cflags=4) at /home/tstellar/piglit/tests/cl/program/program-tester.c:476
#6  0x004040e2 in regex_match (src=0x63e6c0 "float", pattern=0x40b940 
"^ulong|ulong2|ulong3|ulong4|ulong8|ulong16$") at 
/home/tstellar/piglit/tests/cl/program/program-tester.c:532
#7  0x004059c6 in get_test_arg (src=0x63de70 "1 buffer float[7] 0.5 
-0.5 0.0 -0.0 nan -3.99 1.5", test=0x645710, arg_in=true) at 
/home/tstellar/piglit/tests/cl/program/program-tester.c:1016
#8  0x00406f4a in parse_config ( config_str=0x63fe30 
"\n[config]\nname: Test float trunc built-in on CL 1.1\nclc_version_min: 
10\ndimensions: 1\n\n[test]\nname: trunc float1\nkernel_name: 
test_1_trunc_float\nglobal_size: 7 0 0\n\narg_out: 0 buffer float[7] 0.0 
-0.0"..., config=0x60e260 <config>) at 
/home/tstellar/piglit/tests/cl/program/program-tester.c:1410
#9  0x004074a7 in init (argc=2, argv=0x7fff46612d88, config=0x60e260 
<config>) at /home/tstellar/piglit/tests/cl/program/program-tester.c:1555
#10

Re: [Mesa-dev] [PATCH] radeonsi: Disable asynchronous DMA except for PIPE_BUFFER

2014-11-13 Thread Marek Olšák

Reviewed-by: Marek Olšák marek.ol...@amd.com

I suggest pasting the commit message into the code.

Marek

On Thu, Nov 13, 2014 at 7:52 AM, Michel Dänzer mic...@daenzer.net wrote:
 From: Michel Dänzer michel.daen...@amd.com

 Using the asynchronous DMA engine for multi-dimensional operations seems
 to cause random GPU lockups for various people. While the root cause for
 this might need to be fixed in the kernel, let's disable it for now.

 Before re-enabling this, please make sure you can hit all newly enabled
 paths in your testing, preferably with both piglit and real world apps,
 and get in touch with people on the bug reports below for stability
 testing.

 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85647
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83500
 Cc: mesa-sta...@lists.freedesktop.org
 Signed-off-by: Michel Dänzer michel.daen...@amd.com
 ---
  src/gallium/drivers/radeonsi/si_dma.c | 3 +++
  1 file changed, 3 insertions(+)

 diff --git a/src/gallium/drivers/radeonsi/si_dma.c 
 b/src/gallium/drivers/radeonsi/si_dma.c
 index b1bd5e7..1d3b524 100644
 --- a/src/gallium/drivers/radeonsi/si_dma.c
 +++ b/src/gallium/drivers/radeonsi/si_dma.c
 @@ -250,6 +250,9 @@ void si_dma_copy(struct pipe_context *ctx,
 return;
 }

 +   /* XXX: The paths below cause lockups for some */
 +   goto fallback;
 +
     if (src->format != dst->format || src_box->depth > 1 ||
         rdst->dirty_level_mask != 0 ||
         rdst->cmask.size || rdst->fmask.size ||
 --
 2.1.3


Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

Thanks for doing this.  It has been long overdue.

Unfortunately we are relying on TGSI_OPCODE_CND/TGSI_OPCODE_ARR internally.  
I'm also interested in cutting down used opcodes, so I'll try to replace their 
usage with something else.  But until then please hold on to those two patches.

The rest looks good AFAICT.

Concerning subroutines, we rely on BGNSUB/ENDSUB/CAL extensively.  They are 
quite convenient when translating D3D 9/10 shaders, which also have them.  And 
if one day we need to support recursive subroutines (CUDA 4.0 appears to have 
them; not sure about OpenCL, but I suppose it's only a matter of time), then 
they'll be unavoidable, as in-lining subroutines won't work anymore.

Jose




From: mesa-dev mesa-dev-boun...@lists.freedesktop.org on behalf of Eric 
Anholt e...@anholt.net
Sent: 13 November 2014 01:18
To: mesa-dev@lists.freedesktop.org
Subject: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

This series removes a bunch of unused opcodes, mostly from TGSI.  It
doesn't go as far as we could possibly go -- while I welcome discussion
for future patch series deleting more, I hope that discussion doesn't
derail the review process for these changes.

I haven't messed with the subroutine stuff, since I don't know what people
are planning with that.  I also haven't messed with the pack/unpack
opcodes in TGSI, since they might be useful for some of the GLSL packing
stuff.

Testing status: compile-tested ilo/r600/softpipe, touch-tested softpipe.


Re: [Mesa-dev] [PATCH 2/4] i965/vec4: Use const references in emit() functions.

Kenneth Graunke kenn...@whitecape.org writes:

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org

Reviewed-by: Francisco Jerez curroje...@riseup.net

 ---
  src/mesa/drivers/dri/i965/brw_vec4.h   | 18 --
  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 11 ++-
  2 files changed, 14 insertions(+), 15 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
 b/src/mesa/drivers/dri/i965/brw_vec4.h
 index 3301dd8..ebbf882 100644
 --- a/src/mesa/drivers/dri/i965/brw_vec4.h
 +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
 @@ -399,16 +399,14 @@ public:
 vec4_instruction *emit(vec4_instruction *inst);
  
 vec4_instruction *emit(enum opcode opcode);
 -
 -   vec4_instruction *emit(enum opcode opcode, dst_reg dst);
 -
 -   vec4_instruction *emit(enum opcode opcode, dst_reg dst, src_reg src0);
 -
 -   vec4_instruction *emit(enum opcode opcode, dst_reg dst,
 -   src_reg src0, src_reg src1);
 -
 -   vec4_instruction *emit(enum opcode opcode, dst_reg dst,
 -   src_reg src0, src_reg src1, src_reg src2);
 +   vec4_instruction *emit(enum opcode opcode, const dst_reg &dst);
 +   vec4_instruction *emit(enum opcode opcode, const dst_reg &dst,
 +                          const src_reg &src0);
 +   vec4_instruction *emit(enum opcode opcode, const dst_reg &dst,
 +                          const src_reg &src0, const src_reg &src1);
 +   vec4_instruction *emit(enum opcode opcode, const dst_reg &dst,
 +                          const src_reg &src0, const src_reg &src1,
 +                          const src_reg &src2);
  
 vec4_instruction *emit_before(bblock_t *block,
   vec4_instruction *inst,
 diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
 b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
 index b46879b..a8ce498 100644
 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
 @@ -79,8 +79,8 @@ vec4_visitor::emit_before(bblock_t *block, vec4_instruction 
 *inst,
  }
  
  vec4_instruction *
 -vec4_visitor::emit(enum opcode opcode, dst_reg dst,
 -src_reg src0, src_reg src1, src_reg src2)
 +vec4_visitor::emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
 +                   const src_reg &src1, const src_reg &src2)
  {
     return emit(new(mem_ctx) vec4_instruction(this, opcode, dst,
                                               src0, src1, src2));
 @@ -88,19 +88,20 @@ vec4_visitor::emit(enum opcode opcode, dst_reg dst,
  
  
  vec4_instruction *
 -vec4_visitor::emit(enum opcode opcode, dst_reg dst, src_reg src0, src_reg src1)
 +vec4_visitor::emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
 +                   const src_reg &src1)
  {
     return emit(new(mem_ctx) vec4_instruction(this, opcode, dst, src0, src1));
  }
  
  vec4_instruction *
 -vec4_visitor::emit(enum opcode opcode, dst_reg dst, src_reg src0)
 +vec4_visitor::emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0)
  {
     return emit(new(mem_ctx) vec4_instruction(this, opcode, dst, src0));
  }
  
  vec4_instruction *
 -vec4_visitor::emit(enum opcode opcode, dst_reg dst)
 +vec4_visitor::emit(enum opcode opcode, const dst_reg &dst)
  {
 return emit(new(mem_ctx) vec4_instruction(this, opcode, dst));
  }
 -- 
 2.1.3


Re: [Mesa-dev] [PATCH 1/4] i965: Use macros to create prototypes for emitter helpers.

Kenneth Graunke kenn...@whitecape.org writes:

 We do this almost everywhere else; this should make it easier to modify.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org

For this patch:
Reviewed-by: Francisco Jerez curroje...@riseup.net

 ---
  src/mesa/drivers/dri/i965/brw_vec4.h | 98 
 +++-
  1 file changed, 41 insertions(+), 57 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
 b/src/mesa/drivers/dri/i965/brw_vec4.h
 index 750f491..3301dd8 100644
 --- a/src/mesa/drivers/dri/i965/brw_vec4.h
 +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
 @@ -414,68 +414,52 @@ public:
   vec4_instruction *inst,
vec4_instruction *new_inst);
  
 -   vec4_instruction *MOV(const dst_reg &dst, const src_reg &src0);
 -   vec4_instruction *NOT(const dst_reg &dst, const src_reg &src0);
 -   vec4_instruction *RNDD(const dst_reg &dst, const src_reg &src0);
 -   vec4_instruction *RNDE(const dst_reg &dst, const src_reg &src0);
 -   vec4_instruction *RNDZ(const dst_reg &dst, const src_reg &src0);
 -   vec4_instruction *FRC(const dst_reg &dst, const src_reg &src0);
 -   vec4_instruction *F32TO16(const dst_reg &dst, const src_reg &src0);
 -   vec4_instruction *F16TO32(const dst_reg &dst, const src_reg &src0);
 -   vec4_instruction *ADD(const dst_reg &dst, const src_reg &src0,
 -                         const src_reg &src1);
 -   vec4_instruction *MUL(const dst_reg &dst, const src_reg &src0,
 -                         const src_reg &src1);
 -   vec4_instruction *MACH(const dst_reg &dst, const src_reg &src0,
 -                          const src_reg &src1);
 -   vec4_instruction *MAC(const dst_reg &dst, const src_reg &src0,
 -                         const src_reg &src1);
 -   vec4_instruction *AND(const dst_reg &dst, const src_reg &src0,
 -                         const src_reg &src1);
 -   vec4_instruction *OR(const dst_reg &dst, const src_reg &src0,
 -                        const src_reg &src1);
 -   vec4_instruction *XOR(const dst_reg &dst, const src_reg &src0,
 -                         const src_reg &src1);
 -   vec4_instruction *DP3(const dst_reg &dst, const src_reg &src0,
 -                         const src_reg &src1);
 -   vec4_instruction *DP4(const dst_reg &dst, const src_reg &src0,
 -                         const src_reg &src1);
 -   vec4_instruction *DPH(const dst_reg &dst, const src_reg &src0,
 -                         const src_reg &src1);
 -   vec4_instruction *SHL(const dst_reg &dst, const src_reg &src0,
 -                         const src_reg &src1);
 -   vec4_instruction *SHR(const dst_reg &dst, const src_reg &src0,
 -                         const src_reg &src1);
 -   vec4_instruction *ASR(const dst_reg &dst, const src_reg &src0,
 -                         const src_reg &src1);
 +#define EMIT1(op) vec4_instruction *op(const dst_reg &, const src_reg &);
 +#define EMIT2(op) vec4_instruction *op(const dst_reg &, const src_reg &, const src_reg &);
 +#define EMIT3(op) vec4_instruction *op(const dst_reg &, const src_reg &, const src_reg &, const src_reg &);
 +   EMIT1(MOV)
 +   EMIT1(NOT)
 +   EMIT1(RNDD)
 +   EMIT1(RNDE)
 +   EMIT1(RNDZ)
 +   EMIT1(FRC)
 +   EMIT1(F32TO16)
 +   EMIT1(F16TO32)
 +   EMIT2(ADD)
 +   EMIT2(MUL)
 +   EMIT2(MACH)
 +   EMIT2(MAC)
 +   EMIT2(AND)
 +   EMIT2(OR)
 +   EMIT2(XOR)
 +   EMIT2(DP3)
 +   EMIT2(DP4)
 +   EMIT2(DPH)
 +   EMIT2(SHL)
 +   EMIT2(SHR)
 +   EMIT2(ASR)
 vec4_instruction *CMP(dst_reg dst, src_reg src0, src_reg src1,
enum brw_conditional_mod condition);
 vec4_instruction *IF(src_reg src0, src_reg src1,
  enum brw_conditional_mod condition);
 vec4_instruction *IF(enum brw_predicate predicate);
 -   vec4_instruction *PULL_CONSTANT_LOAD(const dst_reg &dst,
 -                                        const src_reg &index);
 -   vec4_instruction *SCRATCH_READ(const dst_reg &dst, const src_reg &index);
 -   vec4_instruction *SCRATCH_WRITE(const dst_reg &dst, const src_reg &src,
 -                                   const src_reg &index);
 -   vec4_instruction *LRP(const dst_reg &dst, const src_reg &a,
 -                         const src_reg &y, const src_reg &x);
 -   vec4_instruction *BFREV(const dst_reg &dst, const src_reg &value);
 -   vec4_instruction *BFE(const dst_reg &dst, const src_reg &bits,
 -                         const src_reg &offset, const src_reg &value);
 -   vec4_instruction *BFI1(const dst_reg &dst, const src_reg &bits,
 -                          const src_reg &offset);
 -   vec4_instruction *BFI2(const dst_reg &dst, const src_reg &bfi1_dst,
 -                          const src_reg &insert, const src_reg &base);
 -   vec4_instruction *FBH(const dst_reg &dst, const src_reg &value);
 -   vec4_instruction *FBL(const dst_reg &dst, const src_reg &value);
 -   vec4_instruction *CBIT(const dst_reg &dst, const src_reg &value);
 -   vec4_instruction *MAD(const dst_reg &dst, const src_reg &c,
 -                         const src_reg &b, const src_reg &a);
 -   vec4_instruction *ADDC(const

Re: [Mesa-dev] [PATCH 3/4] i965/vec4: Combine all the math emitters.

Kenneth Graunke kenn...@whitecape.org writes:

 On Wednesday, November 12, 2014 09:57:30 PM Matt Turner wrote:
 On Wed, Nov 12, 2014 at 9:35 PM, Kenneth Graunke kenn...@whitecape.org 
 wrote:
  +vec4_visitor::emit_math(enum opcode opcode,
  +   dst_reg dst, src_reg src0, src_reg src1)
 
 I think you can make the arguments const references too?

 Yeah.  I've changed the prototype to:

 void emit_math(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
                const src_reg &src1 = src_reg());

 It also meant changing the first few lines to:

    vec4_instruction *math =
       emit(opcode, dst, fix_math_operand(src0), fix_math_operand(src1))

 since src0 = fix_math_operand(src0) doesn't work with const src_reg &.

  +   if (brw->gen == 6 && dst.writemask != WRITEMASK_XYZW) {
  +      /* MATH on Gen6 must be align1, so we can't do writemasks. */
  +      math->dst = dst_reg(this, glsl_type::vec4_type);
  +      math->dst.type = dst.type;
  +      math->dst.writemask = WRITEMASK_XYZW;
 
 I don't think you need to set the writemask (XYZW is the default).

 I do, actually - it's guaranteed to not be XYZW at this point.  The caller 
 passed us a destination register with some writemask set.  We create the 
 math instruction using dst, so it inherits that writemask.  This block 
 executes when dst.writemask != WRITEMASK_XYZW.

 The point is to override it back to XYZW, since it isn't.

Are you sure?  You are assigning a newly created dst_reg() to math->dst,
so it should have the default writemask for a vec4, which is XYZW
already.  With that fixed and the change you mention above this patch
is:
Reviewed-by: Francisco Jerez curroje...@riseup.net

I had a very similar change in my tree, but you beat me to it ;).
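
A minimal sketch of the question being argued here, using hypothetical stand-in types rather than the real dst_reg and vec4_instruction classes. It assumes, as stated above, that a freshly constructed vec4 destination starts out with an XYZW writemask; the bit values are also assumptions of the sketch.

```cpp
#include <cassert>

// Writemask bits, one per component (values assumed for the sketch).
enum : unsigned { WRITEMASK_X = 1, WRITEMASK_Y = 2, WRITEMASK_Z = 4,
                  WRITEMASK_W = 8, WRITEMASK_XYZW = 15 };

// Hypothetical stand-in for dst_reg: a new register writes all four
// components unless the caller narrows the mask.
struct sketch_dst_reg {
   int reg;
   unsigned writemask;
   explicit sketch_dst_reg(int r) : reg(r), writemask(WRITEMASK_XYZW) {}
};

// Hypothetical stand-in for vec4_instruction: it copies the destination
// it is created with, writemask included.
struct sketch_instruction {
   sketch_dst_reg dst;
   explicit sketch_instruction(const sketch_dst_reg &d) : dst(d) {}
};

// Mirrors the Gen6 branch: swap in a temporary destination and report the
// writemask the math instruction ends up with.
static unsigned retarget_to_temporary(sketch_instruction &math, int temp_reg)
{
   math.dst = sketch_dst_reg(temp_reg);   // the whole register is replaced
   return math.dst.writemask;
}
```

In this model the instruction does inherit the caller's partial writemask at creation, but replacing the whole destination with a new register discards it, so a further assignment of XYZW would be redundant.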


  +      emit(MOV(dst, src_reg(math->dst)));
  +   } else if (brw->gen < 6) {
  +      math->base_mrf = 1;
  +      math->mlen = src1.file == BAD_FILE ? 1 : 2;
  }
   }
 
 Series is
 
 Reviewed-by: Matt Turner matts...@gmail.com

 Thanks!

Re: [Mesa-dev] [PATCH 4/4] i965/vec4: Make src_reg immediate constructors explicit.

Kenneth Graunke kenn...@whitecape.org writes:

 We did this for fs_reg a while back, and it's generally a good idea.

I disagree, explicit constructors aren't a one-size-fits-all.  IMO there
are three scenarios in which explicit constructors may be a good idea:

 - Cases where your constructor may lose relevant information about its
   argument when used inadvertently.  IOW there is a many-to-one mapping
   between your argument type and your constructed type.

 - Cases where your constructor doesn't leave the constructed object
   completely initialized and some additional action may be required to
   bring the constructed object to a well-defined state.  IOW there is a
   one-to-many mapping between your argument type and your constructed
   type.

 - Cases where your constructor has to do some expensive or run-time
   environment-dependent operation.

If none of these apply your argument and constructed objects are
effectively the same thing, and declaring the constructor explicit just
adds clutter and increases the amount of typing you have to do for no
benefit.  I suspect that the immediate register constructors from both
back-ends don't fit in any of the three categories: they do the only
sane thing they could possibly do without losing any information, so I
don't see why we would want them to be explicit.
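
As a minimal sketch of what is at stake at the call sites (the two register types below are hypothetical and only model a single immediate constructor), compare an implicit and an explicit converting constructor:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical immediate-register type with an implicit constructor.
struct implicit_reg {
   uint32_t bits;
   implicit_reg(uint32_t u) : bits(u) {}
};

// The same type with the constructor marked explicit.
struct explicit_reg {
   uint32_t bits;
   explicit explicit_reg(uint32_t u) : bits(u) {}
};

static uint32_t take_implicit(const implicit_reg &r) { return r.bits; }
static uint32_t take_explicit(const explicit_reg &r) { return r.bits; }

static uint32_t demo()
{
   uint32_t a = take_implicit(0u);                // converts silently
   // take_explicit(0u);                          // would not compile
   uint32_t b = take_explicit(explicit_reg(0u));  // conversion spelled out
   return a + b;
}
```

Neither conversion loses information here, which is the case where the extra spelling buys nothing according to the argument above.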

Actually it would make it rather annoying to pass immediates around with
the i965 IR builder framework I'm working on for
ARB_shader_image_load_store unless I change my src_vector type to have a
constructor for each immediate type instead of relying on the implicit
conversion to src/fs_reg, but then I'd have to maintain another
constructor for each possible src/fs_reg constructor argument and keep
them up to date.

I agree though that there is a good reason for the src_reg(dst_reg)
constructor and its converse to be marked explicit, because they
(currently) lose information.  dst_reg(src_reg) necessarily loses
component ordering information because you cannot represent that as a
writemask, but the transformation could be better behaved than what we have
if it calculated the subset of components referenced by the swizzle of
its argument instead of special-casing .

There's no good reason why src_reg(dst_reg) should lose information, and
I think it would make sense and it would be very convenient to make it
implicit if it fulfills the property 'dst_reg(src_reg(dst_reg(x))) ==
dst_reg(x)' and we fix it so the following code does the only one sane
thing:

| dst_reg reg = x;
| ADD(reg, src_reg(reg), y);

I can send patches to address the last two issues, actually I have a fix
for them lying around in some branch...
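
The "subset of components referenced by the swizzle" idea can be sketched as below. This is not the fix mentioned above; the helper names are hypothetical and the 2-bits-per-channel swizzle encoding is an assumption of the sketch.

```cpp
#include <cassert>

// Assumed encoding for the sketch: four 2-bit component selectors packed
// into one byte, channel 0 in the low bits (0 = X, 1 = Y, 2 = Z, 3 = W).
static unsigned make_swizzle(unsigned a, unsigned b, unsigned c, unsigned d)
{
   return a | (b << 2) | (c << 4) | (d << 6);
}

// Hypothetical helper: the writemask holding every component the swizzle
// reads at least once (bit 0 = X ... bit 3 = W). Ordering and repetition
// are lost, which is why the conversion cannot be fully reversible.
static unsigned writemask_for_swizzle(unsigned swizzle)
{
   unsigned mask = 0;
   for (unsigned chan = 0; chan < 4; chan++) {
      unsigned component = (swizzle >> (chan * 2)) & 3;
      mask |= 1u << component;
   }
   return mask;
}
```

For example, a .wzwz swizzle and a .zwzw swizzle both map to the ZW writemask, which shows the ordering information that dst_reg(src_reg) cannot keep.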

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/drivers/dri/i965/brw_vec4.h  |  6 +--
  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 35 ---
  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp| 12 ++---
  src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 55 
 ---
  4 files changed, 55 insertions(+), 53 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
 b/src/mesa/drivers/dri/i965/brw_vec4.h
 index 8abd166..3d2882d 100644
 --- a/src/mesa/drivers/dri/i965/brw_vec4.h
 +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
 @@ -99,9 +99,9 @@ public:
  
 src_reg(register_file file, int reg, const glsl_type *type);
 src_reg();
 -   src_reg(float f);
 -   src_reg(uint32_t u);
 -   src_reg(int32_t i);
 +   explicit src_reg(float f);
 +   explicit src_reg(uint32_t u);
 +   explicit src_reg(int32_t i);
 src_reg(struct brw_reg reg);
  
     bool equals(const src_reg &r) const;
 diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
 index db0e6cc..58c4df2 100644
 --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
 @@ -150,7 +150,7 @@ vec4_gs_visitor::emit_prolog()
      */
     this->current_annotation = "clear r0.2";
     dst_reg r0(retype(brw_vec4_grf(0, 0), BRW_REGISTER_TYPE_UD));
 -   vec4_instruction *inst = emit(GS_OPCODE_SET_DWORD_2, r0, 0u);
 +   vec4_instruction *inst = emit(GS_OPCODE_SET_DWORD_2, r0, src_reg(0u));
     inst->force_writemask_all = true;
  
     /* Create a virtual register to hold the vertex count */
 @@ -158,7 +158,7 @@ vec4_gs_visitor::emit_prolog()
  
     /* Initialize the vertex_count register to 0 */
     this->current_annotation = "initialize vertex_count";
 -   inst = emit(MOV(dst_reg(this->vertex_count), 0u));
 +   inst = emit(MOV(dst_reg(this->vertex_count), src_reg(0u)));
     inst->force_writemask_all = true;
  
     if (c->control_data_header_size_bits > 0) {
 @@ -173,7 +173,7 @@ vec4_gs_visitor::emit_prolog()
         */
        if (c->control_data_header_size_bits <= 32) {
           this->current_annotation = "initialize control data bits";
 -         inst = emit(MOV(dst_reg(this->control_data_bits), 0u));
 +         inst = emit(MOV(dst_reg(this->control_data_bits),

Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

It looks like ARR is generated, as 
src/gallium/state_trackers/nine/nine_shader.c has

#define _OPI(o,t,vv1,vv2,pv1,pv2,d,s,h) \
{ D3DSIO_##o, TGSI_OPCODE_##t, { vv1, vv2 }, { pv1, pv2, }, d, s, h }

[...]

  _OPI(MOVA, ARR, V(2,0), V(3,0), V(0,0), V(0,0), 1, 1, NULL),


Jose


From: mesa-dev mesa-dev-boun...@lists.freedesktop.org on behalf of Eric 
Anholt e...@anholt.net
Sent: 13 November 2014 01:43
To: Ilia Mirkin
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

Ilia Mirkin imir...@alum.mit.edu writes:

 AFAIK at least some of these (NRM, ARR, probably others) were being used by
 the d3d9 state tracker. Not sure what its status is, but I believe the hope
 was to eventually get it into the tree.

They've got code for lowering NRM and CND to sanity, and no use of ARR,
ARA, X2D, RFL, STR, SFL, or BRA.

Re: [Mesa-dev] [PATCH] llvmpipe: Avoid deadlock when unloading opengl32.dll

2014-11-13 Thread Tom Stellard

On Thu, Nov 13, 2014 at 11:10:39AM +, Jose Fonseca wrote:
 Hi Tom,
 
 That's peculiar. It looks like pthreads got into a weird state somehow.  
 Don't precisely understand how though.  Maybe there's a race inside 
 pipe_semaphore_signal() with the destruction of the semaphore.
 
 I think the best thing for now is to revert to old behavior for non-windows 
 platforms:
 
 diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c 
 b/src/gallium/drivers/llvmpipe/lp_rast.c
 index 6b54d43..e168766 100644
 --- a/src/gallium/drivers/llvmpipe/lp_rast.c
 +++ b/src/gallium/drivers/llvmpipe/lp_rast.c
 @@ -800,7 +800,9 @@ static PIPE_THREAD_ROUTINE( thread_function, init_data )
        pipe_semaphore_signal(&task->work_done);
     }
  
 +#ifdef _WIN32
     pipe_semaphore_signal(&task->work_done);
 +#endif
  
     return 0;
  }
 @@ -891,7 +893,11 @@ void lp_rast_destroy( struct lp_rasterizer *rast )
      * We don't actually call pipe_thread_wait to avoid dead lock on Windows
      * per https://bugs.freedesktop.org/show_bug.cgi?id=76252 */
     for (i = 0; i < rast->num_threads; i++) {
 +#ifdef _WIN32
        pipe_semaphore_wait(&rast->tasks[i].work_done);
 +#else
 +      pipe_thread_wait(rast->threads[i]);
 +#endif
     }
  
     /* Clean up per-thread data */
 
 
 Because I don't think that the Windows deadlock ever happens on Linux.
 

This solution works for me.  Feel free to commit.

I wonder if the problem may be the pipe-loader is unloading pipe_swrast.so
before all the threads have finished.

-Tom
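
For readers following along, here is a minimal sketch of why joining the thread is the safer teardown on POSIX systems. It assumes plain pthreads and POSIX semaphores rather than the Gallium pipe_* wrappers, and every name in it (struct worker, worker_start, worker_stop, run_one_worker) is hypothetical. The point is only that a semaphore post tells you the worker is about to return, while a join tells you it has fully exited, which is what matters if the code it runs is about to be unloaded.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical stand-in for a rasterizer worker task. */
struct worker {
   pthread_t thread;
   sem_t work_done;   /* posted by the worker just before it returns */
   int finished;      /* written by the worker, read after teardown */
};

void *worker_main(void *arg)
{
   struct worker *w = arg;

   w->finished = 1;
   /* After this post the thread is still executing code from this module. */
   sem_post(&w->work_done);
   return NULL;
}

int worker_start(struct worker *w)
{
   w->finished = 0;
   if (sem_init(&w->work_done, 0, 0) != 0)
      return -1;
   if (pthread_create(&w->thread, NULL, worker_main, w) != 0) {
      sem_destroy(&w->work_done);
      return -1;
   }
   return 0;
}

/* Non-Windows style teardown: join, so the thread is guaranteed to be gone
 * before the semaphore (or the library containing worker_main) goes away. */
int worker_stop(struct worker *w)
{
   if (pthread_join(w->thread, NULL) != 0)
      return -1;
   sem_destroy(&w->work_done);
   return w->finished;
}

/* Convenience wrapper: start one worker and tear it down again. */
int run_one_worker(void)
{
   struct worker w;

   if (worker_start(&w) != 0)
      return -1;
   return worker_stop(&w);
}
```

Waiting on work_done alone would return while worker_main may still be between sem_post() and its return, which is consistent with the unloading race suspected above, though this sketch does not prove that is what happens in llvmpipe.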


 Jose
 
 
 
 From: Tom Stellard t...@stellard.net
 Sent: 13 November 2014 01:45
 To: Jose Fonseca
 Cc: mesa-dev@lists.freedesktop.org; Roland Scheidegger
 Subject: Re: [Mesa-dev] [PATCH] llvmpipe: Avoid deadlock when unloading 
 opengl32.dll
 
 On Fri, Nov 07, 2014 at 04:52:25PM +, jfons...@vmware.com wrote:
  From: José Fonseca jfons...@vmware.com
 
 
 Hi Jose,
 
 This patch is causing random segfaults with OpenCL programs on radeonsi.
 I haven't been able to figure out exactly what is happening, so I was
 hoping you could help.
 
 I think the problem has something to do with the fact that when clover
 probes the hardware for OpenCL devices, the pipe_loader creates an
 llvmpipe screen, checks the value of PIPE_CAP_COMPUTE, and then destroys
 the screen since PIPE_CAP_COMPUTE is 0.
 
 The only way I can reproduce this bug is by running the piglit OpenCL
 tests concurrently.  If it helps, here are the stack traces
 from one of the core dumps I captured from a piglit run:
 
 (gdb) thread 1
 [Switching to thread 1 (Thread 0x7f6d53cdf700 (LWP 18653))]
 #0  0x7f6d53e56d2d in ?? ()
 (gdb) bt
 #0  0x7f6d53e56d2d in ?? ()
 #1  0x in ?? ()
 (gdb) thread 2
 [Switching to thread 2 (Thread 0x7f6d5495f700 (LWP 18652))]
 #0  0x7f6d5aacd44c in pthread_cond_wait () from /lib64/libpthread.so.0
 (gdb) bt
 #0  0x7f6d5aacd44c in pthread_cond_wait () from /lib64/libpthread.so.0
 #1  0x7f6d54c71dbb in mtx_init (mtx=0x7f6d54c71dbb mtx_init+97,type=0) 
 at ../../../../../include/c11/threads_posix.h:182
 #2  0x7f6d54c72157 in radeon_set_fd_access 
 (applier=0x61e828,owner=0x61e800, mutex=0x7f6d54c71dbb mtx_init+97, 
 request=0,request_name=0x0, enable=238 '\356') at radeon_drm_winsys.c:70
 #3  0x7f6d54c7ad30 in radeon_drm_cs_emit_ioctl (param=0x61e4f0) at 
 radeon_drm_winsys.c:598
 #4  0x7f6d54c71ce0 in cnd_wait (cond=0x61e4f0, mtx=0x7f6d54c7ad07 
 radeon_drm_cs_emit_ioctl+168) at 
 ../../../../../include/c11/threads_posix.h:152
 #5  0x7f6d5aac91da in start_thread () from /lib64/libpthread.so.0
 #6  0x7f6d5afd5d7d in clone () from /lib64/libc.so.6
 (gdb) thread 3
 [Switching to thread 3 (Thread 0x7f6d5c20c740 (LWP 18649))]
 #0  0x7f6d5afae73e in re_node_set_insert_last () from /lib64/libc.so.6
 (gdb) bt
 #0  0x7f6d5afae73e in re_node_set_insert_last () from /lib64/libc.so.6
 #1  0x7f6d5afae7fe in register_state () from /lib64/libc.so.6
 #2  0x7f6d5afb1d39 in re_acquire_state_context () from /lib64/libc.so.6
 #3  0x7f6d5afbaa95 in re_compile_internal () from /lib64/libc.so.6
 #4  0x7f6d5afbb603 in regcomp () from /lib64/libc.so.6
 #5  0x00403e9b in regex_get_matches (src=0x63e6c0 float, 
 pattern=0x40b940 ^ulong|ulong2|ulong3|ulong4|ulong8|ulong16$, pmatch=0x0, 
 size=0, cflags=4) at 
 /home/tstellar/piglit/tests/cl/program/program-tester.c:476
 #6  0x004040e2 in regex_match (src=0x63e6c0 float, pattern=0x40b940 
 ^ulong|ulong2|ulong3|ulong4|ulong8|ulong16$) at 
 /home/tstellar/piglit/tests/cl/program/program-tester.c:532
 #7  0x004059c6 in get_test_arg (src=0x63de70 1 buffer float[7] 0.5 
 -0.5 0.0 -0.0 nan -3.99 1.5, test=0x645710, arg_in=true) at 
 /home/tstellar/piglit/tests/cl/program/program-tester.c:1016
 #8  0x00406f4a in parse_config ( config_str=0x63fe30 
 \n[config]\nname: Test float trunc built-in on CL 1.1\nclc_version_min: 
 10\ndimensions: 1\n\n[test]\nname: trunc

[Mesa-dev] [PATCH] linker: Add a missing space in an error message

2014-11-13 Thread Neil Roberts

---
 src/glsl/linker.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index bd2aa3c..41d6a82 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -2411,7 +2411,7 @@ reserve_explicit_locations(struct gl_shader_program *prog,
   * or linker error will be generated.
   */
  linker_error(prog,
-                      "location qualifier for uniform %s overlaps"
+                      "location qualifier for uniform %s overlaps "
                       "previously used location",
                       var->name);
  return false;
-- 
1.9.3


Re: [Mesa-dev] [PATCH] linker: Add a missing space in an error message

2014-11-13 Thread Brian Paul


On 11/13/2014 08:32 AM, Neil Roberts wrote:

---
  src/glsl/linker.cpp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index bd2aa3c..41d6a82 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -2411,7 +2411,7 @@ reserve_explicit_locations(struct gl_shader_program *prog,
* or linker error will be generated.
*/
   linker_error(prog,
-                      "location qualifier for uniform %s overlaps"
+                      "location qualifier for uniform %s overlaps "
                       "previously used location",
                       var->name);
   return false;



Reviewed-by: Brian Paul bri...@vmware.com


Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

2014-11-13 Thread Brian Paul


On 11/12/2014 06:18 PM, Eric Anholt wrote:

This series removes a bunch of unused opcodes, mostly from TGSI.  It
doesn't go as far as we could possibly go -- while I welcome discussion
for future patch series deleting more, I hope that discussion doesn't
derail the review process for these changes.

I haven't messed with the subroutine stuff, since I don't know what people
are planning with that.  I also haven't messed with the pack/unpack
opcodes in TGSI, since they might be useful for some of the GLSL packing
stuff.

Testing status: compile-tested ilo/r600/softpipe, touch-tested softpipe.


Except for what Jose said, this looks fine to me.

Does anyone remember if there was a reason that the TGSI_OPCODE_ tokens 
are #defines instead of an enumeration?


-Brian



Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

2014-11-13 Thread Marek Olšák

Nine can lower ARR into ROUND+ARL easily.

Marek

On Thu, Nov 13, 2014 at 3:33 PM, Jose Fonseca jfons...@vmware.com wrote:
 It looks like ARR is generated, as 
 src/gallium/state_trackers/nine/nine_shader.c has

 #define _OPI(o,t,vv1,vv2,pv1,pv2,d,s,h) \
 { D3DSIO_##o, TGSI_OPCODE_##t, { vv1, vv2 }, { pv1, pv2, }, d, s, h }

 [...]

   _OPI(MOVA, ARR, V(2,0), V(3,0), V(0,0), V(0,0), 1, 1, NULL),


 Jose

 
 From: mesa-dev mesa-dev-boun...@lists.freedesktop.org on behalf of Eric 
 Anholt e...@anholt.net
 Sent: 13 November 2014 01:43
 To: Ilia Mirkin
 Cc: mesa-dev@lists.freedesktop.org
 Subject: Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

 Ilia Mirkin imir...@alum.mit.edu writes:

 AFAIK at least some of these (NRM, ARR, probably others) were being used by
 the d3d9 state tracker. Not sure what its status is, but I believe the hope
 was to eventually get it into the tree.

 They've got code for lowering NRM and CND to sanity, and no use of ARR,
 ARA, X2D, RFL, STR, SFL, or BRA.

Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

2014-11-13 Thread Marek Olšák

This looks good to me. Other candidates for removal:

SUB (same as ADD with the Negate bit inverted)
CLAMP (same as MIN+MAX), some drivers don't implement this
ABS (same as MOV with the Abs bit set)
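
A small sketch of the three equivalences listed above, written as scalar C rather than TGSI. The operand model (struct src_operand with absolute and negate bits, applied in that order) and all function names are assumptions made for illustration, not Mesa's actual data structures.

```c
#include <assert.h>

/* Hypothetical model of a source operand with its modifier bits. */
struct src_operand {
   float value;
   int absolute;  /* take |value| first */
   int negate;    /* then flip the sign */
};

/* Apply the source modifiers the way an operand fetch stage would. */
float fetch(struct src_operand s)
{
   float v = s.value;

   if (s.absolute && v < 0.0f)
      v = -v;
   if (s.negate)
      v = -v;
   return v;
}

float op_add(struct src_operand a, struct src_operand b)
{
   return fetch(a) + fetch(b);
}

float op_mov(struct src_operand a)
{
   return fetch(a);
}

float op_max(float a, float b) { return a > b ? a : b; }
float op_min(float a, float b) { return a < b ? a : b; }

/* SUB a, b  ->  ADD a, b with b's negate bit inverted. */
float lowered_sub(float a, float b)
{
   struct src_operand sa = { a, 0, 0 };
   struct src_operand sb = { b, 0, 0 };

   sb.negate = !sb.negate;
   return op_add(sa, sb);
}

/* ABS a  ->  MOV a with the abs bit set. */
float lowered_abs(float a)
{
   struct src_operand sa = { a, 1, 0 };

   return op_mov(sa);
}

/* CLAMP x, lo, hi  ->  MAX against lo, then MIN against hi. */
float lowered_clamp(float x, float lo, float hi)
{
   return op_min(op_max(x, lo), hi);
}
```

For example, lowered_sub(5, 3) goes through the ADD path only and still yields 2, which is the sense in which the dedicated opcode is redundant.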

Marek



On Thu, Nov 13, 2014 at 2:18 AM, Eric Anholt e...@anholt.net wrote:
 This series removes a bunch of unused opcodes, mostly from TGSI.  It
 doesn't go as far as we could possibly go -- while I welcome discussion
 for future patch series deleting more, I hope that discussion doesn't
 derail the review process for these changes.

 I haven't messed with the subroutine stuff, since I don't know what people
 are planning with that.  I also haven't messed with the pack/unpack
 opcodes in TGSI, since they might be useful for some of the GLSL packing
 stuff.

 Testing status: compile-tested ilo/r600/softpipe, touch-tested softpipe.


Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

2014-11-13 Thread Ilia Mirkin

As long as we have NAND, pretty much anything can be lowered to
that... I am, of course, not advocating keeping around every insane
instruction, but it does seem a bit arbitrary as to which ones we have
and which ones we don't... I am personally guilty of adding a bunch,
and it was never clear to me how much should be left to the backend
optimizer to un-lower and how much should be done as separate
instructions.

My take was that as long as there was a state tracker providing it as
input, it made sense to keep the instruction. But perhaps there's a
different policy that'd work better.

Cheers,

  -ilia


On Thu, Nov 13, 2014 at 11:40 AM, Marek Olšák mar...@gmail.com wrote:
 Nine can lower ARR into ROUND+ARL easily.

 Marek

 On Thu, Nov 13, 2014 at 3:33 PM, Jose Fonseca jfons...@vmware.com wrote:
 It looks like ARR is generated, as 
 src/gallium/state_trackers/nine/nine_shader.c has

 #define _OPI(o,t,vv1,vv2,pv1,pv2,d,s,h) \
 { D3DSIO_##o, TGSI_OPCODE_##t, { vv1, vv2 }, { pv1, pv2, }, d, s, h }

 [...]

   _OPI(MOVA, ARR, V(2,0), V(3,0), V(0,0), V(0,0), 1, 1, NULL),


 Jose

 
 From: mesa-dev mesa-dev-boun...@lists.freedesktop.org on behalf of Eric 
 Anholt e...@anholt.net
 Sent: 13 November 2014 01:43
 To: Ilia Mirkin
 Cc: mesa-dev@lists.freedesktop.org
 Subject: Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

 Ilia Mirkin imir...@alum.mit.edu writes:

 AFAIK at least some of these (NRM, ARR, probably others) were being used by
 the d3d9 state tracker. Not sure what its status is, but I believe the hope
 was to eventually get it into the tree.

 They've got code for lowering NRM and CND to sanity, and no use of ARR,
 ARA, X2D, RFL, STR, SFL, or BRA.

Re: [Mesa-dev] [PATCH 0/9] i915: Gen2 texturing fixes and a few random patches

2014-11-13 Thread Ville Syrjälä

On Wed, Aug 06, 2014 at 09:56:30PM +0300, ville.syrj...@linux.intel.com wrote:
 From: Ville Syrjälä ville.syrj...@linux.intel.com
 
 I had a few rainy days during my summer vacation so I decided to fix a
 chromium-bsu texturing problem that was nagging me for a while now. I
 ended up fixing a few other things too that I spotted mostly from reading
 the code.
 
 The aniso vs. mip filter thing probably comes down to personal preference,
 but at least to me aniso+mip nearest looks better than trilinear. At least
 when playing the old classic glaxium :)
 
 I have no idea if the scissor patch makes any difference anywhere. I just
 caught the note in the spec and noticed we're doing it in the opposite order.
 
 The rest should be pretty clear.
 
 Ville Syrjälä (9):
   i915: Fix GL_DOT3_RGBA a bit
   i915: Use L8A8 instead of I8 to simulate A8 on gen2
   i915: Override mip filter to nearest with aniso
   i915: Accept GL_DEPTH_STENCIL GL_DEPTH_COMPONENT formats for
 renderbuffers
   i915: Kill intel_context::hw_stencil
   i915: Protect macro argument for TEXTURE_SET()
   i915: Don't call _mesa_meta_glsl_Clear() on gen2
   i915: Emit 3DSTATE_SCISSOR_RECTANGLE_0 before 3DSTATE_SCISSOR_ENABLE

I finally got around to pushing the reviewed patches from this series.
Thanks for the reviews.

   i915: Only use TEXCOORDTYPE_VECTOR with cube maps on gen2

This one is still lacking a review though, and it's actually for the
original bug I set out to fix. So I'd appreciate it if someone could take
a look at the patch.

There's also this gen3 specific patch I did that I would like to get
reviewed: http://patchwork.freedesktop.org/patch/31661/

I also have to confess to having a decent pile of more vertex related
i915 patches sitting in a branch, one which actually makes glxgears
render correctly on gen2 ;) I'd like to post those too but I wanted
to get the old stuff out of the way first...

-- 
Ville Syrjälä
Intel OTC

Re: [Mesa-dev] [PATCH] i965: Always enable VF statistics

2014-11-13 Thread Eric Anholt

Kenneth Graunke kenn...@whitecape.org writes:

 On Wednesday, November 12, 2014 06:54:31 PM Ben Widawsky wrote:
 Every other unit in the geometry pipeline automatically enables
 statistics gathering. This part of the pipe has been controlled by the
 DEBUG_STATS variable, but this is asymmetric. This dates back to the
 original implementation, and I am not sure if there is a reason for it.
 
 I need access to these stats to implement ARB_pipeline_statistics_query.
 
 Eric wrote it, and Ken touched it last. Do you have any opposition?
 
 Cc: Eric Anholt e...@anholt.net
 Cc: Kenneth Graunke kenn...@whitecape.org
 Signed-off-by: Ben Widawsky b...@bwidawsk.net
 ---
  src/mesa/drivers/dri/i965/brw_misc_state.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c 
 b/src/mesa/drivers/dri/i965/brw_misc_state.c
 index 99fcddc..2c40814 100644
 --- a/src/mesa/drivers/dri/i965/brw_misc_state.c
 +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
 @@ -929,8 +929,7 @@ brw_upload_invariant_state(struct brw_context *brw)
 const uint32_t _3DSTATE_VF_STATISTICS =
is_965 ? GEN4_3DSTATE_VF_STATISTICS : GM45_3DSTATE_VF_STATISTICS;
 BEGIN_BATCH(1);
 -   OUT_BATCH(_3DSTATE_VF_STATISTICS << 16 |
 -             (unlikely(INTEL_DEBUG & DEBUG_STATS) ? 1 : 0));
 +   OUT_BATCH(_3DSTATE_VF_STATISTICS << 16 | 1);
 ADVANCE_BATCH();
  }

 My only complaint about this patch is that it doesn't go far enough.  I'm
 100% for removing DEBUG_STATS completely.  I've never seen any performance
 penalty for enabling statistics.  I think we should leave them on except when
 there's some reason to turn them off (e.g. the brw->meta_in_progress flag in
 the clipper, which prevents us from counting things like glClear).

Presumably there's some tiny power cost.  I don't know if it's a relevant
amount of power, though.
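
To make the one-line change concrete, here is a sketch of how the command dword is assembled before and after the patch. The opcode value below is a made-up placeholder (HYPOTHETICAL_VF_STATISTICS_OPCODE), not the real GEN4 or GM45 value, and the function names are hypothetical; only the shift-by-16 and the low enable bit come from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder only: NOT the real 3DSTATE_VF_STATISTICS opcode. */
#define HYPOTHETICAL_VF_STATISTICS_OPCODE 0x1234u
#define VF_STATISTICS_ENABLE              1u

/* Old behaviour: the enable bit follows the debug flag. */
uint32_t vf_statistics_dword_old(uint32_t opcode, int debug_stats)
{
   return opcode << 16 | (debug_stats ? VF_STATISTICS_ENABLE : 0u);
}

/* New behaviour: statistics gathering is always enabled. */
uint32_t vf_statistics_dword_new(uint32_t opcode)
{
   return opcode << 16 | VF_STATISTICS_ENABLE;
}
```

With the placeholder opcode, the old path yields 0x12340000 unless the debug flag is set, while the new path always yields 0x12340001.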



[Mesa-dev] [PATCH 06/13 v2] gallium: Drop the unused ARA opcode.

2014-11-13 Thread Eric Anholt

Nothing in the tree generated it.
---

The rest of the rebase to deal with the conflicts with this can be found at
tgsi-opcode-nuke-2 of my Mesa tree.  (CND is also left in there)

 src/gallium/auxiliary/gallivm/lp_bld_tgsi.c | 1 -
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c | 6 --
 src/gallium/auxiliary/tgsi/tgsi_exec.c  | 4 
 src/gallium/auxiliary/tgsi/tgsi_info.c  | 2 +-
 src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h| 1 -
 src/gallium/docs/source/tgsi.rst| 8 
 src/gallium/drivers/ilo/shader/toy_tgsi.c   | 2 --
 src/gallium/drivers/r300/r300_tgsi_to_rc.c  | 1 -
 src/gallium/drivers/r600/r600_shader.c  | 6 +++---
 src/gallium/include/pipe/p_shader_tokens.h  | 2 +-
 10 files changed, 5 insertions(+), 28 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
index 4a9ce37..44a44a6 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
@@ -212,7 +212,6 @@ lp_build_tgsi_inst_llvm(
case TGSI_OPCODE_UP4B:
case TGSI_OPCODE_UP4UB:
case TGSI_OPCODE_X2D:
-   case TGSI_OPCODE_ARA:
case TGSI_OPCODE_BRA:
case TGSI_OPCODE_PUSHA:
case TGSI_OPCODE_POPA:
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
index 3b9833a..ed1798d 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
@@ -798,12 +798,6 @@ lp_emit_instruction_aos(
   return FALSE;
   break;
 
-   case TGSI_OPCODE_ARA:
-  /* deprecated */
-  assert(0);
-  return FALSE;
-  break;
-
case TGSI_OPCODE_ARR:
       src0 = lp_build_emit_fetch(&bld->bld_base, inst, 0, LP_CHAN_ALL);
       dst0 = lp_build_round(&bld->bld_base.base, src0);
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index b3ea82f..578d4d8 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -3912,10 +3912,6 @@ exec_instruction(
   exec_x2d(mach, inst);
   break;
 
-   case TGSI_OPCODE_ARA:
-  assert (0);
-  break;
-
case TGSI_OPCODE_ARR:
   exec_vector_unary(mach, inst, micro_arr, TGSI_EXEC_DATA_INT, 
TGSI_EXEC_DATA_FLOAT);
   break;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c 
b/src/gallium/auxiliary/tgsi/tgsi_info.c
index d17426f..b94f5ac 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_info.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
@@ -97,7 +97,7 @@ static const struct tgsi_opcode_info 
opcode_info[TGSI_OPCODE_LAST] =
    { 1, 1, 0, 0, 0, 0, COMP, "UP4B", TGSI_OPCODE_UP4B },
    { 1, 1, 0, 0, 0, 0, COMP, "UP4UB", TGSI_OPCODE_UP4UB },
    { 1, 3, 0, 0, 0, 0, COMP, "X2D", TGSI_OPCODE_X2D },
-   { 1, 1, 0, 0, 0, 0, COMP, "ARA", TGSI_OPCODE_ARA },
+   { 0, 1, 0, 0, 0, 1, NONE, "", 60 },  /* removed */
    { 1, 1, 0, 0, 0, 0, COMP, "ARR", TGSI_OPCODE_ARR },
    { 0, 1, 0, 0, 0, 0, NONE, "BRA", TGSI_OPCODE_BRA },
    { 0, 0, 0, 1, 0, 0, NONE, "CAL", TGSI_OPCODE_CAL },
diff --git a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h 
b/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
index b121d32..60ecb2d 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
@@ -112,7 +112,6 @@ OP11(UP2US)
 OP11(UP4B)
 OP11(UP4UB)
 OP13(X2D)
-OP11(ARA)
 OP11(ARR)
 OP01(BRA)
 OP00_LBL(CAL)
diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index c912ec5..2138b18 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -701,14 +701,6 @@ This instruction replicates its result.
Considered for removal.
 
 
-.. opcode:: ARA - Address Register Add
-
-  TBD
-
-.. note::
-
-   Considered for removal.
-
 .. opcode:: ARR - Address Register Load With Round
 
 .. math::
diff --git a/src/gallium/drivers/ilo/shader/toy_tgsi.c 
b/src/gallium/drivers/ilo/shader/toy_tgsi.c
index 1bf9f21..b71d577 100644
--- a/src/gallium/drivers/ilo/shader/toy_tgsi.c
+++ b/src/gallium/drivers/ilo/shader/toy_tgsi.c
@@ -854,7 +854,6 @@ static const toy_tgsi_translate 
aos_translate_table[TGSI_OPCODE_LAST] = {
[TGSI_OPCODE_UP4B] = aos_unsupported,
[TGSI_OPCODE_UP4UB]= aos_unsupported,
[TGSI_OPCODE_X2D]  = aos_unsupported,
-   [TGSI_OPCODE_ARA]  = aos_unsupported,
[TGSI_OPCODE_ARR]  = aos_simple,
[TGSI_OPCODE_BRA]  = aos_unsupported,
[TGSI_OPCODE_CAL]  = aos_unsupported,
@@ -1404,7 +1403,6 @@ static const toy_tgsi_translate 
soa_translate_table[TGSI_OPCODE_LAST] = {
[TGSI_OPCODE_UP4B] = soa_unsupported,
[TGSI_OPCODE_UP4UB]= soa_unsupported,
[TGSI_OPCODE_X2D]  = soa_unsupported,
-   [TGSI_OPCODE_ARA]  = soa_unsupported,
[TGSI_OPCODE_ARR]  = soa_per_channel,
[TGSI_OPCODE_BRA]  = soa_unsupported,
[TGSI_OPCODE_CAL]  =

Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

2014-11-13 Thread Roland Scheidegger

It looks like ARR is mildly useful though, as hw can often implement it
natively and it benefits at least one state tracker (not that an
optimizing backend couldn't recognize ROUND+ARL, but llvmpipe wouldn't, at
least right now).
So maybe it would be better to keep it for now.

Roland

Am 13.11.2014 um 17:40 schrieb Marek Olšák:
 Nine can lower ARR into ROUND+ARL easily.
 
 Marek
 
 On Thu, Nov 13, 2014 at 3:33 PM, Jose Fonseca jfons...@vmware.com wrote:
 It looks like ARR is generated, as 
 src/gallium/state_trackers/nine/nine_shader.c has

 #define _OPI(o,t,vv1,vv2,pv1,pv2,d,s,h) \
 { D3DSIO_##o, TGSI_OPCODE_##t, { vv1, vv2 }, { pv1, pv2, }, d, s, h }

 [...]

   _OPI(MOVA, ARR, V(2,0), V(3,0), V(0,0), V(0,0), 1, 1, NULL),


 Jose

 
 From: mesa-dev mesa-dev-boun...@lists.freedesktop.org on behalf of Eric 
 Anholt e...@anholt.net
 Sent: 13 November 2014 01:43
 To: Ilia Mirkin
 Cc: mesa-dev@lists.freedesktop.org
 Subject: Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

 Ilia Mirkin imir...@alum.mit.edu writes:

 AFAIK at least some of these (NRM, ARR, probably others) were being used by
 the d3d9 state tracker. Not sure what its status is, but I believe the hope
 was to eventually get it into the tree.

 They've got code for lowering NRM and CND to sanity, and no use of ARR,
 ARA, X2D, RFL, STR, SFL, or BRA.

Re: [Mesa-dev] [PATCH 3/4] i965/vec4: Combine all the math emitters.

2014-11-13 Thread Kenneth Graunke

On Thursday, November 13, 2014 03:09:22 PM Francisco Jerez wrote:
 Kenneth Graunke kenn...@whitecape.org writes:
 
  On Wednesday, November 12, 2014 09:57:30 PM Matt Turner wrote:
  On Wed, Nov 12, 2014 at 9:35 PM, Kenneth Graunke kenn...@whitecape.org 
  wrote:
   +vec4_visitor::emit_math(enum opcode opcode,
   +   dst_reg dst, src_reg src0, src_reg src1)
  
  I think you can make the arguments const references too?
 
  Yeah.  I've changed the prototype to:
 
   void emit_math(enum opcode opcode, const dst_reg &dst,
                  const src_reg &src0, const src_reg &src1 = src_reg());
  
   It also meant changing the first few lines to:
  
      vec4_instruction *math =
         emit(opcode, dst, fix_math_operand(src0), fix_math_operand(src1))
  
   since src0 = fix_math_operand(src0) doesn't work with const src_reg &.
  
    +   if (brw->gen == 6 && dst.writemask != WRITEMASK_XYZW) {
    +      /* MATH on Gen6 must be align1, so we can't do writemasks. */
    +      math->dst = dst_reg(this, glsl_type::vec4_type);
    +      math->dst.type = dst.type;
    +      math->dst.writemask = WRITEMASK_XYZW;
  
  I don't think you need to set the writemask (XYZW is the default).
 
  I do, actually - it's guaranteed to not be XYZW at this point.  The caller 
  passed us a destination register with some writemask set.  We create the 
  math instruction using dst, so it inherits that writemask.  This block 
  executes when dst.writemask != WRITEMASK_XYZW.
 
  The point is to override it back to XYZW, since it isn't.
 
 Are you sure?  You are assigning a newly created dst_reg() to math->dst,
 so it should have the default writemask for a vec4, which is XYZW
 already.  With that fixed and the change you mention above this patch
 is:
 Reviewed-by: Francisco Jerez curroje...@riseup.net
 
 I had a very similar change in my tree, but you beat me to it ;).

You're both right, of course.  I've dropped the XYZW setting.
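
The point settled above can be sketched in a few lines of C. This is a simplified, hypothetical model (struct dst_reg, make_vec4_dst and retarget_math_dst are illustrative names; the real vec4 backend is C++ and its dst_reg carries more state). It assumes only what the thread states: a freshly created vec4 destination already has the XYZW writemask, so setting it again is redundant.

```c
#include <assert.h>

#define WRITEMASK_X    0x1u
#define WRITEMASK_Y    0x2u
#define WRITEMASK_Z    0x4u
#define WRITEMASK_W    0x8u
#define WRITEMASK_XYZW 0xfu

/* Hypothetical, stripped-down destination register. */
struct dst_reg {
   unsigned writemask;
   int type;
};

/* A newly created vec4 destination defaults to writing all four channels. */
struct dst_reg make_vec4_dst(void)
{
   struct dst_reg r;

   r.writemask = WRITEMASK_XYZW;
   r.type = 0;
   return r;
}

/* Replace a partially masked destination with a fresh temporary, copying
 * only the type.  No explicit writemask assignment is needed because the
 * temporary never inherited the caller's mask. */
struct dst_reg retarget_math_dst(struct dst_reg caller_dst)
{
   struct dst_reg tmp = make_vec4_dst();

   tmp.type = caller_dst.type;
   return tmp;
}
```

The caller's mask would only leak through if the temporary were copied from the caller's register, which is the situation the original explanation had in mind.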


Re: [Mesa-dev] [PATCH] i965: Always enable VF statistics

2014-11-13 Thread Ben Widawsky

On Thu, Nov 13, 2014 at 09:47:10AM -0800, Eric Anholt wrote:
 Kenneth Graunke kenn...@whitecape.org writes:
 
  On Wednesday, November 12, 2014 06:54:31 PM Ben Widawsky wrote:
  Every other unit in the geometry pipeline automatically enables
  statistics gathering. This part of the pipe has been controlled by the
  DEBUG_STATS variable, but this is asymmetric. This dates back to the
  original implementation, and I am not sure if there is a reason for it.
  
  I need access to these stats to implement ARB_pipeline_statistics_query.
  
  Eric wrote it, and Ken touched it last. Do you have any opposition?
  
  Cc: Eric Anholt e...@anholt.net
  Cc: Kenneth Graunke kenn...@whitecape.org
  Signed-off-by: Ben Widawsky b...@bwidawsk.net
  ---
   src/mesa/drivers/dri/i965/brw_misc_state.c | 3 +--
   1 file changed, 1 insertion(+), 2 deletions(-)
  
  diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c 
  b/src/mesa/drivers/dri/i965/brw_misc_state.c
  index 99fcddc..2c40814 100644
  --- a/src/mesa/drivers/dri/i965/brw_misc_state.c
  +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
  @@ -929,8 +929,7 @@ brw_upload_invariant_state(struct brw_context *brw)
  const uint32_t _3DSTATE_VF_STATISTICS =
 is_965 ? GEN4_3DSTATE_VF_STATISTICS : GM45_3DSTATE_VF_STATISTICS;
  BEGIN_BATCH(1);
  -   OUT_BATCH(_3DSTATE_VF_STATISTICS << 16 |
  -             (unlikely(INTEL_DEBUG & DEBUG_STATS) ? 1 : 0));
  +   OUT_BATCH(_3DSTATE_VF_STATISTICS << 16 | 1);
  ADVANCE_BATCH();
   }
 
  My only complaint about this patch is that it doesn't go far enough.  I'm
  100% for removing DEBUG_STATS completely.  I've never seen any performance
  penalty for enabling statistics.  I think we should leave them on except
  when there's some reason to turn them off (e.g. the brw->meta_in_progress
  flag in the clipper, which prevents us from counting things like glClear).
 
 Presumably there's some tiny power cost.  I don't know if it's a relevant
 amount of power, though.

I assume it would be totally blown away by the PS stats. So I'm not taking
that as a NAK.

BTW, also:
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86145


-- 
Ben Widawsky, Intel Open Source Technology Center

Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

Initially TGSI used to be a union of all possible opcodes (NV/ARB fp/vp, Mesa 
IR, D3D Shader Model 1.x, 2.x, more recently D3D10).

But in practice it's just too much of a hassle, and many of the opcodes were 
never handled or generated. Many received little to no testing.

Particularly when implementing drivers for modern hardware that doesn't have 
opcodes matching the quirky semantics of Shader Model 1.x and 2.x, they are 
distractions.  Furthermore, the apps that used to generate them are simple by 
today's standards and run fine on fast modern hardware.

With a smaller set of opcodes, each one can be tested more easily, so one can 
have more confidence that they actually work as intended; and developing 
analysis/optimization/transformation passes becomes easier too.


But I have no definite answer on which should or should not be in TGSI.  
D3D10/11 assembly is not a bad reference, but it has some omissions.  It's a 
matter of deciding case by case.

Jose


From: ibmir...@gmail.com ibmir...@gmail.com on behalf of Ilia Mirkin 
imir...@alum.mit.edu
Sent: 13 November 2014 17:13
To: Marek Olšák
Cc: Jose Fonseca; Eric Anholt; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

As long as we have NAND, pretty much anything can be lowered to
that... I am, of course, not advocating keeping around every insane
instruction, but it does seem a bit arbitrary as to which ones we have
and which ones we don't... I am personally guilty of adding a bunch,
and it was never clear to me how much should be left to the backend
optimizer to un-lower and how much should be done as separate
instructions.

My take was that as long as there was a state tracker providing it as
input, it made sense to keep the instruction. But perhaps there's a
different policy that'd work better.

Cheers,

  -ilia


On Thu, Nov 13, 2014 at 11:40 AM, Marek Olšák mar...@gmail.com wrote:
 Nine can lower ARR into ROUND+ARL easily.

 Marek

 On Thu, Nov 13, 2014 at 3:33 PM, Jose Fonseca jfons...@vmware.com wrote:
 It looks like ARR is generated, as 
 src/gallium/state_trackers/nine/nine_shader.c has

 #define _OPI(o,t,vv1,vv2,pv1,pv2,d,s,h) \
 { D3DSIO_##o, TGSI_OPCODE_##t, { vv1, vv2 }, { pv1, pv2, }, d, s, h }

 [...]

   _OPI(MOVA, ARR, V(2,0), V(3,0), V(0,0), V(0,0), 1, 1, NULL),


 Jose

 
 From: mesa-dev mesa-dev-boun...@lists.freedesktop.org on behalf of Eric 
 Anholt e...@anholt.net
 Sent: 13 November 2014 01:43
 To: Ilia Mirkin
 Cc: mesa-dev@lists.freedesktop.org
 Subject: Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

 Ilia Mirkin imir...@alum.mit.edu writes:

 AFAIK at least some of these (NRM, ARR, probably others) were being used by
 the d3d9 state tracker. Not sure what its status is, but I believe the hope
 was to eventually get it into the tree.

 They've got code for lowering NRM and CND to sanity, and no use of ARR,
 ARA, X2D, RFL, STR, SFL, or BRA.

Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

I've eliminated our internal dependency on TGSI_OPCODE_CND (by replacing it 
with SUB+CMP).  So you can commit the change to remove it as far as I'm 
concerned.
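
As a rough illustration of the SUB+CMP replacement, here is a scalar sketch. It assumes the per-channel semantics as I understand them from the TGSI documentation (CND selects src0 when src2 > 0.5, CMP selects src1 when src0 < 0); the function names are hypothetical and this is not the code from our tree.

```c
#include <assert.h>

/* Assumed CMP semantics: dst = (src0 < 0) ? src1 : src2 */
float op_cmp(float src0, float src1, float src2)
{
   return src0 < 0.0f ? src1 : src2;
}

float op_sub(float a, float b)
{
   return a - b;
}

/* Assumed CND semantics: dst = (src2 > 0.5) ? src0 : src1 */
float cnd_direct(float src0, float src1, float src2)
{
   return src2 > 0.5f ? src0 : src1;
}

/* CND lowered to SUB + CMP: (0.5 - src2) < 0 exactly when src2 > 0.5. */
float cnd_lowered(float src0, float src1, float src2)
{
   float tmp = op_sub(0.5f, src2);

   return op_cmp(tmp, src0, src1);
}
```

The boundary case src2 == 0.5 selects src1 in both forms, since neither 0.5 > 0.5 nor 0.0 < 0 holds.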

I have mixed feelings about ARR, because the operation it does is essentially 
an iround(), i.e., (int)roundf(), and at least when targeting x86, we can 
generate better code with the combination.
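
To spell out the iround() view and the ROUND+ARL lowering mentioned elsewhere in this thread, here is a scalar sketch. It assumes ARL floors its operand and ROUND rounds to the nearest integer; the tie-breaking rule used here (halfway cases away from zero) is my assumption and may differ from what a given driver does. The helpers avoid libm and are only valid for values that fit in an int; all names are hypothetical.

```c
#include <assert.h>

/* ROUND as assumed here: nearest integer, halfway cases away from zero. */
float op_round(float x)
{
   return (float)(int)(x + (x >= 0.0f ? 0.5f : -0.5f));
}

/* ARL as assumed here: floor, then convert to an integer address. */
int op_arl(float x)
{
   int i = (int)x;   /* conversion truncates toward zero */

   return ((float)i > x) ? i - 1 : i;
}

/* ARR done in one step, i.e. the iround() view. */
int arr_direct(float x)
{
   return (int)(x + (x >= 0.0f ? 0.5f : -0.5f));
}

/* ARR lowered to ROUND followed by ARL. */
int arr_lowered(float x)
{
   return op_arl(op_round(x));
}
```

Because ROUND already produces an integral value, the floor inside ARL is a no-op in the lowered form, which is why a backend that wants the single-instruction version has to pattern-match the pair.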

That said, neither D3D10, GLSL, nor OpenCL C has built-ins for iround(), so 
to benefit from it we'd need to do pattern matching.  So I'm not sure it's 
worth keeping this around just for that...

Jose


From: Jose Fonseca
Sent: 13 November 2014 13:06
To: Eric Anholt; mesa-dev@lists.freedesktop.org
Subject: RE: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

Thanks for doing this.  It has been long overdue.

Unfortunately we are relying on TGSI_OPCODE_CND/TGSI_OPCODE_ARR internally.  
I'm also interested in cutting down used opcodes, so I'll try to replace their 
usage with something else.  But until then please hold on to those two patches.

The rest looks good AFAICT.

Concerning subroutines, we rely on BGNSUB/ENDSUB/CAL extensively.  They are 
quite convenient when translating D3D 9/10 shaders, which also have them.  And 
if one day we need to support recursive subroutines (CUDA 4.0 appears to have 
them; not sure about OpenCL, but I suppose it's only a matter of time), then 
they'll be unavoidable, as in-lining subroutines won't work anymore.

Jose




From: mesa-dev mesa-dev-boun...@lists.freedesktop.org on behalf of Eric 
Anholt e...@anholt.net
Sent: 13 November 2014 01:18
To: mesa-dev@lists.freedesktop.org
Subject: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

This series removes a bunch of unused opcodes, mostly from TGSI.  It
doesn't go as far as we could possibly go -- while I welcome discussion
for future patch series deleting more, I hope that discussion doesn't
derail the review process for these changes.

I haven't messed with the subroutine stuff, since I don't know what people
are planning with that.  I also haven't messed with the pack/unpack
opcodes in TGSI, since they might be useful for some of the GLSL packing
stuff.

Testing status: compile-tested ilo/r600/softpipe, touch-tested softpipe.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 86070] Host application crash on vmware fusion 7 in vmw_swc_flush

https://bugs.freedesktop.org/show_bug.cgi?id=86070

Sinclair Yeh s...@vmware.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #7 from Sinclair Yeh s...@vmware.com ---
This issue has been fixed by ce9a3a8997d86f3bf387f23578972acb5b16ac4ac, which
is in MESA 10.1.0 onwards.  The fix is not trivial to back port to MESA 8.0.4
and so the easiest way is to wait until the next update of Ubuntu 12.04.

There is a temporary workaround if waiting for the next update is not an
option.
1.  Download MESA 10.1.2 from freedesktop
2.  ./configure --prefix=<PATH OF YOUR CHOOSING> --with-gallium-drivers=svga
--enable-xa --disable-dri3
3.  make install

After the above is done, before running mplay-bin, do an export
LD_LIBRARY_PATH=<PATH OF YOUR CHOOSING FROM EARLIER>/lib

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

2014-11-13 Thread Roland Scheidegger

FWIW opencl explicit conversion instructions have optional rounding mode
modifiers.

Roland

On 13.11.2014 at 21:19, Jose Fonseca wrote:
I've eliminated our internal dependency on TGSI_OPCODE_CND (by replacing it
with SUB+CMP), so you can commit the change to remove it as far as I'm concerned.

I have mixed feelings about ARR, because the operation it does is essentially
an iround(), i.e., (int)roundf(), and at least when targeting x86, we can
generate better code with the combination.

That said, neither D3D10, GLSL, nor OpenCL C has a built-in for iround(),
so to be of benefit we'd need to do pattern matching. So I'm not sure
it's worth keeping this around just for that...

Jose

From: Jose Fonseca
Sent: 13 November 2014 13:06
To: Eric Anholt; mesa-dev@lists.freedesktop.org
Subject: RE: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

Thanks for doing this. It has been long overdue.

Unfortunately we are relying on TGSI_OPCODE_CND/TGSI_OPCODE_ARR internally.
I'm also interested in cutting down used opcodes, so I'll try to replace
their usage with something else. But until then please hold on to those two
patches.

The rest looks good AFAICT.

Concerning subroutines, we rely on BGNSUB/ENDSUB/CAL extensively. They are
quite convenient when translating D3D 9/10 shaders, which also have them.
And if one day we need to support recursive subroutines (CUDA 4.0 appears to
have them; not sure about OpenCL, but I suppose it's only a matter of time),
then they'll be unavoidable, as in-lining subroutines won't work anymore.

Jose

From: mesa-dev mesa-dev-boun...@lists.freedesktop.org on behalf of Eric
Anholt e...@anholt.net
Sent: 13 November 2014 01:18
To: mesa-dev@lists.freedesktop.org
Subject: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

This series removes a bunch of unused opcodes, mostly from TGSI. It
doesn't go as far as we could possibly go -- while I welcome discussion
for future patch series deleting more, I hope that discussion doesn't
derail the review process for these changes.

I haven't messed with the subroutine stuff, since I don't know what people
are planning with that. I also haven't messed with the pack/unpack
opcodes in TGSI, since they might be useful for some of the GLSL packing
stuff.

Testing status: compile-tested ilo/r600/softpipe, touch-tested softpipe.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/3] mesa/main: Add sse2 streaming clamping

2014-11-13 Thread Bruno Jimenez

On Wed, 2014-11-12 at 21:47 +0200, Juha-Pekka Heikkila wrote:
 On 12.11.2014 19:36, Bruno Jimenez wrote:
  On Wed, 2014-11-12 at 14:50 +0200, Juha-Pekka Heikkila wrote:
  Signed-off-by: Juha-Pekka Heikkila juhapekka.heikk...@gmail.com
  ---
   src/mesa/Makefile.am  |   8 +++
   src/mesa/main/sse2_clamping.c | 138 
  ++
   src/mesa/main/sse2_clamping.h |  49 +++
   3 files changed, 195 insertions(+)
   create mode 100644 src/mesa/main/sse2_clamping.c
   create mode 100644 src/mesa/main/sse2_clamping.h
 
  diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
  index 932db4f..43dbe87 100644
  --- a/src/mesa/Makefile.am
  +++ b/src/mesa/Makefile.am
  @@ -111,6 +111,10 @@ if SSE41_SUPPORTED
   ARCH_LIBS += libmesa_sse41.la
   endif
   
  +if SSE2_SUPPORTED
  +ARCH_LIBS += libmesa_sse2.la
  +endif
  +
   MESA_ASM_FILES_FOR_ARCH =
   
   if HAVE_X86_ASM
  @@ -155,6 +159,10 @@ libmesa_sse41_la_SOURCES = \
 main/sse_minmax.c
   libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
   
  +libmesa_sse2_la_SOURCES = \
  +  main/sse2_clamping.c
  +libmesa_sse2_la_CFLAGS = $(AM_CFLAGS) -msse2
  +
   pkgconfigdir = $(libdir)/pkgconfig
   pkgconfig_DATA = gl.pc
   
  diff --git a/src/mesa/main/sse2_clamping.c b/src/mesa/main/sse2_clamping.c
  new file mode 100644
  index 000..66c7dc7
  --- /dev/null
  +++ b/src/mesa/main/sse2_clamping.c
  @@ -0,0 +1,138 @@
  +/*
  + * Copyright © 2014 Intel Corporation
  + *
  + * Permission is hereby granted, free of charge, to any person obtaining a
  + * copy of this software and associated documentation files (the 
  Software),
  + * to deal in the Software without restriction, including without 
  limitation
  + * the rights to use, copy, modify, merge, publish, distribute, 
  sublicense,
  + * and/or sell copies of the Software, and to permit persons to whom the
  + * Software is furnished to do so, subject to the following conditions:
  + *
  + * The above copyright notice and this permission notice (including the 
  next
  + * paragraph) shall be included in all copies or substantial portions of 
  the
  + * Software.
  + *
  + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, 
  EXPRESS OR
  + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
  MERCHANTABILITY,
  + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT 
  SHALL
  + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
  OTHER
  + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
  + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
  DEALINGS
  + * IN THE SOFTWARE.
  + *
  + * Authors:
  + *Juha-Pekka Heikkila juhapekka.heikk...@gmail.com
  + *
  + */
  +
  +#ifdef __SSE2__
  +#include "main/macros.h"
  +#include "main/sse2_clamping.h"
  +#include <emmintrin.h>
  +
  +/**
  + * Clamp four float values to [min,max]
  + */
  +static inline void
  +_mesa_clamp_float_rgba(GLfloat src[4], GLfloat result[4], const float min,
  +   const float max)
  +{
  +   __m128  operand, minval, maxval;
  +
  +   operand = _mm_loadu_ps(src);
  +   minval = _mm_set1_ps(min);
  +   maxval = _mm_set1_ps(max);
  +   operand = _mm_max_ps(operand, minval);
  +   operand = _mm_min_ps(operand, maxval);
  +   _mm_storeu_ps(result, operand);
  +}
  +
  +
  +/* Clamp n amount float rgba pixels to [min,max] using SSE2
  + */
  +__attribute__((optimize("unroll-loops")))
  +void
  +_mesa_streaming_clamp_float_rgba(const GLuint n, GLfloat rgba_src[][4],
  + GLfloat rgba_dst[][4], const GLfloat min,
  + const GLfloat max)
  +{
  +   int  c, prefetch_c;
  +   float*   worker = &rgba_src[0][0];
  +   __m128   operand[2], minval, maxval;
  +
  +   _mm_prefetch((char*) (((unsigned long)worker)|0x1f) + 65, _MM_HINT_T0);
 ^^^
  
  Hi,
  
  May I ask why precisely these numbers?
 
 0x1f, as you note below, is a typo; it should be 0x0f. 65 is the cache line
 length plus one, to even out the |0x1f operation.

Hi,

I supposed that it could be something like that, but I wasn't fully
sure, thanks for the answer.

 
  
  +
  +   minval = _mm_set1_ps(min);
  +   maxval = _mm_set1_ps(max);
  +
  +   for (c = n*4; c > 0 && (((unsigned long)worker)&0x1f) != 0; c--, 
  worker++) {
  ^
  
  I guess that this is for alignment, but you only need to align to a 16
  bytes boundary, not 32. Or maybe I am missing something obvious.
  
 
 You are correct, 0x1f is a typo; it should be 0x0f.
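
To make the alignment point concrete, here is a small sketch in plain C (no intrinsics) of the 16-byte test with the corrected 0x0f mask, plus a count of how many floats a scalar prologue would handle before the pointer reaches a 16-byte boundary. The helper names are invented for this note and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero when p sits on a 16-byte boundary, i.e. its low four
 * address bits are clear. Aligned SSE loads need this alignment. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 0x0f) == 0;
}

/* Number of leading floats to process one at a time before the
 * pointer becomes 16-byte aligned, never more than n.
 * Assumes p is at least 4-byte aligned, as float data normally is. */
size_t scalar_prologue_len(const float *p, size_t n)
{
    size_t count = 0;

    while (count < n && !is_aligned16(p + count))
        count++;
    return count;
}

/* Self-check on a buffer with known alignment. */
void check_prologue(void)
{
    _Alignas(16) float buf[8] = { 0 };

    assert(is_aligned16(buf));
    assert(!is_aligned16(buf + 1));
    assert(scalar_prologue_len(buf + 1, 7) == 3);
}
```

With a 0x1f mask the same loop would keep going until a 32-byte boundary, which is still correct but peels off up to four more floats than the aligned SSE loads require.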
 
  +  operand[0] = _mm_load_ss(worker);
  +  operand[0] = _mm_max_ss(operand[0], minval);
  +  operand[0] = _mm_min_ss(operand[0], maxval);
  +  _mm_store_ss(worker, operand[0]);
  +   }
  +
  +   while (c >= 8) {
  +  _mm_prefetch((char*) worker + 64, _MM_HINT_T0);
^^^
  +
  +  for

Re: [Mesa-dev] [PATCH 4/4] i965/fs: Remove is_valid_3src().

2014-11-13 Thread Anuj Phogat

On Wed, Nov 12, 2014 at 11:15 AM, Matt Turner matts...@gmail.com wrote:

 ---
  src/mesa/drivers/dri/i965/brw_fs.cpp | 6 --
  src/mesa/drivers/dri/i965/brw_fs.h   | 1 -
  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +-
  3 files changed, 1 insertion(+), 8 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs.cpp
 index 7003691..9196af9 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
 @@ -622,12 +622,6 @@ fs_reg::is_contiguous() const
 return stride == 1;
  }

 -bool
 -fs_reg::is_valid_3src() const
 -{
 -   return file == GRF || file == UNIFORM;
 -}
 -
  int
  fs_visitor::type_size(const struct glsl_type *type)
  {
 diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
 b/src/mesa/drivers/dri/i965/brw_fs.h
 index 0dae800..9e1dddc 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.h
 +++ b/src/mesa/drivers/dri/i965/brw_fs.h
 @@ -83,7 +83,6 @@ public:
 fs_reg(fs_visitor *v, const struct glsl_type *type);

 bool equals(const fs_reg &r) const;
 -   bool is_valid_3src() const;
 bool is_contiguous() const;

 /** Smear a channel of the reg to all channels. */
 diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 index ce4d8c8..f112466 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
 @@ -514,7 +514,7 @@ fs_visitor::visit(ir_expression *ir)
  ir->operands[operand]->fprint(stderr);
   fprintf(stderr, "\n");
}
 -  assert(this->result.is_valid_3src());
 +  assert(this->result.file == GRF || this->result.file == UNIFORM);
op[operand] = this-result;

/* Matrix expression operands should have been broken down to vector
 --
 2.0.4

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Series is Reviewed-by: Anuj Phogat anuj.pho...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/4] r600g/compute: Stop leaking CL shader RAM/VRAM

shader->code_bo was leaked VRAM.
shader->bc.bytecode and shader->binary.* were leaked system memory.

Signed-off-by: Aaron Watry awa...@gmail.com
---
 src/gallium/drivers/r600/evergreen_compute.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
b/src/gallium/drivers/r600/evergreen_compute.c
index 5389f96..f3ccffd 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -268,6 +268,13 @@ void evergreen_delete_compute_state(struct pipe_context 
*ctx, void* state)
FREE(shader->kernels);
shader->kernels = NULL;
}
+#else
+   pipe_resource_reference(&shader->code_bo, NULL);
+   FREE(shader->bc.bytecode);
+   FREE(shader->binary.code);
+   FREE(shader->binary.config);
+   FREE(shader->binary.global_symbol_offsets);
+   FREE(shader->binary.rodata);
 #endif
 #endif
 
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 4/4] aux/pipe_loader: Don't leak dlerror string on dlopen failure

dlopen allocates a string on dlopen failure which is retrieved via dlerror. In
order to free that string, you need to retrieve and then free it.

In order to keep things legit the windows/other util_dl_error paths allocate
and then copy their error message into a buffer as well.

Signed-off-by: Aaron Watry awa...@gmail.com
CC: Ilia Mirkin imir...@alum.mit.edu

v3: Switch comment to C-Style
v2: Use strdup instead of calloc/strcpy
---
 src/gallium/auxiliary/pipe-loader/pipe_loader.c | 5 +
 src/gallium/auxiliary/util/u_dl.c   | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader.c 
b/src/gallium/auxiliary/pipe-loader/pipe_loader.c
index 8e79f85..7a4e0b1 100644
--- a/src/gallium/auxiliary/pipe-loader/pipe_loader.c
+++ b/src/gallium/auxiliary/pipe-loader/pipe_loader.c
@@ -25,6 +25,8 @@
  *
  **/
 
+#include <dlfcn.h>
+
 #include "pipe_loader_priv.h"
 
 #include "util/u_inlines.h"
@@ -101,6 +103,9 @@ pipe_loader_find_module(struct pipe_loader_device *dev,
  if (lib) {
 return lib;
  }
+
+ /* Retrieve the dlerror() str so that it can be freed properly */
+ FREE(util_dl_error());
   }
}
 
diff --git a/src/gallium/auxiliary/util/u_dl.c 
b/src/gallium/auxiliary/util/u_dl.c
index aca435d..00c4d7c 100644
--- a/src/gallium/auxiliary/util/u_dl.c
+++ b/src/gallium/auxiliary/util/u_dl.c
@@ -87,8 +87,8 @@ util_dl_error(void)
 #if defined(PIPE_OS_UNIX)
return dlerror();
 #elif defined(PIPE_OS_WINDOWS)
-   return "unknown error";
+   return strdup("unknown error");
 #else
-   return "unknown error";
+   return strdup("unknown error");
 #endif
 }
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/4] r600g/compute: Stop leaking CL kernel bytecode/resources

v3: Rebase and add #if guards
v2: fix indentation

Signed-off-by: Aaron Watry awa...@gmail.com
---
 src/gallium/drivers/r600/evergreen_compute.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
b/src/gallium/drivers/r600/evergreen_compute.c
index 4334743..5389f96 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -252,6 +252,25 @@ void evergreen_delete_compute_state(struct pipe_context 
*ctx, void* state)
if (!shader)
return;
 
+#if HAVE_OPENCL
+#if HAVE_LLVM < 0x0306
+   if (shader->kernels) {
+   for (int i = 0; i < shader->num_kernels; i++) {
+   if (shader->kernels[i].code_bo) {
+   pipe_resource_reference(
+   (struct pipe_resource**)&shader->kernels[i].code_bo,
+   NULL
+   );
+   }
+   FREE(shader->kernels[i].bc.bytecode);
+   shader->kernels[i].bc.bytecode = NULL;
+   }
+   FREE(shader->kernels);
+   shader->kernels = NULL;
+   }
+#endif
+#endif
+
if (shader->ctx){
struct pipe_framebuffer_state *fb_state = &shader->ctx->framebuffer.state;
for (int i = fb_state->nr_cbufs - 1; fb_state->nr_cbufs > 0 ; i--){
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/4] r600g/compute: Don't leak cbufs in compute state

Walk the array of cbufs backwards and free all of them.
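
As an illustration of the backwards walk, here is a sketch using hypothetical stand-in types (struct buf_state, MAX_BUFS, plain malloc/free) rather than the real pipe_framebuffer_state and surface_destroy. Releasing from the last slot down keeps the count equal to the number of live entries at every step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_BUFS 8                 /* hypothetical capacity */

struct buf_state {                 /* hypothetical stand-in for fb state */
    unsigned nr_bufs;              /* number of live entries in bufs[] */
    void *bufs[MAX_BUFS];
};

/* Release every buffer, last slot first, clearing each pointer and
 * decrementing the count as we go so the state stays consistent. */
void release_all(struct buf_state *st)
{
    while (st->nr_bufs > 0) {
        unsigned i = st->nr_bufs - 1;

        free(st->bufs[i]);
        st->bufs[i] = NULL;
        st->nr_bufs--;
    }
}

/* Self-check: three allocations are all released and cleared. */
void check_release_all(void)
{
    struct buf_state st = { 0 };

    for (unsigned i = 0; i < 3; i++)
        st.bufs[st.nr_bufs++] = malloc(16);
    release_all(&st);
    assert(st.nr_bufs == 0);
    for (unsigned i = 0; i < 3; i++)
        assert(st.bufs[i] == NULL);
}
```

The sketch drives the loop from the count alone, which avoids keeping a separate index in sync with it.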

v3: Rebase on top of changes since Aug 2014

Signed-off-by: Aaron Watry awa...@gmail.com
---
 src/gallium/drivers/r600/evergreen_compute.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
b/src/gallium/drivers/r600/evergreen_compute.c
index 90fdd79..4334743 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -252,6 +252,15 @@ void evergreen_delete_compute_state(struct pipe_context 
*ctx, void* state)
if (!shader)
return;
 
+   if (shader->ctx){
+   struct pipe_framebuffer_state *fb_state = &shader->ctx->framebuffer.state;
+   for (int i = fb_state->nr_cbufs - 1; fb_state->nr_cbufs > 0 ; i--){
+   shader->ctx->b.b.surface_destroy(ctx, fb_state->cbufs[i]);
+   fb_state->cbufs[i] = NULL;
+   fb_state->nr_cbufs--;
+   }
+   }
+
FREE(shader);
 }
 
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 4/4] aux/pipe_loader: Don't leak dlerror string on dlopen failure

2014-11-13 Thread Ilia Mirkin

On Thu, Nov 13, 2014 at 6:43 PM, Aaron Watry awa...@gmail.com wrote:
 dlopen allocates a string on dlopen failure which is retrieved via dlerror. In
 order to free that string, you need to retrieve and then free it.

Are you basically saying that glibc leaks memory and you're trying to
make up for it? What if you use a non-buggy library? Or is dlopen()
specified in such a way that if it fails, you must free the result of
dlerror? I see nothing in the man pages to suggest that...
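
The question here is who owns the returned string. A sketch of one way to keep ownership unambiguous, without claiming anything about what glibc does internally: treat the platform's message as library-owned and hand callers a private copy that they may always free. Everything below (platform_last_error, set_pending_error, last_error_dup) is a hypothetical stand-in and not the u_dl.c API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a library-owned pending error message. */
static const char *pending_error;

void set_pending_error(const char *msg)
{
    pending_error = msg;
}

/* Hypothetical query in the style of dlerror(): returns the pending
 * message (owned by the library, not to be freed) and clears it. */
const char *platform_last_error(void)
{
    const char *msg = pending_error;

    pending_error = NULL;
    return msg;
}

/* Returns a heap copy the caller must free(), or NULL if there is no
 * pending error (or the allocation failed). */
char *last_error_dup(void)
{
    const char *msg = platform_last_error();
    size_t len;
    char *copy;

    if (!msg)
        return NULL;
    len = strlen(msg) + 1;
    copy = malloc(len);
    if (copy)
        memcpy(copy, msg, len);
    return copy;
}

/* Self-check: the copy matches, is freeable, and the error is cleared. */
void check_last_error_dup(void)
{
    char *err;

    set_pending_error("unknown error");
    err = last_error_dup();
    assert(err != NULL && strcmp(err, "unknown error") == 0);
    free(err);
    assert(last_error_dup() == NULL);
}
```

With a wrapper like this, every platform path returns memory with the same ownership rule, so callers never free a pointer the library still owns.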


 In order to keep things legit the windows/other util_dl_error paths allocate
 and then copy their error message into a buffer as well.

 Signed-off-by: Aaron Watry awa...@gmail.com
 CC: Ilia Mirkin imir...@alum.mit.edu

 v3: Switch comment to C-Style
 v2: Use strdup instead of calloc/strcpy
 ---
  src/gallium/auxiliary/pipe-loader/pipe_loader.c | 5 +
  src/gallium/auxiliary/util/u_dl.c   | 4 ++--
  2 files changed, 7 insertions(+), 2 deletions(-)

 diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader.c 
 b/src/gallium/auxiliary/pipe-loader/pipe_loader.c
 index 8e79f85..7a4e0b1 100644
 --- a/src/gallium/auxiliary/pipe-loader/pipe_loader.c
 +++ b/src/gallium/auxiliary/pipe-loader/pipe_loader.c
 @@ -25,6 +25,8 @@
   *
   **/

 +#include <dlfcn.h>
 +
  #include "pipe_loader_priv.h"

  #include "util/u_inlines.h"
 @@ -101,6 +103,9 @@ pipe_loader_find_module(struct pipe_loader_device *dev,
   if (lib) {
  return lib;
   }
 +
 + /* Retrieve the dlerror() str so that it can be freed properly */
 + FREE(util_dl_error());
}
 }

 diff --git a/src/gallium/auxiliary/util/u_dl.c 
 b/src/gallium/auxiliary/util/u_dl.c
 index aca435d..00c4d7c 100644
 --- a/src/gallium/auxiliary/util/u_dl.c
 +++ b/src/gallium/auxiliary/util/u_dl.c
 @@ -87,8 +87,8 @@ util_dl_error(void)
  #if defined(PIPE_OS_UNIX)
 return dlerror();
  #elif defined(PIPE_OS_WINDOWS)
 -   return "unknown error";
 +   return strdup("unknown error");
  #else
 -   return "unknown error";
 +   return strdup("unknown error");
  #endif
  }
 --
 2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 86070] Host application crash on vmware fusion 7 in vmw_swc_flush

https://bugs.freedesktop.org/show_bug.cgi?id=86070

--- Comment #8 from Nicholas Yue yue.nicho...@gmail.com ---
(In reply to Sinclair Yeh from comment #7)
 This issue has been fixed by ce9a3a8997d86f3bf387f23578972acb5b16ac4ac,
 which is in MESA 10.1.0 onwards.  The fix is not trivial to back port to
 MESA 8.0.4 and so the easiest way is to wait until the next update of Ubuntu
 12.04.
 
 There is a temporary workaround if waiting for the next update is not an
 option.
 1.  Download MESA 10.1.2 from freedesktop
 2.  ./configure --prefix=<PATH OF YOUR CHOOSING> --with-gallium-drivers=svga
 --enable-xa --disable-dri3
 3.  make install
 
 After the above is done, before running mplay-bin, do an export
 LD_LIBRARY_PATH=<PATH OF YOUR CHOOSING FROM EARLIER>/lib

Thanks Sinclair,

  I have built Mesa 10.1.2 as a temporary solution and the Houdini tools are
now running fine on VMWare Fusion 7.

Cheers

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 08/16] i965: Consolidate code to get struct brw_sampler_prog_key_data

This chunk of code is repeated in a few places, and we're going to add
a MESA_SHADER_VERTEX case to it soon.
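
A reduced sketch of the consolidation idea, written in C with hypothetical types (shader_stage, fragment_key, sampler_key_data) in place of the real i965 structures: one helper maps a stage plus an opaque key pointer to the embedded sampler data, so adding a new stage later means adding one case rather than touching every call site.

```c
#include <assert.h>
#include <stddef.h>

enum shader_stage { STAGE_VERTEX, STAGE_FRAGMENT };  /* hypothetical */

struct sampler_key_data {          /* hypothetical sampler key payload */
    unsigned swizzles[16];
};

struct fragment_key {              /* hypothetical per-stage program key */
    int other_state;
    struct sampler_key_data tex;
};

/* Single lookup point: pick the sampler data embedded in the key for
 * the given stage. Unhandled stages return NULL in this sketch; the
 * patch itself marks that path unreachable instead. */
const struct sampler_key_data *
get_tex(enum shader_stage stage, const void *key)
{
    switch (stage) {
    case STAGE_FRAGMENT:
        return &((const struct fragment_key *)key)->tex;
    default:
        return NULL;
    }
}

/* Self-check: the helper returns the address of the embedded member. */
void check_get_tex(void)
{
    struct fragment_key key = { 0 };

    assert(get_tex(STAGE_FRAGMENT, &key) == &key.tex);
    assert(get_tex(STAGE_VERTEX, &key) == NULL);
}
```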

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 37 
 1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 3fc9e39..f36c474 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1696,6 +1696,17 @@ fs_visitor::emit_texture_gen7(ir_texture_opcode op, 
fs_reg dst,
return inst;
 }
 
+static struct brw_sampler_prog_key_data *
+get_tex(gl_shader_stage stage, const void *key)
+{
+   switch (stage) {
+   case MESA_SHADER_FRAGMENT:
+  return &((brw_wm_prog_key*) key)->tex;
+   default:
+  unreachable("unhandled shader stage");
+   }
+}
+
 fs_reg
 fs_visitor::rescale_texcoord(fs_reg coordinate, const glsl_type *coord_type,
  bool is_rect, uint32_t sampler, int texunit)
@@ -1703,10 +1714,7 @@ fs_visitor::rescale_texcoord(fs_reg coordinate, const 
glsl_type *coord_type,
fs_inst *inst = NULL;
bool needs_gl_clamp = true;
fs_reg scale_x, scale_y;
-   const struct brw_sampler_prog_key_data *tex =
-  (stage == MESA_SHADER_FRAGMENT) ?
-  &((brw_wm_prog_key*) this->key)->tex : NULL;
-   assert(tex);
+   struct brw_sampler_prog_key_data *tex = get_tex(stage, this->key);
 
/* The 965 requires the EU to do the normalization of GL rectangle
 * texture coordinates.  We use the program parameter state
@@ -1859,10 +1867,7 @@ fs_visitor::emit_texture(ir_texture_opcode op,
  uint32_t sampler,
  fs_reg sampler_reg, int texunit)
 {
-   const struct brw_sampler_prog_key_data *tex =
-  (stage == MESA_SHADER_FRAGMENT) ?
-  &((brw_wm_prog_key*) this->key)->tex : NULL;
-   assert(tex);
+   struct brw_sampler_prog_key_data *tex = get_tex(stage, this->key);
fs_inst *inst = NULL;
 
if (op == ir_tg4) {
@@ -1952,11 +1957,7 @@ fs_visitor::emit_texture(ir_texture_opcode op,
 void
 fs_visitor::visit(ir_texture *ir)
 {
-   const struct brw_sampler_prog_key_data *tex =
-  (stage == MESA_SHADER_FRAGMENT) ?
-  &((brw_wm_prog_key*) this->key)->tex : NULL;
-   assert(tex);
-
+   const struct brw_sampler_prog_key_data *tex = get_tex(stage, this->key);
uint32_t sampler =
   _mesa_get_sampler_uniform_value(ir->sampler, shader_prog, prog);
 
@@ -2138,10 +2139,7 @@ fs_visitor::emit_gen6_gather_wa(uint8_t wa, fs_reg dst)
 uint32_t
 fs_visitor::gather_channel(int orig_chan, uint32_t sampler)
 {
-   const struct brw_sampler_prog_key_data *tex =
-  (stage == MESA_SHADER_FRAGMENT) ?
-  &((brw_wm_prog_key*) this->key)->tex : NULL;
-   assert(tex);
+   struct brw_sampler_prog_key_data *tex = get_tex(stage, this->key);
int swiz = GET_SWZ(tex->swizzles[sampler], orig_chan);
switch (swiz) {
   case SWIZZLE_X: return 0;
@@ -2181,10 +2179,7 @@ fs_visitor::swizzle_result(ir_texture_opcode op, int 
dest_components,
if (op == ir_txs || op == ir_lod || op == ir_tg4)
   return;
 
-   const struct brw_sampler_prog_key_data *tex =
-  (stage == MESA_SHADER_FRAGMENT) ?
-  &((brw_wm_prog_key*) this->key)->tex : NULL;
-   assert(tex);
+   struct brw_sampler_prog_key_data *tex = get_tex(stage, this->key);
 
if (dest_components == 1) {
   /* Ignore DEPTH_TEXTURE_MODE swizzling. */
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 06/16] i965: Add SIMD8 URB write low-level IR instruction

This is all we need from the generator for SIMD8 vertex shaders.  This
opcode is just the send instruction, all the hard work will happen
in the visitor using LOAD_PAYLOAD.

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_defines.h   |  3 +++
 src/mesa/drivers/dri/i965/brw_fs.cpp  |  4 
 src/mesa/drivers/dri/i965/brw_fs.h|  1 +
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp| 25 +++
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 16 ++-
 src/mesa/drivers/dri/i965/brw_shader.cpp  |  4 
 6 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index ab45d3d..650fdb9 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -903,6 +903,8 @@ enum opcode {
SHADER_OPCODE_GEN4_SCRATCH_WRITE,
SHADER_OPCODE_GEN7_SCRATCH_READ,
 
+   SHADER_OPCODE_URB_WRITE_SIMD8,
+
FS_OPCODE_DDX,
FS_OPCODE_DDY,
FS_OPCODE_PIXEL_X,
@@ -1520,6 +1522,7 @@ enum brw_message_target {
 
 #define BRW_URB_OPCODE_WRITE_HWORD  0
 #define BRW_URB_OPCODE_WRITE_OWORD  1
+#define GEN8_URB_OPCODE_SIMD8_WRITE  7
 
 #define BRW_URB_SWIZZLE_NONE  0
 #define BRW_URB_SWIZZLE_INTERLEAVE1
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index bd44b24..9d07857 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -509,6 +509,7 @@ fs_inst::is_send_from_grf() const
case FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET:
case SHADER_OPCODE_UNTYPED_ATOMIC:
case SHADER_OPCODE_UNTYPED_SURFACE_READ:
+   case SHADER_OPCODE_URB_WRITE_SIMD8:
   return true;
case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
   return src[1].file == GRF;
@@ -919,6 +920,8 @@ fs_inst::regs_read(fs_visitor *v, int arg) const
   return mlen;
} else if (opcode == FS_OPCODE_FB_WRITE && arg == 0) {
   return mlen;
+   } else if (opcode == SHADER_OPCODE_URB_WRITE_SIMD8 && arg == 0) {
+  return mlen;
} else if (opcode == SHADER_OPCODE_UNTYPED_ATOMIC && arg == 0) {
   return mlen;
} else if (opcode == SHADER_OPCODE_UNTYPED_SURFACE_READ && arg == 0) {
@@ -1009,6 +1012,7 @@ fs_visitor::implied_mrf_writes(fs_inst *inst)
   return 2;
case SHADER_OPCODE_UNTYPED_ATOMIC:
case SHADER_OPCODE_UNTYPED_SURFACE_READ:
+   case SHADER_OPCODE_URB_WRITE_SIMD8:
case FS_OPCODE_INTERPOLATE_AT_CENTROID:
case FS_OPCODE_INTERPOLATE_AT_SAMPLE:
case FS_OPCODE_INTERPOLATE_AT_SHARED_OFFSET:
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 7e99b31..457fb4b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -710,6 +710,7 @@ private:
   struct brw_reg implied_header,
   GLuint nr);
void generate_fb_write(fs_inst *inst, struct brw_reg payload);
+   void generate_urb_write(fs_inst *inst, struct brw_reg payload);
void generate_blorp_fb_write(fs_inst *inst);
void generate_pixel_xy(struct brw_reg dst, bool is_x);
void generate_linterp(fs_inst *inst, struct brw_reg dst,
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index fe09ad5..75ee2c7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -236,6 +236,27 @@ fs_generator::generate_fb_write(fs_inst *inst, struct 
brw_reg payload)
 }
 
 void
+fs_generator::generate_urb_write(fs_inst *inst, struct brw_reg payload)
+{
+   brw_inst *insn;
+
+   insn = brw_next_insn(p, BRW_OPCODE_SEND);
+
+   brw_set_dest(p, insn, brw_null_reg());
+   brw_set_src0(p, insn, payload);
+   brw_set_src1(p, insn, brw_imm_d(0));
+
+   brw_inst_set_sfid(brw, insn, BRW_SFID_URB);
+   brw_inst_set_urb_opcode(brw, insn, GEN8_URB_OPCODE_SIMD8_WRITE);
+
+   brw_inst_set_mlen(brw, insn, inst->mlen);
+   brw_inst_set_rlen(brw, insn, 0);
+   brw_inst_set_eot(brw, insn, inst->eot);
+   brw_inst_set_header_present(brw, insn, true);
+   brw_inst_set_urb_global_offset(brw, insn, inst->offset);
+}
+
+void
 fs_generator::generate_blorp_fb_write(fs_inst *inst)
 {
brw_fb_WRITE(p,
@@ -1879,6 +1900,10 @@ fs_generator::generate_code(const cfg_t *cfg, int 
dispatch_width)
 generate_scratch_read_gen7(inst, dst);
 break;
 
+  case SHADER_OPCODE_URB_WRITE_SIMD8:
+generate_urb_write(inst, src[0]);
+break;
+
   case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
 generate_uniform_pull_constant_load(inst, dst, src[0], src[1]);
 break;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index 44c74a3..8165909 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -385,6

[Mesa-dev] [PATCH v2 04/16] i965: Set shader name for generator from call site

fs_generator no longer knows what stage it's generating code for, so
we have to set the debug name of the shader from the call site.

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp |  4 +++-
 src/mesa/drivers/dri/i965/brw_fs.cpp| 17 --
 src/mesa/drivers/dri/i965/brw_fs.h  |  7 +++---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp  | 31 +++--
 4 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
index 86ed953..f6d0b68 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
@@ -31,8 +31,10 @@ brw_blorp_eu_emitter::brw_blorp_eu_emitter(struct 
brw_context *brw,
: mem_ctx(ralloc_context(NULL)),
  generator(brw, mem_ctx, (void *) rzalloc(mem_ctx, struct brw_wm_prog_key),
(struct brw_stage_prog_data *) rzalloc(mem_ctx, struct 
brw_wm_prog_data),
-   NULL, NULL, false, debug_flag)
+   NULL, NULL, false)
 {
+   if (debug_flag)
+  generator.enable_debug("blorp");
 }
 
 brw_blorp_eu_emitter::~brw_blorp_eu_emitter()
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index e417e0c..e96b375 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3743,8 +3743,21 @@ brw_wm_fs_emit(struct brw_context *brw,
   prog_data->no_8 = false;
}
 
-   fs_generator g(brw, mem_ctx, (void *) key, &prog_data->base, prog, 
&fp->Base,
-  v.runtime_check_aads_emit, INTEL_DEBUG & DEBUG_WM);
+   fs_generator g(brw, mem_ctx, (void *) key, &prog_data->base, prog,
+  &fp->Base, v.runtime_check_aads_emit);
+
+   if (unlikely(INTEL_DEBUG & DEBUG_WM)) {
+  char *name;
+  if (prog)
+ name = ralloc_asprintf(mem_ctx, "%s fragment shader %d",
+prog->Label ? prog->Label : "unnamed",
+prog->Name);
+  else
+ name = ralloc_asprintf(mem_ctx, "fragment program %d", fp->Base.Id);
+
+  g.enable_debug(name);
+   }
+
if (simd8_cfg)
   g.generate_code(simd8_cfg, 8);
if (simd16_cfg)
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index ae21840..ad47875 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -698,10 +698,10 @@ public:
 struct brw_stage_prog_data *prog_data,
 struct gl_shader_program *shader_prog,
 struct gl_program *fp,
-bool runtime_check_aads_emit,
-bool debug_flag);
+bool runtime_check_aads_emit);
~fs_generator();
 
+   void enable_debug(const char *shader_name);
int generate_code(const cfg_t *cfg, int dispatch_width);
const unsigned *get_assembly(unsigned int *assembly_size);
 
@@ -809,7 +809,8 @@ private:
 
exec_list discard_halt_patches;
bool runtime_check_aads_emit;
-   const bool debug_flag;
+   bool debug_flag;
+   const char *shader_name;
void *mem_ctx;
 };
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 9faecf6..ba9303f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -42,13 +42,12 @@ fs_generator::fs_generator(struct brw_context *brw,
struct brw_stage_prog_data *prog_data,
struct gl_shader_program *shader_prog,
struct gl_program *prog,
-   bool runtime_check_aads_emit,
-   bool debug_flag)
+   bool runtime_check_aads_emit)
 
: brw(brw), key(key),
  prog_data(prog_data), shader_prog(shader_prog),
  prog(prog), runtime_check_aads_emit(runtime_check_aads_emit),
- debug_flag(debug_flag), mem_ctx(mem_ctx)
+ debug_flag(false), mem_ctx(mem_ctx)
 {
    ctx = &brw->ctx;
 
@@ -1508,6 +1507,13 @@ fs_generator::generate_untyped_surface_read(fs_inst 
*inst, struct brw_reg dst,
brw_mark_surface_used(prog_data, surf_index.dw1.ud);
 }
 
+void
+fs_generator::enable_debug(const char *shader_name)
+{
+   debug_flag = true;
+   this->shader_name = shader_name;
+}
+
 int
 fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
 {
@@ -2006,21 +2012,10 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
    int after_size = p->next_insn_offset - start_offset;
 
    if (unlikely(debug_flag)) {
-      if (shader_prog) {
-         fprintf(stderr,
-                 "Native code for %s fragment shader %d (SIMD%d dispatch):\n",
-                 shader_prog->Label ? shader_prog->Label : "unnamed",
-                 shader_prog->Name, dispatch_width);
-      } else if (prog) {
-         fprintf(stderr,
-

[Mesa-dev] [PATCH v2 12/16] i965: Move fs_visitor optimization pass into new method fs_visitor::optimize()

We'll reuse this toplevel optimization driver for the scalar VS.
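
For readers skimming the diff, the driver being moved is a fixed-point loop: every pass runs in order, and the whole list repeats until a full iteration reports no progress.  A minimal standalone sketch of that control flow follows.  The names Pass and run_to_fixed_point are hypothetical and only illustrate the shape of the OPT() loop; they are not part of the patch.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one optimization pass: a label plus a callable
// that returns true when it changed something.
using Pass = std::pair<std::string, std::function<bool()>>;

// Run every pass in order and repeat the whole list until one complete
// iteration makes no progress.  Returns the number of iterations executed.
int run_to_fixed_point(const std::vector<Pass> &passes)
{
   bool progress;
   int iteration = 0;
   do {
      progress = false;
      iteration++;
      for (const Pass &pass : passes) {
         bool this_progress = pass.second();
         progress = progress || this_progress;
      }
   } while (progress);
   return iteration;
}
```

Note that the last iteration is always a "no change" iteration, which is why the loop runs one more time than the number of iterations that found work to do.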

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 136 ++-
 src/mesa/drivers/dri/i965/brw_fs.h   |   1 +
 2 files changed, 72 insertions(+), 65 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 0ffb4d8..cb73b9f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3468,6 +3468,76 @@ fs_visitor::opt_drop_redundant_mov_to_flags()
}
 }
 
+void
+fs_visitor::optimize()
+{
+   calculate_cfg();
+
+   split_virtual_grfs();
+
+   move_uniform_array_access_to_pull_constants();
+   assign_constant_locations();
+   demote_pull_constants();
+
+   opt_drop_redundant_mov_to_flags();
+
+#define OPT(pass, args...) do {                                         \
+      pass_num++;                                                       \
+      bool this_progress = pass(args);                                  \
+                                                                        \
+      if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER) && this_progress) {   \
+         char filename[64];                                             \
+         snprintf(filename, 64, "fs%d-%04d-%02d-%02d-" #pass,           \
+                  dispatch_width, shader_prog ? shader_prog->Name : 0, iteration, pass_num); \
+                                                                        \
+         backend_visitor::dump_instructions(filename);                  \
+      }                                                                 \
+                                                                        \
+      progress = progress || this_progress;                             \
+   } while (false)
+
+   if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) {
+      char filename[64];
+      snprintf(filename, 64, "fs%d-%04d-00-start",
+               dispatch_width, shader_prog ? shader_prog->Name : 0);
+
+      backend_visitor::dump_instructions(filename);
+   }
+
+   bool progress;
+   int iteration = 0;
+   do {
+  progress = false;
+  iteration++;
+  int pass_num = 0;
+
+  OPT(remove_duplicate_mrf_writes);
+
+  OPT(opt_algebraic);
+  OPT(opt_cse);
+  OPT(opt_copy_propagate);
+  OPT(opt_peephole_predicated_break);
+  OPT(dead_code_eliminate);
+  OPT(opt_peephole_sel);
+  OPT(dead_control_flow_eliminate, this);
+  OPT(opt_register_renaming);
+  OPT(opt_saturate_propagation);
+  OPT(register_coalesce);
+  OPT(compute_to_mrf);
+
+  OPT(compact_virtual_grfs);
+   } while (progress);
+
+   if (lower_load_payload()) {
+  split_virtual_grfs();
+  register_coalesce();
+  compute_to_mrf();
+  dead_code_eliminate();
+   }
+
+   lower_uniform_pull_constant_loads();
+}
+
 bool
 fs_visitor::run()
 {
@@ -3535,71 +3605,7 @@ fs_visitor::run()
 
   emit_fb_writes();
 
-  calculate_cfg();
-
-  split_virtual_grfs();
-
-  move_uniform_array_access_to_pull_constants();
-  assign_constant_locations();
-  demote_pull_constants();
-
-  opt_drop_redundant_mov_to_flags();
-
-#define OPT(pass, args...) do {                                            \
-      pass_num++;                                                          \
-      bool this_progress = pass(args);                                     \
-                                                                           \
-      if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER) && this_progress) {      \
-         char filename[64];                                                \
-         snprintf(filename, 64, "fs%d-%04d-%02d-%02d-" #pass,              \
-                  dispatch_width, shader_prog ? shader_prog->Name : 0, iteration, pass_num); \
-                                                                           \
-         backend_visitor::dump_instructions(filename);                     \
-      }                                                                    \
-                                                                           \
-      progress = progress || this_progress;                                \
-   } while (false)
-
-      if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) {
-         char filename[64];
-         snprintf(filename, 64, "fs%d-%04d-00-start",
-                  dispatch_width, shader_prog ? shader_prog->Name : 0);
-
-         backend_visitor::dump_instructions(filename);
-      }
-
-  bool progress;
-  int iteration = 0;
-  do {
-progress = false;
- iteration++;
- int pass_num = 0;
-
- OPT(remove_duplicate_mrf_writes);
-
- OPT(opt_algebraic);
- OPT(opt_cse);
- OPT(opt_copy_propagate);
- OPT(opt_peephole_predicated_break);
- OPT(dead_code_eliminate);
- OPT(opt_peephole_sel);
- OPT(dead_control_flow_eliminate, this);
-

[Mesa-dev] [PATCH v2 13/16] i965: Move fs_visitor ra pass to new fs_visitor::allocate_registers()

This will be reused for the scalar VS pass.
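
As a reading aid, the control flow being moved is: try each pre-RA scheduling heuristic in order, stop at the first one that allocates without spilling, and otherwise fall back to spilling until an allocation succeeds.  The sketch below models only that flow.  AllocationResult, allocate, try_allocate and spill_and_retry are hypothetical names, and the sketch leaves out the `failed` early exit and the SIMD16 fail() path that the real method has.

```cpp
#include <functional>

// Hypothetical result record; the real method keeps this state in fs_visitor.
struct AllocationResult {
   bool allocated_without_spills;
   int mode_used;     // index of the heuristic that succeeded, or -1
   int spill_rounds;  // number of spilling attempts that were needed
};

// try_allocate(mode) models "schedule with this heuristic, then try to
// allocate without spilling".  spill_and_retry() models assign_regs(true)
// and must eventually return true, otherwise this sketch loops forever.
AllocationResult allocate(int num_modes,
                          const std::function<bool(int)> &try_allocate,
                          const std::function<bool()> &spill_and_retry)
{
   AllocationResult result = {false, -1, 0};

   // Heuristics are ordered by decreasing performance but increasing
   // likelihood of allocating, so the first success wins.
   for (int mode = 0; mode < num_modes; mode++) {
      if (try_allocate(mode)) {
         result.allocated_without_spills = true;
         result.mode_used = mode;
         return result;
      }
   }

   // Out of heuristics: keep spilling until an allocation succeeds.
   do {
      result.spill_rounds++;
   } while (!spill_and_retry());

   return result;
}
```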

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 132 +++
 src/mesa/drivers/dri/i965/brw_fs.h   |   1 +
 2 files changed, 71 insertions(+), 62 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index cb73b9f..4dce0a2 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3538,11 +3538,79 @@ fs_visitor::optimize()
lower_uniform_pull_constant_loads();
 }
 
+void
+fs_visitor::allocate_registers()
+{
+   bool allocated_without_spills;
+
+   static enum instruction_scheduler_mode pre_modes[] = {
+  SCHEDULE_PRE,
+  SCHEDULE_PRE_NON_LIFO,
+  SCHEDULE_PRE_LIFO,
+   };
+
+   /* Try each scheduling heuristic to see if it can successfully register
+* allocate without spilling.  They should be ordered by decreasing
+* performance but increasing likelihood of allocating.
+*/
+   for (unsigned i = 0; i < ARRAY_SIZE(pre_modes); i++) {
+  schedule_instructions(pre_modes[i]);
+
+  if (0) {
+ assign_regs_trivial();
+ allocated_without_spills = true;
+  } else {
+ allocated_without_spills = assign_regs(false);
+  }
+  if (allocated_without_spills)
+ break;
+   }
+
+   if (!allocated_without_spills) {
+  /* We assume that any spilling is worse than just dropping back to
+   * SIMD8.  There's probably actually some intermediate point where
+   * SIMD16 with a couple of spills is still better.
+   */
+  if (dispatch_width == 16) {
+         fail("Failure to register allocate.  Reduce number of "
+              "live scalar values to avoid this.");
+      } else {
+         perf_debug("Fragment shader triggered register spilling.  "
+                    "Try reducing the number of live scalar values to "
+                    "improve performance.\n");
+  }
+
+  /* Since we're out of heuristics, just go spill registers until we
+   * get an allocation.
+   */
+  while (!assign_regs(true)) {
+ if (failed)
+break;
+  }
+   }
+
+   assert(force_uncompressed_stack == 0);
+
+   /* This must come after all optimization and register allocation, since
+* it inserts dead code that happens to have side effects, and it does
+* so based on the actual physical registers in use.
+*/
+   insert_gen4_send_dependency_workarounds();
+
+   if (failed)
+  return;
+
+   if (!allocated_without_spills)
+  schedule_instructions(SCHEDULE_POST);
+
+   if (last_scratch > 0)
+      prog_data->total_scratch = brw_get_scratch_size(last_scratch);
+}
+
 bool
 fs_visitor::run()
 {
    sanity_param_count = prog->Parameters->NumParameters;
-   bool allocated_without_spills;
 
assign_binding_table_offsets();
 
@@ -3555,7 +3623,6 @@ fs_visitor::run()
   emit_dummy_fs();
    } else if (brw->use_rep_send && dispatch_width == 16) {
       emit_repclear_shader();
-      allocated_without_spills = true;
    } else {
       if (INTEL_DEBUG & DEBUG_SHADER_TIME)
          emit_shader_time_begin();
@@ -3610,68 +3677,9 @@ fs_visitor::run()
   assign_curb_setup();
   assign_urb_setup();
 
-  static enum instruction_scheduler_mode pre_modes[] = {
- SCHEDULE_PRE,
- SCHEDULE_PRE_NON_LIFO,
- SCHEDULE_PRE_LIFO,
-  };
-
-  /* Try each scheduling heuristic to see if it can successfully register
-   * allocate without spilling.  They should be ordered by decreasing
-   * performance but increasing likelihood of allocating.
-   */
-      for (unsigned i = 0; i < ARRAY_SIZE(pre_modes); i++) {
- schedule_instructions(pre_modes[i]);
-
- if (0) {
-assign_regs_trivial();
-allocated_without_spills = true;
- } else {
-allocated_without_spills = assign_regs(false);
- }
- if (allocated_without_spills)
-break;
-  }
-
-  if (!allocated_without_spills) {
- /* We assume that any spilling is worse than just dropping back to
-  * SIMD8.  There's probably actually some intermediate point where
-  * SIMD16 with a couple of spills is still better.
-  */
- if (dispatch_width == 16) {
-            fail("Failure to register allocate.  Reduce number of "
-                 "live scalar values to avoid this.");
-         } else {
-            perf_debug("Fragment shader triggered register spilling.  "
-                       "Try reducing the number of live scalar values to "
-                       "improve performance.\n");
- }
-
- /* Since we're out of heuristics, just go spill registers until we
-  * get an allocation.
-  */
- while (!assign_regs(true)) {
-if (failed)
-   break;
- }
-  }
-
-  assert(force_uncompressed_stack == 0);
-
-  /* This must come after all optimization and

[Mesa-dev] [PATCH v2 00/16] Scalar VS for BDW+

Hi,

Here's v2 of the patch series.  It incorporates Matt's review comments and
adds a new patch to refactor the way we call fs_generator.  The idea is
to get rid of the MESA_SHADER_FRAGMENT assertion in generate_assembly() in a
nicer way.  We now call generate_code() twice, once per dispatch width;
each call returns the offset in the assembly where the generated code was
placed.

Kristian

Kristian Høgsberg (16):
  i965: Don't copy propagate sat MOVs into LOAD_PAYLOAD
  i965: Refactor fs_generator API
  i965: Generalize fs_generator further
  i965: Set shader name for generator from call site
  i965: Remove shader program argument and member from fs_generator
  i965: Add SIMD8 URB write low-level IR instruction
  i965: Add new SIMD8 VS prog data flag
  i965: Consolidate code to get struct brw_sampler_prog_key_data
  i965: Prepare for using the ATTR register file in the fs backend
  i965: Rename brw_vec4_prog_data to brw_vue_prog_data
  i965: Move more code into codegen-branch of the fs_visitor::run() if
statement
  i965: Move fs_visitor optimization pass into new method
fs_visitor::optimize()
  i965: Move fs_visitor ra pass to new fs_visitor::allocate_registers()
  i965: Add fs_visitor::run_vs() to generate scalar vertex shader code
  i965: Clean up fs_visitor::run and rename to run_fs
  i965: Generate vs code using scalar backend for BDW+

 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp|  13 +-
 src/mesa/drivers/dri/i965/brw_context.c|  13 +
 src/mesa/drivers/dri/i965/brw_context.h|  23 +-
 src/mesa/drivers/dri/i965/brw_defines.h|   5 +
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 436 +
 src/mesa/drivers/dri/i965/brw_fs.h |  51 ++-
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   |   6 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 121 +++---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |  16 +-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 351 +++--
 src/mesa/drivers/dri/i965/brw_gs_surface_state.c   |   2 +-
 src/mesa/drivers/dri/i965/brw_shader.cpp   |  21 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp |  66 +++-
 src/mesa/drivers/dri/i965/brw_vec4.h   |  18 +-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp   |   4 +-
 src/mesa/drivers/dri/i965/brw_vec4_gs.c|   4 +-
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp  |   2 +-
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h|   2 +-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |   4 +-
 src/mesa/drivers/dri/i965/brw_vs.c |  10 +-
 src/mesa/drivers/dri/i965/brw_vs.h |   2 +-
 src/mesa/drivers/dri/i965/brw_vs_surface_state.c   |  10 +-
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c   |   7 +-
 src/mesa/drivers/dri/i965/gen6_gs_state.c  |   2 +-
 src/mesa/drivers/dri/i965/gen7_gs_state.c  |   2 +-
 src/mesa/drivers/dri/i965/gen8_gs_state.c  |   2 +-
 src/mesa/drivers/dri/i965/gen8_vs_state.c  |   4 +-
 src/mesa/drivers/dri/i965/intel_debug.c|   1 +
 src/mesa/drivers/dri/i965/intel_debug.h|   1 +
 29 files changed, 869 insertions(+), 330 deletions(-)

-- 
2.1.0


[Mesa-dev] [PATCH v2 15/16] i965: Clean up fs_visitor::run and rename to run_fs

Now that fs_visitor::run is back to being only fragment
shader compilation, we can clean up a few stage == MESA_SHADER_FRAGMENT
conditions and rename it to run_fs.

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 31 +--
 src/mesa/drivers/dri/i965/brw_fs.h   |  2 +-
 2 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 8007977..9bd57c9 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3699,8 +3699,12 @@ fs_visitor::run_vs()
 }
 
 bool
-fs_visitor::run()
+fs_visitor::run_fs()
 {
+   brw_wm_prog_data *wm_prog_data = (brw_wm_prog_data *) this->prog_data;
+   brw_wm_prog_key *wm_key = (brw_wm_prog_key *) this->key;
+   assert(stage == MESA_SHADER_FRAGMENT);
+
    sanity_param_count = prog->Parameters->NumParameters;
 
assign_binding_table_offsets();
@@ -3729,13 +3733,7 @@ fs_visitor::run()
   /* We handle discards by keeping track of the still-live pixels in f0.1.
* Initialize it with the dispatched pixels.
*/
-      bool uses_kill =
-         (stage == MESA_SHADER_FRAGMENT) &&
-         ((brw_wm_prog_data*) this->prog_data)->uses_kill;
-      bool alpha_test_func =
-         (stage == MESA_SHADER_FRAGMENT) &&
-         ((brw_wm_prog_key*) this->key)->alpha_test_func;
-      if (uses_kill || alpha_test_func) {
+      if (wm_prog_data->uses_kill || wm_key->alpha_test_func) {
          fs_inst *discard_init = emit(FS_OPCODE_MOV_DISPATCH_TO_FLAGS);
          discard_init->flag_subreg = 1;
   }
@@ -3758,7 +3756,7 @@ fs_visitor::run()
 
   emit(FS_OPCODE_PLACEHOLDER_HALT);
 
-  if (alpha_test_func)
+      if (wm_key->alpha_test_func)
  emit_alpha_test();
 
   emit_fb_writes();
@@ -3773,13 +3771,10 @@ fs_visitor::run()
  return false;
}
 
-   if (stage == MESA_SHADER_FRAGMENT) {
-      brw_wm_prog_data *prog_data = (brw_wm_prog_data*) this->prog_data;
-      if (dispatch_width == 8)
-         prog_data->reg_blocks = brw_register_blocks(grf_used);
-      else
-         prog_data->reg_blocks_16 = brw_register_blocks(grf_used);
-   }
+   if (dispatch_width == 8)
+      wm_prog_data->reg_blocks = brw_register_blocks(grf_used);
+   else
+      wm_prog_data->reg_blocks_16 = brw_register_blocks(grf_used);
 
/* If any state parameters were appended, then ParameterValues could have
 * been realloced, in which case the driver uniform storage set up by
@@ -3819,7 +3814,7 @@ brw_wm_fs_emit(struct brw_context *brw,
/* Now the main event: Visit the shader IR and generate our FS IR for it.
 */
fs_visitor v(brw, mem_ctx, key, prog_data, prog, fp, 8);
-   if (!v.run()) {
+   if (!v.run_fs()) {
       if (prog) {
          prog->LinkStatus = false;
          ralloc_strcat(&prog->InfoLog, v.fail_msg);
@@ -3838,7 +3833,7 @@ brw_wm_fs_emit(struct brw_context *brw,
       if (!v.simd16_unsupported) {
          /* Try a SIMD16 compile */
          v2.import_uniforms(&v);
-         if (!v2.run()) {
+         if (!v2.run_fs()) {
             perf_debug("SIMD16 shader failed to compile, falling back to "
                        "SIMD8 at a 10-20%% performance cost: %s", v2.fail_msg);
          } else {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 6888cdd..b83ea87 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -416,7 +416,7 @@ public:
 const fs_reg varying_offset,
 uint32_t const_offset);
 
-   bool run();
+   bool run_fs();
bool run_vs();
void optimize();
void allocate_registers();
-- 
2.1.0


[Mesa-dev] [PATCH v2 05/16] i965: Remove shader program argument and member from fs_generator

Now that the caller passes in the shader debug name, we don't need this
anymore.

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 2 +-
 src/mesa/drivers/dri/i965/brw_fs.cpp| 2 +-
 src/mesa/drivers/dri/i965/brw_fs.h  | 2 --
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp  | 3 +--
 4 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
index f6d0b68..83fccc2 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
@@ -31,7 +31,7 @@ brw_blorp_eu_emitter::brw_blorp_eu_emitter(struct brw_context *brw,
    : mem_ctx(ralloc_context(NULL)),
      generator(brw, mem_ctx, (void *) rzalloc(mem_ctx, struct brw_wm_prog_key),
                (struct brw_stage_prog_data *) rzalloc(mem_ctx, struct brw_wm_prog_data),
-               NULL, NULL, false)
+               NULL, false)
 {
    if (debug_flag)
       generator.enable_debug("blorp");
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index e96b375..bd44b24 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3743,7 +3743,7 @@ brw_wm_fs_emit(struct brw_context *brw,
       prog_data->no_8 = false;
    }
 
-   fs_generator g(brw, mem_ctx, (void *) key, &prog_data->base, prog,
+   fs_generator g(brw, mem_ctx, (void *) key, &prog_data->base,
                   &fp->Base, v.runtime_check_aads_emit);
 
    if (unlikely(INTEL_DEBUG & DEBUG_WM)) {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index ad47875..7e99b31 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -696,7 +696,6 @@ public:
 void *mem_ctx,
 const void *key,
 struct brw_stage_prog_data *prog_data,
-struct gl_shader_program *shader_prog,
 struct gl_program *fp,
 bool runtime_check_aads_emit);
~fs_generator();
@@ -802,7 +801,6 @@ private:
const void * const key;
struct brw_stage_prog_data * const prog_data;
 
-   struct gl_shader_program * const shader_prog;
const struct gl_program *prog;
 
unsigned dispatch_width; /** 8 or 16 */
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index ba9303f..fe09ad5 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -40,12 +40,11 @@ fs_generator::fs_generator(struct brw_context *brw,
void *mem_ctx,
const void *key,
struct brw_stage_prog_data *prog_data,
-   struct gl_shader_program *shader_prog,
struct gl_program *prog,
bool runtime_check_aads_emit)
 
: brw(brw), key(key),
- prog_data(prog_data), shader_prog(shader_prog),
+ prog_data(prog_data),
  prog(prog), runtime_check_aads_emit(runtime_check_aads_emit),
  debug_flag(false), mem_ctx(mem_ctx)
 {
-- 
2.1.0


[Mesa-dev] [PATCH v2 02/16] i965: Refactor fs_generator API

We split out SIMD8 and SIMD16 generation into separate calls to a
new method, generate_code(), which returns the start offset of the
generated code.  A new get_assembly() method returns the generated code.

This avoids asserting MESA_SHADER_FRAGMENT and accessing wm_prog_data
in the generator.
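
To make the new calling convention concrete, here is a toy model of it: one generator object, one generate_code() call per dispatch width, each call padding to a 64-byte boundary and returning its start offset, and a final get_assembly() handing back the shared buffer.  ToyGenerator is hypothetical, and treating every instruction as a fixed 16-byte record is an assumption made for the sketch, not something this patch states.

```cpp
#include <cstdint>
#include <vector>

// Toy stand-in for fs_generator.  Each "instruction" is modelled as one
// 16-byte record (an assumption of this sketch).
class ToyGenerator {
public:
   // Appends `count` instructions for one dispatch width and returns the
   // byte offset at which that program starts.  The buffer is padded to a
   // 64-byte boundary before the program is emitted.
   int generate_code(int count, int dispatch_width)
   {
      while (next_offset() % 64)
         store.push_back(0);   // NOP padding
      int start_offset = next_offset();
      for (int i = 0; i < count; i++)
         store.push_back(static_cast<uint32_t>(dispatch_width));
      return start_offset;
   }

   // Returns the whole buffer; all programs share the same store.
   const std::vector<uint32_t> &get_assembly() const { return store; }

private:
   int next_offset() const { return static_cast<int>(store.size()) * 16; }
   std::vector<uint32_t> store;
};
```

With this shape the caller, not the generator, decides what to do with the SIMD16 start offset, which is what lets the generator stop touching wm_prog_data.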

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp |  4 ++-
 src/mesa/drivers/dri/i965/brw_fs.cpp|  9 +++---
 src/mesa/drivers/dri/i965/brw_fs.h  |  6 ++--
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp  | 43 -
 4 files changed, 23 insertions(+), 39 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
index 3afe0e7..7802c9f 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
@@ -45,7 +45,9 @@ const unsigned *
 brw_blorp_eu_emitter::get_program(unsigned *program_size)
 {
    cfg_t cfg(insts);
-   return generator.generate_assembly(NULL, &cfg, program_size);
+   generator.generate_code(&cfg, 16);
+
+   return generator.get_assembly(program_size);
 }
 
 /**
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index aa1d8d2..e12fd77 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3743,11 +3743,12 @@ brw_wm_fs_emit(struct brw_context *brw,
       prog_data->no_8 = false;
    }
 
-   const unsigned *assembly = NULL;
    fs_generator g(brw, mem_ctx, key, prog_data, prog, fp,
                   v.runtime_check_aads_emit, INTEL_DEBUG & DEBUG_WM);
-   assembly = g.generate_assembly(simd8_cfg, simd16_cfg,
-                                  final_assembly_size);
+   if (simd8_cfg)
+      g.generate_code(simd8_cfg, 8);
+   if (simd16_cfg)
+      prog_data->prog_offset_16 = g.generate_code(simd16_cfg, 16);
 
    if (unlikely(brw->perf_debug) && shader) {
       if (shader->compiled_once)
@@ -3760,7 +3761,7 @@ brw_wm_fs_emit(struct brw_context *brw,
   }
}
 
-   return assembly;
+   return g.get_assembly(final_assembly_size);
 }
 
 bool
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 67956bc..5c21dd0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -702,12 +702,10 @@ public:
 bool debug_flag);
~fs_generator();
 
-   const unsigned *generate_assembly(const cfg_t *simd8_cfg,
- const cfg_t *simd16_cfg,
- unsigned *assembly_size);
+   int generate_code(const cfg_t *cfg, int dispatch_width);
+   const unsigned *get_assembly(unsigned int *assembly_size);
 
 private:
-   void generate_code(const cfg_t *cfg);
void fire_fb_write(fs_inst *inst,
   struct brw_reg payload,
   struct brw_reg implied_header,
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index c95beb6..0622b07 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1512,9 +1512,17 @@ fs_generator::generate_untyped_surface_read(fs_inst *inst, struct brw_reg dst,
    brw_mark_surface_used(prog_data, surf_index.dw1.ud);
 }
 
-void
-fs_generator::generate_code(const cfg_t *cfg)
+int
+fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
 {
+   /* align to 64 byte boundary. */
+   while (p->next_insn_offset % 64)
+      brw_NOP(p);
+
+   this->dispatch_width = dispatch_width;
+   if (dispatch_width == 16)
+      brw_set_default_compression_control(p, BRW_COMPRESSION_COMPRESSED);
+
    int start_offset = p->next_insn_offset;
    int loop_count = 0;
 
@@ -2024,37 +2032,12 @@ fs_generator::generate_code(const cfg_t *cfg)
       dump_assembly(p->store, annotation.ann_count, annotation.ann, brw, prog);
   ralloc_free(annotation.ann);
}
+
+   return start_offset;
 }
 
 const unsigned *
-fs_generator::generate_assembly(const cfg_t *simd8_cfg,
-const cfg_t *simd16_cfg,
-unsigned *assembly_size)
+fs_generator::get_assembly(unsigned int *assembly_size)
 {
-   assert(simd8_cfg || simd16_cfg);
-
-   if (simd8_cfg) {
-  dispatch_width = 8;
-  generate_code(simd8_cfg);
-   }
-
-   if (simd16_cfg) {
-      /* align to 64 byte boundary. */
-      while (p->next_insn_offset % 64) {
-         brw_NOP(p);
-      }
-
-      assert(stage == MESA_SHADER_FRAGMENT);
-      brw_wm_prog_data *prog_data = (brw_wm_prog_data*) this->prog_data;
-
-      /* Save off the start of this SIMD16 program */
-      prog_data->prog_offset_16 = p->next_insn_offset;
-
-  brw_set_default_compression_control(p, BRW_COMPRESSION_COMPRESSED);
-
-  dispatch_width = 16;
-  generate_code(simd16_cfg);
-   }
-
return brw_get_program(p, assembly_size);
 }
-- 
2.1.0

[Mesa-dev] [PATCH v2 01/16] i965: Don't copy propagate sat MOVs into LOAD_PAYLOAD

The LOAD_PAYLOAD opcode can't saturate its sources, so skip
saturating MOVs.  The register coalescing after lower_load_payload()
will clean up the extra MOVs.
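
The reordered switch in the hunk below relies on case fall-through: LOAD_PAYLOAD bails out for saturating entries and otherwise shares the MOV path.  A reduced sketch of that pattern follows.  Opcode, CopyEntry and try_propagate are hypothetical simplifications (a plain int stands in for the propagated register value) and are not the driver's types.

```cpp
enum Opcode { OPCODE_MOV, OPCODE_LOAD_PAYLOAD, OPCODE_OTHER };

// Hypothetical, simplified copy-propagation entry.
struct CopyEntry {
   int value;
   bool saturate;  // true when the recorded MOV saturates its result
};

// Returns true and rewrites `src` when the recorded value may replace it.
bool try_propagate(Opcode opcode, const CopyEntry &entry, int &src)
{
   switch (opcode) {
   case OPCODE_LOAD_PAYLOAD:
      // LOAD_PAYLOAD cannot saturate its sources, so skip saturating MOVs.
      if (entry.saturate)
         break;
      /* fall through */
   case OPCODE_MOV:
      src = entry.value;
      return true;
   default:
      break;
   }
   return false;
}
```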

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index e1989cb..87ea9c2 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -454,8 +454,12 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
       val.effective_width = inst->src[i].effective_width;
 
       switch (inst->opcode) {
-      case BRW_OPCODE_MOV:
       case SHADER_OPCODE_LOAD_PAYLOAD:
+         /* LOAD_PAYLOAD can't sat its sources. */
+         if (entry->saturate)
+            break;
+         /* Otherwise, fall through */
+      case BRW_OPCODE_MOV:
          inst->src[i] = val;
          progress = true;
          break;
-- 
2.1.0


[Mesa-dev] [PATCH v2 09/16] i965: Prepare for using the ATTR register file in the fs backend

The scalar vertex shader will use the ATTR register file for vertex
attributes.  This patch adds support for the ATTR file to fs_visitor.
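
One behavioural difference worth calling out is constant array indexing: per the comment in the brw_fs_visitor.cpp hunk, attribute arrays are loaded as one vec4 per element, so an ATTR source steps its register number, while other files step an offset scaled by the element size.  The sketch below illustrates only that branch.  ToyReg and index_array are hypothetical, and the non-ATTR arm is a simplification of what offset() does in the driver.

```cpp
enum RegFile { FILE_GRF, FILE_UNIFORM, FILE_ATTR };

// Hypothetical, reduced register description.
struct ToyReg {
   RegFile file;
   int reg;         // register number
   int reg_offset;  // offset within the register, in toy units
};

// Applies a constant array index to `src` and returns the adjusted register.
ToyReg index_array(ToyReg src, int index, int element_size)
{
   if (src.file == FILE_ATTR) {
      // One attribute register per array element: step the register number.
      src.reg += index;
   } else {
      // Simplified stand-in for offset(src, index * element_size).
      src.reg_offset += index * element_size;
   }
   return src;
}
```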

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 12 ++--
 src/mesa/drivers/dri/i965/brw_fs.h |  3 +++
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  2 --
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 11 +--
 4 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 9d07857..00156c7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -76,7 +76,7 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg &dst,
          this->exec_size = dst.width;
       } else {
          for (int i = 0; i < sources; ++i) {
-            if (src[i].file != GRF)
+            if (src[i].file != GRF && src[i].file != ATTR)
                continue;
 
             if (this->exec_size <= 1)
@@ -97,6 +97,7 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg &dst,
          break;
       case GRF:
       case HW_REG:
+      case ATTR:
          assert(this->src[i].width > 0);
          if (this->src[i].width == 1) {
             this->src[i].effective_width = this->exec_size;
@@ -121,6 +122,7 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg &dst,
    case GRF:
    case HW_REG:
    case MRF:
+   case ATTR:
       this->regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32;
       break;
    case BAD_FILE:
@@ -636,7 +638,7 @@ fs_reg::is_contiguous() const
 bool
 fs_reg::is_valid_3src() const
 {
-   return file == GRF || file == UNIFORM;
+   return file == GRF || file == UNIFORM || file == ATTR;
 }
 
 int
@@ -3148,6 +3150,9 @@ fs_visitor::dump_instruction(backend_instruction *be_inst, FILE *file)
    case UNIFORM:
       fprintf(file, "***u%d***", inst->dst.reg + inst->dst.reg_offset);
       break;
+   case ATTR:
+      fprintf(file, "attr%d", inst->dst.reg + inst->dst.reg_offset);
+      break;
    case HW_REG:
       if (inst->dst.fixed_hw_reg.file == BRW_ARCHITECTURE_REGISTER_FILE) {
          switch (inst->dst.fixed_hw_reg.nr) {
@@ -3199,6 +3204,9 @@ fs_visitor::dump_instruction(backend_instruction *be_inst, FILE *file)
       case MRF:
          fprintf(file, "***m%d***", inst->src[i].reg);
          break;
+      case ATTR:
+         fprintf(file, "attr%d", inst->src[i].reg + inst->src[i].reg_offset);
+         break;
       case UNIFORM:
          fprintf(file, "u%d", inst->src[i].reg + inst->src[i].reg_offset);
          if (inst->src[i].reladdr) {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 457fb4b..454496e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -132,6 +132,7 @@ byte_offset(fs_reg reg, unsigned delta)
case BAD_FILE:
   break;
case GRF:
+   case ATTR:
   reg.reg_offset += delta / 32;
   break;
case MRF:
@@ -157,6 +158,7 @@ horiz_offset(fs_reg reg, unsigned delta)
   break;
case GRF:
case MRF:
+   case ATTR:
   return byte_offset(reg, delta * reg.stride * type_sz(reg.type));
default:
   assert(delta == 0);
@@ -173,6 +175,7 @@ offset(fs_reg reg, unsigned delta)
   break;
case GRF:
case MRF:
+   case ATTR:
       return byte_offset(reg, delta * reg.width * reg.stride * type_sz(reg.type));
case UNIFORM:
   reg.reg_offset += delta;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 75ee2c7..dee79d3 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1270,8 +1270,6 @@ brw_reg_from_fs_reg(fs_reg *reg)
   /* Probably unused. */
   brw_reg = brw_null_reg();
   break;
-   case UNIFORM:
-      unreachable("not reached");
    default:
       unreachable("not reached");
}
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index f36c474..0cc51f3 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -196,8 +196,15 @@ fs_visitor::visit(ir_dereference_array *ir)
    src.type = brw_type_for_base_type(ir->type);
 
    if (constant_index) {
-      assert(src.file == UNIFORM || src.file == GRF || src.file == HW_REG);
-      src = offset(src, constant_index->value.i[0] * element_size);
+      if (src.file == ATTR) {
+         /* Attribute arrays get loaded as one vec4 per element.  In that case
+          * offset the source register.
+          */
+         src.reg += constant_index->value.i[0];
+      } else {
+         assert(src.file == UNIFORM || src.file == GRF || src.file == HW_REG);
+         src = offset(src, constant_index->value.i[0] * element_size);
+      }
} else {
   /* Variable index array dereference.  We attach the

[Mesa-dev] [PATCH v2 11/16] i965: Move more code into codegen-branch of the fs_visitor::run() if statement

These last few operations all only apply when we've actually generated code,
optimized and allocated registers.  The dummy and the repclear shaders don't
touch uncompressed_stack, don't need the gen4 send workaround, and don't
spill.  This means we can move these lines into the else-branch, which will
make the following refactoring easier.

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 00156c7..0ffb4d8 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3649,23 +3649,23 @@ fs_visitor::run()
break;
  }
   }
-   }
-   assert(force_uncompressed_stack == 0);
 
-   /* This must come after all optimization and register allocation, since
-* it inserts dead code that happens to have side effects, and it does
-* so based on the actual physical registers in use.
-*/
-   insert_gen4_send_dependency_workarounds();
+  assert(force_uncompressed_stack == 0);
 
-   if (failed)
-  return false;
+  /* This must come after all optimization and register allocation, since
+   * it inserts dead code that happens to have side effects, and it does
+   * so based on the actual physical registers in use.
+   */
+  insert_gen4_send_dependency_workarounds();
+
+  if (failed)
+ return false;
 
-   if (!allocated_without_spills)
-  schedule_instructions(SCHEDULE_POST);
+  if (!allocated_without_spills)
+ schedule_instructions(SCHEDULE_POST);
 
-   if (last_scratch > 0) {
-      prog_data->total_scratch = brw_get_scratch_size(last_scratch);
+      if (last_scratch > 0)
+         prog_data->total_scratch = brw_get_scratch_size(last_scratch);
}
 
if (stage == MESA_SHADER_FRAGMENT) {
-- 
2.1.0


[Mesa-dev] [PATCH v2 07/16] i965: Add new SIMD8 VS prog data flag

This flag signals that we have a SIMD8 VS shader so we can set up the
corresponding state accordingly.  This boils down to setting
the BDW+ SIMD8 enable bit in 3DSTATE_VS and making UBO and pull
constant buffers use dword pitch.
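
For illustration, the state side of this amounts to OR-ing one extra bit into a 3DSTATE_VS dword when the flag is set.  The sketch below uses the bit position from the brw_defines.h hunk in this patch, where the define appears under the "Gen8+ DW7" comment.  with_simd8 is a hypothetical helper, and what else lives in that dword is deliberately left to the caller.

```cpp
#include <cstdint>

// Bit position as defined in the brw_defines.h hunk of this patch.
constexpr uint32_t GEN8_VS_SIMD8_ENABLE = 1u << 2;

// Hypothetical helper: returns the dword with the SIMD8 enable bit set
// when the program data says the VS was compiled as SIMD8.
uint32_t with_simd8(uint32_t dw7, bool simd8)
{
   if (simd8)
      dw7 |= GEN8_VS_SIMD8_ENABLE;
   return dw7;
}
```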

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_context.h  |  5 -
 src/mesa/drivers/dri/i965/brw_defines.h  |  2 ++
 src/mesa/drivers/dri/i965/brw_gs_surface_state.c |  2 +-
 src/mesa/drivers/dri/i965/brw_vs_surface_state.c | 10 --
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c |  7 ---
 src/mesa/drivers/dri/i965/gen8_vs_state.c|  2 ++
 6 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index eb37e75..e7cd30f 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -543,6 +543,8 @@ struct brw_vec4_prog_data {
 * is the size of the URB entry used for output.
 */
GLuint urb_entry_size;
+
+   bool simd8;
 };
 
 
@@ -1599,7 +1601,8 @@ brw_update_sol_surface(struct brw_context *brw,
 void brw_upload_ubo_surfaces(struct brw_context *brw,
 struct gl_shader *shader,
  struct brw_stage_state *stage_state,
- struct brw_stage_prog_data *prog_data);
+ struct brw_stage_prog_data *prog_data,
+ bool dword_pitch);
 void brw_upload_abo_surfaces(struct brw_context *brw,
  struct gl_shader_program *prog,
  struct brw_stage_state *stage_state,
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 650fdb9..900d8cf 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1688,6 +1688,8 @@ enum brw_message_target {
 # define GEN6_VS_STATISTICS_ENABLE (1 << 10)
 # define GEN6_VS_CACHE_DISABLE (1 << 1)
 # define GEN6_VS_ENABLE (1 << 0)
+/* Gen8+ DW7 */
+# define GEN8_VS_SIMD8_ENABLE   (1 << 2)
 /* Gen8+ DW8 */
 # define GEN8_VS_URB_ENTRY_OUTPUT_OFFSET_SHIFT  21
 # define GEN8_VS_URB_OUTPUT_LENGTH_SHIFT16
diff --git a/src/mesa/drivers/dri/i965/brw_gs_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
index 2c2ba56..42cdddb 100644
--- a/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
@@ -77,7 +77,7 @@ brw_upload_gs_ubo_surfaces(struct brw_context *brw)
 
/* CACHE_NEW_GS_PROG */
    brw_upload_ubo_surfaces(brw, prog->_LinkedShaders[MESA_SHADER_GEOMETRY],
-                           &brw->gs.base, &brw->gs.prog_data->base.base);
+                           &brw->gs.base, &brw->gs.prog_data->base.base, false);
 }
 
 const struct brw_tracked_state brw_gs_ubo_surfaces = {
diff --git a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
index 1cc96cf..24bc06d 100644
--- a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
@@ -112,6 +112,7 @@ static void
 brw_upload_vs_pull_constants(struct brw_context *brw)
 {
    struct brw_stage_state *stage_state = &brw->vs.base;
+   bool dword_pitch;
 
/* BRW_NEW_VERTEX_PROGRAM */
struct brw_vertex_program *vp =
@@ -120,9 +121,11 @@ brw_upload_vs_pull_constants(struct brw_context *brw)
    /* CACHE_NEW_VS_PROG */
    const struct brw_stage_prog_data *prog_data = &brw->vs.prog_data->base.base;
 
+   dword_pitch = brw->vs.prog_data->base.simd8;
+
    /* _NEW_PROGRAM_CONSTANTS */
    brw_upload_pull_constants(brw, BRW_NEW_VS_CONSTBUF, &vp->program.Base,
-                             stage_state, prog_data, false);
+                             stage_state, prog_data, dword_pitch);
 }
 
 const struct brw_tracked_state brw_vs_pull_constants = {
@@ -141,13 +144,16 @@ brw_upload_vs_ubo_surfaces(struct brw_context *brw)
    /* _NEW_PROGRAM */
    struct gl_shader_program *prog =
       ctx->_Shader->CurrentProgram[MESA_SHADER_VERTEX];
+   bool dword_pitch;
 
    if (!prog)
       return;
 
    /* CACHE_NEW_VS_PROG */
+   dword_pitch = brw->vs.prog_data->base.simd8;
    brw_upload_ubo_surfaces(brw, prog->_LinkedShaders[MESA_SHADER_VERTEX],
-                           &brw->vs.base, &brw->vs.prog_data->base.base);
+                           &brw->vs.base, &brw->vs.prog_data->base.base,
+                           dword_pitch);
 }
 
 const struct brw_tracked_state brw_vs_ubo_surfaces = {
diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index ef46dd7..ec86841 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -837,7 +837,8 @@ void
 brw_upload_ubo_surfaces(struct

[Mesa-dev] [PATCH v2 16/16] i965: Generate vs code using scalar backend for BDW+

With everything in place, we can now use the scalar backend compiler for
vertex shaders on BDW+.  We make scalar vertex shaders the default on
BDW+ but add a new vec4vs debug option to force the vec4 backend.

No piglit regressions.

Performance impact is minimal: I see a ~1.5% improvement on the T-Rex
GLBenchmark case, but in general it's in the noise.  Some of our
internal synthetic, VS-bound benchmarks show great improvement, 20%-40%
in some cases, but real-world cases are mostly unaffected.

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_context.c  | 13 +++
 src/mesa/drivers/dri/i965/brw_context.h  |  1 +
 src/mesa/drivers/dri/i965/brw_shader.cpp | 17 +++--
 src/mesa/drivers/dri/i965/brw_vec4.cpp   | 60 +---
 src/mesa/drivers/dri/i965/intel_debug.c  |  1 +
 src/mesa/drivers/dri/i965/intel_debug.h  |  1 +
 6 files changed, 78 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index e1a994a..f56cfb2 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -557,6 +557,15 @@ brw_initialize_context_constants(struct brw_context *brw)
    ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].OptimizeForAOS = true;
    ctx->Const.ShaderCompilerOptions[MESA_SHADER_GEOMETRY].OptimizeForAOS = true;
 
+   if (brw->scalar_vs) {
+      /* If we're using the scalar backend for vertex shaders, we need to
+       * configure these accordingly.
+       */
+      ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].EmitNoIndirectOutput = true;
+      ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].EmitNoIndirectTemp = true;
+      ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].OptimizeForAOS = false;
+   }
+
    /* ARB_viewport_array */
    if (brw->gen >= 7 && ctx->API == API_OPENGL_CORE) {
       ctx->Const.MaxViewports = GEN7_NUM_VIEWPORTS;
@@ -755,6 +764,10 @@ brwCreateContext(gl_api api,
 
    brw_process_driconf_options(brw);
    brw_process_intel_debug_variable(brw);
+
+   if (brw->gen >= 8 && !(INTEL_DEBUG & DEBUG_VEC4VS))
+      brw->scalar_vs = true;
+
    brw_initialize_context_constants(brw);
 
    ctx->Const.ResetStrategy = notify_reset
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 463f3d2..f198103 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1067,6 +1067,7 @@ struct brw_context
bool has_pln;
bool no_simd8;
bool use_rep_send;
+   bool scalar_vs;
 
/**
 * Some versions of Gen hardware don't do centroid interpolation correctly
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 3c78afd..26da729 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -71,6 +71,19 @@ brw_shader_precompile(struct gl_context *ctx, struct gl_shader_program *prog)
    return true;
 }
 
+static inline bool
+is_scalar_shader_stage(struct brw_context *brw, int stage)
+{
+   switch (stage) {
+   case MESA_SHADER_FRAGMENT:
+      return true;
+   case MESA_SHADER_VERTEX:
+      return brw->scalar_vs;
+   default:
+      return false;
+   }
+}
+
 static void
 brw_lower_packing_builtins(struct brw_context *brw,
gl_shader_stage shader_type,
@@ -91,7 +104,7 @@ brw_lower_packing_builtins(struct brw_context *brw,
* lowering is needed. For SOA code, the Half2x16 ops must be
* scalarized.
*/
-  if (shader_type == MESA_SHADER_FRAGMENT) {
+  if (is_scalar_shader_stage(brw, shader_type)) {
  ops |= LOWER_PACK_HALF_2x16_TO_SPLIT
  |  LOWER_UNPACK_HALF_2x16_TO_SPLIT;
   }
@@ -179,7 +192,7 @@ brw_link_shader(struct gl_context *ctx, struct gl_shader_program *shProg)
       do {
          progress = false;
 
-         if (stage == MESA_SHADER_FRAGMENT) {
+         if (is_scalar_shader_stage(brw, stage)) {
             brw_do_channel_expressions(shader->base.ir);
             brw_do_vector_splitting(shader->base.ir);
          }
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 280db47..3e9cc23 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -22,6 +22,7 @@
  */
 
 #include "brw_vec4.h"
+#include "brw_fs.h"
 #include "brw_cfg.h"
 #include "brw_vs.h"
 #include "brw_dead_control_flow.h"
@@ -1863,6 +1864,7 @@ brw_vs_emit(struct brw_context *brw,
 {
bool start_busy = false;
double start_time = 0;
+   const unsigned *assembly = NULL;
 
    if (unlikely(brw->perf_debug)) {
       start_busy = (brw->batch.last_bo &&
@@ -1877,23 +1879,55 @@ brw_vs_emit(struct brw_context *brw,
    if (unlikely(INTEL_DEBUG & DEBUG_VS))
       brw_dump_ir("vertex", prog, &shader->base, &c->vp->program.Base);
 
-   vec4_vs_visitor v(brw, c, prog_data, prog, mem_ctx);
-   if (!v.run()) {
-  if
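
[Editor's sketch, not part of the patch] The backend selection described
in the commit message can be shown in isolation. The following is a
minimal standalone sketch, assuming the behaviour stated above (scalar VS
by default on gen 8 and later, unless the vec4vs debug option is set).
The enum, the SKETCH_DEBUG_VEC4VS bit value and both function names are
hypothetical stand-ins, not the driver's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real values live in the Mesa headers. */
enum sketch_stage { SKETCH_VERTEX, SKETCH_GEOMETRY, SKETCH_FRAGMENT };
#define SKETCH_DEBUG_VEC4VS (UINT64_C(1) << 0)

/* Scalar VS is the default on gen >= 8 unless the vec4vs debug bit is set. */
bool
sketch_scalar_vs(int gen, uint64_t debug_flags)
{
   return gen >= 8 && !(debug_flags & SKETCH_DEBUG_VEC4VS);
}

/* Same shape as is_scalar_shader_stage() in the patch: fragment shaders
 * always use the scalar backend, vertex shaders only when enabled.
 */
bool
sketch_is_scalar_stage(enum sketch_stage stage, bool scalar_vs)
{
   switch (stage) {
   case SKETCH_FRAGMENT:
      return true;
   case SKETCH_VERTEX:
      return scalar_vs;
   default:
      return false;
   }
}
```

Keeping the decision in one predicate is what lets the lowering passes
ask "is this stage scalar?" instead of comparing against the fragment
stage directly.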

[Mesa-dev] [PATCH v2 03/16] i965: Generalize fs_generator further

This removes all stage specific data from the generator, and lets us
create a generator for any stage.

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp |  5 ++---
 src/mesa/drivers/dri/i965/brw_fs.cpp|  2 +-
 src/mesa/drivers/dri/i965/brw_fs.h  |  7 +++
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp  | 19 +++
 4 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
index 7802c9f..86ed953 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
@@ -29,9 +29,8 @@
 brw_blorp_eu_emitter::brw_blorp_eu_emitter(struct brw_context *brw,
bool debug_flag)
: mem_ctx(ralloc_context(NULL)),
- generator(brw, mem_ctx,
-   rzalloc(mem_ctx, struct brw_wm_prog_key),
-   rzalloc(mem_ctx, struct brw_wm_prog_data),
+ generator(brw, mem_ctx, (void *) rzalloc(mem_ctx, struct brw_wm_prog_key),
+               (struct brw_stage_prog_data *) rzalloc(mem_ctx, struct brw_wm_prog_data),
NULL, NULL, false, debug_flag)
 {
 }
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index e12fd77..e417e0c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3743,7 +3743,7 @@ brw_wm_fs_emit(struct brw_context *brw,
       prog_data->no_8 = false;
    }
 
-   fs_generator g(brw, mem_ctx, key, prog_data, prog, fp,
+   fs_generator g(brw, mem_ctx, (void *) key, &prog_data->base, prog, &fp->Base,
                   v.runtime_check_aads_emit, INTEL_DEBUG & DEBUG_WM);
if (simd8_cfg)
   g.generate_code(simd8_cfg, 8);
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 5c21dd0..ae21840 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -694,10 +694,10 @@ class fs_generator
 public:
fs_generator(struct brw_context *brw,
 void *mem_ctx,
-const struct brw_wm_prog_key *key,
-struct brw_wm_prog_data *prog_data,
+const void *key,
+struct brw_stage_prog_data *prog_data,
 struct gl_shader_program *shader_prog,
-struct gl_fragment_program *fp,
+struct gl_program *fp,
 bool runtime_check_aads_emit,
 bool debug_flag);
~fs_generator();
@@ -799,7 +799,6 @@ private:
struct gl_context *ctx;
 
struct brw_compile *p;
-   gl_shader_stage stage;
const void * const key;
struct brw_stage_prog_data * const prog_data;
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 0622b07..9faecf6 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -38,16 +38,16 @@ extern "C" {
 
 fs_generator::fs_generator(struct brw_context *brw,
void *mem_ctx,
-   const struct brw_wm_prog_key *key,
-   struct brw_wm_prog_data *prog_data,
+   const void *key,
+   struct brw_stage_prog_data *prog_data,
struct gl_shader_program *shader_prog,
-   struct gl_fragment_program *fp,
+   struct gl_program *prog,
bool runtime_check_aads_emit,
bool debug_flag)
 
-   : brw(brw), stage(MESA_SHADER_FRAGMENT), key(key),
-     prog_data(&prog_data->base), shader_prog(shader_prog),
-     prog(&fp->Base), runtime_check_aads_emit(runtime_check_aads_emit),
+   : brw(brw), key(key),
+     prog_data(prog_data), shader_prog(shader_prog),
+     prog(prog), runtime_check_aads_emit(runtime_check_aads_emit),
      debug_flag(debug_flag), mem_ctx(mem_ctx)
 {
    ctx = &brw->ctx;
@@ -105,7 +105,6 @@ fs_generator::fire_fb_write(fs_inst *inst,
 {
uint32_t msg_control;
 
-   assert(stage == MESA_SHADER_FRAGMENT);
    brw_wm_prog_data *prog_data = (brw_wm_prog_data*) this->prog_data;
 
if (brw-gen  6) {
@@ -146,7 +145,6 @@ fs_generator::fire_fb_write(fs_inst *inst,
 void
 fs_generator::generate_fb_write(fs_inst *inst, struct brw_reg payload)
 {
-   assert(stage == MESA_SHADER_FRAGMENT);
    brw_wm_prog_data *prog_data = (brw_wm_prog_data*) this->prog_data;
    const brw_wm_prog_key * const key = (brw_wm_prog_key * const) this->key;
    struct brw_reg implied_header;
@@ -700,7 +698,6 @@ fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src
assert(quality.file == BRW_IMMEDIATE_VALUE);
assert(quality.type == BRW_REGISTER_TYPE_D);
 
-   assert(stage == MESA_SHADER_FRAGMENT);
const brw_wm_prog_key * const key =

[Mesa-dev] [PATCH v2 10/16] i965: Rename brw_vec4_prog_data to brw_vue_prog_data

With scalar vertex shader coming up, we're going to reuse brw_vec4_prog_data
in the scalar backend.  There's nothing vec4 specific in the struct, it's
instead common state for stages that operate on VUEs.  This patch renames
the struct to brw_vue_prog_data which is more descriptive and will look a lot
less awkward when we use it in the scalar backend.

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_context.h   | 17 -
 src/mesa/drivers/dri/i965/brw_vec4.cpp|  6 +++---
 src/mesa/drivers/dri/i965/brw_vec4.h  | 18 +-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp  |  4 ++--
 src/mesa/drivers/dri/i965/brw_vec4_gs.c   |  4 ++--
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h   |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp|  4 ++--
 src/mesa/drivers/dri/i965/brw_vs.c| 10 +-
 src/mesa/drivers/dri/i965/brw_vs.h|  2 +-
 src/mesa/drivers/dri/i965/gen6_gs_state.c |  2 +-
 src/mesa/drivers/dri/i965/gen7_gs_state.c |  2 +-
 src/mesa/drivers/dri/i965/gen8_gs_state.c |  2 +-
 src/mesa/drivers/dri/i965/gen8_vs_state.c |  2 +-
 14 files changed, 38 insertions(+), 39 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index e7cd30f..463f3d2 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -145,7 +145,7 @@ extern "C" {
 struct brw_context;
 struct brw_inst;
 struct brw_vs_prog_key;
-struct brw_vec4_prog_key;
+struct brw_vue_prog_key;
 struct brw_wm_prog_key;
 struct brw_wm_prog_data;
 
@@ -528,10 +528,9 @@ struct brw_ff_gs_prog_data {
 };
 
 
-/* Note: brw_vec4_prog_data_compare() must be updated when adding fields to
- * this struct!
+/* Shared data for stages that operate on VUEs (vertex, geometry)
  */
-struct brw_vec4_prog_data {
+struct brw_vue_prog_data {
struct brw_stage_prog_data base;
struct brw_vue_map vue_map;
 
@@ -552,7 +551,7 @@ struct brw_vec4_prog_data {
  * struct!
  */
 struct brw_vs_prog_data {
-   struct brw_vec4_prog_data base;
+   struct brw_vue_prog_data base;
 
GLbitfield64 inputs_read;
 
@@ -610,7 +609,7 @@ struct brw_vs_prog_data {
  */
 struct brw_gs_prog_data
 {
-   struct brw_vec4_prog_data base;
+   struct brw_vue_prog_data base;
 
/**
 * Size of an output vertex, measured in HWORDS (32 bytes).
@@ -1853,9 +1852,9 @@ void gen8_hiz_exec(struct brw_context *brw, struct intel_mipmap_tree *mt,
 uint32_t get_hw_prim_for_gl_prim(int mode);
 
 void
-brw_setup_vec4_key_clip_info(struct brw_context *brw,
- struct brw_vec4_prog_key *key,
- bool program_uses_clip_distance);
+brw_setup_vue_key_clip_info(struct brw_context *brw,
+struct brw_vue_prog_key *key,
+bool program_uses_clip_distance);
 
 void
 gen6_upload_push_constants(struct brw_context *brw,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index df589b8..280db47 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1911,9 +1911,9 @@ brw_vs_emit(struct brw_context *brw,
 
 
 void
-brw_vec4_setup_prog_key_for_precompile(struct gl_context *ctx,
-   struct brw_vec4_prog_key *key,
-   GLuint id, struct gl_program *prog)
+brw_vue_setup_prog_key_for_precompile(struct gl_context *ctx,
+  struct brw_vue_prog_key *key,
+  GLuint id, struct gl_program *prog)
 {
    key->program_string_id = id;
    key->clamp_vertex_color = ctx->API == API_OPENGL_COMPAT;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 750f491..18ec8b3 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -50,7 +50,7 @@ struct brw_vec4_compile {
 };
 
 
-struct brw_vec4_prog_key {
+struct brw_vue_prog_key {
GLuint program_string_id;
 
/**
@@ -77,7 +77,7 @@ extern "C" {
 
 void
 brw_vec4_setup_prog_key_for_precompile(struct gl_context *ctx,
-   struct brw_vec4_prog_key *key,
+   struct brw_vue_prog_key *key,
GLuint id, struct gl_program *prog);
 
 #ifdef __cplusplus
@@ -210,7 +210,7 @@ public:
 const src_reg src2 = src_reg());
 
struct brw_reg get_dst(void);
-   struct brw_reg get_src(const struct brw_vec4_prog_data *prog_data, int i);
+   struct brw_reg get_src(const struct brw_vue_prog_data *prog_data, int i);
 
dst_reg dst;
src_reg src[3];
@@ -252,8 +252,8 @@ public:
vec4_visitor(struct brw_context *brw,
 struct

[Mesa-dev] [PATCH v2 14/16] i965: Add fs_visitor::run_vs() to generate scalar vertex shader code

This patch uses the previous refactoring to add a new run_vs() method
that generates vertex shader code using the scalar visitor and
optimizer.

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_fs.cpp |  99 -
 src/mesa/drivers/dri/i965/brw_fs.h   |  21 +-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 303 ++-
 3 files changed, 412 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 4dce0a2..8007977 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1828,6 +1828,56 @@ fs_visitor::assign_urb_setup()
       urb_start + prog_data->num_varying_inputs * 2;
 }
 
+void
+fs_visitor::assign_vs_urb_setup()
+{
+   brw_vs_prog_data *vs_prog_data = (brw_vs_prog_data *) prog_data;
+   int grf, count, slot, channel, attr;
+
+   assert(stage == MESA_SHADER_VERTEX);
+   count = _mesa_bitcount_64(vs_prog_data->inputs_read);
+   if (vs_prog_data->uses_vertexid || vs_prog_data->uses_instanceid)
+      count++;
+
+   /* Each attribute is 4 regs. */
+   this->first_non_payload_grf =
+      payload.num_regs + prog_data->curb_read_length + count * 4;
+
+   unsigned vue_entries =
+      MAX2(count, vs_prog_data->base.vue_map.num_slots);
+
+   vs_prog_data->base.urb_entry_size = ALIGN(vue_entries, 4) / 4;
+   vs_prog_data->base.urb_read_length = (count + 1) / 2;
+
+   assert(vs_prog_data->base.urb_read_length <= 15);
+
+   /* Rewrite all ATTR file references to the hw grf that they land in. */
+   foreach_block_and_inst(block, fs_inst, inst, cfg) {
+      for (int i = 0; i < inst->sources; i++) {
+         if (inst->src[i].file == ATTR) {
+
+            if (inst->src[i].reg == VERT_ATTRIB_MAX) {
+               slot = count - 1;
+            } else {
+               attr = inst->src[i].reg + inst->src[i].reg_offset / 4;
+               slot = _mesa_bitcount_64(vs_prog_data->inputs_read &
+                                        BITFIELD64_MASK(attr));
+            }
+
+            channel = inst->src[i].reg_offset & 3;
+
+            grf = payload.num_regs +
+               prog_data->curb_read_length +
+               slot * 4 + channel;
+
+            inst->src[i].file = HW_REG;
+            inst->src[i].fixed_hw_reg =
+               retype(brw_vec8_grf(grf, 0), inst->src[i].type);
+         }
+      }
+   }
+}
+
 /**
  * Split large virtual GRFs into separate components if we can.
  *
@@ -3405,6 +3455,13 @@ fs_visitor::setup_payload_gen6()
 }
 
 void
+fs_visitor::setup_vs_payload()
+{
+   /* R0: thread header, R1: urb handles */
+   payload.num_regs = 2;
+}
+
+void
 fs_visitor::assign_binding_table_offsets()
 {
assert(stage == MESA_SHADER_FRAGMENT);
@@ -3471,6 +3528,8 @@ fs_visitor::opt_drop_redundant_mov_to_flags()
 void
 fs_visitor::optimize()
 {
+   const char *stage_name = stage == MESA_SHADER_VERTEX ? "vs" : "fs";
+
calculate_cfg();
 
split_virtual_grfs();
@@ -3487,8 +3546,8 @@ fs_visitor::optimize()
 \
       if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER) && this_progress) {   \
          char filename[64];                                             \
-         snprintf(filename, 64, "fs%d-%04d-%02d-%02d-" #pass,           \
-                  dispatch_width, shader_prog ? shader_prog->Name : 0, iteration, pass_num); \
+         snprintf(filename, 64, "%s%d-%04d-%02d-%02d-" #pass,           \
+                  stage_name, dispatch_width, shader_prog ? shader_prog->Name : 0, iteration, pass_num); \
                                                                         \
          backend_visitor::dump_instructions(filename);                  \
       }                                                                 \
@@ -3498,8 +3557,8 @@ fs_visitor::optimize()
 
    if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) {
       char filename[64];
-      snprintf(filename, 64, "fs%d-%04d-00-start",
-               dispatch_width, shader_prog ? shader_prog->Name : 0);
+      snprintf(filename, 64, "%s%d-%04d-00-start",
+               stage_name, dispatch_width, shader_prog ? shader_prog->Name : 0);
 
       backend_visitor::dump_instructions(filename);
    }
@@ -3608,6 +3667,38 @@ fs_visitor::allocate_registers()
 }
 
 bool
+fs_visitor::run_vs()
+{
+   assert(stage == MESA_SHADER_VERTEX);
+
+   assign_common_binding_table_offsets(0);
+   setup_vs_payload();
+
+   if (INTEL_DEBUG & DEBUG_SHADER_TIME)
+      emit_shader_time_begin();
+
+   foreach_in_list(ir_instruction, ir, shader->base.ir) {
+      base_ir = ir;
+      this->result = reg_undef;
+      ir->accept(this);
+   }
+   base_ir = NULL;
+   if (failed)
+      return false;
+
+   emit_urb_writes();
+
+   optimize();
+
+   assign_curb_setup();
+   assign_vs_urb_setup();
+
+   allocate_registers();
+
+   return !failed;
+}
+
+bool
 fs_visitor::run()
 {
    sanity_param_count = prog->Parameters->NumParameters;
diff --git
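
[Editor's sketch, not part of the patch] The attribute-to-register
mapping in assign_vs_urb_setup() relies on a popcount trick: the slot of
an attribute is the number of enabled attributes with a lower index.
Below is a minimal standalone sketch of that arithmetic. The helper names
are hypothetical; sketch_bitcount64() and sketch_low_mask64() are plain-C
stand-ins written for this note, assumed to behave like the
_mesa_bitcount_64() and BITFIELD64_MASK() uses in the patch.

```c
#include <stdint.h>

/* Count the set bits in a 64-bit word. */
static unsigned
sketch_bitcount64(uint64_t v)
{
   unsigned n = 0;
   while (v) {
      v &= v - 1;   /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* Mask with the low `bits` bits set (all ones for bits >= 64). */
static uint64_t
sketch_low_mask64(unsigned bits)
{
   return bits >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/* Slot of attribute `attr`: how many enabled attributes precede it. */
unsigned
sketch_attr_slot(uint64_t inputs_read, unsigned attr)
{
   return sketch_bitcount64(inputs_read & sketch_low_mask64(attr));
}

/* Register index: payload regs, then push constants, then 4 regs per
 * attribute slot, plus the channel within the attribute.
 */
unsigned
sketch_attr_grf(unsigned payload_regs, unsigned curb_read_length,
                unsigned slot, unsigned channel)
{
   return payload_regs + curb_read_length + slot * 4 + channel;
}
```

For example, with attributes 0, 3 and 5 enabled, attribute 5 lands in
slot 2; with a 2-register payload and 1 register of push constants, its
channel 3 is read from register 2 + 1 + 2 * 4 + 3 = 14.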

Re: [Mesa-dev] [PATCH 4/4] aux/pipe_loader: Don't leak dlerror string on dlopen failure

On Thu, Nov 13, 2014 at 6:22 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
> On Thu, Nov 13, 2014 at 6:43 PM, Aaron Watry awa...@gmail.com wrote:
>> dlopen allocates a string on dlopen failure which is retrieved via dlerror. In
>> order to free that string, you need to retrieve and then free it.
>
> Are you basically saying that glibc leaks memory and you're trying to
> make up for it? What if you use a non-buggy library? Or is dlopen()
> specified in such a way that if it fails, you must free the result of
> dlerror? I see nothing in the man pages to suggest that...

The closest that I can come to documentation at least implying this is [1]:


RETURN VALUE

If file cannot be found, cannot be opened for reading, is not of an
appropriate object format for processing by dlopen(), or if an error
occurs during the process of loading file or relocating its symbolic
references, dlopen() shall return NULL. More detailed diagnostic
information shall be available through dlerror().


Which implies that libdl needs to keep some sort of state regarding
the last error encountered.  I see no requirement that it keep a
malloc'd string, just that it keep some state information around.

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/dlopen.html

That does seem to lead one to read the dlerror() page that has this gem:

DESCRIPTION

The dlerror() function shall return a null-terminated character string
(with no trailing newline) that describes the last error that
occurred during dynamic linking processing. If no dynamic linking
errors have occurred since the last invocation of dlerror(), dlerror()
shall return NULL. Thus, invoking dlerror() a second time, immediately
following a prior invocation, shall result in NULL being returned.

snip

APPLICATION USAGE
The messages returned by dlerror() may reside in a static buffer that
is overwritten on each call to dlerror()


So, it may or may not return a malloc'd string, and all I've managed
here is to fix a leak in glibc's specific implementation.

The above docs seem to imply that a dlopen() failure needs to populate
some state, and that dlerror() retrieves the last error that has
occurred since the previous dlerror() call. Calling dlerror() again at
that point will return NULL.

That being said, I think a simpler and probably more correct fix
would be to do:
dlerror(); dlerror();

Thoughts?

--Aaron
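
[Editor's sketch, not part of the original mail] The behaviour quoted
from the dlerror() description can be exercised with a few lines of C.
This is a minimal sketch under the assumptions stated in the quoted
POSIX text: a failed dlopen() leaves an error pending, one dlerror()
call retrieves it, and the next call returns NULL. The function name
sketch_open_library() and its had_message parameter are hypothetical.
The returned string is deliberately not freed, since the quoted text
says it may reside in a static buffer. Whether consuming the message
also releases any buffer glibc allocated is not established in this
thread.

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Try to load `path`. On failure, call dlerror() once so that the
 * pending error state is consumed. The message is not freed here.
 * Returns the library handle, or NULL if loading failed.
 * *had_message is set to 1 if dlerror() reported a message, else 0.
 */
void *
sketch_open_library(const char *path, int *had_message)
{
   void *lib = dlopen(path, RTLD_NOW);

   *had_message = 0;
   if (lib == NULL) {
      const char *msg = dlerror();
      if (msg != NULL)
         *had_message = 1;
   }
   return lib;
}
```

After a failed sketch_open_library() call, a further dlerror() call is
expected to return NULL, which is the property the proposed
"dlerror(); dlerror();" sequence leans on. On older glibc versions the
program may need to be linked with -ldl.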





>> In order to keep things legit the windows/other util_dl_error paths allocate
>> and then copy their error message into a buffer as well.
>>
>> Signed-off-by: Aaron Watry awa...@gmail.com
>> CC: Ilia Mirkin imir...@alum.mit.edu
>>
>> v3: Switch comment to C-Style
>> v2: Use strdup instead of calloc/strcpy
>> ---
>>  src/gallium/auxiliary/pipe-loader/pipe_loader.c | 5 +++++
>>  src/gallium/auxiliary/util/u_dl.c               | 4 ++--
>>  2 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader.c b/src/gallium/auxiliary/pipe-loader/pipe_loader.c
>> index 8e79f85..7a4e0b1 100644
>> --- a/src/gallium/auxiliary/pipe-loader/pipe_loader.c
>> +++ b/src/gallium/auxiliary/pipe-loader/pipe_loader.c
>> @@ -25,6 +25,8 @@
>>   *
>>   **/
>>
>> +#include <dlfcn.h>
>> +
>>  #include "pipe_loader_priv.h"
>>
>>  #include "util/u_inlines.h"
>> @@ -101,6 +103,9 @@ pipe_loader_find_module(struct pipe_loader_device *dev,
>>           if (lib) {
>>              return lib;
>>           }
>> +
>> +         /* Retrieve the dlerror() str so that it can be freed properly */
>> +         FREE(util_dl_error());
>>        }
>>     }
>>
>> diff --git a/src/gallium/auxiliary/util/u_dl.c b/src/gallium/auxiliary/util/u_dl.c
>> index aca435d..00c4d7c 100644
>> --- a/src/gallium/auxiliary/util/u_dl.c
>> +++ b/src/gallium/auxiliary/util/u_dl.c
>> @@ -87,8 +87,8 @@ util_dl_error(void)
>>  #if defined(PIPE_OS_UNIX)
>>     return dlerror();
>>  #elif defined(PIPE_OS_WINDOWS)
>> -   return "unknown error";
>> +   return strdup("unknown error");
>>  #else
>> -   return "unknown error";
>> +   return strdup("unknown error");
>>  #endif
>>  }
>> --
>> 2.1.0


Re: [Mesa-dev] [PATCH 4/4] aux/pipe_loader: Don't leak dlerror string on dlopen failure

2014-11-13 Thread Ilia Mirkin

On Thu, Nov 13, 2014 at 7:54 PM, Aaron Watry awa...@gmail.com wrote:

> On Thu, Nov 13, 2014 at 6:22 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
>> On Thu, Nov 13, 2014 at 6:43 PM, Aaron Watry awa...@gmail.com wrote:
>>> dlopen allocates a string on dlopen failure which is retrieved via
>>> dlerror. In
>>> order to free that string, you need to retrieve and then free it.
>>
>> Are you basically saying that glibc leaks memory and you're trying to
>> make up for it? What if you use a non-buggy library? Or is dlopen()
>> specified in such a way that if it fails, you must free the result of
>> dlerror? I see nothing in the man pages to suggest that...
>
> The closest that I can come to documentation at least implying this is [1]:
>
> RETURN VALUE
>
> If file cannot be found, cannot be opened for reading, is not of an
> appropriate object format for processing by dlopen(), or if an error
> occurs during the process of loading file or relocating its symbolic
> references, dlopen() shall return NULL. More detailed diagnostic
> information shall be available through dlerror().
>
> Which implies that libdl needs to keep some sort of state regarding
> the last error encountered.  I see no requirement that it keep a
> malloc'd string, just that it keep some state information around.
>
> [1] http://pubs.opengroup.org/onlinepubs/009695399/functions/dlopen.html
>
> That does seem to lead one to read the dlerror() page that has this gem:
>
> DESCRIPTION
>
> The dlerror() function shall return a null-terminated character string
> (with no trailing newline) that describes the last error that
> occurred during dynamic linking processing. If no dynamic linking
> errors have occurred since the last invocation of dlerror(), dlerror()
> shall return NULL. Thus, invoking dlerror() a second time, immediately
> following a prior invocation, shall result in NULL being returned.
>
> snip
>
> APPLICATION USAGE
> The messages returned by dlerror() may reside in a static buffer that
> is overwritten on each call to dlerror()
>
> So, it may or may not return a malloc'd string, and all I've managed
> here is to fix a leak in glibc's specific implementation.
>
> The above docs seem to imply that a dlopen() failure needs to populate
> some state, and that dlerror() retrieves the last error that has
> occurred since the previous dlerror() call. Calling dlerror() again at
> that point will return NULL.
>
> That being said, I think a simpler and probably more correct fix
> would be to do:
> dlerror(); dlerror();

That seems like it's vastly less likely to do the wrong thing with any
reasonable implementation. So if it fixes things for glibc, let's do
that instead :) Also, I wonder, even if glibc malloc's the string,
whether it is truly leaked or valgrind just thinks that. May be
interesting to pull the curtain back and see what's actually going on
in glibc...

  -ilia



> Thoughts?
>
> --Aaron
>
>>> In order to keep things legit the windows/other util_dl_error paths
>>> allocate
>>> and then copy their error message into a buffer as well.
>>>
>>> Signed-off-by: Aaron Watry awa...@gmail.com
>>> CC: Ilia Mirkin imir...@alum.mit.edu
>>>
>>> v3: Switch comment to C-Style
>>> v2: Use strdup instead of calloc/strcpy
>>> ---
>>>  src/gallium/auxiliary/pipe-loader/pipe_loader.c | 5 +++++
>>>  src/gallium/auxiliary/util/u_dl.c               | 4 ++--
>>>  2 files changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader.c b/src/gallium/auxiliary/pipe-loader/pipe_loader.c
>>> index 8e79f85..7a4e0b1 100644
>>> --- a/src/gallium/auxiliary/pipe-loader/pipe_loader.c
>>> +++ b/src/gallium/auxiliary/pipe-loader/pipe_loader.c
>>> @@ -25,6 +25,8 @@
>>>   *
>>>   **/
>>>
>>> +#include <dlfcn.h>
>>> +
>>>  #include "pipe_loader_priv.h"
>>>
>>>  #include "util/u_inlines.h"
>>> @@ -101,6 +103,9 @@ pipe_loader_find_module(struct pipe_loader_device *dev,
>>>           if (lib) {
>>>              return lib;
>>>           }
>>> +
>>> +         /* Retrieve the dlerror() str so that it can be freed properly */
>>> +         FREE(util_dl_error());
>>>        }
>>>     }
>>>
>>> diff --git a/src/gallium/auxiliary/util/u_dl.c b/src/gallium/auxiliary/util/u_dl.c
>>> index aca435d..00c4d7c 100644
>>> --- a/src/gallium/auxiliary/util/u_dl.c
>>> +++ b/src/gallium/auxiliary/util/u_dl.c
>>> @@ -87,8 +87,8 @@ util_dl_error(void)
>>>  #if defined(PIPE_OS_UNIX)
>>>     return dlerror();
>>>  #elif defined(PIPE_OS_WINDOWS)
>>> -   return "unknown error";
>>> +   return strdup("unknown error");
>>>  #else
>>> -   return "unknown error";
>>> +   return strdup("unknown error");
>>>  #endif
>>>  }
>>> --
>>> 2.1.0
 

Re: [Mesa-dev] How difficult would it be to have debugging information for Jitted code show up?

2014-11-13 Thread Steven Stewart-Gallus

But my distribution does build Mesa with debugging symbols. I have the package
libgl1-mesa-dri-dbg installed which gives me debugging symbols such as
drm_intel_bo_wait_rendering and drm_intel_bo_subdata. I assume what I don't
have is debugging information for JITted code, although maybe the problem is a bug in
perf (they've had problems with artificial intrinsics before.) Please assume I
have at least the small quantity of intelligence needed to have installed the
debugging symbols for the library I a. But tell me, what do you see when you
profile glxgears with perf?

Thank you,
Steven Stewart-Gallus

Re: [Mesa-dev] How difficult would it be to have debugging information for Jitted code show up?

2014-11-13 Thread Emil Velikov

Hi Steven,

On 14 November 2014 01:40, Steven Stewart-Gallus
sstewartgallu...@mylangara.bc.ca wrote:
> But my distribution does build Mesa with debugging symbols. I have the package
> libgl1-mesa-dri-dbg installed which gives me debugging symbols such as
> drm_intel_bo_wait_rendering and drm_intel_bo_subdata. I assume I don't have is
> debugging information for JITted code although maybe the problem is a bug in
> perf (they've had problems with artificial intrinsics before.) Please assume I
> have at least the small quantity of intelligence needed to have installed the
> debugging symbols for the library I a. But tell me, what do you see when you
> profile glxgears with perf?

Let me put things a bit differently: the classic drivers (be that i965
or any other) do _not_ use LLVM. So when you say that there are "no
debug symbols for the JITted code", that does not make sense to most
(all?) people, so they assume the closest thing, which is that you're
missing the debug symbols, and therefore give you instructions on how
to get them (rebuild Mesa).

So can you please tell us what makes you think that using the i965
driver goes into LLVM/JITted code?
Perhaps having some sort of backtrace (I know some of the function names
will be ??) or a testcase might shed some light :)

Cheers,
-Emil

> Thank you,
> Steven Stewart-Gallus

Re: [Mesa-dev] [PATCH 4/4] aux/pipe_loader: Don't leak dlerror string on dlopen failure