date:20141112

[Mesa-dev] build regression in tinderbox

2014-11-12 Thread Dave Airlie

I've been building git llvm and mesa on a RHEL7 tinderbox (tinderbox.x.org)

This just appeared.

Dave.

gallivm/lp_bld_debug.cpp: In function 'size_t disassemble(const void*,
llvm::raw_ostream&)':
gallivm/lp_bld_debug.cpp:283:23: error: cannot declare variable
'memoryObject' to be of abstract type 'BufferMemoryObject'
BufferMemoryObject memoryObject((const uint8_t *)bytes, extent);
   ^
gallivm/lp_bld_debug.cpp:149:7: note:   because the following virtual
functions are pure within 'BufferMemoryObject':
 class BufferMemoryObject:
   ^
In file included from gallivm/lp_bld_debug.cpp:35:0:
/home/tinderbox/xorg-build/include/llvm/Support/MemoryObject.h:49:15:
note: virtual int llvm::MemoryObject::readBytes(uint64_t, uint64_t,
uint8_t*) const
   virtual int readBytes(uint64_t address, uint64_t size,
   ^
/home/tinderbox/xorg-build/include/llvm/Support/MemoryObject.h:58:26:
note: virtual const uint8_t* llvm::MemoryObject::getPointer(uint64_t,
uint64_t) const
   virtual const uint8_t *getPointer(uint64_t address, uint64_t size) const = 0;
  ^
/home/tinderbox/xorg-build/include/llvm/Support/MemoryObject.h:65:16:
note: virtual bool llvm::MemoryObject::isValidAddress(uint64_t) const
   virtual bool isValidAddress(uint64_t address) const = 0;
^
/home/tinderbox/xorg-build/include/llvm/Support/MemoryObject.h:72:16:
note: virtual bool llvm::MemoryObject::isObjectEnd(uint64_t) const
   virtual bool isObjectEnd(uint64_t address) const = 0;
^
gallivm/lp_bld_debug.cpp:300:23: error: no matching function for call
to 'llvm::MCDisassembler::getInstruction(llvm::MCInst&, uint64_t&,
BufferMemoryObject&, uint64_t&, llvm::raw_ostream&,
llvm::raw_ostream&) const'
   nulls(), nulls())) {
   ^
gallivm/lp_bld_debug.cpp:300:23: note: candidate is:
In file included from gallivm/lp_bld_debug.cpp:48:0:
/home/tinderbox/xorg-build/include/llvm/MC/MCDisassembler.h:78:24:
note: virtual llvm::MCDisassembler::DecodeStatus
llvm::MCDisassembler::getInstruction(llvm::MCInst&, uint64_t&,
llvm::ArrayRef<unsigned char>, uint64_t, llvm::raw_ostream&,
llvm::raw_ostream&) const
   virtual DecodeStatus getInstruction(MCInst &Instr, uint64_t &Size,
^
/home/tinderbox/xorg-build/include/llvm/MC/MCDisassembler.h:78:24:
note:   no known conversion for argument 3 from 'BufferMemoryObject'
to 'llvm::ArrayRef<unsigned char>'
gallivm/lp_bld_misc.cpp: In function 'LLVMBool
lp_build_create_jit_compiler_for_module(LLVMOpaqueExecutionEngine**,
lp_generated_code**, LLVMModuleRef, LLVMMCJITMemoryManagerRef,
unsigned int, int, char**)':
gallivm/lp_bld_misc.cpp:526:11: warning: 'MM' may be used
uninitialized in this function [-Wmaybe-uninitialized]
delete MM;
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] i965: Implement WaCsStallAtEveryFourthPipecontrol on IVB/BYT.

2014-11-12 Thread Kenneth Graunke

According to the documentation, we need to do a CS stall on every fourth
PIPE_CONTROL command to avoid GPU hangs.  The kernel does a CS stall
between batches, so we only need to count the PIPE_CONTROLs in our batches.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_context.h   |  2 ++
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 35 +++
 2 files changed, 37 insertions(+)

This may help
https://code.google.com/p/chromium/issues/detail?id=333130
My theory is that marcheu's patch removes PIPE_CONTROLs that don't have
CS stalls, which may bring it under 4 in a row, or at least bring ones
with CS stalls closer together.

It also may help the couple of users who've reported IVB GPU hangs recently.

Or it may be totally useless...

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 656cbe8..27cf92c 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -854,6 +854,8 @@ struct intel_batchbuffer {
    enum brw_gpu_ring ring;
    bool needs_sol_reset;
 
+   uint8_t pipe_controls_since_last_cs_stall;
+
    struct {
       uint16_t used;
       int reloc_count;
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index cd45af6..2255cee 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -81,6 +81,7 @@ intel_batchbuffer_reset(struct brw_context *brw)
    brw->batch.state_batch_offset = brw->batch.bo->size;
    brw->batch.used = 0;
    brw->batch.needs_sol_reset = false;
+   brw->batch.pipe_controls_since_last_cs_stall = 0;
 
    /* We don't know what ring the new batch will be sent to until we see the
     * first BEGIN_BATCH or BEGIN_BATCH_BLT.  Mark it as unknown.
@@ -433,6 +434,36 @@ gen8_add_cs_stall_workaround_bits(uint32_t *flags)
       *flags |= PIPE_CONTROL_STALL_AT_SCOREBOARD;
 }
 
+/* Implement the WaCsStallAtEveryFourthPipecontrol workaround on IVB, BYT:
+ *
+ * "Every 4th PIPE_CONTROL command, not counting the PIPE_CONTROL with
+ *  only read-cache-invalidate bit(s) set, must have a CS_STALL bit set."
+ *
+ * Note that the kernel does CS stalls between batches, so we only need
+ * to count them within a batch.
+ */
+static uint32_t
+gen7_cs_stall_every_four_pipe_controls(struct brw_context *brw, uint32_t flags)
+{
+   if (brw->gen == 7 && brw->is_haswell) {
+      if (flags & PIPE_CONTROL_CS_STALL) {
+         /* If we're doing a CS stall, reset the counter and carry on. */
+         brw->batch.pipe_controls_since_last_cs_stall = 0;
+         return 0;
+      } else {
+         /* Otherwise, we need to count this PIPE_CONTROL. */
+         ++brw->batch.pipe_controls_since_last_cs_stall;
+      }
+
+      /* If this is the fourth pipe control without a CS stall, do one now. */
+      if (brw->batch.pipe_controls_since_last_cs_stall == 4) {
+         brw->batch.pipe_controls_since_last_cs_stall = 0;
+         return PIPE_CONTROL_CS_STALL;
+      }
+   }
+   return 0;
+}
+
 /**
  * Emit a PIPE_CONTROL with various flushing flags.
  *
@@ -454,6 +485,8 @@ brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
       OUT_BATCH(0);
       ADVANCE_BATCH();
    } else if (brw->gen >= 6) {
+      flags |= gen7_cs_stall_every_four_pipe_controls(brw, flags);
+
       BEGIN_BATCH(5);
       OUT_BATCH(_3DSTATE_PIPE_CONTROL | (5 - 2));
       OUT_BATCH(flags);
@@ -496,6 +529,8 @@ brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
       OUT_BATCH(imm_upper);
       ADVANCE_BATCH();
    } else if (brw->gen >= 6) {
+      flags |= gen7_cs_stall_every_four_pipe_controls(brw, flags);
+
       /* PPGTT/GGTT is selected by DW2 bit 2 on Sandybridge, but DW1 bit 24
        * on later platforms.  We always use PPGTT on Gen7+.
        */
-- 
2.1.3


Re: [Mesa-dev] [PATCH] i965: Use the predicate enable bit for conditional rendering without stalling

2014-11-12 Thread Daniel Vetter

On Tue, Nov 11, 2014 at 11:13:28AM -0800, Kenneth Graunke wrote:
> On Tuesday, November 11, 2014 06:59:51 PM Neil Roberts wrote:
> > Kenneth Graunke <kenn...@whitecape.org> writes:
> >
> > > drm-intel-next must have the new software checker turned on, which
> > > disallows non-whitelisted register writes (along with libva, so it
> > > can't really be enabled upstream yet).
> >
> > For what it's worth, I get the EINVAL error even on the stock Fedora 20
> > kernel on Haswell (and presumably IvyBridge) so I can only assume the
> > software checker is already upstream, unless I'm misunderstanding
> > something.
> >
> > $ uname -r
> > 3.16.7-200.fc20.x86_64
> > $ modinfo i915 | grep cmd_parser
> > parm: enable_cmd_parser:Enable command parsing [...]
> > (1=enabled [default], 0=disabled) (int)
> > $ sudo cat /sys/module/i915/parameters/enable_cmd_parser
> > 1
> >
> > If I cat 0 to /sys/module/i915/parameters/enable_cmd_parser then I no
> > longer get the EINVAL error.
> >
> > - Neil
>
> Huh.  Yeah, I thought they turned it on by default in 3.16, which I don't
> understand at all.  AFAIK the libva issue isn't fixed (or wasn't by then), so
> it sure seems like it would've broken userspace.  Which would be a pretty
> clear kernel policy violation...

We let libva pass. And in the latest patches from Brad if we detect libva
tricks we'll still let it pass, just not with elevated privs needed for
writing special registers. And the point of enabling the parser in 3.16
already was to have as much coverage early as possible to catch any
userspace issues we've missed.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

Re: [Mesa-dev] [PATCH] i965: Implement WaCsStallAtEveryFourthPipecontrol on IVB/BYT.

2014-11-12 Thread Daniel Vetter

On Wed, Nov 12, 2014 at 01:33:01AM -0800, Kenneth Graunke wrote:
> According to the documentation, we need to do a CS stall on every fourth
> PIPE_CONTROL command to avoid GPU hangs.  The kernel does a CS stall
> between batches, so we only need to count the PIPE_CONTROLs in our batches.
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>

Yeah, kernel adds the CS stall bit both to the flush right before/after
the batch so this works. The kernel also has a comment so people hopefully
check userspace assumptions when testing this.

Reviewed-by: Daniel Vetter <daniel.vet...@ffwll.ch>

Some useless bikesheds for you to ignore below ;-)

> ---
>  src/mesa/drivers/dri/i965/brw_context.h   |  2 ++
>  src/mesa/drivers/dri/i965/intel_batchbuffer.c | 35 +++
>  2 files changed, 37 insertions(+)
>
> This may help
> https://code.google.com/p/chromium/issues/detail?id=333130
> My theory is that marcheu's patch removes PIPE_CONTROLs that don't have
> CS stalls, which may bring it under 4 in a row, or at least bring ones
> with CS stalls closer together.
>
> It also may help the couple of users who've reported IVB GPU hangs recently.
>
> Or it may be totally useless...
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index 656cbe8..27cf92c 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -854,6 +854,8 @@ struct intel_batchbuffer {
>     enum brw_gpu_ring ring;
>     bool needs_sol_reset;
>
> +   uint8_t pipe_controls_since_last_cs_stall;

Since the compiler aligns this anyway, I tend not to bother with smaller
types. And fixed-width types are generally used for ABI and hw registers,
which this isn't.

> +
>     struct {
>        uint16_t used;
>        int reloc_count;
> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> index cd45af6..2255cee 100644
> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> @@ -81,6 +81,7 @@ intel_batchbuffer_reset(struct brw_context *brw)
>     brw->batch.state_batch_offset = brw->batch.bo->size;
>     brw->batch.used = 0;
>     brw->batch.needs_sol_reset = false;
> +   brw->batch.pipe_controls_since_last_cs_stall = 0;
>
>     /* We don't know what ring the new batch will be sent to until we see the
>      * first BEGIN_BATCH or BEGIN_BATCH_BLT.  Mark it as unknown.
> @@ -433,6 +434,36 @@ gen8_add_cs_stall_workaround_bits(uint32_t *flags)
>        *flags |= PIPE_CONTROL_STALL_AT_SCOREBOARD;
>  }
>
> +/* Implement the WaCsStallAtEveryFourthPipecontrol workaround on IVB, BYT:
> + *
> + * "Every 4th PIPE_CONTROL command, not counting the PIPE_CONTROL with
> + *  only read-cache-invalidate bit(s) set, must have a CS_STALL bit set."
> + *
> + * Note that the kernel does CS stalls between batches, so we only need
> + * to count them within a batch.
> + */
> +static uint32_t
> +gen7_cs_stall_every_four_pipe_controls(struct brw_context *brw, uint32_t flags)
> +{
> +   if (brw->gen == 7 && brw->is_haswell) {
> +      if (flags & PIPE_CONTROL_CS_STALL) {
> +         /* If we're doing a CS stall, reset the counter and carry on. */
> +         brw->batch.pipe_controls_since_last_cs_stall = 0;
> +         return 0;
> +      } else {
> +         /* Otherwise, we need to count this PIPE_CONTROL. */
> +         ++brw->batch.pipe_controls_since_last_cs_stall;
> +      }

You can flatten the control flow here a bit by dropping the else and
moving the ++ into the check. IMO that's a common enough pattern that it
doesn't obfuscate the code.

> +
> +      /* If this is the fourth pipe control without a CS stall, do one now. */
> +      if (brw->batch.pipe_controls_since_last_cs_stall == 4) {
> +         brw->batch.pipe_controls_since_last_cs_stall = 0;
> +         return PIPE_CONTROL_CS_STALL;
> +      }
> +   }
> +   return 0;
> +}
> +
>  /**
>   * Emit a PIPE_CONTROL with various flushing flags.
>   *
> @@ -454,6 +485,8 @@ brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
>        OUT_BATCH(0);
>        ADVANCE_BATCH();
>     } else if (brw->gen >= 6) {
> +      flags |= gen7_cs_stall_every_four_pipe_controls(brw, flags);
> +
>        BEGIN_BATCH(5);
>        OUT_BATCH(_3DSTATE_PIPE_CONTROL | (5 - 2));
>        OUT_BATCH(flags);
> @@ -496,6 +529,8 @@ brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
>        OUT_BATCH(imm_upper);
>        ADVANCE_BATCH();
>     } else if (brw->gen >= 6) {
> +      flags |= gen7_cs_stall_every_four_pipe_controls(brw, flags);
> +
>        /* PPGTT/GGTT is selected by DW2 bit 2 on Sandybridge, but DW1 bit 24
>         * on later platforms.  We always use PPGTT on Gen7+.
>         */
> --
> 2.1.3

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

Re: [Mesa-dev] [PATCH] i965: Implement WaCsStallAtEveryFourthPipecontrol on IVB/BYT.

2014-11-12 Thread Chris Wilson

On Wed, Nov 12, 2014 at 11:39:28AM +0100, Daniel Vetter wrote:
> On Wed, Nov 12, 2014 at 01:33:01AM -0800, Kenneth Graunke wrote:
> > +/* Implement the WaCsStallAtEveryFourthPipecontrol workaround on IVB, BYT:
> > + *
> > + * "Every 4th PIPE_CONTROL command, not counting the PIPE_CONTROL with
> > + *  only read-cache-invalidate bit(s) set, must have a CS_STALL bit set."
> > + *
> > + * Note that the kernel does CS stalls between batches, so we only need
> > + * to count them within a batch.
> > + */
> > +static uint32_t
> > +gen7_cs_stall_every_four_pipe_controls(struct brw_context *brw, uint32_t flags)
> > +{
> > +   if (brw->gen == 7 && brw->is_haswell) {

The comment says for IVB,BYT, the code here only applies to HSW.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

Re: [Mesa-dev] build regression in tinderbox

2014-11-12 Thread Jose Fonseca

Thanks for the heads up.

Fixed now.

Jose


From: mesa-dev <mesa-dev-boun...@lists.freedesktop.org> on behalf of Dave Airlie <airl...@gmail.com>
Sent: 12 November 2014 08:05
To: mesa-dev@lists.freedesktop.org
Subject: [Mesa-dev] build regression in tinderbox

I've been building git llvm and mesa on a RHEL7 tinderbox (tinderbox.x.org)

This just appeared.

Dave.


[Mesa-dev] [PATCH] clover: fix clCreateContext Piglit test crash

2014-11-12 Thread EdB

clCreateContext no longer crashes when CL_CONTEXT_PLATFORM is invalid
---
 src/gallium/state_trackers/clover/api/context.cpp | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/clover/api/context.cpp b/src/gallium/state_trackers/clover/api/context.cpp
index 021eea3..749d2d7 100644
--- a/src/gallium/state_trackers/clover/api/context.cpp
+++ b/src/gallium/state_trackers/clover/api/context.cpp
@@ -39,10 +39,15 @@ clCreateContext(const cl_context_properties *d_props, cl_uint num_devs,
       throw error(CL_INVALID_VALUE);
 
    for (auto &prop : props) {
-      if (prop.first == CL_CONTEXT_PLATFORM)
-         obj(prop.second.as<cl_platform_id>());
-      else
+      if (prop.first == CL_CONTEXT_PLATFORM) {
+         //clover only have one platform
+         cl_platform_id d_platform;
+         cl_int ret = clGetPlatformIDs(1, &d_platform, NULL);
+         if (ret || (prop.second.as<cl_platform_id>() != d_platform))
+            throw error(CL_INVALID_PLATFORM);
+      } else {
          throw error(CL_INVALID_PROPERTY);
+      }
    }
 
    ret_error(r_errcode, CL_SUCCESS);
-- 
1.9.3
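
The check this patch adds can be illustrated outside clover with a plain C walk over a zero-terminated property list. This is only a sketch under assumptions: the type, enum, and function names are hypothetical stand-ins, the platform handle is modelled as an opaque pointer, and none of it is clover's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for cl_context_properties and CL_CONTEXT_PLATFORM. */
typedef intptr_t context_property;
enum { PROP_CONTEXT_PLATFORM = 0x1084 };

enum props_status {
   PROPS_OK,
   PROPS_INVALID_PLATFORM,
   PROPS_INVALID_PROPERTY
};

/* Validate a zero-terminated list of (name, value) pairs against the single
 * platform the implementation is assumed to expose. */
static enum props_status
validate_context_properties(const context_property *props,
                            const void *only_platform)
{
   assert(only_platform != NULL);

   if (props == NULL)
      return PROPS_OK;   /* no properties requested */

   for (; props[0] != 0; props += 2) {
      /* Any property other than the platform is rejected. */
      if (props[0] != PROP_CONTEXT_PLATFORM)
         return PROPS_INVALID_PROPERTY;
      /* The platform value must match the one known platform. */
      if ((const void *)props[1] != only_platform)
         return PROPS_INVALID_PLATFORM;
   }
   return PROPS_OK;
}
```

The point of the sketch is that a mismatched platform handle is reported as an error value instead of being dereferenced, which is the behaviour the commit message describes.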


Re: [Mesa-dev] [PATCH] i965: Implement WaCsStallAtEveryFourthPipecontrol on IVB/BYT.

2014-11-12 Thread Ian Romanick

On 11/12/2014 10:39 AM, Daniel Vetter wrote:
> On Wed, Nov 12, 2014 at 01:33:01AM -0800, Kenneth Graunke wrote:
>> According to the documentation, we need to do a CS stall on every fourth
>> PIPE_CONTROL command to avoid GPU hangs.  The kernel does a CS stall
>> between batches, so we only need to count the PIPE_CONTROLs in our batches.
>>
>> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
>
> Yeah, kernel adds the CS stall bit both to the flush right before/after
> the batch so this works. The kernel also has a comment so people hopefully
> check userspace assumptions when testing this.
>
> Reviewed-by: Daniel Vetter <daniel.vet...@ffwll.ch>
>
> Some useless bikesheds for you to ignore below ;-)
>
>> ---
>>  src/mesa/drivers/dri/i965/brw_context.h   |  2 ++
>>  src/mesa/drivers/dri/i965/intel_batchbuffer.c | 35 +++
>>  2 files changed, 37 insertions(+)
>>
>> This may help
>> https://code.google.com/p/chromium/issues/detail?id=333130
>> My theory is that marcheu's patch removes PIPE_CONTROLs that don't have
>> CS stalls, which may bring it under 4 in a row, or at least bring ones
>> with CS stalls closer together.
>>
>> It also may help the couple of users who've reported IVB GPU hangs recently.
>>
>> Or it may be totally useless...
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
>> index 656cbe8..27cf92c 100644
>> --- a/src/mesa/drivers/dri/i965/brw_context.h
>> +++ b/src/mesa/drivers/dri/i965/brw_context.h
>> @@ -854,6 +854,8 @@ struct intel_batchbuffer {
>>     enum brw_gpu_ring ring;
>>     bool needs_sol_reset;
>>
>> +   uint8_t pipe_controls_since_last_cs_stall;
>
> Since the compiler aligns this anyway, I tend not to bother with smaller
> types. And fixed-width types are generally used for ABI and hw registers,
> which this isn't.

I think this will get stored in the padding after the bool.  Making
this an int will cause extra padding.  I think that's why Ken chose to
put it here.
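
One way to check the padding argument is to compare two trimmed-down layouts. The structs below are hypothetical: they keep only the fields visible in the diff context (the real `struct intel_batchbuffer` has more members), and the result depends on the compiler and ABI, so treat this as a sketch rather than a statement about the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for enum brw_gpu_ring. */
enum ring { RING_UNKNOWN, RING_RENDER, RING_BLT };

/* Counter declared as uint8_t: it can sit directly after the bool. */
struct batch_with_u8 {
   enum ring ring;
   bool needs_sol_reset;
   uint8_t pipe_controls_since_last_cs_stall;
   struct { uint16_t used; int reloc_count; } saved;
};

/* Counter declared as int: it must be aligned like an int. */
struct batch_with_int {
   enum ring ring;
   bool needs_sol_reset;
   int pipe_controls_since_last_cs_stall;
   struct { uint16_t used; int reloc_count; } saved;
};

/* Returns how many bytes the narrow counter saves on this compiler/ABI. */
static size_t
bytes_saved_by_narrow_counter(void)
{
   /* The narrow layout should never be the larger of the two. */
   assert(sizeof(struct batch_with_u8) <= sizeof(struct batch_with_int));
   return sizeof(struct batch_with_int) - sizeof(struct batch_with_u8);
}
```

On a typical x86-64 Linux build the function would be expected to report a 4-byte difference, but that figure is an expectation for that ABI, not a guarantee.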

>> +
>>     struct {
>>        uint16_t used;
>>        int reloc_count;
>> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
>> index cd45af6..2255cee 100644
>> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
>> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
>> @@ -81,6 +81,7 @@ intel_batchbuffer_reset(struct brw_context *brw)
>>     brw->batch.state_batch_offset = brw->batch.bo->size;
>>     brw->batch.used = 0;
>>     brw->batch.needs_sol_reset = false;
>> +   brw->batch.pipe_controls_since_last_cs_stall = 0;
>>
>>     /* We don't know what ring the new batch will be sent to until we see the
>>      * first BEGIN_BATCH or BEGIN_BATCH_BLT.  Mark it as unknown.
>> @@ -433,6 +434,36 @@ gen8_add_cs_stall_workaround_bits(uint32_t *flags)
>>        *flags |= PIPE_CONTROL_STALL_AT_SCOREBOARD;
>>  }
>>
>> +/* Implement the WaCsStallAtEveryFourthPipecontrol workaround on IVB, BYT:
>> + *
>> + * "Every 4th PIPE_CONTROL command, not counting the PIPE_CONTROL with
>> + *  only read-cache-invalidate bit(s) set, must have a CS_STALL bit set."
>> + *
>> + * Note that the kernel does CS stalls between batches, so we only need
>> + * to count them within a batch.
>> + */
>> +static uint32_t
>> +gen7_cs_stall_every_four_pipe_controls(struct brw_context *brw, uint32_t flags)
>> +{
>> +   if (brw->gen == 7 && brw->is_haswell) {
>> +      if (flags & PIPE_CONTROL_CS_STALL) {
>> +         /* If we're doing a CS stall, reset the counter and carry on. */
>> +         brw->batch.pipe_controls_since_last_cs_stall = 0;
>> +         return 0;
>> +      } else {
>> +         /* Otherwise, we need to count this PIPE_CONTROL. */
>> +         ++brw->batch.pipe_controls_since_last_cs_stall;
>> +      }
>
> You can flatten the control flow here a bit by dropping the else and
> moving the ++ into the check. IMO that's a common enough pattern that it
> doesn't obfuscate the code.

Yeah, I think that is slightly better... at least making it

/* Count this PIPE_CONTROL. */
brw->batch.pipe_controls_since_last_cs_stall++;

/* If this is the fourth pipe control without a CS stall, do one now. */
if (brw->batch.pipe_controls_since_last_cs_stall == 4) {
...
}
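
A minimal, self-contained sketch of the flattened control flow discussed here, under stated assumptions: `struct batch_state`, `needs_workaround`, and the function name are hypothetical stand-ins for the driver's `brw_context` fields, and the `PIPE_CONTROL_CS_STALL` value is illustrative rather than taken from the hardware headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bit; the real definition lives in the driver headers. */
#define PIPE_CONTROL_CS_STALL (1u << 20)

/* Hypothetical, trimmed-down replacement for the batch state in brw_context. */
struct batch_state {
   bool needs_workaround;   /* assumed to be set only on affected platforms */
   uint8_t pipe_controls_since_last_cs_stall;
};

/* Returns the extra flag bits to OR into the PIPE_CONTROL being emitted. */
static uint32_t
cs_stall_every_four_pipe_controls(struct batch_state *batch, uint32_t flags)
{
   assert(batch != NULL);

   if (!batch->needs_workaround)
      return 0;

   /* A PIPE_CONTROL that already stalls restarts the count. */
   if (flags & PIPE_CONTROL_CS_STALL) {
      batch->pipe_controls_since_last_cs_stall = 0;
      return 0;
   }

   /* Count this PIPE_CONTROL; the fourth one in a row gets a stall. */
   if (++batch->pipe_controls_since_last_cs_stall == 4) {
      batch->pipe_controls_since_last_cs_stall = 0;
      return PIPE_CONTROL_CS_STALL;
   }

   return 0;
}
```

Called four times in a row with `flags` of 0, this returns 0 three times and the stall bit on the fourth call; a call that already carries the stall bit resets the count.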

>> +
>> +      /* If this is the fourth pipe control without a CS stall, do one now. */
>> +      if (brw->batch.pipe_controls_since_last_cs_stall == 4) {
>> +         brw->batch.pipe_controls_since_last_cs_stall = 0;
>> +         return PIPE_CONTROL_CS_STALL;
>> +      }
>> +   }
>> +   return 0;
>> +}
>> +
>>  /**
>>   * Emit a PIPE_CONTROL with various flushing flags.
>>   *
>> @@ -454,6 +485,8 @@ brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
>>        OUT_BATCH(0);
>>        ADVANCE_BATCH();
>>     } else if (brw->gen >= 6) {
>> +      flags |= gen7_cs_stall_every_four_pipe_controls(brw, flags);
>> +
>>        BEGIN_BATCH(5);
>>        OUT_BATCH(_3DSTATE_PIPE_CONTROL | (5 - 2));
>>        OUT_BATCH(flags);
>> @@ -496,6 +529,8 @@ brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
>>        OUT_BATCH(imm_upper);
>>        ADVANCE_BATCH();
>>     } else if (brw->gen >= 6) {
>> +      flags |=

Re: [Mesa-dev] [PATCH] i965: Implement WaCsStallAtEveryFourthPipecontrol on IVB/BYT.

2014-11-12 Thread Ian Romanick

On 11/12/2014 10:53 AM, Chris Wilson wrote:
> On Wed, Nov 12, 2014 at 11:39:28AM +0100, Daniel Vetter wrote:
>> On Wed, Nov 12, 2014 at 01:33:01AM -0800, Kenneth Graunke wrote:
>>> +/* Implement the WaCsStallAtEveryFourthPipecontrol workaround on IVB, BYT:
>>> + *
>>> + * "Every 4th PIPE_CONTROL command, not counting the PIPE_CONTROL with
>>> + *  only read-cache-invalidate bit(s) set, must have a CS_STALL bit set."
>>> + *
>>> + * Note that the kernel does CS stalls between batches, so we only need
>>> + * to count them within a batch.
>>> + */
>>> +static uint32_t
>>> +gen7_cs_stall_every_four_pipe_controls(struct brw_context *brw, uint32_t flags)
>>> +{
>>> +   if (brw->gen == 7 && brw->is_haswell) {
>
> The comment says for IVB,BYT, the code here only applies to HSW.

Yeah... I think he meant && !brw->is_haswell.

> -Chris
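
The corrected gate suggested here can be written as a small predicate. This is a sketch: `struct platform` and the function name are hypothetical, and it assumes, as the thread implies, that Haswell reports gen 7 with `is_haswell` set while Ivybridge and Baytrail report gen 7 without it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the platform information kept in brw_context. */
struct platform {
   int gen;
   bool is_haswell;
};

/* True for gen 7 parts other than Haswell, i.e. the IVB/BYT case that the
 * workaround comment names. */
static bool
needs_cs_stall_every_fourth_pipe_control(const struct platform *p)
{
   assert(p != NULL);
   return p->gen == 7 && !p->is_haswell;
}
```

With the original condition (`is_haswell` without the negation) the same inputs would select Haswell only, which is the mismatch pointed out in this reply.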


Re: [Mesa-dev] build regression in tinderbox

2014-11-12 Thread Andy Furniss


Jose Fonseca wrote:

> Thanks for the heads up.
>
> Fixed now.


clover fails for me with current llvm git + mesa head on your fix.

Making all in state_trackers/clover
make[3]: Entering directory 
'/mnt/sdb1/Src64/Mesa-git/mesa/src/gallium/state_trackers/clover'

  CXX  llvm/libclllvm_la-invocation.lo
llvm/invocation.cpp: In function 'void
{anonymous}::find_kernels(llvm::Module*, std::vector<llvm::Function*>&)':
llvm/invocation.cpp:286:50: error: 'const class llvm::NamedMDNode' has
no member named 'getOperandAsMDNode'

kernel_node->getOperandAsMDNode(i)->getOperand(0)));
  ^
Makefile:843: recipe for target 'llvm/libclllvm_la-invocation.lo' failed
make[3]: *** [llvm/libclllvm_la-invocation.lo] Error 1


[Mesa-dev] [PATCH 2/3] st/dri: Support EGL_CONTEXT_OPENGL_DEBUG_BIT_KHR/GLX_CONTEXT_DEBUG_BIT_ARB on ES contexts.

2014-11-12 Thread jfonseca

From: José Fonseca <jfons...@vmware.com>

The latest version of the specs explicitly allows it, and given that Mesa
universally supports KHR_debug we should definitely support it.

Totally untested.  (Just happened to notice this while implementing
GLX_EXT_create_context_es2_profile for st/xlib.)
---
 src/gallium/state_trackers/dri/dri_context.c |  6 +++---
 src/mesa/drivers/dri/common/dri_util.c   | 14 ++
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/gallium/state_trackers/dri/dri_context.c b/src/gallium/state_trackers/dri/dri_context.c
index fe3240a..84b8807 100644
--- a/src/gallium/state_trackers/dri/dri_context.c
+++ b/src/gallium/state_trackers/dri/dri_context.c
@@ -72,9 +72,6 @@ dri_create_context(gl_api api, const struct gl_config * visual,
       attribs.major = major_version;
       attribs.minor = minor_version;
 
-      if ((flags & __DRI_CTX_FLAG_DEBUG) != 0)
-         attribs.flags |= ST_CONTEXT_FLAG_DEBUG;
-
       if ((flags & __DRI_CTX_FLAG_FORWARD_COMPATIBLE) != 0)
          attribs.flags |= ST_CONTEXT_FLAG_FORWARD_COMPATIBLE;
       break;
@@ -83,6 +80,9 @@ dri_create_context(gl_api api, const struct gl_config * visual,
       goto fail;
    }
 
+   if ((flags & __DRI_CTX_FLAG_DEBUG) != 0)
+      attribs.flags |= ST_CONTEXT_FLAG_DEBUG;
+
    if (flags & ~(__DRI_CTX_FLAG_DEBUG | __DRI_CTX_FLAG_FORWARD_COMPATIBLE)) {
       *error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
       goto fail;
diff --git a/src/mesa/drivers/dri/common/dri_util.c b/src/mesa/drivers/dri/common/dri_util.c
index 02499f2..d6e875f 100644
--- a/src/mesa/drivers/dri/common/dri_util.c
+++ b/src/mesa/drivers/dri/common/dri_util.c
@@ -376,19 +376,17 @@ driCreateContextAttribs(__DRIscreen *screen, int api,
         return NULL;
     }
 
-    /* The EGL_KHR_create_context spec says:
+    /* The latest version of EGL_KHR_create_context spec says:
      *
-     *     "Flags are only defined for OpenGL context creation, and specifying
-     *     a flags value other than zero for other types of contexts,
-     *     including OpenGL ES contexts, will generate an error."
+     *     "If the EGL_CONTEXT_OPENGL_DEBUG_BIT_KHR flag bit is set in
+     *     EGL_CONTEXT_FLAGS_KHR, then a debug context will be created.
+     *     [...] This bit is supported for OpenGL and OpenGL ES contexts."
      *
-     * The GLX_EXT_create_context_es2_profile specification doesn't say
-     * anything specific about this case.  However, none of the known flags
-     * have any meaning in an ES context, so this seems safe.
+     * None of the other flags have any meaning in an ES context, so this seems safe.
      */
     if (mesa_api != API_OPENGL_COMPAT
         && mesa_api != API_OPENGL_CORE
-        && flags != 0) {
+        && (flags & ~__DRI_CTX_FLAG_DEBUG)) {
         *error = __DRI_CTX_ERROR_BAD_FLAG;
         return NULL;
     }
-- 
2.1.1
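
The flag rule described in this patch can be sketched in isolation. The enums, macro names, and bit values below are illustrative stand-ins rather than the real `__DRI_CTX_FLAG_*` and `__DRI_CTX_ERROR_*` definitions, and the sketch folds the ES check from dri_util.c and the unknown-flag check from dri_context.c into one hypothetical function for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits standing in for the __DRI_CTX_FLAG_* values. */
#define CTX_FLAG_DEBUG              0x1u
#define CTX_FLAG_FORWARD_COMPATIBLE 0x2u

enum api { API_GL_COMPAT, API_GL_CORE, API_GLES, API_GLES2 };
enum ctx_error { CTX_SUCCESS, CTX_ERROR_BAD_FLAG, CTX_ERROR_UNKNOWN_FLAG };

/* Returns true when the flags are acceptable for the requested API and
 * stores the reason in *error otherwise. */
static bool
validate_context_flags(enum api api, uint32_t flags, enum ctx_error *error)
{
   const uint32_t known = CTX_FLAG_DEBUG | CTX_FLAG_FORWARD_COMPATIBLE;

   assert(error != NULL);

   /* ES contexts may carry the debug bit and nothing else. */
   if (api != API_GL_COMPAT && api != API_GL_CORE &&
       (flags & ~CTX_FLAG_DEBUG) != 0) {
      *error = CTX_ERROR_BAD_FLAG;
      return false;
   }

   /* Desktop GL contexts may use any known flag. */
   if ((flags & ~known) != 0) {
      *error = CTX_ERROR_UNKNOWN_FLAG;
      return false;
   }

   *error = CTX_SUCCESS;
   return true;
}
```

Masking with `~CTX_FLAG_DEBUG` instead of comparing against zero is what lets a debug ES context through while still rejecting every other flag.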


[Mesa-dev] [PATCH 1/3] st/glx: Implement GLX_EXT_create_context_es2_profile.

2014-11-12 Thread jfonseca

From: José Fonseca <jfons...@vmware.com>

apitrace now supports it, and it makes it much easier to test
tracing/replaying on OpenGL ES contexts since
GLX_EXT_create_context_{es2,es}_profile are widely available.
---
 src/gallium/state_trackers/glx/xlib/glx_api.c |  5 +-
 src/gallium/state_trackers/glx/xlib/xm_api.c  | 86 ---
 2 files changed, 54 insertions(+), 37 deletions(-)

diff --git a/src/gallium/state_trackers/glx/xlib/glx_api.c 
b/src/gallium/state_trackers/glx/xlib/glx_api.c
index 976791b..810910e 100644
--- a/src/gallium/state_trackers/glx/xlib/glx_api.c
+++ b/src/gallium/state_trackers/glx/xlib/glx_api.c
@@ -56,6 +56,8 @@
GLX_ARB_create_context  \
GLX_ARB_create_context_profile  \
GLX_ARB_get_proc_address  \
+   GLX_EXT_create_context_es_profile  \
+   GLX_EXT_create_context_es2_profile  \
GLX_EXT_texture_from_pixmap  \
GLX_EXT_visual_info  \
GLX_EXT_visual_rating  \
@@ -2718,7 +2720,8 @@ glXCreateContextAttribsARB(Display *dpy, GLXFBConfig 
config,
 
/* check profileMask */
if (profileMask != GLX_CONTEXT_CORE_PROFILE_BIT_ARB 
-   profileMask != GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB) {
+   profileMask != GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB 
+   profileMask != GLX_CONTEXT_ES_PROFILE_BIT_EXT) {
   return NULL; /* generate BadValue X Error */
}
 
diff --git a/src/gallium/state_trackers/glx/xlib/xm_api.c 
b/src/gallium/state_trackers/glx/xlib/xm_api.c
index 1b77729..2aa5ac4 100644
--- a/src/gallium/state_trackers/glx/xlib/xm_api.c
+++ b/src/gallium/state_trackers/glx/xlib/xm_api.c
@@ -866,12 +866,12 @@ XMesaContext XMesaCreateContext( XMesaVisual v, 
XMesaContext share_list,
XMesaContext c;
 
if (!xmdpy)
-  return NULL;
+  goto no_xmesa_context;
 
/* Note: the XMesaContext contains a Mesa struct gl_context struct 
(inheritance) */
c = (XMesaContext) CALLOC_STRUCT(xmesa_context);
if (!c)
-  return NULL;
+  goto no_xmesa_context;
 
c-xm_visual = v;
c-xm_buffer = NULL;   /* set later by XMesaMakeCurrent */
@@ -888,40 +888,56 @@ XMesaContext XMesaCreateContext( XMesaVisual v, 
XMesaContext share_list,
if (contextFlags  GLX_CONTEXT_ROBUST_ACCESS_BIT_ARB)
   attribs.flags |= ST_CONTEXT_FLAG_ROBUST_ACCESS;
 
-   /* There are no profiles before OpenGL 3.2.  The
-* GLX_ARB_create_context_profile spec says:
-*
-* If the requested OpenGL version is less than 3.2,
-* GLX_CONTEXT_PROFILE_MASK_ARB is ignored and the functionality of the
-* context is determined solely by the requested version.
-*
-* The spec also says:
-*
-* The default value for GLX_CONTEXT_PROFILE_MASK_ARB is
-* GLX_CONTEXT_CORE_PROFILE_BIT_ARB.
-*
-* The spec also says:
-*
-* If version 3.1 is requested, the context returned may implement
-* any of the following versions:
-*
-*   * Version 3.1. The GL_ARB_compatibility extension may or may not
-* be implemented, as determined by the implementation.
-*   * The core profile of version 3.2 or greater.
-*
-* and because Mesa doesn't support GL_ARB_compatibility, the only chance to
-* honour a 3.1 context is through core profile.
-*/
-   attribs.profile = ST_PROFILE_DEFAULT;
-   if (((major  3 || (major == 3  minor = 2))
- ((profileMask  GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB) == 0)) ||
-   (major == 3  minor == 1))
-  attribs.profile = ST_PROFILE_OPENGL_CORE;
+   switch (profileMask) {
+   case GLX_CONTEXT_CORE_PROFILE_BIT_ARB:
+  /* There are no profiles before OpenGL 3.2.  The
+   * GLX_ARB_create_context_profile spec says:
+   *
+   * If the requested OpenGL version is less than 3.2,
+   * GLX_CONTEXT_PROFILE_MASK_ARB is ignored and the functionality
+   * of the context is determined solely by the requested version.
+   */
+  if (major > 3 || (major == 3 && minor >= 2)) {
+ attribs.profile = ST_PROFILE_OPENGL_CORE;
+ break;
+  }
+  /* fall-through */
+   case GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB:
+  /*
+   * The spec also says:
+   *
+   * If version 3.1 is requested, the context returned may implement
+   * any of the following versions:
+   *
+   *   * Version 3.1. The GL_ARB_compatibility extension may or may not
+   * be implemented, as determined by the implementation.
+   *   * The core profile of version 3.2 or greater.
+   *
+   * and because Mesa doesn't support GL_ARB_compatibility, the only 
chance to
+   * honour a 3.1 context is through core profile.
+   */
+  if (major == 3 && minor == 1) {
+ attribs.profile = ST_PROFILE_OPENGL_CORE;
+  } else {
+ attribs.profile = ST_PROFILE_DEFAULT;
+  }
+  break;
+   case GLX_CONTEXT_ES_PROFILE_BIT_EXT:
+  if (major >= 2) {
+ attribs.profile = ST_PROFILE_OPENGL_ES2;
+  }

[Mesa-dev] [PATCH 3/3] glx: Allow to create any OpenGL ES version.

2014-11-12 Thread jfonseca

From: José Fonseca jfons...@vmware.com

The latest version of GLX_EXT_create_context_es2_profile states:

  If the version requested is a valid and supported OpenGL-ES version,
  and the GLX_CONTEXT_ES_PROFILE_BIT_EXT bit is set in the
  GLX_CONTEXT_PROFILE_MASK_ARB attribute (see below), then the context
  returned will implement the OpenGL ES version requested.

We must also export EXT_create_context_es_profile, as the
EXT_create_context_es2_profile specification is crystal clear:

  NOTE: implementations of this extension must export BOTH extension
  strings, for backwards compatibility with applications written
  against version 1 of this extension.

Totally untested.  (Just happened to notice this while implementing
GLX_EXT_create_context_es2_profile for st/xlib.)
---
 src/glx/dri_common.c | 32 
 src/glx/drisw_glx.c  |  2 ++
 2 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/src/glx/dri_common.c b/src/glx/dri_common.c
index 63c8de3..541abbb 100644
--- a/src/glx/dri_common.c
+++ b/src/glx/dri_common.c
@@ -544,9 +544,22 @@ dri2_convert_glx_attribs(unsigned num_attribs, const 
uint32_t *attribs,
   case GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB:
 *api = __DRI_API_OPENGL;
 break;
-  case GLX_CONTEXT_ES2_PROFILE_BIT_EXT:
-*api = __DRI_API_GLES2;
-break;
+  case GLX_CONTEXT_ES_PROFILE_BIT_EXT:
+ switch (*major_ver) {
+ case 3:
+*api = __DRI_API_GLES3;
+break;
+ case 2:
+*api = __DRI_API_GLES2;
+break;
+ case 1:
+*api = __DRI_API_GLES;
+break;
+ default:
+*error = __DRI_CTX_ERROR_BAD_API;
+return false;
+ }
+ break;
   default:
 *error = __DRI_CTX_ERROR_BAD_API;
 return false;
@@ -577,19 +590,6 @@ dri2_convert_glx_attribs(unsigned num_attribs, const 
uint32_t *attribs,
   return false;
}
 
-   /* The GLX_EXT_create_context_es2_profile spec says:
-*
-* ... If the version requested is 2.0, and the
-* GLX_CONTEXT_ES2_PROFILE_BIT_EXT bit is set in the
-* GLX_CONTEXT_PROFILE_MASK_ARB attribute (see below), then the context
-* returned will implement OpenGL ES 2.0. This is the only way in which
-* an implementation may request an OpenGL ES 2.0 context.
-*/
-   if (*api == __DRI_API_GLES2 && (*major_ver != 2 || *minor_ver != 0)) {
-  *error = __DRI_CTX_ERROR_BAD_API;
-  return false;
-   }
-
*error = __DRI_CTX_ERROR_SUCCESS;
return true;
 }
diff --git a/src/glx/drisw_glx.c b/src/glx/drisw_glx.c
index 749ceb0..b0be5d0 100644
--- a/src/glx/drisw_glx.c
+++ b/src/glx/drisw_glx.c
@@ -617,6 +617,8 @@ driswBindExtensions(struct drisw_screen *psc, const 
__DRIextension **extensions)
   /* DRISW version >= 2 implies support for OpenGL ES 2.0.
*/
   __glXEnableDirectExtension(&psc->base,
+"GLX_EXT_create_context_es_profile");
+  __glXEnableDirectExtension(&psc->base,
 "GLX_EXT_create_context_es2_profile");
}
 
-- 
2.1.1
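
For readers skimming the hunk above, the version-to-API mapping it introduces can be sketched in isolation. This is only an illustration, assuming stand-in names: `sketch_api`, the `SKETCH_API_*` values and `sketch_es_major_to_api` are hypothetical and do not appear in Mesa; the real code assigns the `__DRI_API_*` tokens and reports `__DRI_CTX_ERROR_BAD_API` on failure.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the __DRI_API_GLES* tokens used by the patch;
 * the real values come from Mesa's DRI interface header. */
enum sketch_api {
   SKETCH_API_GLES  = 1,
   SKETCH_API_GLES2 = 2,
   SKETCH_API_GLES3 = 3
};

/* Map a requested OpenGL ES major version to an API token.
 * Returns false for any major version the sketch does not know,
 * mirroring the "default:" arm that rejects the request in the patch. */
static bool
sketch_es_major_to_api(unsigned major_ver, enum sketch_api *api)
{
   switch (major_ver) {
   case 3:
      *api = SKETCH_API_GLES3;
      return true;
   case 2:
      *api = SKETCH_API_GLES2;
      return true;
   case 1:
      *api = SKETCH_API_GLES;
      return true;
   default:
      return false;
   }
}
```

The point of the change is that the ES profile bit alone no longer implies ES 2.0; the requested major version selects the API.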

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] build regression in tinderbox

2014-11-12 Thread Jose Fonseca

Sorry, but I'm not familiar with that code base -- I don't even have a proper 
environment to build it --, and this build failure is unrelated to my earlier 
fix.

So I'm afraid the clover maintainers will need to step up here.

Jose


From: Andy Furniss adf.li...@gmail.com
Sent: 12 November 2014 12:24
To: Jose Fonseca; Dave Airlie; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] build regression in tinderbox

Jose Fonseca wrote:
> Thanks for the heads up.
>
> Fixed now.

clover fails for me with current llvm git + mesa head on your fix.

Making all in state_trackers/clover
make[3]: Entering directory
'/mnt/sdb1/Src64/Mesa-git/mesa/src/gallium/state_trackers/clover'
   CXX  llvm/libclllvm_la-invocation.lo
llvm/invocation.cpp: In function 'void
{anonymous}::find_kernels(llvm::Module*, std::vector<llvm::Function*>&)':
llvm/invocation.cpp:286:50: error: 'const class llvm::NamedMDNode' has
no member named 'getOperandAsMDNode'

kernel_node->getOperandAsMDNode(i)->getOperand(0)));
   ^
Makefile:843: recipe for target 'llvm/libclllvm_la-invocation.lo' failed
make[3]: *** [llvm/libclllvm_la-invocation.lo] Error 1


Re: [Mesa-dev] [PATCH v3 6/9] gallium/auxiliary: add dump functions for bind and transfer flags

2014-11-12 Thread Erik Faye-Lund

On Sun, Nov 2, 2014 at 7:32 PM, David Heidelberg da...@ixit.cz wrote:
 v2: rename and extend support with code for C11 and MSVC (thanks to Brian)

 Signed-off-by: David Heidelberg da...@ixit.cz
 ---
  src/gallium/auxiliary/util/u_dump.h |  6 ++
  src/gallium/auxiliary/util/u_dump_defines.c | 86
 +
  2 files changed, 92 insertions(+)

 diff --git a/src/gallium/auxiliary/util/u_dump.h
 b/src/gallium/auxiliary/util/u_dump.h
 index 58e7dfd..84ba1ed 100644
 --- a/src/gallium/auxiliary/util/u_dump.h
 +++ b/src/gallium/auxiliary/util/u_dump.h
 @@ -88,6 +88,12 @@ util_dump_tex_filter(unsigned value, boolean shortened);
  const char *
  util_dump_query_type(unsigned value, boolean shortened);
  +const char *
 +util_dump_bind_flags(unsigned flags);
 +
 +const char *
 +util_dump_transfer_flags(unsigned flags);
 +
   /*
   * p_state.h, through a FILE
 diff --git a/src/gallium/auxiliary/util/u_dump_defines.c
 b/src/gallium/auxiliary/util/u_dump_defines.c
 index 03fd15d..20ae6c0 100644
 --- a/src/gallium/auxiliary/util/u_dump_defines.c
 +++ b/src/gallium/auxiliary/util/u_dump_defines.c
 @@ -61,6 +61,36 @@ util_dump_enum_continuous(unsigned value,
 return names[value];
  }
  +static const char *
 +util_dump_flags(unsigned flags, const char *prefix,
 +unsigned num_names,
 +const char **names)
 +{
 +#if __STDC_VERSION__ >= 201112 && !defined __STDC_NO_THREADS__
 +   static _Thread_local char str[256];
 +#elif defined(PIPE_CC_GCC)
 +   static __thread char str[256];
 +#elif defined(PIPE_CC_MSVC)
 +   static __declspec(thread) char str[256];
 +#else
 +#error Unsupported compiler: please find how to implement thread local
 storage on it
 +#endif

Isn't failing compilation a bit aggressive? Can't we just lose the
functionality instead?
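
One way to read that suggestion, sketched here under stated assumptions: pick a thread-local qualifier when one is known, and otherwise fall back to a plain static buffer rather than stopping the build. `SKETCH_TLS` and `sketch_dump_flags` are hypothetical names for illustration, the compiler checks use `__GNUC__` and `_MSC_VER` instead of Mesa's `PIPE_CC_*` macros, and the fallback trades thread safety for portability.

```c
#include <stdio.h>
#include <string.h>

/* Choose a thread-local qualifier when the compiler offers one; otherwise
 * degrade to a plain static buffer instead of failing compilation. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_THREADS__)
#  define SKETCH_TLS _Thread_local
#elif defined(__GNUC__)
#  define SKETCH_TLS __thread
#elif defined(_MSC_VER)
#  define SKETCH_TLS __declspec(thread)
#else
#  define SKETCH_TLS /* no TLS known: buffer is shared between threads */
#endif

/* Join the names of all set bits with '|'. Bit i corresponds to names[i];
 * at most 32 names are considered. The result lives in a static buffer
 * and is overwritten by the next call. */
static const char *
sketch_dump_flags(unsigned flags, const char *const *names,
                  unsigned num_names)
{
   static SKETCH_TLS char str[256];
   size_t used = 0;
   unsigned i;

   str[0] = '\0';
   for (i = 0; i < num_names && i < 32; i++) {
      int n;

      if (!(flags & (1u << i)) || !names[i])
         continue;

      /* snprintf never writes past the remaining space; stop when full. */
      n = snprintf(str + used, sizeof(str) - used, "%s%s",
                   used ? "|" : "", names[i]);
      if (n < 0 || (size_t)n >= sizeof(str) - used)
         break;
      used += (size_t)n;
   }
   return str;
}
```

With the fallback branch the dump helper still works on an unknown compiler; only concurrent callers would have to serialize access themselves.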

[Mesa-dev] [PATCH 3/3] mesa/main: Clamp rgba with streamed sse

2014-11-12 Thread Juha-Pekka Heikkila

Signed-off-by: Juha-Pekka Heikkila juhapekka.heikk...@gmail.com
---
 src/mesa/main/pixeltransfer.c | 62 ++-
 1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/src/mesa/main/pixeltransfer.c b/src/mesa/main/pixeltransfer.c
index 8bbeeb8..273a9ac 100644
--- a/src/mesa/main/pixeltransfer.c
+++ b/src/mesa/main/pixeltransfer.c
@@ -35,7 +35,8 @@
 #include "pixeltransfer.h"
 #include "imports.h"
 #include "mtypes.h"
-
+#include "x86/common_x86_asm.h"
+#include "main/sse2_clamping.h"
 
 /*
  * Apply scale and bias factors to an array of RGBA pixels.
@@ -80,25 +81,39 @@ _mesa_scale_and_bias_rgba(GLuint n, GLfloat rgba[][4],
 void
 _mesa_map_rgba( const struct gl_context *ctx, GLuint n, GLfloat rgba[][4] )
 {
-   const GLfloat rscale = (GLfloat) (ctx->PixelMaps.RtoR.Size - 1);
-   const GLfloat gscale = (GLfloat) (ctx->PixelMaps.GtoG.Size - 1);
-   const GLfloat bscale = (GLfloat) (ctx->PixelMaps.BtoB.Size - 1);
-   const GLfloat ascale = (GLfloat) (ctx->PixelMaps.AtoA.Size - 1);
const GLfloat *rMap = ctx->PixelMaps.RtoR.Map;
const GLfloat *gMap = ctx->PixelMaps.GtoG.Map;
const GLfloat *bMap = ctx->PixelMaps.BtoB.Map;
const GLfloat *aMap = ctx->PixelMaps.AtoA.Map;
GLuint i;
-   for (i=0;i<n;i++) {
-  GLfloat r = CLAMP(rgba[i][RCOMP], 0.0F, 1.0F);
-  GLfloat g = CLAMP(rgba[i][GCOMP], 0.0F, 1.0F);
-  GLfloat b = CLAMP(rgba[i][BCOMP], 0.0F, 1.0F);
-  GLfloat a = CLAMP(rgba[i][ACOMP], 0.0F, 1.0F);
-  rgba[i][RCOMP] = rMap[F_TO_I(r * rscale)];
-  rgba[i][GCOMP] = gMap[F_TO_I(g * gscale)];
-  rgba[i][BCOMP] = bMap[F_TO_I(b * bscale)];
-  rgba[i][ACOMP] = aMap[F_TO_I(a * ascale)];
+   GLfloat scale[4];
+
+   scale[RCOMP] = (GLfloat) (ctx->PixelMaps.RtoR.Size - 1);
+   scale[GCOMP] = (GLfloat) (ctx->PixelMaps.GtoG.Size - 1);
+   scale[BCOMP] = (GLfloat) (ctx->PixelMaps.BtoB.Size - 1);
+   scale[ACOMP] = (GLfloat) (ctx->PixelMaps.AtoA.Size - 1);
+
+#if defined(USE_SSE2)
+   if (cpu_has_xmm2) {
+  _mesa_clamp_float_rgba_scale_and_map(n, rgba, rgba, 0.0F, 1.0F, scale,
+   rMap, gMap, bMap, aMap);
}
+   else {
+#endif
+  for (i=0;i<n;i++) {
+ GLfloat rgba_temp[4];
+ rgba_temp[RCOMP] = CLAMP(rgba[i][RCOMP], 0.0F, 1.0F);
+ rgba_temp[GCOMP] = CLAMP(rgba[i][GCOMP], 0.0F, 1.0F);
+ rgba_temp[BCOMP] = CLAMP(rgba[i][BCOMP], 0.0F, 1.0F);
+ rgba_temp[ACOMP] = CLAMP(rgba[i][ACOMP], 0.0F, 1.0F);
+ rgba[i][RCOMP] = rMap[F_TO_I(rgba_temp[RCOMP] * scale[RCOMP])];
+ rgba[i][GCOMP] = gMap[F_TO_I(rgba_temp[GCOMP] * scale[GCOMP])];
+ rgba[i][BCOMP] = bMap[F_TO_I(rgba_temp[BCOMP] * scale[BCOMP])];
+ rgba[i][ACOMP] = aMap[F_TO_I(rgba_temp[ACOMP] * scale[ACOMP])];
+  }
+#if defined(USE_SSE2)
+   }
+#endif
 }
 
 /*
@@ -179,12 +194,21 @@ _mesa_apply_rgba_transfer_ops(struct gl_context *ctx, 
GLbitfield transferOps,
/* clamping to [0,1] */
if (transferOps & IMAGE_CLAMP_BIT) {
   GLuint i;
-  for (i = 0; i < n; i++) {
- rgba[i][RCOMP] = CLAMP(rgba[i][RCOMP], 0.0F, 1.0F);
- rgba[i][GCOMP] = CLAMP(rgba[i][GCOMP], 0.0F, 1.0F);
- rgba[i][BCOMP] = CLAMP(rgba[i][BCOMP], 0.0F, 1.0F);
- rgba[i][ACOMP] = CLAMP(rgba[i][ACOMP], 0.0F, 1.0F);
+#if defined(USE_SSE2)
+  if (cpu_has_xmm2) {
+ _mesa_streaming_clamp_float_rgba(n, rgba, rgba, 0.0F, 1.0F);
+  }
+  else {
+#endif
+ for (i = 0; i < n; i++) {
+rgba[i][RCOMP] = CLAMP(rgba[i][RCOMP], 0.0F, 1.0F);
+rgba[i][GCOMP] = CLAMP(rgba[i][GCOMP], 0.0F, 1.0F);
+rgba[i][BCOMP] = CLAMP(rgba[i][BCOMP], 0.0F, 1.0F);
+rgba[i][ACOMP] = CLAMP(rgba[i][ACOMP], 0.0F, 1.0F);
+ }
+#if defined(USE_SSE2)
   }
+#endif
}
 }
 
-- 
1.8.5.1


[Mesa-dev] [PATCH 0/3] Do float texture clamping with streaming sse2

2014-11-12 Thread Juha-Pekka Heikkila

I tested this by uploading 1024x1024 656 textures in a loop for 10 seconds.
With glTexImage2D I get 17% better performance on SNB, (interestingly, only)
0..1% better performance on mobile IVB, and 3% better performance on BDW.
For all these tests Mesa was compiled with -O2 -march=native, and there were
no Piglit regressions.

/Juha-Pekka

Juha-Pekka Heikkila (3):
  configure.ac: Add detection for sse2 compilation support
  mesa/main: Add sse2 streaming clamping
  mesa/main: Clamp rgba with streamed sse

 configure.ac  |   7 +++
 src/mesa/Makefile.am  |   8 +++
 src/mesa/main/pixeltransfer.c |  62 +--
 src/mesa/main/sse2_clamping.c | 138 ++
 src/mesa/main/sse2_clamping.h |  49 +++
 5 files changed, 245 insertions(+), 19 deletions(-)
 create mode 100644 src/mesa/main/sse2_clamping.c
 create mode 100644 src/mesa/main/sse2_clamping.h

-- 
1.8.5.1


[Mesa-dev] [PATCH 2/3] mesa/main: Add sse2 streaming clamping

2014-11-12 Thread Juha-Pekka Heikkila

Signed-off-by: Juha-Pekka Heikkila juhapekka.heikk...@gmail.com
---
 src/mesa/Makefile.am  |   8 +++
 src/mesa/main/sse2_clamping.c | 138 ++
 src/mesa/main/sse2_clamping.h |  49 +++
 3 files changed, 195 insertions(+)
 create mode 100644 src/mesa/main/sse2_clamping.c
 create mode 100644 src/mesa/main/sse2_clamping.h

diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
index 932db4f..43dbe87 100644
--- a/src/mesa/Makefile.am
+++ b/src/mesa/Makefile.am
@@ -111,6 +111,10 @@ if SSE41_SUPPORTED
 ARCH_LIBS += libmesa_sse41.la
 endif
 
+if SSE2_SUPPORTED
+ARCH_LIBS += libmesa_sse2.la
+endif
+
 MESA_ASM_FILES_FOR_ARCH =
 
 if HAVE_X86_ASM
@@ -155,6 +159,10 @@ libmesa_sse41_la_SOURCES = \
main/sse_minmax.c
 libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
 
+libmesa_sse2_la_SOURCES = \
+   main/sse2_clamping.c
+libmesa_sse2_la_CFLAGS = $(AM_CFLAGS) -msse2
+
 pkgconfigdir = $(libdir)/pkgconfig
 pkgconfig_DATA = gl.pc
 
diff --git a/src/mesa/main/sse2_clamping.c b/src/mesa/main/sse2_clamping.c
new file mode 100644
index 000..66c7dc7
--- /dev/null
+++ b/src/mesa/main/sse2_clamping.c
@@ -0,0 +1,138 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Juha-Pekka Heikkila juhapekka.heikk...@gmail.com
+ *
+ */
+
+#ifdef __SSE2__
+#include "main/macros.h"
+#include "main/sse2_clamping.h"
+#include <emmintrin.h>
+
+/**
+ * Clamp four float values to [min,max]
+ */
+static inline void
+_mesa_clamp_float_rgba(GLfloat src[4], GLfloat result[4], const float min,
+   const float max)
+{
+   __m128  operand, minval, maxval;
+
+   operand = _mm_loadu_ps(src);
+   minval = _mm_set1_ps(min);
+   maxval = _mm_set1_ps(max);
+   operand = _mm_max_ps(operand, minval);
+   operand = _mm_min_ps(operand, maxval);
+   _mm_storeu_ps(result, operand);
+}
+
+
+/* Clamp n amount float rgba pixels to [min,max] using SSE2
+ */
+__attribute__((optimize("unroll-loops")))
+void
+_mesa_streaming_clamp_float_rgba(const GLuint n, GLfloat rgba_src[][4],
+ GLfloat rgba_dst[][4], const GLfloat min,
+ const GLfloat max)
+{
+   int  c, prefetch_c;
+   float*   worker = &rgba_src[0][0];
+   __m128   operand[2], minval, maxval;
+
+   _mm_prefetch((char*) (((unsigned long)worker)|0x1f) + 65, _MM_HINT_T0);
+
+   minval = _mm_set1_ps(min);
+   maxval = _mm_set1_ps(max);
+
+   for (c = n*4; c > 0 && (((unsigned long)worker)&0x1f) != 0; c--, worker++) {
+  operand[0] = _mm_load_ss(worker);
+  operand[0] = _mm_max_ss(operand[0], minval);
+  operand[0] = _mm_min_ss(operand[0], maxval);
+  _mm_store_ss(worker, operand[0]);
+   }
+
+   while (c >= 8) {
+  _mm_prefetch((char*) worker + 64, _MM_HINT_T0);
+
+  for (prefetch_c = 64/8; prefetch_c > 0 && c >= 8; prefetch_c--, c-=8,
+   worker += 8) {
+
+ operand[0] = _mm_load_ps(worker);
+ operand[1] = _mm_load_ps(worker+4);
+ operand[0] = _mm_max_ps(operand[0], minval);
+ operand[1] = _mm_max_ps(operand[1], minval);
+ operand[0] = _mm_min_ps(operand[0], maxval);
+ operand[1] = _mm_min_ps(operand[1], maxval);
+
+ _mm_store_ps(worker, operand[0]);
+ _mm_store_ps(worker+4, operand[1]);
+  }
+   }
+
+   for (; c > 0; c--, worker++) {
+  operand[0] = _mm_load_ss(worker);
+  operand[0] = _mm_max_ss(operand[0], minval);
+  operand[0] = _mm_min_ss(operand[0], maxval);
+  _mm_store_ss(worker, operand[0]);
+   }
+}
+
+
+/* Clamp n amount float rgba pixels to [min,max] using SSE2 and apply
+ * scaling and mapping to components.
+ *
+ * this replace handling of [RGBA] channels:
+ * rgba_temp[RCOMP] = CLAMP(rgba[i][RCOMP], 0.0F, 1.0F);
+ * rgba[i][RCOMP] = rMap[F_TO_I(rgba_temp[RCOMP] * scale[RCOMP])];
+ */
+void

[Mesa-dev] [PATCH 1/3] configure.ac: Add detection for sse2 compilation support

2014-11-12 Thread Juha-Pekka Heikkila

Signed-off-by: Juha-Pekka Heikkila juhapekka.heikk...@gmail.com
---
 configure.ac | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/configure.ac b/configure.ac
index fc7d372..a01e605 100644
--- a/configure.ac
+++ b/configure.ac
@@ -258,6 +258,13 @@ if test x$SSE41_SUPPORTED = x1; then
 fi
 AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1])
 
+AX_CHECK_COMPILE_FLAG([-msse2], [SSE2_SUPPORTED=1], [SSE2_SUPPORTED=0])
+if test "x$SSE2_SUPPORTED" = x1; then
+    DEFINES="$DEFINES -DUSE_SSE2"
+fi
+AM_CONDITIONAL([SSE2_SUPPORTED], [test x$SSE2_SUPPORTED = x1])
+
+
 dnl Can't have static and shared libraries, default to static if user
 dnl explicitly requested. If both disabled, set to static since shared
 dnl was explicitly requested.
-- 
1.8.5.1


[Mesa-dev] [Bug 86195] Lightswork video editor segfaults

2014-11-12 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=86195

Bug ID: 86195
   Summary: Lightswork video editor segfaults
   Product: Mesa
   Version: git
  Hardware: x86-64 (AMD64)
OS: All
Status: NEW
  Severity: minor
  Priority: low
 Component: Drivers/Gallium/radeonsi
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: pontost...@gmail.com

This app is very unstable with radeonsi, but works fine on intel.

HD 7790
mesa-git\llvm-svn 11.10.14


Program received signal SIGSEGV, Segmentation fault.
0x7dfa9b7807c8 in std::vector<llvm::MachineOperand*,
std::allocator<llvm::MachineOperand*>
>::_M_fill_insert(__gnu_cxx::__normal_iterator<llvm::MachineOperand**,
std::vector<llvm::MachineOperand*, std::allocator<llvm::MachineOperand*> > >,
unsigned long, llvm::MachineOperand* const&) ()
   from /usr/lib64/libLLVM-3.6svn.so
(gdb) bt
#0  0x7dfa9b7807c8 in std::vector<llvm::MachineOperand*,
std::allocator<llvm::MachineOperand*>
>::_M_fill_insert(__gnu_cxx::__normal_iterator<llvm::MachineOperand**,
std::vector<llvm::MachineOperand*, std::allocator<llvm::MachineOperand*> > >,
unsigned long, llvm::MachineOperand* const&) ()
at /usr/lib64/libLLVM-3.6svn.so
#1  0x7dfa9b780f41 in
llvm::MachineRegisterInfo::MachineRegisterInfo(llvm::MachineFunction const*) ()
at /usr/lib64/libLLVM-3.6svn.so
#2  0x7dfa9b74c99a in llvm::MachineFunction::MachineFunction(llvm::Function
const*, llvm::TargetMachine const&, unsigned int, llvm::MachineModuleInfo&) ()
at /usr/lib64/libLLVM-3.6svn.so
#3  0x7dfa9b7508cb in
llvm::MachineFunctionAnalysis::runOnFunction(llvm::Function&) () at
/usr/lib64/libLLVM-3.6svn.so
#4  0x7dfa9b46caef in llvm::FPPassManager::runOnFunction(llvm::Function&)
() at /usr/lib64/libLLVM-3.6svn.so
#5  0x7dfa9b46cb7b in llvm::FPPassManager::runOnModule(llvm::Module&) () at
/usr/lib64/libLLVM-3.6svn.so
#6  0x7dfa9b46f225 in llvm::legacy::PassManagerImpl::run(llvm::Module&) ()
at /usr/lib64/libLLVM-3.6svn.so
#7  0x7dfa9b8ff303 in  () at /usr/lib64/libLLVM-3.6svn.so
#8  0x7dfa9b8ff510 in LLVMTargetMachineEmitToMemoryBuffer () at
/usr/lib64/libLLVM-3.6svn.so
#9  0x7dfa9d0458af in radeon_llvm_compile (M=0x7ffaa0a17bc4,
binary=0x7fff319ca220, gpu_family=0x7dfa9d1589b5 "bonaire", dump=0)
at radeon_llvm_emit.c:185
#10 0x7dfa9cfa96a0 in si_compile_llvm (sscreen=0x18a02b0, shader=0x1be6900,
mod=0x7ffaa0a17bc4) at si_shader.c:2601
#11 0x7dfa9cfa9d8e in si_shader_create (sscreen=0x18a02b0,
shader=0x1be6900) at si_shader.c:2800
#12 0x7dfa9cfafa19 in si_shader_select (ctx=0x1988d20, sel=0x1bd8800) at
si_state.c:2279
#13 0x7dfa9cfb1bb8 in si_update_derived_state (sctx=0x1988d20) at
si_state_draw.c:652
#14 0x7dfa9cfb1dda in si_draw_vbo (ctx=0x1988d20, info=0x7fff319d17f0) at
si_state_draw.c:919
#15 0x7dfa9ccb7fc9 in blitter_draw (ctx=
0x19b2c30, x1=<optimized out>, y1=<optimized out>, x2=<optimized out>,
y2=<optimized out>, depth=<optimized out>, num_instances=1)
at ./util/u_draw.h:99
#16 0x7dfa9ccb80e1 in util_blitter_draw_rectangle (blitter=
0x19b2c30, x1=0, y1=0, x2=1920, y2=1080, depth=0, type=<optimized out>,
attrib=0x7fff319d19f0) at util/u_blitter.c:1156
#17 0x7dfa9d038fe2 in r600_draw_rectangle (blitter=
0x19b2c30, x1=0, y1=0, x2=1920, y2=1080, depth=0, type=<optimized out>,
attrib=0x7fff319d19f0) at r600_pipe_common.c:56
#18 0x7dfa9ccbaab2 in util_blitter_blit_generic (blitter=
0x19b2c30, dst=0x1bd8470, dstbox=0x7fff319d1b10, src=0x1be23e0,
srcbox=0x7fff319d1cf0, src_width0=1920, src_height0=1080, mask=63, filter=0,
scissor=0x0) at util/u_blitter.c:1619
#19 0x7dfa9cf9d81a in si_resource_copy_region (ctx=0x1988d20,
dst=0x1be34f0, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x1be4370, src_level=0,
src_box=0x7fff319d1cf0) at si_blit.c:664
#20 0x7dfa9cfa2e7c in si_dma_copy (ctx=0x1988d20, dst=0x1be34f0,
dst_level=0, dstx=0, dsty=0, dstz=0, src=0x1be4370, src_level=0,
src_box=0x7fff319d1cf0) at si_dma.c:322
#21 0x7dfa9d03e063 in r600_copy_from_staging_texture (ctx=<optimized out>,
rtransfer=<optimized out>) at r600_texture.c:105
#22 0x7dfa9d03e683 in r600_texture_transfer_unmap (ctx=0x1988d20,
transfer=0x1be79a0) at r600_texture.c:1079
#23 0x7dfa9ccd5f42 in u_transfer_unmap_vtbl (pipe=<optimized out>,
transfer=<optimized out>) at util/u_transfer.c:138
#24 0x7dfa9cb98877 in st_texture_image_unmap (st=<optimized out>,
stImage=<optimized out>, slice=<optimized out>)
at ../../src/gallium/auxiliary/util/u_inlines.h:481
#25 0x7dfa9cb6ea59 in st_UnmapTextureImage (ctx=<optimized out>,
texImage=0x1be78d0, slice=0)
at ../../src/mesa/state_tracker/st_cb_texture.c:283
#26 0x7dfa9cb1a32f in store_texsubimage (ctx=0x7dfa99d41010,
texImage=0x1be78d0, xoffset=0, yoffset=0, zoffset=<optimized out>, width=1920,
height=1080, depth=<optimized out>, format=32993, type=5121, pixels=0x251deb0,
packing=0x7dfa99d5c6e0, caller=0x7dfa9d07287f "glTexSubImage")
at

Re: [Mesa-dev] [PATCH] clover: fix clCreateContext Piglit test crash

2014-11-12 Thread Francisco Jerez

EdB edb+m...@sigluy.net writes:

 clCreateContext no longer crash when CL_CONTEXT_PLATFORM is invalid

NAK.  That piglit test is rather dubious, can we please get rid of it?

Passing pointers to inaccessible memory is likely to cause a segfault on
other implementations too, like nVidia's, Intel's or any implementation
using the official Khronos ICD loader -- And I don't see why that's such
a big deal, SEGFAULT is just the way your OS has to tell you that you
don't have the right to dereference that address, which is precisely
what's going on here.

Doing these invalid pointer checks consistently for other object types
would imply lots of ugly and fragile book-keeping, which when using the
ICD would become completely useless since the crash would then happen in
the loader before we have the chance to do any argument validation.

I've seen some discussion on the Khronos bugzilla regarding handle
validation on the ICD loader.  The conclusion seemed to be that full
validation would be too costly and provide little benefit, a minimal
NULL check should be sufficient -- which is less than Clover is already
doing.
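
To make the trade-off concrete, here is a minimal sketch of the two cheap checks under discussion: a NULL test, and an identity comparison against the single platform object (which is what the quoted patch does). Neither dereferences the caller's pointer, so an inaccessible address cannot fault here; it also means garbage pointers are only rejected, never inspected. All names (`sketch_platform`, `sketch_the_platform`, `sketch_check_platform`, the `SKETCH_*` codes) are hypothetical and written in C for illustration, not taken from clover.

```c
#include <stddef.h>

/* Hypothetical status codes standing in for CL_SUCCESS and
 * CL_INVALID_PLATFORM. */
#define SKETCH_SUCCESS            0
#define SKETCH_INVALID_PLATFORM (-32)

/* Hypothetical platform object; the sketch assumes exactly one exists. */
struct sketch_platform {
   int id;
};

static struct sketch_platform sketch_the_platform = { 1 };

/* Validate a platform handle without ever dereferencing it. */
static int
sketch_check_platform(const struct sketch_platform *handle)
{
   /* Minimal check: reject NULL. */
   if (handle == NULL)
      return SKETCH_INVALID_PLATFORM;

   /* Identity check: only the one known object is valid. This compares
    * addresses, so it stays safe even if 'handle' points to unmapped
    * memory. */
   if (handle != &sketch_the_platform)
      return SKETCH_INVALID_PLATFORM;

   return SKETCH_SUCCESS;
}
```

This only works because there is a single platform object; doing the same for contexts, queues or buffers would need the per-object bookkeeping described above.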

 ---
  src/gallium/state_trackers/clover/api/context.cpp | 11 ---
  1 file changed, 8 insertions(+), 3 deletions(-)

 diff --git a/src/gallium/state_trackers/clover/api/context.cpp 
 b/src/gallium/state_trackers/clover/api/context.cpp
 index 021eea3..749d2d7 100644
 --- a/src/gallium/state_trackers/clover/api/context.cpp
 +++ b/src/gallium/state_trackers/clover/api/context.cpp
 @@ -39,10 +39,15 @@ clCreateContext(const cl_context_properties *d_props, 
 cl_uint num_devs,
throw error(CL_INVALID_VALUE);
  
 for (auto prop : props) {
 -  if (prop.first == CL_CONTEXT_PLATFORM)
 - obj(prop.second.as<cl_platform_id>());
 -  else
 +  if (prop.first == CL_CONTEXT_PLATFORM) {
 + //clover only have one platform
 + cl_platform_id d_platform;
 + cl_int ret = clGetPlatformIDs(1, &d_platform, NULL);
 + if (ret || (prop.second.as<cl_platform_id>() != d_platform))
 +throw error(CL_INVALID_PLATFORM);
 +  } else {
   throw error(CL_INVALID_PROPERTY);
 +  }
 }
  
 ret_error(r_errcode, CL_SUCCESS);
 -- 
 1.9.3


Re: [Mesa-dev] [PATCH 3/3] glx: Allow to create any OpenGL ES version.

2014-11-12 Thread Brian Paul


Series LGTM.  Reviewed-by: Brian Paul bri...@vmware.com


On 11/12/2014 05:37 AM, jfons...@vmware.com wrote:

From: José Fonseca jfons...@vmware.com

The latest version of GLX_EXT_create_context_es2_profile states:

   If the version requested is a valid and supported OpenGL-ES version,
   and the GLX_CONTEXT_ES_PROFILE_BIT_EXT bit is set in the
   GLX_CONTEXT_PROFILE_MASK_ARB attribute (see below), then the context
   returned will implement the OpenGL ES version requested.

We must also export EXT_create_context_es_profile, as the
EXT_create_context_es2_profile specification is crystal clear:

   NOTE: implementations of this extension must export BOTH extension
   strings, for backwards compatibility with applications written
   against version 1 of this extension.

Totally untested.  (Just happened to notice this while implementing
GLX_EXT_create_context_es2_profile for st/xlib.)
---
  src/glx/dri_common.c | 32 
  src/glx/drisw_glx.c  |  2 ++
  2 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/src/glx/dri_common.c b/src/glx/dri_common.c
index 63c8de3..541abbb 100644
--- a/src/glx/dri_common.c
+++ b/src/glx/dri_common.c
@@ -544,9 +544,22 @@ dri2_convert_glx_attribs(unsigned num_attribs, const 
uint32_t *attribs,
case GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB:
 *api = __DRI_API_OPENGL;
 break;
-  case GLX_CONTEXT_ES2_PROFILE_BIT_EXT:
-*api = __DRI_API_GLES2;
-break;
+  case GLX_CONTEXT_ES_PROFILE_BIT_EXT:
+ switch (*major_ver) {
+ case 3:
+*api = __DRI_API_GLES3;
+break;
+ case 2:
+*api = __DRI_API_GLES2;
+break;
+ case 1:
+*api = __DRI_API_GLES;
+break;
+ default:
+*error = __DRI_CTX_ERROR_BAD_API;
+return false;
+ }
+ break;
default:
 *error = __DRI_CTX_ERROR_BAD_API;
 return false;
@@ -577,19 +590,6 @@ dri2_convert_glx_attribs(unsigned num_attribs, const 
uint32_t *attribs,
return false;
 }

-   /* The GLX_EXT_create_context_es2_profile spec says:
-*
-* ... If the version requested is 2.0, and the
-* GLX_CONTEXT_ES2_PROFILE_BIT_EXT bit is set in the
-* GLX_CONTEXT_PROFILE_MASK_ARB attribute (see below), then the context
-* returned will implement OpenGL ES 2.0. This is the only way in which
-* an implementation may request an OpenGL ES 2.0 context.
-*/
-   if (*api == __DRI_API_GLES2 && (*major_ver != 2 || *minor_ver != 0)) {
-  *error = __DRI_CTX_ERROR_BAD_API;
-  return false;
-   }
-
 *error = __DRI_CTX_ERROR_SUCCESS;
 return true;
  }
diff --git a/src/glx/drisw_glx.c b/src/glx/drisw_glx.c
index 749ceb0..b0be5d0 100644
--- a/src/glx/drisw_glx.c
+++ b/src/glx/drisw_glx.c
@@ -617,6 +617,8 @@ driswBindExtensions(struct drisw_screen *psc, const 
__DRIextension **extensions)
/* DRISW version >= 2 implies support for OpenGL ES 2.0.
 */
__glXEnableDirectExtension(&psc->base,
+"GLX_EXT_create_context_es_profile");
+  __glXEnableDirectExtension(&psc->base,
 "GLX_EXT_create_context_es2_profile");
 }





Re: [Mesa-dev] [PATCH 2/3] mesa/main: Add sse2 streaming clamping

2014-11-12 Thread Brian Paul


On 11/12/2014 05:50 AM, Juha-Pekka Heikkila wrote:

Signed-off-by: Juha-Pekka Heikkila juhapekka.heikk...@gmail.com
---
  src/mesa/Makefile.am  |   8 +++
  src/mesa/main/sse2_clamping.c | 138 ++
  src/mesa/main/sse2_clamping.h |  49 +++
  3 files changed, 195 insertions(+)
  create mode 100644 src/mesa/main/sse2_clamping.c
  create mode 100644 src/mesa/main/sse2_clamping.h

diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
index 932db4f..43dbe87 100644
--- a/src/mesa/Makefile.am
+++ b/src/mesa/Makefile.am
@@ -111,6 +111,10 @@ if SSE41_SUPPORTED
  ARCH_LIBS += libmesa_sse41.la
  endif

+if SSE2_SUPPORTED
+ARCH_LIBS += libmesa_sse2.la
+endif
+
  MESA_ASM_FILES_FOR_ARCH =

  if HAVE_X86_ASM
@@ -155,6 +159,10 @@ libmesa_sse41_la_SOURCES = \
main/sse_minmax.c
  libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1

+libmesa_sse2_la_SOURCES = \
+   main/sse2_clamping.c
+libmesa_sse2_la_CFLAGS = $(AM_CFLAGS) -msse2
+
  pkgconfigdir = $(libdir)/pkgconfig
  pkgconfig_DATA = gl.pc

diff --git a/src/mesa/main/sse2_clamping.c b/src/mesa/main/sse2_clamping.c
new file mode 100644
index 000..66c7dc7
--- /dev/null
+++ b/src/mesa/main/sse2_clamping.c
@@ -0,0 +1,138 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Juha-Pekka Heikkila juhapekka.heikk...@gmail.com
+ *
+ */
+
+#ifdef __SSE2__
+#include "main/macros.h"
+#include "main/sse2_clamping.h"
+#include <emmintrin.h>
+
+/**
+ * Clamp four float values to [min,max]
+ */
+static inline void
+_mesa_clamp_float_rgba(GLfloat src[4], GLfloat result[4], const float min,
+   const float max)


We don't normally put the _mesa_ prefix on local/static functions.

Is there a reason why you const-qualify the min, max parameters but not src?



+{
+   __m128  operand, minval, maxval;
+
+   operand = _mm_loadu_ps(src);
+   minval = _mm_set1_ps(min);
+   maxval = _mm_set1_ps(max);
+   operand = _mm_max_ps(operand, minval);
+   operand = _mm_min_ps(operand, maxval);
+   _mm_storeu_ps(result, operand);
+}
+
+
+/* Clamp n amount float rgba pixels to [min,max] using SSE2
+ */
+__attribute__((optimize("unroll-loops")))


Is there any intention of building this code with MSVC someday?  I don't 
think the __attribute__ stuff will work there.
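
A common way around this, sketched here as an assumption rather than anything the patch does, is to hide the attribute behind a macro that expands to nothing on compilers that lack it. `SKETCH_UNROLL_LOOPS` and `sketch_clamp_floats` are hypothetical names, and the scalar loop merely stands in for the SSE2 body.

```c
#include <stddef.h>

/* GCC understands the per-function "optimize" attribute; clang defines
 * __GNUC__ too but does not implement it, and MSVC has no __attribute__
 * at all, so both get an empty expansion. */
#if defined(__GNUC__) && !defined(__clang__)
#  define SKETCH_UNROLL_LOOPS __attribute__((optimize("unroll-loops")))
#else
#  define SKETCH_UNROLL_LOOPS
#endif

/* Clamp n floats from src into dst, limiting each to [min, max].
 * The source is const-qualified because it is only read. */
SKETCH_UNROLL_LOOPS
static void
sketch_clamp_floats(float *dst, const float *src, size_t n,
                    float min, float max)
{
   size_t i;

   for (i = 0; i < n; i++) {
      float v = src[i];

      if (v < min)
         v = min;
      if (v > max)
         v = max;
      dst[i] = v;
   }
}
```

The function behaves identically with or without the attribute; the macro only changes how aggressively GCC optimizes the loop.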



+void
+_mesa_streaming_clamp_float_rgba(const GLuint n, GLfloat rgba_src[][4],
+ GLfloat rgba_dst[][4], const GLfloat min,
+ const GLfloat max)


Again, it seems odd to const-qualify the min, max parameters but not src.



+{
+   int  c, prefetch_c;
+   float*   worker = &rgba_src[0][0];
+   __m128   operand[2], minval, maxval;
+
+   _mm_prefetch((char*) (((unsigned long)worker)|0x1f) + 65, _MM_HINT_T0);
+
+   minval = _mm_set1_ps(min);
+   maxval = _mm_set1_ps(max);
+
+   for (c = n*4; c > 0 && (((unsigned long)worker)&0x1f) != 0; c--, worker++) {


Whitespace on both sides of '&' would be good.


+  operand[0] = _mm_load_ss(worker);
+  operand[0] = _mm_max_ss(operand[0], minval);
+  operand[0] = _mm_min_ss(operand[0], maxval);
+  _mm_store_ss(worker, operand[0]);
+   }
+
+   while (c >= 8) {
+  _mm_prefetch((char*) worker + 64, _MM_HINT_T0);
+
+  for (prefetch_c = 64/8; prefetch_c > 0 && c >= 8; prefetch_c--, c-=8,
+   worker += 8) {
+
+ operand[0] = _mm_load_ps(worker);
+ operand[1] = _mm_load_ps(worker+4);
+ operand[0] = _mm_max_ps(operand[0], minval);
+ operand[1] = _mm_max_ps(operand[1], minval);
+ operand[0] = _mm_min_ps(operand[0], maxval);
+ operand[1] = _mm_min_ps(operand[1], maxval);
+
+ _mm_store_ps(worker, operand[0]);
+ _mm_store_ps(worker+4, operand[1]);
+  }
+   }
+
+   for (; c > 0; c--, worker++) {
+  operand[0] = _mm_load_ss(worker);
+  operand[0] =

Re: [Mesa-dev] [PATCH v3 6/9] gallium/auxiliary: add dump functions for bind and transfer flags

2014-11-12 Thread Brian Paul


On 11/12/2014 05:48 AM, Erik Faye-Lund wrote:

On Sun, Nov 2, 2014 at 7:32 PM, David Heidelberg da...@ixit.cz wrote:

v2: rename and extend support with code for C11 and MSVC (thanks to Brian)

Signed-off-by: David Heidelberg da...@ixit.cz
---
  src/gallium/auxiliary/util/u_dump.h |  6 ++
  src/gallium/auxiliary/util/u_dump_defines.c | 86
+
  2 files changed, 92 insertions(+)

diff --git a/src/gallium/auxiliary/util/u_dump.h
b/src/gallium/auxiliary/util/u_dump.h
index 58e7dfd..84ba1ed 100644
--- a/src/gallium/auxiliary/util/u_dump.h
+++ b/src/gallium/auxiliary/util/u_dump.h
@@ -88,6 +88,12 @@ util_dump_tex_filter(unsigned value, boolean shortened);
  const char *
  util_dump_query_type(unsigned value, boolean shortened);
  +const char *
+util_dump_bind_flags(unsigned flags);
+
+const char *
+util_dump_transfer_flags(unsigned flags);
+
   /*
   * p_state.h, through a FILE
diff --git a/src/gallium/auxiliary/util/u_dump_defines.c
b/src/gallium/auxiliary/util/u_dump_defines.c
index 03fd15d..20ae6c0 100644
--- a/src/gallium/auxiliary/util/u_dump_defines.c
+++ b/src/gallium/auxiliary/util/u_dump_defines.c
@@ -61,6 +61,36 @@ util_dump_enum_continuous(unsigned value,
 return names[value];
  }
  +static const char *
+util_dump_flags(unsigned flags, const char *prefix,
+unsigned num_names,
+const char **names)
+{
+#if __STDC_VERSION__ >= 201112 && !defined __STDC_NO_THREADS__
+   static _Thread_local char str[256];
+#elif defined(PIPE_CC_GCC)
+   static __thread char str[256];
+#elif defined(PIPE_CC_MSVC)
+   static __declspec(thread) char str[256];
+#else
+#error Unsupported compiler: please find how to implement thread local
storage on it
+#endif


Isn't failing compilation a bit aggressive? Can't we just lose the
functionality instead?


Maybe there should be a macro in p_compiler.h which defines the 
thread-specific storage qualifier so we don't need all the #ifdef stuff 
in the util code.
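
A minimal sketch of what such a macro could look like, assuming the same three compiler cases as the patch. The name UTIL_THREAD_LOCAL and the helper get_scratch_buffer() are hypothetical and do not exist in p_compiler.h; the final fallback silently drops thread safety instead of failing the build, which is one possible answer to the "lose the functionality" question rather than an agreed design.

```c
/* Hypothetical macro name; not an existing p_compiler.h definition. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_THREADS__)
#define UTIL_THREAD_LOCAL _Thread_local
#elif defined(__GNUC__)
#define UTIL_THREAD_LOCAL __thread
#elif defined(_MSC_VER)
#define UTIL_THREAD_LOCAL __declspec(thread)
#else
#define UTIL_THREAD_LOCAL  /* no TLS known: plain static, not thread-safe */
#endif

/* Example user: each thread gets its own scratch buffer where TLS exists. */
char *
get_scratch_buffer(void)
{
   static UTIL_THREAD_LOCAL char str[256];
   return str;
}
```

Utility code would then declare its buffer once with the macro and carry no compiler-specific #ifdef blocks of its own.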


-Brian


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nvc0: remove unused nvc0_screen::mm_VRAM_fe0

2014-11-12 Thread Alexandre Courbot

Ping, how about this guy?

On Mon, Oct 27, 2014 at 7:36 PM, Alexandre Courbot acour...@nvidia.com wrote:
 This member is declared, allocated and destroyed, but doesn't seem to be
 used or referenced anywhere in the code.

 Signed-off-by: Alexandre Courbot acour...@nvidia.com
 ---
 Resending after fixing typo in email address - apologies for the 
 inconvenience.

  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 3 ---
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.h | 2 --
  2 files changed, 5 deletions(-)

 diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
 b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
 index a7581f286cfc..61b381693224 100644
 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
 +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
 @@ -407,8 +407,6 @@ nvc0_screen_destroy(struct pipe_screen *pscreen)

 FREE(screen->tic.entries);

 -   nouveau_mm_destroy(screen->mm_VRAM_fe0);
 -
 nouveau_object_del(&screen->eng3d);
 nouveau_object_del(&screen->eng2d);
 nouveau_object_del(&screen->m2mf);
 @@ -1027,7 +1025,6 @@ nvc0_screen_create(struct nouveau_device *dev)

 mm_config.nvc0.tile_mode = 0;
 mm_config.nvc0.memtype = 0xfe0;
 -   screen->mm_VRAM_fe0 = nouveau_mm_create(dev, NOUVEAU_BO_VRAM, &mm_config);

 if (!nvc0_blitter_create(screen))
goto fail;
 diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h 
 b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
 index 4802057f70ee..8a1991f52eb4 100644
 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
 +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.h
 @@ -73,8 +73,6 @@ struct nvc0_screen {
boolean mp_counters_enabled;
 } pm;

 -   struct nouveau_mman *mm_VRAM_fe0;
 -
 struct nouveau_object *eng3d; /* sqrt(1/2)|kepler + sqrt(1/2)|fermi */
 struct nouveau_object *eng2d;
 struct nouveau_object *m2mf;
 --
 2.1.2


Re: [Mesa-dev] [PATCH v3 6/9] gallium/auxiliary: add dump functions for bind and transfer flags

2014-11-12 Thread Jose Fonseca

David,

__declspec(thread) in DLLs is not supported on XP.  This must not go in, or Mesa 
DLLs won't load correctly on XP.

This is why there is no abstraction for compiler TLS. It's not very portable 
yet.  So I'd rather we didn't use it at all.


TLS is not enough to make this code safe either.  The caller could do

  printf("%s %s\n", util_dump_transfer_flags(foo), 
util_dump_transfer_flags(boo));

which would print wrong results.


Please just have the callers pass in a buffer as parameter.
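
A minimal sketch of the caller-provided-buffer approach, under stated assumptions: the name dump_flags, its signature, and the flag names in the usage note are all hypothetical and are not the interface that was eventually committed. Because each call writes into storage the caller owns, two results can be alive in the same printf without overwriting each other, and no static or thread-local buffer is needed.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical signature: formats the names of the set bits in flags into
 * buf, separated by '|', and returns buf. names[i] describes bit i. */
const char *
dump_flags(char *buf, size_t size, unsigned flags,
           const char *const *names, unsigned num_names)
{
   size_t used = 0;
   unsigned i;

   if (size == 0)
      return buf;
   buf[0] = '\0';

   for (i = 0; i < num_names && i < sizeof(flags) * CHAR_BIT; i++) {
      int n;

      if (!(flags & (1u << i)) || !names[i])
         continue;

      n = snprintf(buf + used, size - used, "%s%s", used ? "|" : "", names[i]);
      if (n < 0 || (size_t)n >= size - used)
         break;   /* output was truncated; stop appending */
      used += (size_t)n;
   }
   return buf;
}
```

With two separate buffers a and b and a made-up name table, printf("%s %s\n", dump_flags(a, sizeof(a), foo, names, 3), dump_flags(b, sizeof(b), boo, names, 3)) prints both strings correctly, which the single shared buffer cannot do.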


Jose






Re: [Mesa-dev] [PATCH 3/3] glx: Allow to create any OpenGL ES version.

2014-11-12 Thread Emil Velikov

On 12/11/14 12:37, jfons...@vmware.com wrote:
 From: José Fonseca jfons...@vmware.com
 
 The latest version of GLX_EXT_create_context_es2_profile states:
 
   If the version requested is a valid and supported OpenGL-ES version,
   and the GLX_CONTEXT_ES_PROFILE_BIT_EXT bit is set in the
   GLX_CONTEXT_PROFILE_MASK_ARB attribute (see below), then the context
   returned will implement the OpenGL ES version requested.
 
Yet the spec seems to lack any version update, or a note in the revision
history afaict :(

 We must also export EXT_create_context_es_profile too, as
 EXT_create_context_es2_profile specification is crystal clear:
 
   NOTE: implementations of this extension must export BOTH extension
   strings, for backwards compatibility with applications written
   against version 1 of this extension.
 
Yeah I've spotted that one while working on waffle this summer but
completely forgot to send out a patch. I think dri2 and dri3 need a
treatment similar to glxsw ? Perhaps even Cc mesa-stable ?

Fwiw with the addition to dri2.c and dri3.c this patch is
Reviewed-by: Emil Velikov emil.l.veli...@gmail.com


Thanks
Emil

 Totally untested.  (Just happened to notice this while implementing
 GLX_EXT_create_context_es2_profile for st/xlib.)
 ---
  src/glx/dri_common.c | 32 
  src/glx/drisw_glx.c  |  2 ++
  2 files changed, 18 insertions(+), 16 deletions(-)
 
 diff --git a/src/glx/dri_common.c b/src/glx/dri_common.c
 index 63c8de3..541abbb 100644
 --- a/src/glx/dri_common.c
 +++ b/src/glx/dri_common.c
 @@ -544,9 +544,22 @@ dri2_convert_glx_attribs(unsigned num_attribs, const 
 uint32_t *attribs,
case GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB:
*api = __DRI_API_OPENGL;
break;
 -  case GLX_CONTEXT_ES2_PROFILE_BIT_EXT:
 -  *api = __DRI_API_GLES2;
 -  break;
 +  case GLX_CONTEXT_ES_PROFILE_BIT_EXT:
 + switch (*major_ver) {
 + case 3:
 +*api = __DRI_API_GLES3;
 +break;
 + case 2:
 +*api = __DRI_API_GLES2;
 +break;
 + case 1:
 +*api = __DRI_API_GLES;
 +break;
 + default:
 +*error = __DRI_CTX_ERROR_BAD_API;
 +return false;
 + }
 + break;
default:
*error = __DRI_CTX_ERROR_BAD_API;
return false;
 @@ -577,19 +590,6 @@ dri2_convert_glx_attribs(unsigned num_attribs, const 
 uint32_t *attribs,
return false;
 }
  
 -   /* The GLX_EXT_create_context_es2_profile spec says:
 -*
 -* ... If the version requested is 2.0, and the
 -* GLX_CONTEXT_ES2_PROFILE_BIT_EXT bit is set in the
 -* GLX_CONTEXT_PROFILE_MASK_ARB attribute (see below), then the 
 context
 -* returned will implement OpenGL ES 2.0. This is the only way in 
 which
 -* an implementation may request an OpenGL ES 2.0 context.
 -*/
 -   if (*api == __DRI_API_GLES2 && (*major_ver != 2 || *minor_ver != 0)) {
 -  *error = __DRI_CTX_ERROR_BAD_API;
 -  return false;
 -   }
 -
 *error = __DRI_CTX_ERROR_SUCCESS;
 return true;
  }
 diff --git a/src/glx/drisw_glx.c b/src/glx/drisw_glx.c
 index 749ceb0..b0be5d0 100644
 --- a/src/glx/drisw_glx.c
 +++ b/src/glx/drisw_glx.c
 @@ -617,6 +617,8 @@ driswBindExtensions(struct drisw_screen *psc, const 
 __DRIextension **extensions)
    /* DRISW version >= 2 implies support for OpenGL ES 2.0.
  */
    __glXEnableDirectExtension(&psc->base,
 +  "GLX_EXT_create_context_es_profile");
 +  __glXEnableDirectExtension(&psc->base,
    "GLX_EXT_create_context_es2_profile");
 }
  
 


Re: [Mesa-dev] [PATCH 3/3] glx: Allow to create any OpenGL ES version.

2014-11-12 Thread Jose Fonseca

Thanks for the review.

 Yet the spec seems to lack any version update, or a note in the revision
history afaict :(

I don't know if Khronos made further changes, but I believe there is a mention 
at the bottom of 
https://www.opengl.org/registry/specs/EXT/glx_create_context_es2_profile.txt 
where it says "Version 3, 2012/03/28 - Add support for any OpenGL-ES version".


  I think dri2 and dri3 need a treatment similar to glxsw ? Perhaps even Cc 
 mesa-stable ?

Yes I think you're right.  I only build-tested the DRI changes. But I really 
don't have the time or the right development environment setup to test this 
properly on DRI.

If somebody who works more closely with DRI could take PATCH 2/3 and 3/3 and 
polish them up, I'd really appreciate it.  I won't claim authorship.

Jose




Re: [Mesa-dev] [PATCH 2/3] mesa/main: Add sse2 streaming clamping

2014-11-12 Thread Bruno Jimenez

On Wed, 2014-11-12 at 14:50 +0200, Juha-Pekka Heikkila wrote:
 Signed-off-by: Juha-Pekka Heikkila juhapekka.heikk...@gmail.com
 ---
  src/mesa/Makefile.am  |   8 +++
  src/mesa/main/sse2_clamping.c | 138 
 ++
  src/mesa/main/sse2_clamping.h |  49 +++
  3 files changed, 195 insertions(+)
  create mode 100644 src/mesa/main/sse2_clamping.c
  create mode 100644 src/mesa/main/sse2_clamping.h
 
 diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
 index 932db4f..43dbe87 100644
 --- a/src/mesa/Makefile.am
 +++ b/src/mesa/Makefile.am
 @@ -111,6 +111,10 @@ if SSE41_SUPPORTED
  ARCH_LIBS += libmesa_sse41.la
  endif
  
 +if SSE2_SUPPORTED
 +ARCH_LIBS += libmesa_sse2.la
 +endif
 +
  MESA_ASM_FILES_FOR_ARCH =
  
  if HAVE_X86_ASM
 @@ -155,6 +159,10 @@ libmesa_sse41_la_SOURCES = \
   main/sse_minmax.c
  libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
  
 +libmesa_sse2_la_SOURCES = \
 + main/sse2_clamping.c
 +libmesa_sse2_la_CFLAGS = $(AM_CFLAGS) -msse2
 +
  pkgconfigdir = $(libdir)/pkgconfig
  pkgconfig_DATA = gl.pc
  
 diff --git a/src/mesa/main/sse2_clamping.c b/src/mesa/main/sse2_clamping.c
 new file mode 100644
 index 000..66c7dc7
 --- /dev/null
 +++ b/src/mesa/main/sse2_clamping.c
 @@ -0,0 +1,138 @@
 +/*
 + * Copyright © 2014 Intel Corporation
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the Software),
 + * to deal in the Software without restriction, including without limitation
 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the next
 + * paragraph) shall be included in all copies or substantial portions of the
 + * Software.
 + *
 + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
 DEALINGS
 + * IN THE SOFTWARE.
 + *
 + * Authors:
 + *Juha-Pekka Heikkila juhapekka.heikk...@gmail.com
 + *
 + */
 +
 +#ifdef __SSE2__
 +#include "main/macros.h"
 +#include "main/sse2_clamping.h"
 +#include <emmintrin.h>
 +
 +/**
 + * Clamp four float values to [min,max]
 + */
 +static inline void
 +_mesa_clamp_float_rgba(GLfloat src[4], GLfloat result[4], const float min,
 +   const float max)
 +{
 +   __m128  operand, minval, maxval;
 +
 +   operand = _mm_loadu_ps(src);
 +   minval = _mm_set1_ps(min);
 +   maxval = _mm_set1_ps(max);
 +   operand = _mm_max_ps(operand, minval);
 +   operand = _mm_min_ps(operand, maxval);
 +   _mm_storeu_ps(result, operand);
 +}
 +
 +
 +/* Clamp n amount float rgba pixels to [min,max] using SSE2
 + */
 +__attribute__((optimize("unroll-loops")))
 +void
 +_mesa_streaming_clamp_float_rgba(const GLuint n, GLfloat rgba_src[][4],
 + GLfloat rgba_dst[][4], const GLfloat min,
 + const GLfloat max)
 +{
 +   int  c, prefetch_c;
 +   float*   worker = &rgba_src[0][0];
 +   __m128   operand[2], minval, maxval;
 +
 +   _mm_prefetch((char*) (((unsigned long)worker)|0x1f) + 65, _MM_HINT_T0);
   ^^^

Hi,

May I ask why precisely these numbers?

 +
 +   minval = _mm_set1_ps(min);
 +   maxval = _mm_set1_ps(max);
 +
 +   for (c = n*4; c > 0 && (((unsigned long)worker)&0x1f) != 0; c--, 
 worker++) {
^

I guess that this is for alignment, but you only need to align to a 16
bytes boundary, not 32. Or maybe I am missing something obvious.
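
As a sketch of the alignment arithmetic under discussion: the helper below (hypothetical name, not part of the patch) computes how many leading floats would have to be handled one at a time before a pointer reaches a given power-of-two byte alignment. Aligned SSE loads such as _mm_load_ps need 16-byte alignment, so passing 16 gives the shortest scalar prologue; passing 32 reproduces the 0x1f mask used in the patch. It uses uintptr_t rather than the patch's unsigned long cast, since unsigned long is narrower than a pointer on 64-bit Windows.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of leading floats to process individually before p reaches the
 * requested byte alignment. alignment must be a power of two and p is
 * assumed to be at least float-aligned. */
size_t
floats_until_aligned(const float *p, uintptr_t alignment)
{
   uintptr_t misalign = (uintptr_t)p & (alignment - 1);

   if (misalign == 0)
      return 0;
   return (size_t)((alignment - misalign) / sizeof(float));
}
```

For a pointer that sits 16 bytes past a 32-byte boundary, the helper returns 0 for an alignment of 16 and 4 for an alignment of 32, so the stricter mask costs up to four extra scalar iterations.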

 +  operand[0] = _mm_load_ss(worker);
 +  operand[0] = _mm_max_ss(operand[0], minval);
 +  operand[0] = _mm_min_ss(operand[0], maxval);
 +  _mm_store_ss(worker, operand[0]);
 +   }
 +
 +   while (c >= 8) {
 +  _mm_prefetch((char*) worker + 64, _MM_HINT_T0);
  ^^^
 +
 +  for (prefetch_c = 64/8; prefetch_c > 0 && c >= 8; prefetch_c--, c-=8,
  

May I also ask why these numbers?

Thanks in advance!
Bruno

 +   worker += 8) {
 +
 + operand[0] = _mm_load_ps(worker);
 + operand[1] = _mm_load_ps(worker+4);
 + operand[0] = _mm_max_ps(operand[0], minval);
 + operand[1] = _mm_max_ps(operand[1], minval);
 + operand[0] = _mm_min_ps(operand[0], maxval);
 + operand[1] = _mm_min_ps(operand[1], maxval);
 +
 + _mm_store_ps(worker, operand[0]);
 +

Re: [Mesa-dev] [PATCH 3/3] glx: Allow to create any OpenGL ES version.

2014-11-12 Thread Daniel Stone

Hi,

On 12 November 2014 12:37, jfons...@vmware.com wrote:

 @@ -544,9 +544,22 @@ dri2_convert_glx_attribs(unsigned num_attribs, const
 uint32_t *attribs,
case GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB:
  *api = __DRI_API_OPENGL;
  break;
 -  case GLX_CONTEXT_ES2_PROFILE_BIT_EXT:
 -*api = __DRI_API_GLES2;
 -break;
 +  case GLX_CONTEXT_ES_PROFILE_BIT_EXT:
 + switch (*major_ver) {
 + case 3:
 +*api = __DRI_API_GLES3;
 +break;
 + case 2:
 +*api = __DRI_API_GLES2;
 +break;
 + case 1:
 +*api = __DRI_API_GLES;
 +break;
 + default:
 +*error = __DRI_CTX_ERROR_BAD_API;
 +return false;
 + }
 + break;
default:
  *error = __DRI_CTX_ERROR_BAD_API;
  return false;
 @@ -577,19 +590,6 @@ dri2_convert_glx_attribs(unsigned num_attribs, const
 uint32_t *attribs,
return false;
 }

 -   /* The GLX_EXT_create_context_es2_profile spec says:
 -*
 -* ... If the version requested is 2.0, and the
 -* GLX_CONTEXT_ES2_PROFILE_BIT_EXT bit is set in the
 -* GLX_CONTEXT_PROFILE_MASK_ARB attribute (see below), then the
 context
 -* returned will implement OpenGL ES 2.0. This is the only way in
 which
 -* an implementation may request an OpenGL ES 2.0 context.
 -*/
 -   if (*api == __DRI_API_GLES2 && (*major_ver != 2 || *minor_ver != 0)) {


It looks like you're missing minor_ver checking here? For instance, 2.99
isn't a valid GLES version.
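
A minimal sketch of the kind of check being asked for. The function name is hypothetical, and the list of versions is an assumption: it covers the OpenGL ES versions published as of late 2014 (1.0, 1.1, 2.0, 3.0, 3.1) and is not taken from the patch or from what a given driver actually supports.

```c
#include <stdbool.h>

/* Returns true only for major.minor pairs that name a real OpenGL ES
 * version (assumed set: 1.0, 1.1, 2.0, 3.0, 3.1). */
bool
es_version_is_valid(unsigned major, unsigned minor)
{
   switch (major) {
   case 1:
      return minor <= 1;
   case 2:
      return minor == 0;
   case 3:
      return minor <= 1;
   default:
      return false;
   }
}
```

A caller in the attribute-conversion path could reject the request with the BAD_API error whenever this returns false, so that a request such as 2.99 no longer maps to an ES 2 context.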

Cheers,
Daniel

Re: [Mesa-dev] [PATCH] i965: Use the predicate enable bit for conditional rendering without stalling

2014-11-12 Thread Kenneth Graunke

On Wednesday, November 12, 2014 11:28:15 AM Daniel Vetter wrote:
 On Tue, Nov 11, 2014 at 11:13:28AM -0800, Kenneth Graunke wrote:
  On Tuesday, November 11, 2014 06:59:51 PM Neil Roberts wrote:
   Kenneth Graunke kenn...@whitecape.org writes:
   
drm-intel-next must have the new software checker turned on, which
disallows non-whitelisted register writes (along with libva, so it
can't really be enabled upstream yet).
   
   For what it's worth, I get the EINVAL error even on the stock Fedora 20
   kernel on Haswell (and presumably IvyBridge) so I can only assume the
   software checker is already upstream, unless I'm misunderstanding
   something.
   
   $ uname -r
   3.16.7-200.fc20.x86_64
   $ modinfo i915 | grep cmd_parser
   parm: enable_cmd_parser:Enable command parsing [...]
 (1=enabled [default], 0=disabled) (int)
   $ sudo cat /sys/module/i915/parameters/enable_cmd_parser 
   1
   
   If I cat 0 to /sys/module/i915/parameters/enable_cmd_parser then I no
   longer get the EINVAL error.
   
   - Neil
  
  Huh.  Yeah, I thought they turned it on by default in 3.16, which I don't 
  understand at all.  AFAIK the libva issue isn't fixed (or wasn't by then), so 
  it sure seems like it would've broken userspace.  Which would be a pretty 
  clear kernel policy violation...
 
 We let libva pass. And in the latest patches from Brad if we detect libva
 tricks we'll still let it pass, just not with elevated privs needed for
 writing special registers. And the point of enabling the parser in 3.16
 already was to have as much coverage early as possible to catch any
 userspace issues we've missed.
 -Daniel

Cool.  I'd seen the discussions on making it work, and thought it hadn't 
actually happened...but I had my facts wrong.  Sorry for the trouble.

I actually haven't heard anyone complaining about breakage, so apparently it 
works pretty well.  Nice!

--Ken


Re: [Mesa-dev] [PATCH 1/1] clover: Fix build after llvm r221375

2014-11-12 Thread Laurent Carlier

Le jeudi 6 novembre 2014, 09:45:40 Tom Stellard a écrit :
 On Thu, Nov 06, 2014 at 11:46:41AM -0500, Jan Vesely wrote:
  Signed-off-by: Jan Vesely jan.ves...@rutgers.edu
 
 I've pushed this, thanks!
 
 -Tom

http://llvm.org/viewvc/llvm-project?view=revision&revision=221711

Bad luck, it's reverted, so now:

  CXX  core/libclover_la-event.lo
  CXX  core/libclover_la-format.lo
  CXX  core/libclover_la-kernel.lo
  CXX  core/libclover_la-memory.lo
  CXX  core/libclover_la-module.lo
  CXX  core/libclover_la-platform.lo
  CXX  core/libclover_la-program.lo
  CXX  core/libclover_la-queue.lo
  CXX  core/libclover_la-resource.lo
  CXX  core/libclover_la-sampler.lo
  CXX  core/libclover_la-timestamp.lo
  CXX  util/libclover_la-compat.lo
  CXX  tgsi/libcltgsi_la-compiler.lo
  CXX  llvm/libclllvm_la-invocation.lo
  CXXLD    libcltgsi.la
llvm/invocation.cpp: In function 'void {anonymous}::find_kernels(llvm::Module*, 
std::vector<llvm::Function*>&)':
llvm/invocation.cpp:286:50: error: 'const class llvm::NamedMDNode' has no 
member named 'getOperandAsMDNode'
 kernel_node->getOperandAsMDNode(i)->getOperand(0)));
  ^
Makefile:843: recipe for target 'llvm/libclllvm_la-invocation.lo' failed
make[3]: *** [llvm/libclllvm_la-invocation.lo] Error 1
make[3]: Leaving directory '/build/mesa-git/src/mesa/src/gallium/state_trackers/clover'
Makefile:558: recipe for target 'all-recursive' failed
make[2]: *** [all-recursive] Error 1

-- 
Laurent Carlier
http://www.archlinux.org

Re: [Mesa-dev] [PATCH 1/1] clover: Fix build after llvm r221375

2014-11-12 Thread Tom Stellard

On Wed, Nov 12, 2014 at 07:36:04PM +0100, Laurent Carlier wrote:
 Le jeudi 6 novembre 2014, 09:45:40 Tom Stellard a écrit :
  On Thu, Nov 06, 2014 at 11:46:41AM -0500, Jan Vesely wrote:
   Signed-off-by: Jan Vesely jan.ves...@rutgers.edu
  
  I've pushed this, thanks!
  
  -Tom
 
 http://llvm.org/viewvc/llvm-project?view=revision&revision=221711
 
 Bad luck, it's reverted, so now:
 
   CXX  core/libclover_la-event.lo
   CXX  core/libclover_la-format.lo
   CXX  core/libclover_la-kernel.lo
   CXX  core/libclover_la-memory.lo
   CXX  core/libclover_la-module.lo
   CXX  core/libclover_la-platform.lo
   CXX  core/libclover_la-program.lo
   CXX  core/libclover_la-queue.lo
   CXX  core/libclover_la-resource.lo
   CXX  core/libclover_la-sampler.lo
   CXX  core/libclover_la-timestamp.lo
   CXX  util/libclover_la-compat.lo
   CXX  tgsi/libcltgsi_la-compiler.lo
   CXX  llvm/libclllvm_la-invocation.lo
   CXXLD    libcltgsi.la
 llvm/invocation.cpp: In function 'void 
 {anonymous}::find_kernels(llvm::Module*, 
 std::vector<llvm::Function*>&)':
 llvm/invocation.cpp:286:50: error: 'const class llvm::NamedMDNode' has no 
 member named 'getOperandAsMDNode'
  kernel_node->getOperandAsMDNode(i)->getOperand(0)));
   ^
 Makefile:843: recipe for target 'llvm/libclllvm_la-invocation.lo' failed
 make[3]: *** [llvm/libclllvm_la-invocation.lo] Error 1
 make[3]: Leaving directory '/build/mesa-git/src/mesa/src/gallium/state_trackers/clover'
 Makefile:558: recipe for target 'all-recursive' failed
 make[2]: *** [all-recursive] Error 1
 

Should be fixed now.

-Tom

 -- 
 Laurent Carlier
 http://www.archlinux.org

Re: [Mesa-dev] [PATCH 2/3] mesa/main: Add sse2 streaming clamping

2014-11-12 Thread Juha-Pekka Heikkila

On 12.11.2014 18:08, Brian Paul wrote:
 On 11/12/2014 05:50 AM, Juha-Pekka Heikkila wrote:
 Signed-off-by: Juha-Pekka Heikkila juhapekka.heikk...@gmail.com
 ---
   src/mesa/Makefile.am  |   8 +++
   src/mesa/main/sse2_clamping.c | 138
 ++
   src/mesa/main/sse2_clamping.h |  49 +++
   3 files changed, 195 insertions(+)
   create mode 100644 src/mesa/main/sse2_clamping.c
   create mode 100644 src/mesa/main/sse2_clamping.h

 diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
 index 932db4f..43dbe87 100644
 --- a/src/mesa/Makefile.am
 +++ b/src/mesa/Makefile.am
 @@ -111,6 +111,10 @@ if SSE41_SUPPORTED
   ARCH_LIBS += libmesa_sse41.la
   endif

 +if SSE2_SUPPORTED
 +ARCH_LIBS += libmesa_sse2.la
 +endif
 +
   MESA_ASM_FILES_FOR_ARCH =

   if HAVE_X86_ASM
 @@ -155,6 +159,10 @@ libmesa_sse41_la_SOURCES = \
   main/sse_minmax.c
   libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1

 +libmesa_sse2_la_SOURCES = \
 +main/sse2_clamping.c
 +libmesa_sse2_la_CFLAGS = $(AM_CFLAGS) -msse2
 +
   pkgconfigdir = $(libdir)/pkgconfig
   pkgconfig_DATA = gl.pc

 diff --git a/src/mesa/main/sse2_clamping.c
 b/src/mesa/main/sse2_clamping.c
 new file mode 100644
 index 000..66c7dc7
 --- /dev/null
 +++ b/src/mesa/main/sse2_clamping.c
 @@ -0,0 +1,138 @@
 +/*
 + * Copyright © 2014 Intel Corporation
 + *
 + * Permission is hereby granted, free of charge, to any person
 obtaining a
 + * copy of this software and associated documentation files (the
 Software),
 + * to deal in the Software without restriction, including without
 limitation
 + * the rights to use, copy, modify, merge, publish, distribute,
 sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including
 the next
 + * paragraph) shall be included in all copies or substantial portions
 of the
 + * Software.
 + *
 + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND,
 EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
 SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
 OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 ARISING
 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 OTHER DEALINGS
 + * IN THE SOFTWARE.
 + *
 + * Authors:
 + *Juha-Pekka Heikkila juhapekka.heikk...@gmail.com
 + *
 + */
 +
 +#ifdef __SSE2__
 +#include "main/macros.h"
 +#include "main/sse2_clamping.h"
 +#include <emmintrin.h>
 +
 +/**
 + * Clamp four float values to [min,max]
 + */
 +static inline void
 +_mesa_clamp_float_rgba(GLfloat src[4], GLfloat result[4], const float
 min,
 +   const float max)
 
 We don't normally put the _mesa_ prefix on local/static functions.
 
 Is there a reason why you const-qualify the min, max parameters but not
 src?

src can be the same as result, but thinking of these as pointers, the
pointer itself can of course be const; the same goes for result. I don't
know whether a const declaration for these makes much difference here, though.
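
To illustrate the distinction in play here, a scalar sketch (hypothetical function name, plain C rather than the SSE2 intrinsics of the patch): the const in the parameter below applies to the floats that src points at, not to the pointer itself. It promises only that the function does not write through src, so passing the same array as src and result remains valid, because each element is read before it is written.

```c
/* Clamp four floats from src into result. src and result may be the same
 * array: element i is read into v before result[i] is stored. */
void
clamp_float_rgba(const float src[4], float result[4], float min, float max)
{
   int i;

   for (i = 0; i < 4; i++) {
      float v = src[i];

      if (v < min)
         v = min;
      else if (v > max)
         v = max;
      result[i] = v;
   }
}
```

By contrast, a const on the min and max parameters only stops the function body from reassigning its own copies and is invisible to callers.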

 
 
 +{
 +   __m128  operand, minval, maxval;
 +
 +   operand = _mm_loadu_ps(src);
 +   minval = _mm_set1_ps(min);
 +   maxval = _mm_set1_ps(max);
 +   operand = _mm_max_ps(operand, minval);
 +   operand = _mm_min_ps(operand, maxval);
 +   _mm_storeu_ps(result, operand);
 +}
 +
 +
 +/* Clamp n amount float rgba pixels to [min,max] using SSE2
 + */
 +__attribute__((optimize("unroll-loops")))
 
 Is there any intention of building this code with MSVC someday?  I don't
 think the __attribute__ stuff will work there.

True, I guess I have to wrap it inside an #ifdef.
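
A minimal sketch of such a guard, assuming a hypothetical macro name (SKETCH_UNROLL_LOOPS is not part of Mesa, and sum_floats is only a stand-in function). It expands to GCC's optimize attribute only where that attribute is expected to exist and to nothing elsewhere, so MSVC, and clang (which at the time of writing does not implement this attribute), would see a plain function:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro, not a Mesa name: use GCC's optimize attribute
 * where available, and expand to nothing on other compilers. */
#if defined(__GNUC__) && !defined(__clang__)
#define SKETCH_UNROLL_LOOPS __attribute__((optimize("unroll-loops")))
#else
#define SKETCH_UNROLL_LOOPS
#endif

/* Stand-in function that carries the attribute; it just sums an array. */
SKETCH_UNROLL_LOOPS
static float
sum_floats(const float *values, size_t count)
{
   float total = 0.0f;

   for (size_t i = 0; i < count; i++)
      total += values[i];

   return total;
}
```

The attribute is only an optimization hint, so the function behaves the same with or without it.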

 
 +void
 +_mesa_streaming_clamp_float_rgba(const GLuint n, GLfloat rgba_src[][4],
 + GLfloat rgba_dst[][4], const GLfloat min,
 + const GLfloat max)
 
 Again, it seems odd to const-qualify the min, max parameters but not src.

The same applies here as above. The memory pointed to by src can be the
same as dst, but the pointer itself can be made const. For the clamping
where this is currently used, src is the same as dst.

 
 
 +{
 +   int  c, prefetch_c;
 +   float*   worker = &rgba_src[0][0];
 +   __m128   operand[2], minval, maxval;
 +
 +   _mm_prefetch((char*) (((unsigned long)worker)|0x1f) + 65, _MM_HINT_T0);
 +
 +   minval = _mm_set1_ps(min);
 +   maxval = _mm_set1_ps(max);
 +
 +   for (c = n*4; c > 0 && (((unsigned long)worker)&0x1f) != 0; c--, worker++) {
 
 Whitespace on both sides of '&' would be good.
 
 +  operand[0] = _mm_load_ss(worker);
 +  operand[0] = _mm_max_ss(operand[0], minval);
 +  operand[0] = _mm_min_ss(operand[0], maxval);
 +  _mm_store_ss(worker, operand[0]);
 +   }
 +
 +   while (c >= 8) {
 +  _mm_prefetch((char*) worker + 64, _MM_HINT_T0);
 +
 +  for (prefetch_c

[Mesa-dev] [PATCH 2/4] i965/fs: Remove unused apply_stride().

2014-11-12 Thread Matt Turner

---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 10 --
 src/mesa/drivers/dri/i965/brw_fs.h   |  1 -
 2 files changed, 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 39c6231..7003691 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -608,16 +608,6 @@ fs_reg::equals(const fs_reg &r) const
 }
 
 fs_reg &
-fs_reg::apply_stride(unsigned stride)
-{
-   assert((this->stride * stride) <= 4 &&
-  (is_power_of_two(stride) || stride == 0) &&
-  file != HW_REG && file != IMM);
-   this->stride *= stride;
-   return *this;
-}
-
-fs_reg &
 fs_reg::set_smear(unsigned subreg)
 {
assert(file != HW_REG && file != IMM);
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 8ca5490..0dae800 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -86,7 +86,6 @@ public:
bool is_valid_3src() const;
bool is_contiguous() const;
 
-   fs_reg &apply_stride(unsigned stride);
/** Smear a channel of the reg to all channels. */
fs_reg &set_smear(unsigned subreg);
 
-- 
2.0.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 4/4] i965/fs: Remove is_valid_3src().

2014-11-12 Thread Matt Turner

---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 6 --
 src/mesa/drivers/dri/i965/brw_fs.h   | 1 -
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +-
 3 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 7003691..9196af9 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -622,12 +622,6 @@ fs_reg::is_contiguous() const
return stride == 1;
 }
 
-bool
-fs_reg::is_valid_3src() const
-{
-   return file == GRF || file == UNIFORM;
-}
-
 int
 fs_visitor::type_size(const struct glsl_type *type)
 {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 0dae800..9e1dddc 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -83,7 +83,6 @@ public:
fs_reg(fs_visitor *v, const struct glsl_type *type);
 
bool equals(const fs_reg &r) const;
-   bool is_valid_3src() const;
bool is_contiguous() const;
 
/** Smear a channel of the reg to all channels. */
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index ce4d8c8..f112466 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -514,7 +514,7 @@ fs_visitor::visit(ir_expression *ir)
 ir->operands[operand]->fprint(stderr);
  fprintf(stderr, "\n");
   }
-  assert(this->result.is_valid_3src());
+  assert(this->result.file == GRF || this->result.file == UNIFORM);
   op[operand] = this->result;
 
   /* Matrix expression operands should have been broken down to vector
-- 
2.0.4


[Mesa-dev] [PATCH 1/4] i965/fs: Move ip_record class to its one use.

2014-11-12 Thread Matt Turner

---
 src/mesa/drivers/dri/i965/brw_fs.h | 12 
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 12 
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 1c14d13..8ca5490 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -210,18 +210,6 @@ half(fs_reg reg, unsigned idx)
 
 static const fs_reg reg_undef;
 
-class ip_record : public exec_node {
-public:
-   DECLARE_RALLOC_CXX_OPERATORS(ip_record)
-
-   ip_record(int ip)
-   {
-  this->ip = ip;
-   }
-
-   int ip;
-};
-
 class fs_inst : public backend_instruction {
fs_inst &operator=(const fs_inst &);
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index c95beb6..2b7580c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -60,6 +60,18 @@ fs_generator::~fs_generator()
 {
 }
 
+class ip_record : public exec_node {
+public:
+   DECLARE_RALLOC_CXX_OPERATORS(ip_record)
+
+   ip_record(int ip)
+   {
+  this->ip = ip;
+   }
+
+   int ip;
+};
+
 bool
 fs_generator::patch_discard_jumps_to_fb_writes()
 {
-- 
2.0.4


[Mesa-dev] [PATCH 3/4] i965/fs: Remove is_valid_3src() checks from emit_lrp.

2014-11-12 Thread Matt Turner

The visitor emits MOVs to temporary registers for immediates, so these
never trigger. For further proof, check case ir_triop_fma.
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index d4f08aa..ce4d8c8 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -226,10 +226,7 @@ void
 fs_visitor::emit_lrp(const fs_reg &dst, const fs_reg &x, const fs_reg &y,
  const fs_reg &a)
 {
-   if (brw->gen < 6 ||
-   !x.is_valid_3src() ||
-   !y.is_valid_3src() ||
-   !a.is_valid_3src()) {
+   if (brw->gen < 6) {
   /* We can't use the LRP instruction.  Emit x*(1-a) + y*a. */
   fs_reg y_times_a   = fs_reg(this, glsl_type::float_type);
   fs_reg one_minus_a = fs_reg(this, glsl_type::float_type);
-- 
2.0.4
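
For reference, the fallback named in the hunk comment above, x*(1-a) + y*a, can be sketched in plain C. This is only an illustration of the arithmetic on scalars, with a hypothetical function name; the driver emits GPU instructions on registers rather than calling anything like this:

```c
#include <assert.h>

/* Hypothetical scalar model of the non-LRP path: x*(1-a) + y*a.
 * The temporaries mirror the names used in the patch context. */
static float
sketch_lrp_fallback(float x, float y, float a)
{
   float y_times_a = y * a;
   float one_minus_a = 1.0f - a;

   return x * one_minus_a + y_times_a;
}
```

With a = 0 the result is x, with a = 1 it is y, and values in between blend the two linearly.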


Re: [Mesa-dev] [PATCH] i965: Implement WaCsStallAtEveryFourthPipecontrol on IVB/BYT.

2014-11-12 Thread Kenneth Graunke

On Wednesday, November 12, 2014 10:53:29 AM Chris Wilson wrote:
> On Wed, Nov 12, 2014 at 11:39:28AM +0100, Daniel Vetter wrote:
> > On Wed, Nov 12, 2014 at 01:33:01AM -0800, Kenneth Graunke wrote:
> > > +/* Implement the WaCsStallAtEveryFourthPipecontrol workaround on IVB, BYT:
> > > + *
> > > + * "Every 4th PIPE_CONTROL command, not counting the PIPE_CONTROL with
> > > + *  only read-cache-invalidate bit(s) set, must have a CS_STALL bit set."
> > > + *
> > > + * Note that the kernel does CS stalls between batches, so we only need
> > > + * to count them within a batch.
> > > + */
> > > +static uint32_t
> > > +gen7_cs_stall_every_four_pipe_controls(struct brw_context *brw, uint32_t flags)
> > > +{
> > > +   if (brw->gen == 7 && brw->is_haswell) {
>
> The comment says for IVB,BYT, the code here only applies to HSW.
> -Chris

D'oh...meant to put a ! there.  Thanks Chris!  I retested on IVB (now that it
actually does something), and it still has no Piglit regressions.

Daniel, I also like your suggestion about ++.  Changed for v2 :)


[Mesa-dev] [PATCH] i965: Implement WaCsStallAtEveryFourthPipecontrol on IVB/BYT.

2014-11-12 Thread Kenneth Graunke

According to the documentation, we need to do a CS stall on every fourth
PIPE_CONTROL command to avoid GPU hangs.  The kernel does a CS stall
between batches, so we only need to count the PIPE_CONTROLs in our batches.

v2: Get the generation check right (caught by Chris Wilson),
combine the ++ with the check (suggested by Daniel Vetter).

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vet...@ffwll.ch>
---
 src/mesa/drivers/dri/i965/brw_context.h   |  2 ++
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 32 +++
 2 files changed, 34 insertions(+)

This may help
https://code.google.com/p/chromium/issues/detail?id=333130

and other GPU hangs.  No Piglit regressions, but I haven't encountered GPU
hangs to test it on...

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 656cbe8..27cf92c 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -854,6 +854,8 @@ struct intel_batchbuffer {
enum brw_gpu_ring ring;
bool needs_sol_reset;
 
+   uint8_t pipe_controls_since_last_cs_stall;
+
struct {
   uint16_t used;
   int reloc_count;
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index cd45af6..08f8e18 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -81,6 +81,7 @@ intel_batchbuffer_reset(struct brw_context *brw)
brw->batch.state_batch_offset = brw->batch.bo->size;
brw->batch.used = 0;
brw->batch.needs_sol_reset = false;
+   brw->batch.pipe_controls_since_last_cs_stall = 0;
 
/* We don't know what ring the new batch will be sent to until we see the
 * first BEGIN_BATCH or BEGIN_BATCH_BLT.  Mark it as unknown.
@@ -433,6 +434,33 @@ gen8_add_cs_stall_workaround_bits(uint32_t *flags)
   *flags |= PIPE_CONTROL_STALL_AT_SCOREBOARD;
 }
 
+/* Implement the WaCsStallAtEveryFourthPipecontrol workaround on IVB, BYT:
+ *
+ * "Every 4th PIPE_CONTROL command, not counting the PIPE_CONTROL with
+ *  only read-cache-invalidate bit(s) set, must have a CS_STALL bit set."
+ *
+ * Note that the kernel does CS stalls between batches, so we only need
+ * to count them within a batch.
+ */
+static uint32_t
+gen7_cs_stall_every_four_pipe_controls(struct brw_context *brw, uint32_t flags)
+{
+   if (brw->gen == 7 && !brw->is_haswell) {
+  if (flags & PIPE_CONTROL_CS_STALL) {
+ /* If we're doing a CS stall, reset the counter and carry on. */
+ brw->batch.pipe_controls_since_last_cs_stall = 0;
+ return 0;
+  }
+
+  /* If this is the fourth pipe control without a CS stall, do one now. */
+  if (++brw->batch.pipe_controls_since_last_cs_stall == 4) {
+ brw->batch.pipe_controls_since_last_cs_stall = 0;
+ return PIPE_CONTROL_CS_STALL;
+  }
+   }
+   return 0;
+}
+
 /**
  * Emit a PIPE_CONTROL with various flushing flags.
  *
@@ -454,6 +482,8 @@ brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
   OUT_BATCH(0);
   ADVANCE_BATCH();
} else if (brw->gen >= 6) {
+  flags |= gen7_cs_stall_every_four_pipe_controls(brw, flags);
+
   BEGIN_BATCH(5);
   OUT_BATCH(_3DSTATE_PIPE_CONTROL | (5 - 2));
   OUT_BATCH(flags);
@@ -496,6 +526,8 @@ brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
   OUT_BATCH(imm_upper);
   ADVANCE_BATCH();
} else if (brw->gen >= 6) {
+  flags |= gen7_cs_stall_every_four_pipe_controls(brw, flags);
+
   /* PPGTT/GGTT is selected by DW2 bit 2 on Sandybridge, but DW1 bit 24
* on later platforms.  We always use PPGTT on Gen7+.
*/
-- 
2.1.3
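
The counting scheme in this patch can be exercised outside the driver with a small stand-alone model. This is a sketch under assumptions: the struct, the flag value and the function name are stand-ins invented for illustration (the real PIPE_CONTROL_CS_STALL value and struct brw_context live in the i965 headers), and a single boolean replaces the gen7-but-not-Haswell check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag bit, not the hardware value. */
#define SKETCH_CS_STALL (1u << 0)

/* Stand-in for the relevant parts of the batch and context state. */
struct sketch_batch {
   bool needs_workaround;   /* models "gen == 7 && !is_haswell" */
   uint8_t pipe_controls_since_last_cs_stall;
};

/* Returns extra flag bits to OR into the PIPE_CONTROL being emitted. */
static uint32_t
sketch_cs_stall_every_four(struct sketch_batch *batch, uint32_t flags)
{
   if (!batch->needs_workaround)
      return 0;

   /* A PIPE_CONTROL that already stalls resets the count. */
   if (flags & SKETCH_CS_STALL) {
      batch->pipe_controls_since_last_cs_stall = 0;
      return 0;
   }

   /* The fourth one without a stall gets the stall bit added. */
   if (++batch->pipe_controls_since_last_cs_stall == 4) {
      batch->pipe_controls_since_last_cs_stall = 0;
      return SKETCH_CS_STALL;
   }

   return 0;
}
```

A caller would do flags |= sketch_cs_stall_every_four(&batch, flags) before emitting, mirroring the two call sites in the diff. Like the patch, this model counts every PIPE_CONTROL and does not try to skip the ones with only read-cache-invalidate bits set.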


[Mesa-dev] [PATCH] i915g: we also have more than 0 viewports!

2014-11-12 Thread Kenneth Graunke

See 546d6c8d for the corresponding fix in freedreno.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/gallium/drivers/i915/i915_screen.c | 3 +++
 1 file changed, 3 insertions(+)

Not Piglit tested yet, sorry.

diff --git a/src/gallium/drivers/i915/i915_screen.c b/src/gallium/drivers/i915/i915_screen.c
index 062f1a6..e9f10bc 100644
--- a/src/gallium/drivers/i915/i915_screen.c
+++ b/src/gallium/drivers/i915/i915_screen.c
@@ -240,6 +240,9 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap cap)
case PIPE_CAP_SAMPLER_VIEW_TARGET:
   return 0;
 
+   case PIPE_CAP_MAX_VIEWPORTS:
+  return 1;
+
case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
   return 64;
 
-- 
2.1.3
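
The shape of the fix can be sketched with a stand-alone switch. Everything here is a stand-in: the enum and function names are hypothetical, and the assumption that an unhandled cap ends up reporting 0 is mine, not something the patch states. The point is only that a cap with no case of its own would advertise zero viewports, while an explicit case can report the single viewport the hardware has:

```c
#include <assert.h>

/* Hypothetical caps, modelled loosely on the names in the diff. */
enum sketch_cap {
   SKETCH_CAP_SAMPLER_VIEW_TARGET,
   SKETCH_CAP_MAX_VIEWPORTS,
   SKETCH_CAP_MIN_MAP_BUFFER_ALIGNMENT,
   SKETCH_CAP_SOMETHING_UNHANDLED
};

/* Hypothetical query function: known caps get explicit answers,
 * anything else falls through to 0 (an assumption of this sketch). */
static int
sketch_get_param(enum sketch_cap cap)
{
   switch (cap) {
   case SKETCH_CAP_SAMPLER_VIEW_TARGET:
      return 0;

   case SKETCH_CAP_MAX_VIEWPORTS:
      return 1;

   case SKETCH_CAP_MIN_MAP_BUFFER_ALIGNMENT:
      return 64;

   default:
      return 0;
   }
}
```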


[Mesa-dev] [Bug 86070] Host application crash on vmware fusion 7 in vmw_swc_flush

2014-11-12 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=86070

--- Comment #4 from Sinclair Yeh <s...@vmware.com> ---
Thanks.  I can now reproduce this issue with MESA 8.0.4 and MESA 9.0.  This
seems to work fine with MESA 10.1.3.

I'll track down the root cause of this.

Do you see this "vmw_ioctl_command error Invalid argument." in the terminal
when you run mplay-bin?


[Mesa-dev] [Bug 86070] Host application crash on vmware fusion 7 in vmw_swc_flush

2014-11-12 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=86070

--- Comment #5 from Nicholas Yue <yue.nicho...@gmail.com> ---
(In reply to Sinclair Yeh from comment #4)
> Thanks.  I can now reproduce this issue with MESA 8.0.4 and MESA 9.0.  This
> seems to work fine with MESA 10.1.3.
>
> I'll track down the root cause of this.
>
> Do you see this "vmw_ioctl_command error Invalid argument." in the terminal
> when you run mplay-bin?

No. I don't recall seeing "vmw_ioctl_command error Invalid argument."

Cheers


[Mesa-dev] [PATCH 1/2] i965: Combine offset/texture_offset fields.

2014-11-12 Thread Matt Turner

texture_offset was only used by some texturing operations, and offset
was only used by spill/unspill and some URB operations. These fields are
never used at the same time.
---
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 2 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 6 +++---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 4 ++--
 src/mesa/drivers/dri/i965/brw_shader.h   | 3 +--
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 6 +++---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp   | 7 +++
 6 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
index 5fdbf46..b1c433e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
@@ -157,7 +157,7 @@ instructions_match(fs_inst *a, fs_inst *b)
   a->conditional_mod == b->conditional_mod &&
   a->dst.type == b->dst.type &&
   a->sources == b->sources &&
-  (a->is_tex() ? (a->texture_offset == b->texture_offset &&
+  (a->is_tex() ? (a->offset == b->offset &&
   a->mlen == b->mlen &&
   a->regs_written == b->regs_written &&
   a->base_mrf == b->base_mrf &&
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index c95beb6..dc9e803 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -556,7 +556,7 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
 * Otherwise, we can use an implied move from g0 to the first message reg.
 */
if (inst->header_present) {
-  if (brw->gen < 6 && !inst->texture_offset) {
+  if (brw->gen < 6 && !inst->offset) {
  /* Set up an implied move from g0 to the MRF. */
  src = retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UW);
   } else {
@@ -575,10 +575,10 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
  /* Explicitly set up the message header by copying g0 to the MRF. */
  brw_MOV(p, header_reg, brw_vec8_grf(0, 0));
 
- if (inst->texture_offset) {
+ if (inst->offset) {
 /* Set the offset bits in DWord 2. */
 brw_MOV(p, get_element_ud(header_reg, 2),
-   brw_imm_ud(inst->texture_offset));
+   brw_imm_ud(inst->offset));
  }
 
  brw_adjust_sampler_state_pointer(p, header_reg, sampler_index, dst);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 4e1badd..21334a2 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1918,10 +1918,10 @@ fs_visitor::emit_texture(ir_texture_opcode op,
   inst->shadow_compare = true;
 
if (offset_value.file == IMM)
-  inst->texture_offset = offset_value.fixed_hw_reg.dw1.ud;
+  inst->offset = offset_value.fixed_hw_reg.dw1.ud;
 
if (op == ir_tg4) {
-  inst->texture_offset |=
+  inst->offset |=
  gather_channel(gather_component, sampler) << 16; /* M0.2:16-17 */
 
   if (brw->gen == 6)
diff --git a/src/mesa/drivers/dri/i965/brw_shader.h b/src/mesa/drivers/dri/i965/brw_shader.h
index 94db987..32460e2 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -112,8 +112,7 @@ struct backend_instruction {
const char *annotation;
/** @} */
 
-   uint32_t texture_offset; /**< Texture offset bitfield */
-   uint32_t offset; /**< spill/unspill offset */
+   uint32_t offset; /**< spill/unspill offset or texture offset bitfield */
uint8_t mlen; /**< SEND message length */
int8_t base_mrf; /**< First MRF in the SEND message, if mlen is nonzero. */
uint8_t target; /**< MRT target. */
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index e522567..0776a91 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -319,7 +319,7 @@ vec4_generator::generate_tex(vec4_instruction *inst,
 * use an implied move from g0 to the first message register.
 */
if (inst->header_present) {
-  if (brw->gen < 6 && !inst->texture_offset) {
+  if (brw->gen < 6 && !inst->offset) {
  /* Set up an implied move from g0 to the MRF. */
  src = brw_vec8_grf(0, 0);
   } else {
@@ -333,10 +333,10 @@ vec4_generator::generate_tex(vec4_instruction *inst,
 
  brw_set_default_access_mode(p, BRW_ALIGN_1);
 
- if (inst->texture_offset) {
+ if (inst->offset) {
 /* Set the texel offset bits in DWord 2. */
 brw_MOV(p, get_element_ud(header, 2),
-brw_imm_ud(inst->texture_offset));
+brw_imm_ud(inst->offset));
  }

[Mesa-dev] [PATCH 2/2] i965: Move common fields into backend_instruction.

2014-11-12 Thread Matt Turner

---
 src/mesa/drivers/dri/i965/brw_fs.h | 2 --
 src/mesa/drivers/dri/i965/brw_shader.h | 2 ++
 src/mesa/drivers/dri/i965/brw_vec4.h   | 3 ---
 3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index d9150c3..63641af 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -281,8 +281,6 @@ public:
 
uint8_t regs_written; /**< Number of vgrfs written by a SEND message, or 1 */
bool eot:1;
-   bool header_present:1;
-   bool shadow_compare:1;
bool force_uncompressed:1;
bool force_sechalf:1;
bool pi_noperspective:1;   /**< Pixel interpolator noperspective flag */
diff --git a/src/mesa/drivers/dri/i965/brw_shader.h b/src/mesa/drivers/dri/i965/brw_shader.h
index 32460e2..cdf86ff 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -126,6 +126,8 @@ struct backend_instruction {
bool no_dd_clear:1;
bool no_dd_check:1;
bool saturate:1;
+   bool shadow_compare:1;
+   bool header_present:1;
 };
 
 #ifdef __cplusplus
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 750f491..758d752 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -215,10 +215,7 @@ public:
dst_reg dst;
src_reg src[3];
 
-   bool shadow_compare;
-
enum brw_urb_write_flags urb_write_flags;
-   bool header_present;
 
unsigned sol_binding; /**< gen6: SOL binding table index */
bool sol_final_write; /**< gen6: send commit message */
-- 
2.0.4


Re: [Mesa-dev] [PATCH 2/3] mesa/main: Add sse2 streaming clamping

2014-11-12 Thread Juha-Pekka Heikkila

On 12.11.2014 19:36, Bruno Jimenez wrote:
 On Wed, 2014-11-12 at 14:50 +0200, Juha-Pekka Heikkila wrote:
 Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com>
 ---
  src/mesa/Makefile.am  |   8 +++
  src/mesa/main/sse2_clamping.c | 138 
 ++
  src/mesa/main/sse2_clamping.h |  49 +++
  3 files changed, 195 insertions(+)
  create mode 100644 src/mesa/main/sse2_clamping.c
  create mode 100644 src/mesa/main/sse2_clamping.h

 diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
 index 932db4f..43dbe87 100644
 --- a/src/mesa/Makefile.am
 +++ b/src/mesa/Makefile.am
 @@ -111,6 +111,10 @@ if SSE41_SUPPORTED
  ARCH_LIBS += libmesa_sse41.la
  endif
  
 +if SSE2_SUPPORTED
 +ARCH_LIBS += libmesa_sse2.la
 +endif
 +
  MESA_ASM_FILES_FOR_ARCH =
  
  if HAVE_X86_ASM
 @@ -155,6 +159,10 @@ libmesa_sse41_la_SOURCES = \
  main/sse_minmax.c
  libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
  
 +libmesa_sse2_la_SOURCES = \
 +main/sse2_clamping.c
 +libmesa_sse2_la_CFLAGS = $(AM_CFLAGS) -msse2
 +
  pkgconfigdir = $(libdir)/pkgconfig
  pkgconfig_DATA = gl.pc
  
 diff --git a/src/mesa/main/sse2_clamping.c b/src/mesa/main/sse2_clamping.c
 new file mode 100644
 index 000..66c7dc7
 --- /dev/null
 +++ b/src/mesa/main/sse2_clamping.c
 @@ -0,0 +1,138 @@
 +/*
 + * Copyright © 2014 Intel Corporation
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the "Software"),
 + * to deal in the Software without restriction, including without limitation
 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the next
 + * paragraph) shall be included in all copies or substantial portions of the
 + * Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 + * IN THE SOFTWARE.
 + *
 + * Authors:
 + *    Juha-Pekka Heikkila <juhapekka.heikk...@gmail.com>
 + *
 + */
 +
 +#ifdef __SSE2__
 +#include "main/macros.h"
 +#include "main/sse2_clamping.h"
 +#include <emmintrin.h>
 +
 +/**
 + * Clamp four float values to [min,max]
 + */
 +static inline void
 +_mesa_clamp_float_rgba(GLfloat src[4], GLfloat result[4], const float min,
 +   const float max)
 +{
 +   __m128  operand, minval, maxval;
 +
 +   operand = _mm_loadu_ps(src);
 +   minval = _mm_set1_ps(min);
 +   maxval = _mm_set1_ps(max);
 +   operand = _mm_max_ps(operand, minval);
 +   operand = _mm_min_ps(operand, maxval);
 +   _mm_storeu_ps(result, operand);
 +}
 +
 +
 +/* Clamp n amount float rgba pixels to [min,max] using SSE2
 + */
 +__attribute__((optimize("unroll-loops")))
 +void
 +_mesa_streaming_clamp_float_rgba(const GLuint n, GLfloat rgba_src[][4],
 + GLfloat rgba_dst[][4], const GLfloat min,
 + const GLfloat max)
 +{
 +   int  c, prefetch_c;
 +   float*   worker = &rgba_src[0][0];
 +   __m128   operand[2], minval, maxval;
 +
 +   _mm_prefetch((char*) (((unsigned long)worker)|0x1f) + 65, _MM_HINT_T0);
^^^
 
 Hi,
 
 May I ask why precisely these numbers?

0x1f, as you note below, is a typo; it should be 0x0f. 65 is the cache line
length plus one, to even out the |0x1f operation.

 
 +
 +   minval = _mm_set1_ps(min);
 +   maxval = _mm_set1_ps(max);
 +
 +   for (c = n*4; c > 0 && (((unsigned long)worker)&0x1f) != 0; c--, worker++) {
 ^
 
 I guess that this is for alignment, but you only need to align to a 16-byte
 boundary, not 32. Or maybe I am missing something obvious.
 

You are correct, 0x1f is a typo. It should be 0x0f.
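
For reference, a sketch of the 16-byte alignment test with the corrected 0x0f mask. The helper name is hypothetical, and it casts through uintptr_t rather than the patch's unsigned long, a substitution made here because uintptr_t is the integer type meant to hold a pointer. It counts how many leading floats would be handled one at a time before the pointer reaches a 16-byte boundary, assuming 4-byte floats:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of leading floats to process with scalar
 * code before ptr is 16-byte aligned, never more than count. */
static size_t
sketch_scalar_prologue_len(const float *ptr, size_t count)
{
   size_t n = 0;

   /* The low four address bits are zero exactly on a 16-byte boundary. */
   while (n < count && (((uintptr_t)(ptr + n)) & 0x0f) != 0)
      n++;

   return n;
}
```

With a mask of 0x1f the same loop would keep going until a 32-byte boundary, which is more than the aligned SSE loads and stores need.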

 +  operand[0] = _mm_load_ss(worker);
 +  operand[0] = _mm_max_ss(operand[0], minval);
 +  operand[0] = _mm_min_ss(operand[0], maxval);
 +  _mm_store_ss(worker, operand[0]);
 +   }
 +
 +   while (c >= 8) {
 +  _mm_prefetch((char*) worker + 64, _MM_HINT_T0);
   ^^^
 +
 +  for (prefetch_c = 64/8; prefetch_c > 0 && c >= 8; prefetch_c--, c-=8,
   
 
 May I also ask why these numbers?
 

64 is the cache line length in bytes; 8 means this loop handles 8 floats in
one go: operand[0] gets 4 floats and the same goes for operand[1]. I found
interleaving this way to give more performance, adding

Re: [Mesa-dev] [PATCH 1/2] i965: Combine offset/texture_offset fields.

2014-11-12 Thread Jason Ekstrand

These too are
Reviewed-by: Jason Ekstrand <jason.ekstr...@intel.com>


Re: [Mesa-dev] [PATCH] i915g: we also have more than 0 viewports!

2014-11-12 Thread Jordan Justen

Reviewed-by: Jordan Justen <jordan.l.jus...@intel.com>


Re: [Mesa-dev] Mesa 10.4 release plan strawman

2014-11-12 Thread Emil Velikov

On 01/11/14 04:08, Emil Velikov wrote:
 On 22/10/14 22:14, Emil Velikov wrote:
 Hi all,

 I was wondering earlier how far are we until the 10.4 release and it
 hit me... there isn't much left. So in order to stick with the original
 three month release schedule here is my proposal.

 November 14th 2014 - Feature freeze/Release candidate 1
 November 21st 2014 - Release candidate 2
 November 28th 2014 - Release candidate 3
 December 5th 2014 - Release candidate 4/Mesa 10.4.0

 Hi all,
 
 Based on the silence over the last week, it seems that everyone is happy
 with the above plan :-)
 
 As a gentle reminder - I'm planning to branch (feature freeze) 10.4 in
 two weeks. If you have any work that has not landed yet, please try to
 get it in by the 14th.
 
Gents, we have a couple of days until the branch point.

If you have outstanding features that are not in yet and you need some
extra time, please give us a compelling reason why we should delay the
release. Otherwise things will proceed as planned :)

Cheers,
Emil


Re: [Mesa-dev] How difficult would it be to have debugging information for Jitted code show up?

2014-11-12 Thread Steven Stewart-Gallus

Okay,

| glxinfo
name of display: :0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4
server glx extensions:
GLX_ARB_create_context, GLX_ARB_create_context_profile, 
GLX_ARB_create_context_robustness, GLX_ARB_fbconfig_float, 
GLX_ARB_framebuffer_sRGB, GLX_ARB_multisample, 
GLX_EXT_create_context_es2_profile, GLX_EXT_framebuffer_sRGB, 
GLX_EXT_import_context, GLX_EXT_texture_from_pixmap, GLX_EXT_visual_info, 
GLX_EXT_visual_rating, GLX_INTEL_swap_event, GLX_MESA_copy_sub_buffer, 
GLX_OML_swap_method, GLX_SGIS_multisample, GLX_SGIX_fbconfig, 
GLX_SGIX_pbuffer, GLX_SGIX_visual_select_group, GLX_SGI_swap_control
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
client glx extensions:
GLX_ARB_create_context, GLX_ARB_create_context_profile, 
GLX_ARB_create_context_robustness, GLX_ARB_fbconfig_float, 
GLX_ARB_framebuffer_sRGB, GLX_ARB_get_proc_address, GLX_ARB_multisample, 
GLX_EXT_create_context_es2_profile, GLX_EXT_fbconfig_packed_float, 
GLX_EXT_framebuffer_sRGB, GLX_EXT_import_context, 
GLX_EXT_texture_from_pixmap, GLX_EXT_visual_info, GLX_EXT_visual_rating, 
GLX_INTEL_swap_event, GLX_MESA_copy_sub_buffer, 
GLX_MESA_multithread_makecurrent, GLX_MESA_query_renderer, 
GLX_MESA_swap_control, GLX_OML_swap_method, GLX_OML_sync_control, 
GLX_SGIS_multisample, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, 
GLX_SGIX_visual_select_group, GLX_SGI_make_current_read, 
GLX_SGI_swap_control, GLX_SGI_video_sync
GLX version: 1.4
GLX extensions:
GLX_ARB_create_context, GLX_ARB_create_context_profile, 
GLX_ARB_create_context_robustness, GLX_ARB_fbconfig_float, 
GLX_ARB_framebuffer_sRGB, GLX_ARB_get_proc_address, GLX_ARB_multisample, 
GLX_EXT_create_context_es2_profile, GLX_EXT_framebuffer_sRGB, 
GLX_EXT_import_context, GLX_EXT_texture_from_pixmap, GLX_EXT_visual_info, 
GLX_EXT_visual_rating, GLX_INTEL_swap_event, GLX_MESA_copy_sub_buffer, 
GLX_MESA_multithread_makecurrent, GLX_MESA_query_renderer, 
GLX_MESA_swap_control, GLX_OML_swap_method, GLX_OML_sync_control, 
GLX_SGIS_multisample, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, 
GLX_SGIX_visual_select_group, GLX_SGI_make_current_read, 
GLX_SGI_swap_control, GLX_SGI_video_sync
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Bay Trail 
OpenGL core profile version string: 3.3 (Core Profile) Mesa 10.1.3
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
GL_3DFX_texture_compression_FXT1, GL_AMD_conservative_depth, 
GL_AMD_draw_buffers_blend, GL_AMD_performance_monitor, 
GL_AMD_seamless_cubemap_per_texture, GL_AMD_shader_trinary_minmax, 
GL_AMD_vertex_shader_layer, GL_ANGLE_texture_compression_dxt3, 
GL_ANGLE_texture_compression_dxt5, GL_APPLE_object_purgeable, 
GL_ARB_ES2_compatibility, GL_ARB_ES3_compatibility, GL_ARB_base_instance, 
GL_ARB_blend_func_extended, GL_ARB_clear_buffer_object, 
GL_ARB_conservative_depth, GL_ARB_copy_buffer, GL_ARB_debug_output, 
GL_ARB_depth_buffer_float, GL_ARB_depth_clamp, GL_ARB_draw_buffers, 
GL_ARB_draw_buffers_blend, GL_ARB_draw_elements_base_vertex, 
GL_ARB_draw_indirect, GL_ARB_draw_instanced, 
GL_ARB_explicit_attrib_location, GL_ARB_fragment_coord_conventions, 
GL_ARB_fragment_shader, GL_ARB_framebuffer_object, 
GL_ARB_framebuffer_sRGB, GL_ARB_get_program_binary, 
GL_ARB_half_float_pixel, GL_ARB_half_float_vertex, 
GL_ARB_instanced_arrays, GL_ARB_internalformat_query, 
GL_ARB_invalidate_subdata, GL_ARB_map_buffer_alignment, 
GL_ARB_map_buffer_range, GL_ARB_multi_draw_indirect, 
GL_ARB_occlusion_query2, GL_ARB_pixel_buffer_object, GL_ARB_point_sprite, 
GL_ARB_provoking_vertex, GL_ARB_robustness, GL_ARB_sample_shading, 
GL_ARB_sampler_objects, GL_ARB_seamless_cube_map, 
GL_ARB_shader_atomic_counters, GL_ARB_shader_bit_encoding, 
GL_ARB_shader_objects, GL_ARB_shader_texture_lod, 
GL_ARB_shading_language_420pack, GL_ARB_shading_language_packing, 
GL_ARB_sync, GL_ARB_texture_buffer_object, 
GL_ARB_texture_buffer_object_rgb32, GL_ARB_texture_buffer_range, 
GL_ARB_texture_compression_rgtc, GL_ARB_texture_cube_map_array, 
GL_ARB_texture_float, GL_ARB_texture_gather, 
GL_ARB_texture_mirror_clamp_to_edge, GL_ARB_texture_multisample, 
GL_ARB_texture_non_power_of_two, GL_ARB_texture_query_levels, 
GL_ARB_texture_query_lod, GL_ARB_texture_rectangle, GL_ARB_texture_rg, 
GL_ARB_texture_rgb10_a2ui, GL_ARB_texture_storage, 
GL_ARB_texture_storage_multisample, GL_ARB_texture_swizzle, 
GL_ARB_timer_query, GL_ARB_transform_feedback2, 
GL_ARB_transform_feedback3, GL_ARB_transform_feedback_instanced, 
GL_ARB_uniform_buffer_object, GL_ARB_vertex_array_bgra,

[Mesa-dev] [Bug 86070] Host application crash on vmware fusion 7 in vmw_swc_flush

2014-11-12 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=86070

--- Comment #6 from Emil Velikov emil.l.veli...@gmail.com ---
(In reply to Sinclair Yeh from comment #4)
 Do you see this "vmw_ioctl_command error Invalid argument." in the terminal
 when you run mplay-bin?

Those messages appear when downstream and upstream modules are mixed.
Namely:
- Both the downstream and the upstream modules have the same name, so userspace
loads either one.
- Upstream modules require another module which is available both upstream and
downstream, in some cases via an alias.

One could fix this by stripping, or skipping the build of, the downstream module
when the kernel is new enough and has the equivalent module.
Please kindly forward this to the relevant team/person :)


Re: [Mesa-dev] How difficult would it be to have debugging information for Jitted code show up?

2014-11-12 Thread Kenneth Graunke

On Wednesday, November 12, 2014 10:09:00 PM Steven Stewart-Gallus wrote:
 OpenGL vendor string: Intel Open Source Technology Center
 OpenGL renderer string: Mesa DRI Intel(R) Bay Trail 
 OpenGL core profile version string: 3.3 (Core Profile) Mesa 10.1.3

You're using the classic Intel driver (i965_dri.so) on Baytrail.

If you want to be able to use tools like sysprof or perf, you just need to 
build Mesa with debugging symbols.

I recommend:

CFLAGS='-g -O2 -fno-omit-frame-pointer' \
CXXFLAGS='-g -O2 -fno-omit-frame-pointer' \
./autogen.sh --enable-gles1 --enable-gles2 --enable-glx-tls \
    --with-egl-platforms=x11,drm --with-gallium-drivers= \
    --with-dri-drivers=i965,swrast

This should give you actual function names instead of ?? inside of 
i965_dri.so.

--Ken


[Mesa-dev] [PATCH 02/13] r300: Drop the /* gap */ notes.

2014-11-12 Thread Eric Anholt

This switch statement's code structure isn't dependent on the numbers of
the opcodes at all.
---
 src/gallium/drivers/r300/r300_tgsi_to_rc.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/src/gallium/drivers/r300/r300_tgsi_to_rc.c 
b/src/gallium/drivers/r300/r300_tgsi_to_rc.c
index 4448f88..7ea9cd2 100644
--- a/src/gallium/drivers/r300/r300_tgsi_to_rc.c
+++ b/src/gallium/drivers/r300/r300_tgsi_to_rc.c
@@ -53,7 +53,6 @@ static unsigned translate_opcode(unsigned opcode)
 case TGSI_OPCODE_LRP: return RC_OPCODE_LRP;
 case TGSI_OPCODE_CND: return RC_OPCODE_CND;
  /* case TGSI_OPCODE_DP2A: return RC_OPCODE_DP2A; */
-/* gap */
 case TGSI_OPCODE_FRC: return RC_OPCODE_FRC;
 case TGSI_OPCODE_CLAMP: return RC_OPCODE_CLAMP;
 case TGSI_OPCODE_FLR: return RC_OPCODE_FLR;
@@ -62,7 +61,6 @@ static unsigned translate_opcode(unsigned opcode)
 case TGSI_OPCODE_LG2: return RC_OPCODE_LG2;
 case TGSI_OPCODE_POW: return RC_OPCODE_POW;
 case TGSI_OPCODE_XPD: return RC_OPCODE_XPD;
-/* gap */
 case TGSI_OPCODE_ABS: return RC_OPCODE_ABS;
  /* case TGSI_OPCODE_RCC: return RC_OPCODE_RCC; */
 case TGSI_OPCODE_DPH: return RC_OPCODE_DPH;
@@ -132,7 +130,6 @@ static unsigned translate_opcode(unsigned opcode)
  /* case TGSI_OPCODE_ENDLOOP2: return RC_OPCODE_ENDLOOP2; */
  /* case TGSI_OPCODE_ENDSUB: return RC_OPCODE_ENDSUB; */
 case TGSI_OPCODE_NOP: return RC_OPCODE_NOP;
-/* gap */
  /* case TGSI_OPCODE_NRM4: return RC_OPCODE_NRM4; */
  /* case TGSI_OPCODE_CALLNZ: return RC_OPCODE_CALLNZ; */
  /* case TGSI_OPCODE_BREAKC: return RC_OPCODE_BREAKC; */
-- 
2.1.1


[Mesa-dev] [PATCH 03/13] ilo: Drop the explicit initialization of gaps in TGSI opcodes.

2014-11-12 Thread Eric Anholt

The nice thing about the good way of initializing arrays like this is that
you don't need to initialize everything in order, or even everything at
all.  Taking advantage of that only needs a tiny fixup to deal with the
default NULL value of the pointers.

I haven't dropped the initialization of opcodes that exist and are unsupported.
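
A minimal sketch of the idea above, in C: designated initializers may appear in any order, entries that are not named are zero-initialized (NULL for function pointers), and a small lookup fixup maps NULL to the unsupported handler. The opcode values and the names translate_table, translate_simple, translate_unsupported and translate are hypothetical stand-ins for illustration, not the ilo code.

```c
#include <stddef.h>

/* Hypothetical opcode numbering with gaps; not the real TGSI values. */
enum { OP_MOV = 0, OP_ADD = 2, OP_MUL = 5, OP_LAST = 8 };

typedef int (*translate_fn)(int operand);

static int translate_simple(int operand) { return operand + 1; }
static int translate_unsupported(int operand) { (void)operand; return -1; }

/* Slots that are not named here are zero-initialized, i.e. NULL. */
static const translate_fn translate_table[OP_LAST] = {
   [OP_MUL] = translate_simple,      /* order does not matter */
   [OP_MOV] = translate_simple,
   [OP_ADD] = translate_unsupported, /* exists, explicitly unsupported */
};

/* The fixup: treat a NULL (gap) or out-of-range slot as unsupported. */
static int translate(unsigned opcode, int operand)
{
   translate_fn fn = opcode < OP_LAST ? translate_table[opcode] : NULL;

   if (!fn)
      fn = translate_unsupported;
   return fn(operand);
}
```

With this layout, renumbering or removing an opcode only touches the enum; the table needs no placeholder entries for the gaps.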
---
 src/gallium/drivers/ilo/shader/toy_tgsi.c | 28 ++--
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/src/gallium/drivers/ilo/shader/toy_tgsi.c 
b/src/gallium/drivers/ilo/shader/toy_tgsi.c
index 7c74bad..1ba0606 100644
--- a/src/gallium/drivers/ilo/shader/toy_tgsi.c
+++ b/src/gallium/drivers/ilo/shader/toy_tgsi.c
@@ -853,8 +853,6 @@ static const toy_tgsi_translate 
aos_translate_table[TGSI_OPCODE_LAST] = {
[TGSI_OPCODE_CND]  = aos_CND,
[TGSI_OPCODE_SQRT] = aos_simple,
[TGSI_OPCODE_DP2A] = aos_DP2A,
-   [22]   = aos_unsupported,
-   [23]   = aos_unsupported,
[TGSI_OPCODE_FRC]  = aos_simple,
[TGSI_OPCODE_CLAMP]= aos_CLAMP,
[TGSI_OPCODE_FLR]  = aos_simple,
@@ -863,7 +861,6 @@ static const toy_tgsi_translate 
aos_translate_table[TGSI_OPCODE_LAST] = {
[TGSI_OPCODE_LG2]  = aos_simple,
[TGSI_OPCODE_POW]  = aos_simple,
[TGSI_OPCODE_XPD]  = aos_XPD,
-   [32]   = aos_unsupported,
[TGSI_OPCODE_ABS]  = aos_simple,
[TGSI_OPCODE_RCC]  = aos_unsupported,
[TGSI_OPCODE_DPH]  = aos_simple,
@@ -907,11 +904,8 @@ static const toy_tgsi_translate 
aos_translate_table[TGSI_OPCODE_LAST] = {
[TGSI_OPCODE_BRK]  = aos_BRK,
[TGSI_OPCODE_IF]   = aos_simple,
[TGSI_OPCODE_UIF]  = aos_simple,
-   [76]   = aos_unsupported,
[TGSI_OPCODE_ELSE] = aos_simple,
[TGSI_OPCODE_ENDIF]= aos_simple,
-   [79]   = aos_unsupported,
-   [80]   = aos_unsupported,
[TGSI_OPCODE_PUSHA]= aos_unsupported,
[TGSI_OPCODE_POPA] = aos_unsupported,
[TGSI_OPCODE_CEIL] = aos_CEIL,
@@ -919,7 +913,6 @@ static const toy_tgsi_translate 
aos_translate_table[TGSI_OPCODE_LAST] = {
[TGSI_OPCODE_NOT]  = aos_simple,
[TGSI_OPCODE_TRUNC]= aos_simple,
[TGSI_OPCODE_SHL]  = aos_simple,
-   [88]   = aos_unsupported,
[TGSI_OPCODE_AND]  = aos_simple,
[TGSI_OPCODE_OR]   = aos_simple,
[TGSI_OPCODE_MOD]  = aos_simple,
@@ -935,9 +928,6 @@ static const toy_tgsi_translate 
aos_translate_table[TGSI_OPCODE_LAST] = {
[TGSI_OPCODE_ENDLOOP]  = aos_ENDLOOP,
[TGSI_OPCODE_ENDSUB]   = aos_unsupported,
[TGSI_OPCODE_TXQ_LZ]   = aos_tex,
-   [104]  = aos_unsupported,
-   [105]  = aos_unsupported,
-   [106]  = aos_unsupported,
[TGSI_OPCODE_NOP]  = aos_simple,
[TGSI_OPCODE_FSEQ] = aos_set_on_cond,
[TGSI_OPCODE_FSGE] = aos_set_on_cond,
@@ -948,7 +938,6 @@ static const toy_tgsi_translate 
aos_translate_table[TGSI_OPCODE_LAST] = {
[TGSI_OPCODE_BREAKC]   = aos_unsupported,
[TGSI_OPCODE_KILL_IF]  = aos_simple,
[TGSI_OPCODE_END]  = aos_simple,
-   [118]  = aos_unsupported,
[TGSI_OPCODE_F2I]  = aos_simple,
[TGSI_OPCODE_IDIV] = aos_simple,
[TGSI_OPCODE_IMAX] = aos_simple,
@@ -1469,8 +1458,6 @@ static const toy_tgsi_translate 
soa_translate_table[TGSI_OPCODE_LAST] = {
[TGSI_OPCODE_CND]  = soa_per_channel,
[TGSI_OPCODE_SQRT] = soa_scalar_replicate,
[TGSI_OPCODE_DP2A] = soa_dot_product,
-   [22]   = soa_unsupported,
-   [23]   = soa_unsupported,
[TGSI_OPCODE_FRC]  = soa_per_channel,
[TGSI_OPCODE_CLAMP]= soa_per_channel,
[TGSI_OPCODE_FLR]  = soa_per_channel,
@@ -1479,7 +1466,6 @@ static const toy_tgsi_translate 
soa_translate_table[TGSI_OPCODE_LAST] = {
[TGSI_OPCODE_LG2]  = soa_scalar_replicate,
[TGSI_OPCODE_POW]  = soa_scalar_replicate,
[TGSI_OPCODE_XPD]  = soa_XPD,
-   [32]   = soa_unsupported,
[TGSI_OPCODE_ABS]  = soa_per_channel,
[TGSI_OPCODE_RCC]  = soa_unsupported,
[TGSI_OPCODE_DPH]  = soa_dot_product,
@@ -1523,11 +1509,8 @@ static const toy_tgsi_translate 
soa_translate_table[TGSI_OPCODE_LAST] = {
[TGSI_OPCODE_BRK]  = soa_passthrough,
[TGSI_OPCODE_IF]   = soa_if,
[TGSI_OPCODE_UIF]  = soa_if,
-   [76]   = soa_unsupported,
[TGSI_OPCODE_ELSE] = soa_passthrough,
[TGSI_OPCODE_ENDIF]= soa_passthrough,
-   [79]   = soa_unsupported,
-   [80]   = soa_unsupported,
[TGSI_OPCODE_PUSHA]

[Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

2014-11-12 Thread Eric Anholt

This series removes a bunch of unused opcodes, mostly from TGSI.  It
doesn't go as far as we could possibly go -- while I welcome discussion
for future patch series deleting more, I hope that discussion doesn't
derail the review process for these changes.

I haven't messed with the subroutine stuff, since I don't know what people
are planning with that.  I also haven't messed with the pack/unpack
opcodes in TGSI, since they might be useful for some of the GLSL packing
stuff.

Testing status: compile-tested ilo/r600/softpipe, touch-tested softpipe.


[Mesa-dev] [PATCH 06/13] gallium: Drop the unused ARA and ARR opcodes.

2014-11-12 Thread Eric Anholt

Neither was generated anywhere in the tree.  Given that address registers
don't really map as a concept to most hardware these days, we're probably
unlikely to ever extend in the direction of using more address register
opcodes.
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.c|  1 -
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c | 24 -
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c| 11 
 src/gallium/auxiliary/tgsi/tgsi_exec.c | 18 -
 src/gallium/auxiliary/tgsi/tgsi_info.c |  6 ++---
 src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h   |  2 --
 src/gallium/auxiliary/tgsi/tgsi_util.c |  1 -
 src/gallium/docs/source/tgsi.rst   | 21 ---
 src/gallium/drivers/ilo/shader/toy_tgsi.c  |  5 
 src/gallium/drivers/r300/r300_tgsi_to_rc.c |  2 --
 src/gallium/drivers/r600/r600_shader.c | 31 +-
 src/gallium/include/pipe/p_shader_tokens.h |  3 +--
 12 files changed, 9 insertions(+), 116 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
index 4a9ce37..44a44a6 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
@@ -212,7 +212,6 @@ lp_build_tgsi_inst_llvm(
case TGSI_OPCODE_UP4B:
case TGSI_OPCODE_UP4UB:
case TGSI_OPCODE_X2D:
-   case TGSI_OPCODE_ARA:
case TGSI_OPCODE_BRA:
case TGSI_OPCODE_PUSHA:
case TGSI_OPCODE_POPA:
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
index 722aa9a..5daa028 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
@@ -96,18 +96,6 @@ add_emit(
 emit_data->args[0], emit_data->args[1], "");
 }
 
-/* TGSI_OPCODE_ARR */
-static void
-arr_emit(
-   const struct lp_build_tgsi_action * action,
-   struct lp_build_tgsi_context * bld_base,
-   struct lp_build_emit_data * emit_data)
-{
-   LLVMValueRef tmp = lp_build_emit_llvm_unary(bld_base, TGSI_OPCODE_ROUND, 
emit_data->args[0]);
-   emit_data->output[emit_data->chan] = 
LLVMBuildFPToSI(bld_base->base.gallivm->builder, tmp,
-   
bld_base->uint_bld.vec_type, "");
-}
-
 /* TGSI_OPCODE_CLAMP */
 static void
 clamp_emit(
@@ -948,7 +936,6 @@ lp_set_default_actions(struct lp_build_tgsi_context * 
bld_base)
bld_base->op_actions[TGSI_OPCODE_LG2].fetch_args = scalar_unary_fetch_args;
 
bld_base->op_actions[TGSI_OPCODE_ADD].emit = add_emit;
-   bld_base->op_actions[TGSI_OPCODE_ARR].emit = arr_emit;
bld_base->op_actions[TGSI_OPCODE_CLAMP].emit = clamp_emit;
bld_base->op_actions[TGSI_OPCODE_END].emit = end_emit;
bld_base->op_actions[TGSI_OPCODE_FRC].emit = frc_emit;
@@ -1028,16 +1015,6 @@ arl_emit_cpu(

bld_base->uint_bld.vec_type, "");
 }
 
-/* TGSI_OPCODE_ARR (CPU Only) */
-static void
-arr_emit_cpu(
-   const struct lp_build_tgsi_action * action,
-   struct lp_build_tgsi_context * bld_base,
-   struct lp_build_emit_data * emit_data)
-{
-   emit_data->output[emit_data->chan] = lp_build_iround(&bld_base->base, 
emit_data->args[0]);
-}
-
 /* TGSI_OPCODE_CEIL (CPU Only) */
 static void
 ceil_emit_cpu(
@@ -1843,7 +1820,6 @@ lp_set_default_actions_cpu(
bld_base->op_actions[TGSI_OPCODE_ADD].emit = add_emit_cpu;
bld_base->op_actions[TGSI_OPCODE_AND].emit = and_emit_cpu;
bld_base->op_actions[TGSI_OPCODE_ARL].emit = arl_emit_cpu;
-   bld_base->op_actions[TGSI_OPCODE_ARR].emit = arr_emit_cpu;
bld_base->op_actions[TGSI_OPCODE_CEIL].emit = ceil_emit_cpu;
bld_base->op_actions[TGSI_OPCODE_CND].emit = cnd_emit_cpu;
bld_base->op_actions[TGSI_OPCODE_COS].emit = cos_emit_cpu;
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
index 3b9833a..4af96c1 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
@@ -798,17 +798,6 @@ lp_emit_instruction_aos(
   return FALSE;
   break;
 
-   case TGSI_OPCODE_ARA:
-  /* deprecated */
-  assert(0);
-  return FALSE;
-  break;
-
-   case TGSI_OPCODE_ARR:
-  src0 = lp_build_emit_fetch(&bld->bld_base, inst, 0, LP_CHAN_ALL);
-  dst0 = lp_build_round(&bld->bld_base.base, src0);
-  break;
-
case TGSI_OPCODE_BRA:
   /* deprecated */
   assert(0);
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index b3ea82f..5b9d820 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -93,16 +93,6 @@ micro_arl(union tgsi_exec_channel *dst,
 }
 
 static void
-micro_arr(union tgsi_exec_channel *dst,
-  const union tgsi_exec_channel *src)
-{
-   dst->i[0] = (int)floorf(src->f[0] + 0.5f);
-   dst->i[1] =

[Mesa-dev] [PATCH 04/13] gallium: Drop the NRM and NRM4 opcodes.

2014-11-12 Thread Eric Anholt

They weren't generated in tree, and as far as I know all hardware had to
lower it to a DP, RSQ, MUL.
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c |  5 --
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 95 -
 src/gallium/auxiliary/tgsi/tgsi_exec.c  | 72 ---
 src/gallium/auxiliary/tgsi/tgsi_info.c  |  4 +-
 src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h|  2 -
 src/gallium/docs/source/tgsi.rst| 34 -
 src/gallium/drivers/ilo/shader/toy_tgsi.c   | 89 ---
 src/gallium/drivers/r300/r300_tgsi_to_rc.c  |  2 -
 src/gallium/drivers/r600/r600_shader.c  | 12 ++--
 src/gallium/drivers/svga/svga_tgsi_insn.c   | 38 --
 src/gallium/include/pipe/p_shader_tokens.h  |  4 +-
 11 files changed, 10 insertions(+), 347 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
index f2fc7b0..7829a7e 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
@@ -852,11 +852,6 @@ lp_emit_instruction_aos(
   dst0 = emit_tex(bld, inst, LP_BLD_TEX_MODIFIER_LOD_BIAS);
   break;
 
-   case TGSI_OPCODE_NRM:
-  /* fall-through */
-   case TGSI_OPCODE_NRM4:
-  return FALSE;
-
case TGSI_OPCODE_DIV:
   assert(0);
   return FALSE;
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index 05618bc..76b9d69 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -3507,99 +3507,6 @@ cont_emit(
lp_exec_continue(&bld->exec_mask);
 }
 
-/* XXX: Refactor and move it to lp_bld_tgsi_action.c
- *
- * XXX: What do the comments about xmm registers mean?  Maybe they are left 
over
- * from old code, but there is no garauntee that LLVM will use those registers
- * for this code.
- *
- * XXX: There should be no calls to lp_build_emit_fetch in this function.  This
- * should be handled by the emit_data->fetch_args function. */
-static void
-nrm_emit(
-   const struct lp_build_tgsi_action * action,
-   struct lp_build_tgsi_context * bld_base,
-   struct lp_build_emit_data * emit_data)
-{
-   LLVMValueRef tmp0, tmp1;
-   LLVMValueRef tmp4 = NULL;
-   LLVMValueRef tmp5 = NULL;
-   LLVMValueRef tmp6 = NULL;
-   LLVMValueRef tmp7 = NULL;
-   struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
-
-   uint dims = (emit_data->inst->Instruction.Opcode == TGSI_OPCODE_NRM) ? 3 : 
4;
-
-  if (TGSI_IS_DST0_CHANNEL_ENABLED(emit_data->inst, TGSI_CHAN_X) ||
-  TGSI_IS_DST0_CHANNEL_ENABLED(emit_data->inst, TGSI_CHAN_Y) ||
-  TGSI_IS_DST0_CHANNEL_ENABLED(emit_data->inst, TGSI_CHAN_Z) ||
-  (TGSI_IS_DST0_CHANNEL_ENABLED(emit_data->inst, TGSI_CHAN_W) && dims == 
4)) {
-
-  /* NOTE: Cannot use xmm regs 2/3 here (see emit_rsqrt() above). */
-
-  /* xmm4 = src.x */
-  /* xmm0 = src.x * src.x */
-  tmp0 = lp_build_emit_fetch(&bld->bld_base, emit_data->inst, 0, 
TGSI_CHAN_X);
-  if (TGSI_IS_DST0_CHANNEL_ENABLED(emit_data->inst, TGSI_CHAN_X)) {
- tmp4 = tmp0;
-  }
-  tmp0 = lp_build_mul( &bld->bld_base.base, tmp0, tmp0);
-
-  /* xmm5 = src.y */
-  /* xmm0 = xmm0 + src.y * src.y */
-  tmp1 = lp_build_emit_fetch(&bld->bld_base, emit_data->inst, 0, 
TGSI_CHAN_Y);
-  if (TGSI_IS_DST0_CHANNEL_ENABLED(emit_data->inst, TGSI_CHAN_Y)) {
- tmp5 = tmp1;
-  }
-  tmp1 = lp_build_mul( &bld->bld_base.base, tmp1, tmp1);
-  tmp0 = lp_build_add( &bld->bld_base.base, tmp0, tmp1);
-
-  /* xmm6 = src.z */
-  /* xmm0 = xmm0 + src.z * src.z */
-  tmp1 = lp_build_emit_fetch(&bld->bld_base, emit_data->inst, 0, 
TGSI_CHAN_Z);
-  if (TGSI_IS_DST0_CHANNEL_ENABLED(emit_data->inst, TGSI_CHAN_Z)) {
- tmp6 = tmp1;
-  }
-  tmp1 = lp_build_mul( &bld->bld_base.base, tmp1, tmp1);
-  tmp0 = lp_build_add( &bld->bld_base.base, tmp0, tmp1);
-
-  if (dims == 4) {
- /* xmm7 = src.w */
- /* xmm0 = xmm0 + src.w * src.w */
- tmp1 = lp_build_emit_fetch(&bld->bld_base, emit_data->inst, 0, 
TGSI_CHAN_W);
- if (TGSI_IS_DST0_CHANNEL_ENABLED(emit_data->inst, TGSI_CHAN_W)) {
-tmp7 = tmp1;
- }
- tmp1 = lp_build_mul( &bld->bld_base.base, tmp1, tmp1);
- tmp0 = lp_build_add( &bld->bld_base.base, tmp0, tmp1);
-  }
-  /* xmm1 = 1 / sqrt(xmm0) */
-  tmp1 = lp_build_rsqrt( &bld->bld_base.base, tmp0);
-   /* dst.x = xmm1 * src.x */
-  if (TGSI_IS_DST0_CHANNEL_ENABLED(emit_data->inst, TGSI_CHAN_X)) {
- emit_data->output[TGSI_CHAN_X] = lp_build_mul( &bld->bld_base.base, 
tmp4, tmp1);
-  }
-  /* dst.y = xmm1 * src.y */
-  if (TGSI_IS_DST0_CHANNEL_ENABLED(emit_data->inst, TGSI_CHAN_Y)) {
- emit_data->output[TGSI_CHAN_Y] = lp_build_mul( &bld->bld_base.base, 
tmp5, tmp1);
-  }
-  }
-
-  /* dst.z = xmm1 * src.z */
-  if

[Mesa-dev] [PATCH 09/13] gallium: Drop the unused RFL opcode.

2014-11-12 Thread Eric Anholt

---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c  |  3 --
 src/gallium/auxiliary/tgsi/tgsi_exec.c   | 56 
 src/gallium/auxiliary/tgsi/tgsi_info.c   |  2 +-
 src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h |  1 -
 src/gallium/docs/source/tgsi.rst | 17 ---
 src/gallium/drivers/ilo/shader/toy_tgsi.c|  2 -
 src/gallium/drivers/nouveau/nv30/nvfx_fragprog.c | 13 --
 src/gallium/drivers/r300/r300_tgsi_to_rc.c   |  1 -
 src/gallium/drivers/r600/r600_shader.c   |  6 +--
 src/gallium/include/pipe/p_shader_tokens.h   |  2 +-
 10 files changed, 5 insertions(+), 98 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
index dff5d36..03591a3 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
@@ -706,9 +706,6 @@ lp_emit_instruction_aos(
case TGSI_OPCODE_PK4UB:
   return FALSE;
 
-   case TGSI_OPCODE_RFL:
-  return FALSE;
-
case TGSI_OPCODE_SEQ:
   src0 = lp_build_emit_fetch(&bld->bld_base, inst, 0, LP_CHAN_ALL);
   src1 = lp_build_emit_fetch(&bld->bld_base, inst, 1, LP_CHAN_ALL);
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index f937615..0a9642a 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -2762,58 +2762,6 @@ exec_scs(struct tgsi_exec_machine *mach,
 }
 
 static void
-exec_rfl(struct tgsi_exec_machine *mach,
- const struct tgsi_full_instruction *inst)
-{
-   union tgsi_exec_channel r[9];
-
-   if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_XYZ) {
-  /* r0 = dp3(src0, src0) */
-  fetch_source(mach, &r[2], &inst->Src[0], TGSI_CHAN_X, 
TGSI_EXEC_DATA_FLOAT);
-  micro_mul(&r[0], &r[2], &r[2]);
-  fetch_source(mach, &r[4], &inst->Src[0], TGSI_CHAN_Y, 
TGSI_EXEC_DATA_FLOAT);
-  micro_mul(&r[8], &r[4], &r[4]);
-  micro_add(&r[0], &r[0], &r[8]);
-  fetch_source(mach, &r[6], &inst->Src[0], TGSI_CHAN_Z, 
TGSI_EXEC_DATA_FLOAT);
-  micro_mul(&r[8], &r[6], &r[6]);
-  micro_add(&r[0], &r[0], &r[8]);
-
-  /* r1 = dp3(src0, src1) */
-  fetch_source(mach, &r[3], &inst->Src[1], TGSI_CHAN_X, 
TGSI_EXEC_DATA_FLOAT);
-  micro_mul(&r[1], &r[2], &r[3]);
-  fetch_source(mach, &r[5], &inst->Src[1], TGSI_CHAN_Y, 
TGSI_EXEC_DATA_FLOAT);
-  micro_mul(&r[8], &r[4], &r[5]);
-  micro_add(&r[1], &r[1], &r[8]);
-  fetch_source(mach, &r[7], &inst->Src[1], TGSI_CHAN_Z, 
TGSI_EXEC_DATA_FLOAT);
-  micro_mul(&r[8], &r[6], &r[7]);
-  micro_add(&r[1], &r[1], &r[8]);
-
-  /* r1 = 2 * r1 / r0 */
-  micro_add(&r[1], &r[1], &r[1]);
-  micro_div(&r[1], &r[1], &r[0]);
-
-  if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_X) {
- micro_mul(&r[2], &r[2], &r[1]);
- micro_sub(&r[2], &r[2], &r[3]);
- store_dest(mach, &r[2], &inst->Dst[0], inst, TGSI_CHAN_X, 
TGSI_EXEC_DATA_FLOAT);
-  }
-  if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_Y) {
- micro_mul(&r[4], &r[4], &r[1]);
- micro_sub(&r[4], &r[4], &r[5]);
- store_dest(mach, &r[4], &inst->Dst[0], inst, TGSI_CHAN_Y, 
TGSI_EXEC_DATA_FLOAT);
-  }
-  if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_Z) {
- micro_mul(&r[6], &r[6], &r[1]);
- micro_sub(&r[6], &r[6], &r[7]);
- store_dest(mach, &r[6], &inst->Dst[0], inst, TGSI_CHAN_Z, 
TGSI_EXEC_DATA_FLOAT);
-  }
-   }
-   if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_W) {
-  store_dest(mach, &OneVec, &inst->Dst[0], inst, TGSI_CHAN_W, 
TGSI_EXEC_DATA_FLOAT);
-   }
-}
-
-static void
 exec_xpd(struct tgsi_exec_machine *mach,
  const struct tgsi_full_instruction *inst)
 {
@@ -3756,10 +3704,6 @@ exec_instruction(
   assert (0);
   break;
 
-   case TGSI_OPCODE_RFL:
-  exec_rfl(mach, inst);
-  break;
-
case TGSI_OPCODE_SEQ:
   exec_vector_binary(mach, inst, micro_seq, TGSI_EXEC_DATA_FLOAT, 
TGSI_EXEC_DATA_FLOAT);
   break;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c 
b/src/gallium/auxiliary/tgsi/tgsi_info.c
index 0368fc3..efd92fb 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_info.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
@@ -81,7 +81,7 @@ static const struct tgsi_opcode_info 
opcode_info[TGSI_OPCODE_LAST] =
{ 1, 1, 0, 0, 0, 0, COMP, "PK2US", TGSI_OPCODE_PK2US },
{ 1, 1, 0, 0, 0, 0, COMP, "PK4B", TGSI_OPCODE_PK4B },
{ 1, 1, 0, 0, 0, 0, COMP, "PK4UB", TGSI_OPCODE_PK4UB },
-   { 1, 2, 0, 0, 0, 0, COMP, "RFL", TGSI_OPCODE_RFL },
+   { 0, 1, 0, 0, 0, 1, NONE, "", 44 },  /* removed */
{ 1, 2, 0, 0, 0, 0, COMP, "SEQ", TGSI_OPCODE_SEQ },
{ 1, 2, 0, 0, 0, 0, REPL, "SFL", TGSI_OPCODE_SFL },
{ 1, 2, 0, 0, 0, 0, COMP, "SGT", TGSI_OPCODE_SGT },
diff --git a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h 
b/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
index 83a51c7..2612752 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
+++

[Mesa-dev] [PATCH 12/13] mesa: Drop unused SFL/STR opcodes.

2014-11-12 Thread Eric Anholt

They're part of NV_vertex_program2, which I'm pretty sure we're never
going to support.
---
 src/mesa/program/prog_execute.c | 12 
 src/mesa/program/prog_instruction.c |  2 --
 src/mesa/program/prog_instruction.h |  2 --
 3 files changed, 16 deletions(-)

diff --git a/src/mesa/program/prog_execute.c b/src/mesa/program/prog_execute.c
index fcc9ed5..e59ae70 100644
--- a/src/mesa/program/prog_execute.c
+++ b/src/mesa/program/prog_execute.c
@@ -1279,12 +1279,6 @@ _mesa_execute_program(struct gl_context * ctx,
 }
  }
  break;
-  case OPCODE_SFL: /* set false, operands ignored */
- {
-static const GLfloat result[4] = { 0.0F, 0.0F, 0.0F, 0.0F };
-store_vector4(inst, machine, result);
- }
- break;
   case OPCODE_SGE: /* set on greater or equal */
  {
 GLfloat a[4], b[4], result[4];
@@ -1395,12 +1389,6 @@ _mesa_execute_program(struct gl_context * ctx,
 store_vector4(inst, machine, result);
  }
  break;
-  case OPCODE_STR: /* set true, operands ignored */
- {
-static const GLfloat result[4] = { 1.0F, 1.0F, 1.0F, 1.0F };
-store_vector4(inst, machine, result);
- }
- break;
   case OPCODE_SUB:
  {
 GLfloat a[4], b[4], result[4];
diff --git a/src/mesa/program/prog_instruction.c 
b/src/mesa/program/prog_instruction.c
index e2eadc3..abe663d 100644
--- a/src/mesa/program/prog_instruction.c
+++ b/src/mesa/program/prog_instruction.c
@@ -202,7 +202,6 @@ static const struct instruction_info InstInfo[MAX_OPCODE] = 
{
{ OPCODE_RSQ, "RSQ", 1, 1 },
{ OPCODE_SCS, "SCS", 1, 1 },
{ OPCODE_SEQ, "SEQ", 2, 1 },
-   { OPCODE_SFL, "SFL", 0, 1 },
{ OPCODE_SGE, "SGE", 2, 1 },
{ OPCODE_SGT, "SGT", 2, 1 },
{ OPCODE_SIN, "SIN", 1, 1 },
@@ -210,7 +209,6 @@ static const struct instruction_info InstInfo[MAX_OPCODE] = 
{
{ OPCODE_SLT, "SLT", 2, 1 },
{ OPCODE_SNE, "SNE", 2, 1 },
{ OPCODE_SSG, "SSG", 1, 1 },
-   { OPCODE_STR, "STR", 0, 1 },
{ OPCODE_SUB, "SUB", 2, 1 },
{ OPCODE_SWZ, "SWZ", 1, 1 },
{ OPCODE_TEX, "TEX", 1, 1 },
diff --git a/src/mesa/program/prog_instruction.h 
b/src/mesa/program/prog_instruction.h
index b9604e5..4cca975 100644
--- a/src/mesa/program/prog_instruction.h
+++ b/src/mesa/program/prog_instruction.h
@@ -198,7 +198,6 @@ typedef enum prog_opcode {
OPCODE_RSQ,   /*   XX   X   X X   */
OPCODE_SCS,   /*X X   */
OPCODE_SEQ,   /*2   X X   */
-   OPCODE_SFL,   /*2   X */
OPCODE_SGE,   /*   XX   X   X X   */
OPCODE_SGT,   /*2   X X   */
OPCODE_SIN,   /*X   2   X X   */
@@ -206,7 +205,6 @@ typedef enum prog_opcode {
OPCODE_SLT,   /*   XX   X   X X   */
OPCODE_SNE,   /*2   X X   */
OPCODE_SSG,   /*2 X   */
-   OPCODE_STR,   /*2   X */
OPCODE_SUB,   /*   XX   1.1 X X   */
OPCODE_SWZ,   /*   XX X   */
OPCODE_TEX,   /*X   3   X X   */
-- 
2.1.1


[Mesa-dev] [PATCH 13/13] mesa: Drop unused NV_fragment_program opcodes.

2014-11-12 Thread Eric Anholt

The extension itself was deleted 2 years ago.  There are still some
prog_instruction opcodes from NV_fp that exist because they're used by
ir_to_mesa.cpp, though.
---
 src/mesa/program/prog_execute.c | 144 
 src/mesa/program/prog_instruction.c |  10 ---
 src/mesa/program/prog_instruction.h |  10 ---
 src/mesa/program/program_lexer.l|  13 
 4 files changed, 177 deletions(-)

diff --git a/src/mesa/program/prog_execute.c b/src/mesa/program/prog_execute.c
index e59ae70..650c40f 100644
--- a/src/mesa/program/prog_execute.c
+++ b/src/mesa/program/prog_execute.c
@@ -1119,77 +1119,6 @@ _mesa_execute_program(struct gl_context * ctx,
  break;
   case OPCODE_NOP:
  break;
-  case OPCODE_PK2H:/* pack two 16-bit floats in one 32-bit float */
- {
-GLfloat a[4];
-GLuint result[4];
-GLhalfNV hx, hy;
-fetch_vector4(&inst->SrcReg[0], machine, a);
-hx = _mesa_float_to_half(a[0]);
-hy = _mesa_float_to_half(a[1]);
-result[0] =
-result[1] =
-result[2] =
-result[3] = hx | (hy << 16);
-store_vector4ui(inst, machine, result);
- }
- break;
-  case OPCODE_PK2US:   /* pack two GLushorts into one 32-bit float */
- {
-GLfloat a[4];
-GLuint result[4], usx, usy;
-fetch_vector4(&inst->SrcReg[0], machine, a);
-a[0] = CLAMP(a[0], 0.0F, 1.0F);
-a[1] = CLAMP(a[1], 0.0F, 1.0F);
-usx = F_TO_I(a[0] * 65535.0F);
-usy = F_TO_I(a[1] * 65535.0F);
-result[0] =
-result[1] =
-result[2] =
-result[3] = usx | (usy << 16);
-store_vector4ui(inst, machine, result);
- }
- break;
-  case OPCODE_PK4B:/* pack four GLbytes into one 32-bit float */
- {
-GLfloat a[4];
-GLuint result[4], ubx, uby, ubz, ubw;
-fetch_vector4(&inst->SrcReg[0], machine, a);
-a[0] = CLAMP(a[0], -128.0F / 127.0F, 1.0F);
-a[1] = CLAMP(a[1], -128.0F / 127.0F, 1.0F);
-a[2] = CLAMP(a[2], -128.0F / 127.0F, 1.0F);
-a[3] = CLAMP(a[3], -128.0F / 127.0F, 1.0F);
-ubx = F_TO_I(127.0F * a[0] + 128.0F);
-uby = F_TO_I(127.0F * a[1] + 128.0F);
-ubz = F_TO_I(127.0F * a[2] + 128.0F);
-ubw = F_TO_I(127.0F * a[3] + 128.0F);
-result[0] =
-result[1] =
-result[2] =
-result[3] = ubx | (uby << 8) | (ubz << 16) | (ubw << 24);
-store_vector4ui(inst, machine, result);
- }
- break;
-  case OPCODE_PK4UB:   /* pack four GLubytes into one 32-bit float */
- {
-GLfloat a[4];
-GLuint result[4], ubx, uby, ubz, ubw;
-fetch_vector4(&inst->SrcReg[0], machine, a);
-a[0] = CLAMP(a[0], 0.0F, 1.0F);
-a[1] = CLAMP(a[1], 0.0F, 1.0F);
-a[2] = CLAMP(a[2], 0.0F, 1.0F);
-a[3] = CLAMP(a[3], 0.0F, 1.0F);
-ubx = F_TO_I(255.0F * a[0]);
-uby = F_TO_I(255.0F * a[1]);
-ubz = F_TO_I(255.0F * a[2]);
-ubw = F_TO_I(255.0F * a[3]);
-result[0] =
-result[1] =
-result[2] =
-result[3] = ubx | (uby << 8) | (ubz << 16) | (ubw << 24);
-store_vector4ui(inst, machine, result);
- }
- break;
   case OPCODE_POW:
  {
 GLfloat a[4], b[4], result[4];
@@ -1224,20 +1153,6 @@ _mesa_execute_program(struct gl_context * ctx,
 pc = machine->CallStack[--machine->StackDepth] - 1;
  }
  break;
-  case OPCODE_RFL: /* reflection vector */
- {
-GLfloat axis[4], dir[4], result[4], tmpX, tmpW;
-fetch_vector4(&inst->SrcReg[0], machine, axis);
-fetch_vector4(&inst->SrcReg[1], machine, dir);
-tmpW = DOT3(axis, axis);
-tmpX = (2.0F * DOT3(axis, dir)) / tmpW;
-result[0] = tmpX * axis[0] - dir[0];
-result[1] = tmpX * axis[1] - dir[1];
-result[2] = tmpX * axis[2] - dir[2];
-/* result[3] is never written! XXX enforce in parser! */
-store_vector4(inst, machine, result);
- }
- break;
   case OPCODE_RSQ: /* 1 / sqrt() */
  {
 GLfloat a[4], result[4];
@@ -1562,52 +1477,6 @@ _mesa_execute_program(struct gl_context * ctx,
 store_vector4(inst, machine, result);
  }
  break;
-  case OPCODE_UP2H:/* unpack two 16-bit floats */
- {
-const GLuint raw = fetch_vector1ui(&inst->SrcReg[0], machine);
-GLfloat result[4];
-GLushort hx, hy;
-hx = raw & 0x;
-hy = raw >> 16;
-result[0] = result[2] = _mesa_half_to_float(hx);
-

[Mesa-dev] [PATCH 01/13] r600: Drop the /* gap */ notes.

2014-11-12 Thread Eric Anholt

These are obviously the gaps already, due to the bare numbers with
unsupported implementations.

This makes inserting new gaps less irritating.
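
A minimal sketch of why the placeholder rows stay even when the comments go, assuming the table is looked up by opcode number so that row i has to describe opcode i. Every name below (`op_entry`, the `OP_*` values, `emit_ok`, `emit_unsupported`, `table_is_dense`) is a hypothetical stand-in, not r600 code.

```c
#include <stddef.h>

/* Hypothetical opcode numbering with a hole at 2. */
enum { OP_ADD = 0, OP_MUL = 1, OP_FRC = 3, OP_COUNT = 4 };

struct op_entry {
   unsigned opcode;     /* must equal the row's index in the table */
   int (*emit)(void);   /* translation callback for this opcode */
};

static int emit_ok(void) { return 0; }
static int emit_unsupported(void) { return -1; }

static const struct op_entry op_table[OP_COUNT] = {
   {OP_ADD, emit_ok},
   {OP_MUL, emit_ok},
   {2,      emit_unsupported},   /* bare number: this row is the gap */
   {OP_FRC, emit_ok},
};

/* Returns 1 when row i describes opcode i for every i, 0 otherwise. */
static int table_is_dense(const struct op_entry *table, size_t count)
{
   size_t i;

   for (i = 0; i < count; i++) {
      if (table[i].opcode != i)
         return 0;
   }
   return 1;
}
```

The bare number in the third row already marks the hole, which is the point the commit message makes about the comments being redundant.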
---
 src/gallium/drivers/r600/r600_shader.c | 19 ---
 1 file changed, 19 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index aab4215..59d9a46 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -7156,7 +7156,6 @@ static struct r600_shader_tgsi_instruction 
r600_shader_tgsi_instruction[] = {
{TGSI_OPCODE_CND,   0, ALU_OP0_NOP, tgsi_unsupported},
{TGSI_OPCODE_SQRT,  0, ALU_OP1_SQRT_IEEE, 
tgsi_trans_srcx_replicate},
{TGSI_OPCODE_DP2A,  0, ALU_OP0_NOP, tgsi_unsupported},
-   /* gap */
{22,0, ALU_OP0_NOP, tgsi_unsupported},
{23,0, ALU_OP0_NOP, tgsi_unsupported},
{TGSI_OPCODE_FRC,   0, ALU_OP1_FRACT, tgsi_op2},
@@ -7167,7 +7166,6 @@ static struct r600_shader_tgsi_instruction 
r600_shader_tgsi_instruction[] = {
{TGSI_OPCODE_LG2,   0, ALU_OP1_LOG_IEEE, tgsi_trans_srcx_replicate},
{TGSI_OPCODE_POW,   0, ALU_OP0_NOP, tgsi_pow},
{TGSI_OPCODE_XPD,   0, ALU_OP0_NOP, tgsi_xpd},
-   /* gap */
{32,0, ALU_OP0_NOP, tgsi_unsupported},
{TGSI_OPCODE_ABS,   0, ALU_OP1_MOV, tgsi_op2},
{TGSI_OPCODE_RCC,   0, ALU_OP0_NOP, tgsi_unsupported},
@@ -7224,7 +7222,6 @@ static struct r600_shader_tgsi_instruction 
r600_shader_tgsi_instruction[] = {
{TGSI_OPCODE_NOT,   0, ALU_OP1_NOT_INT, tgsi_op2},
{TGSI_OPCODE_TRUNC, 0, ALU_OP1_TRUNC, tgsi_op2},
{TGSI_OPCODE_SHL,   0, ALU_OP2_LSHL_INT, tgsi_op2_trans},
-   /* gap */
{88,0, ALU_OP0_NOP, tgsi_unsupported},
{TGSI_OPCODE_AND,   0, ALU_OP2_AND_INT, tgsi_op2},
{TGSI_OPCODE_OR,0, ALU_OP2_OR_INT, tgsi_op2},
@@ -7241,7 +7238,6 @@ static struct r600_shader_tgsi_instruction 
r600_shader_tgsi_instruction[] = {
{TGSI_OPCODE_ENDLOOP,   0, ALU_OP0_NOP, tgsi_endloop},
{TGSI_OPCODE_ENDSUB,0, ALU_OP0_NOP, tgsi_unsupported},
{TGSI_OPCODE_TXQ_LZ,0, FETCH_OP_GET_TEXTURE_RESINFO, tgsi_tex},
-   /* gap */
{104,   0, ALU_OP0_NOP, tgsi_unsupported},
{105,   0, ALU_OP0_NOP, tgsi_unsupported},
{106,   0, ALU_OP0_NOP, tgsi_unsupported},
@@ -7252,12 +7248,10 @@ static struct r600_shader_tgsi_instruction 
r600_shader_tgsi_instruction[] = {
{TGSI_OPCODE_FSNE,  0, ALU_OP2_SETNE_DX10, tgsi_op2_swap},
{TGSI_OPCODE_NRM4,  0, ALU_OP0_NOP, tgsi_unsupported},
{TGSI_OPCODE_CALLNZ,0, ALU_OP0_NOP, tgsi_unsupported},
-   /* gap */
{114,   0, ALU_OP0_NOP, tgsi_unsupported},
{TGSI_OPCODE_BREAKC,0, ALU_OP0_NOP, tgsi_loop_breakc},
{TGSI_OPCODE_KILL_IF,   0, ALU_OP2_KILLGT, tgsi_kill},  /* conditional 
kill */
{TGSI_OPCODE_END,   0, ALU_OP0_NOP, tgsi_end},  /* aka HALT */
-   /* gap */
{118,   0, ALU_OP0_NOP, tgsi_unsupported},
{TGSI_OPCODE_F2I,   0, ALU_OP1_FLT_TO_INT, tgsi_op2_trans},
{TGSI_OPCODE_IDIV,  0, ALU_OP0_NOP, tgsi_idiv},
@@ -7361,7 +7355,6 @@ static struct r600_shader_tgsi_instruction 
eg_shader_tgsi_instruction[] = {
{TGSI_OPCODE_CND,   0, ALU_OP0_NOP, tgsi_unsupported},
{TGSI_OPCODE_SQRT,  0, ALU_OP1_SQRT_IEEE, 
tgsi_trans_srcx_replicate},
{TGSI_OPCODE_DP2A,  0, ALU_OP0_NOP, tgsi_unsupported},
-   /* gap */
{22,0, ALU_OP0_NOP, tgsi_unsupported},
{23,0, ALU_OP0_NOP, tgsi_unsupported},
{TGSI_OPCODE_FRC,   0, ALU_OP1_FRACT, tgsi_op2},
@@ -7372,7 +7365,6 @@ static struct r600_shader_tgsi_instruction 
eg_shader_tgsi_instruction[] = {
{TGSI_OPCODE_LG2,   0, ALU_OP1_LOG_IEEE, tgsi_trans_srcx_replicate},
{TGSI_OPCODE_POW,   0, ALU_OP0_NOP, tgsi_pow},
{TGSI_OPCODE_XPD,   0, ALU_OP0_NOP, tgsi_xpd},
-   /* gap */
{32,0, ALU_OP0_NOP, tgsi_unsupported},
{TGSI_OPCODE_ABS,   0, ALU_OP1_MOV, tgsi_op2},
{TGSI_OPCODE_RCC,   0, ALU_OP0_NOP, tgsi_unsupported},
@@ -7429,7 +7421,6 @@ static struct r600_shader_tgsi_instruction 
eg_shader_tgsi_instruction[] = {
{TGSI_OPCODE_NOT,   0, ALU_OP1_NOT_INT, tgsi_op2},
{TGSI_OPCODE_TRUNC, 0, ALU_OP1_TRUNC, tgsi_op2},
{TGSI_OPCODE_SHL,   0, ALU_OP2_LSHL_INT, tgsi_op2},
-   /* gap */
{88,0, ALU_OP0_NOP, tgsi_unsupported},
{TGSI_OPCODE_AND,   0, ALU_OP2_AND_INT, tgsi_op2},
{TGSI_OPCODE_OR,0, ALU_OP2_OR_INT, tgsi_op2},
@@ -7446,7 +7437,6 @@ static struct r600_shader_tgsi_instruction

[Mesa-dev] [PATCH 08/13] gallium: Drop unused X2D opcode.

2014-11-12 Thread Eric Anholt

Nothing in the tree generates it.
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.c |  1 -
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c |  6 
 src/gallium/auxiliary/tgsi/tgsi_exec.c  | 45 -
 src/gallium/auxiliary/tgsi/tgsi_info.c  |  2 +-
 src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h|  1 -
 src/gallium/docs/source/tgsi.rst| 17 --
 src/gallium/drivers/ilo/shader/toy_tgsi.c   |  2 --
 src/gallium/drivers/r300/r300_tgsi_to_rc.c  |  1 -
 src/gallium/drivers/r600/r600_shader.c  |  6 ++--
 src/gallium/include/pipe/p_shader_tokens.h  |  1 -
 10 files changed, 4 insertions(+), 78 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
index 44a44a6..c5d3679 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
@@ -211,7 +211,6 @@ lp_build_tgsi_inst_llvm(
case TGSI_OPCODE_UP2US:
case TGSI_OPCODE_UP4B:
case TGSI_OPCODE_UP4UB:
-   case TGSI_OPCODE_X2D:
case TGSI_OPCODE_BRA:
case TGSI_OPCODE_PUSHA:
case TGSI_OPCODE_POPA:
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
index 5b7993e..dff5d36 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
@@ -783,12 +783,6 @@ lp_emit_instruction_aos(
   return FALSE;
   break;
 
-   case TGSI_OPCODE_X2D:
-  /* deprecated? */
-  assert(0);
-  return FALSE;
-  break;
-
case TGSI_OPCODE_BRA:
   /* deprecated */
   assert(0);
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index 8b1a2fb..f937615 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -2762,47 +2762,6 @@ exec_scs(struct tgsi_exec_machine *mach,
 }
 
 static void
-exec_x2d(struct tgsi_exec_machine *mach,
- const struct tgsi_full_instruction *inst)
-{
-   union tgsi_exec_channel r[4];
-   union tgsi_exec_channel d[2];
-
-   fetch_source(mach, &r[0], &inst->Src[1], TGSI_CHAN_X, TGSI_EXEC_DATA_FLOAT);
-   fetch_source(mach, &r[1], &inst->Src[1], TGSI_CHAN_Y, TGSI_EXEC_DATA_FLOAT);
-   if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_XZ) {
-      fetch_source(mach, &r[2], &inst->Src[2], TGSI_CHAN_X, 
TGSI_EXEC_DATA_FLOAT);
-      micro_mul(&r[2], &r[2], &r[0]);
-      fetch_source(mach, &r[3], &inst->Src[2], TGSI_CHAN_Y, 
TGSI_EXEC_DATA_FLOAT);
-      micro_mul(&r[3], &r[3], &r[1]);
-      micro_add(&r[2], &r[2], &r[3]);
-      fetch_source(mach, &r[3], &inst->Src[0], TGSI_CHAN_X, 
TGSI_EXEC_DATA_FLOAT);
-      micro_add(&d[0], &r[2], &r[3]);
-   }
-   if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_YW) {
-      fetch_source(mach, &r[2], &inst->Src[2], TGSI_CHAN_Z, 
TGSI_EXEC_DATA_FLOAT);
-      micro_mul(&r[2], &r[2], &r[0]);
-      fetch_source(mach, &r[3], &inst->Src[2], TGSI_CHAN_W, 
TGSI_EXEC_DATA_FLOAT);
-      micro_mul(&r[3], &r[3], &r[1]);
-      micro_add(&r[2], &r[2], &r[3]);
-      fetch_source(mach, &r[3], &inst->Src[0], TGSI_CHAN_Y, 
TGSI_EXEC_DATA_FLOAT);
-      micro_add(&d[1], &r[2], &r[3]);
-   }
-   if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_X) {
-      store_dest(mach, &d[0], &inst->Dst[0], inst, TGSI_CHAN_X, 
TGSI_EXEC_DATA_FLOAT);
-   }
-   if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_Y) {
-      store_dest(mach, &d[1], &inst->Dst[0], inst, TGSI_CHAN_Y, 
TGSI_EXEC_DATA_FLOAT);
-   }
-   if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_Z) {
-      store_dest(mach, &d[0], &inst->Dst[0], inst, TGSI_CHAN_Z, 
TGSI_EXEC_DATA_FLOAT);
-   }
-   if (inst->Dst[0].Register.WriteMask & TGSI_WRITEMASK_W) {
-      store_dest(mach, &d[1], &inst->Dst[0], inst, TGSI_CHAN_W, 
TGSI_EXEC_DATA_FLOAT);
-   }
-}
-
-static void
 exec_rfl(struct tgsi_exec_machine *mach,
  const struct tgsi_full_instruction *inst)
 {
@@ -3882,10 +3841,6 @@ exec_instruction(
   assert (0);
   break;
 
-   case TGSI_OPCODE_X2D:
-  exec_x2d(mach, inst);
-  break;
-
case TGSI_OPCODE_BRA:
   assert (0);
   break;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c 
b/src/gallium/auxiliary/tgsi/tgsi_info.c
index 84e5ea0..0368fc3 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_info.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
@@ -96,7 +96,7 @@ static const struct tgsi_opcode_info 
opcode_info[TGSI_OPCODE_LAST] =
{ 1, 1, 0, 0, 0, 0, COMP, "UP2US", TGSI_OPCODE_UP2US },
{ 1, 1, 0, 0, 0, 0, COMP, "UP4B", TGSI_OPCODE_UP4B },
{ 1, 1, 0, 0, 0, 0, COMP, "UP4UB", TGSI_OPCODE_UP4UB },
-   { 1, 3, 0, 0, 0, 0, COMP, "X2D", TGSI_OPCODE_X2D },
+   { 0, 1, 0, 0, 0, 1, NONE, "", 59 },  /* removed */
{ 0, 1, 0, 0, 0, 1, NONE, "", 60 },  /* removed */
{ 0, 1, 0, 0, 0, 1, NONE, "", 61 },  /* removed */
{ 0, 1, 0, 0, 0, 0, NONE, "BRA", TGSI_OPCODE_BRA },
diff --git a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h 
b/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
index

[Mesa-dev] [PATCH 07/13] gallium: Drop the unused CND opcode.

2014-11-12 Thread Eric Anholt

Nothing in the tree generated it.
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c | 19 ---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c|  9 -
 src/gallium/auxiliary/tgsi/tgsi_exec.c | 16 
 src/gallium/auxiliary/tgsi/tgsi_info.c |  2 +-
 src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h   |  1 -
 src/gallium/auxiliary/tgsi/tgsi_util.c |  1 -
 src/gallium/docs/source/tgsi.rst   | 13 -
 src/gallium/drivers/ilo/shader/toy_tgsi.c  | 17 -
 src/gallium/drivers/r300/r300_tgsi_to_rc.c |  1 -
 src/gallium/drivers/r600/r600_shader.c |  6 +++---
 src/gallium/include/pipe/p_shader_tokens.h |  2 +-
 11 files changed, 5 insertions(+), 82 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
index 5daa028..b4d63ed 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
@@ -1058,24 +1058,6 @@ ucmp_emit_cpu(
   cond, emit_data->args[1], emit_data->args[2]);
 }
 
-
-/* TGSI_OPCODE_CND (CPU Only) */
-static void
-cnd_emit_cpu(
-   const struct lp_build_tgsi_action * action,
-   struct lp_build_tgsi_context * bld_base,
-   struct lp_build_emit_data * emit_data)
-{
-   LLVMValueRef half, tmp;
-   half = lp_build_const_vec(bld_base->base.gallivm, bld_base->base.type, 0.5);
-   tmp = lp_build_cmp(&bld_base->base, PIPE_FUNC_GREATER,
-  emit_data->args[2], half);
-   emit_data->output[emit_data->chan] = lp_build_select(&bld_base->base,
-  tmp,
-  emit_data->args[0],
-  emit_data->args[1]);
-}
-
 /* TGSI_OPCODE_COS (CPU Only) */
 static void
 cos_emit_cpu(
@@ -1821,7 +1803,6 @@ lp_set_default_actions_cpu(
bld_base->op_actions[TGSI_OPCODE_AND].emit = and_emit_cpu;
bld_base->op_actions[TGSI_OPCODE_ARL].emit = arl_emit_cpu;
bld_base->op_actions[TGSI_OPCODE_CEIL].emit = ceil_emit_cpu;
-   bld_base->op_actions[TGSI_OPCODE_CND].emit = cnd_emit_cpu;
bld_base->op_actions[TGSI_OPCODE_COS].emit = cos_emit_cpu;
bld_base->op_actions[TGSI_OPCODE_CMP].emit = cmp_emit_cpu;
bld_base->op_actions[TGSI_OPCODE_DIV].emit = div_emit_cpu;
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
index 4af96c1..5b7993e 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
@@ -620,15 +620,6 @@ lp_emit_instruction_aos(
   dst0 = lp_build_add(&bld->bld_base.base, tmp0, src2);
   break;
 
-   case TGSI_OPCODE_CND:
-  src0 = lp_build_emit_fetch(&bld->bld_base, inst, 0, LP_CHAN_ALL);
-  src1 = lp_build_emit_fetch(&bld->bld_base, inst, 1, LP_CHAN_ALL);
-  src2 = lp_build_emit_fetch(&bld->bld_base, inst, 2, LP_CHAN_ALL);
-  tmp1 = lp_build_const_vec(bld->bld_base.base.gallivm, 
bld->bld_base.base.type, 0.5);
-  tmp0 = lp_build_cmp(&bld->bld_base.base, PIPE_FUNC_GREATER, src2, tmp1);
-  dst0 = lp_build_select(&bld->bld_base.base, tmp0, src0, src1);
-  break;
-
case TGSI_OPCODE_DP2A:
   return FALSE;
 
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index 5b9d820..8b1a2fb 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -127,18 +127,6 @@ micro_cmp(union tgsi_exec_channel *dst,
 }
 
 static void
-micro_cnd(union tgsi_exec_channel *dst,
-  const union tgsi_exec_channel *src0,
-  const union tgsi_exec_channel *src1,
-  const union tgsi_exec_channel *src2)
-{
-   dst->f[0] = src2->f[0] > 0.5f ? src0->f[0] : src1->f[0];
-   dst->f[1] = src2->f[1] > 0.5f ? src0->f[1] : src1->f[1];
-   dst->f[2] = src2->f[2] > 0.5f ? src0->f[2] : src1->f[2];
-   dst->f[3] = src2->f[3] > 0.5f ? src0->f[3] : src1->f[3];
-}
-
-static void
 micro_cos(union tgsi_exec_channel *dst,
   const union tgsi_exec_channel *src)
 {
@@ -3725,10 +3713,6 @@ exec_instruction(
   exec_vector_trinary(mach, inst, micro_lrp, TGSI_EXEC_DATA_FLOAT, 
TGSI_EXEC_DATA_FLOAT);
   break;
 
-   case TGSI_OPCODE_CND:
-  exec_vector_trinary(mach, inst, micro_cnd, TGSI_EXEC_DATA_FLOAT, 
TGSI_EXEC_DATA_FLOAT);
-  break;
-
case TGSI_OPCODE_SQRT:
   exec_scalar_unary(mach, inst, micro_sqrt, TGSI_EXEC_DATA_FLOAT, 
TGSI_EXEC_DATA_FLOAT);
   break;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c 
b/src/gallium/auxiliary/tgsi/tgsi_info.c
index efdebbd..84e5ea0 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_info.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
@@ -56,7 +56,7 @@ static const struct tgsi_opcode_info 
opcode_info[TGSI_OPCODE_LAST] =
{ 1, 3, 0, 0, 0, 0, COMP, "MAD", TGSI_OPCODE_MAD },
{ 1, 2, 0, 0, 0, 0, COMP, "SUB", TGSI_OPCODE_SUB },
{ 1, 3, 0, 0, 0, 0, COMP,

[Mesa-dev] [PATCH 11/13] gallium: Drop unused BRA opcode.

2014-11-12 Thread Eric Anholt

Never generated, and implemented in only nvfx vertprog.
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.c  | 1 -
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c  | 6 --
 src/gallium/auxiliary/tgsi/tgsi_exec.c   | 4 
 src/gallium/auxiliary/tgsi/tgsi_info.c   | 2 +-
 src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h | 1 -
 src/gallium/docs/source/tgsi.rst | 9 -
 src/gallium/drivers/ilo/shader/toy_tgsi.c| 2 --
 src/gallium/drivers/nouveau/nv30/nvfx_fragprog.c | 7 ---
 src/gallium/drivers/nouveau/nv30/nvfx_vertprog.c | 1 -
 src/gallium/drivers/r300/r300_tgsi_to_rc.c   | 1 -
 src/gallium/drivers/r600/r600_shader.c   | 6 +++---
 src/gallium/include/pipe/p_shader_tokens.h   | 1 -
 12 files changed, 4 insertions(+), 37 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
index c5d3679..e391d8a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
@@ -211,7 +211,6 @@ lp_build_tgsi_inst_llvm(
case TGSI_OPCODE_UP2US:
case TGSI_OPCODE_UP4B:
case TGSI_OPCODE_UP4UB:
-   case TGSI_OPCODE_BRA:
case TGSI_OPCODE_PUSHA:
case TGSI_OPCODE_POPA:
case TGSI_OPCODE_SAD:
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
index bb6582b..02462f2 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
@@ -772,12 +772,6 @@ lp_emit_instruction_aos(
   return FALSE;
   break;
 
-   case TGSI_OPCODE_BRA:
-  /* deprecated */
-  assert(0);
-  return FALSE;
-  break;
-
case TGSI_OPCODE_CAL:
   return FALSE;
 
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index 65259cd..905e85b 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -3738,10 +3738,6 @@ exec_instruction(
   assert (0);
   break;
 
-   case TGSI_OPCODE_BRA:
-  assert (0);
-  break;
-
case TGSI_OPCODE_CAL:
   /* skip the call if no execution channels are enabled */
   if (mach->ExecMask) {
diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c 
b/src/gallium/auxiliary/tgsi/tgsi_info.c
index 88dcc5d..193ecd1 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_info.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
@@ -99,7 +99,7 @@ static const struct tgsi_opcode_info 
opcode_info[TGSI_OPCODE_LAST] =
{ 0, 1, 0, 0, 0, 1, NONE, "", 59 },  /* removed */
{ 0, 1, 0, 0, 0, 1, NONE, "", 60 },  /* removed */
{ 0, 1, 0, 0, 0, 1, NONE, "", 61 },  /* removed */
-   { 0, 1, 0, 0, 0, 0, NONE, "BRA", TGSI_OPCODE_BRA },
+   { 0, 1, 0, 0, 0, 1, NONE, "", 62 },  /* removed */
{ 0, 0, 0, 1, 0, 0, NONE, "CAL", TGSI_OPCODE_CAL },
{ 0, 0, 0, 0, 0, 0, NONE, "RET", TGSI_OPCODE_RET },
{ 1, 1, 0, 0, 0, 0, COMP, "SSG", TGSI_OPCODE_SSG },
diff --git a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h 
b/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
index d5c9ca4..207295a 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
@@ -107,7 +107,6 @@ OP11(UP2H)
 OP11(UP2US)
 OP11(UP4B)
 OP11(UP4UB)
-OP01(BRA)
 OP00_LBL(CAL)
 OP00(RET)
 OP11(SSG)
diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index 6978fa5..b96e583 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -841,15 +841,6 @@ This instruction replicates its result.
Considered for removal.
 
 
-.. opcode:: BRA - Branch
-
-  pc = target
-
-.. note::
-
-   Considered for removal.
-
-
 .. opcode:: CALLNZ - Subroutine Call If Not Zero
 
TBD
diff --git a/src/gallium/drivers/ilo/shader/toy_tgsi.c 
b/src/gallium/drivers/ilo/shader/toy_tgsi.c
index 3be19d9..7048504 100644
--- a/src/gallium/drivers/ilo/shader/toy_tgsi.c
+++ b/src/gallium/drivers/ilo/shader/toy_tgsi.c
@@ -811,7 +811,6 @@ static const toy_tgsi_translate 
aos_translate_table[TGSI_OPCODE_LAST] = {
[TGSI_OPCODE_UP2US]= aos_unsupported,
[TGSI_OPCODE_UP4B] = aos_unsupported,
[TGSI_OPCODE_UP4UB]= aos_unsupported,
-   [TGSI_OPCODE_BRA]  = aos_unsupported,
[TGSI_OPCODE_CAL]  = aos_unsupported,
[TGSI_OPCODE_RET]  = aos_unsupported,
[TGSI_OPCODE_SSG]  = aos_set_sign,
@@ -1354,7 +1353,6 @@ static const toy_tgsi_translate 
soa_translate_table[TGSI_OPCODE_LAST] = {
[TGSI_OPCODE_UP2US]= soa_unsupported,
[TGSI_OPCODE_UP4B] = soa_unsupported,
[TGSI_OPCODE_UP4UB]= soa_unsupported,
-   [TGSI_OPCODE_BRA]  = soa_unsupported,
[TGSI_OPCODE_CAL]  = soa_unsupported,
[TGSI_OPCODE_RET]  = soa_unsupported,
[TGSI_OPCODE_SSG]  = soa_per_channel,
diff --git a/src/gallium/drivers/nouveau/nv30/nvfx_fragprog.c

[Mesa-dev] [PATCH 05/13] gallium: Drop the unused RCC opcode.

2014-11-12 Thread Eric Anholt

Nothing in the tree generated it.
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.c |  1 -
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c |  5 -
 src/gallium/auxiliary/tgsi/tgsi_exec.c  | 20 
 src/gallium/auxiliary/tgsi/tgsi_info.c  |  2 +-
 src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h|  1 -
 src/gallium/auxiliary/tgsi/tgsi_util.c  |  1 -
 src/gallium/docs/source/tgsi.rst| 11 ---
 src/gallium/drivers/ilo/shader/toy_tgsi.c   |  2 --
 src/gallium/drivers/r300/r300_tgsi_to_rc.c  |  1 -
 src/gallium/drivers/r600/r600_shader.c  |  6 +++---
 src/gallium/include/pipe/p_shader_tokens.h  |  2 +-
 11 files changed, 5 insertions(+), 47 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
index 51cb54c..4a9ce37 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
@@ -207,7 +207,6 @@ lp_build_tgsi_inst_llvm(
/* Ignore deprecated instructions */
switch (inst->Instruction.Opcode) {
 
-   case TGSI_OPCODE_RCC:
case TGSI_OPCODE_UP2H:
case TGSI_OPCODE_UP2US:
case TGSI_OPCODE_UP4B:
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
index 7829a7e..3b9833a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c
@@ -679,11 +679,6 @@ lp_emit_instruction_aos(
case TGSI_OPCODE_XPD:
   return FALSE;
 
-   case TGSI_OPCODE_RCC:
-  /* deprecated? */
-  assert(0);
-  return FALSE;
-
case TGSI_OPCODE_DPH:
   return FALSE;
 
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index b9a4c7b..b3ea82f 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -911,22 +911,6 @@ micro_div(
 }
 
 static void
-micro_rcc(union tgsi_exec_channel *dst,
-  const union tgsi_exec_channel *src)
-{
-   uint i;
-
-   for (i = 0; i < 4; i++) {
-      float recip = 1.0f / src->f[i];
-
-      if (recip > 0.0f)
-         dst->f[i] = CLAMP(recip, 5.42101e-020f, 1.84467e+019f);
-      else
-         dst->f[i] = CLAMP(recip, -1.84467e+019f, -5.42101e-020f);
-   }
-}
-
-static void
 micro_lt(
union tgsi_exec_channel *dst,
const union tgsi_exec_channel *src0,
@@ -3799,10 +3783,6 @@ exec_instruction(
   exec_vector_unary(mach, inst, micro_abs, TGSI_EXEC_DATA_FLOAT, 
TGSI_EXEC_DATA_FLOAT);
   break;
 
-   case TGSI_OPCODE_RCC:
-  exec_scalar_unary(mach, inst, micro_rcc, TGSI_EXEC_DATA_FLOAT, 
TGSI_EXEC_DATA_FLOAT);
-  break;
-
case TGSI_OPCODE_DPH:
   exec_dph(mach, inst);
   break;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c 
b/src/gallium/auxiliary/tgsi/tgsi_info.c
index 6336304..d17426f 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_info.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
@@ -71,7 +71,7 @@ static const struct tgsi_opcode_info 
opcode_info[TGSI_OPCODE_LAST] =
{ 1, 2, 0, 0, 0, 0, COMP, "XPD", TGSI_OPCODE_XPD },
{ 0, 0, 0, 0, 0, 0, NONE, "", 32 },  /* removed */
{ 1, 1, 0, 0, 0, 0, COMP, "ABS", TGSI_OPCODE_ABS },
-   { 1, 1, 0, 0, 0, 0, REPL, "RCC", TGSI_OPCODE_RCC },
+   { 0, 0, 0, 0, 0, 0, NONE, "", 34 },  /* removed */
{ 1, 2, 0, 0, 0, 0, REPL, "DPH", TGSI_OPCODE_DPH },
{ 1, 1, 0, 0, 0, 0, REPL, "COS", TGSI_OPCODE_COS },
{ 1, 1, 0, 0, 0, 0, COMP, DDX, TGSI_OPCODE_DDX },
diff --git a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h 
b/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
index 8ec3af3..b121d32 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
@@ -87,7 +87,6 @@ OP11(LG2)
 OP12(POW)
 OP12(XPD)
 OP11(ABS)
-OP11(RCC)
 OP12(DPH)
 OP11(COS)
 OP11(DDX)
diff --git a/src/gallium/auxiliary/tgsi/tgsi_util.c 
b/src/gallium/auxiliary/tgsi/tgsi_util.c
index e1cba95..66cb167 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_util.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_util.c
@@ -253,7 +253,6 @@ tgsi_util_get_inst_usage_mask(const struct 
tgsi_full_instruction *inst,
 
case TGSI_OPCODE_EX2:
case TGSI_OPCODE_LG2:
-   case TGSI_OPCODE_RCC:
   read_mask = TGSI_WRITEMASK_X;
   break;
 
diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index 49de4ca..c912ec5 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -404,17 +404,6 @@ This instruction replicates its result.
   dst.w = |src.w|
 
 
-.. opcode:: RCC - Reciprocal Clamped
-
-This instruction replicates its result.
-
-XXX cleanup on aisle three
-
-.. math::
-
-  dst = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.84467e+019) : 
clamp(1 / src.x, -1.84467e+019, -5.42101e-020)
-
-
 .. opcode:: DPH - Homogeneous Dot Product
 
 This instruction replicates its result.
diff --git a/src/gallium/drivers/ilo/shader/toy_tgsi.c

Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

2014-11-12 Thread Ilia Mirkin

AFAIK at least some of these (NRM, ARR, probably others) were being used by
the d3d9 state tracker. Not sure what its status is, but I believe the hope
was to eventually get it into the tree.

On Wed, Nov 12, 2014 at 8:18 PM, Eric Anholt e...@anholt.net wrote:

> This series removes a bunch of unused opcodes, mostly from TGSI.  It
> doesn't go as far as we could possibly go -- while I welcome discussion
> for future patch series deleting more, I hope that discussion doesn't
> derail the review process for these changes.
>
> I haven't messed with the subroutine stuff, since I don't know what people
> are planning with that.  I also haven't messed with the pack/unpack
> opcodes in TGSI, since they might be useful for some of the GLSL packing
> stuff.
>
> Testing status: compile-tested ilo/r600/softpipe, touch-tested softpipe.



Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

2014-11-12 Thread Eric Anholt

Ilia Mirkin imir...@alum.mit.edu writes:

> AFAIK at least some of these (NRM, ARR, probably others) were being used by
> the d3d9 state tracker. Not sure what its status is, but I believe the hope
> was to eventually get it into the tree.

They've got code for lowering NRM and CND to sanity, and no use of ARR,
ARA, X2D, RFL, STR, SFL, or BRA.
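
For illustration only, a minimal sketch of what lowering CND can look like: the opcode is rewritten as a subtract feeding CMP. It assumes the per-channel semantics `CND: dst = src2 > 0.5 ? src0 : src1` (as in the removed `micro_cnd`) and `CMP: dst = src0 < 0 ? src1 : src2`. The function names are hypothetical and this is not the d3d9 state tracker's code.

```c
/* TGSI CMP, per channel: dst = src0 < 0 ? src1 : src2. */
static float op_cmp(float src0, float src1, float src2)
{
   return src0 < 0.0f ? src1 : src2;
}

/* CND, per channel, as the removed reference code computed it. */
static float op_cnd(float src0, float src1, float src2)
{
   return src2 > 0.5f ? src0 : src1;
}

/* CND lowered to SUB + CMP: src2 > 0.5 is the same test as 0.5 - src2 < 0. */
static float op_cnd_lowered(float src0, float src1, float src2)
{
   return op_cmp(0.5f - src2, src0, src1);
}
```

A driver that handles CMP then never needs to see CND, which is why dropping the opcode costs such a state tracker nothing.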



Re: [Mesa-dev] [PATCH] llvmpipe: Avoid deadlock when unloading opengl32.dll

2014-11-12 Thread Tom Stellard

On Fri, Nov 07, 2014 at 04:52:25PM +, jfons...@vmware.com wrote:
> From: José Fonseca jfons...@vmware.com
>

Hi Jose,

This patch is causing random segfaults with OpenCL programs on radeonsi.
I haven't been able to figure out exactly what is happening, so I was
hoping you could help.

I think the problem has something to do with the fact that when clover
probes the hardware for OpenCL devices, the pipe_loader creates an
llvmpipe screen, checks the value of PIPE_CAP_COMPUTE, and then destroys
the screen since PIPE_CAP_COMPUTE is 0.
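
A rough sketch of that probe sequence, to make the create/check/destroy cycle concrete. The types and functions here (`fake_screen`, `fake_get_param`, `fake_destroy`, `probe_compute`) are hypothetical stand-ins for the real screen interface, not clover or pipe_loader code.

```c
#include <stddef.h>

/* Hypothetical capability enum, standing in for PIPE_CAP_COMPUTE. */
enum fake_cap { FAKE_CAP_COMPUTE };

/* Hypothetical screen object. */
struct fake_screen {
   int supports_compute;   /* what the capability query will report */
   int destroyed;          /* set once the screen has been torn down */
};

static int fake_get_param(const struct fake_screen *screen, enum fake_cap cap)
{
   return cap == FAKE_CAP_COMPUTE ? screen->supports_compute : 0;
}

static void fake_destroy(struct fake_screen *screen)
{
   screen->destroyed = 1;
}

/* Keep the screen if it can do compute; otherwise destroy it and
 * return NULL so the caller moves on to the next device. */
static struct fake_screen *probe_compute(struct fake_screen *screen)
{
   if (!fake_get_param(screen, FAKE_CAP_COMPUTE)) {
      fake_destroy(screen);
      return NULL;
   }
   return screen;
}
```

If the probe really works this way, every OpenCL program start creates and immediately destroys a llvmpipe screen, so the screen teardown path runs even when llvmpipe is never used for rendering.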

The only way I can reproduce this bug is by running the piglit OpenCL
tests concurrently.  If it helps, here are the stack traces
from one of the core dumps I captured from a piglit run:

(gdb) thread 1
[Switching to thread 1 (Thread 0x7f6d53cdf700 (LWP 18653))]
#0  0x7f6d53e56d2d in ?? ()
(gdb) bt
#0  0x7f6d53e56d2d in ?? ()
#1  0x in ?? ()
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f6d5495f700 (LWP 18652))]
#0  0x7f6d5aacd44c in pthread_cond_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x7f6d5aacd44c in pthread_cond_wait () from /lib64/libpthread.so.0
#1  0x7f6d54c71dbb in mtx_init (mtx=0x7f6d54c71dbb <mtx_init+97>, type=0) at 
../../../../../include/c11/threads_posix.h:182
#2  0x7f6d54c72157 in radeon_set_fd_access 
(applier=0x61e828, owner=0x61e800, mutex=0x7f6d54c71dbb <mtx_init+97>, 
request=0, request_name=0x0, enable=238 '\356') at radeon_drm_winsys.c:70
#3  0x7f6d54c7ad30 in radeon_drm_cs_emit_ioctl (param=0x61e4f0) at 
radeon_drm_winsys.c:598
#4  0x7f6d54c71ce0 in cnd_wait (cond=0x61e4f0, mtx=0x7f6d54c7ad07 
<radeon_drm_cs_emit_ioctl+168>) at 
../../../../../include/c11/threads_posix.h:152
#5  0x7f6d5aac91da in start_thread () from /lib64/libpthread.so.0
#6  0x7f6d5afd5d7d in clone () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f6d5c20c740 (LWP 18649))]
#0  0x7f6d5afae73e in re_node_set_insert_last () from /lib64/libc.so.6
(gdb) bt
#0  0x7f6d5afae73e in re_node_set_insert_last () from /lib64/libc.so.6
#1  0x7f6d5afae7fe in register_state () from /lib64/libc.so.6
#2  0x7f6d5afb1d39 in re_acquire_state_context () from /lib64/libc.so.6
#3  0x7f6d5afbaa95 in re_compile_internal () from /lib64/libc.so.6
#4  0x7f6d5afbb603 in regcomp () from /lib64/libc.so.6
#5  0x00403e9b in regex_get_matches (src=0x63e6c0 "float", 
pattern=0x40b940 "^ulong|ulong2|ulong3|ulong4|ulong8|ulong16$", pmatch=0x0, 
size=0, cflags=4) at /home/tstellar/piglit/tests/cl/program/program-tester.c:476
#6  0x004040e2 in regex_match (src=0x63e6c0 "float", pattern=0x40b940 
"^ulong|ulong2|ulong3|ulong4|ulong8|ulong16$") at 
/home/tstellar/piglit/tests/cl/program/program-tester.c:532
#7  0x004059c6 in get_test_arg (src=0x63de70 "1 buffer float[7] 0.5 
-0.5 0.0 -0.0 nan -3.99 1.5", test=0x645710, arg_in=true) at 
/home/tstellar/piglit/tests/cl/program/program-tester.c:1016
#8  0x00406f4a in parse_config (config_str=0x63fe30 
"\n[config]\nname: Test float trunc built-in on CL 1.1\nclc_version_min: 
10\ndimensions: 1\n\n[test]\nname: trunc float1\nkernel_name: 
test_1_trunc_float\nglobal_size: 7 0 0\n\narg_out: 0 buffer float[7] 0.0 
-0.0"..., config=0x60e260 <config>) at 
/home/tstellar/piglit/tests/cl/program/program-tester.c:1410
#9  0x004074a7 in init (argc=2, argv=0x7fff46612d88, config=0x60e260 
<config>) at /home/tstellar/piglit/tests/cl/program/program-tester.c:1555
#10 0x7f6d5be0232c in piglit_cl_program_test_init (argc=2, 
argv=0x7fff46612d88, void_config=0x60e260 <config>) at 
/home/tstellar/piglit/tests/util/piglit-framework-cl-program.c:60
#11 0x7f6d5be00f33 in piglit_cl_framework_run (argc=2, argv=0x7fff46612d88) 
at /home/tstellar/piglit/tests/util/piglit-framework-cl.c:154
#12 0x00403535 in main (argc=2, argv=0x7fff46612d88) at 
/home/tstellar/piglit/tests/cl/program/program-tester.c:164


Thanks,
Tom

 On Windows, DllMain calls and thread creation/destruction are
 serialized, so when llvmpipe is destroyed from DllMain waiting for the
 rasterizer threads to finish will deadlock.
 
 So, instead of waiting for rasterizer threads to have finished, simply wait 
 for the
 rasterizer threads to notify they are just about to finish.
 
 Verified with this very simple program:
 
#include <windows.h>
int main() {
   HMODULE hModule = LoadLibraryA("opengl32.dll");
   FreeLibrary(hModule);
}
 
 Fixes https://bugs.freedesktop.org/show_bug.cgi?id=76252
 
 Cc: 10.2 10.3 mesa-sta...@lists.freedesktop.org
 ---
  src/gallium/drivers/llvmpipe/lp_rast.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)
 
 diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c 
 b/src/gallium/drivers/llvmpipe/lp_rast.c
 index a3420a2..6b54d43 100644
 --- a/src/gallium/drivers/llvmpipe/lp_rast.c
 +++ b/src/gallium/drivers/llvmpipe/lp_rast.c
 @@ -800,6 +800,8 @@ static PIPE_THREAD_ROUTINE( thread_function, init_data )

[Mesa-dev] [PATCH] i965: Always enable VF statistics

2014-11-12 Thread Ben Widawsky

Every other unit in the geometry pipeline automatically enables
statistics gathering. This part of the pipe has been controlled by the
DEBUG_STATS variable, but this is asymmetric. This dates back to the
original implementation, and I am not sure if there is a reason for it.

I need access to these stats to implement ARB_pipeline_statistics_query.

Eric wrote it, and Ken touched it last. Do you have any opposition?

Cc: Eric Anholt e...@anholt.net
Cc: Kenneth Graunke kenn...@whitecape.org
Signed-off-by: Ben Widawsky b...@bwidawsk.net
---
 src/mesa/drivers/dri/i965/brw_misc_state.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c 
b/src/mesa/drivers/dri/i965/brw_misc_state.c
index 99fcddc..2c40814 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -929,8 +929,7 @@ brw_upload_invariant_state(struct brw_context *brw)
const uint32_t _3DSTATE_VF_STATISTICS =
   is_965 ? GEN4_3DSTATE_VF_STATISTICS : GM45_3DSTATE_VF_STATISTICS;
BEGIN_BATCH(1);
-   OUT_BATCH(_3DSTATE_VF_STATISTICS << 16 |
-             (unlikely(INTEL_DEBUG & DEBUG_STATS) ? 1 : 0));
+   OUT_BATCH(_3DSTATE_VF_STATISTICS << 16 | 1);
ADVANCE_BATCH();
 }
 
-- 
2.1.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
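
For readers following the patch above: the changed line builds a one-dword command with the opcode in the upper 16 bits and the statistics-enable flag in bit 0. A minimal C++ sketch of that packing follows. The helper name `pack_vf_statistics` and the opcode value `0x1234` are hypothetical, chosen only for illustration; they are not taken from the i965 headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: opcode in the upper 16 bits, enable flag in bit 0.
// `<<` binds tighter than `|`, so `opcode << 16 | 1` means `(opcode << 16) | 1`.
constexpr uint32_t pack_vf_statistics(uint32_t opcode, bool enable)
{
   return (opcode << 16) | (enable ? 1u : 0u);
}

// Placeholder opcode for illustration only; the driver defines the real values.
constexpr uint32_t kExampleOpcode = 0x1234;

inline void example_usage()
{
   // With the patch applied, the flag is always set.
   assert(pack_vf_statistics(kExampleOpcode, true) == 0x12340001u);
   // Before the patch, the flag was only set when DEBUG_STATS was enabled.
   assert(pack_vf_statistics(kExampleOpcode, false) == 0x12340000u);
}
```

Under these assumptions, the patch amounts to always passing `true` for the flag.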

[Mesa-dev] [Bug 86195] Lightswork video editor segfaults

2014-11-12 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=86195

--- Comment #1 from Michel Dänzer mic...@daenzer.net ---
Please run the app with the environment variable R600_DEBUG=vs, capture its
stderr output to a file and attach that file here after the crash.

BTW, does setting the environment variable DRAW_USE_LLVM=0 avoid the problem?


Re: [Mesa-dev] [PATCH] i965: Always enable VF statistics

2014-11-12 Thread Kenneth Graunke

On Wednesday, November 12, 2014 06:54:31 PM Ben Widawsky wrote:
 Every other unit in the geometry pipeline automatically enables
 statistics gathering. This part of the pipe has been controlled by the
 DEBUG_STATS variable, but this is asymmetric. This dates back to the
 original implementation, and I am not sure if there is a reason for it.
 
 I need access to these stats to implement ARB_pipeline_statistics_query.
 
 Eric wrote it, and Ken touched it last. Do you have any opposition?
 
 Cc: Eric Anholt e...@anholt.net
 Cc: Kenneth Graunke kenn...@whitecape.org
 Signed-off-by: Ben Widawsky b...@bwidawsk.net
 ---
  src/mesa/drivers/dri/i965/brw_misc_state.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c 
b/src/mesa/drivers/dri/i965/brw_misc_state.c
 index 99fcddc..2c40814 100644
 --- a/src/mesa/drivers/dri/i965/brw_misc_state.c
 +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
 @@ -929,8 +929,7 @@ brw_upload_invariant_state(struct brw_context *brw)
 const uint32_t _3DSTATE_VF_STATISTICS =
is_965 ? GEN4_3DSTATE_VF_STATISTICS : GM45_3DSTATE_VF_STATISTICS;
 BEGIN_BATCH(1);
 -   OUT_BATCH(_3DSTATE_VF_STATISTICS << 16 |
 -             (unlikely(INTEL_DEBUG & DEBUG_STATS) ? 1 : 0));
 +   OUT_BATCH(_3DSTATE_VF_STATISTICS << 16 | 1);
 ADVANCE_BATCH();
  }

My only complaint about this patch is that it doesn't go far enough.  I'm 100%
for removing DEBUG_STATS completely.  I've never seen any performance penalty
for enabling statistics.  I think we should leave them on except when there's
some reason to turn them off (e.g. the brw->meta_in_progress flag in the
clipper, which prevents us from counting things like glClear).

Reviewed-by: Kenneth Graunke kenn...@whitecape.org


[Mesa-dev] [PATCH 2/4] i965/vec4: Use const references in emit() functions.

2014-11-12 Thread Kenneth Graunke

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_vec4.h   | 18 --
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 11 ++-
 2 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 3301dd8..ebbf882 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -399,16 +399,14 @@ public:
vec4_instruction *emit(vec4_instruction *inst);
 
vec4_instruction *emit(enum opcode opcode);
-
-   vec4_instruction *emit(enum opcode opcode, dst_reg dst);
-
-   vec4_instruction *emit(enum opcode opcode, dst_reg dst, src_reg src0);
-
-   vec4_instruction *emit(enum opcode opcode, dst_reg dst,
- src_reg src0, src_reg src1);
-
-   vec4_instruction *emit(enum opcode opcode, dst_reg dst,
- src_reg src0, src_reg src1, src_reg src2);
+   vec4_instruction *emit(enum opcode opcode, const dst_reg &dst);
+   vec4_instruction *emit(enum opcode opcode, const dst_reg &dst,
+                          const src_reg &src0);
+   vec4_instruction *emit(enum opcode opcode, const dst_reg &dst,
+                          const src_reg &src0, const src_reg &src1);
+   vec4_instruction *emit(enum opcode opcode, const dst_reg &dst,
+                          const src_reg &src0, const src_reg &src1,
+                          const src_reg &src2);
 
vec4_instruction *emit_before(bblock_t *block,
  vec4_instruction *inst,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index b46879b..a8ce498 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -79,8 +79,8 @@ vec4_visitor::emit_before(bblock_t *block, vec4_instruction 
*inst,
 }
 
 vec4_instruction *
-vec4_visitor::emit(enum opcode opcode, dst_reg dst,
-  src_reg src0, src_reg src1, src_reg src2)
+vec4_visitor::emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
+                   const src_reg &src1, const src_reg &src2)
 {
return emit(new(mem_ctx) vec4_instruction(this, opcode, dst,
 src0, src1, src2));
@@ -88,19 +88,20 @@ vec4_visitor::emit(enum opcode opcode, dst_reg dst,
 
 
 vec4_instruction *
-vec4_visitor::emit(enum opcode opcode, dst_reg dst, src_reg src0, src_reg src1)
+vec4_visitor::emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
+                   const src_reg &src1)
 {
return emit(new(mem_ctx) vec4_instruction(this, opcode, dst, src0, src1));
 }
 
 vec4_instruction *
-vec4_visitor::emit(enum opcode opcode, dst_reg dst, src_reg src0)
+vec4_visitor::emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0)
 {
return emit(new(mem_ctx) vec4_instruction(this, opcode, dst, src0));
 }
 
 vec4_instruction *
-vec4_visitor::emit(enum opcode opcode, dst_reg dst)
+vec4_visitor::emit(enum opcode opcode, const dst_reg &dst)
 {
return emit(new(mem_ctx) vec4_instruction(this, opcode, dst));
 }
-- 
2.1.3
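
As a rough illustration of what the switch to const references buys, the sketch below counts copy-constructor calls for a by-value parameter versus a const-reference parameter. `Reg`, `read_by_value`, and `read_by_const_ref` are hypothetical stand-ins, not Mesa types; the real `dst_reg` and `src_reg` classes are larger, and how much the change matters in practice depends on the compiler.

```cpp
#include <cassert>

// Hypothetical register type that counts how often it is copied.
struct Reg {
   static int copies;
   int nr = 0;

   Reg() = default;
   explicit Reg(int n) : nr(n) {}
   Reg(const Reg &other) : nr(other.nr) { ++copies; }
};
int Reg::copies = 0;

// Passing by value copies the argument at every call.
inline int read_by_value(Reg r) { return r.nr; }

// Passing by const reference binds to the caller's object without copying.
inline int read_by_const_ref(const Reg &r) { return r.nr; }

inline void example_usage()
{
   Reg r(7);
   Reg::copies = 0;

   assert(read_by_value(r) == 7);
   assert(Reg::copies == 1);      // one copy made for the by-value parameter

   assert(read_by_const_ref(r) == 7);
   assert(Reg::copies == 1);      // no additional copy for the reference
}
```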


[Mesa-dev] [PATCH 4/4] i965/vec4: Make src_reg immediate constructors explicit.

2014-11-12 Thread Kenneth Graunke

We did this for fs_reg a while back, and it's generally a good idea.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_vec4.h  |  6 +--
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 35 ---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp| 12 ++---
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 55 ---
 4 files changed, 55 insertions(+), 53 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 8abd166..3d2882d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -99,9 +99,9 @@ public:
 
src_reg(register_file file, int reg, const glsl_type *type);
src_reg();
-   src_reg(float f);
-   src_reg(uint32_t u);
-   src_reg(int32_t i);
+   explicit src_reg(float f);
+   explicit src_reg(uint32_t u);
+   explicit src_reg(int32_t i);
src_reg(struct brw_reg reg);
 
bool equals(const src_reg &r) const;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index db0e6cc..58c4df2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -150,7 +150,7 @@ vec4_gs_visitor::emit_prolog()
 */
this->current_annotation = "clear r0.2";
dst_reg r0(retype(brw_vec4_grf(0, 0), BRW_REGISTER_TYPE_UD));
-   vec4_instruction *inst = emit(GS_OPCODE_SET_DWORD_2, r0, 0u);
+   vec4_instruction *inst = emit(GS_OPCODE_SET_DWORD_2, r0, src_reg(0u));
inst->force_writemask_all = true;
 
/* Create a virtual register to hold the vertex count */
@@ -158,7 +158,7 @@ vec4_gs_visitor::emit_prolog()
 
/* Initialize the vertex_count register to 0 */
this->current_annotation = "initialize vertex_count";
-   inst = emit(MOV(dst_reg(this->vertex_count), 0u));
+   inst = emit(MOV(dst_reg(this->vertex_count), src_reg(0u)));
inst->force_writemask_all = true;
 
if (c->control_data_header_size_bits > 0) {
@@ -173,7 +173,7 @@ vec4_gs_visitor::emit_prolog()
*/
   if (c->control_data_header_size_bits <= 32) {
  this->current_annotation = "initialize control data bits";
- inst = emit(MOV(dst_reg(this->control_data_bits), 0u));
+ inst = emit(MOV(dst_reg(this->control_data_bits), src_reg(0u)));
  inst->force_writemask_all = true;
   }
}
@@ -262,7 +262,7 @@ vec4_gs_visitor::emit_urb_write_header(int mrf)
vec4_instruction *inst = emit(MOV(mrf_reg, r0));
inst->force_writemask_all = true;
emit(GS_OPCODE_SET_WRITE_OFFSET, mrf_reg, this->vertex_count,
-(uint32_t) c->prog_data.output_vertex_size_hwords);
+src_reg(uint32_t(c->prog_data.output_vertex_size_hwords)));
 }
 
 
@@ -349,7 +349,7 @@ vec4_gs_visitor::emit_control_data_bits()
/* If vertex_count is 0, then no control data bits have been accumulated
 * yet, so we should do nothing.
 */
-   emit(CMP(dst_null_d(), this->vertex_count, 0u, BRW_CONDITIONAL_NEQ));
+   emit(CMP(dst_null_d(), this->vertex_count, src_reg(0u), 
BRW_CONDITIONAL_NEQ));
emit(IF(BRW_PREDICATE_NORMAL));
{
   /* If we are using either channel masks or a per-slot offset, then we
@@ -366,11 +366,12 @@ vec4_gs_visitor::emit_control_data_bits()
   src_reg dword_index(this, glsl_type::uint_type);
   if (urb_write_flags) {
  src_reg prev_count(this, glsl_type::uint_type);
- emit(ADD(dst_reg(prev_count), this->vertex_count, 0xffffffffu));
+ emit(ADD(dst_reg(prev_count), this->vertex_count,
+  src_reg(0xffffffffu)));
  unsigned log2_bits_per_vertex =
 _mesa_fls(c->control_data_bits_per_vertex);
  emit(SHR(dst_reg(dword_index), prev_count,
-  (uint32_t) (6 - log2_bits_per_vertex)));
+  src_reg(uint32_t(6 - log2_bits_per_vertex))));
   }
 
   /* Start building the URB write message.  The first MRF gets a copy of
@@ -387,8 +388,8 @@ vec4_gs_visitor::emit_control_data_bits()
   * the appropriate OWORD within the control data header.
   */
  src_reg per_slot_offset(this, glsl_type::uint_type);
- emit(SHR(dst_reg(per_slot_offset), dword_index, 2u));
- emit(GS_OPCODE_SET_WRITE_OFFSET, mrf_reg, per_slot_offset, 1u);
+ emit(SHR(dst_reg(per_slot_offset), dword_index, src_reg(2u)));
+ emit(GS_OPCODE_SET_WRITE_OFFSET, mrf_reg, per_slot_offset, 
src_reg(1u));
   }
 
   if (urb_write_flags & BRW_URB_WRITE_USE_CHANNEL_MASKS) {
@@ -400,10 +401,10 @@ vec4_gs_visitor::emit_control_data_bits()
   * together.
   */
  src_reg channel(this, glsl_type::uint_type);
- inst = emit(AND(dst_reg(channel), dword_index, 3u));
+ inst = emit(AND(dst_reg(channel), dword_index, src_reg(3u)));
  inst->force_writemask_all = true;
  src_reg one(this, glsl_type::uint_type);
- inst = emit(MOV(dst_reg(one), 1u));
+
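
To illustrate the commit message's point that explicit immediate constructors are generally a good idea, here is a minimal sketch. `Src` and `emit_mov` are hypothetical stand-ins for `src_reg` and an emitter, not the Mesa definitions. Once the constructor is marked `explicit`, a bare integer no longer converts silently into a register operand, so call sites have to spell out `Src(0u)`, much as the hunks above spell out `src_reg(0u)`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for src_reg, reduced to an immediate payload.
struct Src {
   uint32_t bits = 0;
   bool is_immediate = false;

   Src() = default;
   explicit Src(uint32_t u) : bits(u), is_immediate(true) {}
};

// An integer is no longer implicitly convertible to Src...
static_assert(!std::is_convertible<uint32_t, Src>::value,
              "no implicit conversion from an integer");
// ...but explicit construction still works.
static_assert(std::is_constructible<Src, uint32_t>::value,
              "explicit construction is still allowed");

// Hypothetical emitter that takes an operand by const reference.
inline uint32_t emit_mov(const Src &src) { return src.bits; }

inline void example_usage()
{
   // emit_mov(0u);   // would not compile: the constructor is explicit
   assert(emit_mov(Src(0u)) == 0u);
   assert(Src(3u).is_immediate);
}
```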

[Mesa-dev] [PATCH 1/4] i965: Use macros to create prototypes for emitter helpers.

2014-11-12 Thread Kenneth Graunke

We do this almost everywhere else; this should make it easier to modify.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_vec4.h | 98 +++-
 1 file changed, 41 insertions(+), 57 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 750f491..3301dd8 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -414,68 +414,52 @@ public:
  vec4_instruction *inst,
 vec4_instruction *new_inst);
 
-   vec4_instruction *MOV(const dst_reg &dst, const src_reg &src0);
-   vec4_instruction *NOT(const dst_reg &dst, const src_reg &src0);
-   vec4_instruction *RNDD(const dst_reg &dst, const src_reg &src0);
-   vec4_instruction *RNDE(const dst_reg &dst, const src_reg &src0);
-   vec4_instruction *RNDZ(const dst_reg &dst, const src_reg &src0);
-   vec4_instruction *FRC(const dst_reg &dst, const src_reg &src0);
-   vec4_instruction *F32TO16(const dst_reg &dst, const src_reg &src0);
-   vec4_instruction *F16TO32(const dst_reg &dst, const src_reg &src0);
-   vec4_instruction *ADD(const dst_reg &dst, const src_reg &src0,
-                         const src_reg &src1);
-   vec4_instruction *MUL(const dst_reg &dst, const src_reg &src0,
-                         const src_reg &src1);
-   vec4_instruction *MACH(const dst_reg &dst, const src_reg &src0,
-                          const src_reg &src1);
-   vec4_instruction *MAC(const dst_reg &dst, const src_reg &src0,
-                         const src_reg &src1);
-   vec4_instruction *AND(const dst_reg &dst, const src_reg &src0,
-                         const src_reg &src1);
-   vec4_instruction *OR(const dst_reg &dst, const src_reg &src0,
-                        const src_reg &src1);
-   vec4_instruction *XOR(const dst_reg &dst, const src_reg &src0,
-                         const src_reg &src1);
-   vec4_instruction *DP3(const dst_reg &dst, const src_reg &src0,
-                         const src_reg &src1);
-   vec4_instruction *DP4(const dst_reg &dst, const src_reg &src0,
-                         const src_reg &src1);
-   vec4_instruction *DPH(const dst_reg &dst, const src_reg &src0,
-                         const src_reg &src1);
-   vec4_instruction *SHL(const dst_reg &dst, const src_reg &src0,
-                         const src_reg &src1);
-   vec4_instruction *SHR(const dst_reg &dst, const src_reg &src0,
-                         const src_reg &src1);
-   vec4_instruction *ASR(const dst_reg &dst, const src_reg &src0,
-                         const src_reg &src1);
+#define EMIT1(op) vec4_instruction *op(const dst_reg &, const src_reg &);
+#define EMIT2(op) vec4_instruction *op(const dst_reg &, const src_reg &, const 
src_reg &);
+#define EMIT3(op) vec4_instruction *op(const dst_reg &, const src_reg &, const 
src_reg &, const src_reg &);
+   EMIT1(MOV)
+   EMIT1(NOT)
+   EMIT1(RNDD)
+   EMIT1(RNDE)
+   EMIT1(RNDZ)
+   EMIT1(FRC)
+   EMIT1(F32TO16)
+   EMIT1(F16TO32)
+   EMIT2(ADD)
+   EMIT2(MUL)
+   EMIT2(MACH)
+   EMIT2(MAC)
+   EMIT2(AND)
+   EMIT2(OR)
+   EMIT2(XOR)
+   EMIT2(DP3)
+   EMIT2(DP4)
+   EMIT2(DPH)
+   EMIT2(SHL)
+   EMIT2(SHR)
+   EMIT2(ASR)
vec4_instruction *CMP(dst_reg dst, src_reg src0, src_reg src1,
 enum brw_conditional_mod condition);
vec4_instruction *IF(src_reg src0, src_reg src1,
 enum brw_conditional_mod condition);
vec4_instruction *IF(enum brw_predicate predicate);
-   vec4_instruction *PULL_CONSTANT_LOAD(const dst_reg &dst,
-                                        const src_reg &index);
-   vec4_instruction *SCRATCH_READ(const dst_reg &dst, const src_reg &index);
-   vec4_instruction *SCRATCH_WRITE(const dst_reg &dst, const src_reg &src,
-                                   const src_reg &index);
-   vec4_instruction *LRP(const dst_reg &dst, const src_reg &a,
-                         const src_reg &y, const src_reg &x);
-   vec4_instruction *BFREV(const dst_reg &dst, const src_reg &value);
-   vec4_instruction *BFE(const dst_reg &dst, const src_reg &bits,
-                         const src_reg &offset, const src_reg &value);
-   vec4_instruction *BFI1(const dst_reg &dst, const src_reg &bits,
-                          const src_reg &offset);
-   vec4_instruction *BFI2(const dst_reg &dst, const src_reg &bfi1_dst,
-                          const src_reg &insert, const src_reg &base);
-   vec4_instruction *FBH(const dst_reg &dst, const src_reg &value);
-   vec4_instruction *FBL(const dst_reg &dst, const src_reg &value);
-   vec4_instruction *CBIT(const dst_reg &dst, const src_reg &value);
-   vec4_instruction *MAD(const dst_reg &dst, const src_reg &c,
-                         const src_reg &b, const src_reg &a);
-   vec4_instruction *ADDC(const dst_reg &dst, const src_reg &src0,
-                          const src_reg &src1);
-   vec4_instruction *SUBB(const dst_reg &dst, const src_reg &src0,
-                          const src_reg &src1);
+
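
The prototype-generating macros in this patch can be tried out in isolation. The sketch below is a simplified, self-contained version: `Dst`, `Src`, `Inst`, and `Visitor` are hypothetical stand-ins for the Mesa classes, and the `DEFINE1`/`DEFINE2` macros and the `#undef` lines are assumptions added for the example, not something the patch shows.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for dst_reg, src_reg and vec4_instruction.
struct Dst {};
struct Src {};
struct Inst { const char *name; int num_sources; };

// One macro per arity: each expands to a member function prototype.
#define EMIT1(op) Inst op(const Dst &, const Src &);
#define EMIT2(op) Inst op(const Dst &, const Src &, const Src &);

struct Visitor {
   EMIT1(MOV)
   EMIT1(NOT)
   EMIT2(ADD)
   EMIT2(MUL)
};

#undef EMIT1
#undef EMIT2

// Matching definitions, also generated by macro (an assumption for this sketch).
#define DEFINE1(op) \
   inline Inst Visitor::op(const Dst &, const Src &) { return Inst{#op, 1}; }
#define DEFINE2(op) \
   inline Inst Visitor::op(const Dst &, const Src &, const Src &) { return Inst{#op, 2}; }

DEFINE1(MOV)
DEFINE1(NOT)
DEFINE2(ADD)
DEFINE2(MUL)

#undef DEFINE1
#undef DEFINE2

inline void example_usage()
{
   Visitor v;
   assert(std::strcmp(v.MOV(Dst{}, Src{}).name, "MOV") == 0);
   assert(v.ADD(Dst{}, Src{}, Src{}).num_sources == 2);
}
```

Adding a new one-source or two-source helper then takes a single line per macro list, which seems to be the maintenance benefit the commit message has in mind.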

[Mesa-dev] [PATCH 3/4] i965/vec4: Combine all the math emitters.

2014-11-12 Thread Kenneth Graunke

17 insertions(+), 102 deletions(-).  Works just as well.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_vec4.h   |   8 +-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 111 -
 2 files changed, 17 insertions(+), 102 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index ebbf882..8abd166 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -503,12 +503,8 @@ public:
 
src_reg fix_3src_operand(src_reg src);
 
-   void emit_math1_gen6(enum opcode opcode, dst_reg dst, src_reg src);
-   void emit_math1_gen4(enum opcode opcode, dst_reg dst, src_reg src);
-   void emit_math(enum opcode opcode, dst_reg dst, src_reg src);
-   void emit_math2_gen6(enum opcode opcode, dst_reg dst, src_reg src0, src_reg 
src1);
-   void emit_math2_gen4(enum opcode opcode, dst_reg dst, src_reg src0, src_reg 
src1);
-   void emit_math(enum opcode opcode, dst_reg dst, src_reg src0, src_reg src1);
+   void emit_math(enum opcode opcode, dst_reg dst, src_reg src0,
+  src_reg src1 = src_reg());
src_reg fix_math_operand(src_reg src);
 
void emit_pack_half_2x16(dst_reg dst, src_reg src0);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index a8ce498..8ce870c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -310,6 +310,9 @@ vec4_visitor::fix_3src_operand(src_reg src)
 src_reg
 vec4_visitor::fix_math_operand(src_reg src)
 {
+   if (brw->gen < 6 || brw->gen >= 8 || src.file == BAD_FILE)
+  return src;
+
/* The gen6 math instruction ignores the source modifiers --
 * swizzle, abs, negate, and at least some parts of the register
 * region description.
@@ -331,107 +334,23 @@ vec4_visitor::fix_math_operand(src_reg src)
 }
 
 void
-vec4_visitor::emit_math1_gen6(enum opcode opcode, dst_reg dst, src_reg src)
-{
-   src = fix_math_operand(src);
-
-   if (brw->gen == 6 && dst.writemask != WRITEMASK_XYZW) {
-  /* The gen6 math instruction must be align1, so we can't do
-   * writemasks.
-   */
-  dst_reg temp_dst = dst_reg(this, glsl_type::vec4_type);
-
-  emit(opcode, temp_dst, src);
-
-  emit(MOV(dst, src_reg(temp_dst)));
-   } else {
-  emit(opcode, dst, src);
-   }
-}
-
-void
-vec4_visitor::emit_math1_gen4(enum opcode opcode, dst_reg dst, src_reg src)
-{
-   vec4_instruction *inst = emit(opcode, dst, src);
-   inst->base_mrf = 1;
-   inst->mlen = 1;
-}
-
-void
-vec4_visitor::emit_math(opcode opcode, dst_reg dst, src_reg src)
-{
-   switch (opcode) {
-   case SHADER_OPCODE_RCP:
-   case SHADER_OPCODE_RSQ:
-   case SHADER_OPCODE_SQRT:
-   case SHADER_OPCODE_EXP2:
-   case SHADER_OPCODE_LOG2:
-   case SHADER_OPCODE_SIN:
-   case SHADER_OPCODE_COS:
-  break;
-   default:
-  unreachable("not reached: bad math opcode");
-   }
-
-   if (brw->gen >= 8) {
-  emit(opcode, dst, src);
-   } else if (brw->gen >= 6) {
-  emit_math1_gen6(opcode, dst, src);
-   } else {
-  emit_math1_gen4(opcode, dst, src);
-   }
-}
-
-void
-vec4_visitor::emit_math2_gen6(enum opcode opcode,
- dst_reg dst, src_reg src0, src_reg src1)
+vec4_visitor::emit_math(enum opcode opcode,
+   dst_reg dst, src_reg src0, src_reg src1)
 {
src0 = fix_math_operand(src0);
src1 = fix_math_operand(src1);
 
-   if (brw->gen == 6 && dst.writemask != WRITEMASK_XYZW) {
-  /* The gen6 math instruction must be align1, so we can't do
-   * writemasks.
-   */
-  dst_reg temp_dst = dst_reg(this, glsl_type::vec4_type);
-  temp_dst.type = dst.type;
-
-  emit(opcode, temp_dst, src0, src1);
-
-  emit(MOV(dst, src_reg(temp_dst)));
-   } else {
-  emit(opcode, dst, src0, src1);
-   }
-}
-
-void
-vec4_visitor::emit_math2_gen4(enum opcode opcode,
- dst_reg dst, src_reg src0, src_reg src1)
-{
-   vec4_instruction *inst = emit(opcode, dst, src0, src1);
-   inst->base_mrf = 1;
-   inst->mlen = 2;
-}
-
-void
-vec4_visitor::emit_math(enum opcode opcode,
-   dst_reg dst, src_reg src0, src_reg src1)
-{
-   switch (opcode) {
-   case SHADER_OPCODE_POW:
-   case SHADER_OPCODE_INT_QUOTIENT:
-   case SHADER_OPCODE_INT_REMAINDER:
-  break;
-   default:
-  unreachable("not reached: unsupported binary math opcode");
-   }
+   vec4_instruction *math = emit(opcode, dst, src0, src1);
 
-   if (brw->gen >= 8) {
-  emit(opcode, dst, src0, src1);
-   } else if (brw->gen >= 6) {
-  emit_math2_gen6(opcode, dst, src0, src1);
-   } else {
-  emit_math2_gen4(opcode, dst, src0, src1);
+   if (brw->gen == 6 && dst.writemask != WRITEMASK_XYZW) {
+  /* MATH on Gen6 must be align1, so we can't do writemasks. */
+  math->dst = dst_reg(this, glsl_type::vec4_type);
+  math->dst.type = dst.type;
+  math->dst.writemask =

Re: [Mesa-dev] [PATCH 3/4] i965/vec4: Combine all the math emitters.

2014-11-12 Thread Matt Turner

On Wed, Nov 12, 2014 at 9:35 PM, Kenneth Graunke kenn...@whitecape.org wrote:
 +vec4_visitor::emit_math(enum opcode opcode,
 +   dst_reg dst, src_reg src0, src_reg src1)

I think you can make the arguments const references too?

 +   if (brw->gen == 6 && dst.writemask != WRITEMASK_XYZW) {
 +  /* MATH on Gen6 must be align1, so we can't do writemasks. */
 +  math->dst = dst_reg(this, glsl_type::vec4_type);
 +  math->dst.type = dst.type;
 +  math->dst.writemask = WRITEMASK_XYZW;

I don't think you need to set the writemask (XYZW is the default).

 +  emit(MOV(dst, src_reg(math->dst)));
 +   } else if (brw->gen < 6) {
 +  math->base_mrf = 1;
 +  math->mlen = src1.file == BAD_FILE ? 1 : 2;
 }
  }

Series is

Reviewed-by: Matt Turner matts...@gmail.com
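
The way the combined emitter tells one-operand from two-operand math apart can be sketched on its own. In the example below, `File`, `Src`, `MathInst`, and this `emit_math` are simplified, hypothetical stand-ins. The sketch assumes that a default-constructed operand has its file set to `BAD_FILE`, which is what the `src1.file == BAD_FILE` test in the quoted hunk suggests. It also omits the generation check that guards the message length in the real code.

```cpp
#include <cassert>

// Hypothetical register files; BAD_FILE marks "no operand supplied".
enum File { BAD_FILE, GRF };

// Simplified stand-in for src_reg: default construction yields BAD_FILE.
struct Src {
   File file = BAD_FILE;

   Src() = default;
   explicit Src(File f) : file(f) {}
};

// Simplified stand-in for the emitted instruction.
struct MathInst {
   int mlen;   // message length: one register per supplied source
};

// One function serves both arities; the default argument acts as a sentinel.
inline MathInst emit_math(const Src &src0, const Src &src1 = Src())
{
   (void) src0;   // the first operand is always present
   return MathInst{ src1.file == BAD_FILE ? 1 : 2 };
}

inline void example_usage()
{
   assert(emit_math(Src(GRF)).mlen == 1);             // unary, e.g. RCP
   assert(emit_math(Src(GRF), Src(GRF)).mlen == 2);   // binary, e.g. POW
}
```

The sketch already takes its operands by const reference, in line with the review comment above.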

[Mesa-dev] [PATCH] radeonsi: Disable asynchronous DMA except for PIPE_BUFFER

2014-11-12 Thread Michel Dänzer

From: Michel Dänzer michel.daen...@amd.com

Using the asynchronous DMA engine for multi-dimensional operations seems
to cause random GPU lockups for various people. While the root cause for
this might need to be fixed in the kernel, let's disable it for now.

Before re-enabling this, please make sure you can hit all newly enabled
paths in your testing, preferably with both piglit and real world apps,
and get in touch with people on the bug reports below for stability
testing.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85647
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83500
Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Michel Dänzer michel.daen...@amd.com
---
 src/gallium/drivers/radeonsi/si_dma.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_dma.c 
b/src/gallium/drivers/radeonsi/si_dma.c
index b1bd5e7..1d3b524 100644
--- a/src/gallium/drivers/radeonsi/si_dma.c
+++ b/src/gallium/drivers/radeonsi/si_dma.c
@@ -250,6 +250,9 @@ void si_dma_copy(struct pipe_context *ctx,
return;
}
 
+   /* XXX: The paths below cause lockups for some */
+   goto fallback;
+
if (src->format != dst->format || src_box->depth > 1 ||
rdst->dirty_level_mask != 0 ||
rdst->cmask.size || rdst->fmask.size ||
-- 
2.1.3
