Re: [Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64

2010-08-13 Thread Corbin Simpson
On Fri, Aug 13, 2010 at 8:03 PM, Luca Barbieri  wrote:
>> #if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
>> #define PIPE_ARCH_LITTLE_ENDIAN
>> #elif defined(PIPE_ARCH_PPC) || defined(PIPE_ARCH_PPC_64)
>> #define PIPE_ARCH_BIG_ENDIAN
>> #else
>> #define PIPE_ARCH_UNKNOWN_ENDIAN
>> #endif
>
> Note that this isn't really correct: the endianness must be known by
> the compiler, since it must choose a way to represent global
> initialized 16/32-bit integer variables, among others.
>
> Also, at least some PowerPCs can be configured as little endian (even
> though it is unusual to do so).
>
> Usually the compiler sets the macro WORDS_BIGENDIAN to indicate
> big-endian targets, and this is the one that should be tested.

Feel free to fix it; I introduced those as a base for some r300g
PPC-specific fixes, and have never owned the relevant hardware.

~ C.

-- 
When the facts change, I change my mind. What do you do, sir? ~ Keynes

Corbin Simpson

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Merge of glsl2 branch to master

2010-08-13 Thread Ian Romanick
Ian Romanick wrote:
> I propose that we merge master to glsl2 on *Friday, August 13th* (one
> week from today).  Barring unforeseen issues, I propose that we merge
> glsl2 to master on *Monday, August 16th*.

The master -> glsl2 merge is complete.  There don't appear to be any
regressions in the glsl2 branch caused by the merge.  My plan is to
merge glsl2 -> master on Monday evening, pacific time.  There are still
three build issues with MSVC.  All three either have patches or a
proposed fix.

I've been working on this for almost 12 hours today, so I'm not going to
post the combinations of test results that I usually post.  Sorry.


Re: [Mesa-dev] [PATCH 2/2] translate_sse: major rewrite (v4)

2010-08-13 Thread Luca Barbieri
Changes in v4:
- Use x86_target() and x86_target_caps()
- Enable translate_sse in x86-64, but not in Win64

Changes in v3:
- Win64 support (untested)
- Use u_cpu_detect.h constants instead of #ifs

Changes in v2:
- Minimize #ifs
- Give a name to magic number CHANNELS_0001
- Add support for CPUs without SSE (only memcpy and swizzles, like non SSE2)
- Fixed comments

translate_sse is currently very limited to the point of
being useless in essentially all cases.

In particular, it only supports some float32 and unorm8
formats and doesn't work on x86-64.

This commit rewrites it to support:
1. Dumb memory copy for any pair of identical formats
2. All formats that are swizzles of each other
3. Converting 32/64-bit floats and all 8/16/32-bit integers to 32-bit float
4. Converting unorm8/snorm8 to snorm16 and uscaled8/sscaled8 to sscaled16
5. Support for x86-64 (doesn't take advantage of it in any way though)

This new translate can even be useful to translate index buffers for
cards that lack 8-bit index support.

It passes the testsuite I wrote, but note that this is a major change, and more
testing would be great.
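
As a scalar reference for what two of these paths compute, here is a minimal
sketch. It is not the SSE code in the attached patch, and the example_* names
are hypothetical. It assumes the usual format semantics: a unorm8 channel maps
to float by dividing by 255, and 8-bit indices are zero-extended to 16 bits for
hardware without 8-bit index support.

```c
#include <stddef.h>
#include <stdint.h>

/* unorm8 -> float32: 0 maps to 0.0f and 255 maps to 1.0f. */
static float example_unorm8_to_float(uint8_t v)
{
   return (float)v / 255.0f;
}

/* Widen an 8-bit index buffer to 16-bit indices, one element at a time. */
static void example_widen_indices(const uint8_t *src, uint16_t *dst,
                                  size_t count)
{
   size_t i;

   for (i = 0; i < count; i++)
      dst[i] = src[i];
}
```

A generated SSE routine would process several elements per instruction, but
should produce the same values as this loop.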


0002-translate_sse-major-rewrite-v4.patch.gz
Description: GNU Zip compressed data


[Mesa-dev] [PATCH] u_cpu_detect: remove arch and little_endian

2010-08-13 Thread Luca Barbieri
This logic duplicates the one in p_config.h, so remove it and adjust
the only two places that were using it.
---
 src/gallium/auxiliary/gallivm/lp_bld_pack.c   |7 +++
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |6 +-
 src/gallium/auxiliary/util/u_cpu_detect.c |   18 --
 src/gallium/auxiliary/util/u_cpu_detect.h |   13 +
 4 files changed, 9 insertions(+), 35 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
index ecfb13a..b7b630f 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
@@ -171,14 +171,13 @@ lp_build_unpack2(LLVMBuilderRef builder,
   msb = lp_build_zero(src_type);
 
/* Interleave bits */
-   if(util_cpu_caps.little_endian) {
+#ifdef PIPE_ARCH_LITTLE_ENDIAN
   *dst_lo = lp_build_interleave2(builder, src_type, src, msb, 0);
   *dst_hi = lp_build_interleave2(builder, src_type, src, msb, 1);
-   }
-   else {
+#else
   *dst_lo = lp_build_interleave2(builder, src_type, msb, src, 0);
   *dst_hi = lp_build_interleave2(builder, src_type, msb, src, 1);
-   }
+#endif
 
/* Cast the result into the new type (twice as wide) */
 
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
index 3075065..02d43e3 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
@@ -1840,7 +1840,11 @@ lp_build_sample_2d_linear_aos(struct lp_build_sample_context *bld,
   unsigned i, j;
 
   for(j = 0; j < h16.type.length; j += 4) {
- unsigned subindex = util_cpu_caps.little_endian ? 0 : 1;
+#ifdef PIPE_ARCH_LITTLE_ENDIAN
+ unsigned subindex = 0;
+#else
+ unsigned subindex = 1;
+#endif
  LLVMValueRef index;
 
  index = LLVMConstInt(elem_type, j/2 + subindex, 0);
diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c b/src/gallium/auxiliary/util/u_cpu_detect.c
index b1a8c75..2bbc554 100644
--- a/src/gallium/auxiliary/util/u_cpu_detect.c
+++ b/src/gallium/auxiliary/util/u_cpu_detect.c
@@ -391,23 +391,6 @@ util_cpu_detect(void)
 
memset(&util_cpu_caps, 0, sizeof util_cpu_caps);
 
-   /* Check for arch type */
-#if defined(PIPE_ARCH_MIPS)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_MIPS;
-#elif defined(PIPE_ARCH_ALPHA)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_ALPHA;
-#elif defined(PIPE_ARCH_SPARC)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_SPARC;
-#elif defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_X86;
-   util_cpu_caps.little_endian = 1;
-#elif defined(PIPE_ARCH_PPC)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_POWERPC;
-   util_cpu_caps.little_endian = 0;
-#else
-   util_cpu_caps.arch = UTIL_CPU_ARCH_UNKNOWN;
-#endif
-
/* Count the number of CPUs in system */
 #if defined(PIPE_OS_WINDOWS)
{
@@ -504,7 +487,6 @@ util_cpu_detect(void)
 
 #ifdef DEBUG
if (debug_get_option_dump_cpu()) {
-  debug_printf("util_cpu_caps.arch = %i\n", util_cpu_caps.arch);
   debug_printf("util_cpu_caps.nr_cpus = %u\n", util_cpu_caps.nr_cpus);
 
   debug_printf("util_cpu_caps.x86_cpu_type = %u\n", util_cpu_caps.x86_cpu_type);
diff --git a/src/gallium/auxiliary/util/u_cpu_detect.h b/src/gallium/auxiliary/util/u_cpu_detect.h
index 4b3dc39..f3bef09 100644
--- a/src/gallium/auxiliary/util/u_cpu_detect.h
+++ b/src/gallium/auxiliary/util/u_cpu_detect.h
@@ -36,26 +36,15 @@
 #define _UTIL_CPU_DETECT_H
 
 #include "pipe/p_compiler.h"
-
-enum util_cpu_arch {
-   UTIL_CPU_ARCH_UNKNOWN = 0,
-   UTIL_CPU_ARCH_MIPS,
-   UTIL_CPU_ARCH_ALPHA,
-   UTIL_CPU_ARCH_SPARC,
-   UTIL_CPU_ARCH_X86,
-   UTIL_CPU_ARCH_POWERPC
-};
+#include "pipe/p_config.h"
 
 struct util_cpu_caps {
-   enum util_cpu_arch arch;
unsigned nr_cpus;
 
/* Feature flags */
int x86_cpu_type;
unsigned cacheline;
 
-   unsigned little_endian:1;
-
unsigned has_tsc:1;
unsigned has_mmx:1;
unsigned has_mmx2:1;
-- 
1.7.0.4



[Mesa-dev] [PATCH 1/2] rtasm: add minimal x86-64 support and new instructions (v3)

2010-08-13 Thread Luca Barbieri
Changes in v3:
- Add target and target caps functions, so that they could be different in
  principle from the current CPU and they don't need #ifs to check

Changes in v2:
- Win64 support (untested)
- Use u_cpu_detect.h constants instead of #ifs

This commit adds minimal x86-64 support: only movs between registers
are supported for r8-r15, and x64_rexw() must be used to ask for 64-bit
operations.

It also adds several new instructions for the new translate_sse code.
---
 src/gallium/auxiliary/rtasm/rtasm_cpu.c|6 +-
 src/gallium/auxiliary/rtasm/rtasm_x86sse.c |  477 ++--
 src/gallium/auxiliary/rtasm/rtasm_x86sse.h |  101 ++-
 3 files changed, 544 insertions(+), 40 deletions(-)

diff --git a/src/gallium/auxiliary/rtasm/rtasm_cpu.c b/src/gallium/auxiliary/rtasm/rtasm_cpu.c
index 2e15751..0461c81 100644
--- a/src/gallium/auxiliary/rtasm/rtasm_cpu.c
+++ b/src/gallium/auxiliary/rtasm/rtasm_cpu.c
@@ -30,7 +30,7 @@
 #include "rtasm_cpu.h"
 
 
-#if defined(PIPE_ARCH_X86)
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
 static boolean rtasm_sse_enabled(void)
 {
static boolean firsttime = 1;
@@ -49,7 +49,7 @@ static boolean rtasm_sse_enabled(void)
 int rtasm_cpu_has_sse(void)
 {
/* FIXME: actually detect this at run-time */
-#if defined(PIPE_ARCH_X86)
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
return rtasm_sse_enabled();
 #else
return 0;
@@ -59,7 +59,7 @@ int rtasm_cpu_has_sse(void)
 int rtasm_cpu_has_sse2(void) 
 {
/* FIXME: actually detect this at run-time */
-#if defined(PIPE_ARCH_X86)
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
return rtasm_sse_enabled();
 #else
return 0;
diff --git a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
index 63007c1..e80875a 100644
--- a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
+++ b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
@@ -22,8 +22,9 @@
  **/
 
 #include "pipe/p_config.h"
+#include "util/u_cpu_detect.h"
 
-#if defined(PIPE_ARCH_X86)
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
 
 #include "pipe/p_compiler.h"
 #include "util/u_debug.h"
@@ -231,6 +232,10 @@ static void emit_modrm( struct x86_function *p,

assert(reg.mod == mod_REG);

+   /* TODO: support extended x86-64 registers */
+   assert(reg.idx < 8);
+   assert(regmem.idx < 8);
+
val |= regmem.mod << 6; /* mod field */
val |= reg.idx << 3;/* reg field */
val |= regmem.idx;  /* r/m field */
@@ -363,6 +368,12 @@ int x86_get_label( struct x86_function *p )
  */
 
 
+void x64_rexw(struct x86_function *p)
+{
+   if(x86_target(p) != X86_32)
+  emit_1ub(p, 0x48);
+}
+
 void x86_jcc( struct x86_function *p,
  enum x86_cc cc,
  int label )
@@ -449,6 +460,52 @@ void x86_mov_reg_imm( struct x86_function *p, struct x86_reg dst, int imm )
emit_1i(p, imm);
 }
 
+void x86_mov_imm( struct x86_function *p, struct x86_reg dst, int imm )
+{
+   DUMP_RI( dst, imm );
+   if(dst.mod == mod_REG)
+  x86_mov_reg_imm(p, dst, imm);
+   else
+   {
+  emit_1ub(p, 0xc7);
+  emit_modrm_noreg(p, 0, dst);
+  emit_1i(p, imm);
+   }
+}
+
+void x86_mov16_imm( struct x86_function *p, struct x86_reg dst, uint16_t imm )
+{
+   DUMP_RI( dst, imm );
+   emit_1ub(p, 0x66);
+   if(dst.mod == mod_REG)
+   {
+  emit_1ub(p, 0xb8 + dst.idx);
+  emit_2ub(p, imm & 0xff, imm >> 8);
+   }
+   else
+   {
+  emit_1ub(p, 0xc7);
+  emit_modrm_noreg(p, 0, dst);
+  emit_2ub(p, imm & 0xff, imm >> 8);
+   }
+}
+
+void x86_mov8_imm( struct x86_function *p, struct x86_reg dst, uint8_t imm )
+{
+   DUMP_RI( dst, imm );
+   if(dst.mod == mod_REG)
+   {
+  emit_1ub(p, 0xb0 + dst.idx);
+  emit_1ub(p, imm);
+   }
+   else
+   {
+  emit_1ub(p, 0xc6);
+  emit_modrm_noreg(p, 0, dst);
+  emit_1ub(p, imm);
+   }
+}
+
 /**
  * Immediate group 1 instructions.
  */
@@ -520,7 +577,7 @@ void x86_push( struct x86_function *p,
}
 
 
-   p->stack_offset += 4;
+   p->stack_offset += sizeof(void*);
 }
 
 void x86_push_imm32( struct x86_function *p,
@@ -530,7 +587,7 @@ void x86_push_imm32( struct x86_function *p,
emit_1ub(p, 0x68);
emit_1i(p,  imm32);
 
-   p->stack_offset += 4;
+   p->stack_offset += sizeof(void*);
 }
 
 
@@ -540,23 +597,33 @@ void x86_pop( struct x86_function *p,
DUMP_R( reg );
assert(reg.mod == mod_REG);
emit_1ub(p, 0x58 + reg.idx);
-   p->stack_offset -= 4;
+   p->stack_offset -= sizeof(void*);
 }
 
 void x86_inc( struct x86_function *p,
  struct x86_reg reg )
 {
DUMP_R( reg );
-   assert(reg.mod == mod_REG);
-   emit_1ub(p, 0x40 + reg.idx);
+   if(x86_target(p) == X86_32 && reg.mod == mod_REG)
+   {
+  emit_1ub(p, 0x40 + reg.idx);
+  return;
+   }
+   emit_1ub(p, 0xff);
+   emit_modrm_noreg(p, 0, reg);
 }
 
 void x86_dec( struct x86_function *p,
  str

[Mesa-dev] [PATCH 0/2] translate_sse/rtasm improvements (v4)

2010-08-13 Thread Luca Barbieri
This new version replaces direct use of u_cpu_detect.h with rtasm-provided
helpers to check the target and caps.

This seems the cleanest solution, as it allows rtasm to target CPUs other than
the running one in theory, and avoids both #ifdefs and duplicating the
p_config.h logic.

The u_cpu_detect.h patch is now separate and independent from these changes.

Luca Barbieri (2):
  rtasm: add minimal x86-64 support and new instructions (v3)
  translate_sse: major rewrite (v4)

 src/gallium/auxiliary/rtasm/rtasm_cpu.c |6 +-
 src/gallium/auxiliary/rtasm/rtasm_x86sse.c  |  477 +-
 src/gallium/auxiliary/rtasm/rtasm_x86sse.h  |  101 ++-
 src/gallium/auxiliary/translate/translate.c |3 +-
 src/gallium/auxiliary/translate/translate_sse.c | 1159 ++-
 5 files changed, 1467 insertions(+), 279 deletions(-)


[Mesa-dev] [Bug 29572] [glsl] MSVC build fails with some C99 math functions

2010-08-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=29572

Ian Romanick  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 AssignedTo|mesa-...@lists.freedesktop. |i...@freedesktop.org
   |org |

--- Comment #1 from Ian Romanick  2010-08-13 20:24:30 PDT 
---
Created an attachment (id=37855)
 View: https://bugs.freedesktop.org/attachment.cgi?id=37855
 Review: https://bugs.freedesktop.org/review?bug=29572&attachment=37855

Work-arounds for platforms that lack C99 math functions

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


[Mesa-dev] [Bug 29573] [glsl2] struct within a struct causes an assertion failure

2010-08-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=29573

Ian Romanick  changed:

   What|Removed |Added

 AssignedTo|mesa-...@lists.freedesktop. |e...@anholt.net
   |org |



[Mesa-dev] [Bug 29573] New: [glsl2] struct within a struct causes an assertion failure

2010-08-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=29573

   Summary: [glsl2] struct within a struct causes an assertion
failure
   Product: Mesa
   Version: git
  Platform: All
OS/Version: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: Mesa core
AssignedTo: mesa-dev@lists.freedesktop.org
ReportedBy: i...@freedesktop.org


In CorrectFull.frag, the following structure causes an assertion failure:

struct light1 
{
   float intensity;
   vec3 position;
   int test_int[2];
   struct 
   {
  int a;
  float f; 
   } light2;
} lightVar;

This variable is never dereferenced in the program.  The assertion failure is:

ir_validate.cpp:382: void check_node_type(ir_instruction*, void*): Assertion
`ir->type != glsl_type::error_type' failed.

This was first triggered after the commit listed below, but I believe that is
spurious.  The actual assertion is that the declaration of a variable
lightVar_light2 has an error type.  My guess is that rearranging optimization
passes has caused ir_validate to be called before this unused declaration
could be removed.

commit 2f4fe151681a6f6afe1d452eece6cf4144f44e49
Author: Eric Anholt 
Date:   Tue Aug 10 13:06:49 2010 -0700

glsl2: Move the common optimization passes to a helper function.

These are passes that we expect all codegen to be happy with.  The
other lowering passes for Mesa IR are moved to the Mesa IR generator.



Re: [Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64

2010-08-13 Thread Luca Barbieri
> #if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
> #define PIPE_ARCH_LITTLE_ENDIAN
> #elif defined(PIPE_ARCH_PPC) || defined(PIPE_ARCH_PPC_64)
> #define PIPE_ARCH_BIG_ENDIAN
> #else
> #define PIPE_ARCH_UNKNOWN_ENDIAN
> #endif

Note that this isn't really correct: the endianness must be known by
the compiler, since it must choose a way to represent global
initialized 16/32-bit integer variables, among others.

Also, at least some PowerPCs can be configured as little endian (even
though it is unusual to do so).

Usually the compiler sets the macro WORDS_BIGENDIAN to indicate
big-endian targets, and this is the one that should be tested.
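
A minimal sketch of keying endianness on what the toolchain already knows
rather than on the architecture name. EXAMPLE_BIG_ENDIAN and
example_runtime_is_big_endian() are hypothetical names, not Mesa code. Two
assumptions are worth stating: WORDS_BIGENDIAN is conventionally defined by the
build system (for example Autoconf's AC_C_BIGENDIAN) rather than predefined by
the compiler, while recent GCC and Clang predefine __BYTE_ORDER__; and a target
where neither is available is treated as little-endian here.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time answer: prefer the compiler's own byte-order macro, then
 * fall back to the build-system macro.  Assumption: if neither says
 * big-endian, the target is little-endian. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#  define EXAMPLE_BIG_ENDIAN 1
#elif defined(WORDS_BIGENDIAN)
#  define EXAMPLE_BIG_ENDIAN 1
#else
#  define EXAMPLE_BIG_ENDIAN 0
#endif

/* Run-time cross-check: look at the first byte of a known 32-bit value. */
static int example_runtime_is_big_endian(void)
{
   const uint32_t probe = 0x01020304u;
   unsigned char first;

   memcpy(&first, &probe, 1);
   return first == 0x01;
}
```

The run-time probe is only useful as a sanity check in a debug build; code
that selects between two layouts would test EXAMPLE_BIG_ENDIAN directly.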


Re: [Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64

2010-08-13 Thread Luca Barbieri
I came up with yet another solution, which I believe is the right one.

We remove the arch/abi/endianness in u_cpu_detect.h, but add them as
inline function helpers in rtasm.
Currently they would return a constant, but could be changed if we
ever want rtasm to target anything but the current running CPU.

Most places outside of code generation will actually not even
parse/compile for the wrong architecture (they are inline assembly or
intrinsic usage), and thus can't use ifs, so this should work well.

Changes to replace PIPE_ARCH_* with commonly used macros can be done
separately if desired.
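
Roughly, such a helper could look like the sketch below. Everything here is
hypothetical (the example_* names, the enum values, and using the pointer size
to tell 32-bit from 64-bit); it only illustrates a function that returns a
compile-time constant today but takes the function object, so it could later
return a per-function target instead.

```c
/* Hypothetical target/ABI enumeration for the code generator. */
enum example_x86_target {
   EXAMPLE_X86_32,
   EXAMPLE_X86_64_STD_ABI,
   EXAMPLE_X86_64_WIN64_ABI
};

/* Stand-in for the code generator's function object. */
struct example_x86_function {
   int unused;
};

/* Currently a constant: describes the CPU/ABI this file was compiled for.
 * Assumption: pointer size is enough to tell 32-bit from 64-bit here. */
static inline enum example_x86_target
example_x86_target(const struct example_x86_function *p)
{
   (void)p;
#if defined(_WIN64)
   return EXAMPLE_X86_64_WIN64_ABI;
#else
   return sizeof(void *) == 8 ? EXAMPLE_X86_64_STD_ABI : EXAMPLE_X86_32;
#endif
}

/* Callers use a plain if, so both branches are always compiled. */
static unsigned
example_stack_slot_size(const struct example_x86_function *p)
{
   if (example_x86_target(p) == EXAMPLE_X86_32)
      return 4;
   return 8;
}
```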


[Mesa-dev] [Bug 29572] New: [glsl] MSVC build fails with some C99 math functions

2010-08-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=29572

   Summary: [glsl] MSVC build fails with some C99 math functions
   Product: Mesa
   Version: git
  Platform: x86 (IA32)
OS/Version: Windows (All)
Status: NEW
  Severity: blocker
  Priority: medium
 Component: Mesa core
AssignedTo: mesa-dev@lists.freedesktop.org
ReportedBy: v...@vmware.com


mesa: 8f8cdbfba43550d0b8985fb087961864e4cd92b6 (glsl2)

Build with MSVC.

$ scons quiet=no
...
cl /Fobuild\windows-x86-debug\glsl\ir_constant_expression.obj /c
src\glsl\ir_constant_expression.cpp /TP /nologo /Od /Oi /Oy- /GL- /fp:fast /W3
/MTd /LDd /DDEBUG /DWIN32 /D_WINDOWS /D_WIN32_WINNT=0x0601 /DWINVER=0x0601
/DVC_EXTRALEAN /D_USE_MATH_DEFINES /D_CRT_SECURE_NO_WARNINGS
/D_CRT_SECURE_NO_DEPRECATE /D_SCL_SECURE_NO_WARNINGS /D_SCL_SECURE_NO_DEPRECATE
/D_DEBUG /DPIPE_SUBSYSTEM_WINDOWS_USER /Isrc\talloc /Isrc\mapi /Isrc\mesa
/Iinclude /Isrc\gallium\include /Isrc\gallium\auxiliary /Isrc\gallium\drivers
/Isrc\gallium\winsys /Iinclude\c99 /Z7
ir_constant_expression.cpp
src\glsl\ir_constant_expression.cpp(112) : warning C4244: '=' : conversion from
'float' to 'int', possible loss of data
src\glsl\ir_constant_expression.cpp(118) : warning C4244: '=' : conversion from
'int' to 'float', possible loss of data
src\glsl\ir_constant_expression.cpp(124) : warning C4244: '=' : conversion from
'unsigned int' to 'float', possible loss of data
src\glsl\ir_constant_expression.cpp(130) : warning C4244: '=' : conversion from
'double' to 'float', possible loss of data
src\glsl\ir_constant_expression.cpp(136) : warning C4800: 'float' : forcing
value to bool 'true' or 'false' (performance warning)
src\glsl\ir_constant_expression.cpp(148) : warning C4800: 'unsigned int' :
forcing value to bool 'true' or 'false' (performance warning)
src\glsl\ir_constant_expression.cpp(155) : error C3861: 'truncf': identifier
not found
src\glsl\ir_constant_expression.cpp(209) : warning C4146: unary minus operator
applied to unsigned type, result still unsigned
src\glsl\ir_constant_expression.cpp(275) : warning C4244: '=' : conversion from
'double' to 'float', possible loss of data
src\glsl\ir_constant_expression.cpp(286) : warning C4244: '=' : conversion from
'double' to 'float', possible loss of data
src\glsl\ir_constant_expression.cpp(307) : error C3861: 'exp2f': identifier not
found
src\glsl\ir_constant_expression.cpp(321) : error C3861: 'log2f': identifier not
found
src\glsl\ir_constant_expression.cpp(883) : warning C4244: '=' : conversion from
'double' to 'float', possible loss of data
src\glsl\ir_constant_expression.cpp(1068) : warning C4244: '=' : conversion
from 'double' to 'float', possible loss of data
src\glsl\ir_constant_expression.cpp(1077) : warning C4244: 'initializing' :
conversion from 'double' to 'const float', possible loss of data
src\glsl\ir_constant_expression.cpp(1117) : warning C4244: '=' : conversion
from 'double' to 'float', possible loss of data
scons: *** [build\windows-x86-debug\glsl\ir_constant_expression.obj] Error 2
scons: building terminated because of errors.



[Mesa-dev] [Bug 29044] GLSL compiler tracker

2010-08-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=29044

Bug 29044 depends on bug 29500, which changed state.

Bug 29500 Summary: [glsl2]Mesa demo shadow_sampler fail to run with error: 
`shadow2DRectProj' undeclared
https://bugs.freedesktop.org/show_bug.cgi?id=29500

   What|Old Value   |New Value

 Resolution||FIXED
 Status|ASSIGNED|RESOLVED



[Mesa-dev] [Bug 29044] GLSL compiler tracker

2010-08-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=29044

Bug 29044 depends on bug 29537, which changed state.

Bug 29537 Summary: [glsl2] texture2DLod() should not be accepted by fragment 
programs
https://bugs.freedesktop.org/show_bug.cgi?id=29537

   What|Old Value   |New Value

 Resolution||FIXED
 Status|NEW |RESOLVED



Re: [Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64

2010-08-13 Thread Luca Barbieri
> There's no merit in duplicating in util_caps what's already provided by
> p_config.h / p_compiler.h

Indeed it's not a great thing.

However, Keith wanted to be able to check those with ifs instead of
#ifdefs, and it does indeed make the code a bit nicer.
But the current definitions in p_config.h don't allow that.

So it's either:
1. Duplicate it like I did in the latest patchset
2. Replace the p_config.h logic with something like in the latest
patchset and change hundreds of places in the codebase
3. Change the p_config.h logic so everything not defined is set to 0 instead
4. Give up and just use the #ifdefs as I did in the earlier patchsets

Thinking about it again, I'd suggest either #3 or #4 instead of the #1
I did there.
I don't think it's worth spending much time on this matter though.
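
To make option #3 concrete, here is a minimal sketch. The EXAMPLE_ARCH_* names
are hypothetical stand-ins, not the real p_config.h macros. Every macro is
always defined, to 0 or 1, so it can be tested with either #if or a plain if.

```c
/* Always defined: 1 when compiling for that architecture, 0 otherwise. */
#if defined(__i386__) || defined(_M_IX86)
#  define EXAMPLE_ARCH_X86 1
#else
#  define EXAMPLE_ARCH_X86 0
#endif

#if defined(__x86_64__) || defined(_M_X64)
#  define EXAMPLE_ARCH_X86_64 1
#else
#  define EXAMPLE_ARCH_X86_64 0
#endif

static const char *example_arch_name(void)
{
   /* Plain ifs: every branch is parsed and type-checked on all platforms,
    * and the compiler can still discard the branches that are constant 0. */
   if (EXAMPLE_ARCH_X86)
      return "x86";
   if (EXAMPLE_ARCH_X86_64)
      return "x86-64";
   return "other";
}
```

The catch is that existing #ifdef and defined() tests would silently become
always-true once the macros are defined to 0, so those would have to be
converted to #if.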


BTW, I'm not sure why the PIPE_ARCH_* defines exist in the first
place, since the compiler already provides __i386__, WORDS_BIGENDIAN
and similar.
If Windows doesn't define them, ad-hoc code can be introduced to
define those, like
#if defined(WIN32) && !defined(__i386__)
#define __i386__
#endif

This should reduce the learning curve to the codebase. Several similar
issues exist like the use of INLINE, FREE, etc. instead of manually
defining the commonly used keywords in places where they are not
available.

But this is another matter, and also not really worth spending much
time on (except perhaps "INLINE" which at least personally I manage to
forget about almost every time).


[Mesa-dev] [Bug 29571] [glsl2] glcpp/glcpp-parse.y(312) : error C2146: syntax error : missing ')' before identifier 'PRIiMAX'

2010-08-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=29571

Ian Romanick  changed:

   What|Removed |Added

 AssignedTo|e...@anholt.net |mesa-...@lists.freedesktop.
   ||org



[Mesa-dev] [Bug 29545] [bisected glsl2]piglit glslparsertest_preprocess1.frag fails

2010-08-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=29545

Ian Romanick  changed:

   What|Removed |Added

 AssignedTo|mesa-...@lists.freedesktop. |cwo...@cworth.org
   |org |



[Mesa-dev] [Bug 29545] [bisected glsl2]piglit glslparsertest_preprocess1.frag fails

2010-08-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=29545

Ian Romanick  changed:

   What|Removed |Added

 OS/Version|Linux (All) |All
  Component|Drivers/DRI/i965|Mesa core
 AssignedTo|e...@anholt.net |mesa-...@lists.freedesktop.
   ||org



[Mesa-dev] [Bug 29044] GLSL compiler tracker

2010-08-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=29044

Bug 29044 depends on bug 29540, which changed state.

Bug 29540 Summary: [glsl2] problem with vertex attribute locations and 
draw-time validation
https://bugs.freedesktop.org/show_bug.cgi?id=29540

   What|Old Value   |New Value

 Status|NEW |ASSIGNED
 Resolution||FIXED
 Status|ASSIGNED|RESOLVED



Re: [Mesa-dev] [PATCH] Mesa prog_optimize.c: better optimization for Mesa programs

2010-08-13 Thread Segovia, Benjamin
Just looked at your modifications. I will try to be more GL/mesa style 
compliant for the variable types.
Thanks for your help
Ben


From: Brian Paul [bri...@vmware.com]
Sent: Friday, August 13, 2010 4:01 PM
To: Segovia, Benjamin
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] Mesa prog_optimize.c: better optimization for 
Mesa programs

On 08/11/2010 09:21 PM, Segovia, Benjamin wrote:
> Corrected.
>
> I rescanned the whole code and tried to perform more aggressive checks.
> I reran all the tests, warsow and nexuiz.
>
> Please find the updated patch attached.

Committed.  Thanks.

-Brian


Re: [Mesa-dev] [PATCH] Mesa prog_optimize.c: better optimization for Mesa programs

2010-08-13 Thread Brian Paul

On 08/11/2010 09:21 PM, Segovia, Benjamin wrote:
> Corrected.
>
> I rescanned the whole code and tried to perform more aggressive checks.
> I reran all the tests, warsow and nexuiz.
>
> Please find the updated patch attached.

Committed.  Thanks.

-Brian


Re: [Mesa-dev] [PATCH] dri/r300: test for FEATURE defines

2010-08-13 Thread nobled
Chia-I Wu  wrote:
>> Fixes a fatal build error when compiling just OpenGL ES libraries, since
>> FEATURE_EXT_framebuffer_blit is disabled then, so the BlitFramebuffer
>> member doesn't exist.
> Is this change enough to make dri_r300 function as a GLES only driver?
>
> To be honest, I am a little reluctant to sprinkle "#if FEATURE" in the drivers
> at the moment.  The drivers, except for intel, have not specified GLES API
> support yet.  Even when they do, I would hope there is a more systematic way
> to enable/disable certain features, to effectively reduce the driver size.
It's already sprinkled through the DRI drivers right now though (it's
in intel_fbo.c at least), because struct dd_function_table's members
in src/mesa/main/dd.h are #if'd based on the feature defines. As it
is, the code for radeon and nouveau (and the mesa state tracker, now
that I check) is just borked without them.
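
To illustrate the failure mode, here is a reduced sketch of the pattern. The
names are hypothetical (EXAMPLE_FEATURE_blit and struct example_function_table
stand in for the real FEATURE_* defines and dd_function_table). When the
feature macro is 0 the member does not exist, so any unguarded assignment to it
is a compile error.

```c
/* Pretend this build (e.g. ES-only) has the feature switched off. */
#ifndef EXAMPLE_FEATURE_blit
#define EXAMPLE_FEATURE_blit 0
#endif

struct example_function_table {
   void (*DrawPixels)(void);
#if EXAMPLE_FEATURE_blit
   void (*BlitFramebuffer)(void);   /* only present when the feature is on */
#endif
};

static void example_draw_pixels(void)
{
}

#if EXAMPLE_FEATURE_blit
static void example_blit_framebuffer(void)
{
}
#endif

static void example_init(struct example_function_table *functions)
{
   functions->DrawPixels = example_draw_pixels;
#if EXAMPLE_FEATURE_blit
   /* Without this guard the line fails to compile: no such member. */
   functions->BlitFramebuffer = example_blit_framebuffer;
#endif
}
```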

(new patch for st/mesa attached)
From 428e355978dbc4c3fff00ee46ad7f8455a07308a Mon Sep 17 00:00:00 2001
From: nobled 
Date: Mon, 12 Jul 2010 21:22:08 -0400
Subject: [PATCH] dri/radeon: test for FEATURE defines

'struct dd_function_table' only conditionally contains
the function pointer NewFramebuffer and friends based on
FEATURE_EXT_framebuffer_* defines. (See src/mesa/main/dd.h)

Fixes the build when the features are disabled and the vfuncs
don't exist.
---
 src/mesa/drivers/dri/radeon/radeon_fbo.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/src/mesa/drivers/dri/radeon/radeon_fbo.c b/src/mesa/drivers/dri/radeon/radeon_fbo.c
index 5174850..0597d42 100644
--- a/src/mesa/drivers/dri/radeon/radeon_fbo.c
+++ b/src/mesa/drivers/dri/radeon/radeon_fbo.c
@@ -609,6 +609,7 @@ radeon_validate_framebuffer(GLcontext *ctx, struct gl_framebuffer *fb)
 
 void radeon_fbo_init(struct radeon_context *radeon)
 {
+#if FEATURE_EXT_framebuffer_object
   radeon->glCtx->Driver.NewFramebuffer = radeon_new_framebuffer;
   radeon->glCtx->Driver.NewRenderbuffer = radeon_new_renderbuffer;
   radeon->glCtx->Driver.BindFramebuffer = radeon_bind_framebuffer;
@@ -617,7 +618,10 @@ void radeon_fbo_init(struct radeon_context *radeon)
   radeon->glCtx->Driver.FinishRenderTexture = radeon_finish_render_texture;
   radeon->glCtx->Driver.ResizeBuffers = radeon_resize_buffers;
   radeon->glCtx->Driver.ValidateFramebuffer = radeon_validate_framebuffer;
+#endif
+#if FEATURE_EXT_framebuffer_blit
   radeon->glCtx->Driver.BlitFramebuffer = _mesa_meta_BlitFramebuffer;
+#endif
 }
 
   
-- 
1.5.4.3

From 5cd07814b2ee90bec0eef3cb9ee40043a838c49e Mon Sep 17 00:00:00 2001
From: nobled 
Date: Mon, 12 Jul 2010 22:53:32 -0400
Subject: [PATCH] dri/nouveau: test for FEATURE defines

'struct dd_function_table' only conditionally contains
the function pointer NewFramebuffer and friends based on
FEATURE_EXT_framebuffer_* defines. (See src/mesa/main/dd.h)

Fixes the build when the features are disabled and the vfuncs
don't exist.
---
 src/mesa/drivers/dri/nouveau/nouveau_driver.c |2 ++
 src/mesa/drivers/dri/nouveau/nouveau_fbo.c|2 ++
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/src/mesa/drivers/dri/nouveau/nouveau_driver.c b/src/mesa/drivers/dri/nouveau/nouveau_driver.c
index 4ec864c..6452fe2 100644
--- a/src/mesa/drivers/dri/nouveau/nouveau_driver.c
+++ b/src/mesa/drivers/dri/nouveau/nouveau_driver.c
@@ -138,5 +138,7 @@ nouveau_driver_functions_init(struct dd_function_table *functions)
 	functions->DrawPixels = _mesa_meta_DrawPixels;
 	functions->CopyPixels = _mesa_meta_CopyPixels;
 	functions->Bitmap = _mesa_meta_Bitmap;
+#if FEATURE_EXT_framebuffer_blit
 	functions->BlitFramebuffer = _mesa_meta_BlitFramebuffer;
+#endif
 }
diff --git a/src/mesa/drivers/dri/nouveau/nouveau_fbo.c b/src/mesa/drivers/dri/nouveau/nouveau_fbo.c
index bd1273b..32d8f2d 100644
--- a/src/mesa/drivers/dri/nouveau/nouveau_fbo.c
+++ b/src/mesa/drivers/dri/nouveau/nouveau_fbo.c
@@ -262,10 +262,12 @@ nouveau_finish_render_texture(GLcontext *ctx,
 void
 nouveau_fbo_functions_init(struct dd_function_table *functions)
 {
+#if FEATURE_EXT_framebuffer_object
 	functions->NewFramebuffer = nouveau_framebuffer_new;
 	functions->NewRenderbuffer = nouveau_renderbuffer_new;
 	functions->BindFramebuffer = nouveau_bind_framebuffer;
 	functions->FramebufferRenderbuffer = nouveau_framebuffer_renderbuffer;
 	functions->RenderTexture = nouveau_render_texture;
 	functions->FinishRenderTexture = nouveau_finish_render_texture;
+#endif
 }
-- 
1.5.4.3

From 9c58ac433666dbdc95f6d0d2dba182379606e390 Mon Sep 17 00:00:00 2001
From: nobled 
Date: Fri, 13 Aug 2010 20:23:11 +
Subject: [PATCH] st/mesa: test for FEATURE defines

'struct dd_function_table' only conditionally contains
the function pointer NewFramebuffer and friends based on
FEATURE_EXT_framebuffer_* defines. (See src/mesa/main/dd.h)

Fixes the build when the features are disabled and the vfuncs
don't exist.
---
 src/mesa/state_tracker/st_cb_fbo.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git

Re: [Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64

2010-08-13 Thread José Fonseca
On Fri, 2010-08-13 at 21:41 +0100, José Fonseca wrote:
> On Fri, 2010-08-13 at 06:47 -0700, Luca Barbieri wrote:
> > A few related changes:
> > 1. Make x86-64 its own architecture (nothing was using
> >    util_cpu_caps.arch, so nothing can be affected)
> 
> Just remove util_cpu_caps.arch. It's there simply due to its historical
> ancestry. We have PIPE_ARCH already.
> 
> > 2. Turn the CPU arch and endianness into macros, so that the compiler
> >    can evaluate them at compile time and eliminate dead code
> 
> Ditto. We have PIPE_ENDIAN or something already.

From p_config.h:

/*
 * Endian detection.
 */

#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
#define PIPE_ARCH_LITTLE_ENDIAN
#elif defined(PIPE_ARCH_PPC) || defined(PIPE_ARCH_PPC_64)
#define PIPE_ARCH_BIG_ENDIAN
#else
#define PIPE_ARCH_UNKNOWN_ENDIAN
#endif

Basically, in my perspective, util_cpu_caps should *only* have the stuff
that can vary at run time. Everything else should be macros in
p_config.h/p_compiler.h.
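
To make the distinction concrete, here is a rough sketch (not the
p_config.h code). EXAMPLE_LITTLE_ENDIAN and the example_* functions are
made-up names, and the check assumes a compiler that predefines
__BYTE_ORDER__, as GCC and Clang do; other compilers would need their own
branch. The macro is a compile-time constant, so a branch on it can be
folded away, while the run-time probe is only there to cross-check it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compile-time endianness from the compiler's predefined macros. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define EXAMPLE_LITTLE_ENDIAN 1
#elif defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define EXAMPLE_LITTLE_ENDIAN 0
#else
#error "unknown endianness: extend this check for your compiler"
#endif

/* Run-time probe: look at the first byte in memory of the value 1. */
static int example_runtime_little_endian(void)
{
   const uint32_t probe = 1;
   unsigned char first;
   memcpy(&first, &probe, 1);
   return first == 1;
}

/* The condition is a constant, so the compiler can drop the dead branch. */
static unsigned example_subindex(void)
{
   return EXAMPLE_LITTLE_ENDIAN ? 0 : 1;
}

/* Sanity check: the macro and the run-time probe must agree. */
static void example_check_endianness(void)
{
   assert(EXAMPLE_LITTLE_ENDIAN == example_runtime_little_endian());
}
```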

The rest of the patches in the series look OK to me.

Jose

> 
> > 3. Add util_cpu_abi to know about non-standard ABIs like Win64
> 
> That's not really prescribed by the CPU. We have PIPE_OS_* already. 
> 
> There's no merit in duplicating in util_caps what's already provided by
> p_config.h / p_compiler.h 
> 
> Jose
> 
> > ---
> >  src/gallium/auxiliary/gallivm/lp_bld_pack.c   |2 +-
> >  src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |2 +-
> >  src/gallium/auxiliary/util/u_cpu_detect.c |   19 +-
> >  src/gallium/auxiliary/util/u_cpu_detect.h |   39 ++--
> >  4 files changed, 38 insertions(+), 24 deletions(-)
> > 
> > diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c 
> > b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
> > index ecfb13a..8ab742a 100644
> > --- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
> > +++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
> > @@ -171,7 +171,7 @@ lp_build_unpack2(LLVMBuilderRef builder,
> >msb = lp_build_zero(src_type);
> >  
> > /* Interleave bits */
> > -   if(util_cpu_caps.little_endian) {
> > +   if(util_cpu_little_endian) {
> >*dst_lo = lp_build_interleave2(builder, src_type, src, msb, 0);
> >*dst_hi = lp_build_interleave2(builder, src_type, src, msb, 1);
> > }
> > diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c 
> > b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> > index 3075065..d4b8b4f 100644
> > --- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> > +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> > @@ -1840,7 +1840,7 @@ lp_build_sample_2d_linear_aos(struct 
> > lp_build_sample_context *bld,
> >unsigned i, j;
> >  
> >for(j = 0; j < h16.type.length; j += 4) {
> > - unsigned subindex = util_cpu_caps.little_endian ? 0 : 1;
> > + unsigned subindex = util_cpu_little_endian ? 0 : 1;
> >   LLVMValueRef index;
> >  
> >   index = LLVMConstInt(elem_type, j/2 + subindex, 0);
> > diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c 
> > b/src/gallium/auxiliary/util/u_cpu_detect.c
> > index b1a8c75..73ce146 100644
> > --- a/src/gallium/auxiliary/util/u_cpu_detect.c
> > +++ b/src/gallium/auxiliary/util/u_cpu_detect.c
> > @@ -391,23 +391,6 @@ util_cpu_detect(void)
> >  
> > memset(&util_cpu_caps, 0, sizeof util_cpu_caps);
> >  
> > -   /* Check for arch type */
> > -#if defined(PIPE_ARCH_MIPS)
> > -   util_cpu_caps.arch = UTIL_CPU_ARCH_MIPS;
> > -#elif defined(PIPE_ARCH_ALPHA)
> > -   util_cpu_caps.arch = UTIL_CPU_ARCH_ALPHA;
> > -#elif defined(PIPE_ARCH_SPARC)
> > -   util_cpu_caps.arch = UTIL_CPU_ARCH_SPARC;
> > -#elif defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
> > -   util_cpu_caps.arch = UTIL_CPU_ARCH_X86;
> > -   util_cpu_caps.little_endian = 1;
> > -#elif defined(PIPE_ARCH_PPC)
> > -   util_cpu_caps.arch = UTIL_CPU_ARCH_POWERPC;
> > -   util_cpu_caps.little_endian = 0;
> > -#else
> > -   util_cpu_caps.arch = UTIL_CPU_ARCH_UNKNOWN;
> > -#endif
> > -
> > /* Count the number of CPUs in system */
> >  #if defined(PIPE_OS_WINDOWS)
> > {
> > @@ -504,7 +487,7 @@ util_cpu_detect(void)
> >  
> >  #ifdef DEBUG
> > if (debug_get_option_dump_cpu()) {
> > -  debug_printf("util_cpu_caps.arch = %i\n", util_cpu_caps.arch);
> > +  debug_printf("util_cpu_caps.arch = %i\n", util_cpu_arch);
> >debug_printf("util_cpu_caps.nr_cpus = %u\n", util_cpu_caps.nr_cpus);
> >  
> >debug_printf("util_cpu_caps.x86_cpu_type = %u\n", 
> > util_cpu_caps.x86_cpu_type);
> > diff --git a/src/gallium/auxiliary/util/u_cpu_detect.h 
> > b/src/gallium/auxiliary/util/u_cpu_detect.h
> > index 4b3dc39..e81e4b5 100644
> > --- a/src/gallium/auxiliary/util/u_cpu_detect.h
> > +++ b/src/gallium/auxiliary/util/u_cpu_detect.h
> > @@ -36,6 +36,7 @@
> >  #define _UTIL_CPU_DETECT_H
> >  
> >  #include "pipe/p_compiler.h"
> > +#include "pipe/p_config.h"
> >  
> >  enum util_cpu_arch {
> > UTIL_CPU_ARCH_UNKNOWN

Re: [Mesa-dev] [PATCH 2/3] rtasm: add minimal x86-64 support and new instructions (v2)

2010-08-13 Thread José Fonseca
Luca,

This is great stuff. 

But one request: if Win64 is untested, please make sure it is disabled
by default until somebody has had the opportunity to test it. Unfortunately I'm
really busy with other stuff ATM and don't have the time.

Jose

On Fri, 2010-08-13 at 06:47 -0700, Luca Barbieri wrote:
> Changes in v2:
> - Win64 support (untested)
> - Use u_cpu_detect.h constants instead of #ifs
> 
> This commit adds minimal x86-64 support: only movs between registers
> are supported for r8-r15, and x64_rexw() must be used to ask for 64-bit
> operations.
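
For readers unfamiliar with the prefix that x64_rexw() emits, a small
sketch of the REX encoding follows. This is not the rtasm code:
example_rex and example_mov_reg64 are hypothetical helpers that write
into a plain byte buffer. A REX byte has the form 0100WRXB, so 0x48 is
REX with only W set, and the R and B bits supply the fourth bit of the
register numbers needed for r8-r15.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* REX prefix layout: 0100WRXB. W selects 64-bit operand size; R, X and B
 * extend the ModRM reg field, the SIB index and the ModRM r/m field. */
static uint8_t example_rex(int w, int r, int x, int b)
{
   return (uint8_t)(0x40 | (w ? 8 : 0) | (r ? 4 : 0) | (x ? 2 : 0) | (b ? 1 : 0));
}

/* Encode "mov dst, src" between two 64-bit general-purpose registers
 * (numbers 0..15) with opcode 0x89 (MOV r/m64, r64).
 * Writes 3 bytes to out and returns the number of bytes written. */
static size_t example_mov_reg64(uint8_t *out, unsigned dst, unsigned src)
{
   assert(out != NULL);
   assert(dst < 16 && src < 16);
   out[0] = example_rex(1, src >= 8, 0, dst >= 8);
   out[1] = 0x89;
   /* mod = 11 (register direct), reg = src, r/m = dst */
   out[2] = (uint8_t)(0xC0 | ((src & 7) << 3) | (dst & 7));
   return 3;
}
```

With these helpers, mov rax, rbx comes out as 48 89 D8 and mov r8, rax
as 49 89 C0.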
> 
> It also adds several new instructions for the new translate_sse code.
> ---
>  src/gallium/auxiliary/rtasm/rtasm_cpu.c|6 +-
>  src/gallium/auxiliary/rtasm/rtasm_x86sse.c |  455 ++--
>  src/gallium/auxiliary/rtasm/rtasm_x86sse.h |   69 -
>  3 files changed, 493 insertions(+), 37 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/rtasm/rtasm_cpu.c 
> b/src/gallium/auxiliary/rtasm/rtasm_cpu.c
> index 2e15751..0461c81 100644
> --- a/src/gallium/auxiliary/rtasm/rtasm_cpu.c
> +++ b/src/gallium/auxiliary/rtasm/rtasm_cpu.c
> @@ -30,7 +30,7 @@
>  #include "rtasm_cpu.h"
> 
> 
> -#if defined(PIPE_ARCH_X86)
> +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
>  static boolean rtasm_sse_enabled(void)
>  {
> static boolean firsttime = 1;
> @@ -49,7 +49,7 @@ static boolean rtasm_sse_enabled(void)
>  int rtasm_cpu_has_sse(void)
>  {
> /* FIXME: actually detect this at run-time */
> -#if defined(PIPE_ARCH_X86)
> +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
> return rtasm_sse_enabled();
>  #else
> return 0;
> @@ -59,7 +59,7 @@ int rtasm_cpu_has_sse(void)
>  int rtasm_cpu_has_sse2(void)
>  {
> /* FIXME: actually detect this at run-time */
> -#if defined(PIPE_ARCH_X86)
> +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
> return rtasm_sse_enabled();
>  #else
> return 0;
> diff --git a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c 
> b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
> index 63007c1..88b182b 100644
> --- a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
> +++ b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
> @@ -22,8 +22,9 @@
>   **/
> 
>  #include "pipe/p_config.h"
> +#include "util/u_cpu_detect.h"
> 
> -#if defined(PIPE_ARCH_X86)
> +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
> 
>  #include "pipe/p_compiler.h"
>  #include "util/u_debug.h"
> @@ -231,6 +232,10 @@ static void emit_modrm( struct x86_function *p,
> 
> assert(reg.mod == mod_REG);
> 
> +   /* TODO: support extended x86-64 registers */
> +   assert(reg.idx < 8);
> +   assert(regmem.idx < 8);
> +
> val |= regmem.mod << 6; /* mod field */
> val |= reg.idx << 3;/* reg field */
> val |= regmem.idx;  /* r/m field */
> @@ -363,6 +368,12 @@ int x86_get_label( struct x86_function *p )
>   */
> 
> 
> +void x64_rexw(struct x86_function *p)
> +{
> +   if(util_cpu_arch == UTIL_CPU_ARCH_X86_64)
> +  emit_1ub(p, 0x48);
> +}
> +
>  void x86_jcc( struct x86_function *p,
>   enum x86_cc cc,
>   int label )
> @@ -449,6 +460,52 @@ void x86_mov_reg_imm( struct x86_function *p, struct 
> x86_reg dst, int imm )
> emit_1i(p, imm);
>  }
> 
> +void x86_mov_imm( struct x86_function *p, struct x86_reg dst, int imm )
> +{
> +   DUMP_RI( dst, imm );
> +   if(dst.mod == mod_REG)
> +  x86_mov_reg_imm(p, dst, imm);
> +   else
> +   {
> +  emit_1ub(p, 0xc7);
> +  emit_modrm_noreg(p, 0, dst);
> +  emit_1i(p, imm);
> +   }
> +}
> +
> +void x86_mov16_imm( struct x86_function *p, struct x86_reg dst, uint16_t imm 
> )
> +{
> +   DUMP_RI( dst, imm );
> +   emit_1ub(p, 0x66);
> +   if(dst.mod == mod_REG)
> +   {
> +  emit_1ub(p, 0xb8 + dst.idx);
> +  emit_2ub(p, imm & 0xff, imm >> 8);
> +   }
> +   else
> +   {
> +  emit_1ub(p, 0xc7);
> +  emit_modrm_noreg(p, 0, dst);
> +  emit_2ub(p, imm & 0xff, imm >> 8);
> +   }
> +}
> +
> +void x86_mov8_imm( struct x86_function *p, struct x86_reg dst, uint8_t imm )
> +{
> +   DUMP_RI( dst, imm );
> +   if(dst.mod == mod_REG)
> +   {
> +  emit_1ub(p, 0xb0 + dst.idx);
> +  emit_1ub(p, imm);
> +   }
> +   else
> +   {
> +  emit_1ub(p, 0xc6);
> +  emit_modrm_noreg(p, 0, dst);
> +  emit_1ub(p, imm);
> +   }
> +}
> +
>  /**
>   * Immediate group 1 instructions.
>   */
> @@ -520,7 +577,7 @@ void x86_push( struct x86_function *p,
> }
> 
> 
> -   p->stack_offset += 4;
> +   p->stack_offset += sizeof(void*);
>  }
> 
>  void x86_push_imm32( struct x86_function *p,
> @@ -530,7 +587,7 @@ void x86_push_imm32( struct x86_function *p,
> emit_1ub(p, 0x68);
> emit_1i(p,  imm32);
> 
> -   p->stack_offset += 4;
> +   p->stack_offset += sizeof(void*);
>  }
> 
> 
> @@ -540,23 +597,33 @@ void x86_pop( struct x86_function *p,
> DUMP_R( reg );
> assert(reg.mod == mod_REG);
> emit_1ub(p, 0x58 + reg.idx);
> -   p

Re: [Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64

2010-08-13 Thread José Fonseca
On Fri, 2010-08-13 at 06:47 -0700, Luca Barbieri wrote:
> A few related changes:
> 1. Make x86-64 its own architecture (nothing was using
>    util_cpu_caps.arch, so nothing can be affected)

Just remove util_cpu_caps.arch. It's there simply due to its historical
ancestry. We have PIPE_ARCH already.

> 2. Turn the CPU arch and endianness into macros, so that the compiler
>    can evaluate them at compile time and eliminate dead code

Ditto. We have PIPE_ENDIAN or something already.

> 3. Add util_cpu_abi to know about non-standard ABIs like Win64

That's not really prescribed by the CPU. We have PIPE_OS_* already. 

There's no merit in duplicating in util_caps what's already provided by
p_config.h / p_compiler.h 

Jose

> ---
>  src/gallium/auxiliary/gallivm/lp_bld_pack.c   |2 +-
>  src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |2 +-
>  src/gallium/auxiliary/util/u_cpu_detect.c |   19 +-
>  src/gallium/auxiliary/util/u_cpu_detect.h |   39 ++--
>  4 files changed, 38 insertions(+), 24 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c 
> b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
> index ecfb13a..8ab742a 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
> @@ -171,7 +171,7 @@ lp_build_unpack2(LLVMBuilderRef builder,
>msb = lp_build_zero(src_type);
>  
> /* Interleave bits */
> -   if(util_cpu_caps.little_endian) {
> +   if(util_cpu_little_endian) {
>*dst_lo = lp_build_interleave2(builder, src_type, src, msb, 0);
>*dst_hi = lp_build_interleave2(builder, src_type, src, msb, 1);
> }
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c 
> b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> index 3075065..d4b8b4f 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> @@ -1840,7 +1840,7 @@ lp_build_sample_2d_linear_aos(struct 
> lp_build_sample_context *bld,
>unsigned i, j;
>  
>for(j = 0; j < h16.type.length; j += 4) {
> - unsigned subindex = util_cpu_caps.little_endian ? 0 : 1;
> + unsigned subindex = util_cpu_little_endian ? 0 : 1;
>   LLVMValueRef index;
>  
>   index = LLVMConstInt(elem_type, j/2 + subindex, 0);
> diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c 
> b/src/gallium/auxiliary/util/u_cpu_detect.c
> index b1a8c75..73ce146 100644
> --- a/src/gallium/auxiliary/util/u_cpu_detect.c
> +++ b/src/gallium/auxiliary/util/u_cpu_detect.c
> @@ -391,23 +391,6 @@ util_cpu_detect(void)
>  
> memset(&util_cpu_caps, 0, sizeof util_cpu_caps);
>  
> -   /* Check for arch type */
> -#if defined(PIPE_ARCH_MIPS)
> -   util_cpu_caps.arch = UTIL_CPU_ARCH_MIPS;
> -#elif defined(PIPE_ARCH_ALPHA)
> -   util_cpu_caps.arch = UTIL_CPU_ARCH_ALPHA;
> -#elif defined(PIPE_ARCH_SPARC)
> -   util_cpu_caps.arch = UTIL_CPU_ARCH_SPARC;
> -#elif defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
> -   util_cpu_caps.arch = UTIL_CPU_ARCH_X86;
> -   util_cpu_caps.little_endian = 1;
> -#elif defined(PIPE_ARCH_PPC)
> -   util_cpu_caps.arch = UTIL_CPU_ARCH_POWERPC;
> -   util_cpu_caps.little_endian = 0;
> -#else
> -   util_cpu_caps.arch = UTIL_CPU_ARCH_UNKNOWN;
> -#endif
> -
> /* Count the number of CPUs in system */
>  #if defined(PIPE_OS_WINDOWS)
> {
> @@ -504,7 +487,7 @@ util_cpu_detect(void)
>  
>  #ifdef DEBUG
> if (debug_get_option_dump_cpu()) {
> -  debug_printf("util_cpu_caps.arch = %i\n", util_cpu_caps.arch);
> +  debug_printf("util_cpu_caps.arch = %i\n", util_cpu_arch);
>debug_printf("util_cpu_caps.nr_cpus = %u\n", util_cpu_caps.nr_cpus);
>  
>debug_printf("util_cpu_caps.x86_cpu_type = %u\n", 
> util_cpu_caps.x86_cpu_type);
> diff --git a/src/gallium/auxiliary/util/u_cpu_detect.h 
> b/src/gallium/auxiliary/util/u_cpu_detect.h
> index 4b3dc39..e81e4b5 100644
> --- a/src/gallium/auxiliary/util/u_cpu_detect.h
> +++ b/src/gallium/auxiliary/util/u_cpu_detect.h
> @@ -36,6 +36,7 @@
>  #define _UTIL_CPU_DETECT_H
>  
>  #include "pipe/p_compiler.h"
> +#include "pipe/p_config.h"
>  
>  enum util_cpu_arch {
> UTIL_CPU_ARCH_UNKNOWN = 0,
> @@ -43,19 +44,49 @@ enum util_cpu_arch {
> UTIL_CPU_ARCH_ALPHA,
> UTIL_CPU_ARCH_SPARC,
> UTIL_CPU_ARCH_X86,
> -   UTIL_CPU_ARCH_POWERPC
> +   UTIL_CPU_ARCH_X86_64,
> +   UTIL_CPU_ARCH_POWERPC,
> +
> +   /* non-standard ABIs, only used in util_cpu_abi */
> +   UTIL_CPU_ABI_WIN64
>  };
>  
> +/* Check for arch type */
> +#if defined(PIPE_ARCH_MIPS)
> +#define util_cpu_arch UTIL_CPU_ARCH_MIPS
> +#elif defined(PIPE_ARCH_ALPHA)
> +#define util_cpu_arch UTIL_CPU_ARCH_ALPHA
> +#elif defined(PIPE_ARCH_SPARC)
> +#define util_cpu_arch UTIL_CPU_ARCH_SPARC
> +#elif defined(PIPE_ARCH_X86)
> +#define util_cpu_arch UTIL_CPU_ARCH_X86
> +#elif defined(PIPE_ARCH_X86_64)
> +#define util_cpu_arch UTIL_CPU_ARCH_X86_64
> +#elif defined(PIPE_ARCH_PPC)
> +#define uti

[Mesa-dev] [Bug 29460] GNU/Hurd support

2010-08-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=29460

Jon TURNEY changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jon.tur...@dronecode.org.uk

--- Comment #3 from Jon TURNEY 2010-08-13 12:21:19 PDT ---

This is a much better way of achieving what I was trying to achieve with bug
#27840

I have reviewed and tested the changes and they look good to me, but I'm not
building for linux either :-)

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


[Mesa-dev] [PATCH 3/3] translate_sse: major rewrite (v3)

2010-08-13 Thread Luca Barbieri
Changes in v3:
- Win64 support (untested)
- Use u_cpu_detect.h constants instead of #ifs

Changes in v2:
- Minimize #ifs
- Give a name to magic number CHANNELS_0001
- Add support for CPUs without SSE (only memcpy and swizzles, like non SSE2)
- Fixed comments

translate_sse is currently very limited to the point of
being useless in essentially all cases.

In particular, it only supports some float32 and unorm8
formats and doesn't work on x86-64.

This commit rewrites it to support:
1. Dumb memory copy for any pair of identical formats
2. All formats that are swizzles of each other
3. Converting 32/64-bit floats and all 8/16/32-bit integers to 32-bit float
4. Converting unorm8/snorm8 to snorm16 and uscaled8/sscaled8 to sscaled16
5. Support for x86-64 (doesn't take advantage of it in any way though)
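
As a scalar reference for what the integer-to-float conversions in the
list above mean, here is a plain C sketch. It is not the generated SSE
code, the example_* names are made up, and clamping snorm8 -128 to -1.0
is an assumed convention that the patch may or may not share.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* unorm8: 0..255 maps linearly onto [0, 1]. */
static float example_unorm8_to_float(uint8_t v)
{
   return (float)v / 255.0f;
}

/* snorm8: -127..127 maps onto [-1, 1]; -128 is clamped to -1
 * (assumed convention). */
static float example_snorm8_to_float(int8_t v)
{
   float f = (float)v / 127.0f;
   return f < -1.0f ? -1.0f : f;
}

/* sscaled8: the integer value is kept, only the type changes. */
static float example_sscaled8_to_float(int8_t v)
{
   return (float)v;
}

/* Convert one 4-channel unorm8 attribute to 4 floats. */
static void example_unorm8x4_to_float4(float dst[4], const uint8_t src[4])
{
   size_t i;
   assert(dst != NULL && src != NULL);
   for (i = 0; i < 4; i++)
      dst[i] = example_unorm8_to_float(src[i]);
}
```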

This new translate can even be useful to translate index buffers for
cards that lack 8-bit index support.
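
The index buffer case amounts to widening each element. A scalar sketch
of that operation, with a made-up function name and no claim about how a
driver would actually wire translate into its index path:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widen an 8-bit index buffer to 16 bits for hardware without 8-bit
 * index support. */
static void example_widen_indices(uint16_t *dst, const uint8_t *src, size_t count)
{
   size_t i;
   assert(count == 0 || (dst != NULL && src != NULL));
   for (i = 0; i < count; i++)
      dst[i] = src[i];   /* zero-extension: the index values are unchanged */
}
```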

It passes the testsuite I wrote, but note that this is a major change, and more
testing would be great.
---
 src/gallium/auxiliary/translate/translate_sse.c | 1162 ++-
 1 files changed, 924 insertions(+), 238 deletions(-)

diff --git a/src/gallium/auxiliary/translate/translate_sse.c 
b/src/gallium/auxiliary/translate/translate_sse.c
index f9aab92..565edd2 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -30,11 +30,13 @@
 #include "pipe/p_compiler.h"
 #include "util/u_memory.h"
 #include "util/u_math.h"
+#include "util/u_format.h"
+#include "util/u_cpu_detect.h"
 
 #include "translate.h"
 
 
-#if defined(PIPE_ARCH_X86)
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
 
 #include "rtasm/rtasm_cpu.h"
 #include "rtasm/rtasm_x86sse.h"
@@ -48,7 +50,7 @@
 
 struct translate_buffer {
const void *base_ptr;
-   unsigned stride;
+   uintptr_t stride;
unsigned max_index;
 };
 
@@ -72,12 +74,10 @@ struct translate_sse {
struct x86_function *func;
 
boolean loaded_identity;
-   boolean loaded_255;
-   boolean loaded_inv_255;
+   boolean loaded_const[5];
 
float identity[4];
-   float float_255[4];
-   float inv_255[4];
+   float const_value[5][4];
 
struct translate_buffer buffer[PIPE_MAX_ATTRIBS];
unsigned nr_buffers;
@@ -96,10 +96,12 @@ struct translate_sse {
 * like this is helpful to keep them in sync across the file.
 */
struct x86_reg tmp_EAX;
-   struct x86_reg idx_EBX; /* either start+i or &elt[i] */
-   struct x86_reg outbuf_ECX;
-   struct x86_reg machine_EDX;
-   struct x86_reg count_ESI;/* decrements to zero */
+   struct x86_reg tmp2_EDX;
+   struct x86_reg tmp3_ECX;
+   struct x86_reg idx_ESI; /* either start+i or &elt[i] */
+   struct x86_reg machine_EDI;
+   struct x86_reg outbuf_EBX;
+   struct x86_reg count_EBP;/* decrements to zero */
 };
 
 static int get_offset( const void *a, const void *b )
@@ -111,7 +113,7 @@ static int get_offset( const void *a, const void *b )
 
 static struct x86_reg get_identity( struct translate_sse *p )
 {
-   struct x86_reg reg = x86_make_reg(file_XMM, 6);
+   struct x86_reg reg = x86_make_reg(file_XMM, 7);
 
if (!p->loaded_identity) {
   p->loaded_identity = TRUE;
@@ -121,253 +123,910 @@ static struct x86_reg get_identity( struct 
translate_sse *p )
   p->identity[3] = 1;
 
   sse_movups(p->func, reg, 
-x86_make_disp(p->machine_EDX, 
+x86_make_disp(p->machine_EDI,
   get_offset(p, &p->identity[0])));
}
 
return reg;
 }
 
-static struct x86_reg get_255( struct translate_sse *p )
+static struct x86_reg get_const( struct translate_sse *p, unsigned i, float v)
 {
-   struct x86_reg reg = x86_make_reg(file_XMM, 7);
-
-   if (!p->loaded_255) {
-  p->loaded_255 = TRUE;
-  p->float_255[0] =
-p->float_255[1] =
-p->float_255[2] =
-p->float_255[3] = 255.0f;
-
-  sse_movups(p->func, reg, 
-x86_make_disp(p->machine_EDX, 
-  get_offset(p, &p->float_255[0])));
+   struct x86_reg reg = x86_make_reg(file_XMM, 2 + i);
+
+   if (!p->loaded_const[i]) {
+  p->loaded_const[i] = TRUE;
+  p->const_value[i][0] =
+ p->const_value[i][1] =
+ p->const_value[i][2] =
+ p->const_value[i][3] = v;
+
+  sse_movups(p->func, reg,
+ x86_make_disp(p->machine_EDI,
+   get_offset(p, &p->const_value[i][0])));
}
 
return reg;
 }
 
-static struct x86_reg get_inv_255( struct translate_sse *p )
+static struct x86_reg get_inv_127( struct translate_sse *p )
 {
-   struct x86_reg reg = x86_make_reg(file_XMM, 5);
-
-   if (!p->loaded_inv_255) {
-  p->loaded_inv_255 = TRUE;
-  p->inv_255[0] =
-p->inv_255[1] =
-p->inv_255[2] =
-p->inv_255[3] = 1.0f / 255.0f;
-
-  sse_movups(p->func, reg, 
-x86_make_disp(p->machine_EDX, 
-  get_offset(p, &p->inv_255[0

[Mesa-dev] [PATCH 6/6] translate_sse: major rewrite (v2)

2010-08-13 Thread Luca Barbieri
Changes in v2:
- Minimize #ifs
- Give a name to magic number CHANNELS_0001
- Add support for CPUs without SSE (only memcpy and swizzles, like non SSE2)
- Fixed comments

translate_sse is currently very limited to the point of
being useless in essentially all cases.

In particular, it only supports some float32 and unorm8
formats and doesn't work on x86-64.

This commit rewrites it to support:
1. Dumb memory copy for any pair of identical formats
2. All formats that are swizzles of each other
3. Converting 32/64-bit floats and all 8/16/32-bit integers to 32-bit float
4. Converting unorm8/snorm8 to snorm16 and uscaled8/sscaled8 to sscaled16
5. Support for x86-64 (doesn't take advantage of it in any way though)

This new translate can even be useful to translate index buffers for
cards that lack 8-bit index support.

It passes the testsuite I wrote, but note that this is a major change, and more
testing would be great.
---
 src/gallium/auxiliary/translate/translate_sse.c | 1154 ++-
 1 files changed, 920 insertions(+), 234 deletions(-)

diff --git a/src/gallium/auxiliary/translate/translate_sse.c 
b/src/gallium/auxiliary/translate/translate_sse.c
index f9aab92..e2d8d53 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -30,11 +30,13 @@
 #include "pipe/p_compiler.h"
 #include "util/u_memory.h"
 #include "util/u_math.h"
+#include "util/u_format.h"
+#include "util/u_cpu_detect.h"
 
 #include "translate.h"
 
 
-#if defined(PIPE_ARCH_X86)
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
 
 #include "rtasm/rtasm_cpu.h"
 #include "rtasm/rtasm_x86sse.h"
@@ -48,7 +50,7 @@
 
 struct translate_buffer {
const void *base_ptr;
-   unsigned stride;
+   uintptr_t stride;
unsigned max_index;
 };
 
@@ -72,12 +74,10 @@ struct translate_sse {
struct x86_function *func;
 
boolean loaded_identity;
-   boolean loaded_255;
-   boolean loaded_inv_255;
+   boolean loaded_const[5];
 
float identity[4];
-   float float_255[4];
-   float inv_255[4];
+   float const_value[5][4];
 
struct translate_buffer buffer[PIPE_MAX_ATTRIBS];
unsigned nr_buffers;
@@ -96,10 +96,12 @@ struct translate_sse {
 * like this is helpful to keep them in sync across the file.
 */
struct x86_reg tmp_EAX;
-   struct x86_reg idx_EBX; /* either start+i or &elt[i] */
-   struct x86_reg outbuf_ECX;
-   struct x86_reg machine_EDX;
-   struct x86_reg count_ESI;/* decrements to zero */
+   struct x86_reg tmp2_EDX;
+   struct x86_reg tmp3_ECX;
+   struct x86_reg idx_ESI; /* either start+i or &elt[i] */
+   struct x86_reg machine_EDI;
+   struct x86_reg outbuf_EBX;
+   struct x86_reg count_EBP;/* decrements to zero */
 };
 
 static int get_offset( const void *a, const void *b )
@@ -111,7 +113,7 @@ static int get_offset( const void *a, const void *b )
 
 static struct x86_reg get_identity( struct translate_sse *p )
 {
-   struct x86_reg reg = x86_make_reg(file_XMM, 6);
+   struct x86_reg reg = x86_make_reg(file_XMM, 7);
 
if (!p->loaded_identity) {
   p->loaded_identity = TRUE;
@@ -121,253 +123,909 @@ static struct x86_reg get_identity( struct 
translate_sse *p )
   p->identity[3] = 1;
 
   sse_movups(p->func, reg, 
-x86_make_disp(p->machine_EDX, 
+x86_make_disp(p->machine_EDI,
   get_offset(p, &p->identity[0])));
}
 
return reg;
 }
 
-static struct x86_reg get_255( struct translate_sse *p )
+static struct x86_reg get_const( struct translate_sse *p, unsigned i, float v)
 {
-   struct x86_reg reg = x86_make_reg(file_XMM, 7);
-
-   if (!p->loaded_255) {
-  p->loaded_255 = TRUE;
-  p->float_255[0] =
-p->float_255[1] =
-p->float_255[2] =
-p->float_255[3] = 255.0f;
-
-  sse_movups(p->func, reg, 
-x86_make_disp(p->machine_EDX, 
-  get_offset(p, &p->float_255[0])));
+   struct x86_reg reg = x86_make_reg(file_XMM, 2 + i);
+
+   if (!p->loaded_const[i]) {
+  p->loaded_const[i] = TRUE;
+  p->const_value[i][0] =
+ p->const_value[i][1] =
+ p->const_value[i][2] =
+ p->const_value[i][3] = v;
+
+  sse_movups(p->func, reg,
+ x86_make_disp(p->machine_EDI,
+   get_offset(p, &p->const_value[i][0])));
}
 
return reg;
 }
 
-static struct x86_reg get_inv_255( struct translate_sse *p )
+static struct x86_reg get_inv_127( struct translate_sse *p )
 {
-   struct x86_reg reg = x86_make_reg(file_XMM, 5);
-
-   if (!p->loaded_inv_255) {
-  p->loaded_inv_255 = TRUE;
-  p->inv_255[0] =
-p->inv_255[1] =
-p->inv_255[2] =
-p->inv_255[3] = 1.0f / 255.0f;
-
-  sse_movups(p->func, reg, 
-x86_make_disp(p->machine_EDX, 
-  get_offset(p, &p->inv_255[0])));
-   }
-
-   return reg;
+   return get_const(p, 0, 1.0f / 127.0f);
 }
 
-
-static vo

[Mesa-dev] [Bug 29540] [glsl2] problem with vertex attribute locations and draw-time validation

2010-08-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=29540

Ian Romanick changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|mesa-...@lists.freedesktop. |i...@freedesktop.org
                   |org                         |

--- Comment #3 from Ian Romanick 2010-08-13 10:27:19 PDT ---
(In reply to comment #1)

> 1. By returning attrib_loc=1 instead of 0 will we have one less user-defined
> vertex attribute available to users with the new compiler?  The query of
> GL_MAX_VERTEX_ATTRIBS_ARB still returns 16.

Attribute 0 is special.  It is bound to gl_Vertex by default.  Applications can
specifically bind to 0 using glBindAttribLocation, but the linker doesn't
automatically bind to it.  It's possible that the spec says we should bind
something to 0 if gl_Vertex isn't used, which is the case in this test.
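
A toy model of the policy described above may help; it is not the glsl2
linker. example_assign_locations and EXAMPLE_MAX_ATTRIBS are made-up
names, and rejecting two explicit bindings to the same slot is a
simplification. Explicit requests (as made with glBindAttribLocation)
may use slot 0, while automatic assignment starts at slot 1.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_MAX_ATTRIBS 16

/* loc[i] is an explicit location requested by the application, or -1 for
 * "let the linker choose". On success every entry holds its final
 * location and 0 is returned; -1 means a request was out of range,
 * collided with another one, or no slot was left. */
static int example_assign_locations(int *loc, size_t n)
{
   int used[EXAMPLE_MAX_ATTRIBS] = { 0 };
   size_t i;
   int slot;

   assert(loc != NULL || n == 0);

   /* Explicit bindings go first and are allowed to take slot 0. */
   for (i = 0; i < n; i++) {
      if (loc[i] < 0)
         continue;
      if (loc[i] >= EXAMPLE_MAX_ATTRIBS || used[loc[i]])
         return -1;
      used[loc[i]] = 1;
   }

   /* Automatic assignment starts at 1: slot 0 is never picked implicitly. */
   for (i = 0; i < n; i++) {
      if (loc[i] >= 0)
         continue;
      for (slot = 1; slot < EXAMPLE_MAX_ATTRIBS && used[slot]; slot++)
         ;
      if (slot == EXAMPLE_MAX_ATTRIBS)
         return -1;
      used[slot] = 1;
      loc[i] = slot;
   }
   return 0;
}
```

In this model only 15 attributes can be placed automatically, which is
the effect the question in comment #1 is asking about.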



Re: [Mesa-dev] talloc (Was: Merge criteria for glsl2 branch)

2010-08-13 Thread Ian Romanick
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

José Fonseca wrote:

> I've pushed a new branch glsl2-win32 that includes Aras' patch, and all
> necessary fixes to get at least MinGW build successfully.

I merged all of these.

> I had to rename some tokens in order to avoid collisions with windows.h
> defines. Aras didn't mention this problem before. Perhaps the indirect
> windows.h include can be avoided, or you prefer to handle this some
> other way.

I did the last one a little differently.  I already had CONST_TOK,
LAYOUT_TOK, and INLINE_TOK to avoid collisions with Linux headers and
other Mesa headers, so I appended _TOK to the colliding names here too.
 I think I got all of the ones from your patch, but you'll want to
double check.
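
For anyone wondering why the suffix is needed, a small sketch of the
collision follows. The two #defines stand in for a platform header
(windows.h defines both CONST and IN); the enum and example_token_name
are illustrative only, and IN_TOK is a hypothetical name, not necessarily
one of the tokens renamed in the branch.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for a platform header that defines short, common macro names. */
#define CONST const
#define IN

/* "enum example_token { CONST, IN };" would be preprocessed into
 * "enum example_token { const, };" and fail to compile. Suffixed names
 * are different identifiers, so the macros leave them alone. */
enum example_token {
   CONST_TOK = 258,
   IN_TOK
};

static const char *example_token_name(enum example_token t)
{
   switch (t) {
   case CONST_TOK: return "const";
   case IN_TOK:    return "in";
   }
   assert(0 && "unknown token");
   return "";
}
```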
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkxldFIACgkQX1gOwKyEAw8i5QCeMMVifl+qxhlaQW+Sh+dwxuz3
w7IAmQGuWFwOZCf8xNlprQqMC9yIqdwO
=O+IU
-END PGP SIGNATURE-


Re: [Mesa-dev] draw: Replace varray and vcache by vsplit

2010-08-13 Thread Chia-I Wu
On Fri, Aug 13, 2010 at 11:35 PM, Keith Whitwell  wrote:
> On Fri, 2010-08-13 at 08:09 -0700, Chia-I Wu wrote:
>> On Fri, Aug 13, 2010 at 10:51 PM, Keith Whitwell  wrote:
>> > On Fri, 2010-08-13 at 07:46 -0700, Chia-I Wu wrote:
>> >> On Fri, Aug 13, 2010 at 10:14 PM, Keith Whitwell wrote:
>> >> > On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote:
>> >> >> Hi,
>> >> >>
>> >> >> There are two primitive transformations in gallium draw module.  In
>> >> >> varray, primitives are "split"ted.  When a primitive has more vertices
>> >> >> than the middle end can handle, varray splits the primitive and calls
>> >> >> the middle end multiple times.
>> >> >>
>> >> >> In vcache, primitives are "decompose"d.  More advanced primitives are
>> >> >> decomposed into one of point, line(_adj), or triangle(_adj).
>> >> >> Similarly, vcache may call the middle end multiple times to flush its
>> >> >> internal buffer.  In some cases, vcache passes the primitives through
>> >> >> without decomposing or splitting, as can be seen in vcache_check_run.
>> >> >>
>> >> >> The issue with vcache is that it has to decompose a primitive
>> >> >> differently depending on the provoking convention, as explained in
>> >> >>
>> >> >>   http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html
>> >> >>
>> >> >> It becomes a problem when GS is active.
>> >> >>
>> >> >> My proposal is to make vcache split instead of decompose.  Because
>> >> >> varray only splits and vcache has a pass-through path, the rest of the
>> >> >> workflow already has to support all primitive types.  Switching from
>> >> >> decompose to split does not require a big change to the rest of the
>> >> >> workflow.
>> >> >>
>> >> >> But then vcache will look a lot like varray, only with indexed
>> >> >> primitive support.  It leads me to a new frontend that replaces both
>> >> >> varray and vcache: vsplit
>> >> >>
>> >> >>  http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit
>> >> >>
>> >> >> vsplit is based on varray.  It uses some code from vcache to support
>> >> >> indexed primitives.  When vcache decomposes, flags are set to
>> >> >> indicate whether the stipple counter should be reset or whether some
>> >> >> edge of a triangle should be omitted in unfilled mode.  The segments
>> >> >> of a split primitive have flags for similar purposes too:
>> >> >>
>> >> >>   DRAW_SPLIT_AFTER   More segments to come after this one
>> >> >>   DRAW_SPLIT_BEFORE  There are preceding segments
>> >> >>
>> >> >> These flags are set by vsplit and the middle ends pass them to the
>> >> >> other stages.  Therefore, the run methods of middle ends are augmented
>> >> >> to take the flags.
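
A rough sketch of the splitting idea for one primitive type, a line
strip. This is not the vsplit code: the struct, the function and the flag
values are assumptions, with only the meaning of the two flags taken
from the description above. Neighbouring segments share one vertex so
that the line joining them is not lost.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_SPLIT_BEFORE 0x1  /* there are preceding segments */
#define EXAMPLE_SPLIT_AFTER  0x2  /* more segments come after this one */

struct example_segment {
   unsigned start;   /* first vertex of the segment */
   unsigned count;   /* number of vertices in the segment */
   unsigned flags;
};

/* Split a line strip of `count` vertices into segments of at most
 * `max_verts` vertices each. Returns the number of segments written, or 0
 * if there is nothing to draw or `out` is too small. */
static size_t example_split_line_strip(unsigned count, unsigned max_verts,
                                       struct example_segment *out,
                                       size_t out_len)
{
   size_t n = 0;
   unsigned start = 0;

   assert(max_verts >= 2);
   if (count < 2)
      return 0;

   for (;;) {
      unsigned remaining = count - start;
      unsigned seg = remaining < max_verts ? remaining : max_verts;
      unsigned flags = 0;

      if (n == out_len)
         return 0;
      if (start > 0)
         flags |= EXAMPLE_SPLIT_BEFORE;
      if (start + seg < count)
         flags |= EXAMPLE_SPLIT_AFTER;

      out[n].start = start;
      out[n].count = seg;
      out[n].flags = flags;
      n++;

      if (start + seg >= count)
         break;
      start += seg - 1;   /* repeat the last vertex in the next segment */
   }
   return n;
}
```

A consumer could, for example, reset the stipple counter only for a
segment that does not carry the BEFORE flag.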
>> >> >>
>> >> >> To summarize, vsplit
>> >> >>
>> >> >>  - fixes GS when (flatshade && flatshade_first) is on
>> >> >>  - never sends more vertices than the middle end claims to handle
>> >> >>  - is faster than vcache: split instead of decompose, no get_elt
>> >> >>    calls
>> >> >>  - no longer uses the higher bits of draw_elts for stipple/edge flags
>> >> >>
>> >> >> Suggestions?
>> >> >
>> >> >
>> >> > Hi - I haven't looked at the patches yet, but a couple of questions:
>> >> >
>> >> > How does this interact with the draw_pipe_* code - which requires
>> >> > decomposed primitives?
>> >> draw_pipe.c decomposes the primitives.  It is there before because it
>> >> has to support varray and vcache_check_run which do not decompose.
>> >
>> > OK.
>> >
>> >> > How does this cope with indexed rendering where the vertex buffers
>> >> > themselves are too large (for hardware or some other entity)?  Eg.
>> >> > imagine the hardware could cope with up to 64k vertices, and you have a
>> >> > drawelements call randomly referencing vertices in range 0..128k ?
>> >> Vertex fetching happens in the middle end so the range of the indices
>> >> is not a problem.  Though vsplit guarantees that it never calls the
>> >> middle end with more vertices than the middle end claims to support
>> >> (as returned by draw_pt_middle_end::prepare).  The limit is usually
>> >> decidied by the size of the buffer for vertex emitting.
>> >
>> > I guess I'm wondering how it does this.  If the middle end says it
>> > supports 64k vertices, and the vertex element looks like
>> >
>> >  [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
>> >
>> > what gets sent?  (Sorry, I still haven't looked at the code, you could
>> > well have addressed this).
>> I see.  The frontend would set
>>
>>    fetch_elts = [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
>>    draw_elts = [0, 1, 2, 3, 4, 5, 6, ...]
>>
>> fetch_elts is processed by the middle end and it will fetch the given
>> vertices.  draw_elts will be passed to draw_emit or the pipeline.  It
>> is the new index buffer, which indexes into the fetched vertices.
>>
>> It is actual the same as vcache.  So when fetch_elts is
>>
>>    [0, 128k, 64k, 64k, 128k, 16k, ...],
>>
>> draw_elts would be set to
>>
>>    [0, 1, 2, 2, 1, 3, ...]
>>
>> The number of elements to fetch (and sh

Re: [Mesa-dev] draw: Replace varray and vcache by vsplit

2010-08-13 Thread Keith Whitwell
On Fri, 2010-08-13 at 08:09 -0700, Chia-I Wu wrote:
> On Fri, Aug 13, 2010 at 10:51 PM, Keith Whitwell  wrote:
> > On Fri, 2010-08-13 at 07:46 -0700, Chia-I Wu wrote:
> >> On Fri, Aug 13, 2010 at 10:14 PM, Keith Whitwell  wrote:
> >> > On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote:
> >> >> Hi,
> >> >>
> >> >> There are two primitive transformations in gallium draw module.  In
> >> >> varray, primitives are "split"ted.  When a primitive has more vertices
> >> >> than the middle end can handle, varray splits the primitive and calls
> >> >> the middle end multiple times.
> >> >>
> >> >> In vcache, primitives are "decompose"d.  More advanced primitives are
> >> >> decomposed into one of point, line(_adj), or triangle(_adj).
> >> >> Similarly, vcache may call the middle end multiple times to flush its
> >> >> internal buffer.  In some cases, vcache passes the primitves through
> >> >> without decomposing nor splitting, as can be seen in vcache_check_run.
> >> >>
> >> >> The issue with vcache is that it has to decompose a primitive
> >> >> differently depending on the provoking convention, as explained in
> >> >>
> >> >>   http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html
> >> >>
> >> >> It becomes a problem when GS is active.
> >> >>
> >> >> My proposal is to make vcache split instead of decompose.  Because
> >> >> varray only splits and vcache has a pass-through path, the rest of the
> >> >> workflow already has to support all primitive types.  Switching from
> >> >> decompose to split does not require a big change to the rest of the
> >> >> workflow.
> >> >>
> >> >> But then vcache will look a lot like varray, only with indexed
> >> >> primitive support.  It leads me to a new frontend that replaces both
> >> >> varray and vcache: vsplit
> >> >>
> >> >>  http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit
> >> >>
> >> >> vsplit is based on varray.  It uses some code from vcache to support
> >> >> indexed primitives.  When vcache decomposes, there are flags being set
> >> >> to indicate that if the stipple counter should be reset or if some
> >> >> edge of a triangle should be omitted in unfilled mode.  The segments
> >> >> of a splitted primitive have flags for similar purposes too:
> >> >>
> >> >>   DRAW_SPLIT_AFTER   More segments to come after this one
> >> >>   DRAW_SPLIT_BEFORE  There are preceding segments
> >> >>
> >> >> These flags are set by vsplit and the middle ends pass them to the
> >> >> other stages.  Therefore, the run methods of middle ends are augmented
> >> >> to take the flags.
> >> >>
> >> >> To summarize, vsplit
> >> >>
> >> >>  - fixes GS when (flatshade && flatshade_first) is on
> >> >>  - never sends more vertices than the middle end claims to handle
> >> >>  - is faster than vcache: split instead of decompose, no get_elt
> >> >>    calls
> >> >>  - no longer uses the higher bits of draw_elts for stipple/edge flags
> >> >>
> >> >> Suggestions?
> >> >
> >> >
> >> > Hi - I haven't looked at the patches yet, but a couple of questions:
> >> >
> >> > How does this interact with the draw_pipe_* code - which requires
> >> > decomposed primitives?
> >> draw_pipe.c decomposes the primitives.  It is there before because it
> >> has to support varray and vcache_check_run which do not decompose.
> >
> > OK.
> >
> >> > How does this cope with indexed rendering where the vertex buffers
> >> > themselves are too large (for hardware or some other entity)?  Eg.
> >> > imagine the hardware could cope with up to 64k vertices, and you have a
> >> > drawelements call randomly referencing vertices in range 0..128k ?
> >> Vertex fetching happens in the middle end so the range of the indices
> >> is not a problem.  Though vsplit guarantees that it never calls the
> >> middle end with more vertices than the middle end claims to support
> >> (as returned by draw_pt_middle_end::prepare).  The limit is usually
> >> decidied by the size of the buffer for vertex emitting.
> >
> > I guess I'm wondering how it does this.  If the middle end says it
> > supports 64k vertices, and the vertex element looks like
> >
> >  [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
> >
> > what gets sent?  (Sorry, I still haven't looked at the code, you could
> > well have addressed this).
> I see.  The frontend would set
> 
>    fetch_elts = [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
>    draw_elts = [0, 1, 2, 3, 4, 5, 6, ...]
> 
> fetch_elts is processed by the middle end and it will fetch the given
> vertices.  draw_elts will be passed to draw_emit or the pipeline.  It
> is the new index buffer, which indexes into the fetched vertices.
> 
> It is actual the same as vcache.  So when fetch_elts is
> 
>    [0, 128k, 64k, 64k, 128k, 16k, ...],
> 
> draw_elts would be set to
> 
>    [0, 1, 2, 2, 1, 3, ...]
> 
> The number of elements to fetch (and shade) is minimized.

Thanks Chia-I, I've taken a look at the code & this makes sense - the
fetch/draw cache is still there, but specialized into 4 versions for
each element t

Re: [Mesa-dev] draw: Replace varray and vcache by vsplit

2010-08-13 Thread Chia-I Wu
On Fri, Aug 13, 2010 at 11:09 PM, Chia-I Wu  wrote:
> On Fri, Aug 13, 2010 at 10:51 PM, Keith Whitwell  wrote:
>> On Fri, 2010-08-13 at 07:46 -0700, Chia-I Wu wrote:
>>> On Fri, Aug 13, 2010 at 10:14 PM, Keith Whitwell  wrote:
>>> > On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote:
>>> >> Hi,
>>> >>
>>> >> There are two primitive transformations in gallium draw module.  In
>>> >> varray, primitives are "split"ted.  When a primitive has more vertices
>>> >> than the middle end can handle, varray splits the primitive and calls
>>> >> the middle end multiple times.
>>> >>
>>> >> In vcache, primitives are "decompose"d.  More advanced primitives are
>>> >> decomposed into one of point, line(_adj), or triangle(_adj).
>>> >> Similarly, vcache may call the middle end multiple times to flush its
>>> >> internal buffer.  In some cases, vcache passes the primitves through
>>> >> without decomposing nor splitting, as can be seen in vcache_check_run.
>>> >>
>>> >> The issue with vcache is that it has to decompose a primitive
>>> >> differently depending on the provoking convention, as explained in
>>> >>
>>> >>   http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html
>>> >>
>>> >> It becomes a problem when GS is active.
>>> >>
>>> >> My proposal is to make vcache split instead of decompose.  Because
>>> >> varray only splits and vcache has a pass-through path, the rest of the
>>> >> workflow already has to support all primitive types.  Switching from
>>> >> decompose to split does not require a big change to the rest of the
>>> >> workflow.
>>> >>
>>> >> But then vcache will look a lot like varray, only with indexed
>>> >> primitive support.  It leads me to a new frontend that replaces both
>>> >> varray and vcache: vsplit
>>> >>
>>> >>  http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit
>>> >>
>>> >> vsplit is based on varray.  It uses some code from vcache to support
>>> >> indexed primitives.  When vcache decomposes, there are flags being set
>>> >> to indicate that if the stipple counter should be reset or if some
>>> >> edge of a triangle should be omitted in unfilled mode.  The segments
>>> >> of a splitted primitive have flags for similar purposes too:
>>> >>
>>> >>   DRAW_SPLIT_AFTER   More segments to come after this one
>>> >>   DRAW_SPLIT_BEFORE  There are preceding segments
>>> >>
>>> >> These flags are set by vsplit and the middle ends pass them to the
>>> >> other stages.  Therefore, the run methods of middle ends are augmented
>>> >> to take the flags.
>>> >>
>>> >> To summarize, vsplit
>>> >>
>>> >>  - fixes GS when (flatshade && flatshade_first) is on
>>> >>  - never sends more vertices than the middle end claims to handle
>>> >>  - is faster than vcache: split instead of decompose, no get_elt
>>> >>    calls
>>> >>  - no longer uses the higher bits of draw_elts for stipple/edge flags
>>> >>
>>> >> Suggestions?
>>> >
>>> >
>>> > Hi - I haven't looked at the patches yet, but a couple of questions:
>>> >
>>> > How does this interact with the draw_pipe_* code - which requires
>>> > decomposed primitives?
>>> draw_pipe.c decomposes the primitives.  It is there before because it
>>> has to support varray and vcache_check_run which do not decompose.
>>
>> OK.
>>
>>> > How does this cope with indexed rendering where the vertex buffers
>>> > themselves are too large (for hardware or some other entity)?  Eg.
>>> > imagine the hardware could cope with up to 64k vertices, and you have a
>>> > drawelements call randomly referencing vertices in range 0..128k ?
>>> Vertex fetching happens in the middle end so the range of the indices
>>> is not a problem.  Though vsplit guarantees that it never calls the
>>> middle end with more vertices than the middle end claims to support
>>> (as returned by draw_pt_middle_end::prepare).  The limit is usually
>>> decidied by the size of the buffer for vertex emitting.
>>
>> I guess I'm wondering how it does this.  If the middle end says it
>> supports 64k vertices, and the vertex element looks like
>>
>>  [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
>>
>> what gets sent?  (Sorry, I still haven't looked at the code, you could
>> well have addressed this).
> I see.  The frontend would set
>
>   fetch_elts = [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
>   draw_elts = [0, 1, 2, 3, 4, 5, 6, ...]
>
> fetch_elts is processed by the middle end and it will fetch the given
> vertices.  draw_elts will be passed to draw_emit or the pipeline.  It
> is the new index buffer, which indexes into the fetched vertices.
>
> It is actual the same as vcache.  So when fetch_elts is
Should be:  So when the index buffer looks like
>   [0, 128k, 64k, 64k, 128k, 16k, ...],

fetch_elts would be set to

  [0, 128k, 64k, 16k, ...] and
> draw_elts would be set to
>
>   [0, 1, 2, 2, 1, 3, ...]
>
> The number of elements to fetch (and shade) is minimized.
>
> --
> o...@lunarg.com
>



-- 
o...@lunarg.com

[Mesa-dev] [Bug 29540] [glsl2] problem with vertex attribute locations and draw-time validation

2010-08-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=29540

--- Comment #2 from Brian Paul 2010-08-13 08:10:00 PDT ---
Created an attachment (id=37848)
 View: https://bugs.freedesktop.org/attachment.cgi?id=37848
 Review: https://bugs.freedesktop.org/review?bug=29540&attachment=37848

Patch to fix glDrawArrays/Elements validation

If there are no objections to this, I'll commit later.



Re: [Mesa-dev] draw: Replace varray and vcache by vsplit

2010-08-13 Thread Chia-I Wu
On Fri, Aug 13, 2010 at 10:51 PM, Keith Whitwell  wrote:
> On Fri, 2010-08-13 at 07:46 -0700, Chia-I Wu wrote:
>> On Fri, Aug 13, 2010 at 10:14 PM, Keith Whitwell  wrote:
>> > On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote:
>> >> Hi,
>> >>
>> >> There are two primitive transformations in gallium draw module.  In
>> >> varray, primitives are "split"ted.  When a primitive has more vertices
>> >> than the middle end can handle, varray splits the primitive and calls
>> >> the middle end multiple times.
>> >>
>> >> In vcache, primitives are "decompose"d.  More advanced primitives are
>> >> decomposed into one of point, line(_adj), or triangle(_adj).
>> >> Similarly, vcache may call the middle end multiple times to flush its
>> >> internal buffer.  In some cases, vcache passes the primitves through
>> >> without decomposing nor splitting, as can be seen in vcache_check_run.
>> >>
>> >> The issue with vcache is that it has to decompose a primitive
>> >> differently depending on the provoking convention, as explained in
>> >>
>> >>   http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html
>> >>
>> >> It becomes a problem when GS is active.
>> >>
>> >> My proposal is to make vcache split instead of decompose.  Because
>> >> varray only splits and vcache has a pass-through path, the rest of the
>> >> workflow already has to support all primitive types.  Switching from
>> >> decompose to split does not require a big change to the rest of the
>> >> workflow.
>> >>
>> >> But then vcache will look a lot like varray, only with indexed
>> >> primitive support.  It leads me to a new frontend that replaces both
>> >> varray and vcache: vsplit
>> >>
>> >>  http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit
>> >>
>> >> vsplit is based on varray.  It uses some code from vcache to support
>> >> indexed primitives.  When vcache decomposes, there are flags being set
>> >> to indicate that if the stipple counter should be reset or if some
>> >> edge of a triangle should be omitted in unfilled mode.  The segments
>> >> of a splitted primitive have flags for similar purposes too:
>> >>
>> >>   DRAW_SPLIT_AFTER   More segments to come after this one
>> >>   DRAW_SPLIT_BEFORE  There are preceding segments
>> >>
>> >> These flags are set by vsplit and the middle ends pass them to the
>> >> other stages.  Therefore, the run methods of middle ends are augmented
>> >> to take the flags.
>> >>
>> >> To summarize, vsplit
>> >>
>> >>  - fixes GS when (flatshade && flatshade_first) is on
>> >>  - never sends more vertices than the middle end claims to handle
>> >>  - is faster than vcache: split instead of decompose, no get_elt
>> >>    calls
>> >>  - no longer uses the higher bits of draw_elts for stipple/edge flags
>> >>
>> >> Suggestions?
>> >
>> >
>> > Hi - I haven't looked at the patches yet, but a couple of questions:
>> >
>> > How does this interact with the draw_pipe_* code - which requires
>> > decomposed primitives?
>> draw_pipe.c decomposes the primitives.  It is there before because it
>> has to support varray and vcache_check_run which do not decompose.
>
> OK.
>
>> > How does this cope with indexed rendering where the vertex buffers
>> > themselves are too large (for hardware or some other entity)?  Eg.
>> > imagine the hardware could cope with up to 64k vertices, and you have a
>> > drawelements call randomly referencing vertices in range 0..128k ?
>> Vertex fetching happens in the middle end so the range of the indices
>> is not a problem.  Though vsplit guarantees that it never calls the
>> middle end with more vertices than the middle end claims to support
>> (as returned by draw_pt_middle_end::prepare).  The limit is usually
>> decidied by the size of the buffer for vertex emitting.
>
> I guess I'm wondering how it does this.  If the middle end says it
> supports 64k vertices, and the vertex element looks like
>
>  [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
>
> what gets sent?  (Sorry, I still haven't looked at the code, you could
> well have addressed this).
I see.  The frontend would set

   fetch_elts = [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
   draw_elts = [0, 1, 2, 3, 4, 5, 6, ...]

fetch_elts is processed by the middle end and it will fetch the given
vertices.  draw_elts will be passed to draw_emit or the pipeline.  It
is the new index buffer, which indexes into the fetched vertices.

It is actually the same as vcache.  So when fetch_elts is

   [0, 128k, 64k, 64k, 128k, 16k, ...],

draw_elts would be set to

   [0, 1, 2, 2, 1, 3, ...]

The number of elements to fetch (and shade) is minimized.
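The remapping can be sketched in C as below.  This is only an
illustration, not the vsplit code: remap_elts is a hypothetical helper,
the 16-bit type chosen for draw_elts is an assumption, and the linear
search stands in for whatever cache the real frontend uses.  It follows
the correction posted in the follow-up mail, where the index buffer
[0, 128k, 64k, 64k, 128k, 16k] gives fetch_elts = [0, 128k, 64k, 16k]
and draw_elts = [0, 1, 2, 2, 1, 3].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the Mesa implementation: build fetch_elts
 * (the unique vertices to fetch) and draw_elts (indices into
 * fetch_elts) from an index buffer.  Both output arrays must have room
 * for `count` entries.  Returns the number of fetch_elts entries. */
static size_t
remap_elts(const unsigned *elts, size_t count,
           unsigned *fetch_elts, unsigned short *draw_elts)
{
   size_t nr_fetch = 0;

   for (size_t i = 0; i < count; i++) {
      size_t j;

      /* Linear search for clarity only; this is an assumption of the
       * sketch, a real frontend would look the index up in a cache. */
      for (j = 0; j < nr_fetch; j++) {
         if (fetch_elts[j] == elts[i])
            break;
      }

      /* First time this vertex is seen: schedule it for fetching. */
      if (j == nr_fetch)
         fetch_elts[nr_fetch++] = elts[i];

      /* draw_elts is 16 bits wide in this sketch. */
      assert(j <= 0xffff);
      draw_elts[i] = (unsigned short) j;
   }

   return nr_fetch;
}
```

Each distinct vertex is fetched (and shaded) once, however often the
index buffer refers to it, and draw_elts never holds a value larger
than the number of fetched vertices.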

-- 
o...@lunarg.com


[Mesa-dev] [Bug 29540] [glsl2] problem with vertex attribute locations and draw-time validation

2010-08-13 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=29540

--- Comment #1 from Brian Paul 2010-08-13 08:09:05 PDT ---
The new piglit test glsl-getattriblocation returns attrib_loc=0 for the
old compiler and attrib_loc=1 for the new compiler.

This causes the glDrawArrays/Elements() validation test to fail at
api_validate.c:124 so nothing is drawn.

There are two issues here.

1. By returning attrib_loc=1 instead of 0, will we have one fewer
user-defined vertex attribute available to users with the new compiler?
The query of GL_MAX_VERTEX_ATTRIBS_ARB still returns 16.

2. Mesa's glDrawArrays/Elements() validation check is incorrect.  I've added
two other piglit tests (glsl-bindattriblocation and glsl-novertexdata) that
test this logic.  They pass with NVIDIA's driver but fail with Mesa.  The
attached patch, however, fixes the problem for Mesa (for the old drivers
at least; it doesn't work with gallium yet).



Re: [Mesa-dev] draw: Replace varray and vcache by vsplit

2010-08-13 Thread Keith Whitwell
On Fri, 2010-08-13 at 07:46 -0700, Chia-I Wu wrote:
> On Fri, Aug 13, 2010 at 10:14 PM, Keith Whitwell  wrote:
> > On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote:
> >> Hi,
> >>
> >> There are two primitive transformations in gallium draw module.  In
> >> varray, primitives are "split"ted.  When a primitive has more vertices
> >> than the middle end can handle, varray splits the primitive and calls
> >> the middle end multiple times.
> >>
> >> In vcache, primitives are "decompose"d.  More advanced primitives are
> >> decomposed into one of point, line(_adj), or triangle(_adj).
> >> Similarly, vcache may call the middle end multiple times to flush its
> >> internal buffer.  In some cases, vcache passes the primitves through
> >> without decomposing nor splitting, as can be seen in vcache_check_run.
> >>
> >> The issue with vcache is that it has to decompose a primitive
> >> differently depending on the provoking convention, as explained in
> >>
> >>   http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html
> >>
> >> It becomes a problem when GS is active.
> >>
> >> My proposal is to make vcache split instead of decompose.  Because
> >> varray only splits and vcache has a pass-through path, the rest of the
> >> workflow already has to support all primitive types.  Switching from
> >> decompose to split does not require a big change to the rest of the
> >> workflow.
> >>
> >> But then vcache will look a lot like varray, only with indexed
> >> primitive support.  It leads me to a new frontend that replaces both
> >> varray and vcache: vsplit
> >>
> >>  http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit
> >>
> >> vsplit is based on varray.  It uses some code from vcache to support
> >> indexed primitives.  When vcache decomposes, there are flags being set
> >> to indicate that if the stipple counter should be reset or if some
> >> edge of a triangle should be omitted in unfilled mode.  The segments
> >> of a splitted primitive have flags for similar purposes too:
> >>
> >>   DRAW_SPLIT_AFTER   More segments to come after this one
> >>   DRAW_SPLIT_BEFORE  There are preceding segments
> >>
> >> These flags are set by vsplit and the middle ends pass them to the
> >> other stages.  Therefore, the run methods of middle ends are augmented
> >> to take the flags.
> >>
> >> To summarize, vsplit
> >>
> >>  - fixes GS when (flatshade && flatshade_first) is on
> >>  - never sends more vertices than the middle end claims to handle
> >>  - is faster than vcache: split instead of decompose, no get_elt
> >>    calls
> >>  - no longer uses the higher bits of draw_elts for stipple/edge flags
> >>
> >> Suggestions?
> >
> >
> > Hi - I haven't looked at the patches yet, but a couple of questions:
> >
> > How does this interact with the draw_pipe_* code - which requires
> > decomposed primitives?
> draw_pipe.c decomposes the primitives.  It is there before because it
> has to support varray and vcache_check_run which do not decompose.

OK.

> > How does this cope with indexed rendering where the vertex buffers
> > themselves are too large (for hardware or some other entity)?  Eg.
> > imagine the hardware could cope with up to 64k vertices, and you have a
> > drawelements call randomly referencing vertices in range 0..128k ?
> Vertex fetching happens in the middle end so the range of the indices
> is not a problem.  Though vsplit guarantees that it never calls the
> middle end with more vertices than the middle end claims to support
> (as returned by draw_pt_middle_end::prepare).  The limit is usually
> decidied by the size of the buffer for vertex emitting.

I guess I'm wondering how it does this.  If the middle end says it
supports 64k vertices, and the vertex element looks like

  [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]

what gets sent?  (Sorry, I still haven't looked at the code, you could
well have addressed this).

Keith





Re: [Mesa-dev] draw: Replace varray and vcache by vsplit

2010-08-13 Thread Chia-I Wu
On Fri, Aug 13, 2010 at 10:14 PM, Keith Whitwell  wrote:
> On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote:
>> Hi,
>>
>> There are two primitive transformations in gallium draw module.  In
>> varray, primitives are "split"ted.  When a primitive has more vertices
>> than the middle end can handle, varray splits the primitive and calls
>> the middle end multiple times.
>>
>> In vcache, primitives are "decompose"d.  More advanced primitives are
>> decomposed into one of point, line(_adj), or triangle(_adj).
>> Similarly, vcache may call the middle end multiple times to flush its
>> internal buffer.  In some cases, vcache passes the primitves through
>> without decomposing nor splitting, as can be seen in vcache_check_run.
>>
>> The issue with vcache is that it has to decompose a primitive
>> differently depending on the provoking convention, as explained in
>>
>>   http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html
>>
>> It becomes a problem when GS is active.
>>
>> My proposal is to make vcache split instead of decompose.  Because
>> varray only splits and vcache has a pass-through path, the rest of the
>> workflow already has to support all primitive types.  Switching from
>> decompose to split does not require a big change to the rest of the
>> workflow.
>>
>> But then vcache will look a lot like varray, only with indexed
>> primitive support.  It leads me to a new frontend that replaces both
>> varray and vcache: vsplit
>>
>>  http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit
>>
>> vsplit is based on varray.  It uses some code from vcache to support
>> indexed primitives.  When vcache decomposes, there are flags being set
>> to indicate that if the stipple counter should be reset or if some
>> edge of a triangle should be omitted in unfilled mode.  The segments
>> of a splitted primitive have flags for similar purposes too:
>>
>>   DRAW_SPLIT_AFTER   More segments to come after this one
>>   DRAW_SPLIT_BEFORE  There are preceding segments
>>
>> These flags are set by vsplit and the middle ends pass them to the
>> other stages.  Therefore, the run methods of middle ends are augmented
>> to take the flags.
>>
>> To summarize, vsplit
>>
>>  - fixes GS when (flatshade && flatshade_first) is on
>>  - never sends more vertices than the middle end claims to handle
>>  - is faster than vcache: split instead of decompose, no get_elt
>>    calls
>>  - no longer uses the higher bits of draw_elts for stipple/edge flags
>>
>> Suggestions?
>
>
> Hi - I haven't looked at the patches yet, but a couple of questions:
>
> How does this interact with the draw_pipe_* code - which requires
> decomposed primitives?
draw_pipe.c decomposes the primitives.  That code was already there,
because it has to support varray and vcache_check_run, which do not
decompose.
> How does this cope with indexed rendering where the vertex buffers
> themselves are too large (for hardware or some other entity)?  Eg.
> imagine the hardware could cope with up to 64k vertices, and you have a
> drawelements call randomly referencing vertices in range 0..128k ?
Vertex fetching happens in the middle end, so the range of the indices
is not a problem.  vsplit does guarantee, though, that it never calls
the middle end with more vertices than the middle end claims to support
(as returned by draw_pt_middle_end::prepare).  The limit is usually
decided by the size of the buffer for vertex emitting.

-- 
o...@lunarg.com


Re: [Mesa-dev] talloc (Was: Merge criteria for glsl2 branch)

2010-08-13 Thread Aras Pranckevicius
> I had to rename some tokens in order to avoid collisions with windows.h
> defines. Aras didn't mention this problem before.

I mentioned this to Eric in private conversation, but on this list I
only talked about talloc specific changes.

Yeah, in the glsl2 parser some tokens clash with windows headers
(BOOL, INPUT etc.), and <windows.h> is indirectly included by Mesa's
own gl.h. I've been doing renames of them in my own fork.

Maybe the bulk of glsl2 does not need to include that much of Mesa itself
(e.g. right now you need to have things like src/mapi/mapi/u_thread.h
and src/mesa/math/m_matrix.h - because they are included indirectly by
something that glsl2 includes).


-- 
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info


Re: [Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64

2010-08-13 Thread Luca Barbieri
> #define symbols are usually uppercase.  Is there a reason why you're using
> lowercase?  I'd prefer the code to follow typical conventions.

Currently they can't be used in preprocessor directives because they
evaluate to enums, so it seems better to treat them as variables, and
thus lowercase.

What I think might make sense is to turn them into static consts
rather than making them uppercase and turning the enum into #defines,
since PIPE_ARCH_* (and the more standard __i386__, etc.) can already
be used in preprocessor directives.
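
The difference can be shown with a small C sketch.  All names here
(example_arch, EXAMPLE_ARCH_X86, example_cpu_arch) are made up for
illustration and are not the identifiers from the patch.  The
preprocessor does not know about enumerators, so inside #if they are
treated like any other non-macro identifier and evaluate to 0, while a
static const works in ordinary code.

```c
#include <assert.h>

/* Hypothetical names for illustration only. */
enum example_arch {
   EXAMPLE_ARCH_UNKNOWN = 0,
   EXAMPLE_ARCH_X86 = 1
};

/* A static const is an ordinary variable: usable in normal code. */
static const enum example_arch example_cpu_arch = EXAMPLE_ARCH_X86;

/* EXAMPLE_ARCH_X86 is an enumerator, not a macro.  An identifier that
 * is not a macro evaluates to 0 in #if, so this branch is never taken
 * even though the enumerator's value is 1. */
#if EXAMPLE_ARCH_X86
#define EXAMPLE_PREPROCESSOR_SAW_X86 1
#else
#define EXAMPLE_PREPROCESSOR_SAW_X86 0
#endif

static int
example_is_x86(void)
{
   /* A plain comparison against the constant works as expected. */
   return example_cpu_arch == EXAMPLE_ARCH_X86;
}

static void
example_check(void)
{
   assert(EXAMPLE_PREPROCESSOR_SAW_X86 == 0);
   assert(example_is_x86());
}
```

Some compilers can warn about the #if above (for example with
-Wundef), which is one more reason to keep such values out of
preprocessor tests.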


Re: [Mesa-dev] draw: Replace varray and vcache by vsplit

2010-08-13 Thread Keith Whitwell
On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote:
> Hi,
> 
> There are two primitive transformations in gallium draw module.  In
> varray, primitives are "split"ted.  When a primitive has more vertices
> than the middle end can handle, varray splits the primitive and calls
> the middle end multiple times.
> 
> In vcache, primitives are "decompose"d.  More advanced primitives are
> decomposed into one of point, line(_adj), or triangle(_adj).
> Similarly, vcache may call the middle end multiple times to flush its
> internal buffer.  In some cases, vcache passes the primitves through
> without decomposing nor splitting, as can be seen in vcache_check_run.
> 
> The issue with vcache is that it has to decompose a primitive
> differently depending on the provoking convention, as explained in
> 
>   http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html
> 
> It becomes a problem when GS is active.
> 
> My proposal is to make vcache split instead of decompose.  Because
> varray only splits and vcache has a pass-through path, the rest of the
> workflow already has to support all primitive types.  Switching from
> decompose to split does not require a big change to the rest of the
> workflow.
> 
> But then vcache will look a lot like varray, only with indexed
> primitive support.  It leads me to a new frontend that replaces both
> varray and vcache: vsplit
> 
>  http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit
> 
> vsplit is based on varray.  It uses some code from vcache to support
> indexed primitives.  When vcache decomposes, there are flags being set
> to indicate that if the stipple counter should be reset or if some
> edge of a triangle should be omitted in unfilled mode.  The segments
> of a splitted primitive have flags for similar purposes too:
> 
>   DRAW_SPLIT_AFTER   More segments to come after this one
>   DRAW_SPLIT_BEFORE  There are preceding segments
> 
> These flags are set by vsplit and the middle ends pass them to the
> other stages.  Therefore, the run methods of middle ends are augmented
> to take the flags.
> 
> To summarize, vsplit
> 
>  - fixes GS when (flatshade && flatshade_first) is on
>  - never sends more vertices than the middle end claims to handle
>  - is faster than vcache: split instead of decompose, no get_elt
>    calls
>  - no longer uses the higher bits of draw_elts for stipple/edge flags
> 
> Suggestions?


Hi - I haven't looked at the patches yet, but a couple of questions:

How does this interact with the draw_pipe_* code - which requires
decomposed primitives?

How does this cope with indexed rendering where the vertex buffers
themselves are too large (for hardware or some other entity)?  Eg.
imagine the hardware could cope with up to 64k vertices, and you have a
drawelements call randomly referencing vertices in range 0..128k ?

Keith




[Mesa-dev] draw: Replace varray and vcache by vsplit

2010-08-13 Thread Chia-I Wu
Hi,

There are two primitive transformations in the gallium draw module.  In
varray, primitives are split.  When a primitive has more vertices
than the middle end can handle, varray splits the primitive and calls
the middle end multiple times.

In vcache, primitives are "decompose"d.  More advanced primitives are
decomposed into one of point, line(_adj), or triangle(_adj).
Similarly, vcache may call the middle end multiple times to flush its
internal buffer.  In some cases, vcache passes the primitives through
without decomposing or splitting, as can be seen in vcache_check_run.

The issue with vcache is that it has to decompose a primitive
differently depending on the provoking convention, as explained in

  http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html

It becomes a problem when GS is active.
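
To illustrate why decomposition depends on the provoking convention,
here is a minimal sketch of breaking a triangle strip into independent
triangles.  It is not the vcache code; decompose_tristrip and its
parameters are hypothetical names.  On odd triangles two vertices have
to be swapped to keep the winding, and which two are swapped depends on
whether the first or the last vertex is the provoking one.

```c
#include <assert.h>

/* Hypothetical helper, for illustration only: write the three strip
 * indices of triangle number `i` of a triangle strip into `out`.
 *
 * Even triangles are (i, i+1, i+2).  Odd triangles need two vertices
 * swapped to preserve winding; the provoking vertex must stay in its
 * slot (first or last), so the swap differs per convention.
 */
static void
decompose_tristrip(unsigned i, int flatshade_first, unsigned out[3])
{
   assert(out);

   if ((i & 1) == 0) {
      out[0] = i;     out[1] = i + 1; out[2] = i + 2;
   } else if (flatshade_first) {
      /* provoking vertex i stays first */
      out[0] = i;     out[1] = i + 2; out[2] = i + 1;
   } else {
      /* provoking vertex i+2 stays last */
      out[0] = i + 1; out[1] = i;     out[2] = i + 2;
   }
}
```

Because the emitted vertex order differs between the two conventions,
the decomposed output of one cannot simply be reused for the other.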

My proposal is to make vcache split instead of decompose.  Because
varray only splits and vcache has a pass-through path, the rest of the
workflow already has to support all primitive types.  Switching from
decompose to split does not require a big change to the rest of the
workflow.

But then vcache will look a lot like varray, only with indexed
primitive support.  It leads me to a new frontend that replaces both
varray and vcache: vsplit

 http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit

vsplit is based on varray.  It uses some code from vcache to support
indexed primitives.  When vcache decomposes, flags are set to indicate
whether the stipple counter should be reset or whether some edge of a
triangle should be omitted in unfilled mode.  The segments of a split
primitive have flags for similar purposes too:

  DRAW_SPLIT_AFTER   More segments to come after this one
  DRAW_SPLIT_BEFORE  There are preceding segments

These flags are set by vsplit and the middle ends pass them to the
other stages.  Therefore, the run methods of middle ends are augmented
to take the flags.
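
As a rough sketch of the idea (not the code in the draw-vsplit branch),
splitting a line strip into segments and tagging each segment could
look like the following.  The flag names come from the list above; the
flag values, struct segment, and split_line_strip are assumptions made
for this example.

```c
#include <assert.h>

/* Flag names are from the proposal; the values are assumed. */
#define DRAW_SPLIT_BEFORE 0x1  /* there are preceding segments */
#define DRAW_SPLIT_AFTER  0x2  /* more segments follow this one */

struct segment {
   unsigned start, count, flags;
};

/* Hypothetical helper: split a line strip of `count` vertices into
 * segments of at most `max_verts` vertices.  Consecutive segments
 * share one vertex so that no line is lost at a boundary.
 * Returns the number of segments written to `segs`.
 */
static unsigned
split_line_strip(unsigned count, unsigned max_verts,
                 struct segment *segs, unsigned max_segs)
{
   unsigned start = 0, n = 0;

   assert(max_verts >= 2);
   if (count < 2)
      return 0;

   while (count - start > 1 && n < max_segs) {
      unsigned remaining = count - start;
      unsigned len = remaining < max_verts ? remaining : max_verts;
      unsigned flags = 0;

      if (start > 0)
         flags |= DRAW_SPLIT_BEFORE;
      if (start + len < count)
         flags |= DRAW_SPLIT_AFTER;

      segs[n].start = start;
      segs[n].count = len;
      segs[n].flags = flags;
      n++;

      start += len - 1;   /* repeat the last vertex in the next segment */
   }
   return n;
}
```

A later stage could then, for example, reset the line stipple counter
only for a segment that does not carry DRAW_SPLIT_BEFORE.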

To summarize, vsplit

 - fixes GS when (flatshade && flatshade_first) is on
 - never sends more vertices than the middle end claims to handle
 - is faster than vcache: split instead of decompose, no get_elt
   calls
 - no longer uses the higher bits of draw_elts for stipple/edge flags

Suggestions?


-- 
o...@lunarg.com


Re: [Mesa-dev] [PATCH 0/3] rtasm/translate_sse rework (v3)

2010-08-13 Thread Keith Whitwell
 

On Fri, 2010-08-13 at 06:47 -0700, Luca Barbieri wrote:
> This is a new version of just the rtasm/translate_sse patches.
> 
> This version has the Win64 support built-in.
> 
> In addition, it follows Keith's idea to use constants instead of #ifs.
> 
> To achieve this, u_cpu_detect.h is enhanced so that architecture and
> endianness are now compile time constants (and thus produce the same
> code as #ifs after optimization), and the CPU ABI is now also available
> to support Win64.

Thanks for the fixes, Luca - this is looking great.

Keith



Re: [Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64

2010-08-13 Thread Brian Paul

On 08/13/2010 07:47 AM, Luca Barbieri wrote:

A few related changes:
1. Make x86-64 its own architecture (nothing was using
util_cpu_caps.arch, so nothing can be affected)
2. Turn the CPU arch and endianness into macros, so that the compiler
can evaluate them at compile time and eliminate dead code
3. Add util_cpu_abi to know about non-standard ABIs like Win64
---
  src/gallium/auxiliary/gallivm/lp_bld_pack.c   |2 +-
  src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |2 +-
  src/gallium/auxiliary/util/u_cpu_detect.c |   19 +-
  src/gallium/auxiliary/util/u_cpu_detect.h |   39 ++--
  4 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c 
b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
index ecfb13a..8ab742a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
@@ -171,7 +171,7 @@ lp_build_unpack2(LLVMBuilderRef builder,
msb = lp_build_zero(src_type);

 /* Interleave bits */
-   if(util_cpu_caps.little_endian) {
+   if(util_cpu_little_endian) {
*dst_lo = lp_build_interleave2(builder, src_type, src, msb, 0);
*dst_hi = lp_build_interleave2(builder, src_type, src, msb, 1);
 }
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
index 3075065..d4b8b4f 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
@@ -1840,7 +1840,7 @@ lp_build_sample_2d_linear_aos(struct 
lp_build_sample_context *bld,
unsigned i, j;

for(j = 0; j<  h16.type.length; j += 4) {
- unsigned subindex = util_cpu_caps.little_endian ? 0 : 1;
+ unsigned subindex = util_cpu_little_endian ? 0 : 1;
   LLVMValueRef index;

   index = LLVMConstInt(elem_type, j/2 + subindex, 0);
diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c 
b/src/gallium/auxiliary/util/u_cpu_detect.c
index b1a8c75..73ce146 100644
--- a/src/gallium/auxiliary/util/u_cpu_detect.c
+++ b/src/gallium/auxiliary/util/u_cpu_detect.c
@@ -391,23 +391,6 @@ util_cpu_detect(void)

 memset(&util_cpu_caps, 0, sizeof util_cpu_caps);

-   /* Check for arch type */
-#if defined(PIPE_ARCH_MIPS)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_MIPS;
-#elif defined(PIPE_ARCH_ALPHA)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_ALPHA;
-#elif defined(PIPE_ARCH_SPARC)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_SPARC;
-#elif defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_X86;
-   util_cpu_caps.little_endian = 1;
-#elif defined(PIPE_ARCH_PPC)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_POWERPC;
-   util_cpu_caps.little_endian = 0;
-#else
-   util_cpu_caps.arch = UTIL_CPU_ARCH_UNKNOWN;
-#endif
-
 /* Count the number of CPUs in system */
  #if defined(PIPE_OS_WINDOWS)
 {
@@ -504,7 +487,7 @@ util_cpu_detect(void)

  #ifdef DEBUG
 if (debug_get_option_dump_cpu()) {
-  debug_printf("util_cpu_caps.arch = %i\n", util_cpu_caps.arch);
+  debug_printf("util_cpu_caps.arch = %i\n", util_cpu_arch);
debug_printf("util_cpu_caps.nr_cpus = %u\n", util_cpu_caps.nr_cpus);

debug_printf("util_cpu_caps.x86_cpu_type = %u\n", 
util_cpu_caps.x86_cpu_type);
diff --git a/src/gallium/auxiliary/util/u_cpu_detect.h 
b/src/gallium/auxiliary/util/u_cpu_detect.h
index 4b3dc39..e81e4b5 100644
--- a/src/gallium/auxiliary/util/u_cpu_detect.h
+++ b/src/gallium/auxiliary/util/u_cpu_detect.h
@@ -36,6 +36,7 @@
  #define _UTIL_CPU_DETECT_H

  #include "pipe/p_compiler.h"
+#include "pipe/p_config.h"

  enum util_cpu_arch {
 UTIL_CPU_ARCH_UNKNOWN = 0,
@@ -43,19 +44,49 @@ enum util_cpu_arch {
 UTIL_CPU_ARCH_ALPHA,
 UTIL_CPU_ARCH_SPARC,
 UTIL_CPU_ARCH_X86,
-   UTIL_CPU_ARCH_POWERPC
+   UTIL_CPU_ARCH_X86_64,
+   UTIL_CPU_ARCH_POWERPC,
+
+   /* non-standard ABIs, only used in util_cpu_abi */
+   UTIL_CPU_ABI_WIN64
  };

+/* Check for arch type */
+#if defined(PIPE_ARCH_MIPS)
+#define util_cpu_arch UTIL_CPU_ARCH_MIPS
+#elif defined(PIPE_ARCH_ALPHA)
+#define util_cpu_arch UTIL_CPU_ARCH_ALPHA
+#elif defined(PIPE_ARCH_SPARC)
+#define util_cpu_arch UTIL_CPU_ARCH_SPARC
+#elif defined(PIPE_ARCH_X86)
+#define util_cpu_arch UTIL_CPU_ARCH_X86
+#elif defined(PIPE_ARCH_X86_64)
+#define util_cpu_arch UTIL_CPU_ARCH_X86_64
+#elif defined(PIPE_ARCH_PPC)
+#define util_cpu_arch UTIL_CPU_ARCH_POWERPC
+#else
+#define util_cpu_arch UTIL_CPU_ARCH_UNKNOWN
+#endif
+
+#ifdef WIN64
+#define util_cpu_abi UTIL_CPU_ABI_WIN64
+#else
+#define util_cpu_abi util_cpu_arch
+#endif
+
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) || 
!defined(WORDS_BIGENDIAN)
+#define util_cpu_little_endian 1
+#else
+#define util_cpu_little_endian 0
+#endif


#define symbols are usually uppercase.  Is there a reason why you're 
using lowercase?  I'd prefer the code to follow typical conventions.


-Brian



[Mesa-dev] [PATCH 3/3] translate_sse: major rewrite (v3)

2010-08-13 Thread Luca Barbieri
Changes in v3:
- Win64 support (untested)
- Use u_cpu_detect.h constants instead of #ifs

Changes in v2:
- Minimize #ifs
- Give a name to magic number CHANNELS_0001
- Add support for CPUs without SSE (only memcpy and swizzles, like non SSE2)
- Fixed comments

translate_sse is currently very limited to the point of
being useless in essentially all cases.

In particular, it only supports some float32 and unorm8
formats and doesn't work on x86-64.

This commit rewrites it to support:
1. Dumb memory copy for any pair of identical formats
2. All formats that are swizzles of each other
3. Converting 32/64-bit floats and all 8/16/32-bit integers to 32-bit float
4. Converting unorm8/snorm8 to snorm16 and uscaled8/sscaled8 to sscaled16
5. Support for x86-64 (doesn't take advantage of it in any way though)

This new translate can even be useful to translate index buffers for
cards that lack 8-bit index support.
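
As a plain-C picture of what such an index translation produces (this
is only a scalar reference, not how translate_sse does it, and
widen_indices_u8_to_u16 is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for widening 8-bit indices to 16-bit ones.
 * Hypothetical helper for illustration only.
 */
static void
widen_indices_u8_to_u16(const uint8_t *src, uint16_t *dst, size_t count)
{
   size_t i;

   assert(src && dst);
   for (i = 0; i < count; i++)
      dst[i] = src[i];   /* zero-extension: values are unchanged */
}
```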

It passes the testsuite I wrote, but note that this is a major change, and more
testing would be great.


0003-translate_sse-major-rewrite-v3.patch.gz
Description: GNU Zip compressed data


Re: [Mesa-dev] [PATCH 6/6] translate_sse: major rewrite

2010-08-13 Thread Luca Barbieri
> What about just making things prettier by converting the #if's into
> regular if statements?  It would be easier to read if nothing else,
> though it would mean compiling at least a stub version of the x86-64
> opcode emitters on x86.
>
> In fact there's nothing preventing us compiling the entire x86-64
> emitters on x86, though obviously they can't be used -- but there's
> nothing which requires that code to be #ifdef'ed out on other platforms.

OK, done, see new patchset.


[Mesa-dev] [PATCH 2/3] rtasm: add minimal x86-64 support and new instructions (v2)

2010-08-13 Thread Luca Barbieri
Changes in v2:
- Win64 support (untested)
- Use u_cpu_detect.h constants instead of #ifs

This commit adds minimal x86-64 support: only movs between registers
are supported for r8-r15, and x64_rexw() must be used to ask for 64-bit
operations.
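
For readers unfamiliar with the encoding, the effect of the prefix can
be sketched as follows.  encode_mov_reg_reg is a hypothetical
stand-alone helper, not part of rtasm; it only handles the eight legacy
registers, which matches the restriction described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "mov dst, src" between two of the eight legacy registers
 * (index 0..7, e.g. 0 = AX, 3 = BX).  With `wide` set, a REX.W
 * prefix (0x48) selects the 64-bit operand size.
 * Hypothetical helper; returns the number of bytes written.
 */
static size_t
encode_mov_reg_reg(uint8_t *buf, unsigned dst, unsigned src, int wide)
{
   size_t n = 0;

   assert(buf);
   assert(dst < 8 && src < 8);   /* r8-r15 would need REX.R / REX.B */

   if (wide)
      buf[n++] = 0x48;                              /* REX.W */
   buf[n++] = 0x89;                                 /* MOV r/m, r */
   buf[n++] = (uint8_t)(0xc0 | (src << 3) | dst);   /* ModRM, mod = 11 */
   return n;
}
```

With dst = AX and src = BX this yields 89 d8 (mov eax, ebx) without the
prefix and 48 89 d8 (mov rax, rbx) with it.  In 64-bit mode the
one-byte opcodes 0x40-0x4f are REX prefixes, which is consistent with
x86_inc in this patch using the short 0x40 + reg form only when
util_cpu_arch is UTIL_CPU_ARCH_X86.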

It also adds several new instructions for the new translate_sse code.
---
 src/gallium/auxiliary/rtasm/rtasm_cpu.c|6 +-
 src/gallium/auxiliary/rtasm/rtasm_x86sse.c |  455 ++--
 src/gallium/auxiliary/rtasm/rtasm_x86sse.h |   69 -
 3 files changed, 493 insertions(+), 37 deletions(-)

diff --git a/src/gallium/auxiliary/rtasm/rtasm_cpu.c 
b/src/gallium/auxiliary/rtasm/rtasm_cpu.c
index 2e15751..0461c81 100644
--- a/src/gallium/auxiliary/rtasm/rtasm_cpu.c
+++ b/src/gallium/auxiliary/rtasm/rtasm_cpu.c
@@ -30,7 +30,7 @@
 #include "rtasm_cpu.h"
 
 
-#if defined(PIPE_ARCH_X86)
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
 static boolean rtasm_sse_enabled(void)
 {
static boolean firsttime = 1;
@@ -49,7 +49,7 @@ static boolean rtasm_sse_enabled(void)
 int rtasm_cpu_has_sse(void)
 {
/* FIXME: actually detect this at run-time */
-#if defined(PIPE_ARCH_X86)
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
return rtasm_sse_enabled();
 #else
return 0;
@@ -59,7 +59,7 @@ int rtasm_cpu_has_sse(void)
 int rtasm_cpu_has_sse2(void) 
 {
/* FIXME: actually detect this at run-time */
-#if defined(PIPE_ARCH_X86)
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
return rtasm_sse_enabled();
 #else
return 0;
diff --git a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c 
b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
index 63007c1..88b182b 100644
--- a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
+++ b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
@@ -22,8 +22,9 @@
  **/
 
 #include "pipe/p_config.h"
+#include "util/u_cpu_detect.h"
 
-#if defined(PIPE_ARCH_X86)
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
 
 #include "pipe/p_compiler.h"
 #include "util/u_debug.h"
@@ -231,6 +232,10 @@ static void emit_modrm( struct x86_function *p,

assert(reg.mod == mod_REG);

+   /* TODO: support extended x86-64 registers */
+   assert(reg.idx < 8);
+   assert(regmem.idx < 8);
+
val |= regmem.mod << 6; /* mod field */
val |= reg.idx << 3;/* reg field */
val |= regmem.idx;  /* r/m field */
@@ -363,6 +368,12 @@ int x86_get_label( struct x86_function *p )
  */
 
 
+void x64_rexw(struct x86_function *p)
+{
+   if(util_cpu_arch == UTIL_CPU_ARCH_X86_64)
+  emit_1ub(p, 0x48);
+}
+
 void x86_jcc( struct x86_function *p,
  enum x86_cc cc,
  int label )
@@ -449,6 +460,52 @@ void x86_mov_reg_imm( struct x86_function *p, struct 
x86_reg dst, int imm )
emit_1i(p, imm);
 }
 
+void x86_mov_imm( struct x86_function *p, struct x86_reg dst, int imm )
+{
+   DUMP_RI( dst, imm );
+   if(dst.mod == mod_REG)
+  x86_mov_reg_imm(p, dst, imm);
+   else
+   {
+  emit_1ub(p, 0xc7);
+  emit_modrm_noreg(p, 0, dst);
+  emit_1i(p, imm);
+   }
+}
+
+void x86_mov16_imm( struct x86_function *p, struct x86_reg dst, uint16_t imm )
+{
+   DUMP_RI( dst, imm );
+   emit_1ub(p, 0x66);
+   if(dst.mod == mod_REG)
+   {
+  emit_1ub(p, 0xb8 + dst.idx);
+  emit_2ub(p, imm & 0xff, imm >> 8);
+   }
+   else
+   {
+  emit_1ub(p, 0xc7);
+  emit_modrm_noreg(p, 0, dst);
+  emit_2ub(p, imm & 0xff, imm >> 8);
+   }
+}
+
+void x86_mov8_imm( struct x86_function *p, struct x86_reg dst, uint8_t imm )
+{
+   DUMP_RI( dst, imm );
+   if(dst.mod == mod_REG)
+   {
+  emit_1ub(p, 0xb0 + dst.idx);
+  emit_1ub(p, imm);
+   }
+   else
+   {
+  emit_1ub(p, 0xc6);
+  emit_modrm_noreg(p, 0, dst);
+  emit_1ub(p, imm);
+   }
+}
+
 /**
  * Immediate group 1 instructions.
  */
@@ -520,7 +577,7 @@ void x86_push( struct x86_function *p,
}
 
 
-   p->stack_offset += 4;
+   p->stack_offset += sizeof(void*);
 }
 
 void x86_push_imm32( struct x86_function *p,
@@ -530,7 +587,7 @@ void x86_push_imm32( struct x86_function *p,
emit_1ub(p, 0x68);
emit_1i(p,  imm32);
 
-   p->stack_offset += 4;
+   p->stack_offset += sizeof(void*);
 }
 
 
@@ -540,23 +597,33 @@ void x86_pop( struct x86_function *p,
DUMP_R( reg );
assert(reg.mod == mod_REG);
emit_1ub(p, 0x58 + reg.idx);
-   p->stack_offset -= 4;
+   p->stack_offset -= sizeof(void*);
 }
 
 void x86_inc( struct x86_function *p,
  struct x86_reg reg )
 {
DUMP_R( reg );
-   assert(reg.mod == mod_REG);
-   emit_1ub(p, 0x40 + reg.idx);
+   if(util_cpu_arch == UTIL_CPU_ARCH_X86 && reg.mod == mod_REG)
+   {
+  emit_1ub(p, 0x40 + reg.idx);
+  return;
+   }
+   emit_1ub(p, 0xff);
+   emit_modrm_noreg(p, 0, reg);
 }
 
 void x86_dec( struct x86_function *p,
  struct x86_reg reg )
 {
DUMP_R( reg );
-   assert(reg.mod == mod_REG);
-   emit_1ub(p, 0x48 + reg.idx);
+   if(util_cpu_arch == UTIL_C

[Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64

2010-08-13 Thread Luca Barbieri
A few related changes:
1. Make x86-64 its own architecture (nothing was using
   util_cpu_caps.arch, so nothing can be affected)
2. Turn the CPU arch and endianness into macros, so that the compiler
   can evaluate them at compile time and eliminate dead code
3. Add util_cpu_abi to know about non-standard ABIs like Win64
---
 src/gallium/auxiliary/gallivm/lp_bld_pack.c   |2 +-
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |2 +-
 src/gallium/auxiliary/util/u_cpu_detect.c |   19 +-
 src/gallium/auxiliary/util/u_cpu_detect.h |   39 ++--
 4 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c 
b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
index ecfb13a..8ab742a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
@@ -171,7 +171,7 @@ lp_build_unpack2(LLVMBuilderRef builder,
   msb = lp_build_zero(src_type);
 
/* Interleave bits */
-   if(util_cpu_caps.little_endian) {
+   if(util_cpu_little_endian) {
   *dst_lo = lp_build_interleave2(builder, src_type, src, msb, 0);
   *dst_hi = lp_build_interleave2(builder, src_type, src, msb, 1);
}
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
index 3075065..d4b8b4f 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
@@ -1840,7 +1840,7 @@ lp_build_sample_2d_linear_aos(struct 
lp_build_sample_context *bld,
   unsigned i, j;
 
   for(j = 0; j < h16.type.length; j += 4) {
- unsigned subindex = util_cpu_caps.little_endian ? 0 : 1;
+ unsigned subindex = util_cpu_little_endian ? 0 : 1;
  LLVMValueRef index;
 
  index = LLVMConstInt(elem_type, j/2 + subindex, 0);
diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c 
b/src/gallium/auxiliary/util/u_cpu_detect.c
index b1a8c75..73ce146 100644
--- a/src/gallium/auxiliary/util/u_cpu_detect.c
+++ b/src/gallium/auxiliary/util/u_cpu_detect.c
@@ -391,23 +391,6 @@ util_cpu_detect(void)
 
memset(&util_cpu_caps, 0, sizeof util_cpu_caps);
 
-   /* Check for arch type */
-#if defined(PIPE_ARCH_MIPS)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_MIPS;
-#elif defined(PIPE_ARCH_ALPHA)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_ALPHA;
-#elif defined(PIPE_ARCH_SPARC)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_SPARC;
-#elif defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_X86;
-   util_cpu_caps.little_endian = 1;
-#elif defined(PIPE_ARCH_PPC)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_POWERPC;
-   util_cpu_caps.little_endian = 0;
-#else
-   util_cpu_caps.arch = UTIL_CPU_ARCH_UNKNOWN;
-#endif
-
/* Count the number of CPUs in system */
 #if defined(PIPE_OS_WINDOWS)
{
@@ -504,7 +487,7 @@ util_cpu_detect(void)
 
 #ifdef DEBUG
if (debug_get_option_dump_cpu()) {
-  debug_printf("util_cpu_caps.arch = %i\n", util_cpu_caps.arch);
+  debug_printf("util_cpu_caps.arch = %i\n", util_cpu_arch);
   debug_printf("util_cpu_caps.nr_cpus = %u\n", util_cpu_caps.nr_cpus);
 
   debug_printf("util_cpu_caps.x86_cpu_type = %u\n", 
util_cpu_caps.x86_cpu_type);
diff --git a/src/gallium/auxiliary/util/u_cpu_detect.h 
b/src/gallium/auxiliary/util/u_cpu_detect.h
index 4b3dc39..e81e4b5 100644
--- a/src/gallium/auxiliary/util/u_cpu_detect.h
+++ b/src/gallium/auxiliary/util/u_cpu_detect.h
@@ -36,6 +36,7 @@
 #define _UTIL_CPU_DETECT_H
 
 #include "pipe/p_compiler.h"
+#include "pipe/p_config.h"
 
 enum util_cpu_arch {
UTIL_CPU_ARCH_UNKNOWN = 0,
@@ -43,19 +44,49 @@ enum util_cpu_arch {
UTIL_CPU_ARCH_ALPHA,
UTIL_CPU_ARCH_SPARC,
UTIL_CPU_ARCH_X86,
-   UTIL_CPU_ARCH_POWERPC
+   UTIL_CPU_ARCH_X86_64,
+   UTIL_CPU_ARCH_POWERPC,
+
+   /* non-standard ABIs, only used in util_cpu_abi */
+   UTIL_CPU_ABI_WIN64
 };
 
+/* Check for arch type */
+#if defined(PIPE_ARCH_MIPS)
+#define util_cpu_arch UTIL_CPU_ARCH_MIPS
+#elif defined(PIPE_ARCH_ALPHA)
+#define util_cpu_arch UTIL_CPU_ARCH_ALPHA
+#elif defined(PIPE_ARCH_SPARC)
+#define util_cpu_arch UTIL_CPU_ARCH_SPARC
+#elif defined(PIPE_ARCH_X86)
+#define util_cpu_arch UTIL_CPU_ARCH_X86
+#elif defined(PIPE_ARCH_X86_64)
+#define util_cpu_arch UTIL_CPU_ARCH_X86_64
+#elif defined(PIPE_ARCH_PPC)
+#define util_cpu_arch UTIL_CPU_ARCH_POWERPC
+#else
+#define util_cpu_arch UTIL_CPU_ARCH_UNKNOWN
+#endif
+
+#ifdef WIN64
+#define util_cpu_abi UTIL_CPU_ABI_WIN64
+#else
+#define util_cpu_abi util_cpu_arch
+#endif
+
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) || 
!defined(WORDS_BIGENDIAN)
+#define util_cpu_little_endian 1
+#else
+#define util_cpu_little_endian 0
+#endif
+
 struct util_cpu_caps {
-   enum util_cpu_arch arch;
unsigned nr_cpus;
 
/* Feature flags */
int x86_cpu_type;
unsigned cacheline;
 
-   unsigned little_endian:1;
-
unsigned has_tsc:1;
unsigned has_mmx:1;
unsigned has_mmx2:1;
-- 

[Mesa-dev] [PATCH 0/3] rtasm/translate_sse rework (v3)

2010-08-13 Thread Luca Barbieri
This is a new version of just the rtasm/translate_sse patches.

This version has the Win64 support built-in.

In addition, it follows Keith's idea to use constants instead of #ifs.

To achieve this, u_cpu_detect.h is enhanced so that architecture and
endianness are now compile time constants (and thus produce the same
code as #ifs after optimization), and the CPU ABI is now also available
to support Win64.
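
A minimal sketch of the pattern, under stated assumptions: the names
below (enum arch_id, this_cpu_arch, stack_slot_size) are invented for
the example, and the architecture is picked from common
compiler-defined macros rather than Gallium's PIPE_ARCH_* ones.

```c
#include <assert.h>

/* Stand-ins for the constants in u_cpu_detect.h (names assumed). */
enum arch_id { ARCH_UNKNOWN, ARCH_X86, ARCH_X86_64 };

#if defined(__x86_64__) || defined(_M_X64)
#define this_cpu_arch ARCH_X86_64
#elif defined(__i386__) || defined(_M_IX86)
#define this_cpu_arch ARCH_X86
#else
#define this_cpu_arch ARCH_UNKNOWN
#endif

/* Size in bytes of one stack slot for generated x86 code.  Both
 * branches are always compiled and type-checked; since the condition
 * is a constant, an optimizing compiler can drop the untaken one.
 */
static unsigned
stack_slot_size(void)
{
   if (this_cpu_arch == ARCH_X86_64)
      return 8;
   else
      return 4;
}
```

Compared with an #if block, the untaken branch still has to compile on
every platform, which is the trade-off discussed in the earlier
review.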

Luca Barbieri (3):
  u_cpu_detect: make arch and little_endian constants, add abi and
x86-64
  rtasm: add minimal x86-64 support and new instructions (v2)
  translate_sse: major rewrite (v3)

 src/gallium/auxiliary/gallivm/lp_bld_pack.c   |2 +-
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |2 +-
 src/gallium/auxiliary/rtasm/rtasm_cpu.c   |6 +-
 src/gallium/auxiliary/rtasm/rtasm_x86sse.c|  455 -
 src/gallium/auxiliary/rtasm/rtasm_x86sse.h|   69 ++-
 src/gallium/auxiliary/translate/translate_sse.c   | 1162 -
 src/gallium/auxiliary/util/u_cpu_detect.c |   19 +-
 src/gallium/auxiliary/util/u_cpu_detect.h |   39 +-
 8 files changed, 1455 insertions(+), 299 deletions(-)



Re: [Mesa-dev] talloc (Was: Merge criteria for glsl2 branch)

2010-08-13 Thread José Fonseca
On Thu, 2010-08-12 at 14:46 -0700, Ian Romanick wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> José Fonseca wrote:
> 
> > OK.
> > 
> > What about this:
> > 
> > For GLUT, GLEW, LLVM and all other dependencies I'll just make a SDK
> > with the binaries, with debug & release, 32 & 64 bit, MinGW & MSVC
> > versions. One seldom needs to modify the source anyway, and they have
> > active upstream development.
> > 
> > But I perceive talloc as different from all above: it's very low level
> > and low weight library, providing very basic functionality, and upstream
> > never showed interest for Windows portability. I'd really prefer to see
> > the talloc source bundled (and only compiled on windows), as a quick way
> > to have glsl2 merged without causing windows build failures. 
> 
> This seems like a reasonable compromise.  Is this something that you and
> / or Aras can tackle?  I don't have a Windows build system set up, so I
> wouldn't be able to test any build system changes that I made.

I've pushed a new branch glsl2-win32 that includes Aras' patch, and all
the fixes necessary to get at least the MinGW build to succeed.

I had to rename some tokens in order to avoid collisions with windows.h
defines. Aras didn't mention this problem before. Perhaps the indirect
windows.h include can be avoided, or you prefer to handle this some
other way.

Jose



[Mesa-dev] Nouveau errors in /var/log/messages - what are they and what do they mean?

2010-08-13 Thread Alex Buell
What do all these errors that the X11 Nouveau driver spews out in
my /var/log/messages mean? They don't seem to do any harm, but I was
wondering whether they might be having an impact on the driver's
performance?


Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: Allocating FIFO 
number 3
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: 
nouveau_channel_alloc: initialised FIFO 3
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/2 Mthd 0x Data 0x8801
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/2 Mthd 0x0180 Data 0x8800
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/3 Mthd 0x Data 0x8802
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/3 Mthd 0x0184 Data 0xbeef0201
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/3 Mthd 0x0188 Data 0xbeef0201
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/4 Mthd 0x Data 0x8803
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/4 Mthd 0x0180 Data 0x8800
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/4 Mthd 0x019c Data 0x8802
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/4 Mthd 0x02fc Data 0x0003
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/5 Mthd 0x Data 0x8804
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/5 Mthd 0x0180 Data 0x8800
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/5 Mthd 0x0198 Data 0x8802
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/5 Mthd 0x02fc Data 0x0003
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/5 Mthd 0x0304 Data 0x0002
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/7 Mthd 0x Data 0xbeef3097
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/7 Mthd 0x0180 Data 0xbeef0301
Aug 13 13:05:54 lithium kernel: nouveau_ratelimit: 38 callbacks suppressed
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - 
nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 
3/7 Class 0x Mthd 0x0184 Data 0x1f42:0x1f42
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - 
nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 
3/7 Class 0x Mthd 0x0188 Data 0x1f43:0x1f43
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - 
nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 
3/7 Class 0x Mthd 0x018c Data 0x1f42:0x1f42
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - 
nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 
3/7 Class 0x Mthd 0x0194 Data 0x1f42:0x1f42
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - 
nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 
3/7 Class 0x Mthd 0x0198 Data 0x1f42:0x1f42
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - 
nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 
3/7 Class 0x Mthd 0x019c Data 0x1f42:0x1f42
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - 
nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 
3/7 Class 0x Mthd 0x01a0 Data 0x1f43:0x1f43
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - 
nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 
3/7 Class 0x Mthd 0x01a4 Data 0x23b5:0x23b5
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - 
Ch 3/7 Mthd 0x01a8 Data 0xbeef0302
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - 
nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 
3/7 Class 0x Mthd 0x01ac Data 0x1f42:0x1f42
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - 
nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 
3/7 C

Re: [Mesa-dev] [PATCH 6/6] translate_sse: major rewrite

2010-08-13 Thread Keith Whitwell
On Fri, 2010-08-13 at 04:46 -0700, Luca Barbieri wrote:
> > Is it possible to use an explicit flag for the (out_chans == 5) case?
> 
> Gave it the name CHANNELS_0001 and added a comment.
> 
> > Is it possible to do this without all the #ifdefs?  Even if statements
> > based on a preprocessor variable would be easier to read, but better
> > still would be some sort of wrapper function which just did the right
> > thing on either architecture.
> 
> Right, done.
> 
> > Similar comment applies to your x86-64 changes in rtasm.c -- is there a
> > way to reduce the #ifdef load?
> 
> Here it seems impossible, as it doesn't seem to be possible to
> abstract any of them (except possibly adding a function to encode both
> INC and DEC , but that doesn't really seem a win).

What about just making things prettier by converting the #if's into
regular if statements?  It would be easier to read if nothing else,
though it would mean compiling at least a stub version of the x86-64
opcode emitters on x86.

In fact there's nothing preventing us compiling the entire x86-64
emitters on x86, though obviously they can't be used -- but there's
nothing which requires that code to be #ifdef'ed out on other platforms.

Keith



[Mesa-dev] [PATCH] rtasm/translate_sse: support Win64

2010-08-13 Thread Luca Barbieri
I just discovered that Microsoft wisely decided to use their own
calling convention on Win64...

This hasn't actually been tested on Win64 though.
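
For reference, the difference in integer argument registers between
the two x86-64 conventions can be summarized as below.  This is a
stand-alone sketch; enum gp_reg and int_arg_reg are hypothetical names
and do not exist in rtasm.

```c
#include <assert.h>

/* General-purpose registers in hardware encoding order. */
enum gp_reg {
   REG_AX, REG_CX, REG_DX, REG_BX, REG_SP, REG_BP, REG_SI, REG_DI,
   REG_R8, REG_R9,
   REG_ON_STACK = -1   /* argument is passed in memory */
};

/* Return the register that carries integer argument `arg` (1-based),
 * or REG_ON_STACK when the argument is passed on the stack.
 */
static enum gp_reg
int_arg_reg(unsigned arg, int win64)
{
   static const enum gp_reg sysv[] = {
      REG_DI, REG_SI, REG_DX, REG_CX, REG_R8, REG_R9
   };
   static const enum gp_reg ms[] = {
      REG_CX, REG_DX, REG_R8, REG_R9
   };

   assert(arg >= 1);

   if (win64)
      return arg <= 4 ? ms[arg - 1] : REG_ON_STACK;
   else
      return arg <= 6 ? sysv[arg - 1] : REG_ON_STACK;
}
```

Note that the Microsoft convention also reserves 32 bytes of shadow
space for the register arguments, so on entry the fifth argument sits
at [rsp + 40] rather than directly above the return address; the stack
offsets marked "???" in the patch may need to account for that.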
---
 src/gallium/auxiliary/rtasm/rtasm_x86sse.c  |   15 +++
 src/gallium/auxiliary/translate/translate_sse.c |   21 +
 2 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c 
b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
index 6d6b76a..a076e17 100644
--- a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
+++ b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
@@ -2090,6 +2090,20 @@ struct x86_reg x86_fn_arg( struct x86_function *p,
 {
switch(arg)
{
+/* Microsoft uses a different calling convention than the rest of the world */
+#ifdef WIN64
+   case 1:
+  return x86_make_reg(file_REG32, reg_CX);
+   case 2:
+  return x86_make_reg(file_REG32, reg_DX);
+   case 3:
+  return x86_make_reg(file_REG32, reg_R8);
+   case 4:
+  return x86_make_reg(file_REG32, reg_R9);
+   default:
+  return x86_make_disp(x86_make_reg(file_REG32, reg_SP),
+   p->stack_offset + (arg - 4) * 8); /* ??? */
+#else
case 1:
   return x86_make_reg(file_REG32, reg_DI);
case 2:
@@ -2105,6 +2119,7 @@ struct x86_reg x86_fn_arg( struct x86_function *p,
default:
   return x86_make_disp(x86_make_reg(file_REG32, reg_SP),
p->stack_offset + (arg - 6) * 8); /* ??? */
+#endif
}
 }
 #else
diff --git a/src/gallium/auxiliary/translate/translate_sse.c 
b/src/gallium/auxiliary/translate/translate_sse.c
index e2d8d53..5dfb186 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -1234,26 +1234,23 @@ static boolean build_vertex_emit( struct translate_sse 
*p,
 
x86_init_func(p->func);
 
-#ifdef PIPE_ARCH_X86_64
x86_push(p->func, p->outbuf_EBX);
x86_push(p->func, p->count_EBP);
 
-   /* Load arguments into regs; the first two are already there */
-   x86_mov(p->func, p->count_EBP, x86_fn_arg(p->func, 3));
-   x64_mov64(p->func, p->outbuf_EBX, x86_fn_arg(p->func, 5));
-#else
-   /* Push a few regs?
-*/
-   x86_push(p->func, p->outbuf_EBX);
-   x86_push(p->func, p->count_EBP);
+/* on non-Win64 x86-64, these are already in the right registers */
+#if defined(PIPE_ARCH_X86) || defined(WIN64)
x86_push(p->func, p->machine_EDI);
x86_push(p->func, p->idx_ESI);
 
-   /* Load arguments into regs:
-*/
x86_mov(p->func, p->machine_EDI, x86_fn_arg(p->func, 1));
x86_mov(p->func, p->idx_ESI, x86_fn_arg(p->func, 2));
+#endif
+
x86_mov(p->func, p->count_EBP, x86_fn_arg(p->func, 3));
+
+#ifdef PIPE_ARCH_X86_64
+   x64_mov64(p->func, p->outbuf_EBX, x86_fn_arg(p->func, 5));
+#else
x86_mov(p->func, p->outbuf_EBX, x86_fn_arg(p->func, 5));
 #endif
 
@@ -1333,7 +1330,7 @@ static boolean build_vertex_emit( struct translate_sse *p,
/* Pop regs and return
 */

-#ifndef PIPE_ARCH_X86_64
+#if defined(PIPE_ARCH_X86) || defined(WIN64)
x86_pop(p->func, p->idx_ESI);
x86_pop(p->func, p->machine_EDI);
 #endif
-- 
1.7.0.4



Re: [Mesa-dev] [PATCH 1/6] translate_generic: use memcpy if possible

2010-08-13 Thread Luca Barbieri
> In this change you've got an int value (copy_size) which has some
> special meaning when negative -- can you add comments explaining what
> the meaning of a negative size is?  Is there a way to use some more
> explicit flag value to indicate this condition?

I think it makes sense: -1 stands for "not applicable", because in that
case we aren't doing the memcpy whose size copy_size describes (also, the
CPU will already have the value loaded for use in memcpy if it turns
out to be non-negative).

I added a comment explaining that.
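
For illustration, a minimal sketch of the copy_size convention described
above, written as standalone C rather than against translate_generic. The
struct attrib_sketch, emit_attrib and convert_u8_to_float names are
assumptions made up for this example; only the meaning of copy_size
(non-negative: memcpy that many bytes, -1: full conversion) comes from the
patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one translate_generic attribute. */
struct attrib_sketch {
   /* >= 0: formats match, copy this many bytes with memcpy.
    * -1:   "not applicable", a full format conversion is required. */
   int copy_size;
};

/* Illustrative conversion used only when copy_size is negative:
 * widens unsigned bytes to floats (not Mesa's fetch/emit pair). */
static void convert_u8_to_float(float *dst, const uint8_t *src, unsigned n)
{
   unsigned i;
   for (i = 0; i < n; i++)
      dst[i] = (float)src[i];
}

/* Returns 1 if the fast memcpy path was taken, 0 otherwise. */
static int emit_attrib(const struct attrib_sketch *a,
                       void *dst, const uint8_t *src, unsigned nr_chans)
{
   int copy_size = a->copy_size;

   assert(copy_size >= -1);
   if (copy_size >= 0) {
      memcpy(dst, src, (size_t)copy_size);
      return 1;
   }
   convert_u8_to_float((float *)dst, src, nr_chans);
   return 0;
}
```

The single signed field keeps the hot path to one load and one compare,
which is the trade-off argued for above; an explicit boolean flag would be
more readable at the cost of a second field.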


[Mesa-dev] [PATCH 6/6] translate_sse: major rewrite (v2)

2010-08-13 Thread Luca Barbieri
Changes in v2:
- Minimize #ifs
- Give a name to magic number CHANNELS_0001
- Add support for CPUs without SSE (only memcpy and swizzles, as on CPUs without SSE2)
- Fixed comments

translate_sse is currently so limited that it is
useless in essentially all cases.

In particular, it only supports some float32 and unorm8
formats and doesn't work on x86-64.

This commit rewrites it to support:
1. Dumb memory copy for any pair of identical formats
2. All formats that are swizzles of each other
3. Converting 32/64-bit floats and all 8/16/32-bit integers to 32-bit float
4. Converting unorm8/snorm8 to snorm16 and uscaled8/sscaled8 to sscaled16
5. Support for x86-64 (doesn't take advantage of it in any way though)

This new translate can even be useful to translate index buffers for
cards that lack 8-bit index support.

It passes the testsuite I wrote, but note that this is a major change, and more
testing would be great.
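
As an illustration of the index buffer use case mentioned above, here is a
plain C sketch of what translating 8-bit indices to 16-bit ones amounts to.
The function name widen_indices_u8_to_u16 is hypothetical; the generated
SSE/x86 code in the patch is not shown here and would do the equivalent
work with its own instruction sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero-extend each 8-bit index into a 16-bit slot, for hardware that
 * cannot consume 8-bit index buffers directly. */
static void widen_indices_u8_to_u16(uint16_t *dst, const uint8_t *src,
                                    size_t count)
{
   size_t i;

   assert(count == 0 || (dst != NULL && src != NULL));
   for (i = 0; i < count; i++)
      dst[i] = src[i];   /* implicit zero extension */
}
```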


0006-translate_sse-major-rewrite-v2.patch.gz
Description: GNU Zip compressed data


Re: [Mesa-dev] [PATCH 6/6] translate_sse: major rewrite

2010-08-13 Thread Luca Barbieri
> Is it possible to use an explicit flag for the (out_chans == 5) case?

Gave it the name CHANNELS_0001 and added a comment.

> Is it possible to do this without all the #ifdefs?  Even if statements
> based on a preprocessor variable would be easier to read, but better
> still would be some sort of wrapper function which just did the right
> thing on either architecture.

Right, done.

> Similar comment applies to your x86-64 changes in rtasm.c -- is there a
> way to reduce the #ifdef load?

Here it seems impossible, as none of them can be abstracted away
(except possibly by adding a function to encode both INC and DEC, but
that doesn't really seem a win).
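
For reference, a minimal sketch of what a combined INC/DEC encoder could
look like, written as standalone C rather than against rtasm. The function
name encode_inc_dec and the is_x86_64 flag are assumptions for
illustration; the opcodes are the standard x86 ones (0x40+r and 0x48+r in
32-bit mode, 0xff /0 and 0xff /1 otherwise, since 0x40..0x4f are REX
prefixes in 64-bit mode).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode INC (is_dec == 0) or DEC (is_dec == 1) of a 32-bit register
 * with index 0..7. Returns the number of bytes written to out (1 or 2). */
static size_t encode_inc_dec(uint8_t *out, unsigned reg_idx,
                             int is_dec, int is_x86_64)
{
   assert(reg_idx < 8);
   assert(is_dec == 0 || is_dec == 1);

   if (!is_x86_64) {
      /* 32-bit mode: one-byte forms 0x40+r (INC) and 0x48+r (DEC). */
      out[0] = (uint8_t)((is_dec ? 0x48 : 0x40) + reg_idx);
      return 1;
   }

   /* 64-bit mode: 0x40..0x4f are REX prefixes, so use 0xff /0 or /1
    * with a register-direct ModRM byte (mod = 11). */
   out[0] = 0xff;
   out[1] = (uint8_t)(0xc0 | (is_dec << 3) | reg_idx);
   return 2;
}
```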

> +            // TODO: add support for SSE4.1 pmovzx
>
> Probably want to use C-style comments throughout.

Done.


[Mesa-dev] [PATCH 5/6] rtasm: add minimal x86-64 support and new instructions

2010-08-13 Thread Luca Barbieri
This commit adds minimal x86-64 support: only movs between registers
are supported for r8-r15, and x64_rexw() must be used to ask for 64-bit
operations.

It also adds several new instructions for the new translate_sse code.
---
 src/gallium/auxiliary/rtasm/rtasm_cpu.c|6 +-
 src/gallium/auxiliary/rtasm/rtasm_x86sse.c |  433 ++--
 src/gallium/auxiliary/rtasm/rtasm_x86sse.h |   69 -
 3 files changed, 477 insertions(+), 31 deletions(-)

diff --git a/src/gallium/auxiliary/rtasm/rtasm_cpu.c b/src/gallium/auxiliary/rtasm/rtasm_cpu.c
index 2e15751..0461c81 100644
--- a/src/gallium/auxiliary/rtasm/rtasm_cpu.c
+++ b/src/gallium/auxiliary/rtasm/rtasm_cpu.c
@@ -30,7 +30,7 @@
 #include "rtasm_cpu.h"
 
 
-#if defined(PIPE_ARCH_X86)
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
 static boolean rtasm_sse_enabled(void)
 {
static boolean firsttime = 1;
@@ -49,7 +49,7 @@ static boolean rtasm_sse_enabled(void)
 int rtasm_cpu_has_sse(void)
 {
/* FIXME: actually detect this at run-time */
-#if defined(PIPE_ARCH_X86)
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
return rtasm_sse_enabled();
 #else
return 0;
@@ -59,7 +59,7 @@ int rtasm_cpu_has_sse(void)
 int rtasm_cpu_has_sse2(void) 
 {
/* FIXME: actually detect this at run-time */
-#if defined(PIPE_ARCH_X86)
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
return rtasm_sse_enabled();
 #else
return 0;
diff --git a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
index 63007c1..6d6b76a 100644
--- a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
+++ b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
@@ -23,7 +23,7 @@
 
 #include "pipe/p_config.h"
 
-#if defined(PIPE_ARCH_X86)
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
 
 #include "pipe/p_compiler.h"
 #include "util/u_debug.h"
@@ -231,6 +231,10 @@ static void emit_modrm( struct x86_function *p,

assert(reg.mod == mod_REG);

+   /* TODO: support extended x86-64 registers */
+   assert(reg.idx < 8);
+   assert(regmem.idx < 8);
+
val |= regmem.mod << 6; /* mod field */
val |= reg.idx << 3;/* reg field */
val |= regmem.idx;  /* r/m field */
@@ -363,6 +367,13 @@ int x86_get_label( struct x86_function *p )
  */
 
 
+void x64_rexw(struct x86_function *p)
+{
+#if defined(PIPE_ARCH_X86_64)
+   emit_1ub(p, 0x48);
+#endif
+}
+
 void x86_jcc( struct x86_function *p,
  enum x86_cc cc,
  int label )
@@ -449,6 +460,52 @@ void x86_mov_reg_imm( struct x86_function *p, struct x86_reg dst, int imm )
emit_1i(p, imm);
 }
 
+void x86_mov_imm( struct x86_function *p, struct x86_reg dst, int imm )
+{
+   DUMP_RI( dst, imm );
+   if(dst.mod == mod_REG)
+  x86_mov_reg_imm(p, dst, imm);
+   else
+   {
+  emit_1ub(p, 0xc7);
+  emit_modrm_noreg(p, 0, dst);
+  emit_1i(p, imm);
+   }
+}
+
+void x86_mov16_imm( struct x86_function *p, struct x86_reg dst, uint16_t imm )
+{
+   DUMP_RI( dst, imm );
+   emit_1ub(p, 0x66);
+   if(dst.mod == mod_REG)
+   {
+  emit_1ub(p, 0xb8 + dst.idx);
+  emit_2ub(p, imm & 0xff, imm >> 8);
+   }
+   else
+   {
+  emit_1ub(p, 0xc7);
+  emit_modrm_noreg(p, 0, dst);
+  emit_2ub(p, imm & 0xff, imm >> 8);
+   }
+}
+
+void x86_mov8_imm( struct x86_function *p, struct x86_reg dst, uint8_t imm )
+{
+   DUMP_RI( dst, imm );
+   if(dst.mod == mod_REG)
+   {
+  emit_1ub(p, 0xb0 + dst.idx);
+  emit_1ub(p, imm);
+   }
+   else
+   {
+  emit_1ub(p, 0xc6);
+  emit_modrm_noreg(p, 0, dst);
+  emit_1ub(p, imm);
+   }
+}
+
 /**
  * Immediate group 1 instructions.
  */
@@ -520,7 +577,7 @@ void x86_push( struct x86_function *p,
}
 
 
-   p->stack_offset += 4;
+   p->stack_offset += sizeof(void*);
 }
 
 void x86_push_imm32( struct x86_function *p,
@@ -530,7 +587,7 @@ void x86_push_imm32( struct x86_function *p,
emit_1ub(p, 0x68);
emit_1i(p,  imm32);
 
-   p->stack_offset += 4;
+   p->stack_offset += sizeof(void*);
 }
 
 
@@ -540,23 +597,37 @@ void x86_pop( struct x86_function *p,
DUMP_R( reg );
assert(reg.mod == mod_REG);
emit_1ub(p, 0x58 + reg.idx);
-   p->stack_offset -= 4;
+   p->stack_offset -= sizeof(void*);
 }
 
 void x86_inc( struct x86_function *p,
  struct x86_reg reg )
 {
DUMP_R( reg );
-   assert(reg.mod == mod_REG);
-   emit_1ub(p, 0x40 + reg.idx);
+#if defined(PIPE_ARCH_X86)
+   if(reg.mod == mod_REG)
+   {
+  emit_1ub(p, 0x40 + reg.idx);
+  return;
+   }
+#endif
+   emit_1ub(p, 0xff);
+   emit_modrm_noreg(p, 0, reg);
 }
 
 void x86_dec( struct x86_function *p,
  struct x86_reg reg )
 {
DUMP_R( reg );
-   assert(reg.mod == mod_REG);
-   emit_1ub(p, 0x48 + reg.idx);
+#if defined(PIPE_ARCH_X86)
+   if(reg.mod == mod_REG)
+   {
+  emit_1ub(p, 0x48 + reg.idx);
+  return;
+   }
+#endif
+   emit_1ub(p, 0xff);
+   emit_modrm_noreg(p, 1, reg);
 }
 
 void x86_ret( struct x86_function *p )
@@ -583,8 +654

[Mesa-dev] [PATCH 4/6] translate: add support for 8/16-bit indices

2010-08-13 Thread Luca Barbieri
Currently, only 32-bit indices are supported, but some use cases of
translate need support for all index types.
---
 src/gallium/auxiliary/rtasm/rtasm_x86sse.c |   14 
 src/gallium/auxiliary/rtasm/rtasm_x86sse.h |2 +
 src/gallium/auxiliary/translate/translate.h|   12 
 .../auxiliary/translate/translate_generic.c|   34 ++
 src/gallium/auxiliary/translate/translate_sse.c|   65 ++--
 5 files changed, 108 insertions(+), 19 deletions(-)

diff --git a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
index 9f70b73..63007c1 100644
--- a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
+++ b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
@@ -586,6 +586,20 @@ void x86_mov( struct x86_function *p,
emit_op_modrm( p, 0x8b, 0x89, dst, src );
 }
 
+void x86_movzx8(struct x86_function *p, struct x86_reg dst, struct x86_reg src )
+{
+   DUMP_RR( dst, src );
+   emit_2ub(p, 0x0f, 0xb6);
+   emit_modrm(p, dst, src);
+}
+
+void x86_movzx16(struct x86_function *p, struct x86_reg dst, struct x86_reg src )
+{
+   DUMP_RR( dst, src );
+   emit_2ub(p, 0x0f, 0xb7);
+   emit_modrm(p, dst, src);
+}
+
 void x86_xor( struct x86_function *p,
  struct x86_reg dst,
  struct x86_reg src )
diff --git a/src/gallium/auxiliary/rtasm/rtasm_x86sse.h b/src/gallium/auxiliary/rtasm/rtasm_x86sse.h
index 6208e8f..365dec1 100644
--- a/src/gallium/auxiliary/rtasm/rtasm_x86sse.h
+++ b/src/gallium/auxiliary/rtasm/rtasm_x86sse.h
@@ -237,6 +237,8 @@ void x86_dec( struct x86_function *p, struct x86_reg reg );
 void x86_inc( struct x86_function *p, struct x86_reg reg );
 void x86_lea( struct x86_function *p, struct x86_reg dst, struct x86_reg src );
 void x86_mov( struct x86_function *p, struct x86_reg dst, struct x86_reg src );
+void x86_movzx8( struct x86_function *p, struct x86_reg dst, struct x86_reg src );
+void x86_movzx16( struct x86_function *p, struct x86_reg dst, struct x86_reg src );
 void x86_mul( struct x86_function *p, struct x86_reg src );
 void x86_imul( struct x86_function *p, struct x86_reg dst, struct x86_reg src );
 void x86_or( struct x86_function *p, struct x86_reg dst, struct x86_reg src );
diff --git a/src/gallium/auxiliary/translate/translate.h b/src/gallium/auxiliary/translate/translate.h
index eb6f2cc..a753802 100644
--- a/src/gallium/auxiliary/translate/translate.h
+++ b/src/gallium/auxiliary/translate/translate.h
@@ -85,6 +85,18 @@ struct translate {
 unsigned instance_id,
 void *output_buffer);
 
+   void (PIPE_CDECL *run_elts16)( struct translate *,
+const uint16_t *elts,
+unsigned count,
+unsigned instance_id,
+void *output_buffer);
+
+   void (PIPE_CDECL *run_elts8)( struct translate *,
+const uint8_t *elts,
+unsigned count,
+unsigned instance_id,
+void *output_buffer);
+
void (PIPE_CDECL *run)( struct translate *,
unsigned start,
unsigned count,
diff --git a/src/gallium/auxiliary/translate/translate_generic.c b/src/gallium/auxiliary/translate/translate_generic.c
index e7f5384..10706a7 100644
--- a/src/gallium/auxiliary/translate/translate_generic.c
+++ b/src/gallium/auxiliary/translate/translate_generic.c
@@ -441,6 +441,38 @@ static void PIPE_CDECL generic_run_elts( struct translate *translate,
}
 }
 
+static void PIPE_CDECL generic_run_elts16( struct translate *translate,
+ const uint16_t *elts,
+ unsigned count,
+ unsigned instance_id,
+ void *output_buffer )
+{
+   struct translate_generic *tg = translate_generic(translate);
+   char *vert = output_buffer;
+   unsigned i;
+
+   for (i = 0; i < count; i++) {
+  generic_run_one(tg, *elts++, instance_id, vert);
+  vert += tg->translate.key.output_stride;
+   }
+}
+
+static void PIPE_CDECL generic_run_elts8( struct translate *translate,
+ const uint8_t *elts,
+ unsigned count,
+ unsigned instance_id,
+ void *output_buffer )
+{
+   struct translate_generic *tg = translate_generic(translate);
+   char *vert = output_buffer;
+   unsigned i;
+
+   for (i = 0; i < count; i++) {
+  generic_run_one(tg, *elts++, instance_id, vert);
+  vert += tg->translate.key.output_stride;
+   }
+}
+
 static void PIPE_CDECL generic_run( struct translate *translate,
 unsigned start,
 unsigned c

[Mesa-dev] [PATCH 3/6] translate_sse: remove useless generated function wrappers

2010-08-13 Thread Luca Barbieri
Currently translate_sse puts two trivial wrappers in the translate vtable.

These slow it down and enlarge the source code for no gain, except perhaps
the ability to set a breakpoint there, so remove them.

Breakpoints can be set on the caller of the translate functions, with no
loss of functionality.
---
 src/gallium/auxiliary/translate/translate_sse.c |   55 ++-
 1 files changed, 4 insertions(+), 51 deletions(-)

diff --git a/src/gallium/auxiliary/translate/translate_sse.c b/src/gallium/auxiliary/translate/translate_sse.c
index ef3aa67..68c71f4 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -46,18 +46,6 @@
 #define W3
 
 
-typedef void (PIPE_CDECL *run_func)( struct translate *translate,
- unsigned start,
- unsigned count,
- unsigned instance_id,
- void *output_buffer);
-
-typedef void (PIPE_CDECL *run_elts_func)( struct translate *translate,
-  const unsigned *elts,
-  unsigned count,
-  unsigned instance_id,
-  void *output_buffer);
-
 struct translate_buffer {
const void *base_ptr;
unsigned stride;
@@ -102,9 +90,6 @@ struct translate_sse {
boolean use_instancing;
unsigned instance_id;
 
-   run_func  gen_run;
-   run_elts_func gen_run_elts;
-
/* these are actually known values, but putting them in a struct
 * like this is helpful to keep them in sync across the file.
 */
@@ -700,36 +685,6 @@ static void translate_sse_release( struct translate *translate )
FREE(p);
 }
 
-static void PIPE_CDECL translate_sse_run_elts( struct translate *translate,
- const unsigned *elts,
- unsigned count,
-  unsigned instance_id,
- void *output_buffer )
-{
-   struct translate_sse *p = (struct translate_sse *)translate;
-
-   p->gen_run_elts( translate,
-   elts,
-   count,
-instance_id,
-output_buffer);
-}
-
-static void PIPE_CDECL translate_sse_run( struct translate *translate,
-unsigned start,
-unsigned count,
- unsigned instance_id,
-void *output_buffer )
-{
-   struct translate_sse *p = (struct translate_sse *)translate;
-
-   p->gen_run( translate,
-  start,
-  count,
-   instance_id,
-   output_buffer);
-}
-
 
 struct translate *translate_sse2_create( const struct translate_key *key )
 {
@@ -746,8 +701,6 @@ struct translate *translate_sse2_create( const struct translate_key *key )
p->translate.key = *key;
p->translate.release = translate_sse_release;
p->translate.set_buffer = translate_sse_set_buffer;
-   p->translate.run_elts = translate_sse_run_elts;
-   p->translate.run = translate_sse_run;
 
for (i = 0; i < key->nr_elements; i++) {
   if (key->element[i].type == TRANSLATE_ELEMENT_NORMAL) {
@@ -789,12 +742,12 @@ struct translate *translate_sse2_create( const struct translate_key *key )
if (!build_vertex_emit(p, &p->elt_func, FALSE))
   goto fail;
 
-   p->gen_run = (run_func)x86_get_func(&p->linear_func);
-   if (p->gen_run == NULL)
+   p->translate.run = (void*)x86_get_func(&p->linear_func);
+   if (p->translate.run == NULL)
   goto fail;
 
-   p->gen_run_elts = (run_elts_func)x86_get_func(&p->elt_func);
-   if (p->gen_run_elts == NULL)
+   p->translate.run_elts = (void*)x86_get_func(&p->elt_func);
+   if (p->translate.run_elts == NULL)
   goto fail;
 
return &p->translate;
-- 
1.7.0.4



[Mesa-dev] [PATCH 2/6] translate_generic: factor out common code between linear and indexed

2010-08-13 Thread Luca Barbieri
This moves the common code into a separate ALWAYS_INLINE function.
---
 .../auxiliary/translate/translate_generic.c|  177 +++-
 1 files changed, 62 insertions(+), 115 deletions(-)

diff --git a/src/gallium/auxiliary/translate/translate_generic.c b/src/gallium/auxiliary/translate/translate_generic.c
index 356d488..e7f5384 100644
--- a/src/gallium/auxiliary/translate/translate_generic.c
+++ b/src/gallium/auxiliary/translate/translate_generic.c
@@ -362,6 +362,66 @@ static emit_func get_emit_func( enum pipe_format format )
}
 }
 
+static ALWAYS_INLINE void PIPE_CDECL generic_run_one( struct translate_generic *tg,
+ unsigned elt,
+ unsigned instance_id,
+ void *vert )
+{
+   unsigned nr_attrs = tg->nr_attrib;
+   unsigned attr;
+
+   for (attr = 0; attr < nr_attrs; attr++) {
+  float data[4];
+  char *dst = vert + tg->attrib[attr].output_offset;
+
+  if (tg->attrib[attr].type == TRANSLATE_ELEMENT_NORMAL) {
+ const uint8_t *src;
+ unsigned index;
+ int copy_size;
+
+ if (tg->attrib[attr].instance_divisor) {
+index = instance_id / tg->attrib[attr].instance_divisor;
+ }
+ else {
+index = elt;
+ }
+
+ /* clamp to avoid going out of bounds */
+ index = MIN2(index, tg->attrib[attr].max_index);
+
+ src = tg->attrib[attr].input_ptr +
+   tg->attrib[attr].input_stride * index;
+
+ copy_size = tg->attrib[attr].copy_size;
+ if(likely(copy_size >= 0))
+memcpy(dst, src, copy_size);
+ else
+ {
+tg->attrib[attr].fetch( data, src, 0, 0 );
+
+if (0)
+   debug_printf("Fetch linear attr %d  from %p  stride %d  index %d: "
+ " %f, %f, %f, %f \n",
+ attr,
+ tg->attrib[attr].input_ptr,
+ tg->attrib[attr].input_stride,
+ index,
+ data[0], data[1],data[2], data[3]);
+
+tg->attrib[attr].emit( data, dst );
+ }
+  } else {
+ if(likely(tg->attrib[attr].copy_size >= 0))
+memcpy(dst, &instance_id, 4);
+ else
+ {
+data[0] = (float)instance_id;
+tg->attrib[attr].emit( data, dst );
+ }
+  }
+   }
+}
+
 /**
  * Fetch vertex attributes for 'count' vertices.
  */
@@ -373,71 +433,14 @@ static void PIPE_CDECL generic_run_elts( struct translate *translate,
 {
struct translate_generic *tg = translate_generic(translate);
char *vert = output_buffer;
-   unsigned nr_attrs = tg->nr_attrib;
-   unsigned attr;
unsigned i;
 
-   /* loop over vertex attributes (vertex shader inputs)
-*/
for (i = 0; i < count; i++) {
-  const unsigned elt = *elts++;
-
-  for (attr = 0; attr < nr_attrs; attr++) {
-float data[4];
-char *dst = vert + tg->attrib[attr].output_offset;
-
-if (tg->attrib[attr].type == TRANSLATE_ELEMENT_NORMAL) {
-const uint8_t *src;
-unsigned index;
-int copy_size;
-
-if (tg->attrib[attr].instance_divisor) {
-   index = instance_id / tg->attrib[attr].instance_divisor;
-} else {
-   index = elt;
-}
-
-/* clamp to void going out of bounds */
-index = MIN2(index, tg->attrib[attr].max_index);
-
-src = tg->attrib[attr].input_ptr +
-  tg->attrib[attr].input_stride * index;
-
-copy_size = tg->attrib[attr].copy_size;
-if(likely(copy_size >= 0))
-   memcpy(dst, src, copy_size);
-else
-{
-   tg->attrib[attr].fetch( data, src, 0, 0 );
-
-   if (0)
-  debug_printf("Fetch elt attr %d  from %p  stride %d  div %u  max %u  index %d:  "
-   " %f, %f, %f, %f \n",
-   attr,
-   tg->attrib[attr].input_ptr,
-   tg->attrib[attr].input_stride,
-   tg->attrib[attr].instance_divisor,
-   tg->attrib[attr].max_index,
-   index,
-   data[0], data[1],data[2], data[3]);
-   tg->attrib[attr].emit( data, dst );
-}
- } else {
-if(likely(tg->attrib[attr].copy_size >= 0))
-   memcpy(data, &instance_id, 4);
-else
-{
-   data[0] = (float)instance_id;
-   tg->attrib[attr].emit( data, dst );
-}
- }
-  }
+  generic_run_one(tg, *elts++, instance_id, vert);
   vert += tg->translate.key.output_stride;
}
 }
 
-
-
 static void PIPE_CDECL generic_run( str

[Mesa-dev] [PATCH 1/6] translate_generic: use memcpy if possible (v2)

2010-08-13 Thread Luca Barbieri
Changes in v2:
- Add comment regarding copy_size

When used in GPU drivers, translate can be used to simultaneously
perform a gather operation, and convert away from unsupported formats.

In this use case, input and output formats will often be identical: clearly
it would make sense to use a memcpy in this case.

Instead, translate currently insists on converting to and from 32-bit
floating point numbers.

This is not only extremely expensive, but it also loses precision for
32/64-bit integers and 64-bit floating point numbers.

This patch changes translate_generic to just use memcpy if the formats are
identical, non-blocked, and with an integral number of bytes per pixel (note
that all sensible vertex formats are like this).
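
To make the precision argument concrete, here is a small standalone C
sketch (not part of the patch; both helper names are hypothetical). It
compares passing a 32-bit integer through a 32-bit float, which has only
24 bits of mantissa, with copying the bytes unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert to float and back, as a fetch/emit pair through float[4] would.
 * Only meant for values well below 2^32: converting a float that rounded
 * up to 2^32 back to uint32_t would be undefined. */
static uint32_t roundtrip_via_float(uint32_t v)
{
   volatile float f = (float)v;   /* volatile forces rounding to float */
   return (uint32_t)f;
}

/* Copy the bytes unchanged, as the memcpy path does. */
static uint32_t roundtrip_via_memcpy(uint32_t v)
{
   uint32_t out;
   memcpy(&out, &v, sizeof out);
   return out;
}
```

With the default round-to-nearest mode, 16777217 (2^24 + 1) comes back as
16777216 from the float path, while the memcpy path returns it intact.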
---
 .../auxiliary/translate/translate_generic.c|  102 +--
 1 files changed, 70 insertions(+), 32 deletions(-)

diff --git a/src/gallium/auxiliary/translate/translate_generic.c b/src/gallium/auxiliary/translate/translate_generic.c
index 42cfd76..356d488 100644
--- a/src/gallium/auxiliary/translate/translate_generic.c
+++ b/src/gallium/auxiliary/translate/translate_generic.c
@@ -64,6 +64,14 @@ struct translate_generic {
   unsigned input_stride;
   unsigned max_index;
 
+  /* this value is set to -1 if this is a normal element with output_format != input_format:
+   * in this case, u_format is used to do a full conversion
+   *
+   * this value is set to the format size in bytes if output_format == input_format or for 32-bit instance ids:
+   * in this case, memcpy is used to copy this amount of bytes
+   */
+  int copy_size;
+
} attrib[PIPE_MAX_ATTRIBS];
 
unsigned nr_attrib;
@@ -354,8 +362,6 @@ static emit_func get_emit_func( enum pipe_format format )
}
 }
 
-
-
 /**
  * Fetch vertex attributes for 'count' vertices.
  */
@@ -380,9 +386,10 @@ static void PIPE_CDECL generic_run_elts( struct translate *translate,
 float data[4];
 char *dst = vert + tg->attrib[attr].output_offset;
 
- if (tg->attrib[attr].type == TRANSLATE_ELEMENT_NORMAL) {
+if (tg->attrib[attr].type == TRANSLATE_ELEMENT_NORMAL) {
 const uint8_t *src;
 unsigned index;
+int copy_size;
 
 if (tg->attrib[attr].instance_divisor) {
index = instance_id / tg->attrib[attr].instance_divisor;
@@ -396,27 +403,34 @@ static void PIPE_CDECL generic_run_elts( struct translate *translate,
 src = tg->attrib[attr].input_ptr +
   tg->attrib[attr].input_stride * index;
 
-tg->attrib[attr].fetch( data, src, 0, 0 );
-
-if (0)
-   debug_printf("Fetch elt attr %d  from %p  stride %d  div %u  max %u  index %d:  "
-" %f, %f, %f, %f \n",
-attr,
-tg->attrib[attr].input_ptr,
-tg->attrib[attr].input_stride,
-tg->attrib[attr].instance_divisor,
-tg->attrib[attr].max_index,
-index,
-data[0], data[1],data[2], data[3]);
+copy_size = tg->attrib[attr].copy_size;
+if(likely(copy_size >= 0))
+   memcpy(dst, src, copy_size);
+else
+{
+   tg->attrib[attr].fetch( data, src, 0, 0 );
+
+   if (0)
+  debug_printf("Fetch elt attr %d  from %p  stride %d  div %u  max %u  index %d:  "
+   " %f, %f, %f, %f \n",
+   attr,
+   tg->attrib[attr].input_ptr,
+   tg->attrib[attr].input_stride,
+   tg->attrib[attr].instance_divisor,
+   tg->attrib[attr].max_index,
+   index,
+   data[0], data[1],data[2], data[3]);
+   tg->attrib[attr].emit( data, dst );
+}
  } else {
-data[0] = (float)instance_id;
+if(likely(tg->attrib[attr].copy_size >= 0))
+   memcpy(data, &instance_id, 4);
+else
+{
+   data[0] = (float)instance_id;
+   tg->attrib[attr].emit( data, dst );
+}
  }
-
- if (0)
-debug_printf("vert %d/%d attr %d: %f %f %f %f\n",
- i, elt, attr, data[0], data[1], data[2], data[3]);
-
-tg->attrib[attr].emit( data, dst );
   }
   vert += tg->translate.key.output_stride;
}
@@ -448,6 +462,7 @@ static void PIPE_CDECL generic_run( struct translate *translate,
  if (tg->attrib[attr].type == TRANSLATE_ELEMENT_NORMAL) {
 const uint8_t *src;
 unsigned index;
+int copy_size;
 
 if (tg->attrib[attr].instance_divisor) {
index = instance_id / tg->attrib[attr].instan

[Mesa-dev] [PATCH 0/6] Translate improvements (v2)

2010-08-13 Thread Luca Barbieri
This patchset addresses review comments, and adds support for running
on CPUs lacking any SSE support, but only for format pairs that are
identical or swizzles of each other.

Luca Barbieri (6):
  translate_generic: use memcpy if possible (v2)
  translate_generic: factor out common code between linear and indexed
  translate_sse: remove useless generated function wrappers
  translate: add support for 8/16-bit indices
  rtasm: add minimal x86-64 support and new instructions
  translate_sse: major rewrite (v2)

 src/gallium/auxiliary/rtasm/rtasm_cpu.c|6 +-
 src/gallium/auxiliary/rtasm/rtasm_x86sse.c |  447 +++-
 src/gallium/auxiliary/rtasm/rtasm_x86sse.h |   67 +-
 src/gallium/auxiliary/translate/translate.h|   12 +
 .../auxiliary/translate/translate_generic.c|  207 ++--
 src/gallium/auxiliary/translate/translate_sse.c| 1270 +++-
 6 files changed, 1584 insertions(+), 425 deletions(-)



Re: [Mesa-dev] talloc (Was: Merge criteria for glsl2 branch)

2010-08-13 Thread José Fonseca
On Fri, 2010-08-13 at 02:03 -0700, Dave Airlie wrote:
> On Fri, Aug 13, 2010 at 4:58 PM, Aras Pranckevicius  wrote:
> >> > But I perceive talloc as different from all above: it's very low level
> >> > and low weight library, providing very basic functionality, and upstream
> >> > never showed interest for Windows portability. I'd really prefer to see
> >> > the talloc source bundled (and only compiled on windows), as a quick way
> >> > to have glsl2 merged without causing windows build failures.
> >>
> >> This seems like a reasonable compromise.  Is this something that you and
> >> / or Aras can tackle?  I don't have a Windows build system set up, so I
> >> wouldn't be able to test any build system changes that I made.
> >
> > Ok, looks like how/if to bundle talloc is still a very open question. In the
> > meantime, here's talloc 2.0.1 made to compile (and possibly work!) with
> > Visual C++ 2008 (Windows) and Xcode/gcc4.0 (Mac).
> > I've attached the modified talloc.c & talloc.h and the patch from original
> > talloc 2.0.1 (from here http://samba.org/ftp/talloc/). Caveat emptor: I only
> > verified this to work on my own GLSL2 fork, which does not compile in GLSL2
> > preprocessor, only the compiler & optimizer.
> > Like I said before, "full port" of talloc seems to be not needed for
> > compiling on Visual C++; just drop in talloc.h & talloc.c into the project
> > and that's it. Same for Mac with Xcode. It also seems that GLSL2 does not
> > use full talloc's functionality, and at least half of the implementation
> > could be dropped without anyone noticing. Just a note for if/when anyone
> > would try to re-implement talloc with Mesa's license.
> 
> Be careful about LGPLv3 rules,
> 
> If you are distributing anything linked with an LGPL library without
> accompanying source you need to dynamically link it,
>
> So for example a Windows driver or non open compiler, you can't just
> drop the LGPLv3 c+h files into the project, you need to create a
> dynamic library.

Yep. I got excited with v3's 
 http://www.gnu.org/licenses/lgpl.html section 5, "combined libraries",
but rereading it I found the requirement to use shared library (or ship
the object files for the closed source bits) is still there in section 4
d) 1).

I think this pretty much settles in my mind that we need a BSD
reimplementation of this in the medium term, as the hassle of changing
all the installer and code signing code to install/sign a new dll would
by far exceed the effort necessary to implement the functionality of
talloc missing from its muse, halloc.

Jose



Re: [Mesa-dev] [PATCH 6/6] translate_sse: major rewrite

2010-08-13 Thread Keith Whitwell
On Thu, 2010-08-12 at 10:22 -0700, Luca Barbieri wrote:
> translate_sse is currently very limited to the point of
> being useless in essentially all cases.
> 
> In particular, it only supports some float32 and unorm8
> formats and doesn't work on x86-64.
> 
> This commit rewrites it to support:
> 1. Dumb memory copy for any pair of identical formats
> 2. All formats that are swizzles of each other
> 3. Converting 32/64-bit floats and all 8/16/32-bit integers to 32-bit float
> 4. Converting unorm8/snorm8 to snorm16 and uscaled8/sscaled8 to sscaled16
> 5. Support for x86-64 (doesn't take advantage of it in any way though)
> 
> This new translate can even be useful to translate index buffers for
> cards that lack 8-bit index support.
> 
> It passes the testsuite I wrote, but note that this is a major change, and 
> more
> testing would be great.

Luca,

Beyond a few niggles, this is looking great - an impressive body of work...

Couple of comments:


-static void emit_load_R32G32( struct translate_sse *p, 
-  struct x86_reg data,
-  struct x86_reg arg0 )
+/* out_chans = 5 means we want 4 channels with 1 in alpha instead of 0 */
+static void emit_load_float32( struct translate_sse *p,
+   struct x86_reg data,
+   struct x86_reg arg0,
+   unsigned out_chans,
+   unsigned chans)
 {

Is it possible to use an explicit flag for the (out_chans == 5) case?  





   case 8:
+#ifdef PIPE_ARCH_X86_64
+  x64_mov64(p->func, dataGPR, src);
+  x64_mov64(p->func, dst, dataGPR);
+#else
+  sse_movlps(p->func, dataXMM, src);
+  sse_movlps(p->func, dst, dataXMM);
+#endif
+  break;
+   case 12:
+#ifdef PIPE_ARCH_X86_64
+  x64_mov64(p->func, dataGPR2, src);
+#else
+  sse_movlps(p->func, dataXMM, src);
+#endif
+  x86_mov(p->func, dataGPR, x86_make_disp(src, 8));
+#ifdef PIPE_ARCH_X86_64
+  x64_mov64(p->func, dst, dataGPR2);
+#else
+  sse_movlps(p->func, dst, dataXMM);
+#endif
+  x86_mov(p->func, x86_make_disp(dst, 8), dataGPR);


Is it possible to do this without all the #ifdefs?  Even if statements
based on a preprocessor variable would be easier to read, but better
still would be some sort of wrapper function which just did the right
thing on either architecture.
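
A rough sketch of the wrapper idea, as standalone C that only records the
mnemonics it would emit. Everything here is hypothetical (struct
emitter_sketch, emit, emit_copy64, emit_copy96 and the is_x86_64 field are
made-up names, not rtasm or translate_sse API), and the instruction order
is simplified compared to the quoted hunk.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the code generator state; not rtasm's struct. */
struct emitter_sketch {
   int is_x86_64;         /* decided once, when the emitter is created */
   const char *log[8];    /* mnemonics "emitted" so far */
   unsigned count;
};

static void emit(struct emitter_sketch *e, const char *mnemonic)
{
   assert(e->count < 8);
   e->log[e->count++] = mnemonic;
}

/* Wrapper: copy 8 bytes from src to dst through a scratch register,
 * picking a 64-bit GPR or an XMM register depending on the target. */
static void emit_copy64(struct emitter_sketch *e)
{
   if (e->is_x86_64) {
      emit(e, "mov64 gpr, [src]");
      emit(e, "mov64 [dst], gpr");
   } else {
      emit(e, "movlps xmm, [src]");
      emit(e, "movlps [dst], xmm");
   }
}

/* Callers stay free of #ifdefs: a 12-byte copy is 8 bytes plus 4 bytes. */
static void emit_copy96(struct emitter_sketch *e)
{
   emit_copy64(e);
   emit(e, "mov gpr32, [src+8]");
   emit(e, "mov [dst+8], gpr32");
}
```

If is_x86_64 were a compile-time constant instead of a field, the compiler
could drop the untaken branch, which keeps the readability benefit without
a runtime test.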

Similar comment applies to your x86-64 changes in rtasm.c -- is there a
way to reduce the #ifdef load?

...

+// TODO: add support for SSE4.1 pmovzx

Probably want to use C-style comments throughout.


Keith



Re: [Mesa-dev] [PATCH 1/6] translate_generic: use memcpy if possible

2010-08-13 Thread Keith Whitwell
Luca,

In this change you've got an int value (copy_size) which has some
special meaning when negative -- can you add comments explaining what
the meaning of a negative size is?  Is there a way to use some more
explicit flag value to indicate this condition?

Keith

On Thu, 2010-08-12 at 10:08 -0700, Luca Barbieri wrote:
> When used in GPU drivers, translate can be used to simultaneously
> perform a gather operation, and convert away from unsupported formats.
> 
> In this use case, input and output formats will often be identical: clearly
> it would make sense to use a memcpy in this case.
> 
> Instead, translate currently insists on converting to and from 32-bit floating point
> numbers.
> 
> This is not only extremely expensive, but it also loses precision for
> 32/64-bit integers and 64-bit floating point numbers.
> 
> This patch changes translate_generic to just use memcpy if the formats are
> identical, non-blocked, and with an integral number of bytes per pixel (note
> that all sensible vertex formats are like this).
> ---
>  .../auxiliary/translate/translate_generic.c|   93 +--
>  1 files changed, 63 insertions(+), 30 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/translate/translate_generic.c 
> b/src/gallium/auxiliary/translate/translate_generic.c
> index 42cfd76..57a42b7 100644
> --- a/src/gallium/auxiliary/translate/translate_generic.c
> +++ b/src/gallium/auxiliary/translate/translate_generic.c
> @@ -63,6 +63,7 @@ struct translate_generic {
>const uint8_t *input_ptr;
>unsigned input_stride;
>unsigned max_index;
> +  int copy_size;
>  
> } attrib[PIPE_MAX_ATTRIBS];
>  
> @@ -380,9 +381,10 @@ static void PIPE_CDECL generic_run_elts( struct 
> translate *translate,
>float data[4];
>char *dst = vert + tg->attrib[attr].output_offset;
>  
> - if (tg->attrib[attr].type == TRANSLATE_ELEMENT_NORMAL) {
> +  if (tg->attrib[attr].type == TRANSLATE_ELEMENT_NORMAL) {
>  const uint8_t *src;
>  unsigned index;
> +int copy_size;
>  
>  if (tg->attrib[attr].instance_divisor) {
> index = instance_id / tg->attrib[attr].instance_divisor;
> @@ -396,27 +398,34 @@ static void PIPE_CDECL generic_run_elts( struct 
> translate *translate,
>  src = tg->attrib[attr].input_ptr +
>tg->attrib[attr].input_stride * index;
>  
> -tg->attrib[attr].fetch( data, src, 0, 0 );
> -
> -if (0)
> -   debug_printf("Fetch elt attr %d  from %p  stride %d  div %u  
> max %u  index %d:  "
> -" %f, %f, %f, %f \n",
> -attr,
> -tg->attrib[attr].input_ptr,
> -tg->attrib[attr].input_stride,
> -tg->attrib[attr].instance_divisor,
> -tg->attrib[attr].max_index,
> -index,
> -data[0], data[1],data[2], data[3]);
> +copy_size = tg->attrib[attr].copy_size;
> +if(likely(copy_size >= 0))
> +   memcpy(dst, src, tg->attrib[attr].copy_size);
> +else
> +{
> +   tg->attrib[attr].fetch( data, src, 0, 0 );
> +
> +   if (0)
> +  debug_printf("Fetch elt attr %d  from %p  stride %d  div %u  max %u  index %d:  "
> +   " %f, %f, %f, %f \n",
> +   attr,
> +   tg->attrib[attr].input_ptr,
> +   tg->attrib[attr].input_stride,
> +   tg->attrib[attr].instance_divisor,
> +   tg->attrib[attr].max_index,
> +   index,
> +   data[0], data[1],data[2], data[3]);
> +   tg->attrib[attr].emit( data, dst );
> +}
>   } else {
> -data[0] = (float)instance_id;
> +if(likely(tg->attrib[attr].copy_size >= 0))
> +   memcpy(dst, &instance_id, 4);
> +else
> +{
> +   data[0] = (float)instance_id;
> +   tg->attrib[attr].emit( data, dst );
> +}
>   }
> -
> - if (0)
> -debug_printf("vert %d/%d attr %d: %f %f %f %f\n",
> - i, elt, attr, data[0], data[1], data[2], data[3]);
> -
> -  tg->attrib[attr].emit( data, dst );
>}
>vert += tg->translate.key.output_stride;
> }
> @@ -448,6 +457,7 @@ static void PIPE_CDECL generic_run( struct translate *translate,
>   if (tg->attrib[attr].type == TRANSLATE_ELEMENT_NORMAL) {
>  const uint8_t *src;
>  unsigned index;
> +int copy_size;
>  
>  if (tg->attrib[attr].instance_divisor) {
> index = instance_id / tg->attrib[attr].instance_divisor;
> 
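
The precision-loss point in the commit message can be illustrated with a short sketch. This is not code from the patch; roundtrip_via_float is a hypothetical helper that mimics a fetch-to-float followed by an emit-from-float, assuming IEEE 754 single precision (24-bit significand).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of Mesa: push a 32-bit unsigned integer
 * through a 32-bit float and back, as a float-based fetch/emit pair would. */
static uint32_t roundtrip_via_float(uint32_t v)
{
   /* Stay well inside the range where the conversion back is defined. */
   assert(v <= 0x7FFFFFFFu);

   /* volatile forces the value to be stored as a real 32-bit float, so
    * excess precision on some targets cannot hide the rounding. */
   volatile float f = (float)v;

   return (uint32_t)f;
}
```

Any integer above 2^24 that is not a multiple of the float spacing at that magnitude comes back changed, for example 16777217 (2^24 + 1) returns as 16777216. A plain memcpy of the same four bytes would preserve the value exactly.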

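The copy_size decision described in the commit message could look roughly like the sketch below. The struct fmt_desc and both function names are assumptions made for illustration; they are not Mesa's util_format API, and the part of the patch that actually computes copy_size is not shown in the quoted diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified format description (not Mesa's util_format). */
struct fmt_desc {
   int id;            /* format identifier */
   unsigned bits;     /* bits per element */
   unsigned block_w;  /* block width in pixels, 1 for non-blocked formats */
   unsigned block_h;  /* block height in pixels, 1 for non-blocked formats */
};

/* Return the number of bytes to memcpy per attribute, or -1 when the
 * attribute has to go through the generic fetch/emit conversion. */
static int compute_copy_size(const struct fmt_desc *in,
                             const struct fmt_desc *out)
{
   if (in->id != out->id)
      return -1;                       /* formats differ: must convert */
   if (in->block_w != 1 || in->block_h != 1)
      return -1;                       /* blocked (compressed) format */
   if (in->bits % 8 != 0)
      return -1;                       /* not a whole number of bytes */
   return (int)(in->bits / 8);
}

/* Copy the attribute if a fast path exists. Returns 1 when the bytes were
 * copied, 0 when the caller must fall back to fetch/emit. */
static int copy_or_fallback(void *dst, const void *src, int copy_size)
{
   assert(dst != NULL && src != NULL);
   if (copy_size < 0)
      return 0;
   memcpy(dst, src, (size_t)copy_size);
   return 1;
}
```

Computing the size once, when the translate object is created, keeps the per-vertex loop down to a single comparison and a memcpy, which matches the copy_size >= 0 test in the quoted hunks.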
Re: [Mesa-dev] talloc (Was: Merge criteria for glsl2 branch)

2010-08-13 Thread Aras Pranckevicius
> > Like I said before, a "full port" of talloc does not seem to be needed for
> > compiling on Visual C++; just drop talloc.h & talloc.c into the project
> > and that's it. Same for Mac with Xcode.
> Be careful about LGPLv3 rules.
> If you are distributing anything linked with an LGPL library without
> accompanying source, you need to dynamically link it.

I know. I'm just providing an MSVC/Xcode-compatible talloc source file.
How Mesa or some fork of Mesa includes it in the build or packages it
up, I'll just leave to them.

--
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info


Re: [Mesa-dev] talloc (Was: Merge criteria for glsl2 branch)

2010-08-13 Thread Dave Airlie
On Fri, Aug 13, 2010 at 4:58 PM, Aras Pranckevicius  wrote:
>> > But I perceive talloc as different from all above: it's very low level
>> > and low weight library, providing very basic functionality, and upstream
>> > never showed interest for Windows portability. I'd really prefer to see
>> > the talloc source bundled (and only compiled on windows), as a quick way
>> > to have glsl2 merged without causing windows build failures.
>>
>> This seems like a reasonable compromise.  Is this something that you and
>> / or Aras can tackle?  I don't have a Windows build system set up, so I
>> wouldn't be able to test any build system changes that I made.
>
> Ok, looks like how/if to bundle talloc is still a very open question. In the
> meantime, here's talloc 2.0.1 made to compile (and possibly work!) with
> Visual C++ 2008 (Windows) and Xcode/gcc4.0 (Mac).
> I've attached the modified talloc.c & talloc.h and the patch from original
> talloc 2.0.1 (from here http://samba.org/ftp/talloc/). Caveat emptor: I only
> verified this to work on my own GLSL2 fork, which does not compile in GLSL2
> preprocessor, only the compiler & optimizer.
> Like I said before, a "full port" of talloc does not seem to be needed for
> compiling on Visual C++; just drop talloc.h & talloc.c into the project
> and that's it. Same for Mac with Xcode. It also seems that GLSL2 does not
> use talloc's full functionality, and at least half of the implementation
> could be dropped without anyone noticing. Just a note in case anyone
> tries to re-implement talloc under Mesa's license.

Be careful about LGPLv3 rules.

If you are distributing anything linked with an LGPL library without
accompanying source, you need to dynamically link it.

So for example, for a Windows driver or a non-open compiler, you can't just
drop the LGPLv3 .c and .h files into the project; you need to create a
dynamic library.

Dave.


Re: [Mesa-dev] [PATCH]R600g more pipe_cap shader params

2010-08-13 Thread Владимир
OK, I used OpenGL Extension Viewer from realtech VR to identify some
parameter values (using some data from the Windows/OS X ATI drivers), and
also identified most of the limits by their normal names.
The patch contains some new values and comments for them.

2010/8/10 Marek Olšák 

> I've already committed some of the changes and fixed others here:
>
>
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=00963589b4d92460e3ae2c1557a5d816b5c67a6d
>
> If you still think there is something incorrect, please attach a new patch
> against current mesa git.
>
> -Marek
>
> On Sun, Aug 8, 2010 at 9:30 PM, Владимир  wrote:
>
>> The patch is based mainly on info from r600c, with a few bits taken from
>> r300g (vertex tex instruction params).
>>
>


r600_pipecap.patch
Description: Binary data