Re: [Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64
On Fri, Aug 13, 2010 at 8:03 PM, Luca Barbieri wrote:
>> #if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
>> #define PIPE_ARCH_LITTLE_ENDIAN
>> #elif defined(PIPE_ARCH_PPC) || defined(PIPE_ARCH_PPC_64)
>> #define PIPE_ARCH_BIG_ENDIAN
>> #else
>> #define PIPE_ARCH_UNKNOWN_ENDIAN
>> #endif
>
> Note that this isn't really correct: the endianness must be known by
> the compiler, since it must choose a way to represent global
> initialized 16/32-bit integer variables, among others.
>
> Also, at least some PowerPCs can be configured as little endian (even
> though it is unusual to do so).
>
> Usually the compiler sets the macro WORDS_BIGENDIAN to indicate
> big-endian targets, and this is the one that should be tested.

Feel free to fix it; I introduced those as a base for some r300g PPC-specific fixes, and have never owned the relevant hardware.

~ C.

--
When the facts change, I change my mind. What do you do, sir? ~ Keynes

Corbin Simpson
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Merge of glsl2 branch to master
Ian Romanick wrote:
> I propose that we merge master to glsl2 on *Friday, August 13th* (one
> week from today). Barring unforeseen issues, I propose that we merge
> glsl2 to master on *Monday, August 16th*.

The master -> glsl2 merge is complete. There don't appear to be any regressions in the glsl2 branch caused by the merge. My plan is to merge glsl2 -> master on Monday evening, Pacific time.

There are still three build issues with MSVC. All three either have patches or a proposed fix.

I've been working on this for almost 12 hours today, so I'm not going to post the combinations of test results that I usually post. Sorry.
Re: [Mesa-dev] [PATCH 2/2] translate_sse: major rewrite (v4)
Changes in v4:
- Use x86_target() and x86_target_caps()
- Enable translate_sse in x86-64, but not in Win64

Changes in v3:
- Win64 support (untested)
- Use u_cpu_detect.h constants instead of #ifs

Changes in v2:
- Minimize #ifs
- Give a name to magic number CHANNELS_0001
- Add support for CPUs without SSE (only memcpy and swizzles, like non SSE2)
- Fixed comments

translate_sse is currently very limited to the point of being useless in essentially all cases. In particular, it only supports some float32 and unorm8 formats and doesn't work on x86-64.

This commit rewrites it to support:
1. Dumb memory copy for any pair of identical formats
2. All formats that are swizzles of each other
3. Converting 32/64-bit floats and all 8/16/32-bit integers to 32-bit float
4. Converting unorm8/snorm8 to snorm16 and uscaled8/sscaled8 to sscaled16
5. Support for x86-64 (doesn't take advantage of it in any way though)

This new translate can even be useful to translate index buffers for cards that lack 8-bit index support.

It passes the testsuite I wrote, but note that this is a major change, and more testing would be great.

0002-translate_sse-major-rewrite-v4.patch.gz
Description: GNU Zip compressed data
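For illustration only (this is not code from the attached patch, which emits SSE machine code at run time): a scalar C sketch of two of the conversions listed above, a unorm8 to float32 conversion and a four-channel swizzle. The function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* unorm8 -> float32: 0..255 maps linearly onto 0.0..1.0. */
static float example_unorm8_to_float(uint8_t v)
{
   return (float) v / 255.0f;
}

/*
 * Swizzle: every output channel is a copy of some input channel.
 * swz[c] names the source channel (0..3) for destination channel c.
 */
static void example_swizzle4_u8(uint8_t *dst, const uint8_t *src,
                                const uint8_t swz[4], size_t count)
{
   size_t i;
   unsigned c;

   for (i = 0; i < count; i++) {
      for (c = 0; c < 4; c++)
         dst[4 * i + c] = src[4 * i + swz[c]];
   }
}
```

With swz = {2, 1, 0, 3} the second function turns RGBA bytes into BGRA, which is the kind of "formats that are swizzles of each other" case mentioned in item 2.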
[Mesa-dev] [PATCH] u_cpu_detect: remove arch and little_endian
This logic duplicates the one in p_config.h, so remove it and adjust the only two places that were using it.
---
 src/gallium/auxiliary/gallivm/lp_bld_pack.c       |    7 +++
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |    6 +
 src/gallium/auxiliary/util/u_cpu_detect.c         |   18 --
 src/gallium/auxiliary/util/u_cpu_detect.h         |   13 +
 4 files changed, 9 insertions(+), 35 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
index ecfb13a..b7b630f 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
@@ -171,14 +171,13 @@ lp_build_unpack2(LLVMBuilderRef builder,
    msb = lp_build_zero(src_type);
 
    /* Interleave bits */
-   if(util_cpu_caps.little_endian) {
+#ifdef PIPE_ARCH_LITTLE_ENDIAN
       *dst_lo = lp_build_interleave2(builder, src_type, src, msb, 0);
       *dst_hi = lp_build_interleave2(builder, src_type, src, msb, 1);
-   }
-   else {
+#else
       *dst_lo = lp_build_interleave2(builder, src_type, msb, src, 0);
       *dst_hi = lp_build_interleave2(builder, src_type, msb, src, 1);
-   }
+#endif
 
    /* Cast the result into the new type (twice as wide) */
 
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
index 3075065..02d43e3 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
@@ -1840,7 +1840,11 @@ lp_build_sample_2d_linear_aos(struct lp_build_sample_context *bld,
       unsigned i, j;
 
       for(j = 0; j < h16.type.length; j += 4) {
-         unsigned subindex = util_cpu_caps.little_endian ? 0 : 1;
+#ifdef PIPE_ARCH_LITTLE_ENDIAN
+         unsigned subindex = 0;
+#else
+         unsigned subindex = 1;
+#endif
          LLVMValueRef index;
 
          index = LLVMConstInt(elem_type, j/2 + subindex, 0);
diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c b/src/gallium/auxiliary/util/u_cpu_detect.c
index b1a8c75..2bbc554 100644
--- a/src/gallium/auxiliary/util/u_cpu_detect.c
+++ b/src/gallium/auxiliary/util/u_cpu_detect.c
@@ -391,23 +391,6 @@ util_cpu_detect(void)
 
    memset(&util_cpu_caps, 0, sizeof util_cpu_caps);
 
-   /* Check for arch type */
-#if defined(PIPE_ARCH_MIPS)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_MIPS;
-#elif defined(PIPE_ARCH_ALPHA)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_ALPHA;
-#elif defined(PIPE_ARCH_SPARC)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_SPARC;
-#elif defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_X86;
-   util_cpu_caps.little_endian = 1;
-#elif defined(PIPE_ARCH_PPC)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_POWERPC;
-   util_cpu_caps.little_endian = 0;
-#else
-   util_cpu_caps.arch = UTIL_CPU_ARCH_UNKNOWN;
-#endif
-
    /* Count the number of CPUs in system */
 #if defined(PIPE_OS_WINDOWS)
    {
@@ -504,7 +487,6 @@ util_cpu_detect(void)
 
 #ifdef DEBUG
    if (debug_get_option_dump_cpu()) {
-      debug_printf("util_cpu_caps.arch = %i\n", util_cpu_caps.arch);
       debug_printf("util_cpu_caps.nr_cpus = %u\n", util_cpu_caps.nr_cpus);
 
       debug_printf("util_cpu_caps.x86_cpu_type = %u\n", util_cpu_caps.x86_cpu_type);
diff --git a/src/gallium/auxiliary/util/u_cpu_detect.h b/src/gallium/auxiliary/util/u_cpu_detect.h
index 4b3dc39..f3bef09 100644
--- a/src/gallium/auxiliary/util/u_cpu_detect.h
+++ b/src/gallium/auxiliary/util/u_cpu_detect.h
@@ -36,26 +36,15 @@
 #define _UTIL_CPU_DETECT_H
 
 #include "pipe/p_compiler.h"
-
-enum util_cpu_arch {
-   UTIL_CPU_ARCH_UNKNOWN = 0,
-   UTIL_CPU_ARCH_MIPS,
-   UTIL_CPU_ARCH_ALPHA,
-   UTIL_CPU_ARCH_SPARC,
-   UTIL_CPU_ARCH_X86,
-   UTIL_CPU_ARCH_POWERPC
-};
+#include "pipe/p_config.h"
 
 struct util_cpu_caps {
-   enum util_cpu_arch arch;
    unsigned nr_cpus;
 
    /* Feature flags */
    int x86_cpu_type;
    unsigned cacheline;
 
-   unsigned little_endian:1;
-
    unsigned has_tsc:1;
    unsigned has_mmx:1;
    unsigned has_mmx2:1;
--
1.7.0.4
[Mesa-dev] [PATCH 1/2] rtasm: add minimal x86-64 support and new instructions (v3)
Changes in v3: - Add target and target caps functions, so that they could be different in principle from the current CPU and they don't need #ifs to check Changes in v2: - Win64 support (untested) - Use u_cpu_detect.h constants instead of #ifs This commit adds minimal x86-64 support: only movs between registers are supported for r8-r15, and x64_rexw() must be used to ask for 64-bit operations. It also adds several new instructions for the new translate_sse code. --- src/gallium/auxiliary/rtasm/rtasm_cpu.c|6 +- src/gallium/auxiliary/rtasm/rtasm_x86sse.c | 477 ++-- src/gallium/auxiliary/rtasm/rtasm_x86sse.h | 101 ++- 3 files changed, 544 insertions(+), 40 deletions(-) diff --git a/src/gallium/auxiliary/rtasm/rtasm_cpu.c b/src/gallium/auxiliary/rtasm/rtasm_cpu.c index 2e15751..0461c81 100644 --- a/src/gallium/auxiliary/rtasm/rtasm_cpu.c +++ b/src/gallium/auxiliary/rtasm/rtasm_cpu.c @@ -30,7 +30,7 @@ #include "rtasm_cpu.h" -#if defined(PIPE_ARCH_X86) +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) static boolean rtasm_sse_enabled(void) { static boolean firsttime = 1; @@ -49,7 +49,7 @@ static boolean rtasm_sse_enabled(void) int rtasm_cpu_has_sse(void) { /* FIXME: actually detect this at run-time */ -#if defined(PIPE_ARCH_X86) +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) return rtasm_sse_enabled(); #else return 0; @@ -59,7 +59,7 @@ int rtasm_cpu_has_sse(void) int rtasm_cpu_has_sse2(void) { /* FIXME: actually detect this at run-time */ -#if defined(PIPE_ARCH_X86) +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) return rtasm_sse_enabled(); #else return 0; diff --git a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c index 63007c1..e80875a 100644 --- a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c +++ b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c @@ -22,8 +22,9 @@ **/ #include "pipe/p_config.h" +#include "util/u_cpu_detect.h" -#if defined(PIPE_ARCH_X86) +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) 
#include "pipe/p_compiler.h" #include "util/u_debug.h" @@ -231,6 +232,10 @@ static void emit_modrm( struct x86_function *p, assert(reg.mod == mod_REG); + /* TODO: support extended x86-64 registers */ + assert(reg.idx < 8); + assert(regmem.idx < 8); + val |= regmem.mod << 6; /* mod field */ val |= reg.idx << 3;/* reg field */ val |= regmem.idx; /* r/m field */ @@ -363,6 +368,12 @@ int x86_get_label( struct x86_function *p ) */ +void x64_rexw(struct x86_function *p) +{ + if(x86_target(p) != X86_32) + emit_1ub(p, 0x48); +} + void x86_jcc( struct x86_function *p, enum x86_cc cc, int label ) @@ -449,6 +460,52 @@ void x86_mov_reg_imm( struct x86_function *p, struct x86_reg dst, int imm ) emit_1i(p, imm); } +void x86_mov_imm( struct x86_function *p, struct x86_reg dst, int imm ) +{ + DUMP_RI( dst, imm ); + if(dst.mod == mod_REG) + x86_mov_reg_imm(p, dst, imm); + else + { + emit_1ub(p, 0xc7); + emit_modrm_noreg(p, 0, dst); + emit_1i(p, imm); + } +} + +void x86_mov16_imm( struct x86_function *p, struct x86_reg dst, uint16_t imm ) +{ + DUMP_RI( dst, imm ); + emit_1ub(p, 0x66); + if(dst.mod == mod_REG) + { + emit_1ub(p, 0xb8 + dst.idx); + emit_2ub(p, imm & 0xff, imm >> 8); + } + else + { + emit_1ub(p, 0xc7); + emit_modrm_noreg(p, 0, dst); + emit_2ub(p, imm & 0xff, imm >> 8); + } +} + +void x86_mov8_imm( struct x86_function *p, struct x86_reg dst, uint8_t imm ) +{ + DUMP_RI( dst, imm ); + if(dst.mod == mod_REG) + { + emit_1ub(p, 0xb0 + dst.idx); + emit_1ub(p, imm); + } + else + { + emit_1ub(p, 0xc6); + emit_modrm_noreg(p, 0, dst); + emit_1ub(p, imm); + } +} + /** * Immediate group 1 instructions. 
*/ @@ -520,7 +577,7 @@ void x86_push( struct x86_function *p, } - p->stack_offset += 4; + p->stack_offset += sizeof(void*); } void x86_push_imm32( struct x86_function *p, @@ -530,7 +587,7 @@ void x86_push_imm32( struct x86_function *p, emit_1ub(p, 0x68); emit_1i(p, imm32); - p->stack_offset += 4; + p->stack_offset += sizeof(void*); } @@ -540,23 +597,33 @@ void x86_pop( struct x86_function *p, DUMP_R( reg ); assert(reg.mod == mod_REG); emit_1ub(p, 0x58 + reg.idx); - p->stack_offset -= 4; + p->stack_offset -= sizeof(void*); } void x86_inc( struct x86_function *p, struct x86_reg reg ) { DUMP_R( reg ); - assert(reg.mod == mod_REG); - emit_1ub(p, 0x40 + reg.idx); + if(x86_target(p) == X86_32 && reg.mod == mod_REG) + { + emit_1ub(p, 0x40 + reg.idx); + return; + } + emit_1ub(p, 0xff); + emit_modrm_noreg(p, 0, reg); } void x86_dec( struct x86_function *p, str
[Mesa-dev] [PATCH 0/2] translate_sse/rtasm improvements (v4)
This new version replaces direct use of u_cpu_detect.h with rtasm-provided helpers to check the target and caps.

This seems the cleanest solution, as it allows targeting CPUs other than the running one in theory, and avoids both #ifdefs and duplicating the p_config.h logic.

The u_cpu_detect.h patch is now separate and independent from these changes.

Luca Barbieri (2):
  rtasm: add minimal x86-64 support and new instructions (v3)
  translate_sse: major rewrite (v4)

 src/gallium/auxiliary/rtasm/rtasm_cpu.c         |    6 +-
 src/gallium/auxiliary/rtasm/rtasm_x86sse.c      |  477 +-
 src/gallium/auxiliary/rtasm/rtasm_x86sse.h      |  101 ++-
 src/gallium/auxiliary/translate/translate.c     |    3 +-
 src/gallium/auxiliary/translate/translate_sse.c | 1159 ++-
 5 files changed, 1467 insertions(+), 279 deletions(-)
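As a rough illustration of what "asking for 64-bit operations" means in the rtasm patch, here is a self-contained sketch of a REX.W prefix in front of a register-to-register MOV. The emitter struct and function names are hypothetical and are not the rtasm API; the is_64bit flag merely stands in for a target check such as x86_target(p) != X86_32.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mini-emitter, only for this example. */
struct example_emitter {
   uint8_t buf[16];
   size_t len;
   int is_64bit;   /* stands in for a check of the code generation target */
};

static void example_emit_byte(struct example_emitter *e, uint8_t b)
{
   if (e->len < sizeof e->buf)
      e->buf[e->len++] = b;
}

/* Emit the REX.W prefix (0x48) only when generating 64-bit code. */
static void example_rexw(struct example_emitter *e)
{
   if (e->is_64bit)
      example_emit_byte(e, 0x48);
}

/* MOV dst, src between two of the eight legacy registers (index 0..7). */
static void example_mov_reg_reg(struct example_emitter *e,
                                unsigned dst, unsigned src)
{
   example_emit_byte(e, 0x89);                                /* MOV r/m, r */
   example_emit_byte(e, (uint8_t)(0xC0 | (src << 3) | dst));  /* mod=11, reg=src, r/m=dst */
}
```

Calling example_rexw() and then example_mov_reg_reg(e, 0, 3) yields the bytes 48 89 D8 (mov rax, rbx) for a 64-bit target and 89 D8 (mov eax, ebx) otherwise, so the same call sequence works for both targets.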
[Mesa-dev] [Bug 29572] [glsl] MSVC build fails with some C99 math functions
https://bugs.freedesktop.org/show_bug.cgi?id=29572

Ian Romanick changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|mesa-...@lists.freedesktop. |i...@freedesktop.org
                   |org                         |

--- Comment #1 from Ian Romanick 2010-08-13 20:24:30 PDT ---
Created an attachment (id=37855)
 View: https://bugs.freedesktop.org/attachment.cgi?id=37855
 Review: https://bugs.freedesktop.org/review?bug=29572&attachment=37855

Work-arounds for platforms that lack C99 math functions

--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
[Mesa-dev] [Bug 29573] [glsl2] struct within a struct causes an assertion failure
https://bugs.freedesktop.org/show_bug.cgi?id=29573

Ian Romanick changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|mesa-...@lists.freedesktop. |e...@anholt.net
                   |org                         |
[Mesa-dev] [Bug 29573] New: [glsl2] struct within a struct causes an assertion failure
https://bugs.freedesktop.org/show_bug.cgi?id=29573

           Summary: [glsl2] struct within a struct causes an assertion failure
           Product: Mesa
           Version: git
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Mesa core
        AssignedTo: mesa-dev@lists.freedesktop.org
        ReportedBy: i...@freedesktop.org

In CorrectFull.frag, the following structure causes an assertion failure:

struct light1 {
    float intensity;
    vec3 position;
    int test_int[2];
    struct {
        int a;
        float f;
    } light2;
} lightVar;

This variable is never dereferenced in the program. The assertion failure is:

ir_validate.cpp:382: void check_node_type(ir_instruction*, void*): Assertion `ir->type != glsl_type::error_type' failed.

This was first triggered after the commit listed below, but I believe that is spurious. The actual assertion is that the declaration of a variable lightVar_light2 has an error type. My guess is that rearranging optimization passes has caused ir_validate to be called before this unused declaration could be removed.

commit 2f4fe151681a6f6afe1d452eece6cf4144f44e49
Author: Eric Anholt
Date:   Tue Aug 10 13:06:49 2010 -0700

    glsl2: Move the common optimization passes to a helper function.

    These are passes that we expect all codegen to be happy with. The other
    lowering passes for Mesa IR are moved to the Mesa IR generator.
Re: [Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64
> #if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
> #define PIPE_ARCH_LITTLE_ENDIAN
> #elif defined(PIPE_ARCH_PPC) || defined(PIPE_ARCH_PPC_64)
> #define PIPE_ARCH_BIG_ENDIAN
> #else
> #define PIPE_ARCH_UNKNOWN_ENDIAN
> #endif

Note that this isn't really correct: the endianness must be known by the compiler, since it must choose a way to represent global initialized 16/32-bit integer variables, among others.

Also, at least some PowerPCs can be configured as little endian (even though it is unusual to do so).

Usually the compiler sets the macro WORDS_BIGENDIAN to indicate big-endian targets, and this is the one that should be tested.
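A minimal sketch of the kind of test suggested here, under a few assumptions: GCC and Clang predefine __BYTE_ORDER__ together with __ORDER_BIG_ENDIAN__, while WORDS_BIGENDIAN is, as far as I know, normally supplied by the build system (for example autoconf's AC_C_BIGENDIAN) rather than by the compiler itself. EXAMPLE_BIG_ENDIAN and the helper function are made-up names, not part of p_config.h.

```c
#include <stdint.h>
#include <string.h>

/*
 * Compile-time detection. Prefer the compiler-provided __BYTE_ORDER__,
 * then fall back to a build-system-provided WORDS_BIGENDIAN.
 */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__)
#  if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#    define EXAMPLE_BIG_ENDIAN 1
#  else
#    define EXAMPLE_BIG_ENDIAN 0
#  endif
#elif defined(WORDS_BIGENDIAN)
#  define EXAMPLE_BIG_ENDIAN 1
#else
#  define EXAMPLE_BIG_ENDIAN 0   /* assumption: nothing known, guess little endian */
#endif

/* Run-time cross-check: inspect the first byte of a known 32-bit value. */
static int example_runtime_big_endian(void)
{
   const uint32_t probe = 0x01020304u;
   unsigned char first;

   memcpy(&first, &probe, 1);
   return first == 0x01;
}
```

Because the macro is always defined to 0 or 1, it can be used both in #if and in an ordinary if, and the run-time helper gives a cheap sanity check in debug builds.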
Re: [Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64
I came up with yet another solution, which I believe is the right one.

We remove the arch/abi/endianness in u_cpu_detect.h, but add them as inline function helpers in rtasm. Currently they would return a constant, but could be changed if we ever want rtasm to target anything but the current running CPU.

Most places outside of code generation will actually not even parse/compile for the wrong architecture (they are inline assembly or intrinsic usage), and thus can't use ifs, so this should work well.

Changes to replace PIPE_ARCH_* with commonly used macros can be done separately if desired.
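A sketch of what such an inline helper could look like. Everything here is hypothetical: the enum, the struct and the function names are invented for the example (only the idea of a target query returning a constant comes from the mail), and non-x86 hosts simply fall through to the 32-bit value so that the sketch compiles everywhere.

```c
/* Hypothetical target enumeration for a run-time assembler. */
enum example_x86_target {
   EXAMPLE_X86_32,
   EXAMPLE_X86_64_STD_ABI,
   EXAMPLE_X86_64_WIN64_ABI
};

struct example_x86_function {
   unsigned char *store;   /* code buffer, unused in this sketch */
};

/*
 * Today the answer is a compile-time constant, so the compiler can fold
 * any "if" that tests it. Later it could be read from *p instead.
 */
static inline enum example_x86_target
example_x86_target(const struct example_x86_function *p)
{
   (void) p;
#if defined(_WIN64)
   return EXAMPLE_X86_64_WIN64_ABI;
#elif defined(__x86_64__)
   return EXAMPLE_X86_64_STD_ABI;
#else
   return EXAMPLE_X86_32;
#endif
}

/* A caller uses a plain C condition, with no #ifdef at the use site. */
static unsigned example_stack_slot_size(const struct example_x86_function *p)
{
   return example_x86_target(p) == EXAMPLE_X86_32 ? 4 : 8;
}
```

The design point is that only the helper contains preprocessor checks; code generation routines stay free of #ifdefs and would keep working if the target ever became a per-function property.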
[Mesa-dev] [Bug 29572] New: [glsl] MSVC build fails with some C99 math functions
https://bugs.freedesktop.org/show_bug.cgi?id=29572

           Summary: [glsl] MSVC build fails with some C99 math functions
           Product: Mesa
           Version: git
          Platform: x86 (IA32)
        OS/Version: Windows (All)
            Status: NEW
          Severity: blocker
          Priority: medium
         Component: Mesa core
        AssignedTo: mesa-dev@lists.freedesktop.org
        ReportedBy: v...@vmware.com

mesa: 8f8cdbfba43550d0b8985fb087961864e4cd92b6 (glsl2)

Build with MSVC.

$ scons quiet=no
...
cl /Fobuild\windows-x86-debug\glsl\ir_constant_expression.obj /c src\glsl\ir_constant_expression.cpp /TP /nologo /Od /Oi /Oy- /GL- /fp:fast /W3 /MTd /LDd /DDEBUG /DWIN32 /D_WINDOWS /D_WIN32_WINNT=0x0601 /DWINVER=0x0601 /DVC_EXTRALEAN /D_USE_MATH_DEFINES /D_CRT_SECURE_NO_WARNINGS /D_CRT_SECURE_NO_DEPRECATE /D_SCL_SECURE_NO_WARNINGS /D_SCL_SECURE_NO_DEPRECATE /D_DEBUG /DPIPE_SUBSYSTEM_WINDOWS_USER /Isrc\talloc /Isrc\mapi /Isrc\mesa /Iinclude /Isrc\gallium\include /Isrc\gallium\auxiliary /Isrc\gallium\drivers /Isrc\gallium\winsys /Iinclude\c99 /Z7
ir_constant_expression.cpp
src\glsl\ir_constant_expression.cpp(112) : warning C4244: '=' : conversion from 'float' to 'int', possible loss of data
src\glsl\ir_constant_expression.cpp(118) : warning C4244: '=' : conversion from 'int' to 'float', possible loss of data
src\glsl\ir_constant_expression.cpp(124) : warning C4244: '=' : conversion from 'unsigned int' to 'float', possible loss of data
src\glsl\ir_constant_expression.cpp(130) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data
src\glsl\ir_constant_expression.cpp(136) : warning C4800: 'float' : forcing value to bool 'true' or 'false' (performance warning)
src\glsl\ir_constant_expression.cpp(148) : warning C4800: 'unsigned int' : forcing value to bool 'true' or 'false' (performance warning)
src\glsl\ir_constant_expression.cpp(155) : error C3861: 'truncf': identifier not found
src\glsl\ir_constant_expression.cpp(209) : warning C4146: unary minus operator applied to unsigned type, result still unsigned
src\glsl\ir_constant_expression.cpp(275) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data
src\glsl\ir_constant_expression.cpp(286) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data
src\glsl\ir_constant_expression.cpp(307) : error C3861: 'exp2f': identifier not found
src\glsl\ir_constant_expression.cpp(321) : error C3861: 'log2f': identifier not found
src\glsl\ir_constant_expression.cpp(883) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data
src\glsl\ir_constant_expression.cpp(1068) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data
src\glsl\ir_constant_expression.cpp(1077) : warning C4244: 'initializing' : conversion from 'double' to 'const float', possible loss of data
src\glsl\ir_constant_expression.cpp(1117) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data
scons: *** [build\windows-x86-debug\glsl\ir_constant_expression.obj] Error 2
scons: building terminated because of errors.
[Mesa-dev] [Bug 29044] GLSL compiler tracker
https://bugs.freedesktop.org/show_bug.cgi?id=29044

Bug 29044 depends on bug 29500, which changed state.

Bug 29500 Summary: [glsl2]Mesa demo shadow_sampler fail to run with error: `shadow2DRectProj' undeclared
https://bugs.freedesktop.org/show_bug.cgi?id=29500

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
         Resolution|                            |FIXED
             Status|ASSIGNED                    |RESOLVED
[Mesa-dev] [Bug 29044] GLSL compiler tracker
https://bugs.freedesktop.org/show_bug.cgi?id=29044

Bug 29044 depends on bug 29537, which changed state.

Bug 29537 Summary: [glsl2] texture2DLod() should not be accepted by fragment programs
https://bugs.freedesktop.org/show_bug.cgi?id=29537

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
         Resolution|                            |FIXED
             Status|NEW                         |RESOLVED
Re: [Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64
> There's no merit in duplicating in util_caps what's already provided by
> p_config.h / p_compiler.h

Indeed it's not a great thing. However, Keith wanted to be able to check those with ifs instead of #ifdefs, and it does indeed make the code a bit nicer. But the current definitions in p_config.h don't allow that.

So it's either:
1. Duplicate it like I did in the latest patchset
2. Replace the p_config.h logic with something like in the latest patchset and change hundreds of places in the codebase
3. Change the p_config.h logic so everything not defined is set to 0 instead
4. Give up and just use the #ifdefs as I did in the earlier patchsets

Thinking about it again, I'd suggest either #3 or #4 instead of the #1 I did there. I don't think it's worth spending much time on this matter though.

BTW, I'm not sure why the PIPE_ARCH_* defines exist in the first place, since the compiler already provides __i386__, WORDS_BIGENDIAN and similar. If Windows doesn't define them, ad-hoc code can be introduced to define those, like

#if defined(WIN32) && !defined(__i386__)
#define __i386__
#endif

This should reduce the learning curve to the codebase.

Several similar issues exist like the use of INLINE, FREE, etc. instead of manually defining the commonly used keywords in places where they are not available. But this is another matter, and also not really worth spending much time on (except perhaps "INLINE" which at least personally I manage to forget about almost every time).
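To make option #3 concrete, here is a small sketch in which every architecture macro is always defined, to either 1 or 0, so that it can be tested with a plain if. The EXAMPLE_ARCH_* names and the helper function are hypothetical and are not the real p_config.h macros.

```c
/* Each macro is always defined: 1 when the target matches, 0 otherwise. */
#if defined(__x86_64__) || defined(_M_X64)
#  define EXAMPLE_ARCH_X86_64 1
#else
#  define EXAMPLE_ARCH_X86_64 0
#endif

#if defined(__i386__) || defined(_M_IX86)
#  define EXAMPLE_ARCH_X86 1
#else
#  define EXAMPLE_ARCH_X86 0
#endif

/*
 * Both branches are parsed and type-checked on every target; since the
 * conditions are constants, the compiler can drop the dead ones.
 */
static const char *example_arch_name(void)
{
   if (EXAMPLE_ARCH_X86_64)
      return "x86-64";
   else if (EXAMPLE_ARCH_X86)
      return "x86";
   else
      return "other";
}
```

The trade-off is the one described in the mail: existing "#ifdef PIPE_ARCH_X86" style tests would have to become "#if", because a macro defined to 0 still counts as defined.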
[Mesa-dev] [Bug 29571] [glsl2] glcpp/glcpp-parse.y(312) : error C2146: syntax error : missing ')' before identifier 'PRIiMAX'
https://bugs.freedesktop.org/show_bug.cgi?id=29571

Ian Romanick changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|e...@anholt.net             |mesa-...@lists.freedesktop.
                   |                            |org
[Mesa-dev] [Bug 29545] [bisected glsl2]piglit glslparsertest_preprocess1.frag fails
https://bugs.freedesktop.org/show_bug.cgi?id=29545

Ian Romanick changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|mesa-...@lists.freedesktop. |cwo...@cworth.org
                   |org                         |
[Mesa-dev] [Bug 29545] [bisected glsl2]piglit glslparsertest_preprocess1.frag fails
https://bugs.freedesktop.org/show_bug.cgi?id=29545

Ian Romanick changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         OS/Version|Linux (All)                 |All
          Component|Drivers/DRI/i965            |Mesa core
         AssignedTo|e...@anholt.net             |mesa-...@lists.freedesktop.
                   |                            |org
[Mesa-dev] [Bug 29044] GLSL compiler tracker
https://bugs.freedesktop.org/show_bug.cgi?id=29044

Bug 29044 depends on bug 29540, which changed state.

Bug 29540 Summary: [glsl2] problem with vertex attribute locations and draw-time validation
https://bugs.freedesktop.org/show_bug.cgi?id=29540

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         Resolution|                            |FIXED
             Status|ASSIGNED                    |RESOLVED
Re: [Mesa-dev] [PATCH] Mesa prog_optimize.c: better optimization for Mesa programs
Just looked at your modifications. I will try to be more GL/mesa style compliant for the variable types.
Thanks for your help
Ben

From: Brian Paul [bri...@vmware.com]
Sent: Friday, August 13, 2010 4:01 PM
To: Segovia, Benjamin
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] Mesa prog_optimize.c: better optimization for Mesa programs

On 08/11/2010 09:21 PM, Segovia, Benjamin wrote:
> Corrected.
>
> I rescanned the whole code and tried to perform more aggressive checks.
> I reran all the tests, warsow and nexuiz.
>
> Please find the updated patch attached.

Committed. Thanks.

-Brian
Re: [Mesa-dev] [PATCH] Mesa prog_optimize.c: better optimization for Mesa programs
On 08/11/2010 09:21 PM, Segovia, Benjamin wrote:
> Corrected.
>
> I rescanned the whole code and tried to perform more aggressive checks.
> I reran all the tests, warsow and nexuiz.
>
> Please find the updated patch attached.

Committed. Thanks.

-Brian
Re: [Mesa-dev] [PATCH] dri/r300: test for FEATURE defines
Chia-I Wu wrote:
>> Fixes a fatal build error when compiling just OpenGL ES libraries, since
>> FEATURE_EXT_framebuffer_blit is disabled then, so the BlitFramebuffer
>> member doesn't exist.
> Is this change enough to make dri_r300 function as a GLES only driver?
>
> To be honest, I am a little reluctant to sprinkle "#if FEATURE" in the drivers
> at the moment. The drivers, except for intel, have not specified GLES api
> support yet. Even when they do, I would hope there is a more systematic way to
> enable/disable certain features, to effectively reduce the driver size.

It's already sprinkled through the DRI drivers right now though (it's in intel_fbo.c at least), because struct dd_function_table's members in src/mesa/main/dd.h are #if'd based on the feature defines. As it is, the code for radeon and nouveau (and the mesa state tracker, now that I check) is just borked without them. (new patch for st/mesa attached)

From 428e355978dbc4c3fff00ee46ad7f8455a07308a Mon Sep 17 00:00:00 2001
From: nobled
Date: Mon, 12 Jul 2010 21:22:08 -0400
Subject: [PATCH] dri/radeon: test for FEATURE defines

'struct dd_function_table' only conditionally contains the function pointer NewFramebuffer and friends based on FEATURE_EXT_framebuffer_* defines. (See src/mesa/main/dd.h)

Fixes the build when the features are disabled and the vfuncs don't exist.
--- src/mesa/drivers/dri/radeon/radeon_fbo.c |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/src/mesa/drivers/dri/radeon/radeon_fbo.c b/src/mesa/drivers/dri/radeon/radeon_fbo.c index 5174850..0597d42 100644 --- a/src/mesa/drivers/dri/radeon/radeon_fbo.c +++ b/src/mesa/drivers/dri/radeon/radeon_fbo.c @@ -609,6 +609,7 @@ radeon_validate_framebuffer(GLcontext *ctx, struct gl_framebuffer *fb) void radeon_fbo_init(struct radeon_context *radeon) { +#if FEATURE_EXT_framebuffer_object radeon->glCtx->Driver.NewFramebuffer = radeon_new_framebuffer; radeon->glCtx->Driver.NewRenderbuffer = radeon_new_renderbuffer; radeon->glCtx->Driver.BindFramebuffer = radeon_bind_framebuffer; @@ -617,7 +618,10 @@ void radeon_fbo_init(struct radeon_context *radeon) radeon->glCtx->Driver.FinishRenderTexture = radeon_finish_render_texture; radeon->glCtx->Driver.ResizeBuffers = radeon_resize_buffers; radeon->glCtx->Driver.ValidateFramebuffer = radeon_validate_framebuffer; +#endif +#if FEATURE_EXT_framebuffer_blit radeon->glCtx->Driver.BlitFramebuffer = _mesa_meta_BlitFramebuffer; +#endif } -- 1.5.4.3 From 5cd07814b2ee90bec0eef3cb9ee40043a838c49e Mon Sep 17 00:00:00 2001 From: nobled Date: Mon, 12 Jul 2010 22:53:32 -0400 Subject: [PATCH] dri/nouveau: test for FEATURE defines 'struct dd_function_table' only conditionally contains the function pointer NewFramebuffer and friends based on FEATURE_EXT_framebuffer_* defines. (See src/mesa/main/dd.h) Fixes the build when the features are disabled and the vfuncs don't exist. 
--- src/mesa/drivers/dri/nouveau/nouveau_driver.c |2 ++ src/mesa/drivers/dri/nouveau/nouveau_fbo.c|2 ++ 2 files changed, 4 insertions(+), 0 deletions(-) diff --git a/src/mesa/drivers/dri/nouveau/nouveau_driver.c b/src/mesa/drivers/dri/nouveau/nouveau_driver.c index 4ec864c..6452fe2 100644 --- a/src/mesa/drivers/dri/nouveau/nouveau_driver.c +++ b/src/mesa/drivers/dri/nouveau/nouveau_driver.c @@ -138,5 +138,7 @@ nouveau_driver_functions_init(struct dd_function_table *functions) functions->DrawPixels = _mesa_meta_DrawPixels; functions->CopyPixels = _mesa_meta_CopyPixels; functions->Bitmap = _mesa_meta_Bitmap; +#if FEATURE_EXT_framebuffer_blit functions->BlitFramebuffer = _mesa_meta_BlitFramebuffer; +#endif } diff --git a/src/mesa/drivers/dri/nouveau/nouveau_fbo.c b/src/mesa/drivers/dri/nouveau/nouveau_fbo.c index bd1273b..32d8f2d 100644 --- a/src/mesa/drivers/dri/nouveau/nouveau_fbo.c +++ b/src/mesa/drivers/dri/nouveau/nouveau_fbo.c @@ -262,10 +262,12 @@ nouveau_finish_render_texture(GLcontext *ctx, void nouveau_fbo_functions_init(struct dd_function_table *functions) { +#if FEATURE_EXT_framebuffer_object functions->NewFramebuffer = nouveau_framebuffer_new; functions->NewRenderbuffer = nouveau_renderbuffer_new; functions->BindFramebuffer = nouveau_bind_framebuffer; functions->FramebufferRenderbuffer = nouveau_framebuffer_renderbuffer; functions->RenderTexture = nouveau_render_texture; functions->FinishRenderTexture = nouveau_finish_render_texture; +#endif } -- 1.5.4.3 From 9c58ac433666dbdc95f6d0d2dba182379606e390 Mon Sep 17 00:00:00 2001 From: nobled Date: Fri, 13 Aug 2010 20:23:11 + Subject: [PATCH] st/mesa: test for FEATURE defines 'struct dd_function_table' only conditionally contains the function pointer NewFramebuffer and friends based on FEATURE_EXT_framebuffer_* defines. (See src/mesa/main/dd.h) Fixes the build when the features are disabled and the vfuncs don't exist. 
---
 src/mesa/state_tracker/st_cb_fbo.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git
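
For readers following the FEATURE-define patches above, here is a minimal sketch of the pattern they apply. It is not Mesa code: the struct, the functions, and the default values of the two feature switches are hypothetical stand-ins, chosen only to show that a member which exists conditionally must be assigned under the same condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature switches; in Mesa they come from the build
 * configuration.  Set either to 0 to compile the matching hooks out. */
#ifndef FEATURE_EXT_framebuffer_object
#define FEATURE_EXT_framebuffer_object 1
#endif
#ifndef FEATURE_EXT_framebuffer_blit
#define FEATURE_EXT_framebuffer_blit 0
#endif

/* Simplified stand-in for struct dd_function_table: some members only
 * exist when the corresponding feature is enabled. */
struct demo_function_table {
   void (*Clear)(void);
#if FEATURE_EXT_framebuffer_object
   int (*NewFramebuffer)(int name);
#endif
#if FEATURE_EXT_framebuffer_blit
   void (*BlitFramebuffer)(void);
#endif
};

static void demo_clear(void) {}
#if FEATURE_EXT_framebuffer_object
static int demo_new_framebuffer(int name) { return name; }
#endif
#if FEATURE_EXT_framebuffer_blit
static void demo_blit(void) {}
#endif

/* Every assignment is guarded by the same condition as the member it
 * touches, so the function builds whichever features are disabled. */
static void demo_functions_init(struct demo_function_table *f)
{
   assert(f != NULL);
   f->Clear = demo_clear;
#if FEATURE_EXT_framebuffer_object
   f->NewFramebuffer = demo_new_framebuffer;
#endif
#if FEATURE_EXT_framebuffer_blit
   f->BlitFramebuffer = demo_blit;
#endif
}
```

Dropping the guard around the BlitFramebuffer assignment while FEATURE_EXT_framebuffer_blit is 0 reproduces the kind of build failure the patches describe, since the member no longer exists.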
Re: [Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64
On Fri, 2010-08-13 at 21:41 +0100, José Fonseca wrote:
> On Fri, 2010-08-13 at 06:47 -0700, Luca Barbieri wrote:
> > A few related changes:
> > 1. Make x86-64 its own architecture (nothing was using so
> >    util_cpu_caps.arch, so nothing can be affected)
>
> Just remove util_cpu_caps.arch. It's there simply due to its historical
> ancestry. We have PIPE_ARCH already.
>
> > 2. Turn the CPU arch and endianness into macros, so that the compiler
> >    can evaluate that at constant time and eliminate dead code
>
> Ditto. We have PIPE_ENDIAN or something already.

From p_config.h:

/*
 * Endian detection.
 */
#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
#define PIPE_ARCH_LITTLE_ENDIAN
#elif defined(PIPE_ARCH_PPC) || defined(PIPE_ARCH_PPC_64)
#define PIPE_ARCH_BIG_ENDIAN
#else
#define PIPE_ARCH_UNKNOWN_ENDIAN
#endif

Basically, in my perspective, util_cpu_caps should *only* have the stuff
that can vary at run time. Everything else should be macros in
p_config.h/p_compiler.h.

The rest of the patches in the series look OK to me.

Jose

> > 3. Add util_cpu_abi to know about non-standard ABIs like Win64
>
> That's not really prescribed by the CPU. We have PIPE_OS_* already.
>
> There's no merit in duplicating in util_caps what's already provided by
> p_config.h / p_compiler.h
>
> Jose
>
> > ---
> >  src/gallium/auxiliary/gallivm/lp_bld_pack.c       |    2 +-
> >  src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |    2 +-
> >  src/gallium/auxiliary/util/u_cpu_detect.c         |   19 +-
> >  src/gallium/auxiliary/util/u_cpu_detect.h         |   39 ++--
> >  4 files changed, 38 insertions(+), 24 deletions(-)
> >
> > diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
> > index ecfb13a..8ab742a 100644
> > --- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
> > +++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
> > @@ -171,7 +171,7 @@ lp_build_unpack2(LLVMBuilderRef builder,
> >     msb = lp_build_zero(src_type);
> >
> >     /* Interleave bits */
> > -   if(util_cpu_caps.little_endian) {
> > +   if(util_cpu_little_endian) {
> >        *dst_lo = lp_build_interleave2(builder, src_type, src, msb, 0);
> >        *dst_hi = lp_build_interleave2(builder, src_type, src, msb, 1);
> >     }
> > diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> > index 3075065..d4b8b4f 100644
> > --- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> > +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> > @@ -1840,7 +1840,7 @@ lp_build_sample_2d_linear_aos(struct lp_build_sample_context *bld,
> >        unsigned i, j;
> >
> >        for(j = 0; j < h16.type.length; j += 4) {
> > -         unsigned subindex = util_cpu_caps.little_endian ? 0 : 1;
> > +         unsigned subindex = util_cpu_little_endian ? 0 : 1;
> >           LLVMValueRef index;
> >
> >           index = LLVMConstInt(elem_type, j/2 + subindex, 0);
> > diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c b/src/gallium/auxiliary/util/u_cpu_detect.c
> > index b1a8c75..73ce146 100644
> > --- a/src/gallium/auxiliary/util/u_cpu_detect.c
> > +++ b/src/gallium/auxiliary/util/u_cpu_detect.c
> > @@ -391,23 +391,6 @@ util_cpu_detect(void)
> >
> >     memset(&util_cpu_caps, 0, sizeof util_cpu_caps);
> >
> > -   /* Check for arch type */
> > -#if defined(PIPE_ARCH_MIPS)
> > -   util_cpu_caps.arch = UTIL_CPU_ARCH_MIPS;
> > -#elif defined(PIPE_ARCH_ALPHA)
> > -   util_cpu_caps.arch = UTIL_CPU_ARCH_ALPHA;
> > -#elif defined(PIPE_ARCH_SPARC)
> > -   util_cpu_caps.arch = UTIL_CPU_ARCH_SPARC;
> > -#elif defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
> > -   util_cpu_caps.arch = UTIL_CPU_ARCH_X86;
> > -   util_cpu_caps.little_endian = 1;
> > -#elif defined(PIPE_ARCH_PPC)
> > -   util_cpu_caps.arch = UTIL_CPU_ARCH_POWERPC;
> > -   util_cpu_caps.little_endian = 0;
> > -#else
> > -   util_cpu_caps.arch = UTIL_CPU_ARCH_UNKNOWN;
> > -#endif
> > -
> >     /* Count the number of CPUs in system */
> >  #if defined(PIPE_OS_WINDOWS)
> >     {
> > @@ -504,7 +487,7 @@ util_cpu_detect(void)
> >
> >  #ifdef DEBUG
> >     if (debug_get_option_dump_cpu()) {
> > -      debug_printf("util_cpu_caps.arch = %i\n", util_cpu_caps.arch);
> > +      debug_printf("util_cpu_caps.arch = %i\n", util_cpu_arch);
> >        debug_printf("util_cpu_caps.nr_cpus = %u\n", util_cpu_caps.nr_cpus);
> >
> >        debug_printf("util_cpu_caps.x86_cpu_type = %u\n", util_cpu_caps.x86_cpu_type);
> > diff --git a/src/gallium/auxiliary/util/u_cpu_detect.h b/src/gallium/auxiliary/util/u_cpu_detect.h
> > index 4b3dc39..e81e4b5 100644
> > --- a/src/gallium/auxiliary/util/u_cpu_detect.h
> > +++ b/src/gallium/auxiliary/util/u_cpu_detect.h
> > @@ -36,6 +36,7 @@
> >  #define _UTIL_CPU_DETECT_H
> >
> >  #include "pipe/p_compiler.h"
> > +#include "pipe/p_config.h"
> >
> >  enum util_cpu_arch {
> >     UTIL_CPU_ARCH_UNKNOWN
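
To illustrate the point about keeping compile-time facts in macros rather than in util_cpu_caps, here is a rough sketch of an endianness constant that the compiler can fold, plus a run-time probe used only as a cross-check. It assumes a GCC- or Clang-style compiler that predefines __BYTE_ORDER__; the DEMO_LITTLE_ENDIAN name and both helper functions are hypothetical and are not what p_config.h defines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical macro; p_config.h uses PIPE_ARCH_*_ENDIAN instead.
 * Relies on the compiler-provided __BYTE_ORDER__ (GCC/Clang). */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define DEMO_LITTLE_ENDIAN 1
#elif defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define DEMO_LITTLE_ENDIAN 0
#else
#error "unknown byte order: define DEMO_LITTLE_ENDIAN by hand"
#endif

/* Run-time probe: looks at the first byte of a known 32-bit value. */
static int demo_runtime_little_endian(void)
{
   const uint32_t probe = 1;
   unsigned char first;
   memcpy(&first, &probe, 1);
   return first == 1;
}

/* Mirrors the "subindex" selection in the quoted patch.  Because the
 * condition is a constant, the unused branch can be eliminated. */
static unsigned demo_subindex(void)
{
   assert(DEMO_LITTLE_ENDIAN == demo_runtime_little_endian());
   return DEMO_LITTLE_ENDIAN ? 0 : 1;
}
```

Since the macro is a constant expression, it can also be used in #if, which a field of a run-time struct cannot.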
Re: [Mesa-dev] [PATCH 2/3] rtasm: add minimal x86-64 support and new instructions (v2)
Luca,

This is great stuff. But one request: if Win64 is untested, please make
sure it is disabled by default until somebody has had the opportunity to
test it. Unfortunately I'm really busy with other stuff ATM and don't
have the time.

Jose

On Fri, 2010-08-13 at 06:47 -0700, Luca Barbieri wrote:
> Changes in v2:
> - Win64 support (untested)
> - Use u_cpu_detect.h constants instead of #ifs
>
> This commit adds minimal x86-64 support: only movs between registers
> are supported for r8-r15, and x64_rexw() must be used to ask for 64-bit
> operations.
>
> It also adds several new instructions for the new translate_sse code.
> ---
>  src/gallium/auxiliary/rtasm/rtasm_cpu.c    |    6 +-
>  src/gallium/auxiliary/rtasm/rtasm_x86sse.c |  455 ++--
>  src/gallium/auxiliary/rtasm/rtasm_x86sse.h |   69 -
>  3 files changed, 493 insertions(+), 37 deletions(-)
>
> diff --git a/src/gallium/auxiliary/rtasm/rtasm_cpu.c b/src/gallium/auxiliary/rtasm/rtasm_cpu.c
> index 2e15751..0461c81 100644
> --- a/src/gallium/auxiliary/rtasm/rtasm_cpu.c
> +++ b/src/gallium/auxiliary/rtasm/rtasm_cpu.c
> @@ -30,7 +30,7 @@
>  #include "rtasm_cpu.h"
>
>
> -#if defined(PIPE_ARCH_X86)
> +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
>  static boolean rtasm_sse_enabled(void)
>  {
>     static boolean firsttime = 1;
> @@ -49,7 +49,7 @@ static boolean rtasm_sse_enabled(void)
>  int rtasm_cpu_has_sse(void)
>  {
>     /* FIXME: actually detect this at run-time */
> -#if defined(PIPE_ARCH_X86)
> +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
>     return rtasm_sse_enabled();
>  #else
>     return 0;
> @@ -59,7 +59,7 @@ int rtasm_cpu_has_sse(void)
>  int rtasm_cpu_has_sse2(void)
>  {
>     /* FIXME: actually detect this at run-time */
> -#if defined(PIPE_ARCH_X86)
> +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
>     return rtasm_sse_enabled();
>  #else
>     return 0;
> diff --git a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
> index 63007c1..88b182b 100644
> --- a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
> +++ b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
> @@ -22,8 +22,9 @@
>   **/
>
>  #include "pipe/p_config.h"
> +#include "util/u_cpu_detect.h"
>
> -#if defined(PIPE_ARCH_X86)
> +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
>
>  #include "pipe/p_compiler.h"
>  #include "util/u_debug.h"
> @@ -231,6 +232,10 @@ static void emit_modrm( struct x86_function *p,
>
>     assert(reg.mod == mod_REG);
>
> +   /* TODO: support extended x86-64 registers */
> +   assert(reg.idx < 8);
> +   assert(regmem.idx < 8);
> +
>     val |= regmem.mod << 6;     /* mod field */
>     val |= reg.idx << 3;        /* reg field */
>     val |= regmem.idx;          /* r/m field */
> @@ -363,6 +368,12 @@ int x86_get_label( struct x86_function *p )
>   */
>
>
> +void x64_rexw(struct x86_function *p)
> +{
> +   if(util_cpu_arch == UTIL_CPU_ARCH_X86_64)
> +      emit_1ub(p, 0x48);
> +}
> +
>  void x86_jcc( struct x86_function *p,
>                enum x86_cc cc,
>                int label )
> @@ -449,6 +460,52 @@ void x86_mov_reg_imm( struct x86_function *p, struct x86_reg dst, int imm )
>     emit_1i(p, imm);
>  }
>
> +void x86_mov_imm( struct x86_function *p, struct x86_reg dst, int imm )
> +{
> +   DUMP_RI( dst, imm );
> +   if(dst.mod == mod_REG)
> +      x86_mov_reg_imm(p, dst, imm);
> +   else
> +   {
> +      emit_1ub(p, 0xc7);
> +      emit_modrm_noreg(p, 0, dst);
> +      emit_1i(p, imm);
> +   }
> +}
> +
> +void x86_mov16_imm( struct x86_function *p, struct x86_reg dst, uint16_t imm )
> +{
> +   DUMP_RI( dst, imm );
> +   emit_1ub(p, 0x66);
> +   if(dst.mod == mod_REG)
> +   {
> +      emit_1ub(p, 0xb8 + dst.idx);
> +      emit_2ub(p, imm & 0xff, imm >> 8);
> +   }
> +   else
> +   {
> +      emit_1ub(p, 0xc7);
> +      emit_modrm_noreg(p, 0, dst);
> +      emit_2ub(p, imm & 0xff, imm >> 8);
> +   }
> +}
> +
> +void x86_mov8_imm( struct x86_function *p, struct x86_reg dst, uint8_t imm )
> +{
> +   DUMP_RI( dst, imm );
> +   if(dst.mod == mod_REG)
> +   {
> +      emit_1ub(p, 0xb0 + dst.idx);
> +      emit_1ub(p, imm);
> +   }
> +   else
> +   {
> +      emit_1ub(p, 0xc6);
> +      emit_modrm_noreg(p, 0, dst);
> +      emit_1ub(p, imm);
> +   }
> +}
> +
>  /**
>   * Immediate group 1 instructions.
>   */
> @@ -520,7 +577,7 @@ void x86_push( struct x86_function *p,
>     }
>
>
> -   p->stack_offset += 4;
> +   p->stack_offset += sizeof(void*);
>  }
>
>  void x86_push_imm32( struct x86_function *p,
> @@ -530,7 +587,7 @@ void x86_push_imm32( struct x86_function *p,
>     emit_1ub(p, 0x68);
>     emit_1i(p, imm32);
>
> -   p->stack_offset += 4;
> +   p->stack_offset += sizeof(void*);
>  }
>
>
> @@ -540,23 +597,33 @@ void x86_pop( struct x86_function *p,
>     DUMP_R( reg );
>     assert(reg.mod == mod_REG);
>     emit_1ub(p, 0x58 + reg.idx);
> -   p
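
As an aside on what the 0x48 byte emitted by x64_rexw() does, the sketch below encodes a register-to-register mov with and without a REX.W prefix, and a 16-bit immediate mov using the 0x66 operand-size prefix. It is a standalone illustration, not the rtasm API: the demo_buf type and the demo_* functions are hypothetical, and only the eight legacy registers (index 0 to 7) are handled, matching the limitation asserted in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte buffer standing in for struct x86_function. */
struct demo_buf {
   uint8_t bytes[16];
   size_t len;
};

static void demo_emit(struct demo_buf *b, uint8_t v)
{
   assert(b->len < sizeof b->bytes);
   b->bytes[b->len++] = v;
}

/* mov dst, src between legacy registers.  With wide != 0 a REX.W
 * prefix (0x48) selects 64-bit operand size; without it the same
 * opcode operates on the 32-bit registers. */
static void demo_mov_reg_reg(struct demo_buf *b, unsigned dst,
                             unsigned src, int wide)
{
   assert(dst < 8 && src < 8);
   if (wide)
      demo_emit(b, 0x48);                              /* REX.W */
   demo_emit(b, 0x89);                                 /* MOV r/m, r */
   demo_emit(b, (uint8_t)(0xc0 | (src << 3) | dst));   /* mod=11 reg r/m */
}

/* mov dst16, imm16: operand-size prefix, then B8+r, then the
 * immediate in little-endian byte order. */
static void demo_mov16_reg_imm(struct demo_buf *b, unsigned dst,
                               uint16_t imm)
{
   assert(dst < 8);
   demo_emit(b, 0x66);
   demo_emit(b, (uint8_t)(0xb8 + dst));
   demo_emit(b, (uint8_t)(imm & 0xff));
   demo_emit(b, (uint8_t)(imm >> 8));
}
```

With dst = 0 (rax/eax) and src = 3 (rbx/ebx), the wide form produces the bytes 48 89 D8, i.e. mov rax, rbx, and the narrow form produces 89 D8, i.e. mov eax, ebx.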
Re: [Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64
On Fri, 2010-08-13 at 06:47 -0700, Luca Barbieri wrote:
> A few related changes:
> 1. Make x86-64 its own architecture (nothing was using so
>    util_cpu_caps.arch, so nothing can be affected)

Just remove util_cpu_caps.arch. It's there simply due to its historical
ancestry. We have PIPE_ARCH already.

> 2. Turn the CPU arch and endianness into macros, so that the compiler
>    can evaluate that at constant time and eliminate dead code

Ditto. We have PIPE_ENDIAN or something already.

> 3. Add util_cpu_abi to know about non-standard ABIs like Win64

That's not really prescribed by the CPU. We have PIPE_OS_* already.

There's no merit in duplicating in util_caps what's already provided by
p_config.h / p_compiler.h

Jose

> ---
>  src/gallium/auxiliary/gallivm/lp_bld_pack.c       |    2 +-
>  src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |    2 +-
>  src/gallium/auxiliary/util/u_cpu_detect.c         |   19 +-
>  src/gallium/auxiliary/util/u_cpu_detect.h         |   39 ++--
>  4 files changed, 38 insertions(+), 24 deletions(-)
>
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
> index ecfb13a..8ab742a 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
> @@ -171,7 +171,7 @@ lp_build_unpack2(LLVMBuilderRef builder,
>     msb = lp_build_zero(src_type);
>
>     /* Interleave bits */
> -   if(util_cpu_caps.little_endian) {
> +   if(util_cpu_little_endian) {
>        *dst_lo = lp_build_interleave2(builder, src_type, src, msb, 0);
>        *dst_hi = lp_build_interleave2(builder, src_type, src, msb, 1);
>     }
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> index 3075065..d4b8b4f 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> @@ -1840,7 +1840,7 @@ lp_build_sample_2d_linear_aos(struct lp_build_sample_context *bld,
>        unsigned i, j;
>
>        for(j = 0; j < h16.type.length; j += 4) {
> -         unsigned subindex = util_cpu_caps.little_endian ? 0 : 1;
> +         unsigned subindex = util_cpu_little_endian ? 0 : 1;
>           LLVMValueRef index;
>
>           index = LLVMConstInt(elem_type, j/2 + subindex, 0);
> diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c b/src/gallium/auxiliary/util/u_cpu_detect.c
> index b1a8c75..73ce146 100644
> --- a/src/gallium/auxiliary/util/u_cpu_detect.c
> +++ b/src/gallium/auxiliary/util/u_cpu_detect.c
> @@ -391,23 +391,6 @@ util_cpu_detect(void)
>
>     memset(&util_cpu_caps, 0, sizeof util_cpu_caps);
>
> -   /* Check for arch type */
> -#if defined(PIPE_ARCH_MIPS)
> -   util_cpu_caps.arch = UTIL_CPU_ARCH_MIPS;
> -#elif defined(PIPE_ARCH_ALPHA)
> -   util_cpu_caps.arch = UTIL_CPU_ARCH_ALPHA;
> -#elif defined(PIPE_ARCH_SPARC)
> -   util_cpu_caps.arch = UTIL_CPU_ARCH_SPARC;
> -#elif defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
> -   util_cpu_caps.arch = UTIL_CPU_ARCH_X86;
> -   util_cpu_caps.little_endian = 1;
> -#elif defined(PIPE_ARCH_PPC)
> -   util_cpu_caps.arch = UTIL_CPU_ARCH_POWERPC;
> -   util_cpu_caps.little_endian = 0;
> -#else
> -   util_cpu_caps.arch = UTIL_CPU_ARCH_UNKNOWN;
> -#endif
> -
>     /* Count the number of CPUs in system */
>  #if defined(PIPE_OS_WINDOWS)
>     {
> @@ -504,7 +487,7 @@ util_cpu_detect(void)
>
>  #ifdef DEBUG
>     if (debug_get_option_dump_cpu()) {
> -      debug_printf("util_cpu_caps.arch = %i\n", util_cpu_caps.arch);
> +      debug_printf("util_cpu_caps.arch = %i\n", util_cpu_arch);
>        debug_printf("util_cpu_caps.nr_cpus = %u\n", util_cpu_caps.nr_cpus);
>
>        debug_printf("util_cpu_caps.x86_cpu_type = %u\n", util_cpu_caps.x86_cpu_type);
> diff --git a/src/gallium/auxiliary/util/u_cpu_detect.h b/src/gallium/auxiliary/util/u_cpu_detect.h
> index 4b3dc39..e81e4b5 100644
> --- a/src/gallium/auxiliary/util/u_cpu_detect.h
> +++ b/src/gallium/auxiliary/util/u_cpu_detect.h
> @@ -36,6 +36,7 @@
>  #define _UTIL_CPU_DETECT_H
>
>  #include "pipe/p_compiler.h"
> +#include "pipe/p_config.h"
>
>  enum util_cpu_arch {
>     UTIL_CPU_ARCH_UNKNOWN = 0,
> @@ -43,19 +44,49 @@ enum util_cpu_arch {
>     UTIL_CPU_ARCH_ALPHA,
>     UTIL_CPU_ARCH_SPARC,
>     UTIL_CPU_ARCH_X86,
> -   UTIL_CPU_ARCH_POWERPC
> +   UTIL_CPU_ARCH_X86_64,
> +   UTIL_CPU_ARCH_POWERPC,
> +
> +   /* non-standard ABIs, only used in util_cpu_abi */
> +   UTIL_CPU_ABI_WIN64
>  };
>
> +/* Check for arch type */
> +#if defined(PIPE_ARCH_MIPS)
> +#define util_cpu_arch UTIL_CPU_ARCH_MIPS
> +#elif defined(PIPE_ARCH_ALPHA)
> +#define util_cpu_arch UTIL_CPU_ARCH_ALPHA
> +#elif defined(PIPE_ARCH_SPARC)
> +#define util_cpu_arch UTIL_CPU_ARCH_SPARC
> +#elif defined(PIPE_ARCH_X86)
> +#define util_cpu_arch UTIL_CPU_ARCH_X86
> +#elif defined(PIPE_ARCH_X86_64)
> +#define util_cpu_arch UTIL_CPU_ARCH_X86_64
> +#elif defined(PIPE_ARCH_PPC)
> +#define uti
[Mesa-dev] [Bug 29460] GNU/Hurd support
https://bugs.freedesktop.org/show_bug.cgi?id=29460

Jon TURNEY changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jon.tur...@dronecode.org.uk

--- Comment #3 from Jon TURNEY 2010-08-13 12:21:19 PDT ---
This is a much better way of achieving what I was trying to achieve with
bug #27840

I have reviewed and tested the changes and they look good to me, but I'm
not building for linux either :-)
[Mesa-dev] [PATCH 3/3] translate_sse: major rewrite (v3)
Changes in v3:
- Win64 support (untested)
- Use u_cpu_detect.h constants instead of #ifs

Changes in v2:
- Minimize #ifs
- Give a name to magic number CHANNELS_0001
- Add support for CPUs without SSE (only memcpy and swizzles, like non SSE2)
- Fixed comments

translate_sse is currently very limited, to the point of being useless in
essentially all cases.

In particular, it only supports some float32 and unorm8 formats and doesn't
work on x86-64.

This commit rewrites it to support:
1. Dumb memory copy for any pair of identical formats
2. All formats that are swizzles of each other
3. Converting 32/64-bit floats and all 8/16/32-bit integers to 32-bit float
4. Converting unorm8/snorm8 to snorm16 and uscaled8/sscaled8 to sscaled16
5. Support for x86-64 (doesn't take advantage of it in any way though)

This new translate can even be useful to translate index buffers for
cards that lack 8-bit index support.

It passes the testsuite I wrote, but note that this is a major change, and
more testing would be great.
---
 src/gallium/auxiliary/translate/translate_sse.c | 1162 ++-
 1 files changed, 924 insertions(+), 238 deletions(-)

diff --git a/src/gallium/auxiliary/translate/translate_sse.c b/src/gallium/auxiliary/translate/translate_sse.c
index f9aab92..565edd2 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -30,11 +30,13 @@
 #include "pipe/p_compiler.h"
 #include "util/u_memory.h"
 #include "util/u_math.h"
+#include "util/u_format.h"
+#include "util/u_cpu_detect.h"

 #include "translate.h"

-#if defined(PIPE_ARCH_X86)
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)

 #include "rtasm/rtasm_cpu.h"
 #include "rtasm/rtasm_x86sse.h"
@@ -48,7 +50,7 @@
 struct translate_buffer {
    const void *base_ptr;
-   unsigned stride;
+   uintptr_t stride;
    unsigned max_index;
 };

@@ -72,12 +74,10 @@ struct translate_sse {
    struct x86_function *func;

    boolean loaded_identity;
-   boolean loaded_255;
-   boolean loaded_inv_255;
+   boolean loaded_const[5];

    float identity[4];
-   float float_255[4];
-   float inv_255[4];
+   float const_value[5][4];

    struct translate_buffer buffer[PIPE_MAX_ATTRIBS];
    unsigned nr_buffers;
@@ -96,10 +96,12 @@ struct translate_sse {
     * like this is helpful to keep them in sync across the file.
     */
    struct x86_reg tmp_EAX;
-   struct x86_reg idx_EBX;     /* either start+i or &elt[i] */
-   struct x86_reg outbuf_ECX;
-   struct x86_reg machine_EDX;
-   struct x86_reg count_ESI;    /* decrements to zero */
+   struct x86_reg tmp2_EDX;
+   struct x86_reg tmp3_ECX;
+   struct x86_reg idx_ESI;     /* either start+i or &elt[i] */
+   struct x86_reg machine_EDI;
+   struct x86_reg outbuf_EBX;
+   struct x86_reg count_EBP;    /* decrements to zero */
 };

 static int get_offset( const void *a, const void *b )
@@ -111,7 +113,7 @@ static int get_offset( const void *a, const void *b )

 static struct x86_reg get_identity( struct translate_sse *p )
 {
-   struct x86_reg reg = x86_make_reg(file_XMM, 6);
+   struct x86_reg reg = x86_make_reg(file_XMM, 7);

    if (!p->loaded_identity) {
       p->loaded_identity = TRUE;
@@ -121,253 +123,910 @@ static struct x86_reg get_identity( struct translate_sse *p )
       p->identity[3] = 1;

       sse_movups(p->func, reg,
-                 x86_make_disp(p->machine_EDX,
+                 x86_make_disp(p->machine_EDI,
                                get_offset(p, &p->identity[0])));
    }

    return reg;
 }

-static struct x86_reg get_255( struct translate_sse *p )
+static struct x86_reg get_const( struct translate_sse *p, unsigned i, float v)
 {
-   struct x86_reg reg = x86_make_reg(file_XMM, 7);
-
-   if (!p->loaded_255) {
-      p->loaded_255 = TRUE;
-      p->float_255[0] =
-         p->float_255[1] =
-         p->float_255[2] =
-         p->float_255[3] = 255.0f;
-
-      sse_movups(p->func, reg,
-                 x86_make_disp(p->machine_EDX,
-                               get_offset(p, &p->float_255[0])));
+   struct x86_reg reg = x86_make_reg(file_XMM, 2 + i);
+
+   if (!p->loaded_const[i]) {
+      p->loaded_const[i] = TRUE;
+      p->const_value[i][0] =
+         p->const_value[i][1] =
+         p->const_value[i][2] =
+         p->const_value[i][3] = v;
+
+      sse_movups(p->func, reg,
+                 x86_make_disp(p->machine_EDI,
+                               get_offset(p, &p->const_value[i][0])));
    }

    return reg;
 }

-static struct x86_reg get_inv_255( struct translate_sse *p )
+static struct x86_reg get_inv_127( struct translate_sse *p )
 {
-   struct x86_reg reg = x86_make_reg(file_XMM, 5);
-
-   if (!p->loaded_inv_255) {
-      p->loaded_inv_255 = TRUE;
-      p->inv_255[0] =
-         p->inv_255[1] =
-         p->inv_255[2] =
-         p->inv_255[3] = 1.0f / 255.0f;
-
-      sse_movups(p->func, reg,
-                 x86_make_disp(p->machine_EDX,
-                               get_offset(p, &p->inv_255[0
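
As a plain-C reference for the integer-to-float conversions listed in the commit message, the sketch below shows what the scalar results are expected to be for 8-bit inputs. It is not the generated SSE code and is not taken from the patch: the demo_* names are hypothetical, and clamping snorm8 value -128 to -1.0 is an assumption (the usual GL-style rule) rather than something the truncated diff confirms.

```c
#include <assert.h>
#include <stdint.h>

/* unorm8: 0..255 maps linearly onto 0.0..1.0. */
static float demo_unorm8_to_float(uint8_t v)
{
   return (float)v / 255.0f;
}

/* snorm8: -127..127 maps onto -1.0..1.0.  The extra value -128 is
 * clamped to -1.0 (assumed rule, see the note above). */
static float demo_snorm8_to_float(int8_t v)
{
   float f = (float)v / 127.0f;
   return f < -1.0f ? -1.0f : f;
}

/* uscaled8 / sscaled8: the integer value is kept unchanged. */
static float demo_sscaled8_to_float(int8_t v)
{
   return (float)v;
}

/* Convert count four-channel unorm8 attributes to four floats each. */
static void demo_translate_unorm8x4(const uint8_t *src, float *dst,
                                    unsigned count)
{
   unsigned i;
   assert(src && dst);
   for (i = 0; i < count * 4; i++)
      dst[i] = demo_unorm8_to_float(src[i]);
}
```

A reference like this is mainly useful as the expected-value side of a test for a vectorized translate path.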
[Mesa-dev] [PATCH 6/6] translate_sse: major rewrite (v2)
Changes in v2:
- Minimize #ifs
- Give a name to magic number CHANNELS_0001
- Add support for CPUs without SSE (only memcpy and swizzles, like non SSE2)
- Fixed comments

translate_sse is currently very limited, to the point of being useless in
essentially all cases.

In particular, it only supports some float32 and unorm8 formats and doesn't
work on x86-64.

This commit rewrites it to support:
1. Dumb memory copy for any pair of identical formats
2. All formats that are swizzles of each other
3. Converting 32/64-bit floats and all 8/16/32-bit integers to 32-bit float
4. Converting unorm8/snorm8 to snorm16 and uscaled8/sscaled8 to sscaled16
5. Support for x86-64 (doesn't take advantage of it in any way though)

This new translate can even be useful to translate index buffers for
cards that lack 8-bit index support.

It passes the testsuite I wrote, but note that this is a major change, and
more testing would be great.
---
 src/gallium/auxiliary/translate/translate_sse.c | 1154 ++-
 1 files changed, 920 insertions(+), 234 deletions(-)

diff --git a/src/gallium/auxiliary/translate/translate_sse.c b/src/gallium/auxiliary/translate/translate_sse.c
index f9aab92..e2d8d53 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -30,11 +30,13 @@
 #include "pipe/p_compiler.h"
 #include "util/u_memory.h"
 #include "util/u_math.h"
+#include "util/u_format.h"
+#include "util/u_cpu_detect.h"

 #include "translate.h"

-#if defined(PIPE_ARCH_X86)
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)

 #include "rtasm/rtasm_cpu.h"
 #include "rtasm/rtasm_x86sse.h"
@@ -48,7 +50,7 @@
 struct translate_buffer {
    const void *base_ptr;
-   unsigned stride;
+   uintptr_t stride;
    unsigned max_index;
 };

@@ -72,12 +74,10 @@ struct translate_sse {
    struct x86_function *func;

    boolean loaded_identity;
-   boolean loaded_255;
-   boolean loaded_inv_255;
+   boolean loaded_const[5];

    float identity[4];
-   float float_255[4];
-   float inv_255[4];
+   float const_value[5][4];

    struct translate_buffer buffer[PIPE_MAX_ATTRIBS];
    unsigned nr_buffers;
@@ -96,10 +96,12 @@ struct translate_sse {
     * like this is helpful to keep them in sync across the file.
     */
    struct x86_reg tmp_EAX;
-   struct x86_reg idx_EBX;     /* either start+i or &elt[i] */
-   struct x86_reg outbuf_ECX;
-   struct x86_reg machine_EDX;
-   struct x86_reg count_ESI;    /* decrements to zero */
+   struct x86_reg tmp2_EDX;
+   struct x86_reg tmp3_ECX;
+   struct x86_reg idx_ESI;     /* either start+i or &elt[i] */
+   struct x86_reg machine_EDI;
+   struct x86_reg outbuf_EBX;
+   struct x86_reg count_EBP;    /* decrements to zero */
 };

 static int get_offset( const void *a, const void *b )
@@ -111,7 +113,7 @@ static int get_offset( const void *a, const void *b )

 static struct x86_reg get_identity( struct translate_sse *p )
 {
-   struct x86_reg reg = x86_make_reg(file_XMM, 6);
+   struct x86_reg reg = x86_make_reg(file_XMM, 7);

    if (!p->loaded_identity) {
       p->loaded_identity = TRUE;
@@ -121,253 +123,909 @@ static struct x86_reg get_identity( struct translate_sse *p )
       p->identity[3] = 1;

       sse_movups(p->func, reg,
-                 x86_make_disp(p->machine_EDX,
+                 x86_make_disp(p->machine_EDI,
                                get_offset(p, &p->identity[0])));
    }

    return reg;
 }

-static struct x86_reg get_255( struct translate_sse *p )
+static struct x86_reg get_const( struct translate_sse *p, unsigned i, float v)
 {
-   struct x86_reg reg = x86_make_reg(file_XMM, 7);
-
-   if (!p->loaded_255) {
-      p->loaded_255 = TRUE;
-      p->float_255[0] =
-         p->float_255[1] =
-         p->float_255[2] =
-         p->float_255[3] = 255.0f;
-
-      sse_movups(p->func, reg,
-                 x86_make_disp(p->machine_EDX,
-                               get_offset(p, &p->float_255[0])));
+   struct x86_reg reg = x86_make_reg(file_XMM, 2 + i);
+
+   if (!p->loaded_const[i]) {
+      p->loaded_const[i] = TRUE;
+      p->const_value[i][0] =
+         p->const_value[i][1] =
+         p->const_value[i][2] =
+         p->const_value[i][3] = v;
+
+      sse_movups(p->func, reg,
+                 x86_make_disp(p->machine_EDI,
+                               get_offset(p, &p->const_value[i][0])));
    }

    return reg;
 }

-static struct x86_reg get_inv_255( struct translate_sse *p )
+static struct x86_reg get_inv_127( struct translate_sse *p )
 {
-   struct x86_reg reg = x86_make_reg(file_XMM, 5);
-
-   if (!p->loaded_inv_255) {
-      p->loaded_inv_255 = TRUE;
-      p->inv_255[0] =
-         p->inv_255[1] =
-         p->inv_255[2] =
-         p->inv_255[3] = 1.0f / 255.0f;
-
-      sse_movups(p->func, reg,
-                 x86_make_disp(p->machine_EDX,
-                               get_offset(p, &p->inv_255[0])));
-   }
-
-   return reg;
+   return get_const(p, 0, 1.0f / 127.0f);
 }

-
-static vo
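
The get_const() helper in the diff replaces one hand-written loader per constant with a small table of lazily loaded slots. The sketch below models just that caching behaviour in portable C, with a counter standing in for the emitted sse_movups. The struct and function names are hypothetical; the final assert encodes an assumption the patch appears to rely on, namely that a given slot is always requested with the same value.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NUM_CONSTS 5

/* Hypothetical model of the loaded_const[] / const_value[][] pair. */
struct demo_consts {
   bool loaded[DEMO_NUM_CONSTS];
   float value[DEMO_NUM_CONSTS][4];
   unsigned loads_emitted;   /* stands in for emitted sse_movups */
};

/* Return the broadcast constant for slot i, filling it on first use.
 * Later calls for the same slot are free. */
static const float *demo_get_const(struct demo_consts *c, unsigned i,
                                   float v)
{
   assert(i < DEMO_NUM_CONSTS);
   if (!c->loaded[i]) {
      c->loaded[i] = true;
      c->value[i][0] =
         c->value[i][1] =
         c->value[i][2] =
         c->value[i][3] = v;
      c->loads_emitted++;
   }
   /* A slot holds one value for its lifetime; asking for a different
    * value would silently return the stale one, so catch it here. */
   assert(c->value[i][0] == v);
   return c->value[i];
}
```

The trade-off of the table approach is that slot numbers become part of an implicit contract between callers, which is why wrappers such as get_inv_127() that pin a slot to a value are helpful.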
[Mesa-dev] [Bug 29540] [glsl2] problem with vertex attribute locations and draw-time validation
https://bugs.freedesktop.org/show_bug.cgi?id=29540

Ian Romanick changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|mesa-...@lists.freedesktop. |i...@freedesktop.org
                   |org                         |

--- Comment #3 from Ian Romanick 2010-08-13 10:27:19 PDT ---
(In reply to comment #1)
> 1. By returning attrib_loc=1 instead of 0 will we have one less user-defined
> vertex attribute available to users with the new compiler? The query of
> GL_MAX_VERTEX_ATTRIBS_ARB still returns 16.

Attribute 0 is special. It is bound to gl_Vertex by default. Applications
can specifically bind to 0 using glBindAttribLocation, but the linker doesn't
automatically bind to it. It's possible that the spec says we should bind
something to 0 if gl_Vertex isn't used, which is the case in this test.
Re: [Mesa-dev] talloc (Was: Merge criteria for glsl2 branch)
José Fonseca wrote:
> I've pushed a new branch glsl2-win32 that includes Aras' patch, and all
> necessary fixes to get at least MinGW build successfully.

I merged all of these.

> I had to rename some tokens in order to avoid collisions with windows.h
> defines. Aras didn't mention this problem before. Perhaps the indirect
> windows.h include can be avoided, or you prefer to handle this some
> other way.

I did the last one a little differently. I already had CONST_TOK,
LAYOUT_TOK, and INLINE_TOK to avoid collisions with Linux headers and
other Mesa headers, so I appended _TOK to the colliding names here too.
I think I got all of the ones from your patch, but you'll want to double
check.
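
For anyone unfamiliar with the collision being discussed: windows.h defines macros such as CONST, so a parser token with the same spelling is rewritten by the preprocessor before the compiler sees it. The sketch below simulates this with a local #define and shows the _TOK suffix avoiding it. The enum values and the demo_* names are hypothetical, not the actual glsl2 token list.

```c
#include <assert.h>

/* Stand-in for what <windows.h> does: it defines CONST as const. */
#define CONST const

/* An enumerator spelled CONST would expand to the keyword "const" and
 * fail to compile, so the tokens carry a _TOK suffix instead. */
enum demo_token {
   CONST_TOK = 258,   /* hypothetical numbering */
   LAYOUT_TOK,
   INLINE_TOK
};

static const char *demo_token_name(enum demo_token t)
{
   switch (t) {
   case CONST_TOK:  return "const";
   case LAYOUT_TOK: return "layout";
   case INLINE_TOK: return "inline";
   }
   assert(!"unknown token");
   return "";
}

/* The macro itself remains usable for its original purpose. */
static CONST int demo_answer = 42;
```

Renaming the tokens keeps the parser independent of whichever system headers end up being included indirectly, at the cost of slightly longer names in the grammar.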
Re: [Mesa-dev] draw: Replace varray and vcache by vsplit
On Fri, Aug 13, 2010 at 11:35 PM, Keith Whitwell wrote:
> On Fri, 2010-08-13 at 08:09 -0700, Chia-I Wu wrote:
>> On Fri, Aug 13, 2010 at 10:51 PM, Keith Whitwell wrote:
>> > On Fri, 2010-08-13 at 07:46 -0700, Chia-I Wu wrote:
>> >> On Fri, Aug 13, 2010 at 10:14 PM, Keith Whitwell wrote:
>> >> > On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote:
>> >> >> Hi,
>> >> >>
>> >> >> There are two primitive transformations in gallium draw module. In
>> >> >> varray, primitives are "split"ted. When a primitive has more vertices
>> >> >> than the middle end can handle, varray splits the primitive and calls
>> >> >> the middle end multiple times.
>> >> >>
>> >> >> In vcache, primitives are "decompose"d. More advanced primitives are
>> >> >> decomposed into one of point, line(_adj), or triangle(_adj).
>> >> >> Similarly, vcache may call the middle end multiple times to flush its
>> >> >> internal buffer. In some cases, vcache passes the primitives through
>> >> >> without decomposing nor splitting, as can be seen in vcache_check_run.
>> >> >>
>> >> >> The issue with vcache is that it has to decompose a primitive
>> >> >> differently depending on the provoking convention, as explained in
>> >> >>
>> >> >> http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html
>> >> >>
>> >> >> It becomes a problem when GS is active.
>> >> >>
>> >> >> My proposal is to make vcache split instead of decompose. Because
>> >> >> varray only splits and vcache has a pass-through path, the rest of the
>> >> >> workflow already has to support all primitive types. Switching from
>> >> >> decompose to split does not require a big change to the rest of the
>> >> >> workflow.
>> >> >>
>> >> >> But then vcache will look a lot like varray, only with indexed
>> >> >> primitive support. It leads me to a new frontend that replaces both
>> >> >> varray and vcache: vsplit
>> >> >>
>> >> >> http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit
>> >> >>
>> >> >> vsplit is based on varray. It uses some code from vcache to support
>> >> >> indexed primitives. When vcache decomposes, there are flags being set
>> >> >> to indicate that if the stipple counter should be reset or if some
>> >> >> edge of a triangle should be omitted in unfilled mode. The segments
>> >> >> of a splitted primitive have flags for similar purposes too:
>> >> >>
>> >> >>   DRAW_SPLIT_AFTER   More segments to come after this one
>> >> >>   DRAW_SPLIT_BEFORE  There are preceding segments
>> >> >>
>> >> >> These flags are set by vsplit and the middle ends pass them to the
>> >> >> other stages. Therefore, the run methods of middle ends are augmented
>> >> >> to take the flags.
>> >> >>
>> >> >> To summarize, vsplit
>> >> >>
>> >> >>  - fixes GS when (flatshade && flatshade_first) is on
>> >> >>  - never sends more vertices than the middle end claims to handle
>> >> >>  - is faster than vcache: split instead of decompose, no get_elt
>> >> >>    calls
>> >> >>  - no longer uses the higher bits of draw_elts for stipple/edge flags
>> >> >>
>> >> >> Suggestions?
>> >> >
>> >> >
>> >> > Hi - I haven't looked at the patches yet, but a couple of questions:
>> >> >
>> >> > How does this interact with the draw_pipe_* code - which requires
>> >> > decomposed primitives?
>> >> draw_pipe.c decomposes the primitives. It is there before because it
>> >> has to support varray and vcache_check_run which do not decompose.
>> >
>> > OK.
>> >
>> >> > How does this cope with indexed rendering where the vertex buffers
>> >> > themselves are too large (for hardware or some other entity)? Eg.
>> >> > imagine the hardware could cope with up to 64k vertices, and you have a
>> >> > drawelements call randomly referencing vertices in range 0..128k ?
>> >> Vertex fetching happens in the middle end so the range of the indices
>> >> is not a problem. Though vsplit guarantees that it never calls the
>> >> middle end with more vertices than the middle end claims to support
>> >> (as returned by draw_pt_middle_end::prepare). The limit is usually
>> >> decided by the size of the buffer for vertex emitting.
>> >
>> > I guess I'm wondering how it does this. If the middle end says it
>> > supports 64k vertices, and the vertex element looks like
>> >
>> > [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
>> >
>> > what gets sent? (Sorry, I still haven't looked at the code, you could
>> > well have addressed this).
>> I see. The frontend would set
>>
>>   fetch_elts = [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
>>   draw_elts = [0, 1, 2, 3, 4, 5, 6, ...]
>>
>> fetch_elts is processed by the middle end and it will fetch the given
>> vertices. draw_elts will be passed to draw_emit or the pipeline. It
>> is the new index buffer, which indexes into the fetched vertices.
>>
>> It is actually the same as vcache. So when fetch_elts is
>>
>>   [0, 128k, 64k, 64k, 128k, 16k, ...],
>>
>> draw_elts would be set to
>>
>>   [0, 1, 2, 2, 1, 3, ...]
>>
>> The number of elements to fetch (and sh
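
The fetch_elts / draw_elts remapping described in the quoted exchange can be sketched as follows. This is only an illustration of the idea, not the vsplit code: the names, the tiny limits, and the linear search (a real implementation would use a cache or hash) are all assumptions, and primitive boundaries are ignored when a segment fills up.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_FETCH 8    /* hypothetical middle-end vertex limit */
#define DEMO_MAX_DRAW  64   /* hypothetical draw index limit */

struct demo_segment {
   unsigned fetch_elts[DEMO_MAX_FETCH];     /* unique source indices */
   unsigned short draw_elts[DEMO_MAX_DRAW]; /* indices into fetch_elts */
   size_t fetch_count;
   size_t draw_count;
};

/* Build one segment from the application's index buffer.  Each
 * distinct source index is fetched once; repeated references reuse
 * the slot assigned earlier.  Returns how many input indices were
 * consumed, so the caller can start the next segment from there. */
static size_t demo_build_segment(struct demo_segment *seg,
                                 const unsigned *elts, size_t count)
{
   size_t i, j;

   assert(seg && elts);
   seg->fetch_count = 0;
   seg->draw_count = 0;

   for (i = 0; i < count && seg->draw_count < DEMO_MAX_DRAW; i++) {
      for (j = 0; j < seg->fetch_count; j++) {
         if (seg->fetch_elts[j] == elts[i])
            break;
      }
      if (j == seg->fetch_count) {
         if (seg->fetch_count == DEMO_MAX_FETCH)
            break;   /* segment is full, stop here */
         seg->fetch_elts[seg->fetch_count++] = elts[i];
      }
      seg->draw_elts[seg->draw_count++] = (unsigned short)j;
   }
   return i;
}
```

Feeding it the indices 0, 128k, 64k, 64k, 128k, 16k from the thread gives four fetched vertices and the draw indices 0, 1, 2, 2, 1, 3. Because the draw indices refer to slots rather than to the original vertex numbers, the range of the source indices does not matter to the middle end.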
Re: [Mesa-dev] draw: Replace varray and vcache by vsplit
On Fri, 2010-08-13 at 08:09 -0700, Chia-I Wu wrote:
> On Fri, Aug 13, 2010 at 10:51 PM, Keith Whitwell wrote:
> > On Fri, 2010-08-13 at 07:46 -0700, Chia-I Wu wrote:
> >> On Fri, Aug 13, 2010 at 10:14 PM, Keith Whitwell wrote:
> >> > On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote:
> >> >> Hi,
> >> >>
> >> >> There are two primitive transformations in gallium draw module. In
> >> >> varray, primitives are "split"ted. When a primitive has more vertices
> >> >> than the middle end can handle, varray splits the primitive and calls
> >> >> the middle end multiple times.
> >> >>
> >> >> In vcache, primitives are "decompose"d. More advanced primitives are
> >> >> decomposed into one of point, line(_adj), or triangle(_adj).
> >> >> Similarly, vcache may call the middle end multiple times to flush its
> >> >> internal buffer. In some cases, vcache passes the primitives through
> >> >> without decomposing nor splitting, as can be seen in vcache_check_run.
> >> >>
> >> >> The issue with vcache is that it has to decompose a primitive
> >> >> differently depending on the provoking convention, as explained in
> >> >>
> >> >> http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html
> >> >>
> >> >> It becomes a problem when GS is active.
> >> >>
> >> >> My proposal is to make vcache split instead of decompose. Because
> >> >> varray only splits and vcache has a pass-through path, the rest of the
> >> >> workflow already has to support all primitive types. Switching from
> >> >> decompose to split does not require a big change to the rest of the
> >> >> workflow.
> >> >>
> >> >> But then vcache will look a lot like varray, only with indexed
> >> >> primitive support. It leads me to a new frontend that replaces both
> >> >> varray and vcache: vsplit
> >> >>
> >> >> http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit
> >> >>
> >> >> vsplit is based on varray. It uses some code from vcache to support
> >> >> indexed primitives. When vcache decomposes, there are flags being set
> >> >> to indicate that if the stipple counter should be reset or if some
> >> >> edge of a triangle should be omitted in unfilled mode. The segments
> >> >> of a splitted primitive have flags for similar purposes too:
> >> >>
> >> >>   DRAW_SPLIT_AFTER   More segments to come after this one
> >> >>   DRAW_SPLIT_BEFORE  There are preceding segments
> >> >>
> >> >> These flags are set by vsplit and the middle ends pass them to the
> >> >> other stages. Therefore, the run methods of middle ends are augmented
> >> >> to take the flags.
> >> >>
> >> >> To summarize, vsplit
> >> >>
> >> >>  - fixes GS when (flatshade && flatshade_first) is on
> >> >>  - never sends more vertices than the middle end claims to handle
> >> >>  - is faster than vcache: split instead of decompose, no get_elt
> >> >>    calls
> >> >>  - no longer uses the higher bits of draw_elts for stipple/edge flags
> >> >>
> >> >> Suggestions?
> >> >
> >> >
> >> > Hi - I haven't looked at the patches yet, but a couple of questions:
> >> >
> >> > How does this interact with the draw_pipe_* code - which requires
> >> > decomposed primitives?
> >> draw_pipe.c decomposes the primitives. It is there before because it
> >> has to support varray and vcache_check_run which do not decompose.
> >
> > OK.
> >
> >> > How does this cope with indexed rendering where the vertex buffers
> >> > themselves are too large (for hardware or some other entity)? Eg.
> >> > imagine the hardware could cope with up to 64k vertices, and you have a
> >> > drawelements call randomly referencing vertices in range 0..128k ?
> >> Vertex fetching happens in the middle end so the range of the indices
> >> is not a problem. Though vsplit guarantees that it never calls the
> >> middle end with more vertices than the middle end claims to support
> >> (as returned by draw_pt_middle_end::prepare). The limit is usually
> >> decided by the size of the buffer for vertex emitting.
> >
> > I guess I'm wondering how it does this. If the middle end says it
> > supports 64k vertices, and the vertex element looks like
> >
> > [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
> >
> > what gets sent? (Sorry, I still haven't looked at the code, you could
> > well have addressed this).
> I see. The frontend would set
>
>   fetch_elts = [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
>   draw_elts = [0, 1, 2, 3, 4, 5, 6, ...]
>
> fetch_elts is processed by the middle end and it will fetch the given
> vertices. draw_elts will be passed to draw_emit or the pipeline. It
> is the new index buffer, which indexes into the fetched vertices.
>
> It is actually the same as vcache. So when fetch_elts is
>
>   [0, 128k, 64k, 64k, 128k, 16k, ...],
>
> draw_elts would be set to
>
>   [0, 1, 2, 2, 1, 3, ...]
>
> The number of elements to fetch (and shade) is minimized.

Thanks Chia-I,

I've taken a look at the code & this makes sense - the fetch/draw cache
is still there, but specialized into 4 versions for each element t
Re: [Mesa-dev] draw: Replace varray and vcache by vsplit
On Fri, Aug 13, 2010 at 11:09 PM, Chia-I Wu wrote: > On Fri, Aug 13, 2010 at 10:51 PM, Keith Whitwell wrote: >> On Fri, 2010-08-13 at 07:46 -0700, Chia-I Wu wrote: >>> On Fri, Aug 13, 2010 at 10:14 PM, Keith Whitwell wrote: >>> > On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote: >>> >> Hi, >>> >> >>> >> There are two primitive transformations in gallium draw module. In >>> >> varray, primitives are "split"ted. When a primitive has more vertices >>> >> than the middle end can handle, varray splits the primitive and calls >>> >> the middle end multiple times. >>> >> >>> >> In vcache, primitives are "decompose"d. More advanced primitives are >>> >> decomposed into one of point, line(_adj), or triangle(_adj). >>> >> Similarly, vcache may call the middle end multiple times to flush its >>> >> internal buffer. In some cases, vcache passes the primitves through >>> >> without decomposing nor splitting, as can be seen in vcache_check_run. >>> >> >>> >> The issue with vcache is that it has to decompose a primitive >>> >> differently depending on the provoking convention, as explained in >>> >> >>> >> http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html >>> >> >>> >> It becomes a problem when GS is active. >>> >> >>> >> My proposal is to make vcache split instead of decompose. Because >>> >> varray only splits and vcache has a pass-through path, the rest of the >>> >> workflow already has to support all primitive types. Switching from >>> >> decompose to split does not require a big change to the rest of the >>> >> workflow. >>> >> >>> >> But then vcache will look a lot like varray, only with indexed >>> >> primitive support. It leads me to a new frontend that replaces both >>> >> varray and vcache: vsplit >>> >> >>> >> http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit >>> >> >>> >> vsplit is based on varray. It uses some code from vcache to support >>> >> indexed primitives. 
When vcache decomposes, there are flags being set >>> >> to indicate that if the stipple counter should be reset or if some >>> >> edge of a triangle should be omitted in unfilled mode. The segments >>> >> of a splitted primitive have flags for similar purposes too: >>> >> >>> >> DRAW_SPLIT_AFTER More segments to come after this one >>> >> DRAW_SPLIT_BEFORE There are preceding segments >>> >> >>> >> These flags are set by vsplit and the middle ends pass them to the >>> >> other stages. Therefore, the run methods of middle ends are augmented >>> >> to take the flags. >>> >> >>> >> To summarize, vsplit >>> >> >>> >> - fixes GS when (flatshade && flatshade_first) is on >>> >> - never sends more vertices than the middle end claims to handle >>> >> - is faster than vcache: split instead of decompose, no get_elt >>> >> calls >>> >> - no longer uses the higher bits of draw_elts for stipple/edge flags >>> >> >>> >> Suggestions? >>> > >>> > >>> > Hi - I haven't looked at the patches yet, but a couple of questions: >>> > >>> > How does this interact with the draw_pipe_* code - which requires >>> > decomposed primitives? >>> draw_pipe.c decomposes the primitives. It is there before because it >>> has to support varray and vcache_check_run which do not decompose. >> >> OK. >> >>> > How does this cope with indexed rendering where the vertex buffers >>> > themselves are too large (for hardware or some other entity)? Eg. >>> > imagine the hardware could cope with up to 64k vertices, and you have a >>> > drawelements call randomly referencing vertices in range 0..128k ? >>> Vertex fetching happens in the middle end so the range of the indices >>> is not a problem. Though vsplit guarantees that it never calls the >>> middle end with more vertices than the middle end claims to support >>> (as returned by draw_pt_middle_end::prepare). The limit is usually >>> decidied by the size of the buffer for vertex emitting. >> >> I guess I'm wondering how it does this. 
If the middle end says it >> supports 64k vertices, and the vertex element looks like >> >> [0, 128k, 64k, 32k, 96k, 16k, 1, ... ] >> >> what gets sent? (Sorry, I still haven't looked at the code, you could >> well have addressed this). > I see. The frontend would set > > fetch_elts = [0, 128k, 64k, 32k, 96k, 16k, 1, ... ] > draw_elts = [0, 1, 2, 3, 4, 5, 6, ...] > > fetch_elts is processed by the middle end and it will fetch the given > vertices. draw_elts will be passed to draw_emit or the pipeline. It > is the new index buffer, which indexes into the fetched vertices. > > It is actual the same as vcache. So when fetch_elts is Should be: So when the index buffer looks like > [0, 128k, 64k, 64k, 128k, 16k, ...], fetch_elts would be set to [0, 128k, 64k, 16k, ...] and > draw_elts would be set to > > [0, 1, 2, 2, 1, 3, ...] > > The number of elements to fetch (and shade) is minimized. > > -- > o...@lunarg.com > -- o...@lunarg.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop
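The fetch_elts/draw_elts relationship in the corrected example can be sketched in a few lines of C. This is only an illustration of the remapping idea, not the vcache or vsplit code: the function name remap_indices is hypothetical, the duplicate lookup is a plain linear search (the real frontends are described as using a cache), and 128k, 64k and 16k are written out as 131072, 65536 and 16384.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from Mesa.
 *
 * Walks the application's index buffer once.  Every distinct index is
 * appended to fetch_elts (the vertices the middle end has to fetch and
 * shade), and draw_elts receives, for each original index, its position
 * inside fetch_elts.
 *
 * fetch_elts and draw_elts must both have room for `count` entries.
 * Returns the number of distinct vertices written to fetch_elts.
 */
static size_t
remap_indices(const unsigned *indices, size_t count,
              unsigned *fetch_elts, unsigned short *draw_elts)
{
   size_t nr_fetch = 0;
   size_t i, j;

   /* draw_elts is 16 bits wide in this sketch, so cap the batch size. */
   assert(count <= 0x10000);

   for (i = 0; i < count; i++) {
      /* Linear search for an earlier occurrence of this index. */
      for (j = 0; j < nr_fetch; j++) {
         if (fetch_elts[j] == indices[i])
            break;
      }

      /* Not seen before: schedule the vertex for fetching. */
      if (j == nr_fetch)
         fetch_elts[nr_fetch++] = indices[i];

      draw_elts[i] = (unsigned short) j;
   }

   return nr_fetch;
}
```

Fed the index buffer from the message, the sketch fetches four vertices for six indices, so the range of the original indices no longer matters to whatever consumes draw_elts.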
[Mesa-dev] [Bug 29540] [glsl2] problem with vertex attribute locations and draw-time validation
https://bugs.freedesktop.org/show_bug.cgi?id=29540

--- Comment #2 from Brian Paul 2010-08-13 08:10:00 PDT ---
Created an attachment (id=37848)
View: https://bugs.freedesktop.org/attachment.cgi?id=37848
Review: https://bugs.freedesktop.org/review?bug=29540&attachment=37848

Patch to fix glDrawArrays/Elements validation

If there are no objections to this, I'll commit later.

--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
Re: [Mesa-dev] draw: Replace varray and vcache by vsplit
On Fri, Aug 13, 2010 at 10:51 PM, Keith Whitwell wrote: > On Fri, 2010-08-13 at 07:46 -0700, Chia-I Wu wrote: >> On Fri, Aug 13, 2010 at 10:14 PM, Keith Whitwell wrote: >> > On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote: >> >> Hi, >> >> >> >> There are two primitive transformations in gallium draw module. In >> >> varray, primitives are "split"ted. When a primitive has more vertices >> >> than the middle end can handle, varray splits the primitive and calls >> >> the middle end multiple times. >> >> >> >> In vcache, primitives are "decompose"d. More advanced primitives are >> >> decomposed into one of point, line(_adj), or triangle(_adj). >> >> Similarly, vcache may call the middle end multiple times to flush its >> >> internal buffer. In some cases, vcache passes the primitves through >> >> without decomposing nor splitting, as can be seen in vcache_check_run. >> >> >> >> The issue with vcache is that it has to decompose a primitive >> >> differently depending on the provoking convention, as explained in >> >> >> >> http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html >> >> >> >> It becomes a problem when GS is active. >> >> >> >> My proposal is to make vcache split instead of decompose. Because >> >> varray only splits and vcache has a pass-through path, the rest of the >> >> workflow already has to support all primitive types. Switching from >> >> decompose to split does not require a big change to the rest of the >> >> workflow. >> >> >> >> But then vcache will look a lot like varray, only with indexed >> >> primitive support. It leads me to a new frontend that replaces both >> >> varray and vcache: vsplit >> >> >> >> http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit >> >> >> >> vsplit is based on varray. It uses some code from vcache to support >> >> indexed primitives. 
When vcache decomposes, there are flags being set >> >> to indicate that if the stipple counter should be reset or if some >> >> edge of a triangle should be omitted in unfilled mode. The segments >> >> of a splitted primitive have flags for similar purposes too: >> >> >> >> DRAW_SPLIT_AFTER More segments to come after this one >> >> DRAW_SPLIT_BEFORE There are preceding segments >> >> >> >> These flags are set by vsplit and the middle ends pass them to the >> >> other stages. Therefore, the run methods of middle ends are augmented >> >> to take the flags. >> >> >> >> To summarize, vsplit >> >> >> >> - fixes GS when (flatshade && flatshade_first) is on >> >> - never sends more vertices than the middle end claims to handle >> >> - is faster than vcache: split instead of decompose, no get_elt >> >> calls >> >> - no longer uses the higher bits of draw_elts for stipple/edge flags >> >> >> >> Suggestions? >> > >> > >> > Hi - I haven't looked at the patches yet, but a couple of questions: >> > >> > How does this interact with the draw_pipe_* code - which requires >> > decomposed primitives? >> draw_pipe.c decomposes the primitives. It is there before because it >> has to support varray and vcache_check_run which do not decompose. > > OK. > >> > How does this cope with indexed rendering where the vertex buffers >> > themselves are too large (for hardware or some other entity)? Eg. >> > imagine the hardware could cope with up to 64k vertices, and you have a >> > drawelements call randomly referencing vertices in range 0..128k ? >> Vertex fetching happens in the middle end so the range of the indices >> is not a problem. Though vsplit guarantees that it never calls the >> middle end with more vertices than the middle end claims to support >> (as returned by draw_pt_middle_end::prepare). The limit is usually >> decidied by the size of the buffer for vertex emitting. > > I guess I'm wondering how it does this. 
If the middle end says it > supports 64k vertices, and the vertex element looks like > > [0, 128k, 64k, 32k, 96k, 16k, 1, ... ] > > what gets sent? (Sorry, I still haven't looked at the code, you could > well have addressed this). I see. The frontend would set fetch_elts = [0, 128k, 64k, 32k, 96k, 16k, 1, ... ] draw_elts = [0, 1, 2, 3, 4, 5, 6, ...] fetch_elts is processed by the middle end and it will fetch the given vertices. draw_elts will be passed to draw_emit or the pipeline. It is the new index buffer, which indexes into the fetched vertices. It is actual the same as vcache. So when fetch_elts is [0, 128k, 64k, 64k, 128k, 16k, ...], draw_elts would be set to [0, 1, 2, 2, 1, 3, ...] The number of elements to fetch (and shade) is minimized. -- o...@lunarg.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 29540] [glsl2] problem with vertex attribute locations and draw-time validation
https://bugs.freedesktop.org/show_bug.cgi?id=29540

--- Comment #1 from Brian Paul 2010-08-13 08:09:05 PDT ---
The new piglit test glsl-getattriblocation returns attrib_loc=0 for the old compiler and attrib_loc=1 for the new compiler. This causes the glDrawArrays/Elements() validation test to fail at api_validate.c:124, so nothing is drawn.

There are two issues here.

1. By returning attrib_loc=1 instead of 0, will we have one less user-defined vertex attribute available to users with the new compiler? The query of GL_MAX_VERTEX_ATTRIBS_ARB still returns 16.

2. Mesa's glDrawArrays/Elements() validation check is incorrect. I've added two other piglit tests (glsl-bindattriblocation and glsl-novertexdata) that test this logic. They pass with NVIDIA's driver but fail with Mesa. The attached patch, however, fixes the problem for Mesa (for the old drivers at least, but it doesn't work with gallium yet).
Re: [Mesa-dev] draw: Replace varray and vcache by vsplit
On Fri, 2010-08-13 at 07:46 -0700, Chia-I Wu wrote: > On Fri, Aug 13, 2010 at 10:14 PM, Keith Whitwell wrote: > > On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote: > >> Hi, > >> > >> There are two primitive transformations in gallium draw module. In > >> varray, primitives are "split"ted. When a primitive has more vertices > >> than the middle end can handle, varray splits the primitive and calls > >> the middle end multiple times. > >> > >> In vcache, primitives are "decompose"d. More advanced primitives are > >> decomposed into one of point, line(_adj), or triangle(_adj). > >> Similarly, vcache may call the middle end multiple times to flush its > >> internal buffer. In some cases, vcache passes the primitves through > >> without decomposing nor splitting, as can be seen in vcache_check_run. > >> > >> The issue with vcache is that it has to decompose a primitive > >> differently depending on the provoking convention, as explained in > >> > >> http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html > >> > >> It becomes a problem when GS is active. > >> > >> My proposal is to make vcache split instead of decompose. Because > >> varray only splits and vcache has a pass-through path, the rest of the > >> workflow already has to support all primitive types. Switching from > >> decompose to split does not require a big change to the rest of the > >> workflow. > >> > >> But then vcache will look a lot like varray, only with indexed > >> primitive support. It leads me to a new frontend that replaces both > >> varray and vcache: vsplit > >> > >> http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit > >> > >> vsplit is based on varray. It uses some code from vcache to support > >> indexed primitives. When vcache decomposes, there are flags being set > >> to indicate that if the stipple counter should be reset or if some > >> edge of a triangle should be omitted in unfilled mode. 
The segments > >> of a splitted primitive have flags for similar purposes too: > >> > >> DRAW_SPLIT_AFTER More segments to come after this one > >> DRAW_SPLIT_BEFORE There are preceding segments > >> > >> These flags are set by vsplit and the middle ends pass them to the > >> other stages. Therefore, the run methods of middle ends are augmented > >> to take the flags. > >> > >> To summarize, vsplit > >> > >> - fixes GS when (flatshade && flatshade_first) is on > >> - never sends more vertices than the middle end claims to handle > >> - is faster than vcache: split instead of decompose, no get_elt > >>calls > >> - no longer uses the higher bits of draw_elts for stipple/edge flags > >> > >> Suggestions? > > > > > > Hi - I haven't looked at the patches yet, but a couple of questions: > > > > How does this interact with the draw_pipe_* code - which requires > > decomposed primitives? > draw_pipe.c decomposes the primitives. It is there before because it > has to support varray and vcache_check_run which do not decompose. OK. > > How does this cope with indexed rendering where the vertex buffers > > themselves are too large (for hardware or some other entity)? Eg. > > imagine the hardware could cope with up to 64k vertices, and you have a > > drawelements call randomly referencing vertices in range 0..128k ? > Vertex fetching happens in the middle end so the range of the indices > is not a problem. Though vsplit guarantees that it never calls the > middle end with more vertices than the middle end claims to support > (as returned by draw_pt_middle_end::prepare). The limit is usually > decidied by the size of the buffer for vertex emitting. I guess I'm wondering how it does this. If the middle end says it supports 64k vertices, and the vertex element looks like [0, 128k, 64k, 32k, 96k, 16k, 1, ... ] what gets sent? (Sorry, I still haven't looked at the code, you could well have addressed this). 
Keith
Re: [Mesa-dev] draw: Replace varray and vcache by vsplit
On Fri, Aug 13, 2010 at 10:14 PM, Keith Whitwell wrote: > On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote: >> Hi, >> >> There are two primitive transformations in gallium draw module. In >> varray, primitives are "split"ted. When a primitive has more vertices >> than the middle end can handle, varray splits the primitive and calls >> the middle end multiple times. >> >> In vcache, primitives are "decompose"d. More advanced primitives are >> decomposed into one of point, line(_adj), or triangle(_adj). >> Similarly, vcache may call the middle end multiple times to flush its >> internal buffer. In some cases, vcache passes the primitves through >> without decomposing nor splitting, as can be seen in vcache_check_run. >> >> The issue with vcache is that it has to decompose a primitive >> differently depending on the provoking convention, as explained in >> >> http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html >> >> It becomes a problem when GS is active. >> >> My proposal is to make vcache split instead of decompose. Because >> varray only splits and vcache has a pass-through path, the rest of the >> workflow already has to support all primitive types. Switching from >> decompose to split does not require a big change to the rest of the >> workflow. >> >> But then vcache will look a lot like varray, only with indexed >> primitive support. It leads me to a new frontend that replaces both >> varray and vcache: vsplit >> >> http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit >> >> vsplit is based on varray. It uses some code from vcache to support >> indexed primitives. When vcache decomposes, there are flags being set >> to indicate that if the stipple counter should be reset or if some >> edge of a triangle should be omitted in unfilled mode. 
The segments >> of a splitted primitive have flags for similar purposes too: >> >> DRAW_SPLIT_AFTER More segments to come after this one >> DRAW_SPLIT_BEFORE There are preceding segments >> >> These flags are set by vsplit and the middle ends pass them to the >> other stages. Therefore, the run methods of middle ends are augmented >> to take the flags. >> >> To summarize, vsplit >> >> - fixes GS when (flatshade && flatshade_first) is on >> - never sends more vertices than the middle end claims to handle >> - is faster than vcache: split instead of decompose, no get_elt >> calls >> - no longer uses the higher bits of draw_elts for stipple/edge flags >> >> Suggestions? > > > Hi - I haven't looked at the patches yet, but a couple of questions: > > How does this interact with the draw_pipe_* code - which requires > decomposed primitives? draw_pipe.c decomposes the primitives. It is there before because it has to support varray and vcache_check_run which do not decompose. > How does this cope with indexed rendering where the vertex buffers > themselves are too large (for hardware or some other entity)? Eg. > imagine the hardware could cope with up to 64k vertices, and you have a > drawelements call randomly referencing vertices in range 0..128k ? Vertex fetching happens in the middle end so the range of the indices is not a problem. Though vsplit guarantees that it never calls the middle end with more vertices than the middle end claims to support (as returned by draw_pt_middle_end::prepare). The limit is usually decidied by the size of the buffer for vertex emitting. -- o...@lunarg.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] talloc (Was: Merge criteria for glsl2 branch)
> I had to rename some tokens in order to avoid collisions with windows.h defines. Aras didn't mention this problem before.

I mentioned this to Eric in private conversation, but on this list I only talked about talloc-specific changes. Yeah, in the glsl2 parser some tokens clash with Windows headers (BOOL, INPUT etc.), and windows.h is indirectly included by Mesa's own gl.h. I've been renaming them in my own fork.

Maybe the bulk of glsl2 does not need to include that much of Mesa itself (e.g. right now you need to have things like src/mapi/mapi/u_thread.h and src/mesa/math/m_matrix.h, because they are included indirectly by something that glsl2 includes).

--
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info
Re: [Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64
> #define symbols are usually uppercase. Is there a reason why you're using
> lowercase? I'd prefer the code to follow typical conventions.

Currently they can't be used in preprocessor directives because they evaluate to enums, so it seems better to treat them as variables, and thus lowercase.

What I think might make sense is to turn them into static consts rather than making them uppercase and turning the enum into #defines, since PIPE_ARCH_* (and the more standard __i386__, etc.) can already be used in preprocessor directives.
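A minimal sketch of the "static const" variant suggested here, for comparison with the lowercase #define in the patch. It is an illustration under stated assumptions, not code from the series: the enum is trimmed to three values, the compiler predefines (__x86_64__, __i386__, _M_X64, _M_IX86) stand in for PIPE_ARCH_* so the fragment builds without p_config.h, and nr_general_purpose_regs is a hypothetical caller.

```c
#include <assert.h>

/* Trimmed copy of the enum from the patch; only the values used below. */
enum util_cpu_arch {
   UTIL_CPU_ARCH_UNKNOWN = 0,
   UTIL_CPU_ARCH_X86,
   UTIL_CPU_ARCH_X86_64
};

/*
 * A typed constant instead of "#define util_cpu_arch ...".
 * The compiler predefines are an assumption standing in for PIPE_ARCH_*.
 */
#if defined(__x86_64__) || defined(_M_X64)
static const enum util_cpu_arch util_cpu_arch = UTIL_CPU_ARCH_X86_64;
#elif defined(__i386__) || defined(_M_IX86)
static const enum util_cpu_arch util_cpu_arch = UTIL_CPU_ARCH_X86;
#else
static const enum util_cpu_arch util_cpu_arch = UTIL_CPU_ARCH_UNKNOWN;
#endif

/*
 * Hypothetical caller.  The tests are ordinary C "if" statements on a
 * value the compiler knows at build time, so an optimizing compiler is
 * free to drop the branches that cannot be taken.
 */
static unsigned
nr_general_purpose_regs(void)
{
   /* Sanity check: 32-bit x86 builds have 4-byte pointers. */
   assert(util_cpu_arch != UTIL_CPU_ARCH_X86 || sizeof(void *) == 4);

   if (util_cpu_arch == UTIL_CPU_ARCH_X86_64)
      return 16;
   if (util_cpu_arch == UTIL_CPU_ARCH_X86)
      return 8;
   return 0; /* unknown architecture: make no claim */
}
```

One caveat with this form: in C a static const object is not an integer constant expression, so unlike an enum value behind a #define it cannot appear in a case label or a file-scope array size. Neither form can be tested in an #if, which is the point made above about leaving that job to PIPE_ARCH_*.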
Re: [Mesa-dev] draw: Replace varray and vcache by vsplit
On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote: > Hi, > > There are two primitive transformations in gallium draw module. In > varray, primitives are "split"ted. When a primitive has more vertices > than the middle end can handle, varray splits the primitive and calls > the middle end multiple times. > > In vcache, primitives are "decompose"d. More advanced primitives are > decomposed into one of point, line(_adj), or triangle(_adj). > Similarly, vcache may call the middle end multiple times to flush its > internal buffer. In some cases, vcache passes the primitves through > without decomposing nor splitting, as can be seen in vcache_check_run. > > The issue with vcache is that it has to decompose a primitive > differently depending on the provoking convention, as explained in > > http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html > > It becomes a problem when GS is active. > > My proposal is to make vcache split instead of decompose. Because > varray only splits and vcache has a pass-through path, the rest of the > workflow already has to support all primitive types. Switching from > decompose to split does not require a big change to the rest of the > workflow. > > But then vcache will look a lot like varray, only with indexed > primitive support. It leads me to a new frontend that replaces both > varray and vcache: vsplit > > http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit > > vsplit is based on varray. It uses some code from vcache to support > indexed primitives. When vcache decomposes, there are flags being set > to indicate that if the stipple counter should be reset or if some > edge of a triangle should be omitted in unfilled mode. The segments > of a splitted primitive have flags for similar purposes too: > > DRAW_SPLIT_AFTER More segments to come after this one > DRAW_SPLIT_BEFORE There are preceding segments > > These flags are set by vsplit and the middle ends pass them to the > other stages. 
Therefore, the run methods of middle ends are augmented > to take the flags. > > To summarize, vsplit > > - fixes GS when (flatshade && flatshade_first) is on > - never sends more vertices than the middle end claims to handle > - is faster than vcache: split instead of decompose, no get_elt >calls > - no longer uses the higher bits of draw_elts for stipple/edge flags > > Suggestions? Hi - I haven't looked at the patches yet, but a couple of questions: How does this interact with the draw_pipe_* code - which requires decomposed primitives? How does this cope with indexed rendering where the vertex buffers themselves are too large (for hardware or some other entity)? Eg. imagine the hardware could cope with up to 64k vertices, and you have a drawelements call randomly referencing vertices in range 0..128k ? Keith ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] draw: Replace varray and vcache by vsplit
Hi,

There are two primitive transformations in the gallium draw module. In varray, primitives are split: when a primitive has more vertices than the middle end can handle, varray splits the primitive and calls the middle end multiple times.

In vcache, primitives are decomposed: more advanced primitives are decomposed into one of point, line(_adj), or triangle(_adj). Similarly, vcache may call the middle end multiple times to flush its internal buffer. In some cases, vcache passes the primitives through without decomposing or splitting them, as can be seen in vcache_check_run.

The issue with vcache is that it has to decompose a primitive differently depending on the provoking convention, as explained in

http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html

It becomes a problem when GS is active.

My proposal is to make vcache split instead of decompose. Because varray only splits and vcache has a pass-through path, the rest of the workflow already has to support all primitive types. Switching from decompose to split does not require a big change to the rest of the workflow.

But then vcache will look a lot like varray, only with indexed primitive support. That leads me to a new frontend that replaces both varray and vcache: vsplit

http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit

vsplit is based on varray. It uses some code from vcache to support indexed primitives. When vcache decomposes, it sets flags to indicate whether the stipple counter should be reset or whether some edge of a triangle should be omitted in unfilled mode. The segments of a split primitive have flags for similar purposes too:

  DRAW_SPLIT_AFTER   More segments to come after this one
  DRAW_SPLIT_BEFORE  There are preceding segments

These flags are set by vsplit and the middle ends pass them to the other stages. Therefore, the run methods of middle ends are augmented to take the flags.

To summarize, vsplit

 - fixes GS when (flatshade && flatshade_first) is on
 - never sends more vertices than the middle end claims to handle
 - is faster than vcache: split instead of decompose, no get_elt calls
 - no longer uses the higher bits of draw_elts for stipple/edge flags

Suggestions?

--
o...@lunarg.com
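To make the split-plus-flags idea concrete, here is a rough sketch for a non-indexed line strip. It is not the vsplit code from the branch: only the two flag names come from the post, while their numeric values, the callback signature, the one-vertex overlap between adjacent segments, and the segment_record helper are all assumptions made for illustration.

```c
#include <assert.h>

/* Flag names from the post; the numeric values are assumptions. */
#define DRAW_SPLIT_BEFORE 0x1 /* there are preceding segments */
#define DRAW_SPLIT_AFTER  0x2 /* more segments follow this one */

/* Hypothetical stand-in for a middle end's run method. */
typedef void (*segment_func)(unsigned start, unsigned count,
                             unsigned flags, void *data);

/*
 * Split a line strip of `count` vertices, beginning at vertex `start`,
 * into segments of at most `max_verts` vertices.  The last vertex of a
 * segment is repeated as the first vertex of the next one so that the
 * line joining the two segments is not lost.
 * Returns the number of segments emitted.
 */
static unsigned
split_line_strip(unsigned start, unsigned count, unsigned max_verts,
                 segment_func run, void *data)
{
   unsigned nr_segments = 0;
   unsigned done = 0; /* vertices that will not be revisited */

   assert(max_verts >= 2); /* a line needs two vertices */

   if (count < 2)
      return 0;

   while (count - done > 1) {
      unsigned remaining = count - done;
      unsigned n = remaining < max_verts ? remaining : max_verts;
      unsigned flags = 0;

      if (done > 0)
         flags |= DRAW_SPLIT_BEFORE;
      if (n < remaining)
         flags |= DRAW_SPLIT_AFTER;

      run(start + done, n, flags, data);
      nr_segments++;

      /* Step back one vertex: it starts the next segment. */
      done += n - 1;
   }

   return nr_segments;
}

/* Small recorder so the segments can be inspected afterwards. */
struct segment_record {
   unsigned nr;
   unsigned start[8];
   unsigned count[8];
   unsigned flags[8];
};

static void
record_segment(unsigned start, unsigned count, unsigned flags, void *data)
{
   struct segment_record *rec = data;

   if (rec->nr < 8) {
      rec->start[rec->nr] = start;
      rec->count[rec->nr] = count;
      rec->flags[rec->nr] = flags;
      rec->nr++;
   }
}
```

With ten vertices and a limit of four per call, the sketch emits segments starting at vertices 0, 3 and 6. The first carries only DRAW_SPLIT_AFTER, the middle one carries both flags, and the last carries only DRAW_SPLIT_BEFORE, which is the information a later stage would need to treat the three calls as one primitive.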
Re: [Mesa-dev] [PATCH 0/3] rtasm/translate_sse rework (v3)
On Fri, 2010-08-13 at 06:47 -0700, Luca Barbieri wrote:
> This is a new version of just the rtasm/translate_sse patches.
>
> This version has the Win64 support built-in.
>
> In addition, it follows Keith's idea to use constants instead of #ifs.
>
> To achieve this, u_cpu_detect.h is enhanced so that architecture and
> endianness are now compile time constants (and thus produce the same
> code as #ifs after optimization), and the CPU ABI is now also available
> to support Win64.

Thanks for the fixes, Luca - this is looking great.

Keith
Re: [Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64
On 08/13/2010 07:47 AM, Luca Barbieri wrote:

A few related changes:
1. Make x86-64 its own architecture (nothing was using util_cpu_caps.arch, so nothing can be affected)
2. Turn the CPU arch and endianness into macros, so that the compiler can evaluate them at compile time and eliminate dead code
3. Add util_cpu_abi to know about non-standard ABIs like Win64
---
 src/gallium/auxiliary/gallivm/lp_bld_pack.c | 2 +-
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 2 +-
 src/gallium/auxiliary/util/u_cpu_detect.c | 19 +-
 src/gallium/auxiliary/util/u_cpu_detect.h | 39 ++--
 4 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
index ecfb13a..8ab742a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
@@ -171,7 +171,7 @@ lp_build_unpack2(LLVMBuilderRef builder,
    msb = lp_build_zero(src_type);

    /* Interleave bits */
-   if(util_cpu_caps.little_endian) {
+   if(util_cpu_little_endian) {
       *dst_lo = lp_build_interleave2(builder, src_type, src, msb, 0);
       *dst_hi = lp_build_interleave2(builder, src_type, src, msb, 1);
    }
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
index 3075065..d4b8b4f 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
@@ -1840,7 +1840,7 @@ lp_build_sample_2d_linear_aos(struct lp_build_sample_context *bld,
       unsigned i, j;

       for(j = 0; j< h16.type.length; j += 4) {
-         unsigned subindex = util_cpu_caps.little_endian ? 0 : 1;
+         unsigned subindex = util_cpu_little_endian ? 0 : 1;
          LLVMValueRef index;

          index = LLVMConstInt(elem_type, j/2 + subindex, 0);
diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c b/src/gallium/auxiliary/util/u_cpu_detect.c
index b1a8c75..73ce146 100644
--- a/src/gallium/auxiliary/util/u_cpu_detect.c
+++ b/src/gallium/auxiliary/util/u_cpu_detect.c
@@ -391,23 +391,6 @@ util_cpu_detect(void)

    memset(&util_cpu_caps, 0, sizeof util_cpu_caps);

-   /* Check for arch type */
-#if defined(PIPE_ARCH_MIPS)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_MIPS;
-#elif defined(PIPE_ARCH_ALPHA)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_ALPHA;
-#elif defined(PIPE_ARCH_SPARC)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_SPARC;
-#elif defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_X86;
-   util_cpu_caps.little_endian = 1;
-#elif defined(PIPE_ARCH_PPC)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_POWERPC;
-   util_cpu_caps.little_endian = 0;
-#else
-   util_cpu_caps.arch = UTIL_CPU_ARCH_UNKNOWN;
-#endif
-
    /* Count the number of CPUs in system */
 #if defined(PIPE_OS_WINDOWS)
    {
@@ -504,7 +487,7 @@ util_cpu_detect(void)

 #ifdef DEBUG
    if (debug_get_option_dump_cpu()) {
-      debug_printf("util_cpu_caps.arch = %i\n", util_cpu_caps.arch);
+      debug_printf("util_cpu_caps.arch = %i\n", util_cpu_arch);
       debug_printf("util_cpu_caps.nr_cpus = %u\n", util_cpu_caps.nr_cpus);

       debug_printf("util_cpu_caps.x86_cpu_type = %u\n", util_cpu_caps.x86_cpu_type);
diff --git a/src/gallium/auxiliary/util/u_cpu_detect.h b/src/gallium/auxiliary/util/u_cpu_detect.h
index 4b3dc39..e81e4b5 100644
--- a/src/gallium/auxiliary/util/u_cpu_detect.h
+++ b/src/gallium/auxiliary/util/u_cpu_detect.h
@@ -36,6 +36,7 @@
 #define _UTIL_CPU_DETECT_H

 #include "pipe/p_compiler.h"
+#include "pipe/p_config.h"

 enum util_cpu_arch {
    UTIL_CPU_ARCH_UNKNOWN = 0,
@@ -43,19 +44,49 @@ enum util_cpu_arch {
    UTIL_CPU_ARCH_ALPHA,
    UTIL_CPU_ARCH_SPARC,
    UTIL_CPU_ARCH_X86,
-   UTIL_CPU_ARCH_POWERPC
+   UTIL_CPU_ARCH_X86_64,
+   UTIL_CPU_ARCH_POWERPC,
+
+   /* non-standard ABIs, only used in util_cpu_abi */
+   UTIL_CPU_ABI_WIN64
 };

+/* Check for arch type */
+#if defined(PIPE_ARCH_MIPS)
+#define util_cpu_arch UTIL_CPU_ARCH_MIPS
+#elif defined(PIPE_ARCH_ALPHA)
+#define util_cpu_arch UTIL_CPU_ARCH_ALPHA
+#elif defined(PIPE_ARCH_SPARC)
+#define util_cpu_arch UTIL_CPU_ARCH_SPARC
+#elif defined(PIPE_ARCH_X86)
+#define util_cpu_arch UTIL_CPU_ARCH_X86
+#elif defined(PIPE_ARCH_X86_64)
+#define util_cpu_arch UTIL_CPU_ARCH_X86_64
+#elif defined(PIPE_ARCH_PPC)
+#define util_cpu_arch UTIL_CPU_ARCH_POWERPC
+#else
+#define util_cpu_arch UTIL_CPU_ARCH_UNKNOWN
+#endif
+
+#ifdef WIN64
+#define util_cpu_abi UTIL_CPU_ABI_WIN64
+#else
+#define util_cpu_abi util_cpu_arch
+#endif
+
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) || !defined(WORDS_BIGENDIAN)
+#define util_cpu_little_endian 1
+#else
+#define util_cpu_little_endian 0
+#endif

#define symbols are usually uppercase. Is there a reason why you're using lowercase? I'd prefer the code to follow typical conventions.

-Brian
[Mesa-dev] [PATCH 3/3] translate_sse: major rewrite (v3)
Changes in v3:
- Win64 support (untested)
- Use u_cpu_detect.h constants instead of #ifs

Changes in v2:
- Minimize #ifs
- Give a name to magic number CHANNELS_0001
- Add support for CPUs without SSE (only memcpy and swizzles, like non SSE2)
- Fixed comments

translate_sse is currently very limited, to the point of being useless in essentially all cases. In particular, it only supports some float32 and unorm8 formats and doesn't work on x86-64.

This commit rewrites it to support:
1. Dumb memory copy for any pair of identical formats
2. All formats that are swizzles of each other
3. Converting 32/64-bit floats and all 8/16/32-bit integers to 32-bit float
4. Converting unorm8/snorm8 to snorm16 and uscaled8/sscaled8 to sscaled16
5. Support for x86-64 (doesn't take advantage of it in any way though)

This new translate can even be useful to translate index buffers for cards that lack 8-bit index support.

It passes the testsuite I wrote, but note that this is a major change, and more testing would be great.

0003-translate_sse-major-rewrite-v3.patch.gz
Description: GNU Zip compressed data
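For readers who want a scalar reference for item 3 of the list above, here is a minimal sketch of what "convert 8-bit integers to 32-bit float" usually means for each format class. The function names are hypothetical and do not appear in the patch, and the normalization rules (divide by 255 for unorm8, divide by 127 and clamp to -1 for snorm8) are the common convention, which I am assuming the SSE code matches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar references; none of these names come from the patch. */

/* unorm8: 0..255 maps linearly onto 0.0..1.0 */
static float ref_unorm8_to_float(uint8_t v)
{
   return (float)v / 255.0f;
}

/* snorm8: -127..127 maps onto -1.0..1.0, and -128 is clamped to -1.0 */
static float ref_snorm8_to_float(int8_t v)
{
   float f = (float)v / 127.0f;
   return f < -1.0f ? -1.0f : f;
}

/* uscaled8: the unsigned integer value is converted as-is */
static float ref_uscaled8_to_float(uint8_t v)
{
   return (float)v;
}

/* sscaled8: the signed integer value is converted as-is */
static float ref_sscaled8_to_float(int8_t v)
{
   return (float)v;
}
```

A reference like this is mostly useful as the "expected" side of a testsuite that compares against the generated SSE code.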
Re: [Mesa-dev] [PATCH 6/6] translate_sse: major rewrite
> What about just making things prettier by converting the #if's into
> regular if statements? It would be easier to read if nothing else,
> though it would mean compiling at least a stub version of the x86-64
> opcode emitters on x86.
>
> In fact there's nothing preventing us compiling the entire x86-64
> emitters on x86, though obviously they can't be used -- but there's
> nothing which requires that code to be #ifdef'ed out on other platforms.

OK, done, see new patchset.
[Mesa-dev] [PATCH 2/3] rtasm: add minimal x86-64 support and new instructions (v2)
Changes in v2: - Win64 support (untested) - Use u_cpu_detect.h constants instead of #ifs This commit adds minimal x86-64 support: only movs between registers are supported for r8-r15, and x64_rexw() must be used to ask for 64-bit operations. It also adds several new instructions for the new translate_sse code. --- src/gallium/auxiliary/rtasm/rtasm_cpu.c|6 +- src/gallium/auxiliary/rtasm/rtasm_x86sse.c | 455 ++-- src/gallium/auxiliary/rtasm/rtasm_x86sse.h | 69 - 3 files changed, 493 insertions(+), 37 deletions(-) diff --git a/src/gallium/auxiliary/rtasm/rtasm_cpu.c b/src/gallium/auxiliary/rtasm/rtasm_cpu.c index 2e15751..0461c81 100644 --- a/src/gallium/auxiliary/rtasm/rtasm_cpu.c +++ b/src/gallium/auxiliary/rtasm/rtasm_cpu.c @@ -30,7 +30,7 @@ #include "rtasm_cpu.h" -#if defined(PIPE_ARCH_X86) +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) static boolean rtasm_sse_enabled(void) { static boolean firsttime = 1; @@ -49,7 +49,7 @@ static boolean rtasm_sse_enabled(void) int rtasm_cpu_has_sse(void) { /* FIXME: actually detect this at run-time */ -#if defined(PIPE_ARCH_X86) +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) return rtasm_sse_enabled(); #else return 0; @@ -59,7 +59,7 @@ int rtasm_cpu_has_sse(void) int rtasm_cpu_has_sse2(void) { /* FIXME: actually detect this at run-time */ -#if defined(PIPE_ARCH_X86) +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) return rtasm_sse_enabled(); #else return 0; diff --git a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c index 63007c1..88b182b 100644 --- a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c +++ b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c @@ -22,8 +22,9 @@ **/ #include "pipe/p_config.h" +#include "util/u_cpu_detect.h" -#if defined(PIPE_ARCH_X86) +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) #include "pipe/p_compiler.h" #include "util/u_debug.h" @@ -231,6 +232,10 @@ static void emit_modrm( struct x86_function *p, assert(reg.mod == mod_REG); + /* 
TODO: support extended x86-64 registers */ + assert(reg.idx < 8); + assert(regmem.idx < 8); + val |= regmem.mod << 6; /* mod field */ val |= reg.idx << 3;/* reg field */ val |= regmem.idx; /* r/m field */ @@ -363,6 +368,12 @@ int x86_get_label( struct x86_function *p ) */ +void x64_rexw(struct x86_function *p) +{ + if(util_cpu_arch == UTIL_CPU_ARCH_X86_64) + emit_1ub(p, 0x48); +} + void x86_jcc( struct x86_function *p, enum x86_cc cc, int label ) @@ -449,6 +460,52 @@ void x86_mov_reg_imm( struct x86_function *p, struct x86_reg dst, int imm ) emit_1i(p, imm); } +void x86_mov_imm( struct x86_function *p, struct x86_reg dst, int imm ) +{ + DUMP_RI( dst, imm ); + if(dst.mod == mod_REG) + x86_mov_reg_imm(p, dst, imm); + else + { + emit_1ub(p, 0xc7); + emit_modrm_noreg(p, 0, dst); + emit_1i(p, imm); + } +} + +void x86_mov16_imm( struct x86_function *p, struct x86_reg dst, uint16_t imm ) +{ + DUMP_RI( dst, imm ); + emit_1ub(p, 0x66); + if(dst.mod == mod_REG) + { + emit_1ub(p, 0xb8 + dst.idx); + emit_2ub(p, imm & 0xff, imm >> 8); + } + else + { + emit_1ub(p, 0xc7); + emit_modrm_noreg(p, 0, dst); + emit_2ub(p, imm & 0xff, imm >> 8); + } +} + +void x86_mov8_imm( struct x86_function *p, struct x86_reg dst, uint8_t imm ) +{ + DUMP_RI( dst, imm ); + if(dst.mod == mod_REG) + { + emit_1ub(p, 0xb0 + dst.idx); + emit_1ub(p, imm); + } + else + { + emit_1ub(p, 0xc6); + emit_modrm_noreg(p, 0, dst); + emit_1ub(p, imm); + } +} + /** * Immediate group 1 instructions. 
*/ @@ -520,7 +577,7 @@ void x86_push( struct x86_function *p, } - p->stack_offset += 4; + p->stack_offset += sizeof(void*); } void x86_push_imm32( struct x86_function *p, @@ -530,7 +587,7 @@ void x86_push_imm32( struct x86_function *p, emit_1ub(p, 0x68); emit_1i(p, imm32); - p->stack_offset += 4; + p->stack_offset += sizeof(void*); } @@ -540,23 +597,33 @@ void x86_pop( struct x86_function *p, DUMP_R( reg ); assert(reg.mod == mod_REG); emit_1ub(p, 0x58 + reg.idx); - p->stack_offset -= 4; + p->stack_offset -= sizeof(void*); } void x86_inc( struct x86_function *p, struct x86_reg reg ) { DUMP_R( reg ); - assert(reg.mod == mod_REG); - emit_1ub(p, 0x40 + reg.idx); + if(util_cpu_arch == UTIL_CPU_ARCH_X86 && reg.mod == mod_REG) + { + emit_1ub(p, 0x40 + reg.idx); + return; + } + emit_1ub(p, 0xff); + emit_modrm_noreg(p, 0, reg); } void x86_dec( struct x86_function *p, struct x86_reg reg ) { DUMP_R( reg ); - assert(reg.mod == mod_REG); - emit_1ub(p, 0x48 + reg.idx); + if(util_cpu_arch == UTIL_C
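A note on why x86_inc and x86_dec change in this patch: in 64-bit mode the bytes 0x40 to 0x4f are REX prefixes (0x48 is REX.W, which is what x64_rexw() emits), so the one-byte "0x40 + reg" and "0x48 + reg" forms are only valid on 32-bit x86 and the 0xff /0 and 0xff /1 forms have to be used instead. The sketch below shows this for register-direct operands only. It uses a hypothetical byte buffer rather than rtasm's struct x86_function, and none of the names come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for rtasm's code buffer. */
struct code_buf {
   uint8_t bytes[16];
   size_t len;
};

static void emit_byte(struct code_buf *b, uint8_t v)
{
   assert(b->len < sizeof b->bytes);
   b->bytes[b->len++] = v;
}

/* Emit "inc reg" for a register-direct operand (register index 0..7).
 * is_x86_64 selects the target mode, want_64bit asks for a 64-bit operand. */
static void emit_inc_reg(struct code_buf *b, int is_x86_64, int want_64bit,
                         unsigned reg)
{
   assert(reg < 8);

   if (!is_x86_64) {
      /* 32-bit mode: one-byte short form, 0x40 + reg */
      emit_byte(b, (uint8_t)(0x40 + reg));
      return;
   }

   /* 64-bit mode: 0x40..0x4f are REX prefixes, so use 0xff /0 instead */
   if (want_64bit)
      emit_byte(b, 0x48);                /* REX.W prefix */
   emit_byte(b, 0xff);                   /* opcode, group 5 */
   emit_byte(b, (uint8_t)(0xc0 | reg));  /* ModRM: mod=11, reg field=0, r/m=reg */
}
```

With register index 1 (ECX/RCX) this gives 41 for 32-bit mode, ff c1 for a 32-bit increment in 64-bit mode, and 48 ff c1 for a 64-bit increment.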
[Mesa-dev] [PATCH 1/3] u_cpu_detect: make arch and little_endian constants, add abi and x86-64
A few related changes:
1. Make x86-64 its own architecture (nothing was using util_cpu_caps.arch, so nothing can be affected)
2. Turn the CPU arch and endianness into macros, so that the compiler can evaluate them at compile time and eliminate dead code
3. Add util_cpu_abi to know about non-standard ABIs like Win64
---
 src/gallium/auxiliary/gallivm/lp_bld_pack.c       |    2 +-
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |    2 +-
 src/gallium/auxiliary/util/u_cpu_detect.c         |   19 +-
 src/gallium/auxiliary/util/u_cpu_detect.h         |   39 ++--
 4 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
index ecfb13a..8ab742a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
@@ -171,7 +171,7 @@ lp_build_unpack2(LLVMBuilderRef builder,
    msb = lp_build_zero(src_type);
 
    /* Interleave bits */
-   if(util_cpu_caps.little_endian) {
+   if(util_cpu_little_endian) {
       *dst_lo = lp_build_interleave2(builder, src_type, src, msb, 0);
       *dst_hi = lp_build_interleave2(builder, src_type, src, msb, 1);
    }
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
index 3075065..d4b8b4f 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
@@ -1840,7 +1840,7 @@ lp_build_sample_2d_linear_aos(struct lp_build_sample_context *bld,
       unsigned i, j;
 
       for(j = 0; j < h16.type.length; j += 4) {
-         unsigned subindex = util_cpu_caps.little_endian ? 0 : 1;
+         unsigned subindex = util_cpu_little_endian ? 0 : 1;
          LLVMValueRef index;
 
          index = LLVMConstInt(elem_type, j/2 + subindex, 0);
diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c b/src/gallium/auxiliary/util/u_cpu_detect.c
index b1a8c75..73ce146 100644
--- a/src/gallium/auxiliary/util/u_cpu_detect.c
+++ b/src/gallium/auxiliary/util/u_cpu_detect.c
@@ -391,23 +391,6 @@ util_cpu_detect(void)
 
    memset(&util_cpu_caps, 0, sizeof util_cpu_caps);
 
-   /* Check for arch type */
-#if defined(PIPE_ARCH_MIPS)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_MIPS;
-#elif defined(PIPE_ARCH_ALPHA)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_ALPHA;
-#elif defined(PIPE_ARCH_SPARC)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_SPARC;
-#elif defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_X86;
-   util_cpu_caps.little_endian = 1;
-#elif defined(PIPE_ARCH_PPC)
-   util_cpu_caps.arch = UTIL_CPU_ARCH_POWERPC;
-   util_cpu_caps.little_endian = 0;
-#else
-   util_cpu_caps.arch = UTIL_CPU_ARCH_UNKNOWN;
-#endif
-
    /* Count the number of CPUs in system */
 #if defined(PIPE_OS_WINDOWS)
    {
@@ -504,7 +487,7 @@ util_cpu_detect(void)
 
 #ifdef DEBUG
    if (debug_get_option_dump_cpu()) {
-      debug_printf("util_cpu_caps.arch = %i\n", util_cpu_caps.arch);
+      debug_printf("util_cpu_caps.arch = %i\n", util_cpu_arch);
       debug_printf("util_cpu_caps.nr_cpus = %u\n", util_cpu_caps.nr_cpus);
 
       debug_printf("util_cpu_caps.x86_cpu_type = %u\n", util_cpu_caps.x86_cpu_type);
diff --git a/src/gallium/auxiliary/util/u_cpu_detect.h b/src/gallium/auxiliary/util/u_cpu_detect.h
index 4b3dc39..e81e4b5 100644
--- a/src/gallium/auxiliary/util/u_cpu_detect.h
+++ b/src/gallium/auxiliary/util/u_cpu_detect.h
@@ -36,6 +36,7 @@
 #define _UTIL_CPU_DETECT_H
 
 #include "pipe/p_compiler.h"
+#include "pipe/p_config.h"
 
 enum util_cpu_arch {
    UTIL_CPU_ARCH_UNKNOWN = 0,
@@ -43,19 +44,49 @@ enum util_cpu_arch {
    UTIL_CPU_ARCH_ALPHA,
    UTIL_CPU_ARCH_SPARC,
    UTIL_CPU_ARCH_X86,
-   UTIL_CPU_ARCH_POWERPC
+   UTIL_CPU_ARCH_X86_64,
+   UTIL_CPU_ARCH_POWERPC,
+
+   /* non-standard ABIs, only used in util_cpu_abi */
+   UTIL_CPU_ABI_WIN64
 };
 
+/* Check for arch type */
+#if defined(PIPE_ARCH_MIPS)
+#define util_cpu_arch UTIL_CPU_ARCH_MIPS
+#elif defined(PIPE_ARCH_ALPHA)
+#define util_cpu_arch UTIL_CPU_ARCH_ALPHA
+#elif defined(PIPE_ARCH_SPARC)
+#define util_cpu_arch UTIL_CPU_ARCH_SPARC
+#elif defined(PIPE_ARCH_X86)
+#define util_cpu_arch UTIL_CPU_ARCH_X86
+#elif defined(PIPE_ARCH_X86_64)
+#define util_cpu_arch UTIL_CPU_ARCH_X86_64
+#elif defined(PIPE_ARCH_PPC)
+#define util_cpu_arch UTIL_CPU_ARCH_POWERPC
+#else
+#define util_cpu_arch UTIL_CPU_ARCH_UNKNOWN
+#endif
+
+#ifdef WIN64
+#define util_cpu_abi UTIL_CPU_ABI_WIN64
+#else
+#define util_cpu_abi util_cpu_arch
+#endif
+
+#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) || !defined(WORDS_BIGENDIAN)
+#define util_cpu_little_endian 1
+#else
+#define util_cpu_little_endian 0
+#endif
+
 struct util_cpu_caps {
-   enum util_cpu_arch arch;
    unsigned nr_cpus;
 
    /* Feature flags */
    int x86_cpu_type;
    unsigned cacheline;
 
-   unsigned little_endian:1;
-
    unsigned has_tsc:1;
    unsigned has_mmx:1;
    unsigned has_mmx2:1;
--
[Mesa-dev] [PATCH 0/3] rtasm/translate_sse rework (v3)
This is a new version of just the rtasm/translate_sse patches.

This version has the Win64 support built-in. In addition, it follows Keith's idea to use constants instead of #ifs.

To achieve this, u_cpu_detect.h is enhanced so that architecture and endianness are now compile-time constants (and thus produce the same code as #ifs after optimization), and the CPU ABI is now also available to support Win64.

Luca Barbieri (3):
  u_cpu_detect: make arch and little_endian constants, add abi and x86-64
  rtasm: add minimal x86-64 support and new instructions (v2)
  translate_sse: major rewrite (v3)

 src/gallium/auxiliary/gallivm/lp_bld_pack.c       |    2 +-
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |    2 +-
 src/gallium/auxiliary/rtasm/rtasm_cpu.c           |    6 +-
 src/gallium/auxiliary/rtasm/rtasm_x86sse.c        |  455 -
 src/gallium/auxiliary/rtasm/rtasm_x86sse.h        |   69 ++-
 src/gallium/auxiliary/translate/translate_sse.c   | 1162 -
 src/gallium/auxiliary/util/u_cpu_detect.c         |   19 +-
 src/gallium/auxiliary/util/u_cpu_detect.h         |   39 +-
 8 files changed, 1455 insertions(+), 299 deletions(-)
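To illustrate the point about compile-time constants, here is a minimal sketch that is not taken from the patches. demo_cpu_arch and demo_emit_rexw are hypothetical stand-ins for util_cpu_arch and x64_rexw(), and the compiler-predefined macros stand in for PIPE_ARCH_*. The idea is that an ordinary if on a macro constant is parsed and type-checked on every platform, while an optimizing compiler can still drop the dead branch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the u_cpu_detect.h constants. */
enum demo_cpu_arch_e {
   DEMO_ARCH_UNKNOWN = 0,
   DEMO_ARCH_X86,
   DEMO_ARCH_X86_64
};

#if defined(__x86_64__) || defined(_M_X64)
#define demo_cpu_arch DEMO_ARCH_X86_64
#elif defined(__i386__) || defined(_M_IX86)
#define demo_cpu_arch DEMO_ARCH_X86
#else
#define demo_cpu_arch DEMO_ARCH_UNKNOWN
#endif

/* Writes a REX.W prefix into out when targeting x86-64, nothing otherwise.
 * Returns the number of bytes written. */
static size_t demo_emit_rexw(unsigned char *out)
{
   assert(out != NULL);

   /* Regular if on a constant: no #if needed at the use site. */
   if (demo_cpu_arch == DEMO_ARCH_X86_64) {
      out[0] = 0x48;
      return 1;
   }
   return 0;
}
```

The trade-off is that both branches must compile everywhere, which is exactly what makes bit-rot in the rarely built branch less likely.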
Re: [Mesa-dev] talloc (Was: Merge criteria for glsl2 branch)
On Thu, 2010-08-12 at 14:46 -0700, Ian Romanick wrote:
> José Fonseca wrote:
>
> > OK.
> >
> > What about this:
> >
> > For GLUT, GLEW, LLVM and all other dependencies I'll just make a SDK
> > with the binaries, with debug & release, 32 & 64 bit, MinGW & MSVC
> > versions. One seldom needs to modify the source anyway, and they have
> > active upstream development.
> >
> > But I perceive talloc as different from all above: it's very low level
> > and low weight library, providing very basic functionality, and upstream
> > never showed interest for Windows portability. I'd really prefer to see
> > the talloc source bundled (and only compiled on windows), as a quick way
> > to have glsl2 merged without causing windows build failures.
>
> This seems like a reasonable compromise. Is this something that you and
> / or Aras can tackle? I don't have a Windows build system set up, so I
> wouldn't be able to test any build system changes that I made.

I've pushed a new branch, glsl2-win32, that includes Aras' patch and all the fixes necessary to get at least the MinGW build to succeed.

I had to rename some tokens in order to avoid collisions with windows.h defines. Aras didn't mention this problem before. Perhaps the indirect windows.h include can be avoided, or perhaps you prefer to handle this some other way.

Jose
[Mesa-dev] Nouveau errors in /var/log/messages - what are they and what do they mean?
What do all these errors that the X11 Nouveau driver spews out in my /var/log/messages mean? They don't seem to do any harm, but I was wondering whether they might be having an impact on the driver's performance?

Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: Allocating FIFO number 3
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: nouveau_channel_alloc: initialised FIFO 3
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/2 Mthd 0x Data 0x8801
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/2 Mthd 0x0180 Data 0x8800
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/3 Mthd 0x Data 0x8802
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/3 Mthd 0x0184 Data 0xbeef0201
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/3 Mthd 0x0188 Data 0xbeef0201
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/4 Mthd 0x Data 0x8803
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/4 Mthd 0x0180 Data 0x8800
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/4 Mthd 0x019c Data 0x8802
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/4 Mthd 0x02fc Data 0x0003
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/5 Mthd 0x Data 0x8804
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/5 Mthd 0x0180 Data 0x8800
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/5 Mthd 0x0198 Data 0x8802
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/5 Mthd 0x02fc Data 0x0003
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/5 Mthd 0x0304 Data 0x0002
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/7 Mthd 0x Data 0xbeef3097
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/7 Mthd 0x0180 Data 0xbeef0301
Aug 13 13:05:54 lithium kernel: nouveau_ratelimit: 38 callbacks suppressed
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 3/7 Class 0x Mthd 0x0184 Data 0x1f42:0x1f42
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 3/7 Class 0x Mthd 0x0188 Data 0x1f43:0x1f43
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 3/7 Class 0x Mthd 0x018c Data 0x1f42:0x1f42
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 3/7 Class 0x Mthd 0x0194 Data 0x1f42:0x1f42
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 3/7 Class 0x Mthd 0x0198 Data 0x1f42:0x1f42
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 3/7 Class 0x Mthd 0x019c Data 0x1f42:0x1f42
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 3/7 Class 0x Mthd 0x01a0 Data 0x1f43:0x1f43
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 3/7 Class 0x Mthd 0x01a4 Data 0x23b5:0x23b5
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PFIFO_CACHE_ERROR - Ch 3/7 Mthd 0x01a8 Data 0xbeef0302
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 3/7 Class 0x Mthd 0x01ac Data 0x1f42:0x1f42
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - nSource: ILLEGAL_MTHD, nStatus: PROTECTION_FAULT
Aug 13 13:05:54 lithium kernel: [drm] nouveau :01:00.0: PGRAPH_ERROR - Ch 3/7 C
Re: [Mesa-dev] [PATCH 6/6] translate_sse: major rewrite
On Fri, 2010-08-13 at 04:46 -0700, Luca Barbieri wrote:
> > Is it possible to use an explicit flag for the (out_chans == 5) case?
>
> Gave it the name CHANNELS_0001 and added a comment.
>
> > Is it possible to do this without all the #ifdefs? Even if statements
> > based on a preprocessor variable would be easier to read, but better
> > still would be some sort of wrapper function which just did the right
> > thing on either architecture.
>
> Right, done.
>
> > Similar comment applies to your x86-64 changes in rtasm.c -- is there a
> > way to reduce the #ifdef load?
>
> Here it seems impossible, as it doesn't seem to be possible to
> abstract any of them (except possibly adding a function to encode both
> INC and DEC, but that doesn't really seem a win).

What about just making things prettier by converting the #if's into regular if statements? It would be easier to read if nothing else, though it would mean compiling at least a stub version of the x86-64 opcode emitters on x86.

In fact there's nothing preventing us compiling the entire x86-64 emitters on x86, though obviously they can't be used -- but there's nothing which requires that code to be #ifdef'ed out on other platforms.

Keith
[Mesa-dev] [PATCH] rtasm/translate_sse: support Win64
I just discovered that Microsoft wisely decided to use their own calling convention on Win64...

This hasn't actually been tested on Win64 though.
---
 src/gallium/auxiliary/rtasm/rtasm_x86sse.c      |   15 +++
 src/gallium/auxiliary/translate/translate_sse.c |   21 +
 2 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
index 6d6b76a..a076e17 100644
--- a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
+++ b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c
@@ -2090,6 +2090,20 @@ struct x86_reg x86_fn_arg( struct x86_function *p,
 {
    switch(arg)
    {
+/* Microsoft uses a different calling convention than the rest of the world */
+#ifdef WIN64
+   case 1:
+      return x86_make_reg(file_REG32, reg_CX);
+   case 2:
+      return x86_make_reg(file_REG32, reg_DX);
+   case 3:
+      return x86_make_reg(file_REG32, reg_R8);
+   case 4:
+      return x86_make_reg(file_REG32, reg_R9);
+   default:
+      return x86_make_disp(x86_make_reg(file_REG32, reg_SP),
+            p->stack_offset + (arg - 4) * 8);     /* ??? */
+#else
    case 1:
       return x86_make_reg(file_REG32, reg_DI);
    case 2:
@@ -2105,6 +2119,7 @@ struct x86_reg x86_fn_arg( struct x86_function *p,
    default:
       return x86_make_disp(x86_make_reg(file_REG32, reg_SP),
             p->stack_offset + (arg - 6) * 8);     /* ??? */
+#endif
    }
 }
 #else
diff --git a/src/gallium/auxiliary/translate/translate_sse.c b/src/gallium/auxiliary/translate/translate_sse.c
index e2d8d53..5dfb186 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -1234,26 +1234,23 @@ static boolean build_vertex_emit( struct translate_sse *p,
 
    x86_init_func(p->func);
 
-#ifdef PIPE_ARCH_X86_64
    x86_push(p->func, p->outbuf_EBX);
    x86_push(p->func, p->count_EBP);
 
-   /* Load arguments into regs; the first two are already there */
-   x86_mov(p->func, p->count_EBP, x86_fn_arg(p->func, 3));
-   x64_mov64(p->func, p->outbuf_EBX, x86_fn_arg(p->func, 5));
-#else
-   /* Push a few regs?
-    */
-   x86_push(p->func, p->outbuf_EBX);
-   x86_push(p->func, p->count_EBP);
+/* on non-Win64 x86-64, these are already in the right registers */
+#if defined(PIPE_ARCH_X86) || defined(WIN64)
    x86_push(p->func, p->machine_EDI);
    x86_push(p->func, p->idx_ESI);
 
-   /* Load arguments into regs:
-    */
    x86_mov(p->func, p->machine_EDI, x86_fn_arg(p->func, 1));
    x86_mov(p->func, p->idx_ESI, x86_fn_arg(p->func, 2));
+#endif
+
    x86_mov(p->func, p->count_EBP, x86_fn_arg(p->func, 3));
+
+#ifdef PIPE_ARCH_X86_64
+   x64_mov64(p->func, p->outbuf_EBX, x86_fn_arg(p->func, 5));
+#else
    x86_mov(p->func, p->outbuf_EBX, x86_fn_arg(p->func, 5));
 #endif
 
@@ -1333,7 +1330,7 @@ static boolean build_vertex_emit( struct translate_sse *p,
 
    /* Pop regs and return
     */
-#ifndef PIPE_ARCH_X86_64
+#if defined(PIPE_ARCH_X86) || defined(WIN64)
    x86_pop(p->func, p->idx_ESI);
    x86_pop(p->func, p->machine_EDI);
 #endif
--
1.7.0.4
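For reference, the difference between the two x86-64 calling conventions that x86_fn_arg() has to handle can be sketched as below. This is a standalone illustration with hypothetical names (demo_abi, demo_reg, demo_int_arg_reg, demo_stack_arg_offset), not rtasm code, and it only models integer or pointer arguments. The stack offsets are relative to the stack pointer at function entry, before any pushes.

```c
#include <assert.h>

enum demo_abi { DEMO_ABI_SYSV_X86_64, DEMO_ABI_WIN64 };

enum demo_reg {
   DEMO_REG_STACK = 0,   /* argument is passed on the stack */
   DEMO_REG_DI, DEMO_REG_SI, DEMO_REG_DX, DEMO_REG_CX,
   DEMO_REG_R8, DEMO_REG_R9
};

/* Register carrying the arg-th (1-based) integer/pointer argument. */
static enum demo_reg demo_int_arg_reg(enum demo_abi abi, unsigned arg)
{
   static const enum demo_reg sysv[6] = {
      DEMO_REG_DI, DEMO_REG_SI, DEMO_REG_DX,
      DEMO_REG_CX, DEMO_REG_R8, DEMO_REG_R9
   };
   static const enum demo_reg win64[4] = {
      DEMO_REG_CX, DEMO_REG_DX, DEMO_REG_R8, DEMO_REG_R9
   };

   assert(arg >= 1);
   if (abi == DEMO_ABI_WIN64)
      return arg <= 4 ? win64[arg - 1] : DEMO_REG_STACK;
   return arg <= 6 ? sysv[arg - 1] : DEMO_REG_STACK;
}

/* Byte offset from the stack pointer, at function entry, of a
 * stack-passed argument. */
static unsigned demo_stack_arg_offset(enum demo_abi abi, unsigned arg)
{
   assert(demo_int_arg_reg(abi, arg) == DEMO_REG_STACK);
   if (abi == DEMO_ABI_WIN64)
      return 8 + 32 + (arg - 5) * 8;   /* return address + 32-byte shadow space */
   return 8 + (arg - 7) * 8;           /* return address only */
}
```

Note that on Win64 the caller reserves 32 bytes of shadow space above the return address, so the fifth argument sits at offset 40 on entry. If that reading is right, the `(arg - 4) * 8` expression marked `???` in the patch may need to account for the shadow space as well, but like the patch itself this has not been tested on Win64.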
Re: [Mesa-dev] [PATCH 1/6] translate_generic: use memcpy if possible
> In this change you've got an int value (copy_size) which has some
> special meaning when negative -- can you add comments explaining what
> the meaning of a negative size is? Is there a way to use some more
> explicit flag value to indicate this condition?

I think it makes sense as it is: -1 stands for "not applicable", meaning that we aren't doing the memcpy whose size copy_size describes (also, the CPU will already have the value loaded for use in memcpy if it turns out to be non-negative).

I added a comment explaining that.
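The sentinel convention discussed here can be sketched roughly as follows. This is only an illustration: struct demo_attrib, demo_convert and demo_emit_attrib are hypothetical names, and only the copy_size field and its "-1 means not applicable" meaning come from the discussion. The stand-in conversion (four unsigned bytes widened to four floats) is an assumption chosen to keep the example small.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-attribute record; only copy_size mirrors the thread. */
struct demo_attrib {
   /* >= 0: input and output formats are identical, memcpy this many bytes.
    *   -1: not applicable, the formats differ and a conversion is needed. */
   int copy_size;
};

/* Stand-in conversion: widen 4 unsigned bytes to 4 floats. */
static void demo_convert(const uint8_t *src, float *dst)
{
   int i;
   for (i = 0; i < 4; i++)
      dst[i] = (float)src[i];
}

static void demo_emit_attrib(const struct demo_attrib *a,
                             const void *src, void *dst)
{
   if (a->copy_size >= 0) {
      /* The same value that was tested is reused as the memcpy size. */
      memcpy(dst, src, (size_t)a->copy_size);
   } else {
      demo_convert(src, dst);
   }
}
```

An explicit boolean flag next to the size would work too. The single signed field just avoids carrying two values that always have to agree.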
[Mesa-dev] [PATCH 6/6] translate_sse: major rewrite (v2)
Changes in v2:
- Minimize #ifs
- Give a name to magic number CHANNELS_0001
- Add support for CPUs without SSE (only memcpy and swizzles, like non SSE2)
- Fixed comments

translate_sse is currently very limited, to the point of being useless in essentially all cases. In particular, it only supports some float32 and unorm8 formats and doesn't work on x86-64.

This commit rewrites it to support:
1. Dumb memory copy for any pair of identical formats
2. All formats that are swizzles of each other
3. Converting 32/64-bit floats and all 8/16/32-bit integers to 32-bit float
4. Converting unorm8/snorm8 to snorm16 and uscaled8/sscaled8 to sscaled16
5. Support for x86-64 (doesn't take advantage of it in any way though)

This new translate can even be useful to translate index buffers for cards that lack 8-bit index support.

It passes the testsuite I wrote, but note that this is a major change, and more testing would be great.

0006-translate_sse-major-rewrite-v2.patch.gz
Description: GNU Zip compressed data
Re: [Mesa-dev] [PATCH 6/6] translate_sse: major rewrite
> Is it possible to use an explicit flag for the (out_chans == 5) case?

Gave it the name CHANNELS_0001 and added a comment.

> Is it possible to do this without all the #ifdefs? Even if statements
> based on a preprocessor variable would be easier to read, but better
> still would be some sort of wrapper function which just did the right
> thing on either architecture.

Right, done.

> Similar comment applies to your x86-64 changes in rtasm.c -- is there a
> way to reduce the #ifdef load?

Here it seems impossible, as it doesn't seem to be possible to abstract any of them (except possibly adding a function to encode both INC and DEC, but that doesn't really seem a win).

> + // TODO: add support for SSE4.1 pmovzx
>
> Probably want to use C-style comments throughout.

Done.
[Mesa-dev] [PATCH 5/6] rtasm: add minimal x86-64 support and new instructions
This commit adds minimal x86-64 support: only movs between registers are supported for r8-r15, and x64_rexw() must be used to ask for 64-bit operations. It also adds several new instructions for the new translate_sse code. --- src/gallium/auxiliary/rtasm/rtasm_cpu.c|6 +- src/gallium/auxiliary/rtasm/rtasm_x86sse.c | 433 ++-- src/gallium/auxiliary/rtasm/rtasm_x86sse.h | 69 - 3 files changed, 477 insertions(+), 31 deletions(-) diff --git a/src/gallium/auxiliary/rtasm/rtasm_cpu.c b/src/gallium/auxiliary/rtasm/rtasm_cpu.c index 2e15751..0461c81 100644 --- a/src/gallium/auxiliary/rtasm/rtasm_cpu.c +++ b/src/gallium/auxiliary/rtasm/rtasm_cpu.c @@ -30,7 +30,7 @@ #include "rtasm_cpu.h" -#if defined(PIPE_ARCH_X86) +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) static boolean rtasm_sse_enabled(void) { static boolean firsttime = 1; @@ -49,7 +49,7 @@ static boolean rtasm_sse_enabled(void) int rtasm_cpu_has_sse(void) { /* FIXME: actually detect this at run-time */ -#if defined(PIPE_ARCH_X86) +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) return rtasm_sse_enabled(); #else return 0; @@ -59,7 +59,7 @@ int rtasm_cpu_has_sse(void) int rtasm_cpu_has_sse2(void) { /* FIXME: actually detect this at run-time */ -#if defined(PIPE_ARCH_X86) +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) return rtasm_sse_enabled(); #else return 0; diff --git a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c index 63007c1..6d6b76a 100644 --- a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c +++ b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c @@ -23,7 +23,7 @@ #include "pipe/p_config.h" -#if defined(PIPE_ARCH_X86) +#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) #include "pipe/p_compiler.h" #include "util/u_debug.h" @@ -231,6 +231,10 @@ static void emit_modrm( struct x86_function *p, assert(reg.mod == mod_REG); + /* TODO: support extended x86-64 registers */ + assert(reg.idx < 8); + assert(regmem.idx < 8); + val |= regmem.mod << 6; /* mod 
field */ val |= reg.idx << 3;/* reg field */ val |= regmem.idx; /* r/m field */ @@ -363,6 +367,13 @@ int x86_get_label( struct x86_function *p ) */ +void x64_rexw(struct x86_function *p) +{ +#if defined(PIPE_ARCH_X86_64) + emit_1ub(p, 0x48); +#endif +} + void x86_jcc( struct x86_function *p, enum x86_cc cc, int label ) @@ -449,6 +460,52 @@ void x86_mov_reg_imm( struct x86_function *p, struct x86_reg dst, int imm ) emit_1i(p, imm); } +void x86_mov_imm( struct x86_function *p, struct x86_reg dst, int imm ) +{ + DUMP_RI( dst, imm ); + if(dst.mod == mod_REG) + x86_mov_reg_imm(p, dst, imm); + else + { + emit_1ub(p, 0xc7); + emit_modrm_noreg(p, 0, dst); + emit_1i(p, imm); + } +} + +void x86_mov16_imm( struct x86_function *p, struct x86_reg dst, uint16_t imm ) +{ + DUMP_RI( dst, imm ); + emit_1ub(p, 0x66); + if(dst.mod == mod_REG) + { + emit_1ub(p, 0xb8 + dst.idx); + emit_2ub(p, imm & 0xff, imm >> 8); + } + else + { + emit_1ub(p, 0xc7); + emit_modrm_noreg(p, 0, dst); + emit_2ub(p, imm & 0xff, imm >> 8); + } +} + +void x86_mov8_imm( struct x86_function *p, struct x86_reg dst, uint8_t imm ) +{ + DUMP_RI( dst, imm ); + if(dst.mod == mod_REG) + { + emit_1ub(p, 0xb0 + dst.idx); + emit_1ub(p, imm); + } + else + { + emit_1ub(p, 0xc6); + emit_modrm_noreg(p, 0, dst); + emit_1ub(p, imm); + } +} + /** * Immediate group 1 instructions. 
 */ @@ -520,7 +577,7 @@ void x86_push( struct x86_function *p, } - p->stack_offset += 4; + p->stack_offset += sizeof(void*); } void x86_push_imm32( struct x86_function *p, @@ -530,7 +587,7 @@ void x86_push_imm32( struct x86_function *p, emit_1ub(p, 0x68); emit_1i(p, imm32); - p->stack_offset += 4; + p->stack_offset += sizeof(void*); } @@ -540,23 +597,37 @@ void x86_pop( struct x86_function *p, DUMP_R( reg ); assert(reg.mod == mod_REG); emit_1ub(p, 0x58 + reg.idx); - p->stack_offset -= 4; + p->stack_offset -= sizeof(void*); } void x86_inc( struct x86_function *p, struct x86_reg reg ) { DUMP_R( reg ); - assert(reg.mod == mod_REG); - emit_1ub(p, 0x40 + reg.idx); +#if defined(PIPE_ARCH_X86) + if(reg.mod == mod_REG) + { + emit_1ub(p, 0x40 + reg.idx); + return; + } +#endif + emit_1ub(p, 0xff); + emit_modrm_noreg(p, 0, reg); } void x86_dec( struct x86_function *p, struct x86_reg reg ) { DUMP_R( reg ); - assert(reg.mod == mod_REG); - emit_1ub(p, 0x48 + reg.idx); +#if defined(PIPE_ARCH_X86) + if(reg.mod == mod_REG) + { + emit_1ub(p, 0x48 + reg.idx); + return; + } +#endif + emit_1ub(p, 0xff); + emit_modrm_noreg(p, 1, reg); } void x86_ret( struct x86_function *p ) @@ -583,8 +654
[Mesa-dev] [PATCH 4/6] translate: add support for 8/16-bit indices
Currently, only 32-bit indices are supported, but some use cases translate needs support for all types. --- src/gallium/auxiliary/rtasm/rtasm_x86sse.c | 14 src/gallium/auxiliary/rtasm/rtasm_x86sse.h |2 + src/gallium/auxiliary/translate/translate.h| 12 .../auxiliary/translate/translate_generic.c| 34 ++ src/gallium/auxiliary/translate/translate_sse.c| 65 ++-- 5 files changed, 108 insertions(+), 19 deletions(-) diff --git a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c index 9f70b73..63007c1 100644 --- a/src/gallium/auxiliary/rtasm/rtasm_x86sse.c +++ b/src/gallium/auxiliary/rtasm/rtasm_x86sse.c @@ -586,6 +586,20 @@ void x86_mov( struct x86_function *p, emit_op_modrm( p, 0x8b, 0x89, dst, src ); } +void x86_movzx8(struct x86_function *p, struct x86_reg dst, struct x86_reg src ) +{ + DUMP_RR( dst, src ); + emit_2ub(p, 0x0f, 0xb6); + emit_modrm(p, dst, src); +} + +void x86_movzx16(struct x86_function *p, struct x86_reg dst, struct x86_reg src ) +{ + DUMP_RR( dst, src ); + emit_2ub(p, 0x0f, 0xb7); + emit_modrm(p, dst, src); +} + void x86_xor( struct x86_function *p, struct x86_reg dst, struct x86_reg src ) diff --git a/src/gallium/auxiliary/rtasm/rtasm_x86sse.h b/src/gallium/auxiliary/rtasm/rtasm_x86sse.h index 6208e8f..365dec1 100644 --- a/src/gallium/auxiliary/rtasm/rtasm_x86sse.h +++ b/src/gallium/auxiliary/rtasm/rtasm_x86sse.h @@ -237,6 +237,8 @@ void x86_dec( struct x86_function *p, struct x86_reg reg ); void x86_inc( struct x86_function *p, struct x86_reg reg ); void x86_lea( struct x86_function *p, struct x86_reg dst, struct x86_reg src ); void x86_mov( struct x86_function *p, struct x86_reg dst, struct x86_reg src ); +void x86_movzx8( struct x86_function *p, struct x86_reg dst, struct x86_reg src ); +void x86_movzx16( struct x86_function *p, struct x86_reg dst, struct x86_reg src ); void x86_mul( struct x86_function *p, struct x86_reg src ); void x86_imul( struct x86_function *p, struct x86_reg dst, struct x86_reg src ); 
void x86_or( struct x86_function *p, struct x86_reg dst, struct x86_reg src ); diff --git a/src/gallium/auxiliary/translate/translate.h b/src/gallium/auxiliary/translate/translate.h index eb6f2cc..a753802 100644 --- a/src/gallium/auxiliary/translate/translate.h +++ b/src/gallium/auxiliary/translate/translate.h @@ -85,6 +85,18 @@ struct translate { unsigned instance_id, void *output_buffer); + void (PIPE_CDECL *run_elts16)( struct translate *, +const uint16_t *elts, +unsigned count, +unsigned instance_id, +void *output_buffer); + + void (PIPE_CDECL *run_elts8)( struct translate *, +const uint8_t *elts, +unsigned count, +unsigned instance_id, +void *output_buffer); + void (PIPE_CDECL *run)( struct translate *, unsigned start, unsigned count, diff --git a/src/gallium/auxiliary/translate/translate_generic.c b/src/gallium/auxiliary/translate/translate_generic.c index e7f5384..10706a7 100644 --- a/src/gallium/auxiliary/translate/translate_generic.c +++ b/src/gallium/auxiliary/translate/translate_generic.c @@ -441,6 +441,38 @@ static void PIPE_CDECL generic_run_elts( struct translate *translate, } } +static void PIPE_CDECL generic_run_elts16( struct translate *translate, + const uint16_t *elts, + unsigned count, + unsigned instance_id, + void *output_buffer ) +{ + struct translate_generic *tg = translate_generic(translate); + char *vert = output_buffer; + unsigned i; + + for (i = 0; i < count; i++) { + generic_run_one(tg, *elts++, instance_id, vert); + vert += tg->translate.key.output_stride; + } +} + +static void PIPE_CDECL generic_run_elts8( struct translate *translate, + const uint8_t *elts, + unsigned count, + unsigned instance_id, + void *output_buffer ) +{ + struct translate_generic *tg = translate_generic(translate); + char *vert = output_buffer; + unsigned i; + + for (i = 0; i < count; i++) { + generic_run_one(tg, *elts++, instance_id, vert); + vert += tg->translate.key.output_stride; + } +} + static void PIPE_CDECL generic_run( struct translate *translate, 
unsigned start, unsigned c
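A note on the index widths: the patch above adds separate run_elts16 and run_elts8 entry points, and on the rtasm side it adds movzx emitters so that narrow indices can be zero-extended. The following is only a minimal sketch of that zero-extension idea in plain C. fetch_index is a hypothetical helper made up for illustration; it is not part of translate, which uses one function per index type instead of a runtime switch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in Mesa): read element i from an index buffer
 * whose entries are index_size bytes wide (1, 2 or 4) and zero-extend it
 * to unsigned, which is what movzx does for the 8/16-bit cases.
 */
static unsigned fetch_index(const void *elts, unsigned index_size, size_t i)
{
   switch (index_size) {
   case 1:
      return ((const uint8_t *)elts)[i];
   case 2:
      return ((const uint16_t *)elts)[i];
   case 4:
      return ((const uint32_t *)elts)[i];
   default:
      assert(!"unsupported index size");
      return 0;
   }
}
```

A per-type entry point, as in the patch, avoids the switch inside the vertex loop; the sketch trades that for brevity.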
[Mesa-dev] [PATCH 3/6] translate_sse: remove useless generated function wrappers
Currently translate_sse puts two trivial wrappers in the translate vtable. These slow it down and enlarge the source code for no gain, except perhaps the ability to set a breakpoint there, so remove them. Breakpoints can be set on the caller of the translate functions, with no loss of functionality. --- src/gallium/auxiliary/translate/translate_sse.c | 55 ++- 1 files changed, 4 insertions(+), 51 deletions(-) diff --git a/src/gallium/auxiliary/translate/translate_sse.c b/src/gallium/auxiliary/translate/translate_sse.c index ef3aa67..68c71f4 100644 --- a/src/gallium/auxiliary/translate/translate_sse.c +++ b/src/gallium/auxiliary/translate/translate_sse.c @@ -46,18 +46,6 @@ #define W3 -typedef void (PIPE_CDECL *run_func)( struct translate *translate, - unsigned start, - unsigned count, - unsigned instance_id, - void *output_buffer); - -typedef void (PIPE_CDECL *run_elts_func)( struct translate *translate, - const unsigned *elts, - unsigned count, - unsigned instance_id, - void *output_buffer); - struct translate_buffer { const void *base_ptr; unsigned stride; @@ -102,9 +90,6 @@ struct translate_sse { boolean use_instancing; unsigned instance_id; - run_func gen_run; - run_elts_func gen_run_elts; - /* these are actually known values, but putting them in a struct * like this is helpful to keep them in sync across the file. 
*/ @@ -700,36 +685,6 @@ static void translate_sse_release( struct translate *translate ) FREE(p); } -static void PIPE_CDECL translate_sse_run_elts( struct translate *translate, - const unsigned *elts, - unsigned count, - unsigned instance_id, - void *output_buffer ) -{ - struct translate_sse *p = (struct translate_sse *)translate; - - p->gen_run_elts( translate, - elts, - count, -instance_id, -output_buffer); -} - -static void PIPE_CDECL translate_sse_run( struct translate *translate, -unsigned start, -unsigned count, - unsigned instance_id, -void *output_buffer ) -{ - struct translate_sse *p = (struct translate_sse *)translate; - - p->gen_run( translate, - start, - count, - instance_id, - output_buffer); -} - struct translate *translate_sse2_create( const struct translate_key *key ) { @@ -746,8 +701,6 @@ struct translate *translate_sse2_create( const struct translate_key *key ) p->translate.key = *key; p->translate.release = translate_sse_release; p->translate.set_buffer = translate_sse_set_buffer; - p->translate.run_elts = translate_sse_run_elts; - p->translate.run = translate_sse_run; for (i = 0; i < key->nr_elements; i++) { if (key->element[i].type == TRANSLATE_ELEMENT_NORMAL) { @@ -789,12 +742,12 @@ struct translate *translate_sse2_create( const struct translate_key *key ) if (!build_vertex_emit(p, &p->elt_func, FALSE)) goto fail; - p->gen_run = (run_func)x86_get_func(&p->linear_func); - if (p->gen_run == NULL) + p->translate.run = (void*)x86_get_func(&p->linear_func); + if (p->translate.run == NULL) goto fail; - p->gen_run_elts = (run_elts_func)x86_get_func(&p->elt_func); - if (p->gen_run_elts == NULL) + p->translate.run_elts = (void*)x86_get_func(&p->elt_func); + if (p->translate.run_elts == NULL) goto fail; return &p->translate; -- 1.7.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/6] translate_generic: factor out common code between linear and indexed
This moves the common code into a separate ALWAYS_INLINE function. --- .../auxiliary/translate/translate_generic.c| 177 +++- 1 files changed, 62 insertions(+), 115 deletions(-) diff --git a/src/gallium/auxiliary/translate/translate_generic.c b/src/gallium/auxiliary/translate/translate_generic.c index 356d488..e7f5384 100644 --- a/src/gallium/auxiliary/translate/translate_generic.c +++ b/src/gallium/auxiliary/translate/translate_generic.c @@ -362,6 +362,66 @@ static emit_func get_emit_func( enum pipe_format format ) } } +static ALWAYS_INLINE void PIPE_CDECL generic_run_one( struct translate_generic *tg, + unsigned elt, + unsigned instance_id, + void *vert ) +{ + unsigned nr_attrs = tg->nr_attrib; + unsigned attr; + + for (attr = 0; attr < nr_attrs; attr++) { + float data[4]; + char *dst = vert + tg->attrib[attr].output_offset; + + if (tg->attrib[attr].type == TRANSLATE_ELEMENT_NORMAL) { + const uint8_t *src; + unsigned index; + int copy_size; + + if (tg->attrib[attr].instance_divisor) { +index = instance_id / tg->attrib[attr].instance_divisor; + } + else { +index = elt; + } + + /* clamp to void going out of bounds */ + index = MIN2(index, tg->attrib[attr].max_index); + + src = tg->attrib[attr].input_ptr + + tg->attrib[attr].input_stride * index; + + copy_size = tg->attrib[attr].copy_size; + if(likely(copy_size >= 0)) +memcpy(dst, src, copy_size); + else + { +tg->attrib[attr].fetch( data, src, 0, 0 ); + +if (0) + debug_printf("Fetch linear attr %d from %p stride %d index %d: " + " %f, %f, %f, %f \n", + attr, + tg->attrib[attr].input_ptr, + tg->attrib[attr].input_stride, + index, + data[0], data[1],data[2], data[3]); + +tg->attrib[attr].emit( data, dst ); + } + } else { + if(likely(tg->attrib[attr].copy_size >= 0)) +memcpy(data, &instance_id, 4); + else + { +data[0] = (float)instance_id; +tg->attrib[attr].emit( data, dst ); + } + } + } +} + /** * Fetch vertex attributes for 'count' vertices. 
*/ @@ -373,71 +433,14 @@ static void PIPE_CDECL generic_run_elts( struct translate *translate, { struct translate_generic *tg = translate_generic(translate); char *vert = output_buffer; - unsigned nr_attrs = tg->nr_attrib; - unsigned attr; unsigned i; - /* loop over vertex attributes (vertex shader inputs) -*/ for (i = 0; i < count; i++) { - const unsigned elt = *elts++; - - for (attr = 0; attr < nr_attrs; attr++) { -float data[4]; -char *dst = vert + tg->attrib[attr].output_offset; - -if (tg->attrib[attr].type == TRANSLATE_ELEMENT_NORMAL) { -const uint8_t *src; -unsigned index; -int copy_size; - -if (tg->attrib[attr].instance_divisor) { - index = instance_id / tg->attrib[attr].instance_divisor; -} else { - index = elt; -} - -/* clamp to void going out of bounds */ -index = MIN2(index, tg->attrib[attr].max_index); - -src = tg->attrib[attr].input_ptr + - tg->attrib[attr].input_stride * index; - -copy_size = tg->attrib[attr].copy_size; -if(likely(copy_size >= 0)) - memcpy(dst, src, copy_size); -else -{ - tg->attrib[attr].fetch( data, src, 0, 0 ); - - if (0) - debug_printf("Fetch elt attr %d from %p stride %d div %u max %u index %d: " - " %f, %f, %f, %f \n", - attr, - tg->attrib[attr].input_ptr, - tg->attrib[attr].input_stride, - tg->attrib[attr].instance_divisor, - tg->attrib[attr].max_index, - index, - data[0], data[1],data[2], data[3]); - tg->attrib[attr].emit( data, dst ); -} - } else { -if(likely(tg->attrib[attr].copy_size >= 0)) - memcpy(data, &instance_id, 4); -else -{ - data[0] = (float)instance_id; - tg->attrib[attr].emit( data, dst ); -} - } - } + generic_run_one(tg, *elts++, instance_id, vert); vert += tg->translate.key.output_stride; } } - - static void PIPE_CDECL generic_run( str
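For readers skimming the diff, the shape of this refactoring can be sketched outside of Mesa. The names below (struct gather, gather_one, gather_linear, gather_elts) are hypothetical and only mirror the structure of generic_run_one, generic_run and generic_run_elts; the per-vertex body is reduced to a clamped memcpy, which is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one vertex attribute. */
struct gather {
   const unsigned char *input;  /* base of the source vertex data */
   unsigned stride;             /* bytes per source vertex */
   unsigned max_index;          /* last valid vertex index */
};

/* Shared per-vertex body: clamp the index, then copy one vertex. */
static inline void gather_one(const struct gather *g, unsigned index,
                              unsigned char *out)
{
   assert(g != NULL && out != NULL);
   if (index > g->max_index)
      index = g->max_index;
   memcpy(out, g->input + (size_t)g->stride * index, g->stride);
}

/* Linear variant: vertices start, start + 1, ... */
static void gather_linear(const struct gather *g, unsigned start,
                          unsigned count, unsigned char *out)
{
   unsigned i;
   for (i = 0; i < count; i++, out += g->stride)
      gather_one(g, start + i, out);
}

/* Indexed variant: vertices are picked through an index buffer. */
static void gather_elts(const struct gather *g, const unsigned *elts,
                        unsigned count, unsigned char *out)
{
   unsigned i;
   for (i = 0; i < count; i++, out += g->stride)
      gather_one(g, elts[i], out);
}
```

Both loops differ only in how the index is produced, which is the point of moving the body into a single inlined function.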
[Mesa-dev] [PATCH 1/6] translate_generic: use memcpy if possible (v2)
Changes in v2:
- Add comment regarding copy_size

When used in GPU drivers, translate can simultaneously perform a gather operation and convert away from unsupported formats.

In this use case, input and output formats will often be identical, so it clearly makes sense to use a memcpy.

Instead, translate currently insists on converting to and from 32-bit floating point numbers.

This is not only extremely expensive, it also loses precision for 32/64-bit integers and 64-bit floating point numbers.

This patch changes translate_generic to just use memcpy if the formats are identical, non-blocked, and have an integral number of bytes per pixel (note that all sensible vertex formats are like this).
--- .../auxiliary/translate/translate_generic.c| 102 +-- 1 files changed, 70 insertions(+), 32 deletions(-) diff --git a/src/gallium/auxiliary/translate/translate_generic.c b/src/gallium/auxiliary/translate/translate_generic.c index 42cfd76..356d488 100644 --- a/src/gallium/auxiliary/translate/translate_generic.c +++ b/src/gallium/auxiliary/translate/translate_generic.c @@ -64,6 +64,14 @@ struct translate_generic { unsigned input_stride; unsigned max_index; + /* this value is set to -1 if this is a normal element with output_format != input_format: + * in this case, u_format is used to do a full conversion + * + * this value is set to the format size in bytes if output_format == input_format or for 32-bit instance ids: + * in this case, memcpy is used to copy this amount of bytes + */ + int copy_size; + } attrib[PIPE_MAX_ATTRIBS]; unsigned nr_attrib; @@ -354,8 +362,6 @@ static emit_func get_emit_func( enum pipe_format format ) } } - - /** * Fetch vertex attributes for 'count' vertices. 
*/ @@ -380,9 +386,10 @@ static void PIPE_CDECL generic_run_elts( struct translate *translate, float data[4]; char *dst = vert + tg->attrib[attr].output_offset; - if (tg->attrib[attr].type == TRANSLATE_ELEMENT_NORMAL) { +if (tg->attrib[attr].type == TRANSLATE_ELEMENT_NORMAL) { const uint8_t *src; unsigned index; +int copy_size; if (tg->attrib[attr].instance_divisor) { index = instance_id / tg->attrib[attr].instance_divisor; @@ -396,27 +403,34 @@ static void PIPE_CDECL generic_run_elts( struct translate *translate, src = tg->attrib[attr].input_ptr + tg->attrib[attr].input_stride * index; -tg->attrib[attr].fetch( data, src, 0, 0 ); - -if (0) - debug_printf("Fetch elt attr %d from %p stride %d div %u max %u index %d: " -" %f, %f, %f, %f \n", -attr, -tg->attrib[attr].input_ptr, -tg->attrib[attr].input_stride, -tg->attrib[attr].instance_divisor, -tg->attrib[attr].max_index, -index, -data[0], data[1],data[2], data[3]); +copy_size = tg->attrib[attr].copy_size; +if(likely(copy_size >= 0)) + memcpy(dst, src, copy_size); +else +{ + tg->attrib[attr].fetch( data, src, 0, 0 ); + + if (0) + debug_printf("Fetch elt attr %d from %p stride %d div %u max %u index %d: " + " %f, %f, %f, %f \n", + attr, + tg->attrib[attr].input_ptr, + tg->attrib[attr].input_stride, + tg->attrib[attr].instance_divisor, + tg->attrib[attr].max_index, + index, + data[0], data[1],data[2], data[3]); + tg->attrib[attr].emit( data, dst ); +} } else { -data[0] = (float)instance_id; +if(likely(tg->attrib[attr].copy_size >= 0)) + memcpy(data, &instance_id, 4); +else +{ + data[0] = (float)instance_id; + tg->attrib[attr].emit( data, dst ); +} } - - if (0) -debug_printf("vert %d/%d attr %d: %f %f %f %f\n", - i, elt, attr, data[0], data[1], data[2], data[3]); - -tg->attrib[attr].emit( data, dst ); } vert += tg->translate.key.output_stride; } @@ -448,6 +462,7 @@ static void PIPE_CDECL generic_run( struct translate *translate, if (tg->attrib[attr].type == TRANSLATE_ELEMENT_NORMAL) { const uint8_t *src; unsigned index; 
+int copy_size; if (tg->attrib[attr].instance_divisor) { index = instance_id / tg->attrib[attr].instan
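The precision argument in the commit message is easy to reproduce. The sketch below assumes IEEE-754 binary32 floats, and the two helpers are hypothetical names that only model the two paths (conversion through float versus a raw memcpy); they are not the translate code. A 32-bit integer such as 2^24 + 1 has no exact float32 representation, so it does not survive the round trip.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the conversion path: integer -> float32 -> integer.
 * Only call this with values well below 2^32, since converting an
 * out-of-range float back to uint32_t is undefined behaviour in C.
 */
static uint32_t copy_via_float(uint32_t in)
{
   float tmp;
   assert(in < 0x80000000u);
   tmp = (float)in;        /* "fetch": may round to the nearest float32 */
   return (uint32_t)tmp;   /* "emit": converts the rounded value back */
}

/* Hypothetical model of the memcpy path: bytes are copied unchanged. */
static uint32_t copy_via_memcpy(uint32_t in)
{
   uint32_t out;
   memcpy(&out, &in, sizeof out);
   return out;
}
```

With these helpers, 16777217 (2^24 + 1) comes back as 16777216 through the float path and unchanged through the memcpy path.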
[Mesa-dev] [PATCH 0/6] Translate improvements (v2)
This patchset addresses review comments, and adds support for running on CPUs lacking any SSE support, but only for format pairs that are identical or swizzles of each other.

Luca Barbieri (6):
  translate_generic: use memcpy if possible (v2)
  translate_generic: factor out common code between linear and indexed
  translate_sse: remove useless generated function wrappers
  translate: add support for 8/16-bit indices
  rtasm: add minimal x86-64 support and new instructions
  translate_sse: major rewrite (v2)

 src/gallium/auxiliary/rtasm/rtasm_cpu.c            |    6 +-
 src/gallium/auxiliary/rtasm/rtasm_x86sse.c         |  447 +++-
 src/gallium/auxiliary/rtasm/rtasm_x86sse.h         |   67 +-
 src/gallium/auxiliary/translate/translate.h        |   12 +
 .../auxiliary/translate/translate_generic.c        |  207 ++--
 src/gallium/auxiliary/translate/translate_sse.c    | 1270 +++-
 6 files changed, 1584 insertions(+), 425 deletions(-)
Re: [Mesa-dev] talloc (Was: Merge criteria for glsl2 branch)
On Fri, 2010-08-13 at 02:03 -0700, Dave Airlie wrote:
> On Fri, Aug 13, 2010 at 4:58 PM, Aras Pranckevicius wrote:
> >> > But I perceive talloc as different from all above: it's very low level
> >> > and low weight library, providing very basic functionality, and upstream
> >> > never showed interest for Windows portability. I'd really prefer to see
> >> > the talloc source bundled (and only compiled on windows), as a quick way
> >> > to have glsl2 merged without causing windows build failures.
> >>
> >> This seems like a reasonable compromise. Is this something that you and
> >> / or Aras can tackle? I don't have a Windows build system set up, so I
> >> wouldn't be able to test any build system changes that I made.
> >
> > Ok, looks like how/if to bundle talloc is still a very open question. In the
> > meantime, here's talloc 2.0.1 made to compile (and possibly work!) with
> > Visual C++ 2008 (Windows) and Xcode/gcc4.0 (Mac).
> > I've attached the modified talloc.c & talloc.h and the patch from original
> > talloc 2.0.1 (from here http://samba.org/ftp/talloc/). Caveat emptor: I only
> > verified this to work on my own GLSL2 fork, which does not compile in GLSL2
> > preprocessor, only the compiler & optimizer.
> > Like I said before, "full port" of talloc seems to be not needed for
> > compiling on Visual C++; just drop in talloc.h & talloc.c into the project
> > and that's it. Same for Mac with Xcode. It also seems that GLSL2 does not
> > use full talloc's functionality, and at least half of the implementation
> > could be dropped without anyone noticing. Just a note for if/when anyone
> > would try to re-implement talloc with Mesa's license.
>
> Be careful about LGPLv3 rules,
>
> If you are distributing anything linked with an LGPL library without
> accompanying source you need to dynamically link it,
>
> So for example a Windows driver or non open compiler, you can't just
> drop the LGPLv3 c+h files into the project, you need to create a
> dynamic library.

Yep. I got excited about section 5, "combined libraries", of LGPLv3 (http://www.gnu.org/licenses/lgpl.html), but on rereading it I found that the requirement to use a shared library (or ship the object files for the closed source bits) is still there in section 4 d) 1).

I think this pretty much settles it in my mind: we need a BSD reimplementation of this in the medium term, as the hassle of changing all the installer and code signing code to install and sign a new DLL would by far exceed the effort necessary to implement the talloc functionality missing from its muse, halloc.

Jose
Re: [Mesa-dev] [PATCH 6/6] translate_sse: major rewrite
On Thu, 2010-08-12 at 10:22 -0700, Luca Barbieri wrote:
> translate_sse is currently very limited to the point of
> being useless in essentially all cases.
>
> In particular, it only support some float32 and unorm8
> formats and doesn't work on x86-64.
>
> This commit rewrites it to support:
> 1. Dumb memory copy for any pair of identical formats
> 2. All formats that are swizzles of each other
> 3. Converting 32/64-bit floats and all 8/16/32-bit integers to 32-bit float
> 4. Converting unorm8/snorm8 to snorm16 and uscaled8/sscaled8 to sscaled16
> 5. Support for x86-64 (doesn't take advantage of it in any way though)
>
> This new translate can even be useful to translate index buffers for
> cards that lack 8-bit index support.
>
> It passes the testsuite I wrote, but note that this is a major change, and more
> testing would be great.

Luca,

Beyond a few niggles, this is looking great - an impressive body of work...

A couple of comments:

-static void emit_load_R32G32( struct translate_sse *p,
-                              struct x86_reg data,
-                              struct x86_reg arg0 )
+/* out_chans = 5 means we want 4 channels with 1 in alpha instead of 0 */
+static void emit_load_float32( struct translate_sse *p,
+                               struct x86_reg data,
+                               struct x86_reg arg0,
+                               unsigned out_chans,
+                               unsigned chans)
 {

Is it possible to use an explicit flag for the (out_chans == 5) case?

    case 8:
+#ifdef PIPE_ARCH_X86_64
+      x64_mov64(p->func, dataGPR, src);
+      x64_mov64(p->func, dst, dataGPR);
+#else
+      sse_movlps(p->func, dataXMM, src);
+      sse_movlps(p->func, dst, dataXMM);
+#endif
+      break;
+   case 12:
+#ifdef PIPE_ARCH_X86_64
+      x64_mov64(p->func, dataGPR2, src);
+#else
+      sse_movlps(p->func, dataXMM, src);
+#endif
+      x86_mov(p->func, dataGPR, x86_make_disp(src, 8));
+#ifdef PIPE_ARCH_X86_64
+      x64_mov64(p->func, dst, dataGPR2);
+#else
+      sse_movlps(p->func, dst, dataXMM);
+#endif
+      x86_mov(p->func, x86_make_disp(dst, 8), dataGPR);

Is it possible to do this without all the #ifdefs? Even if statements based on a preprocessor variable would be easier to read, but better still would be some sort of wrapper function which just did the right thing on either architecture.

A similar comment applies to your x86-64 changes in rtasm.c -- is there a way to reduce the #ifdef load?

...
+// TODO: add support for SSE4.1 pmovzx

Probably want to use C-style comments throughout.

Keith
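To illustrate the "if statements based on a preprocessor variable" suggestion, here is a minimal, self-contained sketch. Everything in it is hypothetical: SKETCH_ARCH_X86_64, enum copy8_insn and both functions are made-up names, and the function only reports which instruction would be picked for an 8-byte copy instead of emitting code through rtasm. The idea is that the #if appears once, where the constant is defined, and call sites use an ordinary if that the compiler can fold away.

```c
#include <assert.h>

/* One place where the architecture is tested (hypothetical macro name). */
#if defined(__x86_64__) || defined(_M_X64)
#define SKETCH_ARCH_X86_64 1
#else
#define SKETCH_ARCH_X86_64 0
#endif

/* Hypothetical tags for the two instructions used in the reviewed code. */
enum copy8_insn {
   COPY8_MOV64,   /* 64-bit general purpose register move */
   COPY8_MOVLPS   /* SSE low-half move, for 32-bit builds */
};

/* Wrapper that "does the right thing" for either architecture. */
static enum copy8_insn pick_copy8_insn(int is_x86_64)
{
   assert(is_x86_64 == 0 || is_x86_64 == 1);
   if (is_x86_64)
      return COPY8_MOV64;
   return COPY8_MOVLPS;
}

/* Call sites stay free of #ifdef; the branch folds to a constant. */
static enum copy8_insn copy8_insn_for_this_build(void)
{
   return pick_copy8_insn(SKETCH_ARCH_X86_64);
}
```

In the real code the wrapper would presumably emit the move itself; that part is left out here because it depends on rtasm details not shown in this message.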
Re: [Mesa-dev] [PATCH 1/6] translate_generic: use memcpy if possible
Luca, In this change you've got an int value (copy_size) which has some special meaning when negative -- can you add comments explaining what the meaning of a negative size is? Is there a way to use some more explicit flag value to indicate this condition? Keith On Thu, 2010-08-12 at 10:08 -0700, Luca Barbieri wrote: > When used in GPU drivers, translate can be used to simultaneously > perform a gather operation, and convert away from unsupported formats. > > In this use case, input and output formats will often be identical: clearly > it would make sense to use a memcpy in this case. > > Instead, translate will insist to convert to and from 32-bit floating point > numbers. > > This is not only extremely expensive, but it also loses precision for > 32/64-bit integers and 64-bit floating point numbers. > > This patch changes translate_generic to just use memcpy if the formats are > identical, non-blocked, and with an integral number of bytes per pixel (note > that all sensible vertex formats are like this). 
> --- > .../auxiliary/translate/translate_generic.c| 93 +-- > 1 files changed, 63 insertions(+), 30 deletions(-) > > diff --git a/src/gallium/auxiliary/translate/translate_generic.c > b/src/gallium/auxiliary/translate/translate_generic.c > index 42cfd76..57a42b7 100644 > --- a/src/gallium/auxiliary/translate/translate_generic.c > +++ b/src/gallium/auxiliary/translate/translate_generic.c > @@ -63,6 +63,7 @@ struct translate_generic { >const uint8_t *input_ptr; >unsigned input_stride; >unsigned max_index; > + int copy_size; > > } attrib[PIPE_MAX_ATTRIBS]; > > @@ -380,9 +381,10 @@ static void PIPE_CDECL generic_run_elts( struct > translate *translate, >float data[4]; >char *dst = vert + tg->attrib[attr].output_offset; > > - if (tg->attrib[attr].type == TRANSLATE_ELEMENT_NORMAL) { > + if (tg->attrib[attr].type == TRANSLATE_ELEMENT_NORMAL) { > const uint8_t *src; > unsigned index; > +int copy_size; > > if (tg->attrib[attr].instance_divisor) { > index = instance_id / tg->attrib[attr].instance_divisor; > @@ -396,27 +398,34 @@ static void PIPE_CDECL generic_run_elts( struct > translate *translate, > src = tg->attrib[attr].input_ptr + >tg->attrib[attr].input_stride * index; > > -tg->attrib[attr].fetch( data, src, 0, 0 ); > - > -if (0) > - debug_printf("Fetch elt attr %d from %p stride %d div %u > max %u index %d: " > -" %f, %f, %f, %f \n", > -attr, > -tg->attrib[attr].input_ptr, > -tg->attrib[attr].input_stride, > -tg->attrib[attr].instance_divisor, > -tg->attrib[attr].max_index, > -index, > -data[0], data[1],data[2], data[3]); > +copy_size = tg->attrib[attr].copy_size; > +if(likely(copy_size >= 0)) > + memcpy(dst, src, tg->attrib[attr].copy_size); > +else > +{ > + tg->attrib[attr].fetch( data, src, 0, 0 ); > + > + if (0) > + debug_printf("Fetch elt attr %d from %p stride %d div > %u max %u index %d: " > + " %f, %f, %f, %f \n", > + attr, > + tg->attrib[attr].input_ptr, > + tg->attrib[attr].input_stride, > + tg->attrib[attr].instance_divisor, > + tg->attrib[attr].max_index, 
> + index, > + data[0], data[1],data[2], data[3]); > + tg->attrib[attr].emit( data, dst ); > +} > } else { > -data[0] = (float)instance_id; > +if(likely(tg->attrib[attr].copy_size >= 0)) > + memcpy(data, &instance_id, 4); > +else > +{ > + data[0] = (float)instance_id; > + tg->attrib[attr].emit( data, dst ); > +} > } > - > - if (0) > -debug_printf("vert %d/%d attr %d: %f %f %f %f\n", > - i, elt, attr, data[0], data[1], data[2], data[3]); > - > - tg->attrib[attr].emit( data, dst ); >} >vert += tg->translate.key.output_stride; > } > @@ -448,6 +457,7 @@ static void PIPE_CDECL generic_run( struct translate > *translate, > if (tg->attrib[attr].type == TRANSLATE_ELEMENT_NORMAL) { > const uint8_t *src; > unsigned index; > +int copy_size; > > if (tg->attrib[attr].instance_divisor) { > index = instance_id / tg->attrib[attr].instance_divisor; >
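One possible reading of the "more explicit flag value" request, as a sketch only: replace the negative copy_size sentinel with a separate field that names the path. The enum, struct and function below are hypothetical and are not what the patch series does (v2 of the patch keeps the int and documents it in a comment instead).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical explicit flag instead of "copy_size < 0 means convert". */
enum attrib_path {
   ATTRIB_PATH_CONVERT,  /* formats differ: full fetch/emit conversion */
   ATTRIB_PATH_MEMCPY    /* formats match: raw byte copy */
};

struct attrib_info {
   enum attrib_path path;
   size_t copy_size;     /* only meaningful for ATTRIB_PATH_MEMCPY */
};

/* Returns 1 if the attribute was handled by memcpy, or 0 if the caller
 * still has to run the conversion path. dst is untouched in that case.
 */
static int try_fast_copy(const struct attrib_info *a, void *dst,
                         const void *src)
{
   assert(a != NULL && dst != NULL && src != NULL);
   if (a->path != ATTRIB_PATH_MEMCPY)
      return 0;
   memcpy(dst, src, a->copy_size);
   return 1;
}
```

The cost is one extra field per attribute; the gain is that the size can become unsigned and the meaning of each state is spelled out at the use site.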
Re: [Mesa-dev] talloc (Was: Merge criteria for glsl2 branch)
> > Like I said before, "full port" of talloc seems to be not needed for
> > compiling on Visual C++; just drop in talloc.h & talloc.c into the project
> > and that's it. Same for Mac with Xcode.
> Be careful about LGPLv3 rules,
> If you are distributing anything linked with an LGPL library without
> accompanying source you need to dynamically link it

I know. I'm just providing a MSVC/Xcode compatible talloc source file. How Mesa or some fork of Mesa includes it in the build or packages it up - I'll just leave that up to them.

--
Aras Pranckevičius
work: http://unity3d.com
home: http://aras-p.info
Re: [Mesa-dev] talloc (Was: Merge criteria for glsl2 branch)
On Fri, Aug 13, 2010 at 4:58 PM, Aras Pranckevicius wrote:
>> > But I perceive talloc as different from all above: it's very low level
>> > and low weight library, providing very basic functionality, and upstream
>> > never showed interest for Windows portability. I'd really prefer to see
>> > the talloc source bundled (and only compiled on windows), as a quick way
>> > to have glsl2 merged without causing windows build failures.
>>
>> This seems like a reasonable compromise. Is this something that you and
>> / or Aras can tackle? I don't have a Windows build system set up, so I
>> wouldn't be able to test any build system changes that I made.
>
> Ok, looks like how/if to bundle talloc is still a very open question. In the
> meantime, here's talloc 2.0.1 made to compile (and possibly work!) with
> Visual C++ 2008 (Windows) and Xcode/gcc4.0 (Mac).
> I've attached the modified talloc.c & talloc.h and the patch from original
> talloc 2.0.1 (from here http://samba.org/ftp/talloc/). Caveat emptor: I only
> verified this to work on my own GLSL2 fork, which does not compile in GLSL2
> preprocessor, only the compiler & optimizer.
> Like I said before, "full port" of talloc seems to be not needed for
> compiling on Visual C++; just drop in talloc.h & talloc.c into the project
> and that's it. Same for Mac with Xcode. It also seems that GLSL2 does not
> use full talloc's functionality, and at least half of the implementation
> could be dropped without anyone noticing. Just a note for if/when anyone
> would try to re-implement talloc with Mesa's license.

Be careful about LGPLv3 rules,

If you are distributing anything linked with an LGPL library without accompanying source you need to dynamically link it,

So for example a Windows driver or non open compiler, you can't just drop the LGPLv3 c+h files into the project, you need to create a dynamic library.

Dave.
Re: [Mesa-dev] [PATCH]R600g more pipe_cap shader params
Ok, I used OpenGL Extension Viewer from realtechvr to identify some parameter values (using some data from the Windows/OS X ATI drivers), and also identified most of the limits by their normal names. The patch contains some new values and comments for them.

2010/8/10 Marek Olšák
> I've already committed some of the changes and fixed others here:
>
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=00963589b4d92460e3ae2c1557a5d816b5c67a6d
>
> If you still think there is something incorrect, please attach a new patch
> against current mesa git.
>
> -Marek
>
> On Sun, Aug 8, 2010 at 9:30 PM, Владимир wrote:
>
>> Patch based mainly on info from r600c and few bits taken from r300g
>> (vertex tex instruction params)

r600_pipecap.patch
Description: Binary data