Re: [Mesa-dev] S2TC - yet another attempt to solve the S3TC issue
On Wed, Aug 10, 2011 at 7:11 AM, Rudolf Polzer divver...@xonotic.org wrote:
> On Tue, Aug 09, 2011 at 11:45:23PM +0200, Marek Olšák wrote:
> > On Tue, Aug 9, 2011 at 12:25 PM, Jose Fonseca jfons...@vmware.com wrote:
> > > I don't have time for a longer reply now, but I do think your S2TC work is interesting, and that you've successfully contoured the patent claims, at least for the decompression, as I didn't look at the compression bits. But, there was never anything you could have done to improve the situation for GPU S3TC decompression. It's not just a US patent system vs the rest. It's complex, but it's all in the archives, as it has all been discussed before.
> >
> > Well, actually, there is a solution. A solution which has not been available until now. The solution is not to use GPU S3TC decompression, obviously. Instead, let's use GPU S2TC decompression. We have integer opcode support coming, right? We can use those to decompress S2TC textures, which would be loaded as integer textures. Of course, we'd have to implement filtering as well and maintain a few shader variants in case somebody binds an S2TC texture, so that we can switch a shader to S2TC decompression for a particular sampler. The only problem is S2TC can't correctly decompress every S3TC texture, so we'd be noncompliant. Noncompliant is probably better than not working at all. So what do you guys think?
>
> If we go there... wouldn't it be an alternative to trivially convert S3TC to S2TC, by mapping all pixel values to values that don't use inferred colors?

Not sure. It would be safer to avoid anything that assumes full knowledge of S3TC.

> That way, we'd still be using the S3TC circuits, but we'd only feed S2TC into them. Would this be enough to evade the patent - as we'd basically not care if that circuit does any more than S2TC decoding?

I would rather avoid that circuit altogether.

> To exaggerate again: what if we upload null bytes into that decompressor circuit? The decoding algorithms would still run on the hardware, but all we would get out of it is it expanding a sequence of null bytes to 8 times its length. Would even this still infringe, just because that circuit is a S3TC decoder?

I think so.

> Obviously, the same quality loss would be incurred by this, which is - from my tests - big enough to consider such a decoder noncompliant. Because of this I would suggest that - if we go one of these two ways - we should add a separate extension string for support of S2TC.

The problem is there is no adoption of S2TC in the industry. The current state is that Unigine products don't run without full S3TC. Neither does the id Tech 4 engine. Most, if not all, D3D games consider S3TC a standard. In order to succeed, S2TC must become a standard too.

The OpenGL ARB cannot incorporate S3TC into a core spec anyway. Perhaps they would be interested in S2TC as a temporary replacement. Proprietary drivers could implement S2TC easily using their hardware (patented) S3TC decoder. Mesa would have to decode it using shaders. But there would be at least something in the core spec that is one-way compatible with S3TC and has comparable quality, even though the algorithms are different.

> You can see for yourself here (screenshot of a scene using S3TC-decoded-as-S2TC - note the already visible dithering moire that comes from the S2TC decoder case that handles S3TC without using color ramps):
> https://github.com/divVerent/s2tc/wiki/img/s3tcnv-dds-s2tc.png
> and here (texture comparison, temporary URL):
> http://rm.rm.rm-f.org/~xonotic/temp/s3tc-to-s2tc/
>
> Also, s2tc now comes with a tool to quickly convert S3TC to S2TC:
> https://github.com/divVerent/s2tc/blob/master/s2tc_from_s3tc.cpp
>
> Together:
>
> GL_EXT_texture_compression_s3tc:
> - upload of any S3TC texture
>   - only possible if the HW vendor has a broad enough license of the patents, i.e. only possible for nouveau
> - encoding of existing textures to S3TC or S2TC (here, S2TC is good enough to claim compliance)
>
> GL_MESA_texture_compression_s2tc:
> - upload of any S2TC texture
>   - decoding can take place using S3TC circuits, or using code on the GPU
>   - attempts to upload S3TC using this will yield reduced quality
> - encoding of existing textures to S2TC
> - uses same constants as S3TC for uploading the texture
> - basically, these two extensions define the very same interface, but expect different data
>
> I am currently field testing S2TC in Xonotic, and got no complaints about reduced texture quality yet (although I found a few non-optimal cases myself which I will fix by manually excluding these textures from S2TC compression - and the same places were already bad with AMD Compressonator, so I consider it good that these are a bit easier to find now).
>
> So, if we go this route, all I'd have to do for Xonotic, is to have
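The "trivial conversion" idea raised above (remapping pixel values so that no inferred colors are used) can be sketched for a single opaque DXT1 block. This is only an illustration, not the s2tc_from_s3tc tool linked in the message: the function name `remap_dxt1_block` is made up here, the choice to map each interpolated index to its nearer endpoint is an assumption, and blocks in three-color/transparent mode are simply left untouched.

```c
#include <stdint.h>

/*
 * Hypothetical helper: rewrite one 8-byte DXT1 block so that its pixels
 * reference only the two stored endpoint colors (indices 0 and 1).
 *
 * DXT1 layout: color0 (16 bit LE), color1 (16 bit LE), then 4 bytes
 * holding sixteen 2-bit indices, lowest bits first.
 * In four-color mode (color0 > color1), index 2 is the interpolated
 * color nearer to color0 and index 3 the one nearer to color1.
 */
static void remap_dxt1_block(uint8_t block[8])
{
    uint16_t c0 = (uint16_t) (block[0] | (block[1] << 8));
    uint16_t c1 = (uint16_t) (block[2] | (block[3] << 8));
    int i;

    /* Assumption of this sketch: only four-color blocks are converted. */
    if (c0 <= c1)
        return;

    for (i = 4; i < 8; i++) {
        uint8_t in = block[i];
        uint8_t out = 0;
        int shift;

        for (shift = 0; shift < 8; shift += 2) {
            uint8_t idx = (uint8_t) ((in >> shift) & 3u);
            /* 0 -> 0, 1 -> 1, 2 -> 0, 3 -> 1: keep only the low bit. */
            out |= (uint8_t) ((idx & 1u) << shift);
        }
        block[i] = out;
    }
}
```

Snapping to the nearer endpoint loses the gradient information entirely, which is consistent with the quality loss discussed in the thread; a real converter would probably dither instead.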
Re: [Mesa-dev] S2TC - yet another attempt to solve the S3TC issue
On Wed, 2011-08-10 at 08:50 +0200, Marek Olšák wrote:
> On Wed, Aug 10, 2011 at 7:11 AM, Rudolf Polzer divver...@xonotic.org wrote:
> > [...]
> > That way, we'd still be using the S3TC circuits, but we'd only feed S2TC into them. Would this be enough to evade the patent - as we'd basically not care if that circuit does any more than S2TC decoding?
>
> I would rather avoid that circuit altogether.

I'm still very confused as to why this'd be a problem... I can only imagine how many patents get violated by just telling the GPU to render a triangle.. Wtf is the difference here?

Ben.

> [...]
[Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering
This patch makes it possible to render spans of a triangle in parallel. To make as few changes to the codebase as possible, OpenMP was chosen to implement the actual multithreading. The patch is meant to speed up osmesa rendering.

Andreas Fänger (1):
  swrast: initial multi-threaded span rendering

 common.py                      |  1 +
 scons/gallium.py               | 12 +++
 src/mesa/swrast/s_aatritemp.h  | 68 ++-
 src/mesa/swrast/s_context.c    | 26 ---
 src/mesa/swrast/s_texcombine.c |  4 ++
 src/mesa/tnl/t_pipeline.c      | 12 +++
 6 files changed, 87 insertions(+), 36 deletions(-)

--
1.7.4.msysgit.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
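To illustrate the approach described in the cover letter, here is a minimal, self-contained sketch of scanline-parallel span filling with OpenMP. It is not the swrast code: `fill_band`, `worker_count`, `worker_index` and the tiny framebuffer are invented for the example. It shows the two ingredients such a scheme needs: the starting x of each row is computed from the row index instead of being accumulated across iterations, and each thread writes into its own scratch span buffer.

```c
#include <stdlib.h>
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#endif

enum { FB_WIDTH = 32, FB_HEIGHT = 16 };

/* Number of per-thread scratch buffers needed (1 without OpenMP). */
static int worker_count(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Index of the calling thread within the current team. */
static int worker_index(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/*
 * Fill rows [y_min, y_max) with a slanted band that is span_len pixels
 * wide. Returns 0 on success, -1 if the scratch buffers cannot be
 * allocated.
 */
static int fill_band(unsigned char fb[FB_HEIGHT][FB_WIDTH],
                     int y_min, int y_max,
                     float x0, float dxdy, int span_len)
{
    unsigned char *scratch = malloc((size_t) worker_count() * FB_WIDTH);
    int iy;

    if (scratch == NULL)
        return -1;

    #pragma omp parallel for schedule(dynamic)
    for (iy = y_min; iy < y_max; iy++) {
        /* Each thread uses its own scratch span, so rows never collide. */
        unsigned char *span = scratch + (size_t) worker_index() * FB_WIDTH;
        /* x depends only on iy: no loop-carried "x += dxdy". */
        float x = x0 + (float) (iy - y_min) * dxdy;
        int start = (int) x;
        int end = start + span_len;

        if (start < 0)
            start = 0;
        if (end > FB_WIDTH)
            end = FB_WIDTH;

        if (end > start) {
            memset(span, 0xFF, (size_t) (end - start));
            memcpy(&fb[iy][start], span, (size_t) (end - start));
        }
    }

    free(scratch);
    return 0;
}
```

Built without OpenMP support, the pragma is ignored and the loop runs serially with a single scratch buffer, which mirrors the "off by default" behaviour the patch aims for.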
[Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering
Optional parallel rendering of spans using OpenMP. Initial implementation for aa triangles. A new option for scons is also provided to activate the openmp support (off by default).
---
 common.py                      |  1 +
 scons/gallium.py               | 12 +++
 src/mesa/swrast/s_aatritemp.h  | 68 ++-
 src/mesa/swrast/s_context.c    | 26 ---
 src/mesa/swrast/s_texcombine.c |  4 ++
 src/mesa/tnl/t_pipeline.c      | 12 +++
 6 files changed, 87 insertions(+), 36 deletions(-)

diff --git a/common.py b/common.py
index 8657030..cfee1b5 100644
--- a/common.py
+++ b/common.py
@@ -88,6 +88,7 @@ def AddOptions(opts):
     opts.Add('toolchain', 'compiler toolchain', default_toolchain)
     opts.Add(BoolOption('gles', 'EXPERIMENTAL: enable OpenGL ES support', 'no'))
     opts.Add(BoolOption('llvm', 'use LLVM', default_llvm))
+    opts.Add(BoolOption('openmp', 'EXPERIMENTAL: compile with openmp (swrast)', 'no'))
     opts.Add(BoolOption('debug', 'DEPRECATED: debug build', 'yes'))
     opts.Add(BoolOption('profile', 'DEPRECATED: profile build', 'no'))
     opts.Add(BoolOption('quiet', 'DEPRECATED: profile build', 'yes'))
diff --git a/scons/gallium.py b/scons/gallium.py
index 8cd3bc7..7135251 100755
--- a/scons/gallium.py
+++ b/scons/gallium.py
@@ -596,6 +596,18 @@ def generate(env):
         libs += ['m', 'pthread', 'dl']
     env.Append(LIBS = libs)
 
+    # OpenMP
+    if env['openmp']:
+        if env['msvc']:
+            env.Append(CCFLAGS = ['/openmp'])
+            # When building openmp release VS2008 link.exe crashes with LNK1103 error.
+            # Workaround: overwrite PDB flags with empty value as it isn't required anyways
+            if env['build'] == 'release':
+                env['PDB'] = ''
+        if env['gcc']:
+            env.Append(CCFLAGS = ['-fopenmp'])
+            env.Append(LIBS = ['gomp'])
+
     # Load tools
     env.Tool('lex')
     env.Tool('yacc')
diff --git a/src/mesa/swrast/s_aatritemp.h b/src/mesa/swrast/s_aatritemp.h
index 91d4f7a..005d12c 100644
--- a/src/mesa/swrast/s_aatritemp.h
+++ b/src/mesa/swrast/s_aatritemp.h
@@ -181,13 +181,18 @@
       const GLfloat *pMax = vMax->attrib[FRAG_ATTRIB_WPOS];
       const GLfloat dxdy = majDx / majDy;
       const GLfloat xAdj = dxdy < 0.0F ? -dxdy : 0.0F;
-      GLfloat x = pMin[0] - (yMin - iyMin) * dxdy;
       GLint iy;
-      for (iy = iyMin; iy < iyMax; iy++, x += dxdy) {
+      #pragma omp parallel for schedule(dynamic) private(iy) firstprivate(span)
+      for (iy = iyMin; iy < iyMax; iy++) {
+         GLfloat x = pMin[0] - (yMin - iy) * dxdy;
          GLint ix, startX = (GLint) (x - xAdj);
          GLuint count;
          GLfloat coverage = 0.0F;
 
+#ifdef _OPENMP
+         /* each thread needs to use a different (global) SpanArrays variable */
+         span.array = SWRAST_CONTEXT(ctx)->SpanArrays + omp_get_thread_num();
+#endif
          /* skip over fragments with zero coverage */
          while (startX < MAX_WIDTH) {
             coverage = compute_coveragef(pMin, pMid, pMax, startX, iy);
@@ -228,13 +233,12 @@
             coverage = compute_coveragef(pMin, pMid, pMax, ix, iy);
          }
 
-         if (ix <= startX)
-            continue;
-
-         span.x = startX;
-         span.y = iy;
-         span.end = (GLuint) ix - (GLuint) startX;
-         _swrast_write_rgba_span(ctx, &span);
+         if (ix > startX) {
+            span.x = startX;
+            span.y = iy;
+            span.end = (GLuint) ix - (GLuint) startX;
+            _swrast_write_rgba_span(ctx, &span);
+         }
       }
    }
    else {
@@ -244,13 +248,18 @@
       const GLfloat *pMax = vMax->attrib[FRAG_ATTRIB_WPOS];
       const GLfloat dxdy = majDx / majDy;
       const GLfloat xAdj = dxdy > 0 ? dxdy : 0.0F;
-      GLfloat x = pMin[0] - (yMin - iyMin) * dxdy;
       GLint iy;
-      for (iy = iyMin; iy < iyMax; iy++, x += dxdy) {
+      #pragma omp parallel for schedule(dynamic) private(iy) firstprivate(span)
+      for (iy = iyMin; iy < iyMax; iy++) {
+         GLfloat x = pMin[0] - (yMin - iy) * dxdy;
          GLint ix, left, startX = (GLint) (x + xAdj);
          GLuint count, n;
          GLfloat coverage = 0.0F;
 
+#ifdef _OPENMP
+         /* each thread needs to use a different (global) SpanArrays variable */
+         span.array = SWRAST_CONTEXT(ctx)->SpanArrays + omp_get_thread_num();
+#endif
          /* make sure we're not past the window edge */
          if (startX >= ctx->DrawBuffer->_Xmax) {
             startX = ctx->DrawBuffer->_Xmax - 1;
@@ -296,31 +305,30 @@
             ATTRIB_LOOP_END
 #endif
 
-         if (startX <= ix)
-            continue;
+         if (startX > ix) {
+            n = (GLuint) startX - (GLuint) ix;
 
-         n = (GLuint) startX - (GLuint) ix;
+            left = ix + 1;
 
-         left = ix + 1;
-
-         /* shift all values to the left */
-         /* XXX this is temporary */
-         {
-
Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering
I'm not sure it makes a lot of sense to be optimizing swrast at this stage. Take a look at llvmpipe and perhaps consider improving the multithreading already in place in that rasterizer, which is far better optimized than swrast already.

Keith

On Wed, 2011-08-10 at 08:07 +, Andreas Fänger wrote:
> Optional parallel rendering of spans using OpenMP. Initial implementation for aa triangles. A new option for scons is also provided to activate the openmp support (off by default).
> ---
> [...]
[Mesa-dev] [Bug 39116] [intel] manywin xdemo texture issue
https://bugs.freedesktop.org/show_bug.cgi?id=39116

--- Comment #2 from Jan Rüegg rgg...@gmail.com 2011-08-10 02:29:34 PDT ---
Created an attachment (id=50088)
 --> (https://bugs.freedesktop.org/attachment.cgi?id=50088)
glxinfo output on another computer, where manywin crashes

The demo is also not working on my computer. Here, it opens the two windows very briefly and then crashes, without any error message. (Both manywin -s 2 and manywin 2 suffer from this problem.)

--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
[Mesa-dev] [Bug 39116] [intel] manywin xdemo texture issue
https://bugs.freedesktop.org/show_bug.cgi?id=39116

Jan Rüegg rgg...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rgg...@gmail.com
Re: [Mesa-dev] S2TC - yet another attempt to solve the S3TC issue
On 10.08.2011 11:34, Rudolf Polzer wrote:
> > The OpenGL ARB cannot incorporate S3TC into a core spec anyway.
>
> But it already is a core part of OpenGL 3.0.

No. Making S3TC part of OpenGL was discussed, but rejected. OpenGL only requires RGTC (since OpenGL 3.0) and BPTC (since OpenGL 4.2).

Philipp
Re: [Mesa-dev] S2TC - yet another attempt to solve the S3TC issue
On Wed, Aug 10, 2011 at 11:42:11AM +0200, Philipp Klaus Krause wrote:
> On 10.08.2011 11:34, Rudolf Polzer wrote:
> > > The OpenGL ARB cannot incorporate S3TC into a core spec anyway.
> >
> > But it already is a core part of OpenGL 3.0.
>
> No. Making S3TC part of OpenGL was discussed, but rejected. OpenGL only requires RGTC (since OpenGL 3.0) and BPTC (since OpenGL 4.2).

Ah, okay then.

Speaking of BPTC - what is its patent situation? From a quick glance at the extension spec, it does seem to use S3TC's interpolation method, so it likely is not safe from S3's patents. But I may be wrong there, as BPTC is quite messy and convoluted, and I may have misinterpreted that interpolation value sequence.

Best regards,

Rudolf Polzer
Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering
Hi Keith,

you are right. The main purpose of this patch is to speed up osmesa rendering, as there is no llvmpipe target at the moment. Also, llvmpipe is currently missing some important features like aa/fsaa and anisotropic filtering, which are available in swrast now. So I need to stick with the old rasterizer at the moment, with some improvements.

Andreas

-----Original Message-----
From: Keith Whitwell [mailto:kei...@vmware.com]
Sent: Wednesday, 10 August 2011 11:17
To: Andreas Fänger
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering

I'm not sure it makes a lot of sense to be optimizing swrast at this stage. Take a look at llvmpipe and perhaps consider improving the multithreading already in place in that rasterizer, which is far better optimized than swrast already.

Keith

On Wed, 2011-08-10 at 08:07 +, Andreas Fänger wrote:
> Optional parallel rendering of spans using OpenMP. Initial implementation for aa triangles. A new option for scons is also provided to activate the openmp support (off by default).
> ---
> [...]
Re: [Mesa-dev] S2TC - yet another attempt to solve the S3TC issue
On 10.08.2011 11:46, Rudolf Polzer wrote:
> Speaking of BPTC - what is its patent situation? From a quick glance at the extension spec, it does seem to use S3TC's interpolation method, so it likely is not safe from S3's patents. But I may be wrong there, as BPTC is quite messy and convoluted, and I may have misinterpreted that interpolation value sequence.

The ARB says "No known IP claims." HTC probably says that the S3TC patent applies to virtually all kinds of texture compression.

Philipp
Re: [Mesa-dev] S2TC - yet another attempt to solve the S3TC issue
On Wed, Aug 10, 2011 at 11:46:00AM +0200, Philipp Klaus Krause wrote:
> On 10.08.2011 08:50, Marek Olšák wrote:
> > The problem is there is no adoption of S2TC in the industry. The current state is that Unigine products don't run without full S3TC. Neither does the id Tech 4 engine. Most, if not all, D3D games consider S3TC a standard. In order to succeed, S2TC must become a standard too.
>
> Which would take so much time, it wouldn't make sense - by then all hardware probably supports BPTC, since it is required by OpenGL 4.2. AFAIK BPTC most of the time provides better quality than S3TC, and thus better quality than S2TC.

Right. Plus, I doubt the ARB would accept an extension that is a subset of an already existing extension, and even a temporary one.

Which is why I would rather suggest that Mesa design and implement an S2TC extension spec that tries to be as compatible as possible with S3TC. The general idea is that an existing game engine should only require minimal changes to use S2TC, and that S3TC hardware can decode S2TC easily too.

The one big problem could be the texture format identifier. If the S2TC extension were to use GL_COMPRESSED_RGBA_S3TC_DXT1_EXT, this would minimize changes to the affected software - all that changes is that S2TC-supporting software would test BOTH extension strings. However, S3TC-expecting software that uses GL_ARB_texture_compression and enumerates texture formats would then still find GL_COMPRESSED_RGBA_S3TC_DXT1_EXT in the list, and expect full S3TC decoding. So likely, a different texture format identifier would be required.

Also, this would require both compression and upload (e.g. shader decompression) in Mesa, as this should serve as a full replacement for GL_EXT_texture_compression_s3tc.

If we use a different identifier for the texture format, which we'd likely have to, an application would need to be changed to support S2TC like this:

Change initialization code from:

- check for GL_EXT_texture_compression_s3tc
  - if available, set s3tc_supported = 1

to:

- check for GL_EXT_texture_compression_s3tc; if available:
  - set s3tc_supported = 1 and s2tc_supported = 1
  - set S2TC_DXT1 = GL_COMPRESSED_RGB_S3TC_DXT1_EXT
  - set S2TC_DXT1A = GL_COMPRESSED_RGBA_S3TC_DXT1_EXT
  - set S2TC_DXT3 = GL_COMPRESSED_RGBA_S3TC_DXT3_EXT
  - set S2TC_DXT5 = GL_COMPRESSED_RGBA_S3TC_DXT5_EXT
- otherwise, check for GL_MESA_texture_compression_s2tc; if available:
  - set s3tc_supported = 0 and s2tc_supported = 1
  - set S2TC_DXT1 = GL_COMPRESSED_RGB_S2TC_DXT1_MESA
  - set S2TC_DXT1A = GL_COMPRESSED_RGBA_S2TC_DXT1_MESA
  - set S2TC_DXT3 = GL_COMPRESSED_RGBA_S2TC_DXT3_MESA
  - set S2TC_DXT5 = GL_COMPRESSED_RGBA_S2TC_DXT5_MESA
- otherwise:
  - set s3tc_supported = 0 and s2tc_supported = 0

Change all references to s3tc_supported to s2tc_supported, where it is applicable to the case (e.g. depending on input source).

Change all references to GL_COMPRESSED_*_S3TC_*_EXT to S2TC_* where applicable.

Best regards,

Rudolf Polzer
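The initialization steps listed above could look roughly like the following in application code. This is only a sketch under assumptions: GL_MESA_texture_compression_s2tc and the GL_COMPRESSED_*_S2TC_*_MESA tokens are merely proposed in this thread, so the numeric values given to them here are placeholders, and `struct tc_caps`, `has_extension` and `detect_tc_caps` are hypothetical names. The S3TC token values are the ones defined by GL_EXT_texture_compression_s3tc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Token values from GL_EXT_texture_compression_s3tc. */
#ifndef GL_COMPRESSED_RGB_S3TC_DXT1_EXT
#define GL_COMPRESSED_RGB_S3TC_DXT1_EXT   0x83F0
#define GL_COMPRESSED_RGBA_S3TC_DXT1_EXT  0x83F1
#define GL_COMPRESSED_RGBA_S3TC_DXT3_EXT  0x83F2
#define GL_COMPRESSED_RGBA_S3TC_DXT5_EXT  0x83F3
#endif

/* HYPOTHETICAL tokens: proposed names only, placeholder values. */
#define GL_COMPRESSED_RGB_S2TC_DXT1_MESA  0xFFF0
#define GL_COMPRESSED_RGBA_S2TC_DXT1_MESA 0xFFF1
#define GL_COMPRESSED_RGBA_S2TC_DXT3_MESA 0xFFF2
#define GL_COMPRESSED_RGBA_S2TC_DXT5_MESA 0xFFF3

struct tc_caps {
    bool s3tc_supported;
    bool s2tc_supported;
    unsigned dxt1, dxt1a, dxt3, dxt5;   /* formats to use for S2TC data */
};

/* Whole-token match in a space-separated extension string. */
static bool has_extension(const char *list, const char *name)
{
    size_t len = strlen(name);
    const char *p = list;

    while ((p = strstr(p, name)) != NULL) {
        bool starts = (p == list) || (p[-1] == ' ');
        bool ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return true;
        p += len;
    }
    return false;
}

/* Implements the three-way decision from the list above. */
static struct tc_caps detect_tc_caps(const char *extensions)
{
    struct tc_caps caps = { false, false, 0, 0, 0, 0 };

    if (has_extension(extensions, "GL_EXT_texture_compression_s3tc")) {
        /* Full S3TC can also take S2TC data. */
        caps.s3tc_supported = true;
        caps.s2tc_supported = true;
        caps.dxt1  = GL_COMPRESSED_RGB_S3TC_DXT1_EXT;
        caps.dxt1a = GL_COMPRESSED_RGBA_S3TC_DXT1_EXT;
        caps.dxt3  = GL_COMPRESSED_RGBA_S3TC_DXT3_EXT;
        caps.dxt5  = GL_COMPRESSED_RGBA_S3TC_DXT5_EXT;
    } else if (has_extension(extensions, "GL_MESA_texture_compression_s2tc")) {
        caps.s2tc_supported = true;
        caps.dxt1  = GL_COMPRESSED_RGB_S2TC_DXT1_MESA;
        caps.dxt1a = GL_COMPRESSED_RGBA_S2TC_DXT1_MESA;
        caps.dxt3  = GL_COMPRESSED_RGBA_S2TC_DXT3_MESA;
        caps.dxt5  = GL_COMPRESSED_RGBA_S2TC_DXT5_MESA;
    }
    return caps;
}
```

In a real application the extension string would come from glGetString(GL_EXTENSIONS); it is passed in as a parameter here so that the sketch does not depend on a GL context.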
Re: [Mesa-dev] S2TC - yet another attempt to solve the S3TC issue
----- Original Message -----
> On Tue, Aug 09, 2011 at 03:25:05AM -0700, Jose Fonseca wrote:
> > How should you have brought this? You should have assumed that we have our reasons -- after all we've been living under the frustration of these patents, walking on a mine field, for a decade --, instead of assuming we have NIH syndrome.
>
> So I should never try to do anything new, as likely someone else may have already done it and rejected it.

We must be talking about different things.

I was answering on how you should have approached this subject on the mailing list _after_ doing the S2TC work. Essentially, you started to wrongly accuse the developers of already doing illegal things when your proposal was refused, which totally pissed me off. We strive to keep legal risk under check, and you were trying to tip the scale.

If you are talking of what you should have done before doing the S2TC work, then my opinion is that you should have:

a) read all threads about S3TC on the mesa3d-dev mailing list -- if you had done so you would have noticed that several times now people have proposed to develop a variant of the S3TC GL extension that would not require any software (de)compression (therefore completely avoiding the software side of the issue), but that it was always abandoned, because the hardware licensing terms were potentially too narrow -- the intuitive idea that "if I bought the hardware I must have the license" simply is not true in the general case. The recent Apple/S3 lawsuit corroborates that.

and/or

b) contacted the developers before doing the work -- they would have told you the same.

Either would have saved everybody grief.

That said, I still think that the S2TC work is interesting for software renderers (although that may not be your intention), or even if one day the IHVs do get an S3TC license for Mesa/Linux, we can use it to solve the software compression issue. But the hardware use is, and always has been, in the IHV/S3 hands.

(Furthermore, b) applies not only to patent issues but to any contribution to any open source project. The only way to ensure one does not waste time is to get the maintainers' buy-in for the general concept beforehand.)

Jose
Re: [Mesa-dev] S2TC - yet another attempt to solve the S3TC issue
On Wed, Aug 10, 2011 at 04:32:20AM -0700, Jose Fonseca wrote: - Original Message - On Tue, Aug 09, 2011 at 03:25:05AM -0700, Jose Fonseca wrote: How should you brought this? You should have assumed that we have our reasons -- after all we've been living under the frustration of these patents, walking on a mine field, for a decade --, instead of assuming we have NIH syndrome. So I should never try to do anything new, as likely someone else may have already done it and rejected it. We must be talking different things. I was answering on how you should have approached this subject on the mailing list _after_ doing S2TC work. Essentially you started to wrongly accuse the developers of already doing illegal things, when your proposal was refused, which totally pissed me off. We strive to keep legal risk under check, and you were trying to tip the scale. I did not. It was proof by contradiction, or reductio ad absurdum, a common mathematical method of proof. I was saying, your claim that uploading texture data to a S3TC decoding sircuit is a reason against S2TC must be void, because Mesa does the same already, and thus your interpretation must be wrong. It was not meant as an accusation that Mesa is breaking any law. But rather as a proof that in this specific aspect, S2TC is not. If you are talking of what you should have done before doing the S2TC work, then my opinion is that you should have: - a) read all threads about S3TC on mesa3d-dev mailing list -- if you had done so you would have noticed that several times now people have proposed to develop a variant of the S3TC GL extension that would not require any software (de)compression (therefore completely avoiding the software side of the issue), but that it was always abandoned, because the hardware licensing terms were potentially too narrow -- the intuitive idea that if I bought the hardware I must have the license simply is not true in the general case. The recent Apple/S3 lawsuit corroborates that. 
I never understood that part, and still do not. But if US law is that insane, it is. If only voters had any actual power to get rid of such nonsense... and apparently, US law forbids public explanations of such issues, so it is impossible to learn about it. Not my fault though.

b) contacted the developers before doing the work -- they would have told you the same.

Before doing the work, it likely would have been shot down because of potential quality issues. Without actually trying it out, it is not obvious at all that S2TC still yields sufficient quality. Plus, my only contact with the mesa developers IS this list.

Either would have saved everybody grief. That said, I still think that S2TC work is interesting for software renderers (although that may not be your intention), or even if one day the IHVs do get an S3TC license for Mesa/Linux, we can use it to solve the software compression issue. But the hardware use is, and always has been, in the IHV/S3 hands.

Supposedly, by some post elsewhere in this thread, nvidia has the sufficient license, and thus nouveau could use it. For other but sufficiently new chipsets, it was suggested to implement S2TC decoding via shader. No idea how that would work though, and what performance impact it would have.

(Furthermore b) applies not only to patent issues but any contribution to any open source project. The only way to ensure one does not waste time is to get the maintainers' buy-in for the general concept beforehand.)

In this case, that clearly is not possible without having an implementation already, as a quality comparison was necessary.

Best regards,
Rudolf Polzer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
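To make the "S2TC decoding" idea a bit more concrete, here is a minimal CPU-side sketch of decoding one block that only ever uses the two colors stored in it. It assumes the standard 8-byte DXT1 block layout (two little-endian RGB565 colors followed by sixteen 2-bit indices, texel (0,0) in the lowest bits). The function names are made up for illustration, alpha is ignored, and any block that uses an index other than 0 or 1 is simply rejected, since how such indices should be treated is a policy question this sketch does not try to answer. It is not the shader implementation suggested in the thread, only an illustration of the data layout.

```python
import struct


def rgb565_to_rgb888(color):
    """Expand a packed RGB565 value to an (r, g, b) tuple by bit replication."""
    r = (color >> 11) & 0x1F
    g = (color >> 5) & 0x3F
    b = color & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def decode_block_two_colors(block):
    """Decode one 8-byte block into 4 rows of 4 (r, g, b) texels.

    Hypothetical helper: only indices 0 and 1 (the two stored colors)
    are accepted; anything else raises ValueError.
    """
    color0, color1, bits = struct.unpack("<HHI", block)
    palette = (rgb565_to_rgb888(color0), rgb565_to_rgb888(color1))
    rows = []
    for y in range(4):
        row = []
        for x in range(4):
            index = (bits >> (2 * (4 * y + x))) & 0x3
            if index > 1:
                raise ValueError("block uses an index other than 0 or 1")
            row.append(palette[index])
        rows.append(row)
    return rows


# Example: color0 is pure red, color1 is pure blue; only texel (0,0) picks color1.
block = struct.pack("<HHI", 0xF800, 0x001F, 0x00000001)
texels = decode_block_two_colors(block)
```

Whether something like this performs acceptably when written as a shader with manual filtering is exactly the open question raised above.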
Re: [Mesa-dev] r600g winsys backend rework
On Mon, Aug 8, 2011 at 7:56 PM, Alex Deucher alexdeuc...@gmail.com wrote:
On Sun, Aug 7, 2011 at 5:25 PM, Marek Olšák mar...@gmail.com wrote:

Hi,

I have been recently trying to get thread offloading of the CS ioctl into r600g in order to reduce the impact of kernel overhead on fps. That, unfortunately, requires whole winsys/radeon to be used, because even the buffer management (bo_map, bo_wait, bo_busy) must take into account that a CS ioctl may be in progress. Besides that, there are several possible race conditions in r600g, so instead of rewriting r600g and trying to do what winsys/radeon is doing, I decided to simply use winsys/radeon.

What's new in r600g:

- Thread offloading of the DRM CS ioctl. I expect 0-15% increase in performance from that in CPU-bound apps.

- The new GEM_WAIT ioctl is used to avoid waiting for a buffer when possible. (e.g. Mesa may map an index buffer to compute index bounds, which shouldn't cause unnecessary waiting now) I have sent the DRM patches which add the ioctl to dri-devel.

- Thread-safety: There are several possible races in r600g. I especially don't like radeon_bo::reloc, which may cause pretty ugly races if a resource is shared and relocated in multiple contexts. winsys/radeon doesn't have that race and also fixes a couple more. Hopefully this thread-safeness won't cause performance regressions.

winsys/radeon can do space checking as well, but we don't use that in r600g yet.

Performance improvements - I have been able to find a difference with these apps:

Unigine Heaven
  Before: 7.3 fps
  After:  7.6 fps

Torcs
  Before: 29 fps
  After:  34 fps

Note that every commit in the r600winsys2 branch has been committed without piglit regressions, so that we can bisect through it if needed.

The net loss is a little over 900 lines of code in r600g. The fenced cache buffer manager in r600g has turned out not to be superior to pb_cache_bufmgr with the is_buffer_busy hook set, so I removed the former too, as the latter is way simpler.
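For readers unfamiliar with the idea of a cache buffer manager with a busy hook, here is a rough sketch of the concept: released buffers are kept around and handed out again, but only once a caller-supplied callback reports that the GPU is no longer using them. The class and parameter names below are invented for illustration and are not the pb_cache_bufmgr interface; a real implementation would also age out old entries and match on more than size.

```python
class BufferCache:
    """Toy buffer cache that reuses released buffers once they are idle.

    Hypothetical sketch: `allocate(size)` creates a fresh buffer and
    `is_buffer_busy(buf)` stands in for the busy hook.
    """

    def __init__(self, allocate, is_buffer_busy):
        self._allocate = allocate
        self._is_buffer_busy = is_buffer_busy
        self._released = []  # released buffers, oldest first

    def release(self, buf):
        """Return a buffer to the cache instead of freeing it."""
        self._released.append(buf)

    def create(self, size):
        """Reuse an idle cached buffer of the right size, else allocate."""
        for i, buf in enumerate(self._released):
            if len(buf) == size and not self._is_buffer_busy(buf):
                return self._released.pop(i)
        return self._allocate(size)


# Example: a buffer that is still busy is skipped, an idle one is reused.
busy_ids = set()
cache = BufferCache(bytearray, lambda buf: id(buf) in busy_ids)

first = cache.create(64)
busy_ids.add(id(first))       # pretend the GPU is still reading it
cache.release(first)

second = cache.create(64)     # 'first' is busy, so this is a new buffer
busy_ids.clear()              # the GPU has finished
third = cache.create(64)      # now 'first' can be handed out again
```

The point of the hook is that the cache never has to block on a fence itself; it just asks and moves on to the next candidate.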
This new work has been pushed into a new branch r600winsys2 in the main Mesa repository, please review/test.

Looks good. I'll try and run some tests on various hw later this week.

I ran this on a few cards today and all seems well.

Alex

Reviewed-by: Alex Deucher alexander.deuc...@amd.com

Marek
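As a rough illustration of why offloading the CS ioctl to a thread drags buffer management along with it, here is a small sketch. The submit callback stands in for the blocking ioctl, and mapping a buffer first waits until nothing is in flight. All names are hypothetical and none of this is the winsys/radeon code; the real driver can track which buffers a given submission references instead of waiting for everything.

```python
import queue
import threading


class OffloadedSubmitter:
    """Hands command streams to a worker thread (illustrative only)."""

    def __init__(self, submit_fn):
        self._submit_fn = submit_fn      # stands in for the blocking CS ioctl
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            cs = self._pending.get()
            if cs is not None:
                self._submit_fn(cs)
            self._pending.task_done()
            if cs is None:               # sentinel: stop the worker
                return

    def flush(self, cs):
        """Queue a command stream and return to the caller immediately."""
        self._pending.put(cs)

    def sync(self):
        """Block until every queued command stream has been submitted."""
        self._pending.join()

    def map_buffer(self, buf):
        """Mapping must not race with a submission that is still in flight."""
        self.sync()
        return buf

    def close(self):
        self._pending.put(None)
        self._worker.join()


# Example: three flushes return immediately, the map waits for all of them.
submitted = []
submitter = OffloadedSubmitter(submitted.append)
for cs in ("cs0", "cs1", "cs2"):
    submitter.flush(cs)
mapped = submitter.map_buffer(bytearray(4))
submitter.close()
```

Waiting for the whole queue in map_buffer is the crudest possible policy; it is only meant to show where the synchronization point has to sit.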
[Mesa-dev] [PATCH] x86-64: Fix compile error with clang
Remove the 'f' suffix from a float literal.

-   .float 0.0f+1.0
+   .float 1.0

This fixes the following compile error with clang:

error: unexpected token in directive
    .float 0.0f+1.0
    ^

Signed-off-by: Chad Versace c...@chad-versace.us
---
 src/mesa/x86-64/xform4.S |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/mesa/x86-64/xform4.S b/src/mesa/x86-64/xform4.S
index 6141e43..5abd5a2 100644
--- a/src/mesa/x86-64/xform4.S
+++ b/src/mesa/x86-64/xform4.S
@@ -118,7 +118,7 @@ p4_constants:
 .byte 0x00, 0x00, 0x00, 0x00
 .byte 0x00, 0x00, 0x00, 0x00
 .byte 0x00, 0x00, 0x00, 0x00
-.float 0f+1.0
+.float 1.0

 .text
 .align 16
--
1.7.6
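As a quick sanity check on what the replacement directive is expected to emit: `.float 1.0` should assemble to the IEEE 754 single-precision encoding of 1.0. The sketch below computes that encoding with Python's struct module; the helper names are made up, and it says nothing about how either assembler parses the old `0f+1.0` spelling, only what bytes the constant should end up as on a little-endian target.

```python
import struct


def float32_bits(value):
    """Return the IEEE 754 single-precision bit pattern of value as an int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def float32_bytes(value):
    """Return the four little-endian bytes that store value as a float."""
    return struct.pack("<f", value)


one_bits = float32_bits(1.0)    # 0x3F800000
one_bytes = float32_bytes(1.0)  # b"\x00\x00\x80\x3f"
```

Comparing these bytes against the assembled object file (for example with objdump) is one way to confirm that both spellings yield the same constant.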
Re: [Mesa-dev] r600g winsys backend rework
On Sun, Aug 7, 2011 at 5:25 PM, Marek Olšák mar...@gmail.com wrote:

Hi,

I have been recently trying to get thread offloading of the CS ioctl into r600g in order to reduce the impact of kernel overhead on fps. That, unfortunately, requires whole winsys/radeon to be used, because even the buffer management (bo_map, bo_wait, bo_busy) must take into account that a CS ioctl may be in progress. Besides that, there are several possible race conditions in r600g, so instead of rewriting r600g and trying to do what winsys/radeon is doing, I decided to simply use winsys/radeon.

Seems to work well on my rv730, with every game I can throw at it.

--
Will Dyson
Re: [Mesa-dev] r600g winsys backend rework
* Marek Olšák mar...@gmail.com:

I have been recently trying to get thread offloading of the CS ioctl into r600g in order to reduce the impact of kernel overhead on fps. That, unfortunately, requires whole winsys/radeon to be used, because even the buffer management (bo_map, bo_wait, bo_busy) must take into account that a CS ioctl may be in progress. Besides that, there are several possible race conditions in r600g, so instead of rewriting r600g and trying to do what winsys/radeon is doing, I decided to simply use winsys/radeon.

FWIW, looks good at my place (RS780).

Best regards,
Nicolas Kaiser
Re: [Mesa-dev] [PATCH] x86-64: Fix compile error with clang
On Wed, Aug 10, 2011 at 04:05:41PM -0700, Chad Versace wrote:

Remove the 'f' suffix from a float literal.

-   .float 0.0f+1.0
+   .float 1.0

This fixes the following compile error with clang:

error: unexpected token in directive
    .float 0.0f+1.0
    ^

Signed-off-by: Chad Versace c...@chad-versace.us

Reviewed-by: Ben Widawsky b...@bwidawsk.net
[Mesa-dev] [Bug 39588] [bisected] mesa demo xeglgears draw nothing if surface type is a pixmap
https://bugs.freedesktop.org/show_bug.cgi?id=39588

--- Comment #10 from huihui.zh...@intel.com 2011-08-10 19:20:07 PDT ---

(In reply to comment #8)
(In reply to comment #7)
(In reply to comment #6)

Created an attachment (id=49902)
View: https://bugs.freedesktop.org/attachment.cgi?id=49902
Review: https://bugs.freedesktop.org/review?bug=39588&attachment=49902

intel: Set ctx's drawbuffer according to drawables visual

This patch doesn't seem like the right fix, and the commit message is a bit confusing to me (see below). What particularly concerns me is that the change that created the bug was in common code, but the fix is in driver code. Does this leave other drivers broken?

egl_dri2 creates contexts with a doubleBufferConfig when PIXMAP and WINDOW bit is request, so _mesa_init_color sets DrawBuffer[0] to GL_BACK.

I'm confused about this. The application creates a context with a particular fbconfig, and that determines whether the context is single-buffered or double-buffered. What does egl_dri2 have to do with anything?

If a pixmap surface is created egl_dri2 will use a single buffer config, so MakeCurrent has to adjust DrawBuffer[0] to the current drawable.

One other nit with the commit message. The usual way to reference a bug is:

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39588

This makes it easier to search in 'git log' for things to cherry pick to stable branches.

I just tried your patch. ./xeglgears -pixmap works fine

Do the other modes mentioned in comment #0 also work?

but ./xeglgears -pixmap -fullscreen doesn't work. The pixmap fullscreen mode doesn't work on either 7.9 or 7.11.

If fullscreen didn't work with Mesa 7.9, it's not the same bug.

Would you mind helping me look into this issue? Thanks a lot.

Sorry for the late response.
GL_RENDERER = Mesa DRI Intel(R) IGD x86/MMX/SSE2
GL_VERSION  = 1.4 Mesa 7.12-devel (git-4dd3272)

./xeglgears -pixmap                       works
./xeglgears -pixmap-texture               works
./xeglgears -pixmap -fullscreen           doesn't work
./xeglgears -pixmap-texture -fullscreen   works
./xeglgears -pbuffer                      works
./xeglgears -pbuffer-texture              works
./xeglgears -pbuffer -fullscreen          works
./xeglgears -pbuffer-texture -fullscreen  works

Hope that can help you.
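To spell out the mismatch discussed in the comments above: a context created from a double-buffered config defaults to drawing into GL_BACK, while a pixmap surface only has a front buffer. The sketch below models that with two small functions. The GL_FRONT and GL_BACK values are the standard enum values, but the function names and the "adjust at MakeCurrent" policy are only a simplified model of what the attached patch describes, not Mesa code, and whether that policy belongs in the driver or in common code is the question raised in the review.

```python
GL_FRONT = 0x0404
GL_BACK = 0x0405


def default_draw_buffer(config_double_buffered):
    """Default draw buffer a context gets from its config at creation time."""
    return GL_BACK if config_double_buffered else GL_FRONT


def draw_buffer_for_drawable(ctx_draw_buffer, drawable_double_buffered):
    """Hypothetical MakeCurrent-time adjustment.

    If the context still points at GL_BACK but the drawable bound to it
    (e.g. a pixmap) has no back buffer, fall back to GL_FRONT.
    """
    if ctx_draw_buffer == GL_BACK and not drawable_double_buffered:
        return GL_FRONT
    return ctx_draw_buffer


# Context created from a double-buffered config, then bound to a pixmap.
ctx_buffer = default_draw_buffer(config_double_buffered=True)
pixmap_buffer = draw_buffer_for_drawable(ctx_buffer, drawable_double_buffered=False)
window_buffer = draw_buffer_for_drawable(ctx_buffer, drawable_double_buffered=True)
```

Without some adjustment of this kind, rendering aimed at GL_BACK on a single-buffered drawable has nowhere visible to land, which would match the "draws nothing" symptom in the bug title.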