Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering
On Fri, 2011-08-12 at 07:48 +0200, Andreas Fänger wrote:
> OSMesa for gallium would be really helpful. Are there plans to implement
> some sort of antialiasing (GL_POLYGON_SMOOTH, fsaa) in softpipe/llvmpipe?

MLAA works with softpipe and llvmpipe: http://candgsoc.host56.com/

-- 
Cheers,
Sven Arvidsson
http://www.whiz.se
PGP Key ID 760BDD22

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering
These patches don't look too intrusive, so I'm OK with them. I'll apply them, test and push.

I've been meaning to write a new OSMesa interface for gallium for some time now. That would probably be useful to a few people. I'm tempted to redesign the OSMesa API a bit, but it's probably more important to keep it unchanged for the sake of existing apps.

-Brian

On 08/10/2011 03:49 AM, Andreas Fänger wrote:
> Hi Keith,
>
> you are right. The main purpose of this patch is to speed up osmesa
> rendering, as there is no llvmpipe target at the moment. Also, llvmpipe
> is currently missing some important features like aa/fsaa and
> anisotropic filtering, which are available in swrast now. So I need to
> stick with the old rasterizer at the moment, with some improvements.
>
> Andreas
>
> -----Original Message-----
> From: Keith Whitwell [mailto:kei...@vmware.com]
> Sent: Wednesday, 10 August 2011 11:17
> To: Andreas Fänger
> Cc: mesa-dev@lists.freedesktop.org
> Subject: Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering
>
> I'm not sure it makes a lot of sense to be optimizing swrast at this
> stage. Take a look at llvmpipe and perhaps consider improving the
> multithreading already in place in that rasterizer, which is far better
> optimized than swrast already.
>
> Keith
>
> On Wed, 2011-08-10 at 08:07 +, Andreas Fänger wrote:
> > Optional parallel rendering of spans using OpenMP. Initial
> > implementation for aa triangles. A new option for scons is also
> > provided to activate the openmp support (off by default).
Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering
OSMesa for gallium would be really helpful. Are there plans to implement some sort of antialiasing (GL_POLYGON_SMOOTH, fsaa) in softpipe/llvmpipe?

Andreas

-----Original Message-----
From: Brian Paul [mailto:bri...@vmware.com]
Sent: Thursday, 11 August 2011 16:31
To: Andreas Fänger
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering

These patches don't look too intrusive, so I'm OK with them. I'll apply them, test and push.

I've been meaning to write a new OSMesa interface for gallium for some time now. That would probably be useful to a few people. I'm tempted to redesign the OSMesa API a bit, but it's probably more important to keep it unchanged for the sake of existing apps.

-Brian

On 08/10/2011 03:49 AM, Andreas Fänger wrote:
> Hi Keith,
>
> you are right. The main purpose of this patch is to speed up osmesa
> rendering, as there is no llvmpipe target at the moment. Also, llvmpipe
> is currently missing some important features like aa/fsaa and
> anisotropic filtering, which are available in swrast now. So I need to
> stick with the old rasterizer at the moment, with some improvements.
>
> Andreas
>
> -----Original Message-----
> From: Keith Whitwell [mailto:kei...@vmware.com]
> Sent: Wednesday, 10 August 2011 11:17
> To: Andreas Fänger
> Cc: mesa-dev@lists.freedesktop.org
> Subject: Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering
>
> I'm not sure it makes a lot of sense to be optimizing swrast at this
> stage. Take a look at llvmpipe and perhaps consider improving the
> multithreading already in place in that rasterizer, which is far better
> optimized than swrast already.
>
> Keith
>
> On Wed, 2011-08-10 at 08:07 +, Andreas Fänger wrote:
> > Optional parallel rendering of spans using OpenMP. Initial
> > implementation for aa triangles. A new option for scons is also
> > provided to activate the openmp support (off by default).
[Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering
This patch makes it possible to render the spans of a triangle in parallel. To make as few changes to the codebase as possible, OpenMP was chosen to implement the actual multithreading. The patch is meant to speed up osmesa rendering.

Andreas Fänger (1):
  swrast: initial multi-threaded span rendering

 common.py                      |    1 +
 scons/gallium.py               |   12 +++
 src/mesa/swrast/s_aatritemp.h  |   68 ++-
 src/mesa/swrast/s_context.c    |   26 ---
 src/mesa/swrast/s_texcombine.c |    4 ++
 src/mesa/tnl/t_pipeline.c      |   12 +++
 6 files changed, 87 insertions(+), 36 deletions(-)

-- 
1.7.4.msysgit.0
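The idea in this cover letter, rendering each span (row) independently on an OpenMP thread, can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: `scratch`, `framebuffer`, `render_rows`, `covered_pixels`, `NUM_THREADS` and the 16x8 size are all hypothetical names and values. Each thread picks its own scratch buffer via `omp_get_thread_num()`, which mirrors the per-thread span arrays the patch describes. Without OpenMP the code compiles to the plain serial loop.

```c
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define WIDTH 16
#define HEIGHT 8
#define NUM_THREADS 4   /* hypothetical fixed team size for this sketch */

/* Hypothetical per-thread scratch storage: one row's worth per thread. */
static unsigned char scratch[NUM_THREADS][WIDTH];
static unsigned char framebuffer[HEIGHT][WIDTH];

/* Render one span per row; row y covers columns [y, WIDTH). */
static void render_rows(void)
{
   int y;
#ifdef _OPENMP
#pragma omp parallel for schedule(dynamic) private(y) num_threads(NUM_THREADS)
#endif
   for (y = 0; y < HEIGHT; y++) {
      int tid = 0;   /* serial build: always use buffer 0 */
      int x;
      unsigned char *row;
#ifdef _OPENMP
      /* each thread works in its own scratch buffer */
      tid = omp_get_thread_num();
#endif
      row = scratch[tid];
      memset(row, 0, WIDTH);
      for (x = y; x < WIDTH; x++)
         row[x] = 255;
      /* rows are disjoint, so writing the result needs no locking */
      memcpy(framebuffer[y], row, WIDTH);
   }
}

/* Count the pixels that were covered by a span. */
static int covered_pixels(void)
{
   int y, x, n = 0;
   for (y = 0; y < HEIGHT; y++)
      for (x = 0; x < WIDTH; x++)
         if (framebuffer[y][x] == 255)
            n++;
   return n;
}
```

Calling `render_rows()` and then `covered_pixels()` should give the same count with and without `-fopenmp`, since no row depends on another. `schedule(dynamic)` is a plausible choice here because the spans have different lengths, so rows are handed out as threads become free.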
[Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering
Optional parallel rendering of spans using OpenMP.
Initial implementation for aa triangles. A new option for scons is
also provided to activate the openmp support (off by default).
---
 common.py                      |    1 +
 scons/gallium.py               |   12 +++
 src/mesa/swrast/s_aatritemp.h  |   68 ++-
 src/mesa/swrast/s_context.c    |   26 ---
 src/mesa/swrast/s_texcombine.c |    4 ++
 src/mesa/tnl/t_pipeline.c      |   12 +++
 6 files changed, 87 insertions(+), 36 deletions(-)

diff --git a/common.py b/common.py
index 8657030..cfee1b5 100644
--- a/common.py
+++ b/common.py
@@ -88,6 +88,7 @@ def AddOptions(opts):
     opts.Add('toolchain', 'compiler toolchain', default_toolchain)
     opts.Add(BoolOption('gles', 'EXPERIMENTAL: enable OpenGL ES support', 'no'))
     opts.Add(BoolOption('llvm', 'use LLVM', default_llvm))
+    opts.Add(BoolOption('openmp', 'EXPERIMENTAL: compile with openmp (swrast)', 'no'))
     opts.Add(BoolOption('debug', 'DEPRECATED: debug build', 'yes'))
     opts.Add(BoolOption('profile', 'DEPRECATED: profile build', 'no'))
     opts.Add(BoolOption('quiet', 'DEPRECATED: profile build', 'yes'))
diff --git a/scons/gallium.py b/scons/gallium.py
index 8cd3bc7..7135251 100755
--- a/scons/gallium.py
+++ b/scons/gallium.py
@@ -596,6 +596,18 @@ def generate(env):
         libs += ['m', 'pthread', 'dl']
     env.Append(LIBS = libs)
 
+    # OpenMP
+    if env['openmp']:
+        if env['msvc']:
+            env.Append(CCFLAGS = ['/openmp'])
+            # When building openmp release VS2008 link.exe crashes with LNK1103 error.
+            # Workaround: overwrite PDB flags with empty value as it isn't required anyways
+            if env['build'] == 'release':
+                env['PDB'] = ''
+        if env['gcc']:
+            env.Append(CCFLAGS = ['-fopenmp'])
+            env.Append(LIBS = ['gomp'])
+
     # Load tools
     env.Tool('lex')
     env.Tool('yacc')
diff --git a/src/mesa/swrast/s_aatritemp.h b/src/mesa/swrast/s_aatritemp.h
index 91d4f7a..005d12c 100644
--- a/src/mesa/swrast/s_aatritemp.h
+++ b/src/mesa/swrast/s_aatritemp.h
@@ -181,13 +181,18 @@
       const GLfloat *pMax = vMax->attrib[FRAG_ATTRIB_WPOS];
       const GLfloat dxdy = majDx / majDy;
       const GLfloat xAdj = dxdy < 0.0F ? -dxdy : 0.0F;
-      GLfloat x = pMin[0] - (yMin - iyMin) * dxdy;
       GLint iy;
-      for (iy = iyMin; iy < iyMax; iy++, x += dxdy) {
+      #pragma omp parallel for schedule(dynamic) private(iy) firstprivate(span)
+      for (iy = iyMin; iy < iyMax; iy++) {
+         GLfloat x = pMin[0] - (yMin - iy) * dxdy;
          GLint ix, startX = (GLint) (x - xAdj);
          GLuint count;
          GLfloat coverage = 0.0F;
 
+#ifdef _OPENMP
+         /* each thread needs to use a different (global) SpanArrays variable */
+         span.array = SWRAST_CONTEXT(ctx)->SpanArrays + omp_get_thread_num();
+#endif
          /* skip over fragments with zero coverage */
          while (startX < MAX_WIDTH) {
             coverage = compute_coveragef(pMin, pMid, pMax, startX, iy);
@@ -228,13 +233,12 @@
             coverage = compute_coveragef(pMin, pMid, pMax, ix, iy);
          }
 
-         if (ix <= startX)
-            continue;
-
-         span.x = startX;
-         span.y = iy;
-         span.end = (GLuint) ix - (GLuint) startX;
-         _swrast_write_rgba_span(ctx, &span);
+         if (ix > startX) {
+            span.x = startX;
+            span.y = iy;
+            span.end = (GLuint) ix - (GLuint) startX;
+            _swrast_write_rgba_span(ctx, &span);
+         }
       }
    }
    else {
@@ -244,13 +248,18 @@
       const GLfloat *pMax = vMax->attrib[FRAG_ATTRIB_WPOS];
       const GLfloat dxdy = majDx / majDy;
       const GLfloat xAdj = dxdy > 0 ? dxdy : 0.0F;
-      GLfloat x = pMin[0] - (yMin - iyMin) * dxdy;
       GLint iy;
-      for (iy = iyMin; iy < iyMax; iy++, x += dxdy) {
+      #pragma omp parallel for schedule(dynamic) private(iy) firstprivate(span)
+      for (iy = iyMin; iy < iyMax; iy++) {
+         GLfloat x = pMin[0] - (yMin - iy) * dxdy;
          GLint ix, left, startX = (GLint) (x + xAdj);
          GLuint count, n;
          GLfloat coverage = 0.0F;
 
+#ifdef _OPENMP
+         /* each thread needs to use a different (global) SpanArrays variable */
+         span.array = SWRAST_CONTEXT(ctx)->SpanArrays + omp_get_thread_num();
+#endif
          /* make sure we're not past the window edge */
          if (startX >= ctx->DrawBuffer->_Xmax) {
             startX = ctx->DrawBuffer->_Xmax - 1;
@@ -296,31 +305,30 @@
             ATTRIB_LOOP_END
 #endif
 
-         if (startX <= ix)
-            continue;
+         if (startX > ix) {
+            n = (GLuint) startX - (GLuint) ix;
 
-         n = (GLuint) startX - (GLuint) ix;
+            left = ix + 1;
 
-         left = ix + 1;
-
-         /* shift all values to the left */
-         /* XXX this is temporary */
-         {
-
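The hunks above drop the running `x += dxdy` update and instead compute `x` from `iy` inside the loop body, so that no iteration depends on the one before it. A minimal sketch of why the two forms give the same start coordinate follows. The helper names `start_x_incremental`, `start_x_closed` and `max_abs_difference` are hypothetical and not part of Mesa, and plain `float` stands in for `GLfloat`.

```c
/* Incremental form (before the patch): the value for row iy is reached
 * by stepping through every earlier row, so rows must run in order. */
static float start_x_incremental(float x0, float yMin, int iyMin, int iy,
                                 float dxdy)
{
   float x = x0 - (yMin - (float) iyMin) * dxdy;
   int i;
   for (i = iyMin; i < iy; i++)
      x += dxdy;
   return x;
}

/* Closed form (after the patch): depends only on iy, so rows can be
 * evaluated in any order, e.g. by different threads. */
static float start_x_closed(float x0, float yMin, int iy, float dxdy)
{
   return x0 - (yMin - (float) iy) * dxdy;
}

/* Largest absolute difference between both forms over [iyMin, iyMax). */
static float max_abs_difference(float x0, float yMin, int iyMin, int iyMax,
                                float dxdy)
{
   float worst = 0.0f;
   int iy;
   for (iy = iyMin; iy < iyMax; iy++) {
      float d = start_x_incremental(x0, yMin, iyMin, iy, dxdy)
              - start_x_closed(x0, yMin, iy, dxdy);
      if (d < 0.0f)
         d = -d;
      if (d > worst)
         worst = d;
   }
   return worst;
}
```

For general slopes the two forms can differ by floating-point rounding, because the incremental version accumulates error row by row while the closed form does not. That is an observation about the sketch, not a claim about measured behaviour of the patch.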
Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering
I'm not sure it makes a lot of sense to be optimizing swrast at this stage. Take a look at llvmpipe and perhaps consider improving the multithreading already in place in that rasterizer, which is far better optimized than swrast already.

Keith

On Wed, 2011-08-10 at 08:07 +, Andreas Fänger wrote:
> Optional parallel rendering of spans using OpenMP. Initial
> implementation for aa triangles. A new option for scons is also
> provided to activate the openmp support (off by default).
Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering
Hi Keith,

you are right. The main purpose of this patch is to speed up osmesa rendering, as there is no llvmpipe target at the moment. Also, llvmpipe is currently missing some important features like aa/fsaa and anisotropic filtering, which are available in swrast now. So I need to stick with the old rasterizer at the moment, with some improvements.

Andreas

-----Original Message-----
From: Keith Whitwell [mailto:kei...@vmware.com]
Sent: Wednesday, 10 August 2011 11:17
To: Andreas Fänger
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering

I'm not sure it makes a lot of sense to be optimizing swrast at this stage. Take a look at llvmpipe and perhaps consider improving the multithreading already in place in that rasterizer, which is far better optimized than swrast already.

Keith

On Wed, 2011-08-10 at 08:07 +, Andreas Fänger wrote:
> Optional parallel rendering of spans using OpenMP. Initial
> implementation for aa triangles. A new option for scons is also
> provided to activate the openmp support (off by default).