Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering

2011-08-12 Thread Sven Arvidsson
On Fri, 2011-08-12 at 07:48 +0200, Andreas Fänger wrote:
 OSMesa for gallium would be really helpful. Are there plans to
 implement some sort of antialiasing (GL_POLYGON_SMOOTH, fsaa) in
 softpipe/llvmpipe?
 
MLAA works with softpipe and llvmpipe:
http://candgsoc.host56.com/

-- 
Cheers,
Sven Arvidsson
http://www.whiz.se
PGP Key ID 760BDD22

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering

2011-08-11 Thread Brian Paul
These patches don't look too intrusive so I'm OK with them.  I'll 
apply them, test and push.


I've been meaning to write a new OSMesa interface for gallium for some 
time now.  That would probably be useful to a few people.


I'm tempted to redesign the OSMesa API a bit, but it's probably more 
important to keep it unchanged for the sake of existing apps.


-Brian


On 08/10/2011 03:49 AM, Andreas Fänger wrote:

Hi Keith,

you are right. The main purpose of this patch is to speed up osmesa rendering, as 
there is no llvmpipe target at the moment. Also, llvmpipe is currently missing 
some important features like aa/fsaa and anisotropic filtering, which are 
available in swrast now.
So I need to stick with the old rasterizer at the moment, with some 
improvements.

Andreas

-----Original Message-----
From: Keith Whitwell [mailto:kei...@vmware.com]
Sent: Wednesday, 10 August 2011 11:17
To: Andreas Fänger
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering

I'm not sure it makes a lot of sense to be optimizing swrast at this
stage.  Take a look at llvmpipe and perhaps consider improving the
multithreading already in place in that rasterizer, which is far better
optimized than swrast already.

Keith


Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering

2011-08-11 Thread Andreas Fänger
OSMesa for gallium would be really helpful. Are there plans to implement some 
sort of antialiasing (GL_POLYGON_SMOOTH, fsaa) in softpipe/llvmpipe?

Andreas

-----Original Message-----
From: Brian Paul [mailto:bri...@vmware.com]
Sent: Thursday, 11 August 2011 16:31
To: Andreas Fänger
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering

These patches don't look too intrusive so I'm OK with them.  I'll 
apply them, test and push.

I've been meaning to write a new OSMesa interface for gallium for some 
time now.  That would probably be useful to a few people.

I'm tempted to redesign the OSMesa API a bit, but it's probably more 
important to keep it unchanged for the sake of existing apps.

-Brian


On 08/10/2011 03:49 AM, Andreas Fänger wrote:
 Hi Keith,

 you are right. The main purpose of this patch is to speed up osmesa rendering, 
 as there is no llvmpipe target at the moment. Also, llvmpipe is currently 
 missing some important features like aa/fsaa and anisotropic filtering, which 
 are available in swrast now.
 So I need to stick with the old rasterizer at the moment, with some 
 improvements.

 Andreas

 -----Original Message-----
 From: Keith Whitwell [mailto:kei...@vmware.com]
 Sent: Wednesday, 10 August 2011 11:17
 To: Andreas Fänger
 Cc: mesa-dev@lists.freedesktop.org
 Subject: Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering

 I'm not sure it makes a lot of sense to be optimizing swrast at this
 stage.  Take a look at llvmpipe and perhaps consider improving the
 multithreading already in place in that rasterizer, which is far better
 optimized than swrast already.

 Keith


[Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering

2011-08-10 Thread Andreas Fänger
This patch makes it possible to render the spans of a triangle in parallel. To
keep changes to the codebase as small as possible, OpenMP was chosen to
implement the actual multithreading. The patch is meant to speed up osmesa
rendering.

Andreas Fänger (1):
  swrast: initial multi-threaded span rendering

 common.py  |1 +
 scons/gallium.py   |   12 +++
 src/mesa/swrast/s_aatritemp.h  |   68 ++-
 src/mesa/swrast/s_context.c|   26 ---
 src/mesa/swrast/s_texcombine.c |4 ++
 src/mesa/tnl/t_pipeline.c  |   12 +++
 6 files changed, 87 insertions(+), 36 deletions(-)

-- 
1.7.4.msysgit.0



[Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering

2011-08-10 Thread Andreas Fänger
Optional parallel rendering of spans using OpenMP.
Initial implementation for aa triangles. A new option for scons is
also provided to activate the openmp support (off by default).
---
 common.py  |1 +
 scons/gallium.py   |   12 +++
 src/mesa/swrast/s_aatritemp.h  |   68 ++-
 src/mesa/swrast/s_context.c|   26 ---
 src/mesa/swrast/s_texcombine.c |4 ++
 src/mesa/tnl/t_pipeline.c  |   12 +++
 6 files changed, 87 insertions(+), 36 deletions(-)

diff --git a/common.py b/common.py
index 8657030..cfee1b5 100644
--- a/common.py
+++ b/common.py
@@ -88,6 +88,7 @@ def AddOptions(opts):
     opts.Add('toolchain', 'compiler toolchain', default_toolchain)
     opts.Add(BoolOption('gles', 'EXPERIMENTAL: enable OpenGL ES support', 'no'))
     opts.Add(BoolOption('llvm', 'use LLVM', default_llvm))
+    opts.Add(BoolOption('openmp', 'EXPERIMENTAL: compile with openmp (swrast)', 'no'))
     opts.Add(BoolOption('debug', 'DEPRECATED: debug build', 'yes'))
     opts.Add(BoolOption('profile', 'DEPRECATED: profile build', 'no'))
     opts.Add(BoolOption('quiet', 'DEPRECATED: profile build', 'yes'))
diff --git a/scons/gallium.py b/scons/gallium.py
index 8cd3bc7..7135251 100755
--- a/scons/gallium.py
+++ b/scons/gallium.py
@@ -596,6 +596,18 @@ def generate(env):
         libs += ['m', 'pthread', 'dl']
     env.Append(LIBS = libs)
 
+    # OpenMP
+    if env['openmp']:
+        if env['msvc']:
+            env.Append(CCFLAGS = ['/openmp'])
+            # When building openmp release VS2008 link.exe crashes with LNK1103 error.
+            # Workaround: overwrite PDB flags with empty value as it isn't required anyways
+            if env['build'] == 'release':
+                env['PDB'] = ''
+        if env['gcc']:
+            env.Append(CCFLAGS = ['-fopenmp'])
+            env.Append(LIBS = ['gomp'])
+
     # Load tools
     env.Tool('lex')
     env.Tool('yacc')
diff --git a/src/mesa/swrast/s_aatritemp.h b/src/mesa/swrast/s_aatritemp.h
index 91d4f7a..005d12c 100644
--- a/src/mesa/swrast/s_aatritemp.h
+++ b/src/mesa/swrast/s_aatritemp.h
@@ -181,13 +181,18 @@
       const GLfloat *pMax = vMax->attrib[FRAG_ATTRIB_WPOS];
       const GLfloat dxdy = majDx / majDy;
       const GLfloat xAdj = dxdy < 0.0F ? -dxdy : 0.0F;
-      GLfloat x = pMin[0] - (yMin - iyMin) * dxdy;
       GLint iy;
-      for (iy = iyMin; iy < iyMax; iy++, x += dxdy) {
+      #pragma omp parallel for schedule(dynamic) private(iy) firstprivate(span)
+      for (iy = iyMin; iy < iyMax; iy++) {
+         GLfloat x = pMin[0] - (yMin - iy) * dxdy;
          GLint ix, startX = (GLint) (x - xAdj);
          GLuint count;
          GLfloat coverage = 0.0F;
 
+#ifdef _OPENMP
+         /* each thread needs to use a different (global) SpanArrays variable */
+         span.array = SWRAST_CONTEXT(ctx)->SpanArrays + omp_get_thread_num();
+#endif
          /* skip over fragments with zero coverage */
          while (startX < MAX_WIDTH) {
             coverage = compute_coveragef(pMin, pMid, pMax, startX, iy);
@@ -228,13 +233,12 @@
             coverage = compute_coveragef(pMin, pMid, pMax, ix, iy);
          }
 
-         if (ix <= startX)
-            continue;
-
-         span.x = startX;
-         span.y = iy;
-         span.end = (GLuint) ix - (GLuint) startX;
-         _swrast_write_rgba_span(ctx, &span);
+         if (ix > startX) {
+            span.x = startX;
+            span.y = iy;
+            span.end = (GLuint) ix - (GLuint) startX;
+            _swrast_write_rgba_span(ctx, &span);
+         }
       }
    }
    else {
@@ -244,13 +248,18 @@
       const GLfloat *pMax = vMax->attrib[FRAG_ATTRIB_WPOS];
       const GLfloat dxdy = majDx / majDy;
       const GLfloat xAdj = dxdy > 0 ? dxdy : 0.0F;
-      GLfloat x = pMin[0] - (yMin - iyMin) * dxdy;
       GLint iy;
-      for (iy = iyMin; iy < iyMax; iy++, x += dxdy) {
+      #pragma omp parallel for schedule(dynamic) private(iy) firstprivate(span)
+      for (iy = iyMin; iy < iyMax; iy++) {
+         GLfloat x = pMin[0] - (yMin - iy) * dxdy;
          GLint ix, left, startX = (GLint) (x + xAdj);
          GLuint count, n;
          GLfloat coverage = 0.0F;
 
+#ifdef _OPENMP
+         /* each thread needs to use a different (global) SpanArrays variable */
+         span.array = SWRAST_CONTEXT(ctx)->SpanArrays + omp_get_thread_num();
+#endif
          /* make sure we're not past the window edge */
          if (startX >= ctx->DrawBuffer->_Xmax) {
             startX = ctx->DrawBuffer->_Xmax - 1;
@@ -296,31 +305,30 @@
  ATTRIB_LOOP_END
 #endif
 
-         if (startX <= ix)
-            continue;
+         if (startX > ix) {
+            n = (GLuint) startX - (GLuint) ix;
 
-         n = (GLuint) startX - (GLuint) ix;
+            left = ix + 1;
 
-         left = ix + 1;
-
-         /* shift all values to the left */
-         /* XXX this is temporary */
-         {
-  

Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering

2011-08-10 Thread Keith Whitwell
I'm not sure it makes a lot of sense to be optimizing swrast at this
stage.  Take a look at llvmpipe and perhaps consider improving the
multithreading already in place in that rasterizer, which is far better
optimized than swrast already.

Keith


Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering

2011-08-10 Thread Andreas Fänger
Hi Keith,

you are right. The main purpose of this patch is to speed up osmesa rendering, as 
there is no llvmpipe target at the moment. Also, llvmpipe is currently missing 
some important features like aa/fsaa and anisotropic filtering, which are 
available in swrast now.
So I need to stick with the old rasterizer at the moment, with some 
improvements.

Andreas

-----Original Message-----
From: Keith Whitwell [mailto:kei...@vmware.com]
Sent: Wednesday, 10 August 2011 11:17
To: Andreas Fänger
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering

I'm not sure it makes a lot of sense to be optimizing swrast at this
stage.  Take a look at llvmpipe and perhaps consider improving the
multithreading already in place in that rasterizer, which is far better
optimized than swrast already.

Keith
