Re: [Gimp-developer] GIMP and multiple processors

2005-02-28 Thread Jay Cox
On Mon, 2005-02-28 at 00:25 +0100, Sven Neumann wrote:
 Hi,
 
 Jay Cox [EMAIL PROTECTED] writes:
 
  Now that this race condition is eliminated I might look into adding
  hooks to the pixel-processor to allow initialisation of per-thread
  data, like for example a GRand.
 
  I think that is the correct way to do it.  It should be done generally
  enough so that the histogram code can be moved over to use the
  pixel_region_process_parallel functions.
 
 The histogram code does already use the threaded pixel-processor. You
 think we could simplify the code? IMO the current solution isn't all
 that bad. But I haven't benchmarked it so I don't really know...

Of course you are correct.  I guess it has been a while since I looked
at that code...  It reads like a clever hack, not an elegant design.

 I tried to introduce the per-thread initialization code today but
 figured that it adds quite some complexity. It could certainly be done
 but I don't see much need for it any longer.
 

I think per-thread initialization and finalization would make the
histogram code much cleaner (and slightly faster and more scalable).
It should also let us more easily parallelize a wider selection of
algorithms (not that I can think of any off the top of my head...).
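For illustration, per-thread initialization and finalization could look roughly like the sketch below. This is plain pthreads rather than GIMP's actual pixel-processor API, and every name in it is invented: each worker builds private state in an init callback, processes its share without any locking, and merges once in a finalize callback, which is exactly the shape the histogram code wants.

```c
#include <pthread.h>
#include <stdlib.h>

#define N_THREADS 4

typedef void *(*ThreadInitFunc)(void);                   /* builds per-thread ctx   */
typedef void (*ProcessFunc)(void *ctx, const unsigned char *px, int n);
typedef void (*ThreadFinalizeFunc)(void *ctx);           /* merges and frees ctx    */

typedef struct {
    const unsigned char *data;
    int start, len;
    ThreadInitFunc init;
    ProcessFunc process;
    ThreadFinalizeFunc finalize;
} Task;

static void *worker(void *arg)
{
    Task *t = arg;
    void *ctx = t->init();                 /* private state, so no locks in the loop */
    t->process(ctx, t->data + t->start, t->len);
    t->finalize(ctx);                      /* merge results once, at the very end    */
    return NULL;
}

void process_parallel(const unsigned char *data, int len,
                      ThreadInitFunc init, ProcessFunc process,
                      ThreadFinalizeFunc finalize)
{
    pthread_t threads[N_THREADS];
    Task tasks[N_THREADS];
    int chunk = len / N_THREADS;

    for (int i = 0; i < N_THREADS; i++) {
        tasks[i].data = data;
        tasks[i].start = i * chunk;
        tasks[i].len = (i == N_THREADS - 1) ? len - i * chunk : chunk;
        tasks[i].init = init;
        tasks[i].process = process;
        tasks[i].finalize = finalize;
        pthread_create(&threads[i], NULL, worker, &tasks[i]);
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(threads[i], NULL);
}

/* Example client: a 256-bin histogram accumulated per thread. */
static long histogram[256];
static pthread_mutex_t merge_lock = PTHREAD_MUTEX_INITIALIZER;

static void *hist_init(void) { return calloc(256, sizeof(long)); }

static void hist_process(void *ctx, const unsigned char *px, int n)
{
    long *bins = ctx;
    for (int i = 0; i < n; i++)
        bins[px[i]]++;
}

static void hist_finalize(void *ctx)
{
    long *bins = ctx;
    pthread_mutex_lock(&merge_lock);       /* one lock per thread, not per pixel */
    for (int i = 0; i < 256; i++)
        histogram[i] += bins[i];
    pthread_mutex_unlock(&merge_lock);
    free(bins);
}
```

The point of the shape is that contention happens N_THREADS times per image instead of once per pixel.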

Any thoughts on moving some of the _process_parallel stuff to libgimp? I
think it would be cool if we could integrate it with the preview code.

Jay Cox
[EMAIL PROTECTED]

PS: Shouldn't we be ignoring all this stuff and working on gegl? ;)

___
Gimp-developer mailing list
Gimp-developer@lists.xcf.berkeley.edu
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer


Re: [Gimp-developer] GIMP and multiple processors

2005-02-28 Thread Sven Neumann
Hi,

Daniel Egger [EMAIL PROTECTED] writes:

 The first goal should be to reduce complexity. Today I looked
 into parallelizing some *really* slow code: supersampling.
 Activating it slows down the processing *much* more than just
 the factor of the supersampling depth. And hey, the code is
 ugly, at least a top candidate for the ugliest code left in
 The GIMP.

Since supersampling is in libgimpcolor, it will probably be difficult
to improve without breaking backwards compatibility. But of course
no one forces us to actually use the functionality in libgimpcolor...


Sven


Re: [Gimp-developer] GIMP and multiple processors

2005-02-28 Thread Daniel Egger
On 28.02.2005, at 21:15, Sven Neumann wrote:
Since supersampling is in libgimpcolor, it will probably be difficult
to improve without breaking backwards compatibility. But of course
noone forces us to actually use the functionality in libgimpcolor...
But no one forces us to use it, indeed.
But, and this is one of many upsides of improving library code,
all users of supersampling will benefit at the same time.
Maybe it would be best if someone could come up with a reasonable
example what supersampling might be useful for in gradient blend
code except for slowing down everything...
Servus,
  Daniel




Re: [Gimp-developer] GIMP and multiple processors

2005-02-28 Thread Sven Neumann
Hi,

Jay Cox [EMAIL PROTECTED] writes:

   The dither code is way too complex.  It looks like it should boil
 down to: color.{r,g,b,a} += g_rand_int()/RAND_MAX.

   We shouldn't need 32 bits of random data per component.  8 bits
 should do, so we only need one call to g_rand_int per pixel.

Good point. This made a lot of difference. The dithered case is now
only about 10% slower than the non-dithered one :)


Sven


Re: [Gimp-developer] GIMP and multiple processors

2005-02-28 Thread Adam D. Moss
Daniel Egger wrote:
Maybe it would be best if someone could come up with a reasonable
example what supersampling might be useful for in gradient blend
code except for slowing down everything...
For an A-B linear blend it probably doesn't noticeably help
much (unless, possibly, it's a long colour range in a small
area).  But GIMP gradients *can* be fairly arbitrary user-defined
1D designs including near-discontinuities, which benefit from
supersampling in the same way that a stroke, line or polygon
benefits from antialiasing.
--Adam
--
Adam D. Moss   -   [EMAIL PROTECTED]


Re: [Gimp-developer] GIMP and multiple processors

2005-02-27 Thread Daniel Egger
On 27.02.2005, at 01:04, Sven Neumann wrote:
I have eliminated last_visited from the gradient struct. Instead the
caller of gimp_gradient_get_color_at() may now do the same
optimization without any caching in the gradient itself. I very much
doubt that this makes any difference though. Perhaps if you would
benchmark a gradient blend with a lot of segments. But in the general
case it's just a few of them, very often even only a single one.

Now that this race condition is eliminated I might look into adding
hooks to the pixel-processor to allow initialisation of per-thread
data, like for example a GRand.
Your change dramatically changed the picture.
Instead of:
Linear Gradient blend on a 3000x3000 pixel image (Dithering on)
1 processor:7.5 s
2 processors:   9.6 s
4 processors:   9.6 s
Linear Gradient blend on a 3000x3000 pixel image (Dithering OFF)
1 processor:3.0 s
2 processors:   2.5 s
4 processors:   2.5 s
I now get:
Linear Gradient blend on a 3000x3000 pixel image (Dithering on)
1 processor:7.2 s
2 processors:   8.9 s
Linear Gradient blend on a 3000x3000 pixel image (Dithering OFF)
1 processor:2.5 s
2 processors:   1.5 s
However dithering on is still losing quite a bit on this SMP
machine
Servus,
  Daniel




Re: [Gimp-developer] GIMP and multiple processors

2005-02-27 Thread Sven Neumann
Hi,

Daniel Egger [EMAIL PROTECTED] writes:

 Your change dramatically changed the picture.

That surprises me, but there is obviously a lot I have to learn about
threads. Thanks for testing my changes.

 However dithering on is still losing quite a bit on this SMP
 machine

Dithering makes heavy use of GRand and as long as the random number
generator is shared between the threads... I wonder if it actually
makes sense to add the overhead of per-thread data initialization or
if we should rather replace the use of a random number generator with
a pre-calculated dither pattern. That would improve the
single-threaded case as well.
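A pre-calculated pattern could be as simple as a random table that is filled once at startup and only read afterwards, so threads never contend over RNG state and the per-pixel RNG call disappears entirely. A rough sketch with illustrative names, plain C rand() standing in for GRand:

```c
#include <stdlib.h>

#define DITHER_SIZE 256   /* the table tiles across the image */

static unsigned char dither_table[DITHER_SIZE * DITHER_SIZE];

/* Fill once at startup; afterwards every thread only reads the table,
   so no locking and no per-pixel RNG calls are needed. */
void dither_table_init(unsigned int seed)
{
    srand(seed);
    for (int i = 0; i < DITHER_SIZE * DITHER_SIZE; i++)
        dither_table[i] = (unsigned char) (rand() & 0xff);
}

static unsigned char dither_offset(int x, int y)
{
    return dither_table[(y % DITHER_SIZE) * DITHER_SIZE + (x % DITHER_SIZE)];
}

/* Quantize an 8.8 fixed-point channel value with dithering. */
unsigned char dither_channel(int value_fixed, int x, int y)
{
    int v = (value_fixed + dither_offset(x, y)) >> 8;
    return v > 255 ? 255 : (unsigned char) v;
}
```

The trade-off versus a true RNG is that the pattern repeats with the table period, which may or may not be visible for gradient dithering.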


Sven


Re: [Gimp-developer] GIMP and multiple processors

2005-02-27 Thread Daniel Egger
On 27.02.2005, at 14:31, Sven Neumann wrote:
Dithering makes heavy use of GRand and as long as the random number
generator is shared between the threads... I wonder if it actually
makes sense to add the overhead of per-thread data initialization or
if we should rather replace the use of a random number generator with
a pre-calculated dither pattern. That would improve the
single-threaded case as well.
Since the randomness doesn't play a big role here (unlike in a
security environment) I wonder if it wouldn't be easiest to
employ a per-thread pseudo-RNG seeded with a different
random number. One global RNG would be nice for this...
BTW: How often is gradient_fill_single_region_rgb being called?
Maybe we can simply make the RNG a local property of this and
the ..._gray function and be done with it; that of course
wouldn't make much sense if it ends up being called a million
times per fill...
Servus,
  Daniel




Re: [Gimp-developer] GIMP and multiple processors

2005-02-27 Thread Sven Neumann
Hi,

Daniel Egger [EMAIL PROTECTED] writes:

 Since the randomness doesn't play a big role here (unlike in a
 security environment) I wonder if it wouldn't be easiest to
 employ a per-thread pseudo-RNG seeded with a different
 random number. One global RNG would be nice for this...

GRand is such a pseudo-RNG.

 BTW: How often is gradient_fill_single_region_rgb being called?
 Maybe we can simply make the RNG a local property of this and
 the ..._gray function and be done with it; that of course
 wouldn't make much sense if it ends up being called a million
 times per fill...

It is called once per tile. Your approach probably makes sense as long
as we don't use g_rand_new() but g_rand_new_with_seed().  g_rand_new()
initializes the random number generator from /dev/urandom which is
probably too expensive to be done once per tile.
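In plain C the pattern looks roughly like this. All names are illustrative, and a tiny xorshift generator stands in for GRand: g_rand_new() reads /dev/urandom, which is too expensive once per tile, so each tile instead gets a cheap PRNG seeded from one global generator, the moral equivalent of g_rand_new_with_seed().

```c
/* Minimal xorshift PRNG standing in for GRand (state must be nonzero). */
static unsigned int xorshift(unsigned int *state)
{
    unsigned int x = *state;
    x ^= x << 13; x ^= x >> 17; x ^= x << 5;
    return *state = x;
}

static unsigned int global_rng = 2463534242u;   /* seeded once at startup */

static unsigned int next_tile_seed(void)
{
    /* In GIMP this would be a g_rand_int() call on a global GRand;
       tiles are already handed out one at a time, so this is cheap. */
    return xorshift(&global_rng);
}

/* Per-tile processing: the PRNG state lives on the stack, so threads
   never contend over shared RNG state. */
void process_tile_with_dither(unsigned char *pixels, int n_pixels)
{
    unsigned int rng = next_tile_seed();        /* ~ g_rand_new_with_seed() */

    for (int i = 0; i < n_pixels; i++) {
        int noise = xorshift(&rng) & 0xff;      /* 8 bits of dither noise */
        int v = pixels[i] + (noise >> 7);       /* toy +0/+1 dither */
        pixels[i] = v > 255 ? 255 : (unsigned char) v;
    }
}
```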


Sven


Re: [Gimp-developer] GIMP and multiple processors

2005-02-27 Thread Daniel Egger
On 27.02.2005, at 15:19, Sven Neumann wrote:
It is called once per tile. Your approach probably makes sense as long
as we don't use g_rand_new() but g_rand_new_with_seed().  g_rand_new()
initializes the random number generator from /dev/urandom which is
probably too expensive to be done once per tile.
Incidentally this is exactly what I'm testing right now. ;=)
Servus,
  Daniel




Re: [Gimp-developer] GIMP and multiple processors

2005-02-27 Thread Daniel Egger
On 27.02.2005, at 15:19, Sven Neumann wrote:
It is called once per tile. Your approach probably makes sense as long
as we don't use g_rand_new() but g_rand_new_with_seed().  g_rand_new()
initializes the random number generator from /dev/urandom which is
probably too expensive to be done once per tile.
Not too bad, here are the results (Patch in different mail):
Before:
Linear Gradient blend on a 3000x3000 pixel image (Dithering on)
1 processor:7.2 s
2 processors:   8.9 s
Linear Gradient blend on a 3000x3000 pixel image (Dithering OFF)
1 processor:2.5 s
2 processors:   1.5 s
After
Linear Gradient blend on a 3000x3000 pixel image (Dithering on)
1 processor:7.0 s
2 processors:   3.6 s
Linear Gradient blend on a 3000x3000 pixel image (Dithering OFF)
1 processor:2.6 s
2 processors:   1.4 s
The degradation in the 1 processor setting without dithering
seems to be a small measurement fluke in my last round since
I consistently get better times in all other cases. Apologies
for that.
I need to think about a larger test image since it's starting
to get too difficult to time this microbench...
Servus,
  Daniel




Re: [Gimp-developer] GIMP and multiple processors

2005-02-27 Thread Daniel Egger
On 27.02.2005, at 17:24, Sven Neumann wrote:
Incidentally this is exactly what I'm testing right now. ;=)

Incidentally that is what the code in CVS is doing. Looks like we were
working on the same code. We should perhaps start using mutexes on the
source code ;)
Heh, you did not only do the same but you've chosen the
exact same approach and function names...
At least I have the performance figures you don't ;-p
Servus,
  Daniel




Re: [Gimp-developer] GIMP and multiple processors

2005-02-27 Thread Daniel Egger
On 27.02.2005, at 17:19, Marc A. Lehmann [EMAIL PROTECTED] wrote:
As for your claim that dithering is completely invisible, try this image, 
which
is done with gimp's blend and no dithering:

http://data.plan9.de/d0.png

That image features quite visible banding effects - you will have 
banding
effects even if you go to 48 bit rgb, as the eye is very sensitive to
banding.

which has no banding effects, and is also much much bigger :)
I had to compare very hard to see a difference at all,
at least it's not clearly visible and you've chosen a
pretty extreme case... But okay, I now believe in the
power of dithering gradients ;)
Servus,
  Daniel




Re: [Gimp-developer] GIMP and multiple processors

2005-02-27 Thread Jay Cox
On Sun, 2005-02-27 at 01:04 +0100, Sven Neumann wrote:
 Hi again,
 
 Jay Cox [EMAIL PROTECTED] writes:
 
  Clearly the gradient code could use some tuning.  A linear blend
  shouldn't take much more than 1/2 a second even with dithering.
 
 Could we improve this by preparing a reasonably fine array
 of samples and picking from these samples instead of calling
 gimp_gradient_get_color_at() over and over again?

That is one obvious optimization.

Other obvious optimizations:

  The dither code is way too complex.  It looks like it should boil down
to: color.{r,g,b,a} += g_rand_int()/RAND_MAX.

  We shouldn't need 32 bits of random data per component.  8 bits should
do, so we only need one call to g_rand_int per pixel.

  For linear blends we should calculate delta values for the factor
variable.  This will allow us to calculate the factor for each pixel
with only one add (plus edge condition checks).
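A sketch of both ideas together, with invented names and a toy black-to-white blend in place of a real gradient lookup: the factor advances by one addition per pixel, and a single 32-bit random word per pixel is split into 8-bit dither offsets for the components.

```c
#include <stdint.h>

static uint8_t clamp8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t) v);
}

/* Deterministic stand-in for g_rand_int(), so the sketch is self-contained. */
static uint32_t xorshift_state = 2463534242u;
uint32_t xorshift32(void)
{
    uint32_t x = xorshift_state;
    x ^= x << 13; x ^= x >> 17; x ^= x << 5;
    return xorshift_state = x;
}

/* Fill one RGBA row of a linear black-to-white blend.  `factor` walks
   from factor_start in steps of factor_delta (one add per pixel), and
   one 32-bit random word per pixel supplies three 8-bit dither offsets. */
void blend_row(uint8_t *rgba, int width,
               double factor_start, double factor_delta,
               uint32_t (*rand_fn)(void))
{
    double factor = factor_start;

    for (int x = 0; x < width; x++, factor += factor_delta) {
        double f = factor < 0.0 ? 0.0 : (factor > 1.0 ? 1.0 : factor);
        int v = (int) (f * 65535.0);          /* value in 8.8 fixed point */
        uint32_t bits = rand_fn();            /* one RNG call per pixel   */

        /* each component gets its own 8 dither bits before quantizing */
        rgba[4*x + 0] = clamp8((v + (int)( bits        & 0xff)) >> 8);
        rgba[4*x + 1] = clamp8((v + (int)((bits >>  8) & 0xff)) >> 8);
        rgba[4*x + 2] = clamp8((v + (int)((bits >> 16) & 0xff)) >> 8);
        rgba[4*x + 3] = 255;                  /* opaque, for the sketch */
    }
}
```

In the real code the black-to-white ramp would be replaced by sampling the gradient at `f`; the per-pixel cost of the factor itself then really is one add plus the edge clamp.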

  The reason why the dithering case gets less of a speedup is because
  the threads are fighting over the GRand state.  Each thread needs to
  have its own GRand state.
 
  It looks like the threads are also fighting over
  gradient->last_visited.  My guess is that fixing this will get us
  much closer to the ideal 2x speed up.
 
 I have eliminated last_visited from the gradient struct. Instead the
 caller of gimp_gradient_get_color_at() may now do the same
 optimization without any caching in the gradient itself. I very much
 doubt that this makes any difference though. Perhaps if you would
 benchmark a gradient blend with a lot of segments. But in the general
 case it's just a few of them, very often even only a single one.
 
 Now that this race condition is eliminated I might look into adding
 hooks to the pixel-processor to allow initialisation of per-thread
 data, like for example a GRand.

I think that is the correct way to do it.  It should be done generally
enough so that the histogram code can be moved over to use the
pixel_region_process_parallel functions.

Jay Cox
[EMAIL PROTECTED]




Re: [Gimp-developer] GIMP and multiple processors

2005-02-27 Thread Sven Neumann
Hi,

Jay Cox [EMAIL PROTECTED] writes:

 Now that this race condition is eliminated I might look into adding
 hooks to the pixel-processor to allow initialisation of per-thread
 data, like for example a GRand.

 I think that is the correct way to do it.  It should be done generally
 enough so that the histogram code can be moved over to use the
 pixel_region_process_parallel functions.

The histogram code does already use the threaded pixel-processor. You
think we could simplify the code? IMO the current solution isn't all
that bad. But I haven't benchmarked it so I don't really know...

I tried to introduce the per-thread initialization code today but
figured that it adds quite some complexity. It could certainly be done
but I don't see much need for it any longer.


Sven


Re: [Gimp-developer] GIMP and multiple processors

2005-02-27 Thread Daniel Egger
On 28.02.2005, at 00:25, Sven Neumann wrote:
The histogram code does already use the threaded pixel-processor. You
think we could simplify the code? IMO the current solution isn't all
that bad. But I haven't benchmarked it so I don't really know...

I tried to introduce the per-thread initialization code today but
figured that it adds quite some complexity. It could certainly be done
but I don't see much need for it any longer.
The first goal should be to reduce complexity. Today I looked
into parallelizing some *really* slow code: supersampling.
Activating it slows down the processing *much* more than just
the factor of the supersampling depth. And hey, the code is
ugly, at least a top candidate for the ugliest code left in
The GIMP.
NB: Not that I had any idea what supersampling might be good for
in the case of gradients, but what do I know...
Servus,
  Daniel




Re: [Gimp-developer] GIMP and multiple processors

2005-02-26 Thread Sven Neumann
Hi,

Jay Cox [EMAIL PROTECTED] writes:

 Here are some results for you:

 Dual 2.5 GHz G5 Mac, Mac OS X 10.3.8
 CVS gimp Changelog revision 1.10539

 Linear Gradient blend on a 3000x3000 pixel image (Dithering on)
 1 processor:   7.98 seconds   1x
 2 processors:  5.20 seconds   1.5x
 3 processors:  5.23 seconds   1.5x

 Linear Gradient blend on a 3000x3000 pixel image (Dithering OFF)
 1 processor:   3.89 seconds   1x
 2 processors:  2.37 seconds   1.7x
 3 processors:  2.40 seconds   1.7x

 Clearly the gradient code could use some tuning.  A linear blend
 shouldn't take much more than 1/2 a second even with dithering.

 The reason why the dithering case gets less of a speedup is because the
 threads are fighting over the GRand state.  Each thread needs to have
 its own GRand state.

 It looks like the threads are also fighting over gradient->last_visited.
 My guess is that fixing this will get us much closer to the ideal 2x
 speed up.

Eeek, I didn't think about gradient->last_visited. While the use of
GRand is MT-safe, accessing the gradient obviously isn't. So this is
not only slowing things down, it is a race condition that can lead to
broken results. We definitely need to fix this or revert back to a
non-parallel version of gimp_gradient_blend().

BTW, thanks for fixing my threads code and also thanks for these
benchmarks!


Sven


Re: [Gimp-developer] GIMP and multiple processors

2005-02-26 Thread Daniel Egger
On 26.02.2005, at 02:44, Jay Cox wrote
:
Dual 2.5 GHz G5 Mac, Mac OS X 10.3.8
CVS gimp Changelog revision 1.10539

Linear Gradient blend on a 3000x3000 pixel image (Dithering on)
1 processor:   7.98 seconds   1x
2 processors:  5.20 seconds   1.5x
3 processors:  5.23 seconds   1.5x

Linear Gradient blend on a 3000x3000 pixel image (Dithering OFF)
1 processor:   3.89 seconds   1x
2 processors:  2.37 seconds   1.7x
3 processors:  2.40 seconds   1.7x
Oh, I wouldn't have thought that disabling dithering makes such
a difference but indeed it does.
Same game on Dual Opteron again with latest sources:
Linear Gradient blend on a 3000x3000 pixel image (Dithering on)
1 processor:7.5 s
2 processors:   9.6 s
4 processors:   9.6 s
Linear Gradient blend on a 3000x3000 pixel image (Dithering OFF)
1 processor:3.0 s
2 processors:   2.5 s
4 processors:   2.5 s
Each time determined over three runs. Deviation is too small to
measure.
So with dithering there's a loss and without a slight win on
this box with several threads.
Servus,
  Daniel




Re: [Gimp-developer] GIMP and multiple processors

2005-02-26 Thread Sven Neumann
Hi again,

Jay Cox [EMAIL PROTECTED] writes:

 Clearly the gradient code could use some tuning.  A linear blend
 shouldn't take much more than 1/2 a second even with dithering.

Could we improve this by preparing a reasonably fine array
of samples and picking from these samples instead of calling
gimp_gradient_get_color_at() over and over again?
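A pre-sampled lookup table along those lines could look like this. Everything here is an illustrative sketch, not GIMP's API: the gradient is sampled once into an array, then each pixel does a cheap, read-only index into it.

```c
#define LUT_SIZE 4096   /* "reasonably fine": well below one 8-bit step */

typedef struct { double r, g, b, a; } Color;

/* Callback that samples the real gradient at position 0.0 .. 1.0;
   stands in for gimp_gradient_get_color_at(). */
typedef Color (*GradientSampleFunc)(double pos);

/* Sample the gradient once, up front. */
void gradient_build_lut(Color *lut, GradientSampleFunc sample)
{
    for (int i = 0; i < LUT_SIZE; i++)
        lut[i] = sample((double) i / (LUT_SIZE - 1));
}

/* Per-pixel lookup: nearest entry, no function calls into the gradient.
   Being read-only, the table is also safe to share between threads. */
Color gradient_lut_lookup(const Color *lut, double pos)
{
    int i = (int) (pos * (LUT_SIZE - 1) + 0.5);
    if (i < 0) i = 0;
    if (i >= LUT_SIZE) i = LUT_SIZE - 1;
    return lut[i];
}

/* Toy gradient for demonstration: black to white. */
Color ramp(double pos)
{
    Color c = { pos, pos, pos, 1.0 };
    return c;
}
```

Linear interpolation between adjacent entries would reduce the quantization further if nearest-entry sampling turned out to band.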

 The reason why the dithering case gets less of a speedup is because
 the threads are fighting over the GRand state.  Each thread needs to
  have its own GRand state.

 It looks like the threads are also fighting over
  gradient->last_visited.  My guess is that fixing this will get us
 much closer to the ideal 2x speed up.

I have eliminated last_visited from the gradient struct. Instead the
caller of gimp_gradient_get_color_at() may now do the same
optimization without any caching in the gradient itself. I very much
doubt that this makes any difference though. Perhaps if you would
benchmark a gradient blend with a lot of segments. But in the general
case it's just a few of them, very often even only a single one.

Now that this race condition is eliminated I might look into adding
hooks to the pixel-processor to allow initialisation of per-thread
data, like for example a GRand.


Sven


Re: [Gimp-developer] GIMP and multiple processors

2005-02-25 Thread Jay Cox
On Wed, 2005-02-16 at 22:42 +0100, Sven Neumann wrote:
 Hi,
 
 I couldn't resist and changed the PixelProcessor to use a thread pool.
 Main motivation was to make progress callback work for the threaded
 case. So there's now a variant of pixel_regions_process_parallel()
 that takes progress function and data. This allows us to parallelize
 some of the slower core functions (like for example gradient blend).
 
 It would be interesting to know if this actually gives a noticeable
 speedup on SMP systems. Would be nice if one of you could give this
 some testing. Please try to do gradient blends (w/o adaptive
 supersampling!) on large images. Changing the number of processors in
 the preferences dialog allows you to switch between using the thread
 pool and the single-threaded code.

Here are some results for you:

Dual 2.5 GHz G5 Mac, Mac OS X 10.3.8
CVS gimp Changelog revision 1.10539

Linear Gradient blend on a 3000x3000 pixel image (Dithering on)
1 processor:   7.98 seconds   1x
2 processors:  5.20 seconds   1.5x
3 processors:  5.23 seconds   1.5x

Linear Gradient blend on a 3000x3000 pixel image (Dithering OFF)
1 processor:   3.89 seconds   1x
2 processors:  2.37 seconds   1.7x
3 processors:  2.40 seconds   1.7x

Clearly the gradient code could use some tuning.  A linear blend
shouldn't take much more than 1/2 a second even with dithering.

The reason why the dithering case gets less of a speedup is because the
threads are fighting over the GRand state.  Each thread needs to have
its own GRand state.

It looks like the threads are also fighting over gradient->last_visited.
My guess is that fixing this will get us much closer to the ideal 2x
speed up.

I don't see myself hacking on the gradient code in the near future so
anyone else should feel free...

Jay Cox
[EMAIL PROTECTED]




Re: [Gimp-developer] GIMP and multiple processors

2005-02-22 Thread Sven Neumann
Hi,

David Bonnell [EMAIL PROTECTED] writes:

 If each thread obtains an (exclusive) lock on the pixel region then
 the tasks will effectively be serialized and overall execution time
 will increase compared to a non-threaded implementation due to the
 threading overheads. (Queue manipulation, thread creation and
 context switching, etc).

Please read my description again; I don't think it was that bad. Of
course there's no exclusive lock on the pixel region while it is being
processed.


Sven


Re: [Gimp-developer] GIMP and multiple processors

2005-02-22 Thread Sven Neumann
Hi,

John Cupitt [EMAIL PROTECTED] writes:

 FWIW, vips works by having a thread pool (rather than a tile queue)
 and a simple for(;;) loop over tiles. At each tile, the for() loop
 waits for a thread to become free, then assigns it a tile to work
 on.

It would be trivial to change the GIMP code to work this way. We would
use the existing GThreadPool and just push a tasks per tile into the
pool. If someone wants to give this a try, perhaps it makes a difference.


Sven


Re: [Gimp-developer] GIMP and multiple processors

2005-02-21 Thread Tino Schwarze
On Sun, Feb 20, 2005 at 11:52:16PM +, Adam D. Moss wrote:

 I can force it to use both CPUs now, but even with
 200% utilization it is 2s slower to run this stupid
 ubenchmark than on 1 CPU without threads.
 
 Just a vague guess, but the multiprocessor GIMP pixel
 work scheduler might* farm alternating tiles to alternating
 CPUs.  These are reasonably likely to have been allocated
 together and thus sit close together in memory, causing
 memory contention between CPUs.

So it might make sense not to interleave tiles but let each thread start
on another region of the image, like this (for 4 threads)

1xxx

2xxx

3xxx

4xxx


So each of the threads works on its own memory at first, and only when
it has finished its region will it get tiles from the remaining work. Some
binary-search-like algorithm could be used, like this:

- figure out the largest remaining part of the image (at start, it's the
whole image)
- let x be the number of idle threads
- if starting up,
  divide into x regions
  assign each thread the start of its region
   else
  divide into x+1 regions
  assign idle threads the x+1st, x-th, x-1st etc. regions (this
leaves the first to the thread currently processing it)
- start processing the regions
- if any thread runs out of work, restart algorithm

This should improve memory locality while distributing work across all
threads and making sure that non-uniform workloads are also distributed.

Consider the above example where thread 4 has finished its tiles:

333x

It looks like 1 will need the longest time to finish its region, so we
jump in there and assign part of its work to 4 (marked y):

xx4y

333x

Later, 3 finishes:

xx4y

Now, 2 has the largest region left, so half of it gets assigned to 3
(marked z):

xx4y

3zzz

The state of affairs can be managed easily since there are never more
regions than threads.
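The "largest remaining region" bookkeeping above can be sketched in a few lines. Types and names are invented for illustration; regions are half-open tile ranges, and `pos[i]` records how far region i's thread has advanced.

```c
typedef struct { int start, end, owner; } Region;

/* Find the region with the most unprocessed tiles left,
   or -1 if nothing remains. */
int largest_remaining(const Region *r, const int *pos, int n_regions)
{
    int best = -1, best_left = 0;
    for (int i = 0; i < n_regions; i++) {
        int left = r[i].end - (r[i].start + pos[i]);
        if (left > best_left) { best_left = left; best = i; }
    }
    return best;
}

/* Give the unprocessed second half of region `i` to an idle thread:
   the owner keeps [start, mid), the idle thread takes [mid, end). */
void split_region(Region *r, const int *pos, int i, int idle_thread,
                  Region *out_new)
{
    int done = r[i].start + pos[i];
    int mid  = done + (r[i].end - done) / 2;
    out_new->start = mid;
    out_new->end   = r[i].end;
    out_new->owner = idle_thread;
    r[i].end = mid;
}
```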

HTH! Tino.

PS: I don't know whether this algorithm works well in practice, I
actually got the idea while replying to Adam's response.



Re: [Gimp-developer] GIMP and multiple processors

2005-02-21 Thread Daniel Egger
On 21.02.2005, at 03:14, Marc A. Lehmann [EMAIL PROTECTED] wrote:
Forcing the NPTL implementation to degrade to legacy

BTW, is this 2.4 or 2.6?
Linux ulli 2.6.9-k8 #31 SMP Wed Nov 3 10:58:29 CET 2004 x86_64 GNU/Linux
32bit userland from Debian unstable.
Servus,
  Daniel




RE: [Gimp-developer] GIMP and multiple processors

2005-02-21 Thread David Bonnell
Allocating batches of tiles to each thread and dynamically re-allocating
them should a thread finish early will complicate the code and will only
provide very marginal speed improvement.  (Each thread will have to obtain a
lock on a work queue to get its next tile either way but the proposed scheme
has a slightly less chance of lock contention since there are effectively
multiple queues).

As far as task sequencing goes that all depends on how tile memory is
allocated.  To maximize locality of reference and minimize paging the goal
should be to have all threads working on contiguous areas of memory at the
same time where possible.


-Dave



Re: [Gimp-developer] GIMP and multiple processors

2005-02-21 Thread pcg
On Mon, Feb 21, 2005 at 09:14:13AM +0100, Tino Schwarze [EMAIL PROTECTED] wrote:
  I can force it to use both CPUs now, but even with
  200% utilization it is 2s slower to run this stupid
  ubenchmark than on 1 CPU without threads.
  
  Just a vague guess, but the multiprocessor GIMP pixel
  work scheduler might* farm alternating tiles to alternating
  CPUs.  These are reasonably likely to have been allocated
  together and thus sit close together in memory

This is unlikely, as the gimp has no say in the physical layout of the memory
it gets from the kernel.

Also, interleaving should not have much of an effect, as the blocks are
large, so there will be no thrashing between cpus (if hyperthreading is in
use, then interleaving would actually be slightly better due to limited
cache associativity).

It's more likely that gimp is doing something wrong currently, with respect
to locking or something similar.

-- 
The choice of a
  -==- _GNU_
  ==-- _   generation Marc Lehmann
  ---==---(_)__  __   __  [EMAIL PROTECTED]
  --==---/ / _ \/ // /\ \/ /  http://schmorp.de/
  -=/_/_//_/\_,_/ /_/\_\  XX11-RIPE


Re: [Gimp-developer] GIMP and multiple processors

2005-02-21 Thread John Cupitt
On Mon, 21 Feb 2005 02:00:39 +0100, Sven Neumann [EMAIL PROTECTED] wrote:
  It sounds like the granularity of parallelism is too fine.  That is,
  each task is too short and the overhead of task dispatching (your
   task queue processing, the kernel's thread context switching, any IPC
  required, etc.) is longer then the duration of a single task.
 
 The task is not a single pixel but a single tile (that is usually a
 region of 64x64 pixels). GIMP processes pixel regions by iterating
 over the tiles. The multi-threaded pixel processor uses a configurable
 number of threads. Each thread obtains a lock on the pixel-region,
 takes a pointer to the next tile from the queue, releases the lock,
 processes the tile and starts over.
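The quoted scheme is essentially a lock-protected claim loop over a tile queue. A minimal sketch in plain pthreads (GIMP uses GLib threading primitives; all names here are invented, and an index counter stands in for the queue):

```c
#include <pthread.h>

#define N_TILES   64
#define N_WORKERS 4

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_tile = 0;
static int processed[N_TILES];

/* Take the next tile index under the lock, or -1 when the queue is empty.
   The lock is held only for the handout, not while processing. */
static int claim_tile(void)
{
    pthread_mutex_lock(&queue_lock);
    int t = (next_tile < N_TILES) ? next_tile++ : -1;
    pthread_mutex_unlock(&queue_lock);
    return t;
}

static void *worker(void *arg)
{
    (void) arg;
    int t;
    while ((t = claim_tile()) >= 0)
        processed[t] = 1;          /* real code renders the tile here */
    return NULL;
}

void process_all_tiles(void)
{
    pthread_t th[N_WORKERS];
    next_tile = 0;
    for (int i = 0; i < N_WORKERS; i++)
        pthread_create(&th[i], NULL, worker, NULL);
    for (int i = 0; i < N_WORKERS; i++)
        pthread_join(th[i], NULL);
}
```

Because the critical section is a single increment, contention on `queue_lock` only matters when tiles are very cheap to process.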

I maintain a threaded image processing library called VIPS. 

  http://www.vips.ecs.soton.ac.uk/

We looked at granularity a while ago and the 'sweet spot' at which the
thread start/stop time became insignificant seemed to be around 50x50
pixels. I've pasted some numbers to the end of the mail in case anyone
is interested. I realise gimp is using a very different evaluation
strategy, but the point (maybe) is that thread manipulation is rather
quick and you're probably not seeing it with 64x64 pixel tiles.

FWIW, vips works by having a thread pool (rather than a tile queue)
and a simple for(;;) loop over tiles. At each tile, the for() loop
waits for a thread to become free, then assigns it a tile to work on.

The benchmark is a 45 degree rotate of a 4000 by 4000 pixel image.
Look for the point at which real time stops falling. The first two
arguments to try are the tilesize. The more recent numbers are
really too small to be accurate :-( but the benchmark is 10 years old
and took minutes back then: oh well. I'm supposed to be getting a quad
opteron soon, which will be interesting. Kernel 2.6 would help too no
doubt.

cima: 1 cpu ultrasparc
./try huysum.hr.v fred.v 10 10 1 20
real   30.1
user   24.5
./try huysum.hr.v fred.v 20 20 1 20
real   19.2
user   16.9
./try huysum.hr.v fred.v 30 30 1 20
real   17.8
user   15.4
./try huysum.hr.v fred.v 40 40 1 20
real   17.1
user   15.1
./try huysum.hr.v fred.v 50 50 1 20
real   16.9
user   15.1
./try huysum.hr.v fred.v 60 60 1 20
real   16.6
user   15.0
./try huysum.hr.v fred.v 70 70 1 20
real   17.2
user   15.2
./try huysum.hr.v fred.v 80 80 1 20
real   17.3
user   15.1
./try huysum.hr.v fred.v 90 90 1 20
real   17.4
user   15.3

perugino: 2 cpu supersparc
./try huysum.hr.v fred.v 10 10 1 20
real0m51.123s
user1m7.623s
./try huysum.hr.v fred.v 20 20 1 20
real0m24.601s
user0m41.133s
./try huysum.hr.v fred.v 30 30 1 20
real0m21.931s
user0m38.393s
./try huysum.hr.v fred.v 40 40 1 20
real0m20.208s 
user0m35.653s
./try huysum.hr.v fred.v 50 50 1 20
real0m20.109s
user0m35.283s 
./try huysum.hr.v fred.v 60 60 1 20
real0m19.501s
user0m34.513s
./try huysum.hr.v fred.v 70 70 1 20
real0m20.435s
user0m34.813s
./try huysum.hr.v fred.v 80 80 1 20
real0m20.558s
user0m35.293s
./try huysum.hr.v fred.v 90 90 1 20
real0m20.785s
user0m35.313s

Run on furini, 2 CPU 450MHz PII Xeon, kernel 2.4.4, vips-7.7.19, gcc 2.95.3
./try huysum.hr.v fred.v 10 10 1 20
real0m4.542s
user0m4.350s
sys 0m3.800s
./try huysum.hr.v fred.v 20 20 1 20
real0m2.206s
user0m2.750s
sys 0m1.250s
./try huysum.hr.v fred.v 30 30 1 20
real0m1.678s
user0m2.610s
sys 0m0.580s
./try huysum.hr.v fred.v 40 40 1 20
real0m1.483s
user0m2.460s
sys 0m0.410s
./try huysum.hr.v fred.v 50 50 1 20
real0m1.443s
user0m2.330s
sys 0m0.350s
./try huysum.hr.v fred.v 60 60 1 20
real0m1.385s
user0m2.390s
sys 0m0.220s
./try huysum.hr.v fred.v 70 70 1 20
real0m1.394s
user0m2.460s
sys 0m0.150s
./try huysum.hr.v fred.v 80 80 1 20
real0m1.365s
user0m2.360s
sys 0m0.200s
./try huysum.hr.v fred.v 90 90 1 20
real0m1.393s
user0m2.450s
sys 0m0.180s

Run on manet, 2 CPU 2.5GHz P4 Xeon, kernel 2.4.18, vips-7.8.5, gcc 2.95.3
./try huysum.hr.v fred.v 10 10 1 20
real0m1.582s
user0m1.640s
sys 0m1.470s
./try huysum.hr.v fred.v 20 20 1 20
 

Re: [Gimp-developer] GIMP and multiple processors

2005-02-21 Thread Jay Cox
On Sun, 2005-02-20 at 23:52 +, Adam D. Moss wrote:
 Daniel Egger wrote:
  I can force it to use both CPUs now, but even with
  200% utilization it is 2s slower to run this stupid
  ubenchmark than on 1 CPU without threads.
 
 Just a vague guess, but the multiprocessor GIMP pixel
 work scheduler might* farm alternating tiles to alternating
 CPUs.  These are reasonably likely to have been allocated
 together and thus sit close together in memory, causing
 memory contention between CPUs.
 

I'm not sure what system the benchmark is being run on, but the cache
line size on a P4 is 128 bytes (most other systems have smaller cache
line sizes).  A simple test to see if this is the problem would be to change
the tile allocation code to allocate an extra 128 bytes of memory per
tile.  See app/base/tile.c line 221

I think it is more likely that the problem is with sharing some other
data between processors (for example the gradient struct?)

I think it would be a good idea to get some timings from some other
operations also.  Perhaps painting with a large brush, or flattening a
complicated image.

Jay Cox
[EMAIL PROTECTED]






Re: [Gimp-developer] GIMP and multiple processors

2005-02-21 Thread Daniel Egger
On 21.02.2005, at 19:07, Jay Cox wrote:
I'm not sure what system the benchmark is being run on, but the cache
line size on a P4 is 128 bytes (most other systems have smaller cache
line sizes).  A simple test to see if this is the problem would be to change
the tile allocation code to allocate an extra 128 bytes of memory per
tile.  See app/base/tile.c line 221
Dual Opteron.
I think it would be a good idea to get some timings from some other
operations also.  Perhaps painting with a large brush, or flattening a
complicated image.
Sven, what procedures are currently parallelized?
Servus,
  Daniel


PGP.sig
Description: This is a digitally signed message part


RE: [Gimp-developer] GIMP and multiple processors

2005-02-21 Thread David Bonnell
That all depends how it allocates tile memory.  If it mallocs tile by tile
then they could be anywhere, including separate pages.  If it mallocs a
large block (system page size) and then divides it into tiles itself then
those tiles are *likely* to reside on the same page - which is what matters
most for performance if memory is stretched.


-Dave

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED] ( Marc) (A.) (Lehmann )
Sent: Monday, 21 February 2005 10:45 PM
To: gimp-developer@lists.xcf.berkeley.edu
Subject: Re: [Gimp-developer] GIMP and multiple processors

...

This is unlikely, as the gimp has no say in the physical layout of the
memory
it gets from the kernel.




Re: [Gimp-developer] GIMP and multiple processors

2005-02-21 Thread Jay Cox
On Mon, 2005-02-21 at 21:34 +0100, Daniel Egger wrote:
 On 21.02.2005, at 19:07, Jay Cox wrote:
 
  I'm not sure what system the benchmark is being run on, but the cache
  line size on a P4 is 128 bytes (most other systems have smaller cache
  line sizes).  A simple test to see if this is the problem would be to change
  the tile allocation code to allocate an extra 128 bytes of memory per
  tile.  See app/base/tile.c line 221
 
 Dual Opteron.

Should only need 64 bytes then.

 
  I think it would be a good idea to get some timings from some other
  operations also.  Perhaps painting with a large brush, or flattening a
  complicated image.
 
 Sven, what procedures are currently parallelized?
 


My recollection from many years ago is this:  
All of the tools-color_tools
Histogram calculation
All compositing(?) including:
Layer stack compositing
Brush-drawable compositing

More recently sven has parallelized:
gimp_image_contiguous_region_by_color
gimp_drawable_desaturate
gradient_fill_region

Jay Cox
[EMAIL PROTECTED]




Re: [Gimp-developer] GIMP and multiple processors

2005-02-20 Thread Sven Neumann
Hi,

Daniel Egger [EMAIL PROTECTED] writes:

 Hm, there's still no idea floating around how to benchmark.

There are very clear ideas on how to do it. Someone just needs to sit
down and write the script (or dig out script-fu-bench.scm which is
what we used to use a long time ago).

 I'd rather have some microbenchmark that delivers reproducible
 results than only being able to say hey, it feels faster...

You'd still do me a favor if you would try current CVS and told me
whether it feels faster or not.


Sven


Re: [Gimp-developer] GIMP and multiple processors

2005-02-20 Thread Daniel Egger
On 20.02.2005, at 14:09, Sven Neumann wrote:
There are very clear ideas on how to do it.
Hm, must have missed that...
Someone just needs to sit down and write the script (or dig out
script-fu-bench.scm which is what we used to use a long time ago).

You'd still do me a favor if you would try current CVS and told me
whether it feels faster or not.
Okee, I'm warming up the machine right now. This'll take a
while though because I need to recompile quite some stuff.
I'll do that in 32bit mode if you don't mind...
Servus,
  Daniel




Re: [Gimp-developer] GIMP and multiple processors

2005-02-20 Thread Kevin Cozens
Sven Neumann wrote:
Daniel Egger [EMAIL PROTECTED] writes:
 

Hm, there's still no idea floating around how to benchmark.
   

There are very clear ideas on how to do it. Someone just needs to sit
down and write the script (or dig out script-fu-bench.scm which is
what we used to use a long time ago).
 

It would be nice to have a copy of that old script-fu-bench.scm script. 
I used to have a copy with one of my versions of GIMP. It isn't in CVS 
and I can't find it on the net. If someone has the script I'd like to 
get a copy of it.

Even if we had the old script-fu-bench script around I don't think it's 
the best choice to do the benchmarking tests which are currently needed. 
It is something I (or someone knowledgeable in Script-Fu or another GIMP 
scripting language) could write. It would be helpful to have a list of 
GIMP features/functions to use which would call the code that 
needs to be benchmarked.

--
Cheers!
Kevin.  (http://www.interlog.com/~kcozens/)
Owner of Elecraft K2 #2172|What are we going to do today, Borg?
E-mail:kcozens at interlog dot com|Same thing we always do, Pinkutus:
Packet:[EMAIL PROTECTED]|  Try to assimilate the world!
#include disclaimer/favourite   |  -Pinkutus  the Borg


Re: [Gimp-developer] GIMP and multiple processors

2005-02-20 Thread Daniel Egger
On 20.02.2005, at 14:09, Sven Neumann wrote:
You'd still do me a favor if you would try current CVS and told me
whether it feels faster or not.
It's slower, measurably and reproducibly slower.
As a benchmark I used a gradient fill in a 3000x3000px (68.8M)
image. I get consistently times of 8s for 1 thread and between
9.2s and 9.6s for 2 threads. With a running application, after a
restart -- doesn't matter.
What is strange though, is that it only seems to use one CPU
for both threads; maybe a stupid gthread implementation?
Servus,
  Daniel




Re: [Gimp-developer] GIMP and multiple processors

2005-02-20 Thread Sven Neumann
Hi,

Daniel Egger [EMAIL PROTECTED] writes:

 It's slower, measurably and reproducibly slower.

 As a benchmark I used a gradient fill in a 3000x3000px (68.8M)
 image. I get consistently times of 8s for 1 thread and between
 9.2s and 9.6s for 2 threads. With a running application, after a
 restart -- doesn't matter.

 What is strange though, is that it only seems to use one CPU
 for both threads; maybe a stupid gthread implementation?

What version did you test? What's the last ChangeLog entry?


Sven


Re: [Gimp-developer] GIMP and multiple processors

2005-02-20 Thread Daniel Egger
On 20.02.2005, at 21:55, Sven Neumann wrote:
As a benchmark I used a gradient fill in a 3000x3000px (68.8M)
image. I get consistently times of 8s for 1 thread and between
9.2s and 9.6s for 2 threads. With a running application, after a
restart -- doesn't matter.

What is strange though, is that it only seems to use one CPU
for both threads; maybe a stupid gthread implementation?

What version did you test?
GIMP (r/w) CVS from a few hours ago (about the time when I wrote
in that I'll do it but that it'll take a while...)
What's the last ChangeLog entry?
2005-02-20  Sven Neumann  [EMAIL PROTECTED]
* app/actions/context-actions.c
* app/actions/context-commands.c[ch]: added actions to control 
the
average radius of color picker tools (bug #167765).

* app/actions/tool-options-actions.c: fixed a typo in a comment.
Servus,
  Daniel




Re: [Gimp-developer] GIMP and multiple processors

2005-02-20 Thread Sven Neumann
Hi,

Daniel Egger [EMAIL PROTECTED] writes:

 What is strange though, is that it only seems to use one CPU
 for both threads; maybe a stupid gthread implementation?

Since gthread is just a very thin wrapper around pthreads, that would
mean that it's a stupid pthread implementation. To me this looks like
the kernel believes that it would be better to keep the threads local
than to move one to the other CPU. I wonder if perhaps the kernel is
right and using two CPUs would actually cause more overhead than it's
worth?


Sven


Re: [Gimp-developer] GIMP and multiple processors

2005-02-20 Thread pcg
On Sun, Feb 20, 2005 at 10:55:18PM +0100, Sven Neumann [EMAIL PROTECTED] 
wrote:
 mean that it's a stupid pthread implementation. To me this looks like
 the kernel believes that it would be better to keep the threads local
 than to move one to the other CPU.

Linux will not keep two threads running on a single cpu if both are ready
and nothing else is running, regardless of locality etc., as the kernel
lacks the tools to effectively decide whether threads should stay on a cpu
or not.

(I mean, it's of course bad to interleave operations on a per-pixel basis
instead of e.g. a per-tile basis, but the kernel will run the threads
concurrently whether or not it gets slower).

 right and using two CPUs would actually cause more overhead than it's
 worth?

That's quite possible, but IFF the kernel indeed keeps the two threads on
a single cpu then it means that both aren't ready at the same time, e.g.
due to lock contention or other things.

-- 
The choice of a
  -==- _GNU_
  ==-- _   generation Marc Lehmann
  ---==---(_)__  __   __  [EMAIL PROTECTED]
  --==---/ / _ \/ // /\ \/ /  http://schmorp.de/
  -=/_/_//_/\_,_/ /_/\_\  XX11-RIPE


Re: [Gimp-developer] GIMP and multiple processors

2005-02-20 Thread Daniel Egger
On 20.02.2005, at 22:55, Sven Neumann wrote:
Since gthread is just a very thin wrapper around pthreads, that would
mean that it's a stupid pthread implementation. To me this looks like
the kernel believes that it would be better to keep the threads local
than to move one to the other CPU. I wonder if perhaps the kernel is
right and using two CPUs would actually cause more overhead than it's
worth?
Hm, it is using NPTL, e.g. if I set it to 8 threads I'll have:
egger31344  2.1 14.5 219412 149008 pts/0   Sl+  23:10   0:05 
/opt/gimp-mp-test/bin/gimp-2.3
egger31344 10.8 14.5 219412 149012 pts/0   Rl+  23:10   0:26 
/opt/gimp-mp-test/bin/gimp-2.3
egger31344 11.0 14.5 219412 149012 pts/0   Rl+  23:10   0:26 
/opt/gimp-mp-test/bin/gimp-2.3
egger31344  6.5 14.5 219412 149012 pts/0   Rl+  23:11   0:08 
/opt/gimp-mp-test/bin/gimp-2.3
egger31344  5.7 14.5 219412 149012 pts/0   Rl+  23:11   0:07 
/opt/gimp-mp-test/bin/gimp-2.3
egger31344  6.8 14.5 219412 149016 pts/0   Rl+  23:11   0:08 
/opt/gimp-mp-test/bin/gimp-2.3
egger31344  6.1 14.5 219412 149016 pts/0   Rl+  23:11   0:08 
/opt/gimp-mp-test/bin/gimp-2.3
egger31344  6.8 14.5 219412 149016 pts/0   Rl+  23:11   0:08 
/opt/gimp-mp-test/bin/gimp-2.3
egger31344  6.2 14.5 219412 149016 pts/0   Rl+  23:11   0:08 
/opt/gimp-mp-test/bin/gimp-2.3

If I force it back to classical pthreads then it'll be even
worse:
egger31456 70.8 15.0 164232 154852 pts/0   R+   23:32   1:53 
/opt/gimp-mp-test/bin/gimp-2.3
egger31457  0.0 15.0 164232 154852 pts/0   S+   23:32   0:00 
/opt/gimp-mp-test/bin/gimp-2.3
egger31458  0.0 15.0 164232 154852 pts/0   S+   23:32   0:00 
/opt/gimp-mp-test/bin/gimp-2.3
egger31459  0.0 15.0 164232 154852 pts/0   S+   23:32   0:00 
/opt/gimp-mp-test/bin/gimp-2.3

One interesting thing here certainly is that one process
(== thread) is doing all the work and the rest is doing
nothing in a shape fill, while in a linear fill I'll get:
egger31474  3.4 10.8 120696 111508 pts/0   S+   23:36   0:04 
/opt/gimp-mp-test/bin/gimp-2.3
egger31475  0.0 10.8 120696 111508 pts/0   S+   23:36   0:00 
/opt/gimp-mp-test/bin/gimp-2.3
egger31476 35.1 10.8 120696 111508 pts/0   S+   23:36   0:48 
/opt/gimp-mp-test/bin/gimp-2.3
egger31477 35.1 10.8 120696 111508 pts/0   S+   23:36   0:48 
/opt/gimp-mp-test/bin/gimp-2.3

However those are again running on the same CPU so
rather disturbing each other than crunching away.
To be continued...
Servus,
  Daniel




Re: [Gimp-developer] GIMP and multiple processors

2005-02-20 Thread Daniel Egger
On 20.02.2005, at 23:47, [EMAIL PROTECTED] ( Marc) (A.) (Lehmann ) wrote:
Linux will not keep two threads running on a single cpu if both are ready
and nothing else is running, regardless of locality etc., as the kernel
lacks the tools to effectively decide whether threads should stay on a cpu
or not.
Yes and no. I just figured out that the tools I was
looking for are called schedutils and can be used to
change the affinity settings of a process, i.e. pin
it to some CPU or allow it to migrate as the kernel
decides between a set of CPUs.
Forcing the NPTL implementation to degrade to legacy
pthreads means that one thread equals one process and
thus can be controlled with taskset.
Oh yes, and I just noticed that now this isn't even
necessary anymore because for some reason the kernel now
migrates one of the pthread processes to the other CPU
automatically after a short while of processing.
(I mean, it's of course bad to interleave operations on a per-pixel basis
instead of e.g. a per-tile basis, but the kernel will run the threads
concurrently whether or not it gets slower).
Certainly. Opterons are bandwidth monsters but this
doesn't mean that they'll be forgiving to stupid
algorithms.
That's quite possible, but IFF the kernel indeed keeps the two threads 
on
a single cpu then it means that both aren't ready at the same time, 
e.g.
due to lock contention or other things.
I can force it to use both CPUs now, but even with
200% utilization it is 2s slower to run this stupid
ubenchmark than on 1 CPU without threads.
Servus,
  Daniel




Re: [Gimp-developer] GIMP and multiple processors

2005-02-20 Thread Adam D. Moss
Daniel Egger wrote:
I can force it to use both CPUs now, but even with
200% utilization it is 2s slower to run this stupid
ubenchmark than on 1 CPU without threads.
Just a vague guess, but the multiprocessor GIMP pixel
work scheduler might* farm alternating tiles to alternating
CPUs.  These are reasonably likely to have been allocated
together and thus sit close together in memory, causing
memory contention between CPUs.
--Adam
* I haven't looked at it.
--
Adam D. Moss   -   [EMAIL PROTECTED]


RE: [Gimp-developer] GIMP and multiple processors

2005-02-20 Thread David Bonnell
It sounds like the granularity of parallelism is too fine.  That is, each
task is too short and the overhead of task dispatching (your task queue
processing, the kernel's thread context switching, any IPC required, etc.) is
longer than the duration of a single task.

I hit the same problem a decade and a half ago when I worked on distributed
parallel ray tracing systems for my post graduate thesis. If each task is a
pixel then you may want to consider increasing this to a (configurable size)
bundle of pixels.  Depending on the algorithm being parallelized the bundle
may contain contiguous pixels (if processing of each pixel requires
approximately uniform processor time) or a random set of pixels (if there
is, or can potentially be, significant variance in per-pixel processing
time).


-Dave

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Daniel
Egger
Sent: Monday, 21 February 2005 9:13 AM
To: [EMAIL PROTECTED] ( Marc) (A.) (Lehmann )
Cc: Sven Neumann; Developer gimp-devel
Subject: Re: [Gimp-developer] GIMP and multiple processors

On 20.02.2005, at 23:47, [EMAIL PROTECTED] ( Marc) (A.) (Lehmann ) wrote:

 Linux will not keep two threads running on a single cpu if both are ready
 and nothing else is running, regardless of locality etc., as the kernel
 lacks the tools to effectively decide whether threads should stay on a cpu
 or not.

Yes and no. I just figured out that the tools I was
looking for are called schedutils and can be used to
change the affinity settings of a process, i.e. pin
it to some CPU or allow it to migrate as the kernel
decides between a set of CPUs.

Forcing the NPTL implementation to degrade to legacy
pthreads means that one thread equals one process and
thus can be controlled with taskset.

Oh yes, and I just noticed that now this isn't even
necessary anymore because for some reason the kernel now
migrates one of the pthread processes to the other CPU
automatically after a short while of processing.

 (I mean, it's of course bad to interleave operations on a per-pixel basis
 instead of e.g. a per-tile basis, but the kernel will run the threads
 concurrently whether or not it gets slower).

Certainly. Opterons are bandwidth monsters but this
doesn't mean that they'll be forgiving to stupid
algorithms.

 That's quite possible, but IFF the kernel indeed keeps the two threads 
 on
 a single cpu then it means that both aren't ready at the same time, 
 e.g.
 due to lock contention or other things.

I can force it to use both CPUs now, but even with
200% utilization it is 2s slower to run this stupid
ubenchmark than on 1 CPU without threads.

Servus,
   Daniel



Re: [Gimp-developer] GIMP and multiple processors

2005-02-20 Thread Sven Neumann
Hi,

David Bonnell [EMAIL PROTECTED] writes:

 It sounds like the granularity of parallelism is too fine.  That is,
 each task is too short and the overhead of task dispatching (your
 task queue processing, the kernel's thread context switching, any IPC
 required, etc.) is longer than the duration of a single task.

 I hit the same problem a decade and a half ago when I worked on
 distributed parallel ray tracing systems for my post graduate
 thesis. If each task is a pixel then you may want to consider
 increasing this to a (configurable size) bundle of pixels.
 Depending on the algorithm being parallelized the bundle may contain
 contiguous pixels (if processing of each pixel requires
 approximately uniform processor time) or a random set of pixels (if
 there is, or can potentially be, significant variance in per-pixel
 processing time).

The task is not a single pixel but a single tile (that is usually a
region of 64x64 pixels). GIMP processes pixel regions by iterating
over the tiles. The multi-threaded pixel processor uses a configurable
number of threads. Each thread obtains a lock on the pixel-region,
takes a pointer to the next tile from the queue, releases the lock,
processes the tile and starts over. This goes on until all tiles are
processed. The main threads blocks until the queue is empty and all
threads have finished their jobs. If a progress callback has been
specified, the main thread wakes up regularily and updates the
progress bar.


Sven


RE: [Gimp-developer] GIMP and multiple processors

2005-02-20 Thread David Bonnell
Thanks for the clarification.

If each thread obtains an (exclusive) lock on the pixel region then the
tasks will effectively be serialized and overall execution time will
increase compared to a non-threaded implementation due to the threading
overheads. (Queue manipulation, thread creation and context switching, etc).

The main thread should lock the pixel region before pushing the tasks onto
the work queue.  Each task may then operate under the assumption that it has
exclusive access to its tile.  When the last task has completed the lock can
be released.

If this is already the case then please ignore, as I have not yet had the
opportunity to review the code.


-Dave

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Sven
Neumann
Sent: Monday, 21 February 2005 11:01 AM
To: David Bonnell
Cc: Gimp-developer@lists.xcf.berkeley.edu
Subject: Re: [Gimp-developer] GIMP and multiple processors

Hi,

David Bonnell [EMAIL PROTECTED] writes:

 It sounds like the granularity of parallelism is too fine.  That is,
 each task is too short and the overhead of task dispatching (your
 task queue processing, the kernel's thread context switching, any IPC
 required, etc.) is longer than the duration of a single task.

 I hit the same problem a decade and a half ago when I worked on
 distributed parallel ray tracing systems for my post graduate
 thesis. If each task is a pixel then you may want to consider
 increasing this to a (configurable size) bundle of pixels.
 Depending on the algorithm being parallelized the bundle may contain
 contiguous pixels (if processing of each pixel requires
 approximately uniform processor time) or a random set of pixels (if
 there is, or can potentially be, significant variance in per-pixel
 processing time).

The task is not a single pixel but a single tile (that is usually a
region of 64x64 pixels). GIMP processes pixel regions by iterating
over the tiles. The multi-threaded pixel processor uses a configurable
number of threads. Each thread obtains a lock on the pixel-region,
takes a pointer to the next tile from the queue, releases the lock,
processes the tile and starts over. This goes on until all tiles are
processed. The main thread blocks until the queue is empty and all
threads have finished their jobs. If a progress callback has been
specified, the main thread wakes up regularly and updates the
progress bar.


Sven


Re: [Gimp-developer] GIMP and multiple processors

2005-02-18 Thread Daniel Egger
On 16.02.2005, at 22:42, Sven Neumann wrote:
It would be interesting to know if this actually gives a noticeable
speedup on SMP systems. Would be nice if one of you could give this
some testing. Please try to do gradient blends (w/o adaptive
supersampling!) on large images. Changing the number of processors in
the preferences dialog allows you to switch between using the thread
pool and the single-threaded code.
Hm, there's still no idea floating around how to benchmark.
I'd rather have some microbenchmark that delivers
reproducible results than only being able to say hey, it
feels faster...
I think writing some GIMPbench would be a nice
volunteer job and hey, if you do it, you'll get to choose
the benchmark methods and all the fame... ;)
Anyone?
Servus,
  Daniel




Re: [Gimp-developer] GIMP and multiple processors

2005-02-16 Thread Sven Neumann
Hi,

I couldn't resist and changed the PixelProcessor to use a thread pool.
Main motivation was to make progress callback work for the threaded
case. So there's now a variant of pixel_regions_process_parallel()
that takes progress function and data. This allows us to parallelize
some of the slower core functions (like for example gradient blend).

It would be interesting to know if this actually gives a noticeable
speedup on SMP systems. Would be nice if one of you could give this
some testing. Please try to do gradient blends (w/o adaptive
supersampling!) on large images. Changing the number of processors in
the preferences dialog allows you to switch between using the thread
pool and the single-threaded code.


Sven


Re: [Gimp-developer] GIMP and multiple processors

2005-02-15 Thread Nathan Summers
On Sun, 13 Feb 2005 22:10:21 +0100, Sven Neumann [EMAIL PROTECTED] wrote:

 Also tried to port the code to GThreadPool but it turned out to be not
 as trivial as I expected. The current code blocks until all threads
 are returned and it is not trivial to implement this behaviour with
 GThreadPool. So this part, the more interesting part of the TODO, is
 still open.
 
 I don't want to put more time into this, but I think it would
 definitely make sense to introduce a thread pool to cut down the
 number of thread creations. Any volunteers?

Here's a solution that would work:  (note, untested code, error
checking and uninteresting stuff omitted for clarity, etc)

Calling code:
struct SynchStuff {
    GThreadPool *pool;
    GCond *cond;
    GMutex *mutex;
} synch;

typedef struct SynchStuff SynchStuff;

synch.cond = g_cond_new();
synch.mutex = g_mutex_new();


/* create an exclusive threadpool */
synch.pool = g_thread_pool_new((GFunc) gimp_thread_func, &synch, ...); 

/* make sure that all the tasks are in the queue before checking if
queue is empty.
 */

g_mutex_lock(synch.mutex);

for (/* each task */ ...) {
    g_thread_pool_push(synch.pool, task, NULL);
}

/* all threads are now launched.  allow threads to test queue for emptiness
 * and wait for signal that all tasks are complete */

g_cond_wait (synch.cond, synch.mutex); 

/* all tasks are now complete */

g_mutex_unlock(synch.mutex);



void gimp_thread_func(GimpTaskData *task, SynchStuff *synch) {

  /* process task */
  ...

  /* wait until all tasks have been queued */
  g_mutex_lock(synch->mutex);

  if (g_thread_pool_get_num_threads(synch->pool) == 1)
      g_cond_signal(synch->cond);

  g_mutex_unlock(synch->mutex);
}

We could also try to talk the glib developers into including
g_thread_pool_join, which would block until the task pool is empty. 
It would be nice if the GThreadPool API were modelled on OpenMP to the
extent possible, which I admit is not really stellar because of the
compiler modifications OpenMP requires.  But perhaps some of what
OpenMP does with compiler modifications we can do with macros.

Rockwalrus


Re: [Gimp-developer] GIMP and multiple processors

2005-02-15 Thread Nathan Summers
On Tue, 15 Feb 2005 16:29:00 -0500, Nathan Summers [EMAIL PROTECTED] wrote:

   if (g_thread_pool_get_num_threads(synch->pool) == 1)

correction: I meant if (g_thread_pool_get_num_threads(synch->pool) == 1 &&
   g_thread_pool_unprocessed(synch->pool) == 0)

although the first test is sufficient if n(tasks) <= n(threads)

Rockwalrus


Re: [Gimp-developer] GIMP and multiple processors

2005-02-15 Thread Sven Neumann
Hi,

Nathan Summers [EMAIL PROTECTED] writes:

 Here's a solution that would work: (note: untested code; error
 checking and uninteresting stuff omitted for clarity, etc.)

Yes, that would most probably work.

 struct SynchStuff {
   GThreadPool *pool;
   GCond       *cond;
   GMutex      *mutex;
 } synch;

 typedef struct SynchStuff SynchStuff;

I don't think we need a SynchStuff struct though. These can very well
be global static variables. The thread pool will be kept around for
the whole lifetime of GIMP anyway.


Sven


Re: [Gimp-developer] GIMP and multiple processors

2005-02-13 Thread Sven Neumann
Hi,

a quick followup to myself...

I couldn't resist and spent some time on the threaded pixel-processor
code. The first part of the TODO I posted yesterday is done, the code
has been ported to gthread. This makes the thread functionality
available on all platforms supported by gthread-2.0.

I have also cleaned up the code further while porting it to gthread.
It seems to be working rather well now, but I haven't done any
benchmarking so far. There's now a #define to control the ratio
between created threads and tiles that are to be processed. This
parameter definitely needs tuning.

I also tried to port the code to GThreadPool, but it turned out to be
not as trivial as I expected. The current code blocks until all
threads have returned, and it is not trivial to implement this
behaviour with GThreadPool. So this part, the more interesting part of
the TODO, is still open.

I don't want to put more time into this, but I think it would
definitely make sense to introduce a thread pool to cut down the
number of thread creations. Any volunteers?


Sven


Re: [Gimp-developer] GIMP and multiple processors

2005-02-13 Thread Daniel Egger
On 13.02.2005, at 22:10, Sven Neumann wrote:
I couldn't resist and spent some time on the threaded pixel-processor
code . The first part of the TODO I posted yesterday is done, the code
has been ported to gthread. This makes the thread functionality
available on all platforms supported by gthread-2.0.
If you need someone to run benchmark tests, let me know.
I've a dual-Opteron under my desk, a dual-G4 and a dual-G5
at my disposal, an HT P4 machine nearby, and can easily
fetch a dual-SPARC workstation or some bigger iron if
needed.
Servus,
  Daniel




Re: [Gimp-developer] GIMP and multiple processors

2005-02-13 Thread Daniel Egger
On 13.02.2005, at 22:43, Sven Neumann wrote:
Doing benchmarks is exactly what would have to be done at this
point. The main problem is probably what to use as a benchmark.
One could write a script-fu and run that using batch-mode.
I've been using the coffee stain script-fu in the past, but
it is *way* too random to deliver sensible results for timing
tests; it is somewhat useful for profiling, which is not
what we would want to do in this case.
It would be interesting to have numbers that show the effect of the
num-processors gimprc option and it would be interesting to see how
the value of the TILES_PER_THREAD #define influences the performance.
So far I can only guess that the cost of thread creation is perhaps a
problem.
If anyone can deliver some good benchmark I'd be happy to run it
on a few machines with the CVS version
Servus,
  Daniel




[Gimp-developer] GIMP and multiple processors

2005-02-12 Thread Sven Neumann
Hi,

since hyperthreading and multi-core CPUs are becoming more and more
common, I think we should put a little more focus on making use of
these features. For that reason, I have made --enable-mp the default
in CVS HEAD and also changed the gimprc preference so that GIMP uses
two processors. This default doesn't necessarily have to make it into
the next stable release. The main concern now should be to make sure
that it works at all.

I had a look at the current implementation and also cleaned up the
code a little. We are talking about app/base/pixel-processor.[ch].
There seems to be room for improvement. I'd like to hack on this
myself but my time is limited, so I thought I'd ask if someone else
would like to take this job. So here are some ideas of what could be
changed:


- Port the thread code to g_thread functions to make it more portable.

  That would probably be necessary since we don't want to maintain our
  own thread abstraction layer. It would mean that we would have to
  call g_thread_init(). I am not sure if that would introduce any
  noticeable overhead on systems that only have a single CPU. Might be
  worth investigating.


- Use GThreadPool instead of continuously starting new threads.

  Creating a thread has a considerable overhead. The current
  implementation continuously creates new threads that finish after
  they have done their job. GLib provides a GThreadPool that would
  probably fit nicely here.


As you can see, this is a rather small task coding-wise. It would
however be good to also do some benchmarking to find out how much
overhead is introduced by using the threaded pixel-processor. We don't
want to make things worse for single-CPU systems.

Any volunteers?


Sven