Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

2001-04-06 Thread Austin Donnelly

On Thursday, 5 Apr 2001, Georg Acher wrote:

 I just looked into bumpmap.c and tried to figure out if it can profit from
 blocking and played a bit with the code. It seems that there is some major
 (performance) problem with the gimp_pixel_rgn_get/set_row-calls in Gimp
 1.2.1.
 
 The original bumpmap took for my 1400*1400 image about 30s, when I commented
 out one of the gimp_pixel_rgn_get/set_row() that access the source or the
 destination image, the whole stuff took only about 2.4s, while the
 calculations were still done! 
 
 I experimented a bit, and found out, that apparently the "switch" between
 get and set seems to be the problem. I changed the code to get 32 rows,
 calculate them and the write the 32 back. It took only 3s overall, the
 resulting image is the same as with the 30s boiling time. 

Have a look at the gimp_pixel_rgns_register() and
gimp_pixel_rgns_process() functions.  They do similar blocking on a
per-tile basis.

Most plugins should be using them, but ones with more complex access
patterns (eg accessing pixels across tile boundaries) probably won't.'

Typical use is something like this:


gimp_drawable_mask_bounds (drawable-id, x1, y1, x2, y2);

for (y = y1; y  y2; y += tile_width - (y % tile_width))
{
for (x = x1; x  x2; x += tile_width - (x % tile_width))
{
/* process the image in tile-sized regions */
for (pr = gimp_pixel_rgns_register (2, src_rgn, dest_rgn);
 pr != NULL;
 pr = gimp_pixel_rgns_process (pr))
{
src_row  = src_rgn.data;
dest_row = dest_rgn.data;

for (row = 0; row  src_rgn.h; row++)
{
src  = src_row;
dest = dest_row;

for (col = 0; col  src_rgn.w; col++)
{

/* do stuff with src[0..bpp] and dest[0..bpp] */

src  += src_rgn.bpp;
dest += dest_rgn.bpp;
}

src_row  += src_rgn.rowstride;
dest_row += dest_rgn.rowstride;
}
}
}
}

Of course this precise form will only work where there's a one-to-one
correspondence between input and output pixels.  I suspect whirl-pinch
has more complex access patterns, in which case you'll need to be more
creative with the loop structure.

Austin
___
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer



Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

2001-04-06 Thread Austin Donnelly

Oops - I missed out some vital pixel region initialisation.


On Friday, 6 Apr 2001, Austin Donnelly wrote:

 gimp_drawable_mask_bounds (drawable-id, x1, y1, x2, y2);
 
 for (y = y1; y  y2; y += tile_width - (y % tile_width))
 {
 for (x = x1; x  x2; x += tile_width - (x % tile_width))
 {

  /* set up the source and dest regions */
  gimp_pixel_rgn_init (src_rgn, drawable, x, y, x_step, y_step,
   FALSE/*dirty*/, FALSE/*shadow*/);

  gimp_pixel_rgn_init (dest_rgn, drawable, x, y, x_step, y_step,
   TRUE/*dirty*/, TRUE/*shadow*/);


   /* process the image in tile-sized regions */
   for (pr = gimp_pixel_rgns_register (2, src_rgn, dest_rgn);
pr != NULL;
pr = gimp_pixel_rgns_process (pr))
   {
   src_row  = src_rgn.data;
   dest_row = dest_rgn.data;
 
   for (row = 0; row  src_rgn.h; row++)
   {
   src  = src_row;
   dest = dest_row;
 
   for (col = 0; col  src_rgn.w; col++)
   {
 
   /* do stuff with src[0..bpp] and dest[0..bpp] */
 
   src  += src_rgn.bpp;
   dest += dest_rgn.bpp;
   }
 
   src_row  += src_rgn.rowstride;
   dest_row += dest_rgn.rowstride;
   }
   }
 }
 }
___
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer



Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

2001-04-06 Thread Ernst Lippe

The speedup that you see is probably mainly caused by better caching.
There is a bug in the tile cache size for the plugin.
The cache is under most circumstances too small and this means that
every requested tile is not in the cache and must be transmitted from
the main gimp process (SLOW).
Effectively this means that every tile of the drawable is read and
written gimp_tile_height (normally 64) times. Your modification is
better, I expect that every tile is read and written
gimp_tile_height/BLOCKS = 2 times (but when a selection is used it will
normally be 3 times). 
This could be somewhat improved by setting BLOCKS = gimp_tile_height,
when the selection starts at a y-position that is a multiple of
gimp_tile_height (e.g. when there is no selection) every tile will
probably be read/written once, but when the selection starts at a
different position, every tile will still be read/written twice.

I used a different approach that gives speedups similar to yours but
that should read/write every tile only once. The solution is pretty
simple: just enlarge the cache size.
What is the problem with the cache size? The current code uses:
  gimp_tile_cache_ntiles (2 * (drawable-width +
   gimp_tile_width () - 1) /
  gimp_tile_width ());
The idea here is to cache one row of tiles for the source drawable and
one row for the destination drawable. But this is not enough because
there is no room in the cache for the bitmap tiles! There is a smaller
problem here too, when a selection is used, it is overkill to have a
cache for the full width of the drawable.

I've attached a patch with my modifications. I hope that someone
examines them critically and incorporates them into the distribution.
I prefer my approach because it should give better performance and it
keeps the code cleaner.

Other improvements are still possible. I expect that it should be
possible to rewrite the algorithm such that the tile cache contains only
3 tiles. From what I see the algorithm is the same in the horizontal and
vertical direction. The current implementation uses 3 extra buffer-rows
so when we add 3 extra buffer-columns it should be possible to rewrite
the algorithm so that it processes one tile at a time instead of a full
row.

Thanks for pointing out a pretty big performance problem with the
plug-in.

Greetings,
Ernst
[EMAIL PROTECTED]
 patchfile


[Gimp-developer] [patch] unbelievable speedup for bumpmap...

2001-04-06 Thread Kelly Martin

On Thu, 5 Apr 2001 23:55:57 +0200, Georg Acher [EMAIL PROTECTED] said:

Hi, I just looked into bumpmap.c and tried to figure out if it can
profit from blocking and played a bit with the code. It seems that
there is some major (performance) problem with the
gimp_pixel_rgn_get/set_row-calls in Gimp 1.2.1.

The original bumpmap took for my 1400*1400 image about 30s, when I
commented out one of the gimp_pixel_rgn_get/set_row() that access the
source or the destination image, the whole stuff took only about
2.4s, while the calculations were still done!

I experimented a bit, and found out, that apparently the "switch"
between get and set seems to be the problem. I changed the code to
get 32 rows, calculate them and the write the 32 back. It took only
3s overall, the resulting image is the same as with the 30s boiling
time.

I suspect this is because the libgimp tile cache is too small.
Libgimp allows a plugin to ask libgimp to cache a small (or not so
small, if it wants to burn memory) number of tiles in its own process
space so as to avoid fetches over the wire.  There's a call to
gimp_tile_cache_ntiles in bumpmap.c but it may simply not be large
enough, or there may be a problem with the tile caching code in
libgimp.  

Another thing to look at in my copious spare time. :)

Kelly
___
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer



Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

2001-04-06 Thread Kelly Martin

On Fri, 06 Apr 2001 14:55:32 +0200, Ernst Lippe [EMAIL PROTECTED] said:

Other improvements are still possible. I expect that it should be
possible to rewrite the algorithm such that the tile cache contains
only 3 tiles. From what I see the algorithm is the same in the
horizontal and vertical direction. The current implementation uses 3
extra buffer-rows so when we add 3 extra buffer-columns it should be
possible to rewrite the algorithm so that it processes one tile at a
time instead of a full row.

If the algorithm is pixel-by-pixel (each output pixel depends only on
exactly one input pixel from each region being iterated over, and
those regions are all the same size) there is absolutely no excuse not
to use the pixel region iterator, which will automagically minimize
tile accesses to the lowest number possible.  IIRC, the bumpmap
drawable need not be the same size as the input/output drawable, so a
pure region iterator cannot be used, but at least using the region
iterator for the input  output would limit the use of direct fetches
to the bumpmap drawable.

Kelly
___
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer



Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

2001-04-06 Thread Ernst Lippe

Kelly Martin wrote:
 
 If the algorithm is pixel-by-pixel (each output pixel depends only on
 exactly one input pixel from each region being iterated over, and
 those regions are all the same size) there is absolutely no excuse not
 to use the pixel region iterator, which will automagically minimize
 tile accesses to the lowest number possible. 
 IIRC, the bumpmap
 drawable need not be the same size as the input/output drawable, 

The sizes may be different indeed and there also an option for tiling
(repeating) the bitmap.

 so a
 pure region iterator cannot be used, but at least using the region
 iterator for the input  output would limit the use of direct fetches
 to the bumpmap drawable.
 

When the output pixel depends on neighboring pixels (e.g. I strongly
suspect that bumpmap uses a 3 x 3 neighborhood) the region iterator does
not work very well.
Sometime ago I wrote a piece of code that can handle neighborhoods
without fetching tiles twice. I needed this because I was applying
convolutions where I needed a neighborhood of say 20x20. Basically what
my code does is maintain a buffer that holds two rows of tiles plus some
extra for the vertical neighborhood. It handles those nasty details like
adding borders and fetching tiles. It works for my plugin but it must
still be documented and some missing features must still be implemented. 
There are several filters where it could potentially be used, so if
anyone is interested I could try to clean it up.

Greetings,
Ernst
___
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer



Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

2001-04-06 Thread Kelly Martin


When the output pixel depends on neighboring pixels (e.g. I strongly
suspect that bumpmap uses a 3 x 3 neighborhood) the region iterator
does not work very well.

Not on its own.  However, if you iterate the output region with the
pixel region iterator, and use a pixel fetcher like the one in
whirlpinch for the input sourcing, and there is at least some weak
relationship between input and output locality, the standard tile
cache should be enough to obtain decent efficiency.

Obviously, the more you know about your data access patterns, the more
efficient an access strategy you can develop.  

Kelly
___
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer



[Gimp-developer] [patch] unbelievable speedup for bumpmap...

2001-04-05 Thread Georg Acher

Hi,
I just looked into bumpmap.c and tried to figure out if it can profit from
blocking and played a bit with the code. It seems that there is some major
(performance) problem with the gimp_pixel_rgn_get/set_row-calls in Gimp
1.2.1.

The original bumpmap took for my 1400*1400 image about 30s, when I commented
out one of the gimp_pixel_rgn_get/set_row() that access the source or the
destination image, the whole stuff took only about 2.4s, while the
calculations were still done! 

I experimented a bit, and found out, that apparently the "switch" between
get and set seems to be the problem. I changed the code to get 32 rows,
calculate them and the write the 32 back. It took only 3s overall, the
resulting image is the same as with the 30s boiling time. 

That's a speedup of 10!

Can someone please try the patch and confirm that I'm not dreaming?
-- 
 Georg Acher, [EMAIL PROTECTED] 
 http://www.in.tum.de/~acher/
  "Oh no, not again !" The bowl of petunias  


--- bumpmap.c.org   Thu Apr  5 21:01:03 2001
+++ bumpmap.c   Thu Apr  5 23:26:39 2001
@@ -500,6 +500,7 @@
   gimp_drawable_detach (drawable);
 }
 
+#define BLOCKS 32
 static void
 bumpmap (void)
 {
@@ -509,8 +510,8 @@
   gint  bm_width, bm_height, bm_bpp, bm_has_alpha;
   gint  yofs1, yofs2, yofs3;
   guchar   *bm_row1, *bm_row2, *bm_row3, *bm_tmprow;
-  guchar   *src_row, *dest_row;
-  gint  y;
+  guchar   *src_row[BLOCKS], *dest_row[BLOCKS];
+  gint  y,n;
   gint  progress;
   gint  tmp;
 
@@ -551,8 +552,11 @@
   bm_row2 = g_new (guchar, bm_width * bm_bpp);
   bm_row3 = g_new (guchar, bm_width * bm_bpp);
 
-  src_row  = g_new (guchar, sel_width * img_bpp);
-  dest_row = g_new (guchar, sel_width * img_bpp);
+  for(n=0;nBLOCKS;n++)
+  {
+ src_row[n]  = g_new (guchar, sel_width * img_bpp);
+ dest_row[n] = g_new (guchar, sel_width * img_bpp);
+  }
 
   /* Initialize pixel regions */
   gimp_pixel_rgn_init (src_rgn, drawable,
@@ -576,32 +580,40 @@
 
   progress = 0;
 
-  for (y = sel_y1; y  sel_y2; y++)
+  for (y = sel_y1; y  sel_y2; y+=BLOCKS)
 {
-  gimp_pixel_rgn_get_row (src_rgn, src_row, sel_x1, y, sel_width);
-
-  bumpmap_row (src_row, dest_row, sel_width, img_bpp, img_has_alpha,
-  bm_row1, bm_row2, bm_row3, bm_width, bmvals.xofs,
-  bmvals.tiled, 
-  y == CLAMP (y, - bmvals.yofs, - bmvals.yofs + bm_height),
-  params);
-
-  gimp_pixel_rgn_set_row (dest_rgn, dest_row, sel_x1, y, sel_width);
-
-  /* Next line */
-
-  bm_tmprow = bm_row1;
-  bm_row1   = bm_row2;
-  bm_row2   = bm_row3;
-  bm_row3   = bm_tmprow;
-   
-  if (++yofs3 == bm_height)
-   yofs3 = 0;
-
-  gimp_pixel_rgn_get_row (bm_rgn, bm_row3, 0, yofs3, bm_width);
-  bumpmap_convert_row (bm_row3, bm_width, bm_bpp, bm_has_alpha, params.lut);
-
-  gimp_progress_update ((double) ++progress / sel_height);
+   
+   for(n=0;nBLOCKS  (y+n) != sel_y2;n++)
+   { 
+   gimp_pixel_rgn_get_row (src_rgn, src_row[n], sel_x1, y+n, 
+sel_width);
+   
+   bumpmap_row (src_row[n], dest_row[n], sel_width, img_bpp, 
+img_has_alpha,
+bm_row1, bm_row2, bm_row3, bm_width, bmvals.xofs,
+bmvals.tiled, 
+(y+n) == CLAMP (y+n, - bmvals.yofs, - bmvals.yofs + 
+bm_height),
+params);
+   
+   /* Next line */
+  
+   bm_tmprow = bm_row1;
+   bm_row1   = bm_row2;
+   bm_row2   = bm_row3;
+   bm_row3   = bm_tmprow;  
+   
+   if (++yofs3 == bm_height)
+   yofs3 = 0;
+   
+   gimp_pixel_rgn_get_row (bm_rgn, bm_row3, 0, yofs3, bm_width);
+   bumpmap_convert_row (bm_row3, bm_width, bm_bpp, bm_has_alpha, 
+params.lut);
+   
+   }
+
+   for(n=0;nBLOCKS  (y+n) != sel_y2;n++)
+   {
+   gimp_pixel_rgn_set_row (dest_rgn, dest_row[n], sel_x1, y+n, 
+sel_width);
+   }
+   gimp_progress_update ((double) progress / sel_height);
+   progress+=BLOCKS;
 }
 
   /* Done */
@@ -609,9 +621,11 @@
   g_free (bm_row1);
   g_free (bm_row2);
   g_free (bm_row3);
-  g_free (src_row);
-  g_free (dest_row);
-
+  for(n=0;nBLOCKS;n++)
+  {
+ g_free (src_row[n]);
+ g_free (dest_row[n]);
+  }
   if (bm_drawable != drawable)
 gimp_drawable_detach (bm_drawable);