date:20010406

Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

2001-04-06 Thread Kelly Martin



>When the output pixel depends on neighboring pixels (e.g. I strongly
>suspect that bumpmap uses a 3 x 3 neighborhood) the region iterator
>does not work very well.

Not on its own.  However, if you iterate the output region with the
pixel region iterator, and use a pixel fetcher like the one in
whirlpinch for the input sourcing, and there is at least some weak
relationship between input and output locality, the standard tile
cache should be enough to obtain decent efficiency.

Obviously, the more you know about your data access patterns, the more
efficient an access strategy you can develop.  

Kelly
___
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer

Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

2001-04-06 Thread Ernst Lippe

Kelly Martin wrote:
> 
> If the algorithm is pixel-by-pixel (each output pixel depends only on
> exactly one input pixel from each region being iterated over, and
> those regions are all the same size) there is absolutely no excuse not
> to use the pixel region iterator, which will automagically minimize
> tile accesses to the lowest number possible. 
> IIRC, the bumpmap
> drawable need not be the same size as the input/output drawable, 

The sizes may be different indeed and there also an option for tiling
(repeating) the bitmap.

> so a
> pure region iterator cannot be used, but at least using the region
> iterator for the input & output would limit the use of direct fetches
> to the bumpmap drawable.
> 

When the output pixel depends on neighboring pixels (e.g. I strongly
suspect that bumpmap uses a 3 x 3 neighborhood) the region iterator does
not work very well.
Sometime ago I wrote a piece of code that can handle neighborhoods
without fetching tiles twice. I needed this because I was applying
convolutions where I needed a neighborhood of say 20x20. Basically what
my code does is maintain a buffer that holds two rows of tiles plus some
extra for the vertical neighborhood. It handles those nasty details like
adding borders and fetching tiles. It works for my plugin but it must
still be documented and some missing features must still be implemented. 
There are several filters where it could potentially be used, so if
anyone is interested I could try to clean it up.

Greetings,
Ernst
___
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer

Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

2001-04-06 Thread Kelly Martin


On Fri, 06 Apr 2001 14:55:32 +0200, Ernst Lippe <[EMAIL PROTECTED]> said:

>Other improvements are still possible. I expect that it should be
>possible to rewrite the algorithm such that the tile cache contains
>only 3 tiles. From what I see the algorithm is the same in the
>horizontal and vertical direction. The current implementation uses 3
>extra buffer-rows so when we add 3 extra buffer-columns it should be
>possible to rewrite the algorithm so that it processes one tile at a
>time instead of a full row.

If the algorithm is pixel-by-pixel (each output pixel depends only on
exactly one input pixel from each region being iterated over, and
those regions are all the same size) there is absolutely no excuse not
to use the pixel region iterator, which will automagically minimize
tile accesses to the lowest number possible.  IIRC, the bumpmap
drawable need not be the same size as the input/output drawable, so a
pure region iterator cannot be used, but at least using the region
iterator for the input & output would limit the use of direct fetches
to the bumpmap drawable.

Kelly
___
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer

Re: [Gimp-developer] [patch] Major speedup for whirl&pinch plugin

2001-04-06 Thread Georg Acher


On Fri, Apr 06, 2001 at 03:58:10PM +0200, Ernst Lippe wrote:
 
> > I don't know how large a tile is, but since IMHO the major impact of
> > blocking seems to come from the CPU cache, I suspect that is too big for
> > older CPUs. I have done the whirl&pinch blocking thing about three years ago
> > (and forgot to send the patch), and tried it on an Alpha21164 and a P5.
> 
> I think you're looking in the wrong direction here. Similar to the
> bumpmap (see my other message) I strongly suspect that the tile-cache is
> too small.

For bumpmap, it is of course a tile cache problem, for whirl&pinch I'm not
sure, since the performance boost differed very much on the Alpha vs. P5
vs. Athlon depending on the blocking size. So I assume that the CPU cache
plays a major part.

> You can check this for yourself by profiling your code. I expect that
> there is hardly any difference in processor time between your version
> and the original but that there is an important difference in the number
> of calls to gimp_tile_get and gimp_tile_put. (Sorry, did not try that
> myself).

Ok, I will try that. I have no big knowledge of Gimp's internals, but that
seems preety easy to find out...

-- 
 Georg Acher, [EMAIL PROTECTED] 
 http://www.in.tum.de/~acher/
  "Oh no, not again !" The bowl of petunias  
___
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer

[Gimp-developer] [patch] unbelievable speedup for bumpmap...

2001-04-06 Thread Kelly Martin


On Thu, 5 Apr 2001 23:55:57 +0200, Georg Acher <[EMAIL PROTECTED]> said:

>Hi, I just looked into bumpmap.c and tried to figure out if it can
>profit from blocking and played a bit with the code. It seems that
>there is some major (performance) problem with the
>gimp_pixel_rgn_get/set_row-calls in Gimp 1.2.1.

>The original bumpmap took for my 1400*1400 image about 30s, when I
>commented out one of the gimp_pixel_rgn_get/set_row() that access the
>source or the destination image, the whole stuff took only about
>2.4s, while the calculations were still done!

>I experimented a bit, and found out, that apparently the "switch"
>between get and set seems to be the problem. I changed the code to
>get 32 rows, calculate them and the write the 32 back. It took only
>3s overall, the resulting image is the same as with the 30s boiling
>time.

I suspect this is because the libgimp tile cache is too small.
Libgimp allows a plugin to ask libgimp to cache a small (or not so
small, if it wants to burn memory) number of tiles in its own process
space so as to avoid fetches over the wire.  There's a call to
gimp_tile_cache_ntiles in bumpmap.c but it may simply not be large
enough, or there may be a problem with the tile caching code in
libgimp.  

Another thing to look at in my copious spare time. :)

Kelly
___
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer

Re: [Gimp-developer] [patch] Major speedup for whirl&pinch plugin

2001-04-06 Thread Ernst Lippe

I think this is yet another tile-cache problem.

Georg Acher wrote:
> 
> On Thu, Apr 05, 2001 at 12:45:52PM -0500, Kelly Martin wrote:
> 
> > Hm, it does not.  The issue with whirlpinch is that there's only a
> > weak locality relationship between destionation pixels (which are
> > iterated across the image) and source pixels (which are fetched with
> > the pixel fetcher).  I haven't looked too closely at your blocking
> 
> That is right, but destination and source for themselves have good locality
> (ie. the next pixel isn't 500 pixels away from the last).
This is an important observation. 

> 
> > patch, but I suspect that much the same improvement would be had by
> > using a pixel region (which respects tiles) to iterate across the
> > destination region.
I agree, but the current algorithm writes two rows (top and bottom) at
the same time, so it is not immediately possible to use the standard
pixel region iterator.

> 
> That is possible... Is there a filter that definitely uses the pixel region
> stuff? Most filters I have seen only use one row, which may not be enough
> "locally", since it uses only one pixel but has to fill a whole cacheline
> (4/8 pixel). I will try whether I can speed up bumpmap also, since this
> takes also the magical 30s and is much more often used in scripts than the
> whirl&pinch module.
> 
> I don't know how large a tile is, but since IMHO the major impact of
> blocking seems to come from the CPU cache, I suspect that is too big for
> older CPUs. I have done the whirl&pinch blocking thing about three years ago
> (and forgot to send the patch), and tried it on an Alpha21164 and a P5.

I think you're looking in the wrong direction here. Similar to the
bumpmap (see my other message) I strongly suspect that the tile-cache is
too small.
You can check this for yourself by profiling your code. I expect that
there is hardly any difference in processor time between your version
and the original but that there is an important difference in the number
of calls to gimp_tile_get and gimp_tile_put. (Sorry, did not try that
myself).
Requesting tiles from the main Gimp process is pretty expensive, it
involves several context switches and a lot of copying. This is orders
of magnitude slower than copying data within the plugin process from the
tile cache to some other buffer. The whirlpinch plugin uses a cache size
that is only big enough to handle the output tiles, there is no room for
input tiles. This leads to a series of cache misses and horrible
performance. Your modifications improve the cache-hit ratio because it
processes larger chunks of data.
However, it is not a fundamental solution. The easiest short time
solution is to increase the cache size with the number of input tiles
that are needed to compute one row of the output (I don't know if this
can be easily determined).
The more fundamental solution is to rewrite the algorithm so that it
works on a tile by tile basis. In that case the minimum cache size is
only 2 (for the output tiles) plus the number of input tiles that are
needed to compute one output tile. Of course it is better to use a
somewhat larger cache size because adjacent output tiles can use the
same input tiles.

Greetings,

Ernst
<[EMAIL PROTECTED]>
___
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer

Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

2001-04-06 Thread Ernst Lippe


The speedup that you see is probably mainly caused by better caching.
There is a bug in the tile cache size for the plugin.
The cache is under most circumstances too small and this means that
every requested tile is not in the cache and must be transmitted from
the main gimp process (SLOW).
Effectively this means that every tile of the drawable is read and
written gimp_tile_height (normally 64) times. Your modification is
better, I expect that every tile is read and written
gimp_tile_height/BLOCKS = 2 times (but when a selection is used it will
normally be 3 times). 
This could be somewhat improved by setting BLOCKS = gimp_tile_height,
when the selection starts at a y-position that is a multiple of
gimp_tile_height (e.g. when there is no selection) every tile will
probably be read/written once, but when the selection starts at a
different position, every tile will still be read/written twice.

I used a different approach that gives speedups similar to yours but
that should read/write every tile only once. The solution is pretty
simple: just enlarge the cache size.
What is the problem with the cache size? The current code uses:
  gimp_tile_cache_ntiles (2 * (drawable->width +
   gimp_tile_width () - 1) /
  gimp_tile_width ());
The idea here is to cache one row of tiles for the source drawable and
one row for the destination drawable. But this is not enough because
there is no room in the cache for the bitmap tiles! There is a smaller
problem here too, when a selection is used, it is overkill to have a
cache for the full width of the drawable.

I've attached a patch with my modifications. I hope that someone
examines them critically and incorporates them into the distribution.
I prefer my approach because it should give better performance and it
keeps the code cleaner.

Other improvements are still possible. I expect that it should be
possible to rewrite the algorithm such that the tile cache contains only
3 tiles. From what I see the algorithm is the same in the horizontal and
vertical direction. The current implementation uses 3 extra buffer-rows
so when we add 3 extra buffer-columns it should be possible to rewrite
the algorithm so that it processes one tile at a time instead of a full
row.

Thanks for pointing out a pretty big performance problem with the
plug-in.

Greetings,
Ernst
<[EMAIL PROTECTED]>
 patchfile

Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

2001-04-06 Thread Austin Donnelly


Oops - I missed out some vital pixel region initialisation.


On Friday, 6 Apr 2001, Austin Donnelly wrote:

> gimp_drawable_mask_bounds (drawable->id, &x1, &y1, &x2, &y2);
> 
> for (y = y1; y < y2; y += tile_width - (y % tile_width))
> {
> for (x = x1; x < x2; x += tile_width - (x % tile_width))
> {

  /* set up the source and dest regions */
  gimp_pixel_rgn_init (&src_rgn, drawable, x, y, x_step, y_step,
   FALSE/*dirty*/, FALSE/*shadow*/);

  gimp_pixel_rgn_init (&dest_rgn, drawable, x, y, x_step, y_step,
   TRUE/*dirty*/, TRUE/*shadow*/);


>   /* process the image in tile-sized regions */
>   for (pr = gimp_pixel_rgns_register (2, &src_rgn, &dest_rgn);
>pr != NULL;
>pr = gimp_pixel_rgns_process (pr))
>   {
>   src_row  = src_rgn.data;
>   dest_row = dest_rgn.data;
> 
>   for (row = 0; row < src_rgn.h; row++)
>   {
>   src  = src_row;
>   dest = dest_row;
> 
>   for (col = 0; col < src_rgn.w; col++)
>   {
> 
>   /* do stuff with src[0..bpp] and dest[0..bpp] */
> 
>   src  += src_rgn.bpp;
>   dest += dest_rgn.bpp;
>   }
> 
>   src_row  += src_rgn.rowstride;
>   dest_row += dest_rgn.rowstride;
>   }
>   }
> }
> }
___
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer

Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

2001-04-06 Thread Austin Donnelly


On Thursday, 5 Apr 2001, Georg Acher wrote:

> I just looked into bumpmap.c and tried to figure out if it can profit from
> blocking and played a bit with the code. It seems that there is some major
> (performance) problem with the gimp_pixel_rgn_get/set_row-calls in Gimp
> 1.2.1.
> 
> The original bumpmap took for my 1400*1400 image about 30s, when I commented
> out one of the gimp_pixel_rgn_get/set_row() that access the source or the
> destination image, the whole stuff took only about 2.4s, while the
> calculations were still done! 
> 
> I experimented a bit, and found out, that apparently the "switch" between
> get and set seems to be the problem. I changed the code to get 32 rows,
> calculate them and the write the 32 back. It took only 3s overall, the
> resulting image is the same as with the 30s boiling time. 

Have a look at the gimp_pixel_rgns_register() and
gimp_pixel_rgns_process() functions.  They do similar blocking on a
per-tile basis.

Most plugins should be using them, but ones with more complex access
patterns (eg accessing pixels across tile boundaries) probably won't.'

Typical use is something like this:


gimp_drawable_mask_bounds (drawable->id, &x1, &y1, &x2, &y2);

for (y = y1; y < y2; y += tile_width - (y % tile_width))
{
for (x = x1; x < x2; x += tile_width - (x % tile_width))
{
/* process the image in tile-sized regions */
for (pr = gimp_pixel_rgns_register (2, &src_rgn, &dest_rgn);
 pr != NULL;
 pr = gimp_pixel_rgns_process (pr))
{
src_row  = src_rgn.data;
dest_row = dest_rgn.data;

for (row = 0; row < src_rgn.h; row++)
{
src  = src_row;
dest = dest_row;

for (col = 0; col < src_rgn.w; col++)
{

/* do stuff with src[0..bpp] and dest[0..bpp] */

src  += src_rgn.bpp;
dest += dest_rgn.bpp;
}

src_row  += src_rgn.rowstride;
dest_row += dest_rgn.rowstride;
}
}
}
}

Of course this precise form will only work where there's a one-to-one
correspondence between input and output pixels.  I suspect whirl-pinch
has more complex access patterns, in which case you'll need to be more
creative with the loop structure.

Austin
___
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer

Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

Re: [Gimp-developer] [patch] Major speedup for whirl&pinch plugin

[Gimp-developer] [patch] unbelievable speedup for bumpmap...

Re: [Gimp-developer] [patch] Major speedup for whirl&pinch plugin

Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...

9 matches

Site Navigation

Mail list logo

Footer information