Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...
On Friday, 6 Apr 2001, Ernst Lippe wrote: > Sometime ago I wrote a piece of code that can handle neighborhoods > without fetching tiles twice. I needed this because I was applying > convolutions where I needed a neighborhood of say 20x20. Basically what > my code does is maintain a buffer that holds two rows of tiles plus some > extra for the vertical neighborhood. It handles those nasty details like > adding borders and fetching tiles. It works for my plugin but it must > still be documented and some missing features must still be implemented. > There are several filters where it could potentially be used, so if > anyone is interested I could try to clean it up. Actually, the core could do with a tile convolution, since currently it copies data into a tempbuf before convolving it. This makes some tools more effificient (or incorrect) eg iscissors. Austin ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...
>When the output pixel depends on neighboring pixels (e.g. I strongly >suspect that bumpmap uses a 3 x 3 neighborhood) the region iterator >does not work very well. Not on its own. However, if you iterate the output region with the pixel region iterator, and use a pixel fetcher like the one in whirlpinch for the input sourcing, and there is at least some weak relationship between input and output locality, the standard tile cache should be enough to obtain decent efficiency. Obviously, the more you know about your data access patterns, the more efficient an access strategy you can develop. Kelly ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...
Kelly Martin wrote: > > If the algorithm is pixel-by-pixel (each output pixel depends only on > exactly one input pixel from each region being iterated over, and > those regions are all the same size) there is absolutely no excuse not > to use the pixel region iterator, which will automagically minimize > tile accesses to the lowest number possible. > IIRC, the bumpmap > drawable need not be the same size as the input/output drawable, The sizes may be different indeed and there also an option for tiling (repeating) the bitmap. > so a > pure region iterator cannot be used, but at least using the region > iterator for the input & output would limit the use of direct fetches > to the bumpmap drawable. > When the output pixel depends on neighboring pixels (e.g. I strongly suspect that bumpmap uses a 3 x 3 neighborhood) the region iterator does not work very well. Sometime ago I wrote a piece of code that can handle neighborhoods without fetching tiles twice. I needed this because I was applying convolutions where I needed a neighborhood of say 20x20. Basically what my code does is maintain a buffer that holds two rows of tiles plus some extra for the vertical neighborhood. It handles those nasty details like adding borders and fetching tiles. It works for my plugin but it must still be documented and some missing features must still be implemented. There are several filters where it could potentially be used, so if anyone is interested I could try to clean it up. Greetings, Ernst ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...
On Fri, 06 Apr 2001 14:55:32 +0200, Ernst Lippe <[EMAIL PROTECTED]> said: >Other improvements are still possible. I expect that it should be >possible to rewrite the algorithm such that the tile cache contains >only 3 tiles. From what I see the algorithm is the same in the >horizontal and vertical direction. The current implementation uses 3 >extra buffer-rows so when we add 3 extra buffer-columns it should be >possible to rewrite the algorithm so that it processes one tile at a >time instead of a full row. If the algorithm is pixel-by-pixel (each output pixel depends only on exactly one input pixel from each region being iterated over, and those regions are all the same size) there is absolutely no excuse not to use the pixel region iterator, which will automagically minimize tile accesses to the lowest number possible. IIRC, the bumpmap drawable need not be the same size as the input/output drawable, so a pure region iterator cannot be used, but at least using the region iterator for the input & output would limit the use of direct fetches to the bumpmap drawable. Kelly ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
[Gimp-developer] [patch] unbelievable speedup for bumpmap...
On Thu, 5 Apr 2001 23:55:57 +0200, Georg Acher <[EMAIL PROTECTED]> said: >Hi, I just looked into bumpmap.c and tried to figure out if it can >profit from blocking and played a bit with the code. It seems that >there is some major (performance) problem with the >gimp_pixel_rgn_get/set_row-calls in Gimp 1.2.1. >The original bumpmap took for my 1400*1400 image about 30s, when I >commented out one of the gimp_pixel_rgn_get/set_row() that access the >source or the destination image, the whole stuff took only about >2.4s, while the calculations were still done! >I experimented a bit, and found out, that apparently the "switch" >between get and set seems to be the problem. I changed the code to >get 32 rows, calculate them and the write the 32 back. It took only >3s overall, the resulting image is the same as with the 30s boiling >time. I suspect this is because the libgimp tile cache is too small. Libgimp allows a plugin to ask libgimp to cache a small (or not so small, if it wants to burn memory) number of tiles in its own process space so as to avoid fetches over the wire. There's a call to gimp_tile_cache_ntiles in bumpmap.c but it may simply not be large enough, or there may be a problem with the tile caching code in libgimp. Another thing to look at in my copious spare time. :) Kelly ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...
The speedup that you see is probably mainly caused by better caching. There is a bug in the tile cache size for the plugin. The cache is under most circumstances too small and this means that every requested tile is not in the cache and must be transmitted from the main gimp process (SLOW). Effectively this means that every tile of the drawable is read and written gimp_tile_height (normally 64) times. Your modification is better, I expect that every tile is read and written gimp_tile_height/BLOCKS = 2 times (but when a selection is used it will normally be 3 times). This could be somewhat improved by setting BLOCKS = gimp_tile_height, when the selection starts at a y-position that is a multiple of gimp_tile_height (e.g. when there is no selection) every tile will probably be read/written once, but when the selection starts at a different position, every tile will still be read/written twice. I used a different approach that gives speedups similar to yours but that should read/write every tile only once. The solution is pretty simple: just enlarge the cache size. What is the problem with the cache size? The current code uses: gimp_tile_cache_ntiles (2 * (drawable->width + gimp_tile_width () - 1) / gimp_tile_width ()); The idea here is to cache one row of tiles for the source drawable and one row for the destination drawable. But this is not enough because there is no room in the cache for the bitmap tiles! There is a smaller problem here too, when a selection is used, it is overkill to have a cache for the full width of the drawable. I've attached a patch with my modifications. I hope that someone examines them critically and incorporates them into the distribution. I prefer my approach because it should give better performance and it keeps the code cleaner. Other improvements are still possible. I expect that it should be possible to rewrite the algorithm such that the tile cache contains only 3 tiles. From what I see the algorithm is the same in the horizontal and vertical direction. The current implementation uses 3 extra buffer-rows so when we add 3 extra buffer-columns it should be possible to rewrite the algorithm so that it processes one tile at a time instead of a full row. Thanks for pointing out a pretty big performance problem with the plug-in. Greetings, Ernst <[EMAIL PROTECTED]> patchfile
Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...
Oops - I missed out some vital pixel region initialisation. On Friday, 6 Apr 2001, Austin Donnelly wrote: > gimp_drawable_mask_bounds (drawable->id, &x1, &y1, &x2, &y2); > > for (y = y1; y < y2; y += tile_width - (y % tile_width)) > { > for (x = x1; x < x2; x += tile_width - (x % tile_width)) > { /* set up the source and dest regions */ gimp_pixel_rgn_init (&src_rgn, drawable, x, y, x_step, y_step, FALSE/*dirty*/, FALSE/*shadow*/); gimp_pixel_rgn_init (&dest_rgn, drawable, x, y, x_step, y_step, TRUE/*dirty*/, TRUE/*shadow*/); > /* process the image in tile-sized regions */ > for (pr = gimp_pixel_rgns_register (2, &src_rgn, &dest_rgn); >pr != NULL; >pr = gimp_pixel_rgns_process (pr)) > { > src_row = src_rgn.data; > dest_row = dest_rgn.data; > > for (row = 0; row < src_rgn.h; row++) > { > src = src_row; > dest = dest_row; > > for (col = 0; col < src_rgn.w; col++) > { > > /* do stuff with src[0..bpp] and dest[0..bpp] */ > > src += src_rgn.bpp; > dest += dest_rgn.bpp; > } > > src_row += src_rgn.rowstride; > dest_row += dest_rgn.rowstride; > } > } > } > } ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...
On Thursday, 5 Apr 2001, Georg Acher wrote: > I just looked into bumpmap.c and tried to figure out if it can profit from > blocking and played a bit with the code. It seems that there is some major > (performance) problem with the gimp_pixel_rgn_get/set_row-calls in Gimp > 1.2.1. > > The original bumpmap took for my 1400*1400 image about 30s, when I commented > out one of the gimp_pixel_rgn_get/set_row() that access the source or the > destination image, the whole stuff took only about 2.4s, while the > calculations were still done! > > I experimented a bit, and found out, that apparently the "switch" between > get and set seems to be the problem. I changed the code to get 32 rows, > calculate them and the write the 32 back. It took only 3s overall, the > resulting image is the same as with the 30s boiling time. Have a look at the gimp_pixel_rgns_register() and gimp_pixel_rgns_process() functions. They do similar blocking on a per-tile basis. Most plugins should be using them, but ones with more complex access patterns (eg accessing pixels across tile boundaries) probably won't.' Typical use is something like this: gimp_drawable_mask_bounds (drawable->id, &x1, &y1, &x2, &y2); for (y = y1; y < y2; y += tile_width - (y % tile_width)) { for (x = x1; x < x2; x += tile_width - (x % tile_width)) { /* process the image in tile-sized regions */ for (pr = gimp_pixel_rgns_register (2, &src_rgn, &dest_rgn); pr != NULL; pr = gimp_pixel_rgns_process (pr)) { src_row = src_rgn.data; dest_row = dest_rgn.data; for (row = 0; row < src_rgn.h; row++) { src = src_row; dest = dest_row; for (col = 0; col < src_rgn.w; col++) { /* do stuff with src[0..bpp] and dest[0..bpp] */ src += src_rgn.bpp; dest += dest_rgn.bpp; } src_row += src_rgn.rowstride; dest_row += dest_rgn.rowstride; } } } } Of course this precise form will only work where there's a one-to-one correspondence between input and output pixels. I suspect whirl-pinch has more complex access patterns, in which case you'll need to be more creative with the loop structure. Austin ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
[Gimp-developer] [patch] unbelievable speedup for bumpmap...
Hi, I just looked into bumpmap.c and tried to figure out if it can profit from blocking and played a bit with the code. It seems that there is some major (performance) problem with the gimp_pixel_rgn_get/set_row-calls in Gimp 1.2.1. The original bumpmap took for my 1400*1400 image about 30s, when I commented out one of the gimp_pixel_rgn_get/set_row() that access the source or the destination image, the whole stuff took only about 2.4s, while the calculations were still done! I experimented a bit, and found out, that apparently the "switch" between get and set seems to be the problem. I changed the code to get 32 rows, calculate them and the write the 32 back. It took only 3s overall, the resulting image is the same as with the 30s boiling time. That's a speedup of 10! Can someone please try the patch and confirm that I'm not dreaming? -- Georg Acher, [EMAIL PROTECTED] http://www.in.tum.de/~acher/ "Oh no, not again !" The bowl of petunias --- bumpmap.c.org Thu Apr 5 21:01:03 2001 +++ bumpmap.c Thu Apr 5 23:26:39 2001 @@ -500,6 +500,7 @@ gimp_drawable_detach (drawable); } +#define BLOCKS 32 static void bumpmap (void) { @@ -509,8 +510,8 @@ gint bm_width, bm_height, bm_bpp, bm_has_alpha; gint yofs1, yofs2, yofs3; guchar *bm_row1, *bm_row2, *bm_row3, *bm_tmprow; - guchar *src_row, *dest_row; - gint y; + guchar *src_row[BLOCKS], *dest_row[BLOCKS]; + gint y,n; gint progress; gint tmp; @@ -551,8 +552,11 @@ bm_row2 = g_new (guchar, bm_width * bm_bpp); bm_row3 = g_new (guchar, bm_width * bm_bpp); - src_row = g_new (guchar, sel_width * img_bpp); - dest_row = g_new (guchar, sel_width * img_bpp); + for(n=0;n