Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...
>When the output pixel depends on neighboring pixels (e.g. I strongly >suspect that bumpmap uses a 3 x 3 neighborhood) the region iterator >does not work very well. Not on its own. However, if you iterate the output region with the pixel region iterator, and use a pixel fetcher like the one in whirlpinch for the input sourcing, and there is at least some weak relationship between input and output locality, the standard tile cache should be enough to obtain decent efficiency. Obviously, the more you know about your data access patterns, the more efficient an access strategy you can develop. Kelly ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...
Kelly Martin wrote: > > If the algorithm is pixel-by-pixel (each output pixel depends only on > exactly one input pixel from each region being iterated over, and > those regions are all the same size) there is absolutely no excuse not > to use the pixel region iterator, which will automagically minimize > tile accesses to the lowest number possible. > IIRC, the bumpmap > drawable need not be the same size as the input/output drawable, The sizes may be different indeed and there also an option for tiling (repeating) the bitmap. > so a > pure region iterator cannot be used, but at least using the region > iterator for the input & output would limit the use of direct fetches > to the bumpmap drawable. > When the output pixel depends on neighboring pixels (e.g. I strongly suspect that bumpmap uses a 3 x 3 neighborhood) the region iterator does not work very well. Sometime ago I wrote a piece of code that can handle neighborhoods without fetching tiles twice. I needed this because I was applying convolutions where I needed a neighborhood of say 20x20. Basically what my code does is maintain a buffer that holds two rows of tiles plus some extra for the vertical neighborhood. It handles those nasty details like adding borders and fetching tiles. It works for my plugin but it must still be documented and some missing features must still be implemented. There are several filters where it could potentially be used, so if anyone is interested I could try to clean it up. Greetings, Ernst ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...
On Fri, 06 Apr 2001 14:55:32 +0200, Ernst Lippe <[EMAIL PROTECTED]> said: >Other improvements are still possible. I expect that it should be >possible to rewrite the algorithm such that the tile cache contains >only 3 tiles. From what I see the algorithm is the same in the >horizontal and vertical direction. The current implementation uses 3 >extra buffer-rows so when we add 3 extra buffer-columns it should be >possible to rewrite the algorithm so that it processes one tile at a >time instead of a full row. If the algorithm is pixel-by-pixel (each output pixel depends only on exactly one input pixel from each region being iterated over, and those regions are all the same size) there is absolutely no excuse not to use the pixel region iterator, which will automagically minimize tile accesses to the lowest number possible. IIRC, the bumpmap drawable need not be the same size as the input/output drawable, so a pure region iterator cannot be used, but at least using the region iterator for the input & output would limit the use of direct fetches to the bumpmap drawable. Kelly ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
Re: [Gimp-developer] [patch] Major speedup for whirl&pinch plugin
On Fri, Apr 06, 2001 at 03:58:10PM +0200, Ernst Lippe wrote: > > I don't know how large a tile is, but since IMHO the major impact of > > blocking seems to come from the CPU cache, I suspect that is too big for > > older CPUs. I have done the whirl&pinch blocking thing about three years ago > > (and forgot to send the patch), and tried it on an Alpha21164 and a P5. > > I think you're looking in the wrong direction here. Similar to the > bumpmap (see my other message) I strongly suspect that the tile-cache is > too small. For bumpmap, it is of course a tile cache problem, for whirl&pinch I'm not sure, since the performance boost differed very much on the Alpha vs. P5 vs. Athlon depending on the blocking size. So I assume that the CPU cache plays a major part. > You can check this for yourself by profiling your code. I expect that > there is hardly any difference in processor time between your version > and the original but that there is an important difference in the number > of calls to gimp_tile_get and gimp_tile_put. (Sorry, did not try that > myself). Ok, I will try that. I have no big knowledge of Gimp's internals, but that seems preety easy to find out... -- Georg Acher, [EMAIL PROTECTED] http://www.in.tum.de/~acher/ "Oh no, not again !" The bowl of petunias ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
[Gimp-developer] [patch] unbelievable speedup for bumpmap...
On Thu, 5 Apr 2001 23:55:57 +0200, Georg Acher <[EMAIL PROTECTED]> said: >Hi, I just looked into bumpmap.c and tried to figure out if it can >profit from blocking and played a bit with the code. It seems that >there is some major (performance) problem with the >gimp_pixel_rgn_get/set_row-calls in Gimp 1.2.1. >The original bumpmap took for my 1400*1400 image about 30s, when I >commented out one of the gimp_pixel_rgn_get/set_row() that access the >source or the destination image, the whole stuff took only about >2.4s, while the calculations were still done! >I experimented a bit, and found out, that apparently the "switch" >between get and set seems to be the problem. I changed the code to >get 32 rows, calculate them and the write the 32 back. It took only >3s overall, the resulting image is the same as with the 30s boiling >time. I suspect this is because the libgimp tile cache is too small. Libgimp allows a plugin to ask libgimp to cache a small (or not so small, if it wants to burn memory) number of tiles in its own process space so as to avoid fetches over the wire. There's a call to gimp_tile_cache_ntiles in bumpmap.c but it may simply not be large enough, or there may be a problem with the tile caching code in libgimp. Another thing to look at in my copious spare time. :) Kelly ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
Re: [Gimp-developer] [patch] Major speedup for whirl&pinch plugin
I think this is yet another tile-cache problem. Georg Acher wrote: > > On Thu, Apr 05, 2001 at 12:45:52PM -0500, Kelly Martin wrote: > > > Hm, it does not. The issue with whirlpinch is that there's only a > > weak locality relationship between destionation pixels (which are > > iterated across the image) and source pixels (which are fetched with > > the pixel fetcher). I haven't looked too closely at your blocking > > That is right, but destination and source for themselves have good locality > (ie. the next pixel isn't 500 pixels away from the last). This is an important observation. > > > patch, but I suspect that much the same improvement would be had by > > using a pixel region (which respects tiles) to iterate across the > > destination region. I agree, but the current algorithm writes two rows (top and bottom) at the same time, so it is not immediately possible to use the standard pixel region iterator. > > That is possible... Is there a filter that definitely uses the pixel region > stuff? Most filters I have seen only use one row, which may not be enough > "locally", since it uses only one pixel but has to fill a whole cacheline > (4/8 pixel). I will try whether I can speed up bumpmap also, since this > takes also the magical 30s and is much more often used in scripts than the > whirl&pinch module. > > I don't know how large a tile is, but since IMHO the major impact of > blocking seems to come from the CPU cache, I suspect that is too big for > older CPUs. I have done the whirl&pinch blocking thing about three years ago > (and forgot to send the patch), and tried it on an Alpha21164 and a P5. I think you're looking in the wrong direction here. Similar to the bumpmap (see my other message) I strongly suspect that the tile-cache is too small. You can check this for yourself by profiling your code. I expect that there is hardly any difference in processor time between your version and the original but that there is an important difference in the number of calls to gimp_tile_get and gimp_tile_put. (Sorry, did not try that myself). Requesting tiles from the main Gimp process is pretty expensive, it involves several context switches and a lot of copying. This is orders of magnitude slower than copying data within the plugin process from the tile cache to some other buffer. The whirlpinch plugin uses a cache size that is only big enough to handle the output tiles, there is no room for input tiles. This leads to a series of cache misses and horrible performance. Your modifications improve the cache-hit ratio because it processes larger chunks of data. However, it is not a fundamental solution. The easiest short time solution is to increase the cache size with the number of input tiles that are needed to compute one row of the output (I don't know if this can be easily determined). The more fundamental solution is to rewrite the algorithm so that it works on a tile by tile basis. In that case the minimum cache size is only 2 (for the output tiles) plus the number of input tiles that are needed to compute one output tile. Of course it is better to use a somewhat larger cache size because adjacent output tiles can use the same input tiles. Greetings, Ernst <[EMAIL PROTECTED]> ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...
The speedup that you see is probably mainly caused by better caching. There is a bug in the tile cache size for the plugin. The cache is under most circumstances too small and this means that every requested tile is not in the cache and must be transmitted from the main gimp process (SLOW). Effectively this means that every tile of the drawable is read and written gimp_tile_height (normally 64) times. Your modification is better, I expect that every tile is read and written gimp_tile_height/BLOCKS = 2 times (but when a selection is used it will normally be 3 times). This could be somewhat improved by setting BLOCKS = gimp_tile_height, when the selection starts at a y-position that is a multiple of gimp_tile_height (e.g. when there is no selection) every tile will probably be read/written once, but when the selection starts at a different position, every tile will still be read/written twice. I used a different approach that gives speedups similar to yours but that should read/write every tile only once. The solution is pretty simple: just enlarge the cache size. What is the problem with the cache size? The current code uses: gimp_tile_cache_ntiles (2 * (drawable->width + gimp_tile_width () - 1) / gimp_tile_width ()); The idea here is to cache one row of tiles for the source drawable and one row for the destination drawable. But this is not enough because there is no room in the cache for the bitmap tiles! There is a smaller problem here too, when a selection is used, it is overkill to have a cache for the full width of the drawable. I've attached a patch with my modifications. I hope that someone examines them critically and incorporates them into the distribution. I prefer my approach because it should give better performance and it keeps the code cleaner. Other improvements are still possible. I expect that it should be possible to rewrite the algorithm such that the tile cache contains only 3 tiles. From what I see the algorithm is the same in the horizontal and vertical direction. The current implementation uses 3 extra buffer-rows so when we add 3 extra buffer-columns it should be possible to rewrite the algorithm so that it processes one tile at a time instead of a full row. Thanks for pointing out a pretty big performance problem with the plug-in. Greetings, Ernst <[EMAIL PROTECTED]> patchfile
Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...
Oops - I missed out some vital pixel region initialisation. On Friday, 6 Apr 2001, Austin Donnelly wrote: > gimp_drawable_mask_bounds (drawable->id, &x1, &y1, &x2, &y2); > > for (y = y1; y < y2; y += tile_width - (y % tile_width)) > { > for (x = x1; x < x2; x += tile_width - (x % tile_width)) > { /* set up the source and dest regions */ gimp_pixel_rgn_init (&src_rgn, drawable, x, y, x_step, y_step, FALSE/*dirty*/, FALSE/*shadow*/); gimp_pixel_rgn_init (&dest_rgn, drawable, x, y, x_step, y_step, TRUE/*dirty*/, TRUE/*shadow*/); > /* process the image in tile-sized regions */ > for (pr = gimp_pixel_rgns_register (2, &src_rgn, &dest_rgn); >pr != NULL; >pr = gimp_pixel_rgns_process (pr)) > { > src_row = src_rgn.data; > dest_row = dest_rgn.data; > > for (row = 0; row < src_rgn.h; row++) > { > src = src_row; > dest = dest_row; > > for (col = 0; col < src_rgn.w; col++) > { > > /* do stuff with src[0..bpp] and dest[0..bpp] */ > > src += src_rgn.bpp; > dest += dest_rgn.bpp; > } > > src_row += src_rgn.rowstride; > dest_row += dest_rgn.rowstride; > } > } > } > } ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
Re: [Gimp-developer] [patch] unbelievable speedup for bumpmap...
On Thursday, 5 Apr 2001, Georg Acher wrote: > I just looked into bumpmap.c and tried to figure out if it can profit from > blocking and played a bit with the code. It seems that there is some major > (performance) problem with the gimp_pixel_rgn_get/set_row-calls in Gimp > 1.2.1. > > The original bumpmap took for my 1400*1400 image about 30s, when I commented > out one of the gimp_pixel_rgn_get/set_row() that access the source or the > destination image, the whole stuff took only about 2.4s, while the > calculations were still done! > > I experimented a bit, and found out, that apparently the "switch" between > get and set seems to be the problem. I changed the code to get 32 rows, > calculate them and the write the 32 back. It took only 3s overall, the > resulting image is the same as with the 30s boiling time. Have a look at the gimp_pixel_rgns_register() and gimp_pixel_rgns_process() functions. They do similar blocking on a per-tile basis. Most plugins should be using them, but ones with more complex access patterns (eg accessing pixels across tile boundaries) probably won't.' Typical use is something like this: gimp_drawable_mask_bounds (drawable->id, &x1, &y1, &x2, &y2); for (y = y1; y < y2; y += tile_width - (y % tile_width)) { for (x = x1; x < x2; x += tile_width - (x % tile_width)) { /* process the image in tile-sized regions */ for (pr = gimp_pixel_rgns_register (2, &src_rgn, &dest_rgn); pr != NULL; pr = gimp_pixel_rgns_process (pr)) { src_row = src_rgn.data; dest_row = dest_rgn.data; for (row = 0; row < src_rgn.h; row++) { src = src_row; dest = dest_row; for (col = 0; col < src_rgn.w; col++) { /* do stuff with src[0..bpp] and dest[0..bpp] */ src += src_rgn.bpp; dest += dest_rgn.bpp; } src_row += src_rgn.rowstride; dest_row += dest_rgn.rowstride; } } } } Of course this precise form will only work where there's a one-to-one correspondence between input and output pixels. I suspect whirl-pinch has more complex access patterns, in which case you'll need to be more creative with the loop structure. Austin ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer