Re: [Gimp-developer] [patch] Major speedup for whirlpinch plugin
On 5 Apr, Kelly Martin wrote: Tiles are 64x64 by default, and changing them is a bad idea because it makes your .xcf files nontransportable. Not to forget that this size is more or less hardcoded. -- Servus, Daniel ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
Re: [Gimp-developer] [patch] Major speedup for whirlpinch plugin
On Mon, 9 Apr 2001 23:02:44 +0200 (CEST), [EMAIL PROTECTED] said: On 5 Apr, Kelly Martin wrote: Tiles are 64x64 by default, and changing them is a bad idea because it makes your .xcf files nontransportable. Not to forget that this size is more or less hardcoded. It's a #define, yes. And because the file format is tightly bound to the tile size, changing it at compile time breaks XCF save and load in ways I don't even want to think about. I don't think that XCF v0 or v1 specifies the tile size anywhere in the data stream, either. I tried to modify the XCF loader saver once to do reblocking, but it gave me a headache and I moved on to other, more interesting things, like feeding my cats. :) Kelly ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
Re: [Gimp-developer] [patch] Major speedup for whirlpinch plugin
I think this is yet another tile-cache problem. Georg Acher wrote: On Thu, Apr 05, 2001 at 12:45:52PM -0500, Kelly Martin wrote: Hm, it does not. The issue with whirlpinch is that there's only a weak locality relationship between destionation pixels (which are iterated across the image) and source pixels (which are fetched with the pixel fetcher). I haven't looked too closely at your blocking That is right, but destination and source for themselves have good locality (ie. the next pixel isn't 500 pixels away from the last). This is an important observation. patch, but I suspect that much the same improvement would be had by using a pixel region (which respects tiles) to iterate across the destination region. I agree, but the current algorithm writes two rows (top and bottom) at the same time, so it is not immediately possible to use the standard pixel region iterator. That is possible... Is there a filter that definitely uses the pixel region stuff? Most filters I have seen only use one row, which may not be enough "locally", since it uses only one pixel but has to fill a whole cacheline (4/8 pixel). I will try whether I can speed up bumpmap also, since this takes also the magical 30s and is much more often used in scripts than the whirlpinch module. I don't know how large a tile is, but since IMHO the major impact of blocking seems to come from the CPU cache, I suspect that is too big for older CPUs. I have done the whirlpinch blocking thing about three years ago (and forgot to send the patch), and tried it on an Alpha21164 and a P5. I think you're looking in the wrong direction here. Similar to the bumpmap (see my other message) I strongly suspect that the tile-cache is too small. You can check this for yourself by profiling your code. I expect that there is hardly any difference in processor time between your version and the original but that there is an important difference in the number of calls to gimp_tile_get and gimp_tile_put. (Sorry, did not try that myself). Requesting tiles from the main Gimp process is pretty expensive, it involves several context switches and a lot of copying. This is orders of magnitude slower than copying data within the plugin process from the tile cache to some other buffer. The whirlpinch plugin uses a cache size that is only big enough to handle the output tiles, there is no room for input tiles. This leads to a series of cache misses and horrible performance. Your modifications improve the cache-hit ratio because it processes larger chunks of data. However, it is not a fundamental solution. The easiest short time solution is to increase the cache size with the number of input tiles that are needed to compute one row of the output (I don't know if this can be easily determined). The more fundamental solution is to rewrite the algorithm so that it works on a tile by tile basis. In that case the minimum cache size is only 2 (for the output tiles) plus the number of input tiles that are needed to compute one output tile. Of course it is better to use a somewhat larger cache size because adjacent output tiles can use the same input tiles. Greetings, Ernst [EMAIL PROTECTED] ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
[Gimp-developer] [patch] Major speedup for whirlpinch plugin
Hi, I don't know who's currently "responsible" for the whirlpinch plugin, so I post my patch to this list. I have modified whirlpinch slightly to use "blocking", ie. doing all calculations in small squares (32*32). With that technique very common in numerical computing, the CPU caches (and for GIMP) the tile cache have a much higher hit rate. The boost is quite spectacular: The original whirlpinch on a larger image (1400*1400) needs on a Athlon-600 30s to complete, with my patch only 6.5s. That's a speedup by a factor of 4.5 without any change in the algorithm itself! The changes are relatively small (effectively about 10 lines) and affect mostly clipping. I have found no side efects of the patch... The blocking can IMHO easily used for a lot of other filters, and should give a large speedup for most of GIMP's filters. Please try out the patch and apply it to the source tree if you like it ;-) -- Georg Acher, [EMAIL PROTECTED] http://www.in.tum.de/~acher/ "Oh no, not again !" The bowl of petunias --- whirlpinch.c.orgThu Apr 5 17:47:17 2001 +++ whirlpinch.cThu Apr 5 18:49:09 2001 @@ -22,6 +22,14 @@ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ +/* Version 2.10: + * + * Major Speedup by use of "blocking", ie. doing the calcualations + * in small squares, thus gaining a performance boost from CPU caches + * and the tile cache. + * + * Georg Acher, [EMAIL PROTECTED] + */ /* Version 2.09: * @@ -63,7 +71,7 @@ #define PLUG_IN_NAME"plug_in_whirl_pinch" -#define PLUG_IN_VERSION "May 1997, 2.09" +#define PLUG_IN_VERSION "April 2001, 2.10" /* Magic numbers */ @@ -71,6 +79,10 @@ #define SCALE_WIDTH 200 #define ENTRY_WIDTH 60 +/* blocking size, 32*32pixels is a good compromise for all CPUs */ + +#define BLOCKING 32 + /* Types */ typedef struct @@ -366,12 +378,13 @@ guchar *top_row, *bot_row; guchar *top_p, *bot_p; gint row, col; + gint row1,col1; guchar pixel[4][4]; guchar values[4]; double whirl; double cx, cy; int ix, iy; - int i; + int i,n; guchar bg_color[4]; pixel_fetcher_t *pft, *pfb; @@ -406,112 +419,133 @@ whirl = wpvals.whirl * G_PI / 180; radius2 = radius * radius * wpvals.radius; - for (row = sel_y1; row = ((sel_y1 + sel_y2) / 2); row++) + /* WhirlPinch in small squares to benefit from cache effects + (tile cache, CPU cache) + 20010405 GA + */ + for (row1 = sel_y1; row1 = ((sel_y1 + sel_y2) / 2); row1+=BLOCKING) { - top_p = top_row; - bot_p = bot_row + img_bpp * (sel_width - 1); - - for (col = sel_x1; col sel_x2; col++) - { - if (calc_undistorted_coords (col, row, whirl, wpvals.pinch, cx, cy)) - { - /* We are inside the distortion area */ - - /* Top */ - - if (cx = 0.0) - ix = (int) cx; - else - ix = -((int) -cx + 1); - - if (cy = 0.0) - iy = (int) cy; - else - iy = -((int) -cy + 1); - - pixel_fetcher_get_pixel (pft, ix, iy, pixel[0]); - pixel_fetcher_get_pixel (pft, ix + 1, iy, pixel[1]); - pixel_fetcher_get_pixel (pft, ix, iy + 1, pixel[2]); - pixel_fetcher_get_pixel (pft, ix + 1, iy + 1, pixel[3]); - - for (i = 0; i img_bpp; i++) - { - values[0] = pixel[0][i]; - values[1] = pixel[1][i]; - values[2] = pixel[2][i]; - values[3] = pixel[3][i]; - - *top_p++ = bilinear (cx, cy, values); - } - - /* Bottom */ - - cx = cen_x + (cen_x - cx); - cy = cen_y + (cen_y - cy); - - if (cx = 0.0) - ix = (int) cx; - else - ix = -((int) -cx + 1); - - if (cy = 0.0) - iy = (int) cy; - else - iy = -((int) -cy + 1); - - pixel_fetcher_get_pixel (pfb, ix, iy, pixel[0]); - pixel_fetcher_get_pixel (pfb, ix + 1, iy, pixel[1]); - pixel_fetcher_get_pixel (pfb, ix, iy + 1, pixel[2]); - pixel_fetcher_get_pixel (pfb, ix + 1, iy + 1, pixel[3]); - - for (i = 0; i img_bpp; i++) - { - values[0] = pixel[0][i]; - values[1] = pixel[1][i]; - values[2] = pixel[2][i]; - values[3] = pixel[3][i]; - - *bot_p++ = bilinear (cx, cy, values); - } - - bot_p -= 2 * img_bpp; /* We move backwards! */ - } - else - { - /* We are outside the distortion area; - * just
[Gimp-developer] [patch] Major speedup for whirlpinch plugin
On Thu, 5 Apr 2001 12:36:05 -0500, Kelly Martin [EMAIL PROTECTED] said: I have modified whirlpinch slightly to use "blocking", ie. doing all calculations in small squares (32*32). With that technique very common in numerical computing, the CPU caches (and for GIMP) the tile cache have a much higher hit rate. The boost is quite spectacular: The original whirlpinch on a larger image (1400*1400) needs on a Athlon-600 30s to complete, with my patch only 6.5s. That's a speedup by a factor of 4.5 without any change in the algorithm itself! I was under the impression that whirlpinch used tile regions, which should do blocking automatically. It's been a couple years since I looked at the code, though. Hm, it does not. The issue with whirlpinch is that there's only a weak locality relationship between destionation pixels (which are iterated across the image) and source pixels (which are fetched with the pixel fetcher). I haven't looked too closely at your blocking patch, but I suspect that much the same improvement would be had by using a pixel region (which respects tiles) to iterate across the destination region. Whirlpinch also has an optimization to do the top and bottom whirls simultaneously to save calculations. This might actually be more of a loss because of the locality hit on the tile cache. If I get some spare time, I might look at this more closely. Kelly ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
Re: [Gimp-developer] [patch] Major speedup for whirlpinch plugin
On Thu, Apr 05, 2001 at 12:45:52PM -0500, Kelly Martin wrote: Hm, it does not. The issue with whirlpinch is that there's only a weak locality relationship between destionation pixels (which are iterated across the image) and source pixels (which are fetched with the pixel fetcher). I haven't looked too closely at your blocking That is right, but destination and source for themselves have good locality (ie. the next pixel isn't 500 pixels away from the last). I have tried whirlpinch with huge distortion values, and the worst case was 8s instead of 6.5s, still better than 30s ;-) patch, but I suspect that much the same improvement would be had by using a pixel region (which respects tiles) to iterate across the destination region. That is possible... Is there a filter that definitely uses the pixel region stuff? Most filters I have seen only use one row, which may not be enough "locally", since it uses only one pixel but has to fill a whole cacheline (4/8 pixel). I will try whether I can speed up bumpmap also, since this takes also the magical 30s and is much more often used in scripts than the whirlpinch module. I don't know how large a tile is, but since IMHO the major impact of blocking seems to come from the CPU cache, I suspect that is too big for older CPUs. I have done the whirlpinch blocking thing about three years ago (and forgot to send the patch), and tried it on an Alpha21164 and a P5. Both have only very small L1-caches (Alpha has 8K) and relatively slow L2 caches (+L3 for Alpha). The best blocking factor for those CPUs was between 64 and 128. My Athlon now also tolerates 512, 32 only is only slightly (2%) slower, but should work for all CPUs fairly good. -- Georg Acher, [EMAIL PROTECTED] http://www.in.tum.de/~acher/ "Oh no, not again !" The bowl of petunias ___ Gimp-developer mailing list [EMAIL PROTECTED] http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer