[Gimp-developer] [patch] Major speedup for whirlpinch plugin

2001-04-05 Thread Georg Acher

Hi,
I don't know who's currently "responsible" for the whirlpinch plugin, so I
am posting my patch to this list.

I have modified whirlpinch slightly to use "blocking", i.e. doing all
calculations in small squares (32*32 pixels). With this technique, which is
very common in numerical computing, the CPU caches and (in GIMP's case) the
tile cache get a much higher hit rate.
The boost is quite spectacular: on a larger image (1400*1400), the original
whirlpinch needs 30s to complete on an Athlon-600, with my patch only 6.5s.
That's a speedup by a factor of about 4.6 without any change in the algorithm
itself!

The changes are relatively small (effectively about 10 lines) and affect 
mostly clipping.
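
For readers who have not seen blocking before, here is a minimal standalone
sketch of the loop structure, not the actual patch. It assumes a generic
per-pixel callback (pixel_func, count_visit and struct visit_counter are
hypothetical names made up for this illustration) and shows where the
clipping comes in: the last block in each direction has to be cut off at the
selection edge.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCKING 32   /* block edge in pixels, same value as in the patch */

/* Hypothetical per-pixel callback; stands in for the real filter kernel. */
typedef void (*pixel_func) (int col, int row, void *user_data);

static int
min_int (int a, int b)
{
  return a < b ? a : b;
}

/* Visit every pixel of [x1, x2) x [y1, y2) exactly once,
 * one BLOCKING x BLOCKING square at a time. */
static void
process_blocked (int x1, int y1, int x2, int y2,
                 pixel_func func, void *user_data)
{
  int row1, col1, row, col;

  assert (x1 <= x2 && y1 <= y2);

  for (row1 = y1; row1 < y2; row1 += BLOCKING)
    {
      /* Clip the last row of blocks at the bottom edge. */
      int row_end = min_int (row1 + BLOCKING, y2);

      for (col1 = x1; col1 < x2; col1 += BLOCKING)
        {
          /* Clip the last column of blocks at the right edge. */
          int col_end = min_int (col1 + BLOCKING, x2);

          for (row = row1; row < row_end; row++)
            for (col = col1; col < col_end; col++)
              func (col, row, user_data);
        }
    }
}

/* Example callback: counts how often each pixel of a buffer is visited. */
struct visit_counter
{
  int            width;   /* row stride of the counts buffer, in pixels */
  unsigned char *counts;
};

static void
count_visit (int col, int row, void *user_data)
{
  struct visit_counter *vc = user_data;

  vc->counts[(size_t) row * vc->width + col]++;
}
```

Calling process_blocked (x1, y1, x2, y2, count_visit, &vc) touches the same
pixels as a plain row/column double loop, only in a different order. The real
plugin additionally handles the top and bottom half of the selection in one
pass, which this sketch leaves out.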

I have found no side effects of the patch...

The blocking can IMHO easily be used for a lot of other filters, and should
give a large speedup for most of GIMP's filters.

Please try out the patch and apply it to the source tree if you like it ;-)
-- 
 Georg Acher, [EMAIL PROTECTED] 
 http://www.in.tum.de/~acher/
  "Oh no, not again !" The bowl of petunias  


--- whirlpinch.c.org	Thu Apr  5 17:47:17 2001
+++ whirlpinch.c	Thu Apr  5 18:49:09 2001
@@ -22,6 +22,14 @@
  * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
  */
 
+/* Version 2.10:
+ *
+ * Major Speedup by use of "blocking", ie. doing the calculations
+ * in small squares, thus gaining a performance boost from CPU caches
+ * and the tile cache.
+ *
+ * Georg Acher, [EMAIL PROTECTED]
+ */
 
 /* Version 2.09:
  *
@@ -63,7 +71,7 @@
 
 
 #define PLUG_IN_NAME    "plug_in_whirl_pinch"
-#define PLUG_IN_VERSION "May 1997, 2.09"
+#define PLUG_IN_VERSION "April 2001, 2.10"
 
 /* Magic numbers */
 
@@ -71,6 +79,10 @@
 #define SCALE_WIDTH  200
 #define ENTRY_WIDTH  60
 
+/* blocking size, 32*32 pixels is a good compromise for all CPUs */
+
+#define BLOCKING 32
+
 /* Types */
 
 typedef struct
@@ -366,12 +378,13 @@
   guchar  *top_row, *bot_row;
   guchar  *top_p, *bot_p;
   gint row, col;
+  gint row1,col1;
   guchar   pixel[4][4];
   guchar   values[4];
   double   whirl;
   double   cx, cy;
   int  ix, iy;
-  int  i;
+  int  i,n;
   guchar   bg_color[4];
   pixel_fetcher_t *pft, *pfb;
 
@@ -406,112 +419,133 @@
   whirl   = wpvals.whirl * G_PI / 180;
   radius2 = radius * radius * wpvals.radius;
 
-  for (row = sel_y1; row <= ((sel_y1 + sel_y2) / 2); row++)
+  /* WhirlPinch in small squares to benefit from cache effects
+  (tile cache, CPU cache)
+  20010405 GA
+  */
+  for (row1 = sel_y1; row1 <= ((sel_y1 + sel_y2) / 2); row1+=BLOCKING)
     {
-      top_p = top_row;
-      bot_p = bot_row + img_bpp * (sel_width - 1);
-
-      for (col = sel_x1; col < sel_x2; col++)
-        {
-          if (calc_undistorted_coords (col, row, whirl, wpvals.pinch, &cx, &cy))
-            {
-              /* We are inside the distortion area */
-
-              /* Top */
-
-              if (cx >= 0.0)
-                ix = (int) cx;
-              else
-                ix = -((int) -cx + 1);
-
-              if (cy >= 0.0)
-                iy = (int) cy;
-              else
-                iy = -((int) -cy + 1);
-
-              pixel_fetcher_get_pixel (pft, ix, iy, pixel[0]);
-              pixel_fetcher_get_pixel (pft, ix + 1, iy, pixel[1]);
-              pixel_fetcher_get_pixel (pft, ix, iy + 1, pixel[2]);
-              pixel_fetcher_get_pixel (pft, ix + 1, iy + 1, pixel[3]);
-
-              for (i = 0; i < img_bpp; i++)
-                {
-                  values[0] = pixel[0][i];
-                  values[1] = pixel[1][i];
-                  values[2] = pixel[2][i];
-                  values[3] = pixel[3][i];
-
-                  *top_p++ = bilinear (cx, cy, values);
-                }
-
-              /* Bottom */
-
-              cx = cen_x + (cen_x - cx);
-              cy = cen_y + (cen_y - cy);
-
-              if (cx >= 0.0)
-                ix = (int) cx;
-              else
-                ix = -((int) -cx + 1);
-
-              if (cy >= 0.0)
-                iy = (int) cy;
-              else
-                iy = -((int) -cy + 1);
-
-              pixel_fetcher_get_pixel (pfb, ix, iy, pixel[0]);
-              pixel_fetcher_get_pixel (pfb, ix + 1, iy, pixel[1]);
-              pixel_fetcher_get_pixel (pfb, ix, iy + 1, pixel[2]);
-              pixel_fetcher_get_pixel (pfb, ix + 1, iy + 1, pixel[3]);
-
-              for (i = 0; i < img_bpp; i++)
-                {
-                  values[0] = pixel[0][i];
-                  values[1] = pixel[1][i];
-                  values[2] = pixel[2][i];
-                  values[3] = pixel[3][i];
-
-                  *bot_p++ = bilinear (cx, cy, values);
-                }
-
-              bot_p -= 2 * img_bpp; /* We move backwards! */
-            }
-          else
-            {
- 

Re: [Gimp-developer] [patch] Major speedup for whirlpinch plugin

2001-04-05 Thread Georg Acher

On Thu, Apr 05, 2001 at 12:45:52PM -0500, Kelly Martin wrote:
 
> Hm, it does not.  The issue with whirlpinch is that there's only a
> weak locality relationship between destination pixels (which are
> iterated across the image) and source pixels (which are fetched with
> the pixel fetcher).  I haven't looked too closely at your blocking

That is right, but destination and source each have good locality on their
own (i.e. the next pixel isn't 500 pixels away from the last one).
I have tried whirlpinch with huge distortion values, and the worst case was
8s instead of 6.5s, still better than 30s ;-)
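
To make the locality argument concrete, here is a toy model rather than a
measurement of the real plugin. It assumes 64x64 tiles and a fetcher that
keeps only one source tile at a time (both are simplifying assumptions for
this sketch; the real tile cache holds more), and counts how often that
fetcher has to switch tiles for a given traversal order. The names map_func,
map_identity, map_transpose and count_tile_switches are hypothetical.

```c
#include <assert.h>

#define TILE_SIZE 64   /* assumption: 64x64 pixel tiles */

/* Hypothetical mapping from a destination pixel to its source pixel. */
typedef void (*map_func) (int dx, int dy, int *sx, int *sy);

/* No distortion at all. */
static void
map_identity (int dx, int dy, int *sx, int *sy)
{
  *sx = dx;
  *sy = dy;
}

/* A strong but regular distortion: swap the two axes. */
static void
map_transpose (int dx, int dy, int *sx, int *sy)
{
  *sx = dy;
  *sy = dx;
}

/* Walk a width x height destination in block_w x block_h pieces and count
 * how often the source tile changes.  block_w = width, block_h = 1 gives
 * the classic row-by-row order. */
static long
count_tile_switches (int width, int height,
                     int block_w, int block_h, map_func map)
{
  long switches = 0;
  int  cur_tx = -1, cur_ty = -1;   /* no tile held yet */
  int  row1, col1, row, col;

  assert (block_w > 0 && block_h > 0);

  for (row1 = 0; row1 < height; row1 += block_h)
    for (col1 = 0; col1 < width; col1 += block_w)
      for (row = row1; row < row1 + block_h && row < height; row++)
        for (col = col1; col < col1 + block_w && col < width; col++)
          {
            int sx, sy, tx, ty;

            map (col, row, &sx, &sy);
            tx = sx / TILE_SIZE;
            ty = sy / TILE_SIZE;

            if (tx != cur_tx || ty != cur_ty)
              {
                switches++;
                cur_tx = tx;
                cur_ty = ty;
              }
          }

  return switches;
}
```

Comparing count_tile_switches (256, 256, 256, 1, map) with
count_tile_switches (256, 256, 32, 32, map) for either mapping shows far
fewer tile switches in the blocked order. A real whirl is less regular than
these two mappings, so the model only illustrates the trend, not the timings
quoted above.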

> patch, but I suspect that much the same improvement would be had by
> using a pixel region (which respects tiles) to iterate across the
> destination region.

That is possible... Is there a filter that definitely uses the pixel region
stuff? Most filters I have seen only use one row, which may not be "local"
enough, since it uses only one pixel but has to fill a whole cache line
(4/8 pixels). I will see whether I can speed up bumpmap as well, since it
also takes the magical 30s and is used in scripts much more often than the
whirlpinch module.

I don't know how large a tile is, but since IMHO the major impact of
blocking seems to come from the CPU cache, I suspect that a tile is too big
for older CPUs. I did the whirlpinch blocking thing about three years ago
(and forgot to send the patch), and tried it on an Alpha 21164 and a P5.

Both have only very small L1 caches (the Alpha has 8K) and relatively slow L2
caches (plus L3 for the Alpha). The best blocking factor for those CPUs was
between 64 and 128. My Athlon now also tolerates 512; 32 is only slightly
(2%) slower, but should work fairly well on all CPUs.

-- 
 Georg Acher, [EMAIL PROTECTED] 
 http://www.in.tum.de/~acher/
  "Oh no, not again !" The bowl of petunias  