[Gimp-developer] [patch] Major speedup for whirlpinch plugin

2001-04-05 Thread Georg Acher

Hi,
I don't know who's currently "responsible" for the whirlpinch plugin, so I
am posting my patch to this list.

I have modified whirlpinch slightly to use "blocking", i.e. doing all
calculations in small squares (32*32 pixels). With this technique, which is
very common in numerical computing, the CPU caches (and, for GIMP, the tile
cache) get a much higher hit rate.
The boost is quite spectacular: on a larger image (1400*1400) the original
whirlpinch needs 30s to complete on an Athlon-600; with my patch it needs
only 6.5s. That's a speedup by more than a factor of 4.5 without any change
in the algorithm itself!
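
As a rough illustration of the blocking idea (a generic sketch, not the patch itself): the selection is walked in BLOCK*BLOCK squares, and the last square in each direction is clipped against the selection edge. The names for_each_pixel_blocked, pixel_fn and count_pixel are made up for this example; the real plugin additionally handles the top and bottom halves of the selection together.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 32  /* block edge in pixels; same value as the patch's BLOCKING */

/* Hypothetical per-pixel callback type, for illustration only. */
typedef void (*pixel_fn) (int col, int row, void *user_data);

/* Visit every pixel of [x1, x2) x [y1, y2), one BLOCK x BLOCK square
 * at a time instead of one full row at a time. */
static void
for_each_pixel_blocked (int x1, int y1, int x2, int y2,
                        pixel_fn fn, void *user_data)
{
  int row1, col1, row, col;

  assert (fn != NULL);

  for (row1 = y1; row1 < y2; row1 += BLOCK)
    {
      /* Clip the last row of blocks against the bottom edge. */
      int row_end = (row1 + BLOCK < y2) ? row1 + BLOCK : y2;

      for (col1 = x1; col1 < x2; col1 += BLOCK)
        {
          /* Clip the last column of blocks against the right edge. */
          int col_end = (col1 + BLOCK < x2) ? col1 + BLOCK : x2;

          for (row = row1; row < row_end; row++)
            for (col = col1; col < col_end; col++)
              fn (col, row, user_data);
        }
    }
}

/* Example callback: counts how many pixels were visited. */
static void
count_pixel (int col, int row, void *user_data)
{
  (void) col;
  (void) row;
  ++*(long *) user_data;
}
```

Calling for_each_pixel_blocked (0, 0, 1400, 1400, count_pixel, &n) with a long n set to 0 visits each pixel exactly once; only the order of the visits differs from a plain row-by-row loop.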

The changes are relatively small (effectively about 10 lines) and affect 
mostly clipping.

I have found no side effects of the patch...

The blocking can IMHO easily be used for a lot of other filters, and should
give a large speedup for most of GIMP's filters.

Please try out the patch and apply it to the source tree if you like it ;-)
-- 
 Georg Acher, [EMAIL PROTECTED] 
 http://www.in.tum.de/~acher/
  "Oh no, not again !" The bowl of petunias  


--- whirlpinch.c.org	Thu Apr  5 17:47:17 2001
+++ whirlpinch.c	Thu Apr  5 18:49:09 2001
@@ -22,6 +22,14 @@
  * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
  */
 
+/* Version 2.10:
+ *
+ * Major Speedup by use of "blocking", ie. doing the calculations
+ * in small squares, thus gaining a performance boost from CPU caches
+ * and the tile cache.
+ *
+ * Georg Acher, [EMAIL PROTECTED]
+ */
 
 /* Version 2.09:
  *
@@ -63,7 +71,7 @@
 
 
 #define PLUG_IN_NAME    "plug_in_whirl_pinch"
-#define PLUG_IN_VERSION "May 1997, 2.09"
+#define PLUG_IN_VERSION "April 2001, 2.10"
 
 /* Magic numbers */
 
@@ -71,6 +79,10 @@
 #define SCALE_WIDTH  200
 #define ENTRY_WIDTH  60
 
+/* blocking size, 32*32 pixels is a good compromise for all CPUs */
+
+#define BLOCKING 32
+
 /* Types */
 
 typedef struct
@@ -366,12 +378,13 @@
   guchar  *top_row, *bot_row;
   guchar  *top_p, *bot_p;
   gint row, col;
+  gint row1,col1;
   guchar   pixel[4][4];
   guchar   values[4];
   double   whirl;
   double   cx, cy;
   int  ix, iy;
-  int  i;
+  int  i,n;
   guchar   bg_color[4];
   pixel_fetcher_t *pft, *pfb;
 
@@ -406,112 +419,133 @@
   whirl   = wpvals.whirl * G_PI / 180;
   radius2 = radius * radius * wpvals.radius;
 
-  for (row = sel_y1; row <= ((sel_y1 + sel_y2) / 2); row++)
+  /* WhirlPinch in small squares to benefit from cache effects
+  (tile cache, CPU cache)
+  20010405 GA
+  */
+  for (row1 = sel_y1; row1 <= ((sel_y1 + sel_y2) / 2); row1+=BLOCKING)
     {
-      top_p = top_row;
-      bot_p = bot_row + img_bpp * (sel_width - 1);
-
-      for (col = sel_x1; col < sel_x2; col++)
-        {
-          if (calc_undistorted_coords (col, row, whirl, wpvals.pinch, &cx, &cy))
-            {
-              /* We are inside the distortion area */
-
-              /* Top */
-
-              if (cx >= 0.0)
-                ix = (int) cx;
-              else
-                ix = -((int) -cx + 1);
-
-              if (cy >= 0.0)
-                iy = (int) cy;
-              else
-                iy = -((int) -cy + 1);
-
-              pixel_fetcher_get_pixel (pft, ix, iy, pixel[0]);
-              pixel_fetcher_get_pixel (pft, ix + 1, iy, pixel[1]);
-              pixel_fetcher_get_pixel (pft, ix, iy + 1, pixel[2]);
-              pixel_fetcher_get_pixel (pft, ix + 1, iy + 1, pixel[3]);
-
-              for (i = 0; i < img_bpp; i++)
-                {
-                  values[0] = pixel[0][i];
-                  values[1] = pixel[1][i];
-                  values[2] = pixel[2][i];
-                  values[3] = pixel[3][i];
-
-                  *top_p++ = bilinear (cx, cy, values);
-                }
-
-              /* Bottom */
-
-              cx = cen_x + (cen_x - cx);
-              cy = cen_y + (cen_y - cy);
-
-              if (cx >= 0.0)
-                ix = (int) cx;
-              else
-                ix = -((int) -cx + 1);
-
-              if (cy >= 0.0)
-                iy = (int) cy;
-              else
-                iy = -((int) -cy + 1);
-
-              pixel_fetcher_get_pixel (pfb, ix, iy, pixel[0]);
-              pixel_fetcher_get_pixel (pfb, ix + 1, iy, pixel[1]);
-              pixel_fetcher_get_pixel (pfb, ix, iy + 1, pixel[2]);
-              pixel_fetcher_get_pixel (pfb, ix + 1, iy + 1, pixel[3]);
-
-              for (i = 0; i < img_bpp; i++)
-                {
-                  values[0] = pixel[0][i];
-                  values[1] = pixel[1][i];
-                  values[2] = pixel[2][i];
-                  values[3] = pixel[3][i];
-
-                  *bot_p++ = bilinear (cx, cy, values);
-                }
-
-              bot_p -= 2 * img_bpp; /* We move backwards! */
-            }
-          else
-            {
- 

[Gimp-developer] Re: [patch] Major speedup for whirlpinch plugin

2001-04-05 Thread Guillermo S. Romero / Familia Romero

[EMAIL PROTECTED] (2001-04-05 at 1904.10 +0200):
> The blocking can IMHO easily be used for a lot of other filters, and should
> give a large speedup for most of GIMP's filters.

I do not have the power to decide whether it gets accepted, but if
speeding up GIMP in some areas only takes things like this, then the
people in charge (aka plugin coders) should please take note. Maybe you
could do more reviews or write a small guideline doc for this. And
thanks for the patch.

GSR
 
___
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer



[Gimp-developer] [patch] Major speedup for whirlpinch plugin

2001-04-05 Thread Kelly Martin

On Thu, 5 Apr 2001 12:36:05 -0500, Kelly Martin [EMAIL PROTECTED] said:

> > I have modified whirlpinch slightly to use "blocking", ie. doing
> > all calculations in small squares (32*32). With that technique very
> > common in numerical computing, the CPU caches (and for GIMP) the
> > tile cache have a much higher hit rate. The boost is quite
> > spectacular: The original whirlpinch on a larger image (1400*1400)
> > needs on a Athlon-600 30s to complete, with my patch only 6.5s.
> > That's a speedup by a factor of 4.5 without any change in the
> > algorithm itself!
>
> I was under the impression that whirlpinch used tile regions, which
> should do blocking automatically.  It's been a couple years since I
> looked at the code, though.

Hm, it does not.  The issue with whirlpinch is that there's only a
weak locality relationship between destination pixels (which are
iterated across the image) and source pixels (which are fetched with
the pixel fetcher).  I haven't looked too closely at your blocking
patch, but I suspect that much the same improvement would be had by
using a pixel region (which respects tiles) to iterate across the
destination region.  
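
To make the difference from fixed-size blocking concrete, here is a minimal stand-alone sketch of walking a destination rectangle in pieces that never cross a tile boundary. It does not use libgimp; the tile edge of 64 pixels is an assumption for this example, and rect_t and tile_aligned_rects are hypothetical names. In a real plug-in the pixel region machinery would hand out these rectangles itself.

```c
#include <assert.h>

#define TILE 64  /* assumed tile edge in pixels, for illustration */

/* Hypothetical rectangle type: origin plus size. */
typedef struct { int x, y, w, h; } rect_t;

/* Split [x1, x2) x [y1, y2) into rectangles that each lie inside a
 * single TILE x TILE tile of a grid anchored at (0, 0).
 * Stores at most max rectangles in out and returns how many
 * rectangles the split consists of in total. */
static int
tile_aligned_rects (int x1, int y1, int x2, int y2, rect_t *out, int max)
{
  int n = 0;
  int tx, ty;

  assert (x1 >= 0 && y1 >= 0 && out != 0);

  /* Start at the tile boundary at or before the selection origin. */
  for (ty = y1 - y1 % TILE; ty < y2; ty += TILE)
    for (tx = x1 - x1 % TILE; tx < x2; tx += TILE)
      {
        /* Intersect the current tile with the selection. */
        int rx1 = (tx < x1) ? x1 : tx;
        int ry1 = (ty < y1) ? y1 : ty;
        int rx2 = (tx + TILE < x2) ? tx + TILE : x2;
        int ry2 = (ty + TILE < y2) ? ty + TILE : y2;

        if (n < max)
          {
            out[n].x = rx1;
            out[n].y = ry1;
            out[n].w = rx2 - rx1;
            out[n].h = ry2 - ry1;
          }
        n++;
      }

  return n;
}
```

Unlike squares that start at the selection origin, these rectangles line up with the tile grid, so each destination rectangle touches exactly one tile.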

Whirlpinch also has an optimization to do the top and bottom whirls
simultaneously to save calculations.  This might actually be more of a
loss because of the locality hit on the tile cache.
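
The top/bottom trick relies on the distortion being point-symmetric about the centre: the source coordinate for the mirrored destination pixel is obtained by reflecting (cx, cy) through (cen_x, cen_y), as the "Bottom" branch of the original loop does. A minimal sketch of that reflection, with mirror_through_center being a name invented for this example:

```c
#include <assert.h>

/* Hypothetical helper: reflect the point (*x, *y) through the centre
 * (cen_x, cen_y), i.e. the same arithmetic as
 *   cx = cen_x + (cen_x - cx);  cy = cen_y + (cen_y - cy);
 * in the plugin's loop. */
static void
mirror_through_center (double cen_x, double cen_y, double *x, double *y)
{
  assert (x != 0 && y != 0);

  *x = cen_x + (cen_x - *x);
  *y = cen_y + (cen_y - *y);
}
```

For a 1400*1400 image centred at (700, 700), a source point near the top edge is reflected to one near the bottom edge, which is why the two fetches compete for different parts of the tile cache.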

If I get some spare time, I might look at this more closely.

Kelly



Re: [Gimp-developer] [patch] Major speedup for whirlpinch plugin

2001-04-05 Thread Georg Acher

On Thu, Apr 05, 2001 at 12:45:52PM -0500, Kelly Martin wrote:
 
> Hm, it does not.  The issue with whirlpinch is that there's only a
> weak locality relationship between destination pixels (which are
> iterated across the image) and source pixels (which are fetched with
> the pixel fetcher).  I haven't looked too closely at your blocking

That is right, but destination and source each have good locality on their
own (i.e. the next pixel isn't 500 pixels away from the last).
I have tried whirlpinch with huge distortion values, and the worst case was
8s instead of 6.5s, still better than 30s ;-)

> patch, but I suspect that much the same improvement would be had by
> using a pixel region (which respects tiles) to iterate across the
> destination region.

That is possible... Is there a filter that definitely uses the pixel region
stuff? Most filters I have seen only work on one row at a time, which may
not be local enough, since each access uses only one pixel but has to fill
a whole cache line (4/8 pixels). I will also try to speed up bumpmap, since
it takes the same magical 30s and is used much more often in scripts than
the whirlpinch module.

I don't know how large a tile is, but since IMHO the major impact of
blocking seems to come from the CPU cache, I suspect that a tile is too big
for older CPUs. I did the whirlpinch blocking thing about three years ago
(and forgot to send the patch), and tried it on an Alpha21164 and a P5.

Both have only very small L1 caches (the Alpha has 8K) and relatively slow
L2 caches (plus L3 for the Alpha). The best blocking factor for those CPUs
was between 64 and 128. My Athlon now also tolerates 512; 32 is only
slightly (2%) slower, but should work fairly well on all CPUs.
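
For a sense of scale, the memory taken by one square of destination pixels is just edge * edge * bytes per pixel. A tiny sketch, assuming 4 bytes per pixel (e.g. RGBA); block_bytes is a name made up here, and the source pixels read through the pixel fetcher come on top of this figure:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes occupied by one block x block square of
 * pixels with bpp bytes per pixel. */
static size_t
block_bytes (size_t block, size_t bpp)
{
  assert (block > 0 && bpp > 0);

  return block * block * bpp;
}
```

Under that assumption a 32*32 block is 4 KiB, a 64*64 block is 16 KiB, a 128*128 block is 64 KiB and a 512*512 block is 1 MiB.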

-- 
 Georg Acher, [EMAIL PROTECTED] 
 http://www.in.tum.de/~acher/
  "Oh no, not again !" The bowl of petunias  