Re: miext/shadow/shrotate.c speedup and question

2005-10-04 Thread Marc Aurele La France

On Tue, 21 Sep 2005, Staffan Ulfberg wrote:


> I have a Sharp Zaurus C3100, where X normally runs rotated 90 degrees,
> using a shadow framebuffer.  I've been hacking a bit on getting the
> code that blits a rotated shadow onto the display a bit faster and
> came up with the included patch.
>
> Blitting in rotated mode is about 4x the previous speed.  Non-rotated
> copies are about the same speed; maybe up to 10% slower for small
> rectangles (on the Zaurus).
>
> The idea is to copy the area in blocks of 32x32 pixels, to reduce the
> number of cache misses, which are unavoidable when walking either the
> source or the destination bitmap across the scanlines.  16x16, 24x24,
> and 32x32 yield about the same result, so I chose 32x32 since it
> seems best for the non-rotated modes.
>
> Any comments on this patch?


This looks good to me.  I have committed it.


> I have a question myself about the original code: This is the function
> call to get the address in the destination frame buffer to write to:
>
>   win = (FbBits *) (*pBuf->window) (pScreen,
>                                     scr_y,
>                                     scr_x << 2,
>                                     SHADOW_WINDOW_WRITE,
>                                     &winSize,
>                                     pBuf->closure);
>
> The scr_x << 2 part seems, to me, to assume that
> sizeof(FbBits) == 4.  Am I missing something, or is this really
> correct?  Anyway, my patch does not make this problem either better
> or worse, but this is a chance to fix it if it is a bug...


As we compile this, FbBits will always be a CARD32.

Thanks for the patch.

Marc.

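+----------------------------------+-----------------------------------+
|  Marc Aurele La France           |  work:   1-780-492-9310           |
|  Academic Information and        |  fax:    1-780-492-1729           |
|    Communications Technologies   |  email:  [EMAIL PROTECTED]        |
|  352 General Services Building   +-----------------------------------+
|  University of Alberta           |                                   |
|  Edmonton, Alberta               |    Standard disclaimers apply     |
|  T6G 2H1                         |                                   |
|  CANADA                          |                                   |
+----------------------------------+-----------------------------------+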
XFree86 developer and VP.  ATI driver and X server internals.


miext/shadow/shrotate.c speedup and question

2005-09-20 Thread Staffan Ulfberg
I have a Sharp Zaurus C3100, where X normally runs rotated 90 degrees,
using a shadow framebuffer.  I've been hacking a bit on getting the
code that blits a rotated shadow onto the display a bit faster and
came up with the included patch.

Blitting in rotated mode is about 4x the previous speed.  Non-rotated
copies are about the same speed; maybe up to 10% slower for small
rectangles (on the Zaurus).

The idea is to copy the area in blocks of 32x32 pixels, to reduce the
number of cache misses, which are unavoidable when walking either the
source or the destination bitmap across the scanlines.  16x16, 24x24,
and 32x32 yield about the same result, so I chose 32x32 since it
seems best for the non-rotated modes.
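
To make the blocking idea concrete outside the shadow code, here is a
minimal standalone sketch.  It is not the patch itself: rotate90_tiled
and TILE are hypothetical names, and it assumes 16-bit pixels, a
clockwise rotation and tightly packed scanlines.  Inside each tile the
inner loop walks one destination row, so writes stay contiguous while
the strided reads touch at most TILE source scanlines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TILE 32  /* block edge in pixels, matching the 32x32 choice */

/*
 * Rotate a w x h image of 16-bit pixels 90 degrees clockwise into dst.
 * dst must hold h x w pixels (h pixels per row, w rows).
 * Hypothetical helper for illustration; not part of miext/shadow.
 */
static void
rotate90_tiled(const uint16_t *src, uint16_t *dst, size_t w, size_t h)
{
    size_t bx, by, x, y;

    assert(src != dst);  /* an in-place rotation is not supported */

    for (by = 0; by < h; by += TILE) {
        size_t y_end = (by + TILE < h) ? by + TILE : h;

        for (bx = 0; bx < w; bx += TILE) {
            size_t x_end = (bx + TILE < w) ? bx + TILE : w;

            /* Source pixel (x, y) lands at row x, column h - 1 - y. */
            for (x = bx; x < x_end; x++)
                for (y = by; y < y_end; y++)
                    dst[x * h + (h - 1 - y)] = src[y * w + x];
        }
    }
}
```

Without the two outer loops, every pixel written would pull in a
different source scanline across the full height of the rectangle;
the tiles bound that working set to TILE scanlines at a time.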

Any comments on this patch?

I have a question myself about the original code: This is the function
call to get the address in the destination frame buffer to write to:

  win = (FbBits *) (*pBuf->window) (pScreen,
                                    scr_y,
                                    scr_x << 2,
                                    SHADOW_WINDOW_WRITE,
                                    &winSize,
                                    pBuf->closure);

The scr_x << 2 part seems, to me, to assume that
sizeof(FbBits) == 4.  Am I missing something, or is this really 
correct?  Anyway, my patch does not make this problem either better
or worse, but this is a chance to fix it if it is a bug...
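
For illustration, the assumption can be spelled out roughly like this.
This is only a sketch: the FbBits typedef below is a stand-in for the
server's own definition, and fb_word_to_byte_offset and
fb_bytes_to_words are hypothetical helpers, not existing functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t FbBits;  /* stand-in; assumes a 32-bit FbBits */

/* Build breaks (negative array size) if FbBits is not 4 bytes wide. */
typedef char fbbits_is_4_bytes[(sizeof(FbBits) == 4) ? 1 : -1];

/* Same value as word_index << 2 as long as sizeof(FbBits) == 4. */
static size_t
fb_word_to_byte_offset(size_t word_index)
{
    return word_index * sizeof(FbBits);
}

/* Same value as nbytes >> 2 as long as sizeof(FbBits) == 4. */
static size_t
fb_bytes_to_words(size_t nbytes)
{
    return nbytes / sizeof(FbBits);
}
```

With a different FbBits width the shifts by 2 would give a wrong byte
offset and word count, while the sizeof forms would still be right.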

Staffan


Index: programs/Xserver/miext/shadow/shrotate.c
===================================================================
RCS file: /scratch/openbsd/cvs/XF4/xc/programs/Xserver/miext/shadow/shrotate.c,v
retrieving revision 1.2
diff -u -r1.2 shrotate.c
--- programs/Xserver/miext/shadow/shrotate.c	3 Nov 2004 00:09:54 -0000	1.2
+++ programs/Xserver/miext/shadow/shrotate.c	20 Sep 2005 23:07:58 -0000
@@ -45,6 +45,106 @@
 #define TOP_TO_BOTTOM  2
 #define BOTTOM_TO_TOP  -2
 
+
+static void 
+shadowUpdateRotatePackedSubRectangle(shadowBufPtr pBuf,
+FbBits *shaLine, int shaFirstShift, 
+int shaStepOverX, int shaStepOverY,
+int shaStepDownX, int shaStepDownY,
+int shaBpp, FbBits shaMask,
+ScreenPtr pScreen,
+int scr_x1, int scr_y, 
+int scr_h, int scr_w,
+int pixelsPerBits)
+{
+FbBits *sha;
+int shaShift;
+int scr_x;
+int w;
+
+/*
+ * Copy the bits, always write across the physical frame buffer
+ * to take advantage of write combining.
+ */
+while (scr_h--)
+{
+   int p;
+   FbBits  bits;
+   FbBits  *win;
+   int i;
+   CARD32  winSize;
+   
+   sha = shaLine;
+   shaShift = shaFirstShift;
+   w = scr_w;
+   scr_x = scr_x1 * shaBpp >> FB_SHIFT;
+   
+   while (w)
+   {
+ /*
+  * Map some of this line
+  */
+   win = (FbBits *) (*pBuf->window) (pScreen,
+                                     scr_y,
+                                     scr_x << 2,
+                                     SHADOW_WINDOW_WRITE,
+                                     &winSize,
+                                     pBuf->closure);
+   i = (winSize >> 2);
+   if (i > w)
+   i = w;
+   w -= i;
+   scr_x += i;
+   /*
+* Copy the portion of the line mapped
+*/
+   while (i--)
+   {
+   bits = 0;
+   p = pixelsPerBits;
+   /*
+* Build one word of output from multiple inputs
+*/
+   while (p--)
+   {
+   bits = FbScrLeft(bits, shaBpp);
+   bits |= FbScrRight (*sha, shaShift) & shaMask;
+   
+   shaShift -= shaStepOverX;
+   if (shaShift >= FB_UNIT)
+   {
+   shaShift -= FB_UNIT;
+   sha--;
+   }
+   else if (shaShift < 0)
+   {
+   shaShift += FB_UNIT;
+   sha++;
+   }
+   sha += shaStepOverY;
+   }
+   *win++ = bits;
+   }
+   }
+   scr_y++;
+   shaFirstShift -= shaStepDownX;
+   if (shaFirstShift >= FB_UNIT)
+   {
+   shaFirstShift -= FB_UNIT;
+   shaLine--;
+   }
+   else if (shaFirstShift < 0)
+   {
+   shaFirstShift += FB_UNIT;
+   shaLine++;
+   }
+   shaLine += shaStepDownY;
+}
+}
+
+#define BLOCKSIZE_HEIGHT 32
+#define BLOCKSIZE_WIDTH 32
+
 void
 shadowUpdateRotatePacked (ScreenPtrpScreen,