Re: EXA performance problem

2011-11-28 Thread Christoph Bartoschek

On 28.11.2011 07:43, Maarten Maathuis wrote:
 ___

xorg@lists.freedesktop.org: X.Org support
Archives: http://lists.freedesktop.org/archives/xorg
Info: http://lists.freedesktop.org/mailman/listinfo/xorg
Your subscription address: madman2...@gmail.com



> No, damage is an extension, it is called by EXA, it's probably adding
> all your rectangles to a damage region used to determine how much data
> is actually valid (needed for ram<-->vram migrations for example).
>
> One thing that just comes to mind, if you are rendering a million
> rectangles, how many of those do you actually see on your screen?


Most of them are only 1x1 pixel in size, and lots of rectangles share the
same pixel. I assume that one can optimize the application, but I did
not write it and do not know how it works. I only know that it was able
to show vector pictures consisting of millions of rectangles (VLSI design
data) within seconds when run on XFree86. With Xorg it takes minutes.


I only see the problem because we recently upgraded our X11 thin clients 
to better hardware. But they turned out to be much slower than the older 
ones.


My quest to find the problem has led me to the damage extension now. 
First I thought it was a network problem. But Xorg was also slow on my 
notebook when the program was started locally.


The contrast is striking: the old XFree86 thin clients were able to draw
all the rectangles that were sent over a 100 MBit ethernet network in
seconds, while my more powerful Xorg server needs minutes even though the
software runs on the same machine.



However, I was able to improve the runtime of the first operation in 
damagePolyRectangle. The runtime of my benchmark went down from 90 
seconds to 64 seconds.


Now one has to look at
(*pGC->ops->PolyRectangle)(pDrawable, pGC, nRects, pRects);


Thanks
Christoph


Re: EXA performance problem

2011-11-28 Thread Christoph Bartoschek

On 28.11.2011 10:35, Christoph Bartoschek wrote:

> Now one has to look at
> (*pGC->ops->PolyRectangle)(pDrawable, pGC, nRects, pRects);


Here is what I see so far:

- damagePolyRectangle is called for 2044 rectangles.

- The damage region is computed; it consists of about 1000 rectangles
  each time.

- miPolyRectangle is called.

- The function iterates over all rectangles and calls exaPolylines for
  each of them, because most have a width and height of only 0.

- exaPolylines calls ExaCheckPolylines.


We see that ExaCheckPolylines is called for each rectangle. I have added
timers to this function to see what costs time:



void
ExaCheckPolylines (DrawablePtr pDrawable, GCPtr pGC,
                   int mode, int npt, DDXPointPtr ppt)
{
  EXA_PRE_FALLBACK_GC(pGC);
  EXA_FALLBACK(("to %p (%c), width %d, mode %d, count %d\n",
                pDrawable, exaDrawableLocation(pDrawable),
                pGC->lineWidth, mode, npt));

  exaPrepareAccess (pDrawable, EXA_PREPARE_DEST);        // Step1: 55 s
  exaPrepareAccessGC (pGC);                              // Step2: 2.4 s
  pGC->ops->Polylines (pDrawable, pGC, mode, npt, ppt);  // Step3: 2.4 s
  exaFinishAccessGC (pGC);                               // Step4: 2.2 s
  exaFinishAccess (pDrawable, EXA_PREPARE_DEST);         // Step5: 2.2 s
  EXA_POST_FALLBACK_GC(pGC);
}


We see that exaPrepareAccess needs most of the time. Is that expected?
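
Per-step totals like the ones in the comments above can be collected with a small accumulating timer. The following is only a sketch of that idea: TIMED_STEP, step_seconds and report_steps are made-up names, not part of the X server, and clock() measures processor time rather than wall-clock time, which is an assumption about what is worth measuring here.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* All names here are hypothetical helpers, not part of the X server. */
#define STEP_COUNT 5

double step_seconds[STEP_COUNT];   /* accumulated processor time per step */

/* Run one statement and add the processor time it used to a step's total. */
#define TIMED_STEP(step, stmt)                                            \
    do {                                                                  \
        clock_t t0_ = clock();                                            \
        stmt;                                                             \
        assert((step) >= 0 && (step) < STEP_COUNT);                       \
        step_seconds[(step)] += (double)(clock() - t0_) / CLOCKS_PER_SEC; \
    } while (0)

/* Print the totals in the "StepN: x s" form used in the comments above. */
void report_steps(FILE *out)
{
    for (int i = 0; i < STEP_COUNT; i++)
        fprintf(out, "Step%d: %.1f s\n", i + 1, step_seconds[i]);
}
```

Inside ExaCheckPolylines each call would then be wrapped, for example TIMED_STEP(0, exaPrepareAccess (pDrawable, EXA_PREPARE_DEST)); and report_steps(stderr) called once the test case has finished.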

Inside, we see that there are some region operations on the damage region
in exaCopyDirty. As said before, the damage region contains about 1000
rectangles. So we have about 2000 calls, each performing several
operations on 1000 rectangles.


I think this explains the runtime.

Isn't it somehow possible to batch the rectangle drawing such that the
region operations are not necessary for each rectangle?
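
A toy cost model makes the batching question concrete. It is not EXA code: ToyRegion, toy_prepare_access and the two draw functions are invented for illustration, and the model simply assumes that preparing access costs one visit per rectangle in the damage region.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a damage region: only its rectangle count matters here. */
typedef struct { size_t box_count; } ToyRegion;

unsigned long boxes_visited;   /* work counter of the model */

/* Models the region work done when access is prepared:
 * assumed proportional to the number of boxes in the damage region. */
void toy_prepare_access(const ToyRegion *damage)
{
    assert(damage != NULL);
    boxes_visited += damage->box_count;
}

/* Pattern described in the thread: prepare once per rectangle. */
void toy_draw_each(const ToyRegion *damage, size_t nrects)
{
    for (size_t i = 0; i < nrects; i++) {
        toy_prepare_access(damage);
        /* rectangle i would be drawn here */
    }
}

/* Batched variant: prepare once, then draw the whole request. */
void toy_draw_batched(const ToyRegion *damage, size_t nrects)
{
    toy_prepare_access(damage);
    for (size_t i = 0; i < nrects; i++) {
        /* rectangle i would be drawn here */
    }
}
```

With the numbers from above (2044 rectangles, about 1000 damage boxes) the model counts 2,044,000 box visits per region operation in the unbatched case and 1000 in the batched one. Whether EXA can really be restructured like this is exactly the open question.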


Isn't it possible to expand the damage region such that it contains fewer
rectangles?
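
The most extreme form of such an expansion is to replace the region by its bounding box. A minimal sketch, assuming a plain box type with exclusive x2/y2 (Box and bounding_box are illustrative names, not the server's region API):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative box type; x2 and y2 are exclusive. */
typedef struct { int x1, y1, x2, y2; } Box;

/* Smallest single box that covers all n boxes (n must be > 0). */
Box bounding_box(const Box *boxes, size_t n)
{
    assert(boxes != NULL && n > 0);
    Box ext = boxes[0];
    for (size_t i = 1; i < n; i++) {
        if (boxes[i].x1 < ext.x1) ext.x1 = boxes[i].x1;
        if (boxes[i].y1 < ext.y1) ext.y1 = boxes[i].y1;
        if (boxes[i].x2 > ext.x2) ext.x2 = boxes[i].x2;
        if (boxes[i].y2 > ext.y2) ext.y2 = boxes[i].y2;
    }
    return ext;
}
```

The trade-off is that the single box usually covers pixels that were never touched, so anything driven by the damage region (such as copying data between memories) would handle more pixels in exchange for far fewer rectangles.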


Is this still the correct list, or should I ask the EXA questions elsewhere?

Christoph


Re: EXA performance problem

2011-11-28 Thread Maarten Maathuis
On Mon, Nov 28, 2011 at 4:49 PM, Christoph Bartoschek
bartosc...@or.uni-bonn.de wrote:
> Is this still the correct list, or should I ask the EXA questions elsewhere?



EXA doesn't have a separate list, but now that you ask, you should
probably move to the xorg-devel mailing list :-)

I don't have any answers right now, but I'll think about it.

-- 
Far away from the primal instinct, the song seems to fade away, the
river get wider between your thoughts and the things we do and say.


Re: EXA performance problem

2011-11-27 Thread Maarten Maathuis
On Sun, Nov 27, 2011 at 3:55 PM, Christoph Bartoschek
bartosc...@or.uni-bonn.de wrote:
> Hi,
>
> I still have a huge performance problem with Xorg. One application that
> painted 2 million rectangles on the screen within a second or so with XFree86
> needs about a minute with Xorg.
>
> Most of the time is spent in libpixman. I've added some debug statements and
> see that pixman_raster_op is called about 7.2 million times during my testcase.
>
> I do not think that pixman itself is the problem. It is just used too often
> by EXA.
>
> Is there anything I can do about this? Is there a better list where I can
> ask? Or do you know a person that might be interested in solving such a
> problem?
>
> Christoph


As far as I know, it basically boils down to this: rendering rectangles
is done in a software library, as you observed. If your pixmap happens
to be outside normal RAM, then a lot of reads will kill performance.
These days the aim should be to use as little core rendering as
possible. A modern toolkit or a rendering library like cairo should
handle this far better.



Re: EXA performance problem

2011-11-27 Thread Chris Wilson
On Sun, 27 Nov 2011 15:55:12 +0100, Christoph Bartoschek 
bartosc...@or.uni-bonn.de wrote:
> Hi,
>
> I still have a huge performance problem with Xorg. One application that
> painted 2 million rectangles on the screen within a second or so with
> XFree86 needs about a minute with Xorg.

The easiest way for anyone else to reproduce this issue would be if you
were to identify the most common slow op and translate that into the
appropriate x11perf command line.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: EXA performance problem

2011-11-27 Thread Christoph Bartoschek

On 27.11.2011 16:13, Maarten Maathuis wrote:

> As far as I know, it basically boils down to this: rendering rectangles
> is done in a software library, as you observed. If your pixmap happens
> to be outside normal RAM, then a lot of reads will kill performance.
> These days the aim should be to use as little core rendering as
> possible. A modern toolkit or a rendering library like cairo should
> handle this far better.



How can I check whether the pixmap is outside the normal RAM? To me it
does not look as if pixman is used for rendering the image. It looks as
if EXA is managing the region that needs updates with pixman routines.
But I could be wrong here.


Christoph


Re: EXA performance problem

2011-11-27 Thread Maarten Maathuis
On Sun, Nov 27, 2011 at 9:40 PM, Christoph Bartoschek
bartosc...@or.uni-bonn.de wrote:

> How can I check whether the pixmap is outside the normal RAM? To me it does
> not look as if pixman is used for rendering the image. It looks as if EXA is
> managing the region that needs updates with pixman routines. But I could be
> wrong here.


Only the driver knows that. If you know which driver it is, you can
also figure out whether the driver handles pixmap allocation itself.
Then you can see whether it is likely to use dedicated video RAM,
which is the only uncached memory (= slow reads) I can think of right
now. I don't know off the top of my head how to determine whether a pixmap
is offscreen (for a driver that allocates pixmaps this means that
the pixmap is under driver control; the alternative is that it's a
malloc private to EXA).



Re: EXA performance problem

2011-11-27 Thread Christoph Bartoschek
I have new information. I am no longer sure whether it is a problem with 
EXA.


I have a testcase that currently takes 90 seconds to draw all 
rectangles. I see that in damage.c two functions are mainly used:


damagePolyRectangle
damagePolyFillRectangle

For each given rectangle, the first function calls

damageDamageBox (pDrawable, box, pGC->subWindowMode);

up to four times, which adds the box to a region. The function then calls
damageRegionAppend.

In total, this part takes 30 seconds of my testcase. I think the code has
quadratic behaviour here because it adds rectangle by rectangle instead
of first collecting them in a region and then calling damageRegionAppend
once. I think removing the quadratic behaviour can reduce the runtime
significantly.
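
Why one-at-a-time insertion can turn quadratic is easiest to see on a toy model. The sketch below is an analogy only: PixelSet, pixel_set_add and pixel_set_build are made-up names, and a sorted array of pixel keys is far simpler than the server's real region code. It contrasts adding 1x1 rectangles one by one, where each insertion may shift everything already stored, with collecting them first and sorting once.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy region: sorted array of distinct pixel keys (e.g. y * stride + x). */
typedef struct {
    unsigned     *keys;
    size_t        count;
    size_t        capacity;
    unsigned long moves;     /* elements shifted so far */
} PixelSet;

/* Insert one key and keep the array sorted: up to O(count) work per call.
 * Returns 1 if added, 0 if already present, -1 on allocation failure. */
int pixel_set_add(PixelSet *s, unsigned key)
{
    assert(s != NULL);
    size_t lo = 0, hi = s->count;
    while (lo < hi) {                        /* binary search for the slot */
        size_t mid = lo + (hi - lo) / 2;
        if (s->keys[mid] < key) lo = mid + 1; else hi = mid;
    }
    if (lo < s->count && s->keys[lo] == key)
        return 0;
    if (s->count == s->capacity) {
        size_t cap = s->capacity ? s->capacity * 2 : 16;
        unsigned *p = realloc(s->keys, cap * sizeof *p);
        if (p == NULL) return -1;
        s->keys = p;
        s->capacity = cap;
    }
    /* Shift the tail up by one slot: this is the quadratic part. */
    memmove(&s->keys[lo + 1], &s->keys[lo], (s->count - lo) * sizeof *s->keys);
    s->moves += s->count - lo;
    s->keys[lo] = key;
    s->count++;
    return 1;
}

static int cmp_unsigned(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a, y = *(const unsigned *)b;
    return (x > y) - (x < y);
}

/* Batch variant: sort once, then drop duplicates in a single pass.
 * Returns the number of distinct keys left at the front of the array. */
size_t pixel_set_build(unsigned *keys, size_t n)
{
    if (n == 0) return 0;
    qsort(keys, n, sizeof *keys, cmp_unsigned);
    size_t out = 1;
    for (size_t i = 1; i < n; i++)
        if (keys[i] != keys[out - 1])
            keys[out++] = keys[i];
    return out;
}
```

In the worst case (keys arriving in descending order) the one-by-one variant shifts 0 + 1 + ... + (n-1) elements, which grows with the square of n, while the batch variant yields the same sorted set with one sort and one linear pass.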


About 60 seconds are spent in the calls

(*pGC->ops->PolyRectangle)(pDrawable, pGC, nRects, pRects);
(*pGC->ops->PolyFillRect)(pDrawable, pGC, nRects, pRects);

However I do not yet know why they are so slow.

Is damage.c part of EXA?

Christoph


Re: EXA performance problem

2011-11-27 Thread Maarten Maathuis
On Mon, Nov 28, 2011 at 2:41 AM, Christoph Bartoschek
bartosc...@or.uni-bonn.de wrote:
> Is damage.c part of EXA?



No, damage is an extention, it is called by EXA, it's probably adding
all you rectangles to a damage region used to determine how much data
is actually valid (needed for ram<-->vram migrations for example).

One thing that just comes to mind, if you are rendering a million
rectangles, how many of those do you actually see on your screen?

Anyway, you can try optimizing damaga, exa and either fb or mi (for
PolyRectangle PolyFillRect software ops). I don't know how efficient
the region code is at reducing the number of rectangles if they
overlap, a region is built up out of rectangles as well.



Re: EXA performance problem

2011-11-27 Thread Maarten Maathuis
On Mon, Nov 28, 2011 at 7:43 AM, Maarten Maathuis madman2...@gmail.com wrote:


s/damaga/damage and s/extention/extension and s/you rectangles/your rectangles

It was too early in the morning :-)
