Re: EXA performance problem
On 28.11.2011 07:43, Maarten Maathuis wrote:
> No, damage is an extention, it is called by EXA, it's probably adding all you rectangles to a damage region used to determine how much data is actually valid (needed for ram/vram migrations for example). One thing that just comes to mind, if you are rendering a million rectangles, how many of those do you actually see on your screen?

Most of them are only 1x1 pixel wide, and many rectangles share the same pixel. I assume that one could optimize the application, but I did not write it and do not know how it works. I only know that it was able to show vector pictures consisting of millions of rectangles (VLSI design data) within seconds when run on XFree86. With Xorg it takes minutes.

I only noticed the problem because we recently upgraded our X11 thin clients to better hardware, but they turned out to be much slower than the old ones. My search for the cause has now led me to the damage extension. At first I thought it was a network problem, but Xorg was also slow on my notebook when the program was started locally. The contrast is striking: the old XFree86 thin clients were able to draw all the rectangles sent over a 100 MBit Ethernet network in seconds, while my more powerful Xorg server needs minutes even though the software runs on the same machine.

However, I was able to improve the runtime of the first operation in damagePolyRectangle. The runtime of my benchmark went down from 90 seconds to 64 seconds. Now one has to look at

    (*pGC->ops->PolyRectangle)(pDrawable, pGC, nRects, pRects);

Thanks
Christoph
___
xorg@lists.freedesktop.org: X.Org support
Archives: http://lists.freedesktop.org/archives/xorg
Info: http://lists.freedesktop.org/mailman/listinfo/xorg
Your subscription address: arch...@mail-archive.com
Re: EXA performance problem
On 28.11.2011 10:35, Christoph Bartoschek wrote:
> Now one has to look at (*pGC->ops->PolyRectangle)(pDrawable, pGC, nRects, pRects);

Here is what I see so far:
- damagePolyRectangle is called for 2044 rectangles.
- The damage region is computed; it consists of about 1000 rectangles each time.
- miPolyRectangle is called.
- miPolyRectangle iterates over all rectangles and calls exaPolylines for each of them, because most of them have a width and height of 0.
- exaPolylines calls ExaCheckPolylines.

So ExaCheckPolylines is called once for each rectangle. I have added timers to this function to see what costs time:

    void
    ExaCheckPolylines (DrawablePtr pDrawable, GCPtr pGC,
                       int mode, int npt, DDXPointPtr ppt)
    {
        EXA_PRE_FALLBACK_GC(pGC);
        EXA_FALLBACK(("to %p (%c), width %d, mode %d, count %d\n",
                      pDrawable, exaDrawableLocation(pDrawable),
                      pGC->lineWidth, mode, npt));
        exaPrepareAccess (pDrawable, EXA_PREPARE_DEST);       // Step1: 55 s
        exaPrepareAccessGC (pGC);                             // Step2: 2.4 s
        pGC->ops->Polylines (pDrawable, pGC, mode, npt, ppt); // Step3: 2.4 s
        exaFinishAccessGC (pGC);                              // Step4: 2.2 s
        exaFinishAccess (pDrawable, EXA_PREPARE_DEST);        // Step5: 2.2 s
        EXA_POST_FALLBACK_GC(pGC);
    }

We see that exaPrepareAccess takes most of the time. Is that expected? Inside it, exaCopyDirty performs several region operations on the damage region. As said before, the damage region contains about 1000 rectangles, so we have about 2000 calls, each doing several operations on 1000 rectangles. I think this explains the runtime.

Isn't it somehow possible to batch the rectangle drawing so that the region operations are not necessary for each rectangle? Isn't it possible to expand the damage region so that it contains fewer rectangles?

Is this still the correct list, or should I ask the EXA questions elsewhere?

Christoph
Re: EXA performance problem
On Mon, Nov 28, 2011 at 4:49 PM, Christoph Bartoschek <bartosc...@or.uni-bonn.de> wrote:
> Is this still the correct list, or should I ask the EXA questions elsewhere?

EXA doesn't have a separate list, but now that you ask, you should probably move to the xorg-devel mailing list :-)

I don't have any answers right now, but I'll think about it.

-- 
Far away from the primal instinct, the song seems to fade away, the river get wider between your thoughts and the things we do and say.
Re: EXA performance problem
On Sun, Nov 27, 2011 at 3:55 PM, Christoph Bartoschek <bartosc...@or.uni-bonn.de> wrote:
> Hi,
>
> I still have a huge performance problem with Xorg. One application that painted 2 million rectangles on the screen within a second or so with XFree86 needs about a minute with Xorg.
>
> Most of the time is spent in libpixman. I've added some debug statements and see that pixman_raster_op is called about 7.2 million times during my testcase. I do not think that pixman itself is the problem. It is just used too often by EXA.
>
> Is there anything I can do about this? Is there a better list where I can ask? Or do you know a person that might be interested in solving such a problem?
>
> Christoph

As far as I know, it basically boils down to this: rendering rectangles is done in a software library, as you observed. If your pixmap happens to be outside normal RAM, then a lot of reads will kill performance. These days the aim should be to use as little core rendering as possible. A modern toolkit or a rendering library like cairo should handle this far better.
Re: EXA performance problem
On Sun, 27 Nov 2011 15:55:12 +0100, Christoph Bartoschek <bartosc...@or.uni-bonn.de> wrote:
> Hi,
>
> I still have a huge performance problem with Xorg. One application that painted 2 million rectangles on the screen within a second or so with XFree86 needs about a minute with Xorg.

The easiest way for anyone else to reproduce this issue would be for you to identify the most common slow operation and translate it into the appropriate x11perf command line.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
Re: EXA performance problem
On 27.11.2011 16:13, Maarten Maathuis wrote:
> As far as I know, it basically boils down to this: rendering rectangles is done in a software library, as you observed. If your pixmap happens to be outside normal RAM, then a lot of reads will kill performance.

How can I check whether the pixmap is outside normal RAM?

To me it does not look as if pixman is used for rendering the image. It looks as if EXA is managing the region that needs updates with pixman routines. But I could be wrong here.

Christoph
Re: EXA performance problem
On Sun, Nov 27, 2011 at 9:40 PM, Christoph Bartoschek <bartosc...@or.uni-bonn.de> wrote:
> How can I check whether the pixmap is outside normal RAM?

Only the driver knows that. If you know which driver it is, you can figure out whether the driver handles pixmap allocation itself. Then you can see whether it is likely to use dedicated video RAM, which is the only uncached (= slow to read) memory I can think of right now.

I don't know off the top of my head how to determine whether a pixmap is offscreen. For a driver that allocates pixmaps, this means the pixmap is under driver control; the alternative is that it is plain malloc'ed memory private to EXA.
Re: EXA performance problem
I have new information. I am no longer sure whether it is a problem with EXA.

I have a testcase that currently takes 90 seconds to draw all rectangles. I see that two functions in damage.c are mainly used:

    damagePolyRectangle
    damagePolyFillRectangle

For each given rectangle, the first function calls

    damageDamageBox (pDrawable, box, pGC->subWindowMode);

up to four times, which adds the box to a region. The function then calls damageRegionAppend. This part takes 30 seconds in total in my testcase. I think the code has quadratic behaviour here, because it adds rectangle by rectangle instead of first collecting them in a region and then calling damageRegionAppend once. I think removing the quadratic behaviour can reduce the runtime significantly.

About 60 seconds are spent in the calls

    (*pGC->ops->PolyRectangle)(pDrawable, pGC, nRects, pRects);
    (*pGC->ops->PolyFillRect)(pDrawable, pGC, nRects, pRects);

However, I do not yet know why they are so slow.

Is damage.c part of EXA?

Christoph
Re: EXA performance problem
On Mon, Nov 28, 2011 at 2:41 AM, Christoph Bartoschek <bartosc...@or.uni-bonn.de> wrote:
> Is damage.c part of EXA?

No, damage is an extention, it is called by EXA, it's probably adding all you rectangles to a damage region used to determine how much data is actually valid (needed for ram/vram migrations for example). One thing that just comes to mind: if you are rendering a million rectangles, how many of those do you actually see on your screen?

Anyway, you can try optimizing damaga, exa and either fb or mi (for the PolyRectangle and PolyFillRect software ops). I don't know how efficient the region code is at reducing the number of rectangles if they overlap; a region is built up out of rectangles as well.
Re: EXA performance problem
On Mon, Nov 28, 2011 at 7:43 AM, Maarten Maathuis <madman2...@gmail.com> wrote:
> No, damage is an extention, it is called by EXA, it's probably adding all you rectangles to a damage region used to determine how much data is actually valid (needed for ram/vram migrations for example).
> [...]
> Anyway, you can try optimizing damaga, exa and either fb or mi (for the PolyRectangle and PolyFillRect software ops).

s/damaga/damage and s/extention/extension and s/you rectangles/your rectangles

It was too early in the morning :-)