Re: i915 performance, master, i915tex gem
Keith Packard wrote:
> On Mon, 2008-05-19 at 20:11 +0100, Keith Whitwell wrote:
>> I'm still confused by your test setup... Stepping back from cache metaphysics, why doesn't classic pin the hardware, if it's still got 60% CPU to burn?
>
> glxgears under classic is definitely not pinning the hardware -- the 'intel_idle' tool shows that it's only using about 70% of the GPU. GEM is pinning the hardware. Usually this means there's some synchronization between the CPU and GPU causing each to wait part of the time while the other executes. I haven't really looked at the non-gem case though; the numbers seem similar enough to what I've seen in the past.
>
>> I think getting reproducible results makes a lot of sense. What hardware are you actually using -- i.e. what is this laptop?
>
> This is a Panasonic CF-R4.

So we were actually using a slightly stale version of GEM from Eric's repos. Michel reran his tests with the indicated versions without any significant changes.

I've rebuilt on an HP (Compaq) nx7300 laptop: 1 GB single-channel i945G, Celeron M [EMAIL PROTECTED] GHz, kernel 2.6.25-rc4. This is the third system we have tested. I've added the teapot demo, since it should be completely CPU-bound, even on this machine.

After the tests, I must say Keith Whitwell's conclusions seem to hold:

* The Intel TTM and GEM approaches to buffer management translate to a lot of extra CPU usage and worse performance.
* With that approach, GEM might improve over TTM, but it's not seen here.
* Classic is apparently doing suboptimal syncs that limit its performance in some cases (gears, teapot and perhaps openarena); one should not benchmark framerates against classic in those cases.

And furthermore:

* GEM's tex(sub)image (map and copy to device) performance really sucks. It would be good to see some benchmarks using pwrite here.
Gears:
  Classic          845 fps @ 35%
  Intel TTM        942 fps @ 61%
  GEM              902 fps @ 61%
  i915tex (TTM)    977 fps @ 40%

Openarena + exec anholt @ 640x480:
  Classic          49.8 fps   12.3u  1.1s
  Intel TTM        54.0 fps   12.9u  4.6s
  GEM              50.3 fps   12.0u  4.8s
  i915tex (TTM)    61.0 fps   12.6u  1.7s

Ipers without help screen:
  Classic          333000 pps
  Intel TTM        254000 pps
  GEM              GPU lockup
  i915tex (TTM)    325000 pps

Teapot:
  Classic          65.5 fps (CPU at 77%)
  Intel TTM        70.3 fps
  GEM              GPU lockup
  i915tex (TTM)    77.0 fps

Texdown + subimage:
  Classic          452 + 510 MB/s
  Intel TTM        537 + 158 MB/s
  GEM              385 + 86 MB/s
  i915tex (TTM)    1185 + 1664 MB/s

- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
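For a quick side-by-side reading of the nx7300 throughput numbers, a small Python sketch can normalize each driver against classic. The figures are copied from the results above; the dictionary keys and the `relative_to` helper are illustrative names, not part of any driver or benchmark tool.

```python
# Throughput figures copied from the nx7300 results above (higher is better).
RESULTS = {
    "gears (fps)": {"classic": 845, "intel-ttm": 942, "gem": 902, "i915tex": 977},
    "texdown (MB/s)": {"classic": 452, "intel-ttm": 537, "gem": 385, "i915tex": 1185},
    "subimage (MB/s)": {"classic": 510, "intel-ttm": 158, "gem": 86, "i915tex": 1664},
}


def relative_to(baseline: str, results: dict) -> dict:
    """Return each driver's score divided by the baseline driver's score,
    rounded to two decimals, per test."""
    return {
        test: {drv: round(score / scores[baseline], 2) for drv, score in scores.items()}
        for test, scores in results.items()
    }


if __name__ == "__main__":
    for test, ratios in relative_to("classic", RESULTS).items():
        print(f"{test:16s}", "  ".join(f"{d}={r:.2f}" for d, r in ratios.items()))
```

A ratio above 1.0 means faster than classic; for example, GEM's subimage upload comes out at roughly a sixth of classic's rate, while i915tex is more than three times faster.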
Re: i915 performance, master, i915tex gem
So possibilities are:

- batchbuffer starvation -- I was going to ask "has this changed significantly?", and the answer is that of course it has, with the bufmgr_fake changes... I can't tell by quick inspection whether these are a likely culprit, but they are certainly a significant set of changes relative to the classic version of classic...
- over-throttling in swapbuffers -- I think we used to let it get two frames ahead; has this changed?
- something else...

Keith
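The "let it get two frames ahead" throttling policy can be modeled in a few lines. This is a minimal sketch, not the driver's code: the `SwapThrottle` class is hypothetical, and each frame's fence is represented simply by its frame number, with `gpu_done_upto` standing for the last frame the GPU has retired.

```python
from collections import deque


class SwapThrottle:
    """Hypothetical model of swapbuffers throttling: the CPU may have at most
    `max_frames_ahead` unfinished frames queued before it must wait on the GPU."""

    def __init__(self, max_frames_ahead: int = 2):
        self.max_frames_ahead = max_frames_ahead
        self.pending = deque()  # frame ids submitted but not yet known to be retired
        self.waits = 0          # how many swaps had to block

    def swap(self, frame_id: int, gpu_done_upto: int) -> None:
        # Forget frames the GPU has already retired (their fences have signalled).
        while self.pending and self.pending[0] <= gpu_done_upto:
            self.pending.popleft()
        # Too far ahead: block on the oldest outstanding frame before queuing more.
        if len(self.pending) >= self.max_frames_ahead:
            self.pending.popleft()  # stands in for waiting on that frame's fence
            self.waits += 1
        self.pending.append(frame_id)
```

With the GPU permanently behind, lowering `max_frames_ahead` from 2 to 1 makes the CPU block on the previous frame at every swap, which is the kind of over-throttling being asked about.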
Re: i915 performance, master, i915tex gem
> * Classic is apparently doing suboptimal syncs that limit its performance in some cases (gears, teapot and perhaps openarena); one should not benchmark framerates against classic in those cases.

As I said elsewhere, I'd like to get to the bottom of this -- it wasn't always this way. Otherwise we should drop 'classic' from the trunk and use one of the ye olde 7.0 versions.

Keith
Re: i915 performance, master, i915tex gem
Keith Whitwell wrote:
>> * Classic is apparently doing suboptimal syncs that limit its performance in some cases (gears, teapot and perhaps openarena); one should not benchmark framerates against classic in those cases.
>
> As I said elsewhere, I'd like to get to the bottom of this -- it wasn't always this way. Otherwise we should drop 'classic' from the trunk and use one of the ye olde 7.0 versions.

I agree. I did some benchmarks on TTM vs classic back then, and they were quite similar, with TTM generally using slightly more CPU, as we would expect. TTM would of course do better on apps with certain texture functionality due to its single texture copy.

> Keith

/Thomas
Re: i915 performance, master, i915tex gem
> So possibilities are:
> - batchbuffer starvation -- has
> - over-throttling in swapbuffers -- I think we used to let it get two frames ahead - has this changed?

I would suspect this broke somehow at some point..

Dave.
Re: i915 performance, master, i915tex gem
Hi everyone,

I wonder how you got any OpenGL app running using Keith's GEM tree. For me, even glxgears turns the screen black, although AFAIK it does not necessarily crash the X server. I will investigate further.

Best regards,
Johannes
Re: i915 performance, master, i915tex gem
Johannes Engel wrote:
> Hi everyone,
> I wonder how you got any OpenGL app running using Keith's GEM tree. For me, even glxgears turns the screen black, although AFAIK it does not necessarily crash the X server. I will investigate further.

OK, at least that does not seem to be reproducible: one restart later, it no longer occurs.

On my 945GM, GEM makes kwin4 with compositing feel much smoother, but that's only subjective. glxgears does not pin the CPU, but it returns values similar to those with TTM.

Greetings,
Johannes
Re: i915 performance, master, i915tex gem
Johannes Engel wrote:
> Hi everyone,
> I wonder how you got any OpenGL app running using Keith's GEM tree. For me, even glxgears turns the screen black, although AFAIK it does not necessarily crash the X server. I will investigate further.
> Best regards,
> Johannes

Johannes,

Double-check that you're not enabling AIGLX.

/Thomas
Re: i915 performance, master, i915tex gem
Thomas Hellström wrote:
> Johannes Engel wrote:
>> Hi everyone,
>> I wonder how you got any OpenGL app running using Keith's GEM tree. For me, even glxgears turns the screen black, although AFAIK it does not necessarily crash the X server. I will investigate further.
>> Best regards,
>> Johannes
>
> Johannes,
> Double-check that you're not enabling AIGLX.
> /Thomas

Without AIGLX it does not even run, because I cannot compile the glcore driver: the source file seems to be missing an include. :)

Greetings,
Johannes
Re: i915 performance, master, i915tex gem
Keith Whitwell wrote:
> Texdown
>   1327 MB/s (i915tex)
>    551 MB/s (master, ttm)
>    572 MB/s (master, no-ttm)
>
> Texdown, subimage
>   1014 MB/s (i915tex)
>    134 MB/s (master, ttm)
>    148 MB/s (master, no-ttm)
>
> GEM on this machine (kernel 2.6.24) is hitting
>   Texdown             342 MB/s
>   Texdown, subimage    76 MB/s
> ...
> - a separate regression seems to have killed texture upload performance on master/ttm relative to its ancestor i915tex.

Actually, I think these are mostly issues stemming from not using write-combined mappings, and instead using write-back mappings with clflush and a chipset flush before binding to the GTT. Note that, from what I can tell, the i915 GEM driver is still using mmap for these operations.

/Thomas

> Keith
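To see why write-back mappings make uploads expensive, note that every cache line written through a write-back mapping has to be flushed before the GPU reads it. Below is a sketch of the address range a clflush loop must cover, assuming 64-byte cache lines (real code should take the line size from CPUID). The function name is hypothetical, and it only computes addresses; it does not flush anything.

```python
CACHE_LINE = 64  # assumed x86 cache-line size; real code should query CPUID


def clflush_addresses(start: int, length: int, line: int = CACHE_LINE) -> list:
    """Addresses a clflush loop would touch to push [start, start + length)
    out of the CPU caches: one flush per cache line overlapping the range."""
    if length <= 0:
        return []
    first = start & ~(line - 1)  # round the start down to a line boundary
    end = start + length         # exclusive end of the range
    return list(range(first, end, line))
```

Under that assumption a single 4 KiB page needs 64 flushes on top of the chipset flush, whereas a write-combined mapping avoids the per-line flushes because the written data does not stay in the CPU caches.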
Re: i915 performance, master, i915tex gem
On Mon, 2008-05-19 at 05:09 -0700, Keith Whitwell wrote:
> I think the latter is the significant result -- none of these experiments in memory management significantly change the command stream the hardware has to operate on, so what we're varying essentially is the CPU behaviour to achieve that command stream. And it is in CPU usage where GEM (and Keith/Eric's now-abandoned TTM driver) do significantly disappoint.

Your GEM results do not match mine; perhaps we're running different kernels? Anything older than 2.6.24 won't be using clflush and will instead use wbinvd, which has a significant performance impact. Profiling would show whether this is the case.

I did some fairly simple measurements using openarena and enemy territory. Kernel version 2.6.25, CPU 1.3GHz Pentium M, 915GMS with the slowest possible memory. I'm afraid I don't have a working TTM environment at present; I will try to get that working so I can do more complete comparisons.

                           fps    real    user  kernel
glxgears classic:          665
glxgears GEM:              889
openarena classic:        17.1   59.19   37.13    1.80
openarena GEM:            24.6   44.06   25.52    5.29
enemy territory classic:   9.0  382.13  226.38   11.51
enemy territory GEM:      15.7  212.80  121.72   40.50

> Or to put it another way, GEM master/TTM seem to burn huge amounts of CPU just running the memory manager.

I'm not seeing that in these demos; actual allocation is costing about 3% of the CPU time. Of course, for this hardware, the obvious solution of re-using batch buffers would eliminate that cost entirely. It would be nice to see the kernel time reduced further, but it's not terrible so far.

--
[EMAIL PROTECTED]
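Re-using batch buffers, as suggested above, amounts to keeping a free list of retired buffers and handing those out before allocating new ones. A minimal sketch, assuming a single buffer size; `BatchBufferPool` is a hypothetical name, and `bytearray` stands in for a real buffer-object allocation.

```python
class BatchBufferPool:
    """Hypothetical sketch: recycle retired batch buffers instead of
    allocating a fresh one for every submission."""

    def __init__(self, size: int):
        self.size = size
        self.free = []         # buffers whose GPU work has completed
        self.allocations = 0   # how often new storage had to be allocated

    def acquire(self) -> bytearray:
        if self.free:
            return self.free.pop()
        self.allocations += 1
        return bytearray(self.size)  # stands in for a real buffer-object allocation

    def release(self, buf: bytearray) -> None:
        # A real driver may only recycle a buffer once the GPU has retired it.
        self.free.append(buf)
```

In a loop that acquires a batch, submits it and releases it once retired, only the first iteration allocates; what remains is waiting for the GPU to retire buffers, which this sketch does not model.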
Re: i915 performance, master, i915tex gem
Keith Packard wrote:
> On Mon, 2008-05-19 at 05:09 -0700, Keith Whitwell wrote:
>> I think the latter is the significant result -- none of these experiments in memory management significantly change the command stream the hardware has to operate on, so what we're varying essentially is the CPU behaviour to achieve that command stream. And it is in CPU usage where GEM (and Keith/Eric's now-abandoned TTM driver) do significantly disappoint.
>
> Your GEM results do not match mine; perhaps we're running different kernels? Anything older than 2.6.24 won't be using clflush and will instead use wbinvd, a significant performance impact. Profiling would show whether this is the case.
>
> I did some fairly simple measurements using openarena and enemy territory. Kernel version 2.6.25, CPU 1.3GHz Pentium M, 915GMS with the slowest possible memory. I'm afraid I don't have a working TTM environment at present; I will try to get that working so I can do more complete comparisons.
>
>                            fps    real    user  kernel
> glxgears classic:          665
> glxgears GEM:              889
> openarena classic:        17.1   59.19   37.13    1.80
> openarena GEM:            24.6   44.06   25.52    5.29
> enemy territory classic:   9.0  382.13  226.38   11.51
> enemy territory GEM:      15.7  212.80  121.72   40.50

Keith,

The GEM timings were done with 2.6.25, except for the texdown timings on the i915 system, which used 2.6.24. Indeed, Michel reported much worse GEM figures with 2.6.23.

Your figures look a bit odd. Is glxgears classic CPU-bound? If not, why does it give a significantly slower framerate than glxgears GEM? The other apps are obviously GPU-bound, judging from the timings, so shouldn't their frame rates be about the same?

/Thomas
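The reading that the other apps are GPU-bound can be checked against the time columns: on a single-core Pentium M, (user + kernel) / real below 1.0 means the CPU sat idle for part of the run. The sketch below uses Keith Packard's openarena and enemy territory numbers. The split of the flattened enemy territory columns (9.0 fps / 382.13 s and 15.7 fps / 212.80 s) is an editorial reading, chosen because the alternative split would put user time above wall-clock time.

```python
# (fps, real s, user s, kernel s) from the 915GMS table; the enemy territory
# fps/real split is an editorial reading of the flattened original columns.
RUNS = {
    "openarena classic": (17.1, 59.19, 37.13, 1.80),
    "openarena GEM": (24.6, 44.06, 25.52, 5.29),
    "enemy territory classic": (9.0, 382.13, 226.38, 11.51),
    "enemy territory GEM": (15.7, 212.80, 121.72, 40.50),
}


def cpu_busy_fraction(real: float, user: float, kernel: float) -> float:
    """Fraction of wall-clock time the (single) CPU spent in user or kernel mode."""
    return (user + kernel) / real


def kernel_share(user: float, kernel: float) -> float:
    """Fraction of the consumed CPU time that was spent in the kernel."""
    return kernel / (user + kernel)


if __name__ == "__main__":
    for name, (fps, real, user, kernel) in RUNS.items():
        print(f"{name:24s} busy={cpu_busy_fraction(real, user, kernel):.0%}"
              f"  kernel share={kernel_share(user, kernel):.0%}")
```

This gives roughly 62% to 76% CPU busy for all four runs, so none of them saturates the CPU; what grows under GEM is the kernel share, about a quarter of the CPU time for enemy territory.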
Re: i915 performance, master, i915tex gem
On Mon, 2008-05-19 at 20:32 +0200, Thomas Hellström wrote:
> Keith Packard wrote:
>> On Mon, 2008-05-19 at 05:09 -0700, Keith Whitwell wrote:
>>> I think the latter is the significant result -- none of these experiments in memory management significantly change the command stream the hardware has to operate on, so what we're varying essentially is the CPU behaviour to achieve that command stream. And it is in CPU usage where GEM (and Keith/Eric's now-abandoned TTM driver) do significantly disappoint.
>>
>> Your GEM results do not match mine; perhaps we're running different kernels? Anything older than 2.6.24 won't be using clflush and will instead use wbinvd, a significant performance impact. Profiling would show whether this is the case.
>>
>> I did some fairly simple measurements using openarena and enemy territory. Kernel version 2.6.25, CPU 1.3GHz Pentium M, 915GMS with the slowest possible memory. I'm afraid I don't have a working TTM environment at present; I will try to get that working so I can do more complete comparisons.
>>
>>                            fps    real    user  kernel
>> glxgears classic:          665
>> glxgears GEM:              889
>> openarena classic:        17.1   59.19   37.13    1.80
>> openarena GEM:            24.6   44.06   25.52    5.29
>> enemy territory classic:   9.0  382.13  226.38   11.51
>> enemy territory GEM:      15.7  212.80  121.72   40.50
>
> Keith,
> The GEM timings were done with 2.6.25, except for the texdown timings on the i915 system, which used 2.6.24. Indeed, Michel reported much worse GEM figures with 2.6.23.

We clearly need to find a way to generate reproducible benchmark data.
Here's what I'm running:

kernel:

    commit 4b119e21d0c66c22e8ca03df05d9de623d0eb50f
    Author: Linus Torvalds [EMAIL PROTECTED]
    Date:   Wed Apr 16 19:49:44 2008 -0700

        Linux 2.6.25

(there's a patch to export shmem_file_setup on top of this)

mesa (from git://people.freedesktop.org/~keithp/mesa):

    commit 8b49cc104dd556218fc769178b96f4a8a428d057
    Author: Keith Packard [EMAIL PROTECTED]
    Date:   Sat May 17 23:34:47 2008 -0700

        [intel-gem] Don't calloc reloc buffers

        Only a few relocations are typically used, so don't clear the whole thing.

drm (from git://people.freedesktop.org/~keithp/drm):

    commit 6e46a3c762919af05fcc6a08542faa7d185487a1
    Author: Eric Anholt [EMAIL PROTECTED]
    Date:   Mon May 12 15:42:20 2008 -0700

        [GEM] Update testcases for new API.

xf86-video-intel (from git://people.freedesktop.org/~keithp/xf86-video-intel):

    commit c81050c0058e32098259b5078515807038beb7d6
    Merge: 9c9a5d0... e9532f3...
    Author: Keith Packard [EMAIL PROTECTED]
    Date:   Sat May 17 23:26:14 2008 -0700

        Merge commit 'origin/master' into drm-gem

> Your figures look a bit odd. Is glxgears classic CPU-bound? If not, why does it give a significantly slower framerate than glxgears GEM?

glxgears uses 40% of the CPU in both classic and GEM. Note that the GEM version takes about 20 seconds to reach a steady state -- the GEM driver isn't clearing the GTT actively, and so glxgears gets far ahead of the GPU. My theory is that this shows that using cache-aware copies from a single static batch buffer (as GEM does now) improves cache performance and write bandwidth.

--
[EMAIL PROTECTED]
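Since GEM's glxgears numbers only settle after about 20 seconds, one way to make runs more reproducible is to discard the warm-up interval before averaging. A minimal sketch; the `(t_end_s, frames, interval_s)` sample format and the 20-second default are assumptions based on the description above, not taken from any existing benchmark harness.

```python
def steady_state_fps(samples, warmup_s: float = 20.0) -> float:
    """Average frame rate over the samples that start after the warm-up period.

    samples: (t_end_s, frames, interval_s) tuples, e.g. one per line of a
    periodic "N frames in T seconds" report.
    """
    # Keep only intervals that begin at or after the end of the warm-up.
    kept = [(frames, dt) for t_end, frames, dt in samples if t_end - dt >= warmup_s]
    seconds = sum(dt for _, dt in kept)
    if seconds <= 0:
        raise ValueError("no samples fall entirely after the warm-up period")
    # Total frames over total time, so unequal intervals are weighted correctly.
    return sum(frames for frames, _ in kept) / seconds
```

glxgears prints a report every five seconds, so with the 20-second default the first four reports are dropped.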
Re: i915 performance, master, i915tex gem
> glxgears uses 40% of the CPU in both classic and gem. Note that the gem version takes about 20 seconds to reach a steady state -- the gem driver isn't clearing the gtt actively and so glxgears gets far ahead of the gpu. My theory is that this shows that using cache-aware copies from a single static batch buffer (as gem does now) improves cache performance and write bandwidth.

I'm still confused by your test setup... Stepping back from cache metaphysics, why doesn't classic pin the hardware, if it's still got 60% CPU to burn?

I think getting reproducible results makes a lot of sense. What hardware are you actually using -- i.e. what is this laptop?

Keith
Re: i915 performance, master, i915tex gem
On Mon, 2008-05-19 at 20:11 +0100, Keith Whitwell wrote:
> I'm still confused by your test setup... Stepping back from cache metaphysics, why doesn't classic pin the hardware, if it's still got 60% CPU to burn?

glxgears under classic is definitely not pinning the hardware -- the 'intel_idle' tool shows that it's only using about 70% of the GPU. GEM is pinning the hardware. Usually this means there's some synchronization between the CPU and GPU causing each to wait part of the time while the other executes. I haven't really looked at the non-gem case though; the numbers seem similar enough to what I've seen in the past.

> I think getting reproducible results makes a lot of sense. What hardware are you actually using -- i.e. what is this laptop?

This is a Panasonic CF-R4.

--
[EMAIL PROTECTED]
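The question of why classic leaves both the CPU and the GPU partly idle, and the answer that CPU/GPU synchronization is usually the cause, can be illustrated with a toy two-stage model. This is a simplification, not a measurement: per-frame CPU and GPU costs are assumed constant, and "overlapped" and "serialized" are the two extremes of a real driver's behaviour.

```python
def frame_time(cpu_ms: float, gpu_ms: float, overlapped: bool) -> float:
    """Per-frame time in a two-stage CPU -> GPU pipeline.

    Overlapped: CPU builds frame N+1 while the GPU draws frame N, so the
    slower stage sets the pace. Serialized: each side waits for the other,
    so the two costs add up.
    """
    return max(cpu_ms, gpu_ms) if overlapped else cpu_ms + gpu_ms


def utilization(cpu_ms: float, gpu_ms: float, overlapped: bool):
    """(CPU busy fraction, GPU busy fraction) under the toy model."""
    t = frame_time(cpu_ms, gpu_ms, overlapped)
    return cpu_ms / t, gpu_ms / t
```

In the serialized case the two utilizations always add up to 100%, so neither side can be pinned while the other does any work. Once the stages overlap, the slower one (here the GPU) reaches 100%, which is the pattern described for GEM pinning the hardware.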