Re: i915 performance, master, i915tex gem

2008-05-20 Thread Thomas Hellström
Keith Packard wrote:
 On Mon, 2008-05-19 at 20:11 +0100, Keith Whitwell wrote:

   
 I'm still confused by your test setup...  Stepping back from cache
 metaphysics, why doesn't classic pin the hardware, if it's still got
 60% cpu to burn?
 

 glxgears under classic is definitely not pinning the hardware -- the
 'intel_idle' tool shows that it's only using about 70% of the GPU. GEM
 is pinning the hardware. Usually this means there's some synchronization
 between the CPU and GPU causing each to wait part of the time while the
 other executes. I haven't really looked at the non-gem case though; the
 numbers seem similar enough to what I've seen in the past.

   
 I think getting reproducible results makes a lot of sense.  What
 hardware are you actually using -- ie. what is this laptop?
 

 This is a Panasonic CF-R4.

   
So we were actually using a slightly stale version of GEM from Eric's 
repos. Michel reran his tests with the indicated versions, and the 
results did not change significantly.

I've rebuilt on an HP (Compaq) nx 7300 laptop, 1GB single-channel i945G, 
Celeron M [EMAIL PROTECTED] GHz. Kernel 2.6.25rc4. This is the third system we 
have tested.

I've added the teapot demo, since it should be completely CPU-bound, 
even on this machine.
After the tests, I must say Keith Whitwell's conclusions seem to hold:

* Intel's TTM and GEM's approaches to buffer management translate to
  a lot of extra CPU usage and worse performance.
* With that approach, GEM might improve over TTM, but it's not seen
  here.
* Classic is apparently doing suboptimal syncs that limit its
  performance in some cases (gears, teapot and perhaps openarena);
  one should not benchmark framerates against classic in those cases.

And furthermore

* GEM's tex(sub)image (map and copy to device) performance really
  sucks. It would be good to see some benchmarks using pwrite here.


Gears:

Classic        845fps @ 35%
Intel TTM      942fps @ 61%
GEM            902fps @ 61%
i915tex (TTM)  977fps @ 40%

Openarena + exec anholt @ 640x480

Classic        49.8fps 12.3u 1.1s
Intel TTM      54.0fps 12.9u 4.6s
GEM            50.3fps 12.0u 4.8s
i915tex (TTM)  61.0fps 12.6u 1.7s

Ipers without help screen

Classic        333000 pps
Intel TTM      254000 pps
GEM            GPU lockup
i915tex (TTM)  325000 pps

Teapot

Classic        65.5 fps (CPU at 77%)
Intel TTM      70.3 fps
GEM            GPU lockup
i915tex (TTM)  77.0 fps

Texdown + subimage

Classic        452 + 510 MB/s
Intel TTM      537 + 158 MB/s
GEM            385 + 86 MB/s
i915tex (TTM)  1185 + 1664 MB/s
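
Since the argument in this thread is mostly about CPU cost rather than raw
framerate, one possible way to read the Gears and Openarena figures is to
normalize them by CPU. The sketch below is only an illustration: it assumes
the "@ NN%" column in the Gears results is CPU usage and that "u" and "s" in
the Openarena results are user and system seconds. The helper names are
made up for this example.

```python
# Figures copied from the Gears and Openarena results above.
# Assumption: "@ NN%" is CPU usage; "u"/"s" are user/system seconds.
gears = {  # driver: (frames per second, CPU usage in percent)
    "Classic": (845, 35),
    "Intel TTM": (942, 61),
    "GEM": (902, 61),
    "i915tex (TTM)": (977, 40),
}
openarena = {  # driver: (fps, user seconds, system seconds)
    "Classic": (49.8, 12.3, 1.1),
    "Intel TTM": (54.0, 12.9, 4.6),
    "GEM": (50.3, 12.0, 4.8),
    "i915tex (TTM)": (61.0, 12.6, 1.7),
}


def fps_per_cpu_percent(fps, cpu_percent):
    """Frames per second delivered per percent of CPU consumed."""
    return fps / cpu_percent


def cpu_seconds(user, system):
    """Total CPU time (user + system) spent during the run."""
    return user + system


if __name__ == "__main__":
    for name, (fps, cpu) in gears.items():
        print(f"gears     {name:14s} {fps_per_cpu_percent(fps, cpu):5.1f} fps per CPU-%")
    for name, (fps, user, system) in openarena.items():
        print(f"openarena {name:14s} {cpu_seconds(user, system):5.1f} CPU s at {fps} fps")
```

Read this way, the two drivers at 61% CPU deliver noticeably fewer frames
per unit of CPU than classic and i915tex, even though their raw framerates
are close.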





-
This SF.net email is sponsored by: Microsoft 
Defy all challenges. Microsoft(R) Visual Studio 2008. 
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: i915 performance, master, i915tex gem

2008-05-20 Thread Keith Whitwell
 So possibilities are:
  - batchbuffer starvation -- has

I was going to say 'has this changed significantly' -- and the answer
is that it has of course, with the bufmgr_fake changes...  I can't
tell by quick inspection if these are a likely culprit, but it's
certainly a significant set of changes relative to the classic version
of classic...

  - over-throttling in swapbuffers -- I think we used to let it get
 two frames ahead - has this changed?
  - something else...
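
For readers unfamiliar with the throttling idea mentioned in the list above,
here is a minimal sketch of "let the CPU get at most N frames ahead". It is
not the driver's actual code; FrameThrottle, swap_buffers and wait_for_frame
are hypothetical names used only for illustration.

```python
from collections import deque


class FrameThrottle:
    """Hypothetical sketch: allow at most `max_ahead` unfinished frames."""

    def __init__(self, max_ahead=2):
        if max_ahead < 1:
            raise ValueError("max_ahead must be at least 1")
        self.max_ahead = max_ahead
        self.in_flight = deque()  # frames submitted but not yet waited on

    def swap_buffers(self, frame, wait_for_frame):
        # Block on the oldest frame only when the queue is already full.
        while len(self.in_flight) >= self.max_ahead:
            wait_for_frame(self.in_flight.popleft())
        self.in_flight.append(frame)


if __name__ == "__main__":
    waited = []
    throttle = FrameThrottle(max_ahead=2)
    for frame in range(5):
        throttle.swap_buffers(frame, waited.append)
    print("waited on:", waited)
    print("still in flight:", list(throttle.in_flight))
```

With max_ahead=2 the first two frames are queued without waiting; with
max_ahead=1 every swap waits for the previous frame, which is the kind of
over-throttling that would leave the GPU idle part of the time.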

Keith



Re: i915 performance, master, i915tex gem

2008-05-20 Thread Keith Whitwell
   * Classic is apparently doing suboptimal syncs that limit its
 performance in some cases (gears, teapot and perhaps openarena),
 one should not benchmark framerates against classic in those cases.

As I said elsewhere, I'd like to get to the bottom of this -- it
wasn't always this way.  Otherwise we should abandon 'classic' off the
trunk and use one of the ye olde 7.0 versions.

Keith



Re: i915 performance, master, i915tex gem

2008-05-20 Thread Thomas Hellström
Keith Whitwell wrote:
   * Classic is apparently doing suboptimal syncs that limit its
 performance in some cases (gears, teapot and perhaps openarena),
 one should not benchmark framerates against classic in those cases.
 

 As I said elsewhere, I'd like to get to the bottom of this -- it
 wasn't always this way.  Otherwise we should abandon 'classic' off the
 trunk and use one of the ye olde 7.0 versions.

   
I agree.

I did some benchmarks on TTM vs classic back then and they were quite 
similar, with TTM generally using slightly more CPU, as we would 
expect. TTM of course would do better on apps with certain texture 
functionality due to its single texture copy.

 Keith
   
/Thomas






Re: i915 performance, master, i915tex gem

2008-05-20 Thread Dave Airlie

 So possibilities are:
   - batchbuffer starvation -- has
   - over-throttling in swapbuffers -- I think we used to let it get
 two frames ahead - has this changed?

I would suspect this broke somehow at some point..

Dave.



Re: i915 performance, master, i915tex gem

2008-05-20 Thread Johannes Engel
Hi, everyone,

I wonder how you got any OpenGL-app running using Keith's GEM tree. For 
me even glxgears turns the screen black although AFAIK not necessarily 
crashing the Xserver.
I will further investigate on that.

Best regards, Johannes



Re: i915 performance, master, i915tex gem

2008-05-20 Thread Johannes Engel
Johannes Engel schrieb:
 Hi, everyone,

 I wonder how you got any OpenGL-app running using Keith's GEM tree. 
 For me even glxgears turns the screen black although AFAIK not 
 necessarily crashing the Xserver.
 I will further investigate on that.
OK, at least that seems not to be reproducible, since it does not occur 
at the moment one restart later.
On my 945GM GEM lets kwin4 with composite feel much smoother. But that's 
only subjective. glxgears does not pin the CPU but returns values 
similar to those with TTM.

Greetings, Johannes



Re: i915 performance, master, i915tex gem

2008-05-20 Thread Thomas Hellström
Johannes Engel wrote:
 Hi, everyone,

 I wonder how you got any OpenGL-app running using Keith's GEM tree. For 
 me even glxgears turns the screen black although AFAIK not necessarily 
 crashing the Xserver.
 I will further investigate on that.

 Best regards, Johannes

   
Johannes,
Double-check that you're not enabling AIGLX.

/Thomas

   






Re: i915 performance, master, i915tex gem

2008-05-20 Thread Johannes Engel
Thomas Hellström schrieb:
 Johannes Engel wrote:
 Hi, everyone,

 I wonder how you got any OpenGL-app running using Keith's GEM tree. 
 For me even glxgears turns the screen black although AFAIK not 
 necessarily crashing the Xserver.
 I will further investigate on that.

 Best regards, Johannes

   
 Johannes,
 Double-check that you're not enabling AIGLX.

 /Thomas 
Without AIGLX it does not even run, because I cannot compile the glcore 
driver: the source file seems to be missing all of its includes. :)

Greetings, Johannes



Re: i915 performance, master, i915tex gem

2008-05-19 Thread Thomas Hellström
Keith Whitwell wrote:
 Texdown
 1327MB/s (i915tex)
 551MB/s (master, ttm)
 572MB/s (master, no-ttm)
 Texdown, subimage
 1014MB/s (i915tex)
 134MB/s (master, ttm)
 148MB/s (master, no-ttm)
   

GEM on this machine (kernel 2.6.24) is hitting
Texdown 342MB/s
Texdown, subimage 76MB/s

...

- a separate regression seems to have killed texture upload performance on 
 master/ttm relative to its ancestor i915tex.
   

Actually I think these are mostly issues stemming from not using 
write-combined mappings and instead using write-back mappings with 
clflush and chipset flush before binding to the GTT.

Note that, from what I can tell, the i915 gem driver is still using mmap 
for these operations.
/Thomas

 Keith

   






Re: i915 performance, master, i915tex gem

2008-05-19 Thread Keith Packard
On Mon, 2008-05-19 at 05:09 -0700, Keith Whitwell wrote:

 I
 think the latter is the significant result -- none of these experiments
 in memory management significantly change the command stream the
 hardware has to operate on, so what we're varying essentially is the
 CPU behaviour to achieve that command stream.  And it is in CPU usage
 where GEM (and Keith/Eric's now-abandoned TTM driver) do significantly
 disappoint.

Your GEM results do not match mine; perhaps we're running different
kernels? Anything older than 2.6.24 won't be using clflush and will
instead use wbinvd, a significant performance impact.  Profiling would
show whether this is the case.

I did some fairly simple measurements using openarena and enemy
territory. Kernel version 2.6.25, CPU 1.3GHz Pentium M, 915GMS with the
slowest possible memory. I'm afraid I don't have a working TTM
environment at present; I will try to get that working so I can do more
complete comparisons.

                          fps    real     user   kernel
glxgears classic:         665
glxgears GEM:             889
openarena classic:       17.1   59.19    37.13     1.80
openarena GEM:           24.6   44.06    25.52     5.29
enemy territory classic:  9.0  382.13   226.38    11.51
enemy territory GEM:     15.7  212.80   121.72    40.50

 Or to put it another way, GEM and master/TTM seem to burn huge amounts
 of CPU just running the memory manager.

I'm not seeing that in these demos; actual allocation is costing about
3% of the CPU time. Of course, for this hardware, the obvious solution
of re-using batch buffers would eliminate that cost entirely.

It would be nice to see the kernel time reduced further, but it's not
terrible so far.

-- 
[EMAIL PROTECTED]




Re: i915 performance, master, i915tex gem

2008-05-19 Thread Thomas Hellström
Keith Packard wrote:
 On Mon, 2008-05-19 at 05:09 -0700, Keith Whitwell wrote:

   
 I
 think the latter is the significant result -- none of these experiments
 in memory management significantly change the command stream the
 hardware has to operate on, so what we're varying essentially is the
 CPU behaviour to achieve that command stream.  And it is in CPU usage
 where GEM (and Keith/Eric's now-abandoned TTM driver) do significantly
 disappoint.
 

 Your GEM results do not match mine; perhaps we're running different
 kernels? Anything older than 2.6.24 won't be using clflush and will
 instead use wbinvd, a significant performance impact.  Profiling would
 show whether this is the case.

 I did some fairly simple measurements using openarena and enemy
 territory. Kernel version 2.6.25, CPU 1.3GHz Pentium M, 915GMS with the
 slowest possible memory. I'm afraid I don't have a working TTM
 environment at present; I will try to get that working so I can do more
 complete comparisons.
   
                           fps    real     user   kernel
 glxgears classic:         665
 glxgears GEM:             889
 openarena classic:       17.1   59.19    37.13     1.80
 openarena GEM:           24.6   44.06    25.52     5.29
 enemy territory classic:  9.0  382.13   226.38    11.51
 enemy territory GEM:     15.7  212.80   121.72    40.50

   
Keith,

The GEM timings were done with 2.6.25, except for the texdown timings on 
the i915 system, which used 2.6.24.
Indeed, Michel reported much worse GEM figures with 2.6.23.

Your figures look a bit odd. Is glxgears classic CPU-bound? If not, why 
does it give a significantly slower framerate than
glxgears GEM?

The other apps are obviously GPU bound judging from the timings. They 
shouldn't really differ in frame-rate?
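
The "GPU bound" reading can be checked roughly by comparing CPU time with
wall-clock time in Keith Packard's table. This is only a sketch under stated
assumptions: the real/user/kernel columns are taken to be times in the same
unit, and the split between the fps and real columns for enemy territory
(9.0 / 382.13 and 15.7 / 212.80) is an assumption. cpu_busy_fraction is a
made-up helper name.

```python
# (real, user, kernel) times copied from the table quoted above.
# Assumption: the enemy territory rows split as 9.0 fps / 382.13 real
# and 15.7 fps / 212.80 real.
timings = {
    "openarena classic": (59.19, 37.13, 1.80),
    "openarena GEM": (44.06, 25.52, 5.29),
    "enemy territory classic": (382.13, 226.38, 11.51),
    "enemy territory GEM": (212.80, 121.72, 40.50),
}


def cpu_busy_fraction(real, user, kernel):
    """Fraction of wall-clock time the process spent running on the CPU."""
    return (user + kernel) / real


if __name__ == "__main__":
    for name, (real, user, kernel) in timings.items():
        busy = cpu_busy_fraction(real, user, kernel)
        print(f"{name:24s} CPU busy {busy:.2f} of wall-clock time")
```

In every row the process is on the CPU for well under the full wall-clock
time, which is consistent with it waiting on something else (presumably the
GPU) for the remainder.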

/Thomas







Re: i915 performance, master, i915tex gem

2008-05-19 Thread Keith Packard
On Mon, 2008-05-19 at 20:32 +0200, Thomas Hellström wrote:
 Keith Packard wrote:
  On Mon, 2008-05-19 at 05:09 -0700, Keith Whitwell wrote:
 

  I
  think the latter is the significant result -- none of these experiments
  in memory management significantly change the command stream the
  hardware has to operate on, so what we're varying essentially is the
  CPU behaviour to achieve that command stream.  And it is in CPU usage
  where GEM (and Keith/Eric's now-abandoned TTM driver) do significantly
  disappoint.
  
 
  Your GEM results do not match mine; perhaps we're running different
  kernels? Anything older than 2.6.24 won't be using clflush and will
  instead use wbinvd, a significant performance impact.  Profiling would
  show whether this is the case.
 
  I did some fairly simple measurements using openarena and enemy
  territory. Kernel version 2.6.25, CPU 1.3GHz Pentium M, 915GMS with the
  slowest possible memory. I'm afraid I don't have a working TTM
  environment at present; I will try to get that working so I can do more
  complete comparisons.
  
                            fps    real     user   kernel
  glxgears classic:         665
  glxgears GEM:             889
  openarena classic:       17.1   59.19    37.13     1.80
  openarena GEM:           24.6   44.06    25.52     5.29
  enemy territory classic:  9.0  382.13   226.38    11.51
  enemy territory GEM:     15.7  212.80   121.72    40.50
 

 Keith,
 
 The GEM timings were done with 2.6.25, except on the i915 system texdown 
 timings which used 2.6.24.
 Indeed, Michel reported much worse GEM figures with 2.6.23.

We clearly need to find a way to generate reproducible benchmark data.

Here's what I'm running:

kernel: 

commit 4b119e21d0c66c22e8ca03df05d9de623d0eb50f
Author: Linus Torvalds [EMAIL PROTECTED]
Date:   Wed Apr 16 19:49:44 2008 -0700

Linux 2.6.25

(there's a patch to export shmem_file_setup on top of this)

mesa (from git://people.freedesktop.org/~keithp/mesa):

commit 8b49cc104dd556218fc769178b96f4a8a428d057
Author: Keith Packard [EMAIL PROTECTED]
Date:   Sat May 17 23:34:47 2008 -0700

[intel-gem] Don't calloc reloc buffers

Only a few relocations are typically used, so don't clear the
whole thing.

drm (from git://people.freedesktop.org/~keithp/drm):

commit 6e46a3c762919af05fcc6a08542faa7d185487a1
Author: Eric Anholt [EMAIL PROTECTED]
Date:   Mon May 12 15:42:20 2008 -0700

[GEM] Update testcases for new API.

xf86-video-intel (from git://people.freedesktop.org/~keithp/xf86-video-intel):

commit c81050c0058e32098259b5078515807038beb7d6
Merge: 9c9a5d0... e9532f3...
Author: Keith Packard [EMAIL PROTECTED]
Date:   Sat May 17 23:26:14 2008 -0700

Merge commit 'origin/master' into drm-gem

 Your figures look a bit odd. Is glxgears classic CPU-bound? If not, why 
 does it give a significantly slower framerate than
 glxgears GEM?

glxgears uses 40% of the CPU in both classic and gem. Note that the gem
version takes about 20 seconds to reach a steady state -- the gem driver
isn't clearing the gtt actively and so glxgears gets far ahead of the
gpu.

My theory is that this shows that using cache-aware copies from a single
static batch buffer (as gem does now) improves cache performance and
write bandwidth.

-- 
[EMAIL PROTECTED]




Re: i915 performance, master, i915tex gem

2008-05-19 Thread Keith Whitwell

 glxgears uses 40% of the CPU in both classic and gem. Note that the gem
 version takes about 20 seconds to reach a steady state -- the gem driver
 isn't clearing the gtt actively and so glxgears gets far ahead of the
 gpu.

 My theory is that this shows that using cache-aware copies from a single
 static batch buffer (as gem does now) improves cache performance and
 write bandwidth.

I'm still confused by your test setup...  Stepping back from cache
metaphysics, why doesn't classic pin the hardware, if it's still got
60% cpu to burn?

I think getting reproducible results makes a lot of sense.  What
hardware are you actually using -- ie. what is this laptop?

Keith



Re: i915 performance, master, i915tex gem

2008-05-19 Thread Keith Packard
On Mon, 2008-05-19 at 20:11 +0100, Keith Whitwell wrote:

 I'm still confused by your test setup...  Stepping back from cache
 metaphysics, why doesn't classic pin the hardware, if it's still got
 60% cpu to burn?

glxgears under classic is definitely not pinning the hardware -- the
'intel_idle' tool shows that it's only using about 70% of the GPU. GEM
is pinning the hardware. Usually this means there's some synchronization
between the CPU and GPU causing each to wait part of the time while the
other executes. I haven't really looked at the non-gem case though; the
numbers seem similar enough to what I've seen in the past.

 I think getting reproducible results makes a lot of sense.  What
 hardware are you actually using -- ie. what is this laptop?

This is a Panasonic CF-R4.

-- 
[EMAIL PROTECTED]

