Re: OpenGL apps causes frequent system locks

2005-02-10 Thread Geller Sandor
On Tue, 8 Feb 2005, Michel Dänzer wrote:

 On Mon, 2005-02-07 at 13:40 +0100, Geller Sandor wrote:
 
  Is there any way I can help track down the problem(s)? My machine
  doesn't have a network connection, so I can only use scripts which run in
  the background. With expect and gdb it may be possible to get at least a
  backtrace from my non-local-interactive machine.

 Unfortunately, a backtrace is usually useless for a lockup because all
 it will show you is the X server and/or the client(s) waiting for the
 GPU to become idle, which it never does because it's locked up. The
 problem is finding out what caused it to lock up, and that can be very
 hard and time consuming.

 That being said, I too have noticed slightly decreased stability with
 r200 recently. As this seems to have snuck in gradually, binary searches
 to try and isolate the CVS changes causing problems might be a good
 strategy.

Thanks, I checked out the 2004-08-31 and 2004-09-30 CVS versions of X.org.
I will test with these two snapshots over the weekend, and if the latter
crashes while the former doesn't, I will be able to track down the latest
CVS snapshot which works on my machine without crashes.

  Geller Sandor [EMAIL PROTECTED]




Re: drm race fix for non-core

2005-02-10 Thread Keith Whitwell
Stephane Marchesin wrote:
Hi,
Attached is a straight port of Eric's fix for the drm race to non-core drm.
Committed.
Keith


Re: r300 on PPC problem

2005-02-10 Thread Jerome Glisse
On Thu, 10 Feb 2005 16:16:12 +1100, Benjamin Herrenschmidt
[EMAIL PROTECTED] wrote:
 Hi !
 
 An interesting issue with current X.org CVS and current Linux bk is that
 on r300, the DRI module now loads, and 2D is broken. It looks like an
 endian issue (like pixels are horizontally flipped), I can post a
 snapshot later I suppose. Preventing the kernel module from loading
 fixes it, so I suspect it's an issue with the 2D CCE accel for r300. Is
 this a known problem ?

Yes, I added a bug to Xorg. But I am wondering if the problem is
in the drm. I really don't know enough about that, but I will look at the
drm to see if it may come from there. This is more probably due to
the fact that rbbm_gui_cntl endian swapping doesn't work on r300.

Jerome Glisse




Re: savage-20050205-linux snapshot - problems

2005-02-10 Thread Felix Kühling
On Thursday, 10.02.2005, at 07:21 +0100, [EMAIL PROTECTED] wrote:
 On Monday 07 February 2005 15:33, Felix Kühling wrote:
  On Monday, 07.02.2005, at 15:12 +0100, [EMAIL PROTECTED] wrote:
   Hardware:
  
   Toshiba Libretto L2 Tm5600 with:
  
    0000:00:04.0 VGA compatible controller: S3 Inc. 86C270-294 Savage/IX-MV (rev 13) (prog-if 00 [VGA])
    Subsystem: Toshiba America Info Systems: Unknown device 0001
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 248 (1000ns min, 63750ns max), cache line size 08
    Interrupt: pin A routed to IRQ 11
    Region 0: Memory at e0000000 (32-bit, non-prefetchable) [size=128M]
    Expansion ROM at 000c0000 [disabled] [size=64K]
    Capabilities: <available only to root>
  
   Software:
  
   Gentoo current with Gentoo supplied X Window System
    Version 6.8.1.903 (6.8.2 RC 3)
    Release Date: 25 January 2005
    X Protocol Version 11, Revision 0, Release 6.8.1.903
    Build Operating System: Linux 2.4.29-rc3-mhf239 i686 [ELF]
    Current Operating System: Linux mhfl4 2.4.29-rc3-mhf239 #2 Tue Jan 18 17:43:33 CET 2005 i686
    Build Date: 05 February 2005
  
   Installed snapshot from
   savage-20050205-linux.i386.tar.bz2. On starting X:
  
 [snip]
  
   So, driver in snapshot still reports 1.0. Seems to be
   quite old (2001).
 
  The new Savage DRM 2.0.0 (in fact 2.2.0 by now) is only
  available for Linux 2.6.
 
 Tested with 2.6.11-rc3. DRM functional with glxgears.
 
 tuxkart and tuxracer work most of the time, but sometimes
 painting occurs outside of the game's window. Parts of the image
 appear (sometimes mirrored) outside the game window, or random
 patterns appear. The cursor and numeric display in the game window
 appear as random patterns.

The garbage patterns could mean that it's getting the texture tiling wrong.
I messed with that code recently; it could be that I broke it on Savage/IX.
Please also check whether the latest snapshot fixes this.

 
 Sometimes the above games mess up the screen, but restarting the game a
 few times fixes it.
 
 FlightGear messes up the entire screen and never works.

Weird. I haven't had this kind of problem in a while, though I haven't
tested on my Savage/IX recently. Looks like it's time to swap cards
again.

 
 BTW, the games work on i810 HW with 2.6.11-rc3.
 
  Since Linux 2.4 is no longer open for new features, there is not
  much point back-porting it to Linux 2.4.
 
  See http://dri.freedesktop.org/wiki/S3Savage for more
  information about the savage driver status. I just added
  a note about Linux 2.4 to that page.
 
 Sorry, I have not found any reference to 2.4 being unsupported
 on that page.

Err, I probably pushed Preview instead of Save. :-/

 
 Are there any test programs available to systematically test
 DRM/GL functionality?

For example mesa/progs/demos and Glean (http://glean.sourceforge.net/).
For reference you can always run with indirect software rendering. Set
LIBGL_ALWAYS_INDIRECT in the environment.

 
 Regards
 Michael
 
-- 
| Felix Kühling [EMAIL PROTECTED] http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3  B152 151C 5CC1 D888 E595 |





fglrx vs. r200 dri

2005-02-10 Thread Roland Scheidegger
Since 2 people have asked for it, here are some quick numbers for r200 
dri vs. fglrx.
r200 dri is using a 45MB local tex heap (I believe fglrx reserves pretty
much everything for textures too, so that's only fair...). Btw, fglrx
certainly has made some progress; what I noticed is that at least 2d
subjectively feels much faster (in fact, previously it felt about the
same as when you used ACCEL_MMIO with the radeon driver, but now it
feels pretty much the same as with the open source driver).
fglrx might be at an unfair disadvantage: I think it is not using
pageflip. I don't know if it's using hyperz; last time I checked (with
glxtest) it didn't seem to use it on my setup either (but that was
with an older driver). I suspect it still doesn't, at least not always,
since glxgears (which gets a HUGE boost with hyperz) is now over two
times faster with the r200 driver.
r200 dri uses xorg cvs head, with dri driver from Mesa cvs head, with 
color tiling, texture tiling, hyperz and whatever else I could find 
boosting performance :-).
fglrx uses XFree86 4.3.99.902 (from suse 9.1), with stock configuration, 
except I needed to correct the bus id and switched it to external gart. 
I don't know of any options which would boost performance.
Desktop resolution is 1280x1024, 85Hz.

Q3 demo four fullscreen 1024x768:
r200 dri 1): 129 fps
r200 dri 2): 150 fps
fglrx:   118 fps
Q3 windowed 1024x768
r200 dri 1): 125 fps
r200 dri 2): 145 fps
fglrx 3):108 fps
rtcw demo checkpoint fullscreen 1024x768
r200 dri 1): 85 fps
r200 dri 2): 95 fps
fglrx 4):89 fps
fglrx 5):78 fps
ut2k3 flyby-antalus, low/average/high
r200 dri: 15.750896 / 37.862827 / 281.284637 fps
fglrx:30.838823 / 78.981781 / 688.162048 fps
Ok now the interesting part:
Did I already mention there is a massive performance problem with vertex 
arrays in ut2k3 with the r200 driver? It is really really bad.

Remark 4) 5): 4) is the first benchmark run after the game is started, 
5) are all subsequent runs. I don't know why fglrx is always faster on 
the first run with rtcw, but it behaved like that two years ago already.
Remark 3): It is really impossible to run 3d applications correctly at a
screen resolution of 1280x1024 with 85Hz on my card with fglrx,
independent of the 3d application. There is a lot of flicker going on
around the screen. AFAIK this is still the bug with insufficient
bandwidth allocation for scanout, which was fixed in the open source
radeon driver ages ago (by an ATI employee, no less!).

And now the really interesting thing:
The results marked with 1) were obtained BEFORE running fglrx, the results
marked with 2) AFTER running fglrx, i.e. when I did not reboot between
running the fglrx driver and the radeon driver (which in the past led
to lockups, but driver switching now seems to work fine, in both
directions). This was a completely repeatable effect; I even figured out
that starting the X server with fglrx is not enough, but a simple
glxinfo while it's running triggers it.
Any ideas what's causing this? Maybe fglrx reconfigures the card's 
caches or something like that? It would be nice if we could get that 
additional 10-15% performance, especially if it is as simple as writing 
a single register...

Roland


Re: fglrx vs. r200 dri

2005-02-10 Thread Ian Romanick
Roland Scheidegger wrote:
Any ideas what's causing this? Maybe fglrx reconfigures the card's 
caches or something like that? It would be nice if we could get that 
additional 10-15% performance, especially if it is as simple as writing 
a single register...
My guess would be that the fglrx driver uploads some new microcode when 
it enters 3D mode.



Re: fglrx vs. r200 dri

2005-02-10 Thread Alex Deucher
On Thu, 10 Feb 2005 17:18:44 +0100, Roland Scheidegger
[EMAIL PROTECTED] wrote:
 Since 2 people have asked for it, here are some quick numbers for r200
 dri vs. fglrx.
 r200 dri is using 45MB local tex heap (I believe fglrx reserves pretty
 much anything for textures too, so that's only fair...). btw fglrx
 certainly has made some progress, what I noticed is at least 2d
 subjectively feels much faster (in fact, previously it felt about the
 same as when you used ACCEL_MMIO with the radeon driver, but now it
 feels pretty much the same as with the open source driver).
 fglrx might be at an unfair disadvantage, I think it is not using
 pageflip. Don't know if it's using hyperz, last time I checked (with
 glxtest) it didn't seem to use that on my setup either (but that was
 with an older driver). I suspect it still doesn't, at least not always,
 since glxgears (which gets a HUGE boost with hyperz) is now over two
 times faster with the r200 driver.
 r200 dri uses xorg cvs head, with dri driver from Mesa cvs head, with
 color tiling, texture tiling, hyperz and whatever else I could find
 boosting performance :-).
 fglrx uses XFree86 4.3.99.902 (from suse 9.1), with stock configuration,
 except I needed to correct the bus id and switched it to external gart.
 I don't know of any options which would boost performance.
 Desktop resolution is 1280x1024, 85Hz.
 
 Q3 demo four fullscreen 1024x768:
 r200 dri 1): 129 fps
 r200 dri 2): 150 fps
 fglrx:   118 fps
 
 Q3 windowed 1024x768
 r200 dri 1): 125 fps
 r200 dri 2): 145 fps
 fglrx 3):108 fps
 
 rtcw demo checkpoint fullscreen 1024x768
 r200 dri 1): 85 fps
 r200 dri 2): 95 fps
 fglrx 4):89 fps
 fglrx 5):78 fps
 
 ut2k3 flyby-antalus, low/average/high
 r200 dri: 15.750896 / 37.862827 / 281.284637 fps
 fglrx:30.838823 / 78.981781 / 688.162048 fps
 
 Ok now the interesting part:
 Did I already mention there is a massive performance problem with vertex
 arrays in ut2k3 with the r200 driver? It is really really bad.
 
 Remark 4) 5): 4) is the first benchmark run after the game is started,
 5) are all subsequent runs. I don't know why fglrx is always faster on
 the first run with rtcw, but it behaved like that two years ago already.
 Remark 3): It is really impossible to run 3d applications correctly at a
 screen resolution of 1280x1024 with 85Hz on my card with fglrx,
 independent of the 3d application. There is a lot of flicker going on
 around the screen. AFAIK this still is the bug with insufficient
 bandwidth allocation for scanout, which was fixed in the open source
 radeon driver ages ago (by an ati employee, no less!).
 
 And now the really interesting thing:
 The results marked with 1) are obtained BEFORE running fglrx, the result
 marked with 2) AFTER running fglrx, i.e. when I did not reboot between
 running the fglrx driver and the radeon driver (which in the past led
 to lockups, but driver switching now seems to work fine, in both
 directions). This was a completely repeatable effect, I even figured out
 that starting the X server with fglrx is not enough, but a simple
 glxinfo when it's running triggers it.
 Any ideas what's causing this? Maybe fglrx reconfigures the card's
 caches or something like that? It would be nice if we could get that
 additional 10-15% performance, especially if it is as simple as writing
 a single register...

compare a reg dump (script from Hui):
http://www.botchco.com/alex/radeon/mergedfb/cvs/DRI/hy0/radeon_dump.tgz

Alex

 
 Roland





Re: fglrx vs. r200 dri

2005-02-10 Thread Roland Scheidegger
Ian Romanick wrote:
Roland Scheidegger wrote:
Any ideas what's causing this? Maybe fglrx reconfigures the card's 
caches or something like that? It would be nice if we could get that 
additional 10-15% performance, especially if it is as simple as 
writing a single register...

My guess would be that the fglrx driver uploads some new microcode when 
it enters 3D mode.
That shouldn't matter AFAIK; it may be different, but it will get
completely replaced again when the radeon drm module is loaded again.
Unless I misunderstood something...

Roland



Re: fglrx vs. r200 dri

2005-02-10 Thread Keith Whitwell
Roland Scheidegger wrote:
Ian Romanick wrote:
Roland Scheidegger wrote:
Any ideas what's causing this? Maybe fglrx reconfigures the card's 
caches or something like that? It would be nice if we could get that 
additional 10-15% performance, especially if it is as simple as 
writing a single register...

My guess would be that the fglrx driver uploads some new microcode 
when it enters 3D mode.

That shouldn't matter afaik, it may be different but it will get 
completely replaced again when the radeon drm module is loaded again. 
Unless I misunderstood something...
Makes sense.
An interesting experiment then would be to disable the drm microcode 
upload and see if there are further gains to be had from a newer microcode.
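
As an editorial aside, that experiment should amount to a one-line change in the
DRM: skip the CP microcode write so that whatever fglrx loaded stays on the card,
then rerun the benchmarks. The sketch below assumes the upload happens via
radeon_cp_load_microcode() in radeon_cp.c, right before the ring buffer is
(re)initialised; the exact call site is from memory and should be checked against
the actual tree.

--- radeon_cp.c	(sketch only; context approximate)
 	/* ... in the CP init path ... */
-	radeon_cp_load_microcode( dev_priv );
+	/* radeon_cp_load_microcode( dev_priv ); */	/* keep fglrx's microcode */
 	radeon_cp_init_ring_buffer( dev, dev_priv );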

Keith


Re: fglrx vs. r200 dri

2005-02-10 Thread Ian Romanick
Roland Scheidegger wrote:
Ian Romanick wrote:
Roland Scheidegger wrote:
Any ideas what's causing this? Maybe fglrx reconfigures the card's 
caches or something like that? It would be nice if we could get that 
additional 10-15% performance, especially if it is as simple as 
writing a single register...
My guess would be that the fglrx driver uploads some new microcode 
when it enters 3D mode.
That shouldn't matter afaik, it may be different but it will get 
completely replaced again when the radeon drm module is loaded again. 
Unless I misunderstood something...
Hmm...maybe it adjusts the core / memory clock?


Re: fglrx vs. r200 dri

2005-02-10 Thread Adam Jackson
On Thursday 10 February 2005 11:18, Roland Scheidegger wrote:
 r200 dri uses xorg cvs head, with dri driver from Mesa cvs head, with
 color tiling, texture tiling, hyperz and whatever else I could find
 boosting performance :-).
 fglrx uses XFree86 4.3.99.902 (from suse 9.1), with stock configuration,
 except I needed to correct the bus id and switched it to external gart.
 I don't know of any options which would boost performance.
 Desktop resolution is 1280x1024, 85Hz.

Exactly which fglrx version is this with, the old 3.x series or the new 
8.8.25?

- ajax




Re: fglrx vs. r200 dri

2005-02-10 Thread Roland Scheidegger
Alex Deucher wrote:
And now the really interesting thing:
The results marked with 1) are obtained BEFORE running fglrx, the result
marked with 2) AFTER running fglrx, i.e. when I did not reboot between
running the fglrx driver and the radeon driver (which in the past lead
to lockups, but driver switching now seems to work fine, in both
directions). This was a completely repeatable effect, I even figured out
that starting the X server with fglrx is not enough, but a simple
glxinfo when it's running triggers it.
Any ideas what's causing this? Maybe fglrx reconfigures the card's
caches or something like that? It would be nice if we could get that
additional 10-15% performance, especially if it is as simple as writing
a single register...

compare a reg dump (script from Hui):
http://www.botchco.com/alex/radeon/mergedfb/cvs/DRI/hy0/radeon_dump.tgz
Sounds like a good idea. There are quite a few differences, though I
couldn't see any obvious reason (e.g. just checking out some registers).
If someone wants to take a look I've uploaded the dumps here:
http://homepage.hispeed.ch/rscheidegger/dri_experimental/r200_dumps.tar.gz

dump 1 is taken within radeon driver, after running glxgears.
dump 2 is taken within fglrx driver, after startup
dump 3 is taken within fglrx driver, after glxinfo
dump 4 is taken within radeon driver, after startup
dump 5 is taken within radeon driver, after glxgears
All of course in chronological order...
Roland


Re: fglrx vs. r200 dri

2005-02-10 Thread Roland Scheidegger
Adam Jackson wrote:
On Thursday 10 February 2005 11:18, Roland Scheidegger wrote:
r200 dri uses xorg cvs head, with dri driver from Mesa cvs head, with
color tiling, texture tiling, hyperz and whatever else I could find
boosting performance :-).
fglrx uses XFree86 4.3.99.902 (from suse 9.1), with stock configuration,
except I needed to correct the bus id and switched it to external gart.
I don't know of any options which would boost performance.
Desktop resolution is 1280x1024, 85Hz.

Exactly which fglrx version is this with, the old 3.x series or the new 
8.8.25?
This was the newest 8.8.25 (for xfree 4.3; your suggested linker magic for
using the 6.8 version with xorg cvs head does not work :-)).

Roland


[r2xx|r1xx] readpixels-3 and pntparam_1 - Progress?

2005-02-10 Thread Dieter Nützel
r100-readpixels-3.patch (Stephane)
r200_pntparam_1.diff (Roland)

I ran with both.
Should they be merged?

BTW, the readpix segfault is still there (with and without both patches).

X.org CVS does NOT show this bug.

SunWave1 progs/demos# ./readpix
GL_VERSION = 1.3 Mesa 6.3
GL_RENDERER = Mesa DRI R200 20041207 AGP 4x x86/MMX+/3DNow!+/SSE TCL
GL_OES_read_format supported.  Using type / format = 0x1401 / 0x1908
Loaded 194 by 188 image
Speicherschutzverletzung (core dumped)

Reading symbols from /usr/X11R6/lib/modules/dri/r200_dri.so...done.
Loaded symbols for /usr/X11R6/lib/modules/dri/r200_dri.so
Reading symbols from /usr/X11R6/lib/libexpat.so.0...done.
Loaded symbols for /usr/X11R6/lib/libexpat.so.0
Reading symbols from /usr/lib/libtxc_dxtn.so...done.
Loaded symbols for /usr/lib/libtxc_dxtn.so
Reading symbols from /usr/X11R6/lib/X11/locale/lib/common/xlcDef.so.2...done.
Loaded symbols for /usr/X11R6/lib/X11/locale/lib/common/xlcDef.so.2
#0  0x406a911d in _generic_read_RGBA_span_BGRA_REV_SSE ()
   from /usr/X11R6/lib/modules/dri/r200_dri.so
(gdb) bt
#0  0x406a911d in _generic_read_RGBA_span_BGRA_REV_SSE ()
   from /usr/X11R6/lib/modules/dri/r200_dri.so
#1  0xff131f11 in ?? ()
(gdb) list
262
263TempImage = (GLubyte *) malloc(ImgWidth * ImgHeight * 4 * 
sizeof(GLubyte));
264assert(TempImage);
265 }
266
267
268 int
269 main( int argc, char *argv[] )
270 {
271GLboolean ciMode = GL_FALSE;

-Dieter




Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Felix Kühling
On Wednesday, 09.02.2005, at 22:12 +0100, Felix Kühling wrote:
 On Wednesday, 09.02.2005, at 20:58 +0100, Roland Scheidegger wrote:
[snip] 
  Performance with gart texturing, even in 4x mode, takes a big hit 
  (almost 50%).
   I was not really able to get consistent performance results when both
   texture heaps were active; I guess it's luck of the day which textures
   got put in the gart heap and which ones in the local heap. But the fact
   that performance got faster with a smaller gart heap is not a good
   sign. And even if the maximum obtained in rtcw with a 35MB local heap and
   a 29MB gart heap was higher than the score obtained with the 35MB local
   heap alone, there were clearly areas which ran faster with only the local
   heap. It seems to me that the allocator really should try harder to use
   the local heap to be useful on r200 cards. Moreover, when you DO have to
   put textures into the gart heap, you'd likely get quite a bit better
   performance if you revisit them later, when more space becomes available
   on the local heap, and upload the still-used textures from the gart heap
   to the local heap (in fact, this should be even faster than those
   650MB/s, since no in-kernel copy would be needed; it should be possible
   to blit them directly).
 
 The big problem with the current texture allocator is that it can't tell
 which areas are really unused. Texture space is only allocated and never
 freed. Once the memory is full it starts kicking textures to upload
 new ones. This is the only way of freeing memory. Using an LRU
 strategy it has a good chance of kicking unused textures first, but
 there's no guarantee. It can't tell if a kicked texture will be needed
 the next instant. So trying to move textures from GART to local memory
 would basically mean that you blindly kick the least recently used
 texture(s) from local memory. If those textures are needed again soon
 then performance is going to suffer badly.
 
 Therefore I'm proposing a modified allocator that fails when it needs to
 start kicking too recently used textures (e.g. textures used in the
 current or previous frame). Failure would not be fatal in this case, you
 just keep the texture in GART memory and try again later. Actually you
 could use the same allocator for normal texture uploads. Just specify
 the current texture heap age as the limit.
 
 If you try to move textures back to local memory each time a texture is
 used, this would result in some kind of automatic regulation of heap
 usage. By kicking only textures that are several frames old in this
 process, you'd avoid thrashing.
 
 Currently the texture heap age is only incremented on lock contention
 (IIRC). In this scheme you'd also increment it on buffer swaps and
 remember the texture heap ages of the last two buffer swaps.

I simplified this idea a little further and attached a patch against
texmem.[ch]. When it runs out of space on a texture heap, it frees stale
textures (and also placeholders for other clients' textures) that haven't
been used in 1 second. This way it will try a bit harder to put
textures into the first heap before using the second heap, without much
risk (I hope) of performance regressions.
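
To make the staleness rule concrete, here is a small, purely editorial sketch.
Only the clockAge timestamp (a gettimeofday() stamp taken whenever a texture is
used) and the 1-second threshold come from the attached patch; struct tex,
is_stale() and the little main() driver are invented for illustration and are
not the real texmem.[ch] interfaces.

/* stale_kick.c -- editorial illustration only */
#include <stdio.h>
#include <sys/time.h>

struct tex {
	const char *name;
	double clockAge;	/* seconds, set when the texture was last bound */
};

static double now_seconds(void)
{
	struct timeval tv;
	return gettimeofday(&tv, NULL) == 0
		? (double)tv.tv_sec + (double)tv.tv_usec / 1e6 : 0.0;
}

/* The rule the patch implements: when a heap runs out of space, a texture
 * may be freed early only if it has not been used for max_age seconds. */
static int is_stale(const struct tex *t, double now, double max_age)
{
	return now - t->clockAge >= max_age;
}

int main(void)
{
	double now = now_seconds();
	struct tex fresh = { "fresh", now - 0.1 };	/* used 0.1 s ago */
	struct tex stale = { "stale", now - 2.0 };	/* used 2 s ago   */

	printf("%s: kick=%d\n", fresh.name, is_stale(&fresh, now, 1.0));
	printf("%s: kick=%d\n", stale.name, is_stale(&stale, now, 1.0));
	return 0;
}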

I tested this on a ProSavageDDR where rendering speed appears to be the
same with local and GART textures. There was no measurable performance
regression in Quake3 and I noticed no subjective performance regression
in Torcs or Quake1 either.

Now the only thing missing in texmem.c for migrating textures from GART
to local memory would be a flag to driAllocateTexture to stop trying if
kicking stale textures didn't free up enough space (on the first texture
heap).

Anyway, I think the attached patch should already make a difference as
it is. I'd be interested how much it improves your performance numbers
with Quake3 and rtcw on r200 when both texture heaps are enabled.

 
[snip]

Regards,
  Felix

-- 
| Felix Kühling [EMAIL PROTECTED] http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3  B152 151C 5CC1 D888 E595 |
--- ./texmem.h.~1.6.~	2005-02-02 17:20:40.0 +0100
+++ ./texmem.h	2005-02-10 17:44:40.0 +0100
@@ -101,6 +101,11 @@
 	 * value must be greater than
 	 * or equal to \c firstLevel.
 	 */
+
+	double  clockAge;		/**< Clock time stamp indicating when
+	 * the texture was last used. The unit
+	 * is seconds.
+	 */
 };
 
 
--- ./texmem.c.~1.10.~	2005-02-05 14:16:25.0 +0100
+++ ./texmem.c	2005-02-10 18:39:15.0 +0100
@@ -50,6 +50,7 @@
 #include "texformat.h"
 
 #include <assert.h>
+#include <sys/time.h>
 
 
 
@@ -243,6 +244,13 @@
*/
 
  move_to_head( &heap->texture_objects, t );
+  {
+	 struct timeval tv;
+	 if ( gettimeofday( &tv, NULL ) == 0 ) {
+	t->clockAge = (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
+	 } else
+	t->clockAge = 0.0;
+  }
 
 
   for (i = start ; i = end ; i++) {
@@ -415,6 +423,15 @@
   t->heap = heap;
   if (in_use) 
 

Re: [r2xx|r1xx] readpixels-3 and pntparam_1 - Progress?

2005-02-10 Thread Stephane Marchesin
Dieter Nützel wrote:
r100-readpixels-3.patch (Stephane)
r200_pntparam_1.diff (Roland)
I ran with both.
Should they be merged?
I surely hope to get my readpixels patch merged. However, I found a 
serious flaw in it (not related to the readpix segfault) which I have 
to fix before this happens.

Stephane



r300 vb path

2005-02-10 Thread Ben Skeggs
Hello,
I've attached a patch with a port of the r200 vertex buffer code for the 
r300 driver.
The performance of the vertex buffer codepath is now roughly the same as the
immediate path, and tuxracer now seems to be rendered almost correctly.

Vladimir, I haven't found a way that I can directly call the 
r200/radeon's discard buffer
command from r300_dri, so this patch still includes the drm additions.  
Perhaps someone
could help me out with this one?

Could the people testing r300_dri test this if they have the time?  And 
Vladimir, can you
let me know if you want me to commit this, or if it needs more work?

Thanks,
Ben Skeggs.
diff -Nur r300_driver/drm/shared-core/r300_cmdbuf.c 
r300_driver_wip/drm/shared-core/r300_cmdbuf.c
--- r300_driver/drm/shared-core/r300_cmdbuf.c   2005-01-08 13:46:34.0 
+1100
+++ r300_driver_wip/drm/shared-core/r300_cmdbuf.c   2005-02-11 
05:10:16.185196992 +1100
@@ -354,16 +354,37 @@
return 0;
 }
 
+static void r300_discard_buffer(drm_device_t * dev, drm_buf_t * buf)
+{
+	drm_radeon_private_t *dev_priv = dev->dev_private;
+	drm_radeon_buf_priv_t *buf_priv = buf->dev_private;
+	RING_LOCALS;
+
+	buf_priv->age = ++dev_priv->sarea_priv->last_dispatch;
+
+	/* Emit the vertex buffer age */
+	BEGIN_RING(2);
+	RADEON_DISPATCH_AGE(buf_priv->age);
+	ADVANCE_RING();
+
+	buf->pending = 1;
+	buf->used = 0;
+}
+
+
 /**
  * Parses and validates a user-supplied command buffer and emits appropriate
  * commands on the DMA ring buffer.
  * Called by the ioctl handler function radeon_cp_cmdbuf.
  */
 int r300_do_cp_cmdbuf(drm_device_t* dev,
+ DRMFILE filp,
  drm_file_t* filp_priv,
  drm_radeon_cmd_buffer_t* cmdbuf)
 {
	drm_radeon_private_t *dev_priv = dev->dev_private;
+	drm_device_dma_t *dma = dev->dma;
+drm_buf_t *buf = NULL;
int ret;
 
	DRM_DEBUG("\n");
@@ -375,6 +396,7 @@
}
 
	while(cmdbuf->bufsz >= sizeof(drm_r300_cmd_header_t)) {
+   int idx;
drm_r300_cmd_header_t header;
 
		if (DRM_GET_USER_UNCHECKED(header.u, (int __user*)cmdbuf->buf)) {
@@ -431,6 +453,26 @@
ADVANCE_RING();
}
return 0;
+
+		case R300_CMD_DMA_DISCARD:
+			DRM_DEBUG("RADEON_CMD_DMA_DISCARD\n");
+			idx = header.dma.buf_idx;
+			if (idx < 0 || idx >= dma->buf_count) {
+				DRM_ERROR("buffer index %d (of %d max)\n",
+					  idx, dma->buf_count - 1);
+				return DRM_ERR(EINVAL);
+			}
+
+			buf = dma->buflist[idx];
+			if (buf->filp != filp || buf->pending) {
+				DRM_ERROR("bad buffer %p %p %d\n",
+					  buf->filp, filp, buf->pending);
+				return DRM_ERR(EINVAL);
+			}
+
+			r300_discard_buffer(dev, buf);
+			break;
+
default:
			DRM_ERROR("bad cmd_type %i at %p\n",
  header.header.cmd_type,
diff -Nur r300_driver/drm/shared-core/radeon_drm.h 
r300_driver_wip/drm/shared-core/radeon_drm.h
--- r300_driver/drm/shared-core/radeon_drm.h2005-01-02 05:32:52.0 
+1100
+++ r300_driver_wip/drm/shared-core/radeon_drm.h2005-02-06 
20:20:06.0 +1100
@@ -199,6 +199,7 @@
 #define R300_CMD_PACKET3   3 /* emit a packet3 */
 #define R300_CMD_END3D 4 /* emit sequence ending 3d rendering */
 #define R300_CMD_CP_DELAY  5
+#define R300_CMD_DMA_DISCARD   6
 
 typedef union {
unsigned int u;
@@ -218,6 +219,9 @@
unsigned char cmd_type, packet;
unsigned short count; /* amount of packet2 to emit */
} delay;
+   struct {
+   unsigned char cmd_type, buf_idx, pad0, pad1;
+   } dma;
 } drm_r300_cmd_header_t;
 
 #define RADEON_FRONT   0x1
diff -Nur r300_driver/drm/shared-core/radeon_drv.h 
r300_driver_wip/drm/shared-core/radeon_drv.h
--- r300_driver/drm/shared-core/radeon_drv.h2004-12-28 07:44:39.0 
+1100
+++ r300_driver_wip/drm/shared-core/radeon_drv.h2005-02-11 
05:11:02.953087192 +1100
@@ -310,6 +310,7 @@
 
 /* r300_cmdbuf.c */
 extern int r300_do_cp_cmdbuf( drm_device_t* dev,
+ DRMFILE filp,
  drm_file_t* filp_priv,
  drm_radeon_cmd_buffer_t* cmdbuf );
 
diff -Nur r300_driver/drm/shared-core/radeon_state.c 
r300_driver_wip/drm/shared-core/radeon_state.c
--- r300_driver/drm/shared-core/radeon_state.c  2005-01-31 13:33:24.0 
+1100
+++ r300_driver_wip/drm/shared-core/radeon_state.c  2005-02-06 
20:20:06.0 +1100
@@ -2469,7 +2469,7 @@
return DRM_ERR(EFAULT);
 
if ( IS_FAMILY_R300(dev_priv) )
-   return r300_do_cp_cmdbuf(dev, filp_priv, 

Re: fglrx vs. r200 dri

2005-02-10 Thread Adam Jackson
On Thursday 10 February 2005 12:53, Roland Scheidegger wrote:
 Adam Jackson wrote:
  On Thursday 10 February 2005 11:18, Roland Scheidegger wrote:
 r200 dri uses xorg cvs head, with dri driver from Mesa cvs head, with
 color tiling, texture tiling, hyperz and whatever else I could find
 boosting performance :-).
 fglrx uses XFree86 4.3.99.902 (from suse 9.1), with stock configuration,
 except I needed to correct the bus id and switched it to external gart.
 I don't know of any options which would boost performance.
 Desktop resolution is 1280x1024, 85Hz.
 
  Exactly which fglrx version is this with, the old 3.x series or the new
  8.8.25?

 This was newest 8.8.25 (for xfree 4.3, your suggested linker magic for
 using the 6.8 version with xorg cvs head does not work :-)).

Interesting, I've been doing that with fglrx for quite a while now with no 
problems.  Maybe I left out a step in the instructions.

If you could post the loader errors you got I could tell you how to work 
around them.

- ajax




[Bug 1648] R200 SWTCL path doesn't do projtex right

2005-02-10 Thread bugzilla-daemon
Please do not reply to this email: if you want to comment on the bug, go to
   
the URL shown below and enter your comments there.
   
https://bugs.freedesktop.org/show_bug.cgi?id=1648  
 

[EMAIL PROTECTED] changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution||FIXED




--- Additional Comments From [EMAIL PROTECTED]  2005-02-10 10:31 ---
(In reply to comment #5)
 First I thought
 if (tnl->render_inputs & _TNL_BITS_TEX_ANY)
 might not be up to date, but at least in projtex it seems so.
I thought that at first too, but then I figured that if _TNL_BITS_TEX_ANY weren't up
to date, it would not work correctly at all, since the code would incorrectly
select the tiny vertex format in cases where it shouldn't.
Applied to CVS.
  
 
 
--   
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email 
 
--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.




Re: [r2xx|r1xx] readpixels-3 and pntparam_1 - Progress?

2005-02-10 Thread Roland Scheidegger
Stephane Marchesin wrote:
Dieter Nützel wrote:
r100-readpixels-3.patch (Stephane)
r200_pntparam_1.diff (Roland)
I ran with both.
Should they be merged?
I surely hope to get my readpixels patch merged. However, I found a 
serious flaw in it (not related to the readpix segfault) which I have 
to fix before this happens.
As for pntparam, I couldn't really get it working, and in this form it's
overkill for what does work (larger point sizes). I'm not sure it's worth
the trouble of whipping up a simpler patch which would only contain
that, since only aliased larger point sizes are working, but everyone
uses antialiased points...
Maybe I'll try it again later.

Roland



Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Jon Smirl
I haven't looked at the texture heap management code, but one simple
idea for heap management would be to cascade the on-board heap to the
AGP one. How does the current algorithm work? Does an algorithm like
the one below have merit? It should sort the hot textures on-board,
and single use textures should fall out of the cache.

1) Load all textures initially into the on-board heap, since if you are
loading them you're probably going to use them.
2) Do LRU with the on-board heap.
3) When you run out of space on-board, demote the end of the LRU list
to the top of the AGP heap and copy the texture between heaps.
4) Run LRU on the AGP heap.
5) When it runs out of space, lose the item.
6) An added twist: if the top of the AGP heap gets hit too
often, knock it out of the cache so that it will get reloaded on-board.

Jon Smirl
[EMAIL PROTECTED]




Re: fglrx vs. r200 dri

2005-02-10 Thread Roland Scheidegger
Adam Jackson wrote:
On Thursday 10 February 2005 12:53, Roland Scheidegger wrote:
Adam Jackson wrote:
On Thursday 10 February 2005 11:18, Roland Scheidegger wrote:
r200 dri uses xorg cvs head, with dri driver from Mesa cvs head, with
color tiling, texture tiling, hyperz and whatever else I could find
boosting performance :-).
fglrx uses XFree86 4.3.99.902 (from suse 9.1), with stock configuration,
except I needed to correct the bus id and switched it to external gart.
I don't know of any options which would boost performance.
Desktop resolution is 1280x1024, 85Hz.
Exactly which fglrx version is this with, the old 3.x series or the new
8.8.25?
This was newest 8.8.25 (for xfree 4.3, your suggested linker magic for
using the 6.8 version with xorg cvs head does not work :-)).

Interesting, I've been doing that with fglrx for quite a while now with no 
problems.  Maybe I left out a step in the instructions.

If you could post the loader errors you got I could tell you how to work 
around them.
I just did what you suggested for fglrx_drv.o, i.e.
gcc -shared -nostdlib -o fglrx_drv.so fglrx_drv.o -Bstatic -lgcc.
libfglrxdrm.a consists of two objects (modules.o and FireGLdrm.o); I
extracted them with ar and then linked both objects together (gcc
-shared -nostdlib -o libfglrxdrm.so modules.o FireGLdrm.o -Bstatic -lgcc).
But this gave me an error: it complained about an unreferenced symbol from
fglrx_drv.so (I believe it was firegl_PM4Alloc; I am sure, though, that it
was one of the symbols defined in this libfglrxdrm.so).

I tried some weird things, like linking the two objects from
libfglrxdrm.a together with libdrm.so, without much success. At one point,
though, I got a different missing symbol; I believe that was
XAACreateScreenRec or something like that. At that point I gave up...

Roland


[Bug 2241] implement GL_ARB_texture_cube_map in radeon driver

2005-02-10 Thread bugzilla-daemon
Please do not reply to this email: if you want to comment on the bug, go to
   
the URL shown below and enter your comments there.
   
https://bugs.freedesktop.org/show_bug.cgi?id=2241  
 




--- Additional Comments From [EMAIL PROTECTED]  2005-02-10 14:06 ---
I have applied the drm part to cvs (together with a texture micro tiling so they
can have the same drm minor number), together with the corresponding sanity code
pieces. I'm afraid, the rest is a bit too much for me to review/commit,
especially since I don't have too much time testing on r100. It would definitely
be nice to have though.
  
 
 
--   
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email 
 
--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.




Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Felix Kühling
On Thursday, 10.02.2005, at 15:31 -0500, Jon Smirl wrote:
 I haven't looked at the texture heap management code, but one simple
 idea for heap management would be to cascade the on-board heap to the
 AGP one. How does the current algorithm work? Does an algorithm like
 the one below have merit? It should sort the hot textures on-board,
 and single use textures should fall out of the cache.
 
 1) load all textures initially in the on-board heap. Since if you are
 loading them you're probably going to use them.

Drivers usually upload textures to the hardware just before binding them
to a hardware texture unit. So this assumption is always true.

 2) Do LRU with the on-board heap. 
 3) When you run out of space on-board, demote the end of the LRU list
 to the top of the AGP heap and copy the texture between heaps.

This means you copy a texture when you don't know if or when you're
going to need it again. So the move of the texture may just be a waste
of time. It would be better to just kick the texture and upload it again
later when it's really needed.

 4) Run LRU on the AGP heap.
 5) When it runs out of space lose the item.
 6) an added twist would be if the top of the AGP heap gets hit too
 often knock it out of cache so that it will get reloaded on-board.

I'd rather reverse your scheme. Upload a texture to the GART heap first,
because that's potentially faster (though not with the current
implementation in the radeon drivers). When the texture is needed more
frequently, try promoting it to the local texture heap.

This scheme would give good results with movie players that need fast
texture uploads and typically use each texture exactly once. It would
also improve performance with games, simulations, ... that tend to use
the same textures many times and benefit from the higher memory
bandwidth when accessing local textures.

 
 
 Jon Smirl
 [EMAIL PROTECTED]
 

-- 
| Felix Kühling [EMAIL PROTECTED] http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3  B152 151C 5CC1 D888 E595 |





Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Jon Smirl
On Thu, 10 Feb 2005 23:13:30 +0100, Felix Kühling [EMAIL PROTECTED] wrote:
 This means you copy a texture when you don't know if or when you're
 going to need it again. So the move of the texture may just be a waste
 of time. It would be better to just kick the texture and upload it again
 later when it's really needed.

I suspect this extra texture copy wouldn't be noticeable except when
you construct a test program which artificially triggers it. Most
games will achieve a steady state with their loaded textures after a
frame or two, and the copies will stop.

 I'd rather reverse your scheme. Upload a texture to the GART heap first,
 because that's potentially faster (though not with the current
 implementation in the radeon drivers). When the texture is needed more
 frequently, try promoting it to the local texture heap.

I thought about this, but there is no automatic way to figure out when
to promote from GART to local. Same problem when local overflows, what
do you demote to AGP? You still have copies with this scheme too.

Going first to local and then demoting to AGP sorts everything
automatically. It may cause a little more churn in the heaps, but the
advantage is that the algorithm is very simple and doesn't need much
tuning. The only tunable parameter is determining when the top of the
AGP heap is hot and booting it. You could use something simple like
boot after 500 accesses.

-- 
Jon Smirl
[EMAIL PROTECTED]




Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Jon Smirl
On Thu, 10 Feb 2005 23:13:30 +0100, Felix Kühling [EMAIL PROTECTED] wrote:
 This scheme would give good results with movie players that need fast
 texture uploads and typically use each texture exactly once. It would

Movie players aren't even close to being texture bandwidth bound. The
demote from local to AGP scheme would cause two copies on each frame
but there is plenty of bandwidth. But this assumes that the movie
player creates a new texture for each frame.

A better scheme for a movie player would be to create a single texture
and then keep replacing its contents, or use two textures and double
buffer. But once created, these textures would not move in the LRU list
unless you started something like a game in another window.

-- 
Jon Smirl
[EMAIL PROTECTED]




Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Felix Kühling
On Thursday, 10.02.2005, at 17:40 -0500, Jon Smirl wrote:
 On Thu, 10 Feb 2005 23:13:30 +0100, Felix Kühling [EMAIL PROTECTED] wrote:
  This scheme would give good results with movie players that need fast
  texture uploads and typically use each texture exactly once. It would
 
 Movie players aren't even close to being texture bandwidth bound. The

That's not my experience. Optimizations in the texture upload path,
using the AGP heap, and partial texture uploads had a big impact on
mplayer -vo gl performance on my ProSavageDDR (a factor of 2-3, all of
them taken together).

 demote from local to AGP scheme would cause two copies on each frame
 but there is plenty of bandwidth. But this assumes that the movie
 player creates a new texture for each frame.
 
 A better scheme for a movie player would be to create a single texture
 and then keep replacing its contents.

You're right, that's what actually happens in mplayer. It uses
glTexSubImage2D because it typically changes only a part of a texture
with power-of-two dimensions.

 Or use two textures and double
 buffer. But once created these textures would not move in the LRU list
 unless you started something like a game in another window.

Yes, they would move in the LRU list. That's why it's called "least
recently used", not "least recently created". ;-)

So I would have to modify my scheme to reset the usage count/frequency
when a texture image is changed, such that a texture that is updated
very frequently would not be promoted to local memory.

On Thursday, 10.02.2005, at 17:34 -0500, Jon Smirl wrote:
 On Thu, 10 Feb 2005 23:13:30 +0100, Felix Kühling [EMAIL PROTECTED] wrote:
  This means you copy a texture when you don't know if or when you're
  going to need it again. So the move of the texture may just be a waste
  of time. It would be better to just kick the texture and upload it again
  later when it's really needed.
 
 I suspect this extra texture copy wouldn't be noticable except when
 you construct a test program which articifically triggers it. Most
 games will achieve a steady state with their loaded textures after a
 frame or two and the copies will stop.

Still, this copy is unnecessary at that time. Delaying the re-upload to
the time when the texture is needed again has only advantages and is not
difficult to implement.

 
  I'd rather reverse your scheme. Upload a texture to the GART heap first,
  because that's potentially faster (though not with the current
  implementation in the radeon drivers). When the texture is needed more
  frequently, try promoting it to the local texture heap.
 
 I thought about this, but there is no automatic way to figure out when
 to promote from GART to local.

Yes there is. In the current scheme, whenever a texture is bound to a
hardware tex unit the driver calls driUpdateTexLRU, which moves the
texture to the front of the LRU list. In this function you could easily
count how often or how frequently a texture has been used. Based on this
information and maybe the texture size you could decide which textures
to promote and when. You will keep promoting textures until the local
heap is full of non-stale textures.
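
A rough editorial sketch of that bookkeeping. driUpdateTexLRU is the real
texmem.c hook mentioned above, but everything below -- the tex_usage struct,
the once-per-frame counting, the promotion threshold, and note_image_update()
(which ties in the "reset the usage count when the image changes" idea from
earlier in this mail) -- is invented purely for illustration.

/* promote_sketch.c -- illustrative only, not the texmem.c API */
#include <stdbool.h>

struct tex_usage {
	unsigned binds;		/* in how many recent frames the texture was bound */
	unsigned last_frame;	/* frame number of the most recent bind */
	unsigned size;		/* texture size in bytes */
	bool in_local;		/* currently resident in on-board memory? */
};

/* Called from the same place as the LRU update, once per bind. */
static void note_bind(struct tex_usage *t, unsigned frame)
{
	if (t->last_frame != frame)
		t->binds++;	/* count at most once per frame */
	t->last_frame = frame;
}

/* Reset the count when the application replaces the texture image
 * (glTexSubImage2D etc.), so frequently-updated textures stay in GART. */
static void note_image_update(struct tex_usage *t)
{
	t->binds = 0;
}

/* Promote when the texture has been used in enough recent frames and is
 * small enough that kicking only stale local textures can make room. */
static bool should_promote(const struct tex_usage *t, unsigned frame,
			   unsigned free_local_bytes)
{
	const unsigned MIN_BINDS = 3;	/* illustrative threshold */

	return !t->in_local &&
	       t->binds >= MIN_BINDS &&
	       t->last_frame == frame &&
	       t->size <= free_local_bytes;
}

int main(void)
{
	struct tex_usage t = { 0, 0, 512 * 1024, false };
	unsigned frame;
	int promote;

	for (frame = 1; frame <= 4; frame++)
		note_bind(&t, frame);	/* bound once per frame */

	/* With 1 MiB free locally, the texture qualifies after a few frames. */
	promote = should_promote(&t, 4, 1024 * 1024);

	note_image_update(&t);	/* an image upload would reset the count again */
	return promote ? 0 : 1;
}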

  Same problem when local overflows, what
 do you demote to AGP? You still have copies with this scheme too.

Textures are sorted in LRU-order on the texture heaps. So you always
kick least recently used textures first. It has always worked like this
even in the current scheme. For promoting textures I would only kick
stale textures from the local heap.

 
 Going first to local and then demoting to AGP sorts everything
 automatically. It may cause a little more churn in the heaps,

In my experience texture uploads are quite expensive. So IMO avoiding
unnecessary texture uploads or copies should have a high priority.

  but the
 advantage is that the algorithm is very simple and doesn't need much
 tuning. The only tunable parameter is determining when the top of the
 AGP heap is hot and booting it. You could use something simple like
 boot after 500 accesses.

I don't think my algorithm is much more complicated. It can be
implemented by gradual improvements of the current algorithm (freeing
stale texture memory is one step), which helps avoid unexpected
performance regressions. At the moment I'm not planning to rewrite it
from scratch, especially because I can't test on any hardware where I
can actually measure great performance improvements ATM.

The only tunable parameter in my algorithm is how often/frequently used
a texture must be in order to try to promote it to the local texture
heap. Maybe there are a few more degrees of freedom, because you can
also consider the texture size for promotion. I think the steady state
result would be about the same as with your algorithm, but I expect my
scheme to work better when textures are used very infrequently or
updated very frequently (movie players). In particular this would make
the texture_heaps option unnecessary, which is a good thing IMO (good

Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Roland Scheidegger
Felix Kühling wrote:
I simplified this idea a little further and attached a patch against
texmem.[ch]. It frees stale textures (and also place holders for other
clients' textures) that havn't been used in 1 second when it runs out of
space on a texture heap. This way it will try a bit harder to put
textures into the first heap before using the second heap, without much
risk (I hope) of performance regressions.
I tested this on a ProSavageDDR where rendering speed appears to be the
same with local and GART textures. There was no measurable performance
regression in Quake3 and I noticed no subjective performance regression
in Torcs or Quake1 either.
Now the only thing missing in texmem.c for migrating textures from GART
to local memory would be a flag to driAllocateTexture to stop trying if
kicking stale textures didn't free up enough space (on the first texture
heap).
Anyway, I think the attached patch should already make a difference as
it is. I'd be interested how much it improves your performance numbers
with Quake3 and rtcw on r200 when both texture heaps are enabled.
I've done a couple of benchmarks. All results are fglrx-boosted, so to 
speak (too lazy to reboot).

q3, local 45MB or 35MB:  145 fps
rtcw, local 45MB: 95 fps
rtcw, local 35MB: 76 fps
with both heaps, local size 35MB, GART texture size 61MB:
q3, old allocator:   105-125 fps
rtcw, old allocator:   70-84 fps
q3, new allocator:   108-126 fps
rtcw, new allocator:   71-85 fps
This does not seem to really make a difference.
One interesting thing I noticed though is that it is actually not really 
a range of results, but only some distinct values. For rtcw, the 
scores were always very close to either 70, 77 or 85 fps (within 1 
frame), out of 10 runs maybe 6 were around 77, 2 around 70 and 2 around 
85. Quake3 mostly ran at around 125 fps but once in a while was just 
below 110.

Roland


Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Dave Airlie

 A better scheme for a movie player would be to create a single texture
 and then keep replacing its contents. Or use two textures and double
 buffer. But once created these textures would not move in the LRU list
 unless you started something like a game in another window.

If only we supported that in any reasonable fashion (at least on radeon/r200)...
Movie players are very texture upload bound, at least on my embedded
system. I do a lot of animation with movies, MNGs and arrays of PNGs,
and most of my time is spent in memcpy and texstore_rgba. This is a
real pain for me, and I'm slowly gathering enough knowledge to do a great
big hack for my own internal use.

Dave.

-- 
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person





Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Jon Smirl
AGP 8x should just be able to keep up with 1280x1024x24b 60 times/sec.

How does mesa access AGP memory from the CPU side? AGP memory is
system memory which the AGP aperture makes visible to the GPU. Are we
using the GPU to load textures into AGP memory, or is it done entirely
on the main CPU with a memcpy?

For things like a movie player we should even be able to give it a
pointer to the texture in system memory (AGP space) and let it directly
manipulate the texture buffer. Doing that would require playing with
the page tables to preserve protection.

-- 
Jon Smirl
[EMAIL PROTECTED]




Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Roland Scheidegger
Jon Smirl wrote:
AGP 8x should just be able to keep up with 1280x1024x24b 60 
times/sec.
AGP 4x should be enough. Remember I got 600MB/s max throughput. Not with
24-bit textures though; the Mesa RGBA-BGRA conversion takes WAY too much
time to achieve that.
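A quick back-of-the-envelope check (the 600MB/s figure is the measured
number quoted above; the rest is simple arithmetic):

/* Rough bandwidth check: 24bpp frames at 1280x1024, 60 per second,
 * against the ~600MB/s measured AGP 4x upload rate mentioned above. */
#include <stdio.h>

int main(void)
{
    double frame_mb = 1280.0 * 1024.0 * 3.0 / 1e6;  /* ~3.9 MB per frame */
    double needed   = frame_mb * 60.0;              /* ~236 MB/s         */
    double measured = 600.0;                        /* AGP 4x, measured  */

    printf("needed %.0f MB/s, available ~%.0f MB/s\n", needed, measured);
    return 0;
}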
How does mesa access AGP memory from the CPU side? AGP memory is
system memory which the AGP aperture makes visible to the GPU. Are we
using the GPU to load textures into AGP memory, or is it done entirely
on the main CPU with a memcpy?
It depends on the driver. radeon/r200 use a GPU blit. That might be
suboptimal, but at least it handles things like tiling (when the GPU
blitter can do it) automatically. I'm not sure, but couldn't the radeon
blitter actually do the RGBA-BGRA conversion too, for instance?

For things like a movie player we should even be able to give it a 
pointer to the texture in system memory (AGP space) and let it 
directly manipulate the texture buffer. Doing that would require 
playing with the page tables to preserve protection.
This seems to be exactly what the client extension of the r200 driver is 
intended for. But for normal apps it's useless (and for the most part 
even for apps which could make good use of it, since it's an extension 
almost no one uses anyway).

Roland




Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Roland Scheidegger
Felix Kühling wrote:
I don't think my algorithm is much more complicated. It can be
implemented by gradual improvements of the current algorithm (freeing
stale texture memory is one step), which helps avoid unexpected
performance regressions. At the moment I'm not planning to rewrite it
from scratch, especially because I can't test on any hardware where I
can actually measure great performance improvements ATM.
I'm not sure what a really good implementation would look like, but you 
could try lowering the AGP speed to 1x with a Savage to see a performance 
difference between local and GART texturing. Though I'm not convinced 
the Savages are actually fast enough to even take a hit at AGP 1x...
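For the record, forcing 1x is usually just a config change. Assuming the
savage DDX accepts the same AGPMode option as the radeon driver, something
like this in the Device section should do (illustrative snippet only):

Section "Device"
    Identifier "Savage"
    Driver     "savage"
    # Force AGP 1x for the local-vs-GART texturing comparison suggested
    # above (assumes the savage DDX honours AGPMode like radeon does).
    Option     "AGPMode" "1"
EndSection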

Roland


Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Owen Taylor
Dave Airlie wrote:
A better scheme for a movie player would be to create a single texture
and then keep replacing its contents. Or use two textures and double
buffer. But once created these textures would not move in the LRU list
unless you started something like a game in another window.

It would help if we supported that in some reasonable fashion (at least on
radeon/r200): movie players are very texture upload bound, at least on my
embedded system. I do a lot of animation with movies, MNGs and arrays of
PNGs, and most of my time is spent in memcpy and texstore_rgba. This is a
real pain for me, and I'm slowly gathering enough knowledge to do a great
big hack for my own internal use.
Perhaps a wild idea ... does APPLE_client_texture do what you want? If 
so then it might be a lot simpler and more reusable to 
test/optimize/fixup that than to start from scratch.

That should allow a straight copy from data you create to memory the 
card can texture from, which is about as good as possible.

For subimage modification the spec seems to permit modifying the data in 
place then calling TexSubImage on the subregion with a pointer into
the original data to notify of the change.
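Assuming the extension meant here is GL_APPLE_client_storage (the exact
name and semantics are not confirmed in this thread), the pattern would
look roughly like this; tex, W, H and frame_buf are placeholder names:

/* Sketch of the "modify in place, then notify" pattern, assuming
 * GL_APPLE_client_storage semantics.  Not verified against the r200
 * driver; the enum value is the one from glext.h. */
#include <GL/gl.h>

#ifndef GL_UNPACK_CLIENT_STORAGE_APPLE
#define GL_UNPACK_CLIENT_STORAGE_APPLE 0x85B2
#endif

void bind_client_storage_texture(GLuint tex, int W, int H, void *frame_buf)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    /* Ask the GL to texture directly from frame_buf rather than
     * keeping its own copy of the data. */
    glPixelStorei(GL_UNPACK_CLIENT_STORAGE_APPLE, GL_TRUE);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, W, H, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, frame_buf);
}

void notify_frame_changed(int W, int H, void *frame_buf)
{
    /* After modifying frame_buf in place, a TexSubImage call with a
     * pointer into the same data tells the driver what changed. */
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, W, H,
                    GL_RGBA, GL_UNSIGNED_BYTE, frame_buf);
}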

Regards,
Owen


Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Jon Smirl
On Thu, 10 Feb 2005 21:59:29 -0500, Owen Taylor [EMAIL PROTECTED] wrote:
 That should allow a straight copy from data you create to memory the
 card can texture from, which is about as good as possible.

If you have a big AGP aperture to play with there is a faster way.
When you get the call to copy the texture from user space, don't copy
it. Instead mark its page table entries as copy on write. Get the
physical address of the page and set it into the GART. Now the GPU can
get to it with zero copies. When you are done with it, check and see
if the app caused a copy on write; if so, free the page, else just
remove the COW flag.
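In pseudocode the idea is roughly the following; none of the helpers below
are real DRM or kernel interfaces, they only name the steps:

/* Pseudocode sketch of the copy-on-write upload described above.
 * Every type and helper here is hypothetical. */
struct page;                                         /* opaque user page   */

extern int  lookup_user_pages(void *buf, unsigned long len,
                              struct page **pages);  /* pin, return count  */
extern void mark_cow(struct page *p);                /* app writes now fault */
extern void clear_cow(struct page *p);
extern int  page_was_copied(struct page *p);         /* did a COW fault hit? */
extern void gart_bind_page(int idx, struct page *p); /* make GPU-visible   */
extern void release_page(struct page *p);

struct upload {
    struct page *pages[256];
    int          nr_pages;
};

int zero_copy_upload(struct upload *u, void *user_buf, unsigned long len)
{
    u->nr_pages = lookup_user_pages(user_buf, len, u->pages);

    for (int i = 0; i < u->nr_pages; i++) {
        mark_cow(u->pages[i]);           /* protect against app rewrites */
        gart_bind_page(i, u->pages[i]);  /* GPU reads the page directly  */
    }
    return 0;                            /* no CPU copy has happened     */
}

void zero_copy_finish(struct upload *u)
{
    for (int i = 0; i < u->nr_pages; i++) {
        if (page_was_copied(u->pages[i]))
            release_page(u->pages[i]);   /* app forced a private copy    */
        else
            clear_cow(u->pages[i]);      /* hand the page back untouched */
    }
}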

-- 
Jon Smirl
[EMAIL PROTECTED]




Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Eric Anholt
On Thu, 2005-02-10 at 22:23 -0500, Jon Smirl wrote:
 On Thu, 10 Feb 2005 21:59:29 -0500, Owen Taylor [EMAIL PROTECTED] wrote:
  That should allow a straight copy from data you create to memory the
  card can texture from, which is about as good as possible.
 
 If you have a big AGP aperture to play with there is a faster way.
 When you get the call to copy the texture from user space, don't copy
 it. Instead mark its page table entries as copy on write. Get the
 physical address of the page and set it into the GART. Now the GPU can
 get to it with zero copies. When you are done with it, check and see
 if the app caused a copy on write; if so, free the page, else just
 remove the COW flag.

Is there evidence that this is/would be in fact faster?

-- 
Eric Anholt[EMAIL PROTECTED]  
http://people.freebsd.org/~anholt/ [EMAIL PROTECTED]




Re: r300 vb path

2005-02-10 Thread Vladimir Dergachev
Hi Ben,
   Great work!
   With regard to the discard buffer command: now that I think of it, you
want this code initiated from within cmdbuf, not as a separate ioctl, so
your way is correct; we need to implement the appropriate cmd for r300.

   Go ahead and apply the patch, can't wait to try it :)
  Thank you!
   Vladimir Dergachev

On Fri, 11 Feb 2005, Ben Skeggs wrote:
Hello,
I've attached a patch with a port of the r200 vertex buffer code for the r300 
driver.
The performance of the vertex buffer codepath is now roughly the same as the
immediate path, and tuxracer now seems to be rendered almost correctly.

Vladimir, I haven't found a way that I can directly call the r200/radeon's 
discard buffer
command from r300_dri, so this patch still includes the drm additions. 
Perhaps someone
could help me out with this one?

Could the people testing r300_dri test this if they have the time? And
Vladimir, can you let me know if you want me to commit this, or if it
needs more work?

Thanks,
Ben Skeggs.




Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Dave Airlie
  it. Instead mark its page table entries as copy on write. Get the
  physical address of the page and set it into the GART. Now the GPU can
  get to it with zero copies. When you are done with it, check and see
  if the app caused a copy on write; if so, free the page, else just
  remove the COW flag.

 Is there evidence that this is/would be in fact faster?

No, but I could practically guarantee anything is faster than the 3-4
copies a radeon texture goes through at the moment...

Dave.

-- 
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person





Re: [patch] texturing performance local/gart on rv250

2005-02-10 Thread Jon Smirl
On Thu, 10 Feb 2005 20:14:00 -0800, Eric Anholt [EMAIL PROTECTED] wrote:
 Is there evidence that this is/would be in fact faster?

That's how the networking drivers work, and they may be the fastest
drivers in the system.
But it has not been coded for AGP, so nobody knows for sure. It has to
be faster though; having the CPU do the copy will cause the TLB to be
flushed as you walk through all of the pages. Having the GPU do the
copy is even worse since it moves across AGP.

We have bigger problems to chase. Plus implementing it this way
probably has a bunch of architecture-specific problems I don't know
about. But I'm sure it would work on the x86.

After we get X on GL up on mesa-solo I can look at changing the
texture copy code.

-- 
Jon Smirl
[EMAIL PROTECTED]

