Re: OpenGL apps cause frequent system locks
On Tue, 8 Feb 2005, Michel Dänzer wrote:
> On Mon, 2005-02-07 at 13:40 +0100, Geller Sandor wrote:
>> Is there any way I can help to track down the problem(s)? My machine
>> doesn't have a network connection, so I can only use scripts which run
>> in the background. With expect and gdb it might be possible to get at
>> least a backtrace from my non-local-interactive machine.
>
> Unfortunately, a backtrace is usually useless for a lockup, because all
> it will show you is the X server and/or the client(s) waiting for the
> GPU to become idle, which it never does because it's locked up. The
> problem is finding out what caused it to lock up, and that can be very
> hard and time consuming. That being said, I too have noticed slightly
> decreased stability with r200 recently. As this seems to have snuck in
> gradually, binary searches to try and isolate the CVS changes causing
> problems might be a good strategy.

Thanks. I checked out the 2004-08-31 and 2004-09-30 CVS versions of X.org.
I will test with these two snapshots on the weekend, and if the latter
crashes while the former doesn't, I will be able to track down the latest
CVS snapshot which works on my machine without crashes.

Geller Sandor [EMAIL PROTECTED]
Re: drm race fix for non-core
Stephane Marchesin wrote:
> Hi,
> Attached is a straight port of Eric's fix for the drm race to non-core
> drm.

Committed.

Keith
Re: r300 on PPC problem
On Thu, 10 Feb 2005 16:16:12 +1100, Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
> Hi!
> An interesting issue with current X.org CVS and current Linux bk is that
> on r300, the DRI module now loads, and 2D is broken. It looks like an
> endian issue (as if pixels are horizontally flipped); I can post a
> snapshot later, I suppose. Preventing the kernel module from loading
> fixes it, so I suspect it's an issue with the 2D CCE accel for r300. Is
> this a known problem?

Yes, I added a bug to Xorg. But I am wondering if the problem is in the
drm. I really don't know enough about that, but I will look at the drm and
see if it may come from there. This is more probably due to the fact that
rbbm_gui_cntl endian swapping doesn't work on r300.

Jerome Glisse
Re: savage-20050205-linux snapshot - problems
On Thursday, 2005-02-10, 07:21 +0100, [EMAIL PROTECTED] wrote:
> On Monday 07 February 2005 15:33, Felix Kühling wrote:
>> On Monday, 2005-02-07, 15:12 +0100, [EMAIL PROTECTED] wrote:
>>> Hardware: Toshiba Libretto L2 Tm5600 with:
>>>
>>> 00:04.0 VGA compatible controller: S3 Inc. 86C270-294 Savage/IX-MV (rev 13) (prog-if 00 [VGA])
>>>         Subsystem: Toshiba America Info Systems: Unknown device 0001
>>>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>>>         Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>>>         Latency: 248 (1000ns min, 63750ns max), cache line size 08
>>>         Interrupt: pin A routed to IRQ 11
>>>         Region 0: Memory at e0000000 (32-bit, non-prefetchable) [size=128M]
>>>         Expansion ROM at 000c0000 [disabled] [size=64K]
>>>         Capabilities: available only to root
>>>
>>> Software: Gentoo current with Gentoo-supplied
>>> X Window System Version 6.8.1.903 (6.8.2 RC 3)
>>> Release Date: 25 January 2005
>>> X Protocol Version 11, Revision 0, Release 6.8.1.903
>>> Build Operating System: Linux 2.4.29-rc3-mhf239 i686 [ELF]
>>> Current Operating System: Linux mhfl4 2.4.29-rc3-mhf239 #2 Tue Jan 18 17:43:33 CET 2005 i686
>>> Build Date: 05 February 2005
>>>
>>> Installed snapshot from savage-20050205-linux.i386.tar.bz2. On starting X:
>>> [snip]
>>> So, the driver in the snapshot still reports 1.0. Seems to be quite old (2001).
>> The new Savage DRM 2.0.0 (in fact 2.2.0 by now) is only available for
>> Linux 2.6.
> Tested with 2.6.11-rc3. DRM functional with glxgears. tuxkart and
> tuxracer work most of the time, but sometimes painting occurs outside
> the game's window. Parts of the image appear (sometimes mirrored)
> outside the game window, or random patterns appear. The cursor and the
> numeric display in the game window appear as random patterns.

The garbage patterns could mean that it's getting the texture tiling
wrong. I messed with that code recently. Could be that I broke it on
Savage/IX. Also please check whether the latest snapshot fixes this.

> Sometimes the above games mess up the screen, but restarting the game a
> few times fixes it. Flightgear messes up the entire screen and would
> never work.

Weird. I haven't had this kind of problem in a while. Though I haven't
tested on my Savage/IX recently. Looks like it's time to swap cards again.

> BTW, the games work on i810 HW with 2.6.11-rc3.
>> Since Linux 2.4 is no longer open for new features there is not much
>> point back-porting it to Linux 2.4. See
>> http://dri.freedesktop.org/wiki/S3Savage for more information about the
>> savage driver status. I just added a note about Linux 2.4 to that page.
> Sorry, I have not found any reference to 2.4 being unsupported on that
> page.

Err, I probably pushed Preview instead of Save. :-/

> Are there any test programs available to systematically test DRM/GL
> functionality?

For example mesa/progs/demos and Glean (http://glean.sourceforge.net/).
For reference you can always run with indirect software rendering: set
LIBGL_ALWAYS_INDIRECT in the environment.

> Regards, Michael

Felix Kühling
fglrx vs. r200 dri
Since two people have asked for it, here are some quick numbers for r200
dri vs. fglrx. r200 dri is using a 45MB local tex heap (I believe fglrx
reserves pretty much everything for textures too, so that's only fair...).
Btw, fglrx has certainly made some progress; what I noticed is that at
least 2d subjectively feels much faster (in fact, previously it felt about
the same as when you used ACCEL_MMIO with the radeon driver, but now it
feels pretty much the same as with the open source driver).

fglrx might be at an unfair disadvantage: I think it is not using pageflip.
I don't know if it's using hyperz; last time I checked (with glxtest) it
didn't seem to use that on my setup either (but that was with an older
driver). I suspect it still doesn't, at least not always, since glxgears
(which gets a HUGE boost with hyperz) is now over two times faster with the
r200 driver.

r200 dri uses xorg cvs head, with the dri driver from Mesa cvs head, with
color tiling, texture tiling, hyperz and whatever else I could find
boosting performance :-). fglrx uses XFree86 4.3.99.902 (from suse 9.1),
with stock configuration, except that I needed to correct the bus id and
switched it to external gart. I don't know of any options which would boost
performance. Desktop resolution is 1280x1024, 85Hz.

Q3 demo four, fullscreen 1024x768:
  r200 dri 1): 129 fps
  r200 dri 2): 150 fps
  fglrx:       118 fps

Q3 windowed 1024x768:
  r200 dri 1): 125 fps
  r200 dri 2): 145 fps
  fglrx 3):    108 fps

rtcw demo checkpoint, fullscreen 1024x768:
  r200 dri 1): 85 fps
  r200 dri 2): 95 fps
  fglrx 4):    89 fps
  fglrx 5):    78 fps

ut2k3 flyby-antalus, low/average/high:
  r200 dri: 15.750896 / 37.862827 / 281.284637 fps
  fglrx:    30.838823 / 78.981781 / 688.162048 fps

Ok, now the interesting part: did I already mention there is a massive
performance problem with vertex arrays in ut2k3 with the r200 driver? It is
really, really bad.

Remark 4) 5): 4) is the first benchmark run after the game is started, 5)
are all subsequent runs. I don't know why fglrx is always faster on the
first run with rtcw, but it behaved like that two years ago already.

Remark 3): It is really impossible to run 3d applications correctly at a
screen resolution of 1280x1024 with 85Hz on my card with fglrx, independent
of the 3d application. There is a lot of flicker going on around the
screen. AFAIK this is still the bug with insufficient bandwidth allocation
for scanout, which was fixed in the open source radeon driver ages ago (by
an ati employee, no less!).

And now the really interesting thing: the results marked with 1) are
obtained BEFORE running fglrx, the results marked with 2) AFTER running
fglrx, i.e. when I did not reboot between running the fglrx driver and the
radeon driver (which in the past led to lockups, but driver switching now
seems to work fine, in both directions). This was a completely repeatable
effect; I even figured out that starting the X server with fglrx is not
enough, but a simple glxinfo while it's running triggers it. Any ideas
what's causing this? Maybe fglrx reconfigures the card's caches or
something like that? It would be nice if we could get that additional
10-15% performance, especially if it is as simple as writing a single
register...

Roland
Re: fglrx vs. r200 dri
Roland Scheidegger wrote:
> Any ideas what's causing this? Maybe fglrx reconfigures the card's caches
> or something like that? It would be nice if we could get that additional
> 10-15% performance, especially if it is as simple as writing a single
> register...

My guess would be that the fglrx driver uploads some new microcode when it
enters 3D mode.
Re: fglrx vs. r200 dri
On Thu, 10 Feb 2005 17:18:44 +0100, Roland Scheidegger [EMAIL PROTECTED] wrote:
> [snip]
> And now the really interesting thing: the results marked with 1) are
> obtained BEFORE running fglrx, the results marked with 2) AFTER running
> fglrx [snip]. Any ideas what's causing this? Maybe fglrx reconfigures
> the card's caches or something like that? It would be nice if we could
> get that additional 10-15% performance, especially if it is as simple as
> writing a single register...

Compare a reg dump (script from Hui):
http://www.botchco.com/alex/radeon/mergedfb/cvs/DRI/hy0/radeon_dump.tgz

Alex
Re: fglrx vs. r200 dri
Ian Romanick wrote:
> Roland Scheidegger wrote:
>> Any ideas what's causing this? Maybe fglrx reconfigures the card's
>> caches or something like that? [snip]
> My guess would be that the fglrx driver uploads some new microcode when
> it enters 3D mode.

That shouldn't matter AFAIK; it may be different, but it will get
completely replaced again when the radeon drm module is loaded again.
Unless I misunderstood something...

Roland
Re: fglrx vs. r200 dri
Roland Scheidegger wrote:
> Ian Romanick wrote:
>> My guess would be that the fglrx driver uploads some new microcode when
>> it enters 3D mode.
> That shouldn't matter AFAIK; it may be different, but it will get
> completely replaced again when the radeon drm module is loaded again.
> Unless I misunderstood something...

Makes sense. An interesting experiment then would be to disable the drm
microcode upload and see if there are further gains to be had from a newer
microcode.

Keith
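[For anyone wanting to try that experiment, a minimal sketch, assuming the
upload happens in radeon_cp_load_microcode() in the drm's radeon_cp.c as in
current sources; short-circuiting it should leave whatever microcode fglrx
installed in place. A throwaway hack, not a proposed patch:

/* radeon_cp.c -- experiment only: skip the DRM's microcode upload so
 * that microcode previously installed by fglrx survives a driver
 * switch.  Revert before doing anything else with the module. */
static void radeon_cp_load_microcode(drm_radeon_private_t *dev_priv)
{
	DRM_DEBUG("microcode upload disabled for experiment\n");
	return;
	/* ... the original upload loop through RADEON_CP_ME_RAM_ADDR /
	 * RADEON_CP_ME_RAM_DATAH / RADEON_CP_ME_RAM_DATAL follows and
	 * is now unreachable ... */
}
]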
Re: fglrx vs. r200 dri
Roland Scheidegger wrote:
> Ian Romanick wrote:
>> My guess would be that the fglrx driver uploads some new microcode when
>> it enters 3D mode.
> That shouldn't matter AFAIK; it may be different, but it will get
> completely replaced again when the radeon drm module is loaded again.
> Unless I misunderstood something...

Hmm... maybe it adjusts the core / memory clock?
Re: fglrx vs. r200 dri
On Thursday 10 February 2005 11:18, Roland Scheidegger wrote:
> r200 dri uses xorg cvs head, with the dri driver from Mesa cvs head,
> with color tiling, texture tiling, hyperz and whatever else I could find
> boosting performance :-). fglrx uses XFree86 4.3.99.902 (from suse 9.1),
> with stock configuration, except that I needed to correct the bus id and
> switched it to external gart. I don't know of any options which would
> boost performance. Desktop resolution is 1280x1024, 85Hz.

Exactly which fglrx version is this with, the old 3.x series or the new
8.8.25?

- ajax
Re: fglrx vs. r200 dri
Alex Deucher wrote:
> [snip]
> Compare a reg dump (script from Hui):
> http://www.botchco.com/alex/radeon/mergedfb/cvs/DRI/hy0/radeon_dump.tgz

Sounds like a good idea. There are quite some differences, though I
couldn't see any obvious reason (e.g. just checking out some registers). If
someone wants to take a look, I've uploaded the dumps here:
http://homepage.hispeed.ch/rscheidegger/dri_experimental/r200_dumps.tar.gz

dump 1 is taken within the radeon driver, after running glxgears
dump 2 is taken within the fglrx driver, after startup
dump 3 is taken within the fglrx driver, after glxinfo
dump 4 is taken within the radeon driver, after startup
dump 5 is taken within the radeon driver, after glxgears

All of course in chronological order...

Roland
Re: fglrx vs. r200 dri
Adam Jackson wrote:
> On Thursday 10 February 2005 11:18, Roland Scheidegger wrote:
> [snip]
> Exactly which fglrx version is this with, the old 3.x series or the new
> 8.8.25?

This was the newest 8.8.25 (for XFree86 4.3; your suggested linker magic
for using the 6.8 version with xorg cvs head does not work :-)).

Roland
[r2xx|r1xx] readpixels-3 and pntparam_1 - Progress?
r100-readpixels-3.patch (Stephane)
r200_pntparam_1.diff (Roland)

I ran with both. Should they be merged?

BTW, the readpix segfault is still there (with and without both patches).
X.org CVS does NOT show this bug.

SunWave1 progs/demos# ./readpix
GL_VERSION = 1.3 Mesa 6.3
GL_RENDERER = Mesa DRI R200 20041207 AGP 4x x86/MMX+/3DNow!+/SSE TCL
GL_OES_read_format supported.  Using type / format = 0x1401 / 0x1908
Loaded 194 by 188 image
Speicherschutzverletzung (segmentation fault, core dumped)

Reading symbols from /usr/X11R6/lib/modules/dri/r200_dri.so...done.
Loaded symbols for /usr/X11R6/lib/modules/dri/r200_dri.so
Reading symbols from /usr/X11R6/lib/libexpat.so.0...done.
Loaded symbols for /usr/X11R6/lib/libexpat.so.0
Reading symbols from /usr/lib/libtxc_dxtn.so...done.
Loaded symbols for /usr/lib/libtxc_dxtn.so
Reading symbols from /usr/X11R6/lib/X11/locale/lib/common/xlcDef.so.2...done.
Loaded symbols for /usr/X11R6/lib/X11/locale/lib/common/xlcDef.so.2
#0  0x406a911d in _generic_read_RGBA_span_BGRA_REV_SSE ()
    from /usr/X11R6/lib/modules/dri/r200_dri.so
(gdb) bt
#0  0x406a911d in _generic_read_RGBA_span_BGRA_REV_SSE ()
    from /usr/X11R6/lib/modules/dri/r200_dri.so
#1  0xff131f11 in ?? ()
(gdb) list
262
263        TempImage = (GLubyte *) malloc(ImgWidth * ImgHeight * 4 * sizeof(GLubyte));
264        assert(TempImage);
265     }
266
267
268     int
269     main( int argc, char *argv[] )
270     {
271        GLboolean ciMode = GL_FALSE;

-Dieter
Re: [patch] texturing performance local/gart on rv250
On Wednesday, 2005-02-09, 22:12 +0100, Felix Kühling wrote:
> On Wednesday, 2005-02-09, 20:58 +0100, Roland Scheidegger wrote:
> [snip]
>> Performance with gart texturing, even in 4x mode, takes a big hit
>> (almost 50%). I was not really able to get consistent performance
>> results when both texture heaps were active; I guess it's luck of the
>> day which textures got put in the gart heap and which ones in the local
>> heap. But that performance indeed got faster with a smaller gart heap
>> is not a good sign. And even if the maximum obtained in rtcw with a
>> 35MB local heap and a 29MB gart heap was higher than the score obtained
>> with a 35MB local heap alone, there were clearly areas which ran faster
>> with only the local heap. It seems to me that the allocator really
>> should try harder to use the local heap to be useful on r200 cards.
>> Moreover, it is likely that you'd get quite a bit better performance,
>> when you DO have to put textures into the gart heap, if you revisit
>> that later when more space becomes available on the local heap and
>> upload the still-used textures from the gart heap to the local heap (in
>> fact, that should be even faster than those 650MB/s, since no in-kernel
>> copy would be needed; it should be possible to blit it directly).
>
> The big problem with the current texture allocator is that it can't tell
> which areas are really unused. Texture space is only allocated and never
> freed. Once the memory is full it starts kicking textures to upload new
> ones. This is the only way of freeing memory. Using an LRU strategy it
> has a good chance of kicking unused textures first, but there's no
> guarantee. It can't tell if a kicked texture will be needed the next
> instant. So trying to move textures from GART to local memory would
> basically mean that you blindly kick the least recently used texture(s)
> from local memory. If those textures are needed again soon then
> performance is going to suffer badly.
>
> Therefore I'm proposing a modified allocator that fails when it would
> need to start kicking too recently used textures (e.g. textures used in
> the current or previous frame). Failure would not be fatal in this case;
> you just keep the texture in GART memory and try again later. Actually
> you could use the same allocator for normal texture uploads, just
> specifying the current texture heap age as the limit. If you try to move
> textures back to local memory each time a texture is used, this would
> result in some kind of automatic regulation of heap usage. By kicking
> only textures that are several frames old in this process, you'd avoid
> thrashing. Currently the texture heap age is only incremented on lock
> contention (IIRC). In this scheme you'd also increment it on buffer
> swaps and remember the texture heap ages of the last two buffer swaps.

I simplified this idea a little further and attached a patch against
texmem.[ch]. It frees stale textures (and also placeholders for other
clients' textures) that haven't been used in 1 second when it runs out of
space on a texture heap. This way it will try a bit harder to put textures
into the first heap before using the second heap, without much risk (I
hope) of performance regressions. I tested this on a ProSavageDDR where
rendering speed appears to be the same with local and GART textures. There
was no measurable performance regression in Quake3, and I noticed no
subjective performance regression in Torcs or Quake1 either.

Now the only thing missing in texmem.c for migrating textures from GART to
local memory would be a flag to driAllocateTexture to stop trying if
kicking stale textures didn't free up enough space (on the first texture
heap). Anyway, I think the attached patch should already make a difference
as it is. I'd be interested in how much it improves your performance
numbers with Quake3 and rtcw on r200 when both texture heaps are enabled.

[snip]

Regards,
Felix

--- ./texmem.h.~1.6.~	2005-02-02 17:20:40.000000000 +0100
+++ ./texmem.h	2005-02-10 17:44:40.000000000 +0100
@@ -101,6 +101,11 @@
 				 * value must be greater than
 				 * or equal to \c firstLevel.
 				 */
+
+   double clockAge;		/**< Clock time stamp indicating when
+				 * the texture was last used. The unit
+				 * is seconds.
+				 */
 };
--- ./texmem.c.~1.10.~	2005-02-05 14:16:25.000000000 +0100
+++ ./texmem.c	2005-02-10 18:39:15.000000000 +0100
@@ -50,6 +50,7 @@
 #include "texformat.h"
 
 #include <assert.h>
+#include <sys/time.h>
@@ -243,6 +244,13 @@
     */
    move_to_head( &heap->texture_objects, t );
 
+   {
+      struct timeval tv;
+      if ( gettimeofday( &tv, NULL ) == 0 ) {
+         t->clockAge = (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
+      } else
+         t->clockAge = 0.0;
+   }
 
    for (i = start ; i <= end ; i++) {
@@ -415,6 +423,15 @@
       t->heap = heap;
 
       if (in_use)
Re: [r2xx|r1xx] readpixels-3 and pntparam_1 - Progress?
Dieter Nützel wrote:
> r100-readpixels-3.patch (Stephane)
> r200_pntparam_1.diff (Roland)
>
> I ran with both. Should they be merged?

I surely hope to get my readpixels patch merged. However, I found a serious
flaw in it (not related to the readpix segfault) which I have to fix before
this happens.

Stephane
r300 vb path
Hello,

I've attached a patch with a port of the r200 vertex buffer code for the
r300 driver. The performance of the vertex buffer codepath is now roughly
the same as the immediate path, and tuxracer now seems to be rendered
almost correctly.

Vladimir, I haven't found a way to directly call the r200/radeon's discard
buffer command from r300_dri, so this patch still includes the drm
additions. Perhaps someone could help me out with this one? Could the
people testing r300_dri test this if they have the time? And Vladimir, can
you let me know if you want me to commit this, or if it needs more work.

Thanks,
Ben Skeggs.

diff -Nur r300_driver/drm/shared-core/r300_cmdbuf.c r300_driver_wip/drm/shared-core/r300_cmdbuf.c
--- r300_driver/drm/shared-core/r300_cmdbuf.c	2005-01-08 13:46:34.000000000 +1100
+++ r300_driver_wip/drm/shared-core/r300_cmdbuf.c	2005-02-11 05:10:16.185196992 +1100
@@ -354,16 +354,37 @@
 	return 0;
 }
 
+static void r300_discard_buffer(drm_device_t * dev, drm_buf_t * buf)
+{
+	drm_radeon_private_t *dev_priv = dev->dev_private;
+	drm_radeon_buf_priv_t *buf_priv = buf->dev_private;
+	RING_LOCALS;
+
+	buf_priv->age = ++dev_priv->sarea_priv->last_dispatch;
+
+	/* Emit the vertex buffer age */
+	BEGIN_RING(2);
+	RADEON_DISPATCH_AGE(buf_priv->age);
+	ADVANCE_RING();
+
+	buf->pending = 1;
+	buf->used = 0;
+}
+
 /**
  * Parses and validates a user-supplied command buffer and emits appropriate
  * commands on the DMA ring buffer.
  * Called by the ioctl handler function radeon_cp_cmdbuf.
  */
 int r300_do_cp_cmdbuf(drm_device_t* dev,
+		      DRMFILE filp,
 		      drm_file_t* filp_priv,
 		      drm_radeon_cmd_buffer_t* cmdbuf)
 {
 	drm_radeon_private_t *dev_priv = dev->dev_private;
+	drm_device_dma_t *dma = dev->dma;
+	drm_buf_t *buf = NULL;
 	int ret;
 
 	DRM_DEBUG("\n");
@@ -375,6 +396,7 @@
 	}
 
 	while(cmdbuf->bufsz >= sizeof(drm_r300_cmd_header_t)) {
+		int idx;
 		drm_r300_cmd_header_t header;
 
 		if (DRM_GET_USER_UNCHECKED(header.u, (int __user*)cmdbuf->buf)) {
@@ -431,6 +453,26 @@
 				ADVANCE_RING();
 			}
 			return 0;
+
+		case R300_CMD_DMA_DISCARD:
+			DRM_DEBUG("RADEON_CMD_DMA_DISCARD\n");
+			idx = header.dma.buf_idx;
+			if (idx < 0 || idx >= dma->buf_count) {
+				DRM_ERROR("buffer index %d (of %d max)\n",
+					  idx, dma->buf_count - 1);
+				return DRM_ERR(EINVAL);
+			}
+
+			buf = dma->buflist[idx];
+			if (buf->filp != filp || buf->pending) {
+				DRM_ERROR("bad buffer %p %p %d\n",
+					  buf->filp, filp, buf->pending);
+				return DRM_ERR(EINVAL);
+			}
+
+			r300_discard_buffer(dev, buf);
+			break;
+
 		default:
 			DRM_ERROR("bad cmd_type %i at %p\n",
 				  header.header.cmd_type,
diff -Nur r300_driver/drm/shared-core/radeon_drm.h r300_driver_wip/drm/shared-core/radeon_drm.h
--- r300_driver/drm/shared-core/radeon_drm.h	2005-01-02 05:32:52.000000000 +1100
+++ r300_driver_wip/drm/shared-core/radeon_drm.h	2005-02-06 20:20:06.000000000 +1100
@@ -199,6 +199,7 @@
 #define R300_CMD_PACKET3		3 /* emit a packet3 */
 #define R300_CMD_END3D			4 /* emit sequence ending 3d rendering */
 #define R300_CMD_CP_DELAY		5
+#define R300_CMD_DMA_DISCARD		6
 
 typedef union {
 	unsigned int u;
@@ -218,6 +219,9 @@
 	struct {
 		unsigned char cmd_type, packet;
 		unsigned short count; /* amount of packet2 to emit */
 	} delay;
+	struct {
+		unsigned char cmd_type, buf_idx, pad0, pad1;
+	} dma;
 } drm_r300_cmd_header_t;
 
 #define RADEON_FRONT			0x1
diff -Nur r300_driver/drm/shared-core/radeon_drv.h r300_driver_wip/drm/shared-core/radeon_drv.h
--- r300_driver/drm/shared-core/radeon_drv.h	2004-12-28 07:44:39.000000000 +1100
+++ r300_driver_wip/drm/shared-core/radeon_drv.h	2005-02-11 05:11:02.953087192 +1100
@@ -310,6 +310,7 @@
 /* r300_cmdbuf.c */
 extern int r300_do_cp_cmdbuf( drm_device_t* dev,
+			      DRMFILE filp,
 			      drm_file_t* filp_priv,
 			      drm_radeon_cmd_buffer_t* cmdbuf );
diff -Nur r300_driver/drm/shared-core/radeon_state.c r300_driver_wip/drm/shared-core/radeon_state.c
--- r300_driver/drm/shared-core/radeon_state.c	2005-01-31 13:33:24.000000000 +1100
+++ r300_driver_wip/drm/shared-core/radeon_state.c	2005-02-06 20:20:06.000000000 +1100
@@ -2469,7 +2469,7 @@
 		return DRM_ERR(EFAULT);
 
 	if ( IS_FAMILY_R300(dev_priv) )
-		return r300_do_cp_cmdbuf(dev, filp_priv,
Re: fglrx vs. r200 dri
On Thursday 10 February 2005 12:53, Roland Scheidegger wrote:
> Adam Jackson wrote:
>> Exactly which fglrx version is this with, the old 3.x series or the new
>> 8.8.25?
> This was the newest 8.8.25 (for XFree86 4.3; your suggested linker magic
> for using the 6.8 version with xorg cvs head does not work :-)).

Interesting, I've been doing that with fglrx for quite a while now with no
problems. Maybe I left out a step in the instructions. If you could post
the loader errors you got I could tell you how to work around them.

- ajax
[Bug 1648] R200 SWTCL path doesn't do projtex right
Please do not reply to this email: if you want to comment on the bug, go to
the URL shown below and enter your comments there.

https://bugs.freedesktop.org/show_bug.cgi?id=1648

[EMAIL PROTECTED] changed:

           What      |Removed     |Added
  ------------------------------------------
           Status    |ASSIGNED    |RESOLVED
           Resolution|            |FIXED

--- Additional Comments From [EMAIL PROTECTED] 2005-02-10 10:31 ---
(In reply to comment #5)
> First I thought if (tnl->render_inputs & _TNL_BITS_TEX_ANY) might not be
> up to date, but at least in projtex it seems so.

Thought that too at first, but then I figured that if _TNL_BITS_TEX_ANY
weren't up to date, it would not work correctly at all, since the code
would incorrectly select the tiny vertex format in cases where it
shouldn't.

Applied to cvs.
Re: [r2xx|r1xx] readpixels-3 and pntparam_1 - Progress?
Stephane Marchesin wrote:
> Dieter Nützel wrote:
>> r100-readpixels-3.patch (Stephane)
>> r200_pntparam_1.diff (Roland)
>>
>> I ran with both. Should they be merged?
> I surely hope to get my readpixels patch merged. However, I found a
> serious flaw in it (not related to the readpix segfault) which I have to
> fix before this happens.

As for pntparam, I couldn't get it really working, and in this form it's
overkill for what does work (larger point sizes). I'm not so sure it's
worth the trouble of whipping up a simpler patch which would only contain
that, since only aliased larger point sizes are working, but everyone uses
antialiased points... Maybe I'll try it again later.

Roland
Re: [patch] texturing performance local/gart on rv250
I haven't looked at the texture heap management code, but one simple idea
for heap management would be to cascade the on-board heap into the AGP one.
How does the current algorithm work? Does an algorithm like the one below
have merit (a rough sketch in C follows this message)? It should keep the
hot textures sorted on-board, and single-use textures should fall out of
the cache.

1) Load all textures initially into the on-board heap, since if you are
   loading them you're probably going to use them.
2) Do LRU with the on-board heap.
3) When you run out of space on-board, demote the end of the LRU list to
   the top of the AGP heap and copy the texture between heaps.
4) Run LRU on the AGP heap.
5) When it runs out of space, lose the item.
6) An added twist: if the top of the AGP heap gets hit too often, knock it
   out of the cache so that it will get reloaded on-board.

Jon Smirl [EMAIL PROTECTED]
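[A self-contained sketch of that cascade as a two-level LRU, to make the
data flow concrete. Every name here (struct tex, HOT_THRESHOLD, the heap
layout) is invented for illustration; this is not the DRI texmem interface,
and a real driver would blit texel data where the comments say so:

#include <stdlib.h>

#define HOT_THRESHOLD 500     /* step 6: AGP hits before promoting back */

struct tex { struct tex *prev, *next; size_t size; int agp_hits; };
struct heap { struct tex *head, *tail; size_t free; };

static void unlink_tex(struct heap *h, struct tex *t)
{
    if (t->prev) t->prev->next = t->next; else h->head = t->next;
    if (t->next) t->next->prev = t->prev; else h->tail = t->prev;
    t->prev = t->next = NULL;
    h->free += t->size;
}

static void push_head(struct heap *h, struct tex *t)
{
    t->next = h->head; t->prev = NULL;
    if (h->head) h->head->prev = t; else h->tail = t;
    h->head = t;
    h->free -= t->size;
}

/* Steps 3-5: make room on-board by demoting LRU tails into the AGP
 * heap; when the AGP heap itself overflows, its tail is simply lost. */
static void make_room(struct heap *local, struct heap *agp, size_t need)
{
    while (local->free < need && local->tail) {
        struct tex *victim = local->tail;
        unlink_tex(local, victim);
        while (agp->free < victim->size && agp->tail) {
            struct tex *lost = agp->tail;
            unlink_tex(agp, lost);            /* step 5: drop it */
            free(lost);
        }
        /* a real driver would blit the texel data local -> AGP here */
        push_head(agp, victim);               /* step 3 */
        victim->agp_hits = 0;
    }
}

/* Steps 1, 2, 4, 6: called on every texture bind. */
static void use_texture(struct heap *local, struct heap *agp,
                        struct tex *t, int t_is_local)
{
    if (t_is_local) {
        unlink_tex(local, t);                 /* step 2: plain LRU */
        push_head(local, t);
    } else if (++t->agp_hits >= HOT_THRESHOLD) {
        unlink_tex(agp, t);                   /* step 6: kick it out... */
        make_room(local, agp, t->size);
        push_head(local, t);                  /* ...so it reloads on-board */
        t->agp_hits = 0;
    } else {
        unlink_tex(agp, t);                   /* step 4: LRU on AGP heap */
        push_head(agp, t);
    }
}

Whether the HOT_THRESHOLD-style promotion pays off is exactly what the rest
of the thread argues about.]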
Re: fglrx vs. r200 dri
Adam Jackson wrote:
> [snip]
> Interesting, I've been doing that with fglrx for quite a while now with
> no problems. Maybe I left out a step in the instructions. If you could
> post the loader errors you got I could tell you how to work around them.

I just did what you suggested for fglrx_drv.o, i.e.
gcc -shared -nostdlib -o fglrx_drv.so fglrx_drv.o -Bstatic -lgcc.
libfglrxdrm.a consists of two objects (modules.o and FireGLdrm.o); I
extracted them with ar and then linked both objects together
(gcc -shared -nostdlib -o libfglrxdrm.so modules.o FireGLdrm.o -Bstatic
-lgcc). But this gave me an error: it complained about an unresolved symbol
from fglrx_drv.so (I believe it was firegl_PM4Alloc; I am sure, though,
that it was one of the symbols defined in this libfglrxdrm.so). I tried
some weird things like linking the two objects from libfglrxdrm.a together
with libdrm.so, without much success. At one point, though, I got a
different missing symbol; I believe that was XAACreateScreenRec or
something like that. At this point I gave up...

Roland
[Bug 2241] implement GL_ARB_texture_cube_map in radeon driver
Please do not reply to this email: if you want to comment on the bug, go to
the URL shown below and enter your comments there.

https://bugs.freedesktop.org/show_bug.cgi?id=2241

--- Additional Comments From [EMAIL PROTECTED] 2005-02-10 14:06 ---
I have applied the drm part to cvs (together with texture micro tiling, so
they can have the same drm minor number), together with the corresponding
sanity code pieces. I'm afraid the rest is a bit too much for me to
review/commit, especially since I don't have much time for testing on r100.
It would definitely be nice to have, though.
Re: [patch] texturing performance local/gart on rv250
On Thursday, 2005-02-10, 15:31 -0500, Jon Smirl wrote:
> I haven't looked at the texture heap management code, but one simple
> idea for heap management would be to cascade the on-board heap into the
> AGP one. How does the current algorithm work? Does an algorithm like the
> one below have merit? It should keep the hot textures sorted on-board,
> and single-use textures should fall out of the cache.
>
> 1) Load all textures initially into the on-board heap, since if you are
> loading them you're probably going to use them.

Drivers usually upload textures to the hardware just before binding them to
a hardware texture unit. So this assumption is always true.

> 2) Do LRU with the on-board heap.
> 3) When you run out of space on-board, demote the end of the LRU list to
> the top of the AGP heap and copy the texture between heaps.

This means you copy a texture when you don't know if or when you're going
to need it again. So the move of the texture may just be a waste of time.
It would be better to just kick the texture and upload it again later when
it's really needed.

> 4) Run LRU on the AGP heap.
> 5) When it runs out of space, lose the item.
> 6) An added twist: if the top of the AGP heap gets hit too often, knock
> it out of the cache so that it will get reloaded on-board.

I'd rather reverse your scheme: upload a texture to the GART heap first,
because that's potentially faster (though not with the current
implementation in the radeon drivers). When the texture is needed more
frequently, try promoting it to the local texture heap. This scheme would
give good results with movie players, which need fast texture uploads and
typically use each texture exactly once. It would also improve performance
with games, simulations, etc. that tend to use the same textures many times
and benefit from the higher memory bandwidth when accessing local textures.

Felix Kühling
Re: [patch] texturing performance local/gart on rv250
On Thu, 10 Feb 2005 23:13:30 +0100, Felix Kühling [EMAIL PROTECTED] wrote:
> This means you copy a texture when you don't know if or when you're
> going to need it again. So the move of the texture may just be a waste
> of time. It would be better to just kick the texture and upload it again
> later when it's really needed.

I suspect this extra texture copy wouldn't be noticeable except when you
construct a test program which artificially triggers it. Most games will
achieve a steady state with their loaded textures after a frame or two, and
the copies will stop.

> I'd rather reverse your scheme: upload a texture to the GART heap first,
> because that's potentially faster (though not with the current
> implementation in the radeon drivers). When the texture is needed more
> frequently, try promoting it to the local texture heap.

I thought about this, but there is no automatic way to figure out when to
promote from GART to local. Same problem when local overflows: what do you
demote to AGP? You still have copies with this scheme too.

Going first to local and then demoting to AGP sorts everything
automatically. It may cause a little more churn in the heaps, but the
advantage is that the algorithm is very simple and doesn't need much
tuning. The only tunable parameter is determining when the top of the AGP
heap is hot and booting it. You could use something simple like booting
after 500 accesses.

Jon Smirl [EMAIL PROTECTED]
Re: [patch] texturing performance local/gart on rv250
On Thu, 10 Feb 2005 23:13:30 +0100, Felix Kühling [EMAIL PROTECTED] wrote:
> This scheme would give good results with movie players, which need fast
> texture uploads and typically use each texture exactly once.

Movie players aren't even close to being texture bandwidth bound. The
demote-from-local-to-AGP scheme would cause two copies on each frame, but
there is plenty of bandwidth. But this assumes that the movie player
creates a new texture for each frame. A better scheme for a movie player
would be to create a single texture and then keep replacing its contents,
or use two textures and double buffer. Once created, these textures would
not move in the LRU list unless you started something like a game in
another window.

Jon Smirl [EMAIL PROTECTED]
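[The double-buffered streaming Jon describes is plain OpenGL; a minimal
sketch follows. The dimensions and get_next_frame-style plumbing are made
up, and a real player would draw a textured quad where noted:

/* Double-buffered movie-frame streaming: two textures are created
 * once, then only their contents are replaced each frame, so the heap
 * allocator never sees new textures. */
#include <GL/gl.h>

#define TEX_W 1024   /* power-of-two texture holding e.g. a 720x576 frame */
#define TEX_H 1024
#define VID_W 720
#define VID_H 576

static GLuint tex[2];

static void init_stream_textures(void)
{
    glGenTextures(2, tex);
    for (int i = 0; i < 2; i++) {
        glBindTexture(GL_TEXTURE_2D, tex[i]);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        /* allocate storage once; contents are streamed in later */
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, TEX_W, TEX_H, 0,
                     GL_RGB, GL_UNSIGNED_BYTE, NULL);
    }
}

static void draw_frame(int frame_no, const GLubyte *pixels)
{
    int cur = frame_no & 1;             /* alternate between the two */
    glBindTexture(GL_TEXTURE_2D, tex[cur]);
    /* replace only the video-sized subregion -- no reallocation */
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, VID_W, VID_H,
                    GL_RGB, GL_UNSIGNED_BYTE, pixels);
    /* ... draw a quad textured with tex[cur], then swap buffers ... */
}
]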
Re: [patch] texturing performance local/gart on rv250
On Thursday, 2005-02-10, 17:40 -0500, Jon Smirl wrote:
> On Thu, 10 Feb 2005 23:13:30 +0100, Felix Kühling wrote:
>> This scheme would give good results with movie players that need fast
>> texture uploads and typically use each texture exactly once.
> Movie players aren't even close to being texture bandwidth bound.

That's not my experience. Optimizations in the texture upload path, using
the AGP heap and partial texture uploads, had a big impact on mplayer -vo
gl performance on my ProSavageDDR (factor 2-3, all of them taken together).

> The demote-from-local-to-AGP scheme would cause two copies on each
> frame, but there is plenty of bandwidth. But this assumes that the movie
> player creates a new texture for each frame. A better scheme for a movie
> player would be to create a single texture and then keep replacing its
> contents.

You're right, that's what actually happens in mplayer. It uses
glTexSubImage2D because it typically changes only a part of a texture with
power-of-two dimensions.

> Or use two textures and double buffer. Once created, these textures
> would not move in the LRU list unless you started something like a game
> in another window.

Yes, they would move in the LRU list. That's why it's called least recently
used, not least recently created. ;-) So I would have to modify my scheme
to reset the usage count/frequency when a texture image is changed, such
that a texture that is updated very frequently would not be promoted to
local memory.

On Thursday, 2005-02-10, 17:34 -0500, Jon Smirl wrote:
> I suspect this extra texture copy wouldn't be noticeable except when you
> construct a test program which artificially triggers it. Most games will
> achieve a steady state with their loaded textures after a frame or two,
> and the copies will stop.

Still, this copy is unnecessary at the time. Delaying the re-upload to the
time when the texture is needed again has only advantages and is not
difficult to implement.

>> I'd rather reverse your scheme: upload a texture to the GART heap
>> first, because that's potentially faster (though not with the current
>> implementation in the radeon drivers). When the texture is needed more
>> frequently, try promoting it to the local texture heap.
> I thought about this, but there is no automatic way to figure out when
> to promote from GART to local.

Yes there is. In the current scheme, whenever a texture is bound to a
hardware tex unit the driver calls driUpdateTexLRU, which moves the texture
to the front of the LRU list. In this function you could easily count how
often or how frequently a texture has been used. Based on this information,
and maybe the texture size, you could decide which textures to promote and
when (a sketch of this counting follows below). You will keep promoting
textures until the local heap is full of non-stale textures.

> Same problem when local overflows: what do you demote to AGP? You still
> have copies with this scheme too.

Textures are sorted in LRU order on the texture heaps, so you always kick
least recently used textures first. It has always worked like this, even in
the current scheme. For promoting textures I would only kick stale textures
from the local heap.

> Going first to local and then demoting to AGP sorts everything
> automatically. It may cause a little more churn in the heaps,

In my experience texture uploads are quite expensive, so IMO avoiding
unnecessary texture uploads or copies should have a high priority.

> but the advantage is that the algorithm is very simple and doesn't need
> much tuning. The only tunable parameter is determining when the top of
> the AGP heap is hot and booting it. You could use something simple like
> booting after 500 accesses.

I don't think my algorithm is much more complicated. It can be implemented
by gradual improvements of the current algorithm (freeing stale texture
memory is one step), which helps avoid unexpected performance regressions.
At the moment I'm not planning to rewrite it from scratch, especially
because I can't test on any hardware where I could actually measure great
performance improvements ATM. The only tunable parameter in my algorithm is
how often/frequently used a texture must be in order to try to promote it
to the local texture heap. Maybe there are a few more degrees of freedom,
because you can also consider the texture size for promotion. I think the
steady-state result would be about the same as with your algorithm, but I
expect my scheme to work better when textures are used very infrequently or
updated very frequently (movie players). In particular this would make the
texture_heaps option unnecessary, which is a good thing IMO (good
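[A hypothetical sketch of that counting, hung off the bind path the way
Felix describes. The use_count field, the PROMOTE_AFTER threshold and the
two helper hooks are all invented; driUpdateTexLRU is named only as the
place where this logic would live:

/* Sketch: per-texture bind counting to drive GART -> local promotion.
 * None of these names exist in texmem.c today. */
struct tex_info {
   int on_gart_heap;       /* currently resident in GART memory? */
   unsigned use_count;     /* binds since upload or last content update */
};

#define PROMOTE_AFTER 8    /* binds before promotion is attempted */

/* hypothetical driver hooks, declared only so the sketch compiles */
extern int  try_alloc_local_kicking_stale_only(struct tex_info *t);
extern void blit_gart_to_local(struct tex_info *t);

/* Called on every texture bind (cf. driUpdateTexLRU). */
void tex_mark_used(struct tex_info *t)
{
   t->use_count++;
   if (t->on_gart_heap && t->use_count >= PROMOTE_AFTER) {
      /* Promote only if space can be made by kicking *stale* textures;
       * on failure the texture simply stays in GART and we retry on a
       * later bind. */
      if (try_alloc_local_kicking_stale_only(t)) {
         blit_gart_to_local(t);   /* GPU blit, no in-kernel copy */
         t->on_gart_heap = 0;
      }
   }
}

/* Called from glTexSubImage2D and friends: a frequently *updated*
 * texture (e.g. a movie frame) must not accumulate promotion credit. */
void tex_mark_updated(struct tex_info *t)
{
   t->use_count = 0;
}
]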
Re: [patch] texturing performance local/gart on rv250
Felix Kühling wrote:
> I simplified this idea a little further and attached a patch against
> texmem.[ch]. It frees stale textures (and also placeholders for other
> clients' textures) that haven't been used in 1 second when it runs out
> of space on a texture heap. [snip] I'd be interested in how much it
> improves your performance numbers with Quake3 and rtcw on r200 when both
> texture heaps are enabled.

I've done a couple of benchmarks. All results are fglrx-boosted, so to
speak (too lazy to reboot).

q3, local 45MB or 35MB: 145 fps
rtcw, local 45MB:        95 fps
rtcw, local 35MB:        76 fps

With both heaps, local size 35MB, GART texture size 61MB:

q3, old allocator:   105-125 fps
rtcw, old allocator:  70-84 fps
q3, new allocator:   108-126 fps
rtcw, new allocator:  71-85 fps

This does not seem to really make a difference. One interesting thing I
noticed, though, is that the results are actually not a range but cluster
around a few distinct values. For rtcw, the scores were always very close
to either 70, 77 or 85 fps (within 1 frame); out of 10 runs maybe 6 were
around 77, 2 around 70 and 2 around 85. Quake3 mostly ran at around 125 fps
but once in a while was just below 110.

Roland
Re: [patch] texturing performance local/gart on rv250
> A better scheme for a movie player would be to create a single texture
> and then keep replacing its contents. Or use two textures and double
> buffer. Once created, these textures would not move in the LRU list
> unless you started something like a game in another window.

If we supported that in any reasonable fashion (at least on radeon/r200)...
Movie players are very texture upload bound, well, at least on my embedded
system. I do a lot of animation with movies, mngs and arrays of pngs, and
most of my time is spent in memcpy and texstore_rgba. This is a real pain
for me, and I'm slowly gathering enough knowledge to do a great big hack
for my own internal use.

Dave.

--
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person
Re: [patch] texturing performance local/gart on rv250
AGP 8x should just be able to keep up with 1280x1024x24b 60 times/sec.

How does mesa access AGP memory from the CPU side? AGP memory is system
memory which the AGP aperture makes visible to the GPU. Are we using the
GPU to load textures into AGP memory, or is it being done entirely on the
main CPU with a memcopy?

For things like a movie player we should even be able to give it a pointer
to the texture in system memory (AGP space) and let it directly manipulate
the texture buffer. Doing that would require playing with the page tables
to preserve protection.

Jon Smirl [EMAIL PROTECTED]
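[A sanity check on that figure (my arithmetic, not from the thread): a full
1280x1024 24-bit frame is about 3.9 MB, so streaming 60 of them per second
needs roughly 225 MB/s, well under even AGP 4x's nominal rate; in practice
the ceiling is upload overhead rather than bus speed, as the next message
shows. In C:

/* Back-of-the-envelope check of Jon's bandwidth figure (standalone
 * arithmetic, not code from the thread). */
#include <stdio.h>

int main(void)
{
    double frame_bytes = 1280.0 * 1024 * 3;            /* 24-bit frame */
    double needed = frame_bytes * 60 / (1024 * 1024);  /* MB/s at 60 fps */
    double agp1x  = 266.0;                             /* MB/s nominal */

    printf("needed: %.0f MB/s\n", needed);             /* ~225 MB/s */
    printf("AGP 4x: %.0f MB/s, AGP 8x: %.0f MB/s\n",
           agp1x * 4, agp1x * 8);                      /* 1064 / 2128 */
    return 0;
}
]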
Re: [patch] texturing performance local/gart on rv250
Jon Smirl wrote:
> AGP 8x should just be able to keep up with 1280x1024x24b 60 times/sec.

AGP 4x should be enough. Remember, I got 600MB/s max throughput. Not with
24-bit textures, though; the Mesa RGBA->BGRA conversion takes WAY too much
time to achieve that.

> How does mesa access AGP memory from the CPU side? AGP memory is system
> memory which the AGP aperture makes visible to the GPU. Are we using the
> GPU to load textures into AGP memory, or is it being done entirely on
> the main CPU with a memcopy?

Depends on the driver. radeon/r200 use a gpu blit. That might be
suboptimal, but at least it handles things like tiling (when the gpu
blitter can do it) automatically. I'm not sure, but couldn't the radeon
blitter actually do RGBA->BGRA conversion too, for instance?

> For things like a movie player we should even be able to give it a
> pointer to the texture in system memory (AGP space) and let it directly
> manipulate the texture buffer. Doing that would require playing with the
> page tables to preserve protection.

This seems to be exactly what the client storage extension of the r200
driver is intended for. But for normal apps it's useless (and for the most
part even for apps which could make good use of it, since it's an extension
almost no one uses anyway).

Roland
Re: [patch] texturing performance local/gart on rv250
Felix Kühling wrote:
I don't think my algorithm is much more complicated. It can be implemented by gradual improvements of the current algorithm (freeing stale texture memory is one step), which helps avoid unexpected performance regressions. At the moment I'm not planning to rewrite it from scratch, especially because I can't test on any hardware where I can actually measure great performance improvements ATM.

I'm not sure what a really good implementation would look like, but you could try lowering the AGP speed to 1x with a Savage to see a performance difference between local and GART texturing. Though I'm not convinced the Savages are actually fast enough to even take a hit at AGP 1x...

Roland
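[For reference, the bookkeeping under discussion is essentially an LRU list of texture regions: a region moves to the head when used, and stale entries at the tail are the first eviction candidates. A hypothetical sketch, not the actual savage or radeon texture manager:]

#include <stddef.h>

struct tex_region {
    struct tex_region *prev, *next;
    unsigned last_used;   /* age stamp, bumped on each bind */
    int on_card;          /* resident in local/GART memory? */
};

/* Sentinel list head; an empty list points at itself. */
static struct tex_region lru = { &lru, &lru, 0, 0 };

/* Mark a region as most recently used: unlink it and reinsert at the
 * head of the list. */
static void lru_touch(struct tex_region *t, unsigned now)
{
    t->prev->next = t->next;
    t->next->prev = t->prev;

    t->next = lru.next;
    t->prev = &lru;
    lru.next->prev = t;
    lru.next = t;
    t->last_used = now;
}

/* The tail of the list is the least recently used region, i.e. the
 * first candidate when memory must be freed. */
static struct tex_region *lru_evict_candidate(void)
{
    return (lru.prev == &lru) ? NULL : lru.prev;
}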
Re: [patch] texturing performance local/gart on rv250
Dave Airlie wrote:
A better scheme for a movie player would be to create a single texture and then keep replacing its contents, or use two textures and double-buffer. But once created, these textures would not move in the LRU list unless you started something like a game in another window.

If only we supported that in any reasonable fashion (at least on radeon/r200). Movie players are very texture-upload bound, at least on my embedded system. I do a lot of animation with movies, MNGs and arrays of PNGs, and most of my time is spent in memcpy and texstore_rgba. This is a real pain for me, and I'm slowly gathering enough knowledge to do a great big hack for my own internal use.

Perhaps a wild idea ... does APPLE_client_storage do what you want? If so, then it might be a lot simpler and more reusable to test/optimize/fix up that than to start from scratch. That should allow a straight copy from data you create to memory the card can texture from, which is about as good as possible. For subimage modification, the spec seems to permit modifying the data in place and then calling TexSubImage on the subregion with a pointer into the original data to notify GL of the change.

Regards,
Owen
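[The usage pattern would look something like the sketch below, assuming the driver exported GL_APPLE_client_storage. The function names are made up, and the application must keep "frame" valid for the texture's whole lifetime, since GL textures from the client's copy of the data.]

#include <GL/gl.h>
#include <GL/glext.h>

/* One-time upload: tell GL to texture directly from the client's copy
 * of the data instead of making its own. */
void client_storage_upload(GLuint tex, int w, int h, const void *frame)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glPixelStorei(GL_UNPACK_CLIENT_STORAGE_APPLE, GL_TRUE);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, frame);
}

/* Later: modify a subregion of "frame" in place, then notify GL with a
 * pointer into the same data.  GL_UNPACK_ROW_LENGTH tells GL that the
 * subregion's rows are strided across the full image width. */
void client_storage_update(GLuint tex, int full_w, int x, int y,
                           int w, int h, const GLubyte *frame)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glPixelStorei(GL_UNPACK_ROW_LENGTH, full_w);
    glTexSubImage2D(GL_TEXTURE_2D, 0, x, y, w, h, GL_RGBA,
                    GL_UNSIGNED_BYTE, frame + (y * full_w + x) * 4);
    glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);
}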
Re: [patch] texturing performance local/gart on rv250
On Thu, 10 Feb 2005 21:59:29 -0500, Owen Taylor [EMAIL PROTECTED] wrote:
That should allow a straight copy from data you create to memory the card can texture from, which is about as good as possible.

If you have a big AGP aperture to play with, there is a faster way. When you get the call to copy the texture from user space, don't copy it. Instead, mark its page table entries as copy-on-write. Get the physical address of each page and set it into the GART. Now the GPU can get to it with zero copies. When you are done with it, check whether the app caused a copy-on-write; if so, free the page, else just remove the COW flag.

--
Jon Smirl
[EMAIL PROTECTED]
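[For the zero-copy half of the idea, a very rough 2.6-era kernel sketch using get_user_pages(). Note this is the simpler pin-the-pages variant rather than the full COW trick described above; the COW marking and later fault check are not shown, gart_insert_page() is a hypothetical placeholder, and the error path is simplified.]

#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/errno.h>

/* Hypothetical helper: enter one page's physical address into the GART. */
extern void gart_insert_page(struct page *page, int slot);

int map_user_texture(unsigned long uaddr, int npages, struct page **pages)
{
    int got, i;

    down_read(&current->mm->mmap_sem);
    /* write=0: the GPU only reads these pages; a write by the app is
     * what would trigger the COW that the full scheme has to detect. */
    got = get_user_pages(current, current->mm, uaddr, npages,
                         0 /* write */, 0 /* force */, pages, NULL);
    up_read(&current->mm->mmap_sem);

    if (got < npages) {
        /* Simplified: a real version must release any pages it did get. */
        return -EFAULT;
    }

    /* The GPU can now read the texture in place, with zero copies. */
    for (i = 0; i < got; i++)
        gart_insert_page(pages[i], i);

    return 0;
}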
Re: [patch] texturing performance local/gart on rv250
On Thu, 2005-02-10 at 22:23 -0500, Jon Smirl wrote:
If you have a big AGP aperture to play with, there is a faster way. When you get the call to copy the texture from user space, don't copy it. Instead, mark its page table entries as copy-on-write. Get the physical address of each page and set it into the GART. Now the GPU can get to it with zero copies. When you are done with it, check whether the app caused a copy-on-write; if so, free the page, else just remove the COW flag.

Is there evidence that this is/would be in fact faster?

--
Eric Anholt [EMAIL PROTECTED]
http://people.freebsd.org/~anholt/ [EMAIL PROTECTED]
Re: r300 vb path
Hi Ben,

Great work! With regard to the discard buffer command: now that I think of it, you want this code initiated from within cmdbuf, not as a separate ioctl, so your way is correct; we need to implement the appropriate cmd for r300. Go ahead and apply the patch, can't wait to try it :) Thank you!

Vladimir Dergachev

On Fri, 11 Feb 2005, Ben Skeggs wrote:
Hello, I've attached a patch with a port of the r200 vertex buffer code for the r300 driver. The performance of the vertex buffer codepath is now roughly the same as the immediate path, and tuxracer now seems to be rendered almost correctly. Vladimir, I haven't found a way to directly call the r200/radeon's discard buffer command from r300_dri, so this patch still includes the drm additions. Perhaps someone could help me out with this one? Could the people testing r300_dri test this if they have the time? And Vladimir, can you let me know if you want me to commit this, or if it needs more work?

Thanks,
Ben Skeggs.
Re: [patch] texturing performance local/gart on rv250
When you get the call to copy the texture from user space, don't copy it. Instead, mark its page table entries as copy-on-write. Get the physical address of each page and set it into the GART. Now the GPU can get to it with zero copies. When you are done with it, check whether the app caused a copy-on-write; if so, free the page, else just remove the COW flag.

Is there evidence that this is/would be in fact faster?

No, but I could practically guarantee anything is faster than the 3-4 copies a radeon texture goes through at the moment.

Dave.

--
David Airlie, Software Engineer
http://www.skynet.ie/~airlied / airlied at skynet.ie
pam_smb / Linux DECstation / Linux VAX / ILUG person
Re: [patch] texturing performance local/gart on rv250
On Thu, 10 Feb 2005 20:14:00 -0800, Eric Anholt [EMAIL PROTECTED] wrote:
Is there evidence that this is/would be in fact faster?

That's how the networking drivers work, and they may be the fastest drivers in the system. But it has not been coded for AGP, so nobody knows for sure. It has to be faster, though: having the CPU do the copy thrashes the TLB as you walk through all of the pages, and having the GPU do the copy is even worse since it moves across AGP.

We have bigger problems to chase. Plus, implementing it this way probably has a bunch of architecture-specific problems I don't know about, though I'm sure it would work on x86. After we get X on GL up on mesa-solo I can look at changing the texture copy code.

--
Jon Smirl
[EMAIL PROTECTED]