On Fri, 25 Feb 2005 19:37:36 -0500, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> On Friday 25 February 2005 18:37, Timothy Miller wrote:
> > On Fri, 25 Feb 2005 16:10:14 -0500, Daniel Phillips wrote:
> > > LRU implements the policy I described so long as the eight texels
> > > of a trilinear blend are accessed in the proper order, which isn't
> > > the best order for the DRAM (it alternates between two textures).
> > > But if the memory request queue is long enough, maybe the accesses
> > > can be reordered there. Or simpler, have two independent texture
> > > caches per port, or switch that on when trilinear blending is
> > > active.
> >
> > As a matter of fact, doing the two texture units as really two
> > texture units simplifies some things and makes some more complex, but
> > it would give us room to improve efficiency by reordering memory
> > accesses. Given the change of direction, I may do it that way.
>
> Doing reordering in the memory interface and fragment processing would
> be a fun challenge. It doesn't have to be in the first rev.

I don't think the state machine will be that complex. It's trivial to
determine which words go with which memory controllers. Which agent is
active on each controller is based on a combination of priority, memory
row hit, and an expiration counter. The idea is to increase the chances
that at least N words are accessed from each memory row before changing
rows and/or agents.

>
> > The way I was going to do FSAA was to have one stage that divided the
> > screen coordinates by 2**n and modified the alpha channel
> > appropriately. Horizontally associated fragments would be
> > automatically combined, but vertically associated fragments would be
> > resolved by the blender. That would lose some accuracy and
> > performance, but oh-well.
>
> I am not sure how well this works with randomly ordered polygon
> overdraw.
>
> There is a standard software-level OpenGL solution which is to draw the
> screen several times into an accumulation buffer. This sucks for a
> number of reasons, not least that all the geometry has to go over the
> bus multiple times, and the bus is already saturated.
>
> A whole scene could be buffered on the card and processed multiple
> times, but that is not transparent to software and eats texture memory.
> It's probably workable though. The accumulation buffer has its own
> overhead.

Adding a stage for automatic antialiasing would not be a big deal. It
was removed because it was considered less important than some other
features that were already too complex to add. This is the case with a
number of features that were dropped.

>
> > However, it's been suggested that one solution is to render the scene
> > N times with slightly different camera positions and then composite
> > them. It seems to me that that would be a lot slower, but I haven't
> > put a lot of thought into it.
>
> This is the classic way. It has the great virtue that it is easy to
> implement and the results are really nice.
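
A rough sketch of the per-controller arbitration mentioned in the reply
above (priority, memory row hit, expiration counter). The thread does not
say how the three are combined, so the ordering used here is only one
plausible guess, and the names Request, pick_next, burst_left and max_age
are hypothetical, made up for illustration. burst_left stands in for the
"at least N words per row" goal.

```python
from dataclasses import dataclass

@dataclass
class Request:
    agent: str      # hypothetical agent name, e.g. "texture" or "video"
    priority: int   # larger means more urgent (assumed convention)
    row: int        # DRAM row the request targets
    age: int = 0    # cycles spent waiting; stands in for the expiration counter

def pick_next(pending, open_row, active_agent, burst_left, max_age):
    """Choose the next request for one memory controller (assumed policy)."""
    if not pending:
        return None
    # 1. Expiration: a request that has waited too long wins outright.
    expired = [r for r in pending if r.age >= max_age]
    if expired:
        return max(expired, key=lambda r: (r.age, r.priority))
    # 2. Stay with the active agent on the open row while the burst quota
    #    lasts, so several words come out of a row before switching.
    if burst_left > 0:
        for r in pending:
            if r.agent == active_agent and r.row == open_row:
                return r
    # 3. Otherwise prefer priority, then a row hit, then the oldest request.
    return max(pending, key=lambda r: (r.priority, r.row == open_row, r.age))
```

With a texture request on the open row and a higher-priority video
request on another row, this picks the texture request while burst_left
is positive and switches to video once the quota is used up. A real
design might weight the three factors differently.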
>
> > Only after starting this project did I start to notice that polygon
> > edges in games I play are jagged. :)
>
> Now you are doomed to never ignore it :)
>
> By the way, I goofed by not adding in the Z buffer overhead, which is
> significant. I will have to redo the estimate. It's probably ok to do
> the Z buffer at 1X resolution, together with the 2x2 rendering. This
> would require some hardware tweaks, for example, the Z test results
> should be cached for even scans so that they can be reused on odd
> scans, or something like that.
>
> Like the earlier Id engines, Quake 3 only does Z fill on the scenery
> pass, saving a whole lot of memory accesses. Only the mobile object
> pass does full Z buffering.

Well, given what you say, there's little reason to go either way except
for memory consumption.

> > > There are several ways to claw back some fill rate:
> > >
> > > 1) Texture cache, saving maybe 25% of texel accesses
> >
> > There are all sorts of things that would qualify as a "texture cache".
>
> I'm thinking about "the one we discussed today" because this one has the
> great virtue of being _our_ texture cache. To be precise, it only has
> a handful of entries, and it is specifically tailored to accelerating
> bilinear and trilinear filtering.
>
> > > 2) 16 bit textures, saving 50% of texel bandwidth and improving
> > > cache hit rate slightly
> >
> > Making all memory words exactly 32 bits is a simplification that I
> > don't want to give up.
>
> Maybe we could put this in the category of "later rev". The additional
> complexity isn't that bad, at least according to my model of how the
> memory controller works. None of these improvements absolutely has to
> be done, but without these optimizations, FSAA+trilinear performance
> has no chance of reaching game speed.

Consider the possibility that any "later rev" is likely to have much
greater memory bandwidth, more pipelines, etc.
Certain kinds of optimizations only come into play when we start to
reach other sorts of cost boundaries. What is it they say about not
pre-optimizing? :)

>
> > > 3) 16 bit frame buffer, saving 50% of pixel write and scanout
> > > bandwidth.
> > >
> > > If we do all of these, we just might hit 30 frames/sec for Quake 3,
> > > with decent filtering and antialiasing.
> >
> > This is something to consider. Quake 3 isn't so advanced that we
> > shouldn't pay attention to it.
>
> However unfair it sounds, Quake 3 is the standard by which the card will
> be measured. What is nice is, the Quake 3 GPL release is probably
> coming down the pipe pretty soon, maybe this Christmas. It would be
> absolutely great to run the GPL Quake engine on an open card.
>
> Note to those who think that it is wrong to focus too much on game
> performance: if the card runs Quake 3 decently it will run a whole lot
> of applications well. And running Quake 3 is easily worth a big
> increase in sales, so it helps the project.

It's being taken into consideration as much as is reasonable.

> > But we're going to do what we can do, and that's going to be more
> > than good enough for most embedded and desktop applications.
>
> Great. I know this card is going to work well for 3D accelerated X, but
> that doesn't mean a thing if nobody owns it.

And that is why our primary market is now embedded. Our OpenGL
implementation will be better than any OpenGL ES chip.
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
