On Thursday 17 March 2005 16:37, Attila Kinali wrote:
> > >   * There is no advantage to being able to reuse consumed pages
> > > in random order versus a fixed order
>
> It doesn't matter. A driver will always have to determine whether
> it can reuse a certain page or not. Thus it doesn't matter
> whether it goes through a linear list or a linked list.

Do you grok the concept of a ring buffer?  Ring buffers were made in 
heaven for synchronizing nasty asynchronous pairs of things such as 
fast hardware vs fast host. 

> > >   * Another way to grow/shrink the ring buffer is to have a
> > > predefined list of pages as I described earlier
>
> Which will fail if you cannot reserve the buffer at startup -> bad
> thing[tm]

Wrong, the "list of pages" is exactly what prevents this: pages can be 
allocated one at a time instead of as one large contiguous block.

> > >   * This is more complicated for the driver than a simple,
> > > virtually contiguous buffer
>
> Actually, all solutions i've seen so far are too complicated.

Then I simply didn't explain clearly.  The ring buffer scheme (even with 
definable physical page vector) is pretty much as simple as it gets.  
It needs the following PIO registers:

  - Head register: The ring buffer offset from which the card will
    next read an indirect DMA command.

  - Tail register: The ring buffer offset where the kernel driver
    will next write an indirect DMA command.

  - Threshold register: The threshold below which the card will
    generate a buffer low interrupt.
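
To make the head/tail/threshold bookkeeping concrete, here is a rough 
software model in C.  It is only a sketch: the struct stands in for the 
real PIO registers, and the 4 KiB ring size, the 16-byte command size 
and all names are assumptions for illustration, not part of the 
proposal.

```c
#include <assert.h>
#include <stdint.h>

#define RING_BYTES_ASSUMED 4096u  /* assumption: ring is one 4 KiB page */
#define CMD_BYTES_ASSUMED  16u    /* assumption: fixed-size commands */
#define RING_MASK          (RING_BYTES_ASSUMED - 1u)

/* Software stand-in for the three PIO registers (all byte offsets). */
struct ring_regs {
    uint32_t head;       /* where the card reads next */
    uint32_t tail;       /* where the driver writes next */
    uint32_t threshold;  /* low-water mark for the interrupt */
};

/* Bytes written by the driver but not yet read by the card. */
uint32_t ring_pending(const struct ring_regs *r)
{
    return (r->tail - r->head) & RING_MASK;
}

/* Driver side: claim one command slot.  Returns 0 when the ring is full.
 * One slot always stays empty so that head == tail only means "empty". */
int ring_push(struct ring_regs *r)
{
    assert(r->tail < RING_BYTES_ASSUMED);
    if (ring_pending(r) + CMD_BYTES_ASSUMED >= RING_BYTES_ASSUMED)
        return 0;
    r->tail = (r->tail + CMD_BYTES_ASSUMED) & RING_MASK;
    return 1;
}

/* Card side: consume one command.  Returns 0 when the ring is empty. */
int ring_pop(struct ring_regs *r)
{
    if (ring_pending(r) == 0)
        return 0;
    r->head = (r->head + CMD_BYTES_ASSUMED) & RING_MASK;
    return 1;
}

/* Condition under which the card would raise the buffer low interrupt. */
int ring_low(const struct ring_regs *r)
{
    return ring_pending(r) < r->threshold;
}
```

Because the size is a power of two, wrapping is a single AND with the 
mask, and neither side ever needs to look at anything but the two 
offsets.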

Instead of a single physical base register for the command ring buffer 
(the classic arrangement), there is a table of physical pages, which I 
suggested can be just one page for the initial rev (sufficient for a 
PCI card).
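
As an illustration of what the page table buys, the following C sketch 
shows how a ring offset could be turned into a bus address by a lookup 
in that table.  The 4 KiB page size, the 64-bit address type and the 
function name are assumptions made for the example; the real logic 
would live in the card, not in C.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED 12u                       /* assumption: 4 KiB pages */
#define PAGE_SIZE_ASSUMED  (1u << PAGE_SHIFT_ASSUMED)

/* Translate a ring buffer byte offset into a physical address.
 * page_table holds one physical page address per ring page and
 * size_log2 is log2 of the ring size in pages (0 means one page). */
uint64_t ring_offset_to_phys(const uint64_t *page_table,
                             uint32_t size_log2, uint32_t offset)
{
    assert(size_log2 <= 15u);
    uint32_t npages = 1u << size_log2;
    /* Page index wraps with the ring because npages is a power of two. */
    uint32_t page = (offset >> PAGE_SHIFT_ASSUMED) & (npages - 1u);
    return page_table[page] + (offset & (PAGE_SIZE_ASSUMED - 1u));
}
```

The pages in the table can sit anywhere in physical memory; the ring 
still looks contiguous to both the driver and the card.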

These are the command/status bits:

  - 1 bit: Enable/disable command DMA

  - 1 bit: Initiate table upload. The next N values written to the
    cursor upload register will be physical command buffer addresses,
    where N is 1 for the initial rev.

  - 4 bits: Ring buffer size field.  Gives log2 of the ring
    buffer size in pages, implying the ring buffer is always a binary
    size.  For the initial rev, this bit field is reserved/zero,
    implying a single page for command DMA, which is sufficient to
    keep up with PCI speeds.

That's it: three PIO registers, two command/status bits and a four-bit 
command/status field.  We press the cursor upload mechanism into double 
duty, a nicety I'm rather fond of.
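
For illustration, the two bits and the four-bit field could be packed 
into one command/status word roughly as follows.  The bit positions and 
all names here are hypothetical, nothing above fixes them, and the 
assumption that the table upload takes one entry per ring page is mine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define CS_DMA_ENABLE    (1u << 0)   /* enable/disable command DMA */
#define CS_TABLE_UPLOAD  (1u << 1)   /* initiate table upload */
#define CS_SIZE_SHIFT    2u          /* 4-bit ring size field */
#define CS_SIZE_MASK     (0xFu << CS_SIZE_SHIFT)

/* Build a command/status word from its three parts. */
uint32_t cs_pack(int dma_enable, int table_upload, uint32_t size_log2)
{
    assert(size_log2 <= 15u);        /* must fit in four bits */
    uint32_t w = size_log2 << CS_SIZE_SHIFT;
    if (dma_enable)
        w |= CS_DMA_ENABLE;
    if (table_upload)
        w |= CS_TABLE_UPLOAD;
    return w;
}

/* Extract log2 of the ring size in pages. */
uint32_t cs_size_log2(uint32_t w)
{
    return (w & CS_SIZE_MASK) >> CS_SIZE_SHIFT;
}

/* Assumed number of addresses the driver writes to the cursor upload
 * register after setting the upload bit: one per ring page. */
uint32_t cs_table_entries(uint32_t w)
{
    return 1u << cs_size_log2(w);
}
```

With the size field left at zero, as in the initial rev, 
cs_table_entries() comes out as 1, matching the single-page case.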

This simple PIO interface gives complete control over a classic ring 
buffer DMA interface.  The ring buffer model is a popular choice on 
many types of cards these days.  In fact, I'll venture to say that if 
we don't use a ring buffer, certain kernel maintainers will throw 
tomatoes, and rightly so.

> I'd like to have a bus system that "just knows" my virtual
> addresses so that i just can tell you where my data starts,
> how much it has to read and where to store it.

This sounds like a whole bunch of hardware and a potential breeding 
place for bugs.

> Unfortunately, i'm too new in the field of hardware and driver
> design to make any good suggestions :(
>
> > And the question is:  For a fixed set, how many pages is enough?
>
> Infinite.
> Anything less will restrict the usability of the card.

I don't think so.

It helps to have an idea what actually goes through the command ring 
buffer.  This will predominantly (or even exclusively) be DMA setup 
commands.  It is easy to calculate how big the ring buffer needs to be, 
given the speed of the bus (the limitation for PCI) and the speed of 
the card (the limitation for PCI-e).
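
One possible way to run that calculation, as a sketch only: assume each 
command moves some minimum payload across the bus, so the bus rate 
bounds how fast commands are consumed, and size the ring to cover the 
time the host needs to refill it.  The model, the parameter names and 
the example numbers are all assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Ring bytes needed so the card never runs dry while the host is
 * away for latency_us microseconds.
 *   bus_bytes_per_sec: peak transfer rate of the bus
 *   payload_bytes:     smallest payload one command moves (assumed)
 *   cmd_bytes:         size of one command in the ring (assumed) */
uint64_t ring_bytes_needed(uint64_t bus_bytes_per_sec, uint64_t latency_us,
                           uint64_t payload_bytes, uint64_t cmd_bytes)
{
    assert(payload_bytes > 0);
    /* Commands consumed during the latency window, rounded up. */
    uint64_t num = bus_bytes_per_sec * latency_us;
    uint64_t den = payload_bytes * 1000000u;
    uint64_t commands = (num + den - 1u) / den;
    return commands * cmd_bytes;
}
```

For example, with a 133 MB/s peak rate (32-bit, 33 MHz PCI), 1 ms of 
host latency, 4096-byte payloads and 16-byte commands, this comes to 33 
commands, or 528 bytes, comfortably inside a single 4K page.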

> So, the only solution is to have some way of using a distributed
> buffer.

Which is exactly what the uploadable-table approach gives you.  The 
buffer does not have to be infinite.  A four-bit size field gives you 
up to 2**15 pages, which is 128 megabytes with 4K pages.  That should 
do.  Remember, geometry does not go through this buffer, it is mainly 
(or exclusively) for indirect DMA commands.
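
The arithmetic can be checked with a few lines of C.  The 4 KiB page 
size is an assumption (it is what makes 2**15 pages come to 128 
megabytes), and the function name is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES_ASSUMED 4096u   /* assumption: 4 KiB pages */

/* Ring size in bytes for a given value of the 4-bit size field,
 * which holds log2 of the ring size in pages. */
uint64_t ring_size_bytes(uint32_t size_log2)
{
    assert(size_log2 <= 15u);      /* largest value four bits can hold */
    return (uint64_t)PAGE_BYTES_ASSUMED << size_log2;
}
```

A field value of 0 gives the single page of the initial rev, and the 
maximum value of 15 gives 2**27 bytes.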

I should reiterate what we are avoiding with the proposed ring buffer 
interface: a full-blown and woolly scatter/gather DMA interface.
Scatter/gather would solve the problem, but in a far clunkier way.
Personally, I'd rather see all those gates allocated to something less 
mundane, like, say, the YUV->RGB converter ;-)

Regards,

Daniel