Re: Using bus_dma(9)

2009-04-24 Thread John Baldwin
On Thursday 23 April 2009 3:59:28 pm Peter Jeremy wrote:
> I'm currently trying to port some code that uses bus_dma(9) from
> OpenBSD to FreeBSD and am having some difficulties in following the
> way bus_dma is intended to be used on FreeBSD (and how it differs from
> Net/OpenBSD).  Other than the man page and existing FreeBSD drivers, I
> am unable to locate any information on bus_dma care and feeding.  Has
> anyone written any tutorial guide to using bus_dma?
> 
> The OpenBSD man page provides pseudo-code showing the basic cycle.
> Unfortunately, FreeBSD doesn't provide any similar pseudo-code and
> the functionality is distributed somewhat differently amongst the
> functions (and the drivers I've looked at tend to use a different
> order of calls).
> 
> So far, I've hit a number of issues that I'd like some advice on:
> 
> Firstly, the OpenBSD model only provides a single DMA tag for the
> device at attach() time, whereas FreeBSD provides the parent's DMA tag
> at attach time and allows the driver to create multiple tags.  Rather
> than just creating a single tag for a device, many drivers create a
> device tag which is only used as the parent for additional tags to
> handle receive, transmit etc.  Whilst the need for multiple tags is
> probably a consequence of moving much of the dmamap information from
> OpenBSD bus_dmamap_create() into FreeBSD bus_dma_tag_create(), the
> rationale behind multiple levels of tags is unclear.  Is this solely
> to provide a single point where overall device DMA characteristics &
> limitations can be specified or is there another reason?

Many drivers provide a parent "driver" tag specifically to have a single 
point, yes.

> Secondly, bus_dma_tag_create() supports a BUS_DMA_ALLOCNOW flag that
> "pre-allocates enough resources to handle at least one map load
> operation on this tag".  However it also states "[t]his should not be
> used for tags that only describe buffers that will be allocated with
> bus_dmamem_alloc()" - does this mean that only one of bus_dmamap_load()
> or bus_dmamem_alloc() should be used on a tag/mapping?  Or is the
> sense backwards (ie "don't specify BUS_DMA_ALLOCNOW for tags that are
> only used as the parent for other tags and never mapped themselves")?
> Or is there some other explanation?

What happens usually now is that each thing you want to pre-alloc memory
for using bus_dmamem_alloc() (such as descriptor rings) uses its own tag.
This is somewhat mandated by the fact that bus_dmamem_alloc() doesn't take
a size but gets the size to allocate from the tag.  So usually a NIC driver
will have 3 tags: 1 for the RX ring, 1 for packet data, and 1 for the TX
ring.  Some drivers have 2 tags for packet data, 1 for TX buffers and 1
for RX buffers.
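
To illustrate why each ring ends up with its own tag, here is a minimal userspace sketch, not kernel code.  Every toy_* name and the ring geometry are hypothetical stand-ins invented for this illustration; the only property borrowed from bus_dma(9) is that the allocation call takes no size argument, so the size has to come from the tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a DMA tag: only the one field this sketch needs. */
struct toy_tag {
	size_t maxsize;		/* total size a mapping on this tag covers */
};

/*
 * Toy stand-in for bus_dmamem_alloc().  There is no size argument, so
 * the allocation size can only come from the tag.  The len out-parameter
 * exists purely so the demo can inspect the result.
 * Returns 0 on success, -1 on failure.
 */
static int
toy_dmamem_alloc(const struct toy_tag *tag, void **vaddr, size_t *len)
{
	void *p = calloc(1, tag->maxsize);

	if (p == NULL)
		return (-1);
	*vaddr = p;
	*len = tag->maxsize;
	return (0);
}

/* Hypothetical ring geometry for an imaginary NIC. */
#define TOY_RX_DESCS	256
#define TOY_TX_DESCS	512
#define TOY_DESC_SIZE	16

struct toy_softc {
	struct toy_tag	rx_ring_tag;
	struct toy_tag	tx_ring_tag;
	void		*rx_ring;
	void		*tx_ring;
	size_t		rx_len;
	size_t		tx_len;
};

/* Two rings of different sizes therefore need two different tags. */
static int
toy_alloc_rings(struct toy_softc *sc)
{
	sc->rx_ring_tag.maxsize = TOY_RX_DESCS * TOY_DESC_SIZE;
	sc->tx_ring_tag.maxsize = TOY_TX_DESCS * TOY_DESC_SIZE;

	if (toy_dmamem_alloc(&sc->rx_ring_tag, &sc->rx_ring,
	    &sc->rx_len) != 0)
		return (-1);
	if (toy_dmamem_alloc(&sc->tx_ring_tag, &sc->tx_ring,
	    &sc->tx_len) != 0) {
		free(sc->rx_ring);
		sc->rx_ring = NULL;
		return (-1);
	}
	return (0);
}
```

In a real driver the same shape would presumably appear as one bus_dma_tag_create() call per ring, each followed by bus_dmamem_alloc() on that tag.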
 
> Thirdly, bus_dmamap_load() uses a callback function to return
> the actual mapping details.  According to the man page, there is no
> way to ensure that the callback occurs synchronously - a caller can
> only request that bus_dmamap_load() fail if resources are not
> immediately available.  Despite this, many drivers pass 0 for flags
> (allowing an asynchronous invocation of the callback) and then fail
> (and cleanup) if bus_dmamap_load() returns EINPROGRESS.  This appears
> to open a race condition where the callback and cleanup could occur
> simultaneously.  Mitigating the race condition seems to rely on one of
> the following two behaviours:
> 
> a) The system is implicitly single-threaded when bus_dmamap_load() is
> called (generally as part of the device attach() function).  Whilst
> this is true at boot time, it would not be true for a dynamically
> loaded module.
> 
> b) Passing BUS_DMA_ALLOCNOW to bus_dma_tag_create() guarantees that
> the first bus_dmamap_load() on that tag will be synchronous.  Is this
> true?  Whilst it appears to be implied, it's not explicitly stated.

That doesn't really guarantee that either as the pool of bounce pages can be 
shared across multiple tags.  I think what you might be missing is this:

c) bus_dmamap_load() of a map returned from bus_dmamem_alloc() will always 
succeed synchronously.

That is the only case other than BUS_DMA_NOWAIT where one can assume 
synchronous calls to the callback.  Also, some bus_dma calls basically 
assume BUS_DMA_NOWAIT, such as bus_dmamap_load_mbuf() and 
bus_dmamap_load_mbuf_sg().
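
The three outcomes discussed above (synchronous callback, immediate failure with NOWAIT, deferred callback) can be modelled in userspace roughly as follows.  This is a toy, not the kernel implementation: all toy_* names are hypothetical, and the model assumes for simplicity that every ordinary load needs exactly one bounce page from a shared pool, while a map marked as coming from the alloc path never needs one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NOWAIT	0x1	/* stands in for BUS_DMA_NOWAIT */

typedef void toy_callback_t(void *arg, unsigned long busaddr, int error);

struct toy_map {
	int		from_dmamem_alloc; /* alloc-path map: never bounces */
	toy_callback_t	*pending_cb;	   /* deferred request, if any */
	void		*pending_arg;
	unsigned long	pending_addr;
};

/* Bounce pages shared by every map, as with the shared pool above. */
static int toy_bounce_pages_free;

static int
toy_dmamap_load(struct toy_map *map, unsigned long addr,
    toy_callback_t *cb, void *arg, int flags)
{
	if (map->from_dmamem_alloc || toy_bounce_pages_free > 0) {
		if (!map->from_dmamem_alloc)
			toy_bounce_pages_free--;
		cb(arg, addr, 0);	/* synchronous callback */
		return (0);
	}
	if (flags & TOY_NOWAIT)
		return (ENOMEM);	/* callback never runs */

	/* Defer: the callback runs later, from another context. */
	map->pending_cb = cb;
	map->pending_arg = arg;
	map->pending_addr = addr;
	return (EINPROGRESS);
}

/* Models some other load finishing and returning a bounce page. */
static void
toy_bounce_page_returned(struct toy_map *map)
{
	toy_bounce_pages_free++;
	if (map->pending_cb != NULL) {
		toy_callback_t *cb = map->pending_cb;

		map->pending_cb = NULL;
		toy_bounce_pages_free--;
		cb(map->pending_arg, map->pending_addr, 0);
	}
}

struct toy_result {
	int		called;
	unsigned long	busaddr;
};

/* Typical callback shape: stash the bus address for the caller. */
static void
toy_save_addr(void *arg, unsigned long busaddr, int error)
{
	struct toy_result *r = arg;

	if (error != 0)
		return;
	r->called = 1;
	r->busaddr = busaddr;
}
```

The point of the model is the EINPROGRESS branch: a caller that treats that return value as a failure and frees its state is racing against the deferred callback, which still holds a pointer to that state.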

> Finally, what are the ordering requirements between the alloc, create,
> load and sync functions?  OpenBSD implies that the normal ordering is
> create, alloc, load, sync whilst several FreeBSD drivers use
> tag_create, alloc, load and then create.

FreeBSD uses the same ordering as OpenBSD.  I think you might be confused by 
the bus_dmamem_alloc() case.  There are basically two cases, the first is 
preallocating a block of RAM to use for a descriptor or command ring:

alloc_ring:
bus_dma_tag_create(..., &ring_tag);

/* Creates a map internally. */
bus_dmamem_alloc(ring_tag, &p, ..., &ring_map);

/* Will not fail wit

Using bus_dma(9)

2009-04-23 Thread Peter Jeremy
I'm currently trying to port some code that uses bus_dma(9) from
OpenBSD to FreeBSD and am having some difficulties in following the
way bus_dma is intended to be used on FreeBSD (and how it differs from
Net/OpenBSD).  Other than the man page and existing FreeBSD drivers, I
am unable to locate any information on bus_dma care and feeding.  Has
anyone written any tutorial guide to using bus_dma?

The OpenBSD man page provides pseudo-code showing the basic cycle.
Unfortunately, FreeBSD doesn't provide any similar pseudo-code and
the functionality is distributed somewhat differently amongst the
functions (and the drivers I've looked at tend to use a different
order of calls).

So far, I've hit a number of issues that I'd like some advice on:

Firstly, the OpenBSD model only provides a single DMA tag for the
device at attach() time, whereas FreeBSD provides the parent's DMA tag
at attach time and allows the driver to create multiple tags.  Rather
than just creating a single tag for a device, many drivers create a
device tag which is only used as the parent for additional tags to
handle receive, transmit etc.  Whilst the need for multiple tags is
probably a consequence of moving much of the dmamap information from
OpenBSD bus_dmamap_create() into FreeBSD bus_dma_tag_create(), the
rationale behind multiple levels of tags is unclear.  Is this solely
to provide a single point where overall device DMA characteristics &
limitations can be specified or is there another reason?

Secondly, bus_dma_tag_create() supports a BUS_DMA_ALLOCNOW flag that
"pre-allocates enough resources to handle at least one map load
operation on this tag".  However it also states "[t]his should not be
used for tags that only describe buffers that will be allocated with
bus_dmamem_alloc()" - does this mean that only one of bus_dmamap_load()
or bus_dmamem_alloc() should be used on a tag/mapping?  Or is the
sense backwards (ie "don't specify BUS_DMA_ALLOCNOW for tags that are
only used as the parent for other tags and never mapped themselves")?
Or is there some other explanation?

Thirdly, bus_dmamap_load() uses a callback function to return
the actual mapping details.  According to the man page, there is no
way to ensure that the callback occurs synchronously - a caller can
only request that bus_dmamap_load() fail if resources are not
immediately available.  Despite this, many drivers pass 0 for flags
(allowing an asynchronous invocation of the callback) and then fail
(and cleanup) if bus_dmamap_load() returns EINPROGRESS.  This appears
to open a race condition where the callback and cleanup could occur
simultaneously.  Mitigating the race condition seems to rely on one of
the following two behaviours:

a) The system is implicitly single-threaded when bus_dmamap_load() is
called (generally as part of the device attach() function).  Whilst
this is true at boot time, it would not be true for a dynamically
loaded module.

b) Passing BUS_DMA_ALLOCNOW to bus_dma_tag_create() guarantees that
the first bus_dmamap_load() on that tag will be synchronous.  Is this
true?  Whilst it appears to be implied, it's not explicitly stated.

Finally, what are the ordering requirements between the alloc, create,
load and sync functions?  OpenBSD implies that the normal ordering is
create, alloc, load, sync whilst several FreeBSD drivers use
tag_create, alloc, load and then create.

As a side-note, the manpage does not document the behaviour when
bus_dmamap_destroy() or bus_dma_tag_destroy() are called whilst a
bus_dmamap_load() callback is queued.  Is the callback cancelled
or do one or both destroy operations fail?

-- 
Peter Jeremy

