Re: Where to put new bus_dmamap_load_mbuf() code

2001-08-22 Thread Bill Paul


 Maybe, but bus_dmamap_load() only lets you map one buffer at a time.
 I want to map a bunch of little buffers, and the API doesn't let me
 do that. And I don't want to change the API, because that would mean
 modifying busdma_machdep.c on each platform, which is a hell that I
 would rather avoid.
 
 bus_dmamap_load() is only one part of the API.  bus_dmamap_load_mbuf
 and bus_dmamap_load_uio are also part of the API.  They just don't happen
 to be implemented yet. 8-)  Perhaps there should be an MD primitive
 that knows how to append to a mapping?  This would allow you to write
 an MI loop that does exactly what you want.

Any one of those ideas would be just fine. I eagerly await their
realization. :)
 
 It's a separate list. The driver is responsible for allocating the
 head of the list, then it hands it to bus_dmamap_list_alloc() along
 with the required dma tag. bus_dmamap_list_alloc() then calls
 bus_dmamap_create() to populate the list. The driver doesn't have
 to manipulate the list itself until the time comes to destroy it.
 
 Okay, but does this mean that bus_dmamap_load_mbuf no longer takes
 a dmamap?  Drivers may want to allocate/manage the dmamaps in a
 different way.

Yes, bus_dmamap_load_mbuf() accepts a dma tag, the head of the
dmamap list, an mbuf, a segment array and a segment count. The
driver allocates the segment array with a certain number of
members. It passes the array and segment count to bus_dmamap_load_mbuf(),
which treats the segment count as the maximum number of segments
that it can return to the caller. Once all the mappings have been
done, it updates the segment count to indicate how many segments
were actually needed. Then the driver transfers the info from
the segment array into its DMA descriptor structures and kicks
off the DMA operation.

Once the device signals the transfer is done, the driver calls
bus_dmamap_unload_mbuf() and bus_dmamap_destroy_mbuf() to unload
the maps and return them to the map list for later use. It isn't
until the driver calls bus_dmamap_list_destroy() that the dmamaps
are actually released and the list free()ed.
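
For concreteness, a driver's transmit path using these routines might look
roughly like the sketch below. This is only an illustration pieced together
from the description above: the exact prototypes of bus_dmamap_load_mbuf(),
bus_dmamap_unload_mbuf() and bus_dmamap_destroy_mbuf(), and the names
SF_MAXSEGS, sc->sf_dmat and sc->sf_maplist, are assumptions rather than the
actual code at the URL.

static int
sf_encap_sketch(struct sf_softc *sc, struct mbuf *m_head)
{
	bus_dma_segment_t segs[SF_MAXSEGS];	/* assumed per-driver limit */
	int nsegs, error, i;

	nsegs = SF_MAXSEGS;	/* in: most segments the descriptors can hold */
	error = bus_dmamap_load_mbuf(sc->sf_dmat, sc->sf_maplist, m_head,
	    segs, &nsegs);	/* out: segments actually used */
	if (error)
		return (error);
	for (i = 0; i < nsegs; i++) {
		/* copy segs[i].ds_addr / segs[i].ds_len into TX descriptors */
	}
	/* ... start the DMA; later, from the TX completion interrupt: ... */
	bus_dmamap_unload_mbuf(sc->sf_dmat, sc->sf_maplist, m_head);
	bus_dmamap_destroy_mbuf(sc->sf_dmat, sc->sf_maplist, m_head);
	return (0);
}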

-Bill

--
=
-Bill Paul(510) 749-2329 | Senior Engineer, Master of Unix-Fu
 [EMAIL PROTECTED] | Wind River Systems
=
I like zees guys. Zey are fonny guys. Just keel one of zem. -- The 3 Amigos
=




Re: Where to put new bus_dmamap_load_mbuf() code

2001-08-22 Thread Bill Paul

 My understanding is that you need a dmamap for every buffer that you want
 to map into bus space.
 
 You need one dmamap for each independently manageable mapping.  A
 single mapping may result in a long list of segments, regardless
 of whether you have a single KVA buffer or multiple KVA buffers
 that might contribute to the mapping.

Yes yes, I understand that. But that's only if you want to map
a buffer that's larger than PAGE_SIZE bytes, like, say, a 64K
buffer being sent to a disk controller. What I want to make sure
everyone understands here is that I'm not typically dealing with
buffers this large: instead I have lots of small buffers that are
smaller than PAGE_SIZE bytes. A single mbuf alone is only 256
bytes, of which only a fraction is used for data. An mbuf cluster
buffer is usually only 2048 bytes. Transmitted packets are typically
fragmented across 2 or 3 mbufs: the first mbuf contains the header,
and the other two contain data. (Or the first one contains part
of the header, the second one contains additional header data,
and the third contains data -- whatever.) At most I will have 1500
bytes of data to send, which is less than PAGE_SIZE, and that 1500
bytes will be fragmented across a bunch of smaller buffers that
are also smaller than PAGE_SIZE. Therefore I will not have one
dmamap with multiple segments: I will have a bunch of dmamaps
with one segment each.

(I can hear somebody out there saying: What about jumbo frames?
Yes, with jumbo frames, I will have 9K buffers to deal with, and
in that case, you could have one dmamap with several segments, and
I am taking this into account with the updated code I've written.)

 So unless I'm mistaken, for each mbuf in an mbuf list, what we
 have to do is this:
 
 - create a bus_dmamap_t for the data area in the mbuf using
   bus_dmamap_create()
 
 Creating a dmamap, depending on the architecture, could be expensive.
 You really want to create them in advance (or pool them), with at most
 one dmamap per concurrent transaction you support in your driver.

The only problem here is that I can't really predict how many transactions
will be going at one time. I will have at least RX_DMA_RING maps (one for
each mbuf in the RX DMA ring), and some fraction of TX_DMA_RING maps.
I could have the TX DMA ring completely filled with packets waiting
to be DMA'ed and transmitted, or I may have only one entry in the ring
currently in use. So I guess I have to allocate RX_DMA_RING + TX_DMA_RING
dmamaps in order to be safe.
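
For example (a sketch only: RX_DMA_RING, TX_DMA_RING, the softc fields and
the cleanup label are placeholder names), the attach routine could create
those maps up front:

/* Sketch: in the attach routine, preallocate one dmamap per ring slot. */
for (i = 0; i < RX_DMA_RING; i++) {
	error = bus_dmamap_create(sc->sf_dmat, 0, &sc->sf_rx_maps[i]);
	if (error)
		goto fail;		/* hypothetical cleanup label */
}
for (i = 0; i < TX_DMA_RING; i++) {
	error = bus_dmamap_create(sc->sf_dmat, 0, &sc->sf_tx_maps[i]);
	if (error)
		goto fail;
}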

 - do the physical to bus mapping with bus_dmamap_load()
 
 bus_dmamap_load() only understands how to map a single buffer.
 You will have to pull pieces of bus_dmamap_load into a new
 function (or create inlines for common bits) to do this
 correctly.  The algorithm goes something like this:
 
   foreach mbuf in the mbuf chain to load
       /*
        * Parse this contiguous piece of KVA into
        * its bus space regions.
        */
       foreach bus space discontiguous region
           if (too_many_segs)
               return (error);
           Add new S/G element
 
 With the added complications of deferring the mapping if we're
 out of space, issuing the callback, etc.
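
For illustration, the MI loop described above might look something like the
following on a platform where pmap_kextract() returns the physical address of
a kernel virtual address. This is only a sketch: it ignores bounce buffers,
deferred callbacks and per-tag boundary/alignment restrictions, and it does
not coalesce physically contiguous pieces.

/*
 * Sketch only: walk an mbuf list and fill in a segment array,
 * splitting each mbuf's data area at page boundaries.
 */
static int
sketch_load_mbuf(struct mbuf *m0, bus_dma_segment_t *segs, int *nsegs)
{
	struct mbuf *m;
	vm_offset_t vaddr;
	bus_size_t len, plen;
	int seg = 0;

	for (m = m0; m != NULL; m = m->m_next) {
		vaddr = mtod(m, vm_offset_t);
		len = m->m_len;
		while (len > 0) {
			/* Bytes left on the current page. */
			plen = PAGE_SIZE - (vaddr & PAGE_MASK);
			if (plen > len)
				plen = len;
			if (seg >= *nsegs)
				return (EFBIG);
			segs[seg].ds_addr = pmap_kextract(vaddr);
			segs[seg].ds_len = plen;
			seg++;
			vaddr += plen;
			len -= plen;
		}
	}
	*nsegs = seg;
	return (0);
}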

Why can't I just call bus_dmamap_load() multiple times, once for
each mbuf in the mbuf list?

(Note: for the record, an mbuf list usually contains one packet
fragmented across multiple mbufs. An mbuf chain contains several
mbuf lists, linked together via the m_nextpkt pointer in the
header of the first mbuf in each list. By the time we get to
the device driver, we always have mbuf lists only.)
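
In code terms (process_packet_fragment() is a hypothetical per-fragment
handler, used only to show the two linkages):

/* Sketch: m_nextpkt links packets, m_next links the fragments of one packet. */
static void
walk_mbuf_chain(struct mbuf *chain)
{
	struct mbuf *pkt, *m;

	for (pkt = chain; pkt != NULL; pkt = pkt->m_nextpkt)	/* each packet */
		for (m = pkt; m != NULL; m = m->m_next)		/* each fragment */
			process_packet_fragment(pkt, m);
}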

 Chances are you are going to use the map again soon, so destroying
 it on every transaction is a waste.

Ok, I spent some more time on this. I updated the code at:

http://www.freebsd.org/~wpaul/busdma

The changes are:

- Tried to account for the case where an mbuf data region is larger
  than a page, i.e. when we have an mbuf with a 9K external buffer
  attached for use as a jumbo ethernet frame.
- Added routines to allocate a chunk of maps in a singly linked list,
  from which the other routines can grab them as needed. The driver
  attach routine calls bus_dmamap_list_init() with the max number of
  dmamaps that it will need, then the detach routine calls
  bus_dmamap_list_destroy() to nuke them when the driver is unloaded.
  The bus_dmamap_load_mbuf() routine uses the pre-allocated dmamaps
  from the list, and bus_dmamap_unload_mbuf()/bus_dmamap_destroy_mbuf()
  return them to the list when the transaction is completed (see the
  sketch after this list).
- Updated the modified if_sf driver to use the new code.
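
A minimal sketch of what that map list might look like follows; the structure
layout and everything beyond the bus_dmamap_list_init() name mentioned above
are assumptions, not the actual code at the URL.

/* Sketch: a singly linked free list of preallocated dmamaps. */
struct dmamap_list_entry {
	bus_dmamap_t			map;
	SLIST_ENTRY(dmamap_list_entry)	link;
};
SLIST_HEAD(dmamap_list, dmamap_list_entry);

static int
bus_dmamap_list_init(bus_dma_tag_t tag, struct dmamap_list *head, int nmaps)
{
	struct dmamap_list_entry *e;
	int i, error;

	SLIST_INIT(head);
	for (i = 0; i < nmaps; i++) {
		e = malloc(sizeof(*e), M_DEVBUF, M_NOWAIT);
		if (e == NULL)
			return (ENOMEM);
		error = bus_dmamap_create(tag, 0, &e->map);
		if (error) {
			free(e, M_DEVBUF);
			return (error);
		}
		SLIST_INSERT_HEAD(head, e, link);
	}
	return (0);
}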

Again, I've got this code running on the test box in the lab, so it's
correct inasmuch as it compiles and runs, even though it may not be
aesthetically pleasing.

-Bill 

--
=
-Bill Paul(510) 749-2329 | Senior Engineer, Master of Unix-Fu
 

Re: Where to put new bus_dmamap_load_mbuf() code

2001-08-22 Thread Justin T. Gibbs

 My understanding is that you need a dmamap for every buffer that you want
 to map into bus space.
 
 You need one dmamap for each independently manageable mapping.  A
 single mapping may result in a long list of segments, regardless
 of whether you have a single KVA buffer or multiple KVA buffers
 that might contribute to the mapping.

Yes yes, I understand that. But that's only if you want to map
a buffer that's larger than PAGE_SIZE bytes, like, say, a 64K
buffer being sent to a disk controller. What I want to make sure
everyone understands here is that I'm not typically dealing with
buffers this large: instead I have lots of small buffers that are
smaller than PAGE_SIZE bytes. A single mbuf alone is only 256
bytes, of which only a fraction is used for data. An mbuf cluster
buffer is usually only 2048 bytes. Transmitted packets are typically
fragmented across 2 or 3 mbufs: the first mbuf contains the header,
and the other two contain data. (Or the first one contains part
of the header, the second one contains additional header data,
and the third contains data -- whatever.) At most I will have 1500
bytes of data to send, which is less than PAGE_SIZE, and that 1500
bytes will be fragmented across a bunch of smaller buffers that
are also smaller than PAGE_SIZE. Therefore I will not have one
dmamap with multiple segments: I will have a bunch of dmamaps
with one segment each.

The fact that the data is less than a page in size matters little
to the bus dma concept.  In other words, how is this packet presented
to the hardware?  Does it care that all of the component pieces are
< PAGE_SIZE in length?  Probably not.  It just wants the list of
address/length pairs that compose that packet and there is no reason
that each chunk needs to have its own, and potentially expensive, dmamap.

 Creating a dmamap, depending on the architecture, could be expensive.
 You really want to create them in advance (or pool them), with at most
 one dmamap per concurrent transaction you support in your driver.

The only problem here is that I can't really predict how many transactions
will be going at one time. I will have at least RX_DMA_RING maps (one for
each mbuf in the RX DMA ring), and some fraction of TX_DMA_RING maps.
I could have the TX DMA ring completely filled with packets waiting
to be DMA'ed and transmitted, or I may have only one entry in the ring
currently in use. So I guess I have to allocate RX_DMA_RING + TX_DMA_RING
dmamaps in order to be safe.

Yes or allocate them in chunks so that the total amount is only as large
as the greatest demand your driver has ever seen.

 With the added complications of deferring the mapping if we're
 out of space, issuing the callback, etc.

Why can't I just call bus_dmamap_load() multiple times, once for
each mbuf in the mbuf list?

Due to the cost of the dmamaps, which is platform and bus-dma
implementation dependent - e.g. it could be a 1-1 mapping to
a hardware resource.  Consider the case of having a full TX and RX
ring in your driver.  Instead of #TX*#RX dmamaps, you will now have
three or more times that number.

There is also the issue of coalescing the discontiguous chunks if
there are too many chunks for your driver to handle.  Bus dma is
supposed to handle that for you (the x86 implementation doesn't
yet, but it should) but it can't if it doesn't understand the segment
limit per transaction.  You've hidden that from bus dma by using a
map per segment.

(Note: for the record, an mbuf list usually contains one packet
fragmented across multiple mbufs. An mbuf chain contains several
mbuf lists, linked together via the m_nextpkt pointer in the
header of the first mbuf in each list. By the time we get to
the device driver, we always have mbuf lists only.)

Okay, so I haven't written a network driver yet, but you got the idea,
right? 8-)

 Chances are you are going to use the map again soon, so destroying
 it on every transaction is a waste.

Ok, I spent some more time on this. I updated the code at:

http://www.freebsd.org/~wpaul/busdma

I'll take a look.

The changes are:

...

- Added routines to allocate a chunk of maps in a singly linked list,
  from which the other routines can grab them as needed.

Are these hung off the dma tag or something?  dmamaps may hold settings
that are peculiar to the device that allocated them, so they cannot
be shared with other clients of bus_dmamap_load_mbuf.

--
Justin




Re: Where to put new bus_dmamap_load_mbuf() code

2001-08-22 Thread Bill Paul


 The fact that the data is less than a page in size matters little
 to the bus dma concept.  In other words, how is this packet presented
 to the hardware?  Does it care that all of the component pieces are
 < PAGE_SIZE in length?  Probably not.  It just wants the list of
 address/length pairs that compose that packet and there is no reason
 that each chunk needs to have its own, and potentially expensive, dmamap.

Maybe, but bus_dmamap_load() only lets you map one buffer at a time.
I want to map a bunch of little buffers, and the API doesn't let me
do that. And I don't want to change the API, because that would mean
modifying busdma_machdep.c on each platform, which is a hell that I
would rather avoid.

 Why can't I just call bus_dmamap_load() multiple times, once for
 each mbuf in the mbuf list?
 
 Due to the cost of the dmamaps, the cost of which is platform and
 bus-dma implementation dependent - e.g. could be a 1-1 mapping to
 a hardware resource.  Consider the case of having a full TX and RX
 ring in your driver.  Instead of #TX*#RX dmamaps, you will now have
 three or more times that number.
 
 There is also the issue of coalescing the discontiguous chunks if
 there are too many chunks for your driver to handle.  Bus dma is
 supposed to handle that for you (the x86 implementation doesn't
 yet, but it should) but it can't if it doesn't understand the segment
 limit per transaction.  You've hidden that from bus dma by using a
 map per segment.

Ok, a slightly different question: what happens if I call
bus_dmamap_load() more than once with different buffers but with
the same dmamap?

 (Note: for the record, an mbuf list usually contains one packet
 fragmented across multiple mbufs. An mbuf chain contains several
 mbuf lists, linked together via the m_nextpkt pointer in the
 header of the first mbuf in each list. By the time we get to
 the device driver, we always have mbuf lists only.)
 
 Okay, so I haven't written a network driver yet, but you got the idea,
 right? 8-)

Just don't get 3c509 and 3c905 mixed up and we'll be fine. :)

 - Added routines to allocate a chunk of maps in a singly linked list,
   from which the other routines can grab them as needed.
 
 Are these hung off the dma tag or something?  dmamaps may hold settings
 that are peculiar to the device that allocated them, so they cannot
 be shared with other clients of bus_dmamap_load_mbuf.

It's a separate list. The driver is responsible for allocating the
head of the list, then it hands it to bus_dmamap_list_alloc() along
with the required dma tag. bus_dmamap_list_alloc() then calls
bus_dmamap_create() to populate the list. The driver doesn't have
to manipulate the list itself until the time comes to destroy it.

-Bill

--
=
-Bill Paul(510) 749-2329 | Senior Engineer, Master of Unix-Fu
 [EMAIL PROTECTED] | Wind River Systems
=
I like zees guys. Zey are fonny guys. Just keel one of zem. -- The 3 Amigos
=




Re: Where to put new bus_dmamap_load_mbuf() code

2001-08-22 Thread Justin T. Gibbs


 The fact that the data is less than a page in size matters little
 to the bus dma concept.  In other words, how is this packet presented
 to the hardware?  Does it care that all of the component pieces are
 < PAGE_SIZE in length?  Probably not.  It just wants the list of
 address/length pairs that compose that packet and there is no reason
 that each chunk needs to have its own, and potentially expensive, dmamap.

Maybe, but bus_dmamap_load() only lets you map one buffer at a time.
I want to map a bunch of little buffers, and the API doesn't let me
do that. And I don't want to change the API, because that would mean
modifying busdma_machdep.c on each platform, which is a hell that I
would rather avoid.

bus_dmamap_load() is only one part of the API.  bus_dmamap_load_mbuf
and bus_dmamap_load_uio are also part of the API.  They just don't happen
to be implemented yet. 8-)  Perhaps there should be an MD primitive
that knows how to append to a mapping?  This would allow you to write
an MI loop that does exactly what you want.

 there are too many chunks for your driver to handle.  Bus dma is
 supposed to handle that for you (the x86 implementation doesn't
 yet, but it should) but it can't if it doesn't understand the segment
 limit per transaction.  You've hidden that from bus dma by using a
 map per segment.

Ok, a slightly different question: what happens if I call
bus_dmamap_load() more than once with different buffers but with
the same dmamap?

The behavior is undefined.

 - Added routines to allocate a chunk of maps in a singly linked list,
   from which the other routines can grab them as needed.
 
 Are these hung off the dma tag or something?  dmamaps may hold settings
that are peculiar to the device that allocated them, so they cannot
 be shared with other clients of bus_dmamap_load_mbuf.

It's a separate list. The driver is responsible for allocating the
head of the list, then it hands it to bus_dmamap_list_alloc() along
with the required dma tag. bus_dmamap_list_alloc() then calls
bus_dmamap_create() to populate the list. The driver doesn't have
to manipulate the list itself until the time comes to destroy it.

Okay, but does this mean that bus_dmamap_load_mbuf no longer takes
a dmamap?  Drivers may want to allocate/manage the dmamaps in a
different way.

--
Justin




Re: Where to put new bus_dmamap_load_mbuf() code

2001-08-21 Thread Justin T. Gibbs


Correction.

This sample:

 if (bus_dma_tag_create(pci->parent_dmat, PAGE_SIZE, lim,
     BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL, len, 1,
     BUS_SPACE_MAXSIZE_32BIT, 0, &pci->cntrol_dmat) != 0) {
     isp_prt(isp, ISP_LOGERR,
         "cannot create a dma tag for control spaces");
     free(isp->isp_xflist, M_DEVBUF);
     free(pci->dmaps, M_DEVBUF);
     return (1);
 }


You'll need to change the number of segments to match the max
supported by the card (or the max you will ever need).  This
example made me realize that the bounce code doesn't deal with
multiple segments being copied into a single page (i.e. tracking
and using remaining free space in a page already allocated for
bouncing for a single map).  I'll have to break loose some time
to fix that.

--
Justin




Re: Where to put new bus_dmamap_load_mbuf() code

2001-08-20 Thread Matthew Jacob


Yay!

The current suggestion is fine except that each platform might have a more
efficient, or even required, actual h/w mechanism for mapping mbufs.

I'd also be a little concerned with the way you're overloading stuff into mbuf
itself- but I'm a little shakier on this.

Finally- why not make this an inline?

-matt


On Mon, 20 Aug 2001, Bill Paul wrote:

 Okay, I decided today to write a bus_dmamap_load_mbuf() routine to
 make it a little easier to convert the PCI NIC drivers to use the
 busdma API. It's not the same as the NetBSD code. There are four
 new functions:
 
 bus_dmamap_load_mbuf()
 bus_dmamap_unload_mbuf()
 bus_dmamap_sync_mbuf()
 bus_dmamap_destroy_mbuf()
 
 This is more or less in keeping with the existing API, except the new
 routines work exclusively on mbuf lists. The thing I need to figure
 out now is where to put the code. The current suggestion from jhb is
 to create the following two new files:
 
 sys/kern/kern_busdma.c
 sys/sys/busdma.h
 
 The functions are machine-independent, so they shouldn't be in
 sys/arch/arch/busdma_machdep.c. I mean, they could go there, but
 that would just result in code duplication. If somebody has a better
 suggestion, now's the time to speak up. Please let's avoid creating
 another bikeshed over this.
 
 Current code snapshot resides at:
 
 http://www.freebsd.org/~wpaul/busdma
 
 There's also a modified version of the Adaptec starfire driver there
 which uses the new routines. I'm running this version of the driver on
 a test box in the lab right now.
 
 -Bill
 
 --
 =
 -Bill Paul(510) 749-2329 | Senior Engineer, Master of Unix-Fu
  [EMAIL PROTECTED] | Wind River Systems
 =
 I like zees guys. Zey are fonny guys. Just keel one of zem. -- The 3 Amigos
 =
 
 





Re: Where to put new bus_dmamap_load_mbuf() code

2001-08-20 Thread Matthew Jacob


Another thing- maybe I'm confused- but I still don't see why you want to
require the creating of a map each time you want to load an mbuf
chain. Wouldn't it be better and more efficient to let the driver decide when
and where the map is created and just use the common code for loads/unloads?

On Mon, 20 Aug 2001, Matthew Jacob wrote:

 
 Yay!
 
 The current suggestion is fine except that each platform might have a more
 efficient, or even required, actual h/w mechanism for mapping mbufs.
 
 I'd also be a little concerned with the way you're overloading stuff into mbuf
 itself- but I'm a little shakier on this.
 
 Finally- why not make this an inline?
 
 -matt
 
 
 On Mon, 20 Aug 2001, Bill Paul wrote:
 
  Okay, I decided today to write a bus_dmamap_load_mbuf() routine to
  make it a little easier to convert the PCI NIC drivers to use the
  busdma API. It's not the same as the NetBSD code. There are four
  new functions:
  
  bus_dmamap_load_mbuf()
  bus_dmamap_unload_mbuf()
  bus_dmamap_sync_mbuf()
  bus_dmamap_destroy_mbuf()
  
  This is more or less in keeping with the existing API, except the new
  routines work exclusively on mbuf lists. The thing I need to figure
  out now is where to put the code. The current suggestion from jhb is
  to create the following two new files:
  
  sys/kern/kern_busdma.c
  sys/sys/busdma.h
  
  The functions are machine-independent, so they shouldn't be in
  sys/arch/arch/busdma_machdep.c. I mean, they could go there, but
  that would just result in code duplication. If somebody has a better
  suggestion, now's the time to speak up. Please let's avoid creating
  another bikeshed over this.
  
  Current code snapshot resides at:
  
  http://www.freebsd.org/~wpaul/busdma
  
  There's also a modified version of the Adaptec starfire driver there
  which uses the new routines. I'm running this version of the driver on
  a test box in the lab right now.
  
  -Bill
  
  --
  =
  -Bill Paul(510) 749-2329 | Senior Engineer, Master of Unix-Fu
   [EMAIL PROTECTED] | Wind River Systems
  =
  I like zees guys. Zey are fonny guys. Just keel one of zem. -- The 3 Amigos
  =
  
  
 
 





Re: Where to put new bus_dmamap_load_mbuf() code

2001-08-20 Thread Bill Paul

 
 Another thing- maybe I'm confused- but I still don't see why you want to
 require the creating of a map each time you want to load an mbuf
 chain. Wouldn't it be better and more efficient to let the driver decide when
 and where the map is created and just use the common code for loads/unloads?

Ever hear the phrase "you get what you pay for"? The API isn't all that
clear, and we don't have a man page or document that describes in detail
how to use it properly. Rather than whining about that, I decided to
tinker with it and Use The Source, Luke (tm). This is the result.

My understanding is that you need a dmamap for every buffer that you want
to map into bus space. Each mbuf has a single data buffer associated with
it (either the data area in the mbuf itself, or external storage). We're
not allowed to make assumptions about where these buffers are. Also, a
single ethernet frame can be fragmented across multiple mbufs in a list.

So unless I'm mistaken, for each mbuf in an mbuf list, what we
have to do is this:

- create a bus_dmamap_t for the data area in the mbuf using
  bus_dmamap_create()
- do the physical to bus mapping with bus_dmamap_load()
- call bus_dmamap_sync() as needed (might handle copying if bounce
  buffers are required)
- insert mysterious DMA operation here
- do post-DMA sync as needed (again, might require bounce copying)
- call bus_dmamap_unload() to un-do the bus mapping (which might free
  bounce buffers if some were allocated by bus_dmamap_load())
- destroy the bus_dmamap_t

One memory region, one DMA map. It seems to me that you can't use a
single dmamap for multiple memory buffers, unless you make certain
assumptions about where in physical memory those buffers reside, and
I thought the idea of busdma was to provide a consistent, opaque API
so that you would not have to make any assumptions.
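
For concreteness, the per-mbuf sequence above would look roughly like this
(a sketch only: it assumes the load completes immediately rather than being
deferred, records a single segment, and leaves out the actual DMA and most
error handling):

/* Sketch of the naive per-mbuf create/load/sync/unload/destroy sequence. */
static void
one_seg_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{
	if (error == 0)
		*(bus_dma_segment_t *)arg = segs[0];	/* record the segment */
}

static int
map_one_mbuf(bus_dma_tag_t tag, struct mbuf *m, bus_dma_segment_t *seg)
{
	bus_dmamap_t map;
	int error;

	error = bus_dmamap_create(tag, 0, &map);
	if (error)
		return (error);
	error = bus_dmamap_load(tag, map, mtod(m, void *), m->m_len,
	    one_seg_cb, seg, 0);
	if (error == 0) {
		bus_dmamap_sync(tag, map, BUS_DMASYNC_PREWRITE);
		/* ... insert mysterious DMA operation here ... */
		bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTWRITE);
		bus_dmamap_unload(tag, map);
	}
	bus_dmamap_destroy(tag, map);
	return (error);
}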

Now if I've gotten any of this wrong, please tell me how I should be
doing it. Remember to show all work. I don't give partial credit, nor
do I grade on a curve.

  Yay!
  
  The current suggestion is fine except that each platform might have a more
  efficient, or even required, actual h/w mechanism for mapping mbufs.

It might, but right now, it doesn't. All I have to work with is the
existing API. I'm not here to stick my fingers in it and change it all
around. I just want to add a bit of code on top of it so that I don't
have to go through quite so many contortions when I use the API in
network adapter drivers.
 
  I'd also be a little concerned with the way you're overloading stuff into mbuf
  itself- but I'm a little shakier on this.

I thought about this. Like it says in the comments, at the device driver
level, you're almost never going to be using some of the pointers in the
mbuf header. On the RX side, *we* (i.e. the driver) are allocating the
mbufs, so we can do whatever the heck we want with them until such time
as we hand them off to ether_input(), and by then we will have put things
back the way they were. For the TX side, by the time we get the mbufs
off the send queue, we always know we're going to have just an mbuf list
(and not an mbuf chain), and we're going to toss the mbufs once we're done
with them, so we can trample on certain things that we know don't matter
to the OS or network stack anymore.

The alternatives are:

- Allocate some extra space in the DMA descriptor structures for the
  necessary bus_dmamap_t pointers. This is tricky with this particular
  NIC, and a little awkward.
- Allocate my own private arrays of bus_dmamap_t that mirror the DMA
  rings. This is yet more memory I need to allocate and free at device
  attach and detach time.

I've got space in the mbuf header. It's not being used. It's right
where I need it. Why not take advantage of it?
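
Purely to illustrate the idea (this is not necessarily how the code at the
URL does it, and whether m_nextpkt or some other header field is safe to
borrow depends on the driver and the point in the packet's life cycle):

/*
 * Illustration only: while a TX mbuf sits on the driver's ring, stash the
 * dmamap in a header field the driver knows it no longer needs.
 */
static __inline void
stash_txmap(struct mbuf *m, bus_dmamap_t map)
{
	m->m_nextpkt = (struct mbuf *)(void *)map;
}

static __inline bus_dmamap_t
fetch_txmap(struct mbuf *m)
{
	return ((bus_dmamap_t)(void *)m->m_nextpkt);
}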

  Finally- why not make this an inline?

Er... because that idea offended my delicate sensibilities? :)

-Bill

--
=
-Bill Paul(510) 749-2329 | Senior Engineer, Master of Unix-Fu
 [EMAIL PROTECTED] | Wind River Systems
=
I like zees guys. Zey are fonny guys. Just keel one of zem. -- The 3 Amigos
=




Re: Where to put new bus_dmamap_load_mbuf() code

2001-08-20 Thread Matthew Jacob



On Mon, 20 Aug 2001, Bill Paul wrote:

 Ever hear the phrase "you get what you pay for"? The API isn't all that
 clear, and we don't have a man page or document that describes in detail
 how to use it properly. Rather than whining about that, I decided to
 tinker with it and Use The Source, Luke (tm). This is the result.

Well, I'm more familiar with the NetBSD BusDma code, which is similar, and
heavily documented. I'm also one of the principal authors of the Solaris
DKI/DDI, which is *also* heavily documented, so I have some small notion of
how a few of these subsystems are supposed to work, and where documentation
exists for these- and similar systems (e.g., UDI).


 My understanding is that you need a dmamap for every buffer that you want
 to map into bus space. Each mbuf has a single data buffer associated with
 it (either the data area in the mbuf itself, or external storage). We're
 not allowed to make assumptions about where these buffers are. Also, a
 single ethernet frame can be fragmented across multiple mbufs in a list.

 So unless I'm mistaken, for each mbuf in an mbuf list, what we
 have to do is this:

 - create a bus_dmamap_t for the data area in the mbuf using
   bus_dmamap_create()
 - do the physical to bus mapping with bus_dmamap_load()
 - call bus_dmamap_sync() as needed (might handle copying if bounce
   buffers are required)
 - insert mysterious DMA operation here
 - do post-DMA sync as needed (again, might require bounce copying)
 - call bus_dmamap_unload() to un-do the bus mapping (which might free
   bounce buffers if some were allocated by bus_dmamap_load())
 - destroy the bus_dmamap_t

 One memory region, one DMA map. It seems to me that you can't use a
 single dmamap for multiple memory buffers, unless you make certain
 assumptions about where in physical memory those buffers reside, and
 I thought the idea of busdma was to provide a consistent, opaque API
 so that you would not have to make any assumptions.

 Now if I've gotten any of this wrong, please tell me how I should be
 doing it. Remember to show all work. I don't give partial credit, nor
 do I grade on a curve.


This is fine insofar as it goes, but there's nothing, I believe, that requires
you to *create* a bus_dmamap_t each time you wish to map something and then
destroy it when you unmap something. You might ask why one actually has the
separate step from map creation and map load at all then. All the rest of the
stuff for load/sync/sync/unload is fine.

Using The Code (tm)- you can see that, for example, you can create
a tag that describes all of the addressable space your device
can access, e.g.:

if (bus_dma_tag_create(pci->parent_dmat, PAGE_SIZE, lim,
    BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL, len, 1,
    BUS_SPACE_MAXSIZE_32BIT, 0, &pci->cntrol_dmat) != 0) {
	isp_prt(isp, ISP_LOGERR,
	    "cannot create a dma tag for control spaces");
	free(isp->isp_xflist, M_DEVBUF);
	free(pci->dmaps, M_DEVBUF);
	return (1);
}

Then, for each possible transaction slot- if you have a device that has a
fixed number of transactions that are possible (as many do), you can create
maps ahead of time:

for (i = 0; i < isp->isp_maxcmds; i++) {
	error = bus_dmamap_create(pci->parent_dmat, 0, &pci->dmaps[i]);
	if (error) {
...

so that for each transaction that needs to be mapped, you can dma load it:

bus_dmamap_t *dp;
...
dp = &pci->dmaps[isp_handle_index(rq->req_handle)];
...
s = splsoftvm();
error = bus_dmamap_load(pci->parent_dmat, *dp,
    csio->data_ptr, csio->dxfer_len, eptr, mp, 0);
...

which as part of the load process can sync it:

dp = &pci->dmaps[isp_handle_index(rq->req_handle)];

if ((csio->ccb_h.flags & CAM_DIR_MASK) == CAM_DIR_IN) {
	bus_dmamap_sync(pci->parent_dmat, *dp, BUS_DMASYNC_PREREAD);
} else {
	bus_dmamap_sync(pci->parent_dmat, *dp, BUS_DMASYNC_PREWRITE);
}

and when the transaction is done, you can sync and unload:

static void
isp_pci_dmateardown(struct ispsoftc *isp, XS_T *xs, u_int16_t handle)
{
	struct isp_pcisoftc *pci = (struct isp_pcisoftc *)isp;
	bus_dmamap_t *dp = &pci->dmaps[isp_handle_index(handle)];
	if ((xs->ccb_h.flags & CAM_DIR_MASK) == CAM_DIR_IN) {
		bus_dmamap_sync(pci->parent_dmat, *dp, BUS_DMASYNC_POSTREAD);
	} else {
		bus_dmamap_sync(pci->parent_dmat, *dp, BUS_DMASYNC_POSTWRITE);
	}
	bus_dmamap_unload(pci->parent_dmat, *dp);
}




So- my question still stands- from a performance point of view (Networking
people *do* care about performance I believe, yes? :-))- if you don't need to
create the map each time, wouldn't you rather not? So, the mbuf mapping code,
which is cool to have, really might not need this?

   The current suggestion is 

Re: Where to put new bus_dmamap_load_mbuf() code

2001-08-20 Thread Matthew Jacob


Correction.

This sample:

 if (bus_dma_tag_create(pci->parent_dmat, PAGE_SIZE, lim,
     BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL, len, 1,
     BUS_SPACE_MAXSIZE_32BIT, 0, &pci->cntrol_dmat) != 0) {
     isp_prt(isp, ISP_LOGERR,
         "cannot create a dma tag for control spaces");
     free(isp->isp_xflist, M_DEVBUF);
     free(pci->dmaps, M_DEVBUF);
     return (1);
 }


Should have been:

if (bus_dma_tag_create(NULL, 1, 0, BUS_SPACE_MAXADDR_32BIT,
    BUS_SPACE_MAXADDR, NULL, NULL, lim + 1,
    255, lim, 0, &pcs->parent_dmat) != 0) {
	device_printf(dev, "could not create master dma tag\n");
	free(isp->isp_param, M_DEVBUF);
	free(pcs, M_DEVBUF);
	return (ENXIO);
}






Re: Where to put new bus_dmamap_load_mbuf() code

2001-08-20 Thread Justin T. Gibbs

Ever hear the phrase "you get what you pay for"? The API isn't all that
clear, and we don't have a man page or document that describes in detail
how to use it properly. Rather than whining about that, I decided to
tinker with it and Use The Source, Luke (tm). This is the result.

Fair enough.

My understanding is that you need a dmamap for every buffer that you want
to map into bus space.

You need one dmamap for each independently manageable mapping.  A
single mapping may result in a long list of segments, regardless
of whether you have a single KVA buffer or multiple KVA buffers
that might contribute to the mapping.

Each mbuf has a single data buffer associated with
it (either the data area in the mbuf itself, or external storage). We're
not allowed to make assumptions about where these buffers are. Also, a
single ethernet frame can be fragmented across multiple mbufs in a list.

So unless I'm mistaken, for each mbuf in an mbuf list, what we
have to do is this:

- create a bus_dmamap_t for the data area in the mbuf using
  bus_dmamap_create()

Creating a dmamap, depending on the architecture, could be expensive.
You really want to create them in advance (or pool them), with at most
one dmamap per concurrent transaction you support in your driver.

- do the physical to bus mapping with bus_dmamap_load()

bus_dmamap_load() only understands how to map a single buffer.
You will have to pull pieces of bus_dmamap_load into a new
function (or create inlines for common bits) to do this
correctly.  The algorithm goes something like this:

foreach mbuf in the mbuf chain to load
    /*
     * Parse this contiguous piece of KVA into
     * its bus space regions.
     */
    foreach bus space discontiguous region
        if (too_many_segs)
            return (error);
        Add new S/G element

With the added complications of deferring the mapping if we're
out of space, issuing the callback, etc.

- call bus_dmamap_sync() as needed (might handle copying if bounce
  buffers are required)
- insert mysterious DMA operation here
- do post-DMA sync as needed (again, might require bounce copying)
- call bus_dmamap_unload() to un-do the bus mapping (which might free
  bounce buffers if some were allocated by bus_dmamap_load())
- destroy the bus_dmamap_t

Chances are you are going to use the map again soon, so destroying
it on every transaction is a waste.

--
Justin
