Re: Where to put new bus_dmamap_load_mbuf() code
Maybe, but bus_dmamap_load() only lets you map one buffer at a time. I want to map a bunch of little buffers, and the API doesn't let me do that. And I don't want to change the API, because that would mean modifying busdma_machdep.c on each platform, which is a hell that I would rather avoid. bus_dmamap_load() is only one part of the API. bus_dmamap_load_mbuf or bus_dmamap_load_uio or also part of the API. They just don't happen to be impmeneted yet. 8-) Perhaps there should be an MD primitive that knows how to append to a mapping? This would allow you to write an MI loop that does exactly what you want. Any one of those ideas would be just fine. I eagerly await their realization. :) It's a separate list. The driver is reponsible for allocating the head of the list, then it hands it to bus_dmamap_list_alloc() along with the required dma tag. bus_dmamap_list_alloc() then calls bus_dmapap_create() to populate the list. The driver doesn't have to manipulate the list itself, until time comes to destroy it. Okay, but does this mean that bus_dmamap_load_mbuf no longer takes a dmamap? Drivers may want to allocate/manage the dmamaps in a different way. Yes, bus_dmamap_load_mbuf() accepts a dma tag, the head of the dmamap list, an mbuf, an segment array and a segment count. The Driver allocates the segment array with a certain number of members. It passes the array and segment count to bus_dmamap_load_mbuf(), which treats the segment count as the maximum number of segments that it can return to the caller. Once all the mappings have been done, it updates the segment count to indicate how many segments were actually needed. Then the driver transfers the info from the segment array into its DMA descriptor structures and kicks off the DMA operation. Once the device signals the transfer is done, the driver calls bus_dmamap_unload_mbuf() and bus_dmamap_destroy_mbuf() to unload the maps and return them to the map list for later use. It isn't until the driver calls bus_dmamap_list_destroy() that the dmamaps are actually released and the list free()ed. -Bill -- = -Bill Paul(510) 749-2329 | Senior Engineer, Master of Unix-Fu [EMAIL PROTECTED] | Wind River Systems = I like zees guys. Zey are fonny guys. Just keel one of zem. -- The 3 Amigos = To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Where to put new bus_dmamap_load_mbuf() code
My understanding is that you need a dmamap for every buffer that you want to map into bus space. You need one dmamap for each independantly manageable mapping. A single mapping may result in a long list of segments, regardless of whether you have a single KVA buffer or multiple KVA buffers that might contribute to the mapping. Yes yes, I understand that. But that's only if you want to map a buffer that's larger than PAGE_SIZE bytes, like, say, a 64K buffer being sent to a disk controller. What I want to make sure everyone understands here is that I'm not typically dealing with buffers this large: instead I have lots of small buffers that are smaller than PAGE_SIZE bytes. A single mbuf alone is only 256 bytes, of which only a fraction is used for data. An mbuf cluster buffer is usually only 2048 bytes. Transmitted packets are typically fragmented across 2 or 3 mbufs: the first mbuf contains the header, and the other two contain data. (Or the first one contains part of the header, the second one contains additional header data, and the third contains data -- whatever.) At most I will have 1500 bytes of data to send, which is less than PAGE_SIZE, and that 1500 bytes will be fragmented across a bunch of smaller buffers that are also smaller than PAGE_SIZE. Therefore I will not have one dmamap with multiple segments: I will have a bunch of dmamaps with one segment each. (I can hear somebody out there saying: What about jumbo frames? Yes, with jumbo frames, I will have 9K buffers to deal with, and in that case, you could have one dmamap with several segments, and I am taking this into account with the updated code I've written.) So unless I'm mistaken, for each mbuf in an mbuf list, what we have to do is this: - create a bus_dmamap_t for the data area in the mbuf using bus_dmamap_create() Creating a dmamap, depending on the architecture, could be expensive. You really want to create them in advance (or pool them), with at most one dmamap per concurrent transaction you support in your driver. The only problem here is that I can't really predict how many transactions will be going at one time. I will have at least RX_DMA_RING maps (one for each mbuf in the RX DMA ring), and some fraction of TX_DMA_RING maps. I could have the TX DMA ring completely filled with packets waiting to be DMA'ed and transmitted, or I may have only one entry in the ring currently in use. So I guess I have to allocate RX_DMA_RING + TX_DMA_RING dmamaps in order to be safe. - do the physical to bus mapping with bus_dmamap_load() bus_dmamap_load() only understands how to map a single buffer. You will have to pull pieces of bus_dmamap_load into a new function (or create inlines for common bits) to do this correctly. The algorithm goes something like this: foreach mbuf in the mbuf chain to load /* * Parse this contiguous piece of KVA into * its bus space regions. */ foreach bus space discontiguous region if (too_many_segs) return (error); Add new S/G element With the added complications of deferring the mapping if we're out of space, issuing the callback, etc. Why can't I just call bus_dmamap_load() multiple times, once for each mbuf in the mbuf list? (Note: for the record, an mbuf list usually contains one packet fragmented across multiple mbufs. An mbuf chain contains several mbuf lists, linked together via the m_nextpkt pointer in the header of the first mbuf in each list. By the time we get to the device driver, we always have mbuf lists only.) Chances are you are going to use the map again soon, so destroying it on every transaction is a waste. Ok, I spent some more time on this. I updated the code at: http://www.freebsd.org/~wpaul/busdma The changes are: - Tried to account for the case where an mbuf data region is larger than a page, i.e. when we have an mbuf with a 9K external buffer attached for use a jumbo ethernet frame. - Added routines to allocate a chunk of maps in a singly linked list, from which the other routines can grab them as needed. The driver attach routine calls bus_dmamap_list_init() with the max number of dmamaps that it will need, then the detach routine calls bus_dmamap_list_destroy() to nuke them when the driver is unloaded. The bus_dmamap_load_mbuf() routine uses the pre-allocated dmamaps from the list and bus_dmamap_list_destroy() returns them to the list when the transaction is completed. - Updated the modified if_sf driver to use the new code. Again, I've got this code running on the test box in the lab, so it's correct inasmuch as it compiles and runs, even though it may not be aesthetically pleasing. -Bill -- = -Bill Paul(510) 749-2329 | Senior Engineer, Master of Unix-Fu
Re: Where to put new bus_dmamap_load_mbuf() code
My understanding is that you need a dmamap for every buffer that you want to map into bus space. You need one dmamap for each independantly manageable mapping. A single mapping may result in a long list of segments, regardless of whether you have a single KVA buffer or multiple KVA buffers that might contribute to the mapping. Yes yes, I understand that. But that's only if you want to map a buffer that's larger than PAGE_SIZE bytes, like, say, a 64K buffer being sent to a disk controller. What I want to make sure everyone understands here is that I'm not typically dealing with buffers this large: instead I have lots of small buffers that are smaller than PAGE_SIZE bytes. A single mbuf alone is only 256 bytes, of which only a fraction is used for data. An mbuf cluster buffer is usually only 2048 bytes. Transmitted packets are typically fragmented across 2 or 3 mbufs: the first mbuf contains the header, and the other two contain data. (Or the first one contains part of the header, the second one contains additional header data, and the third contains data -- whatever.) At most I will have 1500 bytes of data to send, which is less than PAGE_SIZE, and that 1500 bytes will be fragmented across a bunch of smaller buffers that are also smaller than PAGE_SIZE. Therefore I will not have one dmamap with multiple segments: I will have a bunch of dmamaps with one segment each. The fact that the data is less than a page in size matters little to the bus dma concept. In other words, how is this packet presented to the hardware? Does it care that all of the component pieces are PAGE_SIZE in length? Probably not. It just wants the list of address/length pairs that compose that packet and there is no reason that each chunk needs to have it own, and potentially expensive, dmamap. Creating a dmamap, depending on the architecture, could be expensive. You really want to create them in advance (or pool them), with at most one dmamap per concurrent transaction you support in your driver. The only problem here is that I can't really predict how many transactions will be going at one time. I will have at least RX_DMA_RING maps (one for each mbuf in the RX DMA ring), and some fraction of TX_DMA_RING maps. I could have the TX DMA ring completely filled with packets waiting to be DMA'ed and transmitted, or I may have only one entry in the ring currently in use. So I guess I have to allocate RX_DMA_RING + TX_DMA_RING dmamaps in order to be safe. Yes or allocate them in chunks so that the total amount is only as large as the greatest demand your driver has ever seen. With the added complications of deferring the mapping if we're out of space, issuing the callback, etc. Why can't I just call bus_dmamap_load() multiple times, once for each mbuf in the mbuf list? Due to the cost of the dmamaps, the cost of which is platform and bus-dma implementation dependent - e.g. could be a 1-1 mapping to a hardware resource. Consider the case of having a full TX and RX ring in your driver. Instead of #TX*#RX dmamaps, you will now have three or more times that number. There is also the issue of coalessing the discontiguous chunks if there are too many chunks for your driver to handle. Bus dma is supposed to handle that for you (the x86 implementation doesn't yet, but it should) but it can't if it doesn't understand the segment limit per transaction. You've hidden that from bus dma by using a map per segment. (Note: for the record, an mbuf list usually contains one packet fragmented across multiple mbufs. An mbuf chain contains several mbuf lists, linked together via the m_nextpkt pointer in the header of the first mbuf in each list. By the time we get to the device driver, we always have mbuf lists only.) Okay, so I haven't written a network driver yet, but you got the idea, right? 8-) Chances are you are going to use the map again soon, so destroying it on every transaction is a waste. Ok, I spent some more time on this. I updated the code at: http://www.freebsd.org/~wpaul/busdma I'll take a look. The changes are: ... - Added routines to allocate a chunk of maps in a singly linked list, from which the other routines can grab them as needed. Are these hung off the dma tag or something? dmamaps may hold settings that are peculuar to the device that allocated them, so they cannot be shared with other clients of bus_dmamap_load_mbuf. -- Justin To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Where to put new bus_dmamap_load_mbuf() code
The fact that the data is less than a page in size matters little to the bus dma concept. In other words, how is this packet presented to the hardware? Does it care that all of the component pieces are PAGE_SIZE in length? Probably not. It just wants the list of address/length pairs that compose that packet and there is no reason that each chunk needs to have it own, and potentially expensive, dmamap. Maybe, but bus_dmamap_load() only lets you map one buffer at a time. I want to map a bunch of little buffers, and the API doesn't let me do that. And I don't want to change the API, because that would mean modifying busdma_machdep.c on each platform, which is a hell that I would rather avoid. Why can't I just call bus_dmamap_load() multiple times, once for each mbuf in the mbuf list? Due to the cost of the dmamaps, the cost of which is platform and bus-dma implementation dependent - e.g. could be a 1-1 mapping to a hardware resource. Consider the case of having a full TX and RX ring in your driver. Instead of #TX*#RX dmamaps, you will now have three or more times that number. There is also the issue of coalessing the discontiguous chunks if there are too many chunks for your driver to handle. Bus dma is supposed to handle that for you (the x86 implementation doesn't yet, but it should) but it can't if it doesn't understand the segment limit per transaction. You've hidden that from bus dma by using a map per segment. Ok, a slightly different question: what happens if I call bus_dmamap_load() more than once with different buffers but with the same dmamap? (Note: for the record, an mbuf list usually contains one packet fragmented across multiple mbufs. An mbuf chain contains several mbuf lists, linked together via the m_nextpkt pointer in the header of the first mbuf in each list. By the time we get to the device driver, we always have mbuf lists only.) Okay, so I haven't written a network driver yet, but you got the idea, right? 8-) Just don't get 3c509 and 3c905 misxed up and we'll be fine. :) - Added routines to allocate a chunk of maps in a singly linked list, from which the other routines can grab them as needed. Are these hung off the dma tag or something? dmamaps may hold settings that are peculuar to the device that allocated them, so they cannot be shared with other clients of bus_dmamap_load_mbuf. It's a separate list. The driver is reponsible for allocating the head of the list, then it hands it to bus_dmamap_list_alloc() along with the required dma tag. bus_dmamap_list_alloc() then calls bus_dmapap_create() to populate the list. The driver doesn't have to manipulate the list itself, until time comes to destroy it. -Bill -- = -Bill Paul(510) 749-2329 | Senior Engineer, Master of Unix-Fu [EMAIL PROTECTED] | Wind River Systems = I like zees guys. Zey are fonny guys. Just keel one of zem. -- The 3 Amigos = To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Where to put new bus_dmamap_load_mbuf() code
The fact that the data is less than a page in size matters little to the bus dma concept. In other words, how is this packet presented to the hardware? Does it care that all of the component pieces are PAGE_SIZE in length? Probably not. It just wants the list of address/length pairs that compose that packet and there is no reason that each chunk needs to have it own, and potentially expensive, dmamap. Maybe, but bus_dmamap_load() only lets you map one buffer at a time. I want to map a bunch of little buffers, and the API doesn't let me do that. And I don't want to change the API, because that would mean modifying busdma_machdep.c on each platform, which is a hell that I would rather avoid. bus_dmamap_load() is only one part of the API. bus_dmamap_load_mbuf or bus_dmamap_load_uio or also part of the API. They just don't happen to be impmeneted yet. 8-) Perhaps there should be an MD primitive that knows how to append to a mapping? This would allow you to write an MI loop that does exactly what you want. there are too many chunks for your driver to handle. Bus dma is supposed to handle that for you (the x86 implementation doesn't yet, but it should) but it can't if it doesn't understand the segment limit per transaction. You've hidden that from bus dma by using a map per segment. Ok, a slightly different question: what happens if I call bus_dmamap_load() more than once with different buffers but with the same dmamap? The behavior is undefined. - Added routines to allocate a chunk of maps in a singly linked list, from which the other routines can grab them as needed. Are these hung off the dma tag or something? dmamaps may hold settings that are peculuar to the device that allocated them, so they cannot be shared with other clients of bus_dmamap_load_mbuf. It's a separate list. The driver is reponsible for allocating the head of the list, then it hands it to bus_dmamap_list_alloc() along with the required dma tag. bus_dmamap_list_alloc() then calls bus_dmapap_create() to populate the list. The driver doesn't have to manipulate the list itself, until time comes to destroy it. Okay, but does this mean that bus_dmamap_load_mbuf no longer takes a dmamap? Drivers may want to allocate/manage the dmamaps in a different way. -- Justin To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Where to put new bus_dmamap_load_mbuf() code
Correction. This sample: if (bus_dma_tag_create(pci-parent_dmat, PAGE_SIZE, lim, BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL, len, 1, BUS_SPACE_MAXSIZE_32BIT, 0, pci-cntrol_dmat) != 0) { isp_prt(isp, ISP_LOGERR, cannot create a dma tag for control spaces); free(isp-isp_xflist, M_DEVBUF); free(pci-dmaps, M_DEVBUF); return (1); } You'll need to change the number of segments to match the max supported by the card (or the max you will ever need). This example made me realize that the bounce code doesn't deal with multiple segments being copied into a single page (i.e. tracking and using remaining free space in a page already allocated for bouncing for a single map). I'll have to break loose some time to fix that. -- Justin To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Where to put new bus_dmamap_load_mbuf() code
Yay! The current suggestion is fine except that each platform might have a more efficient, or even required, actual h/w mechanism for mapping mbufs. I'd also be a little concerned with the way you're overloading stuff into mbuf itself- but I'm a little shakier on this. Finally- why not make this an inline? -matt On Mon, 20 Aug 2001, Bill Paul wrote: Okay, I decided today to write a bus_dmamap_load_mbuf() routine to make it a little easier to convert the PCI NIC drivers to use the busdma API. It's not the same as the NetBSD code. There are four new functions: bus_dmamap_load_mbuf() bus_dmamap_unload_mbuf() bus_dmamap_sync_mbuf() bus_dmamap_destroy_mbuf() This is more or less in keeping with the existing API, except the new routines work exclusively on mbuf lists. The thing I need to figure out now is where to put the code. The current suggestion from jhb is to create the following two new files: sys/kern/kern_busdma.c sys/sys/busdma.h The functions are machine-independent, so they shouldn't be in sys/arch/arch/busdma_machdep.c. I mean, they could go there, but that would just result in code duplication. If somebody has a better suggestion, now's the time to speak up. Please let's avoid creating another bikeshed over this. Current code snapshot resides at: http://www.freebsd.org/~wpaul/busdma There's also a modified version if the Adaptec starfire driver there which uses the new routines. I'm running this version of the driver on a test box in the lab right now. -Bill -- = -Bill Paul(510) 749-2329 | Senior Engineer, Master of Unix-Fu [EMAIL PROTECTED] | Wind River Systems = I like zees guys. Zey are fonny guys. Just keel one of zem. -- The 3 Amigos = To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Where to put new bus_dmamap_load_mbuf() code
Another thing- maybe I'm confused- but I still don't see why you want to require the creating of a map each time you want to load an mbuf chain. Wouldn't it be better and more efficient to let the driver decide when and where the map is created and just use the common code for loads/unloads? On Mon, 20 Aug 2001, Matthew Jacob wrote: Yay! The current suggestion is fine except that each platform might have a more efficient, or even required, actual h/w mechanism for mapping mbufs. I'd also be a little concerned with the way you're overloading stuff into mbuf itself- but I'm a little shakier on this. Finally- why not make this an inline? -matt On Mon, 20 Aug 2001, Bill Paul wrote: Okay, I decided today to write a bus_dmamap_load_mbuf() routine to make it a little easier to convert the PCI NIC drivers to use the busdma API. It's not the same as the NetBSD code. There are four new functions: bus_dmamap_load_mbuf() bus_dmamap_unload_mbuf() bus_dmamap_sync_mbuf() bus_dmamap_destroy_mbuf() This is more or less in keeping with the existing API, except the new routines work exclusively on mbuf lists. The thing I need to figure out now is where to put the code. The current suggestion from jhb is to create the following two new files: sys/kern/kern_busdma.c sys/sys/busdma.h The functions are machine-independent, so they shouldn't be in sys/arch/arch/busdma_machdep.c. I mean, they could go there, but that would just result in code duplication. If somebody has a better suggestion, now's the time to speak up. Please let's avoid creating another bikeshed over this. Current code snapshot resides at: http://www.freebsd.org/~wpaul/busdma There's also a modified version if the Adaptec starfire driver there which uses the new routines. I'm running this version of the driver on a test box in the lab right now. -Bill -- = -Bill Paul(510) 749-2329 | Senior Engineer, Master of Unix-Fu [EMAIL PROTECTED] | Wind River Systems = I like zees guys. Zey are fonny guys. Just keel one of zem. -- The 3 Amigos = To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Where to put new bus_dmamap_load_mbuf() code
Another thing- maybe I'm confused- but I still don't see why you want to require the creating of a map each time you want to load an mbuf chain. Wouldn't it be better and more efficient to let the driver decide when and where the map is created and just use the common code for loads/unloads? Every hear the phrase you get what you pay for? The API isn't all that clear, and we don't have a man page or document that describes in detail how to use it properly. Rather than whining about that, I decided to tinker with it and Use The Source, Luke (tm). This is the result. My understanding is that you need a dmamap for every buffer that you want to map into bus space. Each mbuf has a single data buffer associated with it (either the data area in the mbuf itself, or external storage). We're not allowed to make assumptions about where these buffers are. Also, a single ethernet frame can be fragmented across multiple mbufs in a list. So unless I'm mistaken, for each mbuf in an mbuf list, what we have to do is this: - create a bus_dmamap_t for the data area in the mbuf using bus_dmamap_create() - do the physical to bus mapping with bus_dmamap_load() - call bus_dmamap_sync() as needed (might handle copying if bounce buffers are required) - insert mysterious DMA operation here - do post-DMA sync as needed (again, might require bounce copying) - call bus_dmamap_unload() to un-do the bus mapping (which might free bounce buffers if some were allocated by bus_dmamap_load()) - destroy the bus_dmamap_t One memory region, one DMA map. It seems to me that you can't use a single dmamap for multiple memory buffers, unless you make certain assumptions about where in physical memory those buffers reside, and I thought the idea of busdma was to provide a consistent, opaque API so that you would not have to make any assumptions. Now if I've gotten any of this wrong, please tell me how I should be doing it. Remember to show all work. I don't give partial credit, nor do I grade on a curve. Yay! The current suggestion is fine except that each platform might have a more efficient, or even required, actual h/w mechanism for mapping mbufs. It might, but right now, it doesn't. All I have to work with is the existing API. I'm not here to stick my fingers in it and change it all around. I just want to add a bit of code on top of it so that I don't have to go through quite so many contortions when I use the API in network adapter drivers. I'd also be a little concerned with the way you're overloading stuff into mbuf itself- but I'm a little shakier on this. I thought about this. Like it says in the comments, at the device driver level, you're almost never going to be using some of the pointers in the mbuf header. On the RX side, *we* (i.e. the driver) are allocating the mbufs, so we can do whatever the heck we want with them until such time as we hand them off to ether_input(), and by then we will have put things back the way they were. For the TX side, by the time we get the mbufs off the send queue, we always know we're going to have just an mbuf list (and not an mbuf chain), and we're going to toss the mbufs once we're done with them, so we can trample on certain things that we know don't matter to the OS or network stack anymore. The alternatives are: - Allocate some extra space in the DMA descriptor structures for the necessary bus_dmamap_t pointers. This is tricky with this particular NIC, and a little awkward. - Allocate my own private arrays of bus_dmamap_t that mirror the DMA rings. This is yet more memory I need to allocate and free at device attach and detach time. I've got space in the mbuf header. It's not being used. It's right where I need it. Why not take advantage of it? Finally- why not make this an inline? Er... because that idea offended my delicate sensibilities? :) -Bill -- = -Bill Paul(510) 749-2329 | Senior Engineer, Master of Unix-Fu [EMAIL PROTECTED] | Wind River Systems = I like zees guys. Zey are fonny guys. Just keel one of zem. -- The 3 Amigos = To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Where to put new bus_dmamap_load_mbuf() code
On Mon, 20 Aug 2001, Bill Paul wrote: Every hear the phrase you get what you pay for? The API isn't all that clear, and we don't have a man page or document that describes in detail how to use it properly. Rather than whining about that, I decided to tinker with it and Use The Source, Luke (tm). This is the result. Well, I'm more familiar with the NetBSD BusDma code, which is similar, and heavily documented. I'm also one of the principle authors of the Solaris DKI/DDI, which is *also* heavily documented, so I have some small notion of how a few of these subsystems are supposed to work, and where documentation exists for these- and similar systems (e.g., UDI). My understanding is that you need a dmamap for every buffer that you want to map into bus space. Each mbuf has a single data buffer associated with it (either the data area in the mbuf itself, or external storage). We're not allowed to make assumptions about where these buffers are. Also, a single ethernet frame can be fragmented across multiple mbufs in a list. So unless I'm mistaken, for each mbuf in an mbuf list, what we have to do is this: - create a bus_dmamap_t for the data area in the mbuf using bus_dmamap_create() - do the physical to bus mapping with bus_dmamap_load() - call bus_dmamap_sync() as needed (might handle copying if bounce buffers are required) - insert mysterious DMA operation here - do post-DMA sync as needed (again, might require bounce copying) - call bus_dmamap_unload() to un-do the bus mapping (which might free bounce buffers if some were allocated by bus_dmamap_load()) - destroy the bus_dmamap_t One memory region, one DMA map. It seems to me that you can't use a single dmamap for multiple memory buffers, unless you make certain assumptions about where in physical memory those buffers reside, and I thought the idea of busdma was to provide a consistent, opaque API so that you would not have to make any assumptions. Now if I've gotten any of this wrong, please tell me how I should be doing it. Remember to show all work. I don't give partial credit, nor do I grade on a curve. This is fine insofar as it goes, but there's nothing, I believe, that requires you to *create* a bus_dmamap_t each time you wish to map something and then destroy it when you unmap something. You might ask why one actually has the separate step from map creation and map load at all then. All the rest of the stuff for load/sync/sync/unload is fine. Using The Code (tm)- you can see that, for example, you can create a tag that describes all of the the addressable space your device can access, e.g.: if (bus_dma_tag_create(pci-parent_dmat, PAGE_SIZE, lim, BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL, len, 1, BUS_SPACE_MAXSIZE_32BIT, 0, pci-cntrol_dmat) != 0) { isp_prt(isp, ISP_LOGERR, cannot create a dma tag for control spaces); free(isp-isp_xflist, M_DEVBUF); free(pci-dmaps, M_DEVBUF); return (1); } Then, for each possible transaction slot- if you have a device that has a fixed number of transactions that are possible (as many do), you can create maps ahead of time: for (i = 0; i isp-isp_maxcmds; i++) { error = bus_dmamap_create(pci-parent_dmat, 0, pci-dmaps[i]); if (error) { ... so that for each transaction that needs to be mapped, you can dma load it: bus_dmamap_t *dp; ... dp = pci-dmaps[isp_handle_index(rq-req_handle)]; ... s = splsoftvm(); error = bus_dmamap_load(pci-parent_dmat, *dp, csio-data_ptr, csio-dxfer_len, eptr, mp, 0); ... which as part of the load process can sync it: dp = pci-dmaps[isp_handle_index(rq-req_handle)]; if ((csio-ccb_h.flags CAM_DIR_MASK) == CAM_DIR_IN) { bus_dmamap_sync(pci-parent_dmat, *dp, BUS_DMASYNC_PREREAD); } else { bus_dmamap_sync(pci-parent_dmat, *dp, BUS_DMASYNC_PREWRITE); } and when the transaction is done, you can sync and unload: static void isp_pci_dmateardown(struct ispsoftc *isp, XS_T *xs, u_int16_t handle) { struct isp_pcisoftc *pci = (struct isp_pcisoftc *)isp; bus_dmamap_t *dp = pci-dmaps[isp_handle_index(handle)]; if ((xs-ccb_h.flags CAM_DIR_MASK) == CAM_DIR_IN) { bus_dmamap_sync(pci-parent_dmat, *dp, BUS_DMASYNC_POSTREAD); } else { bus_dmamap_sync(pci-parent_dmat, *dp, BUS_DMASYNC_POSTWRITE); } bus_dmamap_unload(pci-parent_dmat, *dp); } So- my question still stands- from a performance point of view (Networking people *do* care about performance I believe, yes? :-))- if you don't need to create the map each time, wouldn't you rather not? So, the mbuf mapping code, which is cool to have, really might not need this? The current suggestion is
Re: Where to put new bus_dmamap_load_mbuf() code
Correction. This sample: if (bus_dma_tag_create(pci-parent_dmat, PAGE_SIZE, lim, BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL, len, 1, BUS_SPACE_MAXSIZE_32BIT, 0, pci-cntrol_dmat) != 0) { isp_prt(isp, ISP_LOGERR, cannot create a dma tag for control spaces); free(isp-isp_xflist, M_DEVBUF); free(pci-dmaps, M_DEVBUF); return (1); } Should have been: if (bus_dma_tag_create(NULL, 1, 0, BUS_SPACE_MAXADDR_32BIT, BUS_SPACE_MAXADDR, NULL, NULL, lim + 1, 255, lim, 0, pcs-parent_dmat) != 0) { device_printf(dev, could not create master dma tag\n); free(isp-isp_param, M_DEVBUF); free(pcs, M_DEVBUF); return (ENXIO); } To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Where to put new bus_dmamap_load_mbuf() code
Every hear the phrase you get what you pay for? The API isn't all that clear, and we don't have a man page or document that describes in detail how to use it properly. Rather than whining about that, I decided to tinker with it and Use The Source, Luke (tm). This is the result. Fair enough. My understanding is that you need a dmamap for every buffer that you want to map into bus space. You need one dmamap for each independantly manageable mapping. A single mapping may result in a long list of segments, regardless of whether you have a single KVA buffer or multiple KVA buffers that might contribute to the mapping. Each mbuf has a single data buffer associated with it (either the data area in the mbuf itself, or external storage). We're not allowed to make assumptions about where these buffers are. Also, a single ethernet frame can be fragmented across multiple mbufs in a list. So unless I'm mistaken, for each mbuf in an mbuf list, what we have to do is this: - create a bus_dmamap_t for the data area in the mbuf using bus_dmamap_create() Creating a dmamap, depending on the architecture, could be expensive. You really want to create them in advance (or pool them), with at most one dmamap per concurrent transaction you support in your driver. - do the physical to bus mapping with bus_dmamap_load() bus_dmamap_load() only understands how to map a single buffer. You will have to pull pieces of bus_dmamap_load into a new function (or create inlines for common bits) to do this correctly. The algorithm goes something like this: foreach mbuf in the mbuf chain to load /* * Parse this contiguous piece of KVA into * its bus space regions. */ foreach bus space discontiguous region if (too_many_segs) return (error); Add new S/G element With the added complications of deferring the mapping if we're out of space, issuing the callback, etc. - call bus_dmamap_sync() as needed (might handle copying if bounce buffers are required) - insert mysterious DMA operation here - do post-DMA sync as needed (again, might require bounce copying) - call bus_dmamap_unload() to un-do the bus mapping (which might free bounce buffers if some were allocated by bus_dmamap_load()) - destroy the bus_dmamap_t Chances are you are going to use the map again soon, so destroying it on every transaction is a waste. -- Justin To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message