Re: [Fwd: Re: use of bus_dmamap_sync]
On 10/26/05 10:39 Scott Long said the following:
> Apparently the original poster sent his question to me in private, then
> sent it again to the mailing list right as I was responding in private.

apologies on that, scott. an initial search only turned up your message
in the archives, but spreading it wider (not confining the google to
lists.freebsd.org) brought up more hits, and that made me post it into
-hackers. do bear with me as i try to understand this.

> Below is my response. Note that I edited it slightly to fix an error
> that I found.
>
>     bus_dmamap_sync(tag, map, BUS_DMASYNC_PREREAD);
>     Ask hardware for data
>     bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTREAD);
>     read from readbuf (i'm assuming that device has put data in readbuf)
>     POSITION B
> }

in other words, the PREREAD/POSTREAD wrap around the device's access to
memory, and not the CPU's ?

>     bus_dmamap_sync(tag, map, BUS_DMASYNC_PREWRITE);
>     notify hardware of the write
>     bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTWRITE);
>
> The point of the syncs is to do the proper memory barrier and cache
> coherency magic between the CPU and the bus as well as do the memory
> copies for bounce buffers. If you are dealing with statically mapped
> buffers, i.e. for an rx/tx descriptor ring, then you'll want code

however, reading thru the syscall code, bus_dmamem_alloc() sets the
dmamap to NULL, and if it's null, bus_dmamap_sync() is not called at all.
would this mean that if memory is allocated by bus_dmamem_alloc(), it
does not need to be synced with bus_dmamap_sync() ? bus_dmamap_sync()
only seems critical when a dma map is created with bus_dmamap_create()
and the buffer space allocated dynamically thru contigmalloc.

> well for storage devices where the load operation must succeed. It
> doesn't work as well for network devices where the latency of the
> indirect calls is measurable. So for that, I added
> bus_dmamap_load_mbuf_sg(). It eliminates the callback function and
> returns the scatter gather list directly. So, the above example would
> be:

i'm basically trying to debug a problem with a driver which works like a
charm on freebsd 5.x, but somehow doesn't on freebsd 4.x. the source for
the driver is /exactly/ the same on both systems. the symptoms i keep
seeing are that the same data which is written out is also read in by the
read routines, which is what made me suspect that somewhere the dma
transfers were not happening and stumbled upon this.

> for each buffer. It's often better to pre-allocate the maps at init
> time, put them on a list, and then just push and pop them off the list

i do this, for each buffer, at init time.

int *readbuf
    bus_dma_tag_create()
    bus_dmamem_alloc()
    bus_dmamap_load()

int *writebuf
    bus_dma_tag_create()
    bus_dmamem_alloc()
    bus_dmamap_load()

subsequently, the device interrupts once every ms (1000Hz) and the
buffers are read/written to. in the interrupt handler, i currently have,

    bus_dmamap_sync(POSTREAD)
    read data from readbuf (readval = readbuf)
    write data to writebuf (writebuf = someval)
    bus_dmamap_sync(PREWRITE)

i've left out PREREAD and POSTWRITE as both seem to be no ops in freebsd
4.x. this seems consistent with your explanation. is this correct ?

--
Regards,             /\_/\   All dogs go to heaven.
[EMAIL PROTECTED]    (0 0)   http://www.alphaque.com/
+==oOO--(_)--OOo==+
| for a in past present future; do |
| for b in clients employers associates relatives neighbours pets; do |
| echo The opinions here in no way reflect the opinions of my $a $b. |
| done; done |
+=+
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: [Fwd: Re: use of bus_dmamap_sync]
On Wednesday 26 October 2005 04:47 am, Dinesh Nair wrote:
> On 10/26/05 10:39 Scott Long said the following:
> > Apparently the original poster sent his question to me in private,
> > then sent it again to the mailing list right as I was responding in
> > private.
>
> apologies on that, scott. an initial search only turned up your message
> in the archives, but spreading it wider (not confining the google to
> lists.freebsd.org) brought up more hits, and that made me post it into
> -hackers. do bear with me as i try to understand this.
>
> > Below is my response. Note that I edited it slightly to fix an error
> > that I found.
> >
> >     bus_dmamap_sync(tag, map, BUS_DMASYNC_PREREAD);
> >     Ask hardware for data
> >     bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTREAD);
> >     read from readbuf (i'm assuming that device has put data in
> >     readbuf)
> >     POSITION B
> > }
>
> in other words, the PREREAD/POSTREAD wrap around the device's access to
> memory, and not the CPU's ?

Yes, scott's notes are more correct than mine here.

> >     bus_dmamap_sync(tag, map, BUS_DMASYNC_PREWRITE);
> >     notify hardware of the write
> >     bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTWRITE);
> >
> > The point of the syncs is to do the proper memory barrier and cache
> > coherency magic between the CPU and the bus as well as do the memory
> > copies for bounce buffers. If you are dealing with statically mapped
> > buffers, i.e. for an rx/tx descriptor ring, then you'll want code
>
> however, reading thru the syscall code, bus_dmamem_alloc() sets the
> dmamap to NULL, and if it's null, bus_dmamap_sync() is not called at
> all. would this mean that if memory is allocated by bus_dmamem_alloc(),
> it does not need to be synced with bus_dmamap_sync() ?

Perhaps on i386. Each arch implements sync(). Argh, it does look like
the memory barriers needed on e.g., Alpha aren't used with static buffers
because of the map != NULL check in sys/busdma.h. *sigh* I guess archs
that need membars even without bounce buffers need to always allocate and
setup a bus_dmamap. None of that matters for i386 though.

--
John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve = http://www.FreeBSD.org
Re: [Fwd: Re: use of bus_dmamap_sync]
John Baldwin wrote:
> On Wednesday 26 October 2005 04:47 am, Dinesh Nair wrote:
> > On 10/26/05 10:39 Scott Long said the following:
> > > Apparently the original poster sent his question to me in private,
> > > then sent it again to the mailing list right as I was responding in
> > > private.
> >
> > apologies on that, scott. an initial search only turned up your
> > message in the archives, but spreading it wider (not confining the
> > google to lists.freebsd.org) brought up more hits, and that made me
> > post it into -hackers. do bear with me as i try to understand this.
> >
> > > Below is my response. Note that I edited it slightly to fix an
> > > error that I found.
> > >
> > >     bus_dmamap_sync(tag, map, BUS_DMASYNC_PREREAD);
> > >     Ask hardware for data
> > >     bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTREAD);
> > >     read from readbuf (i'm assuming that device has put data in
> > >     readbuf)
> > >     POSITION B
> > > }
> >
> > in other words, the PREREAD/POSTREAD wrap around the device's access
> > to memory, and not the CPU's ?
>
> Yes, scott's notes are more correct than mine here.
>
> > >     bus_dmamap_sync(tag, map, BUS_DMASYNC_PREWRITE);
> > >     notify hardware of the write
> > >     bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTWRITE);
> > >
> > > The point of the syncs is to do the proper memory barrier and cache
> > > coherency magic between the CPU and the bus as well as do the
> > > memory copies for bounce buffers. If you are dealing with
> > > statically mapped buffers, i.e. for an rx/tx descriptor ring, then
> > > you'll want code
> >
> > however, reading thru the syscall code, bus_dmamem_alloc() sets the
> > dmamap to NULL, and if it's null, bus_dmamap_sync() is not called at
> > all. would this mean that if memory is allocated by
> > bus_dmamem_alloc(), it does not need to be synced with
> > bus_dmamap_sync() ?

The value of the map is an implementation detail, which is why it's an
opaque typedef. Portable code should always assume that the map has
valid data.

Now, specifically for i386, if you have a device with a 4GB address
limit, and it has no data alignment constraints (unlike twe), and you are
not using PAE, then yes the map will be NULL and the syncs will do
nothing. Assuming that all three of these cases are false is not good,
though.

> Perhaps on i386. Each arch implements sync(). Argh, it does look like
> the memory barriers needed on e.g., Alpha aren't used with static
> buffers because of the map != NULL check in sys/busdma.h. *sigh* I
> guess archs that need membars even without bounce buffers need to
> always allocate and setup a bus_dmamap. None of that matters for i386
> though.

Feel free to fix alpha. Again, long ago, I thought that alpha pretended
to be coherent in the 2GB DMA window that we use so that it could be more
like i386. If that's not true then that's fine. If you need to make
structural changes to the MI code in order to fix alpha, please let me
know.

Scott
Re: [Fwd: Re: use of bus_dmamap_sync]
On 10/26/05 23:54 Scott Long said the following:
> The value of the map is an implementation detail, which is why it's an
> opaque typedef. Portable code should always assume that the map has
> valid data. Now, specifically for i386, if you have a device with a

right, so for portability's sake, bus_dmamap_sync should be used anyway.

> twe), and you are not using PAE, then yes the map will be NULL and the
> syncs will do nothing. Assuming that all three of these cases are
> false is not good, though.

well, they are in my situation, so obviously bus dma is not the cause of
the problem i'm seeing.

thanx to both scottl and jhb for the explanation. this throws a lot of
light onto the handling of dma access to devices for me.

--
Regards,
[EMAIL PROTECTED]  http://www.alphaque.com/
Re: [Fwd: Re: use of bus_dmamap_sync]
On Wednesday 26 October 2005 11:54 am, Scott Long wrote:
> > Perhaps on i386. Each arch implements sync(). Argh, it does look
> > like the memory barriers needed on e.g., Alpha aren't used with
> > static buffers because of the map != NULL check in sys/busdma.h.
> > *sigh* I guess archs that need membars even without bounce buffers
> > need to always allocate and setup a bus_dmamap. None of that matters
> > for i386 though.
>
> Feel free to fix alpha. Again, long ago, I thought that alpha
> pretended to be coherent in the 2GB DMA window that we use so that it
> could be more like i386. If that's not true then that's fine. If you
> need to make structural changes to the MI code in order to fix alpha,
> please let me know.

No, I'm just a moron. Alpha uses the nobounce_map for static buffers, so
bus_dmamap_sync will use the appropriate membars.

--
John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/
[Fwd: Re: use of bus_dmamap_sync]
Apparently the original poster sent his question to me in private, then
sent it again to the mailing list right as I was responding in private.
Anyways, no need to continue to guess; if anyone has any questions, feel
free to ask. Below is my response. Note that I edited it slightly to
fix an error that I found.

Scott

-------- Original Message --------
Subject: Re: use of bus_dmamap_sync
Date: Tue, 25 Oct 2005 07:59:03 -0600
From: Scott Long [EMAIL PROTECTED]
To: Dinesh Nair [EMAIL PROTECTED]
References: [EMAIL PROTECTED]

Dinesh Nair wrote:
> hi scott,
>
> i came across this message of yours,
> http://lists.freebsd.org/pipermail/freebsd-current/2004-December/044395.html
> and you seem like the perfect person to assist me in something.
>
> i've been trying to figure out the best places to use bus_dmamap_sync
> when reading/writing to a dma mapped address space. however, i can't
> seem to get the gist of this, either from the mailing list discussions
> or the man page. could you assist me ?
>
> i'm on FreeBSD 4.11 right now, and i notice the definitions of
> BUS_DMASYNC_* has changed from an enum (0-3) in 4.x to a typedef in
> 5.x.
>
> this is what i have done. i have used two buffers to handle reads from
> the device and writes to the device. the pseudocode is as follows
>
> rx_func()
> {
>     POSITION A
      bus_dmamap_sync(tag, map, BUS_DMASYNC_PREREAD);
>     Ask hardware for data
      bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTREAD);
>     read from readbuf (i'm assuming that device has put data in readbuf)
>     POSITION B
> }
>
> tx_func()
> {
>     POSITION C
>     write to txbuf (here's where we write to txbuf)
      bus_dmamap_sync(tag, map, BUS_DMASYNC_PREWRITE);
>     notify hardware of the write
>     POSITION D
      bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTWRITE);
> }
>
> what BUS_DMASYNC_{PRE,POST}{READ,WRITE} option should i use for
> bus_dmamap_sync in position A, B, C and D ?
>
> any assistance would be gladly appreciated, as i'm seeing some really
> weird symptoms on this device, where data written out is being
> immediately read in. i'm guessing this has to do with my wrong usage of
> bus_dmamap_sync().

The point of the syncs is to do the proper memory barrier and cache
coherency magic between the CPU and the bus as well as do the memory
copies for bounce buffers. If you are dealing with statically mapped
buffers, i.e. for an rx/tx descriptor ring, then you'll want code exactly
like described above. In reality, most platforms only do stuff for the
POSTREAD and PREWRITE cases, but for the sake of completeness the others
are documented and usually used in drivers. NetBSD might have platforms
that require operations for PREREAD and POSTWRITE, but I've never looked
that closely.

If you are dealing with dynamic buffers, i.e. for mbuf data, then you'll
want the PREREAD and PREWRITE ops to happen in the callback function for
bus_dmamap_load() and the POSTREAD and POSTWRITE ops to happen right
before calling bus_dmamap_unload. So in this case it would be:

rx_buf()
{
    allocate buffer
    allocate map
    bus_dmamap_load(tag, map, buffer, size, rx_callback, arg, flags)
}

rx_callback(arg, segs, nsegs, errno)
{
    convert segs to hardware format
    bus_dmamap_sync(tag, map, BUS_DMASYNC_PREREAD)
    notify hardware about buffer
}

rx_complete()
{
    bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTREAD)
    bus_dmamap_unload(tag, map, buffer)
    deallocate map
    process buffer
}

tx_buf()
{
    fill buffer
    allocate map
    bus_dmamap_load(tag, map, buffer, size, tx_callback, arg, flags)
}

tx_callback(arg, segs, nsegs, errno)
{
    convert segs to hardware format
    bus_dmamap_sync(tag, map, BUS_DMASYNC_PREWRITE)
    notify hardware about buffer
}

tx_complete()
{
    bus_dmamap_sync(tag, map, BUS_DMASYNC_POSTWRITE)
    bus_dmamap_unload(tag, map, buffer)
    deallocate map
    free buffer
}

This is the design that busdma was originally modelled on. It works well
for storage devices where the load operation must succeed. It doesn't
work as well for network devices where the latency of the indirect calls
is measurable. So for that, I added bus_dmamap_load_mbuf_sg(). It
eliminates the callback function and returns the scatter gather list
directly. So, the above example would be:

tx_buf()
{
    bus_dma_segment_t segs[maxsegs];
    int nsegs;

    fill buffer
    allocate map
    bus_dmamap_load_mbuf_sg(tag, map, buffer, size, segs, nsegs)
    convert segs to hardware format
    bus_dmamap_sync(tag, map, BUS_DMASYNC_PREWRITE)
    notify hardware about buffer
}

Also, the 'allocate map' part should be done carefully. Most network
drivers are lazy and call bus_dmamap_create() and bus_dmamap_destroy()
for each buffer. It's often better to pre-allocate the maps at init
time, put them on a list, and then just push and pop them off the list at
runtime. This is usually faster than calling the busdma