Re: Ethernet over PCIe driver for Inter-Processor Communication
On Thu, Aug 22, 2013 at 02:43:38PM -0700, David Hawkins wrote:
> Hi S.Saravanan,
>
> > I have a custom board with four MPC8640 nodes connected over a
> > transparent PCI Express switch. In this configuration one node is
> > configured as host (Root Complex) and the others as agents (End Point).
> > Thus the legacy PCI software works fine. However, the mainline kernel
> > lacks any standard support for inter-processor communication over PCI.
> > I am in the process of developing an Ethernet over PCI driver for the
> > same, along the lines of rionet. However, I am facing the following
> > problems.
> >
> > a) I can generate MSI interrupts from End Point to Root Complex over
> > PCI, but the reverse is not possible. However, I need a method to
> > interrupt the End Point from the Root Complex to complete my driver.
>
> Root complexes would normally interrupt a device via a PCIe write to
> a register in a BAR on the end-point (or in extended configuration
> space registers, depending on the hardware implementation).
>
> > The only previous reference I can find is this post
> > http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg25765.html
> > However, this uses doorbells and I think may not be possible on the
> > MPC8640.
>
> PCIe drivers need some way to interrupt the processor, so there
> must be an option somewhere ... for example, what are the message
> register interrupts intended for? See p479
>
> http://cache.freescale.com/files/32bit/doc/ref_manual/MPC8641DRM.pdf
>
> (Ira and myself have not used the MPC8640, so we are not familiar with
> its user manual).
>
> > Any pointers on this issue and guidance on this driver development
> > would be helpful.
>
> We use the Ethernet-over-PCI driver that Ira developed. Our next
> boards will use an MPC8308, but we don't currently have any in a
> PCIe device form-factor (just the MPC8308RDB), so he has not
> ported it to PCIe.
>
> Feel free to discuss your ideas for your PCIe driver (e.g., why
> start with rionet rather than Ira's driver), either on-list, or
> email Ira and myself directly.

One further note.

You might want to look at rproc/rpmsg and their virtio driver support.
That seems to be where the Linux world is moving for inter-processor
communications. See for example the ARM CPUs interfacing with DSPs.

Ira
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
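David's remark that a root complex normally interrupts an end-point by writing to a register in one of the end-point's BARs can be modelled in user space. The following is only a sketch under stated assumptions: the `fake_bar` layout, the `DOORBELL_WORD` index, and both helper names are hypothetical and do not correspond to any MPC8640/MPC8641 register. A real driver would map the BAR (for example with pci_iomap()) and use iowrite32() instead of a plain store.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical BAR layout: 16 32-bit registers, doorbell at word 4. */
#define FAKE_BAR_WORDS 16
#define DOORBELL_WORD  4

struct fake_bar {
	volatile uint32_t regs[FAKE_BAR_WORDS];
};

/* Root-complex side: ring the doorbell by setting one event bit. */
static void rc_ring_doorbell(struct fake_bar *bar, unsigned int event)
{
	assert(event < 32);
	bar->regs[DOORBELL_WORD] |= (uint32_t)1 << event;
}

/* End-point side: read and clear pending events, as an ISR might. */
static uint32_t ep_take_events(struct fake_bar *bar)
{
	uint32_t pending = bar->regs[DOORBELL_WORD];

	bar->regs[DOORBELL_WORD] = 0;
	return pending;
}
```

On real hardware the write itself would have to raise an interrupt on the end-point; whether the message registers mentioned above can do that on this part is exactly the open question in the thread.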
Re: recommended method of netbooting kernel/dtb in u-boot?
On Thu, Apr 11, 2013 at 12:39:00PM -0600, Chris Friesen wrote:
> On 04/11/2013 12:12 PM, Kumar Gala wrote:
> >
> > On Apr 11, 2013, at 10:44 AM, Chris Friesen wrote:
> >
> > > Hi all,
> > >
> > > We've got a powerpc system that uses u-boot. In our environment
> > > on bootup u-boot does a DHCP to get networking info, then uses
> > > TFTP to get the kernel, which then does DHCP again and NFS-mounts
> > > the initial root filesystem.
> > >
> > > What's the standard practice for this sort of thing when using
> > > device tree blobs? Do most people use multi-file images or do
> > > they TFTP scripts to load and execute separate kernel/dtb files?
> >
> > We've normally just done multiple tftp fetches and one grabs dtb
> > and one grabs kernel.
>
> Do you hardcode the path to the file in the firmware? Or do you upload
> a script that knows the path to the file?
>
> In our case the path to the boot file(s) depends on which slot the card
> being booted has been inserted in. The DHCP server knows what the path
> is, so it can set dhcpd.conf appropriately, but we need to get that
> information to the firmware on the card being booted.

Hello Chris,

I use a hardware setup which sounds similar to yours. The DHCP server
controls which file is sent to each card.

I use the FIT image format to combine a kernel, dtb, and initrd in one
package. Then the U-Boot command "dhcp $address" sets up the network and
tftp's the filename sent by the DHCP server.

You don't need to invoke the U-Boot command "tftp" if you only have one
image. "dhcp" is enough.

I used the U-Boot doc/uImage.FIT/*.its examples to get started, and
wrote my own custom .its file for my board. I don't use anything other
than the vmlinux.bin.gz provided by the kernel build.

Hope it helps,
Ira
Re: [PATCH v7 0/8] Raid: enable talitos xor offload for improving performance
On Thu, Aug 09, 2012 at 04:19:35PM +0800, qiang@freescale.com wrote:
> Hi all,
>
> The following 8 patches enable fsl-dma and talitos to offload raid
> operations, improving raid performance and balancing CPU load.
>
> These patches include talitos, fsl-dma and the carma module (carma uses
> some features of fsl-dma).
>
> Write performance will be improved by 25-30% tested by iozone.
> Write performance is improved about 2% after replacing
> spin_lock_irqsave with spin_lock_bh.
> CPU load will be reduced by 8%.
>
> fwiw, I gave v5 a test-drive, setting up a RAID5 array on ramdisks [1],
> and this patchseries, along with FSL_DMA && NET_DMA set seems to be
> holding water, so this series gets my:
>
> Tested-by: Kim Phillips kim.phill...@freescale.com

The fsldma parts of the series all look great to me.

Thanks,
Ira

> [1] mdadm --create --verbose --force /dev/md0 --level=raid5 --raid-devices=4 \
>     /dev/ram[0123]
>
> Changes in v7:
>   - add test result which is provided by Kim Phillips;
>   - correct one coding style issue in patch 5/8;
>   - add comments by Arnd Bergmann in patch 6/8;
>
> Changes in v6:
>   - swap the order of original patch 3/6 and 4/6;
>   - merge Ira's patch to reduce the size of original patch;
>   - merge Ira's patch of carma in 8/8;
>   - update documents and descriptions according to Ira's advice;
>
> Changes in v5:
>   - add detail description in patch 3/6 about the process of completed
>     descriptor, the process is in align with fsl-dma Reference Manual,
>     illustrate the potential risk and how to reproduce it;
>   - drop the patch 7/7 in v4 according to Timur's comments;
>
> Changes in v4:
>   - fix an error in talitos when dest addr is same with src addr, dest
>     should be freed only one time if src is same with dest addr;
>   - correct coding style in fsl-dma according to Ira's comments;
>   - fix a race condition in fsl-dma fsl_tx_status(), remove the
>     interface which is used to free descriptors in queue ld_completed,
>     this interface has been included in fsldma_cleanup_descriptor(),
>     in v3, there is one place missed spin_lock protect;
>   - split the original patch 3/4 up to 2 patches 3/7 and 4/7 according
>     to Li Yang's comments;
>   - fix a warning of unitialized cookie;
>   - add memory copy self test in fsl-dma;
>   - add more detail description about use spin_lock_bh() to instead of
>     spin_lock_irqsave() according to Timur's comments.
>
> Changes in v3:
>   - change release process of fsl-dma descriptor for resolve the
>     potential race condition;
>   - add test result when use spin_lock_bh replace spin_lock_irqsave;
>   - modify the benchmark results according to the latest patch.
>
> Changes in v2:
>   - rebase onto cryptodev tree;
>   - split the patch 3/4 up to 3 independent patches;
>   - remove the patch 4/4, the fix is not for cryptodev tree;
>
> Qiang Liu (8):
>   Talitos: Support for async_tx XOR offload
>   fsl-dma: remove attribute DMA_INTERRUPT of dmaengine
>   fsl-dma: add fsl_dma_free_descriptor() to reduce code duplication
>   fsl-dma: move functions to avoid forward declarations
>   fsl-dma: change release process of dma descriptor for supporting async_tx
>   fsl-dma: use spin_lock_bh to instead of spin_lock_irqsave
>   fsl-dma: fix a warning of unitialized cookie
>   carma: remove unnecessary DMA_INTERRUPT capability
>
>  drivers/crypto/Kconfig                  |    9 +
>  drivers/crypto/talitos.c                |  413 ++
>  drivers/crypto/talitos.h                |   53
>  drivers/dma/fsldma.c                    |  488 +--
>  drivers/dma/fsldma.h                    |   17 +-
>  drivers/misc/carma/carma-fpga-program.c |    1 -
>  drivers/misc/carma/carma-fpga.c         |    2 +-
>  7 files changed, 761 insertions(+), 222 deletions(-)
Re: [PATCH v6 5/8] fsl-dma: change release process of dma descriptor for supporting async_tx
On Mon, Aug 06, 2012 at 06:14:33PM +0800, qiang@freescale.com wrote:
> From: Qiang Liu qiang@freescale.com
>
> Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> Async_tx is lack of support in current release process of dma
> descriptor, all descriptors will be released whatever is acked or
> no-acked by async_tx, so there is a potential race condition when dma
> engine is used by other clients (e.g. when enable NET_DMA to offload
> TCP).
>
> In our case, a race condition which is raised when use both of talitos
> and dmaengine to offload xor is because napi scheduler will sync all
> pending requests in dma channels, it affects the process of raid
> operations due to ack_tx is not checked in fsl dma. The no-acked
> descriptor is freed which is submitted just now, as a dependent tx,
> this freed descriptor trigger BUG_ON(async_tx_test_ack(depend_tx)) in
> async_tx_submit().
>
> TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
> GPR00: 0001 ecf41ca0 ee1a94a0 003f 0001 c00593e4 0001
> GPR08: a7a7a7a7 0001 0002 42028042 100a38d4 ed576d98
> GPR16: ed5a11b0 2b162000 0200 0000 2d555000 ed3015e8 c15a7aa0
> GPR24: c155fc40 ecb63220 ecf41d28 ef640bb0 ef640c30 ecf41ca0
> NIP [c02b048c] async_tx_submit+0x6c/0x2b4
> LR [c02b068c] async_tx_submit+0x26c/0x2b4
> Call Trace:
> [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b4 (unreliable)
> [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
> [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
> [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
> [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
> [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
> [ecf41f40] [c04329b8] md_thread+0x138/0x16c
> [ecf41f90] [c008277c] kthread+0x8c/0x90
> [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
>
> Another modification in this patch is the change of completed
> descriptors, there is a potential risk which caused by exception
> interrupt, all descriptors in ld_running list are seemed completed
> when an interrupt raised, it works fine under normal condition, but if
> there is an exception occurred, it cannot work as we expected.
> Hardware should not depend on s/w list, the right way is to read
> current descriptor address register to find the last completed
> descriptor. If an interrupt is raised by an error, all descriptors in
> ld_running should not be seemed finished, or these unfinished
> descriptors in ld_running will be released wrongly.
>
> A simple way to reproduce:
> Enable dmatest first, then insert some bad descriptors which can
> trigger Programming Error interrupts before the good descriptors.
> Last, the good descriptors will be freed before they are processed
> because of the exception interrupt.
>
> Note: the bad descriptors are only for simulating an exception
> interrupt. This case can illustrate the potential risk in current
> fsl-dma very well.
>
> Cc: Dan Williams dan.j.willi...@intel.com
> Cc: Dan Williams dan.j.willi...@gmail.com
> Cc: Vinod Koul vinod.k...@intel.com
> Cc: Li Yang le...@freescale.com
> Signed-off-by: Qiang Liu qiang@freescale.com
> Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu

There are two minor nitpicks below. Other than that, the patch looks
excellent to me.

Ira

> ---
>  drivers/dma/fsldma.c |  232 ++
>  drivers/dma/fsldma.h |   17 +++-
>  2 files changed, 174 insertions(+), 75 deletions(-)
>
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 36490a3..938d8c1 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -472,6 +472,110 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor(struct fsldma_chan *chan)
>  }
>
>  /**
> + * fsldma_clean_completed_descriptor - free all descriptors which
> + * has been completed and acked
> + * @chan: Freescale DMA channel
> + *
> + * This function is used on all completed and acked descriptors.
> + * All descriptors should only be freed in this function.
> + */
> +static void
> +fsldma_clean_completed_descriptor(struct fsldma_chan *chan)
> +{
> +	struct fsl_desc_sw *desc, *_desc;
> +
> +	/* Run the callback for each descriptor, in order */
> +	list_for_each_entry_safe(desc, _desc, &chan->ld_completed, node)
> +		if (async_tx_test_ack(&desc->async_tx))
> +			fsl_dma_free_descriptor(chan, desc);
> +}
> +
> +/**
> + * fsldma_run_tx_complete_actions - cleanup a single link descriptor
> + * @chan: Freescale DMA channel
> + * @desc: descriptor to cleanup and free
> + * @cookie: Freescale DMA transaction identifier
> + *
> + * This function is used on a descriptor which has been executed by the DMA
> + * controller. It will run any callbacks, submit any dependencies.
> + */
> +static dma_cookie_t fsldma_run_tx_complete_actions(struct fsldma_chan *chan,
> +		struct fsl_desc_sw *desc, dma_cookie_t cookie)
> +{
> +	struct dma_async_tx_descriptor *txd = &desc->async_tx;
> +	struct device *dev = chan->common.device->dev
Re: [PATCH v5 3/6] fsl-dma: change release process of dma descriptor for supporting async_tx
On Thu, Aug 02, 2012 at 07:21:51AM +0000, Liu Qiang-B32616 wrote:
> > -----Original Message-----
> > From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> > Sent: Thursday, August 02, 2012 1:25 AM
> > To: Liu Qiang-B32616
> > Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> > linux-ker...@vger.kernel.org; dan.j.willi...@gmail.com; Vinod Koul;
> > herb...@gondor.hengli.com.au; Dan Williams; da...@davemloft.net
> > Subject: Re: [PATCH v5 3/6] fsl-dma: change release process of dma
> > descriptor for supporting async_tx
> >
> > On Wed, Aug 01, 2012 at 04:49:17PM +0800, qiang@freescale.com wrote:
> > > From: Qiang Liu qiang@freescale.com
> > >
> > > Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> > > Async_tx is lack of support in current release process of dma
> > > descriptor, all descriptors will be released whatever is acked or
> > > no-acked by async_tx, so there is a potential race condition when
> > > dma engine is used by other clients (e.g. when enable NET_DMA to
> > > offload TCP).
> > >
> > > In our case, a race condition which is raised when use both of
> > > talitos and dmaengine to offload xor is because napi scheduler
> > > will sync all pending requests in dma channels, it affects the
> > > process of raid operations due to ack_tx is not checked in fsl
> > > dma. The no-acked descriptor is freed which is submitted just now,
> > > as a dependent tx, this freed descriptor trigger
> > > BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
> > >
> > > TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
> > > GPR00: 0001 ecf41ca0 ee1a94a0 003f 0001 c00593e4 0001
> > > GPR08: a7a7a7a7 0001 0002 42028042 100a38d4 ed576d98
> > > GPR16: ed5a11b0 2b162000 0200 0000 2d555000 ed3015e8 c15a7aa0
> > > GPR24: c155fc40 ecb63220 ecf41d28 ef640bb0 ef640c30 ecf41ca0
> > > NIP [c02b048c] async_tx_submit+0x6c/0x2b4
> > > LR [c02b068c] async_tx_submit+0x26c/0x2b4
> > > Call Trace:
> > > [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b4 (unreliable)
> > > [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
> > > [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
> > > [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
> > > [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
> > > [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
> > > [ecf41f40] [c04329b8] md_thread+0x138/0x16c
> > > [ecf41f90] [c008277c] kthread+0x8c/0x90
> > > [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
> > >
> > > Another major modification in this patch is the change to
> > > completed descriptors, there is a potential risk which caused by
> > > exception interrupt, all descriptors in ld_running list are seemed
> > > completed when an interrupt raised, it works fine under normal
> > > condition, but if there is an exception occurred, it cannot work
> > > as we expected. Hardware should not depend on s/w list, the right
> > > way is to read current descriptor address register to find the
> > > last completed descriptor. If an interrupt is raised by an error,
> > > all descriptors in ld_running should not be seemed finished, or
> > > these unfinished descriptors in ld_running will be released
> > > wrongly.
> > >
> > > A simple way to reproduce:
> > > Enable dmatest first, then insert some bad descriptors which can
> > > trigger Programming Error interrupts before the good descriptors.
> > > Last, the good descriptors will be freed before they are processed
> > > because of the exception interrupt.
> > >
> > > Note: the bad descriptors are only for simulating an exception
> > > interrupt. This case can illustrate the potential risk in current
> > > fsl-dma very well.
> >
> > I've never managed to trigger a PE (programming error) interrupt on
> > the 83xx hardware. Any time I intentionally caused an error, the
> > hardware wedged itself. The CB (channel busy) bit is stuck high, and
> > cannot be cleared without a hard reset of the board.
>
> Sorry, the exception indeed will be occurred, actually, the capability
> DMA_INTERRUPT will reproduce the issue under conditions. It will
> trigger a exception because of bad descriptor (length is zero, src and
> dst are zero, a-b-c-bada-badb-d, we cannot find out which one is really
> finished in an interrupt tasklet). So, we'd better consider this case.
> BTW, I have already found out your patch which is used to resolve the
> issue of dma lock,
> http://lkml.indiana.edu/hypermail/linux/kernel/1103.0/01519.html

Ok. I haven't tested bad descriptors since several years ago. I agree
that it can happen, so we should fix it.

> > I agree the snoop on the hardware technique works. As far as I can
> > tell, you have implemented the code correctly.
> >
> > The MPC8349EARM.pdf from Freescale indicates that the hardware will
> > halt in response to a programming error, and generate a PE
> > interrupt. See section 12.5.3.3 (pg 568). The driver, as it is
> > written, will never recover from such a condition.
> >
> > Since you are complaining about this situation, do you intend to
> > fix it?
>
> Frankly, I don't think your patch really can resolve the issue. Now, I
> understand what
Re: [PATCH v5 4/6] fsl-dma: move the function ahead of its invoke function
On Wed, Aug 01, 2012 at 04:49:43PM +0800, qiang@freescale.com wrote:
> From: Qiang Liu qiang@freescale.com
>
> Move the function fsldma_cleanup_descriptor() and
> fsl_chan_xfer_ld_queue() ahead of its invoke function for avoiding
> redundant definition.
>
> Cc: Dan Williams dan.j.willi...@intel.com
> Cc: Vinod Koul vinod.k...@intel.com
> Cc: Li Yang le...@freescale.com
> Signed-off-by: Qiang Liu qiang@freescale.com
> ---
>  drivers/dma/fsldma.c |  252 +-
>  1 files changed, 124 insertions(+), 128 deletions(-)
>
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 87f52c0..bb883c0 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -400,9 +400,6 @@ out_splice:
>  	list_splice_tail_init(&desc->tx_list, &chan->ld_pending);
>  }
>
> -static void fsldma_cleanup_descriptor(struct fsldma_chan *chan);
> -static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan);
> -

Please swap the order of this patch (patch 4/6) and the previous patch
(patch 3/6). You added these lines in the patch 3/6 and deleted them
here. If you reverse the order of the patches, this doesn't happen.

Adding lines only to delete them in the next patch should be avoided.

>  /**
>   * fsldma_clean_completed_descriptor - free all descriptors which
>   * has been completed and acked
> @@ -519,6 +516,130 @@ fsldma_clean_running_descriptor(struct fsldma_chan *chan,
>  	return 0;
>  }
>
> +/**
> + * fsl_chan_xfer_ld_queue - transfer any pending transactions
> + * @chan : Freescale DMA channel
> + *
> + * HARDWARE STATE: idle
> + * LOCKING: must hold chan->desc_lock
> + */
> +static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
> +{
> +	struct fsl_desc_sw *desc;
> +
> +	/*
> +	 * If the list of pending descriptors is empty, then we
> +	 * don't need to do any work at all
> +	 */
> +	if (list_empty(&chan->ld_pending)) {
> +		chan_dbg(chan, "no pending LDs\n");
> +		return;
> +	}
> +
> +	/*
> +	 * The DMA controller is not idle, which means that the interrupt
> +	 * handler will start any queued transactions when it runs after
> +	 * this transaction finishes
> +	 */
> +	if (!chan->idle) {
> +		chan_dbg(chan, "DMA controller still busy\n");
> +		return;
> +	}
> +
> +	/*
> +	 * If there are some link descriptors which have not been
> +	 * transferred, we need to start the controller
> +	 */
> +
> +	/*
> +	 * Move all elements from the queue of pending transactions
> +	 * onto the list of running transactions
> +	 */
> +	chan_dbg(chan, "idle, starting controller\n");
> +	desc = list_first_entry(&chan->ld_pending, struct fsl_desc_sw, node);
> +	list_splice_tail_init(&chan->ld_pending, &chan->ld_running);
> +
> +	/*
> +	 * The 85xx DMA controller doesn't clear the channel start bit
> +	 * automatically at the end of a transfer. Therefore we must clear
> +	 * it in software before starting the transfer.
> +	 */
> +	if ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) {
> +		u32 mode;
> +
> +		mode = DMA_IN(chan, &chan->regs->mr, 32);
> +		mode &= ~FSL_DMA_MR_CS;
> +		DMA_OUT(chan, &chan->regs->mr, mode, 32);
> +	}
> +
> +	/*
> +	 * Program the descriptor's address into the DMA controller,
> +	 * then start the DMA transaction
> +	 */
> +	set_cdar(chan, desc->async_tx.phys);
> +	get_cdar(chan);
> +
> +	dma_start(chan);
> +	chan->idle = false;
> +}
> +
> +/**
> + * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
> + * @chan: Freescale DMA channel
> + * @desc: descriptor to cleanup and free
> + *
> + * This function is used on a descriptor which has been executed by the DMA
> + * controller. It will run any callbacks, submit any dependencies, and then
> + * free the descriptor.
> + */
> +static void fsldma_cleanup_descriptor(struct fsldma_chan *chan)
> +{
> +	struct fsl_desc_sw *desc, *_desc;
> +	dma_cookie_t cookie = 0;
> +	dma_addr_t curr_phys = get_cdar(chan);
> +	int idle = dma_is_idle(chan);
> +	int seen_current = 0;
> +
> +	fsldma_clean_completed_descriptor(chan);
> +
> +	/* Run the callback for each descriptor, in order */
> +	list_for_each_entry_safe(desc, _desc, &chan->ld_running, node) {
> +		/*
> +		 * do not advance past the current descriptor loaded into the
> +		 * hardware channel, subsequent descriptors are either in
> +		 * process or have not been submitted
> +		 */
> +		if (seen_current)
> +			break;
> +
> +		/*
> +		 * stop the search if we reach the current descriptor and the
> +		 * channel is busy
> +		 */
> +		if (desc->async_tx.phys == curr_phys) {
> +			seen_current = 1;
> +			if (!idle)
> +				break;
> +		}
> +
> +		cookie = fsldma_run_tx_complete_actions(desc,
Re: [PATCH v5 2/6] fsl-dma: remove attribute DMA_INTERRUPT of dmaengine
On Wed, Aug 01, 2012 at 04:49:08PM +0800, qiang@freescale.com wrote:
> From: Qiang Liu qiang@freescale.com
>
> Delete attribute DMA_INTERRUPT because fsl-dma doesn't support this
> function, exception will be thrown if talitos is used to offload xor
> at the same time.

I have no problem with this patch. However, it ***WILL BREAK*** both
drivers in drivers/misc/carma.

Please add my patch 7/7 titled "[PATCH 7/7] carma: remove unnecessary
DMA_INTERRUPT capability" to your series. I suggest placing it
immediately after this patch in your series. The carma drivers use the
fsldma driver exclusively.

> Cc: Dan Williams dan.j.willi...@intel.com
> Cc: Vinod Koul vinod.k...@intel.com
> Cc: Li Yang le...@freescale.com
> Signed-off-by: Qiang Liu qiang@freescale.com

Acked-by: Ira W. Snyder i...@ovro.caltech.edu

> ---
>  drivers/dma/fsldma.c |   31 ---
>  1 files changed, 0 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 8f84761..4f2f212 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -543,35 +543,6 @@ static void fsl_dma_free_chan_resources(struct dma_chan *dchan)
>  }
>
>  static struct dma_async_tx_descriptor *
> -fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags)
> -{
> -	struct fsldma_chan *chan;
> -	struct fsl_desc_sw *new;
> -
> -	if (!dchan)
> -		return NULL;
> -
> -	chan = to_fsl_chan(dchan);
> -
> -	new = fsl_dma_alloc_descriptor(chan);
> -	if (!new) {
> -		chan_err(chan, "%s\n", msg_ld_oom);
> -		return NULL;
> -	}
> -
> -	new->async_tx.cookie = -EBUSY;
> -	new->async_tx.flags = flags;
> -
> -	/* Insert the link descriptor to the LD ring */
> -	list_add_tail(&new->node, &new->tx_list);
> -
> -	/* Set End-of-link to the last link descriptor of new list */
> -	set_ld_eol(chan, new);
> -
> -	return &new->async_tx;
> -}
> -
> -static struct dma_async_tx_descriptor *
>  fsl_dma_prep_memcpy(struct dma_chan *dchan,
>  	dma_addr_t dma_dst, dma_addr_t dma_src,
>  	size_t len, unsigned long flags)
> @@ -1352,12 +1323,10 @@ static int __devinit fsldma_of_probe(struct platform_device *op)
>  	fdev->irq = irq_of_parse_and_map(op->dev.of_node, 0);
>
>  	dma_cap_set(DMA_MEMCPY, fdev->common.cap_mask);
> -	dma_cap_set(DMA_INTERRUPT, fdev->common.cap_mask);
>  	dma_cap_set(DMA_SG, fdev->common.cap_mask);
>  	dma_cap_set(DMA_SLAVE, fdev->common.cap_mask);
>  	fdev->common.device_alloc_chan_resources = fsl_dma_alloc_chan_resources;
>  	fdev->common.device_free_chan_resources = fsl_dma_free_chan_resources;
> -	fdev->common.device_prep_dma_interrupt = fsl_dma_prep_interrupt;
>  	fdev->common.device_prep_dma_memcpy = fsl_dma_prep_memcpy;
>  	fdev->common.device_prep_dma_sg = fsl_dma_prep_sg;
>  	fdev->common.device_tx_status = fsl_tx_status;
> --
> 1.7.5.1
Re: [PATCH v5 3/6] fsl-dma: change release process of dma descriptor for supporting async_tx
On Wed, Aug 01, 2012 at 04:49:17PM +0800, qiang@freescale.com wrote:
> From: Qiang Liu qiang@freescale.com
>
> Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> Async_tx is lack of support in current release process of dma
> descriptor, all descriptors will be released whatever is acked or
> no-acked by async_tx, so there is a potential race condition when dma
> engine is used by other clients (e.g. when enable NET_DMA to offload
> TCP).
>
> In our case, a race condition which is raised when use both of talitos
> and dmaengine to offload xor is because napi scheduler will sync all
> pending requests in dma channels, it affects the process of raid
> operations due to ack_tx is not checked in fsl dma. The no-acked
> descriptor is freed which is submitted just now, as a dependent tx,
> this freed descriptor trigger BUG_ON(async_tx_test_ack(depend_tx)) in
> async_tx_submit().
>
> TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
> GPR00: 0001 ecf41ca0 ee1a94a0 003f 0001 c00593e4 0001
> GPR08: a7a7a7a7 0001 0002 42028042 100a38d4 ed576d98
> GPR16: ed5a11b0 2b162000 0200 0000 2d555000 ed3015e8 c15a7aa0
> GPR24: c155fc40 ecb63220 ecf41d28 ef640bb0 ef640c30 ecf41ca0
> NIP [c02b048c] async_tx_submit+0x6c/0x2b4
> LR [c02b068c] async_tx_submit+0x26c/0x2b4
> Call Trace:
> [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b4 (unreliable)
> [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
> [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
> [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
> [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
> [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
> [ecf41f40] [c04329b8] md_thread+0x138/0x16c
> [ecf41f90] [c008277c] kthread+0x8c/0x90
> [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
>
> Another major modification in this patch is the change to completed
> descriptors, there is a potential risk which caused by exception
> interrupt, all descriptors in ld_running list are seemed completed
> when an interrupt raised, it works fine under normal condition, but if
> there is an exception occurred, it cannot work as we expected.
> Hardware should not depend on s/w list, the right way is to read
> current descriptor address register to find the last completed
> descriptor. If an interrupt is raised by an error, all descriptors in
> ld_running should not be seemed finished, or these unfinished
> descriptors in ld_running will be released wrongly.
>
> A simple way to reproduce:
> Enable dmatest first, then insert some bad descriptors which can
> trigger Programming Error interrupts before the good descriptors.
> Last, the good descriptors will be freed before they are processed
> because of the exception interrupt.
>
> Note: the bad descriptors are only for simulating an exception
> interrupt. This case can illustrate the potential risk in current
> fsl-dma very well.

I've never managed to trigger a PE (programming error) interrupt on the
83xx hardware. Any time I intentionally caused an error, the hardware
wedged itself. The CB (channel busy) bit is stuck high, and cannot be
cleared without a hard reset of the board.

I agree the snoop on the hardware technique works. As far as I can tell,
you have implemented the code correctly.

The MPC8349EARM.pdf from Freescale indicates that the hardware will halt
in response to a programming error, and generate a PE interrupt. See
section 12.5.3.3 (pg 568). The driver, as it is written, will never
recover from such a condition.

Since you are complaining about this situation, do you intend to fix it?

> Cc: Dan Williams dan.j.willi...@intel.com
> Cc: Dan Williams dan.j.willi...@gmail.com
> Cc: Vinod Koul vinod.k...@intel.com
> Cc: Li Yang le...@freescale.com
> Cc: Ira W. Snyder i...@ovro.caltech.edu
> Signed-off-by: Qiang Liu qiang@freescale.com
> ---
>  drivers/dma/fsldma.c |  242 +++---
>  drivers/dma/fsldma.h |    1 +
>  2 files changed, 172 insertions(+), 71 deletions(-)
>
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 4f2f212..87f52c0 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -400,6 +400,125 @@ out_splice:
>  	list_splice_tail_init(&desc->tx_list, &chan->ld_pending);
>  }
>
> +static void fsldma_cleanup_descriptor(struct fsldma_chan *chan);
> +static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan);
> +

As noted in my reply to patch 4/6, please swap the order of this patch
and the following patch. These lines should not be added or removed in
either patch.

> +/**
> + * fsldma_clean_completed_descriptor - free all descriptors which
> + * has been completed and acked
> + * @chan: Freescale DMA channel
> + *
> + * This function is used on all completed and acked descriptors.
> + * All descriptors should only be freed in this function.
> + */
> +static int
> +fsldma_clean_completed_descriptor(struct fsldma_chan *chan)

This should be 'static void'. It does not return an error
Re: [PATCH v5 6/6] fsl-dma: fix a warning of unitialized cookie
On Wed, Aug 01, 2012 at 04:50:27PM +0800, qiang@freescale.com wrote:
> From: Qiang Liu qiang@freescale.com
>
> Fix a warning of unitialized value when compile with -Wuninitialized.

Looks good to me.

Acked-by: Ira W. Snyder i...@ovro.caltech.edu

> Cc: Dan Williams dan.j.willi...@intel.com
> Cc: Vinod Koul vinod.k...@intel.com
> Cc: Li Yang le...@freescale.com
> Signed-off-by: Qiang Liu qiang@freescale.com
> Reported-by: Kim Phillips kim.phill...@freescale.com
> ---
>  drivers/dma/fsldma.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index e3814aa..6fc22eb 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -645,7 +645,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
>  	struct fsldma_chan *chan = to_fsl_chan(tx->chan);
>  	struct fsl_desc_sw *desc = tx_to_fsl_desc(tx);
>  	struct fsl_desc_sw *child;
> -	dma_cookie_t cookie;
> +	dma_cookie_t cookie = 0;
>
>  	spin_lock_bh(&chan->desc_lock);
>
> --
> 1.7.5.1
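The warning fixed above typically appears when a variable is assigned only inside a loop that the compiler cannot prove will run at least once. The sketch below shows that pattern in user space. It assumes, without the full function being visible in this thread, that the driver assigns `cookie` while walking the descriptor's child list; `cookie_t`, `struct child_desc`, and `assign_cookies()` are hypothetical stand-ins, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t cookie_t;	/* stand-in for the kernel's dma_cookie_t */

/* Hypothetical child descriptor on a singly linked list. */
struct child_desc {
	cookie_t cookie;
	struct child_desc *next;
};

/*
 * Assign increasing cookies to every child and return the last one.
 * Initializing `cookie` to 0 gives a defined result for an empty list,
 * which is what silences -Wuninitialized.
 */
static cookie_t assign_cookies(struct child_desc *head, cookie_t *last_used)
{
	cookie_t cookie = 0;
	struct child_desc *child;

	assert(last_used != NULL);
	for (child = head; child != NULL; child = child->next) {
		cookie = ++(*last_used);
		child->cookie = cookie;
	}
	return cookie;
}
```

With the initializer removed, a compiler may warn that `cookie` can be returned unassigned when the list is empty, even if callers never pass an empty list in practice.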
Re: [PATCH v5 5/6] fsl-dma: use spin_lock_bh to instead of spin_lock_irqsave
On Wed, Aug 01, 2012 at 04:50:09PM +0800, qiang@freescale.com wrote:
> From: Qiang Liu qiang@freescale.com
>
> - use spin_lock_bh() is the right way to use async_tx api,
> dma_run_dependencies() should not be protected by spin_lock_irqsave();
> - use spin_lock_bh to instead of spin_lock_irqsave for improving
> performance, There is not any place to access descriptor queues in
> fsl-dma ISR except its tasklet, spin_lock_bh() is more proper here.
> Interrupts will be turned off and context will be save in irqsave,
> there is needless to use irqsave..

This description is not very clear English. I understand it is not your
native language. Let me try to help.

  The use of spin_lock_irqsave() is a stronger locking mechanism than is
  required throughout the driver. The minimum locking required should be
  used instead.

  Change all instances of spin_lock_irqsave() to spin_lock_bh(). All
  manipulation of protected fields is done using tasklet context or
  weaker, which makes spin_lock_bh() the correct choice.

Other than that,
Acked-by: Ira W. Snyder i...@ovro.caltech.edu

> Cc: Dan Williams dan.j.willi...@intel.com
> Cc: Vinod Koul vinod.k...@intel.com
> Cc: Li Yang le...@freescale.com
> Cc: Timur Tabi ti...@freescale.com
> Signed-off-by: Qiang Liu qiang@freescale.com
> ---
>  drivers/dma/fsldma.c |   30 --
>  1 files changed, 12 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index bb883c0..e3814aa 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -645,10 +645,9 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
>  	struct fsldma_chan *chan = to_fsl_chan(tx->chan);
>  	struct fsl_desc_sw *desc = tx_to_fsl_desc(tx);
>  	struct fsl_desc_sw *child;
> -	unsigned long flags;
>  	dma_cookie_t cookie;
>
> -	spin_lock_irqsave(&chan->desc_lock, flags);
> +	spin_lock_bh(&chan->desc_lock);
>
>  	/*
>  	 * assign cookies to all of the software descriptors
> @@ -661,7 +660,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
>  	/* put this transaction onto the tail of the pending queue */
>  	append_ld_queue(chan, desc);
>
> -	spin_unlock_irqrestore(&chan->desc_lock, flags);
> +	spin_unlock_bh(&chan->desc_lock);
>
>  	return cookie;
>  }
> @@ -770,15 +769,14 @@ static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
>  static void fsl_dma_free_chan_resources(struct dma_chan *dchan)
>  {
>  	struct fsldma_chan *chan = to_fsl_chan(dchan);
> -	unsigned long flags;
>
>  	chan_dbg(chan, "free all channel resources\n");
> -	spin_lock_irqsave(&chan->desc_lock, flags);
> +	spin_lock_bh(&chan->desc_lock);
>  	fsldma_cleanup_descriptor(chan);
>  	fsldma_free_desc_list(chan, &chan->ld_pending);
>  	fsldma_free_desc_list(chan, &chan->ld_running);
>  	fsldma_free_desc_list(chan, &chan->ld_completed);
> -	spin_unlock_irqrestore(&chan->desc_lock, flags);
> +	spin_unlock_bh(&chan->desc_lock);
>
>  	dma_pool_destroy(chan->desc_pool);
>  	chan->desc_pool = NULL;
> @@ -997,7 +995,6 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
>  {
>  	struct dma_slave_config *config;
>  	struct fsldma_chan *chan;
> -	unsigned long flags;
>  	int size;
>
>  	if (!dchan)
> @@ -1007,7 +1004,7 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
>
>  	switch (cmd) {
>  	case DMA_TERMINATE_ALL:
> -		spin_lock_irqsave(&chan->desc_lock, flags);
> +		spin_lock_bh(&chan->desc_lock);
>
>  		/* Halt the DMA engine */
>  		dma_halt(chan);
> @@ -1017,7 +1014,7 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
>  		fsldma_free_desc_list(chan, &chan->ld_running);
>  		chan->idle = true;
>
> -		spin_unlock_irqrestore(&chan->desc_lock, flags);
> +		spin_unlock_bh(&chan->desc_lock);
>  		return 0;
>
>  	case DMA_SLAVE_CONFIG:
> @@ -1059,11 +1056,10 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
>  static void fsl_dma_memcpy_issue_pending(struct dma_chan *dchan)
>  {
>  	struct fsldma_chan *chan = to_fsl_chan(dchan);
> -	unsigned long flags;
>
> -	spin_lock_irqsave(&chan->desc_lock, flags);
> +	spin_lock_bh(&chan->desc_lock);
>  	fsl_chan_xfer_ld_queue(chan);
> -	spin_unlock_irqrestore(&chan->desc_lock, flags);
> +	spin_unlock_bh(&chan->desc_lock);
>  }
>
>  /**
> @@ -1076,15 +1072,14 @@ static enum dma_status fsl_tx_status(struct dma_chan *dchan,
>  {
>  	struct fsldma_chan *chan = to_fsl_chan(dchan);
>  	enum dma_status ret;
> -	unsigned long flags;
>
>  	ret = dma_cookie_status(dchan, cookie, txstate);
>  	if (ret == DMA_SUCCESS)
>  		return ret;
>
> -	spin_lock_irqsave(&chan->desc_lock, flags);
> +	spin_lock_bh(&chan->desc_lock);
>  	fsldma_cleanup_descriptor(chan);
> -	spin_unlock_irqrestore(&chan->desc_lock, flags);
> +	spin_unlock_bh(&chan->desc_lock);
>
>  	return
Re: [PATCH v4 3/7] fsl-dma: change release process of dma descriptor for supporting async_tx
On Tue, Jul 31, 2012 at 04:09:28AM +, Liu Qiang-B32616 wrote:
> > -Original Message-
> > From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
> > Sent: Tuesday, July 31, 2012 5:10 AM
> > To: Liu Qiang-B32616
> > Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Phillips
> > Kim-R1AAHA; herb...@gondor.hengli.com.au; da...@davemloft.net; Dan
> > Williams; Vinod Koul; Li Yang-R58472
> > Subject: Re: [PATCH v4 3/7] fsl-dma: change release process of dma
> > descriptor for supporting async_tx
> >
> > On Fri, Jul 27, 2012 at 05:16:09PM +0800, qiang@freescale.com wrote:
> > > From: Qiang Liu qiang@freescale.com
> > >
> > > Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> > > Async_tx is lack of support in current release process of dma descriptor,
> > > all descriptors will be released whatever is acked or no-acked by async_tx,
> > > so there is a potential race condition when dma engine is uesd by others
> > > clients (e.g. when enable NET_DMA to offload TCP).
> > >
> > > In our case, a race condition which is raised when use both of talitos
> > > and dmaengine to offload xor is because napi scheduler will sync all
> > > pending requests in dma channels, it affects the process of raid operations
> > > due to ack_tx is not checked in fsl dma. The no-acked descriptor is freed
> > > which is submitted just now, as a dependent tx, this freed descriptor
> > > trigger BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
> > >
> >
> > I'm preparing an alternative version of this patch that I think is easier
> > to understand (it is much shorter). I'll post it up here as soon as I
> > finish testing.
>
> Can you give a simple description/idea about your patch?
> My patch is for fix the problems when I build a raid environment with
> talitos offload xor. I think the new interface is clear enough and similar
> with the implement of other dma devices.
> And do you have any comments about this patch?
>

My patch will fix the same problem, in a simpler way. It will not involve
checking if the hardware is finished with a descriptor on ld_running.

> > It would be nice to know how to easily reproduce this bug, without needing
> > to set up a RAID system. I don't have access to any such hardware. A driver
> > similar to drivers/dma/dmatest.c (using the async_tx API instead) would be
> > wonderful.
>
> You can refer to raid5.c if you do not want to use hardware. Or you can use
> you ram (or other storage devices) to build a raid env to test.
>
> Thanks.
>
> >
> > Thanks,
> > Ira
> >
> > > TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
> > > GPR00: 0001 ecf41ca0 ee1a94a0 003f 0001 c00593e4 0001
> > > GPR08: a7a7a7a7 0001 0002 42028042 100a38d4 ed576d98
> > > GPR16: ed5a11b0 2b162000 0200 0000 2d555000 ed3015e8 c15a7aa0
> > > GPR24: c155fc40 ecb63220 ecf41d28 ef640bb0 ef640c30 ecf41ca0
> > > NIP [c02b048c] async_tx_submit+0x6c/0x2b4
> > > LR [c02b068c] async_tx_submit+0x26c/0x2b4
> > > Call Trace:
> > > [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b4 (unreliable)
> > > [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
> > > [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
> > > [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
> > > [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
> > > [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
> > > [ecf41f40] [c04329b8] md_thread+0x138/0x16c
> > > [ecf41f90] [c008277c] kthread+0x8c/0x90
> > > [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
> > >
> > > Cc: Dan Williams dan.j.willi...@intel.com
> > > Cc: Vinod Koul vinod.k...@intel.com
> > > Cc: Li Yang le...@freescale.com
> > > Cc: Ira W. Snyder i...@ovro.caltech.edu
> > > Signed-off-by: Qiang Liu qiang@freescale.com
> > > ---
> > >  drivers/dma/fsldma.c |  242 +++---
> > >  drivers/dma/fsldma.h |    1 +
> > >  2 files changed, 172 insertions(+), 71 deletions(-)
> > >
> > > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> > > index 4f2f212..87f52c0 100644
> > > --- a/drivers/dma/fsldma.c
> > > +++ b/drivers/dma/fsldma.c
> > > @@ -400,6 +400,125 @@ out_splice:
> > >  	list_splice_tail_init(&desc->tx_list, &chan->ld_pending);
> > >  }
> > >
> > > +static void fsldma_cleanup_descriptor(struct fsldma_chan *chan);
> > > +static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan);
> > > +

You should have re-arranged the patches to avoid introducing these forward
declarations in this patch and then deleting them in the next patch. I
reversed the order in my patch series.

> > > +/**
> > > + * fsldma_clean_completed_descriptor - free all descriptors which
> > > + * has been completed and acked
> > > + * @chan: Freescale DMA channel
> > > + *
> > > + * This function is used on all completed and acked descriptors.
> > > + * All descriptors should only be freed in this function.
> > > + */
> > > +static int
> > > +fsldma_clean_completed_descriptor(struct fsldma_chan *chan)
> > > +{
> > > +	struct fsl_desc_sw *desc, *_desc;
> > > +
> > > +	/* Run the callback for each descriptor, in order */
> > > +	list_for_each_entry_safe(desc, _desc, &chan->ld_completed
[PATCH 0/7] fsl-dma: fixes for Freescale DMA driver
From: Ira W. Snyder i...@ovro.caltech.edu

Hello everyone,

This is my alternative (simpler) attempt at solving the problems reported
by Qiang Liu with the async_tx API and MD RAID hardware offload support
when using the Freescale DMA driver.

The bug is caused by this driver freeing descriptors before they have been
ACKed by software using the async_tx API.

I don't like Qiang Liu's code to check where the hardware is in the
processing of the descriptor chain, and try to free a partial list of
descriptors. This was a source of bugs in this driver before I fixed them
several years ago.

Instead, the DMA controller raises an interrupt every time it has completed
a descriptor chain. This means it is ready for new descriptors: no need to
try and figure out where it is in the middle of a descriptor chain.

Qiang Liu: I do not have a hardware setup capable of using MD RAID. Please
test these patches to see if they fix the bug you reported. You may use
these patches as-is, or build upon them.

I have tested this using the drivers/dma/dmatest.c driver, as well as the
CARMA drivers. There are no regressions that I can find.

[ 355.069679] dma0chan3-copy0: terminating after 10 tests, 0 failures (status 0)
[ 355.192278] dma0chan2-copy0: terminating after 10 tests, 0 failures (status 0)

Ira W. Snyder (5):
  fsl-dma: minimize locking overhead
  fsl-dma: add fsl_dma_free_descriptor() to reduce code duplication
  fsl-dma: move functions to avoid forward declarations
  fsl-dma: fix support for async_tx API
  carma: remove unnecessary DMA_INTERRUPT capability

Qiang Liu (2):
  fsl-dma: remove attribute DMA_INTERRUPT of dmaengine
  fsl-dma: fix a warning of unitialized cookie

 drivers/dma/fsldma.c                    |  318 +++
 drivers/dma/fsldma.h                    |    1 +
 drivers/misc/carma/carma-fpga-program.c |    1 -
 drivers/misc/carma/carma-fpga.c         |    3 +-
 4 files changed, 159 insertions(+), 164 deletions(-)

-- 
1.7.8.6
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
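The ACK rule stated in the cover letter (a descriptor may only be freed once
software has ACKed it) can be illustrated with a small userspace model. This
is only a sketch under stated assumptions: struct toy_desc, toy_push(),
toy_cleanup() and toy_count() are hypothetical names invented for the
example, and the singly linked list stands in for the driver's real
descriptor lists. It is not the fsldma implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DMA descriptor; not the fsldma structure. */
struct toy_desc {
	bool completed;		/* the hardware has finished with it */
	bool acked;		/* the client has ACKed it */
	struct toy_desc *next;
};

/* Push a new descriptor on the front of the list and return the new head. */
static struct toy_desc *toy_push(struct toy_desc *head, bool completed,
				 bool acked)
{
	struct toy_desc *d = malloc(sizeof(*d));

	if (!d)
		abort();
	d->completed = completed;
	d->acked = acked;
	d->next = head;
	return d;
}

/*
 * Free only descriptors that are both completed and ACKed.
 * Completed but un-ACKed descriptors stay on the list, because a client
 * may still attach a dependent operation to them.
 * Returns the number of descriptors freed.
 */
static size_t toy_cleanup(struct toy_desc **head)
{
	struct toy_desc **link = head;
	size_t freed = 0;

	while (*link) {
		struct toy_desc *d = *link;

		if (d->completed && d->acked) {
			*link = d->next;	/* unlink, then free */
			free(d);
			freed++;
		} else {
			link = &d->next;	/* keep it for a later pass */
		}
	}
	return freed;
}

/* Count the descriptors still on the list. */
static size_t toy_count(const struct toy_desc *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

Calling toy_cleanup() repeatedly is harmless in this model: a descriptor that
completes early simply waits on the list until its owner ACKs it.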
[PATCH 1/7] fsl-dma: remove attribute DMA_INTERRUPT of dmaengine
From: Qiang Liu qiang@freescale.com

Delete attribute DMA_INTERRUPT because fsl-dma doesn't support this
function, exception will be thrown if talitos is used to offload xor at the
same time.

Cc: Dan Williams dan.j.willi...@intel.com
Cc: Vinod Koul vinod.k...@intel.com
Cc: Li Yang le...@freescale.com
Signed-off-by: Qiang Liu qiang@freescale.com
Acked-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   31 ---
 1 files changed, 0 insertions(+), 31 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 8f84761..4f2f212 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -543,35 +543,6 @@ static void fsl_dma_free_chan_resources(struct dma_chan *dchan)
 }
 
 static struct dma_async_tx_descriptor *
-fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags)
-{
-	struct fsldma_chan *chan;
-	struct fsl_desc_sw *new;
-
-	if (!dchan)
-		return NULL;
-
-	chan = to_fsl_chan(dchan);
-
-	new = fsl_dma_alloc_descriptor(chan);
-	if (!new) {
-		chan_err(chan, "%s\n", msg_ld_oom);
-		return NULL;
-	}
-
-	new->async_tx.cookie = -EBUSY;
-	new->async_tx.flags = flags;
-
-	/* Insert the link descriptor to the LD ring */
-	list_add_tail(&new->node, &new->tx_list);
-
-	/* Set End-of-link to the last link descriptor of new list */
-	set_ld_eol(chan, new);
-
-	return &new->async_tx;
-}
-
-static struct dma_async_tx_descriptor *
 fsl_dma_prep_memcpy(struct dma_chan *dchan,
 	dma_addr_t dma_dst, dma_addr_t dma_src,
 	size_t len, unsigned long flags)
@@ -1352,12 +1323,10 @@ static int __devinit fsldma_of_probe(struct platform_device *op)
 	fdev->irq = irq_of_parse_and_map(op->dev.of_node, 0);
 
 	dma_cap_set(DMA_MEMCPY, fdev->common.cap_mask);
-	dma_cap_set(DMA_INTERRUPT, fdev->common.cap_mask);
 	dma_cap_set(DMA_SG, fdev->common.cap_mask);
 	dma_cap_set(DMA_SLAVE, fdev->common.cap_mask);
 	fdev->common.device_alloc_chan_resources = fsl_dma_alloc_chan_resources;
 	fdev->common.device_free_chan_resources = fsl_dma_free_chan_resources;
-	fdev->common.device_prep_dma_interrupt = fsl_dma_prep_interrupt;
 	fdev->common.device_prep_dma_memcpy = fsl_dma_prep_memcpy;
 	fdev->common.device_prep_dma_sg = fsl_dma_prep_sg;
 	fdev->common.device_tx_status = fsl_tx_status;
-- 
1.7.8.6
[PATCH 3/7] fsl-dma: add fsl_dma_free_descriptor() to reduce code duplication
From: Ira W. Snyder i...@ovro.caltech.edu

There are several places where descriptors are freed using identical code.
Put this code into a function to reduce code duplication.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   32 ++--
 1 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 8f0505d..c34a628 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -400,6 +400,15 @@ out_splice:
 	list_splice_tail_init(&desc->tx_list, &chan->ld_pending);
 }
 
+static void fsl_dma_free_descriptor(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
+{
+	list_del(&desc->node);
+#ifdef FSL_DMA_LD_DEBUG
+	chan_dbg(chan, "LD %p free\n", desc);
+#endif
+	dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
+}
+
 static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 {
 	struct fsldma_chan *chan = to_fsl_chan(tx->chan);
@@ -499,13 +508,8 @@ static void fsldma_free_desc_list(struct fsldma_chan *chan,
 {
 	struct fsl_desc_sw *desc, *_desc;
 
-	list_for_each_entry_safe(desc, _desc, list, node) {
-		list_del(&desc->node);
-#ifdef FSL_DMA_LD_DEBUG
-		chan_dbg(chan, "LD %p free\n", desc);
-#endif
-		dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
-	}
+	list_for_each_entry_safe(desc, _desc, list, node)
+		fsl_dma_free_descriptor(chan, desc);
 }
 
 static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
@@ -513,13 +517,8 @@ static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
 {
 	struct fsl_desc_sw *desc, *_desc;
 
-	list_for_each_entry_safe_reverse(desc, _desc, list, node) {
-		list_del(&desc->node);
-#ifdef FSL_DMA_LD_DEBUG
-		chan_dbg(chan, "LD %p free\n", desc);
-#endif
-		dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
-	}
+	list_for_each_entry_safe_reverse(desc, _desc, list, node)
+		fsl_dma_free_descriptor(chan, desc);
 }
 
 /**
@@ -852,10 +851,7 @@ static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
 			dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
 	}
 
-#ifdef FSL_DMA_LD_DEBUG
-	chan_dbg(chan, "LD %p free\n", desc);
-#endif
-	dma_pool_free(chan->desc_pool, desc, txd->phys);
+	fsl_dma_free_descriptor(chan, desc);
 }
 
 /**
-- 
1.7.8.6
[PATCH 2/7] fsl-dma: minimize locking overhead
From: Ira W. Snyder i...@ovro.caltech.edu

The use of spin_lock_irqsave() was a stronger locking mechanism than was
actually needed in many cases.

As the current code is written, spin_lock_bh() everywhere is sufficient.
The next patch in this series will add some code to hardware interrupt
context. This patch is intended to minimize the differences in the next
patch and make review easier.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   25 ++---
 1 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 4f2f212..8f0505d 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -405,10 +405,9 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 	struct fsldma_chan *chan = to_fsl_chan(tx->chan);
 	struct fsl_desc_sw *desc = tx_to_fsl_desc(tx);
 	struct fsl_desc_sw *child;
-	unsigned long flags;
 	dma_cookie_t cookie;
 
-	spin_lock_irqsave(&chan->desc_lock, flags);
+	spin_lock_irq(&chan->desc_lock);
 
 	/*
 	 * assign cookies to all of the software descriptors
@@ -421,7 +420,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 	/* put this transaction onto the tail of the pending queue */
 	append_ld_queue(chan, desc);
 
-	spin_unlock_irqrestore(&chan->desc_lock, flags);
+	spin_unlock_irq(&chan->desc_lock);
 
 	return cookie;
 }
@@ -530,13 +529,12 @@ static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
 static void fsl_dma_free_chan_resources(struct dma_chan *dchan)
 {
 	struct fsldma_chan *chan = to_fsl_chan(dchan);
-	unsigned long flags;
 
 	chan_dbg(chan, "free all channel resources\n");
-	spin_lock_irqsave(&chan->desc_lock, flags);
+	spin_lock_irq(&chan->desc_lock);
 	fsldma_free_desc_list(chan, &chan->ld_pending);
 	fsldma_free_desc_list(chan, &chan->ld_running);
-	spin_unlock_irqrestore(&chan->desc_lock, flags);
+	spin_unlock_irq(&chan->desc_lock);
 
 	dma_pool_destroy(chan->desc_pool);
 	chan->desc_pool = NULL;
@@ -755,7 +753,6 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 {
 	struct dma_slave_config *config;
 	struct fsldma_chan *chan;
-	unsigned long flags;
 	int size;
 
 	if (!dchan)
@@ -765,7 +762,7 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 
 	switch (cmd) {
 	case DMA_TERMINATE_ALL:
-		spin_lock_irqsave(&chan->desc_lock, flags);
+		spin_lock_irq(&chan->desc_lock);
 
 		/* Halt the DMA engine */
 		dma_halt(chan);
@@ -775,7 +772,7 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 		fsldma_free_desc_list(chan, &chan->ld_running);
 		chan->idle = true;
 
-		spin_unlock_irqrestore(&chan->desc_lock, flags);
+		spin_unlock_irq(&chan->desc_lock);
 		return 0;
 
 	case DMA_SLAVE_CONFIG:
@@ -935,11 +932,10 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 static void fsl_dma_memcpy_issue_pending(struct dma_chan *dchan)
 {
 	struct fsldma_chan *chan = to_fsl_chan(dchan);
-	unsigned long flags;
 
-	spin_lock_irqsave(&chan->desc_lock, flags);
+	spin_lock_irq(&chan->desc_lock);
 	fsl_chan_xfer_ld_queue(chan);
-	spin_unlock_irqrestore(&chan->desc_lock, flags);
+	spin_unlock_irq(&chan->desc_lock);
 }
 
 /**
@@ -952,11 +948,10 @@ static enum dma_status fsl_tx_status(struct dma_chan *dchan,
 {
 	struct fsldma_chan *chan = to_fsl_chan(dchan);
 	enum dma_status ret;
-	unsigned long flags;
 
-	spin_lock_irqsave(&chan->desc_lock, flags);
+	spin_lock_irq(&chan->desc_lock);
 	ret = dma_cookie_status(dchan, cookie, txstate);
-	spin_unlock_irqrestore(&chan->desc_lock, flags);
+	spin_unlock_irq(&chan->desc_lock);
 
 	return ret;
 }
-- 
1.7.8.6
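The diff in this patch swaps spin_lock_irqsave()/spin_unlock_irqrestore() for
spin_lock_irq()/spin_unlock_irq(). The practical difference between the two
pairs is what happens to the interrupt state on unlock, which the following
toy userspace model tries to show. It is a sketch only: it takes no lock at
all, and toy_irqs_enabled and every toy_* helper are hypothetical names that
merely mimic the save/restore behaviour of the kernel primitives.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the local CPU's "interrupts enabled" flag. */
static bool toy_irqs_enabled = true;

/* Models spin_lock_irqsave(): remember the previous state, then disable. */
static bool toy_lock_irqsave(void)
{
	bool flags = toy_irqs_enabled;

	toy_irqs_enabled = false;
	return flags;
}

/* Models spin_unlock_irqrestore(): put back whatever the caller had. */
static void toy_unlock_irqrestore(bool flags)
{
	toy_irqs_enabled = flags;
}

/* Models spin_lock_irq(): disable unconditionally. */
static void toy_lock_irq(void)
{
	toy_irqs_enabled = false;
}

/* Models spin_unlock_irq(): enable unconditionally. */
static void toy_unlock_irq(void)
{
	toy_irqs_enabled = true;
}

/* Interrupt state after an irqsave/irqrestore pair. */
static bool toy_after_irqsave_pair(bool enabled_on_entry)
{
	bool flags;

	toy_irqs_enabled = enabled_on_entry;
	flags = toy_lock_irqsave();
	toy_unlock_irqrestore(flags);
	return toy_irqs_enabled;
}

/* Interrupt state after an irq/unlock_irq pair. */
static bool toy_after_irq_pair(bool enabled_on_entry)
{
	toy_irqs_enabled = enabled_on_entry;
	toy_lock_irq();
	toy_unlock_irq();
	return toy_irqs_enabled;
}
```

In this model the _irq pair leaves interrupts enabled even when the caller
entered with them disabled, so it is only a safe substitute in code paths
that are known to run with interrupts enabled.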
[PATCH 4/7] fsl-dma: move functions to avoid forward declarations
From: Ira W. Snyder i...@ovro.caltech.edu

This function will be modified in the next patch in the series. By moving
the function in a patch separate from the changes, it will make review
easier.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   96 +-
 1 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index c34a628..80edc63 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -409,6 +409,54 @@ static void fsl_dma_free_descriptor(struct fsldma_chan *chan, struct fsl_desc_sw
 	dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
 }
 
+/**
+ * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
+ * @chan: Freescale DMA channel
+ * @desc: descriptor to cleanup and free
+ *
+ * This function is used on a descriptor which has been executed by the DMA
+ * controller. It will run any callbacks, submit any dependencies, and then
+ * free the descriptor.
+ */
+static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
+				      struct fsl_desc_sw *desc)
+{
+	struct dma_async_tx_descriptor *txd = &desc->async_tx;
+	struct device *dev = chan->common.device->dev;
+	dma_addr_t src = get_desc_src(chan, desc);
+	dma_addr_t dst = get_desc_dst(chan, desc);
+	u32 len = get_desc_cnt(chan, desc);
+
+	/* Run the link descriptor callback function */
+	if (txd->callback) {
+#ifdef FSL_DMA_LD_DEBUG
+		chan_dbg(chan, "LD %p callback\n", desc);
+#endif
+		txd->callback(txd->callback_param);
+	}
+
+	/* Run any dependencies */
+	dma_run_dependencies(txd);
+
+	/* Unmap the dst buffer, if requested */
+	if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
+		if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
+			dma_unmap_single(dev, dst, len, DMA_FROM_DEVICE);
+		else
+			dma_unmap_page(dev, dst, len, DMA_FROM_DEVICE);
+	}
+
+	/* Unmap the src buffer, if requested */
+	if (!(txd->flags & DMA_COMPL_SKIP_SRC_UNMAP)) {
+		if (txd->flags & DMA_COMPL_SRC_UNMAP_SINGLE)
+			dma_unmap_single(dev, src, len, DMA_TO_DEVICE);
+		else
+			dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
+	}
+
+	fsl_dma_free_descriptor(chan, desc);
+}
+
 static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 {
 	struct fsldma_chan *chan = to_fsl_chan(tx->chan);
@@ -807,54 +855,6 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 }
 
 /**
- * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
- * @chan: Freescale DMA channel
- * @desc: descriptor to cleanup and free
- *
- * This function is used on a descriptor which has been executed by the DMA
- * controller. It will run any callbacks, submit any dependencies, and then
- * free the descriptor.
- */
-static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
-				      struct fsl_desc_sw *desc)
-{
-	struct dma_async_tx_descriptor *txd = &desc->async_tx;
-	struct device *dev = chan->common.device->dev;
-	dma_addr_t src = get_desc_src(chan, desc);
-	dma_addr_t dst = get_desc_dst(chan, desc);
-	u32 len = get_desc_cnt(chan, desc);
-
-	/* Run the link descriptor callback function */
-	if (txd->callback) {
-#ifdef FSL_DMA_LD_DEBUG
-		chan_dbg(chan, "LD %p callback\n", desc);
-#endif
-		txd->callback(txd->callback_param);
-	}
-
-	/* Run any dependencies */
-	dma_run_dependencies(txd);
-
-	/* Unmap the dst buffer, if requested */
-	if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
-		if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
-			dma_unmap_single(dev, dst, len, DMA_FROM_DEVICE);
-		else
-			dma_unmap_page(dev, dst, len, DMA_FROM_DEVICE);
-	}
-
-	/* Unmap the src buffer, if requested */
-	if (!(txd->flags & DMA_COMPL_SKIP_SRC_UNMAP)) {
-		if (txd->flags & DMA_COMPL_SRC_UNMAP_SINGLE)
-			dma_unmap_single(dev, src, len, DMA_TO_DEVICE);
-		else
-			dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
-	}
-
-	fsl_dma_free_descriptor(chan, desc);
-}
-
-/**
  * fsl_chan_xfer_ld_queue - transfer any pending transactions
  * @chan : Freescale DMA channel
  *
-- 
1.7.8.6
[PATCH 5/7] fsl-dma: fix support for async_tx API
From: Ira W. Snyder i...@ovro.caltech.edu

The current fsldma driver does not support the async_tx API. This is due
to a bug: all descriptors are released when the hardware is finished
with them. The async_tx API requires that the ACK bit is set by software
before descriptors are freed.

This bug is easiest to reproduce by enabling both CONFIG_NET_DMA and
CONFIG_ASYNC_TX, and using the hardware offload support in MD RAID. The
network stack will force operations on shared DMA channels, and will free
descriptors which are being used by the MD RAID code.

The BUG_ON(async_tx_test_ack(depend_tx)) test in async_tx_submit() will be
hit, and panic the machine.

TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
GPR00: 0001 ecf41ca0 ee1a94a0 003f 0001 c00593e4 0001
GPR08: a7a7a7a7 0001 0002 42028042 100a38d4 ed576d98
GPR16: ed5a11b0 2b162000 0200 0000 2d555000 ed3015e8 c15a7aa0
GPR24: c155fc40 ecb63220 ecf41d28 ef640bb0 ef640c30 ecf41ca0
NIP [c02b048c] async_tx_submit+0x6c/0x2b4
LR [c02b068c] async_tx_submit+0x26c/0x2b4
Call Trace:
[ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b4 (unreliable)
[ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
[ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
[ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
[ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
[ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
[ecf41f40] [c04329b8] md_thread+0x138/0x16c
[ecf41f90] [c008277c] kthread+0x8c/0x90
[ecf41ff0] [c0011630] kernel_thread+0x4c/0x68

Cc: Dan Williams dan.j.willi...@intel.com
Cc: Vinod Koul vinod.k...@intel.com
Cc: Li Yang le...@freescale.com
Cc: Qiang Liu qiang@freescale.com
Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  156 +++---
 drivers/dma/fsldma.h |    1 +
 2 files changed, 97 insertions(+), 60 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 80edc63..380c1b7 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -410,16 +410,15 @@ static void fsl_dma_free_descriptor(struct fsldma_chan *chan, struct fsl_desc_sw
 }
 
 /**
- * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
+ * fsldma_run_tx_complete_actions - run callback and unmap descriptor
  * @chan: Freescale DMA channel
  * @desc: descriptor to cleanup and free
  *
  * This function is used on a descriptor which has been executed by the DMA
- * controller. It will run any callbacks, submit any dependencies, and then
- * free the descriptor.
+ * controller. It will run the callback and unmap the descriptor, if requested.
  */
-static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
-				      struct fsl_desc_sw *desc)
+static void fsldma_run_tx_complete_actions(struct fsldma_chan *chan,
+					   struct fsl_desc_sw *desc)
 {
 	struct dma_async_tx_descriptor *txd = &desc->async_tx;
 	struct device *dev = chan->common.device->dev;
@@ -427,6 +426,10 @@ static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
 	dma_addr_t dst = get_desc_dst(chan, desc);
 	u32 len = get_desc_cnt(chan, desc);
 
+	/* Cookies with value zero are already cleaned up */
+	if (txd->cookie == 0)
+		return;
+
 	/* Run the link descriptor callback function */
 	if (txd->callback) {
 #ifdef FSL_DMA_LD_DEBUG
@@ -435,9 +438,6 @@ static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
 		txd->callback(txd->callback_param);
 	}
 
-	/* Run any dependencies */
-	dma_run_dependencies(txd);
-
 	/* Unmap the dst buffer, if requested */
 	if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
 		if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
@@ -454,7 +454,68 @@ static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
 			dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
 	}
 
-	fsl_dma_free_descriptor(chan, desc);
+	/*
+	 * A zeroed cookie indicates that cleanup actions have been
+	 * run, but the descriptor has not yet been ACKed.
+	 */
+	txd->cookie = 0;
+}
+
+/**
+ * fsldma_cleanup_descriptors - cleanup and free link descriptors
+ * @chan: Freescale DMA channel
+ *
+ * This function is used to clean up all descriptors which have been executed
+ * by the DMA controller. It will run any callbacks, submit any dependencies,
+ * and free any descriptors which have been ACKed.
+ *
+ * LOCKING: must NOT hold chan->desc_lock
+ * CONTEXT: any
+ */
+static void fsldma_cleanup_descriptors(struct fsldma_chan *chan)
+{
+	struct fsl_desc_sw *desc, *_desc;
+	dma_cookie_t cookie = 0;
+	LIST_HEAD(ld_cleanup);
+	unsigned long flags;
+
+	/*
+	 * Move all descriptors onto a temporary list so that hardware
+	 * interrupts can be enabled during cleanup
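The zeroed-cookie convention introduced by this patch ("cleanup actions have
been run, but the descriptor has not yet been ACKed") can be sketched in
isolation. The following userspace model is an assumption-laden illustration,
not the driver code: struct toy_txd, its callback_runs counter and the
toy_* functions are hypothetical, and the real function also unmaps buffers,
which the model leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a transaction descriptor. */
struct toy_txd {
	int cookie;		/* non-zero until completion actions have run */
	bool acked;		/* set by the client when it is done with it */
	int callback_runs;	/* counts how often the "callback" fired */
};

/*
 * Run the completion actions at most once. A zero cookie marks a
 * descriptor whose actions already ran, so a second pass is a no-op.
 */
static void toy_run_tx_complete_actions(struct toy_txd *txd)
{
	if (txd->cookie == 0)
		return;

	txd->callback_runs++;	/* stands in for the callback and unmapping */
	txd->cookie = 0;	/* mark as cleaned up, possibly still un-ACKed */
}

/*
 * One cleanup pass over a completed descriptor.
 * Returns true if the descriptor may be freed now, i.e. it has been ACKed.
 */
static bool toy_cleanup_one(struct toy_txd *txd)
{
	toy_run_tx_complete_actions(txd);
	return txd->acked;
}
```

The point of the marker is that cleanup can visit the same descriptor several
times while waiting for the ACK without invoking its callback again.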
[PATCH 6/7] fsl-dma: fix a warning of unitialized cookie
From: Qiang Liu qiang@freescale.com

Fix a warning about an uninitialized value when compiling with
-Wuninitialized.

Cc: Dan Williams dan.j.willi...@intel.com
Cc: Vinod Koul vinod.k...@intel.com
Cc: Li Yang le...@freescale.com
Signed-off-by: Qiang Liu qiang@freescale.com
Cc: Kim Phillips kim.phill...@freescale.com
---
 drivers/dma/fsldma.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 380c1b7..8588cf7 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -523,7 +523,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 	struct fsldma_chan *chan = to_fsl_chan(tx->chan);
 	struct fsl_desc_sw *desc = tx_to_fsl_desc(tx);
 	struct fsl_desc_sw *child;
-	dma_cookie_t cookie;
+	dma_cookie_t cookie = 0;
 
 	spin_lock_irq(&chan->desc_lock);
 
-- 
1.7.8.6
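For readers wondering why the compiler complains here, a likely shape of the
problem is a variable that is only assigned inside a loop and then returned.
The sketch below assumes that shape; it is not taken from fsl_dma_tx_submit()
itself, and toy_assign_cookies() with its array argument is a hypothetical
stand-in for the loop over software descriptors.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assign consecutive cookies to n descriptors and return the last one.
 * The value is only written inside the loop, so the compiler cannot prove
 * it is set when n == 0; initializing it to 0 silences the warning and
 * gives the empty case a defined result.
 */
static int toy_assign_cookies(int *cookies, size_t n, int last_cookie)
{
	int cookie = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		cookie = ++last_cookie;
		cookies[i] = cookie;
	}
	return cookie;
}
```

Whether the real descriptor list can ever be empty at that point is not
stated in this thread; the initializer is harmless either way.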
[PATCH 7/7] carma: remove unnecessary DMA_INTERRUPT capability
From: Ira W. Snyder i...@ovro.caltech.edu

These drivers set the DMA_INTERRUPT capability bit when requesting a DMA
controller channel. This was historical, and is no longer needed.

Recent changes to the drivers/dma/fsldma.c driver have removed support
for this flag. This makes the carma drivers unable to find a DMA channel
with the required capabilities.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/misc/carma/carma-fpga-program.c |    1 -
 drivers/misc/carma/carma-fpga.c         |    3 +--
 2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/carma/carma-fpga-program.c b/drivers/misc/carma/carma-fpga-program.c
index a2d25e4..eaddfe9 100644
--- a/drivers/misc/carma/carma-fpga-program.c
+++ b/drivers/misc/carma/carma-fpga-program.c
@@ -978,7 +978,6 @@ static int fpga_of_probe(struct platform_device *op)
 	dev_set_drvdata(priv->dev, priv);
 	dma_cap_zero(mask);
 	dma_cap_set(DMA_MEMCPY, mask);
-	dma_cap_set(DMA_INTERRUPT, mask);
 	dma_cap_set(DMA_SLAVE, mask);
 	dma_cap_set(DMA_SG, mask);
 
diff --git a/drivers/misc/carma/carma-fpga.c b/drivers/misc/carma/carma-fpga.c
index 8c279da..861b298 100644
--- a/drivers/misc/carma/carma-fpga.c
+++ b/drivers/misc/carma/carma-fpga.c
@@ -666,7 +666,7 @@ static int data_submit_dma(struct fpga_device *priv, struct data_buf *buf)
 	src = SYS_FPGA_BLOCK;
 	tx = chan->device->device_prep_dma_memcpy(chan, dst, src,
 						  REG_BLOCK_SIZE,
-						  DMA_PREP_INTERRUPT);
+						  0);
 	if (!tx) {
 		dev_err(priv->dev, "unable to prep SYS-FPGA DMA\n");
 		return -ENOMEM;
@@ -1333,7 +1333,6 @@ static int data_of_probe(struct platform_device *op)
 
 	dma_cap_zero(mask);
 	dma_cap_set(DMA_MEMCPY, mask);
-	dma_cap_set(DMA_INTERRUPT, mask);
 	dma_cap_set(DMA_SLAVE, mask);
 	dma_cap_set(DMA_SG, mask);
 
-- 
1.7.8.6
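The failure mode described in this commit message (a client that still asks
for DMA_INTERRUPT no longer finds a channel) comes down to a subset test on
capability masks. A minimal sketch of that test follows, assuming plain
bit flags; the TOY_CAP_* values and toy_chan_matches() are hypothetical and
only imitate the idea behind the dmaengine capability mask, which is a
bitmap type in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability bits; not the kernel's enum values. */
enum toy_cap {
	TOY_CAP_MEMCPY    = 1,
	TOY_CAP_INTERRUPT = 2,
	TOY_CAP_SG        = 4,
	TOY_CAP_SLAVE     = 8
};

/*
 * A channel satisfies a request only if every requested capability is
 * also provided, i.e. the requested set is a subset of the provided set.
 */
static bool toy_chan_matches(unsigned int wanted, unsigned int provided)
{
	return (wanted & ~provided) == 0;
}
```

With the provider advertising MEMCPY, SG and SLAVE only, a request that
still includes the INTERRUPT bit fails the test, while the same request
without it succeeds, which is the reason the carma drivers drop the bit.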
Re: [linuxppc-release] [PATCH v4 7/7] fsl-dma: add memcpy self test interface
On Mon, Jul 30, 2012 at 12:48:41PM -0500, Timur Tabi wrote:
> qiang@freescale.com wrote:
> > Add memory copy self test when probe device, fsl-dma will be disabled
> > if self test failed.
>
> Is this a real problem that can occur? The DMA driver used to have a
> self-test, but I removed it a long time ago because it was pointless. I
> don't see why we need to add another one back in.
>
> --
> Timur Tabi
> Linux kernel developer at Freescale
>

I made a comment that a test suite for the async_tx API would be very
helpful in diagnosing similar problems in this and other DMA drivers.
Something standalone, similar to the drivers/dma/dmatest.c driver, using
the async_tx API.

I think this was misinterpreted into me asking that the driver have a
built-in self test.

Ira
Re: [PATCH v4 3/7] fsl-dma: change release process of dma descriptor for supporting async_tx
On Fri, Jul 27, 2012 at 05:16:09PM +0800, qiang@freescale.com wrote:
> From: Qiang Liu qiang@freescale.com
>
> Fix the potential risk when enable config NET_DMA and ASYNC_TX.
> Async_tx is lack of support in current release process of dma descriptor,
> all descriptors will be released whatever is acked or no-acked by async_tx,
> so there is a potential race condition when dma engine is uesd by others
> clients (e.g. when enable NET_DMA to offload TCP).
>
> In our case, a race condition which is raised when use both of talitos
> and dmaengine to offload xor is because napi scheduler will sync all
> pending requests in dma channels, it affects the process of raid operations
> due to ack_tx is not checked in fsl dma. The no-acked descriptor is freed
> which is submitted just now, as a dependent tx, this freed descriptor
> trigger BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
>

I'm preparing an alternative version of this patch that I think is easier
to understand (it is much shorter). I'll post it up here as soon as I
finish testing.

It would be nice to know how to easily reproduce this bug, without needing
to set up a RAID system. I don't have access to any such hardware. A driver
similar to drivers/dma/dmatest.c (using the async_tx API instead) would be
wonderful.

Thanks,
Ira

> TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
> GPR00: 0001 ecf41ca0 ee1a94a0 003f 0001 c00593e4 0001
> GPR08: a7a7a7a7 0001 0002 42028042 100a38d4 ed576d98
> GPR16: ed5a11b0 2b162000 0200 0000 2d555000 ed3015e8 c15a7aa0
> GPR24: c155fc40 ecb63220 ecf41d28 ef640bb0 ef640c30 ecf41ca0
> NIP [c02b048c] async_tx_submit+0x6c/0x2b4
> LR [c02b068c] async_tx_submit+0x26c/0x2b4
> Call Trace:
> [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b4 (unreliable)
> [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
> [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
> [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
> [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
> [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
> [ecf41f40] [c04329b8] md_thread+0x138/0x16c
> [ecf41f90] [c008277c] kthread+0x8c/0x90
> [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
>
> Cc: Dan Williams dan.j.willi...@intel.com
> Cc: Vinod Koul vinod.k...@intel.com
> Cc: Li Yang le...@freescale.com
> Cc: Ira W. Snyder i...@ovro.caltech.edu
> Signed-off-by: Qiang Liu qiang@freescale.com
> ---
>  drivers/dma/fsldma.c |  242 +++---
>  drivers/dma/fsldma.h |    1 +
>  2 files changed, 172 insertions(+), 71 deletions(-)
>
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 4f2f212..87f52c0 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -400,6 +400,125 @@ out_splice:
>  	list_splice_tail_init(&desc->tx_list, &chan->ld_pending);
>  }
>
> +static void fsldma_cleanup_descriptor(struct fsldma_chan *chan);
> +static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan);
> +
> +/**
> + * fsldma_clean_completed_descriptor - free all descriptors which
> + * has been completed and acked
> + * @chan: Freescale DMA channel
> + *
> + * This function is used on all completed and acked descriptors.
> + * All descriptors should only be freed in this function.
> + */
> +static int
> +fsldma_clean_completed_descriptor(struct fsldma_chan *chan)
> +{
> +	struct fsl_desc_sw *desc, *_desc;
> +
> +	/* Run the callback for each descriptor, in order */
> +	list_for_each_entry_safe(desc, _desc, &chan->ld_completed, node) {
> +
> +		if (async_tx_test_ack(&desc->async_tx)) {
> +			/* Remove from the list of transactions */
> +			list_del(&desc->node);
> +#ifdef FSL_DMA_LD_DEBUG
> +			chan_dbg(chan, "LD %p free\n", desc);
> +#endif
> +			dma_pool_free(chan->desc_pool, desc,
> +					desc->async_tx.phys);
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * fsldma_run_tx_complete_actions - cleanup and free a single link descriptor
> + * @chan: Freescale DMA channel
> + * @desc: descriptor to cleanup and free
> + * @cookie: Freescale DMA transaction identifier
> + *
> + * This function is used on a descriptor which has been executed by the DMA
> + * controller. It will run any callbacks, submit any dependencies.
> + */
> +static dma_cookie_t fsldma_run_tx_complete_actions(struct fsl_desc_sw *desc,
> +		struct fsldma_chan *chan, dma_cookie_t cookie)
> +{
> +	struct dma_async_tx_descriptor *txd = &desc->async_tx;
> +	struct device *dev = chan->common.device->dev;
> +	dma_addr_t src = get_desc_src(chan, desc);
> +	dma_addr_t dst = get_desc_dst(chan, desc);
> +	u32 len = get_desc_cnt(chan, desc);
> +
> +	BUG_ON(txd->cookie < 0);
> +
> +	if (txd->cookie > 0) {
> +		cookie = txd->cookie;
> +
> +		/* Run the link descriptor callback function */
> +		if (txd->callback) {
> +#ifdef FSL_DMA_LD_DEBUG
Re: [PATCH v3 3/4] fsl-dma: change release process of dma descriptor for supporting async_tx
On Tue, Jul 17, 2012 at 07:06:33AM +, Liu Qiang-B32616 wrote: -Original Message- From: Ira W. Snyder [mailto:i...@ovro.caltech.edu] Sent: Tuesday, July 17, 2012 4:01 AM To: Liu Qiang-B32616 Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Phillips Kim-R1AAHA; herb...@gondor.hengli.com.au; da...@davemloft.net; Dan Williams; Vinod Koul; Li Yang-R58472 Subject: Re: [PATCH v3 3/4] fsl-dma: change release process of dma descriptor for supporting async_tx On Mon, Jul 16, 2012 at 12:08:29PM +0800, Qiang Liu wrote: Fix the potential risk when enable config NET_DMA and ASYNC_TX. Async_tx is lack of support in current release process of dma descriptor, all descriptors will be released whatever is acked or no-acked by async_tx, so there is a potential race condition when dma engine is uesd by others clients (e.g. when enable NET_DMA to offload TCP). In our case, a race condition which is raised when use both of talitos and dmaengine to offload xor is because napi scheduler will sync all pending requests in dma channels, it affects the process of raid operations due to ack_tx is not checked in fsl dma. The no-acked descriptor is freed which is submitted just now, as a dependent tx, this freed descriptor trigger BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit(). Cc: Dan Williams dan.j.willi...@intel.com Cc: Vinod Koul vinod.k...@intel.com Cc: Li Yang le...@freescale.com Cc: Ira W. 
Snyder i...@ovro.caltech.edu Signed-off-by: Qiang Liu qiang@freescale.com --- drivers/dma/fsldma.c | 378 +- --- drivers/dma/fsldma.h |1 + 2 files changed, 225 insertions(+), 154 deletions(-) diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 4f2f212..4ee1b8f 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -400,6 +400,217 @@ out_splice: list_splice_tail_init(desc-tx_list, chan-ld_pending); } +/** + * fsl_chan_xfer_ld_queue - transfer any pending transactions + * @chan : Freescale DMA channel + * + * HARDWARE STATE: idle + * LOCKING: must hold chan-desc_lock */ static void +fsl_chan_xfer_ld_queue(struct fsldma_chan *chan) { + struct fsl_desc_sw *desc; + + /* + * If the list of pending descriptors is empty, then we + * don't need to do any work at all + */ + if (list_empty(chan-ld_pending)) { + chan_dbg(chan, no pending LDs\n); + return; + } + + /* + * The DMA controller is not idle, which means that the interrupt + * handler will start any queued transactions when it runs after + * this transaction finishes + */ + if (!chan-idle) { + chan_dbg(chan, DMA controller still busy\n); + return; + } + + /* + * If there are some link descriptors which have not been + * transferred, we need to start the controller + */ + + /* + * Move all elements from the queue of pending transactions + * onto the list of running transactions + */ + chan_dbg(chan, idle, starting controller\n); + desc = list_first_entry(chan-ld_pending, struct fsl_desc_sw, node); + list_splice_tail_init(chan-ld_pending, chan-ld_running); + + /* + * The 85xx DMA controller doesn't clear the channel start bit + * automatically at the end of a transfer. Therefore we must clear + * it in software before starting the transfer. 
+ */ + if ((chan-feature FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) { + u32 mode; + + mode = DMA_IN(chan, chan-regs-mr, 32); + mode = ~FSL_DMA_MR_CS; + DMA_OUT(chan, chan-regs-mr, mode, 32); + } + + /* + * Program the descriptor's address into the DMA controller, + * then start the DMA transaction + */ + set_cdar(chan, desc-async_tx.phys); + get_cdar(chan); + + dma_start(chan); + chan-idle = false; +} + Please add a note about the locking requirements here, and for the other new functions you've added. You call this function from two places: 1) fsldma_cleanup_descriptor() - called with mod-desc_lock held 2) fsl_tx_status() - WITHOUT mod-desc_lock held One of them is definitely wrong, and I'd bet that it is #2. You're modifying ld_completed without a lock. Yes, My bad, I will correct it. +static int +fsldma_clean_completed_descriptor(struct fsldma_chan *chan) { + struct fsl_desc_sw *desc, *_desc; + + /* Run the callback for each descriptor, in order */ + list_for_each_entry_safe(desc, _desc, chan-ld_completed, node) { + + if (async_tx_test_ack(desc-async_tx)) { + /* Remove from the list of transactions */ + list_del(desc-node); +#ifdef FSL_DMA_LD_DEBUG + chan_dbg(chan, LD %p free\n, desc); #endif + dma_pool_free(chan
Re: [PATCH v3 3/4] fsl-dma: change release process of dma descriptor for supporting async_tx
On Mon, Jul 16, 2012 at 12:08:29PM +0800, Qiang Liu wrote: Fix the potential risk when enable config NET_DMA and ASYNC_TX. Async_tx is lack of support in current release process of dma descriptor, all descriptors will be released whatever is acked or no-acked by async_tx, so there is a potential race condition when dma engine is uesd by others clients (e.g. when enable NET_DMA to offload TCP). In our case, a race condition which is raised when use both of talitos and dmaengine to offload xor is because napi scheduler will sync all pending requests in dma channels, it affects the process of raid operations due to ack_tx is not checked in fsl dma. The no-acked descriptor is freed which is submitted just now, as a dependent tx, this freed descriptor trigger BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit(). Cc: Dan Williams dan.j.willi...@intel.com Cc: Vinod Koul vinod.k...@intel.com Cc: Li Yang le...@freescale.com Cc: Ira W. Snyder i...@ovro.caltech.edu Signed-off-by: Qiang Liu qiang@freescale.com --- drivers/dma/fsldma.c | 378 + drivers/dma/fsldma.h |1 + 2 files changed, 225 insertions(+), 154 deletions(-) diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 4f2f212..4ee1b8f 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -400,6 +400,217 @@ out_splice: list_splice_tail_init(desc-tx_list, chan-ld_pending); } +/** + * fsl_chan_xfer_ld_queue - transfer any pending transactions + * @chan : Freescale DMA channel + * + * HARDWARE STATE: idle + * LOCKING: must hold chan-desc_lock + */ +static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan) +{ + struct fsl_desc_sw *desc; + + /* + * If the list of pending descriptors is empty, then we + * don't need to do any work at all + */ + if (list_empty(chan-ld_pending)) { + chan_dbg(chan, no pending LDs\n); + return; + } + + /* + * The DMA controller is not idle, which means that the interrupt + * handler will start any queued transactions when it runs after + * this transaction finishes + */ + 
if (!chan-idle) { + chan_dbg(chan, DMA controller still busy\n); + return; + } + + /* + * If there are some link descriptors which have not been + * transferred, we need to start the controller + */ + + /* + * Move all elements from the queue of pending transactions + * onto the list of running transactions + */ + chan_dbg(chan, idle, starting controller\n); + desc = list_first_entry(chan-ld_pending, struct fsl_desc_sw, node); + list_splice_tail_init(chan-ld_pending, chan-ld_running); + + /* + * The 85xx DMA controller doesn't clear the channel start bit + * automatically at the end of a transfer. Therefore we must clear + * it in software before starting the transfer. + */ + if ((chan-feature FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) { + u32 mode; + + mode = DMA_IN(chan, chan-regs-mr, 32); + mode = ~FSL_DMA_MR_CS; + DMA_OUT(chan, chan-regs-mr, mode, 32); + } + + /* + * Program the descriptor's address into the DMA controller, + * then start the DMA transaction + */ + set_cdar(chan, desc-async_tx.phys); + get_cdar(chan); + + dma_start(chan); + chan-idle = false; +} + Please add a note about the locking requirements here, and for the other new functions you've added. You call this function from two places: 1) fsldma_cleanup_descriptor() - called with mod-desc_lock held 2) fsl_tx_status() - WITHOUT mod-desc_lock held One of them is definitely wrong, and I'd bet that it is #2. You're modifying ld_completed without a lock. 
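A minimal userspace sketch of the locking rule raised in this review, not the driver's code. `struct chan_model`, `struct model_lock`, and every helper name below are hypothetical stand-ins for `struct fsldma_chan` and `desc_lock`. The helper states its LOCKING requirement and checks it, and a caller that starts out without the lock (the situation described for fsl_tx_status()) takes the lock before touching the completed list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock; 'held' lets us check the rule. */
struct model_lock {
	bool held;
};

/* Hypothetical stand-in for struct fsldma_chan, reduced to what we need. */
struct chan_model {
	struct model_lock desc_lock;
	int ld_completed;	/* number of entries on the "completed" list */
};

static void model_lock_acquire(struct model_lock *l)
{
	assert(!l->held);	/* not recursive, like a spinlock */
	l->held = true;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

/*
 * LOCKING: must hold chan->desc_lock
 *
 * Frees everything on the completed list and returns how many entries
 * were freed. The assert plays the role a lockdep annotation would
 * play in kernel code.
 */
static int clean_completed_locked(struct chan_model *chan)
{
	int freed;

	assert(chan->desc_lock.held);
	freed = chan->ld_completed;
	chan->ld_completed = 0;
	return freed;
}

/* Caller that does not hold the lock on entry: take it around the helper. */
static int tx_status_model(struct chan_model *chan)
{
	int freed;

	model_lock_acquire(&chan->desc_lock);
	freed = clean_completed_locked(chan);
	model_lock_release(&chan->desc_lock);
	return freed;
}
```

Calling `clean_completed_locked()` directly without the lock trips the assert, which is the userspace analogue of the unlocked list modification pointed out above.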
+static int +fsldma_clean_completed_descriptor(struct fsldma_chan *chan) +{ + struct fsl_desc_sw *desc, *_desc; + + /* Run the callback for each descriptor, in order */ + list_for_each_entry_safe(desc, _desc, chan-ld_completed, node) { + + if (async_tx_test_ack(desc-async_tx)) { + /* Remove from the list of transactions */ + list_del(desc-node); +#ifdef FSL_DMA_LD_DEBUG + chan_dbg(chan, LD %p free\n, desc); +#endif + dma_pool_free(chan-desc_pool, desc, + desc-async_tx.phys); + } + } + + return 0; +} + +/** + * fsldma_run_tx_complete_actions - cleanup and free a single link descriptor + * @chan: Freescale DMA channel + * @desc: descriptor to cleanup and free + * @cookie: Freescale DMA transaction identifier + * + * This function is used on a descriptor which has been executed by the DMA + * controller. It will run any callbacks, submit any dependencies, and then + * free the descriptor. + */ +static
Re: [PATCH v2 2/4] fsl-dma: remove attribute DMA_INTERRUPT of dmaengine
On Wed, Jul 11, 2012 at 05:00:53PM +0800, Qiang Liu wrote:
> Delete attribute DMA_INTERRUPT because fsl-dma doesn't support this
> function, exception will be thrown if talitos is used to offload xor
> at the same time.

Both drivers/misc/carma/carma-fpga.c and
drivers/misc/carma/carma-fpga-program.c expect the DMA_INTERRUPT
property, though they do not use it. The mask is set for historical
reasons. It is safe to delete the line

	dma_cap_set(DMA_INTERRUPT, mask);

from both drivers.

I don't know which other drivers may expect this feature to be present.
These are only the ones which I maintain.

Other than that, you can add my:
Acked-by: Ira W. Snyder i...@ovro.caltech.edu

> Cc: Dan Williams dan.j.willi...@intel.com
> Cc: Vinod Koul vinod.k...@intel.com
> Cc: Li Yang le...@freescale.com
> Signed-off-by: Qiang Liu qiang@freescale.com
> ---
>  drivers/dma/fsldma.c | 31 ---
>  1 files changed, 0 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 8f84761..4f2f212 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -543,35 +543,6 @@ static void fsl_dma_free_chan_resources(struct dma_chan *dchan)
>  }
>
>  static struct dma_async_tx_descriptor *
> -fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags)
> -{
> -	struct fsldma_chan *chan;
> -	struct fsl_desc_sw *new;
> -
> -	if (!dchan)
> -		return NULL;
> -
> -	chan = to_fsl_chan(dchan);
> -
> -	new = fsl_dma_alloc_descriptor(chan);
> -	if (!new) {
> -		chan_err(chan, "%s\n", msg_ld_oom);
> -		return NULL;
> -	}
> -
> -	new->async_tx.cookie = -EBUSY;
> -	new->async_tx.flags = flags;
> -
> -	/* Insert the link descriptor to the LD ring */
> -	list_add_tail(&new->node, &new->tx_list);
> -
> -	/* Set End-of-link to the last link descriptor of new list */
> -	set_ld_eol(chan, new);
> -
> -	return &new->async_tx;
> -}
> -
> -static struct dma_async_tx_descriptor *
>  fsl_dma_prep_memcpy(struct dma_chan *dchan,
>  	dma_addr_t dma_dst, dma_addr_t dma_src,
>  	size_t len, unsigned long flags)
> @@ -1352,12 +1323,10 @@ static int __devinit fsldma_of_probe(struct platform_device *op)
>  	fdev->irq = irq_of_parse_and_map(op->dev.of_node, 0);
>
>  	dma_cap_set(DMA_MEMCPY, fdev->common.cap_mask);
> -	dma_cap_set(DMA_INTERRUPT, fdev->common.cap_mask);
>  	dma_cap_set(DMA_SG, fdev->common.cap_mask);
>  	dma_cap_set(DMA_SLAVE, fdev->common.cap_mask);
>  	fdev->common.device_alloc_chan_resources = fsl_dma_alloc_chan_resources;
>  	fdev->common.device_free_chan_resources = fsl_dma_free_chan_resources;
> -	fdev->common.device_prep_dma_interrupt = fsl_dma_prep_interrupt;
>  	fdev->common.device_prep_dma_memcpy = fsl_dma_prep_memcpy;
>  	fdev->common.device_prep_dma_sg = fsl_dma_prep_sg;
>  	fdev->common.device_tx_status = fsl_tx_status;
> --
> 1.7.5.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
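A small userspace model of why the client drivers would also want to drop DMA_INTERRUPT from their request mask once the provider stops advertising it. This is a sketch under an assumption: that a channel is handed out only when the provider offers every capability the client asked for. The enum values and `provider_satisfies()` are hypothetical names for illustration, not the dmaengine API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits, loosely mirroring the DMA_* names. */
enum model_cap {
	CAP_MEMCPY    = 1u << 0,
	CAP_INTERRUPT = 1u << 1,
	CAP_SG        = 1u << 2,
	CAP_SLAVE     = 1u << 3,
};

/*
 * Assumed matching rule: the provider satisfies a request only if it
 * offers every capability in the wanted mask.
 */
static bool provider_satisfies(uint32_t provider_caps, uint32_t wanted_caps)
{
	return (provider_caps & wanted_caps) == wanted_caps;
}
```

Under that assumption, a provider advertising only MEMCPY, SG and SLAVE no longer matches a client that still asks for MEMCPY plus INTERRUPT, while a client asking for MEMCPY alone keeps working.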
Re: [PATCH v2 3/4] fsl-dma: change the release process of dma descriptor
On Wed, Jul 11, 2012 at 05:01:25PM +0800, Qiang Liu wrote: Modify the release process of dma descriptor for avoiding exception when enable config NET_DMA, release dma descriptor from 1st to last second, the last descriptor which is reserved in current descriptor register may not be completed, race condition will be raised if free current descriptor. A race condition which is raised when use both of talitos and dmaengine to offload xor is because napi scheduler (NET_DMA is enabled) will sync all pending requests in dma channels, it affects the process of raid operations. The descriptor is freed which is submitted just now, but async_tx must check whether this depend tx descriptor is acked, there are poison contents in the invalid address, then BUG_ON() is thrown, so this descriptor will be freed in the next time. This patch seems to be covering up a bug in the driver, rather than actually fixing it. When it was written, it was expected that dma_do_tasklet() would run only when the controller was idle. 
Cc: Dan Williams dan.j.willi...@intel.com Cc: Vinod Koul vinod.k...@intel.com Cc: Li Yang le...@freescale.com Signed-off-by: Qiang Liu qiang@freescale.com --- drivers/dma/fsldma.c | 15 --- 1 files changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 4f2f212..0ba3e40 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -1035,14 +1035,22 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data) static void dma_do_tasklet(unsigned long data) { struct fsldma_chan *chan = (struct fsldma_chan *)data; - struct fsl_desc_sw *desc, *_desc; + struct fsl_desc_sw *desc, *_desc, *prev = NULL; LIST_HEAD(ld_cleanup); unsigned long flags; + dma_addr_t curr_phys = get_cdar(chan); chan_dbg(chan, tasklet entry\n); spin_lock_irqsave(chan-desc_lock, flags); + /* find the descriptor which is already completed */ + list_for_each_entry_safe(desc, _desc, chan-ld_running, node) { + if (prev desc-async_tx.phys == curr_phys) + break; + prev = desc; + } + If the DMA controller was still busy processing transactions, you should have gotten the printout irq: controller not idle! from fsldma_chan_irq() just before it scheduled the dma_do_tasklet() to run. If you did not get this printout, how was dma_do_tasklet() entered with the controller still busy? I don't understand how it can happen. If you test without your spin_lock_bh() and spin_unlock_bh() conversion patch, do you still hit the error? What happens if a user submits exactly one DMA transaction, and then leaves the system idle? The callback for the last descriptor in the chain will never get run, right? That's a bug. 
/* update the cookie if we have some descriptors to cleanup */ if (!list_empty(chan-ld_running)) { dma_cookie_t cookie; @@ -1058,13 +1066,14 @@ static void dma_do_tasklet(unsigned long data) * move the descriptors to a temporary list so we can drop the lock * during the entire cleanup operation */ - list_splice_tail_init(chan-ld_running, ld_cleanup); + list_cut_position(ld_cleanup, chan-ld_running, prev-node); /* the hardware is now idle and ready for more */ chan-idle = true; /* - * Start any pending transactions automatically + * Start any pending transactions automatically if current descriptor + * list is completed * * In the ideal case, we keep the DMA controller busy while we go * ahead and free the descriptors below. -- 1.7.5.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Replacement to of_register_platform_driver ?
On Wed, Jun 13, 2012 at 05:21:22PM +0200, Guillaume Dargaud wrote:
> Hello all,
> I just updated to the most recent kernel and a driver I wrote last year
> won't compile:
> xad.c:534:2: error: implicit declaration of function 'of_register_platform_driver'
>
> I see references to this function as 'obsolete' but could not find what's
> the new recommended way to do things.
>
> Here's a bit of the offending code:
>
> static struct of_platform_driver xad_driver = {
> 	.probe = xad_driver_probe,
> 	.remove = xad_driver_remove,
> 	.driver = {
> 		.owner = THIS_MODULE,
> 		.name = "xad-driver",
> 		.of_match_table = xad_device_id,
> 	},
> };
> ...
> static int __init xad_init(void) {
> 	...
> 	first = MKDEV (my_major, my_minor);
> 	register_chrdev_region(first, count, DEVNAME);
> 	my_cdev = cdev_alloc ();
> 	if (NULL==my_cdev) goto Err;
> 	cdev_init(my_cdev, &fops);
> 	rc=cdev_add (my_cdev, first, count);
> 	...
> 	rc = of_register_platform_driver(&xad_driver);
> 	...
> }
> --
> Guillaume Dargaud
> http://www.gdargaud.net/

The history of drivers/misc/carma/carma-fpga.c will show you the code
changes necessary. Specifically, these two commits perform the
conversion:

493340207 carma-fpga: Missed switch from of_register_platform_driver()
b00e126ff MISC: convert drivers/misc/* to use module_platform_driver()

Hope it helps,
Ira
Re: [PATCH 1/1] fsldma: ignore end of segments interrupt
On Thu, Feb 16, 2012 at 05:50:47PM +, Tabi Timur-B04825 wrote:
> On Thu, Jan 26, 2012 at 2:58 PM, Ira W. Snyder i...@ovro.caltech.edu wrote:
>> The mpc8349ea has been observed to generate spurious end of segments
>> interrupts despite the fact that they are not enabled by this driver.
>> Check for them and ignore them to avoid a kernel error message.
>
> When this happens, are there any other status bits set? It seems
> weird that there are spurious interrupts from an internal block,
> especially since it's the same block on all 83xx parts. I wonder if
> the EOSI bit just happens to be set when the interrupt occurs for some
> other reason.

I'm not sure. The fsldma irq handler only prints bits it did not handle.
There are several other bits in the driver which should never be seen,
but they are handled by the irq handler anyway. This is just a remnant
from the original Freescale code.

I have a set of 15 test boards that I can use to figure out which other
bits are set when this happens, if it is important.

I put a variation of this patch (missing the "skip tasklet if not idle"
logic) into my production boards roughly a month ago. I've gotten the
"controller not idle" error message 748 times, as compared to the
"unhandled sr 0x0002" message 3449 times. This leads me to believe that
this occurs mostly (but not always) concurrent with the end-of-chain
interrupt.

In the last month, the "unhandled sr" error has occurred on 92 out of
120 boards in production use. The statistics are included below. On some
boards, it is much more frequent than on others. All boards have roughly
the same workload.

Another interesting tidbit from my logs: this only occurs on DMA channel
2 (they are numbered starting at 0, so it is the 3rd channel). Here is
an example log message:

[3484053.821689] of:fsl-elo-dma e00082a8.dma: chan2: irq: unhandled sr 0x0002

Thanks,
Ira

     15 serial-number-5
      1 serial-number-16
      8 serial-number-18
     16 serial-number-19
      3 serial-number-20
     21 serial-number-21
      1 serial-number-24
      1 serial-number-26
      3 serial-number-27
      2 serial-number-28
     16 serial-number-29
      4 serial-number-30
      1 serial-number-31
      4 serial-number-32
      5 serial-number-33
      1 serial-number-34
      6 serial-number-35
     18 serial-number-36
      1 serial-number-39
      1 serial-number-40
      2 serial-number-41
     10 serial-number-42
     11 serial-number-43
     32 serial-number-45
      6 serial-number-46
      4 serial-number-47
      1 serial-number-49
      6 serial-number-50
      2 serial-number-51
      4 serial-number-53
      1 serial-number-55
      1 serial-number-57
     15 serial-number-58
      1 serial-number-60
      1 serial-number-62
      1 serial-number-66
      8 serial-number-67
      2 serial-number-75
      1 serial-number-76
     11 serial-number-79
      4 serial-number-80
      8 serial-number-81
      1 serial-number-82
     11 serial-number-84
      2 serial-number-92
     20 serial-number-93
     30 serial-number-94
     19 serial-number-95
     32 serial-number-96
     73 serial-number-97
     18 serial-number-99
     57 serial-number-100
     41 serial-number-101
     28 serial-number-102
      8 serial-number-103
    132 serial-number-107
     60 serial-number-108
     55 serial-number-109
     97 serial-number-110
     18 serial-number-111
     45 serial-number-113
      6 serial-number-114
    123 serial-number-115
     27 serial-number-117
     29 serial-number-118
     12 serial-number-119
     47 serial-number-120
     74 serial-number-121
      8 serial-number-124
    128 serial-number-125
    326 serial-number-128
     84 serial-number-129
     36 serial-number-130
      2 serial-number-131
     75 serial-number-133
     64 serial-number-135
    686 serial-number-137
     97 serial-number-139
     28 serial-number-140
     82 serial-number-141
     36 serial-number-144
     31 serial-number-145
     47 serial-number-147
     60 serial-number-150
     22 serial-number-152
     36 serial-number-154
     57 serial-number-156
     68 serial-number-158
     54 serial-number-159
     37 serial-number-160
     46 serial-number-161
     14 serial-number-162
Re: [PATCH 1/1] fsldma: ignore end of segments interrupt
On Thu, Feb 16, 2012 at 01:34:00PM -0600, Timur Tabi wrote:
> Ira W. Snyder wrote:
>> This leads me to believe that this occurs mostly (but not always)
>> concurrent with the end-of-chain interrupt.
>
> Have you tested this on an 85xx platform?

No. I don't have the ability to connect my P2020 up to an FPGA to
recreate the DMA workload that causes this on my 8349EA. I can run the
dmatest module, if you'd like.

> I noticed something odd. You're modifying fsldma_chan_irq(), which is
> for DMA controllers that have per-channel IRQs. 83xx devices don't have
> per-channel IRQs -- all channels on one controller have the same IRQ.
> Looking at the device tree, I see that the IRQs are listed in the
> channel nodes *and* in the controller node. I don't see how we ever use
> the per-controller ISR.

fsldma_ctrl_irq() (the per-controller irq handler) just calls through to
fsldma_chan_irq() (the per-channel irq handler).

> I wonder if the shared IRQ is part of the cause of the interrupts
> you're seeing.

My device tree is slightly modified to remove the per-controller
interrupts and interrupt-parent properties. Each individual channel
has identical interrupts and interrupt-parent properties specified.

Someone here suggested that I do that, several years ago. It has been
too long, and I do not remember who. I can reverse it, and use the
per-controller IRQ instead.

>> In the last month, the unhandled sr error has occurred on 92 out of
>> 120 boards in production use. The statistics are included below. On
>> some boards, it is much more frequent than on others. All boards have
>> roughly the same workload. Another interesting tidbit from my logs:
>> this only occurs on DMA channel 2 (they are numbered starting at 0,
>> it is the 3rd channel). Here is an example log message:
>
> What happens if you never register that channel? That is, remove this
> node from the device tree:
>
> 	dma-channel@100 {
> 		compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
> 		reg = <0x100 0x80>;
> 		cell-index = <2>;
> 		interrupt-parent = <&ipic>;
> 		interrupts = <71 8>;
> 	};

I can try that. I have a hunch the problem will move, as the carma-fpga
driver (see drivers/misc/carma/carma-fpga.c) will claim the 4th channel
instead.

Ira
Re: [PATCH 1/1] fsldma: ignore end of segments interrupt
On Thu, Feb 16, 2012 at 01:48:20PM -0600, Timur Tabi wrote:
> Ira W. Snyder wrote:
>> No. I don't have the ability to connect my P2020 up to an FPGA to
>> recreate the DMA workload that causes this on my 8349EA. I can run the
>> dmatest module, if you'd like.
>
> I just want to make sure your patch doesn't break 85xx.

I tried both with and without this patch on my P2020 COM Express board.
With both kernels, the board locks up after 20 minutes or so, no
messages to the serial console. I wouldn't be surprised if there are
some memory problems with this board. In any case, I don't have any
reason to believe that this patch causes any trouble: the board dies
without it.

However, the patch doesn't break DMA on 85xx. If I unload the dmatest
module after 10 minutes or so, it claims to have passed many thousands
of tests without problems.

My 8349EA test boards (15 of them) have been running their normal DMA
workload plus dmatest on the unused 4th channel, all without errors, for
several hours. ~2.5 million successful tests per board, as I write this.

Ira
[PATCH v2] fsldma: ignore end of segments interrupt
The mpc8349ea has been observed to generate spurious end of segments
interrupts despite the fact that they are not enabled by this driver.
Check for them and ignore them to avoid a kernel error message.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
Cc: Dan Williams dan.j.willi...@intel.com
---

Changes v1 -> v2:
- skip the descriptor cleanup tasklet if the controller is not yet idle

 drivers/dma/fsldma.c | 27 ---
 1 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 8a78154..037631a 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -1052,20 +1052,41 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
 		stat &= ~FSL_DMA_SR_EOLNI;
 	}
 
-	/* check that the DMA controller is really idle */
-	if (!dma_is_idle(chan))
-		chan_err(chan, "irq: controller not idle!\n");
+	/*
+	 * This driver does not use this feature, therefore we shouldn't
+	 * ever see this bit set in the status register. However, it has
+	 * been observed on MPC8349EA parts.
+	 */
+	if (stat & FSL_DMA_SR_EOSI) {
+		chan_dbg(chan, "irq: End-of-Segments INT\n");
+		stat &= ~FSL_DMA_SR_EOSI;
+	}
 
 	/* check that we handled all of the bits */
 	if (stat)
 		chan_err(chan, "irq: unhandled sr 0x%08x\n", stat);
 
 	/*
+	 * Check that the DMA controller is really idle
+	 *
+	 * Occasionally on MPC8349EA parts, a spurious End-of-Segments
+	 * interrupt is generated. When this happens, the controller is
+	 * still busy. In this case, we shouldn't run the tasklet to
+	 * clean up idle descriptors, since the controller is not yet idle.
+	 */
+	if (!dma_is_idle(chan)) {
+		chan_err(chan, "irq: controller not idle!\n");
+		goto out_skip_tasklet;
+	}
+
+	/*
 	 * Schedule the tasklet to handle all cleanup of the current
 	 * transaction. It will start a new transaction if there is
 	 * one pending.
 	 */
 	tasklet_schedule(&chan->tasklet);
+
+out_skip_tasklet:
 	chan_dbg(chan, "irq: Exit\n");
 	return IRQ_HANDLED;
 }
--
1.7.3.4
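The control flow of the v2 handler can be modelled in plain userspace C. This is only a sketch of the logic described in the patch: the bit values below are placeholders (the real definitions live in drivers/dma/fsldma.h), and `struct irq_result` and `handle_status()` are hypothetical names. It shows the two decisions the patch makes: known status bits are cleared before checking for leftovers, and cleanup is scheduled only when the controller is idle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration; see fsldma.h for the real ones. */
#define MODEL_SR_EOSI	0x00000002u	/* end-of-segments interrupt */
#define MODEL_SR_EOLNI	0x00000008u	/* end-of-links interrupt */

struct irq_result {
	uint32_t unhandled;	/* bits nobody recognised, would be logged */
	bool schedule_tasklet;	/* whether descriptor cleanup should run */
};

static struct irq_result handle_status(uint32_t stat, bool controller_idle)
{
	struct irq_result res;

	/* Clear every bit we know how to handle. */
	if (stat & MODEL_SR_EOLNI)
		stat &= ~MODEL_SR_EOLNI;

	/* Spurious on some parts: acknowledge it and move on. */
	if (stat & MODEL_SR_EOSI)
		stat &= ~MODEL_SR_EOSI;

	/* Whatever is left was not handled. */
	res.unhandled = stat;

	/* Skip cleanup while the controller is still busy. */
	res.schedule_tasklet = controller_idle;

	return res;
}
```

A spurious end-of-segments interrupt arriving while the controller is busy therefore produces neither an "unhandled sr" report nor a cleanup run in this model.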
Re: [PATCH 1/1] carma-fpga: fix race between data dumping and DMA callback
On Fri, Jan 27, 2012 at 08:25:37AM +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2012-01-26 at 13:00 -0800, Ira W. Snyder wrote:
>> @@ -970,7 +984,13 @@ static ssize_t data_en_show(struct device *dev,
>>  			    struct device_attribute *attr, char *buf)
>>  {
>>  	struct fpga_device *priv = dev_get_drvdata(dev);
>> -	return snprintf(buf, PAGE_SIZE, "%u\n", priv->enabled);
>> +	int ret;
>> +
>> +	spin_lock_irq(&priv->lock);
>> +	ret = snprintf(buf, PAGE_SIZE, "%u\n", priv->enabled);
>> +	spin_unlock_irq(&priv->lock);
>> +
>> +	return ret;
>>  }
>
> I don't think the lock buys you anything here.

You're right. Feel free to drop the hunk.

Ira

> Cheers,
> Ben.
[PATCH 1/1] fsldma: ignore end of segments interrupt
The mpc8349ea has been observed to generate spurious end of segments
interrupts despite the fact that they are not enabled by this driver.
Check for them and ignore them to avoid a kernel error message.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
Cc: Dan Williams dan.j.willi...@intel.com
---
 drivers/dma/fsldma.c | 10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 8a78154..7dc9689 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -1052,6 +1052,16 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
 		stat &= ~FSL_DMA_SR_EOLNI;
 	}
 
+	/*
+	 * This driver does not use this feature, therefore we shouldn't
+	 * ever see this bit set in the status register. However, it has
+	 * been observed on MPC8349EA parts.
+	 */
+	if (stat & FSL_DMA_SR_EOSI) {
+		chan_dbg(chan, "irq: End-of-Segments INT\n");
+		stat &= ~FSL_DMA_SR_EOSI;
+	}
+
 	/* check that the DMA controller is really idle */
 	if (!dma_is_idle(chan))
 		chan_err(chan, "irq: controller not idle!\n");
--
1.7.3.4
[PATCH 1/1] carma-fpga: fix lockdep warning
Lockdep occasionally complains with the message:
INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected

This is caused by calling videobuf_dma_unmap() under spin_lock_irq(). To
fix the warning, we drop the lock before unmapping and freeing the
buffer.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
---
 drivers/misc/carma/carma-fpga.c | 13 +++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/carma/carma-fpga.c b/drivers/misc/carma/carma-fpga.c
index 14e974b2..4fd896d 100644
--- a/drivers/misc/carma/carma-fpga.c
+++ b/drivers/misc/carma/carma-fpga.c
@@ -1079,6 +1079,7 @@ static ssize_t data_read(struct file *filp, char __user *ubuf, size_t count,
 	struct fpga_reader *reader = filp->private_data;
 	struct fpga_device *priv = reader->priv;
 	struct list_head *used = &priv->used;
+	bool drop_buffer = false;
 	struct data_buf *dbuf;
 	size_t avail;
 	void *data;
@@ -1166,10 +1167,12 @@ have_buffer:
 	 * One of two things has happened, the device is disabled, or the
 	 * device has been reconfigured underneath us. In either case, we
 	 * should just throw away the buffer.
+	 *
+	 * Lockdep complains if this is done under the spinlock, so we
+	 * handle it during the unlock path.
 	 */
 	if (!priv->enabled || dbuf->size != priv->bufsize) {
-		videobuf_dma_unmap(priv->dev, &dbuf->vb);
-		data_free_buffer(dbuf);
+		drop_buffer = true;
 		goto out_unlock;
 	}
 
@@ -1178,6 +1181,12 @@ have_buffer:
 out_unlock:
 	spin_unlock_irq(&priv->lock);
+
+	if (drop_buffer) {
+		videobuf_dma_unmap(priv->dev, &dbuf->vb);
+		data_free_buffer(dbuf);
+	}
+
 	return count;
 }
--
1.7.3.4
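The pattern in this patch (decide under the lock, do the heavy work after unlocking) can be shown in a few lines of userspace C. This is a sketch, not the driver: `struct dev_model`, `unmap_and_free()` and `read_model()` are hypothetical names, and the lock is modelled as a boolean so that the "must not be called under the lock" rule can be checked with an assert.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct fpga_device. */
struct dev_model {
	bool lock_held;	/* models priv->lock */
	bool enabled;	/* models priv->enabled */
	int freed;	/* how many buffers have been released */
};

/* Models videobuf_dma_unmap() + data_free_buffer(): not allowed under the lock. */
static void unmap_and_free(struct dev_model *dev)
{
	assert(!dev->lock_held);
	dev->freed++;
}

/* Returns 1 if the buffer was usable, 0 if it had to be thrown away. */
static int read_model(struct dev_model *dev)
{
	bool drop_buffer = false;

	dev->lock_held = true;		/* spin_lock_irq() */

	/* Only record the decision while the lock is held. */
	if (!dev->enabled)
		drop_buffer = true;

	dev->lock_held = false;		/* spin_unlock_irq() */

	/* The actual release happens on the unlock path. */
	if (drop_buffer)
		unmap_and_free(dev);

	return drop_buffer ? 0 : 1;
}
```

Moving the `unmap_and_free()` call back inside the locked region makes the assert fire, which mirrors the situation lockdep was warning about.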
[PATCH 1/1] carma-fpga: fix race between data dumping and DMA callback
When the system is under heavy load, we occasionally saw a problem where the
system would receive a legitimate FPGA interrupt at a time when FPGA
interrupts should have been disabled.

This was caused by the data_dma_cb() DMA callback unconditionally re-enabling
FPGA interrupts even when data dumping is disabled. When data dumping was
re-enabled, the irq handler would fire while a DMA was in progress. The
"BUG_ON(priv->inflight != NULL);" during the second invocation of the DMA
callback caused the system to crash.

To fix the issue, the priv->enabled boolean is moved under the protection of
the priv->lock spinlock. The DMA callback checks the boolean to know whether
to re-enable FPGA interrupts before it returns.

Now that it is fixed, the driver keeps FPGA interrupts disabled when it
expects that they are disabled, fixing the bug.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
---
 drivers/misc/carma/carma-fpga.c |  101 +++---
 1 files changed, 61 insertions(+), 40 deletions(-)

diff --git a/drivers/misc/carma/carma-fpga.c b/drivers/misc/carma/carma-fpga.c
index 4fd896d..0cfc5bf 100644
--- a/drivers/misc/carma/carma-fpga.c
+++ b/drivers/misc/carma/carma-fpga.c
@@ -560,6 +560,9 @@ static void data_enable_interrupts(struct fpga_device *priv)
 
 	/* flush the writes */
 	fpga_read_reg(priv, 0, MMAP_REG_STATUS);
+	fpga_read_reg(priv, 1, MMAP_REG_STATUS);
+	fpga_read_reg(priv, 2, MMAP_REG_STATUS);
+	fpga_read_reg(priv, 3, MMAP_REG_STATUS);
 
 	/* switch back to the external interrupt source */
 	iowrite32be(0x3F, priv->regs + SYS_IRQ_SOURCE_CTL);
@@ -591,8 +594,12 @@ static void data_dma_cb(void *data)
 	list_move_tail(&priv->inflight->entry, &priv->used);
 	priv->inflight = NULL;
 
-	/* clear the FPGA status and re-enable interrupts */
-	data_enable_interrupts(priv);
+	/*
+	 * If data dumping is still enabled, then clear the FPGA
+	 * status registers and re-enable FPGA interrupts
+	 */
+	if (priv->enabled)
+		data_enable_interrupts(priv);
 
 	spin_unlock_irqrestore(&priv->lock, flags);
 
@@ -708,6 +715,15 @@ static irqreturn_t data_irq(int irq, void *dev_id)
 
 	spin_lock(&priv->lock);
 
+	/*
+	 * This is an error case that should never happen.
+	 *
+	 * If this driver has a bug and manages to re-enable interrupts while
+	 * a DMA is in progress, then we will hit this statement and should
+	 * start paying attention immediately.
+	 */
+	BUG_ON(priv->inflight != NULL);
+
 	/* hide the interrupt by switching the IRQ driver to GPIO */
 	data_disable_interrupts(priv);
 
@@ -762,11 +778,15 @@ out:
  */
 static int data_device_enable(struct fpga_device *priv)
 {
+	bool enabled;
 	u32 val;
 	int ret;
 
 	/* multiple enables are safe: they do nothing */
-	if (priv->enabled)
+	spin_lock_irq(&priv->lock);
+	enabled = priv->enabled;
+	spin_unlock_irq(&priv->lock);
+	if (enabled)
 		return 0;
 
 	/* check that the FPGAs are programmed */
@@ -797,6 +817,9 @@ static int data_device_enable(struct fpga_device *priv)
 		goto out_error;
 	}
 
+	/* prevent the FPGAs from generating interrupts */
+	data_disable_interrupts(priv);
+
 	/* hookup the irq handler */
 	ret = request_irq(priv->irq, data_irq, IRQF_SHARED, drv_name, priv);
 	if (ret) {
@@ -804,11 +827,13 @@ static int data_device_enable(struct fpga_device *priv)
 		goto out_error;
 	}
 
-	/* switch to the external FPGA IRQ line */
-	data_enable_interrupts(priv);
-
-	/* success, we're enabled */
+	/* allow the DMA callback to re-enable FPGA interrupts */
+	spin_lock_irq(&priv->lock);
 	priv->enabled = true;
+	spin_unlock_irq(&priv->lock);
+
+	/* allow the FPGAs to generate interrupts */
+	data_enable_interrupts(priv);
 
 	return 0;
 out_error:
@@ -834,41 +859,40 @@ out_error:
  */
 static int data_device_disable(struct fpga_device *priv)
 {
-	int ret;
+	spin_lock_irq(&priv->lock);
 
 	/* allow multiple disable */
-	if (!priv->enabled)
+	if (!priv->enabled) {
+		spin_unlock_irq(&priv->lock);
 		return 0;
+	}
+
+	/*
+	 * Mark the device disabled
+	 *
+	 * This stops DMA callbacks from re-enabling interrupts
+	 */
+	priv->enabled = false;
 
-	/* switch to the internal GPIO IRQ line */
+	/* prevent the FPGAs from generating interrupts */
 	data_disable_interrupts(priv);
 
+	/* wait until all ongoing DMA has finished */
+	while (priv->inflight != NULL) {
+		spin_unlock_irq(&priv->lock);
+		wait_event(priv->wait, priv->inflight == NULL);
+		spin_lock_irq(&priv->lock);
+	}
+
+	spin_unlock_irq(&priv->lock
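The ordering problem this patch fixes can be sketched in userspace. The following is only a model under stated assumptions: struct fake_fpga, the fake_* helpers and the atomic_flag spinlock are hypothetical stand-ins, not the driver's real types or the kernel's locking API. It shows why the DMA callback must look at the enabled flag, under the lock, before turning interrupts back on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fpga_device; not the driver's layout. */
struct fake_fpga {
	atomic_flag lock;   /* stands in for priv->lock */
	bool enabled;       /* data dumping requested; protected by lock */
	bool irq_enabled;   /* models whether the FPGA may raise interrupts */
	void *inflight;     /* non-NULL while a DMA is in progress */
};

void fake_lock(struct fake_fpga *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

void fake_unlock(struct fake_fpga *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

void fake_init(struct fake_fpga *p)
{
	atomic_flag_clear(&p->lock);
	p->enabled = false;
	p->irq_enabled = false;
	p->inflight = NULL;
}

/* Models data_device_enable(): set the flag first, then allow interrupts. */
void fake_enable(struct fake_fpga *p)
{
	fake_lock(p);
	p->enabled = true;
	p->irq_enabled = true;
	fake_unlock(p);
}

/* Models the start of data_device_disable(): clear the flag under the lock. */
void fake_disable(struct fake_fpga *p)
{
	fake_lock(p);
	p->enabled = false;
	p->irq_enabled = false;
	fake_unlock(p);
}

/* Models data_irq(): an interrupt must never arrive while a DMA is running. */
void fake_irq(struct fake_fpga *p, void *buf)
{
	fake_lock(p);
	assert(p->inflight == NULL); /* plays the role of the BUG_ON() */
	p->irq_enabled = false;
	p->inflight = buf;
	fake_unlock(p);
}

/* Models data_dma_cb() after the patch: re-enable only if still enabled. */
void fake_dma_cb(struct fake_fpga *p)
{
	fake_lock(p);
	p->inflight = NULL;
	if (p->enabled)
		p->irq_enabled = true;
	fake_unlock(p);
}
```

With this model, an interrupt followed by a disable and then the DMA callback leaves irq_enabled false. Dropping the "if (p->enabled)" test reproduces the original bug: the callback would switch interrupts back on after the device had been disabled.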
Re: [PATCH] fsldma: fix performance degradation by optimizing spinlock use.
On Wed, Jan 11, 2012 at 07:54:55AM +0000, Shi Xuelin-B29237 wrote:
> Hello Ira,
>
> As we discussed for the previous patch, I have added one smp_mb() in
> fsl_tx_status. In my testing with iozone, this smp_mb() could cause a
> 1%~2% performance degradation. Anyway, it is acceptable for me.
> Do you have any other comments?

This patch looks fine to me.

Ira

> -----Original Message-----
> From: Shi Xuelin-B29237
> Sent: December 26, 2011 14:01
> To: i...@ovro.caltech.edu; vinod.k...@intel.com; dan.j.willi...@intel.com; linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org
> Cc: Shi Xuelin-B29237
> Subject: [PATCH] fsldma: fix performance degradation by optimizing spinlock use.
>
> From: Forrest shi b29...@freescale.com
>
> The DMA status check function fsl_tx_status is called heavily in a
> tight loop, and the desc lock taken in fsl_tx_status is contended by
> the DMA status update function. This degrades DMA performance
> considerably.
>
> This patch removes the lock from the fsl_tx_status function and
> introduces an smp_mb() to avoid possible memory inconsistency.
>
> Signed-off-by: Forrest Shi xuelin@freescale.com
> ---
>  drivers/dma/fsldma.c |    6 +-----
>  1 files changed, 1 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 8a78154..008fb5e 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -986,15 +986,11 @@ static enum dma_status fsl_tx_status(struct dma_chan *dchan,
>  	struct fsldma_chan *chan = to_fsl_chan(dchan);
>  	dma_cookie_t last_complete;
>  	dma_cookie_t last_used;
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&chan->desc_lock, flags);
>  
>  	last_complete = chan->completed_cookie;
> +	smp_mb();
>  	last_used = dchan->cookie;
>  
> -	spin_unlock_irqrestore(&chan->desc_lock, flags);
> -
>  	dma_set_tx_state(txstate, last_complete, last_used, 0);
>  	return dma_async_is_complete(cookie, last_complete, last_used);
>  }
> --
> 1.7.0.4

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc: fix kernel log of oops/panic instruction dump
A kernel oops/panic prints an instruction dump showing several instructions before and after the instruction which caused the oops/panic. The code intended that the faulting instruction be enclosed in angle brackets, however a bug caused the faulting instruction to be interpreted by printk() as the message log level. To fix this, the KERN_CONT log level is added before the actualy text of the printed message. === Before the patch === [ 1081.587266] Instruction dump: [ 1081.590236] 7c000110 7cf8 5400077c 552907f6 7d290378 992b0003 4e800020 3801 [ 1081.598034] 3d20c03a 9009a114 7c0004ac 3920 [ 1081.602500] 4e800020 3803ffd0 2b89 4[ 1081.587266] Instruction dump: 4[ 1081.590236] 7c000110 7cf8 5400077c 552907f6 7d290378 992b0003 4e800020 3801 4[ 1081.598034] 3d20c03a 9009a114 7c0004ac 3920 9809[ 1081.602500] 4e800020 3803ffd0 2b89 === After the patch === [ 51.385216] Instruction dump: [ 51.388186] 7c000110 7cf8 5400077c 552907f6 7d290378 992b0003 4e800020 3801 [ 51.395986] 3d20c03a 9009a114 7c0004ac 3920 9809 4e800020 3803ffd0 2b89 4[ 51.385216] Instruction dump: 4[ 51.388186] 7c000110 7cf8 5400077c 552907f6 7d290378 992b0003 4e800020 3801 4[ 51.395986] 3d20c03a 9009a114 7c0004ac 3920 9809 4e800020 3803ffd0 2b89 Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu Cc: Paul Mackerras pau...@samba.org Cc: Benjamin Herrenschmidt b...@kernel.crashing.org Cc: linuxppc-dev@lists.ozlabs.org --- In the above examples, the first block is what is shown on the serial console as the machine dies. The second block is the dump as captured by mtdoops. 
 arch/powerpc/kernel/process.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 6457574..271f809 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -566,12 +566,12 @@ static void show_instructions(struct pt_regs *regs)
 		 */
 		if (!__kernel_text_address(pc) ||
 		     __get_user(instr, (unsigned int __user *)pc)) {
-			printk("XXXXXXXX ");
+			printk(KERN_CONT "XXXXXXXX ");
 		} else {
 			if (regs->nip == pc)
-				printk("<%08x> ", instr);
+				printk(KERN_CONT "<%08x> ", instr);
 			else
-				printk("%08x ", instr);
+				printk(KERN_CONT "%08x ", instr);
 		}
 
 		pc += sizeof(int);
--
1.7.3.4
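To see why a bracketed instruction word can vanish from the log, here is a simplified userspace model of log-level prefix parsing. It is an assumption-laden sketch, loosely based on how printk of that era treated a leading "<N>" marker, and is not the kernel's code: fake_log_prefix_len(), fake_visible_text() and FAKE_KERN_CONT are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the KERN_CONT marker string. */
#define FAKE_KERN_CONT "<c>"

/*
 * Return how many leading bytes of msg this model would consume as a
 * log-level prefix: either "<X>" with X in 0..7, 'c' or 'd', or
 * "<decimal digits>".
 */
size_t fake_log_prefix_len(const char *p)
{
	char *end = NULL;

	if (p[0] != '<' || p[1] == '\0')
		return 0;

	if (p[2] == '>') {
		/* single-character level or special marker */
		if ((p[1] >= '0' && p[1] <= '7') || p[1] == 'c' || p[1] == 'd')
			return 3;
		return 0;
	}

	/* multi-digit number: consumed only if it is directly followed by '>' */
	strtoul(&p[1], &end, 10);
	if (end == &p[1] || *end != '>')
		return 0;
	return (size_t)(end + 1 - p);
}

/* The text that would remain visible after the prefix is stripped. */
const char *fake_visible_text(const char *msg)
{
	return msg + fake_log_prefix_len(msg);
}
```

In this model, fake_visible_text("<98090000> ") yields only the trailing space, because the all-decimal word is taken for a level number. Prepending FAKE_KERN_CONT means the "<c>" marker is consumed instead and "<98090000> " survives intact, which is the effect the patch relies on.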
Re: [PATCH] powerpc: fix kernel log of oops/panic instruction dump
On Sat, Jan 07, 2012 at 09:50:10AM +1100, Benjamin Herrenschmidt wrote: On Fri, 2012-01-06 at 14:34 -0800, Ira W. Snyder wrote: A kernel oops/panic prints an instruction dump showing several instructions before and after the instruction which caused the oops/panic. The code intended that the faulting instruction be enclosed in angle brackets, however a bug caused the faulting instruction to be interpreted by printk() as the message log level. To fix this, the KERN_CONT log level is added before the actualy text of If you could fix the text above to say 'actual' (instead of 'actualy') when you commit this, that would be great. Darn typos. :) the printed message. Nice one, thanks. Cheers, Ben. === Before the patch === [ 1081.587266] Instruction dump: [ 1081.590236] 7c000110 7cf8 5400077c 552907f6 7d290378 992b0003 4e800020 3801 [ 1081.598034] 3d20c03a 9009a114 7c0004ac 3920 [ 1081.602500] 4e800020 3803ffd0 2b89 4[ 1081.587266] Instruction dump: 4[ 1081.590236] 7c000110 7cf8 5400077c 552907f6 7d290378 992b0003 4e800020 3801 4[ 1081.598034] 3d20c03a 9009a114 7c0004ac 3920 9809[ 1081.602500] 4e800020 3803ffd0 2b89 === After the patch === [ 51.385216] Instruction dump: [ 51.388186] 7c000110 7cf8 5400077c 552907f6 7d290378 992b0003 4e800020 3801 [ 51.395986] 3d20c03a 9009a114 7c0004ac 3920 9809 4e800020 3803ffd0 2b89 4[ 51.385216] Instruction dump: 4[ 51.388186] 7c000110 7cf8 5400077c 552907f6 7d290378 992b0003 4e800020 3801 4[ 51.395986] 3d20c03a 9009a114 7c0004ac 3920 9809 4e800020 3803ffd0 2b89 Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu Cc: Paul Mackerras pau...@samba.org Cc: Benjamin Herrenschmidt b...@kernel.crashing.org Cc: linuxppc-dev@lists.ozlabs.org --- In the above examples, the first block is what is shown on the serial console as the machine dies. The second block is the dump as captured by mtdoops. 
> >  arch/powerpc/kernel/process.c |    6 +++---
> >  1 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index 6457574..271f809 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -566,12 +566,12 @@ static void show_instructions(struct pt_regs *regs)
> >  		 */
> >  		if (!__kernel_text_address(pc) ||
> >  		     __get_user(instr, (unsigned int __user *)pc)) {
> > -			printk("XXXXXXXX ");
> > +			printk(KERN_CONT "XXXXXXXX ");
> >  		} else {
> >  			if (regs->nip == pc)
> > -				printk("<%08x> ", instr);
> > +				printk(KERN_CONT "<%08x> ", instr);
> >  			else
> > -				printk("%08x ", instr);
> > +				printk(KERN_CONT "%08x ", instr);
> >  		}
> >  
> >  		pc += sizeof(int);
Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use.
On Fri, Dec 02, 2011 at 03:47:27AM +0000, Shi Xuelin-B29237 wrote:
> Hi Ira,
>
> > I'm convinced that smp_rmb() is needed when removing the spinlock. As
> > noted, Documentation/memory-barriers.txt says that stores on one CPU
> > can be observed by another CPU in a different order.
> >
> > Previously, there was an UNLOCK (in fsl_dma_tx_submit) followed by a
> > LOCK (in fsl_tx_status). This provided a full barrier, forcing the
> > operations to complete correctly when viewed by the second CPU.
>
> I do not agree that this smp_rmb() works here. When the smp_rmb() has
> executed and the CPU begins to read chan->common.cookie, you still
> cannot avoid the ordering issue: one CPU may be reading the old value
> while another CPU is updating it to the new value.
>
> My point is that the order is not important for the DMA decision here.
> A completed DMA tx being judged as not complete is not a big deal,
> because next time it will be OK. I believe there is no case in which an
> uncompleted DMA tx is judged as completed, because fsl_tx_status is
> called after fsl_dma_tx_submit for a specific cookie. If you can give
> me an example here, I will agree with you.

According to memory-barriers.txt, writes to main memory may be observed in
any order if memory barriers are not used. This means that writes can
appear to happen in a different order than they were issued by the CPU.

Citing from the text:

    There are certain things that the Linux kernel memory barriers do not
    guarantee:

    (*) There is no guarantee that any of the memory accesses specified
        before a memory barrier will be _complete_ by the completion of a
        memory barrier instruction; the barrier can be considered to draw
        a line in that CPU's access queue that accesses of the appropriate
        type may not cross.

Also:

    Without intervention, CPU 2 may perceive the events on CPU 1 in some
    effectively random order, despite the write barrier issued by CPU 1:

Also:

    When dealing with CPU-CPU interactions, certain types of memory
    barrier should always be paired. A lack of appropriate pairing is
    almost certainly an error.

    A write barrier should always be paired with a data dependency barrier
    or read barrier, though a general barrier would also be viable.

Therefore, in an SMP system, the following situation can happen.

    descriptor->cookie = 2
    chan->common.cookie = 1
    chan->completed_cookie = 1

This occurs when CPU-A calls fsl_dma_tx_submit() and then CPU-B calls
dma_async_is_complete() ***after*** CPU-B has observed the write to
descriptor->cookie, and ***before*** CPU-B has observed the write to
chan->common.cookie.

Remember, without barriers, CPU-B can observe CPU-A's memory accesses in
*any possible order*. Memory accesses are not guaranteed to be *complete*
by the time fsl_dma_tx_submit() returns!

With the above values, dma_async_is_complete() returns DMA_COMPLETE. This
is incorrect: the DMA is still in progress. The required invariant
chan->common.cookie >= descriptor->cookie has not been met.

By adding an smp_rmb(), I force CPU-B to stall until *both* stores in
fsl_dma_tx_submit() (descriptor->cookie and chan->common.cookie) actually
hit main memory. This avoids the above situation: all CPUs observe
descriptor->cookie and chan->common.cookie to update in sync with each
other.

Is this unclear in any way? Please run your test with the smp_rmb() and
measure the performance impact.

Ira
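The pairing rule cited from memory-barriers.txt can be illustrated in userspace with C11 fences. This is an analogy under stated assumptions, not the fsldma code: struct fake_chan and the fake_* functions are hypothetical, and C11 release/acquire fences are used here as rough counterparts of a write barrier and a read barrier rather than exact equivalents of smp_wmb()/smp_rmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-field channel; only the ordering between fields matters. */
struct fake_chan {
	int desc_cookie;        /* plain data written first by the submitter */
	atomic_int chan_cookie; /* published second, read first by the poller */
};

void fake_chan_init(struct fake_chan *c, int start)
{
	c->desc_cookie = start;
	atomic_init(&c->chan_cookie, start);
}

/* Writer side: store the data, issue a write barrier, then publish. */
void fake_submit(struct fake_chan *c, int cookie)
{
	assert(c != NULL);
	c->desc_cookie = cookie;
	atomic_thread_fence(memory_order_release); /* write-barrier half */
	atomic_store_explicit(&c->chan_cookie, cookie, memory_order_relaxed);
}

/*
 * Reader side: once the published cookie has reached 'wanted', the paired
 * read barrier guarantees the earlier store to desc_cookie is visible too.
 */
bool fake_observe(struct fake_chan *c, int wanted, int *desc)
{
	if (atomic_load_explicit(&c->chan_cookie, memory_order_relaxed) < wanted)
		return false;
	atomic_thread_fence(memory_order_acquire); /* read-barrier half */
	*desc = c->desc_cookie;
	return true;
}
```

Removing either fence breaks the pairing: the reader could then see the new chan_cookie while still reading a stale desc_cookie on a weakly ordered machine, which is the kind of mixed view the message above describes.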
Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use.
On Wed, Nov 30, 2011 at 09:57:47AM +, Shi Xuelin-B29237 wrote: Hello Ira, In drivers/dma/dmaengine.c, we have below tight loop to check DMA completion in mainline Linux: do { status = dma_async_is_tx_complete(chan, cookie, NULL, NULL); if (time_after_eq(jiffies, dma_sync_wait_timeout)) { printk(KERN_ERR dma_sync_wait_timeout!\n); return DMA_ERROR; } } while (status == DMA_IN_PROGRESS); That is the body of dma_sync_wait(). It is mostly used in the raid code. I understand that you don't want to change the raid code to use callbacks. In any case, I think we've strayed from the topic under consideration, which is: can we remove this spinlock without introducing a bug. I'm convinced that smp_rmb() is needed when removing the spinlock. As noted, Documentation/memory-barriers.txt says that stores on one CPU can be observed by another CPU in a different order. Previously, there was an UNLOCK (in fsl_dma_tx_submit) followed by a LOCK (in fsl_tx_status). This provided a full barrier, forcing the operations to complete correctly when viewed by the second CPU. From the text: Therefore, from (1), (2) and (4) an UNLOCK followed by an unconditional LOCK is equivalent to a full barrier, but a LOCK followed by an UNLOCK is not. Also, please read EXAMPLES OF MEMORY BARRIER SEQUENCES and INTER-CPU LOCKING BARRIER EFFECTS. Particularly, in EXAMPLES OF MEMORY BARRIER SEQUENCES, the text notes: Without intervention, CPU 2 may perceive the events on CPU 1 in some effectively random order, despite the write barrier issued by CPU 1: [snip diagram] And thirdly, a read barrier acts as a partial order on loads. Consider the following sequence of events: [snip diagram] Without intervention, CPU 2 may then choose to perceive the events on CPU 1 in some effectively random order, despite the write barrier issued by CPU 1: [snip diagram] And so on. Please read this entire section in the document. I can't give you an ACK on the proposed patch. 
To the best of my understanding, I believe it introduces a bug. I've tried to provide as much evidence for this belief as I can, in the form of documentation in the kernel source tree. If you can cite some documentation that shows I am wrong, I will happily change my mind! Ira -Original Message- From: Ira W. Snyder [mailto:i...@ovro.caltech.edu] Sent: 2011年11月30日 1:26 To: Li Yang-R58472 Cc: Shi Xuelin-B29237; vinod.k...@intel.com; dan.j.willi...@intel.com; linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use. On Tue, Nov 29, 2011 at 03:19:05AM +, Li Yang-R58472 wrote: Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use. On Thu, Nov 24, 2011 at 08:12:25AM +, Shi Xuelin-B29237 wrote: Hi Ira, Thanks for your review. After second thought, I think your scenario may not occur. Because the cookie 20 we query must be returned by fsl_dma_tx_submit(...) in practice. We never query a cookie not returned by fsl_dma_tx_submit(...). I agree about this part. When we call fsl_tx_status(20), the chan-common.cookie is definitely wrote as 20 and cpu2 could not read as 19. This is what I don't agree about. However, I'm not an expert on CPU cache vs. memory accesses in an multi-processor system. The section titled CACHE COHERENCY in Documentation/memory-barriers.txt leads me to believe that the scenario I described is possible. For Freescale PowerPC, the chip automatically takes care of cache coherency. Even if this is a concern, spinlock can't address it. What happens if CPU1's write of chan-common.cookie only goes into CPU1's cache. It never makes it to main memory before CPU2 fetches the old value of 19. I don't think you should see any performance impact from the smp_mb() operation. Smp_mb() do have impact on performance if it's in the hot path. While it might be safer having it, I doubt it is really necessary. 
If the CPU1 doesn't have the updated last_used, it's shouldn't have known there is a cookie 20 existed either. I believe that you are correct, for powerpc. However, anything outside of arch/powerpc shouldn't assume it only runs on powerpc. I wouldn't be surprised to see fsldma running on an iMX someday (ARM processor). My interpretation says that the change introduces the possibility that fsl_tx_status() returns the wrong answer for an extremely small time window, on SMP only, based on Documentation/memory-barriers.txt. But I can't seem convince you. My real question is what code path is hitting this spinlock? Is it in mainline Linux? Why is it polling rather than using callbacks to determine DMA completion
Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use.
On Tue, Nov 29, 2011 at 03:19:05AM +, Li Yang-R58472 wrote: Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use. On Thu, Nov 24, 2011 at 08:12:25AM +, Shi Xuelin-B29237 wrote: Hi Ira, Thanks for your review. After second thought, I think your scenario may not occur. Because the cookie 20 we query must be returned by fsl_dma_tx_submit(...) in practice. We never query a cookie not returned by fsl_dma_tx_submit(...). I agree about this part. When we call fsl_tx_status(20), the chan-common.cookie is definitely wrote as 20 and cpu2 could not read as 19. This is what I don't agree about. However, I'm not an expert on CPU cache vs. memory accesses in an multi-processor system. The section titled CACHE COHERENCY in Documentation/memory-barriers.txt leads me to believe that the scenario I described is possible. For Freescale PowerPC, the chip automatically takes care of cache coherency. Even if this is a concern, spinlock can't address it. What happens if CPU1's write of chan-common.cookie only goes into CPU1's cache. It never makes it to main memory before CPU2 fetches the old value of 19. I don't think you should see any performance impact from the smp_mb() operation. Smp_mb() do have impact on performance if it's in the hot path. While it might be safer having it, I doubt it is really necessary. If the CPU1 doesn't have the updated last_used, it's shouldn't have known there is a cookie 20 existed either. I believe that you are correct, for powerpc. However, anything outside of arch/powerpc shouldn't assume it only runs on powerpc. I wouldn't be surprised to see fsldma running on an iMX someday (ARM processor). My interpretation says that the change introduces the possibility that fsl_tx_status() returns the wrong answer for an extremely small time window, on SMP only, based on Documentation/memory-barriers.txt. But I can't seem convince you. My real question is what code path is hitting this spinlock? Is it in mainline Linux? 
Why is it polling rather than using callbacks to determine DMA completion? Thanks, Ira -Original Message- From: Ira W. Snyder [mailto:i...@ovro.caltech.edu] Sent: 2011年11月23日 2:59 To: Shi Xuelin-B29237 Cc: dan.j.willi...@intel.com; Li Yang-R58472; z...@zh-kernel.org; vinod.k...@intel.com; linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use. On Tue, Nov 22, 2011 at 12:55:05PM +0800, b29...@freescale.com wrote: From: Forrest Shi b29...@freescale.com dma status check function fsl_tx_status is heavily called in a tight loop and the desc lock in fsl_tx_status contended by the dma status update function. this caused the dma performance degrades much. this patch releases the lock in the fsl_tx_status function. I believe it has no neglect impact on the following call of dma_async_is_complete(...). we can see below three conditions will be identified as success a) x complete use b) x complete+N use+N c) x complete use+N here complete is the completed_cookie, use is the last_used cookie, x is the querying cookie, N is MAX cookie when chan-completed_cookie is being read, the last_used may be incresed. Anyway it has no neglect impact on the dma status decision. Signed-off-by: Forrest Shi xuelin@freescale.com --- drivers/dma/fsldma.c |5 - 1 files changed, 0 insertions(+), 5 deletions(-) diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 8a78154..1dca56f 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -986,15 +986,10 @@ static enum dma_status fsl_tx_status(struct dma_chan *dchan, struct fsldma_chan *chan = to_fsl_chan(dchan); dma_cookie_t last_complete; dma_cookie_t last_used; - unsigned long flags; - - spin_lock_irqsave(chan-desc_lock, flags); This will cause a bug. See below for a detailed explanation. 
You need this instead: /* * On an SMP system, we must ensure that this CPU has seen the * memory accesses performed by another CPU under the * chan-desc_lock spinlock. */ smp_mb(); last_complete = chan-completed_cookie; last_used = dchan-cookie; - spin_unlock_irqrestore(chan-desc_lock, flags); - dma_set_tx_state(txstate, last_complete, last_used, 0); return dma_async_is_complete(cookie, last_complete, last_used); } Facts: - dchan-cookie is the same member as chan-common.cookie (same memory location) - chan-common.cookie is the last allocated cookie for a pending
Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use.
On Thu, Nov 24, 2011 at 08:12:25AM +, Shi Xuelin-B29237 wrote: Hi Ira, Thanks for your review. After second thought, I think your scenario may not occur. Because the cookie 20 we query must be returned by fsl_dma_tx_submit(...) in practice. We never query a cookie not returned by fsl_dma_tx_submit(...). I agree about this part. When we call fsl_tx_status(20), the chan-common.cookie is definitely wrote as 20 and cpu2 could not read as 19. This is what I don't agree about. However, I'm not an expert on CPU cache vs. memory accesses in an multi-processor system. The section titled CACHE COHERENCY in Documentation/memory-barriers.txt leads me to believe that the scenario I described is possible. What happens if CPU1's write of chan-common.cookie only goes into CPU1's cache. It never makes it to main memory before CPU2 fetches the old value of 19. I don't think you should see any performance impact from the smp_mb() operation. Thanks, Ira -Original Message- From: Ira W. Snyder [mailto:i...@ovro.caltech.edu] Sent: 2011年11月23日 2:59 To: Shi Xuelin-B29237 Cc: dan.j.willi...@intel.com; Li Yang-R58472; z...@zh-kernel.org; vinod.k...@intel.com; linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use. On Tue, Nov 22, 2011 at 12:55:05PM +0800, b29...@freescale.com wrote: From: Forrest Shi b29...@freescale.com dma status check function fsl_tx_status is heavily called in a tight loop and the desc lock in fsl_tx_status contended by the dma status update function. this caused the dma performance degrades much. this patch releases the lock in the fsl_tx_status function. I believe it has no neglect impact on the following call of dma_async_is_complete(...). 
we can see below three conditions will be identified as success a) x complete use b) x complete+N use+N c) x complete use+N here complete is the completed_cookie, use is the last_used cookie, x is the querying cookie, N is MAX cookie when chan-completed_cookie is being read, the last_used may be incresed. Anyway it has no neglect impact on the dma status decision. Signed-off-by: Forrest Shi xuelin@freescale.com --- drivers/dma/fsldma.c |5 - 1 files changed, 0 insertions(+), 5 deletions(-) diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 8a78154..1dca56f 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -986,15 +986,10 @@ static enum dma_status fsl_tx_status(struct dma_chan *dchan, struct fsldma_chan *chan = to_fsl_chan(dchan); dma_cookie_t last_complete; dma_cookie_t last_used; - unsigned long flags; - - spin_lock_irqsave(chan-desc_lock, flags); This will cause a bug. See below for a detailed explanation. You need this instead: /* * On an SMP system, we must ensure that this CPU has seen the * memory accesses performed by another CPU under the * chan-desc_lock spinlock. */ smp_mb(); last_complete = chan-completed_cookie; last_used = dchan-cookie; - spin_unlock_irqrestore(chan-desc_lock, flags); - dma_set_tx_state(txstate, last_complete, last_used, 0); return dma_async_is_complete(cookie, last_complete, last_used); } Facts: - dchan-cookie is the same member as chan-common.cookie (same memory location) - chan-common.cookie is the last allocated cookie for a pending transaction - chan-completed_cookie is the last completed transaction I have replaced dchan-cookie with chan-common.cookie in the below explanation, to keep everything referenced from the same structure. Variable usage before your change. Everything is used locked. 
- RW chan-common.cookie (fsl_dma_tx_submit) - R chan-common.cookie (fsl_tx_status) - R chan-completed_cookie (fsl_tx_status) - W chan-completed_cookie (dma_do_tasklet) Variable usage after your change: - RW chan-common.cookie LOCKED - R chan-common.cookie NO LOCK - R chan-completed_cookie NO LOCK - W chan-completed_cookie LOCKED What if we assume that you have a 2 CPU system (such as a P2020). After your changes, one possible sequence is: === CPU1 - allocate + submit descriptor: fsl_dma_tx_submit() === spin_lock_irqsave descriptor-cookie = 20 (x in your example) chan-common.cookie = 20 (used in your example) spin_unlock_irqrestore === CPU2 - immediately calls fsl_tx_status() === chan-common.cookie == 19 chan-completed_cookie == 19 descriptor-cookie == 20 Since we don't have locks anymore, CPU2 may not have seen the write to chan-common.cookie yet. Also assume that the DMA hardware has not started processing the transaction yet
Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use.
On Tue, Nov 22, 2011 at 12:55:05PM +0800, b29...@freescale.com wrote:
> From: Forrest Shi b29...@freescale.com
>
> The DMA status check function fsl_tx_status is called heavily in a
> tight loop, and the desc lock taken in fsl_tx_status is contended by
> the DMA status update function. This degrades DMA performance
> considerably.
>
> This patch removes the lock from the fsl_tx_status function.
> I believe it has no neglect impact on the following call of
> dma_async_is_complete(...).
>
> We can see that the three conditions below will be identified as success:
>   a) x < complete < use
>   b) x < complete+N < use+N
>   c) x < complete < use+N
> Here complete is the completed_cookie, use is the last_used cookie,
> x is the querying cookie, and N is MAX cookie.
>
> While chan->completed_cookie is being read, last_used may be increased.
> Anyway, it has no neglect impact on the DMA status decision.
>
> Signed-off-by: Forrest Shi xuelin@freescale.com
> ---
>  drivers/dma/fsldma.c |    5 -----
>  1 files changed, 0 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
> index 8a78154..1dca56f 100644
> --- a/drivers/dma/fsldma.c
> +++ b/drivers/dma/fsldma.c
> @@ -986,15 +986,10 @@ static enum dma_status fsl_tx_status(struct dma_chan *dchan,
>  	struct fsldma_chan *chan = to_fsl_chan(dchan);
>  	dma_cookie_t last_complete;
>  	dma_cookie_t last_used;
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&chan->desc_lock, flags);

This will cause a bug. See below for a detailed explanation.

You need this instead:

	/*
	 * On an SMP system, we must ensure that this CPU has seen the
	 * memory accesses performed by another CPU under the
	 * chan->desc_lock spinlock.
	 */
	smp_mb();

>  
>  	last_complete = chan->completed_cookie;
>  	last_used = dchan->cookie;
>  
> -	spin_unlock_irqrestore(&chan->desc_lock, flags);
> -
>  	dma_set_tx_state(txstate, last_complete, last_used, 0);
>  	return dma_async_is_complete(cookie, last_complete, last_used);
>  }

Facts:
- dchan->cookie is the same member as chan->common.cookie (same memory location)
- chan->common.cookie is the last allocated cookie for a pending transaction
- chan->completed_cookie is the last completed transaction

I have replaced dchan->cookie with chan->common.cookie in the below
explanation, to keep everything referenced from the same structure.

Variable usage before your change. Everything is used locked.
- RW chan->common.cookie     (fsl_dma_tx_submit)
- R  chan->common.cookie     (fsl_tx_status)
- R  chan->completed_cookie  (fsl_tx_status)
- W  chan->completed_cookie  (dma_do_tasklet)

Variable usage after your change:
- RW chan->common.cookie     LOCKED
- R  chan->common.cookie     NO LOCK
- R  chan->completed_cookie  NO LOCK
- W  chan->completed_cookie  LOCKED

What if we assume that you have a 2 CPU system (such as a P2020). After
your changes, one possible sequence is:

=== CPU1 - allocate + submit descriptor: fsl_dma_tx_submit() ===
spin_lock_irqsave
descriptor->cookie = 20		(x in your example)
chan->common.cookie = 20	(used in your example)
spin_unlock_irqrestore

=== CPU2 - immediately calls fsl_tx_status() ===
chan->common.cookie == 19
chan->completed_cookie == 19
descriptor->cookie == 20

Since we don't have locks anymore, CPU2 may not have seen the write to
chan->common.cookie yet.

Also assume that the DMA hardware has not started processing the
transaction yet. Therefore dma_do_tasklet() has not been called, and
chan->completed_cookie has not been updated.

In this case, dma_async_is_complete() (on CPU2) returns DMA_SUCCESS, even
though the DMA operation has not succeeded. The DMA operation has not
even started yet!

The smp_mb() fixes this, since it forces CPU2 to have seen all memory
operations that happened before CPU1 released the spinlock. Spinlocks are
implicit SMP memory barriers.

Therefore, the above example becomes:

smp_mb();
chan->common.cookie == 20
chan->completed_cookie == 19
descriptor->cookie == 20

Then dma_async_is_complete() returns DMA_IN_PROGRESS, which is correct.

Thanks,
Ira
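The cookie 20 example can be checked with a small userspace model of the completion test. The comparison logic below is written from memory of the dma_async_is_complete() helper of that period, so treat its exact body as an assumption. The fake_* names are hypothetical and exist only for this sketch.

```c
#include <assert.h>

typedef int fake_cookie_t; /* stands in for dma_cookie_t */

enum fake_status { FAKE_DMA_SUCCESS, FAKE_DMA_IN_PROGRESS };

/*
 * A cookie counts as complete if it lies outside the window of
 * outstanding cookies (last_complete, last_used], allowing for wraparound.
 */
enum fake_status fake_is_complete(fake_cookie_t cookie,
				  fake_cookie_t last_complete,
				  fake_cookie_t last_used)
{
	if (last_complete <= last_used) {
		if (cookie <= last_complete || cookie > last_used)
			return FAKE_DMA_SUCCESS;
	} else {
		if (cookie <= last_complete && cookie > last_used)
			return FAKE_DMA_SUCCESS;
	}
	return FAKE_DMA_IN_PROGRESS;
}

/* Replays the two snapshots from the message above. */
void fake_demo(void)
{
	/* stale view on CPU2: last_used still 19, so cookie 20 looks done */
	assert(fake_is_complete(20, 19, 19) == FAKE_DMA_SUCCESS);
	/* up-to-date view: last_used is 20, so cookie 20 is outstanding */
	assert(fake_is_complete(20, 19, 20) == FAKE_DMA_IN_PROGRESS);
}
```

The stale snapshot fails because cookie 20 is greater than the observed last_used of 19, which the test reads as "older than anything outstanding". That is why the reader has to see an up-to-date last_used before the comparison is meaningful.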
Re: PCI DMA to user mem on mpc83xx
On Mon, May 23, 2011 at 11:12:41AM +0200, Andre Schwarz wrote:
> Ira,
>
> we have a pretty old PCI device driver here that needs some basic rework
> running on 2.6.27 on several MPC83xx.
> It's a simple char-device with "give me some data" implemented using
> read() resulting in zero-copy DMA to user mem.
>
> There's get_user_pages() working under the hood along with
> SetPageDirty() and page_cache_release().
>
> Main goal is to prepare a sg-list that gets fed into a DMA controller.
>
> I wonder if there's a more up-to-date/efficient and future proof scheme
> of creating the mapping.
>
> Could you provide some pointers or would you stick to the current scheme ?

This scheme is the best you'll come up with for zero-copy IO. I used
get_user_pages_fast(), but otherwise my implementation was the same. These
interfaces should be fairly future proof.

In the end, I realized that most of my transfers were 4 bytes in length,
and zero copy IO was a waste of effort. I decided to use mmap instead.

Ira
Re: [PATCH RFCv7 0/2] CARMA Board Support
On Thu, May 19, 2011 at 02:13:41PM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2011-02-11 at 15:34 -0800, Ira W. Snyder wrote:
> > Hello everyone,
> >
> > This is the seventh posting of these drivers, taking into account
> > comments from earlier postings.
> >
> > I've made sure that the drivers both pass checkpatch without any errors
> > or warnings. I would appreciate as much review as you can offer, so that
> > these can get into the next merge cycle. They've been sitting outside
> > mainline for far too long.
>
> This has been bitrotting for way too long indeed. I'm sticking this into
> powerpc -next today.

Thanks Ben. I'll grab the -next tree and make sure it builds on my board.
I don't think any APIs have changed, but I will send an updated version if
they have.

Thanks,
Ira

> > RFCv6 -> RFCv7:
> > - reference count private data structure (to support unbind)
> > - use #defines instead of hex values for registers
> > - keep lines <= 80 characters
> >
> > RFCv5 -> RFCv6:
> > - change locking in several functions
> > - use list_move_tail() to simplify code
> > - remove unused helper functions
> >
> > RFCv4 -> RFCv5:
> > - remove unnecessary locking per review comments
> > - do not clobber return values from *_interruptible()
> > - explicitly track buffer DMA mapping
> > - use #defines instead of raw hex addresses
> > - change enable sysfs attribute to root-writeable only
> >
> > RFCv3 -> RFCv4:
> > - updates for DATA-FPGA version 2
> >
> > RFCv2 -> RFCv3:
> > - use miscdevice framework (removing the carma class)
> > - add bitfile readback capability to the programmer
> >
> > RFCv1 -> RFCv2:
> > - change comments to kerneldoc format
> > - Kconfig improvements
> > - use the videobuf_dma_sg API in the programmer
> > - updates for Freescale DMAEngine DMA_SLAVE API changes
> >
> > KNOWN ISSUES:
> > - untested with a setup that can generate interrupts (will get access soon)
> > - does not handle runtime unbind
> >
> > Information about the CARMA board:
> >
> > The CARMA board is essentially an MPC8349EA MDS reference design with a
> > 1GHz ADC and 4 high powered data processing FPGAs connected to the local
> > bus. It is all packed into a compact PCI form factor. It is used at the
> > Owens Valley Radio Observatory as the main component in the correlator
> > system.
> >
> > For board information, see:
> > http://www.mmarray.org/~dwh/carma_board/index.html
> >
> > For DATA-FPGA register layout, see:
> > http://www.mmarray.org/memos/carma_memo46.pdf
> >
> > These drivers are the necessary pieces to get the data processing FPGAs
> > working and producing data.
> >
> > Despite the fact that the hardware is custom and we are the only users,
> > I'd still like to get the drivers upstream. Several people have suggested
> > that this is possible.
> >
> > Some further patches will be forthcoming. I have a driver for the LED
> > subsystem and the PPS subsystem. The LED register layout is expected to
> > change soon, so I won't post the driver until that is finished. The PPS
> > driver will be posted separately from this patch series; it is very
> > generic.
> >
> > Thanks to everyone who has provided comments on earlier versions!
> >
> > Ira W. Snyder (2):
> >   misc: add CARMA DATA-FPGA Access Driver
> >   misc: add CARMA DATA-FPGA Programmer support
> >
> >  drivers/misc/Kconfig                    |    1 +
> >  drivers/misc/Makefile                   |    1 +
> >  drivers/misc/carma/Kconfig              |   18 +
> >  drivers/misc/carma/Makefile             |    2 +
> >  drivers/misc/carma/carma-fpga-program.c | 1141
> >  drivers/misc/carma/carma-fpga.c         | 1433 +++
> >  6 files changed, 2596 insertions(+), 0 deletions(-)
> >  create mode 100644 drivers/misc/carma/Kconfig
> >  create mode 100644 drivers/misc/carma/Makefile
> >  create mode 100644 drivers/misc/carma/carma-fpga-program.c
> >  create mode 100644 drivers/misc/carma/carma-fpga.c
Re: [RFC v2] virtio: add virtio-over-PCI driver
On Fri, May 06, 2011 at 12:00:34PM +, Kushwaha Prabhakar-B32579 wrote:

Hi,

I want to use this patch as base patch for FSL 85xx platform to support PCIe Agent. The work looks to be a little old now, so I wanted to understand if any development has happened further on it. If not, I would take this work forward for PCIe Agent. Any help/suggestions are most appreciated in this regard.

Hi Prabhakar,

I use PCI agent mode on an mpc8349emds board. All of the important setup is done very early in the boot process, by U-Boot. Search the U-Boot source for CONFIG_PCISLAVE. My hunch is that the setup needed for 85xx boards is similar.

This virtio-over-PCI work is now very old. It was intended to provide a communication mechanism between a PCI Master and many PCI Agents (slaves). Dave Miller (networking maintainer) suggested using virtio for this so that many different devices could be used. Such as:
- network interface
- serial port (for serial console)

I am aware of other ongoing work in this area. Specifically, some ARM developers are working on a virtio API using their message registers. This work is much newer, and will be a much better starting place for you. Search the virtualization mailing list for:
[PATCH 00/02] virtio: Virtio platform driver

Here is a link to some of their code:
http://www.spinics.net/lists/linux-sh/msg07188.html

I am currently using a custom driver to provide a network device on my PCI agents. Searching the mailing list archives for PCINet, you will find early versions of the driver. I am happy to provide you a current copy. It does not use virtio at all, and is unlikely to be accepted into mainline Linux.

I am happy to provide any of my code if you think it would help you get started. Specifically, the current version of PCINet shows how to use the DMA controller in order to get good network performance.

I am also happy to help port code to 83xx, as well as test on 83xx. Please ask any questions you may have. I have people ask about this code about once every two months. There is plenty of interest in a mainline Linux solution to this problem. :)

I will be moving to 85xx someday, and I hope there is an accepted mainline solution by then.

I hope it helps,
Ira

-----Original Message-----
From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Ira Snyder
Sent: Friday, 27 February, 2009 3:19 AM
To: Arnd Bergmann
Cc: linux-ker...@vger.kernel.org; Rusty Russell; Jan-Bernd Themann; linuxppc-...@ozlabs.org; net...@vger.kernel.org
Subject: Re: [RFC v2] virtio: add virtio-over-PCI driver

On Thu, Feb 26, 2009 at 09:37:14PM +0100, Arnd Bergmann wrote:

On Thursday 26 February 2009, Ira Snyder wrote:

On Thu, Feb 26, 2009 at 05:15:27PM +0100, Arnd Bergmann wrote:

I think so too. I was just getting something working, and thought it would be better to have it out there rather than be working on it forever. I'll try to break things up as I have time.

Ok, perfect!

For the libraries, would you suggest breaking things into separate code files, and using EXPORT_SYMBOL_GPL()? I'm not very familiar with doing that, I've mostly been writing code within the existing device driver frameworks. Or do I need export symbol at all? I'm not sure...

You have both options. When you list each file as a separate module in the Makefile, you use EXPORT_SYMBOL_GPL to mark functions that get called by dependent modules, but this will work only in one way. You can also link multiple files together into one module, although it is less common to link a single source file into multiple modules.

Ok. I'm more familiar with the EXPORT_SYMBOL_GPL interface, so I'll do that. If we decide it sucks later, we'll change it.

I always thought you were supposed to use packed for data structures that are external to the system. I purposely designed the structures so they wouldn't need padding.

That would only make sense for structures that are explicitly unaligned, like a register layout using

	struct my_registers {
		__le16 first;
		__le32 second __attribute__((packed));
		__le16 third;
	};

Even here, I'd recommend listing the individual members as packed rather than the entire struct. Obviously if you lay out the members in a sane way, you don't need either.

Ok. I'll drop the __attribute__((packed)) and make sure there aren't problems. I don't suspect any, though.

I mostly don't need it. In fact, the only place I'm using registers not specific to the messaging unit is in the probe routine, where I set up the 1GB window into host memory and set up access to the guest memory on the PCI bus.

You could add the registers you need for this to the reg property of your device, to be mapped with of_iomap. If the registers for setting up this window don't logically fit into the same device as the one you
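
The advice about __attribute__((packed)) in this message can be checked in userspace. What follows is only a minimal sketch, assuming a GCC or Clang toolchain and a typical ABI where a 32-bit integer is 4-byte aligned. The struct names and the print_layouts() helper are made up for illustration, and uint16_t/uint32_t stand in for the kernel's __le16/__le32 types.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Members ordered so that no padding is needed: no attribute required. */
struct regs_natural {
	uint32_t second;
	uint16_t first;
	uint16_t third;
};

/*
 * Explicitly unaligned layout: only the misaligned member is marked
 * packed, as recommended in the thread, rather than the whole struct.
 */
struct regs_member_packed {
	uint16_t first;
	uint32_t second __attribute__((packed));
	uint16_t third;
};

/* Same member order with no attribute: the compiler may insert padding. */
struct regs_padded {
	uint16_t first;
	uint32_t second;
	uint16_t third;
};

/* Print the size of each layout and the offset of the 32-bit member. */
static void print_layouts(void)
{
	printf("natural:       size %zu, second at %zu\n",
	       sizeof(struct regs_natural),
	       offsetof(struct regs_natural, second));
	printf("member packed: size %zu, second at %zu\n",
	       sizeof(struct regs_member_packed),
	       offsetof(struct regs_member_packed, second));
	printf("padded:        size %zu, second at %zu\n",
	       sizeof(struct regs_padded),
	       offsetof(struct regs_padded, second));
}
```

Calling print_layouts() from a small main() shows whether the compiler added padding on a given target. If the padded and member-packed variants report different offsets for the 32-bit member, the plain struct would not match a register layout that places it at byte offset 2.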
Re: tmpfs size restriction
On Wed, Apr 20, 2011 at 09:21:00PM +0200, Schwarz,Andre wrote:

Hi,

I'm facing an issue with tmpfs mounts on PowerPC (mpc83xx specifically). After

	mount -t tmpfs tmpfs /tmp -o size=16m

I can fill the machine's mem (512MiB) until oom becomes active. I can't see this on any other machine (x86/ARM) I have access to. There's always a "no space left on device" message as soon as the size specified is reached ...

Kernel versions available are v2.6.26.27 and v2.6.34.7, showing the same behaviour. I'd expect the kernel to limit available tmpfs size to 50% of physical memory anyway.

Any ideas what might be wrong?

For what it is worth, I tried this on an 8349EA board, using 2.6.38rc8. It behaved exactly as I would expect. A short log is below.

Maybe your mount command parses options differently on the powerpc machine? Try it with the mount options before the mount points?

iws@carmaboard7 ~ $ mkdir mnt
mkdir: cannot create directory `mnt': File exists
iws@carmaboard7 ~ $ ls mnt/
iws@carmaboard7 ~ $ sudo mount -t tmpfs -o size=16m,users none mnt
iws@carmaboard7 ~ $ ls mnt/
iws@carmaboard7 ~ $ mount | grep mnt
none on /home/iws/mnt type tmpfs (rw,nosuid,nodev,noexec,relatime,size=16384k)
iws@carmaboard7 ~ $ cd ^C
iws@carmaboard7 ~ $ dd if=/dev/zero of=mnt/file.bin bs=1M count=18
dd: writing `mnt/file.bin': No space left on device
16+0 records in
15+0 records out
16760832 bytes (17 MB) copied, 0.313836 s, 53.4 MB/s
iws@carmaboard7 ~ $ du -b mnt/file.bin
16760832	mnt/file.bin
iws@carmaboard7 ~ $ df -h mnt
Filesystem            Size  Used Avail Use% Mounted on
none                   16M     -     -   -  /home/iws/mnt
iws@carmaboard7 ~ $ uname -a
Linux carmaboard7.correlator.pvt 2.6.38-rc8-00028-g24d6894 #1 Tue Mar 8 09:48:15 PST 2011 ppc e300c1 GNU/Linux
iws@carmaboard7 ~ $ cat /proc/cpuinfo
processor	: 0
cpu		: e300c1
clock		: 533.28MHz
revision	: 3.1 (pvr 8083 0031)
bogomips	: 133.29
timebase	: 66646782
platform	: MPC834x MDS
model		: CARMA
Memory		: 256 MB

Hope it helps,
Ira
Re: platform_driver/of_platform_driver compile warning in fsldma.c
On Fri, Apr 08, 2011 at 04:12:13AM -0500, Kumar Gala wrote:

Grant,

I'm being lazy, can you give any quick insight on the following compile warning:

drivers/dma/fsldma.c:1457:2: warning: initialization from incompatible pointer type
drivers/dma/fsldma.c: In function 'fsldma_init':
drivers/dma/fsldma.c:1468:2: warning: passing argument 1 of 'platform_driver_register' from incompatible pointer type
include/linux/platform_device.h:124:12: note: expected 'struct platform_driver *' but argument is of type 'struct of_platform_driver *'
drivers/dma/fsldma.c: In function 'fsldma_exit':
drivers/dma/fsldma.c:1473:2: warning: passing argument 1 of 'platform_driver_unregister' from incompatible pointer type
include/linux/platform_device.h:125:13: note: expected 'struct platform_driver *' but argument is of type 'struct of_platform_driver *'

The struct of_platform_driver needs to be changed to a struct platform_driver. Just remove the of_ prefix; the structure initialization is correct.

I sent a patch for this yesterday to LKML. The title is:
fsldma: fix build warning caused by of_platform_device changes

Ira
Re: Using dmaengine on Freescale P2020 RDB
On Wed, Apr 06, 2011 at 12:40:58PM -0700, Chuck Ketcham wrote:

All,

I have a Freescale P2020 Reference Design Board. I am investigating the possibility of using the dmaengine capability in the 2.6.32.13 kernel to transfer data from memory out onto the PCIe bus. As a first step, I thought I would try the DMA test client (dmatest.ko) to make sure the dmaengine was functioning. I know this doesn't transfer anything over PCIe but only transfers from one memory buffer to another, but I figured I need to get this working first.

Anyway, I built dmatest.ko and ran it (with insmod), and discovered it didn't do anything. I added some printk's to the kernel to investigate what was going on and I found that all attempts to find a channel within dma_request_channel were unsuccessful. Three of the channels were not used because they were already publicly allocated. One channel was not used because it didn't have DMA_MEMCPY capability.

Here are my questions then:
1. Is the dmaengine the appropriate method to use for transferring data from memory out onto the PCIe bus?
2. If dmaengine is correct, what can I do to free up a channel for my own use?

I use the Freescale DMA engine to transfer lots of data out to PCI, on an 8349EA chip. The P2020 DMA engine uses the same driver.

My hunch is that you have enabled CONFIG_NET_DMA, which will claim the channels. You should disable it to use the devices for other uses.

If you want an example of using the DMA engine to transfer from DDR memory to the PowerPC local bus, search the mailing list archives for "CARMA Board Drivers" (RFCv7 was the latest posting). Transferring from DDR to PCI works exactly the same way.

Hope it helps,
Ira
Re: Using dmaengine on Freescale P2020 RDB
On Wed, Apr 06, 2011 at 01:29:05PM -0700, Chuck Ketcham wrote:

Ira,

Thanks for the reference to the CARMA drivers. I will have to take a look at that. In my case, CONFIG_NET_DMA is not enabled. However, I did notice the following entry in my p2020rdb.dts file that may have something to do with dma channels being allocated -- can anyone interpret this?:

	dma@21300 {
		#address-cells = <1>;
		#size-cells = <1>;
		compatible = "fsl,eloplus-dma";
		reg = <0x21300 0x4>;
		ranges = <0x0 0x21100 0x200>;
		cell-index = <0>;
		dma-channel@0 {
			compatible = "fsl,eloplus-dma-channel";
			reg = <0x0 0x80>;
			cell-index = <0>;
			interrupt-parent = <&mpic>;
			interrupts = <20 2>;
		};
		dma-channel@80 {
			compatible = "fsl,eloplus-dma-channel";
			reg = <0x80 0x80>;
			cell-index = <1>;
			interrupt-parent = <&mpic>;
			interrupts = <21 2>;
		};
		dma-channel@100 {
			compatible = "fsl,eloplus-dma-channel";
			reg = <0x100 0x80>;
			cell-index = <2>;
			interrupt-parent = <&mpic>;
			interrupts = <22 2>;
		};
		dma-channel@180 {
			compatible = "fsl,eloplus-dma-channel";
			reg = <0x180 0x80>;
			cell-index = <3>;
			interrupt-parent = <&mpic>;
			interrupts = <23 2>;
		};
	};

Your DTS file looks fine. It is what I would expect to see. The channels are not allocated by anything here.

Turning on CONFIG_DMADEVICES_DEBUG may give you some insight into how the dmaengine core is allocating the channels. I don't have any better advice. I'm afraid you'll have to figure out who is requesting all of the channels on your own.

Ira
[PATCH v3 1/9] dmatest: fix automatic buffer unmap type
The dmatest code relies on the DMAEngine API to automatically call dma_unmap_single() on src buffers. The flags it passes are incorrect, fix them.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/dmatest.c |    7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index 5589358..7e1b0aa 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -285,7 +285,12 @@ static int dmatest_func(void *data)
 
 	set_user_nice(current, 10);
 
-	flags = DMA_CTRL_ACK | DMA_COMPL_SKIP_DEST_UNMAP | DMA_PREP_INTERRUPT;
+	/*
+	 * src buffers are freed by the DMAEngine code with dma_unmap_single()
+	 * dst buffers are freed by ourselves below
+	 */
+	flags = DMA_CTRL_ACK | DMA_PREP_INTERRUPT
+	      | DMA_COMPL_SKIP_DEST_UNMAP | DMA_COMPL_SRC_UNMAP_SINGLE;
 
 	while (!kthread_should_stop()
 	       && !(iterations && total_tests >= iterations)) {
-- 
1.7.3.4
[PATCH v3 2/9] fsldma: move related helper functions near each other
This is a purely cosmetic cleanup. It is nice to have related functions right next to each other in the code. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/dma/fsldma.c | 116 +++-- 1 files changed, 64 insertions(+), 52 deletions(-) diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 4de947a..2e1af45 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -39,33 +39,9 @@ static const char msg_ld_oom[] = No free memory for link descriptor\n; -static void dma_init(struct fsldma_chan *chan) -{ - /* Reset the channel */ - DMA_OUT(chan, chan-regs-mr, 0, 32); - - switch (chan-feature FSL_DMA_IP_MASK) { - case FSL_DMA_IP_85XX: - /* Set the channel to below modes: -* EIE - Error interrupt enable -* EOSIE - End of segments interrupt enable (basic mode) -* EOLNIE - End of links interrupt enable -* BWC - Bandwidth sharing among channels -*/ - DMA_OUT(chan, chan-regs-mr, FSL_DMA_MR_BWC - | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE - | FSL_DMA_MR_EOSIE, 32); - break; - case FSL_DMA_IP_83XX: - /* Set the channel to below modes: -* EOTIE - End-of-transfer interrupt enable -* PRC_RM - PCI read multiple -*/ - DMA_OUT(chan, chan-regs-mr, FSL_DMA_MR_EOTIE - | FSL_DMA_MR_PRC_RM, 32); - break; - } -} +/* + * Register Helpers + */ static void set_sr(struct fsldma_chan *chan, u32 val) { @@ -77,6 +53,30 @@ static u32 get_sr(struct fsldma_chan *chan) return DMA_IN(chan, chan-regs-sr, 32); } +static void set_cdar(struct fsldma_chan *chan, dma_addr_t addr) +{ + DMA_OUT(chan, chan-regs-cdar, addr | FSL_DMA_SNEN, 64); +} + +static dma_addr_t get_cdar(struct fsldma_chan *chan) +{ + return DMA_IN(chan, chan-regs-cdar, 64) ~FSL_DMA_SNEN; +} + +static dma_addr_t get_ndar(struct fsldma_chan *chan) +{ + return DMA_IN(chan, chan-regs-ndar, 64); +} + +static u32 get_bcr(struct fsldma_chan *chan) +{ + return DMA_IN(chan, chan-regs-bcr, 32); +} + +/* + * Descriptor Helpers + */ + static void set_desc_cnt(struct fsldma_chan *chan, struct fsl_dma_ld_hw *hw, u32 count) { @@ 
-113,24 +113,49 @@ static void set_desc_next(struct fsldma_chan *chan, hw-next_ln_addr = CPU_TO_DMA(chan, snoop_bits | next, 64); } -static void set_cdar(struct fsldma_chan *chan, dma_addr_t addr) +static void set_ld_eol(struct fsldma_chan *chan, + struct fsl_desc_sw *desc) { - DMA_OUT(chan, chan-regs-cdar, addr | FSL_DMA_SNEN, 64); -} + u64 snoop_bits; -static dma_addr_t get_cdar(struct fsldma_chan *chan) -{ - return DMA_IN(chan, chan-regs-cdar, 64) ~FSL_DMA_SNEN; -} + snoop_bits = ((chan-feature FSL_DMA_IP_MASK) == FSL_DMA_IP_83XX) + ? FSL_DMA_SNEN : 0; -static dma_addr_t get_ndar(struct fsldma_chan *chan) -{ - return DMA_IN(chan, chan-regs-ndar, 64); + desc-hw.next_ln_addr = CPU_TO_DMA(chan, + DMA_TO_CPU(chan, desc-hw.next_ln_addr, 64) | FSL_DMA_EOL + | snoop_bits, 64); } -static u32 get_bcr(struct fsldma_chan *chan) +/* + * DMA Engine Hardware Control Helpers + */ + +static void dma_init(struct fsldma_chan *chan) { - return DMA_IN(chan, chan-regs-bcr, 32); + /* Reset the channel */ + DMA_OUT(chan, chan-regs-mr, 0, 32); + + switch (chan-feature FSL_DMA_IP_MASK) { + case FSL_DMA_IP_85XX: + /* Set the channel to below modes: +* EIE - Error interrupt enable +* EOSIE - End of segments interrupt enable (basic mode) +* EOLNIE - End of links interrupt enable +* BWC - Bandwidth sharing among channels +*/ + DMA_OUT(chan, chan-regs-mr, FSL_DMA_MR_BWC + | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE + | FSL_DMA_MR_EOSIE, 32); + break; + case FSL_DMA_IP_83XX: + /* Set the channel to below modes: +* EOTIE - End-of-transfer interrupt enable +* PRC_RM - PCI read multiple +*/ + DMA_OUT(chan, chan-regs-mr, FSL_DMA_MR_EOTIE + | FSL_DMA_MR_PRC_RM, 32); + break; + } } static int dma_is_idle(struct fsldma_chan *chan) @@ -185,19 +210,6 @@ static void dma_halt(struct fsldma_chan *chan) dev_err(chan-dev, DMA halt timeout!\n); } -static void set_ld_eol(struct fsldma_chan *chan, - struct fsl_desc_sw *desc) -{ - u64 snoop_bits; - - snoop_bits = ((chan-feature FSL_DMA_IP_MASK) == FSL_DMA_IP_83XX
[PATCH v3 0/9] fsldma: lockup fixes
Hello everyone,

I've been chasing random infrequent controller lockups in the fsldma driver for a long time. I finally managed to find the problem and fix it.

I'm not quite sure about the exact sequence of events which causes the race condition, but it is related to using the hardware registers to track the controller state. See the patch changelogs for more detail.

The problems were quickly found by turning on DMAPOOL_DEBUG inside mm/dmapool.c. This poisons memory allocated with the dmapool API. With dmapool poisoning turned on, the dmatest driver would start producing failures within a few seconds.

After this patchset has been applied, I have run several iterations of the "10 threads per channel, 10 iterations per thread" test without any problems. I have also tested it with the CARMA drivers (posted at linuxppc-dev previously), which make use of the external control features.

While making the previous changes, I noticed that the fsldma driver does not respect the automatic DMA unmapping of src and dst buffers. I have added support for this feature. This also required a fix to dmatest, which was sending incorrect flags.

The "support async_tx dependencies" patch could be split apart from the "automatic unmapping" patch if it is desirable. They both touch the same piece of code, so I thought it was ok to combine them. Let me know.

I would really like to see this go into 2.6.39. I think we can get it reviewed before then. :)

Much thanks goes to Felix Radensky for testing on a P2020 (85xx DMA IP core). I wouldn't have been able to track down the problems on 85xx without his diligent testing.

v2 -> v3:
- use chan_dbg() and chan_err() macros for channel printk

v1 -> v2:
- reordered patches (dmatest change is first now)
- fix problems on 85xx controller
- only set correct bits for 83xx in dma_halt()

Ira W. Snyder (9):
  dmatest: fix automatic buffer unmap type
  fsldma: move related helper functions near each other
  fsldma: use channel name in printk output
  fsldma: improve link descriptor debugging
  fsldma: minor codingstyle and consistency fixes
  fsldma: fix controller lockups
  fsldma: support async_tx dependencies and automatic unmapping
  fsldma: reduce locking during descriptor cleanup
  fsldma: make halt behave nicely on all supported controllers

 drivers/dma/dmatest.c |    7 +-
 drivers/dma/fsldma.c  |  551 +++--
 drivers/dma/fsldma.h  |    6 +-
 3 files changed, 311 insertions(+), 253 deletions(-)

-- 
1.7.3.4
[PATCH v3 4/9] fsldma: improve link descriptor debugging
This adds better tracking to link descriptor allocations, callbacks, and frees. This makes it much easier to track errors with link descriptors.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   21 +++--
 1 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index e535cd1..82b8e9f 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -420,6 +420,10 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
 	desc->async_tx.tx_submit = fsl_dma_tx_submit;
 	desc->async_tx.phys = pdesc;
 
+#ifdef FSL_DMA_LD_DEBUG
+	chan_dbg(chan, "LD %p allocated\n", desc);
+#endif
+
 	return desc;
 }
 
@@ -470,6 +474,9 @@ static void fsldma_free_desc_list(struct fsldma_chan *chan,
 
 	list_for_each_entry_safe(desc, _desc, list, node) {
 		list_del(&desc->node);
+#ifdef FSL_DMA_LD_DEBUG
+		chan_dbg(chan, "LD %p free\n", desc);
+#endif
 		dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
 	}
 }
@@ -481,6 +488,9 @@ static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
 
 	list_for_each_entry_safe_reverse(desc, _desc, list, node) {
 		list_del(&desc->node);
+#ifdef FSL_DMA_LD_DEBUG
+		chan_dbg(chan, "LD %p free\n", desc);
+#endif
 		dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
 	}
 }
@@ -557,9 +567,6 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy(
 			chan_err(chan, "%s\n", msg_ld_oom);
 			goto fail;
 		}
-#ifdef FSL_DMA_LD_DEBUG
-		chan_dbg(chan, "new link desc alloc %p\n", new);
-#endif
 
 		copy = min(len, (size_t)FSL_DMA_BCR_MAX_CNT);
 
@@ -645,9 +652,6 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_sg(struct dma_chan *dchan,
 			chan_err(chan, "%s\n", msg_ld_oom);
 			goto fail;
 		}
-#ifdef FSL_DMA_LD_DEBUG
-		chan_dbg(chan, "new link desc alloc %p\n", new);
-#endif
 
 		set_desc_cnt(chan, &new->hw, len);
 		set_desc_src(chan, &new->hw, src);
@@ -882,13 +886,18 @@ static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
 		callback_param = desc->async_tx.callback_param;
 		if (callback) {
 			spin_unlock_irqrestore(&chan->desc_lock, flags);
+#ifdef FSL_DMA_LD_DEBUG
 			chan_dbg(chan, "LD %p callback\n", desc);
+#endif
 			callback(callback_param);
 			spin_lock_irqsave(&chan->desc_lock, flags);
 		}
 
 		/* Run any dependencies, then free the descriptor */
 		dma_run_dependencies(&desc->async_tx);
+#ifdef FSL_DMA_LD_DEBUG
+		chan_dbg(chan, "LD %p free\n", desc);
+#endif
 		dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
 	}
-- 
1.7.3.4
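
The pattern this patch relies on, a per-channel message prefix plus a compile-time guard around the link descriptor messages, can be tried outside the kernel. This is only a userspace sketch under stated assumptions: struct demo_chan, chan_fmt(), ld_trace() and DEMO_LD_DEBUG are hypothetical stand-ins for struct fsldma_chan, chan_dbg() and FSL_DMA_LD_DEBUG, and snprintf() into a buffer replaces dev_dbg(). It uses the GNU `##__VA_ARGS__` extension, which GCC and Clang accept.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the driver's channel structure. */
struct demo_chan {
	const char *name;
};

/*
 * Prefix every message with the channel name, in the spirit of the
 * driver's chan_dbg()/chan_err() macros. The format string must be a
 * string literal so that it can be concatenated with the "%s: " prefix.
 */
#define chan_fmt(buf, len, chan, fmt, ...) \
	snprintf(buf, len, "%s: " fmt, (chan)->name, ##__VA_ARGS__)

/*
 * Link descriptor tracing is compiled in only when DEMO_LD_DEBUG is
 * defined; otherwise the macro evaluates to 0 and leaves buf untouched.
 */
#ifdef DEMO_LD_DEBUG
#define ld_trace(buf, len, chan, fmt, ...) \
	chan_fmt(buf, len, chan, fmt, ##__VA_ARGS__)
#else
#define ld_trace(buf, len, chan, fmt, ...) 0
#endif
```

Building with -DDEMO_LD_DEBUG turns the trace messages on without touching the call sites, which is the same trade-off the patch makes: tracing costs nothing in normal builds, and every message says which channel produced it.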
[PATCH v3 3/9] fsldma: use channel name in printk output
This makes debugging the driver much easier when multiple channels are running concurrently. In addition, you can see how much descriptor memory each channel has allocated via the dmapool API in sysfs. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/dma/fsldma.c | 69 + drivers/dma/fsldma.h |1 + 2 files changed, 36 insertions(+), 34 deletions(-) diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 2e1af45..e535cd1 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -37,7 +37,12 @@ #include fsldma.h -static const char msg_ld_oom[] = No free memory for link descriptor\n; +#define chan_dbg(chan, fmt, arg...)\ + dev_dbg(chan-dev, %s: fmt, chan-name, ##arg) +#define chan_err(chan, fmt, arg...)\ + dev_err(chan-dev, %s: fmt, chan-name, ##arg) + +static const char msg_ld_oom[] = No free memory for link descriptor; /* * Register Helpers @@ -207,7 +212,7 @@ static void dma_halt(struct fsldma_chan *chan) } if (!dma_is_idle(chan)) - dev_err(chan-dev, DMA halt timeout!\n); + chan_err(chan, DMA halt timeout!\n); } /** @@ -405,7 +410,7 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor( desc = dma_pool_alloc(chan-desc_pool, GFP_ATOMIC, pdesc); if (!desc) { - dev_dbg(chan-dev, out of memory for link desc\n); + chan_dbg(chan, out of memory for link descriptor\n); return NULL; } @@ -439,13 +444,11 @@ static int fsl_dma_alloc_chan_resources(struct dma_chan *dchan) * We need the descriptor to be aligned to 32bytes * for meeting FSL DMA specification requirement. 
*/ - chan-desc_pool = dma_pool_create(fsl_dma_engine_desc_pool, - chan-dev, + chan-desc_pool = dma_pool_create(chan-name, chan-dev, sizeof(struct fsl_desc_sw), __alignof__(struct fsl_desc_sw), 0); if (!chan-desc_pool) { - dev_err(chan-dev, unable to allocate channel %d - descriptor pool\n, chan-id); + chan_err(chan, unable to allocate descriptor pool\n); return -ENOMEM; } @@ -491,7 +494,7 @@ static void fsl_dma_free_chan_resources(struct dma_chan *dchan) struct fsldma_chan *chan = to_fsl_chan(dchan); unsigned long flags; - dev_dbg(chan-dev, Free all channel resources.\n); + chan_dbg(chan, free all channel resources\n); spin_lock_irqsave(chan-desc_lock, flags); fsldma_free_desc_list(chan, chan-ld_pending); fsldma_free_desc_list(chan, chan-ld_running); @@ -514,7 +517,7 @@ fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags) new = fsl_dma_alloc_descriptor(chan); if (!new) { - dev_err(chan-dev, msg_ld_oom); + chan_err(chan, %s\n, msg_ld_oom); return NULL; } @@ -551,11 +554,11 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy( /* Allocate the link descriptor from DMA pool */ new = fsl_dma_alloc_descriptor(chan); if (!new) { - dev_err(chan-dev, msg_ld_oom); + chan_err(chan, %s\n, msg_ld_oom); goto fail; } #ifdef FSL_DMA_LD_DEBUG - dev_dbg(chan-dev, new link desc alloc %p\n, new); + chan_dbg(chan, new link desc alloc %p\n, new); #endif copy = min(len, (size_t)FSL_DMA_BCR_MAX_CNT); @@ -639,11 +642,11 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_sg(struct dma_chan *dchan, /* allocate and populate the descriptor */ new = fsl_dma_alloc_descriptor(chan); if (!new) { - dev_err(chan-dev, msg_ld_oom); + chan_err(chan, %s\n, msg_ld_oom); goto fail; } #ifdef FSL_DMA_LD_DEBUG - dev_dbg(chan-dev, new link desc alloc %p\n, new); + chan_dbg(chan, new link desc alloc %p\n, new); #endif set_desc_cnt(chan, new-hw, len); @@ -815,7 +818,7 @@ static void fsl_dma_update_completed_cookie(struct fsldma_chan *chan) spin_lock_irqsave(chan-desc_lock, flags); if 
(list_empty(chan-ld_running)) { - dev_dbg(chan-dev, no running descriptors\n); + chan_dbg(chan, no running descriptors\n); goto out_unlock; } @@ -863,7 +866,7 @@ static void fsl_chan_ld_cleanup(struct fsldma_chan *chan) spin_lock_irqsave(chan-desc_lock, flags); - dev_dbg(chan-dev, chan completed_cookie = %d\n, chan-completed_cookie); + chan_dbg(chan, chan
[PATCH v3 9/9] fsldma: make halt behave nicely on all supported controllers
The original dma_halt() function set the CA (channel abort) bit on both the 83xx and 85xx controllers. This is incorrect on the 83xx, where this bit means TEM (transfer error mask) instead. The 83xx doesn't support channel abort, so we only do this operation on 85xx.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   19 ---
 1 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index d300de4..8670a50 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -221,13 +221,26 @@ static void dma_halt(struct fsldma_chan *chan)
 	u32 mode;
 	int i;
 
+	/* read the mode register */
 	mode = DMA_IN(chan, &chan->regs->mr, 32);
-	mode |= FSL_DMA_MR_CA;
-	DMA_OUT(chan, &chan->regs->mr, mode, 32);
 
-	mode &= ~(FSL_DMA_MR_CS | FSL_DMA_MR_EMS_EN | FSL_DMA_MR_CA);
+	/*
+	 * The 85xx controller supports channel abort, which will stop
+	 * the current transfer. On 83xx, this bit is the transfer error
+	 * mask bit, which should not be changed.
+	 */
+	if ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) {
+		mode |= FSL_DMA_MR_CA;
+		DMA_OUT(chan, &chan->regs->mr, mode, 32);
+
+		mode &= ~FSL_DMA_MR_CA;
+	}
+
+	/* stop the DMA controller */
+	mode &= ~(FSL_DMA_MR_CS | FSL_DMA_MR_EMS_EN);
 	DMA_OUT(chan, &chan->regs->mr, mode, 32);
 
+	/* wait for the DMA controller to become idle */
 	for (i = 0; i < 100; i++) {
 		if (dma_is_idle(chan))
 			return;
-- 
1.7.3.4
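
The register sequence in this patch can be modelled in plain C to see what each controller type ends up with. This is a sketch only, not the driver code: struct fake_chan, mr_write() and model_dma_halt() are hypothetical names, the mode register is an ordinary variable instead of a memory-mapped register, the bit values are illustrative assumptions, and the wait-for-idle loop is left out.

```c
#include <stdint.h>

/* Illustrative bit values; assumptions made for this sketch only. */
#define MR_CS     (1u << 0)	/* channel start */
#define MR_CA     (1u << 3)	/* channel abort on 85xx, TEM on 83xx */
#define MR_EMS_EN (1u << 18)	/* external master start enable */

enum dma_ip { DMA_IP_83XX, DMA_IP_85XX };

/* Hypothetical stand-in for a channel: a plain variable as the register. */
struct fake_chan {
	enum dma_ip ip;
	uint32_t mr;		/* models the mode register */
	unsigned int writes;	/* number of register writes performed */
};

static void mr_write(struct fake_chan *c, uint32_t val)
{
	c->mr = val;
	c->writes++;
}

/* Same sequence as the patched dma_halt(), minus the idle polling. */
static void model_dma_halt(struct fake_chan *c)
{
	uint32_t mode = c->mr;

	/* Only the 85xx has a channel abort bit; leave TEM alone on 83xx. */
	if (c->ip == DMA_IP_85XX) {
		mode |= MR_CA;
		mr_write(c, mode);
		mode &= ~MR_CA;
	}

	/* Stop the controller: clear channel start and external start. */
	mode &= ~(MR_CS | MR_EMS_EN);
	mr_write(c, mode);
}
```

Starting an 83xx channel with the TEM bit set and halting it leaves that bit untouched after a single register write, while an 85xx channel sees two writes: the abort, then the stop with the abort bit cleared again.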
[PATCH v3 5/9] fsldma: minor codingstyle and consistency fixes
This fixes some minor violations of the coding style. It also changes the style of the device_prep_dma_*() function definitions so they are identical. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/dma/fsldma.c | 29 + drivers/dma/fsldma.h |4 ++-- 2 files changed, 15 insertions(+), 18 deletions(-) diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 82b8e9f..5da1a4a 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -89,7 +89,7 @@ static void set_desc_cnt(struct fsldma_chan *chan, } static void set_desc_src(struct fsldma_chan *chan, - struct fsl_dma_ld_hw *hw, dma_addr_t src) +struct fsl_dma_ld_hw *hw, dma_addr_t src) { u64 snoop_bits; @@ -99,7 +99,7 @@ static void set_desc_src(struct fsldma_chan *chan, } static void set_desc_dst(struct fsldma_chan *chan, - struct fsl_dma_ld_hw *hw, dma_addr_t dst) +struct fsl_dma_ld_hw *hw, dma_addr_t dst) { u64 snoop_bits; @@ -109,7 +109,7 @@ static void set_desc_dst(struct fsldma_chan *chan, } static void set_desc_next(struct fsldma_chan *chan, - struct fsl_dma_ld_hw *hw, dma_addr_t next) + struct fsl_dma_ld_hw *hw, dma_addr_t next) { u64 snoop_bits; @@ -118,8 +118,7 @@ static void set_desc_next(struct fsldma_chan *chan, hw-next_ln_addr = CPU_TO_DMA(chan, snoop_bits | next, 64); } -static void set_ld_eol(struct fsldma_chan *chan, - struct fsl_desc_sw *desc) +static void set_ld_eol(struct fsldma_chan *chan, struct fsl_desc_sw *desc) { u64 snoop_bits; @@ -338,8 +337,7 @@ static void fsl_chan_toggle_ext_start(struct fsldma_chan *chan, int enable) chan-feature = ~FSL_DMA_CHAN_START_EXT; } -static void append_ld_queue(struct fsldma_chan *chan, - struct fsl_desc_sw *desc) +static void append_ld_queue(struct fsldma_chan *chan, struct fsl_desc_sw *desc) { struct fsl_desc_sw *tail = to_fsl_desc(chan-ld_pending.prev); @@ -380,8 +378,8 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx) cookie = chan-common.cookie; list_for_each_entry(child, desc-tx_list, node) { cookie++; - if 
(cookie 0) - cookie = 1; + if (cookie DMA_MIN_COOKIE) + cookie = DMA_MIN_COOKIE; child-async_tx.cookie = cookie; } @@ -402,8 +400,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx) * * Return - The descriptor allocated. NULL for failed. */ -static struct fsl_desc_sw *fsl_dma_alloc_descriptor( - struct fsldma_chan *chan) +static struct fsl_desc_sw *fsl_dma_alloc_descriptor(struct fsldma_chan *chan) { struct fsl_desc_sw *desc; dma_addr_t pdesc; @@ -427,7 +424,6 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor( return desc; } - /** * fsl_dma_alloc_chan_resources - Allocate resources for DMA channel. * @chan : Freescale DMA channel @@ -537,14 +533,15 @@ fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags) /* Insert the link descriptor to the LD ring */ list_add_tail(new-node, new-tx_list); - /* Set End-of-link to the last link descriptor of new list*/ + /* Set End-of-link to the last link descriptor of new list */ set_ld_eol(chan, new); return new-async_tx; } -static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy( - struct dma_chan *dchan, dma_addr_t dma_dst, dma_addr_t dma_src, +static struct dma_async_tx_descriptor * +fsl_dma_prep_memcpy(struct dma_chan *dchan, + dma_addr_t dma_dst, dma_addr_t dma_src, size_t len, unsigned long flags) { struct fsldma_chan *chan; @@ -594,7 +591,7 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy( new-async_tx.flags = flags; /* client is in control of this ack */ new-async_tx.cookie = -EBUSY; - /* Set End-of-link to the last link descriptor of new list*/ + /* Set End-of-link to the last link descriptor of new list */ set_ld_eol(chan, new); return first-async_tx; diff --git a/drivers/dma/fsldma.h b/drivers/dma/fsldma.h index 113e713..49189da 100644 --- a/drivers/dma/fsldma.h +++ b/drivers/dma/fsldma.h @@ -102,8 +102,8 @@ struct fsl_desc_sw { } __attribute__((aligned(32))); struct fsldma_chan_regs { - u32 mr; /* 0x00 - Mode Register */ - u32 sr; /* 0x04 - Status Register 
*/ + u32 mr; /* 0x00 - Mode Register */ + u32 sr; /* 0x04 - Status Register */ u64 cdar; /* 0x08 - Current descriptor address register */ u64 sar;/* 0x10 - Source Address Register */ u64 dar;/* 0x18 - Destination Address
[PATCH v3 6/9] fsldma: fix controller lockups
Enabling poisoning in the dmapool API quickly showed that the DMA controller was fetching descriptors that should not have been in use. This has caused intermittent controller lockups during testing. I have been unable to figure out the exact set of conditions which cause this to happen. However, I believe it is related to the driver using the hardware registers to track whether the controller is busy or not. The code can incorrectly decide that the hardware is idle due to lag between register writes and the hardware actually becoming busy. To fix this, the driver has been reworked to explicitly track the state of the hardware, rather than try to guess what it is doing based on the register values. This has passed dmatest with 10 threads per channel, 10 iterations per thread several times without error. Previously, this would fail within a few seconds. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/dma/fsldma.c | 220 ++ drivers/dma/fsldma.h |1 + 2 files changed, 99 insertions(+), 122 deletions(-) diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 5da1a4a..6e9ad6e 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -68,11 +68,6 @@ static dma_addr_t get_cdar(struct fsldma_chan *chan) return DMA_IN(chan, chan-regs-cdar, 64) ~FSL_DMA_SNEN; } -static dma_addr_t get_ndar(struct fsldma_chan *chan) -{ - return DMA_IN(chan, chan-regs-ndar, 64); -} - static u32 get_bcr(struct fsldma_chan *chan) { return DMA_IN(chan, chan-regs-bcr, 32); @@ -143,13 +138,11 @@ static void dma_init(struct fsldma_chan *chan) case FSL_DMA_IP_85XX: /* Set the channel to below modes: * EIE - Error interrupt enable -* EOSIE - End of segments interrupt enable (basic mode) * EOLNIE - End of links interrupt enable * BWC - Bandwidth sharing among channels */ DMA_OUT(chan, chan-regs-mr, FSL_DMA_MR_BWC - | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE - | FSL_DMA_MR_EOSIE, 32); + | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE, 32); break; case FSL_DMA_IP_83XX: /* Set the channel to below 
modes: @@ -168,25 +161,32 @@ static int dma_is_idle(struct fsldma_chan *chan) return (!(sr FSL_DMA_SR_CB)) || (sr FSL_DMA_SR_CH); } +/* + * Start the DMA controller + * + * Preconditions: + * - the CDAR register must point to the start descriptor + * - the MRn[CS] bit must be cleared + */ static void dma_start(struct fsldma_chan *chan) { u32 mode; mode = DMA_IN(chan, chan-regs-mr, 32); - if ((chan-feature FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) { - if (chan-feature FSL_DMA_CHAN_PAUSE_EXT) { - DMA_OUT(chan, chan-regs-bcr, 0, 32); - mode |= FSL_DMA_MR_EMP_EN; - } else { - mode = ~FSL_DMA_MR_EMP_EN; - } + if (chan-feature FSL_DMA_CHAN_PAUSE_EXT) { + DMA_OUT(chan, chan-regs-bcr, 0, 32); + mode |= FSL_DMA_MR_EMP_EN; + } else { + mode = ~FSL_DMA_MR_EMP_EN; } - if (chan-feature FSL_DMA_CHAN_START_EXT) + if (chan-feature FSL_DMA_CHAN_START_EXT) { mode |= FSL_DMA_MR_EMS_EN; - else + } else { + mode = ~FSL_DMA_MR_EMS_EN; mode |= FSL_DMA_MR_CS; + } DMA_OUT(chan, chan-regs-mr, mode, 32); } @@ -760,14 +760,15 @@ static int fsl_dma_device_control(struct dma_chan *dchan, switch (cmd) { case DMA_TERMINATE_ALL: + spin_lock_irqsave(chan-desc_lock, flags); + /* Halt the DMA engine */ dma_halt(chan); - spin_lock_irqsave(chan-desc_lock, flags); - /* Remove and free all of the descriptors in the LD queue */ fsldma_free_desc_list(chan, chan-ld_pending); fsldma_free_desc_list(chan, chan-ld_running); + chan-idle = true; spin_unlock_irqrestore(chan-desc_lock, flags); return 0; @@ -805,76 +806,43 @@ static int fsl_dma_device_control(struct dma_chan *dchan, } /** - * fsl_dma_update_completed_cookie - Update the completed cookie. + * fsl_chan_ld_cleanup - Clean up link descriptors * @chan : Freescale DMA channel * - * CONTEXT: hardirq + * This function is run after the queue of running descriptors has been + * executed by the DMA engine. It will run any callbacks, and then free + * the descriptors. 
+ * + * HARDWARE STATE: idle */ -static void fsl_dma_update_completed_cookie(struct fsldma_chan *chan) +static void fsl_chan_ld_cleanup(struct fsldma_chan *chan) { - struct fsl_desc_sw *desc; + struct fsl_desc_sw *desc, *_desc; unsigned
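The race described in the 6/9 changelog can be modelled in user space. The following is only a sketch, not driver code: it assumes the busy bit in the status register appears some time after the start request, which is how the changelog explains the wrong "idle" decision. Every sim_* name and the SIM_SR_CB value are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_SR_CB 0x4u /* hypothetical "channel busy" status bit */

/* Toy DMA channel: the busy bit lags behind the start request. */
struct sim_chan {
	uint32_t sr;          /* simulated status register */
	bool start_requested; /* start written, hardware has not reacted yet */
	bool idle;            /* driver-side state, updated explicitly */
};

static void sim_init(struct sim_chan *c)
{
	c->sr = 0;
	c->start_requested = false;
	c->idle = true;
}

/* The driver starts a transfer; the hardware has not latched it yet. */
static void sim_start(struct sim_chan *c)
{
	c->start_requested = true;
	c->idle = false; /* explicit tracking: busy from the moment we start */
}

/* One step of simulated hardware time: the busy bit finally appears. */
static void sim_hw_tick(struct sim_chan *c)
{
	if (c->start_requested) {
		c->sr |= SIM_SR_CB;
		c->start_requested = false;
	}
}

/* Simulated completion handling: the transfer is over. */
static void sim_complete(struct sim_chan *c)
{
	c->sr &= ~SIM_SR_CB;
	c->idle = true;
}

/* Register-based guess, in the spirit of a dma_is_idle() style check. */
static bool sim_idle_by_register(const struct sim_chan *c)
{
	return !(c->sr & SIM_SR_CB);
}

/* Explicit software state, in the spirit of the chan->idle flag. */
static bool sim_idle_by_flag(const struct sim_chan *c)
{
	return c->idle;
}
```

Between sim_start() and the first sim_hw_tick(), the register-based check still reports idle while the flag already reports busy. In this model, that window is where a second start could be issued on top of a running transfer.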
[PATCH v3 7/9] fsldma: support async_tx dependencies and automatic unmapping
Previous to this patch, the dma_run_dependencies() function has been called while holding desc_lock. This function can call tx_submit() for other descriptors, which may try to re-grab the lock. Avoid this by moving the descriptors to be cleaned up to a temporary list, and dropping the lock before cleanup. At the same time, add support for automatic unmapping of src and dst buffers, as offered by the DMAEngine API. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/dma/fsldma.c | 131 -- 1 files changed, 95 insertions(+), 36 deletions(-) diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 6e9ad6e..526579d 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -83,6 +83,11 @@ static void set_desc_cnt(struct fsldma_chan *chan, hw-count = CPU_TO_DMA(chan, count, 32); } +static u32 get_desc_cnt(struct fsldma_chan *chan, struct fsl_desc_sw *desc) +{ + return DMA_TO_CPU(chan, desc-hw.count, 32); +} + static void set_desc_src(struct fsldma_chan *chan, struct fsl_dma_ld_hw *hw, dma_addr_t src) { @@ -93,6 +98,16 @@ static void set_desc_src(struct fsldma_chan *chan, hw-src_addr = CPU_TO_DMA(chan, snoop_bits | src, 64); } +static dma_addr_t get_desc_src(struct fsldma_chan *chan, + struct fsl_desc_sw *desc) +{ + u64 snoop_bits; + + snoop_bits = ((chan-feature FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) + ? ((u64)FSL_DMA_SATR_SREADTYPE_SNOOP_READ 32) : 0; + return DMA_TO_CPU(chan, desc-hw.src_addr, 64) ~snoop_bits; +} + static void set_desc_dst(struct fsldma_chan *chan, struct fsl_dma_ld_hw *hw, dma_addr_t dst) { @@ -103,6 +118,16 @@ static void set_desc_dst(struct fsldma_chan *chan, hw-dst_addr = CPU_TO_DMA(chan, snoop_bits | dst, 64); } +static dma_addr_t get_desc_dst(struct fsldma_chan *chan, + struct fsl_desc_sw *desc) +{ + u64 snoop_bits; + + snoop_bits = ((chan-feature FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) + ? 
((u64)FSL_DMA_DATR_DWRITETYPE_SNOOP_WRITE 32) : 0; + return DMA_TO_CPU(chan, desc-hw.dst_addr, 64) ~snoop_bits; +} + static void set_desc_next(struct fsldma_chan *chan, struct fsl_dma_ld_hw *hw, dma_addr_t next) { @@ -806,6 +831,57 @@ static int fsl_dma_device_control(struct dma_chan *dchan, } /** + * fsldma_cleanup_descriptor - cleanup and free a single link descriptor + * @chan: Freescale DMA channel + * @desc: descriptor to cleanup and free + * + * This function is used on a descriptor which has been executed by the DMA + * controller. It will run any callbacks, submit any dependencies, and then + * free the descriptor. + */ +static void fsldma_cleanup_descriptor(struct fsldma_chan *chan, + struct fsl_desc_sw *desc) +{ + struct dma_async_tx_descriptor *txd = desc-async_tx; + struct device *dev = chan-common.device-dev; + dma_addr_t src = get_desc_src(chan, desc); + dma_addr_t dst = get_desc_dst(chan, desc); + u32 len = get_desc_cnt(chan, desc); + + /* Run the link descriptor callback function */ + if (txd-callback) { +#ifdef FSL_DMA_LD_DEBUG + chan_dbg(chan, LD %p callback\n, desc); +#endif + txd-callback(txd-callback_param); + } + + /* Run any dependencies */ + dma_run_dependencies(txd); + + /* Unmap the dst buffer, if requested */ + if (!(txd-flags DMA_COMPL_SKIP_DEST_UNMAP)) { + if (txd-flags DMA_COMPL_DEST_UNMAP_SINGLE) + dma_unmap_single(dev, dst, len, DMA_FROM_DEVICE); + else + dma_unmap_page(dev, dst, len, DMA_FROM_DEVICE); + } + + /* Unmap the src buffer, if requested */ + if (!(txd-flags DMA_COMPL_SKIP_SRC_UNMAP)) { + if (txd-flags DMA_COMPL_SRC_UNMAP_SINGLE) + dma_unmap_single(dev, src, len, DMA_TO_DEVICE); + else + dma_unmap_page(dev, src, len, DMA_TO_DEVICE); + } + +#ifdef FSL_DMA_LD_DEBUG + chan_dbg(chan, LD %p free\n, desc); +#endif + dma_pool_free(chan-desc_pool, desc, txd-phys); +} + +/** * fsl_chan_ld_cleanup - Clean up link descriptors * @chan : Freescale DMA channel * @@ -818,56 +894,39 @@ static int fsl_dma_device_control(struct dma_chan 
*dchan, static void fsl_chan_ld_cleanup(struct fsldma_chan *chan) { struct fsl_desc_sw *desc, *_desc; + LIST_HEAD(ld_cleanup); unsigned long flags; spin_lock_irqsave(chan-desc_lock, flags); - /* if the ld_running list is empty, there is nothing to do */ - if (list_empty(chan-ld_running)) { - chan_dbg(chan, no descriptors to cleanup\n); - goto out_unlock; + /* update the cookie if we have some descriptors to cleanup
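The locking pattern from the 7/9 changelog (move the finished descriptors to a temporary list, drop the lock, then run callbacks) can be sketched in plain C. This is an illustration under assumptions, not the driver's code: the ldq_* names are invented, the "lock" is a non-recursive flag that counts re-entry instead of hanging, and a singly linked list stands in for the kernel's list_head.

```c
#include <stdbool.h>
#include <stddef.h>

struct ldq_desc {
	struct ldq_desc *next;
	void (*callback)(void *param);
	void *param;
};

struct ldq_chan {
	bool locked;              /* stand-in for desc_lock, non-recursive */
	int deadlocks;            /* times the lock was taken while held */
	struct ldq_desc *running; /* finished descriptors awaiting cleanup */
	struct ldq_desc *pending; /* descriptors queued for the hardware */
};

static void ldq_lock(struct ldq_chan *c)
{
	if (c->locked)
		c->deadlocks++; /* a real spinlock would spin forever here */
	c->locked = true;
}

static void ldq_unlock(struct ldq_chan *c)
{
	c->locked = false;
}

/* Like tx_submit(): queues a descriptor and needs the lock to do it. */
static void ldq_submit(struct ldq_chan *c, struct ldq_desc *d)
{
	ldq_lock(c);
	d->next = c->pending;
	c->pending = d;
	ldq_unlock(c);
}

/* A dependency: completing one descriptor submits another one. */
struct ldq_dep {
	struct ldq_chan *chan;
	struct ldq_desc *desc;
};

static void ldq_submit_dependency(void *param)
{
	struct ldq_dep *dep = param;

	ldq_submit(dep->chan, dep->desc);
}

/* Cleanup: take the whole running list under the lock, process it unlocked. */
static void ldq_cleanup(struct ldq_chan *c)
{
	struct ldq_desc *todo, *d;

	ldq_lock(c);
	todo = c->running; /* plays the role of splicing onto a local list */
	c->running = NULL;
	ldq_unlock(c);

	while (todo) {
		d = todo;
		todo = d->next;
		d->next = NULL;
		if (d->callback)
			d->callback(d->param); /* may call ldq_submit() safely */
	}
}
```

Because the callbacks run after ldq_unlock(), a callback that submits a dependent descriptor takes the lock exactly once, and the deadlock counter stays at zero.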
[PATCH v3 8/9] fsldma: reduce locking during descriptor cleanup
This merges the fsl_chan_ld_cleanup() function into the dma_do_tasklet() function to reduce locking overhead. In the best case, we will be able to keep the DMA controller busy while we are freeing used descriptors. In all cases, the spinlock is grabbed two times fewer than before on each transaction. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/dma/fsldma.c | 108 + 1 files changed, 46 insertions(+), 62 deletions(-) diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 526579d..d300de4 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -882,65 +882,15 @@ static void fsldma_cleanup_descriptor(struct fsldma_chan *chan, } /** - * fsl_chan_ld_cleanup - Clean up link descriptors - * @chan : Freescale DMA channel - * - * This function is run after the queue of running descriptors has been - * executed by the DMA engine. It will run any callbacks, and then free - * the descriptors. - * - * HARDWARE STATE: idle - */ -static void fsl_chan_ld_cleanup(struct fsldma_chan *chan) -{ - struct fsl_desc_sw *desc, *_desc; - LIST_HEAD(ld_cleanup); - unsigned long flags; - - spin_lock_irqsave(chan-desc_lock, flags); - - /* update the cookie if we have some descriptors to cleanup */ - if (!list_empty(chan-ld_running)) { - dma_cookie_t cookie; - - desc = to_fsl_desc(chan-ld_running.prev); - cookie = desc-async_tx.cookie; - - chan-completed_cookie = cookie; - chan_dbg(chan, completed cookie=%d\n, cookie); - } - - /* -* move the descriptors to a temporary list so we can drop the lock -* during the entire cleanup operation -*/ - list_splice_tail_init(chan-ld_running, ld_cleanup); - - spin_unlock_irqrestore(chan-desc_lock, flags); - - /* Run the callback for each descriptor, in order */ - list_for_each_entry_safe(desc, _desc, ld_cleanup, node) { - - /* Remove from the list of transactions */ - list_del(desc-node); - - /* Run all cleanup for this descriptor */ - fsldma_cleanup_descriptor(chan, desc); - } -} - -/** * fsl_chan_xfer_ld_queue - transfer any 
pending transactions * @chan : Freescale DMA channel * * HARDWARE STATE: idle + * LOCKING: must hold chan-desc_lock */ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan) { struct fsl_desc_sw *desc; - unsigned long flags; - - spin_lock_irqsave(chan-desc_lock, flags); /* * If the list of pending descriptors is empty, then we @@ -948,7 +898,7 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan) */ if (list_empty(chan-ld_pending)) { chan_dbg(chan, no pending LDs\n); - goto out_unlock; + return; } /* @@ -958,7 +908,7 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan) */ if (!chan-idle) { chan_dbg(chan, DMA controller still busy\n); - goto out_unlock; + return; } /* @@ -996,9 +946,6 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan) dma_start(chan); chan-idle = false; - -out_unlock: - spin_unlock_irqrestore(chan-desc_lock, flags); } /** @@ -1008,7 +955,11 @@ out_unlock: static void fsl_dma_memcpy_issue_pending(struct dma_chan *dchan) { struct fsldma_chan *chan = to_fsl_chan(dchan); + unsigned long flags; + + spin_lock_irqsave(chan-desc_lock, flags); fsl_chan_xfer_ld_queue(chan); + spin_unlock_irqrestore(chan-desc_lock, flags); } /** @@ -1109,20 +1060,53 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data) static void dma_do_tasklet(unsigned long data) { struct fsldma_chan *chan = (struct fsldma_chan *)data; + struct fsl_desc_sw *desc, *_desc; + LIST_HEAD(ld_cleanup); unsigned long flags; chan_dbg(chan, tasklet entry\n); - /* run all callbacks, free all used descriptors */ - fsl_chan_ld_cleanup(chan); - - /* the channel is now idle */ spin_lock_irqsave(chan-desc_lock, flags); + + /* update the cookie if we have some descriptors to cleanup */ + if (!list_empty(chan-ld_running)) { + dma_cookie_t cookie; + + desc = to_fsl_desc(chan-ld_running.prev); + cookie = desc-async_tx.cookie; + + chan-completed_cookie = cookie; + chan_dbg(chan, completed_cookie=%d\n, cookie); + } + + /* +* move the descriptors to a temporary list so 
we can drop the lock +* during the entire cleanup operation +*/ + list_splice_tail_init(chan-ld_running, ld_cleanup); + + /* the hardware is now idle and ready for more */ chan-idle = true; - spin_unlock_irqrestore(chan
Re: [PATCH 0/8] fsldma: lockup fixes
On Wed, Mar 02, 2011 at 07:49:57AM +0200, Felix Radensky wrote:
> Hi Ira,
>
> On 03/01/2011 09:52 PM, Ira W. Snyder wrote:
> > On Tue, Mar 01, 2011 at 08:55:15AM -0800, Ira W. Snyder wrote:
> >
> > [ big snip ]
> >
> > I'd still like the bisect if you have a chance.
> >
> > I've re-reviewed the patch series, and found the places that change
> > register writes to the controller. The patch below changes the register
> > operations back to the original order. It doesn't make any sense why
> > this would be required, but it is worth a quick try.
> >
> > I've added an XXX mark where you can comment out a single line if this
> > patch fails. It is highly unlikely to make any difference, but I'm
> > really having a hard time understanding what is wrong.
>
> This patch fixes the problem. See below

Excellent! I know what is happening now.

The 85xx controller doesn't clear the channel start bit at the end of a
transfer. Sure enough, buried near the end of the chapter, the datasheet
implies this in a table very far away from the register definitions.

The 83xx datasheet explicitly states that it clears this bit
automatically.

I'll post an updated patch series later today. Thank you so much for
being patient and trying out all of these patches.

Ira
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
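The difference between the two controllers can be shown with a small user-space model. It is a sketch under one loud assumption: that a transfer is triggered by the channel start bit going from 0 to 1, so a start bit left set by the hardware makes the next plain "set" a no-op. The cs_* names and the bit value are made up for the example and are not taken from the datasheet.

```c
#include <stdbool.h>
#include <stdint.h>

#define CS_MR_CS 0x1u /* hypothetical position of the "channel start" bit */

enum cs_ip { CS_IP_83XX, CS_IP_85XX };

struct cs_ctrl {
	enum cs_ip ip;
	uint32_t mr;   /* simulated mode register */
	int transfers; /* number of transfers actually started */
};

/* Simulated register write: a transfer starts only when CS goes 0 -> 1. */
static void cs_write_mr(struct cs_ctrl *c, uint32_t val)
{
	bool was_set = (c->mr & CS_MR_CS) != 0;
	bool now_set = (val & CS_MR_CS) != 0;

	if (!was_set && now_set)
		c->transfers++;
	c->mr = val;
}

/* Simulated end of transfer: only the 83xx-style model clears CS itself. */
static void cs_transfer_done(struct cs_ctrl *c)
{
	if (c->ip == CS_IP_83XX)
		c->mr &= ~CS_MR_CS;
}

/* Naive start: just set CS and hope it was clear. */
static void cs_start_naive(struct cs_ctrl *c)
{
	cs_write_mr(c, c->mr | CS_MR_CS);
}

/* Safe start: clear CS first so the next write is always a 0 -> 1 edge. */
static void cs_start_safe(struct cs_ctrl *c)
{
	cs_write_mr(c, c->mr & ~CS_MR_CS);
	cs_write_mr(c, c->mr | CS_MR_CS);
}
```

In this model, two naive starts on the 85xx-style controller launch only one transfer, while the safe variant launches two. The 83xx-style controller works either way, which would explain why the problem only showed up on 85xx hardware.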
[PATCH v2 1/9] dmatest: fix automatic buffer unmap type
The dmatest code relies on the DMAEngine API to automatically call
dma_unmap_single() on src buffers. The flags it passes are incorrect, fix
them.

Signed-off-by: Ira W. Snyder <i...@ovro.caltech.edu>
---
 drivers/dma/dmatest.c |    7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index 5589358..7e1b0aa 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -285,7 +285,12 @@ static int dmatest_func(void *data)
 
 	set_user_nice(current, 10);
 
-	flags = DMA_CTRL_ACK | DMA_COMPL_SKIP_DEST_UNMAP | DMA_PREP_INTERRUPT;
+	/*
+	 * src buffers are freed by the DMAEngine code with dma_unmap_single()
+	 * dst buffers are freed by ourselves below
+	 */
+	flags = DMA_CTRL_ACK | DMA_PREP_INTERRUPT
+	      | DMA_COMPL_SKIP_DEST_UNMAP | DMA_COMPL_SRC_UNMAP_SINGLE;
 
 	while (!kthread_should_stop()
 	       && !(iterations && total_tests >= iterations)) {
-- 
1.7.3.4
[PATCH v2 0/9] fsldma: lockup fixes
Hello everyone,

I've been chasing random infrequent controller lockups in the fsldma
driver for a long time. I finally managed to find the problem and fix it.
I'm not quite sure about the exact sequence of events which causes the
race condition, but it is related to using the hardware registers to track
the controller state. See the patch changelogs for more detail.

The problems were quickly found by turning on DMAPOOL_DEBUG inside
mm/dmapool.c. This poisons memory allocated with the dmapool API.

With dmapool poisoning turned on, the dmatest driver would start producing
failures within a few seconds. After this patchset has been applied, I
have run several iterations of the 10 threads per channel, 10 iterations
per thread test without any problems.

I have also tested it with the CARMA drivers (posted at linuxppc-dev
previously), which make use of the external control features.

While making the previous changes, I noticed that the fsldma driver does
not respect the automatic DMA unmapping of src and dst buffers. I have
added support for this feature. This also required a fix to dmatest, which
was sending incorrect flags.

The "support async_tx dependencies" patch could be split apart from the
"automatic unmapping" patch if it is desirable. They both touch the same
piece of code, so I thought it was ok to combine them. Let me know.

I would really like to see this go into 2.6.39. I think we can get it
reviewed before then. :)

Much thanks goes to Felix Radensky for testing on a P2020 (85xx DMA IP
core). I wouldn't have been able to track down the problems on 85xx
without his diligent testing.

v1 -> v2:
- reordered patches (dmatest change is first now)
- fix problems on 85xx controller
- only set correct bits for 83xx in dma_halt()

Ira W. Snyder (9):
  dmatest: fix automatic buffer unmap type
  fsldma: move related helper functions near each other
  fsldma: use channel name in printk output
  fsldma: improve link descriptor debugging
  fsldma: minor codingstyle and consistency fixes
  fsldma: fix controller lockups
  fsldma: support async_tx dependencies and automatic unmapping
  fsldma: reduce locking during descriptor cleanup
  fsldma: make halt behave nicely on all supported controllers

 drivers/dma/dmatest.c |    7 +-
 drivers/dma/fsldma.c  |  542 +++--
 drivers/dma/fsldma.h  |    6 +-
 3 files changed, 308 insertions(+), 247 deletions(-)

-- 
1.7.3.4
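For readers unfamiliar with pool poisoning, the idea behind the debugging aid mentioned in the cover letter can be sketched in a few lines of user-space C. This is only an illustration of the technique, not the code in mm/dmapool.c: the poison_* names, the block size and the poison byte are chosen for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POISON_BYTE 0xa7 /* example pattern written over freed blocks */
#define POISON_BLOCK_SIZE 32

struct poison_block {
	unsigned char data[POISON_BLOCK_SIZE];
	bool in_use;
};

/* Hand out a block with clean contents. */
static void poison_alloc(struct poison_block *b)
{
	memset(b->data, 0, POISON_BLOCK_SIZE);
	b->in_use = true;
}

/* Free a block and overwrite it so stale users read obvious garbage. */
static void poison_free(struct poison_block *b)
{
	memset(b->data, POISON_BYTE, POISON_BLOCK_SIZE);
	b->in_use = false;
}

/* True if every byte of the block still carries the poison pattern. */
static bool poison_looks_freed(const struct poison_block *b)
{
	size_t i;

	for (i = 0; i < POISON_BLOCK_SIZE; i++)
		if (b->data[i] != POISON_BYTE)
			return false;
	return true;
}
```

A consumer that keeps reading a block after poison_free(), as the DMA controller apparently did with stale descriptors, sees the poison pattern instead of plausible old data, so the failure shows up quickly instead of intermittently.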
[PATCH v2 2/9] fsldma: move related helper functions near each other
This is a purely cosmetic cleanup. It is nice to have related functions right next to each other in the code. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/dma/fsldma.c | 116 +++-- 1 files changed, 64 insertions(+), 52 deletions(-) diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 4de947a..2e1af45 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -39,33 +39,9 @@ static const char msg_ld_oom[] = No free memory for link descriptor\n; -static void dma_init(struct fsldma_chan *chan) -{ - /* Reset the channel */ - DMA_OUT(chan, chan-regs-mr, 0, 32); - - switch (chan-feature FSL_DMA_IP_MASK) { - case FSL_DMA_IP_85XX: - /* Set the channel to below modes: -* EIE - Error interrupt enable -* EOSIE - End of segments interrupt enable (basic mode) -* EOLNIE - End of links interrupt enable -* BWC - Bandwidth sharing among channels -*/ - DMA_OUT(chan, chan-regs-mr, FSL_DMA_MR_BWC - | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE - | FSL_DMA_MR_EOSIE, 32); - break; - case FSL_DMA_IP_83XX: - /* Set the channel to below modes: -* EOTIE - End-of-transfer interrupt enable -* PRC_RM - PCI read multiple -*/ - DMA_OUT(chan, chan-regs-mr, FSL_DMA_MR_EOTIE - | FSL_DMA_MR_PRC_RM, 32); - break; - } -} +/* + * Register Helpers + */ static void set_sr(struct fsldma_chan *chan, u32 val) { @@ -77,6 +53,30 @@ static u32 get_sr(struct fsldma_chan *chan) return DMA_IN(chan, chan-regs-sr, 32); } +static void set_cdar(struct fsldma_chan *chan, dma_addr_t addr) +{ + DMA_OUT(chan, chan-regs-cdar, addr | FSL_DMA_SNEN, 64); +} + +static dma_addr_t get_cdar(struct fsldma_chan *chan) +{ + return DMA_IN(chan, chan-regs-cdar, 64) ~FSL_DMA_SNEN; +} + +static dma_addr_t get_ndar(struct fsldma_chan *chan) +{ + return DMA_IN(chan, chan-regs-ndar, 64); +} + +static u32 get_bcr(struct fsldma_chan *chan) +{ + return DMA_IN(chan, chan-regs-bcr, 32); +} + +/* + * Descriptor Helpers + */ + static void set_desc_cnt(struct fsldma_chan *chan, struct fsl_dma_ld_hw *hw, u32 count) { @@ 
-113,24 +113,49 @@ static void set_desc_next(struct fsldma_chan *chan, hw-next_ln_addr = CPU_TO_DMA(chan, snoop_bits | next, 64); } -static void set_cdar(struct fsldma_chan *chan, dma_addr_t addr) +static void set_ld_eol(struct fsldma_chan *chan, + struct fsl_desc_sw *desc) { - DMA_OUT(chan, chan-regs-cdar, addr | FSL_DMA_SNEN, 64); -} + u64 snoop_bits; -static dma_addr_t get_cdar(struct fsldma_chan *chan) -{ - return DMA_IN(chan, chan-regs-cdar, 64) ~FSL_DMA_SNEN; -} + snoop_bits = ((chan-feature FSL_DMA_IP_MASK) == FSL_DMA_IP_83XX) + ? FSL_DMA_SNEN : 0; -static dma_addr_t get_ndar(struct fsldma_chan *chan) -{ - return DMA_IN(chan, chan-regs-ndar, 64); + desc-hw.next_ln_addr = CPU_TO_DMA(chan, + DMA_TO_CPU(chan, desc-hw.next_ln_addr, 64) | FSL_DMA_EOL + | snoop_bits, 64); } -static u32 get_bcr(struct fsldma_chan *chan) +/* + * DMA Engine Hardware Control Helpers + */ + +static void dma_init(struct fsldma_chan *chan) { - return DMA_IN(chan, chan-regs-bcr, 32); + /* Reset the channel */ + DMA_OUT(chan, chan-regs-mr, 0, 32); + + switch (chan-feature FSL_DMA_IP_MASK) { + case FSL_DMA_IP_85XX: + /* Set the channel to below modes: +* EIE - Error interrupt enable +* EOSIE - End of segments interrupt enable (basic mode) +* EOLNIE - End of links interrupt enable +* BWC - Bandwidth sharing among channels +*/ + DMA_OUT(chan, chan-regs-mr, FSL_DMA_MR_BWC + | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE + | FSL_DMA_MR_EOSIE, 32); + break; + case FSL_DMA_IP_83XX: + /* Set the channel to below modes: +* EOTIE - End-of-transfer interrupt enable +* PRC_RM - PCI read multiple +*/ + DMA_OUT(chan, chan-regs-mr, FSL_DMA_MR_EOTIE + | FSL_DMA_MR_PRC_RM, 32); + break; + } } static int dma_is_idle(struct fsldma_chan *chan) @@ -185,19 +210,6 @@ static void dma_halt(struct fsldma_chan *chan) dev_err(chan-dev, DMA halt timeout!\n); } -static void set_ld_eol(struct fsldma_chan *chan, - struct fsl_desc_sw *desc) -{ - u64 snoop_bits; - - snoop_bits = ((chan-feature FSL_DMA_IP_MASK) == FSL_DMA_IP_83XX
[PATCH v2 4/9] fsldma: improve link descriptor debugging
This adds better tracking to link descriptor allocations, callbacks, and
frees. This makes it much easier to track errors with link descriptors.

Signed-off-by: Ira W. Snyder <i...@ovro.caltech.edu>
---
 drivers/dma/fsldma.c |   21 +++--
 1 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 6e3d3d7..851993c 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -416,6 +416,10 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
 	desc->async_tx.tx_submit = fsl_dma_tx_submit;
 	desc->async_tx.phys = pdesc;
 
+#ifdef FSL_DMA_LD_DEBUG
+	dev_dbg(chan->dev, "%s: LD %p allocated\n", chan->name, desc);
+#endif
+
 	return desc;
 }
 
@@ -467,6 +471,9 @@ static void fsldma_free_desc_list(struct fsldma_chan *chan,
 
 	list_for_each_entry_safe(desc, _desc, list, node) {
 		list_del(&desc->node);
+#ifdef FSL_DMA_LD_DEBUG
+		dev_dbg(chan->dev, "%s: LD %p free\n", chan->name, desc);
+#endif
 		dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
 	}
 }
@@ -478,6 +485,9 @@ static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
 
 	list_for_each_entry_safe_reverse(desc, _desc, list, node) {
 		list_del(&desc->node);
+#ifdef FSL_DMA_LD_DEBUG
+		dev_dbg(chan->dev, "%s: LD %p free\n", chan->name, desc);
+#endif
 		dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
 	}
 }
@@ -554,9 +564,6 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy(
 			dev_err(chan->dev, "%s: %s\n", chan->name, msg_ld_oom);
 			goto fail;
 		}
-#ifdef FSL_DMA_LD_DEBUG
-		dev_dbg(chan->dev, "%s: new link desc alloc %p\n", chan->name, new);
-#endif
 
 		copy = min(len, (size_t)FSL_DMA_BCR_MAX_CNT);
 
@@ -642,9 +649,6 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_sg(struct dma_chan *dchan,
 			dev_err(chan->dev, "%s: %s\n", chan->name, msg_ld_oom);
 			goto fail;
 		}
-#ifdef FSL_DMA_LD_DEBUG
-		dev_dbg(chan->dev, "%s: new link desc alloc %p\n", chan->name, new);
-#endif
 
 		set_desc_cnt(chan, &new->hw, len);
 		set_desc_src(chan, &new->hw, src);
@@ -881,13 +885,18 @@ static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
 		callback_param = desc->async_tx.callback_param;
 		if (callback) {
 			spin_unlock_irqrestore(&chan->desc_lock, flags);
+#ifdef FSL_DMA_LD_DEBUG
 			dev_dbg(chan->dev, "%s: LD %p callback\n", name, desc);
+#endif
 			callback(callback_param);
 			spin_lock_irqsave(&chan->desc_lock, flags);
 		}
 
 		/* Run any dependencies, then free the descriptor */
 		dma_run_dependencies(&desc->async_tx);
+#ifdef FSL_DMA_LD_DEBUG
+		dev_dbg(chan->dev, "%s: LD %p free\n", name, desc);
+#endif
 		dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
 	}
 
-- 
1.7.3.4
[PATCH v2 3/9] fsldma: use channel name in printk output
This makes debugging the driver much easier when multiple channels are running concurrently. In addition, you can see how much descriptor memory each channel has allocated via the dmapool API in sysfs. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/dma/fsldma.c | 60 +++-- drivers/dma/fsldma.h |1 + 2 files changed, 34 insertions(+), 27 deletions(-) diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 2e1af45..6e3d3d7 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -37,7 +37,7 @@ #include fsldma.h -static const char msg_ld_oom[] = No free memory for link descriptor\n; +static const char msg_ld_oom[] = No free memory for link descriptor; /* * Register Helpers @@ -207,7 +207,7 @@ static void dma_halt(struct fsldma_chan *chan) } if (!dma_is_idle(chan)) - dev_err(chan-dev, DMA halt timeout!\n); + dev_err(chan-dev, %s: DMA halt timeout!\n, chan-name); } /** @@ -400,12 +400,13 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx) static struct fsl_desc_sw *fsl_dma_alloc_descriptor( struct fsldma_chan *chan) { + const char *name = chan-name; struct fsl_desc_sw *desc; dma_addr_t pdesc; desc = dma_pool_alloc(chan-desc_pool, GFP_ATOMIC, pdesc); if (!desc) { - dev_dbg(chan-dev, out of memory for link desc\n); + dev_dbg(chan-dev, %s: out of memory for link desc\n, name); return NULL; } @@ -439,13 +440,12 @@ static int fsl_dma_alloc_chan_resources(struct dma_chan *dchan) * We need the descriptor to be aligned to 32bytes * for meeting FSL DMA specification requirement. 
*/ - chan-desc_pool = dma_pool_create(fsl_dma_engine_desc_pool, - chan-dev, + chan-desc_pool = dma_pool_create(chan-name, chan-dev, sizeof(struct fsl_desc_sw), __alignof__(struct fsl_desc_sw), 0); if (!chan-desc_pool) { - dev_err(chan-dev, unable to allocate channel %d - descriptor pool\n, chan-id); + dev_err(chan-dev, %s: unable to allocate descriptor pool\n, + chan-name); return -ENOMEM; } @@ -491,7 +491,7 @@ static void fsl_dma_free_chan_resources(struct dma_chan *dchan) struct fsldma_chan *chan = to_fsl_chan(dchan); unsigned long flags; - dev_dbg(chan-dev, Free all channel resources.\n); + dev_dbg(chan-dev, %s: Free all channel resources.\n, chan-name); spin_lock_irqsave(chan-desc_lock, flags); fsldma_free_desc_list(chan, chan-ld_pending); fsldma_free_desc_list(chan, chan-ld_running); @@ -514,7 +514,7 @@ fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags) new = fsl_dma_alloc_descriptor(chan); if (!new) { - dev_err(chan-dev, msg_ld_oom); + dev_err(chan-dev, %s: %s\n, chan-name, msg_ld_oom); return NULL; } @@ -551,11 +551,11 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy( /* Allocate the link descriptor from DMA pool */ new = fsl_dma_alloc_descriptor(chan); if (!new) { - dev_err(chan-dev, msg_ld_oom); + dev_err(chan-dev, %s: %s\n, chan-name, msg_ld_oom); goto fail; } #ifdef FSL_DMA_LD_DEBUG - dev_dbg(chan-dev, new link desc alloc %p\n, new); + dev_dbg(chan-dev, %s: new link desc alloc %p\n, chan-name, new); #endif copy = min(len, (size_t)FSL_DMA_BCR_MAX_CNT); @@ -639,11 +639,11 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_sg(struct dma_chan *dchan, /* allocate and populate the descriptor */ new = fsl_dma_alloc_descriptor(chan); if (!new) { - dev_err(chan-dev, msg_ld_oom); + dev_err(chan-dev, %s: %s\n, chan-name, msg_ld_oom); goto fail; } #ifdef FSL_DMA_LD_DEBUG - dev_dbg(chan-dev, new link desc alloc %p\n, new); + dev_dbg(chan-dev, %s: new link desc alloc %p\n, chan-name, new); #endif set_desc_cnt(chan, new-hw, len); @@ 
-815,7 +815,7 @@ static void fsl_dma_update_completed_cookie(struct fsldma_chan *chan) spin_lock_irqsave(chan-desc_lock, flags); if (list_empty(chan-ld_running)) { - dev_dbg(chan-dev, no running descriptors\n); + dev_dbg(chan-dev, %s: no running descriptors\n, chan-name); goto out_unlock; } @@ -859,11 +859,13 @@ static enum dma_status
[PATCH v2 9/9] fsldma: make halt behave nicely on all supported controllers
The original dma_halt() function set the CA (channel abort) bit on both
the 83xx and 85xx controllers. This is incorrect on the 83xx, where this
bit means TEM (transfer error mask) instead. The 83xx doesn't support
channel abort, so we only do this operation on 85xx.

Signed-off-by: Ira W. Snyder <i...@ovro.caltech.edu>
---
 drivers/dma/fsldma.c |   19 ---
 1 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 40babc1..eb7bc24 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -216,13 +216,26 @@ static void dma_halt(struct fsldma_chan *chan)
 	u32 mode;
 	int i;
 
+	/* read the mode register */
 	mode = DMA_IN(chan, chan->regs->mr, 32);
-	mode |= FSL_DMA_MR_CA;
-	DMA_OUT(chan, chan->regs->mr, mode, 32);
 
-	mode &= ~(FSL_DMA_MR_CS | FSL_DMA_MR_EMS_EN | FSL_DMA_MR_CA);
+	/*
+	 * The 85xx controller supports channel abort, which will stop
+	 * the current transfer. On 83xx, this bit is the transfer error
+	 * mask bit, which should not be changed.
+	 */
+	if ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) {
+		mode |= FSL_DMA_MR_CA;
+		DMA_OUT(chan, chan->regs->mr, mode, 32);
+
+		mode &= ~FSL_DMA_MR_CA;
+	}
+
+	/* stop the DMA controller */
+	mode &= ~(FSL_DMA_MR_CS | FSL_DMA_MR_EMS_EN);
 	DMA_OUT(chan, chan->regs->mr, mode, 32);
 
+	/* wait for the DMA controller to become idle */
 	for (i = 0; i < 100; i++) {
 		if (dma_is_idle(chan))
 			return;
-- 
1.7.3.4
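The effect of the 9/9 change on the shared bit can be checked with a user-space model of the mode register. This is a sketch, not the driver: the halt_* names and bit positions are hypothetical, and the only behaviour modelled is that the same bit means "channel abort" on the 85xx-style controller and "transfer error mask" on the 83xx-style one.

```c
#include <stdint.h>

enum halt_ip { HALT_IP_83XX, HALT_IP_85XX };

/* Hypothetical bit positions, chosen for the example only. */
#define HALT_MR_CS     0x01u
#define HALT_MR_EMS_EN 0x02u
#define HALT_MR_CA_TEM 0x08u /* CA on the 85xx model, TEM on the 83xx model */

struct halt_chan {
	enum halt_ip ip;
	uint32_t mr; /* simulated mode register */
	int aborts;  /* times an abort was requested on the 85xx model */
};

/* Simulated register write; counts abort requests on the 85xx model. */
static void halt_write_mr(struct halt_chan *c, uint32_t val)
{
	if (c->ip == HALT_IP_85XX && (val & HALT_MR_CA_TEM))
		c->aborts++;
	c->mr = val;
}

/* Controller-aware halt, following the structure of the patch above. */
static void halt_channel(struct halt_chan *c)
{
	uint32_t mode = c->mr;

	/* Only the 85xx model has a channel abort bit to pulse. */
	if (c->ip == HALT_IP_85XX) {
		mode |= HALT_MR_CA_TEM;
		halt_write_mr(c, mode);
		mode &= ~HALT_MR_CA_TEM;
	}

	/* Stop the channel on both models without touching other bits. */
	mode &= ~(HALT_MR_CS | HALT_MR_EMS_EN);
	halt_write_mr(c, mode);
}
```

On the 83xx model, a TEM bit that was set before halt_channel() is still set afterwards. On the 85xx model, the abort bit is pulsed once and ends up cleared.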
[PATCH v2 5/9] fsldma: minor codingstyle and consistency fixes
This fixes some minor violations of the coding style. It also changes
the style of the device_prep_dma_*() function definitions so they are
identical.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   29 +++++++++++++----------------
 drivers/dma/fsldma.h |    4 ++--
 2 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 851993c..06421c0 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -84,7 +84,7 @@ static void set_desc_cnt(struct fsldma_chan *chan,
 }

 static void set_desc_src(struct fsldma_chan *chan,
-				struct fsl_dma_ld_hw *hw, dma_addr_t src)
+			 struct fsl_dma_ld_hw *hw, dma_addr_t src)
 {
 	u64 snoop_bits;

@@ -94,7 +94,7 @@ static void set_desc_src(struct fsldma_chan *chan,
 }

 static void set_desc_dst(struct fsldma_chan *chan,
-				struct fsl_dma_ld_hw *hw, dma_addr_t dst)
+			 struct fsl_dma_ld_hw *hw, dma_addr_t dst)
 {
 	u64 snoop_bits;

@@ -104,7 +104,7 @@ static void set_desc_dst(struct fsldma_chan *chan,
 }

 static void set_desc_next(struct fsldma_chan *chan,
-				struct fsl_dma_ld_hw *hw, dma_addr_t next)
+			  struct fsl_dma_ld_hw *hw, dma_addr_t next)
 {
 	u64 snoop_bits;

@@ -113,8 +113,7 @@ static void set_desc_next(struct fsldma_chan *chan,
 	hw->next_ln_addr = CPU_TO_DMA(chan, snoop_bits | next, 64);
 }

-static void set_ld_eol(struct fsldma_chan *chan,
-			struct fsl_desc_sw *desc)
+static void set_ld_eol(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
 {
 	u64 snoop_bits;

@@ -333,8 +332,7 @@ static void fsl_chan_toggle_ext_start(struct fsldma_chan *chan, int enable)
 		chan->feature &= ~FSL_DMA_CHAN_START_EXT;
 }

-static void append_ld_queue(struct fsldma_chan *chan,
-		struct fsl_desc_sw *desc)
+static void append_ld_queue(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
 {
 	struct fsl_desc_sw *tail = to_fsl_desc(chan->ld_pending.prev);

@@ -375,8 +373,8 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 	cookie = chan->common.cookie;
 	list_for_each_entry(child, &desc->tx_list, node) {
 		cookie++;
-		if (cookie < 0)
-			cookie = 1;
+		if (cookie < DMA_MIN_COOKIE)
+			cookie = DMA_MIN_COOKIE;

 		child->async_tx.cookie = cookie;
 	}
@@ -397,8 +395,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
  *
  * Return - The descriptor allocated. NULL for failed.
  */
-static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
-					struct fsldma_chan *chan)
+static struct fsl_desc_sw *fsl_dma_alloc_descriptor(struct fsldma_chan *chan)
 {
 	const char *name = chan->name;
 	struct fsl_desc_sw *desc;
@@ -423,7 +420,6 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
 	return desc;
 }

-
 /**
  * fsl_dma_alloc_chan_resources - Allocate resources for DMA channel.
  * @chan : Freescale DMA channel
@@ -534,14 +530,15 @@ fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags)
 	/* Insert the link descriptor to the LD ring */
 	list_add_tail(&new->node, &new->tx_list);

-	/* Set End-of-link to the last link descriptor of new list*/
+	/* Set End-of-link to the last link descriptor of new list */
 	set_ld_eol(chan, new);

 	return &new->async_tx;
 }

-static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy(
-	struct dma_chan *dchan, dma_addr_t dma_dst, dma_addr_t dma_src,
+static struct dma_async_tx_descriptor *
+fsl_dma_prep_memcpy(struct dma_chan *dchan,
+	dma_addr_t dma_dst, dma_addr_t dma_src,
 	size_t len, unsigned long flags)
 {
 	struct fsldma_chan *chan;
@@ -591,7 +588,7 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy(
 	new->async_tx.flags = flags; /* client is in control of this ack */
 	new->async_tx.cookie = -EBUSY;

-	/* Set End-of-link to the last link descriptor of new list*/
+	/* Set End-of-link to the last link descriptor of new list */
 	set_ld_eol(chan, new);

 	return &first->async_tx;
diff --git a/drivers/dma/fsldma.h b/drivers/dma/fsldma.h
index 113e713..49189da 100644
--- a/drivers/dma/fsldma.h
+++ b/drivers/dma/fsldma.h
@@ -102,8 +102,8 @@ struct fsl_desc_sw {
 } __attribute__((aligned(32)));

 struct fsldma_chan_regs {
-	u32 mr; /* 0x00 - Mode Register */
-	u32 sr; /* 0x04 - Status Register */
+	u32 mr;		/* 0x00 - Mode Register */
+	u32 sr;		/* 0x04 - Status Register */
 	u64 cdar;	/* 0x08 - Current descriptor address register */
 	u64 sar;	/* 0x10 - Source Address Register */
 	u64 dar;	/* 0x18
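The cookie hunk in this patch replaces the literal 1 with DMA_MIN_COOKIE. As a rough standalone illustration of the wrap-around rule, the sketch below uses a stand-in typedef and assumes the constant keeps the old value of 1; next_cookie() is a made-up helper, not a kernel function. Unlike the driver, it tests for the maximum before incrementing, because signed overflow is undefined in portable C.

```c
#include <stdint.h>

typedef int32_t cookie_t;	/* stand-in for the kernel's dma_cookie_t */
#define MIN_COOKIE 1		/* assumed value of DMA_MIN_COOKIE */

/* Return the cookie that follows 'cookie', wrapping back to MIN_COOKIE. */
cookie_t next_cookie(cookie_t cookie)
{
	if (cookie == INT32_MAX)	/* would overflow: wrap explicitly */
		return MIN_COOKIE;

	cookie++;
	if (cookie < MIN_COOKIE)	/* never hand out zero or negative values */
		cookie = MIN_COOKIE;
	return cookie;
}
```

Negative values stay reserved this way, which matters because the driver stores -EBUSY in a descriptor's cookie field before submission.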
[PATCH v2 6/9] fsldma: fix controller lockups
Enabling poisoning in the dmapool API quickly showed that the DMA
controller was fetching descriptors that should not have been in use.
This has caused intermittent controller lockups during testing.

I have been unable to figure out the exact set of conditions which cause
this to happen. However, I believe it is related to the driver using the
hardware registers to track whether the controller is busy or not. The
code can incorrectly decide that the hardware is idle due to lag between
register writes and the hardware actually becoming busy.

To fix this, the driver has been reworked to explicitly track the state
of the hardware, rather than try to guess what it is doing based on the
register values.

This has passed dmatest with 10 threads per channel, 10 iterations per
thread several times without error. Previously, this would fail within
a few seconds.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  225 ++
 drivers/dma/fsldma.h |    1 +
 2 files changed, 101 insertions(+), 125 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 06421c0..e9bb51e 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -63,11 +63,6 @@ static dma_addr_t get_cdar(struct fsldma_chan *chan)
 	return DMA_IN(chan, &chan->regs->cdar, 64) & ~FSL_DMA_SNEN;
 }

-static dma_addr_t get_ndar(struct fsldma_chan *chan)
-{
-	return DMA_IN(chan, &chan->regs->ndar, 64);
-}
-
 static u32 get_bcr(struct fsldma_chan *chan)
 {
 	return DMA_IN(chan, &chan->regs->bcr, 32);
@@ -138,13 +133,11 @@ static void dma_init(struct fsldma_chan *chan)
 	case FSL_DMA_IP_85XX:
 		/* Set the channel to below modes:
 		 * EIE - Error interrupt enable
-		 * EOSIE - End of segments interrupt enable (basic mode)
 		 * EOLNIE - End of links interrupt enable
 		 * BWC - Bandwidth sharing among channels
 		 */
 		DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_BWC
-				| FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE
-				| FSL_DMA_MR_EOSIE, 32);
+				| FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE, 32);
 		break;
 	case FSL_DMA_IP_83XX:
 		/* Set the channel to below modes:
@@ -163,25 +156,32 @@ static int dma_is_idle(struct fsldma_chan *chan)
 	return (!(sr & FSL_DMA_SR_CB)) || (sr & FSL_DMA_SR_CH);
 }

+/*
+ * Start the DMA controller
+ *
+ * Preconditions:
+ * - the CDAR register must point to the start descriptor
+ * - the MRn[CS] bit must be cleared
+ */
 static void dma_start(struct fsldma_chan *chan)
 {
 	u32 mode;

 	mode = DMA_IN(chan, &chan->regs->mr, 32);

-	if ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) {
-		if (chan->feature & FSL_DMA_CHAN_PAUSE_EXT) {
-			DMA_OUT(chan, &chan->regs->bcr, 0, 32);
-			mode |= FSL_DMA_MR_EMP_EN;
-		} else {
-			mode &= ~FSL_DMA_MR_EMP_EN;
-		}
+	if (chan->feature & FSL_DMA_CHAN_PAUSE_EXT) {
+		DMA_OUT(chan, &chan->regs->bcr, 0, 32);
+		mode |= FSL_DMA_MR_EMP_EN;
+	} else {
+		mode &= ~FSL_DMA_MR_EMP_EN;
 	}

-	if (chan->feature & FSL_DMA_CHAN_START_EXT)
+	if (chan->feature & FSL_DMA_CHAN_START_EXT) {
 		mode |= FSL_DMA_MR_EMS_EN;
-	else
+	} else {
+		mode &= ~FSL_DMA_MR_EMS_EN;
 		mode |= FSL_DMA_MR_CS;
+	}

 	DMA_OUT(chan, &chan->regs->mr, mode, 32);
 }
@@ -757,14 +757,15 @@ static int fsl_dma_device_control(struct dma_chan *dchan,

 	switch (cmd) {
 	case DMA_TERMINATE_ALL:
+		spin_lock_irqsave(&chan->desc_lock, flags);
+
 		/* Halt the DMA engine */
 		dma_halt(chan);

-		spin_lock_irqsave(&chan->desc_lock, flags);
-
 		/* Remove and free all of the descriptors in the LD queue */
 		fsldma_free_desc_list(chan, &chan->ld_pending);
 		fsldma_free_desc_list(chan, &chan->ld_running);
+		chan->idle = true;

 		spin_unlock_irqrestore(&chan->desc_lock, flags);
 		return 0;
@@ -802,78 +803,45 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 }

 /**
- * fsl_dma_update_completed_cookie - Update the completed cookie.
+ * fsl_chan_ld_cleanup - Clean up link descriptors
  * @chan : Freescale DMA channel
  *
- * CONTEXT: hardirq
+ * This function is run after the queue of running descriptors has been
+ * executed by the DMA engine. It will run any callbacks, and then free
+ * the descriptors.
+ *
+ * HARDWARE STATE: idle
  */
-static void fsl_dma_update_completed_cookie(struct fsldma_chan *chan)
+static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
 {
-	struct fsl_desc_sw *desc;
+	struct fsl_desc_sw *desc, *_desc;
+	const char
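The idea of tracking the controller state in software, rather than reading it back from the status register, can be modelled in a few lines. The sketch below is a toy under stated assumptions: the struct, the counters and both function names are invented for illustration, and locking is left out entirely. Only the idle flag and the two early-return checks mirror what the patch describes.

```c
#include <stdbool.h>

/* Toy channel: counters stand in for the ld_pending and ld_running lists. */
struct chan_model {
	bool idle;	/* software view of the hardware state */
	int pending;	/* descriptors submitted but not yet started */
	int running;	/* descriptors handed to the hardware */
	int starts;	/* how many times the hardware was started */
};

/* Start the hardware only when there is work and it is known to be idle. */
void model_xfer_ld_queue(struct chan_model *c)
{
	if (c->pending == 0)
		return;			/* nothing to do */
	if (!c->idle)
		return;			/* still busy: the completion path restarts us */

	c->running += c->pending;	/* move pending work to running */
	c->pending = 0;
	c->starts++;			/* stands in for dma_start() */
	c->idle = false;
}

/* Completion path: the hardware is idle now, so queued work may start. */
void model_complete(struct chan_model *c)
{
	c->running = 0;
	c->idle = true;
	model_xfer_ld_queue(c);
}
```

Because the flag changes only where the driver itself starts or finishes a chain, it cannot lag behind a register write the way a status readback can.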
[PATCH v2 7/9] fsldma: support async_tx dependencies and automatic unmapping
Previous to this patch, the dma_run_dependencies() function has been
called while holding desc_lock. This function can call tx_submit() for
other descriptors, which may try to re-grab the lock. Avoid this by
moving the descriptors to be cleaned up to a temporary list, and
dropping the lock before cleanup.

At the same time, add support for automatic unmapping of src and dst
buffers, as offered by the DMAEngine API.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  132 --
 1 files changed, 95 insertions(+), 37 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index e9bb51e..48e48c7 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -78,6 +78,11 @@ static void set_desc_cnt(struct fsldma_chan *chan,
 	hw->count = CPU_TO_DMA(chan, count, 32);
 }

+static u32 get_desc_cnt(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
+{
+	return DMA_TO_CPU(chan, desc->hw.count, 32);
+}
+
 static void set_desc_src(struct fsldma_chan *chan,
 			 struct fsl_dma_ld_hw *hw, dma_addr_t src)
 {
@@ -88,6 +93,16 @@ static void set_desc_src(struct fsldma_chan *chan,
 	hw->src_addr = CPU_TO_DMA(chan, snoop_bits | src, 64);
 }

+static dma_addr_t get_desc_src(struct fsldma_chan *chan,
+			       struct fsl_desc_sw *desc)
+{
+	u64 snoop_bits;
+
+	snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX)
+		? ((u64)FSL_DMA_SATR_SREADTYPE_SNOOP_READ << 32) : 0;
+	return DMA_TO_CPU(chan, desc->hw.src_addr, 64) & ~snoop_bits;
+}
+
 static void set_desc_dst(struct fsldma_chan *chan,
 			 struct fsl_dma_ld_hw *hw, dma_addr_t dst)
 {
@@ -98,6 +113,16 @@ static void set_desc_dst(struct fsldma_chan *chan,
 	hw->dst_addr = CPU_TO_DMA(chan, snoop_bits | dst, 64);
 }

+static dma_addr_t get_desc_dst(struct fsldma_chan *chan,
+			       struct fsl_desc_sw *desc)
+{
+	u64 snoop_bits;
+
+	snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX)
+		? ((u64)FSL_DMA_DATR_DWRITETYPE_SNOOP_WRITE << 32) : 0;
+	return DMA_TO_CPU(chan, desc->hw.dst_addr, 64) & ~snoop_bits;
+}
+
 static void set_desc_next(struct fsldma_chan *chan,
 			  struct fsl_dma_ld_hw *hw, dma_addr_t next)
 {
@@ -803,6 +828,57 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 }

 /**
+ * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
+ * @chan: Freescale DMA channel
+ * @desc: descriptor to cleanup and free
+ *
+ * This function is used on a descriptor which has been executed by the DMA
+ * controller. It will run any callbacks, submit any dependencies, and then
+ * free the descriptor.
+ */
+static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
+				      struct fsl_desc_sw *desc)
+{
+	struct dma_async_tx_descriptor *txd = &desc->async_tx;
+	struct device *dev = chan->common.device->dev;
+	dma_addr_t src = get_desc_src(chan, desc);
+	dma_addr_t dst = get_desc_dst(chan, desc);
+	u32 len = get_desc_cnt(chan, desc);
+
+	/* Run the link descriptor callback function */
+	if (txd->callback) {
+#ifdef FSL_DMA_LD_DEBUG
+		dev_dbg(chan->dev, "%s: LD %p callback\n", chan->name, desc);
+#endif
+		txd->callback(txd->callback_param);
+	}
+
+	/* Run any dependencies */
+	dma_run_dependencies(txd);
+
+	/* Unmap the dst buffer, if requested */
+	if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
+		if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
+			dma_unmap_single(dev, dst, len, DMA_FROM_DEVICE);
+		else
+			dma_unmap_page(dev, dst, len, DMA_FROM_DEVICE);
+	}
+
+	/* Unmap the src buffer, if requested */
+	if (!(txd->flags & DMA_COMPL_SKIP_SRC_UNMAP)) {
+		if (txd->flags & DMA_COMPL_SRC_UNMAP_SINGLE)
+			dma_unmap_single(dev, src, len, DMA_TO_DEVICE);
+		else
+			dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
+	}
+
+#ifdef FSL_DMA_LD_DEBUG
+	dev_dbg(chan->dev, "%s: LD %p free\n", chan->name, desc);
+#endif
+	dma_pool_free(chan->desc_pool, desc, txd->phys);
+}
+
+/**
  * fsl_chan_ld_cleanup - Clean up link descriptors
  * @chan : Freescale DMA channel
  *
@@ -816,57 +892,39 @@ static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
 {
 	struct fsl_desc_sw *desc, *_desc;
 	const char *name = chan->name;
+	LIST_HEAD(ld_cleanup);
 	unsigned long flags;

 	spin_lock_irqsave(&chan->desc_lock, flags);

-	/* if the ld_running list is empty, there is nothing to do */
-	if (list_empty(&chan->ld_running)) {
-		dev_dbg(chan->dev, "%s: no descriptors to cleanup\n", name);
-		goto out_unlock;
+	/* update the cookie if we have some
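The "move to a temporary list, then drop the lock" pattern from the changelog can be shown without any kernel infrastructure. In the sketch below everything is hypothetical: an int flag stands in for desc_lock, a hand-rolled singly linked list stands in for the kernel list helpers, and model_submit() only imitates the fact that tx_submit() needs the lock.

```c
#include <stddef.h>

struct chan_model;

struct desc {
	struct desc *next;
	void (*callback)(struct chan_model *c);	/* may be NULL */
};

struct chan_model {
	int lock_held;		/* stand-in for desc_lock */
	int deadlocks;		/* times submit was entered with the lock held */
	struct desc *running;	/* stand-in for ld_running */
	struct desc *spare;	/* descriptor that a callback may resubmit */
};

/* Imitates tx_submit(): it takes the lock, so it must not run under it. */
void model_submit(struct chan_model *c, struct desc *d)
{
	if (c->lock_held) {
		c->deadlocks++;	/* a real spinlock would deadlock here */
		return;
	}
	c->lock_held = 1;
	d->next = c->running;
	c->running = d;
	c->lock_held = 0;
}

/* Example completion callback that queues a follow-up descriptor. */
void resubmit_cb(struct chan_model *c)
{
	model_submit(c, c->spare);
}

/* Detach all finished descriptors under the lock, clean them up outside it. */
int model_cleanup(struct chan_model *c)
{
	struct desc *todo, *d;
	int cleaned = 0;

	c->lock_held = 1;	/* take the lock */
	todo = c->running;	/* splice everything onto a private list */
	c->running = NULL;
	c->lock_held = 0;	/* drop the lock before running callbacks */

	while (todo) {
		d = todo;
		todo = d->next;
		d->next = NULL;
		if (d->callback)
			d->callback(c);
		cleaned++;
	}
	return cleaned;
}
```

Since the callbacks run only after the lock is released, a callback that submits new work finds the lock free, and the private list cannot be touched by anyone else in the meantime.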
[PATCH v2 8/9] fsldma: reduce locking during descriptor cleanup
This merges the fsl_chan_ld_cleanup() function into the dma_do_tasklet()
function to reduce locking overhead. In the best case, we will be able
to keep the DMA controller busy while we are freeing used descriptors.
In all cases, the spinlock is grabbed two times fewer than before on
each transaction.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  114 +
 1 files changed, 49 insertions(+), 65 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 48e48c7..40babc1 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -879,67 +879,16 @@ static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
 }

 /**
- * fsl_chan_ld_cleanup - Clean up link descriptors
- * @chan : Freescale DMA channel
- *
- * This function is run after the queue of running descriptors has been
- * executed by the DMA engine. It will run any callbacks, and then free
- * the descriptors.
- *
- * HARDWARE STATE: idle
- */
-static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
-{
-	struct fsl_desc_sw *desc, *_desc;
-	const char *name = chan->name;
-	LIST_HEAD(ld_cleanup);
-	unsigned long flags;
-
-	spin_lock_irqsave(&chan->desc_lock, flags);
-
-	/* update the cookie if we have some descriptors to cleanup */
-	if (!list_empty(&chan->ld_running)) {
-		dma_cookie_t cookie;
-
-		desc = to_fsl_desc(chan->ld_running.prev);
-		cookie = desc->async_tx.cookie;
-
-		chan->completed_cookie = cookie;
-		dev_dbg(chan->dev, "%s: completed_cookie=%d\n", name, cookie);
-	}
-
-	/*
-	 * move the descriptors to a temporary list so we can drop the lock
-	 * during the entire cleanup operation
-	 */
-	list_splice_tail_init(&chan->ld_running, &ld_cleanup);
-
-	spin_unlock_irqrestore(&chan->desc_lock, flags);
-
-	/* Run the callback for each descriptor, in order */
-	list_for_each_entry_safe(desc, _desc, &ld_cleanup, node) {
-
-		/* Remove from the list of transactions */
-		list_del(&desc->node);
-
-		/* Run all cleanup for this descriptor */
-		fsldma_cleanup_descriptor(chan, desc);
-	}
-}
-
-/**
  * fsl_chan_xfer_ld_queue - transfer any pending transactions
  * @chan : Freescale DMA channel
  *
  * HARDWARE STATE: idle
+ * LOCKING: must hold chan->desc_lock
  */
 static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 {
 	const char *name = chan->name;
 	struct fsl_desc_sw *desc;
-	unsigned long flags;
-
-	spin_lock_irqsave(&chan->desc_lock, flags);

 	/*
 	 * If the list of pending descriptors is empty, then we
@@ -947,7 +896,7 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 	 */
 	if (list_empty(&chan->ld_pending)) {
 		dev_dbg(chan->dev, "%s: no pending LDs\n", name);
-		goto out_unlock;
+		return;
 	}

 	/*
@@ -957,7 +906,7 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 	 */
 	if (!chan->idle) {
 		dev_dbg(chan->dev, "%s: DMA controller still busy\n", name);
-		goto out_unlock;
+		return;
 	}

 	/*
@@ -995,9 +944,6 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)

 	dma_start(chan);
 	chan->idle = false;
-
-out_unlock:
-	spin_unlock_irqrestore(&chan->desc_lock, flags);
 }

 /**
@@ -1007,7 +953,11 @@ out_unlock:
 static void fsl_dma_memcpy_issue_pending(struct dma_chan *dchan)
 {
 	struct fsldma_chan *chan = to_fsl_chan(dchan);
+	unsigned long flags;
+
+	spin_lock_irqsave(&chan->desc_lock, flags);
 	fsl_chan_xfer_ld_queue(chan);
+	spin_unlock_irqrestore(&chan->desc_lock, flags);
 }

 /**
@@ -1109,21 +1059,55 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
 static void dma_do_tasklet(unsigned long data)
 {
 	struct fsldma_chan *chan = (struct fsldma_chan *)data;
+	struct fsl_desc_sw *desc, *_desc;
+	const char *name = chan->name;
+	LIST_HEAD(ld_cleanup);
 	unsigned long flags;

-	dev_dbg(chan->dev, "%s: tasklet entry\n", chan->name);
+	dev_dbg(chan->dev, "%s: tasklet entry\n", name);

-	/* run all callbacks, free all used descriptors */
-	fsl_chan_ld_cleanup(chan);
-
-	/* the channel is now idle */
 	spin_lock_irqsave(&chan->desc_lock, flags);
+
+	/* update the cookie if we have some descriptors to cleanup */
+	if (!list_empty(&chan->ld_running)) {
+		dma_cookie_t cookie;
+
+		desc = to_fsl_desc(chan->ld_running.prev);
+		cookie = desc->async_tx.cookie;
+
+		chan->completed_cookie = cookie;
+		dev_dbg(chan->dev, "%s: completed_cookie=%d\n", name, cookie);
+	}
+
+	/*
+	 * move the descriptors to a temporary list so we can drop the lock
Re: [PATCH 0/8] fsldma: lockup fixes
On Tue, Mar 01, 2011 at 07:52:39AM +0200, Felix Radensky wrote:
Hi Ira,

On 03/01/2011 02:21 AM, Ira W. Snyder wrote:
On Mon, Feb 28, 2011 at 11:27:40PM +0200, Felix Radensky wrote:
Hi Ira,

On 02/28/2011 11:11 PM, Ira W. Snyder wrote:
On Mon, Feb 28, 2011 at 10:15:49PM +0200, Felix Radensky wrote:
Hi Ira,

Thank you very much Felix. The dmesg output shows that the controller
never got an interrupt for the second transaction. The patch below has
extra debugging information that may help determine why this happens.

Please apply it and re-run the test. The last section of dmesg (after
Freeing unused kernel memory) is all I need.

Attached relevant dmesg portion.

Ok, try this patch on top of the last one. It looks like you used the
dmatest module in multi-channel mode last time. One channel makes it
easier to debug:

modprobe dmatest max_channels=1 threads_per_chan=2 iterations=1

Thanks for your help in debugging this. Hopefully this is the last patch
to test. :)

Ira

Looks like this was not the last one. The test still fails, see below

From this log, it looks like the DMA controller is not generating an
interrupt after the second chain is started. The first chain is finished
before the second thread runs and starts its chain. The end-of-segments
interrupt is completely missing. The part is not behaving as the
datasheet explains it should.

Are you sure you applied the patch and rebuilt the kernel? (Just
checking to be sure. I'm very appreciative of the amount of help you've
given me debugging this!)

Can you run this for me:

modprobe dmatest max_channels=1 threads_per_chan=1 iterations=4

Thanks again,
Ira

Without your patches applied the output of the test above looks like
this:

Thanks, this is exactly what I was going to ask for next. :)

I really don't understand why the P2020 DMA controller isn't behaving
nicely after my patches. Can you run a git bisect to figure out which
patch in the series causes the problems. It should take three or four
build + test cycles to narrow down which patch breaks the driver. When
it is finished, send me the output of git bisect.

Like this (assuming you have two branches: master and fsldma, where
fsldma is master + my patches):

# setup the bisect
git bisect start
git bisect bad fsldma
git bisect good master

# build and test the kernel using the same test as before:
modprobe dmatest max_channels=1 threads_per_chan=1 iterations=4

# if the test passes:
git bisect good

# if the test fails:
git bisect bad

# now build + test again, then mark that good or bad. Repeat until
# finished.

I really appreciate your help in testing this. You've been great at
providing everything I've asked for.

Thanks,
Ira
Re: [PATCH 0/8] fsldma: lockup fixes
On Tue, Mar 01, 2011 at 08:55:15AM -0800, Ira W. Snyder wrote:

[ big snip ]

Thanks, this is exactly what I was going to ask for next. :)

I really don't understand why the P2020 DMA controller isn't behaving
nicely after my patches. Can you run a git bisect to figure out which
patch in the series causes the problems. It should take three or four
build + test cycles to narrow down which patch breaks the driver. When
it is finished, send me the output of git bisect.

Like this (assuming you have two branches: master and fsldma, where
fsldma is master + my patches):

# setup the bisect
git bisect start
git bisect bad fsldma
git bisect good master

# build and test the kernel using the same test as before:
modprobe dmatest max_channels=1 threads_per_chan=1 iterations=4

# if the test passes:
git bisect good

# if the test fails:
git bisect bad

# now build + test again, then mark that good or bad. Repeat until
# finished.

I really appreciate your help in testing this. You've been great at
providing everything I've asked for.

I'd still like the bisect if you have a chance.

I've re-reviewed the patch series, and found the places that change
register writes to the controller. The patch below changes the register
operations back to the original order. It doesn't make any sense why
this would be required, but it is worth a quick try.

I've added an XXX mark where you can comment out a single line if this
patch fails. It is highly unlikely to make any difference, but I'm
really having a hard time understanding what is wrong.

Ira

From 9e479ce27f8c1819694d7082bb4a27772b4baf52 Mon Sep 17 00:00:00 2001
From: Ira W. Snyder i...@ovro.caltech.edu
Date: Tue, 1 Mar 2011 11:43:00 -0800
Subject: [PATCH] fsldma: try and fix 85xx DMA controller

This is just a random guess at what might be wrong. The datasheet
doesn't say that a completed transfer must be aborted before starting a
new transfer (nor does it make much sense). However, the old code did it
anyway.

NOT AT ALL Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index e4d9d17..d8eedbc 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -213,6 +213,7 @@ static void dma_halt(struct fsldma_chan *chan)
 	int i;

 	mode = DMA_IN(chan, &chan->regs->mr, 32);
+	dev_dbg(chan->dev, "%s: dma_halt mode=0x%.8x\n", chan->name, mode);
 	mode |= FSL_DMA_MR_CA;
 	DMA_OUT(chan, &chan->regs->mr, mode, 32);

@@ -921,10 +922,24 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 	list_splice_tail_init(&chan->ld_pending, &chan->ld_running);

 	/*
+	 * XXX: Guess at problems
+	 *
+	 * The 85xx requires that you run this routine before you try to start
+	 * the next DMA for an as yet unknown reason. Maybe.
+	 */
+	if ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) {
+		dev_dbg(chan->dev, "%s: 85xx, running workaround\n", name);
+		dma_halt(chan);
+	}
+
+	/*
 	 * Program the descriptor's address into the DMA controller,
 	 * then start the DMA transaction
 	 */
 	set_cdar(chan, desc->async_tx.phys);
+
+
+	/* XXX: if that doesn't work, comment the get_cdar() line below */
 	get_cdar(chan);

 	dma_start(chan);
--
1.7.3.4
Re: [PATCH 0/8] fsldma: lockup fixes
On Mon, Feb 28, 2011 at 01:36:38PM +0200, Felix Radensky wrote:
Hi Ira,

I've tried your patches with linux-2.6.38-rc6 on P2020RDB.
DMA test fails with the following errors if threads_per_chan != 1

dma0chan0-copy1: terminating after 1 tests, 1 failures (status 0)
dma0chan0-copy2: #0: test timed out

I've run the test like this:

modprobe dmatest threads_per_chan=2 iterations=1

Thanks Felix. This works fine on the 83xx DMA controller.

When you have a chance, can you add #define DEBUG 1 as the first line of
drivers/dma/fsldma.c and then rerun your test with:

modprobe dmatest threads_per_chan=2 iterations=1 max_channels=1

And send me the dmesg output. I don't quite understand the difference
between links and lists in the 85xx controller yet. I'll work my way
through the datasheet this morning and send out a fixed patch.

Thanks very much for running the tests!
Ira
Re: [PATCH 0/8] fsldma: lockup fixes
On Mon, Feb 28, 2011 at 08:47:42PM +0200, Felix Radensky wrote:
Hi Ira,

Attached dmesg output.

Felix.

On Mon, Feb 28, 2011 at 01:36:38PM +0200, Felix Radensky wrote:
> Hi Ira,
>
> I've tried your patches with linux-2.6.38-rc6 on P2020RDB.
> DMA test fails with the following errors if threads_per_chan != 1
>
> dma0chan0-copy1: terminating after 1 tests, 1 failures (status 0)
> dma0chan0-copy2: #0: test timed out
>
> I've run the test like this:
>
> modprobe dmatest threads_per_chan=2 iterations=1
>

Thanks Felix. This works fine on the 83xx DMA controller.

When you have a chance, can you add #define DEBUG 1 as the first line of
drivers/dma/fsldma.c and then rerun your test with:

modprobe dmatest threads_per_chan=2 iterations=1 max_channels=1

And send me the dmesg output. I don't quite understand the difference
between links and lists in the 85xx controller yet. I'll work my way
through the datasheet this morning and send out a fixed patch.

Thanks very much for running the tests!
Ira

[ snip most of dmesg output ]

Freeing unused kernel memory: 136k init
__dma_request_channel: success (dma0chan0)
of:fsl-elo-dma ffe0c300.dma: chan0: idle, starting controller
dmatest: Started 2 threads using dma0chan0
of:fsl-elo-dma ffe0c300.dma: chan0: irq: stat = 0x8
of:fsl-elo-dma ffe0c300.dma: chan0: irq: End-of-link INT
of:fsl-elo-dma ffe0c300.dma: chan0: irq: Exit
of:fsl-elo-dma ffe0c300.dma: chan0: tasklet entry
of:fsl-elo-dma ffe0c300.dma: chan0: completed_cookie=1
of:fsl-elo-dma ffe0c300.dma: chan0: no pending LDs
of:fsl-elo-dma ffe0c300.dma: chan0: tasklet exit
dma0chan0-copy0: verifying source buffer...
dma0chan0-copy0: verifying dest buffer...
dma0chan0-copy0: #0: No errors with src_off=0x3a2 dst_off=0xc1e len=0x2ce5
dma0chan0-copy0: terminating after 1 tests, 0 failures (status 0)
of:fsl-elo-dma ffe0c300.dma: chan0: idle, starting controller
dma0chan0-copy1: #0: test timed out
dma0chan0-copy1: terminating after 1 tests, 1 failures (status 0)

Thank you very much Felix. The dmesg output shows that the controller
never got an interrupt for the second transaction. The patch below has
extra debugging information that may help determine why this happens.

Please apply it and re-run the test. The last section of dmesg (after
Freeing unused kernel memory) is all I need.

Thanks again,
Ira

From 8935444cb18c921332ebe1d055531e54f0c100e9 Mon Sep 17 00:00:00 2001
From: Ira W. Snyder i...@ovro.caltech.edu
Date: Mon, 28 Feb 2011 11:33:17 -0800
Subject: [PATCH] fsldma: try and debug 85xx controller

1 - reduce the maximum transfer size to 1000 bytes to force chains
2 - re-enable end-of-segment interrupts to see what the hardware does
3 - enable end-of-list interrupts to see what the hardware does
4 - debug cookies (this shouldn't be a problem, but just in case)

NOT AT ALL Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   16 ++++++++++++++++
 drivers/dma/fsldma.h |    3 ++-
 2 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 3dc27a9..b82b76e 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -24,6 +24,9 @@
  *
  */

+#define DEBUG 1
+#define FSL_DMA_LD_DEBUG 1
+
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/pci.h>
@@ -162,6 +165,7 @@ static void dma_init(struct fsldma_chan *chan)
 		 * BWC - Bandwidth sharing among channels
 		 */
 		DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_BWC
+				| FSL_DMA_MR_EOSIE | FSL_DMA_MR_EOLSIE
 				| FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE, 32);
 		break;
 	case FSL_DMA_IP_83XX:
@@ -389,6 +393,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 	 * that make up this transaction
 	 */
 	cookie = chan->common.cookie;
+	dev_dbg(chan->dev, "%s: assign cookies: start=%d\n", chan->name, cookie);
 	list_for_each_entry(child, &desc->tx_list, node) {
 		cookie++;
 		if (cookie < DMA_MIN_COOKIE)
@@ -397,6 +402,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 		child->async_tx.cookie = cookie;
 	}

+	dev_dbg(chan->dev, "%s: assign cookies: end=%d\n", chan->name, cookie);
 	chan->common.cookie = cookie;

 	/* put this transaction onto the tail of the pending queue */
@@ -1018,6 +1024,16 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
 		stat &= ~FSL_DMA_SR_EOLNI;
 	}

+	if (stat & FSL_DMA_SR_EOLSI) {
+		dev_dbg(chan->dev, "%s: irq: End-of-list INT\n", name);
+		stat &= ~FSL_DMA_SR_EOLSI;
+	}
+
+	if (stat & FSL_DMA_SR_EOSI) {
+		dev_dbg(chan->dev, "%s: irq: End-of-segment INT\n", name);
+		stat &= ~FSL_DMA_SR_EOSI
Re: [PATCH 0/8] fsldma: lockup fixes
On Mon, Feb 28, 2011 at 10:15:49PM +0200, Felix Radensky wrote:
Hi Ira,

Thank you very much Felix. The dmesg output shows that the controller
never got an interrupt for the second transaction. The patch below has
extra debugging information that may help determine why this happens.

Please apply it and re-run the test. The last section of dmesg (after
Freeing unused kernel memory) is all I need.

Attached relevant dmesg portion.

Ok, try this patch on top of the last one. It looks like you used the
dmatest module in multi-channel mode last time. One channel makes it
easier to debug:

modprobe dmatest max_channels=1 threads_per_chan=2 iterations=1

Thanks for your help in debugging this. Hopefully this is the last patch
to test. :)

Ira

From 58bc23c3b68f8db0aa09434fdeb6aef641a5eadd Mon Sep 17 00:00:00 2001
From: Ira W. Snyder i...@ovro.caltech.edu
Date: Mon, 28 Feb 2011 12:55:55 -0800
Subject: [PATCH] fsldma: enable end-of-segments interrupt on last descriptor

This is a hack to manually set the end-of-segments interrupt on the last
descriptor in each chain. It appears that the P2020RDB controller
doesn't generate the end-of-links interrupt as explained in the
datasheet.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index b82b76e..e4d9d17 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -141,7 +141,7 @@ static void set_ld_eol(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
 	u64 snoop_bits;

 	snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_83XX)
-		? FSL_DMA_SNEN : 0;
+		? FSL_DMA_SNEN : (u64)(0x8);

 	desc->hw.next_ln_addr = CPU_TO_DMA(chan,
 		DMA_TO_CPU(chan, desc->hw.next_ln_addr, 64) | FSL_DMA_EOL
@@ -165,7 +165,6 @@ static void dma_init(struct fsldma_chan *chan)
 		 * BWC - Bandwidth sharing among channels
 		 */
 		DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_BWC
-				| FSL_DMA_MR_EOSIE | FSL_DMA_MR_EOLSIE
 				| FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE, 32);
 		break;
 	case FSL_DMA_IP_83XX:
--
1.7.3.4
Re: [PATCH 0/8] fsldma: lockup fixes
On Mon, Feb 28, 2011 at 11:27:40PM +0200, Felix Radensky wrote:
Hi Ira,

On 02/28/2011 11:11 PM, Ira W. Snyder wrote:
On Mon, Feb 28, 2011 at 10:15:49PM +0200, Felix Radensky wrote:
Hi Ira,

Thank you very much Felix. The dmesg output shows that the controller
never got an interrupt for the second transaction. The patch below has
extra debugging information that may help determine why this happens.

Please apply it and re-run the test. The last section of dmesg (after
Freeing unused kernel memory) is all I need.

Attached relevant dmesg portion.

Ok, try this patch on top of the last one. It looks like you used the
dmatest module in multi-channel mode last time. One channel makes it
easier to debug:

modprobe dmatest max_channels=1 threads_per_chan=2 iterations=1

Thanks for your help in debugging this. Hopefully this is the last patch
to test. :)

Ira

Looks like this was not the last one. The test still fails, see below

From this log, it looks like the DMA controller is not generating an
interrupt after the second chain is started. The first chain is finished
before the second thread runs and starts its chain. The end-of-segments
interrupt is completely missing. The part is not behaving as the
datasheet explains it should.

Are you sure you applied the patch and rebuilt the kernel? (Just
checking to be sure. I'm very appreciative of the amount of help you've
given me debugging this!)

Can you run this for me:

modprobe dmatest max_channels=1 threads_per_chan=1 iterations=4

Thanks again,
Ira
[PATCH 0/8] fsldma: lockup fixes
Hello everyone,

I've been chasing random infrequent controller lockups in the fsldma driver
for a long time. I finally managed to find the problem and fix it. I'm not
quite sure about the exact sequence of events which causes the race
condition, but it is related to using the hardware registers to track the
controller state. See the patch changelogs for more detail.

The problems were quickly found by turning on DMAPOOL_DEBUG inside
mm/dmapool.c. This poisons memory allocated with the dmapool API.

With dmapool poisoning turned on, the dmatest driver would start producing
failures within a few seconds. After this patchset has been applied, I have
run several iterations of the 10 threads per channel, 10 iterations per
thread test without any problems.

I have made some changes which affect the 85xx/86xx part. I believe that
the changes only affect features which have been unused since the rewrite
in Jan 2010. It would be very good to get a test report from an 85xx/86xx
user.

While making the previous changes, I noticed that the fsldma driver does
not respect the automatic DMA unmapping of src and dst buffers. I have
added support for this feature. This also required a fix to dmatest, which
was sending incorrect flags.

The "support async_tx dependencies" patch could be split apart from the
automatic unmapping patch if it is desirable. They both touch the same
piece of code, so I thought it was ok to combine them. Let me know.

I would really like to see this go into 2.6.39. I think we can get it
reviewed before then. :)

Ira W. Snyder (8):
  fsldma: move related helper functions near each other
  fsldma: use channel name in printk output
  fsldma: improve link descriptor debugging
  fsldma: minor codingstyle and consistency fixes
  fsldma: fix controller lockups
  fsldma: support async_tx dependencies and automatic unmapping
  dmatest: fix automatic buffer unmap type
  fsldma: reduce locking during descriptor cleanup

 drivers/dma/dmatest.c |    7 +-
 drivers/dma/fsldma.c  |  485 +---
 drivers/dma/fsldma.h  |    6 +-
 3 files changed, 263 insertions(+), 235 deletions(-)

-- 
1.7.3.4
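To illustrate why poisoning exposes this class of bug so quickly, here is a minimal user-space sketch. It is not the mm/dmapool.c implementation: `demo_pool`, its helpers, and the marker byte are all hypothetical names and values chosen for the example. The idea is only that a freed block gets overwritten with a recognizable pattern, so anything that still reads it (such as a controller fetching a stale descriptor) sees garbage immediately instead of plausible old data.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_POISON_FREED 0xa7	/* arbitrary marker byte for this sketch */
#define DEMO_BLOCK_SIZE   32
#define DEMO_NUM_BLOCKS   4

/*
 * Hypothetical fixed-size pool. The storage is embedded in the structure so
 * that a stale pointer remains readable after the block is "freed".
 */
struct demo_pool {
	unsigned char mem[DEMO_NUM_BLOCKS][DEMO_BLOCK_SIZE];
	unsigned char in_use[DEMO_NUM_BLOCKS];
};

/* Hand out the first unused block, or NULL if the pool is exhausted. */
static void *demo_pool_alloc(struct demo_pool *pool)
{
	size_t i;

	for (i = 0; i < DEMO_NUM_BLOCKS; i++) {
		if (!pool->in_use[i]) {
			pool->in_use[i] = 1;
			return pool->mem[i];
		}
	}
	return NULL;
}

/* Return a block to the pool, overwriting its contents with the marker. */
static void demo_pool_free(struct demo_pool *pool, void *block)
{
	size_t i;

	for (i = 0; i < DEMO_NUM_BLOCKS; i++) {
		if (block == (void *)pool->mem[i]) {
			memset(pool->mem[i], DEMO_POISON_FREED, DEMO_BLOCK_SIZE);
			pool->in_use[i] = 0;
			return;
		}
	}
}

/* Returns 1 if every byte of the block carries the poison marker. */
static int demo_block_is_poisoned(const void *block)
{
	const unsigned char *p = block;
	size_t i;

	for (i = 0; i < DEMO_BLOCK_SIZE; i++)
		if (p[i] != DEMO_POISON_FREED)
			return 0;
	return 1;
}
```

A consumer that keeps using a block after `demo_pool_free()` would find `demo_block_is_poisoned()` true for it, which is the user-space analogue of the controller fetching a descriptor that should no longer be in use.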
[PATCH 3/8] fsldma: improve link descriptor debugging
This adds better tracking to link descriptor allocations, callbacks, and
frees. This makes it much easier to track errors with link descriptors.

Signed-off-by: Ira W. Snyder <i...@ovro.caltech.edu>
---
 drivers/dma/fsldma.c |   21 +++--
 1 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 6e3d3d7..851993c 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -416,6 +416,10 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
 	desc->async_tx.tx_submit = fsl_dma_tx_submit;
 	desc->async_tx.phys = pdesc;
 
+#ifdef FSL_DMA_LD_DEBUG
+	dev_dbg(chan->dev, "%s: LD %p allocated\n", chan->name, desc);
+#endif
+
 	return desc;
 }
 
@@ -467,6 +471,9 @@ static void fsldma_free_desc_list(struct fsldma_chan *chan,
 
 	list_for_each_entry_safe(desc, _desc, list, node) {
 		list_del(&desc->node);
+#ifdef FSL_DMA_LD_DEBUG
+		dev_dbg(chan->dev, "%s: LD %p free\n", chan->name, desc);
+#endif
 		dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
 	}
 }
@@ -478,6 +485,9 @@ static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
 
 	list_for_each_entry_safe_reverse(desc, _desc, list, node) {
 		list_del(&desc->node);
+#ifdef FSL_DMA_LD_DEBUG
+		dev_dbg(chan->dev, "%s: LD %p free\n", chan->name, desc);
+#endif
 		dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
 	}
 }
@@ -554,9 +564,6 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy(
 			dev_err(chan->dev, "%s: %s\n", chan->name, msg_ld_oom);
 			goto fail;
 		}
-#ifdef FSL_DMA_LD_DEBUG
-		dev_dbg(chan->dev, "%s: new link desc alloc %p\n", chan->name, new);
-#endif
 
 		copy = min(len, (size_t)FSL_DMA_BCR_MAX_CNT);
 
@@ -642,9 +649,6 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_sg(struct dma_chan *dchan,
 			dev_err(chan->dev, "%s: %s\n", chan->name, msg_ld_oom);
 			goto fail;
 		}
-#ifdef FSL_DMA_LD_DEBUG
-		dev_dbg(chan->dev, "%s: new link desc alloc %p\n", chan->name, new);
-#endif
 
 		set_desc_cnt(chan, &new->hw, len);
 		set_desc_src(chan, &new->hw, src);
@@ -881,13 +885,18 @@ static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
 		callback_param = desc->async_tx.callback_param;
 		if (callback) {
 			spin_unlock_irqrestore(&chan->desc_lock, flags);
+#ifdef FSL_DMA_LD_DEBUG
 			dev_dbg(chan->dev, "%s: LD %p callback\n", name, desc);
+#endif
 			callback(callback_param);
 			spin_lock_irqsave(&chan->desc_lock, flags);
 		}
 
 		/* Run any dependencies, then free the descriptor */
 		dma_run_dependencies(&desc->async_tx);
+#ifdef FSL_DMA_LD_DEBUG
+		dev_dbg(chan->dev, "%s: LD %p free\n", name, desc);
+#endif
 		dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
 	}
 
-- 
1.7.3.4
[PATCH 1/8] fsldma: move related helper functions near each other
This is a purely cosmetic cleanup. It is nice to have related functions right next to each other in the code. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/dma/fsldma.c | 116 +++-- 1 files changed, 64 insertions(+), 52 deletions(-) diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 4de947a..2e1af45 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -39,33 +39,9 @@ static const char msg_ld_oom[] = No free memory for link descriptor\n; -static void dma_init(struct fsldma_chan *chan) -{ - /* Reset the channel */ - DMA_OUT(chan, chan-regs-mr, 0, 32); - - switch (chan-feature FSL_DMA_IP_MASK) { - case FSL_DMA_IP_85XX: - /* Set the channel to below modes: -* EIE - Error interrupt enable -* EOSIE - End of segments interrupt enable (basic mode) -* EOLNIE - End of links interrupt enable -* BWC - Bandwidth sharing among channels -*/ - DMA_OUT(chan, chan-regs-mr, FSL_DMA_MR_BWC - | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE - | FSL_DMA_MR_EOSIE, 32); - break; - case FSL_DMA_IP_83XX: - /* Set the channel to below modes: -* EOTIE - End-of-transfer interrupt enable -* PRC_RM - PCI read multiple -*/ - DMA_OUT(chan, chan-regs-mr, FSL_DMA_MR_EOTIE - | FSL_DMA_MR_PRC_RM, 32); - break; - } -} +/* + * Register Helpers + */ static void set_sr(struct fsldma_chan *chan, u32 val) { @@ -77,6 +53,30 @@ static u32 get_sr(struct fsldma_chan *chan) return DMA_IN(chan, chan-regs-sr, 32); } +static void set_cdar(struct fsldma_chan *chan, dma_addr_t addr) +{ + DMA_OUT(chan, chan-regs-cdar, addr | FSL_DMA_SNEN, 64); +} + +static dma_addr_t get_cdar(struct fsldma_chan *chan) +{ + return DMA_IN(chan, chan-regs-cdar, 64) ~FSL_DMA_SNEN; +} + +static dma_addr_t get_ndar(struct fsldma_chan *chan) +{ + return DMA_IN(chan, chan-regs-ndar, 64); +} + +static u32 get_bcr(struct fsldma_chan *chan) +{ + return DMA_IN(chan, chan-regs-bcr, 32); +} + +/* + * Descriptor Helpers + */ + static void set_desc_cnt(struct fsldma_chan *chan, struct fsl_dma_ld_hw *hw, u32 count) { @@ 
-113,24 +113,49 @@ static void set_desc_next(struct fsldma_chan *chan, hw-next_ln_addr = CPU_TO_DMA(chan, snoop_bits | next, 64); } -static void set_cdar(struct fsldma_chan *chan, dma_addr_t addr) +static void set_ld_eol(struct fsldma_chan *chan, + struct fsl_desc_sw *desc) { - DMA_OUT(chan, chan-regs-cdar, addr | FSL_DMA_SNEN, 64); -} + u64 snoop_bits; -static dma_addr_t get_cdar(struct fsldma_chan *chan) -{ - return DMA_IN(chan, chan-regs-cdar, 64) ~FSL_DMA_SNEN; -} + snoop_bits = ((chan-feature FSL_DMA_IP_MASK) == FSL_DMA_IP_83XX) + ? FSL_DMA_SNEN : 0; -static dma_addr_t get_ndar(struct fsldma_chan *chan) -{ - return DMA_IN(chan, chan-regs-ndar, 64); + desc-hw.next_ln_addr = CPU_TO_DMA(chan, + DMA_TO_CPU(chan, desc-hw.next_ln_addr, 64) | FSL_DMA_EOL + | snoop_bits, 64); } -static u32 get_bcr(struct fsldma_chan *chan) +/* + * DMA Engine Hardware Control Helpers + */ + +static void dma_init(struct fsldma_chan *chan) { - return DMA_IN(chan, chan-regs-bcr, 32); + /* Reset the channel */ + DMA_OUT(chan, chan-regs-mr, 0, 32); + + switch (chan-feature FSL_DMA_IP_MASK) { + case FSL_DMA_IP_85XX: + /* Set the channel to below modes: +* EIE - Error interrupt enable +* EOSIE - End of segments interrupt enable (basic mode) +* EOLNIE - End of links interrupt enable +* BWC - Bandwidth sharing among channels +*/ + DMA_OUT(chan, chan-regs-mr, FSL_DMA_MR_BWC + | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE + | FSL_DMA_MR_EOSIE, 32); + break; + case FSL_DMA_IP_83XX: + /* Set the channel to below modes: +* EOTIE - End-of-transfer interrupt enable +* PRC_RM - PCI read multiple +*/ + DMA_OUT(chan, chan-regs-mr, FSL_DMA_MR_EOTIE + | FSL_DMA_MR_PRC_RM, 32); + break; + } } static int dma_is_idle(struct fsldma_chan *chan) @@ -185,19 +210,6 @@ static void dma_halt(struct fsldma_chan *chan) dev_err(chan-dev, DMA halt timeout!\n); } -static void set_ld_eol(struct fsldma_chan *chan, - struct fsl_desc_sw *desc) -{ - u64 snoop_bits; - - snoop_bits = ((chan-feature FSL_DMA_IP_MASK) == FSL_DMA_IP_83XX
[PATCH 4/8] fsldma: minor codingstyle and consistency fixes
This fixes some minor violations of the coding style. It also changes the style of the device_prep_dma_*() function definitions so they are identical. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/dma/fsldma.c | 29 + drivers/dma/fsldma.h |4 ++-- 2 files changed, 15 insertions(+), 18 deletions(-) diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 851993c..06421c0 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -84,7 +84,7 @@ static void set_desc_cnt(struct fsldma_chan *chan, } static void set_desc_src(struct fsldma_chan *chan, - struct fsl_dma_ld_hw *hw, dma_addr_t src) +struct fsl_dma_ld_hw *hw, dma_addr_t src) { u64 snoop_bits; @@ -94,7 +94,7 @@ static void set_desc_src(struct fsldma_chan *chan, } static void set_desc_dst(struct fsldma_chan *chan, - struct fsl_dma_ld_hw *hw, dma_addr_t dst) +struct fsl_dma_ld_hw *hw, dma_addr_t dst) { u64 snoop_bits; @@ -104,7 +104,7 @@ static void set_desc_dst(struct fsldma_chan *chan, } static void set_desc_next(struct fsldma_chan *chan, - struct fsl_dma_ld_hw *hw, dma_addr_t next) + struct fsl_dma_ld_hw *hw, dma_addr_t next) { u64 snoop_bits; @@ -113,8 +113,7 @@ static void set_desc_next(struct fsldma_chan *chan, hw-next_ln_addr = CPU_TO_DMA(chan, snoop_bits | next, 64); } -static void set_ld_eol(struct fsldma_chan *chan, - struct fsl_desc_sw *desc) +static void set_ld_eol(struct fsldma_chan *chan, struct fsl_desc_sw *desc) { u64 snoop_bits; @@ -333,8 +332,7 @@ static void fsl_chan_toggle_ext_start(struct fsldma_chan *chan, int enable) chan-feature = ~FSL_DMA_CHAN_START_EXT; } -static void append_ld_queue(struct fsldma_chan *chan, - struct fsl_desc_sw *desc) +static void append_ld_queue(struct fsldma_chan *chan, struct fsl_desc_sw *desc) { struct fsl_desc_sw *tail = to_fsl_desc(chan-ld_pending.prev); @@ -375,8 +373,8 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx) cookie = chan-common.cookie; list_for_each_entry(child, desc-tx_list, node) { cookie++; - if 
(cookie 0) - cookie = 1; + if (cookie DMA_MIN_COOKIE) + cookie = DMA_MIN_COOKIE; child-async_tx.cookie = cookie; } @@ -397,8 +395,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx) * * Return - The descriptor allocated. NULL for failed. */ -static struct fsl_desc_sw *fsl_dma_alloc_descriptor( - struct fsldma_chan *chan) +static struct fsl_desc_sw *fsl_dma_alloc_descriptor(struct fsldma_chan *chan) { const char *name = chan-name; struct fsl_desc_sw *desc; @@ -423,7 +420,6 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor( return desc; } - /** * fsl_dma_alloc_chan_resources - Allocate resources for DMA channel. * @chan : Freescale DMA channel @@ -534,14 +530,15 @@ fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags) /* Insert the link descriptor to the LD ring */ list_add_tail(new-node, new-tx_list); - /* Set End-of-link to the last link descriptor of new list*/ + /* Set End-of-link to the last link descriptor of new list */ set_ld_eol(chan, new); return new-async_tx; } -static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy( - struct dma_chan *dchan, dma_addr_t dma_dst, dma_addr_t dma_src, +static struct dma_async_tx_descriptor * +fsl_dma_prep_memcpy(struct dma_chan *dchan, + dma_addr_t dma_dst, dma_addr_t dma_src, size_t len, unsigned long flags) { struct fsldma_chan *chan; @@ -591,7 +588,7 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy( new-async_tx.flags = flags; /* client is in control of this ack */ new-async_tx.cookie = -EBUSY; - /* Set End-of-link to the last link descriptor of new list*/ + /* Set End-of-link to the last link descriptor of new list */ set_ld_eol(chan, new); return first-async_tx; diff --git a/drivers/dma/fsldma.h b/drivers/dma/fsldma.h index 113e713..49189da 100644 --- a/drivers/dma/fsldma.h +++ b/drivers/dma/fsldma.h @@ -102,8 +102,8 @@ struct fsl_desc_sw { } __attribute__((aligned(32))); struct fsldma_chan_regs { - u32 mr; /* 0x00 - Mode Register */ - u32 sr; /* 0x04 - 
Status Register */ + u32 mr; /* 0x00 - Mode Register */ + u32 sr; /* 0x04 - Status Register */ u64 cdar; /* 0x08 - Current descriptor address register */ u64 sar;/* 0x10 - Source Address Register */ u64 dar;/* 0x18
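The cookie hunk in this patch replaces the literal bounds with DMA_MIN_COOKIE. As a rough user-space sketch of the wraparound rule that check enforces: cookies are positive integers that restart at the minimum once they would overflow. The `demo_` names below are hypothetical. The bounds are assumed to match the kernel's DMA_MIN_COOKIE (1) and DMA_MAX_COOKIE (INT_MAX). Unlike the driver, which increments first and then tests, this version checks before incrementing so that it does not depend on signed-overflow behaviour in portable C.

```c
#include <limits.h>

/* Hypothetical stand-ins, assumed to mirror the kernel's cookie bounds. */
typedef int demo_cookie_t;
#define DEMO_MIN_COOKIE 1
#define DEMO_MAX_COOKIE INT_MAX

/*
 * Return the cookie that follows 'cookie'. Valid cookies are always in
 * [DEMO_MIN_COOKIE, DEMO_MAX_COOKIE]; anything that would leave that range
 * wraps back to the minimum.
 */
static demo_cookie_t demo_next_cookie(demo_cookie_t cookie)
{
	if (cookie >= DEMO_MAX_COOKIE || cookie < DEMO_MIN_COOKIE)
		return DEMO_MIN_COOKIE;
	return cookie + 1;
}
```

Using the named minimum rather than a bare `1` keeps the driver consistent with whatever range the DMAEngine core considers valid.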
[PATCH 2/8] fsldma: use channel name in printk output
This makes debugging the driver much easier when multiple channels are running concurrently. In addition, you can see how much descriptor memory each channel has allocated via the dmapool API in sysfs. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/dma/fsldma.c | 60 +++-- drivers/dma/fsldma.h |1 + 2 files changed, 34 insertions(+), 27 deletions(-) diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 2e1af45..6e3d3d7 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -37,7 +37,7 @@ #include fsldma.h -static const char msg_ld_oom[] = No free memory for link descriptor\n; +static const char msg_ld_oom[] = No free memory for link descriptor; /* * Register Helpers @@ -207,7 +207,7 @@ static void dma_halt(struct fsldma_chan *chan) } if (!dma_is_idle(chan)) - dev_err(chan-dev, DMA halt timeout!\n); + dev_err(chan-dev, %s: DMA halt timeout!\n, chan-name); } /** @@ -400,12 +400,13 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx) static struct fsl_desc_sw *fsl_dma_alloc_descriptor( struct fsldma_chan *chan) { + const char *name = chan-name; struct fsl_desc_sw *desc; dma_addr_t pdesc; desc = dma_pool_alloc(chan-desc_pool, GFP_ATOMIC, pdesc); if (!desc) { - dev_dbg(chan-dev, out of memory for link desc\n); + dev_dbg(chan-dev, %s: out of memory for link desc\n, name); return NULL; } @@ -439,13 +440,12 @@ static int fsl_dma_alloc_chan_resources(struct dma_chan *dchan) * We need the descriptor to be aligned to 32bytes * for meeting FSL DMA specification requirement. 
*/ - chan-desc_pool = dma_pool_create(fsl_dma_engine_desc_pool, - chan-dev, + chan-desc_pool = dma_pool_create(chan-name, chan-dev, sizeof(struct fsl_desc_sw), __alignof__(struct fsl_desc_sw), 0); if (!chan-desc_pool) { - dev_err(chan-dev, unable to allocate channel %d - descriptor pool\n, chan-id); + dev_err(chan-dev, %s: unable to allocate descriptor pool\n, + chan-name); return -ENOMEM; } @@ -491,7 +491,7 @@ static void fsl_dma_free_chan_resources(struct dma_chan *dchan) struct fsldma_chan *chan = to_fsl_chan(dchan); unsigned long flags; - dev_dbg(chan-dev, Free all channel resources.\n); + dev_dbg(chan-dev, %s: Free all channel resources.\n, chan-name); spin_lock_irqsave(chan-desc_lock, flags); fsldma_free_desc_list(chan, chan-ld_pending); fsldma_free_desc_list(chan, chan-ld_running); @@ -514,7 +514,7 @@ fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags) new = fsl_dma_alloc_descriptor(chan); if (!new) { - dev_err(chan-dev, msg_ld_oom); + dev_err(chan-dev, %s: %s\n, chan-name, msg_ld_oom); return NULL; } @@ -551,11 +551,11 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy( /* Allocate the link descriptor from DMA pool */ new = fsl_dma_alloc_descriptor(chan); if (!new) { - dev_err(chan-dev, msg_ld_oom); + dev_err(chan-dev, %s: %s\n, chan-name, msg_ld_oom); goto fail; } #ifdef FSL_DMA_LD_DEBUG - dev_dbg(chan-dev, new link desc alloc %p\n, new); + dev_dbg(chan-dev, %s: new link desc alloc %p\n, chan-name, new); #endif copy = min(len, (size_t)FSL_DMA_BCR_MAX_CNT); @@ -639,11 +639,11 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_sg(struct dma_chan *dchan, /* allocate and populate the descriptor */ new = fsl_dma_alloc_descriptor(chan); if (!new) { - dev_err(chan-dev, msg_ld_oom); + dev_err(chan-dev, %s: %s\n, chan-name, msg_ld_oom); goto fail; } #ifdef FSL_DMA_LD_DEBUG - dev_dbg(chan-dev, new link desc alloc %p\n, new); + dev_dbg(chan-dev, %s: new link desc alloc %p\n, chan-name, new); #endif set_desc_cnt(chan, new-hw, len); @@ 
-815,7 +815,7 @@ static void fsl_dma_update_completed_cookie(struct fsldma_chan *chan) spin_lock_irqsave(chan-desc_lock, flags); if (list_empty(chan-ld_running)) { - dev_dbg(chan-dev, no running descriptors\n); + dev_dbg(chan-dev, %s: no running descriptors\n, chan-name); goto out_unlock; } @@ -859,11 +859,13 @@ static enum dma_status
[PATCH 7/8] dmatest: fix automatic buffer unmap type
The dmatest code relies on the DMAEngine API to automatically call
dma_unmap_single() on src buffers. The flags it passes are incorrect, fix
them.

Signed-off-by: Ira W. Snyder <i...@ovro.caltech.edu>
---
 drivers/dma/dmatest.c |    7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index 5589358..7e1b0aa 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -285,7 +285,12 @@ static int dmatest_func(void *data)
 
 	set_user_nice(current, 10);
 
-	flags = DMA_CTRL_ACK | DMA_COMPL_SKIP_DEST_UNMAP | DMA_PREP_INTERRUPT;
+	/*
+	 * src buffers are freed by the DMAEngine code with dma_unmap_single()
+	 * dst buffers are freed by ourselves below
+	 */
+	flags = DMA_CTRL_ACK | DMA_PREP_INTERRUPT
+	      | DMA_COMPL_SKIP_DEST_UNMAP | DMA_COMPL_SRC_UNMAP_SINGLE;
 
 	while (!kthread_should_stop()
 	       && !(iterations && total_tests >= iterations)) {
-- 
1.7.3.4
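A small sketch of why the extra flag matters, modelled on the unmap decision that patch 6/8 of this series adds to fsldma. The flag names and values here are hypothetical stand-ins for the DMA_COMPL_* flags, not the real definitions. The point is only that a driver honouring automatic unmapping falls back to a page unmap unless the "single" flag is set.

```c
/* Hypothetical flag bits standing in for the DMA_COMPL_* flags. */
enum demo_flags {
	DEMO_SKIP_SRC_UNMAP    = 1 << 0,
	DEMO_SKIP_DEST_UNMAP   = 1 << 1,
	DEMO_SRC_UNMAP_SINGLE  = 1 << 2,
	DEMO_DEST_UNMAP_SINGLE = 1 << 3
};

enum demo_unmap { DEMO_UNMAP_NONE, DEMO_UNMAP_SINGLE, DEMO_UNMAP_PAGE };

/* Which unmap call would the driver make for the source buffer? */
static enum demo_unmap demo_src_unmap(unsigned long flags)
{
	if (flags & DEMO_SKIP_SRC_UNMAP)
		return DEMO_UNMAP_NONE;
	return (flags & DEMO_SRC_UNMAP_SINGLE) ? DEMO_UNMAP_SINGLE
					       : DEMO_UNMAP_PAGE;
}

/* Same decision for the destination buffer. */
static enum demo_unmap demo_dst_unmap(unsigned long flags)
{
	if (flags & DEMO_SKIP_DEST_UNMAP)
		return DEMO_UNMAP_NONE;
	return (flags & DEMO_DEST_UNMAP_SINGLE) ? DEMO_UNMAP_SINGLE
						: DEMO_UNMAP_PAGE;
}
```

With the old dmatest flags (skip-dest only), `demo_src_unmap()` yields the page variant. With the new flags it yields the single variant, while the destination is still left for the caller to unmap.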
[PATCH 8/8] fsldma: reduce locking during descriptor cleanup
This merges the fsl_chan_ld_cleanup() function into the dma_do_tasklet()
function to reduce locking overhead. In the best case, we will be able to
keep the DMA controller busy while we are freeing used descriptors. In all
cases, the spinlock is grabbed two times fewer than before on each
transaction.

Signed-off-by: Ira W. Snyder <i...@ovro.caltech.edu>
---
 drivers/dma/fsldma.c |  114 +
 1 files changed, 49 insertions(+), 65 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 4014790..3dc27a9 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -872,67 +872,16 @@ static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
 }
 
 /**
- * fsl_chan_ld_cleanup - Clean up link descriptors
- * @chan : Freescale DMA channel
- *
- * This function is run after the queue of running descriptors has been
- * executed by the DMA engine. It will run any callbacks, and then free
- * the descriptors.
- *
- * HARDWARE STATE: idle
- */
-static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
-{
-	struct fsl_desc_sw *desc, *_desc;
-	const char *name = chan->name;
-	LIST_HEAD(ld_cleanup);
-	unsigned long flags;
-
-	spin_lock_irqsave(&chan->desc_lock, flags);
-
-	/* update the cookie if we have some descriptors to cleanup */
-	if (!list_empty(&chan->ld_running)) {
-		dma_cookie_t cookie;
-
-		desc = to_fsl_desc(chan->ld_running.prev);
-		cookie = desc->async_tx.cookie;
-
-		chan->completed_cookie = cookie;
-		dev_dbg(chan->dev, "%s: completed_cookie=%d\n", name, cookie);
-	}
-
-	/*
-	 * move the descriptors to a temporary list so we can drop the lock
-	 * during the entire cleanup operation
-	 */
-	list_splice_tail_init(&chan->ld_running, &ld_cleanup);
-
-	spin_unlock_irqrestore(&chan->desc_lock, flags);
-
-	/* Run the callback for each descriptor, in order */
-	list_for_each_entry_safe(desc, _desc, &ld_cleanup, node) {
-
-		/* Remove from the list of transactions */
-		list_del(&desc->node);
-
-		/* Run all cleanup for this descriptor */
-		fsldma_cleanup_descriptor(chan, desc);
-	}
-}
-
-/**
  * fsl_chan_xfer_ld_queue - transfer any pending transactions
  * @chan : Freescale DMA channel
  *
  * HARDWARE STATE: idle
+ * LOCKING: must hold chan->desc_lock
  */
 static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 {
 	const char *name = chan->name;
 	struct fsl_desc_sw *desc;
-	unsigned long flags;
-
-	spin_lock_irqsave(&chan->desc_lock, flags);
 
 	/*
 	 * If the list of pending descriptors is empty, then we
@@ -940,7 +889,7 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 	 */
 	if (list_empty(&chan->ld_pending)) {
 		dev_dbg(chan->dev, "%s: no pending LDs\n", name);
-		goto out_unlock;
+		return;
 	}
 
 	/*
@@ -950,7 +899,7 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 	 */
 	if (!chan->idle) {
 		dev_dbg(chan->dev, "%s: DMA controller still busy\n", name);
-		goto out_unlock;
+		return;
 	}
 
 	/*
@@ -975,9 +924,6 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 
 	dma_start(chan);
 	chan->idle = false;
-
-out_unlock:
-	spin_unlock_irqrestore(&chan->desc_lock, flags);
 }
 
 /**
@@ -987,7 +933,11 @@ out_unlock:
 static void fsl_dma_memcpy_issue_pending(struct dma_chan *dchan)
 {
 	struct fsldma_chan *chan = to_fsl_chan(dchan);
+	unsigned long flags;
+
+	spin_lock_irqsave(&chan->desc_lock, flags);
 	fsl_chan_xfer_ld_queue(chan);
+	spin_unlock_irqrestore(&chan->desc_lock, flags);
 }
 
 /**
@@ -1089,21 +1039,55 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
 static void dma_do_tasklet(unsigned long data)
 {
 	struct fsldma_chan *chan = (struct fsldma_chan *)data;
+	struct fsl_desc_sw *desc, *_desc;
+	const char *name = chan->name;
+	LIST_HEAD(ld_cleanup);
 	unsigned long flags;
 
-	dev_dbg(chan->dev, "%s: tasklet entry\n", chan->name);
+	dev_dbg(chan->dev, "%s: tasklet entry\n", name);
 
-	/* run all callbacks, free all used descriptors */
-	fsl_chan_ld_cleanup(chan);
-
-	/* the channel is now idle */
 	spin_lock_irqsave(&chan->desc_lock, flags);
+
+	/* update the cookie if we have some descriptors to cleanup */
+	if (!list_empty(&chan->ld_running)) {
+		dma_cookie_t cookie;
+
+		desc = to_fsl_desc(chan->ld_running.prev);
+		cookie = desc->async_tx.cookie;
+
+		chan->completed_cookie = cookie;
+		dev_dbg(chan->dev, "%s: completed_cookie=%d\n", name, cookie);
+	}
+
+	/*
+	 * move the descriptors to a temporary list so we can drop the lock
[PATCH 6/8] fsldma: support async_tx dependencies and automatic unmapping
Previous to this patch, the dma_run_dependencies() function has been called
while holding desc_lock. This function can call tx_submit() for other
descriptors, which may try to re-grab the lock. Avoid this by moving the
descriptors to be cleaned up to a temporary list, and dropping the lock
before cleanup.

At the same time, add support for automatic unmapping of src and dst
buffers, as offered by the DMAEngine API.

Signed-off-by: Ira W. Snyder <i...@ovro.caltech.edu>
---
 drivers/dma/fsldma.c |  132 --
 1 files changed, 95 insertions(+), 37 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index d3c5100..4014790 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -78,6 +78,11 @@ static void set_desc_cnt(struct fsldma_chan *chan,
 	hw->count = CPU_TO_DMA(chan, count, 32);
 }
 
+static u32 get_desc_cnt(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
+{
+	return DMA_TO_CPU(chan, desc->hw.count, 32);
+}
+
 static void set_desc_src(struct fsldma_chan *chan,
 			 struct fsl_dma_ld_hw *hw, dma_addr_t src)
 {
@@ -88,6 +93,16 @@ static void set_desc_src(struct fsldma_chan *chan,
 	hw->src_addr = CPU_TO_DMA(chan, snoop_bits | src, 64);
 }
 
+static dma_addr_t get_desc_src(struct fsldma_chan *chan,
+			       struct fsl_desc_sw *desc)
+{
+	u64 snoop_bits;
+
+	snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX)
+		? ((u64)FSL_DMA_SATR_SREADTYPE_SNOOP_READ << 32) : 0;
+	return DMA_TO_CPU(chan, desc->hw.src_addr, 64) & ~snoop_bits;
+}
+
 static void set_desc_dst(struct fsldma_chan *chan,
 			 struct fsl_dma_ld_hw *hw, dma_addr_t dst)
 {
@@ -98,6 +113,16 @@ static void set_desc_dst(struct fsldma_chan *chan,
 	hw->dst_addr = CPU_TO_DMA(chan, snoop_bits | dst, 64);
 }
 
+static dma_addr_t get_desc_dst(struct fsldma_chan *chan,
+			       struct fsl_desc_sw *desc)
+{
+	u64 snoop_bits;
+
+	snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX)
+		? ((u64)FSL_DMA_DATR_DWRITETYPE_SNOOP_WRITE << 32) : 0;
+	return DMA_TO_CPU(chan, desc->hw.dst_addr, 64) & ~snoop_bits;
+}
+
 static void set_desc_next(struct fsldma_chan *chan,
 			  struct fsl_dma_ld_hw *hw, dma_addr_t next)
 {
@@ -796,6 +821,57 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 }
 
 /**
+ * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
+ * @chan: Freescale DMA channel
+ * @desc: descriptor to cleanup and free
+ *
+ * This function is used on a descriptor which has been executed by the DMA
+ * controller. It will run any callbacks, submit any dependencies, and then
+ * free the descriptor.
+ */
+static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
+				      struct fsl_desc_sw *desc)
+{
+	struct dma_async_tx_descriptor *txd = &desc->async_tx;
+	struct device *dev = chan->common.device->dev;
+	dma_addr_t src = get_desc_src(chan, desc);
+	dma_addr_t dst = get_desc_dst(chan, desc);
+	u32 len = get_desc_cnt(chan, desc);
+
+	/* Run the link descriptor callback function */
+	if (txd->callback) {
+#ifdef FSL_DMA_LD_DEBUG
+		dev_dbg(chan->dev, "%s: LD %p callback\n", chan->name, desc);
+#endif
+		txd->callback(txd->callback_param);
+	}
+
+	/* Run any dependencies */
+	dma_run_dependencies(txd);
+
+	/* Unmap the dst buffer, if requested */
+	if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
+		if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
+			dma_unmap_single(dev, dst, len, DMA_FROM_DEVICE);
+		else
+			dma_unmap_page(dev, dst, len, DMA_FROM_DEVICE);
+	}
+
+	/* Unmap the src buffer, if requested */
+	if (!(txd->flags & DMA_COMPL_SKIP_SRC_UNMAP)) {
+		if (txd->flags & DMA_COMPL_SRC_UNMAP_SINGLE)
+			dma_unmap_single(dev, src, len, DMA_TO_DEVICE);
+		else
+			dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
+	}
+
+#ifdef FSL_DMA_LD_DEBUG
+	dev_dbg(chan->dev, "%s: LD %p free\n", chan->name, desc);
+#endif
+	dma_pool_free(chan->desc_pool, desc, txd->phys);
+}
+
+/**
  * fsl_chan_ld_cleanup - Clean up link descriptors
  * @chan : Freescale DMA channel
  *
@@ -809,57 +885,39 @@ static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
 {
 	struct fsl_desc_sw *desc, *_desc;
 	const char *name = chan->name;
+	LIST_HEAD(ld_cleanup);
 	unsigned long flags;
 
 	spin_lock_irqsave(&chan->desc_lock, flags);
 
-	/* if the ld_running list is empty, there is nothing to do */
-	if (list_empty(&chan->ld_running)) {
-		dev_dbg(chan->dev, "%s: no descriptors to cleanup\n", name);
-		goto out_unlock;
+	/* update the cookie if we have some
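The locking change described in this changelog can be sketched in user space. This is a single-threaded model, not driver code: the lock is a plain flag standing in for desc_lock, the list is a simple singly linked list rather than the kernel's list_head, and every `demo_` name is hypothetical. It shows the pattern the patch describes: detach the whole running list while holding the lock, release the lock, and only then run callbacks that may themselves need the lock.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_desc {
	struct demo_desc *next;
	void (*callback)(void *param);
	void *param;
};

struct demo_chan {
	bool lock_held;			/* single-threaded stand-in for desc_lock */
	struct demo_desc *running;	/* completed descriptors, oldest first */
};

static void demo_lock(struct demo_chan *c)   { c->lock_held = true; }
static void demo_unlock(struct demo_chan *c) { c->lock_held = false; }

/* Bookkeeping for the example callback below. */
struct demo_cb_state {
	struct demo_chan *chan;
	unsigned int calls;
	unsigned int deadlocks;
};

/*
 * Example callback that behaves like a dependent submit: it needs the
 * channel lock. With a real spinlock, finding the lock already held by the
 * caller would deadlock; here that case is only counted.
 */
static void demo_dependent_submit(void *param)
{
	struct demo_cb_state *st = param;

	if (st->chan->lock_held) {
		st->deadlocks++;
	} else {
		demo_lock(st->chan);
		demo_unlock(st->chan);
	}
	st->calls++;
}

/* Clean up all completed descriptors; returns how many were processed. */
static unsigned int demo_cleanup(struct demo_chan *c)
{
	struct demo_desc *cleanup, *desc;
	unsigned int count = 0;

	/* Detach the entire list under the lock ... */
	demo_lock(c);
	cleanup = c->running;
	c->running = NULL;
	demo_unlock(c);

	/* ... then run callbacks, in order, with the lock released. */
	while (cleanup) {
		desc = cleanup;
		cleanup = desc->next;
		desc->next = NULL;
		if (desc->callback)
			desc->callback(desc->param);
		count++;
	}
	return count;
}
```

Because the callbacks only ever see the private `cleanup` list, new work submitted from inside a callback can take the lock and touch the channel's own lists without conflicting with the cleanup loop.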
[PATCH 5/8] fsldma: fix controller lockups
Enabling poisoning in the dmapool API quickly showed that the DMA
controller was fetching descriptors that should not have been in use. This
has caused intermittent controller lockups during testing.

I have been unable to figure out the exact set of conditions which cause
this to happen. However, I believe it is related to the driver using the
hardware registers to track whether the controller is busy or not. The code
can incorrectly decide that the hardware is idle due to lag between
register writes and the hardware actually becoming busy.

To fix this, the driver has been reworked to explicitly track the state of
the hardware, rather than try to guess what it is doing based on the
register values.

This has passed dmatest with 10 threads per channel, 10 iterations per
thread several times without error. Previously, this would fail within a
few seconds.

Signed-off-by: Ira W. Snyder <i...@ovro.caltech.edu>
---
 drivers/dma/fsldma.c |  187 +++---
 drivers/dma/fsldma.h |    1 +
 2 files changed, 72 insertions(+), 116 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 06421c0..d3c5100 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -63,11 +63,6 @@ static dma_addr_t get_cdar(struct fsldma_chan *chan)
 	return DMA_IN(chan, &chan->regs->cdar, 64) & ~FSL_DMA_SNEN;
 }
 
-static dma_addr_t get_ndar(struct fsldma_chan *chan)
-{
-	return DMA_IN(chan, &chan->regs->ndar, 64);
-}
-
 static u32 get_bcr(struct fsldma_chan *chan)
 {
 	return DMA_IN(chan, &chan->regs->bcr, 32);
@@ -138,13 +133,11 @@ static void dma_init(struct fsldma_chan *chan)
 	case FSL_DMA_IP_85XX:
 		/* Set the channel to below modes:
 		 * EIE - Error interrupt enable
-		 * EOSIE - End of segments interrupt enable (basic mode)
 		 * EOLNIE - End of links interrupt enable
 		 * BWC - Bandwidth sharing among channels
 		 */
 		DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_BWC
-				| FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE
-				| FSL_DMA_MR_EOSIE, 32);
+				| FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE, 32);
 		break;
 	case FSL_DMA_IP_83XX:
 		/* Set the channel to below modes:
@@ -757,14 +750,15 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 
 	switch (cmd) {
 	case DMA_TERMINATE_ALL:
+		spin_lock_irqsave(&chan->desc_lock, flags);
+
 		/* Halt the DMA engine */
 		dma_halt(chan);
 
-		spin_lock_irqsave(&chan->desc_lock, flags);
-
 		/* Remove and free all of the descriptors in the LD queue */
 		fsldma_free_desc_list(chan, &chan->ld_pending);
 		fsldma_free_desc_list(chan, &chan->ld_running);
+		chan->idle = true;
 
 		spin_unlock_irqrestore(&chan->desc_lock, flags);
 		return 0;
@@ -802,78 +796,45 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 }
 
 /**
- * fsl_dma_update_completed_cookie - Update the completed cookie.
+ * fsl_chan_ld_cleanup - Clean up link descriptors
  * @chan : Freescale DMA channel
  *
- * CONTEXT: hardirq
+ * This function is run after the queue of running descriptors has been
+ * executed by the DMA engine. It will run any callbacks, and then free
+ * the descriptors.
+ *
+ * HARDWARE STATE: idle
  */
-static void fsl_dma_update_completed_cookie(struct fsldma_chan *chan)
+static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
 {
-	struct fsl_desc_sw *desc;
+	struct fsl_desc_sw *desc, *_desc;
+	const char *name = chan->name;
 	unsigned long flags;
-	dma_cookie_t cookie;
 
 	spin_lock_irqsave(&chan->desc_lock, flags);
 
+	/* if the ld_running list is empty, there is nothing to do */
 	if (list_empty(&chan->ld_running)) {
-		dev_dbg(chan->dev, "%s: no running descriptors\n", chan->name);
+		dev_dbg(chan->dev, "%s: no descriptors to cleanup\n", name);
 		goto out_unlock;
 	}
 
-	/* Get the last descriptor, update the cookie to that */
+	/*
+	 * Get the last descriptor, update the cookie to it
+	 *
+	 * This is done before callbacks run so that clients can check the
+	 * status of their DMA transfer inside the callback.
+	 */
 	desc = to_fsl_desc(chan->ld_running.prev);
-	if (dma_is_idle(chan))
-		cookie = desc->async_tx.cookie;
-	else {
-		cookie = desc->async_tx.cookie - 1;
-		if (unlikely(cookie < DMA_MIN_COOKIE))
-			cookie = DMA_MAX_COOKIE;
-	}
-
-	chan->completed_cookie = cookie;
-
-out_unlock:
-	spin_unlock_irqrestore(&chan->desc_lock, flags);
-}
-
-/**
- * fsldma_desc_status - Check the status of a descriptor
- * @chan: Freescale DMA channel
- * @desc: DMA SW descriptor
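As a rough model of the approach described in the changelog above, the sketch below tracks controller state in software instead of reading it back from hardware. All `demo_` names are hypothetical, and the "hardware" is reduced to counters. The assumption is that the channel is marked busy at the moment work is handed to the controller and marked idle only when the completion handler runs, so a delay between a register write and the hardware reporting busy can never be mistaken for idleness.

```c
#include <stdbool.h>

struct demo_chan {
	bool idle;		/* software's own record of controller state */
	unsigned int pending;	/* descriptors queued but not yet started */
	unsigned int running;	/* descriptors handed to the controller */
	unsigned int starts;	/* how many times the controller was kicked */
};

static void demo_chan_init(struct demo_chan *c)
{
	c->idle = true;
	c->pending = 0;
	c->running = 0;
	c->starts = 0;
}

/* Queue one descriptor; nothing is started yet. */
static void demo_submit(struct demo_chan *c)
{
	c->pending++;
}

/* Start the pending queue, but only if software knows the channel is idle. */
static void demo_xfer_queue(struct demo_chan *c)
{
	if (c->pending == 0)
		return;		/* nothing to do */
	if (!c->idle)
		return;		/* still busy; the completion handler restarts us */

	c->running += c->pending;
	c->pending = 0;
	c->starts++;		/* stands in for programming and starting the hardware */
	c->idle = false;	/* recorded immediately, no register read-back */
}

/* Completion handler: the running chain has finished. */
static void demo_complete(struct demo_chan *c)
{
	c->running = 0;
	c->idle = true;
	demo_xfer_queue(c);	/* pick up anything queued while we were busy */
}
```

In this model a second submit that arrives while the first chain is in flight simply stays pending until `demo_complete()` runs; it can never be started on top of a controller that is still working through the previous chain.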
Re: [RFC] Inter-processor Mailboxes Drivers
On Mon, Feb 14, 2011 at 12:03:59PM +0200, Ohad Ben-Cohen wrote:
> On Mon, Feb 14, 2011 at 12:01 PM, Jamie Iles <ja...@jamieiles.com> wrote:
> > On Fri, Feb 11, 2011 at 03:19:51PM -0600, Meador Inge wrote:
> > >   1. Hardware specific bits somewhere under '.../arch/*'. Drivers
> > >      for the MPIC message registers on Power and OMAP4 mailboxes, for
> > >      example.
> > >   2. A higher level driver under '.../drivers/mailbox/*'. That the
> > >      pieces in (1) would register with. This piece would expose the
> > >      main kernel API.
> > >   3. Userspace interfaces for accessing the mailboxes. A
> > >      '/dev/mailbox1', '/dev/mailbox2', etc... mapping, for example.
> >
> > How about using virtio for all of this and having the mailbox as a
> > notification/message passing driver for the virtio backend?
>
> This is exactly what we are doing now, and it looks promising. expect
> patches soon.

I'll be happy to examine the feasibility of doing a port to mpc83xx as soon
as I see the code. :-)

I have been using the message registers to create a software network card
over PCI (between a host system and an mpc83xx in a PCI slot). I have
wanted to use virtio for this task for a long time.

I think a uniform interface for the mailbox registers would be a very
useful API.

Thanks,
Ira
[PATCH RFCv7 0/2] CARMA Board Support
Hello everyone,

This is the seventh posting of these drivers, taking into account comments
from earlier postings. I've made sure that the drivers both pass checkpatch
without any errors or warnings.

I would appreciate as much review as you can offer, so that these can get
into the next merge cycle. They've been sitting outside mainline for far too
long.

RFCv6 -> RFCv7:
- reference count private data structure (to support unbind)
- use #defines instead of hex values for registers
- keep lines <=80 characters

RFCv5 -> RFCv6:
- change locking in several functions
- use list_move_tail() to simplify code
- remove unused helper functions

RFCv4 -> RFCv5:
- remove unnecessary locking per review comments
- do not clobber return values from *_interruptible()
- explicitly track buffer DMA mapping
- use #defines instead of raw hex addresses
- change enable sysfs attribute to root-writeable only

RFCv3 -> RFCv4:
- updates for DATA-FPGA version 2

RFCv2 -> RFCv3:
- use miscdevice framework (removing the carma class)
- add bitfile readback capability to the programmer

RFCv1 -> RFCv2:
- change comments to kerneldoc format
- Kconfig improvements
- use the videobuf_dma_sg API in the programmer
- updates for Freescale DMAEngine DMA_SLAVE API changes

KNOWN ISSUES:
- untested with a setup that can generate interrupts (will get access soon)
- does not handle runtime unbind

Information about the CARMA board:

The CARMA board is essentially an MPC8349EA MDS reference design with a
1GHz ADC and 4 high powered data processing FPGAs connected to the local
bus. It is all packed into a compact PCI form factor. It is used at the
Owens Valley Radio Observatory as the main component in the correlator
system.

For board information, see:
http://www.mmarray.org/~dwh/carma_board/index.html

For DATA-FPGA register layout, see:
http://www.mmarray.org/memos/carma_memo46.pdf

These drivers are the necessary pieces to get the data processing FPGAs
working and producing data. Despite the fact that the hardware is custom and
we are the only users, I'd still like to get the drivers upstream. Several
people have suggested that this is possible.

Some further patches will be forthcoming. I have a driver for the LED
subsystem and the PPS subsystem. The LED register layout is expected to
change soon, so I won't post the driver until that is finished. The PPS
driver will be posted separately from this patch series; it is very generic.

Thanks to everyone who has provided comments on earlier versions!

Ira W. Snyder (2):
  misc: add CARMA DATA-FPGA Access Driver
  misc: add CARMA DATA-FPGA Programmer support

 drivers/misc/Kconfig                    |    1 +
 drivers/misc/Makefile                   |    1 +
 drivers/misc/carma/Kconfig              |   18 +
 drivers/misc/carma/Makefile             |    2 +
 drivers/misc/carma/carma-fpga-program.c | 1141 +++
 drivers/misc/carma/carma-fpga.c         | 1433 +++
 6 files changed, 2596 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/carma/Kconfig
 create mode 100644 drivers/misc/carma/Makefile
 create mode 100644 drivers/misc/carma/carma-fpga-program.c
 create mode 100644 drivers/misc/carma/carma-fpga.c

-- 
1.7.3.4
[PATCH RFCv7 2/2] misc: add CARMA DATA-FPGA Programmer support
This adds support for programming the data processing FPGAs on the OVRO CARMA board. These FPGAs have a special programming sequence that requires that we program the Freescale DMA engine, which is only available inside the kernel. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/misc/carma/Kconfig |9 + drivers/misc/carma/Makefile |1 + drivers/misc/carma/carma-fpga-program.c | 1141 +++ 3 files changed, 1151 insertions(+), 0 deletions(-) create mode 100644 drivers/misc/carma/carma-fpga-program.c diff --git a/drivers/misc/carma/Kconfig b/drivers/misc/carma/Kconfig index 4be183f..e57a9d3 100644 --- a/drivers/misc/carma/Kconfig +++ b/drivers/misc/carma/Kconfig @@ -7,3 +7,12 @@ config CARMA_FPGA Say Y here to include support for communicating with the data processing FPGAs on the OVRO CARMA board. +config CARMA_FPGA_PROGRAM + tristate CARMA DATA-FPGA Programmer + depends on FSL_SOC PPC_83xx MEDIA_SUPPORT HAS_DMA FSL_DMA + select VIDEOBUF_DMA_SG + default n + help + Say Y here to include support for programming the data processing + FPGAs on the OVRO CARMA board. + diff --git a/drivers/misc/carma/Makefile b/drivers/misc/carma/Makefile index 0b69fa7..ff36ac2 100644 --- a/drivers/misc/carma/Makefile +++ b/drivers/misc/carma/Makefile @@ -1 +1,2 @@ obj-$(CONFIG_CARMA_FPGA) += carma-fpga.o +obj-$(CONFIG_CARMA_FPGA_PROGRAM) += carma-fpga-program.o diff --git a/drivers/misc/carma/carma-fpga-program.c b/drivers/misc/carma/carma-fpga-program.c new file mode 100644 index 000..7ce6065 --- /dev/null +++ b/drivers/misc/carma/carma-fpga-program.c @@ -0,0 +1,1141 @@ +/* + * CARMA Board DATA-FPGA Programmer + * + * Copyright (c) 2009-2011 Ira W. Snyder i...@ovro.caltech.edu + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or (at your + * option) any later version. 
+ */ + +#include linux/dma-mapping.h +#include linux/of_platform.h +#include linux/completion.h +#include linux/miscdevice.h +#include linux/dmaengine.h +#include linux/interrupt.h +#include linux/highmem.h +#include linux/kernel.h +#include linux/module.h +#include linux/mutex.h +#include linux/delay.h +#include linux/init.h +#include linux/leds.h +#include linux/slab.h +#include linux/kref.h +#include linux/fs.h +#include linux/io.h + +#include media/videobuf-dma-sg.h + +/* MPC8349EMDS specific get_immrbase() */ +#include sysdev/fsl_soc.h + +static const char drv_name[] = carma-fpga-program; + +/* + * Firmware images are always this exact size + * + * 12849552 bytes for a CARMA Digitizer Board (EP2S90 FPGAs) + * 18662880 bytes for a CARMA Correlator Board (EP2S130 FPGAs) + */ +#define FW_SIZE_EP2S90 12849552 +#define FW_SIZE_EP2S13018662880 + +struct fpga_dev { + struct miscdevice miscdev; + + /* Reference count */ + struct kref ref; + + /* Device Registers */ + struct device *dev; + void __iomem *regs; + void __iomem *immr; + + /* Freescale DMA Device */ + struct dma_chan *chan; + + /* Interrupts */ + int irq, status; + struct completion completion; + + /* FPGA Bitfile */ + struct mutex lock; + + struct videobuf_dmabuf vb; + bool vb_allocated; + + /* max size and written bytes */ + size_t fw_size; + size_t bytes; +}; + +/* + * FPGA Bitfile Helpers + */ + +/** + * fpga_drop_firmware_data() - drop the bitfile image from memory + * @priv: the driver's private data structure + * + * LOCKING: must hold priv-lock + */ +static void fpga_drop_firmware_data(struct fpga_dev *priv) +{ + videobuf_dma_free(priv-vb); + priv-vb_allocated = false; + priv-bytes = 0; +} + +/* + * Private Data Reference Count + */ + +static void fpga_dev_remove(struct kref *ref) +{ + struct fpga_dev *priv = container_of(ref, struct fpga_dev, ref); + + /* free any firmware image that was not programmed */ + fpga_drop_firmware_data(priv); + + mutex_destroy(priv-lock); + kfree(priv); +} + +/* + * LED 
Trigger (could be a seperate module) + */ + +/* + * NOTE: this whole thing does have the problem that whenever the led's are + * NOTE: first set to use the fpga trigger, they could be in the wrong state + */ + +DEFINE_LED_TRIGGER(ledtrig_fpga); + +static void ledtrig_fpga_programmed(bool enabled) +{ + if (enabled) + led_trigger_event(ledtrig_fpga, LED_FULL); + else + led_trigger_event(ledtrig_fpga, LED_OFF); +} + +/* + * FPGA Register Helpers + */ + +/* Register Definitions */ +#define FPGA_CONFIG_CONTROL0x40 +#define FPGA_CONFIG_STATUS 0x44 +#define FPGA_CONFIG_FIFO_SIZE 0x48 +#define FPGA_CONFIG_FIFO_USED
[PATCH RFCv7 1/2] misc: add CARMA DATA-FPGA Access Driver
This driver allows userspace to access the data processing FPGAs on the
OVRO CARMA board. It has two modes of operation:

1) random access

This allows users to poke any DATA-FPGA registers by using mmap to map
the address region directly into their memory map.

2) correlation dumping

When correlating, the DATA-FPGA's have special requirements for getting
the data out of their memory before the next correlation. This nominally
happens at 64Hz (every 15.625ms). If the data is not dumped before the
next correlation, data is lost.

The data dumping driver handles buffering up to 1 second worth of
correlation data from the FPGAs. This lowers the realtime scheduling
requirements for the userspace process reading the device.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/misc/Kconfig            |    1 +
 drivers/misc/Makefile           |    1 +
 drivers/misc/carma/Kconfig      |    9 +
 drivers/misc/carma/Makefile     |    1 +
 drivers/misc/carma/carma-fpga.c | 1433 +++
 5 files changed, 1445 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/carma/Kconfig
 create mode 100644 drivers/misc/carma/Makefile
 create mode 100644 drivers/misc/carma/carma-fpga.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index cc8e49d..93cf1e6 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -457,5 +457,6 @@ source "drivers/misc/eeprom/Kconfig"
 source "drivers/misc/cb710/Kconfig"
 source "drivers/misc/iwmc3200top/Kconfig"
 source "drivers/misc/ti-st/Kconfig"
+source "drivers/misc/carma/Kconfig"
 
 endif # MISC_DEVICES
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 98009cc..2c1610e 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -42,3 +42,4 @@ obj-$(CONFIG_ARM_CHARLCD)	+= arm-charlcd.o
 obj-$(CONFIG_PCH_PHUB)		+= pch_phub.o
 obj-y				+= ti-st/
 obj-$(CONFIG_AB8500_PWM)	+= ab8500-pwm.o
+obj-y				+= carma/
diff --git a/drivers/misc/carma/Kconfig b/drivers/misc/carma/Kconfig
new file mode 100644
index 000..4be183f
--- /dev/null
+++ b/drivers/misc/carma/Kconfig
@@ -0,0 +1,9 @@
+config CARMA_FPGA
+	tristate "CARMA DATA-FPGA Access Driver"
+	depends on FSL_SOC && PPC_83xx && MEDIA_SUPPORT && HAS_DMA && FSL_DMA
+	select VIDEOBUF_DMA_SG
+	default n
+	help
+	  Say Y here to include support for communicating with the data
+	  processing FPGAs on the OVRO CARMA board.
+
diff --git a/drivers/misc/carma/Makefile b/drivers/misc/carma/Makefile
new file mode 100644
index 000..0b69fa7
--- /dev/null
+++ b/drivers/misc/carma/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_CARMA_FPGA) += carma-fpga.o
diff --git a/drivers/misc/carma/carma-fpga.c b/drivers/misc/carma/carma-fpga.c
new file mode 100644
index 000..3965821
--- /dev/null
+++ b/drivers/misc/carma/carma-fpga.c
@@ -0,0 +1,1433 @@
+/*
+ * CARMA DATA-FPGA Access Driver
+ *
+ * Copyright (c) 2009-2011 Ira W. Snyder i...@ovro.caltech.edu
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+/*
+ * FPGA Memory Dump Format
+ *
+ * FPGA #0 control registers (32 x 32-bit words)
+ * FPGA #1 control registers (32 x 32-bit words)
+ * FPGA #2 control registers (32 x 32-bit words)
+ * FPGA #3 control registers (32 x 32-bit words)
+ * SYSFPGA control registers (32 x 32-bit words)
+ * FPGA #0 correlation array (NUM_CORL0 correlation blocks)
+ * FPGA #1 correlation array (NUM_CORL1 correlation blocks)
+ * FPGA #2 correlation array (NUM_CORL2 correlation blocks)
+ * FPGA #3 correlation array (NUM_CORL3 correlation blocks)
+ *
+ * Each correlation array consists of:
+ *
+ * Correlation Data      (2 x NUM_LAGSn x 32-bit words)
+ * Pipeline Metadata     (2 x NUM_METAn x 32-bit words)
+ * Quantization Counters (2 x NUM_QCNTn x 32-bit words)
+ *
+ * The NUM_CORLn, NUM_LAGSn, NUM_METAn, and NUM_QCNTn values come from
+ * the FPGA configuration registers. They do not change once the FPGA's
+ * have been programmed, they only change on re-programming.
+ */
+
+/*
+ * Basic Description:
+ *
+ * This driver is used to capture correlation spectra off of the four data
+ * processing FPGAs. The FPGAs are often reprogrammed at runtime, therefore
+ * this driver supports dynamic enable/disable of capture while the device
+ * remains open.
+ *
+ * The nominal capture rate is 64Hz (every 15.625ms). To facilitate this fast
+ * capture rate, all buffers are pre-allocated to avoid any potentially long
+ * running memory allocations while capturing.
+ *
+ * There are two lists and one pointer which are used to keep track of the
+ * different states of data buffers.
+ *
+ * 1) free list
+ * This list holds all empty data buffers which are ready to receive data.
+ *
+ * 2) inflight pointer
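The "FPGA Memory Dump Format" comment in the patch above implies a simple size calculation for one dump. A minimal sketch of that arithmetic follows. It assumes that the three sub-arrays (lags, metadata, quantization counters) are repeated once per correlation block, which is one reading of the comment; the struct and function names are hypothetical, and the real driver reads the NUM_* values from the FPGA configuration registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_FPGA	4	/* data processing FPGAs */
#define CTRL_WORDS	32	/* control registers per FPGA and for the SYSFPGA */

/* Hypothetical per-FPGA configuration, as read from the NUM_* registers */
struct corl_config {
	uint32_t num_corl;	/* correlation blocks in the array */
	uint32_t num_lags;
	uint32_t num_meta;
	uint32_t num_qcnt;
};

/* 32-bit words in one correlation block: 2 x (lags + meta + qcnt) */
static size_t corl_block_words(const struct corl_config *c)
{
	return 2 * ((size_t)c->num_lags + c->num_meta + c->num_qcnt);
}

/* Total size in bytes of one memory dump */
static size_t dump_size_bytes(const struct corl_config cfg[NUM_FPGA])
{
	/* control registers: four data FPGAs plus the SYSFPGA */
	size_t words = (NUM_FPGA + 1) * CTRL_WORDS;
	int i;

	for (i = 0; i < NUM_FPGA; i++)
		words += cfg[i].num_corl * corl_block_words(&cfg[i]);

	return words * sizeof(uint32_t);
}
```

At the nominal 64Hz capture rate, buffering one second of data would then take 64 buffers of this size.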
Re: [PATCH 1/2] misc: add CARMA DATA-FPGA Access Driver
On Wed, Feb 09, 2011 at 04:30:23PM -, David Laight wrote:
> > This driver allows userspace to access the data processing FPGAs on the
> > OVRO CARMA board. It has two modes of operation:
> >
> > 1) random access
> >
> > This allows users to poke any DATA-FPGA registers by using mmap to map
> > the address region directly into their memory map.
>
> I needed something similar, but used pread() and pwrite() to request the
> transfers. While this does require a system call per transfer, it allows
> the driver to use dma (if available) to speed up the request. In my case
> doing single cycle transfers would be too slow.

We initially started with a read()/write() interface for individual register
reads and writes, just like you describe. It turned out that mmap was plenty
fast for our use. I made the decision to ditch all of the extra code needed
to setup and execute the DMA for the much simpler mmap code.

In our case, going all the way through the DMA engine code just to transfer
4 bytes is overkill. The local bus is already quite fast, and we can
increase the clock speed if needed.

Ira
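The random-access mode discussed above maps the register region into the process with mmap, so a register access becomes a plain load or store with no system call. A minimal userspace sketch follows. It assumes big-endian 32-bit registers (the driver code posted in this thread uses ioread32be); the helper names are hypothetical, and no device node name is assumed, the caller passes whatever path the driver exposes.

```c
#include <arpa/inet.h>	/* htonl, ntohl: host <-> big-endian */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map len bytes of the file at path; returns NULL on failure */
static volatile uint32_t *regs_map(const char *path, size_t len)
{
	void *p;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return NULL;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close */

	return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

static void regs_unmap(volatile uint32_t *regs, size_t len)
{
	munmap((void *)regs, len);
}

/* Read the big-endian 32-bit register at byte offset off */
static uint32_t reg_read32(volatile uint32_t *regs, size_t off)
{
	return ntohl(regs[off / sizeof(uint32_t)]);
}

/* Write val to the big-endian 32-bit register at byte offset off */
static void reg_write32(volatile uint32_t *regs, size_t off, uint32_t val)
{
	regs[off / sizeof(uint32_t)] = htonl(val);
}
```

The trade-off raised in the thread still applies: each access here is a single bus cycle, so a pread()/pwrite() interface backed by DMA may be the better choice for bulk transfers.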
Re: [PATCH 1/2] misc: add CARMA DATA-FPGA Access Driver
On Wed, Feb 09, 2011 at 12:33:25AM -0800, Dmitry Torokhov wrote:
> Hi Ira,
>
> On Tue, Feb 08, 2011 at 03:37:46PM -0800, Ira W. Snyder wrote:
> > This driver allows userspace to access the data processing FPGAs on the
> > OVRO CARMA board. It has two modes of operation:
>
> Thank you for making the changes, some more comments below.
>
> > +
> > +#define inode_to_dev(inode) container_of(inode->i_cdev, struct fpga_device, cdev)
> > +
>
> Does not seem to be used.

Leftovers from earlier versions, will remove.

> > +/*
> > + * Data Buffer Allocation Helpers
> > + */
> > +
> > +static int data_map_buffer(struct device *dev, struct data_buf *buf)
> > +{
> > +	return videobuf_dma_map(dev, &buf->vb);
> > +}
> > +
> > +static void data_unmap_buffer(struct device *dev, struct data_buf *buf)
> > +{
> > +	videobuf_dma_unmap(dev, &buf->vb);
> > +}
>
> Why can't we all videobuf_dma_map() and videobuf_dma_unmap() directly?
> What these helpers supposed to solve?

The helpers were useful in the past. Now they are not. Will change.

> > +static int data_alloc_buffers(struct fpga_device *priv)
> > +{
> > +	struct data_buf *buf;
> > +	int i, ret;
> > +
> > +	for (i = 0; i < MAX_DATA_BUFS; i++) {
> > +
> > +		/* allocate a buffer */
> > +		buf = data_alloc_buffer(priv->bufsize);
> > +		if (!buf)
> > +			continue;
>
> break?
>
> > +
> > +		/* map it for DMA */
> > +		ret = data_map_buffer(priv->dev, buf);
> > +		if (ret) {
> > +			data_free_buffer(buf);
> > +			continue;
>
> and here as well?

Yep, break is fine also.

> > +		}
> > +
> > +		/* add it to the list of free buffers */
> > +		list_add_tail(&buf->entry, &priv->free);
> > +		priv->num_buffers++;
> > +	}
> > +
> > +	/* Make sure we allocated the minimum required number of buffers */
> > +	if (priv->num_buffers < MIN_DATA_BUFS) {
> > +		dev_err(priv->dev, "Unable to allocate enough data buffers\n");
> > +		data_free_buffers(priv);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	/* Warn if we are running in a degraded state, but do not fail */
> > +	if (priv->num_buffers < MAX_DATA_BUFS) {
> > +		dev_warn(priv->dev, "Unable to allocate one second worth of "
> > +			 "buffers, using %d buffers instead\n", i);
>
> The latest style is not break strings even if they exceed 80 column limit
> to make sure they are easily greppable.

I usually just follow checkpatch warnings. I'll combine the strings onto one
line.

> > +static void data_dma_cb(void *data)
> > +{
> > +	struct fpga_device *priv = data;
> > +	struct data_buf *buf;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&priv->lock, flags);
> > +
> > +	/* clear the FPGA status and re-enable interrupts */
> > +	data_enable_interrupts(priv);
> > +
> > +	/* If the inflight list is empty, we've got a bug */
> > +	BUG_ON(list_empty(&priv->inflight));
> > +
> > +	/* Grab the first buffer from the inflight list */
> > +	buf = list_first_entry(&priv->inflight, struct data_buf, entry);
> > +	list_del_init(&buf->entry);
> > +
> > +	/* Add it to the used list */
> > +	list_add_tail(&buf->entry, &priv->used);
>
> or
>	list_move_tail(&buf->entry, &priv->used);

Using list_move_tail() didn't occur to me. I'll change it.

> > +
> > +static irqreturn_t data_irq(int irq, void *dev_id)
> > +{
> > +	struct fpga_device *priv = dev_id;
> > +	struct data_buf *buf;
> > +	u32 status;
> > +	int i;
> > +
> > +	/* detect spurious interrupts via FPGA status */
> > +	for (i = 0; i < 4; i++) {
> > +		status = fpga_read_reg(priv, i, MMAP_REG_STATUS);
> > +		if (!(status & (CORL_DONE | CORL_ERR))) {
> > +			dev_err(priv->dev, "spurious irq detected (FPGA)\n");
> > +			return IRQ_NONE;
> > +		}
> > +	}
> > +
> > +	/* detect spurious interrupts via raw IRQ pin readback */
> > +	status = ioread32be(priv->regs + SYS_IRQ_INPUT_DATA);
> > +	if (status & IRQ_CORL_DONE) {
> > +		dev_err(priv->dev, "spurious irq detected (IRQ)\n");
> > +		return IRQ_NONE;
> > +	}
> > +
> > +	spin_lock(&priv->lock);
> > +
> > +	/* hide the interrupt by switching the IRQ driver to GPIO */
> > +	data_disable_interrupts(priv);
> > +
> > +	/* Check that we actually have a free buffer */
> > +	if (list_empty(&priv->free)) {
> > +		priv->num_dropped++;
> > +		data_enable_interrupts(priv);
> > +		goto out_unlock;
> > +	}
> > +
> > +	buf = list_first_entry(&priv->free, struct data_buf, entry);
> > +	list_del_init(&buf->entry);
> > +
> > +	/* Check the buffer size */
> > +	BUG_ON(buf->size != priv->bufsize);
> > +
> > +	/* Submit a DMA transfer to get the correlation data */
> > +	if (data_submit_dma(priv, buf)) {
> > +		dev_err(priv->dev, "Unable to setup DMA transfer\n");
> > +		list_add_tail(&buf->entry, &priv->free);
> > +		data_enable_interrupts(priv);
> > +		goto out_unlock;
> > +	}
> > +
> > +	/* DMA setup succeeded, GO!!! */
> > +	list_add_tail(&buf->entry, &priv->inflight);
> > +	dma_async_memcpy_issue_pending(priv->chan);
> > +
> > +out_unlock
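The code under review above moves each buffer through three states (free, inflight, used), and the review suggests list_move_tail() in place of list_del_init() followed by list_add_tail(). A minimal userspace sketch of that flow follows. It assumes a simplified re-implementation of the kernel's circular doubly linked list rather than the real <linux/list.h>; the `buf_*` names are hypothetical and no locking is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the kernel's struct list_head */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Unlink an entry and leave it pointing at itself */
static void list_del_entry(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = e;
}

/* Append an entry at the tail of the list headed by h */
static void list_push_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

/* Equivalent of list_move_tail(): unlink, then append to another list */
static void list_move_to_tail(struct list_head *e, struct list_head *h)
{
	list_del_entry(e);
	list_push_tail(e, h);
}

struct buf_queues {
	struct list_head free;		/* empty, ready to receive data */
	struct list_head inflight;	/* DMA submitted, not yet complete */
	struct list_head used;		/* full, waiting for a reader */
	unsigned int num_dropped;
};

static void buf_queues_init(struct buf_queues *q)
{
	list_init(&q->free);
	list_init(&q->inflight);
	list_init(&q->used);
	q->num_dropped = 0;
}

/* Interrupt path: take a free buffer, or count a dropped capture */
static bool buf_start_capture(struct buf_queues *q)
{
	if (list_is_empty(&q->free)) {
		q->num_dropped++;
		return false;
	}
	list_move_to_tail(q->free.next, &q->inflight);
	return true;
}

/* DMA completion path: the oldest inflight buffer is now full */
static void buf_finish_capture(struct buf_queues *q)
{
	assert(!list_is_empty(&q->inflight));
	list_move_to_tail(q->inflight.next, &q->used);
}
```

Because every buffer is always on exactly one list, a single move operation expresses each state change, which is the simplification the review asks for.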
Re: [PATCH 1/2] misc: add CARMA DATA-FPGA Access Driver
On Wed, Feb 09, 2011 at 10:27:40AM -0800, Dmitry Torokhov wrote:

[ snip stuff I've already fixed in the next version ]

> > The requirement is that the device stay open during reconfiguration.
> > This provides for that. Readers just block for as long as the device is
> > not producing data.
>
> OK, you still need to make sure you do not touch free/used buffer while
> device is disabled. Also, you need to kick readers if you unbind the
> driver, so maybe a new flag priv->exists should be introduced and checked.

I don't understand what you mean by "kick readers if you unbind the driver".
The kernel automatically increases the refcount on a module when a process
is using the module. This shows up in the "Used by" column of lsmod's
output.

The kernel will not let you rmmod a module with a non-zero refcount. You
cannot get into the situation where you have rmmod'ed the module and a
reader is still blocking in read()/poll().

Thanks for the review. A v6 is coming right up.

Ira
Re: [PATCH 1/2] misc: add CARMA DATA-FPGA Access Driver
On Wed, Feb 09, 2011 at 03:42:31PM -0800, Dmitry Torokhov wrote:
> On Wed, Feb 09, 2011 at 03:35:45PM -0800, Ira W. Snyder wrote:
> > On Wed, Feb 09, 2011 at 10:27:40AM -0800, Dmitry Torokhov wrote:
> >
> > [ snip stuff I've already fixed in the next version ]
> >
> > > > The requirement is that the device stay open during reconfiguration.
> > > > This provides for that. Readers just block for as long as the device
> > > > is not producing data.
> > >
> > > OK, you still need to make sure you do not touch free/used buffer
> > > while device is disabled. Also, you need to kick readers if you unbind
> > > the driver, so maybe a new flag priv->exists should be introduced and
> > > checked.
> >
> > I don't understand what you mean by "kick readers if you unbind the
> > driver". The kernel automatically increases the refcount on a module
> > when a process is using the module. This shows up in the "Used by"
> > column of lsmod's output.
> >
> > The kernel will not let you rmmod a module with a non-zero refcount. You
> > cannot get into the situation where you have rmmod'ed the module and a
> > reader is still blocking in read()/poll().
>
> However you can still unbind the driver from the device by writing into
> driver's sysfs 'unbind' attribute. See drivers/base/bus.c::driver_unbind().

I was completely unaware of that feature. I have a hunch that many drivers
are incapable of dealing with an unbind while they are still open.

As a matter of fact, I don't see how this can EVER be safe. The driver core
automatically calls the data_of_remove() routine while there are still
blocked readers. This kfree()s the private data structure, which contains
the suggested priv->exists flag.

What happens if the memory allocator re-allocates that memory to a different
driver before the reader process is woken up to check the priv->exists flag?

The only way to solve this is to count the number of open()s and close()s,
and block the unbind until all users have close()d the device.

Thanks,
Ira
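One way to address the use-after-free described above is to reference count the private data, so that the structure outlives remove() until the last open file is closed; the RFCv7 changelog in this thread mentions reference counting the private data structure to support unbind. A minimal userspace sketch follows. It assumes C11 atomics in place of the kernel's struct kref, plus an `exists` flag as suggested in the review; all `priv_*` names are hypothetical and are not taken from the posted driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical private data, shared by the driver core and open files */
struct priv_data {
	atomic_int refs;	/* stands in for struct kref */
	atomic_bool exists;	/* false once the device is unbound */
};

/* Incremented when a structure is freed, so the example can observe it */
static int priv_freed_count;

static struct priv_data *priv_create(void)
{
	struct priv_data *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	atomic_init(&p->refs, 1);	/* reference held by probe() */
	atomic_init(&p->exists, true);
	return p;
}

static void priv_get(struct priv_data *p)
{
	atomic_fetch_add(&p->refs, 1);
}

/* Drop one reference; the last one out frees the structure */
static void priv_put(struct priv_data *p)
{
	if (atomic_fetch_sub(&p->refs, 1) == 1) {
		priv_freed_count++;
		free(p);
	}
}

/* open(): each open file holds its own reference */
static void priv_open(struct priv_data *p)
{
	priv_get(p);
}

/* read(): a woken reader checks the flag; the memory is still valid */
static bool priv_can_read(struct priv_data *p)
{
	return atomic_load(&p->exists);
}

/* remove()/unbind: mark the device gone, drop the probe() reference */
static void priv_unbind(struct priv_data *p)
{
	atomic_store(&p->exists, false);
	priv_put(p);
}

/* close(): drop the file's reference */
static void priv_close(struct priv_data *p)
{
	priv_put(p);
}
```

With this scheme the unbind does not need to block: a reader woken after the unbind sees `exists == false` in memory that is still owned by its own reference, and the free happens on the final close.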
[PATCH RFCv6 0/2] CARMA Board Support
Hello everyone,

This is the sixth posting of these drivers, taking into account comments
from earlier postings.

I would appreciate as much review as you can offer.

RFCv5 -> RFCv6:
- change locking in several functions
- use list_move_tail() to simplify code
- remove unused helper functions

RFCv4 -> RFCv5:
- remove unnecessary locking per review comments
- do not clobber return values from *_interruptible()
- explicitly track buffer DMA mapping
- use #defines instead of raw hex addresses
- change enable sysfs attribute to root-writeable only

RFCv3 -> RFCv4:
- updates for DATA-FPGA version 2

RFCv2 -> RFCv3:
- use miscdevice framework (removing the carma class)
- add bitfile readback capability to the programmer

RFCv1 -> RFCv2:
- change comments to kerneldoc format
- Kconfig improvements
- use the videobuf_dma_sg API in the programmer
- updates for Freescale DMAEngine DMA_SLAVE API changes

KNOWN ISSUES:
- untested with a setup that can generate interrupts (will get access soon)
- does not handle runtime unbind

Information about the CARMA board:

The CARMA board is essentially an MPC8349EA MDS reference design with a
1GHz ADC and 4 high powered data processing FPGAs connected to the local
bus. It is all packed into a compact PCI form factor. It is used at the
Owens Valley Radio Observatory as the main component in the correlator
system.

For board information, see:
http://www.mmarray.org/~dwh/carma_board/index.html

For DATA-FPGA register layout, see:
http://www.mmarray.org/memos/carma_memo46.pdf

These drivers are the necessary pieces to get the data processing FPGAs
working and producing data. Despite the fact that the hardware is custom and
we are the only users, I'd still like to get the drivers upstream. Several
people have suggested that this is possible.

Some further patches will be forthcoming. I have a driver for the LED
subsystem and the PPS subsystem. The LED register layout is expected to
change soon, so I won't post the driver until that is finished. The PPS
driver will be posted separately from this patch series; it is very generic.

Thanks to everyone who has provided comments on earlier versions!

Ira W. Snyder (2):
  misc: add CARMA DATA-FPGA Access Driver
  misc: add CARMA DATA-FPGA Programmer support

 drivers/misc/Kconfig                    |    1 +
 drivers/misc/Makefile                   |    1 +
 drivers/misc/carma/Kconfig              |   18 +
 drivers/misc/carma/Makefile             |    2 +
 drivers/misc/carma/carma-fpga-program.c | 1084 +++
 drivers/misc/carma/carma-fpga.c         | 1407 +++
 6 files changed, 2513 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/carma/Kconfig
 create mode 100644 drivers/misc/carma/Makefile
 create mode 100644 drivers/misc/carma/carma-fpga-program.c
 create mode 100644 drivers/misc/carma/carma-fpga.c

-- 
1.7.3.4
[PATCH RFCv6 1/2] misc: add CARMA DATA-FPGA Access Driver
This driver allows userspace to access the data processing FPGAs on the OVRO CARMA board. It has two modes of operation: 1) random access This allows users to poke any DATA-FPGA registers by using mmap to map the address region directly into their memory map. 2) correlation dumping When correlating, the DATA-FPGA's have special requirements for getting the data out of their memory before the next correlation. This nominally happens at 64Hz (every 15.625ms). If the data is not dumped before the next correlation, data is lost. The data dumping driver handles buffering up to 1 second worth of correlation data from the FPGAs. This lowers the realtime scheduling requirements for the userspace process reading the device. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/misc/Kconfig|1 + drivers/misc/Makefile |1 + drivers/misc/carma/Kconfig |9 + drivers/misc/carma/Makefile |1 + drivers/misc/carma/carma-fpga.c | 1407 +++ 5 files changed, 1419 insertions(+), 0 deletions(-) create mode 100644 drivers/misc/carma/Kconfig create mode 100644 drivers/misc/carma/Makefile create mode 100644 drivers/misc/carma/carma-fpga.c diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index cc8e49d..93cf1e6 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -457,5 +457,6 @@ source drivers/misc/eeprom/Kconfig source drivers/misc/cb710/Kconfig source drivers/misc/iwmc3200top/Kconfig source drivers/misc/ti-st/Kconfig +source drivers/misc/carma/Kconfig endif # MISC_DEVICES diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index 98009cc..2c1610e 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -42,3 +42,4 @@ obj-$(CONFIG_ARM_CHARLCD) += arm-charlcd.o obj-$(CONFIG_PCH_PHUB) += pch_phub.o obj-y += ti-st/ obj-$(CONFIG_AB8500_PWM) += ab8500-pwm.o +obj-y += carma/ diff --git a/drivers/misc/carma/Kconfig b/drivers/misc/carma/Kconfig new file mode 100644 index 000..4be183f --- /dev/null +++ b/drivers/misc/carma/Kconfig @@ -0,0 +1,9 @@ +config 
CARMA_FPGA + tristate CARMA DATA-FPGA Access Driver + depends on FSL_SOC PPC_83xx MEDIA_SUPPORT HAS_DMA FSL_DMA + select VIDEOBUF_DMA_SG + default n + help + Say Y here to include support for communicating with the data + processing FPGAs on the OVRO CARMA board. + diff --git a/drivers/misc/carma/Makefile b/drivers/misc/carma/Makefile new file mode 100644 index 000..0b69fa7 --- /dev/null +++ b/drivers/misc/carma/Makefile @@ -0,0 +1 @@ +obj-$(CONFIG_CARMA_FPGA) += carma-fpga.o diff --git a/drivers/misc/carma/carma-fpga.c b/drivers/misc/carma/carma-fpga.c new file mode 100644 index 000..be40a07 --- /dev/null +++ b/drivers/misc/carma/carma-fpga.c @@ -0,0 +1,1407 @@ +/* + * CARMA DATA-FPGA Access Driver + * + * Copyright (c) 2009-2011 Ira W. Snyder i...@ovro.caltech.edu + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or (at your + * option) any later version. + */ + +/* + * FPGA Memory Dump Format + * + * FPGA #0 control registers (32 x 32-bit words) + * FPGA #1 control registers (32 x 32-bit words) + * FPGA #2 control registers (32 x 32-bit words) + * FPGA #3 control registers (32 x 32-bit words) + * SYSFPGA control registers (32 x 32-bit words) + * FPGA #0 correlation array (NUM_CORL0 correlation blocks) + * FPGA #1 correlation array (NUM_CORL1 correlation blocks) + * FPGA #2 correlation array (NUM_CORL2 correlation blocks) + * FPGA #3 correlation array (NUM_CORL3 correlation blocks) + * + * Each correlation array consists of: + * + * Correlation Data (2 x NUM_LAGSn x 32-bit words) + * Pipeline Metadata (2 x NUM_METAn x 32-bit words) + * Quantization Counters (2 x NUM_QCNTn x 32-bit words) + * + * The NUM_CORLn, NUM_LAGSn, NUM_METAn, and NUM_QCNTn values come from + * the FPGA configuration registers. They do not change once the FPGA's + * have been programmed, they only change on re-programming. 
+ */ + +/* + * Basic Description: + * + * This driver is used to capture correlation spectra off of the four data + * processing FPGAs. The FPGAs are often reprogrammed at runtime, therefore + * this driver supports dynamic enable/disable of capture while the device + * remains open. + * + * The nominal capture rate is 64Hz (every 15.625ms). To facilitate this fast + * capture rate, all buffers are pre-allocated to avoid any potentially long + * running memory allocations while capturing. + * + * There are two lists and one pointer which are used to keep track of the + * different states of data buffers. + * + * 1) free list + * This list holds all empty data buffers which are ready to receive data. + * + * 2) inflight pointer
[PATCH RFCv6 2/2] misc: add CARMA DATA-FPGA Programmer support
This adds support for programming the data processing FPGAs on the OVRO CARMA board. These FPGAs have a special programming sequence that requires that we program the Freescale DMA engine, which is only available inside the kernel. Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu --- drivers/misc/carma/Kconfig |9 + drivers/misc/carma/Makefile |1 + drivers/misc/carma/carma-fpga-program.c | 1084 +++ 3 files changed, 1094 insertions(+), 0 deletions(-) create mode 100644 drivers/misc/carma/carma-fpga-program.c diff --git a/drivers/misc/carma/Kconfig b/drivers/misc/carma/Kconfig index 4be183f..e57a9d3 100644 --- a/drivers/misc/carma/Kconfig +++ b/drivers/misc/carma/Kconfig @@ -7,3 +7,12 @@ config CARMA_FPGA Say Y here to include support for communicating with the data processing FPGAs on the OVRO CARMA board. +config CARMA_FPGA_PROGRAM + tristate CARMA DATA-FPGA Programmer + depends on FSL_SOC PPC_83xx MEDIA_SUPPORT HAS_DMA FSL_DMA + select VIDEOBUF_DMA_SG + default n + help + Say Y here to include support for programming the data processing + FPGAs on the OVRO CARMA board. + diff --git a/drivers/misc/carma/Makefile b/drivers/misc/carma/Makefile index 0b69fa7..ff36ac2 100644 --- a/drivers/misc/carma/Makefile +++ b/drivers/misc/carma/Makefile @@ -1 +1,2 @@ obj-$(CONFIG_CARMA_FPGA) += carma-fpga.o +obj-$(CONFIG_CARMA_FPGA_PROGRAM) += carma-fpga-program.o diff --git a/drivers/misc/carma/carma-fpga-program.c b/drivers/misc/carma/carma-fpga-program.c new file mode 100644 index 000..ef16cb3 --- /dev/null +++ b/drivers/misc/carma/carma-fpga-program.c @@ -0,0 +1,1084 @@ +/* + * CARMA Board DATA-FPGA Programmer + * + * Copyright (c) 2009-2010 Ira W. Snyder i...@ovro.caltech.edu + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or (at your + * option) any later version. 
+ */
+
+#include <linux/dma-mapping.h>
+#include <linux/of_platform.h>
+#include <linux/completion.h>
+#include <linux/miscdevice.h>
+#include <linux/dmaengine.h>
+#include <linux/interrupt.h>
+#include <linux/highmem.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/leds.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/io.h>
+
+#include <media/videobuf-dma-sg.h>
+
+/* MPC8349EMDS specific get_immrbase() */
+#include <sysdev/fsl_soc.h>
+
+static const char drv_name[] = "carma-fpga-program";
+
+/*
+ * Maximum firmware size
+ *
+ * 12849552 bytes for a CARMA Digitizer Board
+ * 18662880 bytes for a CARMA Correlator Board
+ */
+#define FW_SIZE_EP2S90	12849552
+#define FW_SIZE_EP2S130	18662880
+
+struct fpga_dev {
+	struct miscdevice miscdev;
+
+	/* Device Registers */
+	struct device *dev;
+	void __iomem *regs;
+	void __iomem *immr;
+
+	/* Freescale DMA Device */
+	struct dma_chan *chan;
+
+	/* Interrupts */
+	int irq, status;
+	struct completion completion;
+
+	/* FPGA Bitfile */
+	struct mutex lock;
+
+	struct videobuf_dmabuf vb;
+	bool vb_allocated;
+
+	/* max size and written bytes */
+	size_t fw_size;
+	size_t bytes;
+};
+
+/*
+ * FPGA Bitfile Helpers
+ */
+
+/**
+ * fpga_drop_firmware_data() - drop the bitfile image from memory
+ * @priv: the driver's private data structure
+ *
+ * LOCKING: must hold priv->lock
+ */
+static void fpga_drop_firmware_data(struct fpga_dev *priv)
+{
+	videobuf_dma_free(&priv->vb);
+	priv->vb_allocated = false;
+	priv->bytes = 0;
+}
+
+/*
+ * LED Trigger (could be a separate module)
+ */
+
+/*
+ * NOTE: this whole thing does have the problem that whenever the led's are
+ * NOTE: first set to use the fpga trigger, they could be in the wrong state
+ */
+
+DEFINE_LED_TRIGGER(ledtrig_fpga);
+
+static void ledtrig_fpga_programmed(bool enabled)
+{
+	if (enabled)
+		led_trigger_event(ledtrig_fpga, LED_FULL);
+	else
+		led_trigger_event(ledtrig_fpga, LED_OFF);
+}
+
+/*
+ * FPGA Register Helpers
+ */
+
+/* Register Definitions */
+#define FPGA_CONFIG_CONTROL		0x40
+#define FPGA_CONFIG_STATUS		0x44
+#define FPGA_CONFIG_FIFO_SIZE		0x48
+#define FPGA_CONFIG_FIFO_USED		0x4C
+#define FPGA_CONFIG_TOTAL_BYTE_COUNT	0x50
+#define FPGA_CONFIG_CUR_BYTE_COUNT	0x54
+
+#define FPGA_FIFO_ADDRESS		0x3000
+
+static int fpga_fifo_size(void __iomem *regs)
+{
+	return ioread32be(regs + FPGA_CONFIG_FIFO_SIZE);
+}
+
+static int fpga_config_error(void __iomem *regs)
+{
+	return ioread32be(regs + FPGA_CONFIG_STATUS) & 0xFFFE;
+}
+
+static int fpga_fifo_empty(void __iomem *regs)
+{
+	return ioread32be
Re: [PATCH 1/2] misc: add CARMA DATA-FPGA Access Driver
On Mon, Feb 07, 2011 at 11:33:10PM -0800, Dmitry Torokhov wrote:
> Hi Ira,
>
> On Mon, Feb 07, 2011 at 03:23:40PM -0800, Ira W. Snyder wrote:
> > This driver allows userspace to access the data processing FPGAs on the
> > OVRO CARMA board. It has two modes of operation:
> >
> > 1) random access
> >
> > This allows users to poke any DATA-FPGA registers by using mmap to map
> > the address region directly into their memory map.
> >
> > 2) correlation dumping
> >
> > When correlating, the DATA-FPGA's have special requirements for getting
> > the data out of their memory before the next correlation. This nominally
> > happens at 64Hz (every 15.625ms). If the data is not dumped before the
> > next correlation, data is lost.
> >
> > The data dumping driver handles buffering up to 1 second worth of
> > correlation data from the FPGAs. This lowers the realtime scheduling
> > requirements for the userspace process reading the device.
>
> Kind of a fly-by review but it looks like the locking in the driver
> needs work.

Hi Dmitry,

Thanks for the review. I have a few comments inline below.

> > Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
> > ---
> >  drivers/misc/Kconfig            |    1 +
> >  drivers/misc/Makefile           |    1 +
> >  drivers/misc/carma/Kconfig      |    9 +
> >  drivers/misc/carma/Makefile     |    1 +
> >  drivers/misc/carma/carma-fpga.c | 1446 +++
> >  5 files changed, 1458 insertions(+), 0 deletions(-)
> >  create mode 100644 drivers/misc/carma/Kconfig
> >  create mode 100644 drivers/misc/carma/Makefile
> >  create mode 100644 drivers/misc/carma/carma-fpga.c
> >
> > diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> > index 4d073f1..f457f14 100644
> > --- a/drivers/misc/Kconfig
> > +++ b/drivers/misc/Kconfig
> > @@ -457,5 +457,6 @@ source "drivers/misc/eeprom/Kconfig"
> >  source "drivers/misc/cb710/Kconfig"
> >  source "drivers/misc/iwmc3200top/Kconfig"
> >  source "drivers/misc/ti-st/Kconfig"
> > +source "drivers/misc/carma/Kconfig"
> >
> >  endif # MISC_DEVICES
> > diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> > index 98009cc..2c1610e 100644
> > --- a/drivers/misc/Makefile
> > +++ b/drivers/misc/Makefile
> > @@ -42,3 +42,4 @@ obj-$(CONFIG_ARM_CHARLCD)	+= arm-charlcd.o
> >  obj-$(CONFIG_PCH_PHUB)		+= pch_phub.o
> >  obj-y				+= ti-st/
> >  obj-$(CONFIG_AB8500_PWM)	+= ab8500-pwm.o
> > +obj-y				+= carma/
> > diff --git a/drivers/misc/carma/Kconfig b/drivers/misc/carma/Kconfig
> > new file mode 100644
> > index 000..4be183f
> > --- /dev/null
> > +++ b/drivers/misc/carma/Kconfig
> > @@ -0,0 +1,9 @@
> > +config CARMA_FPGA
> > +	tristate "CARMA DATA-FPGA Access Driver"
> > +	depends on FSL_SOC && PPC_83xx && MEDIA_SUPPORT && HAS_DMA && FSL_DMA
> > +	select VIDEOBUF_DMA_SG
> > +	default n
> > +	help
> > +	  Say Y here to include support for communicating with the data
> > +	  processing FPGAs on the OVRO CARMA board.
> > +
> > diff --git a/drivers/misc/carma/Makefile b/drivers/misc/carma/Makefile
> > new file mode 100644
> > index 000..0b69fa7
> > --- /dev/null
> > +++ b/drivers/misc/carma/Makefile
> > @@ -0,0 +1 @@
> > +obj-$(CONFIG_CARMA_FPGA) += carma-fpga.o
> > diff --git a/drivers/misc/carma/carma-fpga.c b/drivers/misc/carma/carma-fpga.c
> > new file mode 100644
> > index 000..52620b3
> > --- /dev/null
> > +++ b/drivers/misc/carma/carma-fpga.c
> > @@ -0,0 +1,1446 @@
> > +/*
> > + * CARMA DATA-FPGA Access Driver
> > + *
> > + * Copyright (c) 2009-2010 Ira W. Snyder i...@ovro.caltech.edu
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms of the GNU General Public License as published by the
> > + * Free Software Foundation; either version 2 of the License, or (at your
> > + * option) any later version.
> > + */
> > +
> > +/*
> > + * FPGA Memory Dump Format
> > + *
> > + * FPGA #0 control registers (32 x 32-bit words)
> > + * FPGA #1 control registers (32 x 32-bit words)
> > + * FPGA #2 control registers (32 x 32-bit words)
> > + * FPGA #3 control registers (32 x 32-bit words)
> > + * SYSFPGA control registers (32 x 32-bit words)
> > + * FPGA #0 correlation array (NUM_CORL0 correlation blocks)
> > + * FPGA #1 correlation array (NUM_CORL1 correlation blocks)
> > + * FPGA #2 correlation array (NUM_CORL2 correlation blocks)
> > + * FPGA #3 correlation array (NUM_CORL3 correlation blocks)
> > + *
> > + * Each correlation array consists of:
> > + *
> > + * Correlation Data      (2 x NUM_LAGSn x 32-bit words)
> > + * Pipeline Metadata     (2 x NUM_METAn x 32-bit words)
> > + * Quantization Counters (2 x NUM_QCNTn x 32-bit words)
> > + *
> > + * The NUM_CORLn, NUM_LAGSn, NUM_METAn, and NUM_QCNTn values come from
> > + * the FPGA configuration registers. They do not change once the FPGA's
> > + * have been programmed, they only change on re-programming.
> > + */
> > +
> > +/*
> > + * Basic Description:
> > + *
> > + * This driver is used to capture correlation spectra off of the four data
> > + * processing FPGAs. The FPGAs are often reprogrammed at runtime, therefore
> > + * this driver supports dynamic enable
Re: [PATCH 1/2] misc: add CARMA DATA-FPGA Access Driver
On Tue, Feb 08, 2011 at 09:50:29AM -0800, Dmitry Torokhov wrote:
> On Tue, Feb 08, 2011 at 09:20:46AM -0800, Ira W. Snyder wrote:
> > On Mon, Feb 07, 2011 at 11:33:10PM -0800, Dmitry Torokhov wrote:
> > > > +static void data_free_buffer(struct device *dev, struct data_buf *buf)
> > > > +{
> > > > +	/* It is ok to free a NULL buffer */
> > > > +	if (!buf)
> > > > +		return;
> > > > +
> > > > +	/* Make sure the buffer is not on any list */
> > > > +	list_del_init(&buf->entry);
> > >
> > > And what happens if it is? Should it be WARN_ON(!list_empty()) instead?
> >
> > This was only defensive programming. Everywhere this function is called,
> > the buffer has already been removed from the list.
>
> I am concerned as sometimes defensive programming is the sign that we
> are not quite sure how the code works. I believe defensive programming
> should be used when providing library-like code, not in local cases.

Ok.

> > > > +
> > > > +	list_for_each_entry_safe(buf, tmp, &priv->free, entry) {
> > > > +		list_del_init(&buf->entry);
> > > > +		spin_unlock_irq(&priv->lock);
> > > > +		data_free_buffer(priv->dev, buf);
> > > > +		spin_lock_irq(&priv->lock);
> > > > +	}
> > >
> > > This is messed up. If there is concurrent access to the free list then
> > > it is not safe to continue iterating list after releasing the lock, you
> > > need to do:
> > >
> > >	spin_lock_irq(&priv->lock);
> > >	while (!list_empty(&priv->free)) {
> > >		buf = list_first_entry(&priv->free, struct data_buf, entry);
> > >		list_del_init(&buf->entry);
> > >		spin_unlock_irq(&priv->lock);
> > >		data_free_buffer(priv->dev, buf);
> > >		spin_lock_irq(&priv->lock);
> > >	}
> > >
> > > BUT, the function is only called when you disable (or fail to enable)
> > > device which, at this point, should be quiesced, thus all this locking
> > > is not really needed.
> >
> > Correct. I thought it would be clearer to reviewers if I always used the
> > lock to protect a data structure, even when it isn't technically needed.
>
> No, locks should protect what needs to be protected. The rest just
> muddles water.

Ok.
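The drain loop suggested above can be tried outside the kernel. What follows is only a minimal userspace sketch of the same idea, not the driver's code: struct buf_pool, pool_add(), pool_drain() and buf_release() are hypothetical names, a singly linked list stands in for the kernel's list_head, and a tiny spinlock built on C11 atomic_flag stands in for spin_lock_irq(). The point it illustrates is that the head of the list is re-read under the lock on every iteration, so no iterator state survives the window in which the lock is dropped.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical buffer type; "next" stands in for the kernel list entry. */
struct data_buf {
	struct data_buf *next;
	void *payload;
};

/* Hypothetical pool; "lock" stands in for priv->lock. */
struct buf_pool {
	atomic_flag lock;
	struct data_buf *free_head;
};

static void lock_acquire(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void lock_release(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Stand-in for data_free_buffer(): possibly slow, so called unlocked. */
static void buf_release(struct data_buf *buf)
{
	if (!buf)
		return;
	free(buf->payload);
	free(buf);
}

/* Push one freshly allocated buffer onto the free list. */
static int pool_add(struct buf_pool *pool, size_t size)
{
	struct data_buf *buf = malloc(sizeof(*buf));

	if (!buf)
		return -1;
	buf->payload = malloc(size);
	if (!buf->payload) {
		free(buf);
		return -1;
	}

	lock_acquire(&pool->lock);
	buf->next = pool->free_head;
	pool->free_head = buf;
	lock_release(&pool->lock);
	return 0;
}

/*
 * Drain the free list. The head is looked up again after every
 * re-acquisition of the lock, so a concurrent add or remove while the
 * lock was dropped cannot leave us holding a stale pointer.
 * Returns the number of buffers released.
 */
static size_t pool_drain(struct buf_pool *pool)
{
	size_t released = 0;

	lock_acquire(&pool->lock);
	while (pool->free_head) {
		struct data_buf *buf = pool->free_head;

		pool->free_head = buf->next;	/* unlink while locked */
		buf->next = NULL;

		lock_release(&pool->lock);
		buf_release(buf);		/* slow path, lock dropped */
		released++;
		lock_acquire(&pool->lock);
	}
	lock_release(&pool->lock);

	return released;
}
```

If the pool is known to be quiesced, as in the driver's disable path, the same loop works with the lock calls removed, which is the simplification the review asks for.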
> > > > +
> > > > +	spin_lock_irq(&priv->lock);
> > > > +	while (!list_empty(list)) {
> > > > +		spin_unlock_irq(&priv->lock);
> > > > +
> > > > +		ret = wait_event_interruptible(priv->wait, list_empty(list));
> > > > +		if (ret)
> > > > +			return -ERESTARTSYS;
> > > > +
> > > > +		spin_lock_irq(&priv->lock);
> > > > +	}
> > > > +	spin_unlock_irq(&priv->lock);
> > >
> > > Locking is not needed - if you disable interrupts what would put more
> > > stuff on the list?
> >
> > The locking is definitely needed. You've missed a critical piece of
> > information. There are *two* devices we are interacting with here, and
> > BOTH generate interrupts.
>
> No, I did not miss this fact. The point is that when we get to this code
> the device _putting_ items on the waiting list is stopped and we only
> need to wait for the list to drain. Nobody puts more stuff on it. You can
> check for the list_empty() condition without locking. And if someone _is_
> putting more stuff on the list - you are screwed since the list may
> become non-empty the moment you release the lock.

Ok, I understand what you mean now. You are correct, nothing else can add
things to the list. Thanks for clarifying this for me. :)

> > > > +
> > > > +static ssize_t data_num_buffers_show(struct device *dev,
> > > > +				     struct device_attribute *attr, char *buf)
> > > > +{
> > > > +	struct fpga_device *priv = dev_get_drvdata(dev);
> > > > +	unsigned int num;
> > > > +
> > > > +	spin_lock_irq(&priv->lock);
> > > > +	num = priv->num_buffers;
> > > > +	spin_unlock_irq(&priv->lock);
> > >
> > > This spin lock is pointless, priv->num_buffers might be already changed
> > > here, you can't guarantee that you show accurate data.
> >
> > Correct, I know this. I just wanted to protect the data structure at all
> > points of use in the driver.
>
> Protect from what? Integer reads are guaranteed to be complete and you
> are not concerned with missing updates as the information is obsolete the
> moment you release the lock.
>
> > Would an atomic_t be better for this, or should I just remove the
> > locking completely?
>
> Just remove the locking.

Ok.
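The argument about the num_buffers read can be illustrated with a small userspace sketch. This is not the driver's code and the names (struct dev_state, show_locked(), show_lockless()) are hypothetical. An atomic_flag spinlock stands in for priv->lock, and a C11 relaxed atomic load stands in for the plain aligned integer read that the kernel code would use. Both functions return a snapshot that may be stale as soon as the caller sees it, so the lock in the first one buys nothing.

```c
#include <stdatomic.h>

/* Hypothetical device state; only the counter matters here. */
struct dev_state {
	atomic_flag lock;		/* stand-in for priv->lock */
	atomic_uint num_buffers;	/* stand-in for priv->num_buffers */
};

static void lock_acquire(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin */
}

static void lock_release(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/*
 * Locked read: the value is complete, but it can change the moment the
 * lock is released, before the caller ever looks at it.
 */
static unsigned int show_locked(struct dev_state *s)
{
	unsigned int n;

	lock_acquire(&s->lock);
	n = atomic_load_explicit(&s->num_buffers, memory_order_relaxed);
	lock_release(&s->lock);
	return n;
}

/*
 * Lockless read: a single load is never torn, and the result is exactly
 * as fresh (or as stale) as the locked version.
 */
static unsigned int show_lockless(struct dev_state *s)
{
	return atomic_load_explicit(&s->num_buffers, memory_order_relaxed);
}
```

A lock would start to matter only if the reader needed several fields to be mutually consistent, which is not the case for a single counter exported through sysfs.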
> > > > +
> > > > +	if (mutex_lock_interruptible(&priv->mutex))
> > > > +		return -ERESTARTSYS;
> > >
> > > Why don't
> > >
> > >	error = mutex_lock_interruptible(&priv->mutex);
> > >	if (error)
> > >		return error;
> > >
> > > - do not clobber perfectly valid error codes.
> >
> > That's what the Linux Device Drivers 3rd Edition book does. See page
> > 112. I will change it to fix the return code.
>
> LDD3 is quite old by now...

I know, but it is still the best written reference I have. Reviewers like
yourself are better, but I can't look up your advice in a book. :)

I'll return the error code.

> > > > +
> > > > +static struct attribute *data_sysfs_attrs[] = {
> > > > +	&dev_attr_num_buffers.attr
[PATCH RFCv5 0/2] CARMA Board Support
Hello everyone,

This is the fifth posting of these drivers, taking into account comments
from earlier postings. I would appreciate as much review as you can offer.

RFCv4 -> RFCv5:
- remove unnecessary locking per review comments
- do not clobber return values from *_interruptible()
- explicitly track buffer DMA mapping
- use #defines instead of raw hex addresses
- change enable sysfs attribute to root-writeable only

RFCv3 -> RFCv4:
- updates for DATA-FPGA version 2

RFCv2 -> RFCv3:
- use miscdevice framework (removing the carma class)
- add bitfile readback capability to the programmer

RFCv1 -> RFCv2:
- change comments to kerneldoc format
- Kconfig improvements
- use the videobuf_dma_sg API in the programmer
- updates for Freescale DMAEngine DMA_SLAVE API changes

Information about the CARMA board:

The CARMA board is essentially an MPC8349EA MDS reference design with a
1GHz ADC and 4 high powered data processing FPGAs connected to the local
bus. It is all packed into a compact PCI form factor. It is used at the
Owens Valley Radio Observatory as the main component in the correlator
system.

For board information, see:
http://www.mmarray.org/~dwh/carma_board/index.html

For DATA-FPGA register layout, see:
http://www.mmarray.org/memos/carma_memo46.pdf

These drivers are the necessary pieces to get the data processing FPGAs
working and producing data.

Despite the fact that the hardware is custom and we are the only users,
I'd still like to get the drivers upstream. Several people have suggested
that this is possible.

Some further patches will be forthcoming. I have a driver for the LED
subsystem and the PPS subsystem. The LED register layout is expected to
change soon, so I won't post the driver until that is finished. The PPS
driver will be posted separately from this patch series; it is very
generic.

Thanks to everyone who has provided comments on earlier versions!

Ira W. Snyder (2):
  misc: add CARMA DATA-FPGA Access Driver
  misc: add CARMA DATA-FPGA Programmer support

 drivers/misc/Kconfig                    |    1 +
 drivers/misc/Makefile                   |    1 +
 drivers/misc/carma/Kconfig              |   18 +
 drivers/misc/carma/Makefile             |    2 +
 drivers/misc/carma/carma-fpga-program.c | 1084
 drivers/misc/carma/carma-fpga.c         | 1396 +++
 6 files changed, 2502 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/carma/Kconfig
 create mode 100644 drivers/misc/carma/Makefile
 create mode 100644 drivers/misc/carma/carma-fpga-program.c
 create mode 100644 drivers/misc/carma/carma-fpga.c

--
1.7.3.4
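The changelog item "do not clobber return values from *_interruptible()" can be illustrated with a short userspace sketch. It is not taken from the driver: fake_lock_interruptible(), op_clobbers() and op_propagates() are hypothetical names, and SKETCH_ERESTARTSYS only mirrors the kernel-internal ERESTARTSYS value (512), which userspace errno.h does not provide. The sketch assumes the lock helper reports an interrupted wait as -EINTR.

```c
#include <errno.h>

/* Mirrors the kernel-internal ERESTARTSYS value; for illustration only. */
#define SKETCH_ERESTARTSYS 512

/*
 * Hypothetical stand-in for an *_interruptible() lock helper: returns 0
 * on success or a negative errno. The result is passed in so the sketch
 * can simulate both outcomes.
 */
static int fake_lock_interruptible(int simulated_result)
{
	return simulated_result;
}

/* Old style: any failure is rewritten to one hard-coded error code. */
static int op_clobbers(int simulated_result)
{
	if (fake_lock_interruptible(simulated_result))
		return -SKETCH_ERESTARTSYS;

	/* ... work done while holding the lock ... */
	return 0;
}

/* Reviewed style: the helper's own error code reaches the caller. */
static int op_propagates(int simulated_result)
{
	int error;

	error = fake_lock_interruptible(simulated_result);
	if (error)
		return error;

	/* ... work done while holding the lock ... */
	return 0;
}
```

With a simulated -EINTR, op_clobbers() reports -512 while op_propagates() reports -EINTR, so only the second form lets the caller see what the helper actually returned.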
[PATCH 1/2] misc: add CARMA DATA-FPGA Access Driver
This driver allows userspace to access the data processing FPGAs on the
OVRO CARMA board. It has two modes of operation:

1) random access

This allows users to poke any DATA-FPGA registers by using mmap to map
the address region directly into their memory map.

2) correlation dumping

When correlating, the DATA-FPGA's have special requirements for getting
the data out of their memory before the next correlation. This nominally
happens at 64Hz (every 15.625ms). If the data is not dumped before the
next correlation, data is lost.

The data dumping driver handles buffering up to 1 second worth of
correlation data from the FPGAs. This lowers the realtime scheduling
requirements for the userspace process reading the device.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/misc/Kconfig            |    1 +
 drivers/misc/Makefile           |    1 +
 drivers/misc/carma/Kconfig      |    9 +
 drivers/misc/carma/Makefile     |    1 +
 drivers/misc/carma/carma-fpga.c | 1396 +++
 5 files changed, 1408 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/carma/Kconfig
 create mode 100644 drivers/misc/carma/Makefile
 create mode 100644 drivers/misc/carma/carma-fpga.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index cc8e49d..93cf1e6 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -457,5 +457,6 @@ source "drivers/misc/eeprom/Kconfig"
 source "drivers/misc/cb710/Kconfig"
 source "drivers/misc/iwmc3200top/Kconfig"
 source "drivers/misc/ti-st/Kconfig"
+source "drivers/misc/carma/Kconfig"

 endif # MISC_DEVICES
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 98009cc..2c1610e 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -42,3 +42,4 @@ obj-$(CONFIG_ARM_CHARLCD)	+= arm-charlcd.o
 obj-$(CONFIG_PCH_PHUB)		+= pch_phub.o
 obj-y				+= ti-st/
 obj-$(CONFIG_AB8500_PWM)	+= ab8500-pwm.o
+obj-y				+= carma/
diff --git a/drivers/misc/carma/Kconfig b/drivers/misc/carma/Kconfig
new file mode 100644
index 000..4be183f
--- /dev/null
+++ b/drivers/misc/carma/Kconfig
@@ -0,0 +1,9 @@
+config CARMA_FPGA
+	tristate "CARMA DATA-FPGA Access Driver"
+	depends on FSL_SOC && PPC_83xx && MEDIA_SUPPORT && HAS_DMA && FSL_DMA
+	select VIDEOBUF_DMA_SG
+	default n
+	help
+	  Say Y here to include support for communicating with the data
+	  processing FPGAs on the OVRO CARMA board.
+
diff --git a/drivers/misc/carma/Makefile b/drivers/misc/carma/Makefile
new file mode 100644
index 000..0b69fa7
--- /dev/null
+++ b/drivers/misc/carma/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_CARMA_FPGA) += carma-fpga.o
diff --git a/drivers/misc/carma/carma-fpga.c b/drivers/misc/carma/carma-fpga.c
new file mode 100644
index 000..4ea473a
--- /dev/null
+++ b/drivers/misc/carma/carma-fpga.c
@@ -0,0 +1,1396 @@
+/*
+ * CARMA DATA-FPGA Access Driver
+ *
+ * Copyright (c) 2009-2011 Ira W. Snyder i...@ovro.caltech.edu
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+/*
+ * FPGA Memory Dump Format
+ *
+ * FPGA #0 control registers (32 x 32-bit words)
+ * FPGA #1 control registers (32 x 32-bit words)
+ * FPGA #2 control registers (32 x 32-bit words)
+ * FPGA #3 control registers (32 x 32-bit words)
+ * SYSFPGA control registers (32 x 32-bit words)
+ * FPGA #0 correlation array (NUM_CORL0 correlation blocks)
+ * FPGA #1 correlation array (NUM_CORL1 correlation blocks)
+ * FPGA #2 correlation array (NUM_CORL2 correlation blocks)
+ * FPGA #3 correlation array (NUM_CORL3 correlation blocks)
+ *
+ * Each correlation array consists of:
+ *
+ * Correlation Data      (2 x NUM_LAGSn x 32-bit words)
+ * Pipeline Metadata     (2 x NUM_METAn x 32-bit words)
+ * Quantization Counters (2 x NUM_QCNTn x 32-bit words)
+ *
+ * The NUM_CORLn, NUM_LAGSn, NUM_METAn, and NUM_QCNTn values come from
+ * the FPGA configuration registers. They do not change once the FPGA's
+ * have been programmed, they only change on re-programming.
+ */
+
+/*
+ * Basic Description:
+ *
+ * This driver is used to capture correlation spectra off of the four data
+ * processing FPGAs. The FPGAs are often reprogrammed at runtime, therefore
+ * this driver supports dynamic enable/disable of capture while the device
+ * remains open.
+ *
+ * The nominal capture rate is 64Hz (every 15.625ms). To facilitate this fast
+ * capture rate, all buffers are pre-allocated to avoid any potentially long
+ * running memory allocations while capturing.
+ *
+ * There are three lists which are used to keep track of the different states
+ * of data buffers.
+ *
+ * 1) free list
+ * This list holds all empty data buffers which are ready to receive data.
+ *
+ * 2) inflight list
+ * This list holds data