Re: Ethernet over PCIe driver for Inter-Processor Communication

2013-08-22 Thread Ira W. Snyder
On Thu, Aug 22, 2013 at 02:43:38PM -0700, David Hawkins wrote:
 Hi S.Saravanan,
 
  I have a custom board with four MPC8640 nodes connected over
  a transparent PCI Express switch. In this configuration one node is
  configured as host (Root Complex) and the others as agents (End Point).
  Thus the legacy PCI software works fine. However, the mainline kernel
  lacks any standard support for inter-processor communication over PCI.
  I am in the process of developing an Ethernet over PCI driver for the
  same, along the lines of rionet. However, I am facing the following
  problems.
 
  a)   I can generate MSI interrupts from End Point to Root Complex over
  PCI, but the reverse is not possible. However, I need a method to
  interrupt the End Point from the Root Complex to complete my driver.
 
 Root complexes would normally interrupt a device via a PCIe write
 to a register in a BAR on the end-point (or in extended configuration
 space registers depending on the hardware implementation).
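
As a rough illustration of the BAR-register approach described above, the sketch below models an endpoint doorbell register in plain user-space C. The `struct ep_bar_regs` layout, the register names, and both helper functions are hypothetical and are not taken from the MPC8640/MPC8641D manual. On real hardware the root complex would write through a mapped BAR and the write itself would raise an interrupt on the endpoint.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block exposed by the endpoint through a BAR. */
struct ep_bar_regs {
	volatile uint32_t doorbell;	/* written by the root complex */
	volatile uint32_t status;	/* placeholder for other registers */
};

/*
 * Root complex side: signal one event with a single 32-bit write.
 * This is a plain write, not a read-modify-write, so it replaces
 * whatever value was in the register before.
 */
static void rc_ring_doorbell(struct ep_bar_regs *bar, unsigned int bit)
{
	assert(bit < 32);
	bar->doorbell = UINT32_C(1) << bit;
}

/* Endpoint side: read the pending bits and clear them, as an ISR would. */
static uint32_t ep_take_doorbell(struct ep_bar_regs *bar)
{
	uint32_t pending = bar->doorbell;

	bar->doorbell = 0;
	return pending;
}
```

In this model the endpoint handler calls `ep_take_doorbell()` once per interrupt and dispatches on the returned bits; a second call with no new write returns zero.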
 
  The only previous reference I can find is this post:
  http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg25765.html
  However, it uses doorbells, which I think may not be possible on the MPC8640.
 
 PCIe drivers need some way to interrupt the processor, so there must
 be an option somewhere ... for example, what are the message register
 interrupts intended for? See p479
 
 http://cache.freescale.com/files/32bit/doc/ref_manual/MPC8641DRM.pdf
 
 (Ira and I have not used the MPC8640, so we are not familiar with
 its user manual.)
 
  Any pointers on this issue and guidance on this driver development would
  be helpful .
 
 We use the Ethernet-over-PCI driver that Ira developed. Our next boards
 will use an MPC8308, but we don't currently have any in a PCIe device
 form-factor (just the MPC8308RDB), so he has not ported it to PCIe.
 
 Feel free to discuss your ideas for your PCIe driver (e.g., why start
 with rionet rather than Ira's driver), either on-list, or email Ira
 and me directly.
 

One further note. You might want to look at rproc/rpmsg and their virtio
driver support. That seems to be where the Linux world is moving for
inter-processor communications. See for example the ARM CPUs interfacing
with DSPs.

Ira
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: recommended method of netbooting kernel/dtb in u-boot?

2013-04-11 Thread Ira W. Snyder
On Thu, Apr 11, 2013 at 12:39:00PM -0600, Chris Friesen wrote:
 On 04/11/2013 12:12 PM, Kumar Gala wrote:
 
  On Apr 11, 2013, at 10:44 AM, Chris Friesen wrote:
 
 
  Hi all,
 
  We've got a powerpc system that uses u-boot.  In our environment on
  bootup u-boot does a DHCP to get networking info, then uses TFTP to
  get the kernel, which then does DHCP again and NFS-mounts the
  initial root filesystem.
 
  What's the standard practice for this sort of thing when using
  device tree blobs?  Do most people use multi-file images or do they
  TFTP scripts to load and execute separate kernel/dtb files?
 
  We've normally just done multiple tftp fetches and one grabs dtb and
  one grabs kernel.
 
 Do you hardcode the path to the file in the firmware?  Or do you upload 
 a script that knows the path to the file?
 
 In our case the path to the boot file(s) depends on which slot the card 
 being booted has been inserted in.  The DHCP server knows what the path 
 is, so it can set dhcpd.conf appropriately, but we need to get that 
 information to the firmware on the card being booted.
 

Hello Chris,

I use a hardware setup which sounds similar to yours. The DHCP server
controls which file is sent to each card.

I use the FIT image format to combine a kernel, dtb, and initrd in one
package. Then the U-Boot command dhcp $address sets up the network and
tftp's the filename sent by the DHCP server. You don't need to invoke
the U-Boot command tftp if you only have one image. dhcp is enough.

I used the U-Boot doc/uImage.FIT/*.its examples to get started, and
wrote my own custom .its file for my board. I don't use anything other
than the vmlinux.bin.gz provided by the kernel build.

Hope it helps,
Ira


Re: [PATCH v7 0/8] Raid: enable talitos xor offload for improving performance

2012-08-09 Thread Ira W. Snyder
On Thu, Aug 09, 2012 at 04:19:35PM +0800, qiang@freescale.com wrote:
 Hi all,
 
 The following 8 patches enable fsl-dma and talitos to offload RAID
 operations, improving RAID performance and balancing CPU load.
 
 These patches cover the talitos, fsl-dma and carma modules (carma uses
 some features of fsl-dma).
 
 Write performance improves by 25-30% as tested by iozone.
 Write performance improves by about 2% after replacing
 spin_lock_irqsave with spin_lock_bh.
 CPU load is reduced by 8%.
 
 fwiw, I gave v5 a test-drive, setting up a RAID5 array on ramdisks
 [1], and this patch series, along with FSL_DMA && NET_DMA set, seems
 to be holding water, so this series gets my:
 
 Tested-by: Kim Phillips kim.phill...@freescale.com
 

The fsldma parts of the series all look great to me.

Thanks,
Ira

 [1] mdadm --create --verbose --force /dev/md0 --level=raid5 --raid-devices=4 \
   /dev/ram[0123]
 
 Changes in v7:
   - add test result which is provided by Kim Phillips;
   - correct one coding style issue in patch 5/8;
   - add comments by Arnd Bergmann in patch 6/8;
 
 Changes in v6:
   - swap the order of original patch 3/6 and 4/6;
   - merge Ira's patch to reduce the size of original patch;
   - merge Ira's patch of carma in 8/8;
   - update documents and descriptions according to Ira's advice;
 
 Changes in v5:
   - add detail description in patch 3/6 about the process of completed
   descriptor, the process is in align with fsl-dma Reference Manual,
   illustrate the potential risk and how to reproduce it;
   - drop the patch 7/7 in v4 according to Timur's comments;
 
 Changes in v4:
   - fix an error in talitos when dest addr is same with src addr, dest
   should be freed only one time if src is same with dest addr;
   - correct coding style in fsl-dma according to Ira's comments;
   - fix a race condition in fsl-dma fsl_tx_status(), remove the interface
   which is used to free descriptors in queue ld_completed, this interface
   has been included in fsldma_cleanup_descriptor(), in v3, there is one
   place missed spin_lock protect;
   - split the original patch 3/4 up to 2 patches 3/7 and 4/7 according to
   Li Yang's comments;
   - fix a warning of uninitialized cookie;
   - add memory copy self test in fsl-dma;
   - add more detail description about use spin_lock_bh() to instead of
   spin_lock_irqsave() according to Timur's comments.
 
 Changes in v3:
   - change release process of fsl-dma descriptor for resolve the
   potential race condition;
   - add test result when use spin_lock_bh replace spin_lock_irqsave;
   - modify the benchmark results according to the latest patch.
 
 Changes in v2:
   - rebase onto cryptodev tree;
   - split the patch 3/4 up to 3 independent patches;
   - remove the patch 4/4, the fix is not for cryptodev tree;
 
 Qiang Liu (8):
   Talitos: Support for async_tx XOR offload
   fsl-dma: remove attribute DMA_INTERRUPT of dmaengine
   fsl-dma: add fsl_dma_free_descriptor() to reduce code duplication
   fsl-dma: move functions to avoid forward declarations
   fsl-dma: change release process of dma descriptor for supporting async_tx
   fsl-dma: use spin_lock_bh to instead of spin_lock_irqsave
   fsl-dma: fix a warning of unitialized cookie
   carma: remove unnecessary DMA_INTERRUPT capability
 
  drivers/crypto/Kconfig                  |    9 +
  drivers/crypto/talitos.c                |  413 ++
  drivers/crypto/talitos.h                |   53
  drivers/dma/fsldma.c                    |  488 +--
  drivers/dma/fsldma.h                    |   17 +-
  drivers/misc/carma/carma-fpga-program.c |    1 -
  drivers/misc/carma/carma-fpga.c         |    2 +-
  7 files changed, 761 insertions(+), 222 deletions(-)
 


Re: [PATCH v6 5/8] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-08-06 Thread Ira W. Snyder
On Mon, Aug 06, 2012 at 06:14:33PM +0800, qiang@freescale.com wrote:
 From: Qiang Liu qiang@freescale.com
 
 Fix a potential risk when NET_DMA and ASYNC_TX are both enabled.
 The current release process for DMA descriptors lacks async_tx support:
 all descriptors are released whether or not async_tx has acked them,
 so there is a potential race condition when the DMA engine is used by
 other clients (e.g. when NET_DMA is enabled to offload TCP).
 
 In our case, the race condition arises when both talitos and dmaengine
 are used to offload xor: the napi scheduler syncs all pending requests
 in the DMA channels, which disrupts RAID operations because ack_tx is
 not checked in fsl-dma. A descriptor that was just submitted and has
 not been acked is freed; as a dependent tx, this freed descriptor
 triggers BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
 
 TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
 GPR00: 0001 ecf41ca0 ee1a94a0 003f 0001 c00593e4  0001
 GPR08:  a7a7a7a7 0001 0002 42028042 100a38d4 ed576d98
 GPR16: ed5a11b0  2b162000 0200 0000 2d555000 ed3015e8 c15a7aa0
 GPR24:  c155fc40  ecb63220 ecf41d28 ef640bb0 ef640c30 ecf41ca0
 NIP [c02b048c] async_tx_submit+0x6c/0x2b4
 LR [c02b068c] async_tx_submit+0x26c/0x2b4
 Call Trace:
 [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b4 (unreliable)
 [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
 [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
 [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
 [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
 [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
 [ecf41f40] [c04329b8] md_thread+0x138/0x16c
 [ecf41f90] [c008277c] kthread+0x8c/0x90
 [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
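
The rule the patch description argues for can be sketched in isolation: a completed descriptor is released only once its client has acked it. This is a simplified user-space model, not the driver code. `struct desc`, its fields, and `clean_completed()` are hypothetical stand-ins for the driver's `fsl_desc_sw`, the `async_tx_test_ack()` check, and the `ld_completed` list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a software DMA descriptor. */
struct desc {
	bool acked;	/* client has acknowledged the transaction */
	bool freed;	/* descriptor has been returned to the pool */
};

/*
 * Release every completed descriptor that has been acked and leave the
 * rest alone, so that a dependent operation can still reference them.
 * Returns the number of descriptors released by this call.
 */
static size_t clean_completed(struct desc *completed, size_t n)
{
	size_t released = 0;

	assert(completed != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (completed[i].acked && !completed[i].freed) {
			completed[i].freed = true;
			released++;
		}
	}
	return released;
}
```

An un-acked descriptor simply stays on the list until a later pass finds it acked, which is what keeps a dependent submission from tripping over freed memory.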
 
 Another modification in this patch is the handling of completed
 descriptors. There is a potential risk caused by exception interrupts:
 all descriptors in the ld_running list are treated as completed when an
 interrupt is raised. This works fine under normal conditions, but if an
 exception occurs, it does not work as expected. Hardware state should
 not be inferred from the s/w list; the right way is to read the current
 descriptor address register to find the last completed descriptor. If
 an interrupt is raised by an error, the descriptors in ld_running must
 not all be treated as finished, or the unfinished descriptors in
 ld_running will be released wrongly.
 
 A simple way to reproduce:
 Enable dmatest first, then insert some bad descriptors that trigger
 Programming Error interrupts ahead of the good descriptors. The good
 descriptors will then be freed before they are processed because of
 the exception interrupt.
 
 Note: the bad descriptors are only for simulating an exception interrupt.
 This case can illustrate the potential risk in current fsl-dma very well.
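
The idea of reading the current descriptor address register, rather than trusting the software list, can be modelled outside the kernel. The sketch below is a simplified user-space illustration, not the driver code: `count_completed()` and its parameters are hypothetical, with `curr_phys` standing in for the value read from the current descriptor address register and `idle` for the channel's idle state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many descriptors at the head of the running list the
 * hardware has finished.
 *
 * phys[]     bus addresses of the queued descriptors, in submission order
 * curr_phys  address of the descriptor currently loaded in the channel
 * idle       true if the channel has stopped
 */
static size_t count_completed(const uint32_t *phys, size_t n,
			      uint32_t curr_phys, bool idle)
{
	size_t done = 0;

	assert(phys != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (phys[i] == curr_phys) {
			/* The current descriptor counts only if the channel is idle. */
			if (idle)
				done++;
			break;
		}
		/* Everything queued before the current descriptor is finished. */
		done++;
	}
	return done;
}
```

Descriptors after the current one are never counted, so an interrupt that arrives mid-chain cannot cause them to be released early.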
 
 Cc: Dan Williams dan.j.willi...@intel.com
 Cc: Dan Williams dan.j.willi...@gmail.com
 Cc: Vinod Koul vinod.k...@intel.com
 Cc: Li Yang le...@freescale.com
 Signed-off-by: Qiang Liu qiang@freescale.com
 Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu

There are two minor nitpicks below. Other than that, the patch looks
excellent to me.

Ira

 ---
  drivers/dma/fsldma.c |  232 ++
  drivers/dma/fsldma.h |   17 +++-
  2 files changed, 174 insertions(+), 75 deletions(-)
 
 diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
 index 36490a3..938d8c1 100644
 --- a/drivers/dma/fsldma.c
 +++ b/drivers/dma/fsldma.c
 @@ -472,6 +472,110 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor(struct fsldma_chan *chan)
  }
 
  /**
 + * fsldma_clean_completed_descriptor - free all descriptors which
 + * has been completed and acked
 + * @chan: Freescale DMA channel
 + *
 + * This function is used on all completed and acked descriptors.
 + * All descriptors should only be freed in this function.
 + */
 +static void
 +fsldma_clean_completed_descriptor(struct fsldma_chan *chan)
 +{
 +	struct fsl_desc_sw *desc, *_desc;
 +
 +	/* Run the callback for each descriptor, in order */
 +	list_for_each_entry_safe(desc, _desc, &chan->ld_completed, node)
 +		if (async_tx_test_ack(&desc->async_tx))
 +			fsl_dma_free_descriptor(chan, desc);
 +}
 +
 +/**
 + * fsldma_run_tx_complete_actions - cleanup a single link descriptor
 + * @chan: Freescale DMA channel
 + * @desc: descriptor to cleanup and free
 + * @cookie: Freescale DMA transaction identifier
 + *
 + * This function is used on a descriptor which has been executed by the DMA
 + * controller. It will run any callbacks, submit any dependencies.
 + */
 +static dma_cookie_t fsldma_run_tx_complete_actions(struct fsldma_chan *chan,
 +		struct fsl_desc_sw *desc, dma_cookie_t cookie)
 +{
 +	struct dma_async_tx_descriptor *txd = &desc->async_tx;
 +	struct device *dev = chan->common.device->dev

Re: [PATCH v5 3/6] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-08-02 Thread Ira W. Snyder
On Thu, Aug 02, 2012 at 07:21:51AM +, Liu Qiang-B32616 wrote:
  -Original Message-
  From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
  Sent: Thursday, August 02, 2012 1:25 AM
  To: Liu Qiang-B32616
  Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
  ker...@vger.kernel.org; dan.j.willi...@gmail.com; Vinod Koul;
  herb...@gondor.hengli.com.au; Dan Williams; da...@davemloft.net
  Subject: Re: [PATCH v5 3/6] fsl-dma: change release process of dma
  descriptor for supporting async_tx
  
  On Wed, Aug 01, 2012 at 04:49:17PM +0800, qiang@freescale.com wrote:
   From: Qiang Liu qiang@freescale.com
  
   Fix a potential risk when NET_DMA and ASYNC_TX are both enabled.
   The current release process for DMA descriptors lacks async_tx
   support: all descriptors are released whether or not async_tx has
   acked them, so there is a potential race condition when the DMA
   engine is used by other clients (e.g. when NET_DMA is enabled to
   offload TCP).
  
   In our case, the race condition arises when both talitos and
   dmaengine are used to offload xor: the napi scheduler syncs all
   pending requests in the DMA channels, which disrupts RAID operations
   because ack_tx is not checked in fsl-dma. A descriptor that was just
   submitted and has not been acked is freed; as a dependent tx, this
   freed descriptor triggers
   BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
  
   TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
   GPR00: 0001 ecf41ca0 ee1a94a0 003f 0001 c00593e4  0001
   GPR08:  a7a7a7a7 0001 0002 42028042 100a38d4 ed576d98
   GPR16: ed5a11b0  2b162000 0200 0000 2d555000 ed3015e8 c15a7aa0
   GPR24:  c155fc40  ecb63220 ecf41d28 ef640bb0 ef640c30 ecf41ca0
   NIP [c02b048c] async_tx_submit+0x6c/0x2b4
   LR [c02b068c] async_tx_submit+0x26c/0x2b4
   Call Trace:
   [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b4 (unreliable)
   [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
   [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
   [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
   [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
   [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
   [ecf41f40] [c04329b8] md_thread+0x138/0x16c
   [ecf41f90] [c008277c] kthread+0x8c/0x90
   [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
  
   Another major modification in this patch is the handling of completed
   descriptors. There is a potential risk caused by exception
   interrupts: all descriptors in the ld_running list are treated as
   completed when an interrupt is raised. This works fine under normal
   conditions, but if an exception occurs, it does not work as expected.
   Hardware state should not be inferred from the s/w list; the right
   way is to read the current descriptor address register to find the
   last completed descriptor. If an interrupt is raised by an error, the
   descriptors in ld_running must not all be treated as finished, or the
   unfinished descriptors in ld_running will be released wrongly.
  
   A simple way to reproduce:
   Enable dmatest first, then insert some bad descriptors that trigger
   Programming Error interrupts ahead of the good descriptors. The good
   descriptors will then be freed before they are processed because of
   the exception interrupt.
  
   Note: the bad descriptors are only for simulating an exception
   interrupt. This case illustrates the potential risk in the current
   fsl-dma very well.
  
  
  I've never managed to trigger a PE (programming error) interrupt on the
  83xx hardware. Any time I intentionally caused an error, the hardware
  wedged itself. The CB (channel busy) bit is stuck high, and cannot be
  cleared without a hard reset of the board.
 Sorry, the exception can indeed occur. In fact, the DMA_INTERRUPT
 capability reproduces the issue under certain conditions. It triggers
 an exception because of a bad descriptor (length is zero, src and dst
 are zero; with a chain a->b->c->bada->badb->d, we cannot tell in the
 interrupt tasklet which descriptor has really finished).
 So we had better consider this case.
 
 BTW, I have already found your patch that resolves the DMA lock issue:
 http://lkml.indiana.edu/hypermail/linux/kernel/1103.0/01519.html
 

Ok. I haven't tested bad descriptors in several years. I agree
that it can happen, so we should fix it.

  
  I agree the snoop on the hardware technique works. As far as I can tell,
  you have implemented the code correctly.
  
  The MPC8349EARM.pdf from Freescale indicates that the hardware will halt
  in response to a programming error, and generate a PE interrupt. See
  section 12.5.3.3 (pg 568).
  
  The driver, as it is written, will never recover from such a condition.
  Since you are complaining about this situation, do you intend to fix it?
 Frankly, I don't think your patch really can resolve the issue. Now, I 
 understand what

Re: [PATCH v5 4/6] fsl-dma: move the function ahead of its invoke function

2012-08-01 Thread Ira W. Snyder
On Wed, Aug 01, 2012 at 04:49:43PM +0800, qiang@freescale.com wrote:
 From: Qiang Liu qiang@freescale.com
 
 Move the functions fsldma_cleanup_descriptor() and fsl_chan_xfer_ld_queue()
 ahead of their callers to avoid forward declarations.
 
 Cc: Dan Williams dan.j.willi...@intel.com
 Cc: Vinod Koul vinod.k...@intel.com
 Cc: Li Yang le...@freescale.com
 Signed-off-by: Qiang Liu qiang@freescale.com
 ---
  drivers/dma/fsldma.c |  252 +-
  1 files changed, 124 insertions(+), 128 deletions(-)
 
 diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
 index 87f52c0..bb883c0 100644
 --- a/drivers/dma/fsldma.c
 +++ b/drivers/dma/fsldma.c
 @@ -400,9 +400,6 @@ out_splice:
  	list_splice_tail_init(&desc->tx_list, &chan->ld_pending);
  }
 
 -static void fsldma_cleanup_descriptor(struct fsldma_chan *chan);
 -static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan);
 -

Please swap the order of this patch (patch 4/6) and the previous patch
(patch 3/6).

You added these lines in patch 3/6 and deleted them here. If you
reverse the order of the patches, this doesn't happen.

Adding lines only to delete them in the next patch should be avoided.

  /**
   * fsldma_clean_completed_descriptor - free all descriptors which
   * has been completed and acked
 @@ -519,6 +516,130 @@ fsldma_clean_running_descriptor(struct fsldma_chan *chan,
   return 0;
  }
 
 +/**
 + * fsl_chan_xfer_ld_queue - transfer any pending transactions
 + * @chan : Freescale DMA channel
 + *
 + * HARDWARE STATE: idle
 + * LOCKING: must hold chan->desc_lock
 + */
 +static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 +{
 +	struct fsl_desc_sw *desc;
 +
 +	/*
 +	 * If the list of pending descriptors is empty, then we
 +	 * don't need to do any work at all
 +	 */
 +	if (list_empty(&chan->ld_pending)) {
 +		chan_dbg(chan, "no pending LDs\n");
 +		return;
 +	}
 +
 +	/*
 +	 * The DMA controller is not idle, which means that the interrupt
 +	 * handler will start any queued transactions when it runs after
 +	 * this transaction finishes
 +	 */
 +	if (!chan->idle) {
 +		chan_dbg(chan, "DMA controller still busy\n");
 +		return;
 +	}
 +
 +	/*
 +	 * If there are some link descriptors which have not been
 +	 * transferred, we need to start the controller
 +	 */
 +
 +	/*
 +	 * Move all elements from the queue of pending transactions
 +	 * onto the list of running transactions
 +	 */
 +	chan_dbg(chan, "idle, starting controller\n");
 +	desc = list_first_entry(&chan->ld_pending, struct fsl_desc_sw, node);
 +	list_splice_tail_init(&chan->ld_pending, &chan->ld_running);
 +
 +	/*
 +	 * The 85xx DMA controller doesn't clear the channel start bit
 +	 * automatically at the end of a transfer. Therefore we must clear
 +	 * it in software before starting the transfer.
 +	 */
 +	if ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) {
 +		u32 mode;
 +
 +		mode = DMA_IN(chan, &chan->regs->mr, 32);
 +		mode &= ~FSL_DMA_MR_CS;
 +		DMA_OUT(chan, &chan->regs->mr, mode, 32);
 +	}
 +
 +	/*
 +	 * Program the descriptor's address into the DMA controller,
 +	 * then start the DMA transaction
 +	 */
 +	set_cdar(chan, desc->async_tx.phys);
 +	get_cdar(chan);
 +
 +	dma_start(chan);
 +	chan->idle = false;
 +}
 +
 +/**
 + * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
 + * @chan: Freescale DMA channel
 + * @desc: descriptor to cleanup and free
 + *
 + * This function is used on a descriptor which has been executed by the DMA
 + * controller. It will run any callbacks, submit any dependencies, and then
 + * free the descriptor.
 + */
 +static void fsldma_cleanup_descriptor(struct fsldma_chan *chan)
 +{
 +	struct fsl_desc_sw *desc, *_desc;
 +	dma_cookie_t cookie = 0;
 +	dma_addr_t curr_phys = get_cdar(chan);
 +	int idle = dma_is_idle(chan);
 +	int seen_current = 0;
 +
 +	fsldma_clean_completed_descriptor(chan);
 +
 +	/* Run the callback for each descriptor, in order */
 +	list_for_each_entry_safe(desc, _desc, &chan->ld_running, node) {
 +		/*
 +		 * do not advance past the current descriptor loaded into the
 +		 * hardware channel, subsequent descriptors are either in
 +		 * process or have not been submitted
 +		 */
 +		if (seen_current)
 +			break;
 +
 +		/*
 +		 * stop the search if we reach the current descriptor and the
 +		 * channel is busy
 +		 */
 +		if (desc->async_tx.phys == curr_phys) {
 +			seen_current = 1;
 +			if (!idle)
 +				break;
 +		}
 +
 + cookie = fsldma_run_tx_complete_actions(desc, 

Re: [PATCH v5 2/6] fsl-dma: remove attribute DMA_INTERRUPT of dmaengine

2012-08-01 Thread Ira W. Snyder
On Wed, Aug 01, 2012 at 04:49:08PM +0800, qiang@freescale.com wrote:
 From: Qiang Liu qiang@freescale.com
 
 Remove the DMA_INTERRUPT capability because fsl-dma does not support it;
 an exception is thrown if talitos is used to offload xor at the same time.
 

I have no problem with this patch.

However, it ***WILL BREAK*** both drivers in drivers/misc/carma. Please
add my patch 7/7, titled "[PATCH 7/7] carma: remove unnecessary
DMA_INTERRUPT capability", to your series. I suggest placing it
immediately after this patch in your series.

The carma drivers use the fsldma driver exclusively.
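
The breakage comes from how capability matching works: a client asks for a channel that advertises every capability in its mask, so dropping one advertised bit makes the request fail. The sketch below models that check in user-space C. The `CAP_*` names and `channel_matches()` are hypothetical stand-ins for the dmaengine capability mask, and the sketch assumes the carma drivers include DMA_INTERRUPT in the mask they request, as the title of the carma patch suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits, modelled on the dmaengine DMA_* names. */
enum {
	CAP_MEMCPY    = 1u << 0,
	CAP_INTERRUPT = 1u << 1,
	CAP_SG        = 1u << 2,
	CAP_SLAVE     = 1u << 3,
};

/* A channel satisfies a request only if it advertises every requested bit. */
static bool channel_matches(uint32_t advertised, uint32_t requested)
{
	/* An empty request would match anything, so treat it as a caller bug. */
	assert(requested != 0);
	return (advertised & requested) == requested;
}
```

With DMA_INTERRUPT removed from the advertised set, a client that still requests it finds no channel at all, while a client that drops the bit from its own mask keeps working. That is why the carma change has to travel with this patch.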

 Cc: Dan Williams dan.j.willi...@intel.com
 Cc: Vinod Koul vinod.k...@intel.com
 Cc: Li Yang le...@freescale.com
 Signed-off-by: Qiang Liu qiang@freescale.com
 Acked-by: Ira W. Snyder i...@ovro.caltech.edu
 ---
  drivers/dma/fsldma.c |   31 ---
  1 files changed, 0 insertions(+), 31 deletions(-)
 
 diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
 index 8f84761..4f2f212 100644
 --- a/drivers/dma/fsldma.c
 +++ b/drivers/dma/fsldma.c
 @@ -543,35 +543,6 @@ static void fsl_dma_free_chan_resources(struct dma_chan *dchan)
  }
 
  static struct dma_async_tx_descriptor *
 -fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags)
 -{
 -	struct fsldma_chan *chan;
 -	struct fsl_desc_sw *new;
 -
 -	if (!dchan)
 -		return NULL;
 -
 -	chan = to_fsl_chan(dchan);
 -
 -	new = fsl_dma_alloc_descriptor(chan);
 -	if (!new) {
 -		chan_err(chan, "%s\n", msg_ld_oom);
 -		return NULL;
 -	}
 -
 -	new->async_tx.cookie = -EBUSY;
 -	new->async_tx.flags = flags;
 -
 -	/* Insert the link descriptor to the LD ring */
 -	list_add_tail(&new->node, &new->tx_list);
 -
 -	/* Set End-of-link to the last link descriptor of new list */
 -	set_ld_eol(chan, new);
 -
 -	return &new->async_tx;
 -}
 -
 -static struct dma_async_tx_descriptor *
  fsl_dma_prep_memcpy(struct dma_chan *dchan,
   dma_addr_t dma_dst, dma_addr_t dma_src,
   size_t len, unsigned long flags)
 @@ -1352,12 +1323,10 @@ static int __devinit fsldma_of_probe(struct platform_device *op)
  	fdev->irq = irq_of_parse_and_map(op->dev.of_node, 0);
 
  	dma_cap_set(DMA_MEMCPY, fdev->common.cap_mask);
 -	dma_cap_set(DMA_INTERRUPT, fdev->common.cap_mask);
  	dma_cap_set(DMA_SG, fdev->common.cap_mask);
  	dma_cap_set(DMA_SLAVE, fdev->common.cap_mask);
  	fdev->common.device_alloc_chan_resources = fsl_dma_alloc_chan_resources;
  	fdev->common.device_free_chan_resources = fsl_dma_free_chan_resources;
 -	fdev->common.device_prep_dma_interrupt = fsl_dma_prep_interrupt;
  	fdev->common.device_prep_dma_memcpy = fsl_dma_prep_memcpy;
  	fdev->common.device_prep_dma_sg = fsl_dma_prep_sg;
  	fdev->common.device_tx_status = fsl_tx_status;
 --
 1.7.5.1
 
 


Re: [PATCH v5 3/6] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-08-01 Thread Ira W. Snyder
On Wed, Aug 01, 2012 at 04:49:17PM +0800, qiang@freescale.com wrote:
 From: Qiang Liu qiang@freescale.com
 
 Fix a potential risk when NET_DMA and ASYNC_TX are both enabled.
 The current release process for DMA descriptors lacks async_tx support:
 all descriptors are released whether or not async_tx has acked them,
 so there is a potential race condition when the DMA engine is used by
 other clients (e.g. when NET_DMA is enabled to offload TCP).
 
 In our case, the race condition arises when both talitos and dmaengine
 are used to offload xor: the napi scheduler syncs all pending requests
 in the DMA channels, which disrupts RAID operations because ack_tx is
 not checked in fsl-dma. A descriptor that was just submitted and has
 not been acked is freed; as a dependent tx, this freed descriptor
 triggers BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
 
 TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
 GPR00: 0001 ecf41ca0 ee1a94a0 003f 0001 c00593e4  0001
 GPR08:  a7a7a7a7 0001 0002 42028042 100a38d4 ed576d98
 GPR16: ed5a11b0  2b162000 0200 0000 2d555000 ed3015e8 c15a7aa0
 GPR24:  c155fc40  ecb63220 ecf41d28 ef640bb0 ef640c30 ecf41ca0
 NIP [c02b048c] async_tx_submit+0x6c/0x2b4
 LR [c02b068c] async_tx_submit+0x26c/0x2b4
 Call Trace:
 [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b4 (unreliable)
 [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
 [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
 [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
 [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
 [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
 [ecf41f40] [c04329b8] md_thread+0x138/0x16c
 [ecf41f90] [c008277c] kthread+0x8c/0x90
 [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
 
 Another major modification in this patch is the handling of completed
 descriptors. There is a potential risk caused by exception interrupts:
 all descriptors in the ld_running list are treated as completed when an
 interrupt is raised. This works fine under normal conditions, but if an
 exception occurs, it does not work as expected. Hardware state should
 not be inferred from the s/w list; the right way is to read the current
 descriptor address register to find the last completed descriptor. If
 an interrupt is raised by an error, the descriptors in ld_running must
 not all be treated as finished, or the unfinished descriptors in
 ld_running will be released wrongly.
 
 A simple way to reproduce:
 Enable dmatest first, then insert some bad descriptors that trigger
 Programming Error interrupts ahead of the good descriptors. The good
 descriptors will then be freed before they are processed because of
 the exception interrupt.
 
 Note: the bad descriptors are only for simulating an exception interrupt.
 This case can illustrate the potential risk in current fsl-dma very well.
 

I've never managed to trigger a PE (programming error) interrupt on the
83xx hardware. Any time I intentionally caused an error, the hardware
wedged itself. The CB (channel busy) bit is stuck high, and cannot be
cleared without a hard reset of the board.

I agree the snoop on the hardware technique works. As far as I can
tell, you have implemented the code correctly.

The MPC8349EARM.pdf from Freescale indicates that the hardware will halt
in response to a programming error, and generate a PE interrupt. See
section 12.5.3.3 (pg 568).

The driver, as it is written, will never recover from such a condition.
Since you are complaining about this situation, do you intend to fix it?

 Cc: Dan Williams dan.j.willi...@intel.com
 Cc: Dan Williams dan.j.willi...@gmail.com
 Cc: Vinod Koul vinod.k...@intel.com
 Cc: Li Yang le...@freescale.com
 Cc: Ira W. Snyder i...@ovro.caltech.edu
 Signed-off-by: Qiang Liu qiang@freescale.com
 ---
  drivers/dma/fsldma.c |  242 +++---
  drivers/dma/fsldma.h |    1 +
  2 files changed, 172 insertions(+), 71 deletions(-)
 
 diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
 index 4f2f212..87f52c0 100644
 --- a/drivers/dma/fsldma.c
 +++ b/drivers/dma/fsldma.c
 @@ -400,6 +400,125 @@ out_splice:
  	list_splice_tail_init(&desc->tx_list, &chan->ld_pending);
  }
 
 +static void fsldma_cleanup_descriptor(struct fsldma_chan *chan);
 +static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan);
 +

As noted in my reply to patch 4/6, please swap the order of this patch
and the following patch.

These lines should not be added or removed in either patch.

 +/**
 + * fsldma_clean_completed_descriptor - free all descriptors which
 + * has been completed and acked
 + * @chan: Freescale DMA channel
 + *
 + * This function is used on all completed and acked descriptors.
 + * All descriptors should only be freed in this function.
 + */
 +static int
 +fsldma_clean_completed_descriptor(struct fsldma_chan *chan)

This should be 'static void'. It does not return an error

Re: [PATCH v5 6/6] fsl-dma: fix a warning of unitialized cookie

2012-08-01 Thread Ira W. Snyder
On Wed, Aug 01, 2012 at 04:50:27PM +0800, qiang@freescale.com wrote:
 From: Qiang Liu qiang@freescale.com
 
 Fix a warning about an uninitialized value when compiling with -Wuninitialized.
 

Looks good to me.

Acked-by: Ira W. Snyder i...@ovro.caltech.edu
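
A small user-space sketch of the pattern behind this kind of warning, assuming (the hunk shown here does not confirm it) that the cookie is only assigned inside a loop over child descriptors. `cookie_t` and `submit_children()` are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t cookie_t;	/* stand-in for the kernel's dma_cookie_t */

/*
 * Assign consecutive cookies to n child descriptors and return the last
 * one assigned. Without the "= 0" initializer, an empty list (n == 0)
 * would return an indeterminate value, which is the case the compiler
 * warns about.
 */
static cookie_t submit_children(cookie_t *child_cookies, size_t n,
				cookie_t *next_cookie)
{
	cookie_t cookie = 0;

	assert(next_cookie != NULL);

	for (size_t i = 0; i < n; i++) {
		cookie = (*next_cookie)++;
		child_cookies[i] = cookie;
	}
	return cookie;
}
```

The initializer does not change behaviour when at least one child exists; it only gives the empty case a defined result.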

 Cc: Dan Williams dan.j.willi...@intel.com
 Cc: Vinod Koul vinod.k...@intel.com
 Cc: Li Yang le...@freescale.com
 Signed-off-by: Qiang Liu qiang@freescale.com
 Reported-by: Kim Phillips kim.phill...@freescale.com
 ---
  drivers/dma/fsldma.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
 index e3814aa..6fc22eb 100644
 --- a/drivers/dma/fsldma.c
 +++ b/drivers/dma/fsldma.c
 @@ -645,7 +645,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
  	struct fsldma_chan *chan = to_fsl_chan(tx->chan);
  	struct fsl_desc_sw *desc = tx_to_fsl_desc(tx);
  	struct fsl_desc_sw *child;
 -	dma_cookie_t cookie;
 +	dma_cookie_t cookie = 0;
 
  	spin_lock_bh(&chan->desc_lock);
 
 --
 1.7.5.1
 
 


Re: [PATCH v5 5/6] fsl-dma: use spin_lock_bh to instead of spin_lock_irqsave

2012-08-01 Thread Ira W. Snyder
On Wed, Aug 01, 2012 at 04:50:09PM +0800, qiang@freescale.com wrote:
 From: Qiang Liu qiang@freescale.com
 
 - use spin_lock_bh() is the right way to use async_tx api,
 dma_run_dependencies() should not be protected by spin_lock_irqsave();
 - use spin_lock_bh to instead of spin_lock_irqsave for improving performance,
 There is not any place to access descriptor queues in fsl-dma ISR except its
 tasklet, spin_lock_bh() is more proper here. Interrupts will be turned off and
 context will be save in irqsave, there is needless to use irqsave..
 

This description is not very clear English. I understand it is not your
native language. Let me try to help.


The use of spin_lock_irqsave() is a stronger locking mechanism than is
required throughout the driver. The minimum locking required should be
used instead.

Change all instances of spin_lock_irqsave() to spin_lock_bh(). All
manipulation of protected fields is done using tasklet context or
weaker, which makes spin_lock_bh() the correct choice.


Other than that,
Acked-by: Ira W. Snyder i...@ovro.caltech.edu

 Cc: Dan Williams dan.j.willi...@intel.com
 Cc: Vinod Koul vinod.k...@intel.com
 Cc: Li Yang le...@freescale.com
 Cc: Timur Tabi ti...@freescale.com
 Signed-off-by: Qiang Liu qiang@freescale.com
 ---
  drivers/dma/fsldma.c |   30 --
  1 files changed, 12 insertions(+), 18 deletions(-)
 
 diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
 index bb883c0..e3814aa 100644
 --- a/drivers/dma/fsldma.c
 +++ b/drivers/dma/fsldma.c
 @@ -645,10 +645,9 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
   struct fsldma_chan *chan = to_fsl_chan(tx->chan);
   struct fsl_desc_sw *desc = tx_to_fsl_desc(tx);
   struct fsl_desc_sw *child;
 - unsigned long flags;
   dma_cookie_t cookie;
 
 - spin_lock_irqsave(&chan->desc_lock, flags);
 + spin_lock_bh(&chan->desc_lock);
 
   /*
    * assign cookies to all of the software descriptors
 @@ -661,7 +660,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
   /* put this transaction onto the tail of the pending queue */
   append_ld_queue(chan, desc);
 
 - spin_unlock_irqrestore(&chan->desc_lock, flags);
 + spin_unlock_bh(&chan->desc_lock);
 
   return cookie;
  }
 @@ -770,15 +769,14 @@ static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
  static void fsl_dma_free_chan_resources(struct dma_chan *dchan)
  {
   struct fsldma_chan *chan = to_fsl_chan(dchan);
 - unsigned long flags;
 
   chan_dbg(chan, "free all channel resources\n");
 - spin_lock_irqsave(&chan->desc_lock, flags);
 + spin_lock_bh(&chan->desc_lock);
   fsldma_cleanup_descriptor(chan);
   fsldma_free_desc_list(chan, &chan->ld_pending);
   fsldma_free_desc_list(chan, &chan->ld_running);
   fsldma_free_desc_list(chan, &chan->ld_completed);
 - spin_unlock_irqrestore(&chan->desc_lock, flags);
 + spin_unlock_bh(&chan->desc_lock);
 
   dma_pool_destroy(chan->desc_pool);
   chan->desc_pool = NULL;
 @@ -997,7 +995,6 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
  {
   struct dma_slave_config *config;
   struct fsldma_chan *chan;
 - unsigned long flags;
   int size;
 
   if (!dchan)
 @@ -1007,7 +1004,7 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 
   switch (cmd) {
   case DMA_TERMINATE_ALL:
 -     spin_lock_irqsave(&chan->desc_lock, flags);
 +     spin_lock_bh(&chan->desc_lock);
 
       /* Halt the DMA engine */
       dma_halt(chan);
 @@ -1017,7 +1014,7 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
       fsldma_free_desc_list(chan, &chan->ld_running);
       chan->idle = true;
 
 -     spin_unlock_irqrestore(&chan->desc_lock, flags);
 +     spin_unlock_bh(&chan->desc_lock);
       return 0;
 
   case DMA_SLAVE_CONFIG:
 @@ -1059,11 +1056,10 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
  static void fsl_dma_memcpy_issue_pending(struct dma_chan *dchan)
  {
   struct fsldma_chan *chan = to_fsl_chan(dchan);
 - unsigned long flags;
 
 - spin_lock_irqsave(&chan->desc_lock, flags);
 + spin_lock_bh(&chan->desc_lock);
   fsl_chan_xfer_ld_queue(chan);
 - spin_unlock_irqrestore(&chan->desc_lock, flags);
 + spin_unlock_bh(&chan->desc_lock);
  }
 
  /**
 @@ -1076,15 +1072,14 @@ static enum dma_status fsl_tx_status(struct dma_chan *dchan,
  {
   struct fsldma_chan *chan = to_fsl_chan(dchan);
   enum dma_status ret;
 - unsigned long flags;
 
   ret = dma_cookie_status(dchan, cookie, txstate);
   if (ret == DMA_SUCCESS)
       return ret;
 
 - spin_lock_irqsave(&chan->desc_lock, flags);
 + spin_lock_bh(&chan->desc_lock);
   fsldma_cleanup_descriptor(chan);
 - spin_unlock_irqrestore(&chan->desc_lock, flags);
 + spin_unlock_bh(&chan->desc_lock);
 
   return

Re: [PATCH v4 3/7] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-07-31 Thread Ira W. Snyder
On Tue, Jul 31, 2012 at 04:09:28AM +, Liu Qiang-B32616 wrote:
  -Original Message-
  From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
  Sent: Tuesday, July 31, 2012 5:10 AM
  To: Liu Qiang-B32616
  Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Phillips
  Kim-R1AAHA; herb...@gondor.hengli.com.au; da...@davemloft.net; Dan
  Williams; Vinod Koul; Li Yang-R58472
  Subject: Re: [PATCH v4 3/7] fsl-dma: change release process of dma
  descriptor for supporting async_tx
  
  On Fri, Jul 27, 2012 at 05:16:09PM +0800, qiang@freescale.com wrote:
   From: Qiang Liu qiang@freescale.com
  
   Fix the potential risk when enable config NET_DMA and ASYNC_TX.
   Async_tx is lack of support in current release process of dma
  descriptor,
   all descriptors will be released whatever is acked or no-acked by
  async_tx,
   so there is a potential race condition when dma engine is uesd by
  others
   clients (e.g. when enable NET_DMA to offload TCP).
  
   In our case, a race condition which is raised when use both of talitos
   and dmaengine to offload xor is because napi scheduler will sync all
   pending requests in dma channels, it affects the process of raid
  operations
   due to ack_tx is not checked in fsl dma. The no-acked descriptor is
  freed
   which is submitted just now, as a dependent tx, this freed descriptor
  trigger
   BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
  
  
  I'm preparing an alternative version of this patch that I think is
  easier to understand (it is much shorter). I'll post it up here as soon
  as I finish testing.
 Can you give a simple description/idea about your patch? My patch is for fix
 the problems when I build a raid environment with talitos offload xor.
 I think the new interface is clear enough and similar with the implement of
 other dma devices.
 
 And do you have any comments about this patch?
 

My patch will fix the same problem, in a simpler way. It will not
involve checking if the hardware is finished with a descriptor on
ld_running.

  
  It would be nice to know how to easily reproduce this bug, without
  needing to set up a RAID system. I don't have access to any such
  hardware. A driver similar to drivers/dma/dmatest.c (using the async_tx
  API instead) would be wonderful.
 You can refer to raid5.c if you do not want to use hardware. Or you can use
 you ram (or other storage devices) to build a raid env to test.
 Thanks.
 
  
  Thanks,
  Ira
  
   TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
   GPR00: 0001 ecf41ca0 ee1a94a0 003f 0001 c00593e4 0001
   GPR08:  a7a7a7a7 0001 0002 42028042 100a38d4 ed576d98
   GPR16: ed5a11b0  2b162000 0200 0000 2d555000 ed3015e8 c15a7aa0
   GPR24:  c155fc40  ecb63220 ecf41d28 ef640bb0 ef640c30 ecf41ca0
   NIP [c02b048c] async_tx_submit+0x6c/0x2b4
   LR [c02b068c] async_tx_submit+0x26c/0x2b4
   Call Trace:
   [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b4 (unreliable)
   [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
   [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
   [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
   [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
   [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
   [ecf41f40] [c04329b8] md_thread+0x138/0x16c
   [ecf41f90] [c008277c] kthread+0x8c/0x90
   [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
  
   Cc: Dan Williams dan.j.willi...@intel.com
   Cc: Vinod Koul vinod.k...@intel.com
   Cc: Li Yang le...@freescale.com
   Cc: Ira W. Snyder i...@ovro.caltech.edu
   Signed-off-by: Qiang Liu qiang@freescale.com
   ---
drivers/dma/fsldma.c |  242 +++---
  
drivers/dma/fsldma.h |1 +
2 files changed, 172 insertions(+), 71 deletions(-)
  
   diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
   index 4f2f212..87f52c0 100644
   --- a/drivers/dma/fsldma.c
   +++ b/drivers/dma/fsldma.c
   @@ -400,6 +400,125 @@ out_splice:
 list_splice_tail_init(&desc->tx_list, &chan->ld_pending);
}
  
   +static void fsldma_cleanup_descriptor(struct fsldma_chan *chan);
   +static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan);
   +

You should have re-arranged the patches to avoid introducing these
forward declarations in this patch and then deleting them in the next
patch. I reversed the order in my patch series.

   +/**
   + * fsldma_clean_completed_descriptor - free all descriptors which
   + * has been completed and acked
   + * @chan: Freescale DMA channel
   + *
   + * This function is used on all completed and acked descriptors.
   + * All descriptors should only be freed in this function.
   + */
   +static int
   +fsldma_clean_completed_descriptor(struct fsldma_chan *chan)
   +{
   + struct fsl_desc_sw *desc, *_desc;
   +
   + /* Run the callback for each descriptor, in order */
   + list_for_each_entry_safe(desc, _desc, &chan->ld_completed

[PATCH 0/7] fsl-dma: fixes for Freescale DMA driver

2012-07-31 Thread Ira W. Snyder
From: Ira W. Snyder i...@ovro.caltech.edu

Hello everyone,

This is my alternative (simpler) attempt at solving the problems reported
by Qiang Liu with the async_tx API and MD RAID hardware offload support
when using the Freescale DMA driver.

The bug is caused by this driver freeing descriptors before they have been
ACKed by software using the async_tx API.

I don't like Qiang Liu's code to check where the hardware is in the
processing of the descriptor chain, and try to free a partial list of
descriptors. This was a source of bugs in this driver before I fixed them
several years ago.

Instead, the DMA controller raises an interrupt every time it has completed
a descriptor chain. This means it is ready for new descriptors: no need to
try and figure out where it is in the middle of a descriptor chain.
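
As a rough illustration of the rule described above (free a descriptor only
after software has ACKed it), here is a minimal userspace sketch. It is not
driver code: model_desc, model_run_complete_actions() and model_cleanup() are
hypothetical names, a plain array stands in for the driver's descriptor lists,
and the "freed" flag stands in for returning the descriptor to its pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for one software descriptor. */
struct model_desc {
	int cookie;        /* non-zero while completion actions are pending */
	bool acked;        /* set by the client once it is done with us */
	bool freed;        /* stands in for returning memory to the pool */
	int callbacks_run; /* how many times the completion callback fired */
};

/* Run the completion actions once; a zeroed cookie marks "already done". */
static void model_run_complete_actions(struct model_desc *d)
{
	if (d->cookie == 0)
		return;
	d->callbacks_run++;
	d->cookie = 0;
}

/*
 * Clean up a chain that the hardware has finished with. Completion actions
 * run for every descriptor, but only ACKed descriptors are freed. Returns
 * the number of descriptors freed by this pass.
 */
static size_t model_cleanup(struct model_desc *chain, size_t n)
{
	size_t freed = 0;

	assert(chain != NULL);

	for (size_t i = 0; i < n; i++) {
		struct model_desc *d = &chain[i];

		if (d->freed)
			continue;
		model_run_complete_actions(d);
		if (d->acked) {
			d->freed = true;
			freed++;
		}
	}
	return freed;
}
```

With one ACKed and one un-ACKed descriptor, a cleanup pass frees only the
first. The second has its callback run exactly once and is freed by a later
pass, after the client ACKs it.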

Qiang Liu: I do not have a hardware setup capable of using MD RAID. Please
test these patches to see if they fix the bug you reported. You may use
these patches as-is, or build upon them.

I have tested this using the drivers/dma/dmatest.c driver, as well as the
CARMA drivers. There are no regressions that I can find.

[  355.069679] dma0chan3-copy0: terminating after 10 tests, 0 failures (status 0)
[  355.192278] dma0chan2-copy0: terminating after 10 tests, 0 failures (status 0)

Ira W. Snyder (5):
  fsl-dma: minimize locking overhead
  fsl-dma: add fsl_dma_free_descriptor() to reduce code duplication
  fsl-dma: move functions to avoid forward declarations
  fsl-dma: fix support for async_tx API
  carma: remove unnecessary DMA_INTERRUPT capability

Qiang Liu (2):
  fsl-dma: remove attribute DMA_INTERRUPT of dmaengine
  fsl-dma: fix a warning of unitialized cookie

 drivers/dma/fsldma.c|  318 +++
 drivers/dma/fsldma.h|1 +
 drivers/misc/carma/carma-fpga-program.c |1 -
 drivers/misc/carma/carma-fpga.c |3 +-
 4 files changed, 159 insertions(+), 164 deletions(-)

-- 
1.7.8.6



[PATCH 1/7] fsl-dma: remove attribute DMA_INTERRUPT of dmaengine

2012-07-31 Thread Ira W. Snyder
From: Qiang Liu qiang@freescale.com

Delete the DMA_INTERRUPT attribute because fsl-dma does not support this
function; an exception is thrown if talitos is used to offload xor at the
same time.

Cc: Dan Williams dan.j.willi...@intel.com
Cc: Vinod Koul vinod.k...@intel.com
Cc: Li Yang le...@freescale.com
Signed-off-by: Qiang Liu qiang@freescale.com
Acked-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   31 ---
 1 files changed, 0 insertions(+), 31 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 8f84761..4f2f212 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -543,35 +543,6 @@ static void fsl_dma_free_chan_resources(struct dma_chan *dchan)
 }
 
 static struct dma_async_tx_descriptor *
-fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags)
-{
-   struct fsldma_chan *chan;
-   struct fsl_desc_sw *new;
-
-   if (!dchan)
-       return NULL;
-
-   chan = to_fsl_chan(dchan);
-
-   new = fsl_dma_alloc_descriptor(chan);
-   if (!new) {
-       chan_err(chan, "%s\n", msg_ld_oom);
-       return NULL;
-   }
-
-   new->async_tx.cookie = -EBUSY;
-   new->async_tx.flags = flags;
-
-   /* Insert the link descriptor to the LD ring */
-   list_add_tail(&new->node, &new->tx_list);
-
-   /* Set End-of-link to the last link descriptor of new list */
-   set_ld_eol(chan, new);
-
-   return &new->async_tx;
-}
-
-static struct dma_async_tx_descriptor *
 fsl_dma_prep_memcpy(struct dma_chan *dchan,
     dma_addr_t dma_dst, dma_addr_t dma_src,
     size_t len, unsigned long flags)
@@ -1352,12 +1323,10 @@ static int __devinit fsldma_of_probe(struct platform_device *op)
    fdev->irq = irq_of_parse_and_map(op->dev.of_node, 0);
 
    dma_cap_set(DMA_MEMCPY, fdev->common.cap_mask);
-   dma_cap_set(DMA_INTERRUPT, fdev->common.cap_mask);
    dma_cap_set(DMA_SG, fdev->common.cap_mask);
    dma_cap_set(DMA_SLAVE, fdev->common.cap_mask);
    fdev->common.device_alloc_chan_resources = fsl_dma_alloc_chan_resources;
    fdev->common.device_free_chan_resources = fsl_dma_free_chan_resources;
-   fdev->common.device_prep_dma_interrupt = fsl_dma_prep_interrupt;
    fdev->common.device_prep_dma_memcpy = fsl_dma_prep_memcpy;
    fdev->common.device_prep_dma_sg = fsl_dma_prep_sg;
    fdev->common.device_tx_status = fsl_tx_status;
-- 
1.7.8.6



[PATCH 3/7] fsl-dma: add fsl_dma_free_descriptor() to reduce code duplication

2012-07-31 Thread Ira W. Snyder
From: Ira W. Snyder i...@ovro.caltech.edu

There are several places where descriptors are freed using identical
code. Put this code into a function to reduce code duplication.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   32 ++--
 1 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 8f0505d..c34a628 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -400,6 +400,15 @@ out_splice:
    list_splice_tail_init(&desc->tx_list, &chan->ld_pending);
 }
 
+static void fsl_dma_free_descriptor(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
+{
+   list_del(&desc->node);
+#ifdef FSL_DMA_LD_DEBUG
+   chan_dbg(chan, "LD %p free\n", desc);
+#endif
+   dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
+}
+
 static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 {
    struct fsldma_chan *chan = to_fsl_chan(tx->chan);
@@ -499,13 +508,8 @@ static void fsldma_free_desc_list(struct fsldma_chan *chan,
 {
    struct fsl_desc_sw *desc, *_desc;
 
-   list_for_each_entry_safe(desc, _desc, list, node) {
-       list_del(&desc->node);
-#ifdef FSL_DMA_LD_DEBUG
-       chan_dbg(chan, "LD %p free\n", desc);
-#endif
-       dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
-   }
+   list_for_each_entry_safe(desc, _desc, list, node)
+       fsl_dma_free_descriptor(chan, desc);
 }
 
 static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
@@ -513,13 +517,8 @@ static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
 {
    struct fsl_desc_sw *desc, *_desc;
 
-   list_for_each_entry_safe_reverse(desc, _desc, list, node) {
-       list_del(&desc->node);
-#ifdef FSL_DMA_LD_DEBUG
-       chan_dbg(chan, "LD %p free\n", desc);
-#endif
-       dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
-   }
+   list_for_each_entry_safe_reverse(desc, _desc, list, node)
+       fsl_dma_free_descriptor(chan, desc);
 }
 
 /**
@@ -852,10 +851,7 @@ static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
            dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
    }
 
-#ifdef FSL_DMA_LD_DEBUG
-   chan_dbg(chan, "LD %p free\n", desc);
-#endif
-   dma_pool_free(chan->desc_pool, desc, txd->phys);
+   fsl_dma_free_descriptor(chan, desc);
 }
 
 /**
-- 
1.7.8.6



[PATCH 2/7] fsl-dma: minimize locking overhead

2012-07-31 Thread Ira W. Snyder
From: Ira W. Snyder i...@ovro.caltech.edu

The use of spin_lock_irqsave() was a stronger locking mechanism than was
actually needed in many cases.

As the current code is written, spin_lock_bh() everywhere is sufficient.

The next patch in this series will add some code to hardware interrupt
context. This patch is intended to minimize the differences in the next
patch and make review easier.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   25 ++---
 1 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 4f2f212..8f0505d 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -405,10 +405,9 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
    struct fsldma_chan *chan = to_fsl_chan(tx->chan);
    struct fsl_desc_sw *desc = tx_to_fsl_desc(tx);
    struct fsl_desc_sw *child;
-   unsigned long flags;
    dma_cookie_t cookie;
 
-   spin_lock_irqsave(&chan->desc_lock, flags);
+   spin_lock_irq(&chan->desc_lock);
 
    /*
     * assign cookies to all of the software descriptors
@@ -421,7 +420,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
    /* put this transaction onto the tail of the pending queue */
    append_ld_queue(chan, desc);
 
-   spin_unlock_irqrestore(&chan->desc_lock, flags);
+   spin_unlock_irq(&chan->desc_lock);
 
    return cookie;
 }
@@ -530,13 +529,12 @@ static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
 static void fsl_dma_free_chan_resources(struct dma_chan *dchan)
 {
    struct fsldma_chan *chan = to_fsl_chan(dchan);
-   unsigned long flags;
 
    chan_dbg(chan, "free all channel resources\n");
-   spin_lock_irqsave(&chan->desc_lock, flags);
+   spin_lock_irq(&chan->desc_lock);
    fsldma_free_desc_list(chan, &chan->ld_pending);
    fsldma_free_desc_list(chan, &chan->ld_running);
-   spin_unlock_irqrestore(&chan->desc_lock, flags);
+   spin_unlock_irq(&chan->desc_lock);
 
    dma_pool_destroy(chan->desc_pool);
    chan->desc_pool = NULL;
@@ -755,7 +753,6 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 {
    struct dma_slave_config *config;
    struct fsldma_chan *chan;
-   unsigned long flags;
    int size;
 
    if (!dchan)
@@ -765,7 +762,7 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 
    switch (cmd) {
    case DMA_TERMINATE_ALL:
-       spin_lock_irqsave(&chan->desc_lock, flags);
+       spin_lock_irq(&chan->desc_lock);
 
        /* Halt the DMA engine */
        dma_halt(chan);
@@ -775,7 +772,7 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
        fsldma_free_desc_list(chan, &chan->ld_running);
        chan->idle = true;
 
-       spin_unlock_irqrestore(&chan->desc_lock, flags);
+       spin_unlock_irq(&chan->desc_lock);
        return 0;
 
    case DMA_SLAVE_CONFIG:
@@ -935,11 +932,10 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 static void fsl_dma_memcpy_issue_pending(struct dma_chan *dchan)
 {
    struct fsldma_chan *chan = to_fsl_chan(dchan);
-   unsigned long flags;
 
-   spin_lock_irqsave(&chan->desc_lock, flags);
+   spin_lock_irq(&chan->desc_lock);
    fsl_chan_xfer_ld_queue(chan);
-   spin_unlock_irqrestore(&chan->desc_lock, flags);
+   spin_unlock_irq(&chan->desc_lock);
 }
 
 /**
@@ -952,11 +948,10 @@ static enum dma_status fsl_tx_status(struct dma_chan *dchan,
 {
    struct fsldma_chan *chan = to_fsl_chan(dchan);
    enum dma_status ret;
-   unsigned long flags;
 
-   spin_lock_irqsave(&chan->desc_lock, flags);
+   spin_lock_irq(&chan->desc_lock);
    ret = dma_cookie_status(dchan, cookie, txstate);
-   spin_unlock_irqrestore(&chan->desc_lock, flags);
+   spin_unlock_irq(&chan->desc_lock);
 
    return ret;
 }
-- 
1.7.8.6



[PATCH 4/7] fsl-dma: move functions to avoid forward declarations

2012-07-31 Thread Ira W. Snyder
From: Ira W. Snyder i...@ovro.caltech.edu

This function will be modified in the next patch in the series. By
moving the function in a patch separate from the changes, it will make
review easier.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   96 +-
 1 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index c34a628..80edc63 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -409,6 +409,54 @@ static void fsl_dma_free_descriptor(struct fsldma_chan *chan, struct fsl_desc_sw
    dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
 }
 
+/**
+ * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
+ * @chan: Freescale DMA channel
+ * @desc: descriptor to cleanup and free
+ *
+ * This function is used on a descriptor which has been executed by the DMA
+ * controller. It will run any callbacks, submit any dependencies, and then
+ * free the descriptor.
+ */
+static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
+                                      struct fsl_desc_sw *desc)
+{
+   struct dma_async_tx_descriptor *txd = &desc->async_tx;
+   struct device *dev = chan->common.device->dev;
+   dma_addr_t src = get_desc_src(chan, desc);
+   dma_addr_t dst = get_desc_dst(chan, desc);
+   u32 len = get_desc_cnt(chan, desc);
+
+   /* Run the link descriptor callback function */
+   if (txd->callback) {
+#ifdef FSL_DMA_LD_DEBUG
+       chan_dbg(chan, "LD %p callback\n", desc);
+#endif
+       txd->callback(txd->callback_param);
+   }
+
+   /* Run any dependencies */
+   dma_run_dependencies(txd);
+
+   /* Unmap the dst buffer, if requested */
+   if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
+       if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
+           dma_unmap_single(dev, dst, len, DMA_FROM_DEVICE);
+       else
+           dma_unmap_page(dev, dst, len, DMA_FROM_DEVICE);
+   }
+
+   /* Unmap the src buffer, if requested */
+   if (!(txd->flags & DMA_COMPL_SKIP_SRC_UNMAP)) {
+       if (txd->flags & DMA_COMPL_SRC_UNMAP_SINGLE)
+           dma_unmap_single(dev, src, len, DMA_TO_DEVICE);
+       else
+           dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
+   }
+
+   fsl_dma_free_descriptor(chan, desc);
+}
+
 static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 {
    struct fsldma_chan *chan = to_fsl_chan(tx->chan);
@@ -807,54 +855,6 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 }
 
 /**
- * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
- * @chan: Freescale DMA channel
- * @desc: descriptor to cleanup and free
- *
- * This function is used on a descriptor which has been executed by the DMA
- * controller. It will run any callbacks, submit any dependencies, and then
- * free the descriptor.
- */
-static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
-                                      struct fsl_desc_sw *desc)
-{
-   struct dma_async_tx_descriptor *txd = &desc->async_tx;
-   struct device *dev = chan->common.device->dev;
-   dma_addr_t src = get_desc_src(chan, desc);
-   dma_addr_t dst = get_desc_dst(chan, desc);
-   u32 len = get_desc_cnt(chan, desc);
-
-   /* Run the link descriptor callback function */
-   if (txd->callback) {
-#ifdef FSL_DMA_LD_DEBUG
-       chan_dbg(chan, "LD %p callback\n", desc);
-#endif
-       txd->callback(txd->callback_param);
-   }
-
-   /* Run any dependencies */
-   dma_run_dependencies(txd);
-
-   /* Unmap the dst buffer, if requested */
-   if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
-       if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
-           dma_unmap_single(dev, dst, len, DMA_FROM_DEVICE);
-       else
-           dma_unmap_page(dev, dst, len, DMA_FROM_DEVICE);
-   }
-
-   /* Unmap the src buffer, if requested */
-   if (!(txd->flags & DMA_COMPL_SKIP_SRC_UNMAP)) {
-       if (txd->flags & DMA_COMPL_SRC_UNMAP_SINGLE)
-           dma_unmap_single(dev, src, len, DMA_TO_DEVICE);
-       else
-           dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
-   }
-
-   fsl_dma_free_descriptor(chan, desc);
-}
-
-/**
  * fsl_chan_xfer_ld_queue - transfer any pending transactions
  * @chan : Freescale DMA channel
  *
-- 
1.7.8.6



[PATCH 5/7] fsl-dma: fix support for async_tx API

2012-07-31 Thread Ira W. Snyder
From: Ira W. Snyder i...@ovro.caltech.edu

The current fsldma driver does not support the async_tx API. This is due
to a bug: all descriptors are released when the hardware is finished
with them. The async_tx API requires that the ACK bit is set by software
before descriptors are freed.

This bug is easiest to reproduce by enabling both CONFIG_NET_DMA and
CONFIG_ASYNC_TX, and using the hardware offload support in MD RAID. The
network stack will force operations on shared DMA channels, and will
free descriptors which are being used by the MD RAID code.

The BUG_ON(async_tx_test_ack(depend_tx)) test in async_tx_submit() will
be hit, and panic the machine.

TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
GPR00: 0001 ecf41ca0 ee1a94a0 003f 0001 c00593e4  0001
GPR08:  a7a7a7a7 0001 0002 42028042 100a38d4 ed576d98
GPR16: ed5a11b0  2b162000 0200 0000 2d555000 ed3015e8 c15a7aa0
GPR24:  c155fc40  ecb63220 ecf41d28 ef640bb0 ef640c30 ecf41ca0
NIP [c02b048c] async_tx_submit+0x6c/0x2b4
LR [c02b068c] async_tx_submit+0x26c/0x2b4
Call Trace:
[ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b4 (unreliable)
[ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
[ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
[ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
[ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
[ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
[ecf41f40] [c04329b8] md_thread+0x138/0x16c
[ecf41f90] [c008277c] kthread+0x8c/0x90
[ecf41ff0] [c0011630] kernel_thread+0x4c/0x68

Cc: Dan Williams dan.j.willi...@intel.com
Cc: Vinod Koul vinod.k...@intel.com
Cc: Li Yang le...@freescale.com
Cc: Qiang Liu qiang@freescale.com
Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  156 +++---
 drivers/dma/fsldma.h |1 +
 2 files changed, 97 insertions(+), 60 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 80edc63..380c1b7 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -410,16 +410,15 @@ static void fsl_dma_free_descriptor(struct fsldma_chan *chan, struct fsl_desc_sw
 }
 
 /**
- * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
+ * fsldma_run_tx_complete_actions - run callback and unmap descriptor
  * @chan: Freescale DMA channel
  * @desc: descriptor to cleanup and free
  *
  * This function is used on a descriptor which has been executed by the DMA
- * controller. It will run any callbacks, submit any dependencies, and then
- * free the descriptor.
+ * controller. It will run the callback and unmap the descriptor, if requested.
  */
-static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
-                                      struct fsl_desc_sw *desc)
+static void fsldma_run_tx_complete_actions(struct fsldma_chan *chan,
+                                           struct fsl_desc_sw *desc)
 {
    struct dma_async_tx_descriptor *txd = &desc->async_tx;
    struct device *dev = chan->common.device->dev;
@@ -427,6 +426,10 @@ static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
    dma_addr_t dst = get_desc_dst(chan, desc);
    u32 len = get_desc_cnt(chan, desc);
 
+   /* Cookies with value zero are already cleaned up */
+   if (txd->cookie == 0)
+       return;
+
    /* Run the link descriptor callback function */
    if (txd->callback) {
 #ifdef FSL_DMA_LD_DEBUG
@@ -435,9 +438,6 @@ static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
        txd->callback(txd->callback_param);
    }
 
-   /* Run any dependencies */
-   dma_run_dependencies(txd);
-
    /* Unmap the dst buffer, if requested */
    if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
        if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
@@ -454,7 +454,68 @@ static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
            dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
    }
 
-   fsl_dma_free_descriptor(chan, desc);
+   /*
+    * A zeroed cookie indicates that cleanup actions have been
+    * run, but the descriptor has not yet been ACKed.
+    */
+   txd->cookie = 0;
+}
+
+/**
+ * fsldma_cleanup_descriptors - cleanup and free link descriptors
+ * @chan: Freescale DMA channel
+ *
+ * This function is used to clean up all descriptors which have been executed
+ * by the DMA controller. It will run any callbacks, submit any dependencies,
+ * and free any descriptors which have been ACKed.
+ *
+ * LOCKING: must NOT hold chan->desc_lock
+ * CONTEXT: any
+ */
+static void fsldma_cleanup_descriptors(struct fsldma_chan *chan)
+{
+   struct fsl_desc_sw *desc, *_desc;
+   dma_cookie_t cookie = 0;
+   LIST_HEAD(ld_cleanup);
+   unsigned long flags;
+
+   /*
+    * Move all descriptors onto a temporary list so that hardware
+    * interrupts can be enabled during cleanup

[PATCH 6/7] fsl-dma: fix a warning of unitialized cookie

2012-07-31 Thread Ira W. Snyder
From: Qiang Liu qiang@freescale.com

Fix a warning about an uninitialized value when compiling with -Wuninitialized.

Cc: Dan Williams dan.j.willi...@intel.com
Cc: Vinod Koul vinod.k...@intel.com
Cc: Li Yang le...@freescale.com
Signed-off-by: Qiang Liu qiang@freescale.com
Cc: Kim Phillips kim.phill...@freescale.com
---
 drivers/dma/fsldma.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 380c1b7..8588cf7 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -523,7 +523,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
    struct fsldma_chan *chan = to_fsl_chan(tx->chan);
    struct fsl_desc_sw *desc = tx_to_fsl_desc(tx);
    struct fsl_desc_sw *child;
-   dma_cookie_t cookie;
+   dma_cookie_t cookie = 0;
 
    spin_lock_irq(&chan->desc_lock);
 
-- 
1.7.8.6



[PATCH 7/7] carma: remove unnecessary DMA_INTERRUPT capability

2012-07-31 Thread Ira W. Snyder
From: Ira W. Snyder i...@ovro.caltech.edu

These drivers set the DMA_INTERRUPT capability bit when requesting a DMA
controller channel. This was historical, and is no longer needed.

Recent changes to the drivers/dma/fsldma.c driver have removed support
for this flag. This makes the carma drivers unable to find a DMA channel
with the required capabilities.
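
A minimal sketch of why the channel lookup fails, assuming the usual rule that
a channel matches a request only when it offers every requested capability.
The MODEL_CAP_* names and model_chan_satisfies() are hypothetical userspace
stand-ins for illustration; they are not the dmaengine API.

```c
#include <stdbool.h>

/* Hypothetical capability bits, standing in for the kernel's cap mask. */
enum model_cap {
	MODEL_CAP_MEMCPY    = 1u << 0,
	MODEL_CAP_INTERRUPT = 1u << 1,
	MODEL_CAP_SG        = 1u << 2,
	MODEL_CAP_SLAVE     = 1u << 3,
};

/* A channel satisfies a request only if it offers every requested bit. */
static bool model_chan_satisfies(unsigned int offered, unsigned int wanted)
{
	return (offered & wanted) == wanted;
}
```

Once the provider stops advertising the interrupt capability, a request that
still includes it can never match, so the bit has to be dropped from the
request as well.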

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/misc/carma/carma-fpga-program.c |1 -
 drivers/misc/carma/carma-fpga.c |3 +--
 2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/carma/carma-fpga-program.c b/drivers/misc/carma/carma-fpga-program.c
index a2d25e4..eaddfe9 100644
--- a/drivers/misc/carma/carma-fpga-program.c
+++ b/drivers/misc/carma/carma-fpga-program.c
@@ -978,7 +978,6 @@ static int fpga_of_probe(struct platform_device *op)
    dev_set_drvdata(priv->dev, priv);
    dma_cap_zero(mask);
    dma_cap_set(DMA_MEMCPY, mask);
-   dma_cap_set(DMA_INTERRUPT, mask);
    dma_cap_set(DMA_SLAVE, mask);
    dma_cap_set(DMA_SG, mask);
 
diff --git a/drivers/misc/carma/carma-fpga.c b/drivers/misc/carma/carma-fpga.c
index 8c279da..861b298 100644
--- a/drivers/misc/carma/carma-fpga.c
+++ b/drivers/misc/carma/carma-fpga.c
@@ -666,7 +666,7 @@ static int data_submit_dma(struct fpga_device *priv, struct data_buf *buf)
    src = SYS_FPGA_BLOCK;
    tx = chan->device->device_prep_dma_memcpy(chan, dst, src,
                                              REG_BLOCK_SIZE,
-                                             DMA_PREP_INTERRUPT);
+                                             0);
    if (!tx) {
        dev_err(priv->dev, "unable to prep SYS-FPGA DMA\n");
        return -ENOMEM;
@@ -1333,7 +1333,6 @@ static int data_of_probe(struct platform_device *op)
 
    dma_cap_zero(mask);
    dma_cap_set(DMA_MEMCPY, mask);
-   dma_cap_set(DMA_INTERRUPT, mask);
    dma_cap_set(DMA_SLAVE, mask);
    dma_cap_set(DMA_SG, mask);
 
-- 
1.7.8.6



Re: [linuxppc-release] [PATCH v4 7/7] fsl-dma: add memcpy self test interface

2012-07-30 Thread Ira W. Snyder
On Mon, Jul 30, 2012 at 12:48:41PM -0500, Timur Tabi wrote:
 qiang@freescale.com wrote:
  
  Add memory copy self test when probe device, fsl-dma will be disabled
  if self test failed.
 
 Is this a real problem that can occur?  The DMA driver used to have a
 self-test, but I removed it a long time ago because it was pointless.  I
 don't see why we need to add another one back in.
 
 -- 
 Timur Tabi
 Linux kernel developer at Freescale
 

I made a comment that a test suite for the async_tx API would be very
helpful in diagnosing similar problems in this and other DMA drivers.
Something standalone, similar to the drivers/dma/dmatest.c driver, using
the async_tx API.

I think this was misinterpreted into me asking that the driver have a
built-in self test.

Ira


Re: [PATCH v4 3/7] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-07-30 Thread Ira W. Snyder
On Fri, Jul 27, 2012 at 05:16:09PM +0800, qiang@freescale.com wrote:
 From: Qiang Liu qiang@freescale.com
 
 Fix a potential problem when both NET_DMA and ASYNC_TX are enabled.
 The current DMA descriptor release process lacks async_tx support:
 all descriptors are released whether or not async_tx has acked them,
 so there is a potential race condition when the DMA engine is used by
 other clients (e.g. when NET_DMA is enabled to offload TCP).
 
 In our case, a race condition arises when both talitos and dmaengine are
 used to offload XOR, because the NAPI scheduler syncs all pending requests
 in the DMA channels. This disrupts RAID operations because fsl-dma does not
 check whether a descriptor has been acked. A just-submitted descriptor that
 has not been acked is freed; as a dependent tx, this freed descriptor triggers
 BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
 

I'm preparing an alternative version of this patch that I think is
easier to understand (it is much shorter). I'll post it up here as soon
as I finish testing.

It would be nice to know how to easily reproduce this bug, without
needing to set up a RAID system. I don't have access to any such
hardware. A driver similar to drivers/dma/dmatest.c (using the async_tx
API instead) would be wonderful.

Thanks,
Ira

 TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
 GPR00: 0001 ecf41ca0 ee1a94a0 003f 0001 c00593e4  
 0001
 GPR08:  a7a7a7a7 0001 0002 42028042 100a38d4 ed576d98 
 
 GPR16: ed5a11b0  2b162000 0200 0000 2d555000 ed3015e8 
 c15a7aa0
 GPR24:  c155fc40  ecb63220 ecf41d28 ef640bb0 ef640c30 
 ecf41ca0
 NIP [c02b048c] async_tx_submit+0x6c/0x2b4
 LR [c02b068c] async_tx_submit+0x26c/0x2b4
 Call Trace:
 [ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b4 (unreliable)
 [ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
 [ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
 [ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
 [ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
 [ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
 [ecf41f40] [c04329b8] md_thread+0x138/0x16c
 [ecf41f90] [c008277c] kthread+0x8c/0x90
 [ecf41ff0] [c0011630] kernel_thread+0x4c/0x68
 
 Cc: Dan Williams dan.j.willi...@intel.com
 Cc: Vinod Koul vinod.k...@intel.com
 Cc: Li Yang le...@freescale.com
 Cc: Ira W. Snyder i...@ovro.caltech.edu
 Signed-off-by: Qiang Liu qiang@freescale.com
 ---
  drivers/dma/fsldma.c |  242 
 +++---
  drivers/dma/fsldma.h |1 +
  2 files changed, 172 insertions(+), 71 deletions(-)
 
 diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
 index 4f2f212..87f52c0 100644
 --- a/drivers/dma/fsldma.c
 +++ b/drivers/dma/fsldma.c
 @@ -400,6 +400,125 @@ out_splice:
   list_splice_tail_init(&desc->tx_list, &chan->ld_pending);
  }
 
 +static void fsldma_cleanup_descriptor(struct fsldma_chan *chan);
 +static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan);
 +
 +/**
 + * fsldma_clean_completed_descriptor - free all descriptors which
 + * has been completed and acked
 + * @chan: Freescale DMA channel
 + *
 + * This function is used on all completed and acked descriptors.
 + * All descriptors should only be freed in this function.
 + */
 +static int
 +fsldma_clean_completed_descriptor(struct fsldma_chan *chan)
 +{
 + struct fsl_desc_sw *desc, *_desc;
 +
 + /* Run the callback for each descriptor, in order */
 + list_for_each_entry_safe(desc, _desc, &chan->ld_completed, node) {
 +
 + if (async_tx_test_ack(&desc->async_tx)) {
 + /* Remove from the list of transactions */
 + list_del(&desc->node);
 +#ifdef FSL_DMA_LD_DEBUG
 + chan_dbg(chan, "LD %p free\n", desc);
 +#endif
 + dma_pool_free(chan->desc_pool, desc,
 + desc->async_tx.phys);
 + }
 + }
 +
 + return 0;
 +}
 +
 +/**
 + * fsldma_run_tx_complete_actions - cleanup and free a single link descriptor
 + * @chan: Freescale DMA channel
 + * @desc: descriptor to cleanup and free
 + * @cookie: Freescale DMA transaction identifier
 + *
 + * This function is used on a descriptor which has been executed by the DMA
 + * controller. It will run any callbacks, submit any dependencies.
 + */
 +static dma_cookie_t fsldma_run_tx_complete_actions(struct fsl_desc_sw *desc,
 + struct fsldma_chan *chan, dma_cookie_t cookie)
 +{
 + struct dma_async_tx_descriptor *txd = &desc->async_tx;
 + struct device *dev = chan->common.device->dev;
 + dma_addr_t src = get_desc_src(chan, desc);
 + dma_addr_t dst = get_desc_dst(chan, desc);
 + u32 len = get_desc_cnt(chan, desc);
 +
 + BUG_ON(txd->cookie < 0);
 +
 + if (txd->cookie > 0) {
 + cookie = txd->cookie;
 +
 + /* Run the link descriptor callback function */
 + if (txd->callback) {
 +#ifdef FSL_DMA_LD_DEBUG

Re: [PATCH v3 3/4] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-07-17 Thread Ira W. Snyder
On Tue, Jul 17, 2012 at 07:06:33AM +, Liu Qiang-B32616 wrote:
  -Original Message-
  From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
  Sent: Tuesday, July 17, 2012 4:01 AM
  To: Liu Qiang-B32616
  Cc: linux-cry...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Phillips
  Kim-R1AAHA; herb...@gondor.hengli.com.au; da...@davemloft.net; Dan
  Williams; Vinod Koul; Li Yang-R58472
  Subject: Re: [PATCH v3 3/4] fsl-dma: change release process of dma
  descriptor for supporting async_tx
  
  On Mon, Jul 16, 2012 at 12:08:29PM +0800, Qiang Liu wrote:
   Fix a potential problem when both NET_DMA and ASYNC_TX are enabled.
   The current DMA descriptor release process lacks async_tx support:
   all descriptors are released whether or not async_tx has acked them,
   so there is a potential race condition when the DMA engine is used
   by other clients (e.g. when NET_DMA is enabled to offload TCP).
  
   In our case, a race condition arises when both talitos and dmaengine
   are used to offload XOR, because the NAPI scheduler syncs all pending
   requests in the DMA channels. This disrupts RAID operations because
   fsl-dma does not check whether a descriptor has been acked. A
   just-submitted descriptor that has not been acked is freed; as a
   dependent tx, this freed descriptor triggers
   BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
  
   Cc: Dan Williams dan.j.willi...@intel.com
   Cc: Vinod Koul vinod.k...@intel.com
   Cc: Li Yang le...@freescale.com
   Cc: Ira W. Snyder i...@ovro.caltech.edu
   Signed-off-by: Qiang Liu qiang@freescale.com
   ---
drivers/dma/fsldma.c |  378 +-
  ---
drivers/dma/fsldma.h |1 +
2 files changed, 225 insertions(+), 154 deletions(-)
  
   diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index
   4f2f212..4ee1b8f 100644
   --- a/drivers/dma/fsldma.c
   +++ b/drivers/dma/fsldma.c
   @@ -400,6 +400,217 @@ out_splice:
 list_splice_tail_init(&desc->tx_list, &chan->ld_pending);  }
  
   +/**
   + * fsl_chan_xfer_ld_queue - transfer any pending transactions
   + * @chan : Freescale DMA channel
   + *
   + * HARDWARE STATE: idle
   + * LOCKING: must hold chan->desc_lock  */ static void
   +fsl_chan_xfer_ld_queue(struct fsldma_chan *chan) {
   + struct fsl_desc_sw *desc;
   +
   + /*
   +  * If the list of pending descriptors is empty, then we
   +  * don't need to do any work at all
   +  */
   + if (list_empty(&chan->ld_pending)) {
   + chan_dbg(chan, "no pending LDs\n");
   + return;
   + }
   +
   + /*
   +  * The DMA controller is not idle, which means that the interrupt
   +  * handler will start any queued transactions when it runs after
   +  * this transaction finishes
   +  */
   + if (!chan->idle) {
   + chan_dbg(chan, "DMA controller still busy\n");
   + return;
   + }
   +
   + /*
   +  * If there are some link descriptors which have not been
   +  * transferred, we need to start the controller
   +  */
   +
   + /*
   +  * Move all elements from the queue of pending transactions
   +  * onto the list of running transactions
   +  */
   + chan_dbg(chan, "idle, starting controller\n");
   + desc = list_first_entry(&chan->ld_pending, struct fsl_desc_sw,
  node);
   + list_splice_tail_init(&chan->ld_pending, &chan->ld_running);
   +
   + /*
   +  * The 85xx DMA controller doesn't clear the channel start bit
   +  * automatically at the end of a transfer. Therefore we must clear
   +  * it in software before starting the transfer.
   +  */
   + if ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) {
   + u32 mode;
   +
   + mode = DMA_IN(chan, &chan->regs->mr, 32);
   + mode &= ~FSL_DMA_MR_CS;
   + DMA_OUT(chan, &chan->regs->mr, mode, 32);
   + }
   +
   + /*
   +  * Program the descriptor's address into the DMA controller,
   +  * then start the DMA transaction
   +  */
   + set_cdar(chan, desc->async_tx.phys);
   + get_cdar(chan);
   +
   + dma_start(chan);
   + chan-idle = false;
   +}
   +
  
  Please add a note about the locking requirements here, and for the other
  new functions you've added.
  
  You call this function from two places:
  
  1) fsldma_cleanup_descriptor() - called with chan->desc_lock held
  2) fsl_tx_status() - WITHOUT chan->desc_lock held
  
  One of them is definitely wrong, and I'd bet that it is #2. You're
  modifying ld_completed without a lock.
 Yes, My bad, I will correct it.
 
  
   +static int
   +fsldma_clean_completed_descriptor(struct fsldma_chan *chan) {
   + struct fsl_desc_sw *desc, *_desc;
   +
   + /* Run the callback for each descriptor, in order */
   + list_for_each_entry_safe(desc, _desc, &chan->ld_completed, node) {
   +
   + if (async_tx_test_ack(&desc->async_tx)) {
   + /* Remove from the list of transactions */
   + list_del(&desc->node);
   +#ifdef FSL_DMA_LD_DEBUG
   + chan_dbg(chan, "LD %p free\n", desc); #endif
   + dma_pool_free(chan

Re: [PATCH v3 3/4] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-07-16 Thread Ira W. Snyder
On Mon, Jul 16, 2012 at 12:08:29PM +0800, Qiang Liu wrote:
 Fix a potential problem when both NET_DMA and ASYNC_TX are enabled.
 The current DMA descriptor release process lacks async_tx support:
 all descriptors are released whether or not async_tx has acked them,
 so there is a potential race condition when the DMA engine is used by
 other clients (e.g. when NET_DMA is enabled to offload TCP).
 
 In our case, a race condition arises when both talitos and dmaengine are
 used to offload XOR, because the NAPI scheduler syncs all pending requests
 in the DMA channels. This disrupts RAID operations because fsl-dma does not
 check whether a descriptor has been acked. A just-submitted descriptor that
 has not been acked is freed; as a dependent tx, this freed descriptor triggers
 BUG_ON(async_tx_test_ack(depend_tx)) in async_tx_submit().
 
 Cc: Dan Williams dan.j.willi...@intel.com
 Cc: Vinod Koul vinod.k...@intel.com
 Cc: Li Yang le...@freescale.com
 Cc: Ira W. Snyder i...@ovro.caltech.edu
 Signed-off-by: Qiang Liu qiang@freescale.com
 ---
  drivers/dma/fsldma.c |  378 +
  drivers/dma/fsldma.h |1 +
  2 files changed, 225 insertions(+), 154 deletions(-)
 
 diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
 index 4f2f212..4ee1b8f 100644
 --- a/drivers/dma/fsldma.c
 +++ b/drivers/dma/fsldma.c
 @@ -400,6 +400,217 @@ out_splice:
   list_splice_tail_init(&desc->tx_list, &chan->ld_pending);
  }
 
 +/**
 + * fsl_chan_xfer_ld_queue - transfer any pending transactions
 + * @chan : Freescale DMA channel
 + *
 + * HARDWARE STATE: idle
 + * LOCKING: must hold chan->desc_lock
 + */
 +static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 +{
 + struct fsl_desc_sw *desc;
 +
 + /*
 +  * If the list of pending descriptors is empty, then we
 +  * don't need to do any work at all
 +  */
 + if (list_empty(&chan->ld_pending)) {
 + chan_dbg(chan, "no pending LDs\n");
 + return;
 + }
 +
 + /*
 +  * The DMA controller is not idle, which means that the interrupt
 +  * handler will start any queued transactions when it runs after
 +  * this transaction finishes
 +  */
 + if (!chan->idle) {
 + chan_dbg(chan, "DMA controller still busy\n");
 + return;
 + }
 +
 + /*
 +  * If there are some link descriptors which have not been
 +  * transferred, we need to start the controller
 +  */
 +
 + /*
 +  * Move all elements from the queue of pending transactions
 +  * onto the list of running transactions
 +  */
 + chan_dbg(chan, "idle, starting controller\n");
 + desc = list_first_entry(&chan->ld_pending, struct fsl_desc_sw, node);
 + list_splice_tail_init(&chan->ld_pending, &chan->ld_running);
 +
 + /*
 +  * The 85xx DMA controller doesn't clear the channel start bit
 +  * automatically at the end of a transfer. Therefore we must clear
 +  * it in software before starting the transfer.
 +  */
 + if ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) {
 + u32 mode;
 +
 + mode = DMA_IN(chan, &chan->regs->mr, 32);
 + mode &= ~FSL_DMA_MR_CS;
 + DMA_OUT(chan, &chan->regs->mr, mode, 32);
 + }
 +
 + /*
 +  * Program the descriptor's address into the DMA controller,
 +  * then start the DMA transaction
 +  */
 + set_cdar(chan, desc->async_tx.phys);
 + get_cdar(chan);
 +
 + dma_start(chan);
 + chan-idle = false;
 +}
 +

Please add a note about the locking requirements here, and for the other
new functions you've added.

You call this function from two places:

1) fsldma_cleanup_descriptor() - called with chan->desc_lock held
2) fsl_tx_status() - WITHOUT chan->desc_lock held

One of them is definitely wrong, and I'd bet that it is #2. You're
modifying ld_completed without a lock.
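
One way to make the locking rule explicit is sketched below as a small
user-space model; it is not the driver code. The toy_lock type, the helper
names, and the integer standing in for ld_completed are all hypothetical,
and the check inside clean_completed() only imitates what an assertion such
as the kernel's lockdep_assert_held() would catch.

```c
#include <stdbool.h>

/* Toy lock that only records whether it is held (single-threaded model). */
struct toy_lock {
	bool held;
};

struct toy_chan {
	struct toy_lock desc_lock;
	int ld_completed; /* stand-in for the completed-descriptor list */
};

static void toy_lock_acquire(struct toy_lock *l) { l->held = true; }
static void toy_lock_release(struct toy_lock *l) { l->held = false; }

/*
 * LOCKING: must be called with chan->desc_lock held.
 * Returns false, and touches nothing, if the rule is violated.
 */
static bool clean_completed(struct toy_chan *chan)
{
	if (!chan->desc_lock.held)
		return false;
	chan->ld_completed = 0;
	return true;
}

/* A status-style caller that takes the lock around the list update. */
static bool tx_status(struct toy_chan *chan)
{
	bool ok;

	toy_lock_acquire(&chan->desc_lock);
	ok = clean_completed(chan);
	toy_lock_release(&chan->desc_lock);
	return ok;
}
```

In this model, calling clean_completed() directly without the lock is
rejected, while tx_status() takes the lock first and succeeds.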

 +static int
 +fsldma_clean_completed_descriptor(struct fsldma_chan *chan)
 +{
 + struct fsl_desc_sw *desc, *_desc;
 +
 + /* Run the callback for each descriptor, in order */
 + list_for_each_entry_safe(desc, _desc, &chan->ld_completed, node) {
 +
 + if (async_tx_test_ack(&desc->async_tx)) {
 + /* Remove from the list of transactions */
 + list_del(&desc->node);
 +#ifdef FSL_DMA_LD_DEBUG
 + chan_dbg(chan, "LD %p free\n", desc);
 +#endif
 + dma_pool_free(chan->desc_pool, desc,
 + desc->async_tx.phys);
 + }
 + }
 +
 + return 0;
 +}
 +
 +/**
 + * fsldma_run_tx_complete_actions - cleanup and free a single link descriptor
 + * @chan: Freescale DMA channel
 + * @desc: descriptor to cleanup and free
 + * @cookie: Freescale DMA transaction identifier
 + *
 + * This function is used on a descriptor which has been executed by the DMA
 + * controller. It will run any callbacks, submit any dependencies, and then
 + * free the descriptor.
 + */
 +static

Re: [PATCH v2 2/4] fsl-dma: remove attribute DMA_INTERRUPT of dmaengine

2012-07-11 Thread Ira W. Snyder
On Wed, Jul 11, 2012 at 05:00:53PM +0800, Qiang Liu wrote:
 Delete attribute DMA_INTERRUPT because fsl-dma doesn't support this function,
 exception will be thrown if talitos is used to offload xor at the same time.
 

Both drivers/misc/carma/carma-fpga.c and
drivers/misc/carma/carma-fpga-program.c expect the DMA_INTERRUPT
property, though they do not use it. The mask is set for historical
reasons. It is safe to delete the line "dma_cap_set(DMA_INTERRUPT, mask);"
from both drivers.

I don't know which other drivers may expect this feature to be present.
These are only the ones which I maintain.

Other than that, you can add my:
Acked-by: Ira W. Snyder i...@ovro.caltech.edu

 Cc: Dan Williams dan.j.willi...@intel.com
 Cc: Vinod Koul vinod.k...@intel.com
 Cc: Li Yang le...@freescale.com
 Signed-off-by: Qiang Liu qiang@freescale.com
 ---
  drivers/dma/fsldma.c |   31 ---
  1 files changed, 0 insertions(+), 31 deletions(-)
 
 diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
 index 8f84761..4f2f212 100644
 --- a/drivers/dma/fsldma.c
 +++ b/drivers/dma/fsldma.c
 @@ -543,35 +543,6 @@ static void fsl_dma_free_chan_resources(struct dma_chan *dchan)
  }
 
  static struct dma_async_tx_descriptor *
 -fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags)
 -{
 - struct fsldma_chan *chan;
 - struct fsl_desc_sw *new;
 -
 - if (!dchan)
 - return NULL;
 -
 - chan = to_fsl_chan(dchan);
 -
 - new = fsl_dma_alloc_descriptor(chan);
 - if (!new) {
 - chan_err(chan, "%s\n", msg_ld_oom);
 - return NULL;
 - }
 -
 - new->async_tx.cookie = -EBUSY;
 - new->async_tx.flags = flags;
 -
 - /* Insert the link descriptor to the LD ring */
 - list_add_tail(&new->node, &new->tx_list);
 -
 - /* Set End-of-link to the last link descriptor of new list */
 - set_ld_eol(chan, new);
 -
 - return &new->async_tx;
 -}
 -
 -static struct dma_async_tx_descriptor *
  fsl_dma_prep_memcpy(struct dma_chan *dchan,
   dma_addr_t dma_dst, dma_addr_t dma_src,
   size_t len, unsigned long flags)
 @@ -1352,12 +1323,10 @@ static int __devinit fsldma_of_probe(struct platform_device *op)
   fdev->irq = irq_of_parse_and_map(op->dev.of_node, 0);
 
   dma_cap_set(DMA_MEMCPY, fdev->common.cap_mask);
 - dma_cap_set(DMA_INTERRUPT, fdev->common.cap_mask);
   dma_cap_set(DMA_SG, fdev->common.cap_mask);
   dma_cap_set(DMA_SLAVE, fdev->common.cap_mask);
   fdev->common.device_alloc_chan_resources = fsl_dma_alloc_chan_resources;
   fdev->common.device_free_chan_resources = fsl_dma_free_chan_resources;
 - fdev->common.device_prep_dma_interrupt = fsl_dma_prep_interrupt;
   fdev->common.device_prep_dma_memcpy = fsl_dma_prep_memcpy;
   fdev->common.device_prep_dma_sg = fsl_dma_prep_sg;
   fdev->common.device_tx_status = fsl_tx_status;
 --
 1.7.5.1
 
 


Re: [PATCH v2 3/4] fsl-dma: change the release process of dma descriptor

2012-07-11 Thread Ira W. Snyder
On Wed, Jul 11, 2012 at 05:01:25PM +0800, Qiang Liu wrote:
 Modify the release process of DMA descriptors to avoid an exception when
 NET_DMA is enabled: release descriptors from the first to the second-to-last.
 The last descriptor, which is held in the current descriptor register, may
 not have completed, so freeing it would cause a race condition.
 
 A race condition arises when both talitos and dmaengine are used to offload
 XOR, because the NAPI scheduler (with NET_DMA enabled) syncs all pending
 requests in the DMA channels, which disrupts RAID operations. A
 just-submitted descriptor is freed, but async_tx must check whether this
 dependent tx descriptor has been acked. The freed memory holds poison
 contents, so BUG_ON() is triggered. With this change, that descriptor is
 freed on the next pass instead.
 

This patch seems to be covering up a bug in the driver, rather than
actually fixing it.

When it was written, it was expected that dma_do_tasklet() would run
only when the controller was idle.

 Cc: Dan Williams dan.j.willi...@intel.com
 Cc: Vinod Koul vinod.k...@intel.com
 Cc: Li Yang le...@freescale.com
 Signed-off-by: Qiang Liu qiang@freescale.com
 ---
  drivers/dma/fsldma.c |   15 ---
  1 files changed, 12 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
 index 4f2f212..0ba3e40 100644
 --- a/drivers/dma/fsldma.c
 +++ b/drivers/dma/fsldma.c
 @@ -1035,14 +1035,22 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
  static void dma_do_tasklet(unsigned long data)
  {
   struct fsldma_chan *chan = (struct fsldma_chan *)data;
 - struct fsl_desc_sw *desc, *_desc;
 + struct fsl_desc_sw *desc, *_desc, *prev = NULL;
   LIST_HEAD(ld_cleanup);
   unsigned long flags;
 + dma_addr_t curr_phys = get_cdar(chan);
 
   chan_dbg(chan, "tasklet entry\n");
 
   spin_lock_irqsave(&chan->desc_lock, flags);
 
 + /* find the descriptor which is already completed */
 + list_for_each_entry_safe(desc, _desc, &chan->ld_running, node) {
 + if (prev && desc->async_tx.phys == curr_phys)
 + break;
 + prev = desc;
 + }
 +
 +

If the DMA controller was still busy processing transactions, you should
have gotten the printout "irq: controller not idle!" from
fsldma_chan_irq() just before it scheduled the dma_do_tasklet() to run.
If you did not get this printout, how was dma_do_tasklet() entered with
the controller still busy? I don't understand how it can happen.

If you test without your spin_lock_bh() and spin_unlock_bh() conversion
patch, do you still hit the error?

What happens if a user submits exactly one DMA transaction, and then
leaves the system idle? The callback for the last descriptor in the
chain will never get run, right? That's a bug.

   /* update the cookie if we have some descriptors to cleanup */
   if (!list_empty(&chan->ld_running)) {
   dma_cookie_t cookie;
 @@ -1058,13 +1066,14 @@ static void dma_do_tasklet(unsigned long data)
* move the descriptors to a temporary list so we can drop the lock
* during the entire cleanup operation
*/
 - list_splice_tail_init(&chan->ld_running, &ld_cleanup);
 + list_cut_position(&ld_cleanup, &chan->ld_running, &prev->node);
 
   /* the hardware is now idle and ready for more */
   chan-idle = true;
 
   /*
 -  * Start any pending transactions automatically
 +  * Start any pending transactions automatically if current descriptor
 +  * list is completed
*
* In the ideal case, we keep the DMA controller busy while we go
* ahead and free the descriptors below.
 --
 1.7.5.1
 
 


Re: Replacement to of_register_platform_driver ?

2012-06-13 Thread Ira W. Snyder
On Wed, Jun 13, 2012 at 05:21:22PM +0200, Guillaume Dargaud wrote:
 Hello all,
 I just updated to the most recent kernel and a driver I wrote last year 
 won't compile:
 xad.c:534:2: error: implicit declaration of function 
 'of_register_platform_driver'
 
 I see references to this function as 'obsolete' but could not find 
 what's the new recommended way to do things. Here's a bit of the 
 offending code:
 
 static struct of_platform_driver xad_driver = {
   .probe  = xad_driver_probe,
   .remove = xad_driver_remove,
   .driver = {
   .owner = THIS_MODULE,
   .name = "xad-driver",
   .of_match_table = xad_device_id,
   },
 };
 
 ...
 
 static int __init xad_init(void) {
 ...   
   first = MKDEV (my_major, my_minor);
   register_chrdev_region(first, count, DEVNAME);
   my_cdev = cdev_alloc ();
   if (NULL==my_cdev) goto Err;
   
   cdev_init(my_cdev, &fops);
   rc=cdev_add (my_cdev, first, count);
 ...   
   rc = of_register_platform_driver(&xad_driver);
 ...
 }
 
 
 -- 
 Guillaume Dargaud
 http://www.gdargaud.net/

The history of drivers/misc/carma/carma-fpga.c will show you the code
changes necessary. Specifically, these two commits perform the
conversion:

493340207 carma-fpga: Missed switch from of_register_platform_driver()
b00e126ff MISC: convert drivers/misc/* to use module_platform_driver()

Hope it helps,
Ira


Re: [PATCH 1/1] fsldma: ignore end of segments interrupt

2012-02-16 Thread Ira W. Snyder
On Thu, Feb 16, 2012 at 05:50:47PM +, Tabi Timur-B04825 wrote:
 On Thu, Jan 26, 2012 at 2:58 PM, Ira W. Snyder i...@ovro.caltech.edu wrote:
  The mpc8349ea has been observed to generate spurious end of segments
  interrupts despite the fact that they are not enabled by this driver.
  Check for them and ignore them to avoid a kernel error message.
 
 When this happens, are there any other status bits set?  It seems
 weird that there are spurious interrupts from an internal block,
 especially since it's the same block on all 83xx parts.
 
 I wonder if the EOSI bit just happens to be set when the interrupt
 occurs for some other reason.
 

I'm not sure. The fsldma irq handler only prints bits it did not handle.
There are several other bits in the driver which should never be seen,
but they are handled by the irq handler anyway. This is just a remnant
from the original Freescale code.
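
To illustrate the handling described above, here is a minimal user-space
sketch (not the driver code) of a handler that clears each status bit it
recognizes and returns whatever is left over. The SR_* names and bit values
are assumptions chosen for illustration; only the 0x2 value for
End-of-Segments is suggested by the log message quoted in this thread.

```c
#include <stdint.h>

/* Hypothetical status bits; the real definitions live in the driver. */
#define SR_EOCDI 0x00000001u /* end-of-chain interrupt (assumed value) */
#define SR_EOSI  0x00000002u /* end-of-segments interrupt */
#define SR_EOLNI 0x00000008u /* end-of-links interrupt (assumed value) */

/*
 * Clear every bit the handler knows about and return what remains.
 * A non-zero return value corresponds to an "unhandled sr" message.
 * ignore_eosi selects whether End-of-Segments counts as a known bit.
 */
static uint32_t handle_status(uint32_t stat, int ignore_eosi)
{
	if (stat & SR_EOCDI)
		stat &= ~SR_EOCDI;
	if (stat & SR_EOLNI)
		stat &= ~SR_EOLNI;
	if (ignore_eosi && (stat & SR_EOSI))
		stat &= ~SR_EOSI;
	return stat;
}
```

With ignore_eosi set to 0, a status of SR_EOCDI | SR_EOSI leaves 0x2 behind,
which is the situation that produces the error message. With it set to 1,
nothing is left over.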

I have a set of 15 test boards that I can use to figure out which other
bits are set when this happens, if it is important.

I put a variation of this patch (missing the "skip tasklet if not idle"
logic) into my production boards roughly a month ago. I've gotten the
"controller not idle" error message 748 times, as compared to the
"unhandled sr 0x0002" message 3449 times.

This leads me to believe that this occurs mostly (but not always)
concurrent with the end-of-chain interrupt.
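
The "skip tasklet if not idle" logic mentioned above can be modelled in a
few lines of user-space C. This is only a sketch under assumptions: the
sim_chan structure, its counters, and irq_tail() are hypothetical names, and
the hw_idle flag stands in for whatever the hardware idle check reports.

```c
#include <stdbool.h>

struct sim_chan {
	bool hw_idle;          /* what a hardware idle check would report */
	unsigned int tasklets; /* cleanup tasklets scheduled so far */
	unsigned int not_idle; /* "controller not idle" events seen */
};

/*
 * Tail of a simulated interrupt handler: only schedule descriptor
 * cleanup when the controller is idle. Returns true if cleanup was
 * scheduled, false if it was skipped.
 */
static bool irq_tail(struct sim_chan *chan)
{
	if (!chan->hw_idle) {
		chan->not_idle++;
		return false;
	}
	chan->tasklets++;
	return true;
}
```

The point of the design is that a spurious interrupt arriving while a
transfer is still running only bumps the not_idle counter and never starts
cleanup of descriptors the controller may still be using.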

In the last month, the "unhandled sr" error has occurred on 92 out of
120 boards in production use. The statistics are included below. On some
boards, it is much more frequent than on others. All boards have roughly
the same workload.

Another interesting tidbit from my logs: this only occurs on DMA channel
2 (they are numbered starting at 0, so it is the 3rd channel). Here is an
example log message:

[3484053.821689] of:fsl-elo-dma e00082a8.dma: chan2: irq: unhandled sr 0x0002

Thanks,
Ira

 15 serial-number-5
  1 serial-number-16
  8 serial-number-18
 16 serial-number-19
  3 serial-number-20
 21 serial-number-21
  1 serial-number-24
  1 serial-number-26
  3 serial-number-27
  2 serial-number-28
 16 serial-number-29
  4 serial-number-30
  1 serial-number-31
  4 serial-number-32
  5 serial-number-33
  1 serial-number-34
  6 serial-number-35
 18 serial-number-36
  1 serial-number-39
  1 serial-number-40
  2 serial-number-41
 10 serial-number-42
 11 serial-number-43
 32 serial-number-45
  6 serial-number-46
  4 serial-number-47
  1 serial-number-49
  6 serial-number-50
  2 serial-number-51
  4 serial-number-53
  1 serial-number-55
  1 serial-number-57
 15 serial-number-58
  1 serial-number-60
  1 serial-number-62
  1 serial-number-66
  8 serial-number-67
  2 serial-number-75
  1 serial-number-76
 11 serial-number-79
  4 serial-number-80
  8 serial-number-81
  1 serial-number-82
 11 serial-number-84
  2 serial-number-92
 20 serial-number-93
 30 serial-number-94
 19 serial-number-95
 32 serial-number-96
 73 serial-number-97
 18 serial-number-99
 57 serial-number-100
 41 serial-number-101
 28 serial-number-102
  8 serial-number-103
132 serial-number-107
 60 serial-number-108
 55 serial-number-109
 97 serial-number-110
 18 serial-number-111
 45 serial-number-113
  6 serial-number-114
123 serial-number-115
 27 serial-number-117
 29 serial-number-118
 12 serial-number-119
 47 serial-number-120
 74 serial-number-121
  8 serial-number-124
128 serial-number-125
326 serial-number-128
 84 serial-number-129
 36 serial-number-130
  2 serial-number-131
 75 serial-number-133
 64 serial-number-135
686 serial-number-137
 97 serial-number-139
 28 serial-number-140
 82 serial-number-141
 36 serial-number-144
 31 serial-number-145
 47 serial-number-147
 60 serial-number-150
 22 serial-number-152
 36 serial-number-154
 57 serial-number-156
 68 serial-number-158
 54 serial-number-159
 37 serial-number-160
 46 serial-number-161
 14 serial-number-162


Re: [PATCH 1/1] fsldma: ignore end of segments interrupt

2012-02-16 Thread Ira W. Snyder
On Thu, Feb 16, 2012 at 01:34:00PM -0600, Timur Tabi wrote:
 Ira W. Snyder wrote:
  This leads me to believe that this occurs mostly (but not always)
  concurrent with the end-of-chain interrupt.
 
 Have you tested this on an 85xx platform?
 

No. I don't have the ability to connect my P2020 up to an FPGA to
recreate the DMA workload that causes this on my 8349EA. I can run the
dmatest module, if you'd like.

 I noticed something odd.  You're modifying fsldma_chan_irq(), which is for
 DMA controllers that have per-channel IRQs.  83xx devices don't have
 per-channel IRQs -- all channels on one controller have the same IRQ.
 Looking at the device tree, I see that the IRQs are listed in the channel
 nodes *and* in the controller node.  I don't see how we ever use the
 per-controller ISR.
 

fsldma_ctrl_irq() (the per-controller irq handler) just calls through to
fsldma_chan_irq() (the per-channel irq handler).
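
As a rough illustration of that call-through, here is a user-space sketch
of a per-controller handler dispatching to a per-channel one. It is not the
driver source: the struct names, MAX_CHANS, and the layout of one status
byte per channel (most significant byte first) are assumptions made for the
sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CHANS 4 /* assumed channel count per controller */

struct chan {
	int id;
	unsigned int irq_count; /* times the per-channel handler ran */
};

struct ctrl {
	uint32_t gsr;                 /* snapshot of a summary status word */
	struct chan *chan[MAX_CHANS]; /* NULL for unregistered channels */
};

/* Stand-in for the per-channel handler. */
static void chan_irq(struct chan *c)
{
	c->irq_count++;
}

/*
 * Stand-in for the per-controller handler: walk the channels, and call
 * the per-channel handler for each registered channel whose status byte
 * is non-zero. Returns the number of channels serviced.
 */
static unsigned int ctrl_irq(struct ctrl *d)
{
	uint32_t mask = 0xff000000u;
	unsigned int handled = 0;

	for (size_t i = 0; i < MAX_CHANS; i++, mask >>= 8) {
		struct chan *c = d->chan[i];

		if (c && (d->gsr & mask)) {
			chan_irq(c);
			handled++;
		}
	}
	return handled;
}
```

Either way, the same per-channel code ends up running; the only difference
is who decides which channel to service.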

 I wonder if the shared IRQ is the part of the cause of the interrupts
 you're seeing.
 

My device tree is slightly modified to remove the per-controller
interrupts and interrupt-parent properties. Each individual channel has
identical interrupts and interrupt-parent properties specified.

Someone here suggested that I do that, several years ago. It has been
too long, and I do not remember who. I can reverse it, and use the
per-controller IRQ instead.

  
  In the last month, the "unhandled sr" error has occurred on 92 out of
  120 boards in production use. The statistics are included below. On some
  boards, it is much more frequent than on others. All boards have roughly
  the same workload.
  
  Another interesting tidbit from my logs: this only occurs on DMA channel
  2 (they are numbered starting at 0, so it is the 3rd channel). Here is an
  example log message:
 
 What happens if you never register that channel?  That is, remove this
 node from the device tree:
 
 dma-channel@100 {
   compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
   reg = <0x100 0x80>;
   cell-index = <2>;
   interrupt-parent = <&ipic>;
   interrupts = <71 8>;
 };
 

I can try that. I have a hunch the problem will move, as the carma-fpga driver
(see drivers/misc/carma/carma-fpga.c) will claim the 4th channel
instead.

Ira


Re: [PATCH 1/1] fsldma: ignore end of segments interrupt

2012-02-16 Thread Ira W. Snyder
On Thu, Feb 16, 2012 at 01:48:20PM -0600, Timur Tabi wrote:
 Ira W. Snyder wrote:
 
  No. I don't have the ability to connect my P2020 up to an FPGA to
  recreate the DMA workload that causes this on my 8349EA. I can run the
  dmatest module, if you'd like.
 
 I just want to make sure your patch doesn't break 85xx.
 

I tried both with and without this patch on my P2020 COM Express board.
With both kernels, the board locks up after 20 minutes or so, no
messages to the serial console.

I wouldn't be surprised if there are some memory problems with this
board. In any case, I don't have any reason to believe that this patch
causes any trouble: the board dies without it.

However, the patch doesn't break DMA on 85xx. If I unload the dmatest
module after 10 minutes or so, it claims to have passed many thousands
of tests without problems.

My 8349EA test boards (15 of them) have been running their normal DMA
workload plus dmatest on the unused 4th channel, all without errors, for
several hours. ~2.5 million successful tests per board, as I write this.

Ira


[PATCH v2] fsldma: ignore end of segments interrupt

2012-01-31 Thread Ira W. Snyder
The mpc8349ea has been observed to generate spurious end of segments
interrupts despite the fact that they are not enabled by this driver.
Check for them and ignore them to avoid a kernel error message.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
Cc: Dan Williams dan.j.willi...@intel.com
---

Changes v1 -> v2:
- skip the descriptor cleanup tasklet if the controller is not yet idle

 drivers/dma/fsldma.c |   27 ---
 1 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 8a78154..037631a 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -1052,20 +1052,41 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
 		stat &= ~FSL_DMA_SR_EOLNI;
 	}
 
-	/* check that the DMA controller is really idle */
-	if (!dma_is_idle(chan))
-		chan_err(chan, "irq: controller not idle!\n");
+	/*
+	 * This driver does not use this feature, therefore we shouldn't
+	 * ever see this bit set in the status register. However, it has
+	 * been observed on MPC8349EA parts.
+	 */
+	if (stat & FSL_DMA_SR_EOSI) {
+		chan_dbg(chan, "irq: End-of-Segments INT\n");
+		stat &= ~FSL_DMA_SR_EOSI;
+	}
 
 	/* check that we handled all of the bits */
 	if (stat)
 		chan_err(chan, "irq: unhandled sr 0x%08x\n", stat);
 
 	/*
+	 * Check that the DMA controller is really idle
+	 *
+	 * Occasionally on MPC8349EA parts, a spurious End-of-Segments
+	 * interrupt is generated. When this happens, the controller is
+	 * still busy. In this case, we shouldn't run the tasklet to
+	 * clean up idle descriptors, since the controller is not yet idle.
+	 */
+	if (!dma_is_idle(chan)) {
+		chan_err(chan, "irq: controller not idle!\n");
+		goto out_skip_tasklet;
+	}
+
+	/*
 	 * Schedule the tasklet to handle all cleanup of the current
 	 * transaction. It will start a new transaction if there is
 	 * one pending.
 	 */
 	tasklet_schedule(&chan->tasklet);
+
+out_skip_tasklet:
 	chan_dbg(chan, "irq: Exit\n");
 	return IRQ_HANDLED;
 }
-- 
1.7.3.4
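
The acknowledge-and-clear pattern in the handler above can be sketched in
plain userspace C. This is an illustration only: the bit values, the
EXAMPLE_ names and the handle_status() helper are assumptions made for
this example, not the definitions from the fsldma driver.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only; the real values live in the driver. */
#define EXAMPLE_SR_EOLNI 0x00000008u /* end-of-links interrupt */
#define EXAMPLE_SR_EOSI  0x00000002u /* end-of-segments interrupt */

/*
 * Clear every status bit we know how to handle and return whatever is
 * left. A non-zero return value corresponds to the "unhandled sr" error
 * message in the handler.
 */
static uint32_t handle_status(uint32_t stat, int ignore_eosi)
{
	if (stat & EXAMPLE_SR_EOLNI)
		stat &= ~EXAMPLE_SR_EOLNI;

	/* The step the patch adds: acknowledge the spurious bit quietly. */
	if (ignore_eosi && (stat & EXAMPLE_SR_EOSI))
		stat &= ~EXAMPLE_SR_EOSI;

	return stat;
}
```

Calling handle_status() with both bits set and ignore_eosi == 0 leaves
the end-of-segments bit behind, which is the case that triggered the
error message; with ignore_eosi == 1 nothing is left over.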



Re: [PATCH 1/1] carma-fpga: fix race between data dumping and DMA callback

2012-01-27 Thread Ira W. Snyder
On Fri, Jan 27, 2012 at 08:25:37AM +1100, Benjamin Herrenschmidt wrote:
 On Thu, 2012-01-26 at 13:00 -0800, Ira W. Snyder wrote:
  
  @@ -970,7 +984,13 @@ static ssize_t data_en_show(struct device *dev, struct 
  device_attribute *attr,
  char *buf)
   {
  struct fpga_device *priv = dev_get_drvdata(dev);
  -   return snprintf(buf, PAGE_SIZE, "%u\n", priv->enabled);
  +   int ret;
  +
  +   spin_lock_irq(&priv->lock);
  +   ret = snprintf(buf, PAGE_SIZE, "%u\n", priv->enabled);
  +   spin_unlock_irq(&priv->lock);
  +
  +   return ret;
   } 
 
 I don't think the lock buys you anything here.
 

You're right. Feel free to drop the hunk.

Ira

 Cheers,
 Ben.
 
 


[PATCH 1/1] fsldma: ignore end of segments interrupt

2012-01-26 Thread Ira W. Snyder
The mpc8349ea has been observed to generate spurious end of segments
interrupts despite the fact that they are not enabled by this driver.
Check for them and ignore them to avoid a kernel error message.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
Cc: Dan Williams dan.j.willi...@intel.com
---
 drivers/dma/fsldma.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 8a78154..7dc9689 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -1052,6 +1052,16 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
 		stat &= ~FSL_DMA_SR_EOLNI;
 	}
 
+	/*
+	 * This driver does not use this feature, therefore we shouldn't
+	 * ever see this bit set in the status register. However, it has
+	 * been observed on MPC8349EA parts.
+	 */
+	if (stat & FSL_DMA_SR_EOSI) {
+		chan_dbg(chan, "irq: End-of-Segments INT\n");
+		stat &= ~FSL_DMA_SR_EOSI;
+	}
+
 	/* check that the DMA controller is really idle */
 	if (!dma_is_idle(chan))
 		chan_err(chan, "irq: controller not idle!\n");
-- 
1.7.3.4



[PATCH 1/1] carma-fpga: fix lockdep warning

2012-01-26 Thread Ira W. Snyder
Lockdep occasionally complains with the message:
INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected

This is caused by calling videobuf_dma_unmap() under spin_lock_irq(). To
fix the warning, we drop the lock before unmapping and freeing the
buffer.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
---
 drivers/misc/carma/carma-fpga.c |   13 +++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/carma/carma-fpga.c b/drivers/misc/carma/carma-fpga.c
index 14e974b2..4fd896d 100644
--- a/drivers/misc/carma/carma-fpga.c
+++ b/drivers/misc/carma/carma-fpga.c
@@ -1079,6 +1079,7 @@ static ssize_t data_read(struct file *filp, char __user *ubuf, size_t count,
 	struct fpga_reader *reader = filp->private_data;
 	struct fpga_device *priv = reader->priv;
 	struct list_head *used = &priv->used;
+	bool drop_buffer = false;
 	struct data_buf *dbuf;
 	size_t avail;
 	void *data;
@@ -1166,10 +1167,12 @@ have_buffer:
 	 * One of two things has happened, the device is disabled, or the
 	 * device has been reconfigured underneath us. In either case, we
 	 * should just throw away the buffer.
+	 *
+	 * Lockdep complains if this is done under the spinlock, so we
+	 * handle it during the unlock path.
 	 */
 	if (!priv->enabled || dbuf->size != priv->bufsize) {
-		videobuf_dma_unmap(priv->dev, &dbuf->vb);
-		data_free_buffer(dbuf);
+		drop_buffer = true;
 		goto out_unlock;
 	}
 
@@ -1178,6 +1181,12 @@ have_buffer:
 
 out_unlock:
 	spin_unlock_irq(&priv->lock);
+
+	if (drop_buffer) {
+		videobuf_dma_unmap(priv->dev, &dbuf->vb);
+		data_free_buffer(dbuf);
+	}
+
 	return count;
 }
 
-- 
1.7.3.4
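
The shape of this fix (record the decision under the lock, do the
cleanup after dropping it) can be sketched outside the kernel. The
toy_lock and toy_buf types and every function below are hypothetical
stand-ins invented for this example; they only model the ordering, not
the real spinlock or videobuf calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's lock and buffer types. */
struct toy_lock { bool held; };
struct toy_buf { struct toy_lock *lock; size_t size; void *mem; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = false; }

/* Stands in for the unmap-and-free step, which must not run under the lock. */
static void toy_buf_release(struct toy_buf *b)
{
	assert(!b->lock->held);
	free(b->mem);
	b->mem = NULL;
}

/* Returns true if the buffer was thrown away. */
static bool toy_read(struct toy_buf *b, bool enabled, size_t expected_size)
{
	bool drop_buffer = false;

	toy_lock_acquire(b->lock);
	if (!enabled || b->size != expected_size)
		drop_buffer = true;	/* only record the decision here */
	toy_lock_release(b->lock);

	if (drop_buffer)
		toy_buf_release(b);	/* cleanup runs with the lock dropped */

	return drop_buffer;
}
```

The assert in toy_buf_release() plays the role lockdep plays in the
kernel: it fails if the cleanup is ever reached with the lock held.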



[PATCH 1/1] carma-fpga: fix race between data dumping and DMA callback

2012-01-26 Thread Ira W. Snyder
When the system is under heavy load, we occasionally saw a problem where
the system would receive a legitimate FPGA interrupt at a time when FPGA
interrupts should have been disabled.

This was caused by the data_dma_cb() DMA callback unconditionally
re-enabling FPGA interrupts even when data dumping is disabled. When
data dumping was re-enabled, the irq handler would fire while a DMA was
in progress. The BUG_ON(priv->inflight != NULL); during the second
invocation of the DMA callback caused the system to crash.

To fix the issue, the priv->enabled boolean is moved under the
protection of the priv->lock spinlock. The DMA callback checks the
boolean to know whether to re-enable FPGA interrupts before it returns.

With this change, the driver keeps FPGA interrupts disabled whenever it
expects them to be disabled.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
---
 drivers/misc/carma/carma-fpga.c |  101 +++---
 1 files changed, 61 insertions(+), 40 deletions(-)

diff --git a/drivers/misc/carma/carma-fpga.c b/drivers/misc/carma/carma-fpga.c
index 4fd896d..0cfc5bf 100644
--- a/drivers/misc/carma/carma-fpga.c
+++ b/drivers/misc/carma/carma-fpga.c
@@ -560,6 +560,9 @@ static void data_enable_interrupts(struct fpga_device *priv)
 
 	/* flush the writes */
 	fpga_read_reg(priv, 0, MMAP_REG_STATUS);
+	fpga_read_reg(priv, 1, MMAP_REG_STATUS);
+	fpga_read_reg(priv, 2, MMAP_REG_STATUS);
+	fpga_read_reg(priv, 3, MMAP_REG_STATUS);
 
 	/* switch back to the external interrupt source */
 	iowrite32be(0x3F, priv->regs + SYS_IRQ_SOURCE_CTL);
@@ -591,8 +594,12 @@ static void data_dma_cb(void *data)
 	list_move_tail(&priv->inflight->entry, &priv->used);
 	priv->inflight = NULL;
 
-	/* clear the FPGA status and re-enable interrupts */
-	data_enable_interrupts(priv);
+	/*
+	 * If data dumping is still enabled, then clear the FPGA
+	 * status registers and re-enable FPGA interrupts
+	 */
+	if (priv->enabled)
+		data_enable_interrupts(priv);
 
 	spin_unlock_irqrestore(&priv->lock, flags);
 
@@ -708,6 +715,15 @@ static irqreturn_t data_irq(int irq, void *dev_id)
 
 	spin_lock(&priv->lock);
 
+	/*
+	 * This is an error case that should never happen.
+	 *
+	 * If this driver has a bug and manages to re-enable interrupts while
+	 * a DMA is in progress, then we will hit this statement and should
+	 * start paying attention immediately.
+	 */
+	BUG_ON(priv->inflight != NULL);
+
 	/* hide the interrupt by switching the IRQ driver to GPIO */
 	data_disable_interrupts(priv);
 
@@ -762,11 +778,15 @@ out:
  */
 static int data_device_enable(struct fpga_device *priv)
 {
+	bool enabled;
 	u32 val;
 	int ret;
 
 	/* multiple enables are safe: they do nothing */
-	if (priv->enabled)
+	spin_lock_irq(&priv->lock);
+	enabled = priv->enabled;
+	spin_unlock_irq(&priv->lock);
+	if (enabled)
 		return 0;
 
 	/* check that the FPGAs are programmed */
@@ -797,6 +817,9 @@ static int data_device_enable(struct fpga_device *priv)
 		goto out_error;
 	}
 
+	/* prevent the FPGAs from generating interrupts */
+	data_disable_interrupts(priv);
+
 	/* hookup the irq handler */
 	ret = request_irq(priv->irq, data_irq, IRQF_SHARED, drv_name, priv);
 	if (ret) {
@@ -804,11 +827,13 @@ static int data_device_enable(struct fpga_device *priv)
 		goto out_error;
 	}
 
-	/* switch to the external FPGA IRQ line */
-	data_enable_interrupts(priv);
-
-	/* success, we're enabled */
+	/* allow the DMA callback to re-enable FPGA interrupts */
+	spin_lock_irq(&priv->lock);
 	priv->enabled = true;
+	spin_unlock_irq(&priv->lock);
+
+	/* allow the FPGAs to generate interrupts */
+	data_enable_interrupts(priv);
 	return 0;
 
 out_error:
@@ -834,41 +859,40 @@ out_error:
  */
 static int data_device_disable(struct fpga_device *priv)
 {
-	int ret;
+	spin_lock_irq(&priv->lock);
 
 	/* allow multiple disable */
-	if (!priv->enabled)
+	if (!priv->enabled) {
+		spin_unlock_irq(&priv->lock);
 		return 0;
+	}
+
+	/*
+	 * Mark the device disabled
+	 *
+	 * This stops DMA callbacks from re-enabling interrupts
+	 */
+	priv->enabled = false;
 
-	/* switch to the internal GPIO IRQ line */
+	/* prevent the FPGAs from generating interrupts */
 	data_disable_interrupts(priv);
 
+	/* wait until all ongoing DMA has finished */
+	while (priv->inflight != NULL) {
+		spin_unlock_irq(&priv->lock);
+		wait_event(priv->wait, priv->inflight == NULL);
+		spin_lock_irq(&priv->lock);
+	}
+
+	spin_unlock_irq(&priv->lock

Re: [PATCH] fsldma: fix performance degradation by optimizing spinlock use.

2012-01-11 Thread Ira W. Snyder
On Wed, Jan 11, 2012 at 07:54:55AM +, Shi Xuelin-B29237 wrote:
 Hello Iris,
 
 As we discussed in the previous patch, I add one smp_mb() in fsl_tx_status.
 In my testing with iozone, this smp_mb() could cause 1%~2% performance 
 degradation.
 Anyway it is acceptable for me. Do you have any other comments?
 

This patch looks fine to me.

Ira

 -Original Message-
 From: Shi Xuelin-B29237 
 Sent: December 26, 2011 14:01
 To: i...@ovro.caltech.edu; vinod.k...@intel.com; dan.j.willi...@intel.com; 
 linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org
 Cc: Shi Xuelin-B29237
 Subject: [PATCH] fsldma: fix performance degradation by optimizing spinlock 
 use.
 
 From: Forrest shi b29...@freescale.com
 
 dma status check function fsl_tx_status is heavily called in
 a tight loop and the desc lock in fsl_tx_status contended by
 the dma status update function. this caused the dma performance
 degrades much.
 
 this patch releases the lock in the fsl_tx_status function, and
 introduce the smp_mb() to avoid possible memory inconsistency.
 
 Signed-off-by: Forrest Shi xuelin@freescale.com
 ---
  drivers/dma/fsldma.c |6 +-
  1 files changed, 1 insertions(+), 5 deletions(-)
 
 diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 
 8a78154..008fb5e 100644
 --- a/drivers/dma/fsldma.c
 +++ b/drivers/dma/fsldma.c
 @@ -986,15 +986,11 @@ static enum dma_status fsl_tx_status(struct dma_chan 
 *dchan,
   struct fsldma_chan *chan = to_fsl_chan(dchan);
   dma_cookie_t last_complete;
   dma_cookie_t last_used;
 - unsigned long flags;
 -
 - spin_lock_irqsave(&chan->desc_lock, flags);
  
   last_complete = chan->completed_cookie;
 + smp_mb();
   last_used = dchan->cookie;
  
 - spin_unlock_irqrestore(&chan->desc_lock, flags);
 -
   dma_set_tx_state(txstate, last_complete, last_used, 0);
   return dma_async_is_complete(cookie, last_complete, last_used); }
 --
 1.7.0.4
 
 

[PATCH] powerpc: fix kernel log of oops/panic instruction dump

2012-01-06 Thread Ira W. Snyder
A kernel oops/panic prints an instruction dump showing several
instructions before and after the instruction which caused the
oops/panic.

The code intended that the faulting instruction be enclosed in angle
brackets, however a bug caused the faulting instruction to be
interpreted by printk() as the message log level.

To fix this, the KERN_CONT log level is added before the actual text of
the printed message.

=== Before the patch ===

[ 1081.587266] Instruction dump:
[ 1081.590236] 7c000110 7cf8 5400077c 552907f6 7d290378 992b0003 4e800020 
3801
[ 1081.598034] 3d20c03a 9009a114 7c0004ac 3920
[ 1081.602500]  4e800020 3803ffd0 2b89

4[ 1081.587266] Instruction dump:
4[ 1081.590236] 7c000110 7cf8 5400077c 552907f6 7d290378 992b0003 
4e800020 3801
4[ 1081.598034] 3d20c03a 9009a114 7c0004ac 3920
9809[ 1081.602500]  4e800020 3803ffd0 2b89

=== After the patch ===

[   51.385216] Instruction dump:
[   51.388186] 7c000110 7cf8 5400077c 552907f6 7d290378 992b0003 4e800020 
3801
[   51.395986] 3d20c03a 9009a114 7c0004ac 3920 9809 4e800020 3803ffd0 
2b89

4[   51.385216] Instruction dump:
4[   51.388186] 7c000110 7cf8 5400077c 552907f6 7d290378 992b0003 
4e800020 3801
4[   51.395986] 3d20c03a 9009a114 7c0004ac 3920 9809 4e800020 
3803ffd0 2b89

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
Cc: Paul Mackerras pau...@samba.org
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
Cc: linuxppc-dev@lists.ozlabs.org
---

In the above examples, the first block is what is shown on the serial
console as the machine dies. The second block is the dump as captured by
mtdoops.

 arch/powerpc/kernel/process.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 6457574..271f809 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -566,12 +566,12 @@ static void show_instructions(struct pt_regs *regs)
 		 */
 		if (!__kernel_text_address(pc) ||
 		     __get_user(instr, (unsigned int __user *)pc)) {
-			printk("XXXXXXXX ");
+			printk(KERN_CONT "XXXXXXXX ");
 		} else {
 			if (regs->nip == pc)
-				printk("<%08x> ", instr);
+				printk(KERN_CONT "<%08x> ", instr);
 			else
-				printk("%08x ", instr);
+				printk(KERN_CONT "%08x ", instr);
 		}
 
 		pc += sizeof(int);
-- 
1.7.3.4
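
Why a bracketed instruction word disappears from the log can be shown
with a toy model of a log-level parser. This is a simplified, assumed
model written for illustration: strip_level_prefix() is a hypothetical
helper, not the kernel's parser, and "<c>" is used here as the
continuation marker that KERN_CONT is assumed to expand to.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy model: if the message starts with "<c>" or "<decimal digits>",
 * that first prefix is consumed as a log level and is not printed.
 * Returns a pointer to the text that would reach the log.
 */
static const char *strip_level_prefix(const char *msg)
{
	char *end;

	if (msg[0] != '<')
		return msg;

	/* continuation marker: consume it and keep the rest verbatim */
	if (msg[1] == 'c' && msg[2] == '>')
		return msg + 3;

	(void)strtoul(msg + 1, &end, 10);
	if (end == msg + 1 || *end != '>')
		return msg;	/* not a well-formed numeric prefix */

	return end + 1;
}
```

In this model a message such as "<12345678> ..." (a made-up instruction
word) loses its bracketed part, while the same message with the
continuation marker in front keeps it, because only the first prefix is
consumed.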



Re: [PATCH] powerpc: fix kernel log of oops/panic instruction dump

2012-01-06 Thread Ira W. Snyder
On Sat, Jan 07, 2012 at 09:50:10AM +1100, Benjamin Herrenschmidt wrote:
 On Fri, 2012-01-06 at 14:34 -0800, Ira W. Snyder wrote:
  A kernel oops/panic prints an instruction dump showing several
  instructions before and after the instruction which caused the
  oops/panic.
  
  The code intended that the faulting instruction be enclosed in angle
  brackets, however a bug caused the faulting instruction to be
  interpreted by printk() as the message log level.
  
  To fix this, the KERN_CONT log level is added before the actualy text of

If you could fix the text above to say 'actual' (instead of 'actualy')
when you commit this, that would be great. Darn typos. :)

  the printed message.
 
 Nice one, thanks.
 
 Cheers,
 Ben.
 
  === Before the patch ===
  
  [ 1081.587266] Instruction dump:
  [ 1081.590236] 7c000110 7cf8 5400077c 552907f6 7d290378 992b0003 
  4e800020 3801
  [ 1081.598034] 3d20c03a 9009a114 7c0004ac 3920
  [ 1081.602500]  4e800020 3803ffd0 2b89
  
  4[ 1081.587266] Instruction dump:
  4[ 1081.590236] 7c000110 7cf8 5400077c 552907f6 7d290378 992b0003 
  4e800020 3801
  4[ 1081.598034] 3d20c03a 9009a114 7c0004ac 3920
  9809[ 1081.602500]  4e800020 3803ffd0 2b89
  
  === After the patch ===
  
  [   51.385216] Instruction dump:
  [   51.388186] 7c000110 7cf8 5400077c 552907f6 7d290378 992b0003 
  4e800020 3801
  [   51.395986] 3d20c03a 9009a114 7c0004ac 3920 9809 4e800020 
  3803ffd0 2b89
  
  4[   51.385216] Instruction dump:
  4[   51.388186] 7c000110 7cf8 5400077c 552907f6 7d290378 992b0003 
  4e800020 3801
  4[   51.395986] 3d20c03a 9009a114 7c0004ac 3920 9809 4e800020 
  3803ffd0 2b89
  
  Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
  Cc: Paul Mackerras pau...@samba.org
  Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
  Cc: linuxppc-dev@lists.ozlabs.org
  ---
  
  In the above examples, the first block is what is shown on the serial
  console as the machine dies. The second block is the dump as captured by
  mtdoops.
  
   arch/powerpc/kernel/process.c |6 +++---
   1 files changed, 3 insertions(+), 3 deletions(-)
  
  diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
  index 6457574..271f809 100644
  --- a/arch/powerpc/kernel/process.c
  +++ b/arch/powerpc/kernel/process.c
  @@ -566,12 +566,12 @@ static void show_instructions(struct pt_regs *regs)
   */
  if (!__kernel_text_address(pc) ||
   __get_user(instr, (unsigned int __user *)pc)) {
  -   printk("XXXXXXXX ");
  +   printk(KERN_CONT "XXXXXXXX ");
  } else {
  if (regs->nip == pc)
  -   printk("<%08x> ", instr);
  +   printk(KERN_CONT "<%08x> ", instr);
  else
  -   printk("%08x ", instr);
  +   printk(KERN_CONT "%08x ", instr);
  }
   
  pc += sizeof(int);
 
 


Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use.

2011-12-02 Thread Ira W. Snyder
On Fri, Dec 02, 2011 at 03:47:27AM +, Shi Xuelin-B29237 wrote:
 Hi Iris,
 
 I'm convinced that smp_rmb() is needed when removing the spinlock. As 
 noted, Documentation/memory-barriers.txt says that stores on one CPU can be
 observed by another CPU in a different order.
 Previously, there was an UNLOCK (in fsl_dma_tx_submit) followed by a LOCK 
 (in fsl_tx_status). This provided a full barrier, forcing the operations 
 to 
 complete correctly when viewed by the second CPU. 
 
 I do not agree this smp_rmb() works here. Because when this smp_rmb() 
 executed and begin to read chan-common.cookie, you still cannot avoid the 
 order issue. Something like one is reading old value, but another CPU is 
 updating the new value. 
 
 My point is here the order is not important for the DMA decision.
 Completed DMA tx is decided as not complete is not a big deal, because next 
 time it will be OK.
 
 I believe there is no case that could cause uncompleted DMA tx is decided as 
 completed, because the fsl_tx_status is called after fsl_dma_tx_submit for a 
 specific cookie. If you can give me an example here, I will agree with you.
 

According to memory-barriers.txt, writes to main memory may be observed in
any order if memory barriers are not used. This means that writes can
appear to happen in a different order than they were issued by the CPU.

Citing from the text:

 There are certain things that the Linux kernel memory barriers do not 
 guarantee:

  (*) There is no guarantee that any of the memory accesses specified before a
  memory barrier will be _complete_ by the completion of a memory barrier
  instruction; the barrier can be considered to draw a line in that CPU's
  access queue that accesses of the appropriate type may not cross.

Also:

 Without intervention, CPU 2 may perceive the events on CPU 1 in some
 effectively random order, despite the write barrier issued by CPU 1:

Also:

 When dealing with CPU-CPU interactions, certain types of memory barrier should
 always be paired.  A lack of appropriate pairing is almost certainly an error.

 A write barrier should always be paired with a data dependency barrier or read
 barrier, though a general barrier would also be viable.

Therefore, in an SMP system, the following situation can happen.

descriptor->cookie = 2
chan->common.cookie = 1
chan->completed_cookie = 1

This occurs when CPU-A calls fsl_dma_tx_submit() and then CPU-B calls
dma_async_is_complete() ***after*** CPU-B has observed the write to
descriptor->cookie, and ***before*** CPU-B has observed the write to
chan->common.cookie.

Remember, without barriers, CPU-B can observe CPU-A's memory accesses in
*any possible order*. Memory accesses are not guaranteed to be *complete*
by the time fsl_dma_tx_submit() returns!

With the above values, dma_async_is_complete() returns DMA_COMPLETE. This
is incorrect: the DMA is still in progress. The required invariant
chan->common.cookie >= descriptor->cookie has not been met.
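
The arithmetic behind this scenario can be checked with a small
userspace sketch. It reproduces the cookie comparison as it is commonly
understood (a cookie counts as complete when it falls outside the
window between the last completed and the last used cookie). The
cookie_t typedef, the sketch_ names and the enum are assumptions made
for this example rather than the kernel's own definitions.

```c
#include <assert.h>

typedef int cookie_t;	/* stand-in for the kernel's dma_cookie_t */

enum sketch_status { SKETCH_COMPLETE, SKETCH_IN_PROGRESS };

/*
 * A cookie is reported complete when it lies outside the
 * (last_complete, last_used] window.
 */
static enum sketch_status sketch_is_complete(cookie_t cookie,
					     cookie_t last_complete,
					     cookie_t last_used)
{
	if (last_complete <= last_used) {
		if (cookie <= last_complete || cookie > last_used)
			return SKETCH_COMPLETE;
	} else {
		/* the counters have wrapped around */
		if (cookie <= last_complete && cookie > last_used)
			return SKETCH_COMPLETE;
	}
	return SKETCH_IN_PROGRESS;
}
```

With the stale view (cookie 2, last_complete 1, last_used 1) the sketch
reports the transfer as complete; once last_used is observed as 2, the
same query reports it as still in progress.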

By adding an smp_rmb(), I force CPU-B to stall until *both* stores in
fsl_dma_tx_submit() (descriptor->cookie and chan->common.cookie) actually
hit main memory. This avoids the above situation: all CPUs observe
descriptor->cookie and chan->common.cookie to update in sync with each
other.

Is this unclear in any way?

Please run your test with the smp_rmb() and measure the performance
impact.

Ira


Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use.

2011-11-30 Thread Ira W. Snyder
On Wed, Nov 30, 2011 at 09:57:47AM +, Shi Xuelin-B29237 wrote:
 Hello Ira,
 
 In drivers/dma/dmaengine.c, we have below tight loop to check DMA completion 
 in mainline Linux:
 	do {
 		status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);
 		if (time_after_eq(jiffies, dma_sync_wait_timeout)) {
 			printk(KERN_ERR "dma_sync_wait_timeout!\n");
 			return DMA_ERROR;
 		}
 	} while (status == DMA_IN_PROGRESS);
 

That is the body of dma_sync_wait(). It is mostly used in the raid code.
I understand that you don't want to change the raid code to use
callbacks.

In any case, I think we've strayed from the topic under consideration,
which is: can we remove this spinlock without introducing a bug?

I'm convinced that smp_rmb() is needed when removing the spinlock. As
noted, Documentation/memory-barriers.txt says that stores on one CPU can
be observed by another CPU in a different order.

Previously, there was an UNLOCK (in fsl_dma_tx_submit) followed by a
LOCK (in fsl_tx_status). This provided a full barrier, forcing the
operations to complete correctly when viewed by the second CPU. From the
text:

 Therefore, from (1), (2) and (4) an UNLOCK followed by an unconditional LOCK 
 is
 equivalent to a full barrier, but a LOCK followed by an UNLOCK is not.

Also, please read EXAMPLES OF MEMORY BARRIER SEQUENCES and INTER-CPU
LOCKING BARRIER EFFECTS. Particularly, in EXAMPLES OF MEMORY BARRIER
SEQUENCES, the text notes:

 Without intervention, CPU 2 may perceive the events on CPU 1 in some
 effectively random order, despite the write barrier issued by CPU 1:

 [snip diagram]

 And thirdly, a read barrier acts as a partial order on loads. Consider the
 following sequence of events:

 [snip diagram]

 Without intervention, CPU 2 may then choose to perceive the events on CPU 1 in
 some effectively random order, despite the write barrier issued by CPU 1:

 [snip diagram]


And so on. Please read this entire section in the document.
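
The pairing rule quoted above can be illustrated with C11 fences in
userspace. This is an analogy under stated assumptions, not kernel code:
the release fence stands in for a write barrier, the acquire fence for
the paired read barrier, and payload, ready, publish() and
try_consume() are names invented for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;		/* plain data, published by the writer */
static atomic_int ready;	/* flag that announces the payload */

/* Writer side: the release fence plays the role of a write barrier. */
static void publish(int value)
{
	payload = value;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side: the acquire fence plays the role of the paired read barrier. */
static bool try_consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_relaxed))
		return false;
	atomic_thread_fence(memory_order_acquire);
	*out = payload;
	return true;
}
```

If either fence is removed, a reader on another CPU is allowed to see
the flag set while still reading an old payload, which is the kind of
reordering the quoted text warns about.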

I can't give you an ACK on the proposed patch. To the best of my
understanding, I believe it introduces a bug. I've tried to provide as
much evidence for this belief as I can, in the form of documentation in
the kernel source tree. If you can cite some documentation that shows I
am wrong, I will happily change my mind!

Ira

 -Original Message-
 From: Ira W. Snyder [mailto:i...@ovro.caltech.edu] 
 Sent: November 30, 2011 1:26
 To: Li Yang-R58472
 Cc: Shi Xuelin-B29237; vinod.k...@intel.com; dan.j.willi...@intel.com; 
 linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org
 Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing 
 spinlock use.
 
 On Tue, Nov 29, 2011 at 03:19:05AM +, Li Yang-R58472 wrote:
   Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by 
   optimizing spinlock use.
   
   On Thu, Nov 24, 2011 at 08:12:25AM +, Shi Xuelin-B29237 wrote:
Hi Ira,
   
Thanks for your review.
   
After second thought, I think your scenario may not occur.
Because the cookie 20 we query must be returned by 
fsl_dma_tx_submit(...) in
   practice.
We never query a cookie not returned by fsl_dma_tx_submit(...).
   
   
   I agree about this part.
   
When we call fsl_tx_status(20), the chan-common.cookie is 
definitely wrote as
   20 and cpu2 could not read as 19.
   
   
   This is what I don't agree about. However, I'm not an expert on CPU cache 
   vs.
   memory accesses in an multi-processor system. The section titled 
   CACHE COHERENCY in Documentation/memory-barriers.txt leads me to 
   believe that the scenario I described is possible.
  
  For Freescale PowerPC, the chip automatically takes care of cache 
  coherency.  Even if this is a concern, spinlock can't address it.
  
   
   What happens if CPU1's write of chan-common.cookie only goes into 
   CPU1's cache. It never makes it to main memory before CPU2 fetches the 
   old value of 19.
   
   I don't think you should see any performance impact from the 
   smp_mb() operation.
  
  Smp_mb() do have impact on performance if it's in the hot path.  While it 
  might be safer having it, I doubt it is really necessary.  If the CPU1 
  doesn't have the updated last_used, it's shouldn't have known there is a 
  cookie 20 existed either.
  
 
 I believe that you are correct, for powerpc. However, anything outside of 
 arch/powerpc shouldn't assume it only runs on powerpc. I wouldn't be 
 surprised to see fsldma running on an iMX someday (ARM processor).
 
 My interpretation says that the change introduces the possibility that
 fsl_tx_status() returns the wrong answer for an extremely small time window, 
 on SMP only, based on Documentation/memory-barriers.txt. But I can't seem 
 to convince you.
 
 My real question is what code path is hitting this spinlock? Is it in 
 mainline Linux? Why is it polling rather than using callbacks to determine 
 DMA completion

Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use.

2011-11-29 Thread Ira W. Snyder
On Tue, Nov 29, 2011 at 03:19:05AM +, Li Yang-R58472 wrote:
  Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing
  spinlock use.
  
  On Thu, Nov 24, 2011 at 08:12:25AM +, Shi Xuelin-B29237 wrote:
   Hi Ira,
  
   Thanks for your review.
  
   After second thought, I think your scenario may not occur.
   Because the cookie 20 we query must be returned by fsl_dma_tx_submit(...) 
   in
  practice.
   We never query a cookie not returned by fsl_dma_tx_submit(...).
  
  
  I agree about this part.
  
   When we call fsl_tx_status(20), the chan-common.cookie is definitely 
   wrote as
  20 and cpu2 could not read as 19.
  
  
  This is what I don't agree about. However, I'm not an expert on CPU cache 
  vs.
  memory accesses in an multi-processor system. The section titled CACHE
  COHERENCY in Documentation/memory-barriers.txt leads me to believe that the
  scenario I described is possible.
 
 For Freescale PowerPC, the chip automatically takes care of cache coherency.  
 Even if this is a concern, spinlock can't address it.
 
  
  What happens if CPU1's write of chan-common.cookie only goes into CPU1's
  cache. It never makes it to main memory before CPU2 fetches the old value 
  of 19.
  
  I don't think you should see any performance impact from the smp_mb()
  operation.
 
 Smp_mb() do have impact on performance if it's in the hot path.  While it 
 might be safer having it, I doubt it is really necessary.  If the CPU1 
 doesn't have the updated last_used, it's shouldn't have known there is a 
 cookie 20 existed either.
 

I believe that you are correct, for powerpc. However, anything outside
of arch/powerpc shouldn't assume it only runs on powerpc. I wouldn't be
surprised to see fsldma running on an iMX someday (ARM processor).

My interpretation says that the change introduces the possibility that
fsl_tx_status() returns the wrong answer for an extremely small time
window, on SMP only, based on Documentation/memory-barriers.txt. But I
can't seem to convince you.

My real question is what code path is hitting this spinlock? Is it in
mainline Linux? Why is it polling rather than using callbacks to
determine DMA completion?

Thanks,
Ira

   -Original Message-
   From: Ira W. Snyder [mailto:i...@ovro.caltech.edu]
   Sent: November 23, 2011 2:59
   To: Shi Xuelin-B29237
   Cc: dan.j.willi...@intel.com; Li Yang-R58472; z...@zh-kernel.org;
   vinod.k...@intel.com; linuxppc-dev@lists.ozlabs.org;
   linux-ker...@vger.kernel.org
   Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by 
   optimizing
  spinlock use.
  
   On Tue, Nov 22, 2011 at 12:55:05PM +0800, b29...@freescale.com wrote:
From: Forrest Shi b29...@freescale.com
   
dma status check function fsl_tx_status is heavily called in
a tight loop and the desc lock in fsl_tx_status contended by
the dma status update function. this caused the dma performance
degrades much.
   
this patch releases the lock in the fsl_tx_status function.
I believe it has no neglect impact on the following call of
dma_async_is_complete(...).
   
we can see below three conditions will be identified as success
a)  x  complete  use
b)  x  complete+N  use+N
c)  x  complete  use+N
here complete is the completed_cookie, use is the last_used
cookie, x is the querying cookie, N is MAX cookie
   
when chan-completed_cookie is being read, the last_used may
be incresed. Anyway it has no neglect impact on the dma status
decision.
   
Signed-off-by: Forrest Shi xuelin@freescale.com
---
 drivers/dma/fsldma.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)
   
diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index
8a78154..1dca56f 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -986,15 +986,10 @@ static enum dma_status fsl_tx_status(struct
  dma_chan *dchan,
struct fsldma_chan *chan = to_fsl_chan(dchan);
dma_cookie_t last_complete;
dma_cookie_t last_used;
-   unsigned long flags;
-
-   spin_lock_irqsave(chan-desc_lock, flags);
   
  
   This will cause a bug. See below for a detailed explanation. You need 
   this instead:
  
 /*
  * On an SMP system, we must ensure that this CPU has seen the
  * memory accesses performed by another CPU under the
  * chan->desc_lock spinlock.
  */
 smp_mb();
last_complete = chan->completed_cookie;
last_used = dchan->cookie;
   
-   spin_unlock_irqrestore(&chan->desc_lock, flags);
-
dma_set_tx_state(txstate, last_complete, last_used, 0);
return dma_async_is_complete(cookie, last_complete, last_used); 
 }
  
   Facts:
   - dchan->cookie is the same member as chan->common.cookie (same memory
   location)
   - chan->common.cookie is the last allocated cookie for a pending

Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use.

2011-11-28 Thread Ira W. Snyder
On Thu, Nov 24, 2011 at 08:12:25AM +, Shi Xuelin-B29237 wrote:
 Hi Ira,
 
 Thanks for your review.
 
 After second thought, I think your scenario may not occur.
 Because the cookie 20 we query must be returned by fsl_dma_tx_submit(...) in 
 practice. 
 We never query a cookie not returned by fsl_dma_tx_submit(...).
 

I agree about this part.

 When we call fsl_tx_status(20), the chan-common.cookie is definitely wrote 
 as 20 and cpu2 could not read as 19.
 

This is what I don't agree about. However, I'm not an expert on CPU
cache vs. memory accesses in a multi-processor system. The section
titled CACHE COHERENCY in Documentation/memory-barriers.txt leads me
to believe that the scenario I described is possible.

What happens if CPU1's write of chan->common.cookie only goes into
CPU1's cache, and never makes it to main memory before CPU2 fetches the
old value of 19?

I don't think you should see any performance impact from the smp_mb()
operation.

Thanks,
Ira

 -Original Message-
 From: Ira W. Snyder [mailto:i...@ovro.caltech.edu] 
 Sent: November 23, 2011 2:59
 To: Shi Xuelin-B29237
 Cc: dan.j.willi...@intel.com; Li Yang-R58472; z...@zh-kernel.org; 
 vinod.k...@intel.com; linuxppc-dev@lists.ozlabs.org; 
 linux-ker...@vger.kernel.org
 Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing 
 spinlock use.
 
 On Tue, Nov 22, 2011 at 12:55:05PM +0800, b29...@freescale.com wrote:
  From: Forrest Shi b29...@freescale.com
  
  dma status check function fsl_tx_status is heavily called in
  a tight loop and the desc lock in fsl_tx_status contended by
  the dma status update function. this caused the dma performance
  degrades much.
  
  this patch releases the lock in the fsl_tx_status function.
  I believe it has no neglect impact on the following call of
  dma_async_is_complete(...).
  
  we can see below three conditions will be identified as success
  a)  x < complete < use
  b)  x < complete+N < use+N
  c)  x < complete < use+N
  here complete is the completed_cookie, use is the last_used
  cookie, x is the querying cookie, N is MAX cookie
  
  when chan->completed_cookie is being read, the last_used may
  be increased. Anyway it has no neglect impact on the dma status
  decision.
  
  Signed-off-by: Forrest Shi xuelin@freescale.com
  ---
   drivers/dma/fsldma.c |5 -
   1 files changed, 0 insertions(+), 5 deletions(-)
  
  diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 
  8a78154..1dca56f 100644
  --- a/drivers/dma/fsldma.c
  +++ b/drivers/dma/fsldma.c
  @@ -986,15 +986,10 @@ static enum dma_status fsl_tx_status(struct dma_chan 
  *dchan,
  struct fsldma_chan *chan = to_fsl_chan(dchan);
  dma_cookie_t last_complete;
  dma_cookie_t last_used;
  -   unsigned long flags;
  -
  -   spin_lock_irqsave(&chan->desc_lock, flags);
   
 
 This will cause a bug. See below for a detailed explanation. You need this 
 instead:
 
   /*
    * On an SMP system, we must ensure that this CPU has seen the
    * memory accesses performed by another CPU under the
    * chan->desc_lock spinlock.
    */
   smp_mb();
  last_complete = chan->completed_cookie;
  last_used = dchan->cookie;
   
  -   spin_unlock_irqrestore(&chan->desc_lock, flags);
  -
  dma_set_tx_state(txstate, last_complete, last_used, 0);
  return dma_async_is_complete(cookie, last_complete, last_used);
  }
 
 Facts:
 - dchan->cookie is the same member as chan->common.cookie (same memory 
 location)
 - chan->common.cookie is the last allocated cookie for a pending transaction
 - chan->completed_cookie is the last completed transaction
 
 I have replaced dchan->cookie with chan->common.cookie in the below 
 explanation, to keep everything referenced from the same structure.
 
 Variable usage before your change. Everything is used locked.
 - RW chan->common.cookie      (fsl_dma_tx_submit)
 - R  chan->common.cookie      (fsl_tx_status)
 - R  chan->completed_cookie   (fsl_tx_status)
 - W  chan->completed_cookie   (dma_do_tasklet)
 
 Variable usage after your change:
 - RW chan->common.cookie      LOCKED
 - R  chan->common.cookie      NO LOCK
 - R  chan->completed_cookie   NO LOCK
 - W  chan->completed_cookie   LOCKED
 
 What if we assume that you have a 2 CPU system (such as a P2020). After your 
 changes, one possible sequence is:
 
 === CPU1 - allocate + submit descriptor: fsl_dma_tx_submit() === 
 spin_lock_irqsave
 descriptor->cookie = 20   (x in your example)
 chan->common.cookie = 20  (used in your example)
 spin_unlock_irqrestore
 
 === CPU2 - immediately calls fsl_tx_status() ===
 chan->common.cookie == 19
 chan->completed_cookie == 19
 descriptor->cookie == 20
 
 Since we don't have locks anymore, CPU2 may not have seen the write to
 chan->common.cookie yet.
 
 Also assume that the DMA hardware has not started processing the transaction 
 yet

Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing spinlock use.

2011-11-22 Thread Ira W. Snyder
On Tue, Nov 22, 2011 at 12:55:05PM +0800, b29...@freescale.com wrote:
 From: Forrest Shi b29...@freescale.com
 
 The DMA status check function fsl_tx_status() is called heavily in
 a tight loop, and the desc_lock it takes is contended by the DMA
 status update function. This degrades DMA performance significantly.
 
 This patch removes the locking from fsl_tx_status().
 I believe it has no negative impact on the following call to
 dma_async_is_complete(...).
 
 The following three conditions will be identified as success:
 a)  x < complete < use
 b)  x < complete+N < use+N
 c)  x < complete < use+N
 Here complete is the completed_cookie, use is the last_used
 cookie, x is the cookie being queried, and N is the maximum cookie.
 
 While chan->completed_cookie is being read, last_used may
 be increased. Either way, this has no negative impact on the DMA
 status decision.
 
 Signed-off-by: Forrest Shi xuelin@freescale.com
 ---
  drivers/dma/fsldma.c |5 -
  1 files changed, 0 insertions(+), 5 deletions(-)
 
 diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
 index 8a78154..1dca56f 100644
 --- a/drivers/dma/fsldma.c
 +++ b/drivers/dma/fsldma.c
 @@ -986,15 +986,10 @@ static enum dma_status fsl_tx_status(struct dma_chan 
 *dchan,
   struct fsldma_chan *chan = to_fsl_chan(dchan);
   dma_cookie_t last_complete;
   dma_cookie_t last_used;
 - unsigned long flags;
 -
 - spin_lock_irqsave(&chan->desc_lock, flags);
  

This will cause a bug. See below for a detailed explanation. You need
this instead:

/*
 * On an SMP system, we must ensure that this CPU has seen the
 * memory accesses performed by another CPU under the
 * chan->desc_lock spinlock.
 */
smp_mb();
   last_complete = chan->completed_cookie;
   last_used = dchan->cookie;
  
 - spin_unlock_irqrestore(&chan->desc_lock, flags);
 -
   dma_set_tx_state(txstate, last_complete, last_used, 0);
   return dma_async_is_complete(cookie, last_complete, last_used);
  }

Facts:
- dchan->cookie is the same member as chan->common.cookie (same memory location)
- chan->common.cookie is the last allocated cookie for a pending transaction
- chan->completed_cookie is the last completed transaction

I have replaced dchan->cookie with chan->common.cookie in the below
explanation, to keep everything referenced from the same structure.

Variable usage before your change. Everything is used locked.
- RW chan->common.cookie      (fsl_dma_tx_submit)
- R  chan->common.cookie      (fsl_tx_status)
- R  chan->completed_cookie   (fsl_tx_status)
- W  chan->completed_cookie   (dma_do_tasklet)

Variable usage after your change:
- RW chan->common.cookie      LOCKED
- R  chan->common.cookie      NO LOCK
- R  chan->completed_cookie   NO LOCK
- W  chan->completed_cookie   LOCKED

What if we assume that you have a 2 CPU system (such as a P2020). After
your changes, one possible sequence is:

=== CPU1 - allocate + submit descriptor: fsl_dma_tx_submit() ===
spin_lock_irqsave
descriptor->cookie = 20   (x in your example)
chan->common.cookie = 20  (used in your example)
spin_unlock_irqrestore

=== CPU2 - immediately calls fsl_tx_status() ===
chan->common.cookie == 19
chan->completed_cookie == 19
descriptor->cookie == 20

Since we don't have locks anymore, CPU2 may not have seen the write to
chan->common.cookie yet.

Also assume that the DMA hardware has not started processing the
transaction yet. Therefore dma_do_tasklet() has not been called, and
chan->completed_cookie has not been updated.

In this case, dma_async_is_complete() (on CPU2) returns DMA_SUCCESS,
even though the DMA operation has not succeeded. The DMA operation has
not even started yet!

The smp_mb() fixes this, since it forces CPU2 to have seen all memory
operations that happened before CPU1 released the spinlock. Spinlocks
are implicit SMP memory barriers.

Therefore, the above example becomes:
smp_mb();
chan->common.cookie == 20
chan->completed_cookie == 19
descriptor->cookie == 20

Then dma_async_is_complete() returns DMA_IN_PROGRESS, which is correct.
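
The two cases above can be checked with a small user-space sketch. It
assumes the comparison in dma_async_is_complete() is as recalled from
include/linux/dmaengine.h of this era; the typedef, the enum and the helper
names is_complete(), stale_view() and fresh_view() are local stand-ins for
illustration, not kernel code.

```c
/* Local stand-ins for the kernel types; the names mirror the kernel's. */
typedef int dma_cookie_t;
enum dma_status { DMA_SUCCESS, DMA_IN_PROGRESS };

/*
 * Assumed to match the comparison in dma_async_is_complete(), including
 * the branch that handles cookie wrap-around.
 */
enum dma_status is_complete(dma_cookie_t cookie,
			    dma_cookie_t last_complete,
			    dma_cookie_t last_used)
{
	if (last_complete <= last_used) {
		if (cookie <= last_complete || cookie > last_used)
			return DMA_SUCCESS;
	} else {
		if (cookie <= last_complete && cookie > last_used)
			return DMA_SUCCESS;
	}
	return DMA_IN_PROGRESS;
}

/* CPU2 with a stale view: chan->common.cookie still reads 19. */
enum dma_status stale_view(void)
{
	return is_complete(20, 19, 19);
}

/* CPU2 with an up-to-date view: chan->common.cookie reads 20. */
enum dma_status fresh_view(void)
{
	return is_complete(20, 19, 20);
}
```

With the stale value, cookie 20 is greater than last_used (19), so it is
treated as a cookie from before the wrap-around and reported as complete.
With the fresh value, neither comparison matches and the result is
DMA_IN_PROGRESS.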

Thanks,
Ira
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: PCI DMA to user mem on mpc83xx

2011-05-23 Thread Ira W. Snyder
On Mon, May 23, 2011 at 11:12:41AM +0200, Andre Schwarz wrote:
 Ira,
 
 we have a pretty old PCI device driver here that needs some basic rework 
 running on 2.6.27 on several MPC83xx.
 It's a simple char-device with "give me some data" implemented using 
 read() resulting in zero-copy DMA to user mem.
 
 There's get_user_pages() working under the hood along with 
 SetPageDirty() and page_cache_release().
 
 Main goal is to prepare a sg-list that gets fed into a DMA controller.
 
 I wonder if there's a more up-to-date/efficient and future proof scheme 
 of creating the mapping.
 
 
 Could you provide some pointers or would you stick to the current scheme ?
 

This scheme is the best you'll come up with for zero-copy IO. I used
get_user_pages_fast(), but otherwise my implementation was the same.
These interfaces should be fairly future proof.

In the end, I realized that most of my transfers were 4 bytes in length,
and zero copy IO was a waste of effort. I decided to use mmap instead.

Ira


Re: [PATCH RFCv7 0/2] CARMA Board Support

2011-05-19 Thread Ira W. Snyder
On Thu, May 19, 2011 at 02:13:41PM +1000, Benjamin Herrenschmidt wrote:
 On Fri, 2011-02-11 at 15:34 -0800, Ira W. Snyder wrote:
  Hello everyone,
  
  This is the seventh posting of these drivers, taking into account comments
  from earlier postings. I've made sure that the drivers both pass checkpatch
  without any errors or warnings. I would appreciate as much review as you
  can offer, so that these can get into the next merge cycle. They've been
  sitting outside mainline for far too long.
 
 This has been bitrotting for way too long indeed. I'm sticking this into
 powerpc -next today.
 

Thanks Ben.

I'll grab the -next tree and make sure it builds on my board. I don't
think any APIs have changed, but I will send an updated version if they
have.

Thanks,
Ira

  RFCv6 -> RFCv7:
  - reference count private data structure (to support unbind)
  - use #defines instead of hex values for registers
  - keep lines <=80 characters
  
  RFCv5 -> RFCv6:
  - change locking in several functions
  - use list_move_tail() to simplify code
  - remove unused helper functions
  
  RFCv4 -> RFCv5:
  - remove unnecessary locking per review comments
  - do not clobber return values from *_interruptible()
  - explicitly track buffer DMA mapping
  - use #defines instead of raw hex addresses
  - change enable sysfs attribute to root-writeable only
  
  RFCv3 -> RFCv4:
  - updates for DATA-FPGA version 2
  
  RFCv2 -> RFCv3:
  - use miscdevice framework (removing the carma class)
  - add bitfile readback capability to the programmer
  
  RFCv1 -> RFCv2:
  - change comments to kerneldoc format
  - Kconfig improvements
  - use the videobuf_dma_sg API in the programmer
  - updates for Freescale DMAEngine DMA_SLAVE API changes
  
  KNOWN ISSUES:
  - untested with a setup that can generate interrupts (will get access soon)
  - does not handle runtime unbind
  
  Information about the CARMA board:
  
  The CARMA board is essentially an MPC8349EA MDS reference design with a
  1GHz ADC and 4 high powered data processing FPGAs connected to the local
  bus. It is all packed into a compact PCI form factor. It is used at the
  Owens Valley Radio Observatory as the main component in the correlator
  system.
  
  For board information, see:
  http://www.mmarray.org/~dwh/carma_board/index.html
  
  For DATA-FPGA register layout, see:
  http://www.mmarray.org/memos/carma_memo46.pdf
  
  These drivers are the necessary pieces to get the data processing FPGAs
  working and producing data. Despite the fact that the hardware is custom
  and we are the only users, I'd still like to get the drivers upstream.
  Several people have suggested that this is possible.
  
  Some further patches will be forthcoming. I have a driver for the LED
  subsystem and the PPS subsystem. The LED register layout is expected to
  change soon, so I won't post the driver until that is finished. The PPS
  driver will be posted separately from this patch series; it is very
  generic.
  
  Thanks to everyone who has provided comments on earlier versions!
  
  Ira W. Snyder (2):
misc: add CARMA DATA-FPGA Access Driver
misc: add CARMA DATA-FPGA Programmer support
  
   drivers/misc/Kconfig|1 +
   drivers/misc/Makefile   |1 +
   drivers/misc/carma/Kconfig  |   18 +
   drivers/misc/carma/Makefile |2 +
   drivers/misc/carma/carma-fpga-program.c | 1141 
   drivers/misc/carma/carma-fpga.c | 1433 
  +++
   6 files changed, 2596 insertions(+), 0 deletions(-)
   create mode 100644 drivers/misc/carma/Kconfig
   create mode 100644 drivers/misc/carma/Makefile
   create mode 100644 drivers/misc/carma/carma-fpga-program.c
   create mode 100644 drivers/misc/carma/carma-fpga.c
  
 
 


Re: [RFC v2] virtio: add virtio-over-PCI driver

2011-05-06 Thread Ira W. Snyder
On Fri, May 06, 2011 at 12:00:34PM +, Kushwaha Prabhakar-B32579 wrote:
 Hi,
 
 I want to use this patch as base patch for FSL 85xx platform to support 
 PCIe Agent.
 The work looks to be little old now. So wanted to understand if any 
 development has happened further on it.
 
 In case no, I would take this work forward for PCIe Agent. 
 
 Any help/suggestions are most appreciated in this regard.
 

Hi Prabhakar,

I use PCI agent mode on an mpc8349emds board. All of the important setup is
done very early in the boot process, by U-Boot. Search the U-Boot source
for CONFIG_PCISLAVE. My hunch is that the setup needed for 85xx boards is
similar.

This virtio-over-PCI work is now very old. It was intended to provide a
communication mechanism between a PCI Master and many PCI Agents (slaves).
Dave Miller (networking maintainer) suggested using virtio for this so
that many different devices could be used, such as:
- network interface
- serial port (for serial console)

I am aware of other ongoing work in this area. Specifically, some ARM
developers are working on a virtio API using their message registers. This
work is much newer, and will be a much better starting place for you.

Search the virtualization mailing list for:
[PATCH 00/02] virtio: Virtio platform driver

Here is a link to some of their code:
http://www.spinics.net/lists/linux-sh/msg07188.html

I am currently using a custom driver to provide a network device on my PCI
agents. If you search the mailing list archives for PCINet, you will find
early versions of the driver. I am happy to provide you a current copy. It
does not use virtio at all, and is unlikely to be accepted into mainline
Linux.

I am happy to provide any of my code if you think it would help you get
started. Specifically, the current version of PCINet shows how to use the
DMA controller in order to get good network performance. I am also happy to
help port code to 83xx, as well as test on 83xx. Please ask any questions
you may have.

I have people ask about this code about once every two months. There is
plenty of interest in a mainline Linux solution to this problem. :) I
will be moving to 85xx someday, and I hope there is an accepted mainline
solution by then.

I hope it helps,
Ira

 -Original Message-
 From: linux-kernel-ow...@vger.kernel.org 
 [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Ira Snyder
 Sent: Friday, 27 February, 2009 3:19 AM
 To: Arnd Bergmann
 Cc: linux-ker...@vger.kernel.org; Rusty Russell; Jan-Bernd Themann; 
 linuxppc-...@ozlabs.org; net...@vger.kernel.org
 Subject: Re: [RFC v2] virtio: add virtio-over-PCI driver
 
 On Thu, Feb 26, 2009 at 09:37:14PM +0100, Arnd Bergmann wrote:
  On Thursday 26 February 2009, Ira Snyder wrote:
   On Thu, Feb 26, 2009 at 05:15:27PM +0100, Arnd Bergmann wrote:
  
   I think so too. I was just getting something working, and thought it 
   would be better to have it out there rather than be working on it 
   forever. I'll try to break things up as I have time.
  
  Ok, perfect!
   
   For the libraries, would you suggest breaking things into separate 
   code files, and using EXPORT_SYMBOL_GPL()? I'm not very familiar 
   with doing that, I've mostly been writing code within the existing 
   device driver frameworks. Or do I need export symbol at all? I'm not 
   sure...
  
  You have both options. When you list each file as a separate module in 
  the Makefile, you use EXPORT_SYMBOL_GPL to mark functions that get 
  called by dependent modules, but this will work only in one way.
  
  You can also link multiple files together into one module, although it 
  is less common to link a single source file into multiple modules.
  
 
 Ok. I'm more familiar with the EXPORT_SYMBOL_GPL interface, so I'll do that. 
 If we decide it sucks later, we'll change it.
 
   I always thought you were supposed to use packed for data structures 
   that are external to the system. I purposely designed the structures 
   so they wouldn't need padding.
  
  That would only make sense for structures that are explicitly 
  unaligned, like a register layout using
  
  struct my_registers {
  __le16 first;
  __le32 second __attribute__((packed));
  __le16 third;
  };
  
  Even here, I'd recommend listing the individual members as packed 
  rather than the entire struct. Obviously if you layout the members in 
  a sane way, you don't need either.
  
 
 Ok. I'll drop the __attribute__((packed)) and make sure there aren't 
 problems. I don't suspect any, though.
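
The layout difference discussed here can be sketched as follows. This is
only an illustration, assuming GCC or Clang (the __attribute__ syntax is
compiler-specific) and a typical ABI where uint32_t is 4-byte aligned.
uint16_t and uint32_t stand in for the kernel's __le16 and __le32, and the
struct names are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Member-level packing: 'second' is allowed to start at offset 2. */
struct regs_member_packed {
	uint16_t first;
	uint32_t second __attribute__((packed));
	uint16_t third;
};

/* No packing: the compiler pads after 'first' and after 'third'. */
struct regs_natural {
	uint16_t first;
	uint32_t second;
	uint16_t third;
};

/* Members ordered so that neither padding nor packing is needed. */
struct regs_sane {
	uint32_t second;
	uint16_t first;
	uint16_t third;
};
```

Under those assumptions the first struct is 8 bytes with 'second' at
offset 2, the second is 12 bytes with 'second' at offset 4, and the third
is 8 bytes with no attribute at all, which is the "sane layout" case.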
 
   I mostly don't need it. In fact, the only place I'm using registers 
   not specific to the messaging unit is in the probe routine, where I 
   setup the 1GB window into host memory and setting up access to the 
   guest memory on the PCI bus.
  
  You could add the registers you need for this to the reg property of 
  your device, to be mapped with of_iomap.
  
  If the registers for setting up this window don't logically fit into 
  the same device as the one you 

Re: tmpfs size restriction

2011-04-20 Thread Ira W. Snyder
On Wed, Apr 20, 2011 at 09:21:00PM +0200, Schwarz,Andre wrote:
 Hi,
 
 I'm facing an issue with tmpfs mounts on PowerPC (mpc83xx specifically).
 
 After mount -t tmpfs tmpfs /tmp -o size=16m I can fill the machine's mem
 (512MiB) until oom becomes active.
 
 I can't see this on any other machine (x86/ARM) I have access to.
 There's always a no space left on device message as soon as size specified 
 is
 reached ...
 
 kernel versions available are v2.6.26.27 and v2.6.34.7 showing the same
 behaviour.
 
 I'd expect the kernel to limit available tmpfs size to 50% of physical memory
 anyway.
 
 Any ideas what might be wrong ?
 

For what it is worth, I tried this on an 8349EA board, using 2.6.38-rc8.
It behaved exactly as I would expect. A short log is below. Maybe your
mount command parses options differently on the powerpc machine? Try it
with the mount options before the mount points?

iws@carmaboard7 ~ $ mkdir mnt
mkdir: cannot create directory `mnt': File exists
iws@carmaboard7 ~ $ ls mnt/
iws@carmaboard7 ~ $ sudo mount -t tmpfs -o size=16m,users none mnt
iws@carmaboard7 ~ $ ls mnt/
iws@carmaboard7 ~ $ mount | grep mnt
none on /home/iws/mnt type tmpfs (rw,nosuid,nodev,noexec,relatime,size=16384k)
iws@carmaboard7 ~ $ cd ^C
iws@carmaboard7 ~ $ dd if=/dev/zero of=mnt/file.bin bs=1M count=18
dd: writing `mnt/file.bin': No space left on device
16+0 records in
15+0 records out
16760832 bytes (17 MB) copied, 0.313836 s, 53.4 MB/s
iws@carmaboard7 ~ $ du -b mnt/file.bin 
16760832	mnt/file.bin
iws@carmaboard7 ~ $ df -h mnt
Filesystem            Size  Used Avail Use% Mounted on
none                   16M     -     -   -  /home/iws/mnt
iws@carmaboard7 ~ $ uname -a
Linux carmaboard7.correlator.pvt 2.6.38-rc8-00028-g24d6894 #1 Tue Mar 8 
09:48:15 PST 2011 ppc e300c1 GNU/Linux
iws@carmaboard7 ~ $ cat /proc/cpuinfo 
processor   : 0
cpu : e300c1
clock   : 533.28MHz
revision: 3.1 (pvr 8083 0031)
bogomips: 133.29
timebase: 66646782
platform: MPC834x MDS
model   : CARMA
Memory  : 256 MB

Hope it helps,
Ira


Re: platform_driver/of_platform_driver compile warning in fsldma.c

2011-04-08 Thread Ira W. Snyder
On Fri, Apr 08, 2011 at 04:12:13AM -0500, Kumar Gala wrote:
 Grant,
 
 I'm being lazy, can you give any quick insight on the following compile 
 warning:
 
 drivers/dma/fsldma.c:1457:2: warning: initialization from incompatible 
 pointer type
 drivers/dma/fsldma.c: In function 'fsldma_init':
 drivers/dma/fsldma.c:1468:2: warning: passing argument 1 of 
 'platform_driver_register' from incompatible pointer type
 include/linux/platform_device.h:124:12: note: expected 'struct 
 platform_driver *' but argument is of type 'struct of_platform_driver *'
 drivers/dma/fsldma.c: In function 'fsldma_exit':
 drivers/dma/fsldma.c:1473:2: warning: passing argument 1 of 
 'platform_driver_unregister' from incompatible pointer type
 include/linux/platform_device.h:125:13: note: expected 'struct 
 platform_driver *' but argument is of type 'struct of_platform_driver *'
 

The struct of_platform_driver needs to be changed to a
struct platform_driver. Just remove the of_ prefix, the structure
initialization is correct. I sent a patch for this yesterday to LKML. The
title is: fsldma: fix build warning caused by of_platform_device changes

Ira


Re: Using dmaengine on Freescale P2020 RDB

2011-04-06 Thread Ira W. Snyder
On Wed, Apr 06, 2011 at 12:40:58PM -0700, Chuck Ketcham wrote:
 All,
 
 I have a Freescale P2020 Reference Design Board.  I am investigating the 
 possibility of using the dmaengine capability in the 2.6.32.13 kernel to 
 transfer data from memory out onto the PCIe bus.  As a first step, I thought 
 I would try the DMA test client (dmatest.ko) to make sure the dmaengine was 
 functioning.  I know this doesn't transfer anything over PCIe but only 
 transfers from one memory buffer to another, but I figured I need to get this 
 working first.  Anyway I built dmatest.ko and ran it (with insmod), and 
 discovered it didn't do anything.  I added some printk's to the kernel to 
 investigate what was going on and I found that all attempts to find a channel 
 within dma_request_channel were unsuccessful.  Three of the channels were not 
 used because they were already publicly allocated.  One channel was not used 
 because it didn't have DMA_MEMCPY capability.
 
 Here are my questions then:
 1. Is the dmaengine the appropriate method to use for transferring data from 
 memory out onto the PCIe bus?
 2. If dmaengine is correct, what can I do to free up a channel for my own use?
 

I use the Freescale DMA engine to transfer lots of data out to PCI, on
an 8349EA chip. The P2020 DMA engine uses the same driver.

My hunch is that you have enabled CONFIG_NET_DMA, which will claim the
channels. You should disable it to free the channels for other uses.

If you want an example of using the DMA engine to transfer from DDR
memory to the PowerPC local bus, search the mailing list archives for
CARMA Board Drivers (RFCv7 was the latest posting). Transferring from
DDR to PCI works exactly the same way.

Hope it helps,
Ira


Re: Using dmaengine on Freescale P2020 RDB

2011-04-06 Thread Ira W. Snyder
On Wed, Apr 06, 2011 at 01:29:05PM -0700, Chuck Ketcham wrote:
 Ira,
 
 Thanks for the reference to the CARMA drivers.  I will have to take a look at 
 that.
 
 In my case, CONFIG_NET_DMA is not enabled.  However, I did notice the 
 following entry in my p2020rdb.dts file that may have something to do with 
 dma channels being allocated -- can anyone interpret this?:
 
 dma@21300 {
     #address-cells = <1>;
     #size-cells = <1>;
     compatible = "fsl,eloplus-dma";
     reg = <0x21300 0x4>;
     ranges = <0x0 0x21100 0x200>;
     cell-index = <0>;
     dma-channel@0 {
         compatible = "fsl,eloplus-dma-channel";
         reg = <0x0 0x80>;
         cell-index = <0>;
         interrupt-parent = <&mpic>;
         interrupts = <20 2>;
     };
     dma-channel@80 {
         compatible = "fsl,eloplus-dma-channel";
         reg = <0x80 0x80>;
         cell-index = <1>;
         interrupt-parent = <&mpic>;
         interrupts = <21 2>;
     };
     dma-channel@100 {
         compatible = "fsl,eloplus-dma-channel";
         reg = <0x100 0x80>;
         cell-index = <2>;
         interrupt-parent = <&mpic>;
         interrupts = <22 2>;
     };
     dma-channel@180 {
         compatible = "fsl,eloplus-dma-channel";
         reg = <0x180 0x80>;
         cell-index = <3>;
         interrupt-parent = <&mpic>;
         interrupts = <23 2>;
     };
 };
 
 

Your DTS file looks fine. It is what I would expect to see. The channels
are not allocated by anything here.

Turning on CONFIG_DMADEVICES_DEBUG may give you some insight into how
the dmaengine core is allocating the channels. I don't have any better
advice. I'm afraid you'll have to figure out who is requesting all of
the channels on your own.

Ira



[PATCH v3 1/9] dmatest: fix automatic buffer unmap type

2011-03-03 Thread Ira W. Snyder
The dmatest code relies on the DMAEngine API to automatically call
dma_unmap_single() on src buffers. The flags it passes are incorrect,
fix them.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/dmatest.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index 5589358..7e1b0aa 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -285,7 +285,12 @@ static int dmatest_func(void *data)
 
set_user_nice(current, 10);
 
-   flags = DMA_CTRL_ACK | DMA_COMPL_SKIP_DEST_UNMAP | DMA_PREP_INTERRUPT;
+   /*
+* src buffers are freed by the DMAEngine code with dma_unmap_single()
+* dst buffers are freed by ourselves below
+*/
+   flags = DMA_CTRL_ACK | DMA_PREP_INTERRUPT
+ | DMA_COMPL_SKIP_DEST_UNMAP | DMA_COMPL_SRC_UNMAP_SINGLE;
 
while (!kthread_should_stop()
       && !(iterations && total_tests >= iterations)) {
-- 
1.7.3.4



[PATCH v3 2/9] fsldma: move related helper functions near each other

2011-03-03 Thread Ira W. Snyder
This is a purely cosmetic cleanup. It is nice to have related functions
right next to each other in the code.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  116 +++--
 1 files changed, 64 insertions(+), 52 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 4de947a..2e1af45 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -39,33 +39,9 @@
 
 static const char msg_ld_oom[] = "No free memory for link descriptor\n";
 
-static void dma_init(struct fsldma_chan *chan)
-{
-   /* Reset the channel */
-   DMA_OUT(chan, &chan->regs->mr, 0, 32);
-
-   switch (chan->feature & FSL_DMA_IP_MASK) {
-   case FSL_DMA_IP_85XX:
-   /* Set the channel to below modes:
-* EIE - Error interrupt enable
-* EOSIE - End of segments interrupt enable (basic mode)
-* EOLNIE - End of links interrupt enable
-* BWC - Bandwidth sharing among channels
-*/
-   DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_BWC
-   | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE
-   | FSL_DMA_MR_EOSIE, 32);
-   break;
-   case FSL_DMA_IP_83XX:
-   /* Set the channel to below modes:
-* EOTIE - End-of-transfer interrupt enable
-* PRC_RM - PCI read multiple
-*/
-   DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_EOTIE
-   | FSL_DMA_MR_PRC_RM, 32);
-   break;
-   }
-}
+/*
+ * Register Helpers
+ */
 
 static void set_sr(struct fsldma_chan *chan, u32 val)
 {
@@ -77,6 +53,30 @@ static u32 get_sr(struct fsldma_chan *chan)
return DMA_IN(chan, &chan->regs->sr, 32);
 }
 
+static void set_cdar(struct fsldma_chan *chan, dma_addr_t addr)
+{
+   DMA_OUT(chan, &chan->regs->cdar, addr | FSL_DMA_SNEN, 64);
+}
+
+static dma_addr_t get_cdar(struct fsldma_chan *chan)
+{
+   return DMA_IN(chan, &chan->regs->cdar, 64) & ~FSL_DMA_SNEN;
+}
+
+static dma_addr_t get_ndar(struct fsldma_chan *chan)
+{
+   return DMA_IN(chan, &chan->regs->ndar, 64);
+}
+
+static u32 get_bcr(struct fsldma_chan *chan)
+{
+   return DMA_IN(chan, &chan->regs->bcr, 32);
+}
+
+/*
+ * Descriptor Helpers
+ */
+
 static void set_desc_cnt(struct fsldma_chan *chan,
struct fsl_dma_ld_hw *hw, u32 count)
 {
@@ -113,24 +113,49 @@ static void set_desc_next(struct fsldma_chan *chan,
hw->next_ln_addr = CPU_TO_DMA(chan, snoop_bits | next, 64);
 }
 
-static void set_cdar(struct fsldma_chan *chan, dma_addr_t addr)
+static void set_ld_eol(struct fsldma_chan *chan,
+   struct fsl_desc_sw *desc)
 {
-   DMA_OUT(chan, &chan->regs->cdar, addr | FSL_DMA_SNEN, 64);
-}
+   u64 snoop_bits;
 
-static dma_addr_t get_cdar(struct fsldma_chan *chan)
-{
-   return DMA_IN(chan, &chan->regs->cdar, 64) & ~FSL_DMA_SNEN;
-}
+   snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_83XX)
+   ? FSL_DMA_SNEN : 0;
 
-static dma_addr_t get_ndar(struct fsldma_chan *chan)
-{
-   return DMA_IN(chan, &chan->regs->ndar, 64);
+   desc->hw.next_ln_addr = CPU_TO_DMA(chan,
+   DMA_TO_CPU(chan, desc->hw.next_ln_addr, 64) | FSL_DMA_EOL
+   | snoop_bits, 64);
 }
 
-static u32 get_bcr(struct fsldma_chan *chan)
+/*
+ * DMA Engine Hardware Control Helpers
+ */
+
+static void dma_init(struct fsldma_chan *chan)
 {
-   return DMA_IN(chan, &chan->regs->bcr, 32);
+   /* Reset the channel */
+   DMA_OUT(chan, &chan->regs->mr, 0, 32);
+
+   switch (chan->feature & FSL_DMA_IP_MASK) {
+   case FSL_DMA_IP_85XX:
+   /* Set the channel to below modes:
+* EIE - Error interrupt enable
+* EOSIE - End of segments interrupt enable (basic mode)
+* EOLNIE - End of links interrupt enable
+* BWC - Bandwidth sharing among channels
+*/
+   DMA_OUT(chan, chan-regs-mr, FSL_DMA_MR_BWC
+   | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE
+   | FSL_DMA_MR_EOSIE, 32);
+   break;
+   case FSL_DMA_IP_83XX:
+   /* Set the channel to below modes:
+* EOTIE - End-of-transfer interrupt enable
+* PRC_RM - PCI read multiple
+*/
+   DMA_OUT(chan, chan-regs-mr, FSL_DMA_MR_EOTIE
+   | FSL_DMA_MR_PRC_RM, 32);
+   break;
+   }
 }
 
 static int dma_is_idle(struct fsldma_chan *chan)
@@ -185,19 +210,6 @@ static void dma_halt(struct fsldma_chan *chan)
        dev_err(chan->dev, "DMA halt timeout!\n");
 }
 
-static void set_ld_eol(struct fsldma_chan *chan,
-   struct fsl_desc_sw *desc)
-{
-   u64 snoop_bits;
-
-   snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_83XX

[PATCH v3 0/9] fsldma: lockup fixes

2011-03-03 Thread Ira W. Snyder
Hello everyone,

I've been chasing random infrequent controller lockups in the fsldma driver
for a long time. I finally managed to find the problem and fix it. I'm not
quite sure about the exact sequence of events which causes the race
condition, but it is related to using the hardware registers to track the
controller state. See the patch changelogs for more detail.

The problems were quickly found by turning on DMAPOOL_DEBUG inside
mm/dmapool.c. This poisons memory allocated with the dmapool API.
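
To illustrate why poisoning exposes this class of bug so quickly, here is a minimal userspace sketch. It is not the mm/dmapool.c implementation: `toy_block`, `toy_alloc()`, `toy_free()` and the poison value are hypothetical names and values chosen for this example. The idea is only that a freed block is overwritten with a recognizable pattern, so anything that still reads it (such as a DMA controller following a stale descriptor pointer) sees obvious garbage instead of plausible old data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_POISON_FREED 0xa7  /* illustrative value, chosen for this sketch */
#define TOY_BLOCK_SIZE   32

struct toy_block {
	unsigned char data[TOY_BLOCK_SIZE];
	bool in_use;
};

/* Hand out the block and mark it as owned by the caller. */
static void *toy_alloc(struct toy_block *blk)
{
	blk->in_use = true;
	return blk->data;
}

/* Return the block and overwrite its contents with the poison pattern. */
static void toy_free(struct toy_block *blk)
{
	memset(blk->data, TOY_POISON_FREED, sizeof(blk->data));
	blk->in_use = false;
}

/* True if every byte of the block still carries the poison pattern. */
static bool toy_is_poisoned(const struct toy_block *blk)
{
	for (size_t i = 0; i < sizeof(blk->data); i++)
		if (blk->data[i] != TOY_POISON_FREED)
			return false;
	return true;
}
```

A stale pointer kept across `toy_free()` now reads back the poison byte, which is the userspace analogue of the controller fetching a descriptor that should no longer be in use.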

With dmapool poisoning turned on, the dmatest driver would start producing
failures within a few seconds. After this patchset has been applied, I have
run several iterations of the 10 threads per channel, 10 iterations per
thread test without any problems. I have also tested it with the CARMA
drivers (posted at linuxppc-dev previously), which make use of the external
control features.

While making the previous changes, I noticed that the fsldma driver does
not respect the automatic DMA unmapping of src and dst buffers. I have
added support for this feature. This also required a fix to dmatest, which
was sending incorrect flags.

The "support async_tx dependencies" patch could be split apart from the
automatic unmapping patch if that is desirable. They both touch the same
piece of code, so I thought it was ok to combine them. Let me know.

I would really like to see this go into 2.6.39. I think we can get it
reviewed before then. :)

Many thanks go to Felix Radensky for testing on a P2020 (85xx DMA IP core).
I wouldn't have been able to track down the problems on 85xx without his
diligent testing.

v2 -> v3:
- use chan_dbg() and chan_err() macros for channel printk

v1 -> v2:
- reordered patches (dmatest change is first now)
- fix problems on 85xx controller
- only set correct bits for 83xx in dma_halt()

Ira W. Snyder (9):
  dmatest: fix automatic buffer unmap type
  fsldma: move related helper functions near each other
  fsldma: use channel name in printk output
  fsldma: improve link descriptor debugging
  fsldma: minor codingstyle and consistency fixes
  fsldma: fix controller lockups
  fsldma: support async_tx dependencies and automatic unmapping
  fsldma: reduce locking during descriptor cleanup
  fsldma: make halt behave nicely on all supported controllers

 drivers/dma/dmatest.c |7 +-
 drivers/dma/fsldma.c  |  551 +++--
 drivers/dma/fsldma.h  |6 +-
 3 files changed, 311 insertions(+), 253 deletions(-)

-- 
1.7.3.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH v3 4/9] fsldma: improve link descriptor debugging

2011-03-03 Thread Ira W. Snyder
This adds better tracking to link descriptor allocations, callbacks, and
frees. This makes it much easier to track errors with link descriptors.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   21 +++--
 1 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index e535cd1..82b8e9f 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -420,6 +420,10 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
    desc->async_tx.tx_submit = fsl_dma_tx_submit;
    desc->async_tx.phys = pdesc;
 
+#ifdef FSL_DMA_LD_DEBUG
+   chan_dbg(chan, "LD %p allocated\n", desc);
+#endif
+
    return desc;
 }
 
@@ -470,6 +474,9 @@ static void fsldma_free_desc_list(struct fsldma_chan *chan,
 
    list_for_each_entry_safe(desc, _desc, list, node) {
        list_del(&desc->node);
+#ifdef FSL_DMA_LD_DEBUG
+       chan_dbg(chan, "LD %p free\n", desc);
+#endif
        dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
    }
 }
@@ -481,6 +488,9 @@ static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
 
    list_for_each_entry_safe_reverse(desc, _desc, list, node) {
        list_del(&desc->node);
+#ifdef FSL_DMA_LD_DEBUG
+       chan_dbg(chan, "LD %p free\n", desc);
+#endif
        dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
    }
 }
@@ -557,9 +567,6 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy(
            chan_err(chan, "%s\n", msg_ld_oom);
            goto fail;
        }
-#ifdef FSL_DMA_LD_DEBUG
-       chan_dbg(chan, "new link desc alloc %p\n", new);
-#endif
 
        copy = min(len, (size_t)FSL_DMA_BCR_MAX_CNT);
 
@@ -645,9 +652,6 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_sg(struct dma_chan *dchan,
            chan_err(chan, "%s\n", msg_ld_oom);
            goto fail;
        }
-#ifdef FSL_DMA_LD_DEBUG
-       chan_dbg(chan, "new link desc alloc %p\n", new);
-#endif
 
        set_desc_cnt(chan, &new->hw, len);
        set_desc_src(chan, &new->hw, src);
@@ -882,13 +886,18 @@ static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
        callback_param = desc->async_tx.callback_param;
        if (callback) {
            spin_unlock_irqrestore(&chan->desc_lock, flags);
+#ifdef FSL_DMA_LD_DEBUG
            chan_dbg(chan, "LD %p callback\n", desc);
+#endif
            callback(callback_param);
            spin_lock_irqsave(&chan->desc_lock, flags);
        }
 
        /* Run any dependencies, then free the descriptor */
        dma_run_dependencies(&desc->async_tx);
+#ifdef FSL_DMA_LD_DEBUG
+       chan_dbg(chan, "LD %p free\n", desc);
+#endif
        dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
    }
 
-- 
1.7.3.4



[PATCH v3 3/9] fsldma: use channel name in printk output

2011-03-03 Thread Ira W. Snyder
This makes debugging the driver much easier when multiple channels are
running concurrently. In addition, you can see how much descriptor
memory each channel has allocated via the dmapool API in sysfs.
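
As a rough userspace illustration of the prefixing technique, the sketch below builds a message with the channel name prepended by pasting a `"%s: "` prefix onto the caller's format string at compile time. `struct toy_chan` and `toy_chan_fmt()` are hypothetical stand-ins for this example; the driver's own macros wrap dev_dbg() and dev_err() instead of snprintf().

```c
#include <stdio.h>

struct toy_chan {
	char name[8];
};

/*
 * Format a message with the channel name prepended. "fmt" must be a string
 * literal so that it can be concatenated with the "%s: " prefix at compile
 * time. ##__VA_ARGS__ is a GNU extension (supported by GCC and Clang) that
 * drops the trailing comma when no extra arguments are given.
 */
#define toy_chan_fmt(buf, len, chan, fmt, ...) \
	snprintf(buf, len, "%s: " fmt, (chan)->name, ##__VA_ARGS__)
```

With a channel named `chan2`, `toy_chan_fmt(buf, sizeof(buf), &c, "LD %d free", 7)` fills `buf` with `chan2: LD 7 free`, so interleaved output from several channels stays attributable.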

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   69 +
 drivers/dma/fsldma.h |1 +
 2 files changed, 36 insertions(+), 34 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 2e1af45..e535cd1 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -37,7 +37,12 @@
 
 #include "fsldma.h"
 
-static const char msg_ld_oom[] = "No free memory for link descriptor\n";
+#define chan_dbg(chan, fmt, arg...)                    \
+   dev_dbg(chan->dev, "%s: " fmt, chan->name, ##arg)
+#define chan_err(chan, fmt, arg...)                    \
+   dev_err(chan->dev, "%s: " fmt, chan->name, ##arg)
+
+static const char msg_ld_oom[] = "No free memory for link descriptor";
 
 /*
  * Register Helpers
@@ -207,7 +212,7 @@ static void dma_halt(struct fsldma_chan *chan)
    }
 
    if (!dma_is_idle(chan))
-       dev_err(chan->dev, "DMA halt timeout!\n");
+       chan_err(chan, "DMA halt timeout!\n");
 }
 
 /**
@@ -405,7 +410,7 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
 
    desc = dma_pool_alloc(chan->desc_pool, GFP_ATOMIC, &pdesc);
    if (!desc) {
-       dev_dbg(chan->dev, "out of memory for link desc\n");
+       chan_dbg(chan, "out of memory for link descriptor\n");
        return NULL;
    }
 
@@ -439,13 +444,11 @@ static int fsl_dma_alloc_chan_resources(struct dma_chan *dchan)
     * We need the descriptor to be aligned to 32bytes
     * for meeting FSL DMA specification requirement.
     */
-   chan->desc_pool = dma_pool_create("fsl_dma_engine_desc_pool",
-                                     chan->dev,
+   chan->desc_pool = dma_pool_create(chan->name, chan->dev,
                                      sizeof(struct fsl_desc_sw),
                                      __alignof__(struct fsl_desc_sw), 0);
    if (!chan->desc_pool) {
-       dev_err(chan->dev, "unable to allocate channel %d "
-                          "descriptor pool\n", chan->id);
+       chan_err(chan, "unable to allocate descriptor pool\n");
        return -ENOMEM;
    }
 
@@ -491,7 +494,7 @@ static void fsl_dma_free_chan_resources(struct dma_chan *dchan)
    struct fsldma_chan *chan = to_fsl_chan(dchan);
    unsigned long flags;
 
-   dev_dbg(chan->dev, "Free all channel resources.\n");
+   chan_dbg(chan, "free all channel resources\n");
    spin_lock_irqsave(&chan->desc_lock, flags);
    fsldma_free_desc_list(chan, &chan->ld_pending);
    fsldma_free_desc_list(chan, &chan->ld_running);
@@ -514,7 +517,7 @@ fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags)
 
    new = fsl_dma_alloc_descriptor(chan);
    if (!new) {
-       dev_err(chan->dev, msg_ld_oom);
+       chan_err(chan, "%s\n", msg_ld_oom);
        return NULL;
    }
 
@@ -551,11 +554,11 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy(
        /* Allocate the link descriptor from DMA pool */
        new = fsl_dma_alloc_descriptor(chan);
        if (!new) {
-           dev_err(chan->dev, msg_ld_oom);
+           chan_err(chan, "%s\n", msg_ld_oom);
            goto fail;
        }
 #ifdef FSL_DMA_LD_DEBUG
-       dev_dbg(chan->dev, "new link desc alloc %p\n", new);
+       chan_dbg(chan, "new link desc alloc %p\n", new);
 #endif
 
        copy = min(len, (size_t)FSL_DMA_BCR_MAX_CNT);
@@ -639,11 +642,11 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_sg(struct dma_chan *dchan,
        /* allocate and populate the descriptor */
        new = fsl_dma_alloc_descriptor(chan);
        if (!new) {
-           dev_err(chan->dev, msg_ld_oom);
+           chan_err(chan, "%s\n", msg_ld_oom);
            goto fail;
        }
 #ifdef FSL_DMA_LD_DEBUG
-       dev_dbg(chan->dev, "new link desc alloc %p\n", new);
+       chan_dbg(chan, "new link desc alloc %p\n", new);
 #endif
 
        set_desc_cnt(chan, &new->hw, len);
@@ -815,7 +818,7 @@ static void fsl_dma_update_completed_cookie(struct fsldma_chan *chan)
    spin_lock_irqsave(&chan->desc_lock, flags);
 
    if (list_empty(&chan->ld_running)) {
-       dev_dbg(chan->dev, "no running descriptors\n");
+       chan_dbg(chan, "no running descriptors\n");
        goto out_unlock;
    }
 
@@ -863,7 +866,7 @@ static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
 
    spin_lock_irqsave(&chan->desc_lock, flags);
 
-   dev_dbg(chan->dev, "chan completed_cookie = %d\n", chan->completed_cookie);
+   chan_dbg(chan, chan

[PATCH v3 9/9] fsldma: make halt behave nicely on all supported controllers

2011-03-03 Thread Ira W. Snyder
The original dma_halt() function set the CA (channel abort) bit on both
the 83xx and 85xx controllers. This is incorrect on the 83xx, where this
bit means TEM (transfer error mask) instead. The 83xx doesn't support
channel abort, so we only do this operation on 85xx.
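
The register sequence described above can be sketched in plain C as follows. This is an illustration only: the `TOY_MR_*` bit positions are made up for the example (the real definitions live in fsldma.h), and `toy_halt_writes()` is a hypothetical helper that just records the values that would be written to the mode register.

```c
#include <stdint.h>

/* Illustrative bit positions only; not the real register layout. */
#define TOY_MR_CS      (1u << 0)  /* channel start */
#define TOY_MR_CA      (1u << 3)  /* 85xx: channel abort; 83xx: TEM */
#define TOY_MR_EMS_EN  (1u << 25) /* external master start enable */

enum toy_ip { TOY_IP_83XX, TOY_IP_85XX };

/*
 * Given the current mode register value, fill "out" with the values that
 * would be written to halt the channel and return how many writes there are.
 */
static int toy_halt_writes(enum toy_ip ip, uint32_t mode, uint32_t out[2])
{
	int n = 0;

	if (ip == TOY_IP_85XX) {
		/* pulse channel abort: set it, write, then clear it again */
		mode |= TOY_MR_CA;
		out[n++] = mode;
		mode &= ~TOY_MR_CA;
	}

	/* stop the controller without touching the 83xx TEM bit */
	mode &= ~(TOY_MR_CS | TOY_MR_EMS_EN);
	out[n++] = mode;

	return n;
}
```

On the 83xx path the bit that doubles as TEM passes through unchanged, which is the behavioural difference this patch introduces.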

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   19 ---
 1 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index d300de4..8670a50 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -221,13 +221,26 @@ static void dma_halt(struct fsldma_chan *chan)
    u32 mode;
    int i;
 
+   /* read the mode register */
    mode = DMA_IN(chan, &chan->regs->mr, 32);
-   mode |= FSL_DMA_MR_CA;
-   DMA_OUT(chan, &chan->regs->mr, mode, 32);
 
-   mode &= ~(FSL_DMA_MR_CS | FSL_DMA_MR_EMS_EN | FSL_DMA_MR_CA);
+   /*
+    * The 85xx controller supports channel abort, which will stop
+    * the current transfer. On 83xx, this bit is the transfer error
+    * mask bit, which should not be changed.
+    */
+   if ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) {
+       mode |= FSL_DMA_MR_CA;
+       DMA_OUT(chan, &chan->regs->mr, mode, 32);
+
+       mode &= ~FSL_DMA_MR_CA;
+   }
+
+   /* stop the DMA controller */
+   mode &= ~(FSL_DMA_MR_CS | FSL_DMA_MR_EMS_EN);
    DMA_OUT(chan, &chan->regs->mr, mode, 32);
 
+   /* wait for the DMA controller to become idle */
    for (i = 0; i < 100; i++) {
        if (dma_is_idle(chan))
            return;
-- 
1.7.3.4



[PATCH v3 5/9] fsldma: minor codingstyle and consistency fixes

2011-03-03 Thread Ira W. Snyder
This fixes some minor violations of the coding style. It also changes
the style of the device_prep_dma_*() function definitions so they are
identical.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   29 +
 drivers/dma/fsldma.h |4 ++--
 2 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 82b8e9f..5da1a4a 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -89,7 +89,7 @@ static void set_desc_cnt(struct fsldma_chan *chan,
 }
 
 static void set_desc_src(struct fsldma_chan *chan,
-   struct fsl_dma_ld_hw *hw, dma_addr_t src)
+struct fsl_dma_ld_hw *hw, dma_addr_t src)
 {
u64 snoop_bits;
 
@@ -99,7 +99,7 @@ static void set_desc_src(struct fsldma_chan *chan,
 }
 
 static void set_desc_dst(struct fsldma_chan *chan,
-   struct fsl_dma_ld_hw *hw, dma_addr_t dst)
+struct fsl_dma_ld_hw *hw, dma_addr_t dst)
 {
u64 snoop_bits;
 
@@ -109,7 +109,7 @@ static void set_desc_dst(struct fsldma_chan *chan,
 }
 
 static void set_desc_next(struct fsldma_chan *chan,
-   struct fsl_dma_ld_hw *hw, dma_addr_t next)
+ struct fsl_dma_ld_hw *hw, dma_addr_t next)
 {
u64 snoop_bits;
 
@@ -118,8 +118,7 @@ static void set_desc_next(struct fsldma_chan *chan,
    hw->next_ln_addr = CPU_TO_DMA(chan, snoop_bits | next, 64);
 }
 
-static void set_ld_eol(struct fsldma_chan *chan,
-   struct fsl_desc_sw *desc)
+static void set_ld_eol(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
 {
u64 snoop_bits;
 
@@ -338,8 +337,7 @@ static void fsl_chan_toggle_ext_start(struct fsldma_chan *chan, int enable)
        chan->feature &= ~FSL_DMA_CHAN_START_EXT;
 }
 
-static void append_ld_queue(struct fsldma_chan *chan,
-   struct fsl_desc_sw *desc)
+static void append_ld_queue(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
 {
    struct fsl_desc_sw *tail = to_fsl_desc(chan->ld_pending.prev);
 
@@ -380,8 +378,8 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
    cookie = chan->common.cookie;
    list_for_each_entry(child, &desc->tx_list, node) {
        cookie++;
-       if (cookie < 0)
-           cookie = 1;
+       if (cookie < DMA_MIN_COOKIE)
+           cookie = DMA_MIN_COOKIE;
 
        child->async_tx.cookie = cookie;
    }
@@ -402,8 +400,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
  *
  * Return - The descriptor allocated. NULL for failed.
  */
-static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
-   struct fsldma_chan *chan)
+static struct fsl_desc_sw *fsl_dma_alloc_descriptor(struct fsldma_chan *chan)
 {
struct fsl_desc_sw *desc;
dma_addr_t pdesc;
@@ -427,7 +424,6 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
return desc;
 }
 
-
 /**
  * fsl_dma_alloc_chan_resources - Allocate resources for DMA channel.
  * @chan : Freescale DMA channel
@@ -537,14 +533,15 @@ fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags)
    /* Insert the link descriptor to the LD ring */
    list_add_tail(&new->node, &new->tx_list);
 
-   /* Set End-of-link to the last link descriptor of new list*/
+   /* Set End-of-link to the last link descriptor of new list */
    set_ld_eol(chan, new);
 
    return &new->async_tx;
 }
 
-static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy(
-   struct dma_chan *dchan, dma_addr_t dma_dst, dma_addr_t dma_src,
+static struct dma_async_tx_descriptor *
+fsl_dma_prep_memcpy(struct dma_chan *dchan,
+   dma_addr_t dma_dst, dma_addr_t dma_src,
size_t len, unsigned long flags)
 {
struct fsldma_chan *chan;
@@ -594,7 +591,7 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy(
    new->async_tx.flags = flags; /* client is in control of this ack */
    new->async_tx.cookie = -EBUSY;
 
-   /* Set End-of-link to the last link descriptor of new list*/
+   /* Set End-of-link to the last link descriptor of new list */
    set_ld_eol(chan, new);
 
    return &first->async_tx;
diff --git a/drivers/dma/fsldma.h b/drivers/dma/fsldma.h
index 113e713..49189da 100644
--- a/drivers/dma/fsldma.h
+++ b/drivers/dma/fsldma.h
@@ -102,8 +102,8 @@ struct fsl_desc_sw {
 } __attribute__((aligned(32)));
 
 struct fsldma_chan_regs {
-   u32 mr; /* 0x00 - Mode Register */
-   u32 sr; /* 0x04 - Status Register */
+   u32 mr; /* 0x00 - Mode Register */
+   u32 sr; /* 0x04 - Status Register */
u64 cdar;   /* 0x08 - Current descriptor address register */
u64 sar;/* 0x10 - Source Address Register */
u64 dar;/* 0x18 - Destination Address

[PATCH v3 6/9] fsldma: fix controller lockups

2011-03-03 Thread Ira W. Snyder
Enabling poisoning in the dmapool API quickly showed that the DMA
controller was fetching descriptors that should not have been in use.
This has caused intermittent controller lockups during testing.

I have been unable to figure out the exact set of conditions which cause
this to happen. However, I believe it is related to the driver using the
hardware registers to track whether the controller is busy or not. The
code can incorrectly decide that the hardware is idle due to lag between
register writes and the hardware actually becoming busy.

To fix this, the driver has been reworked to explicitly track the state
of the hardware, rather than try to guess what it is doing based on the
register values.

This has passed dmatest with 10 threads per channel, 10 iterations
per thread several times without error. Previously, this would fail
within a few seconds.
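
The idea of tracking hardware state in software can be sketched with a small model. This is not the driver code: `struct sketch_chan`, `sketch_issue_pending()` and `sketch_complete()` are hypothetical names, and descriptor queues are reduced to counters. The point is that the `idle` flag is only ever changed by software at well-defined moments (start and completion), so a delayed status register update cannot make the code restart a controller that is still busy.

```c
#include <stdbool.h>

struct sketch_chan {
	int pending;   /* descriptors queued but not yet handed to hardware */
	int running;   /* descriptors currently owned by the hardware */
	bool idle;     /* software's own record of the hardware state */
	int starts;    /* how many times the hardware was (re)started */
};

/* Hand the pending queue to the hardware, but only if we know it is idle. */
static void sketch_issue_pending(struct sketch_chan *c)
{
	if (c->pending == 0 || !c->idle)
		return;

	c->running += c->pending;
	c->pending = 0;
	c->starts++;
	c->idle = false;   /* set by software, never inferred from a register */
}

/* Completion handler: retire finished work, then mark the channel idle. */
static void sketch_complete(struct sketch_chan *c)
{
	c->running = 0;
	c->idle = true;
	sketch_issue_pending(c);
}
```

Work submitted while the channel is busy simply stays on the pending count until the completion path marks the channel idle and restarts it.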

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  220 ++
 drivers/dma/fsldma.h |1 +
 2 files changed, 99 insertions(+), 122 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 5da1a4a..6e9ad6e 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -68,11 +68,6 @@ static dma_addr_t get_cdar(struct fsldma_chan *chan)
    return DMA_IN(chan, &chan->regs->cdar, 64) & ~FSL_DMA_SNEN;
 }
 
-static dma_addr_t get_ndar(struct fsldma_chan *chan)
-{
-   return DMA_IN(chan, &chan->regs->ndar, 64);
-}
-
 static u32 get_bcr(struct fsldma_chan *chan)
 {
    return DMA_IN(chan, &chan->regs->bcr, 32);
@@ -143,13 +138,11 @@ static void dma_init(struct fsldma_chan *chan)
case FSL_DMA_IP_85XX:
/* Set the channel to below modes:
 * EIE - Error interrupt enable
-* EOSIE - End of segments interrupt enable (basic mode)
 * EOLNIE - End of links interrupt enable
 * BWC - Bandwidth sharing among channels
 */
        DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_BWC
-   | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE
-   | FSL_DMA_MR_EOSIE, 32);
+   | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE, 32);
break;
case FSL_DMA_IP_83XX:
/* Set the channel to below modes:
@@ -168,25 +161,32 @@ static int dma_is_idle(struct fsldma_chan *chan)
    return (!(sr & FSL_DMA_SR_CB)) || (sr & FSL_DMA_SR_CH);
 }
 
+/*
+ * Start the DMA controller
+ *
+ * Preconditions:
+ * - the CDAR register must point to the start descriptor
+ * - the MRn[CS] bit must be cleared
+ */
 static void dma_start(struct fsldma_chan *chan)
 {
u32 mode;
 
    mode = DMA_IN(chan, &chan->regs->mr, 32);
 
-   if ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) {
-       if (chan->feature & FSL_DMA_CHAN_PAUSE_EXT) {
-           DMA_OUT(chan, &chan->regs->bcr, 0, 32);
-           mode |= FSL_DMA_MR_EMP_EN;
-       } else {
-           mode &= ~FSL_DMA_MR_EMP_EN;
-       }
+   if (chan->feature & FSL_DMA_CHAN_PAUSE_EXT) {
+       DMA_OUT(chan, &chan->regs->bcr, 0, 32);
+       mode |= FSL_DMA_MR_EMP_EN;
+   } else {
+       mode &= ~FSL_DMA_MR_EMP_EN;
    }
 
-   if (chan->feature & FSL_DMA_CHAN_START_EXT)
+   if (chan->feature & FSL_DMA_CHAN_START_EXT) {
        mode |= FSL_DMA_MR_EMS_EN;
-   else
+   } else {
+       mode &= ~FSL_DMA_MR_EMS_EN;
        mode |= FSL_DMA_MR_CS;
+   }
 
    DMA_OUT(chan, &chan->regs->mr, mode, 32);
 }
@@ -760,14 +760,15 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 
switch (cmd) {
case DMA_TERMINATE_ALL:
+       spin_lock_irqsave(&chan->desc_lock, flags);
+
        /* Halt the DMA engine */
        dma_halt(chan);
 
-       spin_lock_irqsave(&chan->desc_lock, flags);
-
        /* Remove and free all of the descriptors in the LD queue */
        fsldma_free_desc_list(chan, &chan->ld_pending);
        fsldma_free_desc_list(chan, &chan->ld_running);
+       chan->idle = true;
 
        spin_unlock_irqrestore(&chan->desc_lock, flags);
return 0;
@@ -805,76 +806,43 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 }
 
 /**
- * fsl_dma_update_completed_cookie - Update the completed cookie.
+ * fsl_chan_ld_cleanup - Clean up link descriptors
  * @chan : Freescale DMA channel
  *
- * CONTEXT: hardirq
+ * This function is run after the queue of running descriptors has been
+ * executed by the DMA engine. It will run any callbacks, and then free
+ * the descriptors.
+ *
+ * HARDWARE STATE: idle
  */
-static void fsl_dma_update_completed_cookie(struct fsldma_chan *chan)
+static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
 {
-   struct fsl_desc_sw *desc;
+   struct fsl_desc_sw *desc, *_desc;
unsigned

[PATCH v3 7/9] fsldma: support async_tx dependencies and automatic unmapping

2011-03-03 Thread Ira W. Snyder
Previous to this patch, the dma_run_dependencies() function has been
called while holding desc_lock. This function can call tx_submit() for
other descriptors, which may try to re-grab the lock. Avoid this by
moving the descriptors to be cleaned up to a temporary list, and
dropping the lock before cleanup.

At the same time, add support for automatic unmapping of src and dst
buffers, as offered by the DMAEngine API.
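
The "move to a temporary list, then drop the lock" pattern can be sketched in userspace as below. Everything here is a simplified stand-in: `struct toy_queue` uses a boolean instead of a spinlock, the list is singly linked rather than a kernel `list_head`, and `toy_relock_cb()` is a hypothetical callback that takes the lock itself, as a dependent tx_submit() might.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_desc {
	struct toy_desc *next;
	void (*callback)(void *param);
	void *param;
};

struct toy_queue {
	struct toy_desc *running;  /* finished descriptors awaiting cleanup */
	bool locked;               /* stand-in for chan->desc_lock */
	int deadlocks;             /* times a callback found the lock held */
};

static void toy_lock(struct toy_queue *q)   { q->locked = true; }
static void toy_unlock(struct toy_queue *q) { q->locked = false; }

/* Example callback that needs the queue lock for its own work. */
static void toy_relock_cb(void *param)
{
	struct toy_queue *q = param;

	if (q->locked) {       /* a real spinlock would deadlock here */
		q->deadlocks++;
		return;
	}
	toy_lock(q);
	toy_unlock(q);
}

/*
 * Detach the whole running list while holding the lock, then run callbacks
 * with the lock dropped. Returns the number of descriptors cleaned up.
 */
static int toy_cleanup(struct toy_queue *q)
{
	struct toy_desc *list, *next;
	int n = 0;

	toy_lock(q);
	list = q->running;     /* "splice" onto a private list */
	q->running = NULL;
	toy_unlock(q);

	for (; list; list = next) {
		next = list->next; /* read before the callback may reuse the node */
		if (list->callback)
			list->callback(list->param);
		n++;
	}
	return n;
}
```

Because the private list is no longer reachable from the shared queue, the callbacks can run without the lock and are free to take it themselves.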

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  131 --
 1 files changed, 95 insertions(+), 36 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 6e9ad6e..526579d 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -83,6 +83,11 @@ static void set_desc_cnt(struct fsldma_chan *chan,
    hw->count = CPU_TO_DMA(chan, count, 32);
 }
 
+static u32 get_desc_cnt(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
+{
+   return DMA_TO_CPU(chan, desc->hw.count, 32);
+}
+
 static void set_desc_src(struct fsldma_chan *chan,
 struct fsl_dma_ld_hw *hw, dma_addr_t src)
 {
@@ -93,6 +98,16 @@ static void set_desc_src(struct fsldma_chan *chan,
    hw->src_addr = CPU_TO_DMA(chan, snoop_bits | src, 64);
 }
 
+static dma_addr_t get_desc_src(struct fsldma_chan *chan,
+                              struct fsl_desc_sw *desc)
+{
+   u64 snoop_bits;
+
+   snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX)
+       ? ((u64)FSL_DMA_SATR_SREADTYPE_SNOOP_READ << 32) : 0;
+   return DMA_TO_CPU(chan, desc->hw.src_addr, 64) & ~snoop_bits;
+}
+
 static void set_desc_dst(struct fsldma_chan *chan,
 struct fsl_dma_ld_hw *hw, dma_addr_t dst)
 {
@@ -103,6 +118,16 @@ static void set_desc_dst(struct fsldma_chan *chan,
    hw->dst_addr = CPU_TO_DMA(chan, snoop_bits | dst, 64);
 }
 
+static dma_addr_t get_desc_dst(struct fsldma_chan *chan,
+                              struct fsl_desc_sw *desc)
+{
+   u64 snoop_bits;
+
+   snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX)
+       ? ((u64)FSL_DMA_DATR_DWRITETYPE_SNOOP_WRITE << 32) : 0;
+   return DMA_TO_CPU(chan, desc->hw.dst_addr, 64) & ~snoop_bits;
+}
+
 static void set_desc_next(struct fsldma_chan *chan,
  struct fsl_dma_ld_hw *hw, dma_addr_t next)
 {
@@ -806,6 +831,57 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 }
 
 /**
+ * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
+ * @chan: Freescale DMA channel
+ * @desc: descriptor to cleanup and free
+ *
+ * This function is used on a descriptor which has been executed by the DMA
+ * controller. It will run any callbacks, submit any dependencies, and then
+ * free the descriptor.
+ */
+static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
+ struct fsl_desc_sw *desc)
+{
+   struct dma_async_tx_descriptor *txd = &desc->async_tx;
+   struct device *dev = chan->common.device->dev;
+   dma_addr_t src = get_desc_src(chan, desc);
+   dma_addr_t dst = get_desc_dst(chan, desc);
+   u32 len = get_desc_cnt(chan, desc);
+
+   /* Run the link descriptor callback function */
+   if (txd->callback) {
+#ifdef FSL_DMA_LD_DEBUG
+       chan_dbg(chan, "LD %p callback\n", desc);
+#endif
+       txd->callback(txd->callback_param);
+   }
+
+   /* Run any dependencies */
+   dma_run_dependencies(txd);
+
+   /* Unmap the dst buffer, if requested */
+   if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
+       if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
+           dma_unmap_single(dev, dst, len, DMA_FROM_DEVICE);
+       else
+           dma_unmap_page(dev, dst, len, DMA_FROM_DEVICE);
+   }
+
+   /* Unmap the src buffer, if requested */
+   if (!(txd->flags & DMA_COMPL_SKIP_SRC_UNMAP)) {
+       if (txd->flags & DMA_COMPL_SRC_UNMAP_SINGLE)
+           dma_unmap_single(dev, src, len, DMA_TO_DEVICE);
+       else
+           dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
+   }
+
+#ifdef FSL_DMA_LD_DEBUG
+   chan_dbg(chan, "LD %p free\n", desc);
+#endif
+   dma_pool_free(chan->desc_pool, desc, txd->phys);
+}
+
+/**
  * fsl_chan_ld_cleanup - Clean up link descriptors
  * @chan : Freescale DMA channel
  *
@@ -818,56 +894,39 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
 {
struct fsl_desc_sw *desc, *_desc;
+   LIST_HEAD(ld_cleanup);
unsigned long flags;
 
    spin_lock_irqsave(&chan->desc_lock, flags);
 
-   /* if the ld_running list is empty, there is nothing to do */
-   if (list_empty(&chan->ld_running)) {
-       chan_dbg(chan, "no descriptors to cleanup\n");
-       goto out_unlock;
+   /* update the cookie if we have some descriptors to cleanup

[PATCH v3 8/9] fsldma: reduce locking during descriptor cleanup

2011-03-03 Thread Ira W. Snyder
This merges the fsl_chan_ld_cleanup() function into the dma_do_tasklet()
function to reduce locking overhead. In the best case, we will be able
to keep the DMA controller busy while we are freeing used descriptors.
In all cases, the spinlock is grabbed two times fewer than before on
each transaction.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  108 +
 1 files changed, 46 insertions(+), 62 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 526579d..d300de4 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -882,65 +882,15 @@ static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
 }
 
 /**
- * fsl_chan_ld_cleanup - Clean up link descriptors
- * @chan : Freescale DMA channel
- *
- * This function is run after the queue of running descriptors has been
- * executed by the DMA engine. It will run any callbacks, and then free
- * the descriptors.
- *
- * HARDWARE STATE: idle
- */
-static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
-{
-   struct fsl_desc_sw *desc, *_desc;
-   LIST_HEAD(ld_cleanup);
-   unsigned long flags;
-
-   spin_lock_irqsave(&chan->desc_lock, flags);
-
-   /* update the cookie if we have some descriptors to cleanup */
-   if (!list_empty(&chan->ld_running)) {
-       dma_cookie_t cookie;
-
-       desc = to_fsl_desc(chan->ld_running.prev);
-       cookie = desc->async_tx.cookie;
-
-       chan->completed_cookie = cookie;
-       chan_dbg(chan, "completed cookie=%d\n", cookie);
-   }
-
-   /*
-    * move the descriptors to a temporary list so we can drop the lock
-    * during the entire cleanup operation
-    */
-   list_splice_tail_init(&chan->ld_running, &ld_cleanup);
-
-   spin_unlock_irqrestore(&chan->desc_lock, flags);
-
-   /* Run the callback for each descriptor, in order */
-   list_for_each_entry_safe(desc, _desc, &ld_cleanup, node) {
-
-       /* Remove from the list of transactions */
-       list_del(&desc->node);
-
-       /* Run all cleanup for this descriptor */
-       fsldma_cleanup_descriptor(chan, desc);
-   }
-}
-
-/**
  * fsl_chan_xfer_ld_queue - transfer any pending transactions
  * @chan : Freescale DMA channel
  *
  * HARDWARE STATE: idle
+ * LOCKING: must hold chan->desc_lock
  */
 static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 {
    struct fsl_desc_sw *desc;
-   unsigned long flags;
-
-   spin_lock_irqsave(&chan->desc_lock, flags);
 
/*
 * If the list of pending descriptors is empty, then we
@@ -948,7 +898,7 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 */
    if (list_empty(&chan->ld_pending)) {
        chan_dbg(chan, "no pending LDs\n");
-   goto out_unlock;
+   return;
}
 
/*
@@ -958,7 +908,7 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 */
    if (!chan->idle) {
        chan_dbg(chan, "DMA controller still busy\n");
-   goto out_unlock;
+   return;
}
 
/*
@@ -996,9 +946,6 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 
dma_start(chan);
chan-idle = false;
-
-out_unlock:
-   spin_unlock_irqrestore(&chan->desc_lock, flags);
 }
 
 /**
@@ -1008,7 +955,11 @@ out_unlock:
 static void fsl_dma_memcpy_issue_pending(struct dma_chan *dchan)
 {
struct fsldma_chan *chan = to_fsl_chan(dchan);
+   unsigned long flags;
+
+   spin_lock_irqsave(&chan->desc_lock, flags);
    fsl_chan_xfer_ld_queue(chan);
+   spin_unlock_irqrestore(&chan->desc_lock, flags);
 }
 
 /**
@@ -1109,20 +1060,53 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
 static void dma_do_tasklet(unsigned long data)
 {
struct fsldma_chan *chan = (struct fsldma_chan *)data;
+   struct fsl_desc_sw *desc, *_desc;
+   LIST_HEAD(ld_cleanup);
unsigned long flags;
 
    chan_dbg(chan, "tasklet entry\n");
 
-   /* run all callbacks, free all used descriptors */
-   fsl_chan_ld_cleanup(chan);
-
-   /* the channel is now idle */
    spin_lock_irqsave(&chan->desc_lock, flags);
+
+   /* update the cookie if we have some descriptors to cleanup */
+   if (!list_empty(&chan->ld_running)) {
+       dma_cookie_t cookie;
+
+       desc = to_fsl_desc(chan->ld_running.prev);
+       cookie = desc->async_tx.cookie;
+
+       chan->completed_cookie = cookie;
+       chan_dbg(chan, "completed_cookie=%d\n", cookie);
+   }
+
+   /*
+    * move the descriptors to a temporary list so we can drop the lock
+    * during the entire cleanup operation
+    */
+   list_splice_tail_init(&chan->ld_running, &ld_cleanup);
+
+   /* the hardware is now idle and ready for more */
    chan->idle = true;
-   spin_unlock_irqrestore(chan

Re: [PATCH 0/8] fsldma: lockup fixes

2011-03-02 Thread Ira W. Snyder
On Wed, Mar 02, 2011 at 07:49:57AM +0200, Felix Radensky wrote:
> Hi Ira,
>
> On 03/01/2011 09:52 PM, Ira W. Snyder wrote:
> > On Tue, Mar 01, 2011 at 08:55:15AM -0800, Ira W. Snyder wrote:
> >
> > [ big snip ]
> >
> >
> > I'd still like the bisect if you have a chance. I've re-reviewed the
> > patch series, and found the places that change register writes to the
> > controller.
> >
> > The patch below changes the register operations back to the original
> > order. It doesn't make any sense why this would be required, but it is
> > worth a quick try.
> >
> > I've added an XXX mark where you can comment out a single line if this
> > patch fails. It is highly unlikely to make any difference, but I'm
> > really having a hard time understanding what is wrong.
> >
>
> This patch fixes the problem. See below
>

Excellent! I know what is happening now. The 85xx controller doesn't
clear the channel start bit at the end of a transfer. Sure enough,
buried near the end of the chapter, the datasheet implies this in a
table very far away from the register definitions. The 83xx datasheet
explicitly states that it clears this bit automatically.
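
To make the difference concrete, here is a toy model of the two behaviours. It rests on an assumption that is not spelled out in this thread: that the controller begins a transfer on a 0 -> 1 transition of the channel start bit. `struct toy_hw`, `toy_write_mr()` and the other names are hypothetical, and the bit position is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MR_CS (1u << 0)   /* illustrative bit position */

struct toy_hw {
	uint32_t mr;
	bool auto_clear_cs;  /* true models 83xx, false models 85xx */
	int transfers;
};

/* Model a register write: a transfer begins only on a 0 -> 1 edge of CS. */
static void toy_write_mr(struct toy_hw *hw, uint32_t val)
{
	if (!(hw->mr & TOY_MR_CS) && (val & TOY_MR_CS))
		hw->transfers++;
	hw->mr = val;
}

/* Model end of transfer: only the 83xx-style controller clears CS itself. */
static void toy_transfer_done(struct toy_hw *hw)
{
	if (hw->auto_clear_cs)
		hw->mr &= ~TOY_MR_CS;
}

/* Restart that works on both models: clear CS first, then set it. */
static void toy_restart(struct toy_hw *hw)
{
	toy_write_mr(hw, hw->mr & ~TOY_MR_CS);
	toy_write_mr(hw, hw->mr | TOY_MR_CS);
}
```

In this model, setting CS again on the 85xx-style controller without clearing it first starts nothing, while the clear-then-set sequence works on both variants.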

I'll post an updated patch series later today. Thank you so much for
being patient and trying out all of these patches.

Ira


[PATCH v2 1/9] dmatest: fix automatic buffer unmap type

2011-03-02 Thread Ira W. Snyder
The dmatest code relies on the DMAEngine API to automatically call
dma_unmap_single() on src buffers. The flags it passes are incorrect,
fix them.
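
The effect of the flag combination can be sketched as a small decision function that mirrors the unmap logic a DMAEngine driver applies on completion. The `TOY_*` names and bit values are stand-ins chosen for this example, not the definitions from the kernel headers.

```c
/* Illustrative flag bits; stand-ins for the DMA_COMPL_* flags. */
#define TOY_SKIP_SRC_UNMAP    (1u << 2)
#define TOY_SKIP_DEST_UNMAP   (1u << 3)
#define TOY_SRC_UNMAP_SINGLE  (1u << 4)
#define TOY_DEST_UNMAP_SINGLE (1u << 5)

enum toy_unmap { TOY_UNMAP_NONE, TOY_UNMAP_SINGLE, TOY_UNMAP_PAGE };

/* Which unmap call the driver would make for the source buffer. */
static enum toy_unmap toy_src_unmap(unsigned int flags)
{
	if (flags & TOY_SKIP_SRC_UNMAP)
		return TOY_UNMAP_NONE;
	return (flags & TOY_SRC_UNMAP_SINGLE) ? TOY_UNMAP_SINGLE
					      : TOY_UNMAP_PAGE;
}

/* Which unmap call the driver would make for the destination buffer. */
static enum toy_unmap toy_dst_unmap(unsigned int flags)
{
	if (flags & TOY_SKIP_DEST_UNMAP)
		return TOY_UNMAP_NONE;
	return (flags & TOY_DEST_UNMAP_SINGLE) ? TOY_UNMAP_SINGLE
					       : TOY_UNMAP_PAGE;
}
```

With only the skip-destination flag set, the source side falls through to the page unmap path; adding the source "single" flag selects the single unmap path instead, which is the kind of unmap the changelog says dmatest relies on for its src buffers.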

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/dmatest.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index 5589358..7e1b0aa 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -285,7 +285,12 @@ static int dmatest_func(void *data)
 
set_user_nice(current, 10);
 
-   flags = DMA_CTRL_ACK | DMA_COMPL_SKIP_DEST_UNMAP | DMA_PREP_INTERRUPT;
+   /*
+* src buffers are freed by the DMAEngine code with dma_unmap_single()
+* dst buffers are freed by ourselves below
+*/
+   flags = DMA_CTRL_ACK | DMA_PREP_INTERRUPT
+ | DMA_COMPL_SKIP_DEST_UNMAP | DMA_COMPL_SRC_UNMAP_SINGLE;
 
    while (!kthread_should_stop()
           && !(iterations && total_tests >= iterations)) {
-- 
1.7.3.4



[PATCH v2 0/9] fsldma: lockup fixes

2011-03-02 Thread Ira W. Snyder
Hello everyone,

I've been chasing random infrequent controller lockups in the fsldma driver
for a long time. I finally managed to find the problem and fix it. I'm not
quite sure about the exact sequence of events which causes the race
condition, but it is related to using the hardware registers to track the
controller state. See the patch changelogs for more detail.

The problems were quickly found by turning on DMAPOOL_DEBUG inside
mm/dmapool.c. This poisons memory allocated with the dmapool API.
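
To show why poisoning makes this class of bug visible, here is a minimal userspace sketch. It is not the mm/dmapool.c implementation: `toy_pool`, its helpers, and the poison byte value are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_POISON_FREED 0xa7 /* illustrative poison byte (assumption) */
#define TOY_BLOCK_SIZE   32

/* A single-block "pool": enough to show the effect of poisoning. */
struct toy_pool {
	unsigned char block[TOY_BLOCK_SIZE];
	int in_use;
};

/* Hand out the block if it is free, otherwise return NULL. */
void *toy_pool_alloc(struct toy_pool *pool)
{
	if (pool->in_use)
		return NULL;
	pool->in_use = 1;
	return pool->block;
}

/* Poison the block on free so stale users read obvious garbage. */
void toy_pool_free(struct toy_pool *pool, void *ptr)
{
	assert(ptr == pool->block && pool->in_use);
	memset(pool->block, TOY_POISON_FREED, TOY_BLOCK_SIZE);
	pool->in_use = 0;
}

/* Return 1 if every byte of the block carries the poison pattern. */
int toy_block_is_poisoned(const struct toy_pool *pool)
{
	size_t i;

	for (i = 0; i < TOY_BLOCK_SIZE; i++)
		if (pool->block[i] != TOY_POISON_FREED)
			return 0;
	return 1;
}

void toy_demo(void)
{
	struct toy_pool pool = { {0}, 0 };
	unsigned char *desc = toy_pool_alloc(&pool);

	assert(desc != NULL);
	desc[0] = 0x11; /* a "live" descriptor field */
	toy_pool_free(&pool, desc);

	/* A stale reader now sees the poison pattern, not plausible data. */
	assert(toy_block_is_poisoned(&pool));
}
```

A DMA engine that fetches a freed descriptor then reads the poison pattern instead of stale but valid-looking contents, so the failure tends to show up quickly rather than intermittently.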

With dmapool poisoning turned on, the dmatest driver would start producing
failures within a few seconds. After this patchset has been applied, I have
run several iterations of the 10 threads per channel, 10 iterations per
thread test without any problems. I have also tested it with the CARMA
drivers (posted at linuxppc-dev previously), which make use of the external
control features.

While making the previous changes, I noticed that the fsldma driver does
not respect the automatic DMA unmapping of src and dst buffers. I have
added support for this feature. This also required a fix to dmatest, which
was sending incorrect flags.

The "support async_tx dependencies" change could be split apart from the
automatic unmapping change if that is desirable. They both touch the same
piece of code, so I thought it was ok to combine them. Let me know.

I would really like to see this go into 2.6.39. I think we can get it
reviewed before then. :)

Many thanks go to Felix Radensky for testing on a P2020 (85xx DMA IP core).
I wouldn't have been able to track down the problems on 85xx without his
diligent testing.

v1 -> v2:
- reordered patches (dmatest change is first now)
- fix problems on 85xx controller
- only set correct bits for 83xx in dma_halt()

Ira W. Snyder (9):
  dmatest: fix automatic buffer unmap type
  fsldma: move related helper functions near each other
  fsldma: use channel name in printk output
  fsldma: improve link descriptor debugging
  fsldma: minor codingstyle and consistency fixes
  fsldma: fix controller lockups
  fsldma: support async_tx dependencies and automatic unmapping
  fsldma: reduce locking during descriptor cleanup
  fsldma: make halt behave nicely on all supported controllers

 drivers/dma/dmatest.c |7 +-
 drivers/dma/fsldma.c  |  542 +++--
 drivers/dma/fsldma.h  |6 +-
 3 files changed, 308 insertions(+), 247 deletions(-)

-- 
1.7.3.4



[PATCH v2 2/9] fsldma: move related helper functions near each other

2011-03-02 Thread Ira W. Snyder
This is a purely cosmetic cleanup. It is nice to have related functions
right next to each other in the code.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  116 +++--
 1 files changed, 64 insertions(+), 52 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 4de947a..2e1af45 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -39,33 +39,9 @@
 
 static const char msg_ld_oom[] = "No free memory for link descriptor\n";
 
-static void dma_init(struct fsldma_chan *chan)
-{
-   /* Reset the channel */
-   DMA_OUT(chan, &chan->regs->mr, 0, 32);
-
-   switch (chan->feature & FSL_DMA_IP_MASK) {
-   case FSL_DMA_IP_85XX:
-   /* Set the channel to below modes:
-* EIE - Error interrupt enable
-* EOSIE - End of segments interrupt enable (basic mode)
-* EOLNIE - End of links interrupt enable
-* BWC - Bandwidth sharing among channels
-*/
-   DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_BWC
-   | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE
-   | FSL_DMA_MR_EOSIE, 32);
-   break;
-   case FSL_DMA_IP_83XX:
-   /* Set the channel to below modes:
-* EOTIE - End-of-transfer interrupt enable
-* PRC_RM - PCI read multiple
-*/
-   DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_EOTIE
-   | FSL_DMA_MR_PRC_RM, 32);
-   break;
-   }
-}
+/*
+ * Register Helpers
+ */
 
 static void set_sr(struct fsldma_chan *chan, u32 val)
 {
@@ -77,6 +53,30 @@ static u32 get_sr(struct fsldma_chan *chan)
    return DMA_IN(chan, &chan->regs->sr, 32);
 }
 
+static void set_cdar(struct fsldma_chan *chan, dma_addr_t addr)
+{
+   DMA_OUT(chan, &chan->regs->cdar, addr | FSL_DMA_SNEN, 64);
+}
+
+static dma_addr_t get_cdar(struct fsldma_chan *chan)
+{
+   return DMA_IN(chan, &chan->regs->cdar, 64) & ~FSL_DMA_SNEN;
+}
+
+static dma_addr_t get_ndar(struct fsldma_chan *chan)
+{
+   return DMA_IN(chan, &chan->regs->ndar, 64);
+}
+
+static u32 get_bcr(struct fsldma_chan *chan)
+{
+   return DMA_IN(chan, &chan->regs->bcr, 32);
+}
+
+/*
+ * Descriptor Helpers
+ */
+
 static void set_desc_cnt(struct fsldma_chan *chan,
struct fsl_dma_ld_hw *hw, u32 count)
 {
@@ -113,24 +113,49 @@ static void set_desc_next(struct fsldma_chan *chan,
    hw->next_ln_addr = CPU_TO_DMA(chan, snoop_bits | next, 64);
 }
 
-static void set_cdar(struct fsldma_chan *chan, dma_addr_t addr)
+static void set_ld_eol(struct fsldma_chan *chan,
+   struct fsl_desc_sw *desc)
 {
-   DMA_OUT(chan, &chan->regs->cdar, addr | FSL_DMA_SNEN, 64);
-}
+   u64 snoop_bits;
 
-static dma_addr_t get_cdar(struct fsldma_chan *chan)
-{
-   return DMA_IN(chan, &chan->regs->cdar, 64) & ~FSL_DMA_SNEN;
-}
+   snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_83XX)
+   ? FSL_DMA_SNEN : 0;
 
-static dma_addr_t get_ndar(struct fsldma_chan *chan)
-{
-   return DMA_IN(chan, &chan->regs->ndar, 64);
+   desc->hw.next_ln_addr = CPU_TO_DMA(chan,
+   DMA_TO_CPU(chan, desc->hw.next_ln_addr, 64) | FSL_DMA_EOL
+   | snoop_bits, 64);
 }
 
-static u32 get_bcr(struct fsldma_chan *chan)
+/*
+ * DMA Engine Hardware Control Helpers
+ */
+
+static void dma_init(struct fsldma_chan *chan)
 {
-   return DMA_IN(chan, &chan->regs->bcr, 32);
+   /* Reset the channel */
+   DMA_OUT(chan, &chan->regs->mr, 0, 32);
+
+   switch (chan->feature & FSL_DMA_IP_MASK) {
+   case FSL_DMA_IP_85XX:
+   /* Set the channel to below modes:
+* EIE - Error interrupt enable
+* EOSIE - End of segments interrupt enable (basic mode)
+* EOLNIE - End of links interrupt enable
+* BWC - Bandwidth sharing among channels
+*/
+   DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_BWC
+   | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE
+   | FSL_DMA_MR_EOSIE, 32);
+   break;
+   case FSL_DMA_IP_83XX:
+   /* Set the channel to below modes:
+* EOTIE - End-of-transfer interrupt enable
+* PRC_RM - PCI read multiple
+*/
+   DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_EOTIE
+   | FSL_DMA_MR_PRC_RM, 32);
+   break;
+   }
 }
 
 static int dma_is_idle(struct fsldma_chan *chan)
@@ -185,19 +210,6 @@ static void dma_halt(struct fsldma_chan *chan)
    dev_err(chan->dev, "DMA halt timeout!\n");
 }
 
-static void set_ld_eol(struct fsldma_chan *chan,
-   struct fsl_desc_sw *desc)
-{
-   u64 snoop_bits;
-
-   snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_83XX

[PATCH v2 4/9] fsldma: improve link descriptor debugging

2011-03-02 Thread Ira W. Snyder
This adds better tracking to link descriptor allocations, callbacks, and
frees. This makes it much easier to track errors with link descriptors.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   21 +++--
 1 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 6e3d3d7..851993c 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -416,6 +416,10 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
    desc->async_tx.tx_submit = fsl_dma_tx_submit;
    desc->async_tx.phys = pdesc;
 
+#ifdef FSL_DMA_LD_DEBUG
+   dev_dbg(chan->dev, "%s: LD %p allocated\n", chan->name, desc);
+#endif
+
return desc;
 }
 
@@ -467,6 +471,9 @@ static void fsldma_free_desc_list(struct fsldma_chan *chan,
 
list_for_each_entry_safe(desc, _desc, list, node) {
    list_del(&desc->node);
+#ifdef FSL_DMA_LD_DEBUG
+   dev_dbg(chan->dev, "%s: LD %p free\n", chan->name, desc);
+#endif
    dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
    }
 }
@@ -478,6 +485,9 @@ static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
 
    list_for_each_entry_safe_reverse(desc, _desc, list, node) {
    list_del(&desc->node);
+#ifdef FSL_DMA_LD_DEBUG
+   dev_dbg(chan->dev, "%s: LD %p free\n", chan->name, desc);
+#endif
    dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
}
 }
@@ -554,9 +564,6 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy(
    dev_err(chan->dev, "%s: %s\n", chan->name, msg_ld_oom);
    goto fail;
    }
-#ifdef FSL_DMA_LD_DEBUG
-   dev_dbg(chan->dev, "%s: new link desc alloc %p\n", chan->name, new);
-#endif
 
copy = min(len, (size_t)FSL_DMA_BCR_MAX_CNT);
 
@@ -642,9 +649,6 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_sg(struct dma_chan *dchan,
    dev_err(chan->dev, "%s: %s\n", chan->name, msg_ld_oom);
    goto fail;
    }
-#ifdef FSL_DMA_LD_DEBUG
-   dev_dbg(chan->dev, "%s: new link desc alloc %p\n", chan->name, new);
-#endif
 
    set_desc_cnt(chan, &new->hw, len);
    set_desc_src(chan, &new->hw, src);
@@ -881,13 +885,18 @@ static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
    callback_param = desc->async_tx.callback_param;
    if (callback) {
    spin_unlock_irqrestore(&chan->desc_lock, flags);
+#ifdef FSL_DMA_LD_DEBUG
    dev_dbg(chan->dev, "%s: LD %p callback\n", name, desc);
+#endif
    callback(callback_param);
    spin_lock_irqsave(&chan->desc_lock, flags);
    }
 
    /* Run any dependencies, then free the descriptor */
    dma_run_dependencies(&desc->async_tx);
+#ifdef FSL_DMA_LD_DEBUG
+   dev_dbg(chan->dev, "%s: LD %p free\n", name, desc);
+#endif
    dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
}
 
-- 
1.7.3.4



[PATCH v2 3/9] fsldma: use channel name in printk output

2011-03-02 Thread Ira W. Snyder
This makes debugging the driver much easier when multiple channels are
running concurrently. In addition, you can see how much descriptor
memory each channel has allocated via the dmapool API in sysfs.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   60 +++--
 drivers/dma/fsldma.h |1 +
 2 files changed, 34 insertions(+), 27 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 2e1af45..6e3d3d7 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -37,7 +37,7 @@
 
 #include "fsldma.h"
 
-static const char msg_ld_oom[] = "No free memory for link descriptor\n";
+static const char msg_ld_oom[] = "No free memory for link descriptor";
 
 /*
  * Register Helpers
@@ -207,7 +207,7 @@ static void dma_halt(struct fsldma_chan *chan)
}
 
if (!dma_is_idle(chan))
-   dev_err(chan->dev, "DMA halt timeout!\n");
+   dev_err(chan->dev, "%s: DMA halt timeout!\n", chan->name);
 }
 
 /**
@@ -400,12 +400,13 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
    struct fsldma_chan *chan)
 {
+   const char *name = chan->name;
    struct fsl_desc_sw *desc;
    dma_addr_t pdesc;
 
    desc = dma_pool_alloc(chan->desc_pool, GFP_ATOMIC, &pdesc);
    if (!desc) {
-   dev_dbg(chan->dev, "out of memory for link desc\n");
+   dev_dbg(chan->dev, "%s: out of memory for link desc\n", name);
return NULL;
}
 
@@ -439,13 +440,12 @@ static int fsl_dma_alloc_chan_resources(struct dma_chan *dchan)
 * We need the descriptor to be aligned to 32bytes
 * for meeting FSL DMA specification requirement.
 */
-   chan->desc_pool = dma_pool_create("fsl_dma_engine_desc_pool",
- chan->dev,
+   chan->desc_pool = dma_pool_create(chan->name, chan->dev,
  sizeof(struct fsl_desc_sw),
  __alignof__(struct fsl_desc_sw), 0);
    if (!chan->desc_pool) {
-   dev_err(chan->dev, "unable to allocate channel %d "
-  "descriptor pool\n", chan->id);
+   dev_err(chan->dev, "%s: unable to allocate descriptor pool\n",
+  chan->name);
return -ENOMEM;
}
 
@@ -491,7 +491,7 @@ static void fsl_dma_free_chan_resources(struct dma_chan *dchan)
    struct fsldma_chan *chan = to_fsl_chan(dchan);
    unsigned long flags;
 
-   dev_dbg(chan->dev, "Free all channel resources.\n");
+   dev_dbg(chan->dev, "%s: Free all channel resources.\n", chan->name);
    spin_lock_irqsave(&chan->desc_lock, flags);
    fsldma_free_desc_list(chan, &chan->ld_pending);
    fsldma_free_desc_list(chan, &chan->ld_running);
@@ -514,7 +514,7 @@ fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags)
 
    new = fsl_dma_alloc_descriptor(chan);
    if (!new) {
-   dev_err(chan->dev, msg_ld_oom);
+   dev_err(chan->dev, "%s: %s\n", chan->name, msg_ld_oom);
return NULL;
}
 
@@ -551,11 +551,11 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy(
    /* Allocate the link descriptor from DMA pool */
    new = fsl_dma_alloc_descriptor(chan);
    if (!new) {
-   dev_err(chan->dev, msg_ld_oom);
+   dev_err(chan->dev, "%s: %s\n", chan->name, msg_ld_oom);
    goto fail;
    }
 #ifdef FSL_DMA_LD_DEBUG
-   dev_dbg(chan->dev, "new link desc alloc %p\n", new);
+   dev_dbg(chan->dev, "%s: new link desc alloc %p\n", chan->name, new);
 #endif
 
copy = min(len, (size_t)FSL_DMA_BCR_MAX_CNT);
@@ -639,11 +639,11 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_sg(struct dma_chan *dchan,
    /* allocate and populate the descriptor */
    new = fsl_dma_alloc_descriptor(chan);
    if (!new) {
-   dev_err(chan->dev, msg_ld_oom);
+   dev_err(chan->dev, "%s: %s\n", chan->name, msg_ld_oom);
    goto fail;
    }
 #ifdef FSL_DMA_LD_DEBUG
-   dev_dbg(chan->dev, "new link desc alloc %p\n", new);
+   dev_dbg(chan->dev, "%s: new link desc alloc %p\n", chan->name, new);
 #endif
 
    set_desc_cnt(chan, &new->hw, len);
@@ -815,7 +815,7 @@ static void fsl_dma_update_completed_cookie(struct fsldma_chan *chan)
    spin_lock_irqsave(&chan->desc_lock, flags);
 
    if (list_empty(&chan->ld_running)) {
-   dev_dbg(chan->dev, "no running descriptors\n");
+   dev_dbg(chan->dev, "%s: no running descriptors\n", chan->name);
goto out_unlock;
}
 
@@ -859,11 +859,13 @@ static enum dma_status

[PATCH v2 9/9] fsldma: make halt behave nicely on all supported controllers

2011-03-02 Thread Ira W. Snyder
The original dma_halt() function set the CA (channel abort) bit on both
the 83xx and 85xx controllers. This is incorrect on the 83xx, where this
bit means TEM (transfer error mask) instead. The 83xx doesn't support
channel abort, so we only do this operation on 85xx.
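
The intended behaviour can be modelled in a few lines of userspace C. This is a sketch under assumptions, not the driver: `toy_chan`, `toy_halt()` and the `TOY_MR_*` bit values are placeholders, and the "register" is a plain variable.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions; treat the values as assumptions. */
#define TOY_MR_CS     0x00000001u /* channel start */
#define TOY_MR_CA     0x00000008u /* 85xx: channel abort, 83xx: TEM */
#define TOY_MR_EMS_EN 0x00040000u /* external master start enable */

enum toy_ip { TOY_IP_83XX, TOY_IP_85XX };

struct toy_chan {
	enum toy_ip ip;
	uint32_t mr;         /* simulated mode register */
	unsigned int aborts; /* times the abort bit was written as 1 */
};

/* Simulated register write; counts abort requests on 85xx only. */
void toy_write_mr(struct toy_chan *chan, uint32_t val)
{
	if (chan->ip == TOY_IP_85XX && (val & TOY_MR_CA))
		chan->aborts++;
	chan->mr = val;
}

void toy_halt(struct toy_chan *chan)
{
	uint32_t mode = chan->mr;

	/* Only 85xx has channel abort; leave the bit alone on 83xx. */
	if (chan->ip == TOY_IP_85XX) {
		mode |= TOY_MR_CA;
		toy_write_mr(chan, mode);
		mode &= ~TOY_MR_CA;
	}

	/* Stop the controller on both variants. */
	mode &= ~(TOY_MR_CS | TOY_MR_EMS_EN);
	toy_write_mr(chan, mode);
}

void toy_demo(void)
{
	struct toy_chan c83 = { TOY_IP_83XX, TOY_MR_CS | TOY_MR_CA, 0 };
	struct toy_chan c85 = { TOY_IP_85XX, TOY_MR_CS, 0 };

	toy_halt(&c83);
	assert(c83.mr == TOY_MR_CA); /* TEM setting preserved, CS cleared */
	assert(c83.aborts == 0);

	toy_halt(&c85);
	assert(c85.mr == 0);
	assert(c85.aborts == 1);
}
```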

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   19 ---
 1 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 40babc1..eb7bc24 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -216,13 +216,26 @@ static void dma_halt(struct fsldma_chan *chan)
u32 mode;
int i;
 
+   /* read the mode register */
    mode = DMA_IN(chan, &chan->regs->mr, 32);
-   mode |= FSL_DMA_MR_CA;
-   DMA_OUT(chan, &chan->regs->mr, mode, 32);
 
-   mode &= ~(FSL_DMA_MR_CS | FSL_DMA_MR_EMS_EN | FSL_DMA_MR_CA);
+   /*
+* The 85xx controller supports channel abort, which will stop
+* the current transfer. On 83xx, this bit is the transfer error
+* mask bit, which should not be changed.
+*/
+   if ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) {
+   mode |= FSL_DMA_MR_CA;
+   DMA_OUT(chan, &chan->regs->mr, mode, 32);
+
+   mode &= ~FSL_DMA_MR_CA;
+   }
+
+   /* stop the DMA controller */
+   mode &= ~(FSL_DMA_MR_CS | FSL_DMA_MR_EMS_EN);
    DMA_OUT(chan, &chan->regs->mr, mode, 32);
 
+   /* wait for the DMA controller to become idle */
    for (i = 0; i < 100; i++) {
if (dma_is_idle(chan))
return;
-- 
1.7.3.4



[PATCH v2 5/9] fsldma: minor codingstyle and consistency fixes

2011-03-02 Thread Ira W. Snyder
This fixes some minor violations of the coding style. It also changes
the style of the device_prep_dma_*() function definitions so they are
identical.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   29 +
 drivers/dma/fsldma.h |4 ++--
 2 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 851993c..06421c0 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -84,7 +84,7 @@ static void set_desc_cnt(struct fsldma_chan *chan,
 }
 
 static void set_desc_src(struct fsldma_chan *chan,
-   struct fsl_dma_ld_hw *hw, dma_addr_t src)
+struct fsl_dma_ld_hw *hw, dma_addr_t src)
 {
u64 snoop_bits;
 
@@ -94,7 +94,7 @@ static void set_desc_src(struct fsldma_chan *chan,
 }
 
 static void set_desc_dst(struct fsldma_chan *chan,
-   struct fsl_dma_ld_hw *hw, dma_addr_t dst)
+struct fsl_dma_ld_hw *hw, dma_addr_t dst)
 {
u64 snoop_bits;
 
@@ -104,7 +104,7 @@ static void set_desc_dst(struct fsldma_chan *chan,
 }
 
 static void set_desc_next(struct fsldma_chan *chan,
-   struct fsl_dma_ld_hw *hw, dma_addr_t next)
+ struct fsl_dma_ld_hw *hw, dma_addr_t next)
 {
u64 snoop_bits;
 
@@ -113,8 +113,7 @@ static void set_desc_next(struct fsldma_chan *chan,
    hw->next_ln_addr = CPU_TO_DMA(chan, snoop_bits | next, 64);
 }
 
-static void set_ld_eol(struct fsldma_chan *chan,
-   struct fsl_desc_sw *desc)
+static void set_ld_eol(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
 {
u64 snoop_bits;
 
@@ -333,8 +332,7 @@ static void fsl_chan_toggle_ext_start(struct fsldma_chan *chan, int enable)
    chan->feature &= ~FSL_DMA_CHAN_START_EXT;
 }
 
-static void append_ld_queue(struct fsldma_chan *chan,
-   struct fsl_desc_sw *desc)
+static void append_ld_queue(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
 {
    struct fsl_desc_sw *tail = to_fsl_desc(chan->ld_pending.prev);
 
@@ -375,8 +373,8 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
    cookie = chan->common.cookie;
    list_for_each_entry(child, &desc->tx_list, node) {
    cookie++;
-   if (cookie < 0)
-   cookie = 1;
+   if (cookie < DMA_MIN_COOKIE)
+   cookie = DMA_MIN_COOKIE;
 
    child->async_tx.cookie = cookie;
}
@@ -397,8 +395,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
  *
  * Return - The descriptor allocated. NULL for failed.
  */
-static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
-   struct fsldma_chan *chan)
+static struct fsl_desc_sw *fsl_dma_alloc_descriptor(struct fsldma_chan *chan)
 {
    const char *name = chan->name;
struct fsl_desc_sw *desc;
@@ -423,7 +420,6 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
return desc;
 }
 
-
 /**
  * fsl_dma_alloc_chan_resources - Allocate resources for DMA channel.
  * @chan : Freescale DMA channel
@@ -534,14 +530,15 @@ fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags)
    /* Insert the link descriptor to the LD ring */
    list_add_tail(&new->node, &new->tx_list);
 
-   /* Set End-of-link to the last link descriptor of new list*/
+   /* Set End-of-link to the last link descriptor of new list */
    set_ld_eol(chan, new);
 
    return &new->async_tx;
 }
 
-static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy(
-   struct dma_chan *dchan, dma_addr_t dma_dst, dma_addr_t dma_src,
+static struct dma_async_tx_descriptor *
+fsl_dma_prep_memcpy(struct dma_chan *dchan,
+   dma_addr_t dma_dst, dma_addr_t dma_src,
size_t len, unsigned long flags)
 {
struct fsldma_chan *chan;
@@ -591,7 +588,7 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy(
    new->async_tx.flags = flags; /* client is in control of this ack */
    new->async_tx.cookie = -EBUSY;
 
-   /* Set End-of-link to the last link descriptor of new list*/
+   /* Set End-of-link to the last link descriptor of new list */
set_ld_eol(chan, new);
 
    return &first->async_tx;
diff --git a/drivers/dma/fsldma.h b/drivers/dma/fsldma.h
index 113e713..49189da 100644
--- a/drivers/dma/fsldma.h
+++ b/drivers/dma/fsldma.h
@@ -102,8 +102,8 @@ struct fsl_desc_sw {
 } __attribute__((aligned(32)));
 
 struct fsldma_chan_regs {
-   u32 mr; /* 0x00 - Mode Register */
-   u32 sr; /* 0x04 - Status Register */
+   u32 mr; /* 0x00 - Mode Register */
+   u32 sr; /* 0x04 - Status Register */
u64 cdar;   /* 0x08 - Current descriptor address register */
u64 sar;/* 0x10 - Source Address Register */
u64 dar;/* 0x18

[PATCH v2 6/9] fsldma: fix controller lockups

2011-03-02 Thread Ira W. Snyder
Enabling poisoning in the dmapool API quickly showed that the DMA
controller was fetching descriptors that should not have been in use.
This has caused intermittent controller lockups during testing.

I have been unable to figure out the exact set of conditions which cause
this to happen. However, I believe it is related to the driver using the
hardware registers to track whether the controller is busy or not. The
code can incorrectly decide that the hardware is idle due to lag between
register writes and the hardware actually becoming busy.

To fix this, the driver has been reworked to explicitly track the state
of the hardware, rather than try to guess what it is doing based on the
register values.
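
As a rough model of the idea (not the driver itself), the sketch below keeps an explicit `idle` flag that only software changes. The struct layout and the names `toy_xfer_queue()` and `toy_complete()` are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical channel model; field names are illustrative only. */
struct toy_chan {
	bool idle;            /* software's view of the hardware state */
	unsigned int pending; /* descriptors queued but not yet started */
	unsigned int running; /* descriptors handed to the hardware */
	unsigned int starts;  /* how many times the hardware was started */
};

/* Hand the pending queue to hardware, but only if we know it is idle. */
void toy_xfer_queue(struct toy_chan *chan)
{
	if (chan->pending == 0 || !chan->idle)
		return;

	chan->running = chan->pending;
	chan->pending = 0;
	chan->starts++;
	chan->idle = false; /* set by software, not read back from a register */
}

/* Completion path: only here does the channel become idle again. */
void toy_complete(struct toy_chan *chan)
{
	chan->running = 0;
	chan->idle = true;
	toy_xfer_queue(chan);
}

void toy_demo(void)
{
	struct toy_chan chan = { .idle = true };

	chan.pending = 2;
	toy_xfer_queue(&chan);
	assert(chan.starts == 1 && chan.running == 2 && !chan.idle);

	/* New work arrives mid-transfer: it must wait, not restart hardware. */
	chan.pending = 3;
	toy_xfer_queue(&chan);
	assert(chan.starts == 1 && chan.pending == 3);

	/* The completion path starts the queued work. */
	toy_complete(&chan);
	assert(chan.starts == 2 && chan.running == 3 && chan.pending == 0);
}
```

Because `idle` is flipped at the moment the start is issued, there is no window in which a status register could still report "not busy" for hardware that has already been told to run.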

This has passed dmatest with 10 threads per channel, 10 iterations
per thread several times without error. Previously, this would fail
within a few seconds.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  225 ++
 drivers/dma/fsldma.h |1 +
 2 files changed, 101 insertions(+), 125 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 06421c0..e9bb51e 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -63,11 +63,6 @@ static dma_addr_t get_cdar(struct fsldma_chan *chan)
    return DMA_IN(chan, &chan->regs->cdar, 64) & ~FSL_DMA_SNEN;
 }
 
-static dma_addr_t get_ndar(struct fsldma_chan *chan)
-{
-   return DMA_IN(chan, &chan->regs->ndar, 64);
-}
-
 static u32 get_bcr(struct fsldma_chan *chan)
 {
    return DMA_IN(chan, &chan->regs->bcr, 32);
@@ -138,13 +133,11 @@ static void dma_init(struct fsldma_chan *chan)
case FSL_DMA_IP_85XX:
/* Set the channel to below modes:
 * EIE - Error interrupt enable
-* EOSIE - End of segments interrupt enable (basic mode)
 * EOLNIE - End of links interrupt enable
 * BWC - Bandwidth sharing among channels
 */
    DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_BWC
-   | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE
-   | FSL_DMA_MR_EOSIE, 32);
+   | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE, 32);
break;
case FSL_DMA_IP_83XX:
/* Set the channel to below modes:
@@ -163,25 +156,32 @@ static int dma_is_idle(struct fsldma_chan *chan)
    return (!(sr & FSL_DMA_SR_CB)) || (sr & FSL_DMA_SR_CH);
 }
 
+/*
+ * Start the DMA controller
+ *
+ * Preconditions:
+ * - the CDAR register must point to the start descriptor
+ * - the MRn[CS] bit must be cleared
+ */
 static void dma_start(struct fsldma_chan *chan)
 {
u32 mode;
 
    mode = DMA_IN(chan, &chan->regs->mr, 32);
 
-   if ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) {
-   if (chan->feature & FSL_DMA_CHAN_PAUSE_EXT) {
-   DMA_OUT(chan, &chan->regs->bcr, 0, 32);
-   mode |= FSL_DMA_MR_EMP_EN;
-   } else {
-   mode &= ~FSL_DMA_MR_EMP_EN;
-   }
+   if (chan->feature & FSL_DMA_CHAN_PAUSE_EXT) {
+   DMA_OUT(chan, &chan->regs->bcr, 0, 32);
+   mode |= FSL_DMA_MR_EMP_EN;
+   } else {
+   mode &= ~FSL_DMA_MR_EMP_EN;
    }
 
-   if (chan->feature & FSL_DMA_CHAN_START_EXT)
+   if (chan->feature & FSL_DMA_CHAN_START_EXT) {
    mode |= FSL_DMA_MR_EMS_EN;
-   else
+   } else {
+   mode &= ~FSL_DMA_MR_EMS_EN;
    mode |= FSL_DMA_MR_CS;
+   }
 
    DMA_OUT(chan, &chan->regs->mr, mode, 32);
 }
@@ -757,14 +757,15 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 
switch (cmd) {
case DMA_TERMINATE_ALL:
+   spin_lock_irqsave(&chan->desc_lock, flags);
+
    /* Halt the DMA engine */
    dma_halt(chan);
 
-   spin_lock_irqsave(&chan->desc_lock, flags);
-
    /* Remove and free all of the descriptors in the LD queue */
    fsldma_free_desc_list(chan, &chan->ld_pending);
    fsldma_free_desc_list(chan, &chan->ld_running);
+   chan->idle = true;
 
    spin_unlock_irqrestore(&chan->desc_lock, flags);
return 0;
@@ -802,78 +803,45 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 }
 
 /**
- * fsl_dma_update_completed_cookie - Update the completed cookie.
+ * fsl_chan_ld_cleanup - Clean up link descriptors
  * @chan : Freescale DMA channel
  *
- * CONTEXT: hardirq
+ * This function is run after the queue of running descriptors has been
+ * executed by the DMA engine. It will run any callbacks, and then free
+ * the descriptors.
+ *
+ * HARDWARE STATE: idle
  */
-static void fsl_dma_update_completed_cookie(struct fsldma_chan *chan)
+static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
 {
-   struct fsl_desc_sw *desc;
+   struct fsl_desc_sw *desc, *_desc;
+   const char

[PATCH v2 7/9] fsldma: support async_tx dependencies and automatic unmapping

2011-03-02 Thread Ira W. Snyder
Before this patch, the dma_run_dependencies() function was called while
holding desc_lock. This function can call tx_submit() for
other descriptors, which may try to re-grab the lock. Avoid this by
moving the descriptors to be cleaned up to a temporary list, and
dropping the lock before cleanup.
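
The pattern can be sketched in userspace as follows. Everything here is illustrative: the "lock" is an int that asserts on re-entry to stand in for a non-recursive spinlock, and `toy_submit()` plays the role of a dependent tx_submit() call.

```c
#include <assert.h>
#include <stddef.h>

struct toy_desc {
	struct toy_desc *next;
	void (*callback)(void *param);
	void *param;
};

struct toy_chan {
	int locked;               /* stand-in for a non-recursive spinlock */
	struct toy_desc *running; /* completed descriptors awaiting cleanup */
	unsigned int submitted;
};

void toy_lock(struct toy_chan *chan)
{
	assert(!chan->locked); /* re-taking the lock would deadlock for real */
	chan->locked = 1;
}

void toy_unlock(struct toy_chan *chan)
{
	assert(chan->locked);
	chan->locked = 0;
}

/* A dependent submission: it needs the channel lock, like tx_submit(). */
void toy_submit(void *param)
{
	struct toy_chan *chan = param;

	toy_lock(chan);
	chan->submitted++;
	toy_unlock(chan);
}

void toy_cleanup(struct toy_chan *chan)
{
	struct toy_desc *list, *desc;

	/* Detach the whole list while holding the lock... */
	toy_lock(chan);
	list = chan->running;
	chan->running = NULL;
	toy_unlock(chan);

	/* ...then run callbacks with the lock dropped. */
	while ((desc = list) != NULL) {
		list = desc->next;
		if (desc->callback)
			desc->callback(desc->param);
	}
}

void toy_demo(void)
{
	struct toy_chan chan = { 0, NULL, 0 };
	struct toy_desc second = { NULL, toy_submit, &chan };
	struct toy_desc first = { &second, toy_submit, &chan };

	chan.running = &first;
	toy_cleanup(&chan);
	assert(chan.submitted == 2);
	assert(chan.running == NULL && !chan.locked);
}
```

If `toy_cleanup()` ran the callbacks before calling `toy_unlock()`, the assertion in `toy_lock()` would fire, which is the userspace analogue of the deadlock described above.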

At the same time, add support for automatic unmapping of src and dst
buffers, as offered by the DMAEngine API.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  132 --
 1 files changed, 95 insertions(+), 37 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index e9bb51e..48e48c7 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -78,6 +78,11 @@ static void set_desc_cnt(struct fsldma_chan *chan,
    hw->count = CPU_TO_DMA(chan, count, 32);
 }
 
+static u32 get_desc_cnt(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
+{
+   return DMA_TO_CPU(chan, desc->hw.count, 32);
+}
+
 static void set_desc_src(struct fsldma_chan *chan,
 struct fsl_dma_ld_hw *hw, dma_addr_t src)
 {
@@ -88,6 +93,16 @@ static void set_desc_src(struct fsldma_chan *chan,
    hw->src_addr = CPU_TO_DMA(chan, snoop_bits | src, 64);
 }
 
+static dma_addr_t get_desc_src(struct fsldma_chan *chan,
+  struct fsl_desc_sw *desc)
+{
+   u64 snoop_bits;
+
+   snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX)
+   ? ((u64)FSL_DMA_SATR_SREADTYPE_SNOOP_READ << 32) : 0;
+   return DMA_TO_CPU(chan, desc->hw.src_addr, 64) & ~snoop_bits;
+}
+
 static void set_desc_dst(struct fsldma_chan *chan,
 struct fsl_dma_ld_hw *hw, dma_addr_t dst)
 {
@@ -98,6 +113,16 @@ static void set_desc_dst(struct fsldma_chan *chan,
    hw->dst_addr = CPU_TO_DMA(chan, snoop_bits | dst, 64);
 }
 
+static dma_addr_t get_desc_dst(struct fsldma_chan *chan,
+  struct fsl_desc_sw *desc)
+{
+   u64 snoop_bits;
+
+   snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX)
+   ? ((u64)FSL_DMA_DATR_DWRITETYPE_SNOOP_WRITE << 32) : 0;
+   return DMA_TO_CPU(chan, desc->hw.dst_addr, 64) & ~snoop_bits;
+}
+
 static void set_desc_next(struct fsldma_chan *chan,
  struct fsl_dma_ld_hw *hw, dma_addr_t next)
 {
@@ -803,6 +828,57 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 }
 
 /**
+ * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
+ * @chan: Freescale DMA channel
+ * @desc: descriptor to cleanup and free
+ *
+ * This function is used on a descriptor which has been executed by the DMA
+ * controller. It will run any callbacks, submit any dependencies, and then
+ * free the descriptor.
+ */
+static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
+ struct fsl_desc_sw *desc)
+{
+   struct dma_async_tx_descriptor *txd = &desc->async_tx;
+   struct device *dev = chan->common.device->dev;
+   dma_addr_t src = get_desc_src(chan, desc);
+   dma_addr_t dst = get_desc_dst(chan, desc);
+   u32 len = get_desc_cnt(chan, desc);
+
+   /* Run the link descriptor callback function */
+   if (txd->callback) {
+#ifdef FSL_DMA_LD_DEBUG
+   dev_dbg(chan->dev, "%s: LD %p callback\n", chan->name, desc);
+#endif
+   txd->callback(txd->callback_param);
+   }
+
+   /* Run any dependencies */
+   dma_run_dependencies(txd);
+
+   /* Unmap the dst buffer, if requested */
+   if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
+   if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
+   dma_unmap_single(dev, dst, len, DMA_FROM_DEVICE);
+   else
+   dma_unmap_page(dev, dst, len, DMA_FROM_DEVICE);
+   }
+
+   /* Unmap the src buffer, if requested */
+   if (!(txd->flags & DMA_COMPL_SKIP_SRC_UNMAP)) {
+   if (txd->flags & DMA_COMPL_SRC_UNMAP_SINGLE)
+   dma_unmap_single(dev, src, len, DMA_TO_DEVICE);
+   else
+   dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
+   }
+
+#ifdef FSL_DMA_LD_DEBUG
+   dev_dbg(chan->dev, "%s: LD %p free\n", chan->name, desc);
+#endif
+   dma_pool_free(chan->desc_pool, desc, txd->phys);
+}
+
+/**
  * fsl_chan_ld_cleanup - Clean up link descriptors
  * @chan : Freescale DMA channel
  *
@@ -816,57 +892,39 @@ static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
 {
struct fsl_desc_sw *desc, *_desc;
    const char *name = chan->name;
+   LIST_HEAD(ld_cleanup);
    unsigned long flags;
 
    spin_lock_irqsave(&chan->desc_lock, flags);
 
-   /* if the ld_running list is empty, there is nothing to do */
-   if (list_empty(&chan->ld_running)) {
-   dev_dbg(chan->dev, "%s: no descriptors to cleanup\n", name);
-   goto out_unlock;
+   /* update the cookie if we have some

[PATCH v2 8/9] fsldma: reduce locking during descriptor cleanup

2011-03-02 Thread Ira W. Snyder
This merges the fsl_chan_ld_cleanup() function into the dma_do_tasklet()
function to reduce locking overhead. In the best case, we will be able
to keep the DMA controller busy while we are freeing used descriptors.
In all cases, the spinlock is taken two fewer times per transaction
than before.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  114 +
 1 files changed, 49 insertions(+), 65 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 48e48c7..40babc1 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -879,67 +879,16 @@ static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
 }
 
 /**
- * fsl_chan_ld_cleanup - Clean up link descriptors
- * @chan : Freescale DMA channel
- *
- * This function is run after the queue of running descriptors has been
- * executed by the DMA engine. It will run any callbacks, and then free
- * the descriptors.
- *
- * HARDWARE STATE: idle
- */
-static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
-{
-   struct fsl_desc_sw *desc, *_desc;
-   const char *name = chan->name;
-   LIST_HEAD(ld_cleanup);
-   unsigned long flags;
-
-   spin_lock_irqsave(&chan->desc_lock, flags);
-
-   /* update the cookie if we have some descriptors to cleanup */
-   if (!list_empty(&chan->ld_running)) {
-   dma_cookie_t cookie;
-
-   desc = to_fsl_desc(chan->ld_running.prev);
-   cookie = desc->async_tx.cookie;
-
-   chan->completed_cookie = cookie;
-   dev_dbg(chan->dev, "%s: completed_cookie=%d\n", name, cookie);
-   }
-
-   /*
-* move the descriptors to a temporary list so we can drop the lock
-* during the entire cleanup operation
-*/
-   list_splice_tail_init(&chan->ld_running, &ld_cleanup);
-
-   spin_unlock_irqrestore(&chan->desc_lock, flags);
-
-   /* Run the callback for each descriptor, in order */
-   list_for_each_entry_safe(desc, _desc, &ld_cleanup, node) {
-
-   /* Remove from the list of transactions */
-   list_del(&desc->node);
-
-   /* Run all cleanup for this descriptor */
-   fsldma_cleanup_descriptor(chan, desc);
-   }
-}
-
-/**
  * fsl_chan_xfer_ld_queue - transfer any pending transactions
  * @chan : Freescale DMA channel
  *
  * HARDWARE STATE: idle
+ * LOCKING: must hold chan->desc_lock
  */
 static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 {
    const char *name = chan->name;
    struct fsl_desc_sw *desc;
-   unsigned long flags;
-
-   spin_lock_irqsave(&chan->desc_lock, flags);
 
/*
 * If the list of pending descriptors is empty, then we
@@ -947,7 +896,7 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 */
    if (list_empty(&chan->ld_pending)) {
    dev_dbg(chan->dev, "%s: no pending LDs\n", name);
-   goto out_unlock;
+   return;
}
 
/*
@@ -957,7 +906,7 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 */
    if (!chan->idle) {
    dev_dbg(chan->dev, "%s: DMA controller still busy\n", name);
-   goto out_unlock;
+   return;
}
 
/*
@@ -995,9 +944,6 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 
dma_start(chan);
    chan->idle = false;
-
-out_unlock:
-   spin_unlock_irqrestore(&chan->desc_lock, flags);
 }
 
 /**
@@ -1007,7 +953,11 @@ out_unlock:
 static void fsl_dma_memcpy_issue_pending(struct dma_chan *dchan)
 {
struct fsldma_chan *chan = to_fsl_chan(dchan);
+   unsigned long flags;
+
+   spin_lock_irqsave(&chan->desc_lock, flags);
    fsl_chan_xfer_ld_queue(chan);
+   spin_unlock_irqrestore(&chan->desc_lock, flags);
 }
 
 /**
@@ -1109,21 +1059,55 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
 static void dma_do_tasklet(unsigned long data)
 {
struct fsldma_chan *chan = (struct fsldma_chan *)data;
+   struct fsl_desc_sw *desc, *_desc;
+   const char *name = chan-name;
+   LIST_HEAD(ld_cleanup);
unsigned long flags;
 
-   dev_dbg(chan-dev, %s: tasklet entry\n, chan-name);
+   dev_dbg(chan-dev, %s: tasklet entry\n, name);
 
-   /* run all callbacks, free all used descriptors */
-   fsl_chan_ld_cleanup(chan);
-
-   /* the channel is now idle */
spin_lock_irqsave(chan-desc_lock, flags);
+
+   /* update the cookie if we have some descriptors to cleanup */
+   if (!list_empty(chan-ld_running)) {
+   dma_cookie_t cookie;
+
+   desc = to_fsl_desc(chan-ld_running.prev);
+   cookie = desc-async_tx.cookie;
+
+   chan-completed_cookie = cookie;
+   dev_dbg(chan-dev, %s: completed_cookie=%d\n, name, cookie);
+   }
+
+   /*
+* move the descriptors to a temporary list so we can drop the lock

Re: [PATCH 0/8] fsldma: lockup fixes

2011-03-01 Thread Ira W. Snyder
On Tue, Mar 01, 2011 at 07:52:39AM +0200, Felix Radensky wrote:
 Hi Ira,
 
 On 03/01/2011 02:21 AM, Ira W. Snyder wrote:
  On Mon, Feb 28, 2011 at 11:27:40PM +0200, Felix Radensky wrote:
  Hi Ira,
 
  On 02/28/2011 11:11 PM, Ira W. Snyder wrote:
  On Mon, Feb 28, 2011 at 10:15:49PM +0200, Felix Radensky wrote:
  Hi Ira,
 
  Thank you very much Felix. The dmesg output shows that the controller
  never got an interrupt for the second transaction. The patch below has
  extra debugging information that may help determine why this happens.
  Please apply it and re-run the test.
 
  The last section of dmesg (after Freeing unused kernel memory) is all
  I need.
 
  Attached relevant dmesg portion.
 
  Ok, try this patch on top of the last one.
 
  It looks like you used the dmatest module in multi-channel mode last
  time. One channel makes it easier to debug:
 
  modprobe dmatest max_channels=1 threads_per_chan=2 iterations=1
 
  Thanks for your help in debugging this. Hopefully this is the last
  patch to test. :)
 
  Ira
 
  Looks like this was not the last one. The test still fails, see below
 
  From this log, it looks like the DMA controller is not generating an
  interrupt after the second chain is started. The first chain is finished
  before the second thread runs and starts its chain.
 
  The end-of-segments interrupt is completely missing. The part is not
  behaving as the datasheet explains it should. Are you sure you applied
  the patch and rebuilt the kernel? (Just checking to be sure. I'm very
  appreciative of the amount of help you've given me debugging this!)
 
  Can you run this for me:
 
  modprobe dmatest max_channels=1 threads_per_chan=1 iterations=4
 
  Thanks again,
  Ira
 
 Without your patches applied the output of the test above looks
 like this:
 

Thanks, this is exactly what I was going to ask for next. :)

I really don't understand why the P2020 DMA controller isn't behaving
nicely after my patches.

Can you run a git bisect to figure out which patch in the series causes
the problems? It should take three or four build + test cycles to narrow
down which patch breaks the driver. When it is finished, send me the
output of git bisect.

Like this (assuming you have two branches: master and
fsldma, where fsldma is master + my patches):

# set up the bisect
git bisect start
git bisect bad fsldma
git bisect good master

# build and test the kernel using the same test as before:
modprobe dmatest max_channels=1 threads_per_chan=1 iterations=4

# if the test passes:
git bisect good

# if the test fails:
git bisect bad

# now build + test again, then mark that good or bad. Repeat until
# finished.
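
As a rough illustration of why only three or four cycles are needed:
bisection halves the range of candidate commits on every test, so an
8-patch series needs about log2(8) = 3 tests. The sketch below models
that in C. It is not how git implements bisect; first_bad_commit() and
the bad[] array are hypothetical names used only for this example.

```c
/*
 * Hypothetical model of a bisect run (not git's implementation).
 * Commit 0 is the known-good base (master), commits 1..n are the
 * patches, and commit n (the branch tip) is known bad. bad[i] is
 * nonzero when commit i fails the test. Returns the index of the
 * first bad commit and stores the number of build + test cycles
 * in *tests.
 */
int first_bad_commit(const int *bad, int n, int *tests)
{
	int lo = 0;	/* newest commit known to be good */
	int hi = n;	/* oldest commit known to be bad */

	*tests = 0;
	while (hi - lo > 1) {
		int mid = lo + (hi - lo) / 2;

		(*tests)++;		/* one build + test cycle */
		if (bad[mid])
			hi = mid;	/* corresponds to "git bisect bad" */
		else
			lo = mid;	/* corresponds to "git bisect good" */
	}
	return hi;
}
```

With eight patches the loop always finishes after three tests,
whichever patch introduced the failure.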


I really appreciate your help in testing this. You've been great at
providing everything I've asked for.

Thanks,
Ira
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/8] fsldma: lockup fixes

2011-03-01 Thread Ira W. Snyder
On Tue, Mar 01, 2011 at 08:55:15AM -0800, Ira W. Snyder wrote:

[ big snip ]

 
 Thanks, this is exactly what I was going to ask for next. :)
 
 I really don't understand why the P2020 DMA controller isn't behaving
 nicely after my patches.
 
 Can you run a git bisect to figure out which patch in the series causes
 the problems. It should take three or four build + test cycles to narrow
 down which patch breaks the driver. When it is finished, send me the
 output of git bisect.
 
 Like this (assuming you have two branches: master and
 fsldma, where fsldma is master + my patches):
 
 # setup the bisect
 git bisect start
 git bisect bad fsldma
 git bisect good master
 
 # build and test the kernel using the same test as before:
 modprobe dmatest max_channels=1 threads_per_chan=1 iterations=4
 
 # if the test passes:
 git bisect good
 
 # if the test fails:
 git bisect bad
 
 # now build + test again, then mark that good or bad. Repeat until
 # finished.
 
 
 I really appreciate your help in testing this. You've been great at
 providing everything I've asked for.
 

I'd still like the bisect if you have a chance. I've re-reviewed the
patch series, and found the places that change register writes to the
controller.

The patch below changes the register operations back to the original
order. It doesn't make any sense why this would be required, but it is
worth a quick try.

I've added an XXX mark where you can comment out a single line if this
patch fails. It is highly unlikely to make any difference, but I'm
really having a hard time understanding what is wrong.

Ira


From 9e479ce27f8c1819694d7082bb4a27772b4baf52 Mon Sep 17 00:00:00 2001
From: Ira W. Snyder i...@ovro.caltech.edu
Date: Tue, 1 Mar 2011 11:43:00 -0800
Subject: [PATCH] fsldma: try and fix 85xx DMA controller

This is just a random guess at what might be wrong. The datasheet
doesn't say that a completed transfer must be aborted before starting a
new transfer (nor does it make much sense). However, the old code did it
anyway.

NOT AT ALL Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   15 +++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index e4d9d17..d8eedbc 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -213,6 +213,7 @@ static void dma_halt(struct fsldma_chan *chan)
 	int i;
 
 	mode = DMA_IN(chan, &chan->regs->mr, 32);
+	dev_dbg(chan->dev, "%s: dma_halt mode=0x%.8x\n", chan->name, mode);
 	mode |= FSL_DMA_MR_CA;
 	DMA_OUT(chan, &chan->regs->mr, mode, 32);
 
@@ -921,10 +922,24 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 	list_splice_tail_init(&chan->ld_pending, &chan->ld_running);
 
 	/*
+	 * XXX: Guess at problems
+	 *
+	 * The 85xx requires that you run this routine before you try to start
+	 * the next DMA for an as yet unknown reason. Maybe.
+	 */
+	if ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) {
+		dev_dbg(chan->dev, "%s: 85xx, running workaround\n", name);
+		dma_halt(chan);
+	}
+
+	/*
 	 * Program the descriptor's address into the DMA controller,
 	 * then start the DMA transaction
 	 */
 	set_cdar(chan, desc->async_tx.phys);
+
+
+	/* XXX: if that doesn't work, comment the get_cdar() line below */
 	get_cdar(chan);
 
 	dma_start(chan);
-- 
1.7.3.4


Re: [PATCH 0/8] fsldma: lockup fixes

2011-02-28 Thread Ira W. Snyder
On Mon, Feb 28, 2011 at 01:36:38PM +0200, Felix Radensky wrote:
 Hi Ira,
 
 I've tried your patches with linux-2.6.38-rc6 on P2020RDB.
 DMA test fails with the following errors if threads_per_chan != 1
 
 dma0chan0-copy1: terminating after 1 tests, 1 failures (status 0)
 dma0chan0-copy2: #0: test timed out
 
 I've run the test like this:
 
 modprobe dmatest threads_per_chan=2 iterations=1
 

Thanks Felix. This works fine on the 83xx DMA controller. When you have
a chance, can you add #define DEBUG 1 as the first line of
drivers/dma/fsldma.c and then rerun your test with:

modprobe dmatest threads_per_chan=2 iterations=1 max_channels=1

And send me the dmesg output.

I don't quite understand the difference between links and lists in the
85xx controller yet. I'll work my way through the datasheet this morning
and send out a fixed patch.

Thanks very much for running the tests!

Ira


Re: [PATCH 0/8] fsldma: lockup fixes

2011-02-28 Thread Ira W. Snyder
On Mon, Feb 28, 2011 at 08:47:42PM +0200, Felix Radensky wrote:
 Hi Ira,
 
 Attached dmesg output.
 
 Felix.
 
 On Mon, Feb 28, 2011 at 01:36:38PM +0200, Felix Radensky wrote:
  Hi Ira,
  
  I've tried your patches with linux-2.6.38-rc6 on P2020RDB.
  DMA test fails with the following errors if threads_per_chan != 1
  
  dma0chan0-copy1: terminating after 1 tests, 1 failures (status 0)
  dma0chan0-copy2: #0: test timed out
  
  I've run the test like this:
  
  modprobe dmatest threads_per_chan=2 iterations=1
  
 
 Thanks Felix. This works fine on the 83xx DMA controller. When you have
 a chance, can you add #define DEBUG 1 as the first line of
 drivers/dma/fsldma.c and then rerun your test with:
 
 modprobe dmatest threads_per_chan=2 iterations=1 max_channels=1
 
 And send me the dmesg output.
 
 I don't quite understand the difference between links and lists in the
 85xx controller yet. I'll work my way through the datasheet this morning
 and send out a fixed patch.
 
 Thanks very much for running the tests!
 
 Ira

[ snip most of dmesg output ]

 Freeing unused kernel memory: 136k init
 __dma_request_channel: success (dma0chan0)
 of:fsl-elo-dma ffe0c300.dma: chan0: idle, starting controller
 dmatest: Started 2 threads using dma0chan0
 of:fsl-elo-dma ffe0c300.dma: chan0: irq: stat = 0x8
 of:fsl-elo-dma ffe0c300.dma: chan0: irq: End-of-link INT
 of:fsl-elo-dma ffe0c300.dma: chan0: irq: Exit
 of:fsl-elo-dma ffe0c300.dma: chan0: tasklet entry
 of:fsl-elo-dma ffe0c300.dma: chan0: completed_cookie=1
 of:fsl-elo-dma ffe0c300.dma: chan0: no pending LDs
 of:fsl-elo-dma ffe0c300.dma: chan0: tasklet exit
 dma0chan0-copy0: verifying source buffer...
 dma0chan0-copy0: verifying dest buffer...
 dma0chan0-copy0: #0: No errors with src_off=0x3a2 dst_off=0xc1e len=0x2ce5
 dma0chan0-copy0: terminating after 1 tests, 0 failures (status 0)
 of:fsl-elo-dma ffe0c300.dma: chan0: idle, starting controller
 dma0chan0-copy1: #0: test timed out
 dma0chan0-copy1: terminating after 1 tests, 1 failures (status 0)

Thank you very much Felix. The dmesg output shows that the controller
never got an interrupt for the second transaction. The patch below has
extra debugging information that may help determine why this happens.
Please apply it and re-run the test.

The last section of dmesg (after Freeing unused kernel memory) is all
I need.

Thanks again,
Ira


From 8935444cb18c921332ebe1d055531e54f0c100e9 Mon Sep 17 00:00:00 2001
From: Ira W. Snyder i...@ovro.caltech.edu
Date: Mon, 28 Feb 2011 11:33:17 -0800
Subject: [PATCH] fsldma: try and debug 85xx controller

1 - reduce the maximum transfer size to 1000 bytes to force chains
2 - re-enable end-of-segment interrupts to see what the hardware does
3 - enable end-of-list interrupts to see what the hardware does
4 - debug cookies (this shouldn't be a problem, but just in case)
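
To illustrate item 1: a copy longer than the per-descriptor limit is
split into a chain of link descriptors, each carrying at most the
limit. The helper below is a minimal sketch of that arithmetic only;
model_chain_length() is a hypothetical name, not a function in
fsldma.c.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of link descriptors needed to copy
 * len bytes when each descriptor moves at most max_cnt bytes.
 * max_cnt must be nonzero.
 */
size_t model_chain_length(size_t len, size_t max_cnt)
{
	size_t descs = 0;

	while (len > 0) {
		/* same idea as copy = min(len, max_cnt) in the driver */
		size_t copy = len < max_cnt ? len : max_cnt;

		len -= copy;
		descs++;
	}
	return descs;
}
```

With a 1000 byte limit, the len=0x2ce5 (11493 byte) copy seen in the
dmesg output above would be split into 12 descriptors.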

NOT AT ALL Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   16 
 drivers/dma/fsldma.h |3 ++-
 2 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 3dc27a9..b82b76e 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -24,6 +24,9 @@
  *
  */
 
+#define DEBUG 1
+#define FSL_DMA_LD_DEBUG 1
+
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/pci.h>
@@ -162,6 +165,7 @@ static void dma_init(struct fsldma_chan *chan)
 		 * BWC - Bandwidth sharing among channels
 		 */
 		DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_BWC
+				| FSL_DMA_MR_EOSIE | FSL_DMA_MR_EOLSIE
 				| FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE, 32);
 		break;
 	case FSL_DMA_IP_83XX:
@@ -389,6 +393,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 	 * that make up this transaction
 	 */
 	cookie = chan->common.cookie;
+	dev_dbg(chan->dev, "%s: assign cookies: start=%d\n", chan->name, cookie);
 	list_for_each_entry(child, &desc->tx_list, node) {
 		cookie++;
 		if (cookie < DMA_MIN_COOKIE)
@@ -397,6 +402,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 		child->async_tx.cookie = cookie;
 	}
 
+	dev_dbg(chan->dev, "%s: assign cookies: end=%d\n", chan->name, cookie);
 	chan->common.cookie = cookie;
 
 	/* put this transaction onto the tail of the pending queue */
@@ -1018,6 +1024,16 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
 		stat &= ~FSL_DMA_SR_EOLNI;
 	}
 
+	if (stat & FSL_DMA_SR_EOLSI) {
+		dev_dbg(chan->dev, "%s: irq: End-of-list INT\n", name);
+		stat &= ~FSL_DMA_SR_EOLSI;
+	}
+
+	if (stat & FSL_DMA_SR_EOSI) {
+		dev_dbg(chan->dev, "%s: irq: End-of-segment INT\n", name);
+		stat &= ~FSL_DMA_SR_EOSI

Re: [PATCH 0/8] fsldma: lockup fixes

2011-02-28 Thread Ira W. Snyder
On Mon, Feb 28, 2011 at 10:15:49PM +0200, Felix Radensky wrote:
 Hi Ira,
 
  Thank you very much Felix. The dmesg output shows that the controller
  never got an interrupt for the second transaction. The patch below has
  extra debugging information that may help determine why this happens.
  Please apply it and re-run the test.
 
  The last section of dmesg (after Freeing unused kernel memory) is all
  I need.
 
 
 Attached relevant dmesg portion.
 

Ok, try this patch on top of the last one.

It looks like you used the dmatest module in multi-channel mode last
time. One channel makes it easier to debug:

modprobe dmatest max_channels=1 threads_per_chan=2 iterations=1

Thanks for your help in debugging this. Hopefully this is the last
patch to test. :)

Ira


From 58bc23c3b68f8db0aa09434fdeb6aef641a5eadd Mon Sep 17 00:00:00 2001
From: Ira W. Snyder i...@ovro.caltech.edu
Date: Mon, 28 Feb 2011 12:55:55 -0800
Subject: [PATCH] fsldma: enable end-of-segments interrupt on last descriptor

This is a hack to manually set the end-of-segments interrupt on the last
descriptor in each chain. It appears that the P2020RDB controller
doesn't generate the end-of-links interrupt as explained in the
datasheet.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index b82b76e..e4d9d17 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -141,7 +141,7 @@ static void set_ld_eol(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
 	u64 snoop_bits;
 
 	snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_83XX)
-		? FSL_DMA_SNEN : 0;
+		? FSL_DMA_SNEN : (u64)(0x8);
 
 	desc->hw.next_ln_addr = CPU_TO_DMA(chan,
 		DMA_TO_CPU(chan, desc->hw.next_ln_addr, 64) | FSL_DMA_EOL
@@ -165,7 +165,6 @@ static void dma_init(struct fsldma_chan *chan)
 		 * BWC - Bandwidth sharing among channels
 		 */
 		DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_BWC
-				| FSL_DMA_MR_EOSIE | FSL_DMA_MR_EOLSIE
 				| FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE, 32);
 		break;
 	case FSL_DMA_IP_83XX:
-- 
1.7.3.4




Re: [PATCH 0/8] fsldma: lockup fixes

2011-02-28 Thread Ira W. Snyder
On Mon, Feb 28, 2011 at 11:27:40PM +0200, Felix Radensky wrote:
 Hi Ira,
 
 On 02/28/2011 11:11 PM, Ira W. Snyder wrote:
  On Mon, Feb 28, 2011 at 10:15:49PM +0200, Felix Radensky wrote:
  Hi Ira,
 
  Thank you very much Felix. The dmesg output shows that the controller
  never got an interrupt for the second transaction. The patch below has
  extra debugging information that may help determine why this happens.
  Please apply it and re-run the test.
 
  The last section of dmesg (after Freeing unused kernel memory) is all
  I need.
 
  Attached relevant dmesg portion.
 
  Ok, try this patch on top of the last one.
 
  It looks like you used the dmatest module in multi-channel mode last
  time. One channel makes it easier to debug:
 
  modprobe dmatest max_channels=1 threads_per_chan=2 iterations=1
 
  Thanks for your help in debugging this. Hopefully this is the last
  patch to test. :)
 
  Ira
 
 
 Looks like this was not the last one. The test still fails, see below
 

From this log, it looks like the DMA controller is not generating an
interrupt after the second chain is started. The first chain is finished
before the second thread runs and starts its chain.

The end-of-segments interrupt is completely missing. The part is not
behaving as the datasheet explains it should. Are you sure you applied
the patch and rebuilt the kernel? (Just checking to be sure. I'm very
appreciative of the amount of help you've given me debugging this!)

Can you run this for me:

modprobe dmatest max_channels=1 threads_per_chan=1 iterations=4

Thanks again,
Ira


[PATCH 0/8] fsldma: lockup fixes

2011-02-25 Thread Ira W. Snyder
Hello everyone,

I've been chasing random infrequent controller lockups in the fsldma driver
for a long time. I finally managed to find the problem and fix it. I'm not
quite sure about the exact sequence of events which causes the race
condition, but it is related to using the hardware registers to track the
controller state. See the patch changelogs for more detail.
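
As a sketch of the alternative to reading hardware registers, the model
below keeps the "is the controller busy?" answer in a software flag
that changes only at two well-defined points: when a transfer is
started and when its completion is handled. This assumes the fix works
along the lines of the chan->idle flag visible in the diffs in this
thread; the struct and function names here are hypothetical and heavily
simplified, and the locking the real driver needs is omitted.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a DMA channel; not the driver's types. */
struct model_chan {
	bool idle;	/* software view of the controller state */
	int pending;	/* descriptors submitted but not yet started */
	int running;	/* descriptors handed to the controller */
};

/*
 * Start queued work only if the software state says the controller is
 * idle. Returns true when a transfer was started.
 */
bool model_xfer_ld_queue(struct model_chan *c)
{
	if (c->pending == 0)
		return false;	/* nothing to do */
	if (!c->idle)
		return false;	/* busy; the completion path restarts us */

	c->running += c->pending;
	c->pending = 0;
	c->idle = false;	/* set at the point the hardware is started */
	return true;
}

/* Completion path: the only place where idle becomes true again. */
void model_complete(struct model_chan *c)
{
	c->running = 0;
	c->idle = true;
	model_xfer_ld_queue(c);	/* pick up work queued while busy */
}
```

Because the flag never depends on a register read, a submission that
arrives while a transfer is in flight simply waits in the pending count
until the completion path restarts the queue.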

The problems were quickly found by turning on DMAPOOL_DEBUG inside
mm/dmapool.c. This poisons memory allocated with the dmapool API.

With dmapool poisoning turned on, the dmatest driver would start producing
failures within a few seconds. After this patchset has been applied, I have
run several iterations of the 10 threads per channel, 10 iterations per
thread test without any problems.
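
Poisoning helps because a freed block is overwritten with a
recognizable byte pattern, so code that keeps using a freed descriptor
sees garbage immediately instead of stale but plausible data. The
user-space sketch below shows the idea only. The pool layout, the
function names and the poison value are illustrative assumptions, not
the mm/dmapool.c implementation.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_POISON_FREED	0xa7	/* arbitrary illustrative pattern */
#define MODEL_BLOCK_SIZE	32
#define MODEL_NUM_BLOCKS	4

/* Hypothetical fixed-size pool; zero-initialize before use. */
struct model_pool {
	unsigned char blocks[MODEL_NUM_BLOCKS][MODEL_BLOCK_SIZE];
	int in_use[MODEL_NUM_BLOCKS];
};

/* Hand out the first free block, or NULL when the pool is exhausted. */
void *model_pool_alloc(struct model_pool *p)
{
	int i;

	for (i = 0; i < MODEL_NUM_BLOCKS; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = 1;
			return p->blocks[i];
		}
	}
	return NULL;
}

/* Return a block to the pool, overwriting its contents with poison. */
void model_pool_free(struct model_pool *p, void *ptr)
{
	int i;

	for (i = 0; i < MODEL_NUM_BLOCKS; i++) {
		if (ptr == (void *)p->blocks[i]) {
			memset(p->blocks[i], MODEL_POISON_FREED,
			       MODEL_BLOCK_SIZE);
			p->in_use[i] = 0;
			return;
		}
	}
}

/* Returns nonzero if the block holds nothing but the poison pattern. */
int model_block_is_poisoned(const void *ptr)
{
	const unsigned char *b = ptr;
	size_t i;

	for (i = 0; i < MODEL_BLOCK_SIZE; i++)
		if (b[i] != MODEL_POISON_FREED)
			return 0;
	return 1;
}
```

A descriptor that is read after model_pool_free() now contains only the
poison byte, which makes a use-after-free show up on the first access
rather than at some later, unrelated point.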

I have made some changes which affect the 85xx/86xx part. I believe that
the changes only affect features which have been unused since the rewrite
in Jan 2010. It would be very good to get a test report from an 85xx/86xx
user.

While making the previous changes, I noticed that the fsldma driver does
not respect the automatic DMA unmapping of src and dst buffers. I have
added support for this feature. This also required a fix to dmatest, which
was sending incorrect flags.

The async_tx dependencies support could be split into a separate patch
from the automatic unmapping support if that is desirable. They both touch
the same piece of code, so I thought it was ok to combine them. Let me know.

I would really like to see this go into 2.6.39. I think we can get it
reviewed before then. :)

Ira W. Snyder (8):
  fsldma: move related helper functions near each other
  fsldma: use channel name in printk output
  fsldma: improve link descriptor debugging
  fsldma: minor codingstyle and consistency fixes
  fsldma: fix controller lockups
  fsldma: support async_tx dependencies and automatic unmapping
  dmatest: fix automatic buffer unmap type
  fsldma: reduce locking during descriptor cleanup

 drivers/dma/dmatest.c |7 +-
 drivers/dma/fsldma.c  |  485 +---
 drivers/dma/fsldma.h  |6 +-
 3 files changed, 263 insertions(+), 235 deletions(-)

-- 
1.7.3.4



[PATCH 3/8] fsldma: improve link descriptor debugging

2011-02-25 Thread Ira W. Snyder
This adds better tracking to link descriptor allocations, callbacks, and
frees. This makes it much easier to track errors with link descriptors.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   21 +++--
 1 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 6e3d3d7..851993c 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -416,6 +416,10 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
 	desc->async_tx.tx_submit = fsl_dma_tx_submit;
 	desc->async_tx.phys = pdesc;
 
+#ifdef FSL_DMA_LD_DEBUG
+	dev_dbg(chan->dev, "%s: LD %p allocated\n", chan->name, desc);
+#endif
+
 	return desc;
 }
 
@@ -467,6 +471,9 @@ static void fsldma_free_desc_list(struct fsldma_chan *chan,
 
 	list_for_each_entry_safe(desc, _desc, list, node) {
 		list_del(&desc->node);
+#ifdef FSL_DMA_LD_DEBUG
+		dev_dbg(chan->dev, "%s: LD %p free\n", chan->name, desc);
+#endif
 		dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
 	}
 }
@@ -478,6 +485,9 @@ static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
 
 	list_for_each_entry_safe_reverse(desc, _desc, list, node) {
 		list_del(&desc->node);
+#ifdef FSL_DMA_LD_DEBUG
+		dev_dbg(chan->dev, "%s: LD %p free\n", chan->name, desc);
+#endif
 		dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
 	}
 }
@@ -554,9 +564,6 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy(
 			dev_err(chan->dev, "%s: %s\n", chan->name, msg_ld_oom);
 			goto fail;
 		}
-#ifdef FSL_DMA_LD_DEBUG
-		dev_dbg(chan->dev, "%s: new link desc alloc %p\n", chan->name, new);
-#endif
 
 		copy = min(len, (size_t)FSL_DMA_BCR_MAX_CNT);
 
@@ -642,9 +649,6 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_sg(struct dma_chan *dchan,
 			dev_err(chan->dev, "%s: %s\n", chan->name, msg_ld_oom);
 			goto fail;
 		}
-#ifdef FSL_DMA_LD_DEBUG
-		dev_dbg(chan->dev, "%s: new link desc alloc %p\n", chan->name, new);
-#endif
 
 		set_desc_cnt(chan, &new->hw, len);
 		set_desc_src(chan, &new->hw, src);
@@ -881,13 +885,18 @@ static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
 		callback_param = desc->async_tx.callback_param;
 		if (callback) {
 			spin_unlock_irqrestore(&chan->desc_lock, flags);
+#ifdef FSL_DMA_LD_DEBUG
 			dev_dbg(chan->dev, "%s: LD %p callback\n", name, desc);
+#endif
 			callback(callback_param);
 			spin_lock_irqsave(&chan->desc_lock, flags);
 		}
 
 		/* Run any dependencies, then free the descriptor */
 		dma_run_dependencies(&desc->async_tx);
+#ifdef FSL_DMA_LD_DEBUG
+		dev_dbg(chan->dev, "%s: LD %p free\n", name, desc);
+#endif
 		dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
 	}
 
-- 
1.7.3.4



[PATCH 1/8] fsldma: move related helper functions near each other

2011-02-25 Thread Ira W. Snyder
This is a purely cosmetic cleanup. It is nice to have related functions
right next to each other in the code.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  116 +++--
 1 files changed, 64 insertions(+), 52 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 4de947a..2e1af45 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -39,33 +39,9 @@
 
 static const char msg_ld_oom[] = "No free memory for link descriptor\n";
 
-static void dma_init(struct fsldma_chan *chan)
-{
-	/* Reset the channel */
-	DMA_OUT(chan, &chan->regs->mr, 0, 32);
-
-	switch (chan->feature & FSL_DMA_IP_MASK) {
-	case FSL_DMA_IP_85XX:
-		/* Set the channel to below modes:
-		 * EIE - Error interrupt enable
-		 * EOSIE - End of segments interrupt enable (basic mode)
-		 * EOLNIE - End of links interrupt enable
-		 * BWC - Bandwidth sharing among channels
-		 */
-		DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_BWC
-				| FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE
-				| FSL_DMA_MR_EOSIE, 32);
-		break;
-	case FSL_DMA_IP_83XX:
-		/* Set the channel to below modes:
-		 * EOTIE - End-of-transfer interrupt enable
-		 * PRC_RM - PCI read multiple
-		 */
-		DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_EOTIE
-				| FSL_DMA_MR_PRC_RM, 32);
-		break;
-	}
-}
+/*
+ * Register Helpers
+ */
 
 static void set_sr(struct fsldma_chan *chan, u32 val)
 {
@@ -77,6 +53,30 @@ static u32 get_sr(struct fsldma_chan *chan)
 	return DMA_IN(chan, &chan->regs->sr, 32);
 }
 
+static void set_cdar(struct fsldma_chan *chan, dma_addr_t addr)
+{
+	DMA_OUT(chan, &chan->regs->cdar, addr | FSL_DMA_SNEN, 64);
+}
+
+static dma_addr_t get_cdar(struct fsldma_chan *chan)
+{
+	return DMA_IN(chan, &chan->regs->cdar, 64) & ~FSL_DMA_SNEN;
+}
+
+static dma_addr_t get_ndar(struct fsldma_chan *chan)
+{
+	return DMA_IN(chan, &chan->regs->ndar, 64);
+}
+
+static u32 get_bcr(struct fsldma_chan *chan)
+{
+	return DMA_IN(chan, &chan->regs->bcr, 32);
+}
+
+/*
+ * Descriptor Helpers
+ */
+
 static void set_desc_cnt(struct fsldma_chan *chan,
 				struct fsl_dma_ld_hw *hw, u32 count)
 {
@@ -113,24 +113,49 @@ static void set_desc_next(struct fsldma_chan *chan,
 	hw->next_ln_addr = CPU_TO_DMA(chan, snoop_bits | next, 64);
 }
 
-static void set_cdar(struct fsldma_chan *chan, dma_addr_t addr)
+static void set_ld_eol(struct fsldma_chan *chan,
+			struct fsl_desc_sw *desc)
 {
-	DMA_OUT(chan, &chan->regs->cdar, addr | FSL_DMA_SNEN, 64);
-}
+	u64 snoop_bits;
 
-static dma_addr_t get_cdar(struct fsldma_chan *chan)
-{
-	return DMA_IN(chan, &chan->regs->cdar, 64) & ~FSL_DMA_SNEN;
-}
+	snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_83XX)
+		? FSL_DMA_SNEN : 0;
 
-static dma_addr_t get_ndar(struct fsldma_chan *chan)
-{
-	return DMA_IN(chan, &chan->regs->ndar, 64);
+	desc->hw.next_ln_addr = CPU_TO_DMA(chan,
+		DMA_TO_CPU(chan, desc->hw.next_ln_addr, 64) | FSL_DMA_EOL
+		| snoop_bits, 64);
 }
 
-static u32 get_bcr(struct fsldma_chan *chan)
+/*
+ * DMA Engine Hardware Control Helpers
+ */
+
+static void dma_init(struct fsldma_chan *chan)
 {
-	return DMA_IN(chan, &chan->regs->bcr, 32);
+	/* Reset the channel */
+	DMA_OUT(chan, &chan->regs->mr, 0, 32);
+
+	switch (chan->feature & FSL_DMA_IP_MASK) {
+	case FSL_DMA_IP_85XX:
+		/* Set the channel to below modes:
+		 * EIE - Error interrupt enable
+		 * EOSIE - End of segments interrupt enable (basic mode)
+		 * EOLNIE - End of links interrupt enable
+		 * BWC - Bandwidth sharing among channels
+		 */
+		DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_BWC
+				| FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE
+				| FSL_DMA_MR_EOSIE, 32);
+		break;
+	case FSL_DMA_IP_83XX:
+		/* Set the channel to below modes:
+		 * EOTIE - End-of-transfer interrupt enable
+		 * PRC_RM - PCI read multiple
+		 */
+		DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_EOTIE
+				| FSL_DMA_MR_PRC_RM, 32);
+		break;
+	}
 }
 
 static int dma_is_idle(struct fsldma_chan *chan)
@@ -185,19 +210,6 @@ static void dma_halt(struct fsldma_chan *chan)
 		dev_err(chan->dev, "DMA halt timeout!\n");
 }
 
-static void set_ld_eol(struct fsldma_chan *chan,
-			struct fsl_desc_sw *desc)
-{
-	u64 snoop_bits;
-
-	snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_83XX

[PATCH 4/8] fsldma: minor codingstyle and consistency fixes

2011-02-25 Thread Ira W. Snyder
This fixes some minor violations of the coding style. It also changes
the style of the device_prep_dma_*() function definitions so they are
identical.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   29 +
 drivers/dma/fsldma.h |4 ++--
 2 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 851993c..06421c0 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -84,7 +84,7 @@ static void set_desc_cnt(struct fsldma_chan *chan,
 }
 
 static void set_desc_src(struct fsldma_chan *chan,
-				struct fsl_dma_ld_hw *hw, dma_addr_t src)
+			 struct fsl_dma_ld_hw *hw, dma_addr_t src)
 {
 	u64 snoop_bits;
 
@@ -94,7 +94,7 @@ static void set_desc_src(struct fsldma_chan *chan,
 }
 
 static void set_desc_dst(struct fsldma_chan *chan,
-				struct fsl_dma_ld_hw *hw, dma_addr_t dst)
+			 struct fsl_dma_ld_hw *hw, dma_addr_t dst)
 {
 	u64 snoop_bits;
 
@@ -104,7 +104,7 @@ static void set_desc_dst(struct fsldma_chan *chan,
 }
 
 static void set_desc_next(struct fsldma_chan *chan,
-				struct fsl_dma_ld_hw *hw, dma_addr_t next)
+			  struct fsl_dma_ld_hw *hw, dma_addr_t next)
 {
 	u64 snoop_bits;
 
@@ -113,8 +113,7 @@ static void set_desc_next(struct fsldma_chan *chan,
 	hw->next_ln_addr = CPU_TO_DMA(chan, snoop_bits | next, 64);
 }
 
-static void set_ld_eol(struct fsldma_chan *chan,
-			struct fsl_desc_sw *desc)
+static void set_ld_eol(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
 {
 	u64 snoop_bits;
 
@@ -333,8 +332,7 @@ static void fsl_chan_toggle_ext_start(struct fsldma_chan *chan, int enable)
 		chan->feature &= ~FSL_DMA_CHAN_START_EXT;
 }
 
-static void append_ld_queue(struct fsldma_chan *chan,
-		struct fsl_desc_sw *desc)
+static void append_ld_queue(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
 {
 	struct fsl_desc_sw *tail = to_fsl_desc(chan->ld_pending.prev);
 
@@ -375,8 +373,8 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
 	cookie = chan->common.cookie;
 	list_for_each_entry(child, &desc->tx_list, node) {
 		cookie++;
-		if (cookie < 0)
-			cookie = 1;
+		if (cookie < DMA_MIN_COOKIE)
+			cookie = DMA_MIN_COOKIE;
 
 		child->async_tx.cookie = cookie;
 	}
@@ -397,8 +395,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
  *
  * Return - The descriptor allocated. NULL for failed.
  */
-static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
-					struct fsldma_chan *chan)
+static struct fsl_desc_sw *fsl_dma_alloc_descriptor(struct fsldma_chan *chan)
 {
 	const char *name = chan->name;
 	struct fsl_desc_sw *desc;
@@ -423,7 +420,6 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
 	return desc;
 }
 
-
 /**
  * fsl_dma_alloc_chan_resources - Allocate resources for DMA channel.
  * @chan : Freescale DMA channel
@@ -534,14 +530,15 @@ fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags)
 	/* Insert the link descriptor to the LD ring */
 	list_add_tail(&new->node, &new->tx_list);
 
-	/* Set End-of-link to the last link descriptor of new list*/
+	/* Set End-of-link to the last link descriptor of new list */
 	set_ld_eol(chan, new);
 
 	return &new->async_tx;
 }
 
-static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy(
-	struct dma_chan *dchan, dma_addr_t dma_dst, dma_addr_t dma_src,
+static struct dma_async_tx_descriptor *
+fsl_dma_prep_memcpy(struct dma_chan *dchan,
+	dma_addr_t dma_dst, dma_addr_t dma_src,
 	size_t len, unsigned long flags)
 {
 	struct fsldma_chan *chan;
@@ -591,7 +588,7 @@ static struct dma_async_tx_descriptor *fsl_dma_prep_memcpy(
 	new->async_tx.flags = flags; /* client is in control of this ack */
 	new->async_tx.cookie = -EBUSY;
 
-	/* Set End-of-link to the last link descriptor of new list*/
+	/* Set End-of-link to the last link descriptor of new list */
 	set_ld_eol(chan, new);
 
 	return &first->async_tx;
diff --git a/drivers/dma/fsldma.h b/drivers/dma/fsldma.h
index 113e713..49189da 100644
--- a/drivers/dma/fsldma.h
+++ b/drivers/dma/fsldma.h
@@ -102,8 +102,8 @@ struct fsl_desc_sw {
 } __attribute__((aligned(32)));
 
 struct fsldma_chan_regs {
-	u32 mr;	/* 0x00 - Mode Register */
-	u32 sr;	/* 0x04 - Status Register */
+	u32 mr;		/* 0x00 - Mode Register */
+	u32 sr;		/* 0x04 - Status Register */
 	u64 cdar;	/* 0x08 - Current descriptor address register */
 	u64 sar;	/* 0x10 - Source Address Register */
 	u64 dar;	/* 0x18

[PATCH 2/8] fsldma: use channel name in printk output

2011-02-25 Thread Ira W. Snyder
This makes debugging the driver much easier when multiple channels are
running concurrently. In addition, you can see how much descriptor
memory each channel has allocated via the dmapool API in sysfs.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |   60 +++--
 drivers/dma/fsldma.h |1 +
 2 files changed, 34 insertions(+), 27 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 2e1af45..6e3d3d7 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -37,7 +37,7 @@
 
 #include fsldma.h
 
-static const char msg_ld_oom[] = No free memory for link descriptor\n;
+static const char msg_ld_oom[] = No free memory for link descriptor;
 
 /*
  * Register Helpers
@@ -207,7 +207,7 @@ static void dma_halt(struct fsldma_chan *chan)
}
 
if (!dma_is_idle(chan))
-   dev_err(chan-dev, DMA halt timeout!\n);
+   dev_err(chan-dev, %s: DMA halt timeout!\n, chan-name);
 }
 
 /**
@@ -400,12 +400,13 @@ static dma_cookie_t fsl_dma_tx_submit(struct 
dma_async_tx_descriptor *tx)
 static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
struct fsldma_chan *chan)
 {
+   const char *name = chan->name;
struct fsl_desc_sw *desc;
dma_addr_t pdesc;
 
desc = dma_pool_alloc(chan->desc_pool, GFP_ATOMIC, &pdesc);
if (!desc) {
-   dev_dbg(chan->dev, "out of memory for link desc\n");
+   dev_dbg(chan->dev, "%s: out of memory for link desc\n", name);
return NULL;
}
 
@@ -439,13 +440,12 @@ static int fsl_dma_alloc_chan_resources(struct dma_chan 
*dchan)
 * We need the descriptor to be aligned to 32bytes
 * for meeting FSL DMA specification requirement.
 */
-   chan->desc_pool = dma_pool_create("fsl_dma_engine_desc_pool",
- chan->dev,
+   chan->desc_pool = dma_pool_create(chan->name, chan->dev,
  sizeof(struct fsl_desc_sw),
  __alignof__(struct fsl_desc_sw), 0);
if (!chan->desc_pool) {
-   dev_err(chan->dev, "unable to allocate channel %d "
-  "descriptor pool\n", chan->id);
+   dev_err(chan->dev, "%s: unable to allocate descriptor pool\n",
+  chan->name);
return -ENOMEM;
}
 
@@ -491,7 +491,7 @@ static void fsl_dma_free_chan_resources(struct dma_chan 
*dchan)
struct fsldma_chan *chan = to_fsl_chan(dchan);
unsigned long flags;
 
-   dev_dbg(chan->dev, "Free all channel resources.\n");
+   dev_dbg(chan->dev, "%s: Free all channel resources.\n", chan->name);
spin_lock_irqsave(&chan->desc_lock, flags);
fsldma_free_desc_list(chan, &chan->ld_pending);
fsldma_free_desc_list(chan, &chan->ld_running);
@@ -514,7 +514,7 @@ fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned 
long flags)
 
new = fsl_dma_alloc_descriptor(chan);
if (!new) {
-   dev_err(chan->dev, msg_ld_oom);
+   dev_err(chan->dev, "%s: %s\n", chan->name, msg_ld_oom);
return NULL;
}
 
@@ -551,11 +551,11 @@ static struct dma_async_tx_descriptor 
*fsl_dma_prep_memcpy(
/* Allocate the link descriptor from DMA pool */
new = fsl_dma_alloc_descriptor(chan);
if (!new) {
-   dev_err(chan->dev, msg_ld_oom);
+   dev_err(chan->dev, "%s: %s\n", chan->name, msg_ld_oom);
goto fail;
}
 #ifdef FSL_DMA_LD_DEBUG
-   dev_dbg(chan->dev, "new link desc alloc %p\n", new);
+   dev_dbg(chan->dev, "%s: new link desc alloc %p\n", chan->name, new);
 #endif
 
copy = min(len, (size_t)FSL_DMA_BCR_MAX_CNT);
@@ -639,11 +639,11 @@ static struct dma_async_tx_descriptor 
*fsl_dma_prep_sg(struct dma_chan *dchan,
/* allocate and populate the descriptor */
new = fsl_dma_alloc_descriptor(chan);
if (!new) {
-   dev_err(chan->dev, msg_ld_oom);
+   dev_err(chan->dev, "%s: %s\n", chan->name, msg_ld_oom);
goto fail;
}
 #ifdef FSL_DMA_LD_DEBUG
-   dev_dbg(chan->dev, "new link desc alloc %p\n", new);
+   dev_dbg(chan->dev, "%s: new link desc alloc %p\n", chan->name, new);
 #endif
 
set_desc_cnt(chan, &new->hw, len);
@@ -815,7 +815,7 @@ static void fsl_dma_update_completed_cookie(struct 
fsldma_chan *chan)
spin_lock_irqsave(&chan->desc_lock, flags);
 
if (list_empty(&chan->ld_running)) {
-   dev_dbg(chan->dev, "no running descriptors\n");
+   dev_dbg(chan->dev, "%s: no running descriptors\n", chan->name);
goto out_unlock;
}
 
@@ -859,11 +859,13 @@ static enum dma_status

[PATCH 7/8] dmatest: fix automatic buffer unmap type

2011-02-25 Thread Ira W. Snyder
The dmatest code relies on the DMAEngine API to automatically call
dma_unmap_single() on src buffers, but the flags it passes do not request
this. Fix them.
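
To make the effect of the flag change concrete, here is a minimal userspace
sketch of the unmap decision (modelled on the fsldma cleanup logic in patch
6/8 of this series). The SK_* names and their bit values are assumptions made
for illustration only; the real flags are defined in linux/dmaengine.h.

```c
/* Illustrative bit values only; not copied from the kernel headers. */
enum sk_ctrl_flags {
	SK_PREP_INTERRUPT          = 1u << 0,
	SK_CTRL_ACK                = 1u << 1,
	SK_COMPL_SKIP_SRC_UNMAP    = 1u << 2,
	SK_COMPL_SKIP_DEST_UNMAP   = 1u << 3,
	SK_COMPL_SRC_UNMAP_SINGLE  = 1u << 4,
	SK_COMPL_DEST_UNMAP_SINGLE = 1u << 5,
};

enum sk_unmap { SK_UNMAP_NONE, SK_UNMAP_SINGLE, SK_UNMAP_PAGE };

/* How the engine would unmap the src buffer for a given flag word. */
enum sk_unmap sk_src_unmap(unsigned int flags)
{
	if (flags & SK_COMPL_SKIP_SRC_UNMAP)
		return SK_UNMAP_NONE;
	return (flags & SK_COMPL_SRC_UNMAP_SINGLE) ? SK_UNMAP_SINGLE
						   : SK_UNMAP_PAGE;
}

/* Same decision for the dst buffer. */
enum sk_unmap sk_dst_unmap(unsigned int flags)
{
	if (flags & SK_COMPL_SKIP_DEST_UNMAP)
		return SK_UNMAP_NONE;
	return (flags & SK_COMPL_DEST_UNMAP_SINGLE) ? SK_UNMAP_SINGLE
						    : SK_UNMAP_PAGE;
}

/* Flag word dmatest passed before this patch. */
unsigned int sk_flags_before(void)
{
	return SK_CTRL_ACK | SK_COMPL_SKIP_DEST_UNMAP | SK_PREP_INTERRUPT;
}

/* Flag word dmatest passes after this patch. */
unsigned int sk_flags_after(void)
{
	return SK_CTRL_ACK | SK_PREP_INTERRUPT
	     | SK_COMPL_SKIP_DEST_UNMAP | SK_COMPL_SRC_UNMAP_SINGLE;
}
```

With the old flag word the src buffer falls through to the page unmap path;
with the new one it is unmapped as a single mapping, and dst is skipped in
both cases.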

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/dmatest.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index 5589358..7e1b0aa 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -285,7 +285,12 @@ static int dmatest_func(void *data)
 
set_user_nice(current, 10);
 
-   flags = DMA_CTRL_ACK | DMA_COMPL_SKIP_DEST_UNMAP | DMA_PREP_INTERRUPT;
+   /*
+* src buffers are freed by the DMAEngine code with dma_unmap_single()
+* dst buffers are freed by ourselves below
+*/
+   flags = DMA_CTRL_ACK | DMA_PREP_INTERRUPT
+ | DMA_COMPL_SKIP_DEST_UNMAP | DMA_COMPL_SRC_UNMAP_SINGLE;
 
while (!kthread_should_stop()
&& !(iterations && total_tests >= iterations)) {
-- 
1.7.3.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 8/8] fsldma: reduce locking during descriptor cleanup

2011-02-25 Thread Ira W. Snyder
This merges the fsl_chan_ld_cleanup() function into the dma_do_tasklet()
function to reduce locking overhead. In the best case, we will be able
to keep the DMA controller busy while we are freeing used descriptors.
In all cases, the spinlock is taken two fewer times per transaction than
before.
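
As a rough illustration of where the saving comes from, the following
userspace sketch counts lock acquisitions for a tasklet that locks in every
step versus one that takes the lock once and calls a "caller holds the lock"
helper. The sk_* names, the counters and the pthread mutex are assumptions
standing in for the driver's spinlock and lists.

```c
#include <pthread.h>
#include <stdbool.h>

struct sk_lchan {
	pthread_mutex_t lock;
	int lock_count;		/* how many times the lock was taken */
	bool idle;
	int pending;		/* descriptors waiting to start */
	int running;		/* descriptors given to the hardware */
};

void sk_lchan_init(struct sk_lchan *c, int pending)
{
	pthread_mutex_init(&c->lock, NULL);
	c->lock_count = 0;
	c->idle = false;
	c->pending = pending;
	c->running = 1;
}

static void sk_lock(struct sk_lchan *c)
{
	pthread_mutex_lock(&c->lock);
	c->lock_count++;
}

static void sk_unlock(struct sk_lchan *c)
{
	pthread_mutex_unlock(&c->lock);
}

/* LOCKING: caller must hold c->lock */
static void sk_xfer_queue_locked(struct sk_lchan *c)
{
	if (c->pending == 0 || !c->idle)
		return;
	c->running = c->pending;
	c->pending = 0;
	c->idle = false;
}

/* Old shape: cleanup, idle update and restart each take the lock. */
int sk_tasklet_old(struct sk_lchan *c)
{
	int before = c->lock_count;

	sk_lock(c); c->running = 0; sk_unlock(c);
	sk_lock(c); c->idle = true; sk_unlock(c);
	sk_lock(c); sk_xfer_queue_locked(c); sk_unlock(c);
	return c->lock_count - before;
}

/* New shape: one critical section covers all three steps. */
int sk_tasklet_new(struct sk_lchan *c)
{
	int before = c->lock_count;

	sk_lock(c);
	c->running = 0;
	c->idle = true;
	sk_xfer_queue_locked(c);
	sk_unlock(c);
	return c->lock_count - before;
}
```

Both variants leave the channel in the same state; only the number of lock
round trips differs.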

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  114 +
 1 files changed, 49 insertions(+), 65 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 4014790..3dc27a9 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -872,67 +872,16 @@ static void fsldma_cleanup_descriptor(struct fsldma_chan 
*chan,
 }
 
 /**
- * fsl_chan_ld_cleanup - Clean up link descriptors
- * @chan : Freescale DMA channel
- *
- * This function is run after the queue of running descriptors has been
- * executed by the DMA engine. It will run any callbacks, and then free
- * the descriptors.
- *
- * HARDWARE STATE: idle
- */
-static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
-{
-   struct fsl_desc_sw *desc, *_desc;
-   const char *name = chan->name;
-   LIST_HEAD(ld_cleanup);
-   unsigned long flags;
-
-   spin_lock_irqsave(&chan->desc_lock, flags);
-
-   /* update the cookie if we have some descriptors to cleanup */
-   if (!list_empty(&chan->ld_running)) {
-   dma_cookie_t cookie;
-
-   desc = to_fsl_desc(chan->ld_running.prev);
-   cookie = desc->async_tx.cookie;
-
-   chan->completed_cookie = cookie;
-   dev_dbg(chan->dev, "%s: completed_cookie=%d\n", name, cookie);
-   }
-
-   /*
-* move the descriptors to a temporary list so we can drop the lock
-* during the entire cleanup operation
-*/
-   list_splice_tail_init(&chan->ld_running, &ld_cleanup);
-
-   spin_unlock_irqrestore(&chan->desc_lock, flags);
-
-   /* Run the callback for each descriptor, in order */
-   list_for_each_entry_safe(desc, _desc, &ld_cleanup, node) {
-
-   /* Remove from the list of transactions */
-   list_del(&desc->node);
-
-   /* Run all cleanup for this descriptor */
-   fsldma_cleanup_descriptor(chan, desc);
-   }
-}
-
-/**
  * fsl_chan_xfer_ld_queue - transfer any pending transactions
  * @chan : Freescale DMA channel
  *
  * HARDWARE STATE: idle
+ * LOCKING: must hold chan->desc_lock
  */
 static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 {
const char *name = chan->name;
struct fsl_desc_sw *desc;
-   unsigned long flags;
-
-   spin_lock_irqsave(&chan->desc_lock, flags);
 
/*
 * If the list of pending descriptors is empty, then we
@@ -940,7 +889,7 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 */
if (list_empty(&chan->ld_pending)) {
dev_dbg(chan->dev, "%s: no pending LDs\n", name);
-   goto out_unlock;
+   return;
}
 
/*
@@ -950,7 +899,7 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 */
if (!chan->idle) {
dev_dbg(chan->dev, "%s: DMA controller still busy\n", name);
-   goto out_unlock;
+   return;
}
 
/*
@@ -975,9 +924,6 @@ static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
 
dma_start(chan);
chan->idle = false;
-
-out_unlock:
-   spin_unlock_irqrestore(&chan->desc_lock, flags);
 }
 
 /**
@@ -987,7 +933,11 @@ out_unlock:
 static void fsl_dma_memcpy_issue_pending(struct dma_chan *dchan)
 {
struct fsldma_chan *chan = to_fsl_chan(dchan);
+   unsigned long flags;
+
+   spin_lock_irqsave(&chan->desc_lock, flags);
fsl_chan_xfer_ld_queue(chan);
+   spin_unlock_irqrestore(&chan->desc_lock, flags);
 }
 
 /**
@@ -1089,21 +1039,55 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
 static void dma_do_tasklet(unsigned long data)
 {
struct fsldma_chan *chan = (struct fsldma_chan *)data;
+   struct fsl_desc_sw *desc, *_desc;
+   const char *name = chan->name;
+   LIST_HEAD(ld_cleanup);
unsigned long flags;
 
-   dev_dbg(chan->dev, "%s: tasklet entry\n", chan->name);
+   dev_dbg(chan->dev, "%s: tasklet entry\n", name);
 
-   /* run all callbacks, free all used descriptors */
-   fsl_chan_ld_cleanup(chan);
-
-   /* the channel is now idle */
spin_lock_irqsave(&chan->desc_lock, flags);
+
+   /* update the cookie if we have some descriptors to cleanup */
+   if (!list_empty(&chan->ld_running)) {
+   dma_cookie_t cookie;
+
+   desc = to_fsl_desc(chan->ld_running.prev);
+   cookie = desc->async_tx.cookie;
+
+   chan->completed_cookie = cookie;
+   dev_dbg(chan->dev, "%s: completed_cookie=%d\n", name, cookie);
+   }
+
+   /*
+* move the descriptors to a temporary list so we can drop the lock

[PATCH 6/8] fsldma: support async_tx dependencies and automatic unmapping

2011-02-25 Thread Ira W. Snyder
Before this patch, dma_run_dependencies() was called while holding
desc_lock. That function can call tx_submit() for other descriptors, which
may try to re-take the lock. Avoid this by moving the descriptors to be
cleaned up to a temporary list, and dropping the lock before cleanup.
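
The pattern described above can be sketched in userspace C roughly as
follows. This is only an illustration under assumptions: a pthread mutex
stands in for desc_lock, a hand-rolled singly linked list stands in for
ld_running, and all sk_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct sk_desc {
	struct sk_desc *next;
	void (*callback)(void *param);
	void *param;
};

struct sk_chan {
	pthread_mutex_t lock;
	struct sk_desc *running;	/* protected by lock */
};

void sk_chan_init(struct sk_chan *chan)
{
	pthread_mutex_init(&chan->lock, NULL);
	chan->running = NULL;
}

/* Queue a descriptor; takes the lock itself, like a tx_submit() would. */
void sk_submit(struct sk_chan *chan, struct sk_desc *desc)
{
	assert(desc != NULL);
	pthread_mutex_lock(&chan->lock);
	desc->next = chan->running;
	chan->running = desc;
	pthread_mutex_unlock(&chan->lock);
}

/*
 * Detach the whole running list under the lock, then run the callbacks
 * with the lock dropped. Returns the number of descriptors cleaned up.
 */
int sk_cleanup(struct sk_chan *chan)
{
	struct sk_desc *list, *next;
	int cleaned = 0;

	pthread_mutex_lock(&chan->lock);
	list = chan->running;
	chan->running = NULL;
	pthread_mutex_unlock(&chan->lock);

	for (; list; list = next) {
		next = list->next;
		if (list->callback)
			list->callback(list->param);
		cleaned++;
	}
	return cleaned;
}

/* Example callback that submits a dependent descriptor to the same channel. */
struct sk_resubmit {
	struct sk_chan *chan;
	struct sk_desc *desc;
};

void sk_resubmit_cb(void *param)
{
	struct sk_resubmit *r = param;

	sk_submit(r->chan, r->desc);
}
```

If sk_cleanup() ran the callbacks while still holding the mutex,
sk_resubmit_cb() would block on the same non-recursive lock, which is the
userspace analogue of the problem this patch avoids.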

At the same time, add support for automatic unmapping of src and dst
buffers, as offered by the DMAEngine API.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  132 --
 1 files changed, 95 insertions(+), 37 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index d3c5100..4014790 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -78,6 +78,11 @@ static void set_desc_cnt(struct fsldma_chan *chan,
hw->count = CPU_TO_DMA(chan, count, 32);
 }
 
+static u32 get_desc_cnt(struct fsldma_chan *chan, struct fsl_desc_sw *desc)
+{
+   return DMA_TO_CPU(chan, desc->hw.count, 32);
+}
+
 static void set_desc_src(struct fsldma_chan *chan,
 struct fsl_dma_ld_hw *hw, dma_addr_t src)
 {
@@ -88,6 +93,16 @@ static void set_desc_src(struct fsldma_chan *chan,
hw->src_addr = CPU_TO_DMA(chan, snoop_bits | src, 64);
 }
 
+static dma_addr_t get_desc_src(struct fsldma_chan *chan,
+  struct fsl_desc_sw *desc)
+{
+   u64 snoop_bits;
+
+   snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX)
+   ? ((u64)FSL_DMA_SATR_SREADTYPE_SNOOP_READ << 32) : 0;
+   return DMA_TO_CPU(chan, desc->hw.src_addr, 64) & ~snoop_bits;
+}
+
 static void set_desc_dst(struct fsldma_chan *chan,
 struct fsl_dma_ld_hw *hw, dma_addr_t dst)
 {
@@ -98,6 +113,16 @@ static void set_desc_dst(struct fsldma_chan *chan,
hw->dst_addr = CPU_TO_DMA(chan, snoop_bits | dst, 64);
 }
 
+static dma_addr_t get_desc_dst(struct fsldma_chan *chan,
+  struct fsl_desc_sw *desc)
+{
+   u64 snoop_bits;
+
+   snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX)
+   ? ((u64)FSL_DMA_DATR_DWRITETYPE_SNOOP_WRITE << 32) : 0;
+   return DMA_TO_CPU(chan, desc->hw.dst_addr, 64) & ~snoop_bits;
+}
+
 static void set_desc_next(struct fsldma_chan *chan,
  struct fsl_dma_ld_hw *hw, dma_addr_t next)
 {
@@ -796,6 +821,57 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 }
 
 /**
+ * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
+ * @chan: Freescale DMA channel
+ * @desc: descriptor to cleanup and free
+ *
+ * This function is used on a descriptor which has been executed by the DMA
+ * controller. It will run any callbacks, submit any dependencies, and then
+ * free the descriptor.
+ */
+static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
+ struct fsl_desc_sw *desc)
+{
+   struct dma_async_tx_descriptor *txd = &desc->async_tx;
+   struct device *dev = chan->common.device->dev;
+   dma_addr_t src = get_desc_src(chan, desc);
+   dma_addr_t dst = get_desc_dst(chan, desc);
+   u32 len = get_desc_cnt(chan, desc);
+
+   /* Run the link descriptor callback function */
+   if (txd->callback) {
+#ifdef FSL_DMA_LD_DEBUG
+   dev_dbg(chan->dev, "%s: LD %p callback\n", chan->name, desc);
+#endif
+   txd->callback(txd->callback_param);
+   }
+
+   /* Run any dependencies */
+   dma_run_dependencies(txd);
+
+   /* Unmap the dst buffer, if requested */
+   if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
+   if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
+   dma_unmap_single(dev, dst, len, DMA_FROM_DEVICE);
+   else
+   dma_unmap_page(dev, dst, len, DMA_FROM_DEVICE);
+   }
+
+   /* Unmap the src buffer, if requested */
+   if (!(txd->flags & DMA_COMPL_SKIP_SRC_UNMAP)) {
+   if (txd->flags & DMA_COMPL_SRC_UNMAP_SINGLE)
+   dma_unmap_single(dev, src, len, DMA_TO_DEVICE);
+   else
+   dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
+   }
+
+#ifdef FSL_DMA_LD_DEBUG
+   dev_dbg(chan->dev, "%s: LD %p free\n", chan->name, desc);
+#endif
+   dma_pool_free(chan->desc_pool, desc, txd->phys);
+}
+
+/**
  * fsl_chan_ld_cleanup - Clean up link descriptors
  * @chan : Freescale DMA channel
  *
@@ -809,57 +885,39 @@ static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
 {
struct fsl_desc_sw *desc, *_desc;
const char *name = chan->name;
+   LIST_HEAD(ld_cleanup);
unsigned long flags;
 
spin_lock_irqsave(&chan->desc_lock, flags);
 
-   /* if the ld_running list is empty, there is nothing to do */
-   if (list_empty(&chan->ld_running)) {
-   dev_dbg(chan->dev, "%s: no descriptors to cleanup\n", name);
-   goto out_unlock;
+   /* update the cookie if we have some

[PATCH 5/8] fsldma: fix controller lockups

2011-02-25 Thread Ira W. Snyder
Enabling poisoning in the dmapool API quickly showed that the DMA
controller was fetching descriptors that should not have been in use.
This has caused intermittent controller lockups during testing.

I have been unable to figure out the exact set of conditions which cause
this to happen. However, I believe it is related to the driver using the
hardware registers to track whether the controller is busy or not. The
code can incorrectly decide that the hardware is idle due to lag between
register writes and the hardware actually becoming busy.

To fix this, the driver has been reworked to explicitly track the state
of the hardware, rather than try to guess what it is doing based on the
register values.
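
A minimal sketch of what "explicitly track the state" can look like, written
as plain userspace C. It assumes a single boolean that software flips when it
starts the hardware and when a completion is processed; the sk_hw_* names and
the integer counters are hypothetical simplifications of the driver's
descriptor lists.

```c
#include <assert.h>
#include <stdbool.h>

struct sk_hw_chan {
	bool idle;	/* software's own record of the hardware state */
	int pending;	/* descriptors submitted but not yet started */
	int running;	/* descriptors handed to the hardware */
	int starts;	/* number of times the hardware was started */
};

void sk_hw_init(struct sk_hw_chan *c)
{
	c->idle = true;
	c->pending = 0;
	c->running = 0;
	c->starts = 0;
}

void sk_hw_submit(struct sk_hw_chan *c)
{
	c->pending++;
}

/*
 * Start the pending work only when software knows the hardware is idle.
 * No status register is consulted, so register/hardware lag cannot fool it.
 */
bool sk_hw_issue_pending(struct sk_hw_chan *c)
{
	if (c->pending == 0 || !c->idle)
		return false;

	c->running = c->pending;
	c->pending = 0;
	c->idle = false;	/* busy from the moment we start it */
	c->starts++;
	return true;
}

/* Completion handling: mark the channel idle, then restart if needed. */
void sk_hw_complete(struct sk_hw_chan *c)
{
	assert(!c->idle);
	c->running = 0;
	c->idle = true;
	sk_hw_issue_pending(c);
}
```

A second issue_pending call while the first batch is still running is simply
refused, and the queued work is picked up from the completion path instead.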

This has passed dmatest with 10 threads per channel, 10 iterations
per thread several times without error. Previously, this would fail
within a few seconds.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/dma/fsldma.c |  187 +++---
 drivers/dma/fsldma.h |1 +
 2 files changed, 72 insertions(+), 116 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 06421c0..d3c5100 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -63,11 +63,6 @@ static dma_addr_t get_cdar(struct fsldma_chan *chan)
return DMA_IN(chan, &chan->regs->cdar, 64) & ~FSL_DMA_SNEN;
 }
 
-static dma_addr_t get_ndar(struct fsldma_chan *chan)
-{
-   return DMA_IN(chan, &chan->regs->ndar, 64);
-}
-
 static u32 get_bcr(struct fsldma_chan *chan)
 {
return DMA_IN(chan, &chan->regs->bcr, 32);
@@ -138,13 +133,11 @@ static void dma_init(struct fsldma_chan *chan)
case FSL_DMA_IP_85XX:
/* Set the channel to below modes:
 * EIE - Error interrupt enable
-* EOSIE - End of segments interrupt enable (basic mode)
 * EOLNIE - End of links interrupt enable
 * BWC - Bandwidth sharing among channels
 */
DMA_OUT(chan, &chan->regs->mr, FSL_DMA_MR_BWC
-   | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE
-   | FSL_DMA_MR_EOSIE, 32);
+   | FSL_DMA_MR_EIE | FSL_DMA_MR_EOLNIE, 32);
break;
case FSL_DMA_IP_83XX:
/* Set the channel to below modes:
@@ -757,14 +750,15 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 
switch (cmd) {
case DMA_TERMINATE_ALL:
+   spin_lock_irqsave(&chan->desc_lock, flags);
+
/* Halt the DMA engine */
dma_halt(chan);
 
-   spin_lock_irqsave(&chan->desc_lock, flags);
-
/* Remove and free all of the descriptors in the LD queue */
fsldma_free_desc_list(chan, &chan->ld_pending);
fsldma_free_desc_list(chan, &chan->ld_running);
+   chan->idle = true;
 
spin_unlock_irqrestore(&chan->desc_lock, flags);
return 0;
@@ -802,78 +796,45 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 }
 
 /**
- * fsl_dma_update_completed_cookie - Update the completed cookie.
+ * fsl_chan_ld_cleanup - Clean up link descriptors
  * @chan : Freescale DMA channel
  *
- * CONTEXT: hardirq
+ * This function is run after the queue of running descriptors has been
+ * executed by the DMA engine. It will run any callbacks, and then free
+ * the descriptors.
+ *
+ * HARDWARE STATE: idle
  */
-static void fsl_dma_update_completed_cookie(struct fsldma_chan *chan)
+static void fsl_chan_ld_cleanup(struct fsldma_chan *chan)
 {
-   struct fsl_desc_sw *desc;
+   struct fsl_desc_sw *desc, *_desc;
+   const char *name = chan->name;
unsigned long flags;
-   dma_cookie_t cookie;
 
spin_lock_irqsave(&chan->desc_lock, flags);
 
+   /* if the ld_running list is empty, there is nothing to do */
if (list_empty(&chan->ld_running)) {
-   dev_dbg(chan->dev, "%s: no running descriptors\n", chan->name);
+   dev_dbg(chan->dev, "%s: no descriptors to cleanup\n", name);
goto out_unlock;
}
 
-   /* Get the last descriptor, update the cookie to that */
+   /*
+* Get the last descriptor, update the cookie to it
+*
+* This is done before callbacks run so that clients can check the
+* status of their DMA transfer inside the callback.
+*/
desc = to_fsl_desc(chan->ld_running.prev);
-   if (dma_is_idle(chan))
-   cookie = desc->async_tx.cookie;
-   else {
-   cookie = desc->async_tx.cookie - 1;
-   if (unlikely(cookie < DMA_MIN_COOKIE))
-   cookie = DMA_MAX_COOKIE;
-   }
-
-   chan->completed_cookie = cookie;
-
-out_unlock:
-   spin_unlock_irqrestore(&chan->desc_lock, flags);
-}
-
-/**
- * fsldma_desc_status - Check the status of a descriptor
- * @chan: Freescale DMA channel
- * @desc: DMA SW descriptor

Re: [RFC] Inter-processor Mailboxes Drivers

2011-02-14 Thread Ira W. Snyder
On Mon, Feb 14, 2011 at 12:03:59PM +0200, Ohad Ben-Cohen wrote:
> On Mon, Feb 14, 2011 at 12:01 PM, Jamie Iles ja...@jamieiles.com wrote:
> > On Fri, Feb 11, 2011 at 03:19:51PM -0600, Meador Inge wrote:
> > >     1. Hardware specific bits somewhere under '.../arch/*'.  Drivers
> > >        for the MPIC message registers on Power and OMAP4 mailboxes, for
> > >        example.
> > >     2. A higher level driver under '.../drivers/mailbox/*'.  That the
> > >        pieces in (1) would register with.  This piece would expose the
> > >        main kernel API.
> > >     3. Userspace interfaces for accessing the mailboxes.  A
> > >        '/dev/mailbox1', '/dev/mailbox2', etc... mapping, for example.
> >
> > How about using virtio for all of this and having the mailbox as a
> > notification/message passing driver for the virtio backend?
>
> This is exactly what we are doing now, and it looks promising. expect
> patches soon.

I'll be happy to examine the feasibility of doing a port to mpc83xx as
soon as I see the code. :-) I have been using the message registers to
create a software network card over PCI (between a host system and an
mpc83xx in a PCI slot). I have wanted to use virtio for this task for a
long time.

I think a uniform interface for the mailbox registers would be a very
useful API.

Thanks,
Ira


[PATCH RFCv7 0/2] CARMA Board Support

2011-02-11 Thread Ira W. Snyder
Hello everyone,

This is the seventh posting of these drivers, taking into account comments
from earlier postings. I've made sure that the drivers both pass checkpatch
without any errors or warnings. I would appreciate as much review as you
can offer, so that these can get into the next merge cycle. They've been
sitting outside mainline for far too long.

RFCv6 -> RFCv7:
- reference count private data structure (to support unbind)
- use #defines instead of hex values for registers
- keep lines <= 80 characters

RFCv5 -> RFCv6:
- change locking in several functions
- use list_move_tail() to simplify code
- remove unused helper functions

RFCv4 -> RFCv5:
- remove unnecessary locking per review comments
- do not clobber return values from *_interruptible()
- explicitly track buffer DMA mapping
- use #defines instead of raw hex addresses
- change enable sysfs attribute to root-writeable only

RFCv3 -> RFCv4:
- updates for DATA-FPGA version 2

RFCv2 -> RFCv3:
- use miscdevice framework (removing the carma class)
- add bitfile readback capability to the programmer

RFCv1 -> RFCv2:
- change comments to kerneldoc format
- Kconfig improvements
- use the videobuf_dma_sg API in the programmer
- updates for Freescale DMAEngine DMA_SLAVE API changes

KNOWN ISSUES:
- untested with a setup that can generate interrupts (will get access soon)
- does not handle runtime unbind

Information about the CARMA board:

The CARMA board is essentially an MPC8349EA MDS reference design with a
1GHz ADC and 4 high powered data processing FPGAs connected to the local
bus. It is all packed into a compact PCI form factor. It is used at the
Owens Valley Radio Observatory as the main component in the correlator
system.

For board information, see:
http://www.mmarray.org/~dwh/carma_board/index.html

For DATA-FPGA register layout, see:
http://www.mmarray.org/memos/carma_memo46.pdf

These drivers are the necessary pieces to get the data processing FPGAs
working and producing data. Despite the fact that the hardware is custom
and we are the only users, I'd still like to get the drivers upstream.
Several people have suggested that this is possible.

Some further patches will be forthcoming. I have a driver for the LED
subsystem and the PPS subsystem. The LED register layout is expected to
change soon, so I won't post the driver until that is finished. The PPS
driver will be posted separately from this patch series; it is very
generic.

Thanks to everyone who has provided comments on earlier versions!

Ira W. Snyder (2):
  misc: add CARMA DATA-FPGA Access Driver
  misc: add CARMA DATA-FPGA Programmer support

 drivers/misc/Kconfig|1 +
 drivers/misc/Makefile   |1 +
 drivers/misc/carma/Kconfig  |   18 +
 drivers/misc/carma/Makefile |2 +
 drivers/misc/carma/carma-fpga-program.c | 1141 
 drivers/misc/carma/carma-fpga.c | 1433 +++
 6 files changed, 2596 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/carma/Kconfig
 create mode 100644 drivers/misc/carma/Makefile
 create mode 100644 drivers/misc/carma/carma-fpga-program.c
 create mode 100644 drivers/misc/carma/carma-fpga.c

-- 
1.7.3.4



[PATCH RFCv7 2/2] misc: add CARMA DATA-FPGA Programmer support

2011-02-11 Thread Ira W. Snyder
This adds support for programming the data processing FPGAs on the OVRO
CARMA board. These FPGAs have a special programming sequence that
requires that we program the Freescale DMA engine, which is only
available inside the kernel.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/misc/carma/Kconfig  |9 +
 drivers/misc/carma/Makefile |1 +
 drivers/misc/carma/carma-fpga-program.c | 1141 +++
 3 files changed, 1151 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/carma/carma-fpga-program.c

diff --git a/drivers/misc/carma/Kconfig b/drivers/misc/carma/Kconfig
index 4be183f..e57a9d3 100644
--- a/drivers/misc/carma/Kconfig
+++ b/drivers/misc/carma/Kconfig
@@ -7,3 +7,12 @@ config CARMA_FPGA
  Say Y here to include support for communicating with the data
  processing FPGAs on the OVRO CARMA board.
 
+config CARMA_FPGA_PROGRAM
+   tristate "CARMA DATA-FPGA Programmer"
+   depends on FSL_SOC && PPC_83xx && MEDIA_SUPPORT && HAS_DMA && FSL_DMA
+   select VIDEOBUF_DMA_SG
+   default n
+   help
+ Say Y here to include support for programming the data processing
+ FPGAs on the OVRO CARMA board.
+
diff --git a/drivers/misc/carma/Makefile b/drivers/misc/carma/Makefile
index 0b69fa7..ff36ac2 100644
--- a/drivers/misc/carma/Makefile
+++ b/drivers/misc/carma/Makefile
@@ -1 +1,2 @@
 obj-$(CONFIG_CARMA_FPGA)   += carma-fpga.o
+obj-$(CONFIG_CARMA_FPGA_PROGRAM)   += carma-fpga-program.o
diff --git a/drivers/misc/carma/carma-fpga-program.c 
b/drivers/misc/carma/carma-fpga-program.c
new file mode 100644
index 000..7ce6065
--- /dev/null
+++ b/drivers/misc/carma/carma-fpga-program.c
@@ -0,0 +1,1141 @@
+/*
+ * CARMA Board DATA-FPGA Programmer
+ *
+ * Copyright (c) 2009-2011 Ira W. Snyder i...@ovro.caltech.edu
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+#include <linux/dma-mapping.h>
+#include <linux/of_platform.h>
+#include <linux/completion.h>
+#include <linux/miscdevice.h>
+#include <linux/dmaengine.h>
+#include <linux/interrupt.h>
+#include <linux/highmem.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/leds.h>
+#include <linux/slab.h>
+#include <linux/kref.h>
+#include <linux/fs.h>
+#include <linux/io.h>
+
+#include <media/videobuf-dma-sg.h>
+
+/* MPC8349EMDS specific get_immrbase() */
+#include <sysdev/fsl_soc.h>
+
+static const char drv_name[] = "carma-fpga-program";
+
+/*
+ * Firmware images are always this exact size
+ *
+ * 12849552 bytes for a CARMA Digitizer Board (EP2S90 FPGAs)
+ * 18662880 bytes for a CARMA Correlator Board (EP2S130 FPGAs)
+ */
+#define FW_SIZE_EP2S90  12849552
+#define FW_SIZE_EP2S130 18662880
+
+struct fpga_dev {
+   struct miscdevice miscdev;
+
+   /* Reference count */
+   struct kref ref;
+
+   /* Device Registers */
+   struct device *dev;
+   void __iomem *regs;
+   void __iomem *immr;
+
+   /* Freescale DMA Device */
+   struct dma_chan *chan;
+
+   /* Interrupts */
+   int irq, status;
+   struct completion completion;
+
+   /* FPGA Bitfile */
+   struct mutex lock;
+
+   struct videobuf_dmabuf vb;
+   bool vb_allocated;
+
+   /* max size and written bytes */
+   size_t fw_size;
+   size_t bytes;
+};
+
+/*
+ * FPGA Bitfile Helpers
+ */
+
+/**
+ * fpga_drop_firmware_data() - drop the bitfile image from memory
+ * @priv: the driver's private data structure
+ *
+ * LOCKING: must hold priv->lock
+ */
+static void fpga_drop_firmware_data(struct fpga_dev *priv)
+{
+   videobuf_dma_free(&priv->vb);
+   priv->vb_allocated = false;
+   priv->bytes = 0;
+}
+
+/*
+ * Private Data Reference Count
+ */
+
+static void fpga_dev_remove(struct kref *ref)
+{
+   struct fpga_dev *priv = container_of(ref, struct fpga_dev, ref);
+
+   /* free any firmware image that was not programmed */
+   fpga_drop_firmware_data(priv);
+
+   mutex_destroy(&priv->lock);
+   kfree(priv);
+}
+
+/*
+ * LED Trigger (could be a separate module)
+ */
+
+/*
+ * NOTE: this whole thing does have the problem that whenever the led's are
+ * NOTE: first set to use the fpga trigger, they could be in the wrong state
+ */
+
+DEFINE_LED_TRIGGER(ledtrig_fpga);
+
+static void ledtrig_fpga_programmed(bool enabled)
+{
+   if (enabled)
+   led_trigger_event(ledtrig_fpga, LED_FULL);
+   else
+   led_trigger_event(ledtrig_fpga, LED_OFF);
+}
+
+/*
+ * FPGA Register Helpers
+ */
+
+/* Register Definitions */
+#define FPGA_CONFIG_CONTROL    0x40
+#define FPGA_CONFIG_STATUS     0x44
+#define FPGA_CONFIG_FIFO_SIZE  0x48
+#define FPGA_CONFIG_FIFO_USED

[PATCH RFCv7 1/2] misc: add CARMA DATA-FPGA Access Driver

2011-02-11 Thread Ira W. Snyder
This driver allows userspace to access the data processing FPGAs on the
OVRO CARMA board. It has two modes of operation:

1) random access

This allows users to poke any DATA-FPGA registers by using mmap to map
the address region directly into their memory map.

2) correlation dumping

When correlating, the DATA-FPGA's have special requirements for getting
the data out of their memory before the next correlation. This nominally
happens at 64Hz (every 15.625ms). If the data is not dumped before the
next correlation, data is lost.

The data dumping driver handles buffering up to 1 second worth of
correlation data from the FPGAs. This lowers the realtime scheduling
requirements for the userspace process reading the device.
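
As a quick check of the numbers above, a small C sketch of the arithmetic.
The helper names are made up for illustration; only the 64Hz rate and the
one second of buffering come from the description.

```c
#include <assert.h>

#define SK_CAPTURE_HZ 64u	/* nominal correlation rate */

/* Time between two correlations, in microseconds. */
unsigned int sk_period_us(unsigned int hz)
{
	assert(hz > 0);
	return 1000000u / hz;
}

/* Number of dump buffers needed to hold `seconds` worth of correlations. */
unsigned int sk_buffers_needed(unsigned int hz, unsigned int seconds)
{
	return hz * seconds;
}
```

At 64Hz the period is 15625 microseconds (15.625ms), and buffering one second
of data means keeping 64 correlation dumps queued for the reader.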

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/misc/Kconfig|1 +
 drivers/misc/Makefile   |1 +
 drivers/misc/carma/Kconfig  |9 +
 drivers/misc/carma/Makefile |1 +
 drivers/misc/carma/carma-fpga.c | 1433 +++
 5 files changed, 1445 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/carma/Kconfig
 create mode 100644 drivers/misc/carma/Makefile
 create mode 100644 drivers/misc/carma/carma-fpga.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index cc8e49d..93cf1e6 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -457,5 +457,6 @@ source "drivers/misc/eeprom/Kconfig"
 source "drivers/misc/cb710/Kconfig"
 source "drivers/misc/iwmc3200top/Kconfig"
 source "drivers/misc/ti-st/Kconfig"
+source "drivers/misc/carma/Kconfig"
 
 endif # MISC_DEVICES
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 98009cc..2c1610e 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -42,3 +42,4 @@ obj-$(CONFIG_ARM_CHARLCD) += arm-charlcd.o
 obj-$(CONFIG_PCH_PHUB) += pch_phub.o
 obj-y  += ti-st/
 obj-$(CONFIG_AB8500_PWM)   += ab8500-pwm.o
+obj-y  += carma/
diff --git a/drivers/misc/carma/Kconfig b/drivers/misc/carma/Kconfig
new file mode 100644
index 000..4be183f
--- /dev/null
+++ b/drivers/misc/carma/Kconfig
@@ -0,0 +1,9 @@
+config CARMA_FPGA
+   tristate "CARMA DATA-FPGA Access Driver"
+   depends on FSL_SOC && PPC_83xx && MEDIA_SUPPORT && HAS_DMA && FSL_DMA
+   select VIDEOBUF_DMA_SG
+   default n
+   help
+ Say Y here to include support for communicating with the data
+ processing FPGAs on the OVRO CARMA board.
+
diff --git a/drivers/misc/carma/Makefile b/drivers/misc/carma/Makefile
new file mode 100644
index 000..0b69fa7
--- /dev/null
+++ b/drivers/misc/carma/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_CARMA_FPGA)   += carma-fpga.o
diff --git a/drivers/misc/carma/carma-fpga.c b/drivers/misc/carma/carma-fpga.c
new file mode 100644
index 000..3965821
--- /dev/null
+++ b/drivers/misc/carma/carma-fpga.c
@@ -0,0 +1,1433 @@
+/*
+ * CARMA DATA-FPGA Access Driver
+ *
+ * Copyright (c) 2009-2011 Ira W. Snyder i...@ovro.caltech.edu
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+/*
+ * FPGA Memory Dump Format
+ *
+ * FPGA #0 control registers (32 x 32-bit words)
+ * FPGA #1 control registers (32 x 32-bit words)
+ * FPGA #2 control registers (32 x 32-bit words)
+ * FPGA #3 control registers (32 x 32-bit words)
+ * SYSFPGA control registers (32 x 32-bit words)
+ * FPGA #0 correlation array (NUM_CORL0 correlation blocks)
+ * FPGA #1 correlation array (NUM_CORL1 correlation blocks)
+ * FPGA #2 correlation array (NUM_CORL2 correlation blocks)
+ * FPGA #3 correlation array (NUM_CORL3 correlation blocks)
+ *
+ * Each correlation array consists of:
+ *
+ * Correlation Data  (2 x NUM_LAGSn x 32-bit words)
+ * Pipeline Metadata (2 x NUM_METAn x 32-bit words)
+ * Quantization Counters (2 x NUM_QCNTn x 32-bit words)
+ *
+ * The NUM_CORLn, NUM_LAGSn, NUM_METAn, and NUM_QCNTn values come from
+ * the FPGA configuration registers. They do not change once the FPGA's
+ * have been programmed, they only change on re-programming.
+ */
+
+/*
+ * Basic Description:
+ *
+ * This driver is used to capture correlation spectra off of the four data
+ * processing FPGAs. The FPGAs are often reprogrammed at runtime, therefore
+ * this driver supports dynamic enable/disable of capture while the device
+ * remains open.
+ *
+ * The nominal capture rate is 64Hz (every 15.625ms). To facilitate this fast
+ * capture rate, all buffers are pre-allocated to avoid any potentially long
+ * running memory allocations while capturing.
+ *
+ * There are two lists and one pointer which are used to keep track of the
+ * different states of data buffers.
+ *
+ * 1) free list
+ * This list holds all empty data buffers which are ready to receive data.
+ *
+ * 2) inflight pointer

Re: [PATCH 1/2] misc: add CARMA DATA-FPGA Access Driver

2011-02-09 Thread Ira W. Snyder
On Wed, Feb 09, 2011 at 04:30:23PM -, David Laight wrote:
  
> > This driver allows userspace to access the data processing
> > FPGAs on the OVRO CARMA board. It has two modes of operation:
> >
> > 1) random access
> >
> > This allows users to poke any DATA-FPGA registers by using mmap to map
> > the address region directly into their memory map.
>
> I needed something similar, but used pread() and pwrite()
> to request the transfers.
> While this does require a system call per transfer, it allows
> the driver to use dma (if available) to speed up the request.
> In my case doing single cycle transfers would be too slow.
 

We initially started with a read()/write() interface for individual
register reads and writes, just like you describe. It turned out that
mmap was plenty fast for our use. I made the decision to ditch all of
the extra code needed to set up and execute the DMA in favor of the much
simpler mmap code.

In our case, going all the way through the DMA engine code just to
transfer 4 bytes is overkill. The local bus is already quite fast, and
we can increase the clock speed if needed.
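
As an illustration of the mmap approach described above (a sketch, not code from the driver): once the register window is mapped, a register access is a single 32-bit load or store plus a byte swap on little-endian hosts. The helper names are hypothetical, and the registers are assumed to be big-endian, as the driver's use of ioread32be() suggests. The device node and register offsets are not given in this thread, so `base` simply stands for whatever pointer mmap() returned.

```c
#include <arpa/inet.h>	/* ntohl(), htonl(): big-endian <-> host order */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers. 'base' stands for the pointer returned by mmap()
 * on the device; any 32-bit aligned buffer works for experimenting.
 */
static uint32_t reg_read32be(const volatile void *base, size_t off)
{
	const volatile uint32_t *reg;
	uint32_t raw;

	assert(off % sizeof(uint32_t) == 0);	/* 32-bit aligned offsets only */
	reg = (const volatile uint32_t *)((const volatile unsigned char *)base + off);
	raw = *reg;				/* exactly one 32-bit load */
	return ntohl(raw);			/* big-endian to host order */
}

static void reg_write32be(volatile void *base, size_t off, uint32_t val)
{
	volatile uint32_t *reg;

	assert(off % sizeof(uint32_t) == 0);
	reg = (volatile uint32_t *)((volatile unsigned char *)base + off);
	*reg = htonl(val);			/* host to big-endian, one 32-bit store */
}
```

Each access costs no system call at all, which is the trade-off against the pread()/pwrite() design: that design pays one system call per transfer but lets the driver batch the work through DMA.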

Ira
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 1/2] misc: add CARMA DATA-FPGA Access Driver

2011-02-09 Thread Ira W. Snyder
On Wed, Feb 09, 2011 at 12:33:25AM -0800, Dmitry Torokhov wrote:
 Hi Ira,
 
 On Tue, Feb 08, 2011 at 03:37:46PM -0800, Ira W. Snyder wrote:
  This driver allows userspace to access the data processing FPGAs on the
  OVRO CARMA board. It has two modes of operation:
  
 
 Thank you for making the changes, some more comments below.
 
  +
  +#define inode_to_dev(inode) container_of(inode->i_cdev, struct fpga_device, cdev)
  +
 
 Does not seem to be used.
 

Leftovers from earlier versions, will remove.

  +/*
  + * Data Buffer Allocation Helpers
  + */
  +
  +static int data_map_buffer(struct device *dev, struct data_buf *buf)
  +{
  +   return videobuf_dma_map(dev, &buf->vb);
  +}
  +
  +static void data_unmap_buffer(struct device *dev, struct data_buf *buf)
  +{
  +   videobuf_dma_unmap(dev, &buf->vb);
  +}
 
 Why can't we call videobuf_dma_map() and videobuf_dma_unmap() directly?
 What are these helpers supposed to solve?
 

The helpers were useful in the past. Now they are not. Will change.

  +static int data_alloc_buffers(struct fpga_device *priv)
  +{
  +   struct data_buf *buf;
  +   int i, ret;
  +
  +   for (i = 0; i < MAX_DATA_BUFS; i++) {
  +
  +           /* allocate a buffer */
  +           buf = data_alloc_buffer(priv->bufsize);
  +           if (!buf)
  +                   continue;
 
 break?
 
  +
  +           /* map it for DMA */
  +           ret = data_map_buffer(priv->dev, buf);
  +           if (ret) {
  +                   data_free_buffer(buf);
  +                   continue;
 
 and here as well?
 

Yep, break is fine also.

  +           }
  +
  +           /* add it to the list of free buffers */
  +           list_add_tail(&buf->entry, &priv->free);
  +           priv->num_buffers++;
  +   }
  +
  +   /* Make sure we allocated the minimum required number of buffers */
  +   if (priv->num_buffers < MIN_DATA_BUFS) {
  +           dev_err(priv->dev, "Unable to allocate enough data buffers\n");
  +           data_free_buffers(priv);
  +           return -ENOMEM;
  +   }
  +
  +   /* Warn if we are running in a degraded state, but do not fail */
  +   if (priv->num_buffers < MAX_DATA_BUFS) {
  +           dev_warn(priv->dev, "Unable to allocate one second worth of "
  +                    "buffers, using %d buffers instead\n", i);
 
 The latest style is to not break strings, even if they exceed the
 80-column limit, so that they stay easily greppable.
 

I usually just follow checkpatch warnings. I'll combine the strings onto
one line.

  +static void data_dma_cb(void *data)
  +{
  +   struct fpga_device *priv = data;
  +   struct data_buf *buf;
  +   unsigned long flags;
  +
  +   spin_lock_irqsave(&priv->lock, flags);
  +
  +   /* clear the FPGA status and re-enable interrupts */
  +   data_enable_interrupts(priv);
  +
  +   /* If the inflight list is empty, we've got a bug */
  +   BUG_ON(list_empty(&priv->inflight));
  +
  +   /* Grab the first buffer from the inflight list */
  +   buf = list_first_entry(&priv->inflight, struct data_buf, entry);
  +   list_del_init(&buf->entry);
  +
  +   /* Add it to the used list */
  +   list_add_tail(&buf->entry, &priv->used);
 
 or
   list_move_tail(&buf->entry, &priv->used);
 

Using list_move_tail() didn't occur to me. I'll change it.
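
For readers unfamiliar with the helper, the equivalence behind this suggestion can be sketched in userspace. The code below is a simplified re-implementation written for illustration, not the kernel's list.h; only the function names are borrowed from it.

```c
#include <assert.h>

/* Minimal userspace stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Unlink 'entry' from its list and leave it pointing at itself. */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_init(entry);
}

/* Insert 'entry' just before 'head', i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink plus tail insert in one call, as in the review suggestion. */
static void list_move_tail(struct list_head *entry, struct list_head *head)
{
	assert(entry != head);	/* moving a list head makes no sense */
	list_del_init(entry);
	list_add_tail(entry, head);
}
```

With an "inflight" and a "used" list head, `list_move_tail(inflight.next, &used)` hands the oldest in-flight buffer to the used list in a single step.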

  +
  +static irqreturn_t data_irq(int irq, void *dev_id)
  +{
  +   struct fpga_device *priv = dev_id;
  +   struct data_buf *buf;
  +   u32 status;
  +   int i;
  +
  +   /* detect spurious interrupts via FPGA status */
  +   for (i = 0; i < 4; i++) {
  +           status = fpga_read_reg(priv, i, MMAP_REG_STATUS);
  +           if (!(status & (CORL_DONE | CORL_ERR))) {
  +                   dev_err(priv->dev, "spurious irq detected (FPGA)\n");
  +                   return IRQ_NONE;
  +           }
  +   }
  +
  +   /* detect spurious interrupts via raw IRQ pin readback */
  +   status = ioread32be(priv->regs + SYS_IRQ_INPUT_DATA);
  +   if (status & IRQ_CORL_DONE) {
  +           dev_err(priv->dev, "spurious irq detected (IRQ)\n");
  +           return IRQ_NONE;
  +   }
  +
  +   spin_lock(&priv->lock);
  +
  +   /* hide the interrupt by switching the IRQ driver to GPIO */
  +   data_disable_interrupts(priv);
  +
  +   /* Check that we actually have a free buffer */
  +   if (list_empty(&priv->free)) {
  +           priv->num_dropped++;
  +           data_enable_interrupts(priv);
  +           goto out_unlock;
  +   }
  +
  +   buf = list_first_entry(&priv->free, struct data_buf, entry);
  +   list_del_init(&buf->entry);
  +
  +   /* Check the buffer size */
  +   BUG_ON(buf->size != priv->bufsize);
  +
  +   /* Submit a DMA transfer to get the correlation data */
  +   if (data_submit_dma(priv, buf)) {
  +           dev_err(priv->dev, "Unable to setup DMA transfer\n");
  +           list_add_tail(&buf->entry, &priv->free);
  +           data_enable_interrupts(priv);
  +           goto out_unlock;
  +   }
  +
  +   /* DMA setup succeeded, GO!!! */
  +   list_add_tail(&buf->entry, &priv->inflight);
  +   dma_async_memcpy_issue_pending(priv->chan);
  +
  +out_unlock

Re: [PATCH 1/2] misc: add CARMA DATA-FPGA Access Driver

2011-02-09 Thread Ira W. Snyder
On Wed, Feb 09, 2011 at 10:27:40AM -0800, Dmitry Torokhov wrote:

[ snip stuff I've already fixed in the next version ]

  
  The requirement is that the device stay open during reconfiguration.
  This provides for that. Readers just block for as long as the device is
  not producing data.
 
 OK, you still need to make sure you do not touch free/used buffer while
 device is disabled. Also, you need to kick readers if you unbind the
 driver, so maybe a new flag priv->exists should be introduced and
 checked.
 

I don't understand what you mean by "kick readers if you unbind the
driver". The kernel automatically increases the refcount on a module
when a process is using the module. This shows up in the "Used by"
column of lsmod's output.

The kernel will not let you rmmod a module with a non-zero refcount. You
cannot get into the situation where you have rmmod'ed the module and a
reader is still blocking in read()/poll().

Thanks for the review. A v6 is coming right up.

Ira


Re: [PATCH 1/2] misc: add CARMA DATA-FPGA Access Driver

2011-02-09 Thread Ira W. Snyder
On Wed, Feb 09, 2011 at 03:42:31PM -0800, Dmitry Torokhov wrote:
 On Wed, Feb 09, 2011 at 03:35:45PM -0800, Ira W. Snyder wrote:
  On Wed, Feb 09, 2011 at 10:27:40AM -0800, Dmitry Torokhov wrote:
  
  [ snip stuff I've already fixed in the next version ]
  

    The requirement is that the device stay open during reconfiguration.
    This provides for that. Readers just block for as long as the device is
    not producing data.
   
   OK, you still need to make sure you do not touch free/used buffer while
   device is disabled. Also, you need to kick readers if you unbind the
   driver, so maybe a new flag priv->exists should be introduced and
   checked.
   
  
  I don't understand what you mean by "kick readers if you unbind the
  driver". The kernel automatically increases the refcount on a module
  when a process is using the module. This shows up in the "Used by"
  column of lsmod's output.
  
  The kernel will not let you rmmod a module with a non-zero refcount. You
  cannot get into the situation where you have rmmod'ed the module and a
  reader is still blocking in read()/poll().
 
 However you can still unbind the driver from the device by writing into
 driver's sysfs 'unbind' attribute.
 
 See drivers/base/bus.c::driver_unbind().
 

I was completely unaware of that feature. I have a hunch that many
drivers are incapable of dealing with an unbind while they are still
open.

Matter of fact, I don't see how this can EVER be safe. The driver core
automatically calls the data_of_remove() routine while there are still
blocked readers. This kfree()s the private data structure, which
contains the suggested priv->exists flag. What happens if the memory
allocator re-allocates that memory to a different driver before the
reader process is woken up to check the priv->exists flag?

The only way to solve this is to count the number of open()s and
close()s, and block the unbind until all users have close()d the device.
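
A related way to address the use-after-free concern, sketched here purely for illustration and not taken from the driver, is to reference-count the private structure so that it outlives the unbind until the last close(). In the kernel this is usually done with kref; the single-threaded userspace model below uses a plain integer, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the driver's private data. */
struct fake_priv {
	int refs;	/* one for the bound driver plus one per open file */
	bool exists;	/* cleared on unbind; readers check it after waking */
	bool *freed;	/* demo hook: set to true when the memory is released */
};

static struct fake_priv *fake_probe(bool *freed)
{
	struct fake_priv *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->refs = 1;		/* reference held by the bound driver */
	p->exists = true;
	p->freed = freed;
	*freed = false;
	return p;
}

/* Drop one reference; the last one out frees the structure. */
static void fake_put(struct fake_priv *p)
{
	assert(p->refs > 0);
	if (--p->refs == 0) {
		*p->freed = true;
		free(p);
	}
}

static void fake_open(struct fake_priv *p)
{
	p->refs++;		/* each open file pins the structure */
}

static void fake_release(struct fake_priv *p)
{
	fake_put(p);		/* close() drops the file's reference */
}

static void fake_unbind(struct fake_priv *p)
{
	p->exists = false;	/* tell readers the device is gone */
	fake_put(p);		/* drop the driver's reference only */
}
```

With this scheme the unbind does not have to block: a reader that wakes up after the unbind still finds valid memory, sees `exists == false`, and the structure is freed when that reader closes the device. A real driver would need atomic counting and a wake-up of the wait queue, both omitted here.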

Thanks,
Ira


[PATCH RFCv6 0/2] CARMA Board Support

2011-02-09 Thread Ira W. Snyder
Hello everyone,

This is the sixth posting of these drivers, taking into account comments from
earlier postings. I would appreciate as much review as you can offer.

RFCv5 -> RFCv6:
- change locking in several functions
- use list_move_tail() to simplify code
- remove unused helper functions

RFCv4 -> RFCv5:
- remove unnecessary locking per review comments
- do not clobber return values from *_interruptible()
- explicitly track buffer DMA mapping
- use #defines instead of raw hex addresses
- change enable sysfs attribute to root-writeable only

RFCv3 -> RFCv4:
- updates for DATA-FPGA version 2

RFCv2 -> RFCv3:
- use miscdevice framework (removing the carma class)
- add bitfile readback capability to the programmer

RFCv1 -> RFCv2:
- change comments to kerneldoc format
- Kconfig improvements
- use the videobuf_dma_sg API in the programmer
- updates for Freescale DMAEngine DMA_SLAVE API changes

KNOWN ISSUES:
- untested with a setup that can generate interrupts (will get access soon)
- does not handle runtime unbind

Information about the CARMA board:

The CARMA board is essentially an MPC8349EA MDS reference design with a
1GHz ADC and 4 high powered data processing FPGAs connected to the local
bus. It is all packed into a compact PCI form factor. It is used at the
Owens Valley Radio Observatory as the main component in the correlator
system.

For board information, see:
http://www.mmarray.org/~dwh/carma_board/index.html

For DATA-FPGA register layout, see:
http://www.mmarray.org/memos/carma_memo46.pdf

These drivers are the necessary pieces to get the data processing FPGAs
working and producing data. Despite the fact that the hardware is custom
and we are the only users, I'd still like to get the drivers upstream.
Several people have suggested that this is possible.

Some further patches will be forthcoming. I have a driver for the LED
subsystem and the PPS subsystem. The LED register layout is expected to
change soon, so I won't post the driver until that is finished. The PPS
driver will be posted separately from this patch series; it is very
generic.

Thanks to everyone who has provided comments on earlier versions!

Ira W. Snyder (2):
  misc: add CARMA DATA-FPGA Access Driver
  misc: add CARMA DATA-FPGA Programmer support

 drivers/misc/Kconfig                    |    1 +
 drivers/misc/Makefile                   |    1 +
 drivers/misc/carma/Kconfig              |   18 +
 drivers/misc/carma/Makefile             |    2 +
 drivers/misc/carma/carma-fpga-program.c | 1084
 drivers/misc/carma/carma-fpga.c         | 1407 +++
 6 files changed, 2513 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/carma/Kconfig
 create mode 100644 drivers/misc/carma/Makefile
 create mode 100644 drivers/misc/carma/carma-fpga-program.c
 create mode 100644 drivers/misc/carma/carma-fpga.c

-- 
1.7.3.4



[PATCH RFCv6 1/2] misc: add CARMA DATA-FPGA Access Driver

2011-02-09 Thread Ira W. Snyder
This driver allows userspace to access the data processing FPGAs on the
OVRO CARMA board. It has two modes of operation:

1) random access

This allows users to poke any DATA-FPGA registers by using mmap to map
the address region directly into their memory map.

2) correlation dumping

When correlating, the DATA-FPGA's have special requirements for getting
the data out of their memory before the next correlation. This nominally
happens at 64Hz (every 15.625ms). If the data is not dumped before the
next correlation, data is lost.

The data dumping driver handles buffering up to 1 second worth of
correlation data from the FPGAs. This lowers the realtime scheduling
requirements for the userspace process reading the device.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/misc/Kconfig            |    1 +
 drivers/misc/Makefile           |    1 +
 drivers/misc/carma/Kconfig      |    9 +
 drivers/misc/carma/Makefile     |    1 +
 drivers/misc/carma/carma-fpga.c | 1407 +++
 5 files changed, 1419 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/carma/Kconfig
 create mode 100644 drivers/misc/carma/Makefile
 create mode 100644 drivers/misc/carma/carma-fpga.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index cc8e49d..93cf1e6 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -457,5 +457,6 @@ source "drivers/misc/eeprom/Kconfig"
 source "drivers/misc/cb710/Kconfig"
 source "drivers/misc/iwmc3200top/Kconfig"
 source "drivers/misc/ti-st/Kconfig"
+source "drivers/misc/carma/Kconfig"
 
 endif # MISC_DEVICES
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 98009cc..2c1610e 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -42,3 +42,4 @@ obj-$(CONFIG_ARM_CHARLCD) += arm-charlcd.o
 obj-$(CONFIG_PCH_PHUB) += pch_phub.o
 obj-y  += ti-st/
 obj-$(CONFIG_AB8500_PWM)   += ab8500-pwm.o
+obj-y  += carma/
diff --git a/drivers/misc/carma/Kconfig b/drivers/misc/carma/Kconfig
new file mode 100644
index 000..4be183f
--- /dev/null
+++ b/drivers/misc/carma/Kconfig
@@ -0,0 +1,9 @@
+config CARMA_FPGA
+   tristate "CARMA DATA-FPGA Access Driver"
+   depends on FSL_SOC && PPC_83xx && MEDIA_SUPPORT && HAS_DMA && FSL_DMA
+   select VIDEOBUF_DMA_SG
+   default n
+   help
+     Say Y here to include support for communicating with the data
+     processing FPGAs on the OVRO CARMA board.
+
diff --git a/drivers/misc/carma/Makefile b/drivers/misc/carma/Makefile
new file mode 100644
index 000..0b69fa7
--- /dev/null
+++ b/drivers/misc/carma/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_CARMA_FPGA)   += carma-fpga.o
diff --git a/drivers/misc/carma/carma-fpga.c b/drivers/misc/carma/carma-fpga.c
new file mode 100644
index 000..be40a07
--- /dev/null
+++ b/drivers/misc/carma/carma-fpga.c
@@ -0,0 +1,1407 @@
+/*
+ * CARMA DATA-FPGA Access Driver
+ *
+ * Copyright (c) 2009-2011 Ira W. Snyder i...@ovro.caltech.edu
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+/*
+ * FPGA Memory Dump Format
+ *
+ * FPGA #0 control registers (32 x 32-bit words)
+ * FPGA #1 control registers (32 x 32-bit words)
+ * FPGA #2 control registers (32 x 32-bit words)
+ * FPGA #3 control registers (32 x 32-bit words)
+ * SYSFPGA control registers (32 x 32-bit words)
+ * FPGA #0 correlation array (NUM_CORL0 correlation blocks)
+ * FPGA #1 correlation array (NUM_CORL1 correlation blocks)
+ * FPGA #2 correlation array (NUM_CORL2 correlation blocks)
+ * FPGA #3 correlation array (NUM_CORL3 correlation blocks)
+ *
+ * Each correlation array consists of:
+ *
+ * Correlation Data  (2 x NUM_LAGSn x 32-bit words)
+ * Pipeline Metadata (2 x NUM_METAn x 32-bit words)
+ * Quantization Counters (2 x NUM_QCNTn x 32-bit words)
+ *
+ * The NUM_CORLn, NUM_LAGSn, NUM_METAn, and NUM_QCNTn values come from
+ * the FPGA configuration registers. They do not change once the FPGA's
+ * have been programmed, they only change on re-programming.
+ */
+
+/*
+ * Basic Description:
+ *
+ * This driver is used to capture correlation spectra off of the four data
+ * processing FPGAs. The FPGAs are often reprogrammed at runtime, therefore
+ * this driver supports dynamic enable/disable of capture while the device
+ * remains open.
+ *
+ * The nominal capture rate is 64Hz (every 15.625ms). To facilitate this fast
+ * capture rate, all buffers are pre-allocated to avoid any potentially long
+ * running memory allocations while capturing.
+ *
+ * There are two lists and one pointer which are used to keep track of the
+ * different states of data buffers.
+ *
+ * 1) free list
+ * This list holds all empty data buffers which are ready to receive data.
+ *
+ * 2) inflight pointer

[PATCH RFCv6 2/2] misc: add CARMA DATA-FPGA Programmer support

2011-02-09 Thread Ira W. Snyder
This adds support for programming the data processing FPGAs on the OVRO
CARMA board. These FPGAs have a special programming sequence that
requires that we program the Freescale DMA engine, which is only
available inside the kernel.

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/misc/carma/Kconfig              |    9 +
 drivers/misc/carma/Makefile             |    1 +
 drivers/misc/carma/carma-fpga-program.c | 1084 +++
 3 files changed, 1094 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/carma/carma-fpga-program.c

diff --git a/drivers/misc/carma/Kconfig b/drivers/misc/carma/Kconfig
index 4be183f..e57a9d3 100644
--- a/drivers/misc/carma/Kconfig
+++ b/drivers/misc/carma/Kconfig
@@ -7,3 +7,12 @@ config CARMA_FPGA
  Say Y here to include support for communicating with the data
  processing FPGAs on the OVRO CARMA board.
 
+config CARMA_FPGA_PROGRAM
+   tristate "CARMA DATA-FPGA Programmer"
+   depends on FSL_SOC && PPC_83xx && MEDIA_SUPPORT && HAS_DMA && FSL_DMA
+   select VIDEOBUF_DMA_SG
+   default n
+   help
+     Say Y here to include support for programming the data processing
+     FPGAs on the OVRO CARMA board.
+
diff --git a/drivers/misc/carma/Makefile b/drivers/misc/carma/Makefile
index 0b69fa7..ff36ac2 100644
--- a/drivers/misc/carma/Makefile
+++ b/drivers/misc/carma/Makefile
@@ -1 +1,2 @@
 obj-$(CONFIG_CARMA_FPGA)   += carma-fpga.o
+obj-$(CONFIG_CARMA_FPGA_PROGRAM)   += carma-fpga-program.o
diff --git a/drivers/misc/carma/carma-fpga-program.c b/drivers/misc/carma/carma-fpga-program.c
new file mode 100644
index 000..ef16cb3
--- /dev/null
+++ b/drivers/misc/carma/carma-fpga-program.c
@@ -0,0 +1,1084 @@
+/*
+ * CARMA Board DATA-FPGA Programmer
+ *
+ * Copyright (c) 2009-2010 Ira W. Snyder i...@ovro.caltech.edu
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+#include <linux/dma-mapping.h>
+#include <linux/of_platform.h>
+#include <linux/completion.h>
+#include <linux/miscdevice.h>
+#include <linux/dmaengine.h>
+#include <linux/interrupt.h>
+#include <linux/highmem.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/leds.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/io.h>
+
+#include <media/videobuf-dma-sg.h>
+
+/* MPC8349EMDS specific get_immrbase() */
+#include <sysdev/fsl_soc.h>
+
+static const char drv_name[] = "carma-fpga-program";
+
+/*
+ * Maximum firmware size
+ *
+ * 12849552 bytes for a CARMA Digitizer Board
+ * 18662880 bytes for a CARMA Correlator Board
+ */
+#define FW_SIZE_EP2S90         12849552
+#define FW_SIZE_EP2S130        18662880
+
+struct fpga_dev {
+   struct miscdevice miscdev;
+
+   /* Device Registers */
+   struct device *dev;
+   void __iomem *regs;
+   void __iomem *immr;
+
+   /* Freescale DMA Device */
+   struct dma_chan *chan;
+
+   /* Interrupts */
+   int irq, status;
+   struct completion completion;
+
+   /* FPGA Bitfile */
+   struct mutex lock;
+
+   struct videobuf_dmabuf vb;
+   bool vb_allocated;
+
+   /* max size and written bytes */
+   size_t fw_size;
+   size_t bytes;
+};
+
+/*
+ * FPGA Bitfile Helpers
+ */
+
+/**
+ * fpga_drop_firmware_data() - drop the bitfile image from memory
+ * @priv: the driver's private data structure
+ *
+ * LOCKING: must hold priv->lock
+ */
+static void fpga_drop_firmware_data(struct fpga_dev *priv)
+{
+   videobuf_dma_free(&priv->vb);
+   priv->vb_allocated = false;
+   priv->bytes = 0;
+}
+
+/*
+ * LED Trigger (could be a separate module)
+ */
+
+/*
+ * NOTE: this whole thing does have the problem that whenever the led's are
+ * NOTE: first set to use the fpga trigger, they could be in the wrong state
+ */
+
+DEFINE_LED_TRIGGER(ledtrig_fpga);
+
+static void ledtrig_fpga_programmed(bool enabled)
+{
+   if (enabled)
+   led_trigger_event(ledtrig_fpga, LED_FULL);
+   else
+   led_trigger_event(ledtrig_fpga, LED_OFF);
+}
+
+/*
+ * FPGA Register Helpers
+ */
+
+/* Register Definitions */
+#define FPGA_CONFIG_CONTROL            0x40
+#define FPGA_CONFIG_STATUS             0x44
+#define FPGA_CONFIG_FIFO_SIZE          0x48
+#define FPGA_CONFIG_FIFO_USED          0x4C
+#define FPGA_CONFIG_TOTAL_BYTE_COUNT   0x50
+#define FPGA_CONFIG_CUR_BYTE_COUNT     0x54
+
+#define FPGA_FIFO_ADDRESS              0x3000
+
+static int fpga_fifo_size(void __iomem *regs)
+{
+   return ioread32be(regs + FPGA_CONFIG_FIFO_SIZE);
+}
+
+static int fpga_config_error(void __iomem *regs)
+{
+   return ioread32be(regs + FPGA_CONFIG_STATUS) & 0xFFFE;
+}
+
+static int fpga_fifo_empty(void __iomem *regs)
+{
+   return ioread32be

Re: [PATCH 1/2] misc: add CARMA DATA-FPGA Access Driver

2011-02-08 Thread Ira W. Snyder
On Mon, Feb 07, 2011 at 11:33:10PM -0800, Dmitry Torokhov wrote:
 Hi Ira,
 
 On Mon, Feb 07, 2011 at 03:23:40PM -0800, Ira W. Snyder wrote:
  This driver allows userspace to access the data processing FPGAs on the
  OVRO CARMA board. It has two modes of operation:
  
  1) random access
  
  This allows users to poke any DATA-FPGA registers by using mmap to map
  the address region directly into their memory map.
  
  2) correlation dumping
  
  When correlating, the DATA-FPGA's have special requirements for getting
  the data out of their memory before the next correlation. This nominally
  happens at 64Hz (every 15.625ms). If the data is not dumped before the
  next correlation, data is lost.
  
  The data dumping driver handles buffering up to 1 second worth of
  correlation data from the FPGAs. This lowers the realtime scheduling
  requirements for the userspace process reading the device.
 
 Kind of a fly-by review but it looks like the locking in the driver
 needs work.
 

Hi Dmitry,

Thanks for the review. I have a few comments inline below.

  
  Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
  ---
   drivers/misc/Kconfig            |    1 +
   drivers/misc/Makefile           |    1 +
   drivers/misc/carma/Kconfig      |    9 +
   drivers/misc/carma/Makefile     |    1 +
   drivers/misc/carma/carma-fpga.c | 1446 +++
   5 files changed, 1458 insertions(+), 0 deletions(-)
   create mode 100644 drivers/misc/carma/Kconfig
   create mode 100644 drivers/misc/carma/Makefile
   create mode 100644 drivers/misc/carma/carma-fpga.c
  
  diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
  index 4d073f1..f457f14 100644
  --- a/drivers/misc/Kconfig
  +++ b/drivers/misc/Kconfig
  @@ -457,5 +457,6 @@ source "drivers/misc/eeprom/Kconfig"
   source "drivers/misc/cb710/Kconfig"
   source "drivers/misc/iwmc3200top/Kconfig"
   source "drivers/misc/ti-st/Kconfig"
  +source "drivers/misc/carma/Kconfig"
   
   endif # MISC_DEVICES
  diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
  index 98009cc..2c1610e 100644
  --- a/drivers/misc/Makefile
  +++ b/drivers/misc/Makefile
  @@ -42,3 +42,4 @@ obj-$(CONFIG_ARM_CHARLCD) += arm-charlcd.o
   obj-$(CONFIG_PCH_PHUB) += pch_phub.o
   obj-y  += ti-st/
   obj-$(CONFIG_AB8500_PWM)   += ab8500-pwm.o
  +obj-y  += carma/
  diff --git a/drivers/misc/carma/Kconfig b/drivers/misc/carma/Kconfig
  new file mode 100644
  index 000..4be183f
  --- /dev/null
  +++ b/drivers/misc/carma/Kconfig
  @@ -0,0 +1,9 @@
  +config CARMA_FPGA
  +   tristate "CARMA DATA-FPGA Access Driver"
  +   depends on FSL_SOC && PPC_83xx && MEDIA_SUPPORT && HAS_DMA && FSL_DMA
  +   select VIDEOBUF_DMA_SG
  +   default n
  +   help
  +     Say Y here to include support for communicating with the data
  +     processing FPGAs on the OVRO CARMA board.
  +
  diff --git a/drivers/misc/carma/Makefile b/drivers/misc/carma/Makefile
  new file mode 100644
  index 000..0b69fa7
  --- /dev/null
  +++ b/drivers/misc/carma/Makefile
  @@ -0,0 +1 @@
  +obj-$(CONFIG_CARMA_FPGA)   += carma-fpga.o
  diff --git a/drivers/misc/carma/carma-fpga.c b/drivers/misc/carma/carma-fpga.c
  new file mode 100644
  index 000..52620b3
  --- /dev/null
  +++ b/drivers/misc/carma/carma-fpga.c
  @@ -0,0 +1,1446 @@
  +/*
  + * CARMA DATA-FPGA Access Driver
  + *
  + * Copyright (c) 2009-2010 Ira W. Snyder i...@ovro.caltech.edu
  + *
  + * This program is free software; you can redistribute it and/or modify it
  + * under the terms of the GNU General Public License as published by the
  + * Free Software Foundation; either version 2 of the License, or (at your
  + * option) any later version.
  + */
  +
  +/*
  + * FPGA Memory Dump Format
  + *
  + * FPGA #0 control registers (32 x 32-bit words)
  + * FPGA #1 control registers (32 x 32-bit words)
  + * FPGA #2 control registers (32 x 32-bit words)
  + * FPGA #3 control registers (32 x 32-bit words)
  + * SYSFPGA control registers (32 x 32-bit words)
  + * FPGA #0 correlation array (NUM_CORL0 correlation blocks)
  + * FPGA #1 correlation array (NUM_CORL1 correlation blocks)
  + * FPGA #2 correlation array (NUM_CORL2 correlation blocks)
  + * FPGA #3 correlation array (NUM_CORL3 correlation blocks)
  + *
  + * Each correlation array consists of:
  + *
  + * Correlation Data  (2 x NUM_LAGSn x 32-bit words)
  + * Pipeline Metadata (2 x NUM_METAn x 32-bit words)
  + * Quantization Counters (2 x NUM_QCNTn x 32-bit words)
  + *
  + * The NUM_CORLn, NUM_LAGSn, NUM_METAn, and NUM_QCNTn values come from
  + * the FPGA configuration registers. They do not change once the FPGA's
  + * have been programmed, they only change on re-programming.
  + */
  +
  +/*
  + * Basic Description:
  + *
  + * This driver is used to capture correlation spectra off of the four data
  + * processing FPGAs. The FPGAs are often reprogrammed at runtime, therefore
  + * this driver supports dynamic enable

Re: [PATCH 1/2] misc: add CARMA DATA-FPGA Access Driver

2011-02-08 Thread Ira W. Snyder
On Tue, Feb 08, 2011 at 09:50:29AM -0800, Dmitry Torokhov wrote:
 On Tue, Feb 08, 2011 at 09:20:46AM -0800, Ira W. Snyder wrote:
  On Mon, Feb 07, 2011 at 11:33:10PM -0800, Dmitry Torokhov wrote:
    +static void data_free_buffer(struct device *dev, struct data_buf *buf)
    +{
    +   /* It is ok to free a NULL buffer */
    +   if (!buf)
    +           return;
    +
    +   /* Make sure the buffer is not on any list */
    +   list_del_init(&buf->entry);
   
   And what happens if it is? Should it be WARN_ON(!list_empty()) instead?
   
  
  This was only defensive programming. Everywhere this function is called,
  the buffer has already been removed from the list.
 
 I am concerned as sometimes defensive programming is the sign that we
 are not quite sure how the code works. I believe defensive programming
 should be used when providing library-like code, not in local cases.
 

Ok.

+
    +   list_for_each_entry_safe(buf, tmp, &priv->free, entry) {
    +           list_del_init(&buf->entry);
    +           spin_unlock_irq(&priv->lock);
    +           data_free_buffer(priv->dev, buf);
    +           spin_lock_irq(&priv->lock);
    +   }
   
   This is messed up. If there is concurrent access to the free list then
   it is not safe to continue iterating list after releasing the lock, you
   need to do:
   
        spin_lock_irq(&priv->lock);
        while (!list_empty(&priv->free)) {
                buf = list_first_entry(&priv->free, struct data_buf, entry);
                list_del_init(&buf->entry);
                spin_unlock_irq(&priv->lock);
                data_free_buffer(priv->dev, buf);
                spin_lock_irq(&priv->lock);
        }
   
   BUT, the function is only called when you disable (or fail to enable)
   the device which, at this point, should be quiesced, thus all this
   locking is not really needed.
   
  
  Correct.
  
  I thought it would be clearer to reviewers if I always used the lock to
  protect a data structure, even when it isn't technically needed.
 
 No, locks should protect what needs to be protected. The rest just
 muddies the water.
 

Ok.
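
For reference, the drain pattern suggested in the review (take the first entry, unlink it, drop the lock for the slow work, retake the lock, test the list again) can be modeled in userspace. This is an illustrative sketch with hypothetical names, using a pthread mutex in place of the spinlock and a singly linked list in place of list_head. As the review points out, none of this locking is needed if the device is already quiesced; the pattern matters only when concurrent access is possible.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct buf {
	struct buf *next;
};

struct pool {
	pthread_mutex_t lock;	/* protects free_list */
	struct buf *free_list;
};

static void pool_init(struct pool *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->free_list = NULL;
}

/* Allocate one buffer and push it on the free list. */
static int pool_add(struct pool *p)
{
	struct buf *b = malloc(sizeof(*b));

	if (!b)
		return -1;
	pthread_mutex_lock(&p->lock);
	b->next = p->free_list;
	p->free_list = b;
	pthread_mutex_unlock(&p->lock);
	return 0;
}

/* Free every buffer; returns how many were freed. */
static int pool_drain(struct pool *p)
{
	int freed = 0;

	assert(p != NULL);
	pthread_mutex_lock(&p->lock);
	while (p->free_list) {
		/* Re-read the head each pass: no cursor survives the unlock. */
		struct buf *b = p->free_list;

		p->free_list = b->next;
		pthread_mutex_unlock(&p->lock);
		free(b);	/* potentially slow work, done unlocked */
		freed++;
		pthread_mutex_lock(&p->lock);
	}
	pthread_mutex_unlock(&p->lock);
	return freed;
}
```

The difference from the list_for_each_entry_safe() version is that no saved "next" pointer is carried across the unlocked region, so a concurrent change to the list cannot leave the loop holding a stale pointer.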

+
    +   spin_lock_irq(&priv->lock);
    +   while (!list_empty(list)) {
    +           spin_unlock_irq(&priv->lock);
    +
    +           ret = wait_event_interruptible(priv->wait, list_empty(list));
    +           if (ret)
    +                   return -ERESTARTSYS;
    +
    +           spin_lock_irq(&priv->lock);
    +   }
    +   spin_unlock_irq(&priv->lock);
   
   Locking is not needed - if you disable interrupts what would put more
   stuff on the list?
   
  
  The locking is definitely needed.
  
  You've missed a critical piece of information. There are *two* devices
  we are interacting with here, and BOTH generate interrupts.
  
 
 No, I did not miss this fact. The point is that when we get to this code
 the device _putting_ items on the waiting list is stopped and we only need
 to wait for the list to drain. Nobody puts more stuff on it. You can
 check for the list_empty() condition without locking.
 
 And if someone _is_ putting more stuff on the list - you are screwed
 since list may become non-empty the moment you release the lock.
 

Ok, I understand what you mean now. You are correct, nothing else can
add things to the list. Thanks for clarifying this for me. :)

+
    +static ssize_t data_num_buffers_show(struct device *dev,
    +                                     struct device_attribute *attr, char *buf)
    +{
    +   struct fpga_device *priv = dev_get_drvdata(dev);
    +   unsigned int num;
    +
    +   spin_lock_irq(&priv->lock);
    +   num = priv->num_buffers;
    +   spin_unlock_irq(&priv->lock);
   
   This spin lock is pointless, priv->num_buffers might be already changed
   here, you can't guarantee that you show accurate data.
   
  
  Correct, I know this. I just wanted to protect the data structure at all
  points of use in the driver.
 
 Protect from what? Integer reads are guaranteed to be complete and you
 are not concerned with missing updates as information is obsolete the
 moment you release the lock.
 
  Would an atomic_t be better for this, or
  should I just remove the locking completely?
 
 Just remove the locking.
 

Ok.

+
    +   if (mutex_lock_interruptible(&priv->mutex))
    +           return -ERESTARTSYS;
   
   Why not
   
        error = mutex_lock_interruptible(&priv->mutex);
        if (error)
                return error;
   
   - do not clobber perfectly valid error codes.
   
  
  That's what the Linux Device Drivers 3rd Edition book does. See page
  112. I will change it to fix the return code.
 
 
 LDD3 is quite old by now...
 

I know, but it is still the best written reference I have. Reviewers
like yourself are better, but I can't look up your advice in a book. :)

I'll return the error code.
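
The difference between the two styles can be shown with a small userspace model. Everything below is hypothetical scaffolding for illustration: the stand-in lock helper is assumed to return -EINTR when interrupted, and FAKE_ERESTARTSYS is defined locally because ERESTARTSYS is kernel-internal and absent from userspace errno.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define FAKE_ERESTARTSYS 512	/* local stand-in for the kernel-internal code */

/* Hypothetical stand-in for mutex_lock_interruptible(). */
static int fake_lock_interruptible(bool signal_pending)
{
	return signal_pending ? -EINTR : 0;
}

/* Old style: whatever the helper reported is replaced by a fixed code. */
static int enable_clobbering(bool signal_pending)
{
	if (fake_lock_interruptible(signal_pending))
		return -FAKE_ERESTARTSYS;
	return 0;
}

/* Suggested style: pass the helper's own error code to the caller. */
static int enable_propagating(bool signal_pending)
{
	int error = fake_lock_interruptible(signal_pending);

	assert(error <= 0);	/* helpers return 0 or a negative errno */
	if (error)
		return error;
	return 0;
}
```

Propagating keeps the caller's view consistent with what the helper actually reported, and it stays correct if the helper later gains additional failure codes.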

+
    +static struct attribute *data_sysfs_attrs[] = {
    +   &dev_attr_num_buffers.attr

[PATCH RFCv5 0/2] CARMA Board Support

2011-02-08 Thread Ira W. Snyder
Hello everyone,

This is the fifth posting of these drivers, taking into account comments from
earlier postings. I would appreciate as much review as you can offer.

RFCv4 -> RFCv5:
- remove unnecessary locking per review comments
- do not clobber return values from *_interruptible()
- explicitly track buffer DMA mapping
- use #defines instead of raw hex addresses
- change enable sysfs attribute to root-writeable only

RFCv3 -> RFCv4:
- updates for DATA-FPGA version 2

RFCv2 -> RFCv3:
- use miscdevice framework (removing the carma class)
- add bitfile readback capability to the programmer

RFCv1 -> RFCv2:
- change comments to kerneldoc format
- Kconfig improvements
- use the videobuf_dma_sg API in the programmer
- updates for Freescale DMAEngine DMA_SLAVE API changes

Information about the CARMA board:

The CARMA board is essentially an MPC8349EA MDS reference design with a
1GHz ADC and 4 high powered data processing FPGAs connected to the local
bus. It is all packed into a compact PCI form factor. It is used at the
Owens Valley Radio Observatory as the main component in the correlator
system.

For board information, see:
http://www.mmarray.org/~dwh/carma_board/index.html

For DATA-FPGA register layout, see:
http://www.mmarray.org/memos/carma_memo46.pdf

These drivers are the necessary pieces to get the data processing FPGAs
working and producing data. Despite the fact that the hardware is custom
and we are the only users, I'd still like to get the drivers upstream.
Several people have suggested that this is possible.

Some further patches will be forthcoming. I have a driver for the LED
subsystem and the PPS subsystem. The LED register layout is expected to
change soon, so I won't post the driver until that is finished. The PPS
driver will be posted separately from this patch series; it is very
generic.

Thanks to everyone who has provided comments on earlier versions!

Ira W. Snyder (2):
  misc: add CARMA DATA-FPGA Access Driver
  misc: add CARMA DATA-FPGA Programmer support

 drivers/misc/Kconfig|1 +
 drivers/misc/Makefile   |1 +
 drivers/misc/carma/Kconfig  |   18 +
 drivers/misc/carma/Makefile |2 +
 drivers/misc/carma/carma-fpga-program.c | 1084 
 drivers/misc/carma/carma-fpga.c | 1396 +++
 6 files changed, 2502 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/carma/Kconfig
 create mode 100644 drivers/misc/carma/Makefile
 create mode 100644 drivers/misc/carma/carma-fpga-program.c
 create mode 100644 drivers/misc/carma/carma-fpga.c

-- 
1.7.3.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 1/2] misc: add CARMA DATA-FPGA Access Driver

2011-02-08 Thread Ira W. Snyder
This driver allows userspace to access the data processing FPGAs on the
OVRO CARMA board. It has two modes of operation:

1) random access

This allows users to poke any DATA-FPGA registers by using mmap to map
the address region directly into their memory map.

2) correlation dumping

When correlating, the DATA-FPGAs have special requirements for getting
the data out of their memory before the next correlation. This nominally
happens at 64Hz (every 15.625ms). If the data is not dumped before the
next correlation, data is lost.

The data dumping driver handles buffering up to 1 second worth of
correlation data from the FPGAs. This lowers the realtime scheduling
requirements for the userspace process reading the device.
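
The timing figures above follow from simple arithmetic, sketched here in
plain C. The helper names dump_interval_us() and buffers_needed() are
hypothetical, and the one-buffer-per-correlation assumption is mine, not
something the patch description states.

```c
#include <assert.h>

/* Interval between correlations, in microseconds, for a rate in Hz. */
static unsigned long dump_interval_us(unsigned int rate_hz)
{
	assert(rate_hz > 0);
	return 1000000UL / rate_hz;
}

/*
 * Buffers needed to hold `seconds` worth of dumps, assuming one buffer
 * per correlation (an assumption made for this sketch).
 */
static unsigned int buffers_needed(unsigned int rate_hz, unsigned int seconds)
{
	return rate_hz * seconds;
}
```

At the nominal 64Hz rate, dump_interval_us(64) gives 15625 microseconds
(15.625ms), and under the stated assumption one second of buffering
corresponds to 64 buffers.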

Signed-off-by: Ira W. Snyder i...@ovro.caltech.edu
---
 drivers/misc/Kconfig|1 +
 drivers/misc/Makefile   |1 +
 drivers/misc/carma/Kconfig  |9 +
 drivers/misc/carma/Makefile |1 +
 drivers/misc/carma/carma-fpga.c | 1396 +++
 5 files changed, 1408 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/carma/Kconfig
 create mode 100644 drivers/misc/carma/Makefile
 create mode 100644 drivers/misc/carma/carma-fpga.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index cc8e49d..93cf1e6 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -457,5 +457,6 @@ source drivers/misc/eeprom/Kconfig
 source drivers/misc/cb710/Kconfig
 source drivers/misc/iwmc3200top/Kconfig
 source drivers/misc/ti-st/Kconfig
+source drivers/misc/carma/Kconfig
 
 endif # MISC_DEVICES
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 98009cc..2c1610e 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -42,3 +42,4 @@ obj-$(CONFIG_ARM_CHARLCD) += arm-charlcd.o
 obj-$(CONFIG_PCH_PHUB) += pch_phub.o
 obj-y  += ti-st/
 obj-$(CONFIG_AB8500_PWM)   += ab8500-pwm.o
+obj-y  += carma/
diff --git a/drivers/misc/carma/Kconfig b/drivers/misc/carma/Kconfig
new file mode 100644
index 000..4be183f
--- /dev/null
+++ b/drivers/misc/carma/Kconfig
@@ -0,0 +1,9 @@
+config CARMA_FPGA
+   tristate "CARMA DATA-FPGA Access Driver"
+   depends on FSL_SOC && PPC_83xx && MEDIA_SUPPORT && HAS_DMA && FSL_DMA
+   select VIDEOBUF_DMA_SG
+   default n
+   help
+ Say Y here to include support for communicating with the data
+ processing FPGAs on the OVRO CARMA board.
+
diff --git a/drivers/misc/carma/Makefile b/drivers/misc/carma/Makefile
new file mode 100644
index 000..0b69fa7
--- /dev/null
+++ b/drivers/misc/carma/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_CARMA_FPGA)   += carma-fpga.o
diff --git a/drivers/misc/carma/carma-fpga.c b/drivers/misc/carma/carma-fpga.c
new file mode 100644
index 000..4ea473a
--- /dev/null
+++ b/drivers/misc/carma/carma-fpga.c
@@ -0,0 +1,1396 @@
+/*
+ * CARMA DATA-FPGA Access Driver
+ *
+ * Copyright (c) 2009-2011 Ira W. Snyder i...@ovro.caltech.edu
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+/*
+ * FPGA Memory Dump Format
+ *
+ * FPGA #0 control registers (32 x 32-bit words)
+ * FPGA #1 control registers (32 x 32-bit words)
+ * FPGA #2 control registers (32 x 32-bit words)
+ * FPGA #3 control registers (32 x 32-bit words)
+ * SYSFPGA control registers (32 x 32-bit words)
+ * FPGA #0 correlation array (NUM_CORL0 correlation blocks)
+ * FPGA #1 correlation array (NUM_CORL1 correlation blocks)
+ * FPGA #2 correlation array (NUM_CORL2 correlation blocks)
+ * FPGA #3 correlation array (NUM_CORL3 correlation blocks)
+ *
+ * Each correlation array consists of:
+ *
+ * Correlation Data  (2 x NUM_LAGSn x 32-bit words)
+ * Pipeline Metadata (2 x NUM_METAn x 32-bit words)
+ * Quantization Counters (2 x NUM_QCNTn x 32-bit words)
+ *
+ * The NUM_CORLn, NUM_LAGSn, NUM_METAn, and NUM_QCNTn values come from
+ * the FPGA configuration registers. They do not change once the FPGAs
+ * have been programmed; they only change on re-programming.
+ */
+
+/*
+ * Basic Description:
+ *
+ * This driver is used to capture correlation spectra off of the four data
+ * processing FPGAs. The FPGAs are often reprogrammed at runtime, therefore
+ * this driver supports dynamic enable/disable of capture while the device
+ * remains open.
+ *
+ * The nominal capture rate is 64Hz (every 15.625ms). To facilitate this fast
+ * capture rate, all buffers are pre-allocated to avoid any potentially long
+ * running memory allocations while capturing.
+ *
+ * There are three lists which are used to keep track of the different states
+ * of data buffers.
+ *
+ * 1) free list
+ * This list holds all empty data buffers which are ready to receive data.
+ *
+ * 2) inflight list
+ * This list holds data
