RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
Will this engine be coordinating with another to handle memory copies? The dma mapping code for async_tx/raid is broken when dma mapping requests overlap or cross dma device boundaries [1]. [1]: http://marc.info/?l=linux-arm-kernelm=129407269402930w=2 Yes, it needs fsl-dma to handle memcpy copies. I read your link, the unmap address is stored in talitos hwdesc, the address will be unmapped when async_tx ack this descriptor, I know fsl- dma won't wait this ack flag in current kernel, so I fix it in fsl-dma patch 5/8. Do you mean that? Unfortunately no. I'm open to other suggestions. but as far as I can see it requires deeper changes to rip out the dma mapping that happens in async_tx and the automatic unmapping done by drivers. It should all be pushed to the client (md). Currently async_tx hides hardware details from md such that it doesn't even care if the operation is offloaded to hardware at all, but that takes things too far. In the worst case an copy-xor chain handled by multiple channels results in : 1/ dma_map(copy_chan...) 2/ dma_map(xor_chan...) 3/ exec copy 4/ dma_unmap(copy_chan...) 5/ exec xor ---initiated by the copy_chan 6/ dma_unmap(xor_chan...) Step 2 violates the dma api since the buffers belong to the xor_chan until unmap. Step 5 also causes the random completion context of the copy channel to bleed into submission context of the xor channel which is problematic. So the order needs to be: 1/ dma_map(copy_chan...) 2/ exec copy 3/ dma_unmap(copy_chan...) 4/ dma_map(xor_chan...) 5/ exec xor --initiated by md in a static context 6/ dma_unmap(xor_chan...) Also, if xor_chan and copy_chan lie with the same dma mapping domain (iommu or parent device) then we can map the stripe once and skip the extra maintenance for the duration of the chain of operations. This dumps a lot of hardware details on md, but I think it is the only way to get consistent semantics when arbitrary offload devices are involved. Thanks for your answer and links, I did some investigate these days, first, powerpc processor should be hardware assured cache coherency, it should be ok for hardware when in step 5 (but I will avoid map same address on different device). second, I have a workaround to make dma_map/unmap by order when using 2 different device to offload, I will submit next descriptor until current descriptor complete, if (submit-flags ASYNC_TX_ACK) async_tx_ack(tx); if (depend_tx) async_tx_ack(depend_tx); + /* do more check to support 2 devices offload? */ + if (dma_wait_for_async_tx(tx) == DMA_ERROR) + panic(%s: DMA_ERROR waiting for tx\n, __func__); } EXPORT_SYMBOL_GPL(async_tx_submit); Also use your example, 1/ dma_map(copy_chan...) 2/ tx-submit(tx); async_tx_ack(tx); 3/ dma_unmap(copy_chan...) 4/ dma_map(xor_chan...) 5/ exec xor -- initialized by tx-submit(tx); 6/ dma_unmap(xor_chan...) Under this way, actually dma_run_dependency() is useless, so this can make sure copy and xor with same page processed by order, and only one descriptor per channel is served. dma_unmap in driver is controlled by client (tx-flags) How's you thinking or any suggestions? I test it on our powerpc, I don't know whether it does work on other architecture. Thanks. -- Dan ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
-Original Message- From: dan.j.willi...@gmail.com [mailto:dan.j.willi...@gmail.com] On Behalf Of Dan Williams Sent: Sunday, September 02, 2012 4:12 PM To: Liu Qiang-B32616 Cc: linux-cry...@vger.kernel.org; herb...@gondor.hengli.com.au; da...@davemloft.net; linux-ker...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org; Li Yang-R58472; Phillips Kim-R1AAHA; vinod.k...@intel.com; dan.j.willi...@intel.com; a...@arndb.de; gre...@linuxfoundation.org Subject: Re: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload On Thu, Aug 9, 2012 at 1:20 AM, qiang@freescale.com wrote: From: Qiang Liu qiang@freescale.com Expose Talitos's XOR functionality to be used for RAID parity calculation via the Async_tx layer. Cc: Herbert Xu herb...@gondor.apana.org.au Cc: David S. Miller da...@davemloft.net Signed-off-by: Dipen Dudhat dipen.dud...@freescale.com Signed-off-by: Maneesh Gupta maneesh.gu...@freescale.com Signed-off-by: Kim Phillips kim.phill...@freescale.com Signed-off-by: Vishnu Suresh vis...@freescale.com Signed-off-by: Qiang Liu qiang@freescale.com --- drivers/crypto/Kconfig |9 + drivers/crypto/talitos.c | 413 ++ drivers/crypto/talitos.h | 53 ++ 3 files changed, 475 insertions(+), 0 deletions(-) diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig index be6b2ba..f0a7c29 100644 --- a/drivers/crypto/Kconfig +++ b/drivers/crypto/Kconfig @@ -222,6 +222,15 @@ config CRYPTO_DEV_TALITOS To compile this driver as a module, choose M here: the module will be called talitos. +config CRYPTO_DEV_TALITOS_RAIDXOR + bool Talitos RAID5 XOR Calculation Offload + default y No, default y. The user should explicitly enable this. Ok, I will change it to N. + select DMA_ENGINE + depends on CRYPTO_DEV_TALITOS + help + Say 'Y' here to use the Freescale Security Engine (SEC) to + offload RAID XOR parity Calculation + Is it faster than cpu xor? Yes, I tested with IOZone, cpu load is also reduced. I think one possible reason is the processor of p1022ds is not strong as others, so the performance is improved obviously. Will this engine be coordinating with another to handle memory copies? The dma mapping code for async_tx/raid is broken when dma mapping requests overlap or cross dma device boundaries [1]. [1]: http://marc.info/?l=linux-arm-kernelm=129407269402930w=2 Yes, it needs fsl-dma to handle memcpy copies. I read your link, the unmap address is stored in talitos hwdesc, the address will be unmapped when async_tx ack this descriptor, I know fsl-dma won't wait this ack flag in current kernel, so I fix it in fsl-dma patch 5/8. Do you mean that? +static void talitos_process_pending(struct talitos_xor_chan *xor_chan) +{ + struct talitos_xor_desc *desc, *_desc; + unsigned long flags; + int status; + struct talitos_private *priv; + int ch; + + priv = dev_get_drvdata(xor_chan-dev); + ch = atomic_inc_return(priv-last_chan) + (priv-num_channels - 1); Maybe a comment about why this this is duplicated from talitos_cra_init()? It sticks out here and I had to go looking to find out why this channel number increments on submit. My fault, I will correct it as below, atomic_read((priv-last_chan) (priv-num_channels - 1); + spin_lock_irqsave(xor_chan-desc_lock, flags); + + list_for_each_entry_safe(desc, _desc, xor_chan-pending_q, node) { + status = talitos_submit(xor_chan-dev, ch, desc- hwdesc, + talitos_release_xor, desc); + if (status != -EINPROGRESS) + break; + + list_del(desc-node); + list_add_tail(desc-node, xor_chan-in_progress_q); + } + + spin_unlock_irqrestore(xor_chan-desc_lock, flags); +} + +static void talitos_xor_run_tx_complete_actions(struct talitos_xor_desc *desc, + struct talitos_xor_chan *xor_chan) +{ + struct device *dev = xor_chan-dev; + dma_addr_t dest, addr; + unsigned int src_cnt = desc-unmap_src_cnt; + unsigned int len = desc-unmap_len; + enum dma_ctrl_flags flags = desc-async_tx.flags; + struct dma_async_tx_descriptor *tx = desc-async_tx; + + /* unmap dma addresses */ + dest = desc-hwdesc.ptr[6].ptr; + if (likely(!(flags DMA_COMPL_SKIP_DEST_UNMAP))) + dma_unmap_page(dev, dest, len, DMA_BIDIRECTIONAL); + + desc-idx = 6 - src_cnt; + if (likely(!(flags DMA_COMPL_SKIP_SRC_UNMAP))) { + while(desc-idx 6) { Checkpatch says: ERROR: space required before the open parenthesis '(' I will correct it next. + addr = desc
Re: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
On Tue, Sep 4, 2012 at 5:28 AM, Liu Qiang-B32616 b32...@freescale.com wrote: Will this engine be coordinating with another to handle memory copies? The dma mapping code for async_tx/raid is broken when dma mapping requests overlap or cross dma device boundaries [1]. [1]: http://marc.info/?l=linux-arm-kernelm=129407269402930w=2 Yes, it needs fsl-dma to handle memcpy copies. I read your link, the unmap address is stored in talitos hwdesc, the address will be unmapped when async_tx ack this descriptor, I know fsl-dma won't wait this ack flag in current kernel, so I fix it in fsl-dma patch 5/8. Do you mean that? Unfortunately no. I'm open to other suggestions. but as far as I can see it requires deeper changes to rip out the dma mapping that happens in async_tx and the automatic unmapping done by drivers. It should all be pushed to the client (md). Currently async_tx hides hardware details from md such that it doesn't even care if the operation is offloaded to hardware at all, but that takes things too far. In the worst case an copy-xor chain handled by multiple channels results in : 1/ dma_map(copy_chan...) 2/ dma_map(xor_chan...) 3/ exec copy 4/ dma_unmap(copy_chan...) 5/ exec xor ---initiated by the copy_chan 6/ dma_unmap(xor_chan...) Step 2 violates the dma api since the buffers belong to the xor_chan until unmap. Step 5 also causes the random completion context of the copy channel to bleed into submission context of the xor channel which is problematic. So the order needs to be: 1/ dma_map(copy_chan...) 2/ exec copy 3/ dma_unmap(copy_chan...) 4/ dma_map(xor_chan...) 5/ exec xor --initiated by md in a static context 6/ dma_unmap(xor_chan...) Also, if xor_chan and copy_chan lie with the same dma mapping domain (iommu or parent device) then we can map the stripe once and skip the extra maintenance for the duration of the chain of operations. This dumps a lot of hardware details on md, but I think it is the only way to get consistent semantics when arbitrary offload devices are involved. -- Dan ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
On Thu, Aug 30, 2012 at 7:23 AM, Geanta Neag Horia Ioan-B05471 b05...@freescale.com wrote: Besides these: 1. As Dan Williams mentioned, you should explain why you are using both spin_lock_bh and spin_lock_irqsave on the same lock. It looks like talitos_process_pending() can be called from hardirq context via talitos_error(). So spin_lock_bh is not sufficient. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
On Thu, Aug 9, 2012 at 1:20 AM, qiang@freescale.com wrote: From: Qiang Liu qiang@freescale.com Expose Talitos's XOR functionality to be used for RAID parity calculation via the Async_tx layer. Cc: Herbert Xu herb...@gondor.apana.org.au Cc: David S. Miller da...@davemloft.net Signed-off-by: Dipen Dudhat dipen.dud...@freescale.com Signed-off-by: Maneesh Gupta maneesh.gu...@freescale.com Signed-off-by: Kim Phillips kim.phill...@freescale.com Signed-off-by: Vishnu Suresh vis...@freescale.com Signed-off-by: Qiang Liu qiang@freescale.com --- drivers/crypto/Kconfig |9 + drivers/crypto/talitos.c | 413 ++ drivers/crypto/talitos.h | 53 ++ 3 files changed, 475 insertions(+), 0 deletions(-) diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig index be6b2ba..f0a7c29 100644 --- a/drivers/crypto/Kconfig +++ b/drivers/crypto/Kconfig @@ -222,6 +222,15 @@ config CRYPTO_DEV_TALITOS To compile this driver as a module, choose M here: the module will be called talitos. +config CRYPTO_DEV_TALITOS_RAIDXOR + bool Talitos RAID5 XOR Calculation Offload + default y No, default y. The user should explicitly enable this. + select DMA_ENGINE + depends on CRYPTO_DEV_TALITOS + help + Say 'Y' here to use the Freescale Security Engine (SEC) to + offload RAID XOR parity Calculation + Is it faster than cpu xor? Will this engine be coordinating with another to handle memory copies? The dma mapping code for async_tx/raid is broken when dma mapping requests overlap or cross dma device boundaries [1]. [1]: http://marc.info/?l=linux-arm-kernelm=129407269402930w=2 +static void talitos_process_pending(struct talitos_xor_chan *xor_chan) +{ + struct talitos_xor_desc *desc, *_desc; + unsigned long flags; + int status; + struct talitos_private *priv; + int ch; + + priv = dev_get_drvdata(xor_chan-dev); + ch = atomic_inc_return(priv-last_chan) + (priv-num_channels - 1); Maybe a comment about why this this is duplicated from talitos_cra_init()? It sticks out here and I had to go looking to find out why this channel number increments on submit. + spin_lock_irqsave(xor_chan-desc_lock, flags); + + list_for_each_entry_safe(desc, _desc, xor_chan-pending_q, node) { + status = talitos_submit(xor_chan-dev, ch, desc-hwdesc, + talitos_release_xor, desc); + if (status != -EINPROGRESS) + break; + + list_del(desc-node); + list_add_tail(desc-node, xor_chan-in_progress_q); + } + + spin_unlock_irqrestore(xor_chan-desc_lock, flags); +} + +static void talitos_xor_run_tx_complete_actions(struct talitos_xor_desc *desc, + struct talitos_xor_chan *xor_chan) +{ + struct device *dev = xor_chan-dev; + dma_addr_t dest, addr; + unsigned int src_cnt = desc-unmap_src_cnt; + unsigned int len = desc-unmap_len; + enum dma_ctrl_flags flags = desc-async_tx.flags; + struct dma_async_tx_descriptor *tx = desc-async_tx; + + /* unmap dma addresses */ + dest = desc-hwdesc.ptr[6].ptr; + if (likely(!(flags DMA_COMPL_SKIP_DEST_UNMAP))) + dma_unmap_page(dev, dest, len, DMA_BIDIRECTIONAL); + + desc-idx = 6 - src_cnt; + if (likely(!(flags DMA_COMPL_SKIP_SRC_UNMAP))) { + while(desc-idx 6) { Checkpatch says: ERROR: space required before the open parenthesis '(' + addr = desc-hwdesc.ptr[desc-idx++].ptr; + if (addr == dest) + continue; + dma_unmap_page(dev, addr, len, DMA_TO_DEVICE); + } + } + + /* run dependent operations */ + dma_run_dependencies(tx); Here is where we run into problems if another engine accesses these same buffers, especially on ARM v6+. +} + +static void talitos_release_xor(struct device *dev, struct talitos_desc *hwdesc, + void *context, int error) +{ + struct talitos_xor_desc *desc = context; + struct talitos_xor_chan *xor_chan; + dma_async_tx_callback callback; + void *callback_param; + + if (unlikely(error)) + dev_err(dev, xor operation: talitos error %d\n, error); + + xor_chan = container_of(desc-async_tx.chan, struct talitos_xor_chan, + common); + spin_lock_bh(xor_chan-desc_lock); + if (xor_chan-completed_cookie desc-async_tx.cookie) + xor_chan-completed_cookie = desc-async_tx.cookie; + Use dma_cookie_complete(). + callback = desc-async_tx.callback; + callback_param = desc-async_tx.callback_param; + +
RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
On Fri, 31 Aug 2012 06:08:05 +0300, Liu Qiang-B32616 b32...@freescale.com wrote: -Original Message- From: Geanta Neag Horia Ioan-B05471 Sent: Thursday, August 30, 2012 10:23 PM To: Liu Qiang-B32616; linux-cry...@vger.kernel.org; dan.j.willi...@gmail.com; herb...@gondor.hengli.com.au; da...@davemloft.net; linux-ker...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org Cc: Li Yang-R58472; Phillips Kim-R1AAHA; vinod.k...@intel.com; dan.j.willi...@intel.com; a...@arndb.de; gre...@linuxfoundation.org; Liu Qiang-B32616 Subject: RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload On Thu, 9 Aug 2012 11:20:48 +0300, qiang@freescale.com wrote: From: Qiang Liu qiang@freescale.com Expose Talitos's XOR functionality to be used for RAID parity calculation via the Async_tx layer. Cc: Herbert Xu herb...@gondor.apana.org.au Cc: David S. Miller da...@davemloft.net Signed-off-by: Dipen Dudhat dipen.dud...@freescale.com Signed-off-by: Maneesh Gupta maneesh.gu...@freescale.com Signed-off-by: Kim Phillips kim.phill...@freescale.com Signed-off-by: Vishnu Suresh vis...@freescale.com Signed-off-by: Qiang Liu qiang@freescale.com --- drivers/crypto/Kconfig |9 + drivers/crypto/talitos.c | 413 ++ drivers/crypto/talitos.h | 53 ++ 3 files changed, 475 insertions(+), 0 deletions(-) +static int talitos_alloc_chan_resources(struct dma_chan *chan) +{ + struct talitos_xor_chan *xor_chan; + struct talitos_xor_desc *desc; + LIST_HEAD(tmp_list); + int i; + + xor_chan = container_of(chan, struct talitos_xor_chan, common); + + if (!list_empty(xor_chan-free_desc)) + return xor_chan-total_desc; + + for (i = 0; i TALITOS_MAX_DESCRIPTOR_NR; i++) { + desc = talitos_xor_alloc_descriptor(xor_chan, + GFP_KERNEL | GFP_DMA); talitos_xor_alloc_descriptor() is called here without holding the xor_chan-desc_lock and it increments xor_chan-total_desc. Isn't this an issue ? No, please refer to the code as below, + list_add_tail(desc-node, tmp_list); The list is temporary list, it will be merged to xor_chan-free_desc in next step, here is protected by lock, + spin_lock_bh(xor_chan-desc_lock); + list_splice_init(tmp_list, xor_chan-free_desc); + spin_unlock_bh(xor_chan-desc_lock); I was not referring to the list, but to xor_chan-total_desc variable. The following access: talitos_alloc_chan_resources()-talitos_xor_alloc_descriptor()-total_desc++ is not protected by the xor_chan-desc_lock. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
-Original Message- From: Geanta Neag Horia Ioan-B05471 Sent: Friday, August 31, 2012 6:39 PM To: Liu Qiang-B32616; linux-cry...@vger.kernel.org; dan.j.willi...@gmail.com; herb...@gondor.hengli.com.au; da...@davemloft.net; linux-ker...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org Cc: Li Yang-R58472; Phillips Kim-R1AAHA; vinod.k...@intel.com; Dan Williams; a...@arndb.de; gre...@linuxfoundation.org Subject: RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload On Fri, 31 Aug 2012 06:08:05 +0300, Liu Qiang-B32616 b32...@freescale.com wrote: -Original Message- From: Geanta Neag Horia Ioan-B05471 Sent: Thursday, August 30, 2012 10:23 PM To: Liu Qiang-B32616; linux-cry...@vger.kernel.org; dan.j.willi...@gmail.com; herb...@gondor.hengli.com.au; da...@davemloft.net; linux-ker...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org Cc: Li Yang-R58472; Phillips Kim-R1AAHA; vinod.k...@intel.com; dan.j.willi...@intel.com; a...@arndb.de; gre...@linuxfoundation.org; Liu Qiang-B32616 Subject: RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload On Thu, 9 Aug 2012 11:20:48 +0300, qiang@freescale.com wrote: From: Qiang Liu qiang@freescale.com Expose Talitos's XOR functionality to be used for RAID parity calculation via the Async_tx layer. Cc: Herbert Xu herb...@gondor.apana.org.au Cc: David S. Miller da...@davemloft.net Signed-off-by: Dipen Dudhat dipen.dud...@freescale.com Signed-off-by: Maneesh Gupta maneesh.gu...@freescale.com Signed-off-by: Kim Phillips kim.phill...@freescale.com Signed-off-by: Vishnu Suresh vis...@freescale.com Signed-off-by: Qiang Liu qiang@freescale.com --- drivers/crypto/Kconfig |9 + drivers/crypto/talitos.c | 413 ++ drivers/crypto/talitos.h | 53 ++ 3 files changed, 475 insertions(+), 0 deletions(-) +static int talitos_alloc_chan_resources(struct dma_chan *chan) +{ + struct talitos_xor_chan *xor_chan; + struct talitos_xor_desc *desc; + LIST_HEAD(tmp_list); + int i; + + xor_chan = container_of(chan, struct talitos_xor_chan, common); + + if (!list_empty(xor_chan-free_desc)) + return xor_chan-total_desc; + + for (i = 0; i TALITOS_MAX_DESCRIPTOR_NR; i++) { + desc = talitos_xor_alloc_descriptor(xor_chan, + GFP_KERNEL | GFP_DMA); talitos_xor_alloc_descriptor() is called here without holding the xor_chan-desc_lock and it increments xor_chan-total_desc. Isn't this an issue ? No, please refer to the code as below, + list_add_tail(desc-node, tmp_list); The list is temporary list, it will be merged to xor_chan-free_desc in next step, here is protected by lock, + spin_lock_bh(xor_chan-desc_lock); + list_splice_init(tmp_list, xor_chan-free_desc); + spin_unlock_bh(xor_chan-desc_lock); I was not referring to the list, but to xor_chan-total_desc variable. The following access: talitos_alloc_chan_resources()-talitos_xor_alloc_descriptor()- total_desc++ is not protected by the xor_chan-desc_lock. Ok, I will correct it in next series. Thanks. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
On Thu, 9 Aug 2012 11:20:48 +0300, qiang@freescale.com wrote: From: Qiang Liu qiang@freescale.com Expose Talitos's XOR functionality to be used for RAID parity calculation via the Async_tx layer. Cc: Herbert Xu herb...@gondor.apana.org.au Cc: David S. Miller da...@davemloft.net Signed-off-by: Dipen Dudhat dipen.dud...@freescale.com Signed-off-by: Maneesh Gupta maneesh.gu...@freescale.com Signed-off-by: Kim Phillips kim.phill...@freescale.com Signed-off-by: Vishnu Suresh vis...@freescale.com Signed-off-by: Qiang Liu qiang@freescale.com --- drivers/crypto/Kconfig |9 + drivers/crypto/talitos.c | 413 ++ drivers/crypto/talitos.h | 53 ++ 3 files changed, 475 insertions(+), 0 deletions(-) +static void talitos_xor_run_tx_complete_actions(struct talitos_xor_desc *desc, + struct talitos_xor_chan *xor_chan) +{ + struct device *dev = xor_chan-dev; + dma_addr_t dest, addr; + unsigned int src_cnt = desc-unmap_src_cnt; + unsigned int len = desc-unmap_len; + enum dma_ctrl_flags flags = desc-async_tx.flags; + struct dma_async_tx_descriptor *tx = desc-async_tx; + + /* unmap dma addresses */ + dest = desc-hwdesc.ptr[6].ptr; + if (likely(!(flags DMA_COMPL_SKIP_DEST_UNMAP))) + dma_unmap_page(dev, dest, len, DMA_BIDIRECTIONAL); + + desc-idx = 6 - src_cnt; + if (likely(!(flags DMA_COMPL_SKIP_SRC_UNMAP))) { + while(desc-idx 6) { + addr = desc-hwdesc.ptr[desc-idx++].ptr; + if (addr == dest) + continue; + dma_unmap_page(dev, addr, len, DMA_TO_DEVICE); + } + } No need for braces around the while block. + /* run dependent operations */ + dma_run_dependencies(tx); +} +static void talitos_release_xor(struct device *dev, struct talitos_desc *hwdesc, + void *context, int error) +{ + struct talitos_xor_desc *desc = context; + struct talitos_xor_chan *xor_chan; + dma_async_tx_callback callback; + void *callback_param; + + if (unlikely(error)) + dev_err(dev, xor operation: talitos error %d\n, error); + + xor_chan = container_of(desc-async_tx.chan, struct talitos_xor_chan, + common); + spin_lock_bh(xor_chan-desc_lock); + if (xor_chan-completed_cookie desc-async_tx.cookie) + xor_chan-completed_cookie = desc-async_tx.cookie; + + callback = desc-async_tx.callback; + callback_param = desc-async_tx.callback_param; + + if (callback) { + spin_unlock_bh(xor_chan-desc_lock); + callback(callback_param); + spin_lock_bh(xor_chan-desc_lock); + } Since callback_param is used only here, maybe: if (callback) { void *callback_param = desc-async_tx.callback_param; spin_unlock_bh(xor_chan-desc_lock); callback(callback_param); spin_lock_bh(xor_chan-desc_lock); } + + talitos_xor_run_tx_complete_actions(desc, xor_chan); + + list_del(desc-node); + list_add_tail(desc-node, xor_chan-free_desc); + spin_unlock_bh(xor_chan-desc_lock); + if (!list_empty(xor_chan-pending_q)) + talitos_process_pending(xor_chan); +} +static int talitos_alloc_chan_resources(struct dma_chan *chan) +{ + struct talitos_xor_chan *xor_chan; + struct talitos_xor_desc *desc; + LIST_HEAD(tmp_list); + int i; + + xor_chan = container_of(chan, struct talitos_xor_chan, common); + + if (!list_empty(xor_chan-free_desc)) + return xor_chan-total_desc; + + for (i = 0; i TALITOS_MAX_DESCRIPTOR_NR; i++) { + desc = talitos_xor_alloc_descriptor(xor_chan, + GFP_KERNEL | GFP_DMA); talitos_xor_alloc_descriptor() is called here without holding the xor_chan-desc_lock and it increments xor_chan-total_desc. Isn't this an issue ? + if (!desc) { + dev_err(xor_chan-common.device-dev, + Only %d initial descriptors\n, i); + break; + } + list_add_tail(desc-node, tmp_list); + } + + if (!i) + return -ENOMEM; + + /* At least one desc is allocated */ + spin_lock_bh(xor_chan-desc_lock); + list_splice_init(tmp_list, xor_chan-free_desc); + spin_unlock_bh(xor_chan-desc_lock); + + return xor_chan-total_desc; +} +/** + * talitos_register_dma_async - Initialize the Freescale XOR ADMA device + * It is registered as a DMA device with the capability to perform + * XOR operation with the Async_tx layer. + * The various queues and channel resources are also allocated. + */ +static int talitos_register_async_tx(struct device *dev, int max_xor_srcs) +{ + struct talitos_private *priv =
RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
-Original Message- From: Geanta Neag Horia Ioan-B05471 Sent: Thursday, August 30, 2012 10:23 PM To: Liu Qiang-B32616; linux-cry...@vger.kernel.org; dan.j.willi...@gmail.com; herb...@gondor.hengli.com.au; da...@davemloft.net; linux-ker...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org Cc: Li Yang-R58472; Phillips Kim-R1AAHA; vinod.k...@intel.com; dan.j.willi...@intel.com; a...@arndb.de; gre...@linuxfoundation.org; Liu Qiang-B32616 Subject: RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload On Thu, 9 Aug 2012 11:20:48 +0300, qiang@freescale.com wrote: From: Qiang Liu qiang@freescale.com Expose Talitos's XOR functionality to be used for RAID parity calculation via the Async_tx layer. Cc: Herbert Xu herb...@gondor.apana.org.au Cc: David S. Miller da...@davemloft.net Signed-off-by: Dipen Dudhat dipen.dud...@freescale.com Signed-off-by: Maneesh Gupta maneesh.gu...@freescale.com Signed-off-by: Kim Phillips kim.phill...@freescale.com Signed-off-by: Vishnu Suresh vis...@freescale.com Signed-off-by: Qiang Liu qiang@freescale.com --- drivers/crypto/Kconfig |9 + drivers/crypto/talitos.c | 413 ++ drivers/crypto/talitos.h | 53 ++ 3 files changed, 475 insertions(+), 0 deletions(-) +static void talitos_xor_run_tx_complete_actions(struct talitos_xor_desc *desc, + struct talitos_xor_chan *xor_chan) +{ + struct device *dev = xor_chan-dev; + dma_addr_t dest, addr; + unsigned int src_cnt = desc-unmap_src_cnt; + unsigned int len = desc-unmap_len; + enum dma_ctrl_flags flags = desc-async_tx.flags; + struct dma_async_tx_descriptor *tx = desc-async_tx; + + /* unmap dma addresses */ + dest = desc-hwdesc.ptr[6].ptr; + if (likely(!(flags DMA_COMPL_SKIP_DEST_UNMAP))) + dma_unmap_page(dev, dest, len, DMA_BIDIRECTIONAL); + + desc-idx = 6 - src_cnt; + if (likely(!(flags DMA_COMPL_SKIP_SRC_UNMAP))) { + while(desc-idx 6) { + addr = desc-hwdesc.ptr[desc-idx++].ptr; + if (addr == dest) + continue; + dma_unmap_page(dev, addr, len, DMA_TO_DEVICE); + } + } No need for braces around the while block. I will remove it. + /* run dependent operations */ + dma_run_dependencies(tx); +} +static void talitos_release_xor(struct device *dev, struct talitos_desc *hwdesc, + void *context, int error) +{ + struct talitos_xor_desc *desc = context; + struct talitos_xor_chan *xor_chan; + dma_async_tx_callback callback; + void *callback_param; + + if (unlikely(error)) + dev_err(dev, xor operation: talitos error %d\n, error); + + xor_chan = container_of(desc-async_tx.chan, struct talitos_xor_chan, + common); + spin_lock_bh(xor_chan-desc_lock); + if (xor_chan-completed_cookie desc-async_tx.cookie) + xor_chan-completed_cookie = desc-async_tx.cookie; + + callback = desc-async_tx.callback; + callback_param = desc-async_tx.callback_param; + + if (callback) { + spin_unlock_bh(xor_chan-desc_lock); + callback(callback_param); + spin_lock_bh(xor_chan-desc_lock); + } Since callback_param is used only here, maybe: if (callback) { void *callback_param = desc-async_tx.callback_param; spin_unlock_bh(xor_chan-desc_lock); callback(callback_param); spin_lock_bh(xor_chan-desc_lock); } Fine. I will modify it in next. + + talitos_xor_run_tx_complete_actions(desc, xor_chan); + + list_del(desc-node); + list_add_tail(desc-node, xor_chan-free_desc); + spin_unlock_bh(xor_chan-desc_lock); + if (!list_empty(xor_chan-pending_q)) + talitos_process_pending(xor_chan); +} +static int talitos_alloc_chan_resources(struct dma_chan *chan) +{ + struct talitos_xor_chan *xor_chan; + struct talitos_xor_desc *desc; + LIST_HEAD(tmp_list); + int i; + + xor_chan = container_of(chan, struct talitos_xor_chan, common); + + if (!list_empty(xor_chan-free_desc)) + return xor_chan-total_desc; + + for (i = 0; i TALITOS_MAX_DESCRIPTOR_NR; i++) { + desc = talitos_xor_alloc_descriptor(xor_chan, + GFP_KERNEL | GFP_DMA); talitos_xor_alloc_descriptor() is called here without holding the xor_chan-desc_lock and it increments xor_chan-total_desc. Isn't this an issue ? No, please refer to the code as below, + list_add_tail(desc-node, tmp_list); The list is temporary list, it will be merged to xor_chan-free_desc in next step, here is protected by lock, + spin_lock_bh(xor_chan-desc_lock); + list_splice_init(tmp_list, xor_chan-free_desc); + spin_unlock_bh(xor_chan-desc_lock