RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload

2012-09-12 Thread Liu Qiang-B32616
  Will this engine be coordinating with another to handle memory copies?
   The dma mapping code for async_tx/raid is broken when dma mapping
  requests overlap or cross dma device boundaries [1].
 
  [1]: http://marc.info/?l=linux-arm-kernelm=129407269402930w=2
  Yes, it needs fsl-dma to handle memcpy copies.
  I read your link, the unmap address is stored in talitos hwdesc, the
 address will be unmapped when async_tx ack this descriptor, I know fsl-
 dma won't wait this ack flag in current kernel, so I fix it in fsl-dma
 patch 5/8. Do you mean that?
 
 Unfortunately no.  I'm open to other suggestions. but as far as I can
 see it requires deeper changes to rip out the dma mapping that happens
 in async_tx and the automatic unmapping done by drivers.  It should
 all be pushed to the client (md).
 
 Currently async_tx hides hardware details from md such that it doesn't
 even care if the operation is offloaded to hardware at all, but that
 takes things too far.  In the worst case an copy-xor chain handled by
 multiple channels results in :
 
 1/ dma_map(copy_chan...)
 2/ dma_map(xor_chan...)
 3/ exec copy
 4/ dma_unmap(copy_chan...)
 5/ exec xor ---initiated by the copy_chan
 6/ dma_unmap(xor_chan...)
 
 Step 2 violates the dma api since the buffers belong to the xor_chan
 until unmap.  Step 5 also causes the random completion context of the
 copy channel to bleed into submission context of the xor channel which
 is problematic.  So the order needs to be:
 
 1/ dma_map(copy_chan...)
 2/ exec copy
 3/ dma_unmap(copy_chan...)
 4/ dma_map(xor_chan...)
 5/ exec xor --initiated by md in a static context
 6/ dma_unmap(xor_chan...)
 
 Also, if xor_chan and copy_chan lie with the same dma mapping domain
 (iommu or parent device) then we can map the stripe once and skip the
 extra maintenance for the duration of the chain of operations.  This
 dumps a lot of hardware details on md, but I think it is the only way
 to get consistent semantics when arbitrary offload devices are
 involved.
Thanks for your answer and links, I did some investigate these days,

first, powerpc processor should be hardware assured cache coherency, it should
be ok for hardware when in step 5 (but I will avoid map same address on 
different
device).

second, I have a workaround to make dma_map/unmap by order when using 2 
different
device to offload, I will submit next descriptor until current descriptor 
complete,
if (submit-flags  ASYNC_TX_ACK)
async_tx_ack(tx);
if (depend_tx)
async_tx_ack(depend_tx);

+   /* do more check to support 2 devices offload? */
+   if (dma_wait_for_async_tx(tx) == DMA_ERROR)
+   panic(%s: DMA_ERROR waiting for tx\n, __func__);
}
EXPORT_SYMBOL_GPL(async_tx_submit);

Also use your example, 
1/ dma_map(copy_chan...)
2/ tx-submit(tx); async_tx_ack(tx);
3/ dma_unmap(copy_chan...)
4/ dma_map(xor_chan...)
5/ exec xor -- initialized by tx-submit(tx);
6/ dma_unmap(xor_chan...)

Under this way, actually dma_run_dependency() is useless, so this can make sure 
copy
and xor with same page processed by order, and only one descriptor per channel 
is
served. dma_unmap in driver is controlled by client (tx-flags)

How's you thinking or any suggestions? I test it on our powerpc, I don't know 
whether
it does work on other architecture.

Thanks.

 
 --
 Dan


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload

2012-09-04 Thread Liu Qiang-B32616
 -Original Message-
 From: dan.j.willi...@gmail.com [mailto:dan.j.willi...@gmail.com] On
 Behalf Of Dan Williams
 Sent: Sunday, September 02, 2012 4:12 PM
 To: Liu Qiang-B32616
 Cc: linux-cry...@vger.kernel.org; herb...@gondor.hengli.com.au;
 da...@davemloft.net; linux-ker...@vger.kernel.org; linuxppc-
 d...@lists.ozlabs.org; Li Yang-R58472; Phillips Kim-R1AAHA;
 vinod.k...@intel.com; dan.j.willi...@intel.com; a...@arndb.de;
 gre...@linuxfoundation.org
 Subject: Re: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
 
 On Thu, Aug 9, 2012 at 1:20 AM,  qiang@freescale.com wrote:
  From: Qiang Liu qiang@freescale.com
 
  Expose Talitos's XOR functionality to be used for RAID parity
  calculation via the Async_tx layer.
 
  Cc: Herbert Xu herb...@gondor.apana.org.au
  Cc: David S. Miller da...@davemloft.net
  Signed-off-by: Dipen Dudhat dipen.dud...@freescale.com
  Signed-off-by: Maneesh Gupta maneesh.gu...@freescale.com
  Signed-off-by: Kim Phillips kim.phill...@freescale.com
  Signed-off-by: Vishnu Suresh vis...@freescale.com
  Signed-off-by: Qiang Liu qiang@freescale.com
  ---
   drivers/crypto/Kconfig   |9 +
   drivers/crypto/talitos.c |  413
 ++
   drivers/crypto/talitos.h |   53 ++
   3 files changed, 475 insertions(+), 0 deletions(-)
 
  diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
  index be6b2ba..f0a7c29 100644
  --- a/drivers/crypto/Kconfig
  +++ b/drivers/crypto/Kconfig
  @@ -222,6 +222,15 @@ config CRYPTO_DEV_TALITOS
To compile this driver as a module, choose M here: the module
will be called talitos.
 
  +config CRYPTO_DEV_TALITOS_RAIDXOR
  +   bool Talitos RAID5 XOR Calculation Offload
  +   default y
 
 No, default y.  The user should explicitly enable this.
Ok, I will change it to N.

 
  +   select DMA_ENGINE
  +   depends on CRYPTO_DEV_TALITOS
  +   help
  + Say 'Y' here to use the Freescale Security Engine (SEC) to
  + offload RAID XOR parity Calculation
  +
 
 Is it faster than cpu xor?
Yes, I tested with IOZone, cpu load is also reduced.
I think one possible reason is the processor of p1022ds is not strong as 
others, so the performance is improved obviously.

 
 Will this engine be coordinating with another to handle memory copies?
  The dma mapping code for async_tx/raid is broken when dma mapping
 requests overlap or cross dma device boundaries [1].
 
 [1]: http://marc.info/?l=linux-arm-kernelm=129407269402930w=2
Yes, it needs fsl-dma to handle memcpy copies.
I read your link, the unmap address is stored in talitos hwdesc, the address 
will be unmapped when async_tx ack this descriptor, I know fsl-dma won't wait 
this ack flag in current kernel, so I fix it in fsl-dma patch 5/8. Do you mean 
that?

 
  +static void talitos_process_pending(struct talitos_xor_chan *xor_chan)
  +{
  +   struct talitos_xor_desc *desc, *_desc;
  +   unsigned long flags;
  +   int status;
  +   struct talitos_private *priv;
  +   int ch;
  +
  +   priv = dev_get_drvdata(xor_chan-dev);
  +   ch = atomic_inc_return(priv-last_chan) 
  + (priv-num_channels - 1);
 
 Maybe a comment about why this this is duplicated from
 talitos_cra_init()?  It sticks out here and I had to go looking to
 find out why this channel number increments on submit.
My fault, I will correct it as below,

atomic_read((priv-last_chan)  (priv-num_channels - 1);

 
 
  +   spin_lock_irqsave(xor_chan-desc_lock, flags);
  +
  +   list_for_each_entry_safe(desc, _desc, xor_chan-pending_q,
 node) {
  +   status = talitos_submit(xor_chan-dev, ch, desc-
 hwdesc,
  +   talitos_release_xor, desc);
  +   if (status != -EINPROGRESS)
  +   break;
  +
  +   list_del(desc-node);
  +   list_add_tail(desc-node, xor_chan-in_progress_q);
  +   }
  +
  +   spin_unlock_irqrestore(xor_chan-desc_lock, flags);
  +}
  +
  +static void talitos_xor_run_tx_complete_actions(struct
 talitos_xor_desc *desc,
  +   struct talitos_xor_chan *xor_chan)
  +{
  +   struct device *dev = xor_chan-dev;
  +   dma_addr_t dest, addr;
  +   unsigned int src_cnt = desc-unmap_src_cnt;
  +   unsigned int len = desc-unmap_len;
  +   enum dma_ctrl_flags flags = desc-async_tx.flags;
  +   struct dma_async_tx_descriptor *tx = desc-async_tx;
  +
  +   /* unmap dma addresses */
  +   dest = desc-hwdesc.ptr[6].ptr;
  +   if (likely(!(flags  DMA_COMPL_SKIP_DEST_UNMAP)))
  +   dma_unmap_page(dev, dest, len, DMA_BIDIRECTIONAL);
  +
  +   desc-idx = 6 - src_cnt;
  +   if (likely(!(flags  DMA_COMPL_SKIP_SRC_UNMAP))) {
  +   while(desc-idx  6) {
 
 Checkpatch says:
 ERROR: space required before the open parenthesis '('
I will correct it next.

 
 
  +   addr = desc

Re: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload

2012-09-04 Thread Dan Williams
On Tue, Sep 4, 2012 at 5:28 AM, Liu Qiang-B32616 b32...@freescale.com wrote:
 Will this engine be coordinating with another to handle memory copies?
  The dma mapping code for async_tx/raid is broken when dma mapping
 requests overlap or cross dma device boundaries [1].

 [1]: http://marc.info/?l=linux-arm-kernelm=129407269402930w=2
 Yes, it needs fsl-dma to handle memcpy copies.
 I read your link, the unmap address is stored in talitos hwdesc, the address 
 will be unmapped when async_tx ack this descriptor, I know fsl-dma won't wait 
 this ack flag in current kernel, so I fix it in fsl-dma patch 5/8. Do you 
 mean that?

Unfortunately no.  I'm open to other suggestions. but as far as I can
see it requires deeper changes to rip out the dma mapping that happens
in async_tx and the automatic unmapping done by drivers.  It should
all be pushed to the client (md).

Currently async_tx hides hardware details from md such that it doesn't
even care if the operation is offloaded to hardware at all, but that
takes things too far.  In the worst case an copy-xor chain handled by
multiple channels results in :

1/ dma_map(copy_chan...)
2/ dma_map(xor_chan...)
3/ exec copy
4/ dma_unmap(copy_chan...)
5/ exec xor ---initiated by the copy_chan
6/ dma_unmap(xor_chan...)

Step 2 violates the dma api since the buffers belong to the xor_chan
until unmap.  Step 5 also causes the random completion context of the
copy channel to bleed into submission context of the xor channel which
is problematic.  So the order needs to be:

1/ dma_map(copy_chan...)
2/ exec copy
3/ dma_unmap(copy_chan...)
4/ dma_map(xor_chan...)
5/ exec xor --initiated by md in a static context
6/ dma_unmap(xor_chan...)

Also, if xor_chan and copy_chan lie with the same dma mapping domain
(iommu or parent device) then we can map the stripe once and skip the
extra maintenance for the duration of the chain of operations.  This
dumps a lot of hardware details on md, but I think it is the only way
to get consistent semantics when arbitrary offload devices are
involved.

--
Dan
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload

2012-09-02 Thread Dan Williams
On Thu, Aug 30, 2012 at 7:23 AM, Geanta Neag Horia Ioan-B05471
b05...@freescale.com wrote:

 Besides these:
 1. As Dan Williams mentioned, you should explain why you are using
 both spin_lock_bh and spin_lock_irqsave on the same lock.

It looks like talitos_process_pending() can be called from hardirq
context via talitos_error().  So spin_lock_bh is not sufficient.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload

2012-09-02 Thread Dan Williams
On Thu, Aug 9, 2012 at 1:20 AM,  qiang@freescale.com wrote:
 From: Qiang Liu qiang@freescale.com

 Expose Talitos's XOR functionality to be used for RAID parity
 calculation via the Async_tx layer.

 Cc: Herbert Xu herb...@gondor.apana.org.au
 Cc: David S. Miller da...@davemloft.net
 Signed-off-by: Dipen Dudhat dipen.dud...@freescale.com
 Signed-off-by: Maneesh Gupta maneesh.gu...@freescale.com
 Signed-off-by: Kim Phillips kim.phill...@freescale.com
 Signed-off-by: Vishnu Suresh vis...@freescale.com
 Signed-off-by: Qiang Liu qiang@freescale.com
 ---
  drivers/crypto/Kconfig   |9 +
  drivers/crypto/talitos.c |  413 
 ++
  drivers/crypto/talitos.h |   53 ++
  3 files changed, 475 insertions(+), 0 deletions(-)

 diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
 index be6b2ba..f0a7c29 100644
 --- a/drivers/crypto/Kconfig
 +++ b/drivers/crypto/Kconfig
 @@ -222,6 +222,15 @@ config CRYPTO_DEV_TALITOS
   To compile this driver as a module, choose M here: the module
   will be called talitos.

 +config CRYPTO_DEV_TALITOS_RAIDXOR
 +   bool Talitos RAID5 XOR Calculation Offload
 +   default y

No, default y.  The user should explicitly enable this.

 +   select DMA_ENGINE
 +   depends on CRYPTO_DEV_TALITOS
 +   help
 + Say 'Y' here to use the Freescale Security Engine (SEC) to
 + offload RAID XOR parity Calculation
 +

Is it faster than cpu xor?

Will this engine be coordinating with another to handle memory copies?
 The dma mapping code for async_tx/raid is broken when dma mapping
requests overlap or cross dma device boundaries [1].

[1]: http://marc.info/?l=linux-arm-kernelm=129407269402930w=2

 +static void talitos_process_pending(struct talitos_xor_chan *xor_chan)
 +{
 +   struct talitos_xor_desc *desc, *_desc;
 +   unsigned long flags;
 +   int status;
 +   struct talitos_private *priv;
 +   int ch;
 +
 +   priv = dev_get_drvdata(xor_chan-dev);
 +   ch = atomic_inc_return(priv-last_chan) 
 + (priv-num_channels - 1);

Maybe a comment about why this this is duplicated from
talitos_cra_init()?  It sticks out here and I had to go looking to
find out why this channel number increments on submit.


 +   spin_lock_irqsave(xor_chan-desc_lock, flags);
 +
 +   list_for_each_entry_safe(desc, _desc, xor_chan-pending_q, node) {
 +   status = talitos_submit(xor_chan-dev, ch, desc-hwdesc,
 +   talitos_release_xor, desc);
 +   if (status != -EINPROGRESS)
 +   break;
 +
 +   list_del(desc-node);
 +   list_add_tail(desc-node, xor_chan-in_progress_q);
 +   }
 +
 +   spin_unlock_irqrestore(xor_chan-desc_lock, flags);
 +}
 +
 +static void talitos_xor_run_tx_complete_actions(struct talitos_xor_desc 
 *desc,
 +   struct talitos_xor_chan *xor_chan)
 +{
 +   struct device *dev = xor_chan-dev;
 +   dma_addr_t dest, addr;
 +   unsigned int src_cnt = desc-unmap_src_cnt;
 +   unsigned int len = desc-unmap_len;
 +   enum dma_ctrl_flags flags = desc-async_tx.flags;
 +   struct dma_async_tx_descriptor *tx = desc-async_tx;
 +
 +   /* unmap dma addresses */
 +   dest = desc-hwdesc.ptr[6].ptr;
 +   if (likely(!(flags  DMA_COMPL_SKIP_DEST_UNMAP)))
 +   dma_unmap_page(dev, dest, len, DMA_BIDIRECTIONAL);
 +
 +   desc-idx = 6 - src_cnt;
 +   if (likely(!(flags  DMA_COMPL_SKIP_SRC_UNMAP))) {
 +   while(desc-idx  6) {

Checkpatch says:
ERROR: space required before the open parenthesis '('


 +   addr = desc-hwdesc.ptr[desc-idx++].ptr;
 +   if (addr == dest)
 +   continue;
 +   dma_unmap_page(dev, addr, len, DMA_TO_DEVICE);
 +   }
 +   }
 +
 +   /* run dependent operations */
 +   dma_run_dependencies(tx);

Here is where we run into problems if another engine accesses these
same buffers, especially on ARM v6+.

 +}
 +
 +static void talitos_release_xor(struct device *dev, struct talitos_desc 
 *hwdesc,
 +   void *context, int error)
 +{
 +   struct talitos_xor_desc *desc = context;
 +   struct talitos_xor_chan *xor_chan;
 +   dma_async_tx_callback callback;
 +   void *callback_param;
 +
 +   if (unlikely(error))
 +   dev_err(dev, xor operation: talitos error %d\n, error);
 +
 +   xor_chan = container_of(desc-async_tx.chan, struct talitos_xor_chan,
 +   common);
 +   spin_lock_bh(xor_chan-desc_lock);
 +   if (xor_chan-completed_cookie  desc-async_tx.cookie)
 +   xor_chan-completed_cookie = desc-async_tx.cookie;
 +

Use dma_cookie_complete().

 +   callback = desc-async_tx.callback;
 +   callback_param = desc-async_tx.callback_param;
 +
 +   

RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload

2012-08-31 Thread Geanta Neag Horia Ioan-B05471
On Fri, 31 Aug 2012 06:08:05 +0300, Liu Qiang-B32616 b32...@freescale.com
wrote:
  -Original Message-
  From: Geanta Neag Horia Ioan-B05471
  Sent: Thursday, August 30, 2012 10:23 PM
  To: Liu Qiang-B32616; linux-cry...@vger.kernel.org;
  dan.j.willi...@gmail.com; herb...@gondor.hengli.com.au;
  da...@davemloft.net; linux-ker...@vger.kernel.org; linuxppc-
  d...@lists.ozlabs.org
  Cc: Li Yang-R58472; Phillips Kim-R1AAHA; vinod.k...@intel.com;
  dan.j.willi...@intel.com; a...@arndb.de; gre...@linuxfoundation.org; Liu
  Qiang-B32616
  Subject: RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
 
  On Thu, 9 Aug 2012 11:20:48 +0300, qiang@freescale.com wrote:
   From: Qiang Liu qiang@freescale.com
  
   Expose Talitos's XOR functionality to be used for RAID parity
   calculation via the Async_tx layer.
  
   Cc: Herbert Xu herb...@gondor.apana.org.au
   Cc: David S. Miller da...@davemloft.net
   Signed-off-by: Dipen Dudhat dipen.dud...@freescale.com
   Signed-off-by: Maneesh Gupta maneesh.gu...@freescale.com
   Signed-off-by: Kim Phillips kim.phill...@freescale.com
   Signed-off-by: Vishnu Suresh vis...@freescale.com
   Signed-off-by: Qiang Liu qiang@freescale.com
   ---
drivers/crypto/Kconfig   |9 +
drivers/crypto/talitos.c |  413
  ++
drivers/crypto/talitos.h |   53 ++
3 files changed, 475 insertions(+), 0 deletions(-)
 

   +static int talitos_alloc_chan_resources(struct dma_chan *chan)
   +{
   + struct talitos_xor_chan *xor_chan;
   + struct talitos_xor_desc *desc;
   + LIST_HEAD(tmp_list);
   + int i;
   +
   + xor_chan = container_of(chan, struct talitos_xor_chan, common);
   +
   + if (!list_empty(xor_chan-free_desc))
   + return xor_chan-total_desc;
   +
   + for (i = 0; i  TALITOS_MAX_DESCRIPTOR_NR; i++) {
   + desc = talitos_xor_alloc_descriptor(xor_chan,
   + GFP_KERNEL | GFP_DMA);
 
  talitos_xor_alloc_descriptor() is called here without holding
  the xor_chan-desc_lock and it increments xor_chan-total_desc.
  Isn't this an issue ?
 
 No, please refer to the code as below,
 + list_add_tail(desc-node, tmp_list);
 
 The list is temporary list, it will be merged to xor_chan-free_desc in next 
 step, here is protected
 by lock,
 + spin_lock_bh(xor_chan-desc_lock);
 + list_splice_init(tmp_list, xor_chan-free_desc);
 + spin_unlock_bh(xor_chan-desc_lock);

I was not referring to the list, but to xor_chan-total_desc variable.
The following access:
talitos_alloc_chan_resources()-talitos_xor_alloc_descriptor()-total_desc++
is not protected by the xor_chan-desc_lock.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload

2012-08-31 Thread Liu Qiang-B32616
 -Original Message-
 From: Geanta Neag Horia Ioan-B05471
 Sent: Friday, August 31, 2012 6:39 PM
 To: Liu Qiang-B32616; linux-cry...@vger.kernel.org;
 dan.j.willi...@gmail.com; herb...@gondor.hengli.com.au;
 da...@davemloft.net; linux-ker...@vger.kernel.org; linuxppc-
 d...@lists.ozlabs.org
 Cc: Li Yang-R58472; Phillips Kim-R1AAHA; vinod.k...@intel.com; Dan
 Williams; a...@arndb.de; gre...@linuxfoundation.org
 Subject: RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
 
 On Fri, 31 Aug 2012 06:08:05 +0300, Liu Qiang-B32616
 b32...@freescale.com
 wrote:
   -Original Message-
   From: Geanta Neag Horia Ioan-B05471
   Sent: Thursday, August 30, 2012 10:23 PM
   To: Liu Qiang-B32616; linux-cry...@vger.kernel.org;
   dan.j.willi...@gmail.com; herb...@gondor.hengli.com.au;
   da...@davemloft.net; linux-ker...@vger.kernel.org; linuxppc-
   d...@lists.ozlabs.org
   Cc: Li Yang-R58472; Phillips Kim-R1AAHA; vinod.k...@intel.com;
   dan.j.willi...@intel.com; a...@arndb.de; gre...@linuxfoundation.org;
 Liu
   Qiang-B32616
   Subject: RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
  
   On Thu, 9 Aug 2012 11:20:48 +0300, qiang@freescale.com wrote:
From: Qiang Liu qiang@freescale.com
   
Expose Talitos's XOR functionality to be used for RAID parity
calculation via the Async_tx layer.
   
Cc: Herbert Xu herb...@gondor.apana.org.au
Cc: David S. Miller da...@davemloft.net
Signed-off-by: Dipen Dudhat dipen.dud...@freescale.com
Signed-off-by: Maneesh Gupta maneesh.gu...@freescale.com
Signed-off-by: Kim Phillips kim.phill...@freescale.com
Signed-off-by: Vishnu Suresh vis...@freescale.com
Signed-off-by: Qiang Liu qiang@freescale.com
---
 drivers/crypto/Kconfig   |9 +
 drivers/crypto/talitos.c |  413
   ++
 drivers/crypto/talitos.h |   53 ++
 3 files changed, 475 insertions(+), 0 deletions(-)
  
 
+static int talitos_alloc_chan_resources(struct dma_chan *chan)
+{
+   struct talitos_xor_chan *xor_chan;
+   struct talitos_xor_desc *desc;
+   LIST_HEAD(tmp_list);
+   int i;
+
+   xor_chan = container_of(chan, struct talitos_xor_chan,
 common);
+
+   if (!list_empty(xor_chan-free_desc))
+   return xor_chan-total_desc;
+
+   for (i = 0; i  TALITOS_MAX_DESCRIPTOR_NR; i++) {
+   desc = talitos_xor_alloc_descriptor(xor_chan,
+   GFP_KERNEL | GFP_DMA);
  
   talitos_xor_alloc_descriptor() is called here without holding
   the xor_chan-desc_lock and it increments xor_chan-total_desc.
   Isn't this an issue ?
 
  No, please refer to the code as below,
  +   list_add_tail(desc-node, tmp_list);
 
  The list is temporary list, it will be merged to xor_chan-free_desc in
 next step, here is protected
  by lock,
  +   spin_lock_bh(xor_chan-desc_lock);
  +   list_splice_init(tmp_list, xor_chan-free_desc);
  +   spin_unlock_bh(xor_chan-desc_lock);
 
 I was not referring to the list, but to xor_chan-total_desc variable.
 The following access:
 talitos_alloc_chan_resources()-talitos_xor_alloc_descriptor()-
 total_desc++
 is not protected by the xor_chan-desc_lock.
Ok, I will correct it in next series. Thanks.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload

2012-08-30 Thread Geanta Neag Horia Ioan-B05471
On Thu, 9 Aug 2012 11:20:48 +0300, qiang@freescale.com wrote:
 From: Qiang Liu qiang@freescale.com
 
 Expose Talitos's XOR functionality to be used for RAID parity
 calculation via the Async_tx layer.
 
 Cc: Herbert Xu herb...@gondor.apana.org.au
 Cc: David S. Miller da...@davemloft.net
 Signed-off-by: Dipen Dudhat dipen.dud...@freescale.com
 Signed-off-by: Maneesh Gupta maneesh.gu...@freescale.com
 Signed-off-by: Kim Phillips kim.phill...@freescale.com
 Signed-off-by: Vishnu Suresh vis...@freescale.com
 Signed-off-by: Qiang Liu qiang@freescale.com
 ---
  drivers/crypto/Kconfig   |9 +
  drivers/crypto/talitos.c |  413 
 ++
  drivers/crypto/talitos.h |   53 ++
  3 files changed, 475 insertions(+), 0 deletions(-)


 +static void talitos_xor_run_tx_complete_actions(struct talitos_xor_desc 
 *desc,
 + struct talitos_xor_chan *xor_chan)
 +{
 + struct device *dev = xor_chan-dev;
 + dma_addr_t dest, addr;
 + unsigned int src_cnt = desc-unmap_src_cnt;
 + unsigned int len = desc-unmap_len;
 + enum dma_ctrl_flags flags = desc-async_tx.flags;
 + struct dma_async_tx_descriptor *tx = desc-async_tx;
 +
 + /* unmap dma addresses */
 + dest = desc-hwdesc.ptr[6].ptr;
 + if (likely(!(flags  DMA_COMPL_SKIP_DEST_UNMAP)))
 + dma_unmap_page(dev, dest, len, DMA_BIDIRECTIONAL);
 +
 + desc-idx = 6 - src_cnt;
 + if (likely(!(flags  DMA_COMPL_SKIP_SRC_UNMAP))) {
 + while(desc-idx  6) {
 + addr = desc-hwdesc.ptr[desc-idx++].ptr;
 + if (addr == dest)
 + continue;
 + dma_unmap_page(dev, addr, len, DMA_TO_DEVICE);
 + }
 + }

No need for braces around the while block.

 + /* run dependent operations */
 + dma_run_dependencies(tx);
 +}


 +static void talitos_release_xor(struct device *dev, struct talitos_desc 
 *hwdesc,
 + void *context, int error)
 +{
 + struct talitos_xor_desc *desc = context;
 + struct talitos_xor_chan *xor_chan;
 + dma_async_tx_callback callback;
 + void *callback_param;
 +
 + if (unlikely(error))
 + dev_err(dev, xor operation: talitos error %d\n, error);
 +
 + xor_chan = container_of(desc-async_tx.chan, struct talitos_xor_chan,
 + common);
 + spin_lock_bh(xor_chan-desc_lock);
 + if (xor_chan-completed_cookie  desc-async_tx.cookie)
 + xor_chan-completed_cookie = desc-async_tx.cookie;
 +
 + callback = desc-async_tx.callback;
 + callback_param = desc-async_tx.callback_param;
 +
 + if (callback) {
 + spin_unlock_bh(xor_chan-desc_lock);
 + callback(callback_param);
 + spin_lock_bh(xor_chan-desc_lock);
 + }

Since callback_param is used only here, maybe:

if (callback) {
void *callback_param = desc-async_tx.callback_param;

spin_unlock_bh(xor_chan-desc_lock);
callback(callback_param);
spin_lock_bh(xor_chan-desc_lock);
}

 +
 + talitos_xor_run_tx_complete_actions(desc, xor_chan);
 +
 + list_del(desc-node);
 + list_add_tail(desc-node, xor_chan-free_desc);
 + spin_unlock_bh(xor_chan-desc_lock);
 + if (!list_empty(xor_chan-pending_q))
 + talitos_process_pending(xor_chan);
 +}


 +static int talitos_alloc_chan_resources(struct dma_chan *chan)
 +{
 + struct talitos_xor_chan *xor_chan;
 + struct talitos_xor_desc *desc;
 + LIST_HEAD(tmp_list);
 + int i;
 +
 + xor_chan = container_of(chan, struct talitos_xor_chan, common);
 +
 + if (!list_empty(xor_chan-free_desc))
 + return xor_chan-total_desc;
 +
 + for (i = 0; i  TALITOS_MAX_DESCRIPTOR_NR; i++) {
 + desc = talitos_xor_alloc_descriptor(xor_chan,
 + GFP_KERNEL | GFP_DMA);

talitos_xor_alloc_descriptor() is called here without holding
the xor_chan-desc_lock and it increments xor_chan-total_desc.
Isn't this an issue ?

 + if (!desc) {
 + dev_err(xor_chan-common.device-dev,
 + Only %d initial descriptors\n, i);
 + break;
 + }
 + list_add_tail(desc-node, tmp_list);
 + }
 +
 + if (!i)
 + return -ENOMEM;
 +
 + /* At least one desc is allocated */
 + spin_lock_bh(xor_chan-desc_lock);
 + list_splice_init(tmp_list, xor_chan-free_desc);
 + spin_unlock_bh(xor_chan-desc_lock);
 +
 + return xor_chan-total_desc;
 +}


 +/**
 + * talitos_register_dma_async - Initialize the Freescale XOR ADMA device
 + * It is registered as a DMA device with the capability to perform
 + * XOR operation with the Async_tx layer.
 + * The various queues and channel resources are also allocated.
 + */
 +static int talitos_register_async_tx(struct device *dev, int max_xor_srcs)
 +{
 + struct talitos_private *priv = 

RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload

2012-08-30 Thread Liu Qiang-B32616
 -Original Message-
 From: Geanta Neag Horia Ioan-B05471
 Sent: Thursday, August 30, 2012 10:23 PM
 To: Liu Qiang-B32616; linux-cry...@vger.kernel.org;
 dan.j.willi...@gmail.com; herb...@gondor.hengli.com.au;
 da...@davemloft.net; linux-ker...@vger.kernel.org; linuxppc-
 d...@lists.ozlabs.org
 Cc: Li Yang-R58472; Phillips Kim-R1AAHA; vinod.k...@intel.com;
 dan.j.willi...@intel.com; a...@arndb.de; gre...@linuxfoundation.org; Liu
 Qiang-B32616
 Subject: RE: [PATCH v7 1/8] Talitos: Support for async_tx XOR offload
 
 On Thu, 9 Aug 2012 11:20:48 +0300, qiang@freescale.com wrote:
  From: Qiang Liu qiang@freescale.com
 
  Expose Talitos's XOR functionality to be used for RAID parity
  calculation via the Async_tx layer.
 
  Cc: Herbert Xu herb...@gondor.apana.org.au
  Cc: David S. Miller da...@davemloft.net
  Signed-off-by: Dipen Dudhat dipen.dud...@freescale.com
  Signed-off-by: Maneesh Gupta maneesh.gu...@freescale.com
  Signed-off-by: Kim Phillips kim.phill...@freescale.com
  Signed-off-by: Vishnu Suresh vis...@freescale.com
  Signed-off-by: Qiang Liu qiang@freescale.com
  ---
   drivers/crypto/Kconfig   |9 +
   drivers/crypto/talitos.c |  413
 ++
   drivers/crypto/talitos.h |   53 ++
   3 files changed, 475 insertions(+), 0 deletions(-)
 
 
  +static void talitos_xor_run_tx_complete_actions(struct
 talitos_xor_desc *desc,
  +   struct talitos_xor_chan *xor_chan)
  +{
  +   struct device *dev = xor_chan-dev;
  +   dma_addr_t dest, addr;
  +   unsigned int src_cnt = desc-unmap_src_cnt;
  +   unsigned int len = desc-unmap_len;
  +   enum dma_ctrl_flags flags = desc-async_tx.flags;
  +   struct dma_async_tx_descriptor *tx = desc-async_tx;
  +
  +   /* unmap dma addresses */
  +   dest = desc-hwdesc.ptr[6].ptr;
  +   if (likely(!(flags  DMA_COMPL_SKIP_DEST_UNMAP)))
  +   dma_unmap_page(dev, dest, len, DMA_BIDIRECTIONAL);
  +
  +   desc-idx = 6 - src_cnt;
  +   if (likely(!(flags  DMA_COMPL_SKIP_SRC_UNMAP))) {
  +   while(desc-idx  6) {
  +   addr = desc-hwdesc.ptr[desc-idx++].ptr;
  +   if (addr == dest)
  +   continue;
  +   dma_unmap_page(dev, addr, len, DMA_TO_DEVICE);
  +   }
  +   }
 
 No need for braces around the while block.
I will remove it.

 
  +   /* run dependent operations */
  +   dma_run_dependencies(tx);
  +}
 
 
  +static void talitos_release_xor(struct device *dev, struct
 talitos_desc *hwdesc,
  +   void *context, int error)
  +{
  +   struct talitos_xor_desc *desc = context;
  +   struct talitos_xor_chan *xor_chan;
  +   dma_async_tx_callback callback;
  +   void *callback_param;
  +
  +   if (unlikely(error))
  +   dev_err(dev, xor operation: talitos error %d\n, error);
  +
  +   xor_chan = container_of(desc-async_tx.chan, struct
 talitos_xor_chan,
  +   common);
  +   spin_lock_bh(xor_chan-desc_lock);
  +   if (xor_chan-completed_cookie  desc-async_tx.cookie)
  +   xor_chan-completed_cookie = desc-async_tx.cookie;
  +
  +   callback = desc-async_tx.callback;
  +   callback_param = desc-async_tx.callback_param;
  +
  +   if (callback) {
  +   spin_unlock_bh(xor_chan-desc_lock);
  +   callback(callback_param);
  +   spin_lock_bh(xor_chan-desc_lock);
  +   }
 
 Since callback_param is used only here, maybe:
 
 if (callback) {
   void *callback_param = desc-async_tx.callback_param;
 
   spin_unlock_bh(xor_chan-desc_lock);
   callback(callback_param);
   spin_lock_bh(xor_chan-desc_lock);
 }
Fine. I will modify it in next.

 
  +
  +   talitos_xor_run_tx_complete_actions(desc, xor_chan);
  +
  +   list_del(desc-node);
  +   list_add_tail(desc-node, xor_chan-free_desc);
  +   spin_unlock_bh(xor_chan-desc_lock);
  +   if (!list_empty(xor_chan-pending_q))
  +   talitos_process_pending(xor_chan);
  +}
 
 
  +static int talitos_alloc_chan_resources(struct dma_chan *chan)
  +{
  +   struct talitos_xor_chan *xor_chan;
  +   struct talitos_xor_desc *desc;
  +   LIST_HEAD(tmp_list);
  +   int i;
  +
  +   xor_chan = container_of(chan, struct talitos_xor_chan, common);
  +
  +   if (!list_empty(xor_chan-free_desc))
  +   return xor_chan-total_desc;
  +
  +   for (i = 0; i  TALITOS_MAX_DESCRIPTOR_NR; i++) {
  +   desc = talitos_xor_alloc_descriptor(xor_chan,
  +   GFP_KERNEL | GFP_DMA);
 
 talitos_xor_alloc_descriptor() is called here without holding
 the xor_chan-desc_lock and it increments xor_chan-total_desc.
 Isn't this an issue ?

No, please refer to the code as below, 
+   list_add_tail(desc-node, tmp_list);

The list is temporary list, it will be merged to xor_chan-free_desc in next 
step, here is protected by lock,
+   spin_lock_bh(xor_chan-desc_lock);
+   list_splice_init(tmp_list, xor_chan-free_desc);
+   spin_unlock_bh(xor_chan-desc_lock