Linus, please pull from:
git://lost.foo-projects.org/~dwillia2/git/iop async-tx-fixes-for-linus
to receive:
Dan Williams (3):
async_tx: usage documentation and developer notes (v2)
async_tx: fix dma_wait_for_async_tx
raid5: fix 2 bugs in ops_complete_biofill
The raid5 change has been reviewed with Neil, and the documentation
received some fixups from Randy Dunlap and Shannon Nelson.
Documentation/crypto/async-tx-api.txt | 219 +
crypto/async_tx/async_tx.c| 12 ++-
drivers/md/raid5.c| 17 +--
3 files changed, 236 insertions(+), 12 deletions(-)
---
diff --git a/Documentation/crypto/async-tx-api.txt
b/Documentation/crypto/async-tx-api.txt
new file mode 100644
index 000..c1e9545
--- /dev/null
+++ b/Documentation/crypto/async-tx-api.txt
@@ -0,0 +1,219 @@
+Asynchronous Transfers/Transforms API
+
+1 INTRODUCTION
+
+2 GENEALOGY
+
+3 USAGE
+3.1 General format of the API
+3.2 Supported operations
+3.3 Descriptor management
+3.4 When does the operation execute?
+3.5 When does the operation complete?
+3.6 Constraints
+3.7 Example
+
+4 DRIVER DEVELOPER NOTES
+4.1 Conformance points
+4.2 "My application needs finer control of hardware channels"
+
+5 SOURCE
+
+---
+
+1 INTRODUCTION
+
+The async_tx API provides methods for describing a chain of asynchronous
+bulk memory transfers/transforms with support for inter-transactional
+dependencies. It is implemented as a dmaengine client that smooths over
+the details of different hardware offload engine implementations. Code
+that is written to the API can optimize for asynchronous operation and
+the API will fit the chain of operations to the available offload
+resources.
+
+2 GENEALOGY
+
+The API was initially designed to offload the memory copy and
+xor-parity-calculations of the md-raid5 driver using the offload engines
+present in the Intel(R) Xscale series of I/O processors. It also built
+on the 'dmaengine' layer developed for offloading memory copies in the
+network stack using Intel(R) I/OAT engines. The following design
+features surfaced as a result:
+1/ implicit synchronous path: users of the API do not need to know if
+ the platform they are running on has offload capabilities. The
+ operation will be offloaded when an engine is available and carried out
+ in software otherwise.
+2/ cross channel dependency chains: the API allows a chain of dependent
+ operations to be submitted, like xor->copy->xor in the raid5 case. The
+ API automatically handles cases where the transition from one operation
+ to another implies a hardware channel switch.
+3/ dmaengine extensions to support multiple clients and operation types
+ beyond 'memcpy'
+
+3 USAGE
+
+3.1 General format of the API:
+struct dma_async_tx_descriptor *
+async_(,
+ enum async_tx_flags flags,
+ struct dma_async_tx_descriptor *dependency,
+ dma_async_tx_callback callback_routine,
+ void *callback_parameter);
+
+3.2 Supported operations:
+memcpy - memory copy between a source and a destination buffer
+memset - fill a destination buffer with a byte value
+xor - xor a series of source buffers and write the result to a
+ destination buffer
+xor_zero_sum - xor a series of source buffers and set a flag if the
+ result is zero. The implementation attempts to prevent
+ writes to memory
+
+3.3 Descriptor management:
+The return value is non-NULL and points to a 'descriptor' when the operation
+has been queued to execute asynchronously. Descriptors are recycled
+resources, under control of the offload engine driver, to be reused as
+operations complete. When an application needs to submit a chain of
+operations it must guarantee that the descriptor is not automatically recycled
+before the dependency is submitted. This requires that all descriptors be
+acknowledged by the application before the offload engine driver is allowed to
+recycle (or free) the descriptor. A descriptor can be acked by one of the
+following methods:
+1/ setting the ASYNC_TX_ACK flag if no child operations are to be submitted
+2/ setting the ASYNC_TX_DEP_ACK flag to acknowledge the parent
+ descriptor of a new operation.
+3/ calling async_tx_ack() on the descriptor.
+
+3.4 When does the operation execute?
+Operations do not immediately issue after return from the
+async_ call. Offload engine drivers batch operations to
+improve performance by reducing the number of mmio cycles needed to
+manage the channel. Once a driver-specific threshold is met the driver
+automatically issues pending operations. An application can force this
+event by calling async_tx_issue_pending_all(). This operates on all
+channels since the application has no knowledge of channel to operation
+mapping.
+
+3.5 When does the operation complete?
+There are two methods for an application to learn about the co