Re: [PATCH] random: Don't overwrite CRNG state in crng_initialize()

2017-02-08 Thread Greg Kroah-Hartman
On Wed, Feb 08, 2017 at 08:31:26PM -0700, Alden Tondettar wrote:
> In short, the situation is:
> 
> A) No usable hardware RNG or arch_get_random() (or we don't trust it...)

Wait, why would you not trust arch_get_random()?  Is it broken somehow
on some arches?  If so, why not fix that as well?

thanks,

greg k-h


Re: [PATCH] random: Don't overwrite CRNG state in crng_initialize()

2017-02-08 Thread Theodore Ts'o
On Wed, Feb 08, 2017 at 08:31:26PM -0700, Alden Tondettar wrote:
> The new non-blocking system introduced in commit e192be9d9a30 ("random:
> replace non-blocking pool with a Chacha20-based CRNG") can under
> some circumstances report itself initialized while it still contains
> dangerously little entropy, as follows:
> 
> Approximately every 64th call to add_interrupt_randomness(), the "fast"
> pool of interrupt-timing-based entropy is fed into one of two places. At
> calls numbered <= 256, the fast pool is XORed into the primary CRNG state.
> At call 256, the CRNG is deemed initialized, getrandom(2) is unblocked,
> and reading from /dev/urandom no longer gives warnings.
> 
> At calls > 256, the fast pool is fed into the input pool, leaving the CRNG
> untouched.
> 
> The problem arises between call number 256 and 320. If crng_initialize()
> is called at this time, it will overwrite the _entire_ CRNG state with
> 48 bytes generated from the input pool.

So in practice this isn't a problem because crng_initialize() is called
in early init.  For reference, the ordering of init call levels is:

"early",  <--- crng_initialize is here()
"core",   < ftrace is initialized here()
"postcore",
"arch",
"subsys",  < acpi_init is here()
"fs",
"device",  < device probing is here
"late",

So in practice, call 256 typically happens **well** after
crng_initialize().  You can see where it is in the boot messages, which is
about 2.5 seconds into the boot:

[2.570733] rtc_cmos 00:02: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
[2.570863] usbcore: registered new interface driver i2c-tiny-usb
[2.571035] device-mapper: uevent: version 1.0.3
[2.571215] random: fast init done  <---
[2.571316] device-mapper: ioctl: 4.35.0-ioctl (2016-06-23) initialised: dm-de...@redhat.com
[2.571678] device-mapper: multipath round-robin: version 1.1.0 loaded
[2.571728] intel_pstate: Intel P-state driver initializing
[2.572331] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input3
[2.572462] intel_pstate: HWP enabled
[2.572464] sdhci: Secure Digital Host Controller Interface driver

When is crng_initialize() called?  Sometime *before* 0.05 seconds into
the boot on my laptop:

[0.054529] ftrace: allocating 29140 entries in 114 pages

> In short, the situation is:
> 
> A) No usable hardware RNG or arch_get_random() (or we don't trust it...)
> B) add_interrupt_randomness() called 256-320 times but other
>add_*_randomness() functions aren't adding much entropy.
> C) then crng_initialize() is called
> D) not enough calls to add_*_randomness() to push the entropy
>estimate over 128 (yet)
> E) getrandom(2) or /dev/urandom used for something important
> 
> Based on a few experiments with VMs, A) through D) can occur easily in
> practice. And with no HDD we have a window of about a minute or two for
> E) to happen before add_interrupt_randomness() finally pushes the
> estimate over 128 on its own.

How did you determine when crng_initialize() was being called?  On a
VM there are generally fewer interrupts than on real hardware.  On
KVM, for example, I see the "random: fast init done" message being printed
3.6 seconds into the boot.

On Google Compute Engine, the fast_init message happens 52 seconds into the
boot.

So what VM were you using?  I'm trying to figure out whether this is a
hypothetical or a real problem, and on what systems.

 - Ted


Re: [PATCH] Revert "hwrng: core - zeroize buffers with random data"

2017-02-08 Thread Linus Torvalds
Stephan, Herbert? The zeroes in /dev/hwrng output are obviously
complete crap, so there's something badly wrong somewhere.

The locking, for example, is completely buggered. There's even a
comment about it, but that comment makes the correct observation of
"but y'know: randomness". But the memset() also being outside the lock
makes a complete joke of the whole thing.

Is the hwrng thing even worth maintaining? Compared to something like
/dev/urandom, it clearly does not do a very good job.

So I'm inclined to take the revert, but I'm also somewhat inclined to
simply mark this crud broken when we have other things that clearly do
a lot better.

  Linus

On Tue, Feb 7, 2017 at 4:23 PM, David Daney  wrote:
> This reverts commit 2cc751545854d7bd7eedf4d7e377bb52e176cd07.


Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-08 Thread Eric Biggers
On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote:
> 
> Today, I did a zram-lz4 performance test with fio in current mmotm and
> found it causes a regression of about 20%.
> 

This may or may not be the cause of the specific regression you're observing,
but I just noticed that the proposed patch drops a lot of FORCEINLINE
annotations from upstream LZ4.  The FORCEINLINE annotations are there for a
reason, especially for the main decompression and compression functions, which
are basically "templates" that take in different sets of constant parameters,
and they should be left in.  We should #define FORCEINLINE to __always_inline
somewhere, or just do a s/FORCEINLINE/__always_inline/g.
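
(A minimal sketch of the first option; putting it in the kernel's private
lz4defs.h is my assumption about the location, not something decided yet:)

/* Upstream LZ4 relies on FORCEINLINE for its "template" compress/decompress
 * functions; map it onto the kernel's __always_inline so the compiler cannot
 * decide to outline them. */
#ifndef FORCEINLINE
#define FORCEINLINE __always_inline
#endif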

Note that the upstream LZ4 code is very carefully optimized, so we should not,
in general, be changing things like when functions are force-inlined, what the
hash table size is, etc.

[Also, for some reason linux-crypto is apparently still not receiving patch 1/5
in the series.  It's missing from the linux-crypto archive at
http://www.spinics.net/lists/linux-crypto/, so it's not just me.]

Thanks!

Eric


Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-08 Thread Minchan Kim
Hello Sven,

On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote:
> 
> This patchset updates the LZ4 compression module to a version based
> on LZ4 v1.7.3, allowing use of the fast compression algorithm aka LZ4 fast,
> which provides an "acceleration" parameter as a tradeoff between
> high compression ratio and high compression speed.
> 
> We want to use LZ4 fast in order to support compression in lustre
> and (mostly, based on that) investigate data reduction techniques on behalf
> of storage systems.
> 
> Also, it will be useful for other users of LZ4 compression, as LZ4 fast
> makes it possible for applications to use fast and/or high compression
> depending on the use case.
> For instance, ZRAM offers an LZ4 backend and could benefit from an updated
> LZ4 in the kernel.
> 
> LZ4 homepage: http://www.lz4.org/
> LZ4 source repository: https://github.com/lz4/lz4
> Source version: 1.7.3
> 
> Benchmark (taken from [1], Core i5-4300U @1.9GHz):
> ------------|--------------|----------------|----------
> Compressor  | Compression  | Decompression  | Ratio
> ------------|--------------|----------------|----------
> memcpy      |  4200 MB/s   |  4200 MB/s     | 1.000
> LZ4 fast 50 |  1080 MB/s   |  2650 MB/s     | 1.375
> LZ4 fast 17 |   680 MB/s   |  2220 MB/s     | 1.607
> LZ4 fast 5  |   475 MB/s   |  1920 MB/s     | 1.886
> LZ4 default |   385 MB/s   |  1850 MB/s     | 2.101
> 
> [1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html
> 
> [PATCH 1/5] lib: Update LZ4 compressor module
> [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 module version
> [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version
> [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new LZ4 version
> [PATCH 5/5] lib/lz4: Remove back-compat wrappers

Today, I did a zram-lz4 performance test with fio in current mmotm and
found it causes a regression of about 20%.

"lz4-update" means current mmots(git://git.cmpxchg.org/linux-mmots.git) so
applied your 5 patches. (But now sure current mmots has recent uptodate
patches)
"revert" means I reverted your 5 patches in current mmots.

                  revert    lz4-update

      seq-write     1547          1339    86.55%
     rand-write    22775         19381    85.10%
       seq-read     7035          5589    79.45%
      rand-read    78556         68479    87.17%
   mixed-seq(R)     1305          1066    81.69%
   mixed-seq(W)     1205           984    81.66%
  mixed-rand(R)    17421         14993    86.06%
  mixed-rand(W)    17391         14968    86.07%

My fio description file

[global]
bs=4k
ioengine=sync
size=100m
numjobs=1
group_reporting
buffer_compress_percentage=30
scramble_buffers=0
filename=/dev/zram0
loops=10
fsync_on_close=1

[seq-write]
bs=64k
rw=write
stonewall

[rand-write]
rw=randwrite
stonewall

[seq-read]
bs=64k
rw=read
stonewall

[rand-read]
rw=randread
stonewall

[mixed-seq]
bs=64k
rw=rw
stonewall

[mixed-rand]
rw=randrw
stonewall



[PATCH v2 0/2] crypto: AF_ALG memory management fix

2017-02-08 Thread Stephan Müller
Hi Herbert,

Changes v2:
* import fix from Harsh Jain  to remove SG
  from list before freeing
* fix return code used for ki_complete to match AIO behavior
  with sync behavior
* rename variable list -> tsgl_list
* update the algif_aead patch to include a dynamic TX SGL
  allocation similar to what algif_skcipher does. This allows
  concurrent continuous read/write operations to the extent
  you requested. Although I have not implemented "pairs of
  TX/RX SGLs" as I think that is even more overhead, the
  implementation conceptually defines such pairs. The recvmsg
  call defines how much of the input data is processed.
  The caller can make an arbitrary number of sendmsg calls
  that add data to the TX SGL before a recvmsg call asks the
  kernel to process a given amount (or all) of the TX SGL
  (see the user-space sketch after this list).
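
(A minimal user-space sketch of that sendmsg/recvmsg pattern, using the
existing AF_ALG skcipher interface -- error handling is elided, and the
all-zero key/IV and the data below are placeholders of my choosing:)

#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

#ifndef SOL_ALG
#define SOL_ALG 279
#endif

int main(void)
{
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type   = "skcipher",
		.salg_name   = "cbc(aes)",
	};
	char key[16] = { 0 };				/* demo key (all zero) */
	char pt[32] = "0123456789abcdef0123456789abcdef";	/* two AES blocks */
	char ct[32];
	char cbuf[CMSG_SPACE(sizeof(__u32))] = { 0 };
	struct msghdr msg = { 0 };
	struct iovec iov = { .iov_base = pt, .iov_len = 16 };
	struct cmsghdr *cmsg;

	int tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
	bind(tfm, (struct sockaddr *)&sa, sizeof(sa));
	setsockopt(tfm, SOL_ALG, ALG_SET_KEY, key, sizeof(key));
	int op = accept(tfm, NULL, 0);

	/* First sendmsg() selects the operation; MSG_MORE keeps the TX SGL open. */
	msg.msg_control = cbuf;
	msg.msg_controllen = sizeof(cbuf);
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_ALG;
	cmsg->cmsg_type = ALG_SET_OP;
	cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
	*(__u32 *)CMSG_DATA(cmsg) = ALG_OP_ENCRYPT;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	sendmsg(op, &msg, MSG_MORE);		/* first 16 bytes go into the TX SGL */
	send(op, pt + 16, 16, 0);		/* another sendmsg, appended to the TX SGL */

	/* One read/recvmsg asks the kernel to process the queued TX data. */
	read(op, ct, sizeof(ct));

	close(op);
	close(tfm);
	return 0;
}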

With the changes, you will see a lot of code duplication now,
as I deliberately tried to use the same struct and variable names,
the same function names and even the same order of functions.
If you agree to this patch, I volunteer to provide a follow-up
patch that will extract the duplicated code into common
functions.

Please find attached memory management updates to

- simplify the code: the old AIO memory management is very
  complex and seemingly very fragile -- the update now
  eliminates all reported bugs in the skcipher and AEAD
  interfaces which allowed the kernel to be crashed by
  an unprivileged user

- streamline the code: there is one code path for AIO and sync
  operation; the code between algif_skcipher and algif_aead
  is very similar (if that patch set is accepted, I volunteer
  to reduce code duplication by moving service operations
  into af_alg.c and to further unify the TX SGL handling)

- unify the AIO and sync operation which only differ in the
  kernel crypto API callback and whether to wait for the
  crypto operation or not

- fix all reported bugs regarding the handling of multiple
  IOCBs.

The following testing was performed:

- stress testing to verify that no memleaks exist

- testing using Tadeusz Struk's AIO test tool (see
  https://github.com/tstruk/afalg_async_test) -- the AEAD test
  is not applicable any more due to the changed user space
  interface; the skcipher test works once the user space
  interface change is honored in the test code

- using the libkcapi test suite, all tests including the
  originally failing ones (AIO with multiple IOCBs) work now --
  the current libkcapi code artificially limits the AEAD
  operation to one IOCB. After altering the libkcapi code
  to allow multiple IOCBs, the testing works flawlessly.

Stephan Mueller (2):
  crypto: skcipher AF_ALG - overhaul memory management
  crypto: aead AF_ALG - overhaul memory management

 crypto/algif_aead.c | 668 +---
 crypto/algif_skcipher.c | 472 ++
 2 files changed, 544 insertions(+), 596 deletions(-)

-- 
2.9.3




[PATCH v2 2/2] crypto: aead AF_ALG - overhaul memory management

2017-02-08 Thread Stephan Müller
The updated memory management is described in the top part of the code.
As one benefit of the changed memory management, the AIO and synchronous
operation is now implemented in one common function. The AF_ALG
operation uses the async kernel crypto API interface for each cipher
operation. Thus, the only difference between the AIO and sync operation
types visible from user space is:

1. the callback function to be invoked when the asynchronous operation
   is completed

2. whether to wait for the completion of the kernel crypto API operation
   or not

The change includes the overhaul of the TX and RX SGL handling. The TX
SGL holding the data sent from user space to the kernel is now dynamic
similar to algif_skcipher. This dynamic nature allows a continuous
operation of a thread sending data and a second thread receiving the
data. These threads do not need to synchronize, as the kernel processes
as much data from the TX SGL as is needed to fill the RX SGL.

The caller reading the data from the kernel defines the amount of data
to be processed. Considering that the interface covers AEAD
authenticating ciphers, the reader must provide a buffer of the
correct size. Thus the reader defines the encryption size.

Signed-off-by: Stephan Mueller 
---
 crypto/algif_aead.c | 668 +++-
 1 file changed, 351 insertions(+), 317 deletions(-)

diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index 533265f..6c24c81 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -11,6 +11,26 @@
  * under the terms of the GNU General Public License as published by the Free
  * Software Foundation; either version 2 of the License, or (at your option)
  * any later version.
+ *
+ * The following concept of the memory management is used:
+ *
+ * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is
+ * filled by user space with the data submitted via sendpage/sendmsg. Filling
+ * up the TX SGL does not cause a crypto operation -- the data will only be
+ * tracked by the kernel. Upon receipt of one recvmsg call, the caller must
+ * provide a buffer which is tracked with the RX SGL.
+ *
+ * During the processing of the recvmsg operation, the cipher request is
+ * allocated and prepared. To support multiple recvmsg operations operating
+ * on one TX SGL, an offset pointer into the TX SGL is maintained. The TX SGL
+ * that is used for the crypto request is scatterwalk_ffwd by the offset
+ * pointer to obtain the start address the crypto operation shall use for
+ * the input data.
+ *
+ * After the completion of the crypto operation, the RX SGL and the cipher
+ * request is released. The processed TX SGL parts are released together with
+ * the RX SGL release and the offset pointer is reduced by the released
+ * data.
  */
 
 #include 
@@ -24,45 +44,55 @@
 #include 
 #include 
 
-struct aead_sg_list {
-   unsigned int cur;
-   struct scatterlist sg[ALG_MAX_PAGES];
+struct aead_tsgl {
+   struct list_head list;
+   unsigned int cur;   /* Last processed SG entry */
+   struct scatterlist sg[0];   /* Array of SGs forming the SGL */
 };
 
-struct aead_async_rsgl {
+struct aead_rsgl {
struct af_alg_sgl sgl;
struct list_head list;
 };
 
 struct aead_async_req {
-   struct scatterlist *tsgl;
-   struct aead_async_rsgl first_rsgl;
-   struct list_head list;
struct kiocb *iocb;
-   unsigned int tsgls;
-   char iv[];
+   struct sock *sk;
+
+   struct aead_rsgl first_rsgl;/* First RX SG */
+   struct list_head rsgl_list; /* Track RX SGs */
+
+   unsigned int outlen;/* Filled output buf length */
+
+   unsigned int areqlen;   /* Length of this data struct */
+   struct aead_request aead_req;   /* req ctx trails this struct */
 };
 
 struct aead_ctx {
-   struct aead_sg_list tsgl;
-   struct aead_async_rsgl first_rsgl;
-   struct list_head list;
+   struct list_head tsgl_list; /* Link to TX SGL */
 
void *iv;
+   size_t aead_assoclen;
 
-   struct af_alg_completion completion;
+   struct af_alg_completion completion;/* sync work queue */
 
-   unsigned long used;
+   unsigned int inflight;  /* Outstanding AIO ops */
+   size_t used;/* TX bytes sent to kernel */
+   size_t processed;   /* Processed TX bytes */
 
-   unsigned int len;
-   bool more;
-   bool merge;
-   bool enc;
+   bool more;  /* More data to be expected? */
+   bool merge; /* Merge new data into existing SG */
+   bool enc;   /* Crypto operation: enc, dec */
 
-   size_t aead_assoclen;
-   struct aead_request aead_req;
+   unsigned int len;   /* Length of allocated memory for this struct */
+   struct crypto_aead *aead_tfm;
 };
 
+static DECLARE_WAIT_QUEUE_HEAD(aead_aio_finish_wait);
+
+#define MAX_SGL_ENTS ((4096 - 

[PATCH v2 1/2] crypto: skcipher AF_ALG - overhaul memory management

2017-02-08 Thread Stephan Müller
The updated memory management is described in the top part of the code.
As one benefit of the changed memory management, the AIO and synchronous
operation is now implemented in one common function. The AF_ALG
operation uses the async kernel crypto API interface for each cipher
operation. Thus, the only difference between the AIO and sync operation
types visible from user space is:

1. the callback function to be invoked when the asynchronous operation
   is completed

2. whether to wait for the completion of the kernel crypto API operation
   or not

In addition, the code structure is adjusted to match the structure of
algif_aead for easier code assessment.

The user space interface changed slightly as follows: the old AIO
operation returned zero upon success and < 0 in case of an error to user
space. As all other AF_ALG interfaces (including the sync skcipher
interface) returned the number of processed bytes upon success and < 0
in case of an error, the new skcipher interface (regardless of AIO or
sync) returns the number of processed bytes in case of success.

Signed-off-by: Stephan Mueller 
---
 crypto/algif_skcipher.c | 472 
 1 file changed, 193 insertions(+), 279 deletions(-)

diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index a9e79d8..e65ebc4 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -10,6 +10,25 @@
  * Software Foundation; either version 2 of the License, or (at your option)
  * any later version.
  *
+ * The following concept of the memory management is used:
+ *
+ * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is
+ * filled by user space with the data submitted via sendpage/sendmsg. Filling
+ * up the TX SGL does not cause a crypto operation -- the data will only be
+ * tracked by the kernel. Upon receipt of one recvmsg call, the caller must
+ * provide a buffer which is tracked with the RX SGL.
+ *
+ * During the processing of the recvmsg operation, the cipher request is
+ * allocated and prepared. To support multiple recvmsg operations operating
+ * on one TX SGL, an offset pointer into the TX SGL is maintained. The TX SGL
+ * that is used for the crypto request is scatterwalk_ffwd by the offset
+ * pointer to obtain the start address the crypto operation shall use for
+ * the input data.
+ *
+ * After the completion of the crypto operation, the RX SGL and the cipher
+ * request is released. The processed TX SGL parts are released together with
+ * the RX SGL release and the offset pointer is reduced by the released
+ * data.
  */
 
 #include 
@@ -31,78 +50,50 @@ struct skcipher_sg_list {
struct scatterlist sg[0];
 };
 
+struct skcipher_rsgl {
+   struct af_alg_sgl sgl;
+   struct list_head list;
+};
+
+struct skcipher_async_req {
+   struct kiocb *iocb;
+   struct sock *sk;
+
+   struct skcipher_rsgl first_sgl;
+   struct list_head rsgl_list;
+
+   unsigned int areqlen;
+   struct skcipher_request req;
+};
+
 struct skcipher_tfm {
struct crypto_skcipher *skcipher;
bool has_key;
 };
 
 struct skcipher_ctx {
-   struct list_head tsgl;
-   struct af_alg_sgl rsgl;
+   struct list_head tsgl_list;
 
void *iv;
 
struct af_alg_completion completion;
 
-   atomic_t inflight;
+   unsigned int inflight;
size_t used;
+   size_t processed;
 
-   unsigned int len;
bool more;
bool merge;
bool enc;
 
-   struct skcipher_request req;
-};
-
-struct skcipher_async_rsgl {
-   struct af_alg_sgl sgl;
-   struct list_head list;
+   unsigned int len;
 };
 
-struct skcipher_async_req {
-   struct kiocb *iocb;
-   struct skcipher_async_rsgl first_sgl;
-   struct list_head list;
-   struct scatterlist *tsg;
-   atomic_t *inflight;
-   struct skcipher_request req;
-};
+static DECLARE_WAIT_QUEUE_HEAD(skcipher_aio_finish_wait);
 
 #define MAX_SGL_ENTS ((4096 - sizeof(struct skcipher_sg_list)) / \
  sizeof(struct scatterlist) - 1)
 
-static void skcipher_free_async_sgls(struct skcipher_async_req *sreq)
-{
-   struct skcipher_async_rsgl *rsgl, *tmp;
-   struct scatterlist *sgl;
-   struct scatterlist *sg;
-   int i, n;
-
-   list_for_each_entry_safe(rsgl, tmp, &sreq->list, list) {
-   af_alg_free_sg(&rsgl->sgl);
-   if (rsgl != &sreq->first_sgl)
-   kfree(rsgl);
-   }
-   sgl = sreq->tsg;
-   n = sg_nents(sgl);
-   for_each_sg(sgl, sg, n, i)
-   put_page(sg_page(sg));
-
-   kfree(sreq->tsg);
-}
-
-static void skcipher_async_cb(struct crypto_async_request *req, int err)
-{
-   struct skcipher_async_req *sreq = req->data;
-   struct kiocb *iocb = sreq->iocb;
-
-   atomic_dec(sreq->inflight);
-   skcipher_free_async_sgls(sreq);
-   kzfree(sreq);
-   iocb->ki_complete(iocb, err, err);
-}
-
 static inline 

[PATCH] crypto: ccp - Set the AES size field for all modes

2017-02-08 Thread Gary R Hook
Ensure that the size field is correctly populated for
all AES modes.

Signed-off-by: Gary R Hook 
---
 drivers/crypto/ccp/ccp-dev-v5.c |3 +--
 drivers/crypto/ccp/ccp-dev.h|1 +
 drivers/crypto/ccp/ccp-ops.c|8 
 3 files changed, 10 insertions(+), 2 deletions(-)
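
(For reference -- my arithmetic, not part of the patch -- the CFB/CTR value
assigned in the ccp-ops.c hunk below expands to the same value the old code
hard-coded for CFB:)

/* AES_BLOCK_SIZE * BITS_PER_BYTE - 1  ==  16 * 8 - 1  ==  127  ==  0x7f */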

diff --git a/drivers/crypto/ccp/ccp-dev-v5.c b/drivers/crypto/ccp/ccp-dev-v5.c
index 612898b..9c6ff8b8 100644
--- a/drivers/crypto/ccp/ccp-dev-v5.c
+++ b/drivers/crypto/ccp/ccp-dev-v5.c
@@ -284,8 +284,7 @@ static int ccp5_perform_aes(struct ccp_op *op)
CCP_AES_ENCRYPT(&function) = op->u.aes.action;
CCP_AES_MODE(&function) = op->u.aes.mode;
CCP_AES_TYPE(&function) = op->u.aes.type;
-   if (op->u.aes.mode == CCP_AES_MODE_CFB)
-   CCP_AES_SIZE(&function) = 0x7f;
+   CCP_AES_SIZE(&function) = op->u.aes.size;
 
CCP5_CMD_FUNCTION(&desc) = function.raw;
 
diff --git a/drivers/crypto/ccp/ccp-dev.h b/drivers/crypto/ccp/ccp-dev.h
index 649e561..2b5c01f 100644
--- a/drivers/crypto/ccp/ccp-dev.h
+++ b/drivers/crypto/ccp/ccp-dev.h
@@ -467,6 +467,7 @@ struct ccp_aes_op {
enum ccp_aes_type type;
enum ccp_aes_mode mode;
enum ccp_aes_action action;
+   unsigned int size;
 };
 
 struct ccp_xts_aes_op {
diff --git a/drivers/crypto/ccp/ccp-ops.c b/drivers/crypto/ccp/ccp-ops.c
index 50fae44..6878160 100644
--- a/drivers/crypto/ccp/ccp-ops.c
+++ b/drivers/crypto/ccp/ccp-ops.c
@@ -692,6 +692,14 @@ static int ccp_run_aes_cmd(struct ccp_cmd_queue *cmd_q, struct ccp_cmd *cmd)
goto e_ctx;
}
}
+   switch (aes->mode) {
+   case CCP_AES_MODE_CFB: /* CFB128 only */
+   case CCP_AES_MODE_CTR:
+   op.u.aes.size = AES_BLOCK_SIZE * BITS_PER_BYTE - 1;
+   break;
+   default:
+   op.u.aes.size = 0;
+   }
 
/* Prepare the input and output data workareas. For in-place
 * operations we need to set the dma direction to BIDIRECTIONAL



Re: [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients

2017-02-08 Thread Dan Williams
On Wed, Feb 8, 2017 at 12:57 AM, Anup Patel  wrote:
> On Tue, Feb 7, 2017 at 11:46 PM, Dan Williams  
> wrote:
>> On Tue, Feb 7, 2017 at 1:02 AM, Anup Patel  wrote:
>>> On Tue, Feb 7, 2017 at 1:57 PM, Dan Williams  
>>> wrote:
 On Tue, Feb 7, 2017 at 12:16 AM, Anup Patel  
 wrote:
> The DMAENGINE framework assumes that if PQ offload is supported by a
> DMA device then all 256 PQ coefficients are supported. This assumption
> does not hold anymore because we now have BCM-SBA-RAID offload engine
> which supports PQ offload with limited number of PQ coefficients.
>
> This patch extends async_tx APIs to handle DMA devices with support
> for fewer PQ coefficients.
>
> Signed-off-by: Anup Patel 
> Reviewed-by: Scott Branden 

 I don't like this approach. Define an interface for md to query the
 offload engine once at the beginning of time. We should not be adding
 any new extensions to async_tx.
>>>
>>> Even if we do capability checks in Linux MD, we still need a way
>>> for DMAENGINE drivers to advertise number of PQ coefficients
>>> handled by the HW.
>>>
>>> I agree capability checks should be done once in Linux MD but I don't
>>> see why this has to be part of BCM-SBA-RAID driver patches. We need
>>> separate patchsets to address limitations of async_tx framework.
>>
>> Right, separate enabling before we pile on new hardware support to a
>> known broken framework.
>
> Linux Async Tx is not a broken framework. The issues are:
> 1. It's not complete enough
> 2. It's not optimized for very high-throughput offload engines

I'm not understanding your point. I'm nak'ing this change to add yet
more per-transaction capability checking to async_tx. I don't like the
DMA_HAS_FEWER_PQ_COEF flag, especially since it is equal to
DMA_HAS_PQ_CONTINUE. I'm not asking for all of async_tx's problems to
be fixed before this new hardware support, I'm simply saying we should
start the process of moving offload-engine capability checking to the
raid code.


[PATCH] crypto: arm/aes-ce: assign err return conditionally

2017-02-08 Thread Nicholas Mc Guire
As the err value is not used unless there was an error, it can be assigned
conditionally here.

Signed-off-by: Nicholas Mc Guire <der.h...@hofr.at>
---

Not sure if this is really relevant and worth changing; effectively it is
practically no change, as gcc would move the err = PTR_ERR(simd);
below unregister_simds: anyway (based on inspection of the .lst/.s files)
- so it is more of an adjustment of the C level to the object level, for
readability.

Patch was compile-tested with multi_v7_defconfig 
(implies CONFIG_CRYPTO_AES_ARM_CE=m) 

Patch is against 4.10-rc7 (localversion-next is next-20170208)

 arch/arm/crypto/aes-ce-glue.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm/crypto/aes-ce-glue.c b/arch/arm/crypto/aes-ce-glue.c
index 883b84d..8f65030 100644
--- a/arch/arm/crypto/aes-ce-glue.c
+++ b/arch/arm/crypto/aes-ce-glue.c
@@ -437,9 +437,10 @@ static int __init aes_init(void)
drvname = aes_algs[i].base.cra_driver_name + 2;
basename = aes_algs[i].base.cra_driver_name;
simd = simd_skcipher_create_compat(algname, drvname, basename);
-   err = PTR_ERR(simd);
-   if (IS_ERR(simd))
+   if (IS_ERR(simd)) {
+   err = PTR_ERR(simd);
goto unregister_simds;
+   }
 
aes_simd_algs[i] = simd;
}
-- 
2.1.4



Re: [PATCH v2 2/5] async_tx: Handle DMA devices having support for fewer PQ coefficients

2017-02-08 Thread Anup Patel
On Tue, Feb 7, 2017 at 11:46 PM, Dan Williams  wrote:
> On Tue, Feb 7, 2017 at 1:02 AM, Anup Patel  wrote:
>> On Tue, Feb 7, 2017 at 1:57 PM, Dan Williams  
>> wrote:
>>> On Tue, Feb 7, 2017 at 12:16 AM, Anup Patel  wrote:
 The DMAENGINE framework assumes that if PQ offload is supported by a
 DMA device then all 256 PQ coefficients are supported. This assumption
 does not hold anymore because we now have BCM-SBA-RAID offload engine
 which supports PQ offload with limited number of PQ coefficients.

 This patch extends async_tx APIs to handle DMA devices with support
 for fewer PQ coefficients.

 Signed-off-by: Anup Patel 
 Reviewed-by: Scott Branden 
>>>
>>> I don't like this approach. Define an interface for md to query the
>>> offload engine once at the beginning of time. We should not be adding
>>> any new extensions to async_tx.
>>
>> Even if we do capability checks in Linux MD, we still need a way
>> for DMAENGINE drivers to advertise number of PQ coefficients
>> handled by the HW.
>>
>> I agree capability checks should be done once in Linux MD but I don't
>> see why this has to be part of BCM-SBA-RAID driver patches. We need
>> separate patchsets to address limitations of async_tx framework.
>
> Right, separate enabling before we pile on new hardware support to a
> known broken framework.

Linux Async Tx is not a broken framework. The issues are:
1. It's not complete enough
2. It's not optimized for very high-throughput offload engines

Regards,
Anup