Re: [PATCH v8 0/4] crypto: add algif_akcipher user space API

2017-08-10 Thread Marcel Holtmann
Hi Mat,

>> This patch set adds the AF_ALG user space API to externalize the
>> asymmetric cipher API recently added to the kernel crypto API.
> 
> ...
> 
>> Changes v8:
>> * port to kernel 4.13
>> * port to consolidated AF_ALG code
>> 
>> Stephan Mueller (4):
>> crypto: AF_ALG -- add sign/verify API
>> crypto: AF_ALG -- add setpubkey setsockopt call
>> crypto: AF_ALG -- add asymmetric cipher
>> crypto: algif_akcipher - enable compilation
>> 
>> crypto/Kconfig  |   9 +
>> crypto/Makefile |   1 +
>> crypto/af_alg.c |  28 ++-
>> crypto/algif_aead.c |  36 ++--
>> crypto/algif_akcipher.c | 466 
>> 
>> crypto/algif_skcipher.c |  26 ++-
>> include/crypto/if_alg.h |   7 +-
>> include/uapi/linux/if_alg.h |   3 +
>> 8 files changed, 543 insertions(+), 33 deletions(-)
>> create mode 100644 crypto/algif_akcipher.c
>> 
>> -- 
>> 2.13.4
> 
> The last round of reviews for AF_ALG akcipher left off at an impasse around a 
> year ago: the consensus was that hardware key support was needed, but that 
> requirement was in conflict with the "always have a software fallback" rule 
> for the crypto subsystem. For example, a private key securely generated by 
> and stored in a TPM could not be copied out for use by a software algorithm. 
> Has anything come about to resolve this impasse?
> 
> There were some patches around to add keyring support by associating a key ID 
> with an akcipher socket, but that approach ran into a mismatch between the 
> proposed keyring API for the verify operation and the semantics of AF_ALG 
> verify.
> 
> AF_ALG is best suited for crypto use cases where a socket is set up once and 
> there are lots of reads and writes to justify the setup cost. With asymmetric 
> crypto, the setup cost is high when you might only use the socket for a brief 
> time to do one verify or encrypt operation.
> 
> Given the efficiency and hardware key issues, AF_ALG seems to be mismatched 
> with asymmetric crypto. Have you looked at the proposed keyctl() support for 
> crypto operations?

we have also seen hardware now where the private key will never leave the 
crypto hardware. The public and private keys are generated only for key exchange 
purposes and discarded again afterwards. Asymmetric ciphers are really not a good 
fit for AF_ALG and they should be supported solely via keyctl.

Regards

Marcel



Re: [PATCH v8 0/4] crypto: add algif_akcipher user space API

2017-08-10 Thread Mat Martineau


Hi Stephan,

On Thu, 10 Aug 2017, Stephan Müller wrote:


Hi,

This patch set adds the AF_ALG user space API to externalize the
asymmetric cipher API recently added to the kernel crypto API.


...


Changes v8:
* port to kernel 4.13
* port to consolidated AF_ALG code

Stephan Mueller (4):
 crypto: AF_ALG -- add sign/verify API
 crypto: AF_ALG -- add setpubkey setsockopt call
 crypto: AF_ALG -- add asymmetric cipher
 crypto: algif_akcipher - enable compilation

crypto/Kconfig  |   9 +
crypto/Makefile |   1 +
crypto/af_alg.c |  28 ++-
crypto/algif_aead.c |  36 ++--
crypto/algif_akcipher.c | 466 
crypto/algif_skcipher.c |  26 ++-
include/crypto/if_alg.h |   7 +-
include/uapi/linux/if_alg.h |   3 +
8 files changed, 543 insertions(+), 33 deletions(-)
create mode 100644 crypto/algif_akcipher.c

--
2.13.4


The last round of reviews for AF_ALG akcipher left off at an impasse 
around a year ago: the consensus was that hardware key support was needed, 
but that requirement was in conflict with the "always have a software 
fallback" rule for the crypto subsystem. For example, a private key 
securely generated by and stored in a TPM could not be copied out for use 
by a software algorithm. Has anything come about to resolve this impasse?


There were some patches around to add keyring support by associating a key 
ID with an akcipher socket, but that approach ran into a mismatch between 
the proposed keyring API for the verify operation and the semantics of 
AF_ALG verify.


AF_ALG is best suited for crypto use cases where a socket is set up once 
and there are lots of reads and writes to justify the setup cost. With 
asymmetric crypto, the setup cost is high when you might only use the 
socket for a brief time to do one verify or encrypt operation.


Given the efficiency and hardware key issues, AF_ALG seems to be 
mismatched with asymmetric crypto. Have you looked at the proposed 
keyctl() support for crypto operations?


Thanks,

--
Mat Martineau
Intel OTC

Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Austin S. Hemmelgarn

On 2017-08-10 15:25, Hugo Mills wrote:

On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote:

On 08/10/2017 04:30 AM, Eric Biggers wrote:


These benchmarks are misleading because they compress the whole file as a
single stream without resetting the dictionary, which isn't how data will
typically be compressed in kernel mode.  With filesystem compression the data
has to be divided into small chunks that can each be decompressed independently.
That eliminates one of the primary advantages of Zstandard (support for large
dictionary sizes).


I did btrfs benchmarks of kernel trees and other normal data sets as
well.  The numbers were in line with what Nick is posting here.
zstd is a big win over both lzo and zlib from a btrfs point of view.

It's true Nick's patches only support a single compression level in
btrfs, but that's because btrfs doesn't have a way to pass in the
compression ratio.  It could easily be a mount option, it was just
outside the scope of Nick's initial work.


Could we please not add more mount options? I get that they're easy
to implement, but it's a very blunt instrument. What we tend to see
(with both nodatacow and compress) is people using the mount options,
then asking for exceptions, discovering that they can't do that, and
then falling back to doing it with attributes or btrfs properties.
Could we just start with btrfs properties this time round, and cut out
the mount option part of this cycle?
AFAIUI, the intent is to extend the compression type specification for 
both the mount options and the property, not to add a new mount option. 
I think we all agree that `mount -o compress=zstd3` is a lot better than 
`mount -o compress=zstd,compresslevel=3`.


In the long run, it'd be great to see most of the btrfs-specific
mount options get deprecated and ultimately removed entirely, in
favour of attributes/properties, where feasible.
Are properties set on the root subvolume inherited properly?  Because 
unless they are, we can't get the same semantics.


Two other counter arguments on completely removing BTRFS-specific mount 
options:
1. It's a lot easier and a lot more clearly defined to change things 
that affect global behavior of the FS by a remount than having to 
iterate everything in the FS to update properties.  If I'm disabling 
autodefrag, I'd much rather just `mount -o remount,noautodefrag` than 
`find / -xdev -exec btrfs property set \{\} autodefrag false \;`, as the 
first takes effect for everything simultaneously and runs far more 
quickly.
2. There are some things that don't make sense as per-object settings or 
are otherwise nonsensical on objects.  Many, but not all, of the BTRFS 
specific mount options fall into this category IMO, with the notable 
exception of compress[-force], [no]autodefrag, [no]datacow, and 
[no]datasum.  Some other options do make sense as properties of the 
filesystem (commit, flushoncommit, {inode,space}_cache, max_inline, 
metadata_ratio, [no]ssd, and [no]treelog are such options), but many are 
one-off options that affect behavior on mount (like skip_balance, 
clear_cache, nologreplay, norecovery, usebackuproot, and subvol).


Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Hugo Mills
On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote:
> On 08/10/2017 04:30 AM, Eric Biggers wrote:
> >
> >These benchmarks are misleading because they compress the whole file as a
> >single stream without resetting the dictionary, which isn't how data will
> >typically be compressed in kernel mode.  With filesystem compression the data
> >has to be divided into small chunks that can each be decompressed 
> >independently.
> >That eliminates one of the primary advantages of Zstandard (support for large
> >dictionary sizes).
> 
> I did btrfs benchmarks of kernel trees and other normal data sets as
> well.  The numbers were in line with what Nick is posting here.
> zstd is a big win over both lzo and zlib from a btrfs point of view.
> 
> It's true Nick's patches only support a single compression level in
> btrfs, but that's because btrfs doesn't have a way to pass in the
> compression ratio.  It could easily be a mount option, it was just
> outside the scope of Nick's initial work.

   Could we please not add more mount options? I get that they're easy
to implement, but it's a very blunt instrument. What we tend to see
(with both nodatacow and compress) is people using the mount options,
then asking for exceptions, discovering that they can't do that, and
then falling back to doing it with attributes or btrfs properties.
Could we just start with btrfs properties this time round, and cut out
the mount option part of this cycle?

   In the long run, it'd be great to see most of the btrfs-specific
mount options get deprecated and ultimately removed entirely, in
favour of attributes/properties, where feasible.

   Hugo.

-- 
Hugo Mills | Klytus! Are your men on the right pills? Maybe you
hugo@... carfax.org.uk | should execute their trainer!
http://carfax.org.uk/  |
PGP: E2AB1DE4  |  Ming the Merciless, Flash Gordon




Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Nick Terrell
On 8/10/17, 10:48 AM, "Austin S. Hemmelgarn"  wrote:
>On 2017-08-10 13:24, Eric Biggers wrote:
>>On Thu, Aug 10, 2017 at 07:32:18AM -0400, Austin S. Hemmelgarn wrote:
>>>On 2017-08-10 04:30, Eric Biggers wrote:
On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
>
> It can compress at speeds approaching lz4, and quality approaching lzma.

 Well, for a very loose definition of "approaching", and certainly not at the
 same time.  I doubt there's a use case for using the highest compression levels
 in kernel mode --- especially the ones using zstd_opt.h.
>>> Large data-sets with WORM access patterns and infrequent writes
>>> immediately come to mind as a use case for the highest compression
>>> level.
>>>
>>> As a more specific example, the company I work for has a very large
>>> amount of documentation, and we keep all old versions.  This is all
>>> stored on a file server which is currently using BTRFS.  Once a
>>> document is written, it's almost never rewritten, so write
>>> performance only matters for the first write.  However, they're read
>>> back pretty frequently, so we need good read performance.  As of
>>> right now, the system is set to use LZO compression by default, and
>>> then when a new document is added, the previous version of that
>>> document gets re-compressed using zlib compression, which actually
>>> results in pretty significant space savings most of the time.  I
>>> would absolutely love to use zstd compression with this system with
>>> the highest compression level, because most people don't care how
>>> long it takes to write the file out, but they do care how long it
>>> takes to read a file (even if it's an older version).
>> 
>> This may be a reasonable use case, but note this cannot just be the regular
>> "zstd" compression setting, since filesystem compression by default must 
>> provide
>> reasonable performance for many different access patterns.  See the patch in
>> this series which actually adds zstd compression to btrfs; it only uses 
>> level 1.
>> I do not see a patch which adds a higher compression mode.  It would need to 
>> be
>> a special setting like "zstdhc" that users could opt-in to on specific
>> directories.  It also would need to be compared to simply compressing in
>> userspace.  In many cases compressing in userspace is probably the better
>> solution for the use case in question because it works on any filesystem, 
>> allows
>> using any compression algorithm, and if random access is not needed it is
>> possible to compress each file as a single stream (like a .xz file), which
>> produces a much better compression ratio than the block-by-block compression
>> that filesystems have to use.
> There has been discussion as well as (I think) initial patches merged 
> for support of specifying the compression level for algorithms which 
> support multiple compression levels in BTRFS.  I was actually under the 
> impression that we had decided to use level 3 as the default for zstd, 
> but that apparently isn't the case, and with the benchmark issues, it 
> may not be once proper benchmarks are run.

There are some initial patches to add compression levels to BtrFS [1]. Once
it's ready, we can add compression levels to zstd. The default compression
level in the current patch is 3.

[1] https://lkml.kernel.org/r/20170724172939.24527-1-dste...@suse.com




Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Nick Terrell
On 8/10/17, 1:30 AM, "Eric Biggers"  wrote:
> On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
>>
>> It can compress at speeds approaching lz4, and quality approaching lzma.
> 
> Well, for a very loose definition of "approaching", and certainly not at the
> same time.  I doubt there's a use case for using the highest compression 
> levels
> in kernel mode --- especially the ones using zstd_opt.h.
> 
>> 
>> The code was ported from the upstream zstd source repository.
> 
> What version?

zstd-1.1.4 with patches applied from upstream. I'll include it in the
next patch version.

>> `linux/zstd.h` header was modified to match linux kernel style.
>> The cross-platform and allocation code was stripped out. Instead zstd
>> requires the caller to pass a preallocated workspace. The source files
>> were clang-formatted [1] to match the Linux Kernel style as much as
>> possible. 
> 
> It would be easier to compare to the upstream version if it was not all
> reformatted.  There is a chance that bugs were introduced by Linux-specific
> changes, and it would be nice if they could be easily reviewed.  (Also I don't
> know what clang-format settings you used, but there are still a lot of
> differences from the Linux coding style.)

The clang-format settings I used are available in the zstd repo [1]. I left
the line length long, since it looked terrible otherwise. I set up a branch
in my zstd GitHub fork called "original-formatted" [2]. I took the source
I based the kernel patches on [3] and ran clang-format without any other
changes. If you have any suggestions to improve the clang-formatting
please let me know.

>> 
>> I benchmarked zstd compression as a special character device. I ran zstd
>> and zlib compression at several levels, as well as performing no
>> compression, which measures the time spent copying the data to kernel space.
>> Data is passed to the compressor 4096 B at a time. The benchmark file is
>> located in the upstream zstd source repository under
>> `contrib/linux-kernel/zstd_compress_test.c` [2].
>> 
>> I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
>> The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
>> 16 GB of RAM, and a SSD. I benchmarked using `silesia.tar` [3], which is
>> 211,988,480 B large. Run the following commands for the benchmark:
>> 
>> sudo modprobe zstd_compress_test
>> sudo mknod zstd_compress_test c 245 0
>> sudo cp silesia.tar zstd_compress_test
>> 
>> The time is reported by the time of the userland `cp`.
>> The MB/s is computed with
>> 
>> 1,536,217,088 B / time(buffer size, hash)
>> 
>> which includes the time to copy from userland.
>> The Adjusted MB/s is computed with
>> 
>> 1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).
>> 
>> The memory reported is the amount of memory the compressor requests.
>> 
>> | Method   | Size (B) | Time (s) | Ratio | MB/s| Adj MB/s | Mem (MB) |
>> |--|--|--|---|-|--|--|
>> | none | 211988480 |0.100 | 1 | 2119.88 |- |- |
>> | zstd -1  | 73645762 |1.044 | 2.878 |  203.05 |   224.56 | 1.23 |
>> | zstd -3  | 66988878 |1.761 | 3.165 |  120.38 |   127.63 | 2.47 |
>> | zstd -5  | 65001259 |2.563 | 3.261 |   82.71 |86.07 | 2.86 |
>> | zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |16.13 |13.22 |
>> | zstd -15 | 58009756 |   47.601 | 3.654 |4.45 | 4.46 |21.61 |
>> | zstd -19 | 54014593 |  102.835 | 3.925 |2.06 | 2.06 |60.15 |
>> | zlib -1  | 77260026 |2.895 | 2.744 |   73.23 |75.85 | 0.27 |
>> | zlib -3  | 72972206 |4.116 | 2.905 |   51.50 |52.79 | 0.27 |
>> | zlib -6  | 68190360 |9.633 | 3.109 |   22.01 |22.24 | 0.27 |
>> | zlib -9  | 67613382 |   22.554 | 3.135 |9.40 | 9.44 | 0.27 |
>> 
> 
> These benchmarks are misleading because they compress the whole file as a
> single stream without resetting the dictionary, which isn't how data will
> typically be compressed in kernel mode.  With filesystem compression the data
> has to be divided into small chunks that can each be decompressed 
> independently.
> That eliminates one of the primary advantages of Zstandard (support for large
> dictionary sizes).

This benchmark isn't meant to be representative of a filesystem scenario. I
wanted to show off zstd without anything else going on. Even in filesystems
where the data is chunked, zstd uses the whole chunk as the window (128 KB
in BtrFS and SquashFS by default), where zlib uses 32 KB. I have benchmarks
for BtrFS and SquashFS in their respective patches [4][5], and I've copied
the BtrFS table below (which was run with 2 threads).

| Method  | Ratio | Compression MB/s | Decompression speed |
|-|---|--|-|
| None|  0.99 |  504 | 686 |
| lzo |  1.66 |

Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Chris Mason

On 08/10/2017 03:00 PM, Eric Biggers wrote:

On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote:

On 08/10/2017 04:30 AM, Eric Biggers wrote:

On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:



The memory reported is the amount of memory the compressor requests.

| Method   | Size (B) | Time (s) | Ratio | MB/s| Adj MB/s | Mem (MB) |
|--|--|--|---|-|--|--|
| none | 211988480 |0.100 | 1 | 2119.88 |- |- |
| zstd -1  | 73645762 |1.044 | 2.878 |  203.05 |   224.56 | 1.23 |
| zstd -3  | 66988878 |1.761 | 3.165 |  120.38 |   127.63 | 2.47 |
| zstd -5  | 65001259 |2.563 | 3.261 |   82.71 |86.07 | 2.86 |
| zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |16.13 |13.22 |
| zstd -15 | 58009756 |   47.601 | 3.654 |4.45 | 4.46 |21.61 |
| zstd -19 | 54014593 |  102.835 | 3.925 |2.06 | 2.06 |60.15 |
| zlib -1  | 77260026 |2.895 | 2.744 |   73.23 |75.85 | 0.27 |
| zlib -3  | 72972206 |4.116 | 2.905 |   51.50 |52.79 | 0.27 |
| zlib -6  | 68190360 |9.633 | 3.109 |   22.01 |22.24 | 0.27 |
| zlib -9  | 67613382 |   22.554 | 3.135 |9.40 | 9.44 | 0.27 |



These benchmarks are misleading because they compress the whole file as a
single stream without resetting the dictionary, which isn't how data will
typically be compressed in kernel mode.  With filesystem compression the data
has to be divided into small chunks that can each be decompressed independently.
That eliminates one of the primary advantages of Zstandard (support for large
dictionary sizes).


I did btrfs benchmarks of kernel trees and other normal data sets as
well.  The numbers were in line with what Nick is posting here.
zstd is a big win over both lzo and zlib from a btrfs point of view.

It's true Nick's patches only support a single compression level in
btrfs, but that's because btrfs doesn't have a way to pass in the
compression ratio.  It could easily be a mount option, it was just
outside the scope of Nick's initial work.



I am not surprised --- Zstandard is closer to the state of the art, both
format-wise and implementation-wise, than the other choices in BTRFS.  My point
is that benchmarks need to account for how much data is compressed at a time.
This is a common mistake when comparing different compression algorithms; the
algorithm name and compression level do not tell the whole story.  The
dictionary size is extremely significant.  No one is going to compress or
decompress a 200 MB file as a single stream in kernel mode, so it does not make
sense to justify adding Zstandard *to the kernel* based on such a benchmark.  It
is going to be divided into chunks.  How big are the chunks in BTRFS?  I thought
that it compressed only one page (4 KiB) at a time, but I hope that has been, or
is being, improved; 32 KiB - 128 KiB should be a better amount.  (And if the
amount of data compressed at a time happens to be different between the
different algorithms, note that BTRFS benchmarks are likely to be measuring that
as much as the algorithms themselves.)


Btrfs hooks the compression code into the delayed allocation mechanism 
we use to gather large extents for COW.  So if you write 100MB to a 
file, we'll have 100MB to compress at a time (within the limits of the 
amount of pages we allow to collect before forcing it down).


But we want to balance how much memory you might need to uncompress 
during random reads.  So we have an artificial limit of 128KB that we 
send at a time to the compression code.  It's easy to change this, it's 
just a tradeoff made to limit the cost of reading small bits.


It's the same for zlib, lzo, and the new zstd patch.

-chris



Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Eric Biggers
On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote:
> On 08/10/2017 04:30 AM, Eric Biggers wrote:
> >On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
> 
> >>The memory reported is the amount of memory the compressor requests.
> >>
> >>| Method   | Size (B) | Time (s) | Ratio | MB/s| Adj MB/s | Mem (MB) |
> >>|--|--|--|---|-|--|--|
> >>| none | 211988480 |0.100 | 1 | 2119.88 |- |- |
> >>| zstd -1  | 73645762 |1.044 | 2.878 |  203.05 |   224.56 | 1.23 |
> >>| zstd -3  | 66988878 |1.761 | 3.165 |  120.38 |   127.63 | 2.47 |
> >>| zstd -5  | 65001259 |2.563 | 3.261 |   82.71 |86.07 | 2.86 |
> >>| zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |16.13 |13.22 |
> >>| zstd -15 | 58009756 |   47.601 | 3.654 |4.45 | 4.46 |21.61 |
> >>| zstd -19 | 54014593 |  102.835 | 3.925 |2.06 | 2.06 |60.15 |
> >>| zlib -1  | 77260026 |2.895 | 2.744 |   73.23 |75.85 | 0.27 |
> >>| zlib -3  | 72972206 |4.116 | 2.905 |   51.50 |52.79 | 0.27 |
> >>| zlib -6  | 68190360 |9.633 | 3.109 |   22.01 |22.24 | 0.27 |
> >>| zlib -9  | 67613382 |   22.554 | 3.135 |9.40 | 9.44 | 0.27 |
> >>
> >
> >These benchmarks are misleading because they compress the whole file as a
> >single stream without resetting the dictionary, which isn't how data will
> >typically be compressed in kernel mode.  With filesystem compression the data
> >has to be divided into small chunks that can each be decompressed 
> >independently.
> >That eliminates one of the primary advantages of Zstandard (support for large
> >dictionary sizes).
> 
> I did btrfs benchmarks of kernel trees and other normal data sets as
> well.  The numbers were in line with what Nick is posting here.
> zstd is a big win over both lzo and zlib from a btrfs point of view.
> 
> It's true Nick's patches only support a single compression level in
> btrfs, but that's because btrfs doesn't have a way to pass in the
> compression ratio.  It could easily be a mount option, it was just
> outside the scope of Nick's initial work.
> 

I am not surprised --- Zstandard is closer to the state of the art, both
format-wise and implementation-wise, than the other choices in BTRFS.  My point
is that benchmarks need to account for how much data is compressed at a time.
This is a common mistake when comparing different compression algorithms; the
algorithm name and compression level do not tell the whole story.  The
dictionary size is extremely significant.  No one is going to compress or
decompress a 200 MB file as a single stream in kernel mode, so it does not make
sense to justify adding Zstandard *to the kernel* based on such a benchmark.  It
is going to be divided into chunks.  How big are the chunks in BTRFS?  I thought
that it compressed only one page (4 KiB) at a time, but I hope that has been, or
is being, improved; 32 KiB - 128 KiB should be a better amount.  (And if the
amount of data compressed at a time happens to be different between the
different algorithms, note that BTRFS benchmarks are likely to be measuring that
as much as the algorithms themselves.)

Eric


Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Austin S. Hemmelgarn

On 2017-08-10 13:24, Eric Biggers wrote:

On Thu, Aug 10, 2017 at 07:32:18AM -0400, Austin S. Hemmelgarn wrote:

On 2017-08-10 04:30, Eric Biggers wrote:

On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:


It can compress at speeds approaching lz4, and quality approaching lzma.


Well, for a very loose definition of "approaching", and certainly not at the
same time.  I doubt there's a use case for using the highest compression levels
in kernel mode --- especially the ones using zstd_opt.h.

Large data-sets with WORM access patterns and infrequent writes
immediately come to mind as a use case for the highest compression
level.

As a more specific example, the company I work for has a very large
amount of documentation, and we keep all old versions.  This is all
stored on a file server which is currently using BTRFS.  Once a
document is written, it's almost never rewritten, so write
performance only matters for the first write.  However, they're read
back pretty frequently, so we need good read performance.  As of
right now, the system is set to use LZO compression by default, and
then when a new document is added, the previous version of that
document gets re-compressed using zlib compression, which actually
results in pretty significant space savings most of the time.  I
would absolutely love to use zstd compression with this system with
the highest compression level, because most people don't care how
long it takes to write the file out, but they do care how long it
takes to read a file (even if it's an older version).


This may be a reasonable use case, but note this cannot just be the regular
"zstd" compression setting, since filesystem compression by default must provide
reasonable performance for many different access patterns.  See the patch in
this series which actually adds zstd compression to btrfs; it only uses level 1.
I do not see a patch which adds a higher compression mode.  It would need to be
a special setting like "zstdhc" that users could opt-in to on specific
directories.  It also would need to be compared to simply compressing in
userspace.  In many cases compressing in userspace is probably the better
solution for the use case in question because it works on any filesystem, allows
using any compression algorithm, and if random access is not needed it is
possible to compress each file as a single stream (like a .xz file), which
produces a much better compression ratio than the block-by-block compression
that filesystems have to use.
There has been discussion as well as (I think) initial patches merged 
for support of specifying the compression level for algorithms which 
support multiple compression levels in BTRFS.  I was actually under the 
impression that we had decided to use level 3 as the default for zstd, 
but that apparently isn't the case, and with the benchmark issues, it 
may not be once proper benchmarks are run.


Also, on the note of compressing in userspace, the use case I quoted at 
least can't do that because we have to deal with Windows clients and 
users have to be able to open files directly on said Windows clients.  I 
entirely agree that real archival storage is better off using userspace 
compression, but sometimes real archival storage isn't an option.


Note also that LZ4HC is in the kernel source tree currently but no one is using
it vs. the regular LZ4.  I think it is the kind of thing that sounded useful
originally, but at the end of the day no one really wants to use it in kernel
mode.  I'd certainly be interested in actual patches, though.
Part of that is the fact that BTRFS is one of the only consumers (AFAIK) 
of this API that can freely choose all aspects of their usage, and the 
consensus here (which I don't agree with I might add) amounts to the 
argument that 'we already have  compression with a  compression 
ratio, we don't need more things like that'.  I would personally love to 
see LZ4HC support in BTRFS (based on testing my own use cases, LZ4 is 
more deterministic than LZO for both compression and decompression, and 
most of the non archival usage I have of BTRFS benefits from 
determinism), but there's not any point in me writing up such a patch 
because it's almost certain to get rejected because BTRFS already has 
LZO.  The main reason that zstd is getting considered at all is that the 
quoted benchmarks show clear benefits in decompression speed relative to 
zlib and far better compression ratios than LZO.


[RFC PATCH 03/10] staging: fsl-mc: dpio: add order preservation support

2017-08-10 Thread Horia Geantă
From: Radu Alexe 

Order preservation is a feature that will be supported
in dpni, dpseci and dpci devices.
This is a preliminary patch for the changes to be
introduced in the corresponding drivers.

Signed-off-by: Radu Alexe 
Signed-off-by: Horia Geantă 
---
 drivers/staging/fsl-mc/include/dpopr.h | 110 +
 1 file changed, 110 insertions(+)
 create mode 100644 drivers/staging/fsl-mc/include/dpopr.h

diff --git a/drivers/staging/fsl-mc/include/dpopr.h 
b/drivers/staging/fsl-mc/include/dpopr.h
new file mode 100644
index ..e1110af2fe54
--- /dev/null
+++ b/drivers/staging/fsl-mc/include/dpopr.h
@@ -0,0 +1,110 @@
+/*
+ * Copyright 2017 NXP
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ * * Neither the name of the above-listed copyright holders nor the
+ * names of any contributors may be used to endorse or promote products
+ * derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+#ifndef __FSL_DPOPR_H_
+#define __FSL_DPOPR_H_
+
+/* Data Path Order Restoration API
+ * Contains initialization APIs and runtime APIs for the Order Restoration
+ */
+
+/** Order Restoration properties */
+
+/**
+ * Create a new Order Point Record option
+ */
+#define OPR_OPT_CREATE 0x1
+/**
+ * Retire an existing Order Point Record option
+ */
+#define OPR_OPT_RETIRE 0x2
+
+/**
+ * struct opr_cfg - Structure representing OPR configuration
+ * @oprrws: Order point record (OPR) restoration window size (0 to 5)
+ * 0 - Window size is 32 frames.
+ * 1 - Window size is 64 frames.
+ * 2 - Window size is 128 frames.
+ * 3 - Window size is 256 frames.
+ * 4 - Window size is 512 frames.
+ * 5 - Window size is 1024 frames.
+ * @oa: OPR auto advance NESN enable (0 disabled, 1 enabled)
+ * @olws: OPR acceptable late arrival window size (0 to 3)
+ * 0 - Disabled. Late arrivals are always rejected.
+ * 1 - Window size is 32 frames.
+ * 2 - Window size is the same as the OPR restoration
+ * window size configured in the OPRRWS field.
+ * 3 - Window size is 8192 frames. Late arrivals are
+ * always accepted.
+ * @oeane: Order restoration list (ORL) resource exhaustion
+ * advance NESN enable (0 disabled, 1 enabled)
+ * @oloe: OPR loose ordering enable (0 disabled, 1 enabled)
+ */
+struct opr_cfg {
+   u8 oprrws;
+   u8 oa;
+   u8 olws;
+   u8 oeane;
+   u8 oloe;
+};
+
+/**
+ * struct opr_qry - Structure representing OPR query results
+ * @enable: Enabled state
+ * @rip: Retirement In Progress
+ * @ndsn: Next dispensed sequence number
+ * @nesn: Next expected sequence number
+ * @ea_hseq: Early arrival head sequence number
+ * @hseq_nlis: HSEQ not last in sequence
+ * @ea_tseq: Early arrival tail sequence number
+ * @tseq_nlis: TSEQ not last in sequence
+ * @ea_tptr: Early arrival tail pointer
+ * @ea_hptr: Early arrival head pointer
+ * @opr_id: Order Point Record ID
+ * @opr_vid: Order Point Record Virtual ID
+ */
+struct opr_qry {
+   char enable;
+   char rip;
+   u16 ndsn;
+   u16 nesn;
+   u16 ea_hseq;
+   char hseq_nlis;
+   u16 ea_tseq;
+   char tseq_nlis;
+   u16 ea_tptr;
+   u16 ea_hptr;
+   u16 opr_id;
+   u16 opr_vid;
+};
+
+#endif /* __FSL_DPOPR_H_ */

[RFC PATCH 04/10] staging: fsl-dpaa2/eth: move generic FD defines to DPIO

2017-08-10 Thread Horia Geantă
Previous commits:
6e2387e8f19e ("staging: fsl-dpaa2/eth: Add Freescale DPAA2 Ethernet driver")
39163c0ce0f4 ("staging: fsl-dpaa2/eth: Errors checking update")
have added bits that are not specific to the WRIOP accelerator.

Move these where they belong (in DPIO) such that other accelerators
can make use of them.

While here, fix the values of FD_CTRL_FSE and FD_CTRL_FAERR, which
were shifted off by one bit.

Signed-off-by: Horia Geantă 
---
 drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c |  8 +++-
 drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.h | 19 +--
 drivers/staging/fsl-mc/include/dpaa2-fd.h  | 12 
 3 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c b/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c
index b9a0a315e6fb..a1d5c371e1c4 100644
--- a/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c
+++ b/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c
@@ -410,8 +410,7 @@ static int build_sg_fd(struct dpaa2_eth_priv *priv,
dpaa2_fd_set_format(fd, dpaa2_fd_sg);
dpaa2_fd_set_addr(fd, addr);
dpaa2_fd_set_len(fd, skb->len);
-   dpaa2_fd_set_ctrl(fd, DPAA2_FD_CTRL_ASAL | DPAA2_FD_CTRL_PTA |
- DPAA2_FD_CTRL_PTV1);
+   dpaa2_fd_set_ctrl(fd, DPAA2_FD_CTRL_ASAL | FD_CTRL_PTA | FD_CTRL_PTV1);
 
return 0;
 
@@ -464,8 +463,7 @@ static int build_single_fd(struct dpaa2_eth_priv *priv,
dpaa2_fd_set_offset(fd, (u16)(skb->data - buffer_start));
dpaa2_fd_set_len(fd, skb->len);
dpaa2_fd_set_format(fd, dpaa2_fd_single);
-   dpaa2_fd_set_ctrl(fd, DPAA2_FD_CTRL_ASAL | DPAA2_FD_CTRL_PTA |
- DPAA2_FD_CTRL_PTV1);
+   dpaa2_fd_set_ctrl(fd, DPAA2_FD_CTRL_ASAL | FD_CTRL_PTA | FD_CTRL_PTV1);
 
return 0;
 }
@@ -653,7 +651,7 @@ static void dpaa2_eth_tx_conf(struct dpaa2_eth_priv *priv,
/* We only check error bits in the FAS field if corresponding
 * FAERR bit is set in FD and the FAS field is marked as valid
 */
-   has_fas_errors = (fd_errors & DPAA2_FD_CTRL_FAERR) &&
+   has_fas_errors = (fd_errors & FD_CTRL_FAERR) &&
 !!(dpaa2_fd_get_frc(fd) & DPAA2_FD_FRC_FASV);
if (net_ratelimit())
netdev_dbg(priv->net_dev, "TX frame FD error: 0x%08x\n",
diff --git a/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.h b/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.h
index e6d28a249fc1..dfbb60b1 100644
--- a/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.h
+++ b/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.h
@@ -120,23 +120,14 @@ struct dpaa2_eth_swa {
 #define DPAA2_FD_FRC_FASWOV0x0800
 #define DPAA2_FD_FRC_FAICFDV   0x0400
 
-/* Error bits in FD CTRL */
-#define DPAA2_FD_CTRL_UFD  0x0004
-#define DPAA2_FD_CTRL_SBE  0x0008
-#define DPAA2_FD_CTRL_FSE  0x0010
-#define DPAA2_FD_CTRL_FAERR0x0020
-
-#define DPAA2_FD_RX_ERR_MASK   (DPAA2_FD_CTRL_SBE  | \
-DPAA2_FD_CTRL_FAERR)
-#define DPAA2_FD_TX_ERR_MASK   (DPAA2_FD_CTRL_UFD  | \
-DPAA2_FD_CTRL_SBE  | \
-DPAA2_FD_CTRL_FSE  | \
-DPAA2_FD_CTRL_FAERR)
+#define DPAA2_FD_RX_ERR_MASK   (FD_CTRL_SBE | FD_CTRL_FAERR)
+#define DPAA2_FD_TX_ERR_MASK   (FD_CTRL_UFD| \
+FD_CTRL_SBE| \
+FD_CTRL_FSE| \
+FD_CTRL_FAERR)
 
 /* Annotation bits in FD CTRL */
 #define DPAA2_FD_CTRL_ASAL 0x0002  /* ASAL = 128 */
-#define DPAA2_FD_CTRL_PTA  0x0080
-#define DPAA2_FD_CTRL_PTV1 0x0040
 
 /* Frame annotation status */
 struct dpaa2_fas {
diff --git a/drivers/staging/fsl-mc/include/dpaa2-fd.h b/drivers/staging/fsl-mc/include/dpaa2-fd.h
index 992fdc7ba5b8..72328415c26d 100644
--- a/drivers/staging/fsl-mc/include/dpaa2-fd.h
+++ b/drivers/staging/fsl-mc/include/dpaa2-fd.h
@@ -101,6 +101,18 @@ struct dpaa2_fd {
 #define FL_FINAL_FLAG_MASK 0x1
 #define FL_FINAL_FLAG_SHIFT15
 
+/* Error bits in FD CTRL */
+#define FD_CTRL_ERR_MASK   0x00FF
+#define FD_CTRL_UFD0x0004
+#define FD_CTRL_SBE0x0008
+#define FD_CTRL_FLC0x0010
+#define FD_CTRL_FSE0x0020
+#define FD_CTRL_FAERR  0x0040
+
+/* Annotation bits in FD CTRL */
+#define FD_CTRL_PTA0x0080
+#define FD_CTRL_PTV1   0x0040
+
 enum dpaa2_fd_format {
dpaa2_fd_single = 0,
dpaa2_fd_list,
-- 
2.12.0.264.gd6db3f216544



[RFC PATCH 01/10] staging: fsl-mc: dpio: add frame list format support

2017-08-10 Thread Horia Geantă
Add support for dpaa2_fd_list format, i.e. dpaa2_fl_entry structure
and accessors.

Frame list entries (FLEs) are similar, but not identical to frame
descriptors (FDs):
+ "F" (final) bit
- FMT[b'01] is reserved
- DD, SC, DROPP bits (covered by "FD compatibility" field in FLE case)
- FLC[5:0] not used for stashing

Signed-off-by: Horia Geantă 
---
 drivers/staging/fsl-mc/include/dpaa2-fd.h | 243 ++
 1 file changed, 243 insertions(+)

diff --git a/drivers/staging/fsl-mc/include/dpaa2-fd.h b/drivers/staging/fsl-mc/include/dpaa2-fd.h
index cf7857f00a5c..992fdc7ba5b8 100644
--- a/drivers/staging/fsl-mc/include/dpaa2-fd.h
+++ b/drivers/staging/fsl-mc/include/dpaa2-fd.h
@@ -91,6 +91,15 @@ struct dpaa2_fd {
 #define SG_BPID_MASK   0x3FFF
 #define SG_FINAL_FLAG_MASK 0x1
 #define SG_FINAL_FLAG_SHIFT15
+#define FL_SHORT_LEN_FLAG_MASK 0x1
+#define FL_SHORT_LEN_FLAG_SHIFT14
+#define FL_SHORT_LEN_MASK  0x3
+#define FL_OFFSET_MASK 0x0FFF
+#define FL_FORMAT_MASK 0x3
+#define FL_FORMAT_SHIFT12
+#define FL_BPID_MASK   0x3FFF
+#define FL_FINAL_FLAG_MASK 0x1
+#define FL_FINAL_FLAG_SHIFT15
 
 enum dpaa2_fd_format {
dpaa2_fd_single = 0,
@@ -448,4 +457,238 @@ static inline void dpaa2_sg_set_final(struct dpaa2_sg_entry *sg, bool final)
sg->format_offset |= cpu_to_le16(final << SG_FINAL_FLAG_SHIFT);
 }
 
+/**
+ * struct dpaa2_fl_entry - structure for frame list entry.
+ * @addr:  address in the FLE
+ * @len:   length in the FLE
+ * @bpid:  buffer pool ID
+ * @format_offset: format, offset, and short-length fields
+ * @frc:   frame context
+ * @ctrl:  control bits, including pta, ptv1, ptv2, err, etc.
+ * @flc:   flow context address
+ */
+struct dpaa2_fl_entry {
+   __le64 addr;
+   __le32 len;
+   __le16 bpid;
+   __le16 format_offset;
+   __le32 frc;
+   __le32 ctrl;
+   __le64 flc;
+};
+
+enum dpaa2_fl_format {
+   dpaa2_fl_single = 0,
+   dpaa2_fl_res,
+   dpaa2_fl_sg
+};
+
+/**
+ * dpaa2_fl_get_addr() - get the addr field of FLE
+ * @fle: the given frame list entry
+ *
+ * Return the address in the frame list entry.
+ */
+static inline dma_addr_t dpaa2_fl_get_addr(const struct dpaa2_fl_entry *fle)
+{
+   return (dma_addr_t)le64_to_cpu(fle->addr);
+}
+
+/**
+ * dpaa2_fl_set_addr() - Set the addr field of FLE
+ * @fle: the given frame list entry
+ * @addr: the address needs to be set in frame list entry
+ */
+static inline void dpaa2_fl_set_addr(struct dpaa2_fl_entry *fle,
+dma_addr_t addr)
+{
+   fle->addr = cpu_to_le64(addr);
+}
+
+/**
+ * dpaa2_fl_get_frc() - Get the frame context in the FLE
+ * @fle: the given frame list entry
+ *
+ * Return the frame context field in the frame list entry.
+ */
+static inline u32 dpaa2_fl_get_frc(const struct dpaa2_fl_entry *fle)
+{
+   return le32_to_cpu(fle->frc);
+}
+
+/**
+ * dpaa2_fl_set_frc() - Set the frame context in the FLE
+ * @fle: the given frame list entry
+ * @frc: the frame context needs to be set in frame list entry
+ */
+static inline void dpaa2_fl_set_frc(struct dpaa2_fl_entry *fle, u32 frc)
+{
+   fle->frc = cpu_to_le32(frc);
+}
+
+/**
+ * dpaa2_fl_get_ctrl() - Get the control bits in the FLE
+ * @fle: the given frame list entry
+ *
+ * Return the control bits field in the frame list entry.
+ */
+static inline u32 dpaa2_fl_get_ctrl(const struct dpaa2_fl_entry *fle)
+{
+   return le32_to_cpu(fle->ctrl);
+}
+
+/**
+ * dpaa2_fl_set_ctrl() - Set the control bits in the FLE
+ * @fle: the given frame list entry
+ * @ctrl: the control bits to be set in the frame list entry
+ */
+static inline void dpaa2_fl_set_ctrl(struct dpaa2_fl_entry *fle, u32 ctrl)
+{
+   fle->ctrl = cpu_to_le32(ctrl);
+}
+
+/**
+ * dpaa2_fl_get_flc() - Get the flow context in the FLE
+ * @fle: the given frame list entry
+ *
+ * Return the flow context in the frame list entry.
+ */
+static inline dma_addr_t dpaa2_fl_get_flc(const struct dpaa2_fl_entry *fle)
+{
+   return (dma_addr_t)le64_to_cpu(fle->flc);
+}
+
+/**
+ * dpaa2_fl_set_flc() - Set the flow context field of FLE
+ * @fle: the given frame list entry
+ * @flc_addr: the flow context needs to be set in frame list entry
+ */
+static inline void dpaa2_fl_set_flc(struct dpaa2_fl_entry *fle,
+   dma_addr_t flc_addr)
+{
+   fle->flc = cpu_to_le64(flc_addr);
+}
+
+static inline bool dpaa2_fl_short_len(const struct dpaa2_fl_entry *fle)
+{
+   return !!((le16_to_cpu(fle->format_offset) >>
+ FL_SHORT_LEN_FLAG_SHIFT) & FL_SHORT_LEN_FLAG_MASK);
+}
+
+/**
+ * dpaa2_fl_get_len() - Get the length in the FLE
+ * @fle: the given frame list entry
+ *
+ * Return the length field in the frame list entry.
+ */
+static inline u32 dpaa2_fl_get_len(const struct dpaa2_fl_entry *fle)
+{
+   if (dpaa2_fl_short_len(fle))
+   return le32_to_cpu(fle->len) & FL_SHORT_LEN_MASK;
+
+   return le32_to_cpu(fle->len);
+}

[RFC PATCH 00/10] crypto: caam - add DPAA2 (DPSECI) driver

2017-08-10 Thread Horia Geantă
Hi,

This patch set adds the CAAM crypto engine driver for DPAA2
(Data Path Acceleration Architecture v2) found on ARMv8-based SoCs
like LS1088A, LS2088A.

Driver consists of:
-DPSECI (Data Path SEC Interface) backend - low-level API for managing
DPSECI devices (DPAA2 objects) that sit on the Management Complex (MC)
fsl-mc bus
-algorithms frontend - AEAD and ablkcipher algorithms implementation

Patches 1-4 include DPIO object dependencies.
I am aware that DPIO is currently in staging; however, I don't consider
these dependencies a large feature set. Anyhow, please let me know
whether taking the patches through staging is acceptable.

Patches 5-9 are the core of the patch set, adding the driver.
For symmetric encryption the legacy ablkcipher interface is used; the
plan is to convert all CAAM frontends to skcipher at once at a later
point in time.

Patch 10 enables driver on arm64. It will be built only if dependency
on DPIO (CONFIG_FSL_MC_DPIO) is satisfied.

Thanks,
Horia

Horia Geantă (9):
  staging: fsl-mc: dpio: add frame list format support
  staging: fsl-mc: dpio: add congestion notification support
  staging: fsl-dpaa2/eth: move generic FD defines to DPIO
  crypto: caam/qi - prepare for gcm(aes) support
  crypto: caam - add DPAA2-CAAM (DPSECI) backend API
  crypto: caam - add Queue Interface v2 error codes
  crypto: caam/qi2 - add DPAA2-CAAM driver
  crypto: caam/qi2 - add ablkcipher algorithms
  arm64: defconfig: enable CAAM crypto engine on QorIQ DPAA2 SoCs

Radu Alexe (1):
  staging: fsl-mc: dpio: add order preservation support

 arch/arm64/configs/defconfig   |1 +
 drivers/crypto/Makefile|2 +-
 drivers/crypto/caam/Kconfig|   57 +-
 drivers/crypto/caam/Makefile   |9 +-
 drivers/crypto/caam/caamalg.c  |   19 +-
 drivers/crypto/caam/caamalg_desc.c |  165 +-
 drivers/crypto/caam/caamalg_desc.h |   24 +-
 drivers/crypto/caam/caamalg_qi2.c  | 3949 
 drivers/crypto/caam/caamalg_qi2.h  |  243 ++
 drivers/crypto/caam/compat.h   |1 +
 drivers/crypto/caam/dpseci.c   |  858 +
 drivers/crypto/caam/dpseci.h   |  395 +++
 drivers/crypto/caam/dpseci_cmd.h   |  261 ++
 drivers/crypto/caam/error.c|   75 +-
 drivers/crypto/caam/error.h|6 +-
 drivers/crypto/caam/key_gen.c  |   30 -
 drivers/crypto/caam/key_gen.h  |   30 +
 drivers/crypto/caam/regs.h |2 +
 drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c |8 +-
 drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.h |   19 +-
 drivers/staging/fsl-mc/include/dpaa2-fd.h  |  255 ++
 drivers/staging/fsl-mc/include/dpaa2-io.h  |   43 +
 drivers/staging/fsl-mc/include/dpopr.h |  110 +
 23 files changed, 6463 insertions(+), 99 deletions(-)
 create mode 100644 drivers/crypto/caam/caamalg_qi2.c
 create mode 100644 drivers/crypto/caam/caamalg_qi2.h
 create mode 100644 drivers/crypto/caam/dpseci.c
 create mode 100644 drivers/crypto/caam/dpseci.h
 create mode 100644 drivers/crypto/caam/dpseci_cmd.h
 create mode 100644 drivers/staging/fsl-mc/include/dpopr.h

-- 
2.12.0.264.gd6db3f216544



[RFC PATCH 09/10] crypto: caam/qi2 - add ablkcipher algorithms

2017-08-10 Thread Horia Geantă
Add support to submit the following ablkcipher algorithms
via the DPSECI backend:
cbc({aes,des,des3_ede})
ctr(aes), rfc3686(ctr(aes))
xts(aes)

Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/Kconfig   |   1 +
 drivers/crypto/caam/caamalg_qi2.c | 816 ++
 drivers/crypto/caam/caamalg_qi2.h |  23 +-
 3 files changed, 839 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/caam/Kconfig b/drivers/crypto/caam/Kconfig
index e45d39d9007e..eb202e59c4fa 100644
--- a/drivers/crypto/caam/Kconfig
+++ b/drivers/crypto/caam/Kconfig
@@ -159,6 +159,7 @@ config CRYPTO_DEV_FSL_DPAA2_CAAM
tristate "QorIQ DPAA2 CAAM (DPSECI) driver"
depends on FSL_MC_DPIO
select CRYPTO_DEV_FSL_CAAM_COMMON
+   select CRYPTO_BLKCIPHER
select CRYPTO_AUTHENC
select CRYPTO_AEAD
---help---
diff --git a/drivers/crypto/caam/caamalg_qi2.c b/drivers/crypto/caam/caamalg_qi2.c
index 9dc5e1184e80..f32c518bc680 100644
--- a/drivers/crypto/caam/caamalg_qi2.c
+++ b/drivers/crypto/caam/caamalg_qi2.c
@@ -1047,6 +1047,457 @@ static int rfc4543_setkey(struct crypto_aead *aead,
return ret;
 }
 
+static int ablkcipher_setkey(struct crypto_ablkcipher *ablkcipher,
+const u8 *key, unsigned int keylen)
+{
+   struct caam_ctx *ctx = crypto_ablkcipher_ctx(ablkcipher);
+   struct crypto_tfm *tfm = crypto_ablkcipher_tfm(ablkcipher);
+   const char *alg_name = crypto_tfm_alg_name(tfm);
+   struct device *dev = ctx->dev;
+   struct caam_flc *flc;
+   unsigned int ivsize = crypto_ablkcipher_ivsize(ablkcipher);
+   u32 *desc;
+   u32 ctx1_iv_off = 0;
+   const bool ctr_mode = ((ctx->cdata.algtype & OP_ALG_AAI_MASK) ==
+  OP_ALG_AAI_CTR_MOD128);
+   const bool is_rfc3686 = (ctr_mode && strstr(alg_name, "rfc3686"));
+
+   memcpy(ctx->key, key, keylen);
+#ifdef DEBUG
+   print_hex_dump(KERN_ERR, "key in @" __stringify(__LINE__)": ",
+  DUMP_PREFIX_ADDRESS, 16, 4, key, keylen, 1);
+#endif
+   /*
+* AES-CTR needs to load IV in CONTEXT1 reg
+* at an offset of 128bits (16bytes)
+* CONTEXT1[255:128] = IV
+*/
+   if (ctr_mode)
+   ctx1_iv_off = 16;
+
+   /*
+* RFC3686 specific:
+*  | CONTEXT1[255:128] = {NONCE, IV, COUNTER}
+*  | *key = {KEY, NONCE}
+*/
+   if (is_rfc3686) {
+   ctx1_iv_off = 16 + CTR_RFC3686_NONCE_SIZE;
+   keylen -= CTR_RFC3686_NONCE_SIZE;
+   }
+
+   ctx->key_dma = dma_map_single(dev, ctx->key, keylen, DMA_TO_DEVICE);
+   if (dma_mapping_error(dev, ctx->key_dma)) {
+   dev_err(dev, "unable to map key i/o memory\n");
+   return -ENOMEM;
+   }
+   ctx->cdata.keylen = keylen;
+   ctx->cdata.key_virt = ctx->key;
+   ctx->cdata.key_inline = true;
+
+   /* ablkcipher_encrypt shared descriptor */
+   flc = &ctx->flc[ENCRYPT];
+   desc = flc->sh_desc;
+
+   cnstr_shdsc_ablkcipher_encap(desc, &ctx->cdata, ivsize,
+is_rfc3686, ctx1_iv_off);
+
+   flc->flc[1] = desc_len(desc); /* SDL */
+   flc->flc_dma = dma_map_single(dev, flc, sizeof(flc->flc) +
+ desc_bytes(desc), DMA_TO_DEVICE);
+   if (dma_mapping_error(dev, flc->flc_dma)) {
+   dev_err(dev, "unable to map shared descriptor\n");
+   return -ENOMEM;
+   }
+
+   /* ablkcipher_decrypt shared descriptor */
+   flc = &ctx->flc[DECRYPT];
+   desc = flc->sh_desc;
+
+   cnstr_shdsc_ablkcipher_decap(desc, &ctx->cdata, ivsize,
+is_rfc3686, ctx1_iv_off);
+
+   flc->flc[1] = desc_len(desc); /* SDL */
+   flc->flc_dma = dma_map_single(dev, flc, sizeof(flc->flc) +
+ desc_bytes(desc), DMA_TO_DEVICE);
+   if (dma_mapping_error(dev, flc->flc_dma)) {
+   dev_err(dev, "unable to map shared descriptor\n");
+   return -ENOMEM;
+   }
+
+   /* ablkcipher_givencrypt shared descriptor */
+   flc = &ctx->flc[GIVENCRYPT];
+   desc = flc->sh_desc;
+
+   cnstr_shdsc_ablkcipher_givencap(desc, &ctx->cdata,
+   ivsize, is_rfc3686, ctx1_iv_off);
+
+   flc->flc[1] = desc_len(desc); /* SDL */
+   flc->flc_dma = dma_map_single(dev, flc, sizeof(flc->flc) +
+ desc_bytes(desc), DMA_TO_DEVICE);
+   if (dma_mapping_error(dev, flc->flc_dma)) {
+   dev_err(dev, "unable to map shared descriptor\n");
+   return -ENOMEM;
+   }
+
+   return 0;
+}
+
+static int xts_ablkcipher_setkey(struct crypto_ablkcipher *ablkcipher,
+const u8 *key, unsigned int keylen)
+{
+   struct caam_ctx *ctx = crypto_ablkcipher_ctx(ablkcipher);
+   struct device *dev = 

[RFC PATCH 07/10] crypto: caam - add Queue Interface v2 error codes

2017-08-10 Thread Horia Geantă
Add support to translate error codes returned by QI v2, i.e.
Queue Interface present on DataPath Acceleration Architecture
v2 (DPAA2).

Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/error.c | 75 +++--
 drivers/crypto/caam/error.h |  6 +++-
 drivers/crypto/caam/regs.h  |  2 ++
 3 files changed, 79 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/caam/error.c b/drivers/crypto/caam/error.c
index 3d639f3b45aa..65756bab800f 100644
--- a/drivers/crypto/caam/error.c
+++ b/drivers/crypto/caam/error.c
@@ -107,6 +107,54 @@ static const struct {
{ 0xF1, "3GPP HFN matches or exceeds the Threshold" },
 };
 
+static const struct {
+   u8 value;
+   const char *error_text;
+} qi_error_list[] = {
+   { 0x1F, "Job terminated by FQ or ICID flush" },
+   { 0x20, "FD format error"},
+   { 0x21, "FD command format error"},
+   { 0x23, "FL format error"},
+   { 0x25, "CRJD specified in FD, but not enabled in FLC"},
+   { 0x30, "Max. buffer size too small"},
+   { 0x31, "DHR exceeds max. buffer size (allocate mode, S/G format)"},
+   { 0x32, "SGT exceeds max. buffer size (allocate mode, S/G format"},
+   { 0x33, "Size over/underflow (allocate mode)"},
+   { 0x34, "Size over/underflow (reuse mode)"},
+   { 0x35, "Length exceeds max. short length (allocate mode, S/G format)"},
+   { 0x36, "Memory footprint exceeds max. value (allocate mode, S/G format)"},
+   { 0x41, "SBC frame format not supported (allocate mode)"},
+   { 0x42, "Pool 0 invalid / pool 1 size < pool 0 size (allocate mode)"},
+   { 0x43, "Annotation output enabled but ASAR = 0 (allocate mode)"},
+   { 0x44, "Unsupported or reserved frame format or SGHR = 1 (reuse mode)"},
+   { 0x45, "DHR correction underflow (reuse mode, single buffer format)"},
+   { 0x46, "Annotation length exceeds offset (reuse mode)"},
+   { 0x48, "Annotation output enabled but ASA limited by ASAR (reuse mode)"},
+   { 0x49, "Data offset correction exceeds input frame data length (reuse mode)"},
+   { 0x4B, "Annotation output enabled but ASA cannot be expanded (frame list)"},
+   { 0x51, "Unsupported IF reuse mode"},
+   { 0x52, "Unsupported FL use mode"},
+   { 0x53, "Unsupported RJD use mode"},
+   { 0x54, "Unsupported inline descriptor use mode"},
+   { 0xC0, "Table buffer pool 0 depletion"},
+   { 0xC1, "Table buffer pool 1 depletion"},
+   { 0xC2, "Data buffer pool 0 depletion, no OF allocated"},
+   { 0xC3, "Data buffer pool 1 depletion, no OF allocated"},
+   { 0xC4, "Data buffer pool 0 depletion, partial OF allocated"},
+   { 0xC5, "Data buffer pool 1 depletion, partial OF allocated"},
+   { 0xD0, "FLC read error"},
+   { 0xD1, "FL read error"},
+   { 0xD2, "FL write error"},
+   { 0xD3, "OF SGT write error"},
+   { 0xD4, "PTA read error"},
+   { 0xD5, "PTA write error"},
+   { 0xD6, "OF SGT F-bit write error"},
+   { 0xD7, "ASA write error"},
+   { 0xE1, "FLC[ICR]=0 ICID error"},
+   { 0xE2, "FLC[ICR]=1 ICID error"},
+   { 0xE4, "source of ICID flush not trusted (BDI = 0)"},
+};
+
 static const char * const cha_id_list[] = {
"",
"AES",
@@ -235,6 +283,27 @@ static void report_deco_status(struct device *jrdev, const u32 status,
status, error, idx_str, idx, err_str, err_err_code);
 }
 
+static void report_qi_status(struct device *qidev, const u32 status,
+const char *error)
+{
+   u8 err_id = status & JRSTA_QIERR_ERROR_MASK;
+   const char *err_str = "unidentified error value 0x";
+   char err_err_code[3] = { 0 };
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(qi_error_list); i++)
+   if (qi_error_list[i].value == err_id)
+   break;
+
+   if (i != ARRAY_SIZE(qi_error_list) && qi_error_list[i].error_text)
+   err_str = qi_error_list[i].error_text;
+   else
+   snprintf(err_err_code, sizeof(err_err_code), "%02x", err_id);
+
+   dev_err(qidev, "%08x: %s: %s%s\n",
+   status, error, err_str, err_err_code);
+}
+
 static void report_jr_status(struct device *jrdev, const u32 status,
 const char *error)
 {
@@ -249,7 +318,7 @@ static void report_cond_code_status(struct device *jrdev, const u32 status,
status, error, __func__);
 }
 
-void caam_jr_strstatus(struct device *jrdev, u32 status)
+void caam_strstatus(struct device *jrdev, u32 status, bool qi_v2)
 {
static const struct stat_src {
void (*report_ssed)(struct device *jrdev, const u32 status,
@@ -261,7 +330,7 @@ void caam_jr_strstatus(struct device *jrdev, u32 status)
{ report_ccb_status, "CCB" },
{ report_jump_status, "Jump" },
{ report_deco_status, "DECO" },
-   { NULL, "Queue Manager 

[RFC PATCH 10/10] arm64: defconfig: enable CAAM crypto engine on QorIQ DPAA2 SoCs

2017-08-10 Thread Horia Geantă
Enable CAAM (Cryptographic Accelerator and Assurance Module) driver
for QorIQ Data Path Acceleration Architecture (DPAA) v2.
It handles DPSECI (Data Path SEC Interface) DPAA2 objects that sit
on the Management Complex (MC) fsl-mc bus.

Signed-off-by: Horia Geantă 
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 6c7d147eed54..43455ad6fff5 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -561,6 +561,7 @@ CONFIG_MEMTEST=y
 CONFIG_SECURITY=y
 CONFIG_CRYPTO_ECHAINIV=y
 CONFIG_CRYPTO_ANSI_CPRNG=y
+CONFIG_CRYPTO_DEV_FSL_DPAA2_CAAM=y
 CONFIG_ARM64_CRYPTO=y
 CONFIG_CRYPTO_SHA1_ARM64_CE=y
 CONFIG_CRYPTO_SHA2_ARM64_CE=y
-- 
2.12.0.264.gd6db3f216544



[RFC PATCH 06/10] crypto: caam - add DPAA2-CAAM (DPSECI) backend API

2017-08-10 Thread Horia Geantă
Add the low-level API used to manage DPSECI DPAA2 objects
that sit on the Management Complex (MC) fsl-mc bus.

The API is compatible with MC firmware 10.2.0+.

Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/dpseci.c | 858 +++
 drivers/crypto/caam/dpseci.h | 395 ++
 drivers/crypto/caam/dpseci_cmd.h | 261 
 3 files changed, 1514 insertions(+)
 create mode 100644 drivers/crypto/caam/dpseci.c
 create mode 100644 drivers/crypto/caam/dpseci.h
 create mode 100644 drivers/crypto/caam/dpseci_cmd.h

diff --git a/drivers/crypto/caam/dpseci.c b/drivers/crypto/caam/dpseci.c
new file mode 100644
index ..dec05ecbeab1
--- /dev/null
+++ b/drivers/crypto/caam/dpseci.c
@@ -0,0 +1,858 @@
+/*
+ * Copyright 2013-2016 Freescale Semiconductor Inc.
+ * Copyright 2017 NXP
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution.
+ * * Neither the names of the above-listed copyright holders nor the
+ *  names of any contributors may be used to endorse or promote products
+ *  derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "../../../drivers/staging/fsl-mc/include/mc.h"
+#include "../../../drivers/staging/fsl-mc/include/dpopr.h"
+#include "dpseci.h"
+#include "dpseci_cmd.h"
+
+/**
+ * dpseci_open() - Open a control session for the specified object
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @dpseci_id: DPSECI unique ID
+ * @token: Returned token; use in subsequent API calls
+ *
+ * This function can be used to open a control session for an already created
+ * object; an object may have been declared in the DPL or by calling the
+ * dpseci_create() function.
+ * This function returns a unique authentication token, associated with the
+ * specific object ID and the specific MC portal; this token must be used in
+ * all subsequent commands for this specific object.
+ *
+ * Return: '0' on success, error code otherwise
+ */
+int dpseci_open(struct fsl_mc_io *mc_io, u32 cmd_flags, int dpseci_id,
+   u16 *token)
+{
+   struct mc_command cmd = { 0 };
+   struct dpseci_cmd_open *cmd_params;
+   int err;
+
+   cmd.header = mc_encode_cmd_header(DPSECI_CMDID_OPEN,
+ cmd_flags,
+ 0);
+   cmd_params = (struct dpseci_cmd_open *)cmd.params;
+   cmd_params->dpseci_id = cpu_to_le32(dpseci_id);
+   err = mc_send_command(mc_io, &cmd);
+   if (err)
+   return err;
+
+   *token = mc_cmd_hdr_read_token(&cmd);
+
+   return 0;
+}
+
+/**
+ * dpseci_close() - Close the control session of the object
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPSECI object
+ *
+ * After this function is called, no further operations are allowed on the
+ * object without opening a new control session.
+ *
+ * Return: '0' on success, error code otherwise
+ */
+int dpseci_close(struct fsl_mc_io *mc_io, u32 cmd_flags, u16 token)
+{
+   struct mc_command cmd = { 0 };
+
+   cmd.header = mc_encode_cmd_header(DPSECI_CMDID_CLOSE,
+ cmd_flags,
+ token);
+   return mc_send_command(mc_io, &cmd);
+}
+
+/**
+ * dpseci_create() - Create 

[RFC PATCH 08/10] crypto: caam/qi2 - add DPAA2-CAAM driver

2017-08-10 Thread Horia Geantă
Add CAAM driver that works using the DPSECI backend, i.e. manages
DPSECI DPAA2 objects sitting on the Management Complex (MC) fsl-mc bus.

Data transfers (crypto requests) are sent/received to/from CAAM crypto
engine via Queue Interface (v2), this being similar to existing caam/qi.
OTOH, configuration/setup (obtaining virtual queue IDs, authorization
etc.) is done by sending commands to the MC f/w.

Note that the CAAM accelerator included in DPAA2 platforms still has
Job Rings. However, the driver being added does not handle access
via this backend. Kconfig & Makefile are updated such that DPAA2-CAAM
(a.k.a. "caam/qi2") driver does not depend on caam/jr or caam/qi
backends - which rely on platform bus support (ctrl.c).

Support for the following aead and authenc algorithms is also added
in this patch:
-aead:
gcm(aes)
rfc4106(gcm(aes))
rfc4543(gcm(aes))
-authenc:
authenc(hmac({md5,sha*}),cbc({aes,des,des3_ede}))
echainiv(authenc(hmac({md5,sha*}),cbc({aes,des,des3_ede})))
authenc(hmac({md5,sha*}),rfc3686(ctr(aes))
seqiv(authenc(hmac({md5,sha*}),rfc3686(ctr(aes)))

Signed-off-by: Horia Geantă 
---
 drivers/crypto/Makefile   |2 +-
 drivers/crypto/caam/Kconfig   |   56 +-
 drivers/crypto/caam/Makefile  |9 +-
 drivers/crypto/caam/caamalg_qi2.c | 3133 +
 drivers/crypto/caam/caamalg_qi2.h |  222 +++
 drivers/crypto/caam/compat.h  |1 +
 drivers/crypto/caam/key_gen.c |   30 -
 drivers/crypto/caam/key_gen.h |   30 +
 8 files changed, 3432 insertions(+), 51 deletions(-)
 create mode 100644 drivers/crypto/caam/caamalg_qi2.c
 create mode 100644 drivers/crypto/caam/caamalg_qi2.h

diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index b12eb3c99430..50c3436611f1 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -9,7 +9,7 @@ obj-$(CONFIG_CRYPTO_DEV_CHELSIO) += chelsio/
 obj-$(CONFIG_CRYPTO_DEV_CPT) += cavium/cpt/
 obj-$(CONFIG_CRYPTO_DEV_NITROX) += cavium/nitrox/
 obj-$(CONFIG_CRYPTO_DEV_EXYNOS_RNG) += exynos-rng.o
-obj-$(CONFIG_CRYPTO_DEV_FSL_CAAM) += caam/
+obj-$(CONFIG_CRYPTO_DEV_FSL_CAAM_COMMON) += caam/
 obj-$(CONFIG_CRYPTO_DEV_GEODE) += geode-aes.o
 obj-$(CONFIG_CRYPTO_DEV_HIFN_795X) += hifn_795x.o
 obj-$(CONFIG_CRYPTO_DEV_IMGTEC_HASH) += img-hash.o
diff --git a/drivers/crypto/caam/Kconfig b/drivers/crypto/caam/Kconfig
index e36aeacd7635..e45d39d9007e 100644
--- a/drivers/crypto/caam/Kconfig
+++ b/drivers/crypto/caam/Kconfig
@@ -1,6 +1,10 @@
+config CRYPTO_DEV_FSL_CAAM_COMMON
+   tristate
+
 config CRYPTO_DEV_FSL_CAAM
-   tristate "Freescale CAAM-Multicore driver backend"
+   tristate "Freescale CAAM-Multicore platform driver backend"
depends on FSL_SOC || ARCH_MXC || ARCH_LAYERSCAPE
+   select CRYPTO_DEV_FSL_CAAM_COMMON
help
  Enables the driver module for Freescale's Cryptographic Accelerator
  and Assurance Module (CAAM), also known as the SEC version 4 (SEC4).
@@ -11,9 +15,19 @@ config CRYPTO_DEV_FSL_CAAM
  To compile this driver as a module, choose M here: the module
  will be called caam.
 
+if CRYPTO_DEV_FSL_CAAM
+
+config CRYPTO_DEV_FSL_CAAM_IMX
+   def_bool SOC_IMX6 || SOC_IMX7D
+
+config CRYPTO_DEV_FSL_CAAM_DEBUG
+   bool "Enable debug output in CAAM driver"
+   help
+ Selecting this will enable printing of various debug
+ information in the CAAM driver.
+
 config CRYPTO_DEV_FSL_CAAM_JR
tristate "Freescale CAAM Job Ring driver backend"
-   depends on CRYPTO_DEV_FSL_CAAM
default y
help
  Enables the driver module for Job Rings which are part of
@@ -24,9 +38,10 @@ config CRYPTO_DEV_FSL_CAAM_JR
  To compile this driver as a module, choose M here: the module
  will be called caam_jr.
 
+if CRYPTO_DEV_FSL_CAAM_JR
+
 config CRYPTO_DEV_FSL_CAAM_RINGSIZE
int "Job Ring size"
-   depends on CRYPTO_DEV_FSL_CAAM_JR
range 2 9
default "9"
help
@@ -44,7 +59,6 @@ config CRYPTO_DEV_FSL_CAAM_RINGSIZE
 
 config CRYPTO_DEV_FSL_CAAM_INTC
bool "Job Ring interrupt coalescing"
-   depends on CRYPTO_DEV_FSL_CAAM_JR
help
  Enable the Job Ring's interrupt coalescing feature.
 
@@ -74,7 +88,6 @@ config CRYPTO_DEV_FSL_CAAM_INTC_TIME_THLD
 
 config CRYPTO_DEV_FSL_CAAM_CRYPTO_API
tristate "Register algorithm implementations with the Crypto API"
-   depends on CRYPTO_DEV_FSL_CAAM_JR
default y
select CRYPTO_AEAD
select CRYPTO_AUTHENC
@@ -89,7 +102,7 @@ config CRYPTO_DEV_FSL_CAAM_CRYPTO_API
 
 config CRYPTO_DEV_FSL_CAAM_CRYPTO_API_QI
tristate "Queue Interface as Crypto API backend"
-   depends on CRYPTO_DEV_FSL_CAAM_JR && FSL_DPAA && NET
+   depends on FSL_DPAA && NET
default y
select CRYPTO_AUTHENC
select CRYPTO_BLKCIPHER
@@ -106,7 +119,6 @@ config CRYPTO_DEV_FSL_CAAM_CRYPTO_API_QI
 
 config CRYPTO_DEV_FSL_CAAM_AHASH_API

[RFC PATCH 05/10] crypto: caam/qi - prepare for gcm(aes) support

2017-08-10 Thread Horia Geantă
Update gcm(aes) descriptors (generic, rfc4106 and rfc4543) such that
they would also work when submitted via the QI interface.

Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/caamalg.c  |  19 +++--
 drivers/crypto/caam/caamalg_desc.c | 165 ++---
 drivers/crypto/caam/caamalg_desc.h |  24 --
 3 files changed, 183 insertions(+), 25 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index e0d1b5c3c1ba..94e12ec8141c 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -290,6 +290,7 @@ static int gcm_set_sh_desc(struct crypto_aead *aead)
 {
struct caam_ctx *ctx = crypto_aead_ctx(aead);
struct device *jrdev = ctx->jrdev;
+   unsigned int ivsize = crypto_aead_ivsize(aead);
u32 *desc;
int rem_bytes = CAAM_DESC_BYTES_MAX - GCM_DESC_JOB_IO_LEN -
ctx->cdata.keylen;
@@ -311,7 +312,7 @@ static int gcm_set_sh_desc(struct crypto_aead *aead)
}
 
desc = ctx->sh_desc_enc;
-   cnstr_shdsc_gcm_encap(desc, &ctx->cdata, ctx->authsize);
+   cnstr_shdsc_gcm_encap(desc, &ctx->cdata, ivsize, ctx->authsize, false);
dma_sync_single_for_device(jrdev, ctx->sh_desc_enc_dma,
   desc_bytes(desc), DMA_TO_DEVICE);
 
@@ -328,7 +329,7 @@ static int gcm_set_sh_desc(struct crypto_aead *aead)
}
 
desc = ctx->sh_desc_dec;
-   cnstr_shdsc_gcm_decap(desc, &ctx->cdata, ctx->authsize);
+   cnstr_shdsc_gcm_decap(desc, &ctx->cdata, ivsize, ctx->authsize, false);
dma_sync_single_for_device(jrdev, ctx->sh_desc_dec_dma,
   desc_bytes(desc), DMA_TO_DEVICE);
 
@@ -349,6 +350,7 @@ static int rfc4106_set_sh_desc(struct crypto_aead *aead)
 {
struct caam_ctx *ctx = crypto_aead_ctx(aead);
struct device *jrdev = ctx->jrdev;
+   unsigned int ivsize = crypto_aead_ivsize(aead);
u32 *desc;
int rem_bytes = CAAM_DESC_BYTES_MAX - GCM_DESC_JOB_IO_LEN -
ctx->cdata.keylen;
@@ -370,7 +372,8 @@ static int rfc4106_set_sh_desc(struct crypto_aead *aead)
}
 
desc = ctx->sh_desc_enc;
-   cnstr_shdsc_rfc4106_encap(desc, &ctx->cdata, ctx->authsize);
+   cnstr_shdsc_rfc4106_encap(desc, &ctx->cdata, ivsize, ctx->authsize,
+ false);
dma_sync_single_for_device(jrdev, ctx->sh_desc_enc_dma,
   desc_bytes(desc), DMA_TO_DEVICE);
 
@@ -387,7 +390,8 @@ static int rfc4106_set_sh_desc(struct crypto_aead *aead)
}
 
desc = ctx->sh_desc_dec;
-   cnstr_shdsc_rfc4106_decap(desc, &ctx->cdata, ctx->authsize);
+   cnstr_shdsc_rfc4106_decap(desc, &ctx->cdata, ivsize, ctx->authsize,
+ false);
dma_sync_single_for_device(jrdev, ctx->sh_desc_dec_dma,
   desc_bytes(desc), DMA_TO_DEVICE);
 
@@ -409,6 +413,7 @@ static int rfc4543_set_sh_desc(struct crypto_aead *aead)
 {
struct caam_ctx *ctx = crypto_aead_ctx(aead);
struct device *jrdev = ctx->jrdev;
+   unsigned int ivsize = crypto_aead_ivsize(aead);
u32 *desc;
int rem_bytes = CAAM_DESC_BYTES_MAX - GCM_DESC_JOB_IO_LEN -
ctx->cdata.keylen;
@@ -430,7 +435,8 @@ static int rfc4543_set_sh_desc(struct crypto_aead *aead)
}
 
desc = ctx->sh_desc_enc;
-   cnstr_shdsc_rfc4543_encap(desc, &ctx->cdata, ctx->authsize);
+   cnstr_shdsc_rfc4543_encap(desc, &ctx->cdata, ivsize, ctx->authsize,
+ false);
dma_sync_single_for_device(jrdev, ctx->sh_desc_enc_dma,
   desc_bytes(desc), DMA_TO_DEVICE);
 
@@ -447,7 +453,8 @@ static int rfc4543_set_sh_desc(struct crypto_aead *aead)
}
 
desc = ctx->sh_desc_dec;
-   cnstr_shdsc_rfc4543_decap(desc, &ctx->cdata, ctx->authsize);
+   cnstr_shdsc_rfc4543_decap(desc, &ctx->cdata, ivsize, ctx->authsize,
+ false);
dma_sync_single_for_device(jrdev, ctx->sh_desc_dec_dma,
   desc_bytes(desc), DMA_TO_DEVICE);
 
diff --git a/drivers/crypto/caam/caamalg_desc.c 
b/drivers/crypto/caam/caamalg_desc.c
index 530c14ee32de..54c6ff2ff975 100644
--- a/drivers/crypto/caam/caamalg_desc.c
+++ b/drivers/crypto/caam/caamalg_desc.c
@@ -587,10 +587,13 @@ EXPORT_SYMBOL(cnstr_shdsc_aead_givencap);
  * @desc: pointer to buffer used for descriptor construction
  * @cdata: pointer to block cipher transform definitions
  * Valid algorithm values - OP_ALG_ALGSEL_AES ANDed with 
OP_ALG_AAI_GCM.
+ * @ivsize: initialization vector size
  * @icvsize: integrity check value (ICV) size (truncated or full)
+ * @is_qi: true when called from caam/qi
  */
 void cnstr_shdsc_gcm_encap(u32 * const desc, struct alginfo *cdata,
-  unsigned int icvsize)
+  unsigned int ivsize, 

[RFC PATCH 02/10] staging: fsl-mc: dpio: add congestion notification support

2017-08-10 Thread Horia Geantă
Add support for Congestion State Change Notifications (CSCN), which
allow DPIO users to be notified when a congestion group changes its
state (due to hitting the entrance / exit threshold).

Signed-off-by: Ioana Radulescu 
Signed-off-by: Radu Alexe 
Signed-off-by: Horia Geantă 
---
 drivers/staging/fsl-mc/include/dpaa2-io.h | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/drivers/staging/fsl-mc/include/dpaa2-io.h 
b/drivers/staging/fsl-mc/include/dpaa2-io.h
index 002829cecd75..e7af7d647ab1 100644
--- a/drivers/staging/fsl-mc/include/dpaa2-io.h
+++ b/drivers/staging/fsl-mc/include/dpaa2-io.h
@@ -136,4 +136,47 @@ struct dpaa2_io_store *dpaa2_io_store_create(unsigned int 
max_frames,
 void dpaa2_io_store_destroy(struct dpaa2_io_store *s);
 struct dpaa2_dq *dpaa2_io_store_next(struct dpaa2_io_store *s, int *is_last);
 
+/***/
+/* CSCN*/
+/***/
+
+/**
+ * struct dpaa2_cscn - The CSCN message format
+ * @verb: identifies the type of message (should be 0x27).
+ * @stat: status bits related to dequeuing response (not used)
+ * @state: bit 0 = 0/1 if CG is no/is congested
+ * @reserved: reserved byte
+ * @cgid: congest grp ID - the first 16 bits
+ * @ctx: context data
+ *
+ * Congestion management can be implemented in software through
+ * the use of Congestion State Change Notifications (CSCN). These
+ * are messages written by DPAA2 hardware to memory whenever the
+ * instantaneous count (I_CNT field in the CG) exceeds the
+ * Congestion State (CS) entrance threshold, signifying congestion
+ * entrance, or when the instantaneous count returns below exit
+ * threshold, signifying congestion exit. The format of the message
+ * is given by the dpaa2_cscn structure. Bit 0 of the state field
+ * represents congestion state written by the hardware.
+ */
+struct dpaa2_cscn {
+   u8 verb;
+   u8 stat;
+   u8 state;
+   u8 reserved;
+   __le32 cgid;
+   __le64 ctx;
+};
+
+#define DPAA2_CSCN_SIZE64
+#define DPAA2_CSCN_ALIGN   16
+
+#define DPAA2_CSCN_STATE_MASK  0x1
+#define DPAA2_CSCN_CONGESTED   1
+
+static inline bool dpaa2_cscn_state_congested(struct dpaa2_cscn *cscn)
+{
+   return ((cscn->state & DPAA2_CSCN_STATE_MASK) == DPAA2_CSCN_CONGESTED);
+}
+
 #endif /* __FSL_DPAA2_IO_H */
-- 
2.12.0.264.gd6db3f216544



Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Chris Mason

On 08/10/2017 04:30 AM, Eric Biggers wrote:

On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:



The memory reported is the amount of memory the compressor requests.

| Method   | Size (B) | Time (s) | Ratio | MB/s| Adj MB/s | Mem (MB) |
|--|--|--|---|-|--|--|
| none | 11988480 |0.100 | 1 | 2119.88 |- |- |
| zstd -1  | 73645762 |1.044 | 2.878 |  203.05 |   224.56 | 1.23 |
| zstd -3  | 66988878 |1.761 | 3.165 |  120.38 |   127.63 | 2.47 |
| zstd -5  | 65001259 |2.563 | 3.261 |   82.71 |86.07 | 2.86 |
| zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |16.13 |13.22 |
| zstd -15 | 58009756 |   47.601 | 3.654 |4.45 | 4.46 |21.61 |
| zstd -19 | 54014593 |  102.835 | 3.925 |2.06 | 2.06 |60.15 |
| zlib -1  | 77260026 |2.895 | 2.744 |   73.23 |75.85 | 0.27 |
| zlib -3  | 72972206 |4.116 | 2.905 |   51.50 |52.79 | 0.27 |
| zlib -6  | 68190360 |9.633 | 3.109 |   22.01 |22.24 | 0.27 |
| zlib -9  | 67613382 |   22.554 | 3.135 |9.40 | 9.44 | 0.27 |



These benchmarks are misleading because they compress the whole file as a
single stream without resetting the dictionary, which isn't how data will
typically be compressed in kernel mode.  With filesystem compression the data
has to be divided into small chunks that can each be decompressed independently.
That eliminates one of the primary advantages of Zstandard (support for large
dictionary sizes).


I did btrfs benchmarks of kernel trees and other normal data sets as 
well.  The numbers were in line with what Nick is posting here.  zstd is 
a big win over both lzo and zlib from a btrfs point of view.


It's true Nick's patches only support a single compression level in 
btrfs, but that's because btrfs doesn't have a way to pass in the 
compression ratio.  It could easily be a mount option, it was just 
outside the scope of Nick's initial work.


-chris





Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Eric Biggers
On Thu, Aug 10, 2017 at 10:57:01AM -0400, Austin S. Hemmelgarn wrote:
> Also didn't think to mention this, but I could see the max level
> being very popular for use with SquashFS root filesystems used in
> LiveCD's. Currently, they have to decide between read performance
> and image size, while zstd would provide both.

The high compression levels of Zstandard are indeed a great fit for SquashFS,
but SquashFS images are created in userspace by squashfs-tools.  The kernel only
needs to be able to decompress them.

(Also, while Zstandard provides very good tradeoffs and will probably become the
preferred algorithm for SquashFS, it's misleading to imply that users won't have
to make decisions anymore.  It does not compress as well as XZ or decompress as
fast as LZ4, except maybe in very carefully crafted benchmarks.)

Eric


Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Eric Biggers
On Thu, Aug 10, 2017 at 07:32:18AM -0400, Austin S. Hemmelgarn wrote:
> On 2017-08-10 04:30, Eric Biggers wrote:
> >On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
> >>
> >>It can compress at speeds approaching lz4, and quality approaching lzma.
> >
> >Well, for a very loose definition of "approaching", and certainly not at the
> >same time.  I doubt there's a use case for using the highest compression 
> >levels
> >in kernel mode --- especially the ones using zstd_opt.h.
> Large data-sets with WORM access patterns and infrequent writes
> immediately come to mind as a use case for the highest compression
> level.
> 
> As a more specific example, the company I work for has a very large
> amount of documentation, and we keep all old versions.  This is all
> stored on a file server which is currently using BTRFS.  Once a
> document is written, it's almost never rewritten, so write
> performance only matters for the first write.  However, they're read
> back pretty frequently, so we need good read performance.  As of
> right now, the system is set to use LZO compression by default, and
> then when a new document is added, the previous version of that
> document gets re-compressed using zlib compression, which actually
> results in pretty significant space savings most of the time.  I
> would absolutely love to use zstd compression with this system with
> the highest compression level, because most people don't care how
> long it takes to write the file out, but they do care how long it
> takes to read a file (even if it's an older version).

This may be a reasonable use case, but note this cannot just be the regular
"zstd" compression setting, since filesystem compression by default must provide
reasonable performance for many different access patterns.  See the patch in
this series which actually adds zstd compression to btrfs; it only uses level 1.
I do not see a patch which adds a higher compression mode.  It would need to be
a special setting like "zstdhc" that users could opt-in to on specific
directories.  It also would need to be compared to simply compressing in
userspace.  In many cases compressing in userspace is probably the better
solution for the use case in question because it works on any filesystem, allows
using any compression algorithm, and if random access is not needed it is
possible to compress each file as a single stream (like a .xz file), which
produces a much better compression ratio than the block-by-block compression
that filesystems have to use.

Note also that LZ4HC is in the kernel source tree currently but no one is using
it vs. the regular LZ4.  I think it is the kind of thing that sounded useful
originally, but at the end of the day no one really wants to use it in kernel
mode.  I'd certainly be interested in actual patches, though.

Eric


Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Austin S. Hemmelgarn

On 2017-08-10 07:32, Austin S. Hemmelgarn wrote:

On 2017-08-10 04:30, Eric Biggers wrote:

On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:


It can compress at speeds approaching lz4, and quality approaching lzma.


Well, for a very loose definition of "approaching", and certainly not 
at the
same time.  I doubt there's a use case for using the highest 
compression levels

in kernel mode --- especially the ones using zstd_opt.h.
Large data-sets with WORM access patterns and infrequent writes 
immediately come to mind as a use case for the highest compression level.


As a more specific example, the company I work for has a very large 
amount of documentation, and we keep all old versions.  This is all 
stored on a file server which is currently using BTRFS.  Once a document 
is written, it's almost never rewritten, so write performance only 
matters for the first write.  However, they're read back pretty 
frequently, so we need good read performance.  As of right now, the 
system is set to use LZO compression by default, and then when a new 
document is added, the previous version of that document gets 
re-compressed using zlib compression, which actually results in pretty 
significant space savings most of the time.  I would absolutely love to 
use zstd compression with this system with the highest compression 
level, because most people don't care how long it takes to write the 
file out, but they do care how long it takes to read a file (even if 
it's an older version).
Also didn't think to mention this, but I could see the max level being 
very popular for use with SquashFS root filesystems used in LiveCD's. 
Currently, they have to decide between read performance and image size, 
while zstd would provide both.


Re: [PATCH v4 2/4] crypto: add crypto_(un)register_ahashes()

2017-08-10 Thread Lars Persson



On 08/10/2017 02:53 PM, Lars Persson wrote:

From: Rabin Vincent 

There are already helpers to (un)register multiple normal
and AEAD algos.  Add one for ahashes too.

Signed-off-by: Lars Persson 
Signed-off-by: Rabin Vincent 
---
v4: crypto_register_skciphers was used where crypto_unregister_skciphers
 was intended.



The v4 change comment above in fact belongs to patch 3/4 of this series. 
Sorry for the confusion.


BR,
 Lars


[PATCH] crypto: AF_ALG - get_page upon reassignment to TX SGL

2017-08-10 Thread Stephan Müller
Hi Herbert,

The error can be triggered with the following test. Invoking that test
in a while [ 1 ] loop shows that no memory is leaked.

#include 
#include 

int main(int argc, char *argv[])
{
char buf[8192];
struct kcapi_handle *handle;
struct iovec iov;
int ret;

(void)argc;
(void)argv;

iov.iov_base = buf;

ret = kcapi_cipher_init(&handle, "ctr(aes)", 0);
if (ret)
return ret;

ret = kcapi_cipher_setkey(handle, (unsigned char *)"0123456789abcdef", 
16);
if (ret)
return ret;

ret = kcapi_cipher_stream_init_enc(handle, (unsigned char 
*)"0123456789abcdef", NULL, 0);
if (ret < 0)
return ret;

iov.iov_len = 4152;
ret = kcapi_cipher_stream_update(handle, &iov, 1);
if (ret < 0)
return ret;

iov.iov_len = 4096;
ret = kcapi_cipher_stream_op(handle, &iov, 1);
if (ret < 0)
return ret;

kcapi_cipher_destroy(handle);

return 0;
}

---8<---

When a page is assigned to a TX SGL, call get_page to increment the
reference counter. It is possible that one page is referenced in
multiple SGLs:

- in the global TX SGL in case a previous af_alg_pull_tsgl only
reassigned parts of a page to a per-request TX SGL

- in the per-request TX SGL as assigned by af_alg_pull_tsgl

Note, multiple requests can be active at the same time whose TX SGLs all
point to different parts of the same page.

Signed-off-by: Stephan Mueller 
---
 crypto/af_alg.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index d6936c0e08d9..ffa9f4ccd9b4 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -641,9 +641,9 @@ void af_alg_pull_tsgl(struct sock *sk, size_t used, struct 
scatterlist *dst,
if (dst_offset >= plen) {
/* discard page before offset */
dst_offset -= plen;
-   put_page(page);
} else {
/* reassign page to dst after offset */
+   get_page(page);
sg_set_page(dst + j, page,
plen - dst_offset,
sg[i].offset + dst_offset);
@@ -661,9 +661,7 @@ void af_alg_pull_tsgl(struct sock *sk, size_t used, struct 
scatterlist *dst,
if (sg[i].length)
return;
 
-   if (!dst)
-   put_page(page);
-
+   put_page(page);
sg_assign_page(sg + i, NULL);
}
 
-- 
2.13.4




Re: [Freedombox-discuss] Hardware Crypto

2017-08-10 Thread Sandy Harris
To me it seems obvious that if the hardware provides a real RNG, that
should be used to feed random(4). This solves a genuine problem and,
even if calls to the hardware are expensive, overall overhead will not
be high because random(4) does not need huge amounts of input.

I'm much less certain hardware acceleration is worthwhile for ciphers
& hashes, except where the CPU itself includes instructions to speed
them up.


Re: [PATCH v8 1/4] crypto: AF_ALG -- add sign/verify API

2017-08-10 Thread Stephan Müller
On Thursday, 10 August 2017 at 15:59:33 CEST, Tudor Ambarus wrote:

Hi Tudor,

> On 08/10/2017 04:03 PM, Stephan Mueller wrote:
> > Is there a style requirement for that? checkpatch.pl does not complain. I
> > thought that one liners in a conditional should not have braces?
> 
> Linux coding style requires braces in both branches when you have a
> branch with a statement and the other with multiple statements.
> 
> Checkpatch complains about this when you run it with --strict option.

Ok, then I will add it.

Thanks

Ciao
Stephan


Re: [PATCH v8 1/4] crypto: AF_ALG -- add sign/verify API

2017-08-10 Thread Tudor Ambarus



On 08/10/2017 04:03 PM, Stephan Mueller wrote:

Is there a style requirement for that? checkpatch.pl does not complain. I
thought that one liners in a conditional should not have braces?


Linux coding style requires braces in both branches when you have a
branch with a statement and the other with multiple statements.

Checkpatch complains about this when you run it with --strict option.

Cheers,
ta


Re: [PATCH v8 1/4] crypto: AF_ALG -- add sign/verify API

2017-08-10 Thread Stephan Mueller
On Thursday, 10 August 2017 at 14:49:39 CEST, Tudor Ambarus wrote:

Hi Tudor,

thanks for reviewing

> > 
> > -   err = ctx->enc ? crypto_aead_encrypt(&areq->cra_u.aead_req) :
> > -crypto_aead_decrypt(&areq->cra_u.aead_req);
> > -   } else {
> > +   } else
> 
> Unbalanced braces around else statement.

Is there a style requirement for that? checkpatch.pl does not complain. I 
thought that one liners in a conditional should not have braces?

> > -   ctx->enc = 0;
> > +   ctx->op = 0;
> 
> This implies decryption. Should we change the value of ALG_OP_DECRYPT?

ALG_OP_DECRYPT is a user space interface, so we cannot change it.

Do you see harm in leaving it as is? Note, I did not want to introduce 
functional changes that have no bearing on the addition of the sign/verify 
API. If you think this is problematic, I would like to add another patch that 
is dedicated to fix this.

> > -   err = ctx->enc ?
> > -   crypto_skcipher_encrypt(&areq->cra_u.skcipher_req) :
> > -   crypto_skcipher_decrypt(&areq->cra_u.skcipher_req);
> > -   } else {
> > +   } else
> 
> Unbalanced braces around else statement.

Same as above.

Thanks a lot!

Ciao
Stephan


[PATCH v4 4/4] MAINTAINERS: Add ARTPEC crypto maintainer

2017-08-10 Thread Lars Persson
Assign the Axis kernel team as maintainer for crypto drivers under
drivers/crypto/axis.

Signed-off-by: Lars Persson 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index d5b6c71e783e..72186cf9820d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1129,6 +1129,7 @@ L:linux-arm-ker...@axis.com
 F: arch/arm/mach-artpec
 F: arch/arm/boot/dts/artpec6*
 F: drivers/clk/axis
+F: drivers/crypto/axis
 F: drivers/pinctrl/pinctrl-artpec*
 F: Documentation/devicetree/bindings/pinctrl/axis,artpec6-pinctrl.txt
 
-- 
2.11.0



[PATCH v4 2/4] crypto: add crypto_(un)register_ahashes()

2017-08-10 Thread Lars Persson
From: Rabin Vincent 

There are already helpers to (un)register multiple normal
and AEAD algos.  Add one for ahashes too.

Signed-off-by: Lars Persson 
Signed-off-by: Rabin Vincent 
---
v4: crypto_register_skciphers was used where crypto_unregister_skciphers
was intended.

 crypto/ahash.c | 29 +
 include/crypto/internal/hash.h |  2 ++
 2 files changed, 31 insertions(+)

diff --git a/crypto/ahash.c b/crypto/ahash.c
index 826cd7ab4d4a..5e8666e6ccae 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -588,6 +588,35 @@ int crypto_unregister_ahash(struct ahash_alg *alg)
 }
 EXPORT_SYMBOL_GPL(crypto_unregister_ahash);
 
+int crypto_register_ahashes(struct ahash_alg *algs, int count)
+{
+   int i, ret;
+
+   for (i = 0; i < count; i++) {
+   ret = crypto_register_ahash(&algs[i]);
+   if (ret)
+   goto err;
+   }
+
+   return 0;
+
+err:
+   for (--i; i >= 0; --i)
+   crypto_unregister_ahash(&algs[i]);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(crypto_register_ahashes);
+
+void crypto_unregister_ahashes(struct ahash_alg *algs, int count)
+{
+   int i;
+
+   for (i = count - 1; i >= 0; --i)
+   crypto_unregister_ahash(&algs[i]);
+}
+EXPORT_SYMBOL_GPL(crypto_unregister_ahashes);
+
 int ahash_register_instance(struct crypto_template *tmpl,
struct ahash_instance *inst)
 {
diff --git a/include/crypto/internal/hash.h b/include/crypto/internal/hash.h
index f6d9af3efa45..f0b44c16e88f 100644
--- a/include/crypto/internal/hash.h
+++ b/include/crypto/internal/hash.h
@@ -76,6 +76,8 @@ static inline int crypto_ahash_walk_last(struct 
crypto_hash_walk *walk)
 
 int crypto_register_ahash(struct ahash_alg *alg);
 int crypto_unregister_ahash(struct ahash_alg *alg);
+int crypto_register_ahashes(struct ahash_alg *algs, int count);
+void crypto_unregister_ahashes(struct ahash_alg *algs, int count);
 int ahash_register_instance(struct crypto_template *tmpl,
struct ahash_instance *inst);
 void ahash_free_instance(struct crypto_instance *inst);
-- 
2.11.0



[PATCH v4 0/4] crypto: add driver for Axis ARTPEC crypto accelerator

2017-08-10 Thread Lars Persson
This series adds a driver for the crypto accelerator in the ARTPEC series of
SoCs from Axis Communications AB.

Changelog v4:
- The skcipher conversion had a mistake where the algos were registered
  instead of unregistered at module unloading.

Changelog v3:
- The patch author added his Signed-off-by on patch 2.

Changelog v2:
- Use xts_check_key() for xts keys.
- Use CRYPTO_ALG_TYPE_SKCIPHER instead of CRYPTO_ALG_TYPE_ABLKCIPHER
  in cra_flags.

Lars Persson (3):
  dt-bindings: crypto: add ARTPEC crypto
  crypto: axis: add ARTPEC-6/7 crypto accelerator driver
  MAINTAINERS: Add ARTPEC crypto maintainer

Rabin Vincent (1):
  crypto: add crypto_(un)register_ahashes()

 .../devicetree/bindings/crypto/artpec6-crypto.txt  |   16 +
 MAINTAINERS|1 +
 crypto/ahash.c |   29 +
 drivers/crypto/Kconfig |   21 +
 drivers/crypto/Makefile|1 +
 drivers/crypto/axis/Makefile   |1 +
 drivers/crypto/axis/artpec6_crypto.c   | 3192 
 include/crypto/internal/hash.h |2 +
 8 files changed, 3263 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/crypto/artpec6-crypto.txt
 create mode 100644 drivers/crypto/axis/Makefile
 create mode 100644 drivers/crypto/axis/artpec6_crypto.c

-- 
2.11.0



[PATCH v4 1/4] dt-bindings: crypto: add ARTPEC crypto

2017-08-10 Thread Lars Persson
Document the device tree bindings for the ARTPEC crypto accelerator on
ARTPEC-6 and ARTPEC-7 SoCs.

Acked-by: Rob Herring 
Signed-off-by: Lars Persson 
---
 .../devicetree/bindings/crypto/artpec6-crypto.txt| 16 
 1 file changed, 16 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/crypto/artpec6-crypto.txt

diff --git a/Documentation/devicetree/bindings/crypto/artpec6-crypto.txt 
b/Documentation/devicetree/bindings/crypto/artpec6-crypto.txt
new file mode 100644
index ..d9cca4875bd6
--- /dev/null
+++ b/Documentation/devicetree/bindings/crypto/artpec6-crypto.txt
@@ -0,0 +1,16 @@
+Axis crypto engine with PDMA interface.
+
+Required properties:
+- compatible : Should be one of the following strings:
+   "axis,artpec6-crypto" for the version in the Axis ARTPEC-6 SoC
+   "axis,artpec7-crypto" for the version in the Axis ARTPEC-7 SoC.
+- reg: Base address and size for the PDMA register area.
+- interrupts: Interrupt handle for the PDMA interrupt line.
+
+Example:
+
+crypto@f4264000 {
+   compatible = "axis,artpec6-crypto";
+   reg = <0xf4264000 0x1000>;
+   interrupts = ;
+};
-- 
2.11.0



[PATCH v4 3/4] crypto: axis: add ARTPEC-6/7 crypto accelerator driver

2017-08-10 Thread Lars Persson
This is an asynchronous crypto API driver for the accelerator present
in the ARTPEC-6 and -7 SoCs from Axis Communications AB.

The driver supports AES in ECB/CTR/CBC/XTS/GCM modes and SHA1/2 hash
standards.

Signed-off-by: Lars Persson 
---
 drivers/crypto/Kconfig   |   21 +
 drivers/crypto/Makefile  |1 +
 drivers/crypto/axis/Makefile |1 +
 drivers/crypto/axis/artpec6_crypto.c | 3192 ++
 4 files changed, 3215 insertions(+)
 create mode 100644 drivers/crypto/axis/Makefile
 create mode 100644 drivers/crypto/axis/artpec6_crypto.c

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 5b5393f1b87a..fe33c199fc1a 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -708,4 +708,25 @@ config CRYPTO_DEV_SAFEXCEL
  chain mode, AES cipher mode and SHA1/SHA224/SHA256/SHA512 hash
  algorithms.
 
+config CRYPTO_DEV_ARTPEC6
+   tristate "Support for Axis ARTPEC-6/7 hardware crypto acceleration."
+   depends on ARM && (ARCH_ARTPEC || COMPILE_TEST)
+   depends on HAS_DMA
+   depends on OF
+   select CRYPTO_AEAD
+   select CRYPTO_AES
+   select CRYPTO_ALGAPI
+   select CRYPTO_BLKCIPHER
+   select CRYPTO_CTR
+   select CRYPTO_HASH
+   select CRYPTO_SHA1
+   select CRYPTO_SHA256
+   select CRYPTO_SHA384
+   select CRYPTO_SHA512
+   help
+ Enables the driver for the on-chip crypto accelerator
+ of Axis ARTPEC SoCs.
+
+ To compile this driver as a module, choose M here.
+
 endif # CRYPTO_HW
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index de629165fde7..7bf0997eae25 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -44,3 +44,4 @@ obj-$(CONFIG_CRYPTO_DEV_VIRTIO) += virtio/
 obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
 obj-$(CONFIG_CRYPTO_DEV_BCM_SPU) += bcm/
 obj-$(CONFIG_CRYPTO_DEV_SAFEXCEL) += inside-secure/
+obj-$(CONFIG_CRYPTO_DEV_ARTPEC6) += axis/
diff --git a/drivers/crypto/axis/Makefile b/drivers/crypto/axis/Makefile
new file mode 100644
index ..be9a84a4b667
--- /dev/null
+++ b/drivers/crypto/axis/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_CRYPTO_DEV_ARTPEC6) := artpec6_crypto.o
diff --git a/drivers/crypto/axis/artpec6_crypto.c 
b/drivers/crypto/axis/artpec6_crypto.c
new file mode 100644
index ..d9fbbf01062b
--- /dev/null
+++ b/drivers/crypto/axis/artpec6_crypto.c
@@ -0,0 +1,3192 @@
+/*
+ *   Driver for ARTPEC-6 crypto block using the kernel asynchronous crypto api.
+ *
+ *Copyright (C) 2014-2017  Axis Communications AB
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Max length of a line in all cache levels for Artpec SoCs. */
+#define ARTPEC_CACHE_LINE_MAX  32
+
+#define PDMA_OUT_CFG   0x
+#define PDMA_OUT_BUF_CFG   0x0004
+#define PDMA_OUT_CMD   0x0008
+#define PDMA_OUT_DESCRQ_PUSH   0x0010
+#define PDMA_OUT_DESCRQ_STAT   0x0014
+
+#define A6_PDMA_IN_CFG 0x0028
+#define A6_PDMA_IN_BUF_CFG 0x002c
+#define A6_PDMA_IN_CMD 0x0030
+#define A6_PDMA_IN_STATQ_PUSH  0x0038
+#define A6_PDMA_IN_DESCRQ_PUSH 0x0044
+#define A6_PDMA_IN_DESCRQ_STAT 0x0048
+#define A6_PDMA_INTR_MASK  0x0068
+#define A6_PDMA_ACK_INTR   0x006c
+#define A6_PDMA_MASKED_INTR0x0074
+
+#define A7_PDMA_IN_CFG 0x002c
+#define A7_PDMA_IN_BUF_CFG 0x0030
+#define A7_PDMA_IN_CMD 0x0034
+#define A7_PDMA_IN_STATQ_PUSH  0x003c
+#define A7_PDMA_IN_DESCRQ_PUSH 0x0048
+#define A7_PDMA_IN_DESCRQ_STAT 0x004C
+#define A7_PDMA_INTR_MASK  0x006c
+#define A7_PDMA_ACK_INTR   0x0070
+#define A7_PDMA_MASKED_INTR0x0078
+
+#define PDMA_OUT_CFG_ENBIT(0)
+
+#define PDMA_OUT_BUF_CFG_DATA_BUF_SIZE GENMASK(4, 0)
+#define PDMA_OUT_BUF_CFG_DESCR_BUF_SIZEGENMASK(9, 5)
+
+#define PDMA_OUT_CMD_START BIT(0)
+#define A6_PDMA_OUT_CMD_STOP   BIT(3)
+#define A7_PDMA_OUT_CMD_STOP   BIT(2)
+
+#define PDMA_OUT_DESCRQ_PUSH_LEN   GENMASK(5, 0)
+#define PDMA_OUT_DESCRQ_PUSH_ADDR  GENMASK(31, 6)
+
+#define PDMA_OUT_DESCRQ_STAT_LEVEL GENMASK(3, 0)
+#define PDMA_OUT_DESCRQ_STAT_SIZE  GENMASK(7, 4)
+
+#define PDMA_IN_CFG_EN BIT(0)
+
+#define PDMA_IN_BUF_CFG_DATA_BUF_SIZE  GENMASK(4, 0)
+#define PDMA_IN_BUF_CFG_DESCR_BUF_SIZE GENMASK(9, 5)
+#define PDMA_IN_BUF_CFG_STAT_BUF_SIZE  GENMASK(14, 10)
+
+#define PDMA_IN_CMD_START  BIT(0)
+#define A6_PDMA_IN_CMD_FLUSH_STAT  BIT(2)
+#define A6_PDMA_IN_CMD_STOPBIT(3)
+#define A7_PDMA_IN_CMD_FLUSH_STAT 

Re: [PATCH v8 1/4] crypto: AF_ALG -- add sign/verify API

2017-08-10 Thread Tudor Ambarus

Hi, Stephan,

On 08/10/2017 09:39 AM, Stephan Müller wrote:

Add the flags for handling signature generation and signature
verification.

The af_alg helper code as well as the algif_skcipher and algif_aead code
must be changed from a boolean indicating the cipher operation to an
integer because there are now 4 different cipher operations that are
defined. Yet, the algif_aead and algif_skcipher code still only allows
encryption and decryption cipher operations.

Signed-off-by: Stephan Mueller 
Signed-off-by: Tadeusz Struk 
---
  crypto/af_alg.c | 10 +-
  crypto/algif_aead.c | 36 
  crypto/algif_skcipher.c | 26 +-
  include/crypto/if_alg.h |  4 ++--
  include/uapi/linux/if_alg.h |  2 ++
  5 files changed, 50 insertions(+), 28 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index d6936c0e08d9..a35a9f854a04 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -859,7 +859,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, 
size_t size,
struct af_alg_tsgl *sgl;
struct af_alg_control con = {};
long copied = 0;
-   bool enc = 0;
+   int op = 0;
bool init = 0;
int err = 0;
  
@@ -870,11 +870,11 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
  
  		init = 1;

switch (con.op) {
+   case ALG_OP_VERIFY:
+   case ALG_OP_SIGN:
case ALG_OP_ENCRYPT:
-   enc = 1;
-   break;
case ALG_OP_DECRYPT:
-   enc = 0;
+   op = con.op;
break;
default:
return -EINVAL;
@@ -891,7 +891,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, 
size_t size,
}
  
  	if (init) {

-   ctx->enc = enc;
+   ctx->op = op;
if (con.iv)
memcpy(ctx->iv, con.iv->iv, ivsize);
  
diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c

index 516b38c3a169..77abc04cf942 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -60,7 +60,7 @@ static inline bool aead_sufficient_data(struct sock *sk)
 * The minimum amount of memory needed for an AEAD cipher is
 * the AAD and in case of decryption the tag.
 */
-   return ctx->used >= ctx->aead_assoclen + (ctx->enc ? 0 : as);
+   return ctx->used >= ctx->aead_assoclen + (ctx->op ? 0 : as);
  }
  
  static int aead_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)

@@ -137,7 +137,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr 
*msg,
 * buffer provides the tag which is consumed resulting in only the
 * plaintext without a buffer for the tag returned to the caller.
 */
-   if (ctx->enc)
+   if (ctx->op)
outlen = used + as;
else
outlen = used - as;
@@ -196,7 +196,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr 
*msg,
/* Use the RX SGL as source (and destination) for crypto op. */
src = areq->first_rsgl.sgl.sg;
  
-	if (ctx->enc) {

+   if (ctx->op == ALG_OP_ENCRYPT) {
/*
 * Encryption operation - The in-place cipher operation is
 * achieved by the following operation:
@@ -212,7 +212,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr 
*msg,
if (err)
goto free;
af_alg_pull_tsgl(sk, processed, NULL, 0);
-   } else {
+   } else if (ctx->op == ALG_OP_DECRYPT) {
/*
 * Decryption operation - To achieve an in-place cipher
 * operation, the following  SGL structure is used:
@@ -258,6 +258,9 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr 
*msg,
} else
/* no RX SGL present (e.g. authentication only) */
src = areq->tsgl;
+   } else {
+   err = -EOPNOTSUPP;
+   goto free;
}
  
  	/* Initialize the crypto operation */

@@ -272,19 +275,28 @@ static int _aead_recvmsg(struct socket *sock, struct 
msghdr *msg,
aead_request_set_callback(&areq->cra_u.aead_req,
  CRYPTO_TFM_REQ_MAY_BACKLOG,
  af_alg_async_cb, areq);
-   err = ctx->enc ? crypto_aead_encrypt(&areq->cra_u.aead_req) :
-crypto_aead_decrypt(&areq->cra_u.aead_req);
-   } else {
+   } else


Unbalanced braces around else statement.


/* Synchronous operation */
aead_request_set_callback(&areq->cra_u.aead_req,
  CRYPTO_TFM_REQ_MAY_BACKLOG,
  af_alg_complete, &ctx->completion);

Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Austin S. Hemmelgarn

On 2017-08-10 04:30, Eric Biggers wrote:

On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:


It can compress at speeds approaching lz4, and quality approaching lzma.


Well, for a very loose definition of "approaching", and certainly not at the
same time.  I doubt there's a use case for using the highest compression levels
in kernel mode --- especially the ones using zstd_opt.h.
Large data-sets with WORM access patterns and infrequent writes 
immediately come to mind as a use case for the highest compression level.


As a more specific example, the company I work for has a very large 
amount of documentation, and we keep all old versions.  This is all 
stored on a file server which is currently using BTRFS.  Once a document 
is written, it's almost never rewritten, so write performance only 
matters for the first write.  However, they're read back pretty 
frequently, so we need good read performance.  As of right now, the 
system is set to use LZO compression by default, and then when a new 
document is added, the previous version of that document gets 
re-compressed using zlib compression, which actually results in pretty 
significant space savings most of the time.  I would absolutely love to 
use zstd compression with this system with the highest compression 
level, because most people don't care how long it takes to write the 
file out, but they do care how long it takes to read a file (even if 
it's an older version).




The code was ported from the upstream zstd source repository.


What version?


`linux/zstd.h` header was modified to match linux kernel style.
The cross-platform and allocation code was stripped out. Instead zstd
requires the caller to pass a preallocated workspace. The source files
were clang-formatted [1] to match the Linux Kernel style as much as
possible.


It would be easier to compare to the upstream version if it was not all
reformatted.  There is a chance that bugs were introduced by Linux-specific
changes, and it would be nice if they could be easily reviewed.  (Also I don't
know what clang-format settings you used, but there are still a lot of
differences from the Linux coding style.)



I benchmarked zstd compression as a special character device. I ran zstd
and zlib compression at several levels, as well as performing no
compression, which measures the time spent copying the data to kernel space.
Data is passed to the compressor 4096 B at a time. The benchmark file is
located in the upstream zstd source repository under
`contrib/linux-kernel/zstd_compress_test.c` [2].

I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
16 GB of RAM, and a SSD. I benchmarked using `silesia.tar` [3], which is
211,988,480 B large. Run the following commands for the benchmark:

 sudo modprobe zstd_compress_test
 sudo mknod zstd_compress_test c 245 0
 sudo cp silesia.tar zstd_compress_test

The time is reported by the time of the userland `cp`.
The MB/s is computed with

 1,536,217,008 B / time(buffer size, hash)

which includes the time to copy from userland.
The Adjusted MB/s is computed with

 1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).

The memory reported is the amount of memory the compressor requests.

| Method   | Size (B) | Time (s) | Ratio | MB/s| Adj MB/s | Mem (MB) |
|--|--|--|---|-|--|--|
| none | 211988480 |0.100 | 1 | 2119.88 |- |- |
| zstd -1  | 73645762 |1.044 | 2.878 |  203.05 |   224.56 | 1.23 |
| zstd -3  | 66988878 |1.761 | 3.165 |  120.38 |   127.63 | 2.47 |
| zstd -5  | 65001259 |2.563 | 3.261 |   82.71 |86.07 | 2.86 |
| zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |16.13 |13.22 |
| zstd -15 | 58009756 |   47.601 | 3.654 |4.45 | 4.46 |21.61 |
| zstd -19 | 54014593 |  102.835 | 3.925 |2.06 | 2.06 |60.15 |
| zlib -1  | 77260026 |2.895 | 2.744 |   73.23 |75.85 | 0.27 |
| zlib -3  | 72972206 |4.116 | 2.905 |   51.50 |52.79 | 0.27 |
| zlib -6  | 68190360 |9.633 | 3.109 |   22.01 |22.24 | 0.27 |
| zlib -9  | 67613382 |   22.554 | 3.135 |9.40 | 9.44 | 0.27 |
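
As a cross-check, the MB/s and Adjusted MB/s columns follow directly from the
formulas above. A minimal user-space sketch, assuming the 211,988,480 B size
of `silesia.tar` as the numerator (the zstd -1 row reproduces to 203.05 and
224.56 MB/s):

```c
/* Reproduce the throughput columns of the benchmark table.  All inputs
 * are taken from the posted numbers: input size in bytes, wall-clock
 * times in seconds, and the "none" row as the copy-to-kernel baseline. */
static const double input_bytes = 211988480.0;

/* Raw MB/s: input size divided by the total time, including the copy
 * from user space. */
double mb_per_s(double seconds)
{
	return input_bytes / seconds / 1e6;
}

/* Adjusted MB/s: subtract the baseline copy time measured by the
 * "none" (no compression) run before dividing. */
double adj_mb_per_s(double seconds, double baseline)
{
	return input_bytes / (seconds - baseline) / 1e6;
}
```

Plugging in the zstd -1 row (1.044 s against the 0.100 s baseline) recovers
the tabulated 203.05 MB/s and 224.56 Adj MB/s.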



These benchmarks are misleading because they compress the whole file as a
single stream without resetting the dictionary, which isn't how data will
typically be compressed in kernel mode.  With filesystem compression the data
has to be divided into small chunks that can each be decompressed independently.
That eliminates one of the primary advantages of Zstandard (support for large
dictionary sizes).


Re: [PATCH v5 2/5] lib: Add zstd modules

2017-08-10 Thread Eric Biggers
On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
>
> It can compress at speeds approaching lz4, and quality approaching lzma.

Well, for a very loose definition of "approaching", and certainly not at the
same time.  I doubt there's a use case for using the highest compression levels
in kernel mode --- especially the ones using zstd_opt.h.

> 
> The code was ported from the upstream zstd source repository.

What version?

> `linux/zstd.h` header was modified to match linux kernel style.
> The cross-platform and allocation code was stripped out. Instead zstd
> requires the caller to pass a preallocated workspace. The source files
> were clang-formatted [1] to match the Linux Kernel style as much as
> possible. 

It would be easier to compare to the upstream version if it was not all
reformatted.  There is a chance that bugs were introduced by Linux-specific
changes, and it would be nice if they could be easily reviewed.  (Also I don't
know what clang-format settings you used, but there are still a lot of
differences from the Linux coding style.)

> 
> I benchmarked zstd compression as a special character device. I ran zstd
> and zlib compression at several levels, as well as performing no
> compression, which measures the time spent copying the data to kernel space.
> Data is passed to the compressor 4096 B at a time. The benchmark file is
> located in the upstream zstd source repository under
> `contrib/linux-kernel/zstd_compress_test.c` [2].
> 
> I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
> The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
> 16 GB of RAM, and a SSD. I benchmarked using `silesia.tar` [3], which is
> 211,988,480 B large. Run the following commands for the benchmark:
> 
> sudo modprobe zstd_compress_test
> sudo mknod zstd_compress_test c 245 0
> sudo cp silesia.tar zstd_compress_test
> 
> The time is reported by the time of the userland `cp`.
> The MB/s is computed with
> 
> 1,536,217,008 B / time(buffer size, hash)
> 
> which includes the time to copy from userland.
> The Adjusted MB/s is computed with
> 
> 1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).
> 
> The memory reported is the amount of memory the compressor requests.
> 
> | Method   | Size (B) | Time (s) | Ratio | MB/s| Adj MB/s | Mem (MB) |
> |--|--|--|---|-|--|--|
> | none | 211988480 |0.100 | 1 | 2119.88 |- |- |
> | zstd -1  | 73645762 |1.044 | 2.878 |  203.05 |   224.56 | 1.23 |
> | zstd -3  | 66988878 |1.761 | 3.165 |  120.38 |   127.63 | 2.47 |
> | zstd -5  | 65001259 |2.563 | 3.261 |   82.71 |86.07 | 2.86 |
> | zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |16.13 |13.22 |
> | zstd -15 | 58009756 |   47.601 | 3.654 |4.45 | 4.46 |21.61 |
> | zstd -19 | 54014593 |  102.835 | 3.925 |2.06 | 2.06 |60.15 |
> | zlib -1  | 77260026 |2.895 | 2.744 |   73.23 |75.85 | 0.27 |
> | zlib -3  | 72972206 |4.116 | 2.905 |   51.50 |52.79 | 0.27 |
> | zlib -6  | 68190360 |9.633 | 3.109 |   22.01 |22.24 | 0.27 |
> | zlib -9  | 67613382 |   22.554 | 3.135 |9.40 | 9.44 | 0.27 |
> 

These benchmarks are misleading because they compress the whole file as a
single stream without resetting the dictionary, which isn't how data will
typically be compressed in kernel mode.  With filesystem compression the data
has to be divided into small chunks that can each be decompressed independently.
That eliminates one of the primary advantages of Zstandard (support for large
dictionary sizes).

Eric


Re: [PATCH v2] crypto: AF_ALG - consolidation of duplicate code

2017-08-10 Thread Stephan Mueller
Am Donnerstag, 10. August 2017, 10:21:53 CEST schrieb Herbert Xu:

Hi Herbert,

> On Thu, Aug 10, 2017 at 10:16:48AM +0200, Stephan Mueller wrote:
> > Now that the AIO code path is updated, the bug I reported last
> > September, which allowed crashing the kernel via AF_ALG, is fixed.
> > 
> > As the patch is very invasive, I am not sure that patch set should be sent
> > to stable. How do you propose we fix the crash bug in older kernels that
> > are due to memory management problems in the AIO code path?
> 
> Is it possible to create a minimal fix for the stable kernels?

I think there is such patch already, see [1].

Your comment on that patch triggered my rewrite of the memory management code.

[1] https://www.spinics.net/lists/linux-crypto/msg21618.html

Ciao
Stephan


Re: [PATCH v2] crypto: AF_ALG - consolidation of duplicate code

2017-08-10 Thread Herbert Xu
On Thu, Aug 10, 2017 at 10:16:48AM +0200, Stephan Mueller wrote:
>
> Now that the AIO code path is updated, the bug I reported last
> September, which allowed crashing the kernel via AF_ALG, is fixed.
> 
> As the patch is very invasive, I am not sure that patch set should be sent to 
> stable. How do you propose we fix the crash bug in older kernels that are due 
> to memory management problems in the AIO code path?

Is it possible to create a minimal fix for the stable kernels?

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v2] crypto: AF_ALG - consolidation of duplicate code

2017-08-10 Thread Stephan Mueller
Am Mittwoch, 9. August 2017, 15:57:34 CEST schrieb Herbert Xu:

Hi Herbert,
> 
> Patch applied.  Thanks.

Thanks.

Now that the AIO code path is updated, the bug I reported last
September, which allowed crashing the kernel via AF_ALG, is fixed.

As the patch is very invasive, I am not sure that patch set should be sent to 
stable. How do you propose we fix the crash bug in older kernels that are due 
to memory management problems in the AIO code path?

Ciao
Stephan


[PATCH v8 3/4] crypto: AF_ALG -- add asymmetric cipher

2017-08-10 Thread Stephan Müller
This patch adds the user space interface for asymmetric ciphers. The
interface allows the use of sendmsg as well as vmsplice to provide data.

The akcipher interface implementation uses the common AF_ALG interface
code regarding TX and RX SGL handling.
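
From user space, binding to this interface would follow the usual AF_ALG
pattern with the `"akcipher"` type string this series registers. A minimal
sketch of preparing the bind address (the socket/bind/accept calls themselves
require a kernel with this series applied, so only the address setup is shown):

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

/* Fill a sockaddr_alg for an akcipher-type AF_ALG socket.  salg_type
 * "akcipher" is the type this (unmerged) series would register;
 * salg_name is the algorithm, e.g. "rsa". */
void akcipher_addr(struct sockaddr_alg *sa, const char *alg)
{
	memset(sa, 0, sizeof(*sa));
	sa->salg_family = AF_ALG;
	strncpy((char *)sa->salg_type, "akcipher", sizeof(sa->salg_type) - 1);
	strncpy((char *)sa->salg_name, alg, sizeof(sa->salg_name) - 1);
}
```

The prepared address would then be passed to bind() on a socket created with
socket(AF_ALG, SOCK_SEQPACKET, 0), followed by accept() to obtain the request
socket.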

Signed-off-by: Stephan Mueller 
---
 crypto/algif_akcipher.c | 466 
 include/crypto/if_alg.h |   2 +
 2 files changed, 468 insertions(+)
 create mode 100644 crypto/algif_akcipher.c

diff --git a/crypto/algif_akcipher.c b/crypto/algif_akcipher.c
new file mode 100644
index ..1b36eb0b6e8f
--- /dev/null
+++ b/crypto/algif_akcipher.c
@@ -0,0 +1,466 @@
+/*
+ * algif_akcipher: User-space interface for asymmetric cipher algorithms
+ *
+ * Copyright (C) 2017, Stephan Mueller 
+ *
+ * This file provides the user-space API for asymmetric ciphers.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * The following concept of the memory management is used:
+ *
+ * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is
+ * filled by user space with the data submitted via sendpage/sendmsg. Filling
+ * up the TX SGL does not cause a crypto operation -- the data will only be
+ * tracked by the kernel. Upon receipt of one recvmsg call, the caller must
+ * provide a buffer which is tracked with the RX SGL.
+ *
+ * During the processing of the recvmsg operation, the cipher request is
+ * allocated and prepared. As part of the recvmsg operation, the processed
+ * TX buffers are extracted from the TX SGL into a separate SGL.
+ *
+ * After the completion of the crypto operation, the RX SGL and the cipher
+ * request are released. The extracted TX SGL parts are released together with
+ * the RX SGL release.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct akcipher_tfm {
+   struct crypto_akcipher *akcipher;
+   bool has_key;
+};
+
+static int akcipher_sendmsg(struct socket *sock, struct msghdr *msg,
+   size_t size)
+{
+   return af_alg_sendmsg(sock, msg, size, 0);
+}
+
+static int _akcipher_recvmsg(struct socket *sock, struct msghdr *msg,
+size_t ignored, int flags)
+{
+   struct sock *sk = sock->sk;
+   struct alg_sock *ask = alg_sk(sk);
+   struct sock *psk = ask->parent;
+   struct alg_sock *pask = alg_sk(psk);
+   struct af_alg_ctx *ctx = ask->private;
+   struct akcipher_tfm *akc = pask->private;
+   struct crypto_akcipher *tfm = akc->akcipher;
+   struct af_alg_async_req *areq;
+   int err = 0;
+   int maxsize;
+   size_t len = 0;
+   size_t used = 0;
+
+   maxsize = crypto_akcipher_maxsize(tfm);
+   if (maxsize < 0)
+   return maxsize;
+
+   /* Allocate cipher request for current operation. */
+   areq = af_alg_alloc_areq(sk, sizeof(struct af_alg_async_req) +
+crypto_akcipher_reqsize(tfm));
+   if (IS_ERR(areq))
+   return PTR_ERR(areq);
+
+   /* convert iovecs of output buffers into RX SGL */
+   err = af_alg_get_rsgl(sk, msg, flags, areq, maxsize, &len);
+   if (err)
+   goto free;
+
+   /* ensure output buffer is sufficiently large */
+   if (len < maxsize) {
+   err = -EMSGSIZE;
+   goto free;
+   }
+
+   /*
+* Create a per request TX SGL for this request which tracks the
+* SG entries from the global TX SGL.
+*/
+   used = ctx->used;
+   areq->tsgl_entries = af_alg_count_tsgl(sk, used, 0);
+   if (!areq->tsgl_entries)
+   areq->tsgl_entries = 1;
+   areq->tsgl = sock_kmalloc(sk, sizeof(*areq->tsgl) * areq->tsgl_entries,
+ GFP_KERNEL);
+   if (!areq->tsgl) {
+   err = -ENOMEM;
+   goto free;
+   }
+   sg_init_table(areq->tsgl, areq->tsgl_entries);
+   af_alg_pull_tsgl(sk, used, areq->tsgl, 0);
+
+   /* Initialize the crypto operation */
+   akcipher_request_set_tfm(&areq->cra_u.akcipher_req, tfm);
+   akcipher_request_set_crypt(&areq->cra_u.akcipher_req, areq->tsgl,
+  areq->first_rsgl.sgl.sg, used, len);
+
+   if (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) {
+   /* AIO operation */
+   areq->iocb = msg->msg_iocb;
+   akcipher_request_set_callback(&areq->cra_u.akcipher_req,
+ CRYPTO_TFM_REQ_MAY_SLEEP,
+ af_alg_async_cb, areq);
+   } else
+   /* Synchronous operation */
+   

[PATCH v8 2/4] crypto: AF_ALG -- add setpubkey setsockopt call

2017-08-10 Thread Stephan Müller
For supporting asymmetric ciphers, user space must be able to set the
public key. The patch adds a new setsockopt call for setting the public
key.
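
Since mainline headers do not carry the new option, a user-space caller built
against this series would define it from the uapi value the patch adds. A
hedged sketch of the intended call shape (the setsockopt itself is shown only
as a comment, as it succeeds only on a kernel with this series applied):

```c
/* ALG_SET_PUBKEY as added to include/uapi/linux/if_alg.h by this
 * patch; defined locally because mainline headers do not have it. */
#ifndef ALG_SET_PUBKEY
#define ALG_SET_PUBKEY 6
#endif

/* Intended usage on the bound tfm socket, with key_der holding the
 * DER-encoded public key:
 *
 *     setsockopt(tfmfd, SOL_ALG, ALG_SET_PUBKEY, key_der, key_len);
 */
int alg_set_pubkey_optname(void)
{
	return ALG_SET_PUBKEY;
}
```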

Signed-off-by: Stephan Mueller 
---
 crypto/af_alg.c | 18 +-
 include/crypto/if_alg.h |  1 +
 include/uapi/linux/if_alg.h |  1 +
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index a35a9f854a04..176921d7593a 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -203,13 +203,17 @@ static int alg_bind(struct socket *sock, struct sockaddr 
*uaddr, int addr_len)
 }
 
 static int alg_setkey(struct sock *sk, char __user *ukey,
- unsigned int keylen)
+ unsigned int keylen,
+ int (*setkey)(void *private, const u8 *key,
+   unsigned int keylen))
 {
struct alg_sock *ask = alg_sk(sk);
-   const struct af_alg_type *type = ask->type;
u8 *key;
int err;
 
+   if (!setkey)
+   return -ENOPROTOOPT;
+
key = sock_kmalloc(sk, keylen, GFP_KERNEL);
if (!key)
return -ENOMEM;
@@ -218,7 +222,7 @@ static int alg_setkey(struct sock *sk, char __user *ukey,
if (copy_from_user(key, ukey, keylen))
goto out;
 
-   err = type->setkey(ask->private, key, keylen);
+   err = setkey(ask->private, key, keylen);
 
 out:
sock_kzfree_s(sk, key, keylen);
@@ -248,10 +252,14 @@ static int alg_setsockopt(struct socket *sock, int level, 
int optname,
case ALG_SET_KEY:
if (sock->state == SS_CONNECTED)
goto unlock;
-   if (!type->setkey)
+
+   err = alg_setkey(sk, optval, optlen, type->setkey);
+   break;
+   case ALG_SET_PUBKEY:
+   if (sock->state == SS_CONNECTED)
goto unlock;
 
-   err = alg_setkey(sk, optval, optlen);
+   err = alg_setkey(sk, optval, optlen, type->setpubkey);
break;
case ALG_SET_AEAD_AUTHSIZE:
if (sock->state == SS_CONNECTED)
diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
index 50a21488f3ba..d1de8ed3e77b 100644
--- a/include/crypto/if_alg.h
+++ b/include/crypto/if_alg.h
@@ -55,6 +55,7 @@ struct af_alg_type {
void *(*bind)(const char *name, u32 type, u32 mask);
void (*release)(void *private);
int (*setkey)(void *private, const u8 *key, unsigned int keylen);
+   int (*setpubkey)(void *private, const u8 *key, unsigned int keylen);
int (*accept)(void *private, struct sock *sk);
int (*accept_nokey)(void *private, struct sock *sk);
int (*setauthsize)(void *private, unsigned int authsize);
diff --git a/include/uapi/linux/if_alg.h b/include/uapi/linux/if_alg.h
index d81dcca5bdd7..02e61627e089 100644
--- a/include/uapi/linux/if_alg.h
+++ b/include/uapi/linux/if_alg.h
@@ -34,6 +34,7 @@ struct af_alg_iv {
 #define ALG_SET_OP 3
 #define ALG_SET_AEAD_ASSOCLEN  4
 #define ALG_SET_AEAD_AUTHSIZE  5
+#define ALG_SET_PUBKEY 6
 
 /* Operations */
 #define ALG_OP_DECRYPT 0
-- 
2.13.4




[PATCH v8 4/4] crypto: algif_akcipher - enable compilation

2017-08-10 Thread Stephan Müller
Add the Makefile and Kconfig updates to allow algif_akcipher to be
compiled.

Signed-off-by: Stephan Mueller 
---
 crypto/Kconfig  | 9 +
 crypto/Makefile | 1 +
 2 files changed, 10 insertions(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 0a121f9ddf8e..fdcec68545f3 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1760,6 +1760,15 @@ config CRYPTO_USER_API_AEAD
  This option enables the user-spaces interface for AEAD
  cipher algorithms.
 
+config CRYPTO_USER_API_AKCIPHER
+   tristate "User-space interface for asymmetric key cipher algorithms"
+   depends on NET
+   select CRYPTO_AKCIPHER2
+   select CRYPTO_USER_API
+   help
+ This option enables the user-spaces interface for asymmetric
+ key cipher algorithms.
+
 config CRYPTO_HASH_INFO
bool
 
diff --git a/crypto/Makefile b/crypto/Makefile
index d41f0331b085..12dbf2c5fe7c 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -133,6 +133,7 @@ obj-$(CONFIG_CRYPTO_USER_API_HASH) += algif_hash.o
 obj-$(CONFIG_CRYPTO_USER_API_SKCIPHER) += algif_skcipher.o
 obj-$(CONFIG_CRYPTO_USER_API_RNG) += algif_rng.o
 obj-$(CONFIG_CRYPTO_USER_API_AEAD) += algif_aead.o
+obj-$(CONFIG_CRYPTO_USER_API_AKCIPHER) += algif_akcipher.o
 
 ecdh_generic-y := ecc.o
 ecdh_generic-y += ecdh.o
-- 
2.13.4




[PATCH v8 0/4] crypto: add algif_akcipher user space API

2017-08-10 Thread Stephan Müller
Hi,

This patch set adds the AF_ALG user space API to externalize the
asymmetric cipher API recently added to the kernel crypto API.

The patch set is tested with the user space library of libkcapi [1].
Use [1] test/test.sh for a full test run. The test covers the
following scenarios:

* sendmsg of one IOVEC

* sendmsg of 16 IOVECs with non-linear buffer

* vmsplice of one IOVEC

* vmsplice of 15 IOVECs with non-linear buffer

* invoking multiple separate cipher operations with one
  open cipher handle

* encryption with private key (using vector from testmgr.h)

* encryption with public key (using vector from testmgr.h)

* decryption with private key (using vector from testmgr.h)

Note, to enable the test, edit line [2] from "4 99" to "4 13".

[1] http://www.chronox.de/libkcapi.html
[2] https://github.com/smuellerDD/libkcapi/blob/master/test/test.sh#L1452

Changes v8:
 * port to kernel 4.13
 * port to consolidated AF_ALG code

Stephan Mueller (4):
  crypto: AF_ALG -- add sign/verify API
  crypto: AF_ALG -- add setpubkey setsockopt call
  crypto: AF_ALG -- add asymmetric cipher
  crypto: algif_akcipher - enable compilation

 crypto/Kconfig  |   9 +
 crypto/Makefile |   1 +
 crypto/af_alg.c |  28 ++-
 crypto/algif_aead.c |  36 ++--
 crypto/algif_akcipher.c | 466 
 crypto/algif_skcipher.c |  26 ++-
 include/crypto/if_alg.h |   7 +-
 include/uapi/linux/if_alg.h |   3 +
 8 files changed, 543 insertions(+), 33 deletions(-)
 create mode 100644 crypto/algif_akcipher.c

-- 
2.13.4




[PATCH v8 1/4] crypto: AF_ALG -- add sign/verify API

2017-08-10 Thread Stephan Müller
Add the flags for handling signature generation and signature
verification.

The af_alg helper code as well as the algif_skcipher and algif_aead code
must be changed from a boolean indicating the cipher operation to an
integer because there are now 4 different cipher operations that are
defined. Yet, the algif_aead and algif_skcipher code still only allows
encryption and decryption cipher operations.
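
The operation is selected per-request via an ALG_SET_OP control message on
sendmsg(). A minimal sketch of building that ancillary data; note the
ALG_OP_SIGN value is an assumption taken from this series (it is not in
mainline headers), and only the cmsg construction is exercised here:

```c
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_ALG
#define SOL_ALG 279
#endif
#define ALG_SET_OP 3   /* from include/uapi/linux/if_alg.h */
#define ALG_OP_SIGN 2  /* assumed value introduced by this series */

/* Prepare the ancillary data that would accompany sendmsg() on an
 * AF_ALG request socket to select the cipher operation.  Returns the
 * stored cmsg_type so the layout can be verified. */
int prepare_op_cmsg(char *cbuf, size_t buflen, int op)
{
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(cbuf, 0, buflen);
	msg.msg_control = cbuf;
	msg.msg_controllen = buflen;

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_ALG;
	cmsg->cmsg_type = ALG_SET_OP;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &op, sizeof(op));
	return cmsg->cmsg_type;
}
```

The prepared msghdr would then carry the data to be signed in msg_iov and be
handed to sendmsg() on the accept()ed request socket.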

Signed-off-by: Stephan Mueller 
Signed-off-by: Tadeusz Struk 
---
 crypto/af_alg.c | 10 +-
 crypto/algif_aead.c | 36 
 crypto/algif_skcipher.c | 26 +-
 include/crypto/if_alg.h |  4 ++--
 include/uapi/linux/if_alg.h |  2 ++
 5 files changed, 50 insertions(+), 28 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index d6936c0e08d9..a35a9f854a04 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -859,7 +859,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, 
size_t size,
struct af_alg_tsgl *sgl;
struct af_alg_control con = {};
long copied = 0;
-   bool enc = 0;
+   int op = 0;
bool init = 0;
int err = 0;
 
@@ -870,11 +870,11 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr 
*msg, size_t size,
 
init = 1;
switch (con.op) {
+   case ALG_OP_VERIFY:
+   case ALG_OP_SIGN:
case ALG_OP_ENCRYPT:
-   enc = 1;
-   break;
case ALG_OP_DECRYPT:
-   enc = 0;
+   op = con.op;
break;
default:
return -EINVAL;
@@ -891,7 +891,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, 
size_t size,
}
 
if (init) {
-   ctx->enc = enc;
+   ctx->op = op;
if (con.iv)
memcpy(ctx->iv, con.iv->iv, ivsize);
 
diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index 516b38c3a169..77abc04cf942 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -60,7 +60,7 @@ static inline bool aead_sufficient_data(struct sock *sk)
 * The minimum amount of memory needed for an AEAD cipher is
 * the AAD and in case of decryption the tag.
 */
-   return ctx->used >= ctx->aead_assoclen + (ctx->enc ? 0 : as);
+   return ctx->used >= ctx->aead_assoclen + (ctx->op ? 0 : as);
 }
 
 static int aead_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
@@ -137,7 +137,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr 
*msg,
 * buffer provides the tag which is consumed resulting in only the
 * plaintext without a buffer for the tag returned to the caller.
 */
-   if (ctx->enc)
+   if (ctx->op)
outlen = used + as;
else
outlen = used - as;
@@ -196,7 +196,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr 
*msg,
/* Use the RX SGL as source (and destination) for crypto op. */
src = areq->first_rsgl.sgl.sg;
 
-   if (ctx->enc) {
+   if (ctx->op == ALG_OP_ENCRYPT) {
/*
 * Encryption operation - The in-place cipher operation is
 * achieved by the following operation:
@@ -212,7 +212,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr 
*msg,
if (err)
goto free;
af_alg_pull_tsgl(sk, processed, NULL, 0);
-   } else {
+   } else if (ctx->op == ALG_OP_DECRYPT) {
/*
 * Decryption operation - To achieve an in-place cipher
 * operation, the following  SGL structure is used:
@@ -258,6 +258,9 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr 
*msg,
} else
/* no RX SGL present (e.g. authentication only) */
src = areq->tsgl;
+   } else {
+   err = -EOPNOTSUPP;
+   goto free;
}
 
/* Initialize the crypto operation */
@@ -272,19 +275,28 @@ static int _aead_recvmsg(struct socket *sock, struct 
msghdr *msg,
aead_request_set_callback(&areq->cra_u.aead_req,
  CRYPTO_TFM_REQ_MAY_BACKLOG,
  af_alg_async_cb, areq);
-   err = ctx->enc ? crypto_aead_encrypt(&areq->cra_u.aead_req) :
-crypto_aead_decrypt(&areq->cra_u.aead_req);
-   } else {
+   } else
/* Synchronous operation */
aead_request_set_callback(&areq->cra_u.aead_req,
  CRYPTO_TFM_REQ_MAY_BACKLOG,
  af_alg_complete, &ctx->completion);
-   err = af_alg_wait_for_completion(ctx->enc ?
-

[PATCH] crypto: MPI - kunmap after finishing accessing buffer

2017-08-10 Thread Stephan Müller
Hi Herbert,

I found that issue while playing around with edge conditions in my
algif_akcipher implementation. The issue only manifests as a
segmentation violation on 32-bit machines, and only with an SGL where
each SG entry points to a single byte; SGLs with larger buffers do not
appear to be affected.

Yet this access-after-unmap should be a candidate for stable, IMHO.

---8<---

Using sg_miter_start and sg_miter_next, the buffer of an SG is kmap'ed
to *buff. The current code calls sg_miter_stop (and thus kunmap) on the
SG entry before the last access of *buff.

The patch moves the sg_miter_stop call after the last access to *buff to
ensure that the memory pointed to by *buff is still mapped.

Signed-off-by: Stephan Mueller 
---
 lib/mpi/mpicoder.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/mpi/mpicoder.c b/lib/mpi/mpicoder.c
index 5a0f75a3bf01..eead4b339466 100644
--- a/lib/mpi/mpicoder.c
+++ b/lib/mpi/mpicoder.c
@@ -364,11 +364,11 @@ MPI mpi_read_raw_from_sgl(struct scatterlist *sgl, 
unsigned int nbytes)
}
 
miter.consumed = lzeros;
-   sg_miter_stop(&miter);
 
nbytes -= lzeros;
nbits = nbytes * 8;
if (nbits > MAX_EXTERN_MPI_BITS) {
+   sg_miter_stop(&miter);
pr_info("MPI: mpi too large (%u bits)\n", nbits);
return NULL;
}
@@ -376,6 +376,8 @@ MPI mpi_read_raw_from_sgl(struct scatterlist *sgl, unsigned 
int nbytes)
if (nbytes > 0)
nbits -= count_leading_zeros(*buff) - (BITS_PER_LONG - 8);
 
+   sg_miter_stop(&miter);
+
nlimbs = DIV_ROUND_UP(nbytes, BYTES_PER_MPI_LIMB);
val = mpi_alloc(nlimbs);
if (!val)
-- 
2.13.4
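
For reference, the leading-zero arithmetic in the hunk above -- the
`count_leading_zeros(*buff) - (BITS_PER_LONG - 8)` term that trims nbits to
the effective length of the most significant byte -- can be exercised in user
space. A sketch using GCC's `__builtin_clzl` as a stand-in for the kernel
helper (undefined for zero input, like the value it models, so callers must
pass a nonzero byte):

```c
#include <limits.h>

/* Effective bit length of the most significant (nonzero) byte of the
 * MPI, mirroring mpi_read_raw_from_sgl(): count_leading_zeros() works
 * on an unsigned long, so the byte's own leading zeros are the clz
 * result minus the (BITS_PER_LONG - 8) bits above it. */
unsigned int msbyte_bits(unsigned char b)
{
	unsigned int clz = (unsigned int)__builtin_clzl((unsigned long)b);

	return 8 - (clz - (sizeof(long) * CHAR_BIT - 8));
}
```

For example, a top byte of 0x80 contributes all 8 bits, while 0x01
contributes a single bit, matching the nbits adjustment in the patch.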