Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-12 Thread Minchan Kim
Hi Sven,

On Sun, Feb 12, 2017 at 12:16:17PM +0100, Sven Schmidt wrote:
> 
> 
> 
> On 02/10/2017 01:13 AM, Minchan Kim wrote:
> > Hello Sven,
> >
> > On Thu, Feb 09, 2017 at 11:56:17AM +0100, Sven Schmidt wrote:
> >> Hey Minchan,
> >>
> >> On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote:
> >>> Hello Sven,
> >>>
> >>> On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote:
> 
>  This patchset is for updating the LZ4 compression module to a version based
>  on LZ4 v1.7.3, allowing use of the fast compression algorithm aka LZ4 fast,
>  which provides an "acceleration" parameter as a tradeoff between
>  high compression ratio and high compression speed.
> 
>  We want to use LZ4 fast in order to support compression in lustre
>  and (mostly, based on that) investigate data reduction techniques on behalf
>  of storage systems.
> 
>  Also, it will be useful for other users of LZ4 compression, as with LZ4 fast
>  it is possible for applications to use fast and/or high compression
>  depending on the use case.
>  For instance, ZRAM offers an LZ4 backend and could benefit from an updated
>  LZ4 in the kernel.
> 
>  LZ4 homepage: http://www.lz4.org/
>  LZ4 source repository: https://github.com/lz4/lz4
>  Source version: 1.7.3
> 
>  Benchmark (taken from [1], Core i5-4300U @1.9GHz):
>  ------------|--------------|----------------|------
>  Compressor  | Compression  | Decompression  | Ratio
>  ------------|--------------|----------------|------
>  memcpy      |  4200 MB/s   |  4200 MB/s     | 1.000
>  LZ4 fast 50 |  1080 MB/s   |  2650 MB/s     | 1.375
>  LZ4 fast 17 |   680 MB/s   |  2220 MB/s     | 1.607
>  LZ4 fast 5  |   475 MB/s   |  1920 MB/s     | 1.886
>  LZ4 default |   385 MB/s   |  1850 MB/s     | 2.101
> 
>  [1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html
> 
>  [PATCH 1/5] lib: Update LZ4 compressor module
>  [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 module version
>  [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version
>  [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new LZ4 version
>  [PATCH 5/5] lib/lz4: Remove back-compat wrappers
> >>>
> >>> Today, I did zram-lz4 performance test with fio in current mmotm and
> >>> found it makes regression about 20%.
> >>>
> >>> "lz4-update" means current mmots(git://git.cmpxchg.org/linux-mmots.git) so
> >>> applied your 5 patches. (But now sure current mmots has recent uptodate
> >>> patches)
> >>> "revert" means I reverted your 5 patches in current mmots.
> >>>
> >>>                        revert    lz4-update
> >>>
> >>>       seq-write          1547          1339    86.55%
> >>>      rand-write         22775         19381    85.10%
> >>>        seq-read          7035          5589    79.45%
> >>>       rand-read         78556         68479    87.17%
> >>>    mixed-seq(R)          1305          1066    81.69%
> >>>    mixed-seq(W)          1205           984    81.66%
> >>>   mixed-rand(R)         17421         14993    86.06%
> >>>   mixed-rand(W)         17391         14968    86.07%
> >>
> >> Which parts of the output (and which units) are these values exactly?
> >> I have not worked with fio until now, so I thought I should ask before
> >> misinterpreting my results.
> >
> > It is IOPS.
> >
> >>  
> >>> My fio description file
> >>>
> >>> [global]
> >>> bs=4k
> >>> ioengine=sync
> >>> size=100m
> >>> numjobs=1
> >>> group_reporting
> >>> buffer_compress_percentage=30
> >>> scramble_buffers=0
> >>> filename=/dev/zram0
> >>> loops=10
> >>> fsync_on_close=1
> >>>
> >>> [seq-write]
> >>> bs=64k
> >>> rw=write
> >>> stonewall
> >>>
> >>> [rand-write]
> >>> rw=randwrite
> >>> stonewall
> >>>
> >>> [seq-read]
> >>> bs=64k
> >>> rw=read
> >>> stonewall
> >>>
> >>> [rand-read]
> >>> rw=randread
> >>> stonewall
> >>>
> >>> [mixed-seq]
> >>> bs=64k
> >>> rw=rw
> >>> stonewall
> >>>
> >>> [mixed-rand]
> >>> rw=randrw
> >>> stonewall
> >>>
> >>
> >> Great, this makes it easy for me to reproduce your test.
> >
> > If you have trouble reproducing it, feel free to ask me. I'm happy to test
> > it. :)
> >
> > Thanks!
> >
> 
> Hi Minchan,
> 
> I will send an updated patch as a reply to this e-mail. I would be really
> grateful if you'd test it and provide feedback!
> The patch should be applied to the current mmots tree.
> 
> In fact, the updated LZ4 _is_ slower than the current one in the kernel, but
> I was not able to reproduce regressions as large as yours. I now tried to
> define FORCE_INLINE as Eric suggested. I also inlined some functions which
> aren't inline in upstream LZ4 but are defined as macros in the current kernel
> LZ4. The approach of replacing LZ4_ARCH64 with a function call _seemed_ to
> behave worse than the macro, so I withdrew the change.

Re: linux-next: build warnings after merge of the crypto tree

2017-02-12 Thread Stephen Rothwell
Hi Herbert,

On Sat, 11 Feb 2017 18:56:21 +0800 Herbert Xu  
wrote:
>
> On Fri, Feb 10, 2017 at 02:12:51PM +1100, Stephen Rothwell wrote:
> >
> > I am still getting these warnings ... I have seen no updates to the
> > crypto tree since Feb 2.
> 
> Sorry Stephen.  I have now applied Arnd's fixes for this problem
> and it should be pushed out.

Thanks, it's much cleaner now. :-)

-- 
Cheers,
Stephen Rothwell


Re: [PATCH] lz4: fix performance regressions

2017-02-12 Thread Eric Biggers
Hi Sven,

On Sun, Feb 12, 2017 at 12:16:18PM +0100, Sven Schmidt wrote:
>  /*-************************************
>   *   Reading and writing into memory
>   **************************************/
> +typedef union {
> + U16 u16;
> + U32 u32;
> + size_t uArch;
> +} __packed unalign;
> 
> -static inline U16 LZ4_read16(const void *memPtr)
> +static FORCE_INLINE __maybe_unused U16 LZ4_read16(const void *ptr)
>  {
> - U16 val;
> -
> - memcpy(&val, memPtr, sizeof(val));
> -
> - return val;
> + return ((const unalign *)ptr)->u16;
>  }
> 
> -static inline U32 LZ4_read32(const void *memPtr)
> +static FORCE_INLINE __maybe_unused U32 LZ4_read32(const void *ptr)
>  {
> - U32 val;
> -
> - memcpy(&val, memPtr, sizeof(val));
> -
> - return val;
> + return ((const unalign *)ptr)->u32;
>  }
> 
> -static inline size_t LZ4_read_ARCH(const void *memPtr)
> +static FORCE_INLINE __maybe_unused size_t LZ4_read_ARCH(const void *ptr)
>  {
> - size_t val;
> -
> - memcpy(&val, memPtr, sizeof(val));
> -
> - return val;
> + return ((const unalign *)ptr)->uArch;
>  }
> 
> -static inline void LZ4_write16(void *memPtr, U16 value)
> +static FORCE_INLINE __maybe_unused void LZ4_write16(void *memPtr, U16 value)
>  {
> - memcpy(memPtr, &value, sizeof(value));
> + ((unalign *)memPtr)->u16 = value;
>  }
> 
> -static inline void LZ4_write32(void *memPtr, U32 value)
> -{
> - memcpy(memPtr, &value, sizeof(value));
> +static FORCE_INLINE __maybe_unused void LZ4_write32(void *memPtr, U32 value)
> +{
> + ((unalign *)memPtr)->u32 = value;
>  }
> 
> -static inline U16 LZ4_readLE16(const void *memPtr)
> +static FORCE_INLINE __maybe_unused U16 LZ4_readLE16(const void *memPtr)
>  {
> -#ifdef __LITTLE_ENDIAN__
> +#if LZ4_LITTLE_ENDIAN
>   return LZ4_read16(memPtr);
>  #else
>   const BYTE *p = (const BYTE *)memPtr;
> @@ -137,19 +143,19 @@ static inline U16 LZ4_readLE16(const void *memPtr)
>  #endif
>  }

Since upstream LZ4 is intended to be compiled at -O3, this may allow it to get
away with using memcpy() for unaligned memory accesses.  The reason it uses
memcpy() is that, other than a byte-by-byte copy, it is the only portable way to
express unaligned memory accesses.  But the Linux kernel is sometimes compiled
optimized for size (-Os), and I wouldn't be *too* surprised if some of the
memcpy()'s don't always get inlined then, which could be causing the performance
regression being observed.  (Of course, this could be verified by checking
whether CONFIG_CC_OPTIMIZE_FOR_SIZE=y is set, then reading the assembly.)

But I don't think accessing a __packed structure directly is the right
alternative.  Instead, Linux already includes macros for unaligned memory
accesses which have been optimized for every supported architecture.  Those
should just be used instead, e.g. like this:

static FORCE_INLINE U16 LZ4_read16(const void *ptr)
{
return get_unaligned((const u16 *)ptr);
}

static FORCE_INLINE U32 LZ4_read32(const void *ptr)
{
return get_unaligned((const u32 *)ptr);
}

static FORCE_INLINE size_t LZ4_read_ARCH(const void *ptr)
{
return get_unaligned((const size_t *)ptr);
}

static FORCE_INLINE void LZ4_write16(void *memPtr, U16 value)
{
put_unaligned(value, (u16 *)memPtr);
}

static FORCE_INLINE void LZ4_write32(void *memPtr, U32 value)
{
put_unaligned(value, (u32 *)memPtr);
}

static FORCE_INLINE U16 LZ4_readLE16(const void *memPtr)
{
return get_unaligned_le16(memPtr);
}

static FORCE_INLINE void LZ4_writeLE16(void *memPtr, U16 value)
{
return put_unaligned_le16(value, memPtr);
}

static FORCE_INLINE void LZ4_copy8(void *dst, const void *src)
{
if (LZ4_64bits()) {
u64 a = get_unaligned((const u64 *)src);
put_unaligned(a, (u64 *)dst);
} else {
u32 a = get_unaligned((const u32 *)src);
u32 b = get_unaligned((const u32 *)src + 1);
put_unaligned(a, (u32 *)dst);
put_unaligned(b, (u32 *)dst + 1);
}
}


Note that I dropped __maybe_unused as it's not needed on inline functions.
That should be done everywhere else the patch proposes to add it too.

> -#if LZ4_ARCH64
> -#ifdef __BIG_ENDIAN__
> -#define LZ4_NBCOMMONBYTES(val) (__builtin_clzll(val) >> 3)
> +static FORCE_INLINE unsigned int LZ4_NbCommonBytes(register size_t val)
> +{
> +#if LZ4_LITTLE_ENDIAN
> +#if LZ4_ARCH64 /* 64 Bits Little Endian */
> +#if defined(LZ4_FORCE_SW_BITCOUNT)
> + static const int DeBruijnBytePos[64] = {
> + 0, 0, 0, 0, 0, 1, 1, 2, 0, 3, 1, 3, 1, 4, 2, 7,
> + 0, 2, 3, 6, 1, 5, 3, 5, 1, 3, 4, 4, 2, 5, 6, 7,
> + 7, 0, 1, 2, 3, 3, 4, 6, 2, 6, 5, 5, 3, 4, 5, 6,
> + 7, 1, 2, 4, 6, 4, 4, 5, 7, 2, 6, 5, 7, 6, 7, 7
> + };
> +
> + return DeBruijnBytePos[((U64)((val & -(long long)val)
> + * 0x0218A392CDABBD3FULL)) >> 58];
>  #else
> -#define LZ4_NBCOMMONBYTES(val) (__builtin_
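For anyone puzzled by the table in the hunk above: on a 64-bit little-endian
machine the De Bruijn lookup is the software fallback for "how many low-order
bytes of val are zero", and it should agree with the hardware bit-count path
the existing kernel macro uses (__builtin_ctzll(val) >> 3 on little endian,
__builtin_clzll() for the big-endian case shown in the removed line). A
standalone userspace sanity check, not kernel code, reusing the constant and
table quoted above:

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Table and multiplier copied verbatim from the hunk above. */
static const int DeBruijnBytePos[64] = {
	0, 0, 0, 0, 0, 1, 1, 2, 0, 3, 1, 3, 1, 4, 2, 7,
	0, 2, 3, 6, 1, 5, 3, 5, 1, 3, 4, 4, 2, 5, 6, 7,
	7, 0, 1, 2, 3, 3, 4, 6, 2, 6, 5, 5, 3, 4, 5, 6,
	7, 1, 2, 4, 6, 4, 4, 5, 7, 2, 6, 5, 7, 6, 7, 7
};

/* Software fallback: isolate the lowest set bit, multiply by a De Bruijn
 * constant, and use the top 6 bits of the product as a table index. */
static unsigned int nb_common_bytes_sw(uint64_t val)
{
	return DeBruijnBytePos[((val & (0 - val)) * 0x0218A392CDABBD3FULL) >> 58];
}

/* Hardware path: count trailing zero bits, divide by 8 to get whole bytes. */
static unsigned int nb_common_bytes_hw(uint64_t val)
{
	return (unsigned int)(__builtin_ctzll(val) >> 3);
}

int main(void)
{
	const uint64_t samples[] = {
		1, 0x80, 0x100, 0x0123456789abcd00ULL, 0xdeadbeef00000000ULL
	};
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		assert(nb_common_bytes_sw(samples[i]) == nb_common_bytes_hw(samples[i]));
		printf("val=%#018llx -> %u common bytes\n",
		       (unsigned long long)samples[i], nb_common_bytes_sw(samples[i]));
	}
	return 0;
}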

Re: [PATCH] lz4: fix performance regressions

2017-02-12 Thread Willy Tarreau
On Sun, Feb 12, 2017 at 04:20:00PM +0100, Sven Schmidt wrote:
> On Sun, Feb 12, 2017 at 02:05:08PM +0100, Willy Tarreau wrote:
> > Hi Sven,
> > 
> > On Sun, Feb 12, 2017 at 12:16:18PM +0100, Sven Schmidt wrote:
> > > Fix performance regressions compared to current kernel LZ4
> > 
> > Your patch contains mostly style cleanups which certainly are welcome
> > but make the whole patch hard to review. These cleanups would have been
> > better placed in a separate, preliminary patch IMHO.
> > 
> > Regards,
> > Willy
> 
> Hi Willy,
> 
> The problem was that I wanted to compare my version to the upstream LZ4 to
> find bugs (as with my last patch version: the wrong indentation in two for
> loops in LZ4HC). But since the LZ4 code is a pain to read, I made additional
> style cleanups "on the way".

Oh I can easily understand!

> I hope you can manage to review the patch though, because it is difficult to
> separate the cleanups now.

When I need to split a patch into pieces, usually what I do is revert it,
re-apply it without committing, run "git add -p", stage only the hunks that
belong in the first patch (i.e. here the cleanups), commit, then commit the
rest as a separate patch. It seems to me that the fix is in the last few
hunks, though I'm not sure yet.

Thanks,
Willy


Re: [PATCH] lz4: fix performance regressions

2017-02-12 Thread Sven Schmidt
On Sun, Feb 12, 2017 at 02:05:08PM +0100, Willy Tarreau wrote:
> Hi Sven,
> 
> On Sun, Feb 12, 2017 at 12:16:18PM +0100, Sven Schmidt wrote:
> > Fix performance regressions compared to current kernel LZ4
> 
> Your patch contains mostly style cleanups which certainly are welcome
> but make the whole patch hard to review. These cleanups would have been
> better placed in a separate, preliminary patch IMHO.
> 
> Regards,
> Willy

Hi Willy,

The problem was that I wanted to compare my version to the upstream LZ4 to
find bugs (as with my last patch version: the wrong indentation in two for
loops in LZ4HC). But since the LZ4 code is a pain to read, I made additional
style cleanups "on the way".
I hope you can manage to review the patch though, because it is difficult to
separate the cleanups now.
Please feel free to ask if you stumble upon something.

Greetings,

Sven 


[PATCH RFC] crypto: testmgr drop wrong init_completion

2017-02-12 Thread Nicholas Mc Guire
init_completion() is called here to reinitialize a completion object that was
already re-initialized in wait_async_op() via reinit_completion() after
complete() (called through tcrypt_complete()) had run and
wait_for_completion() returned, so there is no need to reinitialize it here.


Fixes: commit 946cc46372dc ("crypto: testmgr - add tests vectors for RSA")
Signed-off-by: Nicholas Mc Guire 
---

Found by experimental coccinelle script
./crypto/testmgr.c:2174:1-16: WARNING: possible duplicate init_completion

Only based on code review, no testing. In case I am overlooking something
and the re-initialization of the completion object is actually needed, it
should be using reinit_completion() and not init_completion() anyway.
But as wait_async_op() leaves the completion object re-initialized,
it really should not be needed here (I found no path in between that could
have called complete()).

Patch was only compile tested with: x86_64_defconfig (implies cryptomgr-y)

Patch is against linux-4.10-rc6 (localversion-next is next-20170210)

 crypto/testmgr.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 98eb097..15fb453 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -2171,7 +2171,6 @@ static int test_akcipher_one(struct crypto_akcipher *tfm,
 
sg_init_one(&src, xbuf[0], vecs->c_size);
sg_init_one(&dst, outbuf_dec, out_len_max);
-   init_completion(&result.completion);
akcipher_request_set_crypt(req, &src, &dst, vecs->c_size, out_len_max);
 
/* Run RSA decrypt - m = c^d mod n;*/
-- 
2.1.4
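
For context, the wait pattern the commit message refers to looks roughly like
the sketch below (hypothetical names; assuming only the standard completion
API, not the actual wait_async_op() code): the completion is initialized once
when the result struct is set up, and then re-armed with reinit_completion()
after every wait, which is why another init_completion() before the next
request is redundant.

#include <linux/completion.h>
#include <linux/errno.h>

struct demo_result {			/* hypothetical stand-in for the result struct */
	struct completion completion;
	int err;
};

/* Wait for an async crypto op to finish, then re-arm the completion so the
 * same object can be reused for the next request without init_completion(). */
static int demo_wait_async_op(struct demo_result *res, int ret)
{
	if (ret == -EINPROGRESS || ret == -EBUSY) {
		wait_for_completion(&res->completion);
		reinit_completion(&res->completion);
		ret = res->err;
	}
	return ret;
}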



Re: [PATCH] lz4: fix performance regressions

2017-02-12 Thread Willy Tarreau
Hi Sven,

On Sun, Feb 12, 2017 at 12:16:18PM +0100, Sven Schmidt wrote:
> Fix performance regressions compared to current kernel LZ4

Your patch contains mostly style cleanups which certainly are welcome
but make the whole patch hard to review. These cleanups would have been
better placed in a separate, preliminary patch IMHO.

Regards,
Willy


[PATCH] lz4: fix performance regressions

2017-02-12 Thread Sven Schmidt
Fix performance regressions compared to current kernel LZ4

Signed-off-by: Sven Schmidt <4ssch...@informatik.uni-hamburg.de>
---
 include/linux/lz4.h      |   2 +-
 lib/lz4/lz4_compress.c   | 157 +++-
 lib/lz4/lz4_decompress.c |  50 
 lib/lz4/lz4defs.h        | 203 ---
 lib/lz4/lz4hc_compress.c |   8 +-
 5 files changed, 281 insertions(+), 139 deletions(-)

diff --git a/include/linux/lz4.h b/include/linux/lz4.h
index a3912d7..394e3d9 100644
--- a/include/linux/lz4.h
+++ b/include/linux/lz4.h
@@ -82,7 +82,7 @@
 /*-************************************
  * STREAMING CONSTANTS AND STRUCTURES
  **************************************/
-#define LZ4_STREAMSIZE_U64 ((1 << (LZ4_MEMORY_USAGE-3)) + 4)
+#define LZ4_STREAMSIZE_U64 ((1 << (LZ4_MEMORY_USAGE - 3)) + 4)
 #define LZ4_STREAMSIZE (LZ4_STREAMSIZE_U64 * sizeof(unsigned long long))

 #define LZ4_STREAMHCSIZE        262192
diff --git a/lib/lz4/lz4_compress.c b/lib/lz4/lz4_compress.c
index 697dbda..2cbbf99 100644
--- a/lib/lz4/lz4_compress.c
+++ b/lib/lz4/lz4_compress.c
@@ -39,27 +39,33 @@
 #include 
 #include 

+static const int LZ4_minLength = (MFLIMIT + 1);
+static const int LZ4_64Klimit = ((64 * KB) + (MFLIMIT - 1));
+
 /*-******************************
  * Compression functions
  ********************************/
-static U32 LZ4_hash4(U32 sequence, tableType_t const tableType)
+static FORCE_INLINE U32 LZ4_hash4(
+   U32 sequence,
+   tableType_t const tableType)
 {
if (tableType == byU16)
return ((sequence * 2654435761U)
-   >> ((MINMATCH*8) - (LZ4_HASHLOG + 1)));
+   >> ((MINMATCH * 8) - (LZ4_HASHLOG + 1)));
else
return ((sequence * 2654435761U)
-   >> ((MINMATCH*8) - LZ4_HASHLOG));
+   >> ((MINMATCH * 8) - LZ4_HASHLOG));
 }

-#if LZ4_ARCH64
-static U32 LZ4_hash5(U64 sequence, tableType_t const tableType)
+static FORCE_INLINE __maybe_unused U32 LZ4_hash5(
+   U64 sequence,
+   tableType_t const tableType)
 {
const U32 hashLog = (tableType == byU16)
? LZ4_HASHLOG + 1
: LZ4_HASHLOG;

-#ifdef __LITTLE_ENDIAN__
+#if LZ4_LITTLE_ENDIAN
static const U64 prime5bytes = 889523592379ULL;

return (U32)(((sequence << 24) * prime5bytes) >> (64 - hashLog));
@@ -69,9 +75,10 @@ static U32 LZ4_hash5(U64 sequence, tableType_t const tableType)
return (U32)(((sequence >> 24) * prime8bytes) >> (64 - hashLog));
 #endif
 }
-#endif

-static U32 LZ4_hashPosition(const void *p, tableType_t tableType)
+static FORCE_INLINE U32 LZ4_hashPosition(
+   const void *p,
+   tableType_t const tableType)
 {
 #if LZ4_ARCH64
if (tableType == byU32)
@@ -81,8 +88,12 @@ static U32 LZ4_hashPosition(const void *p, tableType_t tableType)
return LZ4_hash4(LZ4_read32(p), tableType);
 }

-static void LZ4_putPositionOnHash(const BYTE *p, U32 h, void *tableBase,
-   tableType_t const tableType, const BYTE *srcBase)
+static void LZ4_putPositionOnHash(
+   const BYTE *p,
+   U32 h,
+   void *tableBase,
+   tableType_t const tableType,
+   const BYTE *srcBase)
 {
switch (tableType) {
case byPtr:
@@ -109,16 +120,22 @@ static void LZ4_putPositionOnHash(const BYTE *p, U32 h, void *tableBase,
}
 }

-static inline void LZ4_putPosition(const BYTE *p, void *tableBase,
-   tableType_t tableType, const BYTE *srcBase)
+static FORCE_INLINE void LZ4_putPosition(
+   const BYTE *p,
+   void *tableBase,
+   tableType_t tableType,
+   const BYTE *srcBase)
 {
U32 const h = LZ4_hashPosition(p, tableType);

LZ4_putPositionOnHash(p, h, tableBase, tableType, srcBase);
 }

-static const BYTE *LZ4_getPositionOnHash(U32 h, void *tableBase,
-   tableType_t tableType, const BYTE *srcBase)
+static const BYTE *LZ4_getPositionOnHash(
+   U32 h,
+   void *tableBase,
+   tableType_t tableType,
+   const BYTE *srcBase)
 {
if (tableType == byPtr) {
const BYTE **hashTable = (const BYTE **) tableBase;
@@ -135,12 +152,16 @@ static const BYTE *LZ4_getPositionOnHash(U32 h, void *tableBase,
{
/* default, to ensure a return */
const U16 * const hashTable = (U16 *) tableBase;
+
return hashTable[h] + srcBase;
}
 }

-static inline const BYTE *LZ4_getPosition(const BYTE *p, void *tableBase,
-   tableType_t tableType, const BYTE *srcBase)
+static FORCE_INLINE const BYTE *LZ4_getPosition(
+   const BYTE *p,
+   void *tableBase,
+   tableType_t tableType,
+   const BYTE *srcBase)
 {
U32 const h = LZ4_hashPosition(p, tableType);

@@ -152,7 +173,7 @@ static inline const BYTE *LZ4_getPosition(const BYTE *p, void *tableBase,
  * LZ4_compress_generi

Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-12 Thread Sven Schmidt



On 02/10/2017 01:13 AM, Minchan Kim wrote:
> Hello Sven,
>
> On Thu, Feb 09, 2017 at 11:56:17AM +0100, Sven Schmidt wrote:
>> Hey Minchan,
>>
>> On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote:
>>> Hello Sven,
>>>
>>> On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote:

 This patchset is for updating the LZ4 compression module to a version based
 on LZ4 v1.7.3, allowing use of the fast compression algorithm aka LZ4 fast,
 which provides an "acceleration" parameter as a tradeoff between
 high compression ratio and high compression speed.

 We want to use LZ4 fast in order to support compression in lustre
 and (mostly, based on that) investigate data reduction techniques on behalf
 of storage systems.

 Also, it will be useful for other users of LZ4 compression, as with LZ4 fast
 it is possible for applications to use fast and/or high compression
 depending on the use case.
 For instance, ZRAM offers an LZ4 backend and could benefit from an updated
 LZ4 in the kernel.

 LZ4 homepage: http://www.lz4.org/
 LZ4 source repository: https://github.com/lz4/lz4
 Source version: 1.7.3

 Benchmark (taken from [1], Core i5-4300U @1.9GHz):
 ------------|--------------|----------------|------
 Compressor  | Compression  | Decompression  | Ratio
 ------------|--------------|----------------|------
 memcpy      |  4200 MB/s   |  4200 MB/s     | 1.000
 LZ4 fast 50 |  1080 MB/s   |  2650 MB/s     | 1.375
 LZ4 fast 17 |   680 MB/s   |  2220 MB/s     | 1.607
 LZ4 fast 5  |   475 MB/s   |  1920 MB/s     | 1.886
 LZ4 default |   385 MB/s   |  1850 MB/s     | 2.101

 [1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html

 [PATCH 1/5] lib: Update LZ4 compressor module
 [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 module version
 [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version
 [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new LZ4 version
 [PATCH 5/5] lib/lz4: Remove back-compat wrappers
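
As an aside on the "acceleration" parameter described in the cover letter
above, here is a minimal caller sketch, assuming the updated kernel API
mirrors upstream's LZ4_compress_fast() with an extra working-memory argument
of LZ4_MEM_COMPRESS bytes (treat the exact signature as an assumption, not a
quote from the patchset):

#include <linux/errno.h>
#include <linux/lz4.h>
#include <linux/vmalloc.h>

/* Sketch: compress one buffer with a caller-chosen acceleration factor. */
static int demo_lz4_fast(const char *src, int src_len,
			 char *dst, int dst_len, int acceleration)
{
	void *wrkmem = vmalloc(LZ4_MEM_COMPRESS);
	int out_len;

	if (!wrkmem)
		return -ENOMEM;

	/* Larger acceleration => faster compression, lower ratio (see table). */
	out_len = LZ4_compress_fast(src, dst, src_len, dst_len,
				    acceleration, wrkmem);
	vfree(wrkmem);

	return out_len;	/* compressed size, or 0 if compression failed */
}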
>>>
>>> Today, I did zram-lz4 performance test with fio in current mmotm and
>>> found it makes regression about 20%.
>>>
>>> "lz4-update" means current mmots(git://git.cmpxchg.org/linux-mmots.git) so
>>> applied your 5 patches. (But now sure current mmots has recent uptodate
>>> patches)
>>> "revert" means I reverted your 5 patches in current mmots.
>>>
>>>                        revert    lz4-update
>>>
>>>       seq-write          1547          1339    86.55%
>>>      rand-write         22775         19381    85.10%
>>>        seq-read          7035          5589    79.45%
>>>       rand-read         78556         68479    87.17%
>>>    mixed-seq(R)          1305          1066    81.69%
>>>    mixed-seq(W)          1205           984    81.66%
>>>   mixed-rand(R)         17421         14993    86.06%
>>>   mixed-rand(W)         17391         14968    86.07%
>>
>> Which parts of the output (and which units) are these values exactly?
>> I have not worked with fio until now, so I thought I should ask before
>> misinterpreting my results.
>
> It is IOPS.
>
>>  
>>> My fio description file
>>>
>>> [global]
>>> bs=4k
>>> ioengine=sync
>>> size=100m
>>> numjobs=1
>>> group_reporting
>>> buffer_compress_percentage=30
>>> scramble_buffers=0
>>> filename=/dev/zram0
>>> loops=10
>>> fsync_on_close=1
>>>
>>> [seq-write]
>>> bs=64k
>>> rw=write
>>> stonewall
>>>
>>> [rand-write]
>>> rw=randwrite
>>> stonewall
>>>
>>> [seq-read]
>>> bs=64k
>>> rw=read
>>> stonewall
>>>
>>> [rand-read]
>>> rw=randread
>>> stonewall
>>>
>>> [mixed-seq]
>>> bs=64k
>>> rw=rw
>>> stonewall
>>>
>>> [mixed-rand]
>>> rw=randrw
>>> stonewall
>>>
>>
>> Great, this makes it easy for me to reproduce your test.
>
> If you have trouble reproducing it, feel free to ask me. I'm happy to test
> it. :)
>
> Thanks!
>

Hi Minchan,

I will send an updated patch as a reply to this e-mail. I would be really
grateful if you'd test it and provide feedback!
The patch should be applied to the current mmots tree.

In fact, the updated LZ4 _is_ slower than the current one in the kernel, but I
was not able to reproduce regressions as large as yours. I now tried to define
FORCE_INLINE as Eric suggested. I also inlined some functions which aren't
inline in upstream LZ4 but are defined as macros in the current kernel LZ4.
The approach of replacing LZ4_ARCH64 with a function call _seemed_ to behave
worse than the macro, so I withdrew the change.
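
For reference, one plausible way to define FORCE_INLINE in the kernel build
(an assumption for illustration only; the actual definition is the one in the
patch):

#include <linux/compiler.h>

/* Assumed sketch: force inlining even when the kernel is built with -Os. */
#ifndef FORCE_INLINE
#define FORCE_INLINE static __always_inline
#endif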

The main difference is that I replaced the read32/read16/write... etc.
functions using memcpy() with the other variants defined in upstream LZ4
(which can be switched using a macro).
The author's comment stated that they're as fast as the memcpy() variants
(or faster), but not as portable (which does not matter, since we don't
depend on supporting multiple compilers).

In m
