Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-14 Thread Minchan Kim
Hi Sven,

On Mon, Feb 13, 2017 at 01:08:41PM +0100, Sven Schmidt wrote:
> On Mon, Feb 13, 2017 at 09:03:24AM +0900, Minchan Kim wrote:
> > Hi Sven,
> > 
> > On Sun, Feb 12, 2017 at 12:16:17PM +0100, Sven Schmidt wrote:
> > > 
> > > 
> > > 
> > > On 02/10/2017 01:13 AM, Minchan Kim wrote:
> > > > Hello Sven,
> > > >
> > > > On Thu, Feb 09, 2017 at 11:56:17AM +0100, Sven Schmidt wrote:
> > > >> Hey Minchan,
> > > >>
> > > >> On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote:
> > > >>> Hello Sven,
> > > >>>
> > > >>> On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote:
> > > 
> > >  This patchset is for updating the LZ4 compression module to a 
> > >  version based
> > >  on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 
> > >  fast
> > >  which provides an "acceleration" parameter as a tradeoff between
> > >  high compression ratio and high compression speed.
> > > 
> > >  We want to use LZ4 fast in order to support compression in lustre
> > >  and (mostly, based on that) investigate data reduction techniques in 
> > >  behalf of
> > >  storage systems.
> > > 
> > >  Also, it will be useful for other users of LZ4 compression, as with 
> > >  LZ4 fast
> > >  it is possible to enable applications to use fast and/or high 
> > >  compression
> > >  depending on the usecase.
> > >  For instance, ZRAM is offering a LZ4 backend and could benefit from 
> > >  an updated
> > >  LZ4 in the kernel.
> > > 
> > >  LZ4 homepage: http://www.lz4.org/
> > >  LZ4 source repository: https://github.com/lz4/lz4
> > >  Source version: 1.7.3
> > > 
> > >  Benchmark (taken from [1], Core i5-4300U @1.9GHz):
> > >  |-----------|--------------|----------------|------
> > >  Compressor  | Compression  | Decompression  | Ratio
> > >  |-----------|--------------|----------------|------
> > >  memcpy      |  4200 MB/s   |  4200 MB/s     | 1.000
> > >  LZ4 fast 50 |  1080 MB/s   |  2650 MB/s     | 1.375
> > >  LZ4 fast 17 |   680 MB/s   |  2220 MB/s     | 1.607
> > >  LZ4 fast 5  |   475 MB/s   |  1920 MB/s     | 1.886
> > >  LZ4 default |   385 MB/s   |  1850 MB/s     | 2.101
> > > 
> > >  [1] 
> > >  http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html
> > > 
> > >  [PATCH 1/5] lib: Update LZ4 compressor module
> > >  [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 
> > >  module version
> > >  [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module 
> > >  version
> > >  [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with 
> > >  new LZ4 version
> > >  [PATCH 5/5] lib/lz4: Remove back-compat wrappers
> > > >>>
> > > >>> Today, I did zram-lz4 performance test with fio in current mmotm and
> > > >>> found it makes regression about 20%.
> > > >>>
> > > >>> "lz4-update" means current 
> > > >>> mmots(git://git.cmpxchg.org/linux-mmots.git) so
> > > >>> applied your 5 patches. (But now sure current mmots has recent 
> > > >>> uptodate
> > > >>> patches)
> > > >>> "revert" means I reverted your 5 patches in current mmots.
> > > >>>
> > > >>>                   revert    lz4-update
> > > >>>
> > > >>>     seq-write       1547          1339     86.55%
> > > >>>    rand-write      22775         19381     85.10%
> > > >>>      seq-read       7035          5589     79.45%
> > > >>>     rand-read      78556         68479     87.17%
> > > >>>  mixed-seq(R)       1305          1066     81.69%
> > > >>>  mixed-seq(W)       1205           984     81.66%
> > > >>> mixed-rand(R)      17421         14993     86.06%
> > > >>> mixed-rand(W)      17391         14968     86.07%
> > > >>
> > > >> which parts of the output (as well as units) are these values exactly?
> > > >> I did not work with fio until now, so I think I might ask before 
> > > >> misinterpreting my results.
> > > >
> > > > It is IOPS.
> > > >
> > > >>  
> > > >>> My fio description file
> > > >>>
> > > >>> [global]
> > > >>> bs=4k
> > > >>> ioengine=sync
> > > >>> size=100m
> > > >>> numjobs=1
> > > >>> group_reporting
> > > >>> buffer_compress_percentage=30
> > > >>> scramble_buffers=0
> > > >>> filename=/dev/zram0
> > > >>> loops=10
> > > >>> fsync_on_close=1
> > > >>>
> > > >>> [seq-write]
> > > >>> bs=64k
> > > >>> rw=write
> > > >>> stonewall
> > > >>>
> > > >>> [rand-write]
> > > >>> rw=randwrite
> > > >>> stonewall
> > > >>>
> > > >>> [seq-read]
> > > >>> bs=64k
> > > >>> rw=read
> > > >>> stonewall
> > > >>>
> > > >>> [rand-read]
> > > >>> rw=randread
> > > >>> stonewall
> > > >>>
> > > >>> [mixed-seq]
> > > >>> bs=64k
> > > >>> rw=rw
> > > >>> stonewall
> > > >>>
> > > >>> [mixed-rand]
> > > >>> rw=randrw
> > > >>> stonewall
> > > >>>
> > > >>
> > > >> Great, this makes it easy for me to reproduce your test.
> > > >
> > > > If you have trouble to reproduce, feel free to ask me. I'm happy to test it. :)

Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-13 Thread Sven Schmidt
On Mon, Feb 13, 2017 at 09:03:24AM +0900, Minchan Kim wrote:
> Hi Sven,
> 
> On Sun, Feb 12, 2017 at 12:16:17PM +0100, Sven Schmidt wrote:
> > 
> > 
> > 
> > On 02/10/2017 01:13 AM, Minchan Kim wrote:
> > > Hello Sven,
> > >
> > > On Thu, Feb 09, 2017 at 11:56:17AM +0100, Sven Schmidt wrote:
> > >> Hey Minchan,
> > >>
> > >> On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote:
> > >>> Hello Sven,
> > >>>
> > >>> On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote:
> > 
> >  This patchset is for updating the LZ4 compression module to a version 
> >  based
> >  on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 
> >  fast
> >  which provides an "acceleration" parameter as a tradeoff between
> >  high compression ratio and high compression speed.
> > 
> >  We want to use LZ4 fast in order to support compression in lustre
> >  and (mostly, based on that) investigate data reduction techniques in 
> >  behalf of
> >  storage systems.
> > 
> >  Also, it will be useful for other users of LZ4 compression, as with 
> >  LZ4 fast
> >  it is possible to enable applications to use fast and/or high 
> >  compression
> >  depending on the usecase.
> >  For instance, ZRAM is offering a LZ4 backend and could benefit from an 
> >  updated
> >  LZ4 in the kernel.
> > 
> >  LZ4 homepage: http://www.lz4.org/
> >  LZ4 source repository: https://github.com/lz4/lz4
> >  Source version: 1.7.3
> > 
> >  Benchmark (taken from [1], Core i5-4300U @1.9GHz):
> >  |-----------|--------------|----------------|------
> >  Compressor  | Compression  | Decompression  | Ratio
> >  |-----------|--------------|----------------|------
> >  memcpy      |  4200 MB/s   |  4200 MB/s     | 1.000
> >  LZ4 fast 50 |  1080 MB/s   |  2650 MB/s     | 1.375
> >  LZ4 fast 17 |   680 MB/s   |  2220 MB/s     | 1.607
> >  LZ4 fast 5  |   475 MB/s   |  1920 MB/s     | 1.886
> >  LZ4 default |   385 MB/s   |  1850 MB/s     | 2.101
> > 
> >  [1] 
> >  http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html
> > 
> >  [PATCH 1/5] lib: Update LZ4 compressor module
> >  [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 
> >  module version
> >  [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module 
> >  version
> >  [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with 
> >  new LZ4 version
> >  [PATCH 5/5] lib/lz4: Remove back-compat wrappers
> > >>>
> > >>> Today, I did zram-lz4 performance test with fio in current mmotm and
> > >>> found it makes regression about 20%.
> > >>>
> > >>> "lz4-update" means current mmots(git://git.cmpxchg.org/linux-mmots.git) 
> > >>> so
> > >>> applied your 5 patches. (But now sure current mmots has recent uptodate
> > >>> patches)
> > >>> "revert" means I reverted your 5 patches in current mmots.
> > >>>
> > >>>                   revert    lz4-update
> > >>>
> > >>>     seq-write       1547          1339     86.55%
> > >>>    rand-write      22775         19381     85.10%
> > >>>      seq-read       7035          5589     79.45%
> > >>>     rand-read      78556         68479     87.17%
> > >>>  mixed-seq(R)       1305          1066     81.69%
> > >>>  mixed-seq(W)       1205           984     81.66%
> > >>> mixed-rand(R)      17421         14993     86.06%
> > >>> mixed-rand(W)      17391         14968     86.07%
> > >>
> > >> which parts of the output (as well as units) are these values exactly?
> > >> I did not work with fio until now, so I think I might ask before 
> > >> misinterpreting my results.
> > >
> > > It is IOPS.
> > >
> > >>  
> > >>> My fio description file
> > >>>
> > >>> [global]
> > >>> bs=4k
> > >>> ioengine=sync
> > >>> size=100m
> > >>> numjobs=1
> > >>> group_reporting
> > >>> buffer_compress_percentage=30
> > >>> scramble_buffers=0
> > >>> filename=/dev/zram0
> > >>> loops=10
> > >>> fsync_on_close=1
> > >>>
> > >>> [seq-write]
> > >>> bs=64k
> > >>> rw=write
> > >>> stonewall
> > >>>
> > >>> [rand-write]
> > >>> rw=randwrite
> > >>> stonewall
> > >>>
> > >>> [seq-read]
> > >>> bs=64k
> > >>> rw=read
> > >>> stonewall
> > >>>
> > >>> [rand-read]
> > >>> rw=randread
> > >>> stonewall
> > >>>
> > >>> [mixed-seq]
> > >>> bs=64k
> > >>> rw=rw
> > >>> stonewall
> > >>>
> > >>> [mixed-rand]
> > >>> rw=randrw
> > >>> stonewall
> > >>>
> > >>
> > >> Great, this makes it easy for me to reproduce your test.
> > >
> > > If you have trouble to reproduce, feel free to ask me. I'm happy to test 
> > > it. :)
> > >
> > > Thanks!
> > >
> > 
> > Hi Minchan,
> > 
> > I will send an updated patch as a reply to this E-Mail. Would be really 
> > grateful If you'd test it and provide feedback!
> > The patch should be applied to the current mmots tree.
> > 
> > In fact, the updated LZ4 _is_ slower than the current one in kernel. But I was
> > not able to reproduce such large regressions as you did.

Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-12 Thread Minchan Kim
Hi Sven,

On Sun, Feb 12, 2017 at 12:16:17PM +0100, Sven Schmidt wrote:
> 
> 
> 
> On 02/10/2017 01:13 AM, Minchan Kim wrote:
> > Hello Sven,
> >
> > On Thu, Feb 09, 2017 at 11:56:17AM +0100, Sven Schmidt wrote:
> >> Hey Minchan,
> >>
> >> On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote:
> >>> Hello Sven,
> >>>
> >>> On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote:
> 
>  This patchset is for updating the LZ4 compression module to a version 
>  based
>  on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 fast
>  which provides an "acceleration" parameter as a tradeoff between
>  high compression ratio and high compression speed.
> 
>  We want to use LZ4 fast in order to support compression in lustre
>  and (mostly, based on that) investigate data reduction techniques in 
>  behalf of
>  storage systems.
> 
>  Also, it will be useful for other users of LZ4 compression, as with LZ4 
>  fast
>  it is possible to enable applications to use fast and/or high compression
>  depending on the usecase.
>  For instance, ZRAM is offering a LZ4 backend and could benefit from an 
>  updated
>  LZ4 in the kernel.
> 
>  LZ4 homepage: http://www.lz4.org/
>  LZ4 source repository: https://github.com/lz4/lz4
>  Source version: 1.7.3
> 
>  Benchmark (taken from [1], Core i5-4300U @1.9GHz):
>  |-----------|--------------|----------------|------
>  Compressor  | Compression  | Decompression  | Ratio
>  |-----------|--------------|----------------|------
>  memcpy      |  4200 MB/s   |  4200 MB/s     | 1.000
>  LZ4 fast 50 |  1080 MB/s   |  2650 MB/s     | 1.375
>  LZ4 fast 17 |   680 MB/s   |  2220 MB/s     | 1.607
>  LZ4 fast 5  |   475 MB/s   |  1920 MB/s     | 1.886
>  LZ4 default |   385 MB/s   |  1850 MB/s     | 2.101
> 
>  [1] 
>  http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html
> 
>  [PATCH 1/5] lib: Update LZ4 compressor module
>  [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 
>  module version
>  [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module 
>  version
>  [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new 
>  LZ4 version
>  [PATCH 5/5] lib/lz4: Remove back-compat wrappers
> >>>
> >>> Today, I did zram-lz4 performance test with fio in current mmotm and
> >>> found it makes regression about 20%.
> >>>
> >>> "lz4-update" means current mmots(git://git.cmpxchg.org/linux-mmots.git) so
> >>> applied your 5 patches. (But now sure current mmots has recent uptodate
> >>> patches)
> >>> "revert" means I reverted your 5 patches in current mmots.
> >>>
> >>>                   revert    lz4-update
> >>>
> >>>     seq-write       1547          1339     86.55%
> >>>    rand-write      22775         19381     85.10%
> >>>      seq-read       7035          5589     79.45%
> >>>     rand-read      78556         68479     87.17%
> >>>  mixed-seq(R)       1305          1066     81.69%
> >>>  mixed-seq(W)       1205           984     81.66%
> >>> mixed-rand(R)      17421         14993     86.06%
> >>> mixed-rand(W)      17391         14968     86.07%
> >>
> >> which parts of the output (as well as units) are these values exactly?
> >> I did not work with fio until now, so I think I might ask before 
> >> misinterpreting my results.
> >
> > It is IOPS.
> >
> >>  
> >>> My fio description file
> >>>
> >>> [global]
> >>> bs=4k
> >>> ioengine=sync
> >>> size=100m
> >>> numjobs=1
> >>> group_reporting
> >>> buffer_compress_percentage=30
> >>> scramble_buffers=0
> >>> filename=/dev/zram0
> >>> loops=10
> >>> fsync_on_close=1
> >>>
> >>> [seq-write]
> >>> bs=64k
> >>> rw=write
> >>> stonewall
> >>>
> >>> [rand-write]
> >>> rw=randwrite
> >>> stonewall
> >>>
> >>> [seq-read]
> >>> bs=64k
> >>> rw=read
> >>> stonewall
> >>>
> >>> [rand-read]
> >>> rw=randread
> >>> stonewall
> >>>
> >>> [mixed-seq]
> >>> bs=64k
> >>> rw=rw
> >>> stonewall
> >>>
> >>> [mixed-rand]
> >>> rw=randrw
> >>> stonewall
> >>>
> >>
> >> Great, this makes it easy for me to reproduce your test.
> >
> > If you have trouble to reproduce, feel free to ask me. I'm happy to test 
> > it. :)
> >
> > Thanks!
> >
> 
> Hi Minchan,
> 
> I will send an updated patch as a reply to this E-Mail. Would be really 
> grateful If you'd test it and provide feedback!
> The patch should be applied to the current mmots tree.
> 
> In fact, the updated LZ4 _is_ slower than the current one in kernel. But I 
> was not able to reproduce such large regressions
> as you did. I now tried to define FORCE_INLINE as Eric suggested. I also 
> inlined some functions which weren't in upstream LZ4,
> but are defined as macros in the current kernel LZ4. The approach to replace 
> LZ4_ARCH64 with the function call _seemed_ to behave
> worse than the macro, so I withdrew the change.

Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-12 Thread Sven Schmidt



On 02/10/2017 01:13 AM, Minchan Kim wrote:
> Hello Sven,
>
> On Thu, Feb 09, 2017 at 11:56:17AM +0100, Sven Schmidt wrote:
>> Hey Minchan,
>>
>> On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote:
>>> Hello Sven,
>>>
>>> On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote:

 This patchset is for updating the LZ4 compression module to a version based
 on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 fast
 which provides an "acceleration" parameter as a tradeoff between
 high compression ratio and high compression speed.

 We want to use LZ4 fast in order to support compression in lustre
 and (mostly, based on that) investigate data reduction techniques in 
 behalf of
 storage systems.

 Also, it will be useful for other users of LZ4 compression, as with LZ4 
 fast
 it is possible to enable applications to use fast and/or high compression
 depending on the usecase.
 For instance, ZRAM is offering a LZ4 backend and could benefit from an 
 updated
 LZ4 in the kernel.

 LZ4 homepage: http://www.lz4.org/
 LZ4 source repository: https://github.com/lz4/lz4
 Source version: 1.7.3

 Benchmark (taken from [1], Core i5-4300U @1.9GHz):
 |-----------|--------------|----------------|------
 Compressor  | Compression  | Decompression  | Ratio
 |-----------|--------------|----------------|------
 memcpy      |  4200 MB/s   |  4200 MB/s     | 1.000
 LZ4 fast 50 |  1080 MB/s   |  2650 MB/s     | 1.375
 LZ4 fast 17 |   680 MB/s   |  2220 MB/s     | 1.607
 LZ4 fast 5  |   475 MB/s   |  1920 MB/s     | 1.886
 LZ4 default |   385 MB/s   |  1850 MB/s     | 2.101

 [1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html

 [PATCH 1/5] lib: Update LZ4 compressor module
 [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 
 module version
 [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version
 [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new 
 LZ4 version
 [PATCH 5/5] lib/lz4: Remove back-compat wrappers
>>>
>>> Today, I did zram-lz4 performance test with fio in current mmotm and
>>> found it makes regression about 20%.
>>>
>>> "lz4-update" means current mmots(git://git.cmpxchg.org/linux-mmots.git) so
>>> applied your 5 patches. (But now sure current mmots has recent uptodate
>>> patches)
>>> "revert" means I reverted your 5 patches in current mmots.
>>>
>>>                   revert    lz4-update
>>>
>>>     seq-write       1547          1339     86.55%
>>>    rand-write      22775         19381     85.10%
>>>      seq-read       7035          5589     79.45%
>>>     rand-read      78556         68479     87.17%
>>>  mixed-seq(R)       1305          1066     81.69%
>>>  mixed-seq(W)       1205           984     81.66%
>>> mixed-rand(R)      17421         14993     86.06%
>>> mixed-rand(W)      17391         14968     86.07%
>>
>> which parts of the output (as well as units) are these values exactly?
>> I did not work with fio until now, so I think I might ask before 
>> misinterpreting my results.
>
> It is IOPS.
>
>>  
>>> My fio description file
>>>
>>> [global]
>>> bs=4k
>>> ioengine=sync
>>> size=100m
>>> numjobs=1
>>> group_reporting
>>> buffer_compress_percentage=30
>>> scramble_buffers=0
>>> filename=/dev/zram0
>>> loops=10
>>> fsync_on_close=1
>>>
>>> [seq-write]
>>> bs=64k
>>> rw=write
>>> stonewall
>>>
>>> [rand-write]
>>> rw=randwrite
>>> stonewall
>>>
>>> [seq-read]
>>> bs=64k
>>> rw=read
>>> stonewall
>>>
>>> [rand-read]
>>> rw=randread
>>> stonewall
>>>
>>> [mixed-seq]
>>> bs=64k
>>> rw=rw
>>> stonewall
>>>
>>> [mixed-rand]
>>> rw=randrw
>>> stonewall
>>>
>>
>> Great, this makes it easy for me to reproduce your test.
>
> If you have trouble to reproduce, feel free to ask me. I'm happy to test it. 
> :)
>
> Thanks!
>

Hi Minchan,

I will send an updated patch as a reply to this e-mail. I would be really
grateful if you'd test it and provide feedback!
The patch should be applied to the current mmots tree.

In fact, the updated LZ4 _is_ slower than the current in-kernel one, but I was
not able to reproduce regressions as large as yours. I have now tried to define
FORCE_INLINE as Eric suggested. I also inlined some functions which aren't in
upstream LZ4 but are defined as macros in the current kernel LZ4. The approach
of replacing LZ4_ARCH64 with a function call _seemed_ to behave worse than the
macro, so I withdrew the change.

The main difference is that I replaced the memcpy-based read32/read16/write...
helper functions with the alternative ones defined in upstream LZ4 (selectable
via a macro).
The author's comment states that they are as fast as the memcpy variants
(or faster), but not as portable
(which does not matter here, since we do not need to support multiple compilers).
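For reference, a rough sketch of the two access styles being compared; the
helper names below are illustrative, not the ones actually used in lz4defs.h:

#include <linux/compiler.h>
#include <linux/string.h>
#include <linux/types.h>

/* Variant 1: memcpy-based unaligned read; always safe, and modern
 * compilers fold the memcpy into a single load.
 */
static __always_inline u32 lz4_read32_via_memcpy(const void *ptr)
{
	u32 val;

	memcpy(&val, ptr, sizeof(val));
	return val;
}

/* Variant 2: read through a packed struct so the compiler emits an
 * unaligned load directly (the style upstream selects via a macro).
 */
struct lz4_unalign { u32 u32v; } __packed;

static __always_inline u32 lz4_read32_direct(const void *ptr)
{
	return ((const struct lz4_unalign *)ptr)->u32v;
}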

In 

Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-09 Thread David Miller
From: Eric Biggers 
Date: Thu, 9 Feb 2017 10:29:14 -0800

> On Thu, Feb 09, 2017 at 12:02:11PM +0100, Sven Schmidt wrote:
>> > 
>> > [Also, for some reason linux-crypto is apparently still not receiving 
>> > patch 1/5
>> > in the series.  It's missing from the linux-crypto archive at
>> > http://www.spinics.net/lists/linux-crypto/, so it's not just me.]
>> > 
>> 
>> I don't really know what to do about this. I think the matter is the size of 
>> the E-Mail.
>> Are there filters or something like that? Since in linux-kernel the patch 
>> seems to get delivered.
>> I could otherwise CC you if you wish.
>> 
> 
> If I'm not mistaken, David Miller is the admin of the mail server on
> vger.kernel.org, and he already happens to be Cc'ed on this thread, so maybe 
> he
> can answer as to why linux-crypto may be configured differently?
> 
> Anyway, since the patch did make it to linux-kernel anyone can still download 
> it
> from patchwork if they know where to look:
> https://patchwork.kernel.org/patch/9556271/

I've increased the maxlength parameter for linux-crypto so this doesn't
happen again.


Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-09 Thread Minchan Kim
Hello Sven,

On Thu, Feb 09, 2017 at 11:56:17AM +0100, Sven Schmidt wrote:
> Hey Minchan,
> 
> On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote:
> > Hello Sven,
> > 
> > On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote:
> > > 
> > > This patchset is for updating the LZ4 compression module to a version 
> > > based
> > > on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 fast
> > > which provides an "acceleration" parameter as a tradeoff between
> > > high compression ratio and high compression speed.
> > > 
> > > We want to use LZ4 fast in order to support compression in lustre
> > > and (mostly, based on that) investigate data reduction techniques in 
> > > behalf of
> > > storage systems.
> > > 
> > > Also, it will be useful for other users of LZ4 compression, as with LZ4 
> > > fast
> > > it is possible to enable applications to use fast and/or high compression
> > > depending on the usecase.
> > > For instance, ZRAM is offering a LZ4 backend and could benefit from an 
> > > updated
> > > LZ4 in the kernel.
> > > 
> > > LZ4 homepage: http://www.lz4.org/
> > > LZ4 source repository: https://github.com/lz4/lz4
> > > Source version: 1.7.3
> > > 
> > > Benchmark (taken from [1], Core i5-4300U @1.9GHz):
> > > |-----------|--------------|----------------|------
> > > Compressor  | Compression  | Decompression  | Ratio
> > > |-----------|--------------|----------------|------
> > > memcpy      |  4200 MB/s   |  4200 MB/s     | 1.000
> > > LZ4 fast 50 |  1080 MB/s   |  2650 MB/s     | 1.375
> > > LZ4 fast 17 |   680 MB/s   |  2220 MB/s     | 1.607
> > > LZ4 fast 5  |   475 MB/s   |  1920 MB/s     | 1.886
> > > LZ4 default |   385 MB/s   |  1850 MB/s     | 2.101
> > > 
> > > [1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html
> > > 
> > > [PATCH 1/5] lib: Update LZ4 compressor module
> > > [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 
> > > module version
> > > [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version
> > > [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new 
> > > LZ4 version
> > > [PATCH 5/5] lib/lz4: Remove back-compat wrappers
> > 
> > Today, I did zram-lz4 performance test with fio in current mmotm and
> > found it makes regression about 20%.
> > 
> > "lz4-update" means current mmots(git://git.cmpxchg.org/linux-mmots.git) so
> > applied your 5 patches. (But now sure current mmots has recent uptodate
> > patches)
> > "revert" means I reverted your 5 patches in current mmots.
> > 
> >                   revert    lz4-update
> >
> >     seq-write       1547          1339     86.55%
> >    rand-write      22775         19381     85.10%
> >      seq-read       7035          5589     79.45%
> >     rand-read      78556         68479     87.17%
> >  mixed-seq(R)       1305          1066     81.69%
> >  mixed-seq(W)       1205           984     81.66%
> > mixed-rand(R)      17421         14993     86.06%
> > mixed-rand(W)      17391         14968     86.07%
> 
> which parts of the output (as well as units) are these values exactly?
> I did not work with fio until now, so I think I might ask before 
> misinterpreting my results.

It is IOPS.

>  
> > My fio description file
> > 
> > [global]
> > bs=4k
> > ioengine=sync
> > size=100m
> > numjobs=1
> > group_reporting
> > buffer_compress_percentage=30
> > scramble_buffers=0
> > filename=/dev/zram0
> > loops=10
> > fsync_on_close=1
> > 
> > [seq-write]
> > bs=64k
> > rw=write
> > stonewall
> > 
> > [rand-write]
> > rw=randwrite
> > stonewall
> > 
> > [seq-read]
> > bs=64k
> > rw=read
> > stonewall
> > 
> > [rand-read]
> > rw=randread
> > stonewall
> > 
> > [mixed-seq]
> > bs=64k
> > rw=rw
> > stonewall
> > 
> > [mixed-rand]
> > rw=randrw
> > stonewall
> > 
> 
> Great, this makes it easy for me to reproduce your test.

If you have trouble to reproduce, feel free to ask me. I'm happy to test it. :)

Thanks!


Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-09 Thread Eric Biggers
On Thu, Feb 09, 2017 at 12:02:11PM +0100, Sven Schmidt wrote:
> > 
> > [Also, for some reason linux-crypto is apparently still not receiving patch 
> > 1/5
> > in the series.  It's missing from the linux-crypto archive at
> > http://www.spinics.net/lists/linux-crypto/, so it's not just me.]
> > 
> 
> I don't really know what to do about this. I think the matter is the size of 
> the E-Mail.
> Are there filters or something like that? Since in linux-kernel the patch 
> seems to get delivered.
> I could otherwise CC you if you wish.
> 

If I'm not mistaken, David Miller is the admin of the mail server on
vger.kernel.org, and he already happens to be Cc'ed on this thread, so maybe he
can answer as to why linux-crypto may be configured differently?

Anyway, since the patch did make it to linux-kernel anyone can still download it
from patchwork if they know where to look:
https://patchwork.kernel.org/patch/9556271/

Eric


Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-09 Thread Eric Biggers
On Thu, Feb 09, 2017 at 12:05:40PM +0100, Sven Schmidt wrote:
> > Because of how LZ4_ARCH64 is defined, it needs to be '#if LZ4_ARCH64'.
> > 
> > But I also think the way upstream LZ4 does 64-bit detection could have just 
> > been
> > left as-is; it has a function which gets inlined:
> > 
> > static unsigned LZ4_64bits(void) { return sizeof(void*)==8; }
> > 
> > Eric
> 
> does this apply for LZ4_isLittleEndian() as well? As a reminder:
>   
>   static unsigned LZ4_isLittleEndian(void)
>   {
>   	/* don't use static : performance detrimental */
>   	const union { U32 u; BYTE c[4]; } one = { 1 };
> 
>   	return one.c[0];
>   }
> 
> It is surely easier to read and understand using these functions in favor of 
> the macros.

Yes, it makes sense to retain LZ4_isLittleEndian() too, mainly so that the call
sites don't need to be changed and we have less chance of introducing bugs.

I also believe the *implementation* of LZ4_isLittleEndian() using the union
"hack" to detect endianness works fine and will get optimized correctly,
though we could replace it with an #ifdef __LITTLE_ENDIAN__ if we wanted to.  (I
am sure that upstream LZ4 would do that if it were guaranteed to work, but
upstream LZ4 needs to detect endianness reliably across many more different
compilers, compiler versions, and platforms than the Linux kernel does, and
there is no standard preprocessor macro for doing so.)
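
For illustration, a sketch of the #ifdef-based variant mentioned above, adapted
to kernel headers: in the kernel, <asm/byteorder.h> defines exactly one of
__LITTLE_ENDIAN/__BIG_ENDIAN, so the check resolves at preprocessing time. This
is an assumption about how it could look, not necessarily what the patch does:

#include <linux/compiler.h>
#include <asm/byteorder.h>

static __always_inline unsigned int LZ4_isLittleEndian(void)
{
#if defined(__LITTLE_ENDIAN)
	return 1;	/* little-endian kernel build */
#else
	return 0;	/* big-endian kernel build */
#endif
}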

Eric


Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-09 Thread Sven Schmidt
Hey Eric,

On Wed, Feb 08, 2017 at 09:24:25PM -0800, Eric Biggers wrote:
> Also I noticed another bug, this time in LZ4_count():
> 
> > #if defined(CONFIG_64BIT)
> > #define LZ4_ARCH64 1
> > #else
> > #define LZ4_ARCH64 0
> > #endif
> ...
> > #ifdef LZ4_ARCH64
> > 	if ((pIn < (pInLimit-3))
> > 	    && (LZ4_read32(pMatch) == LZ4_read32(pIn))) {
> > 		pIn += 4; pMatch += 4;
> > 	}
> > #endif
> 
> Because of how LZ4_ARCH64 is defined, it needs to be '#if LZ4_ARCH64'.
> 
> But I also think the way upstream LZ4 does 64-bit detection could have just 
> been
> left as-is; it has a function which gets inlined:
> 
>   static unsigned LZ4_64bits(void) { return sizeof(void*)==8; }
> 
> Eric

Does this apply to LZ4_isLittleEndian() as well? As a reminder:

static unsigned LZ4_isLittleEndian(void)
{
	/* don't use static : performance detrimental */
	const union { U32 u; BYTE c[4]; } one = { 1 };

	return one.c[0];
}

It is surely easier to read and understand these helper functions than the
macros.
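
For reference, a minimal stand-alone illustration (plain C, with CONFIG_64BIT
standing in for the kernel config symbol) of the '#ifdef LZ4_ARCH64' pitfall
quoted above:

#include <stdio.h>

#if defined(CONFIG_64BIT)
#define LZ4_ARCH64 1
#else
#define LZ4_ARCH64 0
#endif

int main(void)
{
#ifdef LZ4_ARCH64
	/* Compiled in unconditionally: LZ4_ARCH64 is *defined* even when 0. */
	printf("#ifdef LZ4_ARCH64 branch compiled in\n");
#endif
#if LZ4_ARCH64
	/* Compiled in only when LZ4_ARCH64 evaluates to 1 (64-bit builds). */
	printf("#if LZ4_ARCH64 branch compiled in\n");
#endif
	return 0;
}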

Thanks,

Sven


Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-09 Thread Sven Schmidt
Hey Eric,

On Wed, Feb 08, 2017 at 04:24:36PM -0800, Eric Biggers wrote:
> On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote:
> > 
> > Today, I did zram-lz4 performance test with fio in current mmotm and
> > found it makes regression about 20%.
> > 
> 
> This may or may not be the cause of the specific regression you're observing,
> but I just noticed that the proposed patch drops a lot of FORCEINLINE
> annotations from upstream LZ4.  The FORCEINLINE's are there for a reason,
> especially for the main decompression and compression functions which are
> basically "templates" that take in different sets of constant parameters, and
> should be left in.  We should #define FORCEINLINE to __always_inline 
> somewhere,
> or just do a s/FORCEINLINE/__always_inline/g.
>

I generally just replaced "FORCE_INLINE" with "static inline". At least I thought so.
I rechecked and realised I missed at least two of them (why did I not just use
search and replace?).
So I think it is safer and easier to just use "FORCE_INLINE"
with the definition you suggested. Will try that.
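
A sketch of that definition might look like this (assuming the kernel port
keeps the FORCE_INLINE spelling):

#include <linux/compiler.h>

/* Keep upstream's force-inlining instead of plain 'static inline', so the
 * compiler cannot decide to outline the big "template" functions.
 */
#define FORCE_INLINE static __always_inline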
 
> Note that the upstream LZ4 code is very carefully optimized, so we should not,
> in general, be changing things like when functions are force-inlined, what the
> hash table size is, etc.
> 
> [Also, for some reason linux-crypto is apparently still not receiving patch 
> 1/5
> in the series.  It's missing from the linux-crypto archive at
> http://www.spinics.net/lists/linux-crypto/, so it's not just me.]
> 

I don't really know what to do about this. I think the issue is the size of
the e-mail.
Are there filters or something like that? On linux-kernel the patch does seem
to get delivered.
I could otherwise CC you if you wish.

Thanks,

Sven


Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-09 Thread Sven Schmidt
Hey Minchan,

On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote:
> Hello Sven,
> 
> On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote:
> > 
> > This patchset is for updating the LZ4 compression module to a version based
> > on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 fast
> > which provides an "acceleration" parameter as a tradeoff between
> > high compression ratio and high compression speed.
> > 
> > We want to use LZ4 fast in order to support compression in lustre
> > and (mostly, based on that) investigate data reduction techniques in behalf 
> > of
> > storage systems.
> > 
> > Also, it will be useful for other users of LZ4 compression, as with LZ4 fast
> > it is possible to enable applications to use fast and/or high compression
> > depending on the usecase.
> > For instance, ZRAM is offering a LZ4 backend and could benefit from an 
> > updated
> > LZ4 in the kernel.
> > 
> > LZ4 homepage: http://www.lz4.org/
> > LZ4 source repository: https://github.com/lz4/lz4
> > Source version: 1.7.3
> > 
> > Benchmark (taken from [1], Core i5-4300U @1.9GHz):
> > |-----------|--------------|----------------|------
> > Compressor  | Compression  | Decompression  | Ratio
> > |-----------|--------------|----------------|------
> > memcpy      |  4200 MB/s   |  4200 MB/s     | 1.000
> > LZ4 fast 50 |  1080 MB/s   |  2650 MB/s     | 1.375
> > LZ4 fast 17 |   680 MB/s   |  2220 MB/s     | 1.607
> > LZ4 fast 5  |   475 MB/s   |  1920 MB/s     | 1.886
> > LZ4 default |   385 MB/s   |  1850 MB/s     | 2.101
> > 
> > [1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html
> > 
> > [PATCH 1/5] lib: Update LZ4 compressor module
> > [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 module 
> > version
> > [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version
> > [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new 
> > LZ4 version
> > [PATCH 5/5] lib/lz4: Remove back-compat wrappers
> 
> Today, I did zram-lz4 performance test with fio in current mmotm and
> found it makes regression about 20%.
> 
> "lz4-update" means current mmots(git://git.cmpxchg.org/linux-mmots.git) so
> applied your 5 patches. (But now sure current mmots has recent uptodate
> patches)
> "revert" means I reverted your 5 patches in current mmots.
> 
>                   revert    lz4-update
>
>     seq-write       1547          1339     86.55%
>    rand-write      22775         19381     85.10%
>      seq-read       7035          5589     79.45%
>     rand-read      78556         68479     87.17%
>  mixed-seq(R)       1305          1066     81.69%
>  mixed-seq(W)       1205           984     81.66%
> mixed-rand(R)      17421         14993     86.06%
> mixed-rand(W)      17391         14968     86.07%

Which parts of the output (and which units) do these values correspond to exactly?
I have not worked with fio until now, so I thought I should ask before
misinterpreting my results.
 
> My fio description file
> 
> [global]
> bs=4k
> ioengine=sync
> size=100m
> numjobs=1
> group_reporting
> buffer_compress_percentage=30
> scramble_buffers=0
> filename=/dev/zram0
> loops=10
> fsync_on_close=1
> 
> [seq-write]
> bs=64k
> rw=write
> stonewall
> 
> [rand-write]
> rw=randwrite
> stonewall
> 
> [seq-read]
> bs=64k
> rw=read
> stonewall
> 
> [rand-read]
> rw=randread
> stonewall
> 
> [mixed-seq]
> bs=64k
> rw=rw
> stonewall
> 
> [mixed-rand]
> rw=randrw
> stonewall
> 

Great, this makes it easy for me to reproduce your test.

Thanks,

Sven


Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-08 Thread Eric Biggers
On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote:
> 
> Today, I did zram-lz4 performance test with fio in current mmotm and
> found it makes regression about 20%.
> 

This may or may not be the cause of the specific regression you're observing,
but I just noticed that the proposed patch drops a lot of FORCEINLINE
annotations from upstream LZ4.  The FORCEINLINE's are there for a reason,
especially for the main decompression and compression functions which are
basically "templates" that take in different sets of constant parameters, and
should be left in.  We should #define FORCEINLINE to __always_inline somewhere,
or just do a s/FORCEINLINE/__always_inline/g.
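
As an illustration of why those annotations matter, here is a stand-alone
sketch of the "template" pattern described above (the names are made up, not
the actual LZ4 functions): one generic body is force-inlined into thin wrappers
that pass compile-time-constant parameters, so each wrapper becomes a
specialized, constant-folded copy.

#define FORCE_INLINE static inline __attribute__((always_inline))

typedef enum { no_dict, with_prefix, ext_dict } dict_mode;

FORCE_INLINE int generic_decode(const char *src, char *dst, int len,
				dict_mode mode)
{
	/* 'mode' is constant at every call site below; with inlining the
	 * compiler drops the branches that do not apply.
	 */
	if (mode == ext_dict)
		return -1;		/* pretend: external-dictionary path */
	(void)src;
	(void)dst;
	return len;			/* pretend: common fast path */
}

int decode_default(const char *src, char *dst, int len)
{
	return generic_decode(src, dst, len, no_dict);
}

int decode_ext_dict(const char *src, char *dst, int len)
{
	return generic_decode(src, dst, len, ext_dict);
}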

Note that the upstream LZ4 code is very carefully optimized, so we should not,
in general, be changing things like when functions are force-inlined, what the
hash table size is, etc.
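
For context on the hash table size: in upstream LZ4 1.7.3 it is derived from
LZ4_MEMORY_USAGE, which this series keeps at 14 (per the v6 changelog further
down), roughly as sketched below. Macro names follow upstream and may differ
slightly in the kernel port.

/* LZ4_MEMORY_USAGE = 14 -> a 2^14 = 16 KiB hash table, i.e. 4096 u32 slots. */
#define LZ4_MEMORY_USAGE   14
#define LZ4_HASHLOG        (LZ4_MEMORY_USAGE - 2)
#define LZ4_HASHTABLESIZE  (1 << LZ4_MEMORY_USAGE)	/* in bytes */
#define LZ4_HASH_SIZE_U32  (1 << LZ4_HASHLOG)		/* in u32 entries */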

[Also, for some reason linux-crypto is apparently still not receiving patch 1/5
in the series.  It's missing from the linux-crypto archive at
http://www.spinics.net/lists/linux-crypto/, so it's not just me.]

Thanks!

Eric


Re: [PATCH v7 0/5] Update LZ4 compressor module

2017-02-08 Thread Minchan Kim
Hello Sven,

On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote:
> 
> This patchset is for updating the LZ4 compression module to a version based
> on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 fast
> which provides an "acceleration" parameter as a tradeoff between
> high compression ratio and high compression speed.
> 
> We want to use LZ4 fast in order to support compression in lustre
> and (mostly, based on that) investigate data reduction techniques in behalf of
> storage systems.
> 
> Also, it will be useful for other users of LZ4 compression, as with LZ4 fast
> it is possible to enable applications to use fast and/or high compression
> depending on the usecase.
> For instance, ZRAM is offering a LZ4 backend and could benefit from an updated
> LZ4 in the kernel.
> 
> LZ4 homepage: http://www.lz4.org/
> LZ4 source repository: https://github.com/lz4/lz4
> Source version: 1.7.3
> 
> Benchmark (taken from [1], Core i5-4300U @1.9GHz):
> |-----------|--------------|----------------|------
> Compressor  | Compression  | Decompression  | Ratio
> |-----------|--------------|----------------|------
> memcpy      |  4200 MB/s   |  4200 MB/s     | 1.000
> LZ4 fast 50 |  1080 MB/s   |  2650 MB/s     | 1.375
> LZ4 fast 17 |   680 MB/s   |  2220 MB/s     | 1.607
> LZ4 fast 5  |   475 MB/s   |  1920 MB/s     | 1.886
> LZ4 default |   385 MB/s   |  1850 MB/s     | 2.101
> 
> [1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html
> 
> [PATCH 1/5] lib: Update LZ4 compressor module
> [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 module 
> version
> [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version
> [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new LZ4 
> version
> [PATCH 5/5] lib/lz4: Remove back-compat wrappers

Today, I did a zram-lz4 performance test with fio in current mmotm and
found it causes a regression of about 20%.

"lz4-update" means current mmots (git://git.cmpxchg.org/linux-mmots.git), i.e.
with your 5 patches applied. (But I'm not sure current mmots has the most
recent patches.)
"revert" means I reverted your 5 patches in current mmots.

                  revert    lz4-update

    seq-write       1547          1339     86.55%
   rand-write      22775         19381     85.10%
     seq-read       7035          5589     79.45%
    rand-read      78556         68479     87.17%
 mixed-seq(R)       1305          1066     81.69%
 mixed-seq(W)       1205           984     81.66%
mixed-rand(R)      17421         14993     86.06%
mixed-rand(W)      17391         14968     86.07%

My fio description file

[global]
bs=4k
ioengine=sync
size=100m
numjobs=1
group_reporting
buffer_compress_percentage=30
scramble_buffers=0
filename=/dev/zram0
loops=10
fsync_on_close=1

[seq-write]
bs=64k
rw=write
stonewall

[rand-write]
rw=randwrite
stonewall

[seq-read]
bs=64k
rw=read
stonewall

[rand-read]
rw=randread
stonewall

[mixed-seq]
bs=64k
rw=rw
stonewall

[mixed-rand]
rw=randrw
stonewall



[PATCH v7 0/5] Update LZ4 compressor module

2017-02-05 Thread Sven Schmidt

This patchset updates the LZ4 compression module to a version based
on LZ4 v1.7.3, allowing use of the fast compression algorithm, aka LZ4 fast,
which provides an "acceleration" parameter as a tradeoff between
high compression ratio and high compression speed.

We want to use LZ4 fast in order to support compression in Lustre
and (mostly, based on that) investigate data reduction techniques on behalf of
storage systems.

Also, it will be useful for other users of LZ4 compression, as LZ4 fast
makes it possible for applications to use fast and/or high compression
depending on the use case.
For instance, ZRAM offers an LZ4 backend and could benefit from an updated
LZ4 in the kernel.
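
To illustrate the acceleration tradeoff from a caller's perspective, here is a
minimal sketch of a kernel user of the updated interface. The prototypes are
assumed to mirror upstream LZ4 1.7.3 as ported (LZ4_compress_fast() taking an
acceleration argument plus a work buffer), so treat the exact signatures as an
assumption rather than the final API.

#include <linux/errno.h>
#include <linux/lz4.h>
#include <linux/vmalloc.h>

/* Compress src with LZ4 fast; acceleration == 1 behaves like LZ4 default,
 * larger values trade compression ratio for speed.
 */
static int example_lz4_fast(const char *src, int src_len,
			    char *dst, int dst_capacity, int acceleration)
{
	void *wrkmem = vmalloc(LZ4_MEM_COMPRESS);
	int out_len;

	if (!wrkmem)
		return -ENOMEM;

	out_len = LZ4_compress_fast(src, dst, src_len, dst_capacity,
				    acceleration, wrkmem);
	vfree(wrkmem);

	return out_len ? out_len : -E2BIG;	/* 0 means dst was too small */
}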

LZ4 homepage: http://www.lz4.org/
LZ4 source repository: https://github.com/lz4/lz4
Source version: 1.7.3

Benchmark (taken from [1], Core i5-4300U @1.9GHz):
|-----------|--------------|----------------|------
Compressor  | Compression  | Decompression  | Ratio
|-----------|--------------|----------------|------
memcpy      |  4200 MB/s   |  4200 MB/s     | 1.000
LZ4 fast 50 |  1080 MB/s   |  2650 MB/s     | 1.375
LZ4 fast 17 |   680 MB/s   |  2220 MB/s     | 1.607
LZ4 fast 5  |   475 MB/s   |  1920 MB/s     | 1.886
LZ4 default |   385 MB/s   |  1850 MB/s     | 2.101

[1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html

[PATCH 1/5] lib: Update LZ4 compressor module
[PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 module 
version
[PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version
[PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new LZ4 
version
[PATCH 5/5] lib/lz4: Remove back-compat wrappers

Changes:
v7:
- Fixed errors reported by the Smatch tool
- Changed function documentation comments in lz4.h to match kernel-doc style
- Fixed a misbehaviour of LZ4HC caused by a wrong indentation level of two for
  loops, introduced when I refactored the code style using checkpatch.pl
  (upstream LZ4 puts dozens of statements on a single line, gnah)
- Updated the crypto tests for LZ4 since they did fail for the new code
  and hence zram did fail to allocate memory for LZ4

v6:
- Fixed LZ4_NBCOMMONBYTES() for 64-bit little endian
- Reset LZ4_MEMORY_USAGE to 14 (which is the value used in
  upstream LZ4 as well as the previous kernel module)
- Fixed that weird double-indentation in lz4defs.h and lz4.h
- Adjusted general styling issues in lz4defs.h
  (e.g. lines consisting of more than one instruction)
- Removed the architecture-dependent typedef to reg_t
  since upstream LZ4 is just using size_t and that works fine
- Changed error messages in pstore/platform.c:
  * LZ4_compress_default always returns 0 in case of an error
(no need to print the return value)
  * LZ4_decompress_safe returns a negative error code
    (the return value _does_ matter)

v5:
- Added a fifth patch to remove the back-compat wrappers introduced
  to ensure bisectibility between the patches (the functions are no longer
  needed since there's no callers left)

v4:
- Fixed kbuild errors
- Re-added lz4_compressbound as alias for LZ4_compressBound
  to ensure backwards compatibility
- Wrapped LZ4_hash5 with check for LZ4_ARCH64 since it is only used there
  and triggers an unused function warning when false

v3:
- Adjusted the code to satisfy kernel coding style (checkpatch.pl)
- Made sure the changes to LZ4 in Kernel (overflow checks etc.)
  are included in the new module (they are)
- Removed the second LZ4_compressBound function with related name but
  different return type
- Corrected version number (was LZ4 1.7.3)
- Added missing LZ4 streaming functions

v2:
- Changed order of the patches since in the initial patchset the lz4.h was in 
the
  last patch but was referenced by the other ones
- Split lib/decompress_unlz4.c in an own patch
- Fixed errors reported by the buildbot
- Further refactorings
- Added more appropriate copyright note to include/linux/lz4.h