Re: [PATCH v7 0/5] Update LZ4 compressor module
Hi Sven, On Mon, Feb 13, 2017 at 01:08:41PM +0100, Sven Schmidt wrote: > On Mon, Feb 13, 2017 at 09:03:24AM +0900, Minchan Kim wrote: > > Hi Sven, > > > > On Sun, Feb 12, 2017 at 12:16:17PM +0100, Sven Schmidt wrote: > > > > > > > > > > > > On 02/10/2017 01:13 AM, Minchan Kim wrote: > > > > Hello Sven, > > > > > > > > On Thu, Feb 09, 2017 at 11:56:17AM +0100, Sven Schmidt wrote: > > > >> Hey Minchan, > > > >> > > > >> On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote: > > > >>> Hello Sven, > > > >>> > > > >>> On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote: > > > > > > This patchset is for updating the LZ4 compression module to a > > > version based > > > on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 > > > fast > > > which provides an "acceleration" parameter as a tradeoff between > > > high compression ratio and high compression speed. > > > > > > We want to use LZ4 fast in order to support compression in lustre > > > and (mostly, based on that) investigate data reduction techniques in > > > behalf of > > > storage systems. > > > > > > Also, it will be useful for other users of LZ4 compression, as with > > > LZ4 fast > > > it is possible to enable applications to use fast and/or high > > > compression > > > depending on the usecase. > > > For instance, ZRAM is offering a LZ4 backend and could benefit from > > > an updated > > > LZ4 in the kernel. > > > > > > LZ4 homepage: http://www.lz4.org/ > > > LZ4 source repository: https://github.com/lz4/lz4 > > > Source version: 1.7.3 > > > > > > Benchmark (taken from [1], Core i5-4300U @1.9GHz): > > > |--||-- > > > Compressor | Compression | Decompression | Ratio > > > |--||-- > > > memcpy | 4200 MB/s | 4200 MB/s | 1.000 > > > LZ4 fast 50 | 1080 MB/s | 2650 MB/s | 1.375 > > > LZ4 fast 17 | 680 MB/s | 2220 MB/s | 1.607 > > > LZ4 fast 5 | 475 MB/s | 1920 MB/s | 1.886 > > > LZ4 default | 385 MB/s | 1850 MB/s | 2.101 > > > > > > [1] > > > http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html > > > > > > [PATCH 1/5] lib: Update LZ4 compressor module > > > [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 > > > module version > > > [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module > > > version > > > [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with > > > new LZ4 version > > > [PATCH 5/5] lib/lz4: Remove back-compat wrappers > > > >>> > > > >>> Today, I did zram-lz4 performance test with fio in current mmotm and > > > >>> found it makes regression about 20%. > > > >>> > > > >>> "lz4-update" means current > > > >>> mmots(git://git.cmpxchg.org/linux-mmots.git) so > > > >>> applied your 5 patches. (But now sure current mmots has recent > > > >>> uptodate > > > >>> patches) > > > >>> "revert" means I reverted your 5 patches in current mmots. > > > >>> > > > >>> revertlz4-update > > > >>> > > > >>> seq-write 1547 1339 86.55% > > > >>> rand-write 22775 19381 85.10% > > > >>>seq-read 7035 5589 79.45% > > > >>> rand-read 78556 68479 87.17% > > > >>>mixed-seq(R) 1305 1066 81.69% > > > >>>mixed-seq(W) 1205984 81.66% > > > >>> mixed-rand(R) 17421 14993 86.06% > > > >>> mixed-rand(W) 17391 14968 86.07% > > > >> > > > >> which parts of the output (as well as units) are these values exactly? > > > >> I did not work with fio until now, so I think I might ask before > > > >> misinterpreting my results. > > > > > > > > It is IOPS. > > > > > > > >> > > > >>> My fio description file > > > >>> > > > >>> [global] > > > >>> bs=4k > > > >>> ioengine=sync > > > >>> size=100m > > > >>> numjobs=1 > > > >>> group_reporting > > > >>> buffer_compress_percentage=30 > > > >>> scramble_buffers=0 > > > >>> filename=/dev/zram0 > > > >>> loops=10 > > > >>> fsync_on_close=1 > > > >>> > > > >>> [seq-write] > > > >>> bs=64k > > > >>> rw=write > > > >>> stonewall > > > >>> > > > >>> [rand-write] > > > >>> rw=randwrite > > > >>> stonewall > > > >>> > > > >>> [seq-read] > > > >>> bs=64k > > > >>> rw=read > > > >>> stonewall > > > >>> > > > >>> [rand-read] > > > >>> rw=randread > > > >>> stonewall > > > >>> > > > >>> [mixed-seq] > > > >>> bs=64k > > > >>> rw=rw > > > >>> stonewall > > > >>> > > > >>> [mixed-rand] > > > >>> rw=randrw > > > >>> stonewall > > > >>> > > > >> > > > >> Great, this makes it easy for me to reproduce your test. > > > > > > > > If you have trouble to
Re: [PATCH v7 0/5] Update LZ4 compressor module
On Mon, Feb 13, 2017 at 09:03:24AM +0900, Minchan Kim wrote: > Hi Sven, > > On Sun, Feb 12, 2017 at 12:16:17PM +0100, Sven Schmidt wrote: > > > > > > > > On 02/10/2017 01:13 AM, Minchan Kim wrote: > > > Hello Sven, > > > > > > On Thu, Feb 09, 2017 at 11:56:17AM +0100, Sven Schmidt wrote: > > >> Hey Minchan, > > >> > > >> On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote: > > >>> Hello Sven, > > >>> > > >>> On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote: > > > > This patchset is for updating the LZ4 compression module to a version > > based > > on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 > > fast > > which provides an "acceleration" parameter as a tradeoff between > > high compression ratio and high compression speed. > > > > We want to use LZ4 fast in order to support compression in lustre > > and (mostly, based on that) investigate data reduction techniques in > > behalf of > > storage systems. > > > > Also, it will be useful for other users of LZ4 compression, as with > > LZ4 fast > > it is possible to enable applications to use fast and/or high > > compression > > depending on the usecase. > > For instance, ZRAM is offering a LZ4 backend and could benefit from an > > updated > > LZ4 in the kernel. > > > > LZ4 homepage: http://www.lz4.org/ > > LZ4 source repository: https://github.com/lz4/lz4 > > Source version: 1.7.3 > > > > Benchmark (taken from [1], Core i5-4300U @1.9GHz): > > |--||-- > > Compressor | Compression | Decompression | Ratio > > |--||-- > > memcpy | 4200 MB/s | 4200 MB/s | 1.000 > > LZ4 fast 50 | 1080 MB/s | 2650 MB/s | 1.375 > > LZ4 fast 17 | 680 MB/s | 2220 MB/s | 1.607 > > LZ4 fast 5 | 475 MB/s | 1920 MB/s | 1.886 > > LZ4 default | 385 MB/s | 1850 MB/s | 2.101 > > > > [1] > > http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html > > > > [PATCH 1/5] lib: Update LZ4 compressor module > > [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 > > module version > > [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module > > version > > [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with > > new LZ4 version > > [PATCH 5/5] lib/lz4: Remove back-compat wrappers > > >>> > > >>> Today, I did zram-lz4 performance test with fio in current mmotm and > > >>> found it makes regression about 20%. > > >>> > > >>> "lz4-update" means current mmots(git://git.cmpxchg.org/linux-mmots.git) > > >>> so > > >>> applied your 5 patches. (But now sure current mmots has recent uptodate > > >>> patches) > > >>> "revert" means I reverted your 5 patches in current mmots. > > >>> > > >>> revertlz4-update > > >>> > > >>> seq-write 1547 1339 86.55% > > >>> rand-write 22775 19381 85.10% > > >>>seq-read 7035 5589 79.45% > > >>> rand-read 78556 68479 87.17% > > >>>mixed-seq(R) 1305 1066 81.69% > > >>>mixed-seq(W) 1205984 81.66% > > >>> mixed-rand(R) 17421 14993 86.06% > > >>> mixed-rand(W) 17391 14968 86.07% > > >> > > >> which parts of the output (as well as units) are these values exactly? > > >> I did not work with fio until now, so I think I might ask before > > >> misinterpreting my results. > > > > > > It is IOPS. > > > > > >> > > >>> My fio description file > > >>> > > >>> [global] > > >>> bs=4k > > >>> ioengine=sync > > >>> size=100m > > >>> numjobs=1 > > >>> group_reporting > > >>> buffer_compress_percentage=30 > > >>> scramble_buffers=0 > > >>> filename=/dev/zram0 > > >>> loops=10 > > >>> fsync_on_close=1 > > >>> > > >>> [seq-write] > > >>> bs=64k > > >>> rw=write > > >>> stonewall > > >>> > > >>> [rand-write] > > >>> rw=randwrite > > >>> stonewall > > >>> > > >>> [seq-read] > > >>> bs=64k > > >>> rw=read > > >>> stonewall > > >>> > > >>> [rand-read] > > >>> rw=randread > > >>> stonewall > > >>> > > >>> [mixed-seq] > > >>> bs=64k > > >>> rw=rw > > >>> stonewall > > >>> > > >>> [mixed-rand] > > >>> rw=randrw > > >>> stonewall > > >>> > > >> > > >> Great, this makes it easy for me to reproduce your test. > > > > > > If you have trouble to reproduce, feel free to ask me. I'm happy to test > > > it. :) > > > > > > Thanks! > > > > > > > Hi Minchan, > > > > I will send an updated patch as a reply to this E-Mail. Would be really > > grateful If you'd test it and provide feedback! > > The patch should be applied to the current mmots tree. > > > > In fact, the updated LZ4 _is_ slower than the
Re: [PATCH v7 0/5] Update LZ4 compressor module
Hi Sven, On Sun, Feb 12, 2017 at 12:16:17PM +0100, Sven Schmidt wrote: > > > > On 02/10/2017 01:13 AM, Minchan Kim wrote: > > Hello Sven, > > > > On Thu, Feb 09, 2017 at 11:56:17AM +0100, Sven Schmidt wrote: > >> Hey Minchan, > >> > >> On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote: > >>> Hello Sven, > >>> > >>> On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote: > > This patchset is for updating the LZ4 compression module to a version > based > on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 fast > which provides an "acceleration" parameter as a tradeoff between > high compression ratio and high compression speed. > > We want to use LZ4 fast in order to support compression in lustre > and (mostly, based on that) investigate data reduction techniques in > behalf of > storage systems. > > Also, it will be useful for other users of LZ4 compression, as with LZ4 > fast > it is possible to enable applications to use fast and/or high compression > depending on the usecase. > For instance, ZRAM is offering a LZ4 backend and could benefit from an > updated > LZ4 in the kernel. > > LZ4 homepage: http://www.lz4.org/ > LZ4 source repository: https://github.com/lz4/lz4 > Source version: 1.7.3 > > Benchmark (taken from [1], Core i5-4300U @1.9GHz): > |--||-- > Compressor | Compression | Decompression | Ratio > |--||-- > memcpy | 4200 MB/s | 4200 MB/s | 1.000 > LZ4 fast 50 | 1080 MB/s | 2650 MB/s | 1.375 > LZ4 fast 17 | 680 MB/s | 2220 MB/s | 1.607 > LZ4 fast 5 | 475 MB/s | 1920 MB/s | 1.886 > LZ4 default | 385 MB/s | 1850 MB/s | 2.101 > > [1] > http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html > > [PATCH 1/5] lib: Update LZ4 compressor module > [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 > module version > [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module > version > [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new > LZ4 version > [PATCH 5/5] lib/lz4: Remove back-compat wrappers > >>> > >>> Today, I did zram-lz4 performance test with fio in current mmotm and > >>> found it makes regression about 20%. > >>> > >>> "lz4-update" means current mmots(git://git.cmpxchg.org/linux-mmots.git) so > >>> applied your 5 patches. (But now sure current mmots has recent uptodate > >>> patches) > >>> "revert" means I reverted your 5 patches in current mmots. > >>> > >>> revertlz4-update > >>> > >>> seq-write 1547 1339 86.55% > >>> rand-write 22775 19381 85.10% > >>>seq-read 7035 5589 79.45% > >>> rand-read 78556 68479 87.17% > >>>mixed-seq(R) 1305 1066 81.69% > >>>mixed-seq(W) 1205984 81.66% > >>> mixed-rand(R) 17421 14993 86.06% > >>> mixed-rand(W) 17391 14968 86.07% > >> > >> which parts of the output (as well as units) are these values exactly? > >> I did not work with fio until now, so I think I might ask before > >> misinterpreting my results. > > > > It is IOPS. > > > >> > >>> My fio description file > >>> > >>> [global] > >>> bs=4k > >>> ioengine=sync > >>> size=100m > >>> numjobs=1 > >>> group_reporting > >>> buffer_compress_percentage=30 > >>> scramble_buffers=0 > >>> filename=/dev/zram0 > >>> loops=10 > >>> fsync_on_close=1 > >>> > >>> [seq-write] > >>> bs=64k > >>> rw=write > >>> stonewall > >>> > >>> [rand-write] > >>> rw=randwrite > >>> stonewall > >>> > >>> [seq-read] > >>> bs=64k > >>> rw=read > >>> stonewall > >>> > >>> [rand-read] > >>> rw=randread > >>> stonewall > >>> > >>> [mixed-seq] > >>> bs=64k > >>> rw=rw > >>> stonewall > >>> > >>> [mixed-rand] > >>> rw=randrw > >>> stonewall > >>> > >> > >> Great, this makes it easy for me to reproduce your test. > > > > If you have trouble to reproduce, feel free to ask me. I'm happy to test > > it. :) > > > > Thanks! > > > > Hi Minchan, > > I will send an updated patch as a reply to this E-Mail. Would be really > grateful If you'd test it and provide feedback! > The patch should be applied to the current mmots tree. > > In fact, the updated LZ4 _is_ slower than the current one in kernel. But I > was not able to reproduce such large regressions > as you did. I now tried to define FORCE_INLINE as Eric suggested. I also > inlined some functions which weren't in upstream LZ4, > but are defined as macros in the current kernel LZ4. The approach to replace > LZ4_ARCH64 with the function call _seemed_ to behave > worse than the macro, so I
Re: [PATCH v7 0/5] Update LZ4 compressor module
On 02/10/2017 01:13 AM, Minchan Kim wrote: > Hello Sven, > > On Thu, Feb 09, 2017 at 11:56:17AM +0100, Sven Schmidt wrote: >> Hey Minchan, >> >> On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote: >>> Hello Sven, >>> >>> On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote: This patchset is for updating the LZ4 compression module to a version based on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 fast which provides an "acceleration" parameter as a tradeoff between high compression ratio and high compression speed. We want to use LZ4 fast in order to support compression in lustre and (mostly, based on that) investigate data reduction techniques in behalf of storage systems. Also, it will be useful for other users of LZ4 compression, as with LZ4 fast it is possible to enable applications to use fast and/or high compression depending on the usecase. For instance, ZRAM is offering a LZ4 backend and could benefit from an updated LZ4 in the kernel. LZ4 homepage: http://www.lz4.org/ LZ4 source repository: https://github.com/lz4/lz4 Source version: 1.7.3 Benchmark (taken from [1], Core i5-4300U @1.9GHz): |--||-- Compressor | Compression | Decompression | Ratio |--||-- memcpy | 4200 MB/s | 4200 MB/s | 1.000 LZ4 fast 50 | 1080 MB/s | 2650 MB/s | 1.375 LZ4 fast 17 | 680 MB/s | 2220 MB/s | 1.607 LZ4 fast 5 | 475 MB/s | 1920 MB/s | 1.886 LZ4 default | 385 MB/s | 1850 MB/s | 2.101 [1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html [PATCH 1/5] lib: Update LZ4 compressor module [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 module version [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new LZ4 version [PATCH 5/5] lib/lz4: Remove back-compat wrappers >>> >>> Today, I did zram-lz4 performance test with fio in current mmotm and >>> found it makes regression about 20%. >>> >>> "lz4-update" means current mmots(git://git.cmpxchg.org/linux-mmots.git) so >>> applied your 5 patches. (But now sure current mmots has recent uptodate >>> patches) >>> "revert" means I reverted your 5 patches in current mmots. >>> >>> revertlz4-update >>> >>> seq-write 1547 1339 86.55% >>> rand-write 22775 19381 85.10% >>>seq-read 7035 5589 79.45% >>> rand-read 78556 68479 87.17% >>>mixed-seq(R) 1305 1066 81.69% >>>mixed-seq(W) 1205984 81.66% >>> mixed-rand(R) 17421 14993 86.06% >>> mixed-rand(W) 17391 14968 86.07% >> >> which parts of the output (as well as units) are these values exactly? >> I did not work with fio until now, so I think I might ask before >> misinterpreting my results. > > It is IOPS. > >> >>> My fio description file >>> >>> [global] >>> bs=4k >>> ioengine=sync >>> size=100m >>> numjobs=1 >>> group_reporting >>> buffer_compress_percentage=30 >>> scramble_buffers=0 >>> filename=/dev/zram0 >>> loops=10 >>> fsync_on_close=1 >>> >>> [seq-write] >>> bs=64k >>> rw=write >>> stonewall >>> >>> [rand-write] >>> rw=randwrite >>> stonewall >>> >>> [seq-read] >>> bs=64k >>> rw=read >>> stonewall >>> >>> [rand-read] >>> rw=randread >>> stonewall >>> >>> [mixed-seq] >>> bs=64k >>> rw=rw >>> stonewall >>> >>> [mixed-rand] >>> rw=randrw >>> stonewall >>> >> >> Great, this makes it easy for me to reproduce your test. > > If you have trouble to reproduce, feel free to ask me. I'm happy to test it. > :) > > Thanks! > Hi Minchan, I will send an updated patch as a reply to this E-Mail. Would be really grateful If you'd test it and provide feedback! The patch should be applied to the current mmots tree. In fact, the updated LZ4 _is_ slower than the current one in kernel. But I was not able to reproduce such large regressions as you did. I now tried to define FORCE_INLINE as Eric suggested. I also inlined some functions which weren't in upstream LZ4, but are defined as macros in the current kernel LZ4. The approach to replace LZ4_ARCH64 with the function call _seemed_ to behave worse than the macro, so I withdrew the change. The main difference is, that I replaced the read32/read16/write... etc. functions using memcpy with the other ones defined in upstream LZ4 (which can be switched using a macro). The comment of the author stated, that they're as fast as the memcpy variants (or faster), but not as portable (which does not matter since we're not dependent for multiple compilers). In
Re: [PATCH v7 0/5] Update LZ4 compressor module
From: Eric BiggersDate: Thu, 9 Feb 2017 10:29:14 -0800 > On Thu, Feb 09, 2017 at 12:02:11PM +0100, Sven Schmidt wrote: >> > >> > [Also, for some reason linux-crypto is apparently still not receiving >> > patch 1/5 >> > in the series. It's missing from the linux-crypto archive at >> > http://www.spinics.net/lists/linux-crypto/, so it's not just me.] >> > >> >> I don't really know what to do about this. I think the matter is the size of >> the E-Mail. >> Are there filters or something like that? Since in linux-kernel the patch >> seems to get delivered. >> I could otherwise CC you if you wish. >> > > If I'm not mistaken, David Miller is the admin of the mail server on > vger.kernel.org, and he already happens to be Cc'ed on this thread, so maybe > he > can answer as to why linux-crypto may be configured differently? > > Anyway, since the patch did make it to linux-kernel anyone can still download > it > from patchwork if they know where to look: > https://patchwork.kernel.org/patch/9556271/ I've increased the maxlength parameter for linux-cryto so this doesn't happen again.
Re: [PATCH v7 0/5] Update LZ4 compressor module
Hello Sven, On Thu, Feb 09, 2017 at 11:56:17AM +0100, Sven Schmidt wrote: > Hey Minchan, > > On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote: > > Hello Sven, > > > > On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote: > > > > > > This patchset is for updating the LZ4 compression module to a version > > > based > > > on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 fast > > > which provides an "acceleration" parameter as a tradeoff between > > > high compression ratio and high compression speed. > > > > > > We want to use LZ4 fast in order to support compression in lustre > > > and (mostly, based on that) investigate data reduction techniques in > > > behalf of > > > storage systems. > > > > > > Also, it will be useful for other users of LZ4 compression, as with LZ4 > > > fast > > > it is possible to enable applications to use fast and/or high compression > > > depending on the usecase. > > > For instance, ZRAM is offering a LZ4 backend and could benefit from an > > > updated > > > LZ4 in the kernel. > > > > > > LZ4 homepage: http://www.lz4.org/ > > > LZ4 source repository: https://github.com/lz4/lz4 > > > Source version: 1.7.3 > > > > > > Benchmark (taken from [1], Core i5-4300U @1.9GHz): > > > |--||-- > > > Compressor | Compression | Decompression | Ratio > > > |--||-- > > > memcpy | 4200 MB/s | 4200 MB/s | 1.000 > > > LZ4 fast 50 | 1080 MB/s | 2650 MB/s | 1.375 > > > LZ4 fast 17 | 680 MB/s | 2220 MB/s | 1.607 > > > LZ4 fast 5 | 475 MB/s | 1920 MB/s | 1.886 > > > LZ4 default | 385 MB/s | 1850 MB/s | 2.101 > > > > > > [1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html > > > > > > [PATCH 1/5] lib: Update LZ4 compressor module > > > [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 > > > module version > > > [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version > > > [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new > > > LZ4 version > > > [PATCH 5/5] lib/lz4: Remove back-compat wrappers > > > > Today, I did zram-lz4 performance test with fio in current mmotm and > > found it makes regression about 20%. > > > > "lz4-update" means current mmots(git://git.cmpxchg.org/linux-mmots.git) so > > applied your 5 patches. (But now sure current mmots has recent uptodate > > patches) > > "revert" means I reverted your 5 patches in current mmots. > > > > revertlz4-update > > > > seq-write 1547 1339 86.55% > > rand-write 22775 19381 85.10% > >seq-read 7035 5589 79.45% > > rand-read 78556 68479 87.17% > >mixed-seq(R) 1305 1066 81.69% > >mixed-seq(W) 1205984 81.66% > > mixed-rand(R) 17421 14993 86.06% > > mixed-rand(W) 17391 14968 86.07% > > which parts of the output (as well as units) are these values exactly? > I did not work with fio until now, so I think I might ask before > misinterpreting my results. It is IOPS. > > > My fio description file > > > > [global] > > bs=4k > > ioengine=sync > > size=100m > > numjobs=1 > > group_reporting > > buffer_compress_percentage=30 > > scramble_buffers=0 > > filename=/dev/zram0 > > loops=10 > > fsync_on_close=1 > > > > [seq-write] > > bs=64k > > rw=write > > stonewall > > > > [rand-write] > > rw=randwrite > > stonewall > > > > [seq-read] > > bs=64k > > rw=read > > stonewall > > > > [rand-read] > > rw=randread > > stonewall > > > > [mixed-seq] > > bs=64k > > rw=rw > > stonewall > > > > [mixed-rand] > > rw=randrw > > stonewall > > > > Great, this makes it easy for me to reproduce your test. If you have trouble to reproduce, feel free to ask me. I'm happy to test it. :) Thanks!
Re: [PATCH v7 0/5] Update LZ4 compressor module
On Thu, Feb 09, 2017 at 12:02:11PM +0100, Sven Schmidt wrote: > > > > [Also, for some reason linux-crypto is apparently still not receiving patch > > 1/5 > > in the series. It's missing from the linux-crypto archive at > > http://www.spinics.net/lists/linux-crypto/, so it's not just me.] > > > > I don't really know what to do about this. I think the matter is the size of > the E-Mail. > Are there filters or something like that? Since in linux-kernel the patch > seems to get delivered. > I could otherwise CC you if you wish. > If I'm not mistaken, David Miller is the admin of the mail server on vger.kernel.org, and he already happens to be Cc'ed on this thread, so maybe he can answer as to why linux-crypto may be configured differently? Anyway, since the patch did make it to linux-kernel anyone can still download it from patchwork if they know where to look: https://patchwork.kernel.org/patch/9556271/ Eric
Re: [PATCH v7 0/5] Update LZ4 compressor module
On Thu, Feb 09, 2017 at 12:05:40PM +0100, Sven Schmidt wrote: > > Because of how LZ4_ARCH64 is defined, it needs to be '#if LZ4_ARCH64'. > > > > But I also think the way upstream LZ4 does 64-bit detection could have just > > been > > left as-is; it has a function which gets inlined: > > > > static unsigned LZ4_64bits(void) { return sizeof(void*)==8; } > > > > Eric > > does this apply for LZ4_isLittleEndian() as well? As a reminder: > > static unsigned LZ4_isLittleEndian(void) > { > /* don't use static : performance detrimental */ > const union { U32 u; BYTE c[4]; } one = { 1 }; > > return one.c[0]; > } > > It is surely easier to read and understand using these functions in favor of > the macros. Yes, it makes sense to retain LZ4_isLittleEndian() too, mainly so that the call sites don't need to be changed and we have less chance of introducing bugs. I also believe the *implementation* of LZ4_isLittleEndian() using the union "hack" to detecting endianness works fine and will get optimized correctly, though we could replace it with an #ifdef __LITTLE_ENDIAN__ if we wanted to. (I am sure that upstream LZ4 would do that if it was guaranteed to work, but upstream LZ4 needs to detect endianness reliably across many more different compilers, compiler versions, and platforms than the Linux kernel does, and there is no standard preprocessor macro for doing so.) Eric
Re: [PATCH v7 0/5] Update LZ4 compressor module
Hey Eric, On Wed, Feb 08, 2017 at 09:24:25PM -0800, Eric Biggers wrote: > Also I noticed another bug, this time in LZ4_count(): > > > #if defined(CONFIG_64BIT) > > #define LZ4_ARCH64 1 > > #else > > #define LZ4_ARCH64 0 > > #endif > ... > > #ifdef LZ4_ARCH64 > >if ((pIn < (pInLimit-3)) > >&& (LZ4_read32(pMatch) == LZ4_read32(pIn))) { > >pIn += 4; pMatch += 4; > >} > > #endif > > Because of how LZ4_ARCH64 is defined, it needs to be '#if LZ4_ARCH64'. > > But I also think the way upstream LZ4 does 64-bit detection could have just > been > left as-is; it has a function which gets inlined: > > static unsigned LZ4_64bits(void) { return sizeof(void*)==8; } > > Eric does this apply for LZ4_isLittleEndian() as well? As a reminder: static unsigned LZ4_isLittleEndian(void) { /* don't use static : performance detrimental */ const union { U32 u; BYTE c[4]; } one = { 1 }; return one.c[0]; } It is surely easier to read and understand using these functions in favor of the macros. Thanks, Sven
Re: [PATCH v7 0/5] Update LZ4 compressor module
Hey Eric, On Wed, Feb 08, 2017 at 04:24:36PM -0800, Eric Biggers wrote: > On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote: > > > > Today, I did zram-lz4 performance test with fio in current mmotm and > > found it makes regression about 20%. > > > > This may or may not be the cause of the specific regression you're observing, > but I just noticed that the proposed patch drops a lot of FORCEINLINE > annotations from upstream LZ4. The FORCEINLINE's are there for a reason, > especially for the main decompression and compression functions which are > basically "templates" that take in different sets of constant parameters, and > should be left in. We should #define FORCEINLINE to __always_inline > somewhere, > or just do a s/FORCEINLINE/__always_inline/g. > I generally just replaced "FORCE_INLINE" by "static inline". At least I thought so. I rechecked and realised, I missed at least two of them (why did I not just use "search+replace"?). So I think it's maybe safer and easier to eventually just use "FORCE_INLINE" with the definition you suggested. Will try that. > Note that the upstream LZ4 code is very carefully optimized, so we should not, > in general, be changing things like when functions are force-inlined, what the > hash table size is, etc. > > [Also, for some reason linux-crypto is apparently still not receiving patch > 1/5 > in the series. It's missing from the linux-crypto archive at > http://www.spinics.net/lists/linux-crypto/, so it's not just me.] > I don't really know what to do about this. I think the matter is the size of the E-Mail. Are there filters or something like that? Since in linux-kernel the patch seems to get delivered. I could otherwise CC you if you wish. Thanks, Sven
Re: [PATCH v7 0/5] Update LZ4 compressor module
Hey Minchan, On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote: > Hello Sven, > > On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote: > > > > This patchset is for updating the LZ4 compression module to a version based > > on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 fast > > which provides an "acceleration" parameter as a tradeoff between > > high compression ratio and high compression speed. > > > > We want to use LZ4 fast in order to support compression in lustre > > and (mostly, based on that) investigate data reduction techniques in behalf > > of > > storage systems. > > > > Also, it will be useful for other users of LZ4 compression, as with LZ4 fast > > it is possible to enable applications to use fast and/or high compression > > depending on the usecase. > > For instance, ZRAM is offering a LZ4 backend and could benefit from an > > updated > > LZ4 in the kernel. > > > > LZ4 homepage: http://www.lz4.org/ > > LZ4 source repository: https://github.com/lz4/lz4 > > Source version: 1.7.3 > > > > Benchmark (taken from [1], Core i5-4300U @1.9GHz): > > |--||-- > > Compressor | Compression | Decompression | Ratio > > |--||-- > > memcpy | 4200 MB/s | 4200 MB/s | 1.000 > > LZ4 fast 50 | 1080 MB/s | 2650 MB/s | 1.375 > > LZ4 fast 17 | 680 MB/s | 2220 MB/s | 1.607 > > LZ4 fast 5 | 475 MB/s | 1920 MB/s | 1.886 > > LZ4 default | 385 MB/s | 1850 MB/s | 2.101 > > > > [1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html > > > > [PATCH 1/5] lib: Update LZ4 compressor module > > [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 module > > version > > [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version > > [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new > > LZ4 version > > [PATCH 5/5] lib/lz4: Remove back-compat wrappers > > Today, I did zram-lz4 performance test with fio in current mmotm and > found it makes regression about 20%. > > "lz4-update" means current mmots(git://git.cmpxchg.org/linux-mmots.git) so > applied your 5 patches. (But now sure current mmots has recent uptodate > patches) > "revert" means I reverted your 5 patches in current mmots. > > revertlz4-update > > seq-write 1547 1339 86.55% > rand-write 22775 19381 85.10% >seq-read 7035 5589 79.45% > rand-read 78556 68479 87.17% >mixed-seq(R) 1305 1066 81.69% >mixed-seq(W) 1205984 81.66% > mixed-rand(R) 17421 14993 86.06% > mixed-rand(W) 17391 14968 86.07% which parts of the output (as well as units) are these values exactly? I did not work with fio until now, so I think I might ask before misinterpreting my results. > My fio description file > > [global] > bs=4k > ioengine=sync > size=100m > numjobs=1 > group_reporting > buffer_compress_percentage=30 > scramble_buffers=0 > filename=/dev/zram0 > loops=10 > fsync_on_close=1 > > [seq-write] > bs=64k > rw=write > stonewall > > [rand-write] > rw=randwrite > stonewall > > [seq-read] > bs=64k > rw=read > stonewall > > [rand-read] > rw=randread > stonewall > > [mixed-seq] > bs=64k > rw=rw > stonewall > > [mixed-rand] > rw=randrw > stonewall > Great, this makes it easy for me to reproduce your test. Thanks, Sven
Re: [PATCH v7 0/5] Update LZ4 compressor module
On Thu, Feb 09, 2017 at 08:31:21AM +0900, Minchan Kim wrote: > > Today, I did zram-lz4 performance test with fio in current mmotm and > found it makes regression about 20%. > This may or may not be the cause of the specific regression you're observing, but I just noticed that the proposed patch drops a lot of FORCEINLINE annotations from upstream LZ4. The FORCEINLINE's are there for a reason, especially for the main decompression and compression functions which are basically "templates" that take in different sets of constant parameters, and should be left in. We should #define FORCEINLINE to __always_inline somewhere, or just do a s/FORCEINLINE/__always_inline/g. Note that the upstream LZ4 code is very carefully optimized, so we should not, in general, be changing things like when functions are force-inlined, what the hash table size is, etc. [Also, for some reason linux-crypto is apparently still not receiving patch 1/5 in the series. It's missing from the linux-crypto archive at http://www.spinics.net/lists/linux-crypto/, so it's not just me.] Thanks! Eric
Re: [PATCH v7 0/5] Update LZ4 compressor module
Hello Sven, On Sun, Feb 05, 2017 at 08:09:03PM +0100, Sven Schmidt wrote: > > This patchset is for updating the LZ4 compression module to a version based > on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 fast > which provides an "acceleration" parameter as a tradeoff between > high compression ratio and high compression speed. > > We want to use LZ4 fast in order to support compression in lustre > and (mostly, based on that) investigate data reduction techniques in behalf of > storage systems. > > Also, it will be useful for other users of LZ4 compression, as with LZ4 fast > it is possible to enable applications to use fast and/or high compression > depending on the usecase. > For instance, ZRAM is offering a LZ4 backend and could benefit from an updated > LZ4 in the kernel. > > LZ4 homepage: http://www.lz4.org/ > LZ4 source repository: https://github.com/lz4/lz4 > Source version: 1.7.3 > > Benchmark (taken from [1], Core i5-4300U @1.9GHz): > |--||-- > Compressor | Compression | Decompression | Ratio > |--||-- > memcpy | 4200 MB/s | 4200 MB/s | 1.000 > LZ4 fast 50 | 1080 MB/s | 2650 MB/s | 1.375 > LZ4 fast 17 | 680 MB/s | 2220 MB/s | 1.607 > LZ4 fast 5 | 475 MB/s | 1920 MB/s | 1.886 > LZ4 default | 385 MB/s | 1850 MB/s | 2.101 > > [1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html > > [PATCH 1/5] lib: Update LZ4 compressor module > [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 module > version > [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version > [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new LZ4 > version > [PATCH 5/5] lib/lz4: Remove back-compat wrappers Today, I did zram-lz4 performance test with fio in current mmotm and found it makes regression about 20%. "lz4-update" means current mmots(git://git.cmpxchg.org/linux-mmots.git) so applied your 5 patches. (But now sure current mmots has recent uptodate patches) "revert" means I reverted your 5 patches in current mmots. revertlz4-update seq-write 1547 1339 86.55% rand-write 22775 19381 85.10% seq-read 7035 5589 79.45% rand-read 78556 68479 87.17% mixed-seq(R) 1305 1066 81.69% mixed-seq(W) 1205984 81.66% mixed-rand(R) 17421 14993 86.06% mixed-rand(W) 17391 14968 86.07% My fio description file [global] bs=4k ioengine=sync size=100m numjobs=1 group_reporting buffer_compress_percentage=30 scramble_buffers=0 filename=/dev/zram0 loops=10 fsync_on_close=1 [seq-write] bs=64k rw=write stonewall [rand-write] rw=randwrite stonewall [seq-read] bs=64k rw=read stonewall [rand-read] rw=randread stonewall [mixed-seq] bs=64k rw=rw stonewall [mixed-rand] rw=randrw stonewall
[PATCH v7 0/5] Update LZ4 compressor module
This patchset is for updating the LZ4 compression module to a version based on LZ4 v1.7.3 allowing to use the fast compression algorithm aka LZ4 fast which provides an "acceleration" parameter as a tradeoff between high compression ratio and high compression speed. We want to use LZ4 fast in order to support compression in lustre and (mostly, based on that) investigate data reduction techniques in behalf of storage systems. Also, it will be useful for other users of LZ4 compression, as with LZ4 fast it is possible to enable applications to use fast and/or high compression depending on the usecase. For instance, ZRAM is offering a LZ4 backend and could benefit from an updated LZ4 in the kernel. LZ4 homepage: http://www.lz4.org/ LZ4 source repository: https://github.com/lz4/lz4 Source version: 1.7.3 Benchmark (taken from [1], Core i5-4300U @1.9GHz): |--||-- Compressor | Compression | Decompression | Ratio |--||-- memcpy | 4200 MB/s | 4200 MB/s | 1.000 LZ4 fast 50 | 1080 MB/s | 2650 MB/s | 1.375 LZ4 fast 17 | 680 MB/s | 2220 MB/s | 1.607 LZ4 fast 5 | 475 MB/s | 1920 MB/s | 1.886 LZ4 default | 385 MB/s | 1850 MB/s | 2.101 [1] http://fastcompression.blogspot.de/2015/04/sampling-or-faster-lz4.html [PATCH 1/5] lib: Update LZ4 compressor module [PATCH 2/5] lib/decompress_unlz4: Change module to work with new LZ4 module version [PATCH 3/5] crypto: Change LZ4 modules to work with new LZ4 module version [PATCH 4/5] fs/pstore: fs/squashfs: Change usage of LZ4 to work with new LZ4 version [PATCH 5/5] lib/lz4: Remove back-compat wrappers Changes: v7: - Fixed errors reported by the Smatch tool - Changed function documentation comments in lz4.h to match kernel-doc style - Fixed a misbehaviour of LZ4HC caused by the wrong level of indentation concerning two for loops introduced after I refactored the code style using checkpatch.pl (upstream LZ4 put dozens of stuff in just one line, gnah) - Updated the crypto tests for LZ4 since they did fail for the new code and hence zram did fail to allocate memory for LZ4 v6: - Fixed LZ4_NBCOMMONBYTES() for 64-bit little endian - Reset LZ4_MEMORY_USAGE to 14 (which is the value used in upstream LZ4 as well as the previous kernel module) - Fixed that weird double-indentation in lz4defs.h and lz4.h - Adjusted general styling issues in lz4defs.h (e.g. lines consisting of more than one instruction) - Removed the architecture-dependent typedef to reg_t since upstream LZ4 is just using size_t and that works fine - Changed error messages in pstore/platform.c: * LZ4_compress_default always returns 0 in case of an error (no need to print the return value) * LZ4_decompress_safe returns a negative error message (return value _does_ matter) v5: - Added a fifth patch to remove the back-compat wrappers introduced to ensure bisectibility between the patches (the functions are no longer needed since there's no callers left) v4: - Fixed kbuild errors - Re-added lz4_compressbound as alias for LZ4_compressBound to ensure backwards compatibility - Wrapped LZ4_hash5 with check for LZ4_ARCH64 since it is only used there and triggers an unused function warning when false v3: - Adjusted the code to satisfy kernel coding style (checkpatch.pl) - Made sure the changes to LZ4 in Kernel (overflow checks etc.) are included in the new module (they are) - Removed the second LZ4_compressBound function with related name but different return type - Corrected version number (was LZ4 1.7.3) - Added missing LZ4 streaming functions v2: - Changed order of the patches since in the initial patchset the lz4.h was in the last patch but was referenced by the other ones - Split lib/decompress_unlz4.c in an own patch - Fixed errors reported by the buildbot - Further refactorings - Added more appropriate copyright note to include/linux/lz4.h