Re: [PATCH v7 0/5] add compression algorithm zBeWalgo

2018-04-20 Thread Minchan Kim
Hi Benjamin,

Today I tried your new patchset, but I couldn't get any further because of
the problem below. Unfortunately, I don't have the time to look into it.
Could you check on it?

Thanks.

[  169.597064] zram0: detected capacity change from 1073741824 to 0
[  177.523268] zram0: detected capacity change from 0 to 1073741824
[  181.312545] BUG: sleeping function called from invalid context at mm/page-writeback.c:2274
[  181.315578] in_atomic(): 1, irqs_disabled(): 0, pid: 2051, name: dd
[  181.317804] 1 lock held by dd/2051:
[  181.318973]  #0: d83cd3cb (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0x41/0x1f0
[  181.321590] CPU: 5 PID: 2051 Comm: dd Not tainted 4.16.0-mm1+ #202
[  181.323599] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[  181.326295] Call Trace:
[  181.327117]  dump_stack+0x67/0x9b
[  181.328246]  ___might_sleep+0x149/0x230
[  181.329475]  write_cache_pages+0x31d/0x620
[  181.330726]  ? tag_pages_for_writeback+0x140/0x140
[  181.332201]  ? __lock_acquire+0x2b5/0x1300
[  181.333466]  generic_writepages+0x5f/0x90
[  181.334695]  ? do_writepages+0x4b/0xf0
[  181.335840]  ? blkdev_readpages+0x20/0x20
[  181.337077]  do_writepages+0x4b/0xf0
[  181.338174]  ? __filemap_fdatawrite_range+0xb4/0x100
[  181.339672]  ? __blkdev_put+0x41/0x1f0
[  181.340826]  ? __filemap_fdatawrite_range+0xc1/0x100
[  181.342251]  __filemap_fdatawrite_range+0xc1/0x100
[  181.343610]  filemap_write_and_wait+0x2c/0x70
[  181.344867]  __blkdev_put+0x71/0x1f0
[  181.345891]  blkdev_close+0x21/0x30
[  181.346889]  __fput+0xeb/0x220
[  181.347769]  task_work_run+0x93/0xc0
[  181.348803]  exit_to_usermode_loop+0x8d/0x90
[  181.350009]  do_syscall_64+0x16b/0x1b0
[  181.351080]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[  181.352498] RIP: 0033:0x7f5e88e028f0
[  181.353512] RSP: 002b:7fff448399d8 EFLAGS: 0246 ORIG_RAX: 0003
[  181.355501] RAX:  RBX: 0001 RCX: 7f5e88e028f0
[  181.357382] RDX: 1000 RSI:  RDI: 0001
[  181.359254] RBP: 7f5e892e2698 R08: 0117e000 R09: 7fff448f2080
[  181.361134] R10: 086f R11: 0246 R12: 
[  181.362995] R13:  R14:  R15: 
[  181.365448] show_signal_msg: 12 callbacks suppressed
[  181.365452] dd[2051]: segfault at 7f5e88d78d70 ip 7f5e88d78d70 sp 7fff44839548 error 14 in libc-2.23.so[7f5e88d0b000+1c]
[  181.369877] BUG: scheduling while atomic: dd/2051/0x0002
[  181.371734] no locks held by dd/2051.
[  181.372658] Modules linked in:
[  181.373503] CPU: 5 PID: 2051 Comm: dd Tainted: G        W 4.16.0-mm1+ #202
[  181.375379] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[  181.377454] Call Trace:
[  181.378055]  dump_stack+0x67/0x9b
[  181.378854]  __schedule_bug+0x5d/0x80
[  181.379731]  __schedule+0x7b5/0xbd0
[  181.380569]  ? find_held_lock+0x2d/0x90
[  181.381503]  ? try_to_wake_up+0x56/0x510
[  181.382437]  ? wait_for_completion+0x112/0x1a0
[  181.383486]  schedule+0x2f/0x90
[  181.384237]  schedule_timeout+0x22b/0x550
[  181.385198]  ? find_held_lock+0x2d/0x90
[  181.386105]  ? wait_for_completion+0x132/0x1a0
[  181.387158]  ? wait_for_completion+0x112/0x1a0
[  181.388221]  wait_for_completion+0x13a/0x1a0
[  181.389236]  ? wake_up_q+0x70/0x70
[  181.390008]  call_usermodehelper_exec+0x13b/0x170
[  181.391067]  do_coredump+0xaed/0x1040
[  181.391893]  ? try_to_wake_up+0x56/0x510
[  181.392815]  ? __lock_is_held+0x55/0x90
[  181.393694]  get_signal+0x32f/0x8e0
[  181.394485]  ? page_fault+0x2f/0x50
[  181.395271]  do_signal+0x36/0x6f0
[  181.396021]  ? force_sig_info_fault+0x97/0xf0
[  181.397018]  ? __bad_area_nosemaphore+0x19e/0x1b0
[  181.398074]  ? __do_page_fault+0xde/0x4b0
[  181.398977]  ? page_fault+0x2f/0x50
[  181.399780]  exit_to_usermode_loop+0x62/0x90
[  181.400770]  prepare_exit_to_usermode+0xbf/0xd0
[  181.401734]  retint_user+0x8/0x18
[  181.402446] RIP: 0033:0x7f5e88d78d70
[  181.403213] RSP: 002b:7fff44839548 EFLAGS: 00010246
[  181.404319] RAX: 7fff4483956f RBX: 7fff44839550 RCX: 007361696c612e65
[  181.405827] RDX:  RSI: 7f5e88e97733 RDI: 7fff44839550
[  181.407245] RBP: 7fff44839790 R08: 656c61636f6c2f65 R09: feff7e5cff372c00
[  181.408920] R10:  R11:  R12: 0117df30
[  181.410817] R13: 7fff44839860 R14: 7fff44839880 R15: 



On Fri, Apr 13, 2018 at 05:48:35PM +0200, Benjamin Warnke wrote:
> This patch series adds a new compression algorithm to the kernel and to
> the crypto api.
> 
> Changes since v6:
> - Fixed git apply error due to other recently applied patches
> 
> Changes since v5:
> - Fixed compile-error due to variable definitions inside #ifdef CONFIG_ZRAM_WRITEBACK
> 
> Changes since v4:
> - Fix mismatching function-prototypes
> - Fix mismatching License errors
> - Add static to global vars

[PATCH v7 0/5] add compression algorithm zBeWalgo

2018-04-13 Thread Benjamin Warnke
This patch series adds a new compression algorithm to the kernel and to
the crypto api.

Changes since v6:
- Fixed git apply error due to other recently applied patches

Changes since v5:
- Fixed compile-error due to variable definitions inside #ifdef CONFIG_ZRAM_WRITEBACK

Changes since v4:
- Fix mismatching function-prototypes
- Fix mismatching License errors
- Add static to global vars
- Add ULL to long constants

Changes since v3:
- Split patch into patchset
- Add Zstd = Zstandard to the list of benchmarked algorithms
- Added configurable compression levels to crypto-api
- Added multiple compression levels to the benchmarks below
- Added unsafe decompressor functions to crypto-api
- Added flag to mark unstable algorithms to crypto-api
- Test the code using afl-fuzz and fix the issues it found
- Added 2 new benchmark datasets
- checkpatch.pl fixes

Changes since v2:
- added the linux-kernel mailing list

Changes since v1:
- improved documentation
- improved code style
- replaced numerous casts with get_unaligned*
- added tests in crypto/testmgr.h/c
- added zBeWalgo to the list of algorithms shown by /sys/block/zram0/comp_algorithm


Currently ZRAM uses compression algorithms from the crypto API. ZRAM
compresses each page individually. As a result, the compression algorithm is
forced to use a very small sliding window. None of the available compression
algorithms is designed to achieve high compression ratios with such small
inputs.
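To make the sliding-window limitation concrete, here is a minimal sketch of an LZ77-style match search in C. It assumes 4 KiB pages, and the helper names are hypothetical (not taken from ZRAM or zBeWalgo). When every page is compressed on its own, the search cannot reach back into earlier pages, so data repeated across pages cannot be referenced at all.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096 /* assumption: 4 KiB pages */

/* Hypothetical helper: length of the longest match for the bytes at `pos`
 * whose source starts in [window_start, pos). Matches may not run past
 * `end`. Naive O(n^2) search, for illustration only. */
static size_t longest_match(const unsigned char *buf, size_t pos,
                            size_t window_start, size_t end)
{
    size_t best = 0;

    assert(window_start <= pos && pos <= end);
    for (size_t cand = window_start; cand < pos; cand++) {
        size_t n = 0;
        while (pos + n < end && buf[cand + n] == buf[pos + n])
            n++;
        if (n > best)
            best = n;
    }
    return best;
}

/* Per-page compression: both the window and the match are confined to the
 * page that contains `pos`, as when every page is compressed independently. */
static size_t longest_match_in_page(const unsigned char *buf, size_t len,
                                    size_t pos)
{
    size_t page_start = pos - pos % PAGE_BYTES;
    size_t page_end = page_start + PAGE_BYTES < len ? page_start + PAGE_BYTES
                                                    : len;

    return longest_match(buf, pos, page_start, page_end);
}
```

If the second page is an exact copy of the first, a search over the whole buffer finds a 4096-byte match at the start of page two, while the per-page search finds nothing: the compressor for that page has no history to refer to.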

This patch set adds a new compression algorithm, 'zBeWalgo', to the crypto
API. The algorithm focuses on increasing the capacity of the compressed
block device created by ZRAM. The choice of compression algorithm is always
a trade-off between speed and compression ratio.

If faster algorithms like 'lz4' are chosen, the compression ratio is often
lower than that of zBeWalgo, as shown in the following benchmarks. Because of
the lower compression ratio, ZRAM has to fall back to the backing_device more
often. When the backing_device is used, the effective speed of ZRAM is a
weighted average of the de/compression time and the time spent writing to or
reading from the backing_device. This should be considered when comparing the
speeds in the benchmarks.
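The weighted average can be written down directly. If a fraction of the data (the "spill") ends up on the backing device, the times per unit of data add up, so the effective speed is the weighted harmonic mean of the two speeds. A minimal sketch in C; the spill fraction is an assumed input, not a quantity the benchmarks below measure, and the function name is hypothetical.

```c
#include <assert.h>

/* Effective speed when a fraction `spill` of the data is served by the
 * backing device:
 *     S_eff = 1 / ((1 - spill) / S_zram + spill / S_backing)
 * Times per unit of data add up, so the speeds combine as a weighted
 * harmonic mean, not an arithmetic one. Units are whatever the caller
 * uses (e.g. MBit/s). */
static double effective_speed(double zram_speed, double backing_speed,
                              double spill)
{
    assert(zram_speed > 0.0 && backing_speed > 0.0);
    assert(spill >= 0.0 && spill <= 1.0);
    return 1.0 / ((1.0 - spill) / zram_speed + spill / backing_speed);
}
```

For example, with the ecoham write speeds from the table below (lz4_10 at 1303.12 MBit/s, --hdd-- at 134.70 MBit/s) and an assumed spill of 10%, this gives roughly 698 MBit/s, about half the in-memory speed.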

There are different kinds of backing_devices, each with its own drawbacks.
1. HDDs: This kind of backing device is very slow. If the compression ratio
of an algorithm is much lower than the ratio of zBeWalgo, it might be faster
to use zBeWalgo instead.
2. SSDs: I tested a swap partition on my NVMe SSD. The speed is even higher
than zram with lz4, but after about 5 minutes the SSD blocks all
read/write requests due to overheating. This is definitely not an option.
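One way to reason about the HDD case is a simple capacity model (an assumption for illustration, not something the patch measures): with M bytes of RAM for ZRAM and D bytes of data, an algorithm with compression ratio r keeps up to M*r bytes in memory, and the rest goes to the backing device. The sketch below turns this into a spill fraction and an effective speed; the function names are hypothetical and all inputs are caller-supplied.

```c
#include <assert.h>

/* Fraction of `data` that does not fit into `mem` bytes of RAM when
 * compressed with ratio `ratio` (simple capacity model, assumed). */
static double spill_fraction(double mem, double data, double ratio)
{
    double fits;

    assert(mem >= 0.0 && data > 0.0 && ratio > 0.0);
    fits = mem * ratio;
    return fits >= data ? 0.0 : 1.0 - fits / data;
}

/* Effective speed when the spilled part runs at backing-device speed:
 * per-unit times add, so the speeds combine as a weighted harmonic mean. */
static double speed_with_spill(double mem, double data, double ratio,
                               double zram_speed, double backing_speed)
{
    double f = spill_fraction(mem, data, ratio);

    assert(zram_speed > 0.0 && backing_speed > 0.0);
    return 1.0 / ((1.0 - f) / zram_speed + f / backing_speed);
}
```

For example, with 1 GiB of RAM, 8 GiB of data and lzo's ecoham ratio of 6.88, about 14% of the data spills to the HDD and the write speed drops from 1205.98 to roughly 571 MBit/s; an algorithm reaching a ratio of 8 would keep everything in memory. Whether it wins overall then depends only on its own speed.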


Benchmarks:


To obtain reproducible benchmarks, the datasets were first loaded into a
userspace program. Then the data was written directly to a clean
zram partition without any filesystem. Between writing and reading, 'sync'
and 'echo 3 > /proc/sys/vm/drop_caches' were called. All time measurements
are wall-clock times, and the benchmarks use only one CPU core at a time.
The new algorithm is compared to all available compression algorithms from
the crypto API.
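A minimal sketch of the write half of such a benchmark in C, assuming POSIX and that 1 MBit = 10^6 bits (the text does not say which convention was used). The helper names are hypothetical. Running 'sync' and dropping the page cache require root and are left to the caller; this sketch includes its own fsync() in the measured time, which may differ from how the numbers below were taken.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Elapsed real (wall-clock) time from a monotonic clock, in seconds. */
static double now_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Assumption: 1 MBit = 10^6 bits. */
static double mbit_per_s(size_t bytes, double seconds)
{
    return (double)bytes * 8.0 / 1e6 / seconds;
}

/* Hypothetical helper: write `len` bytes to `path` (e.g. a raw zram device
 * node), fsync, and return the throughput in MBit/s, or -1.0 on error. */
static double timed_write(const char *path, const unsigned char *buf,
                          size_t len)
{
    size_t done = 0;
    double start, elapsed;
    int fd;

    assert(buf != NULL || len == 0);
    fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1.0;
    start = now_seconds();
    while (done < len) {              /* write() may write less than asked */
        ssize_t n = write(fd, buf + done, len - done);
        if (n <= 0) {
            close(fd);
            return -1.0;
        }
        done += (size_t)n;
    }
    if (fsync(fd) != 0) {             /* make sure the data left the cache */
        close(fd);
        return -1.0;
    }
    elapsed = now_seconds() - start;
    close(fd);
    return elapsed > 0.0 ? mbit_per_s(len, elapsed) : -1.0;
}
```

The read half would open the device with O_RDONLY after the caches have been dropped and time read() the same way.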

Before the datasets are loaded into user space, deduplication is applied,
since none of the algorithms performs deduplication. Duplicated pages are
removed to prevent an algorithm from obtaining high or low ratios just
because a single page can be compressed very well, or not.
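The page-level deduplication step can be sketched as follows, assuming 4 KiB pages. The function name is hypothetical, and the naive pairwise memcmp() is chosen for clarity; hashing pages first would scale better for large datasets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEDUP_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/* Hypothetical helper: compact `npages` pages in place so that each
 * distinct page appears once, keeping first occurrences in their original
 * order. Returns the number of unique pages now at the front of `buf`. */
static size_t dedup_pages(unsigned char *buf, size_t npages)
{
    size_t unique = 0;

    assert(buf != NULL || npages == 0);
    for (size_t i = 0; i < npages; i++) {
        const unsigned char *page = buf + i * DEDUP_PAGE_SIZE;
        int seen = 0;

        /* Compare only against pages already kept. */
        for (size_t j = 0; j < unique && !seen; j++)
            seen = memcmp(buf + j * DEDUP_PAGE_SIZE, page,
                          DEDUP_PAGE_SIZE) == 0;
        if (!seen) {
            if (unique != i)
                memmove(buf + unique * DEDUP_PAGE_SIZE, page,
                        DEDUP_PAGE_SIZE);
            unique++;
        }
    }
    return unique;
}
```

For pages A, B, A, C the result is A, B, C followed by leftover bytes, and the function returns 3.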

All algorithms marked with '*' use unsafe decompression.

All read and write speed measurements are given in MBit/s.

'zbewalgo' uses a different combination, specialized for each dataset. These
combinations can be specified at runtime via /sys/kernel/zbewalgo/combinations.
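Setting the combinations at runtime amounts to writing a string to the sysfs attribute named above. A minimal C sketch; the helper name is hypothetical, and the syntax of the combination string itself is defined by the patch set and not reproduced here, so no real value is shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `value` to a sysfs attribute such as
 * /sys/kernel/zbewalgo/combinations. Returns 0 on success, -1 on error. */
static int write_sysfs_attr(const char *path, const char *value)
{
    FILE *f;
    size_t len;
    int ok;

    assert(path != NULL && value != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    len = strlen(value);
    ok = fwrite(value, 1, len, f) == len;
    /* stdio buffers the data, so a value rejected by the kernel is only
     * reported when it is flushed: check fclose() as well. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

From a shell, `echo <value> > /sys/kernel/zbewalgo/combinations` does the same job.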


- '/dev/zero': This dataset is used to measure the speed limit of ZRAM
itself. ZRAM filters zero data internally and does not even call the
specified compression algorithm.

Algorithm      write     read
--zram--     2724.08  2828.87
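The zero-page filtering can be sketched as a word-by-word scan of the page. Recent zram versions generalize the check to pages filled with any single repeated word (page_same_filled() in drivers/block/zram/zram_drv.c); the sketch below follows that idea in userspace, assumes 4 KiB pages, and is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/* Returns true if the page consists of a single repeated machine word and
 * stores that word in *element. All-zero pages are the common case. */
static bool page_same_filled_sketch(const unsigned char *page,
                                    unsigned long *element)
{
    unsigned long first, word;

    assert(page != NULL && element != NULL);
    memcpy(&first, page, sizeof first);
    for (size_t off = sizeof first; off < SKETCH_PAGE_SIZE;
         off += sizeof word) {
        memcpy(&word, page + off, sizeof word);
        if (word != first)
            return false;
    }
    *element = first;
    return true;
}
```

Such a page can be stored as just the repeated word, so no compressor runs; that is why the '/dev/zero' numbers reflect ZRAM's own overhead rather than any algorithm.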


- 'ecoham': This dataset is one of the input files for the scientific
application ECOHAM, which runs an ocean simulation. The dataset contains a
lot of zeros, even after deduplication. Where the data is not zero, there are
arrays of floating-point values; adjacent values are likely to be similar to
each other, allowing for high compression ratios.
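One common way to expose this similarity, shown here only as an illustration and not necessarily what zBeWalgo does, is to XOR the bit patterns of adjacent floats: similar values share sign, exponent and high mantissa bits, so the XOR begins with a long run of zero bits that a compressor can encode cheaply. A minimal C sketch assuming 32-bit IEEE 754 floats; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Leading zero bits in the XOR of two floats' bit patterns.
 * Assumes float is 32-bit IEEE 754 (binary32). */
static int xor_leading_zeros(float prev, float cur)
{
    uint32_t a, b, x;
    int n = 0;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&a, &prev, sizeof a);   /* well-defined way to read the bits */
    memcpy(&b, &cur, sizeof b);
    x = a ^ b;
    for (uint32_t mask = UINT32_C(1) << 31; mask != 0 && (x & mask) == 0;
         mask >>= 1)
        n++;
    return n;
}
```

For example, 1.5f and 1.75f differ in a single mantissa bit, giving 10 leading zero bits; identical values give 32, while values of opposite sign give 0.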

zbewalgo reaches very high compression ratios and is a lot faster than other
algorithms with similar compression ratios.

Algorithm   ratio    write     read
--hdd--      1.00   134.70   156.62
lz4*_10      6.73  1303.12  1547.17
lz4_10       6.73  1303.12  1574.51
lzo          6.88  1205.98  1468.09
lz4*_05      7.00  1291.81  1642.41
lz4_05       7.00  1291.81  1682.81
lz4_07       7.13  1250.29  1593.89
lz4*_07      7.13  1250.29  1677.08
lz4_06       7.16  1307.62  1666.66
lz4*_06      7.16  1307.62  1669.42
lz4_03       7.21  1250.87  1449.48
lz4*_03      7.21  1250.87  1621.97
lz4*_04      7.23  1281.62  1645.56
lz4_04       7.23  1281.62  1666.81
lz4_02       7.33  1267.54  1523.11
lz4*_02      7.33  1267.54  1576.54
lz4_09       7.36