Re: [PATCH v5 2/5] lib: Add zstd modules
On 08/10/2017 03:25 PM, Hugo Mills wrote:
> On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote:
>> On 08/10/2017 04:30 AM, Eric Biggers wrote:
>>> These benchmarks are misleading because they compress the whole file as a single stream without resetting the dictionary, which isn't how data will typically be compressed in kernel mode. With filesystem compression the data has to be divided into small chunks that can each be decompressed independently. That eliminates one of the primary advantages of Zstandard (support for large dictionary sizes).
>>
>> I did btrfs benchmarks of kernel trees and other normal data sets as well. The numbers were in line with what Nick is posting here. zstd is a big win over both lzo and zlib from a btrfs point of view.
>>
>> It's true Nick's patches only support a single compression level in btrfs, but that's because btrfs doesn't have a way to pass in the compression ratio. It could easily be a mount option, it was just outside the scope of Nick's initial work.
>
> Could we please not add more mount options? I get that they're easy to implement, but it's a very blunt instrument. What we tend to see (with both nodatacow and compress) is people using the mount options, then asking for exceptions, discovering that they can't do that, and then falling back to doing it with attributes or btrfs properties. Could we just start with btrfs properties this time round, and cut out the mount option part of this cycle?
>
> In the long run, it'd be great to see most of the btrfs-specific mount options get deprecated and ultimately removed entirely, in favour of attributes/properties, where feasible.

It's a good point, and as was commented later down I'd just do mount -o compress=zstd:3 or something. But I do prefer properties in general for this. My big point was just that the next step is outside of Nick's scope.

-chris
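[Editorial note: as a purely hypothetical sketch of the "compress=zstd:3" idea discussed above (btrfs had no such syntax at the time of this thread, and the helper name and error handling below are assumptions), a level suffix on a mount option could be split off roughly like this:]

	#include <linux/errno.h>
	#include <linux/kernel.h>
	#include <linux/string.h>

	/*
	 * Hypothetical sketch only: split "zstd:3" into an algorithm name and
	 * a numeric level.  Level 0 means "use the algorithm's default".
	 */
	static int parse_compress_opt(const char *arg, char *algo, size_t algo_len,
				      unsigned int *level)
	{
		const char *sep = strchr(arg, ':');

		*level = 0;
		if (!sep) {
			/* no level given, copy the algorithm name as-is */
			return strscpy(algo, arg, algo_len) < 0 ? -EINVAL : 0;
		}

		if ((size_t)(sep - arg) >= algo_len)
			return -EINVAL;
		memcpy(algo, arg, sep - arg);
		algo[sep - arg] = '\0';

		return kstrtouint(sep + 1, 10, level);	/* "zstd:3" -> "zstd", 3 */
	}

[The same parsing could back either a mount option or a per-file property; the policy question in this thread is only about where the knob lives, not how the string is split.]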
Re: [PATCH v5 2/5] lib: Add zstd modules
On 08/10/2017 03:00 PM, Eric Biggers wrote:
> On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote:
>> On 08/10/2017 04:30 AM, Eric Biggers wrote:
>>> On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
>>>> The memory reported is the amount of memory the compressor requests.
>>>>
>>>> | Method   | Size (B)  | Time (s) | Ratio | MB/s    | Adj MB/s | Mem (MB) |
>>>> |----------|-----------|----------|-------|---------|----------|----------|
>>>> | none     | 211988480 |    0.100 |     1 | 2119.88 |        - |        - |
>>>> | zstd -1  |  73645762 |    1.044 | 2.878 |  203.05 |   224.56 |     1.23 |
>>>> | zstd -3  |  66988878 |    1.761 | 3.165 |  120.38 |   127.63 |     2.47 |
>>>> | zstd -5  |  65001259 |    2.563 | 3.261 |   82.71 |    86.07 |     2.86 |
>>>> | zstd -10 |  60165346 |   13.242 | 3.523 |   16.01 |    16.13 |    13.22 |
>>>> | zstd -15 |  58009756 |   47.601 | 3.654 |    4.45 |     4.46 |    21.61 |
>>>> | zstd -19 |  54014593 |  102.835 | 3.925 |    2.06 |     2.06 |    60.15 |
>>>> | zlib -1  |  77260026 |    2.895 | 2.744 |   73.23 |    75.85 |     0.27 |
>>>> | zlib -3  |  72972206 |    4.116 | 2.905 |   51.50 |    52.79 |     0.27 |
>>>> | zlib -6  |  68190360 |    9.633 | 3.109 |   22.01 |    22.24 |     0.27 |
>>>> | zlib -9  |  67613382 |   22.554 | 3.135 |    9.40 |     9.44 |     0.27 |
>>>
>>> These benchmarks are misleading because they compress the whole file as a single stream without resetting the dictionary, which isn't how data will typically be compressed in kernel mode. With filesystem compression the data has to be divided into small chunks that can each be decompressed independently. That eliminates one of the primary advantages of Zstandard (support for large dictionary sizes).
>>
>> I did btrfs benchmarks of kernel trees and other normal data sets as well. The numbers were in line with what Nick is posting here. zstd is a big win over both lzo and zlib from a btrfs point of view.
>>
>> It's true Nick's patches only support a single compression level in btrfs, but that's because btrfs doesn't have a way to pass in the compression ratio. It could easily be a mount option, it was just outside the scope of Nick's initial work.
>
> I am not surprised --- Zstandard is closer to the state of the art, both format-wise and implementation-wise, than the other choices in BTRFS. My point is that benchmarks need to account for how much data is compressed at a time. This is a common mistake when comparing different compression algorithms; the algorithm name and compression level do not tell the whole story. The dictionary size is extremely significant.
>
> No one is going to compress or decompress a 200 MB file as a single stream in kernel mode, so it does not make sense to justify adding Zstandard *to the kernel* based on such a benchmark. It is going to be divided into chunks. How big are the chunks in BTRFS? I thought that it compressed only one page (4 KiB) at a time, but I hope that has been, or is being, improved; 32 KiB - 128 KiB should be a better amount.
>
> (And if the amount of data compressed at a time happens to be different between the different algorithms, note that BTRFS benchmarks are likely to be measuring that as much as the algorithms themselves.)

Btrfs hooks the compression code into the delayed allocation mechanism we use to gather large extents for COW. So if you write 100MB to a file, we'll have 100MB to compress at a time (within the limits of the amount of pages we allow to collect before forcing it down).

But we want to balance how much memory you might need to uncompress during random reads. So we have an artificial limit of 128KB that we send at a time to the compression code. It's easy to change this, it's just a tradeoff made to limit the cost of reading small bits. It's the same for zlib, lzo and the new zstd patch.

-chris
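[Editorial note: for illustration, here is a minimal user-space sketch, using the stock libzstd one-shot API rather than the kernel wrapper from this patch set, of compressing a buffer in independent 128 KiB chunks the way btrfs hands extents to the compression code. The chunk size and helper name are only illustrative.]

	#include <stdlib.h>
	#include <zstd.h>

	#define CHUNK_SIZE (128 * 1024)

	/*
	 * Compress src in independent CHUNK_SIZE pieces.  Each ZSTD_compress()
	 * call starts with an empty dictionary, so the total returned here is
	 * the ratio a filesystem-style benchmark should be measuring.
	 */
	static size_t compress_chunked(const char *src, size_t src_size, int level)
	{
		size_t total = 0;
		char *dst = malloc(ZSTD_compressBound(CHUNK_SIZE));

		if (!dst)
			return 0;

		for (size_t off = 0; off < src_size; off += CHUNK_SIZE) {
			size_t n = src_size - off < CHUNK_SIZE ? src_size - off
							       : CHUNK_SIZE;
			size_t ret = ZSTD_compress(dst, ZSTD_compressBound(CHUNK_SIZE),
						   src + off, n, level);

			if (ZSTD_isError(ret)) {
				total = 0;
				break;
			}
			total += ret;	/* compressed size of this chunk */
		}
		free(dst);
		return total;
	}

[Because every chunk starts from an empty dictionary, the ratio such a loop achieves is lower than the whole-file numbers in the table above, which is exactly Eric's objection; Chris's 128KB figure is the chunk size this sketch mimics.]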
Re: [PATCH v5 2/5] lib: Add zstd modules
On 08/10/2017 04:30 AM, Eric Biggers wrote:
> On Wed, Aug 09, 2017 at 07:35:53PM -0700, Nick Terrell wrote:
>> The memory reported is the amount of memory the compressor requests.
>>
>> | Method   | Size (B)  | Time (s) | Ratio | MB/s    | Adj MB/s | Mem (MB) |
>> |----------|-----------|----------|-------|---------|----------|----------|
>> | none     | 211988480 |    0.100 |     1 | 2119.88 |        - |        - |
>> | zstd -1  |  73645762 |    1.044 | 2.878 |  203.05 |   224.56 |     1.23 |
>> | zstd -3  |  66988878 |    1.761 | 3.165 |  120.38 |   127.63 |     2.47 |
>> | zstd -5  |  65001259 |    2.563 | 3.261 |   82.71 |    86.07 |     2.86 |
>> | zstd -10 |  60165346 |   13.242 | 3.523 |   16.01 |    16.13 |    13.22 |
>> | zstd -15 |  58009756 |   47.601 | 3.654 |    4.45 |     4.46 |    21.61 |
>> | zstd -19 |  54014593 |  102.835 | 3.925 |    2.06 |     2.06 |    60.15 |
>> | zlib -1  |  77260026 |    2.895 | 2.744 |   73.23 |    75.85 |     0.27 |
>> | zlib -3  |  72972206 |    4.116 | 2.905 |   51.50 |    52.79 |     0.27 |
>> | zlib -6  |  68190360 |    9.633 | 3.109 |   22.01 |    22.24 |     0.27 |
>> | zlib -9  |  67613382 |   22.554 | 3.135 |    9.40 |     9.44 |     0.27 |
>
> These benchmarks are misleading because they compress the whole file as a single stream without resetting the dictionary, which isn't how data will typically be compressed in kernel mode. With filesystem compression the data has to be divided into small chunks that can each be decompressed independently. That eliminates one of the primary advantages of Zstandard (support for large dictionary sizes).

I did btrfs benchmarks of kernel trees and other normal data sets as well. The numbers were in line with what Nick is posting here. zstd is a big win over both lzo and zlib from a btrfs point of view.

It's true Nick's patches only support a single compression level in btrfs, but that's because btrfs doesn't have a way to pass in the compression ratio. It could easily be a mount option, it was just outside the scope of Nick's initial work.

-chris
Re: [PATCH v3 02/12] btrfs: LLVMLinux: Remove VLAIS
On 09/15/2014 03:30 AM, beh...@converseincode.com wrote:
> From: Vinícius Tinti <viniciusti...@gmail.com>
>
> Replaced the use of a Variable Length Array In Struct (VLAIS) with a C99 compliant equivalent. This is the original VLAIS struct:
>
>	struct {
>		struct shash_desc shash;
>		char ctx[crypto_shash_descsize(tfm)];
>	} desc;
>
> This patch instead allocates the appropriate amount of memory using a char array, via the SHASH_DESC_ON_STACK macro. The new code can be compiled with both gcc and clang.
>
> Signed-off-by: Vinícius Tinti <viniciusti...@gmail.com>
> Reviewed-by: Jan-Simon Möller <dl...@gmx.de>
> Reviewed-by: Mark Charlebois <charl...@gmail.com>
> Signed-off-by: Behan Webster <beh...@converseincode.com>
> Cc: David S. Miller <da...@davemloft.net>
> Cc: Herbert Xu <herb...@gondor.apana.org.au>

Acked-by: Chris Mason <c...@fb.com> on the btrfs bits. Thanks for the v3.

-chris
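[Editorial note: for context, SHASH_DESC_ON_STACK from <crypto/hash.h> replaces the VLAIS pattern with a fixed-size, suitably aligned stack buffer. A minimal sketch of its use follows; the surrounding function is illustrative only, and initialisation details (e.g. the old desc->flags field) vary between kernel versions.]

	#include <crypto/hash.h>

	/*
	 * Illustrative only: digest a buffer with a caller-provided shash tfm.
	 * SHASH_DESC_ON_STACK declares a char array large enough for
	 * struct shash_desc plus the algorithm's descsize, so no VLAIS is
	 * needed and clang can compile it.
	 */
	static int example_shash_digest(struct crypto_shash *tfm,
					const u8 *data, unsigned int len, u8 *out)
	{
		SHASH_DESC_ON_STACK(desc, tfm);

		desc->tfm = tfm;
		return crypto_shash_digest(desc, data, len, out);
	}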
Re: [PATCH] Using Intel CRC32 instruction to accelerate CRC32c algorithm by new crypto API.
On Mon, 2008-08-04 at 11:45 +0100, David Woodhouse wrote:
> On Mon, 2008-08-04 at 06:35 -0400, Austin Zhang wrote:
>> On Mon, 2008-08-04 at 11:12 +0100, David Woodhouse wrote:
>>> You could perhaps just use 'unsigned long' here, to avoid the ifdef.
>>
>> Thanks.
>>
>>> And it would be nice if we could make libcrc32c use this too, rather than just the 'crypto' users.
>>
>> From previous discussion, Herbert would like to move the libcrc32c interface over to the new crypto API, because there are few users of the current libcrc32c interface.
>
> Are we deprecating libcrc32c, then? Or just turning it into a wrapper around the crypto code?

Long term I'd like to switch btrfs to the crypto API, but right now I'm using libcrc32c.

From a performance point of view I'm probably reading the crypto API code wrong, but it looks like my choices are to either have a long-standing context and use locking around the digest/hash calls to protect internal crypto state, or create a new context every time and take a perf hit while crypto looks up the right module. Either way it looks slower than just calling good old libcrc32c.

-chris
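[Editorial note: for reference, the libcrc32c interface being compared against here is a single stateless call from <linux/crc32c.h>; the wrapper name and seed below are illustrative, not btrfs code.]

	#include <linux/crc32c.h>

	/*
	 * Illustrative helper: one stateless call, no tfm allocation, no
	 * locking.  ~0 is a common starting seed for CRC32C.
	 */
	static u32 example_csum(const void *data, unsigned int len)
	{
		return crc32c(~0U, data, len);
	}

[The perf concern above is that the crypto API alternative would either need a shared context protected by a lock, or a per-call context allocation; the next message answers it.]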
Re: [PATCH] Using Intel CRC32 instruction to accelerate CRC32c algorithm by new crypto API.
On Mon, 2008-08-04 at 23:42 +0800, Herbert Xu wrote:
> Chris Mason [EMAIL PROTECTED] wrote:
>> From a performance point of view I'm probably reading the crypto API code wrong, but it looks like my choices are to either have a long-standing context and use locking around the digest/hash calls to protect internal crypto state, or create a new context every time and take a perf hit while crypto looks up the right module.
>
> You're looking at the old hash interface. New users should use the ahash interface, which was only recently added to the kernel. It lets you store the state in the request object which you pass to the algorithm on every call. This means that you only need one tfm in the entire system for crc32c.

Great to hear, that solves my main concern then.

There is still the embedded argument against needing all of the crypto API just for libcrc32c. It does make sense to me to have a libcrc32c that does the HW detection and uses HW assist when present, and just have the crypto API call that.

-chris
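[Editorial note: a sketch of the ahash pattern Herbert describes, with one long-lived tfm shared system-wide and all per-operation state carried in the request object. The function names follow the current kernel API and are meant only to show the shape of the interface; the 2008 version differed in detail, and allocation of crc_tfm (e.g. crypto_alloc_ahash("crc32c", 0, 0) at init time) is omitted.]

	#include <crypto/hash.h>
	#include <linux/scatterlist.h>

	static struct crypto_ahash *crc_tfm;	/* allocated once, shared by all callers */

	static int example_crc32c_digest(const void *data, unsigned int len, u8 *out)
	{
		struct scatterlist sg;
		struct ahash_request *req;
		int err;

		req = ahash_request_alloc(crc_tfm, GFP_KERNEL);
		if (!req)
			return -ENOMEM;

		sg_init_one(&sg, data, len);
		ahash_request_set_callback(req, 0, NULL, NULL);	/* synchronous use */
		ahash_request_set_crypt(req, &sg, out, len);

		/* state lives in req, not in crc_tfm, so no locking is needed */
		err = crypto_ahash_digest(req);
		ahash_request_free(req);
		return err;
	}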