Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
Eric Biggers wrote:
> Hi, I did some more testing with zlib-like and multiplicative hashing. The differences for the files I was testing were quite small. However, the proposed hash function, as well as a slightly modified version of it, did come out slightly ahead. So it might as well be used. I also did a few quick tests with greedy parsing. In general, it seemed to improve performance by 10-20% and increase the size of the compressed files by 1-2%. If the performance improvement is considered more desirable, then I can change the patch to use greedy parsing. For now it's using lazy parsing, like it was before but more optimized.

I tend to be more interested in compression rates than in timings. I mostly use compression for directories which are rarely updated.

Jean-Pierre

--
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
Eric Biggers wrote:
> Good news: I tried some different constants for multiplicative hashing, and the results for two constants were as good as the CRC-based hash function, and faster to compute (at least on x86). So here's the revised patch that does away with static data completely.

Good indeed. Slightly better results than the previous version. See comments below.

Jean-Pierre

> ---
> diff --git a/libntfs-3g/compress.c b/libntfs-3g/compress.c
> index 73ad283..b356e35 100644
> --- a/libntfs-3g/compress.c
> +++ b/libntfs-3g/compress.c
[...]
> /*
> + * Hash the next 3-byte sequence in the input buffer
> + */
> +static inline unsigned int ntfs_hash(const u8 *p)
> +{
> +	u32 str;
> +
> +#if defined(__i386__) || defined(__x86_64__)
> +	/* Unaligned access okay */

Also warn that this is only valid on a little-endian cpu.

> +	str = *(u32 *)p & 0xFFFFFF;

Please cast to const u32 * to avoid a compiler warning about dropping a const qualifier.

> +#else
> +	str = ((u32)p[0] << 0) | ((u32)p[1] << 8) | ((u32)p[2] << 16);
> +#endif
> +
> +	return (str * HASH_MULTIPLIER) >> (32 - HASH_SHIFT);

This is assuming int is 32-bit. Please mask the result with ((1 << HASH_SHIFT) - 1) for safety. I have checked that gcc recognizes this as unneeded, so there is no drop in performance with smart compilers.

[...]
> +
> +	/* Search the appropriate hash chain for matches. */
> +
> +	for (; cur_match >= 0 && depth_remaining--; cur_match = prev[cur_match]) {

Please split this line (and other ones which are not within the 80 chars limit).

[...]
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
Hi,

Very interesting. So far I have just checked my tests and they run fine.

You have defined the hash table on static data, and I do not want to enter into the meanings of static data in shared objects in various operating systems (allowed or not, shared by threads or not...). I prefer to have it dynamically allocated (hence never shared by mounts), and pointed to in the volume structure. Unfortunately this means adding an extra argument to ntfs_compress_block() and freeing the table when unmounting. (I will later post the code for that.)

Also a minor issue: please use http://sourceforge.net/p/ntfs-3g/ntfs-3g/ci/edge/tree/libntfs-3g/compress.c as a reference for your patch for easier integration.

Jean-Pierre

Eric Biggers wrote:
The current compression algorithm does lazy parsing of matches, backed by a binary tree match-finder with one-byte hashing. Performance-wise, this approach is not very good for several reasons:

(1) One-byte hashing results in a lot of hash collisions, which slows down searches.

(2) With lazy parsing, many positions never actually need to be matched. But when using binary trees, all the work needs to be done anyway, because the sequence at each position needs to be inserted into the appropriate binary tree. This makes binary trees better suited for optimal parsing --- but that isn't being done and is probably too slow to be practical for NTFS.

(3) There was also no hard cut-off on the amount of work done per position. This did not matter too much because the buffer size is never greater than 4096 bytes, but in degenerate cases the binary trees could degenerate into linked lists and there could be hundreds of matches considered at each position.

This patch changes the algorithm to use hash chains instead of binary trees, with much stronger hashing.
It also introduces useful (for performance) parameters, such as the nice match length and maximum search depth, that are similar to those used in other commonly used compression algorithms such as zlib's DEFLATE implementation.

The performance improvement is very significant, but data-dependent. Compressing text files is faster by about 3x; x86 executable files by about 3x; random data by about 1.7x; all zeroes by about 1.2x; some degenerate cases by over 10x. (I did my tests on an x86_64 CPU.) The compression ratio is the same or slightly worse. It is less than 1% worse on all files I tested except an ASCII representation of a genome.

No changes were made to the decompressor.
---
 libntfs-3g/compress.c | 484 +-
 1 file changed, 324 insertions(+), 160 deletions(-)

diff --git a/libntfs-3g/compress.c b/libntfs-3g/compress.c
index 69b39ed..e62c7dd 100644
--- a/libntfs-3g/compress.c
+++ b/libntfs-3g/compress.c
@@ -6,6 +6,7 @@
  * Copyright (c) 2004-2006 Szabolcs Szakacsits
  * Copyright (c) 2005 Yura Pakhuchiy
  * Copyright (c) 2009-2013 Jean-Pierre Andre
+ * Copyright (c) 2014 Eric Biggers
  *
  * This program/include file is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as published
@@ -21,17 +22,6 @@
  * along with this program (in the main directory of the NTFS-3G
  * distribution in the file COPYING); if not, write to the Free Software
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
- *
- * A part of the compression algorithm is based on lzhuf.c whose header
- * describes the roles of the original authors (with no apparent copyright
- * notice, and according to http://home.earthlink.net/~neilbawd/pall.html
- * this was put into public domain in 1988 by Haruhiko OKUMURA).
- *
- * LZHUF.C English version 1.0
- * Based on Japanese version 29-NOV-1988
- * LZSS coded by Haruhiko OKUMURA
- * Adaptive Huffman Coding coded by Haruyasu YOSHIZAKI
- * Edited and translated to English by Kenji RIKITAKE
  */
 
 #ifdef HAVE_CONFIG_H
@@ -81,96 +71,210 @@ typedef enum {
 	NTFS_SB_IS_COMPRESSED = 0x8000,
 } ntfs_compression_constants;
 
+/* Match length at or above which ntfs_best_match() will stop searching for
+ * longer matches. */
+#define NICE_MATCH_LEN 16
+
+/* Maximum length at which a lazy match will be attempted. */
+#define MAX_LAZY_MATCH_LEN 20
+
+/* Maximum number of potential matches that ntfs_best_match() will consider at
+ * each position. */
+#define MAX_SEARCH_DEPTH 24
+
+/* Number of entries in the hash table for match-finding. This can be changed,
+ * but it should be a power of 2 so that computing the hash bucket is fast. */
+#define HASH_LEN (1 << 14)
+
 struct COMPRESS_CONTEXT {
 	const unsigned char *inbuf;
 	int bufsize;
 	int size;
 	int rel;
 	int mxsz;
-	s16 head[256];
-	s16 lson[NTFS_SB_SIZE];
-	s16 rson[NTFS_SB_SIZE];
+	s16 head[HASH_LEN];
+
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
Hi,

Did you compare with the Microsoft implementation? I have only checked the biggest file in IE7 update for WinXP (WINDOWS/ie7updates/KB963027-IE7/ieframe.dll) with cluster size 4096:

Original size            6066688
Microsoft implementation 3883008 (64.0%)
current implementation   3682304 (60.7%)
proposed implementation  3710976 (61.2%)

Jean-Pierre

Eric Biggers wrote:
> Also, results from tests I did copying a file to a compressed directory on a NTFS-3g mount, with time elapsed and compressed sizes shown:
>
> silesia_corpus.tar (211,957,760 bytes)
>   Current   43.318s  111,230,976 bytes
>   Proposed  12.903s  111,751,168 bytes
> canterbury_corpus.tar (2,826,240 bytes)
>   Current    1.778s    1,232,896 bytes
>   Proposed   0.142s    1,241,088 bytes
> Firefox-11-windows-bin.tar (38,225,920 bytes)
>   Current    5.685s   27,418,624 bytes
>   Proposed   1.992s   27,492,352 bytes
> boot.wim (361,315,088 bytes, no internal compression)
>   Current   64.682s  189,124,608 bytes
>   Proposed  20.990s  190,386,176 bytes
> mp3-files.tar (201,646,080 bytes)
>   Current   14.547s  200,916,992 bytes
>   Proposed   8.585s  200,937,472 bytes
> linux-2.4.31-src.tar (174,417,920 bytes)
>   Current   36.115s   75,751,424 bytes
>   Proposed  10.262s   76,251,136 bytes
> gcc-4.7.3.tar.bz2 (82,904,224 bytes)
>   Current    5.637s   82,907,136 bytes
>   Proposed   3.276s   82,907,136 bytes
> ntoskrnl.exe (3,911,040 bytes)
>   Current    0.492s    2,789,376 bytes
>   Proposed   0.186s    2,789,376 bytes
> E_coli_genome.fasta (4,706,040 bytes)
>   Current    1.101s    2,060,288 bytes
>   Proposed   0.458s    2,351,104 bytes
> shakespeare.txt (5,328,042 bytes)
>   Current    0.909s    3,321,856 bytes
>   Proposed   0.303s    3,321,856 bytes
> zeroes.bin (134,217,728 bytes)
>   Current    3.053s            0 bytes
>   Proposed   2.601s            0 bytes
> ascending_digrams.bin (16,777,216 bytes)
>   Current    4.309s   16,777,216 bytes
>   Proposed   0.673s   16,777,216 bytes
> degenerate_trigrams.bin (16,777,216 bytes)
>   Current    9.292s    5,242,880 bytes
>   Proposed   0.684s    5,242,880 bytes
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
On Sun, Aug 10, 2014 at 10:45:16AM +0200, Jean-Pierre André wrote:
> You have defined the hash table on static data, and I do not want to enter into the meanings of static data in shared objects in various operating systems (allowed or not, shared by threads or not...). I prefer to have it dynamically allocated (hence never shared by mounts), and pointed to in the volume structure. Unfortunately this means adding an extra argument to ntfs_compress_block() and freeing the table when unmounting. (I will later post the code for that.)

Yes, I wondered if that would cause issues. Since the algorithm does not depend on the specific hash function used, an alternative to jumping through hoops to use the crc32_table is to swap ntfs_hash() with another 3-byte hash function, one that does not rely on static data. I will try some other ones (zlib-like, which I've already tested a little bit, and maybe multiplicative hashing) and see how they affect the results.

> Also a minor issue: please use http://sourceforge.net/p/ntfs-3g/ntfs-3g/ci/edge/tree/libntfs-3g/compress.c as a reference for your patch for easier integration.

Will do next time. I somehow missed the fact that this repository even exists! Looks like the only conflict is in the change to the copyright block...

Eric
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
On Sun, Aug 10, 2014 at 11:18:49AM +0200, Jean-Pierre André wrote:
> Hi, Did you compare with the Microsoft implementation? I have only checked the biggest file in IE7 update for WinXP (WINDOWS/ie7updates/KB963027-IE7/ieframe.dll) with cluster size 4096:
> Original size            6066688
> Microsoft implementation 3883008 (64.0%)
> current implementation   3682304 (60.7%)
> proposed implementation  3710976 (61.2%)

I have not done any comparisons with the Microsoft implementation yet. Is there a more precise way to test it than actually copying a file to a NTFS volume from Windows?

I'm not surprised that it apparently produces a worse compression ratio than NTFS-3g. Although it's impossible to know for sure what their algorithm does, my expectation is that they use hash chains --- similar to my proposal, perhaps with a slightly less exhaustive search --- but use greedy parsing rather than lazy parsing.

If there's a desire for even greater performance improvement, then greedy parsing is the way to go. But it will degrade the compression ratio, maybe placing it closer to the Microsoft implementation.

Eric
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
Eric Biggers wrote:
> On Sun, Aug 10, 2014 at 10:45:16AM +0200, Jean-Pierre André wrote:
>> You have defined the hash table on static data, and I do not want to enter into the meanings of static data in shared objects in various operating systems (allowed or not, shared by threads or not...). I prefer to have it dynamically allocated (hence never shared by mounts), and pointed to in the volume structure. Unfortunately this means adding an extra argument to ntfs_compress_block() and freeing the table when unmounting. (I will later post the code for that.)
>
> Yes, I wondered if that would cause issues. Since the algorithm does not depend on the specific hash function used, an alternative to jumping through hoops to use the crc32_table is to swap ntfs_hash() with another 3-byte hash function, one that does not rely on static data. I will try some other ones (zlib-like, which I've already tested a little bit, and maybe multiplicative hashing) and see how they affect the results.

I gather the crc polynomial you propose has proved efficient in other compression algorithms, and is supposed to be stable. Then a simpler way is to compute the table once and for all, and encode it as an array of constants. Just comment out the encoding function in case someone would like to test other polynomials.

>> Also a minor issue: please use http://sourceforge.net/p/ntfs-3g/ntfs-3g/ci/edge/tree/libntfs-3g/compress.c as a reference for your patch for easier integration.
>
> Will do next time. I somehow missed the fact that this repository even exists! Looks like the only conflict is in the change to the copyright block...
>
> Eric

Indeed.

Jean-Pierre
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
Eric Biggers wrote:
> On Sun, Aug 10, 2014 at 11:18:49AM +0200, Jean-Pierre André wrote:
>> Hi, Did you compare with the Microsoft implementation? I have only checked the biggest file in IE7 update for WinXP (WINDOWS/ie7updates/KB963027-IE7/ieframe.dll) with cluster size 4096:
>> Original size            6066688
>> Microsoft implementation 3883008 (64.0%)
>> current implementation   3682304 (60.7%)
>> proposed implementation  3710976 (61.2%)
>
> I have not done any comparisons with the Microsoft implementation yet. Is there a more precise way to test it than actually copying a file to a NTFS volume from Windows?

For a better way you would have to identify which dll does the compression, and submit compression tasks with some control over the durations.

> I'm not surprised that it apparently produces a worse compression ratio than NTFS-3g. Although it's impossible to know for sure what their algorithm does, my expectation is that they use hash chains --- similar to my proposal, perhaps with a slightly less exhaustive search --- but use greedy parsing rather than lazy parsing.

I had analyzed the difference of results, and I was surprised to find that the full length of the matching string was not always used (such as finding a matching string at some position with a matching length of 20, but only using a length of 12, with the next match not being better than the expected 8 bytes), and there does not appear to be a fixed maximum length (when all bytes are the same, the matching length is 4095, as would be expected). They probably traded some compression rate for speed.

> If there's a desire for even greater performance improvement, then greedy parsing is the way to go. But it will degrade the compression ratio, maybe placing it closer to the Microsoft implementation.
>
> Eric
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
On Sun, Aug 10, 2014 at 05:29:53PM +0200, Jean-Pierre André wrote:
> For a better way you would have to identify which dll does the compression, and submit compression tasks with some control over the durations.

RtlCompressBuffer() in ntdll.dll can do LZNT1 compression, which I think is the same as NTFS compression. But I wouldn't be surprised if there is actually another implementation in Microsoft's NTFS driver, which would be inaccessible. But either way, simply copying files to a NTFS volume is probably good enough for approximate benchmarking; that's what I was doing with NTFS-3g, after all.

> I had analyzed the difference of results, and I was surprised to find that the full length of the matching string was not always used (such as finding a matching string at some position with a matching length of 20, but only using a length of 12, with the next match not being better than the expected 8 bytes), and there does not appear to be a fixed maximum length (when all bytes are the same, the matching length is 4095, as would be expected). They probably traded some compression rate for speed.

This is strange. If the algorithm does the work to find a match at some position, it should at least extend it to its full length. Although a non-greedy parser will not necessarily choose that full length, it's unexpected that the algorithm would actually choose a sequence of matches that is *worse* than that which a greedy parser would choose. Is it possible that the length 12 match was less distant than the length 20 match? If it was, then this would be an expected result of an incomplete search using hash chains.

Regardless, it's probably possible to implement something that beats the Microsoft implementation (and I have done this for the XPRESS-Huffman and LZX algorithms), so I personally wouldn't read too much into how they might be doing things.
Eric
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
Eric Biggers wrote:
> On Sun, Aug 10, 2014 at 04:51:44PM +0200, Jean-Pierre André wrote:
>> I gather the crc polynomial you propose has proved efficient in other compression algorithms, and is supposed to be stable. Then a simpler way is to compute the table once and for all, and encode it as an array of constants. Just comment out the encoding function in case someone would like to test other polynomials.
>
> The proposed hash function is borrowed from LZMA, the algorithm used by xz. The CRC polynomial is standard; it's widely used for checksums (Ethernet, PNG, zlib, bzip2, etc.). The main consideration is how to actually use the CRC32 table to hash 3 bytes. A true CRC would checksum all three bytes, but the current proposal only does a CRC on the first byte, then mixes the other two bytes into the hash, which is what LZMA does. I agree that a hard-coded table is the simplest solution if CRC32 is retained in the hash function. I'll post a patch for that unless I decide a different hash function is preferable.

For a hash modulus of (1 << 14), you can even halve the needed space by truncating to u16's.

Jean-Pierre
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
Hi,

I did some more testing with zlib-like and multiplicative hashing. The differences for the files I was testing were quite small. However, the proposed hash function, as well as a slightly modified version of it, did come out slightly ahead. So it might as well be used.

I also did a few quick tests with greedy parsing. In general, it seemed to improve performance by 10-20% and increase the size of the compressed files by 1-2%. If the performance improvement is considered more desirable, then I can change the patch to use greedy parsing. For now it's using lazy parsing, like it was before but more optimized.

Here's the updated patch.

diff --git a/libntfs-3g/compress.c b/libntfs-3g/compress.c
index 73ad283..1fefc3e 100644
--- a/libntfs-3g/compress.c
+++ b/libntfs-3g/compress.c
@@ -6,6 +6,7 @@
  * Copyright (c) 2004-2006 Szabolcs Szakacsits
  * Copyright (c) 2005 Yura Pakhuchiy
  * Copyright (c) 2009-2014 Jean-Pierre Andre
+ * Copyright (c) 2014 Eric Biggers
  *
  * This program/include file is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as published
@@ -21,17 +22,6 @@
  * along with this program (in the main directory of the NTFS-3G
  * distribution in the file COPYING); if not, write to the Free Software
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
- *
- * A part of the compression algorithm is based on lzhuf.c whose header
- * describes the roles of the original authors (with no apparent copyright
- * notice, and according to http://home.earthlink.net/~neilbawd/pall.html
- * this was put into public domain in 1988 by Haruhiko OKUMURA).
- *
- * LZHUF.C English version 1.0
- * Based on Japanese version 29-NOV-1988
- * LZSS coded by Haruhiko OKUMURA
- * Adaptive Huffman Coding coded by Haruyasu YOSHIZAKI
- * Edited and translated to English by Kenji RIKITAKE
  */
 
 #ifdef HAVE_CONFIG_H
@@ -81,96 +71,256 @@ typedef enum {
 	NTFS_SB_IS_COMPRESSED = 0x8000,
 } ntfs_compression_constants;
 
+/* Match length at or above which ntfs_best_match() will stop searching for
+ * longer matches. */
+#define NICE_MATCH_LEN 18
+
+/* Maximum number of potential matches that ntfs_best_match() will consider at
+ * each position. */
+#define MAX_SEARCH_DEPTH 24
+
+/* Number of entries in the hash table for match-finding.
+ *
+ * This can be changed, but ntfs_hash() would need to be updated to use an
+ * appropriate shift. Also, if this is made more than 1 << 16 (not recommended
+ * for the 4096-byte buffers used in NTFS compression!), then 'crc_table' would
+ * need to be updated to use 32-bit entries. */
+#define HASH_LEN (1 << 14)
+
 struct COMPRESS_CONTEXT {
 	const unsigned char *inbuf;
 	int bufsize;
 	int size;
 	int rel;
 	int mxsz;
-	s16 head[256];
-	s16 lson[NTFS_SB_SIZE];
-	s16 rson[NTFS_SB_SIZE];
+	s16 head[HASH_LEN];
+	s16 prev[NTFS_SB_SIZE];
 } ;
 
 /*
- * Search for the longest sequence matching current position
+ * CRC table for hashing bytes for Lempel-Ziv match-finding.
  *
- * A binary tree is maintained to locate all previously met sequences,
- * and this function has to be called for all of them.
+ * We use a CRC32 for this purpose. But since log2(HASH_LEN) <= 16, we only
+ * need 16 bit entries, each of which contains the low 16 bits of the entry in a
+ * real CRC32 table. (CRC16 would also work, but it caused more collisions when
+ * I tried it.)
  *
- * This function is heavily used, it has to be optimized carefully
+ * Hard-coding the table avoids dealing with thread-safe initialization.
+ */
+static const u16 crc_table[256] = {
+	0x0000, 0x3096, 0x612C, 0x51BA, 0xC419, 0xF48F, 0xA535, 0x95A3,
+	0x8832, 0xB8A4, 0xE91E, 0xD988, 0x4C2B, 0x7CBD, 0x2D07, 0x1D91,
+	0x1064, 0x20F2, 0x7148, 0x41DE, 0xD47D, 0xE4EB, 0xB551, 0x85C7,
+	0x9856, 0xA8C0, 0xF97A, 0xC9EC, 0x5C4F, 0x6CD9, 0x3D63, 0x0DF5,
+	0x20C8, 0x105E, 0x41E4, 0x7172, 0xE4D1, 0xD447, 0x85FD, 0xB56B,
+	0xA8FA, 0x986C, 0xC9D6, 0xF940, 0x6CE3, 0x5C75, 0x0DCF, 0x3D59,
+	0x30AC, 0x003A, 0x5180, 0x6116, 0xF4B5, 0xC423, 0x9599, 0xA50F,
+	0xB89E, 0x8808, 0xD9B2, 0xE924, 0x7C87, 0x4C11, 0x1DAB, 0x2D3D,
+	0x4190, 0x7106, 0x20BC, 0x102A, 0x8589, 0xB51F, 0xE4A5, 0xD433,
+	0xC9A2, 0xF934, 0xA88E, 0x9818, 0x0DBB, 0x3D2D, 0x6C97, 0x5C01,
+	0x51F4, 0x6162, 0x30D8, 0x004E, 0x95ED, 0xA57B, 0xF4C1, 0xC457,
+	0xD9C6, 0xE950, 0xB8EA, 0x887C, 0x1DDF, 0x2D49, 0x7CF3, 0x4C65,
+	0x6158, 0x51CE, 0x0074, 0x30E2, 0xA541, 0x95D7, 0xC46D, 0xF4FB,
+	0xE96A, 0xD9FC, 0x8846, 0xB8D0, 0x2D73, 0x1DE5, 0x4C5F, 0x7CC9,
+	0x713C, 0x41AA, 0x1010, 0x2086, 0xB525, 0x85B3, 0xD409, 0xE49F,
+	0xF90E, 0xC998, 0x9822, 0xA8B4, 0x3D17, 0x0D81, 0x5C3B, 0x6CAD,
+	0x8320, 0xB3B6, 0xE20C, 0xD29A, 0x4739, 0x77AF, 0x2615, 0x1683,
+	0x0B12, 0x3B84, 0x6A3E, 0x5AA8, 0xCF0B,
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
Good news: I tried some different constants for multiplicative hashing, and the results for two constants were as good as the CRC-based hash function, and faster to compute (at least on x86). So here's the revised patch that does away with static data completely.

---
diff --git a/libntfs-3g/compress.c b/libntfs-3g/compress.c
index 73ad283..b356e35 100644
--- a/libntfs-3g/compress.c
+++ b/libntfs-3g/compress.c
@@ -6,6 +6,7 @@
  * Copyright (c) 2004-2006 Szabolcs Szakacsits
  * Copyright (c) 2005 Yura Pakhuchiy
  * Copyright (c) 2009-2014 Jean-Pierre Andre
+ * Copyright (c) 2014 Eric Biggers
  *
  * This program/include file is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as published
@@ -21,17 +22,6 @@
  * along with this program (in the main directory of the NTFS-3G
  * distribution in the file COPYING); if not, write to the Free Software
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
- *
- * A part of the compression algorithm is based on lzhuf.c whose header
- * describes the roles of the original authors (with no apparent copyright
- * notice, and according to http://home.earthlink.net/~neilbawd/pall.html
- * this was put into public domain in 1988 by Haruhiko OKUMURA).
- *
- * LZHUF.C English version 1.0
- * Based on Japanese version 29-NOV-1988
- * LZSS coded by Haruhiko OKUMURA
- * Adaptive Huffman Coding coded by Haruyasu YOSHIZAKI
- * Edited and translated to English by Kenji RIKITAKE
  */
 
 #ifdef HAVE_CONFIG_H
@@ -81,96 +71,183 @@ typedef enum {
 	NTFS_SB_IS_COMPRESSED = 0x8000,
 } ntfs_compression_constants;
 
+/* Match length at or above which ntfs_best_match() will stop searching for
+ * longer matches. */
+#define NICE_MATCH_LEN 18
+
+/* Maximum number of potential matches that ntfs_best_match() will consider at
+ * each position. */
+#define MAX_SEARCH_DEPTH 24
+
+/* log base 2 of the number of entries in the hash table for match-finding.
+ */
+#define HASH_SHIFT 14
+
+/* Constant for the multiplicative hash function. */
+#define HASH_MULTIPLIER 0x1E35A7BD
+
 struct COMPRESS_CONTEXT {
 	const unsigned char *inbuf;
 	int bufsize;
 	int size;
 	int rel;
 	int mxsz;
-	s16 head[256];
-	s16 lson[NTFS_SB_SIZE];
-	s16 rson[NTFS_SB_SIZE];
+	s16 head[1 << HASH_SHIFT];
+	s16 prev[NTFS_SB_SIZE];
 } ;
 
 /*
+ * Hash the next 3-byte sequence in the input buffer
+ */
+static inline unsigned int ntfs_hash(const u8 *p)
+{
+	u32 str;
+
+#if defined(__i386__) || defined(__x86_64__)
+	/* Unaligned access okay */
+	str = *(u32 *)p & 0xFFFFFF;
+#else
+	str = ((u32)p[0] << 0) | ((u32)p[1] << 8) | ((u32)p[2] << 16);
+#endif
+
+	return (str * HASH_MULTIPLIER) >> (32 - HASH_SHIFT);
+}
+
+/*
  * Search for the longest sequence matching current position
  *
- * A binary tree is maintained to locate all previously met sequences,
- * and this function has to be called for all of them.
+ * A hash table, each entry of which points to a chain of sequence
+ * positions sharing the corresponding hash code, is maintained to speed up
+ * searching for matches. To maintain the hash table, either
+ * ntfs_best_match() or ntfs_skip_position() has to be called for each
+ * consecutive position.
+ *
+ * This function is heavily used; it has to be optimized carefully.
+ *
+ * This function sets pctx->size and pctx->rel to the length and offset,
+ * respectively, of the longest match found.
+ *
+ * The minimum match length is assumed to be 3, and the maximum match
+ * length is assumed to be pctx->mxsz. If this function produces
+ * pctx->size < 3, then no match was found.
+ *
+ * Note: for the following reasons, this function is not guaranteed to find
+ * *the* longest match up to pctx->mxsz:
  *
- * This function is heavily used, it has to be optimized carefully
+ * (1) If this function finds a match of NICE_MATCH_LEN bytes or greater,
+ * it ends early because a match this long is good enough and it's not
+ * worth spending more time searching.
  *
- * Returns the size of the longest match,
- * zero if no match is found.
+ * (2) If this function considers MAX_SEARCH_DEPTH matches with a single
+ * position, it ends early and returns the longest match found so far.
+ * This saves a lot of time on degenerate inputs.
  */
-
-static int ntfs_best_match(struct COMPRESS_CONTEXT *pctx, int i)
+static void ntfs_best_match(struct COMPRESS_CONTEXT *pctx, const int i,
+			int best_len)
 {
-	s16 *prev;
-	int node;
-	register long j;
-	long maxpos;
-	long startj;
-	long bestj;
-	int bufsize;
-	int bestnode;
-	register const unsigned char *p1,*p2;
-
-	p1 = pctx->inbuf;
-	node = pctx->head[p1[i] & 255];
-
[ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
The current compression algorithm does lazy parsing of matches, backed by a binary tree match-finder with one-byte hashing. Performance-wise, this approach is not very good for several reasons:

(1) One-byte hashing results in a lot of hash collisions, which slows down searches.

(2) With lazy parsing, many positions never actually need to be matched. But when using binary trees, all the work needs to be done anyway, because the sequence at each position needs to be inserted into the appropriate binary tree. This makes binary trees better suited for optimal parsing --- but that isn't being done and is probably too slow to be practical for NTFS.

(3) There was also no hard cut-off on the amount of work done per position. This did not matter too much because the buffer size is never greater than 4096 bytes, but in degenerate cases the binary trees could degenerate into linked lists and there could be hundreds of matches considered at each position.

This patch changes the algorithm to use hash chains instead of binary trees, with much stronger hashing. It also introduces useful (for performance) parameters, such as the nice match length and maximum search depth, that are similar to those used in other commonly used compression algorithms such as zlib's DEFLATE implementation.

The performance improvement is very significant, but data-dependent. Compressing text files is faster by about 3x; x86 executable files by about 3x; random data by about 1.7x; all zeroes by about 1.2x; some degenerate cases by over 10x. (I did my tests on an x86_64 CPU.) The compression ratio is the same or slightly worse. It is less than 1% worse on all files I tested except an ASCII representation of a genome.

No changes were made to the decompressor.
---
 libntfs-3g/compress.c | 484 +-
 1 file changed, 324 insertions(+), 160 deletions(-)

diff --git a/libntfs-3g/compress.c b/libntfs-3g/compress.c
index 69b39ed..e62c7dd 100644
--- a/libntfs-3g/compress.c
+++ b/libntfs-3g/compress.c
@@ -6,6 +6,7 @@
  * Copyright (c) 2004-2006 Szabolcs Szakacsits
  * Copyright (c) 2005 Yura Pakhuchiy
  * Copyright (c) 2009-2013 Jean-Pierre Andre
+ * Copyright (c) 2014 Eric Biggers
  *
  * This program/include file is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as published
@@ -21,17 +22,6 @@
  * along with this program (in the main directory of the NTFS-3G
  * distribution in the file COPYING); if not, write to the Free Software
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
- *
- * A part of the compression algorithm is based on lzhuf.c whose header
- * describes the roles of the original authors (with no apparent copyright
- * notice, and according to http://home.earthlink.net/~neilbawd/pall.html
- * this was put into public domain in 1988 by Haruhiko OKUMURA).
- *
- * LZHUF.C English version 1.0
- * Based on Japanese version 29-NOV-1988
- * LZSS coded by Haruhiko OKUMURA
- * Adaptive Huffman Coding coded by Haruyasu YOSHIZAKI
- * Edited and translated to English by Kenji RIKITAKE
  */
 
 #ifdef HAVE_CONFIG_H
@@ -81,96 +71,210 @@ typedef enum {
 	NTFS_SB_IS_COMPRESSED = 0x8000,
 } ntfs_compression_constants;
 
+/* Match length at or above which ntfs_best_match() will stop searching for
+ * longer matches. */
+#define NICE_MATCH_LEN 16
+
+/* Maximum length at which a lazy match will be attempted. */
+#define MAX_LAZY_MATCH_LEN 20
+
+/* Maximum number of potential matches that ntfs_best_match() will consider at
+ * each position. */
+#define MAX_SEARCH_DEPTH 24
+
+/* Number of entries in the hash table for match-finding. This can be changed,
+ * but it should be a power of 2 so that computing the hash bucket is fast.
+ */
+#define HASH_LEN (1 << 14)
+
 struct COMPRESS_CONTEXT {
 	const unsigned char *inbuf;
 	int bufsize;
 	int size;
 	int rel;
 	int mxsz;
-	s16 head[256];
-	s16 lson[NTFS_SB_SIZE];
-	s16 rson[NTFS_SB_SIZE];
+	s16 head[HASH_LEN];
+	s16 prev[NTFS_SB_SIZE];
 } ;
 
+#define CRC32_POLYNOMIAL 0xEDB88320
+
+static u32 crc32_table[256];
+
+static void do_crc32_init(void)
+{
+	int i, j;
+	u32 r;
+
+	for (i = 0; i < 256; i++) {
+		r = i;
+		for (j = 0; j < 8; j++)
+			r = (r >> 1) ^ (CRC32_POLYNOMIAL & ~((r & 1) - 1));
+		crc32_table[i] = r;
+	}
+}
+
+/*
+ * Initialize the CRC32 table for ntfs_hash() if not done already
+ */
+static void crc32_init(void)
+{
+	static int done = 0;
+
+	if (!done) {
+		do_crc32_init();
+		done = 1;
+	}
+}
+
+/*
+ * Hash the next 3-byte sequence in the input buffer
+ *
+ * Currently, we use a hash function similar to that used in LZMA. It
+ * takes slightly longer to compute than zlib's hash
Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm
Also, results from tests I did copying a file to a compressed directory on a NTFS-3g mount, with time elapsed and compressed sizes shown:

silesia_corpus.tar (211,957,760 bytes)
  Current   43.318s  111,230,976 bytes
  Proposed  12.903s  111,751,168 bytes

canterbury_corpus.tar (2,826,240 bytes)
  Current    1.778s    1,232,896 bytes
  Proposed   0.142s    1,241,088 bytes

Firefox-11-windows-bin.tar (38,225,920 bytes)
  Current    5.685s   27,418,624 bytes
  Proposed   1.992s   27,492,352 bytes

boot.wim (361,315,088 bytes, no internal compression)
  Current   64.682s  189,124,608 bytes
  Proposed  20.990s  190,386,176 bytes

mp3-files.tar (201,646,080 bytes)
  Current   14.547s  200,916,992 bytes
  Proposed   8.585s  200,937,472 bytes

linux-2.4.31-src.tar (174,417,920 bytes)
  Current   36.115s   75,751,424 bytes
  Proposed  10.262s   76,251,136 bytes

gcc-4.7.3.tar.bz2 (82,904,224 bytes)
  Current    5.637s   82,907,136 bytes
  Proposed   3.276s   82,907,136 bytes

ntoskrnl.exe (3,911,040 bytes)
  Current    0.492s    2,789,376 bytes
  Proposed   0.186s    2,789,376 bytes

E_coli_genome.fasta (4,706,040 bytes)
  Current    1.101s    2,060,288 bytes
  Proposed   0.458s    2,351,104 bytes

shakespeare.txt (5,328,042 bytes)
  Current    0.909s    3,321,856 bytes
  Proposed   0.303s    3,321,856 bytes

zeroes.bin (134,217,728 bytes)
  Current    3.053s            0 bytes
  Proposed   2.601s            0 bytes

ascending_digrams.bin (16,777,216 bytes)
  Current    4.309s   16,777,216 bytes
  Proposed   0.673s   16,777,216 bytes

degenerate_trigrams.bin (16,777,216 bytes)
  Current    9.292s    5,242,880 bytes
  Proposed   0.684s    5,242,880 bytes