Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm

2014-08-11 Thread Jean-Pierre André
Eric Biggers wrote:
 Hi,

 I did some more testing with zlib-like and multiplicative hashing.  The
 differences for the files I was testing were quite small.  However, the
 proposed hash function, as well as a slightly modified version of it, did
 come out slightly ahead.  So it might as well be used.

 I also did a few quick tests with greedy parsing.  In general, it seemed to
 improve performance by 10-20% and increase the size of the compressed files by
 1-2%.  If the performance improvement is considered more desirable, then I can
 change the patch to use greedy parsing.  For now it's using lazy parsing, like
 it was before but more optimized.

I tend to be more interested in compression rates than
in timings. I mostly use compression for directories which
are rarely updated.

Jean-Pierre





Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm

2014-08-11 Thread Jean-Pierre André
Eric Biggers wrote:
 Good news: I tried some different constants for multiplicative hashing, and
 the results for two constants were as good as the CRC-based hash function,
 and faster to compute (at least on x86).  So here's the revised patch that
 does away with static data completely.

Good indeed. Slightly better results than the previous
version.

See comments below.

Jean-Pierre



 ---

 diff --git a/libntfs-3g/compress.c b/libntfs-3g/compress.c
 index 73ad283..b356e35 100644
 --- a/libntfs-3g/compress.c
 +++ b/libntfs-3g/compress.c
[...]

   
   /*
 + *   Hash the next 3-byte sequence in the input buffer
 + */
 +static inline unsigned int ntfs_hash(const u8 *p)
 +{
 + u32 str;
 +
 +#if defined(__i386__) || defined(__x86_64__)
 + /* Unaligned access okay  */
Please also add a warning that this is only valid on a little-endian CPU.
 + str = *(u32 *)p & 0xFFFFFF;
Please cast to const u32* to avoid a compiler warning
about dropping a const qualifier.
 +#else
 + str = ((u32)p[0] << 0) | ((u32)p[1] << 8) | ((u32)p[2] << 16);
 +#endif
 +
 + return (str * HASH_MULTIPLIER) >> (32 - HASH_SHIFT);

This assumes that int is 32 bits wide. Please mask the result
with ((1 << HASH_SHIFT) - 1) for safety.
I have checked that gcc recognizes the mask as unneeded,
so there is no drop in performance with smart compilers.
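
For illustration, the function with the above remarks folded in could
look about like this (just a sketch reusing the u8/u32 typedefs and the
HASH_SHIFT/HASH_MULTIPLIER constants from the patch, not a final version):

static inline unsigned int ntfs_hash(const u8 *p)
{
	u32 str;

#if defined(__i386__) || defined(__x86_64__)
	/* Unaligned access okay.  This also relies on the cpu being
	 * little endian, so that the three bytes of interest end up
	 * in the low 24 bits. */
	str = *(const u32 *)p & 0xFFFFFF;
#else
	str = ((u32)p[0] << 0) | ((u32)p[1] << 8) | ((u32)p[2] << 16);
#endif
	/* The mask is a no-op when int is 32 bits wide (gcc drops it),
	 * but it keeps the bucket index in range if int is wider. */
	return ((str * HASH_MULTIPLIER) >> (32 - HASH_SHIFT))
			& ((1 << HASH_SHIFT) - 1);
}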

[...]


 +
 + /* Search the appropriate hash chain for matches.  */
 +
 + for (; cur_match >= 0 && depth_remaining--; cur_match = prev[cur_match]) {

Please split this line (and the other ones which exceed
the 80-character limit).

[...]



Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm

2014-08-10 Thread Jean-Pierre André
Hi,

Very interesting.

So far I have just checked my tests and they run fine.

You have defined the hash table on static data, and I do
not want to enter into the meanings of static data in
shared objects in various operating systems (allowed or
not, shared by threads or not...). I prefer to have it
dynamically allocated (hence never shared by mounts),
and pointed to in the volume structure. Unfortunately
this means adding an extra argument to
ntfs_compress_block() and freeing the table when unmounting.
(I will later post the code for that).
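
A rough sketch of what I have in mind (with hypothetical names: the
crc32_table field does not currently exist in ntfs_volume, and this is
not the code I will actually post):

/* at mount time */
static int ntfs_compression_init(ntfs_volume *vol)
{
	vol->crc32_table = ntfs_malloc(256 * sizeof(u32));
	if (!vol->crc32_table)
		return -1;
	/* same loop as do_crc32_init(), filling the supplied table */
	ntfs_fill_crc32_table(vol->crc32_table);
	return 0;
}

/* at unmount time */
static void ntfs_compression_free(ntfs_volume *vol)
{
	free(vol->crc32_table);
	vol->crc32_table = NULL;
}

ntfs_compress_block() would then receive the table (or the volume) as an
extra argument and pass it down to ntfs_hash().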

Also a minor issue : please use
http://sourceforge.net/p/ntfs-3g/ntfs-3g/ci/edge/tree/libntfs-3g/compress.c
as a reference for your patch for easier integration.

Jean-Pierre


Eric Biggers wrote:
 The current compression algorithm does lazy parsing of matches, backed
 by a binary tree match-finder with one byte hashing.  Performance-wise,
 this approach is not very good for several reasons:

 (1) One byte hashing results in a lot of hash collisions, which slows
  down searches.

 (2) With lazy parsing, many positions never actually need to be
  matched.  But when using binary trees, all the work needs to be done
  anyway, because the sequence at each position needs to be inserted
  into the appropriate binary tree.  This makes binary trees better
  suited for optimal parsing --- but that isn't being done and is
  probably too slow to be practical for NTFS.

 (3) There was also no hard cut-off on the amount of work done per
  position.  This did not matter too much because the buffer size is
  never greater than 4096 bytes, but in degenerate cases the binary
  trees could degenerate into linked lists and there could be hundreds of
  matches considered at each position.

 This patch changes the algorithm to use hash chains instead of binary
 trees, with much stronger hashing.  It also introduces useful (for
 performance) parameters, such as the nice match length and maximum
 search depth, that are similar to those used in other commonly used
 compression algorithms such as zlib's DEFLATE implementation.

 The performance improvement is very significant, but data-dependent.
 Compressing text files is faster by about 3x; x86 executables files by
 about 3x; random data by about 1.7x; all zeroes by about 1.2x; some
 degenerate cases by over 10x.  (I did my tests on an x86_64 CPU.)

 The compression ratio is the same or slightly worse.  It is less than 1%
 worse on all files I tested except an ASCII representation of a genome.

 No changes were made to the decompressor.
 ---
    libntfs-3g/compress.c | 484 +-
   1 file changed, 324 insertions(+), 160 deletions(-)

 diff --git a/libntfs-3g/compress.c b/libntfs-3g/compress.c
 index 69b39ed..e62c7dd 100644
 --- a/libntfs-3g/compress.c
 +++ b/libntfs-3g/compress.c
 @@ -6,6 +6,7 @@
* Copyright (c) 2004-2006 Szabolcs Szakacsits
* Copyright (c)  2005 Yura Pakhuchiy
* Copyright (c) 2009-2013 Jean-Pierre Andre
 + * Copyright (c)  2014 Eric Biggers
*
* This program/include file is free software; you can redistribute it 
 and/or
* modify it under the terms of the GNU General Public License as published
 @@ -21,17 +22,6 @@
* along with this program (in the main directory of the NTFS-3G
* distribution in the file COPYING); if not, write to the Free Software
* Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 - *
 - * A part of the compression algorithm is based on lzhuf.c whose header
 - * describes the roles of the original authors (with no apparent copyright
 - * notice, and according to http://home.earthlink.net/~neilbawd/pall.html
 - * this was put into public domain in 1988 by Haruhiko OKUMURA).
 - *
 - * LZHUF.C English version 1.0
 - * Based on Japanese version 29-NOV-1988
 - * LZSS coded by Haruhiko OKUMURA
 - * Adaptive Huffman Coding coded by Haruyasu YOSHIZAKI
 - * Edited and translated to English by Kenji RIKITAKE
*/
   
   #ifdef HAVE_CONFIG_H
 @@ -81,96 +71,210 @@ typedef enum {
   NTFS_SB_IS_COMPRESSED   =   0x8000,
   } ntfs_compression_constants;
   
 +/* Match length at or above which ntfs_best_match() will stop searching for
 + * longer matches.  */
 +#define NICE_MATCH_LEN 16
 +
 +/* Maximum length at which a lazy match will be attempted.  */
 +#define MAX_LAZY_MATCH_LEN 20
 +
 +/* Maximum number of potential matches that ntfs_best_match() will consider
 + * at each position.  */
 +#define MAX_SEARCH_DEPTH 24
 +
 +/* Number of entries in the hash table for match-finding.  This can be
 + * changed, but it should be a power of 2 so that computing the hash bucket
 + * is fast.  */
 +#define HASH_LEN (1 << 14)
 +
   struct COMPRESS_CONTEXT {
   const unsigned char *inbuf;
   int bufsize;
   int size;
   int rel;
   int mxsz;
 - s16 head[256];
 - s16 lson[NTFS_SB_SIZE];
 - s16 rson[NTFS_SB_SIZE];
 + s16 head[HASH_LEN];
 + 

Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm

2014-08-10 Thread Jean-Pierre André
Hi,

Did you compare with the Microsoft implementation ?

I have only checked the biggest file in IE7 update for WinXP
(WINDOWS/ie7updates/KB963027-IE7/ieframe.dll) with
cluster size 4096 :

Original size 6066688
Microsoft implementation 3883008 (64.0%)
current implementation 3682304 (60.7%)
proposed implementation 3710976 (61.2%)

Jean-Pierre

Eric Biggers wrote:
 Also, results from tests I did copying a file to compressed directory on a
 NTFS-3g mount, with time elapsed and compressed sizes shown:

 silesia_corpus.tar (211,957,760 bytes)
   Current    43.318s    111,230,976 bytes
   Proposed   12.903s    111,751,168 bytes

 canterbury_corpus.tar (2,826,240 bytes):
   Current    1.778s     1,232,896 bytes
   Proposed   0.142s     1,241,088 bytes

 Firefox-11-windows-bin.tar (38,225,920 bytes)
   Current    5.685s     27,418,624 bytes
   Proposed   1.992s     27,492,352 bytes

 boot.wim (361,315,088 bytes, no internal compression)
   Current    64.682s    189,124,608 bytes
   Proposed   20.990s    190,386,176 bytes

 mp3-files.tar (201,646,080 bytes)
   Current    14.547s    200,916,992 bytes
   Proposed   8.585s     200,937,472 bytes

 linux-2.4.31-src.tar (174,417,920 bytes)
   Current    36.115s    75,751,424 bytes
   Proposed   10.262s    76,251,136 bytes

 gcc-4.7.3.tar.bz2 (82,904,224 bytes)
   Current    5.637s     82,907,136 bytes
   Proposed   3.276s     82,907,136 bytes

 ntoskrnl.exe (3,911,040 bytes)
   Current    0.492s     2,789,376 bytes
   Proposed   0.186s     2,789,376 bytes

 E_coli_genome.fasta (4,706,040 bytes)
   Current    1.101s     2,060,288 bytes
   Proposed   0.458s     2,351,104 bytes

 shakespeare.txt (5,328,042 bytes)
   Current    0.909s     3,321,856 bytes
   Proposed   0.303s     3,321,856 bytes

 zeroes.bin (134,217,728 bytes)
   Current    3.053s     0 bytes
   Proposed   2.601s     0 bytes

 ascending_digrams.bin (16,777,216 bytes)
   Current    4.309s     16,777,216 bytes
   Proposed   0.673s     16,777,216 bytes

 degenerate_trigrams.bin (16,777,216 bytes)
   Current    9.292s     5,242,880 bytes
   Proposed   0.684s     5,242,880 bytes






Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm

2014-08-10 Thread Eric Biggers
On Sun, Aug 10, 2014 at 10:45:16AM +0200, Jean-Pierre André wrote:
 
 You have defined the hash table on static data, and I do
 not want to enter into the meanings of static data in
 shared objects in various operating systems (allowed or
 not, shared by threads or not...). I prefer to have it
 dynamically allocated (hence never shared by mounts),
 and pointed to in the volume structure. Unfortunately
 this means adding an extra argument to
 ntfs_compress_block() and freeing the table when unmounting.
 (I will later post the code for that).
 

Yes, I wondered if that would cause issues.  Since the algorithm does not depend
on the specific hash function used, an alternative to jumping through hoops to
use the crc32_table is to swap ntfs_hash() with another 3-byte hash function,
one that does not rely on static data.  I will try some other ones (zlib-like
which I've already tested a little bit, and maybe multiplicative hashing) and
see how they affect the results.

 Also a minor issue : please use
 http://sourceforge.net/p/ntfs-3g/ntfs-3g/ci/edge/tree/libntfs-3g/compress.c
 as a reference for your patch for easier integration.

Will do next time.  I somehow missed the fact that this repository even exists!
Looks like the only conflict is in the change to the copyright block...

Eric



Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm

2014-08-10 Thread Eric Biggers
On Sun, Aug 10, 2014 at 11:18:49AM +0200, Jean-Pierre André wrote:
 Hi,
 
 Did you compare with the Microsoft implementation ?
 
 I have only checked the biggest file in IE7 update for WinXP
 (WINDOWS/ie7updates/KB963027-IE7/ieframe.dll) with
 cluster size 4096 :
 
 Original size 6066688
 Microsoft implementation 3883008 (64.0%)
 current implementation 3682304 (60.7%)
 proposed implementation 3710976 (61.2%)

I have not done any comparisons with the Microsoft implementation yet.  Is there
a more precise way to test it than actually copying a file to a NTFS volume from
Windows?

I'm not surprised that it apparently produces a worse compression ratio than
NTFS-3g.  Although it's impossible to know for sure what their algorithm does,
my expectation is that they use hash chains --- similar to my proposal, perhaps
with a slightly less exhaustive search --- but use greedy parsing rather than
lazy parsing.

If there's a desire for even greater performance improvement, then greedy
parsing is the way to go.  But it will degrade the compression ratio, maybe
placing it closer to the Microsoft implementation.

Eric



Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm

2014-08-10 Thread Jean-Pierre André
Eric Biggers wrote:
 On Sun, Aug 10, 2014 at 10:45:16AM +0200, Jean-Pierre André wrote:
 You have defined the hash table on static data, and I do
 not want to enter into the meanings of static data in
 shared objects in various operating systems (allowed or
 not, shared by threads or not...). I prefer to have it
 dynamically allocated (hence never shared by mounts),
 and pointed to in the volume structure. Unfortunately
 this means adding an extra argument to
 ntfs_compress_block() and freeing the table when unmounting.
 (I will later post the code for that).

 Yes, I wondered if that would cause issues.  Since the algorithm does not
 depend on the specific hash function used, an alternative to jumping through
 hoops to use the crc32_table is to swap ntfs_hash() with another 3-byte hash
 function, one that does not rely on static data.  I will try some other ones
 (zlib-like which I've already tested a little bit, and maybe multiplicative
 hashing) and see how they affect the results.

I gather the CRC polynomial you propose has proved
efficient in other compression algorithms, and is
supposed to be stable. Then a simpler way is to
compute the table once and for all, and encode it as
an array of constants. Just keep the generating
function as a comment in case someone would like to
test other polynomials.
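
For instance, a throwaway generator along these lines could print the
table, ready to be pasted into compress.c as constants (an illustration
only, not part of the patch):

#include <stdio.h>

#define CRC32_POLYNOMIAL 0xEDB88320

int main(void)
{
	unsigned int r;
	int i, j;

	for (i = 0; i < 256; i++) {
		r = i;
		/* same reflected CRC32 recurrence as do_crc32_init() */
		for (j = 0; j < 8; j++)
			r = (r >> 1) ^ (CRC32_POLYNOMIAL & ~((r & 1) - 1));
		printf("0x%08X,%c", r, (i % 8 == 7 ? '\n' : ' '));
	}
	return 0;
}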

 Also a minor issue : please use
 http://sourceforge.net/p/ntfs-3g/ntfs-3g/ci/edge/tree/libntfs-3g/compress.c
 as a reference for your patch for easier integration.
 Will do next time.  I somehow missed the fact that this repository even
 exists!  Looks like the only conflict is in the change to the copyright
 block...

Indeed.

Jean-Pierre


 Eric






Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm

2014-08-10 Thread Jean-Pierre André
Eric Biggers wrote:
 On Sun, Aug 10, 2014 at 11:18:49AM +0200, Jean-Pierre André wrote:
 Hi,

 Did you compare with the Microsoft implementation ?

 I have only checked the biggest file in IE7 update for WinXP
 (WINDOWS/ie7updates/KB963027-IE7/ieframe.dll) with
 cluster size 4096 :

 Original size 6066688
 Microsoft implementation 3883008 (64.0%)
 current implementation 3682304 (60.7%)
 proposed implementation 3710976 (61.2%)
 I have not done any comparisons with the Microsoft implementation yet.  Is
 there a more precise way to test it than actually copying a file to a NTFS
 volume from Windows?

For a better way you would have to identify which DLL does
the compression, and submit compression tasks to it with
some control over the timing.

 I'm not surprised that it apparently produces a worse compression ratio than
 NTFS-3g.  Although it's impossible to know for sure what their algorithm does,
 my expectation is that they use hash chains --- similar to my proposal,
 perhaps with a slightly less exhaustive search --- but use greedy parsing
 rather than lazy parsing.

I had analyzed the differences in the results, and I was
surprised to find that the full length of the matching string
was not always used (for instance, a matching string with a
length of 20 was found at some position, but only a length of
12 was used, and the next match was no better than the
expected 8 bytes). Also, there does not appear to be a fixed
maximum length (when all the bytes are the same, the matching
length is 4095, as would be expected).

They probably traded compression rate for speed.

 If there's a desire for even greater performance improvement, then greedy
 parsing is the way to go.  But it will degrade the compression ratio, maybe
 placing it closer to the Microsoft implementation.

 Eric






Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm

2014-08-10 Thread Eric Biggers
On Sun, Aug 10, 2014 at 05:29:53PM +0200, Jean-Pierre André wrote:
 
 For a better way you would have to identify which DLL does
 the compression, and submit compression tasks to it with
 some control over the timing.
 

RtlCompressBuffer() in ntdll.dll can do LZNT1 compression, which I think is the
same as NTFS compression.  But I wouldn't be surprised if there is actually
another implementation in Microsoft's NTFS driver, which would be inaccessible.
But either way, simply copying files to a NTFS volume is probably good enough
for approximate benchmarking; that's what I was doing with NTFS-3g, after all.
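
Roughly, such a user-space test could look like the sketch below (the
prototypes are declared by hand since they normally come from the driver
kit headers; this is an untested sketch, not code I have actually run):

#include <windows.h>
#include <stdlib.h>

#define COMPRESSION_FORMAT_LZNT1	0x0002
#define COMPRESSION_ENGINE_STANDARD	0x0000

typedef LONG (WINAPI *RtlGetCompressionWorkSpaceSize_t)(USHORT, ULONG *,
							ULONG *);
typedef LONG (WINAPI *RtlCompressBuffer_t)(USHORT, UCHAR *, ULONG, UCHAR *,
					   ULONG, ULONG, ULONG *, void *);

/* Returns the LZNT1-compressed size of 'in', or 0 on failure. */
static ULONG lznt1_compressed_size(UCHAR *in, ULONG insize,
				   UCHAR *out, ULONG outsize)
{
	HMODULE ntdll = GetModuleHandleA("ntdll.dll");
	RtlGetCompressionWorkSpaceSize_t get_wssize =
		(RtlGetCompressionWorkSpaceSize_t)
		GetProcAddress(ntdll, "RtlGetCompressionWorkSpaceSize");
	RtlCompressBuffer_t compress = (RtlCompressBuffer_t)
		GetProcAddress(ntdll, "RtlCompressBuffer");
	ULONG wssize, fragwssize, csize = 0;
	void *workspace;

	if (!get_wssize || !compress)
		return 0;
	get_wssize(COMPRESSION_FORMAT_LZNT1 | COMPRESSION_ENGINE_STANDARD,
		   &wssize, &fragwssize);
	workspace = malloc(wssize);
	if (!workspace)
		return 0;
	/* 4096 is the uncompressed chunk size used by NTFS compression */
	if (compress(COMPRESSION_FORMAT_LZNT1 | COMPRESSION_ENGINE_STANDARD,
		     in, insize, out, outsize, 4096, &csize, workspace) != 0)
		csize = 0;
	free(workspace);
	return csize;
}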

 I had analyzed the differences in the results, and I was
 surprised to find that the full length of the matching string
 was not always used (for instance, a matching string with a
 length of 20 was found at some position, but only a length of
 12 was used, and the next match was no better than the
 expected 8 bytes). Also, there does not appear to be a fixed
 maximum length (when all the bytes are the same, the matching
 length is 4095, as would be expected).

 They probably traded compression rate for speed.

This is strange.  If the algorithm does the work to find a match at some
position, it should at least extend it to its full length.  Although a
non-greedy parser will not necessarily choose that full length, it's unexpected
that the algorithm would actually choose a sequence of matches that is *worse*
than that which a greedy parser would choose.  Is it possible that the length 12
match was less distant than the length 20 match?  If it was, then this would be
an expected result of an incomplete search using hash chains.

Regardless, it's probably possible to implement something that beats the
Microsoft implementation (and I have done this for the XPRESS-Huffman and LZX
algorithms), so I personally wouldn't read too much into how they might be doing
things.

Eric



Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm

2014-08-10 Thread Jean-Pierre André
Eric Biggers wrote:
 On Sun, Aug 10, 2014 at 04:51:44PM +0200, Jean-Pierre André wrote:
 I gather the CRC polynomial you propose has proved
 efficient in other compression algorithms, and is
 supposed to be stable. Then a simpler way is to
 compute the table once and for all, and encode it as
 an array of constants. Just keep the generating
 function as a comment in case someone would like to
 test other polynomials.

 The proposed hash function is borrowed from LZMA, the algorithm used by xz.
 The CRC polynomial is standard; it's widely used for checksums (Ethernet,
 PNG, zlib, bzip2, etc.).  The main consideration is how to actually use the
 CRC32 table to hash 3 bytes.  A true CRC would checksum all three bytes, but
 the current proposal only does a CRC on the first byte, then mixes the other
 two bytes into the hash, which is what LZMA does.

 I agree that a hard-coded table is the simplest solution if CRC32 is
 retained in the hash function.  I'll post a patch for that unless I decide a
 different hash function is preferable.

For a hash modulus of (1 << 14), you can even halve the
needed space by truncating to u16's.
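
With such a hard-coded u16 crc_table (holding the low 16 bits of the
CRC32 entries) and HASH_LEN being 1 << 14, the 3-byte hash you describe
could look about like this (a sketch only, not the actual patch):

static inline unsigned int ntfs_hash(const u8 *p)
{
	/* CRC lookup on the first byte only, the other two bytes
	 * being mixed in, as in LZMA */
	u32 h = (u32)crc_table[p[0]] ^ p[1] ^ ((u32)p[2] << 8);

	return h & (HASH_LEN - 1);
}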

Jean-Pierre


 Eric






Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm

2014-08-10 Thread Eric Biggers
Hi,

I did some more testing with zlib-like and multiplicative hashing.  The
differences for the files I was testing were quite small.  However, the proposed
hash function, as well as a slightly modified version of it, did come out
slightly ahead.  So it might as well be used.

I also did a few quick tests with greedy parsing.  In general, it seemed to
improve performance by 10-20% and increase the size of the compressed files by
1-2%.  If the performance improvement is considered more desirable, then I can
change the patch to use greedy parsing.  For now it's using lazy parsing, like
it was before but more optimized.

Here's the updated patch.



diff --git a/libntfs-3g/compress.c b/libntfs-3g/compress.c
index 73ad283..1fefc3e 100644
--- a/libntfs-3g/compress.c
+++ b/libntfs-3g/compress.c
@@ -6,6 +6,7 @@
  * Copyright (c) 2004-2006 Szabolcs Szakacsits
  * Copyright (c)  2005 Yura Pakhuchiy
  * Copyright (c) 2009-2014 Jean-Pierre Andre
+ * Copyright (c)  2014 Eric Biggers
  *
  * This program/include file is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as published
@@ -21,17 +22,6 @@
  * along with this program (in the main directory of the NTFS-3G
  * distribution in the file COPYING); if not, write to the Free Software
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
- *
- * A part of the compression algorithm is based on lzhuf.c whose header
- * describes the roles of the original authors (with no apparent copyright
- * notice, and according to http://home.earthlink.net/~neilbawd/pall.html
- * this was put into public domain in 1988 by Haruhiko OKUMURA).
- *
- * LZHUF.C English version 1.0
- * Based on Japanese version 29-NOV-1988   
- * LZSS coded by Haruhiko OKUMURA
- * Adaptive Huffman Coding coded by Haruyasu YOSHIZAKI
- * Edited and translated to English by Kenji RIKITAKE
  */
 
 #ifdef HAVE_CONFIG_H
@@ -81,96 +71,256 @@ typedef enum {
NTFS_SB_IS_COMPRESSED   =   0x8000,
 } ntfs_compression_constants;
 
+/* Match length at or above which ntfs_best_match() will stop searching for
+ * longer matches.  */
+#define NICE_MATCH_LEN 18
+
+/* Maximum number of potential matches that ntfs_best_match() will consider at
+ * each position.  */
+#define MAX_SEARCH_DEPTH 24
+
+/* Number of entries in the hash table for match-finding.
+ *
+ * This can be changed, but ntfs_hash() would need to be updated to use an
+ * appropriate shift.  Also, if this is made more than 1 << 16 (not recommended
+ * for the 4096-byte buffers used in NTFS compression!), then 'crc_table' would
+ * need to be updated to use 32-bit entries.  */
+#define HASH_LEN (1 << 14)
+
 struct COMPRESS_CONTEXT {
const unsigned char *inbuf;
int bufsize;
int size;
int rel;
int mxsz;
-   s16 head[256];
-   s16 lson[NTFS_SB_SIZE];
-   s16 rson[NTFS_SB_SIZE];
+   s16 head[HASH_LEN];
+   s16 prev[NTFS_SB_SIZE];
 } ;
 
 /*
- * Search for the longest sequence matching current position
+ * CRC table for hashing bytes for Lempel-Ziv match-finding.
  *
- * A binary tree is maintained to locate all previously met sequences,
- * and this function has to be called for all of them.
+ * We use a CRC32 for this purpose.  But since log2(HASH_LEN) <= 16, we only
+ * need 16-bit entries, each of which contains the low 16 bits of the entry
+ * in a real CRC32 table.  (CRC16 would also work, but it caused more
+ * collisions when I tried it.)
  *
- * This function is heavily used, it has to be optimized carefully
+ * Hard-coding the table avoids dealing with thread-safe initialization.
+ */
+static const u16 crc_table[256] = {
+   0x, 0x3096, 0x612C, 0x51BA, 0xC419, 0xF48F, 0xA535, 0x95A3,
+   0x8832, 0xB8A4, 0xE91E, 0xD988, 0x4C2B, 0x7CBD, 0x2D07, 0x1D91,
+   0x1064, 0x20F2, 0x7148, 0x41DE, 0xD47D, 0xE4EB, 0xB551, 0x85C7,
+   0x9856, 0xA8C0, 0xF97A, 0xC9EC, 0x5C4F, 0x6CD9, 0x3D63, 0x0DF5,
+   0x20C8, 0x105E, 0x41E4, 0x7172, 0xE4D1, 0xD447, 0x85FD, 0xB56B,
+   0xA8FA, 0x986C, 0xC9D6, 0xF940, 0x6CE3, 0x5C75, 0x0DCF, 0x3D59,
+   0x30AC, 0x003A, 0x5180, 0x6116, 0xF4B5, 0xC423, 0x9599, 0xA50F,
+   0xB89E, 0x8808, 0xD9B2, 0xE924, 0x7C87, 0x4C11, 0x1DAB, 0x2D3D,
+   0x4190, 0x7106, 0x20BC, 0x102A, 0x8589, 0xB51F, 0xE4A5, 0xD433,
+   0xC9A2, 0xF934, 0xA88E, 0x9818, 0x0DBB, 0x3D2D, 0x6C97, 0x5C01,
+   0x51F4, 0x6162, 0x30D8, 0x004E, 0x95ED, 0xA57B, 0xF4C1, 0xC457,
+   0xD9C6, 0xE950, 0xB8EA, 0x887C, 0x1DDF, 0x2D49, 0x7CF3, 0x4C65,
+   0x6158, 0x51CE, 0x0074, 0x30E2, 0xA541, 0x95D7, 0xC46D, 0xF4FB,
+   0xE96A, 0xD9FC, 0x8846, 0xB8D0, 0x2D73, 0x1DE5, 0x4C5F, 0x7CC9,
+   0x713C, 0x41AA, 0x1010, 0x2086, 0xB525, 0x85B3, 0xD409, 0xE49F,
+   0xF90E, 0xC998, 0x9822, 0xA8B4, 0x3D17, 0x0D81, 0x5C3B, 0x6CAD,
+   0x8320, 0xB3B6, 0xE20C, 0xD29A, 0x4739, 0x77AF, 0x2615, 0x1683,
+   0x0B12, 0x3B84, 0x6A3E, 0x5AA8, 0xCF0B, 

Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm

2014-08-10 Thread Eric Biggers
Good news: I tried some different constants for multiplicative hashing, and the
results for two constants were as good as the CRC-based hash function, and
faster to compute (at least on x86).  So here's the revised patch that does away
with static data completely.

---

diff --git a/libntfs-3g/compress.c b/libntfs-3g/compress.c
index 73ad283..b356e35 100644
--- a/libntfs-3g/compress.c
+++ b/libntfs-3g/compress.c
@@ -6,6 +6,7 @@
  * Copyright (c) 2004-2006 Szabolcs Szakacsits
  * Copyright (c)  2005 Yura Pakhuchiy
  * Copyright (c) 2009-2014 Jean-Pierre Andre
+ * Copyright (c)  2014 Eric Biggers
  *
  * This program/include file is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as published
@@ -21,17 +22,6 @@
  * along with this program (in the main directory of the NTFS-3G
  * distribution in the file COPYING); if not, write to the Free Software
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
- *
- * A part of the compression algorithm is based on lzhuf.c whose header
- * describes the roles of the original authors (with no apparent copyright
- * notice, and according to http://home.earthlink.net/~neilbawd/pall.html
- * this was put into public domain in 1988 by Haruhiko OKUMURA).
- *
- * LZHUF.C English version 1.0
- * Based on Japanese version 29-NOV-1988   
- * LZSS coded by Haruhiko OKUMURA
- * Adaptive Huffman Coding coded by Haruyasu YOSHIZAKI
- * Edited and translated to English by Kenji RIKITAKE
  */
 
 #ifdef HAVE_CONFIG_H
@@ -81,96 +71,183 @@ typedef enum {
NTFS_SB_IS_COMPRESSED   =   0x8000,
 } ntfs_compression_constants;
 
+/* Match length at or above which ntfs_best_match() will stop searching for
+ * longer matches.  */
+#define NICE_MATCH_LEN 18
+
+/* Maximum number of potential matches that ntfs_best_match() will consider at
+ * each position.  */
+#define MAX_SEARCH_DEPTH 24
+
+/* log base 2 of the number of entries in the hash table for match-finding.  */
+#define HASH_SHIFT 14
+
+/* Constant for the multiplicative hash function.  */
+#define HASH_MULTIPLIER 0x1E35A7BD
+
 struct COMPRESS_CONTEXT {
const unsigned char *inbuf;
int bufsize;
int size;
int rel;
int mxsz;
-   s16 head[256];
-   s16 lson[NTFS_SB_SIZE];
-   s16 rson[NTFS_SB_SIZE];
+   s16 head[1 << HASH_SHIFT];
+   s16 prev[NTFS_SB_SIZE];
 } ;
 
 /*
+ * Hash the next 3-byte sequence in the input buffer
+ */
+static inline unsigned int ntfs_hash(const u8 *p)
+{
+   u32 str;
+
+#if defined(__i386__) || defined(__x86_64__)
+   /* Unaligned access okay  */
+   str = *(u32 *)p & 0xFFFFFF;
+#else
+   str = ((u32)p[0] << 0) | ((u32)p[1] << 8) | ((u32)p[2] << 16);
+#endif
+
+   return (str * HASH_MULTIPLIER) >> (32 - HASH_SHIFT);
+}
+
+/*
  * Search for the longest sequence matching current position
  *
- * A binary tree is maintained to locate all previously met sequences,
- * and this function has to be called for all of them.
+ * A hash table, each entry of which points to a chain of sequence
+ * positions sharing the corresponding hash code, is maintained to speed up
+ * searching for matches.  To maintain the hash table, either
+ * ntfs_best_match() or ntfs_skip_position() has to be called for each
+ * consecutive position.
+ *
+ * This function is heavily used; it has to be optimized carefully.
+ *
+ * This function sets pctx->size and pctx->rel to the length and offset,
+ * respectively, of the longest match found.
+ *
+ * The minimum match length is assumed to be 3, and the maximum match
+ * length is assumed to be pctx->mxsz.  If this function produces
+ * pctx->size < 3, then no match was found.
+ *
+ * Note: for the following reasons, this function is not guaranteed to find
+ * *the* longest match up to pctx->mxsz:
  *
- * This function is heavily used, it has to be optimized carefully
+ * (1) If this function finds a match of NICE_MATCH_LEN bytes or greater,
+ * it ends early because a match this long is good enough and it's not
+ * worth spending more time searching.
  *
- * Returns the size of the longest match,
- * zero if no match is found.
+ * (2) If this function considers MAX_SEARCH_DEPTH matches with a single
+ * position, it ends early and returns the longest match found so far.
+ * This saves a lot of time on degenerate inputs.
  */
-
-static int ntfs_best_match(struct COMPRESS_CONTEXT *pctx, int i)
+static void ntfs_best_match(struct COMPRESS_CONTEXT *pctx, const int i,
+   int best_len)
 {
-   s16 *prev;
-   int node;
-   register long j;
-   long maxpos;
-   long startj;
-   long bestj;
-   int bufsize;
-   int bestnode;
-   register const unsigned char *p1,*p2;
-
-   p1 = pctx->inbuf;
-   node = pctx->head[p1[i] & 255];
-  

[ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm

2014-08-09 Thread Eric Biggers
The current compression algorithm does lazy parsing of matches, backed
by a binary tree match-finder with one byte hashing.  Performance-wise,
this approach is not very good for several reasons:

(1) One byte hashing results in a lot of hash collisions, which slows
down searches.

(2) With lazy parsing, many positions never actually need to be
matched.  But when using binary trees, all the work needs to be done
anyway, because the sequence at each position needs to be inserted
into the appropriate binary tree.  This makes binary trees better
suited for optimal parsing --- but that isn't being done and is
probably too slow to be practical for NTFS.

(3) There was also no hard cut-off on the amount of work done per
position.  This did not matter too much because the buffer size is
never greater than 4096 bytes, but in degenerate cases the binary
trees could degenerate into linked lists and there could be hundreds of
matches considered at each position.

This patch changes the algorithm to use hash chains instead of binary
trees, with much stronger hashing.  It also introduces useful (for
performance) parameters, such as the nice match length and maximum
search depth, that are similar to those used in other commonly used
compression algorithms such as zlib's DEFLATE implementation.

The performance improvement is very significant, but data-dependent.
Compressing text files is faster by about 3x; x86 executables files by
about 3x; random data by about 1.7x; all zeroes by about 1.2x; some
degenerate cases by over 10x.  (I did my tests on an x86_64 CPU.)

The compression ratio is the same or slightly worse.  It is less than 1%
worse on all files I tested except an ASCII representation of a genome.

No changes were made to the decompressor.
---
 libntfs-3g/compress.c | 484 +-
 1 file changed, 324 insertions(+), 160 deletions(-)

diff --git a/libntfs-3g/compress.c b/libntfs-3g/compress.c
index 69b39ed..e62c7dd 100644
--- a/libntfs-3g/compress.c
+++ b/libntfs-3g/compress.c
@@ -6,6 +6,7 @@
  * Copyright (c) 2004-2006 Szabolcs Szakacsits
  * Copyright (c)  2005 Yura Pakhuchiy
  * Copyright (c) 2009-2013 Jean-Pierre Andre
+ * Copyright (c)  2014 Eric Biggers
  *
  * This program/include file is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License as published
@@ -21,17 +22,6 @@
  * along with this program (in the main directory of the NTFS-3G
  * distribution in the file COPYING); if not, write to the Free Software
  * Foundation,Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
- *
- * A part of the compression algorithm is based on lzhuf.c whose header
- * describes the roles of the original authors (with no apparent copyright
- * notice, and according to http://home.earthlink.net/~neilbawd/pall.html
- * this was put into public domain in 1988 by Haruhiko OKUMURA).
- *
- * LZHUF.C English version 1.0
- * Based on Japanese version 29-NOV-1988   
- * LZSS coded by Haruhiko OKUMURA
- * Adaptive Huffman Coding coded by Haruyasu YOSHIZAKI
- * Edited and translated to English by Kenji RIKITAKE
  */
 
 #ifdef HAVE_CONFIG_H
@@ -81,96 +71,210 @@ typedef enum {
NTFS_SB_IS_COMPRESSED   =   0x8000,
 } ntfs_compression_constants;
 
+/* Match length at or above which ntfs_best_match() will stop searching for
+ * longer matches.  */
+#define NICE_MATCH_LEN 16
+
+/* Maximum length at which a lazy match will be attempted.  */
+#define MAX_LAZY_MATCH_LEN 20
+
+/* Maximum number of potential matches that ntfs_best_match() will consider at
+ * each position.  */
+#define MAX_SEARCH_DEPTH 24
+
+/* Number of entries in the hash table for match-finding.  This can be changed,
+ * but it should be a power of 2 so that computing the hash bucket is fast.  */
+#define HASH_LEN (1 << 14)
+
 struct COMPRESS_CONTEXT {
const unsigned char *inbuf;
int bufsize;
int size;
int rel;
int mxsz;
-   s16 head[256];
-   s16 lson[NTFS_SB_SIZE];
-   s16 rson[NTFS_SB_SIZE];
+   s16 head[HASH_LEN];
+   s16 prev[NTFS_SB_SIZE];
 } ;
 
+#define CRC32_POLYNOMIAL 0xEDB88320
+
+static u32 crc32_table[256];
+
+static void do_crc32_init(void)
+{
+   int i, j;
+   u32 r;
+
+   for (i = 0; i < 256; i++) {
+   r = i;
+   for (j = 0; j < 8; j++)
+   r = (r >> 1) ^ (CRC32_POLYNOMIAL & ~((r & 1) - 1));
+   crc32_table[i] = r;
+   }
+}
+
+/*
+ * Initialize the CRC32 table for ntfs_hash() if not done already
+ */
+static void crc32_init(void)
+{
+   static int done = 0;
+
+   if (!done) {
+   do_crc32_init();
+   done = 1;
+   }
+}
+
+/*
+ * Hash the next 3-byte sequence in the input buffer
+ *
+ * Currently, we use a hash function similar to that used in LZMA.  It
+ * takes slightly longer to compute than zlib's hash 

Re: [ntfs-3g-devel] [PATCH] compress.c: Speed up NTFS compression algorithm

2014-08-09 Thread Eric Biggers
Also, results from tests I did copying a file to compressed directory on a
NTFS-3g mount, with time elapsed and compressed sizes shown:

silesia_corpus.tar (211,957,760 bytes)
  Current    43.318s    111,230,976 bytes
  Proposed   12.903s    111,751,168 bytes

canterbury_corpus.tar (2,826,240 bytes):
  Current    1.778s     1,232,896 bytes
  Proposed   0.142s     1,241,088 bytes

Firefox-11-windows-bin.tar (38,225,920 bytes)
  Current    5.685s     27,418,624 bytes
  Proposed   1.992s     27,492,352 bytes

boot.wim (361,315,088 bytes, no internal compression)
  Current    64.682s    189,124,608 bytes
  Proposed   20.990s    190,386,176 bytes

mp3-files.tar (201,646,080 bytes)
  Current    14.547s    200,916,992 bytes
  Proposed   8.585s     200,937,472 bytes

linux-2.4.31-src.tar (174,417,920 bytes)
  Current    36.115s    75,751,424 bytes
  Proposed   10.262s    76,251,136 bytes

gcc-4.7.3.tar.bz2 (82,904,224 bytes)
  Current    5.637s     82,907,136 bytes
  Proposed   3.276s     82,907,136 bytes

ntoskrnl.exe (3,911,040 bytes)
  Current    0.492s     2,789,376 bytes
  Proposed   0.186s     2,789,376 bytes

E_coli_genome.fasta (4,706,040 bytes)
  Current    1.101s     2,060,288 bytes
  Proposed   0.458s     2,351,104 bytes

shakespeare.txt (5,328,042 bytes)
  Current    0.909s     3,321,856 bytes
  Proposed   0.303s     3,321,856 bytes

zeroes.bin (134,217,728 bytes)
  Current    3.053s     0 bytes
  Proposed   2.601s     0 bytes

ascending_digrams.bin (16,777,216 bytes)
  Current    4.309s     16,777,216 bytes
  Proposed   0.673s     16,777,216 bytes

degenerate_trigrams.bin (16,777,216 bytes)
  Current    9.292s     5,242,880 bytes
  Proposed   0.684s     5,242,880 bytes
