Re: Need a Faster Compressor

2016-10-17 Thread Era Scarecrow via Digitalmars-d
On Tuesday, 24 May 2016 at 20:25:18 UTC, Era Scarecrow wrote: 64 bytes? Hmm I had the impression it was going to be far larger... Well it's been a while since I last touched this topic (or code) so I'll post it, but there are three formulas written. 0) compress.c - control code, we are comparin

Re: Need a Faster Compressor

2016-05-25 Thread Marco Leise via Digitalmars-d
Am Tue, 24 May 2016 19:16:07 + schrieb Anon : > Upping it to sixteen levels: > > current: Generate a 3_688_679 byte symbol > lz4 -9: Generate a 3_688_679 byte symbol, then compress it to > 15535 bytes > lzma -9: Generate a 3_688_679 byte symbol, then compress it to > 840 bytes > BRNT: Gener

Re: Need a Faster Compressor

2016-05-24 Thread Walter Bright via Digitalmars-d
On 5/24/2016 2:34 PM, Stefan Koch wrote: But would one ever have to be able to reconstruct the mangle? Symbolic debugging, and dealing with various linker errors. Templates cannot be exported nor can instantiations. Or can they? Of course they can!

Re: Need a Faster Compressor

2016-05-24 Thread Stefan Koch via Digitalmars-d
On Tuesday, 24 May 2016 at 21:26:58 UTC, Walter Bright wrote: On 5/24/2016 2:06 PM, Stefan Koch wrote: A serious question, do these templates even need a mangle ? Is anyone ever going to call them from a different module ? I don't think you could do that because those are voldemort types. Tho

Re: Need a Faster Compressor

2016-05-24 Thread Walter Bright via Digitalmars-d
On 5/24/2016 2:06 PM, Stefan Koch wrote: On Tuesday, 24 May 2016 at 20:39:41 UTC, Walter Bright wrote: On 5/24/2016 1:27 PM, Guillaume Boucher wrote: There's no reason not to use the compression in the deco name. Just make sure the references are relative and you're set. Not totally, since y

Re: Need a Faster Compressor

2016-05-24 Thread Stefan Koch via Digitalmars-d
On Tuesday, 24 May 2016 at 20:39:41 UTC, Walter Bright wrote: On 5/24/2016 1:27 PM, Guillaume Boucher wrote: There's no reason not to use the compression in the deco name. Just make sure the references are relative and you're set. Not totally, since you'd like to refer to types that are commo

Re: Need a Faster Compressor

2016-05-24 Thread Walter Bright via Digitalmars-d
On 5/24/2016 1:27 PM, Guillaume Boucher wrote: There's no reason not to use the compression in the deco name. Just make sure the references are relative and you're set. Not totally, since you'd like to refer to types that are common across deco names.

Re: Need a Faster Compressor

2016-05-24 Thread Guillaume Boucher via Digitalmars-d
On Tuesday, 24 May 2016 at 20:16:32 UTC, Walter Bright wrote: On 5/24/2016 9:22 AM, Timon Gehr wrote: Yes, it does. The compiler does not use exponential space to store the AST. BTW, all types in dmd have a 'deco' string which is the AST linearized as a string. This string is also used to bui

Re: Need a Faster Compressor

2016-05-24 Thread Era Scarecrow via Digitalmars-d
On Tuesday, 24 May 2016 at 20:07:02 UTC, Walter Bright wrote: On 5/24/2016 4:55 AM, Era Scarecrow wrote: So Walter, big question: How big before the compressor kicks in? What's the minimum size of the input string the compressor should expect? It's currently set at 64 in the PR. But that numb

Re: Need a Faster Compressor

2016-05-24 Thread Guillaume Boucher via Digitalmars-d
On Monday, 23 May 2016 at 20:34:19 UTC, Walter Bright wrote: On 5/23/2016 12:32 PM, Timon Gehr wrote: Isn't it enough to create references to previously embedded mangled symbols (i-th symbol already mangled) while generating the mangled string? That's been covered upthread, and VC++/g++ do t

Re: Need a Faster Compressor

2016-05-24 Thread Walter Bright via Digitalmars-d
On 5/24/2016 9:22 AM, Timon Gehr wrote: Yes, it does. The compiler does not use exponential space to store the AST. BTW, all types in dmd have a 'deco' string which is the AST linearized as a string. This string is also used to build the mangled names. All the deco's are put into a hashtable

Re: Need a Faster Compressor

2016-05-24 Thread Walter Bright via Digitalmars-d
On 5/24/2016 4:55 AM, Era Scarecrow wrote: So Walter, big question: How big before the compressor kicks in? What's the minimum size of the input string the compressor should expect? It's currently set at 64 in the PR. But that number is just picked out of the air. I assume speed is more i

Re: Need a Faster Compressor

2016-05-24 Thread Walter Bright via Digitalmars-d
On 5/24/2016 12:16 PM, Anon wrote: As if winning the compactness war while still being a symbol linkers won't have problems with wasn't enough, this approach also avoids generating giant symbols in the first place. Compression and/or hashing cannot do that. If D really wants to, it can compress s

Re: Need a Faster Compressor

2016-05-24 Thread Anon via Digitalmars-d
On Tuesday, 24 May 2016 at 16:22:56 UTC, Timon Gehr wrote: It's _exponential_ growth. We don't even want to spend the time and memory required to generate the strings. The reason we have this discussion is that the worst case isn't rare enough to make this argument. Why compress in the first

Re: Need a Faster Compressor

2016-05-24 Thread deadalnix via Digitalmars-d
On Tuesday, 24 May 2016 at 18:30:36 UTC, Stefan Koch wrote: On Tuesday, 24 May 2016 at 16:22:56 UTC, Timon Gehr wrote: On 24.05.2016 01:02, Walter Bright wrote: [...] Speed. The title of this thread is "Need a faster compressor". [...] No. Just increase the recursion depth

Re: Need a Faster Compressor

2016-05-24 Thread Stefan Koch via Digitalmars-d
On Tuesday, 24 May 2016 at 16:22:56 UTC, Timon Gehr wrote: On 24.05.2016 01:02, Walter Bright wrote: [...] Speed. The title of this thread is "Need a faster compressor". [...] No. Just increase the recursion depth by a small number of levels to completely kill the speedups

Re: Need a Faster Compressor

2016-05-24 Thread Timon Gehr via Digitalmars-d
ing a compression pass? Speed. The title of this thread is "Need a faster compressor". The only real problem with the compressor was the compression speed, and the LZ4 test shows this is solvable. ... No. Just increase the recursion depth by a small number of levels to completely ki

Re: Need a Faster Compressor

2016-05-24 Thread Era Scarecrow via Digitalmars-d
On Monday, 23 May 2016 at 20:36:04 UTC, Walter Bright wrote: On 5/23/2016 9:00 AM, Stefan Koch wrote: The 64k limit does not apply to the input string. The starting line of the compression function posted here tests for it and aborts if larger. So Walter, big question: How big before the c

Re: Need a Faster Compressor

2016-05-23 Thread Walter Bright via Digitalmars-d
On 5/23/2016 4:49 PM, Stefan Koch wrote: Just a heads up on the LZ4. I have spent roughly 3 hours optimizing my decompressor. And while I had stunning success (a speed-up of about 400%), I am still about 600x slower than the C variant. It is still a mystery to me why that is :) Since the generate

Re: Need a Faster Compressor

2016-05-23 Thread Stefan Koch via Digitalmars-d
Just a heads up on the LZ4. I have spent roughly 3 hours optimizing my decompressor. And while I had stunning success (a speed-up of about 400%), I am still about 600x slower than the C variant. It is still a mystery to me why that is :) Since the generated code is both smaller and works almost witho

Re: Need a Faster Compressor

2016-05-23 Thread Walter Bright via Digitalmars-d
On 5/23/2016 2:17 PM, Timon Gehr wrote: Then don't do that. I.e. re-mangle recursively from scratch for each mangled name and allow sharing parts between unrelated components within that mangled name. How is that essentially different from running a compression pass? The only real problem with

Re: Need a Faster Compressor

2016-05-23 Thread Timon Gehr via Digitalmars-d
On 23.05.2016 22:34, Walter Bright wrote: On 5/23/2016 12:32 PM, Timon Gehr wrote: Instead, compression should be performed while generating the mangled string. The trouble with that is the mangled string is formed from component pieces. Then don't do that. I.e. re-mangle recursively from sc

Re: Need a Faster Compressor

2016-05-23 Thread Walter Bright via Digitalmars-d
On 5/23/2016 9:00 AM, Stefan Koch wrote: On Monday, 23 May 2016 at 15:33:45 UTC, Walter Bright wrote: Also, the LZ4 compressor posted here has a 64K string limit, which won't work for D because there are reported 8Mb identifier strings. This is only partially true. The 64k limit does not appl

Re: Need a Faster Compressor

2016-05-23 Thread Walter Bright via Digitalmars-d
On 5/23/2016 12:32 PM, Timon Gehr wrote: Instead, compression should be performed while generating the mangled string. The trouble with that is the mangled string is formed from component pieces. Those component pieces may have common substrings with each other, which won't be detected until

Re: Need a Faster Compressor

2016-05-23 Thread Timon Gehr via Digitalmars-d
On 22.05.2016 00:07, Walter Bright wrote: On 5/21/2016 2:37 PM, Timon Gehr wrote: Why is longest_match Ω(nm) instead of O(n+m) (e.g. KMP)? I don't understand the terms you use, but as to the "why" it is based on what I knew about LZ77 compression. I don't pretend to be an expert on compressio

Re: Need a Faster Compressor

2016-05-23 Thread Marco Leise via Digitalmars-d
Am Mon, 23 May 2016 12:04:48 + schrieb Stefan Koch : > The method for achieving perfect compression is outlined here: > https://github.com/Cyan4973/lz4/issues/183 Nice, if it can keep the compression speed up. -- Marco

Re: Need a Faster Compressor

2016-05-23 Thread deadalnix via Digitalmars-d
On Monday, 23 May 2016 at 16:00:20 UTC, Stefan Koch wrote: On Monday, 23 May 2016 at 15:33:45 UTC, Walter Bright wrote: Also, the LZ4 compressor posted here has a 64K string limit, which won't work for D because there are reported 8Mb identifier strings. This is only partially true. The 64k

Re: Need a Faster Compressor

2016-05-23 Thread Stefan Koch via Digitalmars-d
On Monday, 23 May 2016 at 15:33:45 UTC, Walter Bright wrote: Also, the LZ4 compressor posted here has a 64K string limit, which won't work for D because there are reported 8Mb identifier strings. This is only partially true. The 64k limit does not apply to the input string. It does only app

Re: Need a Faster Compressor

2016-05-23 Thread Walter Bright via Digitalmars-d
On 5/23/2016 5:04 AM, Stefan Koch wrote: Am Sun, 22 May 2016 23:42:33 -0700 schrieb Walter Bright : The file format: http://cyan4973.github.io/lz4/lz4_Block_format.html It doesn't look too difficult. If we implement our own LZ4 compressor based on that, from scratch, we can Boost-license it.

Re: Need a Faster Compressor

2016-05-23 Thread Wyatt via Digitalmars-d
On Monday, 23 May 2016 at 14:47:31 UTC, Wyatt wrote: Maybe consider lz4 instead? Disregard that: I see it's come up already.

Re: Need a Faster Compressor

2016-05-23 Thread Wyatt via Digitalmars-d
On Saturday, 21 May 2016 at 21:27:37 UTC, Era Scarecrow wrote: I assume this is related to compressing symbols thread? I mentioned possibly considering the LZO library. Maybe consider lz4 instead? Tends to be a bit faster, and it's BSD instead of GPL. https://cyan4973.github.io/lz4/ -Wyat

Re: Need a Faster Compressor

2016-05-23 Thread Stefan Koch via Digitalmars-d
Am Sun, 22 May 2016 23:42:33 -0700 schrieb Walter Bright : The file format: http://cyan4973.github.io/lz4/lz4_Block_format.html It doesn't look too difficult. If we implement our own LZ4 compressor based on that, from scratch, we can Boost-license it. That is right. It's pretty simple. On

Re: Need a Faster Compressor

2016-05-23 Thread ZombineDev via Digitalmars-d
On Monday, 23 May 2016 at 01:46:40 UTC, Era Scarecrow wrote: On Sunday, 22 May 2016 at 19:44:08 UTC, Era Scarecrow wrote: ... Well here's the rundown of some numbers. min_compress uses a tiny window, big_compress was my original algorithm but modified to use 16k total for a window. reduced_

Re: Need a Faster Compressor

2016-05-23 Thread Marco Leise via Digitalmars-d
Am Sun, 22 May 2016 23:42:33 -0700 schrieb Walter Bright : > The file format: http://cyan4973.github.io/lz4/lz4_Block_format.html > > It doesn't look too difficult. If we implement our own LZ4 compressor based > on > that, from scratch, we can Boost-license it. Ok, any volunteers? I'm not pers

Re: Need a Faster Compressor

2016-05-22 Thread Walter Bright via Digitalmars-d
The file format: http://cyan4973.github.io/lz4/lz4_Block_format.html It doesn't look too difficult. If we implement our own LZ4 compressor based on that, from scratch, we can Boost-license it.

Re: Need a Faster Compressor

2016-05-22 Thread Walter Bright via Digitalmars-d
On 5/22/2016 10:52 PM, Marco Leise wrote: Can you elaborate on how copying the source overrules your above license concerns? It does not. It just avoids having to link in a library that may or may not exist on every machine we want to create demanglers on. I would create a directory "lz4" f

Re: Need a Faster Compressor

2016-05-22 Thread Marco Leise via Digitalmars-d
Am Sun, 22 May 2016 20:22:06 +0200 schrieb Rainer Schuetze : > The 3 consecutive ?s are also in the object file, so they are actually > part of the mangled symbol. VC++ uses only ASCII characters for mangling > (which D should do, too, IMO). In general I agree. The extended characters may be sw

Re: Need a Faster Compressor

2016-05-22 Thread Marco Leise via Digitalmars-d
Am Sun, 22 May 2016 13:02:37 -0700 schrieb Walter Bright : > On 5/22/2016 9:06 AM, Marco Leise wrote: > > Are there any objections > > against the benchmarking method or the license? > > BSD may cause problems. > > > Can the implementation be in D? > > Not yet. > > > If not, should we link

Re: Need a Faster Compressor

2016-05-22 Thread Era Scarecrow via Digitalmars-d
On Sunday, 22 May 2016 at 19:44:08 UTC, Era Scarecrow wrote: ... Well here's the rundown of some numbers. min_compress uses a tiny window, big_compress was my original algorithm but modified to use 16k total for a window. reduced_id_compress is the original except reduced to a 220 window an

Re: Need a Faster Compressor

2016-05-22 Thread Walter Bright via Digitalmars-d
On 5/22/2016 9:06 AM, Marco Leise wrote: Are there any objections against the benchmarking method or the license? BSD may cause problems. Can the implementation be in D? Not yet. If not, should we link against the system liblz4.so/a No. or copy the C code into the backend? Yes. Is

Re: Need a Faster Compressor

2016-05-22 Thread Walter Bright via Digitalmars-d
On 5/22/2016 10:44 AM, Rainer Schuetze wrote: You are right about the symbols using the VC mangling. The test case "1.s.s.s.s.s" in the bugreport translated to C++ yields ?foo@Result@?1???$s@UResult@?1???$s@UResult@?1???$s@UResult@?1???$s@UResult@?1???$s@H@testexpansion@@YA@H@Z@@testexpansion@@Y

Re: Need a Faster Compressor

2016-05-22 Thread Era Scarecrow via Digitalmars-d
On Sunday, 22 May 2016 at 19:21:13 UTC, Walter Bright wrote: On 5/22/2016 3:32 AM, Era Scarecrow wrote: Curiously the extra 95 symbols actually is just enough to keep compression up and bring speed to about 37% speed gain, with it looks like no disadvantages. You're doing the gods' work, Era!

Re: Need a Faster Compressor

2016-05-22 Thread Walter Bright via Digitalmars-d
On 5/22/2016 3:32 AM, Era Scarecrow wrote: [...] My idea for speeding things up is every time a new character c is matched in pattern[], a bit is set in mask: ulong mask; ... mask |= 1 << (c & 63); Then, when scanning for longer matches, test: if (!((1 << (id[i + matchlen -

Re: Need a Faster Compressor

2016-05-22 Thread Walter Bright via Digitalmars-d
On 5/22/2016 3:32 AM, Era Scarecrow wrote: Curiously the extra 95 symbols actually is just enough to keep compression up and bring speed to about 37% speed gain, with it looks like no disadvantages. Then again I still need a larger set to test things on, but it's very promising! You're doing t

Re: Need a Faster Compressor

2016-05-22 Thread Walter Bright via Digitalmars-d
On 5/22/2016 2:57 AM, Era Scarecrow wrote: Question Walter (or Andrei): Would it be terrible to have lower characters as part of the identifier string? I'm referring to characters under 32 (control characters, especially the \r, \n and \t). Currently there's a bug in id_compress() where it doe

Re: Need a Faster Compressor

2016-05-22 Thread Walter Bright via Digitalmars-d
On 5/22/2016 5:50 AM, deadalnix wrote: On Sunday, 22 May 2016 at 01:36:39 UTC, Walter Bright wrote: Just a note: large lookup tables can have cold cache performance problems. 16k lookup table are the gold standard. I like 64 bit vectors because the lookup and test can be done in one instru

Re: Need a Faster Compressor

2016-05-22 Thread Walter Bright via Digitalmars-d
On 5/22/2016 2:07 AM, Era Scarecrow wrote: On Sunday, 22 May 2016 at 08:50:47 UTC, Walter Bright wrote: On 5/21/2016 11:26 PM, Era Scarecrow wrote: With 1 Million iterations: new_compress: TickDuration(311404604) id_compress TickDuration(385806589) Approx 20% increase in speed (if I'm read

Re: Need a Faster Compressor

2016-05-22 Thread Stefan Koch via Digitalmars-d
On Sunday, 22 May 2016 at 18:18:46 UTC, Marco Leise wrote: If you provide the decompressor, we're set. :) I am already working on a non-ctfeable version that does not use slices. And therefore should be faster. But decompression is only needed for demangling.

Re: Need a Faster Compressor

2016-05-22 Thread Rainer Schuetze via Digitalmars-d
On 22.05.2016 20:12, Era Scarecrow wrote: On Sunday, 22 May 2016 at 18:04:25 UTC, Rainer Schuetze wrote: On 22.05.2016 19:56, Era Scarecrow wrote: Unless ? is the proper/valid character those are far more likely to be failed UTF-8 decoding. Care to double check? ? is an allowed character i

Re: Need a Faster Compressor

2016-05-22 Thread Marco Leise via Digitalmars-d
Am Sun, 22 May 2016 17:19:38 + schrieb Stefan Koch : > Here are my answers, please take them with a grain of ignorance. > […] Maybe you are right about the dictionary. I haven't put much thought into it. Anyways I was looking for input from the core devs who are affected by this. > Howeve

Re: Need a Faster Compressor

2016-05-22 Thread Era Scarecrow via Digitalmars-d
On Sunday, 22 May 2016 at 18:04:25 UTC, Rainer Schuetze wrote: On 22.05.2016 19:56, Era Scarecrow wrote: Unless ? is the proper/valid character those are far more likely to be failed UTF-8 decoding. Care to double check? ? is an allowed character in VC++ and for example permits unambiguous d

Re: Need a Faster Compressor

2016-05-22 Thread Rainer Schuetze via Digitalmars-d
On 22.05.2016 19:56, Era Scarecrow wrote: On Sunday, 22 May 2016 at 17:44:22 UTC, Rainer Schuetze wrote: You are right about the symbols using the VC mangling. The test case "1.s.s.s.s.s" in the bugreport translated to C++ yields ?foo@Result@?1???$s@UResult@?1???$s@UResult@?1???$s i.e. 936

Re: Need a Faster Compressor

2016-05-22 Thread Era Scarecrow via Digitalmars-d
On Sunday, 22 May 2016 at 17:44:22 UTC, Rainer Schuetze wrote: You are right about the symbols using the VC mangling. The test case "1.s.s.s.s.s" in the bugreport translated to C++ yields ?foo@Result@?1???$s@UResult@?1???$s@UResult@?1???$s i.e. 936 characters. I think this is due to the very

Re: Need a Faster Compressor

2016-05-22 Thread Rainer Schuetze via Digitalmars-d
On 22.05.2016 00:58, Walter Bright wrote: On 5/21/2016 3:41 PM, Guillaume Boucher wrote: Sorry if I didn't memorize everything in this forum from the last 20 years, can you give a link to some reasoning? DMC++ matches the Microsoft name mangling scheme, which includes such compression. It pr

Re: Need a Faster Compressor

2016-05-22 Thread Era Scarecrow via Digitalmars-d
On Sunday, 22 May 2016 at 16:06:07 UTC, Marco Leise wrote: There are a few open questions, for me at least. Are there other proposals with good properties? Are there any objections against the benchmarking method or the license? Can the implementation be in D? If not, should we link against the

Re: Need a Faster Compressor

2016-05-22 Thread Stefan Koch via Digitalmars-d
On Sunday, 22 May 2016 at 16:06:07 UTC, Marco Leise wrote: There are a few open questions, for me at least. Are there other proposals with good properties? Are there any objections against the benchmarking method or the license? Can the implementation be in D? If not, should we link against

Re: Need a Faster Compressor

2016-05-22 Thread Marco Leise via Digitalmars-d
Am Sun, 22 May 2016 14:06:52 + schrieb Stefan Koch : > On Sunday, 22 May 2016 at 14:05:23 UTC, Marco Leise wrote: > > ⬅ Download proof of concept > > > > This is a proof of concept micro-benchmark compressing the > > 37_640 bytes symbol from the bug report linked above with both > > id_compr

Re: Need a Faster Compressor

2016-05-22 Thread Marco Leise via Digitalmars-d
⬅ Download proof of concept This is a proof of concept micro-benchmark compressing the 37_640 bytes symbol from the bug report linked above with both id_compress from the dmd backend and lz4 (without dictionary). The lz4 result is additionally transformed to 7-bits similar to base-64. Original si

Re: Need a Faster Compressor

2016-05-22 Thread Stefan Koch via Digitalmars-d
On Sunday, 22 May 2016 at 14:05:23 UTC, Marco Leise wrote: ⬅ Download proof of concept This is a proof of concept micro-benchmark compressing the 37_640 bytes symbol from the bug report linked above with both id_compress from the dmd backend and lz4 (without dictionary). The lz4 result is add

Re: Need a Faster Compressor

2016-05-22 Thread Stefan Koch via Digitalmars-d
On Sunday, 22 May 2016 at 12:12:09 UTC, Martin Nowak wrote: Yes, LZO, LZ4, or snappy would be good choices. They are real-time compression algorithms, i.e. it's faster to compress and write to main memory than w/o compression. Yes, the LZ compressors are very good for this use case. Please do

Re: Need a Faster Compressor

2016-05-22 Thread deadalnix via Digitalmars-d
On Sunday, 22 May 2016 at 01:36:39 UTC, Walter Bright wrote: Just a note: large lookup tables can have cold cache performance problems. 16k lookup table are the gold standard.

Re: Need a Faster Compressor

2016-05-22 Thread deadalnix via Digitalmars-d
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote: The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirty assembler if t

Re: Need a Faster Compressor

2016-05-22 Thread Andrei Alexandrescu via Digitalmars-d
On 5/22/16 6:40 AM, qznc wrote: On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote: The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as do

Re: Need a Faster Compressor

2016-05-22 Thread Martin Nowak via Digitalmars-d
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote: The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Yes, LZO, LZ4, or snappy would be good choices. They are real-time compression algorithms, i.e. it's faster to compress an

Re: Need a Faster Compressor

2016-05-22 Thread Marco Leise via Digitalmars-d
Am Sun, 22 May 2016 10:55:40 + schrieb Era Scarecrow : > On Sunday, 22 May 2016 at 10:40:46 UTC, qznc wrote: > > On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote: > > It links to SMAZ: > > > > https://github.com/antirez/smaz > > > > It uses a codebook primed for english text. Sinc

Re: Need a Faster Compressor

2016-05-22 Thread Marco Leise via Digitalmars-d
Am Sat, 21 May 2016 14:12:15 -0700 schrieb Walter Bright : > The current one is effective, but slow: > >https://github.com/dlang/dmd/blob/master/src/backend/compress.c > > Anyone want to make a stab at making it faster? Changing the format is fair > game, as well as down and dirty assembler

Re: Need a Faster Compressor

2016-05-22 Thread Era Scarecrow via Digitalmars-d
On Sunday, 22 May 2016 at 10:40:46 UTC, qznc wrote: On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote: It links to SMAZ: https://github.com/antirez/smaz It uses a codebook primed for english text. Since this is for name mangling, priming for Phobos might be a good idea. E.g. put "

Re: Need a Faster Compressor

2016-05-22 Thread qznc via Digitalmars-d
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote: The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirty assembler if t

Re: Need a Faster Compressor

2016-05-22 Thread Era Scarecrow via Digitalmars-d
On Sunday, 22 May 2016 at 10:12:11 UTC, Marco Leise wrote: Question Walter (or Andrei): Would it be terrible to have lower characters as part of the identifier string? I'm referring to characters under 32 (control characters, especially the \r, \n and \t). That would be my question, too. What

Re: Need a Faster Compressor

2016-05-22 Thread Marco Leise via Digitalmars-d
Am Sun, 22 May 2016 09:57:20 + schrieb Era Scarecrow : > On Sunday, 22 May 2016 at 09:48:32 UTC, Stefan Koch wrote: > > Nice results. > > This is with one sample to test against. I need a much bigger > sample, but I'm not sure how to generate/find it to give it a > full run for its mon

Re: Need a Faster Compressor

2016-05-22 Thread Era Scarecrow via Digitalmars-d
On Sunday, 22 May 2016 at 09:48:32 UTC, Stefan Koch wrote: Nice results. This is with one sample to test against. I need a much bigger sample, but I'm not sure how to generate/find it to give it a full run for its money. Not to mention there's a huge memory leak while doing the tests since

Re: Need a Faster Compressor

2016-05-22 Thread Stefan Koch via Digitalmars-d
On Sunday, 22 May 2016 at 09:43:29 UTC, Era Scarecrow wrote: And shrinking the lookback to a mere 127 increases speed further but compression starts taking a hit. Can't shrink it anymore anyways. modified id_compress2: TickDuration(230528074) 41% faster!! Nice results.

Re: Need a Faster Compressor

2016-05-22 Thread Era Scarecrow via Digitalmars-d
On Sunday, 22 May 2016 at 09:19:54 UTC, Era Scarecrow wrote: On 5/21/2016 11:26 PM, Era Scarecrow wrote: With 1 Million iterations: id_compress: TickDuration(385806589) (original/baseline) modified id_compress: TickDuration(269275629) 31% faster And shrinking the lookback to a me

Re: Need a Faster Compressor

2016-05-22 Thread Era Scarecrow via Digitalmars-d
On Sunday, 22 May 2016 at 09:07:07 UTC, Era Scarecrow wrote: On Sunday, 22 May 2016 at 08:50:47 UTC, Walter Bright wrote: On 5/21/2016 11:26 PM, Era Scarecrow wrote: With 1 Million iterations: new_compress: TickDuration(311404604) id_compress TickDuration(385806589) Approx 20% increase in

Re: Need a Faster Compressor

2016-05-22 Thread Era Scarecrow via Digitalmars-d
On Sunday, 22 May 2016 at 08:50:47 UTC, Walter Bright wrote: On 5/21/2016 11:26 PM, Era Scarecrow wrote: With 1 Million iterations: new_compress: TickDuration(311404604) id_compress TickDuration(385806589) Approx 20% increase in speed (if I'm reading and did this right). It is promising.

Re: Need a Faster Compressor

2016-05-22 Thread Walter Bright via Digitalmars-d
On 5/22/2016 1:50 AM, Walter Bright wrote: It is promising. Need more! I did figure out an improvement to the existing algorithm whereby it could skip forward by the length of the existing match rather than test every character (when it's looking for a longer match). But I have no clue how mu

Re: Need a Faster Compressor

2016-05-22 Thread Walter Bright via Digitalmars-d
On 5/21/2016 11:26 PM, Era Scarecrow wrote: On Sunday, 22 May 2016 at 04:35:15 UTC, Era Scarecrow wrote: Now I just need to stress test it against some really large input and check for failures. Regardless it has a compression and decompression functions, along with benchmarking it vs the compr

Re: Need a Faster Compressor

2016-05-21 Thread Era Scarecrow via Digitalmars-d
On Sunday, 22 May 2016 at 04:35:15 UTC, Era Scarecrow wrote: Now I just need to stress test it against some really large input and check for failures. Regardless it has a compression and decompression functions, along with benchmarking it vs the compression algorithm I'm trying to beat. With

Re: Need a Faster Compressor

2016-05-21 Thread Lionello Lunesu via Digitalmars-d
On 22/5/2016 05:51, Stefan Koch wrote: On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote: The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as wel

Re: Need a Faster Compressor

2016-05-21 Thread Era Scarecrow via Digitalmars-d
On Sunday, 22 May 2016 at 01:46:23 UTC, Era Scarecrow wrote: I know, if it doesn't fit in 64k then you can't rely on the cache... Finished. It's blazing fast and even compresses in my test example as good or better than gzip/zlib (although maybe that's because it's headerless). Now I jus

Re: Need a Faster Compressor

2016-05-21 Thread Era Scarecrow via Digitalmars-d
On Sunday, 22 May 2016 at 01:36:39 UTC, Walter Bright wrote: On 5/21/2016 5:52 PM, Era Scarecrow wrote: I'm going to try for a lookback and scan which uses quite a bit of memory. Just a note: large lookup tables can have cold cache performance problems. I know, if it doesn't fit in 64k th

Re: Need a Faster Compressor

2016-05-21 Thread Walter Bright via Digitalmars-d
On 5/21/2016 5:52 PM, Era Scarecrow wrote: I'm going to try for a lookback and scan which uses quite a bit of memory. On the other hand you can reuse the memory over and over again, so that isn't an issue. Although it will take up a Meg in memory. Also written entirely in D, so... I think I'l

Re: Need a Faster Compressor

2016-05-21 Thread Era Scarecrow via Digitalmars-d
On Saturday, 21 May 2016 at 22:01:40 UTC, Walter Bright wrote: Give it a try, see if it pans out. Other possibilities are the lookback dictionary is fixed at 1023 characters. Experimenting and benchmarking can indicate if it should be more or less. Perhaps 512 will yield just as good compress

Re: Need a Faster Compressor

2016-05-21 Thread Walter Bright via Digitalmars-d
On 5/21/2016 3:41 PM, Guillaume Boucher wrote: Sorry if I didn't memorize everything in this forum from the last 20 years, can you give a link to some reasoning? DMC++ matches the Microsoft name mangling scheme, which includes such compression. It proved hopelessly inadequate, which is why I i

Re: Need a Faster Compressor

2016-05-21 Thread Stefan Koch via Digitalmars-d
On Saturday, 21 May 2016 at 22:41:49 UTC, Guillaume Boucher wrote: It's almost as if using information about the actual data leads to a better compression scheme. Exactly. It always is, since you know where to look for redundancy instead of blindly comparing bytes.

Re: Need a Faster Compressor

2016-05-21 Thread Guillaume Boucher via Digitalmars-d
On Saturday, 21 May 2016 at 22:07:27 UTC, Walter Bright wrote: As mentioned elsewhere, the C++ mangling scheme has a primitive and ineffective LZ77 scheme built in, so I wouldn't waste time on that. Sorry if I didn't memorize everything in this forum from the last 20 years, can you give a lin

Re: Need a Faster Compressor

2016-05-21 Thread Walter Bright via Digitalmars-d
On 5/21/2016 3:13 PM, Stefan Koch wrote: Could you post a link to code you use for benchmarking ? In the comments: https://github.com/dlang/dmd/pull/5793

Re: Need a Faster Compressor

2016-05-21 Thread Stefan Koch via Digitalmars-d
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote: The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirty assembler if t

Re: Need a Faster Compressor

2016-05-21 Thread Walter Bright via Digitalmars-d
On 5/21/2016 2:37 PM, Timon Gehr wrote: Why is longest_match Ω(nm) instead of O(n+m) (e.g. KMP)? I don't understand the terms you use, but as to the "why" it is based on what I knew about LZ77 compression. I don't pretend to be an expert on compression, and this is what I churned out. As ment

Re: Need a Faster Compressor

2016-05-21 Thread Walter Bright via Digitalmars-d
On 5/21/2016 2:27 PM, Era Scarecrow wrote: I assume this is related to compressing symbols thread? Yes. I mentioned possibly considering the LZO library. Give it a try, see if it pans out. Other possibilities are the lookback dictionary is fixed at 1023 characters. Experimenting and benc

Re: Need a Faster Compressor

2016-05-21 Thread Stefan Koch via Digitalmars-d
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote: The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirty assembler if t

Re: Need a Faster Compressor

2016-05-21 Thread Timon Gehr via Digitalmars-d
On 21.05.2016 23:37, Timon Gehr wrote: On 21.05.2016 23:12, Walter Bright wrote: The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirt

Re: Need a Faster Compressor

2016-05-21 Thread Timon Gehr via Digitalmars-d
On 21.05.2016 23:12, Walter Bright wrote: The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirty assembler if that's what it takes. So

Re: Need a Faster Compressor

2016-05-21 Thread Era Scarecrow via Digitalmars-d
On Saturday, 21 May 2016 at 21:12:15 UTC, Walter Bright wrote: So really, how good are you at fast code? I recall being able to get a marginal boost in speed by assigning the first 1-2 bytes in preassigned buckets (256 & 65536 respectively), so you have immediate 1-2 byte matches (skippin

Need a Faster Compressor

2016-05-21 Thread Walter Bright via Digitalmars-d
The current one is effective, but slow: https://github.com/dlang/dmd/blob/master/src/backend/compress.c Anyone want to make a stab at making it faster? Changing the format is fair game, as well as down and dirty assembler if that's what it takes. So really, how good are you at fast code?