Re: Re[2]: [Haskell-cafe] Toy compression algorithms [was: A very edgy language]
G'day all.

Andrew Coppin wrote:

> Actually, LZW works surprisingly well for such a trivial little
> algorithm... When you compare it to the complexity of an arithmetic
> coder driven by a high-order adaptive PPM model (theoretically the
> best general-purpose algorithm that exists), it's amazing that
> anything so simple as LZW should work at all!

Not really. LZW is basically PPM with a static (and flat!) frequency
prediction model. The contexts are built in a very similar way.

Quoting Bulat Ziganshin [EMAIL PROTECTED]:

> the devil is in the details. just imagine the size of a huffman table
> with 64k entries :)

If you're allowed to pick your code values, you can do this extremely
compactly. Some compression schemes optimised for English separate the
words from the spaces between them and assign a Huffman code to each
word. The tables aren't that big (though the space needed to hold the
words might be).

Cheers,
Andrew Bromage
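[To make "anything so simple as LZW" concrete, here is a minimal LZW
encoder in Haskell. This is an illustrative sketch added for this
digest, not code posted by anyone in the thread; the dictionary starts
with the 256 single-byte strings and grows by one entry per emitted
code, which assumes the input stays within the Latin-1 range.]

    import qualified Data.Map as M
    import Data.Char (chr)

    type Dict = M.Map String Int

    -- the initial dictionary: every single byte maps to its own code
    initDict :: Dict
    initDict = M.fromList [([chr c], c) | c <- [0 .. 255]]

    -- emit a code each time the current phrase can no longer be
    -- extended, and add the extended phrase to the dictionary
    lzwEncode :: String -> [Int]
    lzwEncode = go initDict ""
      where
        go dict w []     = [dict M.! w | not (null w)]
        go dict w (c:cs)
          | M.member wc dict = go dict wc cs
          | otherwise        = (dict M.! w) : go dict' [c] cs
          where
            wc    = w ++ [c]
            dict' = M.insert wc (M.size dict) dict

[For example, lzwEncode "abababab" yields [97,98,256,258,98]: the
repeated "ab" pairs collapse into references to dictionary entries
built up during encoding.]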
Re[2]: [Haskell-cafe] Toy compression algorithms [was: A very edgy language]
Hello Andrew,

Sunday, July 8, 2007, 7:07:59 PM, you wrote:

> Actually, LZW works surprisingly well for such a trivial little
> algorithm... When you compare it to the complexity of an arithmetic
> coder driven by a high-order adaptive PPM model (theoretically the
> best general-purpose algorithm that exists), it's amazing that
> anything so simple as LZW should work at all!

I don't think that PPM is so complex - it's just the probability of a
symbol appearing in some context. It's merely too slow in a naive
implementation.

There is not much point in combining Huffman with LZW, because many
words will have only one or two occurrences.

> (The downside of course is that now we need a Huffman table in the
> output - and any rare symbols might end up with rather long
> codewords. But remember: Huffman *guarantees* 0% compression or
> higher on the payload itself. A Huffman-compressed payload can
> *never* be bigger, only smaller or the same size. So long as you can
> encode the Huffman table efficiently, you should be fine...)

The devil is in the details. Just imagine the size of a Huffman table
with 64k entries :) Huffman encoding is inappropriate for LZW output
simply because most words will have only a few occurrences, and the
saving from encoding them optimally doesn't justify the price of their
entries in the table.

> .ru = Russia?

Of course.

> (Realistically though. My program takes a [Word8] and turns it into
> a [Bool] before running a parser over it. The GHC optimiser doesn't
> really stand a hope in hell of optimising that into a program that
> reads a machine word into a CPU register and starts playing with bit
> flips on it...)

As time goes by, those terrible compilers become smarter and smarter :)

> Oh hey, I think GHC is already pretty smart. But no optimiser can
> ever hope to cover *every* possible case. And transforming
> [Bool] -> [Bool] into UArray Word8 -> UArray Word8 just seems a
> little bit optimistic, methinks. ;-)

15 years ago I wrote a very smart asm program (btw, it was an ARJ
unpacker) with hand-made function inlining, loop unrolling, register
allocation, CPU recognition and so on. Now most of these tricks are
standard for C compilers. Times change, and it's hard to imagine which
optimizations will be available 10 years from now.

> The way I heard it is that it *does* native code generation, but
> it's not as good as the via-C version. (Can't imagine why... You
> would have thought directly implementing functional primitives would
> be much easier than trying to mangle them to fit C's multitude of
> limitations. Still, what do I know?)

GHC's native and via-C modes are the blind versus the lame. In native
mode, its code generator is comparable to 20-year-old C code
generators - see above for how much C compilers have changed in those
years. In via-C mode it generates unnatural C code which is hard for
any C compiler to optimize. jhc is a very different story: it
generates natural C code, so for simple, low-level programs its speed
is the same as C's. But it lacks GHC's huge base of high-level
optimizations. As you may guess, to reach C-like speed a compiler
needs both good high-level and good low-level optimization. There are
(low-priority) plans to provide an LLVM backend for GHC, which may
solve this problem - but the speed of generated code isn't the number
one priority for GHC users anyway.

-- 
Best regards,
 Bulat                            mailto:[EMAIL PROTECTED]
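[To illustrate the claim that PPM is "just the probability of a symbol
in some context", here is a small sketch added for this digest, not
code from the thread: an order-2 model that counts, for each
two-character context, how often each character followed it. A real
PPM coder additionally handles escape symbols, falls back to shorter
contexts, and feeds these probabilities to an arithmetic coder.]

    import qualified Data.Map as M

    -- for each pair of preceding characters, count how often each
    -- character followed them
    type Model = M.Map String (M.Map Char Int)

    buildModel :: String -> Model
    buildModel s = M.fromListWith (M.unionWith (+))
        [ ([a, b], M.singleton c 1)
        | (a, b, c) <- zip3 s (drop 1 s) (drop 2 s) ]

    -- probability of c following ctx; fall back to a flat distribution
    -- for unseen contexts (real PPM escapes to shorter contexts)
    prob :: Model -> String -> Char -> Double
    prob model ctx c =
        case M.lookup ctx model of
          Nothing     -> 1 / 256
          Just counts -> fromIntegral (M.findWithDefault 0 c counts)
                       / fromIntegral (sum (M.elems counts))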
Re[2]: [Haskell-cafe] Toy compression algorithms [was: A very edgy language]
Hello Donald,

Sunday, July 8, 2007, 12:50:36 PM, you wrote:

too much quoting :(

> Good work. Probably worth benchmarking against the other compression
> libraries

Do you really want to totally discredit Haskell? :) They should be
hundreds of times slower than any practical compression algorithm
(and btw, zlib/bzlib aren't good performers anyway - my own algorithm
is several times faster with the same compression ratio).

Haskell isn't a good tool for developing compression algorithms,
because compression is a very well studied area where it makes sense
to apply every sort of optimization. So it falls into the low-level
algorithms category, where using Haskell means at least 3x slower
development and 3x worse performance - or faster development with
100x worse performance. Andrew's code should fall into the latter
category.

-- 
Best regards,
 Bulat                            mailto:[EMAIL PROTECTED]
Re[2]: [Haskell-cafe] Toy compression algorithms [was: A very edgy language]
Hello Stefan,

Sunday, July 8, 2007, 7:22:03 PM, you wrote:

> This is very true - but -fvia-C isn't really C. If you want to see
> what a difference exists between true C and the NCG, try compiling
> your program with -fvia-C -unreg

Even better, try jhc. It compiles via gcc and, unlike GHC, it
generates natural C code. As a result, it's the only Haskell compiler
that is able to generate C-quality code:
http://thread.gmane.org/gmane.comp.lang.haskell.glasgow.user/8827

-- 
Best regards,
 Bulat                            mailto:[EMAIL PROTECTED]
Re[2]: [Haskell-cafe] Toy compression algorithms [was: A very edgy language]
Hello Andrew,

Sunday, July 8, 2007, 7:12:38 PM, you wrote:

>>> (Realistically though. My program takes a [Word8] and turns it
>>> into a [Bool] before running a parser over it. The GHC optimiser
>>> doesn't really stand a hope in hell of optimising that into a
>>> program that reads a machine word into a CPU register and starts
>>> playing with bit flips on it...)

>> Actually, if you're very lucky (fusion is just as hard in Haskell
>> as it is in real life), it *does*. It seems to fit nicely into the
>> stream-fusion framework.

> Ooo... really? That's pretty impressive...(!)

It's our collective fairy tale for luring in new Haskellers. I bet
Stefan has never seen the asm code generated by GHC :)

-- 
Best regards,
 Bulat                            mailto:[EMAIL PROTECTED]
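[For reference, the definition under discussion might look like the
sketch below - added for this digest, not code anyone posted. Written
as a concatMap, the byte-to-bits expansion has exactly the shape the
stream-fusion framework is designed to fuse into a single loop, at
least in principle; whether the intermediate [Bool] really disappears
also depends on the consumer fusing.]

    import Data.Bits (testBit)
    import Data.Word (Word8)

    -- expand each byte into its eight bits, most significant first
    toBits :: [Word8] -> [Bool]
    toBits = concatMap (\w -> [testBit w i | i <- [7, 6 .. 0]])

[For example, toBits [0x80] is
[True,False,False,False,False,False,False,False].]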
Re[2]: [Haskell-cafe] Toy compression algorithms [was: A very edgy language]
Hello Stefan,

Sunday, July 8, 2007, 11:03:00 PM, you wrote:

>> (Realistically though. My program takes a [Word8] and turns it into
>> a [Bool] before running a parser over it. The GHC optimiser doesn't
>> really stand a hope in hell of optimising that into a program that
>> reads a machine word into a CPU register and starts playing with
>> bit flips on it...)

Well, can you give us an example of code which does what Andrew
described and translates into simple bit fiddling, without fighting
its way through the lazy lists that kill performance?

-- 
Best regards,
 Bulat                            mailto:[EMAIL PROTECTED]
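[No such example appears in this part of the thread, but for
concreteness, here is one shape the "simple bit fiddling" version
might take - a sketch assuming the input arrives as a strict
ByteString rather than a [Word8]. The strict left fold keeps each byte
in a register and never materialises a [Bool].]

    import Data.Bits (testBit)
    import Data.Word (Word8)
    import qualified Data.ByteString as B

    -- count the set bits of one byte with a tight, strict loop
    bitsInByte :: Word8 -> Int
    bitsInByte w = go 0 0
      where
        go i acc
          | i == 8      = acc
          | testBit w i = go (i + 1) (acc + 1)
          | otherwise   = go (i + 1) acc

    -- fold over the raw bytes; no intermediate [Bool] is ever built
    countOnes :: B.ByteString -> Int
    countOnes = B.foldl' (\acc w -> acc + bitsInByte w) 0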