Re: Re[2]: [Haskell-cafe] Toy compression algorithms [was: A very edgy language]

2007-07-10 Thread ajb
G'day all.

Andrew Coppin wrote:

  Actually, LZW works surprisingly well for such a trivial little
  algorithm... When you compare it to the complexity of an arithmetic
  coder driven by a high-order adaptive PPM model (theoretically the best
  general-purpose algorithm that exists), it's amazing that anything so
  simple as LZW should work at all!

Not really.  LZW is basically PPM with a static (and flat!) frequency
prediction model.  The contexts are build in a very similar way.

Quoting Bulat Ziganshin [EMAIL PROTECTED]:

 the devil in details. just imagine size of huffman table with 64k
 entries :)

If you're allowed to pick your code values, you can do this extremely
compactly.  Some compression schemes optimised for English separates
words from the intermediate space and assigns a Huffman code to each
word.  The tables aren't that big (though the space to hold the words
might be).

Cheers,
Andrew Bromage
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Toy compression algorithms [was: A very edgy language]

2007-07-09 Thread Bulat Ziganshin
Hello Andrew,

Sunday, July 8, 2007, 7:07:59 PM, you wrote:

 Actually, LZW works surprisingly well for such a trivial little
 algorithm... When you compare it to the complexity of an arithmetic 
 coder driven by a high-order adaptive PPM model (theoretically the best
 general-purpose algorithm that exists), it's amazing that anything so 
 simple as LZW should work at all!

i don't think that ppm is so complex - it's just probability of
symbol in some context. it's just too slow in naive implementation

 there is no much meaning to use huffman with lzw because many words
 will have only one or two occurrences

 (The downside of course is that now we need a Huffman table in the 
 output - and any rare symbols might end up with rather long codewords.
 But remember: Huffman *guarantees* 0% compression or higher on the 
 payload itself. A Huffman-compressed payload can *never* be bigger, only
 smaller or same size. So long as you can encode the Huffman table 
 efficiently, you should be fine...)

the devil in details. just imagine size of huffman table with 64k
entries :)  huffman encoding is inappropriate for lzw output simply
because most words will have only a few occurrences and economy on
their optimal encoding doesn't justify price of their entries in the table

 .ru = Russia?

of course

 (Realistically though. My program takes a [Word8] and turns it into a
 [Bool] before running a parser over it. The GHC optimiser doesn't really
 stand a hope in hell of optimising that into a program that reads a 
 machine word into a CPU register and starts playing with bit flips on it...)

 as time goes, those terrible compilers becomes smarter and smarter :)
   

 Oh hey, I think GHC is already pretty smart. But no optimiser can ever
 hope to cover *every* possible case. And transforming [Bool] - [Bool]
 into UArray Word8 - UArray Word8 just seems a little bit
 optimistic, methinks. ;-)

15 years ago i've written very smart asm program (btw, it was ARJ
unpacker) with handmade function inlining, loop unrolling, register
allocation, cpu recognition and so on. now, most of these tricks are
standard for C compilers. times changes and now it's hard to imagine which
optimizations will be available 10 years later

 The way I heard it is that it *does* native code generation, but it's
 not as good as the via-C version. (Can't imagine why... You would have
 thought directly implementing functional primitives would be much easier
 than trying to mangle them to fit C's multitude of limitations. Still,
 what do I know?)

ghc's native and via-C modes are blind vs lame. in native mode, its
codegenerator is comparable with 20 years-old C codegenerators. see
above how much modern C compilers changed in these years. in via-C
mode it generates unnatural C code which is hard to optimize for any C
compiler. the jhc is very different story - it generates natural C
code, so for simple, low-level programs its speed is the same as
with C. but it lacks huge base of high-level GHC optimizations. as you
may guess, in order to have C-like speed, compiler should implement
both good high-level and low-level optimization. there are
(low-priority) plans to provide LLVM backend for ghc which may solve
this problem. but actually speed of generated code isn't number 1
priority for ghc users



-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Toy compression algorithms [was: A very edgy language]

2007-07-08 Thread Bulat Ziganshin
Hello Donald,

Sunday, July 8, 2007, 12:50:36 PM, you wrote:



too much quoting :(

 Good work. Probably worth benchmarking against the other compression
 libraries

are you really want to totally discredit Haskell? :)  they should be
hundreds of times slower than any practical compression algorithm
(and btw, zlib/bzlib isn't good performers anyway, my own algorithm is
several times faster with the same compression ratio)

Haskell isn't a good tool to develop compression algorithms because
it's the very well studied area where it has meaning to use all the
sorts of optimizations. so it falls in the low-level algorithms
category where using Haskell means at least 3x slower development and
3x worse performance - or faster development with 100x worse
performance. Andrew's code should fall into later category


-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Toy compression algorithms [was: A very edgy language]

2007-07-08 Thread Bulat Ziganshin
Hello Stefan,

Sunday, July 8, 2007, 7:22:03 PM, you wrote:

 This is very true - but -fvia-C isn't really C.

 If you want to see what a difference exists between true C and the NCG,
 try compiling your program with -fvia-C -unreg

even better, try to to use jhc. it compiles via gcc and unlike ghc, it
generates natural code. as result, it's only haskell compiler that is
able to generate C-quality code:

http://thread.gmane.org/gmane.comp.lang.haskell.glasgow.user/8827


-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Toy compression algorithms [was: A very edgy language]

2007-07-08 Thread Bulat Ziganshin
Hello Andrew,

Sunday, July 8, 2007, 7:12:38 PM, you wrote:
 (Realistically though. My program takes a [Word8] and turns it into a
 [Bool] before running a parser over it. The GHC optimiser doesn't really 
 stand a hope in hell of optimising that into a program that reads a machine 
 word into a CPU register and starts playing with bit flips on it...)
 

 Actually, if you're very lucky (fusion is just as hard in Haskell as it
 is in real life), it *does*.  It seems to fit nicely into the
 stream-fusion framework.
   

 Ooo... really? That's pretty impressive...(!)

it's our collective tale for bringing new haskellers. i bet that
Stefan never seen asm code generated by ghc :)


-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Toy compression algorithms [was: A very edgy language]

2007-07-08 Thread Bulat Ziganshin
Hello Stefan,

Sunday, July 8, 2007, 11:03:00 PM, you wrote:

  (Realistically though. My program takes a [Word8] and turns it into a
  [Bool] before running a parser over it. The GHC optimiser doesn't really 
  stand a hope in hell of optimising that into a program that reads a 
  machine 
  word into a CPU register and starts playing with bit flips on it...)

well, can you give us an example of code which does that Andrew said
and translates into the simple bit fiddling without battling all
around lazy lists which kills performance?

-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe