Re: [agi] Lossy *&* lossless compression

2006-08-26 Thread Matt Mahoney
Suppose I claim that text8.zip, available at http://cs.fit.edu/~mmahoney/compression/textdata.html, is in canonical form.  The procedure and a program for generating it are described at the bottom of that page.  The output consists of only the lowercase letters a-z and spaces.  If you claim that this
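The one property stated above (output limited to the lowercase letters a-z and spaces) can be checked mechanically. The following is a minimal sketch of such a check, not the program described on that page; the function name `uses_canonical_alphabet` is hypothetical, and passing this test says nothing about whether a text is canonical in any deeper sense.

```python
import string

# Alphabet stated in the message: lowercase a-z plus the space character.
ALLOWED = set(string.ascii_lowercase + " ")


def uses_canonical_alphabet(text: str) -> bool:
    """Return True if every character of text is a-z or a space.

    Hypothetical helper: it only verifies the character set, which is
    a necessary condition for the claimed form, not a sufficient one.
    """
    return all(ch in ALLOWED for ch in text)


if __name__ == "__main__":
    print(uses_canonical_alphabet("anarchism originated as a term"))
    print(uses_canonical_alphabet("Anarchism, 1st ed."))
```

The check is deliberately one-directional: it can reject a file that contains other characters, but it cannot confirm that two differently worded texts with the same meaning have been mapped to a single form.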

Re: [agi] Lossy *&* lossless compression

2006-08-26 Thread Mark Waser
>> I think that either putting Wikipedia in canonical form, or recognizing that it is in canonical form, are two equally difficult problems.  So the problem does not go away easily.  Um.  I think you missed my point.  The compression program should be able to take the Wikipedia in its

Re: [agi] Lossy *&* lossless compression

2006-08-26 Thread Matt Mahoney
I think that either putting Wikipedia in canonical form, or recognizing that it is in canonical form, are two equally difficult problems.  So the problem does not go away easily. -- Matt Mahoney, [EMAIL PROTECTED] - Original Message From: Mark Waser <[EMAIL PROTECTED]> To: agi@v2.listbox.com S

Re: [agi] Lossy *&* lossless compression

2006-08-26 Thread Mark Waser
>> Mark suggested putting Wikipedia in a canonical form, which would remove the distinction between lossless and lossy compression.   Hmmm.  Interesting . . . .  Actually, I didn't suggest exactly that -- though I can see how you got that impression.  I suggested that the decompression progr

Re: [agi] Lossy *&* lossless compression

2006-08-26 Thread Matt Mahoney
First let me respond to Boris and Mark. I agree. Mark suggested putting Wikipedia in a canonical form, which would remove the distinction between lossless and lossy compression. This will be hard, but Boris made an important observation that useful data is generally compressible and useless d

Re: [agi] Lossy *&* lossless compression

2006-08-26 Thread Mark Waser
Let me state one more time why a lossless model has more knowledge. If x and x' have the same meaning to a lossy compressor (they compress to identical codes), then the lossy model only knows p(x)+p(x'). A lossless model also knows p(x) and p(x'). You can argue that if x and x' are not disti
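The p(x)+p(x') argument above can be illustrated with a small sketch. The probabilities, the example strings, and the names `lossy_distribution` and `CANONICAL` are all assumptions made up for illustration; nothing here comes from an actual compressor. The point shown is only that the merged (lossy) distribution can be computed from the lossless one, while the individual values p(x) and p(x') cannot be recovered from the merged sum.

```python
from collections import defaultdict

# Hypothetical lossless model: a probability for every distinct string.
LOSSLESS = {"color": 0.25, "colour": 0.25, "gray": 0.5}

# Hypothetical equivalence used by the lossy compressor: strings that
# map to the same key "have the same meaning" and share one code.
CANONICAL = {"color": "color", "colour": "color", "gray": "gray"}


def lossy_distribution(p, canonical):
    """Merge the probabilities of strings the lossy model cannot tell apart."""
    merged = defaultdict(float)
    for x, px in p.items():
        merged[canonical[x]] += px
    return dict(merged)


if __name__ == "__main__":
    q = lossy_distribution(LOSSLESS, CANONICAL)
    # q holds p(x) + p(x') for the merged pair; the split between
    # "color" and "colour" is no longer available from q alone.
    print(q)
```

Any other split of the merged mass, say 0.4 and 0.1, would give the same lossy distribution, which is one way to read the claim that the lossless model carries strictly more information.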

Re: [agi] Lossy *&* lossless compression

2006-08-26 Thread Boris Kazachenko
Sorry, doesn't anyone get it? We don't have to choose between lossy & lossless. If compression is produced by pattern recognition we can quantify lossless compression by individual patterns & *lose* insufficiently compressed ones. That would be objectively measurable lossy compression. To