Suppose I claim that text8.zip, available at http://cs.fit.edu/~mmahoney/compression/textdata.html, is in canonical form. The procedure and a program for generating it are described at the bottom of that page. The output consists of only the lowercase letters a-z and spaces. If you claim that this
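As a rough illustration of the alphabet constraint described above, here is a minimal Python sketch that checks whether a string uses only the lowercase letters a-z and spaces. The function name is made up for this example. Passing the check is only a necessary condition: it does not show that the text was produced by the procedure on that page.

```python
import string

# Alphabet stated in the message: lowercase a-z plus the space character.
ALLOWED = frozenset(string.ascii_lowercase + " ")


def uses_text8_alphabet(text: str) -> bool:
    """Return True if every character is a lowercase letter a-z or a space.

    Hypothetical helper for illustration; it checks the alphabet only,
    not whether the text came out of the generating program.
    """
    return all(ch in ALLOWED for ch in text)
```

For example, uses_text8_alphabet("the quick brown fox") is true, while any string containing capitals, digits, or punctuation fails the check.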
>> I
think that either putting Wikipedia in canonical form, or recognizing that it is
in canonical form, are two equally difficult problems. So the problem does
not go away easily.
Um. I think you missed my point. The
compression program should be able to take the Wikipedia in its
I think that either putting Wikipedia in canonical form, or recognizing that it is in canonical form, are two equally difficult problems. So the problem does not go away easily. -- Matt Mahoney, [EMAIL PROTECTED] ----- Original Message ----- From: Mark Waser <[EMAIL PROTECTED]> To: agi@v2.listbox.com S
>> Mark suggested putting Wikipedia in a
canonical form, which would remove the distinction between lossless and lossy
compression.
Hmmm. Interesting . . . . Actually, I
didn't suggest exactly that -- though I can see how
you got that impression. I suggested that the decompression progr
First let me respond to Boris and Mark. I agree. Mark suggested putting
Wikipedia in a canonical form, which would remove the distinction between
lossless and lossy compression. This will be hard, but Boris made an important
observation that useful data is generally compressible and useless d
Let me state one more time why a lossless model has more knowledge. If x
and x' have the same meaning to a lossy compressor (they compress to
identical codes), then the lossy model only knows p(x)+p(x'). A lossless
model also knows p(x) and p(x'). You can argue that if x and x' are not
disti
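The point about p(x) and p(x') can be illustrated with a small Python sketch. The probabilities are made-up numbers, and the code lengths are the ideal -log2(p) values rather than the output of any real compressor. It shows that a model which only knows the sum p(x)+p(x') assigns one code length to the whole class, while a model that knows each probability spends extra bits to say which member occurred.

```python
import math


def code_length_bits(p: float) -> float:
    """Ideal code length in bits for an event of probability p."""
    return -math.log2(p)


# Hypothetical probabilities for two strings with the same meaning.
p_x, p_x_prime = 0.3, 0.1

# A lossy model that maps x and x' to one code only tracks their sum.
p_class = p_x + p_x_prime

lossy_bits = code_length_bits(p_class)   # cost of coding "x or x'"
lossless_bits_x = code_length_bits(p_x)  # cost of coding x exactly

# Extra bits the lossless model spends to pick x out of the class;
# algebraically this equals -log2(p_x / p_class).
extra_bits_x = lossless_bits_x - lossy_bits
```

The extra bits are exactly the information the lossy model throws away: it cannot tell x from x' because it never learned the ratio between their probabilities.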
Sorry, doesn't anyone get it? We don't have to choose
between lossy & lossless. If compression is produced by pattern recognition
we can quantify lossless compression by individual patterns & *lose*
insufficiently compressed ones.
That would be objectively measurable lossy
compression.
To