Re: Compression theory reference?
Matt Crawford wrote:
>> Plus a string of log(N) bits telling you how many times to apply the
>> decompression function! Uh-oh, now goes over the judge's head ...

Hadmut Danisch wrote:
> The problem is that if you ask for a string of log(N) bits, then someone
> else could take this as a proof that this actually works, because a
> string of log(N) bits is obviously shorter than the message of N bits,
> thus the compression scheme is working. Hooray!

That misses the point of the construction that was the subject of Matt's remark. The point was (and remains) that the compressed output (whether it be 1 bit, or 1+log(N) bits, or 1+log^*(N) bits) is ridiculous, because it is manifestly undecodable. It is far, far too good to be true. The only question is whether the construction is simple enough for the judge to understand; there is no question whether the construction is a valid _reductio ad absurdum_.

While we are on the subject, I recommend the clean and elegant argument submitted by Victor Duchovni (08/31/2004 03:50 PM) and, in more detail, by Matt Crawford (08/31/2004 06:04 PM). It uses mathematical induction rather than proof by contradiction. It is a less showy argument, but it is probably understandable by a wider audience. It proves a less spectacular point, but it is quite sufficient to show that the we-can-compress-anything claim is false. (Although with either approach, at least *some* mathematical sophistication is required. Neither PbC nor MI will give you any traction with ten-year-olds.)

So it appears we have many different ways of approaching things:
 1) The pigeon-hole argument. (This disproves the claim that all N-bit strings are compressible ... even if the claim is restricted to a particular fixed N.)
 2) The mathematical-induction argument. (Requires the claimed algorithm to be non-expansive for a range of N.)
 3) The proof by contradiction. (Requires the claimed algorithm to be compressive -- not just non-expansive -- for a range of N.)
 4) Readily-demonstrable failure of *any* particular claimed example, including Lempel-Ziv and all the rest.
 *) Others?

Harumph. That really ought to be enough. Indeed *one* disproof should have been enough.

Hadmut wrote:
> The problem is, that the number of iterations is not in the order of N,
> but in the order of 2^N, so it takes log2(around 2^N) = around N bits
> to store the number of iterations.

I don't see why the number of iterations should be exponential in the length (N) of the input string. A compression function is supposed to decrease N. It is not supposed to decrement the number represented by some N-bit numeral; after all, the string might not represent a number at all. Also, I repeat that there exist special cases (e.g. inputs of known fixed length) for which no extra bits need be represented, as I explained previously.

> The recursion converts a message of N bits recursively into a message of
> one or zero bits length (to your taste), *and* a number which takes
> around N bits to be stored. Nothing is won. But prove that.

I disagree, for the reasons given above. In the worst case, you need log^*(N) extra bits, not N bits. In special cases, you don't need any extra bits at all. The "win" is very substantial. The "win" is extreme.

> This recursion game is far more complicated than it appears to be.

Maybe. But there's no need to make it more complicated than it really is.

> Note also that storing a number takes in reality more than log(N) bits.
> Why? Because you don't know N in advance. We don't have any limit for
> the message length.

For general N, that's true.

> So your counting register theoretically needs infinitely many bits.

Maybe. For many practical purposes, the required number of bits is considerably less than infinity.

> When you're finished you know how many bits your number took. But
> storing your number needs an end symbol or a tristate bit (0, 1, void).
> That's a common mistake.

We agree that there are many common mistakes.
We agree that it is a mistake to have undelimited strings. But it is also a mistake to think that you need to reserve a special symbol to mark the end of the string. Yes, that is one option, but from a data-compression point of view it is an inefficient option.

Anybody who is interested in this stuff reeeally ought to read Chaitin's work. He makes a big fuss about the existence of self-delimiting strings and self-delimiting programs. There are many examples of such:
 -- The codewords of many commonly-used compression algorithms are self-delimiting. This is related to the property of being "instantaneously decodable".
 -- As Chaitin points out, you can set up a dialect of Lisp such that Lisp programs are self-delimiting.
 -- If you want to be able to represent M, where M is *any* N-bit number, you need more than log(M) bits (i.e. more than N bits). That's because you need to specify how many bits are used to represent M, which adds another log(log(M)) bits, and so on.
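The self-delimiting idea can be made concrete with a short sketch. Below is a minimal Elias-gamma coder in Python -- my own illustrative choice, not an example taken from the thread. The decoder knows where each codeword ends from the codeword itself, so concatenated codewords need no end symbol or tristate bit.

```python
# A minimal sketch of a self-delimiting code (Elias gamma). Encodes
# positive integers; the unary run of zeros announces how many bits of
# payload follow, so no delimiter is ever needed.

def gamma_encode(n: int) -> str:
    """Elias gamma: unary length prefix, then the binary digits of n."""
    assert n >= 1
    b = bin(n)[2:]                       # binary form, leading bit is '1'
    return "0" * (len(b) - 1) + b

def gamma_decode(bits: str) -> tuple[int, str]:
    """Decode one codeword off the front; return (value, remaining bits)."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    value = int(bits[zeros:2 * zeros + 1], 2)
    return value, bits[2 * zeros + 1:]

# Concatenated codewords decode unambiguously -- instantaneously decodable.
stream = gamma_encode(5) + gamma_encode(1) + gamma_encode(9)
out = []
while stream:
    v, stream = gamma_decode(stream)
    out.append(v)
# out is now [5, 1, 9]
```

Note the cost Chaitin describes: `gamma_encode(9)` spends three prefix bits on top of the four payload bits, which is exactly the extra log-sized overhead mentioned above.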
Re: Compression theory reference?
It's a sad situation when you have to get a non-technical judge to resolve academic conflicts like this, but it's your head that you're banging against the wall, not mine. If you want to appeal to authority, there's the FAQ, which of course requires explaining the Usenet FAQ traditions; perhaps you can find Lempel, Ziv, or Welch?

In reality, you could show an algorithm for which any input gets at most _one_ bit longer, rather than arbitrarily longer. And of course most compression algorithms work because real data almost always has structure which reduces the entropy. My information theory books from grad school have long since vanished into some closet, and were written just about the time LZ came out, so they mainly discuss Huffman coding in the discrete-message sections, but you should be able to find a modern book on the topic.

Matt Crawford's inductive argument is very strong: it gives you a constructive way to say "for any integer N, I can give a proof for that N", starting at 1 and working your way up, showing that if there's a lossless coding that doesn't make any message of length N any longer, then it doesn't make any of them shorter either, so it's not a compression method, just a permutation.

The "you could compress any message down to 1 bit" argument is a great throwaway line, but it isn't rigorously valid. (And if it were, you might as well compress down to zero bits while you're at it.) The problem is that for most lossless compression algorithms, some strings will get shorter (maybe even much shorter), but some will stay the same length, so even if you had a hypothetical "never gets longer" compression algorithm, you can't guarantee that your starting message would be one that got shorter as opposed to staying the same length, so you can't say that all messages would compress to zero.
If your judge doesn't like inductions that count up, or your academic opponents insist on examining methods that count down, Bear's argument gets you most of the way there by emphasizing the one-to-one mapping aspect, but you could easily get tangled. (To do this properly, you need to do n and 2**n, but I'll use 10 for concreteness.) There are 1024 10-bit messages, and only 512 9-bit messages, so something obviously happened to the >=512 that didn't compress to 9 bits. Maybe 512 of them didn't compress further and stayed as 10-bit; almost certainly some of them became 8 bits or shorter. At least one message didn't get shorter, because

    (2**10 - 1) = 2**9 + 2**8 + 2**7 + ... + 2**1 + 2**0

i.e. there are only 1023 strings shorter than 10 bits. So if you want to recurse downwards through repeated compression, you need to be sure your mapping keeps track of the ones that didn't compress the first time (maybe they'll compress the second time?), the ones that compressed by one bit, and the ones that compressed by more than one bit, and avoid wandering around in a maze of twisty little passages. So working your way up is probably cleaner.

At 11:30 AM 9/1/2004, bear wrote:
> Actually you don't need to take it all the way to the reductio ad
> absurdum here. There are 1024 10-bit messages. There are 512 9-bit
> messages. You need to point out that since a one-to-one mapping between
> sets of different cardinality cannot exist, it is not possible to create
> something that will compress every ten-bit message by one bit.
>
> Or, slightly more formally: assume that a function C can reversibly
> compress any ten-bit message by one bit. Since there are 1024 distinct
> ten-bit messages, there must be at least 1024 distinct nine-bit messages
> which, when the reversal is applied, result in these 1024 messages.
> There are exactly 512 distinct nine-bit messages. Therefore 512 >= 1024.
Bill Stewart [EMAIL PROTECTED]

- The Cryptography Mailing List Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]
Re: Compression theory reference?
On Tue, 31 Aug 2004, John Denker wrote:
> I emphasize that I am only discussing messages of length N, where N is
> some pre-chosen number. For concreteness, let's choose N=10.
> I repeat my assertion that _if_ XX can compress any string, shortening
> it by one bit, and _if_ you know that the original messages each have
> exactly 10 bits, _then_ any 10-bit message can be compressed down to a
> single bit.

Actually you don't need to take it all the way to the reductio ad absurdum here. There are 1024 10-bit messages. There are 512 9-bit messages. You need to point out that since a one-to-one mapping between sets of different cardinality cannot exist, it is not possible to create something that will compress every ten-bit message by one bit.

Or, slightly more formally: assume that a function C can reversibly compress any ten-bit message by one bit. Since there are 1024 distinct ten-bit messages, there must be at least 1024 distinct nine-bit messages which, when the reversal is applied, result in these 1024 messages. There are exactly 512 distinct nine-bit messages. Therefore 512 >= 1024, a contradiction.

Bear
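Bear's counting argument can be watched in action with a few lines of Python. The "compressor" below is plain truncation (drop the last bit) -- an assumption of mine standing in for *any* candidate function C, since the counting applies regardless of which C is chosen.

```python
# Push every 10-bit message through a hypothetical one-bit compressor C
# (here: truncation) and count collisions among the 9-bit outputs.

from collections import Counter

ten_bit = [format(i, "010b") for i in range(1024)]
images = Counter(m[:9] for m in ten_bit)    # C(m) = first 9 bits of m

# Only 512 nine-bit strings exist for 1024 inputs, so some output is hit
# at least twice -- C cannot be reversed, whatever C is.
assert len(images) <= 512
assert max(images.values()) >= 2
```

With truncation every 9-bit output is hit exactly twice; a cleverer C merely redistributes the collisions, it cannot remove them.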
Re: Compression theory reference?
On Wed, 1 Sep 2004, Hadmut Danisch wrote:
> I have often heard that example and I used it myself several times. I
> do not use it anymore, because it is not that easy. There's a major
> flaw in this example, and therefore it is not really a counterexample.
> If the faculty found that flaw I'd be in a bad position.
>
> You could define some reversible bizarre function f that does exactly
> that job, i.e. for a given message m you need to apply f again and
> again and after some finite number of calculations you'll find that
> f(...f(m)...) = x where x = 0 or 1

For example, loading the message into a Linear Feedback Shift Register and iterating until it produces one of two predetermined messages, 0 or 1? For a nontrivial message, the last star will burn out before you get that many iterations. Also, as you say, in order to decode the message you have to know how many iterations you made -- a number which will, on average, be the same length as the message. It hardly seems a worthwhile example.

> They say LZW and MTF. I have already given an example for LZW.
> They don't care. I've told them to take any random string taken
> from /dev/random under Linux. They don't care. The German principle is
> that a faculty is always right by definition.

That is inconsistent with the advancement of knowledge. Any university relying on such a principle has abandoned its duty.

Bear
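Bear's LFSR remark can be sketched in a few lines. The code below uses a 16-bit Galois LFSR with the commonly-cited maximal-length tap mask 0xB400 (an assumption of mine; Bear names no specific register), so every nonzero state lies on one cycle of period 2^16 - 1. Iterating until a fixed target state appears therefore takes on the order of 2^N steps, and the step count is itself an N-bit quantity -- exactly the hidden number Hadmut complains about.

```python
# Toy 16-bit Galois LFSR; 0xB400 is a standard maximal-length tap mask.

def lfsr_step(state: int) -> int:
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state

state, target, steps = 1, 0xACE1, 0
while state != target:          # iterate until the predetermined message
    state = lfsr_step(state)
    steps += 1

# The count fits in 16 bits only because the period is 2^16 - 1; on
# average it is about as long as the 16-bit message we started from.
assert 0 < steps < 2**16
```

For a "nontrivial" register width (say 256 bits) the same loop would indeed outlast the stars.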
RE: Compression theory reference?
On Tue, Aug 31, 2004 at 02:48:00PM +0200, Hadmut Danisch wrote:
> It can be easily shown that there is no lossless
> compression method which can effectively compress every possible
> input.

Even more simply: if such a method existed, you could recursively apply it to its output and compress every message down to one bit.
Re: Compression theory reference?
On Wed, Sep 01, 2004 at 04:02:02PM +1200, Peter Gutmann wrote:
> comp.compression FAQ, probably question #1 given the number of times
> this comes up in the newsgroup.
>
> (I've just checked, it's question #9 in part 1. Question #73 in part 2
> may also be useful).

Thanks, that's a pretty good hint, especially because it contains an explicit statement, and because it's an FAQ it is easy to show that the university's claim is not just wrong, but silly. :-)

regards Hadmut
Re: Compression theory reference?
Hadmut Danisch <[EMAIL PROTECTED]> writes:
>I need a literature reference for a simple problem of encoding/compression
>theory:

comp.compression FAQ, probably question #1 given the number of times this comes up in the newsgroup. (I've just checked, it's question #9 in part 1. Question #73 in part 2 may also be useful).

Peter.
Re: Compression theory reference?
I wrote:
> 4) Don't forget the _recursion_ argument. Take their favorite algorithm
> (call it XX). If their claims are correct, XX should be able to compress
> _anything_. That is, the output of XX should _always_ be at least one
> bit shorter than the input. Then the compound operation XX(XX(...))
> should produce something two bits shorter than the original input. If
> you start with an N-bit message and apply the XX function N-1 times,
> you should be able to compress each and every message down to a single
> bit.

Matt Crawford wrote:
> Plus a string of log(N) bits telling you how many times to apply the
> decompression function! Uh-oh, now goes over the judge's head ...

Actually you don't need to adjoin log(N) bits. But perhaps my assertion would benefit from some clarification.

I emphasize that I am only discussing messages of length N, where N is some pre-chosen number. For concreteness, let's choose N=10. I repeat my assertion that _if_ XX can compress any string, shortening it by one bit, and _if_ you know that the original messages each have exactly 10 bits, _then_ any 10-bit message can be compressed down to a single bit. I have proved that XX is ridiculous in this one case.

My function YY := XX^9 is less general than XX. XX works on any input, whereas YY by its definition applies only to 10-bit messages. The fact remains that we have a proof by contradiction. We assume, by way of hypothesis, that the bad guys are right, namely that XX exists and has the properties they assign to it. Then I can construct YY. But YY is ridiculous, through no fault of mine. Ergo the bad guys are wrong, i.e. no such XX can exist.

With a little more work I could construct a more powerful and/or more general version of YY ... but that would be doing more work than is required. Their XX stuck its neck out; it is not required for my YY to stick its neck out in the same way. The requirement, as I understood it, was to prove the bad guys wrong. Well, the bad guys have been proved wrong.
If something more is required, please explain the requirements in more detail. (BTW I suppose it would be better to call this the 'iterated composition' argument rather than the recursion argument.)

Hadmut wrote:
> I found some outside Germany. But they didn't give me a paper with
> signature, just e-mails. Will see whether the court will accept that.

Ask your legal advisor. In the US, getting such emails admitted as evidence would be problematic. Each jurisdiction (I assume) has its own standards for how affidavits should be prepared. Figure out the rules, and play by the rules.
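The YY := XX^9 construction never has to exhibit XX at all; counting alone does the work. A minimal sketch of that bookkeeping, in Python:

```python
# Track the population of messages as a hypothetical XX strips one bit
# per application. The 1024 ten-bit messages must squeeze into the set
# of available shorter strings, and the very first application fails.

fail_length = None
messages = 2 ** 10             # 1024 distinct 10-bit messages
length = 10
for _ in range(9):             # YY = XX applied nine times
    length -= 1
    if messages > 2 ** length: # pigeon-hole: reversibility is impossible
        fail_length = length
        break

assert fail_length == 9        # 1024 messages, only 512 nine-bit strings
```

So the contradiction appears already at the first of the nine steps; the remaining eight are never reached.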
Re: Compression theory reference?
On Tue, Aug 31, 2004 at 04:56:25PM -0400, John Denker wrote:
> 4) Don't forget the _recursion_ argument. Take their favorite
> algorithm (call it XX). If their claims are correct, XX should
> be able to compress _anything_. That is, the output of XX
> should _always_ be at least one bit shorter than the input.
> Then the compound operation XX(XX(...)) should produce something
> two bits shorter than the original input. If you start with a
> N-bit message and apply the XX function N-1 times, you should be
> able to compress each and every message down to a single bit.

This proof as written (the theorem is still true, of course) relies on the algorithm always compressing, rather than never expanding. It is much simpler as a result; is there an equally simple argument to prove that all non-expanding codes never compress?

Note that it is possible to turn any compressor into one whose expansion is at most one extra bit: if F(x) is shorter than x by at least one bit, output 0|F(x); if F(x) is the same length as x or longer, output 1|x. So we lose one bit of efficiency on compressed strings to gain at most one bit of overhead on uncompressed strings.

Victor Duchovni
IT Security, Morgan Stanley
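Victor's wrapper is easy to sketch. The version below uses zlib as the stand-in for F (my assumption -- any lossless compressor would do) and a one-*byte* flag rather than a one-bit flag, since Python deals in bytes; the principle is identical.

```python
# Wrap an arbitrary compressor F so that expansion is capped at one
# flag byte: 0x00 means "payload is F(x)", 0x01 means "payload is x".

import zlib

def wrap_compress(x: bytes) -> bytes:
    f = zlib.compress(x)
    if len(f) < len(x):
        return b"\x00" + f          # F actually shrank the input
    return b"\x01" + x              # otherwise pass x through verbatim

def wrap_decompress(y: bytes) -> bytes:
    flag, payload = y[:1], y[1:]
    return zlib.decompress(payload) if flag == b"\x00" else payload

redundant = b"aaaa" * 1000          # highly compressible
random_ish = bytes(range(256))      # zlib cannot shrink this

for msg in (redundant, random_ish):
    out = wrap_compress(msg)
    assert wrap_decompress(out) == msg
    assert len(out) <= len(msg) + 1     # expansion capped at the flag
```

The counting arguments in this thread show why the cap cannot be improved to zero: some inputs must take the pass-through branch.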
Re: Compression theory reference?
On Tue, Aug 31, 2004 at 05:07:30PM -0500, Matt Crawford wrote:
> Plus a string of log(N) bits telling you how many times to apply the
> decompression function! Uh-oh, now goes over the judge's head ...

Yeah, I just posted a lengthy description of why I think that this counterexample is not a counterexample. The problem is that if you ask for a string of log(N) bits, then someone else could take this as a proof that this actually works, because a string of log(N) bits is obviously shorter than the message of N bits, thus the compression scheme is working. Hooray!

The problem is that the number of iterations is not on the order of N, but on the order of 2^N, so it takes log2(around 2^N) = around N bits to store the number of iterations. The recursion converts a message of N bits recursively into a message of one or zero bits length (to your taste), *and* a number which takes around N bits to be stored. Nothing is won. But prove that. This recursion game is far more complicated than it appears to be.

Note also that storing a number in reality takes more than log(N) bits. Why? Because you don't know N in advance. We don't have any limit on the message length. So your counting register theoretically needs infinitely many bits. When you're finished you know how many bits your number took, but storing your number needs an end symbol or a tristate bit (0, 1, void). That's a common mistake. When determining the compression rate for a file, people often forget that some information is not stored in the file itself but in the file system, e.g. the file length (telling where the compressed data stops) and the file name (telling you that the file was compressed). That's basically the same problem.

thanks and regards Hadmut
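Hadmut's "hidden number" point can be made concrete with the simplest possible reversible f -- a toy of my own, not one proposed in the thread: read the N-bit message as a number and subtract 1 until you reach 0. Every step is invertible, the final "compressed" value is always 0, and all of the information has migrated into the iteration count.

```python
# The trivial "compressor" f(m) = m - 1, iterated to zero. Decoding
# requires replaying the count exactly, so the count IS the message.

N = 16
message = 0xBEEF              # an arbitrary N-bit message, read as a number

count, m = 0, message
while m != 0:
    m -= 1                    # f: trivially reversible (f' adds 1 back)
    count += 1

recovered = 0
for _ in range(count):        # f' applied 'count' times
    recovered += 1

assert recovered == message
assert count == message       # the iteration count needs ~N bits to store
```

For this f the count is not merely "around N bits": it is bit-for-bit the original message, which is the flaw in the unqualified one-bit argument.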
Re: Compression theory reference?
On Tue, Aug 31, 2004 at 04:56:25PM -0400, John Denker wrote:
> 4) Don't forget the _recursion_ argument. Take their favorite
> algorithm (call it XX). If their claims are correct, XX should
> be able to compress _anything_. That is, the output of XX
> should _always_ be at least one bit shorter than the input.
> Then the compound operation XX(XX(...)) should produce something
> two bits shorter than the original input. If you start with a
> N-bit message and apply the XX function N-1 times, you should be
> able to compress each and every message down to a single bit.

I have often heard that example and I have used it myself several times. I do not use it anymore, because it is not that easy. There's a major flaw in this example, and therefore it is not really a counterexample. If the faculty found that flaw, I'd be in a bad position.

You could define some reversible bizarre function f that does exactly that job, i.e. for a given message m you apply f again and again, and after some finite number of calculations you'll find that f(...f(m)...) = x, where x = 0 or 1. So this is possible. Since the function is reversible, let's call the inverse function f', and you'll find m = f'(...f'(x)...), where x is still 0 or 1.

Oops. What happened? Why does this work? Because the commonly used counterexample has a flaw. The reason is that you can invert f(...f(m)...) only if you count the number of times you applied f. You need to know that number in order to revert (= decompress) it, because you need to apply f' exactly that many times; you don't have any other stop condition. Applying f' is not a proper recursion, it's an iteration. So your information is actually stored in this number, not in the 0 or 1. The output of the compression function is not 0 or 1; it is that hidden number telling how often you need to apply f to reach 0 or 1.
So just giving as a contradiction that there cannot be such a function, because it could be applied recursively to yield a single bit, is insufficient; it is not a contradiction. You need to consider the recursion depth and the inversion. But then it gets so complicated that most people don't understand it anymore. And the argument that reaching a single bit recursively is a contradiction is gone. You need to store a number. So what? Who says that this number isn't shorter than the plain message? And suddenly you're back deep in theory. That's why I don't like that example. It's convincing at first glance, but I don't believe that it is correct.

> 1) Get a few "expert witnesses" to certify that your position is
> certainly not a personal fantasy. (I assume German jurisprudence
> has something akin to the US notion of expert witnesses, right?)

I did. Unfortunately I didn't find a German one, because it is very difficult to find a German professor willing to witness against any other. It's a tight community. I found some outside Germany. But they didn't give me a paper with a signature, just e-mails. Will see whether the court will accept that.

I've sent those e-mails to the dean of the faculty of computer science to convince him that the faculty is wrong. As a result, he configured the mail relay of the faculty to drop any e-mail containing my last name anywhere in the header. It's ridiculous, and I would laugh if it weren't exactly the faculty that's said to be the best German faculty of computer science.

> 2) Try to get the burden-of-proof reversed.

Very difficult. I have meanwhile become an expert in German examination law, and it usually requires the examinee to prove that the examiner's opinion is wrong. But since I have already proven several times that the university was lying intentionally to the court, they might take that into consideration. After all, I have brought this forward, and I have done my duty. Now it should be up to the university to respond.
They didn't comment for more than four years now.

> The opposition are claiming that known algorithms do the job. Get them
> to be specific about which algorithms they are referring to. Then file
> with the court some disks, each containing 1.44 megabytes of data.

They say LZW and MTF. I have already given an example for LZW. They don't care. I've told them to take any random string from /dev/random under Linux. They don't care. The German principle is that a faculty is always right by definition.

> 3) Diagram the pigeon-hole argument for the judge. See diagrams below.

I'll try that. Thanks.

regards Hadmut
Re: Compression theory reference?
On Aug 31, 2004, at 15:56, John Denker wrote:
> 4) Don't forget the _recursion_ argument. Take their favorite
> algorithm (call it XX). If their claims are correct, XX should be able
> to compress _anything_. That is, the output of XX should _always_ be
> at least one bit shorter than the input. Then the compound operation
> XX(XX(...)) should produce something two bits shorter than the original
> input. If you start with an N-bit message and apply the XX function
> N-1 times, you should be able to compress each and every message down
> to a single bit.

Plus a string of log(N) bits telling you how many times to apply the decompression function! Uh-oh, now goes over the judge's head ...
Re: Compression theory reference?
On Aug 31, 2004, at 14:50, Victor Duchovni wrote:
> This is a question in elementary finite set theory, not computer
> science (whatever that is :-). All proofs will involve some
> mathematics. The above is I think simpler than your original argument
> and is the simplest I can come up with...

I think Hadmut was looking for an authority that would be respected by the CS department he is dealing with. It's a sad state of affairs when they will accept authority over proof.

However, I can give what I think is a simpler proof, using only high school math. Assume that some invertible function f() turns no input message into a longer output message. We can prove that it also does not make any message *shorter*, and hence is not a "compression" function after all.

In particular, f() turns every one-bit message into a one-bit message. Suppose f() preserves the length of all n-bit messages, for 1 <= n <= N. (This is already the case for N=1.) What does f() do to a message M of length N+1 bits? By assumption, f(M) is not N+2 bits or longer. But all output messages of N bits or fewer are already the result of some input of N bits or fewer, and hence cannot be f(M). So, by elimination, f(M) is N+1 bits long. By mathematical induction, f() preserves the length of every message.
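The induction can be checked by brute force on a small alphabet. The sketch below (my own, for the lengths-1-and-2 case) exhaustively tries every candidate "compressor"; note that an injective map from a finite set to itself is automatically a permutation, so enumerating permutations is exhaustive.

```python
# Search for an invertible, never-expanding map on the bit-strings of
# length 1..2 that shortens at least one string. The search comes up
# empty, in line with the inductive proof.

from itertools import permutations

strings = ["0", "1", "00", "01", "10", "11"]

candidates = 0              # permutations that never lengthen any string
shortening_found = False
for perm in permutations(strings):
    f = dict(zip(strings, perm))
    if any(len(f[s]) > len(s) for s in strings):
        continue            # expands somewhere: excluded by hypothesis
    candidates += 1
    if any(len(f[s]) < len(s) for s in strings):
        shortening_found = True

assert candidates == 48     # 2! * 4!: each length class is permuted
assert not shortening_found # no candidate ever shortens anything
```

The count 48 = 2!*4! is itself the induction in miniature: the length-1 strings must land on length-1 strings, which forces the length-2 strings onto length-2 strings.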
Re: Compression theory reference?
From: Hadmut Danisch <[EMAIL PROTECTED]>
> It can be easily shown that there is no lossless compression method
> which can effectively compress every possible input.
> Therefore, I need a book about computer science or encoding theory,
> which explicitly says that this is impossible, in a way that a person
> unexperienced in computer science (judges at a court) can read and at
> least see that this is not just my personal fantasy and is at least
> somehow reasonable.

Section 20.4 of Press, Teukolsky, Vetterling & Flannery, Numerical Recipes in Fortran, 2nd Ed. (1992), Cambridge University Press, ISBN 0-521-43064-X, makes simple reading to this reader, whose minimal computer science education was incidental to a physics course. There are related "Numerical Recipes" books using other computer languages.

The first paragraph on compression is (with *italics*): "A lossless data compression algorithm takes a string of symbols (typically ASCII characters or bytes) and translates it *reversibly* into another string, one that is *on the average* of shorter length. The words 'on the average' are crucial; it is obvious that no reversible algorithm can make all strings shorter -- there just aren't enough short strings to be in one-to-one correspondence with longer strings. Compression algorithms are possible only when, on the input side, some strings, or some input symbols, are more common than others. These can then be encoded in fewer bits than rarer input strings or symbols, giving a net average gain."

There are 10.5 pages on compression and some further references I have not read.
Re: Compression theory reference?
Hadmut Danisch wrote:
> It can be easily shown that there is no lossless compression method
> which can effectively compress every possible input.

OK.

> ... I need a book about computer science or encoding theory, which
> explicitly says that this is impossible, in a way that a person
> unexperienced in computer science (judges at a court) can read and at
> least see that this is not just my personal fantasy and is at least
> somehow reasonable.

There are two conflicting requirements there:
 -- authoritative and correct, and
 -- understandable to a judge with no technical expertise.
As you know, it is easy to satisfy either requirement separately.

My suggestions:

1) Get a few "expert witnesses" to certify that your position is certainly not a personal fantasy. (I assume German jurisprudence has something akin to the US notion of expert witnesses, right?)

2) Try to get the burden of proof reversed. The opposition are claiming that known algorithms do the job. Get them to be specific about which algorithms they are referring to. Then file with the court some disks, each containing 1.44 megabytes of data. If the opposition are telling the truth, they should be able to compress these using one of their chosen algorithms. Offer to concede the entire case if they can shorten the text by more than, say, 0.1 percent (i.e. 1.44 kilobytes shorter). Of course you fill your disks using a good hardware random number generator, e.g. http://www.av8n.com/turbid/ Be sure to arrange suitable precautions so that the opposition doesn't get a chance to peek at your disks and build a special-purpose "rumplestiltskin" encoder/decoder, i.e. one that contains quotations of your data. One precaution would be to require that they use an open-source implementation, or at least an impartial implementation, of a published algorithm.

3) Diagram the pigeon-hole argument for the judge. See diagrams below.

4) Don't forget the _recursion_ argument. Take their favorite algorithm (call it XX).
If their claims are correct, XX should be able to compress _anything_. That is, the output of XX should _always_ be at least one bit shorter than the input. Then the compound operation XX(XX(...)) should produce something two bits shorter than the original input. If you start with an N-bit message and apply the XX function N-1 times, you should be able to compress each and every message down to a single bit.

Here's how to diagram the pigeon-hole argument. Write down all the three-bit numbers, and demonstrate a lossless non-compressive encoding:

    plaintext        codetext
      000    ------    000
      001    ------    001
      010    ------    010
      011    ---\ /--  011
                 X
      100    ---/ \--  100
      101    ------    101
      110    ------    110
      111    ------    111

Then give an example of a lossy compressive code. Make the point that the codetext words are shorter than the plaintext words. However, there are necessarily fewer of them, so the coding scheme is necessarily lossy and non-invertible.

    plaintext        codetext
      000    ------    00  "zero"
      001    ------    01  "one"
      010    ------    10  "two"
      011    ---\
      100    ----\
      101    -----==   11  "many"
      110    ----/
      111    ---/

Then give an example of a code that might or might not be compressive, depending on statistical assumptions. The following code is compressive if zero, one, and two are considerably more probable than the other three-bit numbers, but is anti-compressive if all eight three-bit numbers are equally likely, or nearly equally likely.

      000  --  00     "zero"
      001  --  01     "one"
      010  --  10     "two"
      011  --  11000
      100  --  11001
      101  --  11010
      110  --  11011
      111  --  11100

Average length, for equiprobable plaintext words: (2+2+2+5+5+5+5+5) / 8 = 3.875, which is an expansion, not a compression.

Judges like fairness. Certainly an equiprobable distribution of input words is fair: it reflects the bit-strings that would be generated by tossing fair coins (three at a time), or tossing a fair eight-sided die. This fairest of distributions is incompressible in the usual sense.

Finally, offer a challenge.
Write down all eight three-bit plaintext words and all four two-bit codetext words. Ask the judge to conclude for himself that it is obviously impossible to draw lines connecting corresponding words in a one-to-one fashion.

      000         00
      001         01
      010         10
      011         11
      100
      101
      110
      111

Note that in the last example, the codetext words were only one bit shorter than the plaintext words. If you want to pound on the point, present the same thing with four-bit plaintext words.
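The arithmetic behind the statistical code above is worth checking explicitly; the short Python sketch below does so (the skewed probabilities are an illustrative assumption of mine, not figures from the thread).

```python
# Expected codeword length for the "zero/one/two/long-codes" scheme.

code_lengths = {"000": 2, "001": 2, "010": 2,
                "011": 5, "100": 5, "101": 5, "110": 5, "111": 5}

# Fair (equiprobable) inputs: the code expands.
avg = sum(code_lengths.values()) / len(code_lengths)
assert avg == 3.875                 # longer than the 3-bit plaintext

# Skewed inputs favoring the short codewords: the code compresses.
probs = {"000": 0.3, "001": 0.3, "010": 0.3,
         "011": 0.02, "100": 0.02, "101": 0.02, "110": 0.02, "111": 0.02}
avg_skewed = sum(probs[w] * code_lengths[w] for w in probs)
assert avg_skewed < 3               # 0.9*2 + 0.1*5 = 2.3 bits on average
```

This is the whole story in two assertions: compression buys its gains from skewed statistics, never from the fair distribution.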
Re: Compression theory reference?
On Tue, Aug 31, 2004 at 02:48:00PM +0200, Hadmut Danisch wrote:
> It can be easily shown that there is no lossless
> compression method which can effectively compress every possible
> input. Proof is easy: In a first step, consider all
> possible messages of length n bit, n>0. There are 2^n different
> ones. But there are only (2^n)-1 shorter messages, so
> there is no injective encoding to encode all messages into
> a shorter one.

More simply: Choose any injection (e.g. lossless compression) from the set of finite-length messages to the set of finite-length messages. Suppose for *some* finite number N the injection has the property that no message of length *at most* N bits encodes in more than N bits (some might get longer while staying under N+1 bits, some shorter; we don't care). Since there is a finite number of messages that are N bits or less, and the injection is from a finite set to itself, the injection is a bijection (no missed outputs). So from the weaker assumption we conclude that all outputs of N bits or less correspond to inputs of N bits or less, and therefore that the injection we are looking at, for the given number N, never encodes a message of length N+1 or greater in N bits or less.

From the above we can conclude that any injection that for *all* N has the above property (weaker, for any *given* N, than always compressing) never encodes a message of length N+1 in N or fewer bits. The only candidate injections are then simple length-preserving permutations.

> Therefore, I need a book about computer science or encoding theory,
> which explicitly says that this is impossible, in a way that a person
> unexperienced in computer science (judges at a court) can read and
> at least see that this is not just my personal fantasy and is at least
> somehow reasonable.
>
> I have checked several books about information and coding theory,
> but didn't find any suitable for readers without computer science
> background.
> The statement is always hidden in formulas about entropy, average code
> lengths, code efficiency inequations etc. If you had such an encoding
> scheme, you could easily show that the average length is below the
> entropy or the efficiency is >100%. But a non-computer science person
> does not understand that.

This is a question in elementary finite set theory, not computer science (whatever that is :-). All proofs will involve some mathematics. The above is, I think, simpler than your original argument and is the simplest I can come up with...

Victor Duchovni
IT Security, Morgan Stanley
Compression theory reference?
Hi,

I need a literature reference for a simple problem of encoding/compression theory: It can be easily shown that there is no lossless compression method which can effectively compress every possible input. Proof is easy: In a first step, consider all possible messages of length n bits, n>0. There are 2^n different ones, but there are only (2^n)-1 shorter messages, so there is no injective encoding to encode all messages into a shorter one.

Therefore, I need a book about computer science or encoding theory which explicitly says that this is impossible, in a way that a person unexperienced in computer science (judges at a court) can read and at least see that this is not just my personal fantasy and is at least somehow reasonable. I have checked several books about information and coding theory, but didn't find any suitable for readers without a computer science background. The statement is always hidden in formulas about entropy, average code lengths, code efficiency inequations etc. If you had such an encoding scheme, you could easily show that the average length is below the entropy or the efficiency is >100%. But a non-computer science person does not understand that.

Does anybody know a book about coding theory which explicitly states the impossibility of such a compression method in plain language?

regards Hadmut