Re: Compression theory reference?

2004-09-06 Thread John Denker
Matt Crawford wrote:
>>Plus a string of log(N) bits telling you how many times to apply the
>>decompression function!
>>Uh-oh, now goes over the judge's head ...
Hadmut Danisch wrote:
The problem is that if you ask for a string of log(N) bits, then 
someone else could take this as a proof that this actually works, 
because a string of log(N) bits is obviously shorter than the 
message of N bits, thus the compression scheme is working. Hooray!
That misses the point of the construction that was the subject of
Matt's remark.  The point was (and remains) that the compressed
output (whether it be 1 bit, or 1+log(N) bits, or 1+log^*(N) bits)
is ridiculous because it is manifestly undecodeable.  It is far, far
too good to be true.
The only question is whether the construction is simple enough
for the judge to understand.
There is no question whether the construction is a valid _reductio
ad absurdum_.
  While we are on the subject, I recommend the clean and elegant
  argument submitted by Victor Duchovni (08/31/2004 03:50 PM) and
  also in more detail by Matt Crawford (08/31/2004 06:04 PM).  It
  uses mathematical induction rather than proof-by-contradiction.
  It is a less showy argument, but probably it is understandable
  by a wider audience.  It proves a less-spectacular point, but it
  is quite sufficient to show that the we-can-compress-anything
  claim is false.  (Although with either approach, at least *some*
  mathematical sophistication is required.  Neither PbC nor MI will
  give you any traction with ten-year-olds.)
  So it appears we have many different ways of approaching things:
   1) The pigeon-hole argument.  (This disproves the claim that all
N-bit strings are compressible ... even if the claim is restricted
to a particular fixed N.)
   2) The mathematical induction argument.  (Requires the claimed
algorithm to be non-expansive for a range of N.)
   3) The proof-by-contradiction.  (Requires the claimed algorithm
to be compressive -- not just non-expansive -- for a range of N.)
   4) Readily-demonstrable failure of *any* particular claimed example,
including Lempel-Ziv and all the rest.
   *) Others?
  Harumph.  That really ought to be enough.  Indeed *one* disproof
  should have been enough.
The problem is, that the number of iterations is not in the order of 
N, but in the order of 2^N, so it takes log2(around 2^N) = around N bits to
store the number of iterations.
I don't see why the number of iterations should be exponential in
the length (N) of the input string.  A compression function is
supposed to decrease N.  It is not supposed to decrement the
number represented by some N-bit numeral  after all, the string
might not represent a number at all.
Also I repeat that there exist special cases (e.g. inputs of
known fixed length) for which no extra bits need be represented,
as I explained previously.
The recursion convertes a message of 
N bit recursively into a message of 1 or zero bit length (to your
taste), *and* a number which takes around N bit to be stored. 
Nothing is won. But proof that. 
I disagree, for the reasons given above.
In the worst case, you need log^*(N) extra bits, not N bits.  In
special cases, you don't need any extra bits at all.  The "win"
is very substantial.  The "win" is extreme.
This recursion game is far more complicated than it appears to be. 
Maybe.  But there's no need to make it more complicated than it
really is.
Note also that storing a number takes in reality more than log(N)
bits. Why? Because you don't know N in advance. We don't have any
limit for the message length. 
For general N, that's true.
> So you'r counting register needs
theoretically inifinte many bits. 
Maybe.  For many practical purposes, the required number of bits
is considerably less than infinity.
> When you're finished you know
how many bits your number took. But storing your number needs an 
end symbol or a tristate-bit (0,1,void). That's a common mistake. 
We agree that there are many common mistakes.  We agree that
it is a mistake to have undelimited strings.  But it is also a
mistake to think that you need to reserve a special symbol to
mark the end of the string.  Yes, that is one option, but from a
data-compression point of view it is an inefficient option.
Anybody who is interested in this stuff reeeally ought to read
Chaitin's work.  He makes a big fuss about the existence of
self-delimiting strings and self-delimiting programs.  There
are many examples of such:
 -- The codewords of many commonly-used compression algorithms
  are self-delimiting.  This is related to the property of being
  "instantaneously decodable".
 -- As Chaitin points out, you can set up a dialect of Lisp such
  that Lisp programs are self-delimiting.
 -- If you want to be able to represent M, where M is *any* N-bit
  number, you need more than log(M) bits (i.e. more than N bits).
  That's because you need to specify how many bits are used to
  represent log(M), which adds another log(lo

Re: Compression theory reference?

2004-09-06 Thread Bill Stewart
It's a sad situation when you have to get a non-technical
judge to resolve academic conflicts like this,
but it's your head that you're banging against the wall, not mine.
If you want to appeal to authority, there's the FAQ,
which of course requires explaining the Usenet FAQ traditions;
perhaps you can find Lempel, Ziv, or Welch?
In reality, you could show an algorithm for which any input
gets at most _one_ bit longer, rather than arbitrarily longer.
And of course most of the compression algorithms work because
real data almost always has structure which reduces the entropy.
My information theory books from grad school have
long vanished into some closet, and were written
just about the time LZ came out so they mainly discuss
Huffman coding in the discrete-message sections,
but you should be able to find a modern book on the topic.
Matt Crawford's inductive argument is very strong -
it gives you a constructive way to say that
"for any integer N, I can give a proof for that N",
starting at 1 and working your way up,
showing that if there's a lossless coding that doesn't make
any messages of length N any longer, then it doesn't make any any shorter,
so it's not a compression method, just a permutation.
The "You could compress any message down to 1 bit"
argument is a great throwaway line, but it isn't rigorously valid.
(And if it were, you might as well compress down to zero bits while you're 
at it.)
The problem is that for most lossless compression algorithms,
some strings will get shorter (maybe even much shorter),
but some will stay the same length,
so even if you had a hypothetical "never gets longer"
compression algorithm, you can't guarantee that your
starting message would be one that got shorter as opposed to staying the same,
so you can't say that all messages would compress to zero.

If your judge doesn't like inductions that count up,
or your academic opponents insist on examining methods that count down,
Bear's argument gets you most of the way there,
by emphasizing the 1-1 mapping aspect, but you could easily get tangled.
(To do this properly, you need to do n and 2**n, but I'll use 10 for 
concreteness.)

There are 1024 10-bit messages, and only 512 9-bit messages,
so something obviously happened to the >=512 that didn't compress to 9 bits.
Maybe 512 of them didn't compress further and stayed as 10-bit;
almost certainly some of them became 8 bits or shorter.
At least one message didn't get shorter, because
(2**10 - 1) = 2**9 + 2**8 + 2**7 ... + 2**1
So if you want to recurse downwards through repeated compression,
you need to be sure your mapping keeps track of the ones that
didn't compress the first time (maybe they'll compress the second time?),
the ones that compressed by one bit,
and the ones that compressed by more than one bit,
and avoid wandering around in a maze of twisty little passages.
So working your way up is probably cleaner.
At 11:30 AM 9/1/2004, bear wrote:
Actually you don't need to take it all the way to the
reductio ad absurdum here.  There are 1024 10-bit
messages.  There are 512 9-bit messages.  You need
to point out that since a one-to-one mapping between
sets of different ordinality cannot exist, it is not
possible to create something that will compress every
ten-bit message by one bit.
Or, slightly more formally, assume that a function C
can reversibly compress any ten-bit message by one bit.
Since there are 1024 distinct ten-bit messages, there
must be at least 1024 distinct nine-bit messages which,
when the reversal is applied, result in these 1024
messages.  There are exactly 512 distinct nine-bit
messages.  Therefore 512 >= 1024.
Bear
-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]

Bill Stewart  [EMAIL PROTECTED] 

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Compression theory reference?

2004-09-01 Thread bear


On Tue, 31 Aug 2004, John Denker wrote:

> I emphasize that I am only discussing messages of length N,
> where N is some pre-chosen number.  For concreteness, let's
> choose N=10.

> I repeat my assertion that _if_ XX can compress any string,
> shortening it by one bit, and _if_ you know that the original
> messages each have exactly 10 bits, _then_ any 10-bit message
> can be compressed down to a single bit.

Actually you don't need to take it all the way to the
reductio ad absurdum here.  There are 1024 10-bit
messages.  There are 512 9-bit messages.  You need
to point out that since a one-to-one mapping between
sets of different ordinality cannot exist, it is not
possible to create something that will compress every
ten-bit message by one bit.

Or, slightly more formally, assume that a function C
can reversibly compress any ten-bit message by one bit.
Since there are 1024 distinct ten-bit messages, there
must be at least 1024 distinct nine-bit messages which,
when the reversal is applied, result in these 1024
messages.  There are exactly 512 distinct nine-bit
messages.  Therefore 512 >= 1024.

Bear

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Compression theory reference?

2004-09-01 Thread bear


On Wed, 1 Sep 2004, Hadmut Danisch wrote:

>
>I have often heard that example and I used it myself several times.  I
>do not use it anymore, because it is not that easy. There's a major
>flaw in this example, and therefore it is not really a
>counterexample. If the faculty found that flaw I'd be in a bad
>position.
>
>You could define some reversible bizarre function f that does exactly
>that job, i.e. for a given message m you need to apply f again and
>again and after some finite number of calculations you'll find that
>f(...f(m)...) = x where x = 0 or 1

For example, loading the message into a Linear Feedback Shift
Register and iterating until it produces one of two predetermined
messages 0 or 1?

For a nontrivial message, the last star will burn out before you
get that many iterations.  Also, as you say, in order to decode
the message you have to know how many iterations you made - a
number which will, on the average, be the same length as the
message.

It hardly seems a worthwhile example.

>They say LZW and MTF. I have already given an example for LZW.
>They don't care. I've told them to take any random string taken
>from /dev/random under Linux. They don't care. The german principle is
>that a faculty is always right by definition.

That is inconsistent with the advancement of knowledge.  Any
university relying on such a principle has abandoned its duty.

Bear

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


RE: Compression theory reference?

2004-09-01 Thread Dean, James
On Tue, Aug 31, 2004 at 02:48:00PM +0200, Hadmut Danisch wrote:

> It can be easily shown that there is no lossless
> compression method which can effectively compress every possible
> input. 

Even more simply, if such a method existed, you could recursively 
apply it to its output and compress every message as one bit.

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Compression theory reference?

2004-09-01 Thread Hadmut Danisch
On Wed, Sep 01, 2004 at 04:02:02PM +1200, Peter Gutmann wrote:
> 
> comp.compression FAQ, probably question #1 given the number of times this
> comes up in the newsgroup.
> 
> (I've just checked, it's question #9 in part 1.  Question #73 in part 2 may
>  also be useful).


Thanks, that's a pretty good hint, especially because it contains 
an explicit statement, and it's an FAQ, making it easy to show, that
the university's claim is not just wrong, but silly. :-)

regards
Hadmut

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Compression theory reference?

2004-09-01 Thread Peter Gutmann
Hadmut Danisch <[EMAIL PROTECTED]> writes:

>I need a literature reference for a simple problem of encoding/compression
>theory:

comp.compression FAQ, probably question #1 given the number of times this
comes up in the newsgroup.

(I've just checked, it's question #9 in part 1.  Question #73 in part 2 may
 also be useful).

Peter.

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Compression theory reference?

2004-09-01 Thread John Denker
I wrote:
 4) Don't forget the _recursion_ argument.  Take their favorite
algorithm (call it XX).  If their claims are correct, XX should
be able to compress _anything_.   That is, the output of XX
should _always_ be at least one bit shorter than the input.
Then the compound operation XX(XX(...)) should produce something
two bits shorter than the original input.  If you start with a
N-bit message and apply the XX function N-1 times, you should be
able to compress each and every message down to a single bit.
Matt Crawford wrote:
Plus a string of log(N) bits telling you how many times to apply the 
decompression function!
Uh-oh, now goes over the judge's head ...
Actually you don't need to adjoin log(N) bits.  But perhaps my
assertion would benefit from some clarification.
I emphasize that I am only discussing messages of length N,
where N is some pre-chosen number.  For concreteness, let's
choose N=10.
I repeat my assertion that _if_ XX can compress any string,
shortening it by one bit, and _if_ you know that the original
messages each have exactly 10 bits, _then_ any 10-bit message
can be compressed down to a single bit.
I have proved that XX is ridiculous in this one case.
My function YY := XX^9 is less general than XX.  XX works
on any input, whereas YY by its definition only applies to
10-bit messages.
The fact remains that we have a proof by contradiction.  We
assume by way of hypothesis that the bad-guys are right, namely
that XX exists and has the properties they assign to it.  Then
I can construct YY.  But YY is ridiculous, through no fault of
mine.  Ergo the bad guys are wrong, i.e. no such XX can exist.
With a little more work I could construct a more powerful and/or
more general version of YY ... but that would be doing more work
than is required.  Their XX stuck its neck out;  it is not
required for my YY to stick its neck out in the same way.  The
requirement, as I understood it, was to prove the bad guys
wrong.  Well, the bad guys have been proved wrong.  If something
more is required, please explain the requirements in more detail.
  (BTW I suppose it would be better to call this the 'iterated
  composition' argument rather than the recursion argument.)
Hadmut wrote:
I found some outside Germany. But they didn't give me a paper with 
signature, just e-mails. Will see whether the court will accept that. 
Ask your legal advisor.
In the US, getting such emails admitted as evidence would be
problematic.  Each jurisdiction (I assume) has its own standards
for how affidavits should be prepared.  Figure out the rules,
and play by the rules.
-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Compression theory reference?

2004-09-01 Thread Victor Duchovni
On Tue, Aug 31, 2004 at 04:56:25PM -0400, John Denker wrote:

>  4) Don't forget the _recursion_ argument.  Take their favorite
> algorithm (call it XX).  If their claims are correct, XX should
> be able to compress _anything_.   That is, the output of XX
> should _always_ be at least one bit shorter than the input.
> Then the compound operation XX(XX(...)) should produce something
> two bits shorter than the original input.  If you start with a
> N-bit message and apply the XX function N-1 times, you should be
> able to compress each and every message down to a single bit.
> 

This proof as written (the theorem is still true of course) relies on
the algorithm always compressing, rather than never expanding. It is
much simpler as a result, is there an equally simple argument to prove
that all non-expanding codes never compress?

Note that it is possible to turn any compressor into one whose expansion
is at most one 1 extra bit:

If F(x) is shorter than x by at least one bit output 0|F(x) if F(x)
is the same length as x or longer output 1|x. So we can lose 1 bit of
efficiency in compressed strings to gain at most 1 bit of overhead in
uncompressed strings.

-- 

 /"\ ASCII RIBBON  NOTICE: If received in error,
 \ / CAMPAIGN Victor Duchovni  please destroy and notify
  X AGAINST   IT Security, sender. Sender does not waive
 / \ HTML MAILMorgan Stanley   confidentiality or privilege,
   and use is prohibited.

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Compression theory reference?

2004-09-01 Thread Hadmut Danisch
On Tue, Aug 31, 2004 at 05:07:30PM -0500, Matt Crawford wrote:
>
> Plus a string of log(N) bits telling you how many times to apply the 
> decompression function!
> Uh-oh, now goes over the judge's head ...


Yeah, I just posted a lengthy description why I think that this 
counterexample is not a counterexample. 

The problem is that if you ask for a string of log(N) bits, then 
someone else could take this as a proof that this actually works, 
because a string of log(N) bits is obviously shorter than the 
message of N bits, thus the compression scheme is working. Hooray!


The problem is, that the number of iterations is not in the order of 
N, but in the order of 2^N, so it takes log2(around 2^N) = around N bits to
store the number of iterations. The recursion convertes a message of 
N bit recursively into a message of 1 or zero bit length (to your
taste), *and* a number which takes around N bit to be stored. 
Nothing is won. But proof that. 

This recursion game is far more complicated than it appears to be. 


Note also that storing a number takes in reality more than log(N)
bits. Why? Because you don't know N in advance. We don't have any
limit for the message length. So you'r counting register needs
theoretically inifinte many bits. When you're finished you know 
how many bits your number took. But storing your number needs an 
end symbol or a tristate-bit (0,1,void). That's a common mistake. 

When determining the compression rate for a file people often 
forget, that some information is not stored in the file itself, but in
the file system, e.g. the file length (telling where the compressed
data stops) and the file name (telling you, that the file was
compressed). That's basically the same problem.

thanks and regards
Hadmut




-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Compression theory reference?

2004-09-01 Thread Hadmut Danisch

On Tue, Aug 31, 2004 at 04:56:25PM -0400, John Denker wrote:

>  4) Don't forget the _recursion_ argument.  Take their favorite
> algorithm (call it XX).  If their claims are correct, XX should
> be able to compress _anything_.   That is, the output of XX
> should _always_ be at least one bit shorter than the input.
> Then the compound operation XX(XX(...)) should produce something
> two bits shorter than the original input.  If you start with a
> N-bit message and apply the XX function N-1 times, you should be
> able to compress each and every message down to a single bit.


I have often heard that example and I used it myself several times.  I
do not use it anymore, because it is not that easy. There's a major
flaw in this example, and therefore it is not really a
counterexample. If the faculty found that flaw I'd be in a bad
position.

You could define some reversible bizarre function f that does exactly
that job, i.e. for a given message m you need to apply f again and
again and after some finite number of calculations you'll find that
f(...f(m)...) = x where x = 0 or 1

So this is possible. Since the function is reversible, let's call
the inverse function f', and you'll find 
m = f'( ... f'(x)...)  where x is still 0 or 1

Ooops. What happened? Why does this work? Because the commonly used
counterexample has a flaw. 

The reason is that you can invert f(...f(m)...) only if you count the 
number of times you applied f. You need to know the number of times in 
order to revert = decompress it, because you need to apply f' exactly that
many times. You don't have any other stop condition. Applying f' is not a
proper recursion, it's an iteration.

So your information is actually stored in this number, not in 0 or 1.
The output of the compression function is not 0 or 1, it is that
hidden number telling how often you need to apply f to reach 0 or 1.

So just give it as a contradiction that there can not be such a
function because it could be recursively applied to result in 
a single bit is insufficient, it is not a contradiction. 

You need to consider the recursion depth and the inversion. But then
it get's so complicated that most people don't understand it anymore.
And the argument, that reaching a single bit recursively is a
contradiction, is gone. You need to store a number. So what? Who says
that this number isn't shorter that the plain message? And suddenly
your back deep in theory.


That's why I don't like that example. It's convincing at a first
glance, but I don't believe that it is correct.






>
>  1) Get a few "expert witnesses" to certify that your position is
> certainly not a personal fantasy.  (I assume German jurisprudence
> has something akin to the US notion of expert witnesses, right?)


I did. Unfortunately I didn't find a german one, because
it is very difficult to find a german professor witnessing against
any other. It's a tight community. 

I found some outside Germany. But they didn't give me a paper with 
signature, just e-mails. Will see whether the court will accept that. 

I've sent those e-mails to the dean of the faculty of computer science
to convince him that the faculty is wrong. As a result, he configured
the mail relay of the faculty to drop any e-mail containing my 
last name anywhere in the header. 


It's ridiculous and I would laugh if it wouldn't be exactly the
faculty that's said to be the best german faculty of computer science.  





>  2) Try to get the burden-of-proof reversed. 

Very difficult. I meanwhile became an expert in german examination law
and it usually requires the examinee to proof that the examiners
opinion is wrong. But since I already have proven several times that
the university was lying intentionally to the court, they might take
that into consideration. After all, I have brought this forward, and 
I have done my duty. Now it should be up to the university to
respond. They didn't comment for more than four years now.


> The opposition
> are claiming that known algorithms do the job.  Get them to be
> specific about which algorithms they are referring to.  Then
> file with the court some disks, each containing 1.44 megabytes
> of data.

They say LZW and MTF. I have already given an example for LZW. 
They don't care. I've told them to take any random string taken 
from /dev/random under Linux. They don't care. The german principle is
that a faculty is always right by definition. 



>  3) Diagram the pigeon-hole argument for the judge.  See
> diagrams below.

I'll try that. Thanks.



regards
Hadmut




-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Compression theory reference?

2004-09-01 Thread Matt Crawford
On Aug 31, 2004, at 15:56, John Denker wrote:
 4) Don't forget the _recursion_ argument.  Take their favorite
algorithm (call it XX).  If their claims are correct, XX should
be able to compress _anything_.   That is, the output of XX
should _always_ be at least one bit shorter than the input.
Then the compound operation XX(XX(...)) should produce something
two bits shorter than the original input.  If you start with a
N-bit message and apply the XX function N-1 times, you should be
able to compress each and every message down to a single bit.
Plus a string of log(N) bits telling you how many times to apply the 
decompression function!
Uh-oh, now goes over the judge's head ...

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Compression theory reference?

2004-09-01 Thread Matt Crawford
On Aug 31, 2004, at 14:50, Victor Duchovni wrote:
This is a question in elementary finite set theory, not computer 
science
(whatever that is :-). All proofs will involve some mathematics. The
above is I think simpler than your original argument and is the 
simplest
I can come up with...
I think Hadmut was looking for an authority that would be respected by 
the CS department he is dealing with.  It's a sad state of affairs when 
they will accept authority over proof.  However, I can give what I 
think is a simpler proof, using only high school math.

Assume that some invertible function f() turns no input message into a 
longer output message.

We can prove that it also does not make any message *shorter*, and 
hence is not a "compression" function after all.

In particular, f() turns every one-bit message into a one-bit message.
Suppose f() preserves the length of all n-bit messages, for 1 <= n <= 
N.  (This is already the case for N=1.) What does f() do to a message M 
of N+1 bits length?  By assumption, f(M) is not N+2 bits or longer.   
But all output messages of N bits or less are the result of some input 
of N bits or less and hence cannot be f(M).  So by elimination, f(M) is 
N+1 bits long.

By mathematical induction, f() preserves the length of every message.
-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Compression theory reference?

2004-09-01 Thread lists

From: Hadmut Danisch <[EMAIL PROTECTED]>

> It can be easily shown that there is no lossless 
> compression method which can effectively compress every possible
> input.

> Therefore, I need a book about computer science or encoding theory,
> which explicitely says that this is impossible, in a way that a person
> unexperienced in computer science (judges at a court) can read and 
> at least see that this is not just my personal phantasy and at least
> somehow reasonable. 

Section 20.4 of
Press, Teukolsky, Vetterling & Flannery
Numerical Recipies in Fortran, 2nd Ed (1992)
Cambridge University Press
ISBN 0-521-43064-X
makes simple reading to this reader with minimal computer science
education incidental to a Physics course.  There are related
"Numerical Recipies" books using other computer languages.

The first paragraph on compression is (with *italics*):

  "A lossless data compression algorithm takes a string of symbols
   (typically ASCII characters or bytes) and translates it *reversibly*
   into another string, one that is *on the average* of shorter length.
   The words "on the average" are crucial; it is obvious that no reversible
   algorithm can make all strings shorter - there just aren't enough short
   strings to be in one-to-one correspondence with longer strings.
   Compression algorithms are possible only when, on the input side, some
   strings, or some input symbols, are more common than others.  These can
   then be encoded in fewer bits than rarer input strings or symbols,
   giving a net average gain."

There is 10.5 pages on compression and some further references I have
not read.

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Compression theory reference?

2004-08-31 Thread John Denker
Hadmut Danisch wrote:
It can be easily shown that there is no lossless 
compression method which can effectively compress every possible
input.
OK.
... I need a book about computer science or encoding theory,
which explicitely says that this is impossible, in a way that a person
unexperienced in computer science (judges at a court) can read and 
at least see that this is not just my personal phantasy and at least
somehow reasonable. 
There are two conficting requirements there.
 -- authoritative and correct, and
 -- understandable to a judge with no technical expertise.
As you know, it is easy to satisfy either requirement separately.
My suggestions:
 1) Get a few "expert witnesses" to certify that your position is
certainly not a personal fantasy.  (I assume German jurisprudence
has something akin to the US notion of expert witnesses, right?)
 2) Try to get the burden-of-proof reversed.  The opposition
are claiming that known algorithms do the job.  Get them to be
specific about which algorithms they are referring to.  Then
file with the court some disks, each containing 1.44 megabytes
of data.  If the opposition are telling the truth, they should
be able to compress these using one of their chosen algorithms.
Offer to concede the entire case if they can shorten the text
by more than, say, 0.1 percent (i.e. 1.44 kilobytes shorter).
Of course you fill your disks using a good hardware random
number generator, e.g.
  http://www.av8n.com/turbid/
Be sure to arrange suitable precautions so that the opposition
doesn't get a chance to peek at your disks and build a
special-purpose "rumplestiltskin" encoder/decoder, i.e. one
that contains quotations of your data.  One precaution would
be to require that they use an open-source implementation, or
at least an impartial implementation, of a published algorithm.
 3) Diagram the pigeon-hole argument for the judge.  See
diagrams below.
 4) Don't forget the _recursion_ argument.  Take their favorite
algorithm (call it XX).  If their claims are correct, XX should
be able to compress _anything_.   That is, the output of XX
should _always_ be at least one bit shorter than the input.
Then the compound operation XX(XX(...)) should produce something
two bits shorter than the original input.  If you start with a
N-bit message and apply the XX function N-1 times, you should be
able to compress each and every message down to a single bit.

Here's how to diagram the pigeon-hole argument:
Write down all the three-bit numbers, and demonstrate a lossless
non-compressive encoding:
plaintext codetext
  000 -- 000
  001 -- 001
  010 -- 010
  011 ---\  /=== 011
  \/
  /\
  100 ===/  \--- 100
  101 -- 101
  110 -- 110
  111 -- 111
===
Then give an example of a lossy compressive code.  Make
the point that the codetext words are shorter than the
plaintext words.  However, there are necessarily fewer
of them, so the coding scheme is necessarily lossy and
non-invertible.
plaintext codetext
  000 -- 00  "zero"
  001 -- 01  "one"
  010 -- 10  "two"
  011 ---=== 11  "many"
/
   |
  100 /|
   |
   |
  101 /|
   |
   |
  110 /|
   |
   |
  111 /
=
Then give an example of a code that might or might not
be compressive, depending on statistical assumptions.
The following code is compressive if zero, one, and two
are considerably more probable than the other three-bit
numbers, but does is anti-compressive if all eight
three-bit numbers are equally likely, or nearly equally
likely.

  000 -- 00  "zero"
  001 -- 01  "one"
  010 -- 10  "two"
  011 -- 11000
  100 -- 11001
  101 -- 11010
  110 -- 11011
  111 -- 11100
Average length, for equiprobable plaintext words:
   (2+2+2+5+5+5+5+5) / 8 = 3.875
which is an expansion, not a compression.
Judges like fairness.  Certainly an equiprobable distribution
of input words is fair.  It reflects the bit-strings that
would be generated by tossing fair coins (three at a time),
or tossing a fair eight-sided die.  This fairest of distributions
is incompressible in the usual sense.

Finally, offer a challenge.  Write down all eight three-bit
plaintext words, and all four two-bit codetext words.  Ask
the judge to conclude for himself that it is obviously
impossible to draw lines connecting corresponding words in
a one-to-one fashion.
  000   00
  001   01
  010   10
  011   11
  100
  101
  110
  111

=
Note that in the last example, the codetext words were only one bit
shorter than the plaintext words.  If you want to pound on the point,
present the samething with four-bit plaint

Re: Compression theory reference?

2004-08-31 Thread Victor Duchovni
On Tue, Aug 31, 2004 at 02:48:00PM +0200, Hadmut Danisch wrote:

> It can be easily shown that there is no lossless 
> compression method which can effectively compress every possible
> input. Proof is easy: In a first step, consider all 
> possible messages of length n bit, n>0. There are 2^n different
> ones. But there are only (2^n)-1 shorter messages, so 
> there is no injective encoding to encode all messages into
> a shorter one.

More simply:

Choose any injection (e.g. lossless compression) from the set of finite
length messages to the set of finite length messages. Suppose for
*some* finite number N the injection has the property that no message
of length *at most* N bits encodes in more than N bits (some might get
longer while staying under N+1 bits, some shorter we don't care). Since
there is a finite number of messages that are N bits or less, and the
injection is from a finite set to itself, the injection is a bijection
(no missed outputs). So from the weaker assumption we conclude that all
N bit or less outputs correspond to N bit or less inputs, and therefore
that the injection we are looking at for the given number N never encodes
a message of length N+1 or greater in N bits or less.

>From the above we can conclude that any injection, that for *all*
N has the above (weaker for any *given* N than always compressing)
property, never encodes a message of length N+1 in N or less bits. The
only candidate injections are then simple length-preserving permutations.

> Therefore, I need a book about computer science or encoding theory,
> which explicitely says that this is impossible, in a way that a person
> unexperienced in computer science (judges at a court) can read and 
> at least see that this is not just my personal phantasy and at least
> somehow reasonable. 
> 
> I have checked several books about information and coding theory, 
> but didn't find any suitable for readers without computer science 
> background. The statement is always hidden in formulas about
> entropy, average code lengths, code efficiency inequations etc.
> If you had such an encoding scheme, you could easily show that 
> the average length is below the entropy of the efficiency is >100%.  
> But a non-computer science person does not understand that. 
> 

This is a question in elementary finite set theory, not computer science
(whatever that is :-). All proofs will involve some mathematics. The
above is I think simpler than your original argument and is the simplest
I can come up with...

-- 

 /"\ ASCII RIBBON  NOTICE: If received in error,
 \ / CAMPAIGN Victor Duchovni  please destroy and notify
  X AGAINST   IT Security, sender. Sender does not waive
 / \ HTML MAILMorgan Stanley   confidentiality or privilege,
   and use is prohibited.

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Compression theory reference?

2004-08-31 Thread Hadmut Danisch

Hi,

I need a literature reference for a simple problem of 
encoding/compression theory:

It can be easily shown that there is no lossless 
compression method which can effectively compress every possible
input. Proof is easy: In a first step, consider all 
possible messages of length n bit, n>0. There are 2^n different
ones. But there are only (2^n)-1 shorter messages, so 
there is no injektive encoding to encode all messages into
a shorter one. And then all codewords of length 100%.  
But a non-computer science person does not understand that. 

Does anybody know a book about coding theory which explicitely states
the impossibility of such a compression method in plain language?

regards
Hadmut







-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]