On Sun, 19 Oct 2008 16:38:37 +1300, Lawrence D'Oliveiro wrote: > In message <[EMAIL PROTECTED]>, Steven D'Aprano > wrote: > >> On Sat, 18 Oct 2008 09:16:11 +1300, Lawrence D'Oliveiro wrote: >> >>> Data can come in fractional bits. That's how compression works. >> >> If you don't believe me, try compressing a single bit and see if you >> get a "fractional bit". > > If both states of the bit are not equally likely, then you do indeed > have a fractional bit, since > > nrbits = (- logbase2(P[bit = 0]) - logbase2(P[bit = 1])) / 2
That's an arithmetic mean of the logarithms. It doesn't imply that there are fractional bits any more than an average family having 2.3 children implies that there are 0.3 of a child wandering around the streets. Using the Shannon measure of information, you can have messages which contain fractional information (technically, "surprisal"), when measured in bits. But that doesn't imply the existence of fractional bits. Look at it this way: consider a barter economy where I agree to swap 5 chickens for 2 axes. So each axe is equivalent to 2.5 chickens. But that doesn't imply that there is such a thing as 0.5 of a chicken -- at least not a *live* chicken. While I can blithely talk about bartering fractional chickens, in practice when I actually go to make good on my promise, it must be an integer number of chickens. Similarly, we can talk about messages containing fractional bits of information, but when we actually store or transmit that message in practice, we can only use integer numbers of bits. As Wikipedia puts it: It is important to differentiate between the use of "bit" in referring to a discrete storage unit and the use of "bit" in referring to a statistical unit of information. The bit, as a discrete storage unit, can by definition store only 0 or 1. A statistical bit is the amount of information that, on average[citation needed], can be stored in a discrete bit. ... If these two ideas need to be distinguished, sometimes the name bit is used when discussing data storage while shannon is used for the statistical bit. http://en.wikipedia.org/wiki/Bit -- Steven -- http://mail.python.org/mailman/listinfo/python-list
