Re: [music-dsp] about entropy encoding

2015-07-18 Thread Peter S
I had an idea on how to improve the entropy estimate, especially for
some types of signals. Here are the results:

http://morpheus.spectralhead.com/entropy/estimate2/

-P
--
dupswapdrop -- the music-dsp mailing list and website:
subscription info, FAQ, source code archive, list archive, book reviews, dsp 
links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp


Re: [music-dsp] about entropy encoding

2015-07-18 Thread Peter S
On 17/07/2015, Ethan Duni <ethan.d...@gmail.com> wrote:

> Said another way: if that's the only signal I want to transmit, then I'll
> just build those parameters into the receiver and not bother transmitting
> anything. The entropy will be zero. The receiver will simply output the
> desired waveform without any help from the other end.

I argue that's not a transmitter, but rather a signal generator.
In that case, you do not transmit anything; you just have a signal
generator that constantly outputs a square wave with constant
parameters.

Once you build an _actual_ transmitter, one that actually transmits
something, the entropy of a message is certain to be nonzero, since
you need to transmit at least a single bit to transmit anything. You
cannot transmit nothing ... that's not a transmitter.

[Sidenote: assuming your signal generator is a digital one, it will
contain bits that represent 1) its code and 2) its parameters, both of
which have nonzero entropy. The "just build those parameters into the
receiver" step is exactly where that entropy comes from.

Consider algorithmic entropy[1] (also called Kolmogorov complexity) -
any program that has an output of nonzero length is bound to have
nonzero size, hence nonzero entropy. Therefore, your constant signal
generator will have nonzero (algorithmic) entropy. If you want to
transmit your constant signal generator program in a message, then the
length of the message is bound to be nonzero.]
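
(A rough illustration, mine and not anything from the thread: Kolmogorov
complexity itself is uncomputable, but a general-purpose compressor gives
a crude upper bound on the size of a program's description. A minimal
Python sketch, assuming we take a hypothetical generator's source text as
its description:)

import zlib

# Source text of a hypothetical constant-parameter square wave generator.
# Its compressed length is a (very loose) upper bound on its algorithmic
# entropy; either way, both lengths are clearly nonzero.
generator_source = b"""
def square(n, amplitude=100, phase=12, period=48, duty=0.75):
    # constant-parameter square wave, 'period' samples per cycle
    return [amplitude if ((i + phase) % period) < duty * period else -amplitude
            for i in range(n)]
"""

print(len(generator_source))                    # nonzero
print(len(zlib.compress(generator_source, 9)))  # still nonzero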

> It's only if there's
> some larger class of signals containing this one that I want to handle,
> that the question of entropy comes up.

"Larger class of signals" precisely means at least two signals,
since 2 > 1. Once you have at least two signals that you possibly want
to transmit (which is virtually all cases when you have an actual
transmitter and not a constant signal generator), then each message
will always have nonzero entropy. Hence, when you have an actual
transmitter, the entropy of a message is bound to be nonzero in all
cases.

[And if you only have a single signal with 100% probability, then your
constant signal generator will have nonzero algorithmic entropy, i.e.
the length of the smallest program that produces that output is
nonzero. You cannot produce something from nothing - by hard-coding
the parameters into your program, you just hid that entropy inside the
program's algorithm or data. If you want to transmit the program or
its parameters, the length of the message will be nonzero.

Quote from [1]: "In Shannon's theory, entropy is a property of a
statistical ensemble [...]. Kolmogorov and Chaitin showed
independently that entropy is in fact an *intrinsic* property of an
object, which can be computed (in theory at least) without reference
to anything outside that object." The field that deals with this is
called algorithmic information theory, developed around 1965, and
these crackpots believe that entropy (information content) is not
necessarily a function of a probability distribution. This theory is
half a century old; I'm not saying anything new.]

-P

Reference:
[1] http://www.maths.tcd.ie/pub/Maths/Courseware/AlgorithmicEntropy/Entropy.pdf


Re: [music-dsp] about entropy encoding

2015-07-18 Thread Peter S
On 18/07/2015, Peter S <peter.schoffhau...@gmail.com> wrote:

> [Sidenote: assuming your signal generator is a digital one, it will
> contain bits that represent 1) its code and 2) its parameters, both of
> which have nonzero entropy. The "just build those parameters into the
> receiver" step is exactly where that entropy comes from.

Example: if the 100%-probability square wave that you want to generate
is characterized by the parameter set amplitude=100, phase=30,
frequency=1000 and duty cycle=75%, represented by the binary
sequence "1100100 11110 1111101000 1001011", then "just build
those parameters into the receiver" means that you put those bits (or
some other representation of them) into your program.
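
(A quick Python sketch of that mapping, mine rather than anything from
the thread; the fields are just the minimal binary representations of
the stated decimal parameters:)

params = {"amplitude": 100, "phase": 30, "frequency": 1000, "duty_cycle": 75}
# 100 -> 1100100, 30 -> 11110, 1000 -> 1111101000, 75 -> 1001011
bits = " ".join(format(v, "b") for v in params.values())
print(bits)                                         # 1100100 11110 1111101000 1001011
print(len(bits.replace(" ", "")), "parameter bits")  # 29 parameter bits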

Without your program containing these numbers (in whatever form or
encoding), how will it know that it needs to output a square wave with
an amplitude of 100, phase of 30, frequency of 1000 and duty cycle of
75%? If any of those parameters is missing, it won't know, so it
won't output that exact square wave. Therefore, your program *must*
contain these numbers.

So all you did is put that entropy into your program, and then
pretend that there's no entropy.

It's still there, except it is now in your program, not in a message.
You merely turned it into algorithmic entropy - you cannot make
something out of nothing.

-P


Re: [music-dsp] about entropy encoding

2015-07-18 Thread Peter S
Arbitrary waveform entropy estimate #3
http://morpheus.spectralhead.com/entropy/estimate3/

Results:

- For white noise, gives ~1.
- For simple periodic waveforms, gives < ~0.1.
- For a mixture of two simple nonharmonic periodic waveforms, gives < 0.5.
- For periodic noise of increasing length, gives an increasing estimate.

At this point I'm convinced that it is possible to create entropy
estimators that work well for arbitrary simple periodic waveforms. For
nonharmonic mixtures of periodic waveforms it does not work as well,
but it still distinguishes them from noise.
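
(For comparison only, and emphatically not the estimator behind the
numbers above, whose details are only at the link: a naive order-0
estimate on quantized samples is about the simplest baseline one could
write in Python:)

import math
from collections import Counter

def order0_entropy(samples, levels=16):
    # Naive order-0 entropy estimate in bits per sample on quantized
    # samples (a baseline for comparison, not the estimator linked above).
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0
    q = [min(int((s - lo) / span * levels), levels - 1) for s in samples]
    n = len(q)
    return -sum(c / n * math.log2(c / n) for c in Counter(q).values())

print(order0_entropy([0, 1, 0, 1, 0, 1]))  # 1.0 bit/sample for an alternating pattern

Note that it scores a perfectly predictable alternating pattern at a
full 1 bit/sample; that blindness to periodicity is exactly what a
better estimator has to fix.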

-P


Re: [music-dsp] about entropy encoding

2015-07-18 Thread Peter S
Consider what happens when you have an actual transmitter/receiver
pair, and not just a constant signal generator.

There are two possibilities:

1) They use a pre-agreed codebook.

2) The transmitter first sends the codebook to be used to decode
the messages.

If neither of these holds, then the receiver will have absolutely
no clue what to output. It will just receive messages '0', '001',
'101', '1100', '1110', '10', ... If the receiver doesn't have a
codebook (either pre-agreed or received) to look up what each of these
codes means, then the receiver will think: "What to do? I have no idea
what these mean, so I don't know what to output."

In case 1), when they have a pre-agreed codebook, you have simply
turned its entropy into algorithmic entropy (you hid that entropy
inside your program's data), and the receiver can only reconstruct a
certain set of hard-coded parameters (a certain set of hard-coded
waveforms).

In case 2), the transmitter first builds a codebook (for example,
using Huffman coding based on the probabilities of the messages it is
going to send), and transmits it to the receiver. For example, let's
say that there are going to be only 2 different square waves:

A) Amplitude=100, phase=30, frequency=1000, duty cycle=75%,
represented by the binary sequence '1100100 11110 1111101000 1001011'.

B) Amplitude=50, phase=0, frequency=333, duty cycle=50%,
represented by the binary sequence '110010 0 101001101 110010'.

The transmitter decides to send square wave A as the bit '0', and
square wave B as the bit '1'. Now, in order for them to be able to
communicate, the transmitter *must* first send this mapping (the
codebook) to the receiver, so that the receiver is able to understand
and decode the messages!

So the *very* first message will be this:

"Hey receiver! When I send you '0', that means 'square wave,
amplitude=100, phase=30, frequency=1000, duty cycle=75%'. And when I
send you '1', that means 'square wave, amplitude=50, phase=0,
frequency=333, duty cycle=50%'. End of codebook."

Expressed as binary, it may look something like this (encoded as
either a prefix code, a fixed-length code or some other encoding):

0 1100100 11110 1111101000 1001011 1 110010 0 101001101 110010

The encoding of the codebook may be different, but it is *certain* to
contain all the parameters for all the possible square waves that will
be sent, and this _must_ be the very first message, otherwise the
receiver won't know what the received code means!
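
(A toy Python sketch of that "codebook first, then short codes" idea;
the 7/5/10/7-bit field widths are my own choice, purely for
illustration:)

def pack(amplitude, phase, frequency, duty):
    # fixed-width binary fields for one parameter set
    return (format(amplitude, "07b") + format(phase, "05b") +
            format(frequency, "010b") + format(duty, "07b"))

codebook = {"0": (100, 30, 1000, 75),   # square wave A
            "1": (50, 0, 333, 50)}      # square wave B

# Very first transmission: each code followed by the full parameter set
# it stands for.  Only after this can the short codes be decoded.
codebook_msg = "".join(code + pack(*p) for code, p in codebook.items())

# Every later message is just one bit.
stream = codebook_msg + "0" + "1" + "0"
print(len(codebook_msg), "bits of codebook,", len(stream), "bits in total")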

Note that there's no way to encode this further using another
encoding, because then, for that other encoding, it would need to send
another codebook first... Hence, the codebook must be sent once
'as-is', or using some prefix encoding that doesn't use a custom code
mapping.

So, before sending *any* encoded messages - unless the codebook is
pre-agreed - the transmitter *must* first send *all* sets of
parameters *once*, for the receiver to be able to understand the
messages, regardless of _whatever_ probability distribution they
have.

If the transmitter wants to send some *very* complex waveform, even
if it will be encoded as a short, single '0' bit later on, it must
first send the waveform *once* for the receiver to be able to
reconstruct it! Otherwise, the receiver will not know what to do when
receiving the message '0'!

So, assuming that the messages are sent by encoding via a codebook,
then however short the code something may be encoded with, unless it
is pre-agreed (and thus hard-coded as algorithmic entropy in the
program), it *must* be sent at least *once* for the receiver to be
able to reconstruct it. Without that, the receiver is unable to decode
the message.

So, if the codebook has a length of 100 bits (using whatever
encoding), and there are going to be 100 messages, then the *actual*
entropy (length) of each message will be increased by 1 bit, since on
average there is 1 extra bit in the codebook for each message that is
sent, and that bit is absolutely necessary for decoding the messages.

Said otherwise, if the average length of a message is H bits, you
send N messages, and the codebook has a length of L bits, then the
*actual* effective entropy H' of each message will be

H' = H + L/N

bits, not H bits, because the transmitter must also send the codebook
first, increasing the overall average bits per message (hence the
actual effective entropy) by L/N bits. Since the length of the
codebook L is nonzero unless it's pre-agreed, L/N will be nonzero
unless the number of messages to be sent is infinite - so, in
practice, always.
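
(The bookkeeping in code, with made-up numbers for H, L and N, just to
make the L/N overhead concrete:)

def effective_entropy(H, L, N):
    # average bits per message including the one-time codebook cost: H' = H + L/N
    return H + L / N

print(effective_entropy(H=4.0, L=100, N=100))    # 5.0 bits/message
print(effective_entropy(H=4.0, L=100, N=10000))  # 4.01: the overhead shrinks, but never reaches zero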

Hence, the encoded parameter set in the codebook (from whatever
probability distribution) will *always* increase the average entropy
per message by a *nonzero* amount, the exact amount determined only by
the length of the codebook (the size of the parameter set) and the
number of messages.

(And that is just in addition to the entropy of the messages coming
from the probability distribution, which is already nonzero once you
have at least two possible messages of nonzero probability. So we have

Re: [music-dsp] about entropy encoding

2015-07-18 Thread Peter S
It follows from the above that Shannon's entropy model is a
simplified, idealized model of information, one that pretends that
algorithms have zero length and thus no entropy, and that you can
magically share a codebook telepathically without actually
transmitting it.

If you dismiss all these details and just pretend they do not exist,
then in a highly simplistic, idealized model, you may actually have an
entropy of zero in a *single*, special corner case. Even in Shannon's
highly simplified model, in virtually all other cases, entropy is
nonzero.

Later models of information, like Kolmogorov's, base the amount of
information not on statistical probability but rather on the minimal
length of the algorithm needed to encode that information; the two
turn out to be nearly equivalent definitions.

Quote from [1]:

"Firstly, we have to point out the relation between Kolmogorov
complexity and Shannon's entropy. Briefly, classic information theory
says a random variable X distributed according to P(X=x) has entropy
or complexity of a statistical form H(X) = -SUM P(X=x)*log P(X=x),
where the interpretation is that H(X) bits are on the average
sufficient to describe an outcome x. Computational or algorithmic
complexity says that an object x has complexity C(x) = the minimum
length of a binary program for x. It is very interesting to remark
that these two notions turn out to be much the same. Thus, the
intended interpretation of complexity C(x) as a measure of the
information content of an individual object x is supported by a tight
quantitative relationship to Shannon's probabilistic notion. In
particular, the entropy H = -SUM P(x)*log P(x) of the distribution p
is asymptotically equal to the expected complexity SUM P(x)*C(x)."

According to [2]:

"Algorithmic entropy is closely related to statistically defined
entropy, the statistical entropy of an ensemble being, for any
concisely describable ensemble, very nearly equal to the ensemble
average of the algorithmic entropy of its members."

According to [3]:

"Nevertheless, once allowance is made for the units used, the
expectation value of the algorithmic entropy for a set of strings
belonging to an ensemble is virtually the same as the Shannon entropy,
or the Boltzmann or Gibbs entropies derived for a set of states."

According to these sources, the algorithmic entropy asymptotically
equals, very nearly equals, or is virtually the same as the
probabilistic entropy. Therefore, any signal of nonzero length needs a
nonzero-length program to encode it, and therefore has nonzero
entropy.
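
(A small numerical illustration of that near-equality, mine rather than
anything from the cited papers, with zlib standing in very roughly for
the program length C(x):)

import math, random, zlib

# For a long i.i.d. source, the compressed length per symbol should come
# out close to (but above) the Shannon entropy of the distribution.
probs = {b"a": 0.5, b"b": 0.25, b"c": 0.125, b"d": 0.125}
H = -sum(p * math.log2(p) for p in probs.values())   # 1.75 bits/symbol

random.seed(0)
data = b"".join(random.choices(list(probs), weights=list(probs.values()), k=100000))
rate = 8 * len(zlib.compress(data, 9)) / 100000      # bits/symbol after compression

print(H, rate)   # the compressed rate lands close to, slightly above, H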

[1] On Shannon, Fisher, and Algorithmic Entropy in Cognitive Systems
http://uni-obuda.hu/conferences/saci04/Crisan.pdf

[2] 2003, Niels Henrik Gregersen, From Complexity to Life
https://en.wiktionary.org/wiki/algorithmic_entropy

[3] 2009, Sean Devine, The Insights of Algorithmic Entropy
http://www.mdpi.com/1099-4300/11/1/85/pdf


Re: [music-dsp] about entropy encoding

2015-07-18 Thread robert bristow-johnson

On 7/18/15 10:10 PM, Peter S wrote:

> It follows from the above that Shannon's entropy model is a
> simplified, idealized model of information, one that pretends that
> algorithms have zero length and thus no entropy, and that you can
> magically share a codebook telepathically without actually
> transmitting it.



Shannon makes no issue of that.  if you need the codebook at the other 
end (because you don't already have it), you need to send it.  it's not 
so big and i have seen that in some compressed files and i am sure .zip 
files (at least the old ones, that were mostly Huffman coded but also 
had some run-length encoding in the model) have the codebook in the file 
preamble.


even so, Shannon information theory is sorta static.  it does not deal 
with the kind of redundancy of a repeated symbol or string.  just with 
the probability of occurrence (which is measured from the relative 
frequency of occurrence) of messages.  run-length coding isn't in 
there.  maybe there is something in Shannon information theory about 
information measure with conditional probability (that might obviate 
what LPC can do to compress data).
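
(for concreteness, a toy run-length coder in Python, not from anyone's
post; the redundancy it exploits is the ordering of symbols, not their
overall frequencies:)

def run_length_encode(symbols):
    # collapse repeats into [symbol, count] pairs
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return runs

print(run_length_encode("aaaabbaaa"))   # [['a', 4], ['b', 2], ['a', 3]]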


--

r b-j  r...@audioimagination.com

Imagination is more important than knowledge.


