Re: Entropy of other languages

2007-02-26 Thread Sandy Harris

Travis H. <[EMAIL PROTECTED]> wrote:


On Wed, Feb 07, 2007 at 05:42:49AM -0800, Sandy Harris wrote:
> He starts from information theory and an assumption that
> there needs to be some constant upper bound on the
> receiver's per-symbol processing time. From there, with
> nothing else, he gets to a proof that the optimal frequency
> distribution of symbols is always some member of a
> parameterized set of curves.

Do you remember how he got from the "upper bound on processing time"
to anything other than a completely uniform distribution of symbols?


No. There was some pretty heavy math in the paper. With it in my hand,
I understood enough to follow the argument. 20 years later with no paper
to hand, I haven't a clue.

Paper is likely somewhere under his home page.
http://www.math.yale.edu/mandelbrot/


Seems to me a flat distribution has the minimal upper bound on
information content per symbol for a given amount of information!


Probably, but he did have a proof that the skewed distribution is
more efficient in some ways.

--
Sandy Harris
Quanzhou, Fujian, China

-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Entropy of other languages

2007-02-07 Thread Travis H.
On Wed, Feb 07, 2007 at 05:53:16PM -0500, Steven M. Bellovin wrote:
>   Speakers of such Native American languages as Navajo, Choctaw
>   and Cheyenne served as radio operators, known as Code Talkers,
>   to keep communications secret during both World Wars. Welsh
>   speakers played a similar role during the Bosnian War.
> 
> Does anyone know anything more about this use of Welsh?

http://en.wikipedia.org/wiki/Welsh_Guards says:

In 2002 the regiment arrived in Bosnia as part of SFOR, a NATO-led
force intended to ensure peace and stability reigns supreme in the
Balkan nation. During their deployment HM the Queen Mother died. A
number of officers of the Welsh Guards stood in vigil around the Queen
Mother's coffin which was lying in state in Westminster Hall, one of a
number of regiments to do so. The regiment returned home from their
deployment to Bosnia later in the year.

That's all I could find in a 10-minute search...
-- 
Good code works.  Great code can't fail. -><-
http://www.subspacefield.org/~travis/
For a good time on my UBE blacklist, email [EMAIL PROTECTED]




Re: Entropy of other languages

2007-02-07 Thread Travis H.
On Wed, Feb 07, 2007 at 05:42:49AM -0800, Sandy Harris wrote:
> He starts from information theory and an assumption that
> there needs to be some constant upper bound on the
> receiver's per-symbol processing time. From there, with
> nothing else, he gets to a proof that the optimal frequency
> distribution of symbols is always some member of a
> parameterized set of curves.

Do you remember how he got from the "upper bound on processing time"
to anything other than a completely uniform distribution of symbols?

Seems to me a flat distribution has the minimal upper bound on
information content per symbol for a given amount of information!

-- 
Good code works.  Great code can't fail. -><-
http://www.subspacefield.org/~travis/
For a good time on my UBE blacklist, email [EMAIL PROTECTED]




Re: Entropy of other languages

2007-02-07 Thread Steven M. Bellovin
On Wed, 7 Feb 2007 12:44:30 -0600
Nicolas Williams <[EMAIL PROTECTED]> wrote:

> 
> http://www.omniglot.com/writing/mayan.htm
> 
An interesting web site, which also contains the following
crypto-relevant statement:

Speakers of such Native American languages as Navajo, Choctaw
and Cheyenne served as radio operators, known as Code Talkers,
to keep communications secret during both World Wars. Welsh
speakers played a similar role during the Bosnian War.

Does anyone know anything more about this use of Welsh?


--Steve Bellovin, http://www.cs.columbia.edu/~smb



RE: Entropy of other languages

2007-02-07 Thread Trei, Peter
Travis H. wrote:

On Sun, Feb 04, 2007 at 03:46:41PM -0800, Allen wrote:
[...]

> What about other languages? Does anyone know the relative entropy of 
> other alphabetic languages? What about the entropy of ideographic 
> languages? Pictographic? Hieroglyphic?

IIRC, it turned out that Egyptian hieroglyphs were actually syllabic,
like Mesopotamian, so no fun there.  Mayan, on the other hand, remains
an enigma.  I read not long ago that they also had a way of recording
stories on bundles of knotted string, like the end of a mop.

The string-encoding system was Incan, not Mayan. They're called
'quipus', and while they contain a lot of numeric data, it's highly
debated whether they were a generalized writing system (most experts
seem to doubt it).

The Maya used a logosyllabic writing system which has been deciphered,
most of the progress having been made in the last 25 years or so.

Peter Trei




Re: Entropy of other languages

2007-02-07 Thread Nicolas Williams
On Mon, Feb 05, 2007 at 09:08:07PM -0600, Travis H. wrote:
> IIRC, it turned out that Egyptian hieroglyphs were actually syllabic,
> like Mesopotamian, so no fun there.  Mayan, on the other hand, remains
> an enigma.  I read not long ago that they also had a way of recording
> stories on bundles of knotted string, like the end of a mop.

Er, no, Mayan has been decoded:

http://www.omniglot.com/writing/mayan.htm

The knotted string system was an Inca writing system, IIRC.

Nico
-- 



Re: Entropy of other languages

2007-02-07 Thread Sandy Harris

Allen <[EMAIL PROTECTED]> wrote:


An idle question. English has a relatively low entropy as a
language. Don't recall the exact figure, but if you look at words
that start with "q" it is very low indeed.

What about other languages? Does anyone know the relative entropy
of other alphabetic languages? What about the entropy of
ideographic languages? Pictographic? Hieroglyphic?


The most general answer is in a very old paper of Mandelbrot's.
Sorry, I don't recall the exact reference or have it to hand.

He starts from information theory and an assumption that
there needs to be some constant upper bound on the
receiver's per-symbol processing time. From there, with
nothing else, he gets to a proof that the optimal frequency
distribution of symbols is always some member of a
parameterized set of curves.

Pick the right parameters and Mandelbrot's equation
simplifies to Zipf's Law, the well-known rule about
word, letter or sound frequencies in linguistics.
I'm not sure if you can also get Pareto's Law which
covers income & wealth distributions in economics.
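[Editor's note: the parameterized family Sandy describes is the Zipf-Mandelbrot law, f(r) proportional to 1/(r + b)^s over ranks r. The sketch below is illustrative; the parameter values s and b are made up, not taken from Mandelbrot's paper. Setting b = 0 recovers classic Zipf's law.]

```python
def zipf_mandelbrot(n, s=1.0, b=2.7):
    """Zipf-Mandelbrot frequencies: f(r) proportional to 1/(r + b)**s,
    normalized over ranks r = 1..n."""
    weights = [1.0 / (r + b) ** s for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

zipf = zipf_mandelbrot(1000, s=1.0, b=0.0)  # b = 0: classic Zipf's law
zm = zipf_mandelbrot(1000, s=1.0, b=2.7)    # b > 0 flattens the head

print(zipf[0] / zipf[1])        # rank 1 is exactly twice rank 2: 2.0
print(round(zm[0] / zm[1], 2))  # closer to 1.0 with the offset: 1.27
```

The extra parameter b is what distinguishes Mandelbrot's curve from plain Zipf: it tempers how dominant the most frequent symbols are.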

--
Sandy Harris
Quanzhou, Fujian, China



Re: Entropy of other languages

2007-02-07 Thread Travis H.
On Sun, Feb 04, 2007 at 03:46:41PM -0800, Allen wrote:
> An idle question. English has a relatively low entropy as a 
> language. Don't recall the exact figure, but if you look at words 
> that start with "q" it is very low indeed.

I seem to recall Shannon did some experiments which showed that with a
human as your probability oracle, it's roughly 1-2 bits per letter.
Many of his papers are online last time I looked, but some of his
experimental results are harder to locate online.
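[Editor's note: Shannon's experiments used a human to predict the next letter; a bigram model is a crude mechanical stand-in for that oracle. The sketch below estimates the conditional entropy H(next letter | previous letter) from bigram counts over a made-up sample string.]

```python
from collections import Counter
from math import log2

def bigram_entropy(text):
    """Estimate H(next letter | previous letter) in bits from bigram counts."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    n = len(text) - 1
    h = 0.0
    for (a, b), count in pairs.items():
        p_pair = count / n          # P(a, b)
        p_cond = count / firsts[a]  # P(b | a)
        h -= p_pair * log2(p_cond)
    return h

sample = "the quick brown fox jumps over the lazy dog " * 50
print(round(bigram_entropy(sample), 2))  # well below log2(27) ~ 4.75
```

Conditioning on even one previous letter already pulls the estimate well under the zeroth-order figure; Shannon's human predictors, with a whole sentence of context, got it down to the 1-2 bit range.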

> What about other languages? Does anyone know the relative entropy 
> of other alphabetic languages? What about the entropy of 
> ideographic languages? Pictographic? Hieroglyphic?

IIRC, it turned out that Egyptian hieroglyphs were actually syllabic,
like Mesopotamian, so no fun there.  Mayan, on the other hand, remains
an enigma.  I read not long ago that they also had a way of recording
stories on bundles of knotted string, like the end of a mop.
-- 
The driving force behind innovation is sublimation.
-><- http://www.subspacefield.org/~travis/
For a good time on my UBE blacklist, email [EMAIL PROTECTED]




Re: Entropy of other languages

2007-02-05 Thread Steven M. Bellovin
On Sun, 04 Feb 2007 15:46:41 -0800
Allen <[EMAIL PROTECTED]> wrote:

> Hi gang,
> 
> An idle question. English has a relatively low entropy as a language.
> Don't recall the exact figure, but if you look at words that start
> with "q" it is very low indeed.
> 
> What about other languages? Does anyone know the relative entropy of
> other alphabetic languages? What about the entropy of ideographic
> languages? Pictographic? Hieroglyphic?
> 
It should be pretty easy to do at least some experiments today --
there's a lot of online text in many different languages.  Have a look
at http://www.gutenberg.org/catalog/ for freely-available books that
one could mine for statistics.
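[Editor's note: a minimal version of the suggested experiment, computing zeroth-order (single-letter-frequency) entropy. The sample string here is a stand-in; for a real comparison one would read in a whole e-text per language, e.g. from Project Gutenberg, and compare the figures.]

```python
from collections import Counter
from math import log2

def letter_entropy(text):
    """Zeroth-order entropy in bits per letter from single-letter frequencies."""
    letters = [c for c in text.lower() if c.isalpha()]
    total = len(letters)
    counts = Counter(letters)
    return -sum((k / total) * log2(k / total) for k in counts.values())

# Stand-in sample; substitute the full text of one book per language.
sample = "An idle question. English has a relatively low entropy as a language."
print(round(letter_entropy(sample), 2))
```

Note this measures only single-letter statistics; the interesting cross-language differences Allen asks about would also show up in higher-order (digram, trigram, word-level) models.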


--Steve Bellovin, http://www.cs.columbia.edu/~smb
