Re: Entropy of other languages

2007-02-26 Thread Sandy Harris

Travis H. [EMAIL PROTECTED] wrote:


On Wed, Feb 07, 2007 at 05:42:49AM -0800, Sandy Harris wrote:
 He starts from information theory and an assumption that
 there needs to be some constant upper bound on the
 receiver's per-symbol processing time. From there, with
 nothing else, he gets to a proof that the optimal frequency
 distribution of symbols is always some member of a
 parameterized set of curves.

Do you remember how he got from the upper bound on processing time
to anything other than a completely uniform distribution of symbols?


No. There was some pretty heavy math in the paper. With it in my hand,
I understood enough to follow the argument. 20 years later with no paper
to hand, I haven't a clue.

Paper is likely somewhere under his home page.
http://www.math.yale.edu/mandelbrot/


Seems to me a flat distribution has the minimal upper bound on
information content per symbol for a given amount of information!


Probably, but he did have a proof that the skewed distribution is
more efficient in some ways.

--
Sandy Harris
Quanzhou, Fujian, China

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]


Re: Entropy of other languages

2007-02-07 Thread Travis H.
On Sun, Feb 04, 2007 at 03:46:41PM -0800, Allen wrote:
 An idle question. English has a relatively low entropy as a 
 language. Don't recall the exact figure, but if you look at words 
 that start with q it is very low indeed.

I seem to recall Shannon did some experiments which showed that with a
human as your probability oracle, it's roughly 1-2 bits per letter.
Many of his papers were online last time I looked, but some of his
experimental results are harder to locate.

 What about other languages? Does anyone know the relative entropy 
 of other alphabetic languages? What about the entropy of 
 ideographic languages? Pictographic? Hieroglyphic?

IIRC, it turned out that Egyptian hieroglyphs were actually syllabic,
like Mesopotamian, so no fun there.  Mayan, on the other hand, remains
an enigma.  I read not long ago that they also had a way of recording
stories on bundles of knotted string, like the end of a mop.
-- 
The driving force behind innovation is sublimation.
-- URL:http://www.subspacefield.org/~travis/
For a good time on my UBE blacklist, email [EMAIL PROTECTED]




FW: Entropy of other languages

2007-02-07 Thread Trei, Peter


Steven M. Bellovin wrote:

 
 On Sun, 04 Feb 2007 15:46:41 -0800
 Allen [EMAIL PROTECTED] wrote:
 
  Hi gang,
  
  An idle question. English has a relatively low entropy as a language.
  Don't recall the exact figure, but if you look at words that start
  with q it is very low indeed.
  
  What about other languages? Does anyone know the relative entropy of
  other alphabetic languages? What about the entropy of ideographic
  languages? Pictographic? Hieroglyphic?
  
 It should be pretty easy to do at least some experiments today --
 there's a lot of online text in many different languages.  Have a look
 at http://www.gutenberg.org/catalog/ for freely-available books that
 one could mine for statistics.

As a very rough proxy, look at the length of the same text in different
translations. 

My father was in advertising in Europe. When they laid out a print ad,
they always did so using the German text. If the German fit, any other
language they were interested in would do so as well.
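
A minimal sketch of that length proxy, assuming you already have the same
text in several languages. The function name relative_lengths and the two
sample sentences are made up for illustration; real measurements would need
much longer parallel texts.

```python
def relative_lengths(translations: dict, baseline: str) -> dict:
    """Length of each translation in characters, relative to the baseline."""
    base = len(translations[baseline])
    return {lang: len(text) / base for lang, text in translations.items()}


# Illustrative one-line sample only, not a real corpus.
samples = {
    "en": "Please enter your password.",
    "de": "Bitte geben Sie Ihr Passwort ein.",
}

for lang, ratio in relative_lengths(samples, "en").items():
    print(f"{lang}: {ratio:.2f}x the English length")
```

Character count is only a crude stand-in for entropy: a longer translation
carrying the same message implies fewer bits per character, but the ratio
also depends on the translator and on the script used.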

Now that I work (among other things) on cellphone applications, I'm
running into similar issues in internationalizing text on tiny screens.

Peter Trei

Disclaimer: This is a personal opinion. It may or may not jibe with my
employer's opinion.




Re: Entropy of other languages

2007-02-07 Thread Sandy Harris

Allen [EMAIL PROTECTED] wrote:


An idle question. English has a relatively low entropy as a
language. Don't recall the exact figure, but if you look at words
that start with q it is very low indeed.

What about other languages? Does anyone know the relative entropy
of other alphabetic languages? What about the entropy of
ideographic languages? Pictographic? Hieroglyphic?


The most general answer is in a very old paper of Mandelbrot's.
Sorry, I don't recall the exact reference or have it to hand.

He starts from information theory and an assumption that
there needs to be some constant upper bound on the
receiver's per-symbol processing time. From there, with
nothing else, he gets to a proof that the optimal frequency
distribution of symbols is always some member of a
parameterized set of curves.

Pick the right parameters and Mandelbrot's equation
simplifies to Zipf's Law, the well-known rule about
word, letter or sound frequencies in linguistics.
I'm not sure if you can also get Pareto's Law which
covers income and wealth distributions in economics.
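
For reference, the form usually cited as the Zipf-Mandelbrot law is written
out below. Whether it matches the equation in the paper being recalled here
is an assumption; the symbol names (r for rank, f for frequency, C for a
normalizing constant, a and b for the parameters) are chosen for
illustration.

```latex
% Zipf-Mandelbrot form: frequency f of the symbol with rank r
f(r) = \frac{C}{(r + b)^{a}}, \qquad a > 0,\; b \ge 0

% Setting b = 0 and a = 1 gives the classic Zipf's Law
f(r) = \frac{C}{r}
```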

--
Sandy Harris
Quanzhou, Fujian, China



Re: Entropy of other languages

2007-02-07 Thread Nicolas Williams
On Mon, Feb 05, 2007 at 09:08:07PM -0600, Travis H. wrote:
 IIRC, it turned out that Egyptian hieroglyphs were actually syllabic,
 like Mesopotamian, so no fun there.  Mayan, on the other hand, remains
 an enigma.  I read not long ago that they also had a way of recording
 stories on bundles of knotted string, like the end of a mop.

Er, no, Mayan has been decoded:

http://www.omniglot.com/writing/mayan.htm

The knotted string system was an Inca writing system, IIRC.

Nico
-- 



RE: Entropy of other languages

2007-02-07 Thread Trei, Peter
Travis H. wrote:

On Sun, Feb 04, 2007 at 03:46:41PM -0800, Allen wrote:
[...]

 What about other languages? Does anyone know the relative entropy of 
 other alphabetic languages? What about the entropy of ideographic 
 languages? Pictographic? Hieroglyphic?

IIRC, it turned out that Egyptian hieroglyphs were actually syllabic,
like Mesopotamian, so no fun there.  Mayan, on the other hand, remains
an enigma.  I read not long ago that they also had a way of recording
stories on bundles of knotted string, like the end of a mop.

The string-encoding system was Incan, not Mayan. They're called
'quipus', and while they contain a lot of numeric data, it's highly
debated whether they were a generalized writing system (most experts
seem to doubt it).

The Maya used a logosyllabic writing system which has been deciphered,
most of the progress having been made in the last 25 years or so.

Peter Trei




Re: Entropy of other languages

2007-02-07 Thread Travis H.
On Wed, Feb 07, 2007 at 05:42:49AM -0800, Sandy Harris wrote:
 He starts from information theory and an assumption that
 there needs to be some constant upper bound on the
 receiver's per-symbol processing time. From there, with
 nothing else, he gets to a proof that the optimal frequency
 distribution of symbols is always some member of a
 parameterized set of curves.

Do you remember how he got from the upper bound on processing time
to anything other than a completely uniform distribution of symbols?

Seems to me a flat distribution has the minimal upper bound on
information content per symbol for a given amount of information!
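
A quick numeric check of the entropy side of that intuition, as a sketch
only. It compares a flat distribution over 26 symbols with a Zipf-like one
(probability proportional to 1/rank). The alphabet size and the 1/rank
weights are assumptions picked for illustration, and nothing here models the
per-symbol processing cost that the proof reportedly depends on.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits per symbol of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


n = 26  # assumed alphabet size

# Flat distribution: every symbol equally likely.
uniform = [1 / n] * n

# Zipf-like distribution: weight 1/rank, normalized to sum to 1.
weights = [1 / rank for rank in range(1, n + 1)]
total = sum(weights)
zipf = [w / total for w in weights]

print(f"uniform: {entropy_bits(uniform):.3f} bits per symbol")
print(f"zipf:    {entropy_bits(zipf):.3f} bits per symbol")
```

The flat distribution carries the most information per symbol, so a message
of a given size needs the fewest symbols. A skewed distribution can only come
out ahead if symbols differ in cost, which is presumably where the
processing-time assumption enters.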

-- 
Good code works.  Great code can't fail. --
URL:http://www.subspacefield.org/~travis/
For a good time on my UBE blacklist, email [EMAIL PROTECTED]




Re: Entropy of other languages

2007-02-07 Thread Travis H.
On Wed, Feb 07, 2007 at 05:53:16PM -0500, Steven M. Bellovin wrote:
   Speakers of such Native American languages as Navajo, Choctaw
   and Cheyenne served as radio operators, known as Code Talkers,
   to keep communications secret during both World Wars. Welsh
   speakers played a similar role during the Bosnian War.
 
 Does anyone know anything more about this use of Welsh?

http://en.wikipedia.org/wiki/Welsh_Guards says:

In 2002 the regiment arrived in Bosnia as part of SFOR, a NATO-led
force intended to ensure peace and stability reigns supreme in the
Balkan nation. During their deployment HM the Queen Mother died. A
number of officers of the Welsh Guards stood in vigil around the Queen
Mother's coffin which was lying in state in Westminster Hall, one of a
number of regiments to do so. The regiment returned home from their
deployment to Bosnia later in the year.

That's all I could find in a 10 minute search...
-- 
Good code works.  Great code can't fail. --
URL:http://www.subspacefield.org/~travis/
For a good time on my UBE blacklist, email [EMAIL PROTECTED]




Entropy of other languages

2007-02-05 Thread Allen

Hi gang,

An idle question. English has a relatively low entropy as a 
language. Don't recall the exact figure, but if you look at words 
that start with q it is very low indeed.


What about other languages? Does anyone know the relative entropy 
of other alphabetic languages? What about the entropy of 
ideographic languages? Pictographic? Hieroglyphic?


Thanks,

Allen



Re: Entropy of other languages

2007-02-05 Thread Steven M. Bellovin
On Sun, 04 Feb 2007 15:46:41 -0800
Allen [EMAIL PROTECTED] wrote:

 Hi gang,
 
 An idle question. English has a relatively low entropy as a language.
 Don't recall the exact figure, but if you look at words that start
 with q it is very low indeed.
 
 What about other languages? Does anyone know the relative entropy of
 other alphabetic languages? What about the entropy of ideographic
 languages? Pictographic? Hieroglyphic?
 
It should be pretty easy to do at least some experiments today --
there's a lot of online text in many different languages.  Have a look
at http://www.gutenberg.org/catalog/ for freely-available books that
one could mine for statistics.
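
A minimal sketch of such an experiment, assuming the text of a book has
already been downloaded and read into a string. The function name
letter_entropy is made up for illustration, and the sample string stands in
for a real book.

```python
import math
from collections import Counter


def letter_entropy(text: str) -> float:
    """Single-letter (unigram) entropy in bits per letter.

    Only alphabetic characters are counted, and case is folded.
    """
    letters = [ch.lower() for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    counts = Counter(letters)
    total = len(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Stand-in sample; replace with the contents of a full book.
sample = "An idle question. English has a relatively low entropy as a language."
print(f"{letter_entropy(sample):.3f} bits per letter")
```

This counts single letters only and ignores context such as "u" following
"q", so it gives an upper estimate. Capturing that context would need
statistics over longer letter sequences. Since str.isalpha() accepts
non-Latin letters, the same function can be run on texts in other scripts,
though comparing per-symbol figures across very different writing systems
needs care.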


--Steve Bellovin, http://www.cs.columbia.edu/~smb
