Re: Entropy of other languages
Travis H. [EMAIL PROTECTED] wrote:

> On Wed, Feb 07, 2007 at 05:42:49AM -0800, Sandy Harris wrote:
> > He starts from information theory and an assumption that there needs
> > to be some constant upper bound on the receiver's per-symbol
> > processing time. From there, with nothing else, he gets to a proof
> > that the optimal frequency distribution of symbols is always some
> > member of a parameterized set of curves.
>
> Do you remember how he got from the upper bound on processing time to
> anything other than a completely uniform distribution of symbols?

No. There was some pretty heavy math in the paper. With it in my hand,
I understood enough to follow the argument. 20 years later with no
paper to hand, I haven't a clue.

The paper is likely somewhere under his home page:
http://www.math.yale.edu/mandelbrot/

> Seems to me a flat distribution has the minimal upper bound on
> information content per symbol for a given amount of information!

Probably, but he did have a proof that the skewed distribution is more
efficient in some ways.

--
Sandy Harris
Quanzhou, Fujian, China

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]
Re: Entropy of other languages
On Sun, Feb 04, 2007 at 03:46:41PM -0800, Allen wrote:
> An idle question. English has a relatively low entropy as a language.
> Don't recall the exact figure, but if you look at words that start
> with q it is very low indeed.

I seem to recall Shannon did some experiments which showed that with a
human as your probability oracle, it's roughly 1-2 bits per letter.
Many of his papers were online last time I looked, but some of his
experimental results are harder to locate.

> What about other languages? Does anyone know the relative entropy of
> other alphabetic languages? What about the entropy of ideographic
> languages? Pictographic? Hieroglyphic?

IIRC, it turned out that Egyptian hieroglyphs were actually syllabic,
like Mesopotamian, so no fun there. Mayan, on the other hand, remains
an enigma. I read not long ago that they also had a way of recording
stories on bundles of knotted string, like the end of a mop.

--
The driving force behind innovation is sublimation.
URL:http://www.subspacefield.org/~travis/
For a good time on my UBE blacklist, email [EMAIL PROTECTED]
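A crude way to see where Shannon's 1-2 bits/letter figure sits: a
zero-order estimate from letter frequencies alone (no context, no human
predictor) comes out much higher, typically around 4 bits per letter for
English. The gap is exactly what context and a human oracle close. A
minimal sketch, with an arbitrary sample string chosen purely for
illustration:

```python
from collections import Counter
from math import log2

def letter_entropy(text: str) -> float:
    """Zero-order (unigram) entropy of the letters in `text`, in bits/letter."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    n = len(letters)
    return -sum((k / n) * log2(k / n) for k in counts.values())

sample = ("the quick brown fox jumps over the lazy dog "
          "pack my box with five dozen liquor jugs")
# Well above Shannon's 1-2 bits, since no context is used.
h = letter_entropy(sample)
```

For English-like letter mixes the result sits between 0 (a one-letter
text) and log2(26), about 4.7 bits, the ceiling for a flat 26-letter
alphabet.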
Re: Entropy of other languages
Allen [EMAIL PROTECTED] wrote:
> An idle question. English has a relatively low entropy as a language.
> Don't recall the exact figure, but if you look at words that start
> with q it is very low indeed.
>
> What about other languages? Does anyone know the relative entropy of
> other alphabetic languages? What about the entropy of ideographic
> languages? Pictographic? Hieroglyphic?

The most general answer is in a very old paper of Mandelbrot's. Sorry,
I don't recall the exact reference or have it to hand.

He starts from information theory and an assumption that there needs
to be some constant upper bound on the receiver's per-symbol processing
time. From there, with nothing else, he gets to a proof that the
optimal frequency distribution of symbols is always some member of a
parameterized set of curves.

Pick the right parameters and Mandelbrot's equation simplifies to
Zipf's Law, the well-known rule about word, letter or sound frequencies
in linguistics. I'm not sure if you can also get Pareto's Law, which
covers income and wealth distributions in economics.

--
Sandy Harris
Quanzhou, Fujian, China
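For reference, that family of curves is nowadays usually stated as the
Zipf-Mandelbrot law: the relative frequency of the symbol of rank r is
f(r) = c / (r + b)^a. A minimal sketch (the parameter names a, b, c
follow the common textbook statement, not necessarily Mandelbrot's own
notation):

```python
def zipf_mandelbrot(rank: int, a: float = 1.0, b: float = 0.0,
                    c: float = 1.0) -> float:
    """Relative frequency of the rank-th most common symbol: c / (rank + b)**a."""
    return c / (rank + b) ** a

# With b = 0 and a = 1 this reduces to Zipf's Law:
# frequency proportional to 1/rank.
zipf_like = [zipf_mandelbrot(r) for r in range(1, 6)]
```

The extra parameters b and a are what let the curve bend away from pure
Zipf behavior at low ranks while keeping the same power-law tail.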
Re: Entropy of other languages
On Mon, Feb 05, 2007 at 09:08:07PM -0600, Travis H. wrote:
> IIRC, it turned out that Egyptian hieroglyphs were actually syllabic,
> like Mesopotamian, so no fun there. Mayan, on the other hand, remains
> an enigma. I read not long ago that they also had a way of recording
> stories on bundles of knotted string, like the end of a mop.

Er, no, Mayan has been decoded:
http://www.omniglot.com/writing/mayan.htm

The knotted string system was an Inca writing system, IIRC.

Nico
RE: Entropy of other languages
Travis H. wrote:
> On Sun, Feb 04, 2007 at 03:46:41PM -0800, Allen wrote:
> [...]
> > What about other languages? Does anyone know the relative entropy
> > of other alphabetic languages? What about the entropy of
> > ideographic languages? Pictographic? Hieroglyphic?
>
> IIRC, it turned out that Egyptian hieroglyphs were actually syllabic,
> like Mesopotamian, so no fun there. Mayan, on the other hand, remains
> an enigma. I read not long ago that they also had a way of recording
> stories on bundles of knotted string, like the end of a mop.

The string-encoding system was Incan, not Mayan. The devices are
called 'quipus', and while they contain a lot of numeric data, it's
highly debated whether they were a generalized writing system (most
experts seem to doubt it).

The Maya used a logosyllabic writing system which has been deciphered,
most of the progress having been made in the last 25 years or so.

Peter Trei
Re: Entropy of other languages
On Wed, Feb 07, 2007 at 05:42:49AM -0800, Sandy Harris wrote:
> He starts from information theory and an assumption that there needs
> to be some constant upper bound on the receiver's per-symbol
> processing time. From there, with nothing else, he gets to a proof
> that the optimal frequency distribution of symbols is always some
> member of a parameterized set of curves.

Do you remember how he got from the upper bound on processing time to
anything other than a completely uniform distribution of symbols?
Seems to me a flat distribution has the minimal upper bound on
information content per symbol for a given amount of information!

--
Good code works. Great code can't fail.
URL:http://www.subspacefield.org/~travis/
For a good time on my UBE blacklist, email [EMAIL PROTECTED]
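The intuition behind the question is easy to check numerically: over a
fixed alphabet, the flat distribution maximizes entropy per symbol, and
any skew strictly lowers it. (As I understand the standard account of
Mandelbrot's argument, his symbols have unequal costs, so the quantity
being optimized is information per unit cost rather than per symbol,
which is how a skewed distribution can win.) A quick check:

```python
from math import log2

def entropy_bits(p: list[float]) -> float:
    """Shannon entropy of a probability distribution, in bits."""
    return -sum(x * log2(x) for x in p if x > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.70, 0.10, 0.10, 0.10]
# Uniform over 4 symbols gives exactly 2 bits/symbol; any skew gives less.
```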
Re: Entropy of other languages
On Wed, Feb 07, 2007 at 05:53:16PM -0500, Steven M. Bellovin wrote:
> Speakers of such Native American languages as Navajo, Choctaw and
> Cheyenne served as radio operators, known as Code Talkers, to keep
> communications secret during both World Wars. Welsh speakers played a
> similar role during the Bosnian War.

Does anyone know anything more about this use of Welsh?

http://en.wikipedia.org/wiki/Welsh_Guards says:

  "In 2002 the regiment arrived in Bosnia as part of SFOR, a NATO-led
  force intended to ensure peace and stability reigns supreme in the
  Balkan nation. During their deployment HM the Queen Mother died. A
  number of officers of the Welsh Guards stood in vigil around the
  Queen Mother's coffin which was lying in state in Westminster Hall,
  one of a number of regiments to do so. The regiment returned home
  from their deployment to Bosnia later in the year."

That's all I could find in a 10 minute search...

--
Good code works. Great code can't fail.
URL:http://www.subspacefield.org/~travis/
For a good time on my UBE blacklist, email [EMAIL PROTECTED]
Re: Entropy of other languages
On Sun, 04 Feb 2007 15:46:41 -0800, Allen [EMAIL PROTECTED] wrote:
> Hi gang,
>
> An idle question. English has a relatively low entropy as a language.
> Don't recall the exact figure, but if you look at words that start
> with q it is very low indeed.
>
> What about other languages? Does anyone know the relative entropy of
> other alphabetic languages? What about the entropy of ideographic
> languages? Pictographic? Hieroglyphic?

It should be pretty easy to do at least some experiments today --
there's a lot of online text in many different languages. Have a look
at http://www.gutenberg.org/catalog/ for freely-available books that
one could mine for statistics.

--Steve Bellovin, http://www.cs.columbia.edu/~smb
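One concrete form such an experiment could take: estimate the
first-order conditional entropy H(next letter | current letter) from
any downloaded text, which captures how much the previous letter
predicts the next one. A sketch (the Gutenberg filename in the comment
is purely hypothetical):

```python
from collections import Counter
from math import log2

def conditional_entropy(text: str) -> float:
    """H(next letter | current letter) in bits, from adjacent-letter counts."""
    letters = [c for c in text.lower() if c.isalpha()]
    pairs = Counter(zip(letters, letters[1:]))
    firsts = Counter(letters[:-1])
    n = len(letters) - 1
    return -sum((k / n) * log2(k / firsts[a]) for (a, _b), k in pairs.items())

# Hypothetical usage with a text fetched from gutenberg.org:
#   with open("some_gutenberg_book.txt", encoding="utf-8") as f:
#       print(conditional_entropy(f.read()))
```

Feeding texts in different languages through the same function gives
directly comparable bits-per-letter figures, which is about the
cheapest version of the comparison Allen was asking for.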