Francois Yergeau wrote:
yea. That could be it. I got a hard copy, and it looks like Fig. 2 is the one I am looking for.

[EMAIL PROTECTED] wrote:

I remember there was some study showing that, although UTF-8 encodes each Japanese/Chinese character in 3 bytes, Japanese/Chinese usually uses FEWER characters in writing to communicate information than alphabet-based languages. Can anyone point me to such research?

I don't know of exactly what you want, but I vaguely remember a paper given at a Unicode conference long ago that compared various translations of the charter (or some such) of the Voice of America in a couple or three encodings. Hmmmm, let's see.... could be this:

http://www.unicode.org/iuc/iuc9/Friday2.html#b3
Reuters Compression Scheme for Unicode (RCSU)
Misha Wolf
No paper online, alas. I remember that Chinese was a clear winner in terms of # of characters. In fact, I kind of remember that Chinese was so much denser that it still won after RCSU (now SCSU) compression, which would mean that a Han character contains more than twice as much info on average as a Latin letter as used in (say) English. This is all on pretty shaky ground, distant memories. Perhaps Misha still has the figures (if that's in fact the right paper).
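For anyone who wants to try this kind of comparison on their own parallel texts, here is a rough sketch in Python. It only counts code points and raw UTF-8/UTF-16 bytes; it does not implement SCSU, and the two sample strings are made-up illustrations, not data from the paper. The "more than twice" remark presumably rests on SCSU storing Latin letters in about one byte and Han characters in about two, which is an assumption about the reasoning rather than something stated in the figures. The name `size_report` is hypothetical.

```python
def size_report(text: str) -> dict:
    """Count code points and encoded bytes for one piece of text."""
    return {
        "code_points": len(text),                        # Unicode code points
        "utf8_bytes": len(text.encode("utf-8")),         # 1 byte per ASCII letter, 3 per BMP Han character
        "utf16_bytes": len(text.encode("utf-16-be")),    # 2 bytes per BMP character, no BOM
    }


# Illustrative samples only (assumption): replace these with real parallel
# translations of the same document to get a meaningful comparison.
samples = {
    "English": "Hello, world",
    "Chinese": "你好世界",
}

for language, text in samples.items():
    print(language, size_report(text))
```

With real translations, dividing the English code-point count by the Chinese one gives a crude estimate of how much more one Han character carries than one Latin letter.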

