On Mon 09 Jun 03, 4:57 PM, Mark K. Kim <[EMAIL PROTECTED]> said:
> On Mon, 9 Jun 2003, Peter Jay Salzman wrote:
>
> > the language i'm thinking of is hebrew, but with some important issues.
> >
> >    1. i need vowel support.
> >    2. i really want to have mixed hebrew/english
> >
> > i believe taken together, i want to use ISO 10646 which can represent
> > all languages at the same time.
>
> Unfortunately I don't know the hebrew language so I don't know what the
> difficulties are.  For both Korean and Japanese, we use two-bytes to
> represent a single Asian "character", while maintaining backwards
> compatibility with ASCII by using the MSb on the first character to flag a
> multibyte character.
>
> If Hebrew does the same thing, there is no technical reason why it can't
> use both English and Hebrew.

fwiw, i happen to know this is the case.  :)

i think you just described (part of) the utf-8 encoding: the portion of
the encoding which ensures backwards compatibility with ascii.

the msb trick is also what keeps utf-8 safe for C: every byte of a
multibyte sequence has its msb set, so a zero byte only ever means the
NUL character itself.  the fixed-width encodings don't have that
property.  in ucs-2 and ucs-4, plain ascii text is padded with zero
bytes, which would wreak havoc on C string handling.  that's why you
don't see ucs-2 and ucs-4 used for strings on linux.

> > as a first stab at getting utf-8 capable xterms, i set:
> >
> >    LC_CTYPE=en_US.UTF-8
> >
> > but weird things started to happen, like mutt's threading lines turned
> > into really strange characters.  i guess the applications themselves
> > need to be utf-8 aware too.
>
> UTF-8 is compatible only with the standard ASCII set.  The threading lines
> are in the extended ASCII set (it uses the MSb), not the standard ASCII
> set.  They clash because UTF-8 uses the MSb to signal multibyte character,
> while the extended ASCII set use the MSb.
>
> I recommend just ignoring it (you get used to it).  If not, I think you
> can tell Mutt to use standard ASCII for threading lines (using +, -, |,
> etc.)

unicode includes mathematical and scientific symbols, so those extended
characters are in there somewhere.  it's probably just a matter of
whether you can tell mutt which characters to use for threading (and
how, of course).

heh.  i read that klingon and the tengwar were even proposed for
unicode, though as far as i know neither made it into the standard
proper.

unicode has everything we need.  it's "just" a matter of getting
software to use it correctly.  but boy oh boy are there a lot of details
in that word "just"...   :(

> It's one of the reasons I have WindowsXP.  The international language
> support is so amazing.  I can read multi-language data file with so much
> ease.  I've seen Windows2000 also do a very nice job.

you're breaking my heart...
:(

pete

--
GPG Instructions: http://www.dirac.org/linux/gpg
GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D
_______________________________________________
vox-tech mailing list
[EMAIL PROTECTED]
http://lists.lugod.org/mailman/listinfo/vox-tech
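
p.s. for anyone who wants to poke at the zero-byte point themselves,
here is a rough python sketch.  it is only an illustration, not anything
mutt or xterm actually does.  assumptions: python 3's built-in codecs,
with 'utf-16-be' and 'utf-32-be' standing in for ucs-2 and ucs-4 (they
agree for the characters used here), and has_zero_byte is a made-up
helper name.

```python
# sketch: encode a mixed english/hebrew string three ways and look for
# zero bytes, which is what trips up C string functions.
# assumption: 'utf-16-be' / 'utf-32-be' stand in for ucs-2 / ucs-4.

def has_zero_byte(data: bytes) -> bool:
    """hypothetical helper: True if the encoded bytes contain 0x00."""
    return 0 in data

# "shalom " in ascii, then the hebrew word with its vowel points
text = "shalom \u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"

utf8 = text.encode("utf-8")
ucs2 = text.encode("utf-16-be")
ucs4 = text.encode("utf-32-be")

# the ascii characters keep their usual single-byte values in utf-8
ascii_part = utf8[:7]

# every byte of the multibyte (hebrew) part has its msb set
hebrew_part = utf8[7:]
all_msb_set = all(b & 0x80 for b in hebrew_part)

print(has_zero_byte(utf8), has_zero_byte(ucs2), has_zero_byte(ucs4))
print(ascii_part, all_msb_set)
```

the utf-8 bytes contain no zeros, while both fixed-width encodings do,
because each ascii character gets padded out with 0x00.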
