Looking for information on the UnicodeData file

2003-03-05 Thread Pim Blokland
I apologize if this question has been asked before, but I'm relatively new at this. My question is: where can I find formal definitions of the terms used in the Character Name field of the UnicodeData.txt file? Most specifically, precise explanations of designations like "turned", "inverse",

Re: Caron / Hacek?

2003-03-05 Thread Pim Blokland
John Hudson wrote: In the Slovak orthography, the lowercase d, l and t are normally written with the 'apostrophe' form of the accent. Then why does UnicodeData break them down as (e.g.) 0064 030C rather than 0064 0315? Pim Blokland

Re: The display of *kholam* on PCs

2003-03-05 Thread Dean Snyder
Chris Jacobs wrote at 12:54 AM on Wednesday, March 5, 2003: But why do you call the kholam a high left dot? As far as I know it can appear high left or middle, to indicate that it should be pronounced after the consonant, or right, to pronounce it before. So the meaning of a shin with two dots

Re: Khmer encoding model (had no subject)

2003-03-05 Thread Mijan
Quoting Marco Cimarosti [EMAIL PROTECTED]: Mijan wrote: [...] 3. There are no other cases of a Vowel+Virama combination in the Unicode encoding model. Yes, there are. Khmer. I do not understand Khmer but I see that it does not use the same 'encoding model'. Please look,

RE: Reph and Khmer encoding model

2003-03-05 Thread Mijan
Quoting Kent Karlsson [EMAIL PROTECTED]: I understand that unicode is supposed to represent the language, not the way it is written. No, Unicode is supposed to be able to represent the written form. (Of course.) Yes, I was wrong! I think I wanted to say something like, Unicode is

Re: Caron / Hacek?

2003-03-05 Thread John Cowan
Pim Blokland scripsit: Then why does UnicodeData break them down as (e.g.) 0064 030C rather than 0064 0315? To keep the upper case and lower case characters in sync for decomposition, they always have the same combining characters. For another example, G with cedilla gets the cedilla on top
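
The decomposition mappings discussed in this exchange can be checked against the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `canonical_decomposition` is made up for illustration, and the output reflects whatever UCD version ships with the interpreter rather than the 2003 data file.

```python
import unicodedata


def canonical_decomposition(ch):
    """Return the UCD decomposition field of `ch` as a list of hex code points."""
    return unicodedata.decomposition(ch).split()


# Upper/lower case pairs mentioned in the thread (d with caron, g with cedilla).
PAIRS = {
    "D WITH CARON": ("\u010E", "\u010F"),
    "G WITH CEDILLA": ("\u0122", "\u0123"),
}

for label, (upper, lower) in PAIRS.items():
    # Both cases decompose to a base letter plus the same combining mark.
    print(label, canonical_decomposition(upper), canonical_decomposition(lower))
```

In both pairs the second code point is identical for the capital and the small letter (030C for the caron, 0327 for the cedilla), regardless of how a font draws the accent.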

Ya-phalaa

2003-03-05 Thread Michael Everson
Mijan, Unicode has a mechanism for producing the ya-phalaa conjunct, namely by preceding the ya with virama. This works also in the unusual situation where the consonant the ya-phalaa modifies is an independent vowel. A + VIRAMA + YA + -AA (this is aa-yaphalaa) RA + VIRAMA + ZWJ + YA (this
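
For readers who want to inspect these sequences at the code point level, here is a rough sketch in Python. It assumes the Bengali block is meant (Bengali is named later in the thread), and the labels are the ones quoted in this thread listing, not a statement of what any version of the standard or any rendering engine prescribes. The names `SEQUENCES` and `describe` are illustrative only.

```python
import unicodedata

# Bengali code points assumed for the abstract names used in the thread.
A, RA, YA = "\u0985", "\u09B0", "\u09AF"
VIRAMA, AA_SIGN, ZWJ = "\u09CD", "\u09BE", "\u200D"

# Sequences and labels as quoted in the thread (a 2003 proposal).
SEQUENCES = {
    "aa-yaphalaa": A + VIRAMA + YA + AA_SIGN,
    "reph-ya": RA + VIRAMA + ZWJ + YA,
    "ra-yaphalaa": RA + VIRAMA + YA,
}


def describe(text):
    """List each character of `text` as 'U+XXXX NAME'."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]


for label, seq in SEQUENCES.items():
    print(label)
    for line in describe(seq):
        print("   ", line)
```

How each sequence is actually displayed depends on the font and shaping engine, which is the point under dispute in the replies.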

FAQ entry (was: Looking for information on the UnicodeData file)

2003-03-05 Thread John Cowan
I've reformatted Pim Blokland's question as a Unicode FAQ. Q: What do the terms turned, inverted, reversed, rotated, inverse, digraph, and ligature used in the names of Unicode characters mean? A: These terms are basically typographical rather than Unicode-specific. A turned character is one
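
To see where these terms turn up in actual character names, one can query the name property directly. This is a small sketch using Python's `unicodedata` module; the sample code points and the helper `typographic_terms` are chosen here for illustration and are not part of the FAQ text.

```python
import unicodedata

# The terms listed in the FAQ question, as they appear in character names.
TERMS = ("TURNED", "INVERTED", "REVERSED", "ROTATED", "INVERSE", "DIGRAPH", "LIGATURE")


def typographic_terms(ch):
    """Return the FAQ terms that occur as whole words in the name of `ch`."""
    words = unicodedata.name(ch).split()
    return [term for term in TERMS if term in words]


# Illustrative samples, one per term where an obvious example exists.
for ch in ("\u0250", "\u00BF", "\u018E", "\u25D8", "\u02A3", "\uFB01"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), typographic_terms(ch))
```

Matching on whole words keeps INVERSE from also matching names that contain INVERTED, and vice versa.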

RE: Ya-phalaa

2003-03-05 Thread Michael Everson
At 17:41 + 2003-03-05, Andy White wrote: Unicode has a mechanism for producing the ya-phalaa conjunct, namely by preceding the ya with virama. This works also in the unusual situation where the consonant the ya-phalaa modifies is an independent vowel. A + VIRAMA + YA + -AA (this

Re: The display of *kholam* on PCs

2003-03-05 Thread John Hudson
At 07:57 AM 3/5/2003, Dean Snyder wrote: About the only unusual orthographic phenomenon I can think of related to KHOLEM is that when it occurs after SIN it shares the same dot with SIN. Not always. I have not done a close analysis of manuscript sources, but I wouldn't be surprised to find that

RE: Ya-phalaa

2003-03-05 Thread Andy White
Michael Everson wrote: [...] RA + VIRAMA + ZWJ + YA (this is the reph-ya) RA + VIRAMA + YA (this is the ra-yaphalaa) [...] ... in the Indic OpenType specifications, you will see that a Ra+Virama is recognised as reph before any other processing is applied. [...] If this is the case

RE: Ya-phalaa

2003-03-05 Thread Michael Everson
Andy, the ya-phalaa is a presentation form of conjoined YA, which is produced in Unicode by the sequence VIRAMA + YA. Encoding it as anything else makes very little sense at all. However it is pronounced today in Bengali, and however weird you feel about its being applied to an initial vowel,

Re: The display of *kholam* on PCs

2003-03-05 Thread Dean Snyder
Chris Jacobs wrote at 7:27 PM on Wednesday, March 5, 2003: Chris Jacobs wrote at 12:54 AM on Wednesday, March 5, 2003: But why do you call the kholam a high left dot? As far as I know it can appear high left or middle, to indicate that it should be pronounced after the consonant, or right,

[OT] The project is done

2003-03-05 Thread David Oftedal
Hello! My keymap is done, and is working well. I just wanted to thank everyone who helped me during the construction of all the scripts and tidbits that made it work. Thanks a lot! -Dave Oftedal -- Sonna ojamasan ni ha batsu-geemu namatako pantsu juppun!

RE: Ya-phalaa

2003-03-05 Thread Andy White
Michael, I do not wish to get into yet another long discussion (argument) but I must reply to one point. Your proposed combining ya-phalaa will do Bengali no service, as it will introduce multiple spellings for consonant clusters in -YA. Um, actually if you look, you will not find any place

Re: Malayalam Cillaksharams (was Ya-phalaa)

2003-03-05 Thread Michael Everson
At 21:14 + 2003-03-05, Andy White wrote: I am replying to this portion of the reply as I feel it is a very important revelation. We weren't hiding it. This is part of the improvements to Unicode that have been made for 4.0. One of the tasks I was given was to improve the block descriptions

Re: [OT] The project is done

2003-03-05 Thread Edward H Trager
On Wed, 5 Mar 2003, David Oftedal wrote: Hello! My keymap is done, and is working well. I just wanted to thank everyone who helped me during the construction of all the scripts and tidbits that made it work. I'm curious what keymap and for what language/script that is? Probably I ignored

RE: Ya-phalaa

2003-03-05 Thread jameskass
. Moreover, RA + VIRAMA + YA cannot represent Ra-yaphalaa as Ra+Virama is relied upon as being representative of Reph. For example, in the Indic OpenType specifications, you will see that a Ra+Virama is recognised as reph before any other processing is applied. If this is the case (and one

length of text by different languages

2003-03-05 Thread Yung-Fong Tang
I remember there was a study showing that, although UTF-8 encodes each Japanese/Chinese character in 3 bytes, Japanese/Chinese usually uses FEWER characters in writing to communicate information than alphabet-based languages. Can anyone point me to such research? Martin, do you have some paper
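
The quantity being asked about, characters versus UTF-8 bytes, is easy to measure for any given text. Below is a minimal sketch; the helper `utf8_cost` is made up for illustration, and the two sample strings are arbitrary, not data from any study.

```python
def utf8_cost(text):
    """Return (number of code points, number of UTF-8 bytes) for `text`."""
    return len(text), len(text.encode("utf-8"))


# Arbitrary illustrative samples, not drawn from any corpus or study.
for sample in ("日本語", "Japanese"):
    chars, octets = utf8_cost(sample)
    print(f"{sample!r}: {chars} characters, {octets} bytes")
```

Comparing real translations of the same document this way would show whether the smaller character count offsets the 3-byte encoding; the sketch only demonstrates the measurement.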

RE: Ya-phalaa

2003-03-05 Thread Andy White
I once wrote: My thoughts were to put a ZWNJ after the Ra to indicate that it is not to form a Reph, e.g. Ra+ZWNJ+Virama+Ya = Ra+Jophola. Then I remembered that in some font designs, secondary forms such as jophola can form a conjunct ligature with the preceding consonant. I think that a

RE: Ya-phalaa

2003-03-05 Thread jameskass
. Andy White wrote, No! This is an example of stating something that can be read in two ways - Hmmm, kind of like RA+VIRAMA+YA in current implementations? unfortunately you took an unintended meaning :-( Actually, I did get the intended meaning. Unfortunately, though, I didn't get it until

Re: Looking for information on the UnicodeData file

2003-03-05 Thread Asmus Freytag
At 04:57 PM 3/5/03 +0100, Pim Blokland wrote: I apologize if this question has been asked before, but I'm relatively new at this. My question is: where can I find formal definitions of the terms used in the Character Name field of the UnicodeData.txt file? Most specifically, precise

Re: Looking for information on the UnicodeData file

2003-03-05 Thread Rick McGowan
By the way, the FAQ was updated today, thanks to people on this list. Rick My question is: where can I find formal definitions of the terms used in the Character Name field of the UnicodeData.txt file? Most

RE: Ya-phalaa

2003-03-05 Thread Andy White
Jameskass wrote: If a font designer makes a special ligature form of RA+JOPHOLA, then the easy solution would be to put a look-up in the font's GSUB table: RA + ZWNJ + VIRAMA + YA --- my special ligature form Now that simplicity makes me smile :-) I would be surprised if anyone (even

RE: length of text by different languages

2003-03-05 Thread Francois Yergeau
[EMAIL PROTECTED] wrote: I remember there was a study showing that, although UTF-8 encodes each Japanese/Chinese character in 3 bytes, Japanese/Chinese usually uses FEWER characters in writing to communicate information than alphabet-based languages. Can anyone point me to such research?