I apologize if this question has been asked
before, but I'm relatively new at this.
My question is: where can I find formal definitions
of the terms used in the Character Name field of the UnicodeData.txt file? Most
specifically, precise explanations of designations like "turned", "inverse",
John Hudson wrote:
In the Slovak orthography, the lowercase d, l and t are normally written
with the 'apostrophe' form of the accent.
Then why does UnicodeData break them down as (e.g.) 0064 030C rather than
0064 0315?
Pim Blokland
Chris Jacobs wrote at 12:54 AM on Wednesday, March 5, 2003:
But why do you call the kholam a high left dot?
As far as I know it can appear high left or middle, to indicate that it
should be pronounced after the consonant, or right, to pronounce it before.
So the meaning of a shin with two dots
Quoting Marco Cimarosti [EMAIL PROTECTED]:
Mijan wrote:
[...]
3. There are no other cases of a Vowel+Virama combination in the
Unicode encoding model.
Yes, there are. Khmer.
I do not understand Khmer but I see that it does not use the
same 'encoding model'. Please look,
Quoting Kent Karlsson [EMAIL PROTECTED]:
I understand that unicode is supposed to represent the
language, not the way it is written.
No, Unicode is supposed to be able to represent the written
form. (Of course.)
Yes, I was wrong! I think I wanted to say something like, Unicode is
Pim Blokland scripsit:
Then why does UnicodeData break them down as (e.g.) 0064 030C rather than
0064 0315?
To keep the upper case and lower case characters in sync for decomposition,
they always have the same combining characters. For another example, G with
cedilla gets the cedilla on top
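For readers who want to check the decomposition field directly, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `combining_marks` is made up for illustration, and the output reflects whatever Unicode version the local Python build ships with.

```python
import unicodedata

# Upper/lower case pairs mentioned in the thread.
PAIRS = [
    ("\u010E", "\u010F"),  # D / d with caron
    ("\u0122", "\u0123"),  # G / g with cedilla
]

def combining_marks(ch):
    """Return the code points that follow the base letter in the
    decomposition field of UnicodeData.txt (hypothetical helper)."""
    return unicodedata.decomposition(ch).split()[1:]

for upper, lower in PAIRS:
    for ch in (upper, lower):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
              f"{unicodedata.decomposition(ch)}")
    # Both cases of a pair list the same combining character.
    print("  same marks:", combining_marks(upper) == combining_marks(lower))
```

The lowercase d with caron reports 0064 030C, matching the sequence quoted in the question, even though the glyph is usually drawn with an apostrophe-like accent.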
Mijan,
Unicode has a mechanism for producing the ya-phalaa conjunct, namely
by preceding the ya with virama. This works also in the unusual
situation where the consonant the ya-phalaa modifies is an
independent vowel.
A + VIRAMA + YA + -AA (this is aa-yaphalaa)
RA + VIRAMA + ZWJ + YA (this
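The sequences discussed in this thread can be spelled out as code points. The sketch below uses only Python's standard `unicodedata` module; the dictionary labels follow the wording of the messages quoted here and are not official terminology, and how each sequence renders depends entirely on the font and shaping engine.

```python
import unicodedata

A      = "\u0985"  # BENGALI LETTER A
RA     = "\u09B0"  # BENGALI LETTER RA
YA     = "\u09AF"  # BENGALI LETTER YA
AA     = "\u09BE"  # BENGALI VOWEL SIGN AA
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA
ZWJ    = "\u200D"  # ZERO WIDTH JOINER

# Labels are taken from the thread, not from the standard.
SEQUENCES = {
    "aa-yaphalaa": A + VIRAMA + YA + AA,
    "reph-ya":     RA + VIRAMA + ZWJ + YA,
    "ra-yaphalaa": RA + VIRAMA + YA,
}

def describe(seq):
    """List 'U+XXXX NAME' for every code point in the sequence."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in seq]

for label, seq in SEQUENCES.items():
    print(label)
    for line in describe(seq):
        print("   ", line)
```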
I've reformatted Pim Blokland's question as a Unicode FAQ.
Q: What do the terms turned, inverted, reversed, rotated,
inverse, digraph, and ligature used in the names of Unicode
characters mean?
A: These terms are basically typographical rather than Unicode-specific.
A turned character is one
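To see where these terms actually occur, one can look them up in the character names themselves. This is a small sketch with Python's standard `unicodedata` module; the helper `name_terms` and the sample list are illustrative choices, not part of the FAQ.

```python
import unicodedata

# The terms asked about in the question.
TERMS = ("TURNED", "INVERTED", "REVERSED", "ROTATED",
         "INVERSE", "DIGRAPH", "LIGATURE")

def name_terms(ch):
    """Return the terms from TERMS that appear as whole words
    in the character's name (hypothetical helper)."""
    words = unicodedata.name(ch).split()
    return [t for t in TERMS if t in words]

# A few sample characters, one per term where an example is well known.
for ch in ("\u0250", "\u00BF", "\u018E", "\u25D8", "\u02A3", "\u0132"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {name_terms(ch)}")
```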
At 17:41 + 2003-03-05, Andy White wrote:
Unicode has a mechanism for producing the ya-phalaa conjunct, namely
by preceding the ya with virama. This works also in the unusual
situation where the consonant the ya-phalaa modifies is an
independent vowel.
A + VIRAMA + YA + -AA (this
At 07:57 AM 3/5/2003, Dean Snyder wrote:
About the only unusual orthographic phenomenon I can think of related
to KHOLEM is that when it occurs after SIN it shares the same dot with SIN.
Not always. I have not done a close analysis of manuscript sources, but I
wouldn't be surprised to find that
Michael Everson wrote:
[...]
RA + VIRAMA + ZWJ + YA (this is the reph-ya)
RA + VIRAMA + YA (this is the ra-yaphalaa)
[...]
... in the
Indic OpenType specifications, you will see that a
Ra+Virama is recognised as reph before any other processing
is applied.
[...]
If this is the case
Andy, the ya-phalaa is a presentation form of cojoined YA, which is
produced in Unicode by the sequence VIRAMA + YA. Encoding it as
anything else makes very little sense at all. However it is
pronounced today in Bengali, and however weird you feel about its
being applied to an initial vowel,
Chris Jacobs wrote at 7:27 PM on Wednesday, March 5, 2003:
Chris Jacobs wrote at 12:54 AM on Wednesday, March 5, 2003:
But why do you call the kholam a high left dot?
As far as I know it can appear high left or middle, to indicate that it
should be pronounced after the consonant, or right,
Hello!
My keymap is done, and is working well. I just wanted to thank everyone
who helped me during the construction of all the scripts and tidbits
that made it work.
Thanks a lot!
-Dave Oftedal
--
Sonna ojamasan ni ha batsu-geemu namatako pantsu juppun!
Michael, I do not wish to get into yet another long discussion
(argument) but I must reply to one point.
Your proposed combining ya-phalaa will do Bengali no service, as it
will introduce multiple spellings for consonant clusters in -YA.
Um, actually if you look, you will not find any place
At 21:14 + 2003-03-05, Andy White wrote:
I am replying to this portion of the reply as I feel it is a very
important revelation.
We weren't hiding it. This is part of the improvements to Unicode
that have been made for 4.0. One of the tasks I was given was to
improve the block descriptions
On Wed, 5 Mar 2003, David Oftedal wrote:
Hello!
My keymap is done, and is working well. I just wanted to thank everyone
who helped me during the construction of all the scripts and tidbits
that made it work.
I'm curious what keymap and for what language/script that is? Probably I
ignored
Moreover, RA + VIRAMA + YA cannot represent Ra-yaphalaa as Ra+Virama
is relied upon as being representative of Reph.
For example, in the Indic OpenType specifications, you will see that a
Ra+Virama is recognised as reph before any other processing is applied.
If this is the case (and one
I remember there was a study showing that, although UTF-8 encodes each
Japanese/Chinese character in 3 bytes, Japanese/Chinese usually use
FEWER characters in writing to communicate information than alphabet-
based languages.
Any one can point to me such research? Martin, do you have some paper
I once wrote:
My thoughts were to put a ZWNJ after the Ra to indicate that is not
to form a Reph e.g. Ra+ZWNJ+Virama+Ya = Ra+Jophola
Then I remembered that in some font designs, secondary forms such
as jophola can form a conjunct ligature with the preceding
consonant.
I think that a
Andy White wrote,
No!
This is an example of stating something that can be read in two ways -
Hmmm, kind of like RA+VIRAMA+YA in current implementations?
unfortunately you took an unintended meaning :-(
Actually, I did get the intended meaning. Unfortunately, though,
I didn't get it until
At 04:57 PM 3/5/03 +0100, Pim Blokland wrote:
I apologize if this question has been asked before, but I'm relatively new
at this.
My question is: where can I find formal definitions of the terms used in
the Character Name field of the UnicodeData.txt file? Most specifically,
precise
By the way, the FAQ was updated today, thanks to people on this list.
Rick
My question is: where can I find formal definitions of the terms used
in the Character Name field of the UnicodeData.txt file? Most
Jameskass wrote:
If a font designer makes a special ligature form of
RA+JOPHOLA, then the easy solution would be to put a look-up
in the font's GSUB table:
RA + ZWNJ + VIRAMA + YA --- my special ligature form
Now that simplicity makes me smile :-)
I would be surprised if anyone (even
[EMAIL PROTECTED] wrote:
I remember there was a study showing that, although UTF-8 encodes each
Japanese/Chinese character in 3 bytes, Japanese/Chinese usually use
FEWER characters in writing to communicate information than
alphabet-based languages.
Any one can point to me such research?
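The byte-count half of that claim is easy to check; the information-density half is what the research would have to show. A minimal sketch in Python follows, where the helper name `utf8_cost` and the sample strings are arbitrary illustrations and say nothing about how much text either language needs to express the same idea.

```python
def utf8_cost(text):
    """Return (number of code points, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))

# Arbitrary samples: three kanji versus a seven-letter ASCII word.
for sample in ("\u65E5\u672C\u8A9E", "Unicode"):
    chars, nbytes = utf8_cost(sample)
    print(f"{sample!r}: {chars} characters, {nbytes} bytes")
```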