Marco, that is a very nice FAQ; the only addition I would suggest is to
also point to http://www.unicode.org/unicode/standard/where/

Mark
—————

Γνῶθι σαυτόν — Θαλῆς
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "Marco Cimarosti" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Friday, February 22, 2002 07:21
Subject: FAQ proposal (was RE: Combining letters in Devanagiri)

> Varada wrote:
> > I am developing a Unicode editor for Devanagari and have a
> > question about combined letters in Devanagari.
> >
> > For example, if I have to form a word like "PATNI", it should
> > have first half of "PA" + "TA" + "NA" + "I".
> >
> > Likewise, if I have to form the word "HAMSA", it should have full
> > "HA" + half "MA" + full "SA".
> >
> > I downloaded the Unicode 3.2 beta and could not find codes for half
> > letters. I would like to know how these are supported in Unicode.
>
> As this question has been raised and answered many times, and not
> everybody has a copy of TUS or can read PDF files, I propose to
> paraphrase Varada's question into a specific FAQ, to be added on
> <http://www.unicode.org/unicode/faq/indic.html>, possibly as the first
> question.
>
> «
> Q: I cannot find on the Unicode charts the "half forms" of Devanagari
> letters (or of any other Indic script). These characters are needed to
> form words such as "patni".
>
> A: Unicode does not encode half or subjoined letters for the scripts of
> India. As in the ISCII standard, Unicode forms all "consonant clusters"
> (such as the "tn" in "patni") by inserting the character "virama" (or
> "halant") between the two relevant consonant letters.
>
> For instance, the Devanagari syllable "tna" ("त्न") is encoded with the
> following code points:
>
>     U+0924 (त DEVANAGARI LETTER TA)
>     U+094D (् DEVANAGARI SIGN VIRAMA = halant)
>     U+0928 (न DEVANAGARI LETTER NA)
>
> These three characters will normally be displayed using the single glyph
> <tna ligature> ("त्न"). But it is also possible that they are displayed
> using a <half ta> glyph followed by a <full na> glyph ("त्न"), or even
> with a <full ta> glyph combined with a <virama> glyph and followed by a
> <full na> glyph ("त्न").
>
> Which form is actually displayed is decided by an underlying software
> module called the "display engine", which bases this decision on the
> availability of glyphs in the font.
>
> If the sequence U+0924, U+094D is not followed by another consonant
> letter (such as "na"), it is always displayed as a <full ta> glyph
> combined with the <virama> glyph ("त्").
>
> Unicode provides a way to force the display engine to show a half letter
> form. To do this, an invisible character called ZERO WIDTH JOINER should
> be inserted after the virama:
>
>     U+0924 (त DEVANAGARI LETTER TA)
>     U+094D (् DEVANAGARI SIGN VIRAMA = halant)
>     U+200D (zwj ZERO WIDTH JOINER)
>     U+0928 (न DEVANAGARI LETTER NA)
>
> This sequence is always displayed as a <half ta> glyph followed by a
> <full na> glyph ("त्न"). Even if the consonant "na" is not present, the
> sequence U+0924, U+094D, U+200D is displayed as a <half ta> glyph ("त्").
>
> Unicode also provides a way to force the display engine to show the
> <virama> glyph. To do this, an invisible character called ZERO WIDTH
> NON-JOINER should be inserted after the virama:
>
>     U+0924 (त DEVANAGARI LETTER TA)
>     U+094D (् DEVANAGARI SIGN VIRAMA = halant)
>     U+200C (zwnj ZERO WIDTH NON-JOINER)
>     U+0928 (न DEVANAGARI LETTER NA)
>
> This sequence is always displayed as a <full ta> glyph combined with a
> <virama> glyph and followed by a <full na> glyph ("त्न").
>
> For more detailed information, see Chapter 9 of the Unicode Standard,
> "South and Southeast Asian Scripts"
> <http://www.unicode.org/unicode/uni2book/ch09.pdf>.
> »
>
> I don't know if all the glyphs in this e-mail will display correctly for
> everybody. However, I can provide GIF images for all the examples.
>
> _ Marco
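
For readers who want to try the three encodings from the proposed FAQ, here is a minimal Python sketch. It only builds the code point sequences listed above and prints their character names; the helper names `cluster` and `describe` are made up for this illustration. Which glyphs actually appear on screen still depends on the display engine and the font, as the FAQ text says.

```python
import unicodedata

# Code points taken from the FAQ text above.
TA = "\u0924"      # DEVANAGARI LETTER TA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)
NA = "\u0928"      # DEVANAGARI LETTER NA
ZWJ = "\u200D"     # ZERO WIDTH JOINER: requests the half form
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER: requests the visible virama


def cluster(first, second, joiner=""):
    """Hypothetical helper: join two consonants with a virama.

    An optional ZWJ or ZWNJ is placed right after the virama.
    """
    return first + VIRAMA + joiner + second


def describe(text):
    """Hypothetical helper: list each character as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


sequences = {
    "default (display engine chooses the form)": cluster(TA, NA),
    "half form requested (ZWJ)": cluster(TA, NA, ZWJ),
    "visible virama requested (ZWNJ)": cluster(TA, NA, ZWNJ),
}

for label, seq in sequences.items():
    print(label)
    for line in describe(seq):
        print("  " + line)
```

Note that the plain sequence is three characters long and the two forced variants are four, even though each may be drawn as one or two glyphs.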

