Varada wrote:

> I am developing an uni code editor for Devanagiri and have a
> clarification on combine letters in devanagiri.
>
> For Eg if have to form a word that like "PATNI" It should have first
> half of "PA" + "TA" + "NA" + "I" .
>
> So also if I have to form a word "HAMSA" it should have full "HA" +
> half "MA" + full "SA".
>
> I downloaded the Unicode 3.2 beta and could not find codes for half
> letters. Would like to know how are these supported in Unicode ?
As this question has been raised and answered many times, and not everybody has a copy of TUS (The Unicode Standard) or can read PDF files, I propose to paraphrase Varada's question into a specific FAQ entry, to be added to <http://www.unicode.org/unicode/faq/indic.html>, possibly as the first question.

«
Q: I cannot find the "half forms" of Devanagari letters (or of any other Indic script) in the Unicode charts. These characters are needed to form words such as "patni".

A: Unicode does not encode half or subjoined letters for the scripts of India. As in the ISCII standard, Unicode forms all "consonant clusters" (such as the "tn" in "patni") by inserting the character "virama" (or "halant") between the two relevant consonant letters.

For instance, the Devanagari syllable "tna" ("त्न") is encoded with the following code points:

    U+0924 (त DEVANAGARI LETTER TA)
    U+094D (् DEVANAGARI SIGN VIRAMA = halant)
    U+0928 (न DEVANAGARI LETTER NA)

These three characters will normally be displayed using the single glyph <tna ligature> ("त्न"). But they may also be displayed using a <half ta> glyph followed by a <full na> glyph ("त्न"), or even with a <full ta> glyph combined with a <virama> glyph and followed by a <full na> glyph ("त्न").

Which form is actually displayed is decided by an underlying software module called the "display engine", which bases this decision on the availability of glyphs in the font.

If the sequence U+0924, U+094D is not followed by another consonant letter (such as "na"), it is always displayed as a <full ta> glyph combined with the <virama> glyph ("त्").

Unicode provides a way to force the display engine to show a half letter form. To do this, an invisible character called ZERO WIDTH JOINER should be inserted after the virama:

    U+0924 (त DEVANAGARI LETTER TA)
    U+094D (् DEVANAGARI SIGN VIRAMA = halant)
    U+200D (zwj ZERO WIDTH JOINER)
    U+0928 (न DEVANAGARI LETTER NA)

This sequence is always displayed as a <half ta> glyph followed by a <full na> glyph ("त्न").
Even if the consonant "na" is not present, the sequence U+0924, U+094D, U+200D is displayed as a <half ta> glyph ("त्").

Unicode also provides a way to force the display engine to show the <virama> glyph. To do this, an invisible character called ZERO WIDTH NON-JOINER should be inserted after the virama:

    U+0924 (त DEVANAGARI LETTER TA)
    U+094D (् DEVANAGARI SIGN VIRAMA = halant)
    U+200C (zwnj ZERO WIDTH NON-JOINER)
    U+0928 (न DEVANAGARI LETTER NA)

This sequence is always displayed as a <full ta> glyph combined with a <virama> glyph and followed by a <full na> glyph ("त्न").

For more detailed information, see Chapter 9 of the Unicode Standard, "South and Southeast Asian Scripts" <http://www.unicode.org/unicode/uni2book/ch09.pdf>.
»

I don't know whether all the glyphs in this e-mail will display correctly for everybody. However, I can provide GIF images for all the examples.

_ Marco
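
P.S. For anyone who wants to experiment with the sequences described above, here is a minimal sketch in Python. The helper names `cluster` and `describe` (and the `joiner` parameter) are only illustrative choices and not part of any Unicode API. The code only builds the strings and lists their code points; which glyphs actually appear still depends on the display engine and the font.

```python
import unicodedata

TA = "\u0924"      # DEVANAGARI LETTER TA
NA = "\u0928"      # DEVANAGARI LETTER NA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)
ZWJ = "\u200D"     # ZERO WIDTH JOINER: requests the half form
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER: requests the visible virama


def cluster(first, second="", joiner=""):
    """Hypothetical helper: consonant + virama + optional joiner + consonant."""
    return first + VIRAMA + joiner + second


def describe(text):
    """List each character of `text` as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


sequences = {
    "plain (engine chooses the form)": cluster(TA, NA),
    "half ta requested with ZWJ": cluster(TA, NA, ZWJ),
    "visible virama requested with ZWNJ": cluster(TA, NA, ZWNJ),
    "half ta alone": cluster(TA, joiner=ZWJ),
}

for label, text in sequences.items():
    print(label)
    for line in describe(text):
        print("   ", line)
```

Pasting the resulting strings into an editor with a Devanagari-capable font is a quick way to see which of the three renderings a given display engine picks.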

