RE: Indic Devanagari Query

Marco Cimarosti Wed, 29 Jan 2003 03:06:07 -0800

Aditya Gokhale wrote:
> Hello Everybody,
>     I had few query regarding representation of Devanagari 
> script in Unicode


All your questions are FAQs, so I'll just reference the entries that
answer them.

> (Code page - 0x0900 - 0x097F). Devanagari is a writing 
> script, is used in Hindi, Marathi and Sanskrit languages. I 
> have following questions - 

Unicode has no code pages (U+0900..U+097F is the Devanagari block):
        http://www.unicode.org/faq/basic_q.html#18

> 1. In Marathi and Sanskrit language two characters glyphs of 
> 'la' and 'sha' are represented differently as shown in the 
> image below - 
>  (First glyph is 'la' and second one is 'sha')
> as compared to Hindi where these character glyphs are 
> represented as shown in the image below - 
> (First glyph is 'la' and second one is 'sha')

Unicode encodes (abstract) characters, not glyphs:
        http://www.unicode.org/faq/han_cjk.html#3

(This FAQ is in the Chinese/Japanese/Korean section because it is more often
raised for Chinese ideograms.)
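
As a small illustration (my own sketch, not taken from the FAQ), Python's
standard unicodedata module shows that 'la' and 'sha' each have a single
code point, whichever language the text is in. The helper name char_info is
made up for this example; which glyph shape is drawn is left to the font.

```python
import unicodedata

LA = "\u0932"   # DEVANAGARI LETTER LA
SHA = "\u0936"  # DEVANAGARI LETTER SHA

def char_info(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# The same two code points are used in Hindi, Marathi and Sanskrit text;
# nothing in the encoded character records a language or a glyph shape.
for ch in (LA, SHA):
    print(char_info(ch))
```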

> In the same script code page, how do I use these two 
> different Glyphs, to represent the same character ? Is there 
> any way by which I can do it in an Open type font and Free 
> type font implementation ?

Unicode's requirements for fonts:
        http://www.unicode.org/faq/font_keyboard.html#1

A few links to OpenType stuff:
        http://www.unicode.org/faq/font_keyboard.html#4

> 2. Implementation Query - 
>     In an implementation where I need to send / process 
> Hindi, Marathi and Sanskrit data, how do I differentiate 
> between languages (Hindi, Marathi and Sanskrit). Say for 
> example, I am writing a translation engine, and I want to 
> translate a document having Hindi, Marathi and Sanskrit Text 
> in it, how do I know from the code points between 0x0900 and 
> 0x097F, that the data under perusal is Hindi / Marathi / Sanskrit ?

What you need here is some sort of language tagging:
        http://www.unicode.org/faq/languagetagging.html
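
A minimal sketch of what out-of-band tagging could look like, assuming your
engine can carry a language code next to each run of text. The names
TaggedRun, is_devanagari and runs_for are hypothetical, and "hi", "mr", "sa"
are the ISO 639 codes for Hindi, Marathi and Sanskrit. In practice the tag
would come from your markup or protocol (e.g. xml:lang in XML).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaggedRun:
    lang: str  # language code supplied by the markup or protocol
    text: str  # the Unicode text itself

def is_devanagari(text):
    """True if every non-space character lies in the block U+0900..U+097F."""
    return all(0x0900 <= ord(ch) <= 0x097F for ch in text if not ch.isspace())

def runs_for(runs, lang):
    """Select the text of the runs tagged with the given language."""
    return [run.text for run in runs if run.lang == lang]

# One word (na, ma, sa, virama, ta, vowel sign e), reused for illustration only.
WORD = "\u0928\u092E\u0938\u094D\u0924\u0947"

# The code points are identical in all three runs; only the tag differs.
document = [TaggedRun("hi", WORD), TaggedRun("mr", WORD), TaggedRun("sa", WORD)]
print(runs_for(document, "mr"))
```

The point of the sketch is that is_devanagari() can only tell you the
script; the language has to travel alongside the text.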

>     I would suggest that we should give different code pages 
> for Marathi, Hindi and Sanskrit. May be current code page of 
> Devanagari can be traded as Hindi and two new code pages for 
> Marathi and Sanskrit be added. This could solve these issues. 
> If there is any better way of solving this, any one suggest.

Characters are encoded "per script", not "per language":
        http://www.unicode.org/faq/basic_q.html#17

> 3. Character codes for jna, shra, ksh - 
> 
> In Sanskrit and Marathi jna, shra and ksh are considered as 
> separate characters and not ligatures. How do we take care of 
> this ? Can I get over all views on the matter from the group 
> ? In my opinion they should be given different code points in 
> the specific language code page.
> Please find below the character glyphs - 

Unicode encodes Indic analytically:
        http://www.unicode.org/faq/indic.html#17
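
"Analytically" means that each conjunct is spelled as consonant + virama +
consonant, and the font forms the ligature. A short sketch of the three
conjuncts you mention (the dictionary keys and the helper describe are my
own naming for this example):

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

# Each conjunct is a sequence of three code points, not a single character.
CONJUNCTS = {
    "jna":  "\u091C" + VIRAMA + "\u091E",  # JA + VIRAMA + NYA
    "ksha": "\u0915" + VIRAMA + "\u0937",  # KA + VIRAMA + SSA
    "shra": "\u0936" + VIRAMA + "\u0930",  # SHA + VIRAMA + RA
}

def describe(text):
    """List the code point and character name of every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for name, sequence in CONJUNCTS.items():
    print(name, describe(sequence))
```

If your application must treat these as single letters (for sorting or
cursor movement, say), that is handled at a higher level than the character
codes.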

> thanks,

For more details about Devanagari in Unicode, see Chapter 9 of the Standard:
        http://www.unicode.org/uni2book/ch09.pdf

_ Marco
