Re: Telugu Unicode Encoding Review

Kiran Kumar Chava Sat, 23 Oct 2010 06:02:44 -0700

After going through the feedback I received here and by personal mail, I
spent some more time on this. I have tried to divide the discussion into
bullet points: wherever I have an answer, I put it in the questions and
answers section; wherever I don't, I put it in the open issues section. The
result is another post on my blog, which I am pasting here for reference. I
also went through the related mails on racchabanda and Unicode.org.
References are given at the end.


This is a follow-up to my previous post, Telugu Unicode Encoding Review
(http://geek.chavakiran.com/archives/55 )

*1. Is anybody using reserved code points for private use?*

a. No, AFAIK. I haven't seen any such instances. But the reason may be the
lack of serious computing work in the Telugu language beyond what is
happening on the web and on PCs. Once people start using Unicode on mobile
devices and for book publishing (an area now dominated by the dynamic
encoding from Anu), we may get into this mess.

*2. ఌ and ౡ (and other related characters) are not assigned consecutive
code points. Is this a problem for sorting?*

*a.* The answer is no, because sorting is supposed to follow the Unicode
collation charts, and in these charts, as shown in my previous post
<http://geek.chavakiran.com/archives/55>, the order looks OK. But if some
encoding replaces Unicode in the future, I guess we had better have them in
order, since that might save sorting time. Moreover, the characters that are
not placed at consecutive code points are rarely used in Telugu (see my post
on Telugu character usage, http://archives.chavakiran.com/?p=254 ), so I
guess we are spending unnecessary CPU time just for the sake of these rarely
used characters. (FYI, the out-of-order code points are ఋ and ౠ, ఌ and ౡ,
ౘ, ౙ, ళ, ఱ.) So for all practical purposes we can probably just do a binary
sort and move on! Of these characters only ~La (ళ) seems to be used with
good frequency.
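
To make the difference concrete, here is a minimal Python sketch comparing
a plain code point (binary) sort with a sort that uses a small reordering
key. The REORDER table and the telugu_key helper are hypothetical names
invented for this illustration; they only mimic the traditional alphabetical
order for two letters and are not the Unicode Collation Algorithm.

```python
# Hypothetical remap (an assumption for illustration, not real collation data):
# place each long vocalic letter right after its short counterpart.
REORDER = {
    0x0C60: 0x0C0B + 0.5,  # U+0C60 (vocalic RR) sorts just after U+0C0B (vocalic R)
    0x0C61: 0x0C0C + 0.5,  # U+0C61 (vocalic LL) sorts just after U+0C0C (vocalic L)
}


def telugu_key(text):
    """Return a sort key: code points, with the remapped letters moved."""
    return [REORDER.get(ord(ch), ord(ch)) for ch in text]


# KA, VOCALIC RR, VOCALIC R, E
letters = ["\u0C15", "\u0C60", "\u0C0B", "\u0C0E"]

by_code_point = sorted(letters)            # binary order
by_key = sorted(letters, key=telugu_key)   # reordered

print([f"U+{ord(c):04X}" for c in by_code_point])
print([f"U+{ord(c):04X}" for c in by_key])
```

In the binary sort, U+0C60 lands after all the consonants; with the key it
follows U+0C0B. Text that contains none of the out-of-order letters sorts
identically either way.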

*3. Mr. Chava, you said "Telugu digits are not taught in school". Does that
mean they are unnecessarily present in the Unicode encoding?*

Hmmm... Not exactly. Even though the Telugu digits were not taught in school
in my days, I guess they have been taught in recent years. Moreover, there
are attempts to make people aware of them; for example, Hyderabad city buses
now carry numbers in both Telugu digits and Indo-Arabic numerals. Most
importantly, religious and classical Telugu books printed very recently also
use these digits. For images, see my previous blog post
<http://geek.chavakiran.com/archives/55>. My only point is that font
developers should feel free to provide Indo-Arabic numerals for Telugu
digits as well.
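
At the data level the two digit sets map one to one, since the Telugu digits
occupy U+0C66 to U+0C6F in order. A minimal Python sketch of converting
between them follows; the helper names telugu_to_ascii and ascii_to_telugu
are made up for this example.

```python
TELUGU_ZERO = 0x0C66  # TELUGU DIGIT ZERO; digits one to nine follow in order

# Translation tables for str.translate (code point -> replacement string).
TO_ASCII = {TELUGU_ZERO + i: str(i) for i in range(10)}
TO_TELUGU = {ord(str(i)): chr(TELUGU_ZERO + i) for i in range(10)}


def telugu_to_ascii(text):
    """Replace Telugu digits with Indo-Arabic (ASCII) digits."""
    return text.translate(TO_ASCII)


def ascii_to_telugu(text):
    """Replace Indo-Arabic (ASCII) digits with Telugu digits."""
    return text.translate(TO_TELUGU)


telugu_number = "\u0C67\u0C68\u0C69"  # Telugu digits one, two, three
print(telugu_to_ascii(telugu_number))
print(ascii_to_telugu("2010"))
print(int(telugu_number))  # Python's int() accepts Unicode decimal digits
```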

*4. Mr. Chava, you said the current Telugu Unicode encoding is flawed. Do
you detest the Unicode encoding?*

No. I love it for all the scenarios it has enabled for Telugu people in
digital life. That is why I am spending time on it.

*5. Is the avagraha symbol encoded?*

*Yes: U+0C3D.*
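
A quick way to check this, sketched with Python's standard unicodedata
module (it assumes the interpreter ships Unicode 5.1 data or later):

```python
import unicodedata

AVAGRAHA = "\u0C3D"

# Look up the official character name for the code point.
avagraha_name = unicodedata.name(AVAGRAHA)
print(f"U+{ord(AVAGRAHA):04X} {avagraha_name}")
```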

*6. Does the OM (AUM) symbol need a code point in Telugu?*

My personal opinion: no. Unless I am missing something, Telugu Om is always
a combination of 'O' and '~M'. Even on temples and calendars in Telugu land,
the Devanagari OM is used, and wherever Telugu Om is used it is a simple
combination of 'O' and '~M'. There may be one or two special cases, but
those must be artistic freedom and may not require a code point.
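
As an illustration, the combination can be written with two existing code
points. This sketch assumes that 'O' here means the long vowel U+0C13 and
that '~M' means the anusvara U+0C02; if the short vowel U+0C12 is meant
instead, only the first code point changes.

```python
import unicodedata

# Assumption: 'O' = U+0C13 (long OO), '~M' = U+0C02 (anusvara).
om = "\u0C13" + "\u0C02"

om_parts = [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in om]
for code_point, name in om_parts:
    print(code_point, name)
```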

*Open issues:*

*1. Telugu danda and double danda are to be encoded.*

(I saw some discussions of this here and there, but none conclusive. Has a
decision been made?)

*2. How to encode the notation used in Telugu music books (for example a dot
above a character, a dot below a character, a horizontal line above a
character, a dot just before a character)?*

*3. How to encode a Vedic book in Telugu script (for example a vertical line
over a character, a horizontal line below a character)?*

Answer? Do we need to use the code points from the Vedic block?
http://www.unicode.org/charts/PDF/U1CD0.pdf

*4. Are Guruvu and Laguvu to be encoded with new code points?*

(Suggested by Suresh Kolichala in Racchabanda mailing list)

*6. The Yati symbol is to be encoded with a new code point.*

(Suggested by Suresh Kolichala in Racchabanda mailing list)

*7. Is there any way to encode Tala kaTTu?*

*8. Is there any way to encode ka ottu (క్క, the second half of the
preceding glyph)? This is required to encode a Telugu alphabet textbook,
where children are first taught ka ottu and then, after a few lessons, are
taught to combine it with other vowels. The same question applies to all the
other ottulu.*
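
For context, here is a small Python sketch of how the full cluster above is
stored today. The subjoined ka has no code point of its own; it appears only
as the rendering of virama followed by ka, which is why the isolated ottu is
the open question. The variable names are illustrative.

```python
import unicodedata

# The cluster from the question: KA + VIRAMA + KA.
ka_cluster = "\u0C15\u0C4D\u0C15"

cluster_parts = [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in ka_cluster]
for code_point, name in cluster_parts:
    print(code_point, name)
```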

*9. What are the pros and cons of the new encoding scheme I proposed for
Telugu script (section 9 of my blog post
<http://geek.chavakiran.com/archives/55>)? Is this discussed somewhere?*

*References*

1. http://groups.yahoo.com/group/racchabanda/message/15576 Discussion on
the tzh character in Telugu.

2. http://groups.yahoo.com/group/racchabanda/message/16367 RB mail after
the previous changes to Telugu Unicode.

3. http://groups.yahoo.com/group/racchabanda/message/16378 A discussion on
musical symbols in Telugu.

4. http://unicode.org/alloc/nonapprovals.html Non-approval of arda visarga.

5. http://unicode.org/~emuller/southasia/vedic/ Encoding of Vedic.




----
Thanks,
కిరణ్ కుమార్ చావా
http://te.chavakiran.com/blog
http://en.chavakiran.com/blog





2010/10/17 Frédéric Grosshans <[email protected]>

> On Saturday 16 October 2010 at 22:36 +0530, Kiran Kumar Chava wrote:
> > At the link http://geek.chavakiran.com/archives/55 , I tried to
> > understand the Telugu Unicode encoding and then to do an out-of-the-box
> > review of it. Kindly let me know if I am missing something, and whether
> > the items mentioned as missing in the above article are really missing
> > or not. Any other views...
>
> The 13 Telugu characters added in Unicode 5.1, including the fractions,
> are enumerated here :
> http://www.unicode.org/charts/PDF/Unicode-5.1/U51-0C00.pdf .
>
> The rationale for their inclusion is documented in
> http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3116.pdf (which proposed 18
> characters). I have not looked closely enough to check whether the 5
> "missing" characters are linked to the ones you consider missing.
>
>        Frédéric
>
> --
> Frédéric Grosshans
> Chargé de Recherche
> Laboratoire de Photonique Quantique et Moléculaire
> ENS Cachan / CNRS UMR 8437
> tel: (+33)1 47 40 77 15
> GSM: (+33)6 09 24 29 64
> e-mail: [email protected]
>
>
