On 01/03/2004 12:30, Philippe Verdy wrote:

On 01/03/2004 00:18, Asomiddin Atoev wrote:



I am emailing on behalf of the Tajikistani state
working group on localizing software for the Tajik
language. Could you please kindly guide us in the
right direction? What is the procedure for the
standardization of alphabet symbols? The Tajik alphabet
uses Cyrillic symbols and consists of 35
letters.



I think that his question is not whether Unicode supports Tajik, but whether work
has been done (maybe in other countries, for library purposes) to define a subset
appropriate for publishing and working with texts in the Tajik language. Tajik
orthography was heavily influenced during the period of the USSR and Russian
domination of this former Republic of the Union, which may have affected the
language to the point that some old texts with important cultural background have
lost some of their original semantics.



Any texts from before the time of Russian domination would be in Arabic script. Some from the earlier Soviet period may be in Latin script. It is clear that Aso's e-mail related to Cyrillic not Arabic script, and there is no hint that it relates to anything other than the current orthography.


So there may exist libraries in the world where there remain texts in the original
orthography, or adapted from the Cyrillic-based orthography, which contain more
letters than those that we commonly see. If there are attempts to reform the
orthography to better match the language's needs, there may already exist some
letter variants which would interest him.

Also, if such sets exist, this creates an opportunity to
propose an alternate 8-bit encoding for Tajik, which would be a variant of the
ISO-8859 Cyrillic encoding used for Russian, except that it would contain all
letters needed for Tajik.

Unicode clearly seems to support this language well, but there's still a need to
have a common framework for working with Tajik texts with an 8-bit encoding
(which would be better than UTF-8 and as simple and efficient as ISO-8859-1 for
Western European languages, or ISO-8859-5 for Russian).

So this question would certainly interest some experts at the ISO Working Group
working on 8-bit encodings compatible with the ISO-8859 standard (this is
independent of the fact that this subset will be fully mapped and supported with
Unicode). Having such a subset will certainly help unify various sources by
agreeing on a common orthography, instead of relying on the support of the large
Unicode/ISO/IEC 10646 coded set. If such a subset is then approved nationally,
it will help get decent support and mapping within many fonts, keyboard
drivers, and text processing tools.

After all, ISO-8859-15 was decided and standardized after a similar reform in
the European Union that needed some Latin characters not present in ISO-8859-1,
even though all these characters were already present in Unicode, or had been
adopted recently in Unicode (like the Euro code point that was created instead of
using the legacy and non-standard ECU symbol with its various and non-distinctive
forms). So why not with Tajik too?






I understand that there have been previous attempts to define a new or extended Cyrillic 8-bit character set supporting Central Asian languages, but that such proposals have been rejected. I hardly think that Aso would have turned to the Unicode list if he wanted to define an 8-bit encoding.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



