Michael Everson writes:
> Peter Constable wrote:
> > I think the TDIL chart is wrong.
>
> It seems reasonable that one should need extra persuasion to take
> the word of an American living in Ireland over Indians. (Sorry.)
Isn't there a specific list for Brahmic scripts? ([EMAIL PROTECTED] ???). The number of issues with these scripts is about to explode if Indian sources start publishing new, undated references for their encoding and conversion to Unicode, including proposed changes of orthographic rules to better match the phonology, the tradition, or the inclusion of foreign terms. SIL.org is also working quite actively in this area, in relation with a proposed extended UTR22 reference for transcoding. But I'd like to see discussions about proposed UTR22 changes stay on the main Unicode list.

There are not many issues with Thai, as it has long been standardized in TIS-620, which was the base of the Unicode encoding (though regrettably before UTR22 was produced, which would have allowed a better logical encoding that does not need lexical dictionaries to parse Thai text). Semantic analysis of Thai text is an interesting issue in itself, but not for the correct way to encode Thai words (the TIS-620 rules are clear, as it mostly encodes glyphs, expecting that readers will interpret the written text using their knowledge of the language). So Thai discussions can remain on the main list.

I also think that Tibetan issues should be discussed on that list, even though its composition model is very different from the Brahmic scripts of India, unless there's a specific rapporteur group for it. But not Han issues, which should probably be discussed in their own list in relation with the IRG working group (which already works on its own technical reports as well as the standardization of the extended repertoire).

The recent issues I have read seem to multiply the number of Brahmic conjuncts we have to deal with, possibly in relation with new normalization forms (not NFC and NFD); as with Hebrew, there is probably a need for work on these scripts in a separate discussion list, with the aim of producing a technical report in accordance with Indian sources.
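To illustrate why conjuncts are an encoding-level question rather than a normalization one, here is a small Python sketch: the Devanagari conjunct KSSA is encoded as KA + VIRAMA + SSA with no precomposed code point, and none of the existing normalization forms changes that sequence, so any new conjunct-level equivalence would indeed require a new form.

```python
import unicodedata

# Devanagari conjunct "kssa": KA + VIRAMA + SSA
# (no precomposed code point exists for the conjunct)
kssa = "\u0915\u094D\u0937"

# NFC, NFD, NFKC and NFKD all leave the sequence untouched
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, kssa) == kssa

print([unicodedata.name(c) for c in kssa])
# the conjunct shaping itself is left entirely to the font/renderer
```

The same experiment with Hebrew points or other Brahmic clusters shows the same thing: the stability pact freezes NFC/NFD, so cluster-level equivalences have to live elsewhere.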
Other related South-East Asian scripts should be there too: Lao, Khmer... My recent work with UCA and collation, as well as UTR22 and phonological analysis of many texts, tends to promote the idea of new normalization forms in all areas where NFC/NFD or even NFKC/NFKD are failing. We can't change those forms due to the stability pact, but UCA and collation in general effectively create a new coded character set: one made of ordered collation weights belonging to separate ranges for each collation level, these ranges being sorted in the reverse order of the collation level.

I have experimented with a collation algorithm implementing UCA through the same system as used for UCD decompositions, but with added (and sometimes modified) decompositions. This system creates new "code points" needed to represent only <font> compatibility differences, ligatures, or alternate forms, as a decomposition of the existing compatibility character into more basic characters exposed with primary differences in UCA, plus these new characters given "variable" collation weights, which may be ignorable in applications that ignore the extra levels.

This encoding uses a 31-bit code space, which is still highly compressible, yet still representable with the UTF-8 TES (although the values are not Unicode code points) or a similar ad-hoc representation. I am currently trying to adapt this system to work in relation with UTR22 transcodings, and I am testing it against Brahmic scripts, Hebrew, and Latin. This is very promising, and my next step will be to handle the decomposition of Han characters into their component radicals and strokes.

I do think it is possible to handle almost all UCA and UTR22 rules by using UTR22 itself plus decomposition rules in a simple table matching nearly the format of the UCD. But all these discussions and encoding ambiguities of Brahmic scripts are polluting my work.
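The idea of collation weights forming a new coded character set can be sketched as follows. This toy Python example uses invented weights, not real DUCET data: each character maps to a (primary, secondary, tertiary) triple, and the sort key concatenates all primary weights, a level separator, then the secondaries, and so on, so that each level occupies its own range and earlier levels always dominate.

```python
# Toy UCA-style sort keys. The weights below are INVENTED for
# illustration; a real implementation would use the DUCET table.
WEIGHTS = {
    "a": (0x1C47, 0x0020, 0x0002),
    "A": (0x1C47, 0x0020, 0x0008),   # tertiary (case) difference only
    "á": (0x1C47, 0x0024, 0x0002),   # secondary (accent) difference
    "b": (0x1C60, 0x0020, 0x0002),   # primary difference
}

def sort_key(s):
    key = []
    for level in range(3):
        for ch in s:
            w = WEIGHTS[ch][level]
            if w:                      # zero weights are ignorable
                key.append(w)
        key.append(0)                  # level separator between ranges
    return tuple(key)

print(sorted(["b", "A", "á", "a"], key=sort_key))
# ['a', 'A', 'á', 'b']: accents beat case, primaries beat both
```

Making a weight "variable" in this scheme is just mapping it to zero at the chosen levels, which is exactly what lets ligature- or font-variant "code points" become ignorable in applications that drop the extra levels.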
I am quite close to dropping my current work on them until some agreement is found, notably within a revision of ISCII, if one is in preparation, that will be more precise and give clearer rules. For now it is impossible for me to adapt my model to the proposed (and sometimes contradictory) encoding solutions put forward by different people.