Rick investigated, and came up with:

> In a specific case, Andy asked about Khanda Ta, and pointed to a WG2
> resolution that contradicts the Unicode FAQ on the same topic. I looked up
> a paper listing an action item as follows, taken from document
> http://anubis.dkuug.dk/JTC1/SC2/WG2/docs/M40ActionItems.pdf which are the
> action items from meeting #40 of WG2; the decision was from meeting #39 in
> October 2000:
>
>   Resolution M39.11 (Request from Bangladesh): In response to the
>   request from Bangladesh Standards and Testing Institution in
>   document N2261 for adding KHANDATA character to 10646, WG2 instructs
>   its convener to communicate to the BSTI: a. that the requested
>   character can be encoded in 10646 using the following combining
>   sequence: Bengali TA (U+09A4) + Bengali Virama (U+09CD) + ZWNJ
>   (U+200C) + Following Character(s), to be able to separate the
>   KHANDATA from forming a conjunct with the Following Character(s).
>   Therefore, their proposal is not accepted. b. our understanding
>   that BDS 1520: 2000 completely replaces the BDS 1520: 1997.
>
> That does indeed give a different answer than the Unicode FAQ.
>
> I wonder if anyone else knows whether the text of 10646 contains any
> mention of Khanda Ta, and if so, what it says.

It does not mention Khanda Ta.

And I guess it's time to open that old CBS (character BS) mailbag to track
this sucker down.

Resolution M39.11 dates from the WG2 discussion of September 20, 2000 (at
the WG2 meeting in Vouliagmeni, Greece). It was agenda item 7.12 at that
meeting, "Proposal to synchronize Bengali standard with 10646", during
which the question came up about what is this "KHANDATA" thing in Bengali
BDS 1520:2000 standard anyway, and should it be encoded as a separate
character, as it was (at code point 0xBA) in BDS 1520:2000.

For details of the discussion, see the WG2 meeting minutes, online in WG2
N2253. The upshot of the initial discussion was that Michael Everson was
tasked with an action item, to wit:

"Michael Everson to contact BSTI (email id, name etc. are in the cover
letter) - a query was sent out to Unicode expert's list also."

The response received to the query to the Unicode list on September 20
from a Mr. Abdul Malik seemed to answer the question of what the KHANDATA
was. Anyone who wants to can dig it out of the Unicode email archives:

X-UML-Sequence: 16066 (2000-09-20 16:22:21 GMT).

But the relevant portions of the email were:

<quote>

----- Original Message -----
From: "Michael Everson" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Wednesday, September 20, 2000 10:30 AM
Subject: Request about Bengali/Bangla

> BDS 1520:2000 contains a BANGLA LETTER KHANDATA and it has been proposed
> for addition to the UCS. I am at the WG2 meetings in Athens where the
> character is being discussed, but we don't know how to evaluate it.

A representative of the Bangladesh Standards and Testing Institution (the
instigator of the proposal) should be better placed to answering these
questions than me, anyway...

> What is this character and how is it used?

KhandoTa is a form of the letter Ta. It is the form Ta takes when it has no
inherent vowel. It occurs when final and medial, but never the initial
letter of a word.
It is equivalent to Ta virama. Ta with a visible virama is only needed for
illustrative purposes, kandaTa being used in its place in all Bengali
words, except when it forms a conjunct form.

For example in a standard without KhandaTa, there are two different forms
the sequence Ta Virama Ma need to take i.e. khandoTa_Ma or the
Ta/Ma_conjunct_form. As BSD1520:2000 does not include any ligation control
characters other than Virama, it is necessary to include KhandaTa as a
separate letter to make the two previously mentioned forms.

> Another question, is does BDS 1520:2000 completely replace BDS 1520:1997,
> or is the old standard still valid (and being implemented)?

BDS 1520:1997 is based on a font encoding. It is the standard currently
used in the products of Proshika Computer Systems and AdarshaBangla
Technologies Inc. It is also the encoding used in many web sites.

BDS 1520:2000 is a complete replacement, being based on the ISO/IEC10646
character encoding model. AFAIK it is yet to receive a real world
implementation. BDS 1520:2000 seems immature as it does not include any
encoding principles or rendering rules, for example, how is Bengali
zophola to be formed? Is it formed from Ya or YYa?

> What are the implications for interoperability between this standard and
> ISCII standards?

As BDS 1520 does not currently have an encoding model to refer to, one can
not say.

e.g. to form Ka_halant Ka:
in Unicode :- Ka virama ZWNJ Ka
In ISCII   :- KA Virama Virama Ka
In BDS     :- ??

Regards
Abdul

</quote>

It was on the basis of *this* feedback from a Bengali expert on the Unicode
list, reported back by Michael Everson to the WG2 meeting, that WG2 drafted
a resolution responding to the request by BSTI expressed in WG2 N2261.

The intent of resolution M39.11 is expressed in the last sentence of
part a: "Therefore their proposal is not accepted."
In other words, WG2 went on record as claiming there is already a way to
represent Khanda Ta unambiguously using the current characters, and that
hence there was no reason to encode a separate character. Abdul's
discussion above explains the reason why BDS 1520:2000 felt it necessary
to have a separate character for Khanda Ta, since it contains no ZWNJ or
rendering rules which could explain how it would otherwise be represented
using that standard.

What WG2 resolution M39.11 can *not* be interpreted as, however, is a
definitive ISO statement about Bengali rendering rules in 10646. No such
language was, in fact, added to ISO/IEC 10646, and in general such material
is not a part of that standard. Rendering rules for Indic scripts are the
kind of add-on one finds in the Unicode Standard, instead. The language in
M39.11 was quickly drafted to sketch out the reason why encoding of Khanda
Ta was not needed, but cannot be understood as establishing an ISO standard
in the matter of rendering of Bengali ta's.

Now the analysis of Khanda Ta presented in the Unicode FAQ resulted from
further discussion of the issue which took place on the Unicode email list
after the Greece WG2 meeting. I can't recall all the details of that right
now -- although I'm sure people could dig it out of the archives -- but my
reading of the FAQ suggests that the proposal that Abdul Malik had
suggested for how to represent Khanda Ta was subjected to more analysis in
the context of similar rendering processes for other Indic scripts.

In particular, since the sequence C - virama - ZWNJ - C is generally used
to display the *explicit* virama (blocking a conjunct), and since such
forms with explicit virama also occur in Bengali, it seemed better to keep
that sequence for explicit viramas in Bengali as well. The other sequence,
C - virama - ZWJ - C, is used in Devanagari, at least, for representing
half-consonant forms.
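
For concreteness, the sequences under discussion can be spelled out as code
points. The following is only an illustrative Python sketch: the use of
Bengali MA as the following consonant is an assumption (borrowed from the
Ta/Ma example in Abdul's mail), the helper name describe() is made up for
this example, and nothing here says how any given font or renderer will
actually display these strings.

```python
import unicodedata

TA = "\u09A4"      # BENGALI LETTER TA
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER
MA = "\u09AE"      # BENGALI LETTER MA (assumed following consonant C)


def describe(seq):
    """Return one 'U+XXXX NAME' string per code point in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


# C - virama - C: no joiner, so a conjunct may be formed.
plain = TA + VIRAMA + MA

# C - virama - ZWNJ - C: the sequence named in resolution M39.11,
# generally used for the explicit virama (blocking a conjunct).
with_zwnj = TA + VIRAMA + ZWNJ + MA

# C - virama - ZWJ - C: the other sequence, used in Devanagari
# for half-consonant forms.
with_zwj = TA + VIRAMA + ZWJ + MA

for label, seq in [("TA virama C", plain),
                   ("TA virama ZWNJ C", with_zwnj),
                   ("TA virama ZWJ C", with_zwj)]:
    print(label)
    for line in describe(seq):
        print("   ", line)
```

The three strings differ only in the presence and kind of joiner after the
virama, which is the whole of the encoding question; everything else is a
matter of rendering rules.
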
Now while the Bengali Khanda Ta is not actually a "half-consonant", but a
full letter form, it still contrasts with TA in conjuncts and TA with
explicit virama (halant). So the moral equivalent sequence for representing
the Khanda Ta would then be: TA - virama - ZWJ - C.

I have not digested all the argumentation in the last month about this
topic, so cannot say what I feel the *right* answer, finally, is for this.

But now, please, stop speculating about how things got to be the way they
are, stop arguing about whose specification trumps whose (a statement in a
WG2 resolution which is not reflected in the ISO 10646 standard or a
statement in a Unicode website FAQ which is not reflected in the Unicode
Standard), and focus on what is the technically best advice to give people
about representing the Bengali Khanda Ta, given the context explained in
the Unicode FAQ.

--Ken

