On Sun, 2 Jun 2002, C Bobroff wrote:

> > Ok, it seems that we are seeing a lot of monologues here.
>
> I'm sure more people than just me are finding the monologues educational

I just wish to emphasize that I have seen the same concern repeated many times. And I can't forget that some of us were referred to as dictators, or something like that (at least as people who try to impose their ideas on others).

Regarding the dictatorship claims, I wish to emphasize that the matter of Heh+Hamza was also discussed at the ISIRI meeting for approval of the standard, and all of the experts agreed or were convinced. The list includes Dr Mostafa Asi (a computational linguist also working with Farhangestan), Mr Ebrahim Mashayekh (President of the Informatics Society of Iran), Dr Mohammad Ghodsi (Project Leader of FarsiTeX), Mr Mohammad Azadnia (Technical Manager of the Persian project at the Iran Communication Research Center), Mr Arash Rezaiizadeh (one of the entrepreneurs behind Windows Farsification), Mr Arash Zeini (President of Chapar Shabdiz, the first Iranian Free Software company, also of FarsiKDE fame), and Mr Hashemi (Gam Electronic's Persian expert). All other known experts, if present in Iran, were invited, but some could not attend. This includes people like Dr Mohammad San'ati of SinaSoft fame, whom Behdad and I met personally after the meeting to make sure he had no major objections.

I can't understand who Abi was referring to when she or he wrote "Next I expect we will be told how to combe out hair. [...] They have nothing to offer to the Persain IT and language discussion." Was that referring to me, or to Mr Khanban? (We are both members of the technical committee of the standard you have heard a lot about.) To say the least, neither Mr Khanban nor I have anything to hide about what we have done for the Persian IT world: just search Google for "Khanban" or "Pournader". We both use our real and full names, and have done everything publicly. But who is "Abi Lover"?

Also, quoting Abi's exact words, she or he is against any standardization: "There are some people [...] who think that they have a duty to lay down rules for other people to follow."

The Unicode Consortium is doing this. ISO is doing this. W3C is doing this. Many software companies, from Microsoft to SinaSoft, also do this, by creating things that become de facto standards. You are not obliged to follow standards, but you will run into trouble if you don't: no one will be able to use your software together with other software.

> Roozbeh, can you please tell us about this "normalization" and why
> the mention of "Persian" is to be removed from this character?

Sure. I have explained the problem a number of times, and I will explain it again.

There is a notion in Unicode called Normalization. You can read about it at <http://www.unicode.org/unicode/reports/tr15/>. If you don't have the time, here is a brief summary: Unicode is not just for displaying text but also for processing it, and it sometimes offers different alternatives for encoding the same text, so you need some mechanism to determine that two strings of characters are actually the same.

One example is the equivalence of U+0624 ARABIC LETTER WAW WITH HAMZA ABOVE with the string <U+0648, U+0654>, which is <ARABIC LETTER WAW, ARABIC HAMZA ABOVE>. The algorithm is intelligent enough to detect the equivalence even if you put a FATHA between the WAW and the HAMZA, so <WAW WITH HAMZA ABOVE, FATHA> is equal to both <WAW, FATHA, HAMZA> and <WAW, HAMZA, FATHA>. This equivalence is very important for security and for the proper functioning of software, but I won't get into the details.

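As a rough illustration of the WAW example, here is a minimal sketch in Python, assuming the standard library's unicodedata module as the normalizer. The helper canonically_equal and the variable names are made up for this illustration; they are not part of any standard.

```python
import unicodedata

# Code points from the example above.
WAW = "\u0648"             # ARABIC LETTER WAW
FATHA = "\u064E"           # ARABIC FATHA
HAMZA_ABOVE = "\u0654"     # ARABIC HAMZA ABOVE
WAW_WITH_HAMZA = "\u0624"  # ARABIC LETTER WAW WITH HAMZA ABOVE

def canonically_equal(a, b):
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Three different encodings of the same text.
variants = [
    WAW_WITH_HAMZA + FATHA,     # <WAW WITH HAMZA ABOVE, FATHA>
    WAW + FATHA + HAMZA_ABOVE,  # <WAW, FATHA, HAMZA>
    WAW + HAMZA_ABOVE + FATHA,  # <WAW, HAMZA, FATHA>
]

# The raw strings differ, but they compare equal once normalized.
raw_all_equal = all(variants[0] == v for v in variants)
normalized_all_equal = all(canonically_equal(variants[0], v) for v in variants)
print(raw_all_equal, normalized_all_equal)
```

A plain string comparison treats the three variants as different; only the normalized comparison sees them as the same text.
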
To say the least, this is an important part of two much-awaited standards, both of which are still drafts: "Internationalized Domain Names", http://www.ietf.org/internet-drafts/draft-ietf-idn-idna-09.tx where applications MUST do normalization before doing name lookup for a non-ASCII domain name, and "Character Model for the World Wide Web", http://www.w3.org/TR/charmod/ where all web authoring or web content generation software is REQUIRED to normalize the text of a web document before putting it on the wire.

Getting back to our U+06C0 ARABIC LETTER HEH WITH YEH ABOVE: this letter is specified to be equal to <U+06D5, U+0654>, which is <ARABIC LETTER AE, ARABIC HAMZA ABOVE>. This AE is a letter similar to HEH in shape, but used only in final and isolated forms, something like U+0629 ARABIC LETTER TEH MARBUTA but without the dots. (I think everyone agrees that this AE letter has no place in Persian.)

Now let's consider the real situation: someone wants to encode this "ezaafe" form. He may look at the charts, and he will choose either U+06C0 or <U+0647, U+0654> (HEH, HAMZA ABOVE), based on his preference for "precomposed" or "decomposed" forms. Say you choose the first and I choose the second. The sad point is that no Unicode-compliant application will be able to tell that these strings are equivalent. In other words, you will have two ways to encode the same text, without them being considered equal.

The first time I found this, I asked the Unicode people to change the decomposition of U+06C0. I then found that there is a stability policy about decompositions, and that they had their own reasons for selecting this one. After that, I asked them to remove the mention of "Persian" from the comments for this character. They asked me for a formal proposal, which I guess will pass without problems.

This is the whole story. If you have questions, please be brief and patient, so I can answer them.

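For anyone who wants to check the U+06C0 case themselves, here is a minimal sketch in Python, assuming the standard library's unicodedata module reflects the published decompositions. The variable names are made up for this illustration.

```python
import unicodedata

HEH = "\u0647"              # ARABIC LETTER HEH
AE = "\u06D5"               # ARABIC LETTER AE
HAMZA_ABOVE = "\u0654"      # ARABIC HAMZA ABOVE
PRECOMPOSED_06C0 = "\u06C0" # the precomposed character under discussion

def nfd(s):
    """Canonical decomposition (NFD) of a string."""
    return unicodedata.normalize("NFD", s)

your_choice = PRECOMPOSED_06C0  # "precomposed" preference
my_choice = HEH + HAMZA_ABOVE   # "decomposed" preference

# U+06C0 decomposes to <AE, HAMZA ABOVE>, not to <HEH, HAMZA ABOVE> ...
decomposes_to_ae = nfd(your_choice) == AE + HAMZA_ABOVE
# ... so the two encodings of the same text never compare equal.
choices_equal = nfd(your_choice) == nfd(my_choice)
print(decomposes_to_ae, choices_equal)
```

The second comparison failing is exactly the problem: normalization cannot reconcile the two encodings.
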
roozbeh