Re: sequences and stuff
On Thu, 30 Nov 2000, Brendan Murray/DUB/Lotus wrote:

> There are similar situations in many languages. Possibly more complicated is the use of graphemes which usually contract but don't in some cases. For example, the "aa" sequence as in "gaard" in Danish is traditionally sorted as å (a-ring), after ø (o-slash), but in other situations, particularly in names, the "aa" is really "a"+"a", and should be sorted before "b". How can this be catered for algorithmically? My guess is that there are only two possible solutions:
> 1. use an exceptions list, or
> 2. break the grapheme with some marker like ZWNJ to prevent the contraction.
> Obviously the first creates a maintenance nightmare, and the latter has to be somehow tagged to store the data correctly. In any case there's no simple solution.

The situation is somewhat worse with Persian. The letter U+0622, "Alef With Madda Above", when it occurs in the middle of a word, is sorted according to the root of the word. This letter, although pronounced the same in both cases, may be a letter of its own (in words with a Persian root), or may be Hamza+Alef, treated like a ligature when sorting. Librarians who know the meaning of the words have no problem when sorting, but the poor computer programs, you know. Any ideas for different markup?

If you need examples, you can take "MEEM ALEF-MADDA KHAH THAL", which is sorted like "MEEM HAMZA ALEF KHAH THAL" (Hamza is sorted after Alef in Persian), and "MEEM FARSI-YEH REH ALEF-MADDA BEH", in which the Alef-Madda is considered a single unit, sorted before Alef.

--roozbeh
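The two options Brendan lists (an exceptions list, or a ZWNJ marker that blocks the contraction) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `danish_key` and its `exceptions` parameter are hypothetical names, the alphabet is a bare lowercase Danish alphabet rather than a full multi-level collation, and the sample words are chosen purely to show the mechanics, not to assert how any dictionary files them.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER, used here as a "do not contract" marker

# Danish alphabet order: ae-ligature, o-slash, then a-ring last.
ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e6\u00f8\u00e5"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}
A_RING = "\u00e5"


def danish_key(word, exceptions=frozenset()):
    """Return a sort key in which "aa" collates as a-ring unless suppressed."""
    text = word.lower()
    # Option 1: an exceptions list of words whose "aa" is really a + a.
    contract = text.replace(ZWNJ, "") not in exceptions
    key = []
    i = 0
    while i < len(text):
        if text[i] == ZWNJ:
            i += 1  # Option 2: the marker itself carries no sort weight
        elif contract and text.startswith("aa", i):
            key.append(RANK[A_RING])  # contraction: "aa" sorts as a-ring
            i += 2
        else:
            # Characters outside the alphabet sort after everything else.
            key.append(RANK.get(text[i], len(ALPHABET)))
            i += 1
    return tuple(key)


words = ["kanon", "kanaan", "kanal"]
print(sorted(words, key=danish_key))
print(sorted(words, key=lambda w: danish_key(w, exceptions={"kanaan"})))
print(sorted(["kanon", "kana" + ZWNJ + "an", "kanal"], key=danish_key))
```

With the contraction active, "kanaan" lands after "kanon"; with either the exceptions list or the ZWNJ in the data, it lands before "kanal". The trade-off is the one described in the message: the list has to be maintained, while the marker has to be entered and stored in the data itself.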
eot and pfr files
hi all, I had earlier posted a question on viewing the characters of different languages in my browser (IE 5.0), and I have come to the conclusion that it is possible by installing fonts. But what if we have to provide support for Indian languages? My teammates say it is possible either by buying software that supports these fonts or by writing eot files. But I am under the impression that eot files are taken care of by the WEFT utility, so what do I do? I am totally confused and need guidance on this. bye, Sreekanth Devarakonda
Re: display problems on browser
Have you tried looking at the Unicode home page, at "Display Problems", or the FAQ "Unicode on the Web"?

- Original Message -
From: "sreekant" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Thursday, November 30, 2000 22:27
Subject: display problems on browser

hi, I am facing problems when I am trying to display non-english characters on my browser. I am getting "?" and I want to see characters in various other languages too. What should I do? Should I install any special software or should I configure my browser. Please advise as I have to deliver my application within a very short time. thanks, Sreekanth Devarakonda
Indic support in Windows (was Re: eot and pfr files)
On 12/01/2000 06:24:37 AM sreekant wrote:

> hi all, I had earlier posted a question on viewing the characters of different languages on my browser(IE 5.0) and i have come to a conclusion that it is possible through installing fonts on your browser but what if we have to provide support for indian languages.

I'll respond making two assumptions: OS is Windows, and character encoding is Unicode, not ISCII or some custom "hacked font" encoding.

Depending on what level of support you mean, things are potentially possible, but this varies by platform. You could work with at least some Indic scripts today, and view them on IE5, if you (or the user browsing the Web) are using Windows 2000, and have appropriate OpenType fonts. This without eot or WEFT. The Unicode characters are transformed in the rendering process by MS's Uniscribe engine and by substitution and positioning tables in the OT fonts into the appropriate sequence of positioned glyphs. All this happens on the user's machine, using fonts on their machine (assuming, of course, that they're running Win2K and have the fonts).

On Win9x/Me, it would be potentially possible to display Indic text, but not to edit it. This is because of limitations in Unicode support in Win9x/Me: there is general support for display of Unicode characters, but character input is dependent upon codepage support, and there are no MS codepages for any S. Asian scripts. (The codepage limitation does not apply on Win2000 except for apps that are not written to support the Unicode capabilities of WinNT/2K.)

Now, I say "potentially" because MS has not chosen (thus far, at any rate), to provide any Indic fonts or a version of Uniscribe with Indic support as an update option for users on Win9x/Me. The thinking apparently has been, "Why offer an Indic update pack? The support is already there on Win2000, and editing wouldn't be possible on Win9x/Me." (This was discussed recently on this list.)
I'm inclined to say that they should provide an Indic display update for Win9x/Me users, though, and this is certainly a technical possibility. In Office 97, support was included for displaying Far East text, including the necessary fonts, even though there was no way to input or edit Far East text. That was seen to be useful for Office users then, and it seems to me to have been a good idea. In the same way, I'm inclined to say that it makes sense for IE users on Win9x/Me to be able to display Indic text, even though they can't input or edit. With Office 10 (I'm guessing), Win9x/Me users could also view Indic text in any Office app (but not input or edit). That would seem to me to be A Good Thing. It would certainly help motivate those who are creating Web content in S. Asian languages to start using UTF-8 rather than custom encodings that rely on hacked fonts. But I'm not the one who decides what MS will and will not provide to users. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
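To make the point about rendering concrete: Unicode text stores Indic characters in logical (spoken) order, and it is the shaping engine plus the font tables that turn them into positioned glyphs. The following is a small sketch using only Python's standard `unicodedata` module; the sample word is my own choice, not something from the thread.

```python
import unicodedata

# "Hindi" in Devanagari, written with escapes so the code points are explicit.
word = "\u0939\u093f\u0928\u094d\u0926\u0940"

# List the stored (logical) order of the characters.
for ch in word:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The same text as it would travel in a UTF-8 encoded Web page.
utf8 = word.encode("utf-8")
print(len(word), "code points,", len(utf8), "bytes in UTF-8")
```

The second character stored is VOWEL SIGN I, yet on screen it is drawn to the left of the HA that precedes it in memory. A "hacked font" encoding stores the visual order instead, which is why such text cannot simply be searched, sorted, or redisplayed with a different font.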
Re: URDU fonts
Well, one difficulty would be in trying to understand what you mean by "TTF URDU" fonts. Are these Unicode fonts? It is true that the Arabic versions of Windows 95/98/Me do not fully support Urdu as they are supporting the Arabic *language* not the Arabic *script* (a name overload that I wish both Unicode and Microsoft would try to avoid whenever possible since it can cause confusion!). However, Windows 2000 and the Arabic enabled version of NT4 both will have much more luck with Unicode fonts that support the necessary characters for Urdu. Windows 2000 has an Urdu keyboard, and I believe you will find that the capabilities in Windows 2000 will suit all of your immediate needs here. For editing, both Word 2000 and FrontPage 2000 can do well with Urdu text (if you use the former then you have to be willing to live with all the extra tags Word loves to add; if you use the latter then I would recommend HTML view over Normal view, after long experience with complex scripts in FP2000). Now, none of this will help you convert an English website to Urdu; they will give you tools so that you could convert the site yourself, though.

MichKa
Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/

- Original Message -
From: "MULTI-LINGUIST" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, December 01, 2000 8:45 AM
Subject: URDU fonts

Is it possible to type Urdu in Arabic Windows for a website?? I have come to know that Arabic Windows does not support any TTF Urdu fonts. Is this true? If it is true, then is it possible to transfer the fonts of Universal Word (Urdu typing software) to Arabic Windows and then type Urdu? Can we paste the Urdu text into the HTML files? If someone could also tell whether these fonts would support the Unicode system. I am actually confused. If all the above is not understandable, can someone simply tell which software to use for converting an English Website into URDU?? And what procedure to follow.

Best regards Paresh Agarwal
RE: URDU fonts
To add to what has been said by Michael, I should say that some of the big (multi-script) fonts shipped with Windows 2000 had a GSUB table problem with regard to a few Urdu-specific characters. Some of these issues have been addressed for SP1 and the rest will be resolved in Whistler. Tahoma, Microsoft Sans Serif and Arial are the best fonts for Urdu.

Houman
Microsoft Corporation

-Original Message-
From: Michael (michka) Kaplan [mailto:[EMAIL PROTECTED]]
Sent: Friday, December 01, 2000 10:04 AM
To: Unicode List
Subject: Re: URDU fonts

Well, one difficulty would be in trying to understand what you mean by "TTF URDU" fonts. Are these Unicode fonts? It is true that the Arabic versions of Windows 95/98/Me do not fully support Urdu as they are supporting the Arabic *language* not the Arabic *script* (a name overload that I wish both Unicode and Microsoft would try to avoid whenever possible since it can cause confusion!). However, Windows 2000 and the Arabic enabled version of NT4 both will have much more luck with Unicode fonts that support the necessary characters for Urdu. Windows 2000 has an Urdu keyboard, and I believe you will find that the capabilities in Windows 2000 will suit all of your immediate needs here. For editing, both Word 2000 and FrontPage 2000 can do well with Urdu text (if you use the former then you have to be willing to live with all the extra tags Word loves to add; if you use the latter then I would recommend HTML view over Normal view, after long experience with complex scripts in FP2000). Now, none of this will help you convert an English website to Urdu; they will give you tools so that you could convert the site yourself, though.

MichKa
Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/
Re: URDU fonts
This is interesting, actually. Of the three fonts you name, is there a particular preference in terms of appearance, from an Urdu perspective? I know for example that some consider Tahoma to be wonderful for Arabic but downright homely for Farsi (when compared to Microsoft Sans Serif). Just trying to improve my knowledge of best font choices! :-)

MichKa
Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/

- Original Message -
From: "Houman Pournasseh" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, December 01, 2000 10:37 AM
Subject: RE: URDU fonts

To add to what has been said by Michael, I should say that some of the big (multi-script) fonts shipped with Windows 2000 had a GSUB table problem with regard to a few Urdu-specific characters. Some of these issues have been addressed for SP1 and the rest will be resolved in Whistler. Tahoma, Microsoft Sans Serif and Arial are the best fonts for Urdu.

Houman
Microsoft Corporation
gb 18030 mapping available
Hi all, Yesterday, I received the re-released mapping table for GB 18030. I updated the ICU implementation with this, in time for ICU 1.7. See http://oss.software.ibm.com/cvs/icu/~checkout~/icu/source/tools/makeconv/gb18030/gb18030.html#officialdata George and I also converted the mapping data into XML format, using the new range element that was accepted at the last UTC meeting to avoid listing 1.1 million assignments. You will find this data linked from the ICU homepage, http://oss.software.ibm.com/icu/ - see "News/Events" on the right side. Note that single surrogates are not mapped any more, and that the Euro sign was moved from GB+80 to GB+A2E3. There are no fallback mappings any more, only roundtrip mappings. markus
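The reason a single range element can replace roughly 1.1 million listed assignments is that most four-byte GB 18030 codes map linearly onto Unicode. Below is a minimal sketch of that linear rule for the supplementary planes (U+10000 to U+10FFFF, starting at GB+90308130). It uses Python's built-in `gb18030` codec as a stand-in for comparison, which is an assumption on my part: it is not the ICU table announced here. The function name `gb18030_supplementary` is hypothetical.

```python
def gb18030_supplementary(cp):
    """Encode a supplementary-plane code point with the linear four-byte rule."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are covered by this range")
    # Four-byte codes are counted in a mixed radix: the bytes take
    # 126, 10, 126 and 10 values. GB+90308130 is linear index 189000.
    linear = 189000 + (cp - 0x10000)
    linear, b4 = divmod(linear, 10)
    linear, b3 = divmod(linear, 126)
    b1, b2 = divmod(linear, 10)
    return bytes([0x81 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


for cp in (0x10000, 0x1F600, 0x10FFFF):
    ours = gb18030_supplementary(cp)
    theirs = chr(cp).encode("gb18030")
    print(f"U+{cp:06X}", ours.hex().upper(), ours == theirs)

# The Euro sign at its two-byte position, as described in the message.
print("\u20ac".encode("gb18030").hex().upper())
```

A table-driven converter then only needs explicit entries for the one- and two-byte codes and the irregular parts of the BMP; everything in a range is computed.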
RE: URDU fonts
Tahoma, when it comes to the Arabic script (both the Arabic and Farsi languages), has a funny shape for some characters (for example, final MEEM) that gives the look of a kid's handwriting ;-) This is why we at Microsoft have chosen Microsoft Sans Serif as the default UI font for Arabic Windows 2000. But Tahoma still remains a quality font, and the current shape is pleasant in a lot of contexts.

Houman

-Original Message-
From: Michael (michka) Kaplan [mailto:[EMAIL PROTECTED]]
Sent: Friday, December 01, 2000 11:36 AM
To: Houman Pournasseh; Unicode List
Subject: Re: URDU fonts

This is interesting, actually. Of the three fonts you name, is there a particular preference in terms of appearance, from an Urdu perspective? I know for example that some consider Tahoma to be wonderful for Arabic but downright homely for Farsi (when compared to Microsoft Sans Serif). Just trying to improve my knowledge of best font choices! :-)

MichKa
Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/
Re: Transcriptions of Unicode
Sad to report, my browser (Netscape 4.7) shows the Yiddish as Daw-key-nu-ye (It's left to right not rtl...) I am using the Monotype Andale Duospace font.

tex

Mark Davis wrote:

> I am interested in collecting transcriptions of the word "Unicode" in different scripts (and languages). If you are fluent in a language other than Unicode, I'd appreciate any suggestions. What I have so far is at: http://www.macchiato.com/unicode/Unicode_transcriptions.html
>
> Mark
> ___
> Mark Davis, IBM Center for Java Technology, Cupertino (408) 777-5850 [fax: 5891], [EMAIL PROTECTED], [EMAIL PROTECTED] http://maps.yahoo.com/py/maps.py?Pyt=Tmapaddr=10275+N.+De+Anzacsz=95014

--
Tex Texin, Director, International Business
mailto:[EMAIL PROTECTED] +1-781-280-4271 Fax: +1-781-280-4655
Progress Software Corp., 14 Oak Park, Bedford, MA 01730
http://www.Progress.com #1 Embedded Database
http://www.SonicMQ.com #1 Performing JMS Messaging
http://www.ASPconnections.com #1 provider in the ASP marketplace
http://www.NuSphere.com Open Source software and services for MySQL
Globalization Program http://www.Progress.com/partners/globalization.htm
Topicality of Postings
Good Morning Children: The topic of the day is Topicality. It should have been yesterday's topic, but I was busy with something else. Topicality was moderately disregarded in message UMLSEQ:17099 when Monsieur LaBonte saw fit to regale us with the once-cute "Revocation" that has been making the rounds so much lately over there in the Colonies... 'Leven Digit Boy predictably, compounded the digression (which Monsieur LaBonte should have known better than to start in the first place) by quoting the missive in toto with one off-color comment embedded (in UMLSEQ:17114). Children must play of course, but please let's show some restraint and moderation: there is quite enough traffic on this list without resorting to such attempts to derail the sublime train of topicality. It's not that I really mind so very much personally -- I mean, bits are bits as far as I'm concerned, and one is as good as another -- but some folks are annoyed by excess bits of the wrong persuasion, and begin sending me more pieces of their minds than I care to receive when they are sprayed with such playful bits by well-meaning off-topic correspondents. Cheery regards from your effervescent but bitwise conservative, -- Sarasvati
Re: Topicality of Postings
At 15:43 2000-12-01 -0800, Sarasvati wrote:

> Topicality was moderately disregarded in message UMLSEQ:17099 when Monsieur LaBonte saw fit to regale us with the once-cute "Revocation" that has been making the rounds so much lately over there in the Colonies... 'Leven Digit Boy predictably, compounded the digression (which Monsieur LaBonte should have known better than to start in the first place) by quoting the missive in toto with one off-color comment embedded (in UMLSEQ:17114).

Question: Who is this Mr. LaBonte to distort my name like this on a list dedicated to characters of the world? He is not very serious indeed to care about universal characters then... He should know about his roots if he wrote his name like this. (;

Alain LaBonté
Québec
Re: URDU fonts
From: "MULTI-LINGUIST" [EMAIL PROTECTED]

> Thanks so much Mr.Michka,

Just michka, or Michael is fine. :-)

> I do not know whether the URDU TTF fonts are Unicode fonts or not. What I mean is InPage (Urdu typesetting software) has TTF fonts, but these are only restricted to InPage i.e. not compatible with any other software, even with Farsi/Arabic Windows. Universal Word also has TTF Urdu fonts, but may or may not be compatible with Arabic/Farsi Windows.

Sounds like they are not Unicode fonts, which would be useable in Word and other programs.

> Forgive me for the lack of knowledge, but you have mentioned "Arabic *language* not the Arabic *script*". Well, what is the difference between the two??

Well, the Arabic language is used in various Middle Eastern countries. However, the Arabic script is used to represent many languages, including Arabic, Farsi, Urdu, Pashto, and about a dozen others. Many of these languages, such as Urdu, have letters that are not used in the Arabic language but are necessary to write the other language properly.

> I understand that Windows 2000 has URDU keyboard, but then what about the URDU fonts? We have Windows 98 and WORD 2000, will that be helpful?

Well, it will be helpful in VIEWING documents, but not in typing them in: there are, I believe, 16 letters used by Urdu that do not exist in cp1256 (the Arabic code page for Windows), and those characters do not exist in keyboards for Win98 or anywhere else.

MichKa
Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/
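One way to see the code page limitation for yourself is to ask a codec which characters of a text it cannot represent. The sketch below uses Python's `cp1256` codec; `unencodable` is a hypothetical helper name, and the sample word is my own. Note the assumption: Python's table reflects a later revision of the code page than the Win98-era one discussed here, so the set of missing Urdu letters it reports need not match the count of 16 given above.

```python
def unencodable(text, codec="cp1256"):
    """Return the distinct characters of text that codec cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# "Hindi" written in Urdu orthography, as escaped code points.
sample = "\u06c1\u0646\u062f\u06cc"
for ch in unencodable(sample):
    print(f"U+{ord(ch):04X} has no cp1256 byte")
```

Any character reported here can still be displayed from Unicode text, but it cannot pass through an input path that is limited to the code page, which is the viewing-versus-typing distinction drawn in the reply.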
Re: Transcriptions of Unicode
Done.

- Original Message -
From: "Michael (michka) Kaplan" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, December 01, 2000 15:19
Subject: Re: Transcriptions of "Unicode"

IE 5.0, 5.5, NN 6.0, and the latest build of Mozilla all do the right thing with the word. So that would be the fault of your browser choice. :-) I would suggest adding a <span title="{insert lang name}"></span> around each lang name, as it will cause IE to show the language name in a tooltip when you hover the mouse. The slight delay lets people guess the languages and then see if their guesses were right. Always a nice effect...

michka
a new book on internationalization in VB at http://www.i18nWithVB.com/

- Original Message -
From: "Tex Texin" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, December 01, 2000 2:30 PM
Subject: Re: Transcriptions of "Unicode"

Sad to report, my browser (Netscape 4.7) shows the Yiddish as Daw-key-nu-ye (It's left to right not rtl...) I am using the Monotype Andale Duospace font.

tex
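For a page with many entries, the tooltip markup suggested in the quoted message is easier to generate than to type by hand. A small sketch follows, assuming the page is produced by a script; `tooltip_span` is a hypothetical helper, and `html.escape` is the standard-library function used to keep the attribute and the text well-formed.

```python
from html import escape


def tooltip_span(text, language):
    """Wrap text in a span whose title attribute names the language."""
    return f'<span title="{escape(language, quote=True)}">{escape(text)}</span>'


print(tooltip_span("Unicode", "English"))
```

Escaping matters even here: a language name or transcription containing a quote mark or an angle bracket would otherwise break the attribute or the element.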