Re: displaying Unicode text (was Re: Transcriptions of Unicode)
Mark Davis wrote:

Let's take an example.
- The page is UTF-8.
- It contains a mixture of German, dingbats and Hindi text.
- My locale is de_DE.
From your description, it sounds like Mozilla works as follows:
- The locale maps (I'm guessing) to 8859-1.
- 8859 maps to, say, Helvetica.
- The dingbats and Hindi appear as boxes or question marks.
This would be pretty lame, so I hope I misunderstand you!!

Sorry, I've been abbreviating quite a bit, so I left out a lot. Yes, you've misunderstood me, but only because I abbreviated so much. Sorry. Let me try again, with more feeling this time.

Using the example above:
- The locale maps to "x-western" (ja_JP would map to "ja", so I've prepended "x-" for the "language groups" that don't exist in RFC 1766).
- x-western and CSS' sans-serif map to Arial.
- The dingbats appear as dingbats if they are in Unicode and at least one of the dingbat fonts on the system has a Unicode cmap subtable (WingDings is a "symbol" font, so it doesn't have such a table), while the Hindi might display OK on some Windows systems if they have Hindi support (Mozilla itself does not support any Indic languages yet).

We could support the WingDings font if we add an entry for WingDings to the following table:
http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsFontMetricsWin.cpp#872
We just haven't done that yet.

Basically, Mozilla will look at all the fonts on the system to find one that contains a glyph for the current character. The language group and user locale stuff that I mentioned earlier is only one part of the process: the part that deals with the user's font preferences. I'll explain more of the rest of the process.

Mozilla implements CSS2's font matching algorithm:
http://www.w3.org/TR/REC-CSS2/fonts.html#algorithm
This states that *for each character* in the element, the implementation is supposed to go down the list of fonts in the font-family property, to find a font that exists and that contains a glyph for the current character.
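The per-character rule can be sketched in a few lines of Python. This is a minimal illustration, not Mozilla's code: fonts are modeled as a hypothetical dictionary from font name to the set of code points they cover (a real implementation reads this from each font's cmap table), and the coverage ranges below are made up.

```python
from typing import Dict, List, Optional, Set

# Hypothetical model: font name -> set of code points the font has glyphs for.
InstalledFonts = Dict[str, Set[int]]


def match_font(ch: str, font_family: List[str],
               installed: InstalledFonts) -> Optional[str]:
    """Walk the font-family list; return the first font that exists and has a glyph for ch."""
    for name in font_family:
        coverage = installed.get(name)  # None means the font is not installed
        if coverage is not None and ord(ch) in coverage:
            return name
    return None  # later fallback steps would take over here


def fonts_for_text(text: str, font_family: List[str],
                   installed: InstalledFonts) -> List[Optional[str]]:
    """Pick a font for each character independently, ignoring its neighbors."""
    return [match_font(ch, font_family, installed) for ch in text]


installed = {
    "Arial": set(range(0x20, 0x250)),    # made-up Latin coverage
    "Mangal": set(range(0x900, 0x980)),  # made-up Devanagari coverage
}
print(fonts_for_text("Ja \u0928", ["Helvetica", "Arial", "Mangal"], installed))
# ['Arial', 'Arial', 'Arial', 'Mangal']
```

Because each character is matched on its own, mixed text can switch fonts in the middle of a run, which is the font-change effect discussed in this thread.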
Mozilla implements this algorithm to the letter, which means that fonts are chosen for each character without regard for neighboring characters (unlike MSIE). This may actually have been a bad decision, since we sometimes end up with text that looks odd due to font changes.

Anyway, Mozilla's algorithm has the following steps:
1. "User-Defined" font
2. CSS font-family property
3. CSS generic font (e.g. serif)
4. list of all fonts on system
5. transliteration
6. question mark

You can see these steps in the following pieces of code:
http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsFontMetricsWin.cpp#2642
http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#3108

1. "User-Defined" font (FindUserDefinedFont)
We decided to include the User-Defined font functionality in Netscape 6 again. It is similar to the old Netscape 4.X. Basically, if the user selects this encoding from the View menu, then the browser passes the bytes through to the font, untouched. This is for charsets that we don't already support. This step needs to be the first step, since it overrides everything else.

2. CSS font-family property (FindLocalFont)
If the user hasn't selected User-Defined, we invoke this routine. It simply goes down the font-family list to find a font that exists and that contains a glyph for the current character. E.g.:
    font-family: Arial, "MS Gothic", sans-serif;

3. CSS generic font (FindGenericFont)
If the above fails, this routine tries to find a font for the CSS generic (e.g. sans-serif) that was found in the font-family property, if any; otherwise it falls back to the user's default (serif or sans-serif). This is where the font preferences come in, so this is where we try to determine the language group of the element. I.e. we take the LANG attribute of this element or a parent element, if any; otherwise the language group of the document's charset, if it is not Unicode-based; otherwise the language group of the user's locale.

4. List of all fonts on system (FindGlobalFont)
If the above fails, this routine goes through all fonts on the system, trying to find one that contains a glyph for the current character.

5. Transliteration (FindSubstituteFont)
If we still can't find a font for this character, we try a transliteration table. For example, the euro is mapped to the 3 ASCII characters "EUR", which is useful on some Unix systems that don't have the euro glyph yet. Actually, this transliteration step isn't even implemented on Windows yet.

6. Question mark (FindSubstituteFont)
If we can't find a transliteration, we fall back to the last resort: the good ol' question mark.

That's it. I hope I didn't abbreviate too much this time!

Erik
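Putting Erik's six steps together, a rough Python model of the fallback chain might look like the following. It is not a port of FindUserDefinedFont and friends. The language-group tables, the preference map, the font coverage and the choice to scan system fonts in sorted order are all hypothetical. For step 5 the sketch simply returns the replacement text, whereas a real renderer would then look up fonts for "EUR" itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

GENERICS = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}

# Hypothetical tables for illustration only; not Mozilla's real data.
LANG_TO_GROUP = {"de": "x-western", "en": "x-western", "fr": "x-western", "ja": "ja"}
CHARSET_TO_GROUP = {"iso-8859-1": "x-western", "shift_jis": "ja"}
UNICODE_CHARSETS = {"utf-8", "utf-16"}
TRANSLITERATIONS = {"\u20ac": "EUR"}  # euro sign -> "EUR", as in the message


@dataclass
class Element:
    lang: Optional[str] = None            # LANG attribute, if present
    parent: Optional["Element"] = None


@dataclass
class Context:
    installed: Dict[str, Set[int]]        # font name -> covered code points
    prefs: Dict[Tuple[str, str], str]     # (language group, generic) -> font
    font_family: List[str]                # CSS font-family list
    element: Element
    doc_charset: str
    locale_group: str                     # e.g. "x-western" for de_DE
    default_generic: str = "serif"
    user_defined_font: Optional[str] = None


def language_group(ctx: Context) -> str:
    e = ctx.element
    while e is not None:                  # LANG on this element or an ancestor
        if e.lang:
            primary = e.lang.split("-")[0].lower()
            return LANG_TO_GROUP.get(primary, primary)
        e = e.parent
    charset = ctx.doc_charset.lower()
    if charset not in UNICODE_CHARSETS and charset in CHARSET_TO_GROUP:
        return CHARSET_TO_GROUP[charset]  # non-Unicode-based document charset
    return ctx.locale_group               # finally, the user's locale


def has_glyph(ctx: Context, font: Optional[str], ch: str) -> bool:
    # Generic names and uninstalled fonts have no coverage, so they never match.
    return font is not None and ord(ch) in ctx.installed.get(font, set())


def resolve(ch: str, ctx: Context) -> Tuple[str, Optional[str], str]:
    """Return (step, font, text to draw) for one character."""
    if ctx.user_defined_font:                                  # 1. User-Defined
        return "user-defined", ctx.user_defined_font, ch
    for name in ctx.font_family:                               # 2. font-family
        if has_glyph(ctx, name, ch):
            return "font-family", name, ch
    generic = next((f for f in ctx.font_family if f in GENERICS),
                   ctx.default_generic)
    pref = ctx.prefs.get((language_group(ctx), generic))       # 3. generic font
    if has_glyph(ctx, pref, ch):
        return "generic", pref, ch
    for name in sorted(ctx.installed):                         # 4. all fonts
        if has_glyph(ctx, name, ch):
            return "global", name, ch
    if ch in TRANSLITERATIONS:                                 # 5. transliteration
        return "transliteration", None, TRANSLITERATIONS[ch]
    return "question-mark", None, "?"                          # 6. last resort


installed = {"Arial": set(range(0x20, 0x250)), "MS Gothic": set(range(0x3040, 0x3100))}
prefs = {("x-western", "sans-serif"): "Arial", ("ja", "sans-serif"): "MS Gothic"}
ctx = Context(installed, prefs, ["Helvetica", "sans-serif"], Element(), "UTF-8", "x-western")
print([resolve(ch, ctx)[0] for ch in "A\u3042\u20ac\U0001F600"])
# ['generic', 'global', 'transliteration', 'question-mark']
```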
Re: OT (Kind of): Determining whether Locales are left-to-right or right-to-left.
Michael Kaplan wrote:

plus... dumb question 1. Is Aramaic (which doesn't seem to have a 2-character ISO code) the same as Amharic (which does... AM)? If not, Amharic appears to be a Semitic language too; is that written right-to-left too?

Amharic uses the Ethiopic script, and is not RTL as far as I know.

Aramaic has no native speakers

As far as I know, there is still a (small) minority of speakers in Turkey and Syria who speak the present-day descendant language of (biblical) Aramaic. This present-day dialect is commonly called Aramaic too. I have absolutely no idea what writing system, if any, they would use today (although probably not the ancient Aramaic script? More likely Arabic?)

Lukas Pietsch
Re: OT (Kind of): Determining whether Locales are left-to-right
At 01:47 -0800 2000-12-07, Antoine Leca wrote:

Urdu written in Nagari script is left-to-right? This is new to me...

No, Urdu written in Nagari script is Hindi.

There are a number of Muslim people who insist on naming it Urdu rather than Hindi. Since both codes exist, and since looking at the "locale" in fact means looking at the language code, you "should" be able to deal with requests such as "Urdu written in Nagari" or "Urdu written in Latin script"... (why not? after all, it is more practical when you are using some random computer.)

Antoine, it was a joke. Humour.

Furthermore, as a language Urdu predates Hindi by a wide margin (several centuries). And between medieval times and the 19th century, this language, named Urdu, was written alternately using Nagari or Arabic script. Now I agree that this is irrelevant to the current problem.

Um, my understanding is that the "Hindustani" language (so called "Hindoostani" by the British way back when) is really fairly uniform, apart from the alphabet it is written in (Arabic by Muslims, Nagari by Hindus, to use the sectarian taxonomy), and the fact that for much of the higher terminology, Urdu tends to borrow from Arabic and Hindi tends to borrow from Sanskrit. You may mean that "as a written language" Urdu predates Hindi.

Michael Everson ** Everson Gunn Teoranta ** http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Mob +353 86 807 9169 ** Fax +353 1 478 2597 ** Vox +353 1 478 2597
27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire
Re: Transcriptions of Unicode
On Wed, Dec 06, 2000 at 11:12:24PM -0800, James Kass wrote:

As for Chinese users searching for Chinese strings, Japanese text will most probably be incomprehensible regardless of font or mark-up.

That's true for pretty much every other pair of languages that use the same script, though.

-- 
David Starner - [EMAIL PROTECTED]
http://dvdeug.dhis.org
"(You see, the best way to solve a problem is to rigorously define it in terms of other people's problems and then run away quickly.)"
-- Roland McGrath [EMAIL PROTECTED]
Re: OT (Kind of): Determining whether Locales are left-to-right or
The application isn't "english", it's "an application". Properly done, it should be internationalized and thus able to be an "Arabic application" when serving Arabic pages and English when serving English pages.

I totally agree. This is actually what I am trying to achieve and what I was trying to convey to John. Unfortunately, the nature of the application means that certain application labels, which may be in a different language from the data they pertain to, are exposed to the user. I have been using the example of English for the application data and Arabic for the data from the database. In reality, the application will be translated into several languages (probably about 8). The data in the database is stored in Unicode and so could be in any language supported by Unicode... and by the browser, of course. The actual data is maintained by users via a web interface, so data could conceivably be entered by multiple users in multiple languages, some RTL and some LTR. We will, of course, allow the user to tag the data with a language identifier.

However, I guess my problem boils down to this: if a user requests a page that contains data that could potentially be in multiple languages, what criteria does one use to determine the directionality of the page? The directionality of the *text* is implied by each data element itself. But what about the page? I don't think it is practical to examine all the data elements and say, "well, there's 70% RTL data, so let's put the page in RTL," especially if that means one user is going to see the same page in RTL or LTR depending on the selection criteria applied to the database.

My reasoning was that a user who has expressed a preference for an RTL language would not mind seeing the page formatted for an RTL language, especially if there is a high probability that the data they will be seeing is in that language. (If they have indicated a preference for that language, then they would probably enter the data in that language.) I know this reasoning is somewhat flawed, but I am trying to do the best I can. I would be interested to hear the opinion of native speakers of Arabic or Hebrew (or one of the other RTL languages) as to how they would prefer to view such a page. My thought would be that such a person would prefer an RTL page; their eyes would naturally scan RTL-formatted pages more easily than LTR-formatted pages. Hell, if they have no preference, then I can just leave the page always LTR... it's, obviously, less work! :-)

David Tooke [EMAIL PROTECTED]

- Original Message -
From: [EMAIL PROTECTED]
To: "David Tooke" [EMAIL PROTECTED]
Cc: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, December 08, 2000 2:54 AM
Subject: Re: OT (Kind of): Determining whether Locales are left-to-right or

Hi David,

I sense a subtle (and not uncommon) disconnect in your last response. The application isn't "english", it's "an application". Properly done, it should be internationalized and thus able to be an "Arabic application" when serving Arabic pages and English when serving English pages. You might have instances where an "English application" is *showing* some Arabic data, in which case you have some Arabic data on an English-formatted page.

Now: the data locale, page (user) locale, and server locale are three separate, independent things, and each has a certain validity under certain circumstances. Just because you're showing the date stamp for some event in Riyadh (sp?) doesn't mean you should use an Arabic date format (if the user's locale is English). Similarly, the date stamp for an event in London *would* be in an Arabic format for an Arabic user, no? Trying to rely on the range of characters or some heuristic to determine what the data locale is implies a hole in your database schema.
The page layout will vary by *user* locale, but data presented in it may be formatted for its own locale (for example, an Arabic piece of text, say a customer's name, will be presented RTL *within* a generally LTR page).

So, in short, you need to negotiate a locale with the user in your application and use that to determine the overall page layout. *Within* that page there will be specific instances (or not) of the data locale being used to format content. An "Arabic application" run from your server will have directionality tags in the HTML (at least for modern browsers), which will greatly assist the "relayout"; plus, it will load explicitly RTL page elements (such as graphics, etc.) and use an Arabic locale for formatting non-String datatypes.

Good luck,

Addison

===
Addison P. Phillips, Principal Consultant
Inter-Locale LLC, http://www.inter-locale.com
Los Gatos, CA, USA
mailto:[EMAIL PROTECTED]
+1
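Addison's advice (negotiate a locale for the page as a whole, and let each data item keep its own) might be sketched as follows. The RTL language set, the function names, and the assumption that the user's preferences arrive as an ordered list of tags (for instance, already parsed from Accept-Language) are all illustrative, not part of any particular framework.

```python
from typing import List

# Illustrative, not exhaustive: primary language subtags usually written RTL.
RTL_LANGUAGES = {"ar", "he", "fa", "ur", "yi"}


def primary_subtag(tag: str) -> str:
    """'he-IL' or 'he_IL' -> 'he'."""
    return tag.replace("_", "-").split("-")[0].lower()


def negotiate_ui_language(user_prefs: List[str], localized: List[str],
                          default: str = "en") -> str:
    """Return the first user-preferred language the application is localized into."""
    available = {primary_subtag(t): t for t in localized}
    for tag in user_prefs:  # assumed already sorted by preference
        match = available.get(primary_subtag(tag))
        if match is not None:
            return match
    return default


def direction(tag: str) -> str:
    """Base direction implied by a language tag."""
    return "rtl" if primary_subtag(tag) in RTL_LANGUAGES else "ltr"


# The page layout follows the negotiated UI language...
ui = negotiate_ui_language(["he-IL", "en-US"], ["en", "de", "ar"])
page_dir = direction(ui)    # "ltr": no Hebrew localization exists
# ...while each data item is formatted with its own direction.
name_dir = direction("ar")  # "rtl" for an Arabic customer name
```

The key design choice is that `page_dir` never looks at the data, so the layout stays stable across queries, which is the "homogeneous session" property discussed in the thread.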
Re: OT (Kind of): Determining whether Locales are left-to-right or
Hi David,

Actually, I think that the directionality of the page *CONSIDERED AS A WHOLE* is directly related to the locale of the user. For example, I might look at a page that contains a very large result set from a database query, presented as a table. The results would comprise, let's say, 90% of the text of the document. If the results are all in Hebrew, should I re-layout the page if the headings, the footer and other information are in English? I think the answer is "no"; in other words, the opposite of what you're saying. If the *user* locale is en_US, then the page should be laid out for that user's preferences, even if the data itself (in individual fields) is RTL.

IOW, locale has more than one level, and you can have more than one locale on a page... but the "overriding" locale (the "page default") should go with the USER'S language preference, not with the specific data currently being retrieved. That way you can still do reasonable things when mixing disparate languages in the same page. It also means that the user experience will be homogeneous during a session: once in the RTL layout, always in the RTL layout.

thanks,

Addison

===
Addison P. Phillips, Principal Consultant
Inter-Locale LLC, http://www.inter-locale.com
Los Gatos, CA, USA
mailto:[EMAIL PROTECTED]
+1 408.210.3569 (mobile)
+1 408.904.4762 (fax)
===
Globalization Engineering Consulting Services

On Thu, 7 Dec 2000, David Tooke wrote:
[snip]
RE: [OT] Arabic script langs in 3.0 ; list?
Elaine,

You speak of the "standard Arabic script". You must add additional letters to the standard Arabic script for languages such as Farsi and Urdu.

Carl

-Original Message-
From: Elaine Keown [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 04, 2000 2:20 PM
To: Unicode List
Subject: [OT] Arabic script langs in 3.0 ; list?

Hello,

Unicode 3.0 mentions 11 contemporary languages written in Arabic script, most from Central Asia and none from Africa except Arabic; Berber is not mentioned. Is Arabic script no longer used south of the Sahara? Or does the standard Arabic script easily cover the relevant African languages?

My usually excellent university library also could not answer this question: how many languages are written in Arabic script today? I can only find 25:

Arabic
Balti
Baluchi
Berber
Farsi
Hausa
Karaite
Kashmiri
Kazakh
Kirghiz
Kurmanji
Luri
Mazanderani
Moplah
Panjabi (Pakistani)
Pashto
Pulaar
Sindhi
Siraiki (also known as Saraiki or Lahnda or Western Panjabi)
Sulu
Uighur
Urdu
Uzbek
Wolof

Help appreciated with improving my list.

Elaine
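Carl's point can be checked against the Unicode Character Database, where letters such as PEH and GAF (used in Farsi) or TTEH and YEH BARREE (used in Urdu) are encoded separately from the basic Arabic letters. This is a small sketch using Python's `unicodedata`. Treating U+0621..U+063A and U+0641..U+064A as the "basic" set is an assumption made for illustration; which extra letters each language actually needs is documented in the standard's own script descriptions.

```python
import unicodedata
from typing import List

# Assumption: treat U+0621..U+063A and U+0641..U+064A as the basic letters of
# the Arabic alphabet itself (U+0640 TATWEEL is not a letter).
BASIC_ARABIC = set(range(0x0621, 0x063B)) | set(range(0x0641, 0x064B))


def extra_arabic_letters(text: str) -> List[str]:
    """List the Arabic-script letters in text that lie outside the basic set."""
    seen: List[str] = []
    for ch in text:
        name = unicodedata.name(ch, "")
        if (name.startswith("ARABIC LETTER") and ord(ch) not in BASIC_ARABIC
                and ch not in seen):
            seen.append(ch)
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seen]


print(extra_arabic_letters("\u0679\u0648\u067e\u06cc"))  # an Urdu word
print(extra_arabic_letters("\u0643\u062a\u0627\u0628"))  # an Arabic word
```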
Re: OT (Kind of): Determining whether Locales are left-to-right or
This message is best viewed with a monowidth font.

[EMAIL PROTECTED] wrote:

For example, I might look at a page that contains a very large result set from a database query, presented as a table. The results would comprise, let's say, 90% of the text of the document. If the results are all in Hebrew, should I re-layout the page if the headings, the footer and other information are in English? I think the answer is "no"; in other words, the opposite of what you're saying.

So far so clear. The page as a whole is LTR with RTL inclusions, namely the database content, like this (as usual, lowercase is LTR text, UPPERCASE is RTL text):

Example 1:

    top dogs by country

    country   firstname   lastname
    -------   ---------   --------
    u.s.      bill        clinton
    israel    DUHE        KARAB
    u.k.      tony        blair
    syria     RASHAB      DASSA-LA

If the *user* locale is en_US, then the page should be laid out for that user's preferences, even if the data itself (in individual fields) is RTL.

But now your message seems to go off the rails. If the application is *not* localizable in Hebrew (it insists on presenting header, footer, etc. in English), but the browser's locale setting is "he-IL", you want the page to be presented RTL with the header and footer as embedded LTR, like this?

Example 2:

                   top dogs by country

    lastname   firstname   country
    --------   ---------   -------
     clinton        bill      u.s.
       KARAB        DUHE    israel
       blair        tony      u.k.
    DASSA-LA      RASHAB     syria

If a user requests a page that contains data that could potentially be in multiple languages, what criteria does one use to determine the directionality of the page? The directionality of the *text* is implied by each data element itself. But what about the page?

My view is that the base direction of the page is the direction of the fixed elements on the page. If these fixed elements are in English, the base direction is always LTR. If the fixed elements are localizable based on the browser settings, then whatever language/script they are localized into determines the base direction.

In short, Example 1 good, Example 2 bad, no matter what the browser setting is. Of course, if the application can cope with a "he-IL" browser setting and render the fixed elements into Hebrew, then the base direction should be RTL.

-- 
There is / one art         || John Cowan [EMAIL PROTECTED]
no more / no less          || http://www.reutershealth.com
to do / all things         || http://www.ccil.org/~cowan
with art- / lessness       \\ -- Piet Hein
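John's rule (base direction from the fixed text, per-item direction from the data) maps naturally onto HTML's `lang` and `dir` attributes. What follows is a minimal sketch: the helper names and the RTL language set are hypothetical, and the data is assumed to carry the language tags David says users will enter.

```python
from html import escape
from typing import List, Tuple

# Illustrative, not exhaustive.
RTL_LANGUAGES = {"ar", "he", "fa", "ur", "yi"}

Cell = Tuple[str, str]  # (text, language tag stored with the data)


def direction(lang: str) -> str:
    return "rtl" if lang.split("-")[0].lower() in RTL_LANGUAGES else "ltr"


def render_table(chrome_lang: str, title: str, headers: List[str],
                 rows: List[List[Cell]]) -> str:
    """Page direction comes from the fixed (chrome) text; each cell carries its own."""
    out = [f'<html lang="{escape(chrome_lang)}" dir="{direction(chrome_lang)}">',
           f"<h1>{escape(title)}</h1>",
           "<table>",
           "<tr>" + "".join(f"<th>{escape(h)}</th>" for h in headers) + "</tr>"]
    for row in rows:
        cells = "".join(
            f'<td lang="{escape(lang)}" dir="{direction(lang)}">{escape(text)}</td>'
            for text, lang in row)
        out.append(f"<tr>{cells}</tr>")
    out += ["</table>", "</html>"]
    return "\n".join(out)


# English chrome, Hebrew data: Example 1, whatever the browser setting.
print(render_table("en", "top dogs by country", ["country", "firstname", "lastname"],
                   [[("israel", "en"), ("\u05d0\u05d4\u05d5\u05d3", "he"),
                     ("\u05d1\u05e8\u05e7", "he")]]))
```

Only a Hebrew localization of the chrome (`chrome_lang="he"`) would flip the page to RTL, matching the last sentence of John's message.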
Re: displaying Unicode text (was Re: Transcriptions of Unicode)
Thanks! I appreciate the description. My fears were unfounded.

This states that *for each character* in the element, the implementation is supposed to go down the list of fonts in the font-family property, to find a font that exists and that contains a glyph for the current character.

I agree that this does not produce the optimal results, since one should have the freedom to select different fonts based on the context of the character. The above description is much better than a very coarse-grained approach (like having the entire document or element in the same font), but needs some more wriggle-room to allow people flexibility.

Mark

- Original Message -
From: "Erik van der Poel" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: "Unicode List" [EMAIL PROTECTED]
Sent: Thursday, December 07, 2000 00:30
Subject: Re: displaying Unicode text (was Re: Transcriptions of "Unicode")
Re: OT (Kind of): Determining whether Locales are left-to-right or
On 12/06/2000 12:19:00 PM "Michael (michka) Kaplan" wrote:

Aramaic has no native speakers

True, if what is meant is Ancient Aramaic. False if we mean Assyrian or Chaldean Neo-Aramaic.

- Peter

---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]
Re: OT (Kind of): Determining whether Locales are left-to-right or
I think you really need to give the user the option to override the assumptions being made, as the user's degree of familiarity and experience with Hebrew and Arabic, and the purpose of using the application, will make a big difference.

For example, the case that you suggest goes off the rails is exactly what the user would want if he were going to query the list of just the Arabic and Israeli records and then copy and paste them into, perhaps, a spreadsheet. Having the header ordering change would allow the paste to place the data from each field in the right columns in the target spreadsheet.

I would spend less time debating which is correct, and simply offer a button in the UI to flip the ordering of the page.

tex

John Cowan wrote:
[snip]

-- 
According to Murphy, nothing goes according to Hoyle.
--
Tex Texin   Director, International Business
mailto:[EMAIL PROTECTED]
+1-781-280-4271   Fax: +1-781-280-4655
Progress Software Corp.   14 Oak Park, Bedford, MA 01730
http://www.Progress.com   #1 Embedded Database
Globalization Program
http://www.Progress.com/partners/globalization.htm
---
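Tex's suggestion, a default direction that the user can flip, needs little more than a stored override. This sketch assumes the override is kept in some per-user session store (not shown), and the function names are hypothetical.

```python
from typing import Optional

# Illustrative, not exhaustive.
RTL_LANGUAGES = {"ar", "he", "fa", "ur", "yi"}


def default_direction(ui_lang: str) -> str:
    """Direction implied by the language the fixed page text is shown in."""
    return "rtl" if ui_lang.split("-")[0].lower() in RTL_LANGUAGES else "ltr"


def page_direction(ui_lang: str, override: Optional[str] = None) -> str:
    """An explicit user choice ('ltr' or 'rtl') wins over the default."""
    if override in ("ltr", "rtl"):
        return override
    return default_direction(ui_lang)


def flip(current: str) -> str:
    """New override to store when the user presses the 'flip page' button."""
    return "ltr" if current == "rtl" else "rtl"


session_override = None                             # nothing chosen yet
current = page_direction("en", session_override)    # "ltr"
session_override = flip(current)                    # user presses the button
current = page_direction("en", session_override)    # now "rtl"
```

Keeping the override separate from the language logic means the default rule can be debated and changed without touching the user's explicit choice.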
Re: OT (Kind of): Determining whether Locales are left-to-right or
So, suppose all the data from the database is in Hebrew... and the user's browser is set to Hebrew. You are saying that because we have some headings in English, the entire page should be formatted LTR, not RTL. I have to disagree. The *content* of the page would be Hebrew. The headings are simply utilitarian; they shouldn't affect that.

I think we have established that, in order to create a uniform experience for the user, it would be impractical to base the directionality on that of the database content, because of the presence of mixed languages. I don't think it should be based on the application either. A Hebrew document written by a user on an untranslated word processor is *still* a Hebrew document. This just leaves the user's locale.

I know you balk at having it like your second example, but what is so bad about it? It looks kinda strange to you and me. But would it to a native speaker of an RTL language? I would be interested in their opinion. (I am assuming you are not a native speaker of an RTL language.)

- Original Message -
From: "John Cowan" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: "Unicode List" [EMAIL PROTECTED]
Sent: Thursday, December 07, 2000 11:01 AM
Subject: Re: OT (Kind of): Determining whether Locales are left-to-right or
[snip]
Re: OT (Kind of): Determining whether Locales are left-to-right or
David Tooke wrote: I don't think it should be based on the application. A Hebrew document written by a user on an untranslated word processor is *still* a Hebrew document.

I assume you mean "a word processor localized in English" rather than "a word processor that can't do bidi". If the former, I agree. If the latter, it will be sheer gibberish, unless the user is a magician at writing his Hebrew LTR, line by line. But note that the chrome of a word processor (the border, menus, etc.) isn't *part* of the document it is editing.

Take the converse case: the document frame is in Hebrew, with the key column on the right, but most of the data is in English. On my assumptions, when I look at such a thing, I am a little confused because the key column is on the right. But when I select the variable part of the text (starting in the upper right corner and sweeping to the lower left corner), and copy and paste it into a word processor, the logical-order rules ensure that the text comes out with the key column on the left again.

(I am assuming you are not a native speaker of an RTL language.)

I am not.
Re: OT (Kind of): Determining whether Locales are left-to-right or
[EMAIL PROTECTED] wrote: Hi John, I think we're saying the same thing: the language of the page is the base directionality. If the application is localized into Hebrew, then the base directionality is RTL.

Okay, good. In that case by "user locale" you mean "language of the (fixed parts of the) page", which is confusing; David was interpreting it as "locale set in the browser".
Re: OT (Kind of): Determining whether Locales are left-to-right or
Hi John,

I think we're saying the same thing: the language of the page is the base directionality. If the application is localized into Hebrew, then the base directionality is RTL.

Addison

===
Addison P. Phillips
Principal Consultant
Inter-Locale LLC
http://www.inter-locale.com
Los Gatos, CA, USA
mailto:[EMAIL PROTECTED]
+1 408.210.3569 (mobile)
+1 408.904.4762 (fax)
===
Globalization Engineering Consulting Services
Re: Font help
Michael Everson wrote: I have no idea how I should encode such a font. Help?

According to a suitable legacy encoding, such as JIS X 0208, Shift-JIS, or MacJapanese. What you will have is a font that provides glyphs for only a few coded characters, but no matter: it's just like an 8859-1-encoded font with glyphs only for A-Z and a-z, which is common enough. You could also support just the katakana fingerspellings in an 8-bit font encoded using JIS X 0201.

Mapping tables for all of these are available at http://www.unicode.org/Public/MAPPINGS under EASTASIA/JIS and VENDORS/APPLE.
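To see which legacy code points a partial font like this has to fill, the mapping tables can be read mechanically. A minimal Python sketch, assuming the usual layout of the files under Public/MAPPINGS (hex columns separated by whitespace, `#` introducing a comment; JIS0208.TXT carries Shift-JIS, JIS X 0208 and Unicode columns, in that order). The function names are hypothetical, and the three sample rows are written in that layout purely for illustration: they agree with Python's own `shift_jis` codec, but should be checked against the real file.

```python
from __future__ import annotations

# Sketch: parse a mapping table in the Public/MAPPINGS layout (whitespace-
# separated hex columns, '#' starts a comment) and look up legacy codes.


def parse_mapping_table(text: str, legacy_col: int = 0, unicode_col: int = -1) -> dict[str, int]:
    """Map each Unicode character to its legacy code, using the chosen columns."""
    table: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue  # blank or comment-only line
        table[chr(int(fields[unicode_col], 16))] = int(fields[legacy_col], 16)
    return table


def legacy_codes_for(chars: str, table: dict[str, int]) -> dict[str, int | None]:
    """Legacy code for each character, or None where the table has no entry."""
    return {ch: table.get(ch) for ch in chars}


# Three rows in the JIS0208.TXT layout: Shift-JIS, JIS X 0208, Unicode.
SAMPLE = """\
# Shift-JIS  JIS X 0208  Unicode
0x8341\t0x2522\t0x30A2\t# KATAKANA LETTER A
0x8343\t0x2524\t0x30A4\t# KATAKANA LETTER I
0x8345\t0x2526\t0x30A6\t# KATAKANA LETTER U
"""

sjis = parse_mapping_table(SAMPLE, legacy_col=0)
jis = parse_mapping_table(SAMPLE, legacy_col=1)
print({ch: hex(code) for ch, code in sjis.items()})
print(legacy_codes_for("アエ", jis))  # エ is not in this tiny excerpt, so it maps to None
```

The same parser reads a two-column table such as JIS0201.TXT with the default column choices.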
RE: (bidi) Determining whether Locales are left-to-right
Localized Hebrew Excel has a button to flip the global direction of the screen and the presentation of the columns (RTL or LTR), and this preference is also stored in the file. The user can choose his default. It should not be determined from the locale: when I open a new spreadsheet, the application cannot guess whether I am going to enter Hebrew or English data. Nor is it determined by the user interface language. Word is similar.

BTW, it is not off topic: the global direction is defined by the Unicode bidi algorithm.

Jony

-Original Message-
From: John Cowan [mailto:[EMAIL PROTECTED]]
Sent: Thursday, December 07, 2000 7:31 PM
To: Unicode List
Subject: Re: OT (Kind of): Determining whether Locales are left-to-right or

Tex Texin wrote: For example, the case that you are suggesting goes off the rails is exactly what the user would want if he were going to query the list of just Arabic and Israeli records and then copy and paste them into, perhaps, a spreadsheet.

Is it? I've never seen an RTL-localized spreadsheet program; perhaps the A1 column appears rightmost, in which case pasting in logical order would make the text end up with country in A (rightmost), firstname in B, lastname in C. Which is what you want.
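When no higher-level protocol (such as Excel's direction button) fixes it, the Unicode Bidirectional Algorithm picks a paragraph's direction with rules P2 and P3: the first strong character (bidi class L, R or AL), skipping text inside isolates, decides, and P3 falls back to LTR if there is none. A simplified Python sketch using the standard `unicodedata` module; it treats the whole string as one paragraph, and the `default` parameter is an assumption standing in for a higher-level protocol.

```python
import unicodedata


def paragraph_direction(text: str, default: str = "ltr") -> str:
    """Simplified UBA rules P2-P3 for a single paragraph.

    Returns 'rtl' if the first strong character (outside any isolate) is
    R or AL, 'ltr' if it is L, and `default` when there is none. P3 itself
    defaults to LTR; a higher-level protocol may pass something else.
    """
    isolate_depth = 0
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("LRI", "RLI", "FSI"):
            isolate_depth += 1          # skip everything up to the matching PDI
        elif bidi_class == "PDI":
            if isolate_depth:
                isolate_depth -= 1      # an unmatched PDI is simply ignored
        elif isolate_depth == 0:
            if bidi_class == "L":
                return "ltr"
            if bidi_class in ("R", "AL"):
                return "rtl"
    return default


print(paragraph_direction("top dogs by country"))  # ltr
print(paragraph_direction("שלום, world"))           # rtl (Hebrew letters are class R)
print(paragraph_direction("2000-12-07"))            # ltr (digits are not strong)
```

This is also why an empty new spreadsheet cannot infer anything: with no strong character yet, only the default (or the user's button) can decide.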
Re: OT (Kind of): Determining whether Locales are left-to-right or
John,

I wasn't thinking about A, B, C so much as the headings in the results table and the ordering of the data in the fields: country, state, name. Where I have seen Hebrew or Arabic tables, the heading order changes from LTR to RTL, and so you would want the data returned from the query to be in the same order. Hence the need for the columns of the query to be switched around.

tex

-- 
According to Murphy, nothing goes according to Hoyle.
--
Tex Texin   Director, International Business
mailto:[EMAIL PROTECTED]   +1-781-280-4271   Fax: +1-781-280-4655
Progress Software Corp.   14 Oak Park, Bedford, MA 01730
http://www.Progress.com   #1 Embedded Database
Globalization Program
http://www.Progress.com/partners/globalization.htm
---
Re: OT (Kind of): Determining whether Locales are left-to-right or
David Tooke wrote: ...And in response to Tex: I would spend less time debating which is correct, and simply offer a button on the UI to flip the ordering of the page. Although we cannot have an option to store preferences for particular users, we will allow users to change the locale during the current session. This is initially set to the user's browser default, but they can change it. We do not anticipate providing an option that just changes the page directionality. I believe if the user says 'ar_EG' they should get RTL; if they say 'en_US', they should get LTR.

David,

Since you have the technology to flip the page direction, it is easy to make a button override. And as there are examples where the user's locale is wrong for establishing the page direction, it seems silly to me to ask the user to change his locale to get the direction he wants. Doing so may also change other settings in ways he doesn't want. It also requires the user to know all of the properties of all of the locales, to establish which ones change the page direction but keep other properties with the same or similar values.

However, you are free to do as you like. ;-)

tex
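The override Tex describes amounts to a precedence rule: an explicit direction chosen by the user wins, and only otherwise does the session locale decide. A minimal Python sketch; `resolve_direction` and the RTL language set are hypothetical and illustrative, not part of any existing application.

```python
from typing import Optional

# Illustrative, not exhaustive: languages whose locales default to RTL.
RTL_LANGUAGES = {"ar", "fa", "he", "ur", "yi"}


def resolve_direction(session_locale: str, override: Optional[str] = None) -> str:
    """Page direction: the user's explicit choice if any, else from the locale."""
    if override is not None:
        if override not in ("ltr", "rtl"):
            raise ValueError(f"override must be 'ltr' or 'rtl', not {override!r}")
        return override
    language = session_locale.replace("_", "-").split("-")[0].lower()
    return "rtl" if language in RTL_LANGUAGES else "ltr"


print(resolve_direction("ar_EG"))                  # rtl
print(resolve_direction("en_US"))                  # ltr
# Flip the page without touching the locale (dates, numbers stay en_US):
print(resolve_direction("en_US", override="rtl"))  # rtl
```

Keeping the override separate from the locale is the point of the design: the user gets the direction he wants without inheriting the other settings of some RTL locale.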
Aramaic by any other name.........
Hello,

Aramaic is spoken in many countries today: Israel, Armenia, Georgia, Turkey, Iran, Iraq, the U.S., probably Azerbaijan, and maybe further into Central Asia in some pockets... but it's never called Aramaic, as far as I know. It's called Surit, Kurdit, Turoyo, Assyrian, Mandaean, Ma'alula, etc. Scholars frequently call all these "Neo-Aramaic." Aramaic has been written in its original square script (today called "Hebrew square script"), in Syriac, in Arabic, and even in Cyrillic in the former Soviet Union.

Thanks for the help with the Arabic-script languages. When I get a "complete" list (about 100), I'll post it.

--Elaine

PS: Right-to-left languages written in "Hebrew-Aramaic square script" include Kurdi, Dzhidi, Yevanic, Italki, Bukhari, Persian, Tamazight, Shuadit, Comtadin, pre-1920s Ladino, Middle Arabic, Yiddish, and others.
UCS-2, UCS-4, UTF-16 unicode format files
Hello,

I am new to this mailing list. I hope it is appropriate to ask the following questions:

1. Is there any browser I can use to view UCS-2, UCS-4, or UTF-16 Unicode files?
2. I am looking for some UCS-2, UCS-4, and UTF-16 files. Is there a web site from which I can download such files, or any software I can use to generate them?

Thanks for your help,
Er Song Moong
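Such files are easy to generate with any language that has Unicode codecs. A minimal Python sketch: Python has no codec named UCS-2, so the sketch assumes UCS-2 means UTF-16 restricted to the BMP (the byte sequences are identical there) and UCS-4 means UTF-32, which is the same encoding over the range U+0000..U+10FFFF. Big-endian output with a leading byte order mark is a choice made here, not a requirement, and the file names are hypothetical.

```python
import tempfile
from pathlib import Path

BOM_16 = b"\xfe\xff"          # U+FEFF in big-endian 16-bit form
BOM_32 = b"\x00\x00\xfe\xff"  # U+FEFF in big-endian 32-bit form


def encode_ucs2(text: str) -> bytes:
    """Big-endian UCS-2: like UTF-16BE, but no surrogate pairs allowed."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent characters beyond U+FFFF")
    return text.encode("utf-16-be")


def write_samples(text: str, directory: Path) -> dict:
    """Write the same text as UTF-16, UCS-2 and UCS-4 files; return their paths."""
    directory.mkdir(parents=True, exist_ok=True)
    contents = {
        "sample-utf16.txt": BOM_16 + text.encode("utf-16-be"),
        "sample-ucs2.txt": BOM_16 + encode_ucs2(text),
        "sample-ucs4.txt": BOM_32 + text.encode("utf-32-be"),
    }
    paths = {}
    for name, data in contents.items():
        path = directory / name
        path.write_bytes(data)
        paths[name] = path
    return paths


sample = "Hello, 日本語, שלום"  # all in the BMP, so the UCS-2 file can be written
for name, path in write_samples(sample, Path(tempfile.mkdtemp())).items():
    print(f"{name}: {path.stat().st_size} bytes")
```

Reading the files back with the BOM-aware `utf-16` and `utf-32` codecs recovers the original text, which makes a quick check that a viewer is interpreting them correctly.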
Re: Did I do this right?
[EMAIL PROTECTED] wrote: I put some Japanese text up on the web at http://11digitboy.stormloader.com and I think I did it right. Is the &#n; format correct, where n is a Unicode code point in decimal? Do I set Netscape to UTF-8 to see it? What about MSIE?

For Netscape 6/Mozilla and MSIE, you don't have to set the encoding at all as long as you have a Japanese font. For Communicator 4.x, you need to set the encoding to one that contains the characters you have in mind. Since the ones you used are Japanese Katakana, you can view them under any Japanese encoding, GB2312, EUC-KR, or Unicode. (Note that you must have set appropriate fonts containing these glyphs for the encodings I mentioned.)

- Kat
-- 
Katsuhiko Momoi
Netscape International Client Products Group
[EMAIL PROTECTED]
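On the format question: a numeric character reference is written &#n; with n the code point in decimal, or &#xh; with h in hexadecimal. A minimal Python sketch using only the standard library to produce and check such references; `to_ncr` is a hypothetical helper name.

```python
import html


def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


katakana = "アイウ"
print(to_ncr(katakana))                             # &#12450;&#12452;&#12454;
print(html.unescape("&#12450;&#x30A2;"))            # アア (decimal and hex forms agree)
print(html.unescape(to_ncr(katakana)) == katakana)  # True
```

The resulting page source is plain ASCII; as the reply notes, Netscape 6/Mozilla and MSIE then only need a Japanese font, while Communicator 4.x also needs a suitable encoding selected.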