Re: displaying Unicode text (was Re: Transcriptions of Unicode)

2000-12-07 Thread Erik van der Poel

Mark Davis wrote:
 
 Let's take an example.
 
 - The page is UTF-8.
 - It contains a mixture of German, dingbats and Hindi text.
 - My locale is de_DE.
 
 From your description, it sounds like Mozilla works as follows:
 
 - The locale maps (I'm guessing) to 8859-1
 - 8859-1 maps to, say, Helvetica.
 - The dingbats and Hindi appear as boxes or question marks.
 
 This would be pretty lame, so I hope I misunderstand you!!

Sorry, I've been abbreviating quite a bit, so I left out a lot. Yes,
you've misunderstood me, but only because I abbreviated so much. Sorry.
Let me try again, with more feeling this time.

Using the example above:

- The locale maps to "x-western" (ja_JP would map to "ja", so I've
prepended "x-" for the "language groups" that don't exist in RFC 1766)

- x-western and CSS' sans-serif map to Arial

- The dingbats appear as dingbats if they are in Unicode and at least
one of the dingbat fonts on the system has a Unicode cmap subtable
(WingDings is a "symbol" font, so it doesn't have such a table), while
the Hindi might display OK on some Windows systems if they have Hindi
support (Mozilla itself does not support any Indic languages yet).

We could support the WingDings font if we add an entry for WingDings to
the following table:

http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsFontMetricsWin.cpp#872

We just haven't done that yet.

Basically, Mozilla will look at all the fonts on the system to find one
that contains a glyph for the current character.

The language group and user locale stuff that I mentioned earlier is
only one part of the process -- the part that deals with the user's font
preferences. I'll explain more of the rest of the process:

Mozilla implements CSS2's font matching algorithm:

  http://www.w3.org/TR/REC-CSS2/fonts.html#algorithm

This states that *for each character* in the element, the implementation
is supposed to go down the list of fonts in the font-family property, to
find a font that exists and that contains a glyph for the current
character. Mozilla implements this algorithm to the letter, which means
that fonts are chosen for each character without regard for neighboring
characters (unlike MSIE). This may actually have been a bad decision,
since we sometimes end up with text that looks odd due to font changes.
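
A minimal sketch of the per-character matching described above, to show how it can break text into several font runs. The font names are the ones from the font-family example in this message, but the FONTS table (which characters each font covers) and the helper names pick_font and font_runs are assumptions made up for illustration; this is not Mozilla's code.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical font database: font name -> characters it has glyphs for.
FONTS: Dict[str, Set[str]] = {
    "Arial": set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ äöüß"),
    "MS Gothic": set("日本語"),
}


def pick_font(ch: str, families: List[str]) -> Optional[str]:
    """Return the first listed font that exists and has a glyph for ch."""
    for name in families:
        glyphs = FONTS.get(name)
        if glyphs is not None and ch in glyphs:
            return name
    return None  # no listed font covers this character


def font_runs(text: str, families: List[str]) -> List[Tuple[Optional[str], str]]:
    """Choose a font for each character independently, then group neighbors
    that happened to resolve to the same font."""
    runs: List[Tuple[Optional[str], str]] = []
    for ch in text:
        font = pick_font(ch, families)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs
```

Because each character is resolved on its own, a string mixing scripts comes back as alternating runs, which is where the odd-looking font changes come from.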

Anyway, Mozilla's algorithm has the following steps:

1. "User-Defined" font
2. CSS font-family property
3. CSS generic font (e.g. serif)
4. list of all fonts on system
5. transliteration
6. question mark

You can see these steps in the following pieces of code:

http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsFontMetricsWin.cpp#2642

http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#3108

1. "User-Defined" font (FindUserDefinedFont)

We decided to include the User-Defined font functionality in Netscape 6
again. It is similar to the old Netscape 4.X. Basically, if the user
selects this encoding from the View menu, then the browser passes the
bytes through to the font, untouched. This is for charsets that we don't
already support. This step needs to be the first step, since it
overrides everything else.

2. CSS font-family property (FindLocalFont)

If the user hasn't selected User-Defined, we invoke this routine. It
simply goes down the font-family list to find a font that exists and
that contains a glyph for the current character. E.g.:

  font-family: Arial, "MS Gothic", sans-serif;

3. CSS generic font (FindGenericFont)

If the above fails, this routine tries to find a font for the CSS
generic (e.g. sans-serif) that was found in the font-family property, if
any, otherwise it falls back to the user's default (serif or
sans-serif). This is where the font preferences come in, so this is
where we try to determine the language group of the element. I.e. we
take the LANG attribute of this element or a parent element if any,
otherwise the language group of the document's charset, if
non-Unicode-based, otherwise the user's locale's language group.
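
The order of that lookup can be sketched as follows. The two locale mappings (de_DE to "x-western", ja_JP to "ja") come from this message; everything else, including the table contents, the function name language_group, falling through when a LANG value is unknown, and defaulting to "x-western", is an assumption for illustration only.

```python
from typing import Optional

# Hypothetical tables; the real Mozilla mappings are not reproduced here.
LANG_TO_GROUP = {"ja": "ja", "de": "x-western", "en": "x-western"}
CHARSET_TO_GROUP = {"iso-8859-1": "x-western", "shift_jis": "ja"}
UNICODE_CHARSETS = {"utf-8", "utf-16"}


def language_group(lang_attr: Optional[str], charset: str, user_locale: str) -> str:
    """Resolve the language group used to look up font preferences."""
    # 1. LANG attribute of the element or of the nearest parent that has one.
    if lang_attr:
        primary = lang_attr.split("-")[0].lower()
        if primary in LANG_TO_GROUP:
            return LANG_TO_GROUP[primary]
        # Assumption: an unknown LANG value falls through to the next step.
    # 2. The document's charset, but only if it is not Unicode-based.
    cs = charset.lower()
    if cs not in UNICODE_CHARSETS and cs in CHARSET_TO_GROUP:
        return CHARSET_TO_GROUP[cs]
    # 3. The language group of the user's locale.
    primary = user_locale.split("_")[0].lower()
    return LANG_TO_GROUP.get(primary, "x-western")  # assumed default
```

For the UTF-8 page in the example, with no LANG attribute and a de_DE locale, this falls through to the third step.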

4. list of all fonts on system (FindGlobalFont)

If the above fails, this routine goes through all fonts on the system,
trying to find one that contains a glyph for the current character.

5. transliteration (FindSubstituteFont)

If we still can't find a font for this character, we try a
transliteration table. For example, the euro sign is mapped to the three
ASCII characters "EUR", which is useful on some Unix systems that don't
have the euro glyph yet. Actually, this transliteration step isn't even
implemented on Windows yet.

6. question mark (FindSubstituteFont)

If we can't find a transliteration, we fall back to the last resort --
the good ol' question mark.
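
Put together, the six steps can be sketched as one fallback chain. This is an illustration under stated assumptions, not the code behind the links above: the FONTS and GENERIC_PREFS tables, the font "Some Dingbats" and the function names are hypothetical, and step 1 is simplified to characters rather than raw bytes. Only the euro-to-"EUR" entry comes from this message.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical installed fonts: name -> characters the font has glyphs for.
FONTS: Dict[str, Set[str]] = {
    "Arial": set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz "),
    "Some Dingbats": {"\u2708"},  # U+2708 AIRPLANE
}
# Hypothetical font preferences: (language group, CSS generic) -> font name.
GENERIC_PREFS: Dict[Tuple[str, str], str] = {("x-western", "sans-serif"): "Arial"}
# Transliteration table; the euro entry is the one mentioned in the text.
TRANSLITERATIONS: Dict[str, str] = {"\u20ac": "EUR"}


def has_glyph(font: str, ch: str) -> bool:
    return ch in FONTS.get(font, set())


def resolve(ch: str, families: List[str], generic: str, lang_group: str,
            user_defined: bool = False) -> Tuple[str, str]:
    """Return (how the character was resolved, text to draw)."""
    if user_defined:                                  # 1. User-Defined
        return ("User-Defined", ch)
    for name in families:                             # 2. font-family list
        if has_glyph(name, ch):
            return (name, ch)
    preferred = GENERIC_PREFS.get((lang_group, generic))
    if preferred is not None and has_glyph(preferred, ch):
        return (preferred, ch)                        # 3. CSS generic font
    for name in FONTS:                                # 4. all fonts on system
        if has_glyph(name, ch):
            return (name, ch)
    if ch in TRANSLITERATIONS:                        # 5. transliteration
        return ("transliteration", TRANSLITERATIONS[ch])
    return ("question mark", "?")                     # 6. last resort
```

With these made-up tables, a dingbat is found in step 4, the euro sign reaches step 5, and a Devanagari letter ends up as a question mark.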

That's it. I hope I didn't abbreviate too much this time!

Erik



Re: OT (Kind of): Determining whether Locales are left-to-right or right-to-left.

2000-12-07 Thread Lukas Pietsch

Michael Kaplan wrote:


  plus...
  dumb question 1.  Is Aramaic (which doesn't seem to have a 2 character ISO
  code) the same as Amharic (which does...AM)?   If not, Amharic appears to be
  a Semitic language too, is that written right-to-left too?

 Amharic uses the Ethiopic script, and is not RTL as far as I know. Aramaic
 has no native speakers

As far as I know, there is still a (small) minority of speakers in Turkey
and Syria who speak the present-day descendant language of (biblical)
Aramaic. This present-day dialect is commonly called Aramaic too. I have
absolutely no idea what writing system, if any, they would use today
(although probably not the ancient Aramaic script? More likely Arabic?)

Lukas Pietsch




Re: OT (Kind of): Determining whether Locales are left-to-right

2000-12-07 Thread Michael Everson

Ar 01:47 -0800 2000-12-07, scríobh Antoine Leca:

 Urdu written in Nagari script is left-to-right? This is new to me...

 No, Urdu written in Nagari script is Hindi.

There are a number of Muslim people that insist on naming it Urdu rather
than Hindi. Since both codes exist, and hence if you look at the "locale"
you are in fact looking after the language code, as a result, you "should"
be able to deal with requests such as "Urdu written in Nagari"; or "Urdu
written in Latin script"... (why not? after all, it is more practical when
you are using some random computer.)

Antoine, it was a joke. Humour.

Furthermore, as a language Urdu predates Hindi by a wide margin (several
centuries). And between Medieval times and XIXth century, this language,
named Urdu, was written alternatively using Nagari or Arabic script.
Now I agree that this is irrelevant to the current problem.

Um, my understanding is that the "Hindustani" language (called
"Hindoostani" by the British way back when) is really fairly uniform, apart
from the alphabet in which it is written (Arabic by Muslims, Nagari by
Hindus, to use the sectarian taxonomy), and the fact that for much of the
higher terminology, Urdu tends to borrow from Arabic and Hindi tends to
borrow from Sanskrit. You may mean that "as a written language" Urdu
predates Hindi.

Michael Everson  **  Everson Gunn Teoranta  **   http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Mob +353 86 807 9169 ** Fax +353 1 478 2597 ** Vox +353 1 478 2597
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire





Re: Transcriptions of Unicode

2000-12-07 Thread David Starner

On Wed, Dec 06, 2000 at 11:12:24PM -0800, James Kass wrote:
 As for Chinese users searching for Chinese
 strings, Japanese text will most probably be incomprehensible
 regardless of font or mark-up. 

That's true for pretty much every other pair of languages that use the
same script, though.

-- 
David Starner - [EMAIL PROTECTED]
http://dvdeug.dhis.org
"(You see, the best way to solve a problem is to rigorously define it in
terms of other people's problems and then run away quickly.)"
   -- Roland McGrath [EMAIL PROTECTED]



Re: OT (Kind of): Determining whether Locales are left-to-right or

2000-12-07 Thread David Tooke

 The application isn't "english", it's "an application". Properly done, it
 should be internationalized and thus able to be an "Arabic
 application" when serving Arabic pages and English when serving English
 pages.
I totally agree.  This is actually what I am trying to achieve and what I
was trying to convey to John.  Unfortunately, the nature of the application
means that certain labels from the application are exposed to the user, and
these may be in a different language from the data to which they pertain.
I have been using the example of English for the application and Arabic for
the data from the database.  In reality, the application will be translated
into several (probably about 8) languages.  The data in the database is
stored in Unicode and so could be in any language supported by Unicode...and
the browser, of course.  The actual data is maintained by users via a web
interface.  It is conceivable that data will be entered by multiple users in
multiple languages...some of which are RTL and some LTR.  We will, of course,
allow the user to tag the data with a language identifier.

However, I guess my problem boils down to this.

If a user requests a page that contains data that could, potentially, be in
multiple languages, what criteria does one use to determine the directionality
of the page?   The directionality of the *text* is implied by each data
element itself.   But what about the page?
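
For the per-element part, one common way to read the direction off the text itself is to look for the first strongly directional character, similar in spirit to how the Unicode bidirectional algorithm picks a paragraph direction. A minimal sketch, assuming Python's standard unicodedata module; the function name text_direction and the LTR default are choices made for this example.

```python
import unicodedata


def text_direction(text: str, default: str = "ltr") -> str:
    """Return "rtl" or "ltr" based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":            # strong left-to-right (e.g. Latin letters)
            return "ltr"
        if bidi in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic letters)
            return "rtl"
    return default                 # digits, punctuation only, or empty string
```

This only answers the question for a single data element; it says nothing about which direction the page as a whole should take.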

I don't think that examining all the data elements and saying "well, there's
70% RTL data, so let's put the page in RTL" is practical, especially if it
means that one user is going to see the same page in RTL or LTR depending on
the selection criteria applied to the database.

My reasoning was that a user who has expressed a preference for an RTL
language would not mind seeing the page formatted for an RTL language,
especially if there is a high probability that the data they will be
seeing is in that language.  (If they have indicated a preference for that
language, then they would probably enter the data in that language.)  I
know this reasoning is somewhat flawed, but I am trying to do the best I
can.

I would be interested to hear the opinion of native speakers of Arabic or
Hebrew (or one of the other RTL languages) as to how they would prefer to
view a page such as that.   My thought would be that such a person would
have a preference for an RTL page; their eyes would naturally scan
RTL-formatted pages more easily than LTR-formatted pages.

Hell, if they have no preference then I can just leave the directionality of
the page to always be LTR...it's, obviously, less work!   :-)


David Tooke
[EMAIL PROTECTED]





- Original Message -
From: [EMAIL PROTECTED]
To: "David Tooke" [EMAIL PROTECTED]
Cc: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, December 08, 2000 2:54 AM
Subject: Re: OT (Kind of): Determining whether Locales are left-to-right or


 Hi David,

 I sense a subtle (and not uncommon) disconnect in your last response.

 The application isn't "english", it's "an application". Properly done, it
 should be internationalized and thus able to be an "Arabic
 application" when serving Arabic pages and English when serving English
 pages. You might have instances where an "English application" is
 *showing* some Arabic data, in which case you have some Arabic data on an
 English formatted page.

 Now: the data locale, page (user) locale, and server locale are three
 separate, independent things and each has a certain validity under certain
 circumstances. Just because you're showing the date stamp for some event
 in Riyadh (sp?) doesn't mean you should use an Arabic date format (if the
 user's locale is English). Similarly, the date stamp for an event in
 London *would* be in an Arabic format for an Arabic user, no?

 Trying to rely on the range of characters or some heuristic to determine
 what the data locale is implies a hole in your database schema. The page
 layout will vary by *user* locale, but data presented in it may be
 formatted for its own locale (for example, an Arabic piece of text, say a
 customer's name, will be presented RTL *within* a generally LTR page).

 So, in short, you need to negotiate a locale with the user in your
 application and use that to determine the overall page layout. *Within*
 that page there will be specific instances (or not) of the data locale
 being used to format content.

 An "Arabic application" run from your server will have directionality tags
 in the HTML (at least for modern browsers) which will greatly assist the
 "relayout", plus it will load explicitly RTL page elements (such as
 graphics, etc.) and use an Arabic locale for formatting non-String
 datatypes.

 Good luck,

 Addison

 ===
 Addison P. Phillips     Principal Consultant
 Inter-Locale LLC        http://www.inter-locale.com
 Los Gatos, CA, USA  mailto:[EMAIL PROTECTED]

 +1 

Re: OT (Kind of): Determining whether Locales are left-to-right or

2000-12-07 Thread addison

Hi David,

Actually, I think that the directionality of the page *CONSIDERED AS A
WHOLE* is directly related to the locale of the user.

For example, I might look at a page that contains a very large result set
from a database query, presented as a table. The results would comprise
90% of the text, let's say, of the document. If the results are all in
Hebrew, should I re-layout the page, if the headings and the footer and
other information are in English?

I think the answer is "no"--in other words, the opposite of what you're
saying. If the *user* locale is en_US, then the page should be laid out
for that user's preferences, even if the data itself (in individual
fields) is RTL.

IOW, locale has more than one level and you can have more than one locale
on a page... but the "overriding" locale -- the "page default" -- should go
with the USER'S language preference, not with the specific data currently
being retrieved. That way you can still do reasonable things when mixing
disparate languages in the same page. It also means that the user
experience will be homogeneous during a session: once in the RTL layout,
always in the RTL layout.
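
A rough sketch of what that could look like when generating HTML: the page-level dir attribute follows the user's locale, and each data field carries its own lang and dir. The RTL_LANGUAGES set is an assumed, incomplete list keyed on the language subtag only (so it ignores cases like a language written in more than one script), and the function names are hypothetical.

```python
from html import escape
from typing import List, Tuple

# Assumed, incomplete set of language codes that are normally written RTL.
# "iw" is the older code for Hebrew.
RTL_LANGUAGES = {"ar", "he", "iw", "fa", "ur", "yi"}


def direction_for(lang_tag: str) -> str:
    """Map a tag such as "en_US" or "he-IL" to "ltr" or "rtl"."""
    primary = lang_tag.replace("_", "-").split("-")[0].lower()
    return "rtl" if primary in RTL_LANGUAGES else "ltr"


def render_page(user_locale: str, title: str, fields: List[Tuple[str, str]]) -> str:
    """fields is a list of (language tag, text) pairs taken from the database."""
    page_dir = direction_for(user_locale)          # page default: the USER'S locale
    cells = "".join(
        f'<td lang="{escape(lang)}" dir="{direction_for(lang)}">{escape(text)}</td>'
        for lang, text in fields                   # field level: the DATA locale
    )
    return (f'<html dir="{page_dir}"><body><h1>{escape(title)}</h1>'
            f"<table><tr>{cells}</tr></table></body></html>")
```

An en_US user viewing Hebrew records would then get an LTR page whose individual cells are marked RTL, and the layout stays the same for the whole session regardless of what the query returns.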

thanks,

Addison

===
Addison P. Phillips     Principal Consultant
Inter-Locale LLC        http://www.inter-locale.com
Los Gatos, CA, USA  mailto:[EMAIL PROTECTED]

+1 408.210.3569 (mobile)  +1 408.904.4762 (fax)
===
Globalization Engineering  Consulting Services


RE: [OT] Arabic script langs in 3.0 ; list?

2000-12-07 Thread Carl W. Brown

Elaine,

You speak of the "standard Arabic script".  You must add additional letters
to the standard Arabic script for languages such as Farsi and Urdu.

Carl

-Original Message-
From: Elaine Keown [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 04, 2000 2:20 PM
To: Unicode List
Subject: [OT] Arabic script langs in 3.0 ; list?


Hello,

Unicode 3.0 mentions 11 contemporary languages written in Arabic script, most
from Central Asia, none from Africa except Arabic; Berber is not mentioned.  Is
Arabic script no longer used south of the Sahara? Or does standard Arabic
script easily cover relevant African languages?

My usually excellent university library also did not answer this question:
how many languages are written in Arabic script today?

I can only find 25:  Arabic  Balti  Baluchi  Berber  Farsi  Hausa  Karaite
Kashmiri  Kazakh  Kirghiz  Kurmanji  Luri  Mazanderani  Moplah
Panjabi (Pakistani)  Pashto  Pulaar  Sindhi  Siraiki (also known as
Saraiki or Lahnda or Western Panjabi)  Sulu  Uighur  Urdu  Uzbek  Wolof

Help appreciated with improving my list -- Elaine





Re: OT (Kind of): Determining whether Locales are left-to-right or

2000-12-07 Thread John Cowan

This message is best viewed with a monowidth font.

[EMAIL PROTECTED] wrote:

 For example, I might look at a page that contains a very large result set
 from a database query, presented as a table. The results would comprise
 90% of the text, let's say, of the document. If the results are all in
 Hebrew, should I re-layout the page, if the headings and the footer and
 other information are in English?
 
 I think the answer is "no"--in other words, the opposite of what you're
 saying.

So far so clear.  The page as a whole is LTR with RTL inclusions,
namely the database content, like this (as usual, lowercase is
LTR text, UPPERCASE is RTL text):

top dogs by country

country  firstname  lastname
-------  ---------  --------
u.s.     bill       clinton
israel   DUHE       KARAB
u.k.     tony       blair
syria    RASHAB     DASSA-LA

 If the *user* locale is en_US, then the page should be laid out
 for that user's preferences, even if the data itself (in individual
 fields) is RTL.

But now your message seems to go off the rails.  If the application is
*not* localizable in Hebrew (it insists on presenting header, footer,
etc. in English), but the browser's locale setting is "il-he", you want
the page to be presented RTL with the header and footer as embedded
LTR, like this?

         top dogs by country

lastname  firstname  country
--------  ---------  -------
 clinton       bill     u.s.
   KARAB       DUHE   israel
   blair       tony     u.k.
DASSA-LA     RASHAB    syria

  If a user requests a page that contains data that could, potentially, be in
  multiple languages.   What criteria does one use to determine directionality
  of the page?   The directionality of the *text* is implied by each data
  element itself.   But what about the page?

My view is that the base direction of the page is the direction of the fixed
elements on the page.  If these fixed elements are in English, the base
direction is always LTR.  If the fixed elements are localizable based on
the browser settings, then whatever language/script they are localized
into determines the base direction.

In short, Example 1 good, Example 2 bad, no matter what the browser setting is.
Of course, if the application can cope with an "il-he" browser setting and
render the fixed elements into Hebrew, then the base direction should be
RTL.
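
A small sketch of that rule, under stated assumptions: the application knows which languages its fixed elements have been translated into, picks the first of the user's preferred languages it can actually serve, and takes the base direction from that choice rather than from the request. The function names, the English fallback and the short RTL_LANGUAGES set are hypothetical.

```python
from typing import List, Set, Tuple

# Assumed, incomplete set of primary language codes written right-to-left.
RTL_LANGUAGES = {"ar", "he", "fa", "ur"}


def choose_ui_language(preferred: List[str], available: Set[str],
                       fallback: str = "en") -> str:
    """Pick the first preferred language the fixed elements are translated into."""
    for tag in preferred:
        primary = tag.replace("_", "-").split("-")[0].lower()
        if primary in available:
            return primary
    return fallback  # application is not localized for any preferred language


def base_direction(preferred: List[str], available: Set[str]) -> Tuple[str, str]:
    """Return (UI language actually used, base direction of the page)."""
    ui = choose_ui_language(preferred, available)
    return ui, ("rtl" if ui in RTL_LANGUAGES else "ltr")
```

An English-only application therefore stays LTR even for a browser that asks for Hebrew, while one that ships Hebrew fixed elements switches to RTL.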

-- 
There is / one art       || John Cowan [EMAIL PROTECTED]
no more / no less        || http://www.reutershealth.com
to do / all things       || http://www.ccil.org/~cowan
with art- / lessness     \\ -- Piet Hein



Re: displaying Unicode text (was Re: Transcriptions of Unicode)

2000-12-07 Thread Mark Davis

Thanks! I appreciate the description. My fears were unfounded.

 This states that *for each character* in the element, the implementation
 is supposed to go down the list of fonts in the font-family property, to
 find a font that exists and that contains a glyph for the current
 character.

I agree that this does not produce the optimal results, since one should
have the freedom to select different fonts based on the context of the
character. The above description is much better than a very coarse-grained
approach (like having the entire document or element in the same font), but
needs some more wriggle-room to allow people flexibility.

Mark


Re: OT (Kind of): Determining whether Locales are left-to-right or

2000-12-07 Thread Peter_Constable


On 12/06/2000 12:19:00 PM "Michael \(michka\) Kaplan" wrote:

Aramaic has no native speakers

True, if what is meant is Ancient Aramaic. False if we mean Assyrian or
Chaldean Neo-Aramaic.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]





Re: OT (Kind of): Determining whether Locales are left-to-right or

2000-12-07 Thread Tex Texin

I think you really need to give the user the option to override
the assumptions being made, as the degree of familiarity and experience
the user has with Hebrew and Arabic, and the purpose for using
the application will make a big difference.

For example, the case that you suggest goes off the rails
is exactly what the user would want if he were going to
query the list of just Arabic and Israeli records and then copy and
paste them into, perhaps, a spreadsheet. Having the header
ordering change would allow the paste to place the data from
each field in the right columns in the target spreadsheet.

I would spend less time debating which is correct, and simply offer
a button on the UI to flip the ordering of the page.

tex


-- 
According to Murphy, nothing goes according to Hoyle.
--
Tex Texin                    Director, International Business
mailto:[EMAIL PROTECTED]     +1-781-280-4271  Fax: +1-781-280-4655
Progress Software Corp.      14 Oak Park, Bedford, MA 01730

http://www.Progress.com      #1 Embedded Database

Globalization Program
http://www.Progress.com/partners/globalization.htm
---



Re: OT (Kind of): Determining whether Locales are left-to-right or

2000-12-07 Thread David Tooke

So, suppose all the data from the database is in Hebrew...and the user's
browser is set to Hebrew.  You are saying that because we have some headings
in English, the entire page should be formatted LTR, not RTL.
I have to disagree.   The *content* of the page would be Hebrew.  The
headings are simply utilitarian; they shouldn't affect that.

I think we have established it would be impractical to base the
directionality on that of the database because of the presence of mixed
languages...in order to create a uniform experience for the user.

I don't think it should be based on the application.  A Hebrew document
written by a user on an untranslated word processor is *still* a Hebrew
document.

This just leaves the user's locale.   I know you balk at having it like your
second example, but what is so bad about it?   It looks kinda strange to you
and me.   But would it to a native speaker of an RTL language?  I would be
interested in their opinion.  (I am assuming you are not a native speaker of
an RTL language.)

- Original Message -
From: "John Cowan" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: "Unicode List" [EMAIL PROTECTED]
Sent: Thursday, December 07, 2000 11:01 AM
Subject: Re: OT (Kind of): Determining whether Locales are left-to-right or


 This message is best viewed with a monowidth font.

 [EMAIL PROTECTED] wrote:

  For example, I might look at a page that contains a very large result set
  from a database query, presented as a table. The results would comprise
  90% of the text, let's say, of the document. If the results are all in
  Hebrew, should I re-layout the page, if the headings and the footer and
  other information is in English?
 
  I think the answer is "no"--in other words, the opposite of what you're
  saying.

 So far so clear.  The page as a whole is LTR with RTL inclusions,
 namely the database content, like this (as usual, lowercase is
 LTR text, UPPERCASE is RTL text):

 top dogs by country

 country   firstname   lastname
 -------   ---------   --------
 u.s.      bill        clinton
 israel    DUHE        KARAB
 u.k.      tony        blair
 syria     RASHAB      DASSA-LA

  If the *user* locale is en_US, then the page should be laid out
  for that user's preferences, even if the data itself (in individual
  fields) is RTL.

 But now your message seems to go off the rails.  If the application is
 *not* localizable in Hebrew (it insists on presenting header, footer,
 etc. in English), but the browser's locale setting is "il-he", you want
 the page to be presented RTL with the header and footer as embedded
 LTR, like this?

            top dogs by country

 lastname   firstname   country
 --------   ---------   -------
 clinton    bill        u.s.
 KARAB      DUHE        israel
 blair      tony        u.k.
 DASSA-LA   RASHAB      syria

   If a user requests a page that contains data that could, potentially, be in
   multiple languages.   What criteria does one use to determine directionality
   of the page?   The directionality of the *text* is implied by each data
   element itself.   But what about the page?

 My view is that the base direction of the page is the direction of the fixed
 elements on the page.  If these fixed elements are in English, the base
 direction is always LTR.  If the fixed elements are localizable based on
 the browser settings, then whatever language/script they are localized
 into determines the base direction.

 In short, Example 1 good, Example 2 bad, no matter what the browser setting is.
 Of course, if the application can cope with an "il-he" browser setting and
 render the fixed elements into Hebrew, then the base direction should be
 RTL.

 --
 There is / one art   || John Cowan [EMAIL PROTECTED]
 no more / no less|| http://www.reutershealth.com
 to do / all things   || http://www.ccil.org/~cowan
 with art- / lessness \\ -- Piet Hein




Re: OT (Kind of): Determining whether Locales are left-to-right or

2000-12-07 Thread John Cowan

David Tooke wrote:

 I don't think it should be based on the application.  A Hebrew document
 written by a user on an untranslated word processor is *still* a Hebrew
 document.

I assume you mean "a word processor localized in English" rather than
"a word processor that can't do bidi".  If the former, I agree.  If the
latter, it will be sheer gibberish, unless the user is a magician at
writing his Hebrew LTR, line by line.

But note that the chrome of a word processor (the border, menus,
etc.) isn't *part* of the document it is editing.

Take the converse case:  the document frame is in Hebrew, with the
key column on the right, but most of the data is in English.  On my
assumptions, when I look at such a thing, I am a little confused
because the key column is on the right.  But when I select the
variable part of the text (starting in the upper right corner and
sweeping to the lower left corner), and copy and paste it to
a word processor, the logical-order rules ensure that the text
comes out with the key column on the left again.

 (I am assuming you are not a native speaker of a RTL language.)

I am not.

-- 
There is / one art   || John Cowan [EMAIL PROTECTED]
no more / no less|| http://www.reutershealth.com
to do / all things   || http://www.ccil.org/~cowan
with art- / lessness \\ -- Piet Hein



Re: OT (Kind of): Determining whether Locales are left-to-right or

2000-12-07 Thread John Cowan

[EMAIL PROTECTED] wrote:

 Hi John,
 
 I think we're saying the same thing: the language of the page is the base
 directionality. If the application is localized into Hebrew, then the base
 directionality is RTL.

Okay, good.  In that case by "user locale" you mean "language of the (fixed
parts of the) page", which is confusing; David was interpreting it as
"locale set in the browser".

-- 
There is / one art   || John Cowan [EMAIL PROTECTED]
no more / no less|| http://www.reutershealth.com
to do / all things   || http://www.ccil.org/~cowan
with art- / lessness \\ -- Piet Hein



Re: OT (Kind of): Determining whether Locales are left-to-right or

2000-12-07 Thread addison

Hi John,

I think we're saying the same thing: the language of the page is the base
directionality. If the application is localized into Hebrew, then the base
directionality is RTL.
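
The rule agreed on here can be sketched in code. This is only an illustration under stated assumptions: the function names, the RTL_LANGUAGES set, and the language tags are hypothetical and are not taken from any application discussed in this thread.

```python
# Hypothetical sketch: the base direction follows the language the fixed
# page elements are actually rendered in, not the browser setting alone.

# Assumption: a small, incomplete set of languages written right-to-left.
RTL_LANGUAGES = {"ar", "fa", "he", "ur", "yi"}


def pick_ui_language(browser_languages, localized_languages, fallback="en"):
    """Return the first browser-preferred language the application is localized into."""
    for tag in browser_languages:
        primary = tag.replace("_", "-").split("-")[0].lower()
        if primary in localized_languages:
            return primary
    return fallback


def base_direction(browser_languages, localized_languages):
    """Return 'rtl' or 'ltr' for the page, based on the fixed elements' language."""
    ui_language = pick_ui_language(browser_languages, localized_languages)
    return "rtl" if ui_language in RTL_LANGUAGES else "ltr"


# Application available only in English: LTR even for a Hebrew browser setting.
print(base_direction(["he-IL"], {"en"}))        # ltr
# Application localized into Hebrew: the fixed elements are Hebrew, so RTL.
print(base_direction(["he-IL"], {"en", "he"}))  # rtl
```

The browser setting only selects among the localizations that exist; the direction is then derived from the selected localization.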

Addison

===
Addison P. Phillips        Principal Consultant
Inter-Locale LLC           http://www.inter-locale.com
Los Gatos, CA, USA         mailto:[EMAIL PROTECTED]

+1 408.210.3569 (mobile)   +1 408.904.4762 (fax)
===
Globalization Engineering & Consulting Services

On Thu, 7 Dec 2000, John Cowan wrote:

 This message is best viewed with a monowidth font.
 
 [EMAIL PROTECTED] wrote:
 
  For example, I might look at a page that contains a very large result set
  from a database query, presented as a table. The results would comprise
  90% of the text, let's say, of the document. If the results are all in
  Hebrew, should I re-layout the page, if the headings and the footer and
  other information is in English?
  
  I think the answer is "no"--in other words, the opposite of what you're
  saying.
 
 So far so clear.  The page as a whole is LTR with RTL inclusions,
 namely the database content, like this (as usual, lowercase is
 LTR text, UPPERCASE is RTL text):
 
 top dogs by country
 
 country   firstname   lastname
 -------   ---------   --------
 u.s.      bill        clinton
 israel    DUHE        KARAB
 u.k.      tony        blair
 syria     RASHAB      DASSA-LA
 
  If the *user* locale is en_US, then the page should be laid out
  for that user's preferences, even if the data itself (in individual
  fields) is RTL.
 
 But now your message seems to go off the rails.  If the application is
 *not* localizable in Hebrew (it insists on presenting header, footer,
 etc. in English), but the browser's locale setting is "il-he", you want
 the page to be presented RTL with the header and footer as embedded
 LTR, like this?
 
            top dogs by country
 
 lastname   firstname   country
 --------   ---------   -------
 clinton    bill        u.s.
 KARAB      DUHE        israel
 blair      tony        u.k.
 DASSA-LA   RASHAB      syria
 
   If a user requests a page that contains data that could, potentially, be in
   multiple languages.   What criteria does one use to determine directionality
   of the page?   The directionality of the *text* is implied by each data
   element itself.   But what about the page?
 
 My view is that the base direction of the page is the direction of the fixed
 elements on the page.  If these fixed elements are in English, the base
 direction is always LTR.  If the fixed elements are localizable based on
 the browser settings, then whatever language/script they are localized
 into determines the base direction.
 
 In short, Example 1 good, Example 2 bad, no matter what the browser setting is.
 Of course, if the application can cope with an "il-he" browser setting and
 render the fixed elements into Hebrew, then the base direction should be
 RTL.
 
 -- 
 There is / one art   || John Cowan [EMAIL PROTECTED]
 no more / no less|| http://www.reutershealth.com
 to do / all things   || http://www.ccil.org/~cowan
 with art- / lessness \\ -- Piet Hein
 




Re: Font help

2000-12-07 Thread John Cowan

Michael Everson wrote:

 I have no idea how I should encode such a font. Help?

According to a suitable legacy encoding, such as JIS X 0208,
Shift-JIS, or MacJapanese.  What you will have is a font which
provides glyphs for only a few coded characters, but no matter.
It's just like an 8859-1 encoded font with glyphs only for
A-Z and a-z, which is common enough.

You could also support the katakana fingerspellings only in an
8-bit font encoded using JIS X 0201.  Mapping tables for all
of these are available at http://www.unicode.org/Public/MAPPINGS
under EASTASIA/JIS and VENDORS/APPLE.
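
As a rough illustration of what such a mapping looks like, the sketch below computes the Unicode code point for a JIS X 0201 katakana byte. The function name is hypothetical, and whether this 8-bit layout suits the font in question is an assumption; the mapping tables remain the authority.

```python
# Sketch: the katakana half of JIS X 0201 occupies bytes 0xA1-0xDF, which
# the published mapping table pairs with U+FF61-U+FF9F in the same order.

def jisx0201_katakana_to_unicode(byte_value):
    """Map one JIS X 0201 katakana byte to its Unicode code point."""
    if not 0xA1 <= byte_value <= 0xDF:
        raise ValueError("not in the JIS X 0201 katakana range")
    return 0xFF61 + (byte_value - 0xA1)


# Cross-check against Python's built-in Shift-JIS codec, which shares the
# single-byte katakana range with JIS X 0201.
for b in range(0xA1, 0xE0):
    assert chr(jisx0201_katakana_to_unicode(b)) == bytes([b]).decode("shift_jis")

print(hex(jisx0201_katakana_to_unicode(0xB1)))  # 0xff71
```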

-- 
There is / one art   || John Cowan [EMAIL PROTECTED]
no more / no less|| http://www.reutershealth.com
to do / all things   || http://www.ccil.org/~cowan
with art- / lessness \\ -- Piet Hein



RE: (bidi) Determining whether Locales are left-to-right

2000-12-07 Thread Jonathan Rosenne

Localized Hebrew Excel has a button to flip the global direction of the screen
and presentation of the columns - RTL or LTR - and this preference is also
stored in the file.

The user can choose his default. It should not be determined from the locale:
when I open a new spreadsheet, the application cannot guess whether I am going to
enter Hebrew or English data. Nor is it determined by the user interface
language.

Word is similar.

BTW, it is not off topic: the global direction is defined by the Unicode bidi
algorithm.
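
For readers who want to experiment, here is a minimal sketch of how the default paragraph direction can be derived from the text itself. It is a simplified reading of rules P2 and P3 of the bidi algorithm (it ignores embeddings and other refinements), and the function names and the stored_preference parameter are hypothetical. A direction stored with the file, as described above, plays the role of a higher-level override.

```python
import unicodedata


def first_strong_direction(text, default="ltr"):
    """Return 'ltr' or 'rtl' from the first strong character (a simplified
    reading of rules P2 and P3 of the Unicode bidirectional algorithm)."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default  # no strong character: fall back to the caller's choice


def resolve_direction(text, stored_preference=None):
    """A direction stored with the document overrides the heuristic."""
    return stored_preference or first_strong_direction(text)


print(first_strong_direction("123 \u05e9\u05dc\u05d5\u05dd"))    # rtl
print(first_strong_direction("hello \u05e9\u05dc\u05d5\u05dd"))  # ltr
print(resolve_direction("hello", stored_preference="rtl"))       # rtl
```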

Jony

 -Original Message-
 From: John Cowan [mailto:[EMAIL PROTECTED]]
 Sent: Thursday, December 07, 2000 7:31 PM
 To: Unicode List
 Subject: Re: OT (Kind of): Determining whether Locales are left-to-right or


 Tex Texin wrote:

  For example, the case that you are suggesting goes off the rails,
  is exactly what the user would want if he were going to
  query the list of just Arabic and Israeli records and then copy and
  paste them into perhaps a spreadsheet.

 Is it?  I've never seen a RTL localized spreadsheet program;
 perhaps the A1 column appears rightmost, in which case pasting the
 logical-order would make the text end up with country in A (rightmost),
 firstname in B, lastname in C.  Which is what you want.

 --
 There is / one art   || John Cowan [EMAIL PROTECTED]
 no more / no less|| http://www.reutershealth.com
 to do / all things   || http://www.ccil.org/~cowan
 with art- / lessness \\ -- Piet Hein




Re: OT (Kind of): Determining whether Locales are left-to-right or

2000-12-07 Thread Tex Texin

John,

I wasn't thinking about a,b,c so much as the headings in the results
table and the ordering of the data in the fields: country, state, name.

Where I have seen Hebrew or Arabic tables, the heading order
changes from LTR to RTL, and so you would want the data
returned from the query to be in the same order. Hence the
need for the columns of the query to be switched around.

tex

John Cowan wrote:
 
 Tex Texin wrote:
 
  For example, the case that you are suggesting goes off the rails,
  is exactly what the user would want if he were going to
  query the list of just Arabic and Israeli records and then copy and
  paste them into perhaps a spreadsheet.
 
 Is it?  I've never seen a RTL localized spreadsheet program;
 perhaps the A1 column appears rightmost, in which case pasting the
 logical-order would make the text end up with country in A (rightmost),
 firstname in B, lastname in C.  Which is what you want.
 
 --
 There is / one art   || John Cowan [EMAIL PROTECTED]
 no more / no less|| http://www.reutershealth.com
 to do / all things   || http://www.ccil.org/~cowan
 with art- / lessness \\ -- Piet Hein

-- 
According to Murphy, nothing goes according to Hoyle.
--
Tex Texin  Director, International Business
mailto:[EMAIL PROTECTED]  +1-781-280-4271 Fax:+1-781-280-4655
Progress Software Corp.    14 Oak Park, Bedford, MA 01730

http://www.Progress.com    #1 Embedded Database

Globalization Program   
http://www.Progress.com/partners/globalization.htm
---



Re: OT (Kind of): Determining whether Locales are left-to-right or

2000-12-07 Thread Tex Texin

David Tooke wrote:
 
 ...And in response to Tex.
 I would spend less time debating which is correct, and simply offer
 a button on the UI to flip the ordering of the page.
 Although we cannot have an option to store preferences for particular users,
 we will allow users to change the locale during the current session.   This
 is initially set to the user's browser default, but they can change it.  We
 do not anticipate providing an option that just changes the page
 directionality.  I believe if the user says 'ar_EG' they should get RTL, if
 they say 'en_US' then they should get LTR.

David,
Since you have the technology to flip the page direction, it is easy to make
a button override. And as there are examples where the user's locale is wrong
for establishing the page direction, it seems silly to me to ask the user to
change his locale to get the direction he wants. Doing so may also change
other settings in ways he doesn't want. It also requires the user to know all
of the properties of all of the locales to establish which ones change page
direction but keep other properties with the same or similar values.
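
A minimal sketch of such an override, kept separate from the locale, might look like the following. The class, its field names, and the RTL_LANGUAGES set are hypothetical; this is not a description of any existing product.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: a small, incomplete set of right-to-left language codes.
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}


@dataclass
class SessionSettings:
    locale: str                                # e.g. "ar_EG" or "en_US"
    direction_override: Optional[str] = None   # "ltr", "rtl", or None

    def page_direction(self) -> str:
        """Use the explicit override if set; otherwise derive from the locale."""
        if self.direction_override is not None:
            return self.direction_override
        language = self.locale.replace("-", "_").split("_")[0].lower()
        return "rtl" if language in RTL_LANGUAGES else "ltr"


session = SessionSettings(locale="en_US")
print(session.page_direction())                  # ltr
session.direction_override = "rtl"               # the "flip" button
print(session.locale, session.page_direction())  # en_US rtl
```

Flipping the direction leaves the locale, and everything else that depends on it, untouched.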

However, you are free to do as you like. ;-)

tex

-- 
According to Murphy, nothing goes according to Hoyle.
--
Tex Texin  Director, International Business
mailto:[EMAIL PROTECTED]  +1-781-280-4271 Fax:+1-781-280-4655
Progress Software Corp.    14 Oak Park, Bedford, MA 01730

http://www.Progress.com    #1 Embedded Database

Globalization Program   
http://www.Progress.com/partners/globalization.htm
---



Testing - please ignore [eom]

2000-12-07 Thread Song Moong Er






Aramaic by any other name.........

2000-12-07 Thread Elaine Keown

Hello, 

Aramaic is spoken in many countries today: Israel, Armenia, Georgia, Turkey, Iran, 
Iraq, U.S., probably Azerbaijan, maybe further into Central Asia in some 
pockets...but it's never called Aramaic, as far as I know; it's called 
Surit, Kurdit, Turoyo, Assyrian, Mandaean, Ma'alula, etc. Scholars frequently call 
all these "Neo-Aramaic."  

Aramaic has been written in its original square script (today called "Hebrew square 
script"), in Syriac, in Arabic, and even in Cyrillic in the former Soviet Union.  

Thanks for the help with the Arabic-script languages. When I get a "complete" 
list (about 100), I'll post it. --Elaine

PS:  Right-to-left languages written in "Hebrew-Aramaic square script" include Kurdi, 
Dzhidi, Yevanic, Italki, Bukhari, Persian, Tamazight, Shuadit, Comtadin, pre-1920s 
Ladino, Middle Arabic, Yiddish, and others.



UCS-2, UCS-4, UTF-16 unicode format files

2000-12-07 Thread Song Moong Er

Hello,

I am new to this mailing list. I hope it is appropriate to ask the following
questions:

1. Is there any browser that I can use to view UCS-2, UCS-4, or UTF-16 Unicode
files?

2. I am looking for some UCS-2, UCS-4, and UTF-16 Unicode files.
Is there a web site from which I can download such files, or any software
with which I can generate them?


Thanks for your help
Er Song Moong




Re: Did I do this right?

2000-12-07 Thread Katsuhiko Momoi

[EMAIL PROTECTED] wrote:

 I put some japanese text up on the web
 http://11digitboy.stormloader.com
 and i think i did it right. Is the &#n; format
 correct where n is a unicode code in decimal? do
 i set netscape to utf-8 to see it? what about msie?

For Netscape 6/Mozilla & MSIE, you don't have to set the encoding at all 
as long as you have a Japanese font. For Communicator 4.x, you need to 
set the encoding to one which contains the characters you have in mind. 
Since the ones you used are Japanese Katakana, you can view them under 
any Japanese encoding, GB2312, EUC-KR and Unicode. (Note that you would 
have to have set appropriate fonts which contain these glyphs for the 
encodings I mentioned.)
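
For anyone who wants to check their own pages, here is a small sketch that produces decimal numeric character references of the form &#n; and decodes them again. The helper name is hypothetical; html.unescape is part of the Python standard library.

```python
import html


def to_decimal_ncrs(text):
    """Encode every non-ASCII character as a decimal reference, &#n;."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


katakana = "\u30ab\u30bf\u30ab\u30ca"  # the word "katakana" written in katakana
encoded = to_decimal_ncrs(katakana)
print(encoded)                          # &#12459;&#12479;&#12459;&#12490;
assert html.unescape(encoded) == katakana
```

Because the references are pure ASCII, the page itself can be served in almost any encoding; what the browser still needs is a font containing the glyphs.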


- Kat

-- 
Katsuhiko Momoi
Netscape International Client Products Group
[EMAIL PROTECTED]