MCBastos wrote:

Ran into a curious issue today... it's not really a problem, at least
not for me, but still curious.

Let me establish the parameters first. I'm running XP SP3 with the
"extra" fonts (Far Eastern and right-to-left) installed, so it should
have pretty good Unicode font coverage. On top of that, I also have the
Code2000 font installed, which should plug most of the holes Microsoft
left in their coverage. I have the following browsers available:
- Seamonkey 2.0.3 (primary browser)
- IE 8 (fully patched)
- Firefox 3.6
- Opera 10.5
- SRWare Iron 4.0.280 (equivalent to Google Chrome 4)

So, I'm fooling around on Wikipedia and opened this page:

http://en.wikipedia.org/wiki/Burmese_units_of_measurement

The display of the Burmese characters, though, wasn't working right. A
few of the characters were replaced by little squares containing four
hex digits, like the ones in the Unicode BMP Fallback font (which I had
downloaded but not installed, by the way).
Tried other browsers, with the following results:
- Firefox: identical to SeaMonkey
- Opera: a few *fewer* Burmese characters displayed correctly. The ones
that didn't show were replaced with thin blank rectangles.
- IE8: *No* Burmese characters were displayed. Instead, I got blank squares.
- Iron: Same results as in IE8.

Gecko browsers still got the best results of the lot, so I guess I
shouldn't complain (I don't even read non-Latin scripts, I install those
extra fonts just because I think the blank characters are ugly). But
still, a few things puzzle me:
1. Why do Gecko and Opera achieve only *partial* success? Is this a
problem with Microsoft fonts? Or does Burmese need special fonts? All
the browsers seemed to display other scripts, such as Thai and Chinese,
correctly.
2. Where did those fallback glyphs come from? As I said, I don't have
the Unicode BMP Fallback font installed, and the other browsers don't
show them. Is that a Gecko feature?
3. Even with all those Unicode fonts installed, the page still failed to
display correctly in any browser. Yet I imagine that it should be
displaying correctly for *someone* -- at the least, the person who wrote
the entry. I wonder if this page only works correctly with Oriental
versions of Windows? Or Macs, perhaps? Or is something really hosed with
my computer?


One thing that's different about Burmese (and some other languages I'll mention below) is that its writing system isn't just a linear sequence of glyphs laid out one after another. Rather, several glyphs at a time are assembled into compound characters. Chinese has thousands and thousands of glyphs, but they are still written one after another in a line (vertically, or horizontally like English). Burmese, Tamil, Korean, Arabic, and several other languages, by contrast, use so-called "complex scripts" (Microsoft's term), and displaying them takes both a font that covers the script and rendering software that knows how to combine and reorder the glyphs.

For example, a common Korean formal greeting is "안녕하십니까?" (annyŏng hashimnikka?), which decomposes thus:
        ㅇ (null consonant for syllables beginning with vowels)
        ㅏ a
        ㄴ n
        ㄴ n
        ㅕ yŏ
        ㅇ ng (same glyph as null above, but at end of syllable)
        ㅎ h
        ㅏ a
        ㅅ s (pronounced /sh/ before /i/ or /y/)
        ㅣ i
        ㅂ b (pronounced /m/ before /n/)
        ㄴ n
        ㅣ i
        ㄲ kk
        ㅏ a

The first "character" you see (assuming your software displays it correctly) is an assemblage of ㅇ + ㅏ + ㄴ (null plus a plus n) = "an."
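
For the curious, here is a minimal Python sketch of how that assembly can be expressed in code. It uses the arithmetic the Unicode Standard defines for precomposed Hangul syllables. The function name compose_syllable is my own invention for illustration, and it assumes the "conjoining" jamo (U+1100 range) rather than the standalone compatibility letters shown in the list above.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28


def compose_syllable(lead, vowel, tail=None):
    """Combine conjoining jamo into one precomposed syllable (hypothetical helper)."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    # A syllable without a final consonant uses trailing index 0.
    t_index = ord(tail) - T_BASE if tail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


# Initial null consonant (U+110B), vowel a (U+1161), final n (U+11AB).
an = compose_syllable("\u110B", "\u1161", "\u11AB")
print(an, f"U+{ord(an):04X}")

# The standard library does the reverse through canonical decomposition.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", an)])
```

In practice you would normally let unicodedata.normalize("NFC", ...) do the composing; the arithmetic is spelled out here only to show that the three pieces really do map onto a single character.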

Most of the scripts of India and Southeast Asia are organized into syllables like this.

The Dēvanāgari script, used for Sanskrit and now for Hindi, Marathi, Nepali, and a number of other languages, has vowel signs that go above, below, to the right of, and even to the left of the corresponding consonants within the syllable. Thus, the word "Dēvanāgari" (spelled here with a short final i; the usual spelling, देवनागरी, ends in a long ī) is देवनागरि, which breaks down thus:
        द d (dental d)
        े ē (long e, written above the consonant)
        व v
          a (short a is not explicitly written)
        न n (dental n)
        ा ā (long a, written to the right of the consonant)
        ग g
          a (short a is not explicitly written)
        र r
        ि i (short i, written to the left of the consonant)

So what you're seeing (assuming your software displays it correctly) looks something like this:

        ē
        d - v - nā - g - ir

But the vowels are still pronounced after the consonants despite the visual arrangement.
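
The same thing holds for how the text is stored: the characters sit in pronunciation order, and the left-side placement happens only at display time. Here is a small Python sketch that lists the stored characters of the word as spelled above. The helper name describe is just something I made up for illustration.

```python
import unicodedata

# The spelling used above, written as escapes so the stored order is explicit.
word = "\u0926\u0947\u0935\u0928\u093E\u0917\u0930\u093F"


def describe(text):
    """Return (code point, Unicode name) pairs in stored (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code_point, name in describe(word):
    print(code_point, name)
```

The last two entries come out as the letter RA followed by the vowel sign I, even though the sign is drawn to the left of the consonant. Moving it there is the job of the rendering software, not of the text itself.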

As you can imagine, programmers had to devise special tricks to get computers to render these correctly.

--
War doesn't determine who's right, just who's left.
--
Paul B. Gallagher
_______________________________________________
support-seamonkey mailing list
[email protected]
https://lists.mozilla.org/listinfo/support-seamonkey
