Re: Transcriptions of Unicode
I happened upon a passage bolstering Marco's point that the English pronunciation of long U (as yoo, /ju/) does derive from its being the closest pronunciation that the English could make to the French pronunciation of U (as /y/). That passage is in Honni soit qui mal y pense : L'incroyable histoire de l'amour entre le français et l'anglais. It is on page 158, in Pourquoi dit-on « miouzik » en anglais ? (I recommend the book: the writing is clear and accessible even for someone (like me) of limited French.) I put a link to the book on my booklist (http://www.macchiato.com/books/nonfiction.html).

Mark
Ὀλίγοι ἔμφρονες πολλῶν ἀφρόνων φοβερώτεροι - Πλάτωνος
[http://www.macchiato.com]

----- Original Message -----
From: Marco Cimarosti [EMAIL PROTECTED]
To: Unicode List [EMAIL PROTECTED]
Sent: Monday, January 15, 2001 01:15
Subject: RE: Transcriptions of Unicode

Mark Davis wrote:
> Much as I admire and appreciate the French language (second only to Italian), the proximate derivation of "Unicode" was not from that language, and the transcription should not match the French pronunciation. Instead, it has solid Northern Californian roots (even though not exactly dating from the Gold Rush days).

Of course, my comment about French pronunciation was only partially serious -- I should have added a smiley. But I think that /ynikod/ is the actual pronunciation of "Unicode" in French (as opposed to most other European languages, which simply approximate the English pronunciation). So, as you explained that you are listing languages, and that you accept more than one language for each script, you might consider a second IPA example.

> According to the references I have, the prefix "uni" is directly from Latin while the word "code" is through French.

I wonder what "directly from Latin" may mean in the case of English. Because of some timing problems, I would say it means: "through direct knowledge of *written* Latin". A direct derivation from Latin of English "uni-" would imply that, at some point, English scholars used to read Latin with a pronunciation influenced by French. In fact, the initial [ju:] is the regular English approximation of French vowel [y]. (Is this likely?)

> The Indo-European would have been *oi-no-kau-do ("give one strike"): *kau apparently being related to [...] caudal, [...]

Wow! So Unicode also means "single tail", after all... What would that be in Chinese? :-)

Marco
Re: Transcriptions of Unicode
Mon, 15 Jan 2001 13:09:47 -0800 (GMT-0800), G. Adam Stanislav [EMAIL PROTECTED] wrote:
> I would not be surprised if speakers of certain Slavic languages even changed the SPELLING to Unikod (with an acute over the [o]), as they have done with other imported words (such as futbal for football).

That is what we in Polish newsgroups often do, even if it's very unofficial; I don't expect Unicode or Unikod in dictionaries soon. Without the acute over the [o], though, which would mean a different thing. Actually "kod" in Polish means "code".

--
__(" Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
\__/
^^ SYGNATURA ZASTĘPCZA
QRCZAK
Re: Transcriptions of Unicode
Fri, 12 Jan 2001 07:28:18 -0800 (GMT-0800), Mark Davis [EMAIL PROTECTED] wrote:
> According to the references I have, the prefix "uni" is directly from Latin while the word "code" is through French. The Indo-European would have been *oi-no-kau-do ("give one strike"): *kau apparently being related to such English words as: hew, haggle, hoe, hag, hay, hack, caudad, caudal, caudate, caudex, coda, codex, codicil, coward, incus, and Kovač (personal name: 'smith').

Oh, so my surname is related to Unicode? :-) "Kowal" means "smith" in Polish.

--
__(" Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
\__/
^^ SYGNATURA ZASTĘPCZA
QRCZAK
Re: Transcriptions of Unicode
On Monday, January 15, 2001, at 05:08 PM, G. Adam Stanislav wrote: That's exactly what I said. Unicode as an international standard will be pronounced internationally: Speakers of each language will have their own pronunciation, and some will even spell it differently. Ah, got it. I'm sorry, I misunderstood you as meaning that there would be one, international pronunciation. My apologies.
RE: Transcriptions of Unicode
Mark Davis wrote:
> Much as I admire and appreciate the French language (second only to Italian), the proximate derivation of "Unicode" was not from that language, and the transcription should not match the French pronunciation. Instead, it has solid Northern Californian roots (even though not exactly dating from the Gold Rush days).

Of course, my comment about French pronunciation was only partially serious -- I should have added a smiley. But I think that /ynikod/ is the actual pronunciation of "Unicode" in French (as opposed to most other European languages, which simply approximate the English pronunciation). So, as you explained that you are listing languages, and that you accept more than one language for each script, you might consider a second IPA example.

> According to the references I have, the prefix "uni" is directly from Latin while the word "code" is through French.

I wonder what "directly from Latin" may mean in the case of English. Because of some timing problems, I would say it means: "through direct knowledge of *written* Latin". A direct derivation from Latin of English "uni-" would imply that, at some point, English scholars used to read Latin with a pronunciation influenced by French. In fact, the initial [ju:] is the regular English approximation of French vowel [y]. (Is this likely?)

> The Indo-European would have been *oi-no-kau-do ("give one strike"): *kau apparently being related to [...] caudal, [...]

Wow! So Unicode also means "single tail", after all... What would that be in Chinese? :-)

Marco
Re: Transcriptions of Unicode
Michael Everson wrote:
> The pronunciation ['juni:ko:d] with [i:] or [i] instead of schwa irritates me a lot. No one would pronounce "universe" with an [i].

I beg to differ; "universe" is commonly pronounced with a short [i] in the English Midlands.

Charles Cox
Re: Transcriptions of Unicode
At 06:16 2001-01-15 -0800, Charles wrote:
Michael Everson wrote: The pronunciation ['juni:ko:d] with [i:] or [i] instead of schwa irritates me a lot. No one would pronounce "universe" with an [i].

[Charles] I beg to differ; "universe" is commonly pronounced with a short [i] in the English Midlands.

[Alain] A schwa for an i and an English u to pronounce "Unicode" begins to be extremely different from the pronunciation of Unicode in French (as I can't write with the IPA on this list, I will add German "Ünicod" to Marco's "ynicod" to make sure that most of you know how we pronounce it). This word, in its written form, shocks nobody in French (and that is saying a lot!), not even the most bigoted and pious purists of the French language... But if you insist that French speakers pronounce those two letters that way, the opposite is true: we would have to write the mandated IPA pronunciation as « Iouneucôde » in French (there is no real schwa in French, imho)... Otherwise you create a strong issue in French.

Please do not play with pronunciation... Unicode is not a standard about pronunciation, but rather -- and this is where it is an instrument of civilization -- a standard about writing... Writing tends to unite people, spoken languages tend to disunite them... An English speaker with a perfect knowledge of written French who does not pronounce French correctly is absolutely not understood, and the reverse is probably true too. I watch some American TV programs (mainly sci-fi), but I have to turn on subtitles to fully catch what I don't understand (unfortunately there are no subtitles in a meeting where English is spoken, and it is *always* a handicap for me).

Please, no official IPA transcription for Unicode...

Alain LaBonté
Québec
Re: Transcriptions of Unicode
On Monday, January 15, 2001, at 06:34 AM, Michael Everson wrote:
> The pronunciation ['juni:ko:d] with [i:] or [i] instead of schwa irritates me a lot. No one would pronounce "universe" with an [i].

Then forgive me, Michael, for I have sinned. I just sent in to Mark a Deseret Alphabet transcription that uses [i]. In my defense, I tend to find short, unstressed vowels hard to tell apart in many English words. I really don't know what vowel I use in "universe." And the DA doesn't have a true schwa symbol, anyway.
RE: Transcriptions of Unicode
{Notice: way off-topic}

Mark Davis wrote:
> There was a period well after the Norman invasion where a large number of words came into English directly from Latin, which was still in widespread use among scholars.

Right. And it also was the language of priests, on both sides of the Channel.

> [ju:] isn't an approximation to the French [y]. There was a phase in the development of English called the Great Vowel Shift, where certain long vowels shifted back: a = [e:], e = [i:], i = [ai], o = [u:] (as in fool, move), u = [ju:]. I don't remember when this was -- it's been a long time -- but I seem to recall that it was a bit before Shakespeare. The pronunciation of u in French shifted at some point from [u] to [y]; I have no idea when this change happened, or if it would have affected the Latin spoken by the English at the time. Perhaps someone else knows.

No, sorry. Middle English [u:] normally became modern [au] -- e.g.: "hus" [hu:s] - "house" [haus]. I insist that [ju:] was the English rendering of the alien French phoneme [y]. The fact that it did not become [jau] simply testifies that most French words (re-)entered English *after* the GVS was concluded.

Marco
RE: Transcriptions of Unicode
Mark Davis [mailto:[EMAIL PROTECTED]] wrote: "Marco Cimarosti" [EMAIL PROTECTED] wrote: I wonder what "directly from Latin" may mean in the case of English. Because of some timing problems, I would say it means: "through direct knowledge of *written* Latin". There was a period well after the Norman invasion where a large number of words came into English directly from Latin, which was still in widespread use among scholars. Yes, and it was right into the early 20th Century. Even when I was in school a large percentage of English schoolboys _had_ to learn Latin (- and in many "public" [private] schools they still do). This included "spoken" Latin - though I'm sure the pronunciation taught was quite different than what it was in 55 BCE. Not all that long ago you couldn't get into many English universities without having studied some Latin. In English we still get plenty of scientific names and terms from Latin and Greek and many of these words eventually come into more common usage. - Chris
Re: Transcriptions of Unicode
At 06:16 AM 1/15/01, Charles wrote:
> Michael Everson wrote:
>> The pronunciation ['juni:ko:d] with [i:] or [i] instead of schwa irritates me a lot. No one would pronounce "universe" with an [i].
> I beg to differ; "universe" is commonly pronounced with a short [i] in the English Midlands.

And indeed on this side of the Pond, [i] is common (I find it unnatural to drop my tongue enough for the schwa), and I have heard (iirc) [i:] in the southeastern U.S.

--
Curtis Clark http://www.csupomona.edu/~jcclark/
Biological Sciences Department Voice: (909) 869-4062
California State Polytechnic University FAX: (909) 869-4078
Pomona CA 91768-4032 USA [EMAIL PROTECTED]
Re: Transcriptions of Unicode
Just to expand upon this with data: 1. When I learned Latin in the U.S. in the 1960s, we were taught a reconstructed Roman pronunciation.

Before someone asks him how anyone could know how, say, a 1st c. CE Roman pronounced things, reconstruction can be informed by such things as transliteration of names into Greek by Greek authors, common misspellings, metrical values, etc. It can't be precisely accurate, but it's probably not that far off.

BTW, Montaigne's first language was Latin. French was his second language. His father wanted him to know his Latin like a Roman. This is rather like A.K. Ramanujan's (Indian poet's) description of his upbringing: in one floor/wing of the house, only English was allowed; on another floor, only Hindi(?), in a third, only Tamil. . . .

Patrick Rourke [EMAIL PROTECTED]
Re: Transcriptions of Unicode
At 06:16 15-01-2001 -0800, Charles wrote:
> Michael Everson wrote:
>> The pronunciation ['juni:ko:d] with [i:] or [i] instead of schwa irritates me a lot. No one would pronounce "universe" with an [i].
> I beg to differ; "universe" is commonly pronounced with a short [i] in the English Midlands.

Besides, the name of an international standard will be pronounced internationally. For example, my native tongue, Slovak, does not even have the English schwa sound. It would be ridiculous to expect Slovak Unicode users to learn a new phoneme just so they pronounce Unicode properly. They will pronounce it ['uniko:d], like it or not. :) And they will not turn the [o:] into an [ou] (as the English speakers do) either. Plus the [i] will be somewhere halfway between the English [i] and [i:].

I would not be surprised if speakers of certain Slavic languages even changed the SPELLING to Unikod (with an acute over the [o]), as they have done with other imported words (such as futbal for football). This is simply because they follow the write-as-you-hear rule. Insisting that they keep the original spelling would be linguistic imperialism, hardly appropriate for the makers of Unicode.

Adam
---
Whiz Kid Technomagic - brand name computers for less. See http://www.whizkidtech.net/pcwarehouse/ for details.
Re: Transcriptions of Unicode
At 13:27 2001-01-15 -0500, [EMAIL PROTECTED] wrote:
> My argument for the world converging on Dutch as the only language that is written as it is spoke. Vic

You really believe that Schiphol is written as pronounced? (; (:

Alain
RE: Transcriptions of Unicode
1. When I learned Latin in the U.S. in the 1960s, we were taught a reconstructed Roman pronunciation. Latin is still spoken in Rome, at the Vatican. So there is a Roman pronunciation even today... (; Just kidding... although what I say is true... Alain How about a weekly radio news broadcast in Latin? http://www.yle.fi/ylenykko/nuntii.html (in Finnish :-) http://www.yle.fi/fbc/latini/ (in Latin) http://www.yle.fi/fbc/latini/summary.html (in English)
Re: Transcriptions of Unicode
On Monday, January 15, 2001, at 01:09 PM, G. Adam Stanislav wrote: Besides, the name of an international standard will be pronounced internationally. Why? I don't pronounce "Paris" the way the French do. Why should I expect people from other countries to pronounce "Unicode" the way I do? Heck, I don't even expect other *English* speakers to pronounce it the way I do. I'm convinced I have a short i in the middle of it.
Re: Transcriptions of Unicode
On 01/15/2001 04:25:00 AM Michael Everson wrote:
> The pronunciation ['juni:ko:d] with [i:] or [i] instead of schwa irritates me a lot. No one would pronounce "universe" with an [i].

Well, note that it was transcribed not with [i:] but with the open counterpart (IPA symbol 319 rather than 301). That's certainly plausible, perhaps in certain dialects or in careful speech. I have heard some actually say [i] (not [i:]) but as I remember they were not native English speakers (i.e. they had the phonology of another language influencing their pronunciation when speaking English). I agree with Michael that schwa is probably a lot more likely for most speakers, though.

- Peter
---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]
Re: Transcriptions of Unicode
He didn't actually say it: someone joked at a dinner or fundraiser that Dan Quayle had felt guilty that he hadn't studied his Latin upon his visit to Latin America, and the press picked it up as though it were a true report of Quayle's own words. What it says of the man that millions of people believed him capable of saying it, I leave others to decide. I suspect that he was nominated because he reminded GHWB of someone.

Patrick Rourke [EMAIL PROTECTED]

----- Original Message -----
From: "Tex Texin" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Monday, January 15, 2001 5:27 PM
Subject: Re: Transcriptions of "Unicode"

Wasn't it Dan Quayle who said they speak Latin in Latin America?

[EMAIL PROTECTED] wrote:
> 1. When I learned Latin in the U.S. in the 1960s, we were taught a reconstructed Roman pronunciation.
> Latin is still spoken in Rome, at the Vatican. So there is a Roman pronunciation even today... (; Just kidding... although what I say is true... Alain
> How about a weekly radio news broadcast in Latin?
> http://www.yle.fi/ylenykko/nuntii.html (in Finnish :-)
> http://www.yle.fi/fbc/latini/ (in Latin)
> http://www.yle.fi/fbc/latini/summary.html (in English)

--
According to Murphy, nothing goes according to Hoyle.
--
Tex Texin, Director, International Business
mailto:[EMAIL PROTECTED] +1-781-280-4271 Fax: +1-781-280-4655
Progress Software Corp., 14 Oak Park, Bedford, MA 01730
http://www.Progress.com #1 Embedded Database
Globalization Program http://www.Progress.com/partners/globalization.htm
Re: Transcriptions of Unicode
At 14:11 15-01-2001 -0800, John Jenkins wrote: On Monday, January 15, 2001, at 01:09 PM, G. Adam Stanislav wrote: Besides, the name of an international standard will be pronounced internationally. Why? I don't pronounce "Paris" the way the French do. Why should I expect people from other countries to pronounce "Unicode" the way I do? That's exactly what I said. Unicode as an international standard will be pronounced internationally: Speakers of each language will have their own pronunciation, and some will even spell it differently. Adam --- Whiz Kid Technomagic - brand name computers for less. See http://www.whizkidtech.net/pcwarehouse/ for details.
Re: Transcriptions of Unicode
Hallo everybody!

I don't fully agree with Mark Davis' IPA transcription of "Unicode": http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_IPA.gif

Because:

1) I think that IPA transcriptions should be in [square brackets], while phonemic transcriptions should be in /slashes/. If neither enclosing is present, the transcription is ambiguous.

2) AFAIK, the phoneme [o:] (a long version of "o" in "got") does not exist in any standard pronunciation of contemporary English. It should rather be the diphthong [ou] (where the [u] would probably better be U+028A).

3) The transcription shows the primary stress on the first syllable, and a secondary stress on the last one. On the few occasions when I heard native English speakers saying "Unicode", I had the impression that it rather was the other way round.

4) As "Unicode" is the proper name of an international standard, and it is built with two English roots of French origin, it could as well be considered a French word, which would lead to a totally different transcription.

Sorry if I am repeating something already said by other people: I have been off the list for a while. And, about points 2 and 3 above, beware that I am a second-language English speaker and that I don't have much experience of American pronunciation.

Ciao.
Marco Cimarosti
Re: Transcriptions of Unicode
Marco Cimarosti wrote:
> I don't fully agree with Mark Davis' IPA transcription of "Unicode": http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_IPA.gif

Neither do I, but partly for different reasons.

> 1) I think that IPA transcriptions should be in [square brackets], while phonemic transcriptions should be in /slashes/. If neither enclosing is present, the transcription is ambiguous.

Right. And that's actually part of the key to the problem's answer:

> 2) AFAIK, the phoneme [o:] (a long version of "o" in "got") does not exist in any standard pronunciation of contemporary English. It should rather be the diphthong [ou] (where the [u] would probably better be U+028A).

In America, transcribing the vowel in "code" as /o/ (and "made" as /e/) is not uncommon, at least in *phonemic* transcription. Generally, American accents have less diphthongization in these sounds than British accents have, and phonemically it makes sense to see these sounds as part of the series of "long vowels". A *narrow phonetic* transcription would have something like [U+006F U+028A] for American, and [U+0259 U+028A] for British.

> 3) The transcription shows the primary stress on the first syllable, and a secondary stress on the last one. In the few occasions when I heard native English speakers saying "Unicode", I had the impression that it rather was the other way round.

I can't tell, because where I live I don't get to talk to native speakers about Unicode a lot. But: according to standard word-formation and pronunciation patterns in English, the stress pattern shown ('uni,code) is absolutely what you'd expect: as in "uniform", "unisex", "unicorn", "universe". (D. Jones, English Pronouncing Dictionary, doesn't even mark a secondary stress on the third syllable at all.)

> 4) As "Unicode" is the proper name of an international standard, and it is built with two English roots of French origin, it could as well be considered a French word, which would lead to a totally different transcription.

Right, but this particular pattern of merging word roots into a new word does suggest English provenance, I think. And, historically, that's where it did come from.

But there's another inconsistency in the transcription: the vowels in the first ("u-") and third ("-code") syllable are both phonemically long. Either you put the length mark on both (recommended for *phonetic* transcription), or on neither (okay with *phonemic* transcription). (Of course, if you transcribe the third syllable as a diphthong then you won't get a length mark there.)

According to the conventions in D. Jones, English Pronouncing Dictionary, you'd get something like:

[U+02C8 U+006A U+0075 U+02D0 U+006E U+026A U+006B U+0259 U+028A U+0064]

Lukas
-
Lukas Pietsch
University of Freiburg
English Department
Phone (p.) (#49) (761) 696 37 23
mailto:[EMAIL PROTECTED]
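The code-point list at the end of the message above can be turned back into a readable IPA string. The following is only a minimal sketch in Python using the standard library; the names `JONES_CODE_POINTS`, `NARROW_AMERICAN`, `NARROW_BRITISH` and `render` are illustrative and do not come from the thread.

```python
import unicodedata

# Code points exactly as listed in the message (D. Jones-style transcription).
JONES_CODE_POINTS = [
    0x02C8, 0x006A, 0x0075, 0x02D0, 0x006E,
    0x026A, 0x006B, 0x0259, 0x028A, 0x0064,
]

# The two narrow-phonetic renderings of the vowel in "code" from the same message.
NARROW_AMERICAN = [0x006F, 0x028A]
NARROW_BRITISH = [0x0259, 0x028A]


def render(code_points):
    """Join a sequence of integer code points into a string."""
    return "".join(chr(cp) for cp in code_points)


if __name__ == "__main__":
    # Square brackets mark a phonetic (not phonemic) transcription.
    print("[" + render(JONES_CODE_POINTS) + "]")  # [ˈjuːnɪkəʊd]
    print("American:", render(NARROW_AMERICAN))
    print("British: ", render(NARROW_BRITISH))
    # List each character with its official Unicode name.
    for cp in JONES_CODE_POINTS:
        print(f"U+{cp:04X}  {unicodedata.name(chr(cp))}")
```

Writing the transcription as code points, as the message does, sidesteps the problem that mail on the list at the time could not reliably carry IPA characters.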
Re: Transcriptions of Unicode
Much as I admire and appreciate the French language (second only to Italian), the proximate derivation of "Unicode" was not from that language, and the transcription should not match the French pronunciation. Instead, it has solid Northern Californian roots (even though not exactly dating from the Gold Rush days).

According to the references I have, the prefix "uni" is directly from Latin while the word "code" is through French. The Indo-European would have been *oi-no-kau-do ("give one strike"): *kau apparently being related to such English words as: hew, haggle, hoe, hag, hay, hack, caudad, caudal, caudate, caudex, coda, codex, codicil, coward, incus, and Kovač (personal name: 'smith'). I will leave the exact derivations to the exegetes, but I like the association with "haggle" myself.

I will ask our resident phonetician about the IPA transcription. Clearly Standard British English would add some interesting -- and no doubt valuable -- complexities and nuances to the vowels, but that is not the goal in this case. Even though "o" is often a diphthong in English, it is probably better to have [o:] as a target for matching from other languages, since [ou] may be considered slightly affected in the native language.

The stress is definitely on the first syllable. One does hear some normal generative English variations such as ˈjunəˌkoːd (schwa instead of short-i), but the stress still should be on the first syllable, as in "unify", not later in the word as in "unique". Of course, the best approximation in the target language should be used: if it does not allow for that position for the stress (without affectation), then the secondary stress should be used.

Mark

----- Original Message -----
From: "Marco Cimarosti" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, January 12, 2001 03:11
Subject: Re: Transcriptions of "Unicode"

Hallo everybody!

I don't fully agree with Mark Davis' IPA transcription of "Unicode": http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_IPA.gif

Because:

1) I think that IPA transcriptions should be in [square brackets], while phonemic transcriptions should be in /slashes/. If neither enclosing is present, the transcription is ambiguous.

2) AFAIK, the phoneme [o:] (a long version of "o" in "got") does not exist in any standard pronunciation of contemporary English. It should rather be the diphthong [ou] (where the [u] would probably better be U+028A).

3) The transcription shows the primary stress on the first syllable, and a secondary stress on the last one. On the few occasions when I heard native English speakers saying "Unicode", I had the impression that it rather was the other way round.

4) As "Unicode" is the proper name of an international standard, and it is built with two English roots of French origin, it could as well be considered a French word, which would lead to a totally different transcription.

Sorry if I am repeating something already said by other people: I have been off the list for a while. And, about points 2 and 3 above, beware that I am a second-language English speaker and that I don't have much experience of American pronunciation.

Ciao.
Marco Cimarosti
Re: Transcriptions of Unicode: Still Missing scripts
On Thu, 11 Jan 2001, Mark Davis wrote:
> By the way, I am still missing the following. If anyone can supply them, I'd appreciate it.
> [BOPOMOFO]
> [snip]
> [MONGOLIAN]
> [snip]
> See http://www.macchiato.com/unicode/Unicode_transcriptions.html for details.

It's still not very clear to me what this is supposed to be a list of. The title says "Transcriptions of Unicode", and a note at the bottom says "For non-Latin scripts the goal is to match the English pronunciation -- not spelling."

Some of the entries (leftmost column of the table) are names of languages, while others are names of scripts. E.g., "Russian" and "Japanese" are names of languages, with examples given in Cyrillic and Katakana, respectively. For some scripts, there is basically only one language that uses it, such as Katakana (used by Japanese) or Hangul (used by Korean), while other scripts are used by many languages. Is this supposed to suggest that Russian is the representative language to give a Cyrillic example in, and, say, not Mongolian?

In some cases, it seems the example is not necessarily a transcription of the English pronunciation, but a translation into another language, most likely a loanword, with attendant sound changes, e.g., Japanese "yunikoodo". I notice the lack of a request for an example using the Hiragana script (which is also used by Japanese), which suggests that the Japanese example is not a transcription of the English pronunciation into Katakana, but a Japanese word (albeit a loanword). Otherwise, it would be possible to provide a Hiragana example, however nonsensical or non-existent it may be in reality.

There is also the particular case of the Chinese entries, written in CJK "ideographs", which *are* translations using the calque strategy.

It seems to me that this list is intended to showcase a variety of ways to write "Unicode", be they transcriptions, transliterations, or translations--whatever maximizes the number of scripts that one can show off, apparently.

This raises some questions of what an example showcasing the Bopomofo script should look like. Basically, it is used only for Chinese, primarily Mandarin (zh-guoyu). It is also primarily an auxiliary script for ruby annotation of Chinese text written in CJK "ideographs", although it may stand alone.

So, if it is a transcription of English pronunciation, then it will have to go through the language filter of Mandarin Chinese, and this form may or may not be attested in reality--perhaps as a "best-fit" colloquial attempt to say a foreign (English) word. And this version would have the script standing alone.

Alternatively, it could be a transcription according to Mandarin Chinese pronunciation of the already existing Chinese translations written in CJK "ideographs". In this case, it could either stand alone, or be attached as ruby annotation to the CJK "ideograph" version (in Chinese).

Implementation-wise, it would be problematic seeing the Bopomofo at the size it would be in for ruby annotation of text in a 96x24 bitmap (as requested on the page). Also, Bopomofo does have an inclination to be used with Chinese text written top-to-bottom, so the horizontal shape of the 96x24 bitmap is problematic--more generally, vertically written scripts such as the traditional Mongolian script (also requested) cannot be demonstrated within this framework.

Thomas Chan [EMAIL PROTECTED]
RE: Transcriptions of Unicode
Peter Constable wrote: I'd add the square brackets, an off-glide on the "o", and aspiration (02b0) after the "k". Is that k aspirated? I do hear an aspiration when [p], [t] or [k] are at the *beginning* of "words" (mainly because teachers told me I was supposed to notice it), but I don't feel it *inside* a word. One other point: Yes? :-) Marco
Re: Transcriptions of Unicode
On Fri, 12 Jan 2001, Lukas Pietsch wrote: Marco Cimarosti wrote: 3) The transcription shows the primary stress on the first syllable, and a secondary stress on the last one. In the few occasions when I heard native English speakers saying "Unicode", I had the impression that it rather was the other way round. I can't tell, because where I live I don't get to talk to native speakers about Unicode a lot. But: According to standard word-formation and There is "Unicode, Oh Unicode" anthem/hymn--sound files located in /Other/Sounds/ directory on the cd-rom published with the book, as well as an audio track on the same disc. If this can be taken as an official stance on pronunciation of the term (the WhatIsThis.txt explanatory file does not provide any clues), well, I do not know... Thomas Chan [EMAIL PROTECTED]
RE: Transcriptions of Unicode
On 01/12/2001 10:33:48 AM Marco Cimarosti wrote: Is that k aspirated? It is for any English speakers I've ever met. One other point: Yes? :-) Oops. It was to be the point about the aspirated k. I forgot to delete that. Peter
Re: Representation of aspiration (was: Re: Transcriptions of Unicode)
Kenneth Whistler wrote:
> Richard Cook surmised:
>> BTW, in a very close transcription, if one is using superscription (position above baseline) and relative size reduction to indicate aspiration, I suppose that degree of superscription or the size or both could be modulated to indicate degree of aspiration?
> Nah, if you tried to go down that path, you'd just end up with unrepresentable transcriptions and unreliable reproduction. I doubt that there are many transcribers who could reliably record more than three degrees of aspiration, anyway (roughly: slight aspiration, "normal" aspiration, and superaspiration).

Ken, I was only kidding ... mostly, should have put a smiley in there :-) But I was also thinking of the superscription question, which I think Peter C. might like to discuss.

> Once you go past that level, which could be reliably indicated with appropriate use of diacritics, you are really into the realm of instrumental phonetics. I'd just hook up the machine and let it give you precise timings of voice delays post consonantal release in milliseconds.
>> Or perhaps just mark-up the unsuperscripted aspiration indicator, to note degree of aspiration ... however you would like to measure that.
> No need to "mark it up". Just add another diacritic. That's how most transcribers would work, in practice.

Well, I was thinking of linking the transcription to the machine data ... so that the relation would be set on a compound key (aspiration diacritic measurement reference) ...
Re: Transcriptions of Unicode
Thanks for your detailed note; I'll have to think it over.

...
> But there's another inconsistency in the transcription: the vowels in the first ("u-") and third ("-code") syllable are both phonemically long. Either you put the length mark on both (recommended for *phonetic* transcription), or on neither (okay with *phonemic* transcription). (Of

The o is significantly longer than the u, probably due to the following d.

...
> - Lukas Pietsch University of Freiburg English Department Phone (p.) (#49) (761) 696 37 23 mailto:[EMAIL PROTECTED]
Re: Transcriptions of Unicode
I see 2 Traditional Chinese translations here: http://www.macchiato.com/unicode/Unicode_transcriptions.html Which one do people like? http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese2.gif http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese3.gif
Re: Transcriptions of Unicode
On Thursday, January 11, 2001, at 10:25 AM, Richard Cook wrote: Which one do people like? http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese2.gif Is much better. "Unified Code" http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese3.gif Stinks. "Standard International Code"
RE: Transcriptions of Unicode
http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese3.gif and http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese2.gif are both used in Taiwan. If you type "Unicode" into the search field on the Taiwan Yahoo page http://tw.yahoo.com, you will find http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese3.gif . See the traditional Chinese web page at http://www.unicode.org/unicode/standard/translations/t-chinese.html The translation in http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese2.gif is used in China/Hong Kong; see the simplified Chinese web page at http://www.unicode.org/unicode/standard/translations/s-chinese.html -Jenny Pan -Original Message- From: John Jenkins [mailto:[EMAIL PROTECTED]] Sent: Thursday, January 11, 2001 3:42 PM To: Unicode List Subject: Re: Transcriptions of "Unicode" On Thursday, January 11, 2001, at 10:25 AM, Richard Cook wrote: Which one do people like? http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese2.gif Is much better. "Unified Code" http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese3.gif Stinks. "Standard International Code"
Re: Transcriptions of Unicode
On Thu, 11 Jan 2001, Richard Cook wrote: I see 2 Traditional Chinese translations here: http://www.macchiato.com/unicode/Unicode_transcriptions.html Which one do people like? http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese2.gif http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese3.gif It seems the former ("tongyi ma") rather than the latter ("biaozhun wanguo ma"). Some searches... "tongyi ma" (U_Chinese2.gif): Altavista: 66 matches Yahoo (Chinese/Hong Kong/Taiwan): 78 matches Microsoft Taiwan: 100 matches ("Yahoo Chinese" != "Yahoo China". I couldn't get through to Microsoft Hong Kong's search page.) Also IUC10 page (http://www.unicode.org/iuc/iuc10/languages.html) and Java glossary (http://java.sun.com/docs/glossaries/glossary.print.html) agree. "biaozhun wanguo ma" (U_Chinese3.gif): Altavista: 7 matches Yahoo (Chinese/Hong Kong/Taiwan): 1 match Microsoft Taiwan: 78 matches I do wonder, however, if "biaozhun wanguo ..." was meant as a translation of "ISO ...". Thomas Chan [EMAIL PROTECTED]
Re: Transcriptions of Unicode
John Jenkins wrote: On Thursday, January 11, 2001, at 10:25 AM, Richard Cook wrote: Which one do people like? http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese2.gif Is much better. "Unified Code" This was my opinion too. I like "tongyima". And so far I haven't heard from anyone advocating http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_Chinese3.gif Stinks. "Standard International Code" Although "stinks" might be a little harsh. I'd opt for "opaque" :-) Anyone else?
Re: Transcriptions of Unicode
Jon Babcock wrote: At first glance, I agreed. But then if the U_Chinese3.gif, gets shortened to the last three characters, wanguo ma, as I suspect it would in practice, I'd favor it slightly over the three-character tongyi ma of U_Chinese2.gif. FWIW. To me, wanguo ma emphasizes the multilingual aspect, whereas tongyi ma emphasizes the unifying aspect, but it isn't fully apparent, from the name (tongyi ma) alone, what is being unified.

Well, I'd say a problem with wanguo ma [lit. 'standard myriad-country code'] is that it would be a better translation of Globalcode, rather than of Unicode. All in favor of changing the standard name, say aye? And is it apparent from the name "Unicode" alone that "Uni-" stands for "Unified" and not, um, "Unicorn"? :-) tongyi ma seems much more natural, less clunky to me ... but some people prefer what I think is clunky, so I'm willing to admit that my opinion of clunkiness may be completely subjective.

Here's the Unicode, courtesy of http://www.wenlin.com/ :
[U+6a19][U+6e96][U+842c][U+570b][U+78bc] biao1zhun3 wan4guo2 ma3
[U+7d71][U+4e00][U+78bc] tong3yi1 ma3
UTF8:
標準萬國碼 biao1zhun3 wan4guo2 ma3
統一碼 tong3yi1 ma3
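For anyone who wants to check the code points listed above, here is a minimal Python sketch that rebuilds both names and prints their UTF-8 bytes. The code points and pinyin come from the message; the helper names (`from_code_points`, `utf8_hex`) are only illustrative.

```python
# Code points as listed in the message above (pinyin given as the key).
CANDIDATES = {
    "biao1zhun3 wan4guo2 ma3": [0x6A19, 0x6E96, 0x842C, 0x570B, 0x78BC],
    "tong3yi1 ma3": [0x7D71, 0x4E00, 0x78BC],
}


def from_code_points(code_points):
    """Build a string from a list of Unicode scalar values."""
    return "".join(chr(cp) for cp in code_points)


def utf8_hex(text):
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))


if __name__ == "__main__":
    for pinyin, code_points in CANDIDATES.items():
        text = from_code_points(code_points)
        print(f"{pinyin}: {text} -> {utf8_hex(text)}")
```

Every character here is in the BMP above U+07FF, so each one takes three bytes in UTF-8.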
Re: Transcriptions of Unicode
Michael, that's great. Could you send the code points? (I couldn't use the images -- if you can make a 96 x 24 GIF, I can use that). Thanks, Mark - Original Message - From: "Michael Everson" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Tuesday, December 12, 2000 09:01 Subject: Re: Transcriptions of Unicode Ar 07:11 -0800 2000-12-12, scríobh Mark Davis: ARMENIAN BULGARIAN CHEROKEE ETHIOPIC GREEK GUJARATI GURMUKHI INUKTITUT OGHAM RUNIC RUSSIAN SINHALA UCAS See http://www.egt.ie/standards/iso10646/pdf/junikod.pdf Michael Everson ** Everson Gunn Teoranta ** http://www.egt.ie 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland Mob +353 86 807 9169 ** Fax +353 1 478 2597 ** Vox +353 1 478 2597 27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire
Re: Transcriptions of Unicode
That matches what I have on http://www.macchiato.com/unicode/Unicode_transcriptions.html, right? (circle?) Mark - Original Message - From: "Michael (michka) Kaplan" [EMAIL PROTECTED] To: "Mark Davis" [EMAIL PROTECTED]; "Unicode List" [EMAIL PROTECTED] Sent: Thursday, December 14, 2000 11:25 Subject: Re: Transcriptions of Unicode Here is Hindi: यूिनकोड I was convinced that that circle was a mistake, but per my friend the native Hindi speaker: "that circle is right, and that's the char that gives the phonetic minor e" michka - Original Message - From: "Mark Davis" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Tuesday, December 12, 2000 7:11 AM Subject: Transcriptions of Unicode Some people were kind enough to send me extra transcriptions for http://www.macchiato.com/unicode/Unicode_transcriptions.html I am still missing confirmation on the Russian and Greek, and (at least one language in) the following scripts. Any help from native speakers would be appreciated. ARMENIAN BENGALI BOPOMOFO CHEROKEE ETHIOPIC GUJARATI GURMUKHI KANNADA KHMER LAO MALAYALAM MONGOLIAN MYANMAR OGHAM ORIYA RUNIC SINHALA SYRIAC TAMIL TELUGU THAANA THAI TIBETAN UCAS YI
Re: Transcriptions of Unicode
Sorry, it was the fault of the machine I was on then, I think. I had mistyped it (U+0928 after U+093F rather than before it). My friend concurred. michka - Original Message - From: "Mark Davis" [EMAIL PROTECTED] To: "Michael (michka) Kaplan" [EMAIL PROTECTED]; "Unicode List" [EMAIL PROTECTED] Sent: Thursday, December 14, 2000 8:01 PM Subject: Re: Transcriptions of Unicode That matches what I have on http://www.macchiato.com/unicode/Unicode_transcriptions.html, right? (circle?) Mark - Original Message - From: "Michael (michka) Kaplan" [EMAIL PROTECTED] To: "Mark Davis" [EMAIL PROTECTED]; "Unicode List" [EMAIL PROTECTED] Sent: Thursday, December 14, 2000 11:25 Subject: Re: Transcriptions of Unicode Here is Hindi: यूिनकोड I was convinced that that circle was a mistake, but per my friend the native Hindi speaker: "that circle is right, and that's the char that gives the phonetic minor e" michka - Original Message - From: "Mark Davis" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Tuesday, December 12, 2000 7:11 AM Subject: Transcriptions of Unicode Some people were kind enough to send me extra transcriptions for http://www.macchiato.com/unicode/Unicode_transcriptions.html I am still missing confirmation on the Russian and Greek, and (at least one language in) the following scripts. Any help from native speakers would be appreciated. ARMENIAN BENGALI BOPOMOFO CHEROKEE ETHIOPIC GUJARATI GURMUKHI KANNADA KHMER LAO MALAYALAM MONGOLIAN MYANMAR OGHAM ORIYA RUNIC SINHALA SYRIAC TAMIL TELUGU THAANA THAI TIBETAN UCAS YI
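The ordering slip described in this message (U+0928 typed after U+093F) can be illustrated with a short Python sketch. Devanagari is stored in logical order, so the vowel sign I (U+093F) follows its consonant in memory even though it is drawn to the left of it; a vowel sign with no consonant directly before it is what renderers show with a dotted circle. The full sequence below is my assumption of the intended spelling, built around the two code points named in the message, and the check is deliberately simplistic (real text can stack marks such as nukta plus a vowel sign).

```python
import unicodedata

# Assumed intended spelling: YA, UU, NA, I, KA, O, DDA (logical order).
CORRECT = [0x092F, 0x0942, 0x0928, 0x093F, 0x0915, 0x094B, 0x0921]
# The slip described above: U+093F typed before U+0928.
MISTYPED = [0x092F, 0x0942, 0x093F, 0x0928, 0x0915, 0x094B, 0x0921]


def describe(code_points):
    """Print each code point with its Unicode character name."""
    for cp in code_points:
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")


def marks_follow_letters(code_points):
    """Simplified check: every combining mark directly follows a letter."""
    previous_category = ""
    for cp in code_points:
        category = unicodedata.category(chr(cp))
        if category.startswith("M") and not previous_category.startswith("L"):
            return False
        previous_category = category
    return True


if __name__ == "__main__":
    describe(CORRECT)
    print("correct order passes:", marks_follow_letters(CORRECT))
    print("mistyped order passes:", marks_follow_letters(MISTYPED))
```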
Re: Transcriptions of Unicode
Who needs those mungers? Let's nuke them straight to HELL. WITH a nuke. Or at least a couple hundred hand grenades. Sarasvati [EMAIL PROTECTED] wrote: Michka wrote: Ok, it happened again. I can send mail to other people and the encoding stays intact. Just the Unicode List is losing it. Does anyone have any ideas on this? Sarasvati contends that you're probably sending raw 8-bit mail over an SMTP connection without any indication of the encoding, nor any MIME headers in your message. The raw message that was received by Unicode.ORG was _ALREADY_ munged into 7-bits, so the fault does not lie with Unicode.ORG. Your original mail had this interesting header in it, which might be of some interest... Received: from 157.54.9.108 by mail5.microsoft.com (InterScan E-Mail VirusWall NT); Tue, 12 Dec 2000 10:20:42 -0800 (Pacific Standard Time) Received: by inet-imc-05.redmond.corp.microsoft.com with Internet Mail Service (5.5.2651.58) id YWS8WTM0; Tue, 12 Dec 2000 10:20:41 -0800 Probably someone else is munging your mail on its way to me. -- Sarasvati
Re: Transcriptions of Unicode
Ar 07:11 -0800 2000-12-12, scríobh Mark Davis: ARMENIAN BULGARIAN CHEROKEE ETHIOPIC GREEK GUJARATI GURMUKHI INUKTITUT OGHAM RUNIC RUSSIAN SINHALA UCAS See http://www.egt.ie/standards/iso10646/pdf/junikod.pdf Michael Everson ** Everson Gunn Teoranta ** http://www.egt.ie 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland Mob +353 86 807 9169 ** Fax +353 1 478 2597 ** Vox +353 1 478 2597 27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire
Re: Transcriptions of Unicode
Here's Tamil (sorry I did not see this earlier on the list!) MichKa Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/ - Original Message - From: "Mark Davis" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Tuesday, December 12, 2000 7:11 AM Subject: Transcriptions of Unicode Some people were kind enough to send me extra transcriptions for http://www.macchiato.com/unicode/Unicode_transcriptions.html I am still missing confirmation on the Russian and Greek, and (at least one language in) the following scripts. Any help from native speakers would be appreciated. ARMENIAN BENGALI BOPOMOFO CHEROKEE ETHIOPIC GUJARATI GURMUKHI KANNADA KHMER LAO MALAYALAM MONGOLIAN MYANMAR OGHAM ORIYA RUNIC SINHALA SYRIAC TAMIL TELUGU THAANA THAI TIBETAN UCAS YI
Re: Transcriptions of Unicode
Hmmm... wonder how the UTF-8 encoding got lost? I will try one more time Mark, let me know if the e-mail to you retained it. MichKa Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/
Re: Transcriptions of Unicode
Ok, it happened again. I can send mail to other people and the encoding stays intact. Just the Unicode List is losing it. Does anyone have any ideas on this? The code points are: U+0BAF U+0BC2 U+0BA9 U+0BBF U+0B95 U+0BCB U+0B9F U+0BCD and is the one INFITT (Information Forum for Information Technology in Tamil) has been using in its recent discussions. michka - Original Message - From: "Michael (michka) Kaplan" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Tuesday, December 12, 2000 9:58 AM Subject: Re: Transcriptions of Unicode Hmmm... wonder how the UTF-8 encoding got lost? I will try one more time Mark, let me know if the e-mail to you retained it. MichKa Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/
Re: Transcriptions of Unicode
At Tue, 12 Dec 2000 10:25:59 -0800 (GMT-0800), Michael (michka) Kaplan [EMAIL PROTECTED] wrote: Ok, it happened again. I can send mail to other people and the encoding stays intact. Just the Unicode List is losing it. Does anyone have any ideas on this? I think that's because the list server strips off almost all the mail header information. The server should retain the MIME-Version: and Content-Type: headers to allow mail clients to display the message in the right encoding. It would be even better if the server retained the In-Reply-To: header so that I can view the messages in threads. --- Shigemichi Yazawa [EMAIL PROTECTED]
Re: Transcriptions of Unicode
Michka wrote: Ok, it happened again. I can send mail to other people and the encoding stays intact. Just the Unicode List is losing it. Does anyone have any ideas on this? Sarasvati contends that you're probably sending raw 8-bit mail over an SMTP connection without any indication of the encoding, nor any MIME headers in your message. The raw message that was received by Unicode.ORG was _ALREADY_ munged into 7-bits, so the fault does not lie with Unicode.ORG. Your original mail had this interesting header in it, which might be of some interest... Received: from 157.54.9.108 by mail5.microsoft.com (InterScan E-Mail VirusWall NT); Tue, 12 Dec 2000 10:20:42 -0800 (Pacific Standard Time) Received: by inet-imc-05.redmond.corp.microsoft.com with Internet Mail Service (5.5.2651.58) id YWS8WTM0; Tue, 12 Dec 2000 10:20:41 -0800 Probably someone else is munging your mail on its way to me. -- Sarasvati
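To make the point about MIME headers concrete, here is a minimal sketch using Python's standard `email` package (obviously not what anyone on the list was running in 2000). It labels a UTF-8 body with `MIME-Version`, `Content-Type` and `Content-Transfer-Encoding` headers and uses base64 so the wire form is pure 7-bit ASCII and survives relays that are not 8-bit clean. The Tamil code points are the ones quoted elsewhere in this thread; the function name `build_message` and the subject text are illustrative only.

```python
from email.message import EmailMessage

# Tamil transcription, from the code points given elsewhere in this thread.
TAMIL = "".join(
    chr(cp)
    for cp in (0x0BAF, 0x0BC2, 0x0BA9, 0x0BBF, 0x0B95, 0x0BCB, 0x0B9F, 0x0BCD)
)


def build_message(body):
    """Return a message whose UTF-8 body is declared and 7-bit safe."""
    msg = EmailMessage()
    msg["Subject"] = "Re: Transcriptions of Unicode"
    # charset labels the encoding; cte="base64" keeps every wire byte < 0x80.
    msg.set_content(body, charset="utf-8", cte="base64")
    return msg


if __name__ == "__main__":
    message = build_message(TAMIL)
    for header in ("MIME-Version", "Content-Type", "Content-Transfer-Encoding"):
        print(f"{header}: {message[header]}")
    wire = message.as_bytes()
    print("7-bit clean:", all(byte < 0x80 for byte in wire))
```

A receiving client that honours these headers can decode the body back to the original text, whereas raw 8-bit bytes with no charset label leave it guessing.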
Re: Transcriptions of Unicode
Darlings, Shigemichi Yazawa wrote: I think that's because the list server strip off almost all the mail header information. The server should retain MIME-Version: Content-Type: On the contrary, Sarasvati is a highly discerning stripper, and certainly does not remove anything so essential as MIME headers. If you look at your own message, as massaged, you will find your MIME headers intact. And I even know your ditch-dwelling flagellum-waving protozoan mailer's name. MIME-version: 1.0 (generated by EMIKO 1.13.9 - "Euglena tripteris") Content-type: text/plain; charset=US-ASCII Euglena tripteris. First isolated by E. G. Pringsheim in 1943. See Sweet Emiko up-close in all her 140-micron glory at: http://taxa.soken.ac.jp/WWW/PDB/PCD2460/D/07.jpg Your cheeky, -- Sarasvati
Re: Transcriptions of Unicode
Interesting... strange how other people I send e-mail to do not have this problem? Let me try one more time. :-) யூனிகோட் MichKa Michael Kaplan Trigeminal Software, Inc. http://www.trigeminal.com/ - Original Message - From: "Sarasvati" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Tuesday, December 12, 2000 10:45 AM Subject: Re: Transcriptions of Unicode Michka wrote: Ok, it happened again. I can send mail to other people and the encoding stays intact. Just the Unicode List is losing it. Does anyone have any ideas on this? Sarasvati contends that you're probably sending raw 8-bit mail over an SMTP connection without any indication of the encoding, nor any MIME headers in your message. The raw message that was received by Unicode.ORG was _ALREADY_ munged into 7-bits, so the fault does not lie with Unicode.ORG. Your original mail had this interesting header in it, which might be of some interest... Received: from 157.54.9.108 by mail5.microsoft.com (InterScan E-Mail VirusWall NT); Tue, 12 Dec 2000 10:20:42 -0800 (Pacific Standard Time) Received: by inet-imc-05.redmond.corp.microsoft.com with Internet Mail Service (5.5.2651.58) id YWS8WTM0; Tue, 12 Dec 2000 10:20:41 -0800 Probably someone else is munging your mail on its way to me. -- Sarasvati
Re: Transcriptions of Unicode
Michael wrote: Interesting... strange how other people I send e-mail to do not have this problem? It came through this time, even on my stone-age mail reader. Given a widely used homogeneous system like Windows, I wouldn't be surprised if the recipients that successfully viewed the original were running Windows too.
- Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003
"Cinema, radio, television, magazines are a school of inattention: people look without seeing, listen without hearing." -- Robert Bresson
Re: Transcriptions of Unicode
Ah, I actually changed to a new SMTP server, hoping it would be a bit more advanced. It appears to be a lot more up to date! michka - Original Message - From: "Mark Leisher" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Tuesday, December 12, 2000 11:52 AM Subject: Re: Transcriptions of Unicode Michael wrote: Interesting... strange how other people I send e-mail to do not have this problem? It came through this time, even on my stone-age mail reader. Given a widely used homogeneous system like Windows, I wouldn't be surprised if the recipients that successfully viewed the original were running Windows too.
- Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003
"Cinema, radio, television, magazines are a school of inattention: people look without seeing, listen without hearing." -- Robert Bresson
Re: Transcriptions of Unicode
At 03:01 PM 12/8/00, John H. Jenkins wrote: Yes, this is really true. If someone were reading an extended text or an entire book in Chinese, they might prefer to see the Chinese glyphs, but isolated words, quotations, and short passages are printed with Japanese ones. This is not unique to Chinese/Japanese. When I learned German many years ago, the textbook printed German in Fraktur, so that the students would gain experience in reading older German texts. But German quotes in English text have almost invariably been in the same face as the English. -- Curtis Clark http://www.csupomona.edu/~jcclark/ Biological Sciences Department Voice: (909) 869-4062 California State Polytechnic University FAX: (909) 869-4078 Pomona CA 91768-4032 USA [EMAIL PROTECTED]
Re: displaying Unicode text (was Re: Transcriptions of Unicode)
Mark Davis wrote: Let's take an example.
- The page is UTF-8.
- It contains a mixture of German, dingbats and Hindi text.
- My locale is de_DE.

From your description, it sounds like Mozilla works as follows:
- The locale maps (I'm guessing) to 8859-1
- 8859 maps to, say Helvetica.
- The dingbats and Hindi appear as boxes or question marks.

This would be pretty lame, so I hope I misunderstand you!!

Sorry, I've been abbreviating quite a bit, so I left out a lot. Yes, you've misunderstood me, but only because I abbreviated so much. Sorry. Let me try again, with more feeling this time.

Using the example above:
- The locale maps to "x-western" (ja_JP would map to "ja", so I've prepended "x-" for the "language groups" that don't exist in RFC 1766)
- x-western and CSS' sans-serif map to Arial
- The dingbats appear as dingbats if they are in Unicode and at least one of the dingbat fonts on the system has a Unicode cmap subtable (WingDings is a "symbol" font, so it doesn't have such a table), while the Hindi might display OK on some Windows systems if they have Hindi support (Mozilla itself does not support any Indic languages yet).

We could support the WingDings font if we add an entry for WingDings to the following table: http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsFontMetricsWin.cpp#872 We just haven't done that yet.

Basically, Mozilla will look at all the fonts on the system to find one that contains a glyph for the current character. The language group and user locale stuff that I mentioned earlier is only one part of the process -- the part that deals with the user's font preferences. I'll explain more of the rest of the process:

Mozilla implements CSS2's font matching algorithm: http://www.w3.org/TR/REC-CSS2/fonts.html#algorithm This states that *for each character* in the element, the implementation is supposed to go down the list of fonts in the font-family property, to find a font that exists and that contains a glyph for the current character.

Mozilla implements this algorithm to the letter, which means that fonts are chosen for each character without regard for neighboring characters (unlike MSIE). This may actually have been a bad decision, since we sometimes end up with text that looks odd due to font changes. Anyway, Mozilla's algorithm has the following steps:
1. "User-Defined" font
2. CSS font-family property
3. CSS generic font (e.g. serif)
4. list of all fonts on system
5. transliteration
6. question mark

You can see these steps in the following pieces of code: http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsFontMetricsWin.cpp#2642 http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#3108

1. "User-Defined" font (FindUserDefinedFont)
We decided to include the User-Defined font functionality in Netscape 6 again. It is similar to the old Netscape 4.X. Basically, if the user selects this encoding from the View menu, then the browser passes the bytes through to the font, untouched. This is for charsets that we don't already support. This step needs to be the first step, since it overrides everything else.

2. CSS font-family property (FindLocalFont)
If the user hasn't selected User-Defined, we invoke this routine. It simply goes down the font-family list to find a font that exists and that contains a glyph for the current character. E.g.: font-family: Arial, "MS Gothic", sans-serif;

3. CSS generic font (FindGenericFont)
If the above fails, this routine tries to find a font for the CSS generic (e.g. sans-serif) that was found in the font-family property, if any, otherwise it falls back to the user's default (serif or sans-serif). This is where the font preferences come in, so this is where we try to determine the language group of the element. I.e. we take the LANG attribute of this element or a parent element if any, otherwise the language group of the document's charset, if non-Unicode-based, otherwise the user's locale's language group.

4. list of all fonts on system (FindGlobalFont)
If the above fails, this routine goes through all fonts on the system, trying to find one that contains a glyph for the current character.

5. transliteration (FindSubstituteFont)
If we still can't find a font for this character, we try a transliteration table. For example, the euro is mapped to the 3 ASCIIs "EUR", which is useful on some Unix systems that don't have the euro glyph yet. Actually, this transliteration step isn't even implemented on Windows yet.

6. question mark (FindSubstituteFont)
If we can't find a transliteration, we fall back to the last resort -- the good ol' question mark.

That's it. I hope I didn't abbreviate too much this time! Erik
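The six steps above can be sketched in a few lines of Python. This is only an illustration of the per-character fallback order as described in the message, not Mozilla's actual code: the `Font` class, the `resolve` function, the glyph coverage sets and the "Hypothetical Dingbats" font are all made up for the example. Only the step order and the euro-to-"EUR" transliteration come from the message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Font:
    """Toy font: a name plus the set of code points it has glyphs for."""
    name: str
    coverage: frozenset

    def has_glyph(self, ch):
        return ord(ch) in self.coverage


# Step 5 table; the euro example is the one given in the message.
TRANSLITERATION = {"\u20ac": "EUR"}


def resolve(ch, *, user_defined=None, family=(), generic=None, system=()):
    """Return (source, text to draw) for one character, trying steps 1-6."""
    if user_defined is not None:              # 1. User-Defined overrides all
        return (user_defined.name, ch)
    for font in family:                       # 2. CSS font-family list
        if font.has_glyph(ch):
            return (font.name, ch)
    if generic is not None and generic.has_glyph(ch):  # 3. CSS generic font
        return (generic.name, ch)
    for font in system:                       # 4. every font on the system
        if font.has_glyph(ch):
            return (font.name, ch)
    if ch in TRANSLITERATION:                 # 5. transliteration
        return ("transliteration", TRANSLITERATION[ch])
    return ("fallback", "?")                  # 6. question mark


# Hypothetical coverage, chosen only to exercise each step.
ARIAL = Font("Arial", frozenset(range(0x20, 0x7F)) | {0xFC})
GOTHIC = Font("MS Gothic", frozenset({0x3042}))
DINGBATS = Font("Hypothetical Dingbats", frozenset({0x2708}))

if __name__ == "__main__":
    for ch in "\u00fc\u3042\u2708\u20ac\u0928":
        source, text = resolve(
            ch, family=[ARIAL, GOTHIC], generic=ARIAL, system=[DINGBATS]
        )
        print(f"U+{ord(ch):04X} -> {source}: {text}")
```

Because the choice is made one character at a time, neighbouring characters can land in different fonts, which is the odd-looking mixing the message mentions.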
Re: Transcriptions of Unicode
On Wed, Dec 06, 2000 at 11:12:24PM -0800, James Kass wrote: As for Chinese users searching for Chinese strings, Japanese text will most probably be incomprehensible regardless of font or mark-up. That's true for pretty much every other pair of languages that use the same script, though. -- David Starner - [EMAIL PROTECTED] http://dvdeug.dhis.org "(You see, the best way to solve a problem is to rigorously define it in terms of other people's problems and then run away quickly.)" -- Roland McGrath [EMAIL PROTECTED]
Re: displaying Unicode text (was Re: Transcriptions of Unicode)
Thanks! I appreciate the description. My fears were unfounded. This states that *for each character* in the element, the implementation is supposed to go down the list of fonts in the font-family property, to find a font that exists and that contains a glyph for the current character. I agree that this does not produce the optimal results, since one should have the freedom to select different fonts based on the context of the character. The above description is much better than a very coarse-grained approach (like having the entire document or element in the same font), but needs some more wriggle-room to allow people flexibility. Mark - Original Message - From: "Erik van der Poel" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Cc: "Unicode List" [EMAIL PROTECTED] Sent: Thursday, December 07, 2000 00:30 Subject: Re: displaying Unicode text (was Re: Transcriptions of "Unicode")
Re: Transcriptions of Unicode
But NN6 *does* select a font for characters outside the so-called user's locale when said characters are in a UTF-8 page. It appears that this mechanism is somewhat haphazard for CJK unified ideographs: I get a mix of fonts usually (probably because ja is in my locale "stack" currently and 'zh' and 'ko' are not, so I guess Japanese fonts are preferred for characters that are in JIS X 0208 ??). AP

Addison P. Phillips, Principal Consultant
Inter-Locale LLC, http://www.inter-locale.com
Los Gatos, CA, USA, mailto:[EMAIL PROTECTED]
+1 408.210.3569 (mobile), +1 408.904.4762 (fax)
Globalization Engineering Consulting Services

On Mon, 4 Dec 2000, Erik van der Poel wrote: Mark Davis wrote: What wasn't clear from his message is whether Mozilla picks a reasonable font if the language is not there. Sorry about the lack of clarity. When there is no LANG attribute in the element (or in a parent element), Mozilla uses the document's charset as a fallback. Mozilla has font preferences for each language group. The language groups have been set up to have a one-to-one correspondence with charsets (roughly). E.g. iso-8859-1 -> Western, shift_jis -> ja. When the charset is a Unicode-based one (e.g. UTF-8), then Mozilla uses the language group that contains the user's locale's language. In other words, Mozilla does not (yet) use the Unicode character codes to select fonts. We may do this in the future. Erik
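The fallback chain Erik describes (LANG attribute, then the document's charset if it is not Unicode-based, then the user's locale) can be written down as a small Python sketch. This is an illustration under stated assumptions rather than Mozilla's implementation: the tables hold only the few mappings mentioned in this thread (iso-8859-1 and de_DE to the Western group, shift_jis and ja_JP to ja), the `LANG_GROUP` table and the final "x-western" default are hypothetical, and the group name "x-western" follows Erik's later message.

```python
# Mappings mentioned in the thread; everything else is left out on purpose.
CHARSET_GROUP = {"iso-8859-1": "x-western", "shift_jis": "ja"}
LOCALE_GROUP = {"de_DE": "x-western", "ja_JP": "ja"}
# Hypothetical table for LANG attribute values.
LANG_GROUP = {"de": "x-western", "en": "x-western", "ja": "ja"}
UNICODE_CHARSETS = {"utf-8", "utf-16"}


def language_group(lang, charset, locale):
    """Pick the language group used to look up the user's font preferences."""
    if lang:                                   # LANG on element or a parent
        group = LANG_GROUP.get(lang.lower().split("-")[0])
        if group:
            return group
    charset = charset.lower()
    if charset not in UNICODE_CHARSETS and charset in CHARSET_GROUP:
        return CHARSET_GROUP[charset]          # non-Unicode document charset
    # Unicode-based charset: fall back to the user's locale.
    # The final default is an assumption for this sketch.
    return LOCALE_GROUP.get(locale, "x-western")


if __name__ == "__main__":
    print(language_group(None, "utf-8", "de_DE"))
    print(language_group(None, "shift_jis", "de_DE"))
    print(language_group("ja", "utf-8", "de_DE"))
```

A UTF-8 page with no LANG attribute therefore ends up with the locale's group, which is why the CJK font choice depends on the reader's setup rather than on the text.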
Re: Transcriptions of Unicode
Erik van der Poel wrote: The font selection is indeed somewhat haphazard for CJK when there are no LANG attributes and the charset doesn't tell us anything either, but then, what do you expect in that situation anyway? I suppose we could deduce that the language is Japanese for Hiragana and Katakana, but what should we do about ideographs? Don't tell me the browser has to start guessing the language for those characters. I've had enough of the guessing game. We have been doing it for charsets for years, and it has led to trouble that we can't back out of now. I think we need to draw the line here, and tell Web page authors to mark their pages with LANG attributes or with particular fonts, preferably in style sheets. A Universal Character Set should not require mark-up/tags. If the Japanese version of a Chinese character looks different than the Chinese character, it *is* different. In many cases, "variant" does not mean "same". When limited to BMP code points, CJK unification kind of made sense. In light of the new additional planes... The IRG seems to be doing a fine job. Best regards, James Kass.
Re: Transcriptions of Unicode
At 3:57 PM -0800 12/6/00, James Kass wrote: A Universal Character Set should not require mark-up/tags. Au contraire, it's been implicit in the design of Unicode from the beginning that markup/tags would be required in certain situations. If the Japanese version of a Chinese character looks different than the Chinese character, it *is* different. In many cases, "variant" does not mean "same". But as a rule, the Japanese and Chinese would disagree with you here. Certainly the IRG would disagree. Few in the west would argue over the fundamental unity of Fraktur and Roman variations of the Latin alphabet; most of the Chinese/Japanese variations are on that order or less. When limited to BMP code points, CJK unification kind of made sense. In light of the new additional planes... The IRG seems to be doing a fine job. Here you've really lost me. The IRG is unifying in plane 2, as well. Nobody in the IRG has suggested that we abandon unification for plane 2. -- = John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Transcriptions of Unicode
James Kass wrote: Erik van der Poel wrote: The font selection is indeed somewhat haphazard for CJK when there are no LANG attributes and the charset doesn't tell us anything either, but then, what do you expect in that situation anyway? I suppose we could deduce that the language is Japanese for Hiragana and Katakana, but what should we do about ideographs? Don't tell me the browser has to start guessing the language for those characters. I've had enough of the guessing game. We have been doing it for charsets for years, and it has led to trouble that we can't back out of now. I think we need to draw the line here, and tell Web page authors to mark their pages with LANG attributes or with particular fonts, preferably in style sheets. A Universal Character Set should not require mark-up/tags. If the Japanese version of a Chinese character looks different than the Chinese character, it *is* different. In many cases, "variant" does not mean "same". I was referring to the CJK Unified Ideographs in the range U+4E00 to U+9FA5. I agree that those codes do not *require* mark-up/tags, but if the author wishes to have them displayed with a "Japanese font", then they must indicate the language or specify the font directly. The latter may be problematic. I don't think it's reasonable to expect a browser to apply various heuristics to determine the language. When limited to BMP code points, CJK unification kind of made sense. In light of the new additional planes... The IRG seems to be doing a fine job. Somehow I get the impression that you have more to say, but you just aren't saying it. Cough it up already. :-) Erik
Re: Transcriptions of Unicode
Erik van der Poel wrote:

> > > The font selection is indeed somewhat haphazard for CJK when there are no LANG attributes and the charset doesn't tell us anything either, but then, what do you expect in that situation anyway? I suppose we could deduce that the language is Japanese for Hiragana and Katakana, but what should we do about ideographs? Don't tell me the browser has to start guessing the language for those characters. I've had enough of the guessing game. We have been doing it for charsets for years, and it has led to trouble that we can't back out of now. I think we need to draw the line here, and tell Web page authors to mark their pages with LANG attributes or with particular fonts, preferably in style sheets.
> >
> > A Universal Character Set should not require mark-up/tags. If the Japanese version of a Chinese character looks different than the Chinese character, it *is* different. In many cases, "variant" does not mean "same".
>
> I was referring to the CJK Unified Ideographs in the range U+4E00 to U+9FA5. I agree that those codes do not *require* mark-up/tags, but if the author wishes to have them displayed with a "Japanese font", then they must indicate the language or specify the font directly. The latter may be problematic. I don't think it's reasonable to expect a browser to apply various heuristics to determine the language.

I completely agree that it is not reasonable to expect a browser to guess the language. Since browsers primarily display information, the browser doesn't really need to be language-aware in most cases. Exceptions like word-breaks for Thai and related scripts exist, of course. Even scripts which don't use spaces or other word breaks can be encoded with the special spacing variants available in the Unicode Standard, though.

> > When limited to BMP code points, CJK unification kind of made sense. In light of the new additional planes... The IRG seems to be doing a fine job.
>
> Somehow I get the impression that you have more to say, but you just aren't saying it. Cough it up already. :-)

Sorry, I'm trying to learn how to be brief (!) and hoped the inference would be apparent. Although the IRG still considers unification relevant, it seems to me that they are much tighter now in their definition of 'sameness' than was previously the case. Not all of the approx 4 "new" characters in Plane 2 are the names of race horses; some of them, as far as I can tell, would have been unified before.

Consider the "teeth" ideograph(s). (Radical number 211, in some radical lists.) Because this is a radical, CJK encoders can select the specific desired character:

U+2FD2 for Traditional Chinese
U+2EED for Japanese
U+2EEE for Simplified Chinese

Since anyone encoding U+9F52 might see any of the above three versions, my opinion is that encoders (authors) would wish to explicitly encode their expected character and would do so whenever they have the option.

I believe that they should have the option. The abundance of unassigned code points offered by additional Unicode planes makes this possible and would eliminate the need for a browser (or any other application) to "guess" a language in order to display material as its authors and users desire.

Best regards,
James Kass.
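For readers who want to inspect the four code points mentioned above, the following sketch uses Python's standard `unicodedata` module. The helper names are made up for illustration, and the language labels are simply copied from the message rather than taken from the Unicode database. It also checks whether compatibility normalization (NFKC) folds a radical form into the unified ideograph U+9F52, which is one way of seeing that the standard treats a form as a variant rather than a distinct character.

```python
import unicodedata

UNIFIED_TOOTH = "\u9f52"

# Labels are the ones given in the message, not database properties.
TOOTH_FORMS = {
    "Traditional Chinese": "\u2fd2",
    "Japanese": "\u2eed",
    "Simplified Chinese": "\u2eee",
}


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def folds_to_unified(ch):
    """True if NFKC normalization maps ch onto U+9F52."""
    return unicodedata.normalize("NFKC", ch) == UNIFIED_TOOTH


if __name__ == "__main__":
    print(describe(UNIFIED_TOOTH))
    for label, ch in TOOTH_FORMS.items():
        print(f"{label}: {describe(ch)} folds={folds_to_unified(ch)}")
```

Note that an author who encodes one of the radical code points is relying on the receiving software not to normalize the text with NFKC on the way through.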
Re: Transcriptions of Unicode
At 6:40 PM -0800 12/6/00, James Kass wrote:

> Consider the "teeth" ideograph(s). (Radical number 211, in some radical lists.) Because this is a radical, CJK encoders can select the specific desired character:
>
> U+2FD2 for Traditional Chinese
> U+2EED for Japanese
> U+2EEE for Simplified Chinese
>
> Since anyone encoding U+9F52 might see any of the above three versions, my opinion is that encoders (authors) would wish to explicitly encode their expected character and would do so whenever they have the option.

This doesn't reflect, however, the way people actually use these ideographs. By and large, the Japanese reader wants to see them drawn with the Japanese glyph, whether or not the originator was Chinese.

There are some cases where the specific glyph *does* matter, largely in personal names. (We had a mildly heated discussion this morning in the IRG meeting going on about how to show one particular glyph for precisely this reason.) By and large, however, it is recognized that the glyph differences do *not* affect meaning and should be up to the reader, not forced by the originator.

> I believe that they should have the option. The abundance of unassigned code points offered by additional Unicode planes makes this possible and would eliminate the need for a browser (or any other application) to "guess" a language in order to display material as its authors and users desire.

But then why not deunify the English and French alphabets? Or French and Polish accents? Or Fraktur and Italic and Roman styles of Latin?

--
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/
Re: displaying Unicode text (was re: Transcriptions of Unicode)
John H. Jenkins wrote:

> At 3:57 PM -0800 12/6/00, James Kass wrote:
>
> > A Universal Character Set should not require mark-up/tags.
>
> Au contraire, it's been implicit in the design of Unicode from the beginning that markup/tags would be required in certain situations.

Because of the 65536 character limitation? (Which no longer applies.)

> > If the Japanese version of a Chinese character looks different than the Chinese character, it *is* different. In many cases, "variant" does not mean "same".
>
> But as a rule, the Japanese and Chinese would disagree with you here. Certainly the IRG would disagree. Few in the west would argue over the fundamental unity of Fraktur and Roman variations of the Latin alphabet; most of the Chinese/Japanese variations are on that order or less.

As our Asian friends come on-line, they will hopefully contribute to the discussion in this regard. The reason I suspect that the Japanese would tend to agree is that Unicode had not been widely accepted by the Japanese user community.

Perhaps if Unicode originated elsewhere, we would have had to deal with Greek/Latin/Cyrillic unification? (And we could say that since the "W" is really a ligature of two "V"s, it shouldn't have an explicit encoding...)

> > When limited to BMP code points, CJK unification kind of made sense. In light of the new additional planes... The IRG seems to be doing a fine job.
>
> Here you've really lost me. The IRG is unifying in plane 2, as well. Nobody in the IRG has suggested that we abandon unification for plane 2.

I tried to respond to this in an earlier letter. We don't even have CJK unification in the BMP; witness the blocks U+8A00 to U+8B9F versus U+8BA0 to U+8C36. Many of the characters in the latter block are simplified versions of the former:

U+8A02/U+8BA2
U+8A03/U+8BA3
U+8A0C/U+8BA7
U+8A41/U+8BC2
etc.

Fraktur and roman are both adaptations of the Latin script, or stylistic variations just as italic and roman. The Japanese writing system is Japanese, but derived from Chinese. As you say, some of the differences are minimal, perhaps slight variation in stroke order, but other differences are substantial. In some cases, the Japanese version may use a variant of a certain radical component, or even a different radical.

I said I think the IRG is doing a fine job because it is such a monumental task, much progress is being made, and the results of their work seem to reflect the expectations of the various user communities involved.

Best regards,
James Kass.
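The pairs listed in the message can be checked mechanically. The sketch below only confirms what the message states: each pair consists of two separately encoded code points, the first falling in U+8A00..U+8B9F and the second in U+8BA0..U+8C36. The function and constant names are hypothetical, and the code makes no claim about how the IRG classifies these characters.

```python
# Pairs and ranges exactly as given in the message.
PAIRS = [(0x8A02, 0x8BA2), (0x8A03, 0x8BA3), (0x8A0C, 0x8BA7), (0x8A41, 0x8BC2)]
FIRST_RANGE = range(0x8A00, 0x8B9F + 1)
SECOND_RANGE = range(0x8BA0, 0x8C36 + 1)


def check_pairs(pairs):
    """Return one row per pair: both code points, both characters, and
    whether each member falls in the range the message assigns to it."""
    rows = []
    for first, second in pairs:
        in_ranges = first in FIRST_RANGE and second in SECOND_RANGE
        rows.append((f"U+{first:04X}", chr(first),
                     f"U+{second:04X}", chr(second), in_ranges))
    return rows


if __name__ == "__main__":
    for row in check_pairs(PAIRS):
        print(*row)
```

Printing the rows in a terminal with a CJK font shows the two forms side by side, which is the quickest way to see how similar or different the members of each pair look.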
Re: Transcriptions of Unicode
Hi Mark,

You're right, but I believe what Erik is saying is that you can get Japanese-looking characters to be *preferred* over Chinese-looking characters (where fonts drawn in both styles are available) by using a LANG attribute for a specific page or SPAN. This could increase the acceptance of using UTF-8 as a page encoding in Asia.

Best Regards,
Addison

===
Addison P. Phillips, Principal Consultant
Inter-Locale LLC, http://www.inter-locale.com
Los Gatos, CA, USA
mailto:[EMAIL PROTECTED]
+1 408.210.3569 (mobile)
+1 408.904.4762 (fax)
===
Globalization Engineering Consulting Services

On Sat, 2 Dec 2000, Mark Davis wrote:

> Won't Mozilla pick fonts based on character code? The only ones in the list that couldn't be deduced from that would be the Yiddish and the Chinese.
>
> Mark
>
> ----- Original Message -----
> From: "Erik van der Poel" [EMAIL PROTECTED]
> To: "Unicode List" [EMAIL PROTECTED]
> Cc: "Unicode List" [EMAIL PROTECTED]
> Sent: Friday, December 01, 2000 22:46
> Subject: Re: Transcriptions of "Unicode"
>
> > Cool. Now if you also add LANG attributes, Mozilla/Netscape 6 will use the fonts that have been set up for those languages. E.g.:
> >
> > <span lang="ja" title="Japanese">...</span>
> >
> > Erik
> >
> > Mark Davis wrote:
> >
> > > Done.
> > >
> > > From: "Michael (michka) Kaplan" [EMAIL PROTECTED]
> > >
> > > > I would suggest adding a <span title="{insert lang name}"></span>
> > > >
> > > > Mark Davis wrote:
> > > >
> > > > > http://www.macchiato.com/unicode/Unicode_transcriptions.html
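As an illustration of the markup being discussed in this thread, here is a small hypothetical helper that wraps a transcription in a SPAN carrying both a LANG attribute (for font selection) and a TITLE attribute (for the tooltip). The function name is invented for this sketch; only `html.escape` from the Python standard library is used.

```python
from html import escape


def lang_span(text, lang, title):
    """Wrap text in <span lang=... title=...>, escaping all three values."""
    return (f'<span lang="{escape(lang, quote=True)}" '
            f'title="{escape(title, quote=True)}">{escape(text)}</span>')


if __name__ == "__main__":
    # Same shape as the example quoted in the thread.
    print(lang_span("...", "ja", "Japanese"))
```

Generating the spans from a table of (text, language code, language name) rows keeps the LANG and TITLE values in step, which matters on a page listing many scripts.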
Re: Transcriptions of Unicode
I agree, that is the right thing to do. What wasn't clear from his message is whether Mozilla picks a reasonable font if the language is not there. Since NN didn't do this in the past, I was wondering whether that has been improved.

Mark

----- Original Message -----
From: [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: "Unicode List" [EMAIL PROTECTED]
Sent: Monday, December 04, 2000 10:40
Subject: Re: Transcriptions of "Unicode"

> Hi Mark,
>
> You're right, but I believe what Erik is saying is that you can get Japanese-looking characters to be *preferred* over Chinese-looking characters (where fonts drawn in both styles are available) by using a LANG attribute for a specific page or SPAN. This could increase the acceptance of using UTF-8 as a page encoding in Asia.
>
> Best Regards,
> Addison
>
> ===
> Addison P. Phillips, Principal Consultant
> Inter-Locale LLC, http://www.inter-locale.com
> Los Gatos, CA, USA
> mailto:[EMAIL PROTECTED]
> +1 408.210.3569 (mobile)
> +1 408.904.4762 (fax)
> ===
> Globalization Engineering Consulting Services
>
> On Sat, 2 Dec 2000, Mark Davis wrote:
>
> > Won't Mozilla pick fonts based on character code? The only ones in the list that couldn't be deduced from that would be the Yiddish and the Chinese.
> >
> > Mark
> >
> > ----- Original Message -----
> > From: "Erik van der Poel" [EMAIL PROTECTED]
> > To: "Unicode List" [EMAIL PROTECTED]
> > Cc: "Unicode List" [EMAIL PROTECTED]
> > Sent: Friday, December 01, 2000 22:46
> > Subject: Re: Transcriptions of "Unicode"
> >
> > > Cool. Now if you also add LANG attributes, Mozilla/Netscape 6 will use the fonts that have been set up for those languages. E.g.:
> > >
> > > <span lang="ja" title="Japanese">...</span>
> > >
> > > Erik
> > >
> > > Mark Davis wrote:
> > >
> > > > Done.
> > > >
> > > > From: "Michael (michka) Kaplan" [EMAIL PROTECTED]
> > > >
> > > > > I would suggest adding a <span title="{insert lang name}"></span>
> > > > >
> > > > > Mark Davis wrote:
> > > > >
> > > > > > http://www.macchiato.com/unicode/Unicode_transcriptions.html
Re: Transcriptions of Unicode
Mark Davis wrote:

> What wasn't clear from his message is whether Mozilla picks a reasonable font if the language is not there.

Sorry about the lack of clarity. When there is no LANG attribute in the element (or in a parent element), Mozilla uses the document's charset as a fallback. Mozilla has font preferences for each language group. The language groups have been set up to have a one-to-one correspondence with charsets (roughly). E.g. iso-8859-1 -> Western, shift_jis -> ja.

When the charset is a Unicode-based one (e.g. UTF-8), then Mozilla uses the language group that contains the user's locale's language. In other words, Mozilla does not (yet) use the Unicode character codes to select fonts. We may do this in the future.

Erik
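The fallback order described above can be sketched as a small decision function. This is an illustration of the described behaviour, not Mozilla's actual code: the function name, the table contents beyond the two mappings given in the message, the set of Unicode charsets, and the choice to fall back to the locale group for an unknown charset are all assumptions.

```python
# Only the two mappings from the message; a real table would be larger.
CHARSET_TO_GROUP = {"iso-8859-1": "Western", "shift_jis": "ja"}
UNICODE_CHARSETS = {"utf-8", "utf-16"}  # assumed set of Unicode-based charsets


def pick_language_group(lang_attr, charset, locale_group):
    """Choose the language group whose font preferences apply.

    1. A LANG attribute on the element (or a parent) wins.
    2. Otherwise the document charset decides, if it implies a group.
    3. For Unicode-based charsets, use the group of the user's locale.
    """
    if lang_attr:
        return lang_attr  # simplification: treat the language as its own group
    normalized = (charset or "").lower()
    if normalized in UNICODE_CHARSETS:
        return locale_group
    # Assumption: an unrecognized charset also falls back to the locale.
    return CHARSET_TO_GROUP.get(normalized, locale_group)
```

Under this scheme a UTF-8 page of Japanese text with no LANG attribute, viewed by a user with a Western locale, gets the Western font preferences, which is exactly the situation the thread is worried about.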
Re: Transcriptions of Unicode
FWIW, IE does not do an absolutely stellar job here, either. Not all Unicode subranges have fonts automatically assigned, yet if you bring up the font dialog, it is smart enough to list the fonts which cover the subrange.

Although there was no "lame" button when I pulled up the dialog, selected Ethiopic, and saw two fonts listed with neither selected by IE, there SHOULD have been one. Because it was awfully lame. Smart enough to know a font is needed, smart enough to list the ones that would work, but too stupid to just select one? :-(

I am hoping they address this in IE 6.0. No one should ever need this dialog unless they want to override choices. :-)

michka

a new book on internationalization in VB at http://www.i18nWithVB.com/

----- Original Message -----
From: "Erik van der Poel" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: "Unicode List" [EMAIL PROTECTED]
Sent: Monday, December 04, 2000 10:08 PM
Subject: Re: Transcriptions of "Unicode"

> Mark Davis wrote:
>
> > What wasn't clear from his message is whether Mozilla picks a reasonable font if the language is not there.
>
> Sorry about the lack of clarity. When there is no LANG attribute in the element (or in a parent element), Mozilla uses the document's charset as a fallback. Mozilla has font preferences for each language group. The language groups have been set up to have a one-to-one correspondence with charsets (roughly). E.g. iso-8859-1 -> Western, shift_jis -> ja.
>
> When the charset is a Unicode-based one (e.g. UTF-8), then Mozilla uses the language group that contains the user's locale's language. In other words, Mozilla does not (yet) use the Unicode character codes to select fonts. We may do this in the future.
>
> Erik
Re: Transcriptions of Unicode
Won't Mozilla pick fonts based on character code? The only ones in the list that couldn't be deduced from that would be the Yiddish and the Chinese.

Mark

----- Original Message -----
From: "Erik van der Poel" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, December 01, 2000 22:46
Subject: Re: Transcriptions of "Unicode"

> Cool. Now if you also add LANG attributes, Mozilla/Netscape 6 will use the fonts that have been set up for those languages. E.g.:
>
> <span lang="ja" title="Japanese">...</span>
>
> Erik
>
> Mark Davis wrote:
>
> > Done.
> >
> > From: "Michael (michka) Kaplan" [EMAIL PROTECTED]
> >
> > > I would suggest adding a <span title="{insert lang name}"></span>
> > >
> > > Mark Davis wrote:
> > >
> > > > http://www.macchiato.com/unicode/Unicode_transcriptions.html
Re: Transcriptions of Unicode
By the way, Erik, I got NN6 to run, but it does some weird things with pages. Take a look at my homepage http://www.macchiato.com/ on NN6, compared to NN4.7 or IE5.5. Also, my JavaScript converter doesn't work on it, whereas it does on NN4.7 and IE5.5.

Mark

----- Original Message -----
From: "Erik van der Poel" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Cc: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, December 01, 2000 22:46
Subject: Re: Transcriptions of "Unicode"

> Cool. Now if you also add LANG attributes, Mozilla/Netscape 6 will use the fonts that have been set up for those languages. E.g.:
>
> <span lang="ja" title="Japanese">...</span>
>
> Erik
>
> Mark Davis wrote:
>
> > Done.
> >
> > From: "Michael (michka) Kaplan" [EMAIL PROTECTED]
> >
> > > I would suggest adding a <span title="{insert lang name}"></span>
> > >
> > > Mark Davis wrote:
> > >
> > > > http://www.macchiato.com/unicode/Unicode_transcriptions.html
Re: Transcriptions of Unicode
Sad to report, my browser (Netscape 4.7) shows the Yiddish as Daw-key-nu-ye. (It's left to right, not rtl...) I am using the Monotype Andale Duospace font.

tex

Mark Davis wrote:

> I am interested in collecting transcriptions of the word "Unicode" in different scripts (and languages). If you are fluent in a language other than Unicode, I'd appreciate any suggestions. What I have so far is at:
>
> http://www.macchiato.com/unicode/Unicode_transcriptions.html
>
> Mark
> ___
> Mark Davis, IBM Center for Java Technology, Cupertino
> (408) 777-5850 [fax: 5891], [EMAIL PROTECTED], [EMAIL PROTECTED]
> http://maps.yahoo.com/py/maps.py?Pyt=Tmapaddr=10275+N.+De+Anzacsz=95014

--
Tex Texin, Director, International Business
mailto:[EMAIL PROTECTED] +1-781-280-4271 Fax: +1-781-280-4655
Progress Software Corp., 14 Oak Park, Bedford, MA 01730
http://www.Progress.com #1 Embedded Database
http://www.SonicMQ.com #1 Performing JMS Messaging
http://www.ASPconnections.com #1 provider in the ASP marketplace
http://www.NuSphere.com Open Source software and services for MySQL
Globalization Program http://www.Progress.com/partners/globalization.htm
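The symptom reported here (right-to-left text laid out left to right) is what one would expect from a renderer that ignores bidirectional character properties; that diagnosis is an inference, not something stated in the thread. The sketch below, with a hypothetical helper name, uses Python's standard `unicodedata.bidirectional` to show how software can tell that a string contains right-to-left characters and therefore needs bidi-aware layout.

```python
import unicodedata

RTL_CLASSES = ("R", "AL")  # strong right-to-left bidi classes


def has_rtl(text):
    """True if any character has a strong right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


if __name__ == "__main__":
    hebrew_letters = "\u05d0\u05d1"  # alef, bet: used by Hebrew and Yiddish
    for sample in (hebrew_letters, "Unicode"):
        classes = [unicodedata.bidirectional(ch) for ch in sample]
        print(repr(sample), classes, has_rtl(sample))
```

A renderer that skips this check draws the characters in storage order from the left, so a right-to-left reader sees the word backwards.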
Re: Transcriptions of Unicode
Done.

----- Original Message -----
From: "Michael (michka) Kaplan" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, December 01, 2000 15:19
Subject: Re: Transcriptions of "Unicode"

> IE 5.0, 5.5, NN 6.0, and the latest build of Mozilla all do the right thing with the word. So that would be the fault of your browser choice. :-)
>
> I would suggest adding a <span title="{insert lang name}"></span> around each lang name, as it will cause IE to show the language name in a tooltip when you hover the mouse. The slight delay lets people guess the languages and then see if their guesses were right. Always a nice effect...
>
> michka
>
> a new book on internationalization in VB at http://www.i18nWithVB.com/
>
> ----- Original Message -----
> From: "Tex Texin" [EMAIL PROTECTED]
> To: "Unicode List" [EMAIL PROTECTED]
> Cc: "Unicode List" [EMAIL PROTECTED]
> Sent: Friday, December 01, 2000 2:30 PM
> Subject: Re: Transcriptions of "Unicode"
>
> > Sad to report, my browser (Netscape 4.7) shows the Yiddish as Daw-key-nu-ye. (It's left to right, not rtl...) I am using the Monotype Andale Duospace font.
> >
> > tex
> >
> > Mark Davis wrote:
> >
> > > I am interested in collecting transcriptions of the word "Unicode" in different scripts (and languages). If you are fluent in a language other than Unicode, I'd appreciate any suggestions. What I have so far is at:
> > >
> > > http://www.macchiato.com/unicode/Unicode_transcriptions.html
> > >
> > > Mark
> > > ___
> > > Mark Davis, IBM Center for Java Technology, Cupertino
> > > (408) 777-5850 [fax: 5891], [EMAIL PROTECTED], [EMAIL PROTECTED]
> > > http://maps.yahoo.com/py/maps.py?Pyt=Tmapaddr=10275+N.+De+Anzacsz=95014
> >
> > --
> > Tex Texin, Director, International Business
> > mailto:[EMAIL PROTECTED] +1-781-280-4271 Fax: +1-781-280-4655
> > Progress Software Corp., 14 Oak Park, Bedford, MA 01730
> > http://www.Progress.com #1 Embedded Database
> > http://www.SonicMQ.com #1 Performing JMS Messaging
> > http://www.ASPconnections.com #1 provider in the ASP marketplace
> > http://www.NuSphere.com Open Source software and services for MySQL
> > Globalization Program http://www.Progress.com/partners/globalization.htm