Re: Transcriptions of Unicode
Hallo everybody!

I don't fully agree with Mark Davis' IPA transcription of "Unicode":

http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_IPA.gif

Because:

1) I think that IPA transcriptions should be in [square brackets], while phonemic transcriptions should be in /slashes/. If neither enclosure is present, the transcription is ambiguous.

2) AFAIK, the phoneme [o:] (a long version of "o" in "got") does not exist in any standard pronunciation of contemporary English. It should rather be the diphthong [ou] (where the [u] would probably better be U+028A).

3) The transcription shows the primary stress on the first syllable, and a secondary stress on the last one. On the few occasions when I heard native English speakers saying "Unicode", I had the impression that it was rather the other way round.

4) As "Unicode" is the proper name of an international standard, and it is built with two English roots of French origin, it could as well be considered a French word, which would lead to a totally different transcription.

Sorry if I am repeating something already said by other people: I have been off the list for a while. And, about points 2 and 3 above, beware that I am a second-language English speaker and that I don't have much experience of American pronunciation.

Ciao.
Marco Cimarosti
Re: Transcriptions of Unicode
Marco Cimarosti wrote:

> I don't fully agree with Mark Davis' IPA transcription of "Unicode":
> http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_IPA.gif

Neither do I, but partly for different reasons.

> 1) I think that IPA transcriptions should be in [square brackets], while phonemic transcriptions should be in /slashes/. If neither enclosing is present, the transcription is ambiguous.

Right. And that's actually part of the key to the problem's answer:

> 2) AFAIK, the phoneme [o:] (a long version of "o" in "got") does not exist in any standard pronunciation of contemporary English. It should rather be the diphthong [ou] (where the [u] would probably better be U+028A).

In America, transcribing the vowel in "code" as /o/ (and "made" as /e/) is not uncommon, at least in *phonemic* transcription. Generally, American accents have less diphthongization in these sounds than British accents have, and phonemically it makes sense to see these sounds as part of the series of "long vowels". A *narrow phonetic* transcription would have something like [U+006F U+028A] for American, and [U+0259 U+028A] for British.

> 3) The transcription shows the primary stress on the first syllable, and a secondary stress on the last one. In the few occasions when I heard native English speakers saying "Unicode", I had the impression that it rather was the other way round.

I can't tell, because where I live I don't get to talk to native speakers about Unicode a lot. But: according to standard word-formation and pronunciation patterns in English, the stress pattern shown ('uni,code) is absolutely what you'd expect, as in "uniform", "unisex", "unicorn", "universe". (D. Jones, English Pronouncing Dictionary, doesn't even mark a secondary stress on the third syllable at all.)

> 4) As "Unicode" is the proper name of an international standard, and it is built with two English roots of French origin, it could as well be considered a French word, which would lead to a totally different transcription.

Right, but this particular pattern of merging word roots into a new word does suggest English provenance, I think. And, historically, that's where it did come from.

But there's another inconsistency in the transcription: the vowels in the first ("u-") and third ("-code") syllable are both phonemically long. Either you put the length mark on both (recommended for *phonetic* transcription), or on neither (okay with *phonemic* transcription). (Of course, if you transcribe the third syllable as a diphthong then you won't get a length mark there.)

According to the conventions in D. Jones, English Pronouncing Dictionary, you'd get something like:

[U+02C8 U+006A U+0075 U+02D0 U+006E U+026A U+006B U+0259 U+028A U+0064]

Lukas

--
Lukas Pietsch
University of Freiburg
English Department
Phone (p.) (#49) (761) 696 37 23
mailto:[EMAIL PROTECTED]
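For readers who want to see the characters rather than the code points, the sequence at the end of the message above can be decoded mechanically. The following is only a minimal Python sketch: the helper name `decode_code_points` is made up for illustration, and it assumes each token has the form "U+XXXX" (or "u+XXXX") separated by whitespace.

```python
import unicodedata


def decode_code_points(spec: str) -> str:
    """Turn a whitespace-separated list of 'U+XXXX' tokens into a string.

    Hypothetical helper: it simply drops the two-character 'U+' / 'u+'
    prefix of each token and reads the rest as a hexadecimal code point.
    """
    return "".join(chr(int(token[2:], 16)) for token in spec.split())


# The Jones-style sequence quoted in the message, copied as code points.
JONES_STYLE = (
    "U+02C8 U+006A U+0075 U+02D0 U+006E "
    "U+026A U+006B U+0259 U+028A U+0064"
)

transcription = decode_code_points(JONES_STYLE)

# Square brackets mark the result as a phonetic transcription.
print(f"[{transcription}]")

# List each character with its code point and Unicode character name.
for ch in transcription:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

Run as-is, the first line printed is [ˈjuːnɪkəʊd], followed by one line per character.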
Re: Transcriptions of Unicode
Much as I admire and appreciate the French language (second only to Italian), the proximate derivation of "Unicode" was not from that language, and the transcription should not match the French pronunciation. Instead, it has solid Northern Californian roots (even though not exactly dating from the Gold Rush days).

According to the references I have, the prefix "uni" is directly from Latin while the word "code" is through French. The Indo-European would have been *oi-no-kau-do ("give one strike"): *kau apparently being related to such English words as: hew, haggle, hoe, hag, hay, hack, caudad, caudal, caudate, caudex, coda, codex, codicil, coward, incus, and Kovač (personal name: 'smith'). I will leave the exact derivations to the exegetes, but I like the association with "haggle" myself.

I will ask our resident phonetician about the IPA transcription. Clearly Standard British English would add some interesting -- and no doubt valuable -- complexities and nuances to the vowels, but that is not the goal in this case. Even though "o" is often a diphthong in English, it is probably better to have [o:] as a target for matching from other languages, since [ou] may be considered slightly affected in the native language.

The stress is definitely on the first syllable. One does hear some normal generative English variations such as ˈjunəˌkoːd (schwa instead of short i), but the stress still should be on the first syllable, as in "unify", not later in the word as in "unique". Of course, the best approximation in the target language should be used: if it does not allow for that position for the stress (without affectation), then the secondary stress should be used.

Mark

----- Original Message -----
From: "Marco Cimarosti" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, January 12, 2001 03:11
Subject: Re: Transcriptions of "Unicode"

> Hallo everybody!
[snip]
Re: Transcriptions of Unicode: Still Missing scripts
On Thu, 11 Jan 2001, Mark Davis wrote:

> By the way, I am still missing the following. If anyone can supply them, I'd appreciate it.
> [BOPOMOFO]
[snip]
> [MONGOLIAN]
[snip]
> See http://www.macchiato.com/unicode/Unicode_transcriptions.html for details.

It's still not very clear to me what this is supposed to be a list of. The title says "Transcriptions of Unicode", and a note at the bottom says "For non-Latin scripts the goal is to match the English pronunciation -- not spelling."

Some of the entries (leftmost column of the table) are names of languages, while others are names of scripts. E.g., "Russian" and "Japanese" are names of languages, with examples given in Cyrillic and Katakana, respectively. For some scripts, there is basically only one language that uses it, such as Katakana (used by Japanese) or Hangul (used by Korean), while other scripts are used by many languages. Is this supposed to suggest that Russian is the representative language to give a Cyrillic example in, and, say, not Mongolian?

In some cases, it seems the example is not necessarily a transcription of the English pronunciation, but a translation into another language, most likely a loanword, with attendant sound changes, e.g., Japanese "yunikoodo". I notice the lack of a request for an example using the Hiragana script (which is also used by Japanese), which suggests that the Japanese example is not a transcription of the English pronunciation into Katakana, but a Japanese word (albeit a loanword). Otherwise, it would be possible to provide a Hiragana example, however nonsensical or nonexistent it may be in reality.

There is also the particular case of the Chinese entries, written in CJK "ideographs", which *are* translations using the calque strategy.

It seems to me that this list is intended to showcase a variety of ways to write "Unicode", be they transcriptions, transliterations, or translations -- whatever maximizes the number of scripts that one can show off, apparently.

This raises some questions of what an example showcasing the Bopomofo script should look like. Basically, it is used only for Chinese, primarily Mandarin (zh-guoyu). It is also primarily an auxiliary script for ruby annotation of Chinese text written in CJK "ideographs", although it may stand alone.

So, if it is a transcription of English pronunciation, then it will have to go through the language filter of Mandarin Chinese, and this form may or may not be attested in reality -- perhaps as a "best-fit" colloquial attempt to say a foreign (English) word. And this version would have the script standing alone.

Alternatively, it could be a transcription according to Mandarin Chinese pronunciation of the already existing Chinese translations written in CJK "ideographs". In this case, it could either stand alone, or be attached as ruby annotation to the CJK "ideograph" version (in Chinese).

Implementation-wise, the Bopomofo would be hard to make out at the size it would have as ruby annotation of text in a 96x24 bitmap (as requested on the page). Also, Bopomofo does have an inclination to be used with Chinese text written top-to-bottom, so the horizontal shape of the 96x24 bitmap is problematic. More generally, vertically written scripts such as the traditional Mongolian script (also requested) cannot be demonstrated within this framework.

Thomas Chan
[EMAIL PROTECTED]
RE: Transcriptions of Unicode
Peter Constable wrote:

> I'd add the square brackets, an off-glide on the "o", and aspiration (U+02B0) after the "k".

Is that k aspirated? I do hear an aspiration when [p], [t] or [k] are at the *beginning* of "words" (mainly because teachers told me I was supposed to notice it), but I don't feel it *inside* a word.

> One other point:

Yes? :-)

Marco
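The three changes Peter proposes (square brackets, an off-glide on the "o", aspiration after the "k") can be sketched as plain string edits. This is only an illustration in Python: the starting string `base` is an assumption reconstructed from the discussion (primary stress first, secondary stress last, long [o:]), not a copy of the image on Mark's page, and the function name `narrow` is made up.

```python
ASPIRATION = "\u02b0"  # MODIFIER LETTER SMALL H
OFF_GLIDE = "\u028a"   # LATIN SMALL LETTER UPSILON
LENGTH = "\u02d0"      # MODIFIER LETTER TRIANGULAR COLON


def narrow(broad: str) -> str:
    """Apply the three suggested changes to a broad transcription.

    Hypothetical helper: it aspirates every 'k', which is good enough
    for this one word but is not a general rule of English phonetics.
    """
    out = broad.replace("k", "k" + ASPIRATION)        # aspiration after "k"
    out = out.replace("o" + LENGTH, "o" + OFF_GLIDE)  # off-glide instead of length
    return f"[{out}]"                                 # phonetic brackets


# Assumed starting point: primary stress, "junɪ", secondary stress, "koːd".
base = "\u02c8jun\u026a\u02ccko\u02d0d"

print(narrow(base))
```

With the assumed starting string this prints [ˈjunɪˌkʰoʊd]. Whether the aspiration mark belongs there at all is exactly the question raised in the message above.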
Re: Transcriptions of Unicode
On Fri, 12 Jan 2001, Lukas Pietsch wrote:

> Marco Cimarosti wrote:
>
> > 3) The transcription shows the primary stress on the first syllable, and a secondary stress on the last one. In the few occasions when I heard native English speakers saying "Unicode", I had the impression that it rather was the other way round.
>
> I can't tell, because where I live I don't get to talk to native speakers about Unicode a lot. But: According to standard word-formation and

There is a "Unicode, Oh Unicode" anthem/hymn: the sound files are located in the /Other/Sounds/ directory on the CD-ROM published with the book, and there is also an audio track on the same disc. Whether this can be taken as an official stance on the pronunciation of the term (the WhatIsThis.txt explanatory file does not provide any clues), well, I do not know...

Thomas Chan
[EMAIL PROTECTED]
Unicode before Unicode
I didn't expect 'Unicode' to be in OED II (1989), but it is. OED II cites a few examples of 'Unicode' used in the late 19th century (including the title of a book: 'Unicode: The Universal Telegraphic Phrase-Book') and gives the following meaning for the word:

A telegraphic code in which one word or set of letters represents a sentence or phrase; a telegram or message in this.

Apparently, the word was coined in Britain (so the 'old Unicode' does not have Northern Californian origins, while the new one does :-) ).

Maybe it's been known to some, but I thought it might be new to other people like me. Just out of curiosity, I'm wondering if the book mentioned above was used in the US as well as in Britain.

Jungshik Shin
RE: Transcriptions of Unicode
On 01/12/2001 10:33:48 AM Marco Cimarosti wrote:

> Is that k aspirated?

It is for any English speakers I've ever met.

> > One other point:
>
> Yes? :-)

Oops. It was to be the point about the aspirated k. I forgot to delete that.

Peter
Re: Representation of aspiration (was: Re: Transcriptions of Unicode)
Kenneth Whistler wrote:

> Richard Cook surmised:
>
> > BTW, in a very close transcription, if one is using superscription (position above baseline) and relative size reduction to indicate aspiration, I suppose that degree of superscription or the size or both could be modulated to indicate degree of aspiration?
>
> Nah, if you tried to go down that path, you'd just end up with unrepresentable transcriptions and unreliable reproduction. I doubt that there are many transcribers who could reliably record more than three degrees of aspiration, anyway (roughly: slight aspiration, "normal" aspiration, and superaspiration).

Ken, I was only kidding ... mostly, should have put a smiley in there :-) But I was also thinking of the superscription question, which I think Peter C. might like to discuss.

> Once you go past that level, which could be reliably indicated with appropriate use of diacritics, you are really into the realm of instrumental phonetics. I'd just hook up the machine and let it give you precise timings of voice delays post consonantal release in milliseconds.
>
> > Or perhaps just mark up the unsuperscripted aspiration indicator, to note degree of aspiration ... however you would like to measure that.
>
> No need to "mark it up". Just add another diacritic. That's how most transcribers would work, in practice.

Well, I was thinking of linking the transcription to the machine data ... so that the relation would be set on a compound key (aspiration diacritic measurement reference) ...
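One way to picture the compound-key idea at the end of the message above is a lookup table from "which transcription, which character" to an instrumental measurement. The Python sketch below is purely illustrative: the names `DiacriticRef` and `classify`, the identifier "unicode-sample-1", the 65.0 ms value and both thresholds are all invented placeholders, not measured data or anything proposed in the thread. Only the three labels follow Ken's rough scale (slight, "normal", superaspiration).

```python
from dataclasses import dataclass

ASPIRATION = "\u02b0"  # MODIFIER LETTER SMALL H


@dataclass(frozen=True)
class DiacriticRef:
    """Compound key: a transcription plus the position of one diacritic."""
    transcription_id: str  # hypothetical identifier for a recorded token
    char_index: int        # index of the aspiration mark in the string


def classify(vot_ms: float,
             slight_below: float = 30.0,
             super_above: float = 100.0) -> str:
    """Map a voicing delay in milliseconds onto three rough degrees.

    Both thresholds are arbitrary placeholders chosen for the example.
    """
    if vot_ms < slight_below:
        return "slight"
    if vot_ms > super_above:
        return "super"
    return "normal"


# Assumed narrow transcription, brackets omitted so indices start at 0.
transcription = "\u02c8jun\u026a\u02cck\u02b0o\u028ad"
key = DiacriticRef("unicode-sample-1", transcription.index(ASPIRATION))

# Invented measurement, standing in for real machine output.
measurements = {key: 65.0}

for ref, vot in measurements.items():
    print(ref.transcription_id, ref.char_index, f"{vot} ms", classify(vot))
```

The frozen dataclass is hashable, so it can serve directly as a dictionary key; the transcription itself stays a plain string with a single diacritic, as Ken recommends, while the degree of aspiration lives in the linked data.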
Re: Transcriptions of Unicode
Thanks for your detailed note; I'll have to think it over.

...

> But there's another inconsistency in the transcription: the vowels in the first ("u-") and third ("-code") syllable are both phonemically long. Either you put the length mark on both (recommended for *phonetic* transcription), or on neither (okay with *phonemic* transcription). (Of

The o is significantly longer than the u, probably due to the following d.

...

> --
> Lukas Pietsch
> University of Freiburg
> English Department
> Phone (p.) (#49) (761) 696 37 23
> mailto:[EMAIL PROTECTED]