Re: Transcriptions of Unicode

2001-01-12 Thread Marco Cimarosti

Hallo everybody!

I don't fully agree with Mark Davis' IPA transcription of "Unicode":

http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_IPA.gif

Because:

1) I think that IPA transcriptions should be in [square brackets], while
phonemic transcriptions should be in /slashes/. If neither enclosing is
present, the transcription is ambiguous.

2) AFAIK, the phoneme [o:] (a long version of "o" in "got") does not exist
in any standard pronunciation of contemporary English. It should rather be
the diphthong [ou] (where the [u] would probably better be U+028A).

3) The transcription shows the primary stress on the first syllable, and a
secondary stress on the last one. In the few occasions when I heard native
English speakers saying "Unicode", I had the impression that it rather was
the other way round.

4) As "Unicode" is the proper name of an international standard, and it is
built with two English roots of French origin, it could as well be
considered a French word, which would lead to a totally different
transcription.

Sorry if I am repeating something already said by other people: I have been
off the list for a while. And, about points 2 and 3 above, beware that I am
a second language English speaker and that I don't have much experience of
American pronunciation.

Ciao.
Marco Cimarosti



Re: Transcriptions of Unicode

2001-01-12 Thread Lukas Pietsch

Marco Cimarosti wrote:

 I don't fully agree with Mark Davis' IPA transcription of "Unicode":

 http://my.ispchannel.com/~markdavis//unicode/Unicode_transcription_images/U_IPA.gif

Neither do I, but partly for different reasons.


 1) I think that IPA transcriptions should be in [square brackets], while
 phonemic transcriptions should be in /slashes/. If neither enclosing is
 present, the transcription is ambiguous.

Right. And that's actually part of the key to the problem's answer:

 2) AFAIK, the phoneme [o:] (a long version of "o" in "got") does not exist
 in any standard pronunciation of contemporary English. It should rather be
 the diphthong [ou] (where the [u] would probably better be U+028A).

In America, transcribing the vowel in "code" as /o/ (and "made" as /e/) is
not uncommon, at least in *phonemic* transcription. Generally, American
accents have less diphthongization in these sounds than British accents
have, and phonemically it makes sense to see these sounds as part of the
series of "long vowels". A *narrow phonetic* transcription would have
something like [u+006F u+028A] for American, and [u+0259 u+028A] for
British.

 3) The transcription shows the primary stress on the first syllable, and a
 secondary stress on the last one. In the few occasions when I heard native
 English speakers saying "Unicode", I had the impression that it rather was
 the other way round.

I can't tell, because where I live I don't get to talk to native speakers
about Unicode a lot. But: According to standard word-formation and
pronunciation patterns in English, the stress pattern shown ('uni,code) is
absolutely what you'd expect: as in "uniform", "unisex", "unicorn",
"universe". (D. Jones, English Pronouncing Dictionary, doesn't even mark a
secondary stress on the third syllable at all.)

 4) As "Unicode" is the proper name of an international standard, and it is
 built with two English roots of French origin, it could as well be
 considered a French word, which would lead to a totally different
 transcription.

Right, but this particular pattern of merging word roots into a new word
does suggest English provenance, I think. And, historically, that's where
it did come from.

But there's another inconsistency in the transcription: the vowels in the
first ("u-") and third ("-code") syllable are both phonemically long.
Either you put the length mark on both (recommended for *phonetic*
transcription), or on neither (okay with *phonemic* transcription). (Of
course, if you transcribe the third syllable as a diphthong then you won't
get a length mark there.)

According to the conventions in D. Jones, English Pronouncing Dictionary,
you'd get something like:

[u+02C8 u+006A u+0075 u+02D0 u+006E u+026A  u+006B u+0259 u+028A u+0064]
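
For readers who see only the u+XXXX notation, the sequence above can be decoded mechanically. What follows is a minimal Python sketch, not part of the original message. The function name decode_code_points is made up for illustration, and the sketch assumes every token has the form u+XXXX with hexadecimal digits.

```python
import unicodedata


def decode_code_points(notation: str) -> str:
    """Turn a string such as "[u+0259 u+028A]" into the characters it names."""
    # Drop the enclosing brackets or slashes, then split on any whitespace.
    tokens = notation.strip("[]/ ").split()
    # Each token is assumed to look like "u+02C8": skip "u+" and parse the hex.
    return "".join(chr(int(token[2:], 16)) for token in tokens)


jones_style = (
    "[u+02C8 u+006A u+0075 u+02D0 u+006E u+026A  u+006B u+0259 u+028A u+0064]"
)
decoded = decode_code_points(jones_style)

# Show the transcription, then one line per character with its Unicode name.
print("[" + decoded + "]")
for ch in decoded:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

The same helper works for the shorter sequences quoted earlier, for example [u+006F u+028A] and [u+0259 u+028A].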

Lukas


-
Lukas Pietsch
University of Freiburg
English Department

Phone (p.) (#49) (761) 696 37 23
mailto:[EMAIL PROTECTED]




Re: Transcriptions of Unicode

2001-01-12 Thread Mark Davis



Much as I admire and appreciate the French language (second only to Italian),
the proximate derivation of "Unicode" was not from that language, and the
transcription should not match the French pronunciation. Instead, it has solid
Northern Californian roots (even though not exactly dating from the Gold Rush
days).

According to the references I have, the prefix "uni" is directly from Latin
while the word "code" is through French. The Indo-European would have been
*oi-no-kau-do ("give one strike"): *kau apparently being related to such
English words as: hew, haggle, hoe, hag, hay, hack, caudad, caudal, caudate,
caudex, coda, codex, codicil, coward, incus, and Kovač (personal name:
'smith'). I will leave the exact derivations to the exegetes, but I like the
association with "haggle" myself.

I will ask our resident phonetician about the IPA transcription. Clearly
Standard British English would add some interesting -- and no doubt valuable --
complexities and nuances to the vowels, but that is not the goal in this case.
Even though "o" is often a diphthong in English, it is probably better to have
[o:] as a target for matching from other languages, since [ou] may be
considered slightly affected in the native language.

The stress is definitely on the first syllable. One does hear some normal
generative English variations such as ˈjunəˌkoːd (schwa instead of short-i),
but the stress still should be on the first syllable, as in "unify", not later
in the word as in "unique". Of course, the best approximation in the target
language should be used: if it does not allow for that position for the stress
(without affectation), then the secondary stress should be used.

Mark

- Original Message -
From: "Marco Cimarosti" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, January 12, 2001 03:11
Subject: Re: Transcriptions of "Unicode"


Re: Transcriptions of Unicode: Still Missing scripts

2001-01-12 Thread Thomas Chan

On Thu, 11 Jan 2001, Mark Davis wrote:

 By the way, I am still missing the following. If anyone can supply them, I'd
 appreciate it.
 
 [BOPOMOFO]
[snip]
[MONGOLIAN]
[snip]
 See http://www.macchiato.com/unicode/Unicode_transcriptions.html for
 details.

It's still not very clear to me what this is supposed to be a list of.
The title says "Transcriptions of Unicode", and a note at the bottom says
"For non-Latin scripts the goal is to match the English pronunciation --
not spelling."

Some of the entries (leftmost column of the table) are names of languages,
while others are names of scripts.  e.g., "Russian" and "Japanese" are
names of languages, with examples given in Cyrillic and Katakana,
respectively.  For some scripts, there is basically only one language that
uses it, such as Katakana (used by Japanese) or Hangul (used by Korean),
while other scripts are used by many languages.  Is this supposed to
suggest that Russian is the representative language to give a Cyrillic
example in, and not, say, Mongolian?

In some cases, it seems the example is not necessarily a transcription of
the English pronunciation, but a translation into another language,
most likely a loanword, with attendant sound changes.  e.g., Japanese
"yunikoodo".  I notice the lack of a request for an example using the
Hiragana script (which is also used by Japanese), which suggests that the
Japanese example is not a transcription of the English pronunciation into
Katakana, but a Japanese word (albeit a loanword).  Otherwise, it would be
possible to provide a Hiragana example, however nonsensical or non-existent
it may be in reality.  There is also the particular case of the Chinese
entries, written in CJK "ideographs", which *are* translations using the
calque strategy.

It seems to me that this list is intended to showcase a variety of ways to
write "Unicode", be they transcriptions, transliterations, or
translations--whatever maximizes the number of scripts that one can show
off, apparently.

This raises some questions of what an example showcasing the Bopomofo
script should look like.  Basically, it is used only for Chinese,
primarily Mandarin (zh-guoyu).  It is also primarily an auxiliary script
for ruby annotation of Chinese text written in CJK "ideographs", although
it may stand alone.  So, if it is a transcription of English
pronunciation, then it will have to go through the language filter of
Mandarin Chinese, and this form may or may not be attested in 
reality--perhaps as a "best-fit" colloquial attempt to say a foreign
(English) word.  And this version would have the script standing alone.

Alternatively, it could be a transcription according to Mandarin Chinese
pronunciation of the already existing Chinese translations written in CJK
"ideographs".  In this case, it could either stand alone, or be attached
as ruby annotation to the CJK "ideograph" version (in Chinese).
Implementation-wise, it would be problematic seeing the Bopomofo at the
size it would be in for ruby annotation of text in a 96x24 bitmap (as
requested on the page).  Also, Bopomofo does have an inclination to be used
with Chinese text written top-to-bottom, so the horizontal shape of the
96x24 bitmap is problematic--more generally, vertically written scripts
such as the traditional Mongolian script (also requested) cannot be
demonstrated within this framework.


Thomas Chan
[EMAIL PROTECTED]






RE: Transcriptions of Unicode

2001-01-12 Thread Marco Cimarosti

Peter Constable wrote:
 I'd add the square brackets, an off-glide on the "o", and
 aspiration (02b0) after the "k".

Is that k aspirated? I do hear an aspiration when [p], [t] or [k] are at the
*beginning* of "words" (mainly because teachers told me I was supposed to
notice it), but I don't feel it *inside* a word.

 One other point:

Yes? :-)

Marco



Re: Transcriptions of Unicode

2001-01-12 Thread Thomas Chan

On Fri, 12 Jan 2001, Lukas Pietsch wrote:

 Marco Cimarosti wrote:
  3) The transcription shows the primary stress on the first syllable, and a
  secondary stress on the last one. In the few occasions when I heard native
  English speakers saying "Unicode", I had the impression that it rather was
  the other way round.
 
 I can't tell, because where I live I don't get to talk to native speakers
 about Unicode a lot. But: According to standard word-formation and

There is a "Unicode, Oh Unicode" anthem/hymn: sound files are located in the
/Other/Sounds/ directory on the CD-ROM published with the book, and there is
an audio track on the same disc.  If this can be taken as an official
stance on pronunciation of the term (the WhatIsThis.txt explanatory file
does not provide any clues), well, I do not know...


Thomas Chan
[EMAIL PROTECTED]




Unicode before Unicode

2001-01-12 Thread Jungshik Shin


I didn't expect 'Unicode' to be in OED II (1989), but it is.  OED II cites
a few examples (including the title of a book: 'Unicode: The Universal
Telegraphic Phrase-Book') of 'Unicode' used in the late 19th century
and gives the following meaning to the word:

  A telegraphic code in which one word or set of letters represents a
sentence or phrase; a telegram or message in this.

Apparently, the word was coined in Britain (so the 'old Unicode'
does not have a Northern Californian origin :-) while the new one
does).

Maybe it's been known to some, but I thought this might be new to some other
people like me.  Just out of curiosity, I'm wondering if the book
mentioned above was used in the US as well as in Britain.

Jungshik Shin




RE: Transcriptions of Unicode

2001-01-12 Thread Peter_Constable


On 01/12/2001 10:33:48 AM Marco Cimarosti wrote:

Is that k aspirated?

It is for any English speakers I've ever met.


 One other point:

Yes? :-)

Oops. It was to be the point about the aspirated k. I forgot to delete
that.


Peter




Re: Representation of aspiration (was: Re: Transcriptions of Unicode)

2001-01-12 Thread Richard Cook

Kenneth Whistler wrote:
 
 Richard Cook surmised:
 
  BTW, in a very close transcription, if one is using superscription
  (position above baseline) and relative size reduction to indicate
  aspiration, I suppose that degree of superscription or the size or both
  could be modulated to indicate degree of aspiration?
 
 Nah, if you tried to go down that path, you'd just end up with
 unrepresentable transcriptions and unreliable reproduction. I doubt
 that there are many transcribers who could reliably record more than
 three degrees of aspiration, anyway (roughly: slight aspiration,
 "normal" aspiration, and superaspiration).

Ken, I was only kidding ... mostly; I should have put a smiley in there
:-) But I was also thinking of the superscription question, which I
think Peter C. might like to discuss.
 
 Once you go past that level, which could be reliably indicated with
 appropriate use of diacritics, you are really into the realm of
 instrumental phonetics. I'd just hook up the machine and let it
 give you precise timings of voice delays post consonantal release
 in milliseconds.
 
 
  Or perhaps just mark-up the unsuperscripted aspiration indicator, to
  note degree of aspiration ... however you would like to measure that.
 
 No need to "mark it up". Just add another diacritic. That's how
 most transcribers would work, in practice.
 
Well, I was thinking of linking the transcription to the machine data
... so that the relation would be set on a compound key (aspiration
diacritic  measurement reference) ...
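
One way to read the "compound key" idea is as a lookup keyed on the pair (aspiration diacritic, measurement reference). The Python sketch below only illustrates that reading. The class name, the recording identifiers and the millisecond values are all invented placeholders, not data from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AspirationKey:
    """Hypothetical compound key: a diacritic plus a reference to machine data."""
    diacritic: str        # e.g. U+02B0 MODIFIER LETTER SMALL H
    measurement_ref: str  # made-up identifier of an instrumental recording


# Placeholder values only: delay after consonantal release, in milliseconds.
measurements = {
    AspirationKey("\u02b0", "rec-001"): 62.0,
    AspirationKey("\u02b0", "rec-002"): 35.5,
}


def delay_ms(diacritic: str, measurement_ref: str) -> float:
    """Look up the instrumental timing linked to one transcribed diacritic."""
    return measurements[AspirationKey(diacritic, measurement_ref)]


print(delay_ms("\u02b0", "rec-001"))
```

Because the key is a frozen dataclass, it is hashable, so the same diacritic can point to different measurements without any change to the transcription itself.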



Re: Transcriptions of Unicode

2001-01-12 Thread Mark Davis

Thanks for your detailed note; I'll have to think it over.
...
 But there's another inconsistency in the transcription: the vowels in the
 first ("u-") and third ("-code") syllable are both phonemically long.
 Either you put the length mark on both (recommended for *phonetic*
 transcription), or on neither (okay with *phonemic* transcription). (Of

The o is significantly longer than the u, probably due to the following d.

...
 