On Sep 1, 2011, at 9:30 PM, Steven D'Aprano wrote:
> Antoine Pitrou wrote:
>> Le jeudi 01 septembre 2011 à 08:45 -0700, Guido van Rossum a écrit :
>>> This is definitely thought of as a separate
>>> mark added to the e; ë is not a new letter. I have a feeling it's the same
>>> way for the French and Germans, but I really don't know.
>>> (Antoine? Georg?)
>> Indeed, they are not separate "letters" (they are considered the same in
>> lexicographic order, and the French alphabet has 26 letters).
>
>
> On the other hand, the same doesn't necessarily apply to other languages. (At
> least according to Wikipedia.)
>
> http://en.wikipedia.org/wiki/Diacritic#Languages_with_letters_containing_diacritics
For example, in Serbo-Croatian (Serbian, Croatian, Bosnian, Montenegrin, if you
want), each of the following letters represent one distinct sound of the
language. In Serbian Cyrillic alphabet, they are distinct symbols. In Latin
alphabet, the corresponding letters are formed with diacritics because the
alphabet is shorter.
Letter Approximate pronunciation Cyrillic
------ ------------------------- --------
č tch in butcher ч
ć ch in chapter, but softer ћ
dž j in jump џ
đ j in juice ђ
š sh in ship ш
ž s in pleasure, measure, ... ж
The language has 30 sounds and the corresponding 30 letters.
See the count of the letters in these tables:
- http://hr.wikipedia.org/wiki/Hrvatska_abeceda
- http://sr.wikipedia.org/wiki/Азбука
Diacritics are used in grammar books and in print (occasionally) to distinguish
between four different accents of the language:
- long rising: á,
- short rising: à,
- long falling: ȃ (inverted breve, *not* a circumflex â), and
- short falling: ȁ,
especially when the words that use the same sounds -- thus, spelled with the
same letters -- are next to each other. The accents are used to change the
intonation of the whole word, not to change the sound of the letter.
For example: "Ja sam sȃm." -- "I am alone."
Both words "sam" contain the "a" sound, but the first one is pronounced short.
As a form of the verb "to be" it's an enclitic that takes the accent of the
preceding word "I". The second one is pronounced with a long falling accent.
The macron can be used to indicate the length of a *non-stressed* vowel,
e.g. ā, but is usually unnecessary in standard print.
Many languages use alphabets that are not suitable to their sound system. The
speakers of these languages adapted alphabets to their sounds either by using
letters with distinct shapes (Cyrillic letters above), or adding diacritics to
an existing shape (Latin letters above).
The new combined form is a distinct letter. These letters have separate
sections in dictionaries and a sorting order.
The diacritics that indicate an accent or length are used only above vowels and
do *not* represent distinct letters.
Best regards,
Zvezdan Petković
P.S. Since I live in the USA, the last letter of my surname is *wrongly*
spelled (ć -> c) and pronounced (ch -> k) most of the time. :-)
_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com