Re: [Python-Dev] PEP 393 Summer of Code Project

Zvezdan Petkovic Fri, 02 Sep 2011 09:14:01 -0700

On Sep 1, 2011, at 9:30 PM, Steven D'Aprano wrote:

> Antoine Pitrou wrote:
>> On Thursday, 01 September 2011 at 08:45 -0700, Guido van Rossum wrote:
>>> This is definitely thought of as a separate
>>> mark added to the e; ë is not a new letter. I have a feeling it's the same 
>>> way for the French and Germans, but I really don't know.
>>> (Antoine? Georg?)
>> Indeed, they are not separate "letters" (they are considered the same in 
>> lexicographic order, and the French alphabet has 26 letters).
> 
> 
> On the other hand, the same doesn't necessarily apply to other languages. (At 
> least according to Wikipedia.)
> 
> http://en.wikipedia.org/wiki/Diacritic#Languages_with_letters_containing_diacritics


For example, in Serbo-Croatian (Serbian, Croatian, Bosnian, Montenegrin, if you 
want), each of the following letters represents one distinct sound of the 
language.  In the Serbian Cyrillic alphabet, they are distinct symbols.  In the 
Latin alphabet, the corresponding letters are formed with diacritics because 
the basic Latin alphabet has fewer letters than the language has sounds.

        Letter  Approximate pronunciation       Cyrillic
        ------  -------------------------       --------
        č       tch in butcher                  ч
        ć       ch in chapter, but softer       ћ
        dž      j in jump                       џ
        đ       j in juice                      ђ
        š       sh in ship                      ш
        ž       s in pleasure, measure, ...     ж

The language has 30 sounds and the corresponding 30 letters.
See the count of the letters in these tables:
- http://hr.wikipedia.org/wiki/Hrvatska_abeceda
- http://sr.wikipedia.org/wiki/Азбука

Diacritics are used in grammar books and in print (occasionally) to distinguish 
between four different accents of the language:

        - long rising: á,
        - short rising: à,
        - long falling: ȃ (inverted breve, *not* a circumflex â), and
        - short falling: ȁ,

especially when the words that use the same sounds -- thus, spelled with the 
same letters -- are next to each other.  The accents are used to change the 
intonation of the whole word, not to change the sound of the letter.

For example: "Ja sam sȃm." -- "I am alone."

Both words "sam" contain the "a" sound, but the first one is pronounced short.  
As a form of the verb "to be" it's an enclitic that takes the accent of the 
preceding word "Ja" ("I").  The second one is pronounced with a long falling 
accent.
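
To tie this back to Python strings, here is a minimal sketch using the
standard unicodedata module.  It shows that ȃ (U+0203) decomposes into "a"
plus a combining inverted breve, which is a different mark from the
circumflex in â (U+00E2).  The helper names decomposition_names and
strip_marks are made up for this illustration.

```python
import unicodedata

def decomposition_names(ch):
    """Names of the code points in the canonical decomposition (NFD) of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]

def strip_marks(text):
    """Drop every combining mark after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

long_falling = "\u0203"  # ȃ, a with inverted breve
circumflex = "\u00e2"    # â, a with circumflex

print(decomposition_names(long_falling))
print(decomposition_names(circumflex))
print(strip_marks("Ja sam s\u0203m."))
```

Note that strip_marks is too blunt for this language: it would also turn
č into c, and those are different letters.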

The macron can be used to indicate the length of a *non-stressed* vowel,
e.g. ā, but is usually unnecessary in standard print.

Many languages use alphabets that are not well suited to their sound system.  
The speakers of these languages adapted alphabets to their sounds either by 
using letters with distinct shapes (Cyrillic letters above), or by adding 
diacritics to an existing shape (Latin letters above).

The new combined form is a distinct letter.  These letters have separate 
sections in dictionaries and a sorting order.

The diacritics that indicate an accent or length are used only above vowels and 
do *not* represent distinct letters.
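
The distinction can be sketched as a sort key in Python.  This is only an
illustration under stated assumptions, not a real collation algorithm: the
ALPHABET list is the 30-letter Latin order as I know it (it also contains
the digraphs lj and nj, which are not in the table above), the vowel set
is limited to a, e, i, o, u, and all function names are made up.

```python
import unicodedata

# Assumed 30-letter Latin alphabet order; dž, lj and nj count as one letter.
ALPHABET = ["a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f",
            "g", "h", "i", "j", "k", "l", "lj", "m", "n", "nj",
            "o", "p", "r", "s", "š", "t", "u", "v", "z", "ž"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
VOWELS = set("aeiou")

def strip_vowel_accents(word):
    """Drop combining marks on vowels only; keep the ones on consonants."""
    out = []
    base = ""
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch):
            if base in VOWELS:
                continue          # accent or length mark, not part of a letter
            out.append(ch)        # caron on c/s/z or acute on c: part of a letter
        else:
            base = ch
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))

def letters(word):
    """Split a word into alphabet letters, greedily matching digraphs."""
    word = strip_vowel_accents(word)
    result = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in RANK:
            result.append(pair)
            i += 2
        else:
            result.append(word[i])
            i += 1
    return result

def sort_key(word):
    """Rank each letter by alphabet position; unknown characters sort last."""
    return [RANK.get(letter, len(ALPHABET)) for letter in letters(word)]

print(sorted(["dan", "ćup", "čaj", "cvet"], key=sort_key))
print(sort_key("s\u0203m") == sort_key("sam"))
```

The greedy digraph match is a known weakness: in a word where d and ž (or
n and j) merely happen to be adjacent, they are two letters, and telling
the cases apart needs a dictionary.  A plain sorted() without the key
orders by code point, which puts ć and č after z.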

Best regards,

        Zvezdan Petković

P.S. Since I live in the USA, the last letter of my surname is *wrongly* 
spelled (ć -> c) and pronounced (ch -> k) most of the time.  :-)
