Guido van Rossum, 01.09.2011 18:31:
On Thu, Sep 1, 2011 at 9:03 AM, Antoine Pitrou wrote:
On Thursday, 1 September 2011 at 08:45 -0700, Guido van Rossum wrote:
This is definitely thought of as a separate
mark added to the e; ë is not a new letter. I have a feeling it's the
same way for the French and Germans, but I really don't know.
(Antoine? Georg?)

Indeed, they are not separate "letters" (they are considered the same in
lexicographic order, and the French alphabet has 26 letters).

So does the German alphabet, even though that count does not include "ß". That letter basically descended from a ligature of the old German way of writing "sz", where the "s" looked similar to an "f" and the "z" had a low-hanging tail.

IIRC, German Umlaut letters are lexicographically sorted according to their emergency replacement spelling ("ä" -> "ae"), which is also sometimes used in all upper case words ("Glück" -> "GLUECK"). I guess that's because Umlaut dots are harder to see on top of upper case letters. So sorting by Latin-1 byte value always yields wrong results, since it places every Umlaut letter after "z".
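
To make the sorting point concrete, here is a minimal Python sketch. It assumes the replacement-spelling ordering described above; the table UMLAUT_EXPANSIONS, the helper replacement_key and the word list are made up for illustration and are not part of any library. Real applications would more likely rely on locale-aware collation.

```python
# Hypothetical table: expand each Umlaut letter (and "ß") to its
# replacement spelling, as described in the paragraph above.
UMLAUT_EXPANSIONS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})

def replacement_key(word):
    """Sort key: replacement spelling, compared case-insensitively."""
    return word.translate(UMLAUT_EXPANSIONS).lower()

words = ["Zebra", "Ärger", "Apfel", "Öl", "Ofen"]

# Latin-1 byte order: every Umlaut letter has a byte value above "z".
by_bytes = sorted(words, key=lambda w: w.encode("latin-1"))

# Replacement-spelling order: "Ärger" sorts as "aerger", "Öl" as "oel".
by_replacement = sorted(words, key=replacement_key)

print(by_bytes)        # ['Apfel', 'Ofen', 'Zebra', 'Ärger', 'Öl']
print(by_replacement)  # ['Ärger', 'Apfel', 'Öl', 'Ofen', 'Zebra']
```

With byte-value sorting, "Ärger" and "Öl" land after "Zebra"; with the replacement key they sort next to the other "A" and "O" words.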

That aside, Umlaut letters are commonly considered separate letters, different from the undotted letters and also different from the replacement spellings. I, for one, always found the replacements rather weird and never got used to using them in upper case words. In any case, it's wrong to always use them, and it makes text harder to read.


But I'm not sure how it's relevant, because you can't remove an accent
without most likely making a spelling error, or at least changing the
meaning. Accents are very much part of the language (while ligatures
like "ff" are not, they are a rendering detail). So I would consider
"é", "ê", "ù", etc. atomic characters for the purpose of processing
French text. And I don't see how a decomposed form could help an
application.
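
For reference, this is roughly what the composed and decomposed forms of "é" look like from Python, using the standard unicodedata module. It is only a sketch of the two representations under discussion, not a claim about which form an application should prefer.

```python
import unicodedata

composed = "\u00e9"                                   # "é" as one code point
decomposed = unicodedata.normalize("NFD", composed)   # "e" + U+0301

# The two strings render the same but differ in length and compare unequal.
print(len(composed), len(decomposed))   # 1 2
print(composed == decomposed)           # False

# Normalizing both sides to the same form makes the comparison reliable.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)           # True
```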

I recall long ago that when the French wrote words in all caps they
would drop the accents, e.g. ECOLE. I even recall (through the mists
of time) observing this in Paris on public signs. Is this still the
convention?

Yes, and it's a huge problem when trying to pronounce last names. In French, you'd commonly write

LASTNAME, Firstname

and if LASTNAME happens to have accented letters, you'd miss them when reading that. I know a couple of French people who severely suffer from this, because their name is pronounced completely differently, or even takes on a different meaning, without the accents.
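
The information loss is easy to demonstrate. The sketch below imitates the all-caps convention by stripping combining marks after decomposition; the helper names strip_accents and all_caps_without_accents are made up for this example, and the word pair (pêcheur "fisherman" vs. pécheur "sinner") is just one illustration of two words collapsing into one.

```python
import unicodedata

def strip_accents(text):
    """Decompose to NFD, then drop the combining marks (the accents)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def all_caps_without_accents(text):
    """Imitate the sign-writing convention: upper case, no accents."""
    return strip_accents(text).upper()

print(all_caps_without_accents("école"))    # ECOLE

# Two different words become indistinguishable once the accents are gone.
print(all_caps_without_accents("pêcheur"))  # PECHEUR
print(all_caps_without_accents("pécheur"))  # PECHEUR
```

Once the accents are dropped there is no way to recover which of the two words was meant, which is exactly the problem with accented last names.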

Stefan

