On 12.07.2014 14:19, Steven D'Aprano wrote:
On Sat, Jul 12, 2014 at 11:27:17AM +0100, Alan Gauld wrote:
On 12/07/14 10:28, Steven D'Aprano wrote:

If you're using Python 3.3 or higher, it is better to use
message.casefold() rather than message.lower(). For English, there's
no real difference:
...
but it can make a difference for non-English languages:

py> "Große".lower()  # German for "great" or "large"
'große'
py> "Große".casefold()
'grosse'

You learn something new etc...

But I'm trying to figure out what difference this makes in
practice?

If you were targeting a German audience wouldn't you just test
against the German alphabet? After all you still have to expect 'grosse'
which isn't English, so if you know to expect grosse
why not just test against große instead?

Because the person might have typed any of:

grosse
GROSSE
gROSSE
große
Große
GROßE
GROẞE

etc., and you want to accept them all, just like in English you'd want
to accept any of GREAT great gREAT Great gReAt etc. Hence you want to
fold everything to a single, known, canonical version. Case-fold will do
that, while lowercasing won't.

(The last example includes a character which might not be visible to
many people, since it is quite unusual and not supported by many fonts
yet. If it looks like a box or empty space for you, it is supposed
to be capital sharp-s, matching the small sharp-s ß.)
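
A minimal sketch of the point about folding to one canonical version, using the spellings listed above. The variable names are illustrative and do not come from the thread, and the capital sharp-s is written as an escape so it survives fonts that lack it:

```python
# Spellings from the list above; the last one uses capital sharp-s (U+1E9E).
variants = ["grosse", "GROSSE", "gROSSE", "große", "Große", "GROßE", "GRO\u1e9eE"]

# str.lower() keeps ß as ß, so the "ss" and "ß" spellings stay distinct.
lowered = {word.lower() for word in variants}

# str.casefold() (Python 3.3+) expands ß to "ss", so every spelling collapses.
folded = {word.casefold() for word in variants}

print(sorted(lowered))
print(sorted(folded))
```

Here lowered ends up holding both 'grosse' and 'große', while folded holds only 'grosse', so a single comparison against the folded form accepts every variant.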


Very interesting advice. Wasn't aware at all of this feature of casefold.
As a native German speaker, I have to say that your last two examples
involving the capital ß are pretty contrived: although the capital ß is
part of Unicode, it is not an official part of the German alphabet and
nobody uses it (in fact, I had to look it up on Wikipedia just now to
learn what that letter is).

An even better example than the rest of yours would be "Kuß" (the German
noun for "kiss"). Only people above 30 (like me) still spell it this way;
younger people spell it "Kuss", since the official rules have changed
over the last 10 years. In this particular case, you should definitely be
prepared to handle both "Kuss" and "Kuß" as legal input.
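
One way to accept both spellings might look like the sketch below. The helper name same_word is hypothetical and not something from the thread; it simply case-folds both sides before comparing:

```python
def same_word(typed, expected):
    """Hypothetical helper: compare ignoring case and the ß/ss spelling."""
    return typed.casefold() == expected.casefold()


for typed in ["Kuss", "Kuß", "KUSS", "kuß"]:
    # Second column: casefold comparison; third column: plain lower().
    print(typed, same_word(typed, "Kuss"), typed.lower() == "kuss")
```

The casefold comparison accepts all four inputs, whereas the lower() comparison rejects the two spelled with ß. One caveat: folding also equates words that really are different, such as "Maße" and "Masse", so it suits matching expected input better than distinguishing vocabulary.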

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
