Martin v. Löwis:

> This appears to be based on the usedDefault return value of
> WideCharToMultiByte. I believe this is insufficient:
> WideCharToMultiByte might convert Unicode characters to
> codepage characters in a lossy way, without using the default
> character. For example, it converts U+0308 (combining diaeresis)
> to U+00A8 (diaeresis) (or something like that, I forgot the
> exact details). So if you have, say, "p-umlaut" (i.e. U+0070
> U+0308), it converts it to U+0070 U+00A8 (in the local code page).
> Trying to use this as a filename later fails.

   The WC_NO_BEST_FIT_CHARS flag is there to defeat that. According to
the documentation, with that flag set the conversion uses the default
character whenever the translation can't be round-tripped. It is
available on Windows 2000 and XP but not on NT4. Alternatively, we could
compare the original against the round-tripped string, as described at
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_2bj9.asp
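
   A rough sketch of the round-trip comparison, written in Python rather
than against the Win32 API. The helper name round_trips is made up for
illustration, and Python's own cp1252 codec stands in for the system
code page. That codec does no best-fit mapping, so this only shows the
comparison step, not the behaviour of WideCharToMultiByte itself.

```python
def round_trips(text, encoding):
    """Return True if text survives encode -> decode unchanged.

    Hypothetical helper: 'encoding' stands in for the local code page.
    """
    # Characters with no mapping are replaced rather than raising,
    # which mimics a conversion that silently loses information.
    encoded = text.encode(encoding, errors="replace")
    decoded = encoded.decode(encoding, errors="replace")
    # Any substitution shows up as a difference from the original.
    return decoded == text


if __name__ == "__main__":
    # Precomposed u-umlaut exists in cp1252, so this name is safe.
    print(round_trips("p\u00fc", "cp1252"))
    # p followed by a combining diaeresis has no exact mapping.
    print(round_trips("p\u0308", "cp1252"))
```

   A name that fails the check would be rejected or handled through the
wide-character API instead of being passed on in converted form.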

   Neil
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev