Greetings,

Today, while trying to internationalize a program I'm working on, I found an interesting side-effect of how we're dealing with the encoding of unicode strings when they are written to files.
Suppose the following example:

    # -*- encoding: iso-8859-1 -*-
    print u"á"

This will correctly print the string 'á', as expected. Now, what surprises me is that the following code won't work in an equivalent way (unless sys.setdefaultencoding() is used):

    # -*- encoding: iso-8859-1 -*-
    import sys
    sys.stdout.write(u"á\n")

This raises the following error:

    Traceback (most recent call last):
      File "asd.py", line 3, in ?
        sys.stdout.write(u"á")
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in
    position 0: ordinal not in range(128)

This difference may become a really annoying problem when internationalizing programs, since it's common to see third-party code writing to sys.stdout instead of using 'print'. The standard optparse module, for instance, holds a reference to sys.stdout which is used in the default --help handling mechanism.

Given that files have an 'encoding' attribute, and that any unicode string containing characters outside the 0-127 range will raise an exception when written to a file, isn't it reasonable to respect the 'encoding' attribute whenever writing data to a file?

The workaround for this problem is either to use the generally-considered-evil sys.setdefaultencoding(), or to wrap sys.stdout. IMO, both options seem unreasonable for such a common idiom.

--
Gustavo Niemeyer
http://niemeyer.net
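For the record, by "wrap sys.stdout" I mean something along the lines of the following sketch, built on the standard codecs module; the iso-8859-1 fallback is just an assumption for the case where sys.stdout.encoding is not set (e.g. when output is redirected):

    # -*- encoding: iso-8859-1 -*-
    # Minimal sketch: replace sys.stdout with a writer that encodes unicode
    # using the stream's own encoding, falling back to iso-8859-1 (assumed).
    import sys
    import codecs

    encoding = getattr(sys.stdout, "encoding", None) or "iso-8859-1"
    sys.stdout = codecs.getwriter(encoding)(sys.stdout)

    # Third-party code writing unicode to sys.stdout now behaves like 'print':
    sys.stdout.write(u"á\n")

That works, but it has to be repeated in every program, and it has to run before any third-party code grabs its own reference to sys.stdout (as optparse does), which is why it still feels unreasonable for such a common idiom.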