Dear all,

I have a problem with logging an exception.

environment: Python 2.4, Debian testing

    ${LANGUAGE} not set
    ${LC_ALL} not set
    ${LC_CTYPE} not set
    ${LANG}=de_DE.UTF-8

    activating user-default locale with <locale.setlocale(locale.LC_ALL, '')> returns: [de_DE.UTF-8]
    locale.getdefaultlocale() - default (user) locale: ('de_DE', 'utf-8')
    encoding sanity check (also check "locale.nl_langinfo(CODESET)" below):
    sys.getdefaultencoding(): [ascii]
    locale.getpreferredencoding(): [UTF-8]
    locale.getlocale()[1]: [utf-8]
    sys.getfilesystemencoding(): [UTF-8]

    _logfile = codecs.open(filename = _logfile_name, mode = 'wb', encoding = 'utf8', errors = 'replace')

    logging.basicConfig (
        format = fmt,
        datefmt = '%Y-%m-%d %H:%M:%S',
        level = logging.DEBUG,
        stream = _logfile
    )

I am using psycopg2 which in turn uses libpq. When trying to connect to the database and providing faulty authentication information:

    try:
        ... try to connect ...
    except StandardError, e:
        _log.error(u"login attempt %s/%s failed:", attempt+1, max_attempts)

        print "exception type  :", type(e)
        print "exception dir   :", dir(e)
        print "exception args  :", e.args
        msg = e.args[0]
        print "msg type        :", type(msg)
        print "msg.decode(utf8):", msg.decode('utf8')
        t,v,tb = sys.exc_info()
        print "sys.exc_info()  :", t, v
        _log.exception(u'exception detected')

the following output is generated:

    exception type  : <type 'instance'>
    exception dir   : ['__doc__', '__getitem__', '__init__', '__module__', '__str__', 'args']
    exception args  : ('FATAL: Passwort-Authentifizierung f\xc3\xbcr Benutzer \xc2\xbbany-doc\xc2\xab fehlgeschlagen\n',)
    msg type        : <type 'str'>
    msg.decode(utf8): FATAL: Passwort-Authentifizierung für Benutzer »any-doc« fehlgeschlagen
    sys.exc_info()  : psycopg2.OperationalError FATAL: Passwort-Authentifizierung für Benutzer »any-doc« fehlgeschlagen
    Traceback (most recent call last):
      File "/usr/lib/python2.4/logging/__init__.py", line 739, in emit
        self.stream.write(fs % msg.encode("UTF-8"))
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 191: ordinal not in range(128)

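For what it's worth, the failing conversion can be looked at in isolation, without logging or psycopg2. The following is only a minimal sketch, assuming the message bytes shown in the output above. It is written for a current Python, where the implicit str-to-unicode conversion of Python 2 no longer exists, so the decode step is spelled out explicitly. The names `raw_msg` and `try_decode` are made up for this illustration.

```python
# Bytes as they appear in e.args[0] above (UTF-8 encoded German text).
raw_msg = (b'FATAL: Passwort-Authentifizierung f\xc3\xbcr Benutzer '
           b'\xc2\xbbany-doc\xc2\xab fehlgeschlagen\n')


def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) on failure.

    Hypothetical helper for this illustration only.
    """
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Decoding with "ascii" (the Python 2 default encoding) fails at the
# first byte outside the 7-bit range ...
ascii_text, ascii_err = try_decode(raw_msg, 'ascii')
print("ascii decode error:", ascii_err)

# ... while decoding with the encoding the locale reports succeeds.
utf8_text, utf8_err = try_decode(raw_msg, 'utf-8')
print("utf-8 decode error:", utf8_err)
```
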
Now, the string "FATAL: Passwort-Auth..." comes from libpq via psycopg2. It is translated to German via gettext within libpq (at the C level). As we can see, it is of type "str". I know from the environment that it is most likely encoded in UTF-8, and manually decoding it as such (see the decode call) succeeds.

On _log.exception() the logging module wants to write the message encoded as UTF-8 (that is how the log file is set up). So it looks at the string, finds that it is of type "str", and decodes it with the *Python default encoding* to get to type "unicode". It then re-encodes the result as UTF-8 to get back to type "str", ready for output to the log file.

However, since the Python default encoding is "ascii", that conversion fails.

Changing the Python default encoding isn't really an option, as it is advised against and would have to be made to work reliably on other users' machines.

One could, of course, write code to specifically check for this condition and manually pre-convert the message string to unicode, but that does not seem to be how things should be.

How can I cleanly handle this situation? Should the logging module internally use an encoding obtained from the locale module rather than the default string encoding?

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
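
P.S. To make the manual pre-conversion mentioned above concrete, here is a minimal sketch of what I have in mind. It is an assumption-laden illustration, not something logging or psycopg2 provides: `to_text` is a made-up helper, and it is written for a current Python (under Python 2 the check would be against `str` rather than `bytes`). It decodes byte strings with the encoding reported by the locale module and passes everything else through unchanged.

```python
import locale


def to_text(value, encoding=None):
    """Decode byte strings using the locale's encoding; pass text through.

    Hypothetical helper, not part of logging or psycopg2.
    """
    if not isinstance(value, bytes):
        return value
    if encoding is None:
        # Assumption: the locale's preferred encoding matches the
        # encoding libpq/gettext used for the translated message.
        encoding = locale.getpreferredencoding() or 'ascii'
    # errors='replace' so that logging an error never raises a new one.
    return value.decode(encoding, 'replace')
```

It would be used along the lines of `_log.error(u'login failed: %s', to_text(e.args[0]))`. Note that this only covers the message passed in explicitly. The traceback text that _log.exception() appends is formatted by the logging module itself and is not touched by such a helper, which is part of why it does not feel like a clean solution.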