atsuo ishimoto wrote:
> Using repr() to build output string is common practice in Python world,
> so repr() is called everywhere in Python-core and third-party applications
> to print objects, emitting logs, etc.
>
> For example,
>
> >>> f = open("日本語")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "c:\ww\Python-3.0a4-orig\lib\io.py", line 212, in __new__
>     return open(*args, **kwargs)
>   File "c:\ww\Python-3.0a4-orig\lib\io.py", line 151, in open
>     closefd)
> IOError: [Errno 2] No such file or directory: '\u65e5\u672c\u8a9e'
>
> This is an annoying error message. Or, in Python 2,
>
> >>> f = open(u"日本語", "w")
> >>> f
> <open file u'\u65e5\u672c\u8a9e', mode 'w' at 0x009370F8>
>
> This repr()ed form is difficult to read. When Japanese (or Chinese)
> programmers look at u'\u65e5\u672c\u8a9e', they'll have strong
> impression that Python is not intended to be used in their country.

This is starting to seem to me more like something to be addressed
through sys.displayhook/excepthook at the interactive interpreter level
than it is to be dealt with through changes to any __repr__()
implementations.

Given the following setup code:

  def replace_escapes(escaped_str):
      return escaped_str.encode('latin-1').decode('unicode_escape')

  def displayhook_unicode(expr_result):
      if expr_result is not None:
          __builtins__._ = expr_result
          print(replace_escapes(repr(expr_result)))

  from traceback import format_exception
  def excepthook_unicode(*exc_details):
      msg = ''.join(format_exception(*exc_details))
      print(replace_escapes(msg), end='')

  import sys
  sys.displayhook = displayhook_unicode
  sys.excepthook = excepthook_unicode

I get the following behaviour:

  >>> "\u65e5\u672c\u8a9e"
  '日本語'
  >>> print("\u65e5\u672c\u8a9e")
  日本語
  >>> '日本語'
  '日本語'
  >>> print('日本語')
  日本語
  >>> 日本語 = 1
  >>> 日本語
  1
  >>> dir()
  ['__builtins__', '__doc__', '__name__', '__package__', 'displayhook_unicode', 'excepthook_unicode', 'format_exception', 'replace_escapes', 'sys', '日本語']
  >>> b"\u65e5\u672c\u8a9e"
  b'\u65e5\u672c\u8a9e'
  >>> print(b"\u65e5\u672c\u8a9e")
  b'\\u65e5\\u672c\\u8a9e'
  >>> f = open("\u65e5\u672c\u8a9e")
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/home/ncoghlan/devel/py3k/Lib/io.py", line 212, in __new__
      return open(*args, **kwargs)
    File "/home/ncoghlan/devel/py3k/Lib/io.py", line 151, in open
      closefd)
  IOError: [Errno 2] No such file or directory: '日本語'
  >>> f = open("\u65e5\u672c\u8a9e", 'w')
  >>> f.name
  '日本語'

Note that even though the bytes object representation is slightly
different from that for the normal displayhook (which doubles up on the
backslashes, just like the bytes printing example above), the two
different representations are equivalent because \u isn't a valid escape
sequence for bytes literals.

Cheers,
Nick.

--
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org
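
For anyone wanting to try the display hook from the message above on a
current Python 3, here is a minimal sketch. It is not part of the
original post. It assumes that repr() on current releases already leaves
printable non-ASCII text unescaped, so ascii() stands in for the escaping
repr() of 3.0a4. It also uses the builtins module rather than
__builtins__, and captures the output in a StringIO only so the result
can be checked.

```python
import builtins
import contextlib
import io


def replace_escapes(escaped_str):
    # Turn literal \uXXXX escape sequences back into the characters they name.
    # latin-1 maps code points 0-255 straight to bytes, so pure-ASCII input
    # reaches the unicode_escape decoder unchanged.
    return escaped_str.encode("latin-1").decode("unicode_escape")


def displayhook_unicode(expr_result):
    # Like the default hook: ignore None, remember the value as "_".
    if expr_result is None:
        return
    builtins._ = expr_result
    # Assumption for this sketch: ascii() reproduces the escaped form that
    # repr() produced in Python 3.0a4.
    print(replace_escapes(ascii(expr_result)))


# The escaped form the hook starts from.
escaped = ascii("日本語")

# Call the hook directly and capture what it would display.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    displayhook_unicode("日本語")
shown = buffer.getvalue()
```

In an interactive session the hook would be installed with
sys.displayhook = displayhook_unicode, as in the setup code above. One
limitation of this approach: the unicode_escape codec decodes every
backslash escape, not only \u, so the displayed form of a string that
itself contains backslashes is no longer a faithful literal.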