I just had a shower, and I think it's cleared my thoughts a bit. :-) Clearly this is an important problem for those in countries where ASCII doesn't cut it. And just as in Python 3000 we're using UTF-8 as the default source encoding and allowing Unicode letters in identifiers, I think we should bite the bullet and allow repr() of a string to pass through all characters that the Unicode standard considers printable. For those of us with less capable IO devices, setting the error handler for stdout and stderr to backslashreplace is probably the best solution (and it solves more problems than just repr()).
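For illustration only, here is a minimal Python 3 sketch of the two mechanisms above. It assumes "printable" means anything outside the Unicode "Other" (C*) and "Separator" (Z*) general categories, with the ASCII space allowed; that is the rule today's str.isprintable() documents, not something this message specifies. The helpers is_printable_sketch, repr_sketch and ascii_terminal_bytes are hypothetical, and repr_sketch always uses single quotes, unlike the real repr().

```python
import io
import unicodedata

# Named escapes for a few ASCII characters. The single quote is always escaped
# because this sketch always uses single quotes (the real repr() may switch to ").
_NAMED_ESCAPES = {"\t": "\\t", "\n": "\\n", "\r": "\\r", "\\": "\\\\", "'": "\\'"}


def is_printable_sketch(ch: str) -> bool:
    """Hypothetical helper: C* (Other) and Z* (Separator) are non-printable, except ' '."""
    if ch == " ":
        return True
    return unicodedata.category(ch)[0] not in ("C", "Z")


def repr_sketch(s: str) -> str:
    """Hypothetical repr(): pass printable characters through, escape the rest."""
    parts = []
    for ch in s:
        cp = ord(ch)
        if ch in _NAMED_ESCAPES:
            parts.append(_NAMED_ESCAPES[ch])
        elif is_printable_sketch(ch):
            parts.append(ch)  # e.g. Japanese text passes through unchanged
        elif cp <= 0xFF:
            parts.append(f"\\x{cp:02x}")  # control characters, NBSP, etc.
        elif cp <= 0xFFFF:
            # Unassigned BMP code points and unpaired surrogates end up here.
            parts.append(f"\\u{cp:04x}")
        else:
            parts.append(f"\\U{cp:08x}")
    return "'" + "".join(parts) + "'"


def ascii_terminal_bytes(text: str) -> bytes:
    """Hypothetical helper: bytes an ASCII-only stdout with errors='backslashreplace' emits."""
    buf = io.BytesIO()
    out = io.TextIOWrapper(buf, encoding="ascii", errors="backslashreplace", newline="\n")
    out.write(text)
    out.flush()
    return buf.getvalue()


print(repr_sketch("日本語 \u0378"))  # prints '日本語 \u0378'
print(ascii_terminal_bytes("日本語\n"))  # prints b'\\u65e5\\u672c\\u8a9e\n'
```

The split of responsibilities is the point: repr() keeps printable non-ASCII text readable, while the output stream's error handler, not repr(), decides what to do when the device cannot encode it.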
I will have another look at Atsuo's patch. I do think we should use some kind of Unicode-standard-endorsed definition of "printable" (as long as it still excludes all the ASCII characters we currently escape), since there are plenty of undefined code points that even Japanese users would probably prefer to see rendered as \uxxxx rather than as completely invisible characters. I'm also not sure what people would want to happen for surrogate pairs. (OTOH an unpaired surrogate should be rendered as \uxxxx.)

I expect that this will require some more research and agreement. Perhaps someone can produce a draft PEP and attempt to sort out the details of the specification and implementation? It would also be nice if it were friendly to Jython, IronPython and PyPy.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000