On Fri, May 9, 2008 at 1:52 AM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
>>> For sys.stdout this doesn't make sense at all, since it hides encoding
>>> errors for all applications using sys.stdout as piping mechanism.
>>> -1 on that.
>>
>> You can raise UnicodeEncodeError for encoding errors if you want, by
>> setting sys.stdout's error-handler to `strict`.
>
> No, that's not a good idea. I don't want to change every single
> affected application just to make sure that they don't write
> corrupt data to stdout.

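For reference, the three error-handler behaviours at issue in this exchange can be sketched as below. This is only an illustration in Python 3 syntax: the helper name `encode_with` is made up for the example, and an in-memory `io.TextIOWrapper` with an ASCII encoding stands in for a `sys.stdout` whose locale cannot represent the text.

```python
import io


def encode_with(text, encoding, errors):
    """Write text through a TextIOWrapper (the type of sys.stdout) and
    return the bytes that would reach the other end of the pipe.

    encode_with is a hypothetical helper for this sketch only.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


sample = "caf\u00e9"  # one character outside ASCII

# backslashreplace: the character survives as a hex escape.
print(encode_with(sample, "ascii", "backslashreplace"))

# replace: the character is reduced to '?', so its identity is lost.
print(encode_with(sample, "ascii", "replace"))

# strict: the write fails with UnicodeEncodeError.
try:
    encode_with(sample, "ascii", "strict")
except UnicodeEncodeError as exc:
    print("strict raised UnicodeEncodeError:", exc.reason)
```

On Python 3.7 or later, `sys.stdout.reconfigure(errors="strict")` switches the handler of the running interpreter's stdout; that method postdates this thread.
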
The changes you need to make to your applications will be so small that
I don't think this is a valid argument. And the number of applications
you need to change will be rather small.

What you call "corrupt data" are just hex-escaped characters of a foreign
language. In most cases, printing (or writing to a file) such a string
does no harm, so I think raising an exception by default is overkill.

Java doesn't raise an exception for an encoding error, but just prints `?`.
.NET languages such as C# also print '?'. Perl prints a hex-escaped
string, as proposed in this PEP.

>> Even though this PEP was rejected,
>
> You mean PEP 3138 was rejected ??

Er, I should have written "Even if this PEP was ...", perhaps.

> Well, "annoying" is not good enough for such a big change :-)

So? Annoyance with Perl was enough reason for me to change languages
entirely :-)

> The backslashreplace idea may have some merits in interactive
> Python sessions or IDLE, but it hides encoding errors in all
> other situations.

Encoding errors are not hidden, but are represented by hex-escaped
strings. We get much more information about the string being printed
than we would from a traceback.

> I'm not against changing the repr() of Unicode objects, but
> please make sure that this change does not break debugging
> Python applications. Whether you're debugging an app using
> 'print' statements, piping repr() through a socket to a remote
> debugger or writing information to a log file. The important
> factor to take into account is the other end that will receive
> the data.

I think your request is too vague to be fulfilled. This proposal improves
the currently broken debugging for me, and I see no information lost for
debugging. But the "other end" may vary too much to say anything definite.

> BTW: One problem that your PEP doesn't address, which I mentioned
> on the ticket:
>
> By putting all printable chars into the repr() you lose the
> ability to actually see the number of code points you have
> in a Unicode string.
>

With the current repr(), I cannot get any information other than the
number of code points. That is not what I want to know when printing
repr(). For the length of the string, I'll just do print(len(s)).

>
> Please name the property Py_UNICODE_ISPRINTABLE. Py_UNICODE_ISHEXESCAPED
> isn't all that intuitive.

The name `Py_UNICODE_ISPRINTABLE` came to my mind first, but I was not
sure that `printable` is an accurate word. I'm okay with
Py_UNICODE_ISPRINTABLE, but I'd like to hear opinions. If no one objects
to Py_UNICODE_ISPRINTABLE, I'll go with it.

>
> How can things easily be changed so that it's possible to get the
> Py2.x style hex escaping back into Py3k without having to change
> all repr() calls and %r format markers for Unicode objects ?

I didn't intend to imply "without having to change". Perhaps "migrate"
was the wrong word and "port" would be better. As for repr() and the %r
format, they are unlikely to need changing in most cases. They need to
be changed only if pure ASCII output is required even when your locale
is capable of printing the strings.

> I can see your point with it being easier to read e.g. German,
> Japanese or Korean data, but it still has to be possible to
> use repr() for proper debugging which allows the user to
> actually see what is stored in a Unicode object in terms of
> code points.

You can see code points easily; the function I wrote in the PEP to
convert such strings to the Python 2 style repr() is a good example.
But I believe the ordinary use case prefers a readable string over
code points.
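
A minimal sketch of how the code points remain recoverable when repr() keeps printable characters as they are. It is not necessarily the function from the PEP; the names `ascii_repr` and `code_points` are made up for this example, which assumes Python 3 string semantics.

```python
def ascii_repr(s):
    """Return repr(s) with every non-ASCII character hex-escaped,
    similar in spirit to the Python 2 repr() of a unicode object.

    ascii_repr is a hypothetical helper name for this sketch.
    """
    return repr(s).encode("ascii", "backslashreplace").decode("ascii")


def code_points(s):
    """List the code points of s in U+XXXX notation (hypothetical helper)."""
    return ["U+%04X" % ord(ch) for ch in s]


s = "Gr\u00fc\u00dfe"  # a German word containing two non-ASCII characters

print(ascii_repr(s))   # escaped form, safe for an ASCII-only channel
print(len(s))          # number of code points in the string
print(code_points(s))  # each code point spelled out
```

The built-in `ascii()` in Python 3 returns the same escaped form, so code that must emit pure ASCII can call it in place of `repr()`.
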