On 5/10/07, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> On 2007-05-10 20:53, Paul Moore wrote:
> > On 10/05/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> >> I just discovered that, in all versions of Python as far back as I
> >> have access to (2.0), \uXXXX escapes are interpreted inside raw
> >> unicode strings. Thus:
> > [...]
> >> Does anyone remember why it is done this way? The reference manual
> >> describes this behavior, but doesn't give an explanation:
> >
> > My memory is so dim as to be more speculation than anything else, but
> > I suspect it's simply because there's no other way of including
> > characters outside the ASCII range in a raw string.
>
> This is per design (see PEP 100) and was done for the reason given
> by Paul. The motivation for the chosen approach was to make Python's
> raw Unicode strings compatible to Java's raw Unicode strings:
>
> http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html

I'm not sure what Java compatibility buys us. It is also far from
perfect -- IIUC, in Java if you write \u0022 (that's the " character)
it counts as an opening or closing quote, and if you write \u005c (a
backslash) it can be used to escape the following character. OTOH, in
Python, you can write ur"C:\Program Files\u005c" and voila, a raw
string terminating in a backslash. (In Java this would escape the "
instead.)

However, I understand the other reason (inclusion of non-ASCII
characters in raw strings) and I reluctantly agree with it.
Reluctantly, because it means I can't create a raw string containing a
\ followed by u or U -- I needed one of those today.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
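
For anyone who wants to try the raw-unicode behavior described in the message above: a minimal sketch, assuming Python 3, where ur"..." literals no longer exist. It stands in for them with the raw_unicode_escape codec, which decodes only \uXXXX and \UXXXXXXXX and, as far as I know, follows the same rules that Python 2 applied to ur"..." literals. The helper name decode_raw_unicode is made up for illustration.

```python
# Sketch only: emulate how a Python 2 ur"..." literal body was read,
# using the raw_unicode_escape codec. This is an assumption-based stand-in,
# not the Python 2 tokenizer itself.


def decode_raw_unicode(literal_body: bytes) -> str:
    """Decode bytes as if they were the body of a ur"..." literal."""
    return literal_body.decode("raw_unicode_escape")


# \u005c is decoded to a backslash, so the result ends in a backslash,
# which a plain raw string literal cannot do.
trailing = decode_raw_unicode(rb"C:\Program Files\u005c")

# A backslash followed by "u" must start a complete \uXXXX escape,
# so a path component such as "\users" is rejected.
try:
    decode_raw_unicode(rb"C:\users")
    rejected = False
except UnicodeDecodeError:
    rejected = True

# Workaround: spell the backslash itself as \u005c.
workaround = decode_raw_unicode(rb"C:\u005cusers")

print(repr(trailing))
print(rejected)
print(repr(workaround))
```

The last case shows one way around the limitation mentioned in the message: writing the backslash as \u005c gives a backslash followed by u, at the cost of readability.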