In python2, "\u" escapes are processed in raw unicode strings. That is, ur'\u3000' is a string of length 1 consisting of the IDEOGRAPHIC SPACE unicode character.
In python3, "\u" escapes are not processed in raw strings. r'\u3000' is a string of length 6 consisting of a backslash, 'u', '3' and three '0' characters. This breaks a lot of my code because in python 2 re.split (ur'[\u3000]', u'A\u3000A') ==> [u'A', u'A'] but in python 3 (the result of running 2to3), re.split (r'[\u3000]', 'A\u3000A' ) ==> ['A\u3000A'] I can remove the "r" prefix from the regex string but then if I have other regex backslash symbols in it, I have to double all the other backslashes -- the very thing that the r-prefix was invented to avoid. Or I can leave the "r" prefix and replace something like r'[ \u3000]' with r'[ ]'. But that is confusing because one can't distinguish between the space character and the ideographic space character. It also a problem if a reader of the code doesn't have a font that can display the character. Was there a reason for dropping the lexical processing of \u escapes in strings in python3 (other than to add another annoyance in a long list of python3 annoyances?) And is there no choice for me but to choose between the two poor choices I mention above to deal with this problem? -- http://mail.python.org/mailman/listinfo/python-list