On 05/30/2012 05:54 AM, Thomas Rachel wrote:
> On 30.05.2012 08:52, ru...@yahoo.com wrote:
>
>> This breaks a lot of my code because in python 2
>> re.split (ur'[\u3000]', u'A\u3000A') ==> [u'A', u'A']
>> but in python 3 (the result of running 2to3),
>> re.split (r'[\u3000]', 'A\u3000A' ) ==> ['A\u3000A']
>>
>> I can remove the "r" prefix from the regex string but then
>> if I have other regex backslash symbols in it, I have to
>> double all the other backslashes -- the very thing that
>> the r-prefix was invented to avoid.
>>
>> Or I can leave the "r" prefix and replace something like
>> r'[ \u3000]' with r'[ ]'. But that is confusing because
>> one can't distinguish between the space character and
>> the ideographic space character. It is also a problem if a
>> reader of the code doesn't have a font that can display
>> the character.
>>
>> Was there a reason for dropping the lexical processing of
>> \u escapes in strings in python3 (other than to add another
>> annoyance in a long list of python3 annoyances?)
>
> Probably it is more consistent. Alas, it makes the whole thing
> incompatible with Py2.
>
> But if you think about it: why allow for \u if \r, \n etc. are
> disallowed as well?

Maybe the blame lies elsewhere then... If the re module interprets
(in a regex string) the 2-character string consisting of a backslash
followed by 'n' as a single newline character, then why wasn't re
changed for Python 3 to interpret the 6-character string r'\u3000'
as a single unicode character, to make up for Python's lexer no
longer doing that in raw strings (as it did in Python 2)?

>> And is there no choice for me but to choose between the two
>> poor choices I mention above to deal with this problem?
>
> There is a 3rd one: use r'[ ' + '\u3000' + ']'. Not very nice to read,
> but should do the trick...

I guess the "+"s could be left out, allowing something like

    '[ \u3000]' r'\w+ \d{3}'

but I'll have to try it a little; maybe just doubling backslashes
won't be much worse. I did that for years in Perl and lived through it.