On Sun, Jun 17, 2012 at 10:59 PM, Terry Reedy <tjre...@udel.edu> wrote:
> On 6/17/2012 9:07 PM, Guido van Rossum wrote:
>
>> On Sun, Jun 17, 2012 at 4:55 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
>>
>>> So, perhaps the answer is to leave this as is, and try to make 2to3
>>> smart enough to detect such escapes and replace them with their
>>> properly encoded (according to the source code encoding) Unicode
>>> equivalent?
>>
>> But the whole point of the reintroduction of u"..." is to support code
>> that isn't run through 2to3.
>
> People writing 2&3 code sometimes use 2to3 once (or a few times) on their
> 2.6/7 version during development to find things they must pay attention to.
> So Nick's idea could be helpful to people who do not want to use 2to3
> routinely either in development or deployment.
>
>> Frankly, I don't care how it's done, but
>> I'd say it's important not to silently have different behavior for the
>> same notation in the two versions.
>
> The fundamental problem was giving the 'u' prefix two different meanings
> in 2.x: 'change the storage type from bytes to unicode', and 'change the
> contents by partially cooking the literal even when raw processing is
> requested'*. The only way to silently have the same behavior is to
> re-introduce the second meaning of partial cooking. (But I would rather
> make it unnecessary.) But that would freeze the 'u' prefix, or at least
> 'ur' ('un-raw') forever. It would be better to introduce a new, separate
> 'p' prefix, to mean partially raw, partially cooked. (But I am opposed to
>
> *I think this non-orthogonal interaction effect was a design mistake and
> that it would have been better to have re do all the cooking needed by also
> interpreting \u and \U sequences. I also think we should add this now for
> 3.3 if possible, to make partial cooking at the parsing stage unnecessary.
> Putting the processing in re makes it work for all strings, not just those
> given as literals.
>
>> If that means we have to add an extra
>> step to the compiler to reject r"\u03b3", so be it.
>
> I do not get this. Surely you cannot mean to suddenly start rejecting, in
> 3.3, a large set of perfectly legal and sensible 6 and 10 character
> sequences when embedded in literals?

Sorry, I meant rejecting ru"...." (and ur"....") if it contains a \u or \U
escape that would be expanded by Python 2.

>> Hm. I still encounter enough environments that don't know how to display
>> such characters that I would prefer to have a rock solid \u escape
>> mechanism. I can think of two ways to support "expanded" unicode
>> characters in raw strings a la Python 2;
>>
>> (a) let the re module interpret the escapes (like it does for \r and \n);
>
> As said above, I favor this. The 2.x partial cooking (with 'ur' prefix) was
> primarily a substitute for this.
>
>> (b) the user can write r"someblah" "\u03b3" r"moreblah".
>
> This is somewhat orthogonal to (a). Users can do this whenever they want
> partial processing of backslashes without doubling those they want left as
> is. A generic example is r'someraw' 'somecooked' r'moreraw' 'morecooked'.
>
> --
> Terry Jan Reedy

--
--Guido van Rossum (python.org/~guido)
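
For concreteness, options (a) and (b) above can be sketched roughly as follows. This is only an illustration, assuming Python 3.3 or later (where the re module accepts \u and \U escapes in str patterns); the variable names are made up for the example and do not come from the thread.

```python
import re

# In Python 3 a raw literal is never "partially cooked": the backslash
# stays, so these are the 6 and 10 character sequences mentioned above.
raw_gamma = r"\u03b3"       # backslash, 'u', four hex digits
raw_astral = r"\U0001F600"  # backslash, 'U', eight hex digits
cooked_gamma = "\u03b3"     # a single character, GREEK SMALL LETTER GAMMA

# (a) Let re interpret the escape: the raw pattern still matches the
# real character because re expands \uXXXX itself (Python 3.3+).
match = re.search(raw_gamma, "alpha " + cooked_gamma + " beta")
found = match.group() if match else None

# (b) Mix raw and cooked pieces via adjacent-literal concatenation,
# which the compiler joins into one string.
mixed = r"some\blah" "\u03b3" r"more\blah"
```

With (b), the backslashes in the raw pieces are left alone and only the middle piece is cooked, so nothing has to be doubled.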
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com