To answer Larry's question, there's an overwhelming number of different options -- bytes/unicode, raw/cooked, and (in Py2) `from __future__ import unicode_literals`. So it's easier to do the actual semantic conversion in a later stage -- then the lexer only has to worry about hopping over backslashes.
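A minimal sketch of that two-stage split, in case it helps: the tokenizer hands back the literal's source text untouched, and a later step (here `ast.literal_eval`, as Eric suggests below) does the conversion according to the prefix. The helper name `string_literals` and the sample input are made up for illustration; they are not part of any stdlib API.

```python
import ast
import io
import tokenize


def string_literals(source: bytes):
    """Yield (raw_token_text, evaluated_value) for each STRING token.

    Hypothetical helper: stage one is the tokenizer, which only finds
    where the literal starts and ends; stage two is ast.literal_eval,
    which interprets prefixes and escape sequences.
    """
    for tok in tokenize.tokenize(io.BytesIO(source).readline):
        if tok.type == tokenize.STRING:
            yield tok.string, ast.literal_eval(tok.string)


# rb'' keeps the backslashes literal in this sample input.
source = rb'''x = "\u1234"
y = r"\u1234"
'''

pairs = list(string_literals(source))
for raw, value in pairs:
    print(repr(raw), "->", repr(value))
```

The same escape text ends up as one character for the plain literal and as six characters for the raw one, which is the kind of prefix-dependent decision the lexer gets to skip.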
On Thu, May 17, 2018 at 3:38 PM, Eric V. Smith <e...@trueblade.com> wrote:

> On 5/17/2018 3:01 PM, Larry Hastings wrote:
>
>> I fed this into tokenize.tokenize():
>>
>>     b''' x = "\u1234" '''
>>
>> I was a bit surprised to see \Uxxxx in the output.  Particularly because
>> the output (t.string) was a *string* and not *bytes*.
>
> For those (like me) who have no idea how to use tokenize.tokenize's wacky
> interface, the test code is:
>
>     list(tokenize.tokenize(io.BytesIO(b''' x = "\u1234" ''').readline))
>
>> Maybe I'm making a parade of my ignorance, but I assumed that string
>> literals were parsed by the parser--just like everything else is parsed by
>> the parser, hey it seems like a good place for it--and in particular that
>> the escape sequence substitutions would be done in the tokenizer.  Having
>> stared at it a little, I now detect a whiff of "this design solved a real
>> problem".  So... what was the problem, and how does this design solve it?
>
> I assume the intent is to not throw away any information in the lexer, and
> give the parser full access to the original string. But that's just a guess.
>
>> BTW, my use case is that I hoped to use CPython's tokenizer to parse some
>> Python-ish-looking text and handle double-quoted strings for me.
>> *Especially* all the escape sequences--leveraging all CPython's support for
>> funny things like \U{penguin}.  The current behavior of the tokenizer makes
>> me think it'd be easier to roll my own!
>
> Can you feed the token text to the ast?
>
> >>> ast.literal_eval('"\u1234"')
> 'ሴ'
>
> Eric
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org

-- 
--Guido van Rossum (python.org/~guido)