Re: [Python-Dev] Small tweak to tokenize.py?

Guido van Rossum Sat, 02 Dec 2006 10:06:58 -0800

On 12/2/06, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
>
> >> it would be a good thing if it could, optionally, be made to report
> >> horizontal whitespace as well.
> >
> > It's remarkably easy to get this out of the existing API
>
> sure, but it would be even easier if I didn't have to write that code
> myself (last time I did that, I needed a couple of tries before the
> parser handled all cases correctly...).
>
> but maybe this could simply be handled by a helper generator in the
> tokenizer module, that simply wraps the standard tokenizer generator
> and inserts whitespace tokens where necessary?


A helper sounds like a promising idea. Anyone interested in
volunteering a patch?

> > keep track
> > of the end position returned by the previous call, and if it's
> > different from the start position returned by the next call, slice the
> > line text from the column positions, assuming the line numbers are the
> > same. If the line numbers differ, something has been eating \n tokens;
> > this shouldn't happen any more with my patch.
>
> you'll still have to deal with multiline strings, right?

No, they are returned as a single token whose start and stop correctly
reflect line/col of the begin and end of the token. My current code
(based on the second patch I gave in this thread and the algorithm
described above) doesn't have to special-case anything except the
ENDMARKER token (to break out of its loop :-).
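
For illustration, here is a minimal sketch of what such a helper generator
might look like, following the algorithm described above. It is written
against the present-day tokenize API (generate_tokens and TokenInfo), not
the 2006 module, and it is not the code from the patches in this thread.
The name tokens_with_whitespace and the WHITESPACE token type are
hypothetical; nothing like them exists in the standard library. The sketch
slices the gap text from its own copy of the source lines rather than from
each token's line attribute, which is an assumption made for robustness.

```python
import io
import tokenize

# Hypothetical token type for horizontal whitespace; the value is arbitrary
# and is NOT defined by the tokenize module.
WHITESPACE = 1000
tokenize.tok_name[WHITESPACE] = "WHITESPACE"  # so TokenInfo repr() works


def tokens_with_whitespace(source):
    """Wrap the standard tokenizer and insert WHITESPACE tokens in the gaps.

    Tracks the end position of the previous token; when the next token
    starts further right on the same line, the text in between is sliced
    out of the line and yielded as an extra token.
    """
    lines = io.StringIO(source).readlines()  # physical lines, split on "\n"
    prev_row, prev_col = 1, 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.ENDMARKER:
            # The only special case: pass it through and leave the loop.
            yield tok
            break
        row, col = tok.start
        if row != prev_row:
            # The previous token ended on an earlier line, so any gap on
            # this line starts at column 0.
            prev_col = 0
        if col > prev_col:
            line = lines[row - 1]
            yield tokenize.TokenInfo(
                WHITESPACE, line[prev_col:col], (row, prev_col), (row, col), line
            )
        yield tok
        # A multi-line string is one token, so its end position already
        # points at the right line and column.
        prev_row, prev_col = tok.end


if __name__ == "__main__":
    for t in tokens_with_whitespace("def f(a,  b):\n    return a +  b\n"):
        print(tokenize.tok_name[t.type], repr(t.string))
```

One known gap in this sketch: with a backslash continuation the line
numbers differ without an intervening NL token, and the text between the
last token and the end of that line (including the backslash) is not
reported.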

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
