On 12/2/06, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
>
> >> it would be a good thing if it could, optionally, be made to report
> >> horizontal whitespace as well.
> >
> > It's remarkably easy to get this out of the existing API
>
> sure, but it would be even easier if I didn't have to write that code
> myself (last time I did that, I needed a couple of tries before the
> parser handled all cases correctly...).
>
> but maybe this could simply be handled by a helper generator in the
> tokenizer module, that simply wraps the standard tokenizer generator
> and inserts whitespace tokens where necessary?

A helper sounds like a promising idea. Anyone interested in
volunteering a patch?

> > keep track
> > of the end position returned by the previous call, and if it's
> > different from the start position returned by the next call, slice the
> > line text from the column positions, assuming the line numbers are the
> > same. If the line numbers differ, something has been eating \n tokens;
> > this shouldn't happen any more with my patch.
>
> you'll still have to deal with multiline strings, right?

No, they are returned as a single token whose start and stop correctly
reflect line/col of the begin and end of the token. My current code
(based on the second patch I gave in this thread and the algorithm
described above) doesn't have to special-case anything except the
ENDMARKER token (to break out of its loop :-).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
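
For illustration only, the algorithm described above (remember the end
position of the previous token; if the next token starts later on the
same line, slice the gap out of the line text) might be sketched as a
wrapper generator like the one below. This is not the code from the
patch in this thread. The names with_whitespace and WHITESPACE are
hypothetical and not part of the tokenize module, and the sketch assumes
the present-day tokenize.generate_tokens() API. It only reports gaps
between tokens on the same line, so indentation on continuation lines
is not covered.

```python
import io
import tokenize

# Hypothetical token type for this sketch; tokenize defines no such constant.
WHITESPACE = -1


def with_whitespace(tokens):
    """Wrap a token stream and insert WHITESPACE tokens for same-line gaps."""
    prev_end = (1, 0)
    for tok in tokens:
        tok_type, text, start, end, line = tok
        # Same line number, but the previous token ended before this one starts:
        # the columns in between are horizontal whitespace.
        if start[0] == prev_end[0] and start[1] > prev_end[1]:
            gap = line[prev_end[1]:start[1]]
            yield (WHITESPACE, gap, prev_end, start, line)
        yield tuple(tok)
        prev_end = end
        # ENDMARKER is the only token that needs special-casing: stop the loop.
        if tok_type == tokenize.ENDMARKER:
            break


source = "x = 1  +  2\n"
readline = io.StringIO(source).readline
tokens = list(with_whitespace(tokenize.generate_tokens(readline)))
```

Multiline strings need no extra handling here because they arrive as a
single token, so the end position already points at the last line of
the string.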