Hi,

On 2/27/11 1:02 PM, Tomas Gavenciak wrote:
> I think you misunderstand me -- the normalization is currently done
> twice, which seems unnecessary:
>
> First, different kinds of newlines ('\n', '\r' and '\r\n' in Python)
> are replaced by '\n' by
>     source = '\n'.join(unicode(source).splitlines())
That's done because the lexer uses lookbehinds and negative lookbehinds in its regular expressions, which in Python must be fixed width. As such, the newlines have to be normalized to a fixed-length form (and for simplicity's sake I chose to normalize to a Unix newline). That said, the function also currently drops the trailing newline, along with the information about whether such a newline was there or not.
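To make that concrete, here is what that line does (shown with Python 3's str in place of Python 2's unicode):

```python
# All three newline kinds collapse to '\n', and the trailing
# newline is silently dropped in the process.
source = "line one\r\nline two\rline three\n"
normalized = "\n".join(source.splitlines())

print(normalized)                  # line one / line two / line three
print(normalized.endswith("\n"))   # False -- trailing newline is gone
```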

> And then again, in _normalize_newlines(), '\n', '\r' and '\r\n' (as
> given by newline_re) are replaced with the configured newline_sequence.
That then converts \n to the target format. For instance, some people set it to \r\n (probably the only alternative that makes sense these days) for HTTP and Windows environments.

> Dropping the first operation does not change the behaviour except for
> preserving the possible final newline (and that is easily added, see
> below). Also it probably speeds up and clarifies the parsing a little.
It does cause trouble if the source uses Windows newlines, as the lookbehinds break. I don't have the code in front of me right now, but that was the original reason I went with the newline normalization.
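To show why: Python's re module only accepts fixed-width lookbehinds, so a pattern that tolerates both '\r\n' and '\n' cannot be expressed directly:

```python
import re

# Fixed-width lookbehind: accepted.
re.compile(r"(?<=\n)\{%")

# Variable-width lookbehind: rejected by Python's re module.
try:
    re.compile(r"(?<=\r\n|\n)\{%")
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print(exc)  # look-behind requires fixed-width pattern
```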

> I guess that dealing with the last newline is not an issue that would
> deserve a new flag -- if you ALWAYS strip it and state that in the
> docs, that seems like a good solution to me (and is the current
> behaviour). Even if you drop the splitlines() line, it can easily be
> done in _normalize_newlines(). If somebody (like me ;-) wants
> more/fewer newlines, it is easy to just append them. The current docs
> just do not state the current behaviour (nor do they explain what
> newline_sequence is used for) and confuse (me) by stating that
> whitespace is not touched.
I will update the documentation for sure and consider adding a flag that controls the trailing newline.
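Such a flag could be as simple as this sketch (keep_trailing_newline is a name I'm making up here, not an existing option):

```python
# Normalize newlines but optionally restore the final newline that
# splitlines() would otherwise drop.
def normalize(source, keep_trailing_newline=False):
    result = "\n".join(source.splitlines())
    if keep_trailing_newline and source.endswith(("\r\n", "\r", "\n")):
        result += "\n"
    return result
```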

> What I would REALLY appreciate would be an option not to touch
> whitespace (or other characters) at all. Then it would be easier to
> use Jinja for non-HTML templates where the newlines and special
> characters matter (I frequently use Jinja for generating program
> code). The flag could be implemented e.g. by allowing newline_sequence
> to be None (or '', or some other value) and checking for that in
> _normalize_newlines() (the variable newline_sequence is not used
> anywhere else).
Jinja2 currently only normalizes newline whitespace; the rest is kept unchanged. On top of that, it supports a wide range of Unicode whitespace characters as token separators inside Jinja2 blocks. Someone joked recently that this makes it impossible to use Jinja2 to generate Whitespace (the programming language) source code, but so far that was the only use case where the normalization of newlines caused problems. If you have more use cases, I will consider changing the lexer to operate on arbitrary newlines instead of normalizing them upfront.
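For reference, the pass-through behaviour you describe would amount to something like this (a sketch of the proposal, not current Jinja2 behaviour):

```python
import re

newline_re = re.compile(r"(\r\n|\r|\n)")

# Treat newline_sequence=None as "leave the text untouched";
# otherwise normalize as today.
def normalize_newlines(value, newline_sequence):
    if newline_sequence is None:
        return value
    return newline_re.sub(newline_sequence, value)
```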


Regards,
Armin

--
You received this message because you are subscribed to the Google Groups 
"pocoo-libs" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/pocoo-libs?hl=en.