Hi,

On 2/27/11 1:02 PM, Tomas Gavenciak wrote:
> I think you misunderstand me -- the normalization is currently done
> twice, which seems unnecessary:
>
> First, different kinds of newlines ('\n', '\r' and '\r\n' in Python)
> are replaced by '\n' by
>     source = '\n'.join(unicode(source).splitlines())
That's done because the lexer uses lookbehinds and negative lookbehinds in its regular expressions, which in Python must be fixed width. As such, the newlines have to be normalized to a fixed-length form (and for simplicity's sake I chose to normalize to a Unix newline). That said, the function also currently drops the trailing newline, along with the information about whether such a newline was there or not.
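To make that concrete, here is what that line does (shown with Python 3's str in place of Python 2's unicode):

```python
# All three newline kinds collapse to '\n', and the trailing
# newline is silently dropped in the process.
source = "line one\r\nline two\rline three\n"
normalized = "\n".join(source.splitlines())

print(normalized)                  # line one / line two / line three
print(normalized.endswith("\n"))   # False -- trailing newline is gone
```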

> And then again, in _normalize_newlines(), '\n', '\r' and '\r\n' (as
> given by newline_re) are replaced with the configured newline_sequence.
That then converts \n to the target format. For instance, some people set it to \r\n (probably the only alternative that makes sense these days) for HTTP and Windows environments.

> Dropping the first operation does not change the behaviour except for
> preserving the possible final newline (and that is easily added, see
> below). Also it probably speeds up and clarifies the parsing a little.
It does cause trouble if the source uses Windows newlines, as the lookbehinds break. I don't have the code in front of me right now, but that was the original reason I went with the newline normalization.
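To show why: Python's re module only accepts fixed-width lookbehinds, so a pattern that tolerates both '\r\n' and '\n' cannot be expressed directly:

```python
import re

# Fixed-width lookbehind: accepted.
re.compile(r"(?<=\n)\{%")

# Variable-width lookbehind: rejected by Python's re module.
try:
    re.compile(r"(?<=\r\n|\n)\{%")
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print(exc)  # look-behind requires fixed-width pattern
```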

> I guess that dealing with the last newline is not an issue that would
> deserve a new flag -- if you ALWAYS strip it and state that in the
> docs, that seems like a good solution to me (and is the current
> behaviour). Even if you drop the splitlines() line, it can easily be
> done in _normalize_newlines(). If somebody (like me ;-) wants
> more/fewer newlines, it is easy to just append them. The current docs
> just do not state the current behaviour (nor do they explain what
> newline_sequence is used for) and confuse (me) by stating that
> whitespace is not touched.
I will update the documentation for sure and consider adding a flag that controls the trailing newline.
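Such a flag could be as simple as this sketch (keep_trailing_newline is a name I'm making up here, not an existing option):

```python
# Normalize newlines but optionally restore the final newline that
# splitlines() would otherwise drop.
def normalize(source, keep_trailing_newline=False):
    result = "\n".join(source.splitlines())
    if keep_trailing_newline and source.endswith(("\r\n", "\r", "\n")):
        result += "\n"
    return result
```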

> What I would REALLY appreciate would be an option not to touch
> whitespace (or other characters) at all. Then it would be easier to
> use Jinja for non-HTML templates where the newlines and special
> characters matter (I frequently use Jinja for generating program
> code). The flag could be implemented e.g. by allowing newline_sequence
> to be None (or '', or some other value) and checking for that in
> _normalize_newlines() (the variable newline_sequence is not used
> anywhere else).
Jinja2 currently only normalizes newline whitespace; the rest is kept unchanged. On top of that, it supports a wide range of Unicode whitespace characters as token separators inside Jinja2 blocks. Someone joked recently that this makes it impossible to use Jinja2 to generate Whitespace (the programming language) source code, but so far that was the only use case where the normalization of newlines caused problems. If you have more use cases, I will consider changing the lexer to operate on arbitrary newlines instead of normalizing them upfront.
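For reference, the pass-through behaviour you describe would amount to something like this (a sketch of the proposal, not current Jinja2 behaviour):

```python
import re

newline_re = re.compile(r"(\r\n|\r|\n)")

# Treat newline_sequence=None as "leave the text untouched";
# otherwise normalize as today.
def normalize_newlines(value, newline_sequence):
    if newline_sequence is None:
        return value
    return newline_re.sub(newline_sequence, value)
```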


Regards,
Armin

--
You received this message because you are subscribed to the Google Groups 
"pocoo-libs" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/pocoo-libs?hl=en.