If you read a message or two ahead, yes, I see why that doesn't work.
I will stare at this a bit more so I can feel confident I understand it
end to end, but yes, there is a big problem here because of the byte
masking, and your work on the solution really helps. I understand why
your example inputs have the problem, and further understand that we
didn't see this sooner because of sheer chance, although I want to look
a little deeper.
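To make the byte-masking problem concrete, here is a minimal sketch. The class and method names are made up for illustration, and the one-line mask only imitates the effect discussed in this thread; it is not the actual ASCII_CharStream source. It runs the two characters from Michael's message through the mask:

```java
class MaskDemo {
    // Hypothetical helper: keep only the low byte of a 16-bit char,
    // imitating the masking effect described in this thread.
    static char maskHighByte(char c) {
        return (char) (c & 0xff);
    }

    public static void main(String[] args) {
        char[] samples = { '\u4e0d', '\u4e2d' };
        for (char c : samples) {
            char masked = maskHighByte(c);
            // Print the code point before and after masking.
            System.out.printf("U+%04X -> U+%04X%n", (int) c, (int) masked);
        }
    }
}
```

The first sample collapses to a carriage return (low byte 0x0D) and the second to "-" (low byte 0x2D), so a tokenizer that compares masked values can mistake either one for an ASCII character.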
In the end, I think we will just make, as you suggest, a
VelocityCharStream to keep the confusion to a bare minimum. I don't
like the idea of moving and renaming - that will just be too confusing
down the road.
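A rough sketch of the user-supplied stream idea, under stated assumptions: MiniCharStream is a made-up, cut-down stand-in for the CharStream interface that JavaCC generates with USER_CHAR_STREAM=true (the generated interface declares more methods), and SketchCharStream is a hypothetical name, not the eventual VelocityCharStream. The only point is that readChar() hands back the full 16-bit value:

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical stand-in for the generated CharStream interface;
// reduced to two methods for illustration.
interface MiniCharStream {
    char readChar() throws IOException;
    void backup(int amount);
}

class SketchCharStream implements MiniCharStream {
    private final Reader in;
    // Every char read so far; unbounded here to keep the sketch short.
    private final StringBuilder seen = new StringBuilder();
    // Index of the next char to hand out.
    private int pos = 0;

    SketchCharStream(Reader in) {
        this.in = in;
    }

    public char readChar() throws IOException {
        if (pos == seen.length()) {
            int c = in.read();
            if (c < 0) {
                throw new IOException("end of input");
            }
            seen.append((char) c); // full 16-bit char, no masking
        }
        return seen.charAt(pos++);
    }

    public void backup(int amount) {
        // Step back so the last 'amount' chars are returned again.
        pos = Math.max(0, pos - amount);
    }
}
```

A real implementation would also need line and column tracking, token image retrieval, and a bounded buffer, all of which are left out here.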
I suspect this will be in 1.2 rather than 1.1, unless things turn out to
be lucid and clean - I think we want to beat up any major changes before
declaring production ready.
I'll make a huge test template to make sure nothing slips by.
geir
Michael Zhou wrote:
>
> > Geir Magnusson Jr. wrote:
> >
> > Re the question about why to hack, I think I see why.
> >
> Yes, I have to hack ASCII_CharStream instead of the generated
> UCode_CharStream, because UCode_CharStream combines every two characters
> into one character (see UCode_CharStream.ReadChar()), while
> ASCII_CharStream masks off the high byte of every character (see
> ASCII_CharStream.readChar()). So no existing class can do the job. A much
> more graceful solution is to set the option USER_CHAR_STREAM=true in the
> Parser.jjt file and write a VelocityCharStream.java that implements the
> generated CharStream.java interface. This requires modifying the Parser
> constructor so that it can instantiate the user-defined CharStream.
>
> > Heh. I wasn't arguing that there wasn't a problem - just inquiring
> > about what was going on.
> >
> > I would have guessed that the higher byte came into play with the
> > testcase too.
> No, there IS a problem. Velocity (JavaCC) ignores the high byte of every
> Unicode character, so it considers U+4E0D to match "\r", U+4E2D to match
> "-", etc. When the token manager returns a token, it takes the characters
> directly from the buffer (where the high byte has not been masked), so in
> most cases there seems to be no problem.
>
> Did you test the encodingtest.vm I attached to my last mail? It can cause
> a parsing error precisely because the high byte is masked!
>
> I look forward to a future Velocity release that corrects this.
>
> Best regards,
> Michael Zhou
--
Geir Magnusson Jr. [EMAIL PROTECTED]
System and Software Consulting
Developing for the web? See http://jakarta.apache.org/velocity/
"still climbing up to the shoulders..."