> Geir Magnusson Jr. wrote:
>
>
>Re the question about why to hack, I think I see why.
>
Yes, I have to hack ASCII_CharStream instead of using the generated UCode_CharStream,
because UCode_CharStream combines every two characters into one (see
UCode_CharStream.ReadChar()), while ASCII_CharStream masks off the high byte of every
character (see ASCII_CharStream.readChar()). So no existing class can do the
job. A more graceful solution is to set the option USER_CHAR_STREAM=true in the
Parser.jjt file and write a VelocityCharStream.java that implements the generated
CharStream.java interface. This requires modifying the Parser constructor so that it
can instantiate the user-defined CharStream.
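
To make the difference concrete, here is a rough sketch of the three behaviours. It is not the generated JavaCC code: the class CharStreamModes and its method names are made up for illustration, and the bodies only imitate what I described above.

```java
// Hypothetical sketch: these helpers imitate the behaviours described in
// this mail. They are NOT the JavaCC-generated classes.
class CharStreamModes {
    // ASCII_CharStream-style: keep only the low byte of each character.
    static char maskHighByte(char c) {
        return (char) (c & 0xff);
    }

    // UCode_CharStream-style: fold two consecutive input characters into one.
    static char combinePair(char high, char low) {
        return (char) (((high & 0xff) << 8) | (low & 0xff));
    }

    // What a user-defined stream would do: hand the character over unchanged.
    static char passThrough(char c) {
        return c;
    }

    public static void main(String[] args) {
        char c = '\u4e2d';
        System.out.printf("masked:      U+%04X%n", (int) maskHighByte(c));
        System.out.printf("combined:    U+%04X%n", (int) combinePair('N', '-'));
        System.out.printf("passThrough: U+%04X%n", (int) passThrough(c));
    }
}
```

Only the pass-through variant keeps a character read from a Reader intact, which is why I think a user-defined CharStream is the cleaner fix.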
> Heh. I wasn't arguing that there wasn't a problem - just inquiring
> about what was going on.
>
> I would have guessed that the higher byte came into play with the
> testcase too.
No, there IS a problem. Velocity (JavaCC) ignores the high byte of every Unicode
character, so it treats U+4E0D as matching "\r" (0x0D), U+4E2D as matching "-" (0x2D), etc.
When the token manager returns a token, it takes the characters directly from the buffer
(in which the high byte has not been masked), so in most cases there appears to be no problem.
Did you test the encodingtest.vm I attached to my last mail? It can cause a parsing
error precisely because the high byte is masked!
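
Here is a small sketch of the false match, assuming the lexer compares only the low byte of each character. MaskedMatchDemo and its methods are hypothetical names of mine, not part of Velocity or JavaCC.

```java
// Hypothetical demo of the collision: the matcher sees only the low byte.
class MaskedMatchDemo {
    // Stand-in for the lexer's view of a character (assumption: low byte only).
    static char lexerView(char c) {
        return (char) (c & 0xff);
    }

    // True when a different character is mistaken for the expected ASCII
    // character because its high byte was thrown away.
    static boolean falselyMatches(char actual, char expectedAscii) {
        return actual != expectedAscii && lexerView(actual) == expectedAscii;
    }

    public static void main(String[] args) {
        System.out.println(falselyMatches('\u4e0d', '\r')); // low byte 0x0D
        System.out.println(falselyMatches('\u4e2d', '-'));  // low byte 0x2D
        System.out.println(falselyMatches('\u4e2d', 'a'));  // low bytes differ
    }
}
```

The first two calls report a false match and the third does not. Any character whose low byte happens to equal a character that is significant to the grammar can break parsing in the same way.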
I look forward to a future version of Velocity correcting this.
Best regards,
Michael Zhou