> Geir Magnusson Jr. wrote:
> 
>
>Re the question about why to hack, I think I see why.
>
Yes, I have to hack ASCII_CharStream rather than use the generated UCode_CharStream, 
because UCode_CharStream combines every 2 characters into 1 character (see 
UCode_CharStream.ReadChar()), while ASCII_CharStream masks off the high byte of every 
character (see ASCII_CharStream.readChar()).  So no existing class can do the 
work.  A much more graceful solution is to set the option USER_CHAR_STREAM=true in the 
Parser.jjt file and write a VelocityCharStream.java that implements the generated 
CharStream.java interface.  This requires modifying the Parser constructor so that it 
can instantiate the user-defined CharStream.
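
As a rough sketch of the idea only: the MiniCharStream interface below is a 
hypothetical, cut-down stand-in for the CharStream interface that JavaCC generates, 
which declares more methods (token start, backup, image and so on).  The class name 
VelocityCharStreamSketch and the end-of-input handling are assumptions too.  The 
point is simply that readChar() hands back each character unchanged.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for the generated CharStream interface.
// The real interface has more methods; only readChar() is sketched here.
interface MiniCharStream {
    char readChar() throws IOException;
}

// Sketch of a user-defined stream that keeps all 16 bits of each character.
final class VelocityCharStreamSketch implements MiniCharStream {
    private final Reader in;

    VelocityCharStreamSketch(Reader in) {
        this.in = in;
    }

    @Override
    public char readChar() throws IOException {
        int c = in.read();
        if (c < 0) {
            // Assumption: end of input is reported as an IOException.
            throw new IOException("end of input");
        }
        return (char) c; // no "& 0xFF" here, so the high byte survives
    }

    public static void main(String[] args) throws IOException {
        MiniCharStream s = new VelocityCharStreamSketch(new StringReader("\u4e2d-"));
        System.out.println(s.readChar() == '\u4e2d'); // the CJK character, intact
        System.out.println(s.readChar() == '-');      // a real hyphen
    }
}
```

With a stream like this, the CJK character and the hyphen stay distinct by the time 
the token manager compares them.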

> Heh.  I wasn't arguing that there wasn't a problem - just inquiring
> about what was going on.
> 
> I would have guessed that the higher byte came into play with the
> testcase too.
No, there IS a problem.  Velocity (JavaCC) ignores the high byte of every Unicode 
character, so it treats U+4E0D as matching "\r" (U+000D), U+4E2D as matching "-" 
(U+002D), etc.  When the token manager returns a token, it takes the characters directly 
from the buffer (in which the high byte has not been masked), so in most cases there 
seems to be no problem.
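
The collision is easy to reproduce outside Velocity.  This is only a minimal sketch: 
the class and method names are made up for illustration, and I am assuming the 
masking amounts to keeping the low 8 bits of each char.

```java
final class HighByteMaskDemo {
    // Assumed masking step: keep only the low 8 bits of a UTF-16 code unit.
    static char maskHighByte(char c) {
        return (char) (c & 0xFF);
    }

    public static void main(String[] args) {
        char[] samples = { '\u4e0d', '\u4e2d' };
        for (char c : samples) {
            char masked = maskHighByte(c);
            // Print the original and the masked code unit in hex.
            System.out.printf("U+%04X -> U+%04X%n", (int) c, (int) masked);
        }
    }
}
```

Under that assumption it prints U+4E0D -> U+000D and U+4E2D -> U+002D, i.e. a 
carriage return and a hyphen.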

Did you test the encodingtest.vm I attached to my last mail?  It can cause a parsing 
error precisely because the high byte is masked!

I look forward to a future version of Velocity correcting this.

Best regards,
Michael Zhou

