If you read a message or two ahead, yes, I see why that doesn't work.

I will stare at this a bit more so I can feel confident I understand it
end to end, but yes, there is a big problem here because of the byte
masking, and your work on the solution really helps.  I understand why
your example inputs have the problem, and further understand that we
didn't see this sooner because of sheer chance, although I want to look
a little deeper.
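
To make the masking concrete, here is a minimal sketch of what happens
when the high byte of a char is dropped before matching.  This is not
the actual ASCII_CharStream code; the class name MaskDemo and the helper
maskHighByte are made up for illustration, and the `& 0xff` is only my
reading of the masking described in this thread.

```java
class MaskDemo {
    // Hypothetical helper: keep only the low byte of a char,
    // mimicking the high-byte masking discussed in this thread.
    static char maskHighByte(char c) {
        return (char) (c & 0xff);
    }

    public static void main(String[] args) {
        // The two CJK characters used as examples in the quoted message.
        char[] samples = { '\u4e0d', '\u4e2d' };
        for (char c : samples) {
            char masked = maskHighByte(c);
            // Show the original code point next to the byte the matcher would see.
            System.out.printf("U+%04X -> 0x%02X%n", (int) c, (int) masked);
        }
    }
}
```

With the high byte gone, U+4E0D is indistinguishable from a carriage
return (0x0D) and U+4E2D from "-" (0x2D), which is the kind of false
match being reported.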

In the end, I think we will just make, as you suggest, a
VelocityCharStream to keep the confusion to a bare minimum.  I don't
like the idea of moving and renaming - that will just be too confusing
down the road.

I suspect this will be in 1.2 rather than 1.1, unless things turn out to
be lucid and clean - I think we want to beat up any major changes before
declaring production ready.

I'll make a huge test template to make sure nothing slips by.

geir


Michael Zhou wrote:
> 
> > Geir Magnusson Jr. wrote:
> >
> >
> >Re the question about why to hack, I think I see why.
> >
> Yes, I have to hack ASCII_CharStream instead of the generated
> UCode_CharStream, because UCode_CharStream combines every 2 characters
> into 1 character (see UCode_CharStream.ReadChar()), while
> ASCII_CharStream masks the higher byte of every character (see
> ASCII_CharStream.readChar()).  So no existing class can do the work.
> A much more graceful solution is to set the option
> USER_CHAR_STREAM=true in the Parser.jjt file and write a
> VelocityCharStream.java that implements the generated CharStream.java
> interface.  This requires modifying the constructor of Parser so that
> it can instantiate the user-defined CharStream.
> 
> > Heh.  I wasn't arguing that there wasn't a problem - just inquiring
> > about what was going on.
> >
> > I would have guessed that the higher byte came into play with the
> > testcase too.
> No, there IS a problem.  Velocity (JavaCC) ignores the higher byte of
> every Unicode character, so it considers that (U+4e0d) matches "\r",
> (U+4e2d) matches "-", etc.  When the token manager returns the token,
> it gets the characters directly from the buffer (in which the higher
> byte of each character hasn't been masked), so in most cases there
> seems to be no problem.
> 
> Did you test the encodingtest.vm I attached to my last mail?  It can
> cause a parsing error just because the higher byte is masked!
> 
> Looking forward to a future Velocity release that corrects this.
> 
> Best regards,
> Michael Zhou

-- 
Geir Magnusson Jr.                           [EMAIL PROTECTED]
System and Software Consulting
Developing for the web?  See http://jakarta.apache.org/velocity/
"still climbing up to the shoulders..."
