Martin Jericho wrote:
I want to parse an HTML document using Neko, and all I want to find out is the character position of particular nodes in the input stream. I

You have to make a distinction between "character position" and "byte offset" into the source file. They are not equivalent and can vary greatly depending on the character encoding of the file.
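To make the distinction concrete, here is a minimal sketch in plain Java (not Neko code; the class and method names are made up for illustration). It takes one character position in a decoded string and computes the byte offset that position would have under three different encodings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OffsetDemo {
    // Hypothetical helper: number of bytes that encode the first
    // charPos characters of text in the given charset.
    static int byteOffset(String text, int charPos, Charset charset) {
        return text.substring(0, charPos).getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 <b>";     // "café <b>"
        int charPos = text.indexOf('<');   // character position of the tag

        System.out.println("char position: " + charPos);
        System.out.println("ISO-8859-1 byte offset: "
                + byteOffset(text, charPos, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 byte offset: "
                + byteOffset(text, charPos, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE byte offset: "
                + byteOffset(text, charPos, StandardCharsets.UTF_16BE));
    }
}
```

The tag sits at character position 5 in every case, but its byte offset is 5 in ISO-8859-1, 6 in UTF-8 and 10 in UTF-16BE.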

saw the XMLLocator interface, which I presume allows me to find out the line number and the column within the line, but it doesn't include the position from the start of the stream.

This is because it is very difficult to map back to the original byte offset of the source file. Unless I wanted to re-implement all of the character decoders, that is...

Is there an example somewhere of doing this with Neko? I would really appreciate it if someone could help me with this, as I have nearly spent the whole day trying to figure it out from the source code.

Because I use the standard Java character decoders, I have no way of knowing the original byte offsets that correspond to the resulting Unicode characters.
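If a character position (rather than a byte offset) is enough, one possible workaround is to keep the decoded text yourself and convert a line/column pair into a character offset. The sketch below is only an illustration under stated assumptions: the class and method names are hypothetical, line and column are assumed to be 1-based, and the column is assumed to count Java chars. Whether the parser's reported columns follow those conventions (for tabs or surrogate pairs, say) would need to be checked.

```java
public class LineColumnOffset {
    /**
     * Hypothetical helper: converts a 1-based line and column into a
     * 0-based character offset into the decoded text. Treats "\n",
     * "\r\n" and "\r" as line terminators. Returns -1 if the line does
     * not exist or the offset falls beyond the end of the text. It does
     * not check that the column stays within the given line.
     */
    static int toCharOffset(CharSequence text, int line, int column) {
        if (line < 1 || column < 1) {
            return -1;
        }
        int n = text.length();
        int currentLine = 1;
        int i = 0;
        // Advance i to the first character of the requested line.
        while (currentLine < line && i < n) {
            char c = text.charAt(i++);
            if (c == '\n') {
                currentLine++;
            } else if (c == '\r') {
                if (i < n && text.charAt(i) == '\n') {
                    i++; // treat "\r\n" as a single terminator
                }
                currentLine++;
            }
        }
        if (currentLine < line) {
            return -1; // the text has fewer lines than requested
        }
        int offset = i + column - 1;
        return offset <= n ? offset : -1;
    }

    public static void main(String[] args) {
        String html = "<html>\n<body>\r\n<p>x</p>";
        int offset = toCharOffset(html, 3, 1);
        System.out.println(offset + " -> " + html.charAt(offset));
    }
}
```

This gives a position in the decoded character stream only; it says nothing about where the corresponding bytes sit in the original file.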

Could you explain in more detail exactly what information you are trying to retrieve?


-- Andy Clark * [EMAIL PROTECTED]

