Martin Jericho wrote:
> I want to parse an HTML document using Neko, and all I want to find out
> is the character position of particular nodes in the input stream. I
You have to make a distinction between "character position"
and "byte offset" into the source file. They are not equivalent
and can vary greatly depending on the character encoding of
the file.
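
To make the distinction concrete, here is a minimal sketch in plain Java
(nothing Neko-specific). The class name OffsetDemo, the helper byteLength,
and the sample markup are made up for illustration. It takes one character
position and shows how many bytes precede it under three encodings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class OffsetDemo {
    // Number of bytes needed to encode the first charCount chars of text
    // in the given charset (hypothetical helper, for illustration only).
    static int byteLength(String text, int charCount, Charset cs) {
        return text.substring(0, charCount).getBytes(cs).length;
    }

    public static void main(String[] args) {
        // Sample markup containing one non-ASCII character (e with acute accent).
        String html = "<p>caf\u00e9</p><b>x</b>";
        int charPos = html.indexOf("<b>");

        System.out.println("char position: " + charPos);
        System.out.println("ISO-8859-1 bytes: "
                + byteLength(html, charPos, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 bytes: "
                + byteLength(html, charPos, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE bytes: "
                + byteLength(html, charPos, StandardCharsets.UTF_16BE));
    }
}
```

For this input the character position of the <b> element is 11, while the
number of bytes in front of it is 11 in ISO-8859-1, 12 in UTF-8, and 22 in
UTF-16BE.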
> saw the XMLLocator interface, which I presume allows me to find out the
> line number and the column within the line, but it doesn't include the
> position from the start of the stream.
This is because it is very difficult to map back to the
original byte offset of the source file. Unless I wanted to
re-implement all of the character decoders, that is...
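
If a character position (rather than a byte offset) is enough, one possible
workaround is to keep the decoded text around and convert a line/column pair
into a character offset yourself. The sketch below is only that, a sketch:
LineIndex is a hypothetical helper, not part of Neko or Xerces, and it
assumes that line and column numbers are 1-based, that the column counts
Java chars, and that "\n", "\r", and "\r\n" each end a line. Whether the
numbers reported by the locator follow exactly these conventions would need
to be checked against the parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps (line, column) to a character offset in decoded text.
class LineIndex {
    // lineStarts.get(i) is the char offset at which line i + 1 begins.
    private final List<Integer> lineStarts = new ArrayList<>();

    LineIndex(CharSequence text) {
        lineStarts.add(0);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                lineStarts.add(i + 1);
            } else if (c == '\r') {
                // Treat "\r\n" as a single line terminator.
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                lineStarts.add(i + 1);
            }
        }
    }

    // Assumes 1-based line and column numbers.
    int offset(int line, int column) {
        return lineStarts.get(line - 1) + (column - 1);
    }

    public static void main(String[] args) {
        String text = "<html>\n<body>\r\n<p>hi</p>\n";
        LineIndex index = new LineIndex(text);
        // Line 3, column 1 is where "<p>" starts in this sample.
        System.out.println(index.offset(3, 1) == text.indexOf("<p>"));
    }
}
```

This only yields positions in the decoded character stream. It says nothing
about byte offsets in the original file.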
> Is there an example somewhere of doing this with Neko? I would really
> appreciate it if someone could help me with this, as I have nearly spent
> the whole day trying to figure it out from the source code.
Because I use the standard Java character decoders, I have
no way of knowing the original byte offsets that correspond
to the resulting Unicode characters.
Could you explain in more detail exactly what information
you are trying to retrieve?
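
As an illustration of the point about the standard decoders, the following
sketch shows one concrete symptom: counting the bytes pulled from the
underlying stream does not tell you where a given character came from,
because InputStreamReader is documented as being allowed to read ahead.
CountingInputStream and ReadAheadDemo are hypothetical names invented for
this example, and the exact amount of read-ahead is an implementation
detail, so treat the printed number for a single character as indicative
only.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper that counts the bytes handed out to its consumer.
class CountingInputStream extends FilterInputStream {
    private long count;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            count++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    long bytesRead() {
        return count;
    }
}

class ReadAheadDemo {
    // Bytes pulled from the stream after decoding up to charsToRead characters.
    static long bytesConsumed(byte[] data, int charsToRead) {
        CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(data));
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        try {
            for (int i = 0; i < charsToRead; i++) {
                if (reader.read() < 0) {
                    break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return in.bytesRead();
    }

    public static void main(String[] args) {
        byte[] data = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        // The reader may already have consumed far more than one byte here.
        System.out.println("bytes consumed after 1 char: " + bytesConsumed(data, 1));
        System.out.println("total bytes: " + data.length);
    }
}
```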
--
Andy Clark * [EMAIL PROTECTED]