Re: Getting the position of a node in the input stream (using Neko)

Andy Clark 20 Aug 2002 18:19:59 -0000

Martin Jericho wrote:

I want to parse an HTML document using Neko, and all I want to find out is the character position of particular nodes in the input stream. I


You have to make a distinction between "character position"
and "byte offset" into the source file. They are not equivalent
and can vary greatly depending on the character encoding of
the file.
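
To illustrate the distinction, here is a minimal sketch (the class name OffsetDemo and the sample markup are made up for illustration, not part of Neko). It re-encodes the text that precedes a given character position and counts the resulting bytes, which only works when the encoding is known and the decoded text round-trips exactly:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OffsetDemo {
    // Byte offset that corresponds to a character index, found by
    // re-encoding everything before that index in the given charset.
    static int byteOffset(String text, int charIndex, Charset charset) {
        return text.substring(0, charIndex).getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "<p>caf\u00e9</p>";       // "é" is one character
        int charIndex = text.indexOf("</p>");   // character position 7

        // The same character position maps to different byte offsets.
        System.out.println(byteOffset(text, charIndex, StandardCharsets.ISO_8859_1)); // 7
        System.out.println(byteOffset(text, charIndex, StandardCharsets.UTF_8));      // 8
        System.out.println(byteOffset(text, charIndex, StandardCharsets.UTF_16BE));   // 14
    }
}
```

The character position of the closing tag is 7 in every case, but the byte offset depends entirely on how the file was encoded.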

saw the XMLLocator interface, which I presume allows me to find out the line number and the column within the line, but it doesn't include the position from the start of the stream.


This is because it is very difficult to map back to the
original byte offset of the source file. Unless I wanted to
re-implement all of the character decoders, that is...

Is there an example somewhere of doing this with Neko? I would really appreciate it if someone could help me with this, as I have spent nearly the whole day trying to figure it out from the source code.


Because I use the standard Java character decoders, I have
no way of knowing the original byte offsets that correspond
to the resulting Unicode characters.
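
A rough sketch of why the offsets are lost (ReadAheadDemo and CountingInputStream are hypothetical names written for this example, not Neko classes): counting the bytes pulled from the underlying stream does not help, because the standard decoder may read ahead and buffer input before handing back a single character.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ReadAheadDemo {
    /** Counts the bytes handed out to whoever reads from this stream. */
    static class CountingInputStream extends FilterInputStream {
        long count = 0;

        CountingInputStream(InputStream in) { super(in); }

        @Override public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count++;
            return b;
        }

        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }
    }

    /** Reads one character and reports how many bytes the decoder consumed. */
    static long bytesConsumedAfterFirstChar(byte[] data) throws IOException {
        CountingInputStream counting =
            new CountingInputStream(new ByteArrayInputStream(data));
        Reader reader = new InputStreamReader(counting, StandardCharsets.UTF_8);
        reader.read(); // decode a single character
        return counting.count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "<p>hello</p>".getBytes(StandardCharsets.UTF_8);
        // The first character is one byte long, but the decoder is free to
        // have consumed more than that (often the whole small input).
        System.out.println(bytesConsumedAfterFirstChar(data));
    }
}
```

The printed count depends on the decoder's internal buffering, so it tells you how far the decoder has read, not where the current character started.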

Could you explain in more detail exactly what information
you are trying to retrieve?


--
Andy Clark * [EMAIL PROTECTED]

