Re: Getting the position of a node in the input stream (using Neko)

Andy Clark 25 Aug 2002 18:07:20 -0000

Martin Jericho wrote:

 > to implement the same thing in NekoHTML. But in neither
 > case do we track "character offsets", which I think has
 > limited usefulness but others disagree.
Hopefully my arguments below will help to convince you of their usefulness.


I can understand the cases in which people would like to
be able to do this but I also realize what it would take
to implement it. ;)

The "limited usefulness" that I was referring to was the
fact that reporting character offsets only works if the
parsed source is already a character stream. If it's
anything else (say a byte stream in UTF8 or Shift_JIS)
then the application can't map those offsets back to the
source without re-reading the file.
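To make the mismatch concrete, here is a minimal sketch, not NekoHTML code. It assumes the source is UTF-8; the class and method names (OffsetDemo, byteOffsetFor) are made up for illustration. The offset of a tag in the decoded characters differs from its offset in the original bytes as soon as a multi-byte character precedes it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; nothing here is part of NekoHTML or XNI.
class OffsetDemo {

    // Returns the UTF-8 byte offset that corresponds to a character
    // (UTF-16 code unit) offset in the decoded text. It re-encodes the
    // prefix, which is exactly the "re-reading" cost described above.
    // Assumes charOffset does not split a surrogate pair.
    static int byteOffsetFor(String text, int charOffset) {
        return text.substring(0, charOffset)
                   .getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "\u00e9" (e with acute accent) is 1 char but 2 bytes in UTF-8.
        String text = "caf\u00e9 <b>bold</b>";
        int charOffset = text.indexOf("<b>");
        int byteOffset = byteOffsetFor(text, charOffset);
        System.out.println("char offset = " + charOffset); // 5
        System.out.println("byte offset = " + byteOffset); // 6
    }
}
```

With Shift_JIS or any other variable-width encoding the same drift appears, so an application holding only the raw bytes cannot use a reported character offset directly.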

 > Because "no-change" has the potential of producing XML
 > that is not well-formed. And the whole purpose of
 > NekoHTML is to parse HTML and make it appear as XML.
So Neko has to do this because otherwise the underlying xerces parser would not be able to parse it. Is that right? This would not be of any


I don't use the Xerces2 parser to implement NekoHTML --
I only use the XNI framework and some utility classes.
The NekoHTML scanner is written completely from scratch
to be able to handle HTML.

concern to me anyway if character positions were reported, I was just using it as an example to demonstrate that you can't get Neko to output the original source unchanged.


Ugh. ;)

 > Please let me know more detail about these bugs so
 > that I can fix them. Minimal sample files would be
 > preferable.
I have attached the relevant files.


Thanks for attaching those! I fixed the problem and
have included your sample input file (albeit a little
bit more stripped down) in my set of regression tests.
So if I ever break it, I'll know right away. :)

I changed the behavior of <COL> to *not* automatically
insert a <COLGROUP> as its parent. Is this the behavior
you were expecting? Also, as a general question, do
you think that NekoHTML should insert a <TBODY> parent
for <TR> elements? I notice that Mozilla inserts one.
<aside>The DOM Inspector rules!</aside>
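For anyone wondering what the <TBODY> insertion would look like in the resulting tree, here is a rough sketch using only the standard org.w3c.dom API. It is not how NekoHTML does (or would do) it, since NekoHTML balances tags while scanning rather than rewriting a finished DOM. The class and method names (TbodyDemo, wrapRowsInTbody) are hypothetical, and collecting all rows into one <TBODY> is a simplification.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class; not part of NekoHTML.
class TbodyDemo {

    // Moves every <TR> that is a direct child of the table into a
    // single new <TBODY>, inserted where the first such row was.
    static void wrapRowsInTbody(Element table) {
        Document doc = table.getOwnerDocument();
        Element tbody = null;
        Node child = table.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // save before moving child
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && "TR".equalsIgnoreCase(child.getNodeName())) {
                if (tbody == null) {
                    tbody = doc.createElement("TBODY");
                    table.insertBefore(tbody, child);
                }
                tbody.appendChild(child); // appendChild re-parents the row
            }
            child = next;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element table = doc.createElement("TABLE");
        doc.appendChild(table);
        table.appendChild(doc.createElement("TR"));
        table.appendChild(doc.createElement("TR"));

        wrapRowsInTbody(table);

        Node first = table.getFirstChild();
        System.out.println(first.getNodeName());               // TBODY
        System.out.println(first.getChildNodes().getLength()); // 2
    }
}
```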

I get the feeling that this would have to be implemented in the XNI framework rather than as a Neko improvement. I would love to get


If it were to be added, the place would be in the
XNI interfaces which would then be implemented in
NekoHTML.

buffer, not the original source. In fact the only HTML parser I have found which does it is the one in the swing package, although I still haven't tested it properly. If that doesn't do what I want, I might even have to write my own from scratch.


Well, whatever you do, take advantage of all of the
code that's available. I hope NekoHTML can be of use
but if not then that's ok, too.

I'm trying to get a new version of NekoHTML posted
"real soon now". I will make an announcement when
it's ready.

--
Andy Clark * [EMAIL PROTECTED]


