Martin Jericho wrote:
> > to implement the same thing in NekoHTML. But in neither
> > case do we track "character offsets", which I think has
> > limited usefulness but others disagree.
> Hopefully my arguments below will help to convince you of their usefulness.

I can understand the cases in which people would like to be able to do this, but I also realize what it would take to implement it. ;)

The "limited usefulness" that I was referring to was the
fact that reporting character offsets only works if the
parsed source is already a character stream. If it's
anything else (say a byte stream in UTF-8 or Shift_JIS)
then the application can't map those offsets back to the
source without re-reading the file.
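To make the byte-stream point concrete, here is a minimal sketch. It uses a hypothetical Java helper (not part of NekoHTML or XNI) and assumes a UTF-8 source. Once a multi-byte character appears, a character offset counted after decoding no longer matches the byte offset in the file, and converting one to the other means decoding the source again. The same holds for Shift_JIS or any other multi-byte encoding.

```java
import java.nio.charset.StandardCharsets;

/**
 * Illustration only (hypothetical helper, not part of NekoHTML or XNI):
 * character offsets counted after decoding do not line up with byte
 * positions in a multi-byte source such as UTF-8.
 */
public class OffsetMapping {

    /**
     * Converts a character offset (in UTF-16 code units, as Java strings
     * count them) into the matching byte offset in UTF-8 source bytes.
     * Assumes the bytes are valid UTF-8 and that charOffset does not fall
     * in the middle of a surrogate pair. Note that the whole source has to
     * be decoded again to answer the question.
     */
    static int charOffsetToUtf8ByteOffset(byte[] utf8Source, int charOffset) {
        String decoded = new String(utf8Source, StandardCharsets.UTF_8);
        // Re-encode the text before the offset and count how many bytes it used.
        String prefix = decoded.substring(0, charOffset);
        return prefix.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) takes two bytes in UTF-8.
        byte[] source = "<p>n\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        int charOffset = 5; // position of the '<' that starts "</p>"
        int byteOffset = charOffsetToUtf8ByteOffset(source, charOffset);
        System.out.println("char offset " + charOffset + " -> byte offset " + byteOffset);
    }
}
```

Running `main` prints `char offset 5 -> byte offset 6`: the two-byte é shifts every later byte position by one, which an application holding only the character offset cannot know without re-reading the file.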

> > Because "no-change" has the potential of producing XML
> > that is not well-formed. And the whole purpose of
> > NekoHTML is to parse HTML and make it appear as XML.
> So Neko has to do this because otherwise the underlying Xerces parser would not be able to parse it. Is that right?

I don't use the Xerces2 parser to implement NekoHTML -- I only use the XNI framework and some utility classes. The NekoHTML scanner is written completely from scratch to be able to handle HTML.

> This would not be of any concern to me anyway if character positions were reported; I was just using it as an example to demonstrate that you can't get Neko to output the original source unchanged.

Ugh. ;)

> > Please let me know more detail about these bugs so
> > that I can fix them. Minimal sample files would be
> > preferable.
> I have attached the relevant files.

Thanks for attaching those! I fixed the problem and have included your sample input file (albeit a little bit more stripped down) in my set of regression tests. So if I ever break it, I'll know right away. :)

I changed the behavior of <COL> to *not* automatically
insert a <COLGROUP> as its parent. Is this the behavior
you were expecting? Also, as a general question, do
you think that NekoHTML should insert a <TBODY> parent
for <TR> elements? I notice that Mozilla inserts one.
<aside>The DOM Inspector rules!</aside>

> I get the feeling that this would have to be implemented in the XNI framework rather than as a Neko improvement.

If it were to be added, the place would be in the XNI interfaces, which would then be implemented in NekoHTML.

> I would love to get [...] buffer, not the original source. In fact the only HTML parser I have found which does it is the one in the Swing package, although I still haven't tested it properly. If that doesn't do what I want, I might even have to write my own from scratch.

Well, whatever you do, take advantage of all of the code that's available. I hope NekoHTML can be of use, but if not, that's OK, too.

I'm trying to get a new version of NekoHTML posted
"real soon now". I will make an announcement when
it's ready.

--
Andy Clark * [EMAIL PROTECTED]

