Re: [Announce] The CyberNeko Tools for XNI 2002.07.17 Available

Andy Clark Mon, 22 Jul 2002 08:49:08 -0700

Elliotte Rusty Harold wrote:
> You are working under the misapprehension that this is your choice to 
> make. It isn't. No matter how valid your criticisms of XML are, you do 
> not get to decide what is and isn't in the language. That was decided 
> years ago by the XML working group and the W3C. Sometimes they decided 
> right. Sometimes they decided wrong. But they have decided, and it is 
> now up to implementers to follow the spec as it's written, not to write 
> their own half-XML/half meat-by-product parser.


Elliotte, you make some good points, but I think you're
overreacting; I don't think it's as bad as you say. That
said, if a user goes to the XPP page and expects a pull
parser implementing this interface to be fully compliant
with XML 1.0 out of the box, then that's clearly a
problem. The problem, though, is in the documentation
(and perhaps the marketing).

There are many situations where a fully compliant XML
parser is not needed. For example, take any closed
application system that generates only well-formed,
valid XML documents. The consumers in such a system do
not need a fully compliant parser, and if they chose to
use one, they would very likely not hit their
transactions/second requirements. However, I do agree
that programmers who choose this path must be fully
aware of what they are doing.

On a related topic, I've been working on my own pull-
parsing API based on XNI. I have a preliminary design
implemented using the Xerces2 standard parser config
as the driver. It's working quite well and allows all
of the functionality of Xerces2 natively, including
augmentations such as PSVI.

The only downside I'm seeing is performance. This
isn't due to the API but to my implementation. The
converter from document handler callbacks to
pull-parser events has to buffer the data coming from
the parser configuration. If you're processing small
documents you probably wouldn't notice it, but as the
document size increases, the constant character array
copying kills you.
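
To make the copying concrete, here is a minimal sketch
of the kind of converter described above. It is not the
XNI or Xerces2 API; every name in it (Converter,
TextEvent, feed) is hypothetical, and the "scanner" is
just a loop that refills one shared char array. The
point is only that a push callback handing over a
reused buffer forces the converter to copy each slice
before it can queue the event for a pull consumer.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// Hypothetical names throughout; this is not the XNI or Xerces2 API.
class CopyingConverterSketch {

    /** A queued pull event holding its own private copy of the text. */
    static final class TextEvent {
        final char[] text;
        TextEvent(char[] text) { this.text = text; }
        @Override public String toString() { return new String(text); }
    }

    /** Receives push callbacks and queues them for a pull-style consumer. */
    static final class Converter {
        private final Queue<TextEvent> pending = new ArrayDeque<>();
        long charsCopied = 0;

        // Push side: the producer may overwrite buf as soon as this returns,
        // so the converter copies the slice before queueing it.
        void characters(char[] buf, int offset, int length) {
            pending.add(new TextEvent(
                    Arrays.copyOfRange(buf, offset, offset + length)));
            charsCopied += length;
        }

        // Pull side: returns the next queued event, or null when none is left.
        TextEvent next() { return pending.poll(); }
    }

    /** Simulates a scanner that reuses one buffer for every chunk (max 64 chars each). */
    static Converter feed(String... chunks) {
        Converter converter = new Converter();
        char[] shared = new char[64];
        for (String chunk : chunks) {
            chunk.getChars(0, chunk.length(), shared, 0);
            converter.characters(shared, 0, chunk.length());
        }
        return converter;
    }

    public static void main(String[] args) {
        Converter converter = feed("hello", "world");
        for (TextEvent e = converter.next(); e != null; e = converter.next()) {
            System.out.println(e);
        }
        System.out.println("chars copied: " + converter.charsCopied);
    }
}
```

Every character that passes through the callback is
copied once, so the cost grows with document size.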

So I'm thinking of adding a feature to Xerces2 that
allows applications to set whether the scanner reuses
its character buffers. If the scanner doesn't reuse
buffers, then my converter can simply hold on to the
buffers it is handed and doesn't have to copy
anything. The performance of the pull-parser driven by
Xerces2 should therefore be better.
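
As a rough illustration of why such a switch would
help, the sketch below uses a toy scanner with a
reuseBuffers flag. The flag, the Slice class and the
scan method are all hypothetical and say nothing about
how the feature would actually be named or implemented
in Xerces2. The converter side keeps only a reference
to each buffer; that is safe when the flag is off and
unsafe when it is on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; the proposed Xerces2 feature is not specified here.
class BufferReuseSketch {

    /** A slice of a scanner buffer, held by reference rather than copied. */
    static final class Slice {
        final char[] buf;
        final int offset;
        final int length;
        Slice(char[] buf, int offset, int length) {
            this.buf = buf;
            this.offset = offset;
            this.length = length;
        }
        @Override public String toString() {
            return new String(buf, offset, length);
        }
    }

    /** Toy scanner: fills either one shared buffer or a fresh one per chunk (max 64 chars each). */
    static List<Slice> scan(boolean reuseBuffers, String... chunks) {
        List<Slice> queued = new ArrayList<>();
        char[] shared = new char[64];
        for (String chunk : chunks) {
            char[] buf = reuseBuffers ? shared : new char[chunk.length()];
            // Stands in for the scanner reading input into its buffer.
            chunk.getChars(0, chunk.length(), buf, 0);
            // Converter side: keep a reference only, no character copy.
            queued.add(new Slice(buf, 0, chunk.length()));
        }
        return queued;
    }

    public static void main(String[] args) {
        System.out.println("reuse on:  " + scan(true, "hello", "world"));
        System.out.println("reuse off: " + scan(false, "hello", "world"));
    }
}
```

With reuse on, the first queued slice ends up reading
"world" because the shared buffer was overwritten; with
reuse off, both slices stay intact without any copying
in the converter. The trade-off is more allocation in
the scanner.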

Of course, you could always write an implementation
only for the pull-parsing API that wouldn't suffer
this problem at all. You could even drive the API
from other pull-parsing APIs like XPP, etc. :)

-- 
Andy Clark * [EMAIL PROTECTED]


