Hi everybody,
on my search for an XML parser/DOM with namespace support, I discovered
Xerces-J. I ran some simple tests with it and found that it is not quite
usable for my purposes because of some problems I encountered. Maybe you are
interested in them, and maybe you can give me some tips:
- In my case, I don't need validation, since I have a fixed set of
documents. So I tried to turn validation off. But the parser still attempts
to load the DTD and/or the schemas specified via DOCTYPE/the namespace
URIs. This is a performance/memory overhead I would like to get rid of.
- Since I want to use the original namespace URIs (e.g. for HTML), it
tries to download stuff from the W3C server. But I would like to
avoid the download for each document which is parsed, esp. as I want
to work offline. So I had the idea of using my own implementation
of the EntityResolver, which maps the URIs to local files (or to
empty input streams). This works for the DTD/DOCTYPE, but
unfortunately not for the namespace URIs. I tracked the problem down:
The schema loader creates a new instance of an XML parser, and sets
the EntityResolver of the parser to a default one, instead of using
the EntityResolver which was used in the originating parser.
- I know that DOM2 is still in the draft phase, and therefore it makes sense
that the org.w3c.dom packages still contain only DOM Level 1. On the
other hand, it would be much easier to switch from the DOM2 WD APIs
to the final APIs than from the xerces.dom.* APIs. A compile test
where I "borrowed" the DOM2 WD interfaces from OpenXML showed that
Xerces apparently already implements all current interfaces.
So why don't you supply a build that includes the
DOM2 WD APIs? You can mark all classes with "deprecated" and
corresponding comments, so that every user will notice the "draft"
status.
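To make the first two points concrete, here is a minimal sketch of the kind of
setup I mean. It is written against the standard JAXP and SAX interfaces
(DocumentBuilderFactory, org.xml.sax.EntityResolver) rather than the
Xerces-specific parser classes, which is an assumption on my part; the class
names OfflineParse and EmptyResolver and the sample document are made up for
illustration. It only covers the DTD/DOCTYPE case that works, not the schema
loading triggered by namespace URIs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Xerces.
class OfflineParse {

    /** Answers every external entity request with empty content, so nothing is downloaded. */
    static final class EmptyResolver implements EntityResolver {
        int calls = 0; // counts how often the parser asked for an external entity

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            calls++;
            // A real resolver could map systemId to a local file here instead.
            return new InputSource(new StringReader(""));
        }
    }

    /** Parses without validation but with namespace support, using the given resolver. */
    static Document parse(String xml, EntityResolver resolver) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);    // no validation wanted
        factory.setNamespaceAware(true); // namespace support is the whole point
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(resolver);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Sample document (an assumption): XHTML namespace plus a DOCTYPE pointing at the W3C server.
        String xml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>";
        EmptyResolver resolver = new EmptyResolver();
        Document doc = parse(xml, resolver);
        System.out.println("root namespace: " + doc.getDocumentElement().getNamespaceURI());
        System.out.println("resolver calls: " + resolver.calls);
    }
}
```

Even with validation switched off, the resolver is still consulted for the
DOCTYPE, which matches what I observed: turning validation off does not by
itself stop the parser from trying to load the DTD.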
Thanks for reading up to here :-)
regards,
Klaus Malorny