To improve parsing performance for documents that do not include a DOCTYPE, I've tried modifying the XNI pipeline dynamically: the DTDValidator is removed during parsing if it is not needed (i.e. no DTD grammar is found).
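To make the idea concrete, here is a minimal Java sketch of a filter that splices itself out of a pipeline once it finds it has nothing to do. The interfaces below are simplified stand-ins written for this illustration, not the real XNI types: ToyScanner, ToyValidator, Recorder, PipelineDemo and the single startElement(String) callback are all hypothetical, and the grammarFound flag is an assumed placeholder for "a DTD grammar was found".

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; NOT the real XNI interfaces.
interface DocumentHandler {
    void startElement(String name);
}

interface DocumentSource {
    void setDocumentHandler(DocumentHandler handler);
}

interface DocumentFilter extends DocumentHandler, DocumentSource {
    // Each filter is told which component feeds it.
    void setDocumentSource(DocumentSource source);
}

// Hypothetical scanner: pushes events to whatever handler is currently set.
class ToyScanner implements DocumentSource {
    private DocumentHandler handler;

    public void setDocumentHandler(DocumentHandler handler) {
        this.handler = handler;
    }

    void scan(String... names) {
        for (String name : names) {
            // The field is re-read on every event, so a rewire takes effect at once.
            handler.startElement(name);
        }
    }
}

// Hypothetical validator: removes itself when no grammar is available.
class ToyValidator implements DocumentFilter {
    private final boolean grammarFound; // assumption: decided before the first element
    private DocumentSource source;
    private DocumentHandler next;
    int eventsSeen = 0;

    ToyValidator(boolean grammarFound) {
        this.grammarFound = grammarFound;
    }

    public void setDocumentSource(DocumentSource source) {
        this.source = source;
    }

    public void setDocumentHandler(DocumentHandler handler) {
        this.next = handler;
    }

    public void startElement(String name) {
        eventsSeen++;
        if (!grammarFound && source != null) {
            // Nothing to validate: connect the upstream source directly to
            // the downstream handler, bypassing this filter from now on.
            source.setDocumentHandler(next);
        }
        next.startElement(name);
    }
}

// Hypothetical end of the pipeline: records what it receives.
class Recorder implements DocumentHandler {
    final List<String> names = new ArrayList<>();

    public void startElement(String name) {
        names.add(name);
    }
}

class PipelineDemo {
    // Returns {events seen by the validator, events seen by the recorder}.
    static int[] run(boolean grammarFound) {
        ToyScanner scanner = new ToyScanner();
        ToyValidator validator = new ToyValidator(grammarFound);
        Recorder recorder = new Recorder();

        scanner.setDocumentHandler(validator);
        validator.setDocumentSource(scanner);
        validator.setDocumentHandler(recorder);

        scanner.scan("a", "b", "c");
        return new int[] { validator.eventsSeen, recorder.names.size() };
    }
}
```

In this toy setup, PipelineDemo.run(false) lets the validator see only the first event before it bypasses itself, while the recorder still receives all three; PipelineDemo.run(true) keeps the validator in the chain for every event.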
The performance improvement depends mostly on the size of the document. For example, SAX parsing of medium-size documents (100K-1M) with validation turned off is 8%-12% faster. I've also created a 200K example based on personal-schema.xml (with the identity constraints removed from personal.xsd). With XML Schema validation turned on, I found an improvement of 6%-7%. The change has no real effect on small documents (1K-100K): the improvement is minor (less than 1%) and is probably measurement noise.

However, to be able to modify the pipeline dynamically, we need to change XNI by adding the following method to xni.parser.XMLDocumentFilter:

    public void setDocumentSource(XMLDocumentSource source);

While constructing a parsing pipeline, we need to set the documentSource for each filter component, so that any component that needs to remove itself can do so easily by calling:

    documentSource.setDocumentHandler(this.documentHandler);

Given the performance gain, I don't expect any objection. As we said before, XNI is an API under development and could be modified. If anyone does have an objection or concern, speak up now.

Thank you,
--
Elena Litani / IBM Toronto
