To improve parsing performance for documents that do not include a
DOCTYPE declaration, I've tried modifying the XNI pipeline dynamically,
removing the DTDValidator during parsing if it is not needed (i.e. no DTD
grammar is found).

The performance improvement depends mostly on the size of the
document. For example, SAX parsing of medium-size documents (100K-1M)
with validation turned off is 8%-12% faster.

I've also created one example based on personal-schema.xml (removing
identity constraints from personal.xsd) with a size of 200K. With (XML
Schema) validation turned on, I found an improvement of 6%-7%.

The above change has no measurable effect on small documents (1K-100K):
the improvement is minor (less than 1%) and is probably caused by
measurement noise.

However, to be able to modify the pipeline dynamically, we need to make
a change to XNI to add the following method to
xni.parser.XMLDocumentFilter:
  
  public void setDocumentSource(XMLDocumentSource source);

While constructing a parsing pipeline, we need to set the documentSource
on each filter component, so that any component that needs to remove
itself from the pipeline can do so simply by calling
documentSource.setDocumentHandler(this.documentHandler);
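
To illustrate the idea, here is a minimal sketch of a filter that unlinks
itself. It deliberately uses simplified stand-in interfaces
(DocumentHandler, DocumentSource, DocumentFilter) rather than the real XNI
ones, which have many more callbacks. ToyScanner, SelfRemovingValidator,
Collector and PipelineDemo are hypothetical names, and the rule "remove
yourself at the root element if no DOCTYPE was seen" is an assumption made
for the example, not the actual DTDValidator logic.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the XNI handler/source/filter interfaces.
interface DocumentHandler {
    void doctypeDecl(String rootName);
    void startElement(String name);
}

interface DocumentSource {
    void setDocumentHandler(DocumentHandler handler);
}

interface DocumentFilter extends DocumentHandler, DocumentSource {
    // The proposed addition: let a filter know who feeds it events.
    void setDocumentSource(DocumentSource source);
}

// Toy event source; it re-reads its handler field for every event,
// so a handler swapped in mid-parse takes effect immediately.
class ToyScanner implements DocumentSource {
    private DocumentHandler handler;

    public void setDocumentHandler(DocumentHandler handler) { this.handler = handler; }

    void scan(boolean withDoctype, String... elements) {
        if (withDoctype) handler.doctypeDecl(elements[0]);
        for (String name : elements) handler.startElement(name);
    }
}

// Filter that drops out of the pipeline when no DOCTYPE precedes the root.
class SelfRemovingValidator implements DocumentFilter {
    private DocumentSource documentSource;
    private DocumentHandler documentHandler;
    private boolean sawDoctype;
    private boolean sawRoot;
    int eventsSeen; // counts events that passed through this filter

    public void setDocumentSource(DocumentSource source) { this.documentSource = source; }

    public void setDocumentHandler(DocumentHandler handler) { this.documentHandler = handler; }

    public void doctypeDecl(String rootName) {
        eventsSeen++;
        sawDoctype = true;
        documentHandler.doctypeDecl(rootName);
    }

    public void startElement(String name) {
        eventsSeen++;
        if (!sawRoot) {
            sawRoot = true;
            if (!sawDoctype && documentSource != null) {
                // Connect our source directly to our handler, bypassing us.
                documentSource.setDocumentHandler(documentHandler);
            }
        }
        documentHandler.startElement(name); // still forward the current event
    }
}

// Last stage of the pipeline; records what it receives.
class Collector implements DocumentHandler {
    final List<String> events = new ArrayList<>();

    public void doctypeDecl(String rootName) { events.add("doctype:" + rootName); }

    public void startElement(String name) { events.add("element:" + name); }
}

class PipelineDemo {
    // Returns { events seen by the validator, events seen by the collector }.
    static int[] run(boolean withDoctype) {
        ToyScanner scanner = new ToyScanner();
        SelfRemovingValidator validator = new SelfRemovingValidator();
        Collector collector = new Collector();
        scanner.setDocumentHandler(validator);
        validator.setDocumentSource(scanner);
        validator.setDocumentHandler(collector);
        scanner.scan(withDoctype, "personnel", "person", "name");
        return new int[] { validator.eventsSeen, collector.events.size() };
    }

    public static void main(String[] args) {
        int[] noDtd = run(false);
        int[] dtd = run(true);
        System.out.println("no DOCTYPE: validator=" + noDtd[0] + ", collector=" + noDtd[1]);
        System.out.println("DOCTYPE:    validator=" + dtd[0] + ", collector=" + dtd[1]);
    }
}
```

In this sketch the validator sees only the first element when there is no
DOCTYPE, while the collector still receives every event. The sketch only
works because the source re-reads its handler for each event; a real
component that caches its handler would need the same property.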

Given the performance gain, I don't expect any objection. As we said
before, XNI is an API under development and could be modified.

If anyone does have any objection or concern, speak up now.


Thank you,
-- 
Elena Litani / IBM Toronto
