Hi Elena,
> I've been working on performance, and I've noticed that in fact the
> start/endPrefixMapping calls are redundant in the XNI pipeline: the
> information about namespace binding can be retrieved via a
> namespace-context property (currently it is internal property but can be
> made public) that holds xerces.xni.NamespaceContext object. This object
> is *live*, since it is updated if new namespace declaration was read,
> and allows to query local namespace declarations.
>
> I've prototyped this in Xerces and have seen 0%-6% performance
> improvement depending on the content of the file, i.e. if no namespace
> declarations were found, obviously, there is no performance gain.
> However, if there are many namespace declarations and file size is 32K
> and larger you would start seeing performance gain.
> So .. I propose to remove start/endPrefixMapping methods from the
> xerces.xni.XMLDocumentHandler interface.
A 32K file with many namespace declarations would result in many
start/endPrefixMapping method calls. The cost of a plain method call is very
small, but it shows up over a large number of calls. So the idea of removing
redundant method calls is good, though I wouldn't like to see these methods
removed from the XMLDocumentHandler interface. XNI is a separate set of APIs,
like SAX and DOM, for passing XML information to the application. For those
APIs to be complete, IMO, the methods should stay part of the
XMLDocumentHandler interface as they are.
However, let's look at the other side of it. In the worst-case scenario there
can be three components in the pipeline, which means 6 (redundant) calls for
each namespace declaration. So we should take care of this issue in *our* XNI
implementation. One way would be to leave the implementation of these methods
empty in our various components and let our parser generators,
AbstractSAXParser/AbstractDOMParser, do the job of passing this information
to the application via the NamespaceContext object, as you suggested. For XNI
parsers, it can probably be taken care of in AbstractXMLDocumentParser. I need
to think more about it.
What do you think?
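
To make the idea concrete, here is a rough sketch of the last pipeline stage
deriving the prefix-mapping events from a live context instead of receiving
them through every component. All of the types below (LiveNamespaceContext,
PrefixMappingSink, SimpleNamespaceContext, ParserAdapter) are hypothetical
stand-ins written for this mail, not the actual Xerces classes, and the context
only tracks the declarations of the current element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a live namespace context (NOT the real Xerces interface).
interface LiveNamespaceContext {
    int getDeclaredPrefixCount();           // declarations local to the current element
    String getDeclaredPrefixAt(int index);
    String getURI(String prefix);
}

// Hypothetical application-facing sink that still wants prefix-mapping events.
interface PrefixMappingSink {
    void startPrefixMapping(String prefix, String uri);
}

// Minimal context that the scanner updates as it reads declarations.
// Simplification: it only remembers the declarations of the current element.
class SimpleNamespaceContext implements LiveNamespaceContext {
    private final List<String[]> local = new ArrayList<>();

    void pushContext() { local.clear(); }

    void declarePrefix(String prefix, String uri) {
        local.add(new String[] { prefix, uri });
    }

    public int getDeclaredPrefixCount() { return local.size(); }

    public String getDeclaredPrefixAt(int index) { return local.get(index)[0]; }

    public String getURI(String prefix) {
        // Search newest first so a later declaration of the same prefix wins.
        for (int i = local.size() - 1; i >= 0; i--) {
            if (local.get(i)[0].equals(prefix)) {
                return local.get(i)[1];
            }
        }
        return null;
    }
}

// Last stage of the pipeline: intermediate components forward nothing about
// prefixes; this stage queries the live context when an element starts.
class ParserAdapter {
    private final LiveNamespaceContext context;
    private final PrefixMappingSink sink;

    ParserAdapter(LiveNamespaceContext context, PrefixMappingSink sink) {
        this.context = context;
        this.sink = sink;
    }

    void startElement(String qName) {
        int count = context.getDeclaredPrefixCount();
        for (int i = 0; i < count; i++) {
            String prefix = context.getDeclaredPrefixAt(i);
            sink.startPrefixMapping(prefix, context.getURI(prefix));
        }
    }
}
```

With something like this, the application still sees one startPrefixMapping
per declaration, but the calls happen once at the end of the pipeline rather
than once per component.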
I made some performance changes in the jaxp-fcs branch, and some of them were
integrated into the main branch. There is another performance issue that I
wanted to bring up. If everyone agrees to the following change, we can put it
in the main trunk.
The issue is about storing entity information. As of right now this
information is duplicated in DTDGrammar and XMLEntityManager. I picked up an
entity file, sent by one of the users on the xerces-user mailing list, which
contains a lot of entity declarations. Performance decreases considerably when
there are a lot of entity declarations. Apart from the need to store all the
entity information, symbol table lookup becomes a costly affair. So I would
like to put a 'rehash' function in our current symbol table, similar to
Crimson's SymbolTable#rehash().
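
As a rough illustration of what I mean by rehashing, here is a minimal sketch
of a chained symbol table that grows its bucket array once it gets too full.
The class name SimpleSymbolTable, the 0.75 load factor and the growth rule are
my own assumptions for this example; this is neither the current Xerces
SymbolTable nor Crimson's code.

```java
// Hypothetical, simplified symbol table; names and constants are illustrative only.
class SimpleSymbolTable {
    private static final float LOAD_FACTOR = 0.75f; // assumed value

    private static final class Entry {
        final String symbol;
        Entry next;
        Entry(String symbol, Entry next) { this.symbol = symbol; this.next = next; }
    }

    private Entry[] buckets;
    private int count;
    private int threshold;

    SimpleSymbolTable(int initialCapacity) {
        buckets = new Entry[initialCapacity];
        threshold = (int) (initialCapacity * LOAD_FACTOR);
    }

    // Returns the canonical instance for this symbol, adding it if needed.
    String addSymbol(String symbol) {
        int index = indexFor(symbol, buckets.length);
        for (Entry e = buckets[index]; e != null; e = e.next) {
            if (e.symbol.equals(symbol)) {
                return e.symbol;
            }
        }
        if (count >= threshold) {
            rehash();
            index = indexFor(symbol, buckets.length); // bucket array changed
        }
        buckets[index] = new Entry(symbol, buckets[index]);
        count++;
        return symbol;
    }

    // Grow the bucket array and redistribute the existing entries so that
    // chains stay short when many entity names are added.
    private void rehash() {
        Entry[] old = buckets;
        Entry[] grown = new Entry[old.length * 2 + 1];
        for (Entry head : old) {
            Entry e = head;
            while (e != null) {
                Entry next = e.next;
                int index = indexFor(e.symbol, grown.length);
                e.next = grown[index];
                grown[index] = e;
                e = next;
            }
        }
        buckets = grown;
        threshold = (int) (grown.length * LOAD_FACTOR);
    }

    private static int indexFor(String symbol, int length) {
        return (symbol.hashCode() & 0x7FFFFFFF) % length;
    }

    int size() { return count; }

    int capacity() { return buckets.length; }
}
```

Without the rehash step the number of buckets stays fixed, so with thousands
of entity names every lookup walks a long chain.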
Moreover, complete entity information is stored in both XMLEntityManager
and DTDGrammar. DTDGrammar uses 2-3 2D arrays and a hashtable to store the
entity index mapping. We can avoid storing the same information in two places
by removing the 2-3 2D arrays in DTDGrammar. However, this can be an issue when
we want to cache a DTDGrammar, since the cached DTDGrammar needs to hold the
entity information. Neil can comment more on this. I would suggest making a
separate public class for storing all the entity information, which
XMLEntityManager and DTDGrammar can both use by reference.
Storing entity information in only one place increases the performance
of Xerces2 by 3-5% when parsing an entity-intensive file.
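
Roughly, the shared class could look like the sketch below. EntityDecl,
EntityStore, EntityManagerSketch and DtdGrammarSketch are made-up names for
illustration and do not match the existing XMLEntityManager or DTDGrammar
code; the point is only that both sides hold a reference to the same store
instead of each keeping its own copy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value object describing one external entity declaration.
final class EntityDecl {
    final String name;
    final String publicId;
    final String systemId;

    EntityDecl(String name, String publicId, String systemId) {
        this.name = name;
        this.publicId = publicId;
        this.systemId = systemId;
    }
}

// Hypothetical single place where entity information lives.
class EntityStore {
    private final Map<String, EntityDecl> entities = new HashMap<>();

    // In XML the first declaration of an entity is binding, so later
    // declarations of the same name are ignored.
    void declare(EntityDecl decl) {
        entities.putIfAbsent(decl.name, decl);
    }

    EntityDecl lookup(String name) { return entities.get(name); }

    int size() { return entities.size(); }
}

// Stand-in for the entity manager: writes declarations into the shared store.
class EntityManagerSketch {
    private final EntityStore store;

    EntityManagerSketch(EntityStore store) { this.store = store; }

    void addExternalEntity(String name, String publicId, String systemId) {
        store.declare(new EntityDecl(name, publicId, systemId));
    }
}

// Stand-in for the grammar: reads from the same store, keeps no copy.
class DtdGrammarSketch {
    private final EntityStore store;

    DtdGrammarSketch(EntityStore store) { this.store = store; }

    boolean isEntityDeclared(String name) {
        return store.lookup(name) != null;
    }
}
```

A cached grammar would then need to keep its EntityStore reference alive,
which is the caching concern mentioned above.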
Neeraj