Hi Elena,

> I've been working on performance, and I've noticed that in fact the
> start/endPrefixMapping calls are redundant in the XNI pipeline: the
> information about namespace binding can be retrieved via a
> namespace-context property (currently it is internal property but can be
> made public) that holds xerces.xni.NamespaceContext object. This object
> is *live*, since it is updated if new namespace declaration was read,
> and allows to query local namespace declarations.
> 
> I've prototyped this in Xerces and have seen 0%-6% performance
> improvement depending on the content of the file, i.e. if no namespace
> declarations were found, obviously, there is no performance gain.
> However, if there are many namespace declarations and file size is 32K
> and larger you would start seeing performance gain.

> So .. I propose to remove start/endPrefixMapping methods from the
> xerces.xni.XMLDocumentHandler interface. 

A 32K file with many namespace declarations results in many
start/endPrefixMapping calls. The cost of a single plain method call is very
small, but it does show up over a large number of calls. So the idea of removing
redundant method calls is good, though I wouldn't like to see these methods
removed from the XMLDocumentHandler interface. XNI is a separate set of APIs,
like SAX and DOM, for passing XML information to the application. For that API
to be complete, IMO, they should stay part of the XMLDocumentHandler interface
as they are.

However, let's look at the other side of it. In the worst case there can be
three components in the pipeline, which means 6 redundant calls (one start and
one end call per component) for each namespace declaration. So we should take
care of this issue in *our* XNI implementation. One way would be to leave the
implementation of these methods empty in our various components and let our
parser classes AbstractSAXParser/AbstractDOMParser do the job of passing this
information to the application via the NamespaceContext object, as you
suggested. For XNI parsers, it can probably be taken care of in
AbstractXMLDocumentParser. I need to think more about it.
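
To make the idea a bit more concrete, here is a rough, self-contained sketch of
the last pipeline stage deriving the prefix-mapping events from a live namespace
context at startElement/endElement time. LiveNamespaceContext and FinalStage are
hypothetical stand-ins written for this mail, not the actual Xerces classes, and
the method names are only loosely modelled on what a namespace context would
need to offer.

```java
import java.util.ArrayList;
import java.util.List;

class PrefixMappingSketch {

    /** Hypothetical, simplified live namespace context (NOT the Xerces class). */
    static class LiveNamespaceContext {
        private final List<String[]> bindings = new ArrayList<>();   // {prefix, uri}
        private final List<Integer> scopeStarts = new ArrayList<>(); // start of each scope

        void pushContext() { scopeStarts.add(bindings.size()); }

        void popContext() {
            int start = scopeStarts.remove(scopeStarts.size() - 1);
            while (bindings.size() > start) bindings.remove(bindings.size() - 1);
        }

        void declarePrefix(String prefix, String uri) {
            bindings.add(new String[] {prefix, uri});
        }

        /** Number of prefixes declared on the current element only. */
        int getDeclaredPrefixCount() {
            return bindings.size() - scopeStarts.get(scopeStarts.size() - 1);
        }

        String getDeclaredPrefixAt(int i) {
            return bindings.get(scopeStarts.get(scopeStarts.size() - 1) + i)[0];
        }

        /** Innermost binding wins, so search from the most recent declaration. */
        String getURI(String prefix) {
            for (int i = bindings.size() - 1; i >= 0; i--) {
                if (bindings.get(i)[0].equals(prefix)) return bindings.get(i)[1];
            }
            return null;
        }
    }

    /** Stand-in for the final stage that reports events to the application. */
    static class FinalStage {
        final List<String> events = new ArrayList<>();
        private final LiveNamespaceContext ctx;

        FinalStage(LiveNamespaceContext ctx) { this.ctx = ctx; }

        void startElement(String qname) {
            // Local declarations are read from the context, not passed down the pipeline.
            for (int i = 0; i < ctx.getDeclaredPrefixCount(); i++) {
                String p = ctx.getDeclaredPrefixAt(i);
                events.add("startPrefixMapping(" + p + "," + ctx.getURI(p) + ")");
            }
            events.add("startElement(" + qname + ")");
        }

        void endElement(String qname) {
            events.add("endElement(" + qname + ")");
            for (int i = ctx.getDeclaredPrefixCount() - 1; i >= 0; i--) {
                events.add("endPrefixMapping(" + ctx.getDeclaredPrefixAt(i) + ")");
            }
        }
    }

    /** Simulates <a:root xmlns:a="urn:a"><a:child/></a:root>. */
    static List<String> demo() {
        LiveNamespaceContext ctx = new LiveNamespaceContext();
        FinalStage stage = new FinalStage(ctx);
        ctx.pushContext();
        ctx.declarePrefix("a", "urn:a");
        stage.startElement("a:root");
        ctx.pushContext();
        stage.startElement("a:child");
        stage.endElement("a:child");
        ctx.popContext();
        stage.endElement("a:root");
        ctx.popContext();
        return stage.events;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this sketch the intermediate components never see a prefix-mapping call;
only the final stage turns the context's local declarations into events.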

What do you think?

I made some performance changes in the jaxp-fcs branch, and some of them were
integrated into the main branch. There is another performance issue that I
wanted to bring up. If everyone agrees to the following change, we can put it
in the main trunk.
         
The issue is about storing entity information. Right now this information is
duplicated in DTDGrammar and XMLEntityManager. I picked up an entity file, sent
by one of the users on the xerces-user mailing list, which contains a lot of
entity declarations. Performance decreases considerably when there are a lot of
entity declarations. Apart from the need to store all the entity information,
symbol table lookup becomes a costly affair. So I would like to put a 'rehash'
function in our current symbol table, similar to Crimson's
SymbolTable#rehash().
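
As a rough illustration of what I mean by rehashing, here is a minimal chained
symbol table that grows its bucket array once the load gets too high. This is
a generic sketch written for this mail; it is neither Crimson's code nor our
current SymbolTable, and the class name, initial size and 3/4 load threshold
are arbitrary assumptions.

```java
class RehashingSymbolTable {

    private static final class Entry {
        final String symbol;
        Entry next; // next entry in the same bucket chain

        Entry(String symbol, Entry next) {
            this.symbol = symbol;
            this.next = next;
        }
    }

    private Entry[] buckets = new Entry[4]; // deliberately tiny to force rehashing
    private int count;

    /** Returns the canonical instance for the symbol, adding it if necessary. */
    String addSymbol(String s) {
        int idx = index(s, buckets.length);
        for (Entry e = buckets[idx]; e != null; e = e.next) {
            if (e.symbol.equals(s)) return e.symbol;
        }
        // Grow before inserting once the table is about 3/4 full (assumed threshold).
        if (count >= buckets.length * 3 / 4) {
            rehash();
            idx = index(s, buckets.length);
        }
        buckets[idx] = new Entry(s, buckets[idx]);
        count++;
        return s;
    }

    boolean containsSymbol(String s) {
        for (Entry e = buckets[index(s, buckets.length)]; e != null; e = e.next) {
            if (e.symbol.equals(s)) return true;
        }
        return false;
    }

    int size() { return count; }

    int bucketCount() { return buckets.length; }

    /** Moves every entry into a larger array so bucket chains stay short. */
    private void rehash() {
        Entry[] old = buckets;
        Entry[] bigger = new Entry[old.length * 2 + 1];
        for (Entry head : old) {
            for (Entry e = head; e != null; ) {
                Entry next = e.next;
                int idx = index(e.symbol, bigger.length);
                e.next = bigger[idx];
                bigger[idx] = e;
                e = next;
            }
        }
        buckets = bigger;
    }

    private static int index(String s, int length) {
        return (s.hashCode() & 0x7FFFFFFF) % length;
    }

    public static void main(String[] args) {
        RehashingSymbolTable table = new RehashingSymbolTable();
        for (int i = 0; i < 1000; i++) table.addSymbol("ent" + i);
        System.out.println(table.size() + " symbols in " + table.bucketCount() + " buckets");
    }
}
```

Without the rehash step a fixed-size table ends up with long bucket chains
when a DTD declares a large number of entities, which is where the lookup cost
comes from.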
         
Moreover, the complete entity information is stored in both XMLEntityManager
and DTDGrammar. In DTDGrammar, 2-3 2D arrays and a hashtable are used to store
the entity index mapping. We can avoid storing the same information in two
places by removing the 2-3 2D arrays in DTDGrammar. However, this can be an
issue when we want to cache a DTDGrammar, since the cached DTDGrammar needs to
carry the entity information. Neil can comment more on this. I would suggest
making a separate public class for storing all the entity information, which
XMLEntityManager and DTDGrammar can both use by reference.
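
A minimal sketch of that shared class could look like the following. All names
here (EntityStore, GrammarView, EntityManagerView) are hypothetical and only
illustrate the by-reference sharing; they are not existing Xerces classes, and
real entity declarations carry more than a name and a value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SharedEntityStoreSketch {

    /** Hypothetical single home for entity declarations. */
    static final class EntityStore {
        private final Map<String, String> decls = new LinkedHashMap<>();

        /** First declaration is binding, as the XML spec requires. */
        boolean declare(String name, String value) {
            if (decls.containsKey(name)) return false;
            decls.put(name, value);
            return true;
        }

        String valueOf(String name) { return decls.get(name); }

        boolean isDeclared(String name) { return decls.containsKey(name); }

        int size() { return decls.size(); }
    }

    /** Stand-in for the grammar side: holds a reference, no copy of the data. */
    static final class GrammarView {
        private final EntityStore store;

        GrammarView(EntityStore store) { this.store = store; }

        boolean isEntityDeclared(String name) { return store.isDeclared(name); }
    }

    /** Stand-in for the entity-manager side: same reference, same data. */
    static final class EntityManagerView {
        private final EntityStore store;

        EntityManagerView(EntityStore store) { this.store = store; }

        String replacementText(String name) { return store.valueOf(name); }
    }

    public static void main(String[] args) {
        EntityStore store = new EntityStore();
        GrammarView grammar = new GrammarView(store);
        EntityManagerView manager = new EntityManagerView(store);
        store.declare("copy", "(c)");
        System.out.println(grammar.isEntityDeclared("copy") + " " + manager.replacementText("copy"));
    }
}
```

In a design like this a cached grammar would simply keep its reference to the
store, though whether that is enough for grammar caching is exactly the open
question above.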
        
Storing entity information in only one place increases the performance of
Xerces2 by 3-5% when parsing an entity-intensive file.


Neeraj

