> Theodore W. Leung wrote...
>
> Tuning Xerces is going to be an iterative process. We need some test
> data that everyone can use, and we need a test driver that everyone can
> use.
I think that is going to be really useful. Every time we add some huge
piece of code, we can actually see how the performance is affected. I
will also write a test driver that everyone can use.

> I'm fine with the metrics and characterization of test data that you are
> proposing in your message. I think it's a great start.
>
> I'd also like to propose that all the people working on this check the
> test data and the test classes into the build, so that anyone can run
> the performance timings for themselves. (I'd like to see this for the
> full test suite as well, but that's another message.)

Right! I agree with you.

> I have some time that I can contribute towards this effort.

You are always welcome, Ted. :-)

Thanks,
Rahul.

> On Fri, 2002-05-03 at 14:03, Rahul Srivastava wrote:
> >
> > Hi folks,
> >
> > We have long been talking about improving the performance of Xerces2.
> > Some benchmarking has been done earlier, for instance the one done by
> > Dennis Sosnoski, see:
> > http://www.sosnoski.com/opensrc/xmlbench/index.html . These results
> > are important for knowing how fast/slow Xerces is compared to other
> > parsers. But we need to identify areas of improvement in Xerces: we
> > need to calculate the time taken by each individual component in the
> > pipeline, figure out which component swallows how much time for
> > various events, and then concentrate on improving performance in
> > those areas.
> > So, here is what we plan to do:
> >
> > + sax parsing
> >   - time taken
> > + dom parsing
> >   - dom construction time
> >   - dom traversal time
> >   - memory consumed
> >   - considering the feature deferred-dom as true/false for all of the above
> > + DTD validation
> >   - one-time parse, time taken
> >   - multiple parses using the same instance, time taken from the
> >     second parse onwards
> > + Schema validation
> >   - one-time parse, time taken
> >   - multiple parses using the same instance, time taken from the
> >     second parse onwards
> > + optimising the pipeline
> >   - calculate pipeline/component initialization time
> >   - calculate the time each component in the pipeline takes to
> >     propagate an event
> >   - use configurations to set up an optimised pipeline for various
> >     cases, such as no validation, DTD validation only, etc., and
> >     calculate the time taken
> >
> > Apart from this, should we consider the existing grammar caching
> > framework when evaluating the performance of the parser?
> >
> > We have classified the inputs to be used for this testing as follows:
> >
> > + instance docs used
> >   - tag centric (more tags and small content, say 10-50 bytes)
> >
> >       Type     Tags#
> >       -----------------
> >       small    5-50
> >       medium   50-500
> >       large    >500
> >
> >   - content centric (few tags, say 5-10, and huge content)
> >
> >       Type     Content between a pair of tags
> >       ----------------------------------------
> >       small    <500 kb
> >       medium   500-5000 kb
> >       large    >5000 kb
> >
> > We can also use the depth of the tags as a criterion for the above
> > cases.
> >
> > Actually speaking, there can be enormous combinations and different
> > figures in the above table that reflect real-world instance docs. I
> > would like to know the view of the community here. Is this data
> > enough to evaluate the performance of the parser? Is there any data
> > which is publicly available and can be used for performance
> > evaluation?
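Since I volunteered the test driver above, here is a rough sketch of the kind of thing I have in mind: time one SAX parse and one DOM construction over a generated "tag centric" document. It goes through the generic JAXP factories, so whichever parser is on the classpath gets measured; the class name, the element content, and the tag count are illustrative, not fixed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParseTimer {

    /** Builds a "tag centric" instance document: many tags, ~20 bytes of content each. */
    static String tagCentricDoc(int tags) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < tags; i++) {
            sb.append("<item>small content here</item>");
        }
        return sb.append("</root>").toString();
    }

    /** Time one SAX parse of the given document, in nanoseconds. */
    static long timeSax(String xml) throws Exception {
        long start = System.nanoTime();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
        return System.nanoTime() - start;
    }

    /** Time one DOM construction of the given document, in nanoseconds. */
    static long timeDom(String xml) throws Exception {
        long start = System.nanoTime();
        DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        String doc = tagCentricDoc(500);  // "large" per the table above
        System.out.println("SAX parse (ns): " + timeSax(doc));
        System.out.println("DOM build (ns): " + timeDom(doc));
    }
}
```

For real measurements we would of course warm up the JVM, parse each document many times, and report a median rather than a single run; this just shows the shape of the driver.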
> > + DTDs used
> >   - should use different types of entities
> >
> > + XML Schemas used
> >   - should use most of the elements and datatypes
> >
> > Will it really help in any way?
> >
> > Any comments or suggestions appreciated.
> >
> > Thanks,
> > Rahul.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
