> Theodore W. Leung wrote...
>
> Tuning Xerces is going to be an iterative process. We need some test
> data that everyone can use, and we need a test driver that everyone can
> use.
I think that is going to be really useful. Every time we add some huge
piece of code, we can actually see how the performance is affected. I
will also write a test driver that everyone can use.

> I'm fine with the metrics and characterization of test data that you are
> proposing in your message. I think it's a great start.
>
> I'd also like to propose that all the people working on this check the
> test data and the test classes into the build, so that anyone can run
> the performance timings for themselves. (I'd like to see this for the
> full test suite as well, but that's another message.)

Right! I agree with you.

> I have some time that I can contribute towards this effort.

You are always welcome, Ted. :-)

Thanks,
Rahul.

> On Fri, 2002-05-03 at 14:03, Rahul Srivastava wrote:
> >
> > Hi folks,
> >
> > We have long been talking about improving the performance of Xerces2.
> > Some benchmarking has been done earlier, for instance the one done by
> > Dennis Sosnoski, see:
> > http://www.sosnoski.com/opensrc/xmlbench/index.html . These results
> > are important for knowing how fast/slow Xerces is compared to other
> > parsers. But we need to identify areas of improvement in Xerces: we
> > need to calculate the time taken by each individual component in the
> > pipeline, figure out which component swallows how much time for
> > various events, and then concentrate on improving performance in
> > those areas.
> > So, here is what we plan to do:
> >
> > + sax parsing
> >   - time taken
> > + dom parsing
> >   - dom construction time
> >   - dom traversal time
> >   - memory consumed
> >   - considering the feature deferred-dom as true/false for all of the above
> > + DTD validation
> >   - one-time parse, time taken
> >   - multiple parses using the same instance, time taken from the
> >     second parse onwards
> > + Schema validation
> >   - one-time parse, time taken
> >   - multiple parses using the same instance, time taken from the
> >     second parse onwards
> > + optimising the pipeline
> >   - calculate pipeline/component initialization time
> >   - calculate the time each component in the pipeline takes to
> >     propagate an event
> >   - use configurations to set up an optimised pipeline for various
> >     cases, such as no validation, DTD validation only, etc., and
> >     calculate the time taken
> >
> > Apart from this, should we consider the existing grammar caching
> > framework when evaluating the performance of the parser?
> >
> > We have classified the inputs to be used for this testing as follows:
> >
> > + instance docs used
> >   - tag centric (more tags and small content, say 10-50 bytes)
> >
> >       Type     Tags#
> >       -----------------
> >       small    5-50
> >       medium   50-500
> >       large    >500
> >
> >   - content centric (few tags, say 5-10, and huge content)
> >
> >       Type     Content between a pair of tags
> >       ----------------------------------------
> >       small    <500 kb
> >       medium   500-5000 kb
> >       large    >5000 kb
> >
> > We can also use the depth of the tags as a criterion for the above
> > cases.
> >
> > Actually speaking, there can be enormous combinations and different
> > figures in the above table that reflect real-world instance docs. I
> > would like to know the view of the community here. Is this data
> > enough to evaluate the performance of the parser? Is there any data
> > which is publicly available and can be used for performance
> > evaluation?
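Since I volunteered the test driver above, here is a rough sketch of the kind of thing I have in mind: time one SAX parse and one DOM construction over a generated "tag centric" document. It goes through the generic JAXP factories, so whichever parser is on the classpath gets measured; the class name, the element content, and the tag count are illustrative, not fixed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParseTimer {

    /** Builds a "tag centric" instance document: many tags, ~20 bytes of content each. */
    static String tagCentricDoc(int tags) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < tags; i++) {
            sb.append("<item>small content here</item>");
        }
        return sb.append("</root>").toString();
    }

    /** Time one SAX parse of the given document, in nanoseconds. */
    static long timeSax(String xml) throws Exception {
        long start = System.nanoTime();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
        return System.nanoTime() - start;
    }

    /** Time one DOM construction of the given document, in nanoseconds. */
    static long timeDom(String xml) throws Exception {
        long start = System.nanoTime();
        DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        String doc = tagCentricDoc(500);  // "large" per the table above
        System.out.println("SAX parse (ns): " + timeSax(doc));
        System.out.println("DOM build (ns): " + timeDom(doc));
    }
}
```

For real measurements we would of course warm up the JVM, parse each document many times, and report a median rather than a single run; this just shows the shape of the driver.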
> > + DTDs used
> >   - should use different types of entities
> >
> > + XML Schemas used
> >   - should use most of the elements and datatypes
> >
> > Will it really help in any way?
> >
> > Any comments or suggestions appreciated.
> >
> > Thanks,
> > Rahul.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
