I've put two zip files on my personal home page that AEA Technology is
donating to the Apache project.  Neither is anything particularly special,
but they would let anyone who wants to dive into the hot spot in
XMLAttr::set, or into lexical validation of reals, start with code that
works.

The first zip file contains a modified version of XMLAttr.hpp and
XMLAttr.cpp that eliminates the delete and new that currently occurs when
processing every attribute.  The donated code allocates an initial chunk of
memory for the name, value, etc. of each XMLAttr and only deletes and
reallocates when an attribute part is larger than the currently allocated
chunk.  In that case, it allocates the next multiple of the chunk size.
Since the vast majority of attribute names and values are fairly short (on
the order of 1-32 characters), this typically results in no allocations
after the initial population of the XMLAttr pool.  In sample files that
were heavy with attributes, this increased the speed of SAXCount by 20%.
The code has a preprocessor definition in XMLAttr.hpp that switches between
the original and the new behavior.  Of course, that switch would have to be
removed and the code cleaned up before any integration.

The second file is a Visual C++ 6 console application that implements and
benchmarks lexical validation and comparison of real values.  With slight
modification (some booleans to determine whether periods, exponents, NaNs,
and infinities are allowed), it could validate all numeric types (real,
decimal, integer).  If the effects of IEEE rounding on comparisons must be
reproduced exactly, it would also be relatively easy to detect when
rounding would be significant and, in those few cases, fall back to
converting to double and then comparing.  Benchmarking on VC6 showed that
lexical validation could be 20 times faster than using atof() and 5 times
faster than using VarR8FromStr(), and that using atof() for numeric
validation could take as much time as parsing.  Since schema datatypes are
first being implemented in Java, I don't expect this code to do much more
than serve as a potential resource for whoever does the implementation for
Xerces-J.  However, maybe it will give you something to think about.
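To make the "slight modification" concrete, here is a minimal sketch of a
flag-driven lexical validator.  It is not the code in realvalid.zip: the
NumericLexOptions struct, isValidNumericLexical(), and the exact grammar
(loosely modeled on the XML Schema lexical forms, with the literal "NaN" and
an optionally signed "INF") are assumptions for illustration.  It only
inspects characters and never converts to double, which is where the
savings over atof() come from.

```cpp
#include <cctype>
#include <cstring>

// Hypothetical options: the booleans that let one validator check
// real, decimal, or integer lexical forms.
struct NumericLexOptions {
    bool allowPeriod;    // "1.5", ".5", "1."
    bool allowExponent;  // "1e10", "2.5E-3"
    bool allowNaN;       // the literal "NaN"
    bool allowInfinity;  // "INF", "+INF", "-INF"
};

static bool isDigitChar(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical validator for the assumed grammar:
//   [sign] (digits ["." digits*] | "." digits) [("e"|"E") [sign] digits]
// plus the optional special values NaN and [sign]INF.
bool isValidNumericLexical(const char* s, const NumericLexOptions& opt) {
    if (s == nullptr || *s == '\0') return false;

    if (opt.allowNaN && std::strcmp(s, "NaN") == 0) return true;

    const char* p = s;
    if (*p == '+' || *p == '-') ++p;

    if (opt.allowInfinity && std::strcmp(p, "INF") == 0) return true;

    // Mantissa: integer digits, then an optional fraction.
    int mantissaDigits = 0;
    while (isDigitChar(*p)) { ++p; ++mantissaDigits; }
    if (*p == '.') {
        if (!opt.allowPeriod) return false;
        ++p;
        while (isDigitChar(*p)) { ++p; ++mantissaDigits; }
    }
    if (mantissaDigits == 0) return false;  // rejects "+", ".", "e5"

    // Optional exponent: needs at least one digit after the (signed) marker.
    if (*p == 'e' || *p == 'E') {
        if (!opt.allowExponent) return false;
        ++p;
        if (*p == '+' || *p == '-') ++p;
        int exponentDigits = 0;
        while (isDigitChar(*p)) { ++p; ++exponentDigits; }
        if (exponentDigits == 0) return false;
    }

    return *p == '\0';  // no trailing characters allowed
}
```

Turning all four flags on gives a real/double-style check, allowing only the
period gives a decimal-style check, and turning all of them off gives an
integer check.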

Here are the links (not guaranteed to be there for any length of time):

http://home.houston.rr.com/curta/XMLAttr.zip
http://home.houston.rr.com/curta/realvalid.zip

